CN105447146A - Massive data collecting and exchanging system and method
- Publication number
- CN105447146A
- Authority
- CN
- China
- Prior art keywords
- data
- event
- transmission channel
- receiver
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a massive data collecting and exchanging system. The system adopts a proxy mode. Each proxy of the system comprises a data collector, a transmission channel and a receiver. The data collector collects data from a data source, converts the data into events through processing, and sends them into the transmission channel in the form of events (each comprising two parts: event header information and data); it supports a plurality of data receivers. The transmission channel caches the events sent by the data collector. The receiver extracts the events from the transmission channel and, according to the corresponding configuration, stores the file in a file system or a database, or submits it to a remote server or to a next-level proxy. In the massive data collecting and exchanging system and method, the proxies are independent of one another and can exchange data from a plurality of data sources in parallel, separating data read-in from data write-out and thereby making the system architecture more flexible, lightweight and efficient.
Description
Technical field
The present invention relates to the fields of big data and data collection, and specifically to a massive data collecting and exchanging system and method.
Background art
As information and communication technology (ICT) has developed and accumulated to the present day, data of all kinds has grown explosively, so that data at the terabyte (TB), petabyte (PB) and even exabyte (EB) scale has become the norm, and the era of big data has arisen from this. Although big data has become increasingly common and mature within information technology, its impact on social and economic activity is by no means limited to the technical level. More fundamentally, it provides a brand-new way of looking at the world: decisions will increasingly be made on the basis of data analysis rather than, as before, largely on experience and intuition.
Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within a time frame that people can tolerate. That conventional software tools cannot process big data means that the machines we use every day cannot complete the tasks of storing, analyzing and processing it, while the price of a high-performance mainframe doubles, or increases several times over, as its performance rises. How can these problems be solved? Distributed clusters solve them well. The open-source distributed system infrastructure Hadoop was designed and developed precisely to solve the storage and processing of massive data in the Internet era. Put simply, Hadoop is a distributed computing and storage system that makes it easier to develop applications that process large-scale data in parallel. It features strong horizontal scalability, low cost, high efficiency and reliability. Hadoop's user base has expanded from traditional Internet companies to the telecommunications industry, the power industry, hospitals and the financial industry, and it is being applied ever more widely.
Although Hadoop has many features that suit it to storing and processing big data, a great deal of raw data is stored on stand-alone machines rather than in a Hadoop cluster. If this data cannot be exchanged into the Hadoop cluster, none of Hadoop's advantages can be realized, so how to exchange the raw data into the Hadoop platform becomes the first problem to be solved. A fast, efficient, safe and reliable way of exchanging data from different data sources into Hadoop is therefore urgently needed. The Hadoop project currently has a sub-project, the data transfer tool Sqoop, that can exchange data between relational databases and Hadoop, but it has two shortcomings: 1. it can exchange data only with relational databases; 2. Sqoop depends on the Hadoop environment to run and cannot exchange data independently of Hadoop.
In view of the above problems, the present invention proposes a massive data collecting and exchanging system and method.
Summary of the invention
The present invention is a massive data collecting and exchanging system and method whose object is to realize data exchange between different data sources and a big data processing platform.
The technical solution of the present invention is as follows. The present invention is a massive data collecting and exchanging system, characterized in that the system adopts a proxy mode. Each proxy of the system comprises a data collector, a transmission channel and a receiver. The proxies are independent of one another and can exchange data from multiple data sources in parallel, separating data read-in from data write-out and making the system architecture more flexible, lightweight and efficient.
The data collector is responsible for collecting data from the data source, converting it into events through processing, and sending it into the transmission channel in the form of events (each comprising two parts: event header information and data). It supports multiple kinds of data receivers.
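The two-part event described above can be pictured with a small sketch. The following Python is illustrative only: the class name `Event`, the field names `headers` and `body`, the `wrap` helper and the header keys are all assumptions, not names taken from the patent.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Event:
    """Hypothetical event: header information plus the data payload."""
    headers: dict = field(default_factory=dict)  # event header information
    body: bytes = b""                            # the collected data


def wrap(raw: bytes, source: str) -> Event:
    """Encapsulate raw data from a source into an event (assumed header keys)."""
    return Event(headers={"source": source, "timestamp": time.time()}, body=raw)


# Example: wrap one line read from a hypothetical log file.
event = wrap(b"2015-11-26 login ok", source="/var/log/app.log")
print(event.headers["source"], len(event.body))
```

Keeping the header separate from the payload lets later stages route or filter on header fields without parsing the data itself.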
The transmission channel caches the events sent by the data collector. To ensure the reliability of data during transfer, an event is deleted from a transmission channel only after it has been cached in the next transmission channel or processed by the receiver.
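The deletion rule above can be read as a take-then-commit protocol: an event that has been taken stays recoverable until the consumer confirms it. The sketch below is a minimal in-memory illustration under that assumption; `MemoryChannel`, `take`, `commit`, `rollback` and `deliver` are hypothetical names, and the patent does not specify this mechanism.

```python
from collections import deque


class MemoryChannel:
    """Hypothetical in-memory channel that deletes an event only on commit."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self._queue = deque()   # events waiting to be taken
        self._in_flight = []    # events taken but not yet confirmed

    def put(self, event) -> None:
        if len(self) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        """Hand out the oldest event but keep a copy until commit()."""
        if not self._queue:
            return None
        event = self._queue.popleft()
        self._in_flight.append(event)
        return event

    def commit(self) -> None:
        """Consumer succeeded: the taken events are now really deleted."""
        self._in_flight.clear()

    def rollback(self) -> None:
        """Consumer failed: put the taken events back in their original order."""
        while self._in_flight:
            self._queue.appendleft(self._in_flight.pop())

    def __len__(self) -> int:
        return len(self._queue) + len(self._in_flight)


def deliver(channel: MemoryChannel, write) -> bool:
    """Take one event and delete it only if write(event) does not raise."""
    event = channel.take()
    if event is None:
        return False
    try:
        write(event)
    except Exception:
        channel.rollback()
        return False
    channel.commit()
    return True
```

Here `write` stands for either the receiver's processing or a `put` into the next channel; if it raises, the event is still in the channel and can be retried.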
The receiver extracts the events from the transmission channel and, according to the corresponding configuration, stores the file in a file system or a database, or submits it to a remote server or to a next-level proxy.
The data receivers supported by the data collector include file, directory and database.
The transmission channels include file and memory.
The receivers include the distributed file system (Hadoop Distributed File System, HDFS), the non-relational database (Hadoop Database, HBase), the message system (Kafka) and file.
The present invention is also a massive data collecting and exchanging method, characterized in that the method comprises the following steps.
1) Write the configuration file of the proxy according to requirements.
2) Start the proxy according to the written configuration file. Once the proxy has started successfully, it begins to transfer data: data is read from the external data source into the proxy through the data receiver, the data read in is encapsulated into events and sent to the transmission channel to be cached until the receiver extracts it, and the receiver extracts these events, parses them back into raw data and stores the data at the final destination. After the proxy has started, data transfer is automatic, and changed data is also collected automatically as the data changes.
The specific implementation of step 1) is as follows.
100) The type of the data receiver is configured according to the type of the external data source. If the data source is the files under a directory, the receiver type is configured as directory file (Spooling Directory, spooldir), and the location of the data source must also be configured.
101) The type of the transmission channel is configured as required. Options such as the capacity of the channel and its transfer capacity must also be configured.
102) The type of the receiver depends on where the user ultimately wants the data stored. When HDFS is selected as the receiver, the location in HDFS where files are stored and the file size are configured.
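Steps 100) to 102) can be sketched as a configuration object plus a check that each section carries the options the text names. The patent does not give the configuration file format, so everything below is an assumption: the key names, the paths, the numeric values, and the rule that the transfer capacity should not exceed the channel capacity.

```python
# Hypothetical proxy configuration mirroring steps 100) to 102).
# Key names, paths and values are illustrative assumptions.
PROXY_CONFIG = {
    "collector": {"type": "spooldir", "location": "/data/incoming"},
    "channel": {"type": "memory", "capacity": 1000, "transfer_capacity": 100},
    "receiver": {"type": "hdfs", "path": "/warehouse/raw",
                 "file_size_bytes": 128 * 1024 * 1024},
}

REQUIRED_KEYS = {
    "collector": {"type", "location"},
    "channel": {"type", "capacity", "transfer_capacity"},
    "receiver": {"type"},
}

# Extra options a given receiver type needs (step 102 names these for HDFS).
RECEIVER_OPTIONS = {"hdfs": {"path", "file_size_bytes"}}


def validate(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        missing = keys - set(config.get(section, {}))
        problems += [f"{section}: missing '{k}'" for k in sorted(missing)]
    receiver = config.get("receiver", {})
    needed = RECEIVER_OPTIONS.get(receiver.get("type"), set())
    for key in sorted(needed - set(receiver)):
        problems.append(f"receiver: '{receiver['type']}' needs '{key}'")
    channel = config.get("channel", {})
    # Assumed sanity check, not stated in the patent.
    if channel.get("transfer_capacity", 0) > channel.get("capacity", 0):
        problems.append("channel: transfer_capacity exceeds capacity")
    return problems


print(validate(PROXY_CONFIG))
```

Validating before start-up surfaces a missing source location or HDFS path as a configuration error rather than as a failure in the middle of a transfer.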
The data transmission steps of the proxy in step 2) are as follows.
200) The data collector reads data from the external data source at the configured address and first determines whether the data read is new. If it is new, the collector preprocesses the data, formats it in a specific way, adds header information and encapsulates it into an event.
201) The data collector sends the event to one or more transmission channels. A transmission channel can be regarded as a buffer that holds the event until the receiver extracts and processes it.
202) The receiver extracts the event from the transmission channel, parses it back into raw data, and writes the data to the destination by calling a client interface, or the data serves as the external data source of a next-level proxy.
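Steps 200) to 202) can be sketched end to end for a directory source. This is a simplified illustration, not the patented implementation: the function names are hypothetical, "new data" is approximated by remembering file paths already read (so a modified file would not be picked up again), the preprocessing is just whitespace trimming, and `write` stands in for whatever client interface the destination offers.

```python
import os
import queue
import tempfile


def collect(directory: str, seen: set) -> list:
    """Step 200: read files, skip ones already seen, wrap new data as events."""
    events = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if path in seen or not os.path.isfile(path):
            continue  # not new data
        with open(path, "rb") as handle:
            body = handle.read().strip()  # assumed preprocessing: trim whitespace
        events.append({"headers": {"file": name}, "body": body})
        seen.add(path)
    return events


def dispatch(events: list, channels: list) -> None:
    """Step 201: send every event to one or more buffering channels."""
    for event in events:
        for channel in channels:
            channel.put(event)


def drain(channel: queue.Queue, write) -> int:
    """Step 202: extract events, recover the raw data, pass it to a client call."""
    count = 0
    while not channel.empty():
        event = channel.get()
        write(event["body"])  # stand-in for a destination client interface
        count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        with open(os.path.join(spool, "a.log"), "wb") as f:
            f.write(b"first record\n")
        channel, seen, stored = queue.Queue(), set(), []
        dispatch(collect(spool, seen), [channel])
        dispatch(collect(spool, seen), [channel])  # second pass finds nothing new
        drain(channel, stored.append)
        print(stored)
```

Because reading and writing meet only at the channel, the collector and the receiver can run at different speeds, which is the separation of read-in and write-out that the text describes.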
Brief description of the drawings
Fig. 1 is the overall architecture diagram of the system.
Fig. 2 is the internal data flow diagram of a proxy.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings. The following detailed description does not limit the present invention; rather, the scope of the present invention is determined by the claims.
The present invention is a massive data collecting and exchanging system whose overall architecture is shown in Fig. 1. The system adopts a proxy mode. Each proxy comprises a data collector, a transmission channel and a receiver. The proxies are independent of one another and can exchange data from multiple data sources in parallel, separating data read-in from data write-out and making the system architecture more flexible, lightweight and efficient.
The data collector is responsible for collecting data from the data source, converting it into events through processing, and sending it into the transmission channel in the form of events (each comprising two parts: event header information and data). It supports multiple kinds of data receivers, such as file, directory and database.
The transmission channel caches the events sent by the data collector. To ensure the reliability of data during transfer, an event is deleted from a transmission channel only after it has been cached in the next transmission channel or processed by the receiver. Supported channels include file, memory and so on.
The receiver extracts the events from the transmission channel and, according to the corresponding configuration, stores the file in a file system or a database, or submits it to a remote server or to a next-level proxy. Supported receivers include HDFS, HBase, Kafka, file and so on.
The present invention is also a massive data collecting and exchanging method. To complete one data exchange, the method performs the following steps.
1) First, write the configuration file of the proxy according to requirements. The specific steps are as follows.
100) The type of the data receiver is configured according to the type of the external data source. For example, if the data source is the files under a directory, the receiver type is configured as spooldir, and the location of the data source, such as the absolute path of the files, must also be configured. There are also some type-specific attributes, which are not described in detail here.
101) The type of the transmission channel, such as memory or file, is configured as required. Options such as the capacity of the channel and its transfer capacity must also be configured.
102) The type of the receiver, such as HDFS or HBase, depends on where the user ultimately wants the data stored. Each kind of receiver also has its own specific parameters; for example, when HDFS is selected as the receiver, the location in HDFS where files are stored, the file size and so on are configured.
2) Start the proxy according to the written configuration file. Once the proxy has started successfully, it begins to transfer data: data is read from the external data source into the proxy through the data receiver, the data read in is encapsulated into events and sent to the transmission channel to be cached until the receiver extracts it, and the receiver extracts these events, parses them back into raw data and stores the data at the final destination, such as the HDFS in Fig. 1, completing one data transfer. After the proxy has started, data transfer is automatic, and changed data is also collected automatically as the data changes.
In step 2), the internal data flow of a proxy is shown in Fig. 2. Events run through the whole data flow of the data exchange system; the event is the basic unit of data exchange. The data collector reads data from the external data source at the configured address and first determines whether the data read is new. If it is new, the collector preprocesses the data, formats it in a specific way, adds header information and encapsulates it into an event. The data collector then sends the event to one or more transmission channels. A transmission channel can be regarded as a buffer that holds the event until the receiver extracts and processes it. The receiver extracts the event from the transmission channel, parses it back into raw data, and writes the data to the destination by calling a client interface, or the data serves as the external data source of a next-level proxy. Such chaining is permitted: in Fig. 1, for example, the receivers of proxies 1, 2 and 3 act as the data source of the next-level proxy 4.
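The chaining of proxies 1, 2 and 3 into proxy 4 can be sketched as a fan-in, where the receiver of each first-level proxy simply calls the collector side of the next one. The `Proxy` class, its method names and the `hops` header used to trace the route are hypothetical; a real deployment would submit events over a network rather than through a direct function call.

```python
import queue


class Proxy:
    """Hypothetical proxy: buffers events in a channel, then forwards them."""

    def __init__(self, name: str, forward):
        self.name = name
        self.channel = queue.Queue()  # transmission channel acting as a buffer
        self.forward = forward        # receiver action: final write or next proxy

    def accept(self, event: dict) -> None:
        """Collector side: tag the event with this hop and buffer it."""
        hops = event["headers"].get("hops", []) + [self.name]
        self.channel.put({"headers": {**event["headers"], "hops": hops},
                          "body": event["body"]})

    def flush(self) -> None:
        """Receiver side: drain the channel into the configured destination."""
        while not self.channel.empty():
            self.forward(self.channel.get())


store = []  # stand-in for the final destination, such as HDFS in Fig. 1
proxy4 = Proxy("proxy4", forward=store.append)
# The receivers of proxies 1 to 3 submit to proxy 4, which treats them as its source.
first_tier = [Proxy(f"proxy{i}", forward=proxy4.accept) for i in (1, 2, 3)]

for i, proxy in enumerate(first_tier, start=1):
    proxy.accept({"headers": {}, "body": f"record from source {i}".encode()})
    proxy.flush()
proxy4.flush()

for event in store:
    print(event["headers"]["hops"], event["body"])
```

Each first-level proxy keeps its own channel, so a slow or unavailable proxy 4 delays forwarding without blocking collection at the sources.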
The design is very straightforward. It should be noted that the system of the present invention provides a large number of built-in data collectors (such as the file, web server and database in Fig. 1), transmission channels (file, memory and so on) and receivers (such as HDFS in Fig. 1). Different types of data collectors, transmission channels and receivers can be combined freely, and the user sets the combination in the configuration file, which makes the system simple and flexible to use. For example, a transmission channel can cache events in memory or persist them to the local file system, and a receiver can write logs to HDFS or HBase, or even to another data collector.
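Free combination through configuration is commonly implemented with a registry that maps each configured type name to a constructor. The sketch below shows that pattern under stated assumptions: the registries, the type names (`list`, `memory`, `collect`, `print`) and the `build` function are hypothetical stand-ins, not the built-in components the patent refers to.

```python
from collections import deque

# Hypothetical registries of built-in component types; names are illustrative.
COLLECTORS = {
    # Stand-in for file, directory or database collectors.
    "list": lambda opts: (lambda: list(opts["items"])),
}
CHANNELS = {
    "memory": lambda opts: deque(),
}
RECEIVERS = {
    "print": lambda opts: print,
    # Stand-in for HDFS, HBase or Kafka receivers.
    "collect": lambda opts: opts["target"].append,
}


def build(config: dict):
    """Look up each component by its configured type and wire them together."""
    read = COLLECTORS[config["collector"]["type"]](config["collector"])
    channel = CHANNELS[config["channel"]["type"]](config["channel"])
    write = RECEIVERS[config["receiver"]["type"]](config["receiver"])

    def run() -> int:
        for item in read():
            channel.append(item)      # read-in side
        moved = 0
        while channel:
            write(channel.popleft())  # write-out side
            moved += 1
        return moved

    return run


sink = []
run = build({
    "collector": {"type": "list", "items": ["a", "b", "c"]},
    "channel": {"type": "memory"},
    "receiver": {"type": "collect", "target": sink},
})
moved = run()
```

Changing only the `type` values in the configuration selects a different combination, and a new component type is added by registering one more entry rather than by editing `build`.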
Claims (7)
1. A massive data collecting and exchanging system, characterized in that: the system adopts a proxy mode; each proxy of the system comprises a data collector, a transmission channel and a receiver; the proxies are independent of one another and can exchange data from multiple data sources in parallel, separating data read-in from data write-out and making the system architecture more flexible, lightweight and efficient;
the data collector is responsible for collecting data from the data source, converting it into events through processing, and sending it into the transmission channel in the form of events (each comprising two parts: event header information and data), and supports multiple kinds of data receivers;
the transmission channel caches the events sent by the data collector and, to ensure the reliability of data during transfer, an event is deleted from a transmission channel only after it has been cached in the next transmission channel or processed by the receiver;
the receiver extracts the events from the transmission channel and, according to the corresponding configuration, stores the file in a file system or a database, or submits it to a remote server or to a next-level proxy.
2. The system as claimed in claim 1, characterized in that: the data receivers supported by the data collector include file, directory and database.
3. The system as claimed in claim 1, characterized in that: the transmission channels include file and memory.
4. The system as claimed in claim 1, characterized in that: the receivers include HDFS, HBase, Kafka and file.
5. A massive data collecting and exchanging method, characterized in that the method comprises the following steps:
1) writing the configuration file of the proxy according to requirements;
2) starting the proxy according to the written configuration file; once the proxy has started successfully, it begins to transfer data: data is read from the external data source into the proxy through the data receiver, the data read in is encapsulated into events and sent to the transmission channel to be cached until the receiver extracts it, and the receiver extracts these events, parses them back into raw data and stores the data at the final destination; after the proxy has started, data transfer is automatic, and changed data is also collected automatically as the data changes.
6. The method as claimed in claim 5, characterized in that the specific implementation of step 1) is as follows:
100) the type of the data receiver is configured according to the type of the external data source; if the data source is the files under a directory, the receiver type is configured as spooldir, and the location of the data source is also configured;
101) the type of the transmission channel is configured as required, and options such as the capacity of the channel and its transfer capacity are also configured;
102) the type of the receiver depends on where the user ultimately wants the data stored; when HDFS is selected as the receiver, the location in HDFS where files are stored and the file size are configured.
7. The method as claimed in claim 5, characterized in that the data transmission steps of the proxy in step 2) are as follows:
200) the data collector reads data from the external data source at the configured address and first determines whether the data read is new; if it is new, the collector preprocesses the data, formats it in a specific way, adds header information and encapsulates it into an event;
201) the data collector sends the event to one or more transmission channels, where a transmission channel can be regarded as a buffer that holds the event until the receiver extracts and processes it;
202) the receiver extracts the event from the transmission channel, parses it back into raw data, and writes the data to the destination by calling a client interface, or the data serves as the external data source of a next-level proxy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510843249.3A CN105447146A (en) | 2015-11-26 | 2015-11-26 | Massive data collecting and exchanging system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510843249.3A CN105447146A (en) | 2015-11-26 | 2015-11-26 | Massive data collecting and exchanging system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105447146A true CN105447146A (en) | 2016-03-30 |
Family
ID=55557322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510843249.3A Pending CN105447146A (en) | 2015-11-26 | 2015-11-26 | Massive data collecting and exchanging system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105447146A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101483887A (en) * | 2009-02-25 | 2009-07-15 | 南京邮电大学 | Multi-proxy collaboration method applied to wireless multimedia sensor network |
CN102801559A (en) * | 2012-08-03 | 2012-11-28 | 南京富士通南大软件技术有限公司 | Intelligent local area network data collecting method |
US20140297826A1 (en) * | 2013-04-01 | 2014-10-02 | Electronics And Telecommunications Research Institute | System and method for big data aggregation in sensor network |
CN103731298A (en) * | 2013-11-15 | 2014-04-16 | 中国航天科工集团第二研究院七〇六所 | Large-scale distributed network safety data acquisition method and system |
CN105025090A (en) * | 2015-06-24 | 2015-11-04 | 上海斐讯数据通信技术有限公司 | Data transmission customization system and method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202324A (en) * | 2016-06-30 | 2016-12-07 | 北京奇虎科技有限公司 | The data processing method of a kind of real-time calculating platform and device |
CN106202324B (en) * | 2016-06-30 | 2020-10-30 | 北京奇虎科技有限公司 | Data processing method and device for real-time computing platform |
CN106227855A (en) * | 2016-07-28 | 2016-12-14 | 努比亚技术有限公司 | A kind of transacter, system and method |
CN106383758A (en) * | 2016-09-22 | 2017-02-08 | 郑州云海信息技术有限公司 | Operation system-based information acquisition method |
CN108614820A (en) * | 2016-12-09 | 2018-10-02 | 腾讯科技(深圳)有限公司 | The method and apparatus for realizing the parsing of streaming source data |
CN109088782A (en) * | 2018-11-01 | 2018-12-25 | 郑州云海信息技术有限公司 | The log collecting method and device of distributed system |
CN109857448A (en) * | 2018-12-30 | 2019-06-07 | 贝壳技术有限公司 | A kind of multi-data source cut-in method and device |
CN114500315A (en) * | 2021-12-31 | 2022-05-13 | 深圳云天励飞技术股份有限公司 | Equipment state monitoring method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105447146A (en) | Massive data collecting and exchanging system and method | |
US20230169084A1 (en) | Interactive visualization of a relationship of isolated execution environments | |
US11615082B1 (en) | Using a data store and message queue to ingest data for a data intake and query system | |
US11409756B1 (en) | Creating and communicating data analyses using data visualization pipelines | |
CN108847977A (en) | A kind of monitoring method of business datum, storage medium and server | |
CN103299600B (en) | For transmitting the apparatus and method of live media content | |
CN107818120A (en) | Data processing method and device based on big data | |
US11966797B2 (en) | Indexing data at a data intake and query system based on a node capacity threshold | |
CN109753502B (en) | Data acquisition method based on NiFi | |
US11609913B1 (en) | Reassigning data groups from backup to searching for a processing node | |
CN105512201A (en) | Data collection and processing method and device | |
CN111258978B (en) | Data storage method | |
CN104699723A (en) | Data exchange adapter and system and method for synchronizing data among heterogeneous systems | |
CN101964795A (en) | Log collecting system, log collection method and log recycling server | |
CN108121778B (en) | Heterogeneous data exchange and cleaning system and method | |
CN103561033B (en) | User remotely accesses the device and method of HDFS cluster | |
US11573971B1 (en) | Search and data analysis collaboration system | |
CN104584524A (en) | Aggregating data in a mediation system | |
CN105357280B (en) | A kind of file based on HDFS is traced to the source FTP system | |
CN105930502B (en) | System, client and method for collecting data | |
Malik et al. | A framework for collecting youtube meta-data | |
US11892976B2 (en) | Enhanced search performance using data model summaries stored in a remote data store | |
CN105306261A (en) | Method, device and system for collecting logs | |
US11210212B2 (en) | Conflict resolution and garbage collection in distributed databases | |
CN106919574B (en) | Method for processing remote synchronous file in real time |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160330 |