CN104378234A - Cross-data-center data transmission processing method and system - Google Patents



Publication number
CN104378234A
Authority
CN
China
Prior art keywords
data
daily record
record data
mark
dictionary library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410662799.0A
Other languages
Chinese (zh)
Other versions
CN104378234B (en
Inventor
秦刚
唐玉芳
柳杨
江舟
孔祥鹏
张红意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shujie Technology Co ltd
Original Assignee
BEIJING SHUXUN TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING SHUXUN TECHNOLOGY Co Ltd
Priority to CN201410662799.0A
Publication of CN104378234A
Application granted
Publication of CN104378234B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-data-center data transmission processing method and system. In the method, a data generating layer generates log data and transmits the log data to a data transferring layer; the data transferring layer looks up the identification of the log data in a dictionary database obtained from a data processing layer and replaces the log data with the identification found, the dictionary database storing the unique correspondence between log data and identifications; the data transferring layer then transmits the identification-replaced log data to the data processing layer. Further, if the data transferring layer does not find the identification of the log data in the dictionary database, the original log data is kept and transmitted to the data processing layer; the data processing layer adds the log data that was not replaced by an identification to the dictionary database, assigns an identification to that log data, stores the unique correspondence between the log data and the identification, and synchronizes the updated dictionary database to the data transferring layer. With this method and system, the volume of log data transmitted can be reduced, and transmission bandwidth and transmission time can be saved.

Description

Cross-data-center data transmission processing method and system
Technical field
The present invention relates to the technical field of computers and communications, and in particular to a cross-data-center data transmission processing method and system.
Background
Web 2.0 is an environment built on knowledge: content produced through interaction between people is published, managed and used in this environment via programs in a service-oriented architecture (SOA). Compared with Web 1.0, it places more emphasis on interaction between users and on user experience. Users are the creators of content and at the same time its consumers. Representative Web 2.0 services currently include e-commerce sites, information sites, social networking sites (SNS, such as Renren), microblogs, WeChat, dealing, health, drip and open, etc. Web 2.0 emphasizes user interaction: after a microblog post is published and then repeatedly forwarded and commented on, it may produce log data on the order of gigabytes, which undoubtedly poses a new challenge for log transmission.
Fig. 1 is a diagram of a log transmission architecture in the prior art. As shown in Fig. 1, data is transferred from the generating end to the data analysis layer as follows:
The data generating layer produces user access logs, compresses the original logs with gzip, and transfers them to the data relay layer over some transport protocol (such as FTP or HTTP). The data generating layer may consist of web servers, and the data relay layer may consist of data relay servers;
For example: 1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET http://www.XXXXX.com/images/xxxxx.gif - NONE/- image/gif "http://www.XXXXX.com/aaaa/440_176147XXX.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
This is one complete access log entry produced by a web terminal. Every 5 minutes, the data produced by the web terminal is packed into a .gz package (devicename_20140822.tar.gz), which is then uploaded to the data relay server over some transport protocol (such as FTP or HTTP).
After the data relay server receives the gz packages produced every 5 minutes, it aggregates these files (for example, multiple files from the same device are merged before upload: log files with the same devicename (device name) are merged into one gz file) and uploads them to the data analysis layer (made up of distributed storage or computing cluster nodes) for statistical analysis.
However, in the prior art the volume of logs produced by the web servers of the data generating layer is enormous, which makes the bandwidth cost of transmission very high. Moreover, a large log volume means a long transfer time, so log collection is far from timely.
Summary of the invention
An embodiment of the present invention provides a cross-data-center data transmission processing method for reducing the volume of log data transmitted and saving transmission bandwidth and transmission time. The method comprises:
A data generating layer produces log data and transmits the log data to a data relay layer;
After receiving the log data, the data relay layer looks up the identifier of the log data in a dictionary obtained from a data analysis layer and replaces the log data with the identifier found, the dictionary storing the unique correspondence between log data and identifiers;
The data relay layer transmits the identifier-replaced log data to the data analysis layer.
In an embodiment, if the data relay layer does not find the identifier of the log data in the dictionary, it keeps the original log data and transmits it to the data analysis layer;
The data analysis layer adds the log data that was not replaced by an identifier to the dictionary, assigns an identifier to that log data, and stores the unique correspondence between that log data and the identifier;
The data analysis layer synchronizes the updated dictionary to the data relay layer.
In an embodiment, the data analysis layer assigns an identifier to log data and stores the unique correspondence between the log data and the identifier in the dictionary as follows:
Compute a hash value of the log data and compare the hash value with the data already in the dictionary;
If the hash value does not exist in the dictionary, add the log data to the dictionary;
Take the maximum existing identifier in the dictionary plus 1 and store it as the identifier of the log data.
In an embodiment, the data analysis layer builds multiple dictionaries that store the unique correspondences between different log data and identifiers;
When the data relay layer performs identifier replacement on log data containing long string information, it looks up the identifiers of the different pieces of log data within the long string information in the multiple dictionaries and replaces the long string information in the log data with the identifiers spliced together.
In an embodiment, the dictionaries built by the data analysis layer include a combined-field dictionary, which stores the unique correspondence between a combination of log data fields and the identifier of that combination.
An embodiment of the present invention also provides a cross-data-center data transmission processing system for reducing the volume of log data transmitted and saving transmission bandwidth and transmission time. The system comprises:
A data generating layer device, a data relay layer device and a data analysis layer device, wherein:
The data generating layer device is configured to produce log data and transmit the log data to the data relay layer device;
The data relay layer device is configured to, after receiving the log data, look up the identifier of the log data in a dictionary obtained from the data analysis layer device and replace the log data with the identifier found, the dictionary storing the unique correspondence between log data and identifiers; and to transmit the identifier-replaced log data to the data analysis layer device;
The data analysis layer device is configured to provide the dictionary and receive the identifier-replaced log data.
In an embodiment, the data relay layer device is further configured to keep the original log data and transmit it to the data analysis layer device when the identifier of the log data is not found in the dictionary;
The data analysis layer device is further configured to add the log data that was not replaced by an identifier to the dictionary, assign an identifier to that log data, store the unique correspondence between that log data and the identifier, and synchronize the updated dictionary to the data relay layer device.
In an embodiment, the data analysis layer device is specifically configured to assign an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary as follows:
Compute a hash value of the log data and compare the hash value with the data already in the dictionary;
If the hash value does not exist in the dictionary, add the log data to the dictionary;
Take the maximum existing identifier in the dictionary plus 1 and store it as the identifier of the log data.
In an embodiment, the data analysis layer device is specifically configured to build multiple dictionaries that store the unique correspondences between different log data and identifiers;
The data relay layer device is specifically configured to, when performing identifier replacement on log data containing long string information, look up the identifiers of the different pieces of log data within the long string information in the multiple dictionaries and replace the long string information in the log data with the identifiers spliced together.
In an embodiment, the data analysis layer device is specifically configured to build a combined-field dictionary, which stores the unique correspondence between a combination of log data fields and the identifier of that combination.
In the embodiments of the present invention, the data generating layer produces log data and transmits it to the data relay layer; after receiving the log data, the data relay layer looks up the identifier of the log data in the dictionary obtained from the data analysis layer and replaces the log data with the identifier found, the dictionary storing the unique correspondence between log data and identifiers; the data relay layer then transmits the identifier-replaced log data to the data analysis layer. Replacing log data with identifiers in this way reduces the volume of log data transmitted and saves transmission bandwidth and transmission time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 is a diagram of a log transmission architecture in the prior art;
Fig. 2 is a flow chart of the cross-data-center data transmission processing method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the dictionary synchronization loop in an embodiment of the present invention;
Fig. 4 is a flow chart of log data identifier replacement and transmission in an embodiment of the present invention;
Fig. 5 is a diagram of a specific example of the cross-data-center data transmission processing method in an embodiment of the present invention;
Fig. 6 is a schematic diagram of a dictionary generation sample in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the cross-data-center data transmission processing system in an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The illustrative embodiments and their descriptions are used to explain the present invention and are not intended to limit it.
To reduce the volume of log data transmitted and to save transmission bandwidth and transmission time, an embodiment of the present invention provides a cross-data-center data transmission processing method. Fig. 2 is a flow chart of this method. As shown in Fig. 2, the method may comprise:
Step 201: the data generating layer produces log data and transmits the log data to the data relay layer;
Step 202: after receiving the log data, the data relay layer looks up the identifier of the log data in the dictionary obtained from the data analysis layer and replaces the log data with the identifier found, the dictionary storing the unique correspondence between log data and identifiers;
Step 203: the data relay layer transmits the identifier-replaced log data to the data analysis layer.
As the flow in Fig. 2 shows, in the embodiments of the present invention the data relay layer does not transmit the log data directly to the data analysis layer as in the prior art; instead it first replaces the log data with identifiers. The identifier-replaced log data is considerably smaller than the original log data, so transmission bandwidth and transmission time are saved.
In a specific implementation, if the data relay layer does not find the identifier of the log data in the dictionary, it keeps the original log data and transmits it to the data analysis layer; the data analysis layer adds the log data that was not replaced by an identifier to the dictionary, assigns an identifier to that log data, and stores the unique correspondence between that log data and the identifier; the data analysis layer then synchronizes the updated dictionary to the data relay layer.
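The lookup-and-fallback behavior at the data relay layer can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `replace_with_ids`, the `("id", ...)` / `("raw", ...)` tagging, and the example URLs are hypothetical and do not come from the patent, which does not specify how replaced and unreplaced fields are distinguished on the wire.

```python
from typing import Dict, List, Tuple, Union


def replace_with_ids(
    fields: List[str], dictionary: Dict[str, int]
) -> List[Tuple[str, Union[int, str]]]:
    """Replace each field with its ID if the dictionary knows it; otherwise keep it raw."""
    encoded: List[Tuple[str, Union[int, str]]] = []
    for field in fields:
        if field in dictionary:
            # Identifier found: transmit the short ID instead of the field.
            encoded.append(("id", dictionary[field]))
        else:
            # Identifier not found: keep the original log data unchanged.
            encoded.append(("raw", field))
    return encoded


# Hypothetical dictionary previously obtained from the data analysis layer.
dictionary = {"http://www.example.com/images/a.gif": 1}
record = ["http://www.example.com/images/a.gif", "http://www.example.com/new.htm"]
encoded = replace_with_ids(record, dictionary)
```

The raw entries are exactly what the data analysis layer would later add to the dictionary before synchronizing it back.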
The data generating layer may include data generating layer devices such as web servers and terminal servers; the data relay layer may include data relay layer devices such as data relay servers; the data analysis layer may include data analysis layer devices such as distributed cluster nodes.
This is illustrated further below. In the embodiment, the dictionary is synchronized in a distributed manner, across nodes and across data centers, between the data relay layer and the data analysis layer. Fig. 3 is a schematic diagram of the dictionary synchronization loop in an embodiment of the present invention. As shown in Fig. 3, multiple dictionaries are generated at the data analysis layer. The data relay layer obtains the dictionaries generated by the data analysis layer either through task-based distribution or by fetching them periodically. The data relay layer then performs identifier replacement on the log data according to the dictionaries; if no identifier is found for a piece of log data, the original log data is transmitted as before.
For example, Fig. 4 is a flow chart of log data identifier replacement and transmission in an embodiment of the present invention. As shown in Fig. 4, the data relay server periodically obtains the latest DICT (dictionary) from the data analysis layer and performs identifier replacement on the log data (log entries) according to the dictionary obtained: if a corresponding identifier is found in the dictionary, the data is replaced; if there is no corresponding identifier in the dictionary, the original log data is kept and transmitted.
The overall data processing flow is illustrated next. Fig. 5 is a diagram of a specific example of the cross-data-center data transmission processing method in an embodiment of the present invention. As shown in Fig. 5, in this example the original log data is produced by multiple terminal servers. After a terminal server produces log data, it transfers the data over some protocol (such as FTP or HTTP) to the data relay layer, which prepares it for transmission to the data analysis layer. After receiving the log data, the data relay layer performs identifier matching and replacement on it and compresses it (for the first transmission there is no dictionary yet, so the original log data is only compressed and no identifier replacement is performed). After receiving the log data, the data analysis layer first stores it and then uses it to update the dictionary: data that was transmitted without being replaced or recognized is added to the dictionary. Once the dictionary has been supplemented, the new entries are synchronized: the incremental part of the dictionary is synchronized to the data relay layer, which uses it when transmitting subsequent log data.
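The loop described above (first transmission with no dictionary, dictionary update at the analysis layer, incremental synchronization back to the relay layer) can be sketched in a few lines. This is a toy, in-process illustration under stated assumptions: the class names `AnalysisLayer` and `RelayLayer`, the use of `int` for IDs and `str` for raw fields, and the sample field values are all hypothetical; real layers would communicate over a network and would also compress the data.

```python
from typing import Dict, List, Union

Token = Union[int, str]  # int = dictionary ID, str = raw field (hypothetical encoding)


class AnalysisLayer:
    def __init__(self) -> None:
        self.dictionary: Dict[str, int] = {}

    def receive(self, tokens: List[Token]) -> Dict[str, int]:
        """Add unseen raw fields to the dictionary and return only the new entries."""
        increment: Dict[str, int] = {}
        for token in tokens:
            if isinstance(token, str) and token not in self.dictionary:
                new_id = max(self.dictionary.values(), default=0) + 1
                self.dictionary[token] = new_id
                increment[token] = new_id
        return increment


class RelayLayer:
    def __init__(self) -> None:
        self.dictionary: Dict[str, int] = {}  # local copy, empty before the first sync

    def encode(self, fields: List[str]) -> List[Token]:
        """Replace known fields with IDs; unknown fields pass through unchanged."""
        return [self.dictionary.get(field, field) for field in fields]

    def sync(self, increment: Dict[str, int]) -> None:
        """Apply the incremental dictionary update received from the analysis layer."""
        self.dictionary.update(increment)


analysis, relay = AnalysisLayer(), RelayLayer()
fields = ["/images/a.gif", "/aaaa/1.htm"]

first = relay.encode(fields)         # no dictionary yet: everything stays raw
relay.sync(analysis.receive(first))  # analysis layer assigns IDs, relay gets the increment
second = relay.encode(fields)        # the same fields are now replaced by IDs
```

Only the increment is sent back in this sketch, mirroring the statement that the incremental part of the dictionary is synchronized to the data relay layer.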
Once the dictionary update flow is running, updates are frequent at first. After log data has accumulated to a certain extent, however, the dictionary reaches a very high recognition rate, the volume of dictionary updates keeps shrinking, and the compression ratio achieved by replacement during log transmission increases greatly, which reduces bandwidth and transmission cost and improves the timeliness of transmission.
As the above embodiment shows, log data replacement and transmission together form a closed loop. The loop effectively raises the proportion of log data replaced at the data relay layer, so the volume of log data transmitted drops quickly. Even data that is not yet in the dictionary is quickly covered by the dictionary through the loop, so that it is replaced in subsequent transmissions, reducing the log transmission volume and saving bandwidth cost and transmission time.
Take a daily collection of 100G of logs (gz-compressed) as an example. The log volume to be transferred from the data relay layer to the data analysis layer is 100G. With bandwidth held constant, the existing transmission method transmits 100G and takes 100s. With dictionary replacement: (a) assuming all logs can be replaced through the dictionary, the volume transmitted is 52G and the time taken is 52s, so the time is shortened by 48s and the storage space is reduced by 48%; (b) if the dictionary is incomplete, the volume transmitted in the first transmission is larger than with complete replacement, and the excess proportion is determined by the dictionary, generally 80% to 90%. After each dictionary update, however, a replacement rate of 90% can be reached; the volume transmitted is then 62G and the transmission time is 62s, which is likewise a large saving in both transmission time and storage space.
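The arithmetic behind these figures can be checked with a short calculation. This is a sketch that assumes, as the example implies, a constant rate of 1 second per G (100G in 100s); the function name `savings` and its parameters are hypothetical.

```python
def savings(original_g: float, transmitted_g: float, seconds_per_g: float = 1.0):
    """Return (seconds saved, fraction of volume saved) at constant bandwidth."""
    saved_g = original_g - transmitted_g
    return saved_g * seconds_per_g, saved_g / original_g


full = savings(100, 52)     # case (a): every log entry replaced through the dictionary
partial = savings(100, 62)  # case (b): about 90% replacement rate after updates
```

Under these assumptions, case (a) saves 48 seconds and 48% of the volume, and case (b) saves 38 seconds and 38% of the volume.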
In a specific implementation, the data analysis layer may assign an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary as follows:
Compute a hash value of the log data and compare the hash value with the data already in the dictionary;
If the hash value does not exist in the dictionary, add the log data to the dictionary;
Take the maximum existing identifier in the dictionary plus 1 and store it as the identifier of the log data.
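The three steps above can be sketched as follows. The patent does not name a hash function, so SHA-256 from Python's standard `hashlib` module is an assumption here, as are the class name `IdDictionary` and its attribute names. Hash collisions are ignored in this sketch.

```python
import hashlib
from typing import Dict


class IdDictionary:
    """Toy dictionary: hash of the log data -> ID, and ID -> original text."""

    def __init__(self) -> None:
        self.id_by_hash: Dict[str, int] = {}
        self.text_by_id: Dict[int, str] = {}

    def get_or_assign(self, text: str) -> int:
        # Step 1: hash the log data and compare with what is already stored.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self.id_by_hash:
            return self.id_by_hash[digest]
        # Steps 2 and 3: the hash is new, so add the data with ID = max existing ID + 1.
        new_id = max(self.text_by_id, default=0) + 1
        self.id_by_hash[digest] = new_id
        self.text_by_id[new_id] = text
        return new_id


urls = IdDictionary()
first_id = urls.get_or_assign("http://www.example.com/images/a.gif")
second_id = urls.get_or_assign("http://www.example.com/aaaa/1.htm")
repeat_id = urls.get_or_assign("http://www.example.com/images/a.gif")
```

Comparing fixed-length hashes instead of full strings keeps the lookup cheap for long values such as URLs; starting IDs at 1 is a choice made for this sketch only.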
In a specific implementation, the data analysis layer may build multiple dictionaries that store the unique correspondences between different log data and identifiers; when the data relay layer performs identifier replacement on log data containing long string information, it may look up the identifiers of the different pieces of log data within the long string information in the multiple dictionaries and replace the long string information in the log data with the identifiers spliced together.
An example of how unique IDs are generated follows. Fig. 6 is a schematic diagram of a dictionary generation sample in an embodiment of the present invention. Referring to Fig. 6, the ID generation (that is, dictionary generation) for information such as the URL, Referrer and UserAgent in log data is, for example, as follows:
1) Building dictionaries for URL and Referrer:
At the data analysis layer, RequestUrl and Referrer in the log data are each hashed and then compared with the data in the dictionary. If a value does not exist in the dictionary, the new RequestUrl entry is added to the dictionary and stored with a unique ID obtained by taking the maximum ID in the existing dictionary and adding 1: MAX(RequestUrlId)+1.
For example: 1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET http://www.XXXXX.com/images/xxxxx.gif - NONE/- image/gif "http://www.XXXXX.com/aaaa/440_176147XXX.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
After dictionary replacement of the request URL and Referrer, the entry becomes:
1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET RequestUrlID - NONE/- image/gif ReferrerID "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
Here RequestUrlID and ReferrerID are the unique identifiers that represent the corresponding log data in the dictionary.
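The URL and Referrer replacement in item 1) can be sketched on a sample entry as follows. Several things here are assumptions made for illustration only: the field positions (index 6 for the request URL, index 10 for the Referrer) are read off the sample entry rather than taken from any log format specification, the `U`/`R` prefixes and the ID values are hypothetical, and `shlex.split` from the Python standard library is used simply because it keeps quoted fields together.

```python
import shlex
from typing import Dict, List

URL_FIELD, REFERRER_FIELD = 6, 10  # assumed positions in this sample layout


def replace_url_fields(
    line: str, url_dict: Dict[str, int], referrer_dict: Dict[str, int]
) -> List[str]:
    """Split a log entry into fields and replace URL/Referrer with dictionary IDs."""
    fields = shlex.split(line)  # whitespace-separated, quoted fields stay whole
    for index, table, prefix in (
        (URL_FIELD, url_dict, "U"),
        (REFERRER_FIELD, referrer_dict, "R"),
    ):
        ident = table.get(fields[index])
        if ident is not None:  # unknown values are left as original log data
            fields[index] = f"{prefix}{ident}"
    return fields


line = (
    "1386562882.666 14 10.0.0.1 TCP_MEM_HIT/200 440 GET "
    "http://www.example.com/images/a.gif - NONE/- image/gif "
    '"http://www.example.com/aaaa/1.htm" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64)"'
)
out = replace_url_fields(
    line,
    {"http://www.example.com/images/a.gif": 7},
    {"http://www.example.com/aaaa/1.htm": 3},
)
```

The function returns the list of fields rather than a re-joined string so that the sketch does not have to decide how replaced entries are serialized.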
2) For long string information in log data, such as UserAgent and Cookies, multiple dictionaries can be built so that one piece of information is replaced by a splice of identifiers, as in the following example:
For the log entry above:
Four dictionaries are built for the UserAgent information:
Mozilla/5.0 (Windows NT 6.1; WOW64) corresponds to dictionary 1 (DICT1);
AppleWebKit/537.1 corresponds to dictionary 2 (DICT2);
(KHTML, like Gecko) corresponds to dictionary 3 (DICT3);
Chrome/21.0.1180.89 Safari/537.1 corresponds to dictionary 4 (DICT4);
The above log entry can then be replaced with:
1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET RequestUrlID - NONE/- image/gif ReferrerID "dict1ID+dict2ID+dict3ID+dict4ID" -
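The splicing in item 2) can be sketched as follows. The patent does not say how the UserAgent string is cut into four segments, so the regular expression below is an assumption that happens to match the sample string; the function name `splice_user_agent` and the ID values are likewise hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumed segmentation: "product (platform)", engine, "(engine detail)", remainder.
UA_PATTERN = re.compile(r"^(\S+ \([^)]*\)) (\S+) (\([^)]*\)) (.+)$")


def splice_user_agent(
    user_agent: str, dictionaries: List[Dict[str, int]]
) -> Optional[str]:
    """Return 'id1+id2+id3+id4', or None if any segment is missing (keep the raw string)."""
    match = UA_PATTERN.match(user_agent)
    if match is None:
        return None
    ids = []
    for segment, table in zip(match.groups(), dictionaries):
        if segment not in table:
            return None
        ids.append(str(table[segment]))
    return "+".join(ids)


dicts = [
    {"Mozilla/5.0 (Windows NT 6.1; WOW64)": 1},  # DICT1
    {"AppleWebKit/537.1": 4},                    # DICT2
    {"(KHTML, like Gecko)": 2},                  # DICT3
    {"Chrome/21.0.1180.89 Safari/537.1": 9},     # DICT4
]
ua = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
)
spliced = splice_user_agent(ua, dicts)
```

Keeping one dictionary per segment means that a new browser version only adds one small entry to DICT4 instead of a whole new UserAgent string.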
In a specific implementation, the dictionaries built by the data analysis layer may include a combined-field dictionary, which stores the unique correspondence between a combination of log data fields and the identifier of that combination.
An example of combined-field dictionary compression for transmission follows, in which a dictionary is built over a combination of fields and used to replace log data content:
For example: 1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET http://www.XXXXX.com/images/xxxxx.gif - NONE/- image/gif "http://www.XXXXX.com/aaaa/440_176147XXX.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
A combined dictionary is built over the IP (XXX.XXX.XXX.XXX) and the UserAgent ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1") in this entry, and identifier replacement is then performed at the data relay layer.
Field-combination replacement is closely tied to the business. By analyzing the dimensions and metrics of CDN log data and building dictionaries for commonly used combinations, considerable storage overhead and computing resources can be saved in the later statistics and calculation stages.
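A combined-field dictionary can be sketched as a mapping keyed by a tuple of field values. The record layout (a Python `dict` with `ip`, `user_agent` and `status` keys), the function name `replace_combination`, and the naming of the replacement key are all assumptions made for this illustration.

```python
from typing import Dict, Tuple


def replace_combination(
    record: Dict[str, str],
    combo_fields: Tuple[str, ...],
    combo_dict: Dict[Tuple[str, ...], int],
) -> Dict[str, object]:
    """Replace a known combination of fields by a single ID; otherwise return a copy."""
    key = tuple(record[name] for name in combo_fields)
    ident = combo_dict.get(key)
    if ident is None:
        return dict(record)  # combination unknown: keep the original fields
    replaced: Dict[str, object] = {
        name: value for name, value in record.items() if name not in combo_fields
    }
    replaced["+".join(combo_fields) + "_id"] = ident
    return replaced


record = {"ip": "10.0.0.1", "user_agent": "Mozilla/5.0", "status": "200"}
combo_dict = {("10.0.0.1", "Mozilla/5.0"): 12}
compact = replace_combination(record, ("ip", "user_agent"), combo_dict)
```

Because the combination is looked up as a whole, one ID stands in for both fields, which is where the additional saving over per-field replacement would come from when the combination recurs often.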
As the above embodiments show, in the embodiments of the present invention the distributed, cross-node synchronization of the dictionary between the data relay layer and the data analysis layer allows the log data at the data relay layer to be compressed effectively. As time passes, the dictionary data grows richer and most fields can be replaced by identifiers. More and more of the transmitted log is then replaced by unique IDs from the dictionary, the log volume becomes smaller and smaller, and transmission bandwidth and transmission time are saved.
How unique IDs in the dictionary are generated: a dictionary entry must first have a unique ID and the plaintext data that the entry stands for; it may also store a hash of the original text (used for data comparison, since comparing hashes is more efficient). Incoming log data is compared against the dictionary (directly or by hash); if it does not exist, a new record is added, and the maximum unique ID is incremented to give the unique ID of the new record. Depending on the statistical and analytical needs of the distributed cluster, dictionaries can also be created over combinations of multiple fields. Across the whole log transmission, replacement with unique IDs and the supplementing of dictionary entries form a loop that keeps the dictionary updated and synchronized.
Based on the same inventive concept, an embodiment of the present invention also provides a cross-data-center data transmission processing system, as described in the following embodiments. Because the principle by which the system solves the problem is similar to that of the cross-data-center data transmission processing method, the implementation of the system can refer to the implementation of the method, and repeated details are omitted.
Fig. 7 is a schematic diagram of the cross-data-center data transmission processing system in an embodiment of the present invention. As shown in Fig. 7, the system comprises:
A data generating layer device 701, a data relay layer device 702 and a data analysis layer device 703, wherein:
The data generating layer device 701 is configured to produce log data and transmit the log data to the data relay layer device 702;
The data relay layer device 702 is configured to, after receiving the log data, look up the identifier of the log data in a dictionary obtained from the data analysis layer device 703 and replace the log data with the identifier found, the dictionary storing the unique correspondence between log data and identifiers; and to transmit the identifier-replaced log data to the data analysis layer device 703;
The data analysis layer device 703 is configured to provide the dictionary and receive the identifier-replaced log data.
In a specific implementation, the data generating layer device may include web servers, terminal servers and the like; the data relay layer device may include data relay servers and the like; the data analysis layer device may include distributed cluster nodes and the like.
In a specific implementation, the data relay layer device may further be configured to keep the original log data and transmit it to the data analysis layer device when the identifier of the log data is not found in the dictionary;
The data analysis layer device may further be configured to add the log data that was not replaced by an identifier to the dictionary, assign an identifier to that log data, store the unique correspondence between that log data and the identifier, and synchronize the updated dictionary to the data relay layer device.
In a specific implementation, the data analysis layer device may specifically be configured to assign an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary as follows:
Compute a hash value of the log data and compare the hash value with the data already in the dictionary;
If the hash value does not exist in the dictionary, add the log data to the dictionary;
Take the maximum existing identifier in the dictionary plus 1 and store it as the identifier of the log data.
In a specific implementation, the data analysis layer device may specifically be configured to build multiple dictionaries that store the unique correspondences between different log data and identifiers;
The data relay layer device may specifically be configured to, when performing identifier replacement on log data containing long string information, look up the identifiers of the different pieces of log data within the long string information in the multiple dictionaries and replace the long string information in the log data with the identifiers spliced together.
In a specific implementation, the data analysis layer device may specifically be configured to build a combined-field dictionary, which stores the unique correspondence between a combination of log data fields and the identifier of that combination.
In summary, in the embodiments of the present invention, the data generating layer produces log data and transmits it to the data relay layer; after receiving the log data, the data relay layer looks up the identifier of the log data in the dictionary obtained from the data analysis layer and replaces the log data with the identifier found, the dictionary storing the unique correspondence between log data and identifiers; the data relay layer then transmits the identifier-replaced log data to the data analysis layer. Replacing log data with identifiers in this way reduces the volume of log data transmitted and saves transmission bandwidth and transmission time.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A cross-data-center data transmission processing method, characterized by comprising:
A data generating layer produces log data and transmits the log data to a data relay layer;
After receiving the log data, the data relay layer looks up the identifier of the log data in a dictionary library obtained from a data analysis layer and replaces the log data with the identifier found, the dictionary library storing the unique correspondence between log data and identifiers;
The data relay layer transmits the identifier-replaced log data to the data analysis layer.
2. The method of claim 1, characterized in that, if the data relay layer does not find the identifier of the log data in the dictionary library, the original log data is retained and transmitted to the data analysis layer;
The data analysis layer adds the log data that was not replaced by an identifier to the dictionary library, allocates an identifier to this log data, and stores the unique correspondence between this log data and the identifier;
The data analysis layer synchronizes the updated dictionary library to the data relay layer.
3. The method of claim 1 or 2, characterized in that the data analysis layer allocates an identifier to log data and stores the unique correspondence between the log data and the identifier in the dictionary library as follows:
A hash value is computed for the log data and compared with the data already in the dictionary library;
If the hash value is not present in the dictionary library, the log data is added to the dictionary library;
The largest identifier already in the dictionary library is incremented by 1, and the result is stored as the identifier of this log data.
4. The method of claim 1 or 2, characterized in that the data analysis layer establishes multiple dictionary libraries, each storing the unique correspondence between different log data and identifiers;
When performing identifier replacement on log data that contains long character string information, the data relay layer looks up in the multiple dictionary libraries the identifiers of the different log data within the long character string, and replaces the long character string in the log data with the concatenated identifiers.
5. The method of claim 1 or 2, characterized in that the dictionary libraries established by the data analysis layer include a combined-field dictionary library, which stores the unique correspondence between combinations of log data fields and combinations of identifiers.
6. A cross-data-center data transmission processing system, characterized by comprising:
A data generating layer device, a data relay layer device, and a data analysis layer device, wherein:
The data generating layer device is configured to produce log data and transmit the log data to the data relay layer device;
The data relay layer device is configured, after receiving the log data, to look up the identifier of the log data in a dictionary library obtained from the data analysis layer device and to replace the log data with the identifier found, the dictionary library storing the unique correspondence between log data and identifiers, and to transmit the identifier-replaced log data to the data analysis layer device;
The data analysis layer device is configured to provide the dictionary library and to receive the identifier-replaced log data.
7. The system of claim 6, characterized in that the data relay layer device is further configured, when the identifier of the log data is not found in the dictionary library, to retain the original log data and transmit it to the data analysis layer device;
The data analysis layer device is further configured to add the log data that was not replaced by an identifier to the dictionary library, allocate an identifier to this log data, store the unique correspondence between this log data and the identifier, and synchronize the updated dictionary library to the data relay layer device.
8. The system of claim 6 or 7, characterized in that the data analysis layer device is specifically configured to allocate an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary library as follows:
A hash value is computed for the log data and compared with the data already in the dictionary library;
If the hash value is not present in the dictionary library, the log data is added to the dictionary library;
The largest identifier already in the dictionary library is incremented by 1, and the result is stored as the identifier of this log data.
9. The system of claim 6 or 7, characterized in that the data analysis layer device is specifically configured to establish multiple dictionary libraries, each storing the unique correspondence between different log data and identifiers;
The data relay layer device is specifically configured, when performing identifier replacement on log data that contains long character string information, to look up in the multiple dictionary libraries the identifiers of the different log data within the long character string, and to replace the long character string in the log data with the concatenated identifiers.
10. The system of claim 6 or 7, characterized in that the data analysis layer is specifically configured to establish a combined-field dictionary library, which stores the unique correspondence between combinations of log data fields and combinations of identifiers.
CN201410662799.0A 2014-11-19 2014-11-19 Cross-data-center data transmission processing method and system Expired - Fee Related CN104378234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410662799.0A CN104378234B (en) 2014-11-19 2014-11-19 Cross-data-center data transmission processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410662799.0A CN104378234B (en) 2014-11-19 2014-11-19 Cross-data-center data transmission processing method and system

Publications (2)

Publication Number Publication Date
CN104378234A true CN104378234A (en) 2015-02-25
CN104378234B CN104378234B (en) 2018-09-07

Family

ID=52556912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410662799.0A Expired - Fee Related CN104378234B (en) 2014-11-19 2014-11-19 Cross-data-center data transmission processing method and system

Country Status (1)

Country Link
CN (1) CN104378234B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101500208A (en) * 2008-01-31 2009-08-05 三星电子株式会社 Data synchronization method and system between devices
CN102611611A (en) * 2011-12-13 2012-07-25 北京安天电子设备有限公司 Log caching system and method
CN103092742A (en) * 2011-10-31 2013-05-08 国际商业机器公司 Optimization method and system of program logging
CN103401937A (en) * 2013-08-07 2013-11-20 中国科学院信息工程研究所 Log data processing method and system
CN103412924A (en) * 2013-08-12 2013-11-27 东软集团股份有限公司 Log multi-language query method and system
CN103532754A (en) * 2013-10-12 2014-01-22 北京首信科技股份有限公司 System and method for high-speed memory and distributed type processing of massive logs
CN103823811A (en) * 2012-11-19 2014-05-28 北京百度网讯科技有限公司 Method and system for processing journals

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484565A (en) * 2016-09-22 2017-03-08 华为数字技术(成都)有限公司 Method of data synchronization between multiple data centers and relevant device
CN106484565B (en) * 2016-09-22 2019-06-28 华为数字技术(成都)有限公司 Method of data synchronization and relevant device between multiple data centers
CN107241394A (en) * 2017-05-24 2017-10-10 努比亚技术有限公司 A kind of log transmission method, device and computer-readable recording medium
CN107273485A (en) * 2017-06-13 2017-10-20 苏州弘铭检测科技有限公司 A kind of data store organisation and database remapping method based on configurable data storehouse
CN110309176A (en) * 2018-03-15 2019-10-08 腾讯科技(深圳)有限公司 A kind of data processing method and data terminal
CN110309176B (en) * 2018-03-15 2024-04-05 腾讯科技(深圳)有限公司 Data processing method and data transfer station
CN109408534A (en) * 2018-11-02 2019-03-01 上海新炬网络信息技术股份有限公司 Method based on character string uniqueness and repeatability displacement output
CN109362079A (en) * 2018-11-05 2019-02-19 北京小米移动软件有限公司 Data processing method and device
CN109743188A (en) * 2018-11-23 2019-05-10 麒麟合盛网络技术股份有限公司 Daily record data treating method and apparatus
CN112905249A (en) * 2021-01-29 2021-06-04 加和(北京)信息科技有限公司 Method for determining device identifier
CN114666406A (en) * 2022-02-24 2022-06-24 国电南瑞科技股份有限公司 Object model-based power internet of things data compression method and device
CN114666406B (en) * 2022-02-24 2023-11-21 国电南瑞科技股份有限公司 Electric power Internet of things data compression method and device based on object model
CN115732036A (en) * 2022-12-06 2023-03-03 云舟生物科技(广州)股份有限公司 Method for adjusting transcript base stock, computer storage medium and electronic equipment
CN115732036B (en) * 2022-12-06 2023-11-28 云舟生物科技(广州)股份有限公司 Method for adjusting transcript base stock, computer storage medium and electronic device

Also Published As

Publication number Publication date
CN104378234B (en) 2018-09-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20161028

Address after: 100088, No. 407, block A, 28 Xinjie street, Xinjie street, Beijing, Xicheng District

Applicant after: Beijing Digital Technology Co.,Ltd.

Address before: 100088, A, No. 406, Putian Desheng Road, 28 Xinjie street, Xicheng District, Beijing

Applicant before: BEIJING SHUXUN TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20180731

Address after: 101111 No. 408 block A, 28 new street, new street, Xicheng District, Beijing.

Applicant after: BEIJING SHUJIE TECHNOLOGY CO.,LTD.

Address before: 100088 No. 407 block A, 28 new street, new street, Xicheng District, Beijing.

Applicant before: Beijing Digital Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180907

Termination date: 20211119