Summary of the Invention
An embodiment of the present invention provides a cross-data-center data transmission processing method for reducing the volume of log data transmitted, saving transmission bandwidth, and shortening transmission time. The method comprises:
A data generating layer produces log data and transmits the log data to a data relay layer;
After receiving the log data, the data relay layer searches a dictionary library obtained from a data analysis layer for the identifier of the log data and replaces the log data with the identifier found, wherein the dictionary library stores a unique correspondence between log data and identifiers;
The data relay layer transmits the identifier-replaced log data to the data analysis layer.
In an embodiment, if the data relay layer does not find the identifier of the log data in the dictionary library, it retains the original log data and transmits it to the data analysis layer;
The data analysis layer adds the log data that was not replaced by an identifier to the dictionary library, allocates an identifier to that log data, and stores the unique correspondence between that log data and the identifier;
The data analysis layer synchronizes the updated dictionary library to the data relay layer.
In an embodiment, the data analysis layer allocates an identifier to log data and stores the unique correspondence between the log data and the identifier in the dictionary library as follows:
A hash value of the log data is computed and compared with the existing data in the dictionary library;
If the hash value does not exist in the dictionary library, the log data is added to the dictionary library;
The maximum of the existing identifiers in the dictionary library plus 1 is stored as the identifier of the log data.
In an embodiment, the data analysis layer sets up multiple dictionary libraries, each storing a unique correspondence between different log data and identifiers;
When the data relay layer performs identifier replacement on log data containing long character string information, it searches the multiple dictionary libraries for the identifiers of the different pieces of log data within the long character string information, and replaces the long character string information in the log data with the identifiers spliced together.
In an embodiment, the dictionary libraries set up by the data analysis layer include a combined-field dictionary library, which stores a unique correspondence between combinations of log data fields and combined identifiers.
An embodiment of the present invention also provides a cross-data-center data transmission processing system for reducing the volume of log data transmitted, saving transmission bandwidth, and shortening transmission time. The system comprises:
A data generating layer device, a data relay layer device, and a data analysis layer device; wherein:
The data generating layer device is configured to produce log data and transmit the log data to the data relay layer device;
The data relay layer device is configured to, after receiving the log data, search a dictionary library obtained from the data analysis layer device for the identifier of the log data and replace the log data with the identifier found, wherein the dictionary library stores a unique correspondence between log data and identifiers; and to transmit the identifier-replaced log data to the data analysis layer device;
The data analysis layer device is configured to provide the dictionary library and receive the identifier-replaced log data.
In an embodiment, the data relay layer device is further configured to, when it does not find the identifier of the log data in the dictionary library, retain the original log data and transmit it to the data analysis layer device;
The data analysis layer device is further configured to add the log data that was not replaced by an identifier to the dictionary library, allocate an identifier to that log data, and store the unique correspondence between that log data and the identifier; and to synchronize the updated dictionary library to the data relay layer device.
In an embodiment, the data analysis layer device is specifically configured to allocate an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary library as follows:
A hash value of the log data is computed and compared with the existing data in the dictionary library;
If the hash value does not exist in the dictionary library, the log data is added to the dictionary library;
The maximum of the existing identifiers in the dictionary library plus 1 is stored as the identifier of the log data.
In an embodiment, the data analysis layer device is specifically configured to set up multiple dictionary libraries, each storing a unique correspondence between different log data and identifiers;
The data relay layer device is specifically configured to, when performing identifier replacement on log data containing long character string information, search the multiple dictionary libraries for the identifiers of the different pieces of log data within the long character string information, and replace the long character string information in the log data with the identifiers spliced together.
In an embodiment, the data analysis layer device is specifically configured to set up a combined-field dictionary library, which stores a unique correspondence between combinations of log data fields and combined identifiers.
In the embodiments of the present invention, the data generating layer produces log data and transmits it to the data relay layer; after receiving the log data, the data relay layer searches the dictionary library obtained from the data analysis layer for the identifier of the log data and replaces the log data with the identifier found, the dictionary library storing a unique correspondence between log data and identifiers; the data relay layer then transmits the identifier-replaced log data to the data analysis layer. Replacing log data with identifiers in this way reduces the volume of log data transmitted, saves transmission bandwidth, and shortens transmission time.
Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions here are intended to explain the present invention and do not limit it.
To reduce the volume of log data transmitted, save transmission bandwidth, and shorten transmission time, an embodiment of the present invention provides a cross-data-center data transmission processing method. Fig. 2 is a flowchart of the cross-data-center data transmission processing method in an embodiment of the present invention. As shown in Fig. 2, the method may comprise:
Step 201: the data generating layer produces log data and transmits the log data to the data relay layer;
Step 202: after receiving the log data, the data relay layer searches the dictionary library obtained from the data analysis layer for the identifier of the log data and replaces the log data with the identifier found, wherein the dictionary library stores a unique correspondence between log data and identifiers;
Step 203: the data relay layer transmits the identifier-replaced log data to the data analysis layer.
As the flow in Fig. 2 shows, in the embodiments of the present invention, when the data relay layer transmits log data to the data analysis layer, it does not transmit the log data directly as in the prior art but first replaces the log data with identifiers. The identifier-replaced log data is considerably smaller than the original log data, so transmitting it saves transmission bandwidth and shortens transmission time.
In a specific implementation, if the data relay layer does not find the identifier of the log data in the dictionary library, it retains the original log data and transmits it to the data analysis layer; the data analysis layer adds the log data that was not replaced by an identifier to the dictionary library, allocates an identifier to that log data, and stores the unique correspondence between that log data and the identifier; the data analysis layer then synchronizes the updated dictionary library to the data relay layer.
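The miss-handling loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `relay_encode` and `analysis_update`, the `("id", ...)` / `("raw", ...)` tagging of records, and the use of in-memory Python dicts as the dictionary library are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Record = Tuple[str, Union[int, str]]


def relay_encode(values: List[str], dictionary: Dict[str, int]) -> List[Record]:
    """Relay layer: send the identifier when the value is known, else the original value."""
    encoded: List[Record] = []
    for value in values:
        if value in dictionary:
            encoded.append(("id", dictionary[value]))
        else:
            encoded.append(("raw", value))  # miss: keep the original log data
    return encoded


def analysis_update(records: List[Record], dictionary: Dict[str, int]) -> Dict[str, int]:
    """Analysis layer: add unreplaced values to the dictionary and return the new entries."""
    added: Dict[str, int] = {}
    for kind, payload in records:
        if kind == "raw" and payload not in dictionary:
            new_id = max(dictionary.values(), default=0) + 1  # maximum existing ID plus 1
            dictionary[payload] = new_id
            added[payload] = new_id
    return added


analysis_dict: Dict[str, int] = {}
relay_dict: Dict[str, int] = {}

first = relay_encode(["/a.gif", "/b.gif"], relay_dict)    # nothing known yet
relay_dict.update(analysis_update(first, analysis_dict))  # synchronize new entries back
second = relay_encode(["/a.gif", "/c.gif"], relay_dict)   # "/a.gif" is now replaced
```

On the second pass only the previously unseen value travels in full, which is the effect the loop is meant to achieve.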
The data generating layer may comprise data generating layer devices such as web servers and terminal servers; the data relay layer may comprise data relay layer devices such as data relay servers; the data analysis layer may comprise data analysis layer devices such as distributed cluster nodes.
This is explained further below. In an embodiment, a distributed synchronization of the dictionary library, across nodes and across data centers, is carried out between the data relay layer and the data analysis layer. Fig. 3 is a schematic diagram of the dictionary library synchronization loop in an embodiment of the present invention. As shown in Fig. 3, multiple dictionary libraries are generated at the data analysis layer. The data relay layer obtains the dictionary libraries generated by the data analysis layer, either through task-based distribution or by periodic retrieval, and performs identifier replacement on the log data according to the dictionary libraries. If the identifier of a piece of log data cannot be found, the original log data is transmitted as is.
For example, Fig. 4 is a flowchart of log data identifier replacement and transmission in an embodiment of the present invention. As shown in Fig. 4, the data relay server periodically obtains the latest DICT (dictionary library) from the data analysis layer and performs the corresponding identifier replacement on the log data (log entries) according to the dictionary library obtained. If a corresponding identifier is found in the dictionary library, the replacement is made; if there is no corresponding identifier in the dictionary library, the original log data is retained and transmitted.
The overall data processing flow is described next. Fig. 5 is a diagram of a specific example of the cross-data-center data transmission processing method in an embodiment of the present invention. As shown in Fig. 5, in this example the log data is produced by multiple terminal servers. After a terminal server produces log data, it transmits the log data to the data relay layer over some protocol (such as FTP or HTTP) in preparation for transmission to the data analysis layer. After receiving the log data, the data relay layer performs identifier matching and replacement on the log data and compresses it (on the first transmission there is no dictionary library yet, so the original log data is only compressed before transmission and no identifier replacement is performed). After receiving the log data, the data analysis layer first stores it and then updates the dictionary library from the received log data, adding the transmitted data that was not replaced or recognized to the dictionary library. Once the dictionary library has been supplemented, the newly supplemented dictionary library is synchronized: the incremental part of the dictionary library is synchronized to the data relay layer, for use by the data relay layer when it subsequently transmits log data.
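One way to synchronize only the incremental part is sketched below. It relies on identifiers being allocated in increasing order, so that the highest identifier already held by the relay layer can serve as a watermark. The function names `dictionary_increment` and `apply_increment` and the watermark mechanism itself are assumptions made for illustration; the embodiment does not specify how the increment is determined.

```python
from typing import Dict


def dictionary_increment(dictionary: Dict[str, int], last_synced_id: int) -> Dict[str, int]:
    """Analysis layer: return only the entries allocated after the last synchronized ID."""
    return {value: ident for value, ident in dictionary.items() if ident > last_synced_id}


def apply_increment(relay_dictionary: Dict[str, int],
                    increment: Dict[str, int],
                    last_synced_id: int) -> int:
    """Relay layer: merge the increment into the local copy and return the new watermark."""
    relay_dictionary.update(increment)
    return max(increment.values(), default=last_synced_id)


analysis_dictionary = {"/a.gif": 1, "/b.gif": 2, "/c.gif": 3}
relay_dictionary = {"/a.gif": 1}  # relay copy is behind by two entries

increment = dictionary_increment(analysis_dictionary, last_synced_id=1)
watermark = apply_increment(relay_dictionary, increment, last_synced_id=1)
```

When nothing new has been allocated, the increment is empty and the watermark is unchanged, so repeated periodic pulls cost very little.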
Once the dictionary library update flow is running, dictionary library updates are relatively frequent at first. After log data has accumulated to a certain extent, however, the dictionary library reaches a very high recognition rate, the volume of dictionary library updates becomes smaller and smaller, and the compression ratio achieved by replacement in log data transmission increases greatly, thereby reducing bandwidth and transmission cost and improving the timeliness of transmission.
As the above embodiment shows, the whole process of log data replacement and transmission forms a loop, which effectively increases the amount of log data replaced at the data relay layer so that the volume of transmitted logs drops rapidly. Even log data not yet in the dictionary library is quickly brought into the dictionary library through this loop, so that it is replaced in subsequent transmissions, reducing the log transmission volume and saving bandwidth cost and transmission time.
Take as an example a daily collection of 100 G of logs (gz-compressed), so that the log volume to be transmitted from the data relay layer to the data analysis layer is 100 G. With the bandwidth unchanged, the existing transmission method transmits 100 G of logs and takes 100 s. With dictionary library replacement: (a) assuming that all logs can be replaced through the dictionary library, the transmitted log volume is 52 G and the time taken is 52 s, so the time is shortened by 48 s and the storage space is reduced by 48%; (b) if the dictionary in the dictionary library is incomplete, the log volume transmitted in the first transmission is larger than with complete replacement, the excess ratio being determined by the dictionary library and generally 80% to 90%; after each dictionary library update, however, a replacement rate of 90% can be reached, in which case the transmitted log volume is 62 G and the transmission time is 62 s, which is likewise a large saving in both transmission time and storage space.
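The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes, as the example does, that bandwidth is constant, so transmission time scales linearly with volume; the function name `transfer_estimate` is hypothetical.

```python
from typing import Tuple


def transfer_estimate(original_g: float, replaced_g: float,
                      seconds_for_original: float) -> Tuple[float, float, float]:
    """Return (transfer time, time saved, fraction of volume saved) at constant bandwidth."""
    rate = original_g / seconds_for_original  # G per second, fixed by the baseline
    seconds = replaced_g / rate
    time_saved = seconds_for_original - seconds
    volume_saved = 1 - replaced_g / original_g
    return seconds, time_saved, volume_saved


# Case (a): every log entry can be replaced, 100 G shrinks to 52 G.
full = transfer_estimate(100, 52, 100)

# Case (b): about 90% replacement after dictionary updates, 100 G shrinks to 62 G.
partial = transfer_estimate(100, 62, 100)
```

For case (a) this gives 52 s, a saving of 48 s and 48% of the volume; for case (b) it gives 62 s, a saving of 38 s and 38% of the volume.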
In a specific implementation, the data analysis layer may allocate an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary library as follows:
A hash value of the log data is computed and compared with the existing data in the dictionary library;
If the hash value does not exist in the dictionary library, the log data is added to the dictionary library;
The maximum of the existing identifiers in the dictionary library plus 1 is stored as the identifier of the log data.
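The three steps above can be sketched as a small class. The choice of SHA-256 as the hash function, the class name `DictionaryLibrary`, and its method names are assumptions made for illustration; the embodiment only requires that some hash value be compared. The running counter `_max_id` equals the maximum existing identifier as long as entries are never deleted.

```python
import hashlib
from typing import Dict, Optional, Tuple


class DictionaryLibrary:
    """Maps log data to unique integer IDs, compared by hash (hypothetical sketch)."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Tuple[int, str]] = {}  # hash -> (ID, plaintext)
        self._max_id = 0

    @staticmethod
    def _digest(value: str) -> str:
        # SHA-256 is an assumed choice; the source does not name a hash function.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    def lookup(self, value: str) -> Optional[int]:
        """Return the ID of the value, or None when it is not in the library."""
        hit = self._by_hash.get(self._digest(value))
        return None if hit is None else hit[0]

    def get_or_assign(self, value: str) -> int:
        """Return the existing ID, or store the value under (maximum ID + 1)."""
        key = self._digest(value)
        hit = self._by_hash.get(key)
        if hit is not None:
            return hit[0]
        self._max_id += 1
        self._by_hash[key] = (self._max_id, value)
        return self._max_id


library = DictionaryLibrary()
first_id = library.get_or_assign("http://www.example.com/a.gif")
second_id = library.get_or_assign("http://www.example.com/b.gif")
repeat_id = library.get_or_assign("http://www.example.com/a.gif")
```

Storing the plaintext next to the hash keeps the mapping reversible, so the analysis layer can restore the original value from an identifier.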
In a specific implementation, the data analysis layer may set up multiple dictionary libraries, each storing a unique correspondence between different log data and identifiers. When the data relay layer performs identifier replacement on log data containing long character string information, it may search the multiple dictionary libraries for the identifiers of the different pieces of log data within the long character string information, and replace the long character string information in the log data with the identifiers spliced together.
An example is given below to illustrate how unique IDs are generated. Fig. 6 is a schematic diagram of a dictionary library generation sample in an embodiment of the present invention. Referring to Fig. 6, take for example the ID generation (that is, the dictionary library generation) for information such as the URL, Referrer, and UserAgent in log data:
1) Setting up dictionary libraries for URL and Referrer:
At the data analysis layer, the RequestUrl and the Referrer in the log data are each hashed and then compared with the data in the dictionary library. If an entry does not exist in the dictionary library, the new RequestUrl is added to the dictionary library and stored with a unique ID obtained by taking the maximum ID in the existing dictionary library and adding 1: MAX(RequestUrlId)+1.
For example: 1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET http://www.XXXXX.com/images/xxxxx.gif - NONE/- image/gif "http://www.XXXXX.com/aaaa/440_176147XXX.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -;
After dictionary replacement of the request URL and the Referrer, this becomes:
1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET RequestUrlID - NONE/- image/gif ReferrerID "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -;
Here RequestUrlID and ReferrerID are the unique identifiers that represent the corresponding log data in the dictionary library.
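A relay-side replacement of the request URL and Referrer fields might look like the sketch below. It assumes the log entry is whitespace-separated with the Referrer and UserAgent in double quotes, so that the URL is field 6 and the Referrer is field 10 (counting from 0); these positions, the function name `replace_url_fields`, and the example dictionary contents are assumptions for illustration.

```python
import shlex
from typing import Dict, List

URL_FIELD = 6        # assumed position of the request URL
REFERRER_FIELD = 10  # assumed position of the Referrer


def replace_url_fields(line: str,
                       url_dictionary: Dict[str, int],
                       referrer_dictionary: Dict[str, int]) -> List[str]:
    """Split a log entry and replace the URL and Referrer with IDs where known."""
    fields = shlex.split(line)  # honors the double-quoted Referrer and UserAgent
    for index, dictionary in ((URL_FIELD, url_dictionary),
                              (REFERRER_FIELD, referrer_dictionary)):
        found = dictionary.get(fields[index])
        if found is not None:
            fields[index] = str(found)  # a miss leaves the original value in place
    return fields


entry = ('1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET '
         'http://www.XXXXX.com/images/xxxxx.gif - NONE/- image/gif '
         '"http://www.XXXXX.com/aaaa/440_176147XXX.htm" '
         '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 '
         '(KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -')

url_ids = {"http://www.XXXXX.com/images/xxxxx.gif": 17}
referrer_ids: Dict[str, int] = {}  # Referrer not yet in its dictionary library

replaced = replace_url_fields(entry, url_ids, referrer_ids)
```

In this run the URL is replaced by its ID while the Referrer, which is not yet in its dictionary library, is transmitted unchanged.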
2) For long character string information in log data, such as UserAgent and Cookies, multiple dictionary libraries can be set up so that one piece of information is replaced by a splice of identifiers, as in the following example:
For the log entry above:
Four dictionary libraries are set up for the UserAgent information:
Mozilla/5.0 (Windows NT 6.1; WOW64) corresponds to dictionary library 1 (DICT1);
AppleWebKit/537.1 corresponds to dictionary library 2 (DICT2);
(KHTML, like Gecko) corresponds to dictionary library 3 (DICT3);
Chrome/21.0.1180.89 Safari/537.1 corresponds to dictionary library 4 (DICT4);
The above log data can then be replaced by:
1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET RequestUrlID - NONE/- image/gif ReferrerID "dict1ID+dict2ID+dict3ID+dict4ID" -.
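The splicing of identifiers from several dictionary libraries can be sketched as follows. The regular expression that cuts the UserAgent into the four segments of the example, the function name `splice_user_agent`, and the rule of falling back to the original string when any segment is unknown are assumptions for illustration; the embodiment does not specify how the segments are delimited.

```python
import re
from typing import Dict, Optional, Sequence

# Assumed segmentation: product + comment, engine, engine comment, remainder.
UA_PATTERN = re.compile(r"^(\S+ \([^)]*\)) (\S+) (\([^)]*\)) (.+)$")


def splice_user_agent(user_agent: str,
                      dictionaries: Sequence[Dict[str, int]]) -> Optional[str]:
    """Return the IDs of the four segments joined by '+', or None on any miss."""
    match = UA_PATTERN.match(user_agent)
    if match is None:
        return None
    ids = []
    for segment, dictionary in zip(match.groups(), dictionaries):
        segment_id = dictionary.get(segment)
        if segment_id is None:
            return None  # keep the original UserAgent for this entry
        ids.append(str(segment_id))
    return "+".join(ids)


dict1 = {"Mozilla/5.0 (Windows NT 6.1; WOW64)": 7}
dict2 = {"AppleWebKit/537.1": 3}
dict3 = {"(KHTML, like Gecko)": 1}
dict4 = {"Chrome/21.0.1180.89 Safari/537.1": 12}

ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
      "(KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1")
spliced = splice_user_agent(ua, [dict1, dict2, dict3, dict4])
```

Splitting the string this way lets segments that recur across many browsers, such as the engine comment, share one dictionary entry instead of being stored once per full UserAgent.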
In a specific implementation, the dictionary libraries set up by the data analysis layer may include a combined-field dictionary library, which stores a unique correspondence between combinations of log data fields and combined identifiers.
An example is given below to illustrate combined-field dictionary compression for transmission, in which a dictionary is built over a combination of fields and used to replace the log data content;
For example: 1386562882.666 14 XXX.XXX.XXX.XXX TCP_MEM_HIT/200 440 GET http://www.XXXXX.com/images/xxxxx.gif - NONE/- image/gif "http://www.XXXXX.com/aaaa/440_176147XXX.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
A combined dictionary is built over the IP (XXX.XXX.XXX.XXX) and the UserAgent ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1") in this entry, and identifier replacement is then performed at the data relay layer.
The field-combination replacement approach is closely tied to the business. For CDN log data, the dimensions and metrics of the logs are analyzed and dictionaries are built for commonly used combinations, which can save considerable storage overhead and computing resources in the subsequent statistics and computation stages.
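A combined-field dictionary can be sketched by keying the dictionary on a tuple of field values. The record layout (a dict with `ip` and `user_agent` keys), the output key `ip_ua_id`, and the function names are assumptions for illustration only.

```python
from typing import Dict, Tuple

CombinedKey = Tuple[str, str]


def combined_id(dictionary: Dict[CombinedKey, int], ip: str, user_agent: str) -> int:
    """Analysis layer: return the ID of the (IP, UserAgent) pair, allocating MAX+1 if new."""
    key = (ip, user_agent)
    if key not in dictionary:
        dictionary[key] = max(dictionary.values(), default=0) + 1
    return dictionary[key]


def replace_combined(record: Dict[str, object],
                     dictionary: Dict[CombinedKey, int]) -> Dict[str, object]:
    """Relay layer: swap both fields for one combined ID when the pair is known."""
    found = dictionary.get((str(record["ip"]), str(record["user_agent"])))
    if found is None:
        return dict(record)  # miss: transmit the record unchanged
    replaced = {k: v for k, v in record.items() if k not in ("ip", "user_agent")}
    replaced["ip_ua_id"] = found
    return replaced


combined: Dict[CombinedKey, int] = {}
pair_id = combined_id(combined, "XXX.XXX.XXX.XXX", "Mozilla/5.0 (Windows NT 6.1; WOW64)")

known = replace_combined(
    {"ip": "XXX.XXX.XXX.XXX", "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64)", "bytes": 440},
    combined,
)
unknown = replace_combined(
    {"ip": "YYY.YYY.YYY.YYY", "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64)", "bytes": 440},
    combined,
)
```

Because one identifier stands for the whole pair, downstream statistics grouped by that pair can operate on a single integer column.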
As the above embodiments show, in the embodiments of the present invention the distributed, cross-node synchronization of the dictionary library between the data relay layer and the data analysis layer makes it possible to compress the log data at the data relay layer effectively. Over time the dictionary library data grows richer, so that most fields can be recognized and replaced; more and more of the transmitted logs are replaced by unique IDs from the dictionary library, and the log volume becomes smaller and smaller, saving transmission bandwidth and transmission time.
Unique ID generation in the dictionary library: the dictionary library first needs to hold the unique ID and the plaintext data entered in the dictionary; it may additionally store a hash of the original text entered in the dictionary library (used for data comparison, since comparing hashes improves comparison efficiency). Log data is compared against the dictionary library (or its hash is compared); if it does not exist, a new record is added, and the maximum unique ID is incremented to give the unique ID of the new record. Dictionary libraries may also be created over combinations of multiple fields, according to the statistics and analysis requirements in the distributed cluster. Across the whole log transmission, the replacement of log entries by unique IDs from the dictionary forms a loop, which ensures that the dictionary library is kept updated and synchronized.
Based on the same inventive concept, an embodiment of the present invention also provides a cross-data-center data transmission processing system, as described in the following embodiments. Because the principle by which this system solves the problem is similar to that of the cross-data-center data transmission processing method, the implementation of the system may refer to the implementation of the method, and repeated details are omitted.
Fig. 7 is a schematic diagram of the cross-data-center data transmission processing system in an embodiment of the present invention. As shown in Fig. 7, the cross-data-center data transmission processing system in an embodiment of the present invention comprises:
A data generating layer device 701, a data relay layer device 702, and a data analysis layer device 703; wherein:
The data generating layer device 701 is configured to produce log data and transmit the log data to the data relay layer device 702;
The data relay layer device 702 is configured to, after receiving the log data, search the dictionary library obtained from the data analysis layer device 703 for the identifier of the log data and replace the log data with the identifier found, wherein the dictionary library stores a unique correspondence between log data and identifiers; and to transmit the identifier-replaced log data to the data analysis layer device 703;
The data analysis layer device 703 is configured to provide the dictionary library and receive the identifier-replaced log data.
In a specific implementation, the data generating layer device may comprise a web server, a terminal server, or the like; the data relay layer device may comprise a data relay server or the like; the data analysis layer device may comprise a distributed cluster node or the like.
In a specific implementation, the data relay layer device may be further configured to, when it does not find the identifier of the log data in the dictionary library, retain the original log data and transmit it to the data analysis layer device;
The data analysis layer device may be further configured to add the log data that was not replaced by an identifier to the dictionary library, allocate an identifier to that log data, and store the unique correspondence between that log data and the identifier; and to synchronize the updated dictionary library to the data relay layer device.
In a specific implementation, the data analysis layer device may be specifically configured to allocate an identifier to log data and store the unique correspondence between the log data and the identifier in the dictionary library as follows:
A hash value of the log data is computed and compared with the existing data in the dictionary library;
If the hash value does not exist in the dictionary library, the log data is added to the dictionary library;
The maximum of the existing identifiers in the dictionary library plus 1 is stored as the identifier of the log data.
In a specific implementation, the data analysis layer device may be specifically configured to set up multiple dictionary libraries, each storing a unique correspondence between different log data and identifiers;
The data relay layer device may be specifically configured to, when performing identifier replacement on log data containing long character string information, search the multiple dictionary libraries for the identifiers of the different pieces of log data within the long character string information, and replace the long character string information in the log data with the identifiers spliced together.
In a specific implementation, the data analysis layer device may be specifically configured to set up a combined-field dictionary library, which stores a unique correspondence between combinations of log data fields and combined identifiers.
In summary, in the embodiments of the present invention, the data generating layer produces log data and transmits it to the data relay layer; after receiving the log data, the data relay layer searches the dictionary library obtained from the data analysis layer for the identifier of the log data and replaces the log data with the identifier found, the dictionary library storing a unique correspondence between log data and identifiers; the data relay layer then transmits the identifier-replaced log data to the data analysis layer. Replacing log data with identifiers in this way reduces the volume of log data transmitted, saves transmission bandwidth, and shortens transmission time.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.