CN108134775A - Data processing method and equipment

Publication number
CN108134775A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711167866.1A
Other languages
Chinese (zh)
Other versions
CN108134775B (en
Inventor
冷继南
关坤
李定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN201711167866.1A
Publication of CN108134775A
Application granted
Publication of CN108134775B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a data processing method and equipment, relates to the field of computer technology, and helps to save bandwidth resources. The method includes: a first equipment calculates the similar fingerprints of data to be transmitted, where the similar fingerprints of the data to be transmitted include the similar fingerprint of a first data block; the first equipment sends the similar fingerprints of the data to be transmitted to a second equipment, where the similar fingerprints are used to look up whether the second equipment stores a reference data block similar to the data to be transmitted; the first equipment receives the fingerprint of the reference data block sent by the second equipment, where the fingerprint of the reference data block includes the fingerprint of a first reference data block, and the similar fingerprint of the first reference data block is identical to that of the first data block; the first equipment finds the first reference data block in the first equipment according to the fingerprint of the first reference data block; and the first equipment sends data to the second equipment based on the fingerprint of the reference data block, where the data includes the difference data between the first reference data block and the first data block.

Description

Data processing method and equipment
Technical field
This application relates to the field of computer technology, and in particular to a data processing method and equipment.
Background
With the continuous advance of the cloud computing industry, the cloud infrastructure of mainstream vendors, such as cloud computing centers, cloud disaster recovery centers, and edge clouds, has begun to be deployed on a large scale. This infrastructure forms a complex wide area network (WAN) topology, and transferring data between these sites consumes a large amount of WAN bandwidth. When bandwidth is limited, WAN acceleration techniques are commonly used to save it.
A WAN acceleration technique installs a dedicated device, a WAN accelerator (ACC), at each end of a WAN link. By caching some or all of the data that has been transmitted, the WAN accelerators use data deduplication to reduce the amount of data sent over the WAN link, which saves bandwidth resources, shortens the total transmission time, and thus accelerates information transfer. Specifically, if the WAN accelerator installed on the sending-end device side determines that the data the sending-end device is sending to the receiving-end device is already stored in the WAN accelerator installed on the receiving-end device side, the sending-end WAN accelerator need not send that data to the receiving-end device, which saves bandwidth resources.
In practice, however, the data that the sending-end device sends to the receiving-end device is rarely identical to the data cached by the WAN accelerator on the receiving-end device side, so the bandwidth savings that the above WAN acceleration technique can achieve are limited.
Summary of the invention
To address the above problem, this application provides a data processing method and equipment that help to save bandwidth resources.
In a first aspect, this application provides a data processing method. The method may include: a first equipment calculates the similar fingerprints of data to be transmitted, where the similar fingerprints of the data to be transmitted include the similar fingerprint of a first data block, and the first data block is one data block in the data to be transmitted; the first equipment sends the similar fingerprints of the data to be transmitted to a second equipment, where the similar fingerprints are used to look up whether the second equipment stores a reference data block similar to the data to be transmitted; the first equipment receives the fingerprint of the reference data block sent by the second equipment, where the fingerprint of the reference data block includes the fingerprint of a first reference data block, and the similar fingerprint of the first reference data block is identical to that of the first data block; the first equipment finds the first reference data block in the first equipment according to the fingerprint of the first reference data block; and the first equipment sends data to the second equipment based on the fingerprint of the reference data block, where the data includes the difference data between the first reference data block and the first data block. In this technical solution, the first equipment exchanges information with the second equipment and, upon determining that the second equipment stores a first reference data block similar to the first data block, sends the second equipment the difference data between the first data block and the first reference data block. Compared with transmitting the first data block itself, this saves bandwidth resources, which reduces the transmission time, that is, increases the information transmission rate.
Here, the fingerprint of a data block is identification information, obtained from all the characteristic information of the data block, that is used to mark the data block. Different data blocks have different fingerprints. The similar fingerprint of a data block is identification information, obtained from specific characteristic information of the data block, that is used to mark the data block. The similar fingerprints of different data blocks may be the same or different. The similar fingerprints of the data to be transmitted may include the similar fingerprint of each data block in the data to be transmitted. The fingerprint of the reference data block may include the fingerprints of the reference data blocks similar to each data block in the data to be transmitted. A reference data block similar to the data to be transmitted is, specifically, a reference data block similar to a data block in the data to be transmitted.
In one possible design, the first equipment calculating the similar fingerprints of the data to be transmitted may include: the first equipment hashes the data blocks of the data to be transmitted using a locality-sensitive hashing algorithm to obtain the similar fingerprints of the data blocks. The locality-sensitive hashing algorithm may be, for example but not limited to, minhash or simhash.
In one possible design, the first equipment hashing the data blocks of the data to be transmitted using a locality-sensitive hashing algorithm to obtain the similar fingerprints of the data blocks may include: the first equipment splits the data to be transmitted into data blocks and, for each data block, performs the following operations: extract at least one sub-block from the data block; apply each of m hash algorithms to the at least one sub-block to obtain m hash sequences, where applying one hash algorithm to the at least one sub-block yields one hash sequence and m is an integer greater than or equal to 2; then merge the maximum values of the m hash sequences and use the merged hash sequence as the similar fingerprint of the data block, or merge the minimum values of the m hash sequences and use the merged hash sequence as the similar fingerprint of the data block. This design provides one specific implementation of minhash. In this implementation, the m hash sequences can be computed in parallel, which shortens the time spent calculating similar fingerprints.
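The per-block steps above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the 4-byte overlapping sub-blocks, the use of salted BLAKE2b to stand in for "m hash algorithms", and the function names are all assumptions made for the example, and the m sequences are computed one after another here rather than in parallel.

```python
import hashlib


def sub_blocks(block: bytes, size: int = 4) -> list[bytes]:
    """Extract overlapping sub-blocks; the 4-byte size is an assumption."""
    if len(block) <= size:
        return [block]
    return [block[i:i + size] for i in range(len(block) - size + 1)]


def similar_fingerprint(block: bytes, m: int = 4, use_min: bool = True) -> tuple[int, ...]:
    """Hash every sub-block with m hash functions and merge the per-sequence extremes."""
    if m < 2:
        raise ValueError("m must be an integer greater than or equal to 2")
    pick = min if use_min else max
    parts = sub_blocks(block)
    merged = []
    for k in range(m):
        # Hypothetical way to obtain m different hash functions: one salt per function.
        salt = k.to_bytes(4, "big")
        sequence = [
            int.from_bytes(hashlib.blake2b(salt + p, digest_size=8).digest(), "big")
            for p in parts
        ]
        merged.append(pick(sequence))  # keep the minimum (or maximum) of this hash sequence
    return tuple(merged)  # the merged values form the similar fingerprint
```

Two blocks that share most of their sub-blocks tend to share the same minimum in each hash sequence, which is why nearly identical blocks often end up with the same similar fingerprint.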
In one possible design, the method may further include: the first equipment uses a differential compression algorithm to perform differential compression on the first reference data block and the first data block. This saves further bandwidth resources, which reduces the transmission time and increases the information transmission rate.
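To make the idea of difference data concrete, here is a small sketch that encodes a data block as "copy from the reference" and "literal data" operations. The source does not name a specific differential compression algorithm; using Python's `difflib.SequenceMatcher` and the operation format below are assumptions chosen only for illustration, and this approach would be far too slow for production-sized blocks.

```python
from difflib import SequenceMatcher


def make_delta(reference: bytes, block: bytes) -> list[tuple]:
    """Describe `block` in terms of `reference`: copy ranges plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, reference, block, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # bytes already held by the receiver
        elif tag in ("replace", "insert"):
            ops.append(("data", block[j1:j2]))  # bytes that must actually be sent
        # "delete": bytes present only in the reference, nothing to send
    return ops


def apply_delta(reference: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the data block on the receiving side from the reference and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Only the literal bytes and the small copy descriptors cross the link; the receiver rebuilds the block from its own copy of the reference data block.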
In one possible design, the similar fingerprints of the data to be transmitted further include the similar fingerprint of a second data block, where the second data block is another data block in the data to be transmitted; the fingerprint of the reference data block does not include the fingerprint of a second reference data block, where the similar fingerprint of the second reference data block is identical to that of the second data block; and the data further includes the second data block. In other words, when no reference data block similar to the second data block is reported, the first equipment sends the second data block itself.
In one possible design, the first equipment includes a first-level cache and a second-level cache. The first-level cache is a non-persistent medium and the second-level cache is a persistent medium. The first-level cache is used to cache some or all of the data blocks stored in the second-level cache, together with the fingerprints and similar fingerprints of those data blocks. The method may further include: the first equipment looks up the fingerprint of the first reference data block in the first-level cache; if the fingerprint of the first reference data block is not found in the first-level cache, the first equipment looks it up in the second-level cache. Because data blocks in the first-level cache have a relatively high probability of being hit, that is, the first reference data block can usually be found in the first-level cache, this improves lookup efficiency.
In one possible design, the second-level cache includes one or more containers. Each container is a set formed by at least two data blocks and the fingerprint and similar fingerprint of each of those data blocks, and the contents of the at least two data blocks in each container are correlated. The method may further include: if the first equipment finds a data block in the second-level cache, it caches the container in which that data block resides into the first-level cache. Because data blocks in the first-level cache have a relatively high probability of being hit, that is, the data block can usually be found in the first-level cache, this improves lookup efficiency.
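The two-level lookup with container promotion described in the two designs above might look roughly like this. It is a sketch under simplifying assumptions: both levels are plain in-memory dictionaries (the second level merely stands in for a persistent medium), similar fingerprints are omitted, nothing is ever evicted, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    container_id: int
    blocks: dict[bytes, bytes] = field(default_factory=dict)  # fingerprint -> data block


class TwoLevelCache:
    def __init__(self, containers: list[Container]):
        # Second level: stands in for the persistent medium, organised by container.
        self.level2 = {c.container_id: c for c in containers}
        self.index2 = {fp: c.container_id for c in containers for fp in c.blocks}
        # First level: non-persistent, fingerprint -> data block.
        self.level1: dict[bytes, bytes] = {}

    def lookup(self, fingerprint: bytes):
        """Return (block, where_found); promote the whole container on a second-level hit."""
        if fingerprint in self.level1:
            return self.level1[fingerprint], "level1"
        container_id = self.index2.get(fingerprint)
        if container_id is None:
            return None, "miss"
        # Blocks in one container are correlated, so load all of them together.
        self.level1.update(self.level2[container_id].blocks)
        return self.level1[fingerprint], "level2"
```

After one second-level hit, a lookup for any other block of the same container is served from the first level, which is the effect the design relies on.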
In a second aspect, this application provides a data processing method. The method may include: a second equipment receives the similar fingerprints of data to be transmitted sent by a first equipment, where the similar fingerprints of the data to be transmitted include the similar fingerprint of a first data block, and the first data block is one data block in the data to be transmitted; the second equipment finds, according to the similar fingerprints of the data to be transmitted, that the second equipment stores a reference data block similar to the data to be transmitted, where the reference data block includes a first reference data block, and the similar fingerprint of the first reference data block is identical to that of the first data block; the second equipment sends the fingerprint of the reference data block to the first equipment, where the fingerprint of the reference data block includes the fingerprint of the first reference data block, the fingerprint of the reference data block is used by the first equipment to send data to the second equipment, and the data includes the difference data between the first reference data block and the first data block; and the second equipment receives the data sent by the first equipment. In this technical solution, the second equipment exchanges information with the first equipment and, upon determining that the second equipment stores a reference data block similar to the first data block, sends the fingerprint of that reference data block to the first equipment, so that the first equipment sends the second equipment the difference data between the first data block and the reference data block according to that fingerprint. Compared with transmitting the first data block itself, this saves bandwidth resources, which reduces the transmission time, that is, increases the information transmission rate.
In one possible design, the method may further include: the second equipment receives the fingerprints of the data to be transmitted sent by the first equipment, where the fingerprints of the data to be transmitted include the fingerprint of the first data block; and when the second equipment finds, according to the fingerprint of the first data block, that the first data block is not stored in the second equipment, the second equipment looks up, according to the similar fingerprint of the first data block, whether the first reference data block is stored in the second equipment. If the first data block is already stored in the second equipment, the first equipment need not send the first data block to the second equipment at all. Compared with a solution that always transmits the difference data between the first data block and the first reference data block, this design therefore saves further bandwidth resources.
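The lookup order in this design (exact fingerprint first, similar fingerprint second) can be sketched as follows. The index structures and the labels "duplicate", "similar", and "new" are assumptions introduced for the example; the source does not prescribe how the second equipment organises its indexes.

```python
def classify(blocks, fingerprint_index, similar_index):
    """Decide, per data block, what the second equipment asks the first equipment to send.

    blocks: list of (fingerprint, similar_fingerprint) pairs received from the first equipment
    fingerprint_index: set of fingerprints of blocks already stored (hypothetical structure)
    similar_index: dict mapping similar fingerprint -> fingerprint of a stored reference block
    """
    decisions = []
    for fingerprint, similar_fingerprint in blocks:
        if fingerprint in fingerprint_index:
            # The identical block is already stored: nothing needs to be transmitted.
            decisions.append(("duplicate", None))
        elif similar_fingerprint in similar_index:
            # A similar block is stored: reply with its fingerprint so only a delta is sent.
            decisions.append(("similar", similar_index[similar_fingerprint]))
        else:
            # Neither identical nor similar: the whole block has to be transmitted.
            decisions.append(("new", None))
    return decisions
```

Checking the exact fingerprint first matters because an identical block also has an identical similar fingerprint, and sending nothing is cheaper than sending even an empty delta.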
In one possible design, the similar fingerprints of the data to be transmitted include the similar fingerprint of a second data block, where the second data block is another data block in the data to be transmitted; the fingerprint of the reference data block does not include the fingerprint of a second reference data block, where the similar fingerprint of the second reference data block is identical to that of the second data block; and the data further includes the second data block.
In one possible design, the second equipment includes a first-level cache and a second-level cache. The first-level cache is a non-persistent medium and the second-level cache is a persistent medium. The first-level cache is used to cache some or all of the data blocks stored in the second-level cache, together with the fingerprints and similar fingerprints of those data blocks. The method may further include: the second equipment looks up the fingerprint (or similar fingerprint) of the first data block in the first-level cache; if the fingerprint (or similar fingerprint) of the first data block is not found in the first-level cache, the second equipment looks it up in the second-level cache. Because data blocks in the first-level cache have a relatively high probability of being hit, that is, the first data block (or the first reference data block) can usually be found in the first-level cache, this improves lookup efficiency.
In one possible design, the second-level cache includes one or more containers. Each container is a set formed by at least two data blocks and the fingerprint and similar fingerprint of each of those data blocks, and the contents of the at least two data blocks in each container are correlated. The method may further include: if the second equipment finds a data block in the second-level cache, it caches the container in which that data block resides into the first-level cache. The data block may be any data block in the data to be transmitted, or a reference data block similar to any data block in the data to be transmitted. Because data blocks in the first-level cache have a relatively high probability of being hit, that is, the data block can usually be found in the first-level cache, this improves lookup efficiency.
In a third aspect, this application provides a data processing equipment for performing any method provided in the first aspect. The data processing equipment may specifically be the first equipment described above.
In one possible design, the data processing equipment may be divided into function modules according to the method provided in the first aspect; for example, each function module may correspond to one function, or two or more functions may be integrated into one processing module.
In another possible design, the equipment may include a memory and a processor, where the memory is configured to store a computer program that, when executed by the processor, causes any method provided in the first aspect to be performed.
In a fourth aspect, this application provides a data processing equipment for performing any method provided in the second aspect. The data processing equipment may specifically be the second equipment described above.
In one possible design, the data processing equipment may be divided into function modules according to the method provided in the second aspect; for example, each function module may correspond to one function, or two or more functions may be integrated into one processing module.
In another possible design, the equipment may include a memory and a processor, where the memory is configured to store a computer program that, when executed by the processor, causes any method provided in the second aspect to be performed.
The embodiments of this application further provide a processing apparatus for implementing the functions of the first equipment or the second equipment, including a processor and an interface. The processing apparatus may be a chip, and the processor may be implemented in hardware or in software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor that operates by reading software code stored in a memory, where the memory may be integrated in the processor or may be located outside the processor as a separate component.
This application further provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform any possible method of the first aspect or the second aspect.
This application further provides a computer program product that, when run on a computer, causes any method provided in the first aspect or the second aspect to be performed.
It should be understood that any data processing equipment, computer storage medium, or computer program product provided above is used to perform the corresponding method provided above. For the beneficial effects it can achieve, refer to the beneficial effects of the corresponding method; details are not repeated here.
Brief description of the drawings
Fig. 1 is a schematic architecture diagram of a system to which the data processing method provided by the embodiments of this application applies;
Fig. 2 is an interaction diagram of a data processing method provided by the embodiments of this application;
Fig. 3 is a schematic diagram of the size relationship among data to be transmitted, unit lengths, and data blocks provided by the embodiments of this application;
Fig. 4 is a schematic diagram of a process for calculating the similar fingerprint of a data block provided by the embodiments of this application;
Fig. 5 is a schematic diagram of a first transmission list provided by the embodiments of this application;
Fig. 6 is a flowchart of determining the category of a data block provided by the embodiments of this application;
Fig. 7 is a schematic diagram of a second transmission list provided by the embodiments of this application;
Fig. 8 is a schematic diagram of a third transmission list provided by the embodiments of this application;
Fig. 9 is a schematic architecture diagram of another system to which the data processing method provided by the embodiments of this application applies;
Fig. 10 is a schematic diagram of a process for updating the information stored in the first-level cache provided by the embodiments of this application;
Fig. 11 is a schematic structural diagram of a data processing equipment provided by the embodiments of this application;
Fig. 12 is a schematic structural diagram of another data processing equipment provided by the embodiments of this application.
Detailed description
A data block is a set formed by a part of the data to be transmitted. Different data blocks may be of the same size or of different sizes.
The fingerprint of a data block is identification information, obtained from all the characteristic information of the data block, that is used to mark the data block. Different data blocks have different fingerprints.
The similar fingerprint of a data block is identification information, obtained from specific characteristic information of the data block, that is used to mark the data block. For example, suppose a data block is the character string "78905". If the specific characteristic information is the characteristic information of the character at the 2nd position, the similar fingerprint of the data block is the characteristic information of "8"; if the specific characteristic information is the characteristic information of the character at the 5th position, the similar fingerprint of the data block is the characteristic information of "5". The similar fingerprints of different data blocks may be the same or different. For example, suppose two data blocks are the character strings "78905" and "12345". If the specific characteristic information is the characteristic information of the character at the 2nd position, the similar fingerprints of the two data blocks differ: they are the characteristic information of "8" and of "2", respectively. If the specific characteristic information is the characteristic information of the character at the 5th position, the similar fingerprints of the two data blocks are the same: both are the characteristic information of "5". Here, the characteristic information of a character may be the character itself, or information about the character calculated according to a specific algorithm.
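The character-position example above can be written out directly. In this toy sketch the characteristic information of a character is taken to be the character itself, which is one of the two options the text allows; the function name is hypothetical.

```python
def positional_similar_fingerprint(block: str, position: int) -> str:
    """Toy similar fingerprint: the character at a given 1-based position."""
    return block[position - 1]


# Position 2 distinguishes the two blocks; position 5 makes them "similar".
print(positional_similar_fingerprint("78905", 2), positional_similar_fingerprint("12345", 2))
print(positional_similar_fingerprint("78905", 5), positional_similar_fingerprint("12345", 5))
```

The choice of which characteristic to use therefore decides which blocks count as similar, which is why practical systems derive the similar fingerprint from content-wide features rather than a single position.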
A reference data block similar to a given data block is a data block that has the same similar fingerprint as that data block. For example, if two data blocks are the character strings "78905" and "12345", and the specific characteristic information is the characteristic information of the character at the 5th position, the similar fingerprints of the two data blocks are the same: both are the characteristic information of "5". In this case, "78905" can serve as a reference data block for "12345" and, likewise, "12345" can serve as a reference data block for "78905".
A container is a set of information. One container may include multiple data blocks together with the fingerprint and similar fingerprint of each of those data blocks. Each container has a container identifier that marks the container. An accelerator can schedule the information in one container as a unit, for example by writing all the information in a container into a cache.
The term "and/or" in this application merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The symbol "/" indicates an "or" relationship between the associated objects; for example, A/B means A or B. The terms "first", "second", and so on are used to distinguish different objects rather than to describe a particular order of the objects. "Multiple" means two or more.
Fig. 1 is a schematic architecture diagram of a system to which the data processing method provided by the embodiments of this application applies. The system shown in Fig. 1 includes: a sending-end device 1, a sending-end accelerator 2, a receiving-end accelerator 3, and a receiving-end device 4. The sending-end device 1 transmits information to the receiving-end device 4 through the sending-end accelerator 2 and the receiving-end accelerator 3. The sending-end accelerator 2 is installed on the side of the sending-end device 1, the receiving-end accelerator 3 is installed on the side of the receiving-end device 4, and the sending-end accelerator 2 communicates with the receiving-end accelerator 3 over a WAN. The sending-end device 1 and the receiving-end device 4 may each be a data center, for example a cloud computing center, a cloud disaster recovery center, or an edge cloud. The sending-end accelerator 2 and the receiving-end accelerator 3 may be collectively referred to as WAN accelerators. The data processing method provided by this application can be applied to scenarios such as data backup and data restoration.
It should be understood that a sending-end device in one data transmission may act as a receiving-end device in another data transmission; correspondingly, the sending-end accelerator installed on the side of that sending-end device acts as a receiving-end accelerator in that other data transmission. Similarly, a receiving-end device in one data transmission may act as a sending-end device in another data transmission; correspondingly, the receiving-end accelerator installed on the side of that receiving-end device acts as a sending-end accelerator in that other data transmission.
It should be understood that the sending-end accelerator 2 may be the first equipment described in this application, and the receiving-end accelerator 3 may be the second equipment described in this application. Alternatively, the first equipment described in this application may be the sending-end device 1, and the second equipment may be the receiving-end device 4. In yet another implementation, the first equipment described in this application may be the sending-end device 1 together with the sending-end accelerator 2, and the second equipment may be the receiving-end accelerator 3 together with the receiving-end device 4.
Fig. 2 is an interaction diagram of a data processing method provided by the embodiments of this application. The method shown in Fig. 2 can be applied to the system architecture shown in Fig. 1 and includes the following steps S101 to S110:
S101: The sending-end device sends the data to be transmitted to the sending-end accelerator.
In the field of cloud computing, large amounts of data generally need to be transferred, periodically or aperiodically, from the sending-end device to the receiving-end device through the sending-end accelerator and the receiving-end accelerator. The size of the data that needs to be transferred each time (that is, the data to be transmitted) may be the same or different. For example, the data to be transferred on one occasion may be 10GB in size.
S102: After receiving the data to be transmitted sent by the sending-end device, the sending-end accelerator splits the data to be transmitted into several unit lengths and then performs variable-length chunking on the data of each unit length to obtain several data blocks.
Because the size of the data to be transferred may differ from one transfer to the next, the concept of a "unit length" is introduced for ease of management. The size of a unit length may be, for example but not limited to, 4MB. In general, the sending-end accelerator and the receiving-end accelerator process the data of one unit length as a whole (for example, calculating the fingerprints and similar fingerprints together, transmitting them together, and so on).
Variable-length chunking is a chunking algorithm that splits data into data blocks according to the data content. It can be implemented using, for example but not limited to, a sliding-window technique combined with Rabin fingerprinting. Any two data blocks obtained by variable-length chunking may or may not be equal in size, and the number of data blocks obtained by chunking the data of different unit lengths may or may not be the same. For example, assuming that the unit length is 4MB and that the average size of the data blocks obtained by variable-length chunking is about 8KB, the data of some unit lengths may yield 511 data blocks after variable-length chunking, others 512 data blocks, and others 514 data blocks, and so on. Because the block boundaries produced by variable-length chunking depend on the data content, the same data blocks as before can still be cut out after the data content has shifted. This benefits the subsequent deduplication service, that is, the service described below in which first-class data blocks are no longer transmitted.
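The shift resistance described above can be demonstrated with a small content-defined chunker. This is only a sketch: it uses a simple polynomial rolling hash over a sliding window rather than true Rabin fingerprinting, and the window size, boundary mask, and minimum and maximum block sizes are assumptions picked to keep the example small (real deployments target blocks of around 8KB).

```python
def chunk(data: bytes, window: int = 16, mask: int = 0x3F,
          min_size: int = 32, max_size: int = 1024) -> list[bytes]:
    """Cut `data` wherever the rolling hash of the last `window` bytes matches `mask`.

    Assumes min_size >= window, so each cut decision depends only on nearby content.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                 # add the newest byte
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing block, possibly shorter than min_size
    return chunks
```

If a few bytes are inserted at the front of the data, the first block or two change, but the cut points soon fall on the same content again, so most later blocks are byte-for-byte identical to the earlier ones and can be deduplicated. With fixed-length chunking, every block after the insertion would differ.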
Fig. 3 is a schematic diagram of the size relationship among the data to be transmitted, the unit length, and the data blocks provided by an embodiment of the present application. Fig. 3 is illustrated with data to be transmitted of 10 GB, a unit length of 4 MB, and an average data block size of about 8 KB. In this example, after receiving the 10 GB of data to be transmitted from the transmitting-end device, the transmitting-end accelerator first cuts the data, in order, at a granularity of 4 MB, obtaining 2560 unit lengths (labelled 1 to 2560 in Fig. 3). It then performs variable-length chunking on the 4 MB of data in each unit length, dividing it into 512 data blocks (labelled 1 to 512 in Fig. 3).
It should be noted that the chunking algorithm is not limited to variable-length chunking; it can also be, for example, fixed-length chunking, in which all data blocks obtained from the data of a unit length have the same size.
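The cutting and chunking of S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it swaps in a simple polynomial rolling hash for Rabin fingerprinting, and the window size, boundary mask, and minimum and maximum block sizes are assumptions chosen only so that blocks average on the order of 8 KB.

```python
from typing import List

UNIT_LENGTH = 4 * 1024 * 1024      # 4 MB unit length, as in the example above
WINDOW = 48                        # sliding-window size in bytes (assumed)
MASK = (1 << 13) - 1               # a boundary candidate about every 8 KB (assumed)
MIN_BLOCK, MAX_BLOCK = 2 * 1024, 64 * 1024   # assumed size bounds
BASE, MOD = 257, (1 << 61) - 1     # polynomial rolling hash, stand-in for Rabin
POW_OUT = pow(BASE, WINDOW, MOD)   # weight of the byte that leaves the window


def split_units(data: bytes, unit_length: int = UNIT_LENGTH) -> List[bytes]:
    """Cut the data to be transmitted into unit lengths, in order."""
    return [data[i:i + unit_length] for i in range(0, len(data), unit_length)]


def chunk_unit(unit: bytes) -> List[bytes]:
    """Variable-length chunking: cut where the window hash matches a pattern."""
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(unit):
        h = (h * BASE + byte) % MOD
        if i - start >= WINDOW:
            # Remove the contribution of the byte that slid out of the window.
            h = (h - unit[i - WINDOW] * POW_OUT) % MOD
        size = i - start + 1
        if (size >= MIN_BLOCK and (h & MASK) == MASK) or size >= MAX_BLOCK:
            blocks.append(unit[start:i + 1])
            start, h = i + 1, 0
    if start < len(unit):
        blocks.append(unit[start:])   # trailing block may be shorter than MIN_BLOCK
    return blocks
```

Because a boundary depends only on the bytes inside the window, inserting bytes near the start of a unit tends to change only the first few blocks, which is the property the deduplication step relies on.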
S103: For any unit length, the transmitting-end accelerator calculates the fingerprint and the similar fingerprint of each data block in the unit length.
The transmitting-end accelerator can calculate the fingerprint of a data block with a hash algorithm. The hash algorithm can be, for example but not limited to, any of the following: the secure hash algorithm (SHA1), message digest algorithm 5 (MD5), a modulo algorithm, an algorithm that takes selected bytes, and so on.
The transmitting-end accelerator can calculate the similar fingerprint of a data block with a locality-sensitive hashing (LSH) algorithm. Locality-sensitive hashing improves the efficiency of similarity search by designing hash functions that satisfy a special property, namely locality sensitivity. The similar fingerprints of the same data block obtained with different locality-sensitive hashing algorithms may or may not be the same. The locality-sensitive hashing algorithm can be, for example but not limited to, any of the following: minhash, simhash, and so on.
Optionally, to shorten the time taken to calculate similar fingerprints, some embodiments of the present application provide a method for calculating the similar fingerprint of a data block. For any data block, the method may include the following steps 1) to 4):
1) Cut the data block into multiple sub-blocks.
The algorithm used to cut the data block into multiple sub-blocks can be a fixed-length chunking algorithm or a variable-length chunking algorithm. With a variable-length chunking algorithm, the size of each sub-block can be, for example but not limited to, 8 to 16 bytes, and different sub-blocks may or may not have the same size.
2) Extract n target sub-blocks from the multiple sub-blocks, where n is an integer greater than or equal to 1. Each target sub-block is regarded as one piece of characteristic information of the data block.
For example, assume that a data block of 8 KB is divided into 1000 sub-blocks in step 1) and that the target sub-blocks are the 4k-th sub-blocks, where k is an integer greater than or equal to 1. The extracted target sub-blocks are then sub-blocks 4, 8, 12, 16, 20, ..., 1000.
Steps 1) and 2) are one specific implementation of extracting characteristic information from a data block; the present application is not limited to this.
3) Using m different hash algorithms, perform a hash operation on each target sub-block of the data block, obtaining m hash sequences, where m is an integer greater than or equal to 2.
Performing a hash operation on one target sub-block yields one hash value. Performing hash operations on the n target sub-blocks with one hash algorithm yields n hash values, which form one hash sequence. Therefore, applying m different hash algorithms to the target sub-blocks of the data block yields m hash sequences, each containing n hash values.
4) Obtain the maximum value in each hash sequence, and use the sequence obtained by merging these m maximum values as the similar fingerprint of the data block. Alternatively, obtain the minimum value in each hash sequence, and use the sequence obtained by merging these m minimum values as the similar fingerprint of the data block. Merging the m values (maximum values or minimum values) means arranging them in the order of the m hash algorithms to obtain one sequence. The order of the m hash algorithms can be arbitrary, but once it has been determined, the same fixed order is used to merge the m values when calculating the similar fingerprint of every data block.
Fig. 4 is a schematic diagram of the process of this optional implementation. Fig. 4 is illustrated with the m different hash algorithms being hash algorithms 1, 2, and 3, which are applied to the target sub-blocks to obtain hash sequences 1, 2, and 3. In this optional implementation, the transmitting-end accelerator can obtain the m hash sequences in parallel, which shortens the time taken to calculate the similar fingerprint of a data block.
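Steps 1) to 4) can be sketched as follows. This is a hedged illustration: the 8-byte fixed-length sub-blocks, the stride of 4, and the use of keyed BLAKE2b with three different keys as the "m different hash algorithms" are assumptions made for the example; the text does not name specific hash functions for this step. SHA-1 is used for the ordinary fingerprint because the text lists it as one option.

```python
import hashlib
from typing import Tuple

SUB_BLOCK_SIZE = 8                 # fixed-length sub-blocks of 8 bytes (assumed)
STRIDE = 4                         # every 4th sub-block is a target sub-block
KEYS = (b"h1", b"h2", b"h3")       # m = 3 hash functions, in a fixed order (assumed)


def fingerprint(block: bytes) -> bytes:
    """Ordinary fingerprint: identical blocks give identical values."""
    return hashlib.sha1(block).digest()


def _hash(key: bytes, sub_block: bytes) -> int:
    # Keyed BLAKE2b stands in for one of the m hash algorithms.
    digest = hashlib.blake2b(sub_block, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big")


def similar_fingerprint(block: bytes) -> Tuple[int, ...]:
    # Step 1: cut the block into sub-blocks.
    subs = [block[i:i + SUB_BLOCK_SIZE]
            for i in range(0, len(block), SUB_BLOCK_SIZE)]
    # Step 2: keep sub-blocks 4, 8, 12, ... (fall back to all for tiny blocks).
    targets = subs[STRIDE - 1::STRIDE] or subs or [b""]
    # Steps 3 and 4: one hash sequence per key, keep its maximum, merge in key order.
    return tuple(max(_hash(key, t) for t in targets) for key in KEYS)
```

In this sketch a change confined to a non-target sub-block alters the fingerprint but leaves the similar fingerprint unchanged, which is what lets the receiving end recognise a block as similar rather than identical.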
It should be noted that, because the transmitting-end accelerator processes the data of one unit length as a whole, in actual implementation it does not need to wait until the fingerprints and similar fingerprints of all data blocks of the data to be transmitted have been calculated before performing S104. Instead, once the fingerprints and similar fingerprints of the data blocks in one unit length have been calculated, it can proceed to S104 for that unit length.
It should be understood that the fingerprints of the data to be transmitted can include the fingerprint of each data block in the data to be transmitted, and the similar fingerprints of the data to be transmitted can include the similar fingerprint of each data block in the data to be transmitted.
S104: The transmitting-end accelerator sends the fingerprint and the similar fingerprint of each data block in the unit length to the receiving-end accelerator.
For example, the transmitting-end accelerator builds a first transmission list from the fingerprint and the similar fingerprint of each data block in the unit length, and then sends the first transmission list to the receiving-end accelerator. The first transmission list indicates the fingerprint and the similar fingerprint of each data block in the unit length. This embodiment is described with the transmitting-end accelerator sending the fingerprints and similar fingerprints of one unit length to the receiving-end accelerator in the form of a list; of course, the present application is not limited to this.
Fig. 5 is a schematic diagram of a first transmission list provided by an embodiment of the present application. The first transmission list can include header information and the fingerprint information of each data block in the unit length, arranged in a certain sequence (hereinafter referred to as the first sequence). The header information records summary information about the content carried in the first transmission list and can include, for example but not limited to, the number of data blocks in the unit length and the start position of the fingerprint information of the first data block. The start position of the fingerprint information of the first data block can be information indicating which bit of the first transmission list is the first bit occupied by that fingerprint information. The first sequence can be, for example but not limited to, the sequence of data blocks obtained when variable-length chunking is performed in S102. For example, if the unit length is divided in turn into data block 1, data block 2, data block 3, and so on, the fingerprint information in the first transmission list can be, in turn, the fingerprint information of data block 1, the fingerprint information of data block 2, the fingerprint information of data block 3, and so on. The fingerprint information of a data block includes the fingerprint of the data block and the similar fingerprint of the data block. Fig. 5 is illustrated with one unit length containing data blocks 1 to 512. The arrangement of the fingerprint information in the first transmission list is not limited to the example shown in Fig. 5. For example, it can also be, in turn: the fingerprint of data block 1, the fingerprint of data block 2, the fingerprint of data block 3, ..., the fingerprint of data block 512, the similar fingerprint of data block 1, the similar fingerprint of data block 2, the similar fingerprint of data block 3, ..., the similar fingerprint of data block 512.
S105: After receiving the fingerprint and the similar fingerprint of each data block in the unit length from the transmitting-end accelerator, the receiving-end accelerator determines the class of each data block in the unit length.
The receiving-end accelerator can determine the class of each data block one by one in the order of the first sequence, or determine the classes of at least two data blocks at the same time; the present application does not limit this.
The classes of data blocks include first-class data blocks, second-class data blocks, and third-class data blocks. A first-class data block is a data block that the receiving-end accelerator has already stored. A second-class data block is a data block that the receiving-end accelerator has not stored but for which it stores a similar reference data block. A third-class data block is a data block for which the receiving-end accelerator stores neither the data block itself nor a similar reference data block.
For example, for any data block, as shown in Fig. 6, the class of the data block can be determined through the following steps T1 to T6:
T1: The receiving-end accelerator obtains the fingerprint and the similar fingerprint of the data block.
T2: The receiving-end accelerator judges whether the fingerprint of the data block can be found locally.
If so, the receiving-end accelerator has stored the data block, and T3 is performed.
If not, the receiving-end accelerator has not stored the data block, and T4 is performed.
It should be noted that, in general, when the receiving-end accelerator stores a data block, it also stores the fingerprint of that data block. Judging whether the fingerprint of the data block is stored locally is therefore enough to judge whether the data block is stored locally.
T3: The receiving-end accelerator determines that the data block is a first-class data block.
After T3 is performed, the procedure ends.
T4: The receiving-end accelerator judges whether the similar fingerprint of the data block can be found locally.
If so, the receiving-end accelerator has stored a reference data block similar to the data block, and T5 is performed.
If not, the receiving-end accelerator has not stored a reference data block similar to the data block, and T6 is performed.
It should be noted that, in general, when the receiving-end accelerator stores a data block, it also stores the similar fingerprint of that data block. Judging whether the similar fingerprint of the data block is stored locally is therefore enough to judge whether a reference data block similar to the data block is stored locally.
T5: The receiving-end accelerator determines that the data block is a second-class data block.
After T5 is performed, the procedure ends.
It should be noted that, because different data blocks may have the same similar fingerprint, the receiving-end accelerator may have cached multiple data blocks with the same similar fingerprint. In this case, in T5, the receiving-end accelerator can use one of these data blocks as the reference data block. Alternatively, when caching data blocks, the receiving-end accelerator can cache only one of the data blocks that have the same similar fingerprint, which avoids the situation in which multiple data blocks with the same similar fingerprint are cached in the receiving-end accelerator.
It should also be noted that, after determining that a data block is a second-class data block, the receiving-end accelerator can also obtain the fingerprint of the reference data block similar to the data block, in preparation for performing S106.
T6: The receiving-end accelerator determines that the data block is a third-class data block.
After T6 is performed, the procedure ends.
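The decision flow T1 to T6 can be sketched as a single function. This is a minimal illustration under assumptions: the local state is modelled as a set of stored fingerprints and a dictionary mapping each stored similar fingerprint to the fingerprint of one reference data block, and the names `classify`, `fingerprint_index`, and `similar_index` are hypothetical. The enum values reuse the example class identifiers given in the text for the second transmission list.

```python
from enum import Enum
from typing import Dict, Hashable, Optional, Set, Tuple


class BlockClass(Enum):
    FIRST = "11"     # the data block itself is stored at the receiving end
    SECOND = "10"    # only a similar reference data block is stored
    THIRD = "00"     # neither is stored


def classify(fp: bytes,
             sfp: Hashable,
             fingerprint_index: Set[bytes],
             similar_index: Dict[Hashable, bytes]
             ) -> Tuple[BlockClass, Optional[bytes]]:
    """Return the class and, for a second-class block, the reference fingerprint."""
    if fp in fingerprint_index:          # T2 -> T3
        return BlockClass.FIRST, None
    ref_fp = similar_index.get(sfp)      # T4
    if ref_fp is not None:               # T5: also fetch the reference fingerprint
        return BlockClass.SECOND, ref_fp
    return BlockClass.THIRD, None        # T6
```

Keeping a single reference per similar fingerprint in `similar_index` corresponds to the variant in which only one of several blocks with the same similar fingerprint is cached.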
S106: The receiving-end accelerator feeds back to the transmitting-end accelerator the class of each data block in the unit length and the fingerprint of the reference data block similar to each second-class data block in the unit length.
For example, the receiving-end accelerator builds a second transmission list from the class of each data block in the unit length and sends the second transmission list to the transmitting-end accelerator. The second transmission list indicates the class of each data block in the unit length and the fingerprint of each reference data block.
Fig. 7 is a schematic diagram of a second transmission list provided by an embodiment of the present application. The second transmission list can include header information, the class identifier of each data block in the first sequence, and the fingerprint of the reference data block similar to each second-class data block in the first sequence. The header information records summary information about the content carried in the second transmission list and can include, for example but not limited to, the total length of the class identifiers of the data blocks in the unit length, and the number of entries and the start position of the fingerprints of the reference data blocks of the second-class data blocks. The start position of the fingerprints of the reference data blocks can be information indicating which bit of the second transmission list is the first bit occupied by the fingerprint of the reference data block of the first second-class data block. Fig. 7 is illustrated with the unit length containing 100 second-class data blocks, whose similar reference data blocks are labelled reference data block 1 to reference data block 100. In addition, the class identifier of a first-class data block can be the binary number "11", that of a second-class data block can be the binary number "10", and that of a third-class data block can be the binary number "00"; of course, the present application is not limited to this.
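Since each class identifier is a two-bit value, the identifier field of the second transmission list can be packed four to a byte. The sketch below shows one possible packing; the zero padding of the last byte and the function names are assumptions, since the text does not specify the bit layout beyond the identifier values.

```python
from typing import List


def pack_classes(codes: List[str]) -> bytes:
    """Pack two-bit class identifiers such as "11", "10", "00" into bytes."""
    bits = "".join(codes)
    bits += "0" * (-len(bits) % 8)          # pad the last byte with zero bits
    if not bits:
        return b""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def unpack_classes(packed: bytes, count: int) -> List[str]:
    """Recover the first `count` class identifiers, in the first sequence."""
    bits = "".join(f"{byte:08b}" for byte in packed)
    return [bits[2 * i:2 * i + 2] for i in range(count)]
```

The block count has to travel separately (for example in the header information), because the padding bits are indistinguishable from "00" identifiers.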
S107: After receiving, from the receiving-end accelerator, the class of each data block in the unit length and the fingerprint of the reference data block similar to each second-class data block in the unit length, the transmitting-end accelerator transmits data to the receiving-end accelerator according to the following strategies 1 to 3:
Strategy 1: For a first-class data block, no data is transmitted.
Strategy 2: For a second-class data block, the transmitting-end accelerator judges whether the reference data block similar to the second-class data block can be found locally. If it can, the transmitting-end accelerator performs differential compression on the second-class data block and the reference data block, and sends the data obtained after differential compression to the receiving-end accelerator. If it cannot, the transmitting-end accelerator compresses the second-class data block and sends the information obtained after compression to the receiving-end accelerator. Differential compression can be understood as first calculating the difference data between the second-class data block and its similar reference data block, and then compressing the difference data.
Strategy 3: For a third-class data block, the transmitting-end accelerator compresses the third-class data block and sends the information obtained after compression.
For strategy 1, because the first-class data block is already cached in the receiving-end accelerator, the transmitting-end accelerator does not need to send the data block to the receiving-end accelerator.
For strategy 2, because the reference data block similar to the second-class data block is cached in the receiving-end accelerator, the transmitting-end accelerator only needs to send the difference data between the second-class data block and the reference data block, and the receiving-end accelerator can recover the second-class data block from the difference data and the reference data block. If the transmitting-end accelerator does not store the reference data block, it cannot calculate the difference data; in that case it needs to send the second-class data block itself to the receiving-end accelerator.
For strategy 3, because the receiving-end accelerator stores neither the third-class data block nor a reference data block similar to it, the transmitting-end accelerator needs to send the third-class data block to the receiving-end accelerator.
In addition, the differential compression or compression that the transmitting-end accelerator performs in strategies 2 and 3 further saves bandwidth resources, which reduces the transmission time, that is, speeds up the information transmission. The present application does not limit the algorithms used for differential compression and compression. The differential compression algorithm can be, for example but not limited to, any of the following: X-delta, LZ-delta, and so on. The compression algorithm can be, for example but not limited to, any of the following: Gzip, LZ4, bzip, 7zip, and so on.
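Strategies 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: instead of X-delta or LZ-delta it uses DEFLATE with the reference data block supplied as a preset dictionary (the `zdict` parameter of Python's `zlib`), which has a comparable effect for blocks that share long runs with the reference. The function names and the `local_blocks` mapping (fingerprint to data block) are hypothetical; the returned identifiers follow the example values "11", "10", "01", and "00" given for the third transmission list.

```python
import zlib
from typing import Dict, Optional, Tuple


def delta_compress(block: bytes, reference: bytes) -> bytes:
    """Stand-in for differential compression: DEFLATE primed with the reference."""
    comp = zlib.compressobj(zdict=reference)
    return comp.compress(block) + comp.flush()


def encode_block(block: bytes,
                 block_class: str,
                 ref_fp: Optional[bytes],
                 local_blocks: Dict[bytes, bytes]) -> Tuple[str, bytes]:
    """Return (class identifier to send, payload) for one data block."""
    if block_class == "11":                       # strategy 1: send nothing
        return "11", b""
    if block_class == "10":                       # strategy 2
        reference = local_blocks.get(ref_fp) if ref_fp is not None else None
        if reference is not None:
            return "10", delta_compress(block, reference)
        return "01", zlib.compress(block)         # reference not held locally
    return "00", zlib.compress(block)             # strategy 3
```

The separate "01" identifier tells the receiving end that the payload is ordinary compressed data even though it classified the block as second-class.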
For example, when performing S107, after receiving from the receiving-end accelerator the class of each data block in the unit length and the fingerprint of the reference data block similar to each second-class data block in the unit length, the transmitting-end accelerator can build a third transmission list according to the above strategies and send the third transmission list to the receiving-end accelerator. The third transmission list indicates each piece of information obtained after differential compression (hereinafter referred to as differential compression data) and each piece of information obtained after compression (hereinafter referred to as compressed data).
Fig. 8 is a schematic diagram of a third transmission list provided by an embodiment of the present application. The third transmission list can include header information, the class identifier of each data block in the first sequence, each piece of differential compression data, and each piece of compressed data. The header information records summary information about the content carried in the third transmission list and includes, for example but not limited to, the total length of the class identifiers of the data blocks in the unit length, the number of entries and the start position of the differential compression data, and the number of entries and the start position of the compressed data. The start position of the differential compression data (or the compressed data) can be information indicating which bit of the third transmission list is the first bit occupied by the first piece of differential compression data (or compressed data). Fig. 8 is illustrated with 90 second-class data blocks whose reference data blocks can be found in the transmitting-end accelerator, which yield differential compression data 1 to differential compression data 90 after differential compression. In addition, the class identifier of a first-class data block can be the binary number "11"; that of a second-class data block whose reference data block can be found in the transmitting-end accelerator can be the binary number "10"; that of a second-class data block whose reference data block cannot be found in the transmitting-end accelerator can be the binary number "01"; and that of a third-class data block can be "00". It should be understood that here a second-class data block whose reference data block can be found in the transmitting-end accelerator is marked with the same binary number "10" as the second-class data blocks determined by the receiving-end accelerator in S105; in actual implementation, the two markings can also differ.
S108: After receiving the above information, the receiving-end accelerator performs differential decompression and/or decompression according to the following strategies 4 and 5:
Strategy 4: For a piece of differential compression data, determine the fingerprint of the second-class data block corresponding to the differential compression data. For example, the receiving-end accelerator can locate a piece of differential compression data according to the header information of the third transmission list and determine the fingerprint of the corresponding second-class data block. Then, obtain the reference data block similar to the second-class data block, and perform differential decompression on the reference data block and the differential compression data to obtain the second-class data block.
Strategy 5: For a piece of compressed data, decompress it to obtain a data block. It should be understood that, based on strategies 2 and 3 above, the data block may be a second-class data block or a third-class data block.
S109: The receiving-end accelerator assembles the data blocks obtained in S108 with the first-class data blocks in the unit length to recover the data of the unit length. It should be understood that, after assembling the data of the multiple unit lengths, the receiving-end accelerator can recover the data to be transmitted.
S110: The receiving-end accelerator sends the data to be transmitted to the receiving-end device.
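The recovery in S108 and S109 can be sketched as follows. This is a minimal illustration that assumes the differential compression data was produced by DEFLATE with the reference data block as a preset dictionary (a swapped-in technique, not X-delta or LZ-delta), and that the receiving end has already resolved, for each entry, the fingerprint under which the needed local data block is stored. The entry layout and the name `restore_unit` are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

# One entry per data block, in the first sequence:
# (class identifier, fingerprint of the local block to use, payload).
Entry = Tuple[str, bytes, bytes]


def restore_unit(entries: List[Entry], store: Dict[bytes, bytes]) -> bytes:
    """Rebuild the data of one unit length at the receiving end."""
    blocks = []
    for tag, local_fp, payload in entries:
        if tag == "11":                      # first-class: already stored locally
            blocks.append(store[local_fp])
        elif tag == "10":                    # strategy 4: differential decompression
            d = zlib.decompressobj(zdict=store[local_fp])
            blocks.append(d.decompress(payload) + d.flush())
        else:                                # strategy 5: "01" or "00"
            blocks.append(zlib.decompress(payload))
    return b"".join(blocks)                  # S109: assemble in order
```

Assembling in the order of the first sequence is what allows the class identifiers alone, without explicit offsets per block, to place every recovered data block correctly.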
In the data processing method provided by the embodiments of the present application, the transmitting-end accelerator exchanges information with the receiving-end accelerator to determine that the receiving-end accelerator stores a reference data block similar to one or more data blocks in the data to be transmitted, and then sends the difference data between each such data block and its reference data block to the receiving-end accelerator. Compared with transmitting the entire data block, this saves bandwidth resources, which reduces the transmission time, that is, speeds up the information transmission.
With reference to Fig. 1, the transmitting-end accelerator 2 can include an accelerator agent 21, a first-level cache 22, a second-level cache 23, and an interface 24; the receiving-end accelerator 3 can include an accelerator agent 31, a first-level cache 32, a second-level cache 33, and an interface 34, as shown in Fig. 9. For the connections between the components, refer to Fig. 9.
For any accelerator, the accelerator agent it includes is the control center of the accelerator. For example, with reference to Fig. 2, the cutting step and the variable-length chunking step in S102 and the calculation step in S103, which are performed by the transmitting-end accelerator, can specifically be performed by the accelerator agent 21 in the transmitting-end accelerator. The determining step in S105 and the differential decompression and decompression steps in S108, which are performed by the receiving-end accelerator, can specifically be performed by the accelerator agent 31 in the receiving-end accelerator.
For any accelerator, the interface is the interface for communicating with the WAN. The interface can be based on a proxy protocol and is therefore also called a proxy interface. For example, when one accelerator sends information to another accelerator (as in S104, S106, and S107 above), the accelerator agent of the sending accelerator specifically sends the information through the interface of that accelerator. When one accelerator receives information sent by another accelerator, the accelerator agent of the receiving accelerator specifically receives the information through the interface of that accelerator.
For any accelerator, the first-level cache is a non-persistent medium, such as a cache memory (cache), and the second-level cache is a persistent medium, such as a disk. The first-level cache is used to cache some or all of the data blocks stored in the second-level cache, together with the fingerprint and the similar fingerprint of each of those data blocks. In an optional implementation, for ease of management, the first-level cache can include a data cache, a fingerprint cache, and a similar fingerprint cache, as shown in Fig. 9. The data cache caches some or all of the data blocks stored in the second-level cache, the fingerprint cache caches the fingerprint of each of those data blocks, and the similar fingerprint cache caches the similar fingerprint of each of those data blocks.
For any accelerator, the capacity of the second-level cache is generally greater than that of the first-level cache. For example, the capacity of the second-level cache is 20 GB and the capacity of the first-level cache is 30 KB. Providing a first-level cache in an accelerator improves the efficiency of information lookup and thus the cache performance. Providing a second-level cache in an accelerator increases the cache capacity, which improves the hit rate of information lookup and in turn saves bandwidth resources and reduces the transmission time.
In some embodiments of the present application, when an accelerator (either the transmitting-end accelerator or the receiving-end accelerator) looks up whether a data block is stored in the accelerator (as in T2 or S107 above), it first searches the first-level cache of the accelerator for the fingerprint of the data block; if the fingerprint is not found in the first-level cache, it searches the second-level cache of the accelerator for the data block. The data block can be any data block in the data to be transmitted, or a reference data block similar to any second-class data block in the data to be transmitted. Because data blocks in the first-level cache are hit with a relatively high probability, that is, the data block can usually be found in the first-level cache, this improves the efficiency of information lookup.
In some embodiments of the present application, when the receiving-end accelerator uses the similar fingerprint of a data block to look up whether a reference data block similar to the data block is stored in the receiving-end accelerator, it can first look up whether the similar fingerprint is stored in the first-level cache. If it is found, the data block is determined to be a second-class data block. If the similar fingerprint is not found in the first-level cache, one of, for example but not limited to, the following two implementations can be performed:
In one implementation, if the similar fingerprint is not found in the first-level cache, the receiving-end accelerator continues to look up whether the similar fingerprint is stored in the second-level cache. If it is found, the data block is determined to be a second-class data block; if it is not found, the receiving-end accelerator feeds back to the transmitting-end accelerator information indicating that no reference data block similar to the data block is stored in the receiving-end accelerator. Because data blocks in the first-level cache are hit with a relatively high probability, that is, the reference data block can usually be found in the first-level cache, this improves the efficiency of information lookup.
In another implementation, if the similar fingerprint is not found in the first-level cache, the receiving-end accelerator directly feeds back to the transmitting-end accelerator information indicating that no reference data block similar to the data block is stored in the receiving-end accelerator. It should be noted that, on the one hand, when many similar fingerprints are stored in the second-level cache, looking up a similar fingerprint in the second-level cache usually takes a long time; on the other hand, according to the analysis above, if the similar fingerprint cannot be found in the first-level cache, the probability of finding it in the second-level cache is small. In this case, therefore, the receiving-end accelerator can directly feed back the information indicating that no similar reference data block is stored, which saves the time spent looking up the similar fingerprint.
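The two implementations differ only in whether a miss in the first-level cache falls through to the second-level cache. A minimal sketch, assuming each cache level is modelled as a dictionary from similar fingerprint to the fingerprint of a reference data block (the function and parameter names are hypothetical):

```python
from typing import Dict, Hashable, Optional


def find_reference(sfp: Hashable,
                   level1: Dict[Hashable, bytes],
                   level2: Dict[Hashable, bytes],
                   search_level2: bool = True) -> Optional[bytes]:
    """Return the reference fingerprint, or None to report 'no similar block'."""
    ref_fp = level1.get(sfp)
    if ref_fp is not None:
        return ref_fp                 # hit in the first-level cache
    if search_level2:
        return level2.get(sfp)        # first implementation: keep searching
    return None                       # second implementation: give up at once
```

With `search_level2=False` the block is reported as having no similar reference even if the second-level cache holds one, trading a possible loss of differential compression for a shorter lookup time.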
The information stored in the first-level cache and the second-level cache can be updated. For the first-level cache, the cached data blocks can be recently accessed data blocks and/or the data blocks before and after recently accessed data blocks. For the second-level cache, the cached data blocks can be data blocks whose access count is greater than or equal to a threshold and/or recently accessed data blocks. For either level of cache, when a data block is cached, its fingerprint and similar fingerprint are cached with it; when a data block is deleted from the cache, its fingerprint and similar fingerprint are deleted with it.
For example, after the accelerator agent in the transmitting-end accelerator receives the data to be transmitted from the transmitting-end device and calculates the fingerprint and the similar fingerprint of each data block in the data of a unit length, it caches the data blocks into which the unit length is divided, together with their fingerprints and similar fingerprints, in the first-level cache and the second-level cache. If the free space of either level of cache (the first-level cache or the second-level cache) is not enough to cache these data blocks and their fingerprints and similar fingerprints, the data blocks that were stored into that level of cache earliest and/or were accessed earliest are deleted from that level of cache, together with their fingerprints and similar fingerprints. If the free space of that level of cache is enough, the data blocks and their fingerprints and similar fingerprints are added to that level of cache directly.
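The update rule for one cache level can be sketched as follows. This is an illustration under assumptions: it evicts the least recently accessed data block (one of the options the text allows), counts only the block bytes against the capacity, and keeps the similar fingerprint in the same entry so that it is cached and deleted together with the block. The class name `BlockCache` is hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BlockCache:
    """One cache level: fingerprint -> (data block, similar fingerprint)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: "OrderedDict[bytes, tuple]" = OrderedDict()

    def get(self, fp: bytes) -> Optional[bytes]:
        entry = self.entries.get(fp)
        if entry is None:
            return None
        self.entries.move_to_end(fp)          # mark as most recently accessed
        return entry[0]

    def put(self, fp: bytes, sfp: Hashable, block: bytes) -> None:
        if fp in self.entries:
            self.entries.move_to_end(fp)
            return
        if len(block) > self.capacity:
            return                            # cannot fit even in an empty cache
        while self.used + len(block) > self.capacity:
            # Delete the block accessed earliest, with its fingerprints.
            _, (old_block, _) = self.entries.popitem(last=False)
            self.used -= len(old_block)
        self.entries[fp] = (block, sfp)
        self.used += len(block)
```

For example, with a capacity of 10 bytes and three 5-byte blocks inserted in turn, the block that has gone longest without being accessed is the one removed.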
For another example, in some embodiments of the present application, for any accelerator, the second-level cache includes one or more containers, and each container includes at least two data blocks and the fingerprint and similar fingerprint of each of the at least two data blocks. If the accelerator finds a data block in the second-level cache, it caches the container holding that data block into the first-level cache. The data block may be any data block in the data to be transmitted, or a reference data block similar to any second-class data block in the data to be transmitted. As shown in Figure 10, assume that the second-level cache includes multiple containers (Figure 10 shows containers 1 to 3). If, during a data block lookup (for example, T2 or S107 above), the data block is found in container 1 in the second-level cache, container 1 is cached into the first-level cache. Figure 10 is drawn based on Fig. 9; the first-level cache and second-level cache in Figure 10 may belong to the transmitting-end accelerator or to the receiving-end accelerator. It should be noted that, because the contents of several consecutive data blocks are correlated, it can be assumed that if a data block is hit, the data blocks before and after it are more likely to be hit in the future. On this basis, this embodiment introduces the concept of a "container": at least two data blocks are treated as a set, and if one data block in the set is hit, the other data blocks in the set are considered more likely to be hit in the future. In this way, the subsequent hit rate can be improved.
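The container mechanism can be sketched as follows, under the assumption that each container is a plain mapping from fingerprint to data block and that promotion copies the whole container into the first-level cache. The class `TwoLevelCache` and the `search_l2` switch (which models the alternative of skipping the second-level search) are hypothetical names used only for illustration.

```python
class TwoLevelCache:
    """Hypothetical two-level lookup with container promotion."""

    def __init__(self, containers):
        self.l1 = {}                # first-level cache: fingerprint -> block
        self.l2 = list(containers)  # second-level cache: list of containers
        # Index from fingerprint to the container that holds it.
        self.l2_index = {fp: c for c in self.l2 for fp in c}

    def lookup(self, fingerprint, search_l2=True):
        block = self.l1.get(fingerprint)
        if block is not None:
            return block            # first-level hit
        if not search_l2:
            return None             # alternative: do not search the second level
        container = self.l2_index.get(fingerprint)
        if container is None:
            return None             # miss in both levels
        # Promote the whole container, since neighbouring blocks are
        # assumed likely to be hit next.
        self.l1.update(container)
        return container[fingerprint]
```

After a second-level hit on one block, a later lookup for another block of the same container is served from the first level.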
The foregoing mainly describes the solutions provided in the embodiments of the present application from the perspective of the method. To implement the foregoing functions, the data processing device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such an implementation should not be considered beyond the scope of the present application.
In the embodiments of the present application, the data processing device (which may be the transmitting-end accelerator or the receiving-end accelerator described above) may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a division by logical function; other division manners are possible in an actual implementation.
Figure 11 shows a data processing device 11 provided in an embodiment of the present application. The data processing device 11 may be the transmitting-end accelerator described above, the transmitting-end device, or the transmitting-end device together with the transmitting-end accelerator. The data processing device 11 shown in Figure 11 may include a computing unit 1101, a sending unit 1102, a receiving unit 1103, and a lookup unit 1104. The computing unit 1101 is configured to calculate the similar fingerprint of the data to be transmitted, where the similar fingerprint of the data to be transmitted includes the similar fingerprint of a first data block, and the first data block is a data block in the data to be transmitted. The sending unit 1102 is configured to send the similar fingerprint of the data to be transmitted to a second device; the similar fingerprint of the data to be transmitted is used to look up whether the second device stores a reference data block similar to the data to be transmitted. The receiving unit 1103 is configured to receive the fingerprint of the reference data block sent by the second device, where the fingerprint of the reference data block includes the fingerprint of a first reference data block, and the similar fingerprint of the first reference data block is identical to the similar fingerprint of the first data block. The lookup unit 1104 is configured to find the first reference data block in the data processing device 11 according to the fingerprint of the first reference data block. The sending unit 1102 is further configured to send data to the second device based on the fingerprint of the reference data block, where the data includes the difference data between the first reference data block and the first data block. For example, with reference to Fig. 2, the data processing device 11 may specifically be the transmitting-end accelerator, and the second device may specifically be the receiving-end accelerator. The first data block may be the second-class data block described above. The computing unit 1101 may be used to perform the step of calculating similar fingerprints in S103. The sending unit 1102 may be used to perform the step of sending similar fingerprints in S104 and the step of sending data in S107. The receiving unit 1103 may be used to perform the step of receiving the fingerprint of the reference data block in S106.
In one possible design, the computing unit 1101 may specifically be used to perform a hash operation on the data blocks of the data to be transmitted using a locality-sensitive hashing algorithm, to obtain the similar fingerprints of the data blocks.
In one possible design, the computing unit 1101 may specifically be used to split the data to be transmitted into data blocks. For each data block, the computing unit 1101 performs the following operations: extracting at least one sub-block from the data block; performing a hash operation on the at least one sub-block with each of m hash algorithms, to obtain m hash sequences, where applying one hash algorithm to the at least one sub-block yields one hash sequence, and m is an integer greater than or equal to 2; and merging the maximum values of the m hash sequences and using the merged hash sequence as the similar fingerprint of the data block, or alternatively merging the minimum values of the m hash sequences and using the merged hash sequence as the similar fingerprint of the data block. For example, the computing unit 1101 may specifically be used to perform each step in the process shown in Fig. 4.
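A minimal sketch of this similar-fingerprint computation is given below. Several details are assumptions of the example rather than statements of the application: sub-blocks are taken as overlapping fixed-size windows, the m hash algorithms are modeled as BLAKE2b with m different salts, and "merging" is modeled as byte concatenation of the per-sequence maxima. The function names are hypothetical.

```python
import hashlib


def sub_blocks(block, size=8):
    """Assumed sub-block extraction: overlapping windows of `size` bytes."""
    if len(block) <= size:
        return [block]
    return [block[i:i + size] for i in range(len(block) - size + 1)]


def similar_fingerprint(block, m=4, size=8):
    """Concatenate the maximum of each of the m hash sequences."""
    parts = []
    for k in range(m):
        salt = k.to_bytes(2, "big")  # distinguishes the k-th hash algorithm
        sequence = [
            hashlib.blake2b(sb, digest_size=4, salt=salt).digest()
            for sb in sub_blocks(block, size)
        ]
        parts.append(max(sequence))  # use min(sequence) for the min variant
    return b"".join(parts)
```

Because each maximum depends only on the set of sub-blocks, two blocks that share most of their sub-blocks are likely, though not guaranteed, to produce the same value in each position.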
In one possible design, the data processing device 11 may further include a differential compression unit 1105, configured to perform differential compression on the first reference data block and the first data block using a differential compression algorithm. For example, the differential compression unit 1105 may specifically be used to perform the differential compression step in S107.
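The application does not name a specific differential compression algorithm. As one illustrative stand-in, the sketch below uses DEFLATE with the reference data block as a preset dictionary (Python's `zlib` `zdict` parameter), so that content shared with the reference is encoded as short back-references. The function names are hypothetical, and only about the last 32 KB of the reference is usable as a dictionary.

```python
import zlib


def delta_encode(reference, block):
    """Compress `block` against `reference` (assumed stand-in for differential compression)."""
    compressor = zlib.compressobj(zdict=reference)
    return compressor.compress(block) + compressor.flush()


def delta_decode(reference, delta):
    """Rebuild the block on the receiving side from the same reference block."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(delta) + decompressor.flush()
```

Both ends must hold byte-identical copies of the reference data block, which is why the receiving end returns the reference block's fingerprint rather than its similar fingerprint.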
In one possible design, the similar fingerprint of the data to be transmitted further includes the similar fingerprint of a second data block, where the second data block is another data block in the data to be transmitted; the fingerprint of the reference data block does not include the fingerprint of a second reference data block, where the similar fingerprint of the second reference data block is identical to the similar fingerprint of the second data block; and the data further includes the second data block. The second data block may be the third-class data block described above.
In one possible design, the data processing device 11 further includes a first-level cache and a second-level cache. The first-level cache is a non-persistent medium, the second-level cache is a persistent medium, and the first-level cache is used to cache some or all of the data blocks stored in the second-level cache, together with the fingerprints and similar fingerprints of those data blocks. In this case, the lookup unit 1104 may specifically be used to look up the fingerprint of the first reference data block in the first-level cache and, if the fingerprint of the first reference data block is not found in the first-level cache, to look it up in the second-level cache.
In one possible design, the second-level cache includes one or more containers. Each container is a set consisting of at least two data blocks and the fingerprint and similar fingerprint of each of the at least two data blocks, and the contents of the at least two data blocks in each container are correlated. In this case, the lookup unit 1104 may further be used to cache the container holding a data block into the first-level cache if that data block is found in the second-level cache. For example, the lookup unit 1104 may be used to perform each step in the scenario shown in Fig. 10.
In one possible design, the sending unit 1102 and the receiving unit 1103 may specifically correspond to interface 24 in Fig. 9. Some or all of the computing unit 1101, the lookup unit 1104, and the differential compression unit 1105 may correspond to accelerator agent 21 in Fig. 9.
Figure 12 shows a data processing device 12 provided in an embodiment of the present application. The data processing device 12 may be the receiving-end accelerator described above, the receiving-end device, or the receiving-end accelerator together with the receiving-end device. The data processing device 12 shown in Figure 12 may include a receiving unit 1201, a lookup unit 1202, and a sending unit 1203. The receiving unit 1201 is configured to receive the similar fingerprint of the data to be transmitted sent by a first device, where the similar fingerprint of the data to be transmitted includes the similar fingerprint of a first data block, and the first data block is a data block in the data to be transmitted. The lookup unit 1202 is configured to find, according to the similar fingerprint of the data to be transmitted, a reference data block that is stored in the data processing device 12 and is similar to the data to be transmitted, where the reference data block includes a first reference data block, and the similar fingerprint of the first reference data block is identical to the similar fingerprint of the first data block. The sending unit 1203 is configured to send the fingerprint of the reference data block to the first device, where the fingerprint of the reference data block includes the fingerprint of the first reference data block; the fingerprint of the reference data block is used by the first device to send data to the data processing device 12, and the data includes the difference data between the first reference data block and the first data block. The receiving unit 1201 is further configured to receive the data sent by the first device. For example, with reference to Fig. 2, the data processing device 12 may specifically be the receiving-end accelerator, and the first device may specifically be the transmitting-end accelerator. The first data block may be the second-class data block described above. The receiving unit 1201 may specifically be used to perform the step of receiving similar fingerprints in S104. The sending unit 1203 may specifically be used to perform the step of sending the fingerprint of the reference data block in S106.
In one possible design, the receiving unit 1201 may further be used to receive the fingerprint of the data to be transmitted sent by the first device, where the fingerprint of the data to be transmitted includes the fingerprint of the first data block. In this case, the lookup unit 1202 may further be used to, when it determines according to the fingerprint of the first data block that the first data block is not stored in the data processing device 12, look up according to the similar fingerprint of the first data block whether the first reference data block is stored in the data processing device 12. With reference to Fig. 6, the receiving unit 1201 may be used to perform T2, T4, and so on.
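The receiving-end lookup order can be sketched as follows: the exact fingerprint is checked first, and the similar fingerprint is consulted only if the block itself is not stored. The two index mappings, the function name `classify_block`, and the result labels are hypothetical; in particular, the label "duplicate" for a block that is already stored is an assumption of this example.

```python
def classify_block(fingerprint, similar_fp, fp_index, similar_index):
    """Return (label, reference fingerprint or None) for one incoming block.

    fp_index:      fingerprints of blocks already stored at the receiving end
    similar_index: similar fingerprint -> fingerprint of a stored block
    """
    if fingerprint in fp_index:
        # The identical block is already stored; no payload is needed.
        return ("duplicate", fingerprint)
    reference_fp = similar_index.get(similar_fp)
    if reference_fp is not None:
        # A similar reference block exists; the sender can transmit only
        # the difference data against it.
        return ("similar", reference_fp)
    # Neither the block nor a similar reference is stored; the block
    # must be sent as it is.
    return ("new", None)
```

The fingerprint returned in the "similar" case is what the receiving end would send back so that the transmitting end can locate the same reference data block locally.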
In one possible design, the similar fingerprint of the data to be transmitted includes the similar fingerprint of a second data block, where the second data block is another data block in the data to be transmitted; the fingerprint of the reference data block does not include the fingerprint of a second reference data block, where the similar fingerprint of the second reference data block is identical to the similar fingerprint of the second data block; and the data further includes the second data block. The second data block may be the third-class data block described above.
In one possible design, the receiving unit 1201 and the sending unit 1203 may specifically correspond to interface 34 in Fig. 9. The lookup unit 1202 may correspond to accelerator agent 31 in Fig. 9.
Because the data processing device provided in the embodiments of the present application can be used to perform the foregoing data processing method, for the technical effects it can achieve, refer to the foregoing method embodiments; details are not repeated here.
The steps of the methods or algorithms described in connection with the present disclosure may be implemented in hardware, or by a processing module executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC.
Those skilled in the art will appreciate that, in one or more of the foregoing examples, the functions described in the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The foregoing specific embodiments further describe the objectives, technical solutions, and beneficial effects of the present application in detail. It should be understood that the foregoing is merely specific embodiments of the present application and is not intended to limit the protection scope of the present application.

Claims (22)

1. A data processing method, characterized in that the method comprises:
a first device calculating a similar fingerprint of data to be transmitted, wherein the similar fingerprint of the data to be transmitted includes a similar fingerprint of a first data block, and the first data block is a data block in the data to be transmitted;
the first device sending the similar fingerprint of the data to be transmitted to a second device, wherein the similar fingerprint of the data to be transmitted is used to look up whether the second device stores a reference data block similar to the data to be transmitted;
the first device receiving a fingerprint of the reference data block sent by the second device, wherein the fingerprint of the reference data block includes a fingerprint of a first reference data block, and the similar fingerprint of the first reference data block is identical to the similar fingerprint of the first data block;
the first device finding the first reference data block in the first device according to the fingerprint of the first reference data block;
the first device sending data to the second device based on the fingerprint of the reference data block, wherein the data includes difference data between the first reference data block and the first data block.
2. The method according to claim 1, characterized in that the first device calculating the similar fingerprint of the data to be transmitted comprises:
the first device performing a hash operation on the data blocks of the data to be transmitted using a locality-sensitive hashing algorithm, to obtain the similar fingerprints of the data blocks.
3. The method according to claim 2, characterized in that the first device performing a hash operation on the data blocks of the data to be transmitted using a locality-sensitive hashing algorithm, to obtain the similar fingerprints of the data blocks, comprises:
the first device splitting the data to be transmitted into data blocks; for each data block, the first device performing the following operations:
extracting at least one sub-block from the data block;
performing a hash operation on the at least one sub-block with each of m hash algorithms, to obtain m hash sequences, wherein performing a hash operation on the at least one sub-block with one hash algorithm yields one hash sequence, and m is an integer greater than or equal to 2;
merging the maximum values of the m hash sequences, and using the hash sequence obtained after merging as the similar fingerprint of the data block; or merging the minimum values of the m hash sequences, and using the hash sequence obtained after merging as the similar fingerprint of the data block.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
the first device performing differential compression on the first reference data block and the first data block using a differential compression algorithm.
5. The method according to any one of claims 1 to 4, characterized in that the similar fingerprint of the data to be transmitted further includes a similar fingerprint of a second data block, the second data block being another data block in the data to be transmitted; the fingerprint of the reference data block does not include a fingerprint of a second reference data block; the similar fingerprint of the second reference data block is identical to the similar fingerprint of the second data block; and the data further includes the second data block.
6. The method according to any one of claims 1 to 5, characterized in that the first device includes a first-level cache and a second-level cache, the first-level cache is a non-persistent medium, the second-level cache is a persistent medium, and the first-level cache is used to cache some or all of the data blocks stored in the second-level cache and the fingerprints and similar fingerprints of those data blocks; the method further comprises:
the first device looking up the fingerprint of the first reference data block in the first-level cache; and if the fingerprint of the first reference data block is not found in the first-level cache, looking up the fingerprint of the first reference data block in the second-level cache.
7. The method according to claim 6, characterized in that the second-level cache includes one or more containers, each container is a set consisting of at least two data blocks and the fingerprint and similar fingerprint of each of the at least two data blocks, and the contents of the at least two data blocks in each container are correlated; the method further comprises:
if the first device finds a data block in the second-level cache, caching the container holding the data block into the first-level cache.
8. A data processing method, characterized in that the method comprises:
a second device receiving a similar fingerprint of data to be transmitted sent by a first device, wherein the similar fingerprint of the data to be transmitted includes a similar fingerprint of a first data block, and the first data block is a data block in the data to be transmitted;
the second device finding, according to the similar fingerprint of the data to be transmitted, a reference data block that is stored in the second device and is similar to the data to be transmitted, wherein the reference data block includes a first reference data block, and the similar fingerprint of the first reference data block is identical to the similar fingerprint of the first data block;
the second device sending a fingerprint of the reference data block to the first device, wherein the fingerprint of the reference data block includes a fingerprint of the first reference data block; the fingerprint of the reference data block is used by the first device to send data to the second device, and the data includes difference data between the first reference data block and the first data block;
the second device receiving the data sent by the first device.
9. The method according to claim 8, characterized in that the method further comprises:
the second device receiving a fingerprint of the data to be transmitted sent by the first device, wherein the fingerprint of the data to be transmitted includes a fingerprint of the first data block;
when the second device determines, according to the fingerprint of the first data block, that the first data block is not stored in the second device, the second device looking up, according to the similar fingerprint of the first data block, whether the first reference data block is stored in the second device.
10. The method according to claim 8 or 9, characterized in that the similar fingerprint of the data to be transmitted includes a similar fingerprint of a second data block, the second data block being another data block in the data to be transmitted; the fingerprint of the reference data block does not include a fingerprint of a second reference data block; the similar fingerprint of the second reference data block is identical to the similar fingerprint of the second data block; and the data further includes the second data block.
11. A data processing device, characterized in that the device comprises:
a computing unit, configured to calculate a similar fingerprint of data to be transmitted, wherein the similar fingerprint of the data to be transmitted includes a similar fingerprint of a first data block, and the first data block is a data block in the data to be transmitted;
a sending unit, configured to send the similar fingerprint of the data to be transmitted to a second device, wherein the similar fingerprint of the data to be transmitted is used to look up whether the second device stores a reference data block similar to the data to be transmitted;
a receiving unit, configured to receive a fingerprint of the reference data block sent by the second device, wherein the fingerprint of the reference data block includes a fingerprint of a first reference data block, and the similar fingerprint of the first reference data block is identical to the similar fingerprint of the first data block;
a lookup unit, configured to find the first reference data block in the device according to the fingerprint of the first reference data block;
the sending unit being further configured to send data to the second device based on the fingerprint of the reference data block, wherein the data includes difference data between the first reference data block and the first data block.
12. The device according to claim 11, characterized in that the computing unit is specifically configured to perform a hash operation on the data blocks of the data to be transmitted using a locality-sensitive hashing algorithm, to obtain the similar fingerprints of the data blocks.
13. The device according to claim 12, characterized in that the computing unit is specifically configured to:
split the data to be transmitted into data blocks; for each data block, the computing unit performs the following operations:
extracting at least one sub-block from the data block;
performing a hash operation on the at least one sub-block with each of m hash algorithms, to obtain m hash sequences, wherein performing a hash operation on the at least one sub-block with one hash algorithm yields one hash sequence, and m is an integer greater than or equal to 2;
merging the maximum values of the m hash sequences, and using the hash sequence obtained after merging as the similar fingerprint of the data block; or merging the minimum values of the m hash sequences, and using the hash sequence obtained after merging as the similar fingerprint of the data block.
14. The device according to any one of claims 11 to 13, characterized in that the device further comprises:
a differential compression unit, configured to perform differential compression on the first reference data block and the first data block using a differential compression algorithm.
15. The device according to any one of claims 11 to 14, characterized in that the similar fingerprint of the data to be transmitted further includes a similar fingerprint of a second data block, the second data block being another data block in the data to be transmitted; the fingerprint of the reference data block does not include a fingerprint of a second reference data block; the similar fingerprint of the second reference data block is identical to the similar fingerprint of the second data block; and the data further includes the second data block.
16. The device according to any one of claims 11 to 15, characterized in that the device further includes a first-level cache and a second-level cache, the first-level cache is a non-persistent medium, the second-level cache is a persistent medium, and the first-level cache is used to cache some or all of the data blocks stored in the second-level cache and the fingerprints and similar fingerprints of those data blocks;
the lookup unit is specifically configured to look up the fingerprint of the first reference data block in the first-level cache and, if the fingerprint of the first reference data block is not found in the first-level cache, to look up the fingerprint of the first reference data block in the second-level cache.
17. The device according to claim 16, characterized in that the second-level cache includes one or more containers, each container is a set consisting of at least two data blocks and the fingerprint and similar fingerprint of each of the at least two data blocks, and the contents of the at least two data blocks in each container are correlated;
the lookup unit is further configured to, if a data block is found in the second-level cache, cache the container holding the data block into the first-level cache.
18. A data processing device, characterized in that the device comprises:
a receiving unit, configured to receive a similar fingerprint of data to be transmitted sent by a first device, wherein the similar fingerprint of the data to be transmitted includes a similar fingerprint of a first data block, and the first data block is a data block in the data to be transmitted;
a lookup unit, configured to find, according to the similar fingerprint of the data to be transmitted, a reference data block that is stored in the device and is similar to the data to be transmitted, wherein the reference data block includes a first reference data block, and the similar fingerprint of the first reference data block is identical to the similar fingerprint of the first data block;
a sending unit, configured to send a fingerprint of the reference data block to the first device, wherein the fingerprint of the reference data block includes a fingerprint of the first reference data block; the fingerprint of the reference data block is used by the first device to send data to the device, and the data includes difference data between the first reference data block and the first data block;
the receiving unit being further configured to receive the data sent by the first device.
19. The device according to claim 18, characterized in that:
the receiving unit is further configured to receive a fingerprint of the data to be transmitted sent by the first device, wherein the fingerprint of the data to be transmitted includes a fingerprint of the first data block;
the lookup unit is further configured to, when it determines according to the fingerprint of the first data block that the first data block is not stored in the device, look up according to the similar fingerprint of the first data block whether the first reference data block is stored in the device.
20. The device according to claim 18 or 19, characterized in that the similar fingerprint of the data to be transmitted includes a similar fingerprint of a second data block, the second data block being another data block in the data to be transmitted; the fingerprint of the reference data block does not include a fingerprint of a second reference data block; the similar fingerprint of the second reference data block is identical to the similar fingerprint of the second data block; and the data further includes the second data block.
21. A data processing device, characterized by comprising a memory and a processor, wherein the memory is configured to store a computer program, and when the computer program is executed by the processor, the method according to any one of claims 1 to 10 is performed.
22. A computer-readable storage medium storing a computer program, characterized in that, when the computer program runs on a computer, the method according to any one of claims 1 to 10 is performed.
CN201711167866.1A 2017-11-21 2017-11-21 Data processing method and equipment Active CN108134775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711167866.1A CN108134775B (en) 2017-11-21 2017-11-21 Data processing method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711167866.1A CN108134775B (en) 2017-11-21 2017-11-21 Data processing method and equipment

Publications (2)

Publication Number Publication Date
CN108134775A true CN108134775A (en) 2018-06-08
CN108134775B CN108134775B (en) 2020-10-09

Family

ID=62388793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711167866.1A Active CN108134775B (en) 2017-11-21 2017-11-21 Data processing method and equipment

Country Status (1)

Country Link
CN (1) CN108134775B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109309670A (en) * 2018-09-07 2019-02-05 深圳市网心科技有限公司 Data stream method and system, electronic device and computer readable storage medium
CN109710502A (en) * 2018-12-19 2019-05-03 苏州科达科技股份有限公司 Log transmission method, apparatus and storage medium
CN111064471A (en) * 2018-10-16 2020-04-24 阿里巴巴集团控股有限公司 Data processing method and device and electronic equipment
WO2021012162A1 (en) * 2019-07-22 2021-01-28 华为技术有限公司 Method and apparatus for data compression in storage system, device, and readable storage medium
CN112416694A (en) * 2019-08-20 2021-02-26 中国电信股份有限公司 Information processing method, system, client and computer readable storage medium
WO2021121042A1 (en) * 2019-12-18 2021-06-24 华为技术有限公司 Data storage method in storage system and related device
WO2022001548A1 (en) * 2020-06-30 2022-01-06 华为技术有限公司 Data transmission method, system, apparatus, device, and medium
CN114662160A (en) * 2022-05-25 2022-06-24 成都易我科技开发有限责任公司 Digital summarization method, system and digital summarization method in network transmission

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833486A (en) * 2010-04-07 2010-09-15 山东高效能服务器和存储研究院 Method for designing remote backup and recovery system
CN102185889A (en) * 2011-03-28 2011-09-14 北京邮电大学 Data deduplication method based on internet small computer system interface (iSCSI)
CN102495894A (en) * 2011-12-12 2012-06-13 成都市华为赛门铁克科技有限公司 Method, device and system for searching repeated data
CN103020174A (en) * 2012-11-28 2013-04-03 华为技术有限公司 Similarity analysis method, device and system
CN103858125A (en) * 2013-12-17 2014-06-11 华为技术有限公司 Repeating data processing methods, devices, storage controller and storage node

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
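廖海生 (Liao Haisheng): "Research on a Data Disaster Recovery *** Based on Data Deduplication Technology", China Master's Theses Full-text Database, Information Science and Technology *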

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109309670B (en) * 2018-09-07 2021-02-12 深圳市网心科技有限公司 Data stream decoding method and system, electronic device and computer readable storage medium
CN109309670A (en) * 2018-09-07 2019-02-05 深圳市网心科技有限公司 Data stream method and system, electronic device and computer readable storage medium
CN111064471A (en) * 2018-10-16 2020-04-24 阿里巴巴集团控股有限公司 Data processing method and device and electronic equipment
CN111064471B (en) * 2018-10-16 2023-04-11 阿里巴巴集团控股有限公司 Data processing method and device and electronic equipment
CN109710502B (en) * 2018-12-19 2022-06-14 苏州科达科技股份有限公司 Log transmission method, device and storage medium
CN109710502A (en) * 2018-12-19 2019-05-03 苏州科达科技股份有限公司 Log transmission method, apparatus and storage medium
WO2021012162A1 (en) * 2019-07-22 2021-01-28 华为技术有限公司 Method and apparatus for data compression in storage system, device, and readable storage medium
CN112416694A (en) * 2019-08-20 2021-02-26 中国电信股份有限公司 Information processing method, system, client and computer readable storage medium
WO2021121042A1 (en) * 2019-12-18 2021-06-24 华为技术有限公司 Data storage method in storage system and related device
EP4068071A4 (en) * 2019-12-18 2023-01-25 Huawei Technologies Co., Ltd. Data storage method in storage system and related device
US11755207B2 (en) 2019-12-18 2023-09-12 Huawei Technologies Co., Ltd. Data storage method in storage system and related device
WO2022001548A1 (en) * 2020-06-30 2022-01-06 华为技术有限公司 Data transmission method, system, apparatus, device, and medium
CN114662160A (en) * 2022-05-25 2022-06-24 成都易我科技开发有限责任公司 Digital summarization method, system and digital summarization method in network transmission

Also Published As

Publication number Publication date
CN108134775B (en) 2020-10-09

Similar Documents

Publication Publication Date Title
CN108134775A (en) A kind of data processing method and equipment
US8924687B1 (en) Scalable hash tables
US6754799B2 (en) System and method for indexing and retrieving cached objects
US8344916B2 (en) System and method for simplifying transmission in parallel computing system
CN110191428B (en) Data distribution method based on intelligent cloud platform
CN108255647B (en) High-speed data backup method under samba server cluster
WO2019024780A1 (en) Light-weight processing method for blockchain, and blockchain node and storage medium
CN103116615B (en) A kind of data index method and server based on version vector
CN109508334B (en) For the data compression method of block chain database, access method and system
CN106407224A (en) Method and device for file compaction in KV (Key-Value)-Store system
CN107046812A (en) A kind of data save method and device
US20120246125A1 (en) Duplicate file detection device, duplicate file detection method, and computer-readable storage medium
CN104584524A (en) Aggregating data in a mediation system
CN109445702A (en) A kind of piece of grade data deduplication storage
US9667737B2 (en) Publisher-assisted, broker-based caching in a publish-subscription environment
CN104618304A (en) Data processing method and data processing system
CN105407096A (en) Message data detection method based on stream management
CN109189759A (en) Method for reading data, data query method, device and equipment in KV storage system
US8868584B2 (en) Compression pattern matching
CN110245129A (en) Distributed global data deduplication method and device
CN103399943A (en) Communication method and communication device for parallel query of clustered databases
CN103609091B (en) Method and device for data transmission
CN114625805B (en) Return test configuration method, device, equipment and medium
CN110489380A (en) A kind of data processing method, device and equipment
CN106202303B (en) A kind of Chord routing table compression method and optimization file search method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant