CN106201771A - Data-storage system and data read-write method - Google Patents
- Publication number
- CN106201771A (CN201510226830.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data storage system comprising a central node and deduplication nodes. The central node assigns each bucket (Bucket) to a corresponding deduplication node according to a preset policy, creates a routing table from the correspondence between Buckets and deduplication nodes, and synchronizes the routing table to every deduplication node. Each deduplication node stores, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks that the fingerprint information represents. This achieves global deduplicated storage management of raw data at the 100 PB scale and above and of fingerprint information at the 100 TB scale and above.
Description
Technical field
The invention belongs to the field of Internet technology and relates in particular to a data storage method, a data storage system, a data read-write method, a client for reading and writing data, and a system for reading and writing data.
Background technology
In recent years, the volume of data that Internet companies need to back up and archive has grown explosively. For cost reasons, tape has always been the main storage medium for backup and archiving systems and for virtual machine primary storage systems. However, tape places high demands on the storage environment and has a short lifespan, so data generally has to be dumped onto new tapes every 4-5 years. Once the number of tapes reaches tens of thousands or even hundreds of thousands, this re-dumping work becomes a nightmare.
As disk technology has developed, the capacity of a single disk has reached 6 TB or even 8 TB, and its price per unit of capacity is gradually approaching that of tape. The advantage of disk over tape is that it supports random access, which makes data deduplication practical. Combining disks with data deduplication can greatly reduce the cost of backup and archiving.
Existing commercial deduplication products, such as the EMC DD990, the HP StoreOnce B6200 appliance, and the SEPATON DeltaStor software, essentially operate in single-node mode and have very limited scalability: the maximum usable capacity is 1.6 PB and the maximum throughput is 31 TB/h (8.8 GB/s). In neither capacity nor performance can they meet the storage needs of Internet companies.
To address the shortcomings of single-node mode, "A scalable distributed data deduplication system supporting massive data backup" from the Institute of Computing Technology, Chinese Academy of Sciences, tackles both the scalability and the deduplication efficiency of the deduplication system. It proposes a distributed Bloom filter (bloomfilter) for routing data to the deduplication nodes of a distributed deduplication system, proposes a sampling-based fingerprint query mechanism to speed up fingerprint queries, and implements the distributed deduplication system 3D-deduper. This is referred to below as scheme one.
EMC has also developed a cluster deduplication storage system on the basis of its single-node product. Its approach is to add several backup servers that split the data stream into chunks, compute the fingerprint of each data block, pack the blocks into a super chunk, and route the super chunk to a particular deduplication node for processing according to a routing policy. This is referred to below as scheme two.
Strictly speaking, neither of the two schemes above is a distributed system; both are cluster systems. The basic idea of a cluster system is to balance task load across multiple reliable single nodes. The basic idea of a distributed system is to distribute data across multiple unreliable single nodes (when the data is evenly distributed, task load balancing follows naturally) and to ensure reliability by means such as multiple replicas or check codes.
In both schemes the fingerprint database of the cluster system is decentralized. Although measures are taken to route a previously seen data block and its fingerprint to the deduplication node that handled it before, it is difficult to avoid routing the block to a deduplication node that has never processed it, where it is mistaken for a new data block and stored again. Scheme one in particular uses a sampling-based sparse index for fingerprints, which further increases the probability of misjudging a data block. For a system at the 100 PB scale, even a misjudgment rate of only 2% is unacceptable.
Summary of the invention
In view of this, the application provides a data storage method, a data storage system, a data read-write method, a client for reading and writing data, and a system for reading and writing data, which solve the technical problem that, in a large-scale deduplication system, decentralized fingerprint information leads to a high probability of misjudgment.
To solve the above technical problem, the application discloses a data storage method applied to a data storage system that includes a central node and deduplication nodes. The data storage method includes: the central node assigns each bucket (Bucket) to a corresponding deduplication node according to a preset policy; the central node creates a routing table from the correspondence between Buckets and deduplication nodes and synchronizes the routing table to every deduplication node; and each deduplication node stores, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks that the fingerprint information represents.
The deduplication node storing, according to the routing table, the fingerprint information corresponding to each assigned Bucket and the data blocks that the fingerprint information represents includes: the deduplication node creates a corresponding container (Container) file for each assigned Bucket; the deduplication node saves the corresponding fingerprint information in each assigned Bucket and saves the data blocks that the fingerprint information represents in the Container file corresponding to that Bucket.
The deduplication node judges whether the size of the Container file exceeds a preset threshold; when it does, the deduplication node archives the Container file to a background server.
The central node assigning each Bucket to a corresponding deduplication node according to the preset policy includes: the central node assigns each Bucket to multiple corresponding deduplication nodes and determines, among them, one primary node and at least one secondary node.
The central node judges whether each deduplication node is available and whether a new deduplication node has been added. When a deduplication node is judged unavailable, or a new deduplication node has been added, the central node reassigns the Buckets, updates the routing table, and synchronizes it to every deduplication node; the deduplication nodes then migrate data according to the updated routing table.
The deduplication nodes migrating data according to the updated routing table includes: the primary node initiates the data migration according to the updated routing table.
When a deduplication node is judged unavailable, the central node reassigning the Buckets includes: when the primary node is judged unavailable, the central node selects a new primary node from the at least one secondary node. In that case, the deduplication nodes migrating data according to the updated routing table includes: the newly selected primary node initiates the data migration according to the updated routing table.
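The failover rule above can be illustrated with a minimal sketch. It assumes the routing table is a plain dictionary from Bucket number to an ordered list of node names, with index 0 as the primary; the function name `handle_node_failure` and the integer version counter are hypothetical. The sketch only promotes a secondary and bumps the version; it does not model the data migration that restores the lost replica.

```python
from typing import Dict, List, Tuple


def handle_node_failure(
    routing: Dict[int, List[str]], version: int, failed: str
) -> Tuple[Dict[int, List[str]], int]:
    """Drop the failed node from every replica list.

    Index 0 of each list is treated as the primary, so when a primary
    is removed the first surviving secondary takes its place.
    """
    updated = {
        bucket: [node for node in replicas if node != failed]
        for bucket, replicas in routing.items()
    }
    # A new version lets nodes and clients detect that the table changed.
    return updated, version + 1


routing = {0: ["D", "A"], 1: ["A", "B"]}
new_routing, new_version = handle_node_failure(routing, version=1, failed="D")
```

In this example node A, formerly the secondary for Bucket 0, becomes its primary and would be the node that initiates migration.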
Each deduplication node includes a fingerprint information database. The fingerprint information database is a cuckoo hash map stored on a solid-state drive, and contains the fingerprint information corresponding to each Bucket of the deduplication node and the storage information of the data blocks that the fingerprint information represents.
M cuckoo hash maps run simultaneously on the solid-state drive, and N cuckoo hash functions are used simultaneously, where M × N = 128.
32 cuckoo hash maps run simultaneously on the solid-state drive, and 4-way cuckoo hash functions are used simultaneously.
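As a rough illustration of multi-way cuckoo hashing, the sketch below keeps a single in-memory table with four candidate slots per key. It is not the on-SSD layout described above and does not model the partition into 32 maps; the class name `CuckooMap`, the use of SHA-256 to derive slot positions, the kick limit, and the overflow stash are all assumptions made for the example.

```python
import hashlib


class CuckooMap:
    """Tiny in-memory N-way cuckoo hash map with an overflow stash."""

    def __init__(self, slots=64, ways=4, max_kicks=32):
        self.slots = slots
        self.ways = ways
        self.max_kicks = max_kicks
        self.table = [None] * slots
        self.stash = {}  # holds entries that could not be placed

    def _candidates(self, key):
        # One candidate slot per hash function (salted SHA-256 here).
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big")
            % self.slots
            for i in range(self.ways)
        ]

    def get(self, key):
        for pos in self._candidates(key):
            entry = self.table[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return self.stash.get(key)

    def put(self, key, value):
        if key in self.stash:
            self.stash[key] = value
            return
        for pos in self._candidates(key):
            entry = self.table[pos]
            if entry is not None and entry[0] == key:
                self.table[pos] = (key, value)  # update in place
                return
        entry = (key, value)
        for kick in range(self.max_kicks):
            cands = self._candidates(entry[0])
            for pos in cands:
                if self.table[pos] is None:
                    self.table[pos] = entry
                    return
            # All candidates are full: evict one occupant and re-place it.
            victim = cands[kick % self.ways]
            entry, self.table[victim] = self.table[victim], entry
        self.stash[entry[0]] = entry[1]


fingerprints = CuckooMap()
keys = [f"fp{i}".encode() for i in range(40)]
for i, key in enumerate(keys):
    fingerprints.put(key, i)
```

A lookup touches at most `ways` slots (plus the stash), which is the property that makes cuckoo hashing attractive when each probe is a flash read.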
To solve the above technical problem, the application also discloses a data read-write method, including: splitting data into multiple data blocks and computing the fingerprint information of each data block; determining the Bucket corresponding to the fingerprint information of each data block; determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket; sending a fingerprint query request, which includes the fingerprint information of the data blocks, to the deduplication node corresponding to the Bucket; receiving the fingerprint information that was not found, as returned by the deduplication node corresponding to the Bucket; and uploading the fingerprint information that was not found, together with the data blocks it represents, to the deduplication node corresponding to the Bucket.
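The write path above can be sketched end to end with in-memory stand-ins. Everything concrete here is an assumption for illustration: fixed-size chunking with `CHUNK_SIZE`, SHA-1 as the fingerprint, eight Buckets, one node per Bucket, and the `DedupNode` class with its `query` and `upload` methods.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4      # unrealistically small, for demonstration only
TOTAL_BUCKETS = 8   # assumed Bucket count


class DedupNode:
    """In-memory stand-in for a deduplication node."""

    def __init__(self):
        self.store = {}

    def query(self, fingerprints):
        # Return only the fingerprints this node has never seen.
        return [fp for fp in fingerprints if fp not in self.store]

    def upload(self, fp, block):
        self.store[fp] = block


def write(data, routing, nodes):
    """Split, fingerprint, route, query, then upload only unknown blocks."""
    blocks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    by_node = defaultdict(dict)
    order = []  # fingerprints in split order, as a mapping file would hold
    for block in blocks:
        fp = hashlib.sha1(block).digest()
        order.append(fp)
        bucket = int.from_bytes(fp, "big") % TOTAL_BUCKETS
        by_node[routing[bucket]][fp] = block
    uploaded = 0
    for name, pending in by_node.items():
        for fp in nodes[name].query(list(pending)):
            nodes[name].upload(fp, pending[fp])
            uploaded += 1
    return order, uploaded


nodes = {"A": DedupNode(), "B": DedupNode()}
routing = {b: ("A" if b % 2 == 0 else "B") for b in range(TOTAL_BUCKETS)}
order, first = write(b"abcdabcdefgh", routing, nodes)
_, second = write(b"abcdabcdefgh", routing, nodes)
```

Because a given fingerprint always maps to the same Bucket and therefore the same node, the second write finds every block already present and uploads nothing.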
Determining the Bucket corresponding to the fingerprint information of each data block includes: taking the fingerprint information modulo the total number of Buckets and determining the Bucket corresponding to the fingerprint information from the result of the modulo operation.
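A minimal sketch of this modulo mapping follows. The text does not name the fingerprint algorithm, so SHA-1 is an assumption, as are the function names and the choice to interpret the fingerprint bytes as a big-endian integer.

```python
import hashlib


def fingerprint(block: bytes) -> bytes:
    # SHA-1 is assumed here; the source does not specify the hash.
    return hashlib.sha1(block).digest()


def bucket_for(fp: bytes, total_buckets: int) -> int:
    # Treat the fingerprint as a big-endian integer and take it
    # modulo the total number of Buckets.
    return int.from_bytes(fp, "big") % total_buckets


bucket = bucket_for(fingerprint(b"example block"), 1024)
```

Since the mapping depends only on the fingerprint and the Bucket count, any client computes the same Bucket for the same block without consulting a server.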
The method further includes: once all the fingerprint information that was not found and the data blocks it represents have been uploaded, uploading the mapping file of the data to the deduplication nodes. The mapping file includes the fingerprint information of each data block of the data, arranged in the order in which the data blocks were split.
Uploading the mapping file of the data to the deduplication nodes includes: splitting the mapping file into multiple data blocks and computing the hash value of each data block of the mapping file; determining the Bucket corresponding to the hash value of each data block of the mapping file; determining, according to the routing table, the deduplication node corresponding to that Bucket; and uploading each data block of the mapping file and its hash value to that deduplication node.
Splitting the mapping file into multiple data blocks includes: splitting off the header of the mapping file as the first of the data blocks. The header of the mapping file includes information such as the total size of the mapping file and the total number of data blocks.
The method further includes: obtaining the mapping file of the data from the deduplication nodes; obtaining each data block of the data from the deduplication nodes according to the fingerprint information of each data block in the mapping file; and splicing the data back together in the order in which the fingerprint information of the data blocks appears in the mapping file.
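The read path can be sketched as follows, assuming the mapping file is simply an ordered list of fingerprints and that `fetch` is any callable returning the block for a fingerprint. SHA-1, the function names, and the dictionary used as a block store are illustrative assumptions.

```python
import hashlib


def build_mapping(blocks):
    """Ordered fingerprints, one per block, in split order."""
    return [hashlib.sha1(block).digest() for block in blocks]


def restore(mapping, fetch):
    """Fetch each block by fingerprint and splice in mapping order."""
    return b"".join(fetch(fp) for fp in mapping)


blocks = [b"ab", b"cd", b"ab"]
mapping = build_mapping(blocks)
# Deduplicated store: the repeated block b"ab" is kept only once.
store = {hashlib.sha1(block).digest(): block for block in blocks}
restored = restore(mapping, store.__getitem__)
```

The mapping keeps one entry per original block, so repeated blocks are restored correctly even though the store holds a single copy.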
Obtaining the mapping file of the data from the deduplication nodes includes: obtaining each data block of the mapping file from the deduplication nodes according to the name of the mapping file and the data block sequence number; and splicing the data blocks of the mapping file into the mapping file of the data.
Determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket includes: when storing data for the first time, obtaining the routing table from the central node; and determining the deduplication node corresponding to the Bucket according to that routing table.
Determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket further includes: sending a request packet to the deduplication node corresponding to the Bucket; receiving the response packet returned by that deduplication node, the response packet including the version information of the routing table; and judging whether the version information of the routing table in the response packet is the same as that of the routing table obtained from the central node. When the two are the same, the deduplication node corresponding to the Bucket is determined according to the routing table obtained from the central node. When they differ, the updated routing table is obtained from the central node and the deduplication node corresponding to the Bucket is re-determined according to the updated routing table.
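The version check above might look like the sketch below on the client side. The `RoutingTable` dataclass, the integer version field, and the `fetch_latest` callable standing in for a request to the central node are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class RoutingTable:
    version: int
    bucket_to_node: Dict[int, str]


def resolve_node(
    bucket: int,
    cached: RoutingTable,
    node_version: int,
    fetch_latest: Callable[[], RoutingTable],
) -> Tuple[str, RoutingTable]:
    """Return the node for a Bucket, refreshing a stale cached table.

    node_version is the routing table version reported in the
    deduplication node's response packet.
    """
    if node_version != cached.version:
        cached = fetch_latest()  # ask the central node for the new table
    return cached.bucket_to_node[bucket], cached


old = RoutingTable(version=1, bucket_to_node={0: "A"})
new = RoutingTable(version=2, bucket_to_node={0: "B"})
same_node, _ = resolve_node(0, old, node_version=1, fetch_latest=lambda: new)
moved_node, table = resolve_node(0, old, node_version=2, fetch_latest=lambda: new)
```

The client only contacts the central node when a version mismatch shows its cached table is out of date, which keeps the central node off the normal read-write path.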
To solve the above technical problem, the application also discloses a data read-write method, including: the central node sends a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes; a deduplication node receives a fingerprint query request from the client, the request including the fingerprint information corresponding to the Buckets assigned to that deduplication node; the deduplication node queries the fingerprint information and returns the fingerprint information that was not found to the client; and the deduplication node receives the fingerprint information that was not found, together with the data blocks it represents, uploaded by the client.
The method further includes: the deduplication node saves the fingerprint information that was not found in the assigned Bucket, saves the data block in the Container file corresponding to the assigned Bucket, and returns to the client a message that the data block was saved successfully.
Before the deduplication node returns the save-success message to the client, the method further includes: the deduplication node backs up the fingerprint information that was not found and the data blocks it represents to the secondary node.
The method further includes: the deduplication node saves the data blocks of the mapping file uploaded by the client and their corresponding hash values.
Saving the data blocks of the mapping file uploaded by the client and their corresponding hash values includes: saving the data block of the mapping file in the Container file corresponding to the corresponding Bucket; and saving the hash value of the data block of the mapping file and first storage information in the corresponding Bucket.
The first storage information includes: the name of the Container file that holds the data block of the mapping file, the offset of the data block of the mapping file within that Container file, and the size of the data block of the mapping file.
The method further includes: the deduplication node receives the client's request for the data blocks of the mapping file; the deduplication node sends the data blocks of the mapping file to the client; the deduplication node receives the client's request for the data block represented by each piece of fingerprint information in the mapping file; and the deduplication node sends the data block represented by each piece of fingerprint information to the client.
The deduplication node sending the data block represented by each piece of fingerprint information to the client includes: the deduplication node determines second storage information of the data block from the fingerprint information, the second storage information including the name of the Container file that holds the data block, the offset of the data block within the Container file, and the size of the data block; the deduplication node judges from the name of the Container file whether the Container file has been archived to the background server; when the Container file has been archived to the background server, the deduplication node obtains the data block from the background server according to the offset of the data block within the Container file and the size of the data block, and sends it to the client; when the Container file is still stored locally, the deduplication node obtains the data block locally according to the same offset and size, and sends it to the client.
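The lookup by Container name, offset, and size can be sketched as below. The `StorageInfo` record mirrors the three fields named in the text; the two dictionaries standing in for local disk and the background server, and the membership test used to decide whether a Container has been archived, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StorageInfo:
    container: str  # name of the Container file holding the block
    offset: int     # byte offset of the block inside the Container
    size: int       # block size in bytes


def read_block(
    info: StorageInfo, local: Dict[str, bytes], archived: Dict[str, bytes]
) -> bytes:
    """Read a block from the archive if its Container was archived,
    otherwise from local storage."""
    source = archived if info.container in archived else local
    data = source[info.container]
    return data[info.offset:info.offset + info.size]


local = {"bucket-3.ctr": b"helloworld"}
archived = {"bucket-9.ctr": b"0123456789"}
local_block = read_block(StorageInfo("bucket-3.ctr", 5, 5), local, archived)
remote_block = read_block(StorageInfo("bucket-9.ctr", 2, 3), local, archived)
```

Keeping the same offset and size valid after archiving means the fingerprint record does not have to be rewritten when a Container moves to the background server.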
The central node sending the routing table to the client includes: when the client stores data for the first time, the central node receives a routing table request from the client and sends the routing table to the client.
The central node sending the routing table to the client further includes: the deduplication node receives a request packet from the client; the deduplication node sends a response packet to the client, the response packet including the version information of the routing table held by the deduplication node; when the version information of the routing table held by the client is inconsistent with that of the routing table held by the deduplication node, the central node receives a routing table request from the client and sends the updated routing table to the client.
The deduplication node querying the fingerprint information and returning the fingerprint information that was not found to the client includes: the deduplication node uses a Bloom filter to judge whether the fingerprint information exists; when the Bloom filter reports that the fingerprint information does not exist, the fingerprint information is determined to be fingerprint information that was not found; when the Bloom filter reports that the fingerprint information exists, the deduplication node looks it up in the fingerprint information database; when the fingerprint information is found in the fingerprint information database, it is determined to exist; when it is not found in the fingerprint information database, it is determined to be fingerprint information that was not found.
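The two-stage lookup above can be sketched with a small Bloom filter in front of a dictionary that stands in for the fingerprint information database. The filter size, the number of hash functions, the salted SHA-256 hashing, and all names are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.array[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def is_known(fp, bloom, fingerprint_db):
    # A negative Bloom answer is definitive, so the database is skipped.
    if not bloom.might_contain(fp):
        return False
    # A positive answer may be a false positive, so confirm in the database.
    return fp in fingerprint_db


bloom = BloomFilter()
bloom.add(b"fp-1")
bloom.add(b"fp-ghost")  # present in the filter but not in the database
fingerprint_db = {b"fp-1": ("bucket-3.ctr", 0, 4096)}
```

Only fingerprints that pass the filter cost a database lookup, so new blocks, which are the common case for fresh data, are usually rejected without touching the SSD.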
To solve the above technical problem, the application also discloses a data storage system, including a central node and one or more deduplication nodes. The central node is configured to assign each bucket (Bucket) to a corresponding deduplication node according to a preset policy, to create a routing table from the correspondence between Buckets and deduplication nodes, and to synchronize the routing table to every deduplication node. Each deduplication node is configured to store, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks that the fingerprint information represents.
To solve the above technical problem, the application also discloses a client for reading and writing data, including: a splitting and computing module, configured to split data into multiple data blocks and compute the fingerprint information of each data block; a Bucket determination module, configured to determine the Bucket corresponding to the fingerprint information of each data block; a node determination module, configured to determine, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket; a request sending module, configured to send a fingerprint query request, which includes the fingerprint information of the data blocks, to the deduplication node corresponding to the Bucket; an information receiving module, configured to receive the fingerprint information that was not found, as returned by the deduplication node corresponding to the Bucket; and a data upload module, configured to upload the fingerprint information that was not found, together with the data blocks it represents, to the deduplication node corresponding to the Bucket.
To solve the above technical problem, the application also discloses a system for reading and writing data, including a central node and deduplication nodes. The central node is configured to send a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes. Each deduplication node is configured to receive a fingerprint query request from the client, the request including the fingerprint information corresponding to the Buckets assigned to that deduplication node; to query the fingerprint information and return the fingerprint information that was not found to the client; and to receive the fingerprint information that was not found, together with the data blocks it represents, uploaded by the client.
Compared with the prior art, the application can achieve technical effects including the following: global deduplicated storage management of raw data at the 100 PB scale and above and of fingerprint information at the 100 TB scale and above; and very high scalability, since after a new deduplication node is added to the system the central node can redistribute data according to the preset policy and the deduplication nodes complete the data migration automatically, so that the performance and capacity of the system can be expanded easily.
Of course, a product implementing the application need not achieve all of the above technical effects at the same time.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their description explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic structural diagram of a data storage system (a system for reading and writing data) according to an embodiment of the application;
Fig. 2 is a schematic diagram of a routing table according to an embodiment of the application;
Fig. 3 is a schematic flowchart of a data read-write method according to an embodiment of the application;
Fig. 4 is a schematic flowchart of a data read-write method according to an embodiment of the application;
Fig. 5 is a schematic structural diagram of a client for reading and writing data according to an embodiment of the application.
Detailed description of the invention
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented accordingly.
Fig. 1 shows the data storage system (hereinafter "the system") provided by an embodiment of the application. It includes a central node 10 and deduplication nodes 11, and the central node 10 is coupled to the deduplication nodes 11. Within the system, the central node 10 is responsible for the distributed management of the multiple deduplication nodes 11 and for data distribution and replica management. The deduplication nodes 11 are responsible for managing and saving the data blocks, their fingerprint information, and their storage information, and complete data replication and migration under the distributed management of the central node 10. The deduplication nodes 11 have an abstract storage engine layer, so new storage engines can be added easily.
The system manages data blocks in units of buckets (Bucket). A Bucket is a logical concept in the system. Each Bucket is assigned a Bucket number, which is associated with the fingerprint information of each data block through a preset hash algorithm, so that data blocks are stored by Bucket number and a correspondence is established between data blocks and storage files. The central node of the system manages globally, through Buckets, the data blocks held by the system and their fingerprint information.
The central node assigns each Bucket to a corresponding deduplication node according to a preset policy. The preset policy may be a load-balancing policy: for example, the central node obtains load data from each deduplication node, determines each node's current load from the real-time changes in that data, and preferentially assigns Buckets to nodes whose current load is lower, so that balanced data distribution balances the load across deduplication nodes. The preset policy may also be a location security policy: for example, the central node assigns different Buckets to different deduplication nodes according to the confidentiality of the data or the permissions of the client, so that data of different confidentiality levels, or from clients with different permissions, is saved on different deduplication nodes.
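One simple way to realize the load-balancing policy described above is a greedy assignment that always gives the next Bucket to the least-loaded node. This is only a sketch under assumptions: load is modeled as a single integer per node that grows by one per assigned Bucket, and the function name `assign_buckets` is hypothetical.

```python
import heapq
from typing import Dict


def assign_buckets(num_buckets: int, node_loads: Dict[str, int]) -> Dict[int, str]:
    """Greedily assign each Bucket to the currently least-loaded node."""
    heap = [(load, name) for name, load in node_loads.items()]
    heapq.heapify(heap)
    assignment = {}
    for bucket in range(num_buckets):
        load, name = heapq.heappop(heap)
        assignment[bucket] = name
        # Assume each assigned Bucket adds one unit of load to the node.
        heapq.heappush(heap, (load + 1, name))
    return assignment


assignment = assign_buckets(4, {"A": 0, "B": 2})
```

Here node A starts idle and node B already carries two units of load, so A receives three of the four Buckets and both nodes end at the same load.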
After assigning each Bucket to its deduplication node, the central node establishes the correspondence between Buckets and deduplication nodes from the Bucket numbers and the deduplication node identifiers, and creates the routing table from this correspondence. The routing table can be regarded as a mapping table that records the mapping between Buckets and deduplication nodes. Fig. 2 is an example of the routing table in an embodiment of the application: the numbers in the horizontal header are Bucket numbers, the numbers in the vertical header are the replica identifiers of a Bucket, and the letters in the table denote different deduplication nodes. As shown in Fig. 2, for Bucket 0, replica 0 is assigned to deduplication node D and replica 1 to deduplication node A; for Bucket 1, replica 0 is assigned to deduplication node A and replica 1 to deduplication node B. Fig. 2 only illustrates the routing table and does not limit the scope of protection of the application: the system may contain any number of deduplication nodes, each deduplication node may be assigned multiple Buckets, and each Bucket may have one or more backups, with the backups located on different deduplication nodes.
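The two Buckets described for Fig. 2 can be written down as a small data structure. The `RoutingTable` class, its field names, and the `version` field are assumptions for illustration; only the Bucket-to-node entries come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoutingTable:
    version: int = 1
    # replicas[bucket_number][replica_id] -> deduplication node identifier
    replicas: Dict[int, List[str]] = field(default_factory=dict)

    def node_for(self, bucket: int, replica_id: int = 0) -> str:
        return self.replicas[bucket][replica_id]

    def buckets_on(self, node: str) -> List[int]:
        """All Buckets that have any replica on the given node."""
        return sorted(b for b, nodes in self.replicas.items() if node in nodes)


# Entries taken from the Fig. 2 description: Bucket 0 -> D, A; Bucket 1 -> A, B.
table = RoutingTable(replicas={0: ["D", "A"], 1: ["A", "B"]})
```

A deduplication node can use something like `buckets_on` to work out which Buckets it must hold after the table is synchronized to it.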
After creating the routing table, the central node synchronizes it to every deduplication node. Each deduplication node determines from the routing table which Buckets are assigned to it, and stores the fingerprint information corresponding to those Buckets and the data blocks that the fingerprint information represents. To do so, the deduplication node creates a container (Container) file for each assigned Bucket, saves the corresponding fingerprint information in the Bucket, and saves the data blocks that the fingerprint information represents in the Container file corresponding to the Bucket. The correspondence between fingerprint information and Buckets is obtained by taking the fingerprint information modulo the total number of Buckets in the system; the result gives the Bucket number for that fingerprint information. This calculation is usually performed by the client that stores data in the system. As more fingerprint information accumulates for a Bucket, more data blocks are saved in the corresponding Container file and the file takes up more storage space. To ensure that each deduplication node can hold replicas of multiple Buckets, and to limit the load on the deduplication node, when the size of the Container file for a Bucket exceeds a preset threshold the deduplication node archives that Container file to the background server 12. As shown in Fig. 1, each deduplication node is coupled to the background server 12; when the deduplication node later receives further data blocks for that Bucket, it stores them in the Container file located on the background server 12.
When assigning each Bucket to deduplication nodes according to the preset policy, the central node may assign each Bucket to multiple deduplication nodes, so that each Bucket has multiple replicas in the system, and give each replica a different replica identifier. For example, in the routing table shown in Fig. 2, the central node assigns each Bucket to two deduplication nodes, and the replicas of each Bucket on the two nodes carry the replica identifiers 0 and 1 respectively.
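The replica assignment described above can be sketched as follows. This is a minimal illustration, not the patented policy: the text does not say what the preset policy is, so round-robin placement is assumed here, and the name build_routing_table and the node labels are hypothetical.

```python
from typing import Dict, List


def build_routing_table(num_buckets: int, nodes: List[str],
                        replicas: int = 2) -> Dict[int, Dict[int, str]]:
    """Map each Bucket number to {replica identifier: node}.

    Replicas of one Bucket always land on distinct nodes.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    table: Dict[int, Dict[int, str]] = {}
    for bucket in range(num_buckets):
        # Assumed policy: replica r goes to the r-th node after the start node.
        table[bucket] = {r: nodes[(bucket + r) % len(nodes)]
                         for r in range(replicas)}
    return table


routing_table = build_routing_table(num_buckets=4, nodes=["A", "B", "C"])
```

With two replicas per Bucket, each entry holds replica identifiers 0 and 1, matching the example of Fig. 2.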
When the central node assigns a Bucket to multiple deduplication nodes, it designates one primary node and at least one backup node among them. The central node may do so by replica identifier: among the replica identifiers of each Bucket, one is designated the primary replica identifier and the others are backup replica identifiers. For example, the replica with identifier 0 is the primary replica of each Bucket and replicas with other identifiers are backup replicas. Among all deduplication nodes, the node holding the replica of a Bucket with identifier 0 is then the primary node of that Bucket, and the nodes holding its other replicas are its backup nodes. This prevents the data of a Bucket from becoming impossible to store, read or write when one deduplication node is unavailable. Each replica of a Bucket includes the fingerprint information corresponding to that Bucket and the Container file holding the data blocks that the fingerprint information represents.
The central node can determine whether each deduplication node is available and whether a new deduplication node has joined the system; it does so through heartbeat messages exchanged with the deduplication nodes. When the central node determines that a deduplication node is unavailable, or that a new deduplication node has joined, it reassigns the Buckets, which changes the mapping between Buckets and deduplication nodes in the system. When a deduplication node becomes unavailable, the central node reassigns the Buckets of that node to other deduplication nodes according to the preset policy; when a new deduplication node joins, the central node redistributes the Buckets in the system according to the preset policy. In both cases the central node updates the routing table according to the changed mapping and synchronizes the updated routing table to each deduplication node. Because the deduplication nodes to which Buckets are assigned have changed, the deduplication nodes in the system migrate data according to the updated routing table.
This data migration is initiated by the primary node of each Bucket whose assigned deduplication nodes have changed. For example, suppose that in the routing table replica 0 (the primary replica) of Bucket 1 moves from deduplication node A to deduplication node B. Node A then initiates the migration of replica 0 of Bucket 1 to node B, and node B checks the routing table to see whether the backup nodes of Bucket 1 have also changed. If they have, for example from deduplication node D to deduplication node E, the data of Bucket 1 is backed up again to node E, and the replica of Bucket 1 on node E carries a backup replica identifier. After migration completes, each deduplication node may delete, according to the updated routing table, the data of any Bucket that is no longer mapped to it. When the deduplication node serving as the primary node of a Bucket becomes unavailable, the central node selects a new primary node from among the backup nodes of that Bucket, and the new primary node initiates the data migration for the Bucket according to the updated routing table. For example, when deduplication node A, the primary node of Bucket 1, becomes unavailable, the central node selects node B from the backup nodes B and C of Bucket 1 as the new primary node of Bucket 1. The replica identifier of the replica of Bucket 1 on node B becomes the primary replica identifier (for example 0). If the backup nodes of Bucket 1 in the updated routing table are nodes C and D, node B backs up the data of Bucket 1 to node D.
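The primary fail-over in the last example can be sketched as below. The selection rule is an assumption: the text only says the central node picks one of the backup nodes, so this sketch simply promotes the backup with the lowest replica identifier and appends a spare node as the new backup. The names fail_over, PRIMARY_ID and spare are hypothetical.

```python
from typing import Dict

PRIMARY_ID = 0  # replica identifier that marks the primary replica


def fail_over(replicas: Dict[int, str], failed: str, spare: str) -> Dict[int, str]:
    """Return the new {replica identifier: node} map after the primary fails."""
    if replicas[PRIMARY_ID] != failed:
        raise ValueError("this sketch only handles failure of the primary node")
    backups = [replicas[i] for i in sorted(replicas) if i != PRIMARY_ID]
    if not backups:
        raise ValueError("no backup node available to promote")
    # Assumed rule: promote the first backup, keep the rest, add the spare.
    new_nodes = [backups[0]] + backups[1:] + [spare]
    return {replica_id: node for replica_id, node in enumerate(new_nodes)}


updated = fail_over({0: "A", 1: "B", 2: "C"}, failed="A", spare="D")
```

Here node B takes replica identifier 0 and nodes C and D become the backup nodes, so node B would then back up the data of the Bucket to node D.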
A fingerprint information store is built inside each deduplication node. It holds the fingerprint information corresponding to each Bucket on the node and the storage information of the data blocks that the fingerprint information represents. The store may take the form of a Key-Value Store, with the fingerprint information as the Key and the storage information of the data block it represents as the Value. Reading and writing data in the system involves a large number of fingerprint queries and comparisons. Each deduplication node uses a Bloom filter to absorb part of the query load, but because a Bloom filter can report a fingerprint as present when it is not (a false match caused by hash collisions), a large number of requests must still be resolved by the fingerprint information store. The read performance required of the fingerprint information store (Key-Value Store) is therefore very high.
For a small Key-Value Store, the Key-Value pairs (Key-Value Pair) can be kept on an ordinary hard disk in a preset format, such as a log structure, with an index built in memory to locate the Key-Value Pairs on disk quickly. This system, however, is intended for data storage at the 100PB level and above, where the volume of fingerprint information and storage information is very large (100PB of raw data corresponds to about 50TB of fingerprint information and storage information), so the index can no longer be built in the memory of the device. The inventors therefore recognized that a hash table holding all Key-Value Pairs of the fingerprint information store can be implemented entirely on a solid state drive (Solid State Drive, SSD).
The hash table stored on the solid state drive is a cuckoo hash table. The deduplication node first checks fingerprints against the Bloom filter, whose answers can be wrong because of hash collisions, and cuckoo hashing is one way of handling hash collisions. Its basic idea is to use two different hash functions to compute two candidate positions for a Key: (1) if both positions are free, one of them is chosen for insertion; (2) if only one position is free, the Key is inserted there; (3) if neither position is free, one of the two is chosen at random and the Key in it is kicked out; the position given by the other hash value of the kicked-out Key is then computed, and the kicked-out Key is inserted there if that position is empty, or else the Key in that position is kicked out in turn, and so on until a free position is found. This procedure can clearly loop forever, so a maximum number of attempts is normally set, and the hash table is considered full when that maximum is reached. The inventors chose cuckoo hashing because the number of I/O operations needed to query a Key is generally a constant.
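The three insertion cases and the kick-out loop can be sketched in memory as follows. This is a simplified illustration with one entry per position and two hash functions, not the SSD-resident table of the application. Deriving the two hash functions from SHA-1 with a one-byte prefix, the pending list that keeps an entry left without a position, and all class and method names are assumptions made for the sketch; duplicate Keys are not handled.

```python
import hashlib
import random
from typing import List, Optional, Tuple

Entry = Tuple[bytes, object]


class CuckooTable:
    def __init__(self, size: int, max_kicks: int = 32, seed: int = 0) -> None:
        self.slots: List[Optional[Entry]] = [None] * size
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.pending: List[Entry] = []  # entries displaced when the table is "full"

    def _pos(self, key: bytes, which: int) -> int:
        # Two hash functions obtained from one base hash with different prefixes.
        digest = hashlib.sha1(bytes([which]) + key).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def get(self, key: bytes) -> Optional[object]:
        # A lookup touches at most two positions, whatever the table size.
        for which in (0, 1):
            entry = self.slots[self._pos(key, which)]
            if entry is not None and entry[0] == key:
                return entry[1]
        for stored_key, value in self.pending:
            if stored_key == key:
                return value
        return None

    def insert(self, key: bytes, value: object) -> bool:
        first, second = self._pos(key, 0), self._pos(key, 1)
        for pos in (first, second):          # cases (1) and (2)
            if self.slots[pos] is None:
                self.slots[pos] = (key, value)
                return True
        pos = self.rng.choice((first, second))  # case (3): pick a victim at random
        entry: Optional[Entry] = (key, value)
        for _ in range(self.max_kicks):
            entry, self.slots[pos] = self.slots[pos], entry  # place entry, evict occupant
            if entry is None:
                return True
            a, b = self._pos(entry[0], 0), self._pos(entry[0], 1)
            pos = b if pos == a else a       # evicted Key moves to its other position
        self.pending.append(entry)           # limit reached: table treated as full
        return False
```

A real store would resize or rehash when insert returns False; the pending list here only ensures that the sketch never silently drops an entry.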
A plain cuckoo hash table reaches a utilization of only 49%, so two main variants of cuckoo hashing are normally used: 1) increasing the number of hash functions; 2) increasing the number of Keys each position can hold. Both variants raise the utilization of the cuckoo hash table. The inventors of the present application chose the murmur2 hash function as the base hash function; by using different seeds, the same Key produces different hash values.
Solid state drives (SSD) based on the NVMe (Non-Volatile Memory Express) protocol use a 4K page (Page) as their basic unit, so the fingerprint information store always reads and writes in units of 4K. A Key-Value Pair in the fingerprint information store is 256 bytes, so one 4K Page holds 16 fingerprints. Each position of the cuckoo hash table therefore holds 16 Key-Value Pairs. Key-Value Pairs are written into a Page in insertion order and are not sorted by Key; leaving them unordered avoids the cost of sorting on the solid state drive. According to the inventors' experiments, an asynchronous mode with a concurrency of 128 (queue depth × number of jobs = 128) fully exploits the IOPS (Input/Output Operations Per Second, the number of read/write (I/O) operations per second) capacity of NVMe (450K). To generate that much concurrency, the inventors optimized in two respects: 1. running multiple Key-Value Stores in cuckoo hash table form on one NVMe drive; 2. using multiple cuckoo hash functions on one NVMe drive together with asynchronous reads. In addition, the number of cuckoo hash tables running on each solid state drive multiplied by the number of cuckoo hash functions must equal 128. The inventors found that as the number of cuckoo hash functions grows, the QPS (queries per second, Query Per Second) of a single cuckoo hash table drops markedly: going from 4-way to 8-way cuckoo hashing halves the QPS. When there are too few cuckoo hash functions, on the other hand, the space utilization of the table drops markedly. Weighing performance against space utilization, the inventors chose 4-way cuckoo hashing, whose space utilization reaches 98.66%. Thirty-two cuckoo hash tables therefore need to run on one NVMe drive. A further benefit of splitting the store into multiple hash tables is finer-grained locking of the fingerprint information store.
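The sizing arithmetic above, and the linear scan implied by storing unsorted pairs in a Page, can be checked with a short sketch. The 20-byte Key width (the length of a SHA-1 digest) and the layout of a pair as Key followed by Value are assumptions; the application does not specify the on-disk layout, and find_in_page is a hypothetical helper.

```python
from typing import Optional

PAGE_SIZE = 4096           # bytes per SSD page (4K)
PAIR_SIZE = 256            # bytes per Key-Value Pair
TARGET_CONCURRENCY = 128   # queue depth x number of jobs
HASH_WAYS = 4              # cuckoo hash functions per table

PAIRS_PER_PAGE = PAGE_SIZE // PAIR_SIZE             # 16 pairs per position
TABLES_PER_DISK = TARGET_CONCURRENCY // HASH_WAYS   # 32 tables per NVMe drive

KEY_SIZE = 20  # assumed: Key is a 20-byte fingerprint at the start of each pair


def find_in_page(page: bytes, key: bytes) -> Optional[bytes]:
    """Scan the unsorted pairs of one page and return the Value for key."""
    for start in range(0, len(page), PAIR_SIZE):
        pair = page[start:start + PAIR_SIZE]
        if pair[:KEY_SIZE] == key:
            return pair[KEY_SIZE:]
    return None
```

Scanning at most 16 pairs in memory is cheap compared with the single page read that fetched them, which is why leaving pairs unsorted costs little.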
The process by which a client reads and writes data with the data-storage system described above is explained further below. When a client writes data to the system, as shown in Fig. 3, the process comprises the following steps.
In step S201, the client splits the data into multiple data blocks and computes the fingerprint information of each data block.
A hash algorithm with a low collision rate, such as SHA-1 or MD5, is used to compute the hash value of each data block, which serves as its fingerprint information.
In step S202, the client determines the Bucket corresponding to the fingerprint information of each data block.
The client takes the fingerprint information of the data block modulo the total number of Buckets in the system and matches the result against the Bucket numbers to determine the Bucket corresponding to the fingerprint information. For example, if the hash value of a data block is a and the total number of Buckets in the system is p, the modulo operation a % p is performed; if the result is 2, the fingerprint information of the data block corresponds to Bucket 2.
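Steps S201 and S202 can be sketched together as below. Fixed-size splitting is an assumption, since the application does not say how the data is cut, and SHA-1 is one of the two algorithms the text names. The function names are hypothetical.

```python
import hashlib
from typing import Iterator, Tuple


def split_blocks(data: bytes, block_size: int) -> Iterator[bytes]:
    """Cut data into fixed-size blocks (assumed splitting policy)."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def fingerprint(block: bytes) -> bytes:
    """SHA-1 digest of the block, used as its fingerprint information."""
    return hashlib.sha1(block).digest()


def bucket_for(fp: bytes, total_buckets: int) -> int:
    """Read the fingerprint as an integer a and return a % p."""
    return int.from_bytes(fp, "big") % total_buckets


def route_blocks(data: bytes, block_size: int,
                 total_buckets: int) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (Bucket number, fingerprint, block) in splitting order."""
    for block in split_blocks(data, block_size):
        fp = fingerprint(block)
        yield bucket_for(fp, total_buckets), fp, block
```

Because identical blocks have identical fingerprints, they always map to the same Bucket and therefore to the same deduplication node, which is what makes the deduplication global.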
In step S203, the client determines the deduplication node corresponding to the Bucket according to the routing table obtained from the central node.
The client uses its stored routing table to determine the deduplication node holding the Bucket that corresponds to the fingerprint information. The first time the client writes data to the system, it first requests the routing table from the central node. For example, suppose the fingerprint information of a data block corresponds to Bucket 2, and in the routing table Bucket 2 is assigned to deduplication nodes A and B, where node A is the primary node of Bucket 2 and node B is its backup node. The fingerprint information of the data block must then be sent to node A for the fingerprint query.
In step S204, the client sends a fingerprint query request to the deduplication node corresponding to the Bucket; the fingerprint query request includes the fingerprint information of the data block.
The client includes read threads, a send thread and a logic processing thread. Multiple read threads are each responsible for splitting a different part of the data into blocks and computing the fingerprint information of the blocks, which they then buffer in query request queues. Each read thread has multiple query request queues, each corresponding to one Bucket number, so that the fingerprint information belonging to the same Bucket is buffered in the same queue. When the data in a query request queue exceeds a certain amount, or the delay allowed for the queue expires, the read thread places the query request in a buffer of the send thread.
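The per-Bucket queues with a size limit and a delay limit can be sketched as follows. This is a single-threaded illustration of the batching rule only; the thread hand-off to the send thread is replaced by a callback. The class QueryBatcher, its parameters and the injectable clock are hypothetical, and the sketch flushes when the queue reaches the limit rather than when it exceeds it.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class QueryBatcher:
    def __init__(self, max_items: int, max_delay: float,
                 send: Callable[[int, List[bytes]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_items = max_items
        self.max_delay = max_delay
        self.send = send            # stands in for the send thread's buffer
        self.clock = clock
        self.queues: DefaultDict[int, List[bytes]] = defaultdict(list)
        self.oldest: Dict[int, float] = {}  # arrival time of each queue's first item

    def add(self, bucket: int, fp: bytes) -> None:
        """Buffer one fingerprint in the queue of its Bucket."""
        queue = self.queues[bucket]
        if not queue:
            self.oldest[bucket] = self.clock()
        queue.append(fp)
        if len(queue) >= self.max_items:
            self._flush(bucket)

    def poll(self) -> None:
        """Flush every queue whose oldest item has waited max_delay or longer."""
        now = self.clock()
        for bucket in list(self.queues):
            if self.queues[bucket] and now - self.oldest[bucket] >= self.max_delay:
                self._flush(bucket)

    def _flush(self, bucket: int) -> None:
        batch = self.queues.pop(bucket)
        self.oldest.pop(bucket, None)
        self.send(bucket, batch)
```

Batching by Bucket means each request packet goes to a single deduplication node, and the delay limit keeps a sparsely used Bucket from holding fingerprints back indefinitely.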
The send thread sends a request packet to the deduplication node holding the Bucket (the primary node of the Bucket) that corresponds to each query request queue. In one embodiment, the send thread has four buffers. Two of them hold requests being transmitted to the system, one for fingerprint query requests and one for data block upload requests; the other two receive new requests from the other threads of the client, again one for fingerprint query requests and one for data block upload requests. Using two kinds of buffer separates the requests being transmitted to the system from the new requests arriving from other threads, so that other threads do not block for long while writing new requests.
When the send thread receives a response packet from a deduplication node, it passes the packet to the logic processing thread for processing. Based on the fingerprint query result returned by the deduplication node, the logic processing thread passes to the send thread an upload request for the fingerprint information that was not found and the data blocks it represents, and the send thread transmits them to the corresponding deduplication node. This division of labor among threads keeps the sending of requests continuous and smooth.
The response packet received by the send thread includes the version information of the routing table currently stored by the deduplication node. The client checks whether this version is the same as the version of the routing table it obtained from the central node. If the two differ, the central node has updated the routing table and synchronized it to the deduplication nodes in the system. The client then obtains the updated routing table from the central node through the send thread, uses it to redetermine the deduplication node corresponding to the Bucket, and thereby redetermines the deduplication node to which the fingerprint information that was not found, and the data blocks it represents, should be uploaded. If the two versions are the same, the client continues to use the routing table obtained from the central node to determine the deduplication node corresponding to the Bucket, and the upload target is unchanged.
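The version check can be sketched as below. The RoutingTable structure, the integer version field and the fetch_latest callback standing in for a request to the central node are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class RoutingTable:
    version: int
    primary: Dict[int, str]  # Bucket number -> primary deduplication node


def upload_target(bucket: int, local: RoutingTable, response_version: int,
                  fetch_latest: Callable[[], RoutingTable]) -> Tuple[RoutingTable, str]:
    """Return the routing table to keep and the node to upload to.

    If the version reported in the response packet differs from the local
    one, the table is fetched again before the target node is chosen.
    """
    if response_version != local.version:
        local = fetch_latest()
    return local, local.primary[bucket]
```

The check piggybacks on every query response, so the client learns about a routing change without polling the central node.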
In step S205, the deduplication node queries the fingerprint information and returns the fingerprint information that was not found to the client.
A deduplication node contains a Bloom filter and a fingerprint information store. The Bloom filter holds a hash index of all fingerprint information currently stored on the node; the fingerprint information store holds, as Key-Value Pairs, all fingerprint information and the storage information of the data blocks it represents. The deduplication node passes each fingerprint in the fingerprint query request through the Bloom filter and then the fingerprint information store. The Bloom filter computes the hash index of the fingerprint and checks whether it matches a hash index already in the filter. If it does not match, the node is certain to hold neither the fingerprint information nor the data block it represents. If it does match, the fingerprint cannot yet be taken to exist, because hash collisions allow the Bloom filter to give false matches; the node must go on to query the fingerprint information store. If the store contains the fingerprint information, it exists; if not, it does not exist. Querying the Bloom filter first makes fingerprint queries on the node more efficient, and the fingerprint information store then compensates for the errors the Bloom filter may make because of hash collisions, which makes the queries more accurate. The deduplication node places all fingerprint information in the request that was not found into a response packet and returns it to the client. The response packet also includes the version information of the routing table currently stored by the deduplication node, which the client uses to decide whether it needs to update its routing table.
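The two-stage lookup can be sketched as follows. The Bloom filter here is a minimal textbook version (one byte per bit, positions derived from SHA-1 with a one-byte prefix), and a plain dict stands in for the fingerprint information store; both are simplifications, and the names BloomFilter and find_missing are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List


class BloomFilter:
    def __init__(self, bits: int = 65536, hashes: int = 4) -> None:
        self.bits = bits
        self.hashes = hashes
        self.cells = bytearray(bits)  # one byte per bit, for simplicity

    def _positions(self, item: bytes) -> Iterator[int]:
        for which in range(self.hashes):
            digest = hashlib.sha1(bytes([which]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.cells[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # False is definitive; True may be a false match.
        return all(self.cells[pos] for pos in self._positions(item))


def find_missing(fingerprints: Iterable[bytes], bloom: BloomFilter,
                 store: Dict[bytes, object]) -> List[bytes]:
    """Return the fingerprints the node does not hold, in request order."""
    missing = []
    for fp in fingerprints:
        # The store is consulted only when the filter reports a possible match.
        if not bloom.might_contain(fp) or fp not in store:
            missing.append(fp)
    return missing
```

Fingerprints rejected by the filter never reach the store, so only possible matches cost a read against the SSD-resident table.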
In step S206, the client uploads the fingerprint information that was not found, and the data blocks it represents, to the deduplication node corresponding to the Bucket.
The client uploads the fingerprint information listed in the response packet as not found, together with the data blocks it represents, to the deduplication node holding the corresponding Bucket. If the version of the routing table has not changed, that node is the same deduplication node that performed the fingerprint query in step S205. The other fingerprint information in the query request already exists on the deduplication node and need not be uploaded again, which prevents the system from storing identical data blocks more than once.
In step S207, the deduplication node saves the fingerprint information that was not found in the Bucket assigned to it and saves the data block in the Container file corresponding to that Bucket.
On receiving the fingerprint information that was not found and the data blocks it represents, the deduplication node saves the fingerprint information in the corresponding Bucket and saves the data block in the Container file of that Bucket. The name of a Container file consists of the number of its Bucket + a universally unique identifier (UUID) internal to the system + the date (Date), for example 2_abcd234_010515. To guarantee that the data block reaches the disk, it is written to the Container file with the O_SYNC flag set, so that each write returns only after it has completed. After pwrite returns, the fingerprint information of the data block is written to the fingerprint information store: the fingerprint information is the Key and the second storage information of the data block is the Value, and together they form a Key-Value Pair saved in the store. The second storage information includes the name of the Container file holding the data block, the offset (Offset) of the data block within that Container file, and the size (Chunksize) of the data block. The hash index of the newly saved Key-Value Pair is added to the Bloom filter for use in subsequent deduplication queries.
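The write ordering in step S207 (data block first, index entry second) can be sketched as below for a Unix-like system, where os.O_SYNC and os.pwrite are available. Appending at the current end of the file, the dict standing in for the fingerprint information store, and the helper names are assumptions; the Bloom filter update is omitted.

```python
import os
from typing import Dict, Tuple

StorageInfo = Tuple[str, int, int]  # (Container file name, Offset, Chunksize)


def container_name(bucket: int, uuid_part: str, date_part: str) -> str:
    """Bucket number + UUID + date, joined as in the example 2_abcd234_010515."""
    return f"{bucket}_{uuid_part}_{date_part}"


def append_block(directory: str, name: str, block: bytes) -> StorageInfo:
    """Write block at the end of the Container file with O_SYNC set."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        offset = os.fstat(fd).st_size           # assumed: blocks are appended
        written = os.pwrite(fd, block, offset)  # returns once the data is on disk
    finally:
        os.close(fd)
    if written != len(block):
        raise OSError("short write to Container file")
    return name, offset, written


def save_block(store: Dict[bytes, StorageInfo], fp: bytes,
               directory: str, name: str, block: bytes) -> None:
    """Persist the block, then record its storage information under fp."""
    info = append_block(directory, name, block)
    store[fp] = info  # index is updated only after the write has returned
```

Updating the index only after the synchronous write returns means a crash can leave an unindexed block in the Container file, but never an index entry that points at data that was not written.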
In step S208, the deduplication node returns a message to the client indicating that the data block was saved successfully.
After saving the fingerprint information that was not found and the data blocks it represents, the deduplication node returns a success message to the client. Alternatively, in one embodiment, when the Bucket corresponding to the fingerprint information has a backup node in the system, the primary node first saves the fingerprint information and data blocks uploaded by the client, then backs them up to the backup node of the Bucket, and returns the success message to the client only after the backup completes.
In step S209, when all the fingerprint information that was not found and the data blocks it represents have been uploaded, the client uploads the mapping file of the data to the deduplication node.
The mapping file contains the fingerprint information of every data block of the data, arranged in the order in which the client split the data into blocks, so that the data can be mapped back correctly.
The client also uploads the mapping file in blocks. It splits the mapping file into multiple data blocks and computes a hash value for each, for example with the murmur2 hash function. The client determines the Bucket corresponding to the hash value of each mapping-file block, determines from the routing table the deduplication node corresponding to that Bucket, and uploads the block and its hash value to that node. As with ordinary data, the client first performs a fingerprint query using the hash value of each mapping-file block and uploads only the blocks whose hash values were not found, so that duplicate mapping-file blocks are not uploaded. When splitting the mapping file, the client makes the header of the mapping file the first of the data blocks. The header includes information such as the total size of the mapping file and the total number of blocks.
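Splitting a mapping file with its header as the first block can be sketched as follows. The header layout is entirely an assumption, since the application only says the header carries the total size and the number of blocks: here it is an 8-byte size followed by a 4-byte block count that includes the header block itself. The hashing of each block is omitted, and the function names are hypothetical.

```python
import struct
from typing import List

# Assumed header layout: body size in bytes, then number of blocks (header included).
HEADER = struct.Struct(">QI")


def split_mapping_file(fingerprints: List[bytes], block_size: int) -> List[bytes]:
    """Return the mapping file as blocks, with the header as block 0."""
    body = b"".join(fingerprints)  # fingerprints stay in splitting order
    chunks = [body[i:i + block_size] for i in range(0, len(body), block_size)]
    header = HEADER.pack(len(body), len(chunks) + 1)
    return [header] + chunks


def join_mapping_file(blocks: List[bytes], fp_size: int) -> List[bytes]:
    """Rebuild the ordered fingerprint list from blocks in sequence order."""
    total_size, count = HEADER.unpack(blocks[0])
    if count != len(blocks):
        raise ValueError("missing mapping-file blocks")
    body = b"".join(blocks[1:])
    if len(body) != total_size:
        raise ValueError("mapping file size does not match its header")
    return [body[i:i + fp_size] for i in range(0, len(body), fp_size)]
```

Reading block 0 first tells the reader how many further blocks to request, which is how the read path in step S301 uses the header.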
In step S210, the deduplication node saves the mapping-file blocks uploaded by the client and their hash values.
The hash value of each mapping-file block and its first storage information are saved in the Bucket corresponding to that hash value, and the block itself is saved in the Container file of that Bucket. The first storage information includes the name of the Container file holding the mapping-file block, the offset of the block within the Container file, and the size of the block. In addition, a Key-Value Pair whose Key is the name of the mapping file + the block sequence number and whose Value is the first storage information of the block is added to the fingerprint information store. This completes the process by which the client writes data to the system.
Fig. 4 shows the process by which a client reads data from the system in an embodiment of the present application. The process comprises the following steps.
In step S301, the client requests the mapping file from the deduplication node using the mapping file name of the data and a block sequence number.
The client first requests the first block of the mapping file, which contains the header of the mapping file. The header includes the size of the mapping file and the total number of its blocks. Using the header, the client sends the deduplication node requests for the remaining blocks of the mapping file.
In step S302, the deduplication node sends the mapping-file blocks to the client.
The deduplication node matches the mapping file name and block sequence number sent by the client against the Keys in the fingerprint information store to find the Key-Value Pair of the mapping-file block and hence the first storage information of that block. The Container file name in the first storage information identifies the Container file in which the block is stored, and the block is then read from that Container file using its offset within the file and its size.
In step S303, the client assembles the mapping file from its blocks and requests data blocks from the deduplication node according to the fingerprint information of each data block in the mapping file.
The client assembles the complete mapping file according to the sequence numbers of its blocks. The mapping file contains the fingerprint information of all data blocks, arranged in the order in which the data was split. The client determines the Bucket corresponding to each fingerprint, then uses the routing table to determine the deduplication node holding that Bucket, and sends that node a request for the corresponding data block.
In step S304, the deduplication node sends the client the data blocks represented by the fingerprint information in the mapping file.
The deduplication node queries the fingerprint information store using the fingerprint information in the data block request and finds the second storage information corresponding to it. The Container file name in the second storage information identifies the Container file in which the data block is saved, and the block is read from that file using its offset within the file and its size. In one embodiment, after identifying the Container file, the deduplication node checks whether the file has been archived to the background server; if so, the node reads the data block from the Container file stored on the background server and sends it to the client.
In step S305, the client assembles the data blocks into the data according to the order of their fingerprint information in the mapping file.
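The lookup of step S304 and the assembly of step S305 can be sketched together as below, for a Unix-like system where os.pread is available. For brevity the sketch reads from a local directory and collapses the client and node roles into one function; the network requests and the archived-file case are left out. The dict standing in for the fingerprint information store and the function names are assumptions.

```python
import os
from typing import Dict, List, Tuple

StorageInfo = Tuple[str, int, int]  # (Container file name, Offset, Chunksize)


def read_block(directory: str, info: StorageInfo) -> bytes:
    """Read one data block from its Container file by offset and size."""
    name, offset, size = info
    fd = os.open(os.path.join(directory, name), os.O_RDONLY)
    try:
        block = os.pread(fd, size, offset)
    finally:
        os.close(fd)
    if len(block) != size:
        raise OSError("Container file is shorter than the storage information says")
    return block


def restore(mapping: List[bytes], store: Dict[bytes, StorageInfo],
            directory: str) -> bytes:
    """Concatenate the blocks in the order their fingerprints appear in the mapping."""
    return b"".join(read_block(directory, store[fp]) for fp in mapping)
```

A fingerprint that appears several times in the mapping file is read from the same stored block each time, which is how deduplicated data expands back to its original form.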
As shown in Fig. 5, a client for reading and writing data in an embodiment of the present application includes:
a splitting and computing module 501, configured to split data into multiple data blocks and compute the fingerprint information of each data block;
a Bucket determining module 502, configured to determine the Bucket corresponding to the fingerprint information of each data block;
a node determining module 503, configured to determine the deduplication node corresponding to the Bucket according to the routing table obtained from the central node;
a request sending module 504, configured to send a fingerprint query request to the deduplication node corresponding to the Bucket, the fingerprint query request including the fingerprint information of the data blocks;
an information receiving module 505, configured to receive the fingerprint information that was not found, as returned by the deduplication node corresponding to the Bucket;
a data uploading module 506, configured to upload the fingerprint information that was not found and the data blocks it represents to the deduplication node corresponding to the Bucket, and further configured, when all of them have been uploaded, to upload the mapping file of the data to the deduplication node, the mapping file including the fingerprint information of each data block of the data, arranged in the order in which the data was split.
In addition, an embodiment of the present application discloses a system for reading and writing data which, as shown in Fig. 1, includes a central node 10 and one or more deduplication nodes 11, wherein:
the central node 10 is configured to send a routing table to the client, the routing table including the correspondence between Buckets and deduplication nodes;
the deduplication node 11 is configured to receive a fingerprint query request from the client, the request including fingerprint information corresponding to a Bucket assigned to the deduplication node; to query the fingerprint information and return the fingerprint information that was not found to the client; and to receive the fingerprint information that was not found and the data blocks it represents, as uploaded by the client.
It should be noted that the system for reading and writing data shown in Fig. 1 corresponds to the features of the embodiments shown in Figs. 3 and 4, as does the client for reading and writing data shown in Fig. 5. For anything not described in detail in the embodiments of Figs. 1 and 5, refer to the description of the embodiments shown in Figs. 3 and 4, which is not repeated here.
The data storage method, data storage system, data read-write method, client for reading and writing data, and system for reading and writing data provided by the embodiments of the present application achieve global deduplicated storage management of raw data at the 100 PB level and above and of fingerprint information at the 100 TB level and above, with high scalability: after a new deduplication node joins the system, the central node can redistribute the data according to the preset strategy and the deduplication nodes complete the data migration automatically, so that the performance and capacity of the system can be expanded easily. Each deduplication node implements a high-performance fingerprint information store based on a solid-state drive, in which a large-capacity cuckoo hash mapping table is built. This overcomes the technical difficulty that, when the volume of fingerprint information is very large, an index cannot be built in memory and deduplication queries therefore cannot be performed, while ensuring the efficiency of fingerprint queries and improving their accuracy.
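For illustration only, the idea of a cuckoo hash mapping table from fingerprint to storage information can be sketched in Python as below. This is an in-memory stand-in, not the solid-state-drive layout of the embodiment. The class name `CuckooTable`, the salted SHA-1 hash functions, the random choice of the entry to displace, and the overflow `stash` are all assumptions introduced for the example.

```python
import hashlib
import random


class CuckooTable:
    """In-memory stand-in for one cuckoo hash mapping table."""

    def __init__(self, slots=64, ways=4, max_kicks=100):
        self.slots = [None] * slots   # each slot holds (key, value) or None
        self.ways = ways              # number of cuckoo hash functions
        self.max_kicks = max_kicks
        self.stash = {}               # assumed overflow area, so nothing is lost

    def _positions(self, key):
        # Derive `ways` candidate slots by salting one hash with the way index.
        return [
            int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16) % len(self.slots)
            for i in range(self.ways)
        ]

    def get(self, key):
        for pos in self._positions(key):
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return self.stash.get(key)

    def put(self, key, value):
        # Update in place if the key is already stored.
        for pos in self._positions(key):
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                self.slots[pos] = (key, value)
                return
        if key in self.stash:
            self.stash[key] = value
            return
        item = (key, value)
        for _ in range(self.max_kicks):
            positions = self._positions(item[0])
            for pos in positions:
                if self.slots[pos] is None:
                    self.slots[pos] = item
                    return
            # Every candidate slot is taken: displace one occupant and retry with it.
            victim = random.choice(positions)
            self.slots[victim], item = item, self.slots[victim]
        # Too many displacements: park the last displaced entry in the stash.
        self.stash[item[0]] = item[1]
```

A lookup inspects at most `ways` slots (plus the assumed stash), which is why this kind of table keeps the number of reads per fingerprint query bounded even when the table is too large to index in memory.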
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, in the form of random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will appreciate that hardware manufacturers may refer to the same component by different names. The description and claims do not distinguish components by differences in name, but by differences in function. The term "comprising" as used throughout the description and claims is open-ended and should therefore be interpreted as "including but not limited to". "Substantially" means that, within an acceptable error range, those skilled in the art can solve the stated technical problem and substantially achieve the stated technical effect. In addition, the term "coupled" here covers any direct or indirect means of electrical coupling. Therefore, if the text describes a first device as coupled to a second device, the first device may be electrically coupled to the second device directly, or indirectly through other devices or coupling means. The description sets out preferred embodiments for implementing the present invention; it is given to illustrate the general principles of the invention and is not intended to limit its scope. The scope of protection of the present invention is defined by the appended claims.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that an article or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such an article or system. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the article or system that includes the element.
The above description shows and describes several preferred embodiments of the present invention. As stated above, however, it should be understood that the invention is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications, and environments, and may be modified within the scope of the inventive concept described herein, through the above teachings or the skill or knowledge of the relevant art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.
Claims (33)
1. A data storage method, characterized in that it is applied to a data storage system including a central node and deduplication nodes, the data storage method comprising:
the central node assigning each bucket (Bucket) to a corresponding deduplication node according to a preset strategy;
the central node creating a routing table according to the correspondence between Buckets and deduplication nodes, and synchronizing the routing table to each deduplication node;
each deduplication node storing, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by the fingerprint information.
2. The data storage method as claimed in claim 1, characterized in that the deduplication node storing, according to the routing table, the fingerprint information corresponding to each assigned Bucket and the data blocks represented by the fingerprint information comprises:
the deduplication node creating a corresponding container (Container) file for each assigned Bucket;
the deduplication node saving the corresponding fingerprint information in each assigned Bucket, and saving the data blocks represented by the fingerprint information in the Container file corresponding to each assigned Bucket.
3. The data storage method as claimed in claim 2, characterized in that:
the deduplication node determines whether the size of the Container file is greater than a preset threshold;
when the size of the Container file is greater than the preset threshold, the deduplication node archives the Container file to a backend server.
4. The data storage method as claimed in claim 1, characterized in that the central node assigning each Bucket to a corresponding deduplication node according to the preset strategy comprises:
the central node assigning each Bucket to a plurality of corresponding deduplication nodes, and determining one primary node and at least one standby node among the plurality of corresponding deduplication nodes.
5. The data storage method as claimed in claim 1 or 4, characterized in that:
the central node determines whether each deduplication node is available, or whether a new deduplication node has been added;
when it is determined that a deduplication node is unavailable, or that a new deduplication node has been added, the central node reassigns the Buckets;
the central node updates the routing table and synchronizes it to each deduplication node;
the deduplication nodes perform data migration according to the updated routing table.
6. The data storage method as claimed in claim 5, characterized in that the deduplication nodes performing data migration according to the updated routing table comprises:
the primary node initiating the data migration according to the updated routing table.
7. The data storage method as claimed in claim 5, characterized in that the central node reassigning the Buckets when it is determined that a deduplication node is unavailable comprises:
when it is determined that the primary node is unavailable, the central node determining a new primary node from among the at least one standby node;
and the deduplication nodes performing data migration according to the updated routing table comprises:
the newly determined primary node initiating the data migration according to the updated routing table.
8. The data storage method as claimed in claim 1, characterized in that each deduplication node includes a fingerprint information store, the fingerprint information store being a cuckoo hash mapping table stored on a solid-state drive and including the fingerprint information corresponding to each Bucket of the deduplication node and the storage information of the data blocks represented by the fingerprint information.
9. The data storage method as claimed in claim 8, characterized in that M cuckoo hash mapping tables run simultaneously on the solid-state drive, and N cuckoo hash functions are used simultaneously, where M × N = 128.
10. The data storage method as claimed in claim 9, characterized in that 32 cuckoo hash mapping tables run simultaneously on the solid-state drive, and 4-way cuckoo hash functions are used simultaneously.
11. A data read-write method, characterized by comprising:
cutting data into a plurality of data blocks and calculating the fingerprint information of each data block;
determining the Bucket corresponding to the fingerprint information of each data block;
determining, according to a routing table obtained from a central node, the deduplication node corresponding to the Bucket;
sending a fingerprint query request to the deduplication node corresponding to the Bucket, the fingerprint query request including the fingerprint information of the data blocks;
receiving the fingerprint information that was not found, as returned by the deduplication node corresponding to the Bucket;
uploading the fingerprint information that was not found, and the data blocks it represents, to the deduplication node corresponding to the Bucket.
12. The method as claimed in claim 11, characterized in that determining the Bucket corresponding to the fingerprint information of each data block comprises:
performing a modulo operation on the fingerprint information with respect to the total number of Buckets, and determining the Bucket corresponding to the fingerprint information according to the result of the modulo operation.
13. The method as claimed in claim 11, characterized in that the method further comprises:
when all of the fingerprint information that was not found and the data blocks it represents have been uploaded, uploading a mapping file of the data to the deduplication node, the mapping file including the fingerprint information of each data block of the data, arranged in the order in which the data blocks were cut.
14. The method as claimed in claim 13, characterized in that uploading the mapping file of the data to the deduplication node comprises:
cutting the mapping file into a plurality of data blocks and calculating the hash value of each data block of the mapping file;
determining the Bucket corresponding to the hash value of each data block of the mapping file;
determining, according to the routing table, the deduplication node corresponding to the Bucket that corresponds to the hash value of the data block of the mapping file;
uploading the data blocks of the mapping file and their corresponding hash values to the deduplication node corresponding to the Bucket that corresponds to the hash value of the data block of the mapping file.
15. The method as claimed in claim 14, characterized in that cutting the mapping file into a plurality of data blocks comprises:
cutting the header of the mapping file as the first data block of the plurality of data blocks, the header of the mapping file including information such as the size of the mapping file and the total number of the plurality of data blocks.
16. The method as claimed in claim 13, characterized in that the method further comprises:
obtaining the mapping file of the data from the deduplication node;
obtaining each data block of the data from the deduplication nodes according to the fingerprint information in the mapping file;
assembling the data according to the order of the fingerprint information of the data blocks in the mapping file.
17. The method as claimed in claim 16, characterized in that obtaining the mapping file of the data from the deduplication node comprises:
obtaining each data block of the mapping file from the deduplication node according to the name of the mapping file and the data block sequence numbers;
assembling the data blocks of the mapping file into the mapping file of the data.
18. The method as claimed in claim 11, characterized in that determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket comprises:
when data is stored for the first time, obtaining the routing table from the central node;
determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket.
19. The method as claimed in claim 18, characterized in that determining, according to the routing table obtained from the central node, the deduplication node corresponding to the Bucket further comprises:
sending a request packet to the deduplication node corresponding to the Bucket;
receiving a response packet returned by the deduplication node corresponding to the Bucket, the response packet including version information of a routing table;
determining whether the version information of the routing table in the response packet is the same as the version information of the routing table obtained from the central node;
when the version information of the routing table in the response packet is the same as the version information of the routing table obtained from the central node, determining the deduplication node corresponding to the Bucket according to the routing table obtained from the central node;
when the version information of the routing table in the response packet differs from the version information of the routing table obtained from the central node, obtaining an updated routing table from the central node, and re-determining the deduplication node corresponding to the Bucket according to the updated routing table.
20. A data read-write method, characterized by comprising:
a central node sending a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes;
a deduplication node receiving a fingerprint query request from the client, the fingerprint query request including fingerprint information corresponding to a Bucket assigned to the deduplication node;
the deduplication node querying the fingerprint information and returning the fingerprint information that was not found to the client;
the deduplication node receiving the fingerprint information that was not found, and the data blocks it represents, as uploaded by the client.
21. The method as claimed in claim 20, characterized in that the method further comprises:
the deduplication node saving the fingerprint information that was not found in the assigned Bucket, saving the data blocks in the Container file corresponding to the assigned Bucket, and returning to the client a message that the data blocks were saved successfully.
22. The method as claimed in claim 21, characterized in that, before the deduplication node returns to the client the message that the data blocks were saved successfully, the method further comprises:
the deduplication node backing up the fingerprint information that was not found, and the data blocks it represents, to a standby node.
23. The method as claimed in claim 21, characterized in that the method further comprises:
the deduplication node saving the data blocks of the mapping file uploaded by the client and their corresponding hash values.
24. The method as claimed in claim 23, characterized in that saving the data blocks of the mapping file uploaded by the client and their corresponding hash values comprises:
saving the data blocks of the mapping file in the Container file corresponding to the corresponding Bucket;
saving, in the corresponding Bucket, the hash values of the data blocks of the mapping file and first storage information.
25. The method as claimed in claim 24, characterized in that the first storage information includes: the name of the Container file in which the data block of the mapping file is saved, the offset of the data block of the mapping file within the Container file, and the size of the data block of the mapping file.
26. The method as claimed in claim 23, characterized in that the method further comprises:
the deduplication node receiving a request from the client to obtain the data blocks of the mapping file;
the deduplication node sending the data blocks of the mapping file to the client;
the deduplication node receiving a request from the client to obtain the data block represented by each piece of fingerprint information in the mapping file;
the deduplication node sending the data block represented by each piece of fingerprint information to the client.
27. The method as claimed in claim 26, characterized in that the deduplication node sending the data block represented by each piece of fingerprint information to the client comprises:
the deduplication node determining second storage information of the data block according to the fingerprint information, the second storage information including the name of the Container file in which the data block is saved, the offset of the data block within the Container file, and the size of the data block;
the deduplication node determining, according to the name of the Container file, whether the Container file has been archived to the backend server;
when the Container file has been archived to the backend server, the deduplication node obtaining the data block from the backend server according to the offset of the data block within the Container file and the size of the data block, and sending it to the client;
when the Container file is still stored locally, the deduplication node obtaining the data block locally according to the offset of the data block within the Container file and the size of the data block, and sending it to the client.
28. The method as claimed in claim 20, characterized in that the central node sending the routing table to the client comprises:
when the client stores data for the first time, the central node receiving a request from the client to obtain the routing table;
the central node sending the routing table to the client.
29. The method as claimed in claim 28, characterized in that the central node sending the routing table to the client further comprises:
the deduplication node receiving a request packet from the client;
the deduplication node sending a response packet to the client, the response packet including the version information of the routing table saved by the deduplication node;
when the version information of the routing table saved by the client is inconsistent with the version information of the routing table saved by the deduplication node, the central node receiving a routing table request from the client;
the central node sending the updated routing table to the client.
30. The method as claimed in claim 20, characterized in that the deduplication node querying the fingerprint information and returning the fingerprint information that was not found to the client comprises:
the deduplication node determining, by means of a Bloom filter, whether the fingerprint information exists;
when the Bloom filter indicates that the fingerprint information does not exist, determining that the fingerprint information is fingerprint information that was not found;
when the Bloom filter indicates that the fingerprint information exists, querying the fingerprint information store to check whether the fingerprint information exists;
when the fingerprint information is found in the fingerprint information store, determining that the fingerprint information exists;
when the fingerprint information is not found in the fingerprint information store, determining that the fingerprint information is fingerprint information that was not found.
31. A data storage system, characterized by comprising: a central node and one or more deduplication nodes, wherein:
the central node is configured to assign each bucket (Bucket) to a corresponding deduplication node according to a preset strategy, create a routing table according to the correspondence between Buckets and deduplication nodes, and synchronize the routing table to each deduplication node;
the deduplication node is configured to store, according to the routing table, the fingerprint information corresponding to each Bucket assigned to it and the data blocks represented by the fingerprint information.
32. A client for reading and writing data, characterized by comprising:
a cutting and calculation module, configured to cut data into a plurality of data blocks and calculate the fingerprint information of each data block;
a Bucket determination module, configured to determine the Bucket corresponding to the fingerprint information of each data block;
a node determination module, configured to determine, according to a routing table obtained from a central node, the deduplication node corresponding to the Bucket;
a request sending module, configured to send a fingerprint query request to the deduplication node corresponding to the Bucket, the fingerprint query request including the fingerprint information of the data blocks;
an information receiving module, configured to receive the fingerprint information that was not found, as returned by the deduplication node corresponding to the Bucket;
a data upload module, configured to upload the fingerprint information that was not found, and the data blocks it represents, to the deduplication node corresponding to the Bucket.
33. A system for reading and writing data, characterized by comprising: a central node and a deduplication node, wherein:
the central node is configured to send a routing table to a client, the routing table including the correspondence between Buckets and deduplication nodes;
the deduplication node is configured to: receive a fingerprint query request from the client, the fingerprint query request including fingerprint information corresponding to a Bucket assigned to the deduplication node; query the fingerprint information and return the fingerprint information that was not found to the client; and receive the fingerprint information that was not found, and the data blocks it represents, as uploaded by the client.
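As a non-limiting illustration, the two-stage lookup recited in claim 30 (Bloom filter first, fingerprint information store second) can be sketched in Python as follows. The filter size, the number of hash functions, the salted SHA-1 hashing, and the names `BloomFilter` and `find_missing` are assumptions made for the example; a plain dictionary stands in for the fingerprint information store.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; one byte per bit is used for clarity, not compactness."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits)

    def _indexes(self, key):
        # Derive several bit positions by salting one hash with an index.
        return [
            int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16) % self.bits
            for i in range(self.hashes)
        ]

    def add(self, key):
        for idx in self._indexes(key):
            self.array[idx] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.array[idx] for idx in self._indexes(key))


def find_missing(fingerprints, bloom, store):
    """Return the fingerprints that were not found, in request order."""
    missing = []
    for fp in fingerprints:
        if not bloom.might_contain(fp):
            missing.append(fp)      # definitely absent: skip the store lookup
        elif fp not in store:
            missing.append(fp)      # Bloom filter false positive
    return missing
```

Because a Bloom filter never reports a stored fingerprint as absent, the first branch can skip the slower store lookup safely, and the second branch catches the false positives the filter may produce.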
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510226830.0A CN106201771B (en) | 2015-05-06 | 2015-05-06 | Data-storage system and data read-write method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510226830.0A CN106201771B (en) | 2015-05-06 | 2015-05-06 | Data-storage system and data read-write method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106201771A (en) | 2016-12-07 |
CN106201771B (en) | 2019-07-05 |
Family
ID=57459493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510226830.0A Active CN106201771B (en) | 2015-05-06 | 2015-05-06 | Data-storage system and data read-write method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106201771B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766478A (en) * | 2017-10-11 | 2018-03-06 | 复旦大学 | A kind of design method of concurrent index structure towards high competition scene |
CN107832341A (en) * | 2017-10-12 | 2018-03-23 | 千寻位置网络有限公司 | AGNSS user's duplicate removal statistical method |
CN108093024A (en) * | 2017-11-14 | 2018-05-29 | 西北工业大学 | A kind of classification method for routing and device based on data frequency |
CN108509616A (en) * | 2018-03-30 | 2018-09-07 | 北京怡生乐居信息服务有限公司 | Data processing method and system |
CN109725842A (en) * | 2017-10-30 | 2019-05-07 | 伊姆西Ip控股有限责任公司 | Accelerate random writing layout with the system and method for mixing the distribution of the bucket in storage system |
CN109740037A (en) * | 2019-01-02 | 2019-05-10 | 山东省科学院情报研究所 | The distributed online real-time processing method of multi-source, isomery fluidised form big data and system |
CN110071964A (en) * | 2019-03-26 | 2019-07-30 | 罗克佳华科技集团股份有限公司 | File synchronisation method, device, file sharing network, file are total to system and storage medium |
CN110134331A (en) * | 2019-04-26 | 2019-08-16 | 重庆大学 | Routed path planing method, system and readable storage medium storing program for executing |
CN110209727A (en) * | 2019-04-04 | 2019-09-06 | 特斯联(北京)科技有限公司 | A kind of date storage method, terminal device and medium |
CN110674116A (en) * | 2019-09-25 | 2020-01-10 | 四川长虹电器股份有限公司 | System and method for checking and inserting data repetition of database based on swoole |
CN111158948A (en) * | 2019-12-30 | 2020-05-15 | 深信服科技股份有限公司 | Data storage and verification method and device based on duplicate removal and storage medium |
CN111966649A (en) * | 2020-10-21 | 2020-11-20 | 中国人民解放军国防科技大学 | Lightweight online file storage method and device capable of efficiently removing weight |
CN112148928A (en) * | 2020-09-18 | 2020-12-29 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
CN113420400A (en) * | 2021-07-06 | 2021-09-21 | 北京字跳网络技术有限公司 | Routing relation establishing method, request processing method, device and equipment |
CN113625968A (en) * | 2021-08-12 | 2021-11-09 | 网易(杭州)网络有限公司 | File authority management method and device, computer equipment and storage medium |
CN115988002A (en) * | 2023-02-16 | 2023-04-18 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101539950A (en) * | 2009-05-08 | 2009-09-23 | 成都市华为赛门铁克科技有限公司 | Data storage method and device |
US20120323860A1 (en) * | 2011-06-14 | 2012-12-20 | Netapp, Inc. | Object-level identification of duplicate data in a storage system |
CN102968498A (en) * | 2012-12-05 | 2013-03-13 | 华为技术有限公司 | Method and device for processing data |
Non-Patent Citations (1)
Title |
---|
JIANSHENG WEI等: ""MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services"", 《2010 IEEE 26TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST)》 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766478A (en) * | 2017-10-11 | 2018-03-06 | 复旦大学 | A kind of design method of concurrent index structure towards high competition scene |
CN107832341B (en) * | 2017-10-12 | 2022-01-28 | 千寻位置网络有限公司 | AGNSS user duplicate removal statistical method |
CN107832341A (en) * | 2017-10-12 | 2018-03-23 | 千寻位置网络有限公司 | AGNSS user's duplicate removal statistical method |
CN109725842A (en) * | 2017-10-30 | 2019-05-07 | 伊姆西Ip控股有限责任公司 | Accelerate random writing layout with the system and method for mixing the distribution of the bucket in storage system |
CN109725842B (en) * | 2017-10-30 | 2022-10-11 | 伊姆西Ip控股有限责任公司 | System and method for accelerating random write placement for bucket allocation within a hybrid storage system |
CN108093024A (en) * | 2017-11-14 | 2018-05-29 | 西北工业大学 | A kind of classification method for routing and device based on data frequency |
CN108509616A (en) * | 2018-03-30 | 2018-09-07 | 北京怡生乐居信息服务有限公司 | Data processing method and system |
CN109740037A (en) * | 2019-01-02 | 2019-05-10 | 山东省科学院情报研究所 | The distributed online real-time processing method of multi-source, isomery fluidised form big data and system |
CN109740037B (en) * | 2019-01-02 | 2023-11-24 | 山东省科学院情报研究所 | Multi-source heterogeneous flow state big data distributed online real-time processing method and system |
CN110071964B (en) * | 2019-03-26 | 2022-03-15 | 罗克佳华科技集团股份有限公司 | File synchronization method, device, file sharing network, file sharing system and storage medium |
CN110071964A (en) * | 2019-03-26 | 2019-07-30 | 罗克佳华科技集团股份有限公司 | File synchronization method, device, file sharing network, file sharing system and storage medium |
CN110209727A (en) * | 2019-04-04 | 2019-09-06 | 特斯联(北京)科技有限公司 | Data storage method, terminal device and medium |
CN110134331A (en) * | 2019-04-26 | 2019-08-16 | 重庆大学 | Routing path planning method, system and readable storage medium |
CN110134331B (en) * | 2019-04-26 | 2020-06-05 | 重庆大学 | Routing path planning method, system and readable storage medium |
CN110674116A (en) * | 2019-09-25 | 2020-01-10 | 四川长虹电器股份有限公司 | System and method for checking and inserting data repetition of database based on swoole |
CN110674116B (en) * | 2019-09-25 | 2022-05-03 | 四川长虹电器股份有限公司 | System and method for checking and inserting data repetition of database based on swoole |
CN111158948B (en) * | 2019-12-30 | 2024-04-09 | 深信服科技股份有限公司 | Data storage and verification method and device based on deduplication and storage medium |
CN111158948A (en) * | 2019-12-30 | 2020-05-15 | 深信服科技股份有限公司 | Data storage and verification method and device based on duplicate removal and storage medium |
CN112148928A (en) * | 2020-09-18 | 2020-12-29 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
CN112148928B (en) * | 2020-09-18 | 2024-02-20 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
CN111966649B (en) * | 2020-10-21 | 2021-01-01 | 中国人民解放军国防科技大学 | Lightweight online file storage method and device with efficient deduplication |
CN111966649A (en) * | 2020-10-21 | 2020-11-20 | 中国人民解放军国防科技大学 | Lightweight online file storage method and device with efficient deduplication |
CN113420400A (en) * | 2021-07-06 | 2021-09-21 | 北京字跳网络技术有限公司 | Routing relation establishing method, request processing method, device and equipment |
CN113625968A (en) * | 2021-08-12 | 2021-11-09 | 网易(杭州)网络有限公司 | File authority management method and device, computer equipment and storage medium |
CN113625968B (en) * | 2021-08-12 | 2024-03-01 | 网易(杭州)网络有限公司 | File authority management method and device, computer equipment and storage medium |
CN115988002A (en) * | 2023-02-16 | 2023-04-18 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
CN115988002B (en) * | 2023-02-16 | 2023-08-15 | 荣耀终端有限公司 | Data transmission method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN106201771B (en) | 2019-07-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106201771A (en) | Data-storage system and data read-write method | |
US10810161B1 (en) | System and method for determining physical storage space of a deduplicated storage system | |
US10380073B2 (en) | Use of solid state storage devices and the like in data deduplication | |
US9798486B1 (en) | Method and system for file system based replication of a deduplicated storage system | |
US9967298B2 (en) | Appending to files via server-side chunking and manifest manipulation | |
CN104077423B (en) | Consistent hash based structural data storage, inquiry and migration method | |
US7827146B1 (en) | Storage system | |
US9141633B1 (en) | Special markers to optimize access control list (ACL) data for deduplication | |
US8548957B2 (en) | Method and system for recovering missing information at a computing device using a distributed virtual file system | |
US9424185B1 (en) | Method and system for garbage collection of data storage systems | |
US9367448B1 (en) | Method and system for determining data integrity for garbage collection of data storage systems | |
US9547706B2 (en) | Using colocation hints to facilitate accessing a distributed data storage system | |
US7689764B1 (en) | Network routing of data based on content thereof | |
CN102708165B (en) | Document handling method in distributed file system and device | |
US7577808B1 (en) | Efficient backup data retrieval | |
US9965505B2 (en) | Identifying files in change logs using file content location identifiers | |
CN105550371A (en) | Big data environment oriented metadata organization method and system | |
Frey et al. | Probabilistic deduplication for cluster-based storage systems | |
CN104184812B (en) | Multipoint data transmission method based on private cloud | |
CN104408111A (en) | Method and device for deleting duplicate data | |
US9383936B1 (en) | Percent quotas for deduplication storage appliance | |
CN109522283A (en) | Data deduplication method and system | |
US20200349115A1 (en) | File system metadata deduplication | |
US8612717B2 (en) | Storage system | |
CN104951475B (en) | Distributed file system and implementation method |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||