CN109407975A - Data writing method, compute node, and distributed storage system - Google Patents


Info

Publication number
CN109407975A
Authority
CN
China
Prior art keywords
hard disk
target
data
state
target data
Prior art date
Legal status
Granted
Application number
CN201811095443.8A
Other languages
Chinese (zh)
Other versions
CN109407975B (en)
Inventor
何春雄
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201811095443.8A
Publication of CN109407975A
Application granted
Publication of CN109407975B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In a distributed storage system, a compute node receives a write command carrying target data and, according to the label in the write command, queries the target partition corresponding to the write command. It then queries the multiple hard disk IDs corresponding to the target partition; the hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the candidate state. The target data is stored on the hard disks in the selected state but not on the hard disks in the candidate state.

Description

Data writing method, compute node, and distributed storage system
Technical field
The present invention relates to the field of storage, and in particular to distributed storage technology.
Background
In the prior art, a distributed storage system is divided into multiple failure domains (usually at the granularity of a server or a cabinet). A failure domain may store at most one copy, which prevents the failure of a single failure domain from making multiple copies inaccessible; the system therefore requires the number of failure domains to be greater than or equal to the number of copies. For example, a three-copy scenario needs at least three failure domains to guarantee that no two copies are placed in the same failure domain. When a failure domain fails, its data can be rebuilt on other failure domains to keep the number of copies intact.
In the prior art, once a storage pool has been created, its number of copies is difficult to adjust afterwards. For example, a three-copy distributed storage system can only store data as three copies and can neither increase nor reduce the number of copies.
To forcibly reduce the number of copies, failure domains must be removed one by one until the number of failure domains equals the number of copies, and then one more failure domain is forcibly removed. Because the mutual-exclusion rule between copies prevents the data from being rebuilt, this achieves the goal of reducing the number of copies. Taking a reduction from three copies to two as an example: first, the total number of failure domains in the distributed storage system is reduced to three; data is then stored as three copies; then one failure domain is forcibly removed, leaving only two. Each of the two remaining failure domains holds one copy, so the system goes from three copies to two. As can be seen, this scheme is complicated to operate, requires forcibly removing a failure domain, and is highly restrictive.
Summary of the invention
In a first aspect, the present invention provides an embodiment of a data writing method for storing data in a distributed storage system. The distributed storage system includes a compute node and multiple storage nodes, each of which includes hard disks. The method includes: the compute node receives a write command carrying target data and a target label, the target label being used to identify the target data; the compute node queries the target partition corresponding to the target label; the compute node queries the multiple hard disk IDs corresponding to the target partition, where the hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the candidate state. Hard disks in the selected state can be used to store copies of the target data and include a primary hard disk and secondary hard disks; hard disks in the candidate state are not used to store copies of the target data. The compute node sends the target data and a selected-disk list to the storage node where the primary hard disk is located; the selected-disk list includes the secondary hard disk IDs but not the candidate hard disk IDs. The storage node where the primary hard disk is located stores the target data on the primary hard disk and, according to the secondary hard disk IDs, sends the target data to the storage nodes where the secondary hard disks are located; each of those storage nodes stores the target data on its local secondary hard disk.
With this scheme, the number of stored copies of the target data no longer has to equal the number of hard disks corresponding to the target partition, which provides more flexibility.
In a first possible implementation of the first aspect, after the storage nodes where the secondary hard disks are located store the target data on their local secondary hard disks, the method further includes: updating the state of a candidate hard disk corresponding to the target partition to selected; and obtaining the target data from the primary hard disk or a secondary hard disk that already stores it, and storing the obtained target data on the newly selected hard disk.
With this scheme, the ratio of selected hard disks to candidate hard disks can be adjusted, and with it the number of copies.
In a second possible implementation of the first aspect, the target data may be a data block, and the target label is the combination of a logical unit number identifier (LUN ID) and a logical block address (LBA).
This scheme provides a specific example of the target data and the target label.
In a third possible implementation of the first aspect, the target data may be a key-value pair, and the target label is the key.
This scheme provides a specific example of the target data and the target label.
In a fourth possible implementation of the first aspect, querying the target partition corresponding to the target label specifically includes: computing a hash value of the target label and, according to the correspondence between hash values and partitions, querying the target partition corresponding to the target label.
This scheme provides a specific way of querying the target partition.
In a fifth possible implementation of the first aspect, based on the method according to claim 1, the states of the multiple hard disks corresponding to the target partition are defined with respect to the target partition.
This scheme explains that the state of a hard disk is tied to a partition: the same hard disk may have different states in two different partitions.
In a second aspect, an embodiment of a data writing method is provided for writing data to a distributed storage system. The distributed storage system includes a compute node and multiple storage nodes, each of which includes hard disks. The method includes:
The compute node receives a write command carrying target data and a target label, the target label being used to identify the target data; the compute node queries the target partition corresponding to the target label; the compute node queries the multiple hard disk IDs corresponding to the target partition, where the hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the candidate state; hard disks in the selected state can be used to store copies of the target data, and hard disks in the candidate state are not used to store copies of the target data; according to the IDs of the selected hard disks among the hard disk IDs corresponding to the target partition, the compute node sends the target data to the storage nodes where the selected hard disks are located.
With this scheme, the number of stored copies of the target data no longer has to equal the number of hard disks corresponding to the target partition, which provides more flexibility. In this scheme, "the compute node sends the target data to the storage nodes where the selected hard disks are located" covers two cases. The first case is similar to the scheme of the first aspect: the compute server sends the target data to the storage server where the primary hard disk is located, which forwards it to the storage servers where the secondary hard disks are located, and those servers store it. In the second case, primary and secondary hard disks are not distinguished: based on the selected hard disks corresponding to the target partition, the compute server sends the target data directly to the storage servers where the selected hard disks are located, which store it.
In a first possible implementation of the second aspect, after the storage nodes where the secondary hard disks are located store the target data on their local secondary hard disks, the method further includes: updating the state of a candidate hard disk corresponding to the target partition to selected; and obtaining the target data from the primary hard disk or a secondary hard disk that already stores it, and storing the obtained target data on the newly selected hard disk.
This scheme makes it possible to update the ratio of selected hard disks to candidate hard disks.
The present invention also provides embodiments of a distributed storage system and a compute node, which achieve the effects of the corresponding methods above.
Brief description of the drawings
Fig. 1 is a topology diagram of an embodiment of a distributed storage system;
Fig. 2 is a schematic diagram of an embodiment of the correspondence between partitions and hard disk states;
Fig. 3 is a schematic diagram of another embodiment of the correspondence between partitions and hard disk states;
Fig. 4 is a schematic diagram of another embodiment of the correspondence between partitions and hard disk states;
Fig. 5 is a flowchart of an embodiment of writing data to a distributed storage system;
Fig. 6 is a schematic diagram of another embodiment of the correspondence between partitions and hard disk states.
Detailed description of embodiments
In a distributed storage system, identical data is stored redundantly on different storage servers. The data on each storage server is called a copy, and this way of protecting data is called multi-copy, also known as mirroring. The data here is, for example, a data block or an object. A data block is the data unit of block storage, such as a storage area network (SAN); an object is the data unit of object storage, such as cloud object storage or key-value (KV) storage.
Referring to Fig. 1, a host 11 communicates with the distributed storage system. The distributed storage system includes storage servers 12, 13, 14, and 15, and further includes compute servers 16 and 17 and a metadata server 18; there may be more servers of each kind (not shown). The compute server receives data sent by host 11 and computes the partition corresponding to the data. The metadata server 18 stores the hard disks corresponding to each partition; in this embodiment of the present invention, the metadata server 18 additionally stores the states of the hard disks corresponding to each partition. Only hard disks in the selected state are used to store data; hard disks in the candidate state are not. The metadata server manages the partition-to-hard-disk mapping table and the hard disk states. By querying the metadata server 18, compute server 16 can learn which storage servers (those where the selected hard disks are located) the data needs to be sent to. A storage server receives, directly or indirectly, the data sent by the compute server and stores it on its local selected hard disk. The data stored on each selected hard disk is called a copy of the data sent by the host.
Compute servers 16 and 17 can both receive data blocks from the host. When a compute server receives a data block sent by host 11, it queries the IDs of the hard disks that will store this data block. According to the query result, the data block is stored on hard disks of storage servers 12, 13, and 14 respectively. The data blocks stored on these storage servers are identical, and each is called a copy (copy 121, copy 131, and copy 141). Because there are three copies in total, this storage mode is called 3-copy.
It should be noted that in other embodiments, any two or all three of the storage server, compute server, and metadata server may be integrated together; for example, the same server may provide both the storage server function and the compute server function. Because the essence of the technology does not change, the embodiments of the present invention do not describe this case separately.
In multi-copy data protection, a single storage server can serve, as in Fig. 1, as the minimum unit for storing copies, that is, each storage server is one failure domain. For the same data block, each storage server stores at most one copy, so the failure of any storage server affects only the copy stored on that server and not the copies on other storage servers. Besides storage servers, hard disks, racks, equipment rooms, or data centers can also serve as the minimum unit for storing copies.
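The failure-domain rule (at most one copy of a block per failure domain) can be checked mechanically. A minimal sketch, assuming a plain dictionary from server ID to failure domain; the function name and the rack layout below are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable

def violates_failure_domain_rule(copy_servers: Iterable[int], domain_of: Dict[int, str]) -> bool:
    """Return True if any failure domain would hold more than one copy of the same block."""
    per_domain = Counter(domain_of[server] for server in copy_servers)
    return any(count > 1 for count in per_domain.values())

# Hypothetical layout where the failure domain is a rack:
# servers 12 and 13 share rack "A", server 14 sits in rack "B".
domain_of = {12: "A", 13: "A", 14: "B"}
print(violates_failure_domain_rule([12, 14], domain_of))  # False
print(violates_failure_domain_rule([12, 13], domain_of))  # True
```

When each server is its own failure domain, as in Fig. 1, every server maps to a distinct domain and the check reduces to "no server appears twice".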
The embodiments of the present invention reduce the number of copies according to current service requirements: the capacity after the reduction still meets current service requirements, the reduction does not cause a performance impact from data reconstruction, and data reconstruction is efficient. A scheme for increasing the number of copies is also provided. Specifically, the embodiments use partitions to establish the correspondence between the data to be stored and the hard disks that store it. In this embodiment, a maximum number of copies exists, and the hard disks corresponding to a partition are assigned different selection states by marking or other means; a hard disk is either a selected hard disk or a candidate hard disk. Within the range of the maximum number of copies, the number of copies of the data can be increased or reduced by changing the ratio of selected hard disks to candidate hard disks. When all hard disks corresponding to a partition are selected (there is no candidate hard disk), the number of selected hard disks of the partition equals its maximum number of copies.
Specifically, the distributed storage system has a maximum number of copies per partition, and a mapping relationship is established between each partition and hard disks. The mapping relationships are recorded in a mapping table, also called the partition routing table, which is itself stored on the metadata server 18 in multi-copy form. In this embodiment, hard disk states can be marked; a hard disk state is either selected or candidate. A hard disk in the selected state can store a copy; a hard disk in the candidate state cannot. Number of hard disks in the selected state + number of hard disks in the candidate state = maximum number of copies. The hard disk states may be recorded in the mapping table together with the mapping relationships, or recorded separately.
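One row of the partition routing table can be modelled as an ordered disk list plus a per-partition state map. A minimal sketch, assuming the states are stored inside the row (the text allows storing them separately as well); the class and method names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List

SELECTED, CANDIDATE = "selected", "candidate"

@dataclass
class PartitionRoute:
    """One row of a hypothetical partition routing table."""
    partition_id: int
    max_copies: int
    disk_ids: List[int]                                   # ordered; the head is the primary
    states: Dict[int, str] = field(default_factory=dict)  # disk ID -> state for THIS partition

    def selected(self) -> List[int]:
        return [d for d in self.disk_ids if self.states[d] == SELECTED]

    def candidates(self) -> List[int]:
        return [d for d in self.disk_ids if self.states[d] == CANDIDATE]

    def validate(self) -> None:
        # Every disk of the partition carries exactly one of the two states ...
        if set(self.states) != set(self.disk_ids) or not set(self.states.values()) <= {SELECTED, CANDIDATE}:
            raise ValueError("each disk needs a selected/candidate state")
        # ... and selected + candidate disks add up to the maximum number of copies.
        if len(self.selected()) + len(self.candidates()) != self.max_copies:
            raise ValueError("selected + candidate must equal max_copies")

route = PartitionRoute(1, 4, [211, 221, 231, 241],
                       {211: SELECTED, 221: SELECTED, 231: SELECTED, 241: CANDIDATE})
route.validate()
print(route.selected(), route.candidates())  # [211, 221, 231] [241]
```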
For the multiple hard disks corresponding to the same partition, an order of precedence among the hard disks can be set, and the order in which hard disk states are changed is determined by this precedence. For example, the hard disks corresponding to one partition can be recorded as a linked list that captures their precedence. The hard disk at the head of the list is the primary hard disk and the rest are secondary hard disks. Taking Fig. 2 as an example, each partition corresponds to 4 hard disks, so when all of them are in the selected state (not shown), each piece of data can be stored as 4 copies. For example, partition 1 corresponds to hard disks 211, 221, 231, and 241, and the arrows in Fig. 2 show the positions of these 4 hard disks in the linked list. For partition 1, hard disk 211 is at the head of the list and hard disk 241 is at the tail; hard disk 211 is the primary hard disk and hard disks 221, 231, and 241 are secondary hard disks. Similarly, partition 2 corresponds to hard disks 251, 261, 271, and 211, and partition 3 corresponds to hard disks 291, 301, 311, and 211. Fig. 2 also shows that hard disk 211 is associated with all three partitions: it is in the selected state for partition 1 but in the candidate state for partitions 2 and 3. In other words, a hard disk state is not an attribute of the hard disk itself but a parameter of a partition: it describes whether the hard disk is selected or candidate for a particular partition.
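Because the state is a property of the (partition, disk) pair, the lookup must always name the partition. A minimal sketch of the Fig. 2 layout after it has been reduced to 3 copies; the dictionary layout and the helper names are hypothetical:

```python
# Hypothetical routing table following the Fig. 2 example once it runs with 3 copies.
# Disks are listed in linked-list order (head = primary); each state is stored
# per (partition, disk) pair rather than on the disk itself.
routing_table = {
    1: [(211, "selected"), (221, "selected"), (231, "selected"), (241, "candidate")],
    2: [(251, "selected"), (261, "selected"), (271, "selected"), (211, "candidate")],
    3: [(291, "selected"), (301, "selected"), (311, "selected"), (211, "candidate")],
}

def primary_of(partition_id: int) -> int:
    """The disk at the head of the partition's list is its primary disk."""
    return routing_table[partition_id][0][0]

def state_of(disk_id: int, partition_id: int) -> str:
    """Look up a disk's state for one particular partition."""
    for disk, state in routing_table[partition_id]:
        if disk == disk_id:
            return state
    raise KeyError(f"disk {disk_id} does not serve partition {partition_id}")

print(primary_of(1))                          # 211
print([state_of(211, p) for p in (1, 2, 3)])  # ['selected', 'candidate', 'candidate']
```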
As shown in Fig. 2, when the number of copies of the data needs to be reduced to 3, for partition 1 the hard disk 241 at the end of the list, located on server 24, can be set to the candidate state while the remaining hard disks stay in the selected state; for partition 2, hard disk 211 at the end of the list can be set to the candidate state; and for partition 3, hard disk 211 at the end of the list can likewise be set to the candidate state. Conversely, to increase the number of copies, for example from 3 to 4, one hard disk in the candidate state can be updated to the selected state. For example, if hard disk 241 of partition 1 in Fig. 2 is changed from the candidate state to the selected state, partition 1 can support 4-copy storage. In a distributed system, applying the state change to every partition keeps the number of copies consistent across partitions.
In the technical solution provided by this embodiment, the number of copies supported by the distributed storage system can be adjusted flexibly by changing hard disk states. Fewer copies mean higher storage resource utilization, and more copies mean higher data reliability; the embodiments of the present invention therefore make it possible to find a better balance between storage resource utilization and data reliability.
With reference to Fig. 3, a specific method for writing data is described below. The method can be applied to the distributed storage system shown in Fig. 1.
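Changing the copy count amounts to moving the selected/candidate boundary along each partition's ordered list, demoting from the tail first. A minimal sketch under the assumption that every partition gets the same count and that at least one copy must remain; `set_copy_count` is a hypothetical name:

```python
from typing import Dict, List

def set_copy_count(routing: Dict[int, List[int]], copies: int) -> Dict[int, Dict[int, str]]:
    """Return per-partition disk states for a given copy count.

    routing maps partition ID -> disk IDs in linked-list order (head = primary).
    The first `copies` disks stay selected; the tail of the list becomes candidate
    first. The same count is applied to every partition to keep them consistent.
    """
    states = {}
    for pid, disks in routing.items():
        if not 1 <= copies <= len(disks):
            raise ValueError(f"partition {pid}: {copies} copies is outside 1..{len(disks)}")
        states[pid] = {d: ("selected" if i < copies else "candidate")
                       for i, d in enumerate(disks)}
    return states

routing = {1: [211, 221, 231, 241], 2: [251, 261, 271, 211], 3: [291, 301, 311, 211]}
three = set_copy_count(routing, 3)
print(three[1][241], three[2][211], three[3][211])  # candidate candidate candidate
four = set_copy_count(routing, 4)
print(four[1][241])  # selected
```

Only metadata changes here; moving from 3 to 4 copies additionally requires copying existing data onto the newly selected disk.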
Step S31: the operating system (OS) of host 11 sends a write request (also called a write IO request) to any first compute server (the target compute server) of the distributed storage system over the small computer system interface (SCSI) or the Internet small computer system interface (iSCSI). For ease of description, the write request sent in this step is called the first write request or the target write request. The first write request carries the data to be written to the distributed storage system, called the first data (the target data). It also carries a label of the first data, which distinguishes different data. Normally, different data has different labels, although in a few cases different data is allowed to share the same label.
For object storage, the label may be the object name. In key-value storage, the first data is the value and the key can be used as the label.
For block storage, the write address can be used as the label. The write address may be the logical unit number identifier (LUN ID) plus the logical block address (LBA). The LUN ID indicates which LUN the first data is to be written to; the LBA describes the specific location, also called the offset, at which the first data is written in that first LUN (the target LUN). For ease of description, unless otherwise specified, the present invention is described using block storage as the example, and key-value storage is not described separately.
In this step, host 11 may be a device outside the distributed storage system or a device inside it. For example, it may be integrated with any storage server shown in Fig. 1; in that scenario, host 11 may be a fusion of a physical host and a storage server, or a storage server with virtual machine capability. Furthermore, if the host and the storage server are integrated, the storage server itself generates the first data and the write address, and the action of "sending the first write request to the distributed storage system" can be omitted. Correspondingly, the "receive SCSI command" operation in step S32 below is no longer needed, and the storage server that receives the first write request can assemble the LUN ID and LBA directly into a key.
Step S32: the first compute server receives the SCSI write request through the virtual block store (VBS) management software it runs. (As mentioned above, the first compute server may be integrated into a storage server; in that case, the device that receives the SCSI command can be called a storage server, but its role is equivalent to the combination of a compute server and a storage server.) It assembles the LUN ID and the LBA of the first data from the received SCSI command into a key and then performs a hash operation on the key to obtain a hash value. A hash function is a one-way function that maps input of arbitrary length to output of fixed length, and for which it is computationally infeasible to find two different inputs producing the same output. Virtual block store is sometimes also called virtual block system (VBS).
The combination of LUN ID and LBA uniquely identifies a data block, so the two can be concatenated to serve as the label of the data block. For example, the LUN ID can be placed first and the LBA second to form one key. For key-value storage, the key itself can be used directly as the label.
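A minimal sketch of assembling the block label, LUN ID first and LBA second. The zero-padded fixed-width hex fields and the `:` separator are assumptions made here so that different (LUN ID, LBA) pairs cannot concatenate to the same string; the source does not specify an encoding, and `make_block_key` is a hypothetical name:

```python
def make_block_key(lun_id: int, lba: int) -> str:
    """Concatenate LUN ID (first) and LBA (second) into one key.

    Assumption: 8 hex digits for the LUN ID and 16 for the LBA, joined by ':'.
    Fixed widths keep e.g. (1, 16) and (16, 1) from colliding.
    """
    return f"{lun_id:08x}:{lba:016x}"

print(make_block_key(5, 4096))  # 00000005:0000000000001000
```

For key-value storage no assembly step is needed: the existing key is used as the label unchanged.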
The VBS software module can perform volume metadata management. VBS provides a distributed storage access point service through a SCSI or iSCSI interface, enabling the first compute server to access distributed storage resources through VBS. VBS communicates point-to-point with the storage servers, so it can access their hard disks concurrently. One VBS process can be deployed on each storage server, and the VBS processes on multiple nodes form a VBS cluster. Deploying multiple VBS processes on a storage server can also improve IO performance.
Step S33: the first compute server determines the partition ID corresponding to the first data based on the result of the hash operation and the correspondence between hash values and partitions. For ease of description, the partition ID corresponding to the first data (the target data) is called the target partition ID.
Based on a distributed hash table (DHT), the distributed storage system divides the hash space (0 to 2^32) into N parts (for example, N equal parts). Each part is one partition that covers multiple hash values and has a unique partition ID. Each partition corresponds to multiple hard disks. That is, the hash function realizes the chain of correspondences key → hash value → partition → multiple hard disks. The partition-to-hard-disk correspondence is stored in the mapping table, which can be stored on the metadata server. The multiple hard disks can be divided into a primary hard disk and secondary hard disks, or they can be peers with no primary/secondary distinction.
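A minimal sketch of the key → hash value → partition step. The source does not name a hash function or a value of N, so both are assumptions here: the hash is the first 4 bytes of SHA-256 (giving values in [0, 2^32)), and `NUM_PARTITIONS = 3600` is an arbitrary illustrative N:

```python
import hashlib

HASH_SPACE = 2 ** 32    # hash values lie in [0, 2^32)
NUM_PARTITIONS = 3600   # hypothetical N; the text only says "N parts"

def hash32(key: str) -> int:
    # Assumption: the first 4 bytes of SHA-256; the source does not name a hash function.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")

def partition_of(key: str, n: int = NUM_PARTITIONS) -> int:
    # Split [0, 2^32) into n contiguous ranges of (nearly) equal size;
    # the index of the range containing the hash value is the partition ID.
    return hash32(key) * n // HASH_SPACE

pid = partition_of("00000005:0000000000001000")
print(0 <= pid < NUM_PARTITIONS)  # True
```

Contiguous ranges are one way to realize the "N equal parts"; taking `hash32(key) % n` would also give N partitions but interleaves hash values instead of splitting the space into ranges.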
Step S34: using the target partition ID, the first compute server queries the mapping table for the IDs of the hard disks corresponding to the target partition ID and for the states of these hard disks. The mapping table records the correspondence between each partition ID and hard disk IDs, as well as the hard disk states. A state is either selected or candidate, and can be set by an administrator through a program. A hard disk in the selected state is called a selected hard disk, and a hard disk in the candidate state is called a candidate hard disk. Data can be written to selected hard disks but not to candidate hard disks. Note that the state here is not a physical state of the hard disk but a logical setting that marks whether the hard disk may receive a copy of the data when data is written. Each hard disk ID corresponds to a storage server address, so once the hard disk IDs of the target partition are obtained, the addresses of the corresponding storage servers can also be obtained. The hard disk states are maintained on the metadata server.
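Step S34 boils down to two lookups: partition → (disk, state) entries, then disk → storage server. A minimal sketch that returns only the selected disks, in list order so the primary comes first; `MAPPING_TABLE`, `DISK_TO_SERVER`, the server names, and `resolve_write_targets` are all hypothetical:

```python
from typing import List, Tuple

# Hypothetical metadata held by the metadata server: partition ID -> ordered
# (disk ID, state) entries, and disk ID -> address of the storage server hosting it.
MAPPING_TABLE = {
    1: [(211, "selected"), (221, "selected"), (231, "selected"), (241, "candidate")],
}
DISK_TO_SERVER = {211: "storage-server-21", 221: "storage-server-22",
                  231: "storage-server-23", 241: "storage-server-24"}

def resolve_write_targets(partition_id: int) -> List[Tuple[int, str]]:
    """Return (disk ID, server address) for the selected disks only, in list order."""
    return [(disk, DISK_TO_SERVER[disk])
            for disk, state in MAPPING_TABLE[partition_id]
            if state == "selected"]

print(resolve_write_targets(1))
# [(211, 'storage-server-21'), (221, 'storage-server-22'), (231, 'storage-server-23')]
```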
The VBS module of step S35, the first calculation server are sent to the storage server where Primary Hard Drive: described first Data in the hard disk ID of state has been selected (may include Primary Hard Drive ID and from hard disk ID;Or do not include Primary Hard Drive ID, only wrap It includes from hard disk ID).After storage server where Primary Hard Drive receives the first data, according to Primary Hard Drive ID, first data The Primary Hard Drive is written;And institute is sent according to from hard disk ID to the storage server where having selected the slave hard disk of state State the first data.It should be noted that first calculation server will not be to the storage where the hard disk in alternative state Server sends first data.
By taking the subregion 1 of Fig. 3 as an example, the storage server 21 where Primary Hard Drive 211 executes following operation: the first data are write Enter Primary Hard Drive 211;First data are sent to the storage server 22 at 221 place of hard disk, carry and deposit in the request of transmission Store up the address of server 22 and the ID of hard disk 221;The storage server first data being sent to where hard disk 231 23, the address of storage server 23 and the ID of hard disk 231 are carried in the request of transmission.However, although hard disk 241 is also subregion Hard disk corresponding to 1, but since the state of hard disk 241 is in alternative state, storage server 21 will not be by described One data are sent to the storage server 24 where hard disk 241.
Step S36: Each storage server hosting a secondary hard disk receives the first data and the secondary hard disk ID from the server hosting the primary hard disk. It then stores the first data on its own secondary hard disk according to that ID. That is, storage server 22 writes the first data to hard disk 221, and storage server 23 writes the first data to hard disk 231. Storage server 24 does not receive the first data, so the first data is not written to hard disk 241.
After writing the first data, storage servers 22 and 23 each send a write-success response to storage server 21. Storage server 21 then sends a write-success response to the host, which tells the host that the first data has been written successfully.
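The acknowledgement rule can be sketched as follows. The sketch assumes that storage server 21 reports success to the host only after every selected secondary has confirmed its write. The source does not state this explicitly. A missing reply is treated as a failure. The function name is hypothetical.

```python
def write_succeeded(local_ok, expected_secondaries, acks):
    """Decide whether the primary's storage server may report success to the host.

    local_ok: whether the write to the primary hard disk succeeded.
    expected_secondaries: IDs of the selected secondary disks the data was sent to.
    acks: disk ID -> True/False for the responses received so far.
    """
    # Alternative disks never receive data, so they are not in expected_secondaries.
    return local_ok and all(acks.get(d, False) for d in expected_secondaries)


print(write_succeeded(True, [221, 231], {221: True, 231: True}))  # True
print(write_succeeded(True, [221, 231], {221: True}))             # False: 231 has not replied
```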
Step S37: The metadata server sets one or more of the alternative hard disks corresponding to the target partition as selected hard disks. It then arranges for a copy to be replicated from a server that already stores it to each hard disk whose state has changed from alternative to selected. There are two ways to do this:
- The metadata server instructs the server that holds the copy to send it to the server hosting the newly selected hard disk. That server, acting as the host of the newly selected disk, stores the copy on that disk.
- The metadata server instructs the server hosting the newly selected hard disk to obtain the copy from a server that already stores the target data. The server then stores the copy on its local, newly selected hard disk.
Referring to Fig. 4, hard disk 241, which corresponds to partition 1, changes from the alternative state to the selected state. The first copy must therefore be copied from hard disk 211, 221, or 231 to hard disk 241. The distributed storage system then maintains four copies of the first data.
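Step S37 and the Fig. 4 example can be sketched as follows. The sketch uses the "pull" variant, in which the newly selected disk obtains a copy from a disk that already holds the data. Several details are assumptions:
- the `promote` function;
- the dictionary representation of disk states and contents;
- using the first selected disk as the copy source.

The sketch also ignores writes that arrive while the copy is in progress.

```python
SELECTED, ALTERNATIVE = "selected", "alternative"


def promote(states, contents, disk_id):
    """Change one alternative disk of a partition to selected and fill it with a copy.

    states: disk ID -> SELECTED / ALTERNATIVE for one partition (kept by the metadata server).
    contents: disk ID -> list of stored blocks (toy stand-in for the disks).
    Returns the disk ID the copy was taken from.
    """
    if states[disk_id] != ALTERNATIVE:
        raise ValueError(f"disk {disk_id} is not in the alternative state")
    sources = [d for d, s in states.items() if s == SELECTED]
    if not sources:
        raise RuntimeError("no selected disk holds a copy to replicate")
    states[disk_id] = SELECTED                 # update the state first, as in step S37
    source = sources[0]                        # any existing copy would do
    contents[disk_id] = list(contents.get(source, []))
    return source


states = {211: SELECTED, 221: SELECTED, 231: SELECTED, 241: ALTERNATIVE}
contents = {211: [b"first data"], 221: [b"first data"], 231: [b"first data"]}
print(promote(states, contents, 241))                     # 211
print(sum(1 for s in states.values() if s == SELECTED))   # 4 copies of the first data
```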
A partition's alternative hard disk can be set as a selected hard disk, or a selected hard disk can be set as an alternative hard disk. An administrator can configure this manually on the metadata server, or the metadata server can do it automatically according to a policy. For example, if the storage servers of the distributed system detect that the data has become hotter (more frequently accessed), an alternative hard disk is set as a selected hard disk to increase the number of copies. Conversely, the number of copies is reduced.
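One way to express such a policy is a threshold rule on a per-partition "temperature". The following are illustrative assumptions:
- the access-rate metric;
- the `hot` and `cold` thresholds;
- the `min_copies` floor;
- the one-disk-at-a-time step.

The source only states that hotter data gets more copies and that the count can also be reduced.

```python
def adjust_selected_count(current, total_disks, access_rate,
                          hot=1000.0, cold=100.0, min_copies=2):
    """Return the new number of selected disks for one partition.

    current: disks currently in the selected state (= number of copies).
    total_disks: all disks mapped to the partition (selected + alternative).
    access_rate: hypothetical temperature metric, e.g. I/O requests per second.
    """
    if access_rate > hot and current < total_disks:
        return current + 1  # promote one alternative disk: one more copy
    if access_rate < cold and current > min_copies:
        return current - 1  # demote one selected disk: one copy fewer
    return current          # between the thresholds: leave the copy count alone


print(adjust_selected_count(3, 4, access_rate=5000))  # 4
print(adjust_selected_count(3, 4, access_rate=10))    # 2
```

Using two thresholds rather than one leaves a band in which nothing changes. This keeps a partition whose load hovers near a single cut-off from flipping disks back and forth.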
Steps S38-S43: After one or more of the alternative hard disks corresponding to the target partition have been set as selected hard disks, the host sends a second write command to the distributed storage system. The second write command carries second data and a second label, which identifies the second data. A second compute server of the distributed storage system receives the second write command and stores the second data. Steps S38-S43 are similar to steps S31-S36, so refer to steps S31-S36 for details.

Steps S38-S43 differ from steps S31-S36 in two ways:
- The write command is different, so the stored data is different.
- After step S37 there are fewer alternative hard disks (possibly none) and more selected hard disks, so the number of copies stored for the data increases accordingly.

For example, suppose hard disk 241 of partition 1 in Fig. 4 has changed from the alternative state to the selected state. Storage server 21 then writes the second data to primary hard disk 211. It also sends:
- the second data and the ID of hard disk 221 to storage server 22;
- the second data and the ID of hard disk 231 to storage server 23;
- the second data and the ID of hard disk 241 to storage server 24.

The second data is finally stored on hard disks 211, 221, 231, and 241.

Steps S38-S43 and steps S31-S36 may involve the same compute server or different ones. To distinguish their roles, the compute server in steps S31-S36 is called the first compute server, and the compute server in steps S38-S43 is called the second compute server.
In other embodiments, steps S38-S43 can be replaced by a different scheme, in which the metadata server sets selected hard disks of the target partition to the alternative state. After one or more selected hard disks of the target partition are set as alternative hard disks, the server hosting each such disk is notified to delete its local copy.

Referring to Fig. 5, relative to Fig. 2, hard disk 231 of partition 1 changes from the selected state to the alternative state. The first copy on hard disk 231 can therefore be deleted. Suppose a new write command for partition 1 arrives after the selected hard disk has been set to the alternative state. To distinguish it from the first and second write commands, this command is called the third write command. The third write command carries third data and a third label. The number of stored copies of the third data corresponds to the number of selected hard disks, so the third data is stored on hard disks 211 and 221. The storage process for the third data is similar to steps S31-S36 and is not repeated here.
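The demotion path and the subsequent third write can be sketched in the same toy model. Several details are assumptions:
- the `demote` and `write` helpers are hypothetical;
- deleting an entry from `contents` stands in for notifying the storage server to delete its local copy;
- refusing to demote the last selected disk is an added safeguard that the source does not mention.

```python
SELECTED, ALTERNATIVE = "selected", "alternative"


def demote(states, contents, disk_id):
    """Change a selected disk to alternative and drop the copy it holds."""
    if states[disk_id] != SELECTED:
        raise ValueError(f"disk {disk_id} is not in the selected state")
    if sum(1 for s in states.values() if s == SELECTED) == 1:
        raise ValueError("refusing to demote the last selected disk")
    states[disk_id] = ALTERNATIVE
    contents.pop(disk_id, None)  # stands in for notifying the server to delete its copy


def write(states, contents, data):
    """Store data on every selected disk of the partition; return the disks used."""
    targets = sorted(d for d, s in states.items() if s == SELECTED)
    for d in targets:
        contents.setdefault(d, []).append(data)
    return targets


# Partition 1 before disk 231 is demoted, with the first data already written.
states = {211: SELECTED, 221: SELECTED, 231: SELECTED, 241: ALTERNATIVE}
contents = {211: [b"first data"], 221: [b"first data"], 231: [b"first data"]}
demote(states, contents, 231)                   # disk 231 becomes alternative
print(write(states, contents, b"third data"))   # [211, 221]
```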
All partitions in the same distributed storage system can keep the same number of copies, that is, the same ratio of selected hard disks to alternative hard disks.
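A check for this property could compare the number of selected disks across partitions. This sketch treats "same number of copies" as "same count of selected disks". That also implies the same selected-to-alternative ratio when every partition maps to the same number of disks. The function name and the disk IDs of partition 2 are hypothetical.

```python
from collections import Counter


def copy_counts_consistent(partitions):
    """partitions: partition ID -> {disk ID: "selected" | "alternative"}.

    Returns True when every partition has the same number of selected disks,
    i.e. the same number of copies per piece of data.
    """
    counts = {pid: Counter(states.values())["selected"]
              for pid, states in partitions.items()}
    return len(set(counts.values())) <= 1


partitions = {
    1: {211: "selected", 221: "selected", 231: "selected", 241: "alternative"},
    # Hypothetical second partition, for illustration only.
    2: {212: "selected", 222: "selected", 232: "selected", 242: "alternative"},
}
print(copy_counts_consistent(partitions))  # True
```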

Claims (12)

1. A data writing method for writing data to storage nodes of a distributed storage system, wherein the distributed storage system includes a compute node and the storage nodes, each storage node includes a hard disk, and the method includes:
The compute node receives a write command, where the write command carries target data and a target label, and the target label is used to identify the target data;
The compute node queries the target partition corresponding to the target label;
The compute node queries the multiple hard disk IDs corresponding to the target partition, where the multiple hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the alternative state; the hard disks in the selected state can be used to store copies of the target data and include a primary hard disk and a secondary hard disk, and the hard disks in the alternative state are not used to store copies of the target data;
The compute node sends the target data and a selected-hard-disk list to the storage node hosting the primary hard disk, where the selected-hard-disk list includes the secondary hard disk ID and does not include any alternative hard disk ID;
The storage node hosting the primary hard disk stores the target data on the primary hard disk and, according to the secondary hard disk ID, sends the target data to the storage node hosting the secondary hard disk;
The storage node hosting the secondary hard disk stores the target data on its local secondary hard disk.
2. The method according to claim 1, further including, after the storage node hosting the secondary hard disk stores the target data on its local secondary hard disk:
Updating the state of a hard disk in the alternative state corresponding to the target partition to the selected state;
Obtaining the target data from the primary hard disk or secondary hard disk that has stored the target data, and storing the obtained target data on the newly selected hard disk.
3. The method according to claim 1, wherein:
The target data is a data block;
The target label is a combination of a logical unit number identifier (LUN ID) and a logical block address (LBA).
4. The method according to claim 1, wherein querying the target partition corresponding to the target label specifically includes:
Calculating a hash value of the target label, and determining, according to the correspondence between hash values and target partitions, the target partition corresponding to the target label.
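Claims 3 and 4 together describe a lookup from a (LUN ID, LBA) label to a partition through a hash. This sketch makes three assumptions, none of which the claims specify:
- a fixed byte encoding of the label;
- SHA-256 from Python's standard `hashlib`;
- the digest modulo a fixed partition count as the hash-to-partition correspondence.

```python
import hashlib


def target_partition(lun_id, lba, num_partitions):
    """Map a target label (LUN ID, LBA) to a partition ID.

    The label encoding, the hash function, and the modulo mapping are
    illustrative assumptions, not the claimed implementation.
    """
    key = lun_id.to_bytes(8, "big") + lba.to_bytes(8, "big")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


p = target_partition(lun_id=7, lba=4096, num_partitions=3600)  # 3600 is a hypothetical count
print(0 <= p < 3600)  # True
```

The mapping is deterministic, so every write to the same LUN and LBA is routed to the same partition. It therefore reaches the same set of selected hard disks.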
5. The method according to claim 1, wherein:
The states of the multiple hard disks corresponding to the target partition are associated with the target partition.
6. A data writing method for writing data to a distributed storage system, wherein the distributed storage system includes a compute node and multiple storage nodes, each storage node includes a hard disk, and the method includes:
The compute node receives a write command, where the write command carries target data and a target label, and the target label is used to identify the target data;
The compute node queries the target partition corresponding to the target label;
The compute node queries the multiple hard disk IDs corresponding to the target partition, where the multiple hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the alternative state; the hard disks in the selected state can be used to store copies of the target data, and the hard disks in the alternative state are not used to store copies of the target data;
The compute node sends the target data to the storage nodes hosting the hard disks in the selected state, according to the IDs of those hard disks among the multiple hard disk IDs corresponding to the target partition.
7. The method according to claim 6, further including, after the storage node hosting the secondary hard disk stores the target data on its local secondary hard disk:
Updating the state of an alternative hard disk corresponding to the target partition to selected;
Obtaining the target data from the primary hard disk or secondary hard disk that has stored the target data, and storing the obtained target data on the newly selected hard disk.
8. A distributed storage system, including a compute node, a metadata node, and multiple storage nodes, each storage node including a hard disk, wherein:
The compute node is configured to:
Receive a write command, where the write command carries target data and a target label, and the target label is used to identify the target data;
Query the target partition corresponding to the target label;
Query, from the metadata node, the multiple hard disk IDs corresponding to the target partition, where the multiple hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the alternative state; the hard disks in the selected state can be used to store copies of the target data and include a primary hard disk and a secondary hard disk, and the hard disks in the alternative state are not used to store copies of the target data;
Send the target data and a selected-hard-disk list to the storage node hosting the primary hard disk, where the selected-hard-disk list includes the secondary hard disk ID and does not include any alternative hard disk ID;
The storage node hosting the primary hard disk is configured to:
Store the target data on the primary hard disk and, according to the secondary hard disk ID, send the target data to the storage node hosting the secondary hard disk;
The storage node hosting the secondary hard disk is configured to:
Store the target data on its local secondary hard disk.
9. The distributed storage system according to claim 8, wherein the metadata node is further configured to:
Update the state of the alternative hard disk to selected, to increase the number of selected hard disks corresponding to the target partition;
Instruct that the target data be copied from the primary hard disk or secondary hard disk that has stored the target data to the newly selected hard disk.
10. The distributed storage system according to claim 8, wherein:
The target data is a data block;
The target label is a combination of a logical unit number identifier (LUN ID) and a logical block address (LBA).
11. The distributed storage system according to claim 8, wherein:
The states of the multiple hard disks corresponding to the target partition are associated with the target partition.
12. A compute node, wherein the compute node and multiple storage nodes belong to a distributed storage system, each storage node includes a hard disk, the compute node includes a processor, a memory, and an interface, a computer program is stored in the memory, and the compute node executes the computer program to:
Receive a write command through the interface, where the write command carries target data and a target label, and the target label is used to identify the target data;
Query the target partition corresponding to the target label;
Query the multiple hard disk IDs corresponding to the target partition, where the multiple hard disks corresponding to the target partition include hard disks in the selected state and hard disks in the alternative state; the hard disks in the selected state can be used to store copies of the target data, and the hard disks in the alternative state are not used to store copies of the target data;
Send, through the interface, the target data to the storage nodes hosting the hard disks in the selected state, according to the IDs of those hard disks among the multiple hard disk IDs corresponding to the target partition.
CN201811095443.8A 2018-09-19 2018-09-19 Data writing method, computing node and distributed storage system Active CN109407975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811095443.8A CN109407975B (en) 2018-09-19 2018-09-19 Data writing method, computing node and distributed storage system

Publications (2)

Publication Number Publication Date
CN109407975A true CN109407975A (en) 2019-03-01
CN109407975B CN109407975B (en) 2020-08-25

Family

ID=65465017

Country Status (1)

Country Link
CN (1) CN109407975B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant