CN109710456A - Data recovery method and device - Google Patents

Data recovery method and device

Info

Publication number
CN109710456A
Authority
CN
China
Prior art keywords
target
osd
data
ceph cluster
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811501810.XA
Other languages
Chinese (zh)
Other versions
CN109710456B (en)
Inventor
金朴堃
杨潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd
Original Assignee
New H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd filed Critical New H3C Technologies Co Ltd
Priority to CN201811501810.XA priority Critical patent/CN109710456B/en
Publication of CN109710456A publication Critical patent/CN109710456A/en
Application granted granted Critical
Publication of CN109710456B publication Critical patent/CN109710456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data recovery method and device. When the object storage device (OSD) topology of a Ceph cluster changes, the method comprises: determining a target placement group (PG) whose data is to be recovered; detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and detecting the load state of the Ceph cluster; and, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the load state of the Ceph cluster is busy, delaying recovery of the data to be recovered in the target PG. The method provided by the application prevents data recovery from aggravating the load on the OSDs and thereby affecting the Ceph cluster's processing of client services.

Description

Data recovery method and device
Technical field
This application relates to the field of storage, and in particular to a data recovery method and a data recovery device.
Background
With the rapid development of technologies such as cloud computing, big data and the Internet of Things, data volumes are growing explosively, and traditional data storage technologies can no longer meet the demands of today's society. Ceph, a distributed storage system, was proposed to address this.

Ceph is a distributed storage technology that integrates object storage, block storage and file storage services, and offers advantages such as high reliability, a high degree of automation and strong scalability.

In a Ceph cluster, when the topology of the OSDs (Object Storage Devices) in the cluster changes, the data in the PGs (Placement Groups) corresponding to the affected OSDs needs to be recovered. For example, when an OSD in the Ceph cluster fails, the data in the PGs corresponding to the failed OSD needs to be recovered onto normal OSDs. However, when the Ceph cluster is processing a large number of service I/Os (Input/Output) sent by clients, performing data recovery at that moment further increases the cluster's workload; once the workload exceeds a certain limit, the Ceph cluster suspends part of the service I/O sent by clients, severely affecting the clients' services.
Summary of the invention
In view of this, the application provides a data recovery method and device, so as to prevent data recovery from aggravating the load on the OSDs and thereby affecting the Ceph cluster's processing of client services.
Specifically, the application is achieved by the following technical solution:
According to a first aspect of the application, a data recovery method is provided. The method is applied to a monitor in a distributed storage system (Ceph) cluster. When the object storage device (OSD) topology of the Ceph cluster changes, the method includes:
determining a target placement group (PG) whose data is to be recovered;
detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and detecting the load state of the Ceph cluster;
if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the load state of the Ceph cluster is busy, delaying recovery of the data to be recovered in the target PG.
Optionally, the method further includes:
if the number of normal OSD copies corresponding to the target PG is less than the minimum copy number, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.
Optionally, detecting the load state of the Ceph cluster includes:
detecting whether the current value of a cluster load parameter reflecting the current load of the Ceph cluster is greater than a first preset value;

if so, determining that the Ceph cluster is in a busy state;

if not, further detecting the current value of a node load parameter reflecting the current load of each OSD in the Ceph cluster; if there is an OSD in the Ceph cluster whose node load parameter is greater than a second preset value, determining that the Ceph cluster is in a busy state; and if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determining that the Ceph cluster is in a non-busy state.
Optionally, after determining the target PG whose data is to be recovered, the method includes:

starting a preset timer;

the detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number and detecting the load state of the Ceph cluster includes:

detecting whether the timer has timed out;

if the timer has timed out, detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number, and detecting the load state of the Ceph cluster;

the delaying recovery of the data in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the Ceph cluster is in a busy state includes:

if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the Ceph cluster is in a busy state, returning to the step of detecting whether the timer has timed out.
Optionally, recovering the data to be recovered in the target PG includes:

when starting to recover the data to be recovered in the target PG, starting a preset second timer;

when the second timer times out, detecting whether all the data to be recovered in the target PG has been recovered;

if not, stopping the recovery of the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer has timed out.
Optionally, the cluster load parameter includes the ratio of the current service I/O count of the Ceph cluster to the current total I/O count of the Ceph cluster;

the node load parameter includes the disk utilization and the number of I/O operations per second (IOPS).
Optionally, before determining the target PG whose data is to be recovered, the method further includes:

calculating the OSD group corresponding to each PG in the Ceph cluster;

the determining the target PG whose data is to be recovered includes:

for each PG, if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, determining that the PG is a target PG whose data is to be recovered.
According to a second aspect of the application, a data recovery device is provided. The device is applied to a monitor in a distributed storage system (Ceph) cluster. When the object storage device (OSD) topology of the Ceph cluster changes, the device includes:

a determination unit, configured to determine a target placement group (PG) whose data is to be recovered;

a detection unit, configured to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and to detect the load state of the Ceph cluster;

a delay unit, configured to delay recovery of the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the load state of the Ceph cluster is busy.
Optionally, the device further includes:

a recovery unit, configured to recover the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is less than the minimum copy number or the Ceph cluster is in a non-busy state.
Optionally, the detection unit is configured to detect whether the current value of a cluster load parameter reflecting the current load of the Ceph cluster is greater than a first preset value; if so, determine that the Ceph cluster is in a busy state; if not, further detect the current value of a node load parameter reflecting the current load of each OSD in the Ceph cluster; if there is an OSD in the Ceph cluster whose node load parameter is greater than a second preset value, determine that the Ceph cluster is in a busy state; and if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in a non-busy state.
Optionally, the device further includes:

a starting unit, configured to start a preset timer;

the detection unit is specifically configured to detect whether the timer has timed out; and if the timer has timed out, detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number, and detect the load state of the Ceph cluster;

the delay unit is specifically configured to, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the Ceph cluster is in a busy state, return to the step of detecting whether the timer has timed out.

Optionally, when recovering the data to be recovered in the target PG, the recovery unit is specifically configured to: when starting to recover the data to be recovered in the target PG, start a preset second timer; when the second timer times out, detect whether all the data to be recovered in the target PG has been recovered; and if not, stop the recovery of the unrecovered data in the target PG, close the second timer, and return to the step of detecting whether the timer has timed out.
Optionally, the cluster load parameter includes the ratio of the current service I/O count of the Ceph cluster to the current total I/O count of the Ceph cluster;

the node load parameter includes the disk utilization and the number of I/O operations per second (IOPS).
Optionally, the device further includes:

a calculation unit, configured to calculate the OSD group corresponding to each PG in the Ceph cluster;

the determination unit is specifically configured to, for each PG, determine that the PG is a target PG whose data is to be recovered if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
As can be seen from the above description, on the one hand, when the OSD topology in the Ceph cluster changes, the application does not immediately recover the data to be recovered in the target PG, but first judges the current load state of the Ceph cluster and delays the recovery of the data in the target PG when it determines that the Ceph cluster is in a busy state. This effectively prevents data recovery from aggravating the load on the OSDs and affecting the Ceph cluster's processing of client services.

On the other hand, the application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum copy number, thereby ensuring that, even if recovery of the data in the target PG is delayed, there are still enough normal OSD copies to handle the read and write services directed at the target PG.
Detailed description of the invention
Fig. 1 is a networking architecture diagram of a Ceph cluster according to an exemplary embodiment of the application;

Fig. 2 is a flowchart of a data recovery method according to an exemplary embodiment of the application;

Fig. 3 is a flowchart of another data recovery method according to an exemplary embodiment of the application;

Fig. 4 is a block diagram of a data recovery device according to an exemplary embodiment of the application;

Fig. 5 is a hardware structure diagram of a monitor according to an exemplary embodiment of the application.
Specific embodiment
Exemplary embodiments are described in detail here, and examples thereof are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

The terms used in the application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "the" and "said" used in the application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third and so on may be used in the application to describe various pieces of information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "as" or "in response to determining".
Referring to Fig. 1, Fig. 1 is a networking architecture diagram of Ceph-based storage according to an exemplary embodiment of the application.

The networking includes a Ceph cluster and clients.

A client, also referred to as a client of the Ceph cluster, mainly interacts with the Ceph cluster so that the Ceph cluster can process the read and write services of the client.

Specifically, the Ceph cluster can receive service I/O (for example read I/O, write I/O, etc.) sent by a client and execute the client's read and write services based on that service I/O.

For example, when the Ceph cluster receives a write I/O sent by a client, it writes the data carried by the write I/O locally. When the Ceph cluster receives a read I/O sent by a client, it reads data according to the read I/O and returns the read data to the client.
The Ceph cluster may include a monitor and multiple OSDs. Of course, the Ceph cluster may also include other equipment such as metadata servers; the Ceph cluster is only described here by way of example, and the equipment included in the Ceph cluster is not specifically limited.

The monitor is mainly used for managing the equipment in the Ceph cluster. The monitor may be a single physical device or a cluster composed of multiple physical devices; this is only an exemplary illustration and is not specifically limited here.

An OSD is mainly responsible for storing data, for example writing data after receiving a write I/O request and reading data after receiving a read I/O request. An OSD is usually a hard disk on a physical server in the Ceph cluster. The function and the physical form of an OSD are only described here by way of example and are not specifically limited.
Several concepts involved in the Ceph cluster are introduced below.
1) PG

A PG, also called a placement group, is the smallest unit of data recovery and migration in a Ceph cluster. A PG is a logical concept, equivalent to a logical collection containing a group of data; the data contained in a PG is stored on the OSD group corresponding to that PG.

The OSD group corresponding to a PG includes multiple OSDs. The data in a PG is replicated into multiple copies, which are stored separately on each OSD in the OSD group corresponding to the PG.

For example, if the OSD group corresponding to PG1 is [1, 2, 3], the OSD group corresponding to PG1 includes three OSDs, whose identifiers are OSD1, OSD2 and OSD3.

When data directed at PG1 is written, the data can be written to OSD1, and OSD1 synchronizes the data to OSD2 and OSD3, so that each OSD holds one copy of the data in PG1.
2) OSD copies corresponding to a PG

An OSD in the OSD group corresponding to a PG is called an OSD copy corresponding to that PG. The number of OSDs in the OSD group corresponding to a PG is called the OSD copy number corresponding to that PG.

Taking the above PG1, whose corresponding OSD group is [1, 2, 3], as an example: OSD1, OSD2 and OSD3 are each called an OSD copy corresponding to PG1, and the OSD copy number corresponding to PG1 is 3.
3) Service I/O and recovery I/O

The I/O present in a Ceph cluster may include service I/O and recovery I/O.

Service I/O: service I/O refers to I/O from clients and is mainly used to instruct the Ceph cluster to perform the read and write services of the client.

Service I/O may include read I/O from clients, write I/O from clients, and so on.

For example, when the Ceph cluster receives a write I/O sent by a client, it writes the data carried by the write I/O locally. When the Ceph cluster receives a read I/O sent by a client, it reads data according to the read I/O and returns the read data to the client.

Recovery I/O: recovery I/O is generated in the Ceph cluster when the data in a PG is recovered, and is mainly used to guide the recovery of PGs in the Ceph cluster.
4) Recovery of the data in a PG

After the OSD topology in the Ceph cluster changes, the monitor can calculate the OSD group corresponding to each PG in the cluster. Then, for each PG, if the OSD group currently corresponding to the PG differs from the calculated OSD group, the PG is determined to be a PG whose data needs to be recovered.

Recovering the data in a PG whose data is to be recovered means restoring the data in the PG onto the OSDs in the calculated OSD group corresponding to that PG.
The existing data recovery approach is as follows: when the monitor detects that an OSD in the Ceph cluster has failed, the data in the PGs corresponding to the failed OSD is immediately recovered onto normal OSDs. During data recovery, recovery I/O is generated to guide the data recovery, and the data that needs to be recovered is carried in the recovery I/O. After a normal OSD receives the recovery I/O, it writes the data of the PG carried in the recovery I/O locally.

However, if that normal OSD is also handling a large amount of service I/O from clients at that moment, processing the recovery I/O increases the workload of the normal OSD, degrades its performance, and severely affects the processing of the service I/O sent by the clients.
In view of this, the application proposes a data recovery method: after the monitor determines a target PG to be recovered, if the number of normal OSD copies corresponding to the target PG is greater than or equal to a preset minimum copy number and the current Ceph cluster is in a busy state, the recovery of the data in the target PG to the destination OSD is delayed.

On the one hand, when the OSD topology in the Ceph cluster changes, the application does not immediately recover the data in the target PG to be recovered, but first judges the current busy/idle state of the Ceph cluster and delays the recovery of the data in the target PG when it determines that the Ceph cluster is in a busy state. This effectively prevents data recovery from aggravating the load on the OSDs and affecting the Ceph cluster's processing of client services.

On the other hand, the application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum copy number, thereby ensuring that, even if recovery of the data in the target PG is delayed, there are still enough normal OSD copies to handle the read and write services directed at the target PG.

In summary, when performing data recovery, the application can both delay the recovery of the data in the target PG while the Ceph cluster is busy, preventing data recovery from aggravating the OSD load and affecting the Ceph cluster's processing of client services, and ensure that, while recovery of the data in the target PG is delayed, the processing of read and write services directed at the target PG is not affected.
Referring to Fig. 2, Fig. 2 is a flowchart of a data recovery method according to an exemplary embodiment of the application. The method can be applied to the monitor of a Ceph cluster; when the OSD topology of the Ceph cluster changes, the following steps can be performed.
Step 201: determining a target PG whose data is to be recovered.

In a Ceph cluster, when an OSD in the cluster goes offline because of a network exception, a new OSD is added to the cluster, or an OSD fails, the OSD topology in the Ceph cluster changes.

When the monitor detects that the OSD topology in the Ceph cluster has changed, the monitor can determine the target PG to be recovered.
First, how the monitor detects whether the OSD topology in the Ceph cluster has changed is introduced.

In implementation, the monitor can detect whether the OSD topology in the Ceph cluster has changed by detecting whether the OSD cluster map has changed.

Specifically, an OSD cluster map, which records the OSD topology in the current Ceph cluster, is stored on the monitor. If the monitor detects that the OSD cluster map has changed, it determines that the OSD topology in the Ceph cluster has changed; if the OSD cluster map has not changed, the monitor determines that the OSD topology in the Ceph cluster has not changed.
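By way of illustration only, the following Python sketch shows one way such a change check could be expressed. It assumes (as with Ceph's own OSD map versioning) that each revision of the OSD cluster map carries a monotonically increasing epoch number; the class and attribute names are illustrative and are not part of Ceph or of the claimed method.

```python
# Minimal sketch, assuming each revision of the OSD cluster map carries a
# monotonically increasing epoch, so a changed epoch means the topology changed.
class TopologyWatcher:
    def __init__(self):
        self.last_epoch = 0  # epoch of the last OSD cluster map acted upon

    def topology_changed(self, osd_map_epoch):
        """Return True when the OSD cluster map has advanced since the last check."""
        if osd_map_epoch != self.last_epoch:
            self.last_epoch = osd_map_epoch
            return True
        return False
```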
Second, how the monitor determines the target PG to be recovered after the OSD topology in the Ceph cluster changes is introduced.

In implementation, when the monitor determines that the OSD topology in the Ceph cluster has changed, the monitor can calculate the OSD group corresponding to each PG using the CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The calculated OSD group corresponding to a PG represents the mapping between each PG and its OSD group in the Ceph cluster after data recovery.

Then, for each PG, the monitor can obtain the calculated OSD group corresponding to the PG and the OSD group currently corresponding to the PG.

Each PG in the Ceph cluster is configured with two sets: an up set and an acting set. The up set records the OSD group currently corresponding to the PG, and the acting set records the OSD group corresponding to the PG as calculated by the CRUSH algorithm.

The monitor can obtain the OSD group currently corresponding to the PG from the up set corresponding to the PG, and obtain the OSD group calculated by the CRUSH algorithm from the acting set corresponding to the PG.

After obtaining the calculated OSD group corresponding to the PG and the OSD group currently corresponding to the PG, the monitor can detect whether the calculated OSD group corresponding to the PG is consistent with the OSD group currently corresponding to the PG.

If the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, the PG is determined to be a target PG whose data is to be recovered;

if the calculated OSD group corresponding to the PG is consistent with the OSD group currently corresponding to the PG, the PG is determined not to be a PG whose data needs to be recovered.
For example, still taking PG1 as an example, assume that the up set corresponding to PG1 records the OSD group [1, 2, 3] and the acting set corresponding to PG1 records [4, 2, 3].

The monitor obtains the OSD group [1, 2, 3] currently corresponding to PG1 from the up set, and obtains the OSD group [4, 2, 3] corresponding to PG1 as calculated by the CRUSH algorithm from the acting set. Since the OSD group [1, 2, 3] currently corresponding to PG1 is inconsistent with the OSD group [4, 2, 3] calculated by the CRUSH algorithm, the monitor determines that PG1 is a target PG to be recovered.

It should be noted that the target PG described here may be one PG or multiple PGs; the number of target PGs is not specifically limited here.
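The selection step described above can be sketched as follows. The Python helper names are illustrative assumptions (not Ceph's actual API), and the up set / acting set semantics follow the description in this embodiment.

```python
# A minimal sketch of target-PG selection: a PG whose current OSD group (its
# up set, per the text above) differs from the CRUSH-recalculated group (its
# acting set) is a target PG whose data must be recovered.
def find_target_pgs(pg_ids, up_set, acting_set):
    """pg_ids: iterable of PG ids; up_set / acting_set: dict pg_id -> list of OSD ids."""
    return [pg for pg in pg_ids if up_set[pg] != acting_set[pg]]

# Example from the text: PG1 currently maps to OSDs [1, 2, 3] while CRUSH now
# yields [4, 2, 3], so PG1 is selected as a target PG.
assert find_target_pgs(["PG1"], {"PG1": [1, 2, 3]}, {"PG1": [4, 2, 3]}) == ["PG1"]
```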
Step 202: detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and detecting the load state of the Ceph cluster.
Step 202 is described in detail below in two aspects: the trigger mechanism of step 202 and the specific implementation of step 202.

1. Trigger mechanism of step 202

After determining the target PGs whose data is to be recovered, the monitor can start a preset timer (referred to here as the first timer).

The monitor can detect whether the first timer has timed out.

If the first timer has not timed out, the monitor continues to wait for it to time out.

If the first timer has timed out, step 202 is triggered; that is, once the first timer has timed out, the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number and detects the busy/idle state of the Ceph cluster.
2. Specific implementation of step 202

1) The monitor can detect whether the number of normal OSD copies corresponding to the determined target PG is greater than or equal to the preset minimum copy number.

In implementation, the monitor can first determine the number of normal OSD copies corresponding to the target PG.

Specifically, from the OSD group currently corresponding to the target PG and the calculated OSD group corresponding to the target PG, the monitor can select the OSDs that appear in both OSD groups as the normal OSD copies corresponding to the target PG, and then count the number of normal OSDs corresponding to the target PG.

For example, still taking PG1 as the target PG, assume that the OSD group currently corresponding to PG1 is [1, 2, 3] and the calculated OSD group corresponding to PG1 is [4, 2, 3].

The OSDs that appear in both OSD groups are OSD2 and OSD3, so OSD2 and OSD3 are the normal OSD copies corresponding to PG1, and the number of normal OSD copies is 2.
Then, the monitor can detect whether the number of normal OSD copies corresponding to the determined target PG is greater than or equal to the preset minimum copy number.

It should be noted that the minimum copy number is determined by the user according to the smallest number of OSD copies the user can tolerate.

For example, in an existing Ceph cluster, in order to guarantee data reliability, the user sets a copy number (denoted here by N) for Ceph according to factors such as storage space and write latency. Usually N >= 3.

In this application, in addition to the existing copy number N, a minimum copy number M is introduced. The user can set the minimum copy number (denoted here by M) according to the value of N.

For example, the user can set the value of M in the interval [1, N], i.e. 1 <= M <= N. Setting the minimum copy number is only described here by way of example and is not specifically limited.
The purpose of setting the minimum copy number and detecting whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number is to ensure that, even if the recovery of the data in the target PG is delayed, there are still enough normal OSD copies to handle the read and write services directed at the target PG.

Specifically, assume for example that OSD1 has failed, the OSD group currently corresponding to the target PG is [1, 2, 3], and the calculated OSD group corresponding to the target PG is [4, 2, 3]; the data in the target PG then needs to be recovered onto OSD4.

While the data in the target PG is being recovered onto OSD4, the Ceph cluster may still receive service I/O from clients directed at the target PG. Since OSD1 has failed and the data has not yet been fully recovered, this service I/O has to be handled by the remaining OSD copies (i.e. OSD2 or OSD3).

For example, if the service I/O is a read I/O, then because OSD1 has failed and the data on OSD1 has not yet been recovered onto OSD4, the data has to be read from an OSD copy corresponding to the target PG, for example from OSD2 or OSD3.

If all of the OSD copies corresponding to the target PG are abnormal, or the number of OSD copies corresponding to the target PG is less than the user-preset threshold, the service I/O cannot be processed.

For these reasons, the application needs to detect whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number, thereby ensuring that, even if the recovery of the data in the target PG is delayed, there are still enough normal OSD copies to handle the read and write services directed at the target PG.
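For illustration, the copy-count check just described can be sketched as follows; the Python helper names are assumptions and not part of Ceph or of the claimed method.

```python
# The normal OSD copies of a target PG are the OSDs present both in its current
# OSD group and in the newly calculated group; recovery may only be deferred
# while at least M of them remain (the minimum copy number, 1 <= M <= N for a
# configured replica count N).
def normal_copy_count(current_group, calculated_group):
    return len(set(current_group) & set(calculated_group))

def may_defer_recovery(current_group, calculated_group, min_copies):
    return normal_copy_count(current_group, calculated_group) >= min_copies

# Example from the text: current [1, 2, 3] vs calculated [4, 2, 3] leaves
# OSD2 and OSD3 as normal copies, i.e. a count of 2.
assert normal_copy_count([1, 2, 3], [4, 2, 3]) == 2
assert may_defer_recovery([1, 2, 3], [4, 2, 3], min_copies=2)
```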
2) Detecting the busy/idle state of the Ceph cluster

Mode one:

Step 2021: the monitor can detect whether the current value of the cluster load parameter of the Ceph cluster is greater than a first preset value.

The cluster load parameter reflects the load of the cluster as a whole; for example, the cluster load parameter can be the ratio of the current service I/O count of the Ceph cluster to the current total I/O count of the Ceph cluster.

The first preset value can be set by the user according to the actual situation and is not specifically limited here.

In implementation, the monitor can count the current total number of I/Os in the Ceph cluster, A, and the number of service I/Os, B.

The monitor can then calculate the ratio of B to A to obtain C, where C = B/A; C is the current value of the cluster load parameter of the Ceph cluster.
Step 2022: if the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value, the monitor determines that the Ceph cluster is in a busy state.

Step 2023: if the current value of the cluster load parameter of the Ceph cluster is less than or equal to the first preset value, the monitor further detects the node load parameter of each OSD in the Ceph cluster.

The node load parameter of an OSD characterizes the load of that OSD; the larger the current value of an OSD's node load parameter, the more I/O the OSD is carrying and the busier the OSD is.

The node load parameter may include the disk utilization of the OSD and the IOPS (Input/Output Operations Per Second) of the OSD. When the node load parameter is the disk utilization of the OSD, the second preset value is a preset value related to the disk utilization; when the node load parameter is the IOPS of the OSD, the second preset value is a preset value related to the IOPS.

Step 2024: if there is an OSD in the Ceph cluster whose node load parameter is greater than the second preset value, the monitor can determine that the Ceph cluster is in a busy state.

Step 2025: if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, the monitor can determine that the Ceph cluster is in a non-busy state.
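The detection of mode one can be sketched as follows. The Python parameter names and the way the load values are passed in are illustrative assumptions, not actual Ceph metrics or interfaces.

```python
# Illustrative sketch of mode one: first check the cluster-wide ratio of
# service I/O to all I/O, and only fall back to the per-OSD node load
# parameters when that ratio does not already indicate a busy cluster.
def cluster_is_busy(service_io_count, total_io_count, osd_loads,
                    first_preset, second_preset):
    """osd_loads: per-OSD node load values (e.g. disk utilization or IOPS)."""
    # Steps 2021/2022: cluster load parameter C = B / A.
    if total_io_count > 0 and (service_io_count / total_io_count) > first_preset:
        return True
    # Steps 2023-2025: any single OSD above its threshold makes the cluster busy.
    return any(load > second_preset for load in osd_loads)
```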
Mode two:

The monitor detects the node load parameter of each OSD in the Ceph cluster. If there is an OSD in the Ceph cluster whose node load parameter is greater than the second preset value, the monitor can determine that the Ceph cluster is in a busy state; if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, the monitor can determine that the Ceph cluster is in a non-busy state.

The advantages of mode one are as follows.

On the one hand, the application determines the busy/idle state of the Ceph cluster from both the cluster load parameter of the Ceph cluster and the node load parameter of each OSD in the Ceph cluster; that is, it reflects the state of the Ceph cluster both as a whole and at each node, so that the busy/idle state of the Ceph cluster is reflected more comprehensively.

On the other hand, compared with mode two, an existing Ceph cluster can detect and record its overall load parameter, so the monitor can read the cluster load parameter of the Ceph cluster directly, whereas obtaining the node load parameter of each OSD requires collecting it from each OSD node. Reading the cluster load parameter of the Ceph cluster is therefore more convenient than obtaining the node load parameter of each OSD.

Hence, using mode one, as soon as the monitor determines that the cluster load parameter of the Ceph cluster is greater than the first preset value, it can determine that the Ceph cluster is in a busy state without having to obtain the node load parameter of each OSD, which greatly speeds up the determination of the busy/idle state of the Ceph cluster.
It should be noted that the application does not specifically limit the order of "detecting whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number" and "detecting the busy/idle state of the Ceph cluster".
Step 203: if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number and the Ceph cluster is in a busy state, delaying recovery of the data in the target PG.
In the embodiment of the application, if the Ceph cluster is currently in a busy state, the monitor does not immediately recover the data in the target PG onto the OSDs in the calculated OSD group corresponding to the target PG. Instead, it periodically detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number and detects the busy/idle state of the Ceph cluster, and only recovers the data in the target PG when the number of normal OSD copies corresponding to the target PG is less than the minimum copy number, or when the current Ceph cluster is in a non-busy state.

In implementation, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number and the Ceph cluster is in a busy state, the monitor returns to the step "detecting whether the first timer has timed out" in step 202. If the first timer has timed out, the detection of step 202 is performed again and execution continues; if it has not timed out, the monitor waits for the first timer to time out. Recovery of the data in the target PG starts only once the number of normal OSD copies corresponding to the target PG is less than the minimum copy number or the Ceph cluster is in a non-busy state.
For example, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number and the Ceph cluster is in a busy state, the monitor checks whether the first timer has timed out.

If the first timer has not timed out, the monitor waits for it to time out. When the first timer times out, the monitor again detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number and detects the current state of the Ceph cluster. If the number of normal OSD copies corresponding to the target PG is still greater than or equal to the preset minimum copy number and the current Ceph cluster is still in a busy state, the monitor again checks whether the first timer has timed out.

If the first timer has not timed out, the monitor waits for it to time out; when it times out, the monitor again detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum copy number and detects the current state of the Ceph cluster, and so on, until the number of normal OSD copies corresponding to the target PG is less than the minimum copy number or the current Ceph cluster is detected to be in a non-busy state, at which point recovery of the data in the target PG starts.
Step 204: if the number of normal OSD copies corresponding to the target PG is less than the minimum copy number, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.

In implementation, when starting to recover the data to be recovered in the target PG, the monitor starts a preset second timer.

When the second timer times out, the monitor detects whether all the data to be recovered in the target PG has been recovered.

If all the data to be recovered in the target PG has been recovered, the data recovery process ends.

If not all the data to be recovered in the target PG has been recovered, the monitor stops recovering the unrecovered data in the target PG, closes the second timer, and returns to the step "detecting whether the first timer has timed out" in step 202.

When the first timer times out, the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number and detects the load state of the Ceph cluster. If the number of normal OSD copies corresponding to the target PG is less than the minimum copy number or the Ceph cluster is in a non-busy state, the monitor starts recovering the unrecovered data in the target PG and starts the second timer.

When the second timer times out, the monitor detects whether all the data to be recovered in the target PG has been recovered. If so, the process ends. If not, the monitor stops recovering the currently unrecovered data in the target PG, closes the second timer, and returns to the step "detecting whether the first timer has timed out", until all the data to be recovered in the target PG has been recovered.
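A simplified, single-threaded sketch of this control flow (steps 202-204) is given below. The timer lengths, callback names and the recovery granularity are illustrative assumptions and do not correspond to Ceph's actual interfaces.

```python
import time

# The check is retried on every first-timer expiry, and an in-progress
# recovery is re-evaluated on every second-timer expiry.
def recover_target_pg(pg, min_copies, first_timeout, second_timeout,
                      normal_copies, cluster_is_busy, recover_for, fully_recovered):
    while True:
        time.sleep(first_timeout)                    # wait for the first timer
        if normal_copies(pg) >= min_copies and cluster_is_busy():
            continue                                 # defer: back to the first timer
        # Too few normal copies, or the cluster is idle: recover now.
        recover_for(pg, duration=second_timeout)     # run until the second timer fires
        if fully_recovered(pg):
            return                                   # all data in the PG recovered
        # Not finished yet: stop and re-check copies and load on the next cycle.
```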
As can be seen from the above description, on the one hand, when the OSD topology in the Ceph cluster changes, the application does not immediately recover the data in the target PG to be recovered, but first judges the current busy/idle state of the Ceph cluster and delays the recovery of the data in the target PG when it determines that the Ceph cluster is in a busy state. This effectively prevents data recovery from aggravating the load on the OSDs and affecting the Ceph cluster's processing of client services.

On the other hand, the application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum copy number, thereby ensuring that, even if recovery of the data in the target PG is delayed, there are still enough normal OSD copies to handle the read and write services directed at the target PG.

In a third aspect, the data recovery method provided by the application is compatible with existing Ceph clusters, for example with the CRUSH algorithm of the Ceph cluster, so the data recovery method provided by the application has good compatibility.
Referring to Fig. 3, Fig. 3 is a flowchart of another data recovery method according to an exemplary embodiment of the application. The method can be applied to the monitor in a Ceph cluster.

Step 301: when the OSD topology in the Ceph cluster changes, determining a target PG whose data is to be recovered.

Step 302: starting the first timer.

Step 303: detecting whether the first timer has timed out.

If the first timer has not timed out, step 304 is performed.

If the first timer has timed out, step 305 is performed.
Step 304: the monitor waits for the first timer to time out.

Step 305: the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number.

If the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number, step 306 is performed.

If the number of normal OSD copies currently corresponding to the target PG is less than the preset minimum copy number, step 308 is performed.

Step 306: the monitor detects whether the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value.

If the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value, return to step 303.

If the current value of the cluster load parameter of the Ceph cluster is less than or equal to the first preset value, step 307 is performed.
Step 307: the monitor detects whether the current value of the node load parameter of each OSD in the Ceph cluster is greater than the second preset value.

If the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, step 308 is performed.

If there is an OSD in the Ceph cluster whose node load parameter is greater than the second preset value, return to step 303.

Step 308: the monitor starts recovering the data of the target PG and starts the second timer.

Step 309: after the second timer times out, the monitor detects whether all the data to be recovered in the target PG has been recovered.

If all the data to be recovered in the target PG has been recovered, step 311 is performed.

If there is still unrecovered data in the target PG, step 310 is performed.

Step 310: the monitor stops recovering the currently unrecovered data in the target PG and closes the second timer.

After step 310 is executed, return to step 303.

Step 311: the monitor ends the recovery of the data in the target PG.
The embodiment of the application also provides a data recovery device corresponding to the above data recovery method.

Referring to Fig. 4, Fig. 4 is a block diagram of a data recovery device according to an exemplary embodiment of the application. The data recovery device can be applied to a monitor and may include the following units.
a determination unit 401, configured to determine a target placement group (PG) whose data is to be recovered;

a detection unit 402, configured to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and to detect the load state of the Ceph cluster;

a delay unit 403, configured to delay recovery of the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the load state of the Ceph cluster is busy.

Optionally, the device further includes:

a recovery unit 404, configured to recover the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is less than the minimum copy number or the Ceph cluster is in a non-busy state.

Optionally, the detection unit 402 is configured to detect whether the current value of a cluster load parameter reflecting the current load of the Ceph cluster is greater than a first preset value; if so, determine that the Ceph cluster is in a busy state; if not, further detect the current value of a node load parameter reflecting the current load of each OSD in the Ceph cluster; if there is an OSD in the Ceph cluster whose node load parameter is greater than a second preset value, determine that the Ceph cluster is in a busy state; and if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in a non-busy state.
Optionally, the device further includes:

a starting unit 405, configured to start a preset timer;

the detection unit 402 is specifically configured to detect whether the timer has timed out; and if the timer has timed out, detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number, and detect the load state of the Ceph cluster;

the delay unit 403 is specifically configured to, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the Ceph cluster is in a busy state, return to the step of detecting whether the timer has timed out.

Optionally, when recovering the data to be recovered in the target PG, the recovery unit 404 is specifically configured to: when starting to recover the data to be recovered in the target PG, start a preset second timer; when the second timer times out, detect whether all the data to be recovered in the target PG has been recovered; and if not, stop the recovery of the unrecovered data in the target PG, close the second timer, and return to the step of detecting whether the timer has timed out.
Optionally, the cluster load parameter includes the ratio of the current service I/O count of the Ceph cluster to the current total I/O count of the Ceph cluster;

the node load parameter includes the disk utilization and the number of I/O operations per second (IOPS).

Optionally, the device further includes:

a calculation unit 406, configured to calculate the OSD group corresponding to each PG in the Ceph cluster;

the determination unit 401 is specifically configured to, for each PG, determine that the PG is a target PG whose data is to be recovered if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
Correspondingly, the application also provides a hardware structure corresponding to the device shown in Fig. 4.

The monitor described in the application may consist of a single physical server, or may be a virtual monitor virtualized from multiple physical servers. When the monitor is a physical device, its hardware structure may be as shown in Fig. 5.

Referring to Fig. 5, Fig. 5 is a hardware structure diagram of a monitor according to an exemplary embodiment of the application.

The monitor includes a communication interface 501, a processor 502, a machine-readable storage medium 503 and a bus 504. The communication interface 501, the processor 502 and the machine-readable storage medium 503 communicate with one another through the bus 504. The processor 502 can perform the data recovery method described above by reading and executing machine-executable instructions corresponding to the data recovery control logic stored in the machine-readable storage medium 503.

The machine-readable storage medium 503 referred to herein may be any electronic, magnetic, optical or other physical storage device, and may contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be volatile memory, non-volatile memory or a similar storage medium. Specifically, the machine-readable storage medium 503 may be RAM (Random Access Memory), flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as a CD or DVD), a similar storage medium, or a combination thereof.
The implementation of the functions and effects of the units in the above device is detailed in the implementation of the corresponding steps in the above method and is not repeated here.

Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the application. A person of ordinary skill in the art can understand and implement this without creative effort.

The above are only preferred embodiments of the application and are not intended to limit the application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the application shall be included within the scope of protection of the application.

Claims (14)

1. A data recovery method, wherein the method is applied to a monitor in a distributed storage system (Ceph) cluster, and when the object storage device (OSD) topology of the Ceph cluster changes, the method comprises:

determining a target placement group (PG) whose data is to be recovered;

detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and detecting the load state of the Ceph cluster;

if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the load state of the Ceph cluster is busy, delaying recovery of the data to be recovered in the target PG.
2. The method according to claim 1, wherein the method further comprises:

if the number of normal OSD copies corresponding to the target PG is less than the minimum copy number, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.
3. The method according to claim 1, wherein detecting the load state of the Ceph cluster comprises:

detecting whether the current value of a cluster load parameter reflecting the current load of the Ceph cluster is greater than a first preset value;

if so, determining that the Ceph cluster is in a busy state;

if not, further detecting the current value of a node load parameter reflecting the current load of each OSD in the Ceph cluster; if there is an OSD in the Ceph cluster whose node load parameter is greater than a second preset value, determining that the Ceph cluster is in a busy state; and if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determining that the Ceph cluster is in a non-busy state.
4. The method according to claim 2, wherein after determining the target PG whose data is to be recovered, the method comprises:

starting a preset timer;

the detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number and detecting the load state of the Ceph cluster comprises:

detecting whether the timer has timed out;

if the timer has timed out, detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum copy number, and detecting the load state of the Ceph cluster;

the delaying recovery of the data in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the Ceph cluster is in a busy state comprises:

if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the Ceph cluster is in a busy state, returning to the step of detecting whether the timer has timed out.
5. The method according to claim 4, wherein recovering the data to be recovered in the target PG comprises:

when starting to recover the data to be recovered in the target PG, starting a preset second timer;

when the second timer times out, detecting whether all the data to be recovered in the target PG has been recovered;

if not, stopping the recovery of the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer has timed out.
6. The method according to claim 3, wherein

the cluster load parameter comprises the ratio of the current service I/O count of the Ceph cluster to the current total I/O count of the Ceph cluster;

the node load parameter comprises the disk utilization and the number of I/O operations per second (IOPS).
7. The method according to claim 1, wherein before determining the target PG whose data is to be recovered, the method further comprises:

calculating the OSD group corresponding to each PG in the Ceph cluster;

the determining the target PG whose data is to be recovered comprises:

for each PG, if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, determining that the PG is a target PG whose data is to be recovered.
8. A data recovery device, wherein the device is applied to a monitor in a distributed storage system (Ceph) cluster, and when the object storage device (OSD) topology of the Ceph cluster changes, the device comprises:

a determination unit, configured to determine a target placement group (PG) whose data is to be recovered;

a detection unit, configured to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum copy number, and to detect the load state of the Ceph cluster;

a delay unit, configured to delay recovery of the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum copy number and the load state of the Ceph cluster is busy.
9. The device according to claim 8, wherein the device further comprises:

a recovery unit, configured to recover the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is less than the minimum copy number or the Ceph cluster is in a non-busy state.
10. The device according to claim 8, wherein the detection unit is configured to detect whether the current value of a cluster load parameter reflecting the current load of the Ceph cluster is greater than a first preset value; if so, determine that the Ceph cluster is in a busy state; if not, further detect the current value of a node load parameter reflecting the current load of each OSD in the Ceph cluster; if there is an OSD in the Ceph cluster whose node load parameter is greater than a second preset value, determine that the Ceph cluster is in a busy state; and if the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in a non-busy state.
11. The apparatus according to claim 9, wherein the apparatus further comprises:
A starting unit, configured to start a preset timer;
The detecting unit is specifically configured to detect whether the timer has timed out, and if the timer has timed out, detect whether the number of normal OSD replicas currently corresponding to the target PG is greater than or equal to the preset minimum number of replicas and detect the load state of the Ceph cluster;
The delaying unit is specifically configured to return to the step of detecting whether the timer has timed out if the number of normal OSD replicas corresponding to the target PG is greater than or equal to the minimum number of replicas and the Ceph cluster is in the busy state.
12. The apparatus according to claim 11, wherein, when recovering the data to be recovered in the target PG, the recovering unit is specifically configured to: start a preset second timer when starting to recover the data to be recovered in the target PG; when the second timer times out, detect whether all the data to be recovered in the target PG have been recovered; and if not, stop the recovery of the unrecovered data in the target PG, close the second timer, and return to the step of detecting whether the timer has timed out.
13. The apparatus according to claim 10, wherein the cluster load parameter comprises: the ratio of the current business IO count of the Ceph cluster to the current total IO count of the Ceph cluster;
The node load parameter comprises: disk utilization and the number of read and write operations per second (IOPS).
14. The apparatus according to claim 8, wherein the apparatus further comprises:
A calculating unit, configured to calculate the OSD group corresponding to each PG in the Ceph cluster;
The determining unit is specifically configured to: for each PG, if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, determine that the PG is the target PG on which data recovery is to be performed.
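
A minimal sketch in Python (illustrative only, not part of the claims) of how the units of the apparatus claims might be composed into a monitor-side controller. It reuses the hypothetical find_target_pgs and is_cluster_busy helpers sketched above, and every class and method name is an assumption:

# Illustrative sketch only; builds on the find_target_pgs and is_cluster_busy sketches above.
class DataRecoveryApparatus:
    def __init__(self, cluster, min_replicas, check_interval=30):
        self.cluster = cluster
        self.min_replicas = min_replicas
        self.check_interval = check_interval      # "preset timer" started by the starting unit

    def determine_targets(self):                  # determining unit + calculating unit
        return find_target_pgs(self.cluster.map(), self.cluster.pgs())

    def should_delay(self, pg):                   # detecting unit + delaying unit
        return (self.cluster.healthy_replicas(pg) >= self.min_replicas
                and is_cluster_busy(self.cluster.stats()))

    def run_once(self):                           # one pass after an OSD topology change
        for pg in self.determine_targets():
            if self.should_delay(pg):
                continue                          # cluster is busy and the replica count is safe: delay this PG
            self.cluster.start_recovery(pg)       # recovering unit takes over
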
CN201811501810.XA 2018-12-10 2018-12-10 Data recovery method and device Active CN109710456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811501810.XA CN109710456B (en) 2018-12-10 2018-12-10 Data recovery method and device

Publications (2)

Publication Number Publication Date
CN109710456A true CN109710456A (en) 2019-05-03
CN109710456B CN109710456B (en) 2021-03-23

Family

ID=66255518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811501810.XA Active CN109710456B (en) 2018-12-10 2018-12-10 Data recovery method and device

Country Status (1)

Country Link
CN (1) CN109710456B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331253A (en) * 2014-10-30 2015-02-04 浪潮电子信息产业股份有限公司 Calculation method for object migration in object storage system
US20160330281A1 (en) * 2015-05-07 2016-11-10 Dell Products L.P. Systems and methods to improve read/write performance in object storage applications
CN106951559A (en) * 2017-03-31 2017-07-14 联想(北京)有限公司 Data reconstruction method and electronic equipment in distributed file system
CN107729185A (en) * 2017-10-26 2018-02-23 新华三技术有限公司 A kind of fault handling method and device
CN107729536A (en) * 2017-10-31 2018-02-23 新华三技术有限公司 A kind of date storage method and device
CN107817950A (en) * 2017-10-31 2018-03-20 新华三技术有限公司 A kind of data processing method and device
CN108121510A (en) * 2017-12-19 2018-06-05 紫光华山信息技术有限公司 OSD choosing methods, method for writing data, device and storage system
CN107948334A (en) * 2018-01-09 2018-04-20 无锡华云数据技术服务有限公司 Data processing method based on distributed memory system
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium
CN108509153A (en) * 2018-03-23 2018-09-07 新华三技术有限公司 OSD selection methods, data write-in and read method, monitor and server cluster
CN108958970A (en) * 2018-05-29 2018-12-07 新华三技术有限公司 A kind of data reconstruction method, server and computer-readable medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
唐颢: "Ceph存储引擎中基于固态盘的日志机制优化", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
李翔: "Ceph分布式文件***的研究及性能测试", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000487A (en) * 2020-08-14 2020-11-27 浪潮电子信息产业股份有限公司 Scrub pressure adjusting method, device and medium
CN112000487B (en) * 2020-08-14 2022-07-08 浪潮电子信息产业股份有限公司 Scrub pressure adjusting method, device and medium
CN113672435A (en) * 2021-07-09 2021-11-19 济南浪潮数据技术有限公司 Data recovery method, device, equipment and storage medium
CN114237520A (en) * 2022-02-28 2022-03-25 广东睿江云计算股份有限公司 Ceph cluster data balancing method and system
CN116302673A (en) * 2023-05-26 2023-06-23 四川省华存智谷科技有限责任公司 Method for improving data recovery rate of Ceph storage system
CN116302673B (en) * 2023-05-26 2023-08-22 四川省华存智谷科技有限责任公司 Method for improving data recovery rate of Ceph storage system
CN117851132A (en) * 2024-03-07 2024-04-09 四川省华存智谷科技有限责任公司 Data recovery optimization method for distributed object storage
CN117851132B (en) * 2024-03-07 2024-05-07 四川省华存智谷科技有限责任公司 Data recovery optimization method for distributed object storage

Also Published As

Publication number Publication date
CN109710456B (en) 2021-03-23

Similar Documents

Publication Publication Date Title
CN109710456A (en) A kind of data reconstruction method and device
US7934071B2 (en) Method of managing storage capacity in a storage system, a storage device and a computer system
EP3230868B1 (en) Multiple transaction logs in a distributed storage system
US8935563B1 (en) Systems and methods for facilitating substantially continuous availability of multi-tier applications within computer clusters
US8984222B2 (en) Methods and structure for task management in storage controllers of a clustered storage system
US7434012B1 (en) Techniques for media scrubbing
CN106844108B (en) A kind of date storage method, server and storage system
US20110087928A1 (en) Systems and methods for managing stalled storage devices
CN110515539A (en) Cloud disk hanging method, device, equipment and storage medium based on cloud storage
US9984139B1 (en) Publish session framework for datastore operation records
US10324794B2 (en) Method for storage management and storage device
CN106484313B (en) Data information backup method, data back up method and device
US11023159B2 (en) Method for fast recovering of data on a failed storage device
TW201730764A (en) Method for performing data scrubbing management in a storage system, and associated apparatus
CN109522154A (en) Data reconstruction method and relevant device and system
CN109656895A (en) Distributed memory system, method for writing data, device and storage medium
CN106933493A (en) Method and apparatus for caching disk array dilatation
US10581668B2 (en) Identifying performance-degrading hardware components in computer storage systems
CN106610788B (en) Hard disk array control method and device
US11496525B2 (en) ACR buffering in the cloud
US7487319B2 (en) Resource allocation unit queue
CN109669814A (en) A kind of restoration methods of Metadata Service, device, equipment and readable storage medium storing program for executing
CN109213639A (en) A kind of storage and disaster tolerance method and device
CN111324280B (en) Data writing method and device based on Raid5
CN115794446B (en) Message processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant