CN107608826A

CN107608826A - Fault recovery method, device, and medium for a node of a storage cluster

Info

Publication number: CN107608826A
Application number: CN201710852629.2A
Authority: CN
Inventors: 董海廷; 王佳琪
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2017-09-19
Filing date: 2017-09-19
Publication date: 2018-01-19

Abstract

The invention discloses a fault recovery method, device, and medium for a node of a storage cluster. The method comprises: recording related information of a faulty node and the node data in the faulty node, wherein the related information includes at least the cluster membership information of the faulty node; performing fault recovery on the faulty node to obtain an initial node; reading the related information by means of a script, and looking up the target storage cluster according to the cluster membership information; and configuring the initial node so that it joins the target storage cluster, and writing the node data to the initial node. The method thereby improves the reliability of recovering a faulty node in a storage cluster and protects the data in the faulty node. The invention further provides a fault recovery device and medium for a node of a storage cluster, which have the same beneficial effects.

Description

Fault recovery method, device, and medium for a node of a storage cluster

Technical field

The present invention relates to the field of storage, and in particular to a fault recovery method, device, and medium for a node of a storage cluster.

Background art

With the arrival of the big data era, the volume of data keeps growing, and traditional storage on a single storage device can no longer accommodate such large data volumes. Storage clusters capable of distributed data storage have emerged as a result. A storage cluster is made up of storage nodes, and data is distributed across those nodes.

Because the nodes that make up a storage cluster differ in their properties, some nodes may fail while the cluster is running, and a failed node must be recovered for the cluster to resume normal operation. Traditionally, node fault recovery in a storage cluster is done by hand: an operator repairs the node fault and then manually adds the recovered node to the storage cluster as a new node. Manual operation is time-consuming and laborious, and operator mistakes may trigger new faults, so reliability is poor. In addition, existing node recovery methods clear the node data in the node, so the safety of the data is not effectively guaranteed.

It follows that providing a fault recovery method for a node of a storage cluster, one that improves the reliability of recovering a faulty node in the storage cluster and protects the data in the faulty node, is a problem that those skilled in the art urgently need to solve.

Summary of the invention

An object of the present invention is to provide a fault recovery method, device, and medium for a node of a storage cluster that improve the reliability of recovering a faulty node in a storage cluster and protect the data in the faulty node.

To solve the above technical problem, the present invention provides a fault recovery method for a node of a storage cluster, comprising:

recording related information of a faulty node and the node data in the faulty node, wherein the related information includes at least the cluster membership information of the faulty node;

performing fault recovery on the faulty node to obtain an initial node;

reading the related information by means of a script, and looking up the target storage cluster according to the cluster membership information;

configuring the initial node so that it joins the target storage cluster, and writing the node data to the initial node.

Preferably, after the related information of the faulty node and the node data in the faulty node are recorded, the method further comprises:

writing the related information and the node data to a database.

Preferably, after the target storage cluster is looked up according to the cluster membership information, the method further comprises:

judging whether the target storage cluster is in a working state;

if so, performing the step of configuring the initial node so that it joins the target storage cluster and writing the node data to the initial node.

Preferably, after the related information of the faulty node and the node data in the faulty node are recorded, the method further comprises:

obtaining the fault type of the faulty node, and generating a fault report according to the fault type.

Preferably, recording the related information of the faulty node and the node data in the faulty node is specifically:

obtaining and recording the related information and the node data through a storage control node.

Preferably, after fault recovery is performed on the faulty node to obtain the initial node, the method further comprises:

setting the cluster priority of the initial node;

correspondingly, configuring the initial node so that it joins the target storage cluster and writing the node data to the initial node is specifically:

configuring the initial node in order of cluster priority so that it joins the target storage cluster, and writing the node data to the initial node.

In addition, the present invention provides a fault recovery device for a node of a storage cluster, comprising:

a node acquisition module, configured to record related information of a faulty node and the node data in the faulty node;

a node recovery module, configured to perform fault recovery on the faulty node to obtain an initial node;

a script execution module, configured to read the related information by means of a script and look up the target storage cluster according to the cluster membership information;

a node joining module, configured to configure the initial node so that it joins the target storage cluster, and to write the node data to the initial node.

Preferably, the device further comprises:

a database writing module, configured to write the related information and the node data to a database.

In addition, the present invention provides a fault recovery device for a node of a storage cluster, comprising a memory configured to store a computer program;

and a processor configured to implement the steps of the above fault recovery method for a node of a storage cluster when executing the computer program.

In addition, the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above fault recovery method for a node of a storage cluster.

In the fault recovery method for a node of a storage cluster provided by the present invention, recording the node data of the faulty node amounts to extracting and backing up the data in the faulty node. Even if that data is cleared when the faulty node undergoes fault recovery, the node data can be restored by writing the backed-up node data into the recovered node, so no data is lost and the data in the faulty node is protected. Furthermore, because the method reads the related information of the faulty node by means of a script and uses it to determine the cluster to which the faulty node belongs, the initial node obtained after fault recovery is automatically configured into that cluster. This avoids new faults triggered by operator error and improves the efficiency of node fault recovery. The present invention also provides a fault recovery device and medium for a node of a storage cluster, which have the same beneficial effects.

Brief description of the drawings

To illustrate the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a fault recovery method for a node of a storage cluster provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of a fault recovery device for a node of a storage cluster provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

The core of the present invention is to provide a fault recovery method for a node of a storage cluster that improves the reliability of recovering a faulty node in a storage cluster and protects the data in the faulty node. Another core of the present invention is to provide a fault recovery device and medium for a node of a storage cluster.

To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.

Embodiment one

Fig. 1 is a flow chart of a fault recovery method for a node of a storage cluster provided by an embodiment of the present invention. Referring to Fig. 1, the specific steps of the fault recovery method include:

Step S10: Record the related information of the faulty node and the node data in the faulty node.

The related information includes at least the cluster membership information of the faulty node.

It can be understood that when a node in the storage cluster fails, recording the related information of that node and the node data in it amounts to backing up the faulty node, so that once the fault has been cleared the node's data can be restored from the backup. Note that because a node works within a cluster environment, the related information must record the cluster to which the faulty node belongs, so that the node can be added back to the correct cluster and continue working once it returns to a normal state. The related information may also include node attribute information such as the node's configuration parameters; this depends on actual requirements and is not specifically limited here.
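As an illustration only, the record made in step S10 could be modelled as a small serializable structure. The sketch below is written in Python, which the patent does not prescribe; the `NodeRecord` type, its field names, and the JSON encoding are all assumptions introduced for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class NodeRecord:
    """Hypothetical snapshot of a faulty node, taken before fault recovery."""
    node_id: str
    cluster_id: str  # cluster membership information (the one mandatory item)
    config: dict = field(default_factory=dict)     # optional node attributes
    node_data: dict = field(default_factory=dict)  # backed-up node contents


def record_faulty_node(node_id, cluster_id, config=None, node_data=None):
    """Build the record, refusing to proceed without the cluster membership."""
    if not cluster_id:
        raise ValueError("related information must name the node's cluster")
    return NodeRecord(node_id, cluster_id, dict(config or {}), dict(node_data or {}))


def to_json(record):
    """Serialize the record so a recovery script can read it back later."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text):
    """Rebuild a NodeRecord from its serialized form."""
    return NodeRecord(**json.loads(text))
```

Rejecting a record without a cluster identifier mirrors the requirement that the related information include at least the cluster membership information; everything else in the record is optional.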

Step S11: Perform fault recovery on the faulty node to obtain an initial node.

The purpose of this step is to perform recovery operations such as initialization on the faulty node in order to clear its fault. The data in a node that has undergone fault recovery is cleared, so the node is equivalent to a brand-new node that has never been configured or used, that is, an initial node.

Step S12: Read the related information by means of a script, and look up the target storage cluster according to the cluster membership information.

It can be understood that, compared with doing so manually, using a script to read the related information, identify the target storage cluster of the initial node from its contents, and subsequently configure the initial node for that cluster is more efficient. Because a script executes the steps set by the user in sequence, the probability of an execution error is very low. The cluster membership information may take the form of a cluster ID or some other form, which is not limited here.
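A minimal sketch of what the script in step S12 might do, assuming the related information was stored as JSON and that the cluster membership information is a cluster ID. The `CLUSTERS` registry, the `cluster_id` key, and the function name are hypothetical and serve only to make the lookup concrete.

```python
import json

# Hypothetical registry of known storage clusters, keyed by cluster ID.
CLUSTERS = {
    "cluster-a": {"name": "cluster-a", "nodes": ["n1", "n2"]},
    "cluster-b": {"name": "cluster-b", "nodes": ["n7"]},
}


def find_target_cluster(related_info_json, clusters):
    """Parse the recorded related information and return the matching cluster."""
    info = json.loads(related_info_json)
    cluster_id = info.get("cluster_id")
    if cluster_id is None:
        raise KeyError("related information lacks cluster membership information")
    if cluster_id not in clusters:
        raise LookupError(f"no known cluster with ID {cluster_id!r}")
    return clusters[cluster_id]
```

Failing loudly on a missing or unknown cluster ID keeps the script from adding the initial node to the wrong cluster, which is the kind of operator error the method is meant to avoid.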

Step S13: Configure the initial node so that it joins the target storage cluster, and write the node data to the initial node.

Once the above steps have found the target storage cluster to which the initial node belongs, the script can configure the initial node so that it joins the target storage cluster, and then write the data backed up in the above steps into the initial node to restore it. Note that the configuration parameters used to configure the initial node may be recorded in the related information, or in the management node of the cluster, in which case the management node finds the configuration parameters belonging to the initial node according to the node's identity information and uses them to configure it. This is not specifically limited here.

In the fault recovery method for a node of a storage cluster provided by the present invention, recording the node data of the faulty node amounts to extracting and backing up the data in the faulty node. Even if that data is cleared when the faulty node undergoes fault recovery, the node data can be restored by writing the backed-up node data into the recovered node, so no data is lost and the data in the faulty node is protected. Furthermore, because the method reads the related information of the faulty node by means of a script and uses it to determine the cluster to which the faulty node belongs, the initial node obtained after fault recovery is automatically configured into that cluster. This avoids new faults triggered by operator error and improves the efficiency of node fault recovery.
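The four steps of this embodiment can be sketched end to end on an in-memory model. This is an assumed toy representation, not the patented implementation: nodes and clusters are plain dictionaries, the key names are hypothetical, and "fault recovery" is reduced to clearing the node's state.

```python
def recover_node(node, clusters):
    """Run steps S10 to S13 on an in-memory model of a node and its clusters."""
    # S10: record related information and node data before touching the node.
    backup = {
        "cluster_id": node["cluster_id"],
        "config": dict(node["config"]),
        "data": dict(node["data"]),
    }

    # S11: fault recovery resets the node, which clears its state.
    node.update(cluster_id=None, config={}, data={}, healthy=True)

    # S12: read the record and look up the target storage cluster.
    target = clusters[backup["cluster_id"]]

    # S13: configure the initial node, join the cluster, and restore the data.
    node["config"] = backup["config"]
    node["cluster_id"] = backup["cluster_id"]
    if node["id"] not in target["members"]:
        target["members"].append(node["id"])
    node["data"] = backup["data"]
    return node
```

The backup is taken before the reset, so the data survives even though step S11 wipes the node.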

Embodiment two

On the basis of the above embodiment, as a preferred implementation, after the related information of the faulty node and the node data in the faulty node are recorded, the method further comprises:

writing the related information and the node data to a database.

It can be understood that writing the related information and the node data to a database helps safeguard their contents and guards against loss or corruption of the data. This step also keeps a long-term backup of the faulty node's related information and node data that can be read and used repeatedly.
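The patent does not name a database. As one possible sketch, the backup could be kept in SQLite through Python's standard `sqlite3` module; the `node_backup` table, its columns, and the JSON encoding of the payloads are assumptions made for this example.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the backup store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_backup ("
        " node_id TEXT PRIMARY KEY,"
        " cluster_id TEXT NOT NULL,"
        " related_info TEXT NOT NULL,"
        " node_data TEXT NOT NULL)"
    )
    return conn


def save_backup(conn, node_id, cluster_id, related_info, node_data):
    """Write the related information and node data for one faulty node."""
    conn.execute(
        "INSERT OR REPLACE INTO node_backup VALUES (?, ?, ?, ?)",
        (node_id, cluster_id, json.dumps(related_info), json.dumps(node_data)),
    )
    conn.commit()


def load_backup(conn, node_id):
    """Read the backup back; it can be read any number of times."""
    row = conn.execute(
        "SELECT cluster_id, related_info, node_data"
        " FROM node_backup WHERE node_id = ?",
        (node_id,),
    ).fetchone()
    if row is None:
        return None
    cluster_id, related_info, node_data = row
    return cluster_id, json.loads(related_info), json.loads(node_data)
```

Passing a file path instead of `":memory:"` would keep the backup across process restarts, which is closer to the long-term backup described above.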

In addition, as a preferred implementation, after the target storage cluster is looked up according to the cluster membership information, the method further comprises:

judging whether the target storage cluster is in a working state;

if so, performing the step of configuring the initial node so that it joins the target storage cluster and writing the node data to the initial node.

It can be understood that a cluster in a working state consists of nodes that cooperate with one another, and a single node cannot have the working characteristics of a cluster. Therefore, after the target storage cluster has been looked up according to the cluster membership information, it can be judged whether the target storage cluster contains nodes in a working state, that is, whether the target storage cluster is in a working state. If so, the target storage cluster is available and adding the initial node to it is meaningful, so the step of configuring the initial node so that it joins the target storage cluster and writing the node data to the initial node is performed.
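The working-state check can be sketched as a guard in front of the join step. The cluster model below (a dictionary of node states with the values `"running"` and `"stopped"`) is an assumption for illustration; the text only requires that the cluster contain nodes in a working state.

```python
def is_working(cluster):
    """Treat a cluster as working if at least one member node is running."""
    return any(state == "running" for state in cluster["node_states"].values())


def join_if_working(initial_node_id, cluster):
    """Add the initial node only when the target cluster is in a working state."""
    if not is_working(cluster):
        return False  # cluster unavailable: joining it would be meaningless
    cluster["node_states"][initial_node_id] = "running"
    return True
```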

In addition, as a preferred implementation, after the related information of the faulty node and the node data in the faulty node are recorded, the method further comprises:

obtaining the fault type of the faulty node, and generating a fault report according to the fault type.

It can be understood that summarizing the fault condition of the faulty node in a fault report helps the user understand the fault promptly and accurately and provides a reference for subsequent troubleshooting, which makes clearing the fault on the faulty node more efficient.
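A fault report of this kind might look like the following sketch. The report fields, the example fault type, and the text layout are hypothetical; the patent does not specify a report format.

```python
from datetime import datetime, timezone


def build_fault_report(node_id, cluster_id, fault_type, detail=""):
    """Collect the fault condition of a node into a plain dictionary."""
    return {
        "node_id": node_id,
        "cluster_id": cluster_id,
        "fault_type": fault_type,  # e.g. "disk_failure"; values are hypothetical
        "detail": detail,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }


def format_fault_report(report):
    """Render the report as text for the operator, skipping empty fields."""
    lines = [f"{key}: {value}" for key, value in report.items() if value != ""]
    return "\n".join(lines)
```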

In addition, as a preferred implementation, recording the related information of the faulty node and the node data in the faulty node is specifically:

obtaining and recording the related information and the node data through a storage control node.

It can be understood that a storage control node is a node dedicated to managing storage nodes, which makes the work of the whole cluster more orderly. The role of the storage control node here is to back up the data related to the faulty node. Setting up a storage control node in the cluster to obtain and record the related information and node data of the faulty node therefore makes clearing node faults more efficient.

In addition, as a preferred implementation, after fault recovery is performed on the faulty node to obtain the initial node, the method further comprises:

setting the cluster priority of the initial node;

It can be understood that nodes in a cluster may depend on one another; for example, the normal operation of one node may require that one or more other nodes are already running. When such nodes fail, the cluster priority of each node needs to be preset according to the dependencies between nodes. During fault recovery, the initial nodes are then configured in priority order so that they join the target storage cluster, and the node data is written to them, which keeps the cluster in a normal working state.
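One way to derive such priorities is a topological sort of the dependency relation, so that every node ranks after the nodes it depends on. This is an assumed technique chosen for the sketch; the patent only states that priorities are preset according to the associations between nodes. The function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_priorities(depends_on):
    """Give each node a rank; a node always ranks after the nodes it needs.

    depends_on maps a node to the set of nodes that must already be running.
    """
    order = TopologicalSorter(depends_on).static_order()
    return {node: rank for rank, node in enumerate(order)}


def join_in_priority_order(priorities, join):
    """Call join(node) for every node, lowest rank first."""
    for node in sorted(priorities, key=priorities.get):
        join(node)
```

For example, if `n3` depends on `n1` and `n2`, and `n2` depends on `n1`, the nodes would be added back in the order `n1`, `n2`, `n3`.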

Embodiment three

Specific implementations of the fault recovery method for a node of a storage cluster have been described in detail above. The present invention also provides a fault recovery device for a node of a storage cluster. Because the embodiments of the device correspond to the embodiments of the method, refer to the description of the method embodiments for the device embodiments; the details are not repeated here.

Fig. 2 is a structural diagram of a fault recovery device for a node of a storage cluster provided by an embodiment of the present invention. As shown in Fig. 2, the fault recovery device for a node of a storage cluster provided by the embodiment of the present invention comprises:

a node acquisition module 10, configured to record related information of a faulty node and the node data in the faulty node;

a node recovery module 11, configured to perform fault recovery on the faulty node to obtain an initial node;

a script execution module 12, configured to read the related information by means of a script and look up the target storage cluster according to the cluster membership information;

a node joining module 13, configured to configure the initial node so that it joins the target storage cluster, and to write the node data to the initial node.
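The module structure of Fig. 2 could be mirrored in software by composing four callables, one per module. The class below is a hypothetical sketch; the class name, the callable signatures, and the way data is passed between modules are assumptions, not part of the disclosure.

```python
class RecoveryDevice:
    """Hypothetical composition of the four modules shown in Fig. 2."""

    def __init__(self, acquire, recover, locate, join):
        self.acquire = acquire  # node acquisition module 10
        self.recover = recover  # node recovery module 11
        self.locate = locate    # script execution module 12
        self.join = join        # node joining module 13

    def run(self, node):
        record = self.acquire(node)        # record related info and node data
        initial_node = self.recover(node)  # reset the node to an initial node
        cluster = self.locate(record)      # look up the target storage cluster
        # Join the cluster and restore the backed-up data.
        return self.join(initial_node, cluster, record)
```

Keeping each module behind its own callable lets an optional module, such as a database writer, be added without changing the others.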

In the fault recovery device for a node of a storage cluster provided by the present invention, recording the node data of the faulty node amounts to extracting and backing up the data in the faulty node. Even if that data is cleared when the faulty node undergoes fault recovery, the node data can be restored by writing the backed-up node data into the recovered node, so no data is lost and the data in the faulty node is protected. Furthermore, because the device reads the related information of the faulty node by means of a script and uses it to determine the cluster to which the faulty node belongs, the initial node obtained after fault recovery is automatically configured into that cluster. This avoids new faults triggered by operator error and improves the efficiency of node fault recovery.

On the basis of embodiment three, as a preferred implementation, the device further comprises:

a database writing module, configured to write the related information and the node data to a database.

Embodiment four

The present invention also provides a fault recovery device for a node of a storage cluster, comprising a memory configured to store a computer program;

and a processor configured to implement the steps of the above fault recovery method for a node of a storage cluster when executing the computer program.

The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above fault recovery method for a node of a storage cluster.

In the computer-readable storage medium for fault recovery of a node of a storage cluster provided by the present invention, recording the node data of the faulty node amounts to extracting and backing up the data in the faulty node. Even if that data is cleared when the faulty node undergoes fault recovery, the node data can be restored by writing the backed-up node data into the recovered node, so no data is lost and the data in the faulty node is protected. Furthermore, because the program on the storage medium reads the related information of the faulty node by means of a script and uses it to determine the cluster to which the faulty node belongs, the initial node obtained after fault recovery is automatically configured into that cluster. This avoids new faults triggered by operator error and improves the efficiency of node fault recovery.

The fault recovery method, device, and medium for a node of a storage cluster provided by the present invention have been described in detail above. The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts may be cross-referenced between embodiments. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method. It should be noted that those of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.

It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprising", "including", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

Claims

1. A fault recovery method for a node of a storage cluster, characterized by comprising:

recording related information of a faulty node and node data in the faulty node, wherein the related information includes at least cluster membership information of the faulty node;

performing fault recovery on the faulty node to obtain an initial node;

reading the related information by means of a script, and looking up a target storage cluster according to the cluster membership information;

configuring the initial node so that it joins the target storage cluster, and writing the node data to the initial node.

2. The method according to claim 1, characterized in that, after the recording of the related information of the faulty node and the node data in the faulty node, the method further comprises:

writing the related information and the node data to a database.

3. The method according to claim 1, characterized in that, after the looking up of the target storage cluster according to the cluster membership information, the method further comprises:

judging whether the target storage cluster is in a working state;

if so, performing the step of configuring the initial node so that it joins the target storage cluster and writing the node data to the initial node.

4. The method according to claim 1, characterized in that, after the recording of the related information of the faulty node and the node data in the faulty node, the method further comprises:

obtaining the fault type of the faulty node, and generating a fault report according to the fault type.

5. The method according to claim 1, characterized in that the recording of the related information of the faulty node and the node data in the faulty node is specifically:

obtaining and recording the related information and the node data through a storage control node.

6. The method according to claim 1, characterized in that, after the performing of fault recovery on the faulty node to obtain the initial node, the method further comprises:

setting the cluster priority of the initial node;

correspondingly, the configuring of the initial node so that it joins the target storage cluster and the writing of the node data to the initial node is specifically:

configuring the initial node in order of the cluster priority so that it joins the target storage cluster, and writing the node data to the initial node.

7. A fault recovery device for a node of a storage cluster, characterized by comprising:

a node acquisition module, configured to record related information of a faulty node and node data in the faulty node;

a node recovery module, configured to perform fault recovery on the faulty node to obtain an initial node;

a script execution module, configured to read the related information by means of a script and look up a target storage cluster according to the cluster membership information;

a node joining module, configured to configure the initial node so that it joins the target storage cluster, and to write the node data to the initial node.

8. The device according to claim 7, characterized in that the device further comprises:

a database writing module, configured to write the related information and the node data to a database.

9. A fault recovery device for a node of a storage cluster, characterized by comprising a memory configured to store a computer program;

and a processor configured to implement the steps of the fault recovery method for a node of a storage cluster according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the fault recovery method for a node of a storage cluster according to any one of claims 1 to 6.