CN118093251A - Fault processing method and device, electronic equipment and storage medium - Google Patents

Fault processing method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN118093251A
CN118093251A (Application CN202410508482.5A)
Authority
CN
China
Prior art keywords
cluster
event
state
recovery
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410508482.5A
Other languages
Chinese (zh)
Inventor
赵鹏
郭强
刘清林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202410508482.5A priority Critical patent/CN118093251A/en
Publication of CN118093251A publication Critical patent/CN118093251A/en
Pending legal-status Critical Current

Abstract

The application discloses a fault processing method and device, an electronic device, and a storage medium, relating to the field of computer technology. The method includes: when an error occurs while executing a target event, terminating execution of the target event, modifying the cluster state to a recovery state, and modifying the node's own event level according to a preset rule, wherein the master node, after modifying the cluster state to the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event; in the recovery state, processing node state modification events through the node's own target service module control layer; when the first cluster recovery event is received, retaining key information and discarding non-key information, wherein the key information includes configuration information; and when the second cluster recovery event is received, querying the node's own target service module control layer for the latest node state and updating it. The application improves fault processing and cluster recovery efficiency.

Description

Fault processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a fault handling method, a fault handling device, an electronic device, and a storage medium.
Background
Distributed storage clusters typically rely on a distributed consistency protocol to build a consistency framework, and rely on the consistent cluster state provided by that framework to coordinate the behavior of the nodes in the cluster, achieving high scalability and high availability. The control state machine on each node performs consistent reads and writes of the cluster state under the coordination of the consistency framework, and drives the application side on each node to take the same action in the same state, so that the nodes in the cluster act in coordination.
Under normal operation, the states and behaviors of all nodes in the cluster are consistent. If, however, the cluster state is changed abnormally, the abnormal value is read by the state machines on every node; because all state machines behave identically, every node in the cluster terminates its service process after reading the same abnormal value, bringing down both the service and the cluster. In the related art, recovery after such problems relies heavily on manual intervention by field engineers, so fault processing and cluster recovery are inefficient.
Therefore, how to improve the failure handling and cluster recovery efficiency is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a fault processing method and device, an electronic device, and a storage medium that improve fault processing and cluster recovery efficiency.
To achieve the above object, the present application provides a fault handling method applied to a node in a distributed storage cluster, the method including:
when an error occurs while executing a target event, terminating execution of the target event, modifying the cluster state to a recovery state, and modifying the node's own event level according to a preset rule; wherein the master node, after modifying the cluster state to the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event;
in the recovery state, processing node state modification events through the node's own target service module control layer;
when the first cluster recovery event is received, retaining key information and discarding non-key information, wherein the key information includes configuration information;
and when the second cluster recovery event is received, querying the node's own target service module control layer for the latest node state and updating it.
Wherein, after querying the target service module control layer for the latest node state and updating it, the method further comprises:
after receiving a target number of basic events broadcast by the master node, synchronizing the target number of basic events so as to update the node's own event level, wherein the target number is greater than the recent event queue depth;
and wherein a target node in the distributed storage cluster, after querying its own target service module control layer for the latest node state and updating it, sends the target number of basic events to the master node, so that the master node broadcasts them to the nodes participating in cluster recovery.
Wherein after sending the target number of basic events to the master node, the method further comprises:
when a recovery completion event is received, modifying the cluster state from the recovery state to a normal state; and the target node in the distributed storage cluster submits a recovery completion event after sending a target number of basic events to the master node.
Wherein after modifying the cluster state from the recovery state to the normal state, the method further comprises:
Nodes not participating in cluster recovery update their own event levels by copying the cluster state copy entirely.
The nodes participating in cluster recovery are determined according to the event levels of the nodes after the event levels of the nodes in the distributed storage cluster are modified according to preset rules.
Wherein the target number of basic events is the target number of null events.
Wherein, after modifying the cluster state to the recovery state, the master node, when no node is in the suspended state, sequentially submits the first cluster recovery event and the second cluster recovery event through its own service module control layer.
Wherein the node state modification event comprises any one or a combination of any of a node addition event, a node suspension event, a node disconnection event, and a node removal event.
Wherein the key information comprises any one or a combination of a plurality of items of disk array configuration information, storage pool configuration information and virtual volume configuration information.
The non-critical information comprises any one or a combination of several of: disk array path states, virtual volume path states, node states, reset input/output statistics, log information, and error logs.
Wherein, still include:
and if a node state modification event is received while processing the first cluster recovery event and the second cluster recovery event, re-entering the step of processing the node state modification event through the node's own target service module control layer.
Wherein terminating execution of the target event when an error occurs while executing the target event comprises:
terminating execution of the target event when a code assertion occurs while executing the target event on the first cluster copy.
Wherein terminating execution of the target event when an error occurs while executing the target event comprises:
terminating execution of the target event if a code assertion occurs in the service module control layer while executing the target event on the first cluster copy.
Before the cluster state is modified to the recovery state, the method further comprises:
and covering the first cluster copy by using the second cluster copy to obtain a new first cluster copy.
Wherein, the modifying the cluster state to the recovery state includes:
And modifying the cluster state of the new first cluster copy into a recovery state, switching to the second cluster copy, and modifying the cluster state of the second cluster copy into the recovery state.
Wherein, still include:
When an event to be executed is received, determining a cluster state;
if the cluster state is the recovery state, judging whether the event to be executed is the first cluster recovery event or the second cluster recovery event or the node state modification event;
if not, skipping the execution of the event to be executed.
Wherein after determining the cluster state, the method further comprises:
If the cluster state is a normal state, judging whether the event to be executed is the first cluster recovery event or the second cluster recovery event;
If yes, skipping the execution of the event to be executed;
if not, directly executing the event to be executed.
To achieve the above object, the present application provides a fault handling apparatus applied to a node in a distributed storage cluster, the apparatus comprising:
a first modification module, configured to, when an error occurs while executing a target event, terminate execution of the target event, modify the cluster state to a recovery state, and modify the node's own event level according to a preset rule, wherein the master node, after modifying the cluster state to the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event;
a processing module, configured to, in the recovery state, process node state modification events through the node's own target service module control layer;
a discarding module, configured to, when the first cluster recovery event is received, retain key information and discard non-key information, wherein the key information includes configuration information;
and a first updating module, configured to, when the second cluster recovery event is received, query the node's own target service module control layer for the latest node state and update it.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
and a processor for implementing the steps of the fault handling method as described above when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the fault handling method as described above.
As can be seen from the above solution, the fault processing method provided by the present application is applied to a node in a distributed storage cluster and includes: when an error occurs while executing a target event, terminating execution of the target event, modifying the cluster state to a recovery state, and modifying the node's own event level according to a preset rule, wherein the master node, after modifying the cluster state to the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event; in the recovery state, processing node state modification events through the node's own target service module control layer; when the first cluster recovery event is received, retaining key information and discarding non-key information, wherein the key information includes configuration information; and when the second cluster recovery event is received, querying the node's own target service module control layer for the latest node state and updating it.
In the fault processing method provided by the application, when a fault occurs, execution of the target event is terminated and the cluster state is modified to the recovery state; in the recovery state, each node modifies its own event level according to the same preset rule and processes node state modification events through its own target service module control layer; key information is retained and non-key information is discarded via the first cluster recovery event; and the latest node state is queried and updated via the second cluster recovery event. The fault processing method provided by the application thus realizes automatic cluster recovery, improves fault processing and cluster recovery efficiency, and ensures that the recovered nodes have consistent event levels and that each node's state is the latest node state. The application also discloses a fault processing device, an electronic device, and a computer-readable storage medium, which can achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
FIG. 1 is an architecture diagram of nodes in a distributed storage cluster, according to an example embodiment;
FIG. 2 is a flow chart illustrating a fault handling method according to an exemplary embodiment;
FIG. 3 is a flow chart illustrating another fault handling method according to an exemplary embodiment;
FIG. 4 is a block diagram of a fault handling apparatus according to an exemplary embodiment;
Fig. 5 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. In addition, in the embodiments of the present application, "first", "second", etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence.
The application is applied to a distributed storage cluster comprising a plurality of servers interconnected through a network, where each storage server is equipped with back-end disks or a back-end disk enclosure. The disks can be shared within the storage cluster, and multiple layers of virtualization may be built on the back-end disks, including storage pools, RAID (Redundant Array of Independent Disks) groups, virtual disks, and the like, to provide higher performance, throughput, and availability than stand-alone storage, as well as other diversified data storage and access services. The storage server cluster and the front-end hosts access the same front-end network, through which the storage servers provide storage services (virtual disks) to the front-end hosts.
In a distributed storage cluster, critical data needs to be shared within the cluster to avoid single point failures causing data to be inaccessible. To achieve this goal, each node within the storage cluster runs the same cluster and application software. The architecture of each node is shown in fig. 1, and includes a consistency protocol layer, a service module control layer, and a service module application layer.
The consistency protocol layer is used for maintaining the existence of clusters, collecting and distributing cluster events and providing a cluster state space for the service module.
Specifically, when the network links between all nodes in the cluster are normal, the consistency protocol layer maintains a cluster heartbeat to confirm that all nodes in the cluster are alive. When a node's heartbeat is lost or a link fault occurs, the consistency protocol layer on each node calculates whether the number of nodes in a given network partition exceeds half of the total number of nodes in the last stable cluster; only such a partition takes over the cluster.
Event collection by the consistency protocol layer is oriented toward each service module: every service module can send events to the consistency protocol layer, and event distribution by the consistency protocol layer guarantees that every node receives the same event sequence.
The consistency protocol layer provides a cluster state space for the service modules, and a service module can read and write the cluster state using a fixed interface provided by the consistency protocol layer. The consistency protocol layer ensures that the initial cluster state of every node is the same and that the service modules' write actions on the cluster state are completely consistent, thereby ensuring that the states on all nodes are consistent. To guarantee atomicity of the series of write operations triggered by a single event, the cluster state has two completely identical copies, namely a first cluster copy and a second cluster copy, and a service module must write the two copies serially to complete the final state modification. Because there are two copies, if a node fails while writing either copy, the state of the other copy remains intact, so the cluster state can be restored to a consistent state by rolling back or rolling forward from that other copy.
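The following is a minimal sketch of this two-copy serial write, assuming illustrative names (ClusterState, apply_event) rather than the patent's actual interfaces:
```python
import copy

class ClusterState:
    """Two identical copies of the cluster state, written serially per event."""

    def __init__(self):
        self.first_copy = {}    # first cluster copy
        self.second_copy = {}   # second cluster copy

    def execute_event(self, apply_event):
        """Apply one event's writes to both copies, recovering from a failure on either."""
        try:
            apply_event(self.first_copy)           # stage 1: modify the first copy
        except AssertionError:
            # Failure while writing the first copy: the second copy is intact,
            # so roll back by restoring the first copy from it.
            self.first_copy = copy.deepcopy(self.second_copy)
            raise
        try:
            apply_event(self.second_copy)          # stage 2: repeat the writes on the second copy
        except AssertionError:
            # Failure while writing the second copy: roll forward from the
            # already-updated first copy.
            self.second_copy = copy.deepcopy(self.first_copy)
            raise
```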
Events in the cluster have a level that increases monotonically from zero. Events are temporarily persisted on the nodes in the cluster during distribution; once an event has been executed, its effect is complete and its level is merged into the latest level of the cluster state, so the event no longer needs to occupy storage space. The event sequence number grows without bound, but there is not enough room to keep all events forever, so only a limited number of events are kept on each node. These buffered events are called recent events, and their maximum number is denoted RENM (Recent Event Num Max). Recent events are persistently recorded and updated in a rolling fashion, and the queue holding them is the recent event queue. Both copies of the cluster state and the recent events are persisted and are not lost due to software or hardware failures or power loss.
Successful event commit requires at least a majority of the nodes in the cluster to return success. Successfully committed events enter the tail of the recent event (ring) queue to wait for execution, and events are dequeued for execution from the head of the queue. After one event finishes executing, the cluster state changes and the next event is dequeued. In normal operation, the execution order of events is consistent on all nodes of the cluster, but simultaneity in time is not strictly guaranteed.
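A simplified sketch of such a bounded, rolling recent-event buffer follows; the RENM value and the shape of an event entry are assumptions for illustration only:
```python
from collections import deque

RENM = 16  # RECENT_EVENT_NUM_MAX, assumed value for illustration

class RecentEventQueue:
    """Rolling buffer of the last RENM committed events on a node."""

    def __init__(self):
        self.events = deque(maxlen=RENM)   # oldest entries roll off automatically

    def commit(self, level, payload):
        # a successfully committed event enters the tail of the queue
        self.events.append((level, payload))

    def retained_levels(self):
        # the level range currently retained, e.g. [N, N + RENM - 1]
        return [level for level, _ in self.events]
```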
The consistency protocol layer judges that nodes belong to the same cluster based on two conditions: the nodes share a common, unique cluster identifier, and the nodes in the cluster have pairwise network links to one another. Nodes meeting these conditions are automatically pulled into the cluster; if the two conditions are no longer met, the nodes are kicked out of the cluster. When nodes leave, the consistency protocol layer judges whether the number of remaining nodes in the cluster exceeds half of the number of nodes in the last stably running cluster; only when it does can the remaining nodes take over the cluster.
During cluster operation, if a node leaves briefly because of a network or software failure, its state is likely no longer up to date when it returns to the cluster, and therefore must be synchronized from other nodes. Depending on the node's state and how outdated its recent events are, one of two recovery modes is used. If the level range of the recent events stored by the node when it left is [N, N+RENM-1], and the level of the events still stored by nodes in the cluster when it returns reaches only N+RENM, then it suffices to forward the missing events from a node that contains them to the returning node. If the state level of every node in the cluster exceeds N+RENM, that is, the events up to N+RENM have taken effect on all nodes and have been merged into the cluster state, there is no way for the node to catch up by synchronizing recent events; in that case the node synchronizes a complete cluster state from another node in the cluster.
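The choice between the two resynchronization modes can be sketched as follows; the function name and the exact boundary condition are assumptions rather than the patent's definition:
```python
def choose_resync_mode(rejoining_node_level, cluster_oldest_retained_level):
    """Decide how a briefly absent node catches up when it rejoins the cluster."""
    if rejoining_node_level + 1 >= cluster_oldest_retained_level:
        # the missing events are still in some node's recent-event queue,
        # so they can simply be forwarded to the rejoining node
        return "forward_missing_events"
    # the missing events have already been merged into the cluster state on
    # every node, so a full cluster-state copy must be transferred instead
    return "copy_full_cluster_state"
```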
Each service module can add a sub-module to the service module control layer, the core of which is an event handling function. After receiving an event sent by the consistency protocol layer, the service module calls the corresponding logic to read and write the corresponding cluster state, and decides which specific actions of the service module application layer to invoke according to the states before and after the read/write.
Each service module can add multiple sub-modules to the service module application layer, all controlled by the same control layer service sub-module. The service module application layer may send an event to the consistency protocol layer as needed, and the event may carry a callback. The event is distributed to every node in the cluster by the consistency protocol layer and, inside each node, handed to the corresponding service module control layer. After the control layer completes the writes to the two copies of the cluster state, the action of the corresponding application sub-module is invoked; once the action completes, control returns to the consistency protocol layer, which locates the application layer sub-module that initiated the event and invokes the callback carried by the event.
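A schematic sketch of this event path is shown below; the class and method names (Event, BusinessControlLayer, perform_action, distribute) are illustrative assumptions, and the state object is assumed to expose an execute_event() method like the two-copy sketch above:
```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Event:
    apply: Callable[[dict], None]                 # writes this event performs on a state copy
    callback: Optional[Callable[[], None]] = None # optional callback carried by the event

class BusinessControlLayer:
    """Control-layer sub-module: writes both state copies, then drives the application layer."""

    def __init__(self, cluster_state, app_layer):
        self.cluster_state = cluster_state
        self.app_layer = app_layer

    def handle_event(self, event: Event):
        self.cluster_state.execute_event(event.apply)  # write first and second cluster copies
        self.app_layer.perform_action(event)           # then invoke the application sub-module

class ConsistencyProtocolLayer:
    """Delivers the same event sequence to every node's control layer."""

    def __init__(self, control_layers):
        self.control_layers = control_layers

    def distribute(self, event: Event):
        for layer in self.control_layers:
            layer.handle_event(event)
        # once the actions complete, the callback carried by the event is invoked
        # on behalf of the application-layer sub-module that issued it
        if event.callback is not None:
            event.callback()
```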
This architecture ensures that the cluster state is replicated uniformly to all nodes in the cluster. Modification of the cluster state is triggered by events, and the consistency protocol layer ensures that each node in the cluster executes these events in the same order. The framework makes difficult node task coordination in a storage cluster possible; however, if there is a problem in the event handling code of the service module control layer that causes a node crash, typically a value assertion failure in the code (the failure location being the service module control layer logic between steps 3 and 5 in fig. 1), then when this event is executed all nodes will crash, because the consistency protocol layer distributes the event to all nodes of the cluster. Worse, because these events are persisted, they are re-executed after a node reboots, ultimately causing all nodes in the cluster to crash repeatedly. A single software problem can therefore bring down the cluster, reduce the availability of the cluster to single-machine level, and seriously affect service continuity.
One major problem in cluster recovery is to ensure that, when recovery is complete, all recovered nodes have identical copies of the cluster state, i.e., the cluster state is consistent across nodes. Possible inconsistency scenarios are listed below:
1. If only some nodes in the cluster perform cluster recovery, the recovered cluster is composed of some nodes performing recovery actions and other nodes not performing recovery actions. The cluster states held by the two types of nodes are inconsistent, so that even if the same event sequence is received again later, the nodes cannot be guaranteed to make the same coping actions.
2. If all nodes perform cluster recovery but their initial cluster states at the time of recovery differ, inconsistency will also result. For example, suppose the distributed storage cluster includes 4 nodes, A, B, C, and D, and there are currently 3 events to be processed, X, Y, and Z. The consistency protocol layer guarantees that the three events are processed in order, but cannot guarantee the precise time of processing. One entirely possible situation is that node A has finished processing all 3 events, node B has processed only event X, and nodes C and D have not processed any event; if all nodes stop running at this moment and start cluster recovery, the cluster state on each node will be inconsistent.
To avoid the inconsistent problem of starting cluster restoration at different locations of event processing as described above, one possible approach is to select one node in the cluster to complete the cluster restoration action and copy its state to all other nodes of the cluster after restoration is complete.
But cluster recovery also needs to ensure that the cluster state is consistent with the data stored on the back-end disks. For example, if a data unit has been migrated from RAID A to RAID B and modified after the migration, data inconsistency may result if the cluster still considers this data unit to be on RAID A.
The above cluster recovery approach of selecting one node in the cluster to complete the recovery action and copying its state to all other nodes after recovery completes may cause inconsistency between the cluster state and the data stored on the back-end disks. For example, for scenario 2 above, if cluster recovery is performed only on node B, the recovered cluster state would include the execution result of event X but not of events Y and Z. Node A, however, considers events Y and Z to have been executed and may already have issued IO (Input/Output) requests or modified its read/write cache based on those results; recovering from node B invalidates these actions of node A.
To avoid this inconsistency problem, all nodes can stop at the same exact position in the event sequence, trigger automatic cluster recovery, and rely on the consistency protocol layer to ensure that all nodes recover to a consistent state. But even with this strategy, exactly how many nodes perform the recovery process is not fixed.
Consider the following scenarios:
1. All nodes have the same cluster state, and a bad event triggers an assertion on every node.
At first glance it may seem that, in this case, all nodes will start cluster recovery at the same assertion location in the service module control layer code and at the same time, which indeed happens in the most optimistic case. But this is not guaranteed, because there is a boundary situation: nodes can process an event only when more than half of the nodes in the election set are online, and each node that executes the bad event asserts and therefore leaves the cluster. Once more than half of the nodes have left, the remaining nodes can no longer process any event and will exit the cluster after their leases expire, so it can only be guaranteed that more than half of the nodes in the cluster enter cluster recovery.
In this boundary scenario, before any node in the cluster handles an event, the consistency protocol layer ensures that every node in the view obtains a persistent copy of the event. Once the master node confirms that every node in the view has a copy, it broadcasts a notification telling all nodes that the event commit succeeded and the event can be executed. The number of nodes that perform cluster recovery therefore depends on how many nodes received this notification before more than half of the nodes handled the bad event, asserted, and left the cluster. The most optimistic case is that all nodes receive the notification, handle the bad event, and therefore all initiate cluster recovery. The worst case is that the master node processes the bad event immediately after broadcasting the notification and asserts itself before the broadcast reaches any other node; the remaining nodes then elect a new master node, which replays the notification broadcast for the bad event, and in the worst case that new master node delivers the notification only to itself. This process may continue until a majority of the nodes in the election set have asserted and exited; once only half of the nodes remain in the cluster, the new master node can no longer refresh its lease, replay stops, and the leases of all remaining nodes expire.
2. One node in the cluster has a corrupted cluster state, inconsistent with the other nodes, so that only this node asserts.
In this case, the recovering node must be prevented from rejoining the cluster or communicating with other nodes, so that a single node's failure does not break the whole cluster.
In addition, similar to the case of entering cluster restoration, there are a number of problems with the timing of exiting cluster restoration.
First, the exit from cluster recovery must be coordinated in the same way as the entry into it. The simplest approach is to issue an event that causes every node receiving it to exit recovery and return to normal IO processing. This approach suffers from the same problem as before: only more than half of the nodes that entered cluster recovery can be guaranteed to process this event and exit, and the leases of the remaining nodes will expire. That is acceptable as long as the exit succeeds: nodes whose leases have expired return to the cluster, replay the exit-cluster-recovery event, and ignore it. To achieve this, the service module control layer that handles this event can make the node exit cluster recovery by notifying the node's service module application layer.
Second, there is resynchronization with nodes that did not perform cluster recovery. Although it is acceptable for only a majority of the nodes in the cluster to perform cluster recovery, it is also required that, once cluster recovery has occurred, any other node in the cluster should be able to rejoin it. Each node may hold a unique cached copy of a virtual volume, and if that node cannot rejoin the cluster its cached data would be lost. The consistency protocol layer has a mature mechanism for bringing briefly missing nodes back into the cluster, using one of two resynchronization modes depending on how long the node has been away: if the joining node is only a few events behind, those events are forwarded to it; if it is far behind, the consistency protocol layer transfers a state copy to it from a node with the most current state. After cluster recovery, all nodes that did not participate in it must rejoin the cluster by obtaining a copy of the cluster state rather than by replaying events, because many state resets may have occurred during recovery. To ensure this, enough events can be processed during cluster recovery so that the consistency protocol layer no longer has enough event history for replay, thereby forcing a full cluster state copy.
The embodiment of the application discloses a fault processing method, which improves the fault processing and cluster recovery efficiency.
Referring to fig. 2, a flowchart of a fault handling method is shown according to an exemplary embodiment, as shown in fig. 2, including:
S101: when an error occurs while executing a target event, terminating execution of the target event, modifying the cluster state to a recovery state, and modifying the node's own event level according to a preset rule; wherein the master node, after modifying the cluster state to the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event;
The execution subject of this embodiment is each node in the distributed storage cluster. The target event reaches the service module control layer, and when an error occurs while modifying the first cluster copy, the service module control layer terminates execution of the target event and modifies the cluster state from the normal state to the recovery state.
As a possible implementation manner, terminating execution of the target event when an error occurs while executing the target event includes: terminating execution of the target event when a code assertion occurs while executing the target event on the first cluster copy. In a specific implementation, since the service module control layer modifies the first cluster copy and the second cluster copy in two identical stages of event execution, the termination always occurs while accessing the first cluster copy. A failed assertion is encountered while executing the target event on the first cluster copy, at which point the cluster state is modified from the normal state to the recovery state.
As a possible implementation manner, before modifying the cluster state to the recovery state, the method further includes: covering the first cluster copy with the second cluster copy to obtain a new first cluster copy. In a specific implementation, all nodes in the cluster perform a state rollback, that is, the second cluster copy is used to overwrite the first cluster copy.
As a possible implementation manner, modifying the cluster state to the recovery state includes: modifying the cluster state of the new first cluster copy to the recovery state, switching to the second cluster copy, and modifying the cluster state of the second cluster copy to the recovery state. In a specific implementation, the cluster state of the new first cluster copy is modified from the normal state to the recovery state, then execution switches to the second cluster copy, whose cluster state is likewise modified from the normal state to the recovery state.
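A minimal sketch of this transition into the recovery state, assuming the dict-based copy layout and helper name used in the earlier sketch:
```python
import copy

def enter_recovery_state(state):
    """Roll back the first copy from the second, then mark both copies as recovering."""
    # cover the first cluster copy with the intact second cluster copy
    state.first_copy = copy.deepcopy(state.second_copy)
    # modify the cluster state of the new first copy, then of the second copy
    state.first_copy["cluster_state"] = "RECOVERY"
    state.second_copy["cluster_state"] = "RECOVERY"
```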
As a possible implementation manner, terminating execution of the target event when an error occurs while executing the target event includes: terminating execution of the target event if a code assertion occurs in the service module control layer while executing the target event on the first cluster copy. In a specific implementation, cluster recovery is triggered automatically when the service module control layer asserts while processing an event. To avoid requiring every service module control layer developer to replace the assertions in the service module control layer, the platform code and the consistency protocol layer can cooperate to detect whether an assertion occurs during event processing; assertions in the consistency protocol layer or in the service module application layer do not trigger cluster recovery. If cluster recovery is triggered, the IO process exits with a specific exit code, EXIT_RECOVER_CLUSTER, to the outer process. The outer process recognizes this exit code: when the IO process exits with it, the outer process first creates a coredump to capture enough information for diagnosing the problem, and after the coredump data file is written it launches a new cluster_recovery process to perform cluster recovery; the behavior of the cluster_recovery process is substantially consistent with that of the main IO process.
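A hedged sketch of how an outer (watchdog) process might react to the EXIT_RECOVER_CLUSTER exit code follows; the numeric exit-code value, the executable names, and the coredump step are assumptions for illustration only:
```python
import subprocess

EXIT_RECOVER_CLUSTER = 66  # assumed numeric value of the special exit code

def supervise_io_process():
    while True:
        result = subprocess.run(["./io_process"])      # main IO process (assumed name)
        if result.returncode == EXIT_RECOVER_CLUSTER:
            # capture a coredump for later diagnosis (mechanism omitted in this sketch),
            # then launch cluster recovery, which behaves like the main IO process but
            # runs the recovery event handlers
            subprocess.run(["./cluster_recovery"])
        else:
            break  # any other exit code: fall back to the normal restart handling
```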
The cluster_recovery process initializes the node as in the normal case, and during node initialization the event level of the node is modified according to the same preset rule, so as to distinguish it from nodes that have not undergone cluster recovery; in this way, recovering nodes and non-recovering nodes cannot form a cluster together. That is, the nodes participating in cluster recovery are determined according to their event levels after the event levels of the nodes in the distributed storage cluster have been modified according to the preset rule. The nodes start their communication links and ports in the usual way and attempt to form a cluster; because their event levels have been modified, they will only attempt to form a cluster with other nodes that have also entered cluster recovery mode and will ignore nodes that have not, taking no further action until more than half of the nodes have entered cluster recovery mode and the cluster is successfully formed. Once more than half of the nodes in the cluster have started the cluster recovery process and discovered one another over the intra-cluster communication links, the consistency protocol layer starts the cluster in the usual manner and issues the necessary sequence of events so that all nodes complete synchronization with the cluster state. This process includes replaying uncommitted events and issuing the corresponding node events for nodes that have left. All service module control layer components register a substitute event handling function to handle events during cluster recovery. With a few exceptions, the service module control layer ignores the events sent by the consistency protocol layer while the cluster_recovery process is running, so replaying the bad event that triggered the assertion in normal operation causes no problem.
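The patent does not specify the preset rule itself; one possible rule, sketched purely as an assumption, is to shift every recovering node's event level by the same large fixed offset so that recovering and non-recovering nodes no longer match:
```python
RECOVERY_LEVEL_OFFSET = 10**9  # assumed constant, identical on every node

def enter_recovery_mode(node):
    node.cluster_state = "RECOVERY"
    # the same deterministic rule on every node, so all recovering nodes agree
    node.event_level = node.event_level + RECOVERY_LEVEL_OFFSET

def can_form_cluster(node_a, node_b):
    # shifted (recovering) nodes only match other shifted nodes, and vice versa
    a_recovering = node_a.event_level >= RECOVERY_LEVEL_OFFSET
    b_recovering = node_b.event_level >= RECOVERY_LEVEL_OFFSET
    return a_recovering == b_recovering
```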
During cluster recovery, the ordinary service module control layers, other than the target service module control layer, process only two events: the first cluster recovery event and the second cluster recovery event. The target service module control layer is special: it exists both in the main IO process and in the cluster_recovery process, and in addition to the first and second cluster recovery events it also processes node state modification events, while ignoring all other events.
During cluster recovery, because most events are ignored by the service module control layer, the service module control layer essentially keeps the service module application layer in its initial dormant state.
As a possible implementation manner, after modifying the cluster state to the recovery state, when there is no node in the suspended state, the master node sequentially submits the first cluster recovery event and the second cluster recovery event through its own service module control layer.
In a specific implementation, in a cluster recovery process, when no node is in a suspended state, a service module control layer of a master node sequentially broadcasts a first cluster recovery event and a second cluster recovery event, and after receiving the first cluster recovery event and the second cluster recovery event, a service module control layer of a node participating in recovery calls a corresponding service module application layer to sequentially execute the first cluster recovery event and the second cluster recovery event.
S102: in the recovery state, processing node state modification events through the node's own target service module control layer;
During cluster recovery, the target service module control layer of each node continues to process node state modification events, ensuring that the node state is always up to date. Node state modification events may include node addition events, node suspension events, node disconnection events, node removal events, and the like.
S103: when the first cluster recovery event is received, retaining key information and discarding non-key information, wherein the key information includes configuration information;
In a specific implementation, when each node executes the first cluster recovery event, key information is retained, an integrity check may be performed on it, and non-key information is then discarded. The key information may include disk array configuration information, such as back-end controllers, LUNs (Logical Unit Numbers), the patterns of heterogeneous LUNs, and the naming of each disk array; storage pool configuration information, such as virtualization mapping tables and ongoing data migration tasks; and virtual volume configuration information, such as mappings between virtual volumes and hosts, the owning IO group, preferred nodes, and naming. The non-key information may include disk array path states, virtual volume path states, node states, reset input/output statistics, log information, error logs, and the like. In the cluster recovery state, the path states of all objects are set to offline. The cluster recovery service module control layer is queried to determine whether each node is online or offline, and the cluster's service module control layer state is updated accordingly.
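An illustrative handler for the first cluster recovery event is sketched below; the field names follow the key/non-key examples above, but the data layout and the check_integrity callable are assumptions:
```python
KEY_FIELDS = {
    "raid_config",            # disk array configuration information
    "storage_pool_config",    # virtualization mapping tables, migration tasks, ...
    "virtual_volume_config",  # volume-host mappings, owning IO group, naming, ...
}

def handle_first_recovery_event(state_copy: dict, check_integrity) -> None:
    """Keep and verify key information, discard everything else."""
    for field in list(state_copy):
        if field in KEY_FIELDS:
            check_integrity(state_copy[field])   # key information is retained and checked
        else:
            del state_copy[field]                # non-key information is discarded
    # in the recovery state the path state of every object is (re)initialized to offline
    state_copy["raid_path_state"] = "offline"
    state_copy["virtual_volume_path_state"] = "offline"
```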
S104: and when the second cluster recovery event is received, querying the node's own target service module control layer for the latest node state and updating it.
In a specific implementation, when each node executes the second cluster recovery event, the service module control layers are allowed to synchronize with one another: the target service module control layer provides a query function, through which the ordinary service module control layers can query it and update the latest node state. Through the two-stage execution mechanism of the first and second cluster recovery events, the consistency protocol layer ensures that every event (including the first and second cluster recovery events) is executed either by all service module control layers or by none of them, which means that all service module control layers are consistent for the vast majority of updates to the cluster state.
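A short sketch of this synchronization step, with an assumed query interface on the target service module control layer:
```python
def handle_second_recovery_event(ordinary_control_layers, target_control_layer):
    """Ordinary control layers pull the latest node-state view from the target control layer."""
    latest_node_states = target_control_layer.query_node_states()
    for layer in ordinary_control_layers:
        # every control layer is updated to the same, latest node-state view
        layer.update_node_states(latest_node_states)
```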
As a preferred embodiment, after querying the target service module control layer for the latest node state and updating it, the method further includes: after receiving a target number of basic events broadcast by the master node, synchronizing the target number of basic events so as to update the node's own event level, wherein the target number is greater than the recent event queue depth; and wherein a target node in the distributed storage cluster, after querying its own target service module control layer for the latest node state and updating it, sends the target number of basic events to the master node so that the master node broadcasts them to the nodes participating in cluster recovery.
In a specific implementation, when the second cluster recovery event is executed, one of the nodes in the distributed storage cluster, the target node, sends a target number of basic events to the master node, where the target number is greater than the recent event queue depth; this serves to raise the node event level, and the basic events may be null events. The master node broadcasts the target number of basic events to the nodes participating in cluster recovery, which execute them and thereby synchronize their event levels, so that after cluster recovery the event levels of all participating nodes are consistent.
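The event-level lift can be sketched as follows; the method names on the node objects are assumptions, and the "+1" choice simply illustrates "strictly greater than the recent event queue depth":
```python
def lift_event_level(target_node, master_node, renm):
    """Target node drives the event-level lift at the end of cluster recovery."""
    target_count = renm + 1                           # target number > recent event queue depth
    for _ in range(target_count):
        basic_event = target_node.make_basic_event()  # a null event
        master_node.broadcast(basic_event)            # executed by every recovering node
    # having exhausted the recent-event history, the target node then submits the
    # recovery-completion event so that every node leaves the recovery state
    target_node.submit_recovery_complete_event()
```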
As a preferred embodiment, after sending the target number of basic events to the master node, the method further includes: when a recovery completion event is received, modifying the cluster state from the recovery state to a normal state; and the target node in the distributed storage cluster submits a recovery completion event after sending a target number of basic events to the master node.
In a specific implementation, after the target node has sent the target number of basic events, its service module application layer submits a recovery completion event; after the service module control layers of the other nodes receive the recovery completion event, they call their corresponding service module application layers to exit the cluster recovery state.
As a preferred embodiment, after modifying the cluster state from the recovery state to the normal state, the method further includes: nodes not participating in cluster recovery update their own event levels by copying the cluster state copy entirely.
In a specific implementation, after cluster recovery completes, the nodes that did not participate in it also need to rejoin the cluster and must therefore synchronize their own event levels. Because the nodes that participated in cluster recovery executed a target number of basic events, the event level has been raised by more than the recent event queue depth, so the non-participating nodes update their event levels by fully copying a cluster state copy, which allows them to rejoin the cluster.
As a preferred embodiment, the method further includes: if a node state modification event is received while processing the first cluster recovery event and the second cluster recovery event, re-entering the step of processing the node state modification event through the node's own target service module control layer.
In a specific implementation, if a node state modification event occurs during the process of processing the first cluster recovery event and the second cluster recovery event, step S102 is re-entered, the node state modification event is processed through the target service module control layer, then the first cluster recovery event and the second cluster recovery event are sequentially executed, and the node recovery is performed again, so as to ensure that the event levels of the nodes are consistent.
According to the fault processing method provided by the embodiment of the application, when an error occurs while executing the target event, execution of the target event is terminated and the cluster state is modified to the recovery state; in the recovery state, each node modifies its own event level according to the same preset rule and processes node state modification events through its own target service module control layer; key information is retained and non-key information is discarded via the first cluster recovery event; and the latest node state is queried and updated via the second cluster recovery event. The fault processing method provided by the embodiment of the application thus realizes automatic cluster recovery, improves fault processing and cluster recovery efficiency, and at the same time ensures that the recovered nodes have consistent event levels and that each node's state is the latest node state.
The embodiment of the application discloses a fault processing method; compared with the previous embodiment, this embodiment further describes and optimizes the technical solution. Specifically:
Referring to fig. 3, a flowchart of another fault handling method is shown according to an exemplary embodiment, as shown in fig. 3, including:
S201: when an event to be executed is received, determining the cluster state;
S202: if the cluster state is the recovery state, judging whether the event to be executed is the first cluster recovery event, the second cluster recovery event, or the node state modification event;
S203: if the event to be executed is the node state modification event, processing the node state modification event through the node's own target service module control layer;
S204: if the event to be executed is the first cluster recovery event, retaining key information and discarding non-key information, wherein the key information includes configuration information;
S205: if the event to be executed is the second cluster recovery event, querying the node's own target service module control layer for the latest node state and updating it;
S206: after receiving a target number of basic events broadcast by the master node, synchronizing the target number of basic events so as to update the node's own event level;
S207: if the event to be executed is none of the first cluster recovery event, the second cluster recovery event, and the node state modification event, skipping execution of the event to be executed;
It should be noted that, in the recovery state, events other than the first cluster recovery event, the second cluster recovery event, and the node state modification event are ignored, so as to avoid inconsistent node event levels caused by executing other events in the recovery state.
S208: if the cluster state is a normal state, judging whether the event to be executed is the first cluster recovery event or the second cluster recovery event; if yes, skipping the execution of the event to be executed; if not, directly executing the event to be executed.
It should be noted that, in the normal state, the service module control layer tracks the node information in the cluster, which may include node states, event levels, indexes, and the like, and all cluster recovery events, including the first and second cluster recovery events, are ignored, so as to avoid erroneous cluster recovery while the cluster is in the normal state.
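The dispatch logic of S201-S208 can be summarized in a compact sketch; the event-kind strings and node methods are placeholders for the events and handlers described above:
```python
RECOVERY_ONLY_EVENTS = {"first_cluster_recovery", "second_cluster_recovery"}

def dispatch(node, event):
    if node.cluster_state == "RECOVERY":
        if event.kind == "node_state_modification":
            node.target_control_layer.handle(event)          # S203
        elif event.kind == "first_cluster_recovery":
            node.keep_key_discard_non_key(event)             # S204
        elif event.kind == "second_cluster_recovery":
            node.sync_node_states_from_target_layer(event)   # S205
        else:
            return  # S207: any other event is skipped in the recovery state
    else:  # normal state, S208
        if event.kind in RECOVERY_ONLY_EVENTS:
            return  # recovery events are ignored to avoid erroneous recovery
        node.execute(event)
```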
The following describes a fault handling apparatus according to an embodiment of the present application, and a fault handling apparatus described below and a fault handling method described above may be referred to each other.
Referring to fig. 4, a structure diagram of a fault handling apparatus according to an exemplary embodiment is shown, as shown in fig. 4, including:
a first modification module 100, configured to, when an error occurs while executing a target event, terminate execution of the target event, modify the cluster state to a recovery state, and modify the node's own event level according to a preset rule, wherein the master node, after modifying the cluster state to the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event;
a processing module 200, configured to, in the recovery state, process node state modification events through the node's own target service module control layer;
a discarding module 300, configured to, when the first cluster recovery event is received, retain key information and discard non-key information, wherein the key information includes configuration information;
and a first updating module 400, configured to, when the second cluster recovery event is received, query the node's own target service module control layer for the latest node state and update it.
According to the fault processing apparatus provided by the embodiment of the application, when an error occurs while executing a target event, execution of the target event is terminated and the cluster state is modified to the recovery state; in the recovery state, each node modifies its own event level according to the same preset rule and processes node state modification events through its own target service module control layer; key information is retained and non-key information is discarded via the first cluster recovery event; and the latest node state is queried and updated via the second cluster recovery event. The fault processing apparatus provided by the embodiment of the application thus realizes automatic cluster recovery, improves fault processing and cluster recovery efficiency, and at the same time ensures that the recovered nodes have consistent event levels and that each node's state is the latest node state.
On the basis of the above embodiment, as a preferred implementation manner, the method further includes:
a synchronization module, configured to, after receiving a target number of basic events broadcast by the master node, synchronize the target number of basic events so as to update the node's own event level; wherein a target node in the distributed storage cluster, after querying its own target service module control layer for the latest node state and updating it, sends the target number of basic events to the master node so that the master node broadcasts them to the nodes participating in cluster recovery.
On the basis of the above embodiment, as a preferred implementation manner, the method further includes:
The second modification module is used for modifying the cluster state from the recovery state to the normal state when receiving the recovery completion event; and the target node in the distributed storage cluster submits a recovery completion event after sending a target number of basic events to the master node.
On the basis of the above embodiment, as a preferred implementation manner, the method further includes:
a second updating module, configured for nodes that do not participate in cluster recovery to update their own event levels by fully copying a cluster state copy.
On the basis of the above embodiment, as a preferred implementation manner, the nodes participating in cluster recovery are determined according to the event levels of the nodes after modifying the event levels of the nodes in the distributed storage cluster according to a preset rule.
On the basis of the above embodiment, as a preferred implementation manner, the target number of basic events is the target number of null events.
On the basis of the above embodiment, as a preferred implementation manner, after modifying the cluster state into the recovery state, when there is no node in the suspended state, the master node sequentially submits the first cluster recovery event and the second cluster recovery event through its own service module control layer.
Based on the above embodiments, as a preferred implementation manner, the node state modification event includes any one or a combination of any several of a node addition event, a node suspension event, and a node removal event.
On the basis of the foregoing embodiment, as a preferred implementation manner, the key information includes any one or a combination of any of disk array configuration information, storage pool configuration information and virtual volume configuration information.
Based on the foregoing embodiment, as a preferred implementation manner, the non-critical information includes any one or a combination of any several of disk array path states, virtual volume path states, node states, reset input/output statistics, log information, and error logs.
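To make the split between key and non-key information in the two preceding paragraphs concrete, the following Go sketch shows one possible layout; the struct and its field names are assumptions for illustration only:

```go
package recovery

// ClusterInfo splits the cluster state into the configuration ("key")
// information that survives recovery and the runtime ("non-key") information
// that is rebuilt afterwards.
type ClusterInfo struct {
	// Key information: retained when the first cluster recovery event arrives.
	RaidConfig        []byte // disk array configuration
	StoragePoolConfig []byte // storage pool configuration
	VolumeConfig      []byte // virtual volume configuration

	// Non-key information: discarded and later re-queried or reset.
	RaidPathState   map[string]string // disk array path states
	VolumePathState map[string]string // virtual volume path states
	NodeStates      map[int]string    // per-node states
	IOStats         map[string]uint64 // reset input/output statistics
	Logs            []string          // log information
	ErrorLog        []string          // error logs
}

// OnFirstRecoveryEvent drops everything except the configuration fields.
func (c *ClusterInfo) OnFirstRecoveryEvent() {
	c.RaidPathState = nil
	c.VolumePathState = nil
	c.NodeStates = nil
	c.IOStats = nil
	c.Logs = nil
	c.ErrorLog = nil
}
```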
On the basis of the above embodiment, as a preferred implementation manner, the apparatus further includes:
The receiving module is used for restarting the workflow of the processing module 200 when a node state modification event is received during the processing of the first cluster recovery event and the second cluster recovery event.
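One possible shape of this re-entry behaviour is sketched below in Go; RecoveryDriver and its methods are assumed names used purely for illustration:

```go
package recovery

// RecoveryDriver tracks whether the node-state workflow has to be run again
// because a node state modification event arrived while the two cluster
// recovery events were being handled.
type RecoveryDriver struct {
	handlingRecoveryEvents bool
}

// OnNodeStateModification re-enters the processing module's workflow whenever a
// node state modification event interrupts the handling of the first or second
// cluster recovery event.
func (d *RecoveryDriver) OnNodeStateModification() {
	if d.handlingRecoveryEvents {
		d.processNodeStateChange()
	}
}

// processNodeStateChange stands in for the work done by the node's own target
// service module control layer.
func (d *RecoveryDriver) processNodeStateChange() {}
```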
On the basis of the above embodiment, as a preferred implementation manner, the first modification module 100 is specifically configured to: terminate execution of the target event when a code assertion occurs while executing the target event on the first cluster copy.
On the basis of the above embodiment, as a preferred implementation manner, the first modification module 100 is specifically configured to: terminate execution of the target event if the service module control layer generates a code assertion while executing the target event on the first cluster copy.
On the basis of the above embodiment, as a preferred implementation manner, the apparatus further includes:
The coverage module is used for overwriting the first cluster copy with the second cluster copy to obtain a new first cluster copy.
On the basis of the above embodiment, as a preferred implementation manner, the first modification module 100 is specifically configured to: modify the cluster state of the new first cluster copy into the recovery state, switch to the second cluster copy, and modify the cluster state of the second cluster copy into the recovery state.
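A minimal Go sketch of the two-copy handling described by the coverage module and the first modification module follows; StateCopy, EnterRecovery and the byte-slice data field are assumptions rather than the patented implementation:

```go
package recovery

// CopyState marks a single cluster-state copy as normal or recovering.
type CopyState int

const (
	CopyNormal CopyState = iota
	CopyRecovery
)

// StateCopy is one of the two cluster-state copies kept by a node.
type StateCopy struct {
	State CopyState
	Data  []byte
}

// EnterRecovery overwrites the first copy (on which the code assertion fired)
// with the intact second copy, marks the new first copy as recovering, then
// switches to the second copy and marks it as well.
func EnterRecovery(first, second *StateCopy) {
	first.Data = append([]byte(nil), second.Data...) // discard the partially executed event
	first.State = CopyRecovery
	second.State = CopyRecovery
}
```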
On the basis of the above embodiment, as a preferred implementation manner, the apparatus further includes:
The determining module is used for determining the cluster state when an event to be executed is received;
The first judging module is used for judging whether the event to be executed is the first cluster recovery event, the second cluster recovery event or the node state modification event when the cluster state is the recovery state; if not, skipping execution of the event to be executed.
On the basis of the above embodiment, as a preferred implementation manner, the apparatus further includes:
The second judging module is used for judging whether the event to be executed is the first cluster recovery event or the second cluster recovery event when the cluster state is the normal state; if yes, skipping execution of the event to be executed; if not, directly executing the event to be executed.
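Taken together, the determining module and the two judging modules amount to a simple filter over incoming events, which the following Go sketch illustrates with assumed event kinds and cluster modes:

```go
package recovery

// EventKind labels the events relevant to the two judging modules.
type EventKind int

const (
	KindFirstRecovery EventKind = iota
	KindSecondRecovery
	KindNodeStateChange
	KindOrdinary
)

// Mode is the cluster state used for filtering.
type Mode int

const (
	ModeNormal Mode = iota
	ModeRecovery
)

// ShouldExecute reports whether an incoming event may run: in the recovery
// state only the two recovery events and node state modification events run;
// in the normal state the recovery events are skipped and everything else runs
// directly.
func ShouldExecute(mode Mode, kind EventKind) bool {
	if mode == ModeRecovery {
		return kind == KindFirstRecovery ||
			kind == KindSecondRecovery ||
			kind == KindNodeStateChange
	}
	return kind != KindFirstRecovery && kind != KindSecondRecovery
}
```

Events rejected by the filter are simply skipped, matching the behaviour of the judging modules above.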
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be repeated here.
Based on the hardware implementation of the above program modules, and in order to implement the method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment; as shown in fig. 5, the electronic device includes:
a communication interface 1, capable of information interaction with other devices such as network devices;
a processor 2, connected with the communication interface 1 to realize information interaction with other devices, and configured to execute the fault processing method provided by one or more of the foregoing technical solutions when running a computer program, the computer program being stored in the memory 3.
Of course, in practice, the various components in the electronic device are coupled together by a bus system 4. It will be appreciated that the bus system 4 is used to enable communication between these components. In addition to a data bus, the bus system 4 also comprises a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are labeled as the bus system 4 in fig. 5.
The memory 3 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory 3 may be either volatile memory or non-volatile memory, and may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferroelectric random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory 3 described in the embodiments of the present application is intended to include, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to the processor 2 or implemented by the processor 2. The processor 2 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 2 or by instructions in the form of software. The processor 2 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 2 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium, and the storage medium is located in the memory 3; the processor 2 reads the program in the memory 3 and completes the steps of the above method in combination with its hardware.
The corresponding flow in each method of the embodiments of the present application is implemented when the processor 2 executes the program, and for brevity, will not be described in detail herein.
In an exemplary embodiment, the present application also provides a storage medium, i.e. a computer storage medium, in particular a computer readable storage medium, for example comprising a memory 3 storing a computer program executable by the processor 2 for performing the steps of the method described above. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, CD-ROM, etc.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied essentially or in part in the form of a software product stored in a storage medium, including instructions for causing an electronic device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk or an optical disk, or other media capable of storing program code.
The foregoing is merely illustrative of the present application and does not limit it; any variations or substitutions readily conceivable by a person skilled in the art shall fall within the scope of the present application.

Claims (20)

1. A fault handling method, applied to a node in a distributed storage cluster, the method comprising:
when an error occurs in executing a target event, terminating execution of the target event, modifying the cluster state into a recovery state, and modifying its own event level according to a preset rule; wherein the master node, after modifying the cluster state into the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event;
in the recovery state, processing a node state modification event through its own target service module control layer;
when the first cluster recovery event is received, retaining key information and discarding non-key information; wherein the key information includes configuration information;
and when the second cluster recovery event is received, inquiring and updating the latest node state from its own target service module control layer.
2. The method of claim 1, further comprising, after inquiring and updating the latest node state from its own target service module control layer:
after receiving a target number of basic events broadcast by the master node, synchronizing the target number of basic events so as to update its own event level; wherein the target number is greater than the depth of the recent event queue;
wherein the target node in the distributed storage cluster, after inquiring and updating the latest node state from its own target service module control layer, sends the target number of basic events to the master node, so that the master node broadcasts the target number of basic events to the nodes participating in cluster recovery.
3. The method of claim 2, further comprising, after transmitting the target number of basic events to the master node:
when a recovery completion event is received, modifying the cluster state from the recovery state to a normal state; and the target node in the distributed storage cluster submits a recovery completion event after sending a target number of basic events to the master node.
4. The method of claim 3, further comprising, after modifying the cluster state from the recovery state to the normal state:
Nodes not participating in cluster recovery update their own event levels by copying the cluster state copy entirely.
5. The method according to claim 2, wherein the nodes participating in cluster restoration are determined according to the event levels of the nodes after modifying the event levels of the nodes in the distributed storage cluster according to a preset rule.
6. The fault handling method of claim 2, wherein the target number of base events is the target number of null events.
7. The method of claim 1, wherein, after the cluster state is modified into the recovery state, when there is no node in a suspended state, the master node sequentially submits the first cluster recovery event and the second cluster recovery event through its own service module control layer.
8. The fault handling method of claim 1, wherein the node state modification event comprises any one or a combination of any of a node addition event, a node suspension event, a node removal event.
9. The method of claim 1, wherein the key information comprises any one or a combination of disk array configuration information, storage pool configuration information and virtual volume configuration information.
10. The method of claim 1, wherein the non-key information comprises any one or a combination of disk array path states, virtual volume path states, node states, reset input/output statistics, log information and error logs.
11. The fault handling method of claim 1, further comprising:
if a node state modification event is received in the process of processing the first cluster recovery event and the second cluster recovery event, re-entering the step of processing the node state modification event through its own target service module control layer.
12. The fault handling method of claim 1, wherein terminating execution of the target event when an error occurs in executing the target event comprises:
terminating execution of the target event when a code assertion occurs while executing the target event on a first cluster copy.
13. The fault handling method of claim 12, wherein terminating execution of the target event when an error occurs in executing the target event comprises:
when executing the target event on the first cluster copy, if the service module control layer generates a code assertion, terminating execution of the target event.
14. The method of claim 12, further comprising, before modifying the cluster state into the recovery state:
overwriting the first cluster copy with the second cluster copy to obtain a new first cluster copy.
15. The method of claim 14, wherein modifying the cluster state to the recovery state comprises:
modifying the cluster state of the new first cluster copy into the recovery state, switching to the second cluster copy, and modifying the cluster state of the second cluster copy into the recovery state.
16. The fault handling method of claim 1, further comprising:
When an event to be executed is received, determining a cluster state;
if the cluster state is the recovery state, judging whether the event to be executed is the first cluster recovery event or the second cluster recovery event or the node state modification event;
if not, skipping the execution of the event to be executed.
17. The method of claim 16, further comprising, after determining the cluster state:
If the cluster state is a normal state, judging whether the event to be executed is the first cluster recovery event or the second cluster recovery event;
If yes, skipping the execution of the event to be executed;
if not, directly executing the event to be executed.
18. A fault handling apparatus, applied to a node in a distributed storage cluster, the apparatus comprising:
a first modification module, used for terminating execution of a target event when an error occurs in executing the target event, modifying the cluster state into a recovery state, and modifying its own event level according to a preset rule; wherein the master node, after modifying the cluster state into the recovery state, sequentially submits a first cluster recovery event and a second cluster recovery event;
a processing module, used for processing a node state modification event through its own target service module control layer in the recovery state;
a discarding module, used for retaining key information and discarding non-key information when the first cluster recovery event is received; wherein the key information includes configuration information;
and a first updating module, used for inquiring and updating the latest node state from its own target service module control layer when the second cluster recovery event is received.
19. An electronic device, comprising:
a memory for storing a computer program;
A processor for implementing the steps of the fault handling method as claimed in any one of claims 1 to 17 when said computer program is executed.
20. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed, implements the steps of the fault handling method according to any of claims 1 to 17.
CN202410508482.5A 2024-04-25 2024-04-25 Fault processing method and device, electronic equipment and storage medium Pending CN118093251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410508482.5A CN118093251A (en) 2024-04-25 2024-04-25 Fault processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN118093251A true CN118093251A (en) 2024-05-28

Family

ID=91163302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410508482.5A Pending CN118093251A (en) 2024-04-25 2024-04-25 Fault processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118093251A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665158A (en) * 2017-09-22 2018-02-06 郑州云海信息技术有限公司 A kind of storage cluster restoration methods and equipment
CN109117317A (en) * 2018-11-01 2019-01-01 郑州云海信息技术有限公司 A kind of clustering fault restoration methods and relevant apparatus



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination