CN105242979A

CN105242979A - Backward recovery error tolerance method with forward recovery feature

Info

Publication number: CN105242979A
Application number: CN201510571405.5A
Authority: CN
Inventors: 高胜法; 邵春阳; 高雅娴
Original assignee: Individual
Current assignee: Individual
Priority date: 2015-09-09
Filing date: 2015-09-09
Publication date: 2016-01-13
Anticipated expiration: 2035-09-09
Also published as: CN105242979B

Abstract

The present invention discloses a backward-recovery fault-tolerance method with forward-recovery features. The method controls message transmission by means of fault flag bits. If neither the sending process nor the receiving process has failed, the sending process stores the message and its logical clock in a message log and then sends the message. A process in its failure-recovery stage is prohibited from sending messages. If the receiving process has failed, the sending process does not send the message until the receiving process has been recovered. In this way, non-failed processes execute conditionally during the system's failure-recovery stage. When a process fails, a recovery thread first obtains the replay messages and their logical clocks from the message log, then re-sorts the replay messages by logical clock, and finally resends the sorted messages to the failed process, which receives and processes them again, thereby replaying the messages.

Description

A backward-recovery fault-tolerance method with forward-recovery features

Technical field

The present invention relates to a message-logging failure-recovery method for distributed systems based on message reordering.

Background technology

In the field of backward-recovery fault tolerance for distributed systems, rollback-recovery protocols are divided, according to how state is saved, into checkpoint-based rollback-recovery protocols and message-log-based rollback-recovery protocols. Message-log-based rollback-recovery protocols are further subdivided into pessimistic message logging and optimistic message logging.

In a system that uses pessimistic message logging [1], a process always saves a received message to the message log first and only then delivers it to the application. This guarantees that the messages a process has received, and the order in which it received them, are completely preserved in the message log file, which makes failure recovery comparatively simple. However, because information must be saved to the message log after every message is received, performance during normal operation suffers severely. Depending on which process saves the information to the message log, pessimistic message logging protocols are divided into sender-based message logging protocols [2] [3] [4] and receiver-based message logging protocols.

In a receiver-based message logging protocol, after receiving a message a process first stores information such as the reception order of the message in the message log and then delivers the message to the application.

In a sender-based message logging protocol [2] [3] [4], the sending process first saves the message content in a memory buffer and then sends the message to the receiving process. After receiving the message, the receiving process first sends the reception order of the message back to the sending process and then delivers the message to the application. Once the sending process has received the reception order, it stores the message content and the reception order together in the message log.

In a system that uses optimistic message logging [1], a process delivers a received message to the application immediately and saves it to the message log only later, when the process is idle. This gives processes good performance during normal operation. However, if at the time of a failure a process has received messages that have not yet been saved to the message log, the reception order of those messages is lost, which makes failure recovery very complicated.

The log-recovery protocol based on message reordering and message-count checking [5] [6] uses message reordering to resolve the loss of message reception order that process failures cause under optimistic logging. Under this protocol, when a process sends a message it marks the reception order of the message indirectly with a logical clock and saves the message and its logical clock in the sending process's local storage. After receiving a message, the receiving process first delivers it to the application and then, when idle, saves the message and its reception order to the message log. When a receiving process fails, the recovery process first obtains the saved messages and their logical clocks from the message log. Then, based on the difference between the message counts kept by the sending and receiving processes, it obtains from the sending processes' local storage the messages, and their logical clocks, that were not saved to the message log because of the failure. Finally, it reorders the messages by logical clock and replays them. During normal execution a process under this protocol delivers a message to the application immediately after receiving it and saves information to the message log only when idle. This protocol is therefore still in essence an optimistic message logging protocol: processes perform well during normal execution, and the recovery algorithm is comparatively simple when a process fails.

The main drawback of the log-recovery protocol based on message reordering and message-count checking is that, during normal operation, a process must still save each received message to the message log when it is idle. Yet because the sending process already saves the content and logical clock value of every message to local storage before sending it, the message log on the receiving side is redundant, so the protocol can be optimized further.

The modified reordering recovery protocol [7] is still designed on the basic theory of reordering but changes the recovery protocol substantially, chiefly by eliminating the message log on the receiving side. In this protocol, the sending process saves the message content and its logical clock to the message log before sending the message; the receiving process delivers a received message to the application immediately and saves no information afterwards. Compared with sender-based message logging protocols [2] [3] [4], the receiving process in the modified reordering recovery protocol does not need to send the reception order back to the sending process. Processes therefore perform well during normal execution, and no orphan process exists in the system during the failure-recovery stage.

The modified reordering recovery protocol [7] is free of the constraints of both pessimistic and optimistic protocols and defines a new fault-tolerant recovery protocol that combines the advantages of both. Its main features are as follows:

1) Before sending a message, a process saves, in a single operation, the message content and the logical clock that indicates the reception order to the message log, and then sends the message to the receiving process. This gives the protocol the defining property of pessimistic message logging protocols: no orphan process exists in the system at any time.

2) A process saves no information after receiving a message, so during normal execution a process under this protocol performs better than one under an optimistic logging protocol. This distinguishes the protocol from both pessimistic and optimistic protocols.

3) When a process fails, non-failed processes continue to execute conditionally. In the failure-recovery stage, a process under the reordering recovery protocol therefore has better recovery performance than one under a pessimistic logging protocol.

4) Any process can save its process state (checkpoint) independently at any time.

These features make the modified reordering recovery protocol better than all existing message logging protocols on every performance measure. However, the protocol gives only the conditions under which non-failed processes may run (for example, restricting a failed process from sending messages to another failed process), not a concrete method for implementing conditional execution of non-failed processes, which limits its application.

Object of the present invention and the effects it achieves:

The invention improves the reordering recovery protocol and proposes a backward-recovery fault-tolerance method, based on reordering theory, that is easy to implement in a programming language. The method has the essential features of the modified reordering recovery protocol [7] and the following distinctive points: 1. it further refines the theory and method of conditional execution of non-failed processes while other processes are in a failed state; 2. it adds process fault flag bits and uses them to control the conditional execution of non-failed processes during the system failure-recovery stage.

Basic principle of message reordering

Under the piecewise deterministic (PWD) assumption, the message reception events of a process are nondeterministic, that is, the time and order in which messages are received are uncertain, whereas the sending of a message is a deterministic event.

As shown in Fig. 1, suppose the distributed system consists of processes p, q and r, where $\delta_{p,0}$, $\delta_{q,0}$ and $\delta_{r,0}$ denote the initial state intervals of p, q and r respectively; $\delta_{q,1}$ and $\delta_{q,2}$ denote the state intervals of process q after it receives $m_1$ and $m_2$ respectively; and $t_{pq}$ and $t_{rq}$ denote the channel delays between p and q and between q and r respectively. Under an optimistic message logging protocol, suppose process q fails at "x" before it has saved the necessary information about $m_1$ and $m_2$ to the log file. After q fails, processes p, q and r must restart and resend and re-receive $m_1$ and $m_2$. Clearly q should replay the messages in the order $m_1$, $m_2$, but because the channel delays $t_{pq}$ and $t_{rq}$ are not constants, if $t_{pq} > t_{rq}$ the order in which q receives the messages may become $m_2$, $m_1$. The example in Fig. 1 shows that, although optimistic message logging protocols require a failed process to resend and re-receive, in exactly the original order, the messages that were not saved to the log file, in some situations (for example, when channel delays change or processes in the system restart at different times) the actual order of sending and receiving may differ from the order before the failure. Yet the final result of the system should be the same on every repeated execution, which shows that under the PWD assumption the result of the system's execution is independent of the reception order of some messages.

Always-happens-before relation:

Suppose the channels between processes are reliable FIFO channels, and let $e_i$ and $e_j$ denote sending or reception events of messages $m_i$ and $m_j$ respectively. If, in every execution of the system, $e_i$ occurs before $e_j$ regardless of factors such as channel delay and CPU speed, then $e_i$ is said to always happen before $e_j$.

In Fig. 2, $R(m_1)$, $R(m_2)$, $R(m_3)$ and $R(m_4)$ denote the reception events of messages $m_1$, $m_2$, $m_3$ and $m_4$ respectively, and $S(m_1)$, $S(m_2)$, $S(m_3)$ and $S(m_4)$ denote their sending events. Under the PWD assumption, q must send $m_2$ after receiving $m_1$, and r must send $m_3$ after receiving $m_2$; therefore $R(m_1)$ always happens before $R(m_3)$. This shows that $R(m_3)$ logically depends on $R(m_1)$: the relation between $R(m_3)$ and $R(m_1)$ is a logical dependency that is independent of other system factors. Because the sending of a message is a deterministic event, under the FIFO channel assumption the sending event of a message always happens before its reception event.

In Fig. 2, $m_3$ and $m_4$ reach process q over different channels with different delays, so $R(m_3)$ does not necessarily always happen before $R(m_4)$. If the occurrence of event $e_i$ does not necessarily precede that of event $e_j$ but depends on factors such as channel delay and CPU speed, then $e_i$ is said not to always happen before $e_j$. In Fig. 2, $R(m_3)$ does not always happen before $R(m_4)$, so the actual message reception sequence of process q is either $m_1, m_3, m_4$ or $m_1, m_4, m_3$.

Equivalent message sequence theorem:

Suppose S is a message sequence of process p and S′ is a new sequence formed by rearranging the messages in S. The elements of S′ satisfy: 1. every message in S is still present in S′; 2. if the reception events of some messages in S are in the always-happens-before relation, that relation still holds in S′. Under the assumption that the channels between processes are reliable FIFO channels, S and S′ are equivalent sequences when replayed by process p (the computations the process completes after receiving the two sequences are identical).

Proof: Under the assumptions of the theorem, although some messages of S are reordered in S′, the always-happens-before relations among them are unchanged in S′; the reception order in S′ is therefore an order that could actually occur when the process replays. If S and S′ were not equivalent during replay by process p, then the execution of p on the reception events in S would not be equivalent to its execution on the reception events in S′, that is, different executions of the same process would be inconsistent, which contradicts the consistency property of process execution.

For example, in Fig. 2 the sequences $m_1, m_3, m_4$ and $m_1, m_4, m_3$ are equivalent: the computation process q completes after replaying $m_1, m_3, m_4$ is identical to the one it completes after replaying $m_1, m_4, m_3$.
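The two conditions of the theorem can be checked mechanically. The following is a minimal sketch for illustration only: the function name `is_equivalent_reordering` and the representation of always-happens-before relations as a list of (earlier, later) pairs are assumptions introduced here, not part of the original description, and message identifiers are assumed to be unique.

```python
from typing import List, Sequence, Tuple


def is_equivalent_reordering(
    s: Sequence[str],
    s_prime: Sequence[str],
    always_before: List[Tuple[str, str]],
) -> bool:
    """Check the two conditions the theorem places on the rearranged sequence S'."""
    # Condition 1: S' is a rearrangement of S, so it holds the same messages.
    if sorted(s) != sorted(s_prime):
        return False
    # Condition 2: every always-happens-before pair keeps its relative order in S'.
    position = {message: index for index, message in enumerate(s_prime)}
    return all(
        position[earlier] < position[later]
        for earlier, later in always_before
        if earlier in position and later in position
    )


# Fig. 2 example. The pair (m1, m3) is stated in the text; the pair (m1, m4) is an
# assumption read off from the two admissible sequences the text lists.
constraints = [("m1", "m3"), ("m1", "m4")]
original = ["m1", "m3", "m4"]
print(is_equivalent_reordering(original, ["m1", "m4", "m3"], constraints))  # True
print(is_equivalent_reordering(original, ["m3", "m1", "m4"], constraints))  # False
```

The check says nothing about how the always-happens-before pairs are obtained; in the method described here that information is carried indirectly by the logical clocks.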

Improved logical clock:

The improved logical clock $LC_p$ of process p is an integer variable that counts the sending and reception events of messages. $LC_p$ satisfies:

1. its initial value is zero;

2. each time a message is sent, $LC_p$ is incremented by one;

3. each time a message from process q is received, $LC_p \leftarrow \max(LC_p + 1, LC_q + 1)$, where $LC_p$ and $LC_q$ denote the logical clocks of processes p and q respectively and max takes the larger of $LC_p + 1$ and $LC_q + 1$.

As shown in Fig. 3, after p sends $m_1$, $LC_p = 1$; after it sends $m_4$, $LC_p = 2$. After q receives and saves $m_1$, $LC_q = 2$; after it sends $m_2$, $LC_q = 3$; after it receives and saves $m_3$, $LC_q = 6$; after it receives and saves $m_4$, $LC_q = 7$.

For brevity, the improved logical clock is referred to below simply as the logical clock.
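The three rules can be written out as a short sketch that reproduces the Fig. 3 values quoted above. The class name `LogicalClock`, its method names and the helper `fig3_trace` are hypothetical. The routing of m2 (from q to r) and m3 (from r to q) is an assumption taken from the description of Fig. 2, and the value of r's clock after receiving m2 is derived from rule 3 rather than stated in the text.

```python
from typing import Dict


class LogicalClock:
    """Improved logical clock LC of one process (hypothetical class name)."""

    def __init__(self) -> None:
        self.value = 0  # rule 1: the initial value is zero

    def on_send(self) -> int:
        """Rule 2: add one per message sent; return the value attached to the message."""
        self.value += 1
        return self.value

    def on_receive(self, sender_clock: int) -> int:
        """Rule 3: LC <- max(LC + 1, LC_sender + 1)."""
        self.value = max(self.value + 1, sender_clock + 1)
        return self.value


def fig3_trace() -> Dict[str, int]:
    """Replay the Fig. 3 events in one possible order and record each clock value."""
    p, q, r = LogicalClock(), LogicalClock(), LogicalClock()
    trace: Dict[str, int] = {}
    m1 = p.on_send()
    trace["p sends m1"] = m1
    trace["q receives m1"] = q.on_receive(m1)
    m2 = q.on_send()
    trace["q sends m2"] = m2
    trace["r receives m2"] = r.on_receive(m2)  # derived from rule 3, not quoted in the text
    m3 = r.on_send()
    trace["r sends m3"] = m3
    trace["q receives m3"] = q.on_receive(m3)
    m4 = p.on_send()
    trace["p sends m4"] = m4
    trace["q receives m4"] = q.on_receive(m4)
    return trace


for event, clock in fig3_trace().items():
    print(f"{event}: {clock}")
```

Running the trace gives 1, 2, 3 for the first three events, 5 for "r sends m3", 6 and 7 for q's last two receptions, and 2 for "p sends m4", matching the figures in the text.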

Message reordering logical clock theorem:

If the piecewise deterministic (PWD) assumption holds and $R(m_i)$ always happens before $R(m_j)$, then $LC_p(S(m_i)) < LC_q(S(m_j))$. Here $R(m_i)$ and $R(m_j)$ denote the reception events of messages $m_i$ and $m_j$ at process k, $LC_p(S(m_i))$ denotes the logical clock after process p sends message $m_i$, and $LC_q(S(m_j))$ denotes the logical clock after process q sends message $m_j$.

Proof: Because $R(m_j)$ logically depends on $R(m_i)$ and $S(m_j)$ is a deterministic event, $R(m_i)$ always happens before $S(m_j)$. Suppose otherwise, that $R(m_i)$ does not always happen before $S(m_j)$. Then $R(m_i)$ and $S(m_j)$ can occur in either order: either $R(m_i)$ happens before $S(m_j)$, or $S(m_j)$ happens before $R(m_i)$. If $S(m_j)$ happens before $R(m_i)$, then, because $S(m_j)$ always happens before $R(m_j)$, $S(m_j)$ can happen before $R(m_i)$ only indirectly. As shown in Fig. 4, there must be at least one message $m_k$ between $m_j$ and $m_i$ such that $S(m_j) \to S(m_k)$, $S(m_k) \to R(m_k)$, $R(m_k) \to S(m_i)$ and $S(m_i) \to R(m_i)$, where "→" denotes the happens-before relation. In that case $m_j$ and $m_i$ must reach process k over different channels, which may have different delays, so $R(m_i)$ cannot always happen before $R(m_j)$, which contradicts the hypothesis of the theorem. Therefore, by the definition of the improved logical clock, the clock of process k after $R(m_i)$ must be less than $LC_q(S(m_j))$, that is, $LC_k(R(m_i)) < LC_q(S(m_j))$. Moreover, because $S(m_i)$ always happens before $R(m_i)$, $LC_p(S(m_i)) < LC_k(R(m_i))$. Hence $LC_p(S(m_i)) < LC_k(R(m_i)) < LC_q(S(m_j))$, and so $LC_p(S(m_i)) < LC_q(S(m_j))$.

For example, in Fig. 3, $LC_p(S(m_1)) = 1$ and $LC_r(S(m_3)) = 5$, so $LC_p(S(m_1)) < LC_r(S(m_3))$.

The theorem above shows that the reception order of the messages in any process's reception sequence can be determined from the logical clocks of the sending processes. If the sending process saves the logical clock associated with a message, together with the message content, to stable storage or the message log when it sends the message, then after the receiving process fails the recovery process can obtain the required replay sequence by reordering the previously saved messages by logical clock. For example, in Fig. 3, process p saves <m1, LCp=1> after sending m1 and <m4, LCp=2> after sending m4, and process r saves <m3, LCr=5> after sending m3. If process q fails at "X", its replay message sequence can be reordered as m1, m4, m3 according to the logical clocks in the two-tuples <m1, LCp=1>, <m3, LCr=5> and <m4, LCp=2>. In Fig. 3, the sequence q receives during the failure-recovery stage, m1, m4, m3, differs from the original reception sequence m1, m3, m4, but m1, m4, m3 is nevertheless one of the reception sequences that q could actually see while the system runs. Because the function a system implements is deterministic (except for stochastic systems), the computations q completes in Fig. 3 after receiving the two different sequences are identical.
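The reordering step in the Fig. 3 example amounts to a sort on the saved logical clock. The sketch below illustrates this under stated assumptions: the type alias `LogEntry` and the function `replay_order` are hypothetical names, and each saved two-tuple is modelled as a (message name, logical clock) pair.

```python
from typing import List, Tuple

# (message name, logical clock of the sender after sending); hypothetical alias
LogEntry = Tuple[str, int]


def replay_order(entries: List[LogEntry]) -> List[str]:
    """Return the message names sorted by the saved logical clock.

    sorted() is stable, so entries with equal clocks keep their input order.
    """
    return [name for name, _clock in sorted(entries, key=lambda entry: entry[1])]


# Two-tuples saved by p and r in the Fig. 3 example.
saved = [("m1", 1), ("m3", 5), ("m4", 2)]
print(replay_order(saved))  # ['m1', 'm4', 'm3']
```

The result, m1, m4, m3, is the replay sequence given in the text; it differs from the original reception order m1, m3, m4 but respects every logical clock ordering.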

Conditions for non-failed processes to continue executing during the recovery stage

Before reordering theory was introduced, under any protocol a non-failed process in the recovery stage either rolled back to eliminate the orphan processes in the system (optimistic message logging) or stopped in its current state to wait for the failed process to recover (pessimistic message logging).

For example, in Fig. 5, p, q and r denote the processes in the system; $C_{x,y}$ denotes the y-th checkpoint of process x, with x = p, q, r and y = 0, 1; processes q and r each fail at "x". Under a pessimistic logging protocol, the maximum recoverable state [1] of the system is shown by the dashed line. In the recovery stage, process q must roll back to checkpoint $C_{q,0}$ and replay messages m2, m4 and m6; process r must roll back to checkpoint $C_{r,1}$ and replay messages m3 and m5; process p stops in its current state and waits for the other processes to recover from their failures.

As another example, in Fig. 6, q, p and r denote the processes in the system; $C_{x,y}$ denotes the y-th checkpoint of process x, with x = p, q, r and y = 0, 1; process r fails at "x" after receiving m5 but before saving m5 to the message log. Under an optimistic logging protocol, the maximum recoverable state of the system is shown by the dashed line. In the recovery stage, process r must roll back to checkpoint $C_{r,1}$. Because the states of processes q and p depend directly or indirectly on the reception event of message m5, q and p must roll back to their checkpoints $C_{q,0}$ and $C_{p,1}$ respectively to avoid orphan processes.

In traditional message logging protocols, process execution can be divided into two stages [7]: the normal execution stage and the failure-recovery stage. In the normal execution stage, a process that receives a message either saves information to the message log first and then delivers the message to the application, or delivers it directly. In the failure-recovery stage, a failed process may receive only the replay messages [7] sent by the failure-recovery process; otherwise it might receive the same message twice. Messages sent by a failed process during this stage have already been received by other processes, so they must be suppressed: either the receiving process discards them after receipt, or the communication system is forbidden to forward them [7].

The maximum recoverable state of a system is the state to which the system can be restored when a process fails; more precisely, it is the maximum globally consistent state the system can reach in the process recovery stage.

In traditional pessimistic message logging protocols, the system must finally be restored to a maximum globally consistent state in the failure-recovery stage, so non-failed processes must pause in their maximum recoverable state during this stage to guarantee that the system continues from a consistent global state once recovery ends. Otherwise, if a non-failed process continues to execute during the failure-recovery stage, and in particular sends and receives new messages, the state it reaches may be inconsistent with the state to which the failed process is restored. For example, in Fig. 5, suppose that after q and r fail, process p continues to execute and sends messages to q or r. The sending and reception of such messages would change the message reception sequence of q or r and put the system in an incorrect global state.

Under certain conditions, however, continued execution of non-failed processes during the failure-recovery stage affects neither the recovery of the failed processes nor the system's eventual arrival at a consistent global state.

For example, in Fig. 7, p, q, r and s denote the processes in the system; $C_{x,y}$ denotes the y-th checkpoint of process x, with x = p, q, r, s and y = 0, 1; processes q and r each fail at "x". Under a pessimistic logging protocol, after q and r fail, processes p and s stop executing to wait for the failed processes to recover, and the maximum recoverable state of the system is the one marked "maximum recoverable state" in the figure. Suppose that during the failure-recovery stage of q and r, process p continues to execute and sends message m8 to process s, and that q and r send no further messages after being restored to the maximum recoverable state. The system then reaches a new global state, marked "New maximum recoverable state" in the figure. Because this state contains no orphan process, it is a globally consistent state. However, if a failed process sends messages to other processes after being restored to the maximum recoverable state, then the messages sent by the continuing non-failed processes and the messages sent by the failed process after its recovery may be received in inconsistent orders. In Fig. 8, if process p pauses during the failure-recovery stage to wait for the failed processes to recover, then after the system recovers process s receives m9 first and then m8; if p continues to execute and sends m8, and process r sends m9 after the failed processes have recovered, then s receives m8 first and then m9 (as shown by the dashed line in the figure). Clearly the order in which s receives m8 and m9 may differ between these cases, but this inconsistency is something that objectively occurs when the system actually runs. In fact, m8 and m9 are sent by different processes over different channels, so when channel delays change either reception order is possible for s. For a deterministic system, the computation the system completes does not change with system parameters, so the computation that s and the whole system complete is the same whichever of the two orders the messages are received in.

In summary, the following conclusion is drawn:

Suppose the distributed system is a loosely coupled system, that is, processes share no memory, and during normal operation the information about every message any process sends, including the message content and the message reception order, is saved to stable storage.

Conditional execution theorem for non-failed processes:

In the process failure-recovery stage, if non-failed processes continue to execute and the following conditions are met, the system is in a consistent global state when the recovery stage ends.

1. A failed process is forbidden to send messages.

2. Other processes are forbidden to send messages to a failed process during its recovery stage; that is, the failed process receives only the replay messages sent by the failure-recovery process.

3. Non-failed processes continue to execute and to send and receive messages, but a non-failed process that needs to send a message to a failed process must stop and wait until the failed process has been recovered.

Proof:

Under the conditions of the theorem, non-failed processes and failed processes are completely isolated from each other during the failure-recovery stage, so every message a non-failed process sends while running travels between non-failed processes. The number of messages non-failed processes have sent is necessarily greater than or equal to the number of those messages that have been received [7] (in-transit messages may exist), so no orphan message or orphan process can exist at any time. Therefore, if non-failed processes execute conditionally during the failure-recovery stage, the system must be in a globally consistent state once the failed processes have been recovered.

By the theorem above, when its conditions are met, the computation a non-failed process finally completes by continuing to run during the failure-recovery stage of other processes is identical to the computation it would complete in a failure-free system.

The present invention designs a failure-recovery protocol according to the conditions of the theorem. The protocol has some features of forward recovery: when some processes fail, only the failed processes undergo recovery processing, non-failed processes execute conditionally, and the system does not stop until it completes the required computation.
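The three conditions of the theorem reduce to a decision each process can make from the fault states of the sender and the receiver before every send. The following is a minimal sketch, assuming the fault states are available as a list of 0/1 flags indexed by process; the function name `send_decision` and the three result strings are hypothetical and introduced only for illustration.

```python
from typing import List

FORBID, WAIT, SEND = "forbid", "wait", "send"


def send_decision(sender: int, receiver: int, fault: List[int]) -> str:
    """Decide what to do with an outgoing message; fault[k] == 1 means process k has failed."""
    if fault[sender] == 1:
        return FORBID  # condition 1: a failed process must not send
    if fault[receiver] == 1:
        return WAIT    # conditions 2 and 3: hold the message until the receiver is recovered
    return SEND        # both ends are healthy: send as in normal execution


# Fig. 7 scenario with processes p, q, r, s numbered 0..3; q and r have failed.
fault_flags = [0, 1, 1, 0]
print(send_decision(0, 3, fault_flags))  # send   (p may send m8 to s)
print(send_decision(0, 1, fault_flags))  # wait   (p must wait before sending to q)
print(send_decision(1, 3, fault_flags))  # forbid (q may not send while failed)
```

Replay messages from the failure-recovery process are outside this check, since they are the only messages a failed process is allowed to receive.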

Summary of the invention

The object of the invention is to address two problems of pessimistic message logging: the performance loss during normal execution caused by the receiving process saving the message reception order, and the low system efficiency caused by non-failed processes stopping to wait during the failure-recovery stage; and thereby to improve system performance in both the normal execution stage and the failure-recovery stage. The invention identifies the reception order of messages indirectly by means of logical clocks: before a process sends a message, it first saves the message content and the corresponding logical clock to the message log. When a receiving process fails, the recovery process first obtains from the message log all messages, with their logical clocks, that the receiving process has received since it saved its checkpoint, reorders the messages by logical clock, and finally resends the ordered messages to the failed process. The fault-tolerant recovery protocol of the invention tolerates failures of multiple processes. In the failure-recovery stage, non-failed processes execute conditionally and each failed process is recovered on its own, which gives the protocol the features of forward recovery.

To achieve the above object, the present invention adopts the following technical scheme:

The distributed system consists of ordinary processes Pi (i = 1, 2, ..., n), state control processes Control-i (i = 1, 2, ..., n), the failure-recovery management process Recover-manager, and a message logging system, as shown in Fig. 9.

The system defines the following data structures:

1. Vector Ti, $T_i = [T_{i1}, T_{i2}, \ldots, T_{in}]$, where $T_{ik}$ denotes the number of messages from Pk that process Pi has received. So that the Ti vector can be accessed both when Pi has failed and when it has not, the Control-i process and the ordinary process Pi define and use the same Ti vector, which is stored in memory shared by the two processes (or on the local hard disk). The ordinary process only writes the Ti vector; the Control-i process only reads it.

2. Fault vector F, $F = [F_1, F_2, \ldots, F_n]$, where $F_k$ denotes the fault state of process Pk: $F_k = 1$ means Pk has failed and $F_k = 0$ means it has not. For the same reason as above, the Control-i process and the ordinary process Pi use the same F vector, which is kept in the memory shared by Control-i and Pi (or on the local hard disk). The ordinary process only reads the F vector; the Control-i process only writes it.

3. The improved logical clock LCi (referred to simply as the logical clock variable and defined as above), which is stored in the memory space of the ordinary process Pi.

Working principle of each component:

1. The Recover-manager process is responsible for failure detection and failure recovery of the ordinary processes in the system and consists mainly of a main thread and a failure-recovery thread. The main thread periodically sends heartbeat messages over reliable channels (for example, TCP channels) to the processes Pi (i = 1, ..., n) and receives their response messages.

If the main thread receives no response message from Pi, Pi has failed (the CPU of the machine hosting Pi is assumed to satisfy the fail-stop assumption [8]). The main thread then has the state control process Control-i kill Pi and restart it from Pi's permanent checkpoint, after which the failure-recovery thread of Recover-manager performs recovery for Pi: it first obtains the replay messages from the message log Mlog, then reorders them, and finally sends the ordered messages to Pi.

If the main thread receives Pi's response message and the scheduled time has arrived, it sends a control message to the state control process Control-i. On receiving the control message, Control-i saves a tentative checkpoint of Pi and makes it persistent (saves it as a permanent checkpoint).

2. The state control process Control-i (i = 1, ..., n) is deployed on the same machine as process Pi and is responsible for controlling Pi's process state: saving Pi's current state as a tentative checkpoint, converting Pi's tentative checkpoint into a permanent checkpoint, killing Pi, and restarting Pi from its permanent checkpoint.

3. The ordinary process Pi is responsible for transmitting application messages in the system and for delivering application messages to the system's application.

4, for reducing the bottleneck of system communication, the journal file in message logging system adopts distributed deployment mode, and namely a system disposes multiple message log file Mlog; Each Mlog file be responsible for different process send out the storage of information.Each Mlog is made up of several records, be a preservation process send out the sequential file of information.Each record is a four-tuple: <i, j, LCi, m>, and wherein i and j represents the process identification (PID) of transmission and receiving process respectively, and LCi represents the logical timer after process Pi transmission message m, and m represents the content of message.Each journal file is safeguarded by a finger daemon, and multiple daily record finger daemon completes access and the U of information in Mlog jointly _ijthe operation such as calculating.U _ija variable, represent Pi process send the message number that Pj process receives.U _ijinitial value be zero; Finger daemon often receives the information <i of a message, j, LCi, m>, U _ijvalue add one.All finger daemons of log system maintain the individual U of (n-1) × (n-1) altogether jointly _ijvariable, wherein i=1,2 ... n, j=1,2 ... the number of common process in n, i ≠ j, n expression system.All U _ijvariable constitutes the U matrix [7] of a system, and U matrix have recorded any one process in system and is sent to the message number of other process.
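The record format and the Uij counters described in item 4 can be sketched as follows. This is a minimal in-memory illustration, not the patent's implementation: the names `LogRecord` and `MessageLog` are hypothetical, a Python list stands in for the sequential Mlog file, and a single object stands in for one log daemon.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One Mlog record, the four-tuple <i, j, LCi, m> (field names are assumptions)."""
    sender: int    # i, identifier of the sending process
    receiver: int  # j, identifier of the receiving process
    lc: int        # LCi, logical clock of Pi after sending m
    payload: str   # m, message content


class MessageLog:
    """Stand-in for one Mlog file and the daemon that maintains it."""

    def __init__(self):
        self.records = []          # sequential file: records in append order
        self.U = defaultdict(int)  # U[(i, j)], initial value zero

    def append(self, i, j, lc, m):
        # The daemon stores the record and increments Uij by one.
        self.records.append(LogRecord(i, j, lc, m))
        self.U[(i, j)] += 1


log = MessageLog()
log.append(1, 2, 1, "a")
log.append(1, 2, 2, "b")
log.append(3, 2, 1, "c")
```

Keeping the counter next to the file means Uij can be read without scanning the log, which is what the recovery steps rely on.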

When the system runs normally, messages are passed between processes Pi and Pj as shown in Figure 10. Before process Pi sends message m, it first stores <i, j, LCi, m> in Mlog through the log daemon; Pi then sends the message <i, j, LCi, m> to the receiving process Pj, and Pj delivers m to the application process APP immediately after receiving it.

When a process Pi fails, it is recovered as follows, as shown in Figure 11.

1. Recover-manager sends the Pi failure message (Fi=1) to the Control-i process; on receiving the message, Control-i kills Pi and restarts Pi from the permanent checkpoint of Pi.

2. The Control-i process sends the Ti vector (described above) to Recover-manager. Because the Ti vector is allocated in the shared memory space of the Control-i process and the ordinary process Pi, after Pi is restarted from the permanent checkpoint the Ti vector is restored to the value it had when the checkpoint was saved, that is, the number of messages from each other process that Pi had received before the permanent checkpoint was saved.

3. Recover-manager accesses the log daemons to obtain the vector {U1i, U2i, ..., Uni}. For each process Pk (k=1, 2, ..., n, k ≠ i) it computes Uki-Tik, the number of messages sent by Pk to Pi that must be replayed, and obtains from Mlog, through the corresponding log daemon and in back-to-front order, the Uki-Tik messages that Pk sent to Pi (the replay messages).

4. All replay messages are re-sorted by their logical clocks, and the sorted messages are sent to the Pi process in order.
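Steps 3 and 4 above can be sketched as a single function. This is an illustration under stated assumptions: the function name `replay_messages` is hypothetical, records are plain tuples (sender, receiver, LC, payload) held in a list in append order, and ties between equal logical clocks are broken by sender identifier, which the text does not specify.

```python
def replay_messages(records, U, T_i, i):
    """Select and order the messages to replay to the restarted process Pi.

    records: list of (sender, receiver, lc, payload) in log append order
    U:       dict mapping (sender, receiver) to the number of logged messages
    T_i:     dict mapping sender k to Tik, the count Pi had received at its
             permanent checkpoint
    """
    replay = []
    for k in sorted({s for (s, r) in U if r == i}):
        missing = U[(k, i)] - T_i.get(k, 0)  # Uki - Tik
        if missing <= 0:
            continue
        from_k = [rec for rec in records if rec[0] == k and rec[1] == i]
        # The last `missing` records are the ones Pi lost ("back to front").
        replay.extend(from_k[-missing:])
    # Re-sort by logical clock; the sender-id tie-break is an assumption.
    replay.sort(key=lambda rec: (rec[2], rec[0]))
    return replay
```

For example, if P1 logged two messages to P3 of which P3 had already received one at its checkpoint, only the later one is selected for replay.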

Detailed process:

The steps of the backward recovery fault-tolerance method with a forward recovery feature are as follows:

The operation flow of the message receiving thread and of the send and receive functions of the ordinary process Pi is shown in Figure 12.

The receiving thread of the ordinary process Pi operates as follows:

1. Let k be an integer variable, k=1, 2, ..., n. Initialize Tik and LCi to 0. Tik is the number of messages that process Pi has received from process Pk, and LCi is the logical clock variable of process Pi.

2. If a message from Pj has arrived, go to 3; otherwise go to 4.

3. Call the function ReadM(j, i, LCj, m) to read the message from the system message receive buffer, where j and i are the sending and receiving processes of the message and LCj is the logical clock corresponding to message m.

4. Deliver the message read in step 3 to the application process, then go to 5.

5. If a heartbeat message sent by the Recover-manager main thread has been received, go to 6; otherwise go to 2.

6. Send an acknowledgement message to the Recover-manager main thread, then go to 2.

The computers in the system are assumed to satisfy the fail-stop model. If Pi fails, it cannot receive the heartbeat message sent by the Recover-manager main thread, let alone send an acknowledgement to the Recover-manager process; the execution of step 6 therefore shows that Pi has not failed. The Recover-manager main thread can judge whether Pi is running normally by whether it receives a response from Pi.

The send function SendM(i, j, LCi, m) of the ordinary process Pi operates as follows:

1. If Fi=0, the sending process Pi is fault-free; go to 2. Otherwise Fi=1 and Pi has failed; go to 4.

2. If Fj=0, the receiving process Pj is fault-free; go to 3. Otherwise Fj=1 and Pj has failed; go to 2 and wait for Pj to recover from the fault.

3. With both the sending and the receiving process fault-free, increment the logical clock of the sending process: LCi ← LCi+1. Append the information <i, j, LCi, m> to the end of the message log file, where i and j are the sending and receiving processes, LCi is the logical clock variable of the Pi process, and m is the payload of the message. Go to 5.

4. Increment the logical clock of the sending process: LCi ← LCi+1. End the send procedure.

5. Send the application message AM<i, j, LCi, m> to the receiving process Pj. End the send procedure.

Step 4 corresponds to the case in which Pi has failed: Pi does not send the message AM<i, j, LCi, m> and only updates the LCi variable.

To ensure that a process sends no message while in the fault state, the send function uses the fault flags Fi and Fj to represent the fault state of the sending and receiving processes: a fault bit of 1 means the process has failed, otherwise the process is fault-free. To implement this fault identification mechanism, each ordinary process Pk, k=1, 2, ..., n, maintains a fault vector F, F={F1, F2, ..., Fn}. The fault vector F used by process Pk is defined in the shared memory of the Control-k process and the ordinary process Pk, and is maintained and updated by the Control-k process. When a process fails, Control-k sets the corresponding bit in the fault vector to 1; after the process has been recovered from the fault, Control-k sets the corresponding bit to 0.
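The five steps of SendM can be sketched as follows. This is a hedged illustration rather than the patent's code: the function name `send_m`, the `state` dictionary holding LCi, the polling interval `poll`, and the lists standing in for the log daemon (`mlog`) and the network (`channel`) are all assumptions made for the example.

```python
import time


def send_m(i, j, m, state, F, mlog, channel, poll=0.01):
    """Sketch of SendM(i, j, LCi, m).

    state["LC"] holds LCi; F is the fault vector, read-only for this process.
    Returns True if the message was logged and sent, False if Pi is faulty.
    """
    if F[i] == 1:              # step 1: sender is faulty, go to step 4
        state["LC"] += 1       # step 4: only advance the clock, send nothing
        return False
    while F[j] == 1:           # step 2: wait until the receiver has recovered
        time.sleep(poll)
    state["LC"] += 1           # step 3: LCi <- LCi + 1
    record = (i, j, state["LC"], m)
    mlog.append(record)        # step 3: log before sending
    channel.append(record)     # step 5: send AM<i, j, LCi, m> to Pj
    return True
```

Logging before sending is what keeps the log a superset of everything a receiver can have seen, and the wait in step 2 is the "conditional" part of conditional execution: a fault-free sender blocks only when its target is being recovered.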

The receive function ReadM(j, i, LCj, m) of the ordinary process Pi operates as follows:

1. Receive the application message AM<j, i, LCj, m> from process Pj.

2. Increment the Tij variable: Tij ← Tij+1. Tij is the number of messages that Pi has received from Pj.

3. Increment the LCi variable: LCi ← LCi+1. LCi is the logical clock of Pi.

4. If LCi is greater than or equal to AM.LCj+1, go to the end; otherwise go to 5. AM.LCj is the component LCj of the four-tuple <j, i, LCj, m>, that is, the logical clock of the sending process Pj.

5. LCi ← AM.LCj+1; go to the end.

Steps 3, 4 and 5 compute the logical clock LCi of Pi after it receives the message: the larger of the logical clocks of Pi and Pj, plus one, becomes the new LCi.
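The clock update in ReadM can be sketched in a few lines. The function name `read_m` and the `state` and `T` containers are assumptions for illustration; the arithmetic follows steps 2 to 5 directly.

```python
def read_m(am, state, T):
    """Sketch of ReadM: am is the tuple (j, i, LCj, m) received from Pj.

    state["LC"] holds LCi; T maps sender j to Tij.
    """
    j, _i, lc_j, m = am
    T[j] = T.get(j, 0) + 1        # step 2: Tij <- Tij + 1
    state["LC"] += 1              # step 3: LCi <- LCi + 1
    if state["LC"] < lc_j + 1:    # step 4: compare with AM.LCj + 1
        state["LC"] = lc_j + 1    # step 5: LCi <- AM.LCj + 1
    return m
```

With LCi=2 and an incoming LCj=7 the result is LCi=8; a following message with LCj=3 then gives LCi=9, matching max(LCi, LCj)+1 in both cases.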

The operation flow of the main thread, the message receiving thread and the fault recovery thread Recover(k) of the Recover-manager process is shown in Figure 13.

The main thread operates as follows:

1. Initialize the memory variables: Fi ← 0, i=1, 2, ..., n; num ← 0. Fi is the fault flag that marks the fault state of process Pi: Fi=0 means Pi is fault-free and Fi=1 means Pi has failed. num is a loop control variable.

2. If num is greater than COUNT, go to 3; otherwise go to 5.

COUNT is a constant that can be set according to the interval at which tentative checkpoints are to be saved. The checkpoint interval is COUNT × DELAY seconds; for example, with COUNT=60 and DELAY=2, a tentative checkpoint is saved every 120 seconds.

3. If Fi=0, Pi is fault-free: send a message notifying the Control-i process to save a tentative checkpoint of Pi, and set Newck[i] ← 1. Newck[i] is a flag bit; Newck[i]=1 means that a tentative checkpoint of Pi exists.

4. num ← 0: reset the num variable to 0.

5. num ← num+1: increment the num variable. Delay for DELAY seconds; DELAY is a constant.

6. k ← 0: assign 0 to the variable k.

7. If k is greater than n, go to 2; otherwise go to 8. n is the number of ordinary processes Pi in the system.

8. k ← k+1: increment the variable k.

9. If Fk=0, the Pk process is fault-free; go to 10. Otherwise Fk=1, Pk has failed and is being recovered; go to 7.

10. Send a heartbeat message to Pk over the reliable channel.

11. If the response of the Pk process to the heartbeat message is received, Pk is fault-free; go to 13. Otherwise Pk has failed; go to 12.

12. Send a message notifying Control-k to delete the tentative checkpoint; Newck[k] ← 0; start the thread Recover(k) that recovers Pk; assign 1 to Fk so that no heartbeat message is sent to Pk in the next cycle; go to 7.

13. If Newck[k]=1, a tentative checkpoint of Pk exists; go to 14. Otherwise no tentative checkpoint of Pk exists; go to 7.

14. Send a message notifying the Control-k process to convert the tentative checkpoint of process Pk into a permanent checkpoint; reset Newck[k] ← 0; go to 7.
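One pass of the heartbeat loop (steps 6 to 14) can be sketched as below. The callbacks `ping`, `control` and `start_recover` are hypothetical stand-ins for the reliable channel, the messages to Control-k and the creation of the Recover(k) thread; the loop visits k = 1..n, which is taken here as the intent of steps 6 to 8.

```python
def heartbeat_round(n, F, newck, ping, control, start_recover):
    """One heartbeat pass over processes P1..Pn.

    F:      fault flags, F[k] == 1 while Pk is being recovered
    newck:  newck[k] == 1 if a tentative checkpoint of Pk exists
    ping(k) returns True if Pk answered the heartbeat.
    """
    for k in range(1, n + 1):
        if F[k] == 1:                        # step 9: skip processes in recovery
            continue
        if ping(k):                          # steps 10-11: Pk answered
            if newck[k] == 1:                # step 13
                control(k, "make_permanent") # step 14
                newck[k] = 0
        else:                                # step 12: Pk has failed
            control(k, "delete_tentative")
            newck[k] = 0
            start_recover(k)
            F[k] = 1
```

A tentative checkpoint is promoted only after the process survives one further heartbeat, so a checkpoint taken just before a crash is never used as the restart point.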

The fault recovery thread Recover(k) operates as follows:

1. Send the Fk=1 message over the reliable channel to Control-i, i=1, 2, ..., n, notifying the other processes that the Pk process has failed.

2. If the Tk vector sent by Control-k has been received, go to 3; otherwise go to 2 and wait for the Tk vector.

Note: after the Recover(k) thread sends the Fk=1 message in step 1, the Control-k process corresponding to the failed process Pk, on receiving it, kills Pk, restarts Pk from its permanent checkpoint, and then sends the Tk vector to the Recover(k) thread.

3. Save the received Tk vector in memory.

4. Access the log daemons to obtain Uik.

5. Access the log daemons and obtain from the log file, in back-to-front order, the Uik-Tki messages that Pi sent to Pk.

6. Sort all the messages in ascending order of the value of the LCk variable.

7. Send the sorted messages to Pk, that is, replay the messages.

8. Send the Fk=0 message over the reliable channel to Control-i, i=1, 2, ..., n, indicating that the Pk process has recovered from the fault. Reset Fk ← 0; go to the end.

The operation flow of the Control-i process is shown in Figure 14.

The Control-i process operates as follows:

1. If a message to save a tentative checkpoint is received, go to 2; otherwise go to 3.

2. Save the new tentative checkpoint and delete the old tentative checkpoint.

3. If a message to save a permanent checkpoint is received, go to 4; otherwise go to 5.

4. Delete the former permanent checkpoint and convert the tentative checkpoint of Pi into a permanent checkpoint.

5. If no Fk=0 message is received, go to 7; otherwise go to 6.

6. Fk ← 0: reset Fk, indicating that the Pk process has recovered.

7. If no Fk=1 message is received, go to 1; otherwise go to 8.

8. Fk ← 1: set Fk to 1, indicating that the Pk process has failed.

9. If i is not equal to k, the Pi process has not failed; go to 1. Otherwise i equals k and the Pi process has failed; go to 10.

10. Kill the failed process Pi, then restart Pi from the permanent checkpoint of Pi.

11. Send the Ti vector to Recover(k); go to 1.

Note: from this point Pk, that is Pi (i=k), is recovered by the Recover(k) thread.
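The Control-i flow above can be read as a small message handler. The sketch below is an assumption-laden illustration: the class name `ControlProcess`, the message kinds, and the idea that a checkpoint is a dictionary containing the Ti vector under the key "T" are all hypothetical, and killing and restarting Pi is reduced to a counter.

```python
class ControlProcess:
    """Sketch of Control-i; F is the fault vector shared with Pi."""

    def __init__(self, i, F):
        self.i = i
        self.F = F
        self.tentative = None   # tentative checkpoint of Pi
        self.permanent = None   # permanent checkpoint of Pi
        self.restarts = 0       # stands in for "kill Pi, restart Pi"

    def handle(self, kind, arg=None):
        if kind == "save_tentative":      # steps 1-2: new replaces old
            self.tentative = arg
        elif kind == "make_permanent":    # steps 3-4: tentative becomes permanent
            self.permanent, self.tentative = self.tentative, None
        elif kind == "recovered":         # steps 5-6: Fk <- 0
            self.F[arg] = 0
        elif kind == "failed":            # steps 7-8: Fk <- 1
            self.F[arg] = 1
            if arg == self.i:             # steps 9-11: own process has failed
                self.restarts += 1
                checkpoint = self.permanent or {"T": {}}
                return dict(checkpoint["T"])  # Ti as saved at the checkpoint
        return None
```

Every Control-i updates its copy of Fk, but only the one with i = k restarts its process and returns the Ti vector to Recover(k).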

Correctness proof of the recovery algorithm of the recovery thread Recover(i):

First, the recoverability principle for process state intervals: a process state interval is recoverable if, whatever faults the process suffers in the future, re-execution of the process can always reach this interval.

Theorem 1. Assume that fault-free processes are halted while failed processes are being recovered. If one or more processes fail, then under the action of the recovery thread Recover(i) each failed process is restored to the state it was in before the failure.

Proof: Because a process stores the information of a message (its determinant) in the message log file before sending the message, the process is recoverable by the recoverability principle for process state intervals stated above. The two cases are proved in detail below.

1. Suppose only one process Pk fails, k=1, 2, ..., n. After the fault is detected, the failed process Pk is killed and restarted from its permanent checkpoint. The recovery thread Recover(k) obtains the Tk variable of the Pk process from the Control-k process and the Uik variable from the log daemons, and then obtains all replay messages through the log daemons according to the difference Uik-Tki. These messages are sorted by logical clock and sent to Pk in order; Pk receives and processes them again and finally reaches the state interval it was in before the failure. The Pk process is therefore recoverable.

2. If multiple processes fail, each failed process is recovered by its own recovery thread and every failed process is forbidden to send messages, so when multiple processes fail they can each be recovered by their recovery threads.

Theorem 1 shows that when one or more processes fail, each process can be recovered to the maximum recoverable state in the traditional sense. The recovery protocol above, however, allows fault-free processes to continue executing conditionally during fault recovery, so the correctness of the recovery method ultimately depends on whether the states of the recovered processes are consistent with the states of the fault-free processes.

Theorem 2. Under the action of the recovery threads, if fault-free processes continue executing conditionally, then the global state of the system after all failed processes have been recovered is a consistent global state.

Proof: As the recovery thread algorithm describes, if one or more processes fail, a process that has not failed stops and waits for the failed process to recover if it is about to send a message to the failed process; otherwise it continues executing.

Case 1. If a fault-free process sends messages to fault-free processes, then after the failed process is recovered the number of messages that the fault-free process has sent to the failed process has not changed, and no orphan message can exist between these processes.

Case 2. If a fault-free process stops at the event of sending a message to the failed process, then after the failed process is recovered the number of messages that the fault-free process has sent to the failed process has likewise not changed, and again no orphan message can exist between these processes.

Combining the two cases, by the definition of global state consistency, the global state of the system after the failed processes recover is a consistent global state.

After all failed processes have been recovered, however, several fault-free processes may send messages to the same recovered process. Because these sending processes send their messages to the recovered process through different send events, no happened-before relation [7] exists between the receive events for these messages at the receiving process, and these events can be executed in any order.

Comparison with existing message logging recovery methods:

Different message logging recovery protocols have different performance metrics. The following six metrics are used to evaluate the performance of a recovery protocol:

1. N.ckpts: the number of checkpoints saved by all processes.

2. INFOR.add: the amount of extra information carried by an application message.

3. DIS.rol: the process rollback distance.

4. N.roll: the number of processes that must roll back during recovery after k processes fail.

5. Rollback: whether fault-free processes roll back.

6. type: the protocol type.

In the backward recovery fault-tolerance protocol with a forward recovery feature (the forward-feature recovery protocol for short), the tentative checkpoint is deleted after the next heartbeat message, so each process only needs to keep one permanent checkpoint. The data carried by each application message are i, j and LCj, so INFOR.add is 3. After a process fails, the failed process only needs to roll back to its permanent checkpoint, so the rollback distance DIS.rol is 1. When k processes fail, fault-free processes do not need to roll back and only the k failed processes roll back, so N.roll is k. Because a process saves the message information before sending the message, no orphan process exists in the system at any time, so this protocol has the simple recovery algorithm of pessimistic protocols; and because the receiving process saves nothing to the message log after receiving a message, this protocol performs better than optimistic protocols during normal operation. As in the existing protocol [7], fault-free processes neither roll back nor stop and wait during the fault recovery stage but continue executing; this feature resembles forward recovery algorithms and gives the processes in the system higher operating efficiency.

In the message logging recovery protocol based on message number checking and message reordering (MNCMR) [5] [6], each process only needs to save one checkpoint asynchronously, so N.ckpts equals n. The data carried by each application message are j and LCj, so INFOR.add is 2. When one or more processes fail, only the failed processes roll back to their checkpoints, so DIS.rol is 1. The number of processes N.roll that must roll back during recovery equals the total number of failed processes.

Compared with the backward recovery fault-tolerance protocol with a forward recovery feature, the MNCMR protocol must save information to local storage before a process sends a message and must also save information to the message log after a process receives a message, which inevitably stores information twice. The backward recovery fault-tolerance protocol with a forward recovery feature saves information only before a message is sent, which both saves storage space and improves performance during normal execution.

Since the 1980s, a large number of message logging recovery protocols have been published in journals at home and abroad. Several typical protocols are selected below for comparison with the MNCMR protocol. Sistla and Welch [1] proposed two optimistic message-based logging recovery protocols. In one, each transmitted message carries a transitive dependency vector (this protocol is denoted Prasad.1); in the other, each application message carries only the current state interval value of the sending process (denoted Prasad.2). In the Prasad.1 protocol, the extra information required per application message is O(n), and O(n^2) system messages must be exchanged for each fault. In the Prasad.2 protocol, the extra information required per application message is O(1), and O(n^3) system messages must be exchanged for each fault. In the optimistic message logging protocol of Strom and Yemini [2], each application message sent carries a transitive dependency vector with n components, where n is the number of processes in the system. During fault-free execution, each process must periodically broadcast this transitive dependency vector or attach it to the messages it sends.

Table 1 gives the comparison between the protocol of the present invention and the protocols above.

Table 1

Compared with non-message-logging recovery protocols (for example, coordinated checkpoint recovery protocols), the main shortcoming of the backward recovery fault-tolerance protocol with a forward recovery feature is that the protocol must intervene in the sending and receiving of ordinary process messages, so the algorithm is not transparent. Compared with other message logging protocols, the forward-feature recovery protocol combines the advantages of pessimistic and optimistic protocols while discarding their shortcomings. The comparison is shown in Table 2.

Table 2

Main contribution of the present invention:

Fault flags give precise control over the sending and receiving of process application messages, so that fault-free processes continue executing conditionally during the failure recovery stage. This gives the system a forward recovery feature and brings the message logging recovery protocol algorithm to the most ideal level reached so far.

Description of the drawings:

Fig. 1 illustrates the randomness of the order in which messages are received; Fig. 2 illustrates the happened-before relation between message send and receive events; Fig. 3 illustrates computing logical clock values from the definition of the logical clock; Fig. 4 illustrates that the message send event S(mj) indirectly happens before the message receive event R(mi); Fig. 5 illustrates process rollback on a fault under a pessimistic logging protocol; Fig. 6 illustrates process rollback on a fault under an optimistic logging protocol; Fig. 7 illustrates conditional execution of fault-free processes where the failed process sends no further messages after being recovered; Fig. 8 illustrates conditional execution of fault-free processes where the failed process sends messages after being recovered; Fig. 9 is the technical scheme diagram of the present invention; Figure 10 illustrates processes sending and receiving messages when the system runs normally; Figure 11 illustrates the recovery of a failed process; Figure 12 is the operation flowchart of the ordinary process; Figure 13 is the operation flowchart of the recovery management process; Figure 14 is the operation flowchart of the state control process Control-i.

[1] Elnozahy E N, Alvisi L, Wang Yimin, et al. "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," ACM Computing Surveys, 2002, 34(3): 375-408.

[2] A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier and F. Magniette. "MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging," In Proc. of the 15th International Conference on High Performance Networking and Computing (SC2003), November 2003.

[3] D. B. Johnson and W. Zwaenepoel. "Sender-Based Message Logging," In Digest of Papers: 17th International Symposium on Fault-Tolerant Computing, pp. 14-19, 1987.

[4] J. Xu, R. B. Netzer and M. Mackey. "Sender-based message logging for reducing rollback propagation," In Proc. of the 7th International Symposium on Parallel and Distributed Processing, pp. 602-609, 1995.

[5] Gao Shengfa, Cai Jing, Feng Zhen. "Message logging recovery method based on message reordering and message number checking," People's Republic of China invention patent, Patent No. 201210239710.0, 2012.

[6] Jing Cai, Shengfa Gao. "Message Rearrange Theory in Message Recovery Protocol," 2013 5th International Conference on Computer Science and Information Technology (CSIT), pp. 293-297, 2013.

[7] Shengfa Gao. "Message Number Check and Message Rearranging Theory and Protocols," LAP LAMBERT Academic Publishing, 2014 (monograph).

[8] Schlichting, R. D. and Schneider, F. B. "Fail-stop processors: An approach to designing fault-tolerant computing systems," ACM Trans. Comput. Syst. 1, 3, 222–238, 1983.

Claims

1. A backward recovery fault-tolerance method with a forward recovery feature, characterized in that a message reordering method is adopted and the sending of messages is controlled by fault bits, so that fault-free processes execute conditionally when other processes fail; if both the sending and the receiving process are fault-free, the sending process, before sending a message, first stores the message and its corresponding logical clock in the message log in one operation and then sends the message; when a process fails, under the control of a recovery thread the replay messages and their corresponding logical clocks are first obtained from the message log, the replay messages are then re-sorted by their logical clocks, and finally the sorted messages are resent to the failed process, which receives and processes them again, thereby realizing the replay of the messages.

The send function SendM(i, j, LCi, m) of the ordinary process Pi required by the method operates in the following steps:

Step 1. If Fi=0, the sending process Pi is fault-free; go to 2. Otherwise Fi=1 and Pi has failed; go to 4.

Step 2. If Fj=0, the receiving process Pj is fault-free; go to 3. Otherwise Fj=1 and Pj has failed; go to 2 and wait for Pj to recover from the fault.

Step 3. With both the sending and the receiving process fault-free, increment the logical clock of the sending process: LCi ← LCi+1; append the information <i, j, LCi, m> to the end of the message log file, where i and j are the sending and receiving processes, LCi is the logical clock variable of the Pi process, and m is the payload of the message; go to step 5.

Step 4. Increment the logical clock of the sending process: LCi ← LCi+1; end the send procedure.

Step 5. Send the application message AM<i, j, LCi, m> to the receiving process Pj; end the send procedure.

2. The backward recovery fault-tolerance method with a forward recovery feature as claimed in claim 1, wherein the failed ordinary process is denoted Pk, k=1, 2, 3, ..., n, and the Recover(k) thread recovers Pk in the following steps:

Step 1. Send the Fk=1 message over the reliable channel to Control-i, i=1, 2, ..., n, notifying the other processes that the Pk process has failed.

Step 2. If the Tk vector sent by Control-k has been received, go to 3; otherwise go to 2 and wait for the Tk vector.

Step 3. Save the received Tk vector in memory.

Step 4. Access the log daemons to obtain Uik.

Step 5. Access the log daemons and obtain from the log file, in back-to-front order, the Uik-Tki messages that Pi sent to Pk.

Step 6. Sort all the messages in ascending order of the value of the LCk variable.

Step 7. Send the sorted messages to Pk, that is, replay the messages.

Step 8. Send the Fk=0 message over the reliable channel to Control-i, i=1, 2, ..., n, indicating that the Pk process has recovered from the fault; reset Fk ← 0; go to the end.