CN106357813B

CN106357813B - A task rescheduling method applied to a shared file system

Info

Publication number: CN106357813B
Application number: CN201610952589.4A
Authority: CN
Inventors: 陈军; 闫鹏飞
Original assignee: Long Yu Technology (beijing) Ltd By Share Ltd
Current assignee: Long Yu Technology (beijing) Ltd By Share Ltd
Priority date: 2016-11-02
Filing date: 2016-11-02
Publication date: 2019-08-06
Anticipated expiration: 2036-11-02
Also published as: CN106357813A

Abstract

The present invention provides a task rescheduling method applied to a shared file system, comprising the following steps: when a node fails, the non-failed nodes compete to preempt the failed node's task, and the node that wins the preemption takes over that task. The preemption is implemented as a rename operation on a file: the non-failed nodes attempt to rename the same file at the same time, and the node whose rename succeeds is the node that wins the preemption. The method is simple to implement, has no single point of failure, tolerates the simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system. It makes good use of the properties of a shared file system and solves the problem of rescheduling tasks when a node fails.

Description

A task rescheduling method applied to a shared file system

Technical field

The present invention relates to the field of communication technology, and in particular to a task rescheduling method applied to a shared file system.

Background art

Massive data processing is a typical application of distributed systems. In one class of such applications, the data has the following two characteristics:

(1) The data is stored in a shared file system, and every node of the system can access it through a client of the shared file system.

(2) The data shares a common attribute whose values fall within a fixed range, so the data can be stored in different files according to value range.

For this class of application, the data files for different value ranges are usually scheduled to different nodes for processing. While data is being processed, node failure and node recovery must be taken into account. How to reschedule a failed node's task to other nodes, and how a node takes its task back when it recovers, is a key factor in ensuring that tasks complete smoothly. The method proposed by the present invention addresses this problem.

At present, task rescheduling on node failure and node recovery is usually done in one of the following ways:

Centralized: a dedicated scheduling node is set up, and all other nodes are processing nodes whose state the scheduling node monitors. When a processing node fails, its task is reassigned to another healthy node; when the failed node returns to a healthy state, it takes its original task back. This approach is simple to implement, but the scheduling node can become a bottleneck and is a single point of failure.

Pairwise mutual backup: nodes are paired, and the two nodes of a pair back each other up. Suppose nodes A and B are paired: when node A fails, its task is handled by B, and node A takes its original task back when it recovers; failure and recovery of node B are handled in the same way. This approach is simple to implement and has no single point of failure, but if the paired node also fails, no node takes over the task.

Cluster: an upgraded version of pairwise mutual backup. When a node fails, the healthy nodes negotiate among themselves to elect one node to take over its task. This approach tolerates the simultaneous failure of multiple nodes, but it involves communication among multiple nodes and is more complex to implement.

Summary of the invention

To overcome the deficiencies of the prior art, the present invention provides a task rescheduling method applied to a shared file system. The method is simple to implement, has no single point of failure, tolerates the simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system.

The technical solution adopted by the present invention to solve this technical problem is a task rescheduling method applied to a shared file system, comprising the following steps:

When a node fails, the non-failed nodes compete to preempt the failed node's task, and the node that wins the preemption takes over the failed node's task.

Preferably, the preemption is implemented as a rename operation on a file: the non-failed nodes attempt to rename the same file at the same time, and the node whose rename succeeds is the node that wins the preemption.
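The rename-based preemption can be illustrated with a minimal sketch. It assumes the shared file system is mounted as an ordinary directory and that its rename is atomic, so that once one client's rename succeeds the source name no longer exists and every later rename of that source fails. The function name `try_preempt` and the directory layout are hypothetical and are not taken from the patent.

```python
import os
import tempfile


def try_preempt(directory: str, task: str, old_node: str, new_node: str) -> bool:
    """Try to take over `task` from `old_node` by renaming its marker file.

    Returns True if this caller's rename succeeded (it won the preemption),
    False if the source file was already gone (another node renamed it first).
    """
    src = os.path.join(directory, f"{task}-{old_node}")
    dst = os.path.join(directory, f"{task}-{new_node}")
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        return False


def demo() -> list:
    """Two healthy nodes, N2 and N3, compete for task T1 of failed node N1."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "T1-N1"), "w").close()
        first = try_preempt(d, "T1", "N1", "N2")   # N2 renames first
        second = try_preempt(d, "T1", "N1", "N3")  # N3 finds the source gone
        return [first, second, sorted(os.listdir(d))]
```

Because each competing node renames to a different destination name, the only thing the nodes contend for is the source name; whether a particular shared file system actually guarantees this atomicity across clients has to be checked for that file system.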

Preferably, the file is one of the files named "Ti-Nj" that are initially created in the shared file system, where Ti is the task number and Nj is the node number.

Preferably, a process runs on each node and, before the preemption, traverses all of the above files and checks the file name and modification time of each file to decide whether to preempt.

Preferably, for the node numbered k and a file named Ti-Nj, the step of deciding whether to preempt is specifically:

a. If i = k and j = k, the task is being handled by node k; node k updates the modification time of the file and continues processing the task;

b. If i = k and j ≠ k, the task originally belonged to node k, but node k failed at some point and the task was preempted and handled by another node; node k has now returned to a healthy state and performs the processing to take its task back;

c. If i ≠ k and j = k, the task originally belonged to node Ni and was successfully preempted by Nk when Ni failed; node k updates the modification time of the file and continues processing the task;

d. If i ≠ k and j ≠ k, the task is being handled by another node; node k determines whether Nj has failed and, if so, the other nodes compete to preempt the task.
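Cases a to d amount to a pure decision on the pair (i, j) relative to k, which can be sketched as follows. The names `Action`, `parse_name`, and `decide` are hypothetical, and the sketch assumes file names of the exact form `T<number>-N<number>`.

```python
import re
from enum import Enum


class Action(Enum):
    HEARTBEAT = "update modification time and keep processing"
    RECLAIM = "recovered owner takes its task back"
    CHECK_AND_PREEMPT = "preempt only if the current holder has failed"


_NAME = re.compile(r"^T(\d+)-N(\d+)$")


def parse_name(filename: str) -> tuple:
    """Split a marker file name 'Ti-Nj' into the integers (i, j)."""
    m = _NAME.match(filename)
    if m is None:
        raise ValueError(f"not a task-node file name: {filename!r}")
    return int(m.group(1)), int(m.group(2))


def decide(filename: str, k: int) -> Action:
    """Return what node k should do for the file named Ti-Nj."""
    i, j = parse_name(filename)
    if j == k:
        # Cases a and c: node k currently holds the task.
        return Action.HEARTBEAT
    if i == k:
        # Case b: the task belongs to node k but is held by node j.
        return Action.RECLAIM
    # Case d: the task belongs to and is held by other nodes.
    return Action.CHECK_AND_PREEMPT
```

Cases a and c collapse into one branch because in both the file already names node k as the holder, so the only work is refreshing the modification time.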

Preferably, the processing in step b by which the node takes its task back is specifically: check whether the time elapsed since the modification time of file Ti-Nj exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts the preemption: node Nk attempts to rename Ti-Nj to Ti-Nk and, if the rename succeeds, has won the preemption, updates the modification time of Ti-Nk, and processes task Ti. If it does not, node Nk must communicate with Nj and notify it to stop the current task, and then renames Ti-Nj to Ti-Nk; if the rename succeeds, Nk has won the preemption, updates the modification time of Ti-Nk, and processes task Ti.

Preferably, determining in step d whether Nj has failed and, if so, having the other nodes preempt the task is specifically: check whether the time elapsed since the modification time of file Ti-Nj exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts the preemption: node Nk attempts to rename Ti-Nj to Ti-Nk and, if the rename succeeds, has won the preemption, updates the modification time of Ti-Nk, and processes task Ti.
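The failure check based on modification time can be sketched as below. The helper names `is_failed` and `heartbeat` and the 60-second timeout are assumptions chosen for illustration; the patent does not specify a timeout value. The sketch also assumes that the clocks of the node and of the file server are close enough for the comparison to be meaningful.

```python
import os
import tempfile
import time
from typing import Optional


def is_failed(path: str, timeout_s: float, now: Optional[float] = None) -> bool:
    """Return True if the file was last modified more than timeout_s seconds ago."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime > timeout_s


def heartbeat(path: str) -> None:
    """Set the modification time to the current time to signal a live holder."""
    os.utime(path, None)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "T1-N1")
        open(path, "w").close()
        # Back-date the file by 120 s to imitate a holder that stopped updating it.
        old = time.time() - 120
        os.utime(path, (old, old))
        stale_before = is_failed(path, timeout_s=60)
        heartbeat(path)
        stale_after = is_failed(path, timeout_s=60)
        return stale_before, stale_after
```

The timeout has to be comfortably larger than the interval at which a healthy holder refreshes its file, otherwise a slow but live node would be treated as failed.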

Beneficial effects of the present invention: an implementation of cluster-style scheduling is proposed. The method does not require the nodes to communicate in order to elect a takeover node. Instead, the nodes compete to preempt the failed node's task, the node that wins the preemption takes over that task, and a node also preempts its task back when it recovers. The preemption needs a "referee" to decide who wins, and the server of the shared file system can serve as this referee. Preemption is implemented with a file rename operation: when multiple file system clients rename the same file at the same time, only one rename returns success, so only one node wins the preemption. Node failure can be detected from the file modification time. Exploiting the fact that file renames by multiple clients of a shared file system are mutually exclusive, the present invention proposes a task rescheduling method that uses rename operations to preempt tasks when a node fails. The method is simple to implement, has no single point of failure, tolerates the simultaneous failure of multiple nodes, distributes tasks well across the nodes of a distributed system, and makes good use of the properties of a shared file system to solve the problem of rescheduling tasks when a node fails.

Brief description of the drawings

Fig. 1 is a flow diagram of the present invention.

Detailed description of the embodiments

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawing.

Referring to Fig. 1, the preferred embodiment of the present invention provides a task rescheduling method applied to a shared file system. The method does not require the nodes to communicate in order to elect a takeover node. Instead, the nodes compete to preempt the failed node's task, the node that wins the preemption takes over that task, and a node also preempts its task back when it recovers. Preemption is implemented with a file rename operation: when multiple nodes rename the same file at the same time, only one rename returns success, so only one node wins the preemption. Node failure can be detected from the file modification time. The specific implementation is as follows:

Assume that the number of nodes and the number of tasks (data files) are both n, with nodes numbered N1 to Nn and tasks numbered T1 to Tn. When all nodes are healthy, each node processes one task, i.e. node Ni processes task Ti. A task Ti and a node Nj form a "task-node" pair, written Ti-Nj, which indicates that task i is handled by node j. Initially, n files named "Ti-Nj" are created in the shared file system to represent the n "task-node" pairs, i.e. T1-N1, T2-N2, ..., Tn-Nn, indicating that Ni processes the task Ti with the same number and that Ti belongs to Ni.
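The initial state described above could be set up as in the following sketch, which creates the n empty marker files T1-N1 to Tn-Nn in a directory standing in for the shared file system. The function name `create_initial_markers` is hypothetical, and using exclusive-create mode is an added assumption so that rerunning the setup does not silently recreate a marker that already exists.

```python
import os
import tempfile


def create_initial_markers(directory: str, n: int) -> list:
    """Create empty files T1-N1 ... Tn-Nn and return their names in order."""
    names = [f"T{i}-N{i}" for i in range(1, n + 1)]
    for name in names:
        # Mode "x" raises FileExistsError if the marker is already present.
        with open(os.path.join(directory, name), "x"):
            pass
    return names


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        created = create_initial_markers(d, 3)
        return created, sorted(os.listdir(d))
```

The files carry no content; the state of the system is encoded entirely in the file names and modification times.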

A process runs on each node, traverses these n files, and checks the file name and modification time of each. For the node numbered k and a file named Ti-Nj, the following decision is made:

1) If i = k and j = k, the task is being handled by this node, so the node only needs to update the modification time of the file and continue processing the task.

2) If i = k and j ≠ k, the task originally belonged to this node, but this node failed at some point and the task was preempted and handled by another node. This node has now returned to a healthy state and needs to take the task back. Check whether the time elapsed since the modification time of file Ti-Nj exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts the preemption: node Nk attempts to rename Ti-Nj to Ti-Nk and, if the rename succeeds, has won the preemption, updates the modification time of Ti-Nk, and processes task Ti. If it does not, node Nk must communicate with Nj and notify it to stop the current task, and then renames Ti-Nj to Ti-Nk; if the rename succeeds, Nk has won the preemption, updates the modification time of Ti-Nk, and processes task Ti.

3) If i ≠ k and j = k, the task originally belonged to node Ni and was successfully preempted by Nk when Ni failed, so the node only needs to update the modification time of the file and continue processing the task.

4) If i ≠ k and j ≠ k, the task is being handled by another node, and this node needs to determine whether Nj has failed. Check whether the time elapsed since the modification time of file Ti-Nj exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts the preemption: node Nk attempts to rename Ti-Nj to Ti-Nk and, if the rename succeeds, has won the preemption, updates the modification time of Ti-Nk, and processes task Ti.
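Cases 1) to 4) can be combined into one scanning pass per node, sketched below under stated assumptions: the shared file system is mounted as a local directory, exactly one marker file exists per task, and the way Nk notifies Nj to stop is abstracted as a caller-supplied callback `notify_stop`, since the patent does not specify that communication. The name `scan_once` and the timeout value are hypothetical.

```python
import os
import re
import tempfile
import time
from typing import Callable, List

NAME = re.compile(r"^T(\d+)-N(\d+)$")


def scan_once(directory: str, k: int, timeout_s: float,
              notify_stop: Callable[[int, int], None] = lambda j, i: None) -> List[int]:
    """Run one pass of node k over all marker files; return the tasks it now holds."""
    held = []
    for name in sorted(os.listdir(directory)):
        m = NAME.match(name)
        if m is None:
            continue
        i, j = int(m.group(1)), int(m.group(2))
        path = os.path.join(directory, name)
        if j == k:
            # Cases 1 and 3: node k already holds the task, refresh its timestamp.
            os.utime(path, None)
            held.append(i)
            continue
        try:
            stale = time.time() - os.stat(path).st_mtime > timeout_s
        except FileNotFoundError:
            continue  # another node renamed the file after it was listed
        if i == k and not stale:
            # Case 2, holder still alive: ask it to stop before taking the task back.
            notify_stop(j, i)
        elif not stale:
            # Case 4, holder still alive: nothing to do.
            continue
        target = os.path.join(directory, f"T{i}-N{k}")
        try:
            os.rename(path, target)
        except FileNotFoundError:
            continue  # lost the preemption to another node
        os.utime(target, None)  # rename keeps the old timestamp, so refresh it
        held.append(i)
    return held


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        for name in ("T1-N1", "T2-N2", "T3-N1"):
            open(os.path.join(d, name), "w").close()
        old = time.time() - 120
        os.utime(os.path.join(d, "T2-N2"), (old, old))  # N2 looks failed
        stops = []
        held = scan_once(d, k=3, timeout_s=60,
                         notify_stop=lambda j, i: stops.append((j, i)))
        return held, stops, sorted(os.listdir(d))
```

In the demo, node 3 leaves T1-N1 alone, preempts task 2 from the stale N2, and takes its own task 3 back from the live N1 after notifying it. A real deployment would run this pass periodically on every node.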

The above covers the case where the number of tasks equals the number of nodes. Usually the number of tasks greatly exceeds the number of nodes; in that case the tasks can be hashed according to some rule, and tasks with the same hash value are scheduled to the same node for processing.
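One possible reading of this hashing step is sketched below. The patent only says that tasks are hashed "according to some rule"; the use of CRC32 and reduction modulo the number of nodes are assumptions made for illustration, as are the names `bucket_for_task` and `group_tasks`. Each resulting bucket number can then play the role of the task number Ti in the scheme above.

```python
import zlib


def bucket_for_task(task_key: str, n_nodes: int) -> int:
    """Map a task key to a bucket number in 1..n_nodes using CRC32."""
    return zlib.crc32(task_key.encode("utf-8")) % n_nodes + 1


def group_tasks(task_keys: list, n_nodes: int) -> dict:
    """Group task keys by bucket; every bucket 1..n_nodes appears as a key."""
    groups = {b: [] for b in range(1, n_nodes + 1)}
    for key in task_keys:
        groups[bucket_for_task(key, n_nodes)].append(key)
    return groups
```

A deterministic hash is used rather than Python's built-in `hash`, whose value for strings varies between processes, so that every node computes the same bucket for the same task.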

In summary, the method of the present invention makes good use of the properties of a shared file system and thereby solves the problem of rescheduling tasks when a node fails.

The above is merely a preferred embodiment of the present invention. It should be understood that the description of the above embodiment is intended only to aid understanding of the method of the present invention and its core idea and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or the like made within the idea and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A task rescheduling method applied to a shared file system, characterized by comprising the following steps: when a node fails, the non-failed nodes compete to preempt the failed node's task, and the node that wins the preemption takes over the failed node's task;

the preemption is implemented as a rename operation on a file, i.e. the non-failed nodes attempt to rename the same file at the same time, and the node whose rename succeeds is the node that wins the preemption.

2. The task rescheduling method applied to a shared file system according to claim 1, characterized in that: the file is one of the files named "Ti-Nj" that are initially created in the shared file system, where Ti is the task number and Nj is the node number.

3. The task rescheduling method applied to a shared file system according to claim 2, characterized in that: a process runs on each node and, before the preemption, traverses all of the files and checks the file name and modification time of each file to decide whether to preempt.

4. The task rescheduling method applied to a shared file system according to claim 3, characterized in that: for the node numbered k and a file named Ti-Nj, the step of deciding whether to preempt is specifically:

a. If i = k and j = k, the task is being handled by node k; node k updates the modification time of the file and continues processing the task;

b. If i = k and j ≠ k, the task originally belonged to node k, but node k failed at some point and the task was preempted and handled by another node; node k has now returned to a healthy state and performs the processing to take its task back;

c. If i ≠ k and j = k, the task originally belonged to node Ni and was successfully preempted by Nk when Ni failed; node k updates the modification time of the file and continues processing the task;

d. If i ≠ k and j ≠ k, the task is being handled by another node; node k determines whether Nj has failed and, if so, the other nodes compete to preempt the task.

5. The task rescheduling method applied to a shared file system according to claim 4, characterized in that: the processing in step b by which the node takes its task back is specifically: check whether the time elapsed since the modification time of file Ti-Nj exceeds the failure timeout; if it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts the preemption: node Nk attempts to rename Ti-Nj to Ti-Nk and, if the rename succeeds, has won the preemption, updates the modification time of Ti-Nk, and processes task Ti; if it does not, node Nk must communicate with Nj and notify it to stop the current task, and then renames Ti-Nj to Ti-Nk; if the rename succeeds, Nk has won the preemption, updates the modification time of Ti-Nk, and processes task Ti.

6. The task rescheduling method applied to a shared file system according to claim 4, characterized in that: determining in step d whether Nj has failed and, if so, having the other nodes preempt the task is specifically: check whether the time elapsed since the modification time of file Ti-Nj exceeds the failure timeout; if it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts the preemption: node Nk attempts to rename Ti-Nj to Ti-Nk and, if the rename succeeds, has won the preemption, updates the modification time of Ti-Nk, and processes task Ti.