CN106357813B - Task rescheduling method applied to a shared file system - Google Patents

Task rescheduling method applied to a shared file system

Info

Publication number
CN106357813B
Authority
CN
China
Prior art keywords
node
task
file
failure
shared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610952589.4A
Other languages
Chinese (zh)
Other versions
CN106357813A (en)
Inventor
陈军
闫鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Long Yu Technology (beijing) Ltd By Share Ltd
Original Assignee
Long Yu Technology (beijing) Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Long Yu Technology (beijing) Ltd By Share Ltd filed Critical Long Yu Technology (beijing) Ltd By Share Ltd
Priority to CN201610952589.4A priority Critical patent/CN106357813B/en
Publication of CN106357813A publication Critical patent/CN106357813A/en
Publication of CN106357813B publication Critical patent/CN106357813B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a task rescheduling method applied to a shared file system, comprising the following step: when a node fails, the non-failed nodes compete to preempt the failed node's tasks, and the node that preempts successfully takes over those tasks. The preemption operation is implemented as a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the one node whose rename succeeds is the node that has preempted successfully. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks across the nodes of a distributed system. It makes good use of the properties of a shared file system to solve the problem of rescheduling tasks when a node fails.

Description

Task rescheduling method applied to a shared file system
Technical field
The present invention relates to the field of communication technology, and in particular to a task rescheduling method applied to a shared file system.
Background
Mass data processing is a typical application of distributed systems. In one class of such applications, the data has the following two characteristics:
(1) It is stored in a shared file system, and every node in the system can access the data through a shared file system client.
(2) The data share a common attribute whose values fall within a fixed range, so the data can be stored in different files according to value range.
Given these characteristics, the data files for different value ranges are usually scheduled to different nodes for processing. While the data is being processed, node failure and node recovery must be taken into account. How to reschedule the tasks of a failed node to other nodes, and how a node takes its tasks back when it recovers, is a key factor in ensuring that tasks complete smoothly. The method proposed by the present invention addresses this problem.
At present, task rescheduling on node failure and node recovery usually follows one of the following approaches:
Centralized: a dedicated scheduling node is set up and all other nodes are processing nodes. The scheduling node monitors the state of the processing nodes. When a processing node fails, its tasks are reassigned to other healthy nodes; when the failed node returns to a healthy state, it takes over its original tasks again. This approach is simple to implement, but the scheduling node can become a bottleneck and is a single point of failure.
Pairwise mutual backup: nodes are paired, and the two nodes in a pair act as primary and backup for each other. Suppose nodes A and B are paired: when A fails, its tasks are handled by B, and A takes over its original tasks when it recovers. Failure and recovery of B are handled the same way. This approach is simple to implement and has no single point of failure, but if the paired node also fails, no node takes over the tasks.
Cluster: an upgraded version of pairwise mutual backup. When a node fails, the healthy nodes negotiate among themselves to select one node to take over its tasks. This approach tolerates simultaneous failure of multiple nodes, but it requires communication among multiple nodes and is more complex to implement.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a task rescheduling method applied to a shared file system. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks across the nodes of a distributed system.
The technical solution adopted by the present invention to solve this technical problem is a task rescheduling method applied to a shared file system, comprising the following step:
When a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node that preempts successfully takes over the tasks of the failed node.
Preferably, the preemption operation is implemented as a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node that has preempted successfully.
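The rename-based preemption can be illustrated with a minimal sketch. It simulates three surviving nodes as threads racing to rename the same file in a local temporary directory; the names `try_preempt` and `race` are hypothetical and not taken from the patent, and whether a given shared file system offers the same exclusivity for concurrent renames is an assumption that should be verified for that file system.

```python
import os
import tempfile
import threading

def try_preempt(directory, old_name, new_name):
    """Attempt the rename; return True only if this caller performed it."""
    try:
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))
        return True
    except FileNotFoundError:
        # Another contender already renamed the file, so the source is gone.
        return False

def race(directory, old_name, contenders):
    """Let several simulated nodes rename the same file; return the winners."""
    winners = []
    lock = threading.Lock()
    task = old_name.split("-")[0]

    def contend(node):
        if try_preempt(directory, old_name, f"{task}-{node}"):
            with lock:
                winners.append(node)

    threads = [threading.Thread(target=contend, args=(n,)) for n in contenders]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "T1-N1"), "w").close()   # task T1 held by failed node N1
    winners = race(d, "T1-N1", ["N2", "N3", "N4"])
    print("winner:", winners, "files:", os.listdir(d))
```

Each contender renames to a different target name, so a losing contender finds that the source name no longer exists and treats that as a lost preemption.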
Preferably, the file is one of the files named "Ti-Nj" that are created in the shared file system at initialization, where Ti is the task number and Nj is the node number.
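As a small illustration of the "Ti-Nj" naming convention, the following sketch builds and parses such names. The helper names `make_name` and `parse_name`, and the choice of plain decimal integers for i and j, are assumptions made for the example.

```python
import re

# "T<task number>-N<node number>", e.g. "T3-N7" (decimal numbering is assumed).
_NAME = re.compile(r"T(\d+)-N(\d+)")

def make_name(task, node):
    """Build the file name recording that `task` is held by `node`."""
    return f"T{task}-N{node}"

def parse_name(name):
    """Return (task, node) for a well-formed name, or None otherwise."""
    m = _NAME.fullmatch(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(make_name(3, 7), parse_name("T3-N7"), parse_name("notes.txt"))
```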
Preferably, a process runs on each node; before the preemption operation, it traverses all of the above files and checks the file name and modification time of each file to decide whether to perform a preemption operation.
Preferably, for the node numbered k and a file named Ti-Nj, the step of deciding whether to perform a preemption operation is specifically:
a. If i = k and j = k, the task is currently being processed by node k; update the file's modification time and continue processing the task;
b. If i = k and j ≠ k, the task should be processed by node k, but node k failed at some point and the task was preempted by another node; node k has now returned to a healthy state and performs the procedure for taking the task back;
c. If i ≠ k and j = k, the task should be processed by node Ni, and Nk preempted it successfully when Ni failed; update the file's modification time and continue processing the task;
d. If i ≠ k and j ≠ k, the task is being processed by another node; determine whether Nj has failed and, if so, the other nodes perform a preemption operation.
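The four cases a to d depend only on how i and j compare with k, so they can be sketched as a pure function. The function name `classify` and the returned labels are illustrative choices, not terms from the patent.

```python
def classify(i, j, k):
    """Decide what node k should do for file Ti-Nj (cases a to d)."""
    if i == k and j == k:
        return "a: own task, held by self; refresh modification time"
    if i == k and j != k:
        return "b: own task, held by another node; take it back"
    if i != k and j == k:
        return "c: another node's task, held by self; refresh modification time"
    return "d: held by another node; preempt only if that node has failed"

# Node 2 looking at four different files.
for i, j in [(2, 2), (2, 5), (4, 2), (4, 5)]:
    print(f"T{i}-N{j}:", classify(i, j, 2))
```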
Preferably, the procedure in step b for taking the task back is specifically: check whether the time elapsed between the modification time of file Ti-Nj and the current time exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts preemption: node Nk tries to rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti. If it does not, Nk must communicate with Nj and notify Nj to stop the current task, then rename Ti-Nj to Ti-Nk; if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti.
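The take-back procedure of step b can be sketched as follows. The patent does not say how Nk notifies Nj, so the `notify_stop` callback is a hypothetical stand-in for that communication; the function name `reclaim` and its parameters are likewise assumptions.

```python
import os
import tempfile
import time

def reclaim(directory, task, holder, me, timeout, notify_stop, now=None):
    """Node `me` tries to take task `task` back from node `holder`."""
    old = os.path.join(directory, f"T{task}-N{holder}")
    new = os.path.join(directory, f"T{task}-N{me}")
    now = time.time() if now is None else now
    try:
        stale = now - os.path.getmtime(old) > timeout
    except FileNotFoundError:
        return False                 # file was already renamed by someone else
    if not stale:
        notify_stop(holder, task)    # holder is alive: ask it to stop first
    try:
        os.rename(old, new)          # the preemption itself
    except FileNotFoundError:
        return False                 # lost the race to another node
    os.utime(new, None)              # refresh modification time to "now"
    return True

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "T2-N5"), "w").close()   # T2 currently held by N5
    stopped = []
    ok = reclaim(d, task=2, holder=5, me=2, timeout=60,
                 notify_stop=lambda holder, task: stopped.append((holder, task)))
    print(ok, stopped, os.listdir(d))
```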
Preferably, determining in step d whether Nj has failed, and performing the preemption operation if it has, is specifically: check whether the time elapsed between the modification time of file Ti-Nj and the current time exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts preemption: node Nk tries to rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti.
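Failure detection from the modification time amounts to a heartbeat check. The sketch below assumes a failure timeout expressed in seconds and compares the file's modification time against the local clock, which in turn assumes that node clocks and file system timestamps are roughly synchronized; the function names are illustrative.

```python
import os
import tempfile
import time

def heartbeat(path):
    """Refresh the file's modification time to the current time."""
    os.utime(path, None)

def holder_failed(path, timeout, now=None):
    """True if the file has not been refreshed within `timeout` seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > timeout

def preempt_if_failed(directory, task, holder, me, timeout, now=None):
    """Step d: rename Ti-Nj to Ti-Nk only when the holder looks failed."""
    old = os.path.join(directory, f"T{task}-N{holder}")
    new = os.path.join(directory, f"T{task}-N{me}")
    try:
        if not holder_failed(old, timeout, now):
            return False             # holder is healthy, leave the task alone
        os.rename(old, new)
    except FileNotFoundError:
        return False                 # another node preempted first
    heartbeat(new)
    return True

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "T4-N4")
    open(path, "w").close()
    two_minutes_ago = time.time() - 120
    os.utime(path, (two_minutes_ago, two_minutes_ago))  # simulate a silent node
    print(preempt_if_failed(d, task=4, holder=4, me=7, timeout=30), os.listdir(d))
```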
Positive effects of the present invention: this document proposes an implementation of cluster-style scheduling. The method does not require the nodes to communicate with one another to elect a takeover node. Instead, the nodes compete to preempt the failed node's tasks, the node that preempts successfully takes over those tasks, and a node also preempts its tasks back when it recovers. The preemption process needs a "referee" to decide who wins, and the server of the shared file system can play exactly this role. The preemption operation is implemented with a file rename: when multiple file system clients rename the same file at the same time, only one of them returns success, so only one node preempts successfully. Node failure can be detected from the file's modification time. The present invention exploits the fact that renaming a file from multiple clients of a shared file system is mutually exclusive, and proposes a task rescheduling method in which tasks are preempted by rename operations when a node fails. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, distributes tasks across the nodes of the distributed system, and makes good use of the properties of a shared file system to solve the problem of rescheduling tasks when a node fails.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention.
Detailed description of the embodiments
A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawing.
Referring to Fig. 1, the preferred embodiment of the present invention provides a task rescheduling method applied to a shared file system. The method does not require the nodes to communicate with one another to elect a takeover node. Instead, the nodes compete to preempt the failed node's tasks, the node that preempts successfully takes over those tasks, and a node also preempts its tasks back when it recovers. The preemption operation is implemented with a file rename: when multiple nodes rename the same file at the same time, only one of them returns success, so only one node preempts successfully. Node failure can be detected from the file's modification time. The specific implementation is as follows:
Assume that the number of nodes and the number of tasks (the number of data files) are both n, with nodes numbered N1 to Nn and tasks numbered T1 to Tn. When all nodes are healthy, each node processes one task: node Ni processes task Ti. A task Ti and a node Nj form a "task-node" pair, written Ti-Nj, meaning that task i is processed by node j. At initialization, n files named in the form "Ti-Nj" are created in the shared file system to represent the n task-node pairs T1-N1, T2-N2, ..., Tn-Nn, indicating that Ni processes the task Ti with the same number, i.e. Ti belongs to Ni.
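The initial state described above can be set up with a few lines. This is a sketch that assumes the files are empty marker files and that initialization runs once, before any preemption has happened; `create_initial_files` is a hypothetical name.

```python
import os
import tempfile

def create_initial_files(directory, n):
    """Create the n marker files T1-N1 ... Tn-Nn and return their names."""
    names = []
    for i in range(1, n + 1):
        name = f"T{i}-N{i}"
        try:
            # Mode "x" creates the file and fails if it already exists.
            with open(os.path.join(directory, name), "x"):
                pass
        except FileExistsError:
            pass  # already created by an earlier run
        names.append(name)
    return names

with tempfile.TemporaryDirectory() as d:
    create_initial_files(d, 3)
    print(sorted(os.listdir(d)))
```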
A process runs on each node, traverses these n files, and checks the name and modification time of each file. For the node numbered k and a file named Ti-Nj, the following determination is made:
1) If i = k and j = k, the task is being processed by this node, so it only needs to update the file's modification time and continue processing the task.
2) If i = k and j ≠ k, the task should be processed by this node, but this node failed at some point and the task was preempted by another node. This node has now returned to a healthy state and needs to take the task back. Check whether the time elapsed between the modification time of file Ti-Nj and the current time exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts preemption: node Nk tries to rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti. If it does not, Nk must communicate with Nj and notify Nj to stop the current task, then rename Ti-Nj to Ti-Nk; if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti.
3) If i ≠ k and j = k, the task should be processed by node Ni, and Nk preempted it successfully when Ni failed, so this node only needs to update the file's modification time and continue processing the task.
4) If i ≠ k and j ≠ k, the task is being processed by another node, and this node needs to determine whether Nj has failed. Check whether the time elapsed between the modification time of file Ti-Nj and the current time exceeds the failure timeout. If it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts preemption: node Nk tries to rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti.
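Putting the four cases together, one pass of the per-node process might look like the sketch below. It is an illustration under stated assumptions, not the patent's implementation: the timeout is in seconds, `notify_stop` is a hypothetical hook for the notification in case 2, and the actual task processing is left out (the function only returns the task numbers the node holds after the pass).

```python
import os
import re
import tempfile
import time

NAME = re.compile(r"T(\d+)-N(\d+)")

def scan_once(directory, k, timeout, notify_stop=lambda holder, task: None):
    """One pass of node k over the task files; returns the tasks k now holds."""
    held = []
    for name in sorted(os.listdir(directory)):
        m = NAME.fullmatch(name)
        if m is None:
            continue
        i, j = int(m.group(1)), int(m.group(2))
        path = os.path.join(directory, name)
        try:
            if j == k:
                os.utime(path, None)          # cases 1 and 3: heartbeat
                held.append(i)
                continue
            stale = time.time() - os.path.getmtime(path) > timeout
            if i == k and not stale:
                notify_stop(j, i)             # case 2, holder still alive
            elif not stale:
                continue                      # case 4, holder healthy
            target = os.path.join(directory, f"T{i}-N{k}")
            os.rename(path, target)           # preemption attempt
            os.utime(target, None)
            held.append(i)
        except FileNotFoundError:
            continue                          # another node renamed it first
    return held

with tempfile.TemporaryDirectory() as d:
    for n in (1, 2, 3):
        open(os.path.join(d, f"T{n}-N{n}"), "w").close()
    long_ago = time.time() - 300
    os.utime(os.path.join(d, "T2-N2"), (long_ago, long_ago))  # N2 went silent
    print(scan_once(d, k=1, timeout=30), sorted(os.listdir(d)))
```

In a running system each node would call such a pass periodically, with a period comfortably shorter than the failure timeout so that a healthy node's files never look stale.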
The above covers the case where the number of tasks equals the number of nodes. Usually the number of tasks greatly exceeds the number of nodes; in that case the tasks can be hashed according to some rule, and tasks with the same hash value are scheduled to the same node for processing.
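The patent leaves the hashing rule open. As one possible instance, the sketch below maps a task identifier to a node number with SHA-256 reduced modulo the node count; the choice of hash, the function names, and the 1-based node numbering are all assumptions. A cryptographic digest is used here only because it is stable across processes, unlike Python's built-in `hash` for strings.

```python
import hashlib
from collections import defaultdict

def home_node(task_id, num_nodes):
    """Map a task identifier to a node number in 1..num_nodes."""
    digest = hashlib.sha256(str(task_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes + 1

def group_tasks(task_ids, num_nodes):
    """Group tasks by home node; tasks in one group are scheduled together."""
    groups = defaultdict(list)
    for task_id in task_ids:
        groups[home_node(task_id, num_nodes)].append(task_id)
    return dict(groups)

for node, tasks in sorted(group_tasks(range(12), 4).items()):
    print(f"N{node}:", tasks)
```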
In conclusion, the method of the present invention makes good use of the properties of a shared file system and thereby solves the problem of rescheduling tasks when a node fails.
The above is merely a preferred embodiment of the present invention. It should be understood that the description of the embodiment is intended only to aid understanding of the method of the present invention and its core idea, and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. A task rescheduling method applied to a shared file system, characterized by comprising the following step: when a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node that preempts successfully takes over the tasks of the failed node;
the preemption operation is implemented as a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node that has preempted successfully.
2. The task rescheduling method applied to a shared file system according to claim 1, characterized in that: the file is one of the files named "Ti-Nj" that are created in the shared file system at initialization, where Ti is the task number and Nj is the node number.
3. The task rescheduling method applied to a shared file system according to claim 2, characterized in that: a process runs on each node; before the preemption operation, it traverses all of the files and checks the file name and modification time of each file to decide whether to perform a preemption operation.
4. The task rescheduling method applied to a shared file system according to claim 3, characterized in that: for the node numbered k and a file named Ti-Nj, the step of deciding whether to perform a preemption operation is specifically:
a. If i = k and j = k, the task is currently being processed by node k; update the file's modification time and continue processing the task;
b. If i = k and j ≠ k, the task should be processed by node k, but node k failed at some point and the task was preempted by another node; node k has now returned to a healthy state and performs the procedure for taking the task back;
c. If i ≠ k and j = k, the task should be processed by node Ni, and Nk preempted it successfully when Ni failed; update the file's modification time and continue processing the task;
d. If i ≠ k and j ≠ k, the task is being processed by another node; determine whether Nj has failed and, if so, the other nodes perform a preemption operation.
5. The task rescheduling method applied to a shared file system according to claim 4, characterized in that: the procedure in step b for taking the task back is specifically: check whether the time elapsed between the modification time of file Ti-Nj and the current time exceeds the failure timeout; if it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts preemption: node Nk tries to rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti; if it does not, Nk must communicate with Nj and notify Nj to stop the current task, then rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti.
6. The task rescheduling method applied to a shared file system according to claim 4, characterized in that: determining in step d whether Nj has failed, and performing the preemption operation if it has, is specifically: check whether the time elapsed between the modification time of file Ti-Nj and the current time exceeds the failure timeout; if it does, node Nj has failed, and every healthy node that detects the failure of Nj attempts preemption: node Nk tries to rename Ti-Nj to Ti-Nk, and if the rename succeeds, the preemption has succeeded, so Nk updates the modification time of Ti-Nk and processes task Ti.
CN201610952589.4A 2016-11-02 2016-11-02 Task rescheduling method applied to a shared file system Active CN106357813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610952589.4A CN106357813B (en) 2016-11-02 2016-11-02 Task rescheduling method applied to a shared file system

Publications (2)

Publication Number Publication Date
CN106357813A (en) 2017-01-25
CN106357813B (en) 2019-08-06

Family

ID=57863582

Country Status (1)

Country Link
CN (1) CN106357813B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant