CN107172149A - Big data instant scheduling method - Google Patents

Big data instant scheduling method

Info

Publication number
CN107172149A
CN107172149A
Authority
CN
China
Prior art keywords
task
load
node
host node
burst
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710344085.9A
Other languages
Chinese (zh)
Inventor
赖真霖
文君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sixiang Lianchuang Technology Co Ltd
Original Assignee
Chengdu Sixiang Lianchuang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sixiang Lianchuang Technology Co Ltd
Priority to CN201710344085.9A
Publication of CN107172149A
Legal status: Pending (current)

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06 Management of faults, events, alarms or notifications
    • H04L 41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L 41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004 Server selection for load balancing
    • H04L 67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L 67/61 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
    • H04L 67/63 Routing a service request depending on the request content or context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a big data instant scheduling method. The method comprises: a host node divides a job into independent tasks, and load nodes apply to the host node for tasks and execute them; the host node performs first-level sharding on the job and forwards to a load node the execution information of all the second-level shards contained in a first-level shard; the load node memory-maps the shard, performs second-level sharding on it, and places the resulting shards into a thread pool; each thread of a load node performs a reduce after completing a second-level task shard and sends the result to the host node, which performs a reduce operation over the results returned by the load nodes; load processes meeting the requirements of the host list file are allocated, tasks are distributed to load nodes through map processes, and the corresponding task results are collected from the load nodes through reduce processes; the intermediate results are merged to obtain the final result, and the host node feeds the final job result back directly to the corresponding task submitter over a socket connection. The invention proposes a big data instant scheduling method that improves the efficiency of cloud computing and meets the needs of high-performance cloud computing.

Description

Big data instant scheduling method
Technical field
The present invention relates to big data, and more particularly to a big data instant scheduling method.
Background technology
Cloud computing stores huge volumes of data and distributes the programs that process them across the computers of a cluster, delivering the corresponding application services over the Internet. When a user submits a request for a resource, the system automatically routes the request to the computers and storage systems that actually hold that resource. Cloud computing platforms built on virtualization technology have achieved good results in mass data processing. However, cloud computing processes mass data in parallel across large clusters, and because the bottom layer of current mainstream cloud platforms uses virtualization, all software and applications run on virtual hardware, a strategy that inevitably degrades performance to some degree. Moreover, the internal mechanism of MapReduce is to store intermediate data first and then read it back for further processing; as the volume and count of intermediate records grow, this pattern inevitably produces a large number of useless disk I/O operations. If the data are remote, network load increases; if the data are local, execution is limited by the I/O bottleneck, reducing the efficiency of task execution.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a big data instant scheduling method, comprising:
The host node divides the job into independent tasks and maintains a task status array. After a load node starts, it applies to the host node for a task, and once a task is obtained it executes it according to a predetermined scheme; when execution completes, it feeds the task result back to the host node and applies for the next task. If the task information a load node obtains is invalid, this parallel computing task has completed successfully and the load node terminates. The job with the highest priority is taken from the task queue. The host node shards the job into first-level shards and places them in the load queue. The host node queries the state table for the information of a first-level shard, then forwards the execution information of all the second-level shards in this shard to a load node. On receiving the task, the load node memory-maps the shard and performs second-level sharding, detects the total number of cores, builds a thread pool, and places the shards into the pool; threads obtain second-level task shards through the load-internal state table until every entry in the table has been executed. Each thread of the load node performs a reduce after completing a second-level task shard and sends the reduce result together with the internal state table to the host node; the host node decides whether to send this load another shard, and if a new task must be sent, continues taking tasks and sending them to the load. Once every entry in the state table has finished executing, the host node sends an exit instruction to each load; the load triggers an event signal to notify the monitor that it is waiting to exit, and all loads and the host node then exit together. The host node performs a reduce operation over the results sent by the loads. The host node analyzes the incoming command-line parameters to obtain the information about this parallel-interface job. According to the host list file it dynamically allocates multiple load nodes and forms them into a communication domain, then waits for load nodes to apply for tasks or to submit task results. Based on the dynamic process creation and management model of the parallel computing interface, load processes meeting the requirements of the host list file are allocated; the host node distributes tasks to load nodes through map processes and obtains the corresponding task results from load nodes through reduce processes. These intermediate results are merged to obtain the final result of this parallel computing job, which is finally fed back directly to the task submitter. Once the host node detects that some load node is abnormal, it produces a new host list file and allocates a new load node to take over the abnormal one and execute its tasks. If the host node finds that every entry in the task status array is marked complete, it feeds the final job result back directly to the corresponding task submitter over a socket connection.
Compared with the prior art, the present invention has the following advantages:
The present invention proposes a big data instant scheduling method that improves the efficiency of cloud computing to meet the needs of high-performance cloud computing.
Brief description of the drawings
Fig. 1 is a flow chart of the big data instant scheduling method according to an embodiment of the present invention.
Detailed description of the embodiments
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but it is not restricted to any particular embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention; these details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of them.
One aspect of the present invention provides a big data instant scheduling method. Fig. 1 is a flow chart of the big data instant scheduling method according to an embodiment of the present invention.
The high-performance cloud computing platform designed by the present invention builds the platform bottom layer directly from heterogeneous computing nodes, without virtualization. MapReduce is rewritten using a parallel computing interface with added multi-level fault tolerance and multithreading, so that a large number of useless I/O operations are avoided during computation, thereby improving the efficiency of cloud computing to meet the needs of high-performance cloud computing. In a heterogeneous-node environment, based on HADOOP, the cloud infrastructure service layer is built on heterogeneous hardware, implementing job re-scheduling, job/task rollback and dynamic migration, and the MapReduce framework is built on the multi-level fault-tolerant parallel computing interface. Intermediate results are processed directly, reducing unnecessary I/O operations.
The HADOOP-based cloud computing platform architecture of the present invention comprises heterogeneous computing nodes, a fault-tolerance unit based on the parallel computing interface, a monitoring module, a job management module, a task management module, a distributed database, and the MapReduce computing framework.
The job management module maintains the job queue, manages job scheduling, provides the corresponding fault tolerance by monitoring job execution, and supports remote job submission and result return. The monitoring module manages the list of available hosts, discovers abnormal nodes, and sorts nodes by load so that the least-loaded node is selected first to execute a task. The task management module handles task division and dispatch, task execution, and the collection and return of results.
The job management module contains a job communication submodule that, driven by user interaction, implements job submission and result feedback, and a job management submodule that handles job management and scheduling, maintains the job queue, and schedules jobs by priority. The monitoring module watches the running status and load level of all nodes and accordingly provides the host list file required by the task execution module.
The task management module is the module that actually executes a job: it shards the job, schedules the shards rationally, and finally collects and feeds back the computation results. After the task management module receives a job distributed by the job management and scheduling submodule, it divides the job into multiple tasks according to the configured strategy and interacts with the monitoring module to generate the host list file required to schedule and execute the parallel computation. Then, based on the dynamic process creation and management model of the parallel computing interface, load processes meeting the requirements of the host list file are allocated; the host node distributes tasks to load nodes through map processes and obtains the corresponding task results from the load nodes through reduce processes. Merging these intermediate results yields the final result of this parallel computing job, which is finally fed back directly to the task submitter. Once the host node detects that a load node has become abnormal, it interacts with the monitoring module to produce a new host list file and allocates a new load node to take over the abnormal one and execute its tasks.
After the job communication submodule is initialized, according to the user settings it must bind the local socket server address and the remote address of the job management submodule's socket server, and it creates two worker threads. A wait thread circularly waits to receive job results fed back from the task management module. A send thread, as soon as the user has submitted a new job, transmits the job entered by the user over a socket to the job management submodule, using the submodule's server address.
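For concreteness, a minimal Python sketch of the two worker threads follows, under the assumption of a simple blocking socket protocol; the addresses and message format are hypothetical, since the patent specifies neither.

```python
import socket
import threading

# Hypothetical addresses; the patent names neither ports nor a wire format.
JOB_MANAGER_ADDR = ("job-manager.local", 9000)   # remote socket server of the job management submodule
LOCAL_RESULT_ADDR = ("0.0.0.0", 9001)            # local socket server bound at initialization

def wait_thread():
    """Circularly wait to receive job results fed back from the task management module."""
    with socket.create_server(LOCAL_RESULT_ADDR) as server:
        while True:
            conn, _ = server.accept()
            with conn:
                print("job result:", conn.recv(65536).decode("utf-8"))

def send_thread(submitted_jobs):
    """As soon as the user has submitted a new job, forward it over a socket
    to the job management submodule."""
    while True:
        job = submitted_jobs.get()               # blocks until the user submits a job
        with socket.create_connection(JOB_MANAGER_ADDR) as sock:
            sock.sendall(job.encode("utf-8"))

# Both threads run for the lifetime of the submodule, e.g.:
# threading.Thread(target=wait_thread, daemon=True).start()
```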
The job management submodule waits for the job communication submodule to submit jobs to it. In its worker thread it creates two further worker threads. A parse thread maintains the local socket server, obtains jobs from the remote job communication submodule, parses the information and confirms whether it conforms to the rules; a qualified job is bound to the address of its submitter and placed into the multi-priority job queue to await scheduling and execution. A scan thread repeatedly scans the multi-priority job queue to determine whether a higher-priority job is present; if so, it takes the job out, constructs a command line from it, and starts multiple task execution module processes to complete this parallel computing job, then circularly waits for the job to finish. If the job did not complete normally, an exception is judged to have occurred and the job is rescheduled and executed again.
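A sketch of this parse/scan thread pair around a multi-priority queue; `is_valid` and `build_command` are hypothetical stand-ins for the rule check and command-line construction, which the patent does not detail.

```python
import itertools
import queue
import subprocess

job_queue = queue.PriorityQueue()   # entries: (priority, tie, job, submitter); lower value = higher priority
_tie = itertools.count()            # tie-breaker so equal-priority entries never compare job objects

def parse_jobs(incoming):
    """Parse-thread body: validate each received job and enqueue the qualified
    ones, bound to their submitter's address."""
    for priority, job, submitter in incoming:
        if is_valid(job):                               # hypothetical rule check
            job_queue.put((priority, next(_tie), job, submitter))

def scan_jobs():
    """Scan-thread body: take the highest-priority job, build a command line and
    start the task-execution processes; on abnormal completion, reschedule."""
    while True:
        priority, _, job, submitter = job_queue.get()
        completed = subprocess.run(build_command(job))  # hypothetical command builder
        if completed.returncode != 0:                   # abnormal end: reschedule this job
            job_queue.put((priority, next(_tie), job, submitter))
```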
The host node analyzes the incoming command-line parameters to obtain the information about this parallel-interface job. According to the host list file generated by the monitoring module, it dynamically allocates multiple load nodes and forms them into a communication domain, then waits for load nodes to apply for tasks or to submit task results.
The host node divides the job into independent tasks and maintains a task status array whose entries are initialized to 0, recording the execution state of each task. When a load node applies for a task, the host node first searches the status array for an entry equal to 0, obtains the corresponding task, assigns it to the applicant, and sets that entry to 1. When a task completes, the corresponding entry is set to 2. If the process executing some task fails, the corresponding entry is reset to 0 so that the task can be reassigned to another node. Likewise, if a task whose entry is 1 produces no feedback result within the specified time, the process executing it is judged abnormal and the task's entry is reset to 0, to be reassigned to another node.
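The status-array bookkeeping just described might look like the following sketch; the state values 0/1/2 and the timeout rule are from the text, while the class layout and the 300-second deadline are assumptions.

```python
import time

PENDING, RUNNING, DONE = 0, 1, 2
TIMEOUT = 300.0   # assumed feedback deadline in seconds; the patent leaves this unspecified

class TaskTable:
    def __init__(self, n_tasks):
        self.state = [PENDING] * n_tasks
        self.started = [0.0] * n_tasks

    def assign(self):
        """Give the first pending task to an applying load node and mark it running."""
        for i, s in enumerate(self.state):
            if s == PENDING:
                self.state[i] = RUNNING
                self.started[i] = time.monotonic()
                return i
        return None                  # no task left: the applicant may exit

    def complete(self, i):
        self.state[i] = DONE

    def reap_stragglers(self):
        """Reset running tasks with no feedback within the deadline, so they can be reassigned."""
        now = time.monotonic()
        for i, s in enumerate(self.state):
            if s == RUNNING and now - self.started[i] > TIMEOUT:
                self.state[i] = PENDING

    def all_done(self):
        return all(s == DONE for s in self.state)
```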
If the host node finds that every entry in a job's status array is 2, this parallel computing job has completed, and the host node feeds the final job result back directly to the corresponding task submitter over a socket connection.
After a load node starts, it checks whether its parent process exists; if not, it refuses to execute, and if so, it judges that it was started by the host node. The load node applies to the host node for a task and, once a task is obtained, executes it according to the predetermined scheme; when execution completes, it feeds the task result back to the host node and applies for the next task. If the task information the load node obtains is invalid, this parallel computing task has completed successfully and the load node terminates.
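A minimal sketch of the parent-process check, assuming a POSIX system where an orphaned process is re-parented to PID 1; the patent does not say how the check is performed.

```python
import os
import sys

def verify_started_by_host_node():
    """Load-node startup check: refuse to execute unless a live parent process
    exists, taken as evidence that this node was spawned by the host node.
    (On POSIX, a process whose parent has exited is re-parented to PID 1.)"""
    if os.getppid() <= 1:
        sys.exit("refusing to execute: no parent process, not started by a host node")
```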
The high-performance cloud computing platform proposed by the present invention contains a three-level fault-tolerance mechanism. The first level is job re-scheduling: when the system detects that a node in the cluster fails while executing a task, it immediately reschedules and executes that task. The second level is job/task rollback: the system rolls the job/task back to the most recent checkpoint and resumes execution. If the host node fails, this parallel computing job fails; the system re-schedules the job, rolls its execution state back to the most recent checkpoint, and continues executing it. If a load node fails, the system attempts to restart the abnormal node, roll its task execution state back to the most recent checkpoint, and continue executing the task. The third level is dynamic migration: when an abnormal load node cannot be recovered by rollback in a short time, that is, when the second level of fault tolerance fails, the system actively migrates the tasks of the abnormal node to other working nodes for execution. To guarantee the stability of the parallel-interface cluster's computing capacity, the computing platform also dynamically allocates a new load process to substitute for the abnormal one.
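The escalation across the three levels can be sketched as follows; `system` is a hypothetical facade over the scheduler, checkpoint store and node pool, and every method on it is an assumption introduced for illustration.

```python
def handle_node_failure(node, task, system):
    """Escalate through the three fault-tolerance levels described above."""
    # Level 1 - job re-scheduling: simply run the failed task again.
    if system.reschedule(task):
        return
    # Level 2 - rollback: restart the node and resume from the latest checkpoint.
    if system.restart(node) and system.rollback_to_latest_checkpoint(task):
        return
    # Level 3 - dynamic migration: the node cannot be recovered in time, so move
    # its tasks to another working node via a freshly allocated load process.
    replacement = system.allocate_load_process()
    system.migrate(task, replacement)
```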
In the present system the monitoring module generates the list of available hosts by comparing each host's CPU core count with the number of processes currently executing tasks: if the number of processes is below the core count, the host name is added to the host list; otherwise the host name is removed from the list. The node hosting process 0 of the monitoring program is the monitoring server; nodes hosting non-zero processes are task nodes. The process-model monitoring proceeds in the following steps:
1) In each non-zero process a kernel object, an event signal, is created. The process holds this signal open while executing a task and triggers it when the computation completes; the signal is used to learn whether a process is waiting to exit.
2) Obtain the host name, count the number of processes M currently executing tasks and the number K of processes that have finished executing and are waiting to exit, and build the available host list file.
3) If the number of processes executing tasks M equals the number K that have finished and are waiting to exit, write this host name into the host list file.
The host list is renewed under two strategies: periodic renewal and renewal before task scheduling. Under periodic renewal, the monitoring program keeps running and the scheduler and monitor do not interact. Under renewal before a task, the monitoring program runs before tasks are divided: it starts automatically before task execution, updates the host list, and then exits; the host node process dynamically starts a group of processes according to the available host list to execute tasks. If a process's task fails midway, the monitoring program is automatically restarted to update the list, and the host node process starts the corresponding number of processes according to the available host list to complete the tasks of the failed processes. A sketch of the availability rule follows.
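This sketch combines the core-count test with step 3's idle test; the field names on `info` are assumptions.

```python
def build_host_list(hosts, path="hostlist.txt"):
    """Write the available-host file per the monitoring rule above: a host is
    listed while its count of task processes (M) is below its CPU core count,
    and certainly when every task process has finished and is waiting to exit
    (M == K)."""
    with open(path, "w") as f:
        for name, info in hosts.items():
            idle = info.running == info.waiting_exit      # step 3: M == K
            if idle or info.running < info.cpu_cores:     # general rule: M < core count
                f.write(name + "\n")
```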
The task scheduling of the present invention is based on job sharding. After a job is taken from the job queue, first-level sharding is performed on it and the first-level shards are assigned to nodes; second-level sharding is then performed within a node and the second-level shards are assigned to threads. Task distribution uses a load-queue-plus-thread-pool method. For map operations, first-level shards are distributed on the principle that idle load nodes apply for them: nodes with strong computing capacity apply for more shards, and nodes with weak capacity complete fewer. After a load node has executed its map tasks, it performs a first reduce directly, with the map results as the reduce input, then sends the result to the host node for a second reduce, which produces the final result.
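As an illustration of the two-level split with a local first reduce, here is a sketch of the load-node side; `map_fn` and `reduce_fn` are placeholders, and the round-robin split stands in for byte-range sharding of a memory-mapped shard.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import reduce as fold

def run_first_level_shard(shard, map_fn, reduce_fn):
    """Split a first-level shard into second-level shards, map them on a pool
    sized to the machine's core count, and fold the map outputs with a first,
    local reduce before anything is returned to the host node."""
    n = os.cpu_count() or 1
    second_level = [shard[i::n] for i in range(n)]   # round-robin stand-in for byte-range splitting
    with ThreadPoolExecutor(max_workers=n) as pool:
        mapped = list(pool.map(map_fn, second_level))
    return fold(reduce_fn, mapped)                   # first reduce, done locally on the load node

# The host node then applies reduce_fn once more over the per-node results
# (the "second reduce") to obtain the final answer.
```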
A load applies to the load queue for a task; the host node queries the first-level shard table, and if it finds a shard whose state is "not executed", it sends this shard's information to the load process. When the host node executes the fault-tolerant strategy of dynamic task migration, it sends the start and end positions of the shard's task and forwards the execution information table of all second-level shards within the first-level shard to be migrated to the load together.
If this job is being scheduled for the first time, the host node puts all first-level shards into the load queue for scheduling. If the job is not being scheduled for the first time, the host node selects the shards whose state is "unfinished" for scheduling. During scheduling, the host node continuously updates the content of the execution state table.
Based on the content of the load-internal state table sent by a load, if the host node detects that the load's current shard has reached a certain progress though it is not yet complete, it sends the load another shard task. If the host node detects that every entry in the execution state table is complete, the whole task is finished; it sends the waiting-to-exit signal to the load nodes, and this job is done. On the load-node side, after a first-level shard is sent to a load, the load first memory-maps the first-level shard, then detects the total number of cores across all processors in the node, opens the corresponding number of threads, and starts the thread pool to execute tasks.
Inside the load, a first-level shard is represented by the load-internal state table. Following this table, the thread pool takes out the shards not yet executed and runs their map tasks, then runs reduce tasks with the map results as the reduce input. During execution, every time a second-level shard completes, the content of the load-internal state table is updated, and the execution result is sent to the host node together with the modification to the load-internal state table; the host node then updates the execution state table of all shards according to the load-internal state table. When every execution state reads "complete", all shards have finished executing.
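A sketch of the load-internal state table that the pool threads draw from; the per-shard states and the host-side mirroring are from the text, while the state strings and locking are assumptions about mechanism.

```python
import threading

class LoadStateTable:
    """Load-internal state table: one entry per second-level shard."""
    def __init__(self, n_shards):
        self.state = ["pending"] * n_shards
        self.lock = threading.Lock()

    def take(self):
        """Hand the next unexecuted shard to a pool thread; None once the table is drained."""
        with self.lock:
            for i, s in enumerate(self.state):
                if s == "pending":
                    self.state[i] = "running"
                    return i
        return None

    def finish(self, i):
        """Mark a shard complete; in the real system this delta is sent to the host
        node with the result and mirrored into the global execution state table."""
        with self.lock:
            self.state[i] = "done"

    def all_done(self):
        with self.lock:
            return all(s == "done" for s in self.state)
```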
The flow of a job from submission to result is as follows. The job communication submodule submits a job with its priority. The job management submodule takes the highest-priority job from the task queue. The host node shards the job into first-level shards and puts them in the load queue. The host node obtains the information of a first-level shard by querying the state table, then forwards the execution information of all second-level shards in this shard to a load. On receiving the task, the load memory-maps the shard and performs second-level sharding, detects the total core count, builds a thread pool, and puts the shards into it; threads obtain second-level task shards through the load-internal state table until every entry in the table has been executed, after which the threads exit and the thread pool is destroyed. Each thread of the load node performs a reduce after completing a second-level task shard and sends the reduce result and the internal state table to the host node; the host node decides whether to send this load another shard, and if a new task must be sent, continues taking tasks and sending them to the load. When every entry in the state table has finished executing, the host node sends an exit instruction to each load; the load triggers the event signal to notify the monitor that it is waiting to exit, and all loads and the host node exit together. The host node performs a reduce operation over the results sent by the loads and sends the result to the job's submitter via the job communication submodule.
Under the node data caching policy, load nodes deployed in the cloud platform cache the jobs of the node subnets within the platform. A local index server is deployed in each subnet; it stores the jobs shared within the cloud platform it belongs to and the node list corresponding to each job. A load node registers the jobs it caches with the local index server in the identity of a resource provider. The local index server organizes online nodes into the cloud platform's subnet, helping subnet users find load nodes conveniently. A management server periodically collects job-request information from the local index server in each cloud platform. After each run of the cache algorithm, the management server generates two cache-object inventories for each cloud platform and sends the two job lists respectively to two load nodes deployed in the platform. Each node also periodically reports the current state of the jobs it stores to the management server. A load node updates its own process space according to the job list received from the management server; once some new cache object has been synchronized, the load node computes the hash value of its content and registers the job with the local index server of the cloud platform it belongs to.
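The register-after-sync step could look like this sketch; `index_server.register` is a hypothetical RPC stub, and SHA-256 is an assumption, since the patent does not name the hash algorithm.

```python
import hashlib

def register_cached_job(job_id, data, index_server):
    """After a new cache object has been synchronized, compute its content hash
    and register the job with the subnet's local index server."""
    digest = hashlib.sha256(data).hexdigest()
    index_server.register(job_id=job_id, content_hash=digest)
```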
When a user of some cloud platform starts a job, the node joins the global node subnet and the cloud platform subnet simultaneously, and reports the running status of each job to the local index server. The local index server monitors the cloud platform subnet and sends the information about each active job in the platform to the management server, including the job's unique hash identifier, the job size, the number of users that have accessed it, the number of currently available copies in the cloud platform, and the number of subnet users that most recently requested the job. From the information collected from the local index servers, the management server periodically runs a cache algorithm to derive the jobs that the load nodes in each cloud platform need to update or delete, and immediately sends the resulting job schedule to each load node. As soon as a job has been synchronized, the load node registers it with the local index server in its cloud platform; the load node then uploads it locally to serve data to the users requesting that job.
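For concreteness, the per-job fields just listed can be gathered into one report structure; all field names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ActiveJobReport:
    """Per-job information the local index server sends to the management
    server, per the description above."""
    job_hash: str            # unique hash identifier of the job
    size_bytes: int          # job size
    accessing_users: int     # number of users that have accessed the job
    available_copies: int    # currently available copies in the cloud platform
    recent_requesters: int   # subnet users that most recently requested the job
```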
Further, job shards awaiting distribution are actively sent to nodes with free bandwidth by the following procedure. The management server sets up a decision unit which, from the information collected, decides which jobs need to be sent, shards them, sends large job shards to auxiliary nodes, and stores the jobs in the memory space allocated to users. A job is added to each chosen target node; it runs in the background alongside the small jobs the user executes independently, keeping the resource in a cached state and sharing the upload bandwidth. When the user leaves the system, the background job ends. The management server selects the target nodes and notifies each selected node to store one shard of the target job; the nodes receiving the job store the specified job shards for the cloud platform and node subnet. After the sent job shard has been stored, the target node provides uploads for other storage nodes, serving as an auxiliary node. When the distribution process finishes, the management server sends a message to all nodes that received the job, notifying them to remove the previously sent job shards from the user's process space.
In summary, the present invention proposes a big data instant scheduling method that improves the efficiency of cloud computing to meet the needs of high-performance cloud computing.
Obviously, those skilled in the art should appreciate that the modules and steps of the invention described above can be realized with a general-purpose computing system: they may be concentrated on a single computing system or distributed over a network formed by multiple computing systems, and optionally they may be realized with program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.
It should be appreciated that the above embodiments of the present invention are used only to exemplify or explain the principles of the invention and are not to be construed as limiting it. Therefore, any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the invention shall be included within the scope of its protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (1)

1. A big data instant scheduling method, characterized in that it comprises:
the host node divides the job into independent tasks and maintains a task status array; after a load node starts, it applies to the host node for a task, and once a task is obtained it executes it according to a predetermined scheme; when execution completes, it feeds the task result back to the host node and applies for the next task; if the task information a load node obtains is invalid, this parallel computing task has completed successfully and the load node terminates; the job with the highest priority is taken from the task queue; the host node shards the job into first-level shards and places them in the load queue; the host node queries the state table for the information of a first-level shard, then forwards the execution information of all the second-level shards in this shard to a load node; on receiving the task, the load node memory-maps the shard and performs second-level sharding, detects the total number of cores, builds a thread pool, and places the shards into the pool; threads obtain second-level task shards through the load-internal state table until every entry in the table has been executed; each thread of the load node performs a reduce after completing a second-level task shard and sends the reduce result together with the internal state table to the host node; the host node decides whether to send this load another shard, and if a new task must be sent, continues taking tasks and sending them to the load; once every entry in the state table has finished executing, the host node sends an exit instruction to each load; the load triggers an event signal to notify the monitor that it is waiting to exit, and all loads and the host node then exit together; the host node performs a reduce operation over the results sent by the loads; the host node analyzes the incoming command-line parameters to obtain the information about this parallel-interface job; according to the host list file it dynamically allocates multiple load nodes and forms them into a communication domain, then waits for load nodes to apply for tasks or to submit task results; based on the dynamic process creation and management model of the parallel computing interface, load processes meeting the requirements of the host list file are allocated; the host node distributes tasks to load nodes through map processes and obtains the corresponding task results from load nodes through reduce processes; these intermediate results are merged to obtain the final result of this parallel computing job, which is finally fed back directly to the task submitter; once the host node detects that some load node is abnormal, it produces a new host list file and allocates a new load node to take over the abnormal one and execute its tasks; if the host node finds that every entry in the task status array is marked complete, it feeds the final job result back directly to the corresponding task submitter over a socket connection.
CN201710344085.9A 2017-05-16 2017-05-16 Big data instant scheduling method Pending CN107172149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710344085.9A CN107172149A (en) 2017-05-16 2017-05-16 Big data instant scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710344085.9A CN107172149A (en) 2017-05-16 2017-05-16 Big data instant scheduling method

Publications (1)

Publication Number Publication Date
CN107172149A (en) 2017-09-15

Family

ID=59816776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710344085.9A Pending CN107172149A (en) 2017-05-16 2017-05-16 Big data instant scheduling method

Country Status (1)

Country Link
CN (1) CN107172149A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144693A (en) * 2018-08-06 2019-01-04 上海海洋大学 A kind of power adaptive method for scheduling task and system
CN111708812A (en) * 2020-05-29 2020-09-25 北京赛博云睿智能科技有限公司 Distributed data processing method
CN112637067A * 2020-12-28 2021-04-09 北京明略软件系统有限公司 Graph parallel computing system and method based on analog network broadcast

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080086442A1 (en) * 2006-10-05 2008-04-10 Yahoo! Inc. Mapreduce for distributed database processing
CN101706757A (en) * 2009-09-21 2010-05-12 中国科学院计算技术研究所 I/O system and working method facing multi-core platform and distributed virtualization environment
CN101996079A (en) * 2010-11-24 2011-03-30 南京财经大学 MapReduce programming framework operation method based on pipeline communication
CN105279241A (en) * 2015-09-29 2016-01-27 成都四象联创科技有限公司 Cloud computing based big data processing method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080086442A1 (en) * 2006-10-05 2008-04-10 Yahoo! Inc. Mapreduce for distributed database processing
CN101706757A (en) * 2009-09-21 2010-05-12 中国科学院计算技术研究所 I/O system and working method facing multi-core platform and distributed virtualization environment
CN101996079A (en) * 2010-11-24 2011-03-30 南京财经大学 MapReduce programming framework operation method based on pipeline communication
CN105279241A (en) * 2015-09-29 2016-01-27 成都四象联创科技有限公司 Cloud computing based big data processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭羽成 (Guo Yucheng): "Research on Key Technologies of MPI High-Performance Cloud Computing Platform" (MPI高性能云计算平台关键技术研究), China Doctoral Dissertations Full-text Database (Electronic Journal), Information Science and Technology Series *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144693A (en) * 2018-08-06 2019-01-04 上海海洋大学 A kind of power adaptive method for scheduling task and system
CN109144693B (en) * 2018-08-06 2020-06-23 上海海洋大学 Power self-adaptive task scheduling method and system
CN111708812A (en) * 2020-05-29 2020-09-25 北京赛博云睿智能科技有限公司 Distributed data processing method
CN112637067A * 2020-12-28 2021-04-09 北京明略软件系统有限公司 Graph parallel computing system and method based on analog network broadcast

Similar Documents

Publication Publication Date Title
US10635664B2 (en) Map-reduce job virtualization
EP2810164B1 (en) Managing partitions in a scalable environment
US20160275123A1 (en) Pipeline execution of multiple map-reduce jobs
CN107168799A (en) Data-optimized processing method based on cloud computing framework
US7185096B2 (en) System and method for cluster-sensitive sticky load balancing
US8671134B2 (en) Method and system for data distribution in high performance computing cluster
JP5902716B2 (en) Large-scale storage system
US8332862B2 (en) Scheduling ready tasks by generating network flow graph using information receive from root task having affinities between ready task and computers for execution
US20050132379A1 (en) Method, system and software for allocating information handling system resources in response to high availability cluster fail-over events
US7680848B2 (en) Reliable and scalable multi-tenant asynchronous processing
US8959222B2 (en) Load balancing system for workload groups
US20080229320A1 (en) Method, an apparatus and a system for controlling of parallel execution of services
US10505791B2 (en) System and method to handle events using historical data in serverless systems
US20110314465A1 (en) Method and system for workload distributing and processing across a network of replicated virtual machines
CN108536532A (en) A kind of batch tasks processing method and system
US8082344B2 (en) Transaction manager virtualization
JP2010044552A (en) Request processing method and computer system
JP2015506523A (en) Dynamic load balancing in a scalable environment
CN1595360A (en) System and method for performing task in distributing calculating system structure
CN113014611B (en) Load balancing method and related equipment
CN107172149A (en) Big data instant scheduling method
Pattanaik et al. Dynamic fault tolerance management algorithm for VM migration in cloud data centers
Khan et al. Scheduling in Desktop Grid Systems: Theoretical Evaluation of Policies & Frameworks
US20230333880A1 (en) Method and system for dynamic selection of policy priorities for provisioning an application in a distributed multi-tiered computing environment
US20230333884A1 (en) Method and system for performing domain level scheduling of an application in a distributed multi-tiered computing environment using reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170915