Summary of the invention
Embodiments of the present invention provide a method, an apparatus, and a system for distributed parallel task processing, which can solve the problems of prior-art distributed parallel task processing systems, namely high system complexity and slow distributed parallel task processing.
In a first aspect, an embodiment of the present invention provides a method for distributed parallel task processing, comprising:
receiving data to be processed;
splitting the data to be processed into multiple data shards;
distributing the multiple data shards to multiple processing nodes for processing;
receiving the sub-result data produced by each processing node;
merging the sub-result data to form result data.
In a second aspect, an embodiment of the present invention provides a method for distributed parallel task processing, comprising:
receiving a data shard sent by a control node, wherein the data shard is obtained by the control node splitting data to be processed, and the data to be processed has not been grouped or sorted;
processing the data in the data shard to form sub-result data;
sending the sub-result data to the control node.
In a third aspect, an embodiment of the present invention provides a control node, comprising:
a receiving unit, configured to receive data to be processed;
a splitting unit, configured to split the data to be processed received by the receiving unit into multiple data shards;
an allocation unit, configured to distribute the multiple data shards to multiple processing nodes for processing;
the receiving unit being further configured to receive the sub-result data produced by each processing node;
a merging unit, configured to merge the sub-result data received by the receiving unit to form result data.
In a fourth aspect, an embodiment of the present invention provides a processing node, comprising:
a receiving unit, configured to receive a data shard sent by a control node, wherein the data shard is obtained by the control node splitting data to be processed, and the data to be processed has not been grouped or sorted;
a processing unit, configured to process the data in the data shard received by the receiving unit to form sub-result data;
a sending unit, configured to send the sub-result data formed by the processing unit to the control node.
In a fifth aspect, an embodiment of the present invention provides a system for distributed parallel task processing, comprising a control node and multiple processing nodes, wherein:
the control node is configured to receive data to be processed, split the data to be processed into multiple data shards, and distribute the multiple data shards to the multiple processing nodes for processing;
each processing node is configured to receive the data shard sent by the control node, process the data in the data shard to form sub-result data, and send the sub-result data to the control node;
the control node is further configured to receive the sub-result data produced by each processing node and merge the sub-result data to form result data.
In the method, apparatus, and system for distributed parallel task processing provided by the present invention, the control node receives data to be processed, splits the data into multiple data shards, distributes the data shards to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the advantages of the technical solutions of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the method for distributed parallel task processing provided by an embodiment of the present invention, described from the control node side, comprises:
101. Receive data to be processed.
In a distributed parallel task, the volume of the data to be processed is generally large, typically more than 1 terabyte (TB), but is not limited to this.
102. Split the data to be processed into multiple data shards.
The data to be processed may be split into data shards according to the number of processing nodes, so that the number of data shards equals the number of processing nodes. The amount of data stored in each data shard may be the same, but is not limited to this.
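Step 102 can be illustrated with a minimal Python sketch. It assumes the data to be processed is already available as a sequence of rows; the function name `split_into_shards` and the choice of contiguous, near-equal shards are illustrative assumptions, not details taken from this specification. The rows are neither grouped nor sorted before splitting.

```python
from typing import List, Sequence


def split_into_shards(rows: Sequence[str], num_nodes: int) -> List[List[str]]:
    """Split rows into num_nodes contiguous shards of near-equal size.

    Hypothetical helper: the rows keep their original order and are
    not grouped or sorted.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(len(rows), num_nodes)
    shards: List[List[str]] = []
    start = 0
    for i in range(num_nodes):
        # The first `extra` shards take one additional row each.
        size = base + (1 if i < extra else 0)
        shards.append(list(rows[start:start + size]))
        start += size
    return shards
```

With seven rows and three processing nodes, this sketch yields shards of three, two, and two rows, so the shard count matches the node count as described above.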
103. Distribute the multiple data shards to multiple processing nodes for processing.
The data shards may be distributed according to the load information of each processing node, with each assignment giving one of the data shards to the processing node that currently has the lowest load. Alternatively, one of the data shards may be randomly assigned to a processing node that has not yet received a data shard. The distribution is not limited to these two approaches; the data shards may be distributed to the processing nodes in various other ways, which are not enumerated here.
104. Receive the sub-result data produced by each processing node.
The sub-result data is formed by a processing node after it processes its data shard. The processing node may read and process the data shard it has received row by row. The rows of data are independent of one another, so the computation logic executed on the processing node can be applied to multiple rows of data simultaneously.
105. Merge the sub-result data to form result data.
The control node may merge the sub-result data returned by each processing node to form the result data. The result data may be stored in a database or the like for subsequent data analysis applications.
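The merging in step 105 can be sketched as follows. The specification does not state how sub-result data is combined, so this example assumes the simplest case, in which each sub-result is a list of rows and merging is plain concatenation; `merge_sub_results` is a hypothetical name.

```python
from typing import Iterable, List


def merge_sub_results(sub_results: Iterable[List[str]]) -> List[str]:
    """Concatenate the sub-results returned by the processing nodes.

    Assumption: merging is concatenation; other tasks might need a
    different combiner (for example, summing counters).
    """
    merged: List[str] = []
    for sub_result in sub_results:
        merged.extend(sub_result)
    return merged
```

A processing node that found nothing simply contributes an empty list, which leaves the merged result unchanged.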
In the method for distributed parallel task processing provided by this embodiment of the present invention, the control node receives data to be processed, splits the data into multiple data shards, distributes the data shards to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
The side corresponding to the control node is the processing node side. As shown in Fig. 2, the method for distributed parallel task processing provided by an embodiment of the present invention, described from the processing node side, comprises:
201. Receive a data shard sent by the control node.
The data shard originates from the data to be processed that the control node receives. The data to be processed is not grouped or sorted by the control node; instead, the control node splits it directly to form the data shards.
202. Process the data in the data shard to form sub-result data.
The processing node may read and process the data shard it has received row by row. The rows of data are independent of one another, so the computation logic executed on the processing node can be applied to multiple rows of data simultaneously.
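Because the rows are independent, step 202 can apply the same per-row logic to several rows at once. The sketch below is one possible illustration, assuming a shard is a list of text rows and using a thread pool from the Python standard library as a stand-in for whatever concurrency mechanism a real processing node would use; `process_shard` and `row_fn` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def process_shard(shard: List[str],
                  row_fn: Callable[[str], str],
                  workers: int = 4) -> List[str]:
    """Apply row_fn to every row of the shard to form the sub-result.

    Rows do not depend on one another, so they can be handled
    concurrently without any coordination between rows.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, even when
        # individual rows finish out of order.
        return list(pool.map(row_fn, shard))
```

For example, `process_shard(["a", "b"], str.upper)` applies `str.upper` to each row independently and returns the rows in their original order.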
203. Send the sub-result data to the control node.
The purpose of step 203 is that, once the sub-result data produced by each processing node from its data shard has reached the control node, the control node merges the sub-result data to form result data.
In the method for distributed parallel task processing provided by this embodiment of the present invention, the processing node receives a data shard that the control node obtained by splitting data to be processed, where the data to be processed has not been grouped or sorted; the processing node processes the data shard to form sub-result data and then sends the sub-result data to the control node. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
The method shown in Fig. 1 or Fig. 2 is described in more detail and further extended below.
As shown in Fig. 3, the method for distributed parallel task processing provided by another embodiment of the present invention comprises:
301. The control node receives data to be processed.
In a distributed parallel task, the volume of the data to be processed is generally large, typically more than 1 terabyte (TB), but is not limited to this. For example, the data to be processed may be the login information of an application over one day, where the login information includes the login time and logout time of each account under the application, but is not limited to this.
302. The control node splits the data to be processed into multiple data shards according to the number of processing nodes. After step 302, either step 303 or step 304 may be performed.
The data to be processed may be split into data shards according to the number of processing nodes, so that the number of data shards equals the number of processing nodes. The amount of data stored in each data shard may be the same, but is not limited to this.
303. The control node randomly assigns one of the data shards to a processing node that has not yet received a data shard, and repeats this until all of the data shards have been assigned. Step 308 is then performed.
To ensure that the load on each processing node does not become excessive, the data shards need to be distributed reasonably. Specifically, the data shards may be assigned randomly, and once a processing node has received a data shard, it does not receive another data shard of the same data to be processed.
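The random assignment of step 303 can be sketched as follows. The example assumes processing nodes are identified by string IDs and that there are at most as many shards as nodes (as in step 302, where the counts are equal); `assign_randomly` and its parameters are hypothetical names used only for illustration.

```python
import random
from typing import Dict, List, Optional


def assign_randomly(shards: List[List[str]],
                    node_ids: List[str],
                    rng: Optional[random.Random] = None) -> Dict[str, List[str]]:
    """Give each shard to a random node that has not yet received one."""
    if len(shards) > len(node_ids):
        raise ValueError("more shards than processing nodes")
    rng = rng or random.Random()
    free_nodes = list(node_ids)
    assignment: Dict[str, List[str]] = {}
    for shard in shards:
        # Choose only among nodes without a shard, then retire the
        # chosen node so it cannot receive a second shard.
        node = free_nodes.pop(rng.randrange(len(free_nodes)))
        assignment[node] = shard
    return assignment
```

Removing the chosen node from `free_nodes` is what guarantees that a node which has received a shard is never picked again for the same data to be processed.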
304. Each processing node sends its own load information to the control node. Steps 305 and 306 are then performed.
Likewise, to distribute the data shards reasonably, the data shards may also be assigned according to the load of each processing node. The load information carries the load of the processing node that sent it.
305. The control node determines the processing node with the lowest load according to the load information received from each processing node.
Specifically, after the control node obtains the load information of each processing node, it can identify the processing node with the lowest load, because the load information carries the load of each processing node.
306. The control node assigns one of the data shards to the processing node with the lowest load. Step 307 is then performed.
In this way, each of the data shards is assigned to the processing node that has the lowest load at the time of assignment, so the data shards are distributed fairly evenly and the load across the processing nodes is balanced.
307. The control node determines whether all of the data shards have been assigned. If so, step 308 is performed; otherwise, the process returns to step 304.
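The loop formed by steps 304 to 307 can be sketched as below. The example assumes that load is reported as a single number per node and that the control node obtains a fresh snapshot of all loads before each assignment; the callable `report_loads` stands in for the load messages sent by the processing nodes and, like `assign_by_load`, is a hypothetical name.

```python
from typing import Callable, Dict, List


def assign_by_load(
    shards: List[List[str]],
    report_loads: Callable[[], Dict[str, float]],
) -> Dict[str, List[List[str]]]:
    """Assign each shard to the node with the lowest reported load."""
    assignment: Dict[str, List[List[str]]] = {}
    for shard in shards:  # step 307: repeat until every shard is assigned
        loads = report_loads()              # step 304: nodes report load
        target = min(loads, key=loads.get)  # step 305: lowest-load node
        assignment.setdefault(target, []).append(shard)  # step 306
    return assignment
```

Because the loads are re-read before every assignment, a node that has just received a shard and reports a higher load is less likely to be chosen again immediately, which is what keeps the distribution balanced.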
308. The processing node processes the multiple rows of data in the data shard row by row to form sub-result data.
The processing node may read and process the data shard it has received row by row. The rows of data are independent of one another, so the computation logic executed on the processing node can be applied to multiple rows of data simultaneously.
Take as an example the case in which the data to be processed is the login information of an application over one day. If the accounts that were online at a given moment need to be identified, the control node may split the login information into data shards, and each processing node then filters its data shard according to the login time and logout time of each account in the login information to identify the accounts online at that moment. Because multiple processing nodes filter simultaneously, the accounts online at a given moment are identified quickly.
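The online-account example can be made concrete with a short sketch. The row format is an assumption made for illustration only: each row is taken to be `account,login_ts,logout_ts` with integer timestamps, and an account counts as online at a moment if it logged in at or before that moment and logged out after it. The function name `online_accounts` is hypothetical.

```python
from typing import List


def online_accounts(shard: List[str], moment: int) -> List[str]:
    """Return the accounts in this shard that are online at `moment`.

    Assumed row format: "account,login_ts,logout_ts" (integers).
    """
    online: List[str] = []
    for row in shard:
        # Each row is evaluated on its own; no other row is consulted.
        account, login_ts, logout_ts = row.split(",")
        if int(login_ts) <= moment < int(logout_ts):
            online.append(account)
    return online
```

Each processing node would run this filter on its own shard, and the control node would then merge the returned account lists.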
309. The processing node sends the sub-result data to the control node.
310. The control node merges the sub-result data to form result data.
It should be noted that the control node and the processing nodes in the embodiments of the present invention may each be an electronic device with computing capability, such as a computer.
In the method for distributed parallel task processing provided by this further embodiment of the present invention, the control node receives data to be processed, splits the data into multiple data shards, distributes the data shards to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
To implement the methods shown in Fig. 1 and Fig. 3, as shown in Fig. 4, the control node provided by an embodiment of the present invention comprises:
a receiving unit 41, configured to receive data to be processed;
a splitting unit 42, configured to split the data to be processed received by the receiving unit 41 into multiple data shards;
an allocation unit 43, configured to distribute the multiple data shards to multiple processing nodes for processing;
the receiving unit 41 being further configured to receive the sub-result data produced by each processing node;
a merging unit 44, configured to merge the sub-result data received by the receiving unit 41 to form result data.
Specifically, as shown in Fig. 5, the splitting unit 42 is configured to:
split the data to be processed received by the receiving unit 41 into multiple data shards according to the number of processing nodes,
wherein the number of data shards equals the number of processing nodes.
Further, as shown in Fig. 5, the allocation unit 43 is further configured to:
randomly assign one of the data shards produced by the splitting unit 42 to a processing node that has not yet received a data shard.
Further, as shown in Fig. 5, the control node also comprises a determining unit 45.
The receiving unit 41 is further configured to receive the load information of each processing node.
The determining unit 45 is configured to determine the processing node with the lowest load according to the load information received by the receiving unit 41.
The allocation unit 43 is further configured to assign one of the data shards produced by the splitting unit 42 to the processing node with the lowest load.
It should be noted that, for the specific implementation of the control node provided by this embodiment of the present invention, reference may be made to the specific implementation of the method for distributed parallel task processing in Fig. 3, which is not repeated here. The control node may be an electronic device with computing capability, such as a computer.
The control node provided by this embodiment of the present invention receives data to be processed, splits the data into multiple data shards, distributes the data shards to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
To implement the methods shown in Fig. 2 and Fig. 3, as shown in Fig. 6, the processing node provided by an embodiment of the present invention comprises:
a receiving unit 51, configured to receive a data shard sent by the control node,
wherein the data shard is obtained by the control node splitting data to be processed, and the data to be processed has not been grouped or sorted;
a processing unit 52, configured to process the data in the data shard received by the receiving unit 51 to form sub-result data;
a sending unit 53, configured to send the sub-result data formed by the processing unit 52 to the control node.
It should be noted that the data shard comprises multiple rows of data.
As shown in Fig. 6, the processing unit 52 is specifically configured to:
process the multiple rows of data in the data shard row by row.
Specifically, as shown in Fig. 6, the sending unit 53 is further configured to:
send load information to the control node, wherein the load information carries the load of the processing node.
It should be noted that, for the specific implementation of the processing node provided by this embodiment of the present invention, reference may be made to the specific implementation of the method for distributed parallel task processing in Fig. 3, which is not repeated here. The processing node may be an electronic device with computing capability, such as a computer.
The processing node provided by this embodiment of the present invention receives a data shard that the control node obtained by splitting data to be processed, where the data to be processed has not been grouped or sorted; the processing node processes the data shard to form sub-result data and then sends the sub-result data to the control node. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
As shown in Fig. 7, the system for distributed parallel task processing provided by an embodiment of the present invention comprises a control node 61 and multiple processing nodes 62, wherein:
the control node 61 is configured to receive data to be processed, split the data to be processed into multiple data shards, and distribute the multiple data shards to the multiple processing nodes 62 for processing;
each processing node 62 is configured to receive the data shard sent by the control node 61, process the data in the data shard to form sub-result data, and send the sub-result data to the control node 61;
the control node 61 is further configured to receive the sub-result data produced by each processing node 62 and merge the sub-result data to form result data.
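The interaction between the control node 61 and the processing nodes 62 can be illustrated end to end with a single-process sketch. This is a simplification under stated assumptions: worker threads stand in for separate processing nodes, shards are contiguous slices of a list of rows, and merging is concatenation. The names `run_job` and `process` are hypothetical and do not come from this specification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_job(rows: List[str],
            num_nodes: int,
            process: Callable[[List[str]], List[str]]) -> List[str]:
    """Split, distribute, process, and merge, without grouping or sorting."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Control node: split into num_nodes contiguous shards
    # (ceiling division gives the rows per shard).
    chunk = -(-len(rows) // num_nodes)
    shards = [rows[i * chunk:(i + 1) * chunk] for i in range(num_nodes)]
    # Processing nodes: each worker thread stands in for one node
    # and turns its shard into a sub-result.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        sub_results = list(pool.map(process, shards))
    # Control node: merge the sub-results into the result data.
    return [item for sub in sub_results for item in sub]
```

In a real deployment the shards and sub-results would travel over a network between separate machines; the sketch only shows the order of operations and the absence of any grouping or sorting step.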
It should be noted that, for the specific implementation of the system for distributed parallel task processing provided by this embodiment of the present invention, reference may be made to the specific implementation of the method for distributed parallel task processing in Fig. 3, which is not repeated here.
In the system for distributed parallel task processing provided by this embodiment of the present invention, the control node receives data to be processed, splits the data into multiple data shards, distributes the data shards to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, a control node that receives data to be processed must first group and sort that data. In scenarios that require no grouping or sorting, the prior-art approach adds complexity to the whole distributed parallel task processing system and slows distributed parallel task processing. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole distributed parallel task processing system and can increase the speed of distributed parallel task processing.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software together with the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.