CN104102475A

CN104102475A - Method, device and system for processing distributed parallel tasks

Info

Publication number: CN104102475A
Application number: CN201310125254.1A
Authority: CN
Inventors: 廖龙; 秦晓强; 答治茜; 罗建国
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2013-04-11
Filing date: 2013-04-11
Publication date: 2014-10-15
Anticipated expiration: 2033-04-11
Also published as: CN104102475B

Abstract

Embodiments of the invention disclose a method, device and system for processing distributed parallel tasks, in the field of computer technology. They address two problems of prior-art distributed parallel task processing systems: high system complexity and low processing speed. The method comprises: receiving data to be processed; cutting the data to be processed into a plurality of data fragments; distributing the data fragments to a plurality of processing nodes for processing; receiving the sub-result data produced by each processing node; and merging the sub-result data to form result data. The method, device and system are suitable for parallel processing of large volumes of data.

Description

Method, apparatus and system for distributed parallel task processing

Technical field

The present invention relates to the field of computer technology, and in particular to a method, apparatus and system for distributed parallel task processing.

Background

As computer technology develops, the amount of data that computers and similar devices must process keeps growing. Large data volumes can be processed in parallel by multiple computers, and fast processing of such data generally requires a distributed parallel task processing system. A distributed parallel task processing system is a computer system in which multiple computers at different locations, with different functions or holding different data, are connected by a communication network and cooperate under unified management and control to complete information processing tasks.

A current distributed parallel task processing system generally has a control node and multiple processing nodes. The control node receives the data to be processed, first groups and sorts it, and then hands the grouped and sorted data to the processing nodes for processing. Because prior-art distributed parallel task processing always requires the data to be grouped and sorted, the complexity of the whole system increases and processing slows down.

Summary of the invention

Embodiments of the present invention provide a method, apparatus and system for distributed parallel task processing that address the high complexity and low processing speed of prior-art distributed parallel task processing systems.

In a first aspect, an embodiment of the present invention provides a method for distributed parallel task processing, comprising:

Receiving data to be processed;

Cutting the data to be processed into multiple data fragments;

Distributing the multiple data fragments to multiple processing nodes for processing;

Receiving the sub-result data produced by each processing node;

Merging the sub-result data to form result data.
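The five steps of the first aspect can be illustrated with a minimal Python sketch. It is not the patented implementation: the function `run_control_node`, the worker threads standing in for remote processing nodes, and the caller-supplied `process_fragment` and `merge` callables are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def run_control_node(
    data: Sequence[str],
    num_nodes: int,
    process_fragment: Callable[[Sequence[str]], R],
    merge: Callable[[List[R]], R],
) -> R:
    """Hypothetical control-node flow: cut, distribute, collect, merge.

    Worker threads stand in for the remote processing nodes.
    """
    # Receiving: the `data` argument is the data to be processed.
    # Cutting: at most num_nodes contiguous fragments; the data is
    # neither grouped nor sorted first.
    size = max(1, -(-len(data) // num_nodes))  # ceiling division
    fragments = [data[i:i + size] for i in range(0, len(data), size)]

    # Distributing and collecting: one sub-result per fragment.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        sub_results = list(pool.map(process_fragment, fragments))

    # Merging: combine the sub-results into the result data.
    return merge(sub_results)
```

Passing `sum` as `merge` suits tasks whose sub-results are counts, such as counting login records across fragments; other tasks need a different merge function.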

In a second aspect, an embodiment of the present invention provides a method for distributed parallel task processing, comprising:

Receiving a data fragment sent by a control node, where the data fragment is obtained by the control node cutting the data to be processed, and the data to be processed has not been grouped or sorted;

Processing the data in the data fragment to form sub-result data;

Sending the sub-result data to the control node.
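The three steps of the second aspect map onto a small handler. The names `handle_fragment`, `process_line` and `send_to_control_node` are hypothetical; in a deployed system the send callback would be a network call to the control node, which the source does not specify.

```python
from typing import Callable, List, Sequence


def handle_fragment(
    fragment: Sequence[str],
    process_line: Callable[[str], object],
    send_to_control_node: Callable[[List[object]], None],
) -> None:
    """Hypothetical processing-node side: receive, process, send back."""
    # The fragment was cut directly from the input data, without grouping
    # or sorting, so no assumption is made about line order.
    sub_result = [process_line(line) for line in fragment]
    # Return the sub-result data to the control node.
    send_to_control_node(sub_result)
```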

In a third aspect, an embodiment of the present invention provides a control node, comprising:

A receiving unit, configured to receive data to be processed;

A cutting unit, configured to cut the data to be processed received by the receiving unit into multiple data fragments;

An allocation unit, configured to distribute the multiple data fragments to multiple processing nodes for processing;

The receiving unit being further configured to receive the sub-result data produced by each processing node;

A merging unit, configured to merge the sub-result data received by the receiving unit to form result data.

In a fourth aspect, an embodiment of the present invention provides a processing node, comprising:

A receiving unit, configured to receive a data fragment sent by a control node, where the data fragment is obtained by the control node cutting the data to be processed, and the data to be processed has not been grouped or sorted;

A processing unit, configured to process the data in the data fragment received by the receiving unit to form sub-result data;

A sending unit, configured to send the sub-result data formed by the processing unit to the control node.

In a fifth aspect, an embodiment of the present invention provides a system for distributed parallel task processing, comprising a control node and multiple processing nodes, where:

The control node is configured to receive data to be processed, cut the data to be processed into multiple data fragments, and distribute the multiple data fragments to multiple processing nodes for processing;

Each processing node is configured to receive a data fragment sent by the control node, process the data in the data fragment to form sub-result data, and send the sub-result data to the control node;

The control node is further configured to receive the sub-result data produced by each processing node and merge the sub-result data to form result data.

In the method, apparatus and system for distributed parallel task processing provided by the present invention, the control node receives the data to be processed, cuts it into multiple data fragments, distributes the fragments to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a first flowchart of the method for distributed parallel task processing provided by an embodiment of the present invention;

Fig. 2 is a second flowchart of the method for distributed parallel task processing provided by an embodiment of the present invention;

Fig. 3 is a flowchart of the method for distributed parallel task processing provided by another embodiment of the present invention;

Fig. 4 is a first schematic structural diagram of the control node provided by an embodiment of the present invention;

Fig. 5 is a second schematic structural diagram of the control node provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the processing node provided by an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of the system for distributed parallel task processing provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

To make the advantages of the technical solutions of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments.

As shown in Fig. 1, the method for distributed parallel task processing provided by an embodiment of the present invention, described from the control node side, comprises:

101. Receive the data to be processed.

In distributed parallel tasks, the volume of the data to be processed is usually large, generally more than 1 terabyte (TB), although it is not limited to this.

102. Cut the data to be processed into multiple data fragments.

The data to be processed may be cut into data fragments according to the number of processing nodes, so that the number of data fragments equals the number of processing nodes, and each data fragment may hold the same amount of data. The cutting is not limited to this scheme.
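Cutting by the number of processing nodes can be sketched as below. This assumes the data is already a sequence of lines and cuts only at line boundaries, so the later line-by-line processing never sees a split record; the source states only that fragments may be of equal size. When the line count does not divide evenly, fragment sizes in this sketch differ by at most one line. The function name is hypothetical.

```python
from typing import List, Sequence


def cut_into_fragments(lines: Sequence[str], num_nodes: int) -> List[Sequence[str]]:
    """Cut data into exactly num_nodes contiguous fragments, one per node.

    Lines keep their original order: nothing is grouped or sorted.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    base, extra = divmod(len(lines), num_nodes)
    fragments: List[Sequence[str]] = []
    start = 0
    for i in range(num_nodes):
        # The first `extra` fragments each take one additional line.
        end = start + base + (1 if i < extra else 0)
        fragments.append(lines[start:end])
        start = end
    return fragments
```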

103. Distribute the multiple data fragments to multiple processing nodes for processing.

The data fragments may be distributed according to the load information of each processing node: in each round of distribution, one of the data fragments is assigned to the processing node with the lowest load. Alternatively, each data fragment may be randomly assigned to a processing node that has not yet received a data fragment. Distribution is not limited to these two ways; other approaches are possible and are not enumerated here.

104. Receive the sub-result data produced by each processing node.

The sub-result data is formed by each processing node after processing. A processing node may read and process its data fragment line by line; because the lines are independent of one another, the computation on a processing node can run on multiple lines at the same time.

105. Merge the sub-result data to form result data.

The control node may merge the sub-result data returned by the processing nodes to form result data. The result data may be stored, for example in a database, for subsequent data analysis applications.
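How sub-results are merged depends on the task; for a filtering task, concatenating the sub-results is enough. The sketch below assumes each sub-result is a list of `(account, value)` rows and stores the merged result in SQLite as one possible database. The function name, the table name `result_data` and the row shape are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def merge_and_store(
    sub_results: Iterable[List[Tuple[str, int]]],
    conn: sqlite3.Connection,
) -> int:
    """Concatenate per-node sub-results and store them as result data."""
    # Merge: flatten the sub-result lists into one list of rows.
    merged = [row for sub in sub_results for row in sub]
    # Store the result data for later analysis.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result_data (account TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO result_data VALUES (?, ?)", merged)
    conn.commit()
    return len(merged)
```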

In the method for distributed parallel task processing provided by this embodiment, the control node receives the data to be processed, cuts it into multiple data fragments, distributes the fragments to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

The counterpart of the control node is the processing node. As shown in Fig. 2, the method for distributed parallel task processing provided by an embodiment of the present invention, described from the processing node side, comprises:

201. Receive a data fragment sent by the control node.

The data fragment originates from the data to be processed received by the control node. That data is not grouped or sorted by the control node; the control node cuts it directly to form the data fragments.

202. Process the data in the data fragment to form sub-result data.

A processing node may read and process its data fragment line by line; because the lines are independent of one another, the computation on the processing node can run on multiple lines at the same time.
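Because the lines of a fragment are independent, a processing node can evaluate several lines at once. A minimal sketch, assuming a caller-supplied per-line function; the function name is hypothetical, and Python threads are used only for brevity. For CPU-bound Python code, `concurrent.futures.ProcessPoolExecutor` is usually the better fit because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def process_fragment(
    fragment: Sequence[str],
    per_line: Callable[[str], object],
    workers: int = 4,
) -> List[object]:
    """Apply per_line to every line of a fragment, several lines at a time.

    This is valid only because each line is independent of the others;
    pool.map still returns the results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(per_line, fragment))
```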

203. Send the sub-result data to the control node.

The purpose of step 203 is that, once the sub-result data from every processing node has reached the control node, the control node merges it to form result data.

In the method for distributed parallel task processing provided by this embodiment, a processing node receives a data fragment, which the control node obtained by cutting the data to be processed without grouping or sorting it; the processing node processes the data fragment to form sub-result data and then sends the sub-result data to the control node. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

The methods shown in Fig. 1 and Fig. 2 are elaborated and further extended below.

As shown in Fig. 3, the method for distributed parallel task processing provided by another embodiment of the present invention comprises:

301. The control node receives the data to be processed.

In distributed parallel tasks, the volume of the data to be processed is usually large, generally more than 1 terabyte (TB), although it is not limited to this. For example, the data to be processed may be one day's login information for an application, including the time each account of the application went online and the time it went offline, although it is not limited to this.

302. The control node cuts the data to be processed into multiple data fragments according to the number of processing nodes. After step 302, either step 303 or step 304 may be performed.

303. The control node randomly assigns each data fragment to a processing node that has not yet received a data fragment, until all data fragments have been assigned. Execution then continues with step 308.

To keep any processing node from being overloaded, the data fragments need to be distributed sensibly. Specifically, the fragments may be assigned at random, and once a processing node has received a data fragment, it does not receive another fragment of the same data to be processed.
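Step 303 (random assignment, with each processing node receiving at most one fragment of the same data) can be sketched as drawing nodes without replacement. The function name and the use of `random.Random` are assumptions; the source does not specify how the random choice is made.

```python
import random
from typing import Dict, List, Optional, Sequence


def assign_randomly(
    fragment_ids: Sequence[int],
    node_ids: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Dict[int, str]:
    """Give each fragment to a random node that has no fragment yet."""
    if len(fragment_ids) > len(node_ids):
        raise ValueError("need at least one processing node per fragment")
    rng = rng if rng is not None else random.Random()
    # Nodes that have not yet received a fragment of this data.
    idle: List[str] = list(node_ids)
    assignment: Dict[int, str] = {}
    for frag in fragment_ids:
        # Draw a random idle node and remove it so it is not chosen again.
        node = idle.pop(rng.randrange(len(idle)))
        assignment[frag] = node
    return assignment
```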

304. Each processing node sends its own load information to the control node. Steps 305 and 306 are then performed.

Likewise, to distribute the data fragments sensibly, they may also be distributed according to the load of each processing node. The load information carries the load of the processing node.

305. Based on the load information received from each processing node, the control node determines the processing node with the lowest load.

Specifically, because the load information carries each processing node's load, once the control node has obtained the load information of every processing node it can identify the node with the lowest load.

306. The control node assigns one of the data fragments to the processing node with the lowest load. Execution continues with step 307.

In this way, each data fragment is assigned to the processing node with the lowest load at the time of assignment, so the data fragments are distributed fairly evenly and the processing nodes stay load-balanced.

307. The control node determines whether all data fragments have been assigned. If so, step 308 is performed; otherwise execution returns to step 304.
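Steps 304 to 307 form a loop: collect load reports, pick the least-loaded node, assign one fragment, and repeat until every fragment is assigned. The sketch below assumes hypothetical callbacks `report_loads` (returning the latest load of each node) and `send`; it is an illustration, not the patented implementation. Unlike step 303, a node may receive several fragments if its load stays lowest.

```python
from typing import Callable, Dict, List, Sequence


def assign_least_loaded(
    fragments: Sequence[str],
    report_loads: Callable[[], Dict[str, float]],
    send: Callable[[str, str], None],
) -> List[str]:
    """Assign fragments one at a time to the currently least-loaded node."""
    chosen: List[str] = []
    for fragment in fragments:               # 307: loop until all assigned
        loads = report_loads()               # 304: latest load of each node
        target = min(loads, key=loads.__getitem__)  # 305: least-loaded node
        send(target, fragment)               # 306: assign this fragment
        chosen.append(target)
    return chosen
```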

308. Each processing node processes the multiple lines of data in its data fragment line by line to form sub-result data.

Take the case where the data to be processed is one day's login information for an application. To find the accounts that were online at a given moment, the control node cuts the login information into data fragments, and each processing node then processes its fragment, using each account's online and offline times to select the accounts that were online at that moment. Because multiple processing nodes perform the selection simultaneously, the accounts online at a given moment are found quickly.
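The login-information example can be sketched as follows. The record format `account,online_ts,offline_ts` (integer timestamps, offline time exclusive) and the function names are hypothetical; the source only says each record carries an account's online and offline times. Each processing node runs `online_at` on its fragment, and the control node combines the sub-results with `merge_online`.

```python
from typing import Iterable, List, Set


def online_at(fragment: Iterable[str], moment: int) -> List[str]:
    """Processing-node logic: accounts online at `moment` in one fragment."""
    result: List[str] = []
    for line in fragment:  # each line is independent of the others
        account, on, off = line.strip().split(",")
        if int(on) <= moment < int(off):
            result.append(account)
    return result


def merge_online(sub_results: Iterable[List[str]]) -> Set[str]:
    """Control-node merge: an account may log in several times a day and
    so appear in several fragments; a set union removes duplicates."""
    merged: Set[str] = set()
    for sub in sub_results:
        merged.update(sub)
    return merged
```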

309. Each processing node sends its sub-result data to the control node.

310. The control node merges the sub-result data to form result data.

It should be noted that both the control node and the processing nodes in the embodiments of the present invention may be computers or other electronic devices with computing capability.

In the method for distributed parallel task processing provided by this further embodiment, the control node receives the data to be processed, cuts it into multiple data fragments, distributes the fragments to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

With reference to the implementation of the methods shown in Fig. 1 and Fig. 3, and as shown in Fig. 4, the control node provided by an embodiment of the present invention comprises:

A receiving unit 41, configured to receive data to be processed.

A cutting unit 42, configured to cut the data to be processed received by the receiving unit 41 into multiple data fragments.

An allocation unit 43, configured to distribute the multiple data fragments to multiple processing nodes for processing.

The receiving unit 41 is further configured to receive the sub-result data produced by each processing node.

A merging unit 44, configured to merge the sub-result data received by the receiving unit 41 to form result data.

Specifically, as shown in Fig. 5, the cutting unit 42 is configured to:

Cut the data to be processed received by the receiving unit 41 into multiple data fragments according to the number of processing nodes.

The number of data fragments equals the number of processing nodes.

Further, as shown in Fig. 5, the allocation unit 43 is configured to:

Randomly assign each of the data fragments produced by the cutting unit 42 to a processing node that has not yet received a data fragment.

Further, as shown in Fig. 5, the control node further comprises a determining unit 45.

The receiving unit 41 is further configured to receive the load information of each processing node.

The determining unit 45 is configured to determine, according to the load information received by the receiving unit 41, the processing node with the lowest load.

The allocation unit 43 is further configured to assign one of the data fragments produced by the cutting unit 42 to the processing node with the lowest load.

It should be noted that, for the specific implementation of the control node provided by this embodiment, reference may be made to the specific implementation of the method shown in Fig. 3; details are not repeated here. The control node may be a computer or another electronic device with computing capability.

With the control node provided by this embodiment, the control node receives the data to be processed, cuts it into multiple data fragments, distributes the fragments to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

With reference to the implementation of the methods shown in Fig. 2 and Fig. 3, and as shown in Fig. 6, the processing node provided by an embodiment of the present invention comprises:

A receiving unit 51, configured to receive a data fragment sent by the control node.

The data fragment is obtained by the control node cutting the data to be processed, and the data to be processed has not been grouped or sorted.

A processing unit 52, configured to process the data in the data fragment received by the receiving unit 51 to form sub-result data.

A sending unit 53, configured to send the sub-result data formed by the processing unit 52 to the control node.

It should be noted that the data fragment contains multiple lines of data.

As shown in Fig. 6, the processing unit 52 is specifically configured to:

Process the multiple lines of data in the data fragment line by line.

Specifically, as shown in Fig. 6, the sending unit 53 is further configured to:

Send load information to the control node, where the load information carries the load of the processing node.

It should be noted that, for the specific implementation of the processing node provided by this embodiment, reference may be made to the specific implementation of the method shown in Fig. 3; details are not repeated here. The processing node may be a computer or another electronic device with computing capability.

With the processing node provided by this embodiment, the processing node receives a data fragment, which the control node obtained by cutting the data to be processed without grouping or sorting it; the processing node processes the data fragment to form sub-result data and then sends the sub-result data to the control node. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

As shown in Fig. 7, the system for distributed parallel task processing provided by an embodiment of the present invention comprises a control node 61 and multiple processing nodes 62, where:

The control node 61 is configured to receive data to be processed, cut the data to be processed into multiple data fragments, and distribute the multiple data fragments to the multiple processing nodes 62 for processing;

Each processing node 62 is configured to receive a data fragment sent by the control node 61, process the data in the data fragment to form sub-result data, and send the sub-result data to the control node 61;

The control node 61 is further configured to receive the sub-result data produced by each processing node 62 and merge the sub-result data to form result data.

It should be noted that, for the specific implementation of the system for distributed parallel task processing provided by this embodiment, reference may be made to the specific implementation of the method shown in Fig. 3; details are not repeated here.

In the system for distributed parallel task processing provided by this embodiment, the control node receives the data to be processed, cuts it into multiple data fragments, distributes the fragments to multiple processing nodes for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, the control node must first group and sort the data after receiving it; in scenarios that do not require grouping and sorting, this adds complexity to the whole distributed parallel task processing system and slows processing down. Because the approach provided by the present invention does not group or sort the data to be processed, it can reduce the complexity of the whole system and increase processing speed.

From the above description of the embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present invention, or the part of them that contributes over the prior art, may essentially be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, hard disk or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The foregoing describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for distributed parallel task processing, characterized by comprising:

Receiving data to be processed;

Cutting the data to be processed into multiple data fragments;

Distributing the multiple data fragments to multiple processing nodes for processing;

Receiving the sub-result data produced by each processing node;

Merging the sub-result data to form result data.

2. the method for distributed parallel task according to claim 1 processing, is characterized in that, the described step that is multiple data fragmentations by described pending data cutting, comprising:

According to the quantity of described processing node, be multiple data fragmentations by described pending data cutting; Wherein, the quantity of described data fragmentation is identical with the quantity of described processing node.

3. the method for distributed parallel task according to claim 2 processing, is characterized in that, described described multiple data fragmentations is distributed to respectively to the step that multiple processing nodes are processed, and comprising:

Give a processing node that does not get data fragmentation by a data fragmentation Random assignment in described multiple data fragmentations.

4. the method for distributed parallel task according to claim 2 processing, is characterized in that, described described multiple data fragmentations is distributed to respectively to the step that multiple processing nodes are processed, and comprising:

Receive the load information of each processing node;

Determine according to described load information the processing node that load is minimum;

A data fragmentation in described multiple data fragmentations is distributed to the minimum processing node of described load.

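5. A method for distributed parallel task processing, characterized by comprising:

Receiving a data fragment sent by a control node, where the data fragment is obtained by the control node cutting data to be processed, and the data to be processed has not been grouped or sorted;

Processing the data in the data fragment to form sub-result data;

Sending the sub-result data to the control node.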

6. the method for distributed parallel task according to claim 5 processing, is characterized in that, described data fragmentation comprises multirow data.

7. the method for distributed parallel task according to claim 6 processing, is characterized in that, described data in described data fragmentation is processed, and forms the step of sub-result data, comprising:

Multirow data in described data fragmentation are processed line by line.

8. the method for distributed parallel task according to claim 5 processing, is characterized in that, described method also comprises:

Send the load information of self to described control node; Wherein, described load information carries the load of processing node.

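9. A control node, characterized by comprising:

A receiving unit, configured to receive data to be processed;

A cutting unit, configured to cut the data to be processed received by the receiving unit into multiple data fragments;

An allocation unit, configured to distribute the multiple data fragments to multiple processing nodes for processing;

The receiving unit being further configured to receive the sub-result data produced by each processing node;

A merging unit, configured to merge the sub-result data received by the receiving unit to form result data.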

10. The control node according to claim 9, characterized in that the cutting unit is configured to:

Cut the data to be processed received by the receiving unit into multiple data fragments according to the number of processing nodes, where the number of data fragments equals the number of processing nodes.

11. The control node according to claim 10, characterized in that the allocation unit is further configured to:

Randomly assign each of the multiple data fragments produced by the cutting unit to a processing node that has not yet received a data fragment.

12. The control node according to claim 10, characterized in that the control node further comprises a determining unit;

The receiving unit is further configured to receive the load information of each processing node;

The determining unit is configured to determine, according to the load information received by the receiving unit, the processing node with the lowest load;

The allocation unit is further configured to assign one of the multiple data fragments produced by the cutting unit to the processing node with the lowest load.

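13. A processing node, characterized by comprising:

A receiving unit, configured to receive a data fragment sent by a control node, where the data fragment is obtained by the control node cutting data to be processed, and the data to be processed has not been grouped or sorted;

A processing unit, configured to process the data in the data fragment received by the receiving unit to form sub-result data;

A sending unit, configured to send the sub-result data formed by the processing unit to the control node.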

14. The processing node according to claim 13, characterized in that the data fragment contains multiple lines of data.

15. The processing node according to claim 14, characterized in that the processing unit is configured to:

Process the multiple lines of data in the data fragment line by line.

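16. The processing node according to claim 13, characterized in that the sending unit is further configured to:

Send load information to the control node, where the load information carries the load of the processing node.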

17. A system for distributed parallel task processing, characterized by comprising a control node and multiple processing nodes, where: