CN107977444A - Parallel processing method for mass data based on big data - Google Patents

Parallel processing method for mass data based on big data

Info

Publication number
CN107977444A
Authority
CN
China
Prior art keywords
buffer queue
subtask
value
carry
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711306590.0A
Other languages
Chinese (zh)
Inventor
李垚霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Boruide Science & Technology Co Ltd
Original Assignee
Chengdu Boruide Science & Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Boruide Science & Technology Co Ltd
Priority to CN201711306590.0A
Publication of CN107977444A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2455 Query execution
    • G06F 16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2453 Query optimisation
    • G06F 16/24532 Query optimisation of parallel queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The present invention provides a parallel processing method for mass data based on big data. The method comprises: (1) obtaining the number of processes currently available on the parallel computing platform; (2) allocating and initializing, according to the available number of processes, a buffer queue for storing carries; (3) partitioning the computation task requiring summation into regions according to the available number of processes; (4) computing the task of each region in parallel across multiple processes and saving the final carry of each region in the corresponding entry of the buffer queue; (5) performing a unification operation on the per-region results using the updated buffer queue. Building on a multi-core computing platform, the present invention makes full use of the distributed parallel environment to improve computation speed.

Description

Parallel processing method for mass data based on big data
Technical field
The present invention relates to parallel computation, and more particularly to a parallel processing method for mass data.
Background technology
The analysis and processing of big data have become essential IT capabilities for enterprises. Because big data is large in volume, fast in velocity, and diverse in variety, and contains large amounts of heterogeneous unstructured data, its analysis, processing, and application remain very difficult. To address the big-data computation problem, software developers and researchers around the world have carried out extensive research and practice. In recent years, more and more developers have come to value software engineering; to reduce duplicated labor and improve software quality and code reuse, many excellent big-data computation libraries have appeared. However, the mathematical operation functions provided by existing big-data computation libraries implement only serial algorithms and are unsuited to multi-core distributed computing platforms, and no corresponding parallel computation on such platforms has yet been produced. One of the main problems to overcome in parallel computation is data dependence: the digit-by-digit summation in an accumulation algorithm is itself highly dependent from step to step.
The content of the invention
To solve the above problems of the prior art, the present invention proposes a parallel processing method for mass data based on big data, comprising:
(1) obtaining the number of processes currently available on the parallel computing platform;
(2) allocating and initializing, according to the available number of processes obtained in step (1), a buffer queue for storing carries, the number of entries of which is N;
(3) partitioning the computation task requiring summation into regions according to the available number of processes obtained in step (1), the number of subtasks (regions) corresponding one-to-one with the number of buffer-queue entries and being greater than or equal to the available number of processes;
(4) computing the task of each region in parallel across multiple processes using a dynamic scheduling strategy: a process that has finished its task is assigned a new subtask from the task pool formed by the subtasks; when requesting a subtask, each process must judge whether the current subtask is the last one: the last subtask calls the serial accumulation algorithm with special handling, while any other subtask directly invokes the serial accumulation algorithm on the current subtask; the final carry is then saved in the corresponding entry of the buffer queue of step (2), and the result is stored in the corresponding position of the output;
(5) performing a unification operation on the per-region results using the buffer queue updated in step (4). The detailed process is: traverse every entry of the buffer queue except entry N-1; if the carry value is zero, continue to the next entry; if it is non-zero, apply a +1 operation to the result from the start of the next region's result up to the most significant digit, stopping this traversal pass as soon as an added digit does not generate a new carry; after traversing every entry except entry N-1, update the carry flag of the most significant digit.
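Steps (1)-(5) can be sketched as follows. This is a minimal illustration, not the patented implementation: it represents numbers as little-endian limbs in an assumed base of 10^9, and Python threads stand in for the processes the method describes.

```python
from concurrent.futures import ThreadPoolExecutor

BASE = 10 ** 9  # limb base: 9 decimal digits per limb (an assumption)

def to_limbs(v):
    """Split a non-negative int into little-endian limbs."""
    limbs = []
    while v:
        v, r = divmod(v, BASE)
        limbs.append(r)
    return limbs or [0]

def add_region(xs, ys):
    """Serial accumulation over one region; returns (digits, final carry)."""
    out, carry = [], 0
    for a, b in zip(xs, ys):
        carry, s = divmod(a + b + carry, BASE)
        out.append(s)
    return out, carry

def parallel_add(x, y, nprocs=4):
    """Steps (1)-(5): cut the limbs into regions, sum each region in
    parallel, keep each region's final carry in a buffer queue, then
    unify the carries across region boundaries."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    size = -(-n // nprocs)                       # ceil: region size
    bounds = [(i, min(i + size, n)) for i in range(0, n, size)]
    with ThreadPoolExecutor(nprocs) as pool:     # threads stand in for processes
        parts = list(pool.map(lambda b: add_region(x[b[0]:b[1]], y[b[0]:b[1]]), bounds))
    result = [d for digits, _ in parts for d in digits]
    carries = [c for _, c in parts]              # the buffer queue
    final = carries[-1]
    for r, c in enumerate(carries[:-1]):         # step (5): unification
        i = bounds[r + 1][0]
        while c:                                 # +1 propagation stops with the carry
            if i == n:
                final += c
                break
            result[i] += c
            c, result[i] = divmod(result[i], BASE)
            i += 1
    return result, final
```

The unification loop mirrors the early exit of step (5): propagation into the next region stops as soon as a digit absorbs the carry without producing a new one.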
Compared with the prior art, the present invention has the following advantage:
The present invention proposes a parallel processing method for mass data based on big data that, on the basis of a multi-core computing platform, makes full use of the distributed parallel environment to improve computation speed.
Brief description of the drawings
Fig. 1 is a flow chart of the parallel processing method for mass data based on big data according to an embodiment of the present invention.
Embodiment
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but is not limited to any embodiment; its scope is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention; these details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of them.
One aspect of the present invention provides a parallel processing method for mass data based on big data. Fig. 1 is a flow chart of this method according to an embodiment of the invention. The specific implementation steps of the parallel operation method of the present invention are as follows:
(1) input two numbers x and y;
(2) obtain the sizes xs and ys of x and y (xs and ys may be positive or negative: a positive value indicates the number is positive, a negative value indicates it is negative), together with their absolute sizes abs_xs and abs_ys (abs_xs, abs_ys >= 0);
(3) compare abs_xs and abs_ys; if abs_xs is less than abs_ys, swap the two numbers, ensuring that the size of the augend is greater than or equal to that of the addend;
(4) allocate memory ws for the sum of the two numbers, of size abs_xs + 1;
(5) obtain the pointers xp and yp to the two numbers;
(6) select the corresponding operation according to the specific situation of the two numbers: when their signs differ, a subtraction operation is required and the sign of the result is determined by the sign of the first number; when their signs are the same, a summation operation is required and the sign of the result is likewise determined by the sign of the first number.
(7) when the sizes of x and y are identical, perform the digit-by-digit summation of x and y (the detailed process is given in step (8)) and return the final carry; when the sizes differ (xs > ys), perform the digit-by-digit summation of the low ys digits of x with y to obtain the carry value cy, add the digit of x at position ys to cy and place the sum in position ys of the result, assign the remaining digits of x into the result (possibly with further digit-by-digit carry propagation), and finally return the carry value cy;
(8) invoke the multi-core parallel scheme of the present invention to perform the digit-by-digit summation of x and y. The implementation steps are:
(8.1) obtain the number of processes currently available on the parallel computing platform;
(8.2) allocate and initialize, according to the available number of processes obtained in step (8.1), the buffer queue for storing carries, with N entries;
(8.3) partition the digit-by-digit summation task into regions according to the available number of processes obtained in step (8.1). The specific process is: (8.31) from the number m of processes currently available on the parallel computing platform and the number n of digit positions to be summed, derive the size of each ordinary subtask and the size of the special subtask; (8.32) declare and initialize an iteration variable to 0 and let it range from 0 to N-1; each iteration multiplies the iteration variable by the subtask size of step (8.31), and the resulting value is the starting point of the corresponding subtask, completing the region partition. The number of buffer-queue entries equals the number of subtasks but is not necessarily equal to the number of processes; when it equals the number of processes, one process corresponds to one subtask, but subtask IDs correspond to process IDs in a manner beyond static scheduling. In addition, entry 0 of the buffer queue holds the carry flag of task 0, entry 1 holds the carry flag of task 1, entry 2 holds the carry flag of task 2, and entry N-1 holds the carry flag of task N-1, where N is the number of buffer-queue entries.
(8.4) compute the task of each region in parallel across multiple processes using a dynamic scheduling strategy: a process that has finished its task is assigned a new subtask from the task pool formed by the subtasks. When requesting a subtask, each process must judge whether the current subtask is the last one: the last subtask requires a call to the serial summation operation with special handling, while any other subtask directly invokes the serial summation operation on the current subtask; the final carry is then saved in the corresponding entry of the buffer queue of step (8.2), and the result is stored in the corresponding position of the output. Because the regions are cut according to the number of processes, the subtask sizes differ, and the summation in the subtask corresponding to the most significant digits of the number differs from that in the other subtasks, so it requires special treatment: the special subtask covering the most significant digits is processed in the same way as the other subtasks except for its special part, but before each execution the subtask ID is checked to determine whether the region executed by the current process is the special subtask. The specific serial summation process is:
1) set the carry cr to zero and point the pointer variables xp and yp at the lowest digits of the two numbers;
2) fetch the value at the position the augend pointer refers to and advance the pointer, xl = *xp++; fetch the value at the position the addend pointer refers to and advance the pointer, yl = *yp++;
3) sum the fetched values of the current digit and store the result in the variable sl, sl = xl + yl;
4) judge whether the sum is less than the current digit of the augend, cy1 = sl < xl; cy1 = 1 means the summation carries into the next digit (cy1 holds the carry value), cy1 = 0 means it does not;
5) add the carry cr of the previous digit to the sum and store the result in rl, rl = sl + cr;
6) compare the value in rl with the value in sl, cy2 = rl < sl; cy2 = 1 means adding the carry caused a carry into the next digit (cy2 holds the carry value), cy2 = 0 means adding the carry did not carry into the next digit;
7) combine cy1 and cy2 to determine whether this digit carries into the next, cr = cy1 | cy2; cr = 1 means there is a carry into the next digit, cr = 0 means there is not;
8) store the value in rl into the space allocated for the result;
9) repeat steps 2)-8) until all digits of the addend have been processed, at which point the digit-by-digit summation is finished;
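The serial process of steps 1)-9) can be sketched in Python, simulating fixed-width machine words with a 64-bit mask (the text does not fix a word size, so the width is an assumption). The unsigned-comparison carry detection (cy1 = sl < xl, cy2 = rl < sl) follows the text directly.

```python
MASK = (1 << 64) - 1  # simulate 64-bit machine words (an assumption)

def add_n(xp, yp):
    """Steps 1)-9): word-wise addition of two equal-length little-endian
    word arrays, detecting carries with unsigned comparisons."""
    rp, cr = [], 0                   # step 1: clear the carry
    for xl, yl in zip(xp, yp):       # step 2: fetch the current words
        sl = (xl + yl) & MASK        # step 3: wrap-around word sum
        cy1 = 1 if sl < xl else 0    # step 4: carry out of xl + yl
        rl = (sl + cr) & MASK        # step 5: add the incoming carry
        cy2 = 1 if rl < sl else 0    # step 6: carry caused by adding cr
        cr = cy1 | cy2               # step 7: carry into the next word
        rp.append(rl)                # step 8: store the result word
    return rp, cr                    # step 9: done; cr is the final carry
```

At most one of cy1 and cy2 can be 1 in any step, so the OR in step 7 loses no information; this is the standard multi-precision addition loop.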
(8.5) perform a unification operation on the per-region carry results using the buffer queue updated in step (8.4). The specific process is:
(8.51) traverse every entry of the buffer queue except entry N-1; if the carry value is zero, continue to the next entry; if it is non-zero, apply a +1 operation to the result obtained in step (8.4), starting from the beginning of the next region's result and extending toward the most significant digit of the result, and stop this traversal pass as soon as an added digit does not generate a new carry;
(8.52) after traversing every entry except entry N-1, update the carry flag of the most significant digit. The value in entry N-1 of the buffer queue needs no traversal: it stores the final carry out of all digit positions of the number, and an exclusive-or operation with the result obtained after step (8.5) yields the final carry of the whole computation.
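The dynamic scheduling of step (8.4), in which an idle process pulls the next subtask from a shared task pool, can be illustrated with a small thread-based sketch (threads again stand in for the processes of the text; the subtask functions are hypothetical):

```python
import queue
import threading

def run_dynamic(subtasks, nworkers):
    """Dynamic scheduling sketch: idle workers pull the next subtask from
    a shared task pool, so a fast worker simply takes more subtasks."""
    pool = queue.Queue()
    for item in subtasks:            # (tid, fn) pairs form the task pool
        pool.put(item)
    results, lock = {}, threading.Lock()

    def worker():
        while True:
            try:
                tid, fn = pool.get_nowait()   # request the next subtask
            except queue.Empty:
                return                        # task pool drained
            value = fn()
            with lock:
                results[tid] = value          # per-subtask result slot

    workers = [threading.Thread(target=worker) for _ in range(nworkers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Unlike static scheduling, the mapping from subtask ID to worker is decided at run time, which is why step (8.4) must check the subtask ID to detect the special subtask.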
In the above parallel computing platform, to accomplish multi-core cooperative computing, the present invention extends the parallel programming interface so that parallel application programs can be executed in parallel on a multi-core distributed computing platform with calculate nodes of any type and any number.
The compiler automatically converts serial code into mixed heterogeneous parallel code. Specifically, the compiler first parses the statements to determine the compute kernels in the application that need acceleration and to obtain the necessary information related to cluster parallel computing, and then generates a device-specific compute core for each node participating in the parallel computation. Each compute core is a version of the compute kernel for a different processor.
The computation task is cut and reasonably allocated to multiple calculate nodes, and each calculate node then executes its device-specific compute core to complete the subtask allocated to it. For each compute kernel, control processes are created in a number equal to the devices participating in the parallel computation on the multi-core distributed computing platform: if p calculate nodes are available, p processes are created for control, and these p processes run on p dedicated heterogeneous processors, multiple processors being regarded as one calculate node. Process t_i (1 <= i <= p-1) executes a device-specific compute core: it first copies a portion of the input data from the master node, then starts all available processes to execute the allocated subtasks in parallel, and finally copies the computation results back to the master node. Meanwhile, process t_p executes the processor compute core on the remaining available heterogeneous processors. Specifically, process t_p spawns m*k - p + 1 worker processes to execute the subtasks allocated to the processors in parallel, where m is the number of processors used and k is the number of cores of each processor.
Considering that a distributed storage architecture is used between the nodes of the multi-core parallel environment and that inter-node communication latency exceeds intra-node latency, a coarse-grained parallelization method is used between nodes. Inter-node parallelism is realized as follows: the cluster task topology is first evenly subdivided into N first-level partitions, and each first-level partition is then allocated to one node for independent processing, where N is the total number of nodes participating in the parallel computation.
To improve data access speed, and because the level-three caches on different processors are relatively independent, a coarse-grained parallelization method is also used between processors. Inter-processor parallelism is established on the basis of the first-level partitions: each first-level partition is further evenly subdivided into K second-level partitions, and the second-level partitions derived from the same first-level partition are allocated to different processors of the same node for independent processing, where K is the total number of processors in a single node. Communication between different processors is thus confined to the same node as far as possible.
For the lowest-level inter-core parallel computing modules, a fine-grained parallel computation method is used. The computation task of each second-level partition is decomposed further: for each computation step of the partition, the loop structures within it are found and decomposed into a large number of mutually independent subtasks that can be executed independently, and each subtask is then allocated to one compute core of the multi-core processor for execution.
The MPI parallel programming model is used as the programming mode. The top layer of the whole parallel computation is built on MPI processes, each of which is responsible for processing one second-level partition, and the high concurrency of multi-core processors is fully exploited by spawning a large number of threads inside each MPI process. To realize hierarchical load balancing, the following parallel computation flow is used:
Step 1: generate by two-level partitioning the data files required for the parallel computation, including the elements, nodes, loads, boundary conditions, and adjacent-partition information of each partition;
Step 2: simultaneously start K MPI processes on each node, each process being responsible for reading the data file of one second-level partition;
Step 3: spawn T threads inside each MPI process and complete the accumulation computation of the respective partition using the multi-core resources, where T is the total number of compute cores in a single processor;
Step 4: spawn T threads inside each MPI process and complete the accumulated-carry computation using the multi-core resources;
Step 5: if further iteration is needed, jump back to Step 2 and resume execution; otherwise terminate.
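The two-level cut underlying this flow (N first-level partitions, one per node, each cut into K second-level partitions, one per MPI process) can be sketched as plain index-range arithmetic; `two_level_partition` is an illustrative helper, not part of the patent:

```python
def two_level_partition(n_items, n_nodes, k_procs):
    """Sketch of the two-level cut: split the task range evenly into
    n_nodes first-level partitions, then split each of those evenly into
    k_procs second-level partitions."""
    def split(lo, hi, parts):
        size, rem = divmod(hi - lo, parts)       # near-equal pieces
        out, start = [], lo
        for i in range(parts):
            end = start + size + (1 if i < rem else 0)
            out.append((start, end))
            start = end
        return out
    return [split(lo, hi, k_procs) for lo, hi in split(0, n_items, n_nodes)]
```

Each inner list corresponds to one node; each `(start, end)` range inside it is the slice one MPI process (and its T threads) would read and accumulate.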
To schedule tasks reasonably and efficiently among the calculate nodes and to reduce inter-node communication overhead, the present invention proposes a method for obtaining the calculation-amount cutting ratio. In cluster parallel computing, the whole computation task is considered complete only when every calculate node has finished its own work. Assuming that the execution time of each calculate node is proportional to the calculation amount allocated to it, the total execution time T_total of the cluster parallel computation is:
T_total = max(T_1 × R_1, T_2 × R_2, ..., T_N × R_N)
where N is the number of nodes participating in the parallel computation on the multi-core computing platform; T_i is the time the whole computation task would take if processed by the i-th (1 <= i <= N) calculate node alone; and R_i is the proportion of the total calculation amount allocated to the i-th calculate node. The most reasonable calculation-amount cut is obtained when all calculate nodes finish their work at the same moment, i.e., when the total execution time of the cluster parallel computation is minimized.
Once R_i is determined, the calculation amount W_i allocated to the i-th (1 <= i <= p) calculate node is W_i = W × R_i,
where W is the total calculation amount of the whole computation task.
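Under the stated assumption that execution time is proportional to allocated work, the formulas above can be checked numerically; `balanced_ratios` is an illustrative helper showing the cut the minimization implies (all nodes finishing together means R_i proportional to 1/T_i):

```python
def total_time(times, ratios):
    """T_total = max_i(T_i * R_i): the slowest node's share determines
    the overall execution time."""
    return max(t * r for t, r in zip(times, ratios))

def split_work(total_work, ratios):
    """W_i = W * R_i under cutting ratios R_i."""
    return [total_work * r for r in ratios]

def balanced_ratios(times):
    """Ratios that equalize T_i * R_i so every node finishes at the same
    moment: R_i proportional to 1 / T_i, normalized to sum to 1."""
    inv = [1.0 / t for t in times]
    s = sum(inv)
    return [v / s for v in inv]
```

For example, with single-node times of 10, 20, and 40, the balanced cut gives the fastest node four times the share of the slowest, and all three per-node times T_i × R_i coincide.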
In terms of computation management, the present invention sets the inter-node task scheduling strategy, the initial calculation-amount cutting ratio, and the process configuration in the extension statements. For a given cluster parallel computation, the extension statements inform the compiler of the following key information: the devices participating in the parallel computation, the inter-node task scheduling strategy, and the initial calculation-amount cutting ratio.
In terms of data management, the present invention uses extension statements to support partial transmission, incremental transmission, and asynchronous transmission. The statements notify the compiler which storage spaces require partial transmission, without specifying which partial data in a space must be uploaded to which calculate node or downloaded from which calculate node to the master node; the data cutting between the nodes is handled automatically by the task scheduling mechanism provided by the runtime system.
The compiler of the present invention takes as input a matrix multiplication program written with the extended parallel programming statements and translates it into a heterogeneous parallel matrix multiplication program, through the following four key steps:
Step 1: read the serial code containing the extended parallel programming statements and construct a syntax tree after syntactic analysis.
Step 2: obtain the information relevant to cluster parallel computing by parsing the extension statements and perform the related operations. Specifically:
(1) determine the compute kernels in the application that need acceleration; (2) set the number of nodes participating in the parallel computation and, for each participating node, designate the globally unique device ID, device type, device number, and device-specific compute core; (3) obtain the initial value, stop value, and step of the outermost loop of the compute kernel; (4) set the calculation-amount cutting ratio; (5) specify the inter-node task scheduling strategy and enable task scheduling.
Step 3: the compute-kernel generator generates a specific compute core for the particular heterogeneous processor of each node participating in the parallel computation.
Step 4: de-parse the converted syntax tree and ultimately generate the hybrid parallel code using parallel programming. The annotations and control structures of the source code before conversion are retained in the converted source code.
To adapt to different types of data-level parallel applications and support more flexible and efficient inter-node task scheduling, the present invention proposes, on the basis of a static policy, a scalable task scheduling strategy. The overall strategy dynamically adjusts the block size during the execution of a compute kernel according to the performance changes of the cluster parallel computation, in order to provide higher device utilization and lower scheduling overhead while keeping the calculation amount balanced among the nodes.
1/n of the calculation amount of the specified compute kernel (i.e., W/n) is taken as the initial block size, where the parameter n can be set manually by the programmer; the subsequent block sizes are then adjusted dynamically during the execution of the specified compute kernel according to the performance changes of the cluster parallel computation.
The specific steps are as follows:
Step 1: execute the 1st block cooperatively using the p calculate nodes; its size is W/n. Specifically:
(1) allocate a portion W_r.i of the 1st block's calculation amount to node D_i (1 <= i <= p) according to the initial cutting ratio R_i, where W_r.i = W_r × R_i and W_r = W/n; the initial cutting ratios are calculated from the theoretical peak performance of each node participating in the parallel computation.
(2) execute the device-specific compute core in node D_i to complete the calculation amount W_r.i allocated to it.
(3) after node D_i completes its allocated calculation amount, collect its current execution time T_r.i and compute its current execution speed V_r.i = W_r.i / T_r.i.
(4) after all p calculate nodes have completed their work, compute the relative execution speed RV_i of node D_i, RV_i = V_r.i / (V_r.1 + V_r.2 + ... + V_r.p). The relative execution speeds serve as the new calculation-amount cutting ratios, updated as follows: R_i = RV_i (1 <= i <= p).
(5) compute the current parallel execution speed V_r = W_r / T_r, where T_r = max(T_r.1, T_r.2, ..., T_r.p).
(6) update the completed calculation amount W_f and the remaining calculation amount W_r, where W_f = W_f + W_r and W_r = W - W_f.
Step 2: judge whether any calculation amount remains; if not, the specified compute kernel has finished executing. If so, execute the 2nd block cooperatively using the p calculate nodes, similarly to Step 1; its size is 2 × W/n. Specifically: (1) allocate the 2nd block's calculation amount to each participating node according to the cutting ratios obtained; (2) execute the device-specific compute core in each calculate node to complete the calculation amount allocated to it; (3) after each calculate node completes its work, collect its execution time, compute each node's relative execution speed, and update the cutting ratios accordingly; (4) compute the current parallel execution speed from the collected information; (5) adjust the size of the next block, i.e., determine the calculation amount to complete in the next step: by comparing the previous step's parallel execution speed V_p with the current parallel execution speed V_r, and the previous block's size W_p (the calculation amount completed in the previous step) with the current block's size W_r (the calculation amount completed in the current step), decide whether the next block size W_n should be double the current block size W_r, half of it, or unchanged; (6) update the completed and remaining calculation amounts.
Step 3: repeat Step 2 until the remaining calculation amount is 0.
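The block-by-block ratio update of Steps 1-3 can be simulated with hypothetical fixed node speeds. This sketch fixes the block growth at doubling, whereas the text chooses doubling, halving, or unchanged per step:

```python
def adaptive_schedule(W, n, speeds):
    """Scalable-strategy sketch: run the kernel block by block, re-deriving
    each node's cutting ratio from the speed measured on the previous
    block. Node speeds are simulated constants here."""
    p = len(speeds)
    ratios = [1.0 / p] * p                 # initial ratios: equal peaks assumed
    done, block = 0.0, W / n               # 1st block size is W/n
    while done < W - 1e-9:
        blk = min(block, W - done)
        # simulated per-node times T_r.i for work blk * R_i at given speed
        times = [blk * r / s for r, s in zip(ratios, speeds)]
        # V_r.i = W_r.i / T_r.i; normalized relative speeds become new ratios
        vs = [blk * r / t for r, t in zip(ratios, times)]
        total = sum(vs)
        ratios = [v / total for v in vs]   # R_i = RV_i
        done += blk
        block *= 2                         # 2nd block is 2 * W/n, and so on
    return ratios, done
```

After the first block the ratios already match the true speed proportions, so later, larger blocks run balanced; this is the intent of starting with a small probe block.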
In every iteration, before the current block is executed in D_i, the portion of the current block's data determined by the cutting ratio R_i is uploaded from the master node to D_i; after D_i finishes executing the current block, the processed portion of the current block's data determined by R_i is downloaded from D_i to the master node.
For the cluster parallel computation of some data-level parallel applications, it is very necessary to optimize inter-node data transmission; in particular, inter-node message transmission should be taken into account in the design of the dynamic task scheduling strategy. To avoid the redundant data transfers described above, the present invention devises a differential data transmission method suitable for data-level parallel applications with one or more compute kernels that must be executed repeatedly. Specifically, in the first execution of a specified compute kernel, the whole calculation amount of that kernel is allocated to the nodes participating in the parallel computation according to the initial cutting ratios; after each calculate node completes its work, the current execution time of each node is collected to compute new cutting ratios. Each subsequent execution of the kernel is similar to the first, except that from the second execution onward the whole calculation amount is allocated to the participating nodes according to the cutting ratios updated after the previous execution of the kernel.
It need to be uploaded in which definite partial data from host node or during be downloaded to host node, including following step Suddenly:
Step 1:Determine which partial data need to be passed specified between calculate node and host node in designated storage area It is defeated.According to the initial value of the outermost loop of specified calculating kernel and stop value and the calculation amount used in current perform Cutting ratio, from designated storage area retrieve and determine one the calculating kernel currently perform in need to be under specified calculate node Be loaded onto the first subarray of host node, and record respectively the subarray in designated storage area start to index and terminate index; Cut according to the initial value of the outermost loop of the calculating kernel and stop value and the calculation amount used in perform next time Divide ratio, retrieved again from designated storage area and determine that a next time in the calculating kernel need to upload in performing from host node To the second subarray of specified calculate node, and the subarray starting to index and terminate rope in designated storage area is recorded respectively Draw.
Step 2: Determine, by comparing the subarrays obtained in Step 1, which portion of the data in the designated storage area needs to be transferred between the designated compute node and the master node. If all or part of the data in the first subarray is needed by the designated compute node during the next execution of the kernel, that data need not be downloaded from the designated compute node to the master node during the current execution of the kernel; otherwise, it must be downloaded to the master node during the current execution. If all or part of the data in the second subarray is already present in the memory of the designated compute node, that data need not be uploaded from the master node to the designated compute node during the next execution of the kernel; otherwise, it must be uploaded from the master node to the designated compute node during the next execution.
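The comparison in Steps 1 and 2 amounts to interval arithmetic on index ranges. A minimal sketch, assuming half-open `(start, end)` index ranges (the helper name is hypothetical, not an API of the patent's runtime system):

```python
# Illustrative sketch: given the half-open [start, end) index range a
# compute node already holds and the range it needs next, only the part
# of the needed range not covered by the held range must be transferred.
def ranges_to_transfer(held, needed):
    """held, needed: (start, end) half-open index ranges.
    Returns the sub-ranges of `needed` not covered by `held`."""
    h0, h1 = held
    n0, n1 = needed
    missing = []
    if n0 < h0:                      # needed data below the held range
        missing.append((n0, min(n1, h0)))
    if n1 > h1:                      # needed data above the held range
        missing.append((max(n0, h1), n1))
    return missing
```

For example, if a node holds indices [0, 500) and the next execution needs [400, 900), only [500, 900) has to be uploaded from the master node; if it needs [100, 400), no transfer is required at all.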
The runtime system of the present invention provides APIs to obtain the start and end indices of the one or two subarrays in a designated storage area that need to be uploaded from the master node to a designated compute node, and likewise to obtain the start and end indices of the one or two subarrays in a designated array that need to be downloaded from the designated compute node to the master node.
Preferably, the present invention cuts the whole iteration space of a compute kernel into multiple blocks of equal or unequal size and executes these blocks concurrently and cooperatively on multiple compute nodes of a multi-core distributed computing platform, so that data upload and data download proceed in parallel with the execution of the compute kernel. Three threads are used to overlap data transfer with computation: the first thread asynchronously uploads all or part of the data of the next block from the master node to the designated compute node; the second thread asynchronously executes the current block on the designated compute node; and the third thread asynchronously downloads all or part of the data of the previous block from the designated compute node to the master node.
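The three-thread overlap described above can be modeled as follows (a simplified sketch, not the patent's runtime: transfer and compute callbacks are user-supplied, and the synchronous upload of the very first block is omitted for brevity):

```python
# Simplified model of the three-thread pipeline: per block, one thread
# computes the current block while a second prefetches the next block and
# a third drains the previous one; joining all three acts as the barrier
# before advancing to the next block.
import threading

def run_pipeline(n_blocks, upload, compute, download):
    """upload/compute/download: user-supplied callables taking a block id."""
    for k in range(n_blocks):
        workers = [threading.Thread(target=compute, args=(k,))]
        if k + 1 < n_blocks:                # thread 1: upload next block
            workers.append(threading.Thread(target=upload, args=(k + 1,)))
        if k >= 1:                          # thread 3: download previous block
            workers.append(threading.Thread(target=download, args=(k - 1,)))
        for w in workers:
            w.start()
        for w in workers:
            w.join()                        # barrier before the next block
```

While block k executes, block k+1 is already in flight toward the node and block k-1 is already draining back to the master node, so transfer latency is hidden behind computation whenever a block's compute time dominates its transfer time.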
Correspondingly, the task scheduling strategy performs scheduling using the following steps:
Step 1: Execute the 1st block in parallel using p compute nodes. Specifically: (1) according to the workload cutting ratio Rr.i, assign a part Wr.i of the workload of the 1st block to node Di (1 ≤ i ≤ p), where Wr.i = Wr × Rr.i, Wr = W/n (the initial block size), and Rr.i = Ri (the initial workload cutting ratio). (2) According to the workload cutting ratio Rn.i, pre-assign a part Wn.i of the workload of the next block (the 2nd block) to node Di, where Wn.i = Wn × Rn.i, Wn = W/n, and Rn.i = Ri. (3) Execute the device-specific compute kernel on node Di to complete the workload Wr.i assigned to it. If node Di is a compute node, then before the 1st block is executed on node Di, a part of the data of the 1st block is synchronously uploaded from the master node to node Di according to the workload cutting ratio Rr.i; while the 1st block is asynchronously executed on node Di, a part of the data of the 2nd block is asynchronously uploaded from the master node to node Di according to the workload cutting ratio Rn.i. (4) After node Di completes the workload assigned to it, collect the current execution time Tr.i of node Di and compute the current execution speed Vr.i of node Di, where Vr.i = Wr.i / Tr.i. (5) After all p compute nodes have completed their work, compute the relative execution speed RVi of node Di, where RVi = Vr.i / (Vr.1 + Vr.2 + … + Vr.p); update the workload cutting ratio Rnn.i used to cut the block after next (the 3rd block), where Rnn.i = RVi (1 ≤ i ≤ p); and compute the current parallel execution speed Vr, where Vr = Wr / Tr and Tr = max(Tr.1, Tr.2, …, Tr.p). (6) Update the completed workload Wf and the remaining workload Wr, where Wf = Wf + Wr and Wr = W − Wf − Wn.
Step 2: If there remains unprocessed workload or pre-assigned work, execute the 2nd block in parallel using the p compute nodes. Specifically: (1) according to the workload cutting ratio updated after the previous block (the 1st block) finished executing, pre-assign the workload of the next block (the 3rd block) to each node participating in the parallel computation. (2) Execute the device-specific compute kernel on each compute node to complete the workload of its current block (the 2nd block). If node Di (1 ≤ i ≤ p) is a compute node, then thread 0 asynchronously uploads a part of the data of the next block from the master node to node Di according to the workload cutting ratio Rn.i, thread 1 asynchronously executes the current block on node Di according to the workload cutting ratio Rr.i, and thread 2 asynchronously downloads a part of the data of the processed previous block from node Di to the master node according to the workload cutting ratio Rp.i, where Rp.i, Rr.i and Rn.i denote the shares of node Di in the workload distribution of the previous block, the current block and the next block, respectively. (3) After each compute node completes its work, collect the execution time of each compute node, compute the relative execution speed of each node, update the workload cutting ratio used to cut the block after next (the 4th block), and compute the current parallel execution speed. (4) Adjust the size of the next block according to the difference between the previous parallel execution speed and the current parallel execution speed, and the change from the previous block size to the current block size. (5) Update the completed workload and the remaining workload.
Step 3: Repeat Step 2 until the remaining workload is 0 or the pre-assigned work has been processed.
If the current block is the last block, then while thread 1 asynchronously executes the last block on node Di according to the workload cutting ratio Rr.i, thread 2 asynchronously downloads a part of the data of the processed penultimate block from node Di to the master node according to the workload cutting ratio Rp.i; after node Di completes its own work, a part of the data of the processed last block is synchronously downloaded from node Di to the master node according to the workload cutting ratio Rr.i.
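Steps 1 through 3 above can be condensed into the following loop (an illustrative sketch under the assumptions of a fixed block size and no block-size adaptation; `run_block` is a hypothetical stand-in for executing one block in parallel and returning each node's measured time):

```python
# Illustrative condensation of Steps 1-3: process blocks until the
# remaining workload reaches zero, re-cutting each block by the relative
# speeds measured on the previous one.
def schedule_blocks(total_work, n_blocks, ratios, run_block):
    """run_block(shares) executes one block and returns per-node times."""
    block = total_work / n_blocks
    done = 0.0
    while done < total_work:
        size = min(block, total_work - done)
        shares = [size * r for r in ratios]          # cut the current block
        times = run_block(shares)                    # parallel execution
        speeds = [s / t for s, t in zip(shares, times)]
        ratios = [v / sum(speeds) for v in speeds]   # update cutting ratios
        done += size
    return ratios

# With two nodes where the second is three times as fast, the cutting
# ratios settle at 0.25 / 0.75 after the first block.
final = schedule_blocks(100.0, 4, [0.5, 0.5],
                        lambda shares: [shares[0] / 1.0, shares[1] / 3.0])
```

The real method additionally overlaps the transfers of adjacent blocks with the three-thread pipeline and adapts the block size between iterations; this sketch keeps only the ratio-update feedback loop.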
In conclusion the present invention proposes a kind of mass data method for parallel processing based on big data, calculated in multinuclear On the basis of platform, distributed parallel environment is made full use of to improve arithmetic speed.
Obviously, those skilled in the art should appreciate that each of the above modules or steps of the present invention can be implemented by a general-purpose computing system; they may be concentrated in a single computing system or distributed over a network formed by multiple computing systems; alternatively, they may be implemented with program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.
It should be appreciated that the above embodiments of the present invention are used only for exemplary illustration or explanation of the principle of the present invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundary of the appended claims, or the equivalents of such scope and boundary.

Claims (3)

  1. A parallel processing method for massive data based on big data, characterized by comprising:
    (1) obtaining the number of processes currently available on a parallel computing platform;
    (2) according to the number of available processes obtained in step (1), allocating and initializing a buffer queue for storing carries, the number of whose entries is N;
    (3) according to the number of available processes obtained in step (1), cutting the computation task that requires a sum operation into regions, wherein the number of subtasks in the regions corresponds one-to-one to the number of buffer queue entries and is greater than or equal to the number of available processes;
    (4) using a dynamic scheduling strategy, computing the calculation tasks of each region in parallel with multiple processes, wherein the process that first finishes its task is assigned one subtask from the task pool formed by the subtasks; when requesting a subtask, each process judges whether the current subtask is the last subtask, and in either case calls the serial accumulation algorithm to compute the current subtask; the final carry value is then saved in the corresponding entry of the buffer queue of step (2), and the result is stored in the corresponding position of the overall result;
    (5) performing a unification operation on the results of the regions using the buffer queue updated in step (4), the detailed process being: traversing each value of the buffer queue except entry N-1; if the carry value is zero, continuing to the next entry; if the value is non-zero, performing an add-1 operation on the result obtained in step (4), starting from the next region's result and propagating toward the highest digit of the result, and breaking out of this propagation as soon as the add-1 operation no longer produces a carry; after traversing every value of the buffer queue except entry N-1, updating the carry flag of the highest position.
  2. The method according to claim 1, characterized in that: the number of entries of the buffer queue in step (2) is equal to the number of subtasks; when the number of entries of the buffer queue is also equal to the number of processes, each process corresponds to one subtask, and subtask IDs correspond one-to-one to process IDs in the manner of static scheduling; the value of entry 0 of the buffer queue corresponds to the carry flag of task 0, the value of entry 1 corresponds to the carry flag of task 1, the value of entry 2 corresponds to the carry flag of task 2, and the value of entry N-1 corresponds to the carry flag of task N-1, where N is the number of entries of the buffer queue in step (2) of claim 1.
  3. The method according to claim 1, characterized in that: the value of entry N-1 of the buffer queue in step (5) does not require a traversal operation, since it stores the final carry of all digits of the positive numbers; only an XOR operation with the result after step (5) is performed on it, yielding the final carry value of the whole computation.
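The segmented addition of claims 1-3 can be sketched as follows (a hedged, serialized illustration: in the claimed method the region sums run as parallel subtasks under dynamic scheduling, and all names here are illustrative):

```python
# Hedged illustration of claims 1-3: each region's sum records its outgoing
# carry in a "buffer queue"; a unification pass then propagates each carry
# into the more significant neighbouring region, stopping as soon as the
# add-1 no longer produces a carry.
BASE = 10

def parallel_add(a, b, n_regions):
    """a, b: equal-length digit lists, most significant digit first;
    len(a) must be divisible by n_regions. Returns (digits, final_carry)."""
    size = len(a) // n_regions
    result = [0] * len(a)
    carries = [0] * n_regions                  # the buffer queue
    for r in range(n_regions):                 # each region is one subtask
        lo, hi = r * size, (r + 1) * size
        carry = 0
        for i in range(hi - 1, lo - 1, -1):    # serial accumulation
            s = a[i] + b[i] + carry
            result[i] = s % BASE
            carry = s // BASE
        carries[r] = carry                     # outgoing carry of the region
    # unification pass: fold each region's carry into the region above it
    for r in range(n_regions - 1, 0, -1):
        carry = carries[r]
        i = r * size - 1
        while carry and i >= (r - 1) * size:
            s = result[i] + carry
            result[i] = s % BASE
            carry = s // BASE
            i -= 1
        carries[r - 1] += carry  # carry that rippled through a whole region
    return result, carries[0]
```

Because each region is summed with an incoming carry of zero, the subtasks are independent and can be dispatched to processes in any order; only the short unification pass is sequential.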
CN201711306590.0A 2017-12-11 2017-12-11 Mass data method for parallel processing based on big data Pending CN107977444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711306590.0A CN107977444A (en) 2017-12-11 2017-12-11 Mass data method for parallel processing based on big data

Publications (1)

Publication Number Publication Date
CN107977444A true CN107977444A (en) 2018-05-01

Family

ID=62009864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711306590.0A Pending CN107977444A (en) 2017-12-11 2017-12-11 Mass data method for parallel processing based on big data

Country Status (1)

Country Link
CN (1) CN107977444A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461469A (en) * 2014-11-14 2015-03-25 成都卫士通信息产业股份有限公司 Method for achieving SM2 algorithm through GPU in parallelization mode
CN104699449A (en) * 2015-04-03 2015-06-10 中国科学院软件研究所 GMP (GNU multiple precision arithmetic library) based big integer addition and subtraction multinuclear parallelization implementation method
CN104793922A (en) * 2015-05-04 2015-07-22 中国科学院软件研究所 Parallel realization method for large-integer multiplication Comba algorithm on basis of OpenMP

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549583A (en) * 2018-04-17 2018-09-18 成都致云科技有限公司 Big data processing method, device, server and readable storage medium storing program for executing
CN108549583B (en) * 2018-04-17 2021-05-07 致云科技有限公司 Big data processing method and device, server and readable storage medium
CN110232092A (en) * 2019-04-26 2019-09-13 平安科技(深圳)有限公司 A kind of asynchronous solution of batch data based on data processing and relevant device
CN110232092B (en) * 2019-04-26 2023-08-01 平安科技(深圳)有限公司 Batch data asynchronous solving method based on data processing and related equipment
CN110515990A (en) * 2019-07-23 2019-11-29 华信永道(北京)科技股份有限公司 Data query methods of exhibiting and inquiry display systems
CN110515990B (en) * 2019-07-23 2021-10-01 华信永道(北京)科技股份有限公司 Data query display method and query display system
CN112527541A (en) * 2019-09-19 2021-03-19 华为技术有限公司 Method for determining fault calculation core in multi-core processor and electronic equipment
US11815990B2 (en) 2019-09-19 2023-11-14 Huawei Technologies Co., Ltd. Method for determining faulty computing core in multi-core processor and electronic device
CN112039969A (en) * 2020-08-26 2020-12-04 浪潮云信息技术股份公司 AWS 3 URL uploading method based on Redis distributed lock development
CN112039969B (en) * 2020-08-26 2022-04-08 浪潮云信息技术股份公司 AWS 3 URL uploading method based on Redis distributed lock development
CN113821506A (en) * 2020-12-23 2021-12-21 京东科技控股股份有限公司 Task execution method, device, system, server and medium for task system

Similar Documents

Publication Publication Date Title
CN107977444A (en) Mass data method for parallel processing based on big data
Lacoste et al. Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
CN111708641A (en) Memory management method, device and equipment and computer readable storage medium
CN108108242A (en) Accumulation layer intelligence distribution control method based on big data
Kamthe et al. A stochastic approach to estimating earliest start times of nodes for scheduling DAGs on heterogeneous distributed computing systems
US8941674B2 (en) System and method for efficient resource management of a signal flow programmed digital signal processor code
Liu Parallel and scalable sparse basic linear algebra subprograms
CN106371924A (en) Task scheduling method for maximizing MapReduce cluster energy consumption
CN108062249A (en) High in the clouds data allocation schedule method based on big data
Collier et al. Experiences in developing a distributed agent-based modeling toolkit with Python
CN103810041A (en) Parallel computing method capable of supporting dynamic compand
Obaida et al. Simulation of HPC job scheduling and large-scale parallel workloads
Plauth et al. Using dynamic parallelism for fine-grained, irregular workloads: a case study of the n-queens problem
Tan et al. GPUPool: A holistic approach to fine-grained gpu sharing in the cloud
Steinberger On dynamic scheduling for the gpu and its applications in computer graphics and beyond
Plauth et al. A performance evaluation of dynamic parallelism for fine-grained, irregular workloads
CN107967335A (en) A kind of distributed SQL processing method and system
CN110415162B (en) Adaptive graph partitioning method facing heterogeneous fusion processor in big data
CN109597619A (en) A kind of adaptive compiled frame towards heterogeneous polynuclear framework
Jeannot et al. Experimenting task-based runtimes on a legacy Computational Fluid Dynamics code with unstructured meshes
Chandrashekhar et al. Prediction Model for Scheduling an Irregular Graph Algorithms on CPU–GPU Hybrid Cluster Framework
Gopalakrishnan Menon Adaptive load balancing for HPC applications
Lu et al. Enabling low-overhead communication in multi-threaded openshmem applications using contexts
Kosiachenko Efficient GPU Parallelization of the Agent-Based Models Using MASS CUDA Library
Vo et al. Streaming-enabled parallel data flow framework in the visualization toolkit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180501