CN107977444A - Parallel processing method for massive data based on big data - Google Patents
- Publication number: CN107977444A (application CN201711306590.0A)
- Authority
- CN
- China
- Prior art keywords
- buffer queue
- subtask
- value
- carry
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
Abstract
The present invention provides a parallel processing method for massive data based on big data. The method includes: (1) obtaining the number of processes currently available on the parallel computing platform; (2) allocating and initializing, according to the available number of processes, a buffer queue for storing carries; (3) cutting the task requiring summation into regions according to the available number of processes; (4) computing the task of each region in parallel across multiple processes, and saving the final carry value of each region into the corresponding entry of the buffer queue; (5) performing a unification operation on the results of all regions using the updated buffer queue. The present invention proposes a parallel processing method for massive data based on big data which, on the basis of a multi-core computing platform, makes full use of the distributed parallel environment to increase computation speed.
Description
Technical field
The present invention relates to parallel computing, and in particular to a parallel processing method for massive data.
Background art
The analysis and processing of big data has become an essential IT capability for enterprises. Because big data is large in volume, fast in velocity, and diverse in variety, and contains large amounts of heterogeneous unstructured data, analyzing and applying it remains very difficult. To address the problem of big data computation, software developers and researchers around the world have carried out much research and practice. In recent years, more and more developers have come to value software engineering; to reduce duplicated labor and to improve software quality and code reuse, many excellent big data computing libraries have appeared. However, existing big data computing libraries provide mathematical operation functions that implement only serial algorithms, which are unsuitable for multi-core distributed computing platforms, and no related parallel computation on multi-core distributed computing platforms has yet been produced. One of the main problems to overcome in parallel computation is data dependence: in an accumulation algorithm, the digit-wise summation itself carries a strong dependence between digits.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a parallel processing method for massive data based on big data, including:
(1) obtaining the number of processes currently available on the parallel computing platform;
(2) allocating and initializing, according to the available number of processes obtained in step (1), a buffer queue for storing carries, whose number of entries is N;
(3) cutting the task requiring summation into regions according to the available number of processes obtained in step (1); the number of subtasks (regions) corresponds one-to-one with the number of buffer-queue entries, and is greater than or equal to the available number of processes;
(4) computing the task of each region in parallel across multiple processes using a dynamic scheduling strategy: the process that finishes its task first takes the next subtask from the task pool formed by the subtasks; when taking a subtask, each process must judge whether the current subtask is the last one — if it is the last subtask, the serial accumulation algorithm is called with special treatment, otherwise the serial accumulation algorithm is invoked directly to compute the current subtask — and the final carry value is then saved into the corresponding entry of the buffer queue of step (2), while the result is stored into the corresponding position of the result;
(5) performing a unification operation on the results of all regions using the buffer queue updated in step (4); the detailed process is: traverse every value of the buffer queue except entry N-1; if the carry value is zero, continue to the next entry; if the value is non-zero, perform a +1 operation on the whole region of the result obtained in step (4), starting from the next region's result up to the highest digit of the result, and when during this +1 operation the new carry is not 1, break out of this traversal pass; after traversing every value except entry N-1, update the carry flag of the highest digit.
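Under the assumption of base-2**16 digits stored least-significant first, the five steps above can be sketched in a single-process simulation (the names and the base are illustrative, not from the patent; step (4) runs serially here rather than across real processes):

```python
BASE = 1 << 16

def region_parallel_add(x, y, n_regions):
    """Add two equal-length digit lists by cutting them into regions (step 3),
    summing each region independently while recording its carry-out in a
    buffer queue (step 4), then propagating the recorded carries (step 5)."""
    n = len(x)
    size = (n + n_regions - 1) // n_regions          # step 3: region cutting
    bounds = [(i * size, min((i + 1) * size, n)) for i in range(n_regions)]
    result = [0] * n
    carry_queue = [0] * n_regions                    # step 2: buffer queue
    for r, (lo, hi) in enumerate(bounds):            # step 4, serial here
        carry = 0
        for i in range(lo, hi):
            s = x[i] + y[i] + carry
            result[i] = s % BASE
            carry = s // BASE
        carry_queue[r] = carry
    final_carry = carry_queue[-1]                    # carry out of the top region
    for r in range(n_regions - 1):                   # step 5: unification
        if carry_queue[r] == 0:
            continue
        i = bounds[r][1]                             # first digit of next region
        while i < n and result[i] == BASE - 1:
            result[i] = 0                            # +1 ripples through
            i += 1
        if i < n:
            result[i] += 1
        else:
            final_carry ^= 1                         # rippled out of the top
    return result, final_carry
```

Each region depends on its neighbors only through a single recorded carry, which is what makes the per-region sums independent.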
Compared with the prior art, the present invention has the following advantage: it proposes a parallel processing method for massive data based on big data which, on the basis of a multi-core computing platform, makes full use of the distributed parallel environment to increase computation speed.
Brief description of the drawings
Fig. 1 is a flow chart of the parallel processing method for massive data based on big data according to an embodiment of the present invention.
Embodiment
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawing that illustrates the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment; the scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention; these details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a parallel processing method for massive data based on big data. Fig. 1 is a flow chart of the parallel processing method for massive data based on big data according to an embodiment of the invention. The concrete implementation steps of the parallel operation method of the present invention are as follows:
(1) input two numbers x and y;
(2) obtain the signed sizes xs and ys of x and y (xs and ys may be positive or negative: positive indicates the number is positive, negative indicates the number is negative), and the absolute sizes abs_xs and abs_ys of x and y (abs_xs, abs_ys >= 0);
(3) compare abs_xs and abs_ys; if abs_xs is less than abs_ys, swap the two numbers, ensuring that the size of the augend is greater than or equal to the size of the addend;
(4) allocate memory ws for the sum of the two numbers, of size abs_xs + 1;
(5) obtain the pointers xp and yp of the two numbers;
(6) select the corresponding operation according to the concrete situation of the two numbers: if the signs of the two numbers differ, a subtraction operation is required, and the sign of the result is determined by the sign of the first number; if the signs of the two numbers are the same, a summation operation is required, and the sign of the result is likewise determined by the sign of the first number;
(7) if the sizes of the two numbers x and y are identical, perform the digit-wise summation of x and y (the detailed process is given in step (8)) and return the final carry flag; if the sizes of x and y differ (xs > ys), perform the digit-wise summation of the low ys digits of x with y to obtain the carry value cy, then add the value in digit ys of x to cy and place it into digit ys of the result (this operation may itself produce a digit-wise carry), copy the remaining digits of x into the result, and finally return the carry value cy;
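The dispatch in steps (1)-(7) resembles a signed-magnitude big-integer addition; a minimal sketch under that assumption, showing only the same-sign (summation) branch with illustrative names and a base-2**16 digit representation:

```python
BASE = 1 << 16

def add_dispatch(x, y):
    """Numbers are (sign, digits) pairs, digits least-significant first.
    Ensure the augend has at least as many digits as the addend (step 3),
    then branch on the signs (step 6); only same-sign addition is sketched,
    the different-sign branch would subtract magnitudes."""
    (xsign, xd), (ysign, yd) = x, y
    if len(xd) < len(yd):                     # step (3): swap so augend is longer
        (xsign, xd), (ysign, yd) = (ysign, yd), (xsign, xd)
    if xsign != ysign:
        raise NotImplementedError("different signs: subtraction branch")
    ws = [0] * (len(xd) + 1)                  # step (4): result of size abs_xs + 1
    carry = 0
    for i in range(len(yd)):                  # step (7): low ys digits of x with y
        s = xd[i] + yd[i] + carry
        ws[i], carry = s % BASE, s // BASE
    for i in range(len(yd), len(xd)):         # remaining digits of x, plus carry
        s = xd[i] + carry
        ws[i], carry = s % BASE, s // BASE
    ws[len(xd)] = carry                       # final carry flag cy
    return (xsign, ws)
```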
(8) call the multi-core parallel scheme of the present invention to perform the digit-wise summation of x and y; the implementation steps are:
(8.1) obtain the number of processes currently available on the parallel computing platform;
(8.2) according to the available number of processes obtained in step (8.1), allocate and initialize a buffer queue for storing carries, whose number of entries is N;
(8.3) according to the available number of processes obtained in step (8.1), cut the computing task that requires digit-wise summation into regions. The concrete process is: (8.31) from the number m of currently available processes on the parallel computing platform and the number n of digit-wise summations required, obtain the size of each ordinary subtask and the size of the special subtask; (8.32) declare and initialize an iteration variable to 0, let the iteration variable take the values 0 to N-1, and at each iteration multiply the iteration variable by the subtask size obtained in step (8.31); the resulting value is the starting point of each subtask, which completes the region cutting. The number of entries of the buffer queue equals the number of subtasks but is not necessarily equal to the number of processes; when it equals the number of processes, one process corresponds to one subtask, but subtask IDs correspond to process IDs in a manner beyond static scheduling. In addition, the value of entry 0 of the buffer queue corresponds to the carry flag of task 0, the value of entry 1 corresponds to the carry flag of task 1, the value of entry 2 corresponds to the carry flag of task 2, and so on, with the value of entry N-1 corresponding to the carry flag of task N-1, where N is the number of entries of the buffer queue.
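One plausible reading of the cutting in step (8.3), with N = m and the last ("special") subtask absorbing the remainder — the patent does not fix the exact formula, so this sketch is an assumption:

```python
def cut_regions(n, m):
    """Cut n digit-wise additions into m subtasks: m-1 ordinary subtasks of
    equal size plus one special subtask that takes whatever remains
    (step 8.31); starting points come from iteration * size (step 8.32)."""
    size = n // m                                     # ordinary subtask size
    starts = [i * size for i in range(m)]             # step (8.32)
    sizes = [size] * (m - 1) + [n - size * (m - 1)]   # special (last) subtask
    return starts, sizes
```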
(8.4) use a dynamic scheduling strategy to compute the task of each region in parallel across multiple processes: the process that finishes its task first takes the next subtask from the task pool formed by the subtasks; when taking a subtask, each process must judge whether the current subtask is the last one — if it is the last subtask, the serial summation operation must be called with special treatment, otherwise the serial summation operation is invoked directly to compute the current subtask — and the final carry value is then saved into the corresponding entry of the buffer queue of step (8.2), while the result is stored into the corresponding position of the result. Because the task is cut into regions according to the number of processes, the resulting subtasks differ in size, and the summation in the subtask corresponding to the highest digit of the number differs from the summation in the other subtasks in its number of operands, so it requires special treatment: this special subtask corresponding to the highest digit is processed in the same way as the other subtasks apart from the special handling, but before each execution the subtask ID corresponding to the subtask is used to judge whether the region executed by the current process is the special subtask. The concrete serial process of the summation operation is:
1) set the carry cr to zero, and point the pointer variables xp and yp at the lowest digits of the two numbers;
2) fetch the value at the position pointed to by the augend pointer and advance the pointer to the next digit, xl = *xp++; fetch the value at the position pointed to by the addend pointer and advance the pointer to the next digit, yl = *yp++;
3) sum the fetched values of the current digit and store the result in variable sl, sl = xl + yl;
4) judge whether the summed result of the previous step is less than the value of the current digit of the augend, cy1 = sl < xl; cy1 = 1 indicates that the summation carried into the next digit, and cy1 preserves this carry value; cy1 = 0 indicates that the summation produced no carry into the next digit;
5) add the carry value cy of the previous digit to the summed result and store the result in rl, rl = sl + cy;
6) compare the final value in rl with the value in sl, cy2 = rl < sl; cy2 = 1 indicates that adding the carry to the summed result caused a carry into the next digit, and cy2 preserves this carry value; cy2 = 0 indicates that after adding the carry there is no carry into the next digit;
7) obtain from cy1 and cy2 whether this digit's operation carries into the next digit, cr = cy1 | cy2; cr = 1 indicates a carry into the next digit, cr = 0 indicates no carry into the next digit;
8) store the final value in rl into the space for storing the result;
9) repeat the contents of steps 2)-8) until all digits of the addend have been processed, at which point the digit-wise summation operation is finished;
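Steps 1)-9) can be sketched directly, emulating fixed-width unsigned limbs with a mask so that the two comparison-based carry tests (cy1 = sl < xl, cy2 = rl < sl) behave exactly as described; the 64-bit limb width is an assumption:

```python
MASK = (1 << 64) - 1   # 64-bit limbs; the width is an illustrative assumption

def serial_add(xp, yp):
    """Digit-wise sum of two equal-length limb lists, least-significant
    first; returns (result limbs, final carry)."""
    cy = 0                          # step 1): carry starts at zero
    out = []
    for xl, yl in zip(xp, yp):      # steps 2) and 9): one pass per limb
        sl = (xl + yl) & MASK       # step 3): wrap like fixed-width hardware
        cy1 = 1 if sl < xl else 0   # step 4): overflow in xl + yl
        rl = (sl + cy) & MASK       # step 5): add the previous carry
        cy2 = 1 if rl < sl else 0   # step 6): overflow from adding the carry
        cy = cy1 | cy2              # step 7): carry into the next limb
        out.append(rl)              # step 8)
    return out, cy
```

The OR in step 7) is safe because at most one of cy1 and cy2 can be 1 for any single limb.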
(8.5) perform a unification operation on the carry results of all regions using the buffer queue updated in step (8.4); the concrete process is:
(8.51) traverse every value of the buffer queue except entry N-1; if the carry value is zero, continue to the next entry; if the value is non-zero, perform a +1 operation on the whole region of the result obtained in step (8.4), starting from the next region's result up to the highest digit of the result, and when during this +1 operation the new carry is not 1, break out of this traversal pass;
(8.52) after traversing every value except entry N-1, update the carry flag of the highest digit. The value in entry N-1 of the buffer queue does not need to be traversed: what it stores is the final carry over all digits of the number, and it need only be XOR-ed with the result after step (8.51) to obtain the final carry value of the whole operation.
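A sketch of the unification in step (8.5), with a parameterized base so the ripple can be followed in decimal; the function and parameter names are illustrative:

```python
def unify_carries(result, region_ends, carry_queue, base=1 << 64):
    """Ripple each recorded region carry into the following region of
    `result` (8.51) and fold overflow past the top into the final carry,
    starting from entry N-1 of the buffer queue (8.52)."""
    final = carry_queue[-1]                   # entry N-1: carry of the top region
    for r in range(len(carry_queue) - 1):     # (8.51): all entries but the last
        if carry_queue[r] == 0:
            continue
        overflowed = True
        for i in range(region_ends[r], len(result)):  # next region onward
            result[i] = (result[i] + 1) % base
            if result[i] != 0:                # the new carry is not 1: stop
                overflowed = False
                break
        if overflowed:
            final ^= 1                        # (8.52): XOR into the final carry
    return final
```

For example, with decimal digits, 999 + 901 split into two regions gives partial result [0, 0, 8] with carry queue [1, 1]; unification yields [0, 0, 9] and final carry 1, i.e. 1900.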
In the above parallel computing platform, to accomplish multi-core cooperative computing, the present invention programs the extended parallel programming interface into a parallel application program that can be executed in parallel by any type and any number of compute nodes in a multi-core distributed computing platform.
A compiler automatically converts the serial code into mixed heterogeneous parallel code. Specifically, the compiler first determines, by parsing statements, the computing kernels in the application program that need to be accelerated, and obtains the necessary information relevant to the cluster parallel computation; it then generates a device-specific computing core for each node participating in the parallel computation. Each computing core is a different version of the computing kernel on a different processor.
The computing task is cut and reasonably allocated to multiple compute nodes, and the device-specific computing core is then executed in each compute node to complete the subtask allocated to it. For each computing kernel, control processes are created in a number equal to the number of devices participating in the parallel computation in the multi-core distributed computing platform: if p compute nodes are available, p processes are created for control, and these p processes run on p dedicated heterogeneous processors, multiple processors being regarded as one compute node. Process t_i executes the device-specific computing core, where 1 <= i <= p-1: it first copies a part of the input data from the master node, then starts all available processes to execute the allocated subtasks in parallel, and finally copies the computation results back to the master node. Meanwhile, process t_p executes the processor computing core on the remaining available heterogeneous processors. Specifically, process t_p spawns mk-p+1 worker processes to execute in parallel the subtasks allocated to the processors, where m denotes the number of processors used and k denotes the number of cores of each processor.
Considering that the multi-core parallel environment uses a distributed storage architecture between nodes, and that the communication latency between nodes is greater than the communication latency within a node, a coarse-grained parallel method is used between nodes. The inter-node parallelism is realized by the following procedure: first the cluster task topology is evenly subdivided into N first-level partitions, and each first-level partition is then allocated to one node to be handled individually, where N is the total number of nodes participating in the parallel computation.
To improve data access speed, and because the level-three caches on different processors are relatively independent, a coarse-grained parallel method is also used between processors. The inter-processor parallelism is established on the basis of the first-level partitions: each first-level partition is further evenly subdivided into K second-level partitions, and the second-level partitions derived from the same first-level partition are then respectively allocated to the different processors of the same node to be handled individually, where K is the total number of processors within a single node. Communication between different processors is thereby confined to the same node as far as possible.
For the bottom-level inter-core parallel computing module, a fine-grained parallel computing method is used. The computing task of each second-level partition is further decomposed: for each computation step of the second-level partition, the loop structures within it are found and decomposed into a large number of mutually unrelated subtasks that can be executed independently, and each subtask is then allocated to one computing core of the multi-core processor for execution.
An MPI parallel programming model is used for programming. The top layer of the whole parallel computation is built on MPI processes, each of which is responsible for controlling the processing of one second-level partition. A large number of threads are spawned inside each MPI process to give full play to the high concurrency of multi-core processors. To realize hierarchical load balancing, the following parallel computation flow is used:
Step 1: generate through the two-level partitioning the data files needed for the parallel computation, including the units, nodes, loads, boundary conditions, and adjacent-partition information of each partition;
Step 2: simultaneously start K MPI processes in each node, each of which is responsible for reading the data file of one second-level partition;
Step 3: spawn T threads inside each MPI process to complete the accumulation computation of the respective partition using the multi-core resources, where T is the total number of computing cores in a single processor;
Step 4: spawn T threads inside each MPI process to complete the accumulated-carry computation using the multi-core resources;
Step 5: if further iteration is needed, jump back to Step 2 and execute again; otherwise terminate.
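The two-level flow of Steps 1-5 can be imitated on a single machine, with thread pools standing in for the K MPI processes and their T inner threads; this is a hedged sketch of the structure only — a real implementation would use MPI (e.g. mpi4py) for the outer level:

```python
from concurrent.futures import ThreadPoolExecutor

def partition_sum(data, K, T):
    """Accumulate `data` through K second-level partitions (one per
    stand-in 'MPI process'), each of which spawns T threads (Step 3)."""
    size = (len(data) + K - 1) // K
    partitions = [data[i * size:(i + 1) * size] for i in range(K)]

    def process_partition(part):                  # role of one MPI process
        if not part:
            return 0
        chunk = (len(part) + T - 1) // T
        with ThreadPoolExecutor(max_workers=T) as pool:   # Step 3: T threads
            return sum(pool.map(sum, (part[j * chunk:(j + 1) * chunk]
                                      for j in range(T))))

    with ThreadPoolExecutor(max_workers=K) as pool:       # Step 2: K per node
        return sum(pool.map(process_partition, partitions))
```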
To perform task scheduling between the compute nodes reasonably and efficiently, and to reduce the communication overhead between nodes, the present invention proposes a method for obtaining the computation-amount cutting ratio. In cluster parallel computing, the whole computing task can be considered executed to completion only when every compute node has completed its own work. Assuming that the execution time of each compute node is proportional to the computation amount allocated to it, the total execution time T_total of the cluster parallel computation is as follows:
T_total = max(T_1 x R_1, T_2 x R_2, ..., T_N x R_N)
where N denotes the number of nodes participating in the parallel computation in the multi-core computing platform; T_i denotes the time spent by the i-th (1 <= i <= N) compute node handling the whole computing task alone; and R_i denotes the proportion of the total computation amount allocated to the i-th compute node. The most rational computation-amount cutting is achieved when all compute nodes complete their respective work at the same moment, i.e., the total execution time of the cluster parallel computation is shortest.
Once R_i is determined, the computation amount W_i allocated to the i-th (1 <= i <= p) compute node is as follows: W_i = W x R_i
where W denotes the total computation amount of the whole computing task.
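If node i alone would need time T_i for the whole task, balancing T_i x R_i across nodes makes R_i proportional to 1/T_i; a small sketch of this reading (helper names are illustrative):

```python
def cutting_ratios(T):
    """Cutting ratios that equalize T_i * R_i: R_i proportional to 1/T_i,
    normalized so the ratios sum to 1."""
    inv = [1.0 / t for t in T]
    s = sum(inv)
    return [v / s for v in inv]

def allocate(W, R):
    """W_i = W * R_i for each node."""
    return [W * r for r in R]
```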
In terms of computation management, the present invention sets the inter-node task scheduling strategy, the initial computation-amount cutting ratio, and the process configuration in extension statements. For a given cluster parallel computation, extension statements are used to inform the compiler of the following key information: the devices participating in the parallel computation, the inter-node task scheduling strategy, and the initial computation-amount cutting ratio.
In terms of data management, the present invention uses extension statements to support partial transmission, incremental transmission, and asynchronous transmission. The compiler is informed of which spaces require partial transmission, without having to specify which partial data in a space needs to be uploaded to which compute node or downloaded from which compute node to the master node; the data cutting between the nodes is handled automatically by the task scheduling mechanism provided by the runtime system.
The compiler of the present invention takes as input a matrix multiplication program written with the extended parallel programming statements, and translates it to produce a heterogeneous parallel matrix multiplication program, through the following four key steps:
Step 1: read the serial code with the extended parallel programming statements and construct a syntax tree after syntactic analysis.
Step 2: obtain the information relevant to the cluster parallel computation by parsing the extension statements and carry out the related operations. Specifically: (1) determine the computing kernels in the application program that need acceleration; (2) set the number of nodes participating in the parallel computation, and for each node participating in the parallel computation designate the globally unique ID, device type, device number, and device-specific computing core; (3) obtain the initial value, stop value, and step of the outermost loop of the computing kernel; (4) set the computation-amount cutting ratio; (5) specify the inter-node task scheduling strategy and enable task scheduling.
Step 3: the computing-kernel generator generates the device-specific computing core for the specific heterogeneous processor of each node participating in the parallel computation.
Step 4: de-parse the transformed syntax tree and ultimately generate the hybrid parallel code using parallel programming. The annotations and control structures of the pre-conversion source code are retained in the converted source code.
To adapt to different types of data-level parallel applications and to support more flexible and efficient inter-node task scheduling, the present invention proposes, on the basis of a static policy, a scalable task scheduling strategy. The overall strategy is to dynamically adjust the block size according to the performance changes of the cluster parallel computation during the execution of a computing kernel, in order to provide higher device utilization and lower scheduling overhead while keeping the computation amount balanced between nodes.
1/n of the computation amount W of the specified computing kernel (i.e., W/n) is taken as the initial block size, where the parameter n can be set manually by the programmer; the next block size is then dynamically adjusted according to the performance changes of the cluster parallel computation during the execution of the specified computing kernel.
The concrete steps are as follows:
Step 1: execute the 1st block cooperatively using the p compute nodes; its size is W/n. Specifically:
(1) according to the initial computation-amount cutting ratio R_i, allocate a part W_r.i of the computation amount of the 1st block to node D_i (1 <= i <= p), where W_r.i = W_r x R_i and W_r = W/n. The initial computation-amount cutting ratio is calculated from the theoretical peak performance of each node participating in the parallel computation.
(2) execute the device-specific computing core in node D_i to complete the computation amount W_r.i allocated to it.
(3) after node D_i completes the computation amount allocated to it, collect the current execution time T_r.i of node D_i, and calculate the current execution speed V_r.i of node D_i, where V_r.i = W_r.i / T_r.i.
(4) after all p compute nodes have completed their respective work, calculate the relative execution speed RV_i of node D_i, where RV_i = V_r.i / (V_r.1 + V_r.2 + ... + V_r.p). The relative execution speed is used as the new computation-amount cutting ratio, updating the cutting ratio as follows: R_i = RV_i (1 <= i <= p).
(5) calculate the current parallel execution speed V_r, where V_r = W_r / T_r and T_r = max(T_r.1, T_r.2, ..., T_r.p).
(6) update the completed computation amount W_f and the remaining computation amount W_r, where W_f = W_f + W_r and W_r = W - W_f.
Step 2: judge whether any computation amount remains; if not, the specified computing kernel has finished executing; if so, similarly to Step 1, execute the 2nd block cooperatively using the p compute nodes; its size is 2 x W/n. Specifically: (1) according to the obtained computation-amount cutting ratio, allocate the computation amount of the 2nd block to each node participating in the parallel computation. (2) execute the device-specific computing core in each compute node to complete the computation amount allocated to it. (3) after each compute node completes its respective work, collect the execution time of each compute node, calculate the relative execution speed of each node, and update the computation-amount cutting ratio with it. (4) calculate the current parallel execution speed from the collected information. (5) adjust the next block size, i.e., determine the computation amount to be completed in the next step: by comparing the parallel execution speed V_p of the previous step with the current parallel execution speed V_r, and comparing the size W_p of the previous block (i.e., the computation amount completed in the previous step) with the size W_r of the current block (i.e., the computation amount completed in the current step), determine whether the next block size W_n should be doubled, halved, or left unchanged compared with the current block size W_r. (6) update the completed computation amount and the remaining computation amount.
Step 3: repeat Step 2 until the remaining computation amount is 0.
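A hedged simulation of the adaptive loop of Steps 1-3, in which relative speeds become the new cutting ratios and the block size doubles when the parallel speed did not drop and halves otherwise — one plausible reading of Step 2(5), since the patent leaves the exact doubling/halving rule open; `node_speeds` is a synthetic stand-in for measured performance:

```python
def adaptive_schedule(W, n, node_speeds):
    """Return the executed block sizes and the final cutting ratios for a
    task of size W, initial block W/n, and fixed per-node speeds."""
    p = len(node_speeds)
    R = [1.0 / p] * p                    # initial cutting ratio (uniform here)
    Wr, Wf, Vp = W / n, 0.0, 0.0
    blocks = []
    while W - Wf > 1e-9:
        Wr = min(Wr, W - Wf)             # last block may be smaller
        times = [Wr * R[i] / node_speeds[i] for i in range(p)]
        V = [Wr * R[i] / times[i] for i in range(p)]   # per-node speeds
        R = [v / sum(V) for v in V]      # Step 2(3): new cutting ratio
        Vr = Wr / max(times)             # Step 2(4): parallel speed
        Wf += Wr
        blocks.append(Wr)
        Wr = Wr * 2 if Vr >= Vp else Wr / 2  # Step 2(5): adjust block size
        Vp = Vr
    return blocks, R
```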
In each iteration step, before the current block is executed in D_i, a part of the data of the current block, in proportion to the computation-amount cutting ratio R_i, is uploaded from the master node to D_i; after D_i finishes the execution of the current block, the processed part of the data of the current block, in proportion to the computation-amount cutting ratio R_i, is downloaded from D_i to the master node.
For the cluster parallel computation of some data-level parallel applications, optimizing the inter-node data transmission is very necessary; in particular, inter-node message transmission should be taken into account in the design of the inter-node dynamic task scheduling strategy. To avoid the above redundant data transfers, the present invention devises a differential data transmission method, suitable for data-level parallel applications with one or more computing kernels that need to be executed repeatedly. Specifically, in the first execution of a specified computing kernel, the whole computation amount of the computing kernel is allocated to each node participating in the parallel computation according to the initial computation-amount cutting ratio; after each compute node completes its respective work, the current execution time of each compute node is collected to calculate the new computation-amount cutting ratio. Each subsequent execution of the computing kernel is similar to the first execution, except that, starting from the second execution of the computing kernel, the whole computation amount is allocated to the nodes participating in the parallel computation according to the computation-amount cutting ratio updated after the previous execution of the computing kernel.
It need to be uploaded in which definite partial data from host node or during be downloaded to host node, including following step
Suddenly:
Step 1:Determine which partial data need to be passed specified between calculate node and host node in designated storage area
It is defeated.According to the initial value of the outermost loop of specified calculating kernel and stop value and the calculation amount used in current perform
Cutting ratio, from designated storage area retrieve and determine one the calculating kernel currently perform in need to be under specified calculate node
Be loaded onto the first subarray of host node, and record respectively the subarray in designated storage area start to index and terminate index;
Cut according to the initial value of the outermost loop of the calculating kernel and stop value and the calculation amount used in perform next time
Divide ratio, retrieved again from designated storage area and determine that a next time in the calculating kernel need to upload in performing from host node
To the second subarray of specified calculate node, and the subarray starting to index and terminate rope in designated storage area is recorded respectively
Draw.
Step 2: Determine, by comparing the subarrays obtained in Step 1, which portions of data in a designated storage area need to be transferred between the specified compute node and the host node. If all or part of the data in the first subarray are needed by the specified compute node in the next execution of the computing kernel, those data need not be downloaded from the specified compute node to the host node in the current execution of the kernel; otherwise, those data must be downloaded from the specified compute node to the host node in the current execution of the kernel. If all or part of the data in the second subarray are already present in the memory of the specified compute node, those data need not be uploaded from the host node to the specified compute node in the next execution of the kernel; otherwise, those data must be uploaded from the host node to the specified compute node in the next execution of the kernel.
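The decision in Steps 1 and 2 reduces to comparing index ranges: a transfer is skipped exactly when the ranges involved overlap. A minimal sketch, assuming closed `(start, end)` index pairs; the function names are illustrative, not the patent's API:

```python
def overlaps(a, b):
    """True if index ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def plan_transfers(first_sub, second_sub, resident):
    """Decide which transfers the current execution needs.

    first_sub  -- (start, end) of data now on the compute node
    second_sub -- (start, end) of data the next execution needs
    resident   -- (start, end) of data already in compute-node memory
    """
    # Data the next execution will reuse need not be downloaded now.
    download_needed = not overlaps(first_sub, second_sub)
    # Data already resident on the compute node need not be uploaded.
    upload_needed = not overlaps(second_sub, resident)
    return download_needed, upload_needed

# The next execution reuses part of the first subarray, so no download;
# the second subarray is not resident yet, so an upload is required.
print(plan_transfers((50, 149), (100, 149), (0, 99)))  # (False, True)
```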
The runtime system of the present invention provides APIs for obtaining the start and end indices of the one or two subarrays in a designated storage area that need to be uploaded from the host node to the specified compute node, and for obtaining the start and end indices of the one or two subarrays in a specified array that need to be downloaded from the specified compute node to the host node.
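Such runtime APIs might be sketched as follows; the class and method names (`Runtime`, `get_upload_ranges`, `get_download_ranges`) are illustrative assumptions, since the text does not name the actual calls:

```python
class Runtime:
    """Toy runtime holding, per storage area, the subarray index ranges
    planned for the current and next kernel execution (hypothetical API)."""

    def __init__(self):
        # area id -> {"upload": [(start, end), ...], "download": [...]}
        self._plans = {}

    def register(self, area, upload_ranges, download_ranges):
        self._plans[area] = {"upload": list(upload_ranges),
                             "download": list(download_ranges)}

    def get_upload_ranges(self, area):
        """Start/end indices of the one or two subarrays to upload
        from the host node to the specified compute node."""
        return self._plans[area]["upload"]

    def get_download_ranges(self, area):
        """Start/end indices of the one or two subarrays to download
        from the specified compute node to the host node."""
        return self._plans[area]["download"]

rt = Runtime()
rt.register("A", upload_ranges=[(0, 499)], download_ranges=[(500, 999)])
print(rt.get_upload_ranges("A"))   # [(0, 499)]
```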
Preferably, the present invention cuts the whole iteration space of a computing kernel into multiple blocks of equal or unequal size, and executes these blocks concurrently and cooperatively using multiple compute nodes of the multi-core distributed computing platform; data uploading, data downloading, and kernel computation are processed in parallel. The overlap of data transfer with computation is realized using three threads: the first thread is responsible for asynchronously uploading all or part of the data of the next block from the host node to the specified compute node; the second thread is responsible for asynchronously executing the current block on the specified compute node; and the third thread is responsible for asynchronously downloading all or part of the data of the previous block from the specified compute node to the host node.
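The three-thread overlap described above can be sketched with Python threads; the upload, compute, and download bodies here are stand-ins for the real asynchronous device operations, and the step-wise join is a simplification of the actual pipelining:

```python
import threading

log = []
lock = threading.Lock()

def record(event):
    with lock:
        log.append(event)

def upload(b):   record(("upload", b))    # stand-in for async host->device copy
def compute(b):  record(("compute", b))   # stand-in for kernel execution
def download(b): record(("download", b))  # stand-in for async device->host copy

def run_pipeline(num_blocks):
    for cur in range(num_blocks):
        threads = [threading.Thread(target=compute, args=(cur,))]
        if cur + 1 < num_blocks:                      # thread 1: prefetch next block
            threads.append(threading.Thread(target=upload, args=(cur + 1,)))
        if cur > 0:                                   # thread 3: drain previous block
            threads.append(threading.Thread(target=download, args=(cur - 1,)))
        for t in threads: t.start()
        for t in threads: t.join()                    # barrier before the next step
    download(num_blocks - 1)                          # final synchronous download

upload(0)            # block 0 must be uploaded before the pipeline starts
run_pipeline(3)
print(sorted(e for e in log if e[0] == "compute"))
```

Every block is uploaded once, computed once, and downloaded once, while each step overlaps the current block's computation with the next block's upload and the previous block's download.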
Correspondingly, the task scheduling strategy is carried out in the following steps:
Step 1: Execute the 1st block in parallel using p compute nodes. Specifically:
(1) According to the workload partition ratio Rr.i, assign a portion Wr.i of the 1st block's workload to node Di (1≤i≤p), where Wr.i = Wr × Rr.i, Wr = W/n (the initial block size), and Rr.i = Ri (the initial partition ratio).
(2) According to the partition ratio Rn.i, pre-assign a portion Wn.i of the next block (i.e. the 2nd block) to node Di, where Wn.i = Wn × Rn.i, Wn = W/n, and Rn.i = Ri.
(3) Execute the device-specific computing kernel on node Di to complete the workload Wr.i assigned to it. If node Di is a compute node, a portion of the 1st block's data is synchronously uploaded from the host node to Di according to partition ratio Rr.i before the 1st block is executed on Di; while the 1st block is executed asynchronously on Di, a portion of the 2nd block's data is asynchronously uploaded from the host node to Di according to partition ratio Rn.i.
(4) After node Di completes the workload assigned to it, collect its current execution time Tr.i and compute its current execution speed Vr.i, where Vr.i = Wr.i / Tr.i.
(5) After all p compute nodes have completed their respective work, compute each node Di's relative execution speed RVi, where RVi = Vr.i / (Vr.1 + Vr.2 + ... + Vr.p); update the partition ratio Rnn.i used to cut the block after next (i.e. the 3rd block), where Rnn.i = RVi (1≤i≤p); and compute the current parallel execution speed Vr, where Vr = Wr / Tr and Tr = max(Tr.1, Tr.2, ..., Tr.p).
(6) Update the completed workload Wf and the remaining workload Wr, where Wf = Wf + Wr and Wr = W - Wf - Wn.
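The bookkeeping of sub-steps (4)-(6) is a few lines of arithmetic. A sketch, with invented example numbers; the formula RVi = Vr.i / Σj Vr.j for the relative speed is reconstructed from context, since the original rendering of that formula is garbled:

```python
def update_ratios(workloads, times):
    """Given each node's assigned workload Wr.i and measured time Tr.i,
    return (relative speeds RVi, parallel speed Vr).

    Vr.i = Wr.i / Tr.i;  RVi = Vr.i / sum_j Vr.j;
    the RVi become the partition ratios Rnn.i for the block after next.
    """
    speeds = [w / t for w, t in zip(workloads, times)]   # Vr.i = Wr.i / Tr.i
    total = sum(speeds)
    rv = [v / total for v in speeds]                     # relative speeds, sum to 1
    w_r = sum(workloads)
    t_r = max(times)             # parallel time is bounded by the slowest node
    return rv, w_r / t_r         # Vr = Wr / Tr

# Two nodes, equal split; node 1 runs twice as fast as node 0,
# so node 1 receives 2/3 of the block after next.
rv, v_r = update_ratios([50, 50], [2.0, 1.0])
```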
Step 2: If there is remaining workload or pre-assigned work that has not yet been processed, execute the 2nd block in parallel using the p compute nodes. Specifically:
(1) According to the partition ratios updated after the previous block (i.e. the 1st block) was executed, pre-assign the workload of the next block (i.e. the 3rd block) to each node participating in the parallel computation.
(2) Execute the device-specific computing kernel on each compute node to complete the workload of the current block (i.e. the 2nd block) assigned to it. If node Di (1≤i≤p) is a compute node, thread 0 asynchronously uploads a portion of the next block's data from the host node to Di according to partition ratio Rn.i; thread 1 asynchronously executes the current block on Di according to partition ratio Rr.i; and thread 2 asynchronously downloads a portion of the processed previous block's data from Di to the host node according to partition ratio Rp.i, where Rp.i, Rr.i, and Rn.i denote node Di's shares of the workload assignments of the previous block, the current block, and the next block, respectively.
(3) After each compute node completes its work, collect each node's execution time, compute each node's relative execution speed, update the partition ratios used to cut the block after next (i.e. the 4th block), and compute the current parallel execution speed.
(4) Adjust the size of the next block according to the difference between the previous parallel execution speed and the current parallel execution speed, and according to the change from the previous block's size to the current block's size.
(5) Update the completed workload and the remaining workload.
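Sub-step (4) leaves the exact adjustment rule open; the proportional rule below is therefore an assumption that merely follows the stated inputs (the speed difference and the block-size change), not the patent's formula:

```python
def next_block_size(prev_size, cur_size, last_speed, cur_speed, min_size=1):
    """Grow the next block when parallel speed improved, shrink it when
    speed dropped; revert a size change that hurt throughput.
    Illustrative rule only: the text states the inputs, not the formula.
    """
    speed_ratio = cur_speed / last_speed if last_speed > 0 else 1.0
    if cur_size != prev_size and speed_ratio < 1.0:
        # The last size change hurt throughput: go back toward the old size.
        return max(prev_size, min_size)
    # Otherwise follow the speed trend proportionally.
    return max(int(cur_size * speed_ratio), min_size)

print(next_block_size(100, 100, 40.0, 50.0))  # speed rose 25% -> 125
print(next_block_size(100, 125, 50.0, 40.0))  # growth hurt -> back to 100
```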
Step 3: Repeat Step 2 until the remaining workload is 0 or the pre-assigned work has been fully processed. If the current block is the last block, then while thread 1 asynchronously executes the last block on node Di according to partition ratio Rr.i, thread 2 asynchronously downloads a portion of the processed penultimate block's data from Di to the host node according to partition ratio Rp.i; after node Di completes its own work, a portion of the processed last block's data is synchronously downloaded from Di to the host node according to partition ratio Rr.i.
In conclusion the present invention proposes a kind of mass data method for parallel processing based on big data, calculated in multinuclear
On the basis of platform, distributed parallel environment is made full use of to improve arithmetic speed.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented with a general-purpose computing system; they may be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be implemented with program code executable by a computing system, and may thus be stored in a storage system and executed by a computing system. Accordingly, the present invention is not limited to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention serve only to exemplify or explain the principles of the present invention and are not to be construed as limiting the present invention. Therefore, any modification, equivalent substitution, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within the protection scope of the present invention. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.
Claims (3)
- 1. A big-data-based method for parallel processing of mass data, characterized by comprising:
(1) obtaining the number of currently available processes of a parallel computing platform;
(2) allocating and initializing, according to the number of available processes obtained in step (1), a buffer queue for storing carries, the number of whose entries is N;
(3) performing, according to the number of available processes obtained in step (1), region cutting on the processing task that requires a summation operation, wherein the number of subtasks produced by the cutting corresponds one-to-one to the number of entries of the buffer queue and is greater than or equal to the number of available processes;
(4) using a dynamic scheduling strategy to solve the computing task of each region in parallel with multiple processes, wherein the process that finishes its task first is assigned a subtask from the task pool formed by the subtasks; when requesting a subtask, each process must judge whether the current subtask is the last subtask; if it is the last subtask, the serial accumulation algorithm is called; otherwise the serial accumulation algorithm is directly invoked to compute the current subtask; the final carry value is then saved into the corresponding entry of the buffer queue of step (2), and the result is stored in the corresponding position of the result;
(5) performing a unification operation on the results of the regions recorded in the buffer queue updated in step (4), the detailed process being: traversing each value of the buffer queue except entry N-1; if the carry value is zero, continuing with the next entry; if the value is non-zero, performing an add-1 operation on the region of the result obtained in step (4) that extends from the next region's result up to the highest bit of the result, and jumping out of this traversal when the add-1 produces a new carry whose value is not 1; after each value of the buffer queue except entry N-1 has been traversed, updating the carry flag of the highest bit.
- 2. The method according to claim 1, characterized in that: the number of entries of the buffer queue in step (2) is equal to the number of subtasks; when the number of entries of the buffer queue is also equal to the number of processes, each process corresponds to one subtask, and subtask IDs correspond one-to-one to process IDs in the manner of static scheduling; the value of entry 0 of the buffer queue corresponds to the carry flag of task 0, the value of entry 1 corresponds to the carry flag of task 1, the value of entry 2 corresponds to the carry flag of task 2, and so on, and the value of entry N-1 corresponds to the carry flag of task N-1, where N is the number of entries of the buffer queue in step (2) of claim 1.
- 3. The method according to claim 1, characterized in that: the value of entry N-1 of the buffer queue in step (5) need not be traversed; what it stores is the result of the last carry among all the bits of the positive number; an XOR operation is simply performed with it on the result obtained after step (5), yielding the last carry value of the whole computation.
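The carry-buffer scheme of claims 1-3 can be illustrated on multi-word addition: each subtask accumulates its region serially, deposits its outgoing carry in the corresponding buffer-queue entry, and a final unification pass propagates those carries. The sketch below is an interpretation, not the patent's code; the block size, the word base, and the thread pool are illustrative choices:

```python
from concurrent.futures import ThreadPoolExecutor

BASE = 10 ** 9  # each "word" holds 9 decimal digits

def add_block(args):
    """Serial accumulation of one region: add paired words, record the carry."""
    a_words, b_words = args
    out, carry = [], 0
    for x, y in zip(a_words, b_words):
        s = x + y + carry
        out.append(s % BASE)
        carry = s // BASE
    return out, carry  # the carry goes into this subtask's buffer-queue entry

def parallel_add(a_words, b_words, block=2, workers=4):
    """Cut into regions, sum them in parallel, then unify the carries."""
    regions = [(a_words[i:i + block], b_words[i:i + block])
               for i in range(0, len(a_words), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(add_block, regions))
    result = [w for out, _ in parts for w in out]
    carries = [c for _, c in parts]      # the buffer queue of claims 1-2
    final = carries[-1]                  # entry N-1: overall carry (claim 3)
    # Unification: propagate each region's carry into the next region.
    for i, c in enumerate(carries[:-1]):
        j = (i + 1) * block
        while c and j < len(result):     # add-1 ripples until no new carry
            s = result[j] + c
            result[j] = s % BASE
            c = s // BASE
            j += 1
        final += c   # a carry that ripples off the end joins the final carry
    return result, final

# 4-word numbers, little-endian words: value = sum(w * BASE**i).
# (BASE**4 - 1) + 1 = BASE**4, i.e. all-zero words with a final carry of 1.
words, final_carry = parallel_add([BASE - 1] * 4, [1, 0, 0, 0])
print(words, final_carry)   # [0, 0, 0, 0] 1
```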
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711306590.0A CN107977444A (en) | 2017-12-11 | 2017-12-11 | Mass data method for parallel processing based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107977444A true CN107977444A (en) | 2018-05-01 |
Family
ID=62009864
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711306590.0A Pending CN107977444A (en) | 2017-12-11 | 2017-12-11 | Mass data method for parallel processing based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107977444A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104461469A (en) * | 2014-11-14 | 2015-03-25 | 成都卫士通信息产业股份有限公司 | Method for achieving SM2 algorithm through GPU in parallelization mode |
CN104699449A (en) * | 2015-04-03 | 2015-06-10 | 中国科学院软件研究所 | GMP (GNU multiple precision arithmetic library) based big integer addition and subtraction multinuclear parallelization implementation method |
CN104793922A (en) * | 2015-05-04 | 2015-07-22 | 中国科学院软件研究所 | Parallel realization method for large-integer multiplication Comba algorithm on basis of OpenMP |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549583A (en) * | 2018-04-17 | 2018-09-18 | 成都致云科技有限公司 | Big data processing method, device, server and readable storage medium storing program for executing |
CN108549583B (en) * | 2018-04-17 | 2021-05-07 | 致云科技有限公司 | Big data processing method and device, server and readable storage medium |
CN110232092A (en) * | 2019-04-26 | 2019-09-13 | 平安科技(深圳)有限公司 | A kind of asynchronous solution of batch data based on data processing and relevant device |
CN110232092B (en) * | 2019-04-26 | 2023-08-01 | 平安科技(深圳)有限公司 | Batch data asynchronous solving method based on data processing and related equipment |
CN110515990A (en) * | 2019-07-23 | 2019-11-29 | 华信永道(北京)科技股份有限公司 | Data query methods of exhibiting and inquiry display systems |
CN110515990B (en) * | 2019-07-23 | 2021-10-01 | 华信永道(北京)科技股份有限公司 | Data query display method and query display system |
CN112527541A (en) * | 2019-09-19 | 2021-03-19 | 华为技术有限公司 | Method for determining fault calculation core in multi-core processor and electronic equipment |
US11815990B2 (en) | 2019-09-19 | 2023-11-14 | Huawei Technologies Co., Ltd. | Method for determining faulty computing core in multi-core processor and electronic device |
CN112039969A (en) * | 2020-08-26 | 2020-12-04 | 浪潮云信息技术股份公司 | AWS 3 URL uploading method based on Redis distributed lock development |
CN112039969B (en) * | 2020-08-26 | 2022-04-08 | 浪潮云信息技术股份公司 | AWS 3 URL uploading method based on Redis distributed lock development |
CN113821506A (en) * | 2020-12-23 | 2021-12-21 | 京东科技控股股份有限公司 | Task execution method, device, system, server and medium for task system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107977444A (en) | Mass data method for parallel processing based on big data | |
Lacoste et al. | Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes | |
CN111708641A (en) | Memory management method, device and equipment and computer readable storage medium | |
CN108108242A (en) | Accumulation layer intelligence distribution control method based on big data | |
Kamthe et al. | A stochastic approach to estimating earliest start times of nodes for scheduling DAGs on heterogeneous distributed computing systems | |
US8941674B2 (en) | System and method for efficient resource management of a signal flow programmed digital signal processor code | |
Liu | Parallel and scalable sparse basic linear algebra subprograms | |
CN106371924A (en) | Task scheduling method for maximizing MapReduce cluster energy consumption | |
CN108062249A (en) | High in the clouds data allocation schedule method based on big data | |
Collier et al. | Experiences in developing a distributed agent-based modeling toolkit with Python | |
CN103810041A (en) | Parallel computing method capable of supporting dynamic compand | |
Obaida et al. | Simulation of HPC job scheduling and large-scale parallel workloads | |
Plauth et al. | Using dynamic parallelism for fine-grained, irregular workloads: a case study of the n-queens problem | |
Tan et al. | GPUPool: A holistic approach to fine-grained gpu sharing in the cloud | |
Steinberger | On dynamic scheduling for the gpu and its applications in computer graphics and beyond | |
Plauth et al. | A performance evaluation of dynamic parallelism for fine-grained, irregular workloads | |
CN107967335A (en) | A kind of distributed SQL processing method and system | |
CN110415162B (en) | Adaptive graph partitioning method facing heterogeneous fusion processor in big data | |
CN109597619A (en) | A kind of adaptive compiled frame towards heterogeneous polynuclear framework | |
Jeannot et al. | Experimenting task-based runtimes on a legacy Computational Fluid Dynamics code with unstructured meshes | |
Chandrashekhar et al. | Prediction Model for Scheduling an Irregular Graph Algorithms on CPU–GPU Hybrid Cluster Framework | |
Gopalakrishnan Menon | Adaptive load balancing for HPC applications | |
Lu et al. | Enabling low-overhead communication in multi-threaded openshmem applications using contexts | |
Kosiachenko | Efficient GPU Parallelization of the Agent-Based Models Using MASS CUDA Library | |
Vo et al. | Streaming-enabled parallel data flow framework in the visualization toolkit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20180501 |