CN107797852A

CN107797852A - Processing device and processing method for data iteration

Info

Publication number: CN107797852A
Application number: CN201610804073.5A
Authority: CN
Inventors: 蒲若昂; 张包峰; 方勇; 金晓军; 强琦
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-09-06
Filing date: 2016-09-06
Publication date: 2018-03-13

Abstract

Embodiments of the invention provide a processing device and a processing method for data iteration. The processing device includes a code memory, a code read module, an iteration graph generation module, and an iterative computation trigger module. The iteration graph generation module generates an iteration graph model that includes multiple iterative computation nodes, which perform the iterative computation, and an iteration edge information storage unit, which stores the information of the iteration edges between the iterative computation nodes. The information of an iteration edge includes first identification information that uniquely identifies the upstream/downstream node relationship and second identification information that records the current iteration round. By reusing iteration edges and recording identification information that represents the current iteration round, the embodiments avoid repeatedly unrolling iterative computation nodes whose computation logic repeats when the topology graph of the iterative computation is expanded. This improves the running efficiency of the iterative computation and reduces memory usage.

Description

Processing device and processing method for data iteration

Technical field

The present invention relates to a processing device and a processing method for data iteration, and belongs to the field of computer technology.

Background technology

In the prior art, iterative algorithms are often used in graph computation. Consider, for example, iterative computation in Spark (an open-source cluster computing system based on in-memory computation) described with RDDs (resilient distributed datasets). When the DAG (Directed Acyclic Graph) is produced, all iterative computation nodes are fully unrolled according to the lineage of the iterative computation. Even nodes that use the same computation logic in every round are unrolled repeatedly as the iteration proceeds. As shown in Fig. 1, the whole iterative process is unrolled into one linear DAG. Assuming 3 iterations, iterative computation nodes A, B, and C in the figure are each unrolled three times, and the number of edges (labeled S10-S17 in the figure) grows accordingly. If the number of iterations is very large, a very large DAG is produced, so the efficiency of iterative data processing becomes very low and the job may even fail to run.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a processing device and a processing method for data iteration that improve the efficiency of iterative data processing.

An embodiment of the present invention provides a processing device for data iteration, including:

A code memory, for storing program code used to perform the iterative data computation;

A code read module, for reading the program code from the code memory;

An iteration graph generation module, for generating an iteration graph model according to the program code read by the code read module, where the iteration graph model includes multiple iterative computation nodes and an iteration edge information storage unit;

The multiple iterative computation nodes, for performing iterative computation according to the iteration edge information stored in the iteration edge information storage unit, where the multiple iterative computation nodes form a ring topology;

The iteration edge information storage unit, for storing the information of the iteration edges between the iterative computation nodes, where the information of an iteration edge includes first identification information that uniquely identifies the upstream/downstream node relationship and second identification information that records the current iteration round, and the iterative data flowing between iterative computation nodes is associated with the first identification information and the second identification information;

An iterative computation trigger module, for receiving the initial iteration data of the iterative computation after the iteration graph generation module has generated the iteration graph model, and inputting that data into the iteration graph model to trigger the iterative computation.

An embodiment of the present invention also provides a processing method for data iteration, including:

Generating an iteration graph model according to program code, where the iteration graph model includes multiple iterative computation nodes and iteration edges;

Receiving, by the iteration graph model, the initial iteration data of the iterative computation, which triggers the iterative computation;

Performing, by the multiple iterative computation nodes, the iterative computation in sequence, in the iterative computation order, according to the information of the iteration edges, where the information of an iteration edge includes first identification information that uniquely identifies the upstream/downstream node relationship and second identification information that records the current iteration round, and the iterative data flowing between iterative computation nodes is associated with the first identification information and the second identification information.

The processing device and processing method for data iteration provided by the embodiments of the present invention reuse iteration edges: in addition to the upstream and downstream node information, each iteration edge also records identification information that identifies the current iteration round. As a result, when the topology graph of the iterative computation is expanded, iterative computation nodes whose computation logic repeats do not need to be unrolled repeatedly, and the topology graph stays the same no matter how many iterations are performed. This improves the running efficiency of the iterative computation and uses less memory.

Brief description of the drawings

Fig. 1 is a schematic diagram of the topology of iterative data computation in the prior art.

Fig. 2 is a schematic structural diagram of the processing device for data iteration of embodiment one of the present invention.

Fig. 3 is a schematic structural diagram of the processing device for data iteration of embodiment two of the present invention.

Fig. 4 is a schematic structural diagram of the processing device for data iteration of embodiment three of the present invention.

Fig. 5 is a schematic structural diagram of the processing device for data iteration of embodiment four of the present invention.

Fig. 6 is a schematic flowchart of the processing method for data iteration of embodiment five of the present invention.

Fig. 7 is the web link graph of embodiment six of the present invention.

Fig. 8 is a schematic diagram of the topology of embodiment six of the present invention.

Detailed description of the embodiments

The technical solution of the present invention is further described below through embodiments.

Embodiment one

This embodiment relates to a processing device for data iteration. Fig. 2 is a schematic structural diagram of the processing device for data iteration of embodiment one of the present invention. The processing device of this embodiment includes:

A code memory 1, for storing program code used to perform the iterative data computation. The program code can be written in advance and stored in the code memory 1. The program code mainly describes the iteration graph model, and the iteration graph model can be generated by running the program code.

A code read module 2, for reading the program code from the code memory.

An iteration graph generation module 3, for generating an iteration graph model (shown as the dashed part in the figure) according to the program code read by the code read module. The iteration graph model includes multiple iterative computation nodes and an iteration edge information storage unit.

In the generated iteration graph model, the multiple iterative computation nodes 11 perform iterative computation according to the iteration edge information stored in the iteration edge information storage unit. The multiple iterative computation nodes 11 form a ring topology and, as shown in the figure, are connected by iteration edges 21. Each iterative computation node 11 can include one or more parallel task processes. Within the same iterative computation node, every task process has the same computation logic and differs only in the data assigned to it, which enables parallel processing. The computation logic of different iterative computation nodes may be the same or different, depending entirely on the iterative algorithm itself. Fig. 2 schematically shows three iterative computation nodes: iterative computation node A, iterative computation node B, and iterative computation node C.

In the generated iteration graph model, the iteration edge information storage unit (not shown in the figure) stores the information of the iteration edges 21 between the iterative computation nodes. The iteration edge information storage unit can be regarded as memory space set aside specifically for storing the information of the iteration edges 21. For ease of understanding, specific examples of the information of the iteration edges 21 are labeled at the positions of the iteration edges in the figure. The information of an iteration edge includes first identification information that uniquely identifies the upstream/downstream node relationship and second identification information that records the current iteration round. The iterative data flowing between iterative computation nodes is associated with the first identification information and the second identification information.

An iterative computation trigger module 4, for receiving the initial iteration data of the iterative computation after the iteration graph generation module 3 has generated the iteration graph model, and inputting that data into the iteration graph model to trigger the iterative computation.

The iterative data can be associated with the identification information in several ways: the iterative data flowing between nodes can carry the first identifier and the second identifier; each task process can record the correspondence between the iterative data and the identifiers; or the task coordination module described later can manage the correspondence between the iterative data and the first and second identifiers.

The first identification information and the second identification information can be recorded in the attribute information of the iteration edge. Specifically, they can be expressed in a simple StreamId (edge identification code) form with the structure (S, N). S is the identifier of the iteration edge and corresponds to the first identification information; the S of every edge must be unique. N is the iteration round of the iteration edge and corresponds to the second identification information; it changes after every round of iteration.
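The (S, N) structure can be illustrated with a minimal Python sketch. The class name `StreamId` follows the text, but the field names `edge` and `iteration` and the helper `stream_ids` are hypothetical names chosen for illustration; the patent does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamId:
    """(S, N): S identifies the edge, N is the iteration round on that edge."""
    edge: str        # first identification information, unique per edge
    iteration: int   # second identification information, changes every round

    def next_round(self) -> "StreamId":
        # Same edge, next iteration round: the edge itself is reused.
        return StreamId(self.edge, self.iteration + 1)


def stream_ids(edge: str, total_rounds: int) -> List[StreamId]:
    """All StreamIds that one reused edge takes on over total_rounds rounds."""
    return [StreamId(edge, n) for n in range(total_rounds)]
```

For an edge labeled S3 and three rounds, `stream_ids("S3", 3)` yields the identifiers (S3, 0), (S3, 1), and (S3, 2) while the graph still contains a single edge.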

As shown in Fig. 2, next to each iteration edge the figure shows the edge's first identification information (S3, S4, S5) and the total number of iteration rounds of that edge (labeled "iteration" in the figure), with all StreamIds of the edge listed below. For example, the total number of iteration rounds given in the figure is 3, and the StreamIds between iterative computation node A and iterative computation node B are (S3, 0), (S3, 1), and (S3, 2). The first identification information of different iteration edges differs and is guaranteed to be unique within the topology graph.

The processing device for data iteration of the embodiments of the present invention can be applied to iterative computation scenarios in graph computation, for example iterative graph computation under the Spark framework mentioned in the prior art, or under similar frameworks such as Hadoop (a distributed system infrastructure developed by the Apache Foundation). It can be applied to many typical iterative algorithms, for example the Map/Reduce (mapping/reduction) algorithm. When that algorithm is implemented, the multiple iterative computation nodes can include at least one Map computation node and at least one Reduce computation node.

The processing device provided by this embodiment reuses iteration edges: in addition to the upstream and downstream node information, each iteration edge also records identification information that identifies the current iteration round. As a result, when the topology graph of the iterative computation is expanded, iterative computation nodes whose computation logic repeats do not need to be unrolled repeatedly, and the topology graph stays the same no matter how many iterations are performed. Comparing Fig. 2 with Fig. 1 shows that, for the same three iterations, Fig. 2 expands only three edges and three nodes, whereas Fig. 1 has 9 nodes and 8 edges. The gap between the topology graphs becomes more obvious as the number of iterations grows, and every additional node or edge occupies memory space and process resources. The technical solution of this embodiment therefore clearly improves the running efficiency of the iterative computation and uses less memory.
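The node and edge counts quoted for Fig. 1 and Fig. 2 can be reproduced with a small Python sketch. It assumes that the unrolled graph is a single linear chain made up only of the iterative computation nodes and that the ring has one edge per node; the function names are hypothetical.

```python
from typing import Tuple


def unrolled_size(nodes_per_round: int, rounds: int) -> Tuple[int, int]:
    """(nodes, edges) when every round is unrolled into one linear DAG."""
    nodes = nodes_per_round * rounds
    edges = nodes - 1  # a linear chain of k nodes has k - 1 edges
    return nodes, edges


def ring_size(nodes_per_round: int, rounds: int) -> Tuple[int, int]:
    """(nodes, edges) when edges are reused; independent of the round count."""
    return nodes_per_round, nodes_per_round


print(unrolled_size(3, 3))   # (9, 8), the counts given for Fig. 1
print(ring_size(3, 3))       # (3, 3), the counts given for Fig. 2
print(unrolled_size(3, 100)) # (300, 299), while the ring stays at (3, 3)
```

Under these assumptions the unrolled graph grows linearly with the number of rounds, whereas the ring topology has a fixed size.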

Embodiment two

This embodiment describes the topology of embodiment one in more detail. Fig. 3 is a schematic structural diagram of the processing device for data iteration of embodiment two of the present invention; Fig. 3 shows only the iteration graph model part. The multiple iterative computation nodes include a head node and a tail node. In addition to performing iterative computation, the head node receives the initial iteration data and triggers the iterative computation. In addition to performing iterative computation, the tail node checks the iteration round and, once a predetermined condition is met, ends the iterative computation and outputs the iteration result data; for example, it exits the iteration loop after a predetermined number of iterations. In general, within the whole topology, the head node can be connected to the nodes related to the input of iterative data, and the tail node can be connected to the nodes related to the output of the iteration result. The iteration loop may also contain one or more intermediate nodes between the head node and the tail node.

In the exemplary topology shown in Fig. 3, iterative computation node A is the head node and iterative computation node C is the tail node. The iteration graph model can further include an initialization node 14 and a first non-iterative edge information storage unit (not shown in the figure). The initialization node 14 is connected by a non-iterative edge 22 to iterative computation node A, which serves as the head node. It processes input data from outside the ring topology to generate the initial iteration data and provides that data to the head node according to the non-iterative edge information stored in the first non-iterative edge information storage unit. The first non-iterative edge information storage unit stores the information of the non-iterative edge 22 between the initialization node and the head node. The information of the non-iterative edge 22 includes the first identification information. As shown in the figure, because the non-iterative edge 22 does not record an iteration round, it only expresses the upstream/downstream node relationship. The StreamId in the attributes of the non-iterative edge 22 can also take the form (S, 0), that is, the part corresponding to the second identification information is set to 0.

It should be noted that the non-iterative edge information storage units in this embodiment (including the first non-iterative edge information storage unit and the second, third, and fourth non-iterative edge information storage units described below) can be regarded as memory space set aside specifically for storing the information of non-iterative edges. For ease of understanding, specific examples of the information of the non-iterative edges are labeled at the positions of the non-iterative edges in the figure.

Further, as shown in Fig. 3, to form a complete topology, the iteration graph model can also include an input node 13, a first output node 15, and a second non-iterative edge information storage unit (not shown in the figure). The input node 13 is connected to the initialization node 14 by a non-iterative edge 22. It receives input data from outside the topology and sends the input data to the initialization node 14 according to the non-iterative edge information stored in the second non-iterative edge information storage unit. The first output node 15 is connected by a non-iterative edge 22 to iterative computation node C, which serves as the tail node. According to the non-iterative edge information stored in the second non-iterative edge information storage unit, it obtains the iteration result data from the tail node and outputs the iteration result data to the outside of the topology.

It should be noted that the initialization node 14 is not required. Whether the original input data needs processing depends on the form of the original data, the requirements of the iterative algorithm itself, and so on. When there is no initialization node 14, the head node can be connected directly to the input node 13 by a non-iterative edge.

Embodiment three

This embodiment further improves on the above embodiments. Fig. 4 is a schematic structural diagram of the processing device for data iteration of embodiment three of the present invention; Fig. 4 shows only the iteration graph model part. On the basis of the preceding embodiments, this embodiment adds an increment output node 15 and a third non-iterative edge information storage unit (not shown in the figure). The increment output node 15 is connected to an iterative computation node 11 by a non-iterative edge 22. It obtains and outputs intermediate data of the iterative process according to the non-iterative edge information stored in the third non-iterative edge information storage unit. The third non-iterative edge information storage unit stores the information of the non-iterative edge between the increment output node and the iterative computation node; that information includes the first identification information.

Correspondingly, the iteration graph model can also include a second output node 16 and a fourth non-iterative edge information storage unit (not shown in the figure). The second output node 16 is connected to the increment output node 15 by a non-iterative edge 22. It receives the intermediate data from the increment output node according to the non-iterative edge information stored in the fourth non-iterative edge information storage unit and outputs it to the outside of the topology. The fourth non-iterative edge information storage unit stores the information of the non-iterative edge between the increment output node and the second output node.

The increment output node 15 can be connected to all of the iterative computation nodes 11 or to only some of them. Its purpose is to output intermediate data directly, without being constrained by the iteration loop. The intermediate data can be used by other algorithm models, which makes the design of algorithm models more flexible without affecting the normal iteration loop.

Embodiment four

Fig. 5 is a schematic structural diagram of the processing device for data iteration of embodiment four of the present invention; Fig. 5 shows only the iteration graph model part. On the basis of the above embodiments, this embodiment adds a task coordination module 31 for coordinating and managing multiple task processes. As mentioned above, an iterative computation node includes one task process or multiple concurrent task processes. The task coordination module is located outside the topology and communicates directly with the task processes in each node to coordinate and manage the computation performed by each task. Fig. 5 shows the relationship between the task processes and the task coordination module 31. The task processes shown can be understood as only a part of the topology, and they can be located in the same node or in multiple nodes. The task coordination module 31 manages the task processes directly. The relationship between task processes and nodes, the data streams being processed, and so on can be identified by the first identifier, or by the first identifier together with the second identifier.

As shown in Fig. 5, each task process can be regarded as a program module that includes the following units: a reading unit (Reader) responsible for reading data from the task processes in the upstream node, a writing unit (Writer) responsible for writing data to the task processes in the downstream node, and a processing unit (Processer) that performs the computation task. Fig. 5 shows the input task process in the input node 13 of the above embodiments, the iterative computation task processes in the multiple iterative computation nodes 11, and the output task process in the first output node 15. The connections between the task processes and the task coordination module 31 are shown as dashed lines.

Further, as an optional implementation, each task process in an iterative computation node can record the following processing record information in association with the first identification information and the second identification information: the number of records of iterative data read from the task processes of the upstream node, the number of records of iterative data whose processing is complete, and the number of records of iterative data written to the task processes of the downstream node. Each task process sends the processing record information of the current iteration round to the task coordination module 31, and the task coordination module 31 coordinates and manages the multiple task processes according to the processing record information.
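The three counters kept per edge and per round can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `ProcessingRecord`, `TaskProcess`, `handle`, and `report` are hypothetical, the (S, N) key is modeled as a plain tuple, and each input record is assumed to produce exactly one output record.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

StreamKey = Tuple[str, int]  # (S, N): edge identifier and iteration round


@dataclass
class ProcessingRecord:
    read: int = 0        # records read from upstream task processes
    processed: int = 0   # records whose processing is complete
    written: int = 0     # records written to downstream task processes


class TaskProcess:
    def __init__(self) -> None:
        # One record per (edge, round), so rounds never mix their counters.
        self.records: Dict[StreamKey, ProcessingRecord] = {}

    def handle(self, key: StreamKey, items: List[float],
               fn: Callable[[float], float]) -> List[float]:
        rec = self.records.setdefault(key, ProcessingRecord())
        out = []
        for item in items:
            rec.read += 1
            result = fn(item)
            rec.processed += 1
            out.append(result)
            rec.written += 1
        return out

    def report(self, key: StreamKey) -> ProcessingRecord:
        """What would be sent to the task coordination module for this round."""
        return self.records[key]
```

Keying the counters by (S, N) is what lets a coordinator tell apart the records of round 0 and round 1 even though both travel over the same reused edge.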

Further, the task coordination module 31 can use state machines to coordinate and manage the task processes in the iterative computation nodes. As an optional implementation, the task coordination module is provided with state machines in one-to-one correspondence with the iteration edges. That is, the task coordination module contains multiple interrelated state machines, and the relationship between these state machines is the upstream/downstream lineage between the iteration edges. A state machine communicates with the task processes to obtain the processing record information and switches state according to that information.

Specifically, the state machine has the following three states:

Read complete: when the preceding state machines corresponding to all upstream iteration edges are in the write-complete state, and the read record count of the current state machine equals the write record count of the preceding state machines, the current state machine is set to the read-complete state;

Processing complete: when the current state machine is in the read-complete state, and its processing-complete record count equals its read record count, the current state machine is set to the processing-complete state;

Write complete: when the current state machine is in the processing-complete state, and its write record count equals its processing-complete record count, the current state machine is set to the write-complete state.
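The three transitions can be sketched as a small Python state machine. Several details are assumptions made for illustration and are not stated in the text: the initial `PENDING` state, the class and attribute names, the choice to sum the write counts when there are several upstream edges, and the restriction to a single iteration round per machine instance.

```python
from enum import Enum
from typing import List


class State(Enum):
    PENDING = 0       # assumed initial state; not named in the text
    READ_DONE = 1     # read complete
    PROCESS_DONE = 2  # processing complete
    WRITE_DONE = 3    # write complete


class EdgeStateMachine:
    """One state machine per edge, tracking a single iteration round."""

    def __init__(self, upstream: List["EdgeStateMachine"]) -> None:
        self.upstream = upstream
        self.state = State.PENDING
        self.read = 0
        self.processed = 0
        self.written = 0

    def advance(self) -> State:
        """Apply at most one transition and return the resulting state."""
        if self.state is State.PENDING:
            upstream_done = all(u.state is State.WRITE_DONE for u in self.upstream)
            # Assumption: with several upstream edges, write counts are summed.
            upstream_written = sum(u.written for u in self.upstream)
            if upstream_done and self.read == upstream_written:
                self.state = State.READ_DONE
        elif self.state is State.READ_DONE:
            if self.processed == self.read:
                self.state = State.PROCESS_DONE
        elif self.state is State.PROCESS_DONE:
            if self.written == self.processed:
                self.state = State.WRITE_DONE
        return self.state
```

In this sketch a coordinator would update the counters from the reported processing records and call `advance()` after each update; a downstream machine cannot leave `PENDING` until every upstream machine has reached `WRITE_DONE`.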

The task coordination module 31 drives the task processes through the state changes of the state machines. As the above operating mechanism shows, state machines and iteration edges correspond uniquely, and the state changes of a state machine are triggered by the data stream on its iteration edge. Only when a stage has been completed for the data stream that belongs to the same iteration edge and the same iteration round does the state machine enter the next state. This ensures the coordinated operation of the concurrently running task processes within a node, and also of the task processes across nodes.

It should also be noted that the task coordination module 31 can coordinate and manage the task processes in all nodes of the topology, and is not limited to managing the task processes in the iterative computation nodes. In that case, a state machine can also be associated with each non-iterative edge. For non-iterative computation nodes such as the input node, the output nodes, and the increment output node, the processing record information is associated only with the first identification information, and the state switching of the state machines follows the switching principle described above.

Embodiment five

This embodiment relates to a processing method based on the processing device for data iteration of the above embodiments. Fig. 6 is a schematic flowchart of the processing method for data iteration of embodiment five of the present invention. The processing method includes:

Step 101: Generate an iteration graph model according to program code, where the iteration graph model includes multiple iterative computation nodes and iteration edges.

Step 102: The iteration graph model receives the initial iteration data of the iterative computation, which triggers the iterative computation. The initial iteration data can be received from outside, for example entered manually or read from a file, or it can be input into the iteration graph model by the iterative computation trigger module of embodiment one.

Step 103: The multiple iterative computation nodes perform the iterative computation in sequence, in the iterative computation order, according to the information of the iteration edges. The information of an iteration edge includes first identification information that uniquely identifies the upstream/downstream node relationship and second identification information that records the current iteration round, and the iterative data flowing between iterative computation nodes is associated with the first identification information and the second identification information.

In addition, the following can be included after step 103:

Step 104: Output the iteration result data after the iterative computation is complete. The multiple iterative computation nodes can include a head node and a tail node. Step 103 can specifically be: the head node receives the initial iteration data and triggers the iterative computation. Outputting the iteration result data after the iterative computation is complete can specifically be: the tail node checks the iteration round in every round of the iterative computation and, once a predetermined condition is met, ends the iterative computation and outputs the iteration result data.
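The flow of steps 102 to 104 can be sketched as a loop over a fixed ring of nodes. This is a simplified single-process illustration, not the patented implementation: `run_ring` and its parameters are hypothetical, nodes are modeled as plain functions, the stop condition is assumed to be a fixed number of rounds, and each node's output is simply tagged with an (S, N) pair for the edge leaving that node.

```python
from typing import Callable, List, Tuple

Node = Callable[[float], float]


def run_ring(nodes: List[Node], edge_ids: List[str], initial: float,
             max_rounds: int) -> Tuple[float, List[Tuple[str, int]]]:
    """nodes[0] is the head node, nodes[-1] the tail node.

    edge_ids[i] labels the edge leaving nodes[i]; the same labels are
    reused in every round, only the round number changes.
    """
    data = initial            # step 102: the head node receives the initial data
    trace = []                # (S, N) tags attached to each node's output
    round_no = 0
    while True:
        # Step 103: the nodes compute in ring order.
        for node, edge in zip(nodes, edge_ids):
            data = node(data)
            trace.append((edge, round_no))
        round_no += 1
        # Step 104: the tail node checks the round and ends the loop.
        if round_no >= max_rounds:
            return data, trace


result, trace = run_ring(
    [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1],
    ["S3", "S4", "S5"], initial=1.0, max_rounds=3)
print(result)      # 15.0
print(len(trace))  # 9 tagged transfers over only 3 distinct edges
```

The trace contains nine (S, N) tags but only three distinct edge labels, which is the edge reuse that keeps the topology fixed across rounds.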

In addition, the iteration graph model can also include an initialization node. Before the head node receives the initial iteration data and triggers the iterative computation, the method can also include: the initialization node processes input data from outside the ring topology to generate the initial iteration data and sends the initial iteration data to the head node according to the information of the non-iterative edge between the initialization node and the head node. The initialization node is connected to the head node by the non-iterative edge, and the first identification information is recorded in the information of the non-iterative edge.

In addition, the iteration graph model can also include an increment output node. In a topology with an increment output node, at least one iterative computation node is connected to the increment output node by a non-iterative edge. The processing method can then also include: while the iterative computation is being performed, sending intermediate data to the increment output node according to the information of the non-iterative edge between the iterative computation node and the increment output node, and the increment output node outputting the intermediate data to the outside. The information of the non-iterative edge includes the first identification information.

Further, an iterative computation node can include one task process or multiple concurrent task processes, and the iteration graph model can also include a task coordination module that coordinates and manages the multiple task processes. Each task process records the following processing record information in association with the first identification information and the second identification information: the number of records of iterative data read from the task processes of the upstream node, the number of records of iterative data whose processing is complete, and the number of records of iterative data written to the task processes of the downstream node. During the iterative computation, each task process sends the processing record information of the current iteration round to the task coordination module, and the task coordination module coordinates and manages the multiple task processes according to the processing record information.

Specifically, the task coordination module is also provided with state machines in one-to-one correspondence with the iteration edges, and the task coordination module drives the task processes through the state changes of the state machines. The operating mechanism of the state machines has been described in detail in the embodiment above and is not repeated here.

In the processing method for data iteration of this embodiment, iteration edges are reused during the iterative computation: in addition to the upstream and downstream node information, each iteration edge also records identification information that identifies the current iteration round. As a result, when the topology graph of the iterative computation is expanded, iterative computation nodes whose computation logic repeats do not need to be unrolled repeatedly, and the topology graph stays the same no matter how many iterations are performed. Iterative computation therefore occupies less memory space and fewer process resources, and its running efficiency can be significantly improved.

Embodiment six

This embodiment further illustrates the processing unit for data iteration of the embodiments of the present invention, and the processing method based on that processing unit, through an application example that implements the PageRank (page ranking) algorithm using Map/Reduce.

Assume the web link graph shown in Fig. 7, which contains three web pages: taobao.com, tmall.com and aliyun.com. The links are as follows: taobao.com has two out-links, to aliyun.com and tmall.com; aliyun.com has one out-link, to tmall.com; tmall.com has one out-link, to taobao.com.

The PageRank algorithm is implemented with the topology shown in Fig. 8. The calculation logic of each node in the figure is as follows:

Source (data source) node 41: corresponds to the input node in the embodiments of the present invention. It reads the web link graph and sends it to the initialization node.

InitRank (initialize ranking) node 42: corresponds to the initialization node in the embodiments of the present invention. It obtains all web pages, assigns an initial Rank value to each web page (1.0 in this example), and sends the result to the head node.

Scatter (distribution) node 43: corresponds to the head node in the embodiments of the present invention. It distributes the Rank value of each web page according to the web link graph.

Sum (aggregation) node 44: corresponds to the intermediate node in the embodiments of the present invention. It aggregates the Rank values received by each web page to obtain the total Rank value of each web page, and sends, for each web page, the difference from its Rank value in the previous iteration round.

UpdateRank (update) node 45: corresponds to the tail node in the embodiments of the present invention. It updates the Rank values according to the received Rank differences, obtains a new Rank value for each web page, and sends the new values to the Scatter node for the next round of iterative calculation. In addition, once the iteration condition is reached, it sends the iteration result to the Output node, which outputs the Rank values of all web pages.

Coordinator (coordination) module (not shown): corresponds to the task orchestration module in the embodiments of the present invention. It coordinates and manages the task processes in each node. When the data model of the topology graph shown in Fig. 8 is generated, the task processes of each node may store in advance the relationships between the nodes and edges of the whole topology graph, and the Coordinator module may likewise hold the relationships between the nodes and edges.

In this example, the following assumptions are made:

1) The number of iterations is 2, that is, iteration terminates after two rounds. The iteration edges are S13, S14 and S15, and the StreamId of each iteration round on the iteration edges is as shown in Fig. 7. The non-iteration edges are S11, S12 and S16, and the iteration round on the non-iteration edges is 0;

2) the Source node has 1 concurrent task process, numbered T0;

3) the InitRank node has 1 concurrent task process, numbered T1;

4) the Scatter node has 2 concurrent task processes, numbered T2 and T3;

5) the Sum node has 2 concurrent task processes, numbered T4 and T5;

6) the UpdateRank node has 1 concurrent task process, numbered T6;

7) the Output node has 1 concurrent task process, numbered T7.

The specific data calculation flow is as follows:

1) Processing at the Source node: the Source node sends the link graph to the task numbered T1 in the InitRank node (for convenience, tasks are referred to below by their task numbers) and notifies the Coordinator module that one record has been sent to T1.

2) Processing at the InitRank node: the InitRank node receives one record. Since the only upstream node of the InitRank node is the Source node, and the Coordinator module has already learned that data has been sent to the InitRank node, the Coordinator module can determine that T1 of the InitRank node should receive one record, and it notifies the Reader unit of T1 to start reading the data.

After T1 of the InitRank node has received and processed this record, it sends three records, (tmall.com, 1.0), (taobao.com, 1.0) and (aliyun.com, 1.0), to the downstream Scatter node, which has two task processes, T2 and T3. Assume that (tmall.com, 1.0) and (taobao.com, 1.0) are sent to T2 and (aliyun.com, 1.0) is sent to T3. The StreamId of these three records is (S12, 0), corresponding to the non-iteration edge S12 between the InitRank node and the Scatter node. In addition, once T1 has read one record, and the Coordinator module has notified T1 that it only needs to read one record, T1 can determine that it has read all of its data. After T1 has assigned an initial Rank value to each web page and sent the data to T2 and T3, it notifies the Coordinator module that its processing task is complete.

3) Processing at the Scatter node: the Scatter node is a Map-type calculation node. After T2 of the Scatter node receives the records (tmall.com, 1.0) and (taobao.com, 1.0) and reads their StreamId (S12, 0), it can learn from the topology graph that the downstream StreamId is (S13, 1). Following the principle of PageRank, assume that the probability of the user accessing the current web page is set to 0.15; the remaining probability is then sent to the web pages that the current page links out to.

For the record (tmall.com, 1.0), T2 sends 0.15 × Rank to tmall.com itself, so the record for tmall.com becomes (tmall.com, 1.0 × 0.15). The remaining 0.85 is divided equally among the pages it links out to, which here is only taobao.com, so the record for taobao.com becomes (taobao.com, 1.0 × 0.85). Similarly, for the record (taobao.com, 1.0), T2 produces three records: (taobao.com, 0.15), (aliyun.com, 1.0 × 0.85 × 0.5) and (tmall.com, 1.0 × 0.85 × 0.5). The StreamId of these five records is (S13, 1).
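The distribution step carried out by T2 can be sketched as a small Map-style function. This is an illustrative sketch under the assumptions stated in the text (a probability of 0.15 kept by the page itself, the remaining 0.85 split equally among out-links); the names `OUT_LINKS`, `STAY_PROBABILITY` and `scatter` are hypothetical.

```python
from typing import Dict, List, Tuple

# Web link graph from the example: page -> pages it links out to.
OUT_LINKS: Dict[str, List[str]] = {
    "taobao.com": ["aliyun.com", "tmall.com"],
    "aliyun.com": ["tmall.com"],
    "tmall.com": ["taobao.com"],
}

STAY_PROBABILITY = 0.15  # share of the Rank value kept by the page itself


def scatter(page: str, rank: float) -> List[Tuple[str, float]]:
    """Emit one record for the page itself and one per out-link."""
    records = [(page, rank * STAY_PROBABILITY)]
    targets = OUT_LINKS[page]
    share = rank * (1.0 - STAY_PROBABILITY) / len(targets)
    records.extend((target, share) for target in targets)
    return records


# T2 handles the two records it received from InitRank.
t2_output = scatter("tmall.com", 1.0) + scatter("taobao.com", 1.0)
```

Run on the two records assigned to T2, the function yields the same five records listed above, in the order (tmall.com, 0.15), (taobao.com, 0.85), (taobao.com, 0.15), (aliyun.com, 0.425), (tmall.com, 0.425).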

4) Processing at the Sum node: the Sum node is a Reduce-type calculation node. The downstream node of the Scatter node is the Sum node, which has two task processes, T4 and T5. Assume that T2 sends the four records (tmall.com, 0.15), (taobao.com, 0.85), (taobao.com, 0.15) and (tmall.com, 0.425) to T4, and sends the record (aliyun.com, 0.425) to T5. After sending is complete, T2 notifies the Coordinator module.

After T4 receives the four records, it stores them in its Reader unit and tells the Coordinator module that it has read four records corresponding to (S13, 1). It then waits for the Coordinator module to send a finishStream (data stream transmission complete) event, which informs it that all data corresponding to (S13, 1) has been received and that the aggregation calculation can proceed. T5 performs the same processing.

It should be noted that once the Coordinator module knows that all upstream data for (S13, 1) has been sent, and the downstream task processes have reported that they have received all of the data, it sends a finishStream event telling T4 and T5 that they can perform the aggregation operation. This step can only run after all data has been received.
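The condition under which the Coordinator module sends a finishStream event can be sketched as a simple count comparison. The function name `can_finish_stream` and the dictionary shapes are hypothetical, and only the records traced in the text (those sent by T2) are used in the example.

```python
from typing import Dict


def can_finish_stream(sent_by_upstream: Dict[str, int],
                      upstream_done: Dict[str, bool],
                      received_by_downstream: Dict[str, int]) -> bool:
    """True when every upstream task has finished sending and the counts match."""
    all_sent = all(upstream_done.get(task, False) for task in sent_by_upstream)
    total_sent = sum(sent_by_upstream.values())
    total_received = sum(received_by_downstream.values())
    return all_sent and total_sent == total_received


# Stream (S13, 1): T2 sent five records, T4 reported four and T5 reported one.
ready = can_finish_stream({"T2": 5}, {"T2": True}, {"T4": 4, "T5": 1})

# Same stream before T5 has reported its record.
not_ready = can_finish_stream({"T2": 5}, {"T2": True}, {"T4": 4, "T5": 0})
```

Comparing totals rather than per-task counts means the Coordinator does not need to know in advance how the upstream task partitions its output among downstream tasks.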

T4 aggregates the four records (tmall.com, 0.15), (taobao.com, 0.85), (taobao.com, 0.15) and (tmall.com, 0.425) into two records, (tmall.com, 0.15 + 0.425) and (taobao.com, 0.15 + 0.85), and sends them to T6 in the UpdateRank node with StreamId (S14, 1). It then notifies the Coordinator module that it has processed the four records corresponding to (S13, 1) and has sent two records to T6. Similarly, T5 sends one record, (aliyun.com, 0.425), to T6 with StreamId (S14, 1), and notifies the Coordinator module.
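The aggregation performed by T4 is a per-key sum, which can be sketched as follows. The function name `aggregate` is hypothetical, and the sketch covers only the summation, not the comparison with the previous round's Rank values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(records: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the Rank contributions received for each page (Reduce-style)."""
    totals: Dict[str, float] = defaultdict(float)
    for page, value in records:
        totals[page] += value
    return dict(totals)


t4_input = [
    ("tmall.com", 0.15),
    ("taobao.com", 0.85),
    ("taobao.com", 0.15),
    ("tmall.com", 0.425),
]
t4_output = aggregate(t4_input)
```

For the four records received by T4 this gives a total of 0.575 for tmall.com and 1.0 for taobao.com, matching the two records sent on to T6.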

5) Processing at the UpdateRank node: the UpdateRank node is a Map-type calculation node. T6 in the UpdateRank node receives three records, (tmall.com, 0.575), (taobao.com, 1.0) and (aliyun.com, 0.425), and the Coordinator module notifies T6 that it should receive three records related to (S14, 1). After T6 has checked the received data, it starts updating its local list of Rank values, that is, it updates the Rank value of each web page. It then sends the updated data to the Scatter node with StreamId (S15, 1), and also notifies the Coordinator module that three records have been sent to the task processes in the Scatter node, which triggers the next round of iterative calculation.

6) Exiting the iteration loop: when the iteration reaches the second round, the UpdateRank node, as the tail node, identifies the StreamId of the data it receives. When it finds that the StreamId of the received data is (S14, 2), it determines that the iteration round has reached the predetermined count of two. It then takes the Output node as its downstream node, sends the updated data to the Output node with StreamId (S16, 0), and notifies the Coordinator module.
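The way each node picks the StreamId for its outgoing data, including the tail node's exit decision, can be sketched as a lookup on the incoming StreamId. This is an assumption-laden illustration: the function name `next_stream_id` is hypothetical, and placing the round increment at the head node is inferred from the StreamIds quoted in the example, (S12, 0) followed by (S13, 1), and (S14, 2) in the second round.

```python
from typing import Tuple

StreamId = Tuple[str, int]  # (edge id, iteration round)

MAX_ROUNDS = 2  # predetermined number of iterations in the example


def next_stream_id(node: str, received: StreamId,
                   max_rounds: int = MAX_ROUNDS) -> StreamId:
    """Return the StreamId a node attaches to the data it sends downstream."""
    _, rnd = received
    if node == "Scatter":
        # Head node: data from InitRank or UpdateRank starts a new round.
        return ("S13", rnd + 1)
    if node == "Sum":
        return ("S14", rnd)
    if node == "UpdateRank":
        # Tail node: leave the ring once the predetermined round is reached.
        if rnd >= max_rounds:
            return ("S16", 0)  # non-iteration edge to Output, round 0
        return ("S15", rnd)
    raise ValueError(f"unknown node: {node}")


first = next_stream_id("Scatter", ("S12", 0))
loop_back = next_stream_id("UpdateRank", ("S14", 1))
second = next_stream_id("Scatter", loop_back)
exit_id = next_stream_id("UpdateRank", ("S14", 2))
```

Only the round number changes between iterations; the edge identifiers S13, S14 and S15 are reused, which is what keeps the topology fixed.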

In the above process, the Coordinator module may internally use the state machine module described in the foregoing embodiments to coordinate and manage the task processes; the state transitions of the state machines trigger the calculation work of each task process.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical discs.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A processing unit for data iteration, characterized by comprising:

a code memory, configured to store program code for performing a data iteration operation;

a code read module, configured to read the program code from the code memory;

an iteration graph generation module, configured to generate an iteration graph model according to the program code read by the code read module, the iteration graph model comprising multiple iterative calculation nodes and an iteration edge information storage unit,

the multiple iterative calculation nodes being configured to perform iterative calculation according to the iteration edge information stored in the iteration edge information storage unit, the multiple iterative calculation nodes forming a ring topology;

the iteration edge information storage unit being configured to store information on the iteration edges between the iterative calculation nodes, the information on an iteration edge comprising first identification information that uniquely identifies an upstream/downstream node relationship and second identification information that records the current iteration round, the iteration data flowing between the iterative calculation nodes being associated with the first identification information and the second identification information;

an iteration operation trigger module, configured to, after the iteration graph generation module generates the iteration graph model, receive initial iteration data of the iteration operation and input it into the iteration graph model, so as to trigger the iteration operation.
2. The processing unit according to claim 1, characterized in that

the multiple iterative calculation nodes comprise a head node and a tail node; in addition to performing iterative calculation, the head node is further configured to receive the initial iteration data and trigger the iterative calculation; in addition to performing iterative calculation, the tail node is further configured to judge the iteration round and, after a predetermined condition is met, terminate the iterative calculation and output iteration result data.
3. The processing unit according to claim 2, characterized in that the iteration graph model further comprises an initialization node and a first non-iteration edge information storage unit:

the initialization node is connected to the head node by a non-iteration edge, and is configured to process input data from outside the ring topology to generate the initial iteration data, and to send the initial iteration data to the head node according to the non-iteration edge information stored in the first non-iteration edge information storage unit,

the first non-iteration edge information storage unit is configured to store information on the non-iteration edge between the initialization node and the head node, the information on the non-iteration edge comprising the first identification information.
4. The processing unit according to claim 3, characterized in that the iteration graph model further comprises an input node, a first output node and a second non-iteration edge information storage unit:

the input node is connected to the initialization node, and is configured to receive input data from outside the topology and to send the input data to the initialization node according to the non-iteration edge information stored in the second non-iteration edge information storage unit;

the first output node is connected to the tail node, and is configured to obtain the iteration result data from the tail node according to the non-iteration edge information stored in the second non-iteration edge information storage unit, and to output the iteration result data to outside the topology;

the second non-iteration edge information storage unit is configured to store information on the non-iteration edge between the input node and the initialization node and information on the non-iteration edge between the first output node and the tail node.
5. The processing unit according to claim 1, characterized in that the iteration graph model further comprises an increment output node and a third non-iteration edge information storage unit:

the increment output node is connected to the iterative calculation node, and is configured to obtain and output intermediate data of the iterative process according to the non-iteration edge information stored in the third non-iteration edge information storage unit;

the third non-iteration edge information storage unit is configured to store information on the non-iteration edge between the increment output node and the iterative calculation node, the information on the non-iteration edge comprising the first identification information.
6. The processing unit according to claim 5, characterized in that the iteration graph model further comprises:

a second output node, connected to the increment output node, and configured to receive the intermediate data from the increment output node according to the non-iteration edge information stored in a fourth non-iteration edge information storage unit, and to output the intermediate data to outside the topology;

the fourth non-iteration edge information storage unit, configured to store information on the non-iteration edge between the increment output node and the second output node.
7. The processing unit according to claim 1, characterized in that the iterative calculation node comprises one task process or multiple concurrent task processes, and the iteration graph model further comprises a task orchestration module configured to coordinate and manage the multiple task processes.
8. The processing unit according to claim 7, characterized in that

each task process records, in association with the first identification information and the second record information, the following processing record information:

the read-in record count of iteration data received from the task processes of the upstream node, the processing-complete record count of the iteration data, and the write-out record count of iteration data sent to the task processes of the downstream node;

the task process sends the processing record information of the current iteration round to the task orchestration module;

the task orchestration module coordinates and manages the multiple task processes according to the processing record information.
9. The processing unit according to claim 8, characterized in that

the task orchestration module is provided with state machines in one-to-one correspondence with the iteration edges; the state machines communicate with each task process to obtain the processing record information, and switch state according to the processing record information;

taking any one of the state machines as a current state machine, the current state machine has the following three states:

read-in complete: when the first state machines corresponding to all upstream iteration edges are all in the write-out complete state, and the read-in record count of the current state machine equals the write-out record count of the first state machines, the current state machine is set to the read-in complete state;

processing complete: when the current state machine is in the read-in complete state, and the processing-complete record count of the current state machine equals the read-in record count of the current state machine, the current state machine is set to the processing complete state;

write-out complete: when the current state machine is in the processing complete state, and the write-out record count of the current state machine equals the processing-complete record count of the current state machine, the current state machine is set to the write-out complete state.

The task orchestration module drives the task processes through the state changes of the state machines.
10. The processing unit according to claim 1, characterized in that the iterative calculation is a Map/Reduce calculation, and the multiple iterative calculation nodes comprise at least one Map-type calculation node and at least one Reduce-type calculation node.
11. A processing method for data iteration, characterized by comprising:

generating an iteration graph model according to program code, the iteration graph model comprising multiple iterative calculation nodes and iteration edges;

the iteration graph model receiving initial iteration data of an iteration operation and triggering iterative calculation;

the multiple iterative calculation nodes performing iterative calculation processing in sequence, in iterative calculation order, according to the information on the iteration edges, wherein the information on an iteration edge comprises first identification information that uniquely identifies an upstream/downstream node relationship and second identification information that records the current iteration round, and the iteration data flowing between the iterative calculation nodes is associated with the first identification information and the second identification information.
12. The processing method according to claim 11, characterized by further comprising:

outputting iteration result data after the iterative calculation is complete.
13. The processing method according to claim 12, characterized in that

the multiple iterative calculation nodes comprise a head node and a tail node;

the iteration graph model receiving the initial iteration data of the iteration operation and triggering iterative calculation comprises:

the head node receiving the initial iteration data and triggering the iterative calculation;

outputting the iteration result data after the iterative calculation is complete comprises:

the tail node judging the iteration round in each round of iterative calculation and, after a predetermined condition is met, terminating the iterative calculation and outputting the iteration result data.
14. The processing method according to claim 13, characterized in that the iteration graph model further comprises an initialization node, and before the head node receives the initial iteration data and triggers the iterative calculation, the method further comprises:

the initialization node processing input data from outside the ring topology to generate the initial iteration data, and sending the initial iteration data to the head node according to the information on the non-iteration edge between the initialization node and the head node, wherein the initialization node is connected to the head node, and the information on the non-iteration edge comprises the first identification information.
15. The processing method according to claim 11, characterized in that the iteration graph model further comprises an increment output node, at least one of the iterative calculation nodes is connected to the increment output node, and the processing method further comprises:

during the iteration operation, sending intermediate data to the increment output node according to the information on the non-iteration edge between the iterative calculation node and the increment output node, the increment output node outputting the intermediate data externally, wherein the information on the non-iteration edge comprises the first identification information.
16. The processing method according to claim 11, characterized in that the iterative calculation node comprises one task process or multiple concurrent task processes, the iteration graph model further comprises a task orchestration module, and the multiple task processes are coordinated and managed by the task orchestration module;

each task process records, in association with the first identification information and the second record information, the following processing record information: the read-in record count of iteration data received from the task processes of the upstream node, the processing-complete record count of the iteration data, and the write-out record count of iteration data sent to the task processes of the downstream node;

during iterative calculation, the task process sends the processing record information of the current iteration round to the task orchestration module, and the task orchestration module coordinates and manages the multiple task processes according to the processing record information.
17. The processing method according to claim 16, characterized in that

the task orchestration module is provided with state machines in one-to-one correspondence with the iteration edges; the state machines communicate with each task process to obtain the processing record information, and switch state according to the processing record information;

taking any one of the state machines as a current state machine, the current state machine has the following three states:

read-in complete: when the first state machines corresponding to all upstream iteration edges are all in the write-out complete state, and the read-in record count of the current state machine equals the write-out record count of the first state machines, the current state machine is set to the read-in complete state;

processing complete: when the current state machine is in the read-in complete state, and the processing-complete record count of the current state machine equals the read-in record count of the current state machine, the current state machine is set to the processing complete state;

write-out complete: when the current state machine is in the processing complete state, and the write-out record count of the current state machine equals the processing-complete record count of the current state machine, the current state machine is set to the write-out complete state.

The task orchestration module drives the task processes through the state changes of the state machines.