Summary of the invention
Embodiments of the invention provide a GPU-based branch processing method and device, which can improve the execution efficiency of branches while preserving the code logic.
To achieve the above objective, the embodiments of the invention adopt the following technical solutions.
In a first aspect, a GPU-based branch processing method is provided, including:
obtaining a message node corresponding to a branch currently to be processed, where the message node includes at least one piece of pending data;
when the message node meets a preset condition, processing the pending data in the message node;
where the preset condition is set so that a maximum amount of the pending data can currently be processed.
In a first possible implementation of the first aspect, before the obtaining of the message node corresponding to the branch task currently to be processed, the method further includes:
performing branch division on the pending data according to a preset strategy, where the pending data in a same branch executes a same instruction.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the preset condition includes: the quantity of the pending data in the message node meets a preset threshold; and/or
a timer corresponding to the message node expires.
With reference to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a third possible implementation of the first aspect, when the quantity of the pending data in the message node meets the preset threshold, the pending data in the message node is obtained and processed.
With reference to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
when the timer corresponding to the message node expires, obtaining and processing the pending data in the message node.
With reference to the first aspect, or any one or more of the first, second, third, and fourth possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the method further includes:
when the quantity of the pending data in the message node does not meet the preset threshold, setting a timer for the message node.
In a second aspect, a GPU-based branch processing device is provided, including:
an acquiring unit, configured to obtain a message node corresponding to a branch currently to be processed, where the message node includes at least one piece of pending data;
a processing unit, configured to process the pending data in the message node when the message node meets a preset condition;
where the preset condition is set so that a maximum amount of the pending data can currently be processed.
In a first possible implementation of the second aspect, the processing unit is further configured to: before the acquiring unit obtains the message node corresponding to the branch task currently to be processed, perform branch division on the pending data according to a preset strategy, where the pending data in a same branch executes a same instruction.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the preset condition in the processing unit includes: the quantity of the pending data in the message node meets a preset threshold; and/or
a timer corresponding to the message node expires.
With reference to the second aspect, the first possible implementation of the second aspect, or the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the processing unit is specifically configured to obtain and process the pending data in the message node when the quantity of the pending data in the message node meets the preset threshold.
With reference to the second aspect, the first possible implementation of the second aspect, or the second possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the processing unit is specifically configured to obtain and process the pending data in the message node when the timer corresponding to the message node expires.
With reference to the second aspect, or any one or more of the first, second, third, and fourth possible implementations of the second aspect, in a fifth possible implementation of the second aspect, the device further includes:
a setting unit, configured to set a timer for the message node when the quantity of the pending data in the message node does not meet the preset threshold.
With the GPU-based branch processing method and device provided by the embodiments of the present invention, after the message node corresponding to the branch to be processed is obtained, and when the message node meets the preset condition, the pending data in the message node is obtained and processed. Compared with the prior art, in which the execution efficiency of branches is improved by changing the code logic that processes the data, the embodiments of the present invention process a maximum amount of data in a single pass of executing a same instruction while preserving the code logic, thereby improving the execution efficiency of branches.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a GPU-based branch processing method. As shown in Figure 1, the method includes:
101. Obtain a message node corresponding to a branch currently to be processed.
The message node includes at least one piece of pending data.
It should be noted that when one or more groups of data exist, the data is divided into corresponding branches according to the different processing conditions it meets; that is, each processing condition corresponds to a branch. For example, when two pieces of data are processed, if the two pieces of data meet condition 1, the two numbers are added, and if they meet condition 2, the two numbers are multiplied; in this case, condition 1 corresponds to one branch and condition 2 corresponds to another branch.
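The branch division of this example can be sketched roughly as follows. This is a hypothetical Python illustration only: the concrete conditions, the function names, and the add/multiply operations merely mirror the example above and are not part of the invention.

```python
def divide_into_branches(data_pairs):
    """Group (a, b) pairs into branches keyed by the condition they meet."""
    branches = {"condition_1": [], "condition_2": []}
    for a, b in data_pairs:
        # Illustrative stand-ins: "condition 1" -> add, "condition 2" -> multiply.
        if a + b < 10:                       # assumed test for "condition 1"
            branches["condition_1"].append((a, b))
        else:                                # assumed test for "condition 2"
            branches["condition_2"].append((a, b))
    return branches

def process_branch(name, pairs):
    """Every pair in a branch executes the same instruction."""
    if name == "condition_1":
        return [a + b for a, b in pairs]     # one instruction for this branch
    return [a * b for a, b in pairs]         # another instruction for the other
```

Because each branch holds only data that executes the same instruction, a later dispatch step can hand a whole branch to the threads in one pass.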
102. When the message node meets a preset condition, obtain and process the pending data in the message node.
It should be noted that, in order to process a maximum amount of data in a single pass of executing a same instruction, the preset condition is set by the system or by the user, and is set so that a maximum amount of the pending data can currently be processed.
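A minimal sketch of such a preset condition, under the assumption that it combines a quantity threshold with a timer deadline as described in the later implementations; all names and the clock model are illustrative, not prescribed by the invention.

```python
import time

class MessageNode:
    """Toy model of a message node buffering pending data for one branch."""

    def __init__(self, threshold):
        self.pending = []          # pending data in the node
        self.threshold = threshold # preset threshold on the quantity of data
        self.deadline = None       # timer deadline, if one has been set

    def meets_preset_condition(self, now=None):
        """True when the quantity meets the threshold and/or the timer expired."""
        now = time.monotonic() if now is None else now
        count_ok = len(self.pending) >= self.threshold
        timer_expired = self.deadline is not None and now >= self.deadline
        return count_ok or timer_expired
```

Either arm of the condition suffices on its own, matching the "and/or" phrasing above.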
With the GPU-based branch processing method provided by this embodiment of the present invention, after the message node corresponding to the branch to be processed is obtained, and when the message node meets the preset condition, the pending data in the message node is obtained and processed. Compared with the prior art, in which the execution efficiency of branches is improved by changing the code logic that processes the data, this embodiment of the present invention processes a maximum amount of data in a single pass of executing a same instruction while preserving the code logic, thereby improving the execution efficiency of branches.
Another embodiment of the present invention provides a GPU-based branch processing method. With reference to the description of the previous embodiment, before the above step 101 is performed:
First, the pending data needs to be divided according to a preset strategy, so that the pending data in a same branch executes a same instruction.
Preferably, the preset strategy includes, but is not limited to, instructions to be executed that are set by the user or by the system according to the format, expression value, or other attributes of the data; the preset strategy includes the correspondence between the pending data and the instructions. The pending data includes one or more groups of data, and the instruction that these data need to execute is the same. The threads that process these data number two or more; specifically, they may include one main thread and one or more auxiliary threads.
Further, the preset condition in the above step 102 includes: the quantity of the pending data in the message node meets a preset threshold; and/or the timer corresponding to the message node expires.
Specifically, taking the case where the pending data is a data group as an example, and with reference to the above description of processing data by the main and auxiliary threads: when 32 parallel threads are set on the GPU, the preset threshold is preferably set to an integral multiple of 32. That is to say, when the number of pending data groups stored in the message node meets or exceeds the preset threshold, the main thread and the auxiliary threads obtain the data groups to be processed and execute the corresponding logical code to complete the data processing.
With reference to the different forms of the preset condition, the detailed process of performing the above step 102, as shown in Figure 2, includes:
1021. When the quantity of the pending data in the message node meets the preset threshold, obtain and process the pending data in the message node.
1022. When the timer corresponding to the message node expires, obtain and process the pending data in the message node.
It should be noted that the preset threshold is set so that a maximum amount of data can be processed in a single pass; however, there are also situations in which the pending data cannot meet the preset threshold. In such cases, a timer needs to be set to guarantee that the data can still be processed in time; that is, when the quantity of the pending data in the message node does not meet the preset threshold, a timer is set for the message node.
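The timer fallback can be sketched like this; the node shape, the fixed timeout value, and all names are assumptions made purely for illustration.

```python
TIMEOUT = 2.0   # seconds; illustrative aging interval, not from the patent

def update_node_timer(node, now):
    """Set a deadline on a below-threshold node so its data is processed in time.

    `node` is assumed to be a dict with 'pending', 'threshold', and 'deadline'
    keys; a deadline already in place is left untouched.
    """
    if len(node["pending"]) < node["threshold"] and node["deadline"] is None:
        node["deadline"] = now + TIMEOUT   # record current time plus timeout
    return node
```

Once the deadline passes, the node qualifies under the timer arm of the preset condition even though its quantity never reached the threshold.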
Another embodiment of the present invention provides a GPU-based branch processing method. The method can be applied in the following processing framework. As shown in Figure 3, the framework includes a message cache ring 101, a main thread 102, an auxiliary thread 103, and an aging linked list 104. The message cache ring 101 includes multiple message nodes, the aging linked list 104 includes multiple aging timer nodes, and the message nodes in the message cache ring 101 correspond one-to-one with the aging timer nodes in the aging linked list 104. The main thread 102 and the auxiliary thread 103 can obtain pending data from the message nodes. The main thread 102 includes four modules: task acquisition (get event), instruction synchronization, message acquisition (get msg), and service processing; the auxiliary thread 103 includes four modules: a no-operation instruction (nop), instruction synchronization, get msg, and service processing. The service processing modules in the main thread 102 and the auxiliary thread 103 are configured to execute the corresponding service code on the messages in the message nodes obtained from the message cache ring 101. In this processing framework, the auxiliary thread 103 can represent a set of auxiliary threads.
Based on the above framework, as shown in Figure 4, the method includes:
401. Divide the pending data according to the preset strategy, and merge the pending data belonging to a same branch.
Further, the pending data is stored in the message nodes of the message cache ring 101, and the branch logic execution sequence is deployed in the main thread 102, so that the main thread 102 can complete the thread scheduling and process the data stored in the message nodes.
It should be noted that, in this embodiment, executing a same instruction on data constitutes a branch task, and the content stored in the message nodes of the message cache ring 101 also includes: messages exchanged between the branch task and the outside, internal processing messages related to the branch task, and the like.
402. The main thread 102 checks whether the message node in the message cache ring 101 corresponding to the first branch task meets the preset threshold.
Specifically, when it is determined that the message node does not meet the preset threshold, the current time is recorded, and the message node is inserted into the aging linked list 104.
When it is determined that the message node meets the preset threshold, the following step 403 is executed.
403. The main thread 102 obtains the message node corresponding to the current branch task.
Further, after it is determined that the main thread 102 and the auxiliary thread 103 have synchronized their instructions, the following step 404 is executed.
404. The main thread 102 and the auxiliary thread 103 obtain and process the pending data through the service processing modules.
Further, after the processing is completed, the aging timer corresponding to the current message node is refreshed.
Further, when it is determined that the data obtained from the message node needs to be further processed, the data is written back into the message node in the message cache ring 101 corresponding to the next branch task.
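Steps 401 to 404 can be condensed into a sketch such as the following, where the ring is modeled as a plain list of message-node dictionaries and the aging list as a list of node indices; instruction synchronization between the main and auxiliary threads is elided, and every name is illustrative rather than taken from the patent.

```python
def step_branch_task(nodes, aging_list, task_index, now, process, needs_next):
    """One pass of the main-thread flow for a single branch task."""
    node = nodes[task_index]
    if len(node["pending"]) < node["threshold"]:
        node["deadline"] = now               # 402: record the current time ...
        aging_list.append(task_index)        # ... and insert into the aging list
        return []
    batch, node["pending"] = node["pending"], []   # 403: obtain the node's data
    results = [process(d) for d in batch]          # 404: service processing
    nxt = nodes[(task_index + 1) % len(nodes)]     # next branch task's node
    # Write back any result that needs a further processing step.
    nxt["pending"].extend(r for r in results if needs_next(r))
    return results
```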
Optionally, in another implementation of this embodiment of the present invention, when it is determined that the node obtained for the branch task currently being executed does not meet the preset threshold but has timed out, the following process is executed:
a. The main thread 102 obtains the timed-out message node.
b. The main thread 102 and the auxiliary thread 103 process the branch task in parallel.
c. After the processing is completed, the aging timer corresponding to the current node is refreshed.
Further, when it is determined that the data obtained from the message node needs to be further processed, the data is written back into the message node in the message cache ring 101 corresponding to the next branch task.
Optionally, in another implementation of this embodiment of the present invention, when it is determined that the current message node obtained neither meets the preset threshold nor has timed out, the main thread 102 obtains the next message node for processing according to the stored execution sequence of the branch tasks.
Optionally, in another implementation of this embodiment of the present invention, the main thread 102 can traverse the aging linked list 104 while traversing the message cache ring 101. Specifically, the following process is executed:
a1. The main thread determines the branch task currently to be executed, and traverses the message cache ring 101 to the message node corresponding to this branch task.
a2. Check whether the aging linked list 104 is empty.
Further, when the aging linked list 104 is not empty, the following step a3 is executed; when the aging linked list 104 is empty, only whether the message node in the message cache ring 101 meets the preset threshold is judged. The implementation of this case is the same as that described in the above steps 401-404, and is not repeated here.
a3. The main thread 102 obtains a timed-out node from the aging linked list 104.
a4. The main thread 102 and the auxiliary thread 103 execute in parallel the branch task corresponding to the timed-out node.
Further, after the processing is completed, the processed timed-out node in the aging linked list 104 is refreshed.
It should be noted that after the main thread 102 has finished traversing and processing the aging linked list 104, it continues the traversal processing at the corresponding position of the message cache ring 101.
For example, if the main thread 102 was at the third branch task when it turned to process the aging linked list 104, then after the aging linked list has been processed, it continues with the branch tasks after the third branch.
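The interleaved traversal a1 to a4 might be sketched as follows: a single-threaded illustration in which a helper drains timed-out nodes from the aging list before the ring traversal resumes; the timeout test and all names are assumptions for this sketch.

```python
def drain_aging_list(nodes, aging_list, now, timeout, process):
    """Process every timed-out node in the aging list; keep the rest waiting."""
    still_waiting = []
    for idx in aging_list:
        node = nodes[idx]
        if node["deadline"] is not None and now - node["deadline"] >= timeout:
            batch, node["pending"] = node["pending"], []
            for d in batch:
                process(d)                 # a4: main/auxiliary processing stand-in
            node["deadline"] = None        # refresh the processed timed-out node
        else:
            still_waiting.append(idx)      # not yet timed out; leave in the list
    return still_waiting
```

After the helper returns, the caller would resume the ring traversal at the branch task it had reached, matching the third-branch example above.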
In addition, the message cache ring 101 involved in this embodiment aggregates messages, which ensures that the multiple threads executing with synchronized instructions process different data of a same type, maximizing thread execution efficiency.
It should be noted that when the main thread 102 and the auxiliary thread 103 have processed the last branch task, or the next processing step needs to be handled by another branch task, the main thread 102 sends the message directly to the next branch task.
Another embodiment of the present invention provides a GPU-based branch processing device. As shown in Figure 5, the device includes an acquiring unit 51 and a processing unit 52.
The acquiring unit 51 is configured to obtain a message node corresponding to a branch currently to be processed.
The message node includes at least one piece of pending data.
The processing unit 52 is configured to process the pending data in the message node when the message node meets a preset condition.
Preferably, the preset condition includes: the quantity of the pending data in the message node meets a preset threshold; and/or the timer corresponding to the message node expires.
The preset condition is set so that a maximum amount of the pending data can currently be processed.
Optionally, the processing unit 52 is further configured to perform branch division on the pending data according to a preset strategy before the acquiring unit 51 obtains the message node corresponding to the branch task currently to be processed.
It should be noted that the pending data in a same branch executes a same instruction.
Specifically, the processing unit 52 is further configured to: when the quantity of the pending data in the message node meets the preset threshold, obtain and process the pending data in the message node; and when the timer corresponding to the message node expires, obtain and process the pending data in the message node.
Optionally, as shown in Figure 6, the device further includes a setting unit 53.
The setting unit 53 is configured to set a timer for the message node when the quantity of the pending data in the message node does not meet the preset threshold.
With the GPU-based branch processing device provided by this embodiment of the present invention, after the acquiring unit obtains the message node corresponding to the branch to be processed, and when the message node meets the preset condition, the processing unit processes the pending data in the message node. Compared with the prior art, in which the execution efficiency of branches is improved by changing the code logic that processes the data, this embodiment of the present invention processes a maximum amount of data in a single pass of executing a same instruction while preserving the code logic, thereby improving the execution efficiency of branches.
Another embodiment of the present invention provides a GPU-based branch processing device. As shown in Figure 7, the device includes a memory 71, a processor 72, and a bus 73, where the memory 71 and the processor 72 are communicatively connected through the bus 73.
The memory 71 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 71 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present invention are implemented by software or firmware, the program code for implementing the technical solutions provided by the embodiments of the present invention is stored in the memory 71 and is executed by the processor 72.
The processor 72 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs so as to implement the technical solutions provided by the embodiments of the present invention.
The bus 73 may include a path for transmitting information between the components of the device (such as the memory 71 and the processor 72).
It should be noted that although the hardware shown in Figure 7 only includes the memory 71, the processor 72, and the bus 73, in a specific implementation process, persons skilled in the art should understand that the terminal also includes other devices necessary for normal operation. Meanwhile, according to specific needs, persons skilled in the art should understand that hardware devices implementing other functions may also be included.
Specifically, the device shown in Figure 7 is configured to implement the method flows shown in Figures 1 to 4.
The processor 72 is configured to obtain a message node corresponding to a branch currently to be processed, and is further configured to process the pending data in the message node when the message node meets a preset condition.
The message node includes at least one piece of pending data, and the preset condition is set so that a maximum amount of the pending data can currently be processed. Preferably, the preset condition includes: the quantity of the pending data in the message node meets a preset threshold; and/or the timer corresponding to the message node expires.
The memory 71 is configured to store the pending data in the message node.
Optionally, the processor 72 is further configured to perform branch division on the pending data according to a preset strategy before obtaining the message node corresponding to the branch task currently to be processed.
It should be noted that the pending data in a same branch executes a same instruction.
The processor 72 is specifically configured to: when the quantity of the pending data in the message node meets the preset threshold, obtain and process the pending data in the message node; and when the timer corresponding to the message node expires, obtain and process the pending data in the message node.
Optionally, the processor 72 is further configured to set a timer for the message node when the quantity of the pending data in the message node does not meet the preset threshold.
The memory 71 is further configured to store the preset threshold and the data processing instructions.
With the GPU-based branch processing device provided by this embodiment of the present invention, after the processor obtains the message node corresponding to the branch to be processed, and when the message node meets the preset condition, the pending data in the message node is processed. Compared with the prior art, in which the execution efficiency of branches is improved by changing the code logic that processes the data, this embodiment of the present invention processes a maximum amount of data in a single pass of executing a same instruction while preserving the code logic, thereby improving the execution efficiency of branches.
From the above description of the embodiments, persons skilled in the art can clearly understand that the present invention can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solutions of the present invention, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a hard disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above descriptions are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be readily figured out by persons familiar with the technical field within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.