CN103714511B - GPU-based branch processing method and device - Google Patents

GPU-based branch processing method and device

Info

Publication number
CN103714511B
Authority
CN
China
Prior art keywords
information node
branch
data
pending
pending data
Prior art date
2013-12-17
Legal status
Active
Application number
CN201310695410.8A
Other languages
Chinese (zh)
Other versions
CN103714511A (en)
Inventor
殷罗英
朱坤
吴钊源
陈剑军
Current Assignee
Bengbu Hongjing Technology Co.,Ltd.
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201310695410.8A
Publication of CN103714511A
Application granted
Publication of CN103714511B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a GPU-based branch processing method and device, relating to the technical field of data processing, which improve branch execution efficiency while preserving the code logic. In an embodiment of the invention, after a message node corresponding to a branch to be processed is obtained, and when the message node satisfies a preset condition, the pending data in the message node is obtained and processed. The technical solution provided by the embodiments of the invention is mainly applied to data processing flows.

Description

GPU-based branch processing method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a branch processing method and device based on a GPU (graphics processing unit).
Background technology
At present, GPUs have parallel processing capability and programmable pipelines, and can process non-graphical data. Their performance is particularly strong when the SIMD (single instruction, multiple data) model is used and the arithmetic workload is much larger than the cost of data scheduling and transfer, so GPUs are widely used in fields such as supercomputing, scientific computing, finance, and chemistry. Specifically, the SIMD model adopted by a GPU is a technique that achieves spatial parallelism by using one controller to drive multiple processors, each of which simultaneously executes the same operation on one element of a group of data (also called a "data vector"). This technique exploits the advantage of multiple processors when a batch of data is processed according to the same instruction, but when the instructions differ or the amount of concurrent data is insufficient, the processors must execute the different instructions in separate batches, which lowers both data-processing efficiency and instruction-execution efficiency.
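As an illustration only (this example is not part of the patent text; the kernel and variable names are assumptions), the following minimal CUDA kernel shows the kind of divergent branch described above: threads of one warp that take different sides of the if/else are serialized by the SIMD hardware, so the two instruction paths are executed one after the other instead of concurrently.

    __global__ void divergent_process(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;
        /* Threads of the same warp that disagree on this condition are
           executed in separate batches, one side after the other. */
        if (in[i] > 0.0f)
            out[i] = in[i] + 1.0f;   /* branch 1: add */
        else
            out[i] = in[i] * 2.0f;   /* branch 2: multiply */
    }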
To solve the above problem that executing different instructions in batches lowers data-processing efficiency, a delayed-iteration technique is used. Specifically, when a thread contains branches within one iteration, delayed iteration ensures that in each loop iteration, for a multi-branch scenario, all threads are delayed so that only a single path is taken and no operation is performed in the other branches. In the course of implementing the prior art, the inventors found that although delayed iteration improves branch execution efficiency on the GPU to some extent, it destroys the logical order in which the code in a thread executes and changes the implementation of the original service.
Summary of the invention
Embodiments of the present invention provide a GPU-based branch processing method and device that improve branch execution efficiency while preserving the code logic.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, a GPU-based branch processing method is provided, including:
obtaining a message node corresponding to a branch currently to be processed, where the message node includes at least one or more items of pending data;
when the message node satisfies a preset condition, processing the pending data in the message node;
where the preset condition is used to enable the largest possible amount of the pending data to be processed at the current time.
In a first possible implementation of the first aspect, before obtaining the message node corresponding to the branch task currently to be processed, the method further includes:
performing branch classification on the pending data according to a preset strategy, where the pending data in the same branch executes the same instruction.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the preset condition includes: the quantity of the pending data in the message node reaches a preset threshold; and/or,
a timer corresponding to the message node has expired.
With reference to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a third possible implementation of the first aspect, when the quantity of the pending data in the message node reaches the preset threshold, the pending data in the message node is obtained and processed.
With reference to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
when the timer corresponding to the message node expires, obtaining and processing the pending data in the message node.
With reference to any one or more of the first aspect and its first, second, third, or fourth possible implementations, in a fifth possible implementation of the first aspect, the method further includes:
when the quantity of the pending data in the message node does not reach the preset threshold, setting a timer for the message node.
In a second aspect, a GPU-based branch processing device is provided, including:
an acquiring unit, configured to obtain a message node corresponding to a branch currently to be processed, where the message node includes at least one or more items of pending data;
a processing unit, configured to process the pending data in the message node when the message node satisfies a preset condition;
where the preset condition is used to enable the largest possible amount of the pending data to be processed at the current time.
In a first possible implementation of the second aspect, the processing unit is further configured to perform branch classification on the pending data according to a preset strategy before the acquiring unit obtains the message node corresponding to the branch task currently to be processed, where the pending data in the same branch executes the same instruction.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the preset condition used by the processing unit includes: the quantity of the pending data in the message node reaches a preset threshold; and/or,
a timer corresponding to the message node has expired.
With reference to the second aspect, the first possible implementation of the second aspect, or the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the processing unit is specifically configured to obtain and process the pending data in the message node when the quantity of the pending data in the message node reaches the preset threshold.
With reference to the second aspect, the first possible implementation of the second aspect, or the second possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the processing unit is specifically configured to obtain and process the pending data in the message node when the timer corresponding to the message node expires.
With reference to any one or more of the second aspect and its first, second, third, or fourth possible implementations, in a fifth possible implementation of the second aspect, the device further includes:
a setting unit, configured to set a timer for the message node when the quantity of the pending data in the message node does not reach the preset threshold.
With the GPU-based branch processing method and device provided by the embodiments of the present invention, after the message node corresponding to a branch to be processed is obtained, and when the message node satisfies the preset condition, the pending data in the message node is obtained and processed. Compared with the prior art, in which branch execution efficiency is improved during branch-based data processing by changing the code logic used to process the data, the embodiments of the present invention process the largest possible amount of data in a single pass that executes the same instruction while preserving the code logic, thereby improving branch execution efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a flowchart of a GPU-based branch processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a GPU-based branch processing method according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of the processing framework to which a GPU-based branch processing method according to another embodiment of the present invention is applied;
Fig. 4 is a flowchart of a branch processing method executed on the processing framework shown in Fig. 3 according to another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a GPU-based branch processing device according to another embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a GPU-based branch processing device according to another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a GPU-based branch processing device according to another embodiment of the present invention.
Description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a GPU-based branch processing method. As shown in Fig. 1, the method includes:
101. Obtain the message node corresponding to the branch currently to be processed.
The message node includes at least one or more items of pending data.
It should be noted that, when one or more groups of data exist, the data is divided into corresponding branches according to the different processing conditions the data satisfies; that is, each processing condition corresponds to a branch. For example, when two numbers are processed, if the two numbers satisfy condition 1 they are added, and if they satisfy condition 2 they are multiplied; condition 1 then corresponds to one branch and condition 2 corresponds to another branch.
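A minimal host-side sketch of this classification (illustrative only; the predicate and the bucket names are assumptions, not taken from the patent) places each item in the branch whose processing condition it satisfies, so that every item within one branch later executes the same instruction:

    #include <vector>

    /* Condition 1 (here assumed to be x > 0) selects the "add" branch,
       condition 2 selects the "multiply" branch. */
    static void classify(const std::vector<float> &data,
                         std::vector<float> &add_branch,
                         std::vector<float> &mul_branch)
    {
        for (float x : data) {
            if (x > 0.0f)
                add_branch.push_back(x);
            else
                mul_branch.push_back(x);
        }
    }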
102. When the message node satisfies the preset condition, obtain and process the pending data in the message node.
It should be noted that, in order to process as much data as possible in one pass that executes the same instruction, the preset condition is set by the system or by the user, and the preset condition is used to enable the largest possible amount of pending data to be processed at the current time.
With the GPU-based branch processing method provided by this embodiment of the present invention, after the message node corresponding to the branch to be processed is obtained, and when the message node satisfies the preset condition, the pending data in the message node is obtained and processed. Compared with the prior art, in which branch execution efficiency is improved during branch-based data processing by changing the code logic used to process the data, this embodiment of the present invention processes the largest possible amount of data in a single pass that executes the same instruction while preserving the code logic, thereby improving branch execution efficiency.
Another embodiment of the present invention provides a GPU-based branch processing method. In connection with the description of the previous embodiment, before step 101 is executed:
First, the pending data needs to be divided according to the preset strategy so that the pending data in the same branch executes the same instruction.
Preferably, the preset strategy includes, but is not limited to, execution instructions set by the user or by the system according to, for example, the format or the expression value of the data; the preset strategy includes the correspondence between the pending data and the instructions. The pending data includes one or more groups of data that need to execute the same instruction, and the data is processed by two or more threads, which may specifically include one main thread and one or more auxiliary threads.
Further, the preset condition described in step 102 includes: the quantity of the pending data in the message node reaches a preset threshold; and/or, the timer corresponding to the message node has expired.
Specifically, taking the case where the pending data consists of data groups as an example, and in connection with the above description of processing data with main and auxiliary threads, when 32 parallel threads are configured on the GPU, the preset threshold is preferably set to an integer multiple of 32. That is, when the number of pending data groups stored in the message node reaches or exceeds the preset threshold, the main thread and the auxiliary threads obtain the data groups to be processed and execute the corresponding logic code to complete the data processing.
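A hedged sketch of this threshold check (illustrative only; the kernel, the device buffers d_in/d_out, and the example threshold of 128 = 4 x 32 are assumptions): pending items accumulate in a node, and the branch kernel is launched only once at least an integer multiple of the 32-wide warp is available, so the launched warps all execute the same instruction over the accumulated batch.

    #include <cuda_runtime.h>
    #include <vector>

    __global__ void add_branch_kernel(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i] + 1.0f;       /* every thread runs the same instruction */
    }

    /* d_in and d_out are device buffers assumed to be preallocated and large enough.
       Returns true only if the node held at least `threshold` items (a multiple of 32). */
    static bool maybe_dispatch(const std::vector<float> &pending,
                               float *d_in, float *d_out, int threshold /* e.g. 128 */)
    {
        int n = static_cast<int>(pending.size());
        if (n < threshold)
            return false;                /* not enough data yet: keep accumulating */
        cudaMemcpy(d_in, pending.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        add_branch_kernel<<<(n + 31) / 32, 32>>>(d_in, d_out, n);
        cudaDeviceSynchronize();
        return true;
    }

If the threshold is not reached, the data simply keeps accumulating; the timer described below guarantees that it is still processed in time.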
Depending on the form of the preset condition, the detailed process of step 102, as shown in Fig. 2, includes:
1021. When the quantity of the pending data in the message node reaches the preset threshold, obtain and process the pending data in the message node.
1022. When the timer corresponding to the message node expires, obtain and process the pending data in the message node.
It should be noted that the preset threshold is set so that as much data as possible can be processed in one pass. However, there are also cases in which the pending data cannot reach the threshold, and a timer is then needed to guarantee that the data is still processed in time; that is, when the quantity of the pending data in the message node does not reach the preset threshold, a timer is set for the message node.
Another embodiment of the present invention provides a GPU-based branch processing method. The method may be applied to the processing framework shown in Fig. 3. The framework includes a message cache ring 101, a main thread 102, auxiliary threads 103, and an aging linked list 104. The message cache ring 101 includes multiple message nodes, the aging linked list 104 includes multiple aging-timer nodes, and the message nodes in the message cache ring 101 correspond one-to-one with the aging-timer nodes in the aging linked list 104. The main thread 102 and the auxiliary threads 103 can obtain pending data from the message nodes. The main thread 102 includes four modules: task acquisition (get event), instruction synchronization, message acquisition (get msg), and service processing; an auxiliary thread 103 includes four modules: no-operation (nop), instruction synchronization, get msg, and service processing. The service processing modules of the main thread 102 and the auxiliary threads 103 execute the corresponding service code for the information in the message nodes obtained from the message cache ring 101. In this processing structure, the auxiliary thread 103 may represent a set of auxiliary threads.
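The relationships described above can be summarized with the following host-side struct sketch (illustrative only; all type and field names are assumptions): a ring of message nodes, each paired with an aging-timer entry, from which the main thread and the set of auxiliary threads pull pending data.

    #include <chrono>
    #include <vector>

    struct MessageNode {                     /* one node of the message cache ring (101) */
        int branch_id = -1;                  /* branch task this node belongs to */
        std::vector<float> pending;          /* pending data accumulated for the branch */
        bool timer_armed = false;            /* true once the node is on the aging list */
        std::chrono::steady_clock::time_point deadline;   /* its aging timer (104) */
    };

    struct MessageRing {                     /* message cache ring (101) */
        std::vector<MessageNode> nodes;      /* one node per branch task */
        std::size_t cursor = 0;              /* position of the main thread's traversal */
    };

    struct AgingList {                       /* aging linked list (104) */
        std::vector<MessageNode *> timed_nodes;  /* nodes whose timer is currently armed */
    };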
Based on the above framework, as shown in Fig. 4, the method includes:
401. Divide the pending data according to the preset strategy, and merge the pending data belonging to the same branch.
Further, the pending data is stored in the message nodes of the message cache ring 101, and the branch-logic execution order is deployed in the main thread 102, so that the main thread 102 can complete thread scheduling and process the data stored in the message nodes.
It should be noted that, in this embodiment, executing the same instruction on the data constitutes a branch task, and the content stored in a message node of the message cache ring 101 also includes messages exchanged between the branch task and the outside, messages representing internal processing related to the branch task, and so on.
402. The main thread 102 checks whether the message node of the message cache ring 101 corresponding to the first branch task reaches the preset threshold.
Specifically, when it is determined that the message node does not reach the preset threshold, the current time is recorded and the message node is inserted into the aging linked list 104.
When it is determined that the message node reaches the preset threshold, the following step 403 is executed.
403. The main thread 102 obtains the message node corresponding to the current branch task.
Further, after it is determined that the instructions of the main thread 102 and the auxiliary threads 103 are synchronized, the following step 404 is executed.
404. The main thread 102 and the auxiliary threads 103 obtain and process the pending data through their service processing modules.
Further, after the processing is completed, the aging timer corresponding to the current message node is refreshed.
Further, when it is determined that the data obtained from the message node needs further processing, the data is written back into the message node of the message cache ring 101 corresponding to the next branch task.
Optionally, in another implementation of this embodiment of the present invention, when it is determined that the message node of the branch task currently being executed does not reach the preset threshold but has timed out, the following process is executed:
a. The main thread 102 obtains the timed-out message node.
b. The main thread 102 and the auxiliary threads 103 process the branch task in parallel.
c. After the processing is completed, the aging timer corresponding to the current node is refreshed.
Further, when it is determined that the data obtained from the message node needs further processing, the data is written back into the message node of the message cache ring 101 corresponding to the next branch task.
Optionally, in another implementation of this embodiment of the present invention, when it is determined that the obtained current message node neither reaches the preset threshold nor has timed out, the main thread 102 obtains the next message node for processing according to the stored execution order of the branch tasks.
Optionally, in another implementation of this embodiment of the present invention, while traversing the message cache ring 101, the main thread 102 may also traverse the aging linked list 104; the specific process is as follows:
a1. The main thread determines the branch task currently to be executed and traverses the message cache ring 101 to find the message node corresponding to this branch task.
a2. Check whether the aging linked list 104 is empty.
Further, when the aging linked list 104 is not empty, the following step a3 is executed; when the aging linked list 104 is empty, it is only checked whether the message node on the message cache ring 101 reaches the preset threshold. This case is implemented in the same way as steps 401-404 described above and is not repeated here.
a3. The main thread 102 obtains the timed-out node from the aging linked list 104.
a4. The main thread 102 and the auxiliary threads 103 execute, in parallel, the branch task corresponding to the timed-out node.
Further, after the processing is completed, the processed timed-out node is refreshed in the aging linked list 104.
It should be noted that after the main thread 102 has traversed and processed the aging linked list 104, it continues the traversal of the message cache ring 101 from the corresponding position.
For example, if the main thread 102 turned to the aging linked list 104 while at the third branch task, then after the aging linked list has been processed it continues with the branch tasks after the third branch.
In addition, the message cache ring 101 involved in this embodiment aggregates messages, which ensures that the multiple threads executing synchronized instructions process different data of the same type, maximizing thread execution efficiency.
It should be noted that when the main thread 102 and the auxiliary threads 103 have finished processing the last branch task, or the next processing step needs to be handled by another branch task, the main thread 102 sends the message directly to the next branch task.
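Putting steps 401-404 and the timeout handling together, the following host-side sketch outlines one pass of the main thread's loop (illustrative only; dispatch_to_gpu is a stand-in for the parallel service processing performed by the main and auxiliary threads, and the types follow the struct sketch given after the framework description above).

    #include <algorithm>
    #include <chrono>

    /* Placeholder for steps 403-404: copy the node's pending data to the device and
       let the main and auxiliary threads execute the branch's service code. */
    static void dispatch_to_gpu(MessageNode &node)
    {
        (void)node;
    }

    static void main_thread_step(MessageRing &ring, AgingList &aging,
                                 std::size_t threshold, std::chrono::milliseconds timeout)
    {
        auto now = std::chrono::steady_clock::now();

        /* 402: check whether the node of the current branch task reaches the threshold. */
        MessageNode &node = ring.nodes[ring.cursor];
        if (node.pending.size() >= threshold) {
            dispatch_to_gpu(node);                 /* 403-404: obtain and process the data */
            node.pending.clear();
            node.timer_armed = false;              /* refresh the node's aging timer */
        } else if (!node.timer_armed) {
            node.timer_armed = true;               /* below threshold: record the time and */
            node.deadline = now + timeout;         /* insert the node into the aging list  */
            aging.timed_nodes.push_back(&node);
        }

        /* a2-a4: if the aging list is not empty, process any node whose timer expired. */
        for (MessageNode *timed : aging.timed_nodes) {
            if (timed->timer_armed && now >= timed->deadline) {
                dispatch_to_gpu(*timed);
                timed->pending.clear();
                timed->timer_armed = false;
            }
        }
        aging.timed_nodes.erase(
            std::remove_if(aging.timed_nodes.begin(), aging.timed_nodes.end(),
                           [](MessageNode *m) { return !m->timer_armed; }),
            aging.timed_nodes.end());

        ring.cursor = (ring.cursor + 1) % ring.nodes.size();   /* next branch task */
    }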
Another embodiment of the present invention provides a GPU-based branch processing device. As shown in Fig. 5, the device includes an acquiring unit 51 and a processing unit 52.
The acquiring unit 51 is configured to obtain the message node corresponding to the branch currently to be processed.
The message node includes at least one or more items of pending data.
The processing unit 52 is configured to process the pending data in the message node when the message node satisfies the preset condition.
Preferably, the preset condition includes: the quantity of the pending data in the message node reaches a preset threshold; and/or, the timer corresponding to the message node has expired.
The preset condition is used to enable the largest possible amount of pending data to be processed at the current time.
Optionally, the processing unit 52 is further configured to perform branch classification on the pending data according to the preset strategy before the acquiring unit 51 obtains the message node corresponding to the branch task currently to be processed.
It should be noted that the pending data in the same branch executes the same instruction.
Specifically, the processing unit 52 is further configured to obtain and process the pending data in the message node when the quantity of the pending data in the message node reaches the preset threshold, and to obtain and process the pending data in the message node when the timer corresponding to the message node expires.
Optionally, as shown in Fig. 6, the device further includes a setting unit 53.
The setting unit 53 is configured to set a timer for the message node when the quantity of the pending data in the message node does not reach the preset threshold.
With the GPU-based branch processing device provided by this embodiment of the present invention, after the acquiring unit obtains the message node corresponding to the branch to be processed, and when the message node satisfies the preset condition, the processing unit processes the pending data in the message node. Compared with the prior art, in which branch execution efficiency is improved during branch-based data processing by changing the code logic used to process the data, this embodiment of the present invention processes the largest possible amount of data in a single pass that executes the same instruction while preserving the code logic, thereby improving branch execution efficiency.
Another embodiment of the present invention provides a GPU-based branch processing device. As shown in Fig. 7, the device includes a memory 71, a processor 72, and a bus 73, where the memory 71 and the processor 72 communicate through the bus 73.
The memory 71 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 71 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present invention are implemented in software or firmware, the program code implementing the technical solutions provided by the embodiments of the present invention is stored in the memory 71 and executed by the processor 72.
The processor 72 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute the related programs to implement the technical solutions provided by the embodiments of the present invention.
The bus 73 may include a path that transfers information between the components of the device (for example, the memory 71 and the processor 72).
It should be noted that although the hardware shown in Fig. 7 includes only the memory 71, the processor 72, and the bus 73, in a specific implementation a person skilled in the art should understand that the terminal also includes other devices necessary for normal operation, and, depending on specific needs, may also include hardware devices implementing other functions.
Specifically, the device shown in Fig. 7 is configured to implement the method flows shown in Figs. 1 to 4.
The processor 72 is configured to obtain the message node corresponding to the branch currently to be processed, and is further configured to process the pending data in the message node when the message node satisfies the preset condition.
The message node includes at least one or more items of pending data. The preset condition is used to enable the largest possible amount of pending data to be processed at the current time; preferably, the preset condition includes: the quantity of the pending data in the message node reaches a preset threshold; and/or, the timer corresponding to the message node has expired.
The memory 71 is configured to store the pending data in the message node.
Optionally, the processor 72 is further configured to perform branch classification on the pending data according to the preset strategy before obtaining the message node corresponding to the branch task currently to be processed.
It should be noted that the pending data in the same branch executes the same instruction.
The processor 72 is specifically configured to obtain and process the pending data in the message node when the quantity of the pending data in the message node reaches the preset threshold, and to obtain and process the pending data in the message node when the timer corresponding to the message node expires.
Optionally, the processor 72 is further configured to set a timer for the message node when the quantity of the pending data in the message node does not reach the preset threshold.
The memory 71 is further configured to store the preset threshold and the data processing instructions.
With the GPU-based branch processing device provided by this embodiment of the present invention, after the processor obtains the message node corresponding to the branch to be processed, and when the message node satisfies the preset condition, the pending data in the message node is processed. Compared with the prior art, in which branch execution efficiency is improved during branch-based data processing by changing the code logic used to process the data, this embodiment of the present invention processes the largest possible amount of data in a single pass that executes the same instruction while preserving the code logic, thereby improving branch execution efficiency.
From the description of the above embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by hardware, but in many cases the former is the better implementation. Based on such an understanding, the part of the technical solutions of the present invention that essentially contributes to the prior art may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The foregoing is merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be readily conceived by a person familiar with the technical field within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A branch processing method based on a graphics processing unit (GPU), characterized by comprising:
obtaining a message node corresponding to a branch currently to be processed, wherein the message node comprises at least one or more items of pending data;
when the message node satisfies a preset condition, processing the pending data in the message node;
wherein the preset condition is used to enable the largest possible amount of the pending data to be processed at the current time;
before the obtaining of the message node corresponding to the branch currently to be processed, the method further comprising:
performing branch classification on the pending data according to a preset strategy, wherein the pending data in the same branch executes the same instruction.
2. The method according to claim 1, characterized in that the preset condition comprises: the quantity of the pending data in the message node reaches a preset threshold; and/or,
the timer corresponding to the message node has expired.
3. The method according to claim 2, characterized in that
when the quantity of the pending data in the message node reaches the preset threshold, the pending data in the message node is obtained and processed.
4. The method according to claim 2, characterized in that
when the timer corresponding to the message node expires, the pending data in the message node is obtained and processed.
5. The method according to any one of claims 2 to 4, characterized by comprising:
when the quantity of the pending data in the message node does not reach the preset threshold, setting a timer for the message node.
6. A GPU-based branch processing device, characterized by comprising:
an acquiring unit, configured to obtain a message node corresponding to a branch currently to be processed, wherein the message node comprises at least one or more items of pending data;
a processing unit, configured to process the pending data in the message node when the message node satisfies a preset condition;
wherein the preset condition is used to enable the largest possible amount of the pending data to be processed at the current time;
and the processing unit is further configured to perform branch classification on the pending data according to a preset strategy before the acquiring unit obtains the message node corresponding to the branch currently to be processed, wherein the pending data in the same branch executes the same instruction.
7. The device according to claim 6, characterized in that
the preset condition comprises: the quantity of the pending data in the message node reaches a preset threshold; and/or,
the timer corresponding to the message node has expired.
8. The device according to claim 7, characterized in that
the processing unit is specifically configured to obtain and process the pending data in the message node when the quantity of the pending data in the message node reaches the preset threshold.
9. The device according to claim 7, characterized in that
the processing unit is specifically configured to obtain and process the pending data in the message node when the timer corresponding to the message node expires.
10. The device according to any one of claims 7 to 9, characterized in that the device further comprises:
a setting unit, configured to set a timer for the message node when the quantity of the pending data in the message node does not reach the preset threshold.
CN201310695410.8A 2013-12-17 2013-12-17 GPU-based branch processing method and device Active CN103714511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310695410.8A CN103714511B (en) 2013-12-17 2013-12-17 GPU-based branch processing method and device

Publications (2)

Publication Number Publication Date
CN103714511A (en) 2014-04-09
CN103714511B (en) 2017-01-18

Family

ID=50407456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310695410.8A Active CN103714511B (en) 2013-12-17 2013-12-17 GPU-based branch processing method and device

Country Status (1)

Country Link
CN (1) CN103714511B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107124286B (en) * 2016-02-24 2020-05-26 Shenzhen Zhiqiong Technology Co., Ltd. System and method for high-speed processing and interaction of mass data
CN111095197B (en) * 2017-10-27 2021-10-15 Huawei Technologies Co., Ltd. Code processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315607A (en) * 2007-05-31 2008-12-03 SAP AG Process model control flow with multiple synchronizations
CN102831577A (en) * 2012-08-29 2012-12-19 University of Electronic Science and Technology of China Method for fast zooming two-dimensional seismic image based on GPU (graphic processing unit)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8788570B2 (en) * 2009-06-22 2014-07-22 Citrix Systems, Inc. Systems and methods for retaining source IP in a load balancing multi-core environment

Also Published As

Publication number Publication date
CN103714511A (en) 2014-04-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201229

Address after: 233000 No.10, building 32, Zone 8, Guangcai market, bengshan District, Bengbu City, Anhui Province

Patentee after: Bengbu Hongjing Technology Co.,Ltd.

Address before: 518000 Baoan District, Xin'an Street, Shenzhen, Guangdong, No. 625, Nuo platinum Plaza

Patentee before: SHENZHEN SHANGGE INTELLECTUAL PROPERTY SERVICE Co.,Ltd.

Effective date of registration: 20201229

Address after: 518000 Baoan District, Xin'an Street, Shenzhen, Guangdong, No. 625, Nuo platinum Plaza

Patentee after: SHENZHEN SHANGGE INTELLECTUAL PROPERTY SERVICE Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right