CN110135569A - Heterogeneous platform neuron positioning three-level flow parallel method, system and medium - Google Patents
- Publication number: CN110135569A
- Application number: CN201910289495.7A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
Abstract
The invention discloses a heterogeneous-platform neuron-positioning three-stage pipeline parallel method, system and medium. Block parameters are computed for slice image data according to the image size and the computation granularity; storage space is allocated on the CPU side and the GPU side based on the block parameters; variables and storage space are initialized; the CPU performs task scheduling, and the CPU and the GPU simultaneously execute computing tasks in a three-stage pipeline mode. Each computing task comprises three steps: data read-in, positioning computation and data write-back. Each intermediate computing task, while executing its positioning computation, performs the data write-back of the previous computing task and the data read-in of the next computing task, so the three steps proceed in parallel. The method improves the processing speed of neuron positioning and has the advantages of fast neuron positioning, short total program execution time, a flexible and parameter-configurable three-stage pipeline, and easy porting and popularization.
Description
Technical field
The present invention relates to analytic methods for the fine structure of neural circuits, and in particular to a heterogeneous-platform neuron-positioning three-stage pipeline parallel method, system and medium for realizing parallel neuron-positioning computation on CPU-GPU heterogeneous computing platforms.
Background art
Neural circuit information is the key to understanding brain function and the mechanisms of brain disease, and automatic tracing of neural circuit big data is one of the key scientific problems faced by brain science and related fields. Neuron positioning is central to parsing neural circuit data: analyzing neural circuit image data to obtain accurate soma positions is the basis of subsequent quantitative analysis.
A typical large-scale neuron positioning method builds on the biological fact that each cell has one and only one soma, establishes a biophysical model by mathematical means (such as L1-norm minimization), and performs large-scale neuron positioning by solving that model. Such methods are robust to the variety of cell types, shapes, sizes and densities found at large scale, and are therefore the main approach to large-scale neuron positioning in high-precision neural circuit image data sets. However, the image size such a method can handle is limited by the memory capacity of a single compute node, and its processing speed is limited by the computing performance of a single compute node.
With the continuous progress of observation technology, the data scale of high-precision neural circuit image data sets has grown rapidly; in particular, major advances in optical labeling molecules and micro-imaging technology have made high-resolution whole-brain data acquisition a reality. Because the primate brain is large, imaging a region of 10 cubic centimetres at 1 micrometre resolution with current MOST imaging technology produces hundreds of terabytes of data. With existing neuron positioning methods, processing 1 GB of data takes about 1 hour; for 1 TB of data this means 1,000 hours, i.e., more than 40 days. How to efficiently position neurons in TB-scale data of dense neural populations remains a huge challenge for image processing, and has become a bottleneck that severely restricts converting acquired data into knowledge.
The graphics processing unit (Graphics Processing Unit, GPU) uses a design framework completely different from that of traditional general-purpose multi-core processors. The GPU is designed specifically for large-scale data-parallel computation, whose typical applications include graphics and video processing, large-scale matrix computation and numerical simulation. Unlike general-purpose multi-core processors, the GPU makes extensive use of the SIMD (Single Instruction Multiple Data) structure to realize parallel data and instruction access on the same processor. As GPU programmability has steadily improved, and especially with the appearance of programming environments such as CUDA and a series of enhanced debugging tools, the complexity of general-purpose GPU programming has been greatly reduced, opening a new era of general-purpose computing on GPUs. The general-purpose GPU (GPGPU) has developed into a highly parallel, multi-threaded many-core processor with powerful computing capability and high memory bandwidth.
Compared with homogeneous parallel architectures, the heterogeneous parallel architecture composed of a general-purpose CPU and a GPU coprocessor is better suited to large-scale compute-intensive tasks. A heterogeneous parallel architecture can effectively adapt to the complexity of programs across application domains and is highly efficient in practice; it follows the trend of rapidly growing VLSI chip capacity while meeting the increasingly diverse requirements of applications. A heterogeneous architecture contains processors of different structures, namely a transaction-oriented general-purpose CPU and a computation-oriented special-purpose GPU, and handling different tasks with different types of processors is precisely its advantage.
However, since the CPU-GPU heterogeneous computing model differs from the traditional homogeneous CPU computing model, existing CPU-based programs cannot run directly on a GPU. Moreover, since the GPU cannot directly access the CPU's memory space, in order to use the GPU's computing capability the input data must be transferred from CPU memory to GPU memory before each computation starts, and the results must be transferred back from GPU memory to CPU memory after the computation ends, and so on until all computing tasks are finished. Frequent data transfers between CPU and GPU occupy a large share of program runtime and greatly reduce program efficiency. How to improve the computational efficiency of CPU and GPU while reducing the data transfer overhead between them is the difficult point in developing a neuron positioning algorithm for CPU-GPU heterogeneous architectures. To date there has been no public report of a technical scheme that performs neuron positioning with CPU-GPU.
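For contrast with the pipelined scheme introduced below, the following is a minimal sketch of the naive serialized flow just described, in which each block's disk read, host-to-device copy, kernel, device-to-host copy and disk write occur strictly in sequence. All names (locateNeurons, readBlockFromDisk, writeBlockToDisk) are illustrative placeholders, not the patent's code.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Placeholder positioning kernel; the real model solving is not shown here.
__global__ void locateNeurons(unsigned char* block, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) block[i] = block[i];
}

// Hypothetical disk I/O helpers, assumed to exist elsewhere.
void readBlockFromDisk(long long idx, unsigned char* host, size_t n);
void writeBlockToDisk(long long idx, const unsigned char* host, size_t n);

void processAllBlocksNaive(long long bNum, size_t blockBytes) {
    unsigned char *hostBuf = nullptr, *devBuf = nullptr;
    cudaMallocHost((void**)&hostBuf, blockBytes);  // CPU-side staging buffer
    cudaMalloc((void**)&devBuf, blockBytes);       // GPU-side buffer
    for (long long idx = 1; idx <= bNum; ++idx) {
        readBlockFromDisk(idx, hostBuf, blockBytes);                      // disk -> CPU memory
        cudaMemcpy(devBuf, hostBuf, blockBytes, cudaMemcpyHostToDevice);  // CPU -> GPU
        locateNeurons<<<4096, 256>>>(devBuf, blockBytes);                 // GPU computes...
        cudaDeviceSynchronize();                                          // ...while CPU and disk idle
        cudaMemcpy(hostBuf, devBuf, blockBytes, cudaMemcpyDeviceToHost);  // GPU -> CPU
        writeBlockToDisk(idx, hostBuf, blockBytes);                       // CPU memory -> disk
    }
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
}
```

Every stage here blocks every other stage; the three-stage pipeline of the invention overlaps exactly these steps.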
Summary of the invention
The technical problem to be solved by the present invention: in view of the above problems in the prior art, a heterogeneous-platform neuron-positioning three-stage pipeline parallel method, system and medium are provided. The present invention improves the processing speed of neuron positioning and has the advantages of fast neuron positioning, short total program execution time, a flexible and parameter-configurable three-stage pipeline, and easy porting and popularization.
To solve the above technical problem, the technical solution adopted by the present invention is as follows:
A heterogeneous-platform neuron-positioning three-stage pipeline parallel method, whose implementation steps include:
1) computing block parameters for the slice image data according to the image size and the computation granularity;
2) allocating storage space on the CPU side and the GPU side, respectively, based on the block parameters;
3) initializing variables and storage space;
4) performing task scheduling on the CPU, with the CPU and the GPU simultaneously executing computing tasks in a three-stage pipeline mode; each computing task comprises three steps, data read-in, positioning computation and data write-back, and each intermediate round of computing, while executing its positioning computation, performs the data write-back of the previous round and the data read-in of the next round, so that data read-in, positioning computation and data write-back proceed in parallel.
Preferably, the detailed steps of step 1) include:
1.1) computing the maximum data block size gSizeMax that the GPU can support, gSizeMax being a positive integer;
1.2) determining the block size, per-axis block count and total block count in the x, y and z directions: if xDim < gSizeMax, set the x-direction block size xScale to xDim, otherwise set xScale to gSizeMax, and set the x-direction block count xNum to ⌈xDim/xScale⌉; if yDim < gSizeMax, set the y-direction block size yScale to yDim, otherwise set yScale to gSizeMax, and set the y-direction block count yNum to ⌈yDim/yScale⌉; if zDim < gSizeMax, set the z-direction block size zScale to zDim, otherwise set zScale to gSizeMax, and set the z-direction block count zNum to ⌈zDim/zScale⌉; here xDim, yDim and zDim are preset parameters. Set the total block count bNum to xNum*yNum*zNum, with the blocks numbered consecutively from 1.
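A minimal sketch of step 1), using the variable names of the description; the BlockParams struct and ceilDiv helper are illustrative assumptions:

```cpp
struct BlockParams {
    int xScale, yScale, zScale;  // block size along each axis
    int xNum, yNum, zNum;        // number of blocks along each axis
    long long bNum;              // total number of blocks, numbered from 1
};

static int ceilDiv(int dim, int scale) { return (dim + scale - 1) / scale; }

BlockParams computeBlockParams(int xDim, int yDim, int zDim, int gSizeMax) {
    BlockParams p;
    p.xScale = (xDim < gSizeMax) ? xDim : gSizeMax;  // clamp block edge to image edge
    p.yScale = (yDim < gSizeMax) ? yDim : gSizeMax;
    p.zScale = (zDim < gSizeMax) ? zDim : gSizeMax;
    p.xNum = ceilDiv(xDim, p.xScale);                // xNum = ceil(xDim / xScale)
    p.yNum = ceilDiv(yDim, p.yScale);                // yNum = ceil(yDim / yScale)
    p.zNum = ceilDiv(zDim, p.zScale);                // zNum = ceil(zDim / zScale)
    p.bNum = 1LL * p.xNum * p.yNum * p.zNum;         // bNum = xNum * yNum * zNum
    return p;
}
```

For the embodiment described later (xDim = yDim = 40000, zDim = 10000), a block edge of 154 would yield 260 × 260 × 65 = 4,394,000 blocks.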
Preferably, when storage space is allocated on the CPU side and the GPU side based on the block parameters in step 2), three pointer variables gReadPtr, gProcPtr and gWritePtr are declared on the GPU side for the GPU-side allocation, each pointing to a GPU memory space of gSizeMax³ capacity, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block. For the CPU-side allocation, two pointer variables cReadBuf and cWriteBuf are declared on the CPU, each pointing to a memory space of gSizeMax³ capacity, where cReadBuf buffers data between gReadPtr and the disk and cWriteBuf buffers data between gWritePtr and the disk. The block size computed on the CPU is also set to gSizeMax, and three pointer variables cReadPtr, cProcPtr and cWritePtr are declared on the CPU side, each pointing to a memory space of gSizeMax³ capacity, where gSizeMax is the maximum data block size the GPU can support.
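A minimal sketch of these allocations, assuming one-byte voxels; making the two staging buffers pinned (cudaMallocHost) is an assumption made here to speed up transfers, since the patent only requires buffers of gSizeMax³ capacity:

```cpp
#include <cuda_runtime.h>
#include <cstdlib>

void allocatePipelineBuffers(size_t gSizeMax,
                             unsigned char** gReadPtr, unsigned char** gProcPtr,
                             unsigned char** gWritePtr,
                             unsigned char** cReadBuf, unsigned char** cWriteBuf,
                             unsigned char** cReadPtr, unsigned char** cProcPtr,
                             unsigned char** cWritePtr) {
    size_t bytes = gSizeMax * gSizeMax * gSizeMax;  // gSizeMax^3 per buffer
    // GPU side: next block to read in, block being processed, block to write back.
    cudaMalloc((void**)gReadPtr,  bytes);
    cudaMalloc((void**)gProcPtr,  bytes);
    cudaMalloc((void**)gWritePtr, bytes);
    // CPU side: staging buffers between the GPU buffers and the disk.
    cudaMallocHost((void**)cReadBuf,  bytes);
    cudaMallocHost((void**)cWriteBuf, bytes);
    // CPU side: the CPU pipeline's own read/process/write buffers.
    *cReadPtr  = (unsigned char*)std::malloc(bytes);
    *cProcPtr  = (unsigned char*)std::malloc(bytes);
    *cWritePtr = (unsigned char*)std::malloc(bytes);
}
```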
Preferably, the detailed steps of variable and storage space initialization in step 3) include: allocating a mutually exclusive loop variable idx for the CPU and initializing idx to 2; reading block No. 1 from disk into the memory space pointed to by cProcPtr; reading block No. 2 from disk into the memory space pointed to by cReadBuf; and then transferring block No. 2 from the memory space pointed to by cReadBuf to the GPU memory space pointed to by gProcPtr.
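A minimal sketch of this initialization; the std::mutex realization of the mutually exclusive loop variable idx and the readBlockFromDisk helper are illustrative assumptions:

```cpp
#include <cuda_runtime.h>
#include <mutex>

std::mutex idxMutex;  // guards the shared loop variable idx
long long idx;        // number of the last block handed to a pipeline

void readBlockFromDisk(long long blockNo, unsigned char* host, size_t n);  // hypothetical helper

void initializePipelines(size_t blockBytes,
                         unsigned char* cProcPtr, unsigned char* cReadBuf,
                         unsigned char* gProcPtr) {
    idx = 2;                                     // blocks No. 1 and No. 2 are pre-loaded
    readBlockFromDisk(1, cProcPtr, blockBytes);  // block No. 1 -> CPU processing buffer
    readBlockFromDisk(2, cReadBuf, blockBytes);  // block No. 2 -> staging buffer
    cudaMemcpy(gProcPtr, cReadBuf, blockBytes,
               cudaMemcpyHostToDevice);          // block No. 2 -> GPU processing buffer
}
```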
Preferably, the detailed steps of step 4) include:
4.1) starting processes No. 0 to No. 2, responsible for organizing the computing tasks and data transfers on the GPU, and processes No. 3 to No. 5, responsible for organizing the computing tasks and data transfers on the CPU;
4.2) having processes No. 0 to No. 2 invoke the GPU to execute computing tasks in a three-stage pipeline mode while processes No. 3 to No. 5 simultaneously invoke the CPU to execute computing tasks in a three-stage pipeline mode; while positioning neurons in each image block, the data block required by the next group of neuron positioning is read in and the data block of the previous group is written back to disk, so that disk read/write operations proceed in parallel with the neuron positioning operations;
4.3) synchronizing processes No. 0 to No. 5; computation ends.
Preferably, the detailed steps in step 4.2) for executing computing tasks by having processes No. 0 to No. 2 invoke the GPU in a three-stage pipeline mode include:
4.2.1A) Process No. 0 starts ncGPU threads on the GPU according to the number ncGPU of available compute cores on the GPU, and all GPU threads perform neuron positioning computation in parallel on the data block pointed to by gProcPtr. Process No. 1 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, block No. idx is read from disk into the memory space pointed to by cReadBuf and then transferred from the CPU-side memory space pointed to by cReadBuf to the GPU memory space pointed to by gReadPtr. Process No. 2 checks the GPU memory space pointed to by gWritePtr; if it holds a data block, that block is transferred from the GPU memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, then stored to disk from the memory space pointed to by cWriteBuf, and the GPU memory space pointed to by gWritePtr is cleared.
4.2.2A) Processes No. 0, No. 1 and No. 2 are synchronized; after synchronization, the GPU has finished computing the current data block. Process No. 0 then exchanges the GPU memory pointers: declare a temporary pointer variable gtPtr, assign gtPtr the value of gProcPtr, assign gProcPtr the value of gReadPtr, assign gReadPtr the value of gWritePtr, and assign gWritePtr the value of gtPtr. Process No. 0 checks the GPU memory space pointed to by gProcPtr; if its content is empty, step 4.2.3A) is executed, otherwise step 4.2.1A) is executed.
4.2.3A) Process No. 0 transfers the data block in the GPU memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, and then writes the data block back to disk from the memory space pointed to by cWriteBuf; the GPU memory spaces pointed to by gReadPtr, gProcPtr and gWritePtr and the CPU-side memory spaces pointed to by cReadBuf and cWriteBuf are reclaimed.
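The pointer exchange of step 4.2.2A) can be written directly from the four assignments above; a minimal sketch follows (the same rotation serves the CPU pointers in step 4.2.2B)):

```cpp
// Rotate the three GPU buffers after the stages synchronize: the freshly read
// block becomes the processing block, the drained write-back buffer becomes
// the next read target, and the just-processed block awaits write-back.
// Only pointers move; no block data is copied.
void rotateGpuBuffers(unsigned char** gReadPtr, unsigned char** gProcPtr,
                      unsigned char** gWritePtr) {
    unsigned char* gtPtr = *gProcPtr;  // temporary pointer, as in the patent
    *gProcPtr  = *gReadPtr;
    *gReadPtr  = *gWritePtr;
    *gWritePtr = gtPtr;
}
```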
Preferably, the detailed steps in step 4.2) for executing computing tasks by having processes No. 3 to No. 5 simultaneously invoke the CPU in a three-stage pipeline mode include:
4.2.1B) Process No. 3 starts ncCPU threads on the CPU according to the number ncCPU of available compute cores on the CPU, and all CPU threads perform neuron positioning computation in parallel on the data block pointed to by cProcPtr. Process No. 4 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, block No. idx is read from disk into the memory space pointed to by cReadPtr. Process No. 5 checks the memory space pointed to by cWritePtr; if it holds a data block, that block is stored to disk and the memory space pointed to by cWritePtr is cleared.
4.2.2B) Processes No. 3, No. 4 and No. 5 are synchronized; after synchronization, the CPU has finished computing the current data block. Process No. 3 then exchanges the CPU memory pointers: declare a temporary pointer variable ctPtr, assign ctPtr the value of cProcPtr, assign cProcPtr the value of cReadPtr, assign cReadPtr the value of cWritePtr, and assign cWritePtr the value of ctPtr. Process No. 3 checks the memory space pointed to by cProcPtr; if its content is empty, step 4.2.3B) is executed, otherwise step 4.2.1B) is executed.
4.2.3B) Process No. 3 writes the data block in the memory space pointed to by cWritePtr back to disk; the CPU-side memory spaces pointed to by cReadPtr, cProcPtr and cWritePtr are reclaimed.
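Processes No. 1 and No. 4 both advance the shared loop variable idx, so each fetch must be atomic. A minimal sketch of such a claim operation follows; the std::mutex is an assumption, since the patent states only that idx is mutually exclusive:

```cpp
#include <mutex>

std::mutex idxMutex;  // in a real program, shared with the initialization above
long long idx = 2;    // blocks No. 1 and No. 2 were consumed during initialization

// Returns the next block number to read, or -1 once all bNum blocks are claimed.
long long claimNextBlock(long long bNum) {
    std::lock_guard<std::mutex> lock(idxMutex);
    idx += 1;                         // "increment the loop variable idx by 1"
    return (idx <= bNum) ? idx : -1;  // "compare idx with the total block count bNum"
}
```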
The present invention also provides a heterogeneous-platform neuron-positioning three-stage pipeline parallel system, comprising a computer device with a GPU, the computer device being programmed to perform the steps of the aforementioned heterogeneous-platform neuron-positioning three-stage pipeline parallel method of the present invention.
The present invention also provides a heterogeneous-platform neuron-positioning three-stage pipeline parallel system, comprising a computer device with a GPU, the storage medium of the computer device storing a computer program programmed to perform the aforementioned heterogeneous-platform neuron-positioning three-stage pipeline parallel method of the present invention.
The present invention also provides a computer-readable storage medium storing a computer program programmed to perform the aforementioned heterogeneous-platform neuron-positioning three-stage pipeline parallel method of the present invention.
Compared with the prior art, the present invention has the following advantages: the present invention computes block parameters for the slice image data according to the image size and the computation granularity; allocates storage space on the CPU side and the GPU side based on the block parameters; initializes variables and storage space; and performs task scheduling on the CPU, with the CPU and the GPU simultaneously executing computing tasks in a three-stage pipeline mode, reading in the data block required by the next group of neuron positioning and writing the data block of the previous group back to disk while positioning neurons in each image block, so that disk read/write operations proceed in parallel with the neuron positioning operations. The present invention improves the processing speed of neuron positioning and has the advantages of fast neuron positioning, short total program execution time, a flexible and parameter-configurable three-stage pipeline, and easy porting and popularization.
Brief description of the drawings
Fig. 1 is a schematic diagram of the basic flow of the method in the embodiment of the present invention.
Fig. 2 is a schematic diagram of the three-stage pipeline parallelism in the method of the embodiment of the present invention.
Specific embodiment
Hereafter, the heterogeneous-platform neuron-positioning three-stage pipeline parallel method, system and medium of the present invention are described in further detail, taking as an example of a heterogeneous platform a server equipped with two 12-core 2.4 GHz CPUs and one NVIDIA GTX 1080Ti GPU. The server's hard-disk capacity is 24 TB, its memory capacity is 256 GB, and the GPU memory capacity is 11 GB. The input data consists of an image sequence of 10,000 slices, each with a resolution of 40,000 × 40,000.
As shown in Fig. 1, the steps of the heterogeneous-platform neuron-positioning three-stage pipeline parallel method of this embodiment include:
1) computing block parameters for the slice image data according to the image size and the computation granularity;
2) allocating storage space on the CPU side and the GPU side, respectively, based on the block parameters;
3) initializing variables and storage space;
4) performing task scheduling on the CPU, with the CPU and the GPU simultaneously executing computing tasks in a three-stage pipeline mode; each computing task comprises three steps, data read-in, positioning computation and data write-back, and each intermediate round of computing, while executing its positioning computation, performs the data write-back of the previous round and the data read-in of the next round, so that data read-in, positioning computation and data write-back proceed in parallel.
In this embodiment, the detailed steps of step 1) include:
1.1) computing the maximum data block size gSizeMax that the GPU can support, gSizeMax being a positive integer;
1.2) determining the block size, per-axis block count and total block count in the x, y and z directions: if xDim < gSizeMax, set the x-direction block size xScale to xDim, otherwise set xScale to gSizeMax, and set the x-direction block count xNum to ⌈xDim/xScale⌉; if yDim < gSizeMax, set the y-direction block size yScale to yDim, otherwise set yScale to gSizeMax, and set the y-direction block count yNum to ⌈yDim/yScale⌉; if zDim < gSizeMax, set the z-direction block size zScale to zDim, otherwise set zScale to gSizeMax, and set the z-direction block count zNum to ⌈zDim/zScale⌉; here xDim, yDim and zDim are preset parameters. Set the total block count bNum to xNum*yNum*zNum, with the blocks numbered consecutively from 1. The main variables are defined as follows: cMem: CPU-side memory capacity; gMem: GPU-side memory capacity; gNum: number of GPUs; xDim: number of pixels per image layer in the x direction; yDim: number of pixels per image layer in the y direction; zDim: number of layers.
In this embodiment, the maximum data block size gSizeMax that the GPU can support is computed from gMem, the GPU memory capacity.
In this embodiment, the x-direction block count is xNum = ⌈40000/154⌉ = 260, the y-direction block count is yNum = ⌈40000/154⌉ = 260, and the z-direction block count is zNum = ⌈10000/154⌉ = 65, giving a total block count bNum = 260 × 260 × 65 = 4,394,000, numbered consecutively from 1; each data block has size 154³ B ≈ 3.65 MB.
In this embodiment, when storage space is allocated on the CPU side and the GPU side based on the block parameters in step 2), three pointer variables gReadPtr, gProcPtr and gWritePtr are declared on the GPU side for the GPU-side allocation, each pointing to a GPU memory space of gSizeMax³ capacity, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block. For the CPU-side allocation, two pointer variables cReadBuf and cWriteBuf are declared on the CPU, each pointing to a memory space of gSizeMax³ capacity, where cReadBuf buffers data between gReadPtr and the disk and cWriteBuf buffers data between gWritePtr and the disk. The block size computed on the CPU is also set to gSizeMax, and three pointer variables cReadPtr, cProcPtr and cWritePtr are declared on the CPU side, each pointing to a memory space of gSizeMax³ capacity, where gSizeMax is the maximum data block size the GPU can support.
Specifically, in this embodiment three pointer variables gReadPtr, gProcPtr and gWritePtr are declared on the GPU side, each pointing to a GPU memory space of 154³ B ≈ 3.65 MB, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block. Two pointer variables cReadBuf and cWriteBuf are declared on the CPU, each pointing to a memory space of 154³ B ≈ 3.65 MB, where cReadBuf buffers data between gReadPtr and the disk and cWriteBuf buffers data between gWritePtr and the disk. The block size computed on the CPU is set to the same value, and correspondingly three pointer variables cReadPtr, cProcPtr and cWritePtr are declared on the CPU side, each pointing to a memory space of 154³ B ≈ 3.65 MB.
In this embodiment, the detailed steps of variable and storage space initialization in step 3) include: allocating a mutually exclusive loop variable idx for the CPU and initializing idx to 2; reading block No. 1 from disk into the memory space pointed to by cProcPtr; reading block No. 2 from disk into the memory space pointed to by cReadBuf; and then transferring block No. 2 from the memory space pointed to by cReadBuf to the GPU memory space pointed to by gProcPtr.
In this embodiment, the detailed steps of step 4) include:
4.1) starting processes No. 0 to No. 2, responsible for organizing the computing tasks and data transfers on the GPU, and processes No. 3 to No. 5, responsible for organizing the computing tasks and data transfers on the CPU;
4.2) having processes No. 0 to No. 2 invoke the GPU to execute computing tasks in a three-stage pipeline mode while processes No. 3 to No. 5 simultaneously invoke the CPU to execute computing tasks in a three-stage pipeline mode; while positioning neurons in each image block, the data block required by the next group of neuron positioning is read in and the data block of the previous group is written back to disk, so that disk read/write operations proceed in parallel with the neuron positioning operations. In this embodiment, step 4.2) thus performs neuron positioning on the CPU and the GPU at the same time, improving computational efficiency and reducing computation time;
4.3) synchronizing processes No. 0 to No. 5; computation ends.
In this embodiment, the detailed steps in step 4.2) for executing computing tasks by having processes No. 0 to No. 2 invoke the GPU in a three-stage pipeline mode include:
4.2.1A) Process No. 0 starts ncGPU threads on the GPU according to the number ncGPU of available compute cores on the GPU, and all GPU threads perform neuron positioning computation in parallel on the data block pointed to by gProcPtr; in this embodiment, process No. 0 starts 3584 threads according to the 3584 available compute cores on the GPU. Process No. 1 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, block No. idx is read from disk into the memory space pointed to by cReadBuf and then transferred from the CPU-side memory space pointed to by cReadBuf to the GPU memory space pointed to by gReadPtr. Process No. 2 checks the GPU memory space pointed to by gWritePtr; if it holds a data block, that block is transferred from the GPU memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, then stored to disk from the memory space pointed to by cWriteBuf, and the GPU memory space pointed to by gWritePtr is cleared. In step 4.2.1A), processes No. 0 to No. 2 thus perform GPU-side block read-in, block computation and block write-back simultaneously, overlapping data transfer with computation and reducing the GPU-side data transfer overhead.
4.2.2A) Processes No. 0, No. 1 and No. 2 are synchronized; after synchronization, the GPU has finished computing the current data block. Process No. 0 then exchanges the GPU memory pointers: declare a temporary pointer variable gtPtr, assign gtPtr the value of gProcPtr, assign gProcPtr the value of gReadPtr, assign gReadPtr the value of gWritePtr, and assign gWritePtr the value of gtPtr. Process No. 0 checks the GPU memory space pointed to by gProcPtr; if its content is empty, step 4.2.3A) is executed, otherwise step 4.2.1A) is executed. Step 4.2.2A) of this embodiment realizes data exchange by swapping pointers, avoiding copies of large memory regions and improving the space-time efficiency of storage management.
4.2.3A) Process No. 0 transfers the data block in the GPU memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, and then writes the data block back to disk from the memory space pointed to by cWriteBuf; the GPU memory spaces pointed to by gReadPtr, gProcPtr and gWritePtr and the CPU-side memory spaces pointed to by cReadBuf and cWriteBuf are reclaimed.
In this embodiment, the detailed steps in step 4.2) for executing computing tasks by having processes No. 3 to No. 5 simultaneously invoke the CPU in a three-stage pipeline mode include:
4.2.1B) Process No. 3 starts ncCPU threads on the CPU according to the number ncCPU of available compute cores on the CPU, and all CPU threads perform neuron positioning computation in parallel on the data block pointed to by cProcPtr. Process No. 4 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum (4,394,000); if idx is less than or equal to bNum (4,394,000), block No. idx is read from disk into the memory space pointed to by cReadPtr. Process No. 5 checks the memory space pointed to by cWritePtr; if it holds a data block, that block is stored to disk and the memory space pointed to by cWritePtr is cleared. In step 4.2.1B), processes No. 3 to No. 5 thus perform CPU-side block read-in, block computation and block write-back simultaneously, overlapping data transfer with computation and reducing the CPU-side data transfer overhead.
4.2.2B) Processes No. 3, No. 4 and No. 5 are synchronized; after synchronization, the CPU has finished computing the current data block. Process No. 3 then exchanges the CPU memory pointers: declare a temporary pointer variable ctPtr, assign ctPtr the value of cProcPtr, assign cProcPtr the value of cReadPtr, assign cReadPtr the value of cWritePtr, and assign cWritePtr the value of ctPtr. Process No. 3 checks the memory space pointed to by cProcPtr; if its content is empty, step 4.2.3B) is executed, otherwise step 4.2.1B) is executed. Step 4.2.2B) of this embodiment realizes data exchange by swapping pointers, avoiding copies of large memory regions and improving the space-time efficiency of storage management.
4.2.3B) Process No. 3 writes the data block in the memory space pointed to by cWritePtr back to disk; the CPU-side memory spaces pointed to by cReadPtr, cProcPtr and cWritePtr are reclaimed.
As shown in Fig. 2, because the positioning tasks executed by the CPU and the GPU each comprise the three steps of data read-in, positioning computation and data write-back, and dependencies exist between the data of the three steps, the first round (Round 1) only reads data into memory to produce a 3-D image volume; in the second round (Round 2), reading the next volume into memory and neuron positioning computation proceed simultaneously; and from the third round (Round 3) until the third-to-last round (Round n-2), the data read-in, positioning computation and data write-back of each round proceed simultaneously. In each round, the data read-in handles the next group of slice image data, the positioning computation processes the volume whose read-in was completed in the previous round, and the data write-back stores the processed neuron positioning results of the group before that to the disk array. Through this technical approach, the data read-in and data write-back times are effectively hidden within the neuron positioning computation step.
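A minimal sketch of this round structure for one of the two pipelines (the patent runs one on the GPU via processes No. 0 to No. 2 and one on the CPU via processes No. 3 to No. 5); readStage, computeStage and writeStage are hypothetical stand-ins for the three stages, and the join calls play the role of the per-round synchronization of steps 4.2.2A) and 4.2.2B):

```cpp
#include <thread>

void readStage(long long k);     // stage 1: read block k from disk (hypothetical)
void computeStage(long long k);  // stage 2: position neurons in block k (hypothetical)
void writeStage(long long k);    // stage 3: write block k's results back (hypothetical)

void runThreeStagePipeline(long long bNum) {
    // Block k is read in round k, computed in round k+1 and written back in
    // round k+2, so every interior round runs all three stages at once.
    for (long long round = 1; round <= bNum + 2; ++round) {
        std::thread reader, computer, writer;
        if (round <= bNum)                   reader   = std::thread(readStage, round);
        if (round >= 2 && round <= bNum + 1) computer = std::thread(computeStage, round - 1);
        if (round >= 3)                      writer   = std::thread(writeStage, round - 2);
        if (reader.joinable())   reader.join();    // synchronize all active stages
        if (computer.joinable()) computer.join();  // before the buffers rotate
        if (writer.joinable())   writer.join();
    }
}
```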
In summary, this embodiment uses the CPU to organize computation and data transfer according to the blocking method, the CPU and the GPU perform neuron positioning with multiple threads, and the data transfer steps between CPU memory, GPU memory and disk are organized in a multi-stage pipeline mode: while each image block is being processed, the next image block to be processed is read in, and at the same time the previously processed image block is written back to disk, so that data transfer operations and data processing operations proceed in parallel. Through hybrid multi-process and multi-thread parallelism, the heterogeneous-platform neuron-positioning three-stage pipeline of this embodiment performs neuron positioning on a CPU-GPU heterogeneous computing platform with the CPU multi-core processor and the GPU many-core coprocessor simultaneously, and overlaps computation and data transfer in time through the multi-stage pipeline, thereby improving neuron positioning speed. Statistics of measured run data show that, compared with the neuron positioning algorithm running on the two 12-core CPUs alone, the heterogeneous-platform neuron-positioning three-stage pipeline parallel method of this embodiment increases neuron positioning speed to more than 3 times.
In addition, this embodiment also provides a heterogeneous-platform neuron-positioning three-stage pipeline parallel system, comprising a computer device with a GPU, the computer device being programmed to perform the steps of the aforementioned heterogeneous-platform neuron-positioning three-stage pipeline parallel method of this embodiment. This embodiment also provides a heterogeneous-platform neuron-positioning three-stage pipeline parallel system, comprising a computer device with a GPU, the storage medium of the computer device storing a computer program programmed to perform the aforementioned heterogeneous-platform neuron-positioning three-stage pipeline parallel method of this embodiment. This embodiment also provides a computer-readable storage medium storing a computer program programmed to perform the aforementioned heterogeneous-platform neuron-positioning three-stage pipeline parallel method of this embodiment.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions under the idea of the present invention belong to the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as within the protection scope of the present invention.
Claims (10)
1. A heterogeneous-platform neuron-positioning three-stage pipeline parallel method, characterized in that its implementation steps include:
1) computing block parameters for the slice image data according to the image size and the computation granularity;
2) allocating storage space on the CPU side and the GPU side, respectively, based on the block parameters;
3) initializing variables and storage space;
4) performing task scheduling on the CPU, with the CPU and the GPU simultaneously executing computing tasks in a three-stage pipeline mode; each computing task comprises three steps, data read-in, positioning computation and data write-back, and each intermediate round of computing, while executing its positioning computation, performs the data write-back of the previous round and the data read-in of the next round, so that data read-in, positioning computation and data write-back proceed in parallel.
2. The heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to claim 1, characterized in that the detailed steps of step 1) include:
1.1) computing the maximum data block size gSizeMax that the GPU can support, gSizeMax being a positive integer;
1.2) determining the block size, per-axis block count and total block count in the x, y and z directions: if xDim < gSizeMax, set the x-direction block size xScale to xDim, otherwise set xScale to gSizeMax, and set the x-direction block count xNum to ⌈xDim/xScale⌉; if yDim < gSizeMax, set the y-direction block size yScale to yDim, otherwise set yScale to gSizeMax, and set the y-direction block count yNum to ⌈yDim/yScale⌉; if zDim < gSizeMax, set the z-direction block size zScale to zDim, otherwise set zScale to gSizeMax, and set the z-direction block count zNum to ⌈zDim/zScale⌉; here xDim, yDim and zDim are preset parameters. Set the total block count bNum to xNum*yNum*zNum, with the blocks numbered consecutively from 1.
3. The heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to claim 2, characterized in that when storage space is allocated on the CPU side and the GPU side based on the block parameters in step 2), three pointer variables gReadPtr, gProcPtr and gWritePtr are declared on the GPU side for the GPU-side allocation, each pointing to a GPU memory space of gSizeMax³ capacity, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block; two pointer variables cReadBuf and cWriteBuf are declared on the CPU for the CPU-side allocation, each pointing to a memory space of gSizeMax³ capacity, where cReadBuf buffers data between gReadPtr and the disk and cWriteBuf buffers data between gWritePtr and the disk; and the block size computed on the CPU is set to gSizeMax, with three pointer variables cReadPtr, cProcPtr and cWritePtr declared on the CPU side, each pointing to a memory space of gSizeMax³ capacity, where gSizeMax is the maximum data block size the GPU can support.
4. The heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to claim 3, characterized in that the detailed steps of variable and storage space initialization in step 3) include: allocating a mutually exclusive loop variable idx for the CPU and initializing idx to 2; reading block No. 1 from disk into the memory space pointed to by cProcPtr; reading block No. 2 from disk into the memory space pointed to by cReadBuf; and then transferring block No. 2 from the memory space pointed to by cReadBuf to the GPU memory space pointed to by gProcPtr.
5. The heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to claim 3, characterized in that the detailed steps of step 4) include:
4.1) starting processes No. 0 to No. 2, responsible for organizing the computing tasks and data transfers on the GPU, and processes No. 3 to No. 5, responsible for organizing the computing tasks and data transfers on the CPU;
4.2) having processes No. 0 to No. 2 invoke the GPU to execute computing tasks in a three-stage pipeline mode while processes No. 3 to No. 5 simultaneously invoke the CPU to execute computing tasks in a three-stage pipeline mode; while positioning neurons in each image block, the data block required by the next group of neuron positioning is read in and the data block of the previous group is written back to disk, so that disk read/write operations proceed in parallel with the neuron positioning operations;
4.3) synchronizing processes No. 0 to No. 5; computation ends.
6. The heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to claim 5, characterized in that the detailed steps in step 4.2) for executing computing tasks by having processes No. 0 to No. 2 invoke the GPU in a three-stage pipeline mode include:
4.2.1A) process No. 0 starts ncGPU threads on the GPU according to the number ncGPU of available compute cores on the GPU, and all GPU threads perform neuron positioning computation in parallel on the data block pointed to by gProcPtr; process No. 1 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, block No. idx is read from disk into the memory space pointed to by cReadBuf and then transferred from the CPU-side memory space pointed to by cReadBuf to the GPU memory space pointed to by gReadPtr; process No. 2 checks the GPU memory space pointed to by gWritePtr, and if it holds a data block, that block is transferred from the GPU memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, then stored to disk from the memory space pointed to by cWriteBuf, and the GPU memory space pointed to by gWritePtr is cleared;
4.2.2A) processes No. 0, No. 1 and No. 2 are synchronized; after synchronization, the GPU has finished computing the current data block; process No. 0 then exchanges the GPU memory pointers, the concrete operations being: declare a temporary pointer variable gtPtr, assign gtPtr the value of gProcPtr, assign gProcPtr the value of gReadPtr, assign gReadPtr the value of gWritePtr, and assign gWritePtr the value of gtPtr; process No. 0 checks the GPU memory space pointed to by gProcPtr, and if its content is empty, step 4.2.3A) is executed, otherwise step 4.2.1A) is executed;
4.2.3A) process No. 0 transfers the data block in the GPU memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, and then writes the data block back to disk from the memory space pointed to by cWriteBuf; the GPU memory spaces pointed to by gReadPtr, gProcPtr and gWritePtr and the CPU-side memory spaces pointed to by cReadBuf and cWriteBuf are reclaimed.
7. The heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to claim 5, characterized in that the detailed steps in step 4.2) for executing computing tasks by having processes No. 3 to No. 5 simultaneously invoke the CPU in a three-stage pipeline mode include:
4.2.1B) process No. 3 starts ncCPU threads on the CPU according to the number ncCPU of available compute cores on the CPU, and all CPU threads perform neuron positioning computation in parallel on the data block pointed to by cProcPtr; process No. 4 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, block No. idx is read from disk into the memory space pointed to by cReadPtr; process No. 5 checks the memory space pointed to by cWritePtr, and if it holds a data block, that block is stored to disk and the memory space pointed to by cWritePtr is cleared;
4.2.2B) processes No. 3, No. 4 and No. 5 are synchronized; after synchronization, the CPU has finished computing the current data block; process No. 3 then exchanges the CPU memory pointers, the concrete operations being: declare a temporary pointer variable ctPtr, assign ctPtr the value of cProcPtr, assign cProcPtr the value of cReadPtr, assign cReadPtr the value of cWritePtr, and assign cWritePtr the value of ctPtr; process No. 3 checks the memory space pointed to by cProcPtr, and if its content is empty, step 4.2.3B) is executed, otherwise step 4.2.1B) is executed;
4.2.3B) process No. 3 writes the data block in the memory space pointed to by cWritePtr back to disk; the CPU-side memory spaces pointed to by cReadPtr, cProcPtr and cWritePtr are reclaimed.
8. A heterogeneous-platform neuron-positioning three-stage pipeline parallel system, comprising a computer device with a GPU, characterized in that the computer device is programmed to perform the steps of the heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to any one of claims 1 to 7.
9. A heterogeneous-platform neuron-positioning three-stage pipeline parallel system, comprising a computer device with a GPU, characterized in that the storage medium of the computer device stores a computer program programmed to perform the heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program programmed to perform the heterogeneous-platform neuron-positioning three-stage pipeline parallel method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910289495.7A CN110135569B (en) | 2019-04-11 | 2019-04-11 | Heterogeneous platform neuron positioning three-level flow parallel method, system and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910289495.7A CN110135569B (en) | 2019-04-11 | 2019-04-11 | Heterogeneous platform neuron positioning three-level flow parallel method, system and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110135569A true CN110135569A (en) | 2019-08-16 |
CN110135569B CN110135569B (en) | 2021-09-21 |
Family
ID=67569648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910289495.7A Active CN110135569B (en) | 2019-04-11 | 2019-04-11 | Heterogeneous platform neuron positioning three-level flow parallel method, system and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135569B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130169658A1 (en) * | 2011-12-28 | 2013-07-04 | Think Silicon Ltd | Multi-threaded multi-format blending device for computer graphics operations |
CN103617626A (en) * | 2013-12-16 | 2014-03-05 | 武汉狮图空间信息技术有限公司 | Central processing unit (CPU) and graphics processing unit (GPU)-based remote-sensing image multi-scale heterogeneous parallel segmentation method |
CN104267940A (en) * | 2014-09-17 | 2015-01-07 | 武汉狮图空间信息技术有限公司 | Quick map tile generation method based on CPU+GPU |
CN104375807A (en) * | 2014-12-09 | 2015-02-25 | 中国人民解放军国防科学技术大学 | Three-level flow sequence comparison method based on many-core co-processor |
CN106815807A (en) * | 2017-01-11 | 2017-06-09 | 重庆市地理信息中心 | Fast mosaic method for unmanned aerial vehicle (UAV) images based on GPU-CPU collaboration |
CN109451322A (en) * | 2018-09-14 | 2019-03-08 | 北京航天控制仪器研究所 | CUDA-based accelerated implementation method for DCT and DWT image compression algorithms |
Non-Patent Citations (3)
Title |
---|
TAO LI: "Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing", Springer * |
XIAO NAN: "Research on a naive Bayes image classification algorithm based on heterogeneous system architecture", China Master's Theses Full-text Database, Information Science and Technology Series * |
MA YONGJUN et al.: "A parallel template-matching target recognition algorithm for CPU+GPU heterogeneous platforms", Journal of Tianjin University of Science & Technology * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516795A (en) * | 2019-08-28 | 2019-11-29 | 北京达佳互联信息技术有限公司 | Method and apparatus for allocating processors to model variables, and electronic device |
CN110516795B (en) * | 2019-08-28 | 2022-05-10 | 北京达佳互联信息技术有限公司 | Method and device for allocating processors to model variables and electronic equipment |
CN110543940A (en) * | 2019-08-29 | 2019-12-06 | 中国人民解放军国防科技大学 | Neural circuit body data processing method, system and medium based on hierarchical storage |
CN110992241A (en) * | 2019-11-21 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Heterogeneous embedded system and method for accelerating neural network target detection |
CN112529763A (en) * | 2020-12-16 | 2021-03-19 | 航天科工微电子系统研究院有限公司 | Image processing system and tracking-and-aiming system based on software-hardware coupling |
CN113806067A (en) * | 2021-07-28 | 2021-12-17 | 卡斯柯信号有限公司 | Safety data verification method, device, equipment and medium based on vehicle-to-vehicle communication |
CN113806067B (en) * | 2021-07-28 | 2024-03-29 | 卡斯柯信号有限公司 | Safety data verification method, device, equipment and medium based on vehicle-to-vehicle communication |
CN113918356A (en) * | 2021-12-13 | 2022-01-11 | 广东睿江云计算股份有限公司 | Method and device for quickly synchronizing data based on CUDA (compute unified device architecture), computer equipment and storage medium |
CN113918356B (en) * | 2021-12-13 | 2022-02-18 | 广东睿江云计算股份有限公司 | Method and device for quickly synchronizing data based on CUDA (compute unified device architecture), computer equipment and storage medium |
CN117689025A (en) * | 2023-12-07 | 2024-03-12 | 上海交通大学 | Fast large-model inference service method and system for consumer-grade graphics cards |
Also Published As
Publication number | Publication date |
---|---|
CN110135569B (en) | 2021-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135569A (en) | Heterogeneous platform neuron positioning three-level flow parallel method, system and medium | |
CN110363294B (en) | Representing a neural network with paths in the network to improve performance of the neural network | |
Baskaran et al. | Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories | |
Herrero-Lopez et al. | Parallel multiclass classification using SVMs on GPUs | |
CN109993683A (en) | Machine learning sparse computation mechanism, arithmetic computation micro-architecture and sparsity training mechanism for arbitrary neural networks | |
CN110135575A (en) | Communication optimization for distributed machine learning | |
CN103761215B (en) | Matrix transpose optimization method based on graphics processing unit | |
Scherer et al. | Accelerating large-scale convolutional neural networks with parallel graphics multiprocessors | |
CN105808309B (en) | High-performance implementation method of the level-3 BLAS function GEMM of the basic linear algebra library on the Sunway platform | |
US10725837B1 (en) | Persistent scratchpad memory for data exchange between programs | |
EP3742350A1 (en) | Parallelization strategies for training a neural network | |
CN106484532B (en) | GPGPU parallel computing method for SPH fluid simulation | |
Liu | Parallel and scalable sparse basic linear algebra subprograms | |
DE102023105565A1 (en) | METHOD AND APPARATUS FOR EFFICIENT ACCESS TO MULTI-DIMENSIONAL DATA STRUCTURES AND/OR OTHER LARGE BLOCKS OF DATA | |
CN103413273A (en) | GPU-based method for fast image restoration processing | |
CN115390922A (en) | Shenwei architecture-based seismic wave simulation algorithm parallel optimization method and system | |
Bakunas-Milanowski et al. | Efficient algorithms for stream compaction on GPUs | |
DE102020130081A1 (en) | EXTENDED PROCESSOR FUNCTIONS FOR CALCULATIONS | |
CN110383206A (en) | System and method for generating Gaussian random numbers using hardware acceleration | |
CN115756605A (en) | Shallow cloud convection parameterization scheme heterogeneous computing method based on multiple GPUs | |
US20230289398A1 (en) | Efficient Matrix Multiply and Add with a Group of Warps | |
Lin et al. | swFLOW: A dataflow deep learning framework on sunway taihulight supercomputer | |
Zhou et al. | A Parallel Scheme for Large‐scale Polygon Rasterization on CUDA‐enabled GPUs | |
Hou et al. | A GPU-based tabu search for very large hardware/software partitioning with limited resource usage | |
CN111445503B (en) | Pyramid mutual information image registration method based on parallel programming model on GPU cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||