CN110135569A - Heterogeneous platform neuron positioning three-level flow parallel method, system and medium - Google Patents


Info

Publication number
CN110135569A
CN110135569A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910289495.7A
Other languages
Chinese (zh)
Other versions
CN110135569B (en)
Inventor
邹丹
朱小谦
朱敏
王文珂
李金才
汪祥
陆丽娜
甘新标
孟祥飞
夏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201910289495.7A priority Critical patent/CN110135569B/en
Publication of CN110135569A publication Critical patent/CN110135569A/en
Application granted granted Critical
Publication of CN110135569B publication Critical patent/CN110135569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 — Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 — Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3877 — Concurrent instruction execution using a slave processor, e.g. coprocessor
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/061 — Physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit


Abstract

The invention discloses a heterogeneous-platform neuron positioning three-stage pipeline parallel method, system and medium. Slice image data are partitioned into blocks according to the image size and the computation granularity; storage space is allocated at the CPU end and the GPU end based on the block parameters; variables and storage space are initialized; the CPU performs task scheduling, and the CPU and the GPU simultaneously execute computing tasks in a three-stage pipeline mode. Each computing task comprises three steps (data read-in, positioning computation, and data write-back), and each intermediate computing task, while executing its positioning computation, also executes the data write-back of the previous computing task and the data read-in of the next computing task. The method improves the processing speed of neuron positioning, and has the advantages of fast neuron positioning, short total program execution time, a flexible three-stage pipeline implementation with parameter configuration support, and easy porting and popularization.

Description

A heterogeneous-platform neuron positioning three-stage pipeline parallel method, system and medium
Technical field
The present invention relates to analytic methods for the fine structure of neural circuits, and in particular to a heterogeneous-platform neuron positioning three-stage pipeline parallel method, system and medium, for realizing parallel neuron positioning computation on CPU-GPU heterogeneous computing platforms.
Background art
Neural circuit information is the key to understanding brain function and the mechanisms of brain disease, and automatic tracking of neural circuit big data is one of the key scientific problems faced by brain science and related areas of neuroscience. Neuron positioning is the key to parsing neural circuit data: accurate soma positions are obtained by analyzing neural circuit image data, which forms the basis of subsequent quantitative analysis.
Typical large-scale neuron positioning methods are based on the biological fact that "each cell has one and only one soma": a biophysical model is established by integrating mathematical methods (such as the L1-norm minimization idea), and large-scale neuron positioning is carried out by solving this model. Such methods are robust to the variety of cell types, shapes, sizes and distribution densities encountered at large scale, and are therefore the mainstream approach to large-scale neuron positioning in current high-precision neural circuit image data sets. However, the image size such a method can handle is limited by the memory capacity of a single compute node, and its processing speed is limited by the computing performance of a single compute node.
With the constant progress of observation technology, the data scale of high-precision neural circuit image data sets has grown rapidly; in particular, huge advances in optical labeling molecules and micro-imaging technology have made acquiring whole-brain data at high resolution a reality. Because primate brains are relatively large, by current MOST imaging technology, imaging a 10-cubic-centimetre region at 1-micrometre resolution generates hundreds of TB of data. With existing neuron positioning methods, processing 1 GB of data takes about 1 hour; taking 1 TB of data as an example, 1000 hours, that is, more than 40 days, would be needed. How to efficiently position neurons in TB-scale mass data from dense neural populations remains a huge challenge for image processing, and has become the bottleneck that seriously restricts converting the acquired data into knowledge.
The graphics processing unit (Graphic Processing Unit, GPU) adopts a brand-new design architecture completely different from that of traditional general-purpose multi-core processors. The GPU is designed specifically for large-scale data-parallel computation; typical applications of this computing mode include graphics and video processing, large-scale matrix computation, and scenarios such as numerical simulation. Unlike a general-purpose multi-core processor, the GPU makes extensive use of the SIMD (Single Instruction Multiple Data) structure to realize parallel data access and instruction execution on the same processor. As GPU programmability has continued to improve, and especially with the appearance of programming environments such as CUDA and a series of enhanced debugging tools, the complexity of general-purpose GPU programming has been greatly reduced, comprehensively opening a new era of GPU general-purpose computing. The general-purpose computing graphics processor (GPGPU) has developed into a highly parallel, multithreaded many-core processor with powerful computing capability and high memory bandwidth.
Compared with homogeneous parallel architectures, the heterogeneous parallel architecture composed of a general-purpose processor (CPU) and a coprocessor (GPU) is a structure better suited to large-scale compute-intensive tasks. A heterogeneous parallel architecture can effectively adapt to the complexity of program behavior across many application fields, is highly efficient in practical applications, conforms to the trend of rapid growth in VLSI chip capacity, and can satisfy the development requirement of increasingly diverse application features. A heterogeneous architecture contains processors of different structures, namely the transaction-oriented general-purpose CPU and the computation-oriented special-purpose GPU, and handling different tasks with different types of processors is precisely the advantage of a heterogeneous architecture.
However, because the CPU-GPU heterogeneous computing model differs from the traditional homogeneous CPU computing model, existing CPU-based programs cannot run directly on the GPU. Moreover, because the GPU cannot directly access the memory space of the CPU, in order to exploit the computing capability of the GPU, the input data must be transferred from CPU-side memory to GPU-side video memory before computation starts, and the results must be transferred from GPU-side video memory back to CPU-side memory after computation finishes, and so on, until all computing tasks are completed. Frequent data transfers between the CPU and GPU occupy a large amount of program running time and greatly affect program efficiency. How to improve the computational efficiency of the CPU and GPU while reducing the data transfer overhead between them is the difficulty in developing a neuron positioning algorithm for the CPU-GPU heterogeneous architecture. To date there has been no public report of a technical scheme that performs neuron positioning using CPU-GPU.
Summary of the invention
The technical problem to be solved by the present invention: in view of the above problems in the prior art, a heterogeneous-platform neuron positioning three-stage pipeline parallel method, system and medium are provided. The present invention can improve the processing speed of neuron positioning, and has the advantages of fast neuron positioning, short total program execution time, a flexible three-stage pipeline implementation with parameter configuration support, and easy porting and popularization.
In order to solve the above technical problem, the technical solution adopted by the present invention is as follows:
A heterogeneous-platform neuron positioning three-stage pipeline parallel method, whose implementation steps include:
1) partitioning the slice image data into blocks according to the image size and the computation granularity;
2) allocating storage space at the CPU end and the GPU end respectively based on the block parameters;
3) initializing variables and storage space;
4) performing task scheduling by the CPU, and executing computing tasks with the CPU and the GPU simultaneously in a three-stage pipeline mode, where each computing task comprises three steps (data read-in, positioning computation, and data write-back), and each intermediate round of computing, while executing its positioning computation, also executes the data write-back of the previous round and the data read-in of the next round, so that data read-in, positioning computation and data write-back proceed in parallel.
Preferably, the detailed steps of step 1) include:
1.1) computing the maximum data block size gSizeMax that can be supported for computation on the GPU, gSizeMax being a positive integer;
1.2) determining the block size, block count and total block count in the x, y and z directions respectively: if xDim < gSizeMax, the x-direction block size xScale is set to xDim, otherwise xScale is set to gSizeMax; the x-direction block count xNum is set to ⌈xDim/xScale⌉. If yDim < gSizeMax, the y-direction block size yScale is set to yDim, otherwise yScale is set to gSizeMax; the y-direction block count yNum is set to ⌈yDim/yScale⌉. If zDim < gSizeMax, the z-direction block size zScale is set to zDim, otherwise zScale is set to gSizeMax; the z-direction block count zNum is set to ⌈zDim/zScale⌉. Here xDim, yDim and zDim are preset parameters. The total block count bNum is set to xNum*yNum*zNum, and the blocks are numbered consecutively from 1.
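A minimal sketch of the block-partitioning computation of step 1.2), in Python. The variable names follow the patent; the ceiling division ⌈dim/scale⌉ used for the block counts is an assumption, reconstructed to be consistent with the embodiment's numbers.

```python
import math

def partition(xDim, yDim, zDim, gSizeMax):
    """Compute per-axis block sizes, per-axis block counts, and the
    total block count bNum, as described in step 1.2)."""
    xScale = xDim if xDim < gSizeMax else gSizeMax
    yScale = yDim if yDim < gSizeMax else gSizeMax
    zScale = zDim if zDim < gSizeMax else gSizeMax
    xNum = math.ceil(xDim / xScale)
    yNum = math.ceil(yDim / yScale)
    zNum = math.ceil(zDim / zScale)
    bNum = xNum * yNum * zNum  # blocks are numbered consecutively from 1
    return xScale, yScale, zScale, xNum, yNum, zNum, bNum

# Example with the embodiment's image dimensions and gSizeMax = 154
print(partition(40000, 40000, 10000, 154))
```

With the embodiment's parameters this yields 260 × 260 × 65 = 4394000 blocks, matching the figures given later in the specific embodiments.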
Preferably, when storage space is allocated at the CPU end and the GPU end respectively based on the block parameters in step 2), for the GPU-end allocation three pointer variables gReadPtr, gProcPtr and gWritePtr are declared at the GPU end, the video memory capacity allocated to each pointer being gSizeMax³, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block; for the CPU-end allocation two pointer variables cReadBuf and cWriteBuf are declared on the CPU, the memory capacity allocated to each pointer being gSizeMax³, where cReadBuf is used for buffering data between gReadPtr and the disk, and cWriteBuf is used for buffering data between gWritePtr and the disk; furthermore the block size computed on the CPU is set to gSizeMax, and three pointer variables cReadPtr, cProcPtr and cWritePtr are declared at the CPU end, the memory capacity allocated to each pointer being gSizeMax³, where gSizeMax is the maximum data block size that can be supported for computation on the GPU.
Preferably, the detailed steps of variable and storage space initialization in step 3) include: applying for a mutex loop variable idx for the CPU and initializing idx to 2; reading block number 1 from the disk into the memory space pointed to by pointer variable cProcPtr; reading block number 2 from the disk into the memory space pointed to by pointer variable cReadBuf; and then transferring block number 2 from the memory space pointed to by cReadBuf to the video memory space pointed to by the GPU-end pointer variable gProcPtr.
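The initialization of step 3) can be simulated in a few lines of Python. Dictionaries stand in for the disk, CPU memory and GPU video memory; all names besides the patent's pointer names are illustrative assumptions.

```python
# Simulated disk of numbered blocks; block i holds payload "block-i"
disk = {i: f"block-{i}" for i in range(1, 6)}

# CPU-side buffers and the GPU-side processing buffer, as single-slot dicts
cProcPtr, cReadBuf = {}, {}
gProcPtr = {}

# Step 3): idx starts at 2; block 1 goes to cProcPtr, block 2 is staged
# in cReadBuf and then transferred to the GPU-side gProcPtr buffer
idx = 2
cProcPtr["data"] = disk[1]
cReadBuf["data"] = disk[2]
gProcPtr["data"] = cReadBuf["data"]  # simulated host-to-device transfer

print(idx, cProcPtr["data"], gProcPtr["data"])
```

After this priming step both the CPU and the GPU hold a block ready to compute, so the first pipeline round can overlap computation with the read-in of block number idx + 1.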
Preferably, the detailed steps in step 4) include:
4.1) starting processes Nos. 0-2, responsible for organizing the computing tasks and data transfers on the GPU, and processes Nos. 3-5, responsible for organizing the computing tasks and data transfers on the CPU;
4.2) executing computing tasks by processes Nos. 0-2 calling the GPU in a three-stage pipeline mode, while simultaneously executing computing tasks by processes Nos. 3-5 calling the CPU in a three-stage pipeline mode, reading the data block required by the next group of neuron positioning while performing neuron positioning on each image block, and at the same time writing the data block of the previous group of neuron positioning back to the disk, so that disk read/write operations and neuron positioning operations proceed in parallel;
4.3) synchronizing processes Nos. 0, 1, 2, 3, 4 and 5; the computation ends.
Preferably, the detailed steps of executing computing tasks by processes Nos. 0-2 calling the GPU in a three-stage pipeline mode in step 4.2) include:
4.2.1A) by process No. 0, starting ncGPU threads on the GPU according to the number of compute cores ncGPU available on the GPU, all GPU threads performing neuron positioning computation in parallel on the data block pointed to by pointer variable gProcPtr; by process No. 1, incrementing the mutex loop variable idx by 1 and comparing idx with the total block count bNum; if idx is less than or equal to bNum, reading block number idx from the disk into the memory space pointed to by pointer variable cReadBuf, then transferring it from the memory space pointed to by the CPU-end pointer variable cReadBuf to the video memory space pointed to by the GPU-end pointer variable gReadPtr; by process No. 2, checking the video memory space pointed to by pointer variable gWritePtr; if it holds a data block, transferring that data block from the video memory space pointed to by gWritePtr to the memory space pointed to by pointer variable cWriteBuf, then writing it from the memory space pointed to by cWriteBuf to the disk, and clearing the video memory space pointed to by gWritePtr;
4.2.2A) synchronizing processes Nos. 0, 1 and 2; after synchronization, the computation of the current GPU data block is complete; by process No. 0, performing the GPU video memory pointer exchange, the concrete operation being: declaring a temporary pointer variable gtPtr, assigning gProcPtr to gtPtr, assigning gReadPtr to gProcPtr, assigning gWritePtr to gReadPtr, and assigning gtPtr to gWritePtr; process No. 0 then checks the video memory space pointed to by gProcPtr; if its content is empty, step 4.2.3A) is executed, otherwise step 4.2.1A) is executed;
4.2.3A) by process No. 0, transferring the data block in the video memory space pointed to by pointer variable gWritePtr to the memory space pointed to by pointer variable cWriteBuf, then writing the data block from the memory space pointed to by cWriteBuf back to the disk; and reclaiming the video memory spaces pointed to by the GPU-end pointer variables gReadPtr, gProcPtr and gWritePtr, and the memory spaces pointed to by the CPU-end pointer variables cReadBuf and cWriteBuf.
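The video memory pointer exchange of step 4.2.2A) is a three-way rotation of buffer roles: the just-read block becomes the block to process, the just-processed block becomes the block to write back, and the freed write buffer is reused for the next read. A minimal Python analogue follows (lists stand in for device buffers; this is an illustrative sketch, not the patented CUDA implementation):

```python
def rotate(gReadPtr, gProcPtr, gWritePtr):
    """One pipeline step of the pointer exchange in step 4.2.2A):
    no buffer contents are copied, only the three roles rotate."""
    gtPtr = gProcPtr        # gtPtr     <- gProcPtr
    gProcPtr = gReadPtr     # gProcPtr  <- gReadPtr
    gReadPtr = gWritePtr    # gReadPtr  <- gWritePtr
    gWritePtr = gtPtr       # gWritePtr <- gtPtr
    return gReadPtr, gProcPtr, gWritePtr

a, b, c = ["read"], ["proc"], ["write"]
r, p, w = rotate(a, b, c)
# The same three buffers are reused; only the roles rotate
print(r is c, p is a, w is b)
```

Because only pointers are exchanged, no gSizeMax³-sized copy ever occurs, which is exactly the saving the patent attributes to this step.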
Preferably, the detailed steps of executing computing tasks by processes Nos. 3-5 calling the CPU in a three-stage pipeline mode in step 4.2) include:
4.2.1B) by process No. 3, starting ncCPU threads on the CPU according to the number of compute cores ncCPU available on the CPU, all CPU threads performing neuron positioning computation in parallel on the data block pointed to by pointer variable cProcPtr; by process No. 4, incrementing the mutex loop variable idx by 1 and comparing idx with the total block count bNum; if idx is less than or equal to bNum, reading block number idx from the disk into the memory space pointed to by pointer variable cReadPtr; by process No. 5, checking the memory space pointed to by pointer variable cWritePtr; if it holds a data block, writing that data block to the disk and emptying the memory space pointed to by cWritePtr;
4.2.2B) synchronizing processes Nos. 3, 4 and 5; after synchronization, the computation of the current CPU data block is complete; by process No. 3, performing the CPU memory pointer exchange, the concrete operation being: declaring a temporary pointer variable ctPtr, assigning cProcPtr to ctPtr, assigning cReadPtr to cProcPtr, assigning cWritePtr to cReadPtr, and assigning ctPtr to cWritePtr; process No. 3 then checks the memory space pointed to by cProcPtr; if its content is empty, step 4.2.3B) is executed, otherwise step 4.2.1B) is executed;
4.2.3B) by process No. 3, writing the data block in the memory space pointed to by pointer variable cWritePtr back to the disk; and reclaiming the memory spaces pointed to by the CPU-end pointer variables cReadPtr, cProcPtr and cWritePtr.
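The schedule that the A and B pipelines above implement can be illustrated abstractly: in round t, block t+1 is being read in, block t is being computed, and block t-1 is being written back, so the middle rounds keep all three stages busy. A Python sketch of that schedule (the function name and tuple representation are illustrative assumptions):

```python
def pipeline_schedule(bNum):
    """Return, per round, the (read-in, compute, write-back) block numbers
    for a bNum-block job, with None where a stage is idle (pipeline
    fill at the start, drain at the end)."""
    rounds = []
    for t in range(bNum + 2):
        reading = t + 1 if t + 1 <= bNum else None
        computing = t if 1 <= t <= bNum else None
        writing = t - 1 if 1 <= t - 1 <= bNum else None
        rounds.append((reading, computing, writing))
    return rounds

for r in pipeline_schedule(4):
    print(r)
```

For bNum = 4 this prints six rounds, of which the middle two, (3, 2, 1) and (4, 3, 2), overlap all three stages; only the first and last rounds pay the unoverlapped fill and drain cost, which is why total time approaches the computation time alone as bNum grows.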
The present invention also provides a heterogeneous-platform neuron positioning three-stage pipeline parallel system, comprising a computer device with a GPU, the computer device being programmed to perform the steps of the aforementioned heterogeneous-platform neuron positioning three-stage pipeline parallel method of the present invention.
The present invention also provides a heterogeneous-platform neuron positioning three-stage pipeline parallel system, comprising a computer device with a GPU, a storage medium of the computer device storing a computer program programmed to perform the aforementioned heterogeneous-platform neuron positioning three-stage pipeline parallel method of the present invention.
The present invention also provides a computer-readable storage medium storing a computer program programmed to perform the aforementioned heterogeneous-platform neuron positioning three-stage pipeline parallel method of the present invention.
Compared with the prior art, the present invention has the following advantages: the present invention partitions the slice image data into blocks according to the image size and the computation granularity; allocates storage space at the CPU end and the GPU end respectively based on the block parameters; initializes variables and storage space; and performs task scheduling by the CPU, executing computing tasks with the CPU and the GPU simultaneously in a three-stage pipeline mode, reading the data block required by the next group of neuron positioning while performing neuron positioning on each image block and at the same time writing the data block of the previous group of neuron positioning back to the disk, so that disk read/write operations and neuron positioning operations proceed in parallel. The present invention can improve the processing speed of neuron positioning, and has the advantages of fast neuron positioning, short total program execution time, a flexible three-stage pipeline implementation with parameter configuration support, and easy porting and popularization.
Brief description of the drawings
Fig. 1 is a schematic diagram of the basic flow of the method of the embodiment of the present invention.
Fig. 2 is a schematic diagram of the three-stage pipeline parallel principle in the method of the embodiment of the present invention.
Specific embodiments
Hereinafter, a server equipped with two-way 12-core 2.4 GHz CPUs and one NVIDIA GTX 1080Ti GPU is taken as the example of the heterogeneous platform, and the heterogeneous-platform neuron positioning three-stage pipeline parallel method, system and medium of the present invention are described in further detail. The hard disk capacity of the server is 24 TB, the memory capacity is 256 GB, and the GPU video memory space is 11 GB. The input data consists of an image sequence of 10000 single images, each with a resolution of 40000 × 40000.
As shown in Fig. 1, the steps of the heterogeneous-platform neuron positioning three-stage pipeline parallel method of this embodiment include:
1) partitioning the slice image data into blocks according to the image size and the computation granularity;
2) allocating storage space at the CPU end and the GPU end respectively based on the block parameters;
3) initializing variables and storage space;
4) performing task scheduling by the CPU, and executing computing tasks with the CPU and the GPU simultaneously in a three-stage pipeline mode, where each computing task comprises three steps (data read-in, positioning computation, and data write-back), and each intermediate round of computing, while executing its positioning computation, also executes the data write-back of the previous round and the data read-in of the next round, so that data read-in, positioning computation and data write-back proceed in parallel.
In this embodiment, the detailed steps of step 1) include:
1.1) computing the maximum data block size gSizeMax that can be supported for computation on the GPU, gSizeMax being a positive integer;
1.2) determining the block size, block count and total block count in the x, y and z directions respectively: if xDim < gSizeMax, the x-direction block size xScale is set to xDim, otherwise xScale is set to gSizeMax; the x-direction block count xNum is set to ⌈xDim/xScale⌉. If yDim < gSizeMax, the y-direction block size yScale is set to yDim, otherwise yScale is set to gSizeMax; the y-direction block count yNum is set to ⌈yDim/yScale⌉. If zDim < gSizeMax, the z-direction block size zScale is set to zDim, otherwise zScale is set to gSizeMax; the z-direction block count zNum is set to ⌈zDim/zScale⌉. Here xDim, yDim and zDim are preset parameters. The total block count bNum is set to xNum*yNum*zNum, and the blocks are numbered consecutively from 1. The main variables are defined as follows: cMem: CPU-end memory capacity; gMem: GPU-end video memory capacity; gNum: number of GPUs; xDim: number of pixels in the x direction of each layer; yDim: number of pixels in the y direction of each layer; zDim: number of layers.
In this embodiment, the maximum data block size gSizeMax that can be supported for computation on the GPU is computed from the GPU video memory capacity so that three data blocks of gSizeMax³ each fit in video memory at once, i.e. gSizeMax = ⌊(gMem/3)^(1/3)⌋; in this formula, gMem is the video memory capacity on the GPU.
In this embodiment, the x-direction block count xNum = ⌈40000/154⌉ = 260, the y-direction block count yNum = ⌈40000/154⌉ = 260, and the z-direction block count zNum = ⌈10000/154⌉ = 65; the total block count bNum = 260 × 260 × 65 = 4394000, the blocks being numbered consecutively from 1, and each data block has size 154³ B = 3.65 GB.
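The embodiment's block counts follow from ceiling division of the image dimensions by gSizeMax = 154, which can be checked directly in Python:

```python
import math

xDim = yDim = 40000  # pixels per layer in the x and y directions
zDim = 10000         # number of layers
gSizeMax = 154       # block edge length from the embodiment

xNum = math.ceil(xDim / gSizeMax)  # x-direction block count
yNum = math.ceil(yDim / gSizeMax)  # y-direction block count
zNum = math.ceil(zDim / gSizeMax)  # z-direction block count
bNum = xNum * yNum * zNum          # total block count

print(xNum, yNum, zNum, bNum)
```

This reproduces 260 × 260 × 65 = 4394000 blocks, confirming the figures stated above.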
In this embodiment, when storage space is allocated at the CPU end and the GPU end respectively based on the block parameters in step 2), for the GPU-end allocation three pointer variables gReadPtr, gProcPtr and gWritePtr are declared at the GPU end, the video memory capacity allocated to each pointer being gSizeMax³, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block; for the CPU-end allocation two pointer variables cReadBuf and cWriteBuf are declared on the CPU, the memory capacity allocated to each pointer being gSizeMax³, where cReadBuf is used for buffering data between gReadPtr and the disk, and cWriteBuf is used for buffering data between gWritePtr and the disk; furthermore the block size computed on the CPU is set to gSizeMax, and three pointer variables cReadPtr, cProcPtr and cWritePtr are declared at the CPU end, the memory capacity allocated to each pointer being gSizeMax³, where gSizeMax is the maximum data block size that can be supported for computation on the GPU.
Specifically, in this embodiment three pointer variables gReadPtr, gProcPtr and gWritePtr are declared at the GPU end, the video memory capacity allocated to each pointer being 154³ B = 3.65 GB, where gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block. Two pointer variables cReadBuf and cWriteBuf are declared on the CPU, the memory capacity allocated to each pointer being 154³ B = 3.65 GB, where cReadBuf is used for buffering data between gReadPtr and the disk, and cWriteBuf is used for buffering data between gWritePtr and the disk. The block size computed on the CPU is set to 3.65 GB. Correspondingly, three pointer variables cReadPtr, cProcPtr and cWritePtr are declared at the CPU end, the memory capacity allocated to each pointer being 3.65 GB.
In this embodiment, the detailed steps of variable and storage space initialization in step 3) include: applying for a mutex loop variable idx for the CPU and initializing idx to 2; reading block number 1 from the disk into the memory space pointed to by pointer variable cProcPtr; reading block number 2 from the disk into the memory space pointed to by pointer variable cReadBuf; and then transferring block number 2 from the memory space pointed to by cReadBuf to the video memory space pointed to by the GPU-end pointer variable gProcPtr.
In this embodiment, the detailed steps in step 4) include:
4.1) starting processes Nos. 0-2, responsible for organizing the computing tasks and data transfers on the GPU, and processes Nos. 3-5, responsible for organizing the computing tasks and data transfers on the CPU;
4.2) executing computing tasks by processes Nos. 0-2 calling the GPU in a three-stage pipeline mode, while simultaneously executing computing tasks by processes Nos. 3-5 calling the CPU in a three-stage pipeline mode, reading the data block required by the next group of neuron positioning while performing neuron positioning on each image block, and at the same time writing the data block of the previous group of neuron positioning back to the disk, so that disk read/write operations and neuron positioning operations proceed in parallel. In this embodiment, step 4.2) has processes Nos. 0-2 call the GPU and processes Nos. 3-5 call the CPU, each in a three-stage pipeline mode, i.e. the CPU and the GPU perform neuron positioning simultaneously, which improves computational efficiency and reduces computation time;
4.3) synchronizing processes Nos. 0, 1, 2, 3, 4 and 5; the computation ends.
In the present embodiment, meter is executed in such a way that 0~No. 2 process calls GPU to use three class pipeline in step 4.2) The detailed step of calculation task includes:
4.2.1A ncGPU line) can be started with core number ncGPU is calculated according on GPU on GPU by No. 0 process Journey, all GPU thread parallels carry out Neurons location calculating to the data block that pointer variable gProcPtr is directed toward;The present embodiment In, No. 0 process can start 3584 threads with core number 3584 is calculated according on GPU on GPU;By No. 1 process by mutual exclusion Cyclic variable idx adds 1, compares mutual exclusion cyclic variable idx and piecemeal total quantity bNum, if mutual exclusion cyclic variable idx be less than etc. In piecemeal total quantity bNum, the mutual exclusion cyclic variable idx number block in disk is read in what pointer variable cReadBuf was directed toward Then memory headroom is transferred to the end GPU pointer variable gReadPtr from the memory headroom that the end CPU pointer variable cReadBuf is directed toward The video memory space of direction;The video memory space being directed toward by No. 2 process check pointer variable gWritePtr, if pointer variable The video memory space that gWritePtr is directed toward has been stored in data block, and the video memory which is directed toward from pointer variable gWritePtr is empty Between be transferred to pointer variable cWriteBuf direction memory headroom, then from pointer variable cWriteBuf be directed toward memory headroom The data block is stored in disk, and removes the video memory space of pointer variable gWritePtr direction;Step 4.2.1A) in 0~No. 2 Process simultaneously carry out the end GPU data block read, data block calculate and data block back, realize data transmission and calculate when Between be overlapped, reduce the data transfer overhead at the end GPU;
4.2.2A) Synchronize processes 0, 1 and 2; after synchronization, the computation of the GPU's current data block is complete. Process 0 then exchanges the GPU video memory pointers, specifically: declare a temporary pointer variable gtPtr; assign gProcPtr to gtPtr; assign gReadPtr to gProcPtr; assign gWritePtr to gReadPtr; and assign gtPtr to gWritePtr. Process 0 then checks the video memory space pointed to by gProcPtr: if its content is empty, execute step 4.2.3A), otherwise execute step 4.2.1A). In step 4.2.2A) of this embodiment, data exchange is achieved by swapping pointers, which avoids copying large memory regions and improves the space-time efficiency of storage-space management;
4.2.3A) Process 0 transfers the data block in the video memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, then writes the data block from that memory space back to disk; finally, the GPU-side video memory spaces pointed to by gReadPtr, gProcPtr and gWritePtr are freed, as are the CPU-side memory spaces pointed to by cReadBuf and cWriteBuf.
In this embodiment, the detailed steps by which processes 3 to 5 simultaneously call the CPU to execute computing tasks in a three-stage pipeline in step 4.2) include:
4.2.1B) Process 3 starts ncCPU threads on the CPU according to the number ncCPU of available compute cores on the CPU; all CPU threads in parallel perform neuron-positioning computation on the data block pointed to by pointer variable cProcPtr. Process 4 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum (4394000 in this embodiment); if idx is less than or equal to bNum, process 4 reads block number idx from disk into the memory space pointed to by cReadPtr. Process 5 checks the memory space pointed to by cWritePtr; if it already holds a data block, process 5 writes the block to disk and clears the memory space pointed to by cWritePtr. In step 4.2.1B), processes 3 to 5 simultaneously perform CPU-side data-block reading, data-block computation and data-block write-back, overlapping data transfer with computation in time and reducing CPU-side data-transfer overhead;
4.2.2B) Synchronize processes 3, 4 and 5; after synchronization, the computation of the CPU's current data block is complete. Process 3 then exchanges the CPU memory pointers, specifically: declare a temporary pointer variable ctPtr; assign cProcPtr to ctPtr; assign cReadPtr to cProcPtr; assign cWritePtr to cReadPtr; and assign ctPtr to cWritePtr. Process 3 then checks the memory space pointed to by cProcPtr: if its content is empty, execute step 4.2.3B), otherwise execute step 4.2.1B). In step 4.2.2B) of this embodiment, data exchange is achieved by swapping pointers, which avoids copying large memory regions and improves the space-time efficiency of storage-space management;
4.2.3B) Process 3 writes the data block in the memory space pointed to by cWritePtr back to disk, then frees the CPU-side memory spaces pointed to by cReadPtr, cProcPtr and cWritePtr.
As shown in Fig. 2, the positioning-computation tasks executed by the CPU and GPU each comprise three steps, namely data read-in, positioning computation and data write-back, and data dependences exist between these steps. The first round (Round 1) therefore executes only the read-in step, which loads slice data into memory and generates a 3-D image volume; in the second round (Round 2), the read-in step (generating the next 3-D image volume) and the neuron-positioning computation proceed simultaneously; and from the third round (Round 3) up to the antepenultimate round (Round n-2) before the positioning computation ends, the read-in, positioning-computation and write-back steps of each round all proceed simultaneously. In each round, the read-in step loads the next group of slice-image data; the positioning computation processes the volume data whose slice images were read in during the previous round; and the write-back step writes the neuron-positioning results of the round before that back to the disk array. Through this approach, the data read-in and data write-back times are effectively hidden inside the neuron-positioning computation step.
In conclusion the present embodiment is based on method of partition organizational computing and data transmission using CPU, CPU and GPU use more Thread carries out Neurons location, by the data transmission step between CPU memory, GPU video memory and disk using multistage pipeline mode, i.e., While handling each image block data, next image block data to be processed is read, while processed by upper one Image block data write back disk so that data transfer operation and data processing operation carry out parallel.The present embodiment heterogeneous platform Neurons location three-level flowing water is flat in CPU-GPU heterogeneous Computing in parallel through multi-process and multithreading hybrid parallel technology On platform, while Neurons location calculating is carried out using CPU multi-core processor and GPU many-core coprocessor, and pass through multistage flowing water Line technology carries out the time-interleaving calculated and data are transmitted, and Neurons location speed can be improved.It is found after statistics operation data, Compared with the Neurons location algorithm run on 12 core CPU of two-way, the present embodiment heterogeneous platform Neurons location three-level flowing water Neurons location speed can be increased to 3 times or more by parallel method.
In addition, this embodiment provides a heterogeneous platform neuron positioning three-level flow parallel system comprising a computer device with a GPU, the computer device being programmed to execute the steps of the aforementioned heterogeneous platform neuron positioning three-level flow parallel method of this embodiment. This embodiment also provides such a system in which a computer program programmed to execute the aforementioned method of this embodiment is stored on the storage medium of the computer device. This embodiment further provides a computer-readable storage medium on which is stored a computer program programmed to execute the aforementioned heterogeneous platform neuron positioning three-level flow parallel method of this embodiment.
The above is only a preferred embodiment of the present invention; the protection scope of the present invention is not limited to the above embodiment, and all technical solutions under the concept of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A heterogeneous platform neuron positioning three-level flow parallel method, characterized in that its implementation steps include:
1) computing block-partition parameters for the slice-image data according to the image size and the computation granularity;
2) allocating storage space on the CPU side and the GPU side respectively based on the block-partition parameters;
3) initializing variables and storage space;
4) performing task scheduling by the CPU, and executing computing tasks on the CPU and the GPU simultaneously in a three-stage pipeline, each computing task comprising three steps: data read-in, positioning computation and data write-back; each intermediate round of computation, while executing its positioning computation, also executes the data write-back of the previous round and the data read-in of the next round, so that the data read-in, positioning-computation and data write-back steps proceed in parallel.
2. The heterogeneous platform neuron positioning three-level flow parallel method according to claim 1, characterized in that the detailed steps of step 1) include:
1.1) computing the maximum data block size gSizeMax that the GPU can support for computation, gSizeMax being a positive integer;
1.2) determining the block size, per-direction block count and total block count for the x, y and z directions respectively: if xDim < gSizeMax, set the x-direction block size xScale to xDim, otherwise set xScale to gSizeMax, and set the x-direction block count xNum to ⌈xDim/xScale⌉; if yDim < gSizeMax, set the y-direction block size yScale to yDim, otherwise set yScale to gSizeMax, and set the y-direction block count yNum to ⌈yDim/yScale⌉; if zDim < gSizeMax, set the z-direction block size zScale to zDim, otherwise set zScale to gSizeMax, and set the z-direction block count zNum to ⌈zDim/zScale⌉; wherein xDim, yDim and zDim are preset parameters; set the total block count bNum to xNum*yNum*zNum, the blocks being numbered consecutively from 1.
3. The heterogeneous platform neuron positioning three-level flow parallel method according to claim 2, characterized in that, when storage space is allocated on the CPU side and the GPU side respectively based on the block-partition parameters in step 2), for the GPU-side allocation three pointer variables gReadPtr, gProcPtr and gWritePtr are declared on the GPU side, the video memory capacity allocated for each pointer being gSizeMax³, wherein gReadPtr points to the next image block to be processed, gProcPtr points to the image block currently being processed, and gWritePtr points to the previously processed image block; for the CPU-side allocation two pointer variables cReadBuf and cWriteBuf are declared on the CPU, the memory capacity allocated for each pointer being gSizeMax³, wherein cReadBuf buffers data between gReadPtr and the disk and cWriteBuf buffers data between gWritePtr and the disk; and the block size computed on the CPU is set to gSizeMax, the CPU side declaring three pointer variables cReadPtr, cProcPtr and cWritePtr, the memory capacity allocated for each pointer being gSizeMax³, wherein gSizeMax is the maximum data block size that the GPU can support for computation.
4. The heterogeneous platform neuron positioning three-level flow parallel method according to claim 3, characterized in that the detailed steps of initializing variables and storage space in step 3) include: applying for a mutually exclusive loop variable idx for the CPU and initializing idx to 2; reading block number 1 from disk into the memory space pointed to by pointer variable cProcPtr; reading block number 2 from disk into the memory space pointed to by pointer variable cReadBuf; and then transferring block number 2 from the memory space pointed to by cReadBuf to the GPU-side video memory space pointed to by gProcPtr.
5. The heterogeneous platform neuron positioning three-level flow parallel method according to claim 3, characterized in that the detailed steps of step 4) include:
4.1) starting on the CPU processes 0 to 2, which are responsible for organizing the computing tasks and data transfer on the GPU, and processes 3 to 5, which are responsible for organizing the computing tasks and data transfer on the CPU;
4.2) having processes 0 to 2 call the GPU to execute computing tasks in a three-stage pipeline while, at the same time, processes 3 to 5 call the CPU to execute computing tasks in a three-stage pipeline, so that while neuron positioning is performed on each image block, the data block required for the next group of neuron positioning is read in and the data block of the previous group of neuron positioning is written back to disk, whereby disk read/write operations and neuron positioning operations proceed in parallel;
4.3) synchronizing processes 0, 1, 2, 3, 4 and 5; the computation ends.
6. The heterogeneous platform neuron positioning three-level flow parallel method according to claim 5, characterized in that the detailed steps by which processes 0 to 2 call the GPU to execute computing tasks in a three-stage pipeline in step 4.2) include:
4.2.1A) process 0 starts ncGPU threads on the GPU according to the number ncGPU of available compute cores on the GPU, and all GPU threads in parallel perform neuron-positioning computation on the data block pointed to by pointer variable gProcPtr; process 1 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, process 1 reads block number idx from disk into the memory space pointed to by cReadBuf, then transfers it from the CPU-side memory space pointed to by cReadBuf to the GPU-side video memory space pointed to by gReadPtr; process 2 checks the video memory space pointed to by gWritePtr, and if it already holds a data block, transfers the block from the video memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, writes the block from that memory space to disk, and clears the video memory space pointed to by gWritePtr;
4.2.2A) synchronize processes 0, 1 and 2; after synchronization, the computation of the GPU's current data block is complete; process 0 then exchanges the GPU video memory pointers, specifically: declare a temporary pointer variable gtPtr, assign gProcPtr to gtPtr, assign gReadPtr to gProcPtr, assign gWritePtr to gReadPtr, and assign gtPtr to gWritePtr; process 0 then checks the video memory space pointed to by gProcPtr: if its content is empty, execute step 4.2.3A), otherwise execute step 4.2.1A);
4.2.3A) process 0 transfers the data block in the video memory space pointed to by gWritePtr to the memory space pointed to by cWriteBuf, then writes the data block from that memory space back to disk; the GPU-side video memory spaces pointed to by gReadPtr, gProcPtr and gWritePtr are freed, as are the CPU-side memory spaces pointed to by cReadBuf and cWriteBuf.
7. The heterogeneous platform neuron positioning three-level flow parallel method according to claim 5, characterized in that the detailed steps by which processes 3 to 5 simultaneously call the CPU to execute computing tasks in a three-stage pipeline in step 4.2) include:
4.2.1B) process 3 starts ncCPU threads on the CPU according to the number ncCPU of available compute cores on the CPU, and all CPU threads in parallel perform neuron-positioning computation on the data block pointed to by pointer variable cProcPtr; process 4 increments the mutually exclusive loop variable idx by 1 and compares idx with the total block count bNum; if idx is less than or equal to bNum, process 4 reads block number idx from disk into the memory space pointed to by cReadPtr; process 5 checks the memory space pointed to by cWritePtr, and if it already holds a data block, writes the block to disk and clears the memory space pointed to by cWritePtr;
4.2.2B) synchronize processes 3, 4 and 5; after synchronization, the computation of the CPU's current data block is complete; process 3 then exchanges the CPU memory pointers, specifically: declare a temporary pointer variable ctPtr, assign cProcPtr to ctPtr, assign cReadPtr to cProcPtr, assign cWritePtr to cReadPtr, and assign ctPtr to cWritePtr; process 3 then checks the memory space pointed to by cProcPtr: if its content is empty, execute step 4.2.3B), otherwise execute step 4.2.1B);
4.2.3B) process 3 writes the data block in the memory space pointed to by cWritePtr back to disk; the CPU-side memory spaces pointed to by cReadPtr, cProcPtr and cWritePtr are freed.
8. A heterogeneous platform neuron positioning three-level flow parallel system comprising a computer device with a GPU, characterized in that the computer device is programmed to execute the steps of the heterogeneous platform neuron positioning three-level flow parallel method according to any one of claims 1 to 7.
9. A heterogeneous platform neuron positioning three-level flow parallel system comprising a computer device with a GPU, characterized in that a computer program programmed to execute the heterogeneous platform neuron positioning three-level flow parallel method according to any one of claims 1 to 7 is stored on the storage medium of the computer device.
10. A computer-readable storage medium, characterized in that a computer program programmed to execute the heterogeneous platform neuron positioning three-level flow parallel method according to any one of claims 1 to 7 is stored on the computer-readable storage medium.
CN201910289495.7A 2019-04-11 2019-04-11 Heterogeneous platform neuron positioning three-level flow parallel method, system and medium Active CN110135569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910289495.7A CN110135569B (en) 2019-04-11 2019-04-11 Heterogeneous platform neuron positioning three-level flow parallel method, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910289495.7A CN110135569B (en) 2019-04-11 2019-04-11 Heterogeneous platform neuron positioning three-level flow parallel method, system and medium

Publications (2)

Publication Number Publication Date
CN110135569A true CN110135569A (en) 2019-08-16
CN110135569B CN110135569B (en) 2021-09-21

Family

ID=67569648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910289495.7A Active CN110135569B (en) 2019-04-11 2019-04-11 Heterogeneous platform neuron positioning three-level flow parallel method, system and medium

Country Status (1)

Country Link
CN (1) CN110135569B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516795A (en) * 2019-08-28 2019-11-29 北京达佳互联信息技术有限公司 A kind of method, apparatus and electronic equipment for model variable allocation processing device
CN110543940A (en) * 2019-08-29 2019-12-06 中国人民解放军国防科技大学 Neural circuit body data processing method, system and medium based on hierarchical storage
CN110992241A (en) * 2019-11-21 2020-04-10 支付宝(杭州)信息技术有限公司 Heterogeneous embedded system and method for accelerating neural network target detection
CN112529763A (en) * 2020-12-16 2021-03-19 航天科工微电子***研究院有限公司 Image processing system and tracking and aiming system based on soft and hard coupling
CN113806067A (en) * 2021-07-28 2021-12-17 卡斯柯信号有限公司 Safety data verification method, device, equipment and medium based on vehicle-to-vehicle communication
CN113918356A (en) * 2021-12-13 2022-01-11 广东睿江云计算股份有限公司 Method and device for quickly synchronizing data based on CUDA (compute unified device architecture), computer equipment and storage medium
CN117689025A (en) * 2023-12-07 2024-03-12 上海交通大学 Quick large model reasoning service method and system suitable for consumer display card

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130169658A1 (en) * 2011-12-28 2013-07-04 Think Silicon Ltd Multi-threaded multi-format blending device for computer graphics operations
CN103617626A (en) * 2013-12-16 2014-03-05 武汉狮图空间信息技术有限公司 Central processing unit (CPU) and ground power unit (GPU)-based remote-sensing image multi-scale heterogeneous parallel segmentation method
CN104267940A (en) * 2014-09-17 2015-01-07 武汉狮图空间信息技术有限公司 Quick map tile generation method based on CPU+GPU
CN104375807A (en) * 2014-12-09 2015-02-25 中国人民解放军国防科学技术大学 Three-level flow sequence comparison method based on many-core co-processor
CN106815807A (en) * 2017-01-11 2017-06-09 重庆市地理信息中心 A kind of unmanned plane image Fast Mosaic method based on GPU CPU collaborations
CN109451322A (en) * 2018-09-14 2019-03-08 北京航天控制仪器研究所 DCT algorithm and DWT algorithm for compression of images based on CUDA framework speed up to realize method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TAO LI: "Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing", SPRINGER *
XIAO NAN: "Research on a naive Bayes image classification algorithm based on heterogeneous *** architecture", CHINA MASTERS' THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY *
MA YONGJUN ET AL.: "Parallel template-matching target recognition algorithm for CPU+GPU heterogeneous platforms", JOURNAL OF TIANJIN UNIVERSITY OF SCIENCE AND TECHNOLOGY *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516795A (en) * 2019-08-28 2019-11-29 北京达佳互联信息技术有限公司 A kind of method, apparatus and electronic equipment for model variable allocation processing device
CN110516795B (en) * 2019-08-28 2022-05-10 北京达佳互联信息技术有限公司 Method and device for allocating processors to model variables and electronic equipment
CN110543940A (en) * 2019-08-29 2019-12-06 中国人民解放军国防科技大学 Neural circuit body data processing method, system and medium based on hierarchical storage
CN110992241A (en) * 2019-11-21 2020-04-10 支付宝(杭州)信息技术有限公司 Heterogeneous embedded system and method for accelerating neural network target detection
CN112529763A (en) * 2020-12-16 2021-03-19 航天科工微电子***研究院有限公司 Image processing system and tracking and aiming system based on soft and hard coupling
CN113806067A (en) * 2021-07-28 2021-12-17 卡斯柯信号有限公司 Safety data verification method, device, equipment and medium based on vehicle-to-vehicle communication
CN113806067B (en) * 2021-07-28 2024-03-29 卡斯柯信号有限公司 Safety data verification method, device, equipment and medium based on vehicle-to-vehicle communication
CN113918356A (en) * 2021-12-13 2022-01-11 广东睿江云计算股份有限公司 Method and device for quickly synchronizing data based on CUDA (compute unified device architecture), computer equipment and storage medium
CN113918356B (en) * 2021-12-13 2022-02-18 广东睿江云计算股份有限公司 Method and device for quickly synchronizing data based on CUDA (compute unified device architecture), computer equipment and storage medium
CN117689025A (en) * 2023-12-07 2024-03-12 上海交通大学 Quick large model reasoning service method and system suitable for consumer display card

Also Published As

Publication number Publication date
CN110135569B (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN110135569A (en) Heterogeneous platform neuron positioning three-level flow parallel method, system and medium
CN110363294B (en) Representing a neural network with paths in the network to improve performance of the neural network
Baskaran et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
Herrero-Lopez et al. Parallel multiclass classification using SVMs on GPUs
CN109993683A (en) Machine learning sparse calculation mechanism, the algorithm calculations micro-architecture and sparsity for training mechanism of any neural network
CN110135575A (en) Communication optimization for distributed machines study
CN103761215B (en) Matrix transpose optimization method based on graphic process unit
Scherer et al. Accelerating large-scale convolutional neural networks with parallel graphics multiprocessors
CN105808309B (en) A kind of high-performance implementation method of the basic linear algebra library BLAS three-level function GEMM based on Shen prestige platform
US10725837B1 (en) Persistent scratchpad memory for data exchange between programs
EP3742350A1 (en) Parallelization strategies for training a neural network
CN106484532B (en) GPGPU parallel calculating method towards SPH fluid simulation
Liu Parallel and scalable sparse basic linear algebra subprograms
DE102023105565A1 Methods and apparatus for efficient access to multi-dimensional data structures and/or other large blocks of data
CN103413273A (en) Method for rapidly achieving image restoration processing based on GPU
CN115390922A (en) Shenwei architecture-based seismic wave simulation algorithm parallel optimization method and system
Bakunas-Milanowski et al. Efficient algorithms for stream compaction on GPUs
DE102020130081A1 Extended processor functions for computations
CN110383206A (en) System and method for generating Gauss number using hardware-accelerated
CN115756605A (en) Shallow cloud convection parameterization scheme heterogeneous computing method based on multiple GPUs
US20230289398A1 (en) Efficient Matrix Multiply and Add with a Group of Warps
Lin et al. swFLOW: A dataflow deep learning framework on sunway taihulight supercomputer
Zhou et al. A Parallel Scheme for Large‐scale Polygon Rasterization on CUDA‐enabled GPUs
Hou et al. A GPU-based tabu search for very large hardware/software partitioning with limited resource usage
CN111445503B (en) Pyramid mutual information image registration method based on parallel programming model on GPU cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant