CN106502632A - A GPU parallel particle swarm optimization method based on adaptive thread warps (self-adaptive thread beam) - Google Patents

A GPU parallel particle swarm optimization method based on adaptive thread warps (self-adaptive thread beam) Download PDF

Info

Publication number
CN106502632A
Authority
CN
China
Prior art keywords
particle
thread
population
self
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610976893.2A
Other languages
Chinese (zh)
Other versions
CN106502632B (en)
Inventor
何发智 (He Fazhi)
张硕 (Zhang Shuo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Beidou innovation and Application Technology Research Institute Co.,Ltd.
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201610976893.2A priority Critical patent/CN106502632B/en
Publication of CN106502632A publication Critical patent/CN106502632A/en
Application granted granted Critical
Publication of CN106502632B publication Critical patent/CN106502632B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a GPU parallel particle swarm optimization method based on adaptive thread warps, comprising the following steps. Step 1: initialize the problem-function parameters and the particle swarm parameters. Step 2: define three CUDA kernel functions, used respectively to compute in parallel the velocity and position of the next generation of particles, the fitness value of each particle together with the best fitness value found so far by the particle itself and its corresponding solution, and the best fitness value found so far by the whole swarm and its corresponding solution. Step 3: compute and initialize the Block and Grid parameters of each kernel function according to the adaptive thread-warp algorithm. Step 4: call the kernel functions to update the velocity and position of the swarm iteratively in parallel and obtain the best fitness value found so far and its corresponding solution. Step 5: repeat step 4 until the preset termination condition is reached, then the GPU outputs the computed result. The invention can significantly shorten the parallel solving time of the particle swarm algorithm on a GPU, reduce power consumption and save hardware cost.

Description

A GPU parallel particle swarm optimization method based on adaptive thread warps (self-adaptive thread beam)
Technical field
The present invention relates to a particle swarm optimization method, belongs to the field of computer data processing, and in particular relates to a GPU parallel particle swarm optimization method based on adaptive thread warps.
Background art
Particle Swarm Optimization (PSO) is an evolutionary computation technique. Because its concept is simple and easy to implement, while it also offers strong global search and convergence capabilities, it has developed rapidly and been widely applied. Various parallel PSO versions exist at present; for the CUDA parallel architecture, two thread-allocation schemes are mainly used: 1) one thread corresponds to one particle; 2) one thread corresponds to one dimension and one Block corresponds to one particle. The first, coarse-grained scheme already achieves a good speed-up ratio, but because the dimensions of the particle handled by each thread are still processed serially, its degree of parallelism is not high. The second, fine-grained scheme improves on the first: each particle is mapped to a Block and each thread within the Block is mapped to one dimension of that particle. This undoubtedly increases the degree of parallelism; it should be noted, however, that in a CUDA program all Blocks are assigned to the streaming multiprocessors serially, so the degree of parallelism can still be improved further.
The GPU was originally a dedicated graphics rendering device, i.e., hardware devoted exclusively to graphics processing. Since 2006, however, more and more researchers have studied GPGPU, the use of GPUs for general-purpose computation, and the major vendors have introduced dedicated GPGPU languages such as CUDA and OpenCL.
Summary of the invention
The purpose of the present invention is to optimize the existing GPU-based computation method by adjusting its parallel architecture so that its parallel efficiency becomes higher. An improved CUDA parallel architecture is designed and executed on a graphics processor (GPU), so that the degree of parallelism of the particle swarm algorithm on a single host is further improved; compared with the two methods above, the speed-up ratio over the CPU is improved by a factor of more than 40.
In order to solve the above problems, the solution of the present invention is as follows:
A GPU parallel particle swarm optimization method based on adaptive thread warps, in which
the dimensions of each particle are divided into several thread warps, and a thread block is used to contain these warps, so that one thread block corresponds to one or more particles;
wherein the thread warp is the basic unit of scheduling and execution on a streaming multiprocessor (SM).
Preferably, in the above GPU parallel particle swarm optimization method based on adaptive thread warps, the number of thread warps WarpNum corresponding to a particle and the number of particles ParticleNum corresponding to a thread block are adjusted based on the following equations:
WarpNum = DivUp(D, WarpSize)   (8)
ThreadNum = WarpNum * WarpSize   (9)
ParticleNum = DivDown(BlockSize, ThreadNum)   (10)
where D is the dimension of the problem to be solved and WarpSize is the size of one thread warp in the CUDA architecture; the DivUp function divides D by WarpSize and rounds the quotient up, giving the number of warps WarpNum corresponding to a particle; ThreadNum is the total number of threads actually used by each particle; BlockSize is the size of one Block in the CUDA architecture; the DivDown function divides BlockSize by ThreadNum and rounds the quotient down, giving the number of particles ParticleNum corresponding to a Block.
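A minimal host-side sketch of equations (8)-(10) is given below; the helper names DivUp and DivDown follow the text, while the example values in the trailing comment (D = 50, BlockSize = 256) are only illustrative assumptions.
// Host-side sketch of equations (8)-(10); the example values in the last comment are assumptions.
inline int DivUp(int a, int b)   { return (a + b - 1) / b; }  // quotient rounded up
inline int DivDown(int a, int b) { return a / b; }            // quotient rounded down
void adaptiveWarpSizing(int D, int BlockSize, int WarpSize,
                        int &WarpNum, int &ThreadNum, int &ParticleNum)
{
    WarpNum     = DivUp(D, WarpSize);              // (8) warps per particle
    ThreadNum   = WarpNum * WarpSize;              // (9) threads actually used per particle
    ParticleNum = DivDown(BlockSize, ThreadNum);   // (10) particles per Block
}
// Example: D = 50, WarpSize = 32, BlockSize = 256  ->  WarpNum = 2, ThreadNum = 64, ParticleNum = 4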
Preferably, in the above GPU parallel particle swarm optimization method based on adaptive thread warps, before the kernel functions are called,
the thread-block size BlockNum (threads per Block) and the grid size GridNum (Blocks per Grid) of each kernel function are computed and initialized, based on the adaptive thread-warp algorithm, using the following equations:
BlockNum = ThreadNum * ParticleNum;
GridNum = DivUp(N, ParticleNum);
where ThreadNum is the total number of threads actually used by each particle, ParticleNum is the number of particles corresponding to a thread block, and N is the total number of particles in the swarm.
Preferably, in the above GPU parallel particle swarm optimization method based on adaptive thread warps,
three CUDA kernel functions are defined, used respectively to compute in parallel the velocity and position of the particles, the fitness value of each particle together with the best fitness value found so far by the particle itself and its corresponding solution, and the best fitness value found so far by the whole swarm and its corresponding solution.
Preferably, the above GPU parallel particle swarm optimization method based on adaptive thread warps specifically includes the following steps:
Step 2.1: the kernel that computes the velocity and position of the particles: each GPU thread, according to the allocated thread-block size BlockNum and grid size GridNum, computes the velocity and position of its corresponding problem dimension using the update formulas of the particle swarm algorithm;
Step 2.2: the kernel that computes the fitness value of the next generation of particles and the best fitness value found by each particle itself together with its corresponding solution: the fitness value of every dimension of every particle is computed in parallel according to the allocated BlockNum and GridNum, the per-dimension fitness values are then combined by a parallel reduction algorithm to obtain the fitness value of each particle, and finally the particle's best fitness value and its corresponding solution are updated according to the fitness value obtained;
Step 2.3: the kernel that computes the best fitness value found so far by the whole swarm and its corresponding solution: the cublasI<t>amin() function of CUBLAS (where t is the data type of the operands) is used to obtain, on the GPU, the best fitness value found so far by the whole swarm and its corresponding solution.
Preferably, in the above GPU parallel particle swarm optimization method based on adaptive thread warps, the problem functions are initialized based on the following equations:
f_{Sphere}(x) = \sum_{d=1}^{D} x_d^2,  x_d \in [-100, 100];   (1)
f_{Rastrigrin}(x) = \sum_{d=1}^{D} [x_d^2 - 10\cos(2\pi x_d) + 10],  x_d \in [-5.12, 5.12];   (2)
f_{Rosenbrock}(x) = \sum_{d=1}^{D-1} [100(x_{d+1} - x_d^2)^2 + (x_d - 1)^2],  x_d \in [-10, 10];   (3)
where f_{Sphere} is the formula of the problem function Sphere, f_{Rastrigrin} is the formula of the problem function Rastrigrin, f_{Rosenbrock} is the formula of the problem function Rosenbrock, x is the problem-function variable and D is the dimension of the problem function.
Preferably, in the above GPU parallel particle swarm optimization method based on adaptive thread warps, the particle swarm is updated based on the following equations:
V_{id}(t+1) = w V_{id}(t) + c_1 r_1 (P^b_{id}(t) - X_{id}(t)) + c_2 r_2 (P^{gb}_d(t) - X_{id}(t));   (4)
X_{id}(t+1) = X_{id}(t) + V_{id}(t);   (5)
where V_{id} is the velocity of each particle, t is the current iteration number, w is the inertia weight coefficient of the swarm, c_1 and c_2 are the acceleration factors of the swarm, r_1 and r_2 are random numbers uniformly distributed in the interval [0, 1], P^b_{id} is the individual (personal) best of the particle, P^{gb}_d is the global best of the whole swarm, and X_{id} is the current position (solution) of the particle. The particle swarm parameters w, c_1 and c_2 are updated based on the following equations:
w = 1/(2 ln 2);   (6)
c_1 = c_2 = 0.5 + ln 2;   (7)
Therefore, the invention has the following advantages:
(1) The method provided by the invention can significantly shorten the time required by the PSO algorithm to solve a problem and improve the response speed of related application software;
(2) With the method provided by the invention, a low-end CPU can be chosen as the host while a mid- or high-end GPU performs the computation, reaching the performance of multiple CPUs or even a cluster, thereby reducing power consumption and saving hardware cost.
Description of the drawings
Fig. 1 is a flow chart of the particle swarm optimization method of the embodiment of the present invention.
Fig. 2 shows the CUDA parallel computing model.
Fig. 3 is the GPU-side update architecture diagram of the embodiment of the present invention.
Fig. 4 is a flow chart of the particle swarm optimization method of the embodiment, including the GPU-side update architecture.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below through an embodiment in combination with the accompanying drawings.
Embodiment:
As shown in Fig. 1, the GPU parallel particle swarm optimization method based on adaptive thread warps of this embodiment includes the following steps:
Step 1: initialize the problem-function parameters and the particle swarm parameters;
Step 2: define three CUDA kernel functions, used respectively to compute in parallel the velocity and position of the particles, the fitness value of each particle together with the best fitness value found so far by the particle itself and its corresponding solution, and the best fitness value found so far by the whole swarm and its corresponding solution;
Step 3: compute and initialize BlockNum and GridNum of each kernel function according to the adaptive thread-warp algorithm;
Step 4: call the kernel functions to update the velocity and position of the swarm iteratively in parallel, and obtain the best fitness value found so far and its corresponding solution;
Step 5: repeat step 4 until the preset termination condition is reached, then the GPU outputs the computed result.
The problem functions in step 1 are defined based on the following equations (1)-(3):
f_{Sphere}(x) = \sum_{d=1}^{D} x_d^2,  x_d \in [-100, 100];   (1)
f_{Rastrigrin}(x) = \sum_{d=1}^{D} [x_d^2 - 10\cos(2\pi x_d) + 10],  x_d \in [-5.12, 5.12];   (2)
f_{Rosenbrock}(x) = \sum_{d=1}^{D-1} [100(x_{d+1} - x_d^2)^2 + (x_d - 1)^2],  x_d \in [-10, 10];   (3)
where f_{Sphere} is the formula of the problem function Sphere, f_{Rastrigrin} is the formula of the problem function Rastrigrin, f_{Rosenbrock} is the formula of the problem function Rosenbrock, x is the problem-function variable and D is the dimension of the problem function.
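As an illustration only (these helpers are not part of the patent text), the per-dimension contribution of each benchmark in equations (1)-(3) can be written as a CUDA __device__ function; summing the contributions over all dimensions, as done by the parallel reduction of step 2.2 below, gives the particle fitness. Indices are 0-based in the code and the function names are assumptions.
// Per-dimension terms of equations (1)-(3); function names are illustrative.
__device__ float sphereTerm(const float *x, int d)        // term of (1)
{
    return x[d] * x[d];
}
__device__ float rastrigrinTerm(const float *x, int d)    // term of (2)
{
    const float PI = 3.14159265358979f;
    return x[d] * x[d] - 10.0f * cosf(2.0f * PI * x[d]) + 10.0f;
}
__device__ float rosenbrockTerm(const float *x, int d)    // term of (3), valid for d = 0 .. D-2
{
    float a = x[d + 1] - x[d] * x[d];
    float b = x[d] - 1.0f;
    return 100.0f * a * a + b * b;
}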
The update formulas of the particle swarm are given by equations (4)-(5):
V_{id}(t+1) = w V_{id}(t) + c_1 r_1 (P^b_{id}(t) - X_{id}(t)) + c_2 r_2 (P^{gb}_d(t) - X_{id}(t));   (4)
X_{id}(t+1) = X_{id}(t) + V_{id}(t);   (5)
where V_{id} is the velocity of each particle, t is the current iteration number, w is the inertia weight coefficient of the swarm, c_1 and c_2 are the acceleration factors of the swarm, r_1 and r_2 are random numbers uniformly distributed in the interval [0, 1], P^b_{id} is the individual (personal) best of the particle, P^{gb}_d is the global best of the whole swarm, and X_{id} is the current position (solution) of the particle. The update formulas of the parameters w, c_1 and c_2 are given by equations (6)-(7):
w = 1/(2 ln 2);   (6)
c_1 = c_2 = 0.5 + ln 2;   (7)
The three CUDA kernel functions defined in step 2 of this embodiment are used respectively to compute in parallel the velocity and position of the particles, the fitness value of each particle together with the best fitness value found by the particle itself and its corresponding solution, and the best fitness value found so far by the whole swarm and its corresponding solution.
The GPU parallel computation of this algorithm is implemented on the CUDA platform. Referring to Fig. 2, the CUDA parallel computing model is a SIMD (single-instruction multiple-data) parallel computation model in which the GPU, acting as a coprocessor, spawns a large number of threads and can help the CPU complete a large amount of highly parallel, simple computation work. CUDA adopts a multi-level memory architecture with three levels: thread (Thread), thread block (Block) and block grid (Grid). A Thread runs on an SP (Streaming Processor) and is the most basic execution unit; each Thread has private registers, and several Threads executing the same instructions form a Block. A Block runs on an SM (Streaming Multiprocessor); all Threads within a Block can communicate and share data through the Block's shared memory (Shared Memory) and can be synchronized, and several Blocks performing the same function form a Grid. A Grid runs on the SPA (Scalable Streaming Processor Array); Blocks within the same Grid need not communicate with each other, and execution between Grids is serial. That is, when the program is loaded and a Grid is placed on the GPU, all of its Blocks are assigned to the streaming multiprocessors serially. Therefore, the number of threads in each Block must be allocated reasonably in order to improve the degree of parallelism.
At run time, a Block is further divided into smaller thread warps (Warp). On an SM, the threads within each Block are grouped sequentially according to their unique IDs, and every 32 adjacent threads form a Warp. All threads are parallel logically, but from the hardware point of view not all threads execute at the same moment: the Warp is the basic unit of scheduling and execution on an SM. The notion of a Warp is not exposed in the CUDA programming model; it is determined by the GPU hardware, yet it has a large influence on performance. Threads in the same Warp can be regarded as executing "simultaneously"; they can communicate through shared memory without explicit synchronization, which further saves the time consumed by calling __syncthreads() to synchronize threads. In summary, by increasing the number of threads used per Block and by allocating the threads of a Block in units of Warps, higher performance can be obtained.
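Under this warp-level allocation, each thread can derive the particle and the dimension it is responsible for directly from its indices. The short sketch below is my illustration of that mapping (variable names are not from the patent); ThreadNum and ParticleNum are the quantities defined by equations (9)-(10).
// Illustrative thread-to-(particle, dimension) mapping, written inside a kernel,
// for one Block that holds ParticleNum particles of ThreadNum = WarpNum * WarpSize threads each.
int localParticle = threadIdx.x / ThreadNum;                  // particle slot inside this Block
int dim           = threadIdx.x % ThreadNum;                  // dimension handled by this thread
int particle      = blockIdx.x * ParticleNum + localParticle; // global particle index
bool active       = (dim < D) && (particle < N);              // padding threads do no work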
Step 2 is implemented through the following sub-steps.
Step 2.1: the kernel that computes the velocity and position of the particles: according to the allocated Block size BlockNum and Grid size GridNum, each GPU thread computes the velocity and position of its corresponding problem dimension using the update formulas of the particle swarm algorithm. The kernel function is declared as follows:
__global__ void ParticleFly_VP_kernel(float* Particle_X, float* Particle_V, int* GBestIndex, float* Particle_XBest, float* Particle_FitBest, curandState* s)
The parameters of the function represent, in order: the position array of all particles (length = number of particles * dimension), the velocity array of all particles (length = number of particles * dimension), the index of the best particle, the best fitness value, and the array of the solution corresponding to the best fitness (length = dimension of the problem function), followed by the cuRAND state array s.
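The patent gives only the declaration of ParticleFly_VP_kernel above. As a rough, simplified sketch of what the body of such a kernel could look like under the warp-level mapping, equations (4)-(5) can be implemented as shown below; the parameter names, the personal-best and global-best arrays, and the one-cuRAND-state-per-thread layout are my assumptions, not the patent's exact interface.
#include <curand_kernel.h>
// Simplified velocity/position update implementing equations (4)-(5).
// Layout assumption: X, V and PBest are N*D arrays (row = particle), GBest is the
// D-dimensional global-best position, 'state' holds one curandState per launched thread.
__global__ void updateVelocityPosition(float *X, float *V,
                                       const float *PBest, const float *GBest,
                                       curandState *state,
                                       int N, int D, int ThreadNum, int ParticleNum,
                                       float w, float c1, float c2)
{
    int localParticle = threadIdx.x / ThreadNum;
    int dim           = threadIdx.x % ThreadNum;
    int particle      = blockIdx.x * ParticleNum + localParticle;
    if (particle >= N || dim >= D) return;                 // padding threads exit

    int idx = particle * D + dim;                          // this thread's (particle, dimension) slot
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float r1 = curand_uniform(&state[tid]);                // r1, r2 ~ U[0, 1]
    float r2 = curand_uniform(&state[tid]);

    float v = w * V[idx]
            + c1 * r1 * (PBest[idx] - X[idx])              // cognitive term of (4)
            + c2 * r2 * (GBest[dim] - X[idx]);             // social term of (4)
    V[idx] = v;
    X[idx] = X[idx] + v;                                   // (5), using the newly updated velocity
}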
Step 2.2: the kernel that computes the fitness value of the next generation of particles and the best fitness value found by each particle itself together with its corresponding solution: the fitness value of every dimension of every particle is computed in parallel according to the allocated BlockNum and GridNum, the per-dimension fitness values are then combined by a parallel reduction algorithm to obtain the fitness value of each particle, and finally the particle's best fitness value and its corresponding solution are updated according to the fitness value obtained. The kernel function is declared as follows:
__global__ void ParticleFly_Fit_kernel(float* Particle_X, float* Particle_XBest, float* Particle_Fit, float* Particle_FitBest)
The parameters of the function represent, in order: the position array of all particles (length = number of particles * dimension), the array of the solution corresponding to the best fitness (length = dimension of the problem function), the fitness array of all particles (length = number of particles), and the best fitness value.
It should be noted that this kernel uses a parallel reduction to obtain the fitness values, so shared memory must be allocated for it; the amount of shared memory to allocate is Block_Size * sizeof(float), where Block_Size is the number of threads in each Block.
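A minimal sketch of the per-particle reduction described above follows. It is not the patent's exact ParticleFly_Fit_kernel (which also updates the personal best); it only shows how the per-dimension terms can be summed inside the dynamically allocated shared memory of Block_Size * sizeof(float), with the Sphere term of equation (1) used as an example, and the kernel and array names are assumptions.
// Sketch of step 2.2's reduction: each particle sums its per-dimension terms inside
// its own ThreadNum-wide segment of shared memory.
// Launch with BlockNum * sizeof(float) dynamic shared bytes.
__global__ void particleFitnessReduce(const float *X, float *Fit,
                                      int N, int D, int ThreadNum, int ParticleNum)
{
    extern __shared__ float sdata[];                       // Block_Size floats
    int localParticle = threadIdx.x / ThreadNum;
    int dim           = threadIdx.x % ThreadNum;
    int particle      = blockIdx.x * ParticleNum + localParticle;

    float term = 0.0f;                                     // padding threads contribute 0
    if (particle < N && dim < D) {
        float xd = X[particle * D + dim];
        term = xd * xd;                                    // Sphere term of (1) as an example
    }
    sdata[threadIdx.x] = term;
    __syncthreads();

    // Tree reduction over this particle's segment; handles ThreadNum values that
    // are not powers of two. Every thread of the Block reaches every barrier.
    int base = localParticle * ThreadNum;
    for (int s = ThreadNum; s > 1; ) {
        int half = (s + 1) / 2;
        if (dim + half < s)
            sdata[base + dim] += sdata[base + dim + half];
        __syncthreads();
        s = half;
    }
    if (dim == 0 && particle < N)
        Fit[particle] = sdata[base];                       // fitness of this particle
}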
Step 2.3: the kernel step that computes the best fitness value found so far by the whole swarm and its corresponding solution: the cublasI<t>amin() function of CUBLAS (where t is the data type of the operands) is used to obtain, on the GPU, the best fitness value found so far by the whole swarm and its corresponding solution;
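For step 2.3, a possible host-side call is sketched below (the array and function names are illustrative assumptions). cublasIsamin() returns the 1-based index of the element with the smallest absolute value, which here coincides with the smallest fitness because the benchmark fitness values of equations (1)-(3) are non-negative.
#include <cublas_v2.h>
// d_fitBest: device array of length N holding each particle's best fitness.
// Returns the 0-based index of the particle whose fitness is currently best.
int findGlobalBestIndex(cublasHandle_t handle, const float *d_fitBest, int N)
{
    int best1 = 0;                               // CUBLAS uses 1-based indexing
    cublasIsamin(handle, N, d_fitBest, 1, &best1);
    return best1 - 1;
}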
The schematic diagram of step 3 is shown in Fig. 3: BlockNum and GridNum of each kernel function are computed and initialized according to the adaptive thread-warp algorithm.
This is implemented through the following sub-steps.
Step 3.1: based on the characteristics of the CUDA computation model, the dimensions of each particle are divided into one or more Warps, and a Block is then used to contain these Warps, so that one Block corresponds to one or more particles; this increases the number of threads used per Block and achieves Warp-level parallelism.
Step 3.2: the number of Warps corresponding to a particle and the number of particles corresponding to a Block are both adjusted adaptively according to the particle dimension. The specific adaptive process follows equations (8)-(10):
WarpNum = DivUp(D, WarpSize)   (8)
ThreadNum = WarpNum * WarpSize   (9)
ParticleNum = DivDown(BlockSize, ThreadNum)   (10)
Step 3.3: D is the dimension of the problem to be solved and WarpSize is the size of one Warp in the CUDA architecture. The DivUp function divides D by WarpSize and rounds the quotient up, giving the number of Warps WarpNum corresponding to a particle. ThreadNum is the total number of threads actually used by each particle. BlockSize is the size of one Block in the CUDA architecture. The DivDown function divides BlockSize by ThreadNum and rounds the quotient down, giving the number of particles ParticleNum corresponding to a Block.
Step 3.4: before the kernel functions are called, the Block size BlockNum and the Grid size GridNum of CUDA must first be determined. To embody the characteristics of the adaptive thread-warp method, they are given by equations (11)-(12):
BlockNum = ThreadNum * ParticleNum   (11)
GridNum = DivUp(N, ParticleNum)   (12)
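Putting equations (8)-(12) together, a possible launch sequence is sketched below, reusing the illustrative helpers and kernels from the earlier sketches; the device arrays (d_X, d_V, ...) are assumed to be allocated elsewhere, and the shared-memory argument is only needed by the fitness/reduction kernel of step 2.2.
// Illustrative launch configuration derived from equations (8)-(12).
int WarpNum, ThreadNum, ParticleNum;
adaptiveWarpSizing(D, BlockSize, 32, WarpNum, ThreadNum, ParticleNum);   // (8)-(10)
int BlockNum = ThreadNum * ParticleNum;                                  // (11) threads per Block
int GridNum  = DivUp(N, ParticleNum);                                    // (12) Blocks per Grid

updateVelocityPosition<<<GridNum, BlockNum>>>(d_X, d_V, d_PBest, d_GBest, d_states,
                                              N, D, ThreadNum, ParticleNum, w, c1, c2);
particleFitnessReduce<<<GridNum, BlockNum, BlockNum * sizeof(float)>>>(d_X, d_Fit,
                                              N, D, ThreadNum, ParticleNum);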
This algorithm optimizes the existing GPU-based computation method by adjusting the parallel architecture so that the parallel efficiency becomes higher: an improved CUDA parallel architecture is designed and executed on a graphics processor (GPU). By mapping one thread to one dimension, one or more Warps to one particle, and one or more particles to one Block, the degree of parallelism of the particle swarm algorithm on a single host is further improved, and the speed-up ratio over the CPU is improved by a factor of more than 40 compared with the two preceding methods.
It can be seen from the above that this embodiment has the following advantages:
(1) The method provided by the invention can significantly shorten the time required by the PSO algorithm to solve a problem and improve the response speed of related application software; (2) with the method provided by the invention, a low-end CPU can be chosen as the host while a mid- or high-end GPU performs the computation, reaching the performance of multiple CPUs or even a cluster, thereby reducing power consumption and saving hardware cost.
The method described in this embodiment can be used in fields such as automatic path finding in games and image processing.
The specific embodiments described herein are merely illustrative of the spirit of the present invention. Those skilled in the art to which the invention belongs can make various modifications, supplements or substitutions to the described embodiments in a similar manner without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (7)

1. A GPU parallel particle swarm optimization method based on adaptive thread warps, characterized in that
the dimensions of each particle are divided into several thread warps, and a thread block is used to contain these warps, so that one thread block corresponds to one or more particles;
wherein the thread warp is the basic unit of scheduling and execution on a streaming multiprocessor (SM).
2. The GPU parallel particle swarm optimization method based on adaptive thread warps according to claim 1, characterized in that the number of thread warps WarpNum corresponding to a particle and the number of particles ParticleNum corresponding to a thread block are adjusted based on the following equations:
WarpNum = DivUp(D, WarpSize)   (8)
ThreadNum = WarpNum * WarpSize   (9)
ParticleNum = DivDown(BlockSize, ThreadNum)   (10)
where D is the dimension of the problem to be solved and WarpSize is the size of one thread warp in the CUDA architecture; the DivUp function divides D by WarpSize and rounds the quotient up, giving the number of warps WarpNum corresponding to a particle; ThreadNum is the total number of threads actually used by each particle; BlockSize is the size of one Block in the CUDA architecture; the DivDown function divides BlockSize by ThreadNum and rounds the quotient down, giving the number of particles ParticleNum corresponding to a Block.
3. The GPU parallel particle swarm optimization method based on adaptive thread warps according to claim 1, characterized in that, before the kernel functions are called,
the thread-block size BlockNum and the grid size GridNum of each kernel function are computed and initialized, based on the adaptive thread-warp algorithm, using the following equations:
BlockNum = ThreadNum * ParticleNum;
GridNum = DivUp(N, ParticleNum);
where ThreadNum is the total number of threads actually used by each particle, ParticleNum is the number of particles corresponding to a thread block, and N is the total number of particles in the swarm.
4. The GPU parallel particle swarm optimization method based on adaptive thread warps according to claim 1, characterized in that
three CUDA kernel functions are defined, used respectively to compute in parallel the velocity and position of the particles, the fitness value of each particle together with the best fitness value found so far by the particle itself and its corresponding solution, and the best fitness value found so far by the whole swarm and its corresponding solution.
5. The GPU parallel particle swarm optimization method based on adaptive thread warps according to claim 4, characterized in that it specifically includes the following steps:
Step 2.1: the kernel that computes the velocity and position of the particles: each GPU thread, according to the allocated thread-block size BlockNum and grid size GridNum, computes the velocity and position of its corresponding problem dimension using the update formulas of the particle swarm algorithm;
Step 2.2: the kernel that computes the fitness value of the next generation of particles and the best fitness value found by each particle itself together with its corresponding solution: the fitness value of every dimension of every particle is computed in parallel according to the allocated BlockNum and GridNum, the per-dimension fitness values are then combined by a parallel reduction algorithm to obtain the fitness value of each particle, and finally the particle's best fitness value and its corresponding solution are updated according to the fitness value obtained;
Step 2.3: the kernel that computes the best fitness value found so far by the whole swarm and its corresponding solution: the cublasI<t>amin() function of CUBLAS (where t is the data type of the operands) is used to obtain, on the GPU, the best fitness value found so far by the whole swarm and its corresponding solution.
6. The GPU parallel particle swarm optimization method based on adaptive thread warps according to claim 1, characterized in that the problem functions are initialized based on the following equations:
f_{Sphere}(x) = \sum_{d=1}^{D} x_d^2,  x_d \in [-100, 100];   (1)
f_{Rastrigrin}(x) = \sum_{d=1}^{D} [x_d^2 - 10\cos(2\pi x_d) + 10],  x_d \in [-5.12, 5.12];   (2)
f_{Rosenbrock}(x) = \sum_{d=1}^{D-1} [100(x_{d+1} - x_d^2)^2 + (x_d - 1)^2],  x_d \in [-10, 10];   (3)
where f_{Sphere} is the formula of the problem function Sphere, f_{Rastrigrin} is the formula of the problem function Rastrigrin, f_{Rosenbrock} is the formula of the problem function Rosenbrock, x is the problem-function variable and D is the dimension of the problem function.
7. The GPU parallel particle swarm optimization method based on adaptive thread warps according to claim 6, characterized in that the particle swarm is updated based on the following equations:
V_{id}(t+1) = w V_{id}(t) + c_1 r_1 (P^b_{id}(t) - X_{id}(t)) + c_2 r_2 (P^{gb}_d(t) - X_{id}(t));   (4)
X_{id}(t+1) = X_{id}(t) + V_{id}(t);   (5)
where V_{id} is the velocity of each particle, t is the current iteration number, w is the inertia weight coefficient of the swarm, c_1 and c_2 are the acceleration factors of the swarm, r_1 and r_2 are random numbers uniformly distributed in the interval [0, 1], P^b_{id} is the individual (personal) best of the particle, P^{gb}_d is the global best of the whole swarm, and X_{id} is the current position (solution) of the particle;
the particle swarm parameters w, c_1 and c_2 are updated based on the following equations:
w = 1/(2 ln 2);   (6)
c_1 = c_2 = 0.5 + ln 2;   (7).
CN201610976893.2A 2016-10-28 2016-10-28 A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam Active CN106502632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610976893.2A CN106502632B (en) 2016-10-28 2016-10-28 A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610976893.2A CN106502632B (en) 2016-10-28 2016-10-28 A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam

Publications (2)

Publication Number Publication Date
CN106502632A true CN106502632A (en) 2017-03-15
CN106502632B CN106502632B (en) 2019-01-18

Family

ID=58323248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610976893.2A Active CN106502632B (en) 2016-10-28 2016-10-28 A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam

Country Status (1)

Country Link
CN (1) CN106502632B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578090A (en) * 2017-08-31 2018-01-12 上海爱优威软件开发有限公司 A kind of FPA realization method and systems based on CUDA platforms
CN108959810A (en) * 2018-07-24 2018-12-07 东北大学 A kind of Fast Identification Method, device and the continuous casting installation for casting of slab heat transfer parameter
CN109634830A (en) * 2018-12-19 2019-04-16 哈尔滨工业大学 A kind of CUDA program integration performance prediction method based on multiple features coupling
CN109741796A (en) * 2019-01-07 2019-05-10 厦门大学 A kind of Parallel Particle Swarm Optimization alloy nano particle structural optimization method and system
CN109992385A (en) * 2019-03-19 2019-07-09 四川大学 A kind of inside GPU energy consumption optimization method of task based access control balance dispatching
CN114138449A (en) * 2021-12-14 2022-03-04 河南省儿童医院郑州儿童医院 Rehabilitation training system based on virtual reality
CN114880082A (en) * 2022-03-21 2022-08-09 西安电子科技大学 Multithreading beam warp dynamic scheduling system and method based on sampling state

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604418A (en) * 2009-06-29 2009-12-16 浙江工业大学 Chemical enterprise intelligent production plan control system based on quanta particle swarm optimization
CN101819651A (en) * 2010-04-16 2010-09-01 浙江大学 Method for parallel execution of particle swarm optimization algorithm on multiple computers
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
CN104680235A (en) * 2015-03-03 2015-06-03 江苏科技大学 Design method of resonance frequency of circular microstrip antenna
US20150201910A1 (en) * 2014-01-17 2015-07-23 Centre For Imaging Technology Commercialization (Cimtec) 2d-3d rigid registration method to compensate for organ motion during an interventional procedure
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604418A (en) * 2009-06-29 2009-12-16 浙江工业大学 Chemical enterprise intelligent production plan control system based on quanta particle swarm optimization
CN101819651A (en) * 2010-04-16 2010-09-01 浙江大学 Method for parallel execution of particle swarm optimization algorithm on multiple computers
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
US20150201910A1 (en) * 2014-01-17 2015-07-23 Centre For Imaging Technology Commercialization (Cimtec) 2d-3d rigid registration method to compensate for organ motion during an interventional procedure
CN104680235A (en) * 2015-03-03 2015-06-03 江苏科技大学 Design method of resonance frequency of circular microstrip antenna
CN105718998A (en) * 2016-01-21 2016-06-29 上海斐讯数据通信技术有限公司 Particle swarm optimization method based on mobile terminal GPU operation and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈风等: "基于CUDA的并行粒子群优化算法研究及实现" [Chen Feng et al., "Research and implementation of a CUDA-based parallel particle swarm optimization algorithm"], 《计算机科学》 [Computer Science] *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578090A (en) * 2017-08-31 2018-01-12 上海爱优威软件开发有限公司 A kind of FPA realization method and systems based on CUDA platforms
CN108959810A (en) * 2018-07-24 2018-12-07 东北大学 A kind of Fast Identification Method, device and the continuous casting installation for casting of slab heat transfer parameter
CN108959810B (en) * 2018-07-24 2020-11-03 东北大学 Method and device for rapidly identifying heat transfer parameters of casting blank and continuous casting equipment
CN109634830A (en) * 2018-12-19 2019-04-16 哈尔滨工业大学 A kind of CUDA program integration performance prediction method based on multiple features coupling
CN109634830B (en) * 2018-12-19 2022-06-07 哈尔滨工业大学 CUDA program integration performance prediction method based on multi-feature coupling
CN109741796A (en) * 2019-01-07 2019-05-10 厦门大学 A kind of Parallel Particle Swarm Optimization alloy nano particle structural optimization method and system
CN109741796B (en) * 2019-01-07 2020-06-30 厦门大学 Parallel particle swarm alloy nanoparticle structure optimization method and system
CN109992385A (en) * 2019-03-19 2019-07-09 四川大学 A kind of inside GPU energy consumption optimization method of task based access control balance dispatching
CN109992385B (en) * 2019-03-19 2021-05-14 四川大学 GPU internal energy consumption optimization method based on task balance scheduling
CN114138449A (en) * 2021-12-14 2022-03-04 河南省儿童医院郑州儿童医院 Rehabilitation training system based on virtual reality
CN114880082A (en) * 2022-03-21 2022-08-09 西安电子科技大学 Multithreading beam warp dynamic scheduling system and method based on sampling state
CN114880082B (en) * 2022-03-21 2024-06-04 西安电子科技大学 Multithreading beam warp dynamic scheduling system and method based on sampling state

Also Published As

Publication number Publication date
CN106502632B (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN106502632A (en) A kind of GPU parallel particle swarm optimization methods based on self-adaptive thread beam
Klöckner et al. Nodal discontinuous Galerkin methods on graphics processors
Nageswaran et al. A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors
Yudanov et al. GPU-based simulation of spiking neural networks with real-time performance & high accuracy
CN106650925A (en) Deep learning framework Caffe system and algorithm based on MIC cluster
CN108564213A (en) Parallel reservoir group flood control optimal scheduling method based on GPU acceleration
CN114490011B (en) Parallel acceleration realization method of N-body simulation in heterogeneous architecture
Jeon et al. Parallel exact inference on a CPU-GPGPU heterogenous system
CN108984483A (en) The electric system sparse matrix method for solving and system reset based on DAG and matrix
CN105183562B (en) A method of rasterizing data are carried out based on CUDA technologies to take out rank
CN109165734A (en) Matrix local response normalization vectorization implementation method
Nagaoka et al. Multi-GPU accelerated three-dimensional FDTD method for electromagnetic simulation
Kohek et al. Interactive synthesis of self-organizing tree models on the GPU
Zhang et al. Ftsgd: An adaptive stochastic gradient descent algorithm for spark mllib
Pratas et al. Accelerating the computation of induced dipoles for molecular mechanics with dataflow engines
Akyol et al. Multi-machine earliness and tardiness scheduling problem: an interconnected neural network approach
Topa Cellular automata model tuned for efficient computation on GPU with global memory cache
Rajeswar et al. Scaling up the training of deep CNNs for human action recognition
Nazarifard et al. Efficient implementation of the Bellman-Ford algorithm on GPU
Kuźnik et al. Graph grammar-based multi-frontal parallel direct solver for two-dimensional isogeometric analysis
Ohmura et al. Multi-gpu acceleration of optical flow computation in visual functional simulation
Ward et al. Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster
Jain et al. Value Iteration on Multicore Processors
CN108985622A (en) A kind of electric system sparse matrix Parallel implementation method and system based on DAG
Kumar et al. Efficient training of convolutional neural nets on large distributed systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210406

Address after: No.8, Huyue East Road, Longchi street, Liuhe District, Nanjing City, Jiangsu Province

Patentee after: Nanjing Beidou innovation and Application Technology Research Institute Co.,Ltd.

Address before: 430072 Hubei Province, Wuhan city Wuchang District of Wuhan University Luojiashan

Patentee before: WUHAN University