CN113239620A - Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration - Google Patents

Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration

Info

Publication number
CN113239620A
Authority
CN
China
Prior art keywords
particle
gpu
particle swarm
state
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110509992.0A
Other languages
Chinese (zh)
Other versions
CN113239620B (en)
Inventor
康恒一
闫自海
王紫娟
刘世明
甘鹏路
严佳佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PowerChina Huadong Engineering Corp Ltd
Original Assignee
PowerChina Huadong Engineering Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PowerChina Huadong Engineering Corp Ltd
Priority to CN202110509992.0A
Publication of CN113239620A
Application granted
Publication of CN113239620B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/10 Numerical modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an improved particle swarm method, based on GPU acceleration, for identifying the constitutive model parameters of geotechnical materials. Because the identification process is computationally expensive, the method addresses computational efficiency from two directions: algorithm optimization and the use of high-performance hardware. On the one hand, an early termination mechanism is introduced that exploits the cumulative nature of the error function, which saves computation; on the other hand, a GPU is used as the computing device, and the program structure is optimized for the characteristics of its instruction set. The beneficial effect of the invention is an improvement in computational efficiency of more than an order of magnitude.

Description

Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration
Technical Field
The invention relates to a method for identifying the parameters of geotechnical constitutive models and belongs to the field of laboratory geotechnical testing and material mechanics. It provides an auxiliary method for calibrating the parameters of a mechanical constitutive model from laboratory triaxial test data, and in particular a calculation method that improves computational efficiency by adding an early termination mechanism to the error function accumulation of the conventional particle swarm optimization method and by parallelizing the calculation on a graphics processing unit (GPU).
Background
In geotechnical numerical simulation, a suitable constitutive model and suitable parameter values must be selected to reflect the nonlinear stress-strain behaviour of the material at the scale of a material element, so that reasonably accurate results can be obtained at the macroscopic scale. As a prerequisite for numerical simulation, it is therefore very important to carry out laboratory triaxial tests and to calibrate the mechanical constitutive model with the test data. In general, parameter calibration for a simple constitutive model relies on the physical meaning of the model itself and on a geometric interpretation of the stress-strain curve. For example, in a von Mises model with only two parameters, elastic modulus and yield stress, the slope of the elastic portion of the stress-strain curve corresponds to the elastic modulus and the inflection point at yield corresponds to the yield stress. For more complex constitutive models, however, the geometric features of the stress-strain curve do not correspond one-to-one with the model parameters. Some advanced constitutive models, such as hypoplastic models, also introduce several parameters that have no clear physical meaning and serve only to achieve a smooth fit of the stress-strain curve. Such parameters can generally be obtained only by trial and error through repeated adjustment. When the number of model parameters is large, it is very difficult to adjust several such parameters by hand while making multiple sets of computed stress-strain curves agree with the test data. The calibration of the mechanical parameters of geotechnical materials must therefore be treated as an optimization problem.
Particle Swarm Optimization (PSO) is a bio-inspired heuristic method in the field of computational intelligence and belongs to the class of swarm intelligence optimization algorithms. It does not require gradient information of the objective function, is easy to implement, is broadly applicable, and has strong local and global search capabilities. It is widely used for optimization problems in many scientific and engineering fields, such as function optimization (CN201610282691.8), wind power prediction (CN201710415906.3) and customer classification (CN201911088225.6). In the present invention, calibration is viewed as an optimization problem: the material model parameters are the independent variables, the deviation between the geotechnical test data and the data computed by the constitutive model is the error function, and the particle swarm algorithm is used to find the material model parameters that minimize the error function.
When the constitutive model is complex, applying the particle swarm algorithm to parameter identification involves a large amount of computation. First, the optimization problem to be solved is generally non-convex. The more model parameters there are, the stronger the non-convexity, and the more particles are needed to find the global optimum and to avoid premature convergence to a local optimum. Second, stress integration of a complex constitutive model is itself computationally expensive, which makes the error function expensive to evaluate. The evolution of specimen stress and strain during the loading test is known. When computing the error function, the strain increment at each step is treated as a known quantity, stress integration is performed, and the stress state is updated so that it can be compared with the experimental stress state. The stress-strain curve is integrated step by step, and the squared errors between the calculated stress and the experimental stress at each step are summed to give the error function. Under the particle swarm mechanism, a complete stress-strain curve must be computed for every particle in every iteration, which is time-consuming. The computational efficiency of the method therefore needs to be improved.
The invention addresses computational efficiency from two directions: algorithm optimization and the use of high-performance hardware. As described above, the error function is computed by a monotonically increasing accumulation. Exploiting this property, an early termination mechanism is applied to particles whose accumulated error exceeds their own historical best during accumulation, which saves computation. In addition, the invention uses a graphics processing unit (GPU) for parallelization. Compared with a conventional central processing unit (CPU), a GPU has very high floating-point performance and data throughput and is well suited to problems with high arithmetic intensity. However, if the program is poorly structured, the single instruction, multiple data (SIMD) execution model of the GPU incurs severe branch divergence penalties; the invention adjusts the program structure on the GPU and optimizes performance to address this problem.
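The early termination argument can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `accumulate_with_early_stop` and the list of per-step squared errors are hypothetical, and the sketch runs on the CPU. Because every term is non-negative, a partial sum that already exceeds pbest cannot fall back below it, so the remaining steps need not be computed.

```python
def accumulate_with_early_stop(squared_errors, pbest):
    """Accumulate non-negative error terms, stopping once pbest is exceeded.

    Returns (accumulated value, steps actually computed, aborted flag).
    """
    total = 0.0
    for steps, term in enumerate(squared_errors, start=1):
        total += term
        if total > pbest:
            # The sum can only grow, so this particle cannot beat pbest.
            return total, steps, True
    return total, len(squared_errors), False


# Hypothetical per-step squared stress deviations for one particle.
terms = [0.5, 1.0, 2.0, 0.25, 0.25]
print(accumulate_with_early_stop(terms, pbest=1.0))   # (1.5, 2, True)
print(accumulate_with_early_stop(terms, pbest=10.0))  # (4.0, 5, False)
```

In the first call only two of the five steps are evaluated, which is the saving the mechanism aims for.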
Disclosure of Invention
The invention aims to provide an improved particle swarm method for identifying parameters of a geotechnical material constitutive model based on GPU acceleration, which solves the problem of computational efficiency from two aspects of algorithm optimization and high-performance hardware application, and adopts the following technical scheme:
Using the strain in the experimental data $\{\varepsilon_{exp,k}\}$ as a known quantity, stress integration is performed (denoted by the Update function), the stress state $\{\sigma_{cal,k}\}$ is updated step by step, and the squared deviation between the calculated stress $\sigma_{cal}$ and the experimental stress $\sigma_{exp}$ is accumulated to compute the error function. The accumulation of the error function $E$ is defined as follows:
calculating the strain increment: $\Delta\varepsilon_{k+1} = \varepsilon_{exp,k+1} - \varepsilon_{exp,k}$ (1)
updating the calculated value of the stress state: $\sigma_{cal,k+1} = \mathrm{Update}(\sigma_{cal,k}, \Delta\varepsilon_{k+1})$ (2)
accumulating the error function: $E_{k+1} = E_{k} + \lVert \sigma_{cal,k+1} - \sigma_{exp,k+1} \rVert^{2}$ (3)
With the population size of the particle swarm $N_{par}$ as the total number of threads, the GPU evaluates formulas (1) to (3) in single instruction, multiple data (SIMD) parallel at the granularity of a single stress integration; this is defined as the basic parallel step;
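The accumulation of formulas (1) to (3) can be sketched for a single particle as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the invention: `linear_update` is a hypothetical one-dimensional linear-elastic stand-in for the Update function, the data are made-up toy values, and all names are introduced for illustration only.

```python
def linear_update(sigma, d_eps, modulus):
    """Hypothetical stand-in for Update: 1-D linear elasticity."""
    return sigma + modulus * d_eps


def accumulate_error(eps_exp, sig_exp, modulus):
    """Sum of squared stress deviations along the measured strain path."""
    sigma = sig_exp[0]   # start from the measured initial stress
    error = 0.0
    for k in range(len(eps_exp) - 1):
        d_eps = eps_exp[k + 1] - eps_exp[k]            # formula (1)
        sigma = linear_update(sigma, d_eps, modulus)   # formula (2)
        error += (sigma - sig_exp[k + 1]) ** 2         # formula (3)
    return error


# Toy "experimental" data generated with modulus = 2.0 (unitless).
eps_exp = [0.0, 0.5, 1.0, 1.5]
sig_exp = [0.0, 1.0, 2.0, 3.0]
print(accumulate_error(eps_exp, sig_exp, 2.0))  # 0.0
print(accumulate_error(eps_exp, sig_exp, 3.0))  # 3.5
```

The candidate parameter that generated the data gives zero error, while a wrong modulus accumulates a positive error that grows with every step.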
based on the above definition of the error function calculation and the basic parallel steps, the specific execution steps of the calculation of the invention are as follows:
step S1: after a predetermined number $N_{trial}$ of basic parallel steps have been trial-computed, the running state of each particle is determined in SIMD parallel by checking whether the current error function value E exceeds the particle's historical best (pbest). The state logic is as follows:
if E ≤ pbest and the particle has not yet processed all data, the particle is in the accumulating state A0,
if E ≤ pbest and the particle has processed all data, the particle is in the normal termination state A1,
if E > pbest, the particle is in the aborted state A2;
step S2: for all particles in state A1, the error function value does not exceed the particle's own best pbest after all data have been accumulated, so pbest and the corresponding parameter values are updated in SIMD parallel;
step S3: serially traverse all particles in state A1, and if a particle's historical best pbest is better than the global best gbest of all particles, update gbest and its corresponding parameter values;
step S4: perform the SIMD-parallel particle swarm update on the particles in states A1 and A2, where id = 1, 2, 3, …, $N_{par}$; d = 1, 2, 3, …, D; t = 1, 2, 3, …. Here id is the index of the particle, t is the current iteration step, and D is the dimension of the particle; in this method, D equals the number of parameters of the constitutive model. ω is the inertia weight factor, which is non-negative: a larger value gives stronger global search and weaker local search, and a smaller value gives weaker global search and stronger local search, so the balance between global and local search can be tuned by adjusting ω. $c_1$ and $c_2$ are the acceleration constants: the former is the individual learning factor of each particle and the latter is its social learning factor. $c_1$ and $c_2$ are typically set to 2, but other values between 0 and 4 may be tried. $r_1$ and $r_2$ are random numbers in the interval [0, 1], which introduce randomness and increase the chance of finding the global optimum. In each iteration, the velocity v and position x of each particle are updated according to the particle's own best parameters $p_{id}$ and the swarm's global best parameters $p_{gd}$, using the following formulas:
updating the particle velocity: $v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 (p_{id} - x_{id}^{t}) + c_2 r_2 (p_{gd} - x_{id}^{t})$ (4)
updating the particle position: $x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$ (5)
step S5: for particles in states A1 and A2, perform SIMD-parallel state processing: the error function is reset to 0 and the state is reset to A0.
The check step (Check): completing steps S1 to S5 once is defined as one loop step, and the number of loop steps performed is denoted T. After every specified number of loop steps, it is checked whether the current global best parameter values meet the convergence requirement; if so, the calculation ends, otherwise it continues.
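The velocity and position update of step S4 can be sketched for a single particle in Python. This is a hedged illustration of the standard particle swarm update, not the GPU kernel of the invention: the function name `pso_update`, the default ω = 0.7 and the use of Python's `random` module are assumptions made for the example, while $c_1 = c_2 = 2$ follows the typical values named in the text.

```python
import random


def pso_update(x, v, p_best, g_best, omega=0.7, c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) of one particle.

    x, v      : current position and velocity, one entry per model parameter
    p_best    : the particle's own best position
    g_best    : the swarm's global best position
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()      # random numbers in [0, 1)
        vd = (omega * v[d]
              + c1 * r1 * (p_best[d] - x[d])     # pull toward own best
              + c2 * r2 * (g_best[d] - x[d]))    # pull toward global best
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v


rng = random.Random(0)   # seeded so that the run is repeatable
x, v = pso_update([1.0, 2.0], [0.0, 0.0], [1.5, 1.0], [2.0, 0.5], rng=rng)
print(x, v)
```

When a particle already sits on both its own best and the global best, the two attraction terms vanish and only the inertia term ω·v remains.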
Compared with the prior art, the invention has the following advantages and beneficial effects:
In terms of the algorithm, the improvement is that, during error function evaluation, accumulation is terminated early for particles whose accumulated error exceeds their own historical best, which eliminates unnecessary stress integration steps and saves computation time. Indeed, if a particle's error function value is worse than its historical best, that value plays no part in the subsequent particle swarm update, so its exact value need not be known. A conventional particle swarm algorithm always evaluates the error function exactly, and its computational efficiency is correspondingly lower.
In terms of high-performance hardware, the improvement is that the GPU performs SIMD parallelization at the fine-grained level of a single stress integration, and the running state of the particles is evaluated and handled promptly during the calculation so that particles return to the accumulating state. All GPU threads can then execute an identical instruction stream, which reduces idle waiting in SIMD execution and improves GPU utilization. Conventional parallel particle swarm implementations usually parallelize at the granularity of a whole particle, and a logic branch in the error function evaluation then causes severe idle waiting.
Drawings
FIG. 1 is a flow chart of a calculation;
FIG. 2 shows the relationship between the proportion of stress integrations actually performed and $N_{trial}$;
FIG. 3 shows the number of integrations performed per second.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and a specific embodiment. To better illustrate the efficiency gain provided by the invention, the acceleration is demonstrated on the von Wolffersdorff hypoplastic model. The GPU terminology in this description follows OpenCL, but the algorithm can equally be implemented on the CUDA platform. It should be understood that the embodiments described herein are merely illustrative and are not intended to limit the invention.
The memory layout of the algorithm is as follows:
The main variables allocated in the global memory (Global Memory) of the GPU are: the running state of the particles (runstat[N_par]), the position array of the particles (pos[N_par*D]), the velocity array of the particles (vel[N_par*D]), the current error function values of the particles (curobj[N_par]), the current stress state $\sigma_{1,cal}$ to $\sigma_{3,cal}$ of the particles (runSig1[N_par], runSig2[N_par], runSig3[N_par]), the current void ratio of the particles (runvoidRt[N_par]), the historical best error function values of the particles (pbest[N_par]), the historical best positions of the particles (bestpos[N_par*D]), the global best position (gbpos[D]) and the global best error function value (gbest). In addition, variables for the experimental data are stored in global memory. With the number of experimental integration steps denoted $N_{data}$, these are the experimental stresses $\sigma_{1,exp}$ to $\sigma_{3,exp}$ (datSig1[N_data], datSig2[N_data], datSig3[N_data]) and the experimental strains $\varepsilon_{1,exp}$ to $\varepsilon_{3,exp}$ (datEps1[N_data], datEps2[N_data], datEps3[N_data]).
Array variables should be stored in column-major order. Taking pos[N_par*D] as an example, the m-th parameter of particle id is stored at pos[id + N_par*m]. During the calculation, adjacent threads then access contiguous addresses, so memory accesses are coalesced and access bandwidth improves.
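The addressing scheme can be checked with a small Python sketch. It assumes the same layout that kernel K2 uses for bestpos, namely an offset of N_par per parameter (bestpos[tid + m*N_par]); the helper name `flat_index` and the toy sizes are hypothetical.

```python
def flat_index(particle_id, m, n_par):
    """Flat offset of parameter m of a particle in a column-major array."""
    return particle_id + n_par * m


n_par, dim = 4, 3
pos = [0.0] * (n_par * dim)
for pid in range(n_par):
    for m in range(dim):
        pos[flat_index(pid, m, n_par)] = 10 * pid + m   # tag each entry

# Addresses touched when threads 0..3 all read parameter m = 1.
addresses = [flat_index(pid, 1, n_par) for pid in range(n_par)]
print(addresses)  # [4, 5, 6, 7]
```

Neighbouring threads touch neighbouring addresses, which is the access pattern that allows the hardware to coalesce the reads.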
The general particle swarm parameters ω, $c_1$ and $c_2$ do not change during the calculation and occupy little space, so they should be placed in constant memory (Constant Memory). The hardware caches constant memory, which increases access bandwidth for repeated reads.
The core of the algorithm, steps S1 to S5, should be deployed on the GPU in the form of kernel functions. According to the functional and parallelization requirements of S1 to S5, four GPU kernels K1 to K4 are defined, with functions and parallelization as follows:
K1: this kernel corresponds to step S1 described in the Disclosure of Invention; the total number of threads equals the particle swarm population $N_{par}$. Each thread tid reads the parameter values of its particle from the pos array in parallel and enters a for loop that performs $N_{trial}$ stress integrations and error function accumulations. In each step k of the loop, the strain and stress are read from the datEps and datSig arrays and, following formulas (1) to (3), the strain increment (datEps1[k+1]-datEps1[k], datEps2[k+1]-datEps2[k], datEps3[k+1]-datEps3[k]) is computed, the stress of thread tid (runSig1[tid], runSig2[tid], runSig3[tid]) is updated, and the squared difference between the calculated and experimental stress, (runSig1[tid]-datSig1[k+1])^2 + (runSig2[tid]-datSig2[k+1])^2 + (runSig3[tid]-datSig3[k+1])^2, is added to the thread's error function curobj[tid]. Then curobj[tid] is compared with pbest[tid], and k is checked against $N_{data}$. If curobj[tid] ≤ pbest[tid] and k < $N_{data}$, runstat[tid] is set to 0 and the loop continues; if curobj[tid] ≤ pbest[tid] and k = $N_{data}$, runstat[tid] is set to 1 and the loop exits; if curobj[tid] > pbest[tid], runstat[tid] is set to 2 and the loop exits.
K2: this kernel corresponds to step S2 described in the Disclosure of Invention; the total number of threads equals the particle swarm population $N_{par}$. Each thread tid reads its particle state from the runstat array in parallel. If runstat[tid] = 1, it sets pbest[tid] = curobj[tid] and updates the particle's historical best parameters in bestpos from the current position in pos: bestpos[tid] = pos[tid], bestpos[tid+N_par] = pos[tid+N_par], …, bestpos[tid+(D-1)*N_par] = pos[tid+(D-1)*N_par].
K3: this kernel corresponds to step S3 described in the Disclosure of Invention; the number of threads equals 1, i.e. this is a serial operation. All particles are traversed serially. With the currently processed particle denoted i, pbest[i] is checked: if it is less than gbest, the global best error function value gbest is updated to pbest[i] and the global best parameters are updated as gbpos[0] = bestpos[i], gbpos[1] = bestpos[i+N_par], …, gbpos[D-1] = bestpos[i+(D-1)*N_par].
K4: this kernel corresponds to steps S4 and S5 described in the Disclosure of Invention; the total number of threads equals the particle swarm population $N_{par}$. Each thread tid reads its particle state from the runstat array in parallel and, for particles with runstat[tid] equal to 1 or 2, updates the velocity vel and position pos according to formulas (4) and (5). At the same time, curobj[tid] is set to 0 and, as required by step S5, the state is reset to A0, ready for a new round of accumulation.
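The per-thread behaviour of kernel K1 can be emulated on the CPU with the following Python sketch. It is a simplified illustration, not the OpenCL kernel: the stress state is a single scalar, `toy_update` is a hypothetical linear-elastic stand-in for the stress integration, the `state` dictionary stands in for the per-thread entries of runSig, curobj and the step counter, and the return values 0, 1 and 2 are assumed to encode the states A0, A1 and A2 of step S1.

```python
def k1_thread(update, params, eps, sig, state, pbest, n_trial):
    """Run at most n_trial integration steps for one particle.

    state holds 'k' (next step), 'sigma' (current stress) and 'error'
    (accumulated error) and is updated in place so a later call resumes.
    """
    n_steps = len(eps) - 1                 # integration steps available
    for _ in range(n_trial):
        k = state["k"]
        d_eps = eps[k + 1] - eps[k]                         # formula (1)
        state["sigma"] = update(state["sigma"], d_eps, params)  # formula (2)
        state["error"] += (state["sigma"] - sig[k + 1]) ** 2    # formula (3)
        state["k"] = k + 1
        if state["error"] > pbest:
            return 2                       # aborted: cannot beat pbest
        if state["k"] == n_steps:
            return 1                       # all data processed
    return 0                               # still accumulating


def toy_update(sigma, d_eps, modulus):
    """Hypothetical stand-in for the stress integration."""
    return sigma + modulus * d_eps


eps = [0.0, 0.5, 1.0, 1.5, 2.0]
sig = [0.0, 1.0, 2.0, 3.0, 4.0]            # toy data, modulus = 2.0

good = {"k": 0, "sigma": 0.0, "error": 0.0}
first = k1_thread(toy_update, 2.0, eps, sig, good, pbest=1.0, n_trial=2)
second = k1_thread(toy_update, 2.0, eps, sig, good, pbest=1.0, n_trial=2)
bad = {"k": 0, "sigma": 0.0, "error": 0.0}
aborted = k1_thread(toy_update, 3.0, eps, sig, bad, pbest=1.0, n_trial=4)
print(first, second, aborted)  # 0 1 2
```

The well-fitting particle needs two calls of two steps each to finish, while the poorly fitting particle is aborted after two of its four steps.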
The check step (Check) reads the values of gbest and gbpos back to the CPU using the clEnqueueReadBuffer function; the data size is only (1+D) floats. If gbpos has changed little between two successive checks, or if gbest is smaller than a control value, the calculation may be terminated.
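The host-side convergence test can be sketched in Python as follows. The function name `has_converged` and the two tolerances are hypothetical, since the text does not specify how "little change" or the control value are quantified; the sketch assumes the largest absolute change of any parameter is compared with a threshold.

```python
def has_converged(gbest, gbpos, prev_gbpos, error_tol, change_tol):
    """Return True if the run may stop.

    gbest      : global best error function value read back from the GPU
    gbpos      : global best parameters at this check
    prev_gbpos : global best parameters at the previous check, or None
    """
    if gbest < error_tol:                  # error below the control value
        return True
    if prev_gbpos is None:                 # first check: nothing to compare
        return False
    max_change = max(abs(a - b) for a, b in zip(gbpos, prev_gbpos))
    return max_change < change_tol         # parameters have stopped moving


print(has_converged(1.0, [1.0, 2.0], [1.0, 2.5], 1e-6, 1e-3))  # False
print(has_converged(1e-9, [1.0, 2.0], None, 1e-6, 1e-3))       # True
```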
Before the above core calculation step and check step are performed, an initialization operation I is required. The initialization operation includes the following:
GPU platform setup I1: the basic OpenCL objects such as Platform, Device, Context and Program are set up for the GPU, the kernel function arguments are set, and clCreateBuffer is then used to allocate the address space used for calculation on the GPU.
Particle swarm initial state setup I2: the initial positions of the particle swarm are initialized on the CPU and copied to the pos array on the GPU using clEnqueueWriteBuffer. The initial positions are chosen randomly within a reasonable range of the constitutive parameters.
Import of experimental data I3: the experimental stress and strain data are read on the CPU and copied to the datSig and datEps arrays on the GPU using clEnqueueWriteBuffer.
Combining the above information, the flow chart can be summarized as fig. 1, which includes the flow of the running program and the flow of the core calculation step.
The invention is applied to parameter inversion of the von Wolffersdorff model, which has 8 parameters in total: $\varphi_c$, $h_s$, $n$, $e_{d0}$, $e_{c0}$, $e_{i0}$, $\alpha$ and $\beta$. The stress integration algorithm calculates the stress increment from the strain increment and corresponds to Update in formula (2); it involves a large number of floating point operations. The relevant formulas are as follows:
[Formulas (6) to (16) appear as images in the original publication; formula (7) defines the fourth-order unit tensor and formula (12) defines the second-order unit tensor I.]
The experimental data used for the inversion comprised two stress paths (drained/undrained), three confining pressures (100 kPa/300 kPa/500 kPa) and three void ratios (0.8/0.9/1.0), for a total of 18 data sets. Each data set contained 400 data points, i.e. 399 stress integration steps, and each test was loaded to 30% axial strain, sufficient to capture the nonlinear response of the specimen up to the critical state. The test used $N_{par}$ = 20000 particles. To keep the data complete, all 18 data sets were used, giving 7182 stress integration steps in total. The influence of the number of trial integration steps $N_{trial}$ was tested with $N_{trial}$ = 7182, 3591, 1596, 798, 399, 228, 57, 6 and 1. To make the runs comparable, the total number of trial integration steps was fixed at 1436400, so the corresponding numbers of loop steps T were 200, 400, 900, 1800, 3600, 6300, 25200, 239400 and 1436400, respectively. The GPU computing device was an Nvidia GTX 1050 and the CPU computing device was an Intel i7-7700.
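The step counts quoted above follow from simple arithmetic, which the short Python check below reproduces; the variable names are introduced for illustration only.

```python
# 2 stress paths x 3 confining pressures x 3 void ratios, 399 steps each.
data_sets = 2 * 3 * 3
steps_per_pass = data_sets * 399
print(data_sets, steps_per_pass)  # 18 7182

# Fixed budget of trial integration steps divided by N_trial gives T.
TOTAL_TRIAL_STEPS = 1436400
n_trial_values = [7182, 3591, 1596, 798, 399, 228, 57, 6, 1]
loop_steps = [TOTAL_TRIAL_STEPS // n for n in n_trial_values]
print(loop_steps)
# [200, 400, 900, 1800, 3600, 6300, 25200, 239400, 1436400]
```

Every listed value of $N_{trial}$ divides the total budget exactly, so each run performs the same nominal number of trial integration steps per particle.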
FIG. 2 illustrates the relationship between the proportion of integrations actually performed and $N_{trial}$. Because of the early termination mechanism, the number of integration steps actually performed is smaller than the specified total number of trial steps. $N_{trial}$ = 7182 means that the state check is performed only after all data have been trial-computed, and in this case the proportion of stress integration steps actually performed is comparatively low. Reducing $N_{trial}$ places more check points in the calculation, so that particles that have terminated early return to stress integration sooner; the proportion of stress integration steps actually performed therefore rises as $N_{trial}$ decreases. When $N_{trial}$ = 1, the check is made after every single stress integration, so all particles are always performing stress integration and the proportion is 1. FIG. 2 shows this gradual increase as $N_{trial}$ decreases: for $N_{trial}$ of 400 or less the proportion is high throughout, and above 400 it falls rapidly as $N_{trial}$ increases.
FIG. 3 further reports the number of integration steps performed per second. When $N_{trial}$ is small, the proportion of integrations actually performed is high, but the state-processing kernels K2 to K4 add overhead. In particular, K3 is a serial kernel, and because single-core performance on a GPU is low, executing K3 too many times degrades performance markedly. Very small values of $N_{trial}$ are therefore inefficient. Conversely, when $N_{trial}$ is large, the proportion of integrations actually performed is low, but because of the SIMD execution model of the GPU, particles that have stopped accumulating early can only wait idle and cannot be put to use in subsequent calculation. Large values of $N_{trial}$ are therefore also inefficient. The efficiency peaks at $N_{trial}$ = 798, reaching 13366 steps/s, which is 21.28 times that of the CPU.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (7)

1. An improved particle swarm method for parameter identification of a constitutive model of a geotechnical material based on GPU acceleration is characterized in that stress integration is carried out by taking strain in experimental data as a known quantity, an accumulated value of a deviation square of a calculated value of the stress and an experimental value is taken as an error function, and the particle swarm method is used for searching for optimal parameters;
calculating a strain increment: delta epsilonk+1=εexp,k+1exp,k (1);
Updating the calculated value of the stress state:
Figure FDA0003059971580000011
cumulative error function:
Figure FDA0003059971580000012
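The three operations of claim 1 can be sketched for a single particle as follows. This is a minimal illustration under stated assumptions: the stress update of equation (2) depends on the constitutive model and is not reproduced in the text, so a one-dimensional linear-elastic law with a hypothetical parameter `modulus` stands in for it, and the function name `accumulate_error` is likewise invented for the example.

```python
def accumulate_error(strain_exp, stress_exp, modulus):
    """Accumulate the squared stress deviation along a strain path.

    A one-dimensional linear-elastic law (d_sigma = modulus * d_eps) stands
    in for the constitutive update of equation (2); it is an assumption made
    only to keep the sketch runnable.
    """
    sigma = stress_exp[0]   # start from the first measured stress state
    error = 0.0
    for k in range(len(strain_exp) - 1):
        d_eps = strain_exp[k + 1] - strain_exp[k]    # equation (1)
        sigma = sigma + modulus * d_eps              # stand-in for equation (2)
        error += (sigma - stress_exp[k + 1]) ** 2    # squared deviation, cf. (3)
    return error

strain = [0.0, 0.001, 0.002, 0.003]
stress = [0.0, 2.0, 4.0, 6.0]
print(accumulate_error(strain, stress, modulus=2000.0))  # close to zero
print(accumulate_error(strain, stress, modulus=1000.0))  # clearly larger
```

A parameter set that reproduces the measured stress path gives an error near zero, so the particle swarm search reduces to minimising this accumulated value over the model parameters.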
2. The improved particle swarm method for parameter identification of the geotechnical material constitutive model based on GPU acceleration as claimed in claim 1, characterized in that the calculation of the error function in the particle swarm method is adapted to the requirements of the single instruction multiple data (SIMD) instruction set of the GPU: with the population size $N_{par}$ of the particle swarm as the total number of threads, the GPU performs the SIMD-parallel calculations (1) to (3) at the fine-grained level of a single stress integration, which is defined as one basic parallel step.
3. The improved particle swarm method for the parametric identification of the geotechnical material constitutive model based on GPU acceleration as claimed in claim 1, characterized by comprising the following steps:
step S1: after the basic parallel step has been trial-calculated a predetermined number of times $N_{trial}$, performing a SIMD-parallel judgment of the running state of each particle by checking whether the current error function value E exceeds the particle's historical best (pbest), the logic of the state judgment being as follows: a particle with E ≤ pbest that has not completed the calculation of all data is defined as being in the accept-accumulation state A0; a particle with E ≤ pbest that has completed the calculation of all data is defined as being in the normal termination state A1; a particle with E > pbest is defined as being in the abort state A2;
step S2: for all particles in state A1, since the error function value does not exceed the particle's own best pbest after all data have been accumulated, updating pbest and the corresponding parameter values in SIMD parallel;
step S3: serially traversing all particles in state A1 and, if a particle's historical best pbest is better than the global best gbest of all particles, updating gbest and its corresponding parameter values;
step S4: performing the SIMD-parallel particle swarm update operation on the particles in states A1 and A2;
step S5: for the particles in states A1 and A2, performing SIMD-parallel particle swarm state processing, in which the error function is set to 0 and the state is reset to A0.
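Steps S1 to S5 can be sketched on the CPU as below. This is an illustrative sketch only: the dictionary layout of a particle, the names `classify` and `process_states`, the caller-supplied `move` function, and the per-particle step counter `steps` (reset in S5 so that accumulation restarts from the first data point) are all assumptions, and ordinary Python loops stand in for the SIMD-parallel kernels.

```python
A0, A1, A2 = 0, 1, 2  # accept accumulation, normal termination, abort

def classify(error, pbest, steps_done, n_data):
    """Step S1: state of one particle after a batch of trial steps."""
    if error > pbest:
        return A2
    return A1 if steps_done >= n_data else A0

def process_states(swarm, gbest, n_data, move):
    """Steps S1 to S5 for a swarm held as a list of dicts.

    `move` is a caller-supplied function that applies the particle swarm
    update of step S4 to one particle. Returns the updated gbest record.
    """
    states = [classify(p["error"], p["pbest"], p["steps"], n_data)
              for p in swarm]                                   # S1
    for p, s in zip(swarm, states):                             # S2
        if s == A1:
            p["pbest"], p["pbest_x"] = p["error"], list(p["x"])
    for p, s in zip(swarm, states):                             # S3 (serial)
        if s == A1 and p["pbest"] < gbest["value"]:
            gbest = {"value": p["pbest"], "x": list(p["pbest_x"])}
    for p, s in zip(swarm, states):
        if s in (A1, A2):
            move(p, gbest)                                      # S4
            p["error"], p["steps"] = 0.0, 0                     # S5: back to A0
    return gbest
```

Particles in state A0 are left untouched and simply continue accumulating in the next batch of trial steps, while particles in A1 and A2 restart with a new parameter set.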
4. The improved particle swarm method for parameter identification of the geotechnical material constitutive model based on GPU acceleration as claimed in claim 3, wherein in step S4, the particle swarm updating operation has the following specific formula:
updating the particle velocity:
Figure FDA0003059971580000021
updating the particle position:
Figure FDA0003059971580000022
wherein i = 1, 2, 3, …, $N_{par}$; d = 1, 2, 3, …, D; t = 1, 2, 3, …; i is the serial number of the particle, k is the current time step, and D is the dimension of the particle; in this example, the value of D is equal to the number of parameters of the constitutive model; ω is the inertia weight factor, whose value is non-negative and can be adjusted to balance global and local optimization performance; $c_1$ and $c_2$ are called acceleration constants, and $r_1$ and $r_2$ are random numbers in the interval [0, 1]; in each iteration, the velocity v and position x of the particle are updated according to the particle's own best parameter $p_{id}$ and the swarm's global best parameter $p_{gd}$.
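The update formulas of claim 4 appear only as figures, so the sketch below uses the standard inertia-weight form of the particle swarm update, which is consistent with the symbols defined above but is an assumption about the exact formulas. The function name `pso_update` and the default values of `omega`, `c1` and `c2` are illustrative choices, not values from the patent.

```python
import random

def pso_update(x, v, p_best, g_best, omega=0.7, c1=1.5, c2=1.5, rng=random):
    """Return the new (position, velocity) of one particle.

    Standard inertia-weight update, applied independently in each of the
    D dimensions:
        v <- omega*v + c1*r1*(p_best - x) + c2*r2*(g_best - x)
        x <- x + v
    """
    new_x, new_v = [], []
    for x_d, v_d, p_d, g_d in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()   # uniform on [0, 1)
        v_next = omega * v_d + c1 * r1 * (p_d - x_d) + c2 * r2 * (g_d - x_d)
        new_v.append(v_next)
        new_x.append(x_d + v_next)
    return new_x, new_v
```

When a particle already sits on both its own best and the global best position, the two attraction terms vanish and only the inertia term moves it, which gives a quick deterministic check of the implementation.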
5. The improved particle swarm method for parameter identification of the geotechnical material constitutive model based on GPU acceleration as claimed in claim 3, further comprising a checking step (Check): one completion of steps S1 to S5 is defined as a loop step, and the number of loop steps that have occurred is denoted T; after every specified number of loop steps, whether the current globally optimal parameter values meet the convergence requirement is checked; if so, the calculation ends, otherwise the calculation continues.
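The periodic check of claim 5 can be sketched as a small driver loop. The names `run_until_converged`, `loop_step`, `converged`, `check_every` and `max_steps` are hypothetical, and the upper bound on the number of loop steps is an added safeguard that the claim does not mention.

```python
def run_until_converged(loop_step, converged, check_every, max_steps):
    """Run loop steps, testing convergence only every `check_every` steps.

    loop_step() performs one pass of S1 to S5 and returns the current
    global best error; converged(value) decides whether to stop.
    Returns (number of loop steps run, last global best error).
    """
    best = None
    for t in range(1, max_steps + 1):
        best = loop_step()
        if t % check_every == 0 and converged(best):
            return t, best
    return max_steps, best
```

Checking only every few loop steps keeps the convergence test, which needs the global best value, off the critical path of the parallel steps.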
6. The improved particle swarm method for parameter identification of the geotechnical material constitutive model based on GPU acceleration as claimed in claim 3, characterized in that the error function is accumulated monotonically during the calculation, and an early-termination mechanism is applied to particles whose error function has already exceeded their own historical best during accumulation, so as to reduce the amount of calculation.
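Early termination is safe because every term added to the error function is a square and therefore non-negative: once the running sum exceeds pbest it can never fall back below it. The sketch below illustrates this for one particle, checking after every step (the case $N_{trial} = 1$); the function name `accumulate_with_cutoff` and the sample deviations are made up for the example.

```python
def accumulate_with_cutoff(deviations, pbest):
    """Sum squared deviations, stopping once the partial sum exceeds pbest.

    Returns (partial_sum, steps_used, terminated_early).
    """
    total = 0.0
    for steps, dev in enumerate(deviations, start=1):
        total += dev * dev          # never decreases the running sum
        if total > pbest:
            return total, steps, True
    return total, len(deviations), False

print(accumulate_with_cutoff([1.0, 2.0, 3.0, 4.0], pbest=4.5))
print(accumulate_with_cutoff([1.0, 2.0, 3.0, 4.0], pbest=100.0))
```

In the first call the particle is abandoned after two of the four data points, which is the saving in calculation that the claim refers to.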
7. The improved particle swarm method for parameter identification of the geotechnical material constitutive model based on GPU acceleration as claimed in claim 3, characterized in that the running state of the particles is judged and processed promptly during the calculation so that the particles return to the accumulation state, every thread of the GPU executes a completely identical instruction sequence, idle waiting time in SIMD parallelization is reduced, and the utilization efficiency of GPU computation is improved.
CN202110509992.0A 2021-05-11 2021-05-11 Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration Active CN113239620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110509992.0A CN113239620B (en) 2021-05-11 2021-05-11 Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110509992.0A CN113239620B (en) 2021-05-11 2021-05-11 Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration

Publications (2)

Publication Number Publication Date
CN113239620A true CN113239620A (en) 2021-08-10
CN113239620B CN113239620B (en) 2023-03-28

Family

ID=77133226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110509992.0A Active CN113239620B (en) 2021-05-11 2021-05-11 Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration

Country Status (1)

Country Link
CN (1) CN113239620B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999756A (en) * 2012-11-09 2013-03-27 重庆邮电大学 Method for recognizing road signs by PSO-SVM (particle swarm optimization-support vector machine) based on GPU (graphics processing unit)
CN105527650A (en) * 2016-02-17 2016-04-27 中国科学院武汉岩土力学研究所 Automatic identification algorithm for microseismic signal and p wave first arrival at engineering scale
CN110674965A (en) * 2019-05-15 2020-01-10 中国电建集团华东勘测设计研究院有限公司 Multi-time step wind power prediction method based on dynamic feature selection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
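Chen Zijian et al.: "Research on an improved fuzzy probability model for the stability of tunnel surrounding rock and its application", Tunnel Construction (《隧道建设(中英文)》), vol. 40, no. 4, 30 April 2020 (2020-04-30), pages 504 - 511 *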

Also Published As

Publication number Publication date
CN113239620B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
Ren et al. Sparse LU factorization for parallel circuit simulation on GPU
CN103440377B (en) Based on the flight vehicle aerodynamic profile optimization method for designing improving parallel DE algorithm
CN110135584B (en) Large-scale symbolic regression method and system based on adaptive parallel genetic algorithm
CN102779207B (en) Wing profile optimal design method of parallel difference evolutionary algorithm based on open computing language (Open CL)
Scherer et al. Accelerating large-scale convolutional neural networks with parallel graphics multiprocessors
CN114186749B (en) Flexible workshop scheduling method and model based on reinforcement learning and genetic algorithm
CN106502632B (en) A kind of GPU parallel particle swarm optimization method based on self-adaptive thread beam
CN106959937B (en) A kind of vectorization implementation method of the warp product matrix towards GPDSP
CN111858066B (en) CPU + GPU heterogeneous parallel optimization method in pneumatic theory unified algorithm
CN105808309A (en) High-performance realization method of BLAS (Basic Linear Algebra Subprograms) three-level function GEMM on the basis of SW platform
Johar et al. A review of genetic algorithms and parallel genetic algorithms on graphics processing unit (GPU)
US20210174202A1 (en) Method and apparatus with model optimization, and accelerator system
CN112434785B (en) Distributed parallel deep neural network performance evaluation method for supercomputer
CN113239620B (en) Improved particle swarm method for parameter identification of geotechnical material constitutive model based on GPU acceleration
Kim et al. ComPreEND: Computation pruning through predictive early negative detection for ReLU in a deep neural network accelerator
CN117828252A (en) High-performance matrix vector multiplication method based on matrix core
CN105335226B (en) For the iterative static task list scheduling method of multicomputer system
CN115756792A (en) CPU parallel acceleration method and system suitable for intelligent scheduling system
CN112613146A (en) Self-adaptive alignment optimization method, system, storage medium and computing equipment
Marongiu et al. GPU implementation of a SPH-ALE fluid dynamics solver
Jin et al. Parallel particle swarm optimization with genetic communication strategy and its implementation on GPU
US20210209462A1 (en) Method and system for processing a neural network
CN111443947B (en) Sequence comparison method and system for second-generation sequencing data based on many-core platform
Xiang et al. SUBP: soft uniform block pruning for 1xN sparse CNNs multithreading acceleration
Liu et al. Parallel solution of maze optimal path based on ant colony algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant