CN112733401B - Finite element tearing butt joint method and system for numerical simulation of reactor core assembly - Google Patents
Finite element tearing butt joint method and system for numerical simulation of reactor core assembly
- Publication number
- CN112733401B CN112733401B CN202011607981.8A CN202011607981A CN112733401B CN 112733401 B CN112733401 B CN 112733401B CN 202011607981 A CN202011607981 A CN 202011607981A CN 112733401 B CN112733401 B CN 112733401B
- Authority
- CN
- China
- Prior art keywords
- finite element
- matrix
- dense
- dense matrix
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 162
- 210000001503 joint Anatomy 0.000 title claims abstract description 26
- 238000004088 simulation Methods 0.000 title claims abstract description 18
- 239000011159 matrix material Substances 0.000 claims abstract description 100
- 239000013598 vector Substances 0.000 claims abstract description 52
- 230000015654 memory Effects 0.000 claims abstract description 43
- 238000004364 calculation method Methods 0.000 claims abstract description 34
- 238000004891 communication Methods 0.000 claims abstract description 25
- 230000001133 acceleration Effects 0.000 claims abstract description 12
- 239000008358 core component Substances 0.000 claims description 15
- 238000006073 displacement reaction Methods 0.000 claims description 14
- 230000001105 regulatory effect Effects 0.000 claims description 3
- 238000004590 computer program Methods 0.000 claims description 2
- 230000000712 assembly Effects 0.000 claims 1
- 238000000429 assembly Methods 0.000 claims 1
- 230000000875 corresponding effect Effects 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 238000002939 conjugate gradient method Methods 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 239000000306 component Substances 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 239000000446 fuel Substances 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/23—Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E30/00—Energy generation of nuclear origin
- Y02E30/30—Nuclear fission reactors
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Monitoring And Testing Of Nuclear Reactors (AREA)
Abstract
The invention discloses a finite element tearing butt joint method and system for numerical simulation of a reactor core assembly. Each of the n computing nodes is provided with the finite element tearing butt joint system, and each computing node is provided with g GPU accelerators. The invention adopts a load balancing strategy so that the dense matrix memory size of each process tends toward the average value, cluster resources are fully utilized, and the solving speed is increased. HIP programming is employed so that the finite element tearing butt joint method can run on both the Nvidia CUDA platform and the AMD ROCm platform. In the dense matrix-vector multiplication stage of the iterative solution, a dynamic matrix allocation strategy assigns a suitable amount of computation to each processor, so that computing resources are fully utilized and the solution is accelerated. In the vector inner-product stage, a vector inner-product acceleration strategy and a communication-computation overlap strategy are adopted; by introducing communication threads, communication waiting time is reduced and the vector inner product is accelerated.
Description
Technical Field
The invention relates to finite element tearing butt joint processing technology, and in particular to a finite element tearing butt joint method and system for numerical simulation of reactor core assemblies.
Background
The core components in a nuclear reactor can deform and wear the fuel rods under conditions of high temperature, irradiation, fluid flow, pressure and the like, causing a series of problems such as difficulty in loading and unloading, component damage and fatigue failure, which affect the safe operation of the reactor. Because of the special arrangement of the core components and other factors, theoretical analysis is very difficult, so numerical simulation of the core components must be carried out using the finite element method.
The finite element tearing butt joint method (Finite Element Tearing and Interconnecting, FETI) is an effective scheme for solving reactor structural mechanics problems. It is mainly used to treat the large-scale problems obtained by discretizing partial differential equations, is an important method for large-scale numerical simulation of reactor core assemblies, and is also applicable to fields such as electromagnetics, aviation technology and mechanical manufacturing. The FETI method was originally proposed by C. Farhat and F.-X. Roux in the field of structural mechanics as a non-overlapping domain decomposition method: the model is divided into a number of non-overlapping subdomains, and each subdomain is independent. To ensure continuity between subdomains, the FETI method adds a set of unknowns (Lagrange multipliers, LM); in the actual solution, a Krylov subspace iterative method is generally adopted to solve for the LM, and then a subdomain equation is solved in each subdomain.
However, the original FETI method is not computationally efficient. To address this, Farhat et al. proposed the FETI-DP method (dual-primal unified FETI method) in 2001, which eliminates the need for a second set of Lagrange multipliers and unifies all previously developed one-level and two-level FETI methods into a single dual-primal framework. FETI-DP is more robust than the FETI method, has higher computational efficiency, and is suitable for solving second-order and fourth-order problems. In 2006, the TFETI (Total FETI) method was proposed by Dostál et al. This method is a variant of the FETI method in which the Dirichlet boundary conditions are also enforced through the LM (Lagrange multipliers); however, the coarse problem remains an important factor limiting the scalability of the FETI method.
To reduce the impact of the coarse problem and improve scalability, Klawonn and Rheinbach proposed the HFETI (Hybrid FETI) method in 2010. The method combines the FETI and FETI-DP methods and groups several subdomains into one cluster, so it can be regarded as a three-level domain decomposition method. First, a FETI-DP system is set up to handle all clusters. Each cluster is then composed of multiple subdomains, which are processed using the conventional FETI method. Similarly, in 2012, Kozubek et al. proposed a related method, the HTFETI (Hybrid Total FETI) method. It combines the FETI and TFETI methods, using the TFETI method for the subdomains within each cluster and the FETI method with projection for the clusters. The HTFETI method can effectively reduce the coarse problem.
However, in the iterative solution of FETI the sparse matrix-vector operations consume a lot of time, so Riha et al. proposed the LSC (Local Schur Complement) method in 2016, replacing the sparse matrix-vector operations with more efficient dense matrix-vector multiplication (GEMV), a strategy that trades memory space for time. This dense BLAS level-2 operation has contiguous memory access patterns and therefore performs better for memory-bound applications. At the same time, dense matrix-vector multiplication is well suited to processing on a GPU accelerator. Thus Vavrik et al. used CUDA programming in 2018 to offload the dense matrix-vector multiplication to the GPU.
However, the existing finite element tearing butt joint method still has the following problems to be solved urgently: 1) the known finite element tearing butt joint solvers that support heterogeneous parallelism use CUDA programming; however, a solver implemented with CUDA programming can only run on the Nvidia CUDA platform and does not support other types of GPU accelerators; 2) when the GPU is used to compute the dense matrix-vector multiplication, the CPU is idle, and the computing resources of the cluster are not fully utilized; 3) in actual numerical simulation of the reactor core assembly, the dense matrix memory sizes assembled by the processes differ greatly (the computation time and memory size can differ by a factor of 6), so processes with less computation spend a great amount of time waiting for the other processes, which increases the solving time.
Disclosure of Invention
The invention aims to solve the problems of the existing finite element tearing butt joint method and provides a finite element tearing butt joint system for numerical simulation of reactor core assemblies, which fully utilizes cluster resources, accelerates the solving speed, reduces communication waiting time and improves portability.
A finite element tearing butt joint system for numerical simulation of a reactor core assembly comprises an input module, a region dividing module, a matrix assembly module, a resource collection module, a load balancing module, an iterative solving module and a local solving module.
The input module is used for acquiring the grid file data and carrying out initialization parameter setting.
The region dividing module is used for dividing the grid into a plurality of regions and dividing each region into a plurality of sub-regions.
The matrix assembly module is used for generating a corresponding finite element matrix in each subarea.
The resource collection module is used for collecting the dense matrix size information of each process and comparing occupied memories.
The load balancing module is used for calling a load balancing strategy and reallocating the dense matrix of each process.
The iteration solving module is used for solving the displacement of the boundary node of each region by adopting the existing iteration method; and invoking a vector inner product acceleration policy and a communication computation overlap policy.
The local solving module is used for solving the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing butt joint system, and each computing node is provided with g GPU accelerators.
The invention further aims to provide a finite element tearing butt joint method for numerical simulation of a reactor core assembly, which comprises the following specific steps:
step 1: and obtaining geometric model data of the reactor core assembly, and meshing the geometric model data through the existing software to generate a mesh file.
Step 2: each computing node acquires a grid file of the reactor core assembly through an input module, and initializes related parameters: finite element method, iterative method, maximum iteration number, iterative accuracy, core component material parameters, core component boundary conditions, etc.
The finite element method may be a FETI or HTFETI.
Step 3: each computing node in the n computing nodes starts g processes, each process starts T threads, the grid obtained by the input module is divided into g x n areas through the area dividing module, and each area is allocated with one process; while each region is further divided into s sub-regions.
Step 4: and each process generates a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method, and each subdomain generates a dense matrix. Thus, each process generates s dense matrices.
Step 5: the resource collection module is utilized to collect dense matrix information of each process, and the occupied memory size of the dense matrix of the process i is L i Let L min =min{L 1 ,L 2 ,L 3 ...L n*g },L max =max{L 1 ,L 2 ,L 3 ...L n*g }. If it isX represents a threshold value, and the phenomenon of unbalanced load of the reactor core assembly occurs in the finite element processing process, and the reactor core assembly is regulated by adopting a load balancing strategy and enters a step 6; otherwise, the load balancing in the finite element processing process is considered, and the step 7 is directly carried out.
Step 6: enabling a load balancing strategy through a load balancing module, and adjusting the size of the memory occupied by the matrix of each process to be near the average value, wherein the method specifically comprises the following steps:
6-1, calculating the average memory size of the dense matrix according to the memory size of the dense matrix of each process;
6-2, comparing the dense matrix memory size of each process with the average value; if it is larger than the average value, the computation load of the process is considered large and it needs help from other processes, so the process is set as a helped; if it is smaller than the average value, the computation load is considered small and the process can help others, so it is set as a helper;
6-3, dividing the processes into two groups, the helpers in one group and the helped in the other, sorting each group according to the dense matrix memory size, and correspondingly selecting one helper and one helped;
6-4 the helped sends 1 dense matrix to the helper;
6-5 repeating step 6-4 until either the dense matrix memory of the current helped is smaller than the average value, in which case the next helped is selected, or the dense matrix memory of the current helper is larger than the average value, in which case the next helper is selected; then returning to step 6-4;
6-6 repeating steps 6-4 to 6-5 until the dense matrix memory of all helped processes is less than the average value, or the dense matrix memory of all helpers is greater than the average value.
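The pairing procedure of steps 6-1 to 6-6 can be sketched serially as follows (an illustrative model of the distributed exchange; for simplicity it assumes every dense matrix occupies the same amount of memory, which the method does not require):

```python
# Serial sketch of the load-balancing strategy: processes above the
# average dense-matrix memory become "helped", the others "helpers";
# one matrix at a time moves from a helped process to a helper until
# the helped drops below the average or the helper rises above it.

def balance(loads, matrix_size):
    """loads: per-process dense-matrix memory; matrix_size: memory of
    one dense matrix. Returns (matrices_sent_per_process, new_loads);
    negative entries in the first list mean matrices received."""
    avg = sum(loads) / len(loads)
    cur = list(loads)
    sent = [0] * len(loads)
    helped = sorted((i for i, l in enumerate(loads) if l > avg),
                    key=lambda i: -loads[i])       # largest first
    helpers = sorted((i for i, l in enumerate(loads) if l <= avg),
                     key=lambda i: loads[i])       # smallest first
    h = 0
    for src in helped:
        # steps 6-4/6-5: keep sending until this helped is below average
        while cur[src] > avg and h < len(helpers):
            dst = helpers[h]
            cur[src] -= matrix_size
            cur[dst] += matrix_size
            sent[src] += 1
            sent[dst] -= 1
            if cur[dst] > avg:        # this helper is now full: next one
                h += 1
        if h == len(helpers):         # step 6-6 stop condition
            break
    return sent, cur
```

On the loads [10, 2, 6, 2] with unit matrix size 2, the sketch moves every process to within one matrix of the average of 5.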
Step 7: and (3) carrying out iterative solution on each process by an iterative solution module, wherein in each step of iterative solution, vector inner product operation adopts a vector inner product acceleration strategy and a communication calculation overlap strategy, dense matrix vector multiplication adopts HIP (heterogeneous calculation portable interface) programming to calculate on a similar GPU (graphic processing unit) accelerator, and adopts a dynamic matrix allocation strategy.
The vector inner product acceleration strategy is to solve the local vector inner products of all the processes in parallel by multiple threads.
The communication-computation overlap strategy is that each process uses 1 thread for communication while the remaining T-1 threads continue the local vector inner-product computation; the communication thread rejoins the local vector inner-product computation after completing its communication.
The dynamic matrix allocation strategy is that, when performing dense matrix-vector multiplication, each process uses 1 thread to call the hipBLAS library and perform dense matrix-vector multiplication on the GPU-like accelerator, while the other T-1 threads call the Intel MKL library and perform dense matrix-vector multiplication on the CPU. At each iteration, the number of matrices is dynamically distributed to the CPU and the GPU-like accelerator according to their dense matrix-vector multiplication times in the previous iteration. The specific formulas are:

x_tmp = N * (x_c / t_c) / (x_c / t_c + x_d / t_d)
x_c_sub = ceil((N - x_tmp) / (T - 1))
x_d_next = x_c_sub * (T - 1)
x_c_next = N - x_d_next

where N represents the total number of dense matrices that the current process needs to process; x_c_next and x_d_next represent the numbers of dense matrices assigned to the GPU-like accelerator and to the CPU in the next iteration; x_c and x_d represent the numbers assigned in the previous iteration; t_c and t_d represent the previous iteration's computation times on the GPU-like accelerator and on the CPU; x_c_sub represents the number of dense matrices assigned to a single CPU core; and x_tmp is a temporary variable.
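The redistribution can be sketched as follows, under the reading that matrices are reassigned in proportion to the throughput each device achieved in the previous iteration (function and variable names are illustrative, not from the patent):

```python
# Dynamic matrix-allocation sketch: x_gpu/x_cpu are the matrix counts of
# the previous iteration, t_gpu/t_cpu the measured times; the CPU share
# is rounded up so each of the T-1 CPU threads gets a whole matrix count.
import math

def redistribute(n_total, x_gpu, x_cpu, t_gpu, t_cpu, cpu_threads):
    v_gpu = x_gpu / t_gpu            # matrices per unit time on the GPU
    v_cpu = x_cpu / t_cpu            # matrices per unit time on the CPU
    x_tmp = n_total * v_gpu / (v_gpu + v_cpu)             # ideal GPU share
    x_c_sub = math.ceil((n_total - x_tmp) / cpu_threads)  # per CPU core
    x_cpu_next = min(n_total, x_c_sub * cpu_threads)
    return n_total - x_cpu_next, x_cpu_next   # (GPU share, CPU share)
```

For example, if the GPU processed 50 matrices in 1 s while the CPU threads processed 50 matrices in 3 s, the next iteration shifts most of the work toward the GPU.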
Step 8: and each process obtains the displacement of the internal nodes through the local solving module according to the iteration solving result, so as to obtain the displacement of all the nodes.
It is a further object of the present invention to provide a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method described above.
The beneficial effects of the invention are as follows:
when the finite element tearing butt joint method is used for numerical simulation of the reactor core assembly, a load balancing strategy is adopted, so that the memory size of the dense matrix of each process tends to be the average value, cluster resources can be fully utilized, and the solving speed is increased. Meanwhile, the invention adopts HIP programming, so that the finite element tearing butt joint method can run on an Nvidia CUDA platform and an AMD ROCm platform, and the portability of codes is improved. In the dense matrix vector multiplication stage of the iterative solving process, a dynamic matrix allocation strategy is adopted, so that different processors are allocated to proper calculated quantities, and the calculating resources are fully utilized, so that the solving speed is increased. In the vector inner product stage, a vector inner product acceleration strategy and a communication calculation overlapping strategy are adopted, and communication waiting time is reduced and the vector inner product speed is accelerated by introducing communication threads.
Drawings
FIG. 1 is a flow chart of a finite element tear butt acceleration method;
FIG. 2 is a dense matrix memory occupancy map;
FIG. 3 is a dense matrix vector multiplication computation time contrast graph.
Detailed Description
The invention is further analyzed in connection with the following specific examples.
The finite element tearing butt joint method for numerical simulation of the reactor core assembly is applied to deformation prediction of the reactor core assembly under the high temperature condition.
A deformation prediction method of a reactor core component in a nuclear reactor comprises a finite element tearing butt-joint acceleration method facing numerical simulation of the reactor core component; the deformation condition of the current reactor core assembly can be obtained through the obtained solving result (namely the displacement of all nodes), so that a basis is provided for the analysis design of the reactor core assembly.
The following steps and descriptions are specific:
a finite element tearing butt joint system for numerical simulation of reactor core components comprises an input module, a region dividing module, a matrix assembling module, a resource collecting module, a load balancing module, an iteration solving module and a local solving module.
The input module is used for acquiring the grid file data and carrying out initialization parameter setting.
The region dividing module is used for dividing the grid into a plurality of regions and dividing each region into a plurality of sub-regions.
The matrix assembly module is used for generating a corresponding finite element matrix in each subarea.
The resource collection module is used for collecting the dense matrix size information of each process and comparing occupied memories.
The load balancing module is used for calling a load balancing strategy and reallocating the dense matrix of each process.
The iteration solving module is used for solving the displacement of the boundary node of each region by adopting the existing iteration method; and invoking a vector inner product acceleration policy and a communication computation overlap policy.
The local solving module is used for solving the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing butt joint system, and each computing node is provided with g GPU accelerators.
A finite element tearing butt joint method for numerical simulation of reactor core components is shown in fig. 1, and comprises the following specific steps:
step 1: and obtaining geometric model data of the reactor core assembly, and meshing the geometric model data through the existing software to generate a mesh file.
Step 2: each computing node acquires a grid file of the reactor core assembly through an input module, and initializes related parameters: finite element method, iterative method, maximum iteration number, iterative accuracy, core component material parameters, core component boundary conditions, etc.
The finite element method may be a FETI or HTFETI.
The iterative method is the preconditioned conjugate gradient method, used to solve the following dual interface problem of the finite element system:

F λ - G α = d
G^T λ = e

wherein:

F = B K^+ B^T
G = B R
d = B K^+ f
e = R^T f

Matrix B is the displacement compatibility matrix, which makes the node displacements on adjacent subdomain interfaces equal. Matrix K is the finite element stiffness matrix and K^+ is its generalized inverse. Matrix R is a basis of the null space of the stiffness matrix K, and the vector f is the load vector. λ and α are both unknowns. The variable α is eliminated by the projection matrix P = I - G(G^T G)^-1 G^T, and solving with the preconditioned conjugate gradient method yields λ, from which the displacements of the subdomain boundary nodes are obtained.
The specific algorithm is described as follows:
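A minimal serial NumPy sketch of a projected preconditioned conjugate gradient iteration of this kind (dense arrays stand in for the distributed FETI operators; no preconditioner beyond the projector P is applied, so this is an illustrative model rather than the patent's parallel implementation):

```python
# Projected CG for F·lam = d subject to G^T·lam = e, using the projector
# P = I - G (G^T G)^{-1} G^T to keep iterates in the constraint space.
import numpy as np

def projected_cg(F, G, d, e, tol=1e-10, max_iter=200):
    gtg_inv = np.linalg.inv(G.T @ G)
    P = np.eye(len(d)) - G @ gtg_inv @ G.T
    lam = G @ (gtg_inv @ e)          # feasible start: G^T lam = e
    r = P @ (d - F @ lam)            # projected residual
    w = r.copy()                     # search direction
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Fw = F @ w
        alpha = (r @ r) / (w @ Fw)
        lam = lam + alpha * w
        r_new = r - alpha * (P @ Fw)  # project to stay in range(P)
        beta = (r_new @ r_new) / (r @ r)
        w = r_new + beta * w
        r = r_new
    return lam
```

Because every search direction lies in the range of P, the constraint G^T λ = e set by the feasible starting point is preserved throughout the iteration.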
step 3: n computing nodes, each computing node starts g processes, each process starts T threads, the grid obtained by the input module is divided into g x n areas through the area dividing module, and each area is allocated with one process; while each region is further divided into s sub-regions.
Step 4: and each process generates a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method, and each subdomain generates a dense matrix. Thus, each process generates s dense matrices.
Step 5: the resource collection module is utilized to collect dense matrix information of each process, and the occupied memory size of the dense matrix of the process i is L i Let L min =min{L 1 ,L 2 ,L 3 …L n*g },L max =max{L 1 ,L 2 ,L 3 …L n*g }. If it isThe reactor core component is considered to have the load imbalance phenomenon in the finite element processing process, and is required to be adjusted by adopting a load balancing strategy, and the step 6 is entered; otherwise, the load balancing in the finite element processing process is considered, and the step 7 is directly carried out.
Step 6: and starting a load balancing strategy through a load balancing module, and adjusting the size of the memory occupied by the matrix of each process to be near the average value.
The load balancing strategy specifically comprises the following steps:
a) Calculating the average memory size of the dense matrix according to the memory size of the dense matrix of each process;
b) Comparing the dense matrix memory size of each process with the average value; if it is larger than the average value, the computation load of the process is considered large and it needs help from other processes, so the process is set as a helped; if it is smaller than the average value, the computation load is considered small and the process can help others, so it is set as a helper;
c) Dividing the processes into two groups, the helpers in one group and the helped in the other, sorting each group according to the dense matrix memory size, and correspondingly selecting one helper and one helped;
d) The helped sends 1 dense matrix to the helper;
e) Repeating step d) until either the dense matrix memory of the current helped is smaller than the average value, in which case the next helped is selected, or the dense matrix memory of the current helper is larger than the average value, in which case the next helper is selected; then returning to step d);
f) Repeating steps d) and e) until the dense matrix memory of all helped processes is less than the average value, or the dense matrix memory of all helpers is greater than the average value.
Step 7: and (3) carrying out iterative solution on each process by an iterative solution module, wherein in each step of iterative solution, vector inner product operation adopts a vector inner product acceleration strategy and a communication calculation overlap strategy, dense matrix vector multiplication adopts HIP programming to calculate on a similar GPU accelerator, and adopts a dynamic matrix allocation strategy.
The vector inner product acceleration strategy is to solve the local vector inner products of all the processes in parallel by multiple threads.
The communication-computation overlap strategy is that each process uses 1 thread for communication while the remaining T-1 threads continue the local vector inner-product computation; when the communication thread finishes its communication, it rejoins the local vector inner-product computation. The specific algorithm is described as follows:
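The overlap can be modeled with plain Python threads as follows (time.sleep stands in for the inter-process reduction that the communication thread performs, which in the real solver would be an MPI call; names are illustrative):

```python
# One thread "communicates" first, the other T-1 threads compute local
# partial sums immediately; the communication thread rejoins the local
# inner-product computation as soon as its exchange completes.
import threading, time

def overlapped_local_dot(x, y, n_threads=4):
    partial = [0.0] * n_threads
    lock = threading.Lock()
    next_lo = [0]
    chunk = max(1, len(x) // (4 * n_threads))   # dynamic chunking

    def work(tid):
        while True:
            with lock:                          # grab the next chunk
                lo = next_lo[0]
                next_lo[0] += chunk
            if lo >= len(x):
                return
            hi = min(lo + chunk, len(x))
            partial[tid] += sum(a * b for a, b in zip(x[lo:hi], y[lo:hi]))

    def comm_then_work(tid):
        time.sleep(0.01)   # simulated communication (e.g. a reduction)
        work(tid)          # rejoin the local computation afterwards

    threads = [threading.Thread(target=comm_then_work if t == 0 else work,
                                args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

Dynamic chunking lets the late-arriving communication thread pick up whatever work remains, so the result is identical regardless of how long the communication takes.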
the dynamic allocation matrix strategy is that when dense matrix vector multiplication is carried out, each process uses 1 thread to call a hipBLAS library, uses a block type GPU accelerator to carry out dense matrix vector multiplication calculation, and the other T-1 threads call an Intel MKL library, and uses a CPU to carry out dense matrix vector multiplication calculation. And dynamically distributing matrix quantity to the CPU and the GPU-like accelerator according to the computation time of multiplying the dense matrix vector by the CPU and the GPU-like accelerator during each iteration. The specific formula is as follows:
where N represents the total number of dense matrices that the current process needs to process,representing the number of dense matrices assigned to the class GPU accelerator in the next iteration, < >>Represents the number of dense matrices allocated by the CPU in the next iteration,/->Representing the number of dense matrices allocated to the GPU accelerator of the last iteration class, +.>Representing the number of dense matrices allocated by the CPU in the last iteration, t c Representing the calculation time of the GPU accelerator of the last iteration class, t d Representing the calculation time of CPU in last iteration, x c_sub Representing the number of dense matrices, x, to which a single CPU core is assigned tmp Is a temporary variable.
Step 8: and each process obtains the displacement of the internal nodes through the iteration solution result and the local solution module, thereby obtaining the displacement of all the nodes.
Fig. 2 compares the dense matrix memory sizes before and after load balancing, showing that the load balancing strategy adjusts the load of each process toward the average value, avoiding the excessive communication waiting caused by load imbalance and making full use of cluster resources. Fig. 3 shows the computation time of the dense matrix-vector multiplication before and after load balancing; the comparison shows that the load balancing strategy effectively accelerates the solution.
Claims (5)
1. A finite element tear butt joint method for numerical simulation of a reactor core assembly, comprising the steps of:
step 1: obtaining geometric model data of a reactor core assembly, and carrying out grid division on the geometric model data to generate a grid file;
step 2: each computing node acquires a grid file of a reactor core assembly and initializes related parameters;
step 3: each computing node in the n computing nodes starts g processes, each process starts T threads, grids of the reactor core assembly are divided into g x n areas, and each area is allocated with one process; while each region is further divided into s sub-regions;
step 4: each process generates a corresponding finite element matrix in each subdomain according to the allocated region and the selected finite element method, and each subdomain generates a dense matrix;
step 5: collecting dense matrix information of each process, and judging a load balancing phenomenon in a finite element processing process after comparison; if the load is considered to be unbalanced, the step 6 is carried out, otherwise, the step 7 is carried out;
step 6: starting a load balancing strategy, and adjusting the size of a matrix occupied memory of each process to be near the average value; the method specifically comprises the following steps:
6-1, calculating the average dense-matrix memory size from the dense-matrix memory size of each process;
6-2, comparing the dense-matrix memory size of each process with the average value; if it is larger than the average value, the computational load of the process is considered large and the process is designated a helped process; if it is smaller than the average value, the computational load is considered small and the process is designated a helper;
6-3, dividing the processes into two groups, the helpers in one group and the helped processes in the other, sorting each group by dense-matrix memory size, and pairing one helper with one helped process;
6-4, the helped process sends 1 dense matrix to the helper;
6-5, repeating step 6-4 until either the dense-matrix memory of the current helped process falls below the average value, in which case the next helped process is selected, or the dense-matrix memory of the current helper rises above the average value, in which case the next helper is selected; then returning to step 6-4;
6-6, repeating steps 6-4 to 6-5 until the dense-matrix memory of every helped process is below the average value or the dense-matrix memory of every helper is above the average value;
step 7: each process carries out iterative solution; in each iteration, a vector inner-product acceleration strategy and a communication-computation overlap strategy are adopted for the vector inner-product operations, and dense matrix-vector multiplication is computed on a GPU-like accelerator using HIP programming together with a dynamic matrix allocation strategy; the vector inner-product acceleration strategy solves the local vector inner product of each process in parallel with multiple threads; the communication-computation overlap strategy has each process use 1 thread for communication while the remaining T-1 threads continue the local vector inner-product computation, the communication thread rejoining the local computation after its communication completes; the dynamic matrix allocation strategy means that, during dense matrix-vector multiplication, each process uses 1 thread to call the hipBLAS library and perform the computation on the GPU-like accelerator in a blocking manner, while the other T-1 threads call the Intel MKL library and perform the computation on the CPU; at each iteration, the numbers of matrices allocated to the CPU and to the GPU-like accelerator are adjusted dynamically according to their respective dense matrix-vector multiplication times;
step 8: each process obtains the displacement of its internal nodes from the iterative solution result, thereby obtaining the displacements of all nodes.
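The load balancing procedure of steps 6-1 to 6-6 above can be sketched as a single-machine Python illustration. The function name `balance_dense_matrices`, the list-of-lists representation of per-process matrix memory sizes, and the choice of always sending the last matrix in a list are assumptions for illustration; the actual method exchanges matrices between MPI processes, which is omitted here.

```python
def balance_dense_matrices(matrices_per_process):
    """Redistribute dense matrices so each process's total memory
    approaches the average (steps 6-1 to 6-6, simplified sketch)."""
    # 6-1: average dense-matrix memory over all processes
    totals = [sum(m) for m in matrices_per_process]
    avg = sum(totals) / len(totals)
    # 6-2: over-average processes are "helped", under-average are helpers
    helped = sorted((i for i, t in enumerate(totals) if t > avg),
                    key=lambda i: -totals[i])
    helpers = sorted((i for i, t in enumerate(totals) if t < avg),
                     key=lambda i: totals[i])
    # 6-3 to 6-6: pair them up and move one matrix at a time
    hi, lo = 0, 0
    while hi < len(helped) and lo < len(helpers):
        src, dst = helped[hi], helpers[lo]
        if totals[src] <= avg:   # helped process is light enough now
            hi += 1
            continue
        if totals[dst] >= avg:   # helper process is full now
            lo += 1
            continue
        # 6-4: the helped process sends one dense matrix to the helper
        m = matrices_per_process[src].pop()
        matrices_per_process[dst].append(m)
        totals[src] -= m
        totals[dst] += m
    return matrices_per_process
```

The moved quantity here is only the matrix's memory size; in the real method the matrix data itself would be transferred and the receiving process would take over the corresponding computation.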
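The dynamic matrix allocation strategy of step 7 can be illustrated by a proportional-throughput split between the CPU and the GPU-like accelerator. The exact formula of the claims is not reproduced in the text, so the sketch below is only an approximation: `next_split` and its signature are invented names, and the proportional rule is an assumption consistent with the stated goal of redistributing matrices according to the previous iteration's computation times.

```python
def next_split(n_total, x_gpu_prev, x_cpu_prev, t_gpu, t_cpu):
    """Rebalance the dense-matrix count between the GPU-like accelerator
    and the CPU using the previous iteration's measured times; the device
    that processed matrices faster receives proportionally more work."""
    # per-matrix throughput observed in the last iteration
    rate_gpu = x_gpu_prev / t_gpu
    rate_cpu = x_cpu_prev / t_cpu
    # split the total in proportion to the observed throughputs
    x_gpu = round(n_total * rate_gpu / (rate_gpu + rate_cpu))
    x_gpu = max(0, min(n_total, x_gpu))
    return x_gpu, n_total - x_gpu
```

For example, if the accelerator handled 50 matrices in 1 s while the CPU handled 50 matrices in 4 s, the accelerator is four times faster per matrix and receives four fifths of the next batch.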
2. The finite element tearing butt joint method for numerical simulation of a reactor core assembly according to claim 1, wherein judging load imbalance in the finite element processing in step 5 specifically comprises:
Let the dense matrix of process i occupy memory of size L_i; let L_min = min{L_1, L_2, L_3, ..., L_{n*g}} and L_max = max{L_1, L_2, L_3, ..., L_{n*g}}. If the imbalance between L_max and L_min exceeds the threshold X, load imbalance occurs in the finite element processing of the reactor core assembly; the load balancing strategy is adopted for adjustment and the method enters step 6. Otherwise the load in the finite element processing is considered balanced and the method proceeds directly to step 7.
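The imbalance test of claim 2 compares a quantity derived from L_min and L_max against the threshold X. The exact inequality is not reproduced in the text; the relative spread (L_max - L_min)/L_max used below is an illustrative assumption.

```python
def is_unbalanced(loads, threshold):
    """Assumed imbalance test: the spread between the largest and
    smallest per-process dense-matrix memory, relative to the largest,
    exceeds the threshold X. The exact inequality in the published
    claim is not reproduced here, so this ratio is an assumption."""
    l_min, l_max = min(loads), max(loads)
    return (l_max - l_min) / l_max > threshold
```

When the test returns true, the load balancing strategy of step 6 is started; otherwise the method proceeds directly to the iterative solution of step 7.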
3. The finite element tearing butt joint method for numerical simulation of a reactor core assembly according to claim 2, wherein the specific formula of step 7 is as follows:
where N represents the total number of dense matrices the current process needs to process; the quantities in the formula are the number of dense matrices allocated to the GPU-like accelerator in the next iteration, the number of dense matrices allocated to the CPU in the next iteration, the number of dense matrices allocated to the GPU-like accelerator in the last iteration, and the number of dense matrices allocated to the CPU in the last iteration; t_c represents the computation time of the GPU-like accelerator in the last iteration, t_d represents the computation time of the CPU in the last iteration, x_c_sub represents the number of dense matrices allocated to a single CPU core, and x_tmp is a temporary variable.
4. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-3.
5. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011607981.8A CN112733401B (en) | 2020-12-30 | 2020-12-30 | Finite element tearing butt joint method and system for numerical simulation of reactor core assembly |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112733401A CN112733401A (en) | 2021-04-30 |
CN112733401B true CN112733401B (en) | 2024-03-12 |
Family
ID=75610898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011607981.8A Active CN112733401B (en) | 2020-12-30 | 2020-12-30 | Finite element tearing butt joint method and system for numerical simulation of reactor core assembly |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112733401B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102232282A (en) * | 2010-10-29 | 2011-11-02 | Huawei Technologies Co., Ltd. | Method and apparatus for realizing load balance of resources in data center
CN103731498A (en) * | 2013-12-31 | 2014-04-16 | Zhejiang Hongcheng Computer Systems Co., Ltd. | Big data real-time query system load balancing method based on replica selection
CN105045670A (en) * | 2015-09-01 | 2015-11-11 | Inspur (Beijing) Electronic Information Industry Co., Ltd. | Method and system for balancing loads of central processing units and graphics processing units
CN110472187A (en) * | 2019-08-06 | 2019-11-19 | China Institute of Atomic Energy | Load-balanced parallel method for the three-dimensional neutron transport method of characteristics
CN112016232A (en) * | 2020-08-31 | 2020-12-01 | China Institute of Atomic Energy | Tearing finite element process processing method and system
Non-Patent Citations (3)
Title |
---|
Acceleration Techniques for FETI Solvers for GPU Accelerators; Radim Vavřík et al.; International Conference on High Performance Computing & Simulation; full text *
A greedy algorithm for Reduce load balancing on the Hadoop platform; Liu Duo et al.; Application Research of Computers; Vol. 33, No. 9; p. 2658 *
Analysis of electromagnetic scattering by the finite element-boundary integral method combined with the tearing and interconnecting method; Wan Ting et al.; Systems Engineering and Electronics; Vol. 32, No. 9; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Azad et al. | Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication | |
Lastovetsky et al. | Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing | |
Peng et al. | GLU3.0: Fast GPU-based parallel sparse LU factorization for circuit simulation | |
Balaprakash et al. | Active-learning-based surrogate models for empirical performance tuning | |
Ida | Lattice H-matrices on distributed-memory systems | |
Lastovetsky et al. | Data distribution for dense factorization on computers with memory heterogeneity | |
Rico-Garcia et al. | Comparison of high performance parallel implementations of TLBO and Jaya optimization methods on manycore GPU | |
CN108879691B (en) | Large-scale continuous power flow calculation method and device | |
Kopysov et al. | Hybrid Multi-GPU solver based on Schur complement method | |
CN112035995A (en) | Nonstructural grid tidal current numerical simulation method based on GPU (graphics processing Unit) computing technology | |
CN112733401B (en) | Finite element tearing butt joint method and system for numerical simulation of reactor core assembly | |
CN109101708B (en) | Implicit finite element parallel method based on two-stage region decomposition | |
Yang et al. | Dynamic partitioning of loop iterations on heterogeneous PC clusters | |
CN112016232A (en) | Tear finite element process processing method and system | |
CN108599173B (en) | Method and device for solving batch power flows | |
Biswas et al. | Portable parallel programming for the dynamic load balancing of unstructured grid applications | |
Kuźnik et al. | Graph grammar-based multi-frontal parallel direct solver for two-dimensional isogeometric analysis | |
Kaur et al. | Genetic algorithm solution for scheduling jobs in multiprocessor environment | |
Ghale et al. | Task-based parallel computation of the density matrix in quantum-based molecular dynamics using graph partitioning | |
Marrakchi et al. | Static scheduling with load balancing for solving triangular band linear systems on multicore processors | |
CN116595691B (en) | RDPRQCG large-scale structure topology frequency optimization method | |
Coleman et al. | Enhancing asynchronous linear solvers through randomization | |
CN117435308B (en) | Modelica model simulation method and system based on parallel computing algorithm | |
Singh et al. | Heterogeneous computing with graphical processing unit: improvised back-propagation algorithm for water level prediction | |
Korch et al. | Implementation and Optimization of a 1D2V PIC Method for Nonlinear Kinetic Models on GPUs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||