CN112733401B - Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly - Google Patents

Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly

Info

Publication number
CN112733401B
CN112733401B (application CN202011607981.8A)
Authority
CN
China
Prior art keywords
finite element
matrix
dense
dense matrix
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011607981.8A
Other languages
Chinese (zh)
Other versions
CN112733401A (en)
Inventor
张纪林
张鋆宸
王珏
冯仰德
聂宁明
丁佳明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Hangzhou Dianzi University
Original Assignee
Computer Network Information Center of CAS
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS and Hangzhou Dianzi University
Priority to CN202011607981.8A
Publication of CN112733401A
Application granted
Publication of CN112733401B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/23 Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E30/00 Energy generation of nuclear origin
    • Y02E30/30 Nuclear fission reactors


Abstract

The invention discloses a finite element tearing and interconnecting (FETI) method and system for numerical simulation of a reactor core assembly. Each of the n computing nodes is provided with the FETI system, and each computing node is equipped with g GPU accelerators. The invention adopts a load balancing strategy so that the dense-matrix memory size of each process tends toward the average value, cluster resources are fully utilized, and the solving speed is increased. HIP programming is employed so that the FETI method can run on both the Nvidia CUDA platform and the AMD ROCm platform. In the dense matrix-vector multiplication stage of the iterative solving process, a dynamic matrix allocation strategy is adopted so that each processor is assigned an appropriate amount of computation, computing resources are fully utilized, and the solving speed is increased. In the vector inner product stage, a vector inner product acceleration strategy and a communication-computation overlap strategy are adopted; by introducing a communication thread, communication waiting time is reduced and the vector inner product is accelerated.

Description

Finite element tearing and interconnecting method and system for numerical simulation of a reactor core assembly
Technical Field
The invention relates to processing techniques for the finite element tearing and interconnecting (FETI) method, in particular to a FETI method and system for numerical simulation of reactor core assemblies.
Background
Under high temperature, irradiation, fluid flow, pressure and similar conditions, the core assembly in a nuclear reactor deforms and its fuel rods wear, causing a series of problems such as difficult loading and unloading, component damage and fatigue failure, which affect the safe operation of the reactor. Because of the special arrangement of the core assembly and related difficulties, theoretical analysis is very hard, so numerical simulation of the core assembly by the finite element method is required.
The finite element tearing and interconnecting method (FETI) is an effective scheme for solving reactor structural mechanics problems. It is mainly used to treat large-scale problems obtained by discretizing partial differential equations, is an important method for large-scale numerical simulation of reactor core assemblies, and is also applicable to fields such as electromagnetics, aerospace and mechanical manufacturing. The FETI method was originally proposed by C. Farhat and F.X. Roux in the field of structural mechanics as a non-overlapping domain decomposition method: the model is divided into a number of non-overlapping subdomains, and each subdomain is independent. To ensure continuity between subdomains, the FETI method adds a group of unknowns (Lagrange multipliers, LM). In actual solving, a Krylov subspace iterative method is generally adopted to solve for the LM, and a subdomain equation is then solved within each subdomain.
However, the original FETI method is not computationally efficient. To solve this problem, Farhat et al. in 2001 proposed the FETI-DP method (a dual-primal unified FETI method), which eliminates the need for a second set of Lagrange multipliers and unifies all previously developed one-level and two-level FETI methods into a single dual-primal framework. FETI-DP is more robust than FETI, has higher computational efficiency, and is suitable for solving second-order and fourth-order problems. In 2006 the TFETI (Total FETI) method was proposed by Dostál et al. This method is a variant of FETI in which the Dirichlet boundary conditions are also enforced through the LM (Lagrange multipliers); however, the coarse problem remains an important factor limiting the scalability of the FETI method.
To reduce the impact of the coarse problem and improve scalability, Klawonn and Rheinbach proposed the HFETI method (Hybrid FETI) in 2010. The method combines the FETI and FETI-DP methods and groups several subdomains into one cluster, so it can be regarded as a three-level domain decomposition method. First, one FETI-DP system is set up to handle all clusters. Each cluster is then composed of multiple subdomains, which are processed using the conventional FETI method. Similarly, in 2012 Kozubek et al. proposed a related method, HTFETI (Hybrid Total FETI). It combines the FETI and TFETI methods, using TFETI for the subdomains within each cluster and FETI with projection across clusters. The HTFETI method can effectively reduce the coarse problem.
However, in the iterative solution phase of FETI, sparse matrix-vector operations consume a great deal of time, so Riha et al. proposed the LSC method (Local Schur Complement) in 2016, replacing the sparse matrix-vector operation with more efficient dense matrix-vector multiplication (GEMV), effectively a strategy of trading memory for time. This dense BLAS level-2 operation has contiguous memory access and therefore performs better in memory-bound applications. At the same time, dense matrix-vector multiplication is well suited to processing on a GPU accelerator. Thus Vavrik et al. used CUDA programming in 2018 to offload the dense matrix-vector multiplications to the GPU.
However, existing finite element tearing and interconnecting methods still have the following problems to be solved urgently: 1) the known heterogeneous-parallel FETI solvers use CUDA programming; a solver implemented with CUDA can only run on the Nvidia CUDA platform and does not support other types of GPU accelerators; 2) while the GPU computes the dense matrix-vector multiplication, the CPU sits idle and the cluster's computing resources are not fully utilized; 3) in actual numerical simulation of a reactor core assembly, the dense-matrix memory sizes assembled by the different processes differ greatly, with up to a 6x difference in computation time and memory size, so processes with less computation spend a great amount of time waiting for the others, which increases the solving time.
Disclosure of Invention
The invention aims to solve the problems of existing finite element tearing and interconnecting methods, and provides a FETI system for numerical simulation of reactor core assemblies that fully utilizes cluster resources, accelerates solving, reduces communication waiting time and improves portability.
A finite element tearing and interconnecting system for numerical simulation of a reactor core assembly comprises an input module, a region partitioning module, a matrix assembly module, a resource collection module, a load balancing module, an iterative solving module and a local solving module.
The input module is used for acquiring the mesh file data and setting the initialization parameters.
The region partitioning module is used for dividing the mesh into a number of regions and dividing each region into a number of subdomains.
The matrix assembly module is used for generating the corresponding finite element matrices in each subdomain.
The resource collection module is used for collecting the dense-matrix size information of each process and comparing their memory footprints.
The load balancing module is used for invoking the load balancing strategy and redistributing the dense matrices of each process.
The iterative solving module is used for solving for the displacements of the boundary nodes of each region by an existing iterative method, and for invoking the vector inner product acceleration strategy and the communication-computation overlap strategy.
The local solving module is used for solving for the displacements of the internal nodes of each region.
Each of the n computing nodes is provided with the above finite element tearing and interconnecting system, and each computing node is equipped with g GPU accelerators.
A further aim of the invention is to provide a finite element tearing and interconnecting method for numerical simulation of a reactor core assembly, comprising the following specific steps:
Step 1: obtain the geometric model data of the reactor core assembly and mesh it with existing software to generate a mesh file.
Step 2: each computing node acquires the mesh file of the reactor core assembly through the input module and initializes the relevant parameters: the finite element method, the iterative method, the maximum number of iterations, the iteration accuracy, the core assembly material parameters, the core assembly boundary conditions, etc.
The finite element method may be FETI or HTFETI.
Step 3: each of the n computing nodes starts g processes, and each process starts T threads. The mesh obtained by the input module is divided into g*n regions by the region partitioning module, and each region is assigned to one process; each region is further divided into s subdomains.
Step 4: according to its assigned region and the selected finite element method, each process generates the corresponding finite element matrices in each subdomain through the matrix assembly module, and each subdomain yields one dense matrix. Thus each process generates s dense matrices.
Step 5: the resource collection module collects the dense-matrix information of each process; the memory occupied by the dense matrices of process i is L_i. Let L_min = min{L_1, L_2, L_3, ..., L_(n*g)} and L_max = max{L_1, L_2, L_3, ..., L_(n*g)}. If the imbalance between L_max and L_min exceeds a threshold X, load imbalance is considered to have occurred in the finite element processing of the core assembly; the load balancing strategy is adopted to adjust it and the method proceeds to step 6. Otherwise, the finite element processing is considered load balanced, and the method proceeds directly to step 7.
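The step-5 check can be sketched in a few lines. The exact threshold condition appears only as a formula image in the source, so the simple max/min ratio test below is an assumption that matches the described intent; the name `needs_balancing` is illustrative:

```python
# Hedged sketch of the step-5 imbalance check. The patent's exact threshold
# condition is not reproduced in the text; a max/min ratio test against X is
# assumed here.
def needs_balancing(dense_matrix_bytes, threshold_x=2.0):
    """dense_matrix_bytes: per-process dense-matrix memory sizes L_1..L_(n*g)."""
    l_min = min(dense_matrix_bytes)
    l_max = max(dense_matrix_bytes)
    return l_max / l_min > threshold_x

# A 6x spread between processes (as reported for real core-assembly meshes)
# clearly triggers rebalancing; a near-uniform spread does not.
print(needs_balancing([600, 450, 100, 520]))  # True
print(needs_balancing([500, 480, 510, 495]))  # False
```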
Step 6: the load balancing module enables the load balancing strategy and adjusts the matrix memory occupied by each process toward the average value, specifically as follows:
6-1 compute the average dense-matrix memory size from the dense-matrix memory size of each process;
6-2 compare each process's dense-matrix memory size with the average: if it is larger than the average, the process's workload is considered large and it needs help from other processes, so it is marked as a helped process; if it is smaller than the average, the process's workload is considered small and it can help other processes, so it is marked as a helper;
6-3 divide the processes into two groups, the helpers in one group and the helped in the other, sort each group by dense-matrix memory size, and pair one helper with one helped process;
6-4 the helped process sends 1 dense matrix to its helper;
6-5 repeat step 6-4 until either the dense-matrix memory of the current helped process falls below the average, in which case move on to the next helped process, or the dense-matrix memory of the helper rises above the average, in which case switch to the next helper; then return to step 6-4;
6-6 repeat steps 6-4 and 6-5 until the dense-matrix memory of every helped process is below the average, or the dense-matrix memory of every helper is above the average.
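The load-balancing steps above can be sketched as follows; each dense matrix is represented only by its memory size, and the process ids, names and single-pass pairing are illustrative, not the patent's implementation:

```python
# Hedged sketch of the load-balancing strategy in steps 6-1 to 6-6.
def rebalance(proc_matrices):
    """proc_matrices: dict process_id -> list of dense-matrix memory sizes."""
    # 6-1: average dense-matrix memory per process
    avg = sum(sum(sizes) for sizes in proc_matrices.values()) / len(proc_matrices)
    load = {p: sum(sizes) for p, sizes in proc_matrices.items()}
    # 6-2/6-3: helpers are below the average, helped are above; sort each group
    helpers = sorted((p for p in proc_matrices if load[p] < avg), key=lambda p: load[p])
    helped = sorted((p for p in proc_matrices if load[p] > avg), key=lambda p: load[p], reverse=True)
    hi = 0
    for p in helped:
        # 6-4/6-5: the helped process sends one dense matrix at a time to its
        # helper until it drops below the average or the helper exceeds it
        while load[p] > avg and hi < len(helpers):
            h = helpers[hi]
            size = proc_matrices[p].pop()      # "send" one dense matrix
            proc_matrices[h].append(size)
            load[p] -= size
            load[h] += size
            if load[h] > avg:                  # this helper is now full
                hi += 1
    return proc_matrices

procs = {0: [40, 40, 40], 1: [10], 2: [10]}
print({p: sum(m) for p, m in rebalance(procs).items()})  # {0: 40, 1: 50, 2: 50}
```

The per-process loads move toward the average (here 140/3) without splitting any dense matrix, which is why the stop conditions in 6-6 are stated as inequalities rather than exact equality.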
Step 7: each process performs the iterative solution through the iterative solving module. In each iteration, the vector inner product operations adopt the vector inner product acceleration strategy and the communication-computation overlap strategy, while the dense matrix-vector multiplications are computed on the GPU-like accelerators using HIP (Heterogeneous-compute Interface for Portability) programming together with the dynamic matrix allocation strategy.
The vector inner product acceleration strategy is to solve each process's local vector inner product in parallel with multiple threads.
The communication-computation overlap strategy is that each process uses 1 thread for communication while the remaining T-1 threads continue to participate in the local inner product computation; the communication thread rejoins the local inner product computation after its communication completes.
The dynamic matrix allocation strategy is that, during dense matrix-vector multiplication, each process uses 1 thread to call the hipBLAS library and perform dense matrix-vector multiplications on a GPU-like accelerator, while the other T-1 threads call the Intel MKL library and perform dense matrix-vector multiplications on the CPU. At each iteration, the number of matrices assigned to the CPU and to the GPU-like accelerator is adjusted dynamically according to their dense matrix-vector multiplication times in the previous iteration. The specific formula is as follows:
where N denotes the total number of dense matrices the current process needs to process, x_c^next denotes the number of dense matrices assigned to the GPU-like accelerator in the next iteration, x_d^next denotes the number of dense matrices assigned to the CPU in the next iteration, x_c^prev denotes the number of dense matrices assigned to the GPU-like accelerator in the previous iteration, x_d^prev denotes the number of dense matrices assigned to the CPU in the previous iteration, t_c denotes the computation time of the GPU-like accelerator in the previous iteration, t_d denotes the computation time of the CPU in the previous iteration, x_c_sub denotes the number of dense matrices assigned to a single CPU core, and x_tmp is a temporary variable.
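The allocation update itself is given only as a formula image in the source, so the rule below is an assumption that matches the stated intent: split the N dense matrices between the GPU-like accelerator and the CPU in proportion to the throughput each achieved in the previous iteration. Names mirror the description (t_c: accelerator time, t_d: CPU time); the per-core split x_c_sub is omitted for brevity:

```python
# Hedged sketch of the dynamic matrix allocation strategy: throughput-
# proportional splitting of the dense matrices, assumed from the description.
def next_split(n_total, x_c_prev, x_d_prev, t_c, t_d):
    gpu_rate = x_c_prev / t_c            # matrices per second, accelerator
    cpu_rate = x_d_prev / t_d            # matrices per second, CPU threads
    x_tmp = n_total * gpu_rate / (gpu_rate + cpu_rate)
    x_c_next = round(x_tmp)              # accelerator share next iteration
    return x_c_next, n_total - x_c_next  # (accelerator count, CPU count)

# The accelerator handled 60 matrices in 1 s and the CPU 40 in 2 s, so the
# accelerator gets 75% of the 100 matrices in the next iteration.
print(next_split(100, 60, 40, 1.0, 2.0))  # (75, 25)
```

Because the split is recomputed at every iteration from measured times, the allocation adapts automatically when one side slows down, which is the behavior the strategy is meant to provide.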
Step 8: according to the iterative solution result, each process obtains the displacements of its internal nodes through the local solving module, thereby obtaining the displacements of all nodes.
It is a further object of the present invention to provide a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method described above.
The beneficial effects of the invention are as follows:
When the finite element tearing and interconnecting method of the invention is used for numerical simulation of a reactor core assembly, a load balancing strategy is adopted so that the dense-matrix memory size of each process tends toward the average value, cluster resources can be fully utilized, and the solving speed is increased. Meanwhile, the invention adopts HIP programming, so the method can run on both the Nvidia CUDA platform and the AMD ROCm platform, which improves the portability of the code. In the dense matrix-vector multiplication stage of the iterative solving process, a dynamic matrix allocation strategy is adopted so that each processor is assigned an appropriate amount of computation, computing resources are fully utilized, and the solving speed is increased. In the vector inner product stage, a vector inner product acceleration strategy and a communication-computation overlap strategy are adopted; by introducing a communication thread, communication waiting time is reduced and the vector inner product is accelerated.
Drawings
FIG. 1 is a flow chart of the finite element tearing and interconnecting acceleration method;
FIG. 2 is a dense matrix memory occupancy map;
FIG. 3 is a dense matrix vector multiplication computation time contrast graph.
Detailed Description
The invention is further analyzed in connection with the following specific examples.
The finite element tearing and interconnecting method for numerical simulation of a reactor core assembly is applied here to deformation prediction of the core assembly under high temperature conditions.
A deformation prediction method for the core assembly of a nuclear reactor comprises the finite element tearing and interconnecting acceleration method for core-assembly numerical simulation; the deformation of the current core assembly can be obtained from the solution result (namely the displacements of all nodes), providing a basis for the analysis and design of the core assembly.
The following steps and descriptions are specific:
a finite element tearing butt joint system for numerical simulation of reactor core components comprises an input module, a region dividing module, a matrix assembling module, a resource collecting module, a load balancing module, an iteration solving module and a local solving module.
The input module is used for acquiring the grid file data and carrying out initialization parameter setting.
The region dividing module is used for dividing the grid into a plurality of regions and dividing each region into a plurality of sub-regions.
The matrix assembly module is used for generating a corresponding finite element matrix in each subarea.
The resource collection module is used for collecting the dense matrix size information of each process and comparing occupied memories.
The load balancing module is used for calling a load balancing strategy and reallocating the dense matrix of each process.
The iteration solving module is used for solving the displacement of the boundary node of each region by adopting the existing iteration method; and invoking a vector inner product acceleration policy and a communication computation overlap policy.
The local solving module is used for solving the displacement of the internal nodes of each region.
Each of the n computing nodes is provided with the finite element tearing butt joint system, and each computing node is provided with g-block GPU accelerators.
A finite element tearing and interconnecting method for numerical simulation of reactor core assemblies, shown in FIG. 1, comprises the following specific steps:
Step 1: obtain the geometric model data of the reactor core assembly and mesh it with existing software to generate a mesh file.
Step 2: each computing node acquires the mesh file of the reactor core assembly through the input module and initializes the relevant parameters: the finite element method, the iterative method, the maximum number of iterations, the iteration accuracy, the core assembly material parameters, the core assembly boundary conditions, etc.
The finite element method may be FETI or HTFETI.
The iterative method is the preconditioned conjugate gradient method, used to solve the dual finite element system
F λ - G α = d
G^T λ = e
wherein:
F = B K^+ B^T
G = B R
d = B K^+ f
e = R^T f
The matrix B is the displacement coordination matrix, which makes the node displacements on adjacent subdomain interfaces equal. The matrix K is the finite element stiffness matrix and K^+ its generalized inverse. The matrix R is a basis of the null space of the stiffness matrix K, and the vector f is the load vector. λ and α are both unknowns; the projection matrix P = I - G(G^T G)^(-1) G^T is selected to eliminate the variable α, and the preconditioned conjugate gradient method then solves for λ, from which the displacements of the subdomain boundary nodes are obtained.
The specific algorithm is described as follows:
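The projector defined above can be exercised in a few lines; this pure-Python sketch illustrates it for a single coarse column G (so that GᵀG is a scalar), and the helper name is illustrative rather than the patent's listing:

```python
# Sketch of the projector P = I - G (G^T G)^-1 G^T used to eliminate the
# variable alpha, specialized to a single coarse column G.
def apply_projector(G, v):
    """Return P v = v - G * (G^T v) / (G^T G) for a single column G."""
    gtg = sum(g * g for g in G)
    gtv = sum(g * x for g, x in zip(G, v))
    c = gtv / gtg
    return [x - c * g for x, g in zip(v, G)]

G = [1.0, 2.0, 2.0]
pv = apply_projector(G, [3.0, 0.0, 1.0])
# P annihilates the range of G (P G = 0), which is what removes alpha ...
print(all(abs(x) < 1e-12 for x in apply_projector(G, G)))                   # True
# ... and P is idempotent (P P v = P v), as a projector must be.
print(all(abs(a - b) < 1e-12 for a, b in zip(apply_projector(G, pv), pv)))  # True
```

In the full method the conjugate gradient iteration is carried out on projected residuals, so every iterate automatically satisfies the constraint G^T λ = e up to the initial particular solution.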
step 3: n computing nodes, each computing node starts g processes, each process starts T threads, the grid obtained by the input module is divided into g x n areas through the area dividing module, and each area is allocated with one process; while each region is further divided into s sub-regions.
Step 4: and each process generates a corresponding finite element matrix in each subdomain through a matrix assembly module according to the allocated region and the selected finite element method, and each subdomain generates a dense matrix. Thus, each process generates s dense matrices.
Step 5: the resource collection module collects the dense-matrix information of each process; the memory occupied by the dense matrices of process i is L_i. Let L_min = min{L_1, L_2, L_3, ..., L_(n*g)} and L_max = max{L_1, L_2, L_3, ..., L_(n*g)}. If the imbalance between L_max and L_min exceeds the threshold, load imbalance is considered to have occurred in the finite element processing of the core assembly; the load balancing strategy is adopted to adjust it and the method proceeds to step 6. Otherwise, the finite element processing is considered load balanced, and the method proceeds directly to step 7.
Step 6: the load balancing module enables the load balancing strategy and adjusts the matrix memory occupied by each process toward the average value.
The load balancing strategy specifically comprises the following steps:
a) compute the average dense-matrix memory size from the dense-matrix memory size of each process;
b) compare each process's dense-matrix memory size with the average: if it is larger than the average, the process's workload is considered large and it needs help from other processes, so it is marked as a helped process; if it is smaller than the average, the process's workload is considered small and it can help other processes, so it is marked as a helper;
c) divide the processes into two groups, the helpers in one group and the helped in the other, sort each group by dense-matrix memory size, and pair one helper with one helped process;
d) the helped process sends 1 dense matrix to its helper;
e) repeat step d) until either the dense-matrix memory of the current helped process falls below the average, in which case move on to the next helped process, or the dense-matrix memory of the helper rises above the average, in which case switch to the next helper; then return to step d);
f) repeat steps d) and e) until the dense-matrix memory of every helped process is below the average, or the dense-matrix memory of every helper is above the average.
Step 7: each process performs the iterative solution through the iterative solving module. In each iteration, the vector inner product operations adopt the vector inner product acceleration strategy and the communication-computation overlap strategy, while the dense matrix-vector multiplications are computed on the GPU-like accelerators using HIP programming together with the dynamic matrix allocation strategy.
The vector inner product acceleration strategy is to solve each process's local vector inner product in parallel with multiple threads.
The communication-computation overlap strategy is that each process uses 1 thread for communication while the remaining T-1 threads continue to participate in the local inner product computation; when the communication thread finishes communicating, it rejoins the local inner product computation. The specific algorithm is described as follows:
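The overlap can be sketched with ordinary threads: of the T threads, one first performs the (here simulated) inter-process exchange while the other T-1 keep computing partial local inner products, and the communication thread rejoins the computation once its exchange completes. The sleep stand-in and queue-based work sharing are illustrative, not the patent's MPI code:

```python
# Sketch of the communication-computation overlap strategy for the local
# vector inner product.
import queue
import threading
import time

def overlapped_inner_product(x, y, T=4, chunk=1000):
    tasks = queue.Queue()
    for lo in range(0, len(x), chunk):
        tasks.put((lo, min(lo + chunk, len(x))))
    partial = [0.0] * T
    def worker(tid, communicate_first):
        if communicate_first:
            time.sleep(0.001)  # stand-in for the inter-process communication
        while True:            # then (re)join the inner-product computation
            try:
                lo, hi = tasks.get_nowait()
            except queue.Empty:
                return
            partial[tid] += sum(a * b for a, b in zip(x[lo:hi], y[lo:hi]))
    threads = [threading.Thread(target=worker, args=(t, t == 0)) for t in range(T)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)

print(overlapped_inner_product([1.0] * 10000, [2.0] * 10000))  # 20000.0
```

Because the T-1 computing threads never block on the exchange, the communication latency is hidden behind useful work, which is the source of the reduced waiting time claimed for this strategy.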
the dynamic allocation matrix strategy is that when dense matrix vector multiplication is carried out, each process uses 1 thread to call a hipBLAS library, uses a block type GPU accelerator to carry out dense matrix vector multiplication calculation, and the other T-1 threads call an Intel MKL library, and uses a CPU to carry out dense matrix vector multiplication calculation. And dynamically distributing matrix quantity to the CPU and the GPU-like accelerator according to the computation time of multiplying the dense matrix vector by the CPU and the GPU-like accelerator during each iteration. The specific formula is as follows:
where N denotes the total number of dense matrices the current process needs to process, x_c^next denotes the number of dense matrices assigned to the GPU-like accelerator in the next iteration, x_d^next denotes the number of dense matrices assigned to the CPU in the next iteration, x_c^prev denotes the number of dense matrices assigned to the GPU-like accelerator in the previous iteration, x_d^prev denotes the number of dense matrices assigned to the CPU in the previous iteration, t_c denotes the computation time of the GPU-like accelerator in the previous iteration, t_d denotes the computation time of the CPU in the previous iteration, x_c_sub denotes the number of dense matrices assigned to a single CPU core, and x_tmp is a temporary variable.
Step 8: from the iterative solution result, each process obtains the displacements of its internal nodes through the local solving module, thereby obtaining the displacements of all nodes.
FIG. 2 compares the dense-matrix memory sizes before and after load balancing, showing that the load of each process can be adjusted toward the average value by the load balancing strategy, which avoids the excessive communication waiting time caused by load imbalance and makes full use of cluster resources. FIG. 3 shows the computation time of the dense matrix-vector multiplication before and after load balancing; the comparison shows that the load balancing strategy can effectively accelerate the solution.

Claims (5)

1. A finite element tearing and interconnecting method for numerical simulation of a reactor core assembly, comprising the steps of:
step 1: obtaining geometric model data of a reactor core assembly, and carrying out grid division on the geometric model data to generate a grid file;
step 2: each computing node acquires a grid file of a reactor core assembly and initializes related parameters;
step 3: each of the n computing nodes starts g processes, and each process starts T threads; the mesh of the reactor core assembly is divided into g*n regions, and each region is assigned to one process; each region is further divided into s subdomains;
step 4: each process generates a corresponding finite element matrix in each subdomain according to the allocated region and the selected finite element method, and each subdomain generates a dense matrix;
step 5: collecting dense matrix information of each process, and judging a load balancing phenomenon in a finite element processing process after comparison; if the load is considered to be unbalanced, the step 6 is carried out, otherwise, the step 7 is carried out;
step 6: starting a load balancing strategy, and adjusting the size of a matrix occupied memory of each process to be near the average value; the method specifically comprises the following steps:
6-1, calculating an average value of the memory size of the dense matrix according to the memory size of the dense matrix of each process;
6-2, comparing the memory size of the dense matrix of each process with the average value, and if the memory size is larger than the average value, considering that the calculated amount of the process is larger, and setting the process as a helped person; if the calculation amount is smaller than the average value, the calculation amount of the process is considered to be smaller, and the process is set as a helper;
6-3, dividing the process into two groups, wherein the helper is a group, the helped is a group, and sorting each group according to the size of the dense matrix memory, and correspondingly selecting one helped and one helped;
6-4 the helped sends 1 dense matrix to the helped;
6-5: repeating step 6-4 until either the dense matrix memory of the current helped process falls below the average value, in which case the next helped process is selected, or the dense matrix memory of the current helper exceeds the average value, in which case the next helper is selected; then returning to step 6-4;
6-6: repeating steps 6-4 to 6-5 until the dense matrix memory of every helped process is below the average value or the dense matrix memory of every helper is above the average value;
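The loop in steps 6-1 to 6-6 can be sketched as follows. This is an illustrative serial simulation under assumptions: the function and variable names are hypothetical, and `matrix_sizes` stands in for the per-process dense matrix memory sizes gathered in step 5; in the patented method the transfer in step 6-4 would be an MPI message between processes.

```python
def rebalance(matrix_sizes):
    """Greedy redistribution of dense matrices (sketch of steps 6-1 to 6-6).

    matrix_sizes: dict mapping process rank -> list of dense-matrix
    memory sizes held by that process. Returns the adjusted mapping.
    """
    # 6-1: average dense-matrix memory per process
    total = sum(sum(sizes) for sizes in matrix_sizes.values())
    avg = total / len(matrix_sizes)

    # 6-2: processes above the average are "helped", below are "helpers"
    helped = sorted((r for r in matrix_sizes if sum(matrix_sizes[r]) > avg),
                    key=lambda r: sum(matrix_sizes[r]), reverse=True)
    helpers = sorted((r for r in matrix_sizes if sum(matrix_sizes[r]) < avg),
                     key=lambda r: sum(matrix_sizes[r]))

    # 6-3 to 6-6: pair them off and move one matrix at a time
    i = j = 0
    while i < len(helped) and j < len(helpers):
        src, dst = helped[i], helpers[j]
        if sum(matrix_sizes[src]) <= avg:   # helped now below average: next helped
            i += 1
            continue
        if sum(matrix_sizes[dst]) >= avg:   # helper now above average: next helper
            j += 1
            continue
        # 6-4: the helped process sends one dense matrix to its helper
        matrix_sizes[dst].append(matrix_sizes[src].pop())
    return matrix_sizes
```

For example, with ranks holding matrices of sizes {0: [10, 10, 10, 10], 1: [2], 2: [3]} (average 15), the loop moves three matrices off rank 0 until its footprint drops below the average.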
step 7: performing the iterative solve in each process; in every iteration of the solve, applying a vector inner product acceleration strategy and a communication-computation overlap strategy to the vector inner product operations, computing the dense matrix-vector multiplications on a GPU-like accelerator via HIP programming, and applying a dynamic matrix allocation strategy; the vector inner product acceleration strategy solves the local vector inner products of all processes in parallel with multiple threads; the communication-computation overlap strategy has each process use 1 thread for communication while the remaining T-1 threads continue the local inner product computation, the communication thread rejoining the local inner product computation once its communication completes; the dynamic matrix allocation strategy is that, during dense matrix-vector multiplication, each process uses 1 thread to call the hipBLAS library and perform blocked dense matrix-vector multiplications on the GPU-like accelerator, while the other T-1 threads call the Intel MKL library and perform dense matrix-vector multiplications on the CPU; at every iteration, the number of matrices is dynamically distributed between the CPU and the GPU-like accelerator according to their respective dense matrix-vector multiplication times;
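The dynamic matrix allocation of step 7 can be sketched as follows. The patent's exact update formula appears in claim 3 and is not reproduced in this text, so this sketch makes an assumption: the next iteration's split is proportional to each device's measured throughput (matrices per unit time) in the previous iteration. Function and parameter names are illustrative.

```python
def next_allocation(n_total, x_c_prev, x_d_prev, t_c, t_d):
    """Split n_total dense matrices between the GPU-like accelerator
    and the CPU for the next iteration, based on the previous
    iteration's timings.

    x_c_prev, x_d_prev: matrix counts handled last iteration by the
    accelerator and the CPU; t_c, t_d: their computation times.
    Returns (accelerator count, CPU count).

    NOTE: proportional-to-throughput allocation is an assumed
    stand-in for the patent's formula, not a reproduction of it.
    """
    rate_c = x_c_prev / t_c   # accelerator throughput last iteration
    rate_d = x_d_prev / t_d   # CPU throughput last iteration
    x_c = round(n_total * rate_c / (rate_c + rate_d))
    x_c = max(0, min(n_total, x_c))   # clamp to a valid count
    return x_c, n_total - x_c
```

If the accelerator processed 50 matrices in 1.0 s while the CPU processed 50 in 3.0 s, the accelerator is three times faster per matrix, so it receives 75 of the next 100 matrices.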
step 8: each process obtains the displacements of its internal nodes from the result of the iterative solve, and the displacements of all nodes are thereby obtained.
2. The finite element tearing butt joint method for numerical simulation of reactor core assemblies according to claim 1, wherein the judging of load imbalance in the finite element processing stage in step 5 is specifically:
letting L_i denote the memory size occupied by the dense matrices of process i, L_min = min{L_1, L_2, L_3, …, L_{n*g}} and L_max = max{L_1, L_2, L_3, …, L_{n*g}}; if the imbalance between L_max and L_min exceeds the threshold X, load imbalance has occurred in the finite element processing of the reactor core assembly, the load balancing strategy is applied for adjustment and step 6 is entered; otherwise the finite element processing is considered load-balanced and step 7 is entered directly.
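A minimal sketch of this check follows. The inequality itself is a formula not reproduced in this text, so the sketch assumes the criterion compares the ratio L_max/L_min against the threshold X; the function name and this ratio form are assumptions, not the patent's exact condition.

```python
def is_imbalanced(mem_sizes, x):
    """mem_sizes: dense-matrix memory per process (L_1 .. L_{n*g}).
    Returns True when the max/min ratio exceeds the threshold x,
    i.e. when the load-balancing strategy of step 6 should run.
    (The ratio criterion is an assumption; the patent's exact
    inequality is given only as a formula image.)"""
    l_min, l_max = min(mem_sizes), max(mem_sizes)
    return l_max / l_min > x
```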
3. The finite element tearing butt joint method for numerical simulation of reactor core assemblies according to claim 2, wherein the specific formula of step 7 is as follows:
where N represents the total number of dense matrices that the current process needs to process; x_c^{k+1} represents the number of dense matrices assigned to the GPU-like accelerator in the next iteration; x_d^{k+1} represents the number of dense matrices assigned to the CPU in the next iteration; x_c^{k} represents the number of dense matrices assigned to the GPU-like accelerator in the previous iteration; x_d^{k} represents the number of dense matrices assigned to the CPU in the previous iteration; t_c represents the computation time of the GPU-like accelerator in the previous iteration; t_d represents the computation time of the CPU in the previous iteration; x_{c_sub} represents the number of dense matrices assigned to a single CPU core; and x_{tmp} is a temporary variable.
4. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-3.
5. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-3.
CN202011607981.8A 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly Active CN112733401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011607981.8A CN112733401B (en) 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly

Publications (2)

Publication Number Publication Date
CN112733401A CN112733401A (en) 2021-04-30
CN112733401B true CN112733401B (en) 2024-03-12

Family

ID=75610898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011607981.8A Active CN112733401B (en) 2020-12-30 2020-12-30 Finite element tearing butt joint method and system for numerical simulation of reactor core assembly

Country Status (1)

Country Link
CN (1) CN112733401B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102232282A (en) * 2010-10-29 2011-11-02 华为技术有限公司 Method and apparatus for realizing load balance of resources in data center
CN103731498A (en) * 2013-12-31 2014-04-16 浙江鸿程计算机系统有限公司 Big data real-time enquiry system load balancing method based on copy selection
CN105045670A (en) * 2015-09-01 2015-11-11 浪潮(北京)电子信息产业有限公司 Method and system for balancing loads of central processing units and graphic processing units
CN110472187A (en) * 2019-08-06 2019-11-19 中国原子能科学研究院 A kind of load balancing parallel method of the three-dimensional neutron transport method of characteristic curves
CN112016232A (en) * 2020-08-31 2020-12-01 中国原子能科学研究院 Tear finite element process processing method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Acceleration Techniques for FETI Solvers for GPU Accelerators; Radim Vavřík et al.; International Conference on High Performance Computing & Simulation; full text *
A greedy algorithm for Reduce load balancing on the Hadoop platform; Liu Duo et al.; Application Research of Computers; vol. 33, no. 9; p. 2658 *
Analysis of electromagnetic scattering by the finite element-boundary integral method combined with the tearing and interconnecting method; Wan Ting et al.; Systems Engineering and Electronics; vol. 32, no. 9; full text *


Similar Documents

Publication Publication Date Title
Azad et al. Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication
Lastovetsky et al. Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing
Peng et al. GLU3.0: Fast GPU-based parallel sparse LU factorization for circuit simulation
Balaprakash et al. Active-learning-based surrogate models for empirical performance tuning
Ida Lattice H-matrices on distributed-memory systems
Lastovetsky et al. Data distribution for dense factorization on computers with memory heterogeneity
Rico-Garcia et al. Comparison of high performance parallel implementations of tlbo and jaya optimization methods on manycore gpu
CN108879691B (en) Large-scale continuous power flow calculation method and device
Kopysov et al. Hybrid Multi-GPU solver based on Schur complement method
CN112035995A (en) Nonstructural grid tidal current numerical simulation method based on GPU (graphics processing Unit) computing technology
CN112733401B (en) Finite element tearing butt joint method and system for numerical simulation of reactor core assembly
CN109101708B (en) Implicit finite element parallel method based on two-stage region decomposition
Yang et al. Dynamic partitioning of loop iterations on heterogeneous PC clusters
CN112016232A (en) Tear finite element process processing method and system
CN108599173B (en) Method and device for solving batch power flows
Biswas et al. Portable parallel programming for the dynamic load balancing of unstructured grid applications
Kuźnik et al. Graph grammar-based multi-frontal parallel direct solver for two-dimensional isogeometric analysis
Kaur et al. Genetic algorithm solution for scheduling jobs in multiprocessor environment
Ghale et al. Task-based parallel computation of the density matrix in quantum-based molecular dynamics using graph partitioning
Marrakchi et al. Static scheduling with load balancing for solving triangular band linear systems on multicore processors
CN116595691B (en) RDPRQCG large-scale structure topology frequency optimization method
Coleman et al. Enhancing asynchronous linear solvers through randomization
CN117435308B (en) Modelica model simulation method and system based on parallel computing algorithm
Singh et al. Heterogeneous computing with graphical processing unit: improvised back-propagation algorithm for water level prediction
Korch et al. Implementation and Optimization of a 1D2V PIC Method for Nonlinear Kinetic Models on GPUs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant