CN115291949A

CN115291949A - Accelerated computing device and accelerated computing method for computational fluid dynamics

Info

Publication number: CN115291949A
Application number: CN202211171216.5A
Authority: CN
Inventors: 龚艳琼; 刘必慰; 赵玉新; 黄东昌; 郭阳; 江豪龙; 赖雯; 王洁; 杨益斌
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2022-11-04
Anticipated expiration: 2042-09-26
Also published as: CN115291949B

Abstract

The application relates to an accelerated computing device and an accelerated computing method for computational fluid dynamics, in the technical fields of computational fluid dynamics and computers. The accelerated computing device includes a plurality of dedicated differential operation units, which execute a program designed with an instruction set according to the fluid mechanics problem to be solved and complete the node differential operations in a flow field. Each differential operation unit comprises a plurality of transmission channels, which combine the differential operation units together so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure. Because data transmission channels are provided between the differential operation units for differential calculation, the large delay caused by transferring data through a global memory is reduced; at the same time, the data memory is removed, which greatly reduces the use of computing resources. The device is also flexible and programmable.

Description

Accelerated computing device and accelerated computing method for computational fluid dynamics

Technical Field

The present application relates to the field of computational fluid dynamics and computer technologies, and in particular, to an accelerated computing apparatus and an accelerated computing method for computational fluid dynamics.

Background

Computational fluid dynamics (CFD) mainly uses a computer to solve the basic governing equations of fluid dynamics, and can simulate the flow characteristics of a complex flow field relatively easily and accurately. Among the discretization methods used for numerical simulation in CFD, the finite difference method is a typical numerical solution method, and different difference calculation formats can be formed by combining time and space difference formats. However, the geometries, numerical methods, and physical and chemical models required by current CFD are increasingly complex and fine-grained, which places extremely high demands on large-scale computation. The computation required for accurate simulation of real flows is enormous; accurate numerical simulation of a full-size model, in particular, is beyond current computing capability.

To accelerate CFD computation, researchers in many countries have carried out extensive research, which has effectively promoted the development of CFD. However, because the computing power of computers is limited, achieving high-performance CFD on general-purpose computing hardware remains a serious challenge.

Disclosure of Invention

In view of the above, there is a need to provide a computational fluid dynamics-oriented accelerated computing apparatus and an accelerated computing method, which have simple hardware structure, high computational efficiency, and flexibility and programmability.

A computational fluid dynamics-oriented accelerated computing device comprises a plurality of dedicated differential operation units, which are used for executing a program designed with an instruction set according to the fluid mechanics problem to be solved and for completing the node differential operations in the flow field.

The differential operation unit comprises a plurality of transmission channels, and the transmission channels are used for combining the adjacent differential operation units together to complete differential operation of all nodes in a flow field in parallel.

Further, the differential operation unit further includes:

an instruction controller, used for controlling the address of the instruction to be executed;

an instruction memory, used for storing the instructions to be executed;

a plurality of general registers, used for storing register data; and

an arithmetic logic operation unit, used for performing arithmetic and logic operations on operands.

Further, the instruction controller includes an increment-by-one adder and a two-to-one multiplexer.

Further, the differential operation unit executes an instruction in four clock cycles: instruction fetch, decode, execute, and write-back.

In the instruction fetch stage, an instruction is read from the instruction memory according to the value of the instruction controller and sent to the instruction register.

In the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted from the instruction according to the operation code, and the two extracted operands are placed in two temporary registers.

In the execute stage, the arithmetic logic operation unit operates on the two temporary registers according to the operation code, the operation result is stored in a third temporary register, and the value of a flag register is set according to the instruction function and the operation result. The instruction controller uses the value of the flag register to decide whether the next instruction is executed sequentially or by a jump.

In the write-back stage, whether a register value needs to be modified is determined according to the operation code and the operation result; if so, the value of the third temporary register is stored in the corresponding general register.

Further, the number of transmission channels included in the differential operation unit is 4, and the transmission channels are obtained by configuring a communication register in the differential operation unit.

The four communication registers are located at the four edges of the differential operation unit and are connected to the same internal general register. Adjacent differential operation units communicate data over the transmission channel through the communication registers nearest to each other.

Furthermore, the number of general purpose registers in the differential operation unit is 60, and the width of the general purpose registers is 64 bits.

Further, the instruction set used by the differential operation unit is set according to the characteristics of the computational fluid dynamics algorithm.

The instruction set adopted by the differential operation unit uses a 16-bit RISC instruction encoding format, in which instructions are in a three-address format, the operation code is 4 bits long, and there are two operands, each 6 bits long.

Further, the instruction set is divided into three types according to the difference of operands, including: register type, immediate type, and hybrid type.

Further, the instruction set is classified according to the function of the instruction, and the instruction set includes: control class instructions, operation class instructions and data movement class instructions.

Wherein: the control class instructions include no-operation instructions, stop instructions, and branch jump instructions; the operation class instructions include a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction, and a floating-point register multiplication instruction; the data movement class instructions include a move instruction.

An accelerated calculation method facing computational fluid dynamics is used for achieving accelerated calculation facing computational fluid dynamics by adopting the accelerated calculation device; the method comprises the following steps:

Expanding the fluid equation to be solved in time and space using the finite difference method to obtain an iterative calculation formula.

Determining the number of differential operation units and transmission channels required according to the iterative calculation formula, and iterating in time to determine the number of program loops.

Combining all the differential operation units through the transmission channels, and setting the initialization data and storing them in the general registers of the differential operation units.

Setting the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of program loops to obtain a calculation program; the instructions in the instruction memory are used to implement different differential calculation formats.

Running the calculation program and outputting the differential operation results of all nodes in the flow field at different moments.

In the above computational fluid dynamics-oriented accelerated computing device and accelerated computing method, the accelerated computing device includes a plurality of dedicated differential operation units, which execute a program designed with an instruction set according to the fluid mechanics problem to be solved and complete the node differential operations in a flow field. Each differential operation unit comprises a plurality of transmission channels, which combine the differential operation units together so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure. Because transmission channels are provided between the differential operation units for differential calculation, the large delay caused by transferring data through a global memory is reduced; at the same time, the data memory is removed, which greatly reduces the use of computing resources. The device is also flexible and programmable.

Drawings

FIG. 1 is a schematic diagram of an accelerated computational device oriented to computational fluid dynamics in one embodiment;

FIG. 2 is a diagram of a computational fluid dynamics oriented arithmetic unit in one embodiment;

FIG. 3 is a schematic diagram of a computational fluid dynamics oriented transport channel in one embodiment;

FIG. 4 is a schematic flow chart of a computational fluid dynamics oriented acceleration calculation method according to another embodiment;

FIG. 5 is a schematic structural diagram of an accelerated computing device for solving two-dimensional linear convection equations for computational fluid dynamics in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in FIG. 1, a computational fluid dynamics-oriented accelerated computing device is provided, comprising a plurality of dedicated differential operation units, which are used for executing a program designed with an instruction set according to the fluid mechanics problem to be solved and for completing the node differential operations in the flow field. The differential operation units (DPEs) are designed according to the computational characteristics of the finite difference method widely used in computational fluid dynamics, and the number of differential operation units is determined by the spatial expansion of the fluid mechanics equation of the problem to be solved.

The differential operation unit comprises a plurality of transmission channels, and the transmission channels are used for combining adjacent differential operation units together and completing differential operation of all nodes in a flow field in parallel. A plurality of special operation units can be combined together through a transmission channel, the size of the combined differential operation unit array is changed according to the characteristics of a computational fluid dynamics algorithm, and different differential calculation formats can be realized by changing instructions in an instruction memory, so that the method has the advantages of flexibility and programmability.

The instruction set is designed according to the characteristics of finite-difference computation in computational fluid dynamics and is the minimum set capable of completing the differential operation. The instruction set uses a 16-bit RISC instruction encoding format, and the instruction memory is 128 × 16 bits; that is, all instruction encodings have the same length of 16 bits.

In the above computational fluid dynamics-oriented accelerated computing device, the device includes a plurality of dedicated differential operation units, which execute a program designed with an instruction set according to the fluid mechanics problem to be solved and complete the node differential operations in a flow field. Each differential operation unit comprises a plurality of transmission channels, which combine the differential operation units together so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure. Because transmission channels are provided between the differential operation units for differential calculation, the large delay caused by transferring data through a global memory is reduced; at the same time, the data memory is removed, which greatly reduces the use of computing resources. The device is also flexible and programmable.

Further, the differential operation unit further includes: an instruction controller, used for controlling the address of the instruction to be executed; an instruction memory, used for storing the instructions to be executed; a plurality of general registers (gr), used for storing register data; and an arithmetic logic operation unit, used for performing arithmetic and logic operations on operands.

The hardware structure is simple, a data memory is eliminated on the hardware structure, and each DPE only comprises four parts, namely an instruction controller, an instruction memory, a general register and an arithmetic logic operation unit.

Further, the instruction controller includes an increment-by-one adder and a two-to-one multiplexer.

Furthermore, the differential operation unit executes an instruction in four clock cycles: instruction fetch, decode, execute, and write-back. In the instruction fetch stage, an instruction is read from the instruction memory according to the value of the instruction controller and sent to the instruction register. In the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted from the instruction according to the operation code, and the two extracted operands are placed in two temporary registers. In the execute stage, the arithmetic logic operation unit operates on the two temporary registers according to the operation code, the operation result is stored in a third temporary register, and the value of a flag register is set according to the instruction function and the operation result; the instruction controller uses the value of the flag register to decide whether the next instruction is executed sequentially or by a jump. In the write-back stage, whether a register value needs to be modified is determined according to the operation code and the operation result; if so, the value of the third temporary register is stored in the corresponding general register.

In a specific embodiment, as shown in FIG. 2, the differential operation unit (DPE) for computational fluid dynamics includes four parts: an instruction controller, an instruction memory, general registers, and an arithmetic logic operation unit. The instruction controller controls the address of the instruction to be executed and includes an increment-by-one adder and a two-to-one multiplexer. The instruction memory stores the instructions to be executed. The general registers store register data and have a bit width of 32 bits and a depth of 60. The arithmetic logic operation unit performs arithmetic and logic operations on operands. Executing an instruction takes four clock cycles: instruction fetch (IF), decode (ID), execute (EX), and write-back (WB). The specific process is as follows:

1) Instruction fetch (IF):

In the instruction fetch stage, an instruction is read from the instruction memory according to the PC value and sent to the instruction register (ID_ir). At the same time, the PC value for the next cycle is set, so that instructions can either be executed sequentially or jump to a specific address.

2) Decode (ID):

The decode stage decodes the instruction, extracts the corresponding operands from the instruction according to the instruction function (i.e., the opcode), and places the extracted operands in register A (reg_A) and register B (reg_B).

3) Execute (EX):

The EX stage operates on reg_A and reg_B in the arithmetic logic unit (ALU) according to the instruction function and stores the result in register C (reg_C). It also sets the flag register to 0 or 1 according to the instruction function and the operation result. The instruction controller uses the flag to decide whether the next instruction is executed sequentially or by a jump, i.e., whether the PC value in the next clock cycle is the value in reg_C or the incremented PC.

4) Write-back (WB):

The WB stage decides, according to the instruction function and the EX-stage result, whether and how to modify a register value; if so, it stores the value of reg_C in the corresponding general register. This stage is effective only for instructions that need to modify a register value.
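The four stages above can be illustrated with a minimal Python sketch. It is not the patented implementation: the tuple instruction form, the operand conventions (for example, that BNZ tests a register and jumps to an immediate address), and the names `execute` and `run` are assumptions made for illustration, since Table 2 is not reproduced in this text.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, int, int]  # (mnemonic, operand A, operand B), assumed form


def execute(pc: int, gr: List[int], program: List[Instruction]) -> Optional[int]:
    """Run one instruction through IF, ID, EX and WB; return the next PC (None on HALT)."""
    # IF: read the instruction addressed by the PC into the instruction register.
    ir = program[pc]

    # ID: decode and place the two operands in temporary registers reg_a and reg_b.
    op, a, b = ir
    if op in ("ADDI", "SUBI", "BNZ"):
        reg_a, reg_b = gr[a], b        # register value and immediate (assumed)
    elif op == "MOV":
        reg_a, reg_b = gr[b], 0        # source register value (assumed)
    else:                              # NOP, HALT
        reg_a, reg_b = 0, 0

    # EX: compute reg_c in the ALU and set the jump flag.
    flag = 0
    if op == "ADDI":
        reg_c = reg_a + reg_b
    elif op == "SUBI":
        reg_c = reg_a - reg_b
    elif op == "MOV":
        reg_c = reg_a
    elif op == "BNZ":
        reg_c = reg_b                  # branch target address
        flag = 1 if reg_a != 0 else 0
    else:
        reg_c = 0

    # WB: only instructions that modify a register write reg_c back.
    if op in ("ADDI", "SUBI", "MOV"):
        gr[a] = reg_c

    if op == "HALT":
        return None
    # Instruction controller: two-to-one multiplexer between reg_c and PC + 1.
    return reg_c if flag else pc + 1


def run(program: List[Instruction], gr: List[int]) -> int:
    """Execute until HALT and return the number of instructions executed."""
    pc: Optional[int] = 0
    executed = 0
    while pc is not None:
        pc = execute(pc, gr, program)
        executed += 1
    return executed


# Hypothetical loop: add 1 to gr2 five times, counting down in gr1.
program = [("ADDI", 1, 5), ("ADDI", 2, 1), ("SUBI", 1, 1), ("BNZ", 1, 1), ("HALT", 0, 0)]
gr = [0] * 60
executed = run(program, gr)
```

The sketch counts executed instructions rather than clock cycles, because the exact cycle accounting of the hardware is not stated here.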

Furthermore, the differential operation unit includes 4 transmission channels, which are obtained by configuring communication registers in the differential operation unit. The four communication registers are located at the four edges of the differential operation unit and are connected to the same internal general register. Adjacent differential operation units communicate data over the transmission channel through the communication registers nearest to each other.

Specifically, as shown in FIG. 3, the dedicated differential operation unit (DPE) is provided with four communication registers (cr), located at the four edges of the DPE, which form four transmission channels. The four communication registers (cr0-cr3) are connected to the internal general register 55 (gr55), and gr55 stores the value on which the DPE performs the differential operation. Adjacent DPEs communicate data over the transmission channel through the communication registers nearest to each other, and the bit width of the communication registers is 32 bits. Each DPE is given four additional input ports, connected to the internal general registers 56-59 (gr56-gr59), and four additional output ports, connected to the four communication registers respectively. Taking DPE_X_Y as an example, where _X_Y denotes the position of the DPE, DPE_X_Y is connected to its four adjacent DPEs through the added input and output ports, so that it can acquire the data of adjacent points and transmit its own data to them.
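The neighbor connectivity described above can be sketched in Python as follows. This is an illustrative model only: the direction names, their order, and the functions `neighbours` and `exchange` are hypothetical, and the source does not state which edge corresponds to which communication register or input register.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical direction offsets (row, column); the edge-to-register mapping is assumed.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def neighbours(x: int, y: int, rows: int, cols: int) -> Dict[str, Optional[Tuple[int, int]]]:
    """Return the coordinates of the DPEs adjacent to DPE_x_y, or None at an array edge."""
    result: Dict[str, Optional[Tuple[int, int]]] = {}
    for name, (dx, dy) in DIRECTIONS.items():
        nx, ny = x + dx, y + dy
        result[name] = (nx, ny) if 0 <= nx < rows and 0 <= ny < cols else None
    return result


def exchange(values: List[List[float]]) -> List[List[Dict[str, float]]]:
    """One communication step: every DPE publishes its value and each
    existing neighbour latches it on the matching input port."""
    rows, cols = len(values), len(values[0])
    received: List[List[Dict[str, float]]] = [[{} for _ in range(cols)] for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            for name, pos in neighbours(x, y, rows, cols).items():
                if pos is not None:
                    received[x][y][name] = values[pos[0]][pos[1]]
    return received
```

In this model a corner DPE receives data on only two ports, which is why edge DPEs need their unused input registers to be initialized separately.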

Further, the instruction set adopted by the differential operation unit is set according to the characteristics of the computational fluid dynamics algorithm. The instruction set uses a 16-bit RISC instruction encoding format, in which instructions are in a three-address format, the operation code is 4 bits long, and there are two operands, each 6 bits long.
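As a rough illustration of the 16-bit format (4-bit operation code plus two 6-bit operands), the following Python sketch packs and unpacks an instruction word. The field order, with the opcode in the most significant bits, and the function names `encode` and `decode` are assumptions, since Table 1 is not reproduced in this text.

```python
from typing import Tuple

OPCODE_BITS = 4
OPERAND_BITS = 6


def encode(opcode: int, op_a: int, op_b: int) -> int:
    """Pack a 4-bit opcode and two 6-bit operands into one 16-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits")
    for operand in (op_a, op_b):
        if not 0 <= operand < (1 << OPERAND_BITS):
            raise ValueError("operand must fit in 6 bits")
    # Assumed layout: bits [15:12] opcode, [11:6] operand A, [5:0] operand B.
    return (opcode << 12) | (op_a << 6) | op_b


def decode(word: int) -> Tuple[int, int, int]:
    """Split a 16-bit word back into (opcode, operand A, operand B)."""
    if not 0 <= word < (1 << 16):
        raise ValueError("instruction word must fit in 16 bits")
    return (word >> 12) & 0xF, (word >> 6) & 0x3F, word & 0x3F
```

A 6-bit operand field can address up to 64 registers, which is enough for the 60 general registers of a DPE, and a 4-bit opcode allows up to 16 distinct instructions.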

Further, the instruction set is divided into three types according to the operands: register type (R type), immediate type (I type), and hybrid type (RI type). The computational fluid dynamics-oriented instruction set encoding format is shown in Table 1.

TABLE 1 instruction set encoding format for computational fluid dynamics

Further, the instruction set is classified according to the function of the instructions into control class instructions, operation class instructions, and data movement class instructions. Wherein: the control class instructions include no-operation instructions, stop instructions, and branch jump instructions; the operation class instructions include a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction, and a floating-point register multiplication instruction; the data movement class instructions include a move instruction.

Specifically, 3 classes comprising 14 machine instructions are defined and implemented according to the characteristics of computational fluid dynamics algorithms. Control class instructions: no-operation instruction (NOP), stop instruction (HALT), and branch jump instructions (BZ, BNZ, BN, BNN). Operation class instructions: fixed-point immediate addition instruction (ADDI), fixed-point immediate subtraction instruction (SUBI), fixed-point comparison instruction (CMP), immediate jump instruction (JUMPI), floating-point register addition instruction (ADDF), floating-point register subtraction instruction (SUBF), and floating-point register multiplication instruction (MULF). Data movement class instruction: move instruction (MOV). The specific format and operation of the computational fluid dynamics-oriented instruction set are shown in Table 2.
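The classification above can be written down as a small lookup table, sketched here in Python. The mnemonics and their grouping are taken from the text; the dictionary layout and the helper name `instruction_class` are illustrative assumptions.

```python
from typing import Dict, List

# Mnemonics grouped by the three functional classes named in the text.
INSTRUCTION_CLASSES: Dict[str, List[str]] = {
    "control": ["NOP", "HALT", "BZ", "BNZ", "BN", "BNN"],
    "operation": ["ADDI", "SUBI", "CMP", "JUMPI", "ADDF", "SUBF", "MULF"],
    "data_movement": ["MOV"],
}


def instruction_class(mnemonic: str) -> str:
    """Return the functional class of a mnemonic, or raise KeyError if unknown."""
    for name, members in INSTRUCTION_CLASSES.items():
        if mnemonic in members:
            return name
    raise KeyError(mnemonic)


TOTAL_INSTRUCTIONS = sum(len(members) for members in INSTRUCTION_CLASSES.values())
```

The total of 14 instructions fits within the 16 encodings available to a 4-bit operation code.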

TABLE 2 instruction set specific format and operation schedule for computational fluid dynamics

In one embodiment, as shown in fig. 4, a computational fluid dynamics-oriented acceleration calculation method is provided, which is used for implementing computational fluid dynamics-oriented acceleration calculation by using any one of the acceleration calculation apparatuses described above; the method comprises the following steps:

Step 400: expanding the fluid equation to be solved in time and space using the finite difference method to obtain an iterative calculation formula.

Step 402: determining the number of differential operation units and transmission channels required according to the iterative calculation formula, and iterating in time to determine the number of program loops.

Step 404: combining all the differential operation units through the transmission channels, and setting the initialization data and storing them in the general registers of the differential operation units.

Step 406: setting the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of program loops to obtain a calculation program; the instructions in the instruction memory are used to implement different differential calculation formats.

Step 408: running the calculation program and outputting the differential operation results of all nodes in the flow field at different moments.

In a specific embodiment, first, a dedicated differential operation unit (DPE) and an instruction set are designed according to the computational characteristics of the finite difference method widely used in computational fluid dynamics; second, transmission channels (Channel) are provided for the operation units, so that data can be transmitted between adjacent differential operation units for the differential operation; finally, a plurality of differential operation units are combined through the transmission channels to complete the differential operations of all nodes in the flow field in parallel. As shown in FIG. 5, taking a two-dimensional linear convection equation as an example, the specific process is as follows:

(1) According to the characteristics of the two-dimensional linear convection equation, the equation is expanded in space to determine the number of DPE units and transmission channels required, and iterated in time to determine the number of program loops.

The expression of the two-dimensional linear convection equation is:

$$\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} + c\,\frac{\partial u}{\partial y} = 0$$

Applying the finite difference method, with a forward difference in time and a backward difference in space, and rearranging gives the calculation formula for the velocity $u_{i,j}^{n+1}$:

$$u_{i,j}^{n+1} = u_{i,j}^{n} - c\,\frac{\Delta t}{\Delta x}\left(u_{i,j}^{n} - u_{i-1,j}^{n}\right) - c\,\frac{\Delta t}{\Delta y}\left(u_{i,j}^{n} - u_{i,j-1}^{n}\right)$$

where $u_{i,j}^{n}$ is the current flow field velocity, $u_{i,j}^{n+1}$ is the flow field velocity at the next moment, $t$ is time, $(x, y)$ are the two-dimensional space coordinates, $\Delta t$, $\Delta x$ and $\Delta y$ are the discretization step lengths, and $c$ is a known constant. The flow field velocity at the initial moment is set as:

[formula omitted in the source]

The boundary conditions are:

[formula omitted in the source]

The flow field velocity at any later time can therefore be calculated iteratively.

With the parameters set as [values omitted in the source], 6400 (80 rows × 80 columns) DPEs are used to calculate the velocity of all discrete points in the two-dimensional space at once, for a total of 100 iterations.
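For reference, the forward-in-time, backward-in-space scheme described above can be sketched as a sequential Python program, against which the parallel DPE array could in principle be checked. This is a software sketch only, not the hardware method. The function names, the value of the constant c, the step sizes, and the square shape of the initial high-velocity region are all assumptions, because the corresponding values are not reproduced in this text.

```python
from typing import List

Grid = List[List[float]]


def upwind_step(u: Grid, c: float, dt: float, dx: float, dy: float) -> Grid:
    """Advance one time step with a forward difference in time and backward
    differences in space. Nodes on the array edge keep their values."""
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (u[i][j]
                         - c * dt / dx * (u[i][j] - u[i - 1][j])
                         - c * dt / dy * (u[i][j] - u[i][j - 1]))
    return new


def simulate(u: Grid, steps: int, c: float, dt: float, dx: float, dy: float) -> Grid:
    """Apply upwind_step repeatedly and return the final field."""
    for _ in range(steps):
        u = upwind_step(u, c, dt, dx, dy)
    return u


# Illustrative set-up; every numeric value below is an assumption.
N = 80
dx = dy = 2.0 / (N - 1)
dt = 0.2 * dx
field = [[2.0 if 19 <= i <= 39 and 19 <= j <= 39 else 1.0 for j in range(N)]
         for i in range(N)]
result = simulate(field, 100, c=1.0, dt=dt, dx=dx, dy=dy)
```

Each interior node reads only its own value and the values at indices (i-1, j) and (i, j-1), which matches the observation that only two of the four transmission channels are needed for this equation.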

(2) The 6400 DPEs (rows 0-79 × columns 0-79) are combined through the transmission channels, and the initialization data are set and stored in the general registers as follows:

The flow field velocity at the initial moment is stored in gr1. The value of gr1 in the DPEs of rows 19 to 39 is set to 2 (32'b0_10000000_00000000000000000000000, i.e., converted to 32-bit single-precision floating point; the same applies below), and the value of gr1 in the remaining DPEs is 1 (32'b0_01111111_00000000000000000000000). The flow field speeds stored in gr2 of all DPEs are set to 1 (32'b0_01111111_00000000000000000000000) ([formula omitted in the source]).

In addition, gr3 of all DPEs stores the number of iterations, 32'd100 (no floating-point arithmetic is required). The two registers gr56 and gr57 of the DPEs in all columns of row 0, all columns of row 79, all rows of column 0 and all rows of column 79 are constrained to 1 (32'b0_01111111_00000000000000000000000). Because the difference scheme of the two-dimensional linear convection equation uses only the data of the upstream neighbors, only the two communication registers cr2 and cr3 are used here; cr0 and cr1 are not used.
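The single-precision bit patterns quoted above can be checked with a short Python sketch using the standard `struct` module. The helper name `float32_bits` and the underscore-separated output style are illustrative choices.

```python
import struct


def float32_bits(value: float) -> str:
    """Format a value as a 32-bit literal split into sign, exponent and mantissa."""
    # Pack as big-endian IEEE 754 single precision, then reread as an unsigned integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return "32'b{}_{}_{}".format(bits[0], bits[1:9], bits[9:])


initial_high = float32_bits(2.0)  # value written to gr1 in the high-velocity rows
initial_low = float32_bits(1.0)   # value written to gr1 elsewhere and at the boundary
```

The value 2.0 has a biased exponent of 128 (10000000) and the value 1.0 has a biased exponent of 127 (01111111); in both cases the 23-bit mantissa is all zeros.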

(3) The instructions in the instruction memory are set; the pseudocode of the instructions set in the instruction memory is shown in Table 3.

Table 3: pseudo code for instruction setup in instruction memory

(4) The program is run and the result is output: the differential operation results of all nodes in the flow field at different moments are output.

In conclusion, the accelerated computing device and method for computational fluid dynamics of the invention have the advantages of a simple hardware structure, high computing energy efficiency, flexibility, and programmability, and can accelerate computational fluid dynamics. Taking the implementation of the two-dimensional linear convection equation as an example, 6400 DPE operation units are combined, the program length is 22 instructions (the loop length is 19), the running time is 1908 clock cycles, and each DPE uses 11 general registers and 2 communication registers. The size of the combined DPE array can be changed to suit the characteristics of the CFD algorithm, and different differential calculation formats can be implemented by changing the instructions in the instruction memory, so the device is flexible and programmable.

It should be understood that, although the steps in the flowchart of FIG. 4 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order illustrated and may be performed in other orders. Moreover, at least some of the steps in FIG. 4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An accelerated computing device oriented to computational fluid dynamics, the accelerated computing device comprising: the special differential operation units are used for executing a program designed by adopting an instruction set according to the fluid mechanics problem to be solved and finishing node differential operation in a flow field;

2. The accelerated computing apparatus according to claim 1, wherein the differential operation unit further includes:

the instruction controller is used for controlling the address of the instruction to be executed;

the instruction memory is used for storing instructions to be executed;

a plurality of general purpose registers for storing register data;

3. The accelerated computing apparatus of claim 2, the instruction controller comprising: an increment-by-one adder and a two-to-one multiplexer.

4. The apparatus of claim 2, wherein the differential operation unit executes an instruction in four clock cycles: instruction fetch, decode, execute, and write-back;

in the decoding stage, the instruction sent into the instruction register is decoded, corresponding operands are extracted from the instruction according to the operation code, and the two extracted operands are placed into two temporary registers;

in the execution stage, two temporary registers are operated in the arithmetic logic operation unit according to the operation code, the operation result is stored in a third temporary register, and the value of a flag register is set according to the instruction function and the operation result; the value of the flag register is used for judging whether the next instruction is executed in sequence or in jump by the instruction controller;

and in the write-back stage, judging whether the value of the general register needs to be modified or not according to the operation code and the operation result, and if so, storing the value of a third temporary register in the corresponding position of the general register.

5. The accelerated computing apparatus according to claim 2, wherein the differential operation unit includes 4 transmission channels, and the transmission channels are obtained by configuring a communication register in the differential operation unit;

6. The device of claim 2, wherein the number of the general purpose registers in the differential operation unit is 60, and the width of the general purpose registers is 64 bits.

7. An accelerated computing apparatus according to claim 1, wherein the instruction set employed by the differential operation unit is set according to a characteristic of a computational fluid dynamics algorithm;

8. The accelerated computing apparatus of claim 7, wherein the instruction set is classified into three types according to operand, comprising: register type, immediate type, and hybrid type.

9. An accelerated computing device according to claim 7, wherein the set of instructions is classified according to the function of the instruction, the set of instructions comprising: control instructions, operation instructions and data transfer instructions;

wherein: the control class instructions include no-operation instructions, stop instructions, and branch jump instructions; the operation class instructions include a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction, and a floating-point register multiplication instruction; the data movement class instructions include a move instruction.

10. A computational fluid dynamics-oriented acceleration computing method for implementing computational fluid dynamics-oriented acceleration computing using the acceleration computing device according to any one of claims 1 to 9; the method comprises the following steps:

expanding in time and space by adopting a finite difference method according to a fluid equation to be solved to obtain an iterative calculation formula;

determining the number of required differential operation units and transmission channels according to the iterative calculation formula, and iterating in time to determine the program cycle number;

all the differential operation units are combined together through the transmission channel, and initialization data are set and stored in a general register of the differential operation units;

setting instructions and an execution sequence in an instruction memory according to the iterative calculation formula and the program cycle number to obtain a calculation program; the instructions in the instruction memory are used for realizing different differential calculation formats;

and operating the calculation program, and outputting differential operation results of all nodes in the flow field at different moments.