CN115291949A - Accelerated computing device and accelerated computing method for computational fluid dynamics - Google Patents


Info

Publication number
CN115291949A
Authority
CN
China
Prior art keywords
instruction
differential operation
register
instructions
differential
Legal status
Granted
Application number
CN202211171216.5A
Other languages
Chinese (zh)
Other versions
CN115291949B (en)
Inventor
龚艳琼
刘必慰
赵玉新
黄东昌
郭阳
江豪龙
赖雯
王洁
杨益斌
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202211171216.5A
Publication of CN115291949A
Application granted
Publication of CN115291949B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/23Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/28Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3005Arrangements for executing specific machine instructions to perform operations for flow control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/08Fluids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14Force analysis or force optimisation, e.g. static or dynamic forces

Abstract

The application relates to the technical field of computational fluid dynamics and computers, and in particular to an accelerated computing device and an accelerated computing method for computational fluid dynamics. The accelerated computing device includes a plurality of dedicated differential operation units, which execute a program written in the device's instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field. Each differential operation unit includes a plurality of transmission channels, through which all the differential operation units are joined together to complete the differential operations for all nodes of the flow field in parallel. The device has a simple hardware structure. Because data transmission channels between the differential operation units carry the data needed for differential calculation, much of the latency of passing data through global memory is avoided; because the data memory is eliminated, the use of computing resources is greatly reduced. The device is also flexible and programmable.

Description

Accelerated computing device and accelerated computing method for computational fluid dynamics
Technical Field
The present application relates to the field of computational fluid dynamics and computer technologies, and in particular, to an accelerated computing apparatus and an accelerated computing method for computational fluid dynamics.
Background
Computational Fluid Dynamics (CFD) uses computers to solve the governing equations of fluid dynamics and can simulate the flow characteristics of complex flow fields relatively easily and accurately. Among the discretization methods used in CFD numerical simulation, the finite difference method is a typical one, and different difference schemes can be formed by combining temporal and spatial difference schemes. However, the geometries, numerical methods, and physical and chemical models required by CFD are becoming increasingly complex and fine-grained, which places extremely high demands on large-scale computation. Accurate simulation of real flows requires an enormous amount of computation; in particular, accurate numerical simulation of full-scale models is beyond current computing capability.
To accelerate CFD calculations, researchers in many countries have carried out extensive research, which has effectively promoted the development of CFD. However, because the computing power of computers is limited, achieving high-performance CFD on general-purpose computing hardware still faces serious challenges.
Disclosure of Invention
In view of the above, it is necessary to provide an accelerated computing device and an accelerated computing method for computational fluid dynamics that have a simple hardware structure, high computational efficiency, and flexibility and programmability.
An accelerated computing device for computational fluid dynamics comprises a plurality of dedicated differential operation units, which execute a program written in the device's instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field.
Each differential operation unit includes a plurality of transmission channels, which join adjacent differential operation units together so that the differential operations for all nodes of the flow field are completed in parallel.
Further, each differential operation unit also includes:
an instruction controller, which controls the address of the instruction to be executed;
an instruction memory, which stores the instructions to be executed;
a plurality of general purpose registers, which store register data; and
an arithmetic logic unit, which performs operations on the operands.
Further, the instruction controller includes an increment-by-one adder and a two-to-one multiplexer.
Further, the differential operation unit executes each instruction in four clock cycles: instruction fetch, decode, execute, and write-back.
In the fetch stage, an instruction is read from the instruction memory at the address given by the instruction controller and placed in the instruction register.
In the decode stage, the instruction in the instruction register is decoded, the operands are extracted from the instruction according to the opcode, and the two extracted operands are placed in two temporary registers.
In the execute stage, the arithmetic logic unit operates on the two temporary registers according to the opcode, stores the result in a third temporary register, and sets the flag register according to the instruction function and the result; the instruction controller uses the flag register to decide whether the next instruction is executed in sequence or a jump is taken.
In the write-back stage, the opcode and the result determine whether a general purpose register must be modified; if so, the value of the third temporary register is stored in the corresponding general purpose register.
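The instruction controller's part in the last two stages can be sketched in a few lines of Python. This is a minimal model under stated assumptions: flag = 1 is taken to mean "jump", the jump target is assumed to arrive in the third temporary register (called reg_c here, following the detailed embodiment), and wrapping the program counter at 128 entries is a hypothetical choice based on the 128 × 16-bit instruction memory described in the detailed description.

```python
IMEM_DEPTH = 128  # instruction memory holds 128 16-bit instructions


def next_pc(pc: int, flag: int, reg_c: int) -> int:
    """Select the next instruction address (hypothetical model).

    The increment-by-one adder always produces pc + 1; the two-to-one
    multiplexer then picks the jump target in reg_c when flag == 1,
    otherwise the incremented value. The wrap-around is an assumption.
    """
    incremented = (pc + 1) % IMEM_DEPTH    # output of the +1 adder
    return reg_c if flag else incremented  # 2:1 multiplexer


print(next_pc(5, 0, 20), next_pc(5, 1, 20), next_pc(127, 0, 0))
```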
Further, each differential operation unit includes four transmission channels, which are formed by communication registers configured in the differential operation unit.
The four communication registers are located at the four edges of the differential operation unit and are connected to the same internal general purpose register; adjacent differential operation units exchange data by entering a transmission channel through the communication register nearest to them.
Furthermore, the number of general purpose registers in the differential operation unit is 60, and the width of the general purpose registers is 64 bits.
Further, the instruction set used by the differential operation unit is designed around the characteristics of computational fluid dynamics algorithms.
The instruction set uses a 16-bit RISC encoding format with three-address instructions: the opcode is 4 bits long, and there are two operands of 6 bits each.
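To make the field arithmetic concrete: 4 + 6 + 6 = 16 bits, and a 6-bit operand field can name 2^6 = 64 registers, enough to address the 60 general purpose registers. The Python sketch below packs and unpacks such a word. Placing the opcode in the top 4 bits followed by the two operand fields is an assumption, since the actual layout is given only in the Table 1 image.

```python
OPCODE_BITS = 4
OPERAND_BITS = 6
OPERAND_MASK = (1 << OPERAND_BITS) - 1  # 0x3F


def encode(opcode: int, op1: int, op2: int) -> int:
    """Pack opcode | op1 | op2 into one 16-bit word (assumed field order)."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 4 bits")
    if not (0 <= op1 <= OPERAND_MASK and 0 <= op2 <= OPERAND_MASK):
        raise ValueError("operands must fit in 6 bits")
    return (opcode << (2 * OPERAND_BITS)) | (op1 << OPERAND_BITS) | op2


def decode(word: int) -> tuple:
    """Split a 16-bit word back into (opcode, op1, op2)."""
    return (
        (word >> (2 * OPERAND_BITS)) & ((1 << OPCODE_BITS) - 1),
        (word >> OPERAND_BITS) & OPERAND_MASK,
        word & OPERAND_MASK,
    )


word = encode(0b1010, 59, 3)  # hypothetical opcode, operands gr59 and gr3
print(hex(word), decode(word))
```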
Further, by operand kind, the instruction set is divided into three types: register type, immediate type, and mixed type.
Further, by instruction function, the instruction set includes control instructions, operation instructions, and data-movement instructions, where:
the control instructions include a no-operation instruction, a stop instruction, and branch instructions; the operation instructions include a fixed-point add-immediate instruction, a fixed-point subtract-immediate instruction, a fixed-point compare instruction, a jump-immediate instruction, a floating-point register add instruction, a floating-point register subtract instruction, and a floating-point register multiply instruction; and the data-movement instructions include a move instruction.
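The three instruction classes can be written down as a Python enumeration using the mnemonics given in the detailed description (NOP, HALT, BZ, and so on). The numeric opcode values below are hypothetical, since the real assignments appear only in the Table 2 image; the point of the sketch is that the 14 instructions fit in a 4-bit opcode field, which offers 16 codes.

```python
from enum import IntEnum


class Op(IntEnum):
    """Opcodes with hypothetical numeric values (real ones are in Table 2)."""
    # Control class
    NOP = 0
    HALT = 1
    BZ = 2
    BNZ = 3
    BN = 4
    BNN = 5
    # Operation class
    ADDI = 6
    SUBI = 7
    CMP = 8
    JUMPI = 9
    ADDF = 10
    SUBF = 11
    MULF = 12
    # Data-movement class
    MOV = 13


INSTRUCTION_CLASS = {
    "control": [Op.NOP, Op.HALT, Op.BZ, Op.BNZ, Op.BN, Op.BNN],
    "operation": [Op.ADDI, Op.SUBI, Op.CMP, Op.JUMPI, Op.ADDF, Op.SUBF, Op.MULF],
    "data_movement": [Op.MOV],
}

# 14 opcodes, all representable in the 4-bit opcode field
print(len(Op), max(Op) < 2 ** 4)
```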
An accelerated computing method for computational fluid dynamics uses the above accelerated computing device and comprises the following steps:
expanding the fluid equation to be solved in time and space with the finite difference method to obtain an iterative calculation formula;
determining the number of differential operation units and transmission channels required from the iterative calculation formula, and determining the number of program loop iterations from the time iteration;
joining all the differential operation units together through the transmission channels, and setting initialization data and storing it in the general purpose registers of the differential operation units;
setting the instructions in the instruction memory and their execution order according to the iterative calculation formula and the number of loop iterations to obtain a calculation program, where different instructions in the instruction memory implement different difference schemes; and
running the calculation program and outputting the differential operation results for all nodes of the flow field at different times.
In the above accelerated computing device and accelerated computing method for computational fluid dynamics, the device includes a plurality of dedicated differential operation units, which execute a program written in the device's instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field; each differential operation unit includes a plurality of transmission channels, through which all the differential operation units are joined together to complete the differential operations for all nodes of the flow field in parallel. The device has a simple hardware structure: the transmission channels between the differential operation units avoid much of the latency of passing data through global memory, eliminating the data memory greatly reduces the use of computing resources, and the device is flexible and programmable.
Drawings
FIG. 1 is a schematic diagram of an accelerated computational device oriented to computational fluid dynamics in one embodiment;
FIG. 2 is a diagram of a computational fluid dynamics oriented arithmetic unit in one embodiment;
FIG. 3 is a schematic diagram of a computational fluid dynamics oriented transport channel in one embodiment;
FIG. 4 is a schematic flow chart of a computational fluid dynamics oriented acceleration calculation method according to another embodiment;
FIG. 5 is a schematic structural diagram of an accelerated computing device for solving two-dimensional linear convection equations for computational fluid dynamics in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, an accelerated computing device for computational fluid dynamics is provided, comprising a plurality of dedicated differential operation units, which execute a program written in the device's instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field. The differential operation units (DPEs) are designed around the computational characteristics of the finite difference method widely used in computational fluid dynamics, and their number is determined by how the governing equation of the problem to be solved is expanded in space.
Each differential operation unit includes a plurality of transmission channels, which join adjacent differential operation units together so that the differential operations for all nodes of the flow field are completed in parallel. Many dedicated operation units can be joined through the transmission channels; the size of the resulting array can be adapted to the characteristics of the computational fluid dynamics algorithm, and different difference schemes can be implemented by changing the instructions in the instruction memory, which makes the device flexible and programmable.
The instruction set is designed for the finite difference method used in computational fluid dynamics and is the minimal set able to carry out the differential operations. It uses a 16-bit RISC encoding format, so all instructions are encoded with the same 16-bit length, and the instruction memory holds 128 × 16 bits (128 instructions).
In the above accelerated computing device for computational fluid dynamics, the device includes a plurality of dedicated differential operation units, which execute a program written in the device's instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field; each differential operation unit includes a plurality of transmission channels, through which all the differential operation units are joined together to complete the differential operations for all nodes of the flow field in parallel. The device has a simple hardware structure: the transmission channels between the differential operation units avoid much of the latency of passing data through global memory, eliminating the data memory greatly reduces the use of computing resources, and the device is flexible and programmable.
Further, the differential operation unit also includes: an instruction controller, which controls the address of the instruction to be executed; an instruction memory, which stores the instructions to be executed; a plurality of general purpose registers (gr), which store register data; and an arithmetic logic unit, which performs operations on the operands.
The hardware structure is simple: the data memory is eliminated, and each DPE contains only four parts, namely the instruction controller, the instruction memory, the general purpose registers, and the arithmetic logic unit.
Further, the instruction controller includes an increment-by-one adder and a two-to-one multiplexer.
Further, the differential operation unit executes each instruction in four clock cycles: instruction fetch, decode, execute, and write-back. In the fetch stage, an instruction is read from the instruction memory at the address given by the instruction controller and placed in the instruction register. In the decode stage, the instruction in the instruction register is decoded, the operands are extracted according to the opcode, and the two extracted operands are placed in two temporary registers. In the execute stage, the arithmetic logic unit operates on the two temporary registers according to the opcode, stores the result in a third temporary register, and sets the flag register according to the instruction function and the result; the instruction controller uses the flag register to decide whether the next instruction is executed in sequence or a jump is taken. In the write-back stage, the opcode and the result determine whether a general purpose register must be modified; if so, the value of the third temporary register is stored in the corresponding general purpose register.
In a specific embodiment, as shown in FIG. 2, the differential operation unit (DPE) for computational fluid dynamics comprises four parts: an instruction controller, an instruction memory, general purpose registers, and an arithmetic logic unit. The instruction controller controls the address of the instruction to be executed and consists of an increment-by-one adder and a two-to-one multiplexer; the instruction memory stores the instructions to be executed; the general purpose registers store register data and have a bit width of 32 bits and a depth of 60; and the arithmetic logic unit performs operations on the operands. Executing an instruction takes four clock cycles, namely instruction fetch (IF), decode (ID), execute (EX), and write-back (WB), as follows:
1) Instruction fetch (IF):
In the fetch stage, an instruction is read from the instruction memory at the address held in the program counter (PC) and placed in the instruction register (ID_ir). At the same time, the PC value for the next cycle is set, so that instructions are either executed in sequence or execution jumps to a specific address.
2) Decode (ID):
The decode stage decodes the instruction, extracts the operands according to the instruction function (i.e., the opcode), and places them in register A (reg_a) and register B (reg_b).
3) Execute (EX):
The EX stage operates on reg_a and reg_b in the arithmetic logic unit (ALU) according to the instruction function, stores the result in register C (reg_c), and sets the flag register (flag) to 0 or 1 according to the instruction function and the result. The instruction controller uses the flag to decide whether the next instruction is executed in sequence or a jump is taken, i.e., whether the PC in the next clock cycle takes the value in reg_c or is incremented by one.
4) Write-back (WB):
The WB stage determines, from the instruction function and the EX-stage result, whether and how a register is modified; if so, the value of reg_c is stored in the corresponding general purpose register. This stage only takes effect for instructions that modify register values.
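The four stages can be modeled with a small, non-pipelined Python interpreter. The instruction semantics here are assumptions made for illustration: instructions are (mnemonic, field1, field2) tuples rather than encoded words, field1 is taken as both the first source and the destination, only a handful of the 14 instructions are modeled (others such as CMP fall through as no-ops), branch targets are passed through reg_c, and Python floats are double precision whereas the DPE registers hold 32-bit single-precision values.

```python
from dataclasses import dataclass, field

IMMEDIATE_FIELD2 = {"ADDI", "SUBI", "BNZ"}  # field2 is a literal, not a register
WRITES_BACK = {"ADDI", "SUBI", "ADDF", "SUBF", "MULF"}


@dataclass
class DPE:
    imem: list  # program: (mnemonic, field1, field2) tuples
    gr: list = field(default_factory=lambda: [0.0] * 60)  # gr0-gr59
    pc: int = 0
    cycles: int = 0

    def step(self) -> bool:
        """Run one instruction through IF, ID, EX and WB; False on HALT."""
        self.cycles += 4  # one cycle per stage, no overlap between instructions
        # IF: read the instruction addressed by the PC into the instruction register
        op, f1, f2 = self.imem[self.pc]
        if op == "HALT":
            return False
        # ID: extract the two operands into reg_a and reg_b
        reg_a = self.gr[f1]
        reg_b = f2 if op in IMMEDIATE_FIELD2 else self.gr[f2]
        # EX: the ALU writes reg_c; flag tells the controller whether to jump
        reg_c, flag = 0, 0
        if op in ("ADDI", "ADDF"):
            reg_c = reg_a + reg_b
        elif op in ("SUBI", "SUBF"):
            reg_c = reg_a - reg_b
        elif op == "MULF":
            reg_c = reg_a * reg_b
        elif op == "BNZ":
            reg_c, flag = reg_b, int(reg_a != 0)  # jump target travels in reg_c
        # WB: only arithmetic instructions update a general purpose register
        if op in WRITES_BACK:
            self.gr[f1] = reg_c
        # Next PC: two-to-one multiplexer between reg_c and pc + 1
        self.pc = reg_c if flag else self.pc + 1
        return True

    def run(self) -> None:
        while self.step():
            pass


# Hypothetical program: add gr2 to gr1 three times, counting down in gr3.
program = [("ADDF", 1, 2), ("SUBI", 3, 1), ("BNZ", 3, 0), ("HALT", 0, 0)]
dpe = DPE(program)
dpe.gr[1], dpe.gr[2], dpe.gr[3] = 1.0, 0.5, 3
dpe.run()
print(dpe.gr[1], dpe.gr[3], dpe.cycles)
```

Because this model charges four cycles per instruction, it would need at least 19 × 100 × 4 = 7,600 cycles for the 19-instruction loop run 100 times in the later two-dimensional example, far more than the 1,908 cycles reported there. The hardware therefore presumably overlaps the stages of consecutive instructions, which this sketch does not attempt to model.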
Further, each differential operation unit includes four transmission channels, formed by communication registers configured in the unit; the four communication registers are located at the four edges of the unit and are connected to the same internal general purpose register, and adjacent differential operation units exchange data by entering a transmission channel through the communication register nearest to them.
Specifically, as shown in FIG. 3, each dedicated differential operation unit (DPE) has four communication registers (cr), located at its four edges, which form four transmission channels. The four communication registers (cr0-cr3) are connected to internal general register 55 (gr55), and gr58 stores the value on which the DPE performs the differential operation. An adjacent DPE enters the transmission channel through the communication register nearest to it for data communication; the communication registers are 32 bits wide. Each DPE is given four input ports connected to internal general registers 56-59 (gr56-gr59) and four output ports connected respectively to the four communication registers. Taking DPE_X_Y as an example, where _X_Y denotes the position of the DPE, DPE_X_Y is connected through these added input and output ports to its four adjacent DPEs, so that it can acquire data from the adjacent points and transmit data to them.
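A minimal Python sketch of one data-exchange step across a DPE array follows. It assumes that each DPE publishes a single value through its communication registers and that each neighbor's value lands in one of the input-port registers gr56-gr59. Which neighbor maps to which register is a hypothetical choice, since the text does not specify it, and DPEs on the array edge receive a fixed boundary value in place of a missing neighbor.

```python
# Hypothetical mapping: (row offset, col offset) of the sending neighbour
# -> index of the receiving DPE's input-port general register.
INPUT_PORTS = {(-1, 0): 56, (0, -1): 57, (1, 0): 58, (0, 1): 59}


def exchange(values, boundary=1.0):
    """One channel transfer step over an R x C DPE array.

    values[i][j] is the number DPE_i_j places in its communication registers.
    Returns, per DPE, a dict {gr index: received value}.
    """
    rows, cols = len(values), len(values[0])
    received = [[{} for _ in range(cols)] for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for (di, dj), gr in INPUT_PORTS.items():
                ni, nj = i + di, j + dj
                inside = 0 <= ni < rows and 0 <= nj < cols
                received[i][j][gr] = values[ni][nj] if inside else boundary
    return received


grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
rx = exchange(grid)
print(rx[1][1])  # centre DPE sees all four neighbours
print(rx[0][0])  # corner DPE gets the boundary value on two sides
```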
Furthermore, the number of general purpose registers in the differential operation unit is 60, and the width of the general purpose registers is 64 bits.
Further, the instruction set used by the differential operation unit is designed around the characteristics of computational fluid dynamics algorithms; it uses a 16-bit RISC encoding format with three-address instructions, a 4-bit opcode, and two 6-bit operands.
Further, by operand kind, the instruction set is divided into three types: register type (R type), immediate type (I type), and mixed type (RI type). The encoding format of the instruction set for computational fluid dynamics is shown in Table 1.
Table 1: Instruction set encoding format for computational fluid dynamics
[Table 1 is an image in the original publication and is not reproduced here.]
Further, by instruction function, the instruction set includes control instructions, operation instructions, and data-movement instructions, where the control instructions include a no-operation instruction, a stop instruction, and branch instructions; the operation instructions include a fixed-point add-immediate instruction, a fixed-point subtract-immediate instruction, a fixed-point compare instruction, a jump-immediate instruction, a floating-point register add instruction, a floating-point register subtract instruction, and a floating-point register multiply instruction; and the data-movement instructions include a move instruction.
Specifically, 14 machine instructions in these three classes are defined and implemented for the characteristics of computational fluid dynamics algorithms. Control instructions: no-operation (NOP), stop (HALT), and branches (BZ, BNZ, BN, BNN). Operation instructions: fixed-point add immediate (ADDI), fixed-point subtract immediate (SUBI), fixed-point compare (CMP), jump immediate (JUMPI), floating-point register add (ADDF), floating-point register subtract (SUBF), and floating-point register multiply (MULF). Data-movement instruction: move (MOV). The specific format and operation of each instruction are shown in Table 2.
Table 2: Specific format and operation of the instruction set for computational fluid dynamics
[Table 2 is an image in the original publication and is not reproduced here.]
In one embodiment, as shown in FIG. 4, an accelerated computing method for computational fluid dynamics is provided, which uses any of the accelerated computing devices described above and comprises the following steps:
Step 400: expand the fluid equation to be solved in time and space with the finite difference method to obtain an iterative calculation formula.
Step 402: determine the number of differential operation units and transmission channels required from the iterative calculation formula, and determine the number of program loop iterations from the time iteration.
Step 404: join all the differential operation units together through the transmission channels, and set initialization data and store it in the general purpose registers of the differential operation units.
Step 406: set the instructions in the instruction memory and their execution order according to the iterative calculation formula and the number of loop iterations to obtain a calculation program; different instructions in the instruction memory implement different difference schemes.
Step 408: run the calculation program and output the differential operation results for all nodes of the flow field at different times.
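Steps 400 and 402 amount to reading the required resources off the discretized update formula. The hypothetical Python helper below illustrates this: given the grid size, the neighbor offsets that the update formula reads, and the number of time steps, it returns one DPE per grid node, the channel directions that must be active, and the program loop count. The direction names and the function itself are illustrative assumptions, not part of the described method.

```python
DIRECTIONS = {(-1, 0): "north", (1, 0): "south", (0, -1): "west", (0, 1): "east"}


def plan_array(rows: int, cols: int, stencil, time_steps: int):
    """Return (number of DPEs, sorted channel directions, program loop count)."""
    channels = set()
    for offset in stencil:
        if offset == (0, 0):
            continue  # the node's own value needs no channel
        if offset not in DIRECTIONS:
            raise ValueError(f"{offset} is not a nearest neighbour; only 4 channels exist")
        channels.add(DIRECTIONS[offset])
    return rows * cols, sorted(channels), time_steps


# Backward-in-space 2-D stencil: the node itself, (i-1, j) and (i, j-1).
print(plan_array(80, 80, [(0, 0), (-1, 0), (0, -1)], 100))
```

For a backward-in-space stencil, only two of the four channel directions come out as needed, which matches the embodiment below using two of the four communication registers.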
In a specific embodiment, a dedicated differential operation unit (DPE) and its instruction set are first designed around the computational characteristics of the finite difference method widely used in computational fluid dynamics; next, transmission channels (Channel) are provided for the operation units so that data can be passed between adjacent differential operation units for the differential operations; finally, many differential operation units are joined through the transmission channels to complete the differential operations for all nodes of the flow field in parallel. As shown in FIG. 5, taking the two-dimensional linear convection equation as an example, the procedure is as follows:
(1) Based on the characteristics of the two-dimensional linear convection equation, expand it in space to determine the number of DPE units and transmission channels needed, and iterate in time to determine the number of program loop iterations.
The two-dimensional linear convection equation is:
[Equation image: two-dimensional linear convection equation, not reproduced here]
Discretizing it with the finite difference method, forward in time and backward in space, and rearranging gives the formula for the flow-field velocity at the next time step:
[Equation image: velocity update formula, not reproduced here]
where the symbols in the equation images denote, in order, the current flow-field velocity, the flow-field velocity at the next time step, time, the two-dimensional spatial coordinates, and the discretization step sizes, which are known constants. The flow-field velocity at the initial time is set as:
[Equation image: initial condition, not reproduced here]
and the boundary conditions are:
[Equation image: boundary conditions, not reproduced here]
The flow-field velocity at any later time is therefore computed iteratively.
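The equation images are not reproduced in this text version. Assuming the standard textbook form of the two-dimensional linear convection equation with a constant convection speed c (an assumption; the exact form in the original images cannot be recovered here), the forward-in-time, backward-in-space discretization described above would read as follows, with i and j indexing the x and y grid points and n the time step:

```latex
\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} + c\,\frac{\partial u}{\partial y} = 0
\\[4pt]
\frac{u_{i,j}^{n+1}-u_{i,j}^{n}}{\Delta t}
 + c\,\frac{u_{i,j}^{n}-u_{i-1,j}^{n}}{\Delta x}
 + c\,\frac{u_{i,j}^{n}-u_{i,j-1}^{n}}{\Delta y} = 0
\\[4pt]
u_{i,j}^{n+1} = u_{i,j}^{n}
 - \frac{c\,\Delta t}{\Delta x}\left(u_{i,j}^{n}-u_{i-1,j}^{n}\right)
 - \frac{c\,\Delta t}{\Delta y}\left(u_{i,j}^{n}-u_{i,j-1}^{n}\right)
```

Under this form, each node's update needs only its own value and the values at (i-1, j) and (i, j-1), which is consistent with the embodiment using only two of the four communication registers.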
With the parameters set as:
[Equation image: parameter settings, not reproduced here]
6,400 DPEs (80 rows × 80 columns) compute the velocity at all discrete points of the two-dimensional domain at each time step, for a total of 100 iterations.
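A NumPy reference implementation is a convenient way to check the outputs of the DPE array. The sketch below applies the forward-in-time, backward-in-space update on an 80 × 80 grid for 100 steps. The standard equation form with a single constant c is assumed, and so are the values c = 1, Δx = Δy = 0.025, Δt = 0.005 and the placement of the u = 2 region, because the actual parameters are given only in equation images. float32 is used to mirror the DPE's single-precision registers.

```python
import numpy as np


def ftbs_2d(u0, c=1.0, dt=0.005, dx=0.025, dy=0.025, steps=100, boundary=1.0):
    """Forward-time, backward-space update of 2-D linear convection.

    Axis 0 is taken as i (x direction) and axis 1 as j (y direction).
    All parameter defaults are assumptions, not values from the source.
    """
    u = np.array(u0, dtype=np.float32)  # copy in single precision
    sx, sy = c * dt / dx, c * dt / dy
    for _ in range(steps):
        un = u.copy()  # values at time step n
        u[1:, 1:] = (un[1:, 1:]
                     - sx * (un[1:, 1:] - un[:-1, 1:])   # uses u[i-1, j]
                     - sy * (un[1:, 1:] - un[1:, :-1]))  # uses u[i, j-1]
        # Fixed boundary rows and columns (u = 1 in the embodiment)
        u[0, :] = boundary
        u[-1, :] = boundary
        u[:, 0] = boundary
        u[:, -1] = boundary
    return u


u0 = np.ones((80, 80), dtype=np.float32)
u0[19:40, 19:40] = 2.0  # assumed square region holding u = 2
u = ftbs_2d(u0)
print(u.shape, float(u.min()), float(u.max()))
```

With these assumed step sizes, c·Δt/Δx + c·Δt/Δy = 0.4 ≤ 1, so each update is a convex combination of neighboring values and the solution stays between 1 and 2.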
(2) The 6,400 operation units (DPEs), in rows 0-79 and columns 0-79, are joined together through the transmission channels, and initialization data is set and stored in the general purpose registers as follows:
The 6,400 DPEs (rows 0-79 × columns 0-79) are joined together through the transmission channels, and the initial flow-field velocity is stored in gr1. The value of gr1 in the DPEs of rows 19 to 39 is set to 2 (32'b0_10000000_00000000000000000000000 in 32-bit single-precision floating point; the same notation is used below), and the value of gr1 in all other DPEs is 1 (32'b0_01111111_00000000000000000000000). The flow-field velocity stored in gr2 of all DPEs is set to 1 (32'b0_01111111_00000000000000000000000) [two equation images in the original are not reproduced here]. In addition, gr3 of every DPE stores the iteration count 32'd100, which requires no floating-point arithmetic. The two registers gr56 and gr57 of the DPEs in all columns of row 0, all columns of row 79, all rows of column 0, and all rows of column 79 are fixed at 1 (32'b0_01111111_00000000000000000000000). Because the difference scheme of the two-dimensional linear convection equation only uses data from the preceding (upstream) nodes, only the two communication registers cr2 and cr3 are used here; cr0 and cr1 are not used.
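The bit patterns in this step are IEEE 754 single-precision values written as sign_exponent_mantissa. A short Python check using the standard struct module confirms that 32'b0_10000000_000...0 is 2.0 and 32'b0_01111111_000...0 is 1.0; the helper name is illustrative.

```python
import struct


def float32_from_fields(sign: str, exponent: str, mantissa: str) -> float:
    """Interpret sign (1 bit), exponent (8 bits), mantissa (23 bits) as float32."""
    if (len(sign), len(exponent), len(mantissa)) != (1, 8, 23):
        raise ValueError("expected 1 + 8 + 23 bits")
    word = int(sign + exponent + mantissa, 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


ZERO23 = "0" * 23
print(float32_from_fields("0", "10000000", ZERO23))  # 2.0 (exponent 128 - 127 = 1)
print(float32_from_fields("0", "01111111", ZERO23))  # 1.0 (exponent 127 - 127 = 0)
```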
(3) Set the instructions in the instruction memory; the pseudocode of these instructions is shown in Table 3.
Table 3: Pseudocode for the instruction setup in the instruction memory
[Table 3 image: instruction pseudocode]
(4) Run the program and output the result: the differential operation results of all nodes in the flow field at different times.
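The four steps can be mimicked in software to see what the DPE array computes. The sketch below is a plain-Python model under stated assumptions, not the DPE program of Table 3. Each grid entry stands for one DPE's gr1. The update is a first-order backward (upwind) difference scheme, assumed here because the original equations are images. The Courant numbers `CX` and `CY` (c·Δt/Δx and c·Δt/Δy) are hypothetical values. The band of rows 19 to 39 initialized to 2 spans all columns, following the text literally, and the boundary DPEs are then held at 1 as the boundary conditions require.

```python
# Hypothetical software model of the 80 x 80 DPE array for the 2D linear
# convection example. Each grid entry plays the role of one DPE's gr1.
N = 80          # 80 rows x 80 columns = 6400 DPEs
STEPS = 100     # iteration count held in gr3 (32'd100)
CX = CY = 0.2   # assumed Courant numbers c*dt/dx and c*dt/dy (not from the source)

def apply_boundary(u):
    # Row 0, row 79, column 0 and column 79 are held at 1.0.
    for k in range(N):
        u[0][k] = 1.0
        u[N - 1][k] = 1.0
        u[k][0] = 1.0
        u[k][N - 1] = 1.0

def initial_field():
    u = [[1.0] * N for _ in range(N)]
    for r in range(19, 40):      # rows 19..39 start at 2.0
        u[r] = [2.0] * N
    apply_boundary(u)
    return u

def step(u):
    # Every interior DPE reads its own value and its upstream neighbours
    # (r, c-1) and (r-1, c), all taken from the previous time level.
    new = [row[:] for row in u]
    for r in range(1, N - 1):
        for c in range(1, N - 1):
            new[r][c] = (u[r][c]
                         - CX * (u[r][c] - u[r][c - 1])
                         - CY * (u[r][c] - u[r - 1][c]))
    apply_boundary(new)          # re-impose the boundary values
    return new

def run():
    u = initial_field()
    history = [u]                # results of all nodes at every time level
    for _ in range(STEPS):
        u = step(u)
        history.append(u)
    return history

history = run()
print(len(history))  # 101 snapshots: t = 0 .. 100
```

With CX + CY ≤ 1 every update is a convex combination of old values, so the model stays within the initial range [1, 2].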
In conclusion, the accelerated computing device and method for computational fluid dynamics of the invention offer a simple hardware structure, high computational energy efficiency, flexibility and programmability, and can accelerate computational fluid dynamics. Taking the two-dimensional linear convection equation as an example, 6400 DPE operation units are combined; the program is 22 instructions long (the loop body is 19 instructions), the running time is 1908 clock cycles, and each DPE uses 11 general purpose registers and 2 communication registers. The size of the combined DPE array can be adapted to the characteristics of a given CFD algorithm, and different difference schemes can be realized by changing the instructions in the instruction memory, which makes the method flexible and programmable.
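Reading the reported figures together, and assuming each DPE updates its own grid point once per iteration, the array performs 6400 × 100 = 640,000 point updates in 1908 clock cycles:

```latex
\frac{6400 \times 100}{1908} = \frac{640\,000}{1908} \approx 335.4\ \text{point updates per clock cycle}
```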
It should be understood that, although the steps in the flowchart of Fig. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the order illustrated and may be performed in other orders. Moreover, at least some of the steps in Fig. 4 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (10)

1. An accelerated computing device oriented to computational fluid dynamics, comprising a plurality of dedicated differential operation units, which execute a program written with an instruction set according to the fluid mechanics problem to be solved and complete the differential operations of the nodes in a flow field;
each differential operation unit comprises a plurality of transmission channels, which connect adjacent differential operation units so that the differential operations of all nodes in the flow field are completed in parallel.
2. The accelerated computing apparatus according to claim 1, wherein the differential operation unit further includes:
an instruction controller, used to control the address of the instruction to be executed;
an instruction memory, used to store the instructions to be executed;
a plurality of general purpose registers, used to store register data;
and an arithmetic logic operation unit, used to perform logic operations on the operands.
3. The accelerated computing apparatus according to claim 2, wherein the instruction controller comprises an increment-by-one adder and a two-to-one multiplexer.
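Claim 3 describes the next-address logic only at block level. A minimal sketch follows, assuming that the multiplexer's select input is the flag register set in the execute stage and that the jump target comes from the branch instruction; both the wiring and the function name `next_pc` are assumptions, not taken from the claims.

```python
def next_pc(pc: int, take_branch: bool, target: int) -> int:
    """Hypothetical next-instruction-address logic of the instruction controller."""
    sequential = pc + 1                              # increment-by-one adder
    return target if take_branch else sequential     # two-to-one multiplexer

print(next_pc(5, False, 2))   # 6: continue sequentially
print(next_pc(21, True, 3))   # 3: jump back to the loop start
```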
4. The accelerated computing apparatus according to claim 2, wherein the differential operation unit executes each instruction in four clock cycles: fetch, decode, execute and write-back;
in the fetch stage, an instruction is read from the instruction memory at the address given by the instruction controller and sent to the instruction register;
in the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted from it according to the operation code, and the two extracted operands are placed in two temporary registers;
in the execute stage, the arithmetic logic operation unit operates on the two temporary registers according to the operation code, stores the result in a third temporary register, and sets the flag register according to the instruction function and the result; the instruction controller uses the flag register value to decide whether the next instruction is executed sequentially or by a jump;
in the write-back stage, whether a general purpose register needs to be modified is determined from the operation code and the result, and if so, the value of the third temporary register is stored in the corresponding general purpose register.
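The four stages can be illustrated with a small unpipelined model that walks one instruction through fetch, decode, execute and write-back. Everything concrete here is hypothetical: the opcode numbering, the mnemonics, the choice of field a as destination, and the branch-if-flag semantics are assumptions made for the sketch; only the stage structure, the temporaries, the flag register and the 60 general purpose registers come from the claims.

```python
from dataclasses import dataclass, field

# Hypothetical opcode numbering and semantics, for illustration only.
FADD, FSUB, FMUL, SUBI, CMP, BRF, HALT = range(7)
REG_REG = {FADD, FSUB, FMUL, CMP}   # both operand fields name registers

@dataclass
class DPE:
    imem: list                                            # instruction memory: (opcode, a, b)
    gr: list = field(default_factory=lambda: [0.0] * 60)  # 60 general purpose registers
    pc: int = 0                                           # address from the instruction controller
    flag: bool = False                                    # flag register
    halted: bool = False

    def step(self) -> None:
        # Fetch: read the instruction addressed by the instruction controller.
        op, a, b = self.imem[self.pc]
        # Decode: place the two operands in temporaries.
        t1 = a if op == BRF else self.gr[a]         # BRF uses field a as a target address
        t2 = self.gr[b] if op in REG_REG else b     # otherwise field b is an immediate
        # Execute: compute into a third temporary and set the flag register.
        t3 = None
        if op == FADD:
            t3 = t1 + t2
        elif op == FSUB:
            t3 = t1 - t2
        elif op == FMUL:
            t3 = t1 * t2
        elif op == SUBI:
            t3 = t1 - t2
        elif op == CMP:
            self.flag = t1 > t2
        elif op == HALT:
            self.halted = True
        # Write-back: only result-producing instructions modify a register.
        if t3 is not None:
            self.gr[a] = t3
        # Next address: jump on a taken branch, otherwise continue sequentially.
        self.pc = t1 if (op == BRF and self.flag) else self.pc + 1

    def run(self, max_steps: int = 10_000) -> None:
        for _ in range(max_steps):
            if self.halted:
                return
            self.step()
        raise RuntimeError("step limit reached")

# Example: add gr1 into gr0 three times, using gr3 as a loop counter.
program = [
    (FADD, 0, 1),   # gr0 = gr0 + gr1
    (SUBI, 3, 1),   # gr3 = gr3 - 1
    (CMP, 3, 4),    # flag = gr3 > gr4 (gr4 holds 0)
    (BRF, 0, 0),    # jump to address 0 while the flag is set
    (HALT, 0, 0),
]
dpe = DPE(imem=program)
dpe.gr[1], dpe.gr[3] = 2.5, 3
dpe.run()
print(dpe.gr[0])  # 7.5
```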
5. The accelerated computing apparatus according to claim 2, wherein the differential operation unit includes four transmission channels, each formed by configuring a communication register in the differential operation unit;
the four communication registers are located at the four edges of the differential operation unit and are connected to the same internal general purpose register; adjacent differential operation units exchange data by entering a transmission channel through the communication registers nearest to each other.
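The nearest-register rule can be modelled with one dictionary of edge registers per unit. This is a sketch under assumptions: the edge names, the functions `publish` and `read_neighbour`, and the use of a default value for units on the array border are all hypothetical, and the claims do not say which of cr0 to cr3 sits on which edge.

```python
# Hypothetical model of the four edge communication registers of each unit.
EDGES = ("north", "south", "west", "east")
OPPOSITE = {"north": "south", "south": "north", "west": "east", "east": "west"}
OFFSET = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}

def publish(values):
    """Each unit copies one internal value to all four of its edge registers."""
    return [[{e: v for e in EDGES} for v in row] for row in values]

def read_neighbour(cr, r, c, edge, default):
    """Value arriving at unit (r, c) through its `edge` side.

    The neighbour on that side drives its nearest register, i.e. the one on
    its opposite edge. Units on the array border have no neighbour there,
    so `default` is returned instead.
    """
    dr, dc = OFFSET[edge]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(cr) and 0 <= nc < len(cr[0]):
        return cr[nr][nc][OPPOSITE[edge]]
    return default

cr = publish([[1, 2], [3, 4]])
print(read_neighbour(cr, 1, 1, "north", None))  # 2
print(read_neighbour(cr, 1, 1, "west", None))   # 3
```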
6. The accelerated computing apparatus according to claim 2, wherein the differential operation unit has 60 general purpose registers, each 64 bits wide.
7. The accelerated computing apparatus according to claim 1, wherein the instruction set adopted by the differential operation unit is designed according to the characteristics of computational fluid dynamics algorithms;
the instruction set uses a 16-bit RISC instruction encoding format in which instructions have a three-address format, the operation code is 4 bits long, and there are two operands, each 6 bits long.
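The field widths add up exactly: 4 + 6 + 6 = 16 bits, and a 6-bit field can name 2^6 = 64 registers, which covers the 60 general purpose registers of claim 6. The sketch below packs and unpacks such a word. Placing the opcode in the top four bits is an assumption, as are the function names; the claims do not say how the third address of the three-address format is encoded, so only the listed fields are packed.

```python
OPCODE_BITS, OPERAND_BITS = 4, 6

def encode(opcode: int, op_a: int, op_b: int) -> int:
    """Pack a 4-bit opcode and two 6-bit operand fields into one 16-bit word."""
    if not 0 <= opcode < 1 << OPCODE_BITS:
        raise ValueError("opcode out of range")
    for value in (op_a, op_b):
        if not 0 <= value < 1 << OPERAND_BITS:
            raise ValueError("operand out of range")
    return (opcode << 12) | (op_a << 6) | op_b

def decode(word: int) -> tuple:
    """Split a 16-bit word back into (opcode, op_a, op_b)."""
    return (word >> 12) & 0xF, (word >> 6) & 0x3F, word & 0x3F

word = encode(3, 5, 63)
print(hex(word))     # 0x317f
print(decode(word))  # (3, 5, 63)
```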
8. The accelerated computing apparatus according to claim 7, wherein the instruction set is divided by operand type into three types: register type, immediate type and hybrid type.
9. The accelerated computing apparatus according to claim 7, wherein the instruction set is divided by instruction function into control instructions, operation instructions and data transfer instructions;
wherein: the control instructions include a null instruction, a stop instruction and a branch jump instruction; the operation instructions include a fixed-point add-immediate instruction, a fixed-point subtract-immediate instruction, a fixed-point compare instruction, an immediate jump instruction, a floating-point register add instruction, a floating-point register subtract instruction and a floating-point register multiply instruction; the data transfer instructions include a move instruction.
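Claim 9 lists 3 control, 7 operation and 1 data transfer instruction, 11 in total, which fits within the 16 codes of a 4-bit opcode field. The sketch below records that grouping. The mnemonics and the opcode numbers (assigned in listing order) are hypothetical; only the grouping comes from the claim.

```python
# Instruction classes from claim 9; mnemonics and opcode values are hypothetical.
INSTRUCTION_CLASSES = {
    "control": ["NOP", "HALT", "BR"],
    "operation": ["ADDI", "SUBI", "CMP", "JMPI", "FADD", "FSUB", "FMUL"],
    "data_transfer": ["MOV"],
}

OPCODES = {}
for group in INSTRUCTION_CLASSES.values():
    for mnemonic in group:
        OPCODES[mnemonic] = len(OPCODES)   # assign 0, 1, 2, ... in listing order

assert len(OPCODES) <= 2 ** 4, "must fit the 4-bit opcode field"
print(len(OPCODES))  # 11
```

This leaves 5 of the 16 opcode values unused in this hypothetical assignment.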
10. A computational fluid dynamics-oriented accelerated computing method, which uses the accelerated computing apparatus according to any one of claims 1 to 9 to accelerate computational fluid dynamics computations; the method comprises:
discretizing the fluid equation to be solved in time and space with a finite difference method to obtain an iterative calculation formula;
determining the number of differential operation units and transmission channels required from the iterative calculation formula, and determining the number of program loop iterations from the iteration in time;
combining all the differential operation units through the transmission channels, and storing the initialization data in the general purpose registers of the differential operation units;
setting the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of program loop iterations to obtain a calculation program, the instructions in the instruction memory being used to implement different difference schemes;
running the calculation program and outputting the differential operation results of all nodes in the flow field at different times.
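The second step of the method, sizing the array from the iterative formula, can be sketched as a small planning helper: one unit per grid node, one loop per time step, and one transmission channel per neighbour direction that the stencil reads. The helper names, the direction labels and the stencil notation (row, column offsets) are hypothetical. An upwind stencil needs two channels, matching the two communication registers used in the example, while a five-point central stencil would need all four; which of cr0 to cr3 correspond to which direction is not stated in the source.

```python
# Hypothetical planning helper; offsets are (row, column) displacements of the
# neighbour values that the update formula reads.
DIRECTION = {(-1, 0): "north", (1, 0): "south", (0, -1): "west", (0, 1): "east"}

def channels_needed(stencil):
    needed = set()
    for offset in stencil:
        if offset == (0, 0):
            continue                         # the node's own value stays local
        if offset not in DIRECTION:
            raise ValueError(f"offset {offset} is not a nearest neighbour")
        needed.add(DIRECTION[offset])
    return needed

def plan(rows, cols, time_steps, stencil):
    return {
        "units": rows * cols,                # one differential operation unit per node
        "channels": sorted(channels_needed(stencil)),
        "loop_count": time_steps,            # one program loop per time step
    }

upwind = [(0, 0), (-1, 0), (0, -1)]
print(plan(80, 80, 100, upwind))
# {'units': 6400, 'channels': ['north', 'west'], 'loop_count': 100}
```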
CN202211171216.5A 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics Active CN115291949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211171216.5A CN115291949B (en) 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics

Publications (2)

Publication Number Publication Date
CN115291949A true CN115291949A (en) 2022-11-04
CN115291949B CN115291949B (en) 2022-12-20

Family

ID=83833618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211171216.5A Active CN115291949B (en) 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics

Country Status (1)

Country Link
CN (1) CN115291949B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001038832A2 (en) * 1999-11-24 2001-05-31 S.C. Ack S.R.L. System for metering fluids
US20030212712A1 (en) * 2002-05-13 2003-11-13 Jinsheng Gu Byte-level file differencing and updating algorithms
CN1787376A (en) * 2004-12-07 2006-06-14 奥特拉股份有限公司 Techniques for implementing hardwired decoders in differential input circuits
CN102842222A (en) * 2012-08-30 2012-12-26 西北工业大学 FPGA (Field Programmable Gate Array) online prediction control method based on Phillips macroscopic traffic flow model
CN102930730A (en) * 2012-11-19 2013-02-13 西安费斯达自动化工程有限公司 Online traffic bottleneck prediction control method based on FPGA and improved Phillips model
US20140200833A1 (en) * 2011-09-21 2014-07-17 Fujitsu Limited Object motion analysis apparatus, object motion analysis method, and storage medium
CN104639310A (en) * 2014-12-31 2015-05-20 东华大学 Method for detecting capacity of SHA-1 algorithm for resisting attack of differential fault
CN105264779A (en) * 2013-01-22 2016-01-20 阿尔特拉公司 Data compression and decompression using simd instructions
JP2018136255A (en) * 2017-02-23 2018-08-30 セイコーエプソン株式会社 Physical quantity sensor, electronic equipment and mobile object
CN111797045A (en) * 2016-12-21 2020-10-20 艾尔默斯半导体股份公司 Method for initializing a differential two-wire data bus and method for transmitting data
US20200362839A1 (en) * 2019-05-15 2020-11-19 Leistritz Pumpen Gmbh Method for determining a flow volume of a fluid delivered by a pump
CN112099762A (en) * 2020-09-10 2020-12-18 上海交通大学 Co-processing system and method for quickly realizing SM2 cryptographic algorithm
CN112098273A (en) * 2020-08-14 2020-12-18 山东大学 Near-field dynamics-based permeation grouting process simulation method and system
CN112818494A (en) * 2021-02-10 2021-05-18 西北工业大学 Functional gradient flow pipe modal and response analysis method based on differential quadrature method
CN112842312A (en) * 2021-02-01 2021-05-28 上海交通大学 Heart rate sensor and self-adaptive heartbeat lock ring system and method thereof
WO2021245101A1 (en) * 2020-06-05 2021-12-09 Politecnico Di Milano A computing platform for preventing side channel attacks
CN113935258A (en) * 2021-10-15 2022-01-14 北京百度网讯科技有限公司 Computational fluid dynamics acceleration method, device, equipment and storage medium
WO2022046761A1 (en) * 2020-08-26 2022-03-03 Tpe Midstream Llc Configurable fluid compression apparatus, control, and associated methods
CN115049529A (en) * 2021-03-08 2022-09-13 上海联影医疗科技股份有限公司 Image gradient determination method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
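Liu Sheng et al.: "A self-designed heterogeneous fusion accelerator for exascale high-performance computing", Journal of Computer Research and Development *
Tao Xiaohan et al.: "Performance optimization of the FT program based on the SW26010 processor", Computer Science *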

Also Published As

Publication number Publication date
CN115291949B (en) 2022-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant