CN112860220B - Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation - Google Patents

Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation

Info

Publication number
CN112860220B
Authority
CN
China
Prior art keywords
floating point
operated
unit
point number
precision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110178984.2A
Other languages
Chinese (zh)
Other versions
CN112860220A (en
Inventor
谢歆昂
李凯
李博宇
杜来民
代柳瑶
毛伟
余浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Maitexin Technology Co ltd
Original Assignee
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern University of Science and Technology
Priority to CN202110178984.2A
Publication of CN112860220A
Priority to PCT/CN2021/131745 (WO2022170809A1)
Application granted
Publication of CN112860220B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575 Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation. The mantissas of floating-point numbers of different precisions are divided by a unified method into a plurality of bit segments; different numbers of unit multipliers of the same type are called to multiply the bit segments within one cycle and output the corresponding products; and the multiply-accumulate result of the floating-point numbers is obtained by applying a shift-add operation to the products. The invention uses a unified mantissa division scheme to avoid bit redundancy and a unified unit multiplier to improve hardware utilization, and it supports multiply-accumulate operations on half-precision, single-precision and double-precision floating-point numbers. It thereby addresses the bit redundancy and low hardware utilization of prior-art methods that support multi-precision floating-point multiplication.

Description

Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation
Technical Field
The invention relates to the field of digital circuits, in particular to a reconfigurable floating point multiply-add operation unit and a reconfigurable floating point multiply-add operation method suitable for multi-precision calculation.
Background
With the rapid development and wide application of scientific computing, machine learning training and the like, multiplication units that support floating-point data processing have emerged. A conventional fixed-point multiplier has a fixed input bit width and cannot meet the requirements of multi-precision calculation, so methods supporting multi-precision floating-point multiplication have appeared. However, existing methods of this kind require several different mantissa division schemes and must separate the generated product into two parallel parts by zero padding, which leads to problems such as precision loss, bit redundancy and low hardware utilization.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The present invention is directed to provide a reconfigurable floating-point multiply-add unit and method suitable for multi-precision computation, which solve the problems of bit redundancy and low hardware utilization rate in the operation method supporting multi-precision floating-point multiply operation in the prior art.
The technical scheme adopted by the invention for solving the problems is as follows:
In a first aspect, an embodiment of the present invention provides a reconfigurable floating-point multiply-add operation method suitable for multi-precision calculation, where the method includes:
acquiring the significands (effective numbers) of the floating-point numbers to be operated on, and generating one or more target segments based on the significands;
determining the number of called unit multipliers according to the precision of the floating-point numbers to be operated on, taking one target segment as an operand of one unit multiplier, and acquiring the product generated by the unit multiplier based on the operands;
and performing a shift-add operation on the products, and taking the result of the shift-add operation as the result of the multiply-accumulate operation on the floating-point numbers to be operated on.
In one embodiment, acquiring the significands of the floating-point numbers to be operated on and generating one or more target segments based on the significands includes:
adding a 1-bit integer part to the mantissa of the floating-point number to be operated on;
taking the resulting significant bits as the significand of the floating-point number to be operated on;
when the bit width of the significand is larger than the bit width of the unit multiplier, dividing the significand according to the bit width of the unit multiplier to generate the target segments.
In one embodiment, determining the number of called unit multipliers according to the precision of the floating-point numbers to be operated on, taking one target segment as an operand of one unit multiplier, and obtaining the product generated by the unit multiplier based on the operands includes:
determining the number of called unit multipliers according to the precision of the floating-point numbers to be operated on;
taking one target segment as an operand of one unit multiplier;
inputting the operands into the unit multiplier to generate a plurality of row products.
In one embodiment, when the unit multiplier is a 14-bit multiplier, determining the number of called unit multipliers according to the precision and the number of pairs of the floating-point numbers to be operated on includes:
when the floating-point numbers to be operated on are half-precision floating-point numbers, calling n unit multipliers for n pairs of floating-point numbers;
when the floating-point numbers to be operated on are single-precision floating-point numbers, calling 4n unit multipliers for n pairs of floating-point numbers;
when the floating-point numbers to be operated on are double-precision floating-point numbers, calling 16n unit multipliers for n pairs of floating-point numbers;
where n is an integer greater than 0.
In one embodiment, generating the row products after inputting the operands into the unit multiplier includes:
inputting the operands into the unit multiplier, and generating a plurality of row products by applying unsigned Booth encoding to the operands.
In one embodiment, when the floating-point number to be operated on is a double-precision floating-point number, taking one target segment as an operand of one unit multiplier is preceded by:
when the target segments of the floating-point number to be operated on have unequal bit widths, padding the target segments with the smallest bit width.
In one embodiment, performing the shift-add operation on the products and taking the result of the shift-add operation as the result of the multiply-accumulate operation on the floating-point numbers to be operated on includes:
inputting the products into a preset addition tree;
calculating the shift amount of each product, and shifting the product by that amount in the addition tree;
and summing the shifted data to obtain the result of the multiply-accumulate operation on the floating-point numbers to be operated on.
In one embodiment, the shift amount includes at least one of an internal shift amount and an external shift amount;
the internal shift amount is calculated as follows: the sum of the high/low position indices of the segments obtained by dividing the floating-point numbers to be operated on is taken as the internal shift amount of the product corresponding to those segments;
the external shift amount is calculated as follows: the exponent parts of each pair of floating-point numbers to be operated on are added to obtain an exponent sum, and the maximum of all exponent sums is taken as a reference value; the difference between the reference value and each exponent sum is taken as the external shift amount of the product corresponding to that pair of floating-point numbers.
In a second aspect, an embodiment of the present invention further provides a reconfigurable floating-point multiply-add operation unit suitable for multi-precision calculation, where the operation unit includes:
the dividing module is used for acquiring the significands of the floating-point numbers to be operated on and generating one or more target segments based on the significands;
the unit multiplier is used for determining the number of the called unit multipliers according to the precision of the floating point number to be operated, taking a target section as an operand of one unit multiplier and acquiring a product generated by the unit multiplier based on the operand;
and the addition tree is used for carrying out shift addition operation on the products and taking an operation result generated based on the shift addition operation as a result of the multiply-accumulate operation of the floating point number to be operated.
In one embodiment, the arithmetic unit includes 16n unit multipliers, and n is a non-negative number.
The invention has the beneficial effects that: the embodiment of the invention avoids the problem of bit redundancy by adopting a uniform mantissa division scheme, improves the hardware utilization rate by adopting a uniform unit multiplier, and can also realize multiply-accumulate operation of half-precision floating point numbers, multiply-accumulate operation of single-precision floating point numbers and multiply-accumulate operation of double-precision floating point numbers. The method solves the problems that in the prior art, an operation method supporting multi-precision floating-point multiplication generates bit redundancy, the hardware utilization rate is low and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a reconfigurable floating-point multiply-add operation method suitable for multi-precision calculation according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a division scheme of significant digits of floating point numbers of different precision provided by an embodiment of the invention.
Fig. 3 is a schematic diagram of an operating principle of a 14-bit basic multiplier provided in an embodiment of the present invention.
Fig. 4 is a calculation diagram of the 16 groups of products input to the addition tree when computing one pair of FP64 numbers according to an embodiment of the present invention.
Fig. 5 is a block diagram of an internal basic block of a reconfigurable floating-point multiply-add operation unit suitable for multi-precision calculation according to an embodiment of the present invention.
Fig. 6 is a reference diagram of a minimum operation unit capable of implementing mantissa multiply-accumulate operations on 3 floating point numbers with different precisions according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, back, ...) are involved in the embodiment of the present invention, the directional indications are only used for explaining the relative positional relationship between the components, the motion situation, etc. in a specific posture (as shown in the figure), and if the specific posture is changed, the directional indications are correspondingly changed.
With the rapid development and wide application of scientific computing, machine learning training and the like, multiplication units that support floating-point data processing have emerged. A conventional fixed-point multiplier has a fixed input bit width, so it cannot meet the requirements of multi-precision calculation and cannot make maximal use of hardware resources according to application requirements to improve energy efficiency and throughput. Methods for multi-precision floating-point multiplication have therefore appeared. However, some existing methods must separate the generated product into two parallel parts by zero padding when implementing multi-precision multiplication, which reduces the utilization of the system modules. Other existing methods must adopt different mantissa division schemes for different precisions; for example, one architecture is based on a 15-bit multiplier and optimized for floating-point multiplication at FP128 precision, but it produces a large amount of bit redundancy and wasted hardware resources when used at other precisions. In short, conventional fixed-point multipliers cannot satisfy multi-precision calculation, and existing methods supporting multi-precision floating-point multiplication suffer from problems such as precision loss, bit redundancy and low hardware utilization.
Based on the above defects in the prior art, the present invention provides a reconfigurable floating-point multiply-add operation method suitable for multi-precision calculation. The mantissas of floating-point numbers of different precisions are divided by a unified method into a plurality of bit segments; different numbers of unit multipliers of the same type are called to multiply the bit segments within one cycle and output the corresponding products; and a shift-add operation on the products yields the result of multiplying the mantissas of the floating-point numbers. The invention uses a unified mantissa division scheme to avoid bit redundancy and a unified unit multiplier to improve hardware utilization, and it supports multiply-accumulate operations on half-precision, single-precision and double-precision floating-point numbers. It thereby addresses the bit redundancy and low hardware utilization of prior-art methods that support multi-precision floating-point multiplication.
As shown in fig. 1, the method comprises the steps of:
Step S100, acquiring the significands of the floating-point numbers to be operated on, and generating one or more target segments based on the significands.
In floating-point multiplication, the exponent part of the result is the sum of the exponent parts of the two floating-point numbers being multiplied, and the mantissa part of the result is the product of their mantissas. The present embodiment mainly optimizes how the mantissa part of the result, i.e. the product of the two mantissas, is generated. Specifically, the embodiment first acquires the significand of each floating-point number to be operated on, where the significand is the data in the mantissa that must take part in the multiplication; the subsequent multiplication can be performed only after the significand has been determined. After acquiring the significand, the embodiment generates one or more target segments from it and uses the target segments as input data of the unit multipliers.
The step S100 includes the steps of:
step S110, adding a 1-bit integer part to the mantissa of the floating-point number to be operated on;
step S120, taking the resulting significant bits as the significand of the floating-point number to be operated on;
step S130, when the bit width of the significand is larger than the bit width of the unit multiplier, dividing the significand according to the bit width of the unit multiplier to generate the target segments.
Specifically, to obtain the significand of a floating-point number to be operated on, this embodiment first adds a 1-bit integer part to the mantissa of the floating-point number and then takes the resulting significant bits as the significand. For example, after the 1-bit integer part is added to the mantissa of a half-precision floating-point number (16-bit floating point, FP16), the significand is 11 bits; for a single-precision floating-point number (32-bit floating point, FP32), the significand is 24 bits; and for a double-precision floating-point number (64-bit floating point, FP64), the significand is 53 bits. After obtaining the significand, this embodiment derives the target segments from it and uses them as input data of the subsequent unit multipliers.
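The significand widths quoted above (11, 24 and 53 bits) can be checked with a minimal Python sketch. It assumes the operands are IEEE 754 binary16/binary32/binary64 values and handles normal numbers only; the function name `significand` and the `FORMATS` table are hypothetical and not taken from the source.

```python
import struct

# (float pack code, same-size unsigned int code, exponent bits, stored mantissa bits)
# Assumption: FP16/FP32/FP64 follow the IEEE 754 binary16/32/64 layouts.
FORMATS = {
    "FP16": ("<e", "<H", 5, 10),
    "FP32": ("<f", "<I", 8, 23),
    "FP64": ("<d", "<Q", 11, 52),
}

def significand(value, precision):
    """Return the stored mantissa with the 1-bit integer part added in front."""
    float_code, int_code, exp_bits, man_bits = FORMATS[precision]
    raw = struct.unpack(int_code, struct.pack(float_code, value))[0]
    mantissa = raw & ((1 << man_bits) - 1)
    exponent = (raw >> man_bits) & ((1 << exp_bits) - 1)
    # Zero, subnormals, infinities and NaNs are outside the scope of this sketch.
    if exponent == 0 or exponent == (1 << exp_bits) - 1:
        raise ValueError("only normal numbers are handled in this sketch")
    return (1 << man_bits) | mantissa
```

For any normal input the returned integer has exactly 11, 24 or 53 significant bits, depending on the precision selected.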
Specifically, this embodiment compares the bit width of the significand with the bit width of the unit multiplier to decide how the significand should be processed to generate the target segments. When the bit width of the significand is less than or equal to the bit width of the unit multiplier, the significand can be input into the unit multiplier directly as a target segment. For example, as shown in fig. 2, when the unit multiplier is a 14-bit basic unit multiplier, the significand of a 16-bit floating-point number is only 11 bits, so it does not need to be divided and can be used directly as a target segment.
When the bit width of the significand is greater than the bit width of the unit multiplier, the significand cannot be input into the unit multiplier directly, so it must be divided, and the target segments generated by the division are then input into the unit multipliers. For example, when the unit multiplier is a 14-bit basic unit multiplier, the significand of a 32-bit floating-point number is 24 bits, so it is divided into 2 target segments of 12 bits each. Similarly, the significand of a 64-bit floating-point number is 53 bits, so it is divided into 4 target segments with bit widths 14:13:13:13.
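The division scheme described above can be sketched in Python as follows. The widths come from the text; the assumption that the 14-bit segment of an FP64 significand is the most significant one, and the names `SEGMENT_WIDTHS`, `split_significand` and `join_segments`, are illustrative choices rather than details confirmed by the source.

```python
# Segment widths per precision, listed from the least significant segment upward.
# Assumption: for FP64 the single 14-bit segment sits at the most significant end.
SEGMENT_WIDTHS = {
    "FP16": [11],
    "FP32": [12, 12],
    "FP64": [13, 13, 13, 14],
}

def split_significand(sig, precision):
    """Cut a significand into target segments, least significant segment first."""
    segments = []
    for width in SEGMENT_WIDTHS[precision]:
        segments.append(sig & ((1 << width) - 1))
        sig >>= width
    if sig != 0:
        raise ValueError("significand is wider than the selected precision allows")
    return segments

def join_segments(segments, precision):
    """Inverse of split_significand, used here only to check that nothing is lost."""
    value, offset = 0, 0
    for segment, width in zip(segments, SEGMENT_WIDTHS[precision]):
        value |= segment << offset
        offset += width
    return value
```

Every segment fits a 14-bit operand, and joining the segments restores the original significand, so the split itself loses no bits.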
After acquiring the target segment, the target segment needs to be input into a unit multiplier, so as shown in fig. 1, the method further includes the following steps:
step S200, determining the number of the called unit multipliers according to the precision of the floating point number to be operated, taking a target segment as an operand of one unit multiplier, and obtaining the product generated by the unit multiplier based on the operand.
In this embodiment, the number of called unit multipliers is first determined according to the precision of the floating-point numbers to be operated on. Each target segment is then used as an operand of a unit multiplier; a unit multiplier needs two operands to perform a multiplication, one serving as the multiplier and the other as the multiplicand. The product generated by the unit multiplier from these operands is then obtained. In multiplication, if the multiplier has two or more digits, each digit of the multiplier is multiplied by the multiplicand in turn, and the result of each such multiplication is called a partial product (or incomplete product).
The step S200 specifically includes the following steps:
step S210, determining the number of called unit multipliers according to the precision of the floating-point numbers to be operated on;
step S220, taking one target segment as an operand of one unit multiplier;
step S230, inputting the operands into the unit multiplier to generate a plurality of row products.
In this embodiment, the number of called unit multipliers must be determined first. In one implementation, when the unit multiplier is a 14-bit multiplier, determining the number of called unit multipliers according to the precision and the number of pairs of the floating-point numbers to be operated on includes: when the floating-point numbers are half-precision, calling n unit multipliers for n pairs of floating-point numbers; when the floating-point numbers are single-precision, calling 4n unit multipliers for n pairs of floating-point numbers; when the floating-point numbers are double-precision, calling 16n unit multipliers for n pairs of floating-point numbers; where n is an integer greater than 0.
For example, assume that the unit multiplier is a 14-bit basic unit multiplier. When the multiply-accumulate result of 16 pairs of half-precision floating-point numbers is calculated simultaneously, 16 14-bit basic unit multipliers are called. Because this embodiment divides the significand based on the bit width of the unit multiplier, the significand of a half-precision floating-point number can be used directly as a target segment, so each pair of half-precision floating-point numbers needs one 14-bit basic unit multiplier, and 16 are needed in total. Similarly, when the multiply-accumulate result of 4 pairs of single-precision floating-point numbers is calculated simultaneously, 16 14-bit basic unit multipliers are also called. The significand of a single-precision floating-point number is divided into 2 target segments before being input into the unit multipliers, so the 4 target segments of one pair of single-precision floating-point numbers give 2 × 2=4 multiplication combinations and require 4 unit multipliers; 4 pairs therefore require 4 × 4=16 unit multipliers.
Similarly, when the multiply-accumulate result of 1 pair of double-precision floating-point numbers is calculated, 16 basic unit multipliers are called: the significand of each double-precision floating-point number is divided into 4 target segments, so the 8 target segments of one pair give 4 × 4=16 multiplication combinations, and 16 basic unit multipliers are needed in total.
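The counting argument above (n, 4n and 16n unit multipliers for n pairs) reduces to squaring the number of segments per operand. A minimal Python sketch, with hypothetical names `SEGMENTS_PER_OPERAND` and `unit_multipliers_needed`:

```python
# Number of target segments each significand is cut into for a 14-bit unit multiplier.
SEGMENTS_PER_OPERAND = {"FP16": 1, "FP32": 2, "FP64": 4}

def unit_multipliers_needed(precision, n_pairs):
    """Each pair needs one unit multiplier per (segment of a, segment of b) combination."""
    if n_pairs <= 0:
        raise ValueError("n must be an integer greater than 0")
    segments = SEGMENTS_PER_OPERAND[precision]
    return n_pairs * segments * segments
```

With this rule, 16 FP16 pairs, 4 FP32 pairs and 1 FP64 pair each occupy the same 16 unit multipliers, which matches the three cases worked through in the text.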
In addition, since dividing the significand may generate target segments of unequal bit width, in one implementation the method further includes, before taking a target segment as an operand of a unit multiplier: when the target segments of the floating-point number to be operated on have unequal bit widths, padding the target segments with the smallest bit width, where the padding can be implemented as zero padding. For example, when the unit multiplier is a 14-bit basic unit multiplier, the 53-bit significand of a double-precision floating-point number is divided into 4 target segments with bit widths 14:13:13:13, and the 13-bit target segments require zero padding.
Then, each target segment is used as an operand of a unit multiplier, and the plurality of row products generated by the unit multiplier are obtained. Specifically, after the operands are input into the unit multiplier, they are encoded with unsigned Booth encoding to generate a plurality of row products (as shown in fig. 3).
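The source does not give the details of its Booth encoder, so the following Python sketch only illustrates one common way to generate row products for an unsigned operand: radix-4 Booth recoding with zero extension above the top bit. The radix, the number of rows and the names `booth_radix4_digits` and `booth_rows` are assumptions, not the patent's circuit.

```python
def booth_radix4_digits(multiplier, bits=14):
    """Recode an unsigned multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}.

    The multiplier is treated as zero-extended above its top bit, which is what
    makes the recoding valid for unsigned operands (assumed scheme, see lead-in).
    """
    if not 0 <= multiplier < (1 << bits):
        raise ValueError("multiplier does not fit the operand width")
    digits = []
    previous_bit = 0  # implicit bit to the right of bit 0
    for i in range(bits // 2 + 1):
        low = (multiplier >> (2 * i)) & 1
        high = (multiplier >> (2 * i + 1)) & 1
        digits.append(previous_bit + low - 2 * high)
        previous_bit = high
    return digits

def booth_rows(multiplicand, multiplier, bits=14):
    """Return the row products; their sum equals multiplicand * multiplier."""
    digits = booth_radix4_digits(multiplier, bits)
    return [digit * multiplicand * 4 ** i for i, digit in enumerate(digits)]
```

Under these assumptions a 14-bit operand yields 8 row products, each a multiple of the multiplicand by 0, ±1 or ±2 at a weight of 4^i, and summing the rows reproduces the ordinary product.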
After the product is obtained, in order to obtain the result of multiply-accumulate operation of floating point number, as shown in fig. 1, the method further includes the following steps:
and step S300, performing shift addition operation on the product, and taking an operation result generated based on the shift addition operation as a result of multiply-accumulate operation of the floating point number to be operated.
After the product output by the unit multiplier is obtained, in order to obtain an accurate multiplication result, the present embodiment needs to perform shift addition operation on the obtained product, and then take the operation result after the shift addition operation as the result of multiply-accumulate operation on the floating point number to be operated.
In an implementation manner, the step S300 specifically includes the following steps:
step S310, inputting the product into a preset addition tree;
step S320, calculating the displacement of the product, and performing displacement operation on the product according to the displacement through the addition tree;
and step S330, summing the data obtained after the shifting operation to obtain the result of multiply-accumulate operation of the floating point number to be operated.
This embodiment provides an addition tree designed in advance around the target-segment generation scheme and the way the unit multipliers are used, so that the data are processed without loss. Specifically, after a product is obtained it is input into the addition tree, where its shift amount is calculated and the product is shifted accordingly. The shift amount calculated in the addition tree includes at least one of an internal shift amount and an external shift amount; that is, it may consist of the internal shift amount only, the external shift amount only, or both, and its final value depends on the precision of the floating-point numbers to be operated on and the number of pairs being computed. The internal shift amount is calculated by taking the sum of the high/low position indices of the segments obtained by dividing the floating-point numbers to be operated on as the internal shift amount of the product corresponding to those segments. The external shift amount is calculated as follows: the exponent parts of each pair of floating-point numbers to be operated on are added to obtain an exponent sum; the maximum of all exponent sums is taken as a reference value; each exponent sum is subtracted from the reference value to obtain an exponent difference; and the exponent difference is used as the external shift amount of the product corresponding to that pair.
In short, when multiple pairs of floating point numbers are calculated at the same time, the exponent parts of the floating point numbers themselves are needed to determine the external shift amount. When the significands are divided, different unit multipliers are called for the segment products; although every unit multiplier outputs a product of the same bit width, the sum of the position indices of the two segments is needed as the internal shift amount. The correct multiply-accumulate result is produced only after each product has been shifted to its proper high-order or low-order position and the shifted products have been accumulated.
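The two shift rules described above can be sketched as follows. This is a minimal illustration rather than the patented circuit: the function names and the constant SEGMENT_BITS are hypothetical, and exponents are passed in as plain integers.

```python
SEGMENT_BITS = 14  # operand width of one unit multiplier in the embodiment


def internal_shift(i: int, j: int, segment_bits: int = SEGMENT_BITS) -> int:
    """Left shift, in bits, for the product of segments a_i and b_j."""
    return (i + j) * segment_bits


def external_shifts(exponent_pairs):
    """Difference between the largest exponent sum and each pair's sum."""
    sums = [exp_a + exp_b for exp_a, exp_b in exponent_pairs]
    reference = max(sums)  # the largest exponent sum is the reference value
    return [reference - s for s in sums]


print(internal_shift(1, 0))                        # 14
print(external_shifts([(3, 4), (10, 2), (1, 1)]))  # [5, 0, 10]
```

The pair with the largest exponent sum receives an external shift of 0, and every other product is aligned relative to it.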
For example, when the multiply-accumulate results of 16 pairs of half-precision floating point numbers are calculated simultaneously, 16 basic 14-bit unit multipliers are called. The significands of half-precision floating point numbers are not divided, but the results for multiple pairs must be calculated at the same time, so in this case the shift amount of each product is the external shift amount only. First, the 16 exponent sums of the 16 pairs of half-precision floating point numbers are obtained and the largest of them is taken as the reference value; each of the 16 exponent sums is then subtracted from the reference value to obtain the external shift amounts.
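The half-precision case can be modelled in software as below. This is a sketch under stated assumptions, not the hardware data path: the names decode_half and half_mac are hypothetical, inputs are assumed to be normal binary16 values (no zeros, subnormals, infinities or NaNs), sign handling is added for completeness, and exact rational arithmetic stands in for the alignment shift so that no shifted-out bits are lost.

```python
import struct
from fractions import Fraction

BIAS, MANT_BITS = 15, 10  # IEEE 754 binary16 parameters


def decode_half(x: float):
    """Return (sign, biased exponent, 11-bit significand) of x as binary16.

    Assumes x is a normal half-precision value.
    """
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15
    exp = (bits >> MANT_BITS) & 0x1F
    sig = (1 << MANT_BITS) | (bits & ((1 << MANT_BITS) - 1))  # add hidden 1
    return sign, exp, sig


def half_mac(pairs):
    """Exact sum of products of binary16 pairs, aligned by external shifts."""
    terms = []
    for a, b in pairs:
        sa, ea, ma = decode_half(a)
        sb, eb, mb = decode_half(b)
        terms.append((sa ^ sb, ea + eb, ma * mb))  # one unit multiplier per pair
    reference = max(exp_sum for _, exp_sum, _ in terms)
    acc = Fraction(0)
    for sign, exp_sum, product in terms:
        shift = reference - exp_sum              # external shift amount
        aligned = Fraction(product, 1 << shift)  # right shift without truncation
        acc += -aligned if sign else aligned
    # Undo the two exponent biases and the two 10-bit fraction scalings.
    return acc * Fraction(2) ** (reference - 2 * BIAS - 2 * MANT_BITS)


pairs = [(float(k), 0.5) for k in range(1, 17)]  # 16 pairs, all exact in binary16
print(half_mac(pairs))  # 68
```

Each 11-bit significand fits within a single 14-bit operand, which is why no internal shift appears in this case.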
When the multiply-accumulate results of 4 pairs of single-precision floating point numbers are calculated simultaneously, the significands of the single-precision floating point numbers are divided before being input into the unit multipliers, and the results for multiple pairs must be calculated at the same time, so in this case the shift amount of each product comprises both the internal shift amount and the external shift amount. First, the 4 exponent sums of the 4 pairs of single-precision floating point numbers are obtained and the largest of them is taken as the reference value; each of the 4 exponent sums is then subtracted from the reference value to obtain the external shift amounts. In addition, the sum of the position indices of the segments divided from the floating point numbers to be operated is used as the internal shift amount of the product corresponding to those segments. For example, for a_0 × b_0 the segment indices sum to 0 + 0 = 0, so the internal shift amount is 0; for a_1 × b_0 the sum is 1 + 0 = 1, so the product is shifted left by 1 × 14 = 14 bits; for a_3 × b_1 the sum is 3 + 1 = 4, so the product is shifted left by 4 × 14 = 56 bits.
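A software model of the single-precision case, combining both kinds of shift, might look like the sketch below. The helper names are hypothetical; the 24-bit significand is assumed to be split into a low 14-bit segment and a high 10-bit segment, inputs are assumed to be normal binary32 values, and exact rational arithmetic replaces the hardware alignment shift.

```python
import struct
from fractions import Fraction

SEG_BITS = 14
BIAS, MANT_BITS = 127, 23  # IEEE 754 binary32 parameters


def decode_single(x: float):
    """Return (sign, biased exponent, 24-bit significand); normal inputs only."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31
    exp = (bits >> MANT_BITS) & 0xFF
    sig = (1 << MANT_BITS) | (bits & ((1 << MANT_BITS) - 1))  # add hidden 1
    return sign, exp, sig


def split(sig: int, count: int = 2):
    """Split a significand into `count` segments, lowest segment first."""
    mask = (1 << SEG_BITS) - 1
    return [(sig >> (k * SEG_BITS)) & mask for k in range(count)]


def significand_product(sig_a: int, sig_b: int) -> int:
    """Four segment products, each shifted left by (i + j) * 14 bits."""
    a, b = split(sig_a), split(sig_b)
    return sum((a[i] * b[j]) << ((i + j) * SEG_BITS)
               for i in range(2) for j in range(2))


def single_mac(pairs):
    """Exact sum of products of binary32 pairs."""
    terms = []
    for x, y in pairs:
        sx, ex, mx = decode_single(x)
        sy, ey, my = decode_single(y)
        terms.append((sx ^ sy, ex + ey, significand_product(mx, my)))
    reference = max(exp_sum for _, exp_sum, _ in terms)
    acc = Fraction(0)
    for sign, exp_sum, product in terms:
        aligned = Fraction(product, 1 << (reference - exp_sum))  # external shift
        acc += -aligned if sign else aligned
    return acc * Fraction(2) ** (reference - 2 * BIAS - 2 * MANT_BITS)


print(single_mac([(1.5, 2.0), (-0.75, 4.0), (1024.0, 0.125), (3.0, 3.0)]))  # 137
```

The check that significand_product equals the ordinary integer product is what the internal shifts guarantee; the external shifts only align the four per-pair results before the final sum.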
When the multiply-accumulate result of 1 pair of double-precision floating point numbers is calculated, the significands of the double-precision floating point numbers are divided before being input into the unit multipliers, but only one pair has to be processed, so in this case the shift amount of each product comprises the internal shift amount only. For example, the product a_1 × b_0 is shifted by 1 × 14 bits, while a_1 × b_1 and a_2 × b_0 are both shifted by 2 × 14 bits (as shown in FIG. 4).
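For the double-precision case, the internal shifts of all 16 segment products can be tabulated and checked against an ordinary integer multiplication. The sketch below assumes the 53-bit significand is split into four segments of at most 14 bits, with the top segment holding only 11 bits and treated as zero-extended; the function names are hypothetical.

```python
SEG_BITS = 14
SEGMENTS = 4  # a 53-bit significand needs ceil(53 / 14) = 4 segments


def split_significand(sig: int):
    """Split into 4 segments, lowest first; the top one holds only 11 bits."""
    mask = (1 << SEG_BITS) - 1
    return [(sig >> (k * SEG_BITS)) & mask for k in range(SEGMENTS)]


def shift_table():
    """Internal shift, in bits, for every product a_i * b_j."""
    return [[(i + j) * SEG_BITS for j in range(SEGMENTS)]
            for i in range(SEGMENTS)]


def multiply_by_segments(sig_a: int, sig_b: int) -> int:
    """Shift-and-add the 16 segment products into the full product."""
    a, b = split_significand(sig_a), split_significand(sig_b)
    shifts = shift_table()
    return sum((a[i] * b[j]) << shifts[i][j]
               for i in range(SEGMENTS) for j in range(SEGMENTS))


sig_a = (1 << 52) | 0x123456789ABCD  # arbitrary 53-bit significand
sig_b = (1 << 53) - 1                # largest 53-bit significand
assert multiply_by_segments(sig_a, sig_b) == sig_a * sig_b

for row in shift_table():
    print(row)
# [0, 14, 28, 42]
# [14, 28, 42, 56]
# [28, 42, 56, 70]
# [42, 56, 70, 84]
```

Entries with the same index sum, such as a_1 × b_1 and a_2 × b_0, share the same shift of 28 bits, matching the example in the text.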
After the shift amount of each product has been calculated, the product is shifted by that amount, and the shifted data are then summed to obtain the result of the multiply-accumulate operation of the floating point numbers to be operated.
Based on the foregoing embodiment, the present invention further provides a reconfigurable floating-point multiply-add operation unit suitable for multi-precision calculation. As shown in FIG. 5, the operation unit includes:
a dividing module 01, configured to acquire the significands of the floating point numbers to be operated and to generate a plurality of target segments based on the significands, where "a plurality" includes the case of one;
unit multipliers 02, the number of unit multipliers called being determined according to the precision of the floating point numbers to be operated, each unit multiplier taking one target segment as an operand and generating a product based on the operand;
and an addition tree 03, configured to perform a shift-add operation on the products and to take the result of the shift-add operation as the result of the multiply-accumulate operation of the floating point numbers to be operated.
In one implementation, so that the operation unit can perform multiply-accumulate operations on half-precision, single-precision and double-precision floating point numbers, the operation unit comprises 16n unit multipliers, where n is a non-negative number. For example, as shown in FIG. 6, when the operation unit includes 16 basic 14-bit unit multipliers, each unit multiplier can perform the multiplication of 1 pair of half-precision floating point numbers, so the multiply-accumulate operation of 16 pairs of half-precision floating point numbers can be performed simultaneously. Every 4 unit multipliers can perform the multiplication of 1 pair of single-precision floating point numbers, so the multiply-accumulate operation of 4 pairs of single-precision floating point numbers can be performed simultaneously. All 16 unit multipliers together can perform the multiplication of 1 pair of double-precision floating point numbers, so the multiply-accumulate operation of 1 pair of double-precision floating point numbers can also be performed. FIG. 6 is a reference diagram of the minimum operation unit of the present invention capable of performing multiply-accumulate operations on floating point numbers of 3 different precisions. Therefore, provided that hardware resources are not a limiting factor, the embodiment of the invention can complete, within one clock cycle, at least the multiply-accumulate operation of multiple pairs of half-precision floating point numbers, of multiple pairs of single-precision floating point numbers, or of 1 pair of double-precision floating point numbers. Compared with fixed FP32 and FP64 multiply-add units, the operation unit provided by the invention can improve the maximum throughput by 4 times and 16 times, respectively.
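The multiplier counts quoted above follow from the significand widths if each operand is assumed to be cut into 14-bit segments and every segment of one operand is multiplied with every segment of the other. The sketch below makes that assumption explicit; the function names are hypothetical, and the significand widths (including the hidden bit) are those of IEEE 754 binary16, binary32 and binary64.

```python
UNIT_BITS = 14  # operand width of one unit multiplier
SIGNIFICAND_BITS = {"half": 11, "single": 24, "double": 53}


def segments(precision: str) -> int:
    """Number of 14-bit segments needed for one significand (ceiling division)."""
    return -(-SIGNIFICAND_BITS[precision] // UNIT_BITS)


def multipliers_per_pair(precision: str) -> int:
    """Every segment of a meets every segment of b: segments squared."""
    return segments(precision) ** 2


def pairs_per_cycle(precision: str, total_multipliers: int = 16) -> int:
    """How many operand pairs a bank of unit multipliers can serve at once."""
    return total_multipliers // multipliers_per_pair(precision)


for p in ("half", "single", "double"):
    print(p, multipliers_per_pair(p), pairs_per_cycle(p))
# half 1 16
# single 4 4
# double 16 1
```

With 16 unit multipliers this reproduces the 16, 4 and 1 pairs per cycle given in the text, and the ratios 16/4 and 16/1 correspond to the stated 4-fold and 16-fold throughput figures.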
In summary, the present invention discloses a reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation. The mantissas of floating point numbers of different precisions are divided by a uniform method to obtain a plurality of bit segments; different numbers of unit multipliers of the same type are called to complete the multiplications of the bit segments in one cycle and to output the corresponding products; and a shift-add operation is then performed on the products to obtain the multiply-accumulate result of the floating point numbers. The invention adopts a uniform mantissa division scheme to avoid bit redundancy, adopts a uniform unit multiplier to improve hardware utilization, and supports multiply-accumulate operations on half-precision, single-precision and double-precision floating point numbers. It thereby addresses the bit redundancy and low hardware utilization of prior-art operation methods that support multi-precision floating-point multiplication.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (8)

1. A reconfigurable floating-point multiply-add operation method suitable for multi-precision calculation, characterized by comprising the following steps:
acquiring significands of floating point numbers to be operated, and generating a plurality of target segments based on the significands, wherein "a plurality" includes one;
determining the number of unit multipliers to be called according to the precision of the floating point numbers to be operated, taking one target segment as an operand of one unit multiplier, and acquiring the product generated by the unit multiplier based on the operand;
performing a shift-add operation on the products, and taking the operation result generated by the shift-add operation as the result of the multiply-accumulate operation of the floating point numbers to be operated;
wherein the determining the number of unit multipliers to be called according to the precision of the floating point numbers to be operated, taking one target segment as an operand of one unit multiplier, and acquiring the product generated by the unit multiplier based on the operand comprises:
determining the number of unit multipliers to be called according to the precision of the floating point numbers to be operated;
taking one target segment as an operand of one unit multiplier;
inputting the operands into the unit multiplier to generate a plurality of row products;
wherein, when the unit multiplier is a 14-bit multiplier, the determining the number of unit multipliers to be called according to the precision and the number of pairs of the floating point numbers to be operated comprises:
when the floating point numbers to be operated are half-precision floating point numbers, calling n unit multipliers for n pairs of floating point numbers to be operated;
when the floating point numbers to be operated are single-precision floating point numbers, calling 4n unit multipliers for n pairs of floating point numbers to be operated;
when the floating point numbers to be operated are double-precision floating point numbers, calling 16n unit multipliers for n pairs of floating point numbers to be operated;
wherein n is an integer greater than 0.
2. The method according to claim 1, wherein the acquiring significands of floating point numbers to be operated and generating a plurality of target segments based on the significands, wherein "a plurality" includes one, comprises:
adding a 1-bit integer part to the mantissa part of the floating point number to be operated;
taking the digits of the floating point number obtained after the adding as the significand of the floating point number to be operated;
when the number of bits of the significand is larger than the bit width of the unit multiplier, dividing the significand according to the bit width of the unit multiplier and generating a plurality of target segments after the division, wherein "a plurality" includes one.
3. The method of claim 1, wherein the inputting the operands into the unit multiplier to generate a plurality of row products comprises:
inputting the operands into the unit multiplier, and generating a plurality of row products after the operands are encoded by unsigned Booth encoding.
4. The method of claim 1, wherein when the floating point number to be operated is a double-precision floating point number, the taking one target segment as an operand of one unit multiplier further comprises:
when the target segments corresponding to the floating point number to be operated do not all have the same number of bits, performing a bit-padding operation on the target segment with the fewest bits.
5. The method according to claim 1, wherein the performing a shift-and-add operation on the product and using an operation result generated based on the shift-and-add operation as a result of the multiply-and-accumulate operation on the floating point number to be operated comprises:
inputting the product into a preset addition tree;
calculating the shift amount of the product, and shifting the product according to the shift amount through the addition tree;
and performing summation operation on the data obtained after the shift operation to obtain a result of multiply-accumulate operation of the floating point number to be operated.
6. The reconfigurable floating-point multiply-add operation method according to claim 5, wherein the shift amount comprises at least one of an internal shift amount and an external shift amount;
the internal shift amount is calculated as follows: taking the sum of the position indices, from low order to high order, of the segments divided from the floating point numbers to be operated as the internal shift amount of the product corresponding to those segments;
the external shift amount is calculated as follows: adding the exponent parts of the floating point numbers to be operated to obtain exponent sums, and taking the maximum value of all the obtained exponent sums as a reference value; and subtracting each exponent sum from the reference value to obtain an exponent difference, and taking the exponent difference as the external shift amount of the product corresponding to the floating point numbers to be operated.
7. A reconfigurable floating-point multiply-add operation unit suitable for multi-precision calculation, the operation unit comprising:
a dividing module, configured to acquire significands of floating point numbers to be operated and to generate a plurality of target segments based on the significands, wherein "a plurality" includes one;
unit multipliers, wherein the number of unit multipliers called is determined according to the precision of the floating point numbers to be operated, one target segment is taken as an operand of one unit multiplier, and the product generated by the unit multiplier based on the operand is acquired;
an addition tree, configured to perform a shift-add operation on the products and to take the operation result generated by the shift-add operation as the result of the multiply-accumulate operation of the floating point numbers to be operated;
wherein the determining the number of unit multipliers to be called according to the precision of the floating point numbers to be operated, taking one target segment as an operand of one unit multiplier, and acquiring the product generated by the unit multiplier based on the operand comprises:
determining the number of unit multipliers to be called according to the precision of the floating point numbers to be operated;
taking one target segment as an operand of one unit multiplier;
inputting the operands into the unit multiplier to generate a plurality of row products;
wherein, when the unit multiplier is a 14-bit multiplier, the determining the number of unit multipliers to be called according to the precision and the number of pairs of the floating point numbers to be operated comprises:
when the floating point numbers to be operated are half-precision floating point numbers, calling n unit multipliers for n pairs of floating point numbers to be operated;
when the floating point numbers to be operated are single-precision floating point numbers, calling 4n unit multipliers for n pairs of floating point numbers to be operated;
when the floating point numbers to be operated are double-precision floating point numbers, calling 16n unit multipliers for n pairs of floating point numbers to be operated;
wherein n is an integer greater than 0.
8. The reconfigurable floating-point multiply-add unit of claim 7, wherein the unit comprises 16n unit multipliers, n being a non-negative number.
CN202110178984.2A 2021-02-09 2021-02-09 Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation Active CN112860220B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110178984.2A CN112860220B (en) 2021-02-09 2021-02-09 Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation
PCT/CN2021/131745 WO2022170809A1 (en) 2021-02-09 2021-11-19 Reconfigurable floating point multiply-accumulate operation unit and method suitable for multi-precision calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110178984.2A CN112860220B (en) 2021-02-09 2021-02-09 Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation

Publications (2)

Publication Number Publication Date
CN112860220A CN112860220A (en) 2021-05-28
CN112860220B true CN112860220B (en) 2023-03-24

Family

ID=75989427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110178984.2A Active CN112860220B (en) 2021-02-09 2021-02-09 Reconfigurable floating-point multiply-add operation unit and method suitable for multi-precision calculation

Country Status (2)

Country Link
CN (1) CN112860220B (en)
WO (1) WO2022170809A1 (en)

Also Published As

Publication number Publication date
CN112860220A (en) 2021-05-28
WO2022170809A1 (en) 2022-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240124

Address after: 518000, Building 307, Building 2, Nanshan Zhiyuan Chongwen Park, No. 3370 Liuxian Avenue, Fuguang Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong Province

Patentee after: Shenzhen Maitexin Technology Co.,Ltd.

Country or region after: China

Address before: Southern University of Science and Technology

Patentee before: Southern University of Science and Technology

Country or region before: China
