WO2023028884A1 - Floating-point number computing circuit and floating-point number computing method - Google Patents

Floating-point number computing circuit and floating-point number computing method

Info

Publication number
WO2023028884A1
WO2023028884A1 PCT/CN2021/115811 CN2021115811W WO2023028884A1 WO 2023028884 A1 WO2023028884 A1 WO 2023028884A1 CN 2021115811 W CN2021115811 W CN 2021115811W WO 2023028884 A1 WO2023028884 A1 WO 2023028884A1
Authority
WO
WIPO (PCT)
Prior art keywords
mantissa
floating
point number
circuit
split
Prior art date
Application number
PCT/CN2021/115811
Other languages
French (fr)
Chinese (zh)
Inventor
毛伟
余浩
谢环
董镇江
Original Assignee
华为技术有限公司
南方科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 南方科技大学
Priority to PCT/CN2021/115811
Priority to CN202180096895.1A (CN117178253A)
Publication of WO2023028884A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57: Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations

Definitions

  • the embodiment of the present application relates to the computer field, and further relates to the application of artificial intelligence (AI) technology in the computer field, especially a floating-point number calculation circuit and a floating-point number calculation method.
  • AI: artificial intelligence
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is the branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that respond in ways similar to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
  • Convolutional neural networks are currently widely used in various types of image processing applications.
  • FP: floating point
  • Insufficient precision of FP16 data leads to non-convergence or slow convergence of network training, so higher-precision FP32 data is needed to ensure the effect of network training.
  • In addition, supercomputing applications need higher-precision FP64 data for numerical calculations.
  • a multiplier with a smaller number of digits can be used to calculate a floating-point number with a larger number of digits.
  • FP64 type floating-point data can be calculated by a multiplier used to calculate FP32 data.
  • the network device splits the FP64 floating-point data into smaller floating-point numbers for multiplication, and then adds the results of the multiplication through the adder to obtain the product of the FP64 floating-point data.
  • the number of bits of the adder required to add the results of the multiplication operation is large, and the cost of hardware design is high, which is not conducive to technology promotion.
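The split, multiply, and add approach described above can be sketched in Python as follows. This is only an illustration: the 26-bit split point and the name `split_multiply` are assumptions, since the text does not say how the existing approach divides the mantissa.

```python
def split_multiply(a: int, b: int, low_bits: int = 26) -> int:
    """Multiply two mantissas using only narrow multiplications.

    low_bits is a hypothetical split point chosen for illustration.
    """
    mask = (1 << low_bits) - 1
    a_hi, a_lo = a >> low_bits, a & mask
    b_hi, b_lo = b >> low_bits, b & mask
    # Four narrow products, each moved to its positional weight.
    partials = [
        (a_hi * b_hi) << (2 * low_bits),
        (a_hi * b_lo) << low_bits,
        (a_lo * b_hi) << low_bits,
        a_lo * b_lo,
    ]
    # Summing these shifted terms directly is what calls for a wide adder.
    return sum(partials)


# Two 53-bit values standing in for FP64 mantissas.
a = (1 << 52) + 12345678901
b = (1 << 52) + 98765432109
product = split_multiply(a, b)
```

The multiplications stay narrow, but the final sum spans the full width of the product, which is the cost the text attributes to this approach.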
  • the embodiment of the present application provides a floating-point number calculation circuit and a floating-point number calculation method.
  • the floating-point number calculation circuit can split a floating-point number with a large number of digits into floating-point numbers with a small number of digits. As a result, the timing overhead is short, the hardware design cost is low, and the calculation performance of the multiplier is reasonably utilized.
  • The first aspect of the present application provides a floating-point number calculation circuit used to calculate the product of a first floating-point number and a second floating-point number. The first floating-point number includes a first exponent and a first mantissa, and the second floating-point number includes a second exponent and a second mantissa. The floating-point number calculation circuit includes an exponent processing circuit and a calculation circuit; an output terminal of the exponent processing circuit is electrically connected to an input terminal of the calculation circuit.
  • The exponent processing circuit is used to obtain a first shift number according to the first exponent and the second exponent. The first shift number represents the number of bits by which the product of the first split mantissa and the second split mantissa is shifted; the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa. The calculation circuit is used to select and output part of the data in multiple first operation results to obtain multiple first addition data and multiple second addition data.
  • the calculation circuit in the floating-point number calculation circuit may select to output some data in the multiple first operation results to obtain multiple first addition data and multiple second addition data.
  • by selecting part of the data in the first operation results, the calculation circuit splits the multiple first operation results with a higher number of bits into multiple first addition data and multiple second addition data with a lower number of bits. An adder with a smaller bit width then sums the multiple first addition data, the multiple second addition data, and the multiple first operation results with a lower number of bits to obtain the product of the first mantissa and the second mantissa.
  • the bit width of the adder used when calculating the product of the mantissa part of the first floating point number and the second floating point number is small, and the hardware design cost is low, which is more conducive to technology promotion.
  • the calculation circuit includes a multiplication circuit, an addition circuit, and a first selection circuit; the output terminal of the exponent processing circuit is electrically connected to the input terminal of the multiplication circuit; the input end of the first selection circuit is electrically connected to the output end of the multiplication circuit, and the output end of the first selection circuit is electrically connected to the input end of the addition circuit; the first selection circuit is used to select and output the low-order data in a plurality of the first operation results to obtain a plurality of the first addition data, and to select and output the high-order data in the first operation results to obtain a plurality of the second addition data; the addition circuit is used to add a plurality of the first addition data and a plurality of the first operation results to obtain a low-order addition result and carry data, to add the carry data, the plurality of second addition data, and the plurality of first operation results to obtain a high-order addition result, and to accumulate the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
  • the first selection circuit can select and output the low-order data in the plurality of first operation results to obtain a plurality of first addition data, and select and output the high-order data in the plurality of first operation results to obtain a plurality of second addition data.
  • the first selection circuit may split the first operation result with a large number of bits into first addition data and second addition data with a small number of bits. Further, the first addition data and the second addition data are respectively summed with the corresponding first operation result to obtain a low bit addition result and a high bit addition result. Since the bit width of the low-order addition result and the high-order addition result is small, the adder used for calculating the low-order addition result and the high-order addition result has a small bit width, which reduces the construction cost of the calculation circuit.
  • the adding circuit includes a first adder and an accumulator; the input end of the first adder is electrically connected to the output end of the first selection circuit, and the output end of the first adder is electrically connected to the input end of the accumulator; the first adder is used to add a plurality of the first addition data and a plurality of the first operation results in a first calculation cycle to obtain the low-order addition result and the carry data, and to add the carry data, a plurality of the second addition data, and a plurality of the first operation results in a second calculation cycle to obtain the high-order addition result;
  • the accumulator is configured to accumulate the low-order addition result and the high-order addition result to obtain a product of the first mantissa and the second mantissa.
  • the length of the mantissa part of the FP64 floating point number is 53 bits. Therefore, the total length of the mantissa obtained after calculating A_mantissa*B_mantissa is 106 bits. If you want to directly complete the calculation of the mantissa part of a pair of FP64 type floating-point numbers in a calculation unit (PE unit), the adder (the first adder) needs to be expanded into an adder that supports data calculations with a length of 106 bits. The area cost and timing cost of the subsequent adder are too high.
  • the multiplication of a pair of FP64 mantissas can be split by the first selection circuit into two parts (part1 and part2); the first adder calculates part1 in the first calculation cycle to obtain the low-order addition result, and calculates part2 in the second calculation cycle to obtain the high-order addition result.
  • the accumulator accumulates the results of the two calculation cycles to obtain the product of the first mantissa and the second mantissa. Since the bit width of the low-order addition result and the high-order addition result is small, the adder used for calculating the low-order addition result and the high-order addition result has a small bit width, which reduces the construction cost of the floating-point number calculation circuit.
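The two-cycle addition described above can be sketched as follows. This is a simplified model under stated assumptions: the 53-bit split point and the name `two_cycle_sum` are hypothetical, and every operation result is split here, whereas the text also passes some operation results to the adder directly.

```python
def two_cycle_sum(first_results: list[int], low_bits: int = 53) -> int:
    """Sum wide operation results using two narrower additions.

    low_bits (assumed) is where each result is cut into low and high data.
    """
    mask = (1 << low_bits) - 1
    # Selection step: low-order data and high-order data of each result.
    first_add_data = [r & mask for r in first_results]
    second_add_data = [r >> low_bits for r in first_results]

    # First calculation cycle: low-order addition result plus carry data.
    low_total = sum(first_add_data)
    low_result = low_total & mask
    carry = low_total >> low_bits

    # Second calculation cycle: carry data plus the high-order data.
    high_result = carry + sum(second_add_data)

    # Accumulator: place the high-order result above the low-order result.
    return (high_result << low_bits) | low_result


results = [(3 << 60) + 5, (1 << 53) - 1, 1, 7 << 100]
total = two_cycle_sum(results)
```

Neither cycle adds values wider than roughly half of the full product, which is the reason the text gives for the smaller adder.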
  • the floating-point number calculation circuit further includes a splitting circuit; the output end of the splitting circuit is electrically connected to the input end of the exponent processing circuit and the input end of the multiplication circuit; the splitting circuit is used to split the first mantissa into the first split mantissa, which includes a first high-order mantissa and a first low-order mantissa, and to split the second mantissa into the second split mantissa, which includes a second high-order mantissa and a second low-order mantissa; the first shift number is used to indicate the shift difference between the highest bit of each high-order mantissa and the highest bit of each low-order mantissa.
  • the floating-point number calculation circuit can split the mantissa part of the first floating-point number, which has a large number of digits, into the first high-order mantissa and the first low-order mantissa with fewer digits, and split the mantissa part of the second floating-point number, which has a large number of digits, into the second high-order mantissa and the second low-order mantissa with fewer digits, so that a multiplier with fewer digits can calculate the product of each split mantissa part. This reduces the design cost of the hardware and makes reasonable use of the computing performance of the multiplier.
  • the first high-order mantissa includes a third mantissa
  • the first low-order mantissa includes a fourth mantissa, a fifth mantissa, a sixth mantissa, and a seventh mantissa
  • the second high-order mantissa includes an eighth mantissa
  • the second low order mantissa includes the ninth mantissa, the tenth mantissa, the eleventh mantissa and the twelfth mantissa.
  • a specific splitting method for the first mantissa and the second mantissa is provided. After the mantissa part of an FP64 floating-point number is split using this method, an FP32-type multiplier can be used for calculation; after the mantissa part of an FP128 floating-point number is split using this method, an FP64-type multiplier can be used for calculation.
  • This splitting method can realize the multiplication of the first mantissa and the second mantissa by using a multiplier with a smaller number of digits. The construction cost of the floating-point number calculation circuit is reduced, and it is more conducive to technology promotion.
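A one-high-plus-four-low split of this kind can be sketched as below. The segment widths are an assumption made for illustration (four 12-bit low pieces and a 5-bit high piece, totalling 53 bits); the text above names the pieces but does not give their widths, and `split_mantissa` and `multiply_split` are hypothetical names.

```python
def split_mantissa(mantissa: int, low_width: int = 12, low_count: int = 4):
    """Split a mantissa into low_count low pieces plus one high piece.

    Returns (value, shift) pairs, least significant piece first.
    The widths are assumed values, not taken from the source.
    """
    mask = (1 << low_width) - 1
    pieces = []
    for i in range(low_count):
        shift = i * low_width
        pieces.append(((mantissa >> shift) & mask, shift))
    high_shift = low_count * low_width
    pieces.append((mantissa >> high_shift, high_shift))
    return pieces


def multiply_split(a: int, b: int) -> int:
    """Rebuild a * b from the products of the narrow pieces."""
    return sum(
        (pa * pb) << (sa + sb)
        for pa, sa in split_mantissa(a)
        for pb, sb in split_mantissa(b)
    )


a = (1 << 52) + 12345678901
b = (1 << 52) + 98765432109
pieces_a = split_mantissa(a)
```

Every piece fits in at most 12 bits under these assumed widths, so each individual multiplication is far narrower than the 53-bit mantissas it helps to multiply.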
  • the floating-point number calculation circuit further includes a storage circuit; the output terminal of the splitting circuit is electrically connected to the input terminal of the storage circuit; the input terminal of the exponent processing circuit is electrically connected to the first output end of the storage circuit; the input end of the calculation circuit is electrically connected to the second output end of the storage circuit; the storage circuit is used to store the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number, and a fourth shift number; the third shift number represents the number of bits by which the first split mantissa is shifted, and the fourth shift number represents the number of bits by which the second split mantissa is shifted.
  • This possible implementation method provides a specific implementation method for storing temporary data in a floating-point number calculation circuit, which improves the feasibility of the solution.
  • the exponent processing circuit includes a second adder, a second selection circuit, and a third adder; the input terminal of the second adder is electrically connected to the first output end of the storage circuit, and the output end of the second adder is electrically connected to the first input end of the third adder; the second input end of the third adder is electrically connected to the output end of the second selection circuit, and the output end of the third adder is electrically connected to the first input end of the calculation circuit; the second adder is used to add the first exponent, the second exponent, the third shift number, and the fourth shift number to obtain a plurality of second operation results; the second selection circuit is used to select the maximum value among the plurality of second operation results; the third adder is configured to subtract the maximum value among the plurality of second operation results from each second operation result to obtain the first shift number.
  • This possible implementation provides a specific implementation form of the exponent processing circuit, which improves the feasibility of the solution.
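The three steps of the exponent processing circuit (add, select the maximum, subtract) can be sketched as follows. The function name and the sample shift values are hypothetical, and the sign convention is an assumption: the sketch returns the maximum minus each sum so that every shift number is non-negative, which the text does not state explicitly.

```python
def first_shift_numbers(exp_a: int, exp_b: int,
                        shifts_a: list[int], shifts_b: list[int]) -> list[int]:
    """Model the add / select-maximum / subtract steps.

    shifts_a and shifts_b play the role of the third and fourth shift
    numbers, one entry per split mantissa piece.
    """
    # Second adder: one sum per pair of split mantissa pieces.
    second_results = [
        exp_a + exp_b + sa + sb
        for sa in shifts_a
        for sb in shifts_b
    ]
    # Second selection circuit: the largest sum.
    largest = max(second_results)
    # Third adder: distance of each sum from the largest one
    # (assumed sign convention, chosen so results are non-negative).
    return [largest - r for r in second_results]


# Assumed piece offsets for a five-piece split with 12-bit low pieces.
shifts = [0, 12, 24, 36, 48]
shift_numbers = first_shift_numbers(3, -1, shifts, shifts)
```

For a single pair of operands the exponent sum is common to every term, so it cancels in the subtraction and only the piece offsets remain.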
  • the multiplication circuit includes a multiplier and a shift register; the input end of the multiplier is electrically connected to the second output end of the storage circuit, and the output end of the multiplier is electrically connected to the first input end of the shift register; the second input end of the shift register is electrically connected to the output end of the third adder; the output end of the shift register is electrically connected to the input terminal of the first adder; the multiplier is used to multiply the first split mantissa and the second split mantissa to obtain a plurality of third operation results; the shift register is used to shift the plurality of third operation results according to the first shift number to obtain a plurality of the first operation results.
  • This possible implementation method provides a specific implementation form of the multiplication circuit, which improves the feasibility of the solution.
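The multiplier and shift register stages can be sketched as below. This is a model under assumptions rather than the circuit itself: `multiply_and_shift` is a hypothetical name, each shift number is taken to be the distance from the most significant term, and the shift register is assumed to be wide enough that no bits are lost.

```python
def multiply_and_shift(pieces_a: list[int], pieces_b: list[int],
                       shift_numbers: list[int]) -> list[int]:
    """Multiply split mantissa pieces, then align each product."""
    # Multiplier: one narrow product per pair of pieces
    # (the "third operation results").
    third_results = [pa * pb for pa in pieces_a for pb in pieces_b]
    # Assumed register width: large enough for the biggest shift.
    window = max(shift_numbers)
    # Shift register: move each product right by its shift number within
    # the window, giving the aligned "first operation results".
    return [(t << window) >> s for t, s in zip(third_results, shift_numbers)]


# Small worked case: 20-bit values cut into a 12-bit low and an 8-bit high piece.
a, b = 0xABCDE, 0x12345
pieces_a = [a & 0xFFF, a >> 12]
pieces_b = [b & 0xFFF, b >> 12]
weights = [wa + wb for wa in (0, 12) for wb in (0, 12)]
shift_numbers = [max(weights) - w for w in weights]
first_results = multiply_and_shift(pieces_a, pieces_b, shift_numbers)
```

Summing `first_results` reproduces `a * b`, which is the job the text assigns to the adding stage that follows the shift register.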
  • the floating-point number calculation circuit further includes a memory controller; the output terminal of the memory controller is electrically connected to the input terminal of the splitting circuit; the memory controller is used to obtain the first floating-point number and the second floating-point number, and to send them to the splitting circuit.
  • This possible implementation manner provides a specific implementation form of a hardware structure capable of obtaining the first floating-point number and the second floating-point number, which improves the feasibility of the solution.
  • the first floating-point number further includes a first sign bit
  • the second floating-point number further includes a second sign bit
  • The second aspect of the present application provides a floating-point number calculation method for calculating the product of a first floating-point number and a second floating-point number. The first floating-point number includes a first exponent and a first mantissa, and the second floating-point number includes a second exponent and a second mantissa. The method includes: obtaining a first shift number according to the first exponent and the second exponent, where the first shift number represents the number of bits by which the product of the first split mantissa and the second split mantissa is shifted, the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa; selecting and outputting part of the data in a plurality of first operation results to obtain a plurality of first addition data and a plurality of second addition data; and obtaining the product of the first mantissa and the second mantissa according to the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results.
  • part of the data in the multiple first operation results may be selected and output to obtain multiple first addition data and multiple second addition data.
  • the multiple first operation results with a higher number of bits are split into multiple first addition data and multiple second addition data with a lower number of bits; an adder with a smaller bit width then sums the multiple first addition data, the multiple second addition data, and the multiple first operation results with a lower number of bits to obtain the product of the first mantissa and the second mantissa.
  • the bit width of the adder used when calculating the product of the mantissa part of the first floating point number and the second floating point number is small, and the hardware design cost is low, which is more conducive to technology promotion.
  • selecting and outputting part of the data in the multiple first operation results to obtain multiple first addition data and multiple second addition data, and obtaining the product of the first mantissa and the second mantissa according to the multiple first addition data, the multiple second addition data, and the multiple first operation results, includes: selecting and outputting the low-order data in the plurality of first operation results to obtain a plurality of the first addition data, and selecting and outputting the high-order data in the plurality of first operation results to obtain a plurality of the second addition data; adding the plurality of first addition data and the plurality of first operation results to obtain a low-order addition result and carry data; adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain a high-order addition result; and accumulating the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
  • adding the plurality of first addition data and the plurality of first operation results to obtain the low-order addition result and carry data, adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain the high-order addition result, and accumulating the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa, includes: adding the plurality of first addition data and the plurality of first operation results in a first calculation cycle to obtain the low-order addition result and the carry data; adding the carry data, the plurality of second addition data, and the plurality of first operation results in a second calculation cycle to obtain the high-order addition result; and accumulating the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
  • both the first floating point number and the second floating point number are FP64 floating point numbers. Since the length of the mantissa part of an FP64 floating point number is 53 bits, the total length of the mantissa obtained after calculating A_mantissa*B_mantissa is 106 bits. If you want to directly complete the calculation of the mantissa part of a pair of FP64 type floating-point numbers in a calculation unit (PE unit), the adder (the first adder) needs to be expanded into an adder that supports data calculations with a length of 106 bits. The area cost and timing cost of the subsequent adder are too high.
  • the method further includes: splitting the first mantissa into the first split mantissa, which includes a first high-order mantissa and a first low-order mantissa, and splitting the second mantissa into the second split mantissa, which includes a second high-order mantissa and a second low-order mantissa; the first shift number is used to indicate the shift difference between the highest bit of each high-order mantissa and the highest bit of each low-order mantissa.
  • the mantissa part of the first floating-point number, which has a large number of digits, can be split into the first high-order mantissa and the first low-order mantissa with fewer digits, and the mantissa part of the second floating-point number, which has a large number of digits, can be split into the second high-order mantissa and the second low-order mantissa with fewer digits, so that a multiplier with fewer digits can calculate the product of each split mantissa. This reduces the design cost of the hardware and makes reasonable use of the performance of the multiplier.
  • the first high-order mantissa includes a third mantissa
  • the first low-order mantissa includes a fourth mantissa, a fifth mantissa, a sixth mantissa, and a seventh mantissa
  • the second high-order mantissa includes an eighth mantissa
  • the second low order mantissa includes the ninth mantissa, the tenth mantissa, the eleventh mantissa and the twelfth mantissa.
  • a specific splitting method for the first mantissa and the second mantissa is provided. After the mantissa part of an FP64 floating-point number is split using this method, an FP32-type multiplier can be used for calculation; after the mantissa part of an FP128 floating-point number is split using this method, an FP64-type multiplier can be used for calculation.
  • This splitting method can realize the multiplication of the first mantissa and the second mantissa by using a multiplier with a smaller number of digits. The construction cost of the floating-point number calculation circuit is reduced, and it is more conducive to technology promotion.
  • the method further includes: storing the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number, and a fourth shift number; the third shift number represents the number of bits by which the first split mantissa is shifted, and the fourth shift number represents the number of bits by which the second split mantissa is shifted.
  • This possible implementation method provides a specific implementation method for storing temporary data in a floating-point number calculation method, which improves the feasibility of the solution.
  • obtaining the first shift number according to the first exponent and the second exponent includes: selecting the maximum value among the plurality of second operation results; and subtracting the maximum value among the second operation results from each of the second operation results to obtain the first shift number.
  • This possible implementation manner provides a specific implementation form for obtaining the first shift number, which improves the feasibility of the solution.
  • the method further includes: multiplying the first split mantissa and the second split mantissa to obtain multiple third operation results; and shifting the multiple third operation results according to the first shift number to obtain a plurality of the first operation results.
  • This possible implementation manner provides a specific implementation form for obtaining the first operation result, which improves the feasibility of the solution.
  • the method further includes: acquiring the first floating point number and the second floating point number.
  • the first floating-point number further includes a first sign bit
  • the second floating-point number further includes a second sign bit
  • The third aspect of the present application provides a floating-point number calculation circuit. The floating-point number calculation circuit includes an exponent processing circuit and a calculation circuit, and the calculation circuit includes a first multiplication circuit, a first selector, and an addition circuit; the output end of the exponent processing circuit is electrically connected to the input end of the first multiplication circuit; the output end of the first multiplication circuit is electrically connected to the input end of the first selector; the output end of the first selector is electrically connected to the input end of the adding circuit.
  • in the process of calculating the product of the mantissa parts of two floating-point numbers by the calculation circuit in the floating-point number calculation circuit, the first multiplication circuit outputs multiple operation results, and the first selector can split the operation results with a larger number of digits into first addition data and second addition data with a smaller number of digits. Then, in the addition circuit, an adder with a smaller bit width can sum the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results with a lower number of digits to obtain the product of the first mantissa and the second mantissa.
  • the bit width of the adder used when calculating the product of the mantissa part of the first floating point number and the second floating point number is small, and the hardware design cost is low, which is more conducive to technology promotion.
  • the floating-point number calculation circuit is used to calculate the product of the first floating-point number and the second floating-point number; the first floating-point number includes the first exponent, the first mantissa, and the first sign bit, and the second floating-point number includes a second exponent, a second mantissa, and a second sign bit; the input end of the exponent processing circuit is used to receive the first exponent and the second exponent; the input terminal of the calculation circuit is used to receive the first mantissa and the second mantissa.
  • the adding circuit includes a first adder and an accumulator; the input end of the first adder is electrically connected to the output end of the first selector, and the output of the first adder is electrically connected to the input of the accumulator.
  • the first adder can obtain the low-order addition result according to the first addition data and the multiple unsplit operation results, and obtain the high-order addition result according to the second addition data and the multiple divided operation results.
  • the accumulator sums the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa. Since the bit width of the low-order addition result and the high-order addition result is small, the first adder used for calculating the low-order addition result and the high-order addition result has a small bit width, which reduces the construction cost of the calculation circuit.
  • the first multiplication circuit includes a first multiplier and a first shift register; the first input terminal of the first shift register is electrically connected to the output terminal of the exponent processing circuit, the second input terminal of the first shift register is electrically connected to the output terminal of the first multiplier, and the output terminal of the first shift register is electrically connected to the input terminal of the first selector.
  • the first multiplier can calculate the product of the split first mantissa and the split second mantissa, and the first shift register can shift the result output by the first multiplier based on the shift number output by the exponent processing circuit and output the operation result after the shift.
  • This possible implementation provides a specific implementation of the multiplication circuit, which improves the feasibility of the solution.
  • the exponent processing circuit includes a second adder, a second selector, and a third adder; the output terminal of the second adder is electrically connected to the first input terminal of the third adder; the second input terminal of the third adder is electrically connected to the output terminal of the second selector, and the output terminal of the third adder is electrically connected to the first input terminal of the first shift register.
  • This possible implementation provides a specific implementation form of the exponent processing circuit, which improves the feasibility of the solution.
  • the calculation circuit further includes a second multiplication circuit, and the second multiplication circuit includes a second multiplier and a second shift register; a first input terminal of the second shift register is electrically connected to the output terminal of the exponent processing circuit, a second input terminal of the second shift register is electrically connected to the output terminal of the second multiplier, and an output terminal of the second shift register is electrically connected to the input terminal of the first adder.
  • the second multiplier in the second multiplication circuit multiplies the split first mantissa and the split second mantissa, and the second shift register shifts the result output by the second multiplier according to the shift number output by the exponent processing circuit to obtain multiple operation results.
  • the second shift register directly inputs the multiple operation results to the first adder, so that the first adder can sum the first addition result and the second addition result output by the first multiplication circuit and the multiple operation results output by the second shift register to obtain the product of the first mantissa and the second mantissa.
  • the floating-point number calculation circuit further includes a memory controller, a third selector, and a register; the input end of the third selector is electrically connected to the output end of the memory controller; the output end of the third selector is electrically connected to the input end of the register; the first output end of the register is electrically connected to the input end of the exponent processing circuit, and the second output end of the register is electrically connected to the input end of the calculation circuit.
  • the memory controller transmits the first floating-point number and the second floating-point number acquired from the memory to the third selector, and the third selector splits the first mantissa and the second mantissa and then inputs them to the register for storage.
  • This possible implementation improves the feasibility of the solution.
  • a third aspect of the embodiments of the present application provides a computing device, and the computing device includes a control circuit and a floating-point number computing circuit.
  • the floating-point number calculation circuit calculates data under the control of the control circuit, and the floating-point number calculation circuit is the floating-point number calculation circuit described in the first aspect or any possible implementation of the first aspect, or,
  • the floating-point number calculation circuit is the floating-point number calculation circuit described in the third aspect or any possible implementation manner of the third aspect.
  • Fig. 1 is the processing schematic diagram of the convolutional neural network provided by the present application.
  • Fig. 2 is a schematic diagram of the composition of an FP32 floating-point number provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a computing circuit provided by an embodiment of the present application.
  • Fig. 5 is another schematic structural diagram of a computing circuit provided by an embodiment of the present application.
  • Fig. 6 is another schematic structural diagram of a computing circuit provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the mantissa part of the first floating-point number and the second floating-point number provided in the present application;
  • FIG. 8 is a schematic structural diagram of a computing circuit provided in this application.
  • FIG. 9 is a schematic diagram of an operation process of a computing circuit provided in the present application.
  • FIG. 10 is a schematic diagram of an embodiment of a floating-point number calculation circuit provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • FIG. 12 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a split circuit provided by the present application.
  • FIG. 14 is a schematic structural diagram of a first mantissa and a second mantissa provided by the present application;
  • FIG. 15 is a schematic structural diagram of a storage circuit provided by the present application.
  • FIG. 16 is a schematic diagram of a connection relationship between a memory controller and memory provided by the present application.
  • FIG. 17 is a schematic structural diagram of an exponent processing circuit provided in this application.
  • FIG. 18 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • FIG. 19 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • FIG. 20 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • FIG. 21 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is the branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that respond in ways similar to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
  • Fig. 1 is a processing schematic diagram of the convolutional neural network provided by the present application.
  • Convolutional neural network has broad application prospects in image, speech recognition and other fields.
  • the convolutional neural network needs to perform convolution operations on multiple convolution kernels and one or more feature maps. Specifically, for each convolution kernel, start from the first pixel of the feature map, and move pixel by pixel in the row direction. When the end of the row is reached, move down one pixel in the column direction, and return to the row direction starting point, and repeat the process of moving in the row direction until all pixels of the feature map are traversed.
  • the parameters in the convolution kernel and the data at the corresponding positions of the feature map are used as the two operands of the convolution operation (they are multiplied pairwise and the products are then accumulated one by one). After the convolution result is obtained, the convolution result is output.
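  • The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single-channel feature map, a stride of one pixel and no padding; the names `conv2d`, `feature_map` and `kernel` are hypothetical and do not come from the application.

```python
def conv2d(feature_map, kernel):
    """Slide `kernel` over `feature_map` pixel by pixel (stride 1, no padding)."""
    fh, fw = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for row in range(fh - kh + 1):          # move down one pixel per row pass
        out_row = []
        for col in range(fw - kw + 1):      # move pixel by pixel along the row
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # multiply pairwise, then accumulate the products
                    acc += kernel[i][j] * feature_map[row + i][col + j]
            out_row.append(acc)
        output.append(out_row)
    return output


feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
result = conv2d(feature_map, kernel)
```

  • Every output element is a chain of multiply-accumulate operations, which is why the cost of the floating-point multiplier dominates this kind of workload.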
  • Convolution neural network is currently widely used in various types of image processing applications.
  • image processing applications use FP16 (16-bit floating point, FP) data to perform network training on the model
  • the floating-point calculation circuit involved in the present invention can be applied not only in the field of artificial intelligence, but also in the field of digital signal processing, such as image processing systems, radar systems and communication systems.
  • the circuit and method can optimize the performance of digital signal processing (DSP) or other digital devices.
  • a multiplier with a smaller number of digits can be used to calculate a floating-point number with a larger number of digits.
  • FP64 type floating-point data can be calculated by a multiplier used to calculate FP32 data.
  • the network device splits the FP64 floating-point data into smaller floating-point numbers for multiplication, and then adds the results of the multiplication through the adder to obtain the product of the FP64 floating-point data.
  • the bit width of the adder required for adding the multiplication results is large, and the hardware design cost is high, which is not conducive to technology promotion.
  • the embodiments of the present application provide a floating-point number calculation circuit, a floating-point number calculation method, and a calculation device.
  • the bit width of the adder used when calculating the product of the mantissa part of the first floating point number and the second floating point number is small, and the hardware design cost is low, which is more conducive to technology promotion.
  • each floating-point number consists of three parts, namely the sign bit (sign), the exponent bit (exp) and the mantissa bit (mantissa).
  • the actual value of a floating point number is equal to $\mathrm{sign} \times 2^{\mathrm{exp}} \times \mathrm{mantissa}$.
  • FIG. 2 is a schematic diagram of composition of FP32 type floating-point numbers provided by the embodiment of the present application.
  • the FP32 floating-point number has a 1-bit sign, an 8-bit exp and a 24-bit mantissa, of which 32 bits in total are explicitly stored. The highest bit of the mantissa is stored implicitly (if exp is not 0, the hidden bit is 1; otherwise the hidden bit is 0), so the three parts occupy 32 stored bits in total.
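  • As a minimal sketch of this layout, the following Python snippet extracts the three fields of an FP32 value and restores the hidden bit. The function name `decode_fp32` is hypothetical; the snippet uses only the standard `struct` module and returns the raw (biased) exponent field.

```python
import struct


def decode_fp32(x):
    """Return (sign, exp, mantissa) of x encoded as FP32; mantissa is 24 bits wide."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                   # 1-bit sign
    exp = (bits >> 23) & 0xFF           # 8-bit exponent field (biased)
    fraction = bits & 0x7FFFFF          # 23 explicitly stored mantissa bits
    hidden = 1 if exp != 0 else 0       # implicit highest mantissa bit
    mantissa = (hidden << 23) | fraction
    return sign, exp, mantissa


sign, exp, mantissa = decode_fp32(-2.5)
```

  • For -2.5 the stored pattern decodes to sign 1, exponent field 128 and mantissa 0xA00000, that is, binary 1.01 with the hidden bit restored.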
  • the floating-point number calculation circuit, floating-point number calculation method and calculation device provided by the application will be introduced in detail below in conjunction with the accompanying drawings in the application. First, the floating-point number calculation circuit provided by the application will be introduced.
  • FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit provided by an embodiment of the present application.
  • the floating-point number calculation circuit includes at least an exponent processing circuit 101 and a calculation circuit 102 .
  • the output end of the exponent processing circuit 101 is electrically connected to the input end of the calculation circuit 102 .
  • the floating-point number calculation circuit is used to calculate the product of the first floating-point number and the second floating-point number.
  • the first floating point number includes a first exponent and a first mantissa
  • the second floating point number includes a second exponent and a second mantissa.
  • the exponent processing circuit 101 and the calculation circuit 102 can calculate the product of the mantissa part of the first floating point number and the second floating point number according to the first floating point number and the second floating point number. This calculation process will be explained below.
  • the exponent processing circuit 101 may obtain the first shift number according to the first exponent and the second exponent.
  • the first shift number is used to represent the shift number of the product between the first split mantissa and the second split mantissa.
  • the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa.
  • the calculation circuit 102 may select and output part of the data in the multiple first operation results to obtain multiple first addition data and multiple second addition data, and obtain the product of the first mantissa and the second mantissa according to the multiple first addition data, the multiple second addition data and the multiple first operation results.
  • the first operation result is used to represent the data obtained after the product of the first split mantissa and the second split mantissa is shifted according to the first shift number.
  • the calculation circuit in the floating-point number calculation circuit may select to output some data in the multiple first operation results to obtain multiple first addition data and multiple second addition data.
  • by selecting part of the data in the first operation results, the calculation circuit splits the multiple first operation results with more bits into multiple first addition data and multiple second addition data with fewer bits; an adder with a smaller bit width then sums the plurality of first addition data, the plurality of second addition data and the plurality of first operation results with fewer bits to obtain the product of the first mantissa and the second mantissa.
  • the bit width of the adder used when calculating the product of the mantissa part of the first floating point number and the second floating point number is small, and the hardware design cost is low, which is more conducive to technology promotion.
  • FIG. 4 is a schematic structural diagram of a calculation circuit 102 provided in an embodiment of the present application.
  • the calculation circuit 102 may include a multiplication circuit 201 , a first selection circuit 202 and an addition circuit 203 .
  • the output terminal of the exponent processing circuit 101 is electrically connected to the input terminal of the multiplication circuit 201 .
  • the input end of the first selection circuit 202 is electrically connected to the output end of the multiplication circuit 201
  • the output end of the first selection circuit 202 is electrically connected to the input end of the addition circuit 203 .
  • the first selection circuit 202 may select and output low-order data among the multiple first operation results to obtain multiple first addition data, and select and output high-order data among the multiple first operation results to obtain multiple second addition data.
  • the addition circuit 203 can add the plurality of first addition data and the plurality of first operation results to obtain a low-order addition result and carry data, and add the carry data, the plurality of second addition data and the plurality of first operation results to obtain a high-order addition result. The product of the first mantissa and the second mantissa is obtained after the high-order addition result and the low-order addition result are accumulated.
  • the calculation circuit 102 may include more addition circuits 203 and/or more first selection circuits 202, which is not limited here.
  • the adding circuit 203 has a specific implementation manner. The following will take FIG. 5 as an example to illustrate a specific implementation form of the adding circuit 203 provided in the present application.
  • FIG. 5 is another schematic structural diagram of a computing circuit provided by an embodiment of the present application.
  • the adding circuit 203 may include a first adder 301 and an accumulator 302 .
  • the input end of the first adder 301 is electrically connected to the output end of the first selection circuit 202
  • the output end of the first adder 301 is electrically connected to the input end of the accumulator 302 .
  • the first adder 301 can add the plurality of first addition data and the plurality of first operation results in the first calculation cycle to obtain the low-order addition result and carry data, and add the carry data, the plurality of second addition data and the plurality of first operation results in the second calculation cycle to obtain the high-order addition result.
  • the accumulator 302 may accumulate the low bit addition result and the high bit addition result to obtain the product of the first mantissa and the second mantissa.
  • the number of first adders 301 and the number of accumulators 302 included in the adding circuit 203 shown in FIG. 5 are only for illustration.
  • the adding circuit 203 may include more first adders 301 and/or more accumulators 302, which are not limited here.
  • the multiplication circuit 201 has a specific implementation manner. The following will take FIG. 6 as an example to illustrate a specific implementation form of the multiplication circuit 201 provided in this application.
  • FIG. 6 is another schematic structural diagram of a computing circuit provided by an embodiment of the present application.
  • the multiplication circuit 201 includes a multiplier 303 and a shift register 304;
  • the input terminal of the multiplier 303 is electrically connected to the output terminal of the exponent processing circuit 101 , and the output terminal of the multiplier 303 is electrically connected to the input terminal of the shift register 304 .
  • the multiplier 303 may multiply the first split mantissa and the second split mantissa to obtain multiple third operation results.
  • the shift register 304 can perform shift processing on multiple third operation results according to the multiple first shift numbers to obtain multiple first operation results.
  • the format of the mantissa part of the first floating point number and the second floating point number is first introduced.
  • Fig. 7 is a schematic diagram of mantissa parts of the first floating point number and the second floating point number provided in this application.
  • the calculation circuit can split the mantissa part (first mantissa) of the first floating-point number A into five parts a0, a1, a2, a3, and a4.
  • the mantissa part (second mantissa) of the second floating point number B is split into five parts b0, b1, b2, b3 and b4.
  • a0, a1, a2, a3, and a4 are the first split mantissas
  • b0, b1, b2, b3, b4 are the second split mantissas.
  • the digits of a1, a2, a3, a4, b1, b2, b3, and b4 are all 12 bits
  • the digits of a0 and b0 are 5 bits.
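  • The split described above (a 5-bit high part followed by four 12-bit parts, 53 bits in total) can be sketched as follows. The helper names `split_mantissa` and `join_parts` are hypothetical, and treating the mantissa as a 53-bit integer with a0 as the most significant part is an assumption made for illustration.

```python
PART_WIDTHS = (5, 12, 12, 12, 12)  # widths of a0 (most significant) .. a4


def split_mantissa(mantissa, widths=PART_WIDTHS):
    """Split a 53-bit mantissa into [a0, a1, a2, a3, a4]."""
    parts = []
    for width in reversed(widths):          # peel parts off from the low end
        parts.append(mantissa & ((1 << width) - 1))
        mantissa >>= width
    parts.reverse()
    return parts


def join_parts(parts, widths=PART_WIDTHS):
    """Inverse of split_mantissa: concatenate the parts again."""
    value = 0
    for part, width in zip(parts, widths):
        value = (value << width) | part
    return value


mantissa_a = (1 << 53) - 1                  # all 53 bits set
parts_a = split_mantissa(mantissa_a)
```

  • Rejoining the parts with `join_parts` recovers the original mantissa, which is a quick check that the chosen widths cover all 53 bits.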
  • the multiplication of the mantissa part of the first floating point number A and the mantissa part of the second floating point number B can be expressed as Formula 1 when the calculation circuit 102 performs operations.
  • the adder (the first adder) needs to be expanded into an adder that supports data calculations with a length of 106 bits.
  • the area cost and timing cost of such an adder are too high. Therefore, the multiplication of a pair of FP64 mantissas can be divided into two parts: the first part (part1) in the above formula is calculated in the first calculation cycle, and the second part (part2) is calculated in the second calculation cycle.
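  • Formula 1 itself is not reproduced in this text. Under the split described above (a0 and b0 of 5 bits, the remaining parts of 12 bits each), one reconstruction that is consistent with the bit positions given for FIG. 11 and FIG. 12 is sketched below. The symbols $A$, $B$ and the exact grouping into part1 and part2 are assumptions, not the applicant's original notation.

$$
\begin{aligned}
A &= \sum_{i=0}^{4} a_i \, 2^{12(4-i)}, \qquad B = \sum_{j=0}^{4} b_j \, 2^{12(4-j)} \\
A \times B &= \sum_{i=0}^{4} \sum_{j=0}^{4} a_i b_j \, 2^{12(8-i-j)} = \text{part1} + 2^{60} \cdot \text{part2} \\
\text{part1} &= \sum_{i+j>4} a_i b_j \, 2^{12(8-i-j)} + 2^{48} \sum_{i+j=4} \left( a_i b_j \bmod 2^{12} \right) \\
\text{part2} &= \sum_{i+j<4} a_i b_j \, 2^{12(8-i-j)-60} + \sum_{i+j=4} \left\lfloor \frac{a_i b_j}{2^{12}} \right\rfloor
\end{aligned}
$$

  • In this form part1 may exceed 60 bits; the overflow corresponds to the carry data that is added to part2 in the second calculation cycle.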
  • FIG. 8 is a schematic structural diagram of a computing circuit provided in this application.
  • each calculation module respectively calculates the products between the first split mantissas and the second split mantissas, such as a0*b4, a4*b0, a1*b3, a3*b1 and a2*b2.
  • the a0*b4, a4*b0, a1*b3, a3*b1, and a2*b2 mentioned above are for example illustration.
  • the multiplier calculates all pairwise products between the 5 first split mantissas and the 5 second split mantissas, and 25 third operation results are obtained.
  • after the third operation results are shifted according to the first shift number, the first operation results can be obtained.
  • for example, 48 bits is the first shift number of the following third operation results: a0*b4, a4*b0, a1*b3, a3*b1 and a2*b2.
  • the multiple shift registers can perform shift processing on the multiple third operation results according to the first shift number.
  • the third operation results and the corresponding 48-bit shift number described in the above example are only illustrative. In the actual calculation process, the shift register can shift more third operation results according to other first shift numbers, and the first shift number may also take other values, which are not limited here.
  • the first selection circuit in the first calculation module and the second calculation module receives the shifted a2*b2, a1*b3, a3*b1, a0*b4 and a4*b0 (first operation results). Since the length of the shifted results of a2*b2, a1*b3, a3*b1, a0*b4 and a4*b0 is 24 bits, the first selection circuit can output the lower 12 bits (first addition data) in the first calculation cycle. A plurality of 52-bit adders (first adders) add the lower 12 bits and the shifted a1*b4, a4*b1, a2*b3, a3*b2, a3*b3, a2*b4, a4*b2, a3*b4, a4*b3 and a4*b4 (first operation results) to obtain the low-order addition result and carry data, where the carry data refers to the data generated by the carry after the multiple addition results are calculated.
  • the first selection circuit in the first calculation module and the second calculation module receives the shifted a2*b2, a1*b3, a3*b1, a0*b4 and a4*b0 (first operation results).
  • the first selection circuit can output the high-order 12 bits (second addition data) in the second calculation cycle. A plurality of 52-bit adders (first adders) add the high-order 12 bits, the shifted a0*b0, a0*b1, a1*b0, a0*b2, a2*b0, a1*b1, a0*b3, a3*b0, a1*b2 and a2*b1 (first operation results) and the carry data to obtain the high-order addition result.
  • the accumulator accumulates the low bit addition result and the high bit addition result to obtain the product of the first mantissa and the second mantissa.
  • the above example illustrates the operation process of the calculation circuit 102 in conjunction with the specific hardware structure in FIG. 8 .
  • the calculation circuit 102 obtains the low-order addition result in the first calculation cycle according to the first addition data and the first operation result.
  • FIG. 9 is a schematic diagram of an operation process of a calculation circuit provided in this application.
  • in FIG. 9, PP1 to PP25 respectively represent the plurality of first operation results, where a2*b2, a1*b3, a3*b1, a0*b4 and a4*b0 respectively correspond to PP11 to PP15 in the figure.
  • the sum of the low-order mantissa parts of PP11 to PP15 and other first operation results is calculated in the first calculation cycle, and the sum of the high-order mantissa parts of the carry data PP00, PP11 to PP15 and other first operation results is calculated in the second calculation cycle.
  • the first calculation cycle calculates the sum of the 15 groups of low-order bits. The low 12 bits of PP1 are taken as bits [11:0] of the final 106-bit result, and the high 12 bits of PP1 are used as the input for the lower 12 bits of the addition tree. The lower 12 bits of PP11 to PP15 are taken as the upper 12 bits of the input to the 48-bit addition tree; PP2 to PP10 are as shown in the figure, and bits [59:12] after interception and shifting are the input of the addition tree. In the first calculation cycle, the 15 groups of low-order bits are accumulated to obtain a 52-bit result.
  • bits [47:0] of this 52-bit result are bits [59:12] of the final 106-bit result, and the high-order 4 bits [51:48] are used as the carry data.
  • in the second calculation cycle, the high 12 bits of PP11 to PP15 are taken as the low 12 bits of the input to the 48-bit addition tree, and the low 12 bits of PP25 are taken as the high 12 bits of the input to the 48-bit addition tree. Here the bit width of PP25 is only 10 bits, so the upper 12 bits of PP25, that is, [23:12], can be ignored. PP16 to PP24 are as shown in the figure; the bits [59:12] after interception and shifting are the input of the addition tree and are added to the carry signal of the first calculation cycle to obtain a 52-bit result.
  • the bit width of the addition tree can be 48 bits. It is also possible to let the addition tree cover the carry completely, in which case the bit width of the addition tree needs to be 52 bits. This is not specifically limited here.
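  • The two-cycle scheme can be checked with a small behavioural model. The sketch below is not the circuit of FIG. 8; it is a plain Python model under the assumption that the 106-bit product is cut at bit 60, that cycle 1 sums everything below bit 60 and cycle 2 sums everything above it plus the carry. The names `split_mantissa` and `two_cycle_product` are hypothetical.

```python
import random

WIDTHS = (5, 12, 12, 12, 12)  # a0 (most significant) .. a4


def split_mantissa(mantissa):
    parts = []
    for width in reversed(WIDTHS):
        parts.append(mantissa & ((1 << width) - 1))
        mantissa >>= width
    return parts[::-1]


def two_cycle_product(a, b):
    """Multiply two 53-bit mantissas in two passes; returns (product, carry)."""
    pa, pb = split_mantissa(a), split_mantissa(b)
    low_sum = 0    # cycle 1: everything below bit 60 (15 terms)
    high_sum = 0   # cycle 2: everything from bit 60 upwards, relative to bit 60
    for i in range(5):
        for j in range(5):
            p = pa[i] * pb[j]              # partial product before shifting
            shift = 12 * (8 - i - j)       # bit position of the partial product
            if shift < 48:
                low_sum += p << shift
            elif shift == 48:              # 24-bit products straddling bit 60
                low_sum += (p & 0xFFF) << 48   # lower 12 bits go to cycle 1
                high_sum += p >> 12            # upper bits go to cycle 2
            else:
                high_sum += p << (shift - 60)
    carry = low_sum >> 60                  # carry data handed to cycle 2
    low = low_sum & ((1 << 60) - 1)
    high = high_sum + carry
    return (high << 60) | low, carry


random.seed(0)
a = random.getrandbits(52) | (1 << 52)     # 53-bit mantissa with hidden bit set
b = random.getrandbits(52) | (1 << 52)
product, carry = two_cycle_product(a, b)
assert product == a * b
```

  • Because cycle 1 adds 15 terms that each lie below bit 60, the carry in this model never needs more than 4 bits, which matches the 4-bit carry data described above.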
  • FIG. 10 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • FIG. 10 shows the corresponding positions in the addition tree of the first operation results obtained after calculating each part of part1 and part2.
  • the 60bit addition tree can cover the calculation of part1.
  • the addition tree for calculating the low-order addition result (Part1) requires a 52-bit bit width, and the addition tree can completely cover the 4-bit carry of the Part1 part.
  • the addition tree for calculating the high-order addition result (Part2) requires a 48-bit bit width. In this way, for each Part, the addition tree only needs 52 bits at most to realize the multiplication operation between the first mantissa and the second mantissa.
  • FIG. 11 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • the black parts in the figure are the corresponding positions of the first operation results in part1 in the addition tree.
  • the lower 12 bits of a2*b2, a1*b3, a3*b1, a0*b4 and a4*b0 shifted according to the first shift number are located at 48bit-60bit.
  • the shifted a1*b4, a4*b1, a2*b3, and a3*b2 are located at 36bit-60bit.
  • the shifted a3*b3, a2*b4, and a4*b2 are located at 24bit-48bit.
  • the shifted a3*b4 and a4*b3 are located at 12bit-36bit.
  • the shifted a4*b4 is located at 0bit-24bit.
  • FIG. 12 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • the black parts in the figure are the corresponding positions of the first operation results in part2 in the addition tree.
  • the high-order 12 bits of a0*b4 and a4*b0 shifted according to the first shift number are located at 60bit-72bit.
  • the high-order 12 bits of a2*b2, a1*b3, and a3*b1 shifted according to the first shift number are located at 60bit-72bit.
  • a0*b3 and a3*b0 shifted according to the first shift number are located at 60bit-77bit.
  • a1*b2 and a2*b1 shifted according to the first shift number are located at 60bit-84bit.
  • a0*b2 and a2*b0 shifted according to the first shift number are located at 72bit-89bit.
  • a1*b1 shifted according to the first shift number is located at 72bit-96bit.
  • a0*b1 and a1*b0 shifted according to the first shift number are located at 84bit-101bit.
  • the a0*b0 shifted according to the first shift number is located at 96bit-106bit.
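  • The bit positions listed above follow directly from the widths and offsets of the split parts. The sketch below recomputes them; the names `OFFSETS` and `product_bit_range` are hypothetical, and each range is reported as (lowest bit, one past the highest bit) of the full shifted product, before any 12-bit selection.

```python
WIDTHS = (5, 12, 12, 12, 12)    # widths of a0..a4 and b0..b4
OFFSETS = (48, 36, 24, 12, 0)   # bit offset of each part inside the 53-bit mantissa


def product_bit_range(i, j):
    """Bit range occupied by a_i * b_j inside the 106-bit product."""
    low = OFFSETS[i] + OFFSETS[j]           # shift applied to the product
    high = low + WIDTHS[i] + WIDTHS[j]      # shift plus product width
    return low, high


ranges = {
    f"a{i}*b{j}": product_bit_range(i, j)
    for i in range(5)
    for j in range(5)
}
```

  • For example, `ranges["a0*b3"]` is (60, 77) and `ranges["a4*b4"]` is (0, 24). The five products with a 48-bit shift span bit 60, which is why their lower and upper bits are handled in different calculation cycles.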
  • the floating-point number calculation circuit may also include a splitting circuit.
  • FIG. 13 is a schematic structural diagram of a splitting circuit provided in the present application.
  • the output terminal of the splitting circuit is electrically connected with the input terminal of the exponent processing circuit and the input terminal of the multiplication circuit.
  • the splitting circuit may include a first selector and a register, and an output end of the first selector is electrically connected to an input end of the register.
  • the first selector receives the first floating-point number and the second floating-point number, splits them, and stores the results in the corresponding registers.
  • FIG. 14 is a schematic structural diagram of a first mantissa and a second mantissa provided by the present application.
  • multiple first selectors in the splitting circuit can split the first mantissa into the first split mantissa, which includes the first high-order mantissa and the first low-order mantissa, and split the second mantissa into the second split mantissa, which includes the second high-order mantissa and the second low-order mantissa.
  • the first shift number is used to indicate the shift difference between the highest bit of each high-order mantissa and the highest bit of each low-order mantissa.
  • the first high-order mantissa includes the third mantissa, and the first low-order mantissa includes the fourth mantissa, the fifth mantissa, the sixth mantissa and the seventh mantissa.
  • the second high-order mantissa includes the eighth mantissa, and the second low-order mantissa includes the ninth mantissa, the tenth mantissa, the eleventh mantissa and the twelfth mantissa.
  • multiple first selectors in the splitting circuit can split the first mantissa of the first floating-point number into the third mantissa, the fourth mantissa, the fifth mantissa, the sixth mantissa and the seventh mantissa.
  • the first selector may split the second mantissa of the second floating-point number into an eighth mantissa, a ninth mantissa, a tenth mantissa, an eleventh mantissa, and a twelfth mantissa.
  • the splitting circuit can split the mantissa part of the first floating-point number into the third mantissa 10001 with a length of 5 bits, the fourth mantissa with a length of 12 bits 100000000001, the fifth mantissa with a length of 12 bits 100000000011, and the sixth mantissa with a length of 12 bits 100000000111 and the seventh mantissa 100000001111 with a length of 12 bits.
  • the third mantissa belongs to the first high mantissa
  • the fourth mantissa, the fifth mantissa, the sixth mantissa, and the seventh mantissa belong to the first low mantissa.
  • the first shift number is used to indicate the shift difference between the highest bit of the high-order mantissa and the highest bit of each low-order mantissa. That is, the first shift number of the third mantissa is 0. The shift difference between the first bit of the fourth mantissa and the first bit of the third mantissa is 5 bits, which equals the number of bits of the third mantissa, so the first shift number of the fourth mantissa is a right shift of 5 bits.
  • the shift difference between the first bit of the fifth mantissa and the first bit of the third mantissa is 17 bits, which equals the sum of the numbers of bits of the third mantissa and the fourth mantissa, so the first shift number of the fifth mantissa is a right shift of 17 bits.
  • the shift difference between the first bit of the sixth mantissa and the first bit of the third mantissa is 29 bits, which equals the sum of the numbers of bits of the third mantissa, the fourth mantissa and the fifth mantissa, so the first shift number of the sixth mantissa is a right shift of 29 bits.
  • the shift difference between the first bit of the seventh mantissa and the first bit of the third mantissa is 41 bits, which equals the sum of the numbers of bits of the third mantissa, the fourth mantissa, the fifth mantissa and the sixth mantissa, so the first shift number of the seventh mantissa is a right shift of 41 bits.
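  • These shift numbers are simply the running sum of the widths of the preceding parts. A minimal sketch using the example bit pattern given above (the variable names are hypothetical):

```python
from itertools import accumulate

widths = [5, 12, 12, 12, 12]  # third .. seventh mantissa

# Shift number of each part = total width of all parts that precede it.
shift_numbers = [0] + list(accumulate(widths))[:-1]

# Example mantissa from the text, written from the highest bit to the lowest.
mantissa_bits = (
    "10001"
    "100000000001"
    "100000000011"
    "100000000111"
    "100000001111"
)

# Cut the bit string at the computed positions to recover each split mantissa.
parts = [mantissa_bits[s:s + w] for s, w in zip(shift_numbers, widths)]
```

  • Here `shift_numbers` evaluates to `[0, 5, 17, 29, 41]`, matching the right shifts of 5, 17, 29 and 41 bits derived above.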
  • the first mantissa and the second mantissa can also be split in other ways; for example, the first part has a length of 9 bits and the second, third, fourth and fifth parts all have a length of 11 bits, which is not limited here.
  • the splitting manner of the second high-order mantissa is similar to that of the first high-order mantissa, and the splitting manner of the second low-order mantissa is similar to that of the first low-order mantissa, and details are not described here.
  • the first floating-point number may be a floating-point number of type FP32, FP64, or FP128; the type of the floating-point number is not limited here.
  • the mantissa part of the first floating-point number may be split into two parts, or may be split into multiple parts, which is not specifically limited here.
  • the number of digits of each mantissa part after splitting may be equal, or the number of digits of each mantissa part after splitting may be unequal, which is not specifically limited here.
  • the floating-point number calculation circuit may further include a storage circuit.
  • the output end of the splitting circuit is electrically connected to the input end of the storage circuit.
  • the input end of the exponent processing circuit is electrically connected to the first output end of the storage circuit.
  • the input terminal of the calculation circuit is electrically connected with the second output terminal of the storage circuit.
  • FIG. 15 is a schematic structural diagram of a storage circuit provided in the present application.
  • the storage circuit includes a plurality of registers for storing the first split mantissa, the second split mantissa, the first exponent, the second exponent, the third shift number, and the fourth shift number. The third shift number represents the shift number of the first split mantissa, and the fourth shift number represents the shift number of the second split mantissa.
  • the number of registers included in the storage circuit in FIG. 15 is only for illustration.
  • the storage circuit may include more or fewer registers than those shown in FIG. 15, which is not specifically limited here.
  • the floating-point number calculation circuit may further include a memory controller.
  • FIG. 16 is a schematic diagram of a connection relationship between a memory controller and memory provided by the present application.
  • the input end of the memory controller is connected to the output end of the memory, and the output end of the memory controller is electrically connected to the input end of the splitting circuit.
  • the first floating point number and the second floating point number are stored in the memory, and the memory controller can obtain the first floating point number and the second floating point number, and send the first floating point number and the second floating point number to the splitting circuit.
  • the memory may be a double data rate (DDR) memory, or other memory, which is not specifically limited here.
  • the memory controller may be a DDR controller, or other types of memory controllers, which are not specifically limited here.
  • the exponent processing circuit 101 has a specific implementation.
  • the exponent processing circuit 101 obtains the first shift number according to the first exponent and the second exponent using a specific calculation method. The implementation and operation of the exponent processing circuit 101 are described below with reference to FIG. 17.
  • FIG. 17 is a schematic structural diagram of an exponent processing circuit provided in this application.
  • the exponent processing circuit 101 includes at least a second adder 401 , a second selection circuit 402 and a third adder 403 .
  • the input end of the second adder 401 is electrically connected to the first output end of the storage circuit, and the output end of the second adder 401 is electrically connected to the first input end of the third adder 403 .
  • the second input end of the third adder 403 is electrically connected to the output end of the second selection circuit, and the output end of the third adder 403 is electrically connected to the first input end of the calculation circuit 102 .
  • the second adder 401 may add the first exponent, the second exponent, the third shift number and the fourth shift number to obtain multiple second operation results.
  • the second selection circuit 402 may select the maximum value among the multiple second operation results.
  • the third adder 403 takes the difference between the maximum value among the multiple second operation results and each second operation result to obtain the first shift number.
  • the present application also provides a floating-point number calculation method.
  • the specific implementation of the floating-point number calculation method can be understood with reference to the above-mentioned floating-point number calculation circuits described in FIG. 3 to FIG. 17 , and details are not repeated here.
  • the present application also provides another floating-point number calculation circuit.
  • the specific implementation of the floating-point number calculation method can be understood with reference to the above-mentioned floating-point number calculation circuit described in Figures 3 to 17, and details will not be repeated here.
  • FIG. 18 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • Step 1: Referring to FIG. 18, the second floating-point number B is data in the filter matrix.
  • the DDR controller memory controller
  • X in Figure 10 holds the A_MSB and A_LSB obtained by splitting the mantissa of each first floating-point number A, together with the exponent part EXP corresponding to each A_MSB and A_LSB. The mantissa part of the second floating-point number B is split into two parts, MSB and LSB, and stored in the weight RAM (storage circuit). Entries 1, 2, and N in FIG. 18 hold the B_MSB and B_LSB obtained by splitting the mantissa of each second floating-point number B, together with the exponent part EXP corresponding to each B_MSB and B_LSB.
  • FIG. 19 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • Step 2: Referring to FIG. 19, the split mantissas in the weight RAM are preloaded into the convolution computing unit. At the same time, EXP (the exponent part corresponding to each mantissa part after splitting) is processed by EXP offset (the second adder) and is also preloaded into the convolution computing unit.
  • FIG. 20 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • Step 3: Referring to FIG. 20, the first segment of mantissa data (Part I) is fetched from the data RAM. Its EXP part is likewise first processed by EXP offset and then placed in the convolution computing unit, where it is computed with the preloaded parameters (Part I) to obtain a result.
  • FIG. 21 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by the embodiment of the present application.
  • Step 4: Referring to FIG. 21, convolution computing unit 1 forwards the first segment of data (Part I) to computing unit 2 and obtains the second segment of data (Part II) from the data RAM. Computing unit 1, after acquiring the Part II data, and computing unit 2, after acquiring the Part I data, each complete the operation and generate a result. On each clock, computing units 2-N forward the data they processed on the previous clock to the next computing unit, and computing unit 1 acquires new data from the data RAM.
  • Step 5: Repeat step 4 until all the data has been computed and the results have been generated.
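The data flow of steps 1 to 5 can be pictured with a small Python sketch. It is only an illustrative model under stated assumptions: the function name `run_pipeline`, the use of plain integers in place of split mantissa segments, and the choice to record one product per unit per clock are hypothetical and are not taken from the application. Units are numbered from 0 here, whereas the text numbers them from 1.

```python
def run_pipeline(weights, segments):
    """Model data segments moving through a chain of computing units.

    weights[i] stands for the parameter preloaded into unit i (step 2);
    segments is the stream fetched from the data RAM (steps 3 to 5).
    """
    if not weights:
        return []
    units = len(weights)
    held = [None] * units            # segment held by each unit on this clock
    pending = list(segments)
    log = []                         # (clock, unit index, product) records
    for clock in range(len(pending) + units - 1):
        # Every unit passes its segment to the next unit; unit 0 fetches new data.
        incoming = pending.pop(0) if pending else None
        held = [incoming] + held[:-1]
        for unit, segment in enumerate(held):
            if segment is not None:
                log.append((clock, unit, weights[unit] * segment))
    return log


for record in run_pipeline([2, 3], [10, 20]):
    print(record)
```

With two units and two segments, the model needs three clocks: the first segment reaches unit 1 on the same clock that unit 0 fetches the second segment, which mirrors step 4.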

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

A floating-point number computing circuit, a floating-point number computing method, and a computing apparatus. A computing circuit in the floating-point number computing circuit may select and output a portion of the data in multiple first operation results to obtain multiple first additive data and multiple second additive data. By selecting a portion of the data in the first operation results, the computing circuit splits the multiple first operation results, which have a high number of bits, into multiple first additive data and multiple second additive data, which have a low number of bits. An adder with a small bit width then adds the multiple first additive data, the multiple second additive data, and the multiple first operation results to obtain the product of a first mantissa and a second mantissa. The adder used to calculate the product of the mantissa portions of a first floating-point number and a second floating-point number therefore has a small bit width and a low hardware design cost, which makes the technology easier to adopt.

Description

A floating-point number calculation circuit and a floating-point number calculation method
Technical field
The embodiments of the present application relate to the computer field, and further to the application of artificial intelligence (AI) technology in the computer field, and in particular to a floating-point number calculation circuit and a floating-point number calculation method.
Background
Artificial intelligence (AI) comprises the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
Convolutional neural networks (CNNs) are now widely used in many types of image processing applications. When such applications train a model using 16-bit floating point (FP16) data, the insufficient precision of FP16 data can cause network training to fail to converge or to converge slowly, so higher-precision FP32 data is needed to ensure the training effect. In addition, supercomputing applications require still higher-precision FP64 data for numerical calculation.
In existing data calculation schemes, multipliers with a smaller bit width can be used to compute floating-point numbers with a larger bit width. For example, FP64 floating-point data can be computed using multipliers designed for FP32 data. A network device splits the FP64 floating-point data into floating-point numbers with fewer bits, multiplies them, and then uses an adder to sum the multiplication results to obtain the product of the FP64 floating-point data. In this conventional approach, the adder that sums the multiplication results requires a large bit width, so the hardware design cost is high, which hinders adoption of the technology.
Summary of the invention
The embodiments of the present application provide a floating-point number calculation circuit and a floating-point number calculation method. The circuit can split a floating-point number with a large number of bits into floating-point numbers with fewer bits, so that the circuit has low timing overhead and a low hardware design cost and makes reasonable use of the computing performance of the multiplier.
A first aspect of the present application provides a floating-point number calculation circuit for calculating the product of a first floating-point number and a second floating-point number. The first floating-point number includes a first exponent and a first mantissa, and the second floating-point number includes a second exponent and a second mantissa. The floating-point number calculation circuit includes an exponent processing circuit and a calculation circuit, and the output end of the exponent processing circuit is electrically connected to the input end of the calculation circuit. The exponent processing circuit is configured to obtain a first shift number according to the first exponent and the second exponent, where the first shift number represents the shift number of the product of a first split mantissa and a second split mantissa, the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa. The calculation circuit is configured to select and output part of the data in multiple first operation results to obtain multiple pieces of first addition data and multiple pieces of second addition data, and to obtain the product of the first mantissa and the second mantissa according to the multiple pieces of first addition data, the multiple pieces of second addition data, and the multiple first operation results, where a first operation result represents the data obtained by shifting the product of the first split mantissa and the second split mantissa according to the first shift number.
In the present application, the calculation circuit in the floating-point number calculation circuit can select and output part of the data in multiple first operation results to obtain multiple pieces of first addition data and multiple pieces of second addition data. By selecting part of the data in the first operation results, the calculation circuit splits the multiple first operation results, which have a larger number of bits, into multiple pieces of first addition data and second addition data with fewer bits. An adder with a smaller bit width can then sum the multiple pieces of first addition data, the multiple pieces of second addition data, and the multiple first operation results to obtain the product of the first mantissa and the second mantissa. The adder used to calculate the product of the mantissa parts of the first and second floating-point numbers therefore has a small bit width and a low hardware design cost, which favors adoption of the technology.
In a possible implementation of the first aspect, the calculation circuit includes a multiplication circuit, an addition circuit, and a first selection circuit. The output end of the exponent processing circuit is electrically connected to the input end of the multiplication circuit. The input end of the first selection circuit is electrically connected to the output end of the multiplication circuit, and the output end of the first selection circuit is electrically connected to the input end of the addition circuit. The first selection circuit is configured to select and output the low-order data in the multiple first operation results to obtain the multiple pieces of first addition data, and to select and output the high-order data in the multiple first operation results to obtain the multiple pieces of second addition data. The addition circuit is configured to add the multiple pieces of first addition data and the multiple first operation results to obtain a low-order addition result and carry data, to add the carry data, the multiple pieces of second addition data, and the multiple first operation results to obtain a high-order addition result, and to accumulate the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
In this possible implementation, after the multiplication circuit outputs the multiple first operation results, the first selection circuit can select and output the low-order data in the multiple first operation results to obtain the multiple pieces of first addition data, and select and output the high-order data to obtain the multiple pieces of second addition data. The first selection circuit thus splits each first operation result, which has a larger number of bits, into first addition data and second addition data with fewer bits. The first addition data and the second addition data are then each summed with the corresponding first operation results to obtain the low-order addition result and the high-order addition result. Because the low-order and high-order addition results have small bit widths, the adder used to compute them has a small bit width, which reduces the construction cost of the calculation circuit.
In a possible implementation of the first aspect, the addition circuit includes a first adder and an accumulator. The input end of the first adder is electrically connected to the output end of the first selection circuit, and the output end of the first adder is electrically connected to the input end of the accumulator. The first adder is configured to add, in a first calculation cycle, the multiple pieces of first addition data and the multiple first operation results to obtain the low-order addition result and the carry data, and to add, in a second calculation cycle, the carry data, the multiple pieces of second addition data, and the multiple first operation results to obtain the high-order addition result. The accumulator is configured to accumulate the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
In this possible implementation, the mantissa part of an FP64 floating-point number is 53 bits long, so the mantissa obtained from A_mantissa*B_mantissa has a total length of 106 bits. To complete the mantissa calculation for a pair of FP64 floating-point numbers directly within one computing unit (PE unit), the adder (the first adder) would have to be widened to support 106-bit data, and the area cost and timing cost of the widened adder would both be too high. The first selection circuit can therefore be used to split the multiplication of a pair of FP64 mantissas into two parts (part1 and part2): the first adder computes part1 in the first calculation cycle to obtain the low-order addition result, and computes part2 in the second calculation cycle to obtain the high-order addition result. The accumulator accumulates the results of the two calculation cycles to obtain the product of the first mantissa and the second mantissa. Because the low-order and high-order addition results have small bit widths, the adder used to compute them has a small bit width, which reduces the construction cost of the floating-point number calculation circuit.
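The two-cycle addition described here can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not the circuit of the application: the function name `two_cycle_sum`, the 53-bit split point `LOW_BITS`, and the use of unbounded Python integers in place of fixed-width registers are hypothetical choices made for illustration. The sketch shows only that adding the low parts first, and then the high parts together with the carry, gives the same total as a single wide addition.

```python
LOW_BITS = 53                          # assumed split point inside a 106-bit value
LOW_MASK = (1 << LOW_BITS) - 1


def two_cycle_sum(shifted_products):
    """Sum wide values using two narrower additions (illustrative model)."""
    # Selection step: take the low part and the high part of every value.
    low_parts = [p & LOW_MASK for p in shifted_products]
    high_parts = [p >> LOW_BITS for p in shifted_products]

    # Cycle 1: add the low parts; keep LOW_BITS bits and pass on the carry.
    low_total = sum(low_parts)
    low_result = low_total & LOW_MASK
    carry = low_total >> LOW_BITS

    # Cycle 2: add the high parts together with the carry from cycle 1.
    high_result = carry + sum(high_parts)

    # Accumulation: place the high result above the low result.
    return (high_result << LOW_BITS) + low_result


values = [(1 << 105) + 12345, (1 << 60) + LOW_MASK, LOW_MASK, 7]
print(two_cycle_sum(values) == sum(values))
```

The carry has to be kept between the two cycles because several low parts added together can overflow the 53 low bits.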
In a possible implementation of the first aspect, the floating-point number calculation circuit further includes a splitting circuit. The output end of the splitting circuit is electrically connected to the input end of the exponent processing circuit and the input end of the multiplication circuit. The splitting circuit is configured to split the first mantissa into the first split mantissa, which includes a first high-order mantissa and a first low-order mantissa, and to split the second mantissa into the second split mantissa, which includes a second high-order mantissa and a second low-order mantissa. The first shift number indicates the shift difference between the highest bit of each high-order mantissa and the highest bit of each low-order mantissa.
In this possible implementation, the floating-point number calculation circuit provided in the present application can split the mantissa part of the first floating-point number, which has a larger number of bits, into a first high-order mantissa and a first low-order mantissa with fewer bits, and split the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa with fewer bits. Multipliers with a smaller bit width can then be used to calculate the products of the split mantissa parts, which reduces the hardware design cost and makes reasonable use of the computing performance of the multiplier.
In a possible implementation of the first aspect, the first high-order mantissa includes a third mantissa, and the first low-order mantissa includes a fourth mantissa, a fifth mantissa, a sixth mantissa, and a seventh mantissa. The second high-order mantissa includes an eighth mantissa, and the second low-order mantissa includes a ninth mantissa, a tenth mantissa, an eleventh mantissa, and a twelfth mantissa.
This possible implementation provides a specific way of splitting the first mantissa and the second mantissa. After the mantissa part of an FP64 floating-point number is split in this way, FP32-type multipliers can be used for the calculation; after the mantissa part of an FP128 floating-point number is split in this way, FP64-type multipliers can be used. This way of splitting allows multipliers with a smaller bit width to calculate the product of the first mantissa and the second mantissa, which reduces the construction cost of the floating-point number calculation circuit and favors adoption of the technology.
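As an illustration of this split, the following Python sketch divides a 53-bit FP64 mantissa into one high-order part and four low-order parts. The part widths of 5, 12, 12, 12, and 12 bits are an assumption inferred from the right shifts of 5, 17, 29, and 41 bits given in the detailed description; the function names `split_mantissa` and `join_mantissa` are hypothetical.

```python
WIDTHS = (5, 12, 12, 12, 12)           # assumed widths: one high part, four low parts
TOTAL_BITS = sum(WIDTHS)               # 53 bits, the FP64 mantissa length


def split_mantissa(mantissa):
    """Return (part, width, right_shift) triples, highest part first."""
    if not 0 <= mantissa < (1 << TOTAL_BITS):
        raise ValueError("mantissa must fit in 53 bits")
    parts = []
    shift = 0                          # distance from the top bit of the mantissa
    for width in WIDTHS:
        lsb = TOTAL_BITS - shift - width
        part = (mantissa >> lsb) & ((1 << width) - 1)
        parts.append((part, width, shift))
        shift += width
    return parts


def join_mantissa(parts):
    """Rebuild the mantissa from its parts (inverse of split_mantissa)."""
    return sum(part << (TOTAL_BITS - shift - width) for part, width, shift in parts)


m = (1 << 52) + 0x123456789ABC
print([shift for _, _, shift in split_mantissa(m)])   # [0, 5, 17, 29, 41]
```

Each part is at most 12 bits wide, so a product of two parts fits in 24 bits, which is what lets a narrower multiplier handle the partial products.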
In a possible implementation of the first aspect, the floating-point number calculation circuit further includes a storage circuit. The output end of the splitting circuit is electrically connected to the input end of the storage circuit, the input end of the exponent processing circuit is electrically connected to the first output end of the storage circuit, and the input end of the calculation circuit is electrically connected to the second output end of the storage circuit. The storage circuit is configured to store the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number, and a fourth shift number, where the third shift number represents the shift number of the first split mantissa and the fourth shift number represents the shift number of the second split mantissa.
This possible implementation provides a specific way of storing temporary data in the floating-point number calculation circuit, which improves the implementability of the solution.
In a possible implementation of the first aspect, the exponent processing circuit includes a second adder, a second selection circuit, and a third adder. The input end of the second adder is electrically connected to the first output end of the storage circuit, and the output end of the second adder is electrically connected to the first input end of the third adder. The second input end of the third adder is electrically connected to the output end of the second selection circuit, and the output end of the third adder is electrically connected to the first input end of the calculation circuit. The second adder is configured to add the first exponent, the second exponent, the third shift number, and the fourth shift number to obtain multiple second operation results. The second selection circuit is configured to select the maximum value among the multiple second operation results. The third adder is configured to take the difference between the maximum value among the multiple second operation results and each second operation result to obtain the first shift number.
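The three stages of the exponent processing circuit can be sketched in Python as follows. This is a minimal model under stated assumptions: the function name `first_shift_numbers` is hypothetical, and the third and fourth shift numbers are assumed to be stored as signed offsets (negative for a part that sits below the top of its mantissa) so that plain addition yields the effective exponent of each partial product. The text does not specify this encoding.

```python
def first_shift_numbers(terms):
    """Compute alignment shifts for a list of partial products.

    terms: (exp_a, exp_b, offset_a, offset_b) integer tuples, one per pair
    of split mantissas.
    """
    terms = list(terms)
    # Second adder: one sum per pair of split mantissas.
    sums = [exp_a + exp_b + off_a + off_b for exp_a, exp_b, off_a, off_b in terms]
    # Second selection circuit: pick the largest sum.
    largest = max(sums)
    # Third adder: distance of every sum from the largest one.
    return [largest - s for s in sums]


# One pair of numbers with exponents 3 and 4, each split into a high part
# (offset 0) and a low part five bits below it (offset -5).
terms = [(3, 4, off_a, off_b) for off_a in (0, -5) for off_b in (0, -5)]
print(first_shift_numbers(terms))   # [0, 5, 5, 10]
```

The partial product with the largest sum gets a shift of 0, and every other partial product is shifted right by its distance from that maximum.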
This possible implementation provides a specific form of the exponent processing circuit, which improves the implementability of the solution.
In a possible implementation of the first aspect, the multiplication circuit includes a multiplier and a shift register. The input end of the multiplier is electrically connected to the second output end of the storage circuit, and the output end of the multiplier is electrically connected to the first input end of the shift register. The second input end of the shift register is electrically connected to the output end of the third adder, and the output end of the shift register is electrically connected to the input end of the first adder. The multiplier is configured to multiply the first split mantissa by the second split mantissa to obtain multiple third operation results. The shift register is configured to shift the multiple third operation results according to the multiple first shift numbers to obtain the multiple first operation results.
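A minimal Python sketch of the multiply-then-shift step follows. It rests on several assumptions that are not taken from the application: part widths of 5, 12, 12, 12, and 12 bits, a 106-bit frame in which each partial product is first top-justified and then shifted right, and a first shift number equal to the sum of the two parts' right shifts (the case of a single pair of floating-point numbers). The names `split` and `first_operation_results` are hypothetical.

```python
WIDTHS = (5, 12, 12, 12, 12)           # assumed part widths, highest part first
TOTAL_BITS = sum(WIDTHS)               # 53-bit mantissa
FRAME_BITS = 2 * TOTAL_BITS            # 106-bit frame for the full product


def split(mantissa):
    """Return (part, width, right_shift) triples, highest part first."""
    parts, shift = [], 0
    for width in WIDTHS:
        lsb = TOTAL_BITS - shift - width
        parts.append(((mantissa >> lsb) & ((1 << width) - 1), width, shift))
        shift += width
    return parts


def first_operation_results(mant_a, mant_b):
    """Multiply every pair of parts, then shift each product into place."""
    results = []
    for part_a, width_a, shift_a in split(mant_a):
        for part_b, width_b, shift_b in split(mant_b):
            product = part_a * part_b                      # multiplier output
            top_justified = product << (FRAME_BITS - width_a - width_b)
            # Shift register: move the product right by its first shift number.
            results.append(top_justified >> (shift_a + shift_b))
    return results


a = (1 << 52) + 0x123456789ABC
b = (1 << 52) + 0xFEDCBA98765
print(sum(first_operation_results(a, b)) == a * b)
```

In this model no bits are lost by the right shift, because the lowest pair of parts lands exactly at bit 0 of the 106-bit frame, so the sum of the shifted partial products equals the full mantissa product.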
This possible implementation provides a specific form of the multiplication circuit, which improves the implementability of the solution.
In a possible implementation of the first aspect, the floating-point number calculation circuit further includes a memory controller. The output end of the memory controller is electrically connected to the input end of the splitting circuit. The memory controller is configured to obtain the first floating-point number and the second floating-point number and to send them to the splitting circuit.
This possible implementation provides a specific form of a hardware structure capable of obtaining the first floating-point number and the second floating-point number, which improves the implementability of the solution.
In a possible implementation of the first aspect, the first floating-point number further includes a first sign bit, and the second floating-point number further includes a second sign bit.
A second aspect of the present application provides a floating-point number calculation method for calculating the product of a first floating-point number and a second floating-point number. The first floating-point number includes a first exponent and a first mantissa, and the second floating-point number includes a second exponent and a second mantissa. The method includes: obtaining a first shift number according to the first exponent and the second exponent, where the first shift number represents the shift number of the product of a first split mantissa and a second split mantissa, the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa; and selecting and outputting part of the data in multiple first operation results to obtain multiple pieces of first addition data and multiple pieces of second addition data, and obtaining the product of the first mantissa and the second mantissa according to the multiple pieces of first addition data, the multiple pieces of second addition data, and the multiple first operation results, where a first operation result represents the data obtained by shifting the product of the first split mantissa and the second split mantissa according to the first shift number.
In the floating-point number calculation method provided in this application, part of the data in the plurality of first operation results may be selected and output to obtain a plurality of first addition data and a plurality of second addition data. Selecting part of the data in this way splits the first operation results, which have a large number of bits, into first addition data and second addition data with fewer bits. An adder with a small bit width can then sum the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results to obtain the product of the first mantissa and the second mantissa. Because the adder used to compute the product of the mantissas of the first and second floating-point numbers has a small bit width, the hardware design cost is low, which favors adoption of the technology.
In this application, when the product of the first mantissa and the second mantissa is calculated, part of the data in the plurality of first operation results may be selected and output to obtain a plurality of first addition data and a plurality of second addition data. Selecting part of the data in this way splits the first operation results, which have a large number of bits, into first addition data and second addition data with fewer bits. An adder with a small bit width can then sum the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results to obtain the product of the first mantissa and the second mantissa. Because the adder used to compute the product of the mantissas of the first and second floating-point numbers has a small bit width, the hardware design cost is low, which favors adoption of the technology.
In a possible implementation of the second aspect, the selecting and outputting part of the data in the plurality of first operation results to obtain a plurality of first addition data and a plurality of second addition data, and obtaining the product of the first mantissa and the second mantissa according to the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results, includes: selecting and outputting the low-order data in the plurality of first operation results to obtain the plurality of first addition data, and selecting and outputting the high-order data in the plurality of first operation results to obtain the plurality of second addition data; adding the plurality of first addition data and the plurality of first operation results to obtain a low-order addition result and carry data; adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain a high-order addition result; and accumulating the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
In this possible implementation, after the plurality of first operation results are obtained, the low-order data in them may be selected and output as the plurality of first addition data, and the high-order data in them may be selected and output as the plurality of second addition data. That is, a first operation result with a large number of bits is split into first addition data and second addition data with fewer bits. The first addition data and the second addition data are then each summed with the corresponding first operation results to obtain the low-order addition result and the high-order addition result. Because the low-order and high-order addition results have small bit widths, the adder used to compute them can be narrow, which reduces the construction cost of the calculation circuit.
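The low-order/high-order selection with carry propagation can be sketched in Python as follows. This is only an illustration under stated assumptions: every operation result is split at the same bit position (`half_width`, a hypothetical parameter), and results that the circuit would pass through unsplit are not modeled separately. The function name is not taken from the source.

```python
def add_in_two_halves(results, half_width):
    """Sum wide operands using two narrow additions linked by a carry.

    Assumption: every operand is split at bit position `half_width`.
    """
    mask = (1 << half_width) - 1
    low_data = [r & mask for r in results]           # "first addition data" (low-order bits)
    high_data = [r >> half_width for r in results]   # "second addition data" (high-order bits)

    low_sum = sum(low_data)
    low_result = low_sum & mask                      # low-order addition result
    carry = low_sum >> half_width                    # carry data handed to the high-order pass

    high_result = carry + sum(high_data)             # high-order addition result
    # Accumulate: the high-order result carries a weight of 2**half_width.
    return (high_result << half_width) + low_result


operands = [0xFFFFFFFF, 0x12345678, 0x0F0FF0F0]
total = add_in_two_halves(operands, 16)
```

Each of the two additions only handles values about half as wide as the operands, which is the property the paragraph above relies on.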
In a possible implementation of the second aspect, the adding the plurality of first addition data and the plurality of first operation results to obtain a low-order addition result and carry data, adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain a high-order addition result, and accumulating the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa, includes: in a first calculation cycle, adding the plurality of first addition data and the plurality of first operation results to obtain the low-order addition result and the carry data; in a second calculation cycle, adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain the high-order addition result; and accumulating the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
In this possible implementation, suppose the first floating-point number and the second floating-point number are both of type FP64. The mantissa of an FP64 floating-point number is 53 bits long, so the result of A_mantissa*B_mantissa has a total length of 106 bits. To complete the mantissa calculation for a pair of FP64 floating-point numbers directly within one calculation unit (PE unit), the adder (the first adder) would have to be widened to support 106-bit data, and the area cost and timing cost of the widened adder are both too high. Therefore, the multiplication of a pair of FP64 mantissas can be split into two parts (part1 and part2): part1 is calculated in the first calculation cycle to obtain the low-order addition result, and part2 is calculated in the second calculation cycle to obtain the high-order addition result. The results of the two calculation cycles are then accumulated to obtain the product of the first mantissa and the second mantissa. Because the low-order and high-order addition results have small bit widths, the adder used to compute them can be narrow, which reduces the construction cost of the floating-point number calculation circuit.
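The bit widths quoted above can be checked with a short Python sketch. It assumes normal (non-subnormal) FP64 inputs, and `mantissa53` is a hypothetical helper name, not something defined in the source.

```python
import math


def mantissa53(x):
    """Return the 53-bit integer significand of a normal, nonzero double."""
    frac, _ = math.frexp(x)              # 0.5 <= |frac| < 1 for nonzero x
    return int(abs(frac) * (1 << 53))    # exact: scaling by a power of two


a = mantissa53(1.5)
b = mantissa53(1.75)
product = a * b                          # needs up to 106 bits

# Split the wide product into two 53-bit halves, mirroring part1/part2.
low_half = product & ((1 << 53) - 1)
high_half = product >> 53

widest_possible = ((1 << 53) - 1) ** 2   # largest 53-bit value squared
```

In this sketch the halves are taken from an already computed product; the circuit described above instead forms each half with a separate narrow addition.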
In a possible implementation of the second aspect, the method further includes: splitting the first mantissa into the first split mantissa, where the first split mantissa includes a first high-order mantissa and a first low-order mantissa; and splitting the second mantissa into the second split mantissa, where the second split mantissa includes a second high-order mantissa and a second low-order mantissa. The first shift number indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
In this possible implementation, the mantissa of the first floating-point number, which has a large number of bits, can be split into a first high-order mantissa and a first low-order mantissa with fewer bits, and the mantissa of the second floating-point number can likewise be split into a second high-order mantissa and a second low-order mantissa. Multipliers with a small number of bits can then compute the products of the split mantissa parts, which lowers the hardware design cost and makes reasonable use of the multipliers' computing capability.
In a possible implementation of the second aspect, the first high-order mantissa includes a third mantissa, and the first low-order mantissa includes a fourth mantissa, a fifth mantissa, a sixth mantissa, and a seventh mantissa; the second high-order mantissa includes an eighth mantissa, and the second low-order mantissa includes a ninth mantissa, a tenth mantissa, an eleventh mantissa, and a twelfth mantissa.
This possible implementation provides a specific way of splitting the first mantissa and the second mantissa. After the mantissa of an FP64 floating-point number is split in this way, FP32 multipliers can be used for the calculation; after the mantissa of an FP128 floating-point number is split in this way, FP64 multipliers can be used. This splitting scheme allows multipliers with a small number of bits to compute the product of the first mantissa and the second mantissa, which reduces the construction cost of the floating-point number calculation circuit and favors adoption of the technology.
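A sketch of splitting a mantissa into one high-order part and four low-order parts, written in Python. The chunk widths are assumptions chosen for illustration only (a 1-bit high part and four 13-bit low parts, which together cover a 53-bit FP64 mantissa); this passage of the source does not state the actual widths, and the names below are hypothetical.

```python
LOW_WIDTH = 13   # assumed width of each low-order chunk (not given in the source)
LOW_COUNT = 4    # four low-order chunks, matching the fourth to seventh mantissas


def split_mantissa(m):
    """Split m into (high, lows); lows[0] is the least significant chunk."""
    mask = (1 << LOW_WIDTH) - 1
    lows = [(m >> (LOW_WIDTH * i)) & mask for i in range(LOW_COUNT)]
    high = m >> (LOW_WIDTH * LOW_COUNT)
    return high, lows


def join_mantissa(high, lows):
    """Inverse of split_mantissa, used here only to check the split."""
    m = high << (LOW_WIDTH * LOW_COUNT)
    for i, chunk in enumerate(lows):
        m |= chunk << (LOW_WIDTH * i)
    return m


sample = (1 << 52) | 0x123456789ABCD     # a 53-bit value with the leading 1 set
high, lows = split_mantissa(sample)
```

With these assumed widths every chunk fits in 13 bits, so each pairwise product fits in 26 bits, well within a multiplier sized for narrower mantissas.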
In a possible implementation of the second aspect, the method further includes: storing the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number, and a fourth shift number, where the third shift number indicates the shift amount of the first split mantissa, and the fourth shift number indicates the shift amount of the second split mantissa.
This possible implementation provides a specific way of storing temporary data in the floating-point number calculation method, which improves the implementability of the solution.
In a possible implementation of the second aspect, the obtaining the first shift number according to the first exponent and the second exponent includes: selecting the maximum value among a plurality of second operation results; and subtracting each second operation result from that maximum value to obtain the first shift number.
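A minimal Python sketch of this step. This passage of the source does not define the second operation results, so the sketch assumes each one is an integer weight (for example, the combined bit position of one pair of split mantissas); the function name and sample values are illustrative only.

```python
def first_shift_numbers(second_results):
    """Return, for each entry, its distance below the largest entry."""
    top = max(second_results)                 # select the maximum value
    return [top - r for r in second_results]  # maximum minus each result


shifts = first_shift_numbers([52, 39, 26, 39])
```

The entry with the largest weight gets a shift of zero, and every other entry is shifted by its distance from that maximum.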
This possible implementation provides a specific way of obtaining the first shift number, which improves the implementability of the solution.
In a possible implementation of the second aspect, the method further includes: multiplying the first split mantissa by the second split mantissa to obtain a plurality of third operation results; and shifting the plurality of third operation results according to the plurality of first shift numbers to obtain the plurality of first operation results.
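The multiply-then-shift step can be sketched in Python as follows. The representation is an assumption made for illustration: each split mantissa is given as a `(chunk_value, lsb_offset)` pair, and each product is aligned relative to the product with the largest combined offset. The shift direction and all names are hypothetical rather than taken from the source.

```python
def shifted_partial_products(a_parts, b_parts):
    """a_parts and b_parts are lists of (chunk_value, lsb_offset) pairs."""
    top = max(oa + ob for _, oa in a_parts for _, ob in b_parts)
    results = []
    for va, oa in a_parts:
        for vb, ob in b_parts:
            third = va * vb                  # third operation result (raw product)
            shift = top - (oa + ob)          # first shift number for this pair
            # Place the product at the top weight, then move it down by `shift`.
            # No bits are lost because shift <= top.
            results.append((third << top) >> shift)
    return results


a_parts = [(0b101, 4), (0b0110, 0)]   # represents 0b1010110 = 86
b_parts = [(0b11, 4), (0b1001, 0)]    # represents 0b111001 = 57
aligned = shifted_partial_products(a_parts, b_parts)
```

Summing the aligned partial products reproduces the full product of the two unsplit values, which is what the later addition stages compute.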
This possible implementation provides a specific way of obtaining the first operation results, which improves the implementability of the solution.
In a possible implementation of the second aspect, the method further includes: obtaining the first floating-point number and the second floating-point number.
In a possible implementation of the second aspect, the first floating-point number further includes a first sign bit, and the second floating-point number further includes a second sign bit.
A third aspect of this application provides a floating-point number calculation circuit. The floating-point number calculation circuit includes an exponent processing circuit and a calculation circuit, and the calculation circuit includes a first multiplication circuit, a first selector, and an addition circuit. The output of the exponent processing circuit is electrically connected to the input of the first multiplication circuit; the output of the first multiplication circuit is electrically connected to the input of the first selector; and the output of the first selector is electrically connected to the input of the addition circuit.
In this application, while the calculation circuit in the floating-point number calculation circuit computes the product of the mantissas of two floating-point numbers, the first multiplication circuit outputs a plurality of operation results, and the first selector can split the operation results that have a large number of bits into first addition data and second addition data with fewer bits. The addition circuit can then use an adder with a small bit width to sum the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results to obtain the product of the first mantissa and the second mantissa. Because the adder used to compute the product of the mantissas of the first and second floating-point numbers has a small bit width, the hardware design cost is low, which favors adoption of the technology.
In a possible implementation of the third aspect, the floating-point number calculation circuit is configured to calculate the product of a first floating-point number and a second floating-point number, where the first floating-point number includes a first exponent, a first mantissa, and a first sign bit, and the second floating-point number includes a second exponent, a second mantissa, and a second sign bit. The input of the exponent processing circuit is configured to receive the first exponent and the second exponent, and the input of the calculation circuit is configured to receive the first mantissa and the second mantissa.
This possible implementation specifies the format of the floating-point numbers input to the floating-point number calculation circuit during operation, which improves the implementability of the solution.
In a possible implementation of the third aspect, the addition circuit includes a first adder and an accumulator. The input of the first adder is electrically connected to the output of the first selector, and the output of the first adder is electrically connected to the input of the accumulator.
In this possible implementation, after the first selector splits an operation result into first addition data and second addition data with fewer bits, the first adder can obtain the low-order addition result from the first addition data and the plurality of unsplit operation results, and obtain the high-order addition result from the second addition data and the plurality of split operation results. The accumulator sums the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa. Because the low-order and high-order addition results have small bit widths, the first adder used to compute them can be narrow, which reduces the construction cost of the calculation circuit.
In a possible implementation of the third aspect, the first multiplication circuit includes a first multiplier and a first shift register. The first input of the first shift register is electrically connected to the output of the exponent processing circuit, the second input of the first shift register is electrically connected to the output of the first multiplier, and the output of the first shift register is electrically connected to the input of the first selector.
In this possible implementation, the first multiplier can compute the product of the split first mantissa and the split second mantissa, and the first shift register can shift the result output by the first multiplier according to the shift number output by the exponent processing circuit and then output the operation result. This possible implementation provides a specific form of the multiplication circuit, which improves the implementability of the solution.
In a possible implementation of the third aspect, the exponent processing circuit includes a second adder, a second selector, and a third adder. The output of the second adder is electrically connected to the first input of the third adder; the second input of the third adder is electrically connected to the output of the second selector; and the output of the third adder is electrically connected to the first input of the first shift register.
This possible implementation provides a specific form of the exponent processing circuit, which improves the implementability of the solution.
In a possible implementation of the third aspect, the calculation circuit further includes a second multiplication circuit, and the second multiplication circuit includes a second multiplier and a second shift register. The first input of the second shift register is electrically connected to the output of the exponent processing circuit, the second input of the second shift register is electrically connected to the output of the second multiplier, and the output of the second shift register is electrically connected to the input of the first adder.
In this possible implementation, the second multiplier in the second multiplication circuit multiplies the split first mantissa by the split second mantissa, and the second shift register shifts the result output by the second multiplier according to the shift number output by the exponent processing circuit to obtain a plurality of operation results. The second shift register feeds these operation results directly into the first adder, so that the first adder can sum the first addition result and the second addition result output from the first multiplication circuit with the plurality of operation results output by the second shift register to obtain the product of the first mantissa and the second mantissa.
In a possible implementation of the third aspect, the floating-point number calculation circuit further includes a memory controller, a third selector, and a register. The input of the third selector is electrically connected to the output of the memory controller, and the output of the third selector is electrically connected to the input of the register. The first output of the register is electrically connected to the input of the exponent processing circuit, and the second output of the register is electrically connected to the input of the calculation circuit.
In this possible implementation, the memory controller transmits the first floating-point number and the second floating-point number obtained from memory to the third selector, and the third selector splits the first mantissa and the second mantissa and then passes them to the register for storage. This possible implementation improves the implementability of the solution.
A third aspect of the embodiments of this application provides a computing device that includes a control circuit and a floating-point number calculation circuit. The floating-point number calculation circuit computes data under the control of the control circuit. The floating-point number calculation circuit is the one described in the first aspect or any possible implementation of the first aspect, or the one described in the third aspect or any possible implementation of the third aspect.
Description of Drawings
FIG. 1 is a schematic diagram of the processing principle of the convolutional neural network provided in this application;
FIG. 2 is a schematic diagram of the composition of an FP32 floating-point number according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a calculation circuit according to an embodiment of this application;
FIG. 5 is another schematic structural diagram of a calculation circuit according to an embodiment of this application;
FIG. 6 is another schematic structural diagram of a calculation circuit according to an embodiment of this application;
FIG. 7 is a schematic diagram of the mantissa parts of the first floating-point number and the second floating-point number provided in this application;
FIG. 8 is a schematic structural diagram of a calculation circuit provided in this application;
FIG. 9 is a schematic diagram of the operation process of a calculation circuit provided in this application;
FIG. 10 is a schematic diagram of an embodiment of the floating-point number calculation circuit according to an embodiment of this application;
FIG. 11 is a schematic diagram of another embodiment of the floating-point number calculation circuit according to an embodiment of this application;
FIG. 12 is a schematic diagram of another embodiment of the floating-point number calculation circuit according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a splitting circuit provided in this application;
FIG. 14 is a schematic structural diagram of a first mantissa and a second mantissa provided in this application;
FIG. 15 is a schematic structural diagram of a storage circuit provided in this application;
FIG. 16 is a schematic diagram of the connection between a memory controller and memory provided in this application;
FIG. 17 is a schematic structural diagram of an exponent processing circuit provided in this application;
FIG. 18 is a schematic diagram of another embodiment of the floating-point number calculation circuit according to an embodiment of this application;
FIG. 19 is a schematic diagram of another embodiment of the floating-point number calculation circuit according to an embodiment of this application;
FIG. 20 is a schematic diagram of another embodiment of the floating-point number calculation circuit according to an embodiment of this application;
FIG. 21 is a schematic diagram of another embodiment of the floating-point number calculation circuit according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. A person of ordinary skill in the art will understand that, as new application scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to the steps or modules expressly listed, and may include other steps or modules that are not expressly listed or that are inherent to the process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps in a method flow must be executed in the temporal or logical order indicated by the naming or numbering. The execution order of named or numbered steps may be changed according to the technical objective to be achieved, provided that the same or a similar technical effect is obtained.
Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions. Research in the field includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
FIG. 1 is a schematic diagram of the processing principle of the convolutional neural network provided in this application.
Convolutional neural networks (CNNs) have broad application prospects in fields such as image and speech recognition. As shown in FIG. 1, a convolutional neural network performs convolution operations on multiple convolution kernels and one or more feature maps. Specifically, each convolution kernel starts at the first pixel of the feature map and moves pixel by pixel along the row direction. When it reaches the end of the row, it moves down one pixel in the column direction, returns to the start of the row, and repeats the row-direction movement until all pixels of the feature map have been traversed. As the convolution kernel moves, the parameters in the kernel and the data at the corresponding positions in the feature map are used as the two inputs of the convolution operation (the pairs are multiplied and the products are accumulated one by one), and the convolution result is output once it is obtained.
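The sliding multiply-accumulate traversal described above can be sketched in plain Python. The sketch assumes a single-channel feature map, a stride of one pixel, and no padding; these choices and the function name are illustrative and not taken from the source.

```python
def conv2d_valid(feature, kernel):
    """Slide `kernel` over `feature` row by row and accumulate the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature) - kh + 1
    out_w = len(feature[0]) - kw + 1
    output = []
    for r in range(out_h):                 # move down one pixel per outer step
        row = []
        for c in range(out_w):             # move along the row pixel by pixel
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # Multiply each kernel parameter by the data under it,
                    # then add the product to the running sum.
                    acc += kernel[i][j] * feature[r + i][c + j]
            row.append(acc)
        output.append(row)
    return output


feature_map = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_valid(feature_map, kernel)
```

Every output element is a chain of multiplications followed by additions, which is why the cost of a floating-point multiplier and adder matters for this workload.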
CNNs are currently widely used in many types of image processing applications. When such an application trains a model with 16-bit floating-point (FP16) data, the limited precision of FP16 can prevent training from converging or slow its convergence, so higher-precision FP32 data is needed to ensure training quality. In addition, some applications require even higher-precision FP64 or FP128 data for model training.
It should be noted that, in addition to the field of artificial intelligence, the floating-point number computing circuit of the present invention can also be applied to the field of signal processing, for example in image processing systems, radar systems, and communication systems. The circuit and method can optimize the performance of digital signal processing (DSP) devices or other digital devices, for example digital devices used in current communication systems such as long term evolution (LTE), universal mobile telecommunications system (UMTS), and global system for mobile communications (GSM).
In existing data computing schemes, multipliers with a smaller bit width can be used to compute floating-point numbers with a larger bit width. For example, FP64 floating-point data can be computed using multipliers designed for FP32 data. A network device splits the FP64 floating-point data into numbers with fewer bits, multiplies them, and then uses an adder to sum the multiplication results to obtain the product of the FP64 data. In this conventional approach, the adder that sums the multiplication results needs a large bit width, so the hardware design cost is high, which hinders adoption of the technology.
To address these problems in existing data computing schemes, the embodiments of this application provide a floating-point number computing circuit, a floating-point number computing method, and a computing apparatus. The adder used to compute the product of the mantissas of a first floating-point number and a second floating-point number has a small bit width, so the hardware design cost is low, which favors adoption of the technology.
Before the floating-point number computing circuit, the floating-point number computing method, and the computing apparatus are described, the following examples introduce the floating-point formats and how floating-point numbers are computed.
Four floating-point formats are currently common: FP16, FP32, FP64, and FP128. Each floating-point number consists of three parts: a sign bit (sign), exponent bits (exp), and mantissa bits (mantissa). The actual value of a floating-point number equals $\text{sign} \times 2^{\text{exp}} \times \text{mantissa}$.
FIG. 2 is a schematic diagram of the composition of an FP32 floating-point number provided in an embodiment of this application.
As shown in FIG. 2, an FP32 floating-point number has a 1-bit sign, an 8-bit exp, and a 24-bit mantissa, of which 32 bits in total are explicitly stored. The most significant bit of the mantissa is stored implicitly as a hidden bit: if exp is not 0, the hidden bit is 1; otherwise the hidden bit is 0.
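As an illustration of this layout, the following Python sketch extracts the three fields of an FP32 value and restores the hidden bit. The helper name `decode_fp32` is hypothetical, and the sketch assumes the usual IEEE 754 binary32 bit order (sign, then exponent, then the 23 stored mantissa bits); infinities and NaNs are not treated specially.

```python
import struct


def decode_fp32(x):
    """Return (sign, exp, mantissa) for x encoded as FP32.

    `mantissa` is the 24-bit value including the hidden bit.
    Assumption: IEEE 754 binary32 layout; inf/NaN are not special-cased.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                    # 1 bit
    exp = (bits >> 23) & 0xFF            # 8 bits, still biased
    stored = bits & 0x7FFFFF             # 23 explicitly stored mantissa bits
    hidden = 1 if exp != 0 else 0        # hidden bit is 1 unless exp is 0
    mantissa = (hidden << 23) | stored   # 24-bit mantissa
    return sign, exp, mantissa


print(decode_fp32(1.0))   # (0, 127, 8388608)
print(decode_fp32(-2.5))  # (1, 128, 10485760)
```

The 1 + 8 + 23 stored bits account for the 32 bits in memory; the 24th mantissa bit is reconstructed from exp rather than stored.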
When computing the product A*B of two floating-point numbers, the exponent part is computed as A_exp+B_exp and the mantissa part as A_mantissa*B_mantissa. The resulting exp and mantissa are then assembled into a new floating-point number in the format defined by the standard.
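A minimal sketch of this rule, assuming a simplified representation in which a number is an (exp, mantissa) pair of integers with value mantissa * 2**exp. Sign handling, exponent bias, normalization, and rounding are deliberately left out, and the names `fp_mul` and `value` are hypothetical.

```python
from fractions import Fraction


def value(exp, mantissa):
    """Exact value of the simplified (exp, mantissa) pair."""
    return Fraction(mantissa) * Fraction(2) ** exp


def fp_mul(a, b):
    """Multiply two (exp, mantissa) pairs: add exponents, multiply mantissas."""
    a_exp, a_man = a
    b_exp, b_man = b
    return (a_exp + b_exp, a_man * b_man)


a = (-3, 0b1011)   # 11 * 2**-3
b = (2, 0b0110)    # 6 * 2**2
p = fp_mul(a, b)
print(p)           # (-1, 66)
```

The mantissa product is twice as wide as its inputs, which is the reason a 53-bit by 53-bit FP64 mantissa multiplication produces a 106-bit intermediate result.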
When computing the sum A+B, the larger of A_exp and B_exp is determined first. Suppose A_exp is larger than B_exp by n. Then, before the mantissas are added, B_mantissa is shifted right by n bits and added to A_mantissa to obtain the new mantissa, from which a new floating-point number is generated according to the standard. When several floating-point numbers are added together, the largest exp among them is found first; each mantissa is then shifted according to the difference between the largest exp and its own exp, and the shifted mantissas are added.
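The alignment step can be sketched as follows, again on simplified (exp, mantissa) integer pairs with value mantissa * 2**exp. The function name `fp_add_many` is hypothetical, and the sketch assumes that bits shifted out on the right are simply truncated; the source does not say how a real implementation rounds or whether it keeps guard bits.

```python
def fp_add_many(terms):
    """Add several (exp, mantissa) pairs by aligning them to the largest exp.

    Assumption: bits shifted out to the right are truncated.
    """
    max_exp = max(exp for exp, _ in terms)
    total = 0
    for exp, man in terms:
        total += man >> (max_exp - exp)   # align to the largest exponent
    return max_exp, total


terms = [(3, 0b1000), (1, 0b1100), (0, 0b0101)]
aligned_sum = fp_add_many(terms)
print(aligned_sum)  # (3, 11)
```

In this example the exact sum is 93 while the aligned result represents 88; the difference comes from the truncated low bits and shows why alignment shifts affect accuracy.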
The floating-point number computing circuit, the floating-point number computing method, and the computing apparatus provided in this application are described in detail below with reference to the accompanying drawings, starting with the floating-point number computing circuit.
FIG. 3 is a schematic structural diagram of a floating-point number computing circuit provided in an embodiment of this application.
Referring to FIG. 3, the floating-point number computing circuit provided in this application includes at least an exponent processing circuit 101 and a calculation circuit 102.
The output of the exponent processing circuit 101 is electrically connected to the input of the calculation circuit 102.
In this application, the floating-point number computing circuit is used to compute the product of a first floating-point number and a second floating-point number. The first floating-point number includes a first exponent and a first mantissa, and the second floating-point number includes a second exponent and a second mantissa. The exponent processing circuit 101 and the calculation circuit 102 can compute the product of the mantissa parts of the first and second floating-point numbers. This computation is described below.
In this application, the exponent processing circuit 101 may obtain first shift numbers according to the first exponent and the second exponent. A first shift number indicates the number of bits by which the product of a first split mantissa and a second split mantissa is shifted. The first split mantissas are obtained by splitting the first mantissa, and the second split mantissas are obtained by splitting the second mantissa.
In this application, the calculation circuit 102 may select and output part of the data in a plurality of first operation results to obtain a plurality of first addition data and a plurality of second addition data, and obtain the product of the first mantissa and the second mantissa according to the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results. A first operation result is the data obtained by shifting the product of a first split mantissa and a second split mantissa according to the corresponding first shift number.
In this application, by selecting part of the data in the first operation results, the calculation circuit splits first operation results with a larger number of bits into first addition data and second addition data with fewer bits. An adder with a small bit width can then sum the first addition data, the second addition data, and the first operation results to obtain the product of the first mantissa and the second mantissa. Because the adder used to compute the product of the mantissas of the first and second floating-point numbers has a small bit width, the hardware design cost is low, which favors adoption of the technology.
The following examples describe in detail how the exponent processing circuit 101 and the calculation circuit 102 compute the product of the mantissa parts of floating-point numbers.
The specific implementation of the calculation circuit 102 and its operation are described first.
FIG. 4 is a schematic structural diagram of a calculation circuit 102 provided in an embodiment of this application.
Referring to FIG. 4, optionally, the calculation circuit 102 may include a multiplication circuit 201, a first selection circuit 202, and an addition circuit 203.
In this application, the output of the exponent processing circuit 101 is electrically connected to the input of the multiplication circuit 201. The input of the first selection circuit 202 is electrically connected to the output of the multiplication circuit 201, and the output of the first selection circuit 202 is electrically connected to the input of the addition circuit 203.
The first selection circuit 202 may select and output the low-order data of a plurality of first operation results to obtain a plurality of first addition data, and select and output the high-order data of the plurality of first operation results to obtain a plurality of second addition data.
The addition circuit 203 may add the plurality of first addition data and a plurality of first operation results to obtain a low-order addition result and carry data, add the carry data, the plurality of second addition data, and a plurality of first operation results to obtain a high-order addition result, and accumulate the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
Optionally, the numbers of multiplication circuits 201, first selection circuits 202, and addition circuits 203 shown in FIG. 4 are merely examples. The calculation circuit 102 may include more multiplication circuits 201, more addition circuits 203, and/or more first selection circuits 202, which is not limited here.
The addition circuit 203 can be implemented in a specific manner. One specific implementation of the addition circuit 203 provided in this application is described below, taking FIG. 5 as an example.
FIG. 5 is another schematic structural diagram of a calculation circuit provided in an embodiment of this application.
Optionally, the addition circuit 203 may include a first adder 301 and an accumulator 302.
The input of the first adder 301 is electrically connected to the output of the first selection circuit 202, and the output of the first adder 301 is electrically connected to the input of the accumulator 302.
In a first calculation cycle, the first adder 301 may add the plurality of first addition data and a plurality of first operation results to obtain the low-order addition result and the carry data. In a second calculation cycle, it may add the carry data, the plurality of second addition data, and a plurality of first operation results to obtain the high-order addition result.
The accumulator 302 may accumulate the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
Optionally, the numbers of first adders 301 and accumulators 302 in the addition circuit 203 shown in FIG. 5 are merely examples. The addition circuit 203 may include more first adders 301 and/or more accumulators 302, which is not limited here.
The multiplication circuit 201 can likewise be implemented in a specific manner. One specific implementation of the multiplication circuit 201 provided in this application is described below, taking FIG. 6 as an example.
FIG. 6 is another schematic structural diagram of a calculation circuit provided in an embodiment of this application.
Referring to FIG. 6, the multiplication circuit 201 includes a multiplier 303 and a shift register 304.
The input of the multiplier 303 is electrically connected to the output of the exponent processing circuit 101, and the output of the multiplier 303 is electrically connected to the input of the shift register 304.
The multiplier 303 may multiply the first split mantissas by the second split mantissas to obtain a plurality of third operation results.
The shift register 304 may shift the plurality of third operation results according to a plurality of first shift numbers to obtain the plurality of first operation results.
The operation of the calculation circuit 102 is described below using a specific computation example.
Before the operation of the calculation circuit 102 is described, the format of the mantissa parts of the first and second floating-point numbers is introduced.
FIG. 7 is a schematic diagram of the mantissa parts of the first floating-point number and the second floating-point number provided in this application.
For example, assume that a first floating-point number A and a second floating-point number B are stored in memory, and that both are FP64 floating-point numbers. When processing FP64 computations, as shown in FIG. 7, the calculation circuit can split the mantissa part of A (the first mantissa) into five parts a0, a1, a2, a3, and a4, and split the mantissa part of B (the second mantissa) into five parts b0, b1, b2, b3, and b4. Here a0 to a4 are the first split mantissas and b0 to b4 are the second split mantissas. a1, a2, a3, a4, b1, b2, b3, and b4 are each 12 bits wide, and a0 and b0 are each 5 bits wide.
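The split described above can be sketched in Python as follows. The sketch assumes the mantissa is held as a 53-bit unsigned integer (hidden bit included) and that index 0 denotes the most significant part, as in a0 and b0; the names `split_mantissa` and `join_mantissa` are hypothetical.

```python
CHUNK_BITS = 12   # width of parts 1..4
NUM_CHUNKS = 5    # part 0 holds the remaining 53 - 4*12 = 5 bits


def split_mantissa(m):
    """Split a 53-bit mantissa into [x0, x1, x2, x3, x4], x0 most significant."""
    if not 0 <= m < (1 << 53):
        raise ValueError("mantissa must fit in 53 bits")
    mask = (1 << CHUNK_BITS) - 1
    low_to_high = [(m >> (CHUNK_BITS * k)) & mask for k in range(NUM_CHUNKS)]
    return low_to_high[::-1]


def join_mantissa(parts):
    """Inverse of split_mantissa: x0<<48 + x1<<36 + x2<<24 + x3<<12 + x4."""
    m = 0
    for part in parts:
        m = (m << CHUNK_BITS) | part
    return m


m = (1 << 52) | 0xABC
parts = split_mantissa(m)
print(parts)  # [16, 0, 0, 0, 2748]
```

Because the top part only ever holds 5 bits, any product involving a0 or b0 is narrower than 24 bits, which matters when the bit positions of the partial products are worked out.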
After the first split mantissas and the second split mantissas are obtained, the multiplication of the mantissa part of the first floating-point number A by the mantissa part of the second floating-point number B, as performed by the calculation circuit 102, can be expressed as Formula 1.
Formula 1 (all shifts are in bits):

$$
\begin{aligned}
A_{\text{mantissa}} \times B_{\text{mantissa}}
={}& \big[(a_0 \ll 48) + (a_1 \ll 36) + (a_2 \ll 24) + (a_3 \ll 12) + a_4\big] \\
&\times \big[(b_0 \ll 48) + (b_1 \ll 36) + (b_2 \ll 24) + (b_3 \ll 12) + b_4\big] \\
={}& \text{Part1}\,\big[\ \text{Low\_12bit}(a_0 b_4 + a_4 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2) \ll 48 \\
&\quad + (a_1 b_4 + b_1 a_4 + a_2 b_3 + a_3 b_2) \ll 36 \\
&\quad + (a_2 b_4 + a_4 b_2 + a_3 b_3) \ll 24 \\
&\quad + (a_3 b_4 + a_4 b_3) \ll 12 \\
&\quad + (a_4 b_4)\ \big] \\
&+ \text{Part2}\,\big[\ (a_0 b_0) \ll 96 \\
&\quad + (a_0 b_1 + a_1 b_0) \ll 84 \\
&\quad + (a_0 b_2 + a_2 b_0 + a_1 b_1) \ll 72 \\
&\quad + (a_0 b_3 + a_3 b_0 + a_1 b_2 + a_2 b_1) \ll 60 \\
&\quad + \text{High\_12bit}(a_0 b_4 + a_4 b_0 + a_1 b_3 + a_3 b_1 + a_2 b_2) \ll 48\ \big]
\end{aligned}
$$
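The decomposition in Formula 1 can be checked numerically. The sketch below is an illustration rather than the circuit itself: it assumes 53-bit integer mantissas, reads Low_12bit as the low 12 bits of each 48-bit-shifted product and High_12bit as everything above those 12 bits (placed 12 bits higher, so that the two pieces add back to the original product), and uses the hypothetical names `split` and `formula1`.

```python
SHIFT = [48, 36, 24, 12, 0]   # bit offset of parts x0..x4
MASK12 = (1 << 12) - 1


def split(m):
    """Split a 53-bit mantissa into [x0..x4]; x0 holds the top 5 bits."""
    return [(m >> s) & MASK12 for s in SHIFT]


def formula1(ma, mb):
    """Return (part1, part2) such that part1 + part2 == ma * mb."""
    a, b = split(ma), split(mb)
    part1 = part2 = 0
    for i in range(5):
        for j in range(5):
            prod = a[i] * b[j]
            s = SHIFT[i] + SHIFT[j]
            if s < 48:
                part1 += prod << s
            elif s > 48:
                part2 += prod << s
            else:
                # products shifted by 48 are shared between the two parts
                part1 += (prod & MASK12) << 48   # low 12 bits
                part2 += (prod >> 12) << 60      # remaining high bits
    return part1, part2


ma = (1 << 53) - 1
mb = (1 << 52) | 0x123456789
part1, part2 = formula1(ma, mb)
assert part1 + part2 == ma * mb
```

Only the five products with a shift of 48 bits straddle the boundary between the two parts, which is where the selection of low and high bits comes in.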
The mantissa part of an FP64 floating-point number is 53 bits long, so the result of A_mantissa*B_mantissa is 106 bits long in total. To complete the mantissa computation of a pair of FP64 numbers directly within one computing unit (PE unit), the adder (the first adder) would have to be widened to support 106-bit data, and the area and timing costs of the widened adder would be too high. The multiplication of a pair of FP64 mantissas can therefore be split into two parts: the first part (Part1) of Formula 1 is computed in the first calculation cycle, and the second part (Part2) is computed in the second calculation cycle.
The operations of the multiplication circuit 201, the first selection circuit 202, and the addition circuit 203 are described below with reference to the accompanying drawings and Formula 1.
(1) Operation of the multiplication circuit.
FIG. 8 is a schematic structural diagram of a calculation circuit provided in this application.
The operation of the multiplication circuit is described first with reference to FIG. 8 and Formula 1.
Referring to FIG. 8, assume that the hardware of the calculation circuit 102 is divided into multiple computing modules. The multipliers in each computing module compute products between first split mantissas and second split mantissas, such as a0*b4, a4*b0, a1*b3, a3*b1, and a2*b2. These five products are only examples: in the actual computation, the multipliers compute all pairwise products between the five first split mantissas and the five second split mantissas, yielding 25 third operation results.
The shift registers shift the 25 third operation results according to the first shift numbers output by the exponent processing circuit to obtain the first operation results. For example, in the term (a0*b4+a4*b0+a1*b3+a3*b1+a2*b2)<<48bit of Formula 1, 48 bits is the first shift number of the third operation results a0*b4, a4*b0, a1*b3, a3*b1, and a2*b2. Multiple shift registers can thus shift the multiple third operation results according to their first shift numbers.
The third operation results and the corresponding 48-bit shift number in the above example are merely illustrative. In the actual computation, the shift registers can shift more third operation results according to other first shift numbers, and the first shift number can take other values, which is not limited here.
(2) Operation of the first selection circuit and the addition circuit.
The operation of the first selection circuit and the addition circuit is described below with reference to FIG. 8 and Formula 1.
The first selection circuits in the first computing module and the second computing module receive the shifted a2*b2, a1*b3, a3*b1, a0*b4, and a4*b0 (first operation results). Because the shifted results of these products are 24 bits long, the first selection circuit can output their low-order 12 bits (first addition data) in the first calculation cycle. Multiple 52-bit adders (first adders) add these low-order 12 bits to the shifted a1*b4, a4*b1, a2*b3, a3*b2, a3*b3, a2*b4, a4*b2, a3*b4, a4*b3, and a4*b4 (first operation results) to obtain the low-order addition result and the carry data, where the carry data is the data produced by carries when these addends are summed.
Similarly, because the shifted results of a2*b2, a1*b3, a3*b1, a0*b4, and a4*b0 are 24 bits long, the first selection circuit can output their high-order 12 bits (second addition data) in the second calculation cycle. Multiple 52-bit adders (first adders) add these high-order 12 bits to the carry data and to the shifted a0*b0, a0*b1, a1*b0, a0*b2, a2*b0, a1*b1, a0*b3, a3*b0, a1*b2, and a2*b1 (first operation results) to obtain the high-order addition result.
The accumulator accumulates the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
The above example describes the operation of the calculation circuit 102 with reference to the specific hardware structure in FIG. 8. The following describes, with reference to FIG. 9, how the calculation circuit 102 obtains the low-order addition result from the first addition data and the first operation results in the first calculation cycle, and how it obtains the high-order addition result from the second addition data, the carry data, and the first operation results in the second calculation cycle.
FIG. 9 is a schematic diagram of the operation of a calculation circuit provided in this application.
Referring to FIG. 9, PP1 to PP25 in the figure denote the first operation results, where a2*b2, a1*b3, a3*b1, a0*b4, and a4*b0 correspond to PP11 to PP15. The first calculation cycle computes the sum of the low-order parts of PP11 to PP15 and the other first operation results of Part1. The second calculation cycle computes the sum of the carry data PP00, the high-order parts of PP11 to PP15, and the other first operation results of Part2.
In this application, the first calculation cycle computes the sum of the 15 low-order partial products. The low 12 bits of PP1 are taken as bits [11:0] of the final 106-bit result, and the high 12 bits of PP1 are input to the low 12 bits of the addition tree. The low 12 bits of PP11 to PP15 are input to the high 12 bits of the 48-bit addition tree. For PP2 to PP10, as shown in the figure, bits [59:12] of the shifted values are taken as inputs to the addition tree. Accumulating the 15 low-order partial products in the first calculation cycle gives a 52-bit result: its 48 bits [47:0] are bits [59:12] of the final 106-bit result, and its high 4 bits [51:48] serve as the carry data. In the second calculation cycle, the high 12 bits of PP11 to PP15 are input to the low 12 bits of the 48-bit addition tree, and the low 12 bits of PP25 are input to the high 12 bits of the 48-bit addition tree. Because PP25 is only 10 bits wide, its high 12 bits [23:12] can be discarded. For PP16 to PP24, as shown in the figure, bits [59:12] of the shifted values are taken as inputs to the addition tree. Adding the carry data from the first calculation cycle then gives a 52-bit result. Because a 53-bit by 53-bit multiplication yields at most a 106-bit result, the highest 6 bits [51:46] of the 52-bit result of the second cycle must be 0, and the remaining 46 bits are bits [105:60] of the final result.
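The two-cycle accumulation can be modelled in software to check the bit widths quoted above. The sketch below is a behavioural model under stated assumptions, not the circuit of FIG. 9: the mantissas are 53-bit integers, cycle 1 adds everything that falls in result bits [59:12], cycle 2 adds everything in bits [105:60] plus the carry, and the names `split` and `two_cycle_product` are hypothetical.

```python
MASK12 = (1 << 12) - 1
MASK48 = (1 << 48) - 1
SHIFT = [48, 36, 24, 12, 0]   # bit offset of parts x0..x4


def split(m):
    return [(m >> s) & MASK12 for s in SHIFT]


def two_cycle_product(ma, mb):
    a, b = split(ma), split(mb)
    # 25 partial products, each with its shift
    pp = [(a[i] * b[j], SHIFT[i] + SHIFT[j]) for i in range(5) for j in range(5)]

    # Cycle 1: the adder window covers result bits [59:12].
    low12 = 0
    w1 = 0
    for prod, s in pp:
        if s == 0:                       # a4*b4
            low12 = prod & MASK12        # goes straight to result bits [11:0]
            w1 += prod >> 12             # its high bits enter the window
        elif s < 48:
            w1 += prod << (s - 12)
        elif s == 48:
            w1 += (prod & MASK12) << 36  # low 12 bits of the shared products
    carry = w1 >> 48                     # bits [51:48] of the 52-bit sum
    assert w1 < (1 << 52) and carry < 16

    # Cycle 2: the adder window covers result bits [105:60].
    w2 = carry
    for prod, s in pp:
        if s == 48:
            w2 += prod >> 12             # high bits of the shared products
        elif s > 48:
            w2 += prod << (s - 60)
    assert w2 < (1 << 46)                # top 6 bits of a 52-bit sum are 0

    return (w2 << 60) | ((w1 & MASK48) << 12) | low12


ma = (1 << 53) - 1
mb = (1 << 52) | 12345
assert two_cycle_product(ma, mb) == ma * mb
```

The internal assertions mirror the widths stated in the description: the first-cycle sum fits in 52 bits with a carry of at most 4 bits, and the second-cycle sum fits in 46 bits.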
Optionally, the high-order 4 bits [51:48] produced by carries in the first calculation cycle (carry data PP0) can be held in an additional register, in which case the addition tree can be 48 bits wide. Alternatively, the addition tree itself can cover the carries, in which case it needs to be 52 bits wide. This is not limited here.
FIG. 10 is a schematic diagram of another embodiment of the floating-point number computing circuit provided in an embodiment of this application.
FIG. 10 shows the positions in the addition tree of the first operation results obtained for Part1 and Part2. As FIG. 10 shows, a 60-bit addition tree can cover the computation of Part1. The addition tree that computes the low-order addition result (Part1) needs a bit width of 52 bits, so that it fully covers the 4-bit carry of Part1. The addition tree that computes the high-order addition result (Part2) needs a bit width of 48 bits. Thus, for each part, an addition tree of at most 52 bits suffices to implement the multiplication of the first mantissa by the second mantissa.
FIG. 11 is a schematic diagram of another embodiment of the floating-point number computing circuit provided in an embodiment of this application.
As shown in FIG. 11, the black areas in the figure are the positions in the addition tree of the first operation results of Part1. The low-order 12 bits of the shifted a2*b2, a1*b3, a3*b1, a0*b4, and a4*b0 are located at bits 48 to 60. The shifted a1*b4, a4*b1, a2*b3, and a3*b2 are located at bits 36 to 60. The shifted a3*b3, a2*b4, and a4*b2 are located at bits 24 to 48. The shifted a3*b4 and a4*b3 are located at bits 12 to 36. The shifted a4*b4 is located at bits 0 to 24.
FIG. 12 is a schematic diagram of another embodiment of the floating-point number computing circuit provided in an embodiment of this application.
As shown in FIG. 12, the black areas in the figure are the positions in the addition tree of the first operation results of Part2.
The high-order 12 bits of the shifted a0*b4 and a4*b0 are located at bits 60 to 72. The high-order 12 bits of the shifted a2*b2, a1*b3, and a3*b1 are located at bits 60 to 72. The shifted a0*b3 and a3*b0 are located at bits 60 to 77. The shifted a1*b2 and a2*b1 are located at bits 60 to 84. The shifted a0*b2 and a2*b0 are located at bits 72 to 89. The shifted a1*b1 is located at bits 72 to 96. The shifted a0*b1 and a1*b0 are located at bits 84 to 101. The shifted a0*b0 is located at bits 96 to 106.
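The bit ranges listed for FIG. 11 and FIG. 12 follow from the widths and shifts of the split mantissas. The sketch below recomputes them, reading each quoted range (for example 36 to 60 for a1*b4) as a lower bound and an exclusive upper bound; this reading is an assumption made for the sketch, and the name `span` is hypothetical.

```python
WIDTH = [5, 12, 12, 12, 12]   # bit width of parts x0..x4
SHIFT = [48, 36, 24, 12, 0]   # bit offset of parts x0..x4


def span(i, j):
    """Bit range [lo, hi) occupied by the shifted product a_i * b_j."""
    lo = SHIFT[i] + SHIFT[j]
    hi = lo + WIDTH[i] + WIDTH[j]   # a product is as wide as both factors
    return lo, hi


for i, j in [(4, 4), (3, 4), (3, 3), (1, 4), (0, 3), (1, 2), (0, 2), (1, 1), (0, 1), (0, 0)]:
    print(f"a{i}*b{j}: bits {span(i, j)[0]} to {span(i, j)[1]}")
```

For a shared product such as a2*b2 the function returns 48 to 72, which splits into the low 12 bits at 48 to 60 used by Part1 and the high 12 bits at 60 to 72 used by Part2.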
本申请中,上述图11和图12所示的示例中,注意到在Part1和Part2中,都重复计算了a2*b2,a1*b3,a3*b1,a0*b4,a4*b0。但是Part1是取低位的12bit,Part2是取高位的12bit,这可以通过在计算电路中增加第一选择电路来实现,固定移位的开销很小,对面积和功耗的增加也很小。而部分重复运算不增加面积,功耗影响也很小。In the present application, in the above examples shown in FIG. 11 and FIG. 12 , it is noticed that in both Part1 and Part2, a2*b2, a1*b3, a3*b1, a0*b4, and a4*b0 are repeatedly calculated. But Part1 takes the low 12bit, and Part2 takes the high 12bit. This can be realized by adding the first selection circuit in the calculation circuit. The overhead of fixed shifting is very small, and the increase in area and power consumption is also small. However, some repeated operations do not increase the area, and the impact on power consumption is also small.
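The Part1/Part2 arrangement can be checked numerically. The following is a minimal Python sketch, not the circuit itself. It assumes a 53-bit mantissa split into segments a0 to a4 of 5, 12, 12, 12 and 12 bits, and a Part1/Part2 boundary at bit 60, as in FIG. 11 and FIG. 12. The names `split_mantissa`, `two_pass_product`, `OFFSETS` and `SPLIT` are hypothetical and chosen for illustration only.

```python
WIDTHS = [5, 12, 12, 12, 12]    # assumed widths of segments a0..a4 (a0 is high-order)
OFFSETS = [48, 36, 24, 12, 0]   # bit position of each segment's least significant bit
SPLIT = 60                      # assumed boundary between Part1 and Part2
LOW_MASK = (1 << SPLIT) - 1


def split_mantissa(m):
    """Split a 53-bit mantissa into the five segments a0..a4."""
    return [(m >> off) & ((1 << w) - 1) for w, off in zip(WIDTHS, OFFSETS)]


def two_pass_product(ma, mb):
    """Multiply two 53-bit mantissas by summing shifted partial products in two passes."""
    a, b = split_mantissa(ma), split_mantissa(mb)
    low_sum = 0    # Part1: contributions below bit 60
    high_sum = 0   # Part2: contributions at bit 60 and above
    for i in range(5):
        for j in range(5):
            shifted = (a[i] * b[j]) << (OFFSETS[i] + OFFSETS[j])
            low_sum += shifted & LOW_MASK   # select the low-order data
            high_sum += shifted >> SPLIT    # select the high-order data
    carry = low_sum >> SPLIT                # carry out of the first pass
    low_result = low_sum & LOW_MASK
    high_result = high_sum + carry          # second pass adds the carry
    return (high_result << SPLIT) | low_result
```

Only the products that straddle bit 60 (a2*b2, a1*b3, a3*b1, a0*b4 and a4*b0) contribute to both sums, which matches the repeated computation noted above.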
Optionally, in the present application, the floating-point number calculation circuit may further include a splitting circuit.
FIG. 13 is a schematic structural diagram of a splitting circuit provided by the present application.
Referring to FIG. 13, the output end of the splitting circuit is electrically connected to the input end of the exponent processing circuit and to the input end of the multiplication circuit. Optionally, the splitting circuit may include first selectors and registers, with the output end of each first selector electrically connected to the input end of a register. The first selectors receive the first floating-point number and the second floating-point number as input, split them, and store the results in the corresponding registers.
FIG. 14 is a schematic structural diagram of a first mantissa and a second mantissa provided by the present application.
As shown in FIG. 14, the multiple first selectors in the splitting circuit may split the first mantissa into a first split mantissa, which includes a first high-order mantissa and a first low-order mantissa, and split the second mantissa into a second split mantissa, which includes a second high-order mantissa and a second low-order mantissa. The first shift number indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
Optionally, the first high-order mantissa includes a third mantissa, the first low-order mantissa includes a fourth mantissa, a fifth mantissa, a sixth mantissa and a seventh mantissa, the second high-order mantissa includes an eighth mantissa, and the second low-order mantissa includes a ninth mantissa, a tenth mantissa, an eleventh mantissa and a twelfth mantissa. After the first floating-point number and the second floating-point number are input to the splitting circuit in FIG. 13, the multiple first selectors in the splitting circuit may split the first mantissa of the first floating-point number into the third, fourth, fifth, sixth and seventh mantissas, and may split the second mantissa of the second floating-point number into the eighth, ninth, tenth, eleventh and twelfth mantissas.
A specific example is used below to describe how the splitting circuit splits the mantissa part of a floating-point number.
For example, suppose the first floating-point number is of type FP64. The splitting circuit may split its mantissa part into a 5-bit third mantissa 10001, a 12-bit fourth mantissa 100000000001, a 12-bit fifth mantissa 100000000011, a 12-bit sixth mantissa 100000000111 and a 12-bit seventh mantissa 100000001111.
In this embodiment, the third mantissa belongs to the first high-order mantissa, and the fourth, fifth, sixth and seventh mantissas belong to the first low-order mantissa. The first shift number indicates the shift difference between the most significant bit of the high-order mantissa and the most significant bit of each low-order mantissa. The shift number of the third mantissa is therefore 0. The most significant bit of the fourth mantissa is 5 bits below that of the third mantissa, which equals the length of the third mantissa, so the first shift number of the fourth mantissa is a right shift of 5 bits. The most significant bit of the fifth mantissa is 17 bits below that of the third mantissa, which equals the combined length of the third and fourth mantissas, so the first shift number of the fifth mantissa is a right shift of 17 bits. The most significant bit of the sixth mantissa is 29 bits below that of the third mantissa, which equals the combined length of the third, fourth and fifth mantissas, so the first shift number of the sixth mantissa is a right shift of 29 bits. The most significant bit of the seventh mantissa is 41 bits below that of the third mantissa, which equals the combined length of the third, fourth, fifth and sixth mantissas, so the first shift number of the seventh mantissa is a right shift of 41 bits.
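The split and the resulting shift numbers can be reproduced with a short Python sketch. It is illustrative only: the segment values are the ones given in the FP64 example above, and the helper names `shift_numbers` and `split` are hypothetical.

```python
from itertools import accumulate

WIDTHS = [5, 12, 12, 12, 12]   # lengths of the third to seventh mantissas


def shift_numbers(widths):
    """Right-shift of each segment's top bit relative to the top bit of the first segment."""
    return [0] + list(accumulate(widths))[:-1]


def split(mantissa, widths):
    """Cut a mantissa of sum(widths) bits into segments, high-order segment first."""
    total = sum(widths)
    segments = []
    for width, shift in zip(widths, shift_numbers(widths)):
        low_bit = total - shift - width   # position of the segment's lowest bit
        segments.append((mantissa >> low_bit) & ((1 << width) - 1))
    return segments


# The 53-bit mantissa formed by concatenating the five example segments.
example = int(
    "10001" "100000000001" "100000000011" "100000000111" "100000001111", 2
)
```

With these definitions, `shift_numbers(WIDTHS)` gives 0, 5, 17, 29 and 41, and `split(example, WIDTHS)` recovers the five segments listed in the example.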
In this embodiment, the mantissas may also be split in other ways. For example, the first segment may be 9 bits long and the second, third, fourth and fifth segments may each be 11 bits long. This is not limited here.
In this embodiment, the second high-order mantissa is split in a manner similar to the first high-order mantissa, and the second low-order mantissa is split in a manner similar to the first low-order mantissa. Details are not repeated here.
The splitting described above is only an example. Optionally, the first floating-point number may be of type FP32, FP64 or FP128, which is not limited here. Optionally, the mantissa part of the first floating-point number may be split into two parts or into more than two parts, which is not limited here. The parts obtained by splitting may have equal or unequal numbers of bits, which is not limited here.
Optionally, in the present application, the floating-point number calculation circuit may further include a storage circuit.
The output end of the splitting circuit is electrically connected to the input end of the storage circuit, the input end of the exponent processing circuit is electrically connected to the first output end of the storage circuit, and the input end of the calculation circuit is electrically connected to the second output end of the storage circuit.
FIG. 15 is a schematic structural diagram of a storage circuit provided by the present application.
Referring to FIG. 15, the storage circuit includes multiple registers for storing the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number and a fourth shift number. The third shift number represents the shift number of the first split mantissa, and the fourth shift number represents the shift number of the second split mantissa.
It can be understood that the number of registers shown in the storage circuit in FIG. 15 is only an example. Optionally, the storage circuit may include more or fewer registers than shown in FIG. 15, which is not limited here.
Optionally, in the present application, the floating-point number calculation circuit may further include a memory controller.
FIG. 16 is a schematic diagram of the connection between a memory controller and a memory provided by the present application.
As shown in FIG. 16, the input end of the memory controller is connected to the output end of the memory, and the output end of the memory controller is electrically connected to the input end of the splitting circuit.
In the present application, the first floating-point number and the second floating-point number are stored in the memory. The memory controller can obtain the first floating-point number and the second floating-point number and send them to the splitting circuit. Optionally, the memory may be a double data rate (DDR) memory or another type of memory, which is not limited here. The memory controller may be a DDR controller or another type of memory controller, which is not limited here.
In the present application, the exponent processing circuit 101 has a specific implementation, and it obtains the first shift number from the first exponent and the second exponent in a specific way. The implementation and operation of the exponent processing circuit 101 are described below with reference to FIG. 17.
FIG. 17 is a schematic structural diagram of an exponent processing circuit provided by the present application.
Referring to FIG. 17, in the present application the exponent processing circuit 101 includes at least a second adder 401, a second selection circuit 402 and a third adder 403.
The input end of the second adder 401 is electrically connected to the first output end of the storage circuit, and the output end of the second adder 401 is electrically connected to the first input end of the third adder 403. The second input end of the third adder 403 is electrically connected to the output end of the second selection circuit 402, and the output end of the third adder 403 is electrically connected to the first input end of the calculation circuit 102.
In the present application, the second adder 401 may add the first exponent, the second exponent, the third shift number and the fourth shift number to obtain multiple second operation results. The second selection circuit 402 may select the maximum value among the multiple second operation results. The third adder 403 subtracts each second operation result from that maximum value to obtain the first shift numbers.
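The three stages of the exponent processing circuit can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than the circuit's actual data path: the text does not fix a sign convention for the third and fourth shift numbers, so the sketch assumes they are stored as signed offsets (0 for the high-order segment, negative for lower segments). The function name `first_shift_numbers` and the sample exponents are hypothetical.

```python
def first_shift_numbers(terms):
    """terms: one (exp_a, exp_b, shift_a, shift_b) tuple per partial product."""
    # Second adder: add both exponents and both segment shift numbers.
    sums = [ea + eb + sa + sb for ea, eb, sa, sb in terms]
    # Second selection circuit: pick the largest of the second operation results.
    largest = max(sums)
    # Third adder: distance of every partial product from the largest one.
    return [largest - s for s in sums]


# Assumed signed offsets for a 5/12/12/12/12 split (illustrative only).
seg_offsets = [0, -5, -17, -29, -41]
exp_a, exp_b = 3, 4   # hypothetical exponents of the two operands
terms = [(exp_a, exp_b, sa, sb) for sa in seg_offsets for sb in seg_offsets]
shifts = first_shift_numbers(terms)
```

Under these assumptions the product of the two high-order segments needs no shift, and every other partial product is shifted right by the amount by which it falls below that term. Because only differences from the maximum are used, the same function also aligns partial products that come from operand pairs with different exponents.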
The present application further provides a floating-point number calculation method. Its specific implementation can be understood with reference to the floating-point number calculation circuit described above in FIG. 3 to FIG. 17, and details are not repeated here.
The present application further provides another floating-point number calculation circuit. Similarly, its specific implementation can be understood with reference to the floating-point number calculation circuit described above in FIG. 3 to FIG. 17, and details are not repeated here.
FIG. 18 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by an embodiment of the present application.
Step 1: Referring to FIG. 18, the second floating-point number B is data in the filter matrix. The DDR controller (memory controller) reads multiple first floating-point numbers A and second floating-point numbers B from the DDR (memory). The high/low-bit splitting logic (splitting circuit) splits the mantissa part of each first floating-point number A into an MSB part and an LSB part and stores them in the data RAM (storage circuit). Items I, II, ..., X in FIG. 10 hold the A_MSB and A_LSB obtained by splitting the mantissa of each first floating-point number A, together with the exponent part EXP corresponding to each A_MSB and A_LSB. The mantissa part of each second floating-point number B is likewise split into an MSB part and an LSB part and stored in the weight RAM (storage circuit). Items 1, 2, ..., N in FIG. 18 hold the B_MSB and B_LSB obtained by splitting the mantissa of each second floating-point number B, together with the exponent part EXP corresponding to each B_MSB and B_LSB.
FIG. 19 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by an embodiment of the present application.
Step 2: Referring to FIG. 19, the split mantissas in the weight RAM are preloaded into the convolution calculation units. At the same time, EXP (the exponent part corresponding to each mantissa part after splitting) is processed by EXP offset (the second adder) and is likewise preloaded into the convolution calculation units.
FIG. 20 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by an embodiment of the present application.
Step 3: Referring to FIG. 20, the first segment of mantissa data (part I) is fetched from the data RAM. Its EXP part is likewise first processed by EXP offset and then placed in the convolution calculation unit, where it is computed against the preloaded parameters (part 1) to obtain a result.
FIG. 21 is a schematic diagram of another embodiment of the floating-point number calculation circuit provided by an embodiment of the present application.
Step 4: Referring to FIG. 21, calculation unit 1 forwards the first segment of data (part I) to calculation unit 2 and fetches the second segment of data (part II) from the data RAM. Calculation unit 1, having obtained part II, and calculation unit 2, having obtained part I, each complete their operation and generate a result. In every subsequent clock cycle, calculation units 2 to N forward the data they processed in the previous clock cycle to the next calculation unit, and calculation unit 1 fetches new data from the data RAM.
Step 5: Repeat step 4 until all data has been processed and the results have been generated.
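The data movement in steps 3 to 5 can be illustrated with a small scheduling model in Python. It only tracks which data segment each calculation unit holds in each clock cycle and performs no arithmetic. It assumes each unit forwards a segment exactly one clock cycle after processing it. The function name `simulate_pipeline` is hypothetical.

```python
def simulate_pipeline(data_segments, num_units):
    """Return (clock, unit, segment) triples; clocks and units are numbered from 1."""
    schedule = []
    total_clocks = len(data_segments) + num_units - 1
    for clock in range(total_clocks):
        for unit in range(num_units):
            index = clock - unit   # unit k sees segment t-k at clock t
            if 0 <= index < len(data_segments):
                schedule.append((clock + 1, unit + 1, data_segments[index]))
    return schedule


# Three data segments flowing through two calculation units.
for clock, unit, segment in simulate_pipeline(["I", "II", "III"], 2):
    print(f"clock {clock}: unit {unit} processes part {segment}")
```

In this model, unit 1 processes part I in the first clock cycle. In the second clock cycle unit 1 processes part II while unit 2 processes part I, which corresponds to step 4.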
The floating-point number calculation circuit and the floating-point number calculation method provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (28)

  1. A floating-point number calculation circuit, characterized in that the floating-point number calculation circuit is configured to calculate the product of a first floating-point number and a second floating-point number, the first floating-point number comprises a first exponent and a first mantissa, the second floating-point number comprises a second exponent and a second mantissa, and the floating-point number calculation circuit comprises an exponent processing circuit and a calculation circuit;
    the output end of the exponent processing circuit is electrically connected to the input end of the calculation circuit;
    the exponent processing circuit is configured to obtain a first shift number according to the first exponent and the second exponent, wherein the first shift number represents the shift number of the product of a first split mantissa and a second split mantissa, the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa;
    the calculation circuit is configured to select and output part of the data in a plurality of first operation results to obtain a plurality of first addition data and a plurality of second addition data, and to obtain the product of the first mantissa and the second mantissa according to the plurality of first addition data, the plurality of second addition data and the plurality of first operation results, wherein each first operation result represents the data obtained by shifting the product of the first split mantissa and the second split mantissa according to the first shift number.
  2. The floating-point number calculation circuit according to claim 1, characterized in that the calculation circuit comprises a multiplication circuit, an addition circuit and a first selection circuit;
    the output end of the exponent processing circuit is electrically connected to the input end of the multiplication circuit;
    the input end of the first selection circuit is electrically connected to the output end of the multiplication circuit, and the output end of the first selection circuit is electrically connected to the input end of the addition circuit;
    the first selection circuit is configured to select and output low-order data in the plurality of first operation results to obtain the plurality of first addition data, and to select and output high-order data in the plurality of first operation results to obtain the plurality of second addition data;
    the addition circuit is configured to add the plurality of first addition data and the plurality of first operation results to obtain a low-order addition result and carry data, to add the carry data, the plurality of second addition data and the plurality of first operation results to obtain a high-order addition result, and to accumulate the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
  3. The floating-point number calculation circuit according to claim 2, characterized in that the addition circuit comprises a first adder and an accumulator;
    the input end of the first adder is electrically connected to the output end of the first selection circuit, and the output end of the first adder is electrically connected to the input end of the accumulator;
    the first adder is configured to add, in a first calculation cycle, the plurality of first addition data and the plurality of first operation results to obtain the low-order addition result and the carry data, and to add, in a second calculation cycle, the carry data, the plurality of second addition data and the plurality of first operation results to obtain the high-order addition result;
    the accumulator is configured to accumulate the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
  4. The floating-point number calculation circuit according to claim 2 or 3, characterized in that the floating-point number calculation circuit further comprises a splitting circuit;
    the output end of the splitting circuit is electrically connected to the input end of the exponent processing circuit and to the input end of the multiplication circuit;
    the splitting circuit is configured to split the first mantissa into the first split mantissa, the first split mantissa comprising a first high-order mantissa and a first low-order mantissa, and to split the second mantissa into the second split mantissa, the second split mantissa comprising a second high-order mantissa and a second low-order mantissa, wherein the first shift number indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
  5. The floating-point number calculation circuit according to claim 4, characterized in that
    the first high-order mantissa comprises a third mantissa, the first low-order mantissa comprises a fourth mantissa, a fifth mantissa, a sixth mantissa and a seventh mantissa, the second high-order mantissa comprises an eighth mantissa, and the second low-order mantissa comprises a ninth mantissa, a tenth mantissa, an eleventh mantissa and a twelfth mantissa.
  6. The floating-point number calculation circuit according to any one of claims 1 to 5, characterized in that the floating-point number calculation circuit further comprises a storage circuit;
    the output end of the splitting circuit is electrically connected to the input end of the storage circuit;
    the input end of the exponent processing circuit is electrically connected to the first output end of the storage circuit;
    the input end of the calculation circuit is electrically connected to the second output end of the storage circuit;
    the storage circuit is configured to store the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number and a fourth shift number, wherein the third shift number represents the shift number of the first split mantissa and the fourth shift number represents the shift number of the second split mantissa.
  7. The floating-point number calculation circuit according to claim 6, characterized in that
    the exponent processing circuit comprises a second adder, a second selection circuit and a third adder;
    the input end of the second adder is electrically connected to the first output end of the storage circuit, and the output end of the second adder is electrically connected to the first input end of the third adder;
    the second input end of the third adder is electrically connected to the output end of the second selection circuit, and the output end of the third adder is electrically connected to the first input end of the calculation circuit;
    the second adder is configured to add the first exponent, the second exponent, the third shift number and the fourth shift number to obtain a plurality of second operation results;
    the second selection circuit is configured to select the maximum value among the plurality of second operation results;
    the third adder is configured to subtract each second operation result from the maximum value among the plurality of second operation results to obtain the first shift number.
  8. The floating-point number calculation circuit according to claim 6 or 7, characterized in that the multiplication circuit comprises a multiplier and a shift register;
    the input end of the multiplier is electrically connected to the second output end of the storage circuit, and the output end of the multiplier is electrically connected to the first input end of the shift register;
    the second input end of the shift register is electrically connected to the output end of the third adder;
    the output end of the shift register is electrically connected to the input end of the first adder;
    the multiplier is configured to multiply the first split mantissa by the second split mantissa to obtain a plurality of third operation results;
    the shift register is configured to shift the plurality of third operation results according to a plurality of the first shift numbers to obtain the plurality of first operation results.
  9. The floating-point number calculation circuit according to any one of claims 6 to 8, characterized in that the floating-point number calculation circuit further comprises a memory controller;
    the output end of the memory controller is electrically connected to the input end of the splitting circuit;
    the memory controller is configured to obtain the first floating-point number and the second floating-point number and to send the first floating-point number and the second floating-point number to the splitting circuit.
  10. The floating-point number calculation circuit according to any one of claims 1 to 9, characterized in that the first floating-point number further comprises a first sign bit and the second floating-point number further comprises a second sign bit.
  11. A floating-point number calculation method, characterized in that the method is used to calculate the product of a first floating-point number and a second floating-point number, the first floating-point number comprises a first exponent and a first mantissa, the second floating-point number comprises a second exponent and a second mantissa, and the method comprises:
    obtaining a first shift number according to the first exponent and the second exponent, wherein the first shift number represents the shift number of the product of a first split mantissa and a second split mantissa, the first split mantissa is obtained by splitting the first mantissa, and the second split mantissa is obtained by splitting the second mantissa;
    selecting and outputting part of the data in a plurality of first operation results to obtain a plurality of first addition data and a plurality of second addition data, and obtaining the product of the first mantissa and the second mantissa according to the plurality of first addition data, the plurality of second addition data and the plurality of first operation results, wherein each first operation result represents the data obtained by shifting the product of the first split mantissa and the second split mantissa according to the first shift number.
  12. The floating-point number calculation method according to claim 11, wherein the selecting and outputting part of the data in the plurality of first operation results to obtain the plurality of first addition data and the plurality of second addition data, and obtaining the product of the first mantissa and the second mantissa according to the plurality of first addition data, the plurality of second addition data, and the plurality of first operation results, comprises:
    selecting and outputting low-order data in the plurality of first operation results to obtain the plurality of first addition data, and selecting and outputting high-order data in the plurality of first operation results to obtain the plurality of second addition data; and
    adding the plurality of first addition data and the plurality of first operation results to obtain a low-order addition result and carry data, adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain a high-order addition result, and accumulating the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa.
  13. The floating-point number calculation method according to claim 12, wherein the adding the plurality of first addition data and the plurality of first operation results to obtain the low-order addition result and the carry data, adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain the high-order addition result, and accumulating the high-order addition result and the low-order addition result to obtain the product of the first mantissa and the second mantissa, comprises:
    in a first calculation cycle, adding the plurality of first addition data and the plurality of first operation results to obtain the low-order addition result and the carry data, and in a second calculation cycle, adding the carry data, the plurality of second addition data, and the plurality of first operation results to obtain the high-order addition result; and
    accumulating the low-order addition result and the high-order addition result to obtain the product of the first mantissa and the second mantissa.
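The two-cycle addition of claims 12 and 13 can be illustrated with a minimal Python sketch. It assumes non-negative integer partial products and a hypothetical split point `low_width`; the claims do not fix any bit widths, and the name `two_phase_sum` is illustrative only. The sketch reads the claims in the simplest way: the low-order slices are summed first, and the resulting carry is then added to the high-order slices.

```python
def two_phase_sum(terms, low_width):
    """Sum non-negative integers in two phases (low slices, then high slices).

    `terms` stands in for the shifted partial products ("first operation
    results"); `low_width` is an assumed split point, not a value from the claims.
    """
    mask = (1 << low_width) - 1

    # Selector step: low-order slices ("first addition data") and
    # high-order slices ("second addition data").
    lows = [t & mask for t in terms]
    highs = [t >> low_width for t in terms]

    # First cycle: add the low slices, keep the low bits and the carry.
    low_total = sum(lows)
    low_result = low_total & mask
    carry = low_total >> low_width

    # Second cycle: add the carry and the high slices.
    high_result = carry + sum(highs)

    # Accumulate: place the high result above the low result.
    return (high_result << low_width) | low_result


terms = [0xFFFF, 0x1, 0xABCDEF]
assert two_phase_sum(terms, 8) == sum(terms)
```

Splitting the addition this way keeps each adder narrower than the full product width, at the cost of one extra cycle for the carry to propagate into the high-order sum.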
  14. The floating-point number calculation method according to claim 12 or 13, wherein the method further comprises:
    splitting the first mantissa into the first split mantissa, wherein the first split mantissa comprises a first high-order mantissa and a first low-order mantissa, and splitting the second mantissa into the second split mantissa, wherein the second split mantissa comprises a second high-order mantissa and a second low-order mantissa, and the first shift number indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
  15. The floating-point number calculation method according to claim 14, wherein
    the first high-order mantissa comprises a third mantissa, the first low-order mantissa comprises a fourth mantissa, a fifth mantissa, a sixth mantissa, and a seventh mantissa, the second high-order mantissa comprises an eighth mantissa, and the second low-order mantissa comprises a ninth mantissa, a tenth mantissa, an eleventh mantissa, and a twelfth mantissa.
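As a rough illustration of the splitting in claims 14 and 15, the sketch below cuts a mantissa into one high-order part and four low-order parts. The widths `[5, 12, 12, 12, 12]` (53 bits in total, as for a double-precision significand with its implicit leading bit) are an assumption chosen for the example; the claims do not state the part widths. The helper names `split_mantissa` and `join_parts` are hypothetical.

```python
def split_mantissa(mantissa, widths):
    """Split `mantissa` into chunks, most significant first.

    Returns a list of (chunk, offset) pairs, where `offset` is the bit
    position of the chunk's least significant bit in the original mantissa.
    """
    total = sum(widths)
    assert 0 <= mantissa < (1 << total), "mantissa does not fit the given widths"
    parts = []
    offset = total
    for w in widths:
        offset -= w
        parts.append(((mantissa >> offset) & ((1 << w) - 1), offset))
    return parts


def join_parts(parts):
    """Reassemble the mantissa from (chunk, offset) pairs."""
    return sum(chunk << offset for chunk, offset in parts)


ASSUMED_WIDTHS = [5, 12, 12, 12, 12]        # one high part, four low parts
m = (1 << 52) | 0x123456789ABCD             # example 53-bit mantissa
parts = split_mantissa(m, ASSUMED_WIDTHS)
assert len(parts) == 5
assert join_parts(parts) == m
```

The offsets recorded for each chunk play the role of the per-part shift amounts: they describe how far each part sits from the least significant bit of the original mantissa.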
  16. The floating-point number calculation method according to any one of claims 11 to 13, wherein the method further comprises:
    storing the first split mantissa, the second split mantissa, the first exponent, the second exponent, a third shift number, and a fourth shift number, wherein the third shift number represents the shift amount of the first split mantissa, and the fourth shift number represents the shift amount of the second split mantissa.
  17. The floating-point number calculation method according to claim 16, wherein the obtaining the first shift number according to the first exponent and the second exponent comprises:
    selecting the maximum value among a plurality of the second operation results; and
    computing the difference between the maximum value among the plurality of second operation results and each second operation result to obtain the first shift number.
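The max-and-subtract step of claim 17 can be sketched in a few lines of Python. The "second operation results" are not defined in this part of the claims, so the sketch simply treats them as a list of integers (for example, per-product exponent sums); that interpretation and the name `shift_numbers` are assumptions.

```python
def shift_numbers(second_results):
    """Return, for each value, its distance from the maximum value.

    The largest entry gets shift 0; every other entry gets the number of
    bit positions by which it falls short of the maximum.
    """
    largest = max(second_results)
    return [largest - r for r in second_results]


assert shift_numbers([10, 7, 10, 3]) == [0, 3, 0, 7]
```

Referencing every term to the largest value means all shift amounts are non-negative, so the alignment needs shifting in one direction only.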
  18. The floating-point number calculation method according to claim 16 or 17, wherein the method further comprises:
    multiplying the first split mantissa by the second split mantissa to obtain a plurality of third operation results; and
    shifting the plurality of third operation results according to a plurality of the first shift numbers to obtain the plurality of first operation results.
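The multiply-then-shift step of claim 18 can be checked numerically with a short Python sketch. It assumes that each part's shift is simply its bit offset within the original mantissa and that the parts have the hypothetical widths `[5, 12, 12, 12, 12]`; neither detail is stated in the claims, and the mapping of variable names to "third" and "first" operation results is an interpretation.

```python
def split(mantissa, widths):
    """Split into (chunk, offset) pairs, most significant chunk first."""
    total = sum(widths)
    assert 0 <= mantissa < (1 << total)
    parts, offset = [], total
    for w in widths:
        offset -= w
        parts.append(((mantissa >> offset) & ((1 << w) - 1), offset))
    return parts


def product_via_parts(a, b, widths):
    """Multiply two mantissas by combining products of their split parts."""
    total = 0
    for chunk_a, off_a in split(a, widths):
        for chunk_b, off_b in split(b, widths):
            third = chunk_a * chunk_b          # small partial product
            first = third << (off_a + off_b)   # aligned partial product
            total += first
    return total


WIDTHS = [5, 12, 12, 12, 12]                   # assumed part widths
a = (1 << 52) | 0xABCDEF12345
b = (1 << 52) | 0x13579BDF02468
assert product_via_parts(a, b, WIDTHS) == a * b
```

Each small multiplication here is at most 12 bits by 12 bits, which is the motivation for splitting: narrow multipliers can be reused across precisions, with the shifts restoring each partial product to its correct weight.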
  19. The floating-point number calculation method according to any one of claims 12 to 18, wherein the method further comprises:
    obtaining the first floating-point number and the second floating-point number.
  20. The floating-point number calculation method according to any one of claims 11 to 19, wherein the first floating-point number further comprises a first sign bit, and the second floating-point number further comprises a second sign bit.
  21. A floating-point number calculation circuit, wherein the floating-point number calculation circuit comprises an exponent processing circuit and a calculation circuit, and the calculation circuit comprises a first multiplication circuit, a first selector, and an addition circuit;
    an output end of the exponent processing circuit is electrically connected to an input end of the first multiplication circuit;
    an output end of the first multiplication circuit is electrically connected to an input end of the first selector; and
    an output end of the first selector is electrically connected to an input end of the addition circuit.
  22. The floating-point number calculation circuit according to claim 21, wherein the floating-point number calculation circuit is configured to calculate the product of a first floating-point number and a second floating-point number, the first floating-point number comprises a first exponent, a first mantissa, and a first sign bit, and the second floating-point number comprises a second exponent, a second mantissa, and a second sign bit;
    an input end of the exponent processing circuit is configured to receive the first exponent and the second exponent; and
    an input end of the calculation circuit is configured to receive the first mantissa and the second mantissa.
  23. The floating-point number calculation circuit according to claim 22, wherein the addition circuit comprises a first adder and an accumulator;
    an input end of the first adder is electrically connected to the output end of the first selector, and an output end of the first adder is electrically connected to an input end of the accumulator.
  24. The floating-point number calculation circuit according to claim 22 or 23, wherein the first multiplication circuit comprises a first multiplier and a first shift register;
    a first input end of the first shift register is electrically connected to the output end of the exponent processing circuit, a second input end of the first shift register is electrically connected to an output end of the first multiplier, and an output end of the first shift register is electrically connected to the input end of the first selector.
  25. The floating-point number calculation circuit according to any one of claims 22 to 24, wherein the exponent processing circuit comprises a second adder, a second selector, and a third adder;
    an output end of the second adder is electrically connected to a first input end of the third adder; and
    a second input end of the third adder is electrically connected to an output end of the second selector, and an output end of the third adder is electrically connected to the first input end of the first shift register.
  26. The floating-point number calculation circuit according to any one of claims 22 to 25, wherein the calculation circuit further comprises a second multiplication circuit, and the second multiplication circuit comprises a second multiplier and a second shift register;
    a first input end of the second shift register is electrically connected to the output end of the exponent processing circuit, a second input end of the second shift register is electrically connected to an output end of the second multiplier, and an output end of the second shift register is electrically connected to the input end of the first adder.
  27. The floating-point number calculation circuit according to any one of claims 22 to 25, wherein the floating-point number calculation circuit further comprises a memory controller, a third selector, and a register;
    an input end of the third selector is electrically connected to an output end of the memory controller, and an output end of the third selector is electrically connected to an input end of the register; and
    a first output end of the register is electrically connected to the input end of the exponent processing circuit, and a second output end of the register is electrically connected to the input end of the calculation circuit.
  28. A computing device, wherein the computing device comprises a control circuit and a floating-point number calculation circuit;
    the floating-point number calculation circuit calculates data under the control of the control circuit, and the floating-point number calculation circuit is the floating-point number calculation circuit according to any one of claims 1 to 10, or the floating-point number calculation circuit is the floating-point number calculation circuit according to any one of claims 21 to 27.
PCT/CN2021/115811 2021-08-31 2021-08-31 Floating-point number computing circuit and floating-point number computing method WO2023028884A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/115811 WO2023028884A1 (en) 2021-08-31 2021-08-31 Floating-point number computing circuit and floating-point number computing method
CN202180096895.1A CN117178253A (en) 2021-08-31 2021-08-31 Floating point number calculating circuit and floating point number calculating method

Publications (1)

Publication Number Publication Date
WO2023028884A1 (en) 2023-03-09

Family

ID=85411809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115811 WO2023028884A1 (en) 2021-08-31 2021-08-31 Floating-point number computing circuit and floating-point number computing method

Country Status (2)

Country Link
CN (1) CN117178253A (en)
WO (1) WO2023028884A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030236651A1 (en) * 2002-06-20 2003-12-25 Shuji Miyasaka Floating point number storage method and floating point arithmetic device
US20160248439A1 (en) * 2015-02-25 2016-08-25 Renesas Electronics Corporation Floating-point adder, semiconductor device, and control method for floating-point adder
CN113138750A (en) * 2020-01-20 2021-07-20 华为技术有限公司 Arithmetic logic unit, floating point number multiplication calculation method and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MAO WEI; LI KAI; XIE XINANG; ZHAO SHIRUI; LI HE; YU HAO: "A Reconfigurable Multiple-Precision Floating-Point Dot Product Unit for High-Performance Computing", 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), EDAA, 1 February 2021 (2021-02-01), pages 1793 - 1798, XP033941161, DOI: 10.23919/DATE51398.2021.9473928 *

Also Published As

Publication number Publication date
CN117178253A (en) 2023-12-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21955438

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE