CN118092854A - Area-optimized serial floating-point transcendental function computing device and processor - Google Patents


Info

Publication number
CN118092854A
Authority
CN
China
Prior art keywords
unit
vector
data
scaling
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410510689.6A
Other languages
Chinese (zh)
Other versions
CN118092854B (en
Inventor
覃博琛
黄志洪
蔡刚
魏育成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ehiway Microelectronic Science And Technology Suzhou Co ltd
Original Assignee
Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority to CN202410510689.6A
Publication of CN118092854A
Application granted
Publication of CN118092854B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides an area-optimized serial floating-point transcendental function computing device and a processor. The device comprises an initialization unit, a control unit, a storage unit, an operation unit and an output selection unit. The initialization unit receives floating-point input data and an instruction control signal input to the computing device from outside, and outputs initialized data to the control unit. The control unit is connected to the storage unit and the operation unit; based on the constraint conditions of the transcendental function, it selects which data are input to the operation unit, and it also judges whether the current iteration has converged. The operation unit performs the floating-point iterative operation and outputs iterative feedback data; according to the convergence signal, it performs the floating-point scaling operation and outputs the scaled data to the output selection unit. The output selection unit outputs the operation result based on the instruction control signal. The scheme reuses hardware resources, which reduces chip area and resource consumption.

Description

Area-optimized serial floating-point transcendental function computing device and processor
Technical Field
The invention relates to the technical field of chip design and floating-point arithmetic devices, and in particular to a floating-point arithmetic device and a processor supporting transcendental functions such as trigonometric, hyperbolic, exponential and logarithmic functions.
Background
Transcendental functions are functions in which the relationship between the variables cannot be expressed by a finite number of additions, subtractions, multiplications, divisions, powers and root extractions; examples are trigonometric, hyperbolic, exponential and logarithmic functions. Transcendental functions are widely used in image processing, radar remote sensing, signal processing and similar fields. As these fields develop, they place higher demands on transcendental function computing devices: a high-performance, low-latency, high-precision transcendental function calculator may be designed for aviation and military applications, whereas a low-power, small-area design may be adopted for embedded applications.
The CORDIC (Coordinate Rotation Digital Computer) algorithm is currently the most commonly used method for implementing transcendental functions in hardware. Conventional CORDIC-based devices mostly use fixed-point input and output data. Compared with IEEE-754 floating-point data, fixed-point data has a small range, low precision and poor compatibility with high-level programs such as C programs, but fixed-point processing is simpler, easier to implement in hardware and smaller in area. IEEE-754 floating-point data has a wide range and high precision and can be declared directly in a C program, but it brings complex data paths and a large area, so the choice must be weighed against the specific application.
The variables in the HCORDIC (High-radix adaptive CORDIC) algorithm use the IEEE-754 floating-point data format, so floating-point transcendental functions can be processed in hardware.
Disclosure of Invention
To address the large hardware area and high resource consumption of transcendental function computing devices that process floating-point data, the invention aims to provide a floating-point transcendental function computing device and a processor that are optimized for area, so as to better meet the overall area-optimization requirements of existing chips. Specifically, the invention provides the following technical solutions:
In one aspect, the invention provides an area-optimized serial floating-point transcendental function computing device, comprising:
an initialization unit, a control unit, a storage unit, an operation unit and an output selection unit;
The initialization unit receives floating-point input data and an instruction control signal input to the computing device from outside, and outputs initialized data to the control unit;
The control unit is connected to the storage unit and the operation unit. Based on the constraint conditions of the transcendental function, the control unit selectively inputs to the operation unit the parameter factors obtained from the storage unit, the data output by the initialization unit and the iteration data output by the operation unit. The control unit also judges whether the current iteration has converged and, if the convergence condition is met, outputs a convergence signal to the operation unit. The parameter factors comprise a rotation factor δ, a rotation angle θ, a scale factor K and a scale factor k;
The operation unit performs the floating-point iterative operation and outputs iterative feedback data; according to the convergence signal, it performs the floating-point scaling operation and outputs the scaled data to the output selection unit;
The output selection unit selects, based on the instruction control signal, among the scaled data sent by the operation unit and outputs the operation result.
Preferably, the storage unit stores the possible values of the parameter factors.
Preferably, the operation unit includes: a first operator composed of a floating-point multiplier and a floating-point adder, a second operator composed of a floating-point multiplier and a floating-point subtractor, a third operator composed of a floating-point multiplier, and a fourth operator composed of a floating-point adder;
The iterative operation and the scaling operation are serial: the scaling operation starts only after the result of the iterative operation meets the convergence condition.
Both the iterative operation and the scaling operation are performed in the operation unit.
Preferably, the iterative operation mode in the operation unit is as follows:
where X_i denotes the result of the i-th iteration of vector X, Y_i the result of the i-th iteration of vector Y, Z_i the result of the i-th iteration of the cumulative angle, and θ_i the result of the i-th iteration of the rotation angle; the variables K and k are scale factors, with K_i and k_i denoting the results of their i-th iteration; δ_i denotes the result of the i-th iteration of the rotation factor; and m ∈ {1, 0, -1}.
Preferably, the variables δ_i and θ_i are updated as follows:
In vector mode:
In rotation mode:
Here the formulas refer to the exponent parts and to the mantissas of the variables X, Y and Z, and i denotes the iteration number.
Preferably, the scaling operation is performed in the operation unit in the following manner:
where X_r, Y_r and Z_r denote the variables that satisfy the convergence condition after r iterations, as output by the control unit, and X_out, Y_out and Z_out denote the output data of the scaling operation.
Preferably, the initialization unit initializes the variables for the requested transcendental function according to the instruction control signal; the variables include vector X, vector Y, cumulative angle Z and scale factor K.
Preferably, in the operation unit, a first multiplexer is placed before the first and second operators; it receives the convergence signal, the rotation factor δ and the scale factor K, and sends the selected data to both the first and the second operator;
If the convergence signal is false, the first multiplexer selects the rotation factor δ as its output and the operation unit performs the iterative operation; if the convergence signal is true, the first multiplexer selects the scale factor K as its output and the operation unit performs the scaling operation.
Preferably, in the operation unit, the inputs of the first operator are vector X, vector Y and the data selected by the first multiplexer; its output is vector X after the iterative operation, or the scaled vector Y_out after the scaling operation;
The inputs of the second operator are vector X, vector Y and the data selected by the first multiplexer; its output is vector Y after the iterative operation, or the scaled vector X_out after the scaling operation;
The inputs of the third operator are scale factor K and scale factor k; its output is the scale factor K after the iterative operation;
The inputs of the fourth operator are the cumulative angle Z and the rotation angle θ; its output is the cumulative angle Z after the iterative operation.
Preferably, when the operation unit performs the scaling operation:
the floating-point multiplier in the first operator scales vector Y by the scale factor K to obtain the scaled vector Y_out;
the floating-point multiplier in the second operator scales vector X by the scale factor K to obtain the scaled vector X_out.
Preferably, the computing device can calculate eight functions: sine (sin), cosine (cos), hyperbolic sine (sinh), hyperbolic cosine (cosh), arctangent (artan), inverse hyperbolic tangent (artanh), natural exponential (exp) and natural logarithm (ln).
In another aspect, the invention also provides a processor comprising at least an instruction memory, a decoding module and an execution unit; the decoding module decodes instructions from the instruction memory to obtain the instruction control signal;
the execution unit includes the serial floating-point transcendental function computing device described above, which performs the floating-point operation of the corresponding transcendental function based on the instruction control signal.
Compared with the prior art, the scheme has the following beneficial effects:
Based on the idea of the HCORDIC algorithm, the hardware design reuses the same floating-point multipliers for the floating-point multiplications in the iterative operation and for the scaling operation of the transcendental functions, which reduces chip area and resource consumption.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings which are required in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a computational flow diagram of an embodiment of the present invention;
FIG. 2 is a block diagram of the peripheral I/O interface of a floating-point transcendental function computing device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the circuit framework of a floating-point transcendental function computing device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structural connection between the operation unit and the output selection unit in a floating-point transcendental function computing device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The following describes a specific implementation of the present solution in conjunction with fig. 1-4.
The scheme provides a floating-point transcendental function computing device. The transcendental functions it computes correspond to eight instructions: sine (sin), cosine (cos), hyperbolic sine (sinh), hyperbolic cosine (cosh), arctangent (artan), inverse hyperbolic tangent (artanh), natural exponential (exp) and natural logarithm (ln). The input data and output results of these instructions are all floating-point numbers in the IEEE-754 standard. The instruction control signals for the eight function instructions are one-hot encoded.
FIG. 1 shows the flow for computing a transcendental function: floating-point data is input and initialized, with the iteration count i set to 0; an iterative operation is performed and the current iteration is tested for convergence; if it has not converged, i is incremented by 1 and the iterative operation continues; if it has converged, the iteration result is scaled, and the output is selected once scaling is finished.
FIG. 2 shows the peripheral interface of the floating-point transcendental function computing device: the device receives an instruction control signal and floating-point input data, and outputs floating-point output data, thereby completing the floating-point calculation of the transcendental function.
The device encodes the eight function instructions listed above with a one-hot code, as shown in Table 1.
In this embodiment, the floating-point transcendental function computing device is preferably designed on the basis of the HCORDIC (High-radix adaptive CORDIC) algorithm, and all variables are preferably floating-point data in the IEEE-754 standard.
Referring to FIG. 3, the floating-point transcendental function computing device of this embodiment includes an initialization unit, a storage unit, a control unit, an operation unit and an output selection unit.
The initialization unit receives the input data and the instruction control signal input to the computing device from outside, and initializes the input data according to the instruction control signal. Preferably, the initialization depends on the function being computed, as explained later with Table 2. The initialization unit is connected to the control unit and outputs the initialized data to it. As shown in Table 2, initialization for the eight transcendental functions mainly covers four variables: vector X, vector Y, cumulative angle Z and scale factor K.
The control unit is connected to the initialization unit, the operation unit and the storage unit. It receives the initialized data from the initialization unit and the iterative operation data fed back by the operation unit and, based on these data and the parameters from the storage unit, selects the data to be input to the operation unit. It also judges whether the iterative operation data fed back by the operation unit has reached the convergence condition, and outputs a convergence signal to the operation unit accordingly.
The storage unit is connected to the control unit. According to the address signal sent by the control unit, the pre-stored values of the parameter factors are read from the storage unit and sent to the control unit. The parameter factors include the rotation factor δ, the rotation angle θ, the scale factor K and the scale factor k.
Based on the data and the convergence signal sent by the control unit, the operation unit determines whether to perform an iterative operation or a scaling operation. If the iteration has converged (the convergence signal is true), it scales the data and sends the result of the scaling operation to the output selection unit; if not (the convergence signal is false), it performs an iterative operation on the data and feeds the result back to the control unit.
The structure and connection of the operation unit and the output selection unit are shown in FIG. 4. The operation unit is built from several floating-point calculations working in parallel: a first operator composed of a floating-point multiplier and a floating-point adder, a second operator composed of a floating-point multiplier and a floating-point subtractor, a third operator composed of a floating-point multiplier, and a fourth operator composed of a floating-point adder.
Specifically, the operation unit receives the convergence signal sent by the control unit. The convergence signal is an input of the multiplexer (the first multiplexer) placed before the first and second operators, which selects between δ and K based on it. If the convergence signal is false, the multiplexer selects δ as its output and the operation unit performs an iterative operation on the vector X, vector Y, cumulative angle Z, δ, θ, K and k sent by the control unit, according to the iteration formulas of the HCORDIC algorithm. If the convergence signal is true, the multiplexer selects K as its output and the operation unit scales the vector X, vector Y and cumulative angle Z sent by the control unit by K, according to the scaling formulas of the HCORDIC algorithm.
During the iterative operation, the first operator performs floating-point multiplication and addition on X, Y and δ to complete the iteration of X; the second operator performs floating-point multiplication and subtraction on X, Y and δ to complete the iteration of Y; the third operator multiplies K by k to complete the iteration of K; and the fourth operator adds Z and θ to complete the iteration of Z. Here X and Y denote the vector in the CORDIC algorithm, Z the cumulative angle, θ the rotation angle, K and k the scale factors, and δ the rotation factor. The iteration outputs of all four operators are fed back to the control unit.
During the scaling operation, the floating-point multiplier in the first operator multiplies Y by K to complete the scaling of Y and outputs the result to the output selection unit; the floating-point multiplier in the second operator multiplies X by K to complete the scaling of X and outputs the result to the output selection unit. Z needs no separate scaling: the Z from the iterative feedback data is output directly to the output selection unit, in step with X and Y. The floating-point operators used for scaling are therefore shared entirely with the iterative operation, which addresses the hardware resource consumption problem; this design is not found in existing floating-point computing devices.
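The sharing of one multiplier per operator between iteration and scaling can be sketched in software. The sketch below is illustrative only: the function names are hypothetical, and it assumes that any sign or mode factor m is already folded into the value of δ, which the text does not state.

```python
def first_operator(x, y, delta, big_k, converged):
    """Model of the first operator: one multiplier plus one adder.

    Assumption: the mode factor m and any sign are folded into delta.
    """
    sel = big_k if converged else delta   # output of the first multiplexer
    product = y * sel                     # the single shared multiplier
    # Scaling: the product is Y_out. Iteration: the adder forms the next X.
    return product if converged else x + product


def second_operator(x, y, delta, big_k, converged):
    """Model of the second operator: one multiplier plus one subtractor."""
    sel = big_k if converged else delta   # same multiplexer output
    product = x * sel                     # the single shared multiplier
    # Scaling: the product is X_out. Iteration: the subtractor forms the next Y.
    return product if converged else y - product
```

In both functions the multiplication appears once, which is the point of the design: the convergence signal only changes which operand reaches the multiplier and whether the adder or subtractor result is used.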
The output selection unit selects the corresponding output according to the received instruction control signal to obtain the final operation result; different instruction control signals correspond to different functions.
When the output selection unit outputs the result, the calculation of the current floating-point transcendental function is finished; a new instruction control signal and new input data can then be received, and the next floating-point transcendental function is calculated.
The iterative operation and the scaling operation must be executed serially: the scaling operation depends on the convergence signal and can be performed only when that signal is true. Scaling also cannot run at the same time as iteration, so the computing device processes only one transcendental function at a time; it cannot scale one transcendental function while iterating on another.
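The serial constraint can be pictured as a small state machine. This is a behavioral sketch under assumed names (Phase, SerialSequencer), not a description of the actual control logic.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()      # no function in flight, new input may be accepted
    ITERATE = auto()   # iterative operation in progress
    SCALE = auto()     # scaling operation in progress


class SerialSequencer:
    """Hypothetical model: one transcendental function at a time."""

    def __init__(self):
        self.phase = Phase.IDLE

    def start(self):
        """Accept a new instruction only when no function is in flight."""
        if self.phase is not Phase.IDLE:
            return False
        self.phase = Phase.ITERATE
        return True

    def step(self, converged):
        """Advance one step; scaling starts only after convergence."""
        if self.phase is Phase.ITERATE and converged:
            self.phase = Phase.SCALE
        elif self.phase is Phase.SCALE:
            self.phase = Phase.IDLE   # result handed to output selection
        return self.phase
```

A second call to start() while a function is iterating or scaling is refused, which mirrors the statement that the device cannot iterate on one function while scaling another.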
Next, the floating-point operation procedure of the floating-point transcendental function computing device is described in detail. For every transcendental function, the floating-point operation consists of two processes: an iterative operation process and a scaling process.
(1) Iterative process
The iterative process calculates the transcendental function by moving the vector (X, Y) along a geometric locus determined by the mode m ∈ {1, 0, -1}, where m = 1, 0 and -1 correspond to the circular, linear and hyperbolic coordinate systems, respectively. The variable Z denotes the cumulative angle, θ the rotation angle, K and k the scale factors, and δ the rotation factor. With reference to FIG. 4, at the i-th iteration the operation unit updates the values of X, Y and Z and the scale factor K as follows:
(1)
(2)
(3)
(4)
(5)
in this embodiment, all variables in the algorithm use IEEE-754 floating point format, so that input and output of floating point numbers can be fully supported from data types.
The HCORDIC algorithm also involves two operating modes: vector mode and rotation mode. The update formulas are written in terms of the exponent parts and the mantissas of the variables. In vector mode and rotation mode, the variables δ and θ are updated as follows:
Vector mode:
(6)
(7)
Rotation mode:
(8)
(9)
(2) Scaling process
The scaling process is performed after the result of the iterative calculation satisfies the convergence condition. Assuming the convergence condition is satisfied after r iterations, the control unit passes the variables X_r, Y_r and Z_r at that point on for scaling, and the scaled values are output. The scaling is calculated as follows:
(10)
(11)
(12)
In a more preferred embodiment, the instruction control signals are one-hot encoded; for example, the encodings may be assigned as shown in Table 1. In this embodiment, the one-hot code is an 8-bit binary value, one bit for each of the eight common transcendental functions. Those skilled in the art can of course adjust the encoding of the instruction control signal while still applying the floating-point operation of the computing device proposed here; such conventional modifications should be regarded as falling within the protection scope of the present application.
Table 1. One-hot encoded instruction control signals
As shown in Table 1, a corresponding instruction control signal is assigned to each transcendental function, so as to control data selection and other processes in the floating-point transcendental function computing device and run the corresponding function.
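A one-hot instruction code of this kind can be sketched as follows. Only the code for sin ("00000001") is stated in the text; the bit positions assumed here for the other seven functions simply follow the order in which the functions are listed and may differ from Table 1. The names FUNCTIONS, one_hot and decode are hypothetical.

```python
# Assumed bit order: only sin -> bit 0 is confirmed by the text.
FUNCTIONS = ["sin", "cos", "sinh", "cosh", "artan", "artanh", "exp", "ln"]


def one_hot(name):
    """Return the 8-bit one-hot instruction control signal for a function."""
    return 1 << FUNCTIONS.index(name)


def decode(signal):
    """Map a one-hot signal back to its function, rejecting invalid codes."""
    is_one_hot = signal > 0 and signal & (signal - 1) == 0
    if not is_one_hot or signal >= 1 << len(FUNCTIONS):
        raise ValueError("not a valid one-hot instruction code")
    return FUNCTIONS[signal.bit_length() - 1]
```

For instance, format(one_hot("sin"), "08b") gives "00000001". Because exactly one bit is set, each function can enable its own selection logic without a binary decoder.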
The initialization and the convergence test can likewise be set per transcendental function. For the operating mode, initial values, convergence condition and output selection of the eight transcendental functions, the settings shown in Table 2 are preferred:
Table 2. Operating modes and convergence conditions of the eight functions in the HCORDIC algorithm
The operation is illustrated with one function. To calculate sin(a), the instruction control signal is "00000001" and a is the input floating-point operand. According to Table 2, the sin function works in the circular coordinate system (mode m = 1) and in rotation mode, and initialization gives X_0 = 1, Y_0 = 0, Z_0 = a and K_0 = 1. Because the operating mode is rotation mode, the control unit fetches the values of δ and θ from the storage unit according to formulas (8) and (9) and uses them as inputs to the iteration formulas (1)-(4), completing one iterative operation. The result is returned to the control unit, which tests whether the quantity named in the convergence condition is equal to or approaches 0. If so, the control unit sets the convergence signal to true and sends it, together with the iteration result, to the operation unit; the operation unit executes formulas (10)-(12) and sends the result to the output selection unit, which, under control of the instruction control signal, outputs Y_out as the final result, namely the value of sin(a). If the iteration result does not meet the convergence condition (the convergence signal is false), it is sent back to the operation unit for the next iterative operation, followed by another convergence test; this loop repeats until the convergence condition is met.
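The sin(a) walk-through can be mimicked in software with the textbook radix-2 CORDIC as a stand-in. This is not the HCORDIC algorithm of the patent: the choices δ_i = ±2^-i, θ_i = ∓atan(2^-i) and k_i = 1/sqrt(1 + δ_i²) are the classic CORDIC values, used here as assumptions only to show the initialize, iterate, test and scale sequence. The function name, tolerance and iteration limit are hypothetical, and the classic scheme converges only for roughly |a| ≤ 1.74.

```python
import math


def cordic_sin(a, max_iter=60, tol=1e-12):
    """Rotation-mode sketch: X0 = 1, Y0 = 0, Z0 = a, K0 = 1."""
    x, y, z, big_k = 1.0, 0.0, a, 1.0
    for i in range(max_iter):
        if abs(z) < tol:                      # convergence test on Z
            break
        d = 1.0 if z >= 0.0 else -1.0         # rotate toward Z = 0
        delta = d * 2.0 ** -i                 # rotation factor (classic choice)
        theta = -d * math.atan(2.0 ** -i)     # rotation angle, added to Z
        k = 1.0 / math.sqrt(1.0 + delta * delta)
        x, y = x - delta * y, y + delta * x   # iterate X and Y
        z = z + theta                         # iterate Z
        big_k = big_k * k                     # iterate K
    return big_k * y                          # scaling: Y_out = K * Y
```

Because K is accumulated step by step, the final scaling stays correct even when the loop stops early, which is why scaling can only follow convergence.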
The initialization unit is responsible for executing the corresponding initialization process in the table 2 on the input data of the floating point override function computing device input from the outside according to the instruction control signal.
The control unit receives iterative feedback data of each variable obtained by the operation unit and initialized data. In the device, only one override function calculation is completed, new override function data can be received, so that the initialized data in the control unit and the data fed back by the iterative operation of the operation unit do not exist simultaneously, only the initialized data sent by the initialized unit in the control unit are fed back and parameter factor values extracted from the storage unit are fed back after the iterative calculation is started in the control unit when the initial calculation is performed, and no iterative feedback data is available.
The memory unit stores possible values of the factors delta, theta and k, in particular, in different coordinate systems and in different operation modes, the values of delta, theta and k all vary with the variation of the iteration process variable, and in a more preferred embodiment, the values of delta, theta and k can be calculated in advance according to formulas (5) - (9) and then stored in the memory unit for the operation unit to call.
For the pre-stored possible values of δ, θ and k, which may be calculated in advance using an exhaustive method, for example, in equation (6), the value of the twiddle factor δ i varies according to the variation of the value of the vector X i、Yi, assuming that at the ith iteration, |y i|<|Xi |, and pattern m is not equal to 0, the value of δ i is calculatedThese vectors X i、Yi are all compliant with the floating point data type based on the IEEE-754 standard, having 1 bit in sign bit, 8 bits in exponent field, 23 bits in mantissa field, and a total of 32 bits wide. If the possible values of all vectors X i、Yi are calculated in advance, then the direct operation is carried out "By the way, the value of delta i can be obtained directly, but in the practical hardware design, the data of X i、Yi with 32 bits width cannot be obtained in an exhaustive way, and the data volume is large. Therefore, we discard this 32-bit wide data using only the upper 4 bits of its 23-bit mantissa field, the lower 19 bits, using only the exponent difference "/>The lower 4 bits and the upper 4 bits of the middle 8-bit exponent field are discarded, and at this time, the mantissa is only 16 possible, because the mantissa bit width is only 4 bits, the exponent is only 16 possible, because the exponent bit width is only 4 bits, then "/>The "result is only 16×16=256 possible results, the 256 possible results are exhausted, all the 256 possible results are calculated in advance and stored in the storage unit, and then, in the operation process, the value of the twiddle factor delta i can be found in the storage unit quickly only according to the first 4 bits of the mantissa field and the last 4 bits of the exponent field.
Similarly, the other factors θ and k may be calculated in the same way.
The truncation above is illustrated with 4 bits, but it extends to other bit widths on the same principle; for example, the mantissa field may keep only its upper 8 bits and discard the lower 15 bits.
The above is only one preferred embodiment for pre-storing the possible values of δ, θ and k in the storage unit. Of course, all possible values of δ, θ and k may also be enumerated and stored directly. Alternatively, the possible values of the three parameters may be calculated using a different truncation scheme based on the above description and pre-stored to facilitate the subsequent floating point operation.
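The truncated lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`float32_fields`, `lut_address`, `build_table`) are hypothetical, the address layout (exponent bits above mantissa bits) is an arbitrary choice, and `placeholder_factor` merely stands in for the real table entries, which would follow formulas (5)-(9) and are not reproduced here.

```python
import struct

MANT_BITS = 23  # width of the IEEE-754 single-precision mantissa field


def float32_fields(value):
    """Return (sign, exponent field, mantissa field) of value encoded as binary32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> MANT_BITS) & 0xFF, bits & 0x7FFFFF


def lut_address(exp_value, mant_field, keep_exp=4, keep_mant=4):
    """Build a table address from the low exponent bits and the high mantissa bits."""
    exp_low = exp_value & ((1 << keep_exp) - 1)        # drop the upper exponent bits
    mant_high = mant_field >> (MANT_BITS - keep_mant)  # drop the lower mantissa bits
    return (exp_low << keep_mant) | mant_high


def build_table(factor_fn, keep_exp=4, keep_mant=4):
    """Enumerate every truncated (exponent, mantissa) pair and precompute the factor."""
    mant_mask = (1 << keep_mant) - 1
    return [factor_fn(addr >> keep_mant, addr & mant_mask)
            for addr in range(1 << (keep_exp + keep_mant))]


def placeholder_factor(exp_low, mant_high):
    """Hypothetical stand-in; the real entries would follow formulas (5)-(9)."""
    return (1.0 + mant_high / 16.0) * 2.0 ** -exp_low


table = build_table(placeholder_factor)          # 16 x 16 = 256 entries
_, exp_field, mant_field = float32_fields(1.5)   # exponent field 127, mantissa 0x400000
entry = table[lut_address(exp_field, mant_field)]
```

Passing `keep_mant=8` to both functions gives the 8-bit mantissa variant mentioned above, at the cost of a 4096-entry table instead of 256 entries.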
The control unit is responsible for evaluating the constraint conditions set in formulas (5)-(9), fetching the corresponding δ, θ and k values from the storage unit by address, and outputting them to the operation unit, together with the X_i, Y_i and Z_i output by the input selection unit, for iterative operation. Meanwhile, the control unit is also responsible for detecting whether the current X_i, Y_i and Z_i meet the convergence condition in Table 2; if the convergence condition is met, it outputs the convergence signal as true and outputs the variables to the operation unit.
When the convergence signal is false, the operation unit performs the corresponding floating point operation on the input variables according to formulas (1)-(4) and outputs the calculated result back to the control unit, which judges whether the variables have converged.
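For orientation only, the conventional unified CORDIC recurrences from the literature are written out below. They are not taken from this document: formulas (1)-(4) are not reproduced in this section and may differ in sign convention, in how the direction is folded into δ and θ, and in how δ is derived from the floating point fields. The direction symbol d_i and the angle α_i are notation assumed for this sketch.

```latex
\begin{aligned}
x_{i+1} &= x_i - m\, d_i\, 2^{-i}\, y_i \\
y_{i+1} &= y_i + d_i\, 2^{-i}\, x_i \\
z_{i+1} &= z_i - d_i\, \alpha_i ,
\qquad
\alpha_i =
\begin{cases}
\arctan 2^{-i}, & m = 1 \ (\text{circular}) \\
2^{-i}, & m = 0 \ (\text{linear}) \\
\operatorname{artanh} 2^{-i}, & m = -1 \ (\text{hyperbolic})
\end{cases}
\end{aligned}
```

Here d_i ∈ {+1, −1} is the rotation direction chosen at each step, and m ∈ {1, 0, −1} selects the coordinate system, matching the mode variable m used in this description.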
When the convergence signal is true, the operation unit selects K_i as one of the inputs of the floating-point multipliers in the first and second operators, performs the scaling operation of formulas (10)-(12) on the variables X_i, Y_i and Z_i that have reached the convergence condition using the scaling factor K_i, and outputs the result to the output selection unit.
The output selection unit selects the corresponding outputs among X_out, Y_out and Z_out for the different functions in Table 2 according to the instruction control signals of Table 1.
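The iterate-then-scale sequence described above can be sketched in software. The sketch below is a minimal illustration, not the patented datapath: it assumes a textbook circular-mode CORDIC rotation with δ = 2^-i in place of formulas (1)-(12), computes the factors on the fly instead of reading them from the storage unit, and uses hypothetical function names and hypothetical `"cos"`/`"sin"` instruction labels. What it does show is the serial structure: the same multiplier input is switched from δ to K once the convergence signal is true.

```python
import math


def multiplier_input(converged, delta, big_k):
    """First multiplexer: delta while iterating, the scaling factor K once converged."""
    return big_k if converged else delta


def cordic_rotate(angle, max_iter=40, eps=1e-9):
    """Rotate (1, 0) by angle (circular mode), then scale by the accumulated K."""
    x, y, z, big_k = 1.0, 0.0, angle, 1.0       # initialization unit
    for i in range(max_iter):
        if abs(z) < eps:                        # control unit: convergence check
            break
        delta = 2.0 ** -i                       # assumed rotation factor
        theta = math.atan(delta)                # rotation angle for this step
        small_k = 1.0 / math.sqrt(1.0 + delta * delta)  # per-step scale factor k
        d = 1.0 if z >= 0.0 else -1.0           # rotation direction
        factor = d * multiplier_input(False, delta, big_k)
        x, y = x - factor * y, y + factor * x   # first and second operators
        big_k *= small_k                        # third operator: K accumulates k
        z -= d * theta                          # fourth operator
    # Converged, or iteration budget exhausted: reuse the multipliers for scaling.
    factor = multiplier_input(True, 0.0, big_k)
    return factor * x, factor * y, z


def select_output(op, x_out, y_out, z_out):
    """Output selection unit; the op labels are hypothetical, not those of Table 1."""
    return {"cos": x_out, "sin": y_out, "residual": z_out}[op]


x_out, y_out, z_out = cordic_rotate(0.5)
cos_value = select_output("cos", x_out, y_out, z_out)
sin_value = select_output("sin", x_out, y_out, z_out)
```

Because K is updated in the same loop as X, Y and Z, it always matches the number of iterations actually performed, so stopping early on convergence still yields a correctly scaled result.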
In addition, the invention also provides a processor, which comprises at least an instruction memory, a decoding module and an execution unit; the decoding module decodes the instruction from the instruction memory to obtain an instruction control signal;
the execution unit includes the serial floating point transcendental function computing device described above, which performs the floating point operation of the corresponding transcendental function based on the instruction control signal.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. An area-optimized serial floating point transcendental function computing device, the device comprising:
The device comprises an initialization unit, a control unit, a storage unit, an operation unit and an output selection unit;
the initialization unit receives floating point input data and instruction control signals which are externally input to the computing device, outputs initialized data and sends the initialized data to the control unit;
The control unit is connected with the storage unit and the operation unit; the control unit selectively inputs the parameter factors acquired from the storage unit, the data output by the initialization unit and the iteration data output by the operation unit into the operation unit based on the constraint conditions of the transcendental function; meanwhile, the control unit also judges whether the current iteration has converged and, if the convergence condition is met, outputs a convergence signal to the operation unit; the parameter factors comprise a rotation factor δ, a rotation angle θ, a scaling factor K and a scale factor k;
the operation unit is used for carrying out floating point iterative operation and outputting iterative feedback data; and performing floating point scaling operation according to the convergence signal, and outputting the scaled data to an output selection unit;
The output selection unit selects the scaled data sent by the operation unit based on the instruction control signal, and outputs an operation result.
2. The computing device of claim 1, wherein the operation unit comprises: a first operator composed of a floating-point multiplier and a floating-point adder, a second operator composed of a floating-point multiplier and a floating-point subtractor, a third operator composed of a floating-point multiplier, and a fourth operator composed of a floating-point adder;
the iterative operation and the scaling operation are in serial relation, and the scaling operation is started only after the iterative operation result meets the convergence condition, otherwise, the scaling operation is not started.
3. The computing device according to claim 2, wherein in the operation unit, a first multiplexer is provided before the first and second operators; the first multiplexer receives the convergence signal, the rotation factor δ and the scaling factor K, and sends the selected data to the first and second operators, respectively;
If the convergence signal is false, the first multiplexer selects the rotation factor delta as output, and the operation unit performs iterative operation; if the convergence signal is true, the first multiplexer selects the scaling factor K as the output, and the operation unit performs scaling operation.
4. The computing device of claim 3, wherein in the operation unit, the inputs of the first operator are the vector X, the vector Y and the data selected and output by the first multiplexer; its output is the vector X after the iterative operation or the vector Y_out after the scaling operation;
the inputs of the second operator are the vector X, the vector Y and the data selected and output by the first multiplexer; its output is the vector Y after the iterative operation or the vector X_out after the scaling operation;
the inputs of the third operator are the scaling factor K and the scale factor k; its output is the scaling factor K after the iterative operation;
the inputs of the fourth operator are the accumulated angle Z and the rotation angle θ; its output is the accumulated angle Z after the iterative operation.
5. The computing device according to claim 1, wherein the iterative operation in the operation unit is performed as follows:
[Formula images not reproduced.]
wherein X_i represents the result of the i-th iteration of the vector X, Y_i represents the result of the i-th iteration of the vector Y, Z_i represents the result of the i-th iteration of the accumulated angle, θ_i represents the result of the i-th iteration of the rotation angle, the variables K and k are scale factors, K_i and k_i represent the results of the i-th iteration of the scale factors, δ_i represents the result of the i-th iteration of the rotation factor, and m ∈ {1, 0, −1}.
6. The computing device of claim 1, wherein the scaling operation is performed in the operation unit as follows:
[Formula images not reproduced.]
wherein X_r, Y_r and Z_r represent the variables, output by the control unit, that satisfy the convergence condition after r iterations, and X_out, Y_out and Z_out represent the output data of the scaling unit.
7. The computing device of claim 1, wherein the initialization unit performs initialization of variables corresponding to transcendental function operations according to instruction control signals; the variables include vector X, vector Y, cumulative angle Z, and scale factor K.
8. The computing device of claim 5, wherein the variables [symbols not reproduced] are updated as follows:
in the vector mode: [formula image not reproduced];
in the rotation mode: [formula image not reproduced];
wherein [symbols not reproduced] represent the exponent parts of the variables X, Y and Z, [symbols not reproduced] represent the mantissas of the variables X, Y and Z, and i represents the number of iterations.
9. The computing device according to claim 2, wherein, when the operation unit performs the scaling operation:
the floating-point multiplier in the first operator performs the scaling operation based on the vector Y and the scaling factor K to obtain a scaled vector Y_out;
the floating-point multiplier in the second operator performs the scaling operation based on the vector X and the scaling factor K to obtain a scaled vector X_out.
10. A processor, wherein the processor comprises at least an instruction memory, a decoding module and an execution unit; the decoding module decodes the instruction from the instruction memory to obtain an instruction control signal;
the execution unit comprises the serial floating point transcendental function computing device according to any one of claims 1-9, which executes the floating point operation of the corresponding transcendental function based on the instruction control signal.
CN202410510689.6A 2024-04-26 2024-04-26 Area-optimized serial floating point override function computing device and processor Active CN118092854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410510689.6A CN118092854B (en) 2024-04-26 2024-04-26 Area-optimized serial floating point override function computing device and processor

Publications (2)

Publication Number Publication Date
CN118092854A true CN118092854A (en) 2024-05-28
CN118092854B CN118092854B (en) 2024-07-19

Family

ID=91155247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410510689.6A Active CN118092854B (en) 2024-04-26 2024-04-26 Area-optimized serial floating point override function computing device and processor

Country Status (1)

Country Link
CN (1) CN118092854B (en)

Also Published As

Publication number Publication date
CN118092854B (en) 2024-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant