CN118092854A

CN118092854A - Area-optimized serial floating-point transcendental function computing device and processor

Info

Publication number: CN118092854A
Application number: CN202410510689.6A
Authority: CN
Inventors: 覃博琛; 黄志洪; 蔡刚; 魏育成
Original assignee: Ehiway Microelectronic Science And Technology Suzhou Co ltd
Current assignee: Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority date: 2024-04-26
Filing date: 2024-04-26
Publication date: 2024-05-28
Anticipated expiration: 2044-04-26
Also published as: CN118092854B

Abstract

The invention provides an area-optimized serial floating-point transcendental function computing device and a processor. The device comprises an initialization unit, a control unit, a storage unit, an operation unit and an output selection unit. The initialization unit receives floating-point input data and instruction control signals supplied to the computing device from outside, and sends the initialized data to the control unit. The control unit is connected with the storage unit and the operation unit; based on the constraint conditions of the transcendental function, it selects which data to input to the operation unit, and it also judges whether the current iteration has converged. The operation unit performs the floating-point iterative operation and outputs iterative feedback data; in response to the convergence signal, it performs the floating-point scaling operation and outputs the scaled data to the output selection unit. The output selection unit outputs the operation result based on the instruction control signal. The scheme multiplexes hardware resources, thereby reducing chip area and resource consumption.

Description

Area-optimized serial floating-point transcendental function computing device and processor

Technical Field

The invention relates to the technical field of chip design and floating-point operation devices, and in particular to a floating-point operation device and a processor supporting transcendental functions such as trigonometric, hyperbolic, exponential and logarithmic functions.

Background

Transcendental functions are functions in which the relationship between the variables cannot be expressed by a finite number of additions, subtractions, multiplications, divisions, exponentiations and root extractions; examples include trigonometric, hyperbolic, exponential and logarithmic functions. Transcendental functions are widely used in image processing, radar remote sensing, signal processing and related fields, and the development of these fields places higher demands on transcendental function computing devices. A high-performance, low-latency, high-precision transcendental function calculator may be designed for aviation and military applications, whereas a low-power, small-area design may be adopted for embedded applications.

The CORDIC (Coordinate Rotation Digital Computer) algorithm is currently the most commonly used method for computing transcendental functions in hardware. Traditional transcendental function processing devices based on the CORDIC algorithm mostly use fixed-point data types for input and output data. Compared with IEEE-754 floating-point data, fixed-point data has a small range, low precision and poor compatibility with high-level programs such as C programs, but fixed-point processing is simpler, easier to implement in hardware and occupies less area. IEEE-754 floating-point data has a wide range and high precision and can be declared directly in a C program, but it brings complex data paths and a large area, so the choice must be weighed against the specific application.

The variables in the HCORDIC (High-radix adaptive CORDIC) algorithm use the IEEE-754 floating-point data format, so the algorithm allows floating-point transcendental functions to be computed in hardware.

Disclosure of Invention

To address the large hardware area and high resource consumption of transcendental function computing devices that process floating-point data, the invention provides a floating-point transcendental function computing device and a processor, optimized for area so as to better meet the overall area requirements of existing chips. Specifically, the invention provides the following technical scheme:

In one aspect, the present invention provides an area-optimized serial floating-point transcendental function computing device, the device comprising:

an initialization unit, a control unit, a storage unit, an operation unit and an output selection unit;

the initialization unit receives floating-point input data and instruction control signals supplied to the computing device from outside, and sends the initialized data to the control unit;

the control unit is connected with the storage unit and the operation unit; based on the constraint conditions of the transcendental function, the control unit selectively inputs to the operation unit the parameter factors obtained from the storage unit, the data output by the initialization unit and the iteration data output by the operation unit; the control unit also judges whether the current iteration has converged and, if the convergence condition is met, outputs a convergence signal to the operation unit; the parameter factors comprise a rotation factor δ, a rotation angle θ, a scale factor K and a scale factor k;

the operation unit performs the floating-point iterative operation and outputs iterative feedback data, and, in response to the convergence signal, performs the floating-point scaling operation and outputs the scaled data to the output selection unit;

the output selection unit selects, based on the instruction control signal, among the scaled data sent by the operation unit, and outputs the operation result.

Preferably, the storage unit stores the possible values of the parameter factors.

Preferably, the operation unit includes: a first operator composed of a floating-point multiplier and a floating-point adder, a second operator composed of a floating-point multiplier and a floating-point subtractor, a third operator composed of a floating-point multiplier, and a fourth operator composed of a floating-point adder;

The iterative operation and the scaling operation are performed serially: the scaling operation starts only after the result of the iterative operation meets the convergence condition.

The iterative operation and the scaling operation are both performed in an operation unit.

Preferably, the iterative operation mode in the operation unit is as follows:

Wherein X_i represents the result of the i-th iteration of the vector X, Y_i represents the result of the i-th iteration of the vector Y, Z_i represents the result of the i-th iteration of the cumulative angle, θ_i represents the result of the i-th iteration of the rotation angle, the variables K and k are scale factors, K_i and k_i represent the results of the i-th iteration of the scale factors, δ_i represents the result of the i-th iteration of the rotation factor, and m ∈ {1, 0, -1}.

Preferably, the variables δ_i and θ_i are updated as follows:

The update in vector mode is as follows:

The update in rotation mode is as follows:

Wherein the formulas use the exponent part and the mantissa of each of the variables X, Y and Z, and i denotes the iteration number.

Preferably, the scaling operation is performed in the operation unit in the following manner:

Where X_r, Y_r and Z_r represent the variables that satisfy the convergence condition after r iterations, as output by the control unit, and X_out, Y_out and Z_out represent the output data of the scaling operation.

Preferably, the initialization unit initializes the variables corresponding to the transcendental function operation according to the instruction control signal; the variables include the vector X, the vector Y, the cumulative angle Z and the scale factor K.

Preferably, in the operation unit, a first multiplexer is disposed before the first and second operators; the first multiplexer receives the convergence signal, the rotation factor δ and the scale factor K, and sends the selected data to both the first and second operators;

If the convergence signal is false, the first multiplexer selects the rotation factor δ as its output and the operation unit performs the iterative operation; if the convergence signal is true, the first multiplexer selects the scale factor K as its output and the operation unit performs the scaling operation.

Preferably, in the operation unit, the inputs of the first operator are the vector X, the vector Y and the data selected by the first multiplexer; its output is the vector X after the iterative operation or the vector Y_out after the scaling operation;

The inputs of the second operator are the vector X, the vector Y and the data selected by the first multiplexer; its output is the vector Y after the iterative operation or the vector X_out after the scaling operation;

The inputs of the third operator are the scale factor K and the scale factor k; its output is the scale factor K after the iterative operation;

The inputs of the fourth operator are the cumulative angle Z and the rotation angle θ; its output is the cumulative angle Z after the iterative operation.

Preferably, in the operation unit, when the scaling operation is performed:

The floating-point multiplier in the first operator performs the scaling operation on the vector Y and the scale factor K to obtain the scaled vector Y_out;

The floating-point multiplier in the second operator performs the scaling operation on the vector X and the scale factor K to obtain the scaled vector X_out.

Preferably, the computing device can calculate eight functions: the sine function (sin), cosine function (cos), hyperbolic sine function (sinh), hyperbolic cosine function (cosh), arctangent function (arctan), inverse hyperbolic tangent function (artanh), natural exponential function (exp) and natural logarithm function (ln).

On the other hand, the invention also provides a processor, which at least comprises an instruction memory, a decoding module and an execution unit; the decoding module decodes the instruction from the instruction memory to obtain an instruction control signal;

the execution unit includes the serial floating-point transcendental function computing device described above, which performs the floating-point operation of the corresponding transcendental function based on the instruction control signal.

Compared with the prior art, the scheme has the following beneficial effects:

Based on the idea of the HCORDIC algorithm, the hardware design reuses floating-point multiplication resources: the floating-point multipliers used in part of the iterative operation of the transcendental function are also used for the scaling operation, thereby reducing chip area and resource consumption.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings which are required in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a computational flow diagram of an embodiment of the present invention;

FIG. 2 is a block diagram of the peripheral I/O interface of a floating-point transcendental function computing device according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the circuit framework of a floating-point transcendental function computing device according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the structural connection between the operation unit and the output selection unit in a floating-point transcendental function computing device according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings; the embodiments described are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.

In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The following describes a specific implementation of the present solution in conjunction with fig. 1-4.

The scheme provides a floating-point transcendental function computing device. The transcendental functions it computes correspond to eight instructions: the sine function (sin), cosine function (cos), hyperbolic sine function (sinh), hyperbolic cosine function (cosh), arctangent function (arctan), inverse hyperbolic tangent function (artanh), natural exponential function (exp) and natural logarithm function (ln). The input data and output results of these instructions are all floating-point numbers in the IEEE-754 standard. The instruction control signals for the eight function instructions are generated using one-hot encoding.

FIG. 1 shows the flow of computing a transcendental function according to the invention: floating-point data is input; with i denoting the iteration count, initialization is performed and i is set to 0; an iterative operation is performed and the current iteration is checked for convergence; if it has not converged, i is incremented by 1 and the iterative operation continues; if it has converged, the scaling operation is applied to the iteration result, and the output is selected once scaling is complete.

FIG. 2 shows the external interface of the floating-point transcendental function computing device of the invention: the device receives an instruction control signal and floating-point input data, and outputs floating-point output data, completing the floating-point calculation of the transcendental function.

The device can calculate the eight function instructions sine (sin), cosine (cos), hyperbolic sine (sinh), hyperbolic cosine (cosh), arctangent (arctan), inverse hyperbolic tangent (artanh), natural exponential (exp) and natural logarithm (ln), and encodes the eight instructions with a one-hot code, as shown in Table 1.

In this embodiment, the floating-point transcendental function computing device is preferably designed based on the idea of the HCORDIC (High-radix adaptive CORDIC) algorithm, in which the variables are preferably all floating-point data in the IEEE-754 standard.

Referring to FIG. 3, the floating-point transcendental function computing device of this embodiment includes an initialization unit, a storage unit, a control unit, an operation unit and an output selection unit.

The initialization unit receives the input data and instruction control signals supplied to the computing device from outside, and initializes the input data according to the instruction control signals. Preferably, the initialization depends on the function being computed, as explained later in connection with Table 2. The initialization unit is connected with the control unit and outputs the initialized data to it. As shown in Table 2, the initialization for the eight transcendental functions mainly covers four variables: the vector X, the vector Y, the cumulative angle Z and the scale factor K.

The control unit is connected with the initialization unit, the operation unit and the storage unit. It receives the initialized data from the initialization unit and the iterative operation data fed back by the operation unit, and, based on these data and the parameters from the storage unit, selects the data to be input to the operation unit. It also judges whether the iterative operation data fed back by the operation unit has reached the convergence condition, and outputs a convergence signal to the operation unit accordingly.

The storage unit is connected with the control unit; according to the address signals sent by the control unit, the prestored values of the parameter factors are read from the storage unit and sent to the control unit. The parameter factors include the rotation factor δ, the rotation angle θ, the scale factor K and the scale factor k.

The operation unit determines, based on the data and the convergence signal sent by the control unit, whether to perform an iterative operation or a scaling operation. If the iteration has converged (i.e. the convergence signal is true), the scaling operation is performed on the data and its result is sent to the output selection unit; if it has not converged (i.e. the convergence signal is false), the iterative operation is performed on the data and its result is fed back to the control unit.

The structure and connection of the operation unit and the output selection unit are shown in fig. 4. The design of the arithmetic unit is realized by adopting a parallel mode of a plurality of floating point calculations, wherein the arithmetic unit comprises a first arithmetic unit formed by a floating point multiplier and a floating point adder, a second arithmetic unit formed by a floating point multiplier and a floating point subtracter, a third arithmetic unit formed by a floating point multiplier and a fourth arithmetic unit formed by a floating point adder.

The output selection unit selects the corresponding output according to the received instruction control signal to obtain the final operation result; different instruction control signals correspond to different functions.

Specifically, the operation unit receives the convergence signal sent by the control unit. The convergence signal is the select input of the multiplexer (i.e. the first multiplexer) shared by the first and second operators, which outputs either δ or K accordingly. If the convergence signal is false, the multiplexer selects δ as its output, and the operation unit performs the iterative operation on the vector X, the vector Y, the cumulative angle Z, δ, θ, K and k sent by the control unit, according to the iteration formulas of the HCORDIC algorithm. If the convergence signal is true, the multiplexer selects K as its output, and the operation unit performs the scaling operation on the vector X, the vector Y, the cumulative angle Z and K sent by the control unit, according to the scaling formulas of the HCORDIC algorithm.

During the iterative operation, the first operator performs floating-point multiplication and addition on X, Y and δ to complete the iteration of X; the second operator performs floating-point multiplication and subtraction on X, Y and δ to complete the iteration of Y; the third operator performs floating-point multiplication on K and k to complete the iteration of K; and the fourth operator performs floating-point addition on Z and θ to complete the iteration of Z. Here X and Y denote the vector in the CORDIC algorithm, Z denotes the cumulative angle, θ denotes the rotation angle, K and k are the scale factors and δ is the rotation factor. The outputs of the four operators for the iterative operation are fed back to the control unit.

During the scaling operation, the floating-point multiplier in the first operator multiplies Y by K to complete the scaling of Y and outputs the result to the output selection unit; the floating-point multiplier in the second operator multiplies X by K to complete the scaling of X and outputs the result to the output selection unit. Z needs no separate scaling: the Z in the iterative feedback data is output directly to the output selection unit, in step with X and Y. The floating-point operators used for scaling are thus entirely shared with the iterative operation, which effectively addresses the problem of hardware resource consumption; this design approach is not found in existing floating-point computing devices.
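The sharing of the two multipliers between iteration and scaling can be sketched in software as follows. This is a minimal Python model, not the circuit itself: the function name `operation_unit`, the sign convention of the updates (X + m·δ·Y and Y - δ·X) and the placement of the mode value m are assumptions chosen to match the operator description (multiply-add for X, multiply-subtract for Y), because the formulas themselves are not reproduced in this text.

```python
def operation_unit(X, Y, Z, K, delta, theta, k, m, converged):
    """Software model of one pass through the four operators (assumed form)."""
    # First multiplexer: the convergence signal picks the shared multiplier operand.
    sel = K if converged else delta

    # The same two multipliers serve both phases; only `sel` changes.
    prod_1 = sel * Y   # multiplier inside the first operator
    prod_2 = sel * X   # multiplier inside the second operator

    if converged:
        # Scaling: the products are already the scaled outputs; Z passes through.
        return {"X_out": prod_2, "Y_out": prod_1, "Z_out": Z}

    # Iteration: adder, subtractor, third multiplier and fourth adder.
    return {
        "X": X + m * prod_1,   # first operator: multiply, then add (sign and m assumed)
        "Y": Y - prod_2,       # second operator: multiply, then subtract
        "K": K * k,            # third operator
        "Z": Z + theta,        # fourth operator
    }
```

In this model the multiplier of the first operator always has Y as one operand, which is consistent with the first operator producing the scaled Y during scaling even though it produces the updated X during iteration.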

Once the output selection unit has output the result, the calculation of the current floating-point transcendental function is finished; a new instruction control signal and new input data can then be received, and the next floating-point transcendental function is calculated.

The iterative operation and the scaling operation must be executed serially: the scaling operation depends on the convergence signal and can be performed only when that signal is true. Because scaling cannot run at the same time as iteration, the computing device processes only one transcendental function at a time; it cannot scale one transcendental function while iterating another.

Next, the floating-point operation procedure of the floating-point transcendental function computing device is described in detail. For each of the transcendental functions, the floating-point operation comprises two processes: an iterative operation process and a scaling process.

(1) Iterative process

Illustratively, the iterative process computes the transcendental function by moving the vector (X, Y) along a geometric locus fixed by the mode m ∈ {1, 0, -1}, where m = 1, 0 and -1 correspond to the circular, linear and hyperbolic coordinate systems, respectively. The variable Z represents the cumulative angle, the variable θ represents the rotation angle, the variables K and k are scale factors, and the variable δ is the rotation factor. In the iterative calculation of the operation unit (see FIG. 4), at the i-th iteration the values of X, Y and Z and the scale factor K are updated as follows:

(1)

(2)

(3)

(4)

(5)
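Read together with the operator description (multiply-add for X, multiply-subtract for Y, a multiplication for K and an addition for Z), the updates have the following general shape. This is a hedged reconstruction, not the text of the original formulas: the signs and the position of m are assumptions, and the expression for k_i is the usual gain compensation for a rotation of this form rather than a formula confirmed by the source.

```latex
\begin{aligned}
X_{i+1} &= X_i + m\,\delta_i\,Y_i \\
Y_{i+1} &= Y_i - \delta_i\,X_i \\
Z_{i+1} &= Z_i + \theta_i \\
K_{i+1} &= K_i\,k_i \\
k_i &= \frac{1}{\sqrt{1 + m\,\delta_i^{2}}}
\end{aligned}
```

Under these assumptions each step multiplies the length of (X, Y) by \sqrt{1 + m\,\delta_i^{2}}, so the accumulated product K of the factors k_i is exactly the value needed to undo that gain in the scaling step.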

in this embodiment, all variables in the algorithm use IEEE-754 floating point format, so that input and output of floating point numbers can be fully supported from data types.

The HCORDIC algorithm also involves two operating modes, namely vector mode and rotation mode. The update formulas are written in terms of the exponent parts and the mantissas of the variables. In vector mode and rotation mode, the variables δ and θ are updated as follows:

Vector mode:

(6)

(7)

Rotation mode:

(8)

(9)

(2) Scaling process

The scaling process is performed after the result of the iterative calculation satisfies the convergence condition. Assuming that the convergence condition is satisfied after r iterations, the control unit inputs the variables X_r, Y_r and Z_r at that point to the operation unit, which scales and outputs them. The variables are scaled as follows:

(10)

(11)

(12)

In a more preferred embodiment, the instruction control signals are one-hot encoded; for example, the specific encodings may be set as shown in Table 1. In this embodiment, the one-hot code is an 8-bit binary word, with one bit for each of the eight common transcendental functions. Those skilled in the art may of course adjust the encoding of the instruction control signal when applying the floating-point operation of the computing device proposed in this scheme; such conventional modifications should be considered to fall within the protection scope of the present application.

Table 1 One-hot encoded instruction control signals

As shown in Table 1, a corresponding instruction control signal can be set for each transcendental function, so as to control data selection and the other processes by which the floating-point transcendental function computing device runs the corresponding function operation.
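An 8-bit one-hot control word of this kind can be sketched as follows. Only the code "00000001" for sin is stated in the text; the bit assignment of the other seven functions here is hypothetical and simply follows the order in which the functions are listed. The names `FUNCTIONS`, `one_hot` and `decode` are illustrative.

```python
# Listing order of the eight functions; bit positions beyond sin are assumed.
FUNCTIONS = ["sin", "cos", "sinh", "cosh", "arctan", "artanh", "exp", "ln"]

def one_hot(name):
    """Return the 8-bit one-hot control word for a function name (assumed order)."""
    index = FUNCTIONS.index(name)
    return format(1 << index, "08b")

def decode(word):
    """Map a control word back to its function; reject words that are not one-hot."""
    value = int(word, 2)
    if len(word) != 8 or value == 0 or value & (value - 1):
        raise ValueError("not a valid 8-bit one-hot word: " + word)
    return FUNCTIONS[value.bit_length() - 1]
```

Because exactly one bit is set, each bit of the word can drive one select line directly, with no decoder between the instruction control signal and the selection logic.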

For example, the initialization and the convergence condition can further be set for each transcendental function operation. The operating mode, initial values, convergence condition and output selection for the eight transcendental functions computed by the device are preferably set as shown in Table 2:

Table 2 Operating modes and convergence conditions of the eight functions in the HCORDIC algorithm

The operation of one function serves as an illustration. To calculate sin(a), the instruction control signal is "00000001" and a is the input floating-point operand. According to Table 2, the sin function works in the circular coordinate system, i.e. mode m = 1, and the operating mode is rotation mode. Initialization according to Table 2 gives X_0 = 1, Y_0 = 0, Z_0 = a and K_0 = 1. Because the operating mode is rotation mode, the control unit fetches the values of δ and θ from the storage unit according to formulas (8) and (9) and uses them as inputs to the iteration formulas (1)-(4), completing one iterative operation. The result of the iterative operation is then returned to the control unit, which judges whether the convergence condition (a value equal to or approaching 0) is met. If it is, the control unit sets the convergence signal to true and sends the convergence signal and the iteration result to the operation unit, which executes formulas (10)-(12) and sends the result to the output selection unit; based on the instruction control signal, the output selection unit outputs Y_out, which is the value of sin(a). If the iteration result does not meet the convergence condition, i.e. the convergence signal is false, the iteration result is sent to the operation unit for the next iterative operation, and the convergence check is repeated until the convergence condition is met.
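The sin(a) walk-through can be mimicked numerically with the sketch below. It is an illustration under stated assumptions, not the device's algorithm: the update signs, the rule for choosing δ (the negated value of Z truncated to 4 mantissa bits), the convergence test on Z and the tolerance are all assumed, and θ and k are computed with the math library where the device would read prestored values. The names `truncate_mantissa` and `hcordic_like_sin` are hypothetical.

```python
import math

def truncate_mantissa(z, bits=4):
    """Keep the leading 1 plus `bits` fraction bits of z, rounding toward zero."""
    mant, exp = math.frexp(z)            # z = mant * 2**exp, 0.5 <= |mant| < 1
    scale = 1 << (bits + 1)
    return math.ldexp(math.trunc(mant * scale) / scale, exp)

def hcordic_like_sin(a, tol=1e-12, max_iter=64):
    """Rotation-mode sketch in the circular system (m = 1); returns (sin a, iterations)."""
    m = 1
    X, Y, Z, K = 1.0, 0.0, a, 1.0        # initial values stated for sin
    for i in range(max_iter):
        if abs(Z) <= tol:                # assumed convergence test: Z approaches 0
            return K * Y, i              # scaling step: Y_out = K * Y
        delta = -truncate_mantissa(Z)    # assumed selection rule for delta
        theta = math.atan(delta)         # angle removed from Z by this step
        k = 1.0 / math.sqrt(1.0 + m * delta * delta)
        X, Y = X + m * delta * Y, Y - delta * X
        Z, K = Z + theta, K * k
    raise RuntimeError("no convergence within max_iter iterations")
```

For inputs in the range -π/2 to π/2 the returned value agrees with `math.sin` to roughly the chosen tolerance, and the second element of the result shows how many serial iterations preceded the single scaling step.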

The initialization unit applies the initialization in Table 2 to the input data supplied to the floating-point transcendental function computing device from outside, according to the instruction control signal.

The control unit receives both the iterative feedback data for each variable from the operation unit and the initialized data. Because the device accepts new transcendental function data only after the current calculation has finished, the initialized data and the iterative feedback data never coexist in the control unit. At the start of a calculation, the control unit holds only the initialized data sent by the initialization unit and no iterative feedback data; once iteration has started, it holds only the data fed back by the operation unit together with the parameter factor values fetched from the storage unit.

The storage unit stores the possible values of the factors δ, θ and k. In the different coordinate systems and operating modes, the values of δ, θ and k all vary with the iteration variables; in a more preferred embodiment, the values of δ, θ and k are calculated in advance according to formulas (5)-(9) and stored in the storage unit, to be read out when the operation unit needs them.

The prestored possible values of δ, θ and k may be calculated in advance by exhaustive enumeration. For example, in equation (6) the value of the rotation factor δ_i varies with the values of the vector components X_i and Y_i; suppose that at the i-th iteration |Y_i| < |X_i| and the mode m is not equal to 0, so that δ_i is given by the corresponding expression in equation (6). X_i and Y_i conform to the IEEE-754 single-precision floating-point data type, with a 1-bit sign, an 8-bit exponent field and a 23-bit mantissa field, 32 bits in total. If the expression were evaluated in advance for all possible values of X_i and Y_i, the value of δ_i could be obtained directly by lookup; in a practical hardware design, however, the 32-bit values of X_i and Y_i cannot be enumerated exhaustively because the amount of data is too large. Therefore, of the 32-bit data, only the upper 4 bits of the 23-bit mantissa field are used and the lower 19 bits are discarded, and only the lower 4 bits of the 8-bit exponent difference are used and the upper 4 bits are discarded. The mantissa then has only 16 possible values, because its width is only 4 bits, and the exponent likewise has only 16 possible values, so the expression has only 16 × 16 = 256 possible results. These 256 results are all calculated in advance and stored in the storage unit; during operation, the value of the rotation factor δ_i can then be found quickly in the storage unit using only the upper 4 bits of the mantissa field and the lower 4 bits of the exponent field.
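The field truncation that limits the table to 256 entries can be sketched as follows. The helper names `float32_fields` and `lut_address` are hypothetical, and the address layout (4 exponent bits in the high nibble, 4 mantissa bits in the low nibble) is an assumption; the text does not say how the two 4-bit fields are combined, nor exactly which exponent and mantissa quantities feed the lookup.

```python
import struct

def float32_fields(x):
    """Split an IEEE-754 single-precision value into (sign, exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def lut_address(exponent_value, mantissa_field):
    """Form an 8-bit table address from two truncated 4-bit fields (assumed layout)."""
    exp4 = exponent_value & 0xF          # keep the lower 4 bits of the 8-bit value
    man4 = (mantissa_field >> 19) & 0xF  # keep the upper 4 bits of the 23-bit mantissa
    return (exp4 << 4) | man4            # 16 x 16 = 256 possible addresses
```

Widening either field trades storage for precision: keeping 8 mantissa bits instead of 4 would multiply the number of table entries by 16.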

Similarly, other factors θ and k may be similarly calculated.

In particular, the truncation described above is illustrated with 4 bits, and extends to other bit widths, such as mantissa fields using only their upper 8 bits, the lower 15 bits discarded, and the principle is the same.

The above is only one preferred embodiment for pre-storing the possible values of δ, θ and k in the storage unit. Of course, all possible values of δ, θ and k may also be enumerated and stored directly, or the possible values of the three parameters may be calculated with a different truncation scheme based on the above description and pre-stored for use in subsequent floating-point operations.

The control unit applies the constraint conditions set in formulas (5)-(9), fetches the corresponding δ, θ and k values from the storage unit by address, and outputs them to the operation unit, together with the X_i, Y_i and Z_i output by the input selection unit, for the iterative operation. The control unit also checks whether the current X_i, Y_i and Z_i meet the convergence condition in Table 2; if they do, it sets the convergence signal to true and outputs the variables to the operation unit.

When the convergence signal is false, the operation unit performs the floating-point operations of formulas (1)-(4) on the input variables and feeds the result back to the control unit, which judges whether the variables have converged.

When the convergence signal is true, the operation unit selects K_i as one of the inputs of the floating-point multipliers in the first and second operators, performs the scaling operation of formulas (10)-(12) on the converged variables X_i, Y_i and Z_i using the scaling factor K_i, and outputs the result to the output selection unit.
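The interplay of lookup, iteration, convergence check and final scaling can be illustrated with a short sketch. It does not use formulas (1)-(12) of the invention; the classic circular CORDIC rotation is swapped in as a stand-in, and the names `lookup`, `converged` and `run`, the threshold `eps`, and the way K is accumulated per step are all assumptions made for illustration only.

```python
import math

def lookup(i):
    """Stand-in for the storage unit: (delta, theta, k) for step i.
    Classic CORDIC constants, NOT the table contents of the invention."""
    delta = 2.0 ** -i
    return delta, math.atan(delta), 1.0 / math.sqrt(1.0 + delta * delta)

def converged(z, eps=1e-9):
    """Stand-in convergence test (assumed: residual angle close to zero)."""
    return abs(z) < eps

def run(x, y, z, max_iters=64):
    """Serial loop: iterate while the convergence signal is false, then scale once."""
    k_acc = 1.0
    for i in range(max_iters):
        if converged(z):  # convergence signal true: stop iterating
            break
        delta, theta, k = lookup(i)
        d = 1.0 if z >= 0.0 else -1.0
        # One iteration; the results are fed back for the next convergence check.
        x, y, z = x - d * delta * y, y + d * delta * x, z - d * theta
        k_acc *= k
    # Scaling step: the same multipliers now take K instead of delta.
    return x * k_acc, y * k_acc, z
```

For example, `run(1.0, 0.0, 0.5)` returns values close to the cosine and sine of 0.5 in the first two positions. The point of the sketch is the control flow: the multipliers that take δ during iteration are reused with K for the scaling step, which is the resource multiplexing the scheme relies on.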

According to the instruction control signals of Table 1, the output selection unit selects the corresponding output among X_out, Y_out and Z_out for each of the functions in Table 2.
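Output selection can be pictured as a small lookup from instruction to output port. The sketch below is illustrative only: the instruction names and the mapping are hypothetical and follow the classic CORDIC convention (cosine on X, sine on Y, arctangent on Z); the actual encodings and assignments are those of Tables 1 and 2.

```python
# Hypothetical instruction names and port mapping; see Tables 1 and 2 for the real ones.
OUTPUT_SELECT = {"cos": "x", "sin": "y", "atan": "z"}

def select_output(instr, x_out, y_out, z_out):
    """Return the one output that the decoded instruction asks for."""
    outputs = {"x": x_out, "y": y_out, "z": z_out}
    return outputs[OUTPUT_SELECT[instr]]
```

Because all three values are produced by the same datapath on every run, selecting the result at the output costs only a multiplexer rather than a separate unit per function.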

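In addition, the invention also provides a processor, which comprises at least an instruction memory, a decoding module and an execution unit; the decoding module decodes the instruction from the instruction memory to obtain an instruction control signal; the execution unit comprises the serial floating-point transcendental function computing device described above, which executes the floating-point operation of the corresponding transcendental function based on the instruction control signal.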

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. An area-optimized serial floating-point transcendental function computing device, the device comprising:

2. The computing device of claim 1, wherein the operation unit comprises: a first operator composed of a floating-point multiplier and a floating-point adder, a second operator composed of a floating-point multiplier and a floating-point subtractor, a third operator composed of a floating-point multiplier, and a fourth operator composed of a floating-point adder;

3. The computing device according to claim 2, wherein in the operation unit, a first multiplexer is provided before the first and second operators; the first multiplexer receives the convergence signal, the twiddle factor δ and the scale factor K, and sends the selected data to the first and second operators, respectively;

4. The computing device of claim 3, wherein in the operation unit, the inputs of the first operator are vector X, vector Y, and the data selected for output by the first multiplexer; its output is vector X after the iterative operation or the vector after the scaling operation;

5. The computing device according to claim 1, wherein the iterative operation manner in the operation unit is:

6. The computing device of claim 1, wherein the scaling operation is performed in the operation unit by:

7. The computing device of claim 1, wherein the initialization unit performs initialization of variables corresponding to transcendental function operations according to instruction control signals; the variables include vector X, vector Y, cumulative angle Z, and scale factor K.

8. The computing device of claim 5, wherein the two variables are updated as follows:

The updating mode in the vector mode is as follows:

the update mode in the rotation mode is as follows:

9. The computing device according to claim 2, wherein the operation unit performs the scaling operation as follows:

10. A processor, wherein the processor at least comprises an instruction memory, a decoding module and an execution unit; the decoding module decodes the instruction from the instruction memory to obtain an instruction control signal;

the execution unit comprises the serial floating-point transcendental function computing device according to any one of claims 1-9, which executes the floating-point operation of the corresponding transcendental function based on the instruction control signal.