CN117215646A - Floating point operation method, processor, electronic equipment and storage medium

Info

Publication number: CN117215646A
Application number: CN202310552164.4A
Authority: CN
Inventors: 任子木
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2023-05-16
Filing date: 2023-05-16
Publication date: 2023-12-12

Abstract

The application relates to the technical field of computers and provides a floating-point operation method, a processor, an electronic device, and a storage medium for improving the efficiency and precision of floating-point divide-add operations. After a divide-add fusion operation instruction is obtained, a fused floating-point division and floating-point addition is performed on the obtained floating-point dividend, floating-point divisor, and floating-point addend, which makes the operation efficient. The number of operations of the floating-point division is determined dynamically from the exponents of the three floating-point numbers, so it can be adjusted flexibly to the actual situation; this improves operation efficiency while reducing the intermediate precision loss caused by division iterations. In the floating-point addition, the floating-point quotient and floating-point remainder can be dynamically extended according to the exponent of the floating-point addend, so that as much effective intermediate precision as possible is retained and the precision of the divide-add operation is further improved.

Description

Floating point operation method, processor, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a floating point operation method, a processor, an electronic device, and a storage medium.

Background

With the development of technology, more and more devices can perform artificial intelligence (Artificial Intelligence, AI) data processing, where AI data processing includes a large number of floating point operations, including floating point addition operations, floating point division operations, and the like.

In floating-point arithmetic, floating-point division is invoked relatively often. Typically, a floating-point division is combined with a floating-point addition to form a floating-point divide-add operation. In the related art, a floating-point divide-add operation is generally performed by calling two instructions, one for the floating-point division and one for the floating-point addition, so the operation efficiency is low.

For example, given three floating-point numbers A, B, and C and the divide-add relationship A/B+C, the related art first calls a first instruction to calculate A/B, obtaining an intermediate result D, and then calls a second instruction to calculate D+C, obtaining the final result.

Moreover, in the related art the number of iterations of a floating-point division depends only on whether the dividend is exactly divisible by the divisor, so the number of iterations is usually large, which reduces the efficiency of the whole divide-add operation. In addition, because the result of each floating-point division is normalized and rounded, calculation precision is lost during the operation, and the more iterations there are, the lower the calculation precision.
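The precision loss described above can be illustrated with a minimal sketch. It compares the two-instruction flow (round after the division, round again after the addition) with a reference that evaluates A/B+C exactly and rounds once. The function names are hypothetical, and the sketch uses Python's native double precision rather than any particular hardware format; it only models the rounding behaviour, not the circuit of this application.

```python
from fractions import Fraction


def two_step_divide_add(a: float, b: float, c: float) -> float:
    """Related-art flow: divide and round, then add and round again."""
    d = a / b          # first rounding produces the intermediate result D
    return d + c       # second rounding produces the final result


def single_rounding_divide_add(a: float, b: float, c: float) -> float:
    """Reference flow: evaluate A/B+C exactly, then round once."""
    exact = Fraction(a) / Fraction(b) + Fraction(c)
    return float(exact)  # one correctly rounded conversion


def abs_error(result: float, a: float, b: float, c: float) -> Fraction:
    """Exact absolute error of `result` with respect to A/B+C."""
    exact = Fraction(a) / Fraction(b) + Fraction(c)
    return abs(Fraction(result) - exact)
```

Because the reference result is rounded only once, its error can never exceed the error of the two-step result for the same inputs.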

Therefore, improving the efficiency and precision of floating-point divide-add operations is a problem to be solved in current floating-point arithmetic.

Disclosure of Invention

The embodiments of the present application provide a floating-point operation method, a processor, an electronic device, and a storage medium for improving the efficiency and precision of floating-point divide-add operations.

In one aspect, an embodiment of the present application provides a floating-point operation method, applied to a floating-point operation processor, including:

obtaining a divide-add fusion operation instruction, wherein the divide-add fusion operation instruction is an instruction obtained by fusing a floating-point division operation instruction and a floating-point addition operation instruction;

obtaining, based on the divide-add fusion operation instruction, a floating-point dividend, a floating-point divisor, and a floating-point addend of a floating-point divide-add operation;

determining the number of operations of the floating-point division operation in the floating-point divide-add operation based on the exponents of the three floating-point numbers;

performing, based on the number of operations, a floating-point division operation on the mantissa of the floating-point dividend and the mantissa of the floating-point divisor to obtain a floating-point quotient and a floating-point remainder;

and performing a floating-point addition operation on the floating-point quotient and the mantissa of the floating-point addend using a first carry propagate adder to obtain an initial addition result, and performing a rounding operation on the initial addition result based on the floating-point remainder to obtain a target addition result.

In another aspect, an embodiment of the present application provides a floating point arithmetic processor, including:

an input/output interface, configured to obtain a divide-add fusion operation instruction, and to obtain, based on the divide-add fusion operation instruction, a floating-point dividend, a floating-point divisor, and a floating-point addend of a floating-point divide-add operation; the divide-add fusion operation instruction is an instruction obtained by fusing a floating-point division operation instruction and a floating-point addition operation instruction;

a floating-point divider, configured to perform a floating-point division operation on the mantissa of the floating-point dividend and the mantissa of the floating-point divisor based on the number of operations, to obtain a floating-point quotient and a floating-point remainder;

a floating-point adder, configured to input the floating-point quotient and the mantissa of the floating-point addend into a first carry propagate adder for a floating-point addition operation to obtain an initial addition result;

and a rounding unit, configured to perform a rounding operation on the initial addition result based on the floating-point remainder to obtain a target addition result.

Optionally, the floating-point divider includes an exponent processing unit and an iterative operator:

the exponent processing unit is used for calculating a first exponent difference between the exponent of the floating-point dividend and the exponent of the floating-point divisor, and calculating a second exponent difference between the exponent of the floating-point addend and the first exponent difference;

and the iterative operator is used for determining the number of operations of the floating-point division operation in the floating-point divide-add operation based on the second exponent difference and the preset mantissa bit width.
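The two exponent differences computed by the exponent processing unit can be sketched as follows. This is an illustrative model only: the function names are hypothetical, exponents are taken as unbiased values from Python floats, and special values (zero, infinity, NaN) are outside the sketch.

```python
import math


def unbiased_exponent(x: float) -> int:
    """Return e such that |x| = m * 2**e with 1 <= m < 2 (finite, nonzero x)."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("sketch covers finite, nonzero inputs only")
    # math.frexp yields a mantissa in [0.5, 1), so subtract 1 to match [1, 2)
    return math.frexp(x)[1] - 1


def exponent_differences(dividend: float, divisor: float, addend: float):
    """First difference: exp(dividend) - exp(divisor).
    Second difference: exp(addend) - first difference."""
    first_diff = unbiased_exponent(dividend) - unbiased_exponent(divisor)
    second_diff = unbiased_exponent(addend) - first_diff
    return first_diff, second_diff
```

A large positive second difference means the addend is much larger in magnitude than the quotient, which is the situation in which fewer quotient bits can influence the final result.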

Optionally, the iterative operator is specifically configured to:

when the second exponent difference is greater than or equal to the sum of the preset mantissa bit width and a reference bit width, determine that the number of operations is zero; wherein the reference bit width is determined based on the lowest retained bit within the preset mantissa bit width and the OR of the discarded bits other than the highest discarded bit;

when the second exponent difference is greater than a first preset bit width and less than the sum of the preset mantissa bit width and the reference bit width, determine the number of operations based on the difference between that sum and the second exponent difference;

when the second exponent difference is greater than or equal to a second preset bit width and less than or equal to the first preset bit width, and the preset mantissa bit width is greater than the number of leading zeros, determine the number of operations based on the preset mantissa bit width, the reference bit width, and the number of leading zeros;

when the second exponent difference is greater than or equal to the second preset bit width and less than or equal to the first preset bit width, and the preset mantissa bit width is equal to the number of leading zeros, determine the number of operations based on the preset mantissa bit width, the reference bit width, the number of leading zeros, and the number of consecutive zeros after the leading-zero valid bit;

and when the second exponent difference is less than the second preset bit width, determine the number of operations based on the preset mantissa bit width and the absolute value of the second exponent difference.
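The first two cases above can be sketched as follows. This is a partial, hedged illustration: the text states only that the count is determined "based on" the listed quantities, so the ceiling division by the number of quotient bits per iteration is an assumption, as are the function name and all parameter names. The remaining three cases depend on formulas not given here and are deliberately left unimplemented.

```python
def division_iterations(second_diff: int, *, mantissa_width: int,
                        reference_width: int, first_preset_width: int,
                        bits_per_iteration: int) -> int:
    """Iteration count for the two cases in which the addend dominates."""
    limit = mantissa_width + reference_width
    if second_diff >= limit:
        # The quotient lies entirely below the bits that can affect the
        # result, so no division iteration is performed.
        return 0
    if first_preset_width < second_diff < limit:
        needed_bits = limit - second_diff
        # ASSUMPTION: quotient bits are produced bits_per_iteration at a
        # time, so the count is the ceiling of needed_bits / bits_per_iteration.
        return -(-needed_bits // bits_per_iteration)
    raise NotImplementedError("remaining cases are not covered by this sketch")
```

For instance, with a hypothetical mantissa width of 24, reference width of 2, and 2 quotient bits per iteration, a second exponent difference of 20 leaves 6 bits to compute, that is, 3 iterations under the stated assumption.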

Optionally, the floating-point divider further includes a preset lookup table, a first intermediate iteration register, and a second intermediate iteration register:

the preset lookup table is used for looking up at least one boundary value corresponding to the floating-point divisor;

the first intermediate iteration register and the second intermediate iteration register are initialized based on the floating-point dividend, and are used for storing the floating-point remainder determined by the floating-point dividend and the floating-point divisor;

the floating-point divider is specifically configured to: perform the floating-point division operation on the floating-point dividend, the floating-point divisor, and each boundary value based on the number of operations, and determine the floating-point quotient and the floating-point remainder from the operation result when the number of operations is reached.

Optionally, the floating-point divider further includes a comparator, a multiplexer, and a second carry propagate adder, and is specifically configured to:

obtain the current values of the first intermediate iteration register and the second intermediate iteration register, and input them into the second carry propagate adder to obtain a truncated value of the floating-point remainder corresponding to the floating-point dividend and the floating-point divisor;

input each boundary value and the truncated value into a corresponding comparator for comparison, and select, using the multiplexer, a floating-point quotient of the floating-point dividend and the floating-point divisor based on the comparison result;

and obtain, based on the selected floating-point quotient, a floating-point remainder of the floating-point division operation, and update the current values of the first intermediate iteration register and the second intermediate iteration register.

Optionally, the floating-point divider further includes a multiplier and a carry save adder, and is specifically configured to:

multiply the selected floating-point quotient by the floating-point divisor using the multiplier to obtain a product result;

use the carry save adder to compute the difference between the sum of the current values of the first intermediate iteration register and the second intermediate iteration register and the product result, to obtain the floating-point remainder of the floating-point division operation;

and store the floating-point remainder of the floating-point division operation separately into the first intermediate iteration register and the second intermediate iteration register, so as to update the current values of the first intermediate iteration register and the second intermediate iteration register.
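The select-multiply-subtract loop described above can be illustrated with a simplified digit-recurrence sketch on integer mantissas. It is not the circuit of this application: it uses a non-redundant digit set (digits 0 to radix-1) and a single remainder value instead of signed quotient digits, lookup-table boundary values, and a carry-save remainder held in two registers. The function and parameter names are hypothetical.

```python
def digit_recurrence_divide(dividend_mant: int, divisor_mant: int,
                            iterations: int, bits_per_iteration: int = 2):
    """Return (quotient, remainder) after a fixed number of iterations.

    Invariant on return:
    quotient * divisor_mant + remainder
        == dividend_mant << (iterations * bits_per_iteration)
    """
    if divisor_mant <= 0 or dividend_mant < 0 or iterations < 0:
        raise ValueError("sketch expects a positive divisor and non-negative inputs")
    radix = 1 << bits_per_iteration
    # Integer part first, so the remainder starts below the divisor
    quotient, remainder = divmod(dividend_mant, divisor_mant)
    for _ in range(iterations):
        remainder <<= bits_per_iteration
        # "Comparators": test the remainder against multiples of the divisor;
        # "multiplexer": keep the largest digit that still fits
        digit = 0
        for candidate in range(radix - 1, 0, -1):
            if remainder >= candidate * divisor_mant:
                digit = candidate
                break
        # "Multiplier" and subtraction: new remainder = remainder - digit * divisor
        remainder -= digit * divisor_mant
        quotient = (quotient << bits_per_iteration) | digit
    return quotient, remainder
```

Stopping after fewer iterations simply yields fewer quotient bits together with the matching remainder, which is what allows the iteration count to be chosen dynamically.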

Optionally, the floating-point divider further includes an online conversion unit, configured to:

when the selected floating-point quotient is negative, receive the floating-point quotient as input, convert it into a positive floating-point quotient, and store the converted floating-point quotient in a quotient register.

Optionally, the floating-point adder includes an alignment shift operation unit, a first carry propagate adder, a leading zero prediction unit, and a normalization shift operation unit;

the alignment shift operation unit is used for performing an alignment shift on the mantissa of the floating-point addend based on the second exponent difference determined from the exponents of the three floating-point numbers, so that the mantissa of the floating-point addend is aligned with the bit width of the floating-point quotient;

the first carry propagate adder is used for performing a floating-point addition operation on the floating-point quotient and the mantissa of the floating-point addend to obtain an initial addition result;

the leading zero prediction unit is used for performing leading zero prediction on the mantissa of the floating-point addend and the floating-point quotient to obtain a prediction result;

the normalization shift operation unit is used for performing a normalization shift on the initial addition result based on the prediction result to obtain an intermediate addition result;

the rounding unit is specifically configured to: perform a rounding operation on the intermediate addition result based on the floating-point remainder and the preset mantissa bit width to obtain the target addition result.
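The align, add, and normalize sequence above can be modelled behaviourally as follows. This is a sketch under stated assumptions: operands are non-negative integers paired with a power-of-two exponent (value = mantissa * 2**exponent), the function name is hypothetical, and the leading-zero count is taken from the sum itself with `int.bit_length` rather than predicted in parallel with the addition as a hardware leading zero prediction unit would do. Rounding is handled as a separate step, so excess low bits are simply truncated here.

```python
def align_add_normalize(quotient: int, quotient_exp: int,
                        addend_mant: int, addend_exp: int, width: int):
    """Return (mantissa, exponent) with the mantissa normalized to `width` bits."""
    if quotient < 0 or addend_mant < 0 or width <= 0:
        raise ValueError("sketch covers non-negative operands only")
    # Alignment shift: bring both operands to the smaller exponent so that
    # no bits are lost before the addition
    common_exp = min(quotient_exp, addend_exp)
    aligned_quotient = quotient << (quotient_exp - common_exp)
    aligned_addend = addend_mant << (addend_exp - common_exp)
    # Carry propagate addition
    total = aligned_quotient + aligned_addend
    if total == 0:
        return 0, 0
    # Normalization shift: place the leading one at bit position width - 1
    shift = total.bit_length() - width
    if shift > 0:
        mantissa = total >> shift      # truncation only; rounding comes later
    else:
        mantissa = total << -shift
    return mantissa, common_exp + shift
```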

Optionally, the rounding unit is specifically configured to:

perform an OR operation on all bits of the floating-point remainder to obtain a divisibility result for the mantissa of the floating-point dividend and the mantissa of the floating-point divisor;

and perform a rounding operation on the intermediate addition result based on the divisibility result and the preset mantissa bit width to obtain the target addition result.

Optionally, the rounding unit is specifically configured to:

when the divisibility result indicates that the division is not exact, determine a discarded bit width based on the intermediate addition result and the preset mantissa bit width;

acquire the lowest retained bit within the preset mantissa bit width and the highest discarded bit within the discarded bit width, and perform an OR operation on the remaining bits within the discarded bit width to obtain the discarded tail bit;

and perform a rounding operation on the intermediate addition result based on the lowest retained bit, the highest discarded bit, the discarded tail bit, and a preset rounding mode to obtain the target addition result.
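The rounding decision above can be sketched for one rounding mode. The text refers only to "a preset rounding mode"; round-to-nearest, ties-to-even is assumed here as an example, and the function and parameter names are hypothetical. Folding the "remainder is nonzero" flag into the discarded tail bit is also an assumption about how the divisibility result participates.

```python
def round_nearest_even(value: int, discard_width: int,
                       remainder_nonzero: bool) -> int:
    """Drop the low `discard_width` bits of `value`, rounding to nearest even."""
    if value < 0 or discard_width < 1:
        raise ValueError("sketch expects value >= 0 and discard_width >= 1")
    kept = value >> discard_width
    lowest_retained = kept & 1
    highest_discarded = (value >> (discard_width - 1)) & 1
    lower_bits = value & ((1 << (discard_width - 1)) - 1)
    # Discarded tail bit: OR of the remaining discarded bits, combined with
    # the OR of all remainder bits (nonzero when the division was inexact)
    tail = 1 if (lower_bits != 0 or remainder_nonzero) else 0
    if highest_discarded and (tail or lowest_retained):
        kept += 1   # may carry out; a real datapath would renormalize
    return kept
```

With an exact tie (highest discarded bit set, tail clear), the result moves to the even neighbour; a nonzero division remainder breaks the tie upward.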

In another aspect, an embodiment of the present application provides an electronic device, including a floating point operation processor and a memory, where the memory stores a computer program, and when the computer program is executed by the floating point operation processor, the steps of the floating point operation method are implemented.

In another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by an electronic device, perform the steps of the floating-point operation method described above.

In another aspect, an embodiment of the present application provides a computer program product, including a computer program, which when executed by an electronic device implements the steps of the floating point operation method described above.

The embodiment of the application has the following beneficial effects:

In the floating-point operation method, processor, electronic device, and storage medium provided by the embodiments of the present application, a floating-point division and a floating-point addition can be performed on floating-point numbers through a single divide-add fusion operation instruction, without calling two operation instructions, so floating-point operation efficiency is higher. When a floating-point divide-add operation is performed, the number of operations of the floating-point division is determined based on the exponents of the three floating-point numbers, namely the floating-point dividend, the floating-point divisor, and the floating-point addend. Division iterations can therefore be performed flexibly according to actual requirements, and when the floating-point addend is large, floating-point operation efficiency is effectively improved while the precision loss of intermediate operations is reduced. In addition, because the exponent of the floating-point addend is used when determining the number of operations of the floating-point division, the result of the floating-point division can be dynamically extended based on the exponent of the floating-point addend during the floating-point addition, so that the intermediate precision produced by the floating-point division is effectively retained and the precision of the whole divide-add operation is improved.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a divide-add operation in the related art according to an embodiment of the present application;

FIG. 2 is a diagram of an application scenario in which an embodiment of the present application is applicable;

FIG. 3 is a flowchart of an image processing method based on floating point operations according to an embodiment of the present application;

FIG. 4 is a data format of floating point numbers provided by an embodiment of the present application;

FIG. 5 is a division process according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a process for dynamically determining the number of operations for division iteration according to an embodiment of the present application;

FIG. 7 is a flow chart of a method for determining quotient and remainder provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of a multiplexer for quotient selection provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of a divider circuit according to an embodiment of the present application;

FIG. 10 is a diagram illustrating an addition process according to an embodiment of the present application;

FIG. 11 is a flowchart of the rounding operation provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of a rounding operation provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of a divide-add fusion circuit according to an embodiment of the present application;

FIG. 14A is a diagram of an embodiment of the present application;

FIG. 14B is a diagram of another embodiment of the present application;

FIG. 15 is a block diagram of an image processing apparatus based on floating point operations according to an embodiment of the present application;

FIG. 16 is a block diagram of an electronic device according to an embodiment of the present application;

FIG. 17 is a block diagram of a terminal device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine learning is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Compared with data mining, which looks for mutual characteristics among big data, machine learning focuses more on algorithm design, so that a computer can automatically learn rules from data and use those rules to predict unknown data.

Deep learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout the various fields of artificial intelligence. Deep learning typically includes neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and the like.

Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.

A carry propagate adder (Carry Propagate Adder, CPA) is a general-purpose adder for computing the sum of two numbers, in which the carry output of each bit position is connected to the carry input of the next higher bit position.
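The carry chain described in this definition can be sketched bit by bit. This is an illustrative software model of a ripple-style carry propagate adder with a hypothetical function name, not a description of any specific adder in this application.

```python
def carry_propagate_add(a: int, b: int, width: int):
    """Add two `width`-bit values; return (sum modulo 2**width, carry_out)."""
    if a < 0 or b < 0 or width <= 0:
        raise ValueError("sketch expects non-negative operands and width > 0")
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # Full adder: sum bit and carry out for this bit position
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))   # feeds the next higher position
        result |= s << i
    return result, carry
```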

An online conversion unit (On-the-Fly Conversion, OFCVT) is used for converting between positive and negative numbers.

A look-up table (Look-Up Table, LUT) is essentially a random access memory (Random Access Memory, RAM). After data has been written into the RAM in advance, each input acts as an address for a look-up, and the content stored at that address is output.

A multiplexer (MUX), also called a data selector or multi-way switch, is a circuit that can select any one of several data inputs as needed during multi-channel data transfer.

The following outlines the design ideas of the embodiments of the present application.

With the development of technology, more and more devices can perform artificial intelligence (Artificial Intelligence, AI) data processing, and floating point arithmetic is used in the AI data processing in most cases.

For example, when performing AI processing on an image, for convenience of calculation, color values of pixels in the image are normalized to be floating point numbers of 0.0-1.0, then the image with normalized color values recorded in a plurality of floating point number forms is input into a neural network for feature extraction, and floating point operations are performed on each floating point number in the feature extraction process.

Compared with integer operation, floating point number operation has the characteristics of wide dynamic range and high precision. Particularly, in the process of training the neural network by using the image sample set, the high-precision floating point number operation has a great influence on the generation of network parameters, and if the precision of the floating point number operation is higher, the obtained network parameters are more accurate, so that the inference accuracy of the trained model is improved.

In floating-point arithmetic, floating-point division is invoked relatively often. Typically, a floating-point division is combined with a floating-point addition to form a floating-point divide-add operation. Floating-point divide-add operations are frequently used in the activation layers of neural networks.

In the related art, a floating-point divide-add operation is generally performed by calling a floating-point division instruction and a floating-point addition instruction to complete the division and the addition respectively. Specifically, the division result of two floating-point numbers is calculated first, and then the sum of the division result and a third floating-point number is calculated to obtain the divide-add result.

For example, suppose three floating-point numbers A, B, and C of data type FP32 with the divide-add relationship A/B+C. As shown in FIG. 1, a floating-point division instruction (FP_DIV) is called to perform the floating-point division A/B, obtaining a division result D, and then a floating-point addition instruction (FP_ADD) is called to perform the floating-point addition D+C. In principle, the input and output data types of a floating-point operation are the same, so the result is output in the floating-point format that the IEEE 754 standard specifies for the input data type; before output, the results of the division and of the addition are each normalized and rounded.
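For reference, the FP32 fields on which such operations act can be extracted with a short sketch. The function name is hypothetical, and only normal numbers are handled; zeros, subnormals, infinities, and NaNs are outside the sketch.

```python
import struct


def decompose_fp32(x: float):
    """Return (sign, unbiased exponent, 24-bit mantissa with hidden bit)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF          # 23 stored mantissa bits
    if biased_exponent in (0, 0xFF):
        raise ValueError("sketch covers normal numbers only")
    # IEEE 754 binary32: exponent bias 127, implicit leading one
    return sign, biased_exponent - 127, fraction | (1 << 23)
```

The 23 stored bits plus the implicit leading one give the 24-bit significand on which mantissa division and addition are carried out.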

The floating-point division operation generally needs to perform iterative computation, the result of each iteration is generally temporarily stored in a register, and the number of iterations of the floating-point division operation and the bit width of the output result are positively correlated.

For example, for floating-point numbers of data type FP32, the mantissa bit width is 23 bits. Assuming that each division iteration produces 2 quotient bits, at least 12 iterations are required to calculate the division result.
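The iteration count in this example is a ceiling division of the mantissa bit width by the number of quotient bits produced per iteration. A minimal sketch, with a hypothetical function name:

```python
def min_division_iterations(mantissa_bits: int, bits_per_iteration: int) -> int:
    """Smallest number of iterations that yields at least `mantissa_bits` bits."""
    if mantissa_bits <= 0 or bits_per_iteration <= 0:
        raise ValueError("both arguments must be positive")
    return -(-mantissa_bits // bits_per_iteration)   # ceiling division
```

For the 23-bit FP32 mantissa at 2 bits per iteration, this gives 12, matching the figure in the text.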

However, calling two instructions to perform the floating-point division and the floating-point addition separately is inefficient. Moreover, because the division result must subsequently be added to the floating-point addend, the intermediate result of the floating-point division of each iteration must undergo a rounding operation, which loses precision. Furthermore, the number of iterations of the floating-point division depends only on whether the two floating-point numbers divide exactly: if they do, iteration can end early; otherwise, iteration must continue up to the mantissa bit width of the data type (for example, for a floating-point number of data type FP32, the cut-off mantissa bit width is 24 bits). In most scenarios the number of iterations is therefore large, which reduces the efficiency of the whole divide-add operation, and the more iterations there are, the greater the precision loss.

In view of this, the embodiments of the present application provide a floating-point operation method for improving the precision and efficiency of floating-point operations. In the method, the floating-point division and the floating-point addition are fused, that is, both can be completed by calling a single divide-add fusion operation instruction, which improves operation efficiency. When a floating-point divide-add operation is performed, the number of iterations of the floating-point division is determined flexibly based on the exponents of the obtained floating-point dividend, floating-point divisor, and floating-point addend, so that division iterations can be performed according to actual requirements; when the floating-point addend is large, floating-point operation efficiency is effectively improved while the precision loss of intermediate operations is reduced. In addition, because the exponent of the floating-point addend is used when determining the number of operations of the floating-point division, the division result can be dynamically extended based on the exponent of the floating-point addend during the floating-point addition. The precision of the intermediate result produced by the floating-point division is thereby retained as far as possible, which improves the precision of the whole divide-add operation.

Fig. 2 is a schematic diagram of a possible application scenario in an embodiment of the present application. The application scenario includes a terminal device 110 (specifically, a terminal device 1101, a terminal device 1102 and a terminal device …), and a server device 120.

In an alternative embodiment, communication between terminal device 110 and server device 120 may be via a wired network or a wireless network.

Terminal devices 110 include, but are not limited to, personal computers, cell phones, tablet computers, notebooks, e-book readers, smart medical devices, and computer devices with certain computing capabilities such as vehicle terminals. And the computer device may be provided with image processing software for performing floating point operations on images recorded in floating point form.

The server device 120 is a background server corresponding to the image processing software, or a server dedicated to performing floating point operations, which is not particularly limited in the present application. The server device 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network ), basic cloud computing services such as big data and an artificial intelligence platform.

In the embodiment of the present application, any one or all of the terminal device 110 and the server device 120 may be deployed with an image processing model after training the neural network, for performing floating point operation on the image.

In some possible embodiments, when the image processing model is deployed on the terminal device 110, the terminal device 110 acquires the target image through the camera. After receiving a data processing instruction (which may be triggered by the target object or automatically by the system), it uses the image processing model to operate on the plurality of floating point numbers contained in the target image to obtain a processed image, and displays the processed image to the target object through the user interface. When a floating-point divide-add operation is needed, a divide-add fusion operation instruction is obtained, and a floating-point dividend, a floating-point divisor and a floating-point addend are obtained according to the instruction. The number of floating-point division operations in the floating-point divide-add operation is determined based on the exponents of the three floating point numbers, the division result of the floating-point dividend and the floating-point divisor is then obtained according to that number, and the division result is added to the mantissa of the floating-point addend to obtain the divide-add fusion operation result.

In other possible embodiments, when the image processing model is deployed on the server device 120, the server device 120 receives the target image sent by the terminal device 110, uses the image processing model to operate on the plurality of floating point numbers contained in the target image to obtain a processed image, and sends the processed image to the terminal device 110, which displays it to the target object. The way the server device 120 operates on the floating point numbers is the same as that of the terminal device 110 and is not repeated here.

It should be noted that the embodiment of the present application does not limit the image processing scenarios in which the floating point operation is used; for example, the floating point operation may be applied to image denoising, image format conversion, and image classification and recognition.

It should be noted that the embodiments of the present application involve related data such as the target image. When the above embodiments are applied to specific products or technologies, permission or consent must be obtained, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

Based on the application scenario shown in fig. 2, an implementation flow of the floating point operation method provided by the embodiment of the present application is shown in fig. 3. The flow is executed by an electronic device, which may be the terminal device 110 or the server device 120 in fig. 2. The electronic device is provided with a floating point operation processor for implementing the floating point operation flow, which mainly includes the following steps:

S301: Obtain a divide-add fusion operation instruction.

Floating point operations include floating point addition, floating point subtraction, floating point multiplication, floating point division, and joint operations (e.g., floating point multiply-add operations, floating point divide-add operations, etc.). When a floating-point divide-add operation is needed, a divide-add fusion operation instruction is obtained. The instruction may be sent by another device or generated automatically according to calculation logic, which is not particularly limited.

The divide-add fusion operation instruction is obtained by fusing the floating-point division operation instruction and the floating-point addition operation instruction. That is, a single divide-add fusion operation instruction carries out both the floating-point division operation and the floating-point addition operation, without calling two separate instructions, so the operation efficiency is higher.

In floating point operations, floating point numbers may be recorded in accordance with a uniform standard so that each floating point number has a uniform data format; for example, each floating point number complies with the IEEE 754 standard. Please refer to fig. 4, which is a schematic diagram of the IEEE 754 standard.

A floating point number may be represented by three parts: a sign bit, exponent bits, and mantissa bits. The sign bit contains a 1-bit value representing whether the floating point number is positive or negative; the exponent bits contain a w-bit value characterizing the exponent (order of magnitude) of the floating point number; the mantissa bits contain a p-bit value characterizing the significant digits of the floating point number.
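As a minimal sketch of the three-part layout described above, the following Python snippet splits an FP32 value into its sign, exponent and mantissa fields. It assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 stored mantissa bits); the function name `decode_fp32` is hypothetical and used only for illustration.

```python
import struct


def decode_fp32(x: float) -> tuple[int, int, int]:
    """Split a value, rounded to FP32, into (sign, biased exponent, stored mantissa)."""
    # Reinterpret the 4 bytes of the FP32 encoding as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits (bias 127)
    mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits (hidden leading 1 not stored)
    return sign, exponent, mantissa
```

For instance, -2.5 equals -1.25 × 2^1, so it decodes to sign 1, biased exponent 128 and stored mantissa 0x200000.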

The floating-point divide-add operation may be applied to image processing scenes in which data is recorded as a plurality of floating point numbers, including but not limited to image preprocessing and image feature extraction. The plurality of floating point numbers are data related to the image. For example, an image is characterized by a plurality of floating point numbers; for another example, the image features of the image are characterized by a matrix of floating point numbers; for another example, intermediate data of the image in AI data processing is represented by a plurality of floating point numbers. This is not specifically limited.

S302: Based on the divide-add fusion operation instruction, obtain a floating-point dividend, a floating-point divisor and a floating-point addend of the floating-point divide-add operation.

The floating-point divide-add operation comprises two operations, a floating-point division operation and a floating-point addition operation. After the divide-add fusion operation instruction is obtained, the fusion of these two operations, namely the floating-point divide-add operation, is carried out.

In the floating-point divide-add operation, the floating-point division operation is performed first and the floating-point addition operation second. Three floating point numbers are involved in total: a floating-point dividend, a floating-point divisor and a floating-point addend, and the three have the same data type.

For example, the formula A/B+C is a floating-point divide-add operation, where A is the floating-point dividend, B is the floating-point divisor, C is the floating-point addend, and the data type of all three floating point numbers is FP32.

S303: Determine the number of floating-point division operations in the floating-point divide-add operation based on the exponents of the three floating point numbers.

In one example, each floating point number contains a mantissa part and an exponent part. Denote the mantissa of the floating-point dividend as man_a and its exponent as exp_a, the mantissa of the floating-point divisor as man_b and its exponent as exp_b, and the mantissa of the floating-point addend as man_c and its exponent as exp_c.

In the floating-point divide-add operation, the floating-point division operation is performed first. In essence, floating-point division is an iterated process of quotient selection, similar to manual long division. Therefore, the number of division iterations in the floating-point divide-add operation affects the precision and efficiency of the entire divide-add operation.

At present, when the division is not exact, the related technology generally iterates until the bit width of the quotient reaches the bit width corresponding to the floating-point data type specified in the standard, and only then stops, which reduces the efficiency of the floating-point divide-add operation. In addition, a rounding operation is performed on the output of every floating-point division operation, so the precision loss grows as the number of operations increases.

To solve this problem, the embodiment of the application provides a way to flexibly set the number of floating-point division iterations according to different cases: after the floating-point dividend, floating-point divisor and floating-point addend of the floating-point divide-add operation are obtained, the number of floating-point division iterations is dynamically determined based on the exponents of the three floating point numbers.

In one example, the exponents of the three floating point numbers, i.e., the floating-point dividend, the floating-point divisor and the floating-point addend, are input into an exponent processing unit (Exponent Process) and subtracted to obtain an exponent difference value, and the number of floating-point division operations in the floating-point divide-add operation is determined by combining this value with a preset mantissa bit width. The preset mantissa bit width is set according to the data type of the three floating point numbers.

In a specific implementation, the exponent processing unit calculates a first exponent difference between the exponent of the floating-point dividend and the exponent of the floating-point divisor, then calculates a second exponent difference between the exponent of the floating-point addend and the first exponent difference. The number of floating-point division operations is then determined based on the second exponent difference and the preset mantissa bit width.

For example, the second exponent difference is formulated as: exp_delta = exp_c - (exp_a - exp_b) = exp_c - exp_a_sub_b, where exp_a_sub_b is the first exponent difference.
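The two subtractions can be sketched as follows. This is only an illustration, assuming the exponents are passed in as plain signed integers (bias already removed); the function name `second_exponent_difference` is hypothetical.

```python
def second_exponent_difference(exp_a: int, exp_b: int, exp_c: int) -> int:
    """Return exp_delta = exp_c - (exp_a - exp_b)."""
    # First exponent difference: dividend exponent minus divisor exponent.
    exp_a_sub_b = exp_a - exp_b
    # Second exponent difference: addend exponent minus the first difference.
    return exp_c - exp_a_sub_b
```

With A = 8.0 (exponent 3), B = 2.0 (exponent 1) and C = 1024.0 (exponent 10), the first difference is 2 and exp_delta is 8, meaning the addend sits 8 binary orders of magnitude above the quotient.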

In the embodiment of the application, the number of floating-point division iterations in the floating-point divide-add operation can be flexibly set according to different cases, which reduces the number of division iterations and improves both operation efficiency and operation precision.

Taking 5 different cases as an example, as shown in fig. 5, and assuming that the preset mantissa bit width is m, the number of floating-point division iterations is determined as follows:

Case one

When the second exponent difference exp_delta is greater than or equal to the sum of the preset mantissa bit width m and the reference bit width k1, that is, exp_delta ≥ m+k1, the floating-point quotient of the mantissa man_a of the floating-point dividend A and the mantissa man_b of the floating-point divisor B is much smaller than the mantissa man_c of the floating-point addend C and does not affect the subsequent floating-point addition result. Here k1 = 2 corresponds to the G bit and the S bit used in the rounding operation: the G bit is the lowest retained bit within the preset mantissa bit width, and the S bit is the discarded tail bit obtained by ORing the discarded bits other than the highest discarded bit.

Case two

When the second exponent difference exp_delta is greater than the first preset bit width k2 and less than the sum of the preset mantissa bit width m and the reference bit width k1, that is, m+k1 > exp_delta > k2, the floating-point quotient of the mantissa man_a of the floating-point dividend A and the mantissa man_b of the floating-point divisor B is smaller than the mantissa man_c of the floating-point addend C, but the two have overlapping portions G1 and G2 within the mantissa range of the effective output. In this case, the number of floating-point division iterations can be determined based on the difference between the sum of the preset mantissa bit width and the reference bit width and the second exponent difference. Here k2 = 1.

For example, assuming that a 2-bit floating-point quotient is calculated in each floating-point division operation, the number of operations is (m+k1-exp_delta)/2.

Case three

When the second exponent difference exp_delta is greater than or equal to the second preset bit width k3 and less than or equal to the first preset bit width k2, and the number of leading zeros z is less than the preset mantissa bit width m, that is, k2 ≥ exp_delta ≥ k3 && z < m, the floating-point quotient of the mantissa man_a of the floating-point dividend A and the mantissa man_b of the floating-point divisor B differs only slightly from the mantissa man_c of the floating-point addend C. If the sign bit of the floating-point quotient differs from the sign bit of the floating-point addend, the mantissa of the floating-point quotient and the mantissa of the floating-point addend actually undergo a floating-point subtraction, which may generate leading zeros. If z < m, the number of floating-point division iterations can be determined based on the preset mantissa bit width m, the reference bit width k1 and the number of leading zeros z. Here the second preset bit width k3 = -2.

For example, assuming that a 2-bit floating-point quotient is calculated in each floating-point division operation, the number of operations is (m+k1+z)/2.

Case four

When the second exponent difference exp_delta is greater than or equal to the second preset bit width k3 and less than or equal to the first preset bit width k2, and the number of leading zeros z is equal to the preset mantissa bit width m, that is, k2 ≥ exp_delta ≥ k3 && z = m, the floating-point quotient of the mantissa man_a of the floating-point dividend A and the mantissa man_b of the floating-point divisor B differs only slightly from the mantissa man_c of the floating-point addend C. If the sign bit of the floating-point quotient differs from the sign bit of the floating-point addend, the mantissa of the floating-point quotient and the mantissa of the floating-point addend actually undergo a floating-point subtraction, and the number of leading zeros z generated in the subtraction is equal to the preset mantissa bit width m. In this case, the number t of consecutive zeros following the z leading zero bits of the floating-point quotient must also be considered, because the floating-point quotient is a normalized result, that is, in the form 1.xxx × 2^e. Therefore, to ensure the maximum precision of the operation result, the number of floating-point division iterations can be determined based on the preset mantissa bit width m, the reference bit width k1, the number of leading zeros z and the number of consecutive zeros t.

For example, assuming that a 2-bit floating-point quotient is calculated in each floating-point division operation, the number of operations is (m+k1+z+t)/2.

Case five

When the second exponent difference exp_delta is smaller than the second preset bit width k3, that is, exp_delta < k3, the floating-point quotient of the mantissa man_a of the floating-point dividend A and the mantissa man_b of the floating-point divisor B differs considerably from the mantissa man_c of the floating-point addend C. In this case, the number of floating-point division iterations can be determined based on the preset mantissa bit width and the absolute value of the second exponent difference.

For example, assuming that a 2-bit floating-point quotient can be calculated for each floating-point division operation, the number of operations is (|exp_delta|+m)/2.
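The five cases can be collected into one selection function, sketched below. The thresholds k1 = 2, k2 = 1 and k3 = -2 and the per-case formulas come from the text; everything else is an assumption made for illustration: the function name `division_iterations`, returning 0 in case one (the text only says the quotient does not affect the sum), rounding fractional counts up, and treating any z ≥ m as case four.

```python
import math

K1 = 2   # reference bit width (G and S bits), from the text
K2 = 1   # first preset bit width, from the text
K3 = -2  # second preset bit width, from the text


def division_iterations(exp_delta: int, m: int, z: int = 0, t: int = 0,
                        bits_per_iter: int = 2) -> int:
    """Number of division iterations for mantissa width m (hypothetical helper)."""

    def rounds(bits: int) -> int:
        # Assumption: a partial iteration still costs a full iteration.
        return math.ceil(bits / bits_per_iter)

    if exp_delta >= m + K1:
        return 0                            # case one (assumed: skip the division)
    if exp_delta > K2:
        return rounds(m + K1 - exp_delta)   # case two
    if exp_delta >= K3:
        if z < m:
            return rounds(m + K1 + z)       # case three
        return rounds(m + K1 + z + t)       # case four (z == m in the text)
    return rounds(abs(exp_delta) + m)       # case five
```

For m = 24, an addend 10 orders above the quotient (exp_delta = 10) needs 8 iterations under these assumptions, while exp_delta = 30 needs none.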

According to the embodiment of the application, for the floating-point division operation in the floating-point divide-add operation, after the three floating point numbers are obtained based on the divide-add fusion operation instruction, the number of floating-point division iterations is dynamically determined for the different cases according to the exponents of the three floating point numbers. When the floating-point addend is large, the floating-point division operation needs no iteration or only a small number of iterations, which effectively improves the efficiency of the whole floating-point divide-add operation and reduces the intermediate precision loss generated by division iteration. When higher operation precision is required, more significant bits of the floating-point division result can be retained, so that calculation precision is ensured while calculation performance is taken into account.

S304: Perform the floating-point division operation on the mantissa of the floating-point dividend and the mantissa of the floating-point divisor based on the number of operations to obtain a floating-point quotient and a floating-point remainder.

The floating-point division operation mainly comprises two parts: subtraction of the exponent parts and division of the mantissa parts. The specific operation process is shown in fig. 6 and mainly comprises the following steps:

S3041: Obtain at least one boundary value of the floating-point divisor based on a preset lookup table.

For the floating-point division operation in the floating-point divide-add operation, a preset lookup table is searched according to the mantissa man_b of the floating-point divisor to obtain at least one boundary value corresponding to the floating-point divisor.

S3042: Initialize the first intermediate iteration register and the second intermediate iteration register based on the floating-point dividend.

The first intermediate iteration register Carry and the second intermediate iteration register Sum are initialized based on the mantissa man_a of the floating-point dividend. Specifically, the mantissa man_a of the floating-point dividend is used as the initial value of the first intermediate iteration register Carry, and the initial value of the second intermediate iteration register Sum is set to 0.

The first intermediate iteration register Carry and the second intermediate iteration register Sum are used to store the floating-point remainder determined by the floating-point dividend and the floating-point divisor; that is, the sum of the values in Carry and Sum is the floating-point remainder.

S3043: Perform the floating-point division operation on the floating-point dividend, the floating-point divisor and each boundary value based on the number of operations, and when the number of operations is reached, determine the floating-point quotient and the floating-point remainder based on the operation result.

In one example, for the floating-point division operation in the floating-point divide-add operation, the iteration may be carried out using the SRT (Sweeney, Robertson and Tocher) division algorithm.

Taking a floating-point division operation as an example, the process of the floating-point division operation is shown in fig. 7, and mainly includes the following steps:

S3043_1: Obtain the current values of the first intermediate iteration register and the second intermediate iteration register, and input them into the second carry propagate adder to obtain a truncated value of the floating-point remainder corresponding to the floating-point dividend and the floating-point divisor.

In a specific implementation, the current value Carry_i of the first intermediate iteration register and the current value Sum_i of the second intermediate iteration register are input into the second carry propagate adder to obtain the floating-point remainder of the current floating-point division operation, and this remainder is truncated according to the bit width of the second carry propagate adder to obtain the truncated value of the floating-point remainder.

For example, assume that the second carry propagate adder has a bit width of 8 bits. After Carry_i and Sum_i are summed, if the bit width of the resulting floating-point remainder exceeds 8 bits, only the first 8 bits of the floating-point remainder are retained, giving the truncated value of the floating-point remainder.
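A minimal software sketch of this truncation is given below. It assumes non-negative register values for simplicity and interprets "the first 8 bits" as the 8 most significant bits of the sum; the function name `truncate_remainder` and the `cpa_bits` parameter are hypothetical.

```python
def truncate_remainder(carry_i: int, sum_i: int, cpa_bits: int = 8) -> int:
    """Sum the two registers and keep only the top cpa_bits bits of the result."""
    remainder = carry_i + sum_i
    # Number of low-order bits that do not fit in the adder width (0 if it fits).
    excess = max(remainder.bit_length() - cpa_bits, 0)
    return remainder >> excess
```

A 9-bit remainder such as 0b101101101 is therefore reduced to 0b10110110, while a remainder that already fits in 8 bits passes through unchanged.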

S3043_2: Input each boundary value and the truncated value into a corresponding comparator for comparison, and use a multiplexer to select the floating-point quotient of the floating-point dividend and the floating-point divisor based on the comparison results.

In a specific implementation, the truncated value of the floating-point remainder output by the second carry propagate adder and each of the at least one boundary value found for the floating-point divisor are input into a comparator for comparison, and a multiplexer (Multiplexer) selects the floating-point quotient corresponding to the floating-point dividend and the floating-point divisor according to the comparison results.

As shown in fig. 8, take an SRT algorithm based on 4 comparators (abbreviated as cmp) as an example, and assume that the bit width of the floating-point quotient output by the current floating-point division operation is 2 bits. The 4 boundary values found in the preset lookup table for the mantissa of the floating-point divisor are {LUT[0], LUT[1], LUT[2], LUT[3]}. Each boundary value and the truncated value of the floating-point remainder output by the second carry propagate adder are input into the 4 comparators for comparison, and a 5-to-1 multiplexer then selects the floating-point quotient of the current floating-point division operation according to the comparison results. Specifically, when the truncated value of the floating-point remainder held in the first and second intermediate iteration registers is smaller than the boundary value LUT[0], a floating-point quotient of -2 is selected; when the truncated value is between LUT[0] and LUT[1], a floating-point quotient of -1 is selected; when the truncated value is between LUT[1] and LUT[2], a floating-point quotient of 0 is selected; when the truncated value is between LUT[2] and LUT[3], a floating-point quotient of 1 is selected; when the truncated value is greater than LUT[3], a floating-point quotient of 2 is selected.
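The comparator-plus-multiplexer selection can be sketched as follows. This is an illustration under stated assumptions: the quotient digits are taken to be the radix-4 set {-2, -1, 0, 1, 2}, a truncated value equal to a boundary is treated as lying above it (the text does not specify the tie), and the boundary values used in any call are made up rather than taken from the real lookup table. The function name `select_quotient_digit` is hypothetical.

```python
def select_quotient_digit(truncated: int, lut: list[int]) -> int:
    """Pick one of five quotient digits from four ascending boundary values."""
    digits = (-2, -1, 0, 1, 2)
    # Each comparator answers "truncated >= boundary"; the count of true
    # answers acts as the select signal of the 5-to-1 multiplexer.
    select = sum(truncated >= boundary for boundary in lut)
    return digits[select]
```

With hypothetical boundaries [-6, -2, 2, 6], a truncated remainder of 0 passes two comparators and selects the digit 0, while 7 passes all four and selects 2.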

S3043_3: Based on the selected floating-point quotient, obtain the floating-point remainder of the current floating-point division operation, and update the current values of the first intermediate iteration register and the second intermediate iteration register.

After the floating-point quotient of the current floating-point division operation is determined, the floating-point remainder of that operation is also determined. Specifically, the selected floating-point quotient is multiplied by the mantissa of the floating-point divisor to obtain a product, and a Carry Save Adder (CSA) then subtracts the product from the sum of the current values of the first and second intermediate iteration registers to obtain the floating-point remainder of the current floating-point division operation. This remainder is stored separately in the first and second intermediate iteration registers to update their current values; that is, the CSA has three input ports and two output ports. In the embodiment of the application, using the CSA for the subtraction avoids carry propagation and reduces operation delay.
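The three-input, two-output behaviour of the CSA can be modelled in software as below. This is a sketch under assumptions: values are held in a fixed `width`-bit two's complement word, the subtraction is done by adding the negated product, and any per-iteration shift of the remainder is omitted because the text does not describe it. The names `carry_save_add` and `update_remainder` are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """3:2 compression: return (carry, partial_sum) with carry + partial_sum == a + b + c mod 2**width."""
    mask = (1 << width) - 1
    partial_sum = (a ^ b ^ c) & mask                       # bitwise sum without carries
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask    # majority bits, shifted left
    return carry, partial_sum


def update_remainder(carry_i: int, sum_i: int, q: int, man_b: int,
                     width: int) -> tuple[int, int]:
    """New (Carry, Sum) pair representing (carry_i + sum_i) - q * man_b."""
    mask = (1 << width) - 1
    neg_product = (-(q * man_b)) & mask   # two's complement of the product
    return carry_save_add(carry_i & mask, sum_i & mask, neg_product, width)
```

The two outputs are never added together inside the loop; only their sum, taken modulo 2**width, equals the true remainder, which is why no carry has to ripple through the full word on each iteration.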

For the floating-point division operation in the floating-point divide-add operation, the embodiment of the application provides a floating-point division circuit, as shown in fig. 9. At least one boundary value corresponding to the mantissa of the floating-point divisor is obtained by searching the preset lookup table; the mantissa of the floating-point dividend is used as the initial value Carry_i of the first intermediate iteration register, and the initial value Sum_i of the second intermediate iteration register is set to zero. The truncated value of the floating-point remainder stored in the intermediate iteration registers is obtained through the second carry propagate adder, whose bit width is x bits. The comparators compare the truncated value with each boundary value, and the multiplexer selects the corresponding floating-point quotient. The floating-point remainder of the current floating-point division operation is then determined by combining the selected quotient with the sum of the current values of the intermediate iteration registers, and is stored separately in the first intermediate iteration register Carry_i and the second intermediate iteration register Sum_i to update their current values.

In one example, the floating-point quotient selected by the multiplexer may be negative. Therefore, before the floating-point addition operation in the floating-point divide-add operation is performed, the floating-point quotient selected in each floating-point division operation may be input to the on-the-fly conversion unit (OFCVT) and converted into a positive floating-point quotient, which is stored in the quotient register.
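The text does not describe the inside of the OFCVT. As an assumption, the sketch below shows the commonly used on-the-fly conversion technique, which keeps two candidates (Q and Q - 1) so that a negative digit never needs a borrow to propagate through the digits already produced. The function name `on_the_fly_convert` and the radix-4 default are illustrative choices.

```python
def on_the_fly_convert(digits: list[int], radix: int = 4) -> int:
    """Convert signed quotient digits (most significant first) to an ordinary integer."""
    q, qm = 0, -1  # invariant maintained below: qm == q - 1
    for d in digits:
        # New Q: append d to Q, or append (radix + d) to Q - 1 when d is negative.
        new_q = q * radix + d if d >= 0 else qm * radix + (radix + d)
        # New Q - 1: append (d - 1) to Q, or append (radix - 1 + d) to Q - 1 when d <= 0.
        new_qm = q * radix + (d - 1) if d > 0 else qm * radix + (radix - 1 + d)
        q, qm = new_q, new_qm
    return q
```

In hardware each update is a digit concatenation onto one of two registers; the multiplications here only model that concatenation. For the digit sequence 1, -2, 0, 2, -1 the result is 1·256 - 2·64 + 0·16 + 2·4 - 1 = 135.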

After the floating-point division operation iteration is completed, a final floating-point quotient and a final floating-point remainder of the mantissa of the floating-point dividend and the mantissa of the floating-point divisor can be obtained.

S305: Input the floating-point quotient and the mantissa of the floating-point addend into the first carry propagate adder to perform the floating-point addition operation and obtain an initial addition result, and perform a rounding operation on the initial addition result based on the floating-point remainder to obtain a target addition result.

After the floating-point division operation in the floating-point divide-add operation is completed, the floating-point addition operation can be performed on the obtained floating-point division result and the floating-point addend. The floating-point addition process is shown in fig. 10 and mainly includes the following steps:

S3051: Perform an alignment shift operation on the mantissa of the floating-point addend based on the second exponent difference determined from the exponents of the three floating point numbers, so that the mantissa of the floating-point addend is aligned with the bit width of the floating-point quotient.

The mantissa of the floating-point addend is subjected to an alignment shift operation (Alignment Shift) using the second exponent difference exp_delta, so that it is aligned with the bit width of the floating-point quotient of the mantissa of the floating-point dividend and the mantissa of the floating-point divisor.
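A possible software model of the alignment shift is sketched below. The shift direction is an assumption: a positive exp_delta (addend larger than the quotient) is modelled as a left shift of the addend mantissa and a negative exp_delta as a right shift, with a flag recording whether any non-zero bit was shifted out. The function name `align_addend` is hypothetical.

```python
def align_addend(man_c: int, exp_delta: int) -> tuple[int, int]:
    """Return (aligned mantissa, lost_bits_flag) for a non-negative mantissa."""
    if exp_delta >= 0:
        # Addend weighs more than the quotient: move it up, nothing is lost.
        return man_c << exp_delta, 0
    shift = -exp_delta
    # Bits that fall off the low end are summarized in a single flag.
    lost = man_c & ((1 << shift) - 1)
    return man_c >> shift, int(lost != 0)
```

The flag lets a later rounding step know that the shifted-out part was not exactly zero.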

S3052: Input the mantissa of the floating-point addend and the floating-point quotient into the first carry propagate adder to obtain an initial addition result.

The first carry propagate adder sums the mantissa of the floating-point addend and the floating-point quotient to obtain an initial addition result.

S3053: Perform leading zero prediction on the mantissa of the floating-point addend and the floating-point quotient to obtain a prediction result.

Meanwhile, the mantissa of the floating-point addend and the floating-point quotient are input into a leading zero anticipator (Leading Zero Anticipator, LZA) for leading zero prediction, and a prediction result is obtained.

In the embodiment of the application, leading zero prediction and the floating-point addition operation are performed in parallel, which further reduces the overall calculation delay of the floating-point addition operation and improves operation efficiency.

S3054: Perform a normalization shift operation on the initial addition result based on the prediction result to obtain an intermediate addition result.

According to the standard for recording floating point numbers, a normalization shift operation (Normalization Shift) is performed on the initial addition result based on the prediction result output by the LZA to obtain an intermediate addition result.
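The effect of the normalization shift can be sketched as follows. Note the simplification: a real LZA predicts the leading zero count from the adder inputs in parallel with the addition, whereas this sketch simply counts the zeros of the finished sum as a stand-in. The function name `normalize` and the fixed `width` parameter are hypothetical.

```python
def normalize(value: int, width: int) -> tuple[int, int]:
    """Shift a non-negative width-bit value left until its top bit is set."""
    # Leading zero count within the fixed register width.
    leading_zeros = width - value.bit_length()
    mask = (1 << width) - 1
    return (value << leading_zeros) & mask, leading_zeros
```

The returned count is also what the exponent path would subtract from the result exponent to compensate for the shift.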

S3055: Perform a rounding operation on the intermediate addition result based on the floating-point remainder and the preset mantissa bit width to obtain a target addition result.

After the division iterations are completed, the values of the first and second intermediate iteration registers are summed by the first carry propagate adder to obtain the floating-point remainder, and the target addition result can then be obtained based on the floating-point remainder and the preset mantissa bit width.

In a specific implementation, an OR operation (sticky) is performed over every bit of the floating-point remainder to obtain a divisibility result for the mantissa of the floating-point dividend and the mantissa of the floating-point divisor. For example, when the mantissa of the floating-point dividend is exactly divisible by the mantissa of the floating-point divisor, the result of the OR operation is 0; when it is not exactly divisible, the result of the OR operation is 1. Further, based on the divisibility result and the preset mantissa bit width, a rounding operation is performed on the intermediate addition result to obtain the target addition result of the floating-point divide-add operation.

The specific rounding process is shown in fig. 11 and mainly comprises the following steps:

S3055_1: When the divisibility result indicates that the division is not exact, determine the discarded bit width based on the intermediate addition result and the preset mantissa bit width.

Once the data type of the floating point number is determined, the mantissa bit width of the floating-point operation output is also determined; this is recorded as the preset mantissa bit width (namely the valid bits). When the division is not exact, the preset mantissa bit width can be subtracted from the bit width of the intermediate addition result to obtain the discarded bit width.

S3055_2: and acquiring the lowest reserved bit in the preset tail number bit width and the highest reject bit in the reject bit width, and performing OR operation on the rest bits in the reject bit width to obtain the reject tail bit.

S3055_3: and performing rounding operation on the intermediate addition result based on the least reserved bit, the most reserved bit, the last discarded bit and a preset rounding mode to obtain a target addition result.

For example, as shown in fig. 12, assuming that the bit width of the significant bit is m (unit: bit), the bit width of the discard bit is n (unit: bit), the least reserved bit of the significant bit is G bit, the most discarded bit of the discard bits is R bit, the remaining bits of the discard bits divided by the most discarded bit or the result after the operation is S, and the target addition operation result is either the significant bit or the significant bit +1 according to the G/R/S and the preset rounding mode.
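Steps S3055_1 to S3055_3 can be sketched in Python as follows. The sketch uses the G/R/S naming of this example (G is the lowest reserved bit, R the highest discarded bit, S the OR of the remaining discarded bits) and works on unsigned magnitudes only. The function name `round_grs`, the mode names, and the choice to fold the remainder's OR result into S are assumptions made for illustration; the application only states that rounding depends on G/R/S and a preset rounding mode.

```python
def round_grs(value: int, n_discard: int, remainder_sticky: int = 0,
              mode: str = "nearest_even") -> int:
    """Round an unsigned intermediate result by dropping n_discard low bits.

    value            : intermediate addition result (m + n bits)
    n_discard        : discard bit width n
    remainder_sticky : OR of the division remainder bits (assumed to join S)
    mode             : hypothetical rounding-mode name
    """
    if n_discard <= 0:
        return value                      # nothing to discard
    kept = value >> n_discard             # the m significant bits
    g = kept & 1                          # lowest reserved bit
    r = (value >> (n_discard - 1)) & 1    # highest discarded bit
    low_mask = (1 << (n_discard - 1)) - 1
    s = 1 if (value & low_mask) or remainder_sticky else 0

    if mode == "nearest_even":
        round_up = r and (s or g)         # ties go to the even neighbour
    elif mode == "toward_zero":
        round_up = 0                      # plain truncation
    elif mode == "away_from_zero":
        round_up = r or s                 # any discarded information rounds up
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept + 1 if round_up else kept
```

For instance, `round_grs(0b100100, 3)` is an exact tie with G = 0 and stays at `0b100`, while the same call with `remainder_sticky=1` rounds up to `0b101`. A carry out of the significant bits after the + 1 would require an exponent adjustment, which this sketch leaves out.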

In the floating point operation method provided by the embodiment of the application, the fused floating point division and floating point addition of a floating point divide-add operation can be carried out through a single divide-add fusion operation instruction; different instructions need not be called separately for the two floating point operations, so the operation efficiency is higher. During the floating point divide-add operation, the number of iterations of the floating point division operation is dynamically determined based on the exponents of the floating point dividend, the floating point divisor and the floating point addend, so that the number of division iterations can be flexibly adjusted according to actual conditions. When the floating point addend is large relative to the floating point quotient, only a small number of floating point division iterations need to be executed, which improves the operation efficiency, reduces the intermediate precision loss caused by division iteration, and improves the operation precision. Meanwhile, because the exponent of the floating point addend is used when the number of division iterations is determined, the division result can be dynamically expanded based on the exponent of the floating point addend when the floating point addition operation is carried out, so that the intermediate precision generated by the division operation is effectively retained and the precision of the whole operation is further improved.

In order to implement the floating point divide-add operation through a divide-add fusion instruction, an embodiment of the present application provides a Division and Addition Fusion (DAF) circuit, as shown in fig. 13. After the floating point quotient and the floating point remainder are determined based on the mantissa of the floating point dividend and the mantissa of the floating point divisor, the bit widths of the floating point quotient and the mantissa of the floating point addend are aligned through an alignment shift operation. The floating point quotient and the mantissa of the floating point addend are then input into a carry-propagate adder and a leading zero prediction unit to obtain an initial addition result and a prediction result, respectively, and a normalization shift operation is performed on the initial addition result according to the prediction result to obtain an intermediate addition result. Further, the values of the first intermediate iteration register and the second intermediate iteration register are summed by the carry-propagate adder to obtain the floating point remainder, an OR operation is performed over all bits of the floating point remainder, and a rounding operation is performed on the intermediate addition result based on the OR result to obtain the target addition result.
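The alignment, addition and normalization stages of this datapath can be modelled with a short Python sketch. It handles same-sign unsigned magnitudes only, counts leading zeros exactly from the sum instead of predicting them in parallel as a hardware LZA would, and widens operands instead of truncating them. The function names, the `exp_diff` sign convention (addend exponent minus quotient exponent) and the `width` parameter are assumptions made for illustration.

```python
def align(addend_m: int, quotient: int, exp_diff: int):
    """Bring both operands onto a common scale (alignment shift).

    exp_diff > 0 means the addend has the larger exponent, so it is shifted
    left; otherwise the quotient is shifted left. No bits are dropped here.
    """
    if exp_diff >= 0:
        return addend_m << exp_diff, quotient
    return addend_m, quotient << -exp_diff


def leading_zeros(value: int, width: int) -> int:
    """Exact leading-zero count in a width-bit register (stand-in for LZA)."""
    return width - value.bit_length()


def add_and_normalize(addend_m: int, quotient: int, exp_diff: int, width: int):
    """Return (normalized_sum, shift) for a width-bit adder register."""
    a, q = align(addend_m, quotient, exp_diff)
    total = a + q                          # carry-propagate addition
    if total.bit_length() > width:
        raise OverflowError("sum does not fit the assumed register width")
    shift = leading_zeros(total, width)    # how far to shift left
    return total << shift, shift           # normalization shift
```

With `add_and_normalize(0b1000, 0b0110, 1, 8)` the aligned operands are 16 and 6, the sum is `0b10110`, and three leading zeros are removed to give `0b10110000`. The returned shift is the amount by which a real design would adjust the result exponent.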

The divide-add fusion circuit provided by the embodiment of the application can execute the floating point division operation and the floating point addition operation by calling a single divide-add fusion operation instruction, which effectively improves the efficiency of the whole divide-add operation. Compared with separate division and addition circuits, it occupies a smaller chip area, is suitable for various terminal devices, and improves product competitiveness.

The divide-add fusion circuit provided by the embodiment of the application is suitable for various scenes in which a target image recorded in floating point form is processed.

In one example, the divide-add fusion circuit provided by the embodiment of the application can be applied to an image format conversion scene. Specifically, the divide-add fusion circuit is integrated in a smart phone and is used for executing the algorithm of an image processing model deployed in the smart phone, converting the format of an image and generating a cartoon image.

For example, as shown in fig. 14A, for a face image of a target object collected by the smart phone, an image conversion request is sent through an "immediate conversion" option. Based on the request, the smart phone performs preprocessing operations such as normalization of color values on the face image to obtain an image recorded as a plurality of floating point numbers, and then uses the image processing model to perform feature extraction on the preprocessed image, where the floating point operations involved in the feature extraction process include floating point divide-add operations. A cartoon image corresponding to the face image is generated through the floating point operations.

It should be noted that, when an image recorded in the form of a floating point number is processed, other floating point operations such as multiply-add operation, subtract operation, and the like may also be performed.

In one example, the divide-add fusion circuit provided by the embodiment of the application can be applied to an image preprocessing scene in target recognition. Specifically, before AI processing is performed on the image, the divide-add fusion circuit is used to denoise the target image, so that the accuracy of target recognition is improved.

For example, fig. 14B is a comparison of the effect of denoising an image based on floating point operations, where the upper part is the image before denoising and the lower part is the image after denoising. Besides the divide-add operation, other operations, such as multiply-add operations and subtract operations, may also be performed.

Based on the same technical conception, the embodiment of the application provides a structural schematic diagram of a floating point operation device, which can realize the floating point operation method and achieve the same technical effects.

Referring to fig. 15, the floating point arithmetic device includes an input/output interface 1501, a floating-point divider 1502, a floating-point adder 1503, and a rounding unit 1504, wherein:

an input/output interface 1501, configured to acquire a divide-add fusion operation instruction, and to obtain, based on the divide-add fusion operation instruction, a floating point dividend, a floating point divisor and a floating point addend of the floating point divide-add operation; the divide-add fusion operation instruction is an instruction obtained by fusing a floating point division operation instruction and a floating point addition operation instruction;

a floating-point divider 1502, configured to perform a floating-point division operation on the mantissa of the floating-point dividend and the mantissa of the floating-point divisor based on the number of operations, to obtain a floating-point quotient and a floating-point remainder;

a floating-point adder 1503, configured to input the floating-point quotient and the mantissa of the floating-point addend into a first carry-propagate adder to perform a floating-point addition operation, and obtain an initial addition result;

a rounding unit 1504, configured to perform a rounding operation on the initial addition result based on the floating-point remainder, to obtain a target addition result.

Optionally, the floating point divider includes an exponent processing unit 1502_1 and an iterative operator 1502_2:

the exponent processing unit 1502_1 is configured to calculate a first exponent difference value between the exponent of the floating point dividend and the exponent of the floating point divisor, and calculate a second exponent difference value between the exponent of the floating point addend and the first exponent difference value;

the iterative operator 1502_2 is configured to determine the number of operations of the floating-point division operation in the floating-point divide-add operation based on the second exponent difference value and a preset mantissa bit width.
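The two exponent differences computed by the exponent processing unit can be illustrated with a small Python sketch for IEEE 754 single-precision inputs. It handles normal numbers only; the function names are hypothetical, and the mapping from the second exponent difference to the number of division iterations is not reproduced here because it is not specified in this passage.

```python
import struct

BIAS = 127  # exponent bias of IEEE 754 single precision


def exponent_field(x: float) -> int:
    """Return the 8-bit biased exponent field of x encoded as binary32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def exponent_differences(dividend: float, divisor: float, addend: float):
    """Return (first_difference, second_difference) on unbiased exponents.

    first  = exponent(dividend) - exponent(divisor)
    second = exponent(addend)   - first
    """
    ea, eb, ec = (exponent_field(v) - BIAS for v in (dividend, divisor, addend))
    first = ea - eb
    second = ec - first
    return first, second
```

For a dividend of 8.0, a divisor of 2.0 and an addend of 1024.0 the unbiased exponents are 3, 1 and 10, so the first difference is 2 and the second difference is 8, which indicates that the addend sits well above the quotient.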

Optionally, the iterative operator 1502_2 is specifically configured to:

Optionally, the floating point divider further includes a preset lookup table 1502_3, a first intermediate iteration register 1502_4, and a second intermediate iteration register 1502_5:

The preset lookup table 1502_3 is configured to query at least one boundary value of the floating point divisor;

the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5 are initialized based on the floating point dividend; wherein the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5 are configured to store the floating point remainder determined by the floating point dividend and the floating point divisor;

the floating point divider 1502 is specifically configured to: perform floating point division operations on the floating point dividend, the floating point divisor and each boundary value based on the number of operations, and determine the floating point quotient and the floating point remainder based on the operation result when the number of operations is reached.

Optionally, the floating point divider 1502 further includes a comparator 1502_6, a multiplexer 1502_7, and a second carry propagate adder 1502_8, specifically configured to:

the current values of the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5 are respectively obtained and input into the second carry propagate adder 1502_8 to obtain a truncated value of the floating point remainder corresponding to the floating point dividend and the floating point divisor;

each boundary value and the truncated value are input into a corresponding comparator 1502_6 for comparison, and the multiplexer 1502_7 is used to select a floating point quotient of the floating point dividend and the floating point divisor based on the comparison result;

based on the selected floating-point quotient, a floating-point remainder of the current floating-point division operation is obtained, and the current values of the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5 are updated.

Optionally, the floating point divider further includes a multiplier 1502_9 and a carry save adder 1502_10, specifically configured to:

multiplying the selected floating point quotient by the floating point divisor based on the multiplier 1502_9 to obtain a product result;

the carry save adder 1502_10 is adopted to subtract the product result from the current sum of the values of the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5 to obtain the floating point remainder of the current floating point division operation;

the floating point remainder of the current floating point division operation is stored separately in the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5, so as to update the current values of the first intermediate iteration register 1502_4 and the second intermediate iteration register 1502_5.
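One iteration of this kind of divider can be sketched in Python as a simplified radix-2 model. The remainder is held in carry-save form in two registers, their sum is compared with boundary values to select a quotient digit from {-1, 0, 1}, and the digit times the divisor is removed through a 3:2 carry-save adder. The radix, the boundary values of plus and minus half the divisor, the use of the full sum instead of a truncated one, and all function names are assumptions for illustration; the application obtains its boundary values from a preset lookup table.

```python
def csa(a: int, b: int, c: int):
    """3:2 carry-save adder: returns (s, carry) with s + carry == a + b + c."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def select_digit(estimate: int, divisor: int) -> int:
    """Compare the shifted remainder with the assumed boundaries +-divisor/2."""
    if 2 * estimate >= divisor:
        return 1
    if 2 * estimate < -divisor:
        return -1
    return 0


def divide(dividend: int, divisor: int, iterations: int):
    """Return (quotient, remainder) with dividend * 2**iterations ==
    quotient * divisor + remainder and 0 <= remainder < divisor."""
    assert 0 <= dividend < divisor
    ws, wc = dividend, 0                  # the two intermediate registers
    quotient = 0
    for _ in range(iterations):
        ws, wc = ws << 1, wc << 1         # radix-2 shift of the remainder
        estimate = ws + wc                # carry-propagate sum of both registers
        q = select_digit(estimate, divisor)
        ws, wc = csa(ws, wc, -q * divisor)  # remainder minus q * divisor
        quotient = 2 * quotient + q       # signed digits folded in directly
    remainder = ws + wc
    if remainder < 0:                     # final correction step
        quotient -= 1
        remainder += divisor
    return quotient, remainder
```

Calling `divide(5, 7, 10)` yields ten quotient bits of 5/7 together with the matching remainder, whose OR reduction then feeds the rounding stage. Keeping the remainder in two registers avoids a full carry propagation inside every iteration, which is the usual motivation for a carry-save representation.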

Optionally, the floating point divider further includes an online conversion unit 1502_11 configured to:

Optionally, the floating-point adder 1503 includes an alignment shift operation unit 1503_1, a first carry-propagate adder 1503_2, a leading zero prediction unit 1503_3, and a normalization shift operation unit 1503_4;

the alignment shift operation unit 1503_1 is configured to perform an alignment shift operation on the mantissa of the floating point addend based on the second exponent difference value determined by the exponents of the three floating point numbers, so that the mantissa of the floating point addend is aligned with the bit width of the floating point quotient;

the first carry-propagate adder 1503_2 is configured to perform a floating point addition operation on the floating point quotient and the mantissa of the floating point addend, to obtain an initial addition result;

the leading zero prediction unit 1503_3 is configured to perform leading zero prediction on the mantissa of the floating point addend and the floating point quotient, so as to obtain a prediction result;

the normalization shift operation unit 1503_4 is configured to perform a normalization shift operation on the initial addition result based on the prediction result, to obtain an intermediate addition result;

The rounding unit 1504 is specifically configured to: perform a rounding operation on the intermediate addition result based on the floating point remainder and the preset mantissa bit width to obtain the target addition result.

Optionally, the rounding unit 1504 is specifically configured to:

With the floating point operation device provided by the embodiment of the application, the floating point division operation and the floating point addition operation can be carried out on floating point numbers through a single divide-add fusion operation instruction; two operation instructions need not be called, so the floating point operation efficiency is high. When the floating point divide-add operation is carried out, the number of operations of the floating point division operation is determined based on the exponents of the three floating point numbers, namely the floating point dividend, the floating point divisor and the floating point addend, so that the division iterations can be flexibly carried out according to actual requirements. When the floating point addend is large, this effectively improves the efficiency of the floating point operation while reducing the precision loss of intermediate operations. Meanwhile, because the exponent of the floating point addend is used when the number of operations of the floating point division operation is determined, the result of the floating point division operation can be dynamically expanded based on the exponent of the floating point addend when the floating point addition operation is carried out, so that the intermediate precision generated by the floating point division operation is effectively retained and the precision of the whole floating point divide-add operation is improved.

The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. In one embodiment, the electronic device may be a server device in fig. 2, or may be a terminal device in fig. 2. In this embodiment, the electronic device may be configured as shown in fig. 16, and includes a memory 1601, a communication module 1603, and one or more floating point arithmetic processors 1602.

A memory 1601 for storing a computer program executed by the floating point arithmetic processor 1602. The memory 1601 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an operating instruction set, and the like.

The memory 1601 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 1601 may also be a non-volatile memory (non-volatile memory), such as a read-only memory, a flash memory (flash memory), a hard disk drive (HDD) or a solid state drive (SSD); or the memory 1601 may be any other medium that can be used to carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1601 may be a combination of the above memories.

The floating point arithmetic processor 1602 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. The floating point arithmetic processor 1602 is configured to implement the floating point operation method when calling the computer program stored in the memory 1601.

The communication module 1603 is used for communicating with terminal devices and other servers.

The specific connection medium between the memory 1601, the communication module 1603, and the floating point arithmetic processor 1602 is not limited in the embodiment of the present application. In the embodiment shown in fig. 16, the memory 1601 and the floating point arithmetic processor 1602 are connected by a bus 1604, which is drawn as a thick line in fig. 16; the connections between other components are merely illustrative and not limiting. The bus 1604 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description, only one thick line is drawn in fig. 16, but this does not mean that there is only one bus or only one type of bus.

The memory 1601 serves as a computer storage medium in which computer executable instructions are stored, the computer executable instructions being used for implementing the floating point operation method of the embodiment of the present application. The floating point arithmetic processor 1602 is configured to perform the steps of the floating point operation method described above.

In another embodiment, the electronic device may be the terminal device shown in fig. 2. In this embodiment, the structure of the terminal device may include, as shown in fig. 17: a communication component 1710, a memory 1720, a display unit 1730, a camera 1740, a sensor 1750, an audio circuit 1760, a bluetooth module 1770, a floating point arithmetic processor 1780, and the like.

The communication component 1710 is used for communicating with a server. In some embodiments, it may include a wireless fidelity (Wireless Fidelity, WiFi) module; WiFi is a short-range wireless transmission technology, and the electronic device may help the object to send and receive information through the WiFi module.

Memory 1720 may be used to store software programs and data. The floating point arithmetic processor 1780 performs various functions of the terminal device and data processing by executing software programs or data stored in the memory 1720. Memory 1720 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Memory 1720 stores an operating system that enables the terminal device to operate. Memory 1720 may store an operating system and various application programs, as well as computer programs for performing the floating point operation methods of embodiments of the present application.

The display unit 1730 may also be used to display information input by an object or information provided to the object and a graphic object interface of various menus of the terminal device. In particular, the display unit 1730 may include a display screen 1732 provided at a terminal device. The display 1732 may be configured in the form of a liquid crystal display, light emitting diodes, or the like. The display unit 1730 may be used to display an application operation interface in an embodiment of the present application.

The display unit 1730 may also be used to receive input digital or character information, generate signal inputs related to object settings and function control of the terminal device, and in particular, the display unit 1730 may include a touch screen 1731 provided at the terminal device, and may collect touch operations on or near the object, such as clicking buttons, dragging scroll boxes, and the like.

The touch screen 1731 may cover the display screen 1732, or the touch screen 1731 and the display screen 1732 may be integrated to implement the input and output functions of the terminal device; after integration, the combined component may be simply referred to as a touch display screen. The display unit 1730 may display an application program and a corresponding operation procedure in the present application.

Camera 1740 may be used to capture an image of the target. The camera 1740 may be one or more. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device or a complementary metal oxide semiconductor phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to a floating point arithmetic processor 1780 for conversion into a digital image signal.

The terminal device may also comprise at least one sensor 1750, such as an acceleration sensor 1751, a distance sensor 1752, a fingerprint sensor 1753, a temperature sensor 1754. The terminal device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.

Audio circuitry 1760, speaker 1761, microphone 1762 may provide an audio interface between the object and the terminal device. The audio circuit 1760 may transmit the received electrical signal converted from audio data to the speaker 1761, where the electrical signal is converted to a sound signal by the speaker 1761. The terminal device may also be configured with a volume button for adjusting the volume of the sound signal. On the other hand, the microphone 1762 converts the collected sound signals into electrical signals, which are received by the audio circuit 1760 and converted into audio data, which are output to the communication component 1710 for transmission to, for example, another terminal device, or to the memory 1720 for further processing.

The bluetooth module 1770 is configured to interact with other bluetooth devices having bluetooth modules via a bluetooth protocol. For example, the terminal device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) that also has a bluetooth module through the bluetooth module 1770, so as to perform data interaction.

The floating point arithmetic processor 1780 is the control center of the terminal device; it connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal device and processes data by running or executing software programs stored in the memory 1720 and calling data stored in the memory 1720. In some embodiments, the floating point arithmetic processor 1780 may include one or more processing units; the floating point arithmetic processor 1780 may also integrate an application processor that primarily handles the operating system, object interfaces, applications, etc., and a baseband processor that primarily handles wireless communications. It is understood that the baseband processor described above may also not be integrated into the floating point arithmetic processor 1780. The floating point arithmetic processor 1780 of the present application may run an operating system, applications, object interface display and touch response, as well as the floating point operation method of the present application. In addition, the floating point arithmetic processor 1780 is coupled with the display unit 1730.

In some possible embodiments, aspects of the floating point operation method provided by the present application may also be implemented in the form of a program product, which includes a computer program for causing an electronic device to perform the steps of the floating point operation method according to the various exemplary embodiments of the present application described above when the program product is run on the electronic device, for example, the electronic device may perform the steps as shown in fig. 3.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical fiber, a portable compact disk read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product of embodiments of the present application may employ a portable compact disc read-only memory and comprise a computer program and may run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.

The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.

A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer program may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a local area network or a wide area network, or may be connected to an external computing device.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having a computer-usable computer program embodied therein.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A floating point operation method applied to a floating point operation processor, the method comprising:

2. The method of claim 1, wherein the determining the number of operations of the floating point division operation in the floating point divide-and-add operation based on the exponents of the three floating point numbers comprises:

calculating a first exponent difference value between the exponent of the floating point dividend and the exponent of the floating point divisor based on an exponent processing unit, and calculating a second exponent difference value between the exponent of the floating point addend and the first exponent difference value;

determining the number of operations of the floating point division operation in the floating point divide-add operation based on the second exponent difference value and a preset mantissa bit width.

3. The method of claim 2, wherein the determining the number of operations of the floating point division operation in the floating point divide-and-add operation based on the second exponent difference and a predetermined mantissa bit width comprises:

4. The method of claim 1, wherein the performing a floating point division operation on the mantissa of the floating point dividend and the mantissa of the floating point divisor based on the number of operations to obtain a floating point quotient and a floating point remainder, comprises:

obtaining at least one boundary value of the floating point divisor based on a preset lookup table;

initializing a first intermediate iteration register and a second intermediate iteration register based on the floating point dividend; the first intermediate iteration register and the second intermediate iteration register are used for storing floating point remainder determined by the floating point dividend and the floating point divisor;

performing floating point division operations on the floating point dividend, the floating point divisor and each boundary value based on the number of operations, and determining the floating point quotient and the floating point remainder based on the operation result when the number of operations is reached.

5. The method of claim 4, wherein for each floating point division operation, performing the following:

respectively obtaining the current values of the first intermediate iteration register and the second intermediate iteration register, and inputting the current values into a second carry transfer adder to obtain a truncated value of a floating point remainder corresponding to the floating point dividend and the floating point divisor;

inputting each boundary value and the truncated value into a corresponding comparator for comparison, and selecting a floating point quotient of the floating point dividend and the floating point divisor by adopting a multiplexer based on a comparison result;

6. The method of claim 5, wherein obtaining the floating point remainder of the current floating point division operation based on the selected floating point quotient and updating the current values of the first intermediate iterator register and the second intermediate iterator register comprises:

multiplying the selected floating point quotient by the floating point divisor to obtain a product result;

using a carry-save adder to compute the difference between the current sum of the values of the first intermediate iteration register and the second intermediate iteration register and the product result, to obtain the floating point remainder of the floating point division operation;
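The carry-save step of claim 6 can be sketched on Python integers, which behave like two's complement values of unlimited width. This is a minimal sketch, assuming the subtraction is done by adding the negated product and that any radix shift of the registers happens before the call; the names `carry_save_add` and `update_remainder` are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: returns (s, carry) with s + carry == a + b + c.

    No carry is propagated between bit positions; each output bit depends
    only on the three input bits of the same position.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def update_remainder(reg_a, reg_b, quotient_digit, divisor):
    """New register pair whose sum is (reg_a + reg_b) - quotient_digit * divisor."""
    product = quotient_digit * divisor
    return carry_save_add(reg_a, reg_b, -product)
```

Keeping the remainder as an unadded pair is what lets each iteration avoid a full-width carry chain; only the short truncated add used for digit selection propagates carries.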

7. The method of claim 5, wherein after selecting a floating point quotient of the floating point dividend and the floating point divisor using a multiplexer, the method further comprises:

when the selected floating point quotient is negative, converting the floating point quotient into a positive number using an online conversion unit, and storing the converted floating point quotient in a quotient register.
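One way such a conversion can work is the textbook on-the-fly conversion scheme, sketched below. It is an assumption that the online conversion unit of claim 7 behaves this way; the radix of 4 and the function name are also hypothetical. The scheme keeps two candidates, `q` and `qm = q - 1`, so that a negative digit is absorbed by appending a non-negative digit to `qm` instead of borrowing through `q`.

```python
def on_the_fly_convert(digits, radix=4):
    """Convert signed quotient digits (most significant first) to an integer.

    Invariant after every step: qm == q - 1.
    """
    q, qm = 0, -1
    for digit in digits:
        if digit >= 0:
            new_q = q * radix + digit
        else:
            new_q = qm * radix + (radix + digit)      # appended digit is non-negative
        if digit >= 1:
            new_qm = q * radix + (digit - 1)
        else:
            new_qm = qm * radix + (radix - 1 + digit)
        q, qm = new_q, new_qm                         # both updated from the old pair
    return q
```

For the digit sequence 1, -2, 0, 2, -1 in radix 4 the value is 256 - 128 + 0 + 8 - 1 = 135, and the function produces the same number using only concatenation-style updates.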

8. The method of any of claims 1-7, wherein performing a floating point addition operation on the floating point quotient and the mantissa of the floating point addend using a first carry-propagate adder to obtain an initial add result, and performing a rounding operation on the initial add result based on the floating point remainder to obtain a target add result, comprises:

performing an alignment shift operation on the mantissa of the floating point addend based on a second exponent difference determined by the exponents of the three floating point numbers, so that the mantissa of the floating point addend is aligned with the bit width of the floating point quotient;

inputting the mantissa of the floating point addend and the floating point quotient into the first carry-propagate adder to obtain the initial addition result, and performing leading zero prediction on the mantissa of the floating point addend and the floating point quotient to obtain a prediction result;

performing a normalization shift operation on the initial addition result based on the prediction result to obtain an intermediate addition result;

and performing a rounding operation on the intermediate addition result based on the floating point remainder and the preset mantissa bit width to obtain the target addition result.
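The sequence of claim 8 can be sketched on positive integers as follows. This is a minimal sketch under several assumptions that are not stated in the claims: both operands are positive, the remainder has already been corrected to be non-negative, rounding is round-to-nearest-even with the remainder acting as a sticky bit, and the leading zero count is taken from the sum itself as a stand-in for a true predictor working on the operands. The function name, parameter names, and return convention are hypothetical.

```python
def fused_add_round(quotient, addend, exp_diff, remainder, width, mantissa_bits):
    """Return (mantissa, leading_zeros) for quotient + aligned addend.

    Requires width > mantissa_bits. The rounded value equals
    mantissa * 2**(width - mantissa_bits - leading_zeros).
    """
    sticky = remainder != 0                 # nonzero remainder: inexact quotient
    # Alignment shift of the addend mantissa.
    if exp_diff >= 0:
        aligned = addend << exp_diff
    else:
        aligned = addend >> -exp_diff
        sticky = sticky or (addend & ((1 << -exp_diff) - 1)) != 0
    total = quotient + aligned              # carry-propagate addition
    assert 0 < total < (1 << width)
    # Stand-in for leading zero prediction (counted on the result here).
    leading_zeros = width - total.bit_length()
    normalized = total << leading_zeros     # most significant one at bit width - 1
    # Rounding to the preset mantissa width.
    drop = width - mantissa_bits
    kept = normalized >> drop
    guard = (normalized >> (drop - 1)) & 1
    sticky = sticky or (normalized & ((1 << (drop - 1)) - 1)) != 0
    if guard and (sticky or kept & 1):
        kept += 1
    if kept >> mantissa_bits:               # rounding overflowed the mantissa
        kept >>= 1
        leading_zeros -= 1
    return kept, leading_zeros
```

With a quotient of 13, no addend, a width of 8 and a 3-bit mantissa, the sum sits exactly halfway between two representable values. A zero remainder leaves the mantissa at 6 (ties to even), while a nonzero remainder rounds it up to 7, which illustrates why the remainder is carried into the rounding step.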

9. The method of claim 8, wherein performing a rounding operation on the intermediate add result based on the floating point remainder and a preset mantissa bit width to obtain the target add result comprises:

10. The method of claim 9, wherein performing a rounding operation on the intermediate add result based on the divisor result and the preset mantissa bit width to obtain the target add result comprises:

11. A floating point arithmetic processor, comprising:

12. An electronic device comprising a floating point arithmetic processor and a memory, wherein the memory stores a computer program which, when executed by the floating point arithmetic processor, causes the floating point arithmetic processor to perform the steps of the method of any one of claims 1 to 10.

13. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method according to any one of claims 1-10 when said computer program is run on the electronic device.

14. A computer program product comprising a computer program, the computer program being stored on a computer readable storage medium; when a floating point arithmetic processor of an electronic device reads the computer program from the computer readable storage medium, the floating point arithmetic processor executes the computer program such that the electronic device performs the steps of the method of any one of claims 1-10.