CN114756200A

CN114756200A - 64-bit adder for implementing a radix-4 Booth multiplier, implementation method thereof, arithmetic circuit and chip

Info

Publication number: CN114756200A
Application number: CN202210402682.3A
Authority: CN
Inventors: Not disclosed
Original assignee: Beijing Yuanqi Advanced Microelectronics Co ltd
Current assignee: Hangzhou Yuanhe Technology Co ltd
Priority date: 2022-04-02
Filing date: 2022-04-18
Publication date: 2022-07-15

Abstract

The embodiments of the application provide a 64-bit adder for implementing a radix-4 Booth multiplier, together with an implementation method, an arithmetic circuit and a chip. The adder comprises a multi-way carry save adder, configured to determine the bit positions, from bit 0 to bit 63, of 16 groups of 32-bit partial products carrying radix-4 Booth multiplication weights, to compress the partial products on each of bits 0 to 63, and to output two groups of 64-bit data; and a carry adder with a carry chain, comprising N carry modules, each corresponding to a plurality of bits of the two groups of 64-bit data. The preprocessing unit of each carry module preprocesses the corresponding bits of the two groups of 64-bit data; the carry calculation units generate the carry output of each bit corresponding to the nth carry module and the inter-stage carry parameter of the nth carry module; and a summation module, electrically connected to the N carry modules, processes the two groups of 64-bit data and obtains the corresponding summation result.

Description

64-bit adder for implementing a radix-4 Booth multiplier, implementation method thereof, arithmetic circuit and chip

Technical Field

The embodiments of the application relate to the field of circuits, and in particular to a 64-bit adder for implementing a radix-4 Booth multiplier, an implementation method thereof, an arithmetic circuit and a chip.

Background

The radix-4 Booth multiplier is a commonly used circuit in digital design. It appears in complex logic chips such as central processing units (CPUs) and graphics processing units (GPUs), and also in chips of broader integrated design such as micro controller units (MCUs) and field programmable gate arrays (FPGAs). A multiplication is generally performed in three steps: generating the partial products, compressing the partial products into two rows of vectors, and finally adding the two rows. Partial products are usually generated with radix-4 Booth encoding, which halves the number of partial products of the multiplier.
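The halving can be illustrated with a minimal Python sketch of radix-4 Booth recoding. It assumes a 32-bit two's-complement multiplier and the usual overlapping bit triplets (2k+1, 2k, 2k-1), with the bit below bit 0 taken as 0; the function names are illustrative and are not taken from the application.

```python
def booth_radix4_digits(y: int, bits: int = 32) -> list[int]:
    """Recode a `bits`-bit two's-complement multiplier into radix-4 Booth digits.

    Digit k is formed from bits (2k+1, 2k, 2k-1) of y, with bit -1 taken as 0:
        d_k = -2*y[2k+1] + y[2k] + y[2k-1]
    so every digit lies in {-2, -1, 0, 1, 2}.
    """
    assert bits % 2 == 0
    u = y & ((1 << bits) - 1)            # two's-complement bit pattern of y

    def bit(i: int) -> int:
        return 0 if i < 0 else (u >> i) & 1

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1)
            for k in range(bits // 2)]


def booth_product(x: int, y: int, bits: int = 32) -> int:
    """Signed product as the sum of bits/2 partial products d_k * x, weighted by 4**k."""
    digits = booth_radix4_digits(y, bits)
    return sum(d * x << (2 * k) for k, d in enumerate(digits))
```

For a 32-bit multiplier the recoding yields 16 digits, hence 16 partial products instead of 32. For instance, `booth_radix4_digits(6)[:2]` gives `[-2, 2]`, since 6 = -2 + 2·4.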

The technical problem to be solved is therefore how to obtain the final result quickly from the radix-4 Booth partial products, so as to improve the overall performance of the radix-4 Booth multiplier.

Disclosure of Invention

In view of the above, embodiments of the present application provide a 64-bit adder for implementing a radix-4 Booth multiplier, together with an implementation method, an arithmetic circuit and a chip, so as to overcome all or some of the above technical drawbacks.

In a first aspect, an embodiment of the present application provides a 64-bit adder for implementing a radix-4 Booth multiplier, which includes:

a multi-way carry save adder, configured to determine the bit positions, from bit 0 to bit 63, of 16 groups of 32-bit partial products carrying radix-4 Booth multiplication weights, to compress the partial products on each of bits 0 to 63 separately, and to output two groups of 64-bit data, wherein the number of carry save adders used for compression on each of bits 0 to 63 equals the number of partial products on that bit plus the number of sign bits on that bit, minus 2;

a carry adder with a carry chain, configured to add the two groups of 64-bit data, wherein the carry adder with the carry chain includes:

N carry modules, each corresponding to a plurality of bit positions of the two groups of 64-bit data, wherein the nth carry module is connected to the (n-1)th carry module and receives the inter-stage carry parameter output by the (n-1)th carry module; the multiplicand and the multiplier are 32-bit binary numbers, N is an integer less than or equal to 7, and n is an integer greater than 1 and less than or equal to N; each carry module comprises a preprocessing unit and a plurality of carry calculation units, one carry calculation unit corresponding to one bit of the two groups of 64-bit data; each partial product represents the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, where i is an integer from 0 to 31;

the preprocessing unit of the nth carry module is configured to preprocess the corresponding bits of the two groups of 64-bit data;

the carry calculation units of the nth carry module are configured to operate on the preprocessing result and the inter-stage carry parameter of the (n-1)th carry module, so as to generate the carry output of each bit corresponding to the nth carry module and the inter-stage carry parameter of the nth carry module; and

a summation module, electrically connected to the N carry modules and configured to process the two groups of 64-bit data when the sign bit gating control signal of the two groups of 64-bit data is valid, the processing comprising: inverting the most significant bit of each partial product, adding 1 at the most significant bit of the first partial product, and adding a bit with value 1 immediately above the most significant bit of every partial product; the summation module is further configured to operate on each bit of the processed two groups of 64-bit data and the corresponding carry output to obtain the corresponding summation result; wherein the sign bit gating control signal indicates that the partial product is the multiplicand multiplied by a negative multiple.

In a second aspect, the present application provides an implementation method of a 64-bit adder for implementing a radix-4 Booth multiplier, comprising:

receiving 16 groups of 32-bit partial products carrying radix-4 Booth multiplication weights, each partial product representing the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, where i is an integer from 0 to 31;

determining the bit positions, from bit 0 to bit 63, of the 16 groups of 32-bit partial products carrying radix-4 Booth multiplication weights, compressing the partial products on each of bits 0 to 63 separately, and outputting two groups of 64-bit data, wherein the number of carry save adders in the multi-way carry save adder used for compression on each of bits 0 to 63 equals the number of partial products on that bit plus the number of sign bits on that bit, minus 2;

dividing the two groups of 64-bit data obtained by compression into N data groups in order from the lowest bit to the highest bit, each data group comprising a plurality of bits of the two groups of 64-bit data, where N is an integer less than or equal to 7; each partial product represents the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, where i is an integer from 0 to 31;

preprocessing a plurality of bits contained in each data group;

calculating the carry outputs of the bits contained in each data group, wherein, for the nth of the N data groups, an operation is performed on the preprocessing result of the nth data group and the inter-stage carry parameter of the (n-1)th data group to generate the carry output of each bit of the nth data group and the inter-stage carry parameter of the nth data group, n being an integer greater than 1 and less than or equal to N;

when the sign bit gating control signal of the two groups of 64-bit data is valid, processing the partial products of the two groups of 64-bit data, the processing comprising: inverting the most significant bit of each partial product, adding 1 at the most significant bit of the first partial product, and adding a bit with value 1 immediately above the most significant bit of every partial product; wherein the sign bit gating control signal indicates that the partial product is the multiplicand multiplied by a negative multiple;

and operating on each bit of the processed two groups of 64-bit data and the corresponding carry output to obtain the corresponding summation result.

In a third aspect, the present application provides an arithmetic circuit comprising an adder provided according to any one of the embodiments of the first aspect.

In a fourth aspect, the present application provides a chip comprising an arithmetic circuit provided according to any embodiment of the third aspect.

The embodiments of the application provide a 64-bit adder for implementing a radix-4 Booth multiplier, together with an implementation method, an arithmetic circuit and a chip. A multi-way carry save adder determines the bit positions, from bit 0 to bit 63, of 16 groups of 32-bit partial products carrying radix-4 Booth weights, compresses the partial products on each of bits 0 to 63, and outputs two groups of 64-bit data. A carry adder with a carry chain then adds the two groups of 64-bit data. It comprises N carry modules, each corresponding to a plurality of bits of the two groups of 64-bit data and each containing a preprocessing unit and a plurality of carry calculation units. The preprocessing unit of the nth carry module preprocesses its bits, and the carry calculation units of the nth carry module operate on the preprocessing result and the inter-stage carry parameter of the (n-1)th carry module to generate the carry output of each bit of the nth carry module and the inter-stage carry parameter of the nth carry module. As a result, once the inter-stage carry parameter output by the (n-1)th carry module is available, every carry calculation unit in the nth carry module can compute the carry output of its bit in parallel, directly from the preprocessing result and that parameter. The carry outputs needed for the summation are thus computed essentially in parallel, which shortens the overall computation and increases the calculation speed.

Drawings

Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:

fig. 1 is a schematic diagram of a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application, used for summing 16 groups of partial products;

fig. 2 is a schematic structural diagram of a multi-way carry save adder in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of a carry adder with a carry chain in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application;

fig. 4 is a schematic circuit diagram of a first preprocessing unit in a carry module of a carry adder with a carry chain in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application;

fig. 5 is a schematic circuit diagram of a second preprocessing unit in a carry module of a carry adder with a carry chain in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a carry chain of a carry adder with a carry chain in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present disclosure;

fig. 7 is a schematic flowchart of an implementation method of a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application.

Detailed Description

The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.

Example one

Fig. 1 is a schematic diagram of a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present disclosure, which sums 16 groups of partial products. Each partial product represents the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, where i is an integer from 0 to 31. Specifically, the multi-way carry save adder determines the bit positions, from bit 0 to bit 63, of the 16 groups of 32-bit partial products carrying radix-4 Booth weights. Because the 16 partial products have different weights, they are arranged in a staggered manner, as shown in fig. 1. The multi-way carry save adder compresses the partial products on each of bits 0 to 63 and outputs two groups of 64-bit data; the number of carry save adders used for compression on each bit equals the number of partial products on that bit plus the number of sign bits on that bit, minus 2.
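The staggered arrangement can be checked with a small Python sketch that counts how many partial-product bits land in each of the 64 result columns. It assumes that each partial product is exactly 32 bits wide and that partial product k is shifted left by 2k bits; the sign bits mentioned above are not modelled, and the function name is illustrative.

```python
def column_heights(num_pps: int = 16, pp_width: int = 32,
                   shift_step: int = 2, out_bits: int = 64) -> list[int]:
    """Count the partial-product bits that fall into each result column.

    Partial product k occupies columns shift_step*k .. shift_step*k + pp_width - 1.
    Sign-handling bits and carries are deliberately not counted.
    """
    heights = [0] * out_bits
    for k in range(num_pps):
        base = shift_step * k
        for offset in range(pp_width):
            col = base + offset
            if col < out_bits:
                heights[col] += 1
    return heights
```

Under these assumptions bits 30 and 31 are the tallest columns, with 16 bits each; the height falls to 15 at bit 32, and the last occupied column is bit 61.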

Fig. 2 is a schematic structural diagram of the multi-way carry save adder in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application. The multi-way carry save adder performs 8-2 data compression on the 16 groups of partial products and outputs two groups of 64-bit data; the number of carry save adders on each bit of the 64-bit adder equals the number of partial products on that bit plus the number of sign bits on that bit, minus 2. For example, among the carry save adders for bits 14 to 18, bit 15 uses 7 carry save adders, bit 16 uses 8, bit 17 uses 7, and bit 18 uses 9.
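The compression idea can be modelled in software. The Python sketch below applies rows of 3:2 carry-save adders (full adders) to whole 64-bit words until only two rows remain, and confirms that their sum is unchanged. It assumes a simple greedy, Wallace-style schedule rather than the per-bit adder counts given above, and all names are illustrative.

```python
MASK64 = (1 << 64) - 1


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """One row of 3:2 carry-save adders applied bitwise to three 64-bit words.

    Returns (sum_word, carry_word) with a + b + c == sum + carry (mod 2**64).
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK64, carry & MASK64


def reduce_to_two(rows: list[int]) -> tuple[int, int]:
    """Compress any number of 64-bit rows into two rows using 3:2 adders."""
    rows = [r & MASK64 for r in rows]
    while len(rows) > 2:
        next_rows = []
        # Take the rows three at a time; leftover rows pass straight through.
        while len(rows) >= 3:
            s, c = csa(rows.pop(), rows.pop(), rows.pop())
            next_rows += [s, c]
        next_rows += rows
        rows = next_rows
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]
```

No carry propagates inside `csa`, so every layer has the delay of a single full adder regardless of word width; only the final two rows need a carry-propagating adder.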

Fig. 3 is a schematic structural diagram of a carry adder with a carry chain in a 64-bit adder for implementing a radix-4 Booth multiplier according to an embodiment of the present application. The carry adder with the carry chain in this embodiment may be an independent hardware circuit, or a basic circuit unit of another device such as a chip or a microprocessor. As shown in fig. 3, the carry adder with a carry chain provided in the embodiment of the present application includes N carry modules 10, where N is an integer less than or equal to 7. Each carry module corresponds to a plurality of bits of the two groups of 64-bit data, each of which is a 64-bit binary number; for example, one carry module may correspond to 2, 3 or more bits. It should be understood that the number of bits corresponding to each of the N carry modules 10 may be the same or different. Each partial product represents the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, where i is an integer from 0 to 31.

The nth carry module is connected to the (n-1)th carry module and receives the inter-stage carry parameter output by the (n-1)th carry module, so that both the inter-stage carry parameter of the nth carry module and the carry output of each bit of the nth carry module are calculated from the inter-stage carry parameter output by the (n-1)th carry module. Here n is an integer greater than 1 and less than or equal to N.

Each carry module comprises a preprocessing unit and a plurality of carry calculation units, one carry calculation unit corresponding to one bit of the two groups of 64-bit data.

In this embodiment, the preprocessing unit of the nth carry module is configured to preprocess the corresponding bits of the two groups of 64-bit data.

Optionally, in one implementation of the present application, the preprocessing result includes an intra-group carry generation signal and an intra-group carry propagation signal. The preprocessing unit of the nth carry module is specifically configured to: operate on each corresponding bit of the two groups of 64-bit data to generate a carry generation signal and a carry propagation signal for that bit; and generate the intra-group carry generation signal and the intra-group carry propagation signal of each bit from the carry generation signals and carry propagation signals of at least one corresponding bit.

Specifically, a logical AND is applied to each pair of corresponding bits of the two groups of 64-bit data to generate the carry generation signal of that bit; that is, the carry generation signal is the AND of the corresponding bits of the two data groups. A logical OR is applied to each pair of corresponding bits to generate the carry propagation signal of that bit; that is, the carry propagation signal is the OR of the corresponding bits. To ease the overall circuit layout, the embodiments of the present application sometimes also refer to the logical negation of a bit's carry generation signal as its carry generation signal, and likewise to the logical negation of a bit's carry propagation signal as its carry propagation signal.
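As a minimal bitwise model, the two 64-bit data groups can be treated as Python integers, so that bit i of each result word is the signal for bit position i. The inverted word corresponds to the active-low variants mentioned above; the helper names are illustrative.

```python
MASK64 = (1 << 64) - 1


def bit_generate_propagate(a: int, b: int) -> tuple[int, int]:
    """Per-bit carry generation G_i = A_i AND B_i and propagation P_i = A_i OR B_i.

    Bit i of each returned word is the signal for bit position i.
    """
    return a & b & MASK64, (a | b) & MASK64


def active_low(signal: int) -> int:
    """Logically negated (active-low) form of a 64-bit signal word."""
    return ~signal & MASK64


g, p = bit_generate_propagate(0b1100, 0b1010)
print(bin(g), bin(p))   # 0b1000 0b1110
```

Because both operand bits must be 1 for G_i and at least one for P_i, every bit set in the generation word is also set in the propagation word.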

After the carry generation signal and the carry propagation signal of each bit of the nth carry module are obtained, the preprocessing unit of the nth carry module can also apply a logical OR to the carry generation signals of several adjacent bits to generate an intra-group carry generation signal, and a logical AND to the carry propagation signals of several adjacent bits to generate an intra-group carry propagation signal. To ease the overall circuit layout, the embodiments of the present application sometimes also refer to the logical negation of the intra-group carry generation signal as the intra-group carry generation signal, and likewise to the logical negation of the intra-group carry propagation signal as the intra-group carry propagation signal.

For example, for the ith bit of the first addend A and the second addend B, the carry generation signal of the ith bit is $G_i = A_i \cdot B_i$ and the carry propagation signal of the ith bit is $P_i = A_i + B_i$. As described above, to ease the overall circuit layout, the carry generation signal and the carry propagation signal of the ith bit are also represented in inverted form as $\overline{G_i} = \overline{A_i \cdot B_i}$ and $\overline{P_i} = \overline{A_i + B_i}$, respectively.

The intra-group carry generation signal from the jth bit to the ith bit is $G_{i:j} = G_j + G_{j+1} + \cdots + G_i$, and the intra-group carry propagation signal from the jth bit to the ith bit is $P_{i:j} = P_j \cdot P_{j+1} \cdots P_i$. As described above, to ease the integrated layout of the circuit, these two signals are sometimes also represented in inverted form as $\overline{G_{i:j}}$ and $\overline{P_{i:j}}$.

Furthermore, $G_{i:j} = G_{i:k} + G_{k-1:j}$ and $P_{i:j} = P_{i:k} \cdot P_{k-1:j}$, where k is any bit lying between the jth bit and the ith bit in order from low to high ($j < k \le i$).

In this embodiment, the plurality of carry calculation units included in the nth carry module are configured to perform operation according to the result of the preprocessing and the inter-stage carry parameter of the (n-1) th carry module, and generate a carry output of each bit corresponding to the nth carry module and the inter-stage carry parameter of the nth carry module.

Optionally, in an embodiment of the present application, each carry calculation unit of the nth carry module is specifically configured to operate on the intra-group carry generation signal and intra-group carry propagation signal of its bit and the inter-stage carry parameter of the (n-1)th carry module, to generate the carry output of that bit.

For the highest bit in the multiple bit positions corresponding to the nth carry module, the carry calculation unit corresponding to the highest bit is further configured to use the carry parameter obtained in the calculation of the carry output of the highest bit corresponding to the nth carry module as the inter-stage carry parameter of the nth carry module.

The carry parameter is an intermediate quantity obtained while calculating the carry output of each bit, and it has a preset relation with the carry output: the carry output of each bit is the logical AND of the carry parameter of that bit and the carry propagation signal of that bit. For example, if the carry output of the ith bit is $C_i$, its carry propagation signal is $P_i$ and its carry parameter is $Cp_i$, the preset relation is $C_i = P_i \cdot Cp_i$.
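The relation $C_i = P_i \cdot Cp_i$ does not by itself fix what $Cp_i$ is. One well-known quantity that satisfies it is Ling's pseudo-carry $Cp_i = G_i + C_{i-1}$: since $G_i = 1$ implies $P_i = 1$, $P_i \cdot (G_i + C_{i-1}) = G_i + P_i \cdot C_{i-1} = C_i$. The Python sketch below assumes that choice (it is an illustration, not necessarily the carry parameter of the application) and checks the relation bit by bit against an ordinary ripple carry.

```python
def ripple_carries(a: int, b: int, width: int = 64) -> list[int]:
    """Reference carry outputs C_i = G_i + P_i * C_{i-1}, with C_{-1} = 0."""
    carries, c = [], 0
    for i in range(width):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        c = (a_i & b_i) | ((a_i | b_i) & c)
        carries.append(c)
    return carries


def ling_carry_parameters(a: int, b: int, width: int = 64) -> list[int]:
    """Assumed carry parameters Cp_i = G_i + C_{i-1} (Ling-style pseudo-carries)."""
    c = ripple_carries(a, b, width)
    return [((a >> i) & (b >> i) & 1) | (c[i - 1] if i > 0 else 0)
            for i in range(width)]
```

Because each $Cp_i$ only has to be ANDed with the locally available $P_i$, such a parameter can be passed between carry modules in place of the carry itself.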

Suppose the highest bit corresponding to the (n-1)th carry module is the (k-1)th bit. When the carry calculation units of the (n-1)th carry module calculate the carry output $C_{k-1}$ of the (k-1)th bit, they obtain the carry parameter $Cp_{k-1}$, which serves as the inter-stage carry parameter of the (n-1)th carry module. If the preprocessing unit of the nth carry module outputs the intra-group carry generation signal $G_{i:k}$ and the intra-group carry propagation signal $P_{i:k-1}$, the carry output of the ith bit is $C_i = G_{i:k} + P_{i:k-1} \cdot Cp_{k-1}$. Moreover, since $P_{i:k-1} \cdot Cp_{k-1} = P_{i:k} \cdot P_{k-1} \cdot Cp_{k-1} = P_{i:k} \cdot C_{k-1}$, the equivalent form $C_i = G_{i:k} + P_{i:k} \cdot C_{k-1}$ also holds.

Because $G_{i:k}$ and $P_{i:k-1}$ are produced by the preprocessing unit, the carry calculation unit for the ith bit in the nth carry module can obtain the carry output (or carry parameter) of the ith bit with simple logic as soon as the inter-stage carry parameter $Cp_{k-1}$ of the (n-1)th carry module arrives. Moreover, because the preprocessing unit of the nth carry module preprocesses all of its bits to obtain the corresponding intra-group carry generation and propagation signals, the carry calculation units of the nth carry module can calculate the carry output of every bit in parallel from those signals, which improves the efficiency of the carry calculation.
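A bit-level Python sketch of one carry module may help. It assumes, as one possibility, that the inter-stage carry parameter is a Ling-style pseudo-carry $Cp = G + C$ of the bit below (which satisfies $C = P \cdot Cp$), and it forms the intra-group carry generation signal with the textbook recurrence $G_{i:k} = G_i + P_i \cdot G_{i-1:k}$ so that the result is exact for groups of any width. The function and parameter names are hypothetical.

```python
def _bit(x: int, i: int) -> int:
    """Bit i of x; positions below 0 read as 0."""
    return (x >> i) & 1 if i >= 0 else 0


def module_carries(a: int, b: int, lo: int, hi: int, cp_in: int) -> tuple[list[int], int]:
    """Carry outputs C_lo..C_hi of one carry module covering bits lo..hi.

    cp_in is the inter-stage carry parameter Cp_{lo-1} of the previous module
    (0 for the first module). Returns (carries, cp_out) with cp_out = Cp_hi.
    """
    # Preprocessing: uses only a and b, so it can finish before cp_in arrives.
    group_g, group_p = [], []
    gg = 0
    pp = _bit(a, lo - 1) | _bit(b, lo - 1)          # P_{lo-1}
    for i in range(lo, hi + 1):
        g_i, p_i = _bit(a, i) & _bit(b, i), _bit(a, i) | _bit(b, i)
        gg = g_i | (p_i & gg)                        # G_{i:lo}
        pp = pp & p_i                                # P_{i:lo-1}
        group_g.append(gg)
        group_p.append(pp)

    # Carry calculation: one independent AND-OR per bit once cp_in is known.
    carries = [g | (p & cp_in) for g, p in zip(group_g, group_p)]

    # Inter-stage parameter of this module: Cp_hi = G_hi + C_{hi-1}.
    c_below_hi = carries[-2] if hi > lo else (_bit(a, lo - 1) | _bit(b, lo - 1)) & cp_in
    cp_out = (_bit(a, hi) & _bit(b, hi)) | c_below_hi
    return carries, cp_out
```

The preprocessing loop is sequential only because this is software; the carries themselves are a list comprehension over precomputed signals, mirroring the per-bit parallel evaluation described above.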

It should be appreciated that, to ease the integrated layout of the circuit, the carry parameter $Cp_{k-1}$ and the carry output $C_{k-1}$ are also sometimes represented in inverted form as $\overline{Cp_{k-1}}$ and $\overline{C_{k-1}}$.

In the embodiments of the present application, the preprocessing unit of the nth carry module preprocesses the corresponding bits of the two groups of 64-bit data, and the carry calculation units of the nth carry module operate on the preprocessing result and the inter-stage carry parameter of the (n-1)th carry module to generate the carry output of each bit of the nth carry module and the inter-stage carry parameter of the nth carry module. Therefore, once the inter-stage carry parameter output by the (n-1)th carry module is available, every carry calculation unit in the nth carry module can compute the carry output of its bit in parallel, directly from the preprocessing result and that parameter, so that the carry outputs of all bits of the 64-bit data are computed essentially in parallel.

In addition, as shown in fig. 3, the 64-bit adder for implementing the radix-4 Booth multiplier further includes a summation module electrically connected to the N carry modules. When the sign bit gating control signal of the two groups of 64-bit data is valid, the summation module processes the two groups of 64-bit data; the processing includes inverting the most significant bit of every partial product of the multiplicand and the multiplier, adding 1 at the most significant bit of the first partial product, and adding a bit with value 1 immediately above the most significant bit of every partial product. The summation module then operates on each bit of the processed two groups of 64-bit data and the corresponding carry output to obtain the corresponding summation result. The sign bit gating control signal indicates that the partial product is the multiplicand multiplied by a negative multiple.
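The processing above corresponds to the usual way of avoiding sign extension when signed partial products are added. The Python sketch below checks the identity numerically under stated assumptions: each partial product is already a complete two's-complement value of `width` bits (33 bits is assumed in the check), consecutive partial products are shifted by 2 bits, and the result is taken modulo 2^64. The separate +1 that completes the negation of a negative multiple is not modelled, and the names are illustrative.

```python
def add_with_sign_trick(pps: list[int], width: int, out_bits: int = 64) -> int:
    """Sum signed partial products pps[k] * 4**k without sign extension.

    Each pps[k] is a `width`-bit two's-complement value. Instead of extending
    its sign, the sign bit (MSB) is inverted and a constant is added once:
    a 1 at the MSB of the first partial product, plus a 1 one position above
    the MSB of every partial product.
    """
    # The leftover term 2**(width - 1 + 2*len(pps)) must vanish modulo 2**out_bits.
    assert width - 1 + 2 * len(pps) >= out_bits
    low_mask = (1 << width) - 1
    msb = 1 << (width - 1)
    total = 0
    for k, p in enumerate(pps):
        assert -msb <= p < msb, "partial product must fit in `width` bits"
        pattern = (p & low_mask) ^ msb          # invert the sign bit only
        total += pattern << (2 * k)
    constant = msb                              # +1 at the MSB of the first PP
    for k in range(len(pps)):
        constant += 1 << (width + 2 * k)        # +1 just above each PP's MSB
    return (total + constant) & ((1 << out_bits) - 1)
```

Inverting the sign bit turns every partial product p into p + 2^(width-1), and the added constant cancels the sum of these offsets modulo 2^64, which is why no sign-extension bits are needed.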

For example, for the ith bit of the first addend A and the second addend B, the summation result of the ith bit may be obtained by the following formula:

$$S_i = A_i \oplus B_i \oplus C_{i-1}$$

where $C_{i-1}$ is the carry output of the (i-1)th bit of the first addend A and the second addend B.

In this embodiment, since the carry outputs of all bits of the 64-bit data are calculated essentially in parallel, the sums of all bits can also be calculated essentially in parallel, which shortens the overall computation and increases the calculation speed.

Optionally, in an embodiment of the present application, the number of bits of the two groups of 64-bit data corresponding to the nth carry module is equal to or greater than the number of bits corresponding to the (n-1)th carry module.

Because the carry outputs of the nth carry module depend on the inter-stage carry parameter of the (n-1)th carry module, the carry calculation in the nth carry module lags that of the (n-1)th carry module by a certain logic delay. Making the nth carry module cover at least as many bits of the two groups of 64-bit data as the (n-1)th carry module lets this delay be used to compute the intra-group carry generation and propagation signals, so the nth carry module does not wait idle for the inter-stage carry parameter of the (n-1)th carry module, which further reduces the computation time.

Optionally, in an embodiment of the present application, N is equal to 7: the 1st carry module corresponds to bits 0 to 3 of the two groups of 64-bit data, the 2nd to bits 4 to 7, the 3rd to bits 8 to 15, the 4th to bits 16 to 31, the 5th to bits 32 to 48, the 6th to bits 49 to 58, and the 7th to bits 59 to 63. This arrangement keeps the adder layout compact and small in area and facilitates the overall structural layout.
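The seven-module partition can be exercised in a Python model of the whole carry chain followed by the bitwise sum $S_i = A_i \oplus B_i \oplus C_{i-1}$. The sketch assumes that the seventh module covers bits 59 to 63, so that the seven ranges tile bits 0 to 63 without overlap; it also assumes a Ling-style inter-stage parameter $Cp = G + C$ of the bit below and the textbook recurrence for the intra-group carry generation signal. The loops run sequentially in software, whereas the circuit evaluates the bits of a module in parallel. All names are hypothetical.

```python
MODULE_RANGES = [(0, 3), (4, 7), (8, 15), (16, 31), (32, 48), (49, 58), (59, 63)]


def _bit(x: int, i: int) -> int:
    """Bit i of x, with positions below 0 read as 0."""
    return (x >> i) & 1 if i >= 0 else 0


def carry_chain_add(a: int, b: int, ranges=MODULE_RANGES) -> int:
    """64-bit sum built from chained carry modules (software model only)."""
    assert ranges[0][0] == 0 and ranges[-1][1] == 63
    assert all(r[1] + 1 == s[0] for r, s in zip(ranges, ranges[1:])), "ranges must tile 0..63"
    carries = []                  # carries[i] = C_i
    cp = 0                        # inter-stage carry parameter (none into module 1)
    for lo, hi in ranges:
        p_below = _bit(a, lo - 1) | _bit(b, lo - 1)            # P_{lo-1}
        gg, pp = 0, p_below
        for i in range(lo, hi + 1):
            g_i, p_i = _bit(a, i) & _bit(b, i), _bit(a, i) | _bit(b, i)
            gg = g_i | (p_i & gg)                               # G_{i:lo}
            pp &= p_i                                           # P_{i:lo-1}
            carries.append(gg | (pp & cp))                      # C_i
        c_below_hi = carries[-2] if hi > lo else p_below & cp   # C_{hi-1}
        cp = (_bit(a, hi) & _bit(b, hi)) | c_below_hi           # Cp_hi = G_hi + C_{hi-1}
    total = 0
    for i in range(64):
        c_in = carries[i - 1] if i > 0 else 0
        total |= (_bit(a, i) ^ _bit(b, i) ^ c_in) << i          # S_i = A_i ^ B_i ^ C_{i-1}
    return total
```

Other module partitions can be passed through the `ranges` argument; for any valid partition the result should equal (a + b) mod 2^64.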

It should be understood that the number N of carry modules in this embodiment may also be 2, 4 or another value, and the bits corresponding to each carry module may be set as needed; this embodiment does not limit them.

Example two

Building on the 64-bit adder for implementing the radix-4 Booth multiplier provided in the first embodiment, this embodiment describes the structure of one carry module in the carry adder with the carry chain shown in fig. 3. It should be understood that this carry module may be any one of the N carry modules of the first embodiment; for convenience of description, it is referred to below as the nth carry module. In this embodiment, the preprocessing unit of the nth carry module includes at least one first preprocessing unit and at least one second preprocessing unit, arranged alternately.

In this embodiment, the first preprocessing unit is configured to operate on the ith bit and the (i-1)th bit of the corresponding two groups of 64-bit data to generate a first preprocessing result, where the first preprocessing result indicates the logical OR of the carry generation signals of the ith bit and the (i-1)th bit, and i is an odd number.

Optionally, in a specific implementation of the present application, as shown in fig. 4, the first preprocessing unit includes a first AND gate 201, a second AND gate 202 and a first NOR gate 203. The first and second input terminals of the first AND gate 201 receive the ith bits of the two data groups, and the output terminal of the first AND gate 201 is connected to the first input terminal of the first NOR gate 203; the first and second input terminals of the second AND gate 202 receive the (i-1)th bits of the two data groups, and the output terminal of the second AND gate 202 is connected to the second input terminal of the first NOR gate 203; the output terminal of the first NOR gate 203 outputs the first preprocessing result. For example, if the first addend is A and the second addend is B, the first preprocessing result is

$$\overline{G_i + G_{i-1}} = \overline{A_i \cdot B_i + A_{i-1} \cdot B_{i-1}}$$

where $G_i$ and $G_{i-1}$ are the carry generation signals of the ith bit and the (i-1)th bit.
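The first preprocessing units can be modelled for all odd bit positions at once by working on whole 64-bit words in Python. The sketch below mirrors the gate structure of fig. 4 (AND gates forming $G_i$ and $G_{i-1}$, a NOR gate combining them); the function and constant names are illustrative.

```python
MASK64 = (1 << 64) - 1
ODD_BITS = 0xAAAAAAAAAAAAAAAA    # bit positions 1, 3, 5, ..., 63


def first_preprocess_word(a: int, b: int) -> int:
    """NOT(G_i + G_{i-1}) for every odd i, returned at bit position i.

    Models one first preprocessing unit per odd bit: AND gates form G_i and
    G_{i-1}, and a NOR gate combines them (active-low output).
    """
    g = a & b & MASK64            # AND gates: per-bit carry generation
    pair_g = g | (g << 1)         # at bit i: G_i OR G_{i-1}
    return ~pair_g & ODD_BITS     # NOR output, kept only at odd positions
```

For example, if only bit 0 of both addends is 1, then $G_0 = 1$ and the unit at bit 1 outputs 0, while every other odd position still outputs 1.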

It should be understood that the first preprocessing unit may also be directly implemented by a nor gate, which is not limited in this embodiment.

In this embodiment, the second preprocessing unit is configured to operate on the jth bit and the (j-1)th bit of the corresponding two groups of 64-bit data to generate a second preprocessing result, where the second preprocessing result indicates the logical AND of the carry propagation signals of the jth bit and the (j-1)th bit, and j is an even number.

Optionally, in a specific implementation of the present application, as shown in fig. 5, the second preprocessing unit includes a first OR gate 301, a second OR gate 302 and a first NAND gate 303. The first and second input terminals of the first OR gate 301 receive the jth bit of the two addends, and its output terminal is connected to the first input terminal of the first NAND gate 303. The first and second input terminals of the second OR gate 302 receive the (j-1)th bit of the two addends, and its output terminal is connected to the second input terminal of the first NAND gate 303. The output terminal of the first NAND gate 303 outputs the second preprocessing result. For example, if the first addend is A and the second addend is B, the second preprocessing result is

$$\overline{P_j P_{j-1}} = \overline{(A_j + B_j)(A_{j-1} + B_{j-1})}$$

where P_j and P_{j-1} are the carry propagation signals of the jth bit and the (j-1)th bit, respectively.

It should be understood that the second preprocessing unit may also be implemented directly by a single OR-NAND (OR-AND-INVERT) compound gate; this embodiment places no limitation on this.
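The two pair-level preprocessing units can be modelled bit by bit. The sketch below is a behavioural Python model, not the patent's netlist: `first_pre` and `second_pre` are hypothetical names, and the carry propagation signal is assumed to be the OR of the two addend bits, as the OR gates 301 and 302 imply. It checks that the first unit outputs NOT(G_i OR G_{i-1}) and the second outputs NOT(P_j AND P_{j-1}) for every input combination.

```python
from itertools import product


def first_pre(a_i: int, b_i: int, a_im1: int, b_im1: int) -> int:
    """First preprocessing unit (i odd): AND 201, AND 202, NOR 203."""
    g_i = a_i & b_i              # AND gate 201: carry generation signal G_i
    g_im1 = a_im1 & b_im1        # AND gate 202: carry generation signal G_{i-1}
    return 1 - (g_i | g_im1)     # NOR gate 203: complemented OR of G_i, G_{i-1}


def second_pre(a_j: int, b_j: int, a_jm1: int, b_jm1: int) -> int:
    """Second preprocessing unit (j even): OR 301, OR 302, NAND 303."""
    p_j = a_j | b_j              # OR gate 301: carry propagation signal P_j
    p_jm1 = a_jm1 | b_jm1        # OR gate 302: carry propagation signal P_{j-1}
    return 1 - (p_j & p_jm1)     # NAND gate 303: complemented AND of P_j, P_{j-1}


# Exhaustive check over the four input bits of each unit.
for a1, b1, a0, b0 in product((0, 1), repeat=4):
    assert first_pre(a1, b1, a0, b0) == int(not ((a1 and b1) or (a0 and b0)))
    assert second_pre(a1, b1, a0, b0) == int(not ((a1 or b1) and (a0 or b0)))
```

In a carry module covering bits 4 to 7, the first units would sit on the pairs (5, 4) and (7, 6) and the second units on (4, 3) and (6, 5), which gives the alternating arrangement of this embodiment.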

Correspondingly, the carry calculation units of the nth carry module obtain the carry output of their bits from at least one first preprocessing result, at least one second preprocessing result, and the inter-stage carry parameter of the (n-1)th carry module.

Optionally, in an embodiment of the present application, the preprocessing units of the nth carry module further include a third preprocessing unit and a fourth preprocessing unit. The third preprocessing unit operates on at least two adjacent first preprocessing results together with a second preprocessing result to generate a third preprocessing result, and the fourth preprocessing unit operates on at least two adjacent second preprocessing results to generate a fourth preprocessing result. The third preprocessing result indicates the carry parameter spanning the corresponding adjacent bits, and the fourth preprocessing result indicates the logical AND of the carry propagation signals of the corresponding adjacent bits. The carry calculation units of the nth carry module then obtain the carry output of their bits from the third preprocessing result, the fourth preprocessing result and the inter-stage carry parameter of the (n-1)th carry module.

For example, the third preprocessing unit operates on the first preprocessing results GON_7_6 and GON_5_4 and the second preprocessing result PAN_6_5 to generate GON_7_4, which indicates the carry parameter spanning the 4th bit to the 7th bit. The fourth preprocessing unit operates on the second preprocessing results PAN_6_5 and PAN_4_3 to generate PAN_6_3, which indicates the logical AND of the carry propagation signals of the 3rd bit through the 6th bit, i.e., an intra-group carry propagation signal. The corresponding carry calculation unit can then obtain the carry output of the 7th bit from the third preprocessing result GON_7_4 and the fourth preprocessing result PAN_6_3, combined with the inter-stage carry parameter of the (n-1)th carry module.
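One way to read the third and fourth preprocessing units is as the group-merging step of a Ling-style adder, in which GON_7_4 is the complement of the group pseudo-carry H_{7:4} = G_7 + G_6 + P_6·G_5 + P_6·P_5·G_4. The sketch below is built on that reading, which is an interpretation rather than something the text states, and on assumed gate forms GON_7_4 = GON_7_6·(PAN_6_5 + GON_5_4) and PAN_6_3 = PAN_6_5 + PAN_4_3. All function names are hypothetical. It checks both merged results against the single-bit definitions for every pair of 8-bit addends.

```python
def bit(x: int, k: int) -> int:
    return (x >> k) & 1


def gon(a: int, b: int, i: int) -> int:
    """First preprocessing result for the pair (i, i-1): NOT(G_i + G_{i-1})."""
    return 1 - ((bit(a, i) & bit(b, i)) | (bit(a, i - 1) & bit(b, i - 1)))


def pan(a: int, b: int, j: int) -> int:
    """Second preprocessing result for the pair (j, j-1): NOT(P_j * P_{j-1})."""
    return 1 - ((bit(a, j) | bit(b, j)) & (bit(a, j - 1) | bit(b, j - 1)))


def third_pre(gon_hi: int, gon_lo: int, pan_mid: int) -> int:
    """Assumed third unit: GON_7_4 = GON_7_6 AND (PAN_6_5 OR GON_5_4)."""
    return gon_hi & (pan_mid | gon_lo)


def fourth_pre(pan_hi: int, pan_lo: int) -> int:
    """Assumed fourth unit: PAN_6_3 = PAN_6_5 OR PAN_4_3 = NOT(P6 P5 P4 P3)."""
    return pan_hi | pan_lo


def reference(a: int, b: int) -> tuple[int, int]:
    """Complemented H_{7:4} and complemented P6 P5 P4 P3 from single-bit signals."""
    g = [bit(a, k) & bit(b, k) for k in range(8)]
    p = [bit(a, k) | bit(b, k) for k in range(8)]
    h_7_4 = g[7] | g[6] | (p[6] & g[5]) | (p[6] & p[5] & g[4])
    return 1 - h_7_4, 1 - (p[6] & p[5] & p[4] & p[3])


for a in range(256):
    for b in range(256):
        gon_7_4 = third_pre(gon(a, b, 7), gon(a, b, 5), pan(a, b, 6))
        pan_6_3 = fourth_pre(pan(a, b, 6), pan(a, b, 4))
        assert (gon_7_4, pan_6_3) == reference(a, b)
```

The merge relies on G_5 implying P_5, which holds because the propagate signal is formed with OR gates.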

Optionally, in an embodiment of the present application, the carry calculation units of the nth carry module include a first carry calculation unit corresponding to the ith bit, and the first carry calculation unit includes a third OR gate, a third AND gate and a second NOR gate;

a first input end of a third OR gate is connected to an output end of the corresponding second preprocessing unit, a second input end of the third OR gate is connected to the inter-stage carry parameter output by the (n-1) th carry module, an output end of the third OR gate is connected to a first input end of the third AND gate, a second input end of the third AND gate is connected to an output end of the corresponding first preprocessing unit, and an output end of the third AND gate outputs the carry parameter of the ith bit;

the output end of the third AND gate is connected to the first input end of the second NOR gate, the second input end of the second NOR gate receives the carry propagation signal of the ith bit, and the output end of the second NOR gate is connected to the summation module so as to output the carry output of the ith bit to the summation module.
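The first carry calculation unit can be checked numerically under a Ling-style reading of the signals, in which H_k = G_k + c_{k-1} and therefore c_k = P_k·H_k when P is the OR of the addend bits. The sketch assumes three polarities the text does not spell out: the inter-stage carry parameter is the complemented pseudo-carry NOT(G_3 + c_2) of the module ending at bit 3, the third AND gate outputs NOT(H_i), and the second NOR gate receives the complemented propagate signal NOT(P_i), so that its output is P_i·H_i = c_i. The merged GON_7_4 is formed with the assumed gate GON_7_6·(PAN_6_5 + GON_5_4). `first_carry_unit` and `ripple_carries` are hypothetical names. It checks bits 5 and 7 of a module covering bits 4 to 7 against a ripple-carry reference for all 8-bit addends.

```python
def bit(x: int, k: int) -> int:
    return (x >> k) & 1


def ripple_carries(a: int, b: int, width: int = 8) -> list[int]:
    """Reference: carry out of each bit position, computed by plain ripple."""
    out, carry = [], 0
    for k in range(width):
        g, p = bit(a, k) & bit(b, k), bit(a, k) | bit(b, k)
        carry = g | (p & carry)
        out.append(carry)
    return out


def first_carry_unit(pan_in: int, h_prev_n: int, gon_in: int, p_i_n: int) -> tuple[int, int]:
    """Third OR gate, third AND gate, second NOR gate (bit i odd)."""
    h_i_n = (pan_in | h_prev_n) & gon_in   # complemented carry parameter of bit i
    c_i = 1 - (h_i_n | p_i_n)              # second NOR gate: carry output of bit i
    return h_i_n, c_i


for a in range(256):
    for b in range(256):
        c = ripple_carries(a, b)
        g = [bit(a, k) & bit(b, k) for k in range(8)]
        p = [bit(a, k) | bit(b, k) for k in range(8)]
        gon = lambda i: 1 - (g[i] | g[i - 1])   # first preprocessing result
        pan = lambda j: 1 - (p[j] & p[j - 1])   # second preprocessing result
        h3_n = 1 - (g[3] | c[2])                # assumed inter-stage parameter
        # Bit 5: uses GON_5_4 and PAN_4_3 directly.
        _, c5 = first_carry_unit(pan(4), h3_n, gon(5), 1 - p[5])
        # Bit 7: uses merged GON_7_4 and PAN_6_3 (assumed third/fourth gates).
        gon_7_4 = gon(7) & (pan(6) | gon(5))
        pan_6_3 = pan(6) | pan(4)
        _, c7 = first_carry_unit(pan_6_3, h3_n, gon_7_4, 1 - p[7])
        assert (c5, c7) == (c[5], c[7])
```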

Optionally, in an embodiment of the present application, the carry calculation units further include a second carry calculation unit corresponding to the jth bit, and the second carry calculation unit includes a fourth OR gate and a second NAND gate.

The first input end of the fourth OR gate is connected to the output end of the corresponding second preprocessing unit; the second input end of the fourth OR gate receives the inter-stage carry parameter output by the (n-1)th carry module or the carry parameter of the (j-1)th bit; the output end of the fourth OR gate is connected to the first input end of the second NAND gate; the second input end of the second NAND gate receives the carry generation signal of the jth bit; and the output end of the second NAND gate is connected to the summation module to output the carry output of the jth bit to the summation module.
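The second carry calculation unit can be checked the same way. Under a Ling-style reading (H_k = G_k + c_{k-1}, P formed with OR), an even bit satisfies c_j = G_j + P_j·P_{j-1}·H_{j-1}, which matches a fourth OR gate followed by a NAND gate if two polarities are assumed: the carry parameter fed to the OR gate is NOT(H_{j-1}) (or the inter-stage parameter NOT(H_3) for bit 4), and the NAND gate receives the complemented generate signal NOT(G_j). These polarities, the assumed form NOT(H_5) = GON_5_4·(PAN_4_3 + NOT(H_3)), and the names `second_carry_unit` and `ripple_carries` are assumptions of this sketch.

```python
def bit(x: int, k: int) -> int:
    return (x >> k) & 1


def ripple_carries(a: int, b: int, width: int = 8) -> list[int]:
    """Reference: carry out of each bit position, computed by plain ripple."""
    out, carry = [], 0
    for k in range(width):
        g, p = bit(a, k) & bit(b, k), bit(a, k) | bit(b, k)
        carry = g | (p & carry)
        out.append(carry)
    return out


def second_carry_unit(pan_in: int, h_prev_n: int, g_j_n: int) -> int:
    """Fourth OR gate and second NAND gate (bit j even)."""
    or4 = pan_in | h_prev_n        # fourth OR gate
    return 1 - (or4 & g_j_n)       # second NAND gate: carry output of bit j


for a in range(256):
    for b in range(256):
        c = ripple_carries(a, b)
        g = [bit(a, k) & bit(b, k) for k in range(8)]
        p = [bit(a, k) | bit(b, k) for k in range(8)]
        gon = lambda i: 1 - (g[i] | g[i - 1])   # first preprocessing result
        pan = lambda j: 1 - (p[j] & p[j - 1])   # second preprocessing result
        h3_n = 1 - (g[3] | c[2])                # assumed inter-stage parameter
        h5_n = gon(5) & (pan(4) | h3_n)         # assumed carry parameter of bit 5
        c4 = second_carry_unit(pan(4), h3_n, 1 - g[4])
        c6 = second_carry_unit(pan(6), h5_n, 1 - g[6])
        assert (c4, c6) == (c[4], c[6])
```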

In this embodiment, the first, second, third and fourth preprocessing units of each carry module preprocess the bits of the 2 groups of 64-bit data assigned to that module, and each carry module contains one carry calculation unit per bit. Therefore, as soon as a carry module receives the inter-stage carry parameter output by the previous carry module, its carry calculation units can combine the preprocessing results with that parameter to compute the carry outputs of all of its bits in parallel, so that the carry outputs of the bits are obtained essentially in parallel.

As shown in fig. 6, the 1st carry module corresponds to the 0th to 3rd bits of the 2 groups of 64-bit data, the 2nd carry module to the 4th to 7th bits, the 3rd carry module to the 8th to 15th bits, the 4th carry module to the 16th to 23rd bits, and the 5th carry module to the 24th to 31st bits.
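The sketch below is a behavioural model of the carry chain with the fig. 6 partition, under a Ling-style reading of the signals (H_k = G_k + c_{k-1}, c_k = P_k·H_k, with P formed by OR). That reading is an interpretation, and the function names are hypothetical. Within a module starting at bit lo, each bit's pseudo-carry is a module-local group term plus the product P_{lo-1}···P_{k-1} times the incoming inter-stage parameter H_{lo-1}. Every bit of a module therefore needs only local signals and one value from the previous module. Signals are kept in true polarity for readability.

```python
import random

FIG6_MODULES = [(0, 3), (4, 7), (8, 15), (16, 23), (24, 31)]


def bit(x: int, k: int) -> int:
    return (x >> k) & 1 if k >= 0 else 0


def module_carries(a: int, b: int, lo: int, hi: int, h_in: int) -> tuple[list[int], int]:
    """Carry outputs of bits lo..hi, given h_in = H_{lo-1} (0 for module 1).
    Returns (carries, H_hi); H_hi is the next module's inter-stage parameter."""
    g = lambda k: bit(a, k) & bit(b, k)
    p = lambda k: bit(a, k) | bit(b, k)
    carries = []
    for k in range(lo, hi + 1):
        # Module-local term: sum over m of G_m * P_{m+1} .. P_{k-1}.
        h_local = 0
        for m in range(lo, k + 1):
            term = g(m)
            for t in range(m + 1, k):
                term &= p(t)
            h_local |= term
        # Previous module's contribution: P_{lo-1} .. P_{k-1} * H_{lo-1}.
        chain = h_in
        for t in range(lo - 1, k):
            chain &= p(t)
        h_k = h_local | chain
        carries.append(p(k) & h_k)   # c_k = P_k * H_k
    return carries, h_k


def chain_carries(a: int, b: int, modules=FIG6_MODULES) -> list[int]:
    h, out = 0, []
    for lo, hi in modules:
        c, h = module_carries(a, b, lo, hi, h)
        out.extend(c)
    return out


def reference_carries(a: int, b: int, width: int = 32) -> list[int]:
    """Carry out of bit k equals bit k+1 of (a + b) XOR a XOR b."""
    x = (a + b) ^ a ^ b
    return [(x >> (k + 1)) & 1 for k in range(width)]


rng = random.Random(0)
for _ in range(500):
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    assert chain_carries(a, b) == reference_carries(a, b)
```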

In addition, arranging the first, second, third and fourth preprocessing units and the first and second carry calculation units in a regular pattern can increase the computation speed of the 64-bit adder for implementing the radix-4 Booth multiplier while reducing its area. The wiring is also kept compact, which simplifies the overall layout.

It should be noted that fig. 6 is only one specific example of the carry chain of the carry adder in the 64-bit adder for implementing a radix-4 Booth multiplier provided in this embodiment. The number of carry modules may be 2, 4, or more as actually needed, and the bits assigned to each carry module may be set as required; this embodiment places no limitation on this.

EXAMPLE III

Based on the 64-bit adder for implementing a radix-4 Booth multiplier provided in the foregoing embodiments, an embodiment of the present application provides a method for implementing that adder. Fig. 7 is a flowchart of this method. As shown in fig. 7, the method includes:

S501: receiving 16 groups of 32-bit partial products with radix-4 Booth weights, where each partial product represents the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, and i is an integer greater than or equal to 0 and less than or equal to 31;
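For reference, the standard radix-4 Booth recoding that yields the 16 partial products can be sketched as follows. The sketch assumes a 32-bit two's-complement multiplier, steps i over the even positions 0, 2, ..., 30 (one digit per position, 16 in total), and treats bit -1 as 0; `booth_digits` and `partial_products` are hypothetical names. Each digit d = -2·y_{i+1} + y_i + y_{i-1} lies in {-2, -1, 0, 1, 2}, and the partial products d·X, weighted by 4^(i/2), sum to X·Y.

```python
def booth_digits(y: int, n: int = 32) -> list[int]:
    """Radix-4 Booth digits of an n-bit two's-complement multiplier y."""
    u = y & ((1 << n) - 1)                       # n-bit two's-complement pattern

    def bit(i: int) -> int:
        return (u >> i) & 1 if i >= 0 else 0     # y_{-1} is taken as 0

    # One digit per even position i, from the bit triplet (i+1, i, i-1).
    return [-2 * bit(i + 1) + bit(i) + bit(i - 1) for i in range(0, n, 2)]


def partial_products(x: int, y: int) -> list[int]:
    """Signed partial products d_k * x; the kth one carries weight 4**k."""
    return [d * x for d in booth_digits(y)]


x, y = -123456789, 987654321
pps = partial_products(x, y)
assert len(pps) == 16
assert all(d in (-2, -1, 0, 1, 2) for d in booth_digits(y))
assert sum(pp * 4**k for k, pp in enumerate(pps)) == x * y
```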

S502: determining the bit positions, from the 0th to the 63rd, occupied by the 16 groups of 32-bit partial products with radix-4 Booth weights, compressing the partial products in each of the 0th to 63rd bit positions, and outputting 2 groups of 64-bit data, where the number of carry-save adders of the multi-path carry-save adder used for compression in each of the 0th to 63rd bit positions equals the number of partial products in that position plus the number of sign bits, minus 2;
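The "minus 2" in the adder count is consistent with the fact that each 3:2 carry-save step turns three operands into two. The word-level sketch below illustrates this under simplifying assumptions: all rows are padded to 64 bits and reduced one 3:2 step at a time, rather than with the per-bit-position compressor arrangement of the patent, and `csa` and `compress` are hypothetical names. Reducing 16 rows to 2 takes 16 - 2 = 14 steps, and the two output rows keep the same sum modulo 2^64.

```python
MASK64 = (1 << 64) - 1


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on 64-bit words: x + y + z == s + c (mod 2**64)."""
    s = x ^ y ^ z                                # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit majority, moved up one bit
    return s & MASK64, c & MASK64


def compress(rows: list[int]) -> tuple[list[int], int]:
    """Reduce rows to two with 3:2 steps; returns (two rows, number of steps)."""
    rows = [r & MASK64 for r in rows]
    steps = 0
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        rows.extend(csa(x, y, z))                # three rows in, two rows out
        steps += 1
    return rows, steps


rows = [(0x0123456789ABCDEF * (k + 1)) << (2 * k) for k in range(16)]
(s, c), steps = compress(rows)
assert steps == len(rows) - 2
assert (s + c) & MASK64 == sum(rows) & MASK64
```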

S503: dividing the compressed 2 groups of 64-bit data into N data groups in order of bit position from low to high, where each data group contains a plurality of bit positions of the 2 groups of 64-bit data and N is an integer less than or equal to 7;

S504: preprocessing the bits contained in each data group;

S505: calculating the carry outputs of the bits contained in each data group, where, for the nth of the N data groups, an operation is performed on the preprocessing result of the nth data group and the inter-stage carry parameter of the (n-1)th data group to generate the carry output of each bit of the nth data group and the inter-stage carry parameter of the nth data group, n being an integer greater than 1 and less than or equal to N;
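At the data-group level, S503 to S505 amount to a two-level carry computation. Each group summarises its bits as a group generate/propagate pair, the inter-stage carries then pass across at most seven groups, and the bits inside each group need only their own signals plus the incoming inter-stage value. The sketch below uses the conventional carry-lookahead formulation (group G and P, carry out = G + P·carry in) rather than the Ling-style signals of the second embodiment. It borrows the seven-group split of claim 2, reading the last range as bits 59 to 63 so that the groups are contiguous; that reading and all names are assumptions.

```python
import random

MASK64 = (1 << 64) - 1
# Hypothetical 7-group split (claim 2, last group read as bits 59..63).
GROUPS = [(0, 3), (4, 7), (8, 15), (16, 31), (32, 48), (49, 58), (59, 63)]


def gp(a: int, b: int, k: int) -> tuple[int, int]:
    """Bit-level generate and OR-based propagate of position k."""
    ak, bk = (a >> k) & 1, (b >> k) & 1
    return ak & bk, ak | bk


def group_gp(a: int, b: int, lo: int, hi: int) -> tuple[int, int]:
    """Group generate/propagate of bits lo..hi (carry out = G + P * carry in)."""
    G, P = 0, 1
    for k in range(lo, hi + 1):
        g, p = gp(a, b, k)
        G, P = g | (p & G), P & p
    return G, P


def carries_by_group(a: int, b: int, groups=GROUPS) -> int:
    """Carry-out vector (bit k = carry out of bit k), computed group by group."""
    # Step 1: inter-stage carries from the group (G, P) pairs only.
    inter, c = [], 0
    for lo, hi in groups:
        inter.append(c)                  # carry entering this group
        G, P = group_gp(a, b, lo, hi)
        c = G | (P & c)
    # Step 2: each group's bits from local signals and its incoming carry;
    # the groups no longer depend on one another here.
    vec = 0
    for (lo, hi), c_in in zip(groups, inter):
        c = c_in
        for k in range(lo, hi + 1):
            g, p = gp(a, b, k)
            c = g | (p & c)
            vec |= c << k
    return vec


rng = random.Random(1)
for _ in range(2000):
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    assert carries_by_group(a, b) == ((a + b) ^ a ^ b) >> 1
```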

S506: when the sign bit gating control signal of the 2 groups of 64-bit data is asserted, processing the 2 groups of 64-bit data, the processing including: inverting the highest (sign) bit of every partial product of the multiplicand and the multiplier, adding 1 at the highest bit of the first partial product, and prepending to every partial product a bit of value 1 immediately above its highest bit; the sign bit gating control signal indicates that a partial product is a negative multiple of the multiplicand;
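The processing in S506 matches the usual sign-extension-elimination technique for Booth partial products, and it can be checked numerically. The sketch assumes that each partial product d·X is held as a W = 34-bit two's-complement pattern with its sign at bit W-1 (34 bits is enough for ±2X with a 32-bit signed X), that the negative multiple is already fully formed, and that everything is summed modulo 2^64. `W`, `K`, `booth_digits` and `booth_multiply` are hypothetical names. Inverting each sign bit, adding 1 at the sign position of the first partial product, and adding a 1 just above each partial product's sign bit reproduces X·Y without any sign extension.

```python
MASK64 = (1 << 64) - 1
W = 34   # assumed partial-product width: enough for +/-2X with 32-bit signed X
K = 16   # radix-4 Booth partial products for a 32-bit multiplier


def booth_digits(y: int) -> list[int]:
    """Radix-4 Booth digits of a 32-bit two's-complement multiplier."""
    u = y & 0xFFFFFFFF

    def bit(i: int) -> int:
        return (u >> i) & 1 if i >= 0 else 0

    return [-2 * bit(i + 1) + bit(i) + bit(i - 1) for i in range(0, 32, 2)]


def booth_multiply(x: int, y: int) -> int:
    total = 0
    for k, d in enumerate(booth_digits(y)):
        pp = (d * x) & ((1 << W) - 1)    # W-bit two's-complement partial product
        pp ^= 1 << (W - 1)               # invert its highest (sign) bit
        total += pp << (2 * k)           # kth partial product has weight 4**k
    total += 1 << (W - 1)                # +1 at the highest bit of the first product
    for k in range(K):
        total += 1 << (W + 2 * k)        # a 1 just above each product's highest bit
    return total & MASK64


for x, y in [(3, 5), (-7, 9), (-2**31, -2**31), (2**31 - 1, -2**31), (123456789, -987654321)]:
    assert booth_multiply(x, y) == (x * y) & MASK64
```

The constant works because each inverted sign bit adds 2^(W-1+2k) too much. The accumulated correction, taken modulo 2^64, equals a single 1 at the first sign position plus a 1 above every sign bit.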

S507: performing an operation on each bit of the processed 2 groups of 64-bit data and the corresponding carry output to obtain the summation result.
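Once the carry outputs are known, sum bit k is the XOR of the two data bits at position k and the carry out of bit k-1. The minimal word-level sketch below assumes that the carry outputs are packed into one 64-bit integer, with bit k holding the carry out of bit k. This layout and the names `carry_out_vector` and `final_sum` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def carry_out_vector(a: int, b: int) -> int:
    """Carry out of every bit position, by plain ripple (reference only)."""
    vec, c = 0, 0
    for k in range(64):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        c = (ak & bk) | ((ak | bk) & c)
        vec |= c << k
    return vec


def final_sum(a: int, b: int, carries: int) -> int:
    """Sum bit k = a_k XOR b_k XOR (carry out of bit k-1); bit 0 has no carry in."""
    return (a ^ b ^ (carries << 1)) & MASK64


a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543211
assert final_sum(a, b, carry_out_vector(a, b)) == (a + b) & MASK64
```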

The method provided in this embodiment implements the 64-bit adder for implementing the radix-4 Booth multiplier of the foregoing device embodiments and has the same beneficial effects as those embodiments; the details are not repeated here.

EXAMPLE IV

An embodiment of the present application provides an arithmetic circuit including the 64-bit adder for implementing a radix-4 Booth multiplier according to either the first or the second embodiment. The principle and effects are similar and are not described again here.

EXAMPLE V

An embodiment of the present application provides a chip including the arithmetic circuit according to the fourth embodiment. The principle and effects are similar and are not described again here.

The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described briefly; for relevant points, refer to the description of the method embodiment.

The above description is only an example of the present application and is not intended to limit it. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims

1. A 64-bit adder for implementing a radix-4 Booth multiplier, the 64-bit adder for implementing a radix-4 Booth multiplier comprising:

a multi-path carry-save adder, configured to determine the bit positions, from the 0th to the 63rd, occupied by 16 groups of 32-bit partial products with radix-4 Booth weights, compress the partial products in each of the 0th to 63rd bit positions, and output 2 groups of 64-bit data, wherein the number of carry-save adders used for compression in each of the 0th to 63rd bit positions equals the number of partial products in that position plus the number of sign bits, minus 2;

a carry adder with a carry chain, configured to add the 2 groups of 64-bit data, the carry adder with the carry chain comprising:

N carry modules, each carry module corresponding to a plurality of bit positions of the 2 groups of 64-bit data, wherein the nth carry module is connected to the (n-1)th carry module to receive the inter-stage carry parameter output by the (n-1)th carry module; the multiplicand and the multiplier are 32-bit binary numbers, N is an integer less than or equal to 7, and n is an integer greater than 1 and less than or equal to N; each carry module comprises preprocessing units and a plurality of carry calculation units, one carry calculation unit corresponding to one bit of the 2 groups of 64-bit data; each partial product represents the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, i being an integer greater than or equal to 0 and less than or equal to 31;

a summation module, electrically connected to the N carry modules and configured to: when the sign bit gating control signal of the 2 groups of 64-bit data is asserted, process the 2 groups of 64-bit data, the processing comprising inverting the highest bit of every partial product, adding 1 at the highest bit of the first partial product, and prepending to every partial product a bit of value 1 immediately above its highest bit; and perform an operation on each bit of the processed 2 groups of 64-bit data and the corresponding carry output to obtain the summation result; wherein the sign bit gating control signal indicates that a partial product is a negative multiple of the multiplicand.

2. The 64-bit adder for implementing a radix-4 Booth multiplier according to claim 1, wherein N is equal to 7; the 1st carry module corresponds to the 0th to 3rd bits of the 2 groups of 64-bit data, the 2nd carry module to the 4th to 7th bits, the 3rd carry module to the 8th to 15th bits, the 4th carry module to the 16th to 31st bits, the 5th carry module to the 32nd to 48th bits, the 6th carry module to the 49th to 58th bits, and the 7th carry module to the 59th to 63rd bits.

3. The 64-bit adder for implementing a radix-4 Booth multiplier according to claim 2, wherein the preprocessing result comprises an intra-group carry generation signal and an intra-group carry propagation signal;

the preprocessing units of the nth carry module are specifically configured to: operate on each bit of the corresponding 2 groups of 64-bit data to generate a carry generation signal and a carry propagation signal for that bit; and generate an intra-group carry generation signal and an intra-group carry propagation signal for each bit from the carry generation signals and carry propagation signals of the corresponding at least one bit;

each carry calculation unit of the nth carry module is specifically configured to perform an operation on the intra-group carry generation signal and intra-group carry propagation signal of the corresponding bit and the inter-stage carry parameter of the (n-1)th carry module to generate the carry output of that bit.

4. The 64-bit adder for implementing a radix-4 Booth multiplier according to claim 3, wherein the carry calculation unit corresponding to the highest bit of the nth carry module is further configured to use the carry parameter obtained while computing the carry output of that highest bit as the inter-stage carry parameter of the nth carry module, the carry output of the highest bit being obtained by an operation on the carry parameter of the highest bit and the carry propagation signal of the highest bit.

5. The 64-bit adder for implementing a radix-4 Booth multiplier according to claim 4, wherein the preprocessing units of the nth carry module comprise at least one first preprocessing unit and at least one second preprocessing unit arranged alternately;

the first preprocessing unit is configured to operate on the ith bit and the (i-1)th bit of the corresponding 2 groups of 64-bit data to generate a first preprocessing result, the first preprocessing result indicating the logical OR of the carry generation signals of the ith bit and the (i-1)th bit, i being an odd number;

the second preprocessing unit is configured to operate on the jth bit and the (j-1)th bit of the corresponding 2 groups of 64-bit data to generate a second preprocessing result, the second preprocessing result indicating the logical AND of the carry propagation signals of the jth bit and the (j-1)th bit, j being an even number;

the carry calculation units of the nth carry module are configured to obtain the carry output of the corresponding bits based on the first preprocessing result, the second preprocessing result and the inter-stage carry parameter of the (n-1)th carry module.

6. The 64-bit adder for implementing a radix-4 Booth multiplier according to claim 5, wherein the carry calculation units of the nth carry module comprise a first carry calculation unit corresponding to the ith bit, the first carry calculation unit comprising a third OR gate, a third AND gate and a second NOR gate;

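a first input end of the third OR gate is connected to the output end of the corresponding second or fourth preprocessing unit, a second input end of the third OR gate receives the inter-stage carry parameter output by the (n-1)th carry module, an output end of the third OR gate is connected to a first input end of the third AND gate, a second input end of the third AND gate is connected to the output end of the corresponding first or third preprocessing unit, and an output end of the third AND gate outputs the carry parameter of the ith bit; the output end of the third AND gate is further connected to a first input end of the second NOR gate, a second input end of the second NOR gate receives the carry propagation signal of the ith bit, and an output end of the second NOR gate is connected to the summation module to output the carry output of the ith bit to the summation module.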

7. The 64-bit adder for implementing a radix-4 Booth multiplier according to claim 6, wherein the carry calculation units comprise a second carry calculation unit corresponding to the jth bit, the second carry calculation unit comprising a fourth OR gate and a second NAND gate;

a first input end of the fourth OR gate is connected to the output end of the corresponding second preprocessing unit, a second input end of the fourth OR gate receives the inter-stage carry parameter output by the (n-1)th carry module or the carry parameter of the (j-1)th bit, an output end of the fourth OR gate is connected to a first input end of the second NAND gate, a second input end of the second NAND gate receives the carry generation signal of the jth bit, and an output end of the second NAND gate is connected to the summation module to output the carry output of the jth bit to the summation module.

8. A method for implementing a 64-bit adder for implementing a radix-4 Booth multiplier, comprising:

receiving 16 groups of 32-bit partial products with radix-4 Booth weights, each partial product representing the product of the multiplicand and the radix-4 Booth digit formed from the (i+1)th, ith and (i-1)th bits of the multiplier, i being an integer greater than or equal to 0 and less than or equal to 31;

dividing the 2 groups of 64-bit data obtained by compression into N data groups in order of bit position from low to high, each data group containing a plurality of bits of the 2 groups of 64-bit data, N being an integer less than or equal to 7;

preprocessing the bits contained in each data group;

when the sign bit gating control signal of the 2 groups of 64-bit data is asserted, processing the partial products of the 2 groups of 64-bit data, the processing comprising: inverting the highest bit of every partial product, adding 1 at the highest bit of the first partial product, and prepending to every partial product a bit of value 1 immediately above its highest bit; wherein the sign bit gating control signal indicates that a partial product is a negative multiple of the multiplicand;

9. An arithmetic circuit, comprising the 64-bit adder for implementing a radix-4 Booth multiplier according to any one of claims 1 to 7.

10. A chip characterized in that it comprises an arithmetic circuit according to claim 9.