CN116048455B

CN116048455B - Insertion type approximate multiplication accumulator

Info

Publication number: CN116048455B
Application number: CN202310207613.1A
Authority: CN
Inventors: 刘伟强; 尹培培; 陈珂; 王成华; 夏伟杰
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2023-03-07
Filing date: 2023-03-07
Publication date: 2023-06-02
Anticipated expiration: 2043-03-07
Also published as: CN116048455A

Abstract

An insertion type approximate multiplication accumulator comprises an accurate partial product generating circuit, an approximate partial product generating circuit, a sign bit expansion circuit, an accurate partial product compression circuit, an approximate partial product compression circuit and a ripple-carry adder circuit. The accurate partial product generating circuit groups and encodes the high M-k bits of the multiplication input operands to generate accurate partial products; the approximate partial product generating circuit groups and encodes the low k-p bits of the multiplication input operands to generate approximate partial products; the sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term, merges the result with the high N-k bits of the accumulation term and outputs it to the accurate partial product compression circuit, and merges the approximate partial products with the low k-p bits of the accumulation term and outputs them to the approximate partial product compression circuit; finally, the ripple-carry adder circuit generates the multiply-accumulate result. Through approximate computation, the invention achieves faster multiply-accumulate calculation and lower energy consumption.

Description

Insertion type approximate multiplication accumulator

Technical Field

The invention relates to the technical field of approximate arithmetic circuit design, in particular to an insertion type approximate multiplication accumulator.

Background

With the advent of the big data era, artificial intelligence, machine learning, data mining, image recognition and similar applications are increasingly widespread, and both algorithm complexity and data volume keep growing. Such cognitive applications resemble human perception: they usually do not require a unique or exact result, and a result that is good enough for the user meets the application requirements. Approximate computing is a novel computing paradigm for improving the energy efficiency of integrated circuits. It trades computational precision for lower power consumption and higher performance, and is therefore an effective way to reduce the computational complexity and increase the computing speed of error-tolerant applications.

Multiply-accumulate is a typical computation in cognitive applications, and the multiplication accumulator is the basic operation unit of computations such as convolution and dot products. Approximate multiplication accumulators have been studied comparatively little, and most existing work simply applies an approximate multiplier or an approximate adder within a multiplication accumulator. The paper "An Approximate Multiply-Accumulate Unit with Low Power and Reduced Area", published at the IEEE Computer Society Annual Symposium on VLSI in 2019, discloses an approximate multiply-accumulate unit based on joint approximation of multiplication and addition. However, that method applies only to unsigned multiply-accumulate units; extending it to signed computation requires a sign-bit preprocessing unit that converts the inputs into unsigned numbers in advance, and this processing increases system power consumption.

Disclosure of Invention

The present invention addresses the deficiencies in the prior art by providing an insertion type approximate multiply accumulator.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

an inserted approximate multiplication accumulator, characterized in that the approximate multiplication accumulator comprises an accurate partial product generating circuit, an approximate partial product generating circuit, a sign bit expansion circuit, an accurate partial product compression circuit, an approximate partial product compression circuit and a ripple-carry adder circuit;

the accurate partial product generating circuit groups the high M-k bits of the multiplication input operands into groups of three bits and encodes them to generate accurate partial products; wherein the multiplication input operands, namely the multiplicand and the multiplier, both have an input bit width of M and are each given in binary representation [formulas not reproduced in this text], where m denotes the bit index [range not reproduced]; the accumulation term has an input bit width of N [relation between N and M not reproduced]; and k is the approximate length and is a positive integer;

the approximate partial product generating circuit groups the low k-p bits of the multiplication input operands into groups of three bits and encodes them to generate approximate partial products; wherein p is the truncation length and is a positive integer;

the sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term, merges the extended accurate partial products with the high N-k bits of the accumulation term, and outputs them to the accurate partial product compression circuit; the sign bit expansion circuit also merges the approximate partial products with the low k-p bits of the accumulation term and outputs them to the approximate partial product compression circuit;

the accurate partial product compression circuit compresses the accurate partial products and the high N-k bits of the accumulation term into two rows and outputs them to the ripple-carry adder circuit;

the approximate partial product compression circuit compresses the approximate partial products and the low k-p bits of the accumulation term into two rows and outputs them to the ripple-carry adder circuit;

the ripple-carry adder circuit adds the two rows of N-p bits produced by the accurate partial product compression circuit and the approximate partial product compression circuit to produce the high N-p bits of the multiplication accumulator output, and the low p bits are output directly as 0.
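The three output regions described above can be sketched as a simple column partition. This is only an illustrative helper: the function name `column_regions`, the 0-based column numbering, the constraint 0 < p < k < N, and the value k = 8 used in the example are assumptions and are not taken from the patent text.

```python
def column_regions(N: int, k: int, p: int) -> dict:
    """Split the N output columns (column 0 = least significant bit)
    into the truncated, approximate and accurate regions."""
    # Assumed ordering of the parameters; the text only states that
    # k and p are positive integers.
    if not (0 < p < k < N):
        raise ValueError("this sketch expects 0 < p < k < N")
    return {
        "truncated": range(0, p),    # low p bits, output fixed to 0
        "approximate": range(p, k),  # next k - p bits, approximate circuits
        "accurate": range(k, N),     # high N - k bits, accurate circuits
    }


# N = 20 and p = 3 follow the FIG. 2 example; k = 8 is a hypothetical value.
regions = column_regions(N=20, k=8, p=3)
for name, cols in regions.items():
    print(name, list(cols))
```

The three ranges always cover all N columns exactly once, which is why the final adder only needs to handle N-p columns.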

In order to optimize the technical scheme, the specific measures adopted further comprise:

further, in the accurate partial product generating circuit, the accurate partial product in row i, column j is given by an output expression [formula not reproduced in this text], together with two auxiliary definitions [not reproduced]; the expression uses the negation operation, the OR operation (written +) and the exclusive-OR operation.

Further, in the approximate partial product generating circuit, the approximate partial product in row i, column j is given by an output expression [formula not reproduced in this text].

Further, the sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term, and the calculation process comprises:

the accurate partial product of each row is extended to the accumulation term bit width N, where the sign bit of the accurate partial product in row 1 is [symbol not reproduced] and the sign bit of the accurate partial product in row n is [symbol not reproduced]; the sign bits of all accurate partial products are accumulated to form the accumulated sign bit sum s [expression not reproduced], in which the negation operation is used;

wherein the extended sign bits of the accurate partial product in row 1 are simplified to the 3 columns M+1, M+2 and M+3, which correspond to [three values not reproduced]; the extended sign bits of the accurate partial product in row 2 are simplified to the 2 columns M+3 and M+4, which correspond to [value not reproduced] and 1; the extended sign bits of the accurate partial product in row [index not reproduced], columns 2M-1 and 2M, are simplified to [value not reproduced] and 1; and the extended sign bits in columns 2M+1 to N all correspond to 1.

Further, the accurate partial product compression circuit is composed of accurate 4-2 compressors and accurate 3-2 compressors.

Further, the approximate partial product compression circuit is composed of approximate 4-2 compressors.

The beneficial effects of the invention are as follows: the invention discloses an approximate multiplication accumulator in which the accumulation term is inserted into the operation process of the multiplier. The low p bits are output directly as 0; the middle k-p bits use the approximate partial product generating circuit and the approximate partial product compression circuit; the high M-k bits use the accurate partial product generating circuit, and after the sign bit expansion circuit extends the high M-k-bit accurate partial products to N-k bits, the accurate partial product compression circuit is used. Compared with existing approximate multiplication accumulators, the invention offers faster computation with lower power consumption and area. The multiply-accumulate circuit combines truncation and approximation, which effectively reduces circuit complexity. The sign bit expansion of the invention handles signed accumulation terms in advance, which accelerates computation.

Drawings

Fig. 1 is a block diagram of the structure of the approximate multiply accumulator of the present invention.

FIG. 2 is a schematic diagram of the partial product array of an approximate multiplication accumulator with 8 x 8 + 20-bit inputs, taking M = 8, N = 20, p = 3 and k [value not reproduced in this text] as an example. The legend symbols (not reproduced here) denote, in order: approximate partial products, accurate partial products, sign bit compensation partial products, accumulation terms, truncated accumulation terms, truncated partial products, truncated sign bit compensation partial products, approximate compressors, and accurate compressors.

Fig. 3 is a truth table for approximate partial product generation.

Fig. 4 is a gate level circuit diagram of approximate partial product generation.

Fig. 5 is a truth table for an approximation 4-2 compressor.

Fig. 6 is a gate level circuit diagram of an approximate 4-2 compressor.

Description of the embodiments

The invention will now be described in further detail with reference to the accompanying drawings.

Fig. 1 is a block diagram of the structure of the approximate multiplication accumulator of the present invention, which includes an accurate partial product generating circuit, an approximate partial product generating circuit, a sign bit expansion circuit, an accurate partial product compression circuit, an approximate partial product compression circuit, and a ripple-carry adder circuit.

The multiplicand A and the multiplier B of the approximate multiplication accumulator both have an input bit width of M and are expressed in binary as [formulas not reproduced in this text], where m denotes the bit index [range not reproduced]. The accumulation term has an input bit width of N. M and N are both positive integers [relation between N and M not reproduced].

The accurate partial product generating circuit groups the high M-k bits of the multiplication input operands (multiplicand and multiplier) into groups of three bits and encodes them to generate accurate partial products, where k is the approximate length and is a positive integer.
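Encoding the multiplier in overlapping groups of three bits reads like textbook radix-4 (modified) Booth recoding. The sketch below illustrates that standard technique only; the patent's own gate-level expression is not reproduced in this text, so the function names and the restriction to even M are assumptions made for illustration.

```python
def booth_radix4_digits(b: int, M: int) -> list[int]:
    """Recode an M-bit two's complement multiplier into radix-4 Booth digits.

    Digit i is derived from the overlapping bit triple
    (b[2i+1], b[2i], b[2i-1]) with b[-1] = 0 and lies in {-2, -1, 0, 1, 2}.
    """
    if M % 2:
        raise ValueError("this sketch assumes an even bit width M")
    bits = [(b >> m) & 1 for m in range(M)]  # also valid for negative ints
    digits = []
    for i in range(M // 2):
        prev = bits[2 * i - 1] if i > 0 else 0
        digits.append(-2 * bits[2 * i + 1] + bits[2 * i] + prev)
    return digits


def booth_multiply(a: int, b: int, M: int) -> int:
    """Sum the Booth partial-product rows digit_i * a * 4**i."""
    return sum(d * a * 4**i for i, d in enumerate(booth_radix4_digits(b, M)))


print(booth_radix4_digits(-3, 8))
print(booth_multiply(-77, 45, 8), -77 * 45)
```

For M = 8 this yields four partial-product rows, which matches the four rows referred to for FIG. 2.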

The approximate partial product generating circuit groups the low k-p bits of the multiplication input operands into groups of three bits and encodes them to generate approximate partial products.

The sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term, merges the extended accurate partial products with the high N-k bits of the accumulation term, and outputs them to the accurate partial product compression circuit. The sign bit expansion circuit also merges the approximate partial products with the low k-p bits of the accumulation term and outputs them to the approximate partial product compression circuit, where p is the truncation length and is a positive integer.

The accurate partial product compression circuit uses accurate 4-2 compressors and accurate 3-2 compressors to compress all accurate partial products and the high-order bits of the accumulation term into two rows, and outputs the two rows to the ripple-carry adder.
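For reference, the behaviour of accurate 3-2 and 4-2 compressors can be sketched at bit level as follows. This is the textbook construction (a full adder, and two chained full adders); the patent does not state how its accurate compressors are built internally, so the structure and the function names here are assumptions.

```python
from itertools import product


def compressor_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Accurate 3-2 compressor (full adder): a + b + c == s + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Accurate 4-2 compressor built from two chained full adders.

    x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout),
    and cout does not depend on cin, so no carry ripples along a row.
    """
    s1, cout = compressor_3_2(x1, x2, x3)
    s, carry = compressor_3_2(s1, x4, cin)
    return s, carry, cout


# Exhaustive check of the defining identity over all 32 input patterns.
for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
    assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
```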

The approximate partial product compression circuit uses approximate 4-2 compressors to compress all approximate partial products and the low-order bits of the accumulation term into two rows, and outputs the two rows to the ripple-carry adder.

The ripple-carry adder adds the two rows of N-p bits produced by the accurate partial product compression circuit and the approximate partial product compression circuit to produce the high N-p bits of the multiplication accumulator output; the low p bits are output directly as 0, and the two parts together form the final multiply-accumulate result.
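The final addition stage can be sketched as a bit-level ripple-carry adder over the two compressed rows, followed by padding the low p bits with zeros. The row representation (least-significant-bit-first lists starting at column p) and the function names are assumptions chosen for this illustration.

```python
def ripple_carry_add(x_bits: list[int], y_bits: list[int]) -> list[int]:
    """Add two equal-length LSB-first bit rows; the final carry-out is dropped."""
    carry = 0
    out = []
    for x, y in zip(x_bits, y_bits):
        out.append(x ^ y ^ carry)                    # sum bit of a full adder
        carry = (x & y) | (x & carry) | (y & carry)  # carry to the next column
    return out


def final_output(row0: list[int], row1: list[int], p: int) -> int:
    """Combine two (N - p)-bit rows into an N-bit word whose low p bits are 0."""
    high_bits = ripple_carry_add(row0, row1)
    high = sum(bit << i for i, bit in enumerate(high_bits))
    return high << p


# Rows encoding 5 and 6 (LSB first, 5 bits wide), truncation length p = 3.
print(final_output([1, 0, 1, 0, 0], [0, 1, 1, 0, 0], p=3))  # (5 + 6) << 3 = 88
```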

The accurate partial product in row i, column j is given by the output expression of the accurate partial product generation circuit [formula not reproduced in this text], together with two auxiliary definitions [not reproduced]. The expression uses the negation operation, the OR operation (written +) and the exclusive-OR operation.

The approximate partial product in row i, column j is given by the output expression of the approximate partial product generation circuit [formula not reproduced in this text].

Fig. 3 is a truth table for approximate partial product generation, and fig. 4 is a gate-level circuit diagram for approximate partial product generation.

The sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term. The specific calculation process is as follows:

The accurate partial product of each row is extended to the accumulation term bit width N, where the sign bit of the accurate partial product in row 1 is [symbol not reproduced] and the sign bit of the accurate partial product in row n is [symbol not reproduced]. The sign bits of all accurate partial products are accumulated to form the accumulated sign bit sum s, which simplifies to [expression not reproduced].

The extended sign bits of the accurate partial product in row 1 are simplified to the 3 columns M+1, M+2 and M+3, which correspond to [three values not reproduced], namely the extended sign bits in columns 9 to 11 of row 1 in FIG. 2. The extended sign bits of the accurate partial product in row 2 are simplified to the 2 columns M+3 and M+4, which correspond to [value not reproduced] and 1. The extended sign bits of the accurate partial products in the remaining rows [row range not reproduced] are simplified in a similar way, giving the extended sign bits in columns 11 and 12 of row 2, columns 13 and 14 of row 3, and columns 15 and 16 of row 4 in FIG. 2. In addition, the extended sign bits in columns 2M+1 to N are 1, namely the extended sign bits in columns 17 to 20 of row 4 in FIG. 2.
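The idea behind this kind of sign bit simplification can be illustrated with a well-known identity: sign-extending a W-bit two's complement row to N bits is equivalent to inverting its sign bit and adding a constant string of 1s in columns W-1 to N-1. The sketch below checks that identity only. It does not reproduce the patent's exact per-row column pattern, whose expressions are missing from this text, and the function names are hypothetical.

```python
def sign_extend_direct(x: int, W: int, N: int) -> int:
    """Sign-extend the W-bit pattern x to N bits by replicating the sign bit."""
    s = (x >> (W - 1)) & 1
    ext = (((1 << (N - W)) - 1) << W) if s else 0
    return (x | ext) & ((1 << N) - 1)


def sign_extend_compensated(x: int, W: int, N: int) -> int:
    """Same N-bit value from an inverted sign bit plus constant 1s.

    Uses -s * 2**(W-1) == (1 - s) * 2**(W-1) - 2**(W-1); modulo 2**N the
    constant -2**(W-1) is a run of 1s in columns W-1 .. N-1.
    """
    s = (x >> (W - 1)) & 1
    low = x & ((1 << (W - 1)) - 1)      # bits below the sign position
    inverted = (1 - s) << (W - 1)       # inverted sign bit, single column
    ones = (1 << N) - (1 << (W - 1))    # constant, independent of the data
    return (low + inverted + ones) & ((1 << N) - 1)


# W = 10 and N = 20 are illustrative widths only.
assert all(
    sign_extend_direct(x, 10, 20) == sign_extend_compensated(x, 10, 20)
    for x in range(1 << 10)
)
```

Because the constant part does not depend on the operands, the constants of all rows can be summed once in advance, leaving only a few data-dependent bits per row in the array.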

The accurate partial product compression circuit is composed of accurate 4-2 compressors and accurate 3-2 compressors.

The approximate partial product compression circuit is composed of approximate 4-2 compressors. The inputs of an approximate 4-2 compressor are four approximate partial product bits of the same weight [input symbols not reproduced], and the outputs are Sum and Cout [output expressions not reproduced in this text]. Fig. 5 is the truth table of the approximate 4-2 compressor, and Fig. 6 is its gate-level circuit diagram.

Fig. 2 shows the partial product processing of an approximate multiplication accumulator according to an embodiment of the invention. Referring to Fig. 2, the multiplicand and multiplier inputs pass through the partial product generating circuits to produce the partial product array shown in the upper half of the first stage. The solid circles represent accurate partial products, which are generated by the accurate partial product generating circuit. The open circles represent approximate partial products, which are generated by the approximate partial product generating circuit. The concentric circles are truncated partial products. The diamonds represent the input accumulation term bits, which are inserted into the partial product array generated by the multiplication and combined with it to form a new partial product array for the next stage of processing. In the first stage, the dashed boxes represent approximate 4-2 compressors, which together form the approximate partial product compression circuit. The solid boxes represent accurate compressors, which together form the accurate partial product compression circuit. The accurate partial product compression circuit and the approximate partial product compression circuit compress all partial products into two rows step by step through the second and third stages; the ripple-carry adder then adds the two rows in the fourth stage to generate the final multiply-accumulate result.
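The effect of truncation alone can be explored with a small behavioural model. The sketch below drops every bit in the low p columns of the partial-product rows and of the accumulation term before summing. It is a simplification under stated assumptions: rows are modelled as signed radix-4 Booth multiples, the approximate region is summed exactly because the approximate circuit expressions are not reproduced in this text, and all function names are hypothetical.

```python
def booth_rows(a: int, b: int, M: int) -> list[int]:
    """Signed radix-4 Booth partial-product rows of a * b (M even)."""
    rows = []
    for i in range(M // 2):
        prev = (b >> (2 * i - 1)) & 1 if i > 0 else 0
        digit = -2 * ((b >> (2 * i + 1)) & 1) + ((b >> (2 * i)) & 1) + prev
        rows.append(digit * a * 4**i)
    return rows


def truncated_mac(a: int, b: int, c: int, M: int, N: int, p: int) -> int:
    """Behavioural a * b + c with the low p columns of every row dropped."""
    mask_n = (1 << N) - 1
    keep = mask_n & ~((1 << p) - 1)  # 0 in the low p columns, 1 elsewhere
    total = sum(row & keep for row in booth_rows(a, b, M) + [c]) & mask_n
    # Interpret the N-bit result as a two's complement number.
    return total - (1 << N) if total >> (N - 1) else total


a, b, c = -77, 45, 12345
exact = a * b + c
approx = truncated_mac(a, b, c, M=8, N=20, p=3)
print(exact, approx, exact - approx)
```

In this model each of the M/2 + 1 rows loses at most 2^p - 1, so the result never exceeds the exact value and the error is bounded by (M/2 + 1)(2^p - 1); the real circuit adds further error from its approximate region.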

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims

1. An inserted approximate multiplication accumulator, characterized in that the approximate multiplication accumulator comprises an accurate partial product generating circuit, an approximate partial product generating circuit, a sign bit expansion circuit, an accurate partial product compression circuit, an approximate partial product compression circuit and a ripple-carry adder circuit;

the accurate partial product generating circuit groups the high M-k bits of the multiplication input operands into groups of three bits and encodes them to generate accurate partial products; wherein the multiplication input operands, namely the multiplicand and the multiplier, both have an input bit width of M and are each given in binary representation [formulas not reproduced in this text], where m denotes the bit index [range not reproduced]; the accumulation term has an input bit width of N [relation between N and M not reproduced]; and k is the approximate length and is a positive integer;

the approximate partial product generating circuit groups the low k-p bits of the multiplication input operands into groups of three bits and encodes them to generate approximate partial products; wherein p is the truncation length and is a positive integer;

the sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term, merges the extended accurate partial products with the high N-k bits of the accumulation term, and outputs them to the accurate partial product compression circuit; the sign bit expansion circuit also merges the approximate partial products with the low k-p bits of the accumulation term and outputs them to the approximate partial product compression circuit;

the accurate partial product compression circuit compresses the accurate partial products and the high N-k bits of the accumulation term into two rows and outputs them to the ripple-carry adder circuit;

the approximate partial product compression circuit compresses the approximate partial products and the low k-p bits of the accumulation term into two rows and outputs them to the ripple-carry adder circuit;

the ripple-carry adder circuit adds the two rows of N-p bits produced by the accurate partial product compression circuit and the approximate partial product compression circuit to produce the high N-p bits of the multiplication accumulator output, and the low p bits are output directly as 0.

2. An inserted approximate multiply accumulator as claimed in claim 1, characterized in that: in the accurate partial product generating circuit, the accurate partial product in row i, column j is given by an output expression [formula not reproduced in this text], together with two auxiliary definitions [not reproduced]; the expression uses the negation operation, the OR operation (written +) and the exclusive-OR operation.

3. An inserted approximate multiply accumulator as claimed in claim 1, characterized in that: in the approximate partial product generating circuit, the approximate partial product in row i, column j is given by an output expression [formula not reproduced in this text].

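4. An inserted approximate multiply accumulator as claimed in claim 1, characterized in that: the sign bit expansion circuit extends the sign bit at the most significant position of the accurate partial products according to the bit width of the accumulation term, and the calculation process comprises the following steps:

the accurate partial product of each row is extended to the accumulation term bit width N, where the sign bit of the accurate partial product in row 1 is [symbol not reproduced] and the sign bit of the accurate partial product in row n is [symbol not reproduced]; the sign bits of all accurate partial products are accumulated to form the accumulated sign bit sum s [expression not reproduced], in which the negation operation is used;

wherein the extended sign bits of the accurate partial product in row 1 are simplified to the 3 columns M+1, M+2 and M+3, which correspond to [three values not reproduced]; the extended sign bits of the accurate partial product in row 2 are simplified to the 2 columns M+3 and M+4, which correspond to [value not reproduced] and 1; the extended sign bits of the accurate partial product in row [index not reproduced], columns 2M-1 and 2M, are simplified to [value not reproduced] and 1; and the extended sign bits in columns 2M+1 to N all correspond to 1.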

5. An inserted approximate multiply accumulator as claimed in claim 1, characterized in that: the accurate partial product compression circuit is composed of an accurate 4-2 compressor and an accurate 3-2 compressor.

6. An inserted approximate multiply accumulator as claimed in claim 1, characterized in that: the approximate partial product compression circuit is composed of an approximate 4-2 compressor.