WO2022178861A1 - Parallel multiplier and working method thereof - Google Patents

Parallel multiplier and working method thereof

Info

Publication number
WO2022178861A1
WO2022178861A1 PCT/CN2021/078251 CN2021078251W WO2022178861A1 WO 2022178861 A1 WO2022178861 A1 WO 2022178861A1 CN 2021078251 W CN2021078251 W CN 2021078251W WO 2022178861 A1 WO2022178861 A1 WO 2022178861A1
Authority
WO
WIPO (PCT)
Prior art keywords
circuit
partial product
adder
encoding
addition
Prior art date
Application number
PCT/CN2021/078251
Other languages
English (en)
French (fr)
Inventor
尹首一
段宁远
韩慧明
刘雷波
魏少军
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学 filed Critical 清华大学
Priority to PCT/CN2021/078251 priority Critical patent/WO2022178861A1/zh
Publication of WO2022178861A1 publication Critical patent/WO2022178861A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52: Multiplying; Dividing
    • G06F7/523: Multiplying only

Definitions

  • The invention relates to the technical field of digital signal processing, and in particular to a parallel multiplier and a working method thereof.
  • Digital multipliers are widely used in microprocessors, multimedia and digital signal processors, and other products.
  • Typical DSP functions include convolution, digital filtering, and signal conversion; these functions require various multipliers.
  • In addition, chips based on intelligent architectures require a large number of multipliers and multiply-adders. A high-performance general-purpose multiplier (from which other sub-multipliers can be derived) is therefore urgently needed by industry and academia.
  • The criteria for measuring multiplier performance include critical path delay, area, power consumption, power-delay product, and area-delay product.
  • The architecture of the parallel multiplier is divided into three parts: 1. generation of the partial products; 2. accumulation of the partial products; 3. fast addition of the last two partial product rows.
  • For the fast addition of the last two partial product rows, the prior art relies on the traditional square-root carry-select adder.
  • Figure 1 shows the structure of a linear carry-select adder; if the depth of each stage increases linearly, it becomes a square-root carry-select adder. However, as Figure 1 shows, the traditional square-root carry-select adder has poor timing and a large area, which enlarges the parallel multiplier and makes it harder for the multiplier to meet high-performance timing requirements.
  • Embodiments of the present invention provide a parallel multiplier to solve the technical problems of large area and poor timing of the parallel multiplier in the prior art.
  • the parallel multiplier includes:
  • an encoding-decoding circuit for encoding and decoding the NR4SD+ digit set to obtain a partial product array;
  • a reduction tree structure, connected to the encoding-decoding circuit, for accumulating the parts of the partial product array other than the last two partial product rows;
  • an improved square-root carry-select adder, connected to the reduction tree structure, for adding the last two partial product rows of the partial product array, wherein the improved square-root carry-select adder comprises a full adder, a half adder, and a first custom combinational circuit that includes digital circuit devices.
  • the embodiment of the present invention also provides a working method of a parallel multiplier, so as to solve the technical problems of large area and poor timing of the parallel multiplier in the prior art.
  • the method includes:
  • the NR4SD+ digit set is encoded and decoded by the encoding-decoding circuit to obtain a partial product array;
  • the parts of the partial product array other than the last two partial product rows are accumulated by the reduction tree structure;
  • the last two partial product rows of the partial product array are added by an improved square-root carry-select adder, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.
  • In the embodiments of the present invention, the NR4SD+ digit set is encoded and decoded to obtain a partial product array. Compared with the prior art, its digit set is 1 bit smaller than that of modified Booth encoding, which helps reduce the circuit area needed to generate the partial products.
  • In addition, the parts of the partial product array other than the last two partial product rows are accumulated by the reduction tree structure, and an improved square-root carry-select adder is proposed for adding the last two partial product rows.
  • The improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.
  • Compared with the traditional square-root carry-select adder, timing is improved and area is reduced, which helps the parallel multiplier meet high-performance timing requirements.
  • FIG. 1 is a schematic structural diagram of a linear carry-select adder in the prior art;
  • FIG. 2 is a structural block diagram of a parallel multiplier provided by an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an improved square-root carry-select adder applied to the last two partial product rows of a partial product array, provided by an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the calculation process of the above parallel multiplier provided by an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of an NR4SD− 1-bit encoding circuit in the prior art;
  • FIG. 6 is a schematic diagram of an NR4SD+ 1-bit encoding circuit in the prior art;
  • FIG. 7 is a schematic diagram of an NR4SD− bit-width-level encoding circuit in the prior art;
  • FIG. 8 is a schematic diagram of an NR4SD+ bit-width-level encoding circuit in the prior art;
  • FIG. 9 is a schematic diagram of an NR4SD− decoding circuit in the prior art;
  • FIG. 10 is a schematic diagram of an NR4SD+ decoding circuit in the prior art;
  • FIG. 11 is a schematic structural diagram of encoding and decoding provided by an embodiment of the present invention;
  • FIG. 12 is a schematic diagram of a partial product array obtained by Booth encoding in the prior art;
  • FIG. 13 is a schematic diagram of an encoding-decoding circuit operator used by Booth encoding in the prior art;
  • FIG. 14 is a schematic diagram of an optimized partial product array provided by an embodiment of the present invention;
  • FIG. 15 is a schematic diagram of the tree structures of reduction trees in the prior art;
  • FIG. 16 is a flowchart of a working method of a parallel multiplier provided by an embodiment of the present invention.
  • a parallel multiplier is provided, as shown in FIG. 2 , the parallel multiplier includes:
  • an encoding-decoding circuit 202, used to encode and decode the NR4SD+ digit set to obtain a partial product array;
  • a reduction tree structure 204, connected to the encoding-decoding circuit, for accumulating the parts of the partial product array other than the last two partial product rows;
  • an improved square-root carry-select adder 206, connected to the reduction tree structure, for adding the last two partial product rows of the partial product array, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit that includes digital circuit devices.
  • Compared with the traditional square-root carry-select adder, this improves timing and reduces area, which helps the parallel multiplier meet high-performance timing requirements.
  • In a specific implementation, to improve the timing and reduce the area of the square-root carry-select adder during fast addition of the last two partial product rows of the partial product array, this embodiment proposes an improved square-root carry-select adder that includes a full adder, a half adder, and a first custom combinational circuit.
  • The first custom combinational circuit includes digital circuit devices; specifically, it includes a first combined addition circuit and a second combined addition circuit.
  • The first combined addition circuit includes one NOT gate.
  • The second combined addition circuit includes one NOT gate and one OR gate.
  • The structure of the improved square-root carry-select adder is shown in FIG. 3; comparison with FIG. 1 shows that its area is smaller.
  • The full adder adds three input items; the half adder adds two input items; the first combined addition circuit adds the most significant bits of the last two partial product rows of the partial product array; the second combined addition circuit performs the addition where a single input item appears on the right side of the last two partial product rows.
  • The processing of the last two partial product rows by the improved square-root carry-select adder is shown in stage 4 of FIG. 4, which shows how the full adders, half adders, first combined addition circuits, and second combined addition circuits are distributed over the data.
  • The black dots represent the values in the partial product array.
  • NR4SD in FIG. 5 to FIG. 10 refers to the Non-Redundant Radix-4 Signed-Digit set.
  • FIG. 5 and FIG. 7 show the existing encoding circuits of non-redundant radix-4 encoding with {-2, -1, 0, 1} as the digit set, and FIG. 9 shows the corresponding decoding circuit.
  • Equation-1 to Equation-5 are the Boolean function expressions of the NR4SD− encoding circuit, where ∧ denotes logical AND, ∨ denotes logical OR, a third symbol (an image in the source) denotes logical XOR, and n, one, and two are assigned by the equations.
  • FIG. 6 and FIG. 8 show the existing encoding circuits of non-redundant radix-4 encoding with {-1, 0, 1, 2} as the digit set, and FIG. 10 shows the corresponding decoding circuit.
  • Equation-6 to Equation-10 are the Boolean function expressions of the NR4SD+ encoding circuit.
  • The encoding and decoding circuits of the two existing non-redundant radix-4 encoding algorithms are very complicated.
  • Because the encoding and decoding circuits of both algorithms use many standard cells, very few active areas can be shared from a process point of view.
  • To further reduce the circuit area of the parallel multiplier, this embodiment also improves the generation of the partial product array: alongside non-redundant radix-4 encoding, an improved encoding-decoding circuit is proposed.
  • Specifically, the improved encoding-decoding circuit includes a first encoding circuit, a second encoding circuit, and a decoding circuit.
  • One first encoding circuit is connected in sequence to a plurality of second encoding circuits, and the decoding circuit is connected to the first encoding circuit and the second encoding circuits.
  • The first encoding circuit includes an OAI21 or NAND standard cell, the second encoding circuit includes an AOI21 standard cell, and the decoding circuit includes an OAI222D standard cell.
  • In stage 1 of FIG. 4, the improved encoding-decoding circuit encodes and decodes the multiplier b[n-1:0] in the NR4SD+ manner. The Boolean functions of encoding and decoding depend only on b_{2j+1}, b_{2j}, c_{2j+2}, c_{2j}, a_i, and a_{i-1}, so the resulting encoder-decoder is smaller and the delay from c_{2j} to c_{2j+2} is reduced.
  • FIG. 11(a) shows how values are encoded and decoded to obtain the partial product array.
  • FIG. 11(b) shows the structure of the first encoding circuit.
  • FIG. 11(c) shows the structure of the second encoding circuit.
  • The first encoding circuit and the second encoding circuit both take b_{2j}, b_{2j+1}, and c_{2j} as inputs and output c_{2j+2}.
  • FIG. 11(d) shows the structure of the decoding circuit; the 1X, -1X, and 2X signals shown in it are generated from combinations of b_{2j}, b_{2j+1}, c_{2j}, and c_{2j+2}.
  • The partial product array obtained by modified Booth encoding in the prior art is shown in FIG. 12, and the encoding-decoding circuit operator used is shown in FIG. 13.
  • Table 3 below is the truth table of the partial product array; τ_31 and τ_30 in FIG. 12 are obtained from the truth table in Table 3.
  • To further improve timing and shrink the partial product array, this embodiment proposes that the partial product array can also be optimized as it is generated.
  • For example, the parallel multiplier may also include an optimization processing structure that uses Boolean functions to derive an optimized form of the partial product array output by the encoding-decoding circuit.
  • The reduction tree structure includes a full adder, a half adder, a second custom combinational circuit that includes digital circuit devices, and a ripple-carry adder; the second custom combinational circuit includes one XNOR gate and one OR gate.
  • The full adder performs addition before the improved square-root carry-select adder appears.
  • The half adder performs addition on the left side of the partial product array when two summation terms remain in the column to the left of a full adder, or when two summation terms remain in the column to the left of a half adder.
  • The second custom combinational circuit performs addition when the input includes the digit 1.
  • The ripple-carry adder performs addition at the rightmost end of the partial product array before the improved square-root carry-select adder appears, as shown in FIG. 4.
  • When the reduction tree structure accumulates the parts of the partial product array other than the last two partial product rows, the distribution of the full adders, half adders, and second custom combinational circuits is as shown in stage 3 of FIG. 4. The reduction tree structure proposed in this embodiment minimizes the combined area of the full adders and half adders in the tree; compared with the three existing reduction tree methods, its algorithm is simpler, its area is smaller, and its timing is better.
  • The above parallel multiplier can be applied to signed and unsigned operands, odd and even bit widths, and fixed-point and floating-point operations; that is, it is a general-purpose parallel multiplier.
  • Based on TSMC's 28 nm standard cell library, the above parallel multiplier was applied to a signed 16-bit multiplier; relative to Design Compiler, the area gain is greater than 31.667% and the power gain is greater than 21.68%.
  • Based on the same inventive concept, an embodiment of the present invention also provides a working method of a parallel multiplier, as described in the following embodiments. Because the working method solves the problem on the same principle as the parallel multiplier, its implementation can refer to the implementation of the parallel multiplier, and repeated details are omitted.
  • FIG. 16 is a flowchart of a working method of a parallel multiplier according to an embodiment of the present invention, as shown in FIG. 16 , including:
  • Step 1602: Encode and decode the NR4SD+ digit set through the encoding-decoding circuit to obtain a partial product array.
  • Step 1604: Accumulate the parts of the partial product array other than the last two partial product rows through the reduction tree structure.
  • Step 1606: Add the last two partial product rows of the partial product array with an improved square-root carry-select adder, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.
  • In one embodiment, adding the last two partial product rows of the partial product array with the improved square-root carry-select adder includes:
  • the first custom combinational circuit includes a first combined addition circuit and a second combined addition circuit, the first combined addition circuit includes one NOT gate, and the second combined addition circuit includes one NOT gate and one OR gate, wherein,
  • when three input items are added, the full adder is used;
  • when two input items are added, the half adder is used;
  • the most significant bits of the last two partial product rows of the partial product array are added with the first combined addition circuit;
  • where a single input item appears on the right side of the last two partial product rows, the second combined addition circuit is used for the addition.
  • In one embodiment, the encoding-decoding circuit includes a first encoding circuit, a second encoding circuit, and a decoding circuit; one first encoding circuit is connected in sequence to a plurality of second encoding circuits, and the decoding circuit is connected to the first encoding circuit and the second encoding circuits, wherein,
  • the first encoding circuit includes an OAI21 or NAND standard cell;
  • the second encoding circuit includes an AOI21 standard cell;
  • the decoding circuit includes an OAI222D standard cell.
  • In one embodiment, accumulating the parts of the partial product array other than the last two partial product rows through the reduction tree structure includes:
  • the reduction tree structure includes a full adder, a half adder, a second custom combinational circuit, and a ripple-carry adder, and the second custom combinational circuit includes digital circuit devices, wherein,
  • before the improved square-root carry-select adder appears, the full adder is used for addition;
  • on the left side of the partial product array, the half adder is used when two summation terms remain in the column to the left of a full adder, or when two summation terms remain in the column to the left of a half adder;
  • when the input includes the digit 1, the second custom combinational circuit is used for addition;
  • at the rightmost end of the partial product array, before the improved square-root carry-select adder appears, the ripple-carry adder is used for addition.
  • In another embodiment, software is also provided for executing the technical solutions described in the foregoing embodiments and preferred implementations.
  • In another embodiment, a storage medium storing the above software is also provided; the storage medium includes but is not limited to an optical disk, a floppy disk, a hard disk, and rewritable memory.
  • The embodiments of the present invention achieve the following technical effects. The NR4SD+ digit set is encoded and decoded to obtain a partial product array, which helps reduce the circuit area for generating the partial products. The parts of the partial product array other than the last two partial product rows are accumulated by the reduction tree structure. For the addition of the last two partial product rows, an improved square-root carry-select adder is proposed; it includes a full adder, a half adder, and a first custom combinational circuit that includes digital circuit devices. Compared with the traditional square-root carry-select adder, timing is improved and area is reduced, which helps the parallel multiplier meet high-performance timing requirements.
  • Embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

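A parallel multiplier and a working method thereof. The parallel multiplier comprises: an encoding-decoding circuit (202) for encoding and decoding the NR4SD+ digit set to obtain a partial product array; a reduction tree structure (204), connected to the encoding-decoding circuit, for accumulating the parts of the partial product array other than the last two partial product rows; and an improved square-root carry-select adder (206), connected to the reduction tree structure, for adding the last two partial product rows of the partial product array, wherein the improved square-root carry-select adder comprises a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit comprises digital circuit devices. The scheme improves timing while reducing area, which helps the parallel multiplier meet high-performance timing requirements.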

Description

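Parallel multiplier and working method thereof
Technical Field
The present invention relates to the technical field of digital signal processing, and in particular to a parallel multiplier and a working method thereof.
Background
Digital multipliers are widely used in microprocessors, multimedia and digital signal processors, and similar products. Typical digital signal processor functions include convolution, digital filtering, and signal conversion; these functions require various multipliers. In addition, chips based on intelligent architectures require a large number of multipliers and multiply-adders. A high-performance general-purpose multiplier (from which other sub-multipliers can be derived) is therefore urgently needed by industry and academia. The criteria for measuring multiplier performance include critical path delay, area, power consumption, power-delay product, and area-delay product.
Current multipliers include serial multipliers and parallel multipliers. Serial multipliers usually have poor timing and cannot meet high-performance requirements. Reducing the area and power consumption of parallel multipliers while still meeting delay requirements is therefore the focus of current research.
The architecture of a parallel multiplier is divided into three parts: 1. generation of the partial products; 2. accumulation of the partial products; 3. fast addition of the last two partial product rows. For the fast addition of the last two partial product rows, the prior art relies on the traditional square-root carry-select adder. FIG. 1 shows the structure of a linear carry-select adder; if the depth of each stage increases linearly, it becomes a square-root carry-select adder. However, as FIG. 1 shows, the traditional square-root carry-select adder has poor timing and a large area, which enlarges the parallel multiplier and makes it harder for the multiplier to meet high-performance timing requirements.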
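The carry-select idea described above can be illustrated with a small behavioral model. This is a minimal sketch of a textbook carry-select adder, not the circuit of FIG. 1 or FIG. 3; the function names and the block-size lists are assumptions chosen for illustration. Each block computes its sum for both possible carry-in values, and the real carry only selects between them. Equal block sizes give the linear variant, while block sizes that grow by one per stage give the square-root variant.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Add two little-endian bit lists with a ripple-carry chain."""
    out, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(x, y, width, block_sizes):
    """Add unsigned x and y of `width` bits, one carry-select block at a time."""
    assert sum(block_sizes) == width
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    result, carry, pos = [], 0, 0
    for size in block_sizes:
        a, b = xb[pos:pos + size], yb[pos:pos + size]
        # Both candidates are computed without waiting for the real carry.
        sum0, c0 = ripple_add(a, b, 0)
        sum1, c1 = ripple_add(a, b, 1)
        # The incoming carry only drives the selection (a multiplexer in hardware).
        result += sum1 if carry else sum0
        carry = c1 if carry else c0
        pos += size
    return sum(bit << i for i, bit in enumerate(result)), carry


linear_blocks = [4, 4, 4, 4]      # equal block depth (hypothetical sizes)
sqrt_blocks = [2, 2, 3, 4, 5]     # depth grows stage by stage (hypothetical sizes)
print(carry_select_add(40000, 30000, 16, sqrt_blocks))
```

In hardware, the growing block sizes let each later block finish its two candidate sums at about the time the select signal from the previous block arrives, which is where the timing advantage of the square-root arrangement comes from.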
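Summary of the Invention
Embodiments of the present invention provide a parallel multiplier to solve the technical problems of large area and poor timing in prior-art parallel multipliers. The parallel multiplier includes:
an encoding-decoding circuit for encoding and decoding the NR4SD+ digit set to obtain a partial product array;
a reduction tree structure, connected to the encoding-decoding circuit, for accumulating the parts of the partial product array other than the last two partial product rows;
an improved square-root carry-select adder, connected to the reduction tree structure, for adding the last two partial product rows of the partial product array, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.
Embodiments of the present invention also provide a working method of a parallel multiplier to solve the technical problems of large area and poor timing in prior-art parallel multipliers. The method includes:
encoding and decoding the NR4SD+ digit set through the encoding-decoding circuit to obtain a partial product array;
accumulating the parts of the partial product array other than the last two partial product rows through the reduction tree structure;
adding the last two partial product rows of the partial product array with an improved square-root carry-select adder, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.
In the embodiments of the present invention, the NR4SD+ digit set is encoded and decoded to obtain a partial product array. Compared with the prior art, its digit set is 1 bit smaller than that of modified Booth encoding, which helps reduce the circuit area needed to generate the partial products. In addition, the parts of the partial product array other than the last two partial product rows are accumulated by the reduction tree structure, and an improved square-root carry-select adder is proposed for adding the last two partial product rows. The improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices. Compared with the traditional square-root carry-select adder, timing is improved and area is reduced, which helps the parallel multiplier meet high-performance timing requirements.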
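Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic structural diagram of a linear carry-select adder in the prior art;
FIG. 2 is a structural block diagram of a parallel multiplier provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an improved square-root carry-select adder applied to the last two partial product rows of a partial product array, provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the calculation process of the above parallel multiplier provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of an NR4SD− 1-bit encoding circuit in the prior art;
FIG. 6 is a schematic diagram of an NR4SD+ 1-bit encoding circuit in the prior art;
FIG. 7 is a schematic diagram of an NR4SD− bit-width-level encoding circuit in the prior art;
FIG. 8 is a schematic diagram of an NR4SD+ bit-width-level encoding circuit in the prior art;
FIG. 9 is a schematic diagram of an NR4SD− decoding circuit in the prior art;
FIG. 10 is a schematic diagram of an NR4SD+ decoding circuit in the prior art;
FIG. 11 is a schematic structural diagram of encoding and decoding provided by an embodiment of the present invention;
FIG. 12 is a schematic diagram of a partial product array obtained by Booth encoding in the prior art;
FIG. 13 is a schematic diagram of an encoding-decoding circuit operator used by Booth encoding in the prior art;
FIG. 14 is a schematic diagram of an optimized partial product array provided by an embodiment of the present invention;
FIG. 15 is a schematic diagram of the tree structures of reduction trees in the prior art;
FIG. 16 is a flowchart of a working method of a parallel multiplier provided by an embodiment of the present invention.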
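Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the drawings. The exemplary embodiments and their descriptions are used to explain the present invention and do not limit it.
In an embodiment of the present invention, a parallel multiplier is provided. As shown in FIG. 2, the parallel multiplier includes:
an encoding-decoding circuit 202, used to encode and decode the NR4SD+ digit set to obtain a partial product array;
a reduction tree structure 204, connected to the encoding-decoding circuit, for accumulating the parts of the partial product array other than the last two partial product rows;
an improved square-root carry-select adder 206, connected to the reduction tree structure, for adding the last two partial product rows of the partial product array, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.
As FIG. 2 shows, in the embodiments of the present invention the NR4SD+ digit set is encoded and decoded to obtain a partial product array. Compared with the prior art, its digit set is 1 bit smaller than that of modified Booth encoding, which helps reduce the circuit area needed to generate the partial products. In addition, the parts of the partial product array other than the last two partial product rows are accumulated by the reduction tree structure, and an improved square-root carry-select adder is proposed for adding the last two partial product rows. The improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices. Compared with the traditional square-root carry-select adder, timing is improved and area is reduced, which helps the parallel multiplier meet high-performance timing requirements.
In a specific implementation, to improve the timing and reduce the area of the square-root carry-select adder during fast addition of the last two partial product rows of the partial product array, this embodiment proposes an improved square-root carry-select adder. It includes a full adder, a half adder, and a first custom combinational circuit that includes digital circuit devices. Specifically, the first custom combinational circuit includes a first combined addition circuit and a second combined addition circuit; the first combined addition circuit includes one NOT gate, and the second combined addition circuit includes one NOT gate and one OR gate. The structure of the improved square-root carry-select adder is shown in FIG. 3; comparison with FIG. 1 shows that its area is smaller.
Specifically, the full adder adds three input items; the half adder adds two input items; the first combined addition circuit adds the most significant bits of the last two partial product rows of the partial product array; the second combined addition circuit performs the addition where a single input item appears on the right side of the last two partial product rows. The processing of the last two partial product rows by the improved square-root carry-select adder is shown in stage 4 of FIG. 4, which shows how the full adders, half adders, first combined addition circuits, and second combined addition circuits are distributed over the data; the black dots represent values in the partial product array.
In a specific implementation, the prior-art encoding and decoding circuits are shown in FIG. 5 to FIG. 10, where NR4SD denotes the Non-Redundant Radix-4 Signed-Digit set. FIG. 5 and FIG. 7 show the existing encoding circuits of non-redundant radix-4 encoding with {-2, -1, 0, 1} as the digit set, and FIG. 9 shows the corresponding decoding circuit, where: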
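[Equations 1 to 5: images PCTCN2021078251-appb-000001 to PCTCN2021078251-appb-000005, not reproduced in this text]
Equations 1 to 5 are the Boolean function expressions of the NR4SD− encoding circuit, where ∧ denotes logical AND, ∨ denotes logical OR, the symbol in image PCTCN2021078251-appb-000006 denotes logical XOR, and n, one, and two are assigned by the equations.
FIG. 6 and FIG. 8 show the existing encoding circuits of non-redundant radix-4 encoding with {-1, 0, 1, 2} as the digit set, and FIG. 10 shows the corresponding decoding circuit, where:
[Equations 6 to 10: images PCTCN2021078251-appb-000007 to PCTCN2021078251-appb-000011, not reproduced in this text]
Equations 6 to 10 are the Boolean function expressions of the NR4SD+ encoding circuit.
The encoding principle of the NR4SD− encoding circuit is shown in Table 1.
[Table 1: image PCTCN2021078251-appb-000012, not reproduced in this text]
The encoding principle of the NR4SD+ encoding circuit is shown in Table 2.
[Table 2: image PCTCN2021078251-appb-000013, not reproduced in this text]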
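Because Equations 6 to 10 and Table 2 survive only as images, the following is a hedged arithmetic sketch of what a recoding into the NR4SD+ digit set {-1, 0, 1, 2} can look like. It is not the patent's Boolean circuit. It assumes an unsigned operand with an even bit width, handles the final carry as an extra digit, and the function names are invented for illustration. Each pair of bits plus an incoming carry has a value from 0 to 4; values 3 and 4 are rewritten as -1 and 0 with a carry of 1 into the next digit.

```python
def nr4sd_plus_digits(b, n_bits):
    """Recode unsigned b into radix-4 digits drawn from {-1, 0, 1, 2}.

    Digits are returned least significant first. This is an arithmetic
    model under stated assumptions, not the circuit of FIG. 6 or FIG. 8.
    """
    assert n_bits % 2 == 0 and 0 <= b < (1 << n_bits)
    digits, carry = [], 0
    for j in range(n_bits // 2):
        pair = (b >> (2 * j)) & 0b11      # bits b[2j+1], b[2j]
        value = pair + carry              # ranges over 0..4
        if value >= 3:                    # 3 becomes -1, 4 becomes 0, carry out 1
            digits.append(value - 4)
            carry = 1
        else:
            digits.append(value)
            carry = 0
    digits.append(carry)                  # final carry kept as an extra digit
    return digits


def digits_to_int(digits):
    """Evaluate a little-endian radix-4 digit list."""
    return sum(d * 4 ** j for j, d in enumerate(digits))


print(nr4sd_plus_digits(219, 8))
```

A signed (two's complement) operand would treat the most significant digit differently; that case is left out here because the source tables that would settle it are not available in this text.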
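As FIG. 5 to FIG. 10 show, the encoding and decoding circuits of the two existing non-redundant radix-4 encoding algorithms are very complicated. In addition, because the encoding and decoding circuits of both algorithms use many standard cells, very few active areas can be shared from a process point of view. To further reduce the circuit area of the parallel multiplier, this embodiment also improves the generation of the partial product array: alongside non-redundant radix-4 encoding, an improved encoding-decoding circuit is proposed. Specifically, the improved encoding-decoding circuit includes a first encoding circuit, a second encoding circuit, and a decoding circuit. One first encoding circuit is connected in sequence to a plurality of second encoding circuits, and the decoding circuit is connected to the first encoding circuit and the second encoding circuits. The first encoding circuit includes an OAI21 or NAND standard cell, the second encoding circuit includes an AOI21 standard cell, and the decoding circuit includes an OAI222D standard cell.
In a specific implementation, in stage 1 shown in FIG. 4, the improved encoding-decoding circuit encodes and decodes the multiplier b[n-1:0] in the NR4SD+ manner. The Boolean functions for encoding and decoding depend only on b_{2j+1}, b_{2j}, c_{2j+2}, c_{2j}, a_i, and a_{i-1}, where b_{2j+1} is bit 2j+1 of the multiplier, b_{2j} is bit 2j of the multiplier, a_i is bit i of the multiplicand, a_{i-1} is bit i-1 of the multiplicand, c_{2j+2} is the (2j+2)-th intermediate encoding result derived from the multiplier b, and c_{2j} is the 2j-th intermediate encoding result derived from the multiplier b. The resulting encoder-decoder is smaller, and the delay from c_{2j} to c_{2j+2} is reduced.
Specifically, the encoding and decoding process is shown in FIG. 11. FIG. 11(a) shows how values are encoded and decoded to obtain the partial product array. FIG. 11(b) shows the structure of the first encoding circuit, and FIG. 11(c) shows the structure of the second encoding circuit; both take b_{2j}, b_{2j+1}, and c_{2j} as inputs and output c_{2j+2}. FIG. 11(d) shows the structure of the decoding circuit; the 1X, -1X, and 2X signals shown in it are generated from combinations of b_{2j}, b_{2j+1}, c_{2j}, and c_{2j+2}.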
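To make the role of the 1X, -1X, and 2X selections concrete, here is a small behavioral sketch of how recoded digits pick multiples of the multiplicand to form partial product rows. It is an illustration under assumptions (unsigned operands, integer arithmetic instead of gates, hypothetical function names), not the decoding circuit of FIG. 11(d).

```python
def recode(b, n_bits):
    """Radix-4 digits of unsigned b from {-1, 0, 1, 2}, least significant first."""
    digits, carry = [], 0
    for j in range(n_bits // 2):
        value = ((b >> (2 * j)) & 0b11) + carry
        carry = 1 if value >= 3 else 0
        digits.append(value - 4 * carry)
    return digits + [carry]


def partial_product_rows(a, b, n_bits):
    """One row per digit: the selected multiple of a, shifted by 2 bits per digit."""
    rows = []
    for j, d in enumerate(recode(b, n_bits)):
        sel_1x, sel_neg_1x, sel_2x = d == 1, d == -1, d == 2
        if sel_1x:
            multiple = a
        elif sel_neg_1x:
            multiple = -a
        elif sel_2x:
            multiple = 2 * a          # a one-bit left shift in hardware
        else:
            multiple = 0
        rows.append(multiple * 4 ** j)
    return rows


rows = partial_product_rows(13, 219, 8)
print(rows, sum(rows))
```

Summing the rows reproduces the product, which is the property the later accumulation stages rely on. In a real circuit the negative row would be formed by bit inversion plus a correction bit rather than by integer negation.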
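In a specific implementation, the partial product array obtained by modified Booth encoding in the prior art is shown in FIG. 12, and the encoding-decoding circuit operator used is shown in FIG. 13. Table 3 below is the truth table of the partial product array; τ_31 and τ_30 in FIG. 12 are obtained from the truth table in Table 3.
[Table 3: image PCTCN2021078251-appb-000014, not reproduced in this text]
The resulting Boolean function expressions are:
[Boolean expressions, including Equation 16: images PCTCN2021078251-appb-000015 to PCTCN2021078251-appb-000022, not reproduced in this text]
The term cin_j in Equation 16 is c_i in Table 3. The partial product array obtained in the prior art is therefore relatively large, and its timing is relatively poor.
To further improve the timing of the parallel multiplier and shrink the partial product array, this embodiment proposes that the partial product array can also be optimized as it is generated. For example, the parallel multiplier may also include an optimization processing structure that uses Boolean functions to derive an optimized form of the partial product array output by the encoding-decoding circuit.
Specifically, the two partial product array optimization schemes of Shiann-Rong Kuang and Wen_Chang Yeh can be combined and their ideas applied to the partial product array of the NR4SD+ encoding algorithm. In addition, by optimizing Row_LSB (the last bit of a row), Neg_cin (the negative carry-in), and the ordinary partial product terms (the partial products other than Row_LSB and Neg_cin), the optimized partial product array shown in stage 2 of FIG. 4 and in FIG. 14 is obtained. Comparing FIG. 14 with FIG. 13 shows that the optimized partial product array is smaller and its timing is better.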
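In a specific implementation, for the accumulation of the partial product array, the tree structures commonly used today are the Wallace tree, the Dadda tree, and the Reduced area tree, as shown in FIG. 15: FIG. 15(a) is the Wallace tree, FIG. 15(b) is the Dadda tree, and FIG. 15(c) is the Reduced area tree. Table 4 lists the number of full adders, the number of half adders, and the VMA (Vector Merging Adder) length for the three reduction tree methods. The three existing methods are algorithmically complex, and their area and timing are not optimal.
Tree structure   Full adders   Half adders   VMA length
Wallace          16            13            8
Dadda            15            5             10
Reduced area     18            5             7
Table 4
To further improve the timing of the parallel multiplier and reduce circuit area, this embodiment proposes a reduction tree structure for the accumulation of the partial product array. For example, the reduction tree structure includes a full adder, a half adder, a second custom combinational circuit, and a ripple-carry adder; the second custom combinational circuit includes digital circuit devices, namely one XNOR gate and one OR gate.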
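One plausible reading of a cell built from one XNOR gate and one OR gate, used "when the input includes the digit 1", is an adder for two bits plus a constant 1. This interpretation is an assumption, since the source does not give the cell's truth table; the arithmetic identity itself is standard and is checked exhaustively below. The function name is hypothetical.

```python
def add_two_bits_plus_one(a, b):
    """Compute a + b + 1 for single bits a and b as (sum_bit, carry_bit)."""
    sum_bit = 1 - (a ^ b)   # XNOR: the constant 1 inverts the half-adder sum
    carry = a | b           # OR: a carry occurs whenever either input is 1
    return sum_bit, carry


for a in (0, 1):
    for b in (0, 1):
        s, c = add_two_bits_plus_one(a, b)
        print(a, b, "->", c, s)
```

Compared with a full adder fed a constant 1, such a cell needs only two gates, which is consistent with the area savings the text attributes to the custom circuit.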
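Specifically, when this reduction tree structure is used to accumulate the partial product array: the full adder performs addition before the improved square-root carry-select adder appears; the half adder performs addition on the left side of the partial product array when two summation terms remain in the column to the left of a full adder, or when two summation terms remain in the column to the left of a half adder; the second custom combinational circuit performs addition when the input includes the digit 1; and the ripple-carry adder performs addition at the rightmost end of the partial product array before the improved square-root carry-select adder appears. As shown in FIG. 4, when the reduction tree structure accumulates the parts of the partial product array other than the last two partial product rows, the distribution of the full adders, half adders, and second custom combinational circuits is as shown in stage 3 of FIG. 4. The reduction tree structure proposed in this embodiment minimizes the combined area of the full adders and half adders in the tree; in addition, compared with the three existing reduction tree methods, its algorithm is simpler, its area is smaller, and its timing is better.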
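The general idea of reducing a partial product array to two rows with full adders and half adders can be sketched as follows. This is a generic Wallace-style column compression on a plain AND-array of an unsigned product, written only to illustrate the technique; it does not reproduce the placement rules of stage 3 in FIG. 4, the custom cell, or the ripple-carry section, and all names are assumptions.

```python
def and_array_columns(a, b, width):
    """Bit heap of the unsigned product a*b: columns[k] holds bits of weight 2**k."""
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns


def reduce_to_two_rows(columns):
    """Compress every column to at most two bits; return the heap and adder counts."""
    cols = [list(c) for c in columns]
    full_adders = half_adders = 0
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            pos = 0
            while len(col) - pos >= 3:                 # full adder: 3 bits in, 2 out
                x, y, z = col[pos:pos + 3]
                nxt[k].append(x ^ y ^ z)
                nxt[k + 1].append((x & y) | (y & z) | (x & z))
                full_adders += 1
                pos += 3
            if pos > 0 and len(col) - pos == 2:        # half adder on a leftover pair
                x, y = col[pos:pos + 2]
                nxt[k].append(x ^ y)
                nxt[k + 1].append(x & y)
                half_adders += 1
                pos += 2
            nxt[k].extend(col[pos:])                   # pass remaining bits through
        cols = nxt
    return cols, full_adders, half_adders


def heap_value(columns):
    """Numeric value represented by a bit heap."""
    return sum(bit << k for k, col in enumerate(columns) for bit in col)


reduced, fa, ha = reduce_to_two_rows(and_array_columns(13, 11, 4))
print(heap_value(reduced), fa, ha)
```

The two remaining rows are what the final carry-propagating adder receives. The adder counts returned here depend on this particular greedy policy and are not comparable to the figures in Table 4.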
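In a specific implementation, the above parallel multiplier can be applied to signed and unsigned operands, odd and even bit widths, and fixed-point and floating-point operations; that is, it is a general-purpose parallel multiplier.
In a specific implementation, based on TSMC's 28 nm standard cell library, the above parallel multiplier was applied to a signed 16-bit multiplier; relative to Design Compiler, the area gain is greater than 31.667% and the power gain is greater than 21.68%.
Based on the same inventive concept, an embodiment of the present invention also provides a working method of a parallel multiplier, as described in the following embodiments. Because the working method solves the problem on the same principle as the parallel multiplier, its implementation can refer to the implementation of the parallel multiplier, and repeated details are omitted.
FIG. 16 is a flowchart of a working method of a parallel multiplier according to an embodiment of the present invention. As shown in FIG. 16, the method includes:
Step 1602: encode and decode the NR4SD+ digit set through the encoding-decoding circuit to obtain a partial product array;
Step 1604: accumulate the parts of the partial product array other than the last two partial product rows through the reduction tree structure;
Step 1606: add the last two partial product rows of the partial product array with an improved square-root carry-select adder, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.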
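In one embodiment, adding the last two partial product rows of the partial product array with the improved square-root carry-select adder includes:
the first custom combinational circuit includes a first combined addition circuit and a second combined addition circuit, the first combined addition circuit includes one NOT gate, and the second combined addition circuit includes one NOT gate and one OR gate, wherein,
when three input items are added, the full adder is used;
when two input items are added, the half adder is used;
the most significant bits of the last two partial product rows of the partial product array are added with the first combined addition circuit;
where a single input item appears on the right side of the last two partial product rows of the partial product array, the second combined addition circuit is used for the addition.
In one embodiment, the encoding-decoding circuit includes a first encoding circuit, a second encoding circuit, and a decoding circuit; one first encoding circuit is connected in sequence to a plurality of second encoding circuits, and the decoding circuit is connected to the first encoding circuit and the second encoding circuits, wherein,
the first encoding circuit includes an OAI21 or NAND standard cell, the second encoding circuit includes an AOI21 standard cell, and the decoding circuit includes an OAI222D standard cell.
In one embodiment, accumulating the parts of the partial product array other than the last two partial product rows through the reduction tree structure includes:
the reduction tree structure includes a full adder, a half adder, a second custom combinational circuit, and a ripple-carry adder, and the second custom combinational circuit includes digital circuit devices, wherein,
before the improved square-root carry-select adder appears, the full adder is used for addition;
on the left side of the partial product array, the half adder is used when two summation terms remain in the column to the left of a full adder, or when two summation terms remain in the column to the left of a half adder;
when the input includes the digit 1, the second custom combinational circuit is used for addition;
at the rightmost end of the partial product array, before the improved square-root carry-select adder appears, the ripple-carry adder is used for addition.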
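In another embodiment, software is also provided for executing the technical solutions described in the foregoing embodiments and preferred implementations.
In another embodiment, a storage medium storing the above software is also provided; the storage medium includes but is not limited to an optical disk, a floppy disk, a hard disk, and rewritable memory.
The embodiments of the present invention achieve the following technical effects. The NR4SD+ digit set is encoded and decoded to obtain a partial product array; compared with the prior art, its digit set is 1 bit smaller than that of modified Booth encoding, which helps reduce the circuit area needed to generate the partial products. In addition, the parts of the partial product array other than the last two partial product rows are accumulated by the reduction tree structure, and an improved square-root carry-select adder is proposed for adding the last two partial product rows. The improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices. Compared with the traditional square-root carry-select adder, timing is improved and area is reduced, which helps the parallel multiplier meet high-performance timing requirements.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operational steps is performed on the computer or other programmable apparatus to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and do not limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.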

Claims (10)

  1. 一种并行乘法器,其特征在于,包括:
    编码解码电路,用于对NR4SD +的数字集进行编码和解码处理,得到部分积阵列;
    约简树结构,连接所述编码解码电路,用于对所述部分积阵列中除最后两个部分积行之外的部分进行累加处理;
    改进后的平方根选择进位加法器,连接所述约简树结构,用于对所述部分积阵列的最后两个部分积行进行加法处理,其中,改进后的平方根选择进位加法器包括全加器、半加器和第一定制组合电路,所述第一定制组合电路包括数字电路器件。
  2. 如权利要求1所述的并行乘法器,其特征在于,所述第一定制组合电路包括第一组合加法电路和第二组合加法电路,所述第一组合加法电路包括一个非门,所述第二组合加法电路包括一个非门和一个或门,其中,
    所述全加器,用于对三项输入数据进行加法处理;
    所述半加器,用于对两项输入数据进行加法处理;
    所述第一组合加法电路,用于对所述部分积阵列的最后两个部分积行的最高位进行加法处理;
    所述第二组合加法电路,用于在所述部分积阵列的最后两个部分积行中,在右侧中,出现一项输入数据时进行加法处理。
  3. 如权利要求1所述的并行乘法器,其特征在于,所述编码解码电路包括第一编码电路、第二编码电路和解码电路,所述编码解码电路中一个所述第一编码电路依次连接多个所述第二编码电路,所述解码电路连接所述第一编码电路和所述第二编码电路,其中,
    所述第一编码电路包括OAI 21或与非的标准单元,所述第二编码电路包括AOI 21的标准单元,所述解码电路包括OAI 222D标准单元。
  4. 如权利要求1所述的并行乘法器,其特征在于,还包括:
    优化处理结构,用于采用布尔函数对所述编码解码电路输出的所述部分积阵列进行推导。
  5. 如权利要求1所述的并行乘法器,其特征在于,所述约简树结构包括全加器、半加器、第二定制组合电路和行波进位加法器,所述第二定制组合电路包括数字电路器件。
  6. 如权利要求5所述的并行乘法器,其特征在于,所述第二定制组合电路包括一个同或门和一个或门,其中,
    所述全加器,用于在改进后的平方根选择进位加法器出现前,进行加法处理,
    所述半加器,用于在部分积阵列左侧,当全加器左列剩余两个求和项时进行加法处理,或者,当半加器左列剩余两个求和项时进行加法处理;
    所述第二定制组合电路,用于在输入包括数字1时,进行加法处理;
    所述行波进位加法器,用于对部分积阵列最右端,在改进后的平方根选择进位加法器出现前,进行加法处理。
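  7. A working method of the parallel multiplier according to any one of claims 1 to 6, characterized in that it comprises:
    encoding and decoding the NR4SD+ digit set by the encoding and decoding circuit to obtain a partial product array;
    accumulating, by the reduction tree structure, the part of the partial product array other than the last two partial product rows;
    adding the last two partial product rows of the partial product array by the improved square-root carry-select adder, wherein the improved square-root carry-select adder includes a full adder, a half adder, and a first custom combinational circuit, and the first custom combinational circuit includes digital circuit devices.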
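  8. The working method of the parallel multiplier according to claim 7, characterized in that adding the last two partial product rows of the partial product array by the improved square-root carry-select adder comprises:
    the first custom combinational circuit includes a first combined addition circuit and a second combined addition circuit, the first combined addition circuit includes one NOT gate, and the second combined addition circuit includes one NOT gate and one OR gate, wherein
    the full adder is used when three input data items are added;
    the half adder is used when two input data items are added;
    the first combined addition circuit is used to add the most significant bits of the last two partial product rows of the partial product array;
    the second combined addition circuit is used to perform addition when a single input data item appears on the right side of the last two partial product rows of the partial product array.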
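  9. The working method of the parallel multiplier according to claim 7, characterized in that the encoding and decoding circuit includes a first encoding circuit, a second encoding circuit, and a decoding circuit; in the encoding and decoding circuit, one first encoding circuit is connected in sequence to a plurality of second encoding circuits, and the decoding circuit is connected to the first encoding circuit and the second encoding circuits, wherein
    the first encoding circuit includes an OAI21 or NAND standard cell, the second encoding circuit includes an AOI21 standard cell, and the decoding circuit includes an OAI222D standard cell.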
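  10. The working method of the parallel multiplier according to claim 7, characterized in that accumulating, by the reduction tree structure, the part of the partial product array other than the last two partial product rows comprises:
    the reduction tree structure includes a full adder, a half adder, a second custom combinational circuit, and a ripple-carry adder, and the second custom combinational circuit includes digital circuit devices, wherein
    the full adder is used to perform addition before the improved square-root carry-select adder is reached;
    on the left side of the partial product array, the half adder is used to perform addition when two summands remain in the column to the left of a full adder, or when two summands remain in the column to the left of a half adder;
    the second custom combinational circuit is used to perform addition when the inputs include the digit 1;
    at the rightmost end of the partial product array, the ripple-carry adder is used to perform addition before the improved square-root carry-select adder is reached.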
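The accumulation step in claims 5, 6, and 10 can be illustrated with a generic column-compression model. This is a plain Wallace-style reduction using only full adders and half adders; it does not reproduce the specific cell placement, the second custom combinational circuit, or the ripple-carry adder described in the claims. It also uses a simple AND-array partial product generator for unsigned operands rather than NR4SD+ recoding. All names are assumptions made for illustration.

```python
def partial_product_columns(a, b, n_bits):
    """Group the AND-array bits of an unsigned product by column weight."""
    columns = [[] for _ in range(2 * n_bits - 1)]
    for i in range(n_bits):
        for j in range(n_bits):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns


def reduce_to_two_rows(columns):
    """Compress every column to at most two bits, i.e. two final rows."""
    columns = [list(col) for col in columns]
    while any(len(col) > 2 for col in columns):
        next_cols = [[] for _ in range(len(columns) + 1)]
        for i, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                # Full adder: three bits in, sum stays, carry moves left.
                x, y, z = col[k:k + 3]
                next_cols[i].append(x ^ y ^ z)
                next_cols[i + 1].append((x & y) | (x & z) | (y & z))
                k += 3
            if len(col) > 2 and len(col) - k == 2:
                # Half adder on the two bits left over after full adders.
                x, y = col[k:k + 2]
                next_cols[i].append(x ^ y)
                next_cols[i + 1].append(x & y)
                k += 2
            next_cols[i].extend(col[k:])   # pass remaining bits through
        while next_cols and not next_cols[-1]:
            next_cols.pop()                # drop unused top columns
        columns = next_cols
    return columns


def columns_value(columns):
    """Numeric value represented by a list of weighted bit columns."""
    return sum(bit << i for i, col in enumerate(columns) for bit in col)
```

After reduction, the two remaining rows carry the same numeric value as the original array, and they are what the final adder stage would receive.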
PCT/CN2021/078251 2021-02-26 2021-02-26 Parallel multiplier and working method thereof WO2022178861A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/078251 WO2022178861A1 (zh) 2021-02-26 2021-02-26 Parallel multiplier and working method thereof

Publications (1)

Publication Number Publication Date
WO2022178861A1 (zh) 2022-09-01

Family

ID=83047610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/078251 WO2022178861A1 (zh) Parallel multiplier and working method thereof 2021-02-26 2021-02-26

Country Status (1)

Country Link
WO (1) WO2022178861A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5146421A (en) * 1987-11-24 1992-09-08 Digital Equipment Corporation High speed parallel multiplier circuit
US20060143260A1 (en) * 2004-12-29 2006-06-29 Chuan-Cheng Peng Low-power booth array multiplier with bypass circuits
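CN103092560A (zh) * 2013-01-18 2013-05-08 中国科学院自动化研究所 Low-power multiplier based on Bypass technology
CN107977191A (zh) * 2016-10-21 2018-05-01 中国科学院微电子研究所 Low-power parallel multiplier
CN109753268A (zh) * 2017-11-08 2019-05-14 北京思朗科技有限责任公司 Multi-granularity parallel operation multiplier
CN111966323A (zh) * 2020-08-18 2020-11-20 合肥工业大学 Approximate multiplier based on unbiased compressor and calculation method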

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927297

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21927297

Country of ref document: EP

Kind code of ref document: A1