JP2005242722A

JP2005242722A - Parallel operation unit and method

Info

Publication number: JP2005242722A
Application number: JP2004052586A
Authority: JP
Inventors: Mitsuyoshi Nozoe; 三資農添
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-02-27
Filing date: 2004-02-27
Publication date: 2005-09-08

Abstract

PROBLEM TO BE SOLVED: To realize high-speed arithmetic operation without increasing circuit scale.
SOLUTION: When an input word is divided and operated on in parallel, an input processing means 101 converts the input value of a prescribed boundary bit according to the kind of arithmetic operation, the operation is performed on the converted input value, and an output correction means 103 corrects the operation result according to the conversion applied to the boundary bit, so that high-speed operation can be realized without increasing circuit scale.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a digital arithmetic device, and more particularly to a parallel arithmetic method that divides input data and performs arithmetic operations in parallel.

A computer typically includes an arithmetic logic unit containing an adder that adds numeric values of a predetermined maximum number of bits. Digital signal processors and similar devices commonly have adders for 16-bit and 32-bit words. These adders also work for smaller words. For example, a 32-bit adder may be used to perform two 16-bit additions in parallel.

Parallel addition is described below by way of example with reference to FIGS. 6, 7, and 8.
FIG. 6 is a diagram showing an example of performing two 4-bit additions in parallel using an 8-bit adder, FIG. 7 is a diagram showing a configuration example of a conventional adder, and FIG. 8 is a diagram illustrating the configuration of a conventional adder that corrects the operation result.

FIG. 6 shows, as an ordinary addition, a calculation example of adding the 8-bit number A[7:0] = [11001101] and the number B[7:0] = [01110101]. The addition result is [101000010]: an 8-bit sum (S[7:0]) and a 1-bit carry (C[8]), for a total of 9 bits. Hereinafter, the sum at bit n is written S[n], and the carry passed from bit n to bit (n+1) is written C[n+1]. When subtraction is performed using the adder in this specification, it is assumed that inverted data is applied to the B[n] input and that there is a mechanism for supplying the initial carry C[0] needed for two's complement subtraction. This assumption is in no way unusual for two's complement subtraction.

Now consider adding the upper 4 bits and the lower 4 bits of A and B in parallel. That is, the 4-bit addition of A[7:4] = [1100] and B[7:4] = [0111] and the 4-bit addition of A[3:0] = [1101] and B[3:0] = [0101] are performed in parallel. As shown on the right of FIG. 6, the desired result is the sum S[7:4] = [0011] and carry C[8] = [1] for the upper 4 bits, and the sum S[3:0] = [0010] and carry C[4] = [1] for the lower 4 bits.
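The target values of FIG. 6 can be checked with a short Python sketch. It models only the arithmetic, not any circuit; the function name `split_add_reference` and its parameters are assumptions introduced here for illustration.

```python
def split_add_reference(a: int, b: int, width: int = 8, split: int = 4):
    """Add the upper and lower fields of a and b independently.

    Returns ((upper_sum, upper_carry), (lower_sum, lower_carry)).
    Hypothetical helper: it states the desired result, not how hardware gets it.
    """
    lo_mask = (1 << split) - 1
    hi_bits = width - split
    hi_mask = (1 << hi_bits) - 1

    lo = (a & lo_mask) + (b & lo_mask)
    hi = ((a >> split) & hi_mask) + ((b >> split) & hi_mask)

    return (hi & hi_mask, hi >> hi_bits), (lo & lo_mask, lo >> split)


A = 0b11001101
B = 0b01110101
full_sum = A + B                      # ordinary 8-bit addition, 9-bit result
upper, lower = split_add_reference(A, B)
```

With the operands of FIG. 6, `full_sum` is the 9-bit value [101000010], `upper` holds S[7:4] and C[8], and `lower` holds S[3:0] and C[4].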

As this example shows, when additions are performed in parallel, propagation of the carry from the lower addition into the upper addition must be controlled.
Furthermore, as multimedia processing such as video becomes necessary in the future, adders of 64 bits or more will be required, and it will become necessary to divide them at arbitrary bit positions.

As a first prior-art example of realizing parallel addition, there is a method in which a selector circuit is inserted into the carry propagation path of the adder to control carry propagation (see, for example, Patent Document 1).

In this conventional adder, as shown in FIG. 7, a selector circuit is provided between bits where a division may occur, and a control signal indicating whether to divide at that bit determines whether the carry is propagated upward, thereby realizing parallel addition. In FIG. 7, when parallel addition is not performed, ordinary addition is performed, so the selector selects the carry from the lower partial adder as its output; when parallel addition is performed, the selector outputs the value 0. As a result, during parallel addition no carry propagates into the upper addition.
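The selector scheme of this first prior-art example can be modeled in software as a bit-serial ripple adder whose carry is replaced by 0 at each division point. The sketch below is a behavioral model only; the `split_mask` convention (bit n set means the carry into bit n is blocked) and the function name are assumptions, not taken from Patent Document 1.

```python
def ripple_add_with_selectors(a, b, split_mask=0, width=8):
    """Ripple-carry addition with a carry selector in front of each bit.

    split_mask bit n set: the selector in front of bit n outputs 0
    instead of the incoming carry (assumed convention).
    Returns (sum, carry_out, blocked), where blocked maps n -> the carry
    that was stopped in front of bit n (the lower field's carry-out).
    """
    carry = 0
    total = 0
    blocked = {}
    for n in range(width):
        if (split_mask >> n) & 1:
            blocked[n] = carry   # carry out of the lower field
            carry = 0            # selector passes 0 upward
        an, bn = (a >> n) & 1, (b >> n) & 1
        total |= (an ^ bn ^ carry) << n
        carry = (an & bn) | (bn & carry) | (carry & an)
    return total, carry, blocked


A, B = 0b11001101, 0b01110101
plain = ripple_add_with_selectors(A, B)                     # no division
split = ripple_add_with_selectors(A, B, split_mask=1 << 4)  # 4 + 4 bits
```

In hardware each selector sits on the carry path, which is why this approach adds one selector delay per possible division point.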

As a second prior-art example, a method is known in which the input data of the arithmetic unit is processed and the resulting operation result is corrected. FIG. 8 is a conceptual diagram of the second prior-art example (see, for example, Patent Document 2).
Patent Document 1: Japanese Patent Laid-Open No. 7-210369
Patent Document 2: Japanese Patent Laid-Open No. 2003-216415

However, in the conventional method of the first prior-art example, a selector circuit must be inserted in the carry propagation path, which increases the number of stages in the adder and lowers the operation speed.

For example, to perform parallel addition with a 16-bit adder divided at arbitrary bit positions, a selector circuit controlling carry propagation must be inserted between every pair of adjacent bits, so selector circuits are inserted at a total of 15 points in the carry propagation path, and the operation time increases by the delay of 15 selector circuits.

The parallel arithmetic circuit of the second prior-art example shown in FIG. 8 processes the input data so that a fixed value is input at each division bit, thereby solving the problem that the operation time increases by the selector delay multiplied by the number of division points.

With this method alone, however, the division bit cannot be used for addition. For example, to perform parallel addition of 4 bits and 4 bits, 8 bits in total, a 9-bit adder with one extra division bit must be used, so the circuit scale increases. The second prior-art example therefore uses, in addition to the input processing means, an output processing circuit that corrects the operation result at the division bit, so that the division bit can also be used for addition, solving the problem of increased circuit scale.

Even in the second prior-art example, however, at least two stages of logic gates are inserted at each bit where the input processing means and the output processing circuit are placed, so the delay time becomes longer.

To solve the above problems, an object of the parallel arithmetic device and parallel arithmetic method of the present invention is to realize high-speed operation without increasing circuit scale.

To achieve the above object, the parallel arithmetic method according to claim 1 of the present invention is a parallel arithmetic method in which a plurality of input words used in an operation are each divided into a plurality of subwords and operated on in parallel, the method comprising: a step of setting a boundary bit so that it is the least significant or most significant bit of a subword; a step of processing the value of the boundary bit in the subword into a set value determined in advance according to the kind of operation; a step of inputting the subword including the processed boundary bit into a pipeline register; a step of performing the operation using the processed subword output from the pipeline register; a step of performing the operation at the boundary bit using the boundary bit before processing and the value of the carry immediately preceding the boundary bit; and a step of correcting the operation result obtained with the processed subword and outputting the operation result of the input word, by outputting, at the boundary bit, the operation result obtained using the boundary bit before processing and the carry immediately preceding the boundary bit, and outputting, at bits other than the boundary bit, the operation result obtained using the processed subword.

The parallel arithmetic method according to claim 2 is a parallel arithmetic method in which a plurality of input words used in an operation are each divided into a plurality of subwords and operated on in parallel, the method comprising: a step of setting a boundary bit so that it is the least significant or most significant bit of a subword; a step of inputting the subword into a pipeline register; a step of processing the value of the boundary bit in the subword into a set value determined in advance according to the kind of operation; a step of performing the operation using the processed subword; a step of performing the operation at the boundary bit using the boundary bit before processing and the value of the carry immediately preceding the boundary bit; and a step of correcting the operation result obtained with the processed subword and outputting the operation result of the input word, by outputting, at the boundary bit, the operation result obtained using the boundary bit before processing and the carry immediately preceding the boundary bit, and outputting, at bits other than the boundary bit, the operation result obtained using the processed subword.

The parallel arithmetic method according to claim 3 is the parallel arithmetic method according to claim 1 or 2, wherein an intermediate value arising during the operation using the subword is used as the carry, and the operation at the boundary bit is performed in parallel with the operation using the subword.

The parallel arithmetic method according to claim 4 is the parallel arithmetic method according to any one of claims 1 to 3, wherein the set value is 0 and the operation is addition.

The parallel arithmetic method according to claim 5 is the parallel arithmetic method according to any one of claims 1 to 3, wherein the set value is 1 and the operation is subtraction.

The parallel arithmetic method according to claim 6 is the parallel arithmetic method according to any one of claims 1 to 5, wherein the operation is carry-lookahead addition.

The parallel arithmetic device according to claim 7 is a parallel arithmetic device that divides each of a plurality of input words used in an operation into subwords, with each of a plurality of boundary bits as a least significant or most significant bit, and operates on them in parallel, the device comprising: input processing means for processing the value of the boundary bit in the subword into a set value determined in advance according to the kind of operation; a pipeline register that receives the subword including the processed boundary bit; an arithmetic unit that performs the operation using the processed subword output from the pipeline register; and output correction means that performs the operation at the boundary bit using the boundary bit before processing and the value of the carry immediately preceding the boundary bit, and that corrects the operation result obtained with the processed subword and outputs the operation result of the input word, by outputting, at the boundary bit, the operation result obtained using the boundary bit before processing and the carry immediately preceding the boundary bit, and outputting, at bits other than the boundary bit, the operation result obtained using the processed subword.

Thus, according to the present invention, high-speed operation can be realized without increasing circuit scale.

In the parallel arithmetic method of the present invention, when an input word is divided and operated on, the input processing means converts the input value of a predetermined boundary bit according to the kind of operation on the data before it is input to the pipeline register, the operation is performed, and the output correction means corrects the operation result according to the conversion applied to the boundary bit. High-speed operation can thereby be realized without increasing circuit scale.

Furthermore, by tapping the signal needed for output correction from partway through the arithmetic unit and performing output correction with that signal, output correction can be carried out in parallel with the operation, enabling even faster operation.

Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1)
The parallel arithmetic device according to Embodiment 1 of the present invention is described below with reference to FIGS. 1, 2, 3, and 4, taking an adder as an example.

FIG. 1 is a block diagram showing the configuration of the parallel adder circuit in Embodiment 1 of the present invention, FIG. 2 is a configuration diagram of the selector in Embodiment 1, and FIG. 3 is a diagram illustrating the parallel arithmetic device in Embodiment 1.

As shown in FIG. 1, the parallel adder circuit in Embodiment 1 comprises input processing means 101, an arithmetic unit 102, output correction means 103, and a pipeline register 104, together with input words 111, a control signal 112, an operation result signal 113, and an operation result word 114.

In FIG. 1, when two input words 111 are operated on, a specific bit is set as a boundary bit so that it is the least significant or most significant bit of a subword obtained by dividing the input word. The input processing means 101 performs an operation based on the control signal 112 on that specific bit of each of the two input words 111 to be input to the pipeline register 104, and supplies a fixed value of 0 or 1, according to the control signal 112, to the boundary-bit input of the pipeline register 104, thereby dividing the input word into subwords. The register for the boundary bit is duplicated: one register receives the output data of the input processing means 101, and the other receives the boundary bit as it was before the operation based on the control signal 112. As described in detail later, duplicating the boundary-bit register and storing the data from before input processing makes output correction possible during parallel operation. The output of the pipeline register 104 is used as the input of the arithmetic unit 102. The output correction means 103 takes as inputs the boundary-bit outputs of the duplicated pipeline register 104 and the specific bit of the operation result signal 113, performs an operation based on the control signal 112, and corrects the operation result at the specific bit.

For the operation result word 114 of the parallel adder circuit, the boundary bit takes the output of the output correction means 103 and the other bits take the operation result signal 113, thereby realizing the parallel operation of the parallel adder circuit.

Thus the input processing means 101 for subword division is not placed between the pipeline register 104 and the arithmetic unit 102 but in front of the pipeline register 104. This eliminates the logic stages of the input processing means 101 from the operation cycle between the pipeline register 104 and the arithmetic unit 102 and shortens the operation time. Processing the input data before the operation cycle reduces the number of gate stages in the operation cycle. The total number of stages on both sides of the register is unchanged, but the method is effective when the operation cycle is the bottleneck.

Next, the input processing means 101 is described in detail with reference to FIGS. 1 and 2, taking as an example the case where the arithmetic unit 102 is an 8-bit adder.
In the following description, the inputs of the 8-bit adder are A[7:0] and B[7:0], and the addition result is output as the sum S[7:0] and the carry C[8]. In general, the sum of an addition is expressed as
S[n] = A[n] ^ B[n] ^ C[n] (Equation 1)
where “^” denotes exclusive OR. The carry for each bit is expressed as
C[n+1] = A[n]·B[n] + B[n]·C[n] + C[n]·A[n] (Equation 2)
where “·” and “+” denote logical AND and logical OR, respectively.
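As a quick sanity check, the sum and carry equations above can be applied bit by bit and compared against ordinary integer addition. The following Python sketch does this; `add_bitwise` is a hypothetical helper name, and the `c0` argument stands in for the initial carry C[0] that the specification assumes for two's complement subtraction.

```python
def add_bitwise(a, b, c0=0, width=8):
    """Apply the per-bit sum and carry equations from bit 0 upward.

    Returns (S[width-1:0], C[width]).
    """
    c = c0
    s = 0
    for n in range(width):
        an, bn = (a >> n) & 1, (b >> n) & 1
        s |= (an ^ bn ^ c) << n              # S[n] = A[n] ^ B[n] ^ C[n]
        c = (an & bn) | (bn & c) | (c & an)  # C[n+1], majority of the three
    return s, c


A, B = 0b11001101, 0b01110101
sum_ab = add_bitwise(A, B)
# Subtraction: invert B and supply C[0] = 1 (two's complement).
diff_ab = add_bitwise(A, ~B & 0xFF, c0=1)[0]
```

Here `sum_ab` reproduces the 8-bit sum and carry of the ordinary addition in FIG. 6, and `diff_ab` equals (A - B) modulo 256.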

When 4-bit additions are performed in parallel and bit 3 is used as the boundary bit described above, A[3] and B[3] are input to the input processing means, which outputs 0 regardless of the input values. That is, the values input to the pipeline register are A[3] = B[3] = 0. At the boundary bit, Equation 1 gives S[3] = A[3] ^ B[3] ^ C[3], and Equation 2 gives C[4] = A[3]·B[3] + B[3]·C[3] + C[3]·A[3]. Since A[3] = B[3] = 0, S[3] = 0 ^ 0 ^ C[3] = C[3] and C[4] = 0. Because no carry propagates into bit 4, for the upper 4-bit addition the sum is obtained in S[7:4] and the carry in C[8]. For the lower 3-bit addition, the 3-bit sum is obtained in S[2:0] and the carry in S[3] (= C[3]). Thus, by setting the boundary-bit input of both input words A and B to 0, the boundary-bit inputs add to 0, so even if the carry from the preceding bit is 1, no carry is generated at the boundary bit. Carry propagation from the lower addition to the upper addition is prevented, and the carry result of the lower addition is obtained at the boundary bit. In the example of FIG. 2, a selector 201 is used so that the input processing means 101 converts the input data to the constant 0. In this example, a circuit that selects either the input A[n] (or B[n]) or the constant 0 according to the value of the control signal 112 is set to select the constant 0, so that 0 is always output.
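The effect of forcing the boundary bit to 0 can be reproduced with plain integer arithmetic. The sketch below is a behavioral model under the assumption that the adder is an ordinary 8-bit adder; the function name `add_with_boundary_forced_zero` is hypothetical.

```python
def add_with_boundary_forced_zero(a, b, boundary=3, width=8):
    """Clear the boundary bit of both operands, then add normally.

    Returns (S[width-1:0], C[width]) of the unmodified adder.
    """
    full = (1 << width) - 1
    mask = ~(1 << boundary) & full        # input processing: boundary bit -> 0
    total = (a & mask) + (b & mask)
    return total & full, total >> width


A, B = 0b11001101, 0b01110101
s, c8 = add_with_boundary_forced_zero(A, B)
upper_sum = s >> 4            # S[7:4], upper 4-bit sum
lower3_sum = s & 0b111        # S[2:0], lower 3-bit sum
lower3_carry = (s >> 3) & 1   # S[3] now carries C[3], the lower field's carry
```

With the FIG. 6 operands, the upper field yields [0011] with C[8] = 1, and the lower 3-bit field yields [010] with its carry appearing at S[3].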

When subtraction is performed in the upper 4 bits, the input processing means outputs 1 instead. Then Equation 2 gives C[4] = 1, so the initial carry of the upper 4 bits is 1 and subtraction can be performed.
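A minimal Python sketch of the subtraction case follows. It assumes, as the specification does for subtraction, that the B input is inverted before it reaches the adder; the function name and the choice to return only the upper field are assumptions made for illustration.

```python
def upper_subtract_via_boundary(a, b, boundary=3, width=8):
    """Compute A[7:4] - B[7:4] (mod 16) by forcing the boundary bit to 1.

    With A[3] = B[3] = 1 the carry into bit 4 is always 1, which acts as
    the initial carry of the upper two's complement subtraction.
    """
    full = (1 << width) - 1
    bit = 1 << boundary
    b_inv = ~b & full          # B input is inverted for subtraction
    a_in = a | bit             # input processing: boundary bit -> 1
    b_in = b_inv | bit
    total = a_in + b_in        # lower field result is ignored in this sketch
    return (total & full) >> (boundary + 1)


A, B = 0b11001101, 0b01110101
upper_diff = upper_subtract_via_boundary(A, B)   # 0b1100 - 0b0111
```

For the FIG. 6 operands the upper fields are 12 and 7, so `upper_diff` is 5.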

Thus the input processing means 101 can be realized with a selector, as shown in FIG. 2, that selects among the input data, 0, and 1 according to the operation instruction.
In this example the circuit selects 0 as the specific data for addition and 1 for subtraction; if the input data is in negative logic, the specific data is simply inverted.

The input processing means in FIG. 1 covers one bit; it divides the input into two subwords so that two calculations are performed in parallel. Providing two or more boundary bits enables three or more parallel calculations. Division into three subwords, for example, is achieved by providing two boundary bits. To do so, the most significant or least significant bit of a subword is made a boundary bit.

With such input processing means, either addition or subtraction can be performed for each subword.
With a configuration that only fixes the input data, however, the lower addition is 3 bits wide, as described here. An 8-bit adder can then perform only 4 + 3 = 7 bits of parallel addition, so a 9-bit adder is needed to perform two parallel additions totaling 8 bits, and the circuit grows.

Next, it is shown that using the output correction means allows an 8-bit adder to perform parallel operations totaling 8 bits; that is, the full bit length is used and the circuit scale can be reduced. The case described here has a single boundary bit, located at the fourth bit from the least significant end.

A specific output correction method is described with reference to FIG. 6.
The right side of FIG. 6 is an example of 4-bit + 4-bit parallel addition using an 8-bit adder. [1100] and [0111] are added in the upper 4 bits, and [1101] and [0101] are added in the lower 4 bits. Since [1100] + [0111] = [10011], the upper 4-bit sum is SU[3:0] = [0011] and the carry is CU = [1]. Similarly, since [1101] + [0101] = [10010], the lower 4-bit sum is SL[3:0] = [0010] and the carry is CL = [1]. The following shows that the present invention computes these values correctly. As described in Embodiment 1, under the control signal 112 the input processing means 101 converts the inputs at the most significant bit of the lower subword, bit 3 in this example, to A[3] = B[3] = 0 (shaded portion on the right of FIG. 6). The input data of the arithmetic unit 102 then becomes [11000101] + [01110101], so the operation result signal 113 is the sum S[7:0] = [00111010] and the carry C[8] = [1]. Since S[7:4] = [0011] = SU[3:0] and C[8] = CU, the result of the upper 4-bit addition is obtained in S[7:4] and C[8]. This is because converting to A[3] = B[3] = 0 makes C[4] = 0 in Equation 2 with n = 3, so no carry propagates into the upper 4-bit addition. For the lower 4-bit addition, S[2:0] = [010] = SL[2:0], so the lower 3 bits are correct, but S[3] and the carry C[4] have not been obtained. From Equation 1, S[3] = A[3] ^ B[3] ^ C[3] should be output, but because the input processing means 101 has input A[3] = B[3] = 0, the value actually output is S[3] = 0 ^ 0 ^ C[3] = C[3]. In addition, from Equation 2, C[4] = A[3]·B[3] + B[3]·C[3] + C[3]·A[3] must be output as the carry of the lower addition. The output correction means 103 therefore uses A[3] and B[3] from before processing, together with the value of C[3] output at S[3], to calculate S[3] = A[3] ^ B[3] ^ C[3] and C[4] = A[3]·B[3] + B[3]·C[3] + C[3]·A[3]. In FIG. 3, this is obtained by inputting C[3] on signal 311 and inputting 1 as the control signal 112. In summary, at bit 3, the boundary, the value must be recalculated from the data before processing and the operation result after processing, whereas at the other bits the operation on the processed data already yields the desired values. Therefore the operation result word 114 is obtained by outputting the result of the output correction means 103 at the boundary bit and the operation result using the processed subword at the other bits.
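The worked example can be traced end to end in Python. The sketch below is a behavioral model under stated assumptions: the arithmetic unit is modeled as a plain integer addition, the duplicated boundary-bit register is modeled by the local variables `a3` and `b3`, and the function name `parallel_add_4_4` is hypothetical.

```python
def parallel_add_4_4(a, b):
    """Two 4-bit additions on one 8-bit adder, boundary bit = bit 3.

    Returns (S[7:0], C[4], C[8]): corrected sum word, lower carry, upper carry.
    """
    BOUNDARY = 3
    bit = 1 << BOUNDARY

    # Original boundary bits, kept aside (duplicated register in FIG. 1).
    a3, b3 = (a >> BOUNDARY) & 1, (b >> BOUNDARY) & 1

    # Input processing: boundary bit forced to 0, then ordinary 8-bit addition.
    raw = (a & ~bit & 0xFF) + (b & ~bit & 0xFF)
    s, c8 = raw & 0xFF, raw >> 8

    # The raw S[3] is really C[3], the carry into the boundary bit.
    c3 = (s >> BOUNDARY) & 1

    # Output correction: recompute the boundary bit from the original inputs.
    s3 = a3 ^ b3 ^ c3
    c4 = (a3 & b3) | (b3 & c3) | (c3 & a3)

    s = (s & ~bit & 0xFF) | (s3 << BOUNDARY)
    return s, c4, c8


A, B = 0b11001101, 0b01110101
s, c4, c8 = parallel_add_4_4(A, B)
```

For the FIG. 6 operands this yields S[7:4] = [0011], S[3:0] = [0010], and both carries equal to 1, matching SU, SL, CU, and CL.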

A specific circuit configuration is described with reference to FIG. 3.
In the output correction means 103 of FIG. 3, during parallel operation the value of C[n] appears on the operation result signal 311 and the uncorrected operation result is not the correct value. For the boundary bit, therefore, the values of A[n] and B[n] of the input words 111 are used to calculate C[n+1] and S[n] as correction values according to Equations 1 and 2, and these are taken as the operation result at the boundary bit. When no boundary bit is provided and no parallel operation is performed, an 8-bit addition is performed with the 8-bit adder as on the left of FIG. 6. In this case the input processing means 101 selects and outputs the input data A[n] and B[n]. The ordinary 8-bit data A[7:0] and B[7:0] therefore enter the arithmetic unit 102, the operation result is S[7:0], and S[3] appears on net 311 in FIG. 3. Since S[3] must be output in the operation result word 114, the Sn2 input in FIG. 3 must be passed through to 114. Inputting 0 as the control signal 112 gives S[n] = Sn2 ^ 0 = Sn2, which is the desired result.

The above describes the case where the only boundary bit is the fourth bit.
To divide the word into subwords at an arbitrary bit, the input processing means and the output correction means can be provided at every bit.
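Providing the input processing and output correction at every bit can be pictured as driving both from a per-bit control mask. The following Python sketch illustrates that idea under assumptions of its own: the name `parallel_add`, the `boundary_mask` parameter (bit n set means bit n is a boundary bit), and the dictionary of subword carries are all hypothetical, and only addition with set value 0 is modeled.

```python
def parallel_add(a, b, boundary_mask, width=8):
    """Add subwords of a and b, split wherever boundary_mask has a 1 bit.

    Each boundary bit is the most significant bit of its subword.
    Returns (result_word, carries), where carries[n] is the carry out of
    the subword whose boundary bit is n. Names are assumptions of this sketch.
    """
    full = (1 << width) - 1
    keep = full & ~boundary_mask

    # Input processing at every bit: boundary bits are forced to 0,
    # all other bits pass through unchanged.
    raw = ((a & keep) + (b & keep)) & full

    # Output correction at every bit: non-boundary bits pass through,
    # boundary bits are recomputed from the unprocessed inputs.
    result = raw & keep
    carries = {}
    for n in range(width):
        if (boundary_mask >> n) & 1:
            c_n = (raw >> n) & 1  # carry arriving at the boundary bit
            a_n = (a >> n) & 1
            b_n = (b >> n) & 1
            result |= (a_n ^ b_n ^ c_n) << n
            carries[n] = (a_n & b_n) | (b_n & c_n) | (c_n & a_n)
    return result, carries
```

With `boundary_mask=0x08` this reduces to the 4-bit plus 4-bit split, and with `boundary_mask=0` it behaves as a plain 8-bit adder.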

C[n+1] is as given by Equation 2. As for S[n], at bits other than the boundary bit (bits 7 to 4 and bits 2 to 0 when an 8-bit adder is divided into 4 bits + 4 bits) the control signal 112 is 0, so in FIG. 3, Sn1 = 0 and Sn2 = S[n]. Therefore
S[n] = Sn1 ^ Sn2 = 0 ^ S[n] = S[n]
and the correct addition result is output. At the boundary bit the control signal 112 is 1, so Sn1 = A[n] ^ B[n] and Sn2 = C[n]. Therefore
S[n] = Sn1 ^ Sn2 = A[n] ^ B[n] ^ C[n] = S[n]
and the correct addition result is output in this case as well.
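The two cases of this derivation can be captured in a one-bit Python model of the output correction. It is only a sketch: gating Sn1 with an AND of the control signal is an assumption chosen to reproduce the two stated cases (Sn1 = 0 when the control signal is 0, Sn1 = A[n] ^ B[n] when it is 1), and the actual gates of FIG. 3 are not reproduced here. The function and parameter names are hypothetical.

```python
def output_correction_bit(a_n, b_n, sig_311, ctl_112):
    """One-bit model of the output correction described in the text.

    a_n, b_n : unprocessed input bits A[n], B[n]
    sig_311  : bit from the arithmetic unit (S[n] normally, C[n] at a boundary)
    ctl_112  : 1 at a boundary bit, 0 elsewhere
    """
    sn1 = ctl_112 & (a_n ^ b_n)  # assumed gating: 0 unless this is a boundary bit
    sn2 = sig_311
    return sn1 ^ sn2


# Non-boundary bit: the adder output passes through unchanged.
assert output_correction_bit(1, 0, 1, 0) == 1
# Boundary bit: S[n] = A[n] ^ B[n] ^ C[n].
assert output_correction_bit(1, 0, 1, 1) == 0
```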

By using the output correction circuit in this way, a parallel adder that makes full use of the adder's bit length can be configured, and the circuit scale can be reduced.
As described above, when an input word is divided for operation, the input processing means converts the input value of a predetermined boundary bit according to the operation type in the data before it enters the pipeline register, the operation is performed, and the output correction means corrects the operation result according to the conversion of the boundary bit's input value. High-speed operation is thereby achieved without increasing the circuit scale.
(Embodiment 2)
The parallel arithmetic device according to Embodiment 2 of the present invention is described below with reference to FIGS. 4 and 5.

FIG. 4 is a block diagram showing the configuration of the parallel adder circuit of Embodiment 2 of the present invention, and FIG. 5 is a diagram explaining the parallel addition of Embodiment 2. As shown in FIG. 4, the circuit consists of the input processing means 101, the arithmetic unit 102, the output correction means 103, and the arithmetic unit internal signal 411.

The input processing means 101 is the same as in Embodiment 1. However, whereas in Embodiment 1 it was placed before the pipeline register, here it may be placed after the pipeline register. The arithmetic unit internal signal 411, an intermediate value of the operation, is used as the input to the output correction means 103. The carry signal arriving at the boundary bit is used as this internal signal 411. As explained for Embodiment 1, the operation result signal 113 of Embodiment 1 becomes a carry signal when the word is divided at the boundary bit; tapping a signal of the same logic from partway through the arithmetic unit reduces the total number of gate stages. With the arithmetic unit configuration and output correction means of FIGS. 1 and 3, the logic gates of the output correction means follow the arithmetic unit output, which degrades the operation speed. With the configuration of FIG. 4, a signal of the same logic as the operation result signal 113 of Embodiment 1 is tapped from partway through the arithmetic unit, so part of the operation and the output correction can proceed in parallel, which increases the speed.

Not every addition scheme has a signal with the same logic as the operation result signal 113 of Embodiment 1, as used in FIG. 4. Such a signal does exist, however, in schemes such as carry look-ahead, which has become a common high-speed addition method in recent years. The carry look-ahead case is described next as an example.

An example for the 8-bit carry look-ahead scheme is shown with reference to FIG. 5. As in FIG. 5, a carry look-ahead circuit generally consists of a carry generate term and propagate term generation unit 801, a carry generation unit 802, and a sum generation unit 803. The carry generation unit has eight outputs, which are the inputs to the sum generation unit 803. The eight outputs of the carry generation unit 802 correspond to the carries C[8:1] of bits 1 to 8. The sum generation unit 803 operates on these carry signals and the input signals and outputs the sum S[7:0]. The carry signal output from the carry generation unit 802 is used as the arithmetic unit internal signal 411 in FIG. 4.
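The three stages named here can be illustrated with a small Python model of an 8-bit carry look-ahead adder. It is a behavioral sketch, not the circuit of FIG. 5: the function name `carry_lookahead_add` and the returned list of carries are assumptions, and the textbook definitions G[i] = A[i]·B[i] and P[i] = A[i] ^ B[i] are used for the generate and propagate terms.

```python
def carry_lookahead_add(a, b, width=8, c0=0):
    """Return (sum, carries), where carries[i] is C[i] for i = 0..width."""
    # Stage corresponding to 801: generate and propagate terms.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Stage corresponding to 802: each carry is formed directly from the
    # G/P terms and c0,
    #   C[i+1] = G[i] + P[i]·G[i-1] + ... + P[i]·...·P[0]·c0,
    # rather than by waiting for the carry of the bit below.
    carries = [c0]
    for i in range(width):
        carry = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            carry |= term
        term = c0
        for k in range(i + 1):
            term &= p[k]
        carries.append(carry | term)

    # Stage corresponding to 803: S[i] = P[i] ^ C[i].
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries
```

The entries `carries[1]` to `carries[8]` play the role of C[8:1]; they are available before the sum stage runs, which is what makes them usable as an internal tap.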

In FIG. 4, using the output of the carry generation unit 802 for the boundary bit as the arithmetic unit internal signal 411 yields the correct result, as shown in Embodiment 1. This is because a signal of the same logic as the operation result signal 113 of Embodiment 1 is taken from partway through the arithmetic unit rather than from its output. In the carry look-ahead example, the number of gate stages is thereby reduced by those of the sum generation unit 803 compared with Embodiment 1, which increases the speed.
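A short Python model can make this equivalence concrete for the 4-bit plus 4-bit split: the carry taken from the carry stage at the boundary bit has the same value as the boundary bit of the adder's final output, so the correction does not need to wait for the sum stage. The sketch is built on assumptions: the name `embodiment2_add` is hypothetical, the carries are computed by a simple recurrence standing in for the carry generation unit 802, and software cannot show the gate-level timing benefit, only the logical equality.

```python
BOUNDARY = 3  # boundary bit of the 4-bit + 4-bit split


def embodiment2_add(a, b, width=8):
    """Return (result, sig_411, raw_boundary_bit) for two packed 4-bit adds."""
    clear = ((1 << width) - 1) & ~(1 << BOUNDARY)
    a_proc, b_proc = a & clear, b & clear  # input processing (set value 0)

    # Generate/propagate terms of the processed operands.
    g = [(a_proc >> i) & (b_proc >> i) & 1 for i in range(width)]
    p = [((a_proc >> i) ^ (b_proc >> i)) & 1 for i in range(width)]

    # Carry stage (stand-in for 802): carries[i] is the carry into bit i.
    carries = [0]
    for i in range(width):
        carries.append(g[i] | (p[i] & carries[i]))

    # Internal signal tapped from the carry stage at the boundary bit.
    sig_411 = carries[BOUNDARY]

    # Output correction uses the tap and the unprocessed boundary inputs;
    # it has no dependency on the sum stage below.
    a_n = (a >> BOUNDARY) & 1
    b_n = (b >> BOUNDARY) & 1
    s_n = a_n ^ b_n ^ sig_411

    # Sum stage (stand-in for 803), needed only for the non-boundary bits.
    raw = sum((p[i] ^ carries[i]) << i for i in range(width))

    result = (raw & clear) | (s_n << BOUNDARY)
    return result, sig_411, (raw >> BOUNDARY) & 1
```

The second and third return values are always equal, which is the "same logic" property the text relies on.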

The arithmetic unit is therefore also faster than in the second conventional example.
It is self-evident that, provided the arithmetic unit internal signal and the output correction circuit express the same logic and function, the effects of the present invention are not limited to the adder circuit shown in FIG. 8.

As described above, by tapping a signal with the same logic as the carry signal that appears partway through the arithmetic unit, the output correction can be performed in parallel with the operation of the arithmetic unit. Operation is therefore even faster than in Embodiment 1, where the output correction follows the operation of the arithmetic unit, and this is achieved without increasing the circuit scale.

The parallel arithmetic device and parallel arithmetic method according to the present invention achieve high-speed operation without increasing the circuit scale, and are useful for applications such as parallel arithmetic operations on divided input data.

FIG. 1 is a block diagram showing the parallel adder circuit configuration in Embodiment 1 of the present invention.
FIG. 2 is a configuration diagram of the selector in Embodiment 1 of the present invention.
FIG. 3 is a diagram illustrating the parallel arithmetic device in Embodiment 1 of the present invention.
FIG. 4 is a block diagram showing the parallel adder circuit configuration in Embodiment 2 of the present invention.
FIG. 5 is a diagram explaining the parallel addition of Embodiment 2 of the present invention.
FIG. 6 is a diagram showing an example of performing two 4-bit parallel additions with an 8-bit adder.
FIG. 7 is a diagram showing a configuration example of a conventional adder.
FIG. 8 is a diagram illustrating the configuration of a conventional adder that corrects the operation result.

Explanation of symbols

101 Input processing means
102 Arithmetic unit
103 Output correction means
104 Pipeline register
111 Input word
112 Control signal
113 Operation result signal
114 Operation result word
201 Selector
311 Operation result signal
411 Arithmetic unit internal signal
801 Carry generate term and propagate term generation unit
802 Carry generation unit
803 Sum generation unit

Claims

A parallel operation method in which each of a plurality of input words used in an operation is divided into a plurality of subwords and the subwords are operated on in parallel, the method comprising:
a step of setting a boundary bit so that it is the least significant or most significant bit of the subword;
a step of processing the value of the boundary bit in the subword into a set value predetermined according to the operation type;
a step of inputting the subword containing the processed boundary bit into a pipeline register;
a step of performing the operation using the processed subword output from the pipeline register;
a step of performing the operation at the boundary bit using the boundary bit before processing and the value of the carry immediately preceding the boundary bit; and
a step of outputting the operation result of the input word while correcting the operation result obtained with the processed subword, by outputting, at the boundary bit, the result of the operation using the boundary bit before processing and the carry immediately preceding the boundary bit, and, at bits other than the boundary bit, the result of the operation using the processed subword.

A parallel operation method in which each of a plurality of input words used in an operation is divided into a plurality of subwords and the subwords are operated on in parallel, the method comprising:
a step of setting a boundary bit so that it is the least significant or most significant bit of the subword;
a step of inputting the subword into a pipeline register;
a step of processing the value of the boundary bit in the subword into a set value predetermined according to the operation type;
a step of performing the operation using the processed subword;
a step of performing the operation at the boundary bit using the boundary bit before processing and the value of the carry immediately preceding the boundary bit; and
a step of outputting the operation result of the input word while correcting the operation result obtained with the processed subword, by outputting, at the boundary bit, the result of the operation using the boundary bit before processing and the carry immediately preceding the boundary bit, and, at bits other than the boundary bit, the result of the operation using the processed subword.

The parallel operation method according to claim 1 or claim 2, wherein an intermediate value arising during the operation using the subword is used as the carry, and the operation at the boundary bit is performed in parallel with the operation using the subword.

The parallel operation method according to any one of claims 1 to 3, wherein the set value is 0 and the operation is addition.

The parallel operation method according to any one of claims 1 to 3, wherein the set value is 1 and the operation is subtraction.

The parallel operation method according to any one of claims 1 to 5, wherein the operation is carry look-ahead addition.

A parallel arithmetic device that divides each of a plurality of input words used in an operation into subwords, each of a plurality of boundary bits being the least significant or most significant bit of a subword, and operates on the subwords in parallel, the device comprising:
input processing means for processing the value of the boundary bit in the subword into a set value predetermined according to the operation type;
a pipeline register that receives the subword containing the processed boundary bit;
an arithmetic unit that performs the operation using the processed subword output from the pipeline register; and
output correction means that performs the operation at the boundary bit using the boundary bit before processing and the value of the carry immediately preceding the boundary bit, and that outputs the operation result of the input word while correcting the operation result obtained with the processed subword, by outputting, at the boundary bit, the result of the operation using the boundary bit before processing and the carry immediately preceding the boundary bit, and, at bits other than the boundary bit, the result of the operation using the processed subword.