KR102574824B1

KR102574824B1 - Parallel processing apparatus for supporting variable bit number

Info

Publication number: KR102574824B1
Application number: KR1020210053858A
Authority: KR
Inventors: 김태형
Original assignee: 주식회사 모르미
Priority date: 2021-04-26
Filing date: 2021-04-26
Publication date: 2023-09-06
Also published as: KR20220146937A

Abstract

A parallel processing apparatus according to an embodiment includes a preprocessor that includes preprocessing units and receives first signals and second signals, and a main processor that includes adders. Each of the preprocessing units includes a shift operation unit. The shift operation unit transfers signals obtained by shifting a corresponding first signal among the first signals to a corresponding adder among the adders according to the bits of a corresponding second signal among the second signals, and, according to a division selection signal, controls some bits of the shifted signals to be 0.

Description

Parallel processing apparatus supporting a variable number of bits {PARALLEL PROCESSING APPARATUS FOR SUPPORTING VARIABLE BIT NUMBER}

The technology described below relates to a parallel processing apparatus.

Much research has been conducted on parallel processing apparatuses in pursuit of high data processing performance. One example of a parallel processing apparatus is the multi-core processor, a processor having a plurality of cores (processing units). Multi-core processors are used because increasing the number of cores is intended to improve the performance of the processor as a whole. For various reasons, however, overall processor performance does not increase in proportion to the number of cores.

The inventor of the present invention has carried out continuous development to address these problems and, on that basis, has made the inventions of Korean Patent Publication Nos. 10-2019-0132295, 10-2018-0057950, 10-2018-0058166, 10-2018-0058167, 10-2018-0007523, and 10-2018-0007652, and Korean Patent Registration No. 10-1859294.

Korean Patent Publication Nos.: 10-2019-0132295, 10-2018-0057950, 10-2018-0058166, 10-2018-0058167, 10-2018-0007523, 10-2018-0007652
Korean Patent Registration No.: 10-1859294

A parallel processing apparatus according to the related art has separate adders and multipliers, which lowers its efficiency. More specifically, when many multiplications are required, all the multipliers are used but some adders sit idle; when many additions are required, all the adders are used but some multipliers sit idle. In addition, data exchange between processing units is not easy in the related-art apparatus. This degrades the performance of the parallel processing apparatus as a whole.

The present disclosure addresses these problems of the related art. It aims to increase the efficiency of a parallel processing apparatus by designing its processing units so that each can perform both multiplication and addition. It also aims to increase the efficiency of the apparatus as a whole by facilitating data exchange between processing units and by allowing each processing unit to perform different operations over time. Finally, it aims to achieve these improvements without greatly increasing the overall hardware complexity.

The present disclosure further aims to support inputs having a variable number of bits. Such inputs are required for various reasons. For example, in deep learning algorithms such as the artificial neural network (ANN), deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN), the number of bits of the input data and weights affects performance, processing speed, required memory capacity, and so on, so the number of bits needs to be adjusted according to requirements. Likewise, for a parallel processing apparatus that performs many mathematical operations, 16-bit operations may be appropriate in some cases and 32-bit operations in others. If a processor is designed for the maximum number of bits, a significant part of it sits idle whenever an operation with fewer bits is performed, which lowers its efficiency. For example, performing a 16-bit multiplication on a 32-bit multiplier uses only about 25% of the whole processor.
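The 25% figure is consistent with a simple count. The line below assumes (this is not stated in the source) that the hardware in use scales with the number of partial-product bits of an array-style multiplier, that is, with the product of the two operand widths:

```latex
\[
\frac{16 \times 16}{32 \times 32} = \frac{256}{1024} = \frac{1}{4} = 25\%
\]
```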

A parallel processing apparatus according to an embodiment includes a preprocessor that includes preprocessing units and receives first signals and second signals, and a main processor that includes adders. Each of the preprocessing units includes a shift operation unit. The shift operation unit transfers signals obtained by shifting a corresponding first signal among the first signals to a corresponding adder among the adders according to the bits of a corresponding second signal among the second signals, and, according to a division selection signal, controls some bits of the shifted signals to be 0.

Because each unit of the parallel processing apparatus according to the present disclosure can perform both multiplication and addition, the apparatus has high parallel processing efficiency. The apparatus can additionally perform displacement operations and shift operations. It enables easy data exchange between processing units, and each processing unit can perform different operations over time. Despite these improvements, the hardware complexity does not increase greatly.

The parallel processing apparatus according to the present disclosure can also support inputs having a variable number of bits.

FIG. 1 is a diagram showing a parallel processing apparatus according to a first embodiment.
FIG. 2 is a diagram for explaining an example of the i-th preprocessing unit of the first embodiment.
FIG. 3 is a diagram showing a parallel processing apparatus according to a second embodiment.
FIG. 4 is a diagram showing a parallel processing apparatus according to a third embodiment.
FIG. 5 is a diagram for explaining an example of the i-th preprocessing unit of the third embodiment.
FIG. 6 is a diagram showing a parallel processing apparatus according to a fourth embodiment.

The technology described below may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technology described below to particular embodiments; it should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the technology described below.

Terms such as first, second, A, and B may be used to describe various elements, but the elements are not limited by these terms, which are used only to distinguish one element from another. For example, without departing from the scope of the technology described below, a first element may be called a second element, and similarly a second element may be called a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

In the terms used in this specification, a singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" mean that the stated features, numbers, steps, operations, elements, parts, or combinations thereof are present, and should be understood not to exclude the presence or possible addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the components in this specification are distinguished merely according to the main function each is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function. Each component described below may, in addition to its own main function, additionally perform some or all of the functions of other components, and some of the main functions of each component may of course be performed exclusively by another component.

In addition, in performing a method or an operating method, the steps of the method may occur in an order different from the stated order unless the context clearly specifies a particular order. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

FIG. 1 is a diagram showing a parallel processing apparatus according to the first embodiment. Referring to FIG. 1, the parallel processing apparatus receives first to Nth inputs ({X1, Y1}, {X2, Y2}, ... {XN, YN}) and outputs first to Nth outputs (M1, M2, ... MN). Here, N is a natural number of 4 or more; for example, N may be 32. The first to Nth inputs ({X1, Y1}, {X2, Y2}, ... {XN, YN}) consist of first signals (X1, X2, ... XN) and second signals (Y1, Y2, ... YN). The parallel processing apparatus includes a preprocessor 100 and a main processor 200, and may further include a delay section 300 and a selector 400.

When the preprocessor 100 operates in summation mode, it transfers the first signal Xi of the i-th input {Xi, Yi} to the adders (SUM1, SUM2, ... SUMN) of the main processor 200 according to the i-th bits (Y1[i], Y2[i], ... YN[i]) of the second signals (Y1, Y2, ... YN), respectively. Here, i is a natural number from 1 to N. Let the inputs of the first adder SUM1 be S1_1, S1_2, ... S1_N, the inputs of the second adder SUM2 be S2_1, S2_2, ... S2_N, and the inputs of the third adder SUM3 be S3_1, S3_2, ... S3_N. The summation-mode operation of the preprocessor 100 can then be expressed, for example, in the following pseudocode.

[Equation 1]

(Y1[1] ? X1 : 0) => S1_1,

(Y1[2] ? X2 : 0) => S1_2,

......

(Y1[N] ? XN : 0) => S1_N,

(Y2[1] ? X1 : 0) => S2_1,

(Y2[2] ? X2 : 0) => S2_2,

......

(Y2[N] ? XN : 0) => S2_N,

......

(YN[1] ? X1 : 0) => SN_1,

(YN[2] ? X2 : 0) => SN_2,

......

(YN[N] ? XN : 0) => SN_N

In the above equation, [(YN[N] ? XN : 0) => SN_N] means that XN is transferred to SN_N when YN[N] is 1, and 0 is transferred to SN_N when YN[N] is 0. Also, YN[1] denotes the 1st bit (least significant bit) of YN, and YN[N] denotes the Nth bit (most significant bit).
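The gating in Equation 1 can be modeled in a few lines of Python. This is only an illustrative sketch: the names `bit` and `preprocess_summation` are hypothetical, signals are modeled as non-negative Python integers, and the fixed bit width of real hardware is not modeled.

```python
def bit(y: int, i: int) -> int:
    """Return Y[i], where i = 1 is the least significant bit."""
    return (y >> (i - 1)) & 1


def preprocess_summation(xs: list[int], ys: list[int]) -> list[list[int]]:
    """Equation 1: s[j][i] models S(j+1)_(i+1) = Y(j+1)[i+1] ? X(i+1) : 0."""
    n = len(xs)
    return [[xs[i] if bit(ys[j], i + 1) else 0 for i in range(n)]
            for j in range(n)]


# N = 4, with arbitrary example values for X1..X4 and Y1..Y4.
xs = [10, 20, 30, 40]
ys = [0b1011, 0b1100, 0b0010, 0b0111]
s = preprocess_summation(xs, ys)
print(s[0])  # inputs S1_1..S1_4 of adder SUM1: [10, 20, 0, 40]
```

Row `s[j]` holds the N inputs of adder SUM(j+1); every adder sees all first signals, each gated by one bit of that adder's own second signal.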

When the preprocessor 100 operates in multiplication mode, it transfers the first signals (X1, X2, ... XN) shifted by (i-1) bits, that is, (X1<<(i-1)), (X2<<(i-1)), ... (XN<<(i-1)), to the adders (SUM1, SUM2, ... SUMN) according to the i-th bits (Y1[i], Y2[i], ... YN[i]) of the second signals (Y1, Y2, ... YN), respectively. The multiplication-mode operation of the preprocessor 100 can be expressed, for example, in the following pseudocode.

[Equation 2]

(Y1[1] ? (X1 << 0) : 0) => S1_1,

(Y1[2] ? (X1 << 1) : 0) => S1_2,

......

(Y1[N] ? (X1 << (N-1)) : 0) => S1_N,

(Y2[1] ? (X2 << 0) : 0) => S2_1,

(Y2[2] ? (X2 << 1) : 0) => S2_2,

......

(Y2[N] ? (X2 << (N-1)) : 0) => S2_N,

......

(YN[1] ? (XN << 0) : 0) => SN_1,

(YN[2] ? (XN << 1) : 0) => SN_2,

......

(YN[N] ? (XN << (N-1)) : 0) => SN_N

In the above equation, [XN << (N-1)] means shifting XN to the left (toward the most significant bit) by (N-1) bits.
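Equation 2 can be sketched the same way. The function name `preprocess_multiplication` is hypothetical, and because Python integers are unbounded, the shifted values are never truncated here; how wide the shifted signals are in hardware is not modeled.

```python
def preprocess_multiplication(xs: list[int], ys: list[int]) -> list[list[int]]:
    """Equation 2: s[j][i] models S(j+1)_(i+1) = Y(j+1)[i+1] ? (X(j+1) << i) : 0."""
    n = len(xs)
    # (ys[j] >> i) & 1 with 0-based i is the bit written Y(j+1)[i+1] in the text.
    return [[(xs[j] << i) if (ys[j] >> i) & 1 else 0 for i in range(n)]
            for j in range(n)]


# N = 4, with arbitrary example values.
xs = [3, 5, 7, 9]
ys = [0b0101, 0b0011, 0b1000, 0b0000]
s = preprocess_multiplication(xs, ys)
print(s[0])  # partial products of X1 * Y1: [3, 0, 12, 0]
```

Unlike summation mode, row `s[j]` depends only on the pair (X(j+1), Y(j+1)); its entries are the partial products of a shift-and-add multiplication.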

For example, the preprocessor 100 operates in summation mode or multiplication mode according to operation mode selection signals (SF1, SF2, ... SFN). In one example, a single operation mode selection signal is assigned to the entire parallel processing apparatus, in which case the whole apparatus must operate either in summation mode or in multiplication mode. In another example, N operation mode selection signals (SF1, SF2, ... SFN) are assigned. In that case, some of the N outputs (M1, M2, ... MN) can be results obtained in summation mode and the rest results obtained in multiplication mode. For instance, when N is 4, the operation mode selection signals (SF1, SF2, SF3, SF4) can be set so that M1, M2, and M3 are produced in multiplication mode and M4 in summation mode.

For example, the preprocessor 100 includes a plurality of preprocessing units (150_1, 150_2, ... 150_N). The preprocessing units (150_1, 150_2, ... 150_N) include selection operation units (110_1, 110_2, ... 110_N) and shift operation units (120_1, 120_2, ... 120_N): the preprocessing unit 150_i includes the selection operation unit 110_i and the shift operation unit 120_i.

The selection operation unit 110_i operates when the preprocessing unit 150_i operates in summation mode. It transfers the first signals (X1, X2, ... XN) to the adder SUMi according to the bits (Yi[1], Yi[2], ... Yi[N]) of the second signal Yi. The operation of the selection operation unit 110_i can be expressed, for example, in the following pseudocode.

[Equation 3]

(Yi[1] ? X1 : 0) => Si_1,

(Yi[2] ? X2 : 0) => Si_2,

......

(Yi[N] ? XN : 0) => Si_N

The shift operation unit 120_i operates when the preprocessing unit 150_i operates in multiplication mode. It transfers the first signal Xi shifted by 0, 1, ... (N-1) bits, that is, (Xi<<0), (Xi<<1), ... (Xi<<(N-1)), to the adder SUMi according to the bits (Yi[1], Yi[2], ... Yi[N]) of the second signal Yi. The operation of the shift operation unit 120_i can be expressed, for example, in the following pseudocode.

[Equation 4]

(Yi[1] ? (Xi << 0) : 0) => Si_1,

(Yi[2] ? (Xi << 1) : 0) => Si_2,

......

(Yi[N] ? (Xi << (N-1)) : 0) => Si_N

The preprocessing unit 150_i operates either the selection operation unit 110_i or the shift operation unit 120_i according to the operation mode selection signal SFi. For example, if SFi = 0 selects the selection operation unit 110_i and SFi = 1 selects the shift operation unit 120_i, then SF1=0, SF2=0, and SFN=1 mean that the first preprocessing unit 150_1, the second preprocessing unit 150_2, and the Nth preprocessing unit 150_N operate the selection operation unit 110_1, the selection operation unit 110_2, and the shift operation unit 120_N, respectively.
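A single preprocessing unit 150_i that switches between Equation 3 and Equation 4 can be sketched as follows. The function name `preprocessing_unit` and the constants are hypothetical; the encoding SFi = 0 for summation and SFi = 1 for multiplication follows the example in the text, which presents it as only one possibility.

```python
SUMMATION, MULTIPLICATION = 0, 1  # assumed encoding of SFi, as in the text's example


def preprocessing_unit(i: int, sf: int, xs: list[int], y: int) -> list[int]:
    """Return [Si_1, ..., Si_N] for preprocessing unit 150_i (i is 1-based)."""
    n = len(xs)
    if sf == SUMMATION:
        # Selection operation unit 110_i (Equation 3): gate every X_k by Yi[k].
        return [xs[k] if (y >> k) & 1 else 0 for k in range(n)]
    # Shift operation unit 120_i (Equation 4): gate shifted copies of Xi by Yi[k].
    x_i = xs[i - 1]
    return [(x_i << k) if (y >> k) & 1 else 0 for k in range(n)]


xs = [1, 2, 3, 4]
a = preprocessing_unit(1, SUMMATION, xs, 0b0110)
b = preprocessing_unit(4, MULTIPLICATION, xs, 0b0110)
print(a)  # [0, 2, 3, 0]
print(b)  # [0, 8, 16, 0]
```

The same second signal `0b0110` selects X2 and X3 in summation mode, but selects X4<<1 and X4<<2 in multiplication mode.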

The main processor 200 includes adders (SUM1, SUM2, ... SUMN). The i-th adder SUMi sums the transferred signals (Si_1, Si_2, ... Si_N) and outputs the result as the i-th output Mi. The operation of the main processor 200 can be expressed, for example, in the following pseudocode.

[Equation 5]

S1_1 + S1_2 + ... + S1_N => M1,

S2_1 + S2_2 + ... + S2_N => M2,

......

SN_1 + SN_2 + ... + SN_N => MN

The delay section 300 delays the first to Nth outputs (M1, M2, ... MN) according to a clock signal CLK and outputs them. To this end, the delay section 300 includes a plurality of delay units (DU1, DU2, ... DUN). The signals (D1, D2, ... DN) output from the delay section 300 correspond to the first to Nth outputs (M1, M2, ... MN), respectively.

The selector 400 outputs, as the first signals (X1, X2, ... XN), signals selected according to input control signals (SI1, SI2, ... SIN) from among the signals (R1, R2, ... RN) transferred from a memory (not shown) and the signals (D1, D2, ... DN) output from the delay section 300. For example, as shown in the drawing, the first signal Xi may be selected according to the input control signal SIi from between the signal Ri transferred from the memory and the signal Di output from the delay section 300. As another example, the first signal Xi may be selected according to the input control signal SIi from among the signal Ri transferred from the memory and two signals (D(i-1), Di) output from the delay section 300; that is, from among the signal Ri, the delay output Di corresponding to the i-th output Mi, and the delay output D(i-1) corresponding to the (i-1)-th output M(i-1). The memory (not shown) may, for example, have a plurality of banks. In one example, the memory has N banks, and the N banks are connected to the N inputs ({X1, Y1}, {X2, Y2}, ... {XN, YN}), respectively. Alternatively, the memory has 2*N banks, of which N banks are connected to the N first signals (X1, X2, ... XN), respectively, and the remaining N banks are connected to the N second signals (Y1, Y2, ... YN), respectively.
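The three-way selection described for the selector 400 can be sketched as a per-lane multiplexer. The enum `Src` is a hypothetical encoding of the input control signal SIi, and the sketch rejects D(i-1) for i = 1 because the source does not say what that lane would receive.

```python
from enum import Enum


class Src(Enum):
    """Hypothetical encoding of the input control signal SIi."""
    MEMORY = 0        # take Ri from memory
    DELAYED = 1       # take Di, the delayed i-th output
    DELAYED_PREV = 2  # take D(i-1), the delayed (i-1)-th output


def select_inputs(rs: list[int], ds: list[int], sis: list[Src]) -> list[int]:
    """Return the first signals X1..XN chosen lane by lane."""
    xs = []
    for i, si in enumerate(sis):  # i is 0-based here
        if si is Src.MEMORY:
            xs.append(rs[i])
        elif si is Src.DELAYED:
            xs.append(ds[i])
        else:
            if i == 0:
                raise ValueError("D(i-1) is not defined in the source for i = 1")
            xs.append(ds[i - 1])
    return xs


rs = [1, 2, 3, 4]      # R1..R4 from memory
ds = [10, 20, 30, 40]  # D1..D4, outputs delayed by the delay section
xs = select_inputs(rs, ds, [Src.MEMORY, Src.DELAYED, Src.DELAYED_PREV, Src.MEMORY])
print(xs)  # [1, 20, 20, 4]
```

Feeding Di or D(i-1) back as Xi lets a result from one clock cycle become an operand in the next without a round trip through memory.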

With this configuration, the parallel processing apparatus can perform various operations with a single piece of hardware. For example, it can perform a partial summation operation. Here, partial summation means summing all or some of the first signals (X1, X2, ... XN). To perform a partial summation operation, the preprocessor 100 must operate in summation mode. The i-th output Mi then corresponds to the sum of those first signals (X1, X2, ... XN) that are selected by the bits (Yi[1], Yi[2], ... Yi[N]) of the second signal Yi. For instance, if N is 4 and the second signals (Y1, Y2, Y3, Y4) are 1011, 1100, 0010, and 0111 in binary, the outputs (M1, M2, M3, M4) are X4+X2+X1, X4+X3, X2, and X3+X2+X1, respectively. In this way, when the preprocessor 100 operates in summation mode, N partial summation operations can be performed simultaneously.
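The N = 4 example above can be checked numerically. In this sketch the function name `partial_sums` and the values chosen for X1..X4 are illustrative assumptions; the second signals are the ones given in the text.

```python
def partial_sums(xs: list[int], ys: list[int]) -> list[int]:
    """Mi = sum of the X_k whose bit Yi[k] is 1 (Equations 1 and 5 combined)."""
    return [sum(x for k, x in enumerate(xs, start=1) if (y >> (k - 1)) & 1)
            for y in ys]


# X1..X4 are chosen as powers of ten so each output shows which inputs it summed.
xs = [1, 10, 100, 1000]
ys = [0b1011, 0b1100, 0b0010, 0b0111]
m = partial_sums(xs, ys)
print(m)  # [1011, 1100, 10, 111]
```

With these inputs, M1 = X4+X2+X1 = 1011, M2 = X4+X3 = 1100, M3 = X2 = 10, and M4 = X3+X2+X1 = 111, matching the selections stated in the text.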

When the preprocessor 100 operates in summation mode, a displacement operation can also be performed. Here, a displacement operation means changing the position of a first signal Xi as it is transferred to the outputs (M1, M2, ... MN). For instance, if N is 4 and the second signals (Y1, Y2, Y3, Y4) are 1000, 0100, 0010, and 0001 in binary, the outputs (M1, M2, M3, M4) are X4, X3, X2, and X1, respectively. The first signals (X1, X2, X3, X4) are thus transferred to the outputs (M1, M2, M3, M4) with their positions changed.
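A displacement is the special case of summation mode in which each second signal has exactly one bit set. The sketch below builds those one-hot second signals from a routing list; the helper name `displace` and the input values are hypothetical.

```python
def displace(xs: list[int], sources: list[int]) -> list[int]:
    """Route X_k to output M_i, where sources[i - 1] = k (1-based indices)."""
    ys = [1 << (k - 1) for k in sources]  # one-hot Yi: only bit Yi[k] is 1
    return [sum(x for k, x in enumerate(xs, start=1) if (y >> (k - 1)) & 1)
            for y in ys]


xs = [11, 22, 33, 44]
reversed_out = displace(xs, [4, 3, 2, 1])  # Y = 1000, 0100, 0010, 0001
rotated_out = displace(xs, [2, 3, 4, 1])   # Y = 0010, 0100, 1000, 0001
print(reversed_out)  # [44, 33, 22, 11]
print(rotated_out)   # [22, 33, 44, 11]
```

The first call reproduces the reversal in the text; the second shows that any other routing only requires different one-hot second signals.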

As described above, by performing partial summation and displacement operations, the parallel processing apparatus facilitates data exchange between processing units. Here, the i-th processing unit is a concept that comprises the i-th preprocessing unit and the i-th adder. For example, the first processing unit (150_1, SUM1) can receive not only the first one of the first signals (X1) but also the second to Nth first signals (X2, ... XN) and perform a partial summation. Likewise, the second processing unit (150_2, SUM2) can receive a first signal other than X2, for example the Nth first signal XN.

For example, the parallel processing apparatus can perform a multiplication operation. To do so, the preprocessor 100 must operate in multiplication mode. The i-th output Mi then corresponds to the product (Xi*Yi) of the first signal Xi and the second signal Yi. For instance, if N is 4, the outputs (M1, M2, M3, M4) are X1*Y1, X2*Y2, X3*Y3, and X4*Y4, respectively. In this way, when the preprocessor 100 operates in multiplication mode, N multiplications can be performed simultaneously.
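That the gated, shifted copies of Xi add up to Xi*Yi is ordinary shift-and-add multiplication. The sketch below checks this exhaustively for 4-bit operands; `multiply_unit` is a hypothetical name, operands are treated as unsigned, and Yi is assumed to fit in N bits.

```python
def multiply_unit(x: int, y: int, n: int) -> int:
    """Model one processing unit in multiplication mode (Equations 4 and 5)."""
    partials = [(x << k) if (y >> k) & 1 else 0 for k in range(n)]  # Si_1..Si_N
    return sum(partials)  # adder SUMi produces Mi


xs = [3, 5, 7, 9]
ys = [2, 4, 6, 8]
ms = [multiply_unit(x, y, 4) for x, y in zip(xs, ys)]
print(ms)  # [6, 20, 42, 72]

# Exhaustive check over all unsigned 4-bit operand pairs.
all_match = all(multiply_unit(x, y, 4) == x * y
                for x in range(16) for y in range(16))
print(all_match)  # True
```

The identity holds because Yi equals the sum of Yi[k] * 2^(k-1) over its bits, and Xi << (k-1) equals Xi * 2^(k-1).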

When the preprocessor 100 operates in multiplication mode, a shift operation can also be performed. For instance, if N is 4 and the second signals (Y1, Y2, Y3, Y4) are 1000, 0100, 0010, and 0001 in binary, the outputs (M1, M2, M3, M4) are (X1<<3), (X2<<2), (X3<<1), and (X4<<0), respectively.
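A shift is the multiplication-mode counterpart of a displacement: a one-hot second signal leaves exactly one shifted copy of Xi ungated. In this sketch the name `shift_by_one_hot` and the input values are illustrative, and output width is not modeled.

```python
def shift_by_one_hot(x: int, s: int, n: int) -> int:
    """Shift x left by s bits using a one-hot second signal Yi = 1 << s."""
    y = 1 << s
    partials = [(x << k) if (y >> k) & 1 else 0 for k in range(n)]
    return sum(partials)  # only the k = s term is nonzero


xs = [5, 6, 7, 8]
shifts = [3, 2, 1, 0]  # Y1..Y4 = 1000, 0100, 0010, 0001 as in the text
ms = [shift_by_one_hot(x, s, 4) for x, s in zip(xs, shifts)]
print(ms)  # [40, 24, 14, 8]
```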

The parallel processing apparatus can also perform different operations at the same time. For example, when N is 4, the following operations can be performed simultaneously.

M1 = X2 + X3 + X4 [partial summation operation]

M2 = X1 [displacement operation]

M3 = X3 * Y3 [multiplication operation]

M4 = (X4 << 2) [shift operation]

For this, the selection signals (SF1, SF2) must be set so that the first and second preprocessing units (150_1, 150_2) are in summation mode, and the selection signals (SF3, SF4) must be set so that the third and fourth preprocessing units (150_3, 150_4) are in multiplication mode. In addition, Y1 must be set to 1110 for the partial summation, Y2 to 0001 for the displacement, and Y4 to 0100 for the shift.
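The mixed configuration above can be run through a small model of the whole datapath (preprocessing units plus adders). The function `step`, the SFi encoding (0 for summation, 1 for multiplication), and the values of X1..X4 and Y3 are assumptions made for illustration; Y1, Y2, and Y4 are the settings given in the text.

```python
SUM, MUL = 0, 1  # assumed encoding of the selection signals SFi


def step(xs: list[int], ys: list[int], sfs: list[int]) -> list[int]:
    """Compute M1..MN for one turn, with an independent mode per lane."""
    n = len(xs)
    ms = []
    for i in range(n):
        if sfs[i] == SUM:   # selection operation unit: gate X1..XN by Yi
            s = [xs[k] if (ys[i] >> k) & 1 else 0 for k in range(n)]
        else:               # shift operation unit: gate shifted copies of Xi
            s = [(xs[i] << k) if (ys[i] >> k) & 1 else 0 for k in range(n)]
        ms.append(sum(s))   # adder SUMi
    return ms


xs = [1, 2, 3, 4]
y3 = 5  # arbitrary multiplier for M3 = X3 * Y3
ms = step(xs, [0b1110, 0b0001, y3, 0b0100], [SUM, SUM, MUL, MUL])
print(ms)  # [9, 1, 15, 16]
```

Here M1 = X2+X3+X4 = 9, M2 = X1 = 1, M3 = X3*Y3 = 15, and M4 = X4<<2 = 16, all produced in the same turn.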

Furthermore, by changing the selection signal (SFi) and the second signal (Yi), the parallel processing apparatus can change its operation independently on every turn. For example, after M1, M2, M3, and M4 are produced on the first turn by a partial summation, a displacement, a multiplication, and a shift operation, respectively, as described above, they can be produced on the second turn by a multiplication, a multiplication, a partial summation, and a partial summation operation, as follows.

M1 = X1 * Y1 [multiplication operation]

M2 = X2 * Y2 [multiplication operation]

M3 = X1 + X2 + X3 + X4 [partial summation operation]

M4 = X2 + X4 [partial summation operation]

To this end, the selection signals (SF1, SF2) must be set so that the first and second pre-processing units (150_1, 150_2) are in the multiplication mode, and the selection signals (SF3, SF4) must be set so that the third and fourth pre-processing units (150_3, 150_4) are in the summation mode. In addition, Y3 and Y4 must be set to 1111 and 1010, respectively, for the partial summation operations above.
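The two turns described above can be reproduced with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: bit Yi[1] is taken as the least significant bit, SFi = 0 is taken to mean the summation mode and SFi = 1 the multiplication mode, and the function names (`preprocess`, `run`), the input values, and the multiplier values Y3 (first turn) and Y1, Y2 (second turn) are hypothetical choices made for the example.

```python
def preprocess(i, X, Y, SF):
    """Return the N summer inputs Si_1..Si_N of pre-processing unit i (1-based)."""
    n = len(X)
    y = Y[i - 1]
    inputs = []
    for j in range(1, n + 1):
        bit = (y >> (j - 1)) & 1           # Yi[j]; Yi[1] is assumed to be the LSB
        if SF[i - 1] == 0:                 # summation mode (assumed encoding)
            inputs.append(X[j - 1] if bit else 0)
        else:                              # multiplication mode
            inputs.append((X[i - 1] << (j - 1)) if bit else 0)
    return inputs


def run(X, Y, SF):
    """Each summer SUMi adds the inputs delivered by its pre-processing unit."""
    return [sum(preprocess(i, X, Y, SF)) for i in range(1, len(X) + 1)]


X = [3, 5, 7, 9]                           # hypothetical X1..X4

# First turn: partial sum, displacement, multiplication, shift.
turn1 = run(X, Y=[0b1110, 0b0001, 0b0110, 0b0100], SF=[0, 0, 1, 1])
print(turn1)   # [21, 3, 42, 36] = [X2+X3+X4, X1, X3*6, X4<<2]

# Second turn: multiplication, multiplication, partial sum, partial sum.
turn2 = run(X, Y=[0b0010, 0b0011, 0b1111, 0b1010], SF=[1, 1, 0, 0])
print(turn2)   # [6, 15, 24, 14] = [X1*2, X2*3, X1+X2+X3+X4, X2+X4]
```

Only the values of SF and Y differ between the two calls; the model of the hardware itself is unchanged.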

As described, the parallel processing apparatus according to the first embodiment can perform N independent operations simultaneously, and the N operations can be changed independently on every turn. This can maximize the efficiency of the parallel processing apparatus.

Consider, for comparison, a parallel processing apparatus that has a partial summation parallel processor performing a plurality of partial summation operations, a displacement parallel processor performing a plurality of displacement operations, a multiplication parallel processor performing a plurality of multiplication operations, and a shift parallel processor performing a plurality of shift operations. At a moment when many multiplications are needed, the multiplication parallel processor can be 100% utilized, but the utilization of the partial summation, displacement, and shift parallel processors will be low. Likewise, at a moment when many partial summations are needed, the partial summation parallel processor can be 100% utilized, but the utilization of the displacement, multiplication, and shift parallel processors will be low.

In contrast, the parallel processing apparatus according to the first embodiment can maximize its utilization by changing, at every moment, the operations performed by the pre-processing units (150_1, 150_2, ... 150_N). For example, at a moment when many multiplications are needed, many of the pre-processing units (150_1, 150_2, ... 150_N) are set to perform multiplication and the rest are set to perform other operations, so that most of the pre-processing units are in use. Similarly, at a moment when many partial summations are needed, many of the pre-processing units are set to perform partial summation and the rest are set to perform other operations, so that most of the pre-processing units are in use.

FIG. 2 is a diagram illustrating an example of the i-th pre-processing unit of the first embodiment. Referring to FIG. 2, the pre-processing unit includes a selection operation unit 110_i and a shift operation unit 120_i.

The selection operation unit 110_i includes a plurality of demultiplexers (DM1, DM2, ... DMN). Each of the demultiplexers (DM1, DM2, ... DMN) outputs a signal selected from the corresponding first signal (X1, X2, ... XN) and 0 according to the corresponding bit (Yi[1], Yi[2], ... Yi[N]) of the second signal (Yi). For example, the first demultiplexer (DM1) outputs a signal selected from the first signal (X1) and 0 according to the first bit (Yi[1]) of the second signal (Yi), the second demultiplexer (DM2) outputs a signal selected from the first signal (X2) and 0 according to the second bit (Yi[2]), and the N-th demultiplexer (DMN) outputs a signal selected from the first signal (XN) and 0 according to the N-th bit (Yi[N]).

The shift operation unit 120_i includes a plurality of shift units (SH1, SH2, ... SHN). Each of the shift units (SH1, SH2, ... SHN) outputs a signal selected from a shifted version of the first signal (Xi) and 0 according to the corresponding bit (Yi[1], Yi[2], ... Yi[N]) of the second signal (Yi). For example, the first shift unit (SH1) outputs a signal selected from the first signal shifted by 0 bits (Xi<<0) and 0 according to the first bit (Yi[1]) of the second signal (Yi), the second shift unit (SH2) outputs a signal selected from the first signal shifted by 1 bit (Xi<<1) and 0 according to the second bit (Yi[2]), and the N-th shift unit (SHN) outputs a signal selected from the first signal shifted by (N-1) bits (Xi<<(N-1)) and 0 according to the N-th bit (Yi[N]).

Either the selection operation unit 110_i or the shift operation unit 120_i operates, depending on the selection signal (SFi). For example, when the selection signal (SFi) is 0, the selection operation unit 110_i operates and the shift operation unit 120_i does not. In this case, the selection operation unit 110_i passes the signals output from the demultiplexers (DM1, DM2, ... DMN) to the summer (SUMi) as the summer inputs (Si_1, Si_2, ... Si_N), and the shift operation unit 120_i outputs high-impedance signals. When the selection signal (SFi) is 1, the selection operation unit 110_i does not operate and the shift operation unit 120_i operates. In this case, the selection operation unit 110_i outputs high-impedance signals, and the shift operation unit 120_i passes the signals output from the shift units (SH1, SH2, ... SHN) to the summer (SUMi) as the summer inputs (Si_1, Si_2, ... Si_N).

Unlike what is shown in the drawing, separate demultiplexers may be added to select between the outputs of the selection operation unit 110_i and the outputs of the shift operation unit 120_i according to the selection signal (SFi). For example, when the selection signal (SFi) indicates the summation mode, the separate demultiplexers pass the outputs of the selection operation unit 110_i to the summer (SUMi), and when it indicates the multiplication mode, they pass the outputs of the shift operation unit 120_i to the summer (SUMi).

FIG. 3 is a diagram illustrating a parallel processing apparatus according to a second embodiment. Referring to FIG. 3, the parallel processing apparatus receives first to P-th inputs ({X1, Y1}, ... {X(p-1), Y(p-1)}, {Xp, Yp}, {X(p+1), Y(p+1)}, ... {XP, YP}) and outputs first to P-th outputs (M1, ... M(p-1), Mp, M(p+1), ... MP). Here, P denotes a natural number of 4 or more; for example, P may be 1024. Also, p denotes a natural number from 1 to P. The first to P-th inputs consist of first signals (X1, ... X(p-1), Xp, X(p+1), ... XP) and second signals (Y1, ... Y(p-1), Yp, Y(p+1), ... YP). The parallel processing apparatus includes a pre-processing unit 100A and a main processing unit 200A. The parallel processing apparatus may further include a delay unit 300A and a selection unit 400A.

The pre-processing unit 100A includes a plurality of pre-processing units (... 150_(p-1), 150_p, 150_(p+1), ...). The pre-processing units (... 150_(p-1), 150_p, 150_(p+1), ...) include selection operation units (... 110_(p-1), 110_p, 110_(p+1), ...) and shift operation units (... 120_(p-1), 120_p, 120_(p+1), ...). That is, the pre-processing unit 150_p includes a selection operation unit 110_p and a shift operation unit 120_p.

The selection operation unit 110_p operates when the pre-processing unit 150_p operates in the summation mode. The selection operation unit 110_p passes the first signal (Xp) corresponding to the pre-processing unit 150_p and the first signals adjacent to it (e.g., X(p-Q/2+1), ... X(p-1), X(p+1), ... X(p+Q/2)) to the summer (SUMp) according to the bits (Yp[1], Yp[2], ... Yp[Q]) of the second signal (Yp). Here, Q denotes an even number of 4 or more; for example, Q may be 32. Also, q denotes a natural number from 1 to Q. The operation of the selection operation unit 110_p can be expressed, as one example, by the following pseudocode.

[Equation 6]

(Yp[1] ? X(p-Q/2+1) : 0) => Sp_1,

(Yp[2] ? X(p-Q/2+2) : 0) => Sp_2,

......

(Yp[Q] ? X(p+Q/2) : 0) => Sp_Q,
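The windowed selection of Equation 6 can be sketched in Python as follows. This is a behavioral illustration only. The function name `select_window` is hypothetical, Yp[1] is assumed to be the least significant bit, and treating neighbors that fall outside the range 1..P as 0 is an assumption made here, since the text above does not state how the ends of the array are handled.

```python
def select_window(X, p, Yp, Q):
    """Summer inputs Sp_1..Sp_Q of unit p (1-based) in the summation mode."""
    P = len(X)
    inputs = []
    for q in range(1, Q + 1):
        idx = p - Q // 2 + q               # first-signal index feeding Sp_q
        bit = (Yp >> (q - 1)) & 1          # Yp[q]; Yp[1] is assumed to be the LSB
        in_range = 1 <= idx <= P           # assumption: missing neighbors give 0
        inputs.append(X[idx - 1] if (bit and in_range) else 0)
    return inputs


X = list(range(1, 11))                     # hypothetical X1..X10 = 1..10
print(select_window(X, p=5, Yp=0b1011, Q=4))   # [4, 5, 0, 7]
```

With Q = 4 and p = 5, the window covers X4 to X7, and the bits of Yp decide which of them reach the summer.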

The shift operation unit 120_p operates when the pre-processing unit 150_p operates in the multiplication mode. The shift operation unit 120_p passes the signals obtained by shifting the first signal (Xp) by 0, 1, ... (Q-1) bits ((Xp<<0), (Xp<<1), ... (Xp<<(Q-1))) to the summer (SUMp) according to the bits (Yp[1], Yp[2], ... Yp[Q]) of the second signal (Yp). The operation of the shift operation unit 120_p can be expressed, as one example, by the following pseudocode.

[Equation 7]

(Yp[1] ? (Xp << 0) : 0) => Sp_1,

(Yp[2] ? (Xp << 1) : 0) => Sp_2,

......

(Yp[Q] ? (Xp << (Q-1)) : 0) => Sp_Q,
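Summing the outputs of Equation 7 amounts to shift-and-add multiplication: each set bit of Yp contributes one shifted copy of Xp. The short Python sketch below illustrates this, assuming Yp[1] is the least significant bit and that both operands are unsigned; `shift_inputs` is a hypothetical helper name.

```python
def shift_inputs(xp, yp, Q):
    """Summer inputs Sp_1..Sp_Q of the shift operation unit (Equation 7)."""
    return [(xp << (q - 1)) if (yp >> (q - 1)) & 1 else 0
            for q in range(1, Q + 1)]


Q = 8
print(shift_inputs(13, 0b1011, Q))        # [13, 26, 0, 104, 0, 0, 0, 0]
print(sum(shift_inputs(13, 0b1011, Q)))   # 143, i.e. 13 * 11
print(sum(shift_inputs(7, 0b0100, Q)))    # 28, i.e. 7 << 2 (one-hot Yp gives a shift)
```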

The pre-processing unit 150_p operates either the selection operation unit 110_p or the shift operation unit 120_p according to the operation mode selection signal (SFp).

The main processing unit 200A includes summers (... SUM(p-1), SUMp, SUM(p+1), ...). The p-th summer (SUMp) adds the signals delivered to it (Sp_1, Sp_2, ... Sp_Q) and outputs the sum as the p-th output (Mp). The operation of the main processing unit 200A can be expressed, as one example, by the following pseudocode.

[Equation 8]

......

S(p-1)_1 + S(p-1)_2 + ... + S(p-1)_Q => M(p-1),

Sp_1 + Sp_2 + ... + Sp_Q => Mp,

S(p+1)_1 + S(p+1)_2 + ... + S(p+1)_Q => M(p+1),

......

The delay unit 300A delays the outputs (... M(p-1), Mp, M(p+1), ...) according to a clock signal (CLK) and outputs the delayed signals. To this end, the delay unit 300A includes a plurality of delay units (... DU(p-1), DUp, DU(p+1), ...). The signals output from the delay unit 300A (... D(p-1), Dp, D(p+1), ...) correspond to the outputs (... M(p-1), Mp, M(p+1), ...), respectively.

The selection unit 400A outputs, as the first signals (... X(p-1), Xp, X(p+1), ...), signals selected according to input control signals (... SI(p-1), SIp, SI(p+1), ...) from among the signals delivered from a memory (not shown) (... R(p-1), Rp, R(p+1), ...) and the signals output from the delay unit 300A (... D(p-1), Dp, D(p+1), ...). As one example, the memory may have P banks, and the P banks may be connected to the P inputs ({X1, Y1}, {X2, Y2}, ... {XP, YP}), respectively. Alternatively, the memory may have 2P banks, of which P banks are connected to the P first signals (X1, X2, ... XP), respectively, and the remaining P banks are connected to the P second signals (Y1, Y2, ... YP), respectively.
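The feedback path formed by the delay unit and the selection unit can be pictured with the following Python sketch. It is a rough illustration under several assumptions that the text above does not fix: SIp = 1 is taken to select the delayed output Dp and SIp = 0 the memory signal Rp, the delay units are assumed to hold 0 before the first clock, and `process` is a hypothetical stand-in for the pre-processing and main processing units.

```python
def select_first_signals(R, D, SI):
    """Per lane, pick the memory value or the delayed output as the first signal."""
    # Assumption: SI == 1 selects the delayed output, SI == 0 selects memory.
    return [d if si else r for r, d, si in zip(R, D, SI)]


def simulate(cycles, process):
    """cycles: list of (R, SI) pairs, one per clock; process maps X to M."""
    D = None
    history = []
    for R, SI in cycles:
        if D is None:
            D = [0] * len(R)               # assumed initial state of the delay units
        X = select_first_signals(R, D, SI)
        M = process(X)
        history.append(M)
        D = M                              # delay units hold M for the next clock
    return history


def double(X):
    """Hypothetical stand-in for the datapath: each lane doubles its input."""
    return [2 * x for x in X]


hist = simulate([([1, 2, 3], [0, 0, 0]), ([10, 20, 30], [1, 0, 1])], double)
print(hist)   # [[2, 4, 6], [4, 40, 12]]
```

In the second clock, lanes 1 and 3 reuse their previous outputs (2 and 6) instead of reading memory, which is how intermediate results can be fed back without a memory round trip in this model.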

FIG. 4 is a diagram illustrating a parallel processing apparatus according to a third embodiment. Referring to FIG. 4, the parallel processing apparatus receives first to N-th inputs ({X1, Y1}, {X2, Y2}, ... {XN, YN}) and outputs first to N-th outputs (M1, M2, ... MN). Here, N denotes a natural number of 8 or more; for example, N may be 32. The first to N-th inputs consist of first signals (X1, X2, ... XN) and second signals (Y1, Y2, ... YN). The parallel processing apparatus includes a pre-processing unit 100B and a main processing unit 200. The parallel processing apparatus may further include a delay unit 300 and a selection unit 400.

The pre-processing unit 100B includes a plurality of pre-processing units (150B_1, 150B_2, ... 150B_N). The pre-processing units (150B_1, 150B_2, ... 150B_N) include selection operation units (110B_1, 110B_2, ... 110B_N) and shift operation units (120B_1, 120B_2, ... 120B_N). That is, the pre-processing unit 150B_i includes a selection operation unit 110B_i and a shift operation unit 120B_i.

The selection operation unit 110B_i operates when the pre-processing unit 150B_i operates in the summation mode. The selection operation unit 110B_i passes the first signals (X1, X2, ... XN) to the summer (SUMi) according to the bits (Yi[1], Yi[2], ... Yi[N]) of the second signal (Yi), but shifts some bits of the first signals (X1, X2, ... XN) according to a division selection signal (DSi) before passing them to the summer (SUMi).

If the division selection signal (DSi) corresponds to 1 division (that is, no division), the selection operation unit 110B_i passes all bits of the first signals (X1, X2, ... XN) to the summer (SUMi) without shifting them.

If the first signals (X1, X2, ... XN) have N bits (N being an even number of 8 or more), the second signal (Yi) has N bits, and the division selection signal (DSi) corresponds to 2 divisions, the selection operation unit 110B_i shifts the most significant (N/2) bits of the first signals (X1, X2, ... XN) by (N/2) bits, shifts the least significant (N/2) bits by 0 bits (that is, leaves them unshifted), and passes the result to the summer (SUMi).

If the first signals (X1, X2, ... XN) have N bits (N being 8 or more and a multiple of 4), the second signal (Yi) has N bits, and the division selection signal (DSi) corresponds to 4 divisions, the selection operation unit 110B_i shifts bits N to (N*3/4+1) of the first signals (X1, X2, ... XN) by (N*3/4) bits, shifts bits (N*3/4) to (N*2/4+1) by (N*2/4) bits, shifts bits (N*2/4) to (N*1/4+1) by (N*1/4) bits, shifts bits (N*1/4) to 1 by 0 bits (that is, leaves them unshifted), and passes the result to the summer (SUMi).

The operation of the selection operation unit 110B_i can be expressed, as one example, by the following pseudocode.

[Equation 9]

(Yi[1] ? XX1 : 0) => Si_1,

(Yi[2] ? XX2 : 0) => Si_2,

......

(Yi[N] ? XXN : 0) => Si_N,

In the above expressions, XX1, XX2, ... XXN are determined by the first signals (X1, X2, ... XN) and the division selection signal (DSi). Examples of XXi for N = 8 and N = 16 are shown in Tables 1 and 2, respectively.

[Table 1] XXi for N = 8
Division                   XXi
1 division (no division)   {00000000, Xi[8:1]}
2 divisions                {0000, Xi[8:5], 0000, Xi[4:1]}
4 divisions                {00, Xi[8:7], 00, Xi[6:5], 00, Xi[4:3], 00, Xi[2:1]}

[Table 2] XXi for N = 16
Division                   XXi
1 division (no division)   {0000000000000000, Xi[16:1]}
2 divisions                {00000000, Xi[16:9], 00000000, Xi[8:1]}
4 divisions                {0000, Xi[16:13], 0000, Xi[12:9], 0000, Xi[8:5], 0000, Xi[4:1]}

In the above tables, {0000, Xi[8:5], 0000, Xi[4:1]} denotes a 16-bit number whose most significant 4 bits are 0000, whose next 4 bits are Xi[8:5], whose next 4 bits are 0000, and whose least significant 4 bits are Xi[4:1]. A person of ordinary skill in the art can readily infer from the above description how the selection operation unit 110B_i operates for 8 divisions, 16 divisions, or more, so a description of those cases is omitted.
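The XXi patterns in Tables 1 and 2 follow one rule: the N-bit input is cut into equal fields, and each field is placed at the bottom of a lane twice its width. The Python sketch below illustrates that rule and the resulting per-lane sums. The function names (`spread`, `lane_sums`) and the input values are hypothetical, and Yi[1] is assumed to be the least significant bit.

```python
def spread(x, n, parts):
    """Build XXi: place each (n/parts)-bit field of x in a lane of 2*n/parts bits."""
    field = n // parts
    lane = 2 * field
    out = 0
    for k in range(parts):
        bits = (x >> (k * field)) & ((1 << field) - 1)   # k-th field, LSB side first
        out |= bits << (k * lane)
    return out


def lane_sums(X, yi, n, parts):
    """Add the selected XXj (Equation 9) and split the sum back into lanes."""
    total = sum(spread(x, n, parts) for j, x in enumerate(X) if (yi >> j) & 1)
    lane = 2 * n // parts
    return [(total >> (k * lane)) & ((1 << lane) - 1) for k in range(parts)]


print(hex(spread(0xAB, 8, 2)))         # 0xa0b  -> {0000, 1010, 0000, 1011}
print(hex(spread(0b11100100, 8, 4)))   # 0x3210 -> {00,11, 00,10, 00,01, 00,00}

X = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]   # hypothetical X1..X8
print(lane_sums(X, 0b00001111, 8, 2))  # [20, 16]: low-nibble sum, high-nibble sum
```

The zero bits above each field give the per-lane sum room to grow. In this sketch, a lane sum that exceeds the lane width would carry into the neighboring lane; the sketch does not guard against that.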

The shift operation unit 120B_i operates when the pre-processing unit 150B_i operates in the multiplication mode. The shift operation unit 120B_i passes the signals obtained by shifting the first signal (Xi) by 0, 1, ... (N-1) bits ((Xi<<0), (Xi<<1), ... (Xi<<(N-1))) to the summer (SUMi) according to the bits (Yi[1], Yi[2], ... Yi[N]) of the second signal (Yi), but controls some bits of the shifted signals to be 0 according to the division selection signal (DSi).

If the division selection signal (DSi) corresponds to 1 division (that is, no division), none of the bits of the shifted signals ((Xi<<0), (Xi<<1), ... (Xi<<(N-1))) are forced to 0.

If the first signal (Xi) has N bits (N being an even number of 8 or more), the second signal (Yi) has N bits, and the division selection signal (DSi) corresponds to 2 divisions, the shift operation unit 120B_i controls the most significant (N/2) bits of the signals obtained by shifting the first signal (Xi) by 0 to (N/2-1) bits ((Xi<<0), (Xi<<1), ... (Xi<<(N/2-1))) to be 0, and controls the least significant (N/2) bits of the signals obtained by shifting the first signal (Xi) by (N/2) to (N-1) bits ((Xi<<(N/2)), (Xi<<(N/2+1)), ... (Xi<<(N-1))) to be 0.

If the first signal (Xi) has N bits (N being 8 or more and a multiple of 4), the second signal (Yi) has N bits, and the division selection signal (DSi) corresponds to 4 divisions, the shift operation unit 120B_i controls the most significant (N*3/4) bits of the signals obtained by shifting the first signal (Xi) by 0 to (N*1/4-1) bits ((Xi<<0), (Xi<<1), ... (Xi<<(N*1/4-1))) to be 0; controls the most significant (N*2/4) bits and the least significant (N*1/4) bits of the signals shifted by (N*1/4) to (N*2/4-1) bits ((Xi<<(N*1/4)), (Xi<<(N*1/4+1)), ... (Xi<<(N*2/4-1))) to be 0; controls the most significant (N*1/4) bits and the least significant (N*2/4) bits of the signals shifted by (N*2/4) to (N*3/4-1) bits ((Xi<<(N*2/4)), (Xi<<(N*2/4+1)), ... (Xi<<(N*3/4-1))) to be 0; and controls the least significant (N*3/4) bits of the signals shifted by (N*3/4) to (N-1) bits ((Xi<<(N*3/4)), (Xi<<(N*3/4+1)), ... (Xi<<(N-1))) to be 0.

The operation of the shift operation unit 120B_i can be expressed, as one example, by the following pseudocode.

[Equation 10]

(Yi[1] ? ((Xi & DC1) << 0) : 0) => Si_1,

(Yi[2] ? ((Xi & DC2) << 1) : 0) => Si_2,

......

(Yi[N] ? ((Xi & DCN) << (N-1)) : 0) => Si_N

In the above expressions, (Xi & DCN) denotes a bitwise AND of Xi and DCN. The division constants (DC1 to DCN) are determined by the division selection signal (DSi). Examples of the division constants (DC1 to DCN) for N = 8 and N = 16 are shown in Tables 3 and 4, respectively.

[Table 3] Division constants for N = 8
        1 division (no division)   2 divisions   4 divisions
DC1     11111111                   00001111      00000011
DC2     11111111                   00001111      00000011
DC3     11111111                   00001111      00001100
DC4     11111111                   00001111      00001100
DC5     11111111                   11110000      00110000
DC6     11111111                   11110000      00110000
DC7     11111111                   11110000      11000000
DC8     11111111                   11110000      11000000

[Table 4] Division constants for N = 16
        1 division (no division)   2 divisions        4 divisions
DC1     1111111111111111           0000000011111111   0000000000001111
DC2     1111111111111111           0000000011111111   0000000000001111
DC3     1111111111111111           0000000011111111   0000000000001111
DC4     1111111111111111           0000000011111111   0000000000001111
DC5     1111111111111111           0000000011111111   0000000011110000
DC6     1111111111111111           0000000011111111   0000000011110000
DC7     1111111111111111           0000000011111111   0000000011110000
DC8     1111111111111111           0000000011111111   0000000011110000
DC9     1111111111111111           1111111100000000   0000111100000000
DC10    1111111111111111           1111111100000000   0000111100000000
DC11    1111111111111111           1111111100000000   0000111100000000
DC12    1111111111111111           1111111100000000   0000111100000000
DC13    1111111111111111           1111111100000000   1111000000000000
DC14    1111111111111111           1111111100000000   1111000000000000
DC15    1111111111111111           1111111100000000   1111000000000000
DC16    1111111111111111           1111111100000000   1111000000000000
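The division constants follow a regular pattern: DCk keeps only the field of Xi that shift unit k belongs to. The Python sketch below generates the constants and applies Equation 10, showing that each lane of the result then holds the product of the matching fields of Xi and Yi. It is an illustration only: the function names (`division_constant`, `partitioned_multiply`, `fields`) and the operand values are hypothetical, Yi[1] is assumed to be the least significant bit, and the operands are treated as unsigned.

```python
def division_constant(k, n, parts):
    """DCk (k is 1-based): mask selecting the field of Xi used by shift unit k."""
    field = n // parts
    group = (k - 1) // field               # which field shift unit k falls in
    return ((1 << field) - 1) << (group * field)


def partitioned_multiply(x, y, n, parts):
    """Sum of the summer inputs of Equation 10."""
    total = 0
    for k in range(1, n + 1):
        if (y >> (k - 1)) & 1:             # Yi[k]; Yi[1] is assumed to be the LSB
            total += (x & division_constant(k, n, parts)) << (k - 1)
    return total


def fields(v, width, count):
    """Split v into `count` fields of `width` bits, least significant first."""
    return [(v >> (i * width)) & ((1 << width) - 1) for i in range(count)]


print(format(division_constant(5, 8, 4), "08b"))   # 00110000, as in the 4-division column for N = 8

x, y = 0xA7, 0x35                                  # hypothetical 8-bit operands
m = partitioned_multiply(x, y, 8, 2)
print(fields(m, 8, 2))                             # [35, 30] = [7*5, 10*3]
```

With 2 divisions, a single 8-bit unit therefore behaves like two independent 4-bit multipliers whose products occupy separate 8-bit lanes of the output.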

A person of ordinary skill in the art can readily infer from the above description how the shift operation unit 120B_i operates for 8 divisions, 16 divisions, or more, so a description of those cases is omitted. The pre-processing unit 150B_i operates either the selection operation unit 110B_i or the shift operation unit 120B_i according to the operation mode selection signal (SFi). As one example, if SFi = 0 means that the selection operation unit 110B_i operates and SFi = 1 means that the shift operation unit 120B_i operates, then SF1 = 0, SF2 = 0, and SFN = 1 mean that the first pre-processing unit 150B_1, the second pre-processing unit 150B_2, and the N-th pre-processing unit 150B_N operate the selection operation unit 110B_1, the selection operation unit 110B_2, and the shift operation unit 120B_N, respectively.

The operations of the main processing unit 200, the delay unit 300, and the selection unit 400 are the same as in FIG. 1 and its description, and are therefore omitted.

With this configuration, the parallel processing apparatus can perform various operations on a single piece of hardware. For example, it can perform partial summation, displacement, multiplication, and shift operations. These operations were already described with reference to FIG. 1, so the description is omitted here.

With this configuration, the parallel processing apparatus can also perform multiplication and addition on inputs of various bit widths. As an example, assume that the pre-processing unit 150B_i operates in summation mode, that the 8-bit first signals (X1, X2, ... X8) are {a1_8, a1_7, a1_6, a1_5, a1_4, a1_3, a1_2, a1_1}, {a2_8, a2_7, a2_6, a2_5, a2_4, a2_3, a2_2, a2_1}, ... {a8_8, a8_7, a8_6, a8_5, a8_4, a8_3, a8_2, a8_1}, and that the 8-bit second signal Yi is {b8, b7, b6, b5, b4, b3, b2, b1}. If the division selection signal DSi corresponds to 1 division, then according to Equation 9 and Table 1 the output Mi[16:1] has the value b1*{a1_8, a1_7, a1_6, a1_5, a1_4, a1_3, a1_2, a1_1} + b2*{a2_8, a2_7, a2_6, a2_5, a2_4, a2_3, a2_2, a2_1} + ... + b8*{a8_8, a8_7, a8_6, a8_5, a8_4, a8_3, a8_2, a8_1}.
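The undivided summation above can be illustrated with a minimal Python sketch. The function name `select_sum` and the list representation of the first signals are assumptions made for illustration only; the sketch assumes bit b_k of Yi (b1 being the least significant bit) selects X_k.

```python
def select_sum(xs, y):
    """Add up the first signals whose selecting bit in y is 1.

    xs[0] plays the role of X1 and bit 0 of y plays the role of b1
    (an indexing convention assumed for this sketch).
    """
    return sum(x for k, x in enumerate(xs) if (y >> k) & 1)


xs = [3, 5, 7, 11, 13, 17, 19, 23]  # X1..X8, each fits in 8 bits
y = 0b00000101                      # b1 = 1 and b3 = 1
print(select_sum(xs, y))            # X1 + X3 = 3 + 7 = 10
```

Eight 8-bit values sum to at most 8 * 255 = 2040, which fits comfortably in the 16-bit output Mi[16:1].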

If the division selection signal DSi corresponds to 2 divisions, then according to Equation 9 and Table 1 the output Mi[16:9] has the value b1*{a1_8, a1_7, a1_6, a1_5} + b2*{a2_8, a2_7, a2_6, a2_5} + ... + b8*{a8_8, a8_7, a8_6, a8_5}, and the output Mi[8:1] has the value b1*{a1_4, a1_3, a1_2, a1_1} + b2*{a2_4, a2_3, a2_2, a2_1} + ... + b8*{a8_4, a8_3, a8_2, a8_1}. This generalizes as follows. When each of the first signals (X1, X2, ... XN) has N bits (N being an even number of 8 or more), the second signal Yi has N bits, and DSi corresponds to 2 divisions, the most significant N bits of the output Mi of the adder SUMi correspond to the sum of the most significant (N/2) bits of the first signals selected from among (X1, X2, ... XN) according to the second signal Yi, and the least significant N bits of Mi correspond to the sum of the least significant (N/2) bits of those selected first signals. Compared with a conventional approach (for example, using an adder that accepts eight 8-bit first signals to sum eight 4-bit first signals, that is, receiving {0000, a1_4, a1_3, a1_2, a1_1}, {0000, a2_4, a2_3, a2_2, a2_1}, ... {0000, a8_4, a8_3, a8_2, a8_1} as the first signals and {b8, b7, b6, b5, b4, b3, b2, b1} as the second signal Yi), the parallel processing apparatus of this embodiment raises the hardware utilization of the adder SUMi and increases the number of valid bits in the first signal Xi, the second signal Yi, and the output Mi.
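The 2-division summation can be sketched as follows for N = 8. The helper names (`spread2`, `select_sum2`) are hypothetical, and the sketch assumes that the upper half of each first signal is moved left by a further N/2 bits so that each half gets its own N-bit lane in the output.

```python
N = 8
HALF = N // 2


def spread2(x):
    """Place the low nibble in bits [N-1:0] and the high nibble at bit N."""
    lo = x & ((1 << HALF) - 1)
    hi = x >> HALF
    return (hi << N) | lo


def select_sum2(xs, y):
    """Return (sum of selected high nibbles, sum of selected low nibbles)."""
    m = sum(spread2(x) for k, x in enumerate(xs) if (y >> k) & 1)
    return m >> N, m & ((1 << N) - 1)


xs = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
print(select_sum2(xs, 0xFF))  # (64, 56)
```

With eight 4-bit fields, each lane holds at most 8 * 15 = 120, so an 8-bit lane cannot overflow in this configuration.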

If the division selection signal DSi corresponds to 4 divisions, then according to Equation 9 and Table 1 the output Mi[16:13] has the value b1*{a1_8, a1_7} + b2*{a2_8, a2_7} + ... + b8*{a8_8, a8_7}, the output Mi[12:9] has the value b1*{a1_6, a1_5} + b2*{a2_6, a2_5} + ... + b8*{a8_6, a8_5}, the output Mi[8:5] has the value b1*{a1_4, a1_3} + b2*{a2_4, a2_3} + ... + b8*{a8_4, a8_3}, and the output Mi[4:1] has the value b1*{a1_2, a1_1} + b2*{a2_2, a2_1} + ... + b8*{a8_2, a8_1}. This generalizes as follows. When each of the first signals (X1, X2, ... XN) has N bits (N being 8 or more and a multiple of 4), the second signal Yi has N bits, and DSi corresponds to 4 divisions, bits 2N to (N*3/2+1) of the output Mi of the adder SUMi correspond to the sum of bits N to (N*3/4+1) of the first signals selected from among (X1, X2, ... XN) according to the second signal Yi; bits (N*3/2) to (N*2/2+1) of Mi correspond to the sum of bits (N*3/4) to (N*2/4+1) of the selected first signals; bits (N*2/2) to (N*1/2+1) of Mi correspond to the sum of bits (N*2/4) to (N*1/4+1) of the selected first signals; and bits (N*1/2) to 1 of Mi correspond to the sum of bits (N*1/4) to 1 of the selected first signals. Compared with a conventional approach (for example, using an adder that accepts eight 8-bit first signals to sum eight 2-bit first signals, that is, receiving {000000, a1_2, a1_1}, {000000, a2_2, a2_1}, ... {000000, a8_2, a8_1} as the first signals and {b8, b7, b6, b5, b4, b3, b2, b1} as the second signal Yi), the parallel processing apparatus of this embodiment raises the hardware utilization of the adder SUMi and increases the number of valid bits in the first signal Xi, the second signal Yi, and the output Mi. However, when DSi corresponds to 4 or more divisions (for example, 8 or 16 divisions), overflow can occur: for example, b1*{a1_2, a1_1} + b2*{a2_2, a2_1} + ... + b8*{a8_2, a8_1} may exceed 4 bits and thereby affect not only the output Mi[4:1] but also the output Mi[8:5]. The number of 1s in the second signal Yi must therefore be managed so that it does not exceed a predetermined number.
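The overflow condition can be made concrete with a small Python sketch for N = 8 and 4 divisions. The helper names are hypothetical. The source does not state the "predetermined number"; the limit of five shown here follows only from the worst case of this sketch (every 2-bit field equal to 3, each output lane 4 bits wide).

```python
N = 8
FIELD = N // 4     # 2-bit input fields
LANE = 2 * FIELD   # 4-bit output lanes


def spread4(x):
    """Move 2-bit field q of x to the start of 4-bit lane q."""
    out = 0
    for q in range(4):
        field = (x >> (q * FIELD)) & ((1 << FIELD) - 1)
        out |= field << (q * LANE)
    return out


def select_sum4(xs, y):
    """Sum the spread first signals selected by the bits of y."""
    return sum(spread4(x) for k, x in enumerate(xs) if (y >> k) & 1)


def lanes(m):
    """Split the 16-bit result into its four 4-bit lanes, lowest lane first."""
    return [(m >> (q * LANE)) & ((1 << LANE) - 1) for q in range(4)]


xs = [0xFF] * 8                               # worst case: every field is 3
print(lanes(select_sum4(xs, 0b00011111)))     # five 1s: [15, 15, 15, 15]
print(lanes(select_sum4(xs, 0b00111111)))     # six 1s:  [2, 3, 3, 3]
```

With six selected inputs each lane sum is 18, so the carry out of every lane corrupts the lane above it, which is the effect on Mi[8:5] described in the text.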

As an example, assume that the pre-processing unit 150B_i operates in multiplication mode, that the 8-bit first signal Xi is {a8, a7, a6, a5, a4, a3, a2, a1}, and that the 8-bit second signal Yi is {b8, b7, b6, b5, b4, b3, b2, b1}. If the division selection signal DSi corresponds to 1 division, then according to Equation 10 and Table 3 the output Mi[16:1] has the value {a8, a7, a6, a5, a4, a3, a2, a1}*{b8, b7, b6, b5, b4, b3, b2, b1}. In this case 100% of the hardware (for example, the adders) constituting the adder SUMi is utilized.
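The undivided multiplication amounts to shift-and-add: each set bit of Yi contributes a copy of Xi shifted by that bit's position. A minimal sketch, with a hypothetical function name and with bit 0 of `y` standing for b1:

```python
def shift_add_mul(x, y, n=8):
    """Multiply two n-bit values by summing shifted copies of x.

    For each bit j of y that is 1, the copy (x << j) is passed on to
    the adder; copies for 0 bits are replaced by 0.
    """
    return sum((x << j) for j in range(n) if (y >> j) & 1)


x, y = 0xB7, 0x5D
print(shift_add_mul(x, y) == x * y)  # True
```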

If the division selection signal DSi corresponds to 2 divisions, then according to Equation 10 and Table 3 the output Mi[16:9] has the value {a8, a7, a6, a5}*{b8, b7, b6, b5}, and the output Mi[8:1] has the value {a4, a3, a2, a1}*{b4, b3, b2, b1}. This generalizes as follows. When the first signal Xi has N bits (N being an even number of 8 or more), the second signal Yi has N bits, and DSi corresponds to 2 divisions, the most significant N bits of the output Mi of the adder SUMi correspond to the product of the most significant (N/2) bits of the first signal Xi and the most significant (N/2) bits of the second signal Yi, and the least significant N bits of Mi correspond to the product of the least significant (N/2) bits of Xi and the least significant (N/2) bits of Yi. In this case 50% of the hardware constituting the adder SUMi is utilized, and 100% of the valid bits of the first signal Xi, the second signal Yi, and the output Mi are utilized. By contrast, with a conventional approach (for example, performing a 4-bit multiplication on an 8-bit multiplier, that is, supplying {0000, a4, a3, a2, a1} as the first signal Xi and {0000, b4, b3, b2, b1} as the second signal Yi), 25% of the hardware constituting the adder SUMi is utilized, and 50% of the valid bits of Xi, Yi, and Mi are utilized. Thus, with 2 divisions, the parallel processing apparatus of this embodiment can double the hardware utilization of the adder SUMi relative to the prior art, and can also double the valid bits of the first signal Xi, the second signal Yi, and the output Mi.
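The 2-division multiplication can be sketched as a masked shift-and-add for N = 8. The names are hypothetical, and the masking rule is an assumption chosen to reproduce the stated result: for shifts 0 to N/2-1 only the low half of Xi is kept, and for shifts N/2 to N-1 only the high half.

```python
N = 8
H = N // 2
LOW = (1 << H) - 1   # 0x0F: keeps the low half of x
HIGH = LOW << H      # 0xF0: keeps the high half of x


def masked_mul2(x, y):
    """Return (x_hi * y_hi, x_lo * y_lo) from one masked shift-and-add pass."""
    m = 0
    for j in range(N):
        if (y >> j) & 1:
            mask = LOW if j < H else HIGH
            m += (x & mask) << j
    return m >> N, m & ((1 << N) - 1)


print(masked_mul2(0xA7, 0x3C))  # (0xA * 0x3, 0x7 * 0xC) = (30, 84)
```

A product of two 4-bit values is at most 15 * 15 = 225, so the low product never carries into the high lane.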

If the division selection signal DSi corresponds to 4 divisions, then according to Equation 10 and Table 3 the output Mi[16:13] has the value {a8, a7}*{b8, b7}, the output Mi[12:9] has the value {a6, a5}*{b6, b5}, the output Mi[8:5] has the value {a4, a3}*{b4, b3}, and the output Mi[4:1] has the value {a2, a1}*{b2, b1}. This generalizes as follows. When the first signal Xi has N bits (N being 8 or more and a multiple of 4), the second signal Yi has N bits, and DSi corresponds to 4 divisions, bits 2N to (N*3/2+1) of the output Mi of the adder SUMi correspond to the product of bits N to (N*3/4+1) of Xi and bits N to (N*3/4+1) of Yi; bits (N*3/2) to (N*2/2+1) of Mi correspond to the product of bits (N*3/4) to (N*2/4+1) of Xi and bits (N*3/4) to (N*2/4+1) of Yi; bits (N*2/2) to (N*1/2+1) of Mi correspond to the product of bits (N*2/4) to (N*1/4+1) of Xi and bits (N*2/4) to (N*1/4+1) of Yi; and bits (N*1/2) to 1 of Mi correspond to the product of bits (N*1/4) to 1 of Xi and bits (N*1/4) to 1 of Yi. In this case 25% of the hardware constituting the adder SUMi is utilized, and 100% of the valid bits of the first signal Xi, the second signal Yi, and the output Mi are utilized. By contrast, with a conventional approach (for example, performing a 2-bit multiplication on an 8-bit multiplier, that is, supplying {000000, a2, a1} as the first signal Xi and {000000, b2, b1} as the second signal Yi), 6.25% of the hardware constituting the adder SUMi is utilized, and 25% of the valid bits of Xi, Yi, and Mi are utilized. Thus, with 4 divisions, the parallel processing apparatus of this embodiment can increase the hardware utilization of the adder SUMi fourfold relative to the prior art, and can also increase the valid bits of the first signal Xi, the second signal Yi, and the output Mi fourfold.
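The following sketch generalizes the masked shift-and-add to d divisions and reproduces the utilization percentages quoted above. The function names are hypothetical, and the utilization model (counting the partial-product bit cells that carry useful data out of an N x N array) is an assumption that happens to match the stated figures, not a model taken from the source.

```python
def masked_mul(x, y, n=8, d=4):
    """Return the d lane products (lowest lane first) of n-bit x and y."""
    w = n // d                                 # width of each input field
    m = 0
    for j in range(n):
        if (y >> j) & 1:
            q = j // w                         # field that shift j belongs to
            mask = ((1 << w) - 1) << (q * w)   # keep only field q of x
            m += (x & mask) << j
    lane = 2 * w                               # width of each output lane
    return [(m >> (q * lane)) & ((1 << lane) - 1) for q in range(d)]


def utilization(n, d, conventional=False):
    """Fraction of the n*n partial-product cells doing useful work."""
    w = n // d
    used = w * w if conventional else d * w * w
    return used / (n * n)


print(masked_mul(0xE7, 0xB6))                  # [6, 1, 6, 6]
print(utilization(8, 2), utilization(8, 2, conventional=True))  # 0.5 0.25
print(utilization(8, 4), utilization(8, 4, conventional=True))  # 0.25 0.0625
```

Here 0xE7 splits into the 2-bit fields 3, 1, 2, 3 and 0xB6 into 2, 1, 3, 2 (lowest field first), giving the lane products 6, 1, 6, 6.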

In this way, the parallel processing apparatus of the third embodiment can perform N independent operations simultaneously, can change each of the N operations independently at every cycle (over time), and can operate on inputs of various bit widths. This can maximize the efficiency of the parallel processing apparatus.

FIG. 5 is a diagram illustrating an example of the i-th pre-processing unit of the third embodiment. Referring to FIG. 5, the pre-processing unit includes a selection operation unit 110B_i and a shift operation unit 120B_i.

The selection operation unit 110B_i includes a plurality of first conversion units (T1_1, T1_2, ... T1_N) and a plurality of demultiplexers (DM1, DM2, ... DMN). The first conversion units (T1_1, T1_2, ... T1_N) shift some bits of the first signals (X1, X2, ... XN) according to the division selection signal DSi. If DSi corresponds to 1 division (that is, no division), the first conversion units output all bits of the first signals (X1, X2, ... XN) without shifting them.

When each of the first signals (X1, X2, ... XN) has N bits (N being an even number of 8 or more), the second signal Yi has N bits, and the division selection signal DSi corresponds to 2 divisions, the first conversion units (T1_1, T1_2, ... T1_N) shift the most significant (N/2) bits of the first signals by (N/2) bits and shift the least significant (N/2) bits of the first signals by 0 bits.

When each of the first signals (X1, X2, ... XN) has N bits (N being 8 or more and a multiple of 4), the second signal Yi has N bits, and the division selection signal DSi corresponds to 4 divisions, the first conversion units (T1_1, T1_2, ... T1_N) shift bits N to (N*3/4+1) of the first signals by (N*3/4) bits, shift bits (N*3/4) to (N*2/4+1) by (N*2/4) bits, shift bits (N*2/4) to (N*1/4+1) by (N*1/4) bits, and shift bits (N*1/4) to 1 by 0 bits.
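The per-field shift rule of the first conversion units can be written once for 1, 2, and 4 divisions. This is a sketch under stated assumptions: the function name `t1_transform` is hypothetical, the shift is taken to be a left shift, and field q (counting from the least significant field) is shifted by q * N/d bits, which matches the shift amounts listed for 2 and 4 divisions.

```python
def t1_transform(x, n, d):
    """Shift field q of the n-bit value x left by q * (n // d) bits."""
    w = n // d
    out = 0
    for q in range(d):
        field = x & (((1 << w) - 1) << (q * w))  # field q, still in place
        out |= field << (q * w)                  # extra shift of q * n/d bits
    return out


print(hex(t1_transform(0xFF, 8, 1)))  # 0xff   (no division, no shift)
print(hex(t1_transform(0xFF, 8, 2)))  # 0xf0f  (high nibble moved to bit 8)
print(hex(t1_transform(0xFF, 8, 4)))  # 0x3333 (each 2-bit field in its own lane)
```

After this transform every field sits at the bottom of a lane twice its own width, which leaves headroom for the sums formed by the adder.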

Each of the demultiplexers (DM1, DM2, ... DMN) outputs a signal selected, according to the corresponding bit (Yi[1], Yi[2], ... Yi[N]) of the second signal Yi, from between the output of the corresponding first conversion unit (T1_1, T1_2, ... T1_N) and 0.

The shift operation unit 120B_i includes a plurality of second conversion units (T2_1, T2_2, ... T2_N) and a plurality of shift units (SH1, SH2, ... SHN). The second conversion units (T2_1, T2_2, ... T2_N) set some bits of the first signal Xi to 0 according to the division selection signal DSi.

If the division selection signal DSi corresponds to 1 division (that is, no division), the second conversion units (T2_1, T2_2, ... T2_N) do not set any bits of the first signal Xi to 0.

When the first signal Xi has N bits (N being an even number of 8 or more), the second signal Yi has N bits, and the division selection signal DSi corresponds to 2 divisions, the second conversion units (T2_1, ... T2_(N/2)) control the most significant (N/2) bits of the first signal Xi to be 0, and the second conversion units (T2_(N/2+1), ... T2_N) control the least significant (N/2) bits of the first signal Xi to be 0.

When the first signal Xi has N bits (N being 8 or more and a multiple of 4), the second signal Yi has N bits, and the division selection signal DSi corresponds to 4 divisions, the second conversion units (T2_1, ... T2_(N*1/4)) control the most significant (N*3/4) bits of the first signal Xi to be 0; the second conversion units (T2_(N*1/4+1), ... T2_(N*2/4)) control the most significant (N*2/4) bits and the least significant (N*1/4) bits of Xi to be 0; the second conversion units (T2_(N*2/4+1), ... T2_(N*3/4)) control the most significant (N*1/4) bits and the least significant (N*2/4) bits of Xi to be 0; and the second conversion units (T2_(N*3/4+1), ... T2_N) control the least significant (N*3/4) bits of Xi to be 0.
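The zeroing rule of the second conversion units can be expressed as one mask per unit. In this sketch the function name `t2_mask` is hypothetical, and it assumes that T2_k (k = 1..N) feeds the shift unit that shifts by k-1 bits, so the unit index maps to the zero-based shift amount j = k-1.

```python
def t2_mask(j, n, d):
    """Bits of Xi kept by the second conversion unit for shift amount j.

    With d divisions the n-bit signal has d fields of n // d bits; the
    unit for shift j keeps only field j // (n // d) and zeroes the rest.
    """
    w = n // d
    return ((1 << w) - 1) << ((j // w) * w)


# 16-bit signal, 4 divisions: one mask per group of four units.
for j in (0, 4, 8, 12):
    print(format(t2_mask(j, 16, 4), "016b"))
# 0000000000001111
# 0000000011110000
# 0000111100000000
# 1111000000000000
```

For 1 division the mask is all ones, so no bits are cleared, which agrees with the undivided case.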

Each of the shift units (SH1, SH2, ... SHN) outputs a signal selected, according to the corresponding bit (Yi[1], Yi[2], ... Yi[N]) of the second signal Yi, from between the shifted output of the corresponding second conversion unit (T2_1, T2_2, ... T2_N) and 0.

Either the selection operation unit 110B_i or the shift operation unit 120B_i operates, according to the selection signal SFi.

FIG. 6 is a diagram showing a parallel processing apparatus according to a fourth embodiment. Referring to FIG. 6, the parallel processing apparatus receives first to P-th inputs ({X1, Y1}, ... {X(p-1), Y(p-1)}, {Xp, Yp}, {X(p+1), Y(p+1)}, ... {XP, YP}) and outputs first to P-th outputs (M1, ... M(p-1), Mp, M(p+1), ... MP). Here P is a natural number of 4 or more; for example, P may be 1024. In addition, p is a natural number from 1 to P. The first to P-th inputs comprise the first signals (X1, ... X(p-1), Xp, X(p+1), ... XP) and the second signals (Y1, ... Y(p-1), Yp, Y(p+1), ... YP). The parallel processing apparatus includes a pre-processor 100C and a main processing unit 200A, and may further include a delay unit 300A and a selection unit 400A.

The pre-processor 100C includes a plurality of pre-processing units (... 150C_(p-1), 150C_p, 150C_(p+1), ...). The pre-processing units (... 150C_(p-1), 150C_p, 150C_(p+1), ...) include selection operation units (... 110C_(p-1), 110C_p, 110C_(p+1), ...) and shift operation units (... 120C_(p-1), 120C_p, 120C_(p+1), ...). The pre-processing unit 150C_p includes the selection operation unit 110C_p and the shift operation unit 120C_p.

The selection operation unit 110C_p operates when the pre-processing unit 150C_p operates in summation mode. The selection operation unit 110C_p passes the first signal Xp corresponding to the pre-processing unit 150C_p, together with the first signals adjacent to it (for example, X(p-Q/2+1), ... X(p-1), X(p+1), ... X(p+Q/2)), to the adder SUMp according to the bits (Yp[1], Yp[2], ... Yp[Q]) of the second signal Yp, and, according to the division selection signal DSi, shifts some bits of the corresponding first signal Xp and of the adjacent first signals before passing them to the adder. Here Q is an even number of 4 or more; for example, Q may be 32. In addition, q is a natural number from 1 to Q.
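The neighbour-window selection of the fourth embodiment can be sketched as follows for the undivided case. Several details are assumptions, because the text does not fix them: the function name `window_sum`, the mapping of bit Yp[q] to the neighbour X(p-Q/2+q), and the treatment of window positions that fall outside 1..P as 0. Division handling by DSi is not modelled.

```python
def window_sum(xs, y_p, p, q_len):
    """Sum the first signals in the Q-wide window around Xp selected by Yp.

    xs[0] holds X1; p is 1-based. Bit q-1 of y_p plays the role of Yp[q]
    and is assumed to select X(p - Q/2 + q).
    """
    total = 0
    for q in range(1, q_len + 1):
        idx = p - q_len // 2 + q                   # 1-based neighbour index
        if (y_p >> (q - 1)) & 1 and 1 <= idx <= len(xs):
            total += xs[idx - 1]
    return total


xs = list(range(1, 17))                 # X1..X16 with X_i = i
print(window_sum(xs, 0b1111, 8, 4))     # X7 + X8 + X9 + X10 = 34
print(window_sum(xs, 0b0101, 8, 4))     # X7 + X9 = 16
```

Because each pre-processing unit only reads Q nearby first signals rather than all P of them, the wiring per unit stays bounded as P grows.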

The shift operation unit 120C_p operates when the pre-processing unit 150C_p operates in multiplication mode. The shift operation unit 120C_p passes the signals ((Xp<<0), (Xp<<1), ... (Xp<<(Q-1))), obtained by shifting the first signal Xp by 0, 1, ... (Q-1) bits, to the adder SUMp according to the bits (Yp[1], Yp[2], ... Yp[Q]) of the second signal Yp, and controls some bits of the shifted signals to be 0 according to the division selection signal DSi.

The pre-processing unit 150C_p operates either the selection operation unit 110C_p or the shift operation unit 120C_p according to the operation mode selection signal SFp.

A person of ordinary skill in the art can sufficiently predict the detailed operation of the selection operation unit 110C_p and the shift operation unit 120C_p by referring to the second and third embodiments, so a detailed description is omitted. The operations of the main processing unit 200A, the delay unit 300A, and the selection unit 400A are the same as in FIG. 3 and its description, and are therefore omitted.

Claims

A parallel processing apparatus comprising:
a pre-processor that includes pre-processing units and receives first signals and second signals; and
a main processing unit that includes adders,
wherein each of the pre-processing units includes a shift operation unit, and
wherein the shift operation unit transfers signals obtained by shifting a corresponding first signal among the first signals to a corresponding adder among the adders according to bits of a corresponding second signal among the second signals, and controls some bits of the shifted signals to be 0 according to a division selection signal.

The parallel processing apparatus of claim 1,
wherein, when the number of bits of the corresponding first signal is N (N being an even number of 8 or more), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to 2 divisions, the shift operation unit controls the most significant (N/2) bits of the signals obtained by shifting the corresponding first signal by 0 to (N/2-1) bits to be 0, and controls the least significant (N/2) bits of the signals obtained by shifting the corresponding first signal by (N/2) to (N-1) bits to be 0.

The parallel processing apparatus of claim 2,
wherein, when the number of bits of the corresponding first signal is N (N being an even number of 8 or more), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to 2 divisions, the most significant N bits of the output of the corresponding adder correspond to the product of the most significant (N/2) bits of the corresponding first signal and the most significant (N/2) bits of the corresponding second signal, and the least significant N bits of the output of the corresponding adder correspond to the product of the least significant (N/2) bits of the corresponding first signal and the least significant (N/2) bits of the corresponding second signal.

The parallel processing apparatus of claim 1,
wherein, when the number of bits of the corresponding first signal is N (N being 8 or more and a multiple of 4), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to 4 divisions, the shift operation unit controls the most significant (N*3/4) bits of the signals obtained by shifting the corresponding first signal by 0 to (N*1/4-1) bits to be 0, controls the most significant (N*2/4) bits and the least significant (N*1/4) bits of the signals obtained by shifting the corresponding first signal by (N*1/4) to (N*2/4-1) bits to be 0, controls the most significant (N*1/4) bits and the least significant (N*2/4) bits of the signals obtained by shifting the corresponding first signal by (N*2/4) to (N*3/4-1) bits to be 0, and controls the least significant (N*3/4) bits of the signals obtained by shifting the corresponding first signal by (N*3/4) to (N-1) bits to be 0.

According to claim 4,
If the number of bits of the corresponding first signal is N (N being a multiple of 4 greater than or equal to 8), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to division into four:
bits 2N to (N*3/2+1) of the output of the corresponding adder correspond to the product of bits N to (N*3/4+1) of the corresponding first signal and bits N to (N*3/4+1) of the corresponding second signal;
bits (N*3/2) to (N*2/2+1) of the output of the corresponding adder correspond to the product of bits (N*3/4) to (N*2/4+1) of the corresponding first signal and bits (N*3/4) to (N*2/4+1) of the corresponding second signal;
bits (N*2/2) to (N*1/2+1) of the output of the corresponding adder correspond to the product of bits (N*2/4) to (N*1/4+1) of the corresponding first signal and bits (N*2/4) to (N*1/4+1) of the corresponding second signal; and
bits (N*1/2) to 1 of the output of the corresponding adder correspond to the product of bits (N*1/4) to 1 of the corresponding first signal and bits (N*1/4) to 1 of the corresponding second signal.
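The four-way case follows the same pattern with four lanes. The sketch below is a hedged software model rather than the claimed hardware: `divided_multiply`, `output_fields`, and the `parts` parameter are hypothetical names, and bits are indexed from 0 in the code whereas the claim text counts them from 1. For each shift amount it keeps only the quarter of the first signal that belongs to the same lane as the selecting bit of the second signal, then reads the four (N/2)-bit fields of the 2N-bit sum.

```python
def divided_multiply(first, second, n, parts):
    """Sum of masked, shifted copies of `first`, selected by the bits of `second`."""
    seg = n // parts                  # lane width in bits (N/4 for division into four)
    seg_mask = (1 << seg) - 1
    total = 0
    for shift in range(n):
        if (second >> shift) & 1:
            lane = shift // seg       # lane that this bit of the second signal belongs to
            kept = first & (seg_mask << (lane * seg))   # zero every other lane of `first`
            total += kept << shift
    return total


def output_fields(value, n, parts):
    """Split a 2N-bit adder output into `parts` fields, lowest field first."""
    width = 2 * n // parts
    return [(value >> (lane * width)) & ((1 << width) - 1) for lane in range(parts)]


n = 16
product = divided_multiply(0x1234, 0x5678, n, 4)
print(output_fields(product, n, 4))  # [32, 21, 12, 5] = 4*8, 3*7, 2*6, 1*5
```

With `parts=1` the same function reduces to an ordinary shift-and-add multiplication, which is one way to see the division selection signal as a mode switch over a single datapath.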

According to claim 1,
The shift operation unit includes conversion units and shift units,
the conversion units set some bits of the first signal to 0 according to the division selection signal, and
the shift units output signals selected, according to the bits of the second signal, from among 0 and the signals obtained by shifting the outputs of the conversion units.

According to claim 1,
Each of the preprocessing units further includes a selection operation unit,
the selection operation unit operates when the corresponding preprocessing unit among the preprocessing units operates in a summation mode, and transfers the first signals to the corresponding adder according to the bits of the corresponding second signal, shifting some bits of the first signals according to the division selection signal before transferring them to the corresponding adder, and
the shift operation unit operates when the corresponding preprocessing unit operates in a multiplication mode.

According to claim 7,
If the number of bits of each of the first signals is N (N being an even number greater than or equal to 8), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to division into two, the selection operation unit shifts the most significant (N/2) bits of the first signals by (N/2) bits, shifts the least significant (N/2) bits of the first signals by 0 bits, and transfers them to the corresponding adder.

According to claim 8,
If the number of bits of each of the first signals is N (N being an even number greater than or equal to 8), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to division into two, the most significant N bits of the output of the corresponding adder correspond to the sum of the most significant (N/2) bits of those first signals that are selected according to the corresponding second signal, and the least significant N bits of the output of the corresponding adder correspond to the sum of the least significant (N/2) bits of those first signals that are selected according to the corresponding second signal.
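The summation mode with division into two can be modeled in a few lines. The sketch below rests on assumptions that the text does not spell out: that bit i of the second signal selects the i-th first signal, and that the lane sums in the example fit within their N-bit output fields. The name `split2_select_sum` is hypothetical. The upper half of each selected first signal is moved up by an extra N/2 bits, the lower half stays in place, and a single addition then produces both sums side by side.

```python
def split2_select_sum(first_signals, second, n=8):
    """Software model of two-way divided summation (hypothetical helper)."""
    half = n // 2
    low_mask = (1 << half) - 1
    total = 0
    for index, first in enumerate(first_signals):
        if not (second >> index) & 1:
            continue                           # assumed: bit i gates the i-th first signal
        low = first & low_mask                 # least significant N/2 bits, shifted by 0
        high = (first >> half) & low_mask      # most significant N/2 bits
        # The high half already sits at bit N/2; shifting it by another N/2 bits
        # places it at bit N of the 2N-bit adder input.
        total += (high << n) | low
    return total


result = split2_select_sum([0x12, 0x34, 0x56], 0b101)
print(result >> 8, result & 0xFF)  # 6 8, i.e. 0x1 + 0x5 and 0x2 + 0x6
```

The extra N/2 bits of headroom in each output field are what keep a carry from the lower sum out of the upper sum in this model.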

According to claim 7,
If the number of bits of each of the first signals is N (N being a multiple of 4 greater than or equal to 8), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to division into four, the selection operation unit shifts bits N to (N*3/4+1) of the first signals by (N*3/4) bits, shifts bits (N*3/4) to (N*2/4+1) of the first signals by (N*2/4) bits, shifts bits (N*2/4) to (N*1/4+1) of the first signals by (N*1/4) bits, shifts bits (N*1/4) to 1 of the first signals by 0 bits, and transfers them to the corresponding adder.

According to claim 10,
If the number of bits of each of the first signals is N (N being a multiple of 4 greater than or equal to 8), the number of bits of the corresponding second signal is N, and the division selection signal corresponds to division into four:
bits 2N to (N*3/2+1) of the output of the corresponding adder correspond to the sum of bits N to (N*3/4+1) of those first signals that are selected according to the corresponding second signal;
bits (N*3/2) to (N*2/2+1) of the output of the corresponding adder correspond to the sum of bits (N*3/4) to (N*2/4+1) of those first signals that are selected according to the corresponding second signal;
bits (N*2/2) to (N*1/2+1) of the output of the corresponding adder correspond to the sum of bits (N*2/4) to (N*1/4+1) of those first signals that are selected according to the corresponding second signal; and
bits (N*1/2) to 1 of the output of the corresponding adder correspond to the sum of bits (N*1/4) to 1 of those first signals that are selected according to the corresponding second signal.
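The four-way summation can be sketched the same way. The helper names `spread_fields` and `divided_select_sum` are hypothetical, bit i of the second signal is assumed to select the i-th first signal, and the example values are chosen so that every lane sum fits in its (N/2)-bit output field. Each (N/4)-bit lane of a selected first signal is shifted by an extra amount equal to its own offset, which spreads the lanes across the 2N-bit adder input before the addition.

```python
def spread_fields(first, n, parts):
    """Move lane k of an N-bit signal from bit k*seg to bit 2*k*seg."""
    seg = n // parts
    seg_mask = (1 << seg) - 1
    spread = 0
    for lane in range(parts):
        field = (first >> (lane * seg)) & seg_mask
        spread |= field << (2 * lane * seg)   # original offset plus an equal extra shift
    return spread


def divided_select_sum(first_signals, second, n, parts):
    """Add the spread form of every first signal whose selecting bit is 1."""
    return sum(
        spread_fields(first, n, parts)
        for index, first in enumerate(first_signals)
        if (second >> index) & 1              # assumed: bit i gates the i-th first signal
    )


total = divided_select_sum([0x1234, 0x1111, 0x4321], 0b101, 16, 4)
print(hex(total))  # 0x5050505: each 8-bit field holds 5 (1+4, 2+3, 3+2, 4+1)
```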

According to claim 7,
The selection operation unit includes conversion units and demultiplexers,
the conversion units shift some bits of the first signals according to the division selection signal, and
the demultiplexers output signals selected, according to the bits of the second signal, from among 0 and the outputs of the conversion units.

According to claim 1,
Each of the preprocessing units further includes a selection operation unit,
the selection operation unit operates when the corresponding preprocessing unit among the preprocessing units operates in a summation mode, and transfers corresponding ones of the first signals to the corresponding adder according to the bits of the corresponding second signal, shifting some bits of those corresponding first signals according to the division selection signal before transferring them to the corresponding adder, and
the shift operation unit operates when the corresponding preprocessing unit operates in a multiplication mode.

According to claim 1,
The parallel processing apparatus further includes:
a delay unit that delays the outputs of the adders according to a clock signal and outputs the delayed signals; and
a selector that outputs, as the respective first signals, signals respectively selected according to input control signals from among signals transferred from a memory and the signals output from the delay unit.
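One use of the delay unit and selector is to feed an adder output back as a first signal on the next clock cycle. The sketch below is a loose behavioral model under several assumptions not stated in the text: the class `DelayUnit`, the functions `selector` and `accumulate`, the control encoding (0 selects the memory signal, 1 selects the delayed adder output), and the stand-in adder that simply sums two first signals are all hypothetical.

```python
class DelayUnit:
    """Register model: holds the adder outputs captured at the last clock edge."""

    def __init__(self, count):
        self.outputs = [0] * count

    def clock(self, adder_outputs):
        self.outputs = list(adder_outputs)


def selector(memory_signals, delayed_signals, input_controls):
    # Assumed encoding: control 0 picks the memory signal, 1 picks the delayed output.
    return [
        delayed if control else memory
        for memory, delayed, control in zip(memory_signals, delayed_signals, input_controls)
    ]


def accumulate(stream):
    """Running sum: first signal 0 comes from memory, first signal 1 is fed back."""
    delay = DelayUnit(2)
    for value in stream:
        first_signals = selector([value, 0], delay.outputs, [0, 1])
        adder_output = sum(first_signals)      # stand-in for one adder of the main processing unit
        delay.clock([0, adder_output])         # captured on the clock edge
    return delay.outputs[1]


print(accumulate([3, 5, 7]))  # 15
```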