KR0155515B1

KR0155515B1 - Fast Hadamard transformer

Info

Publication number: KR0155515B1
Application number: KR1019950039781A
Authority: KR
Inventors: 현진일; 이훈복; 차진종
Original assignee: 양승택; 한국전자통신연구원; 이준; 한국전기통신공사
Priority date: 1995-11-04
Filing date: 1995-11-04
Publication date: 1998-11-16
Also published as: KR970031516A

Abstract

The present invention relates to a fast Hadamard transformer that shortens the time required for the Hadamard transform and reduces chip area and power consumption.

To this end, each processor comprises: a first multiplexer that multiplexes input data; a first first-in first-out (FIFO) buffer that buffers the input data; a second multiplexer that multiplexes the output of the first FIFO buffer and the input data; a butterfly operator that performs a butterfly operation on the output data of the first multiplexer and the output data of the second multiplexer to obtain first and second output signals; and a second FIFO buffer that buffers the second output signal from the butterfly operator.

Description

Fast Hadamard transformer

FIG. 1 is a block diagram of a digital communication system using a general Hadamard-transform error-correction technique, in which (a) shows the reverse-link transmitter of a CDMA digital mobile communication system and (b) shows the reverse-link receiver.

FIG. 2 is a signal flow diagram of a conventional fast Hadamard transform (FHT) algorithm.

FIG. 3 illustrates the butterfly operation of the conventional FHT algorithm.

FIG. 4 is a regularized signal flow diagram of the conventional FHT algorithm.

FIG. 5 is a block diagram of a single-processor structure for the conventional FHT.

FIG. 6 is a block diagram of a pipeline structure for the conventional FHT, in which (a) shows the processor array, (b) shows the internal structure of a processor, and (c) is a timing diagram of the processors.

FIG. 7 is a block diagram of the fast Hadamard transformer according to the present invention, in which (a) shows the high-efficiency processor array for the FHT, (b) shows the basic processor structure, (c) shows the buffer-saving processor structure, and (d) is a timing diagram of processor operation.

FIG. 8 is a signal flow diagram of the FHT algorithm according to the present invention.

FIG. 9 is a block diagram of a pipeline structure for the FHT algorithm according to the present invention, in which (a) shows the processor array and (b) is a timing diagram of processor operation.

* Explanation of reference numerals for the main parts of the drawings

20-22: processors
23, 25, 28, 29, 31: multiplexers
24, 27, 30: FIFO buffers
26, 32: butterfly operators

The present invention relates to the fast Hadamard transform (hereinafter, FHT), and more particularly to a fast Hadamard transformer that shortens the time required for the Hadamard transform and reduces chip area and power consumption.

The Hadamard transform is an orthogonal transform that has been widely used in error-correction coding for digital communication systems, in image analysis systems, and in similar applications.

More recently, base stations in CDMA digital mobile communication systems have used it to recover the orthogonally modulated signal transmitted by a terminal. Recovering that signal in real time and implementing the Hadamard transformer in VLSI both call for an efficient hardware structure.

Convolutional coding and interleaving are widely used for channel error correction in digital mobile communication systems; the reverse link of a CDMA digital mobile communication system additionally uses Walsh-Hadamard orthogonal modulation.

FIG. 1 is a block diagram of a digital communication system using a general Hadamard-transform error-correction technique.

FIG. 1(a) shows the reverse-link transmitter of a CDMA digital mobile communication system. The code symbols produced by convolutional coding and interleaving are divided into groups of a fixed size (6 code symbols) in the orthogonal modulator (1), which performs Walsh-Hadamard orthogonal modulation on each group (64 Walsh chips). The result then passes in sequence through the PN spreader (2), the OQPSK modulator (3), and the frequency converter (4), and is transmitted as a radio-frequency (RF) signal.

Here, Walsh-Hadamard orthogonal modulation selects one column of the N×N Walsh-Hadamard matrix according to log₂N bits of input data and thereby generates N data values. The Walsh-Hadamard matrix used for this purpose is obtained recursively as follows.

Accordingly, the Walsh-Hadamard matrix for N = 8 takes the following values, and it can be seen that every coefficient of the matrix is 1 or -1.
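The recursive construction can be illustrated with a minimal sketch. It assumes the usual Sylvester doubling rule, H_1 = [1] and H_2N = [[H_N, H_N], [H_N, -H_N]]; the function name `hadamard` is hypothetical and chosen only for this illustration.

```python
def hadamard(n):
    """Return the n x n Walsh-Hadamard matrix as nested lists (n a power of two).

    Assumption: Sylvester doubling, H_2N = [[H_N, H_N], [H_N, -H_N]], H_1 = [1].
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Top half repeats each row; bottom half appends the negated row.
        top = [row + row for row in h]
        bottom = [row + [-v for v in row] for row in h]
        h = top + bottom
    return h


# Print the N = 8 matrix; every coefficient is +1 or -1.
for row in hadamard(8):
    print(" ".join(f"{v:+d}" for v in row))
```

Distinct rows (and, by symmetry, distinct columns) of the resulting matrix are mutually orthogonal, which is the property the orthogonal modulation relies on.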

The reverse link of the CDMA digital mobile communication system uses the Walsh-Hadamard matrix with N = 64: for every 6-bit group of code symbols, the corresponding column of the Walsh-Hadamard matrix, that is, 64 Walsh chips, is transmitted.
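As a rough illustration of this mapping, the sketch below turns a 6-bit value into 64 chips. It assumes the Sylvester-ordered matrix, whose entry in row r and column c is (-1) raised to the number of 1 bits in `r & c`, and it assumes the 6-bit value is used directly as the column index; the name `walsh_modulate` is hypothetical.

```python
def walsh_modulate(symbol, n=64):
    """Return the n chips (+1/-1) of column `symbol` of the n x n matrix.

    Assumptions: Sylvester ordering, and the symbol value is the column index.
    """
    if not 0 <= symbol < n:
        raise ValueError("symbol out of range")
    # Chip r is +1 when r & symbol has an even number of 1 bits, else -1.
    return [1 - 2 * (bin(r & symbol).count("1") & 1) for r in range(n)]


chips = walsh_modulate(0b100101)  # one 6-bit group of code symbols
print(len(chips))                 # 64 chips are sent for 6 bits
```

Any two different 6-bit values produce chip sequences whose element-wise products sum to zero, so the receiver can separate them by correlation.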

FIG. 1(b) shows the reverse-link receiver of the CDMA digital mobile communication system.

As shown, the frequency converter (5) converts the received radio-frequency signal to a baseband signal, which the OQPSK demodulator (6) and the PN despreader (7) then split into units of 64 symbols (64 chips).

The Hadamard transformer (8) determines which column of the Walsh-Hadamard matrix best matches the 64 separated symbols, so that the six originally transmitted symbols can be recovered. Specifically, the Hadamard transform is computed, and the peak detector (9) finds the maximum among the transform outputs and outputs the corresponding six code symbols.

For an input vector X, the Hadamard transform is defined as follows.

$$Y = \frac{1}{N} H_N X \tag{5}$$

Here X is an input vector of length N, $H_N$ is the Walsh-Hadamard matrix of size N×N, and Y is an output vector of length N.

Each element of the vector Y produced by the Hadamard transform can be regarded as a measure of how closely the length-N input vector resembles the corresponding column of the Walsh-Hadamard matrix. Finding the maximum element of Y therefore identifies the transmitted column of the Walsh-Hadamard matrix, and the position of that column gives the code symbol value before Walsh-Hadamard modulation.

In a system such as that of FIG. 1, 64 Walsh chips are actually transmitted for each 6-bit code symbol, so the 6-bit code symbol can be recovered successfully even if some of the 64 chips are corrupted in the channel.

The free distance $d_{\min}$ of the Hadamard functions is N/2, so up to $(d_{\min} - 1)/2$ errors can be corrected.
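The error tolerance can be checked with a small sketch: a column is sent, the largest number of chips allowed by the bound is inverted, and the receiver picks the largest transform output. The direct N² transform is used here only for clarity; the function names, the chosen symbol value, and the choice of which chips to invert are assumptions made for illustration.

```python
def walsh_column(c, n):
    """Column c of the Sylvester-ordered n x n Walsh-Hadamard matrix."""
    return [1 - 2 * (bin(r & c).count("1") & 1) for r in range(n)]


def hadamard_transform(x):
    """Direct evaluation of Y = (1/N) H_N X (N*N additions/subtractions)."""
    n = len(x)
    return [sum(h * v for h, v in zip(walsh_column(k, n), x)) / n
            for k in range(n)]


def decode(chips):
    """Index of the largest transform output, i.e. the best-matching column."""
    y = hadamard_transform(chips)
    return max(range(len(y)), key=y.__getitem__)


N = 64
sent = 37                                  # hypothetical 6-bit symbol value
d_min = N // 2                             # free distance N/2
max_errors = (d_min - 1) // 2              # correctable chip errors
clean = walsh_column(sent, N)
# Invert the first `max_errors` chips to model channel errors.
received = [-c if i < max_errors else c for i, c in enumerate(clean)]
print(max_errors, decode(received) == sent)
```

With hard-decision chips, each inverted chip lowers the correct correlation by 2 and can raise any other correlation by at most 2, which is why the decision stays correct up to the stated bound.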

For error correction with the Walsh-Hadamard matrix in a system such as that of FIG. 1, the Walsh-Hadamard orthogonal modulator is easy to implement using the matrix generation method of (1), (2), and (3). The Hadamard transformer is not, and an efficient algorithm and hardware structure for it are essential to reducing the number of VLSI chips needed to implement the receiver.

Because the coefficients of the Walsh-Hadamard matrix are 1 or -1, the Hadamard transform is computationally simple compared with other transforms.

Nevertheless, evaluating the Hadamard transform directly from definition (5) requires a number of additions or subtractions proportional to N².

The FHT algorithm reduces the number of operations required for the Hadamard transform by exploiting the structure of the Walsh-Hadamard matrix. Using property (3), the upper half and the lower half of the vector Y, the Hadamard transform of N data values, are each obtained from the Hadamard transform of a new vector of length N/2, as follows.

Here $\mathit{xl}_i$ and $\mathit{xl}_{j+N/2}$ are intermediate results obtained from operations between the data $x_i$, as follows.

$$\mathit{xl}_i = (x_i + x_{i+N/2})/2 \tag{8}$$

$$\mathit{xl}_{j+N/2} = (x_j - x_{j+N/2})/2 \tag{9}$$

Repeating this process, the Hadamard transform of a length-N/2 vector is obtained from Hadamard transforms of length-N/4 vectors, and so on, until the transform is completed by Hadamard transforms of length-2 vectors, that is, by operations between pairs of data values.
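The halving recursion can be sketched as follows. The sketch assumes the sum form for the upper half and the difference form for the lower half, each divided by 2, and compares the result against a direct evaluation of Y = (1/N) H_N X; the function names are hypothetical.

```python
def fht_recursive(x):
    """Hadamard transform Y = (1/N) H_N X by repeated halving (N a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Intermediate vectors: sums feed the upper half of Y, differences the lower half.
    upper = [(x[i] + x[i + half]) / 2 for i in range(half)]
    lower = [(x[i] - x[i + half]) / 2 for i in range(half)]
    return fht_recursive(upper) + fht_recursive(lower)


def hadamard_direct(x):
    """Reference: direct evaluation with the Sylvester-ordered matrix."""
    n = len(x)
    return [sum((1 - 2 * (bin(k & r).count("1") & 1)) * v
                for r, v in enumerate(x)) / n
            for k in range(n)]


x = [3, -1, 4, 1, -5, 9, 2, -6]
print(fht_recursive(x) == hadamard_direct(x))
```

Dividing by 2 at every level accumulates to the overall factor 1/N, so no separate scaling step is needed at the end.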

FIG. 2 shows the signal flow diagram of the FHT algorithm for N = 8. The first stage performs the operations of (8) and (9) for N = 8, the next stage performs them for N = 4, and the final stage performs them for N = 2.

In general, the FHT algorithm consists of log₂N stages, and N/2 butterfly operations are performed in each stage.

As shown in FIG. 3, the butterfly operation is built simply from an addition, a subtraction, and a divide-by-2 performed as a right shift.
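A minimal integer sketch of such a butterfly is given below. It assumes fixed-point (integer) data and an arithmetic right shift, which in Python rounds toward negative infinity; the rounding behaviour of the actual hardware is not stated in the text, so this detail is an assumption.

```python
def butterfly(a, b):
    """Return ((a + b) / 2, (a - b) / 2) using add, subtract and shift right.

    Assumption: integer inputs; `>> 1` drops the least significant bit,
    so odd sums and differences are rounded toward negative infinity.
    """
    return (a + b) >> 1, (a - b) >> 1


print(butterfly(6, 2))
print(butterfly(3, -5))
```

No multiplier is needed, which is why a butterfly operator occupies little chip area compared with the arithmetic units of other transforms.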

FIG. 4 is the signal flow diagram of FIG. 2 modified so that every stage has the same regular form; it is obtained by rearranging the positions of the intermediate results in FIG. 2.

As FIG. 2 shows, a fast Hadamard transform of N inputs requires (N/2)·log₂N butterfly operations, so the amount of computation is reduced by a factor of (log₂N)/N relative to direct matrix-vector multiplication.
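The operation count can be confirmed with an iterative sketch that processes the stages in the order described for FIG. 2 (pairs N/2 apart first, adjacent pairs last) and counts butterflies as it goes. The function name and the in-place data layout are assumptions for illustration.

```python
def fht_count(x):
    """In-place style FHT; returns (Y, number of butterfly operations)."""
    y = list(x)
    n = len(y)
    count = 0
    span = n // 2                      # distance between butterfly partners
    while span >= 1:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = y[i], y[i + span]
                y[i], y[i + span] = (a + b) / 2, (a - b) / 2
                count += 1
        span //= 2                     # next stage works on half-size blocks
    return y, count


N = 64
_, butterflies = fht_count([0] * N)
print(butterflies, N * N)              # butterflies used vs. N^2 for the direct form
```

Each butterfly is one addition and one subtraction, so the fast form uses N·log₂N additions or subtractions where the direct form uses on the order of N².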

The FHT algorithm can be implemented either by using a single processor that performs all operations sequentially, or by using several processors that share the operations and thereby reduce the FHT processing time.

When several processors are used, processor efficiency, and with it the FHT processing time, depends on how the processors are interconnected and how the operations are divided among them.

The single-processor structure performs the operations with one processor (12), as shown in FIG. 5.

Input data is first stored in the memories (10, 11): the first N/2 of the N input values are stored in memory bank 1 (10), and the remaining N/2 in memory bank 2 (11).

The operations follow the signal flow diagram of FIG. 4 so that the two memory banks (10, 11) can be used efficiently when intermediate results are written to or read from memory.

Within each stage the operations are performed from top to bottom, and when a stage is complete the processor moves on to the next. The first N/2 results of each stage are stored in memory bank 1 (10) and the rest in memory bank 2 (11).

Each stage requires N/2 operations, and the result of the log₂N-th stage is the FHT output.

Taking the butterfly operation time as the unit time, (N/2)·log₂N unit times elapse between completion of the input and output of the final result.
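One way to picture the two-bank scheme is the sketch below. It assumes that the regularized flow of FIG. 4 is the constant-geometry form, in which butterfly k of every stage reads entry k of each bank and writes its two results to adjacent positions; this reading is an assumption, as are all names used.

```python
def fht_two_banks(x):
    """Single-processor FHT using two memory banks; returns (Y, unit times).

    Assumption: every stage pairs bank1[k] with bank2[k] and writes the sum
    and difference to output positions 2k and 2k + 1 (constant geometry).
    """
    n = len(x)
    half = n // 2
    bank1, bank2 = list(x[:half]), list(x[half:])
    unit_times = 0
    for _ in range(n.bit_length() - 1):        # log2(N) stages
        results = []
        for k in range(half):                  # top to bottom within a stage
            a, b = bank1[k], bank2[k]
            results += [(a + b) / 2, (a - b) / 2]
            unit_times += 1                    # one butterfly per unit time
        # First N/2 results go to bank 1, the rest to bank 2.
        bank1, bank2 = results[:half], results[half:]
    return bank1 + bank2, unit_times


y, t = fht_two_banks([3, -1, 4, 1, -5, 9, 2, -6])
print(t)                                       # (N/2) * log2(N) for N = 8
```

Because each stage both reads and writes the banks in the same pattern, the address generation is identical for all stages, which is the practical benefit of a regularized flow.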

The pipeline structure reduces the FHT processing time by using log₂N processors, assigning one stage to each processor, and letting all processors operate concurrently.

The pipeline structure for the FHT is a one-dimensional processor array, as shown in FIG. 6(a). Each processor (13-15) performs one stage of the FHT signal flow diagram of FIG. 2 on its input data and passes the result to the next-stage processor.

Each processor has the internal structure shown in FIG. 6(b). The first-stage processor (13) has a buffer for N/2 data values, the second-stage processor (14) for N/4, and the final-stage processor (15) for one.

Data arriving from the preceding-stage processor or from outside is first stored in the FIFO buffer (17), through its input multiplexer (16), until the buffer is full. When the next data value arrives, the butterfly operator (18) operates on it and the output of the buffer.

One of the two results is passed to the next-stage processor through the output multiplexer (19), and the other is stored in the buffer through the buffer input multiplexer (16).

This is repeated for the following input values until the input data stored in the buffer is exhausted.

At that point the buffer is full of butterfly results that are still to be sent to the next-stage processor.

When new input data arrives, it is again stored in the FIFO buffer (17), while the results held in the buffer are passed one by one to the next-stage processor through the output multiplexer (19).

FIG. 6(c) shows the operation of each processor over time when the FHT algorithm of FIG. 2 is executed on the configuration of FIG. 6(a) with eight input values.

As FIG. 6(c) shows, the pipeline structure takes N-1 unit times from the arrival of the last input value to the output of the last FHT result, and the product of the latency and the number of processors required is about N·log₂N.

This is twice the optimum attainable when the (N/2)·log₂N butterfly operations required by the FHT algorithm are executed on log₂N processors, because each processor other than the first spends only 50% of the time between its first and last operation actually computing.
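The behaviour of this conventional pipeline can be approximated with a cycle-by-cycle sketch in which each processor is a FIFO plus one butterfly. The model below is an interpretation of the description, not the circuit of FIG. 6: the names `sdf_stage` and `pipeline_fht`, the zero-initialised FIFOs, and the phase counter are all assumptions.

```python
from collections import deque


def sdf_stage(stream, depth, start):
    """One processor: FIFO of `depth` words plus a butterfly, one sample per cycle.

    `start` is the cycle at which valid data first reaches this processor.
    """
    fifo = deque([0] * depth)
    out = []
    for t, x in enumerate(stream):
        old = fifo.popleft()
        if ((t - start) // depth) % 2 == 0:
            # Fill phase: store the input, pass on buffered results.
            fifo.append(x)
            out.append(old)
        else:
            # Compute phase: sum goes on, difference waits in the FIFO.
            out.append((old + x) / 2)
            fifo.append((old - x) / 2)
    return out


def pipeline_fht(x):
    """Chain processors with FIFO depths N/2, N/4, ..., 1; returns (Y, latency)."""
    n = len(x)
    stream = list(x) + [0] * (n - 1)   # extra cycles to flush the pipeline
    start, depth = 0, n // 2
    while depth >= 1:
        stream = sdf_stage(stream, depth, start)
        start += depth                 # each processor delays data by its depth
        depth //= 2
    return stream[start:], start       # latency after the last input


y, latency = pipeline_fht([3, -1, 4, 1, -5, 9, 2, -6])
print(latency)                         # N - 1 unit times for N = 8
```

In this model every processor alternates between a fill phase and a compute phase of equal length, which reflects the roughly 50% butterfly utilisation discussed in the text.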

The conventional fast Hadamard transformer described above, however, suffers from low processor efficiency in its pipeline structure.

That is, the butterfly operator of each processor in the pipeline produces two results, but only one of them is passed to the next-stage processor immediately; the other is held in the current processor's buffer and passed on later. While the buffered results are being passed on, the processor performs no computation, and its efficiency drops.

The present invention is intended to solve these problems of the prior art. Its object is to provide a fast Hadamard transformer that raises the processor efficiency of the pipeline structure, shortens the time required for the Hadamard transform, and reduces chip area and power consumption.

An embodiment of the present invention is described in detail below with reference to the accompanying FIGS. 7 to 9.

FIG. 7 is a block diagram of the fast Hadamard transformer according to the present invention. In FIG. 7(b), the basic processor structure consists of two FIFO buffers (24, 27), two multiplexers (23, 25), and one butterfly operator (26).

One of the FIFO buffers (27) is divided into two buffers of equal size (27a, 27b) so that it can provide two outputs.

The outputs of the multiplexers (23, 25) are connected to the two inputs of the butterfly operator (26), so that each input can be taken from either a buffer output or the external input.

The processors (20-22) configured as in FIG. 7(a) perform the following three operations.

In the first operation, input data is stored in the FIFO buffer (24). In the second, a butterfly operation is performed on the processor input obtained through the multiplexer (23) and the output data of the FIFO buffer (24); one result is passed to the next-stage processor and the other is stored in the FIFO buffer (27). In the third, a butterfly operation is performed on the two input values supplied by the preceding processor, and the results are passed to the next processor and stored in the FIFO buffer (27).

FIG. 7(d) shows the FHT computation for eight input values. In processor 1 (20), input data is stored in the FIFO buffer (24) during the first 4 unit times; thereafter, butterfly operations are performed on the input obtained through the multiplexer (23) and the output data of the FIFO buffer (24), which carries out the first stage of the FHT algorithm.

In processor 2 (21), the first operation described above is performed during the first 2 unit times, the second operation during the next 2 unit times, and then the third operation, which carries out the second stage.

In processor 3 (22), the first operation is performed in the first unit time, the second operation in the next unit time, and the first and third operations simultaneously in the unit time after that, which carries out the third stage.

The final results are output two at a time, and N/2 unit times elapse between completion of the final input and completion of the final output.

The latency is thus half that of the conventional pipeline structure, because every processor operates at 100% efficiency from its first operation to its last.

The basic processor structure has twice the total buffer size of the processor in FIG. 6(b); this is a consequence of the higher processor efficiency.

In practice, however, FIFO buffers 24 and 27 are never in use at the same time in processor 1 (20) and processor 2 (21), so the simplified processor structure of FIG. 7(c) can be used. The total amount of buffering then required is as shown in FIG. 7(a).

In the structure of FIG. 4 the total buffer size is N-1, whereas the structure of FIG. 7 requires 2(N-1) - (3/4)N, that is, about 25% more buffering.
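The two buffer totals can be compared numerically. The short sketch below evaluates the expressions N-1 and 2(N-1) - (3/4)N for a few sizes; the function name is hypothetical and N is assumed to be a multiple of 4.

```python
def buffer_words(n):
    """Return (conventional, proposed) total buffer sizes for N = n inputs."""
    conventional = n - 1                      # N - 1
    proposed = 2 * (n - 1) - (3 * n) // 4     # 2(N - 1) - (3/4)N
    return conventional, proposed


for n in (8, 64, 1024):
    conv, prop = buffer_words(n)
    print(n, conv, prop, round(prop / conv, 3))
```

The ratio is below 1.25 for small N and approaches 1.25 as N grows, which is the sense in which the increase is "about 25%".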

The FHT processor structures presented so far implement the signal flow diagram of FIG. 2 or FIG. 4 in pipelined or parallel form, and all of them convert a Hadamard transform of input vector length N into two Hadamard transforms of vector length N/2.

Reversing the order, so that Hadamard transforms of vector length 2 are performed first, followed by transforms of vector length 4, and so on, yields the signal flow diagram of FIG. 8.

A pipeline structure designed to execute this modified FHT algorithm is shown in FIG. 9(a).

The overall structure has the same form as that of FIG. 6, and the processor structure is also the same.

The final-stage processor, however, is arranged so that it can output two results simultaneously.

In addition, the buffer sizes of the processors differ because the order in which the FHT algorithm is executed has changed.

The operation over time is shown in FIG. 9(b); the manner of operation is the same as in the pipeline structure of FIG. 6.

If the results are output one at a time, the processing takes as long as in the structure of FIG. 6, but if two results are output simultaneously the FHT processing time is shorter than in the structure of FIG. 6.

In total, log₂N processors are required, the total buffer memory size is N-1, and N/2-1 unit times elapse from the final data input to the final output.
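These figures can be reproduced with a cycle-level sketch of the reversed-order pipeline. The model assumes FIFO depths of 1, 2, ..., N/2 and a final processor that emits each sum and difference in the same cycle; it is an interpretation of the description rather than the circuit of FIG. 9, and all names are hypothetical.

```python
from collections import deque


def sdf_stage(stream, depth, start):
    """FIFO-plus-butterfly processor; valid data arrives from cycle `start`."""
    fifo = deque([0] * depth)
    out = []
    for t, x in enumerate(stream):
        old = fifo.popleft()
        if ((t - start) // depth) % 2 == 0:
            fifo.append(x)                    # fill phase
            out.append(old)
        else:
            out.append((old + x) / 2)         # sum is passed on
            fifo.append((old - x) / 2)        # difference is buffered
    return out


def reversed_pipeline_fht(x):
    """Length-2 transforms first; returns (Y, latency, total buffer words)."""
    n = len(x)
    stream = list(x) + [0] * (n // 2 - 1)     # cycles needed to flush
    start, depth, buffers = 0, 1, 0
    while depth < n // 2:                     # all processors except the last
        stream = sdf_stage(stream, depth, start)
        start += depth
        buffers += depth
        depth *= 2
    buffers += n // 2                         # last processor holds N/2 words
    valid = stream[start:start + n]
    first, second = valid[:n // 2], valid[n // 2:]
    # Last processor: both butterfly results leave in the same cycle.
    pairs = [((a + b) / 2, (a - b) / 2) for a, b in zip(first, second)]
    y = [s for s, _ in pairs] + [d for _, d in pairs]
    latency = (start + n - 1) - (n - 1)       # last output minus last input
    return y, latency, buffers


y, latency, buffers = reversed_pipeline_fht([3, -1, 4, 1, -5, 9, 2, -6])
print(latency, buffers)                       # N/2 - 1 and N - 1 for N = 8
```

Placing the largest FIFO in the last processor is what lets the final sums and differences leave together, since that processor no longer has to hold half of its results back.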

As described above, the present invention shortens the Hadamard transform time by passing all butterfly results of each processor in the pipeline structure on to the next-stage processor.

Moreover, when the Hadamard transformer is implemented in VLSI, its simple structure reduces chip area and power consumption, and the ease of implementing the Hadamard transformer broadens the design options for systems that use Hadamard function sequences and helps improve their performance.

Claims

A Hadamard transformer in which first to second processors perform butterfly operations on input data to carry out a Hadamard transform, the fast Hadamard transformer being characterized in that each of the processors comprises: a first multiplexer that multiplexes the input data; a first FIFO buffer that buffers the input data first-in first-out; a second multiplexer that multiplexes the output of the first FIFO buffer and the input data; a butterfly operator that performs a butterfly operation on the output data of the first multiplexer and the output data of the second multiplexer to obtain first and second output signals; and a second FIFO buffer that buffers, first-in first-out, the second output signal output from the butterfly operator.

The fast Hadamard transformer of claim 1, wherein the first FIFO buffer comprises a first buffer that buffers, first-in first-out, the second output signal output from the butterfly operator, and a second buffer that has the same size as the first buffer and buffers, first-in first-out, the signal output from the first buffer.

The fast Hadamard transformer of claim 1, wherein each of the processors comprises: first and second multiplexers that multiplex the input data; a FIFO buffer that buffers, first-in first-out, the signal output from the second multiplexer; a third multiplexer that multiplexes the output signal of the FIFO buffer and the input data; and a butterfly operator that performs a butterfly operation on the output data of the third multiplexer and the output data of the first multiplexer to obtain first and second output signals and feeds the second output back to the second multiplexer.