WO2014104520A1

WO2014104520A1 - Transform method, computation method and hevc system to which same are applied

Info

Publication number: WO2014104520A1
Application number: PCT/KR2013/006820
Authority: WO
Inventors: 최병호; 김동순; 이승열; 김제우
Original assignee: 전자부품연구원
Priority date: 2012-12-27
Filing date: 2013-07-30
Publication date: 2014-07-03

Abstract

Provided is a transform method, a computation method and an HEVC system to which the same are applied. According to embodiments of the present invention, by using a memory instead of a register when a result value of a horizontal transform in a transform process of the HEVC system is stored, the complexity of hardware can be reduced, and a critical path can be reduced. In addition, the present invention can reduce a computation amount by skipping a transform/inverse-transform computation through the analysis of values inputted in a butterfly structure in a transform/inverse-transform process having a large computation amount in the HEVC system.

Description

Transform method, computation method, and HEVC system to which the same are applied

The present invention relates to data processing technology and, more particularly, to a method of performing a transform using memory in a High Efficiency Video Coding (HEVC) system, a method capable of reducing the amount of butterfly computation, and an HEVC system to which these are applied.

The transform block of a High Efficiency Video Coding (HEVC) system supports transform sizes from 4×4 to 32×32 and consists of two stages: a horizontal transform and a vertical transform. In the horizontal transform, inputs and outputs are produced row by row; in the vertical transform, they are produced column by column. Because the vertical transform can start only after the horizontal transform has completely finished, storage is needed to hold the results of the horizontal transform.

Registers are the most readily available form of storage. In an existing H.264 system the transform size is 4×4, so it is common to use registers, which are easy to implement, rather than memory. In HEVC, however, where the maximum transform size is 32×32, a large number of registers would be required. This increases hardware complexity and lengthens the critical path, which creates a bottleneck in an HEVC system that demands 60 frames per second and high-speed computation.

Meanwhile, the HEVC system is expected to improve coding efficiency by 50% or more over the existing H.264, at the cost of a large amount of computation and high hardware complexity. In particular, to raise coding efficiency, the transform/inverse transform extends the transform size from the conventional 4×4 to a range of 4×4 through 32×32.

Accordingly, at the largest size of 32×32, computing one butterfly structure requires 344 multiplications and 404 additions. The complete transform/inverse-transform computation requires as many butterfly computations as the transform block size. In addition, the transform is divided into a horizontal stage and a vertical stage, and an encoder must perform both the transform and the inverse transform, so the count is multiplied by 4.

That is, for the largest transform size of 32, the complete transform/inverse-transform computation requires 344×32×4 = 44,032 multiplications and 404×32×4 = 51,712 additions. Implementing an HEVC system in real time with such a computation-heavy transform/inverse transform requires many stages of parallel computation, which increases hardware complexity.
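The arithmetic above can be checked with a few lines of Python. This is only a worked example of the stated formula; the function name `total_ops` and its parameter names are illustrative and do not appear in the source.

```python
# Per-butterfly operation counts for transform size 32, as stated in the text.
MULTS_PER_BUTTERFLY = 344
ADDS_PER_BUTTERFLY = 404


def total_ops(per_butterfly, size=32, passes=4):
    """Operations for a full block: one butterfly per row/column (size),
    times 4 passes (horizontal and vertical, forward and inverse)."""
    return per_butterfly * size * passes


print(total_ops(MULTS_PER_BUTTERFLY))  # multiplications
print(total_ops(ADDS_PER_BUTTERFLY))   # additions
```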

In view of the above, one object of the present invention is to reduce hardware complexity and shorten the critical path by using memory, rather than registers, to store the results of the horizontal transform during the transform process of an HEVC system.

Another object of the present invention is to provide a butterfly computation method, and an HEVC system to which it is applied, that reduce the amount of computation by analyzing the values input to the butterfly structure and skipping the transform/inverse-transform computation itself in the computation-heavy transform/inverse-transform process of the HEVC system.

To achieve the above objects, an apparatus for performing a transform using memory according to a preferred embodiment of the present invention includes: a horizontal operation unit that performs a horizontal transform on each row of data arranged with equal numbers of rows and columns and outputs the result; a memory unit having as many mutually distinct memories as there are rows or columns; and a horizontal register unit that stores each of the data items of each row of the per-row horizontal transform result separately in a respective one of the distinct memories.

The apparatus further includes a storage address generation unit which, when the horizontal register unit stores the data items of each row separately in the respective distinct memories, ensures that the same address value is assigned to data items belonging to the same column of the data arranged with equal numbers of rows and columns.

The apparatus further includes: a vertical register unit that reads data from each of the distinct memories in a single clock; and a vertical transform unit that performs a vertical transform on the data read by the vertical register unit.

The vertical register unit reads, in a single clock, the data assigned the same address value in each of the distinct memories.

To achieve the above objects, a method for performing a transform using memory according to a preferred embodiment of the present invention includes: performing a horizontal transform on each row of data arranged with equal numbers of rows and columns; and storing each of the data items of each row of the per-row horizontal transform result separately in a respective one of as many mutually distinct memories as there are rows or columns.

In the storing step, when the horizontal register unit stores the data items of each row separately in the respective distinct memories, the data is stored so that the same address value is assigned to data items belonging to the same column of the data arranged with equal numbers of rows and columns.

The method further includes: reading data from each of the distinct memories in a single clock; and performing a vertical transform on the read data.

In the reading step, the data assigned the same address value in each of the distinct memories is read in a single clock.

To achieve the above objects, a butterfly computation method according to an embodiment of the present invention includes: analyzing an input TPU (Transform Processing Unit); and skipping the butterfly computation for the TPU based on the result of the analysis.

In the skipping step, when the TPU consists entirely of '0' values, the butterfly computation may be skipped and '000...00' may be output as the butterfly computation result.

Also, in the skipping step, if the TPU is identical to the previously input TPU, the butterfly computation for the TPU may be skipped and the butterfly computation result for the previous TPU may be output as the result.

The method may further include, if the TPU contains at least one '1' or differs from the previously input TPU, performing the butterfly computation on the TPU and outputting the result.

The analyzing step and the skipping step may be applied to both the transform and the inverse transform of the TPU.

The analyzing step and the skipping step may also be applied to both the vertical transform and the horizontal transform.
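The skip rules summarized above can be sketched as a small wrapper around a butterfly function. This is a minimal sketch under stated assumptions, not the patented implementation: the class `SkippingButterfly`, the counter `computed`, and the stand-in `toy_butterfly` (a single sum/difference stage, not the HEVC butterfly) are all hypothetical names introduced for illustration. It also assumes that a TPU is skipped when it is all zero or when it equals the previous TPU, and is computed otherwise.

```python
class SkippingButterfly:
    """Wraps a butterfly function and skips it for all-zero or repeated inputs."""

    def __init__(self, butterfly):
        self.butterfly = butterfly  # the real (expensive) computation
        self.prev_in = None         # previously input TPU
        self.prev_out = None        # result produced for the previous TPU
        self.computed = 0           # how many times the butterfly actually ran

    def process(self, tpu):
        tpu = tuple(tpu)
        if not any(tpu):
            # Rule 1: all-zero input, output all zeros without computing.
            out = (0,) * len(tpu)
        elif tpu == self.prev_in:
            # Rule 2: same as the previous TPU, reuse the previous result.
            out = self.prev_out
        else:
            # Otherwise run the butterfly.
            out = tuple(self.butterfly(tpu))
            self.computed += 1
        self.prev_in, self.prev_out = tpu, out
        return list(out)


def toy_butterfly(values):
    """Placeholder stage (illustrative only): sums, then differences, of mirrored pairs."""
    n = len(values)
    half = n // 2
    sums = [values[i] + values[n - 1 - i] for i in range(half)]
    diffs = [values[i] - values[n - 1 - i] for i in range(half)]
    return sums + diffs
```

In this sketch, feeding the same non-zero TPU twice runs `toy_butterfly` once, and an all-zero TPU never reaches it.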

As described above, according to the present invention, hardware complexity is reduced compared with a register-based structure, and the critical path is also shortened. In an implementation on an Altera device, usage of ALUTs, the basic measure of FPGA resource usage, was reduced by about 51%, and the critical path was shortened by about 24%.

In addition, according to the present invention, the amount of computation can be reduced by analyzing the values input to the butterfly structure and skipping the transform/inverse-transform computation itself in the computation-heavy transform/inverse-transform process of the HEVC system.

Although the effect of the invention varies with the test video, computation skipping according to the invention was found to reduce the amount of computation approximately as follows.

- Inverse horizontal transform: approximately 60 to 80% reduction in computation

- Inverse vertical transform: 1) approximately 30 to 40% reduction when the transform size is 4, 2) approximately 20 to 40% reduction when the transform size is 8, 3) approximately 10 to 20% reduction when the transform size is 16, 4) approximately 10% reduction when the transform size is 32

FIG. 1 is a diagram showing the computation order and direction of the horizontal transform and the vertical transform according to an embodiment of the present invention;

FIG. 2 is a block diagram of an apparatus for performing a transform using memory according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for performing a transform using memory according to an embodiment of the present invention;

FIG. 4 is a diagram showing the butterfly structure when the transform size is 8;

FIG. 5 is a table showing the amount of computation required for one butterfly computation according to the transform size;

FIG. 6 is a table showing the multiplicands used in the transform/inverse transform;

FIG. 7 is a diagram showing the structure of an HEVC system according to a preferred embodiment of the present invention;

FIG. 8 is a table showing simulation results for the computation skipping technique;

FIG. 9 is a diagram showing the input values of the inverse transform when only a DC value is present; and

FIG. 10 is a diagram showing the input values of the inverse transform when only a few values are present around the DC value.

Before the detailed description of the present invention, it should be noted that the terms and words used in this specification and the claims are not to be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent its entire technical idea; it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals wherever possible. Detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each component does not fully reflect its actual size.

FIG. 1 is a diagram showing the computation order and direction of the horizontal transform and the vertical transform according to an embodiment of the present invention. FIG. 1 shows this order and direction for a transform size of 32×32.

In (a) and (b) of FIG. 1, the numbers 0 to 31 indicate the order of the horizontal transform and the vertical transform, respectively. In (a) of FIG. 1, the x values are the inputs to the horizontal transform and correspond to the residual after intra-picture or inter-picture prediction. In (b) of FIG. 1, the X values are the results of the horizontal transform and serve as the inputs to the vertical transform.

As shown in (a), the horizontal transform operates on the x values row by row; as shown in (b), the vertical transform operates on the X values column by column. The input column for the first vertical transform computation therefore consists of the first result value of every row of the horizontal transform, so the vertical transform can start only after the horizontal transform has finished for all rows. Consequently, all results of the horizontal transform must be stored.

Registers and memory can both be considered as storage. Registers are simpler to read and write than memory and can be used with simple control, but they increase hardware complexity, and when many registers are used the critical path also grows. Memory has lower hardware complexity than registers and allows a shorter critical path. However, because memory is controlled through address values, it needs more control signals than registers. In particular, control complexity increases when the write addresses and the read addresses do not follow the same rule. In the HEVC transform, the horizontal transform results are written to memory row by row but read column by column, so memory cannot be used with ordinary memory control.

In an HEVC system, the amount of computation is 2 to 4 times or more that of the existing H.264 for the same picture size. Moreover, the picture size targeted by HEVC is 4 times that of H.264 and the frame rate is doubled, so the amount of computation grows by up to 8 times or more, and hardware complexity increases accordingly. The embodiments of the present invention therefore perform the transform using memory instead of registers, which would increase hardware complexity.

The following describes a method, according to an embodiment of the present invention, of storing the horizontal transform results in memory with hardware complexity taken into account. By way of example, considering the maximum transform size of 32×32, 32 memories each with a depth of 32 are used to store the horizontal transform results.

Let X denote the result of the horizontal transform. The first result of the pipelined transform is X(0,0), X(0,1), X(0,2), X(0,3), X(0,4), ..., X(0,30), X(0,31); the second result is X(1,0), X(1,1), X(1,2), X(1,3), X(1,4), ..., X(1,30), X(1,31); and so on until the 32nd result has been stored in memory. The stored results are then input to the vertical transform: the first input is X(0,0), X(1,0), X(2,0), X(3,0), X(4,0), ..., X(30,0), X(31,0); the second is X(0,1), X(1,1), X(2,1), X(3,1), X(4,1), ..., X(30,1), X(31,1); and so on up to the 32nd input.

With conventional memory control, the 32 result values of the first horizontal transform are written to address 0 of the 32 memories, and the result values of the second horizontal transform are written to address 1. Under this scheme, however, the inputs of the first vertical transform cannot be read at once, and 32 clocks are needed to assemble one vertical transform input. This makes a real-time implementation of HEVC, which requires 30 or 60 frames per second, impossible. To store the 32 results of a horizontal transform in one clock and to read the 32 inputs of a vertical transform in one clock, conditions must be placed on how data is written to and read from memory.
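The bottleneck of the conventional layout can be illustrated with a short count of memory accesses. This is a sketch under the assumption, not stated explicitly in the source, that each memory delivers at most one word per clock; the function names `conventional_location` and `clocks_needed` are illustrative.

```python
from collections import Counter

N = 32  # transform size and number of memories


def conventional_location(row, col):
    """Conventional layout: X(row, col) goes to memory `col` at address `row`."""
    return col, row  # (memory index, address)


def clocks_needed(locations):
    """Clocks to access a set of words when each memory serves one word per clock."""
    per_memory = Counter(memory for memory, _address in locations)
    return max(per_memory.values())


# Writing one row touches 32 different memories: one clock.
row_write = clocks_needed(conventional_location(0, c) for c in range(N))

# Reading one column hits the same memory 32 times: 32 clocks.
column_read = clocks_needed(conventional_location(r, 0) for r in range(N))
```

Under these assumptions `row_write` is 1 and `column_read` is 32, which matches the 32-clock cost described above.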

The write condition is that the 32 result values of each row of the horizontal transform must be written to 32 different memories. The read condition is that the 32 values of each column, which are results of the horizontal transform and inputs to the vertical transform, must likewise reside in 32 different memories.

For example, only if the first vertical transform inputs X(0,0), X(1,0), X(2,0), X(3,0), X(4,0), ..., X(30,0), X(31,0) each reside in a different memory can all 32 memories be read simultaneously in one clock to produce the first input of the vertical transform. A memory write and read scheme that satisfies both conditions is shown in Table 1 below. In Table 1, the values in parentheses are the address values of each memory.

Table 1

If, as in Table 1, the horizontal transform results and their storage addresses are shifted across the memories according to the order of the horizontal transform, then the inputs of the first column of the vertical transform, X(0,0), X(1,0), X(2,0), X(3,0), X(4,0), ..., X(30,0), X(31,0), can all be read in one clock at address '0'. The inputs of the following columns can be read in the same way by incrementing the address.

Because this method does not store the horizontal transform results in memory in their natural order, the front end of the memory needs a part that generates the memory write addresses and a part that reorders the horizontal transform results to produce the values to be stored. Likewise, because the order of the memories differs from the input order of the vertical transform, the values read from memory must also be reordered.

Table 2 below lists the reordered values stored in memory, organized by memory and by address.

Table 2

address	mem. 0	mem. 1	mem. 2	mem. 3	mem. 4	......	mem. 30	mem. 31
0	X(0,0)	X(1,0)	X(2,0)	X(3,0)	X(4,0)	......	X(30,0)	X(31,0)
1	X(31,1)	X(0,1)	X(1,1)	X(2,1)	X(3,1)	......	X(29,1)	X(30,1)
2	X(30,2)	X(31,2)	X(0,2)	X(1,2)	X(2,2)	......	X(28,2)	X(29,2)
3	X(29,3)	X(30,3)	X(31,3)	X(0,3)	X(1,3)	......	X(27,3)	X(28,3)
4	X(28,4)	X(29,4)	X(30,4)	X(31,4)	X(0,4)	......	X(26,4)	X(27,4)
......	......	......	......	......	......	......	......	......
30	X(2,30)	X(3,30)	X(4,30)	X(5,30)	X(6,30)	......	X(0,30)	X(1,30)
31	X(1,31)	X(2,31)	X(3,31)	X(4,31)	X(5,31)	......	X(31,31)	X(0,31)

The horizontal transform results stored in memory according to an embodiment of the present invention are as shown in Table 2.
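The pattern in Table 2 can be captured by a simple rule read off the table: X(row, col) sits in memory (row + col) mod 32 at address col. The sketch below regenerates table rows from that rule and checks both the write and read conditions. The closed-form rule and the names `skewed_location` and `table_row` are inferred for illustration and are not taken from the source.

```python
N = 32  # transform size and number of memories


def skewed_location(row, col, n=N):
    """Inferred from Table 2: X(row, col) is stored in memory (row + col) mod n
    at address col."""
    return (row + col) % n, col  # (memory index, address)


def table_row(address, n=N):
    """Return the (row, col) index held by each memory at the given address."""
    cells = [None] * n
    for row in range(n):
        memory, _addr = skewed_location(row, address, n)
        cells[memory] = (row, address)
    return cells


def uses_all_memories(locations, n=N):
    """True if the given words are spread over n different memories."""
    return len({memory for memory, _addr in locations}) == n


# Write condition: every row of X lands in 32 different memories.
rows_ok = all(
    uses_all_memories(skewed_location(r, c) for c in range(N)) for r in range(N)
)
# Read condition: every column of X lands in 32 different memories.
cols_ok = all(
    uses_all_memories(skewed_location(r, c) for r in range(N)) for c in range(N)
)
```

Because both conditions hold, one row can be written in a single clock and one column can be read in a single clock, assuming each memory serves one word per clock.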

The apparatus that uses memory, rather than registers, to store the horizontal transform results as described above is now examined in more detail.

FIG. 2 is a block diagram of an apparatus for performing a transform using memory according to an embodiment of the present invention.

Referring to FIG. 2, the transform apparatus according to an embodiment of the present invention basically includes a horizontal transform unit 110, a memory unit 140, and a vertical transform unit 160. In front of the memory unit 140, the apparatus additionally includes a storage address generation unit 120, which generates the addresses at which the horizontal transform results are stored in memory, and a horizontal register unit 130, which comprises 32 registers that produce the values to be stored from the horizontal transform results. After the memory unit 140, the apparatus also includes a vertical register unit 150 composed of 32 registers that produce the vertical transform inputs from the data read from the memory unit 140.

Data arranged with equal numbers of rows and columns, as in (a) of FIG. 1, is input to the horizontal operation unit 110. The horizontal operation unit 110 performs a horizontal transform on each row and outputs the result, carrying out the computations in the order 0 to 31 shown in (a) of FIG. 1.

The memory unit 140 has as many mutually distinct memories (memory areas) as there are rows or columns in the data input to the horizontal operation unit 110. This allows the data items belonging to the same computation order (each of 0 to 31) to be stored (written) separately in distinct memories (memory areas) and read in a single clock.

The horizontal register unit 130 stores each of the data items of each row of the per-row horizontal transform result from the horizontal operation unit 110 separately in a respective one of the distinct memories of the memory unit 140. The storage address generation unit 120 generates and assigns address values so that data items belonging to the same column of the data arranged with equal numbers of rows and columns receive the same address value (see Tables 1 and 2).

The vertical register unit 150 reads data from each of the distinct memories of the memory unit 140 in a single clock. Specifically, it can read, in a single clock, the data assigned the same address value in each of the distinct memories (see Tables 1 and 2). The vertical register unit 150 then inputs the data to the vertical transform unit 160 in the order (0 to 31) shown in (b) of FIG. 1.

The vertical transform unit 160 can thus perform the vertical transform on the data read by the vertical register unit 150.
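The read side can be sketched as follows: all memories are read at one address, and the words are then rotated back into row order before they reach the vertical transform. This is an illustrative model, not the circuit of FIG. 2. It assumes the Table 2 placement (X(row, col) in memory (row + col) mod 32 at address col), and the helper names `fill_memories` and `read_column` are hypothetical.

```python
N = 32  # transform size and number of memories


def fill_memories(X, n=N):
    """Place X[row][col] in memory (row + col) % n at address col (Table 2 pattern)."""
    memories = [[None] * n for _ in range(n)]
    for row in range(n):
        for col in range(n):
            memories[(row + col) % n][col] = X[row][col]
    return memories


def read_column(memories, address):
    """Read one word from every memory at `address`, then restore row order.

    Memory m holds row (m - address) mod n, so row r is found in
    memory (r + address) mod n.
    """
    n = len(memories)
    raw = [memories[mem][address] for mem in range(n)]  # one word per memory
    return [raw[(row + address) % n] for row in range(n)]


# Example data: encode (row, col) in the value so the ordering is easy to check.
X = [[row * 100 + col for col in range(N)] for row in range(N)]
memories = fill_memories(X)
first_vertical_input = read_column(memories, 0)  # X(0,0), X(1,0), ..., X(31,0)
```

The rotation in `read_column` plays the role the text assigns to the reordering after the memory: the address selects the column, and the rotation undoes the per-address shift.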

FIG. 3 is a flowchart of a method for performing a transform using memory according to an embodiment of the present invention.

Referring to FIG. 3, in step S310 the horizontal transform unit 110 performs the horizontal transform computation on the data arranged with equal numbers of rows and columns, as shown in (a) of FIG. 1, in the order 0 to 31 (that is, row by row), and outputs the results. In step S320, the storage address generation unit 120 then assigns an address to each value of each order. Values that share the same vertical transform order are assigned the same address; in other words, data items belonging to the same column of the data shown in (a) of FIG. 1 are assigned the same address value. The memory write addresses are generated according to the code of Equation 1 below.

[Equation 1]

for (HT_order = 0; HT_order < 32; HT_order++)
  for (mem_order = 0; mem_order < 32; mem_order++)
    wr_addr(HT_order, mem_order) = mod(32 - HT_order + mem_order, 32)

HT_order denotes the order of the horizontal transform, mem_order denotes the order of the memory, and the result wr_addr is the memory write address value. Since there are 32 memories, a total of 32 memory write address values are generated, one per mem_order.
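The address formula of Equation 1 can be tried out directly. The following Python sketch is illustrative only: the function name `write_address` and the constant `N` are chosen here and do not appear in the original. It lists the addresses issued to the memories while the first two rows are stored.

```python
N = 32  # number of memories, equal to the transform size

def write_address(ht_order: int, mem_order: int) -> int:
    """Address used in memory `mem_order` while row `ht_order` is stored."""
    return (N - ht_order + mem_order) % N

# Addresses issued to memories 0..31 while rows 0 and 1 are stored.
row0_addresses = [write_address(0, m) for m in range(N)]
row1_addresses = [write_address(1, m) for m in range(N)]

print(row0_addresses[:4])  # addresses for memories 0..3, row 0
print(row1_addresses[:4])  # addresses for memories 0..3, row 1
```

For row 0 the first four memories receive addresses 0, 1, 2, 3, and for row 1 they receive 31, 0, 1, 2. Each row's address list is a rotation of 0 to 31, so every memory is written exactly once per row.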

In step S330, the horizontal register unit 130 stores each value of each order at its assigned address in a different memory (mem. 0 to mem. 31) of the memory unit 140. That is, the horizontal register unit 130 stores the data items of each row of the arrangement shown in FIG. 1(a) separately, one in each of the separate memories (mem. 0 to mem. 31). This storing takes place in a single clock.

The data that the horizontal register unit 130 stores in the memories can be generated with the code shown in Equation 2 below.

[Equation 2]

for (mem_order = 0; mem_order < 32; mem_order++)
  mem(mem_order, HT_order) = X(HT_order, wr_addr(HT_order, mem_order))

The generated address values and the 32 incoming horizontal transform results are buffered in the registers of the horizontal register unit 130 to meet the critical path, and are then stored in the memories of the memory unit 140 in the order of the horizontal transform. The data items of the same order (the data items arranged in the same row) are stored in different memories (mem. 0 to mem. 31).

In step S340, the vertical register unit 150 reads the data stored in the separate memories (mem. 0 to mem. 31) of the memory unit 140 in a single clock, according to the addresses assigned in each memory, and inputs the data to the vertical transform unit 160. Because the memories only need to be read in the order of the vertical transform, no separate module is needed to generate read addresses. That is, reading the data assigned the same address in a single clock yields the data in the order of the vertical transform (see Tables 1 and 2).

The vertical transform input of the vertical register unit 150 can be generated with the code of Equation 3 below.

[Equation 3]

for (VT_order = 0; VT_order < 32; VT_order++)
  for (mem_order = 0; mem_order < 32; mem_order++) {
    mem_reorder(VT_order, mem_order) = mod(VT_order + mem_order, 32)
    VT_in(VT_order, mem_order) = mem(mem_reorder(VT_order, mem_order), VT_order)
  }

The horizontal transform results are written to different memories according to the horizontal transform order. For this reason, mem_reorder gives the reordering applied when the memories are read to form the vertical transform input. VT_in denotes the reordered vertical transform input; after passing through registers, 32 data items are input to the vertical transformer in the order of the vertical transform.
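Taken together, Equations 1 to 3 describe a skewed storage scheme. The Python sketch below models it under one reading of the equations, namely that the result in row r and column c is written to address c of memory (r + c) mod 32. This reading, and the names `store_rows` and `read_columns`, are assumptions made for illustration and are not part of the original text.

```python
N = 32  # number of memories, equal to the transform size

def store_rows(x):
    """Write each row of x so that its N values land in N different memories."""
    mem = [[None] * N for _ in range(N)]          # mem[memory][address]
    for ht_order in range(N):                     # one row per clock
        for mem_order in range(N):
            addr = (N - ht_order + mem_order) % N  # Equation 1
            mem[mem_order][addr] = x[ht_order][addr]
    return mem

def read_columns(mem):
    """Read one address from all memories per clock and reorder (Equation 3)."""
    vt_in = [[None] * N for _ in range(N)]
    for vt_order in range(N):                     # one column per clock
        for mem_order in range(N):
            mem_reorder = (vt_order + mem_order) % N
            vt_in[vt_order][mem_order] = mem[mem_reorder][vt_order]
    return vt_in

# Label every element with a unique number so misplacements are visible.
x = [[r * N + c for c in range(N)] for r in range(N)]
vt_in = read_columns(store_rows(x))
is_transposed = all(vt_in[c][r] == x[r][c] for r in range(N) for c in range(N))
print(is_transposed)
```

Under this reading, each read step uses the same address in every memory, and `vt_in[c]` holds column c of the horizontal transform result in row order, which is the input the vertical transform needs.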

Then, in step S350, the vertical transform unit 160 performs the vertical transform on the read data in the order (0 to 31) shown in FIG. 1(b) and outputs the data.

This structure uses a 32×32 memory and 32×2 registers to store the results of the horizontal transform. If registers instead of memory were used to store the results, 32×32 registers would be needed.

Table 3 below compares FPGA resource usage after the register-based structure and the memory-based structure, both without the transformers, were implemented on an Altera EP3SL340F1760C3.

Table 3

	Register architecture	Memory architecture	Reduction ratio
ALUT	13,835	6,712	51.49%
Register	17,413	2,082
Memory	0	16,384
Critical path	5.139 ns	3.918 ns	23.76%

As shown in Table 3, compared with the structure that uses registers as the buffer for the horizontal transform results, the structure that uses memory reduces ALUT usage, the basic measure of FPGA usage, by about 51% and shortens the critical path by about 24%.
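The reduction ratios in Table 3 follow from the two architecture columns. The short Python check below recomputes them; the helper name `reduction_ratio` is introduced here for illustration only.

```python
def reduction_ratio(register_arch: float, memory_arch: float) -> float:
    """Percentage saved by the memory architecture relative to the register one."""
    return round(100.0 * (register_arch - memory_arch) / register_arch, 2)

alut_reduction = reduction_ratio(13835, 6712)           # ALUT row of Table 3
critical_path_reduction = reduction_ratio(5.139, 3.918)  # critical path row, in ns

print(alut_reduction, critical_path_reduction)
```

Both values agree with the ratios listed in the table (51.49% and 23.76%).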

As described above, the present invention uses memory in the transform process of the HEVC system in consideration of hardware complexity. Specifically, memory is used to store the results of the horizontal transform. According to the present invention, data can be written to and read from the memory in a single clock for high-speed operation.

Basic butterfly structure and computation amount in the transform/inverse-transform block

FIG. 4 shows the butterfly structure for a transform size of 8. As shown in FIG. 4, the transform/inverse transform is basically divided into two stages, a horizontal transform and a vertical transform. In each stage, when the transform size is n×n, n values are input to the butterfly structure and one butterfly operation is performed.

In the butterfly operation, for the horizontal transform the n values of each row of the n×n block are input in turn, and for the vertical transform the n values of each column are input in turn.

The n values of a row or column input in this way are defined as a TPU (Transform Processing Unit). As shown in FIG. 4, the butterfly operation is carried out on the input TPU by addition, multiplication, offset addition, and shift. Performing the butterfly operation n times completes one stage, horizontal or vertical.

FIG. 5 is a table of the amount of computation required for one butterfly operation, by transform size.

Accordingly, for the complete transform/inverse transform, the values in FIG. 5 must be multiplied by the transform size and by 2, the number of stages, and then by a further 2 to cover both the transform and the inverse transform. For a transform size of 32, this comes to 44,032 multiplications, 51,712 additions, and 4,096 shifts. The embodiments of the present invention describe methods for reducing this large amount of computation.
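The scaling described above can be checked by working backwards from the stated totals. FIG. 5 is not reproduced in this text, so the per-butterfly counts computed in the Python sketch below are inferred by division and are not quoted from the figure; the variable names are chosen here for illustration.

```python
TRANSFORM_SIZE = 32   # butterfly operations per stage
STAGES = 2            # horizontal and vertical
DIRECTIONS = 2        # transform and inverse transform

# Totals stated in the text for a transform size of 32.
totals = {"multiplications": 44032, "additions": 51712, "shifts": 4096}

factor = TRANSFORM_SIZE * STAGES * DIRECTIONS  # 32 * 2 * 2 = 128
per_butterfly = {op: count // factor for op, count in totals.items()}

print(factor, per_butterfly)
```

Each total divides evenly by 128, which implies 344 multiplications, 404 additions, and 32 shifts for a single 32-point butterfly operation.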

Butterfly operation skip techniques in the transform/inverse transform

A TPU input to the transform always undergoes multiplication in the butterfly operation, and the multiplicands used in that multiplication are fixed within transform blocks of the same size. Two operation skip techniques that exploit these properties are described below.

The first operation skip technique is as follows. As described above, the butterfly operation consists of addition, multiplication, offset addition, and shift. Specifically, the values of the input TPU are added according to the butterfly principle for the transform size, and the sums are multiplied by the multiplicands for that transform size. If all values of the input TPU are '0', the products are necessarily '0' as well, so neither the multiplications nor the additions that precede them need to be computed.

In addition, the offset and the shift are related such that when the shift divides by 2^n the offset is 2^(n-1), so for a zero value the result of the offset addition and shift is also '0'. Therefore, if the input TPU is all '0', the result of the transform/inverse transform is also '0' and the transform/inverse-transform operations need not be performed, which removes a large amount of computation. To use this technique, a comparator that compares the TPU against as many '0's as the transform size must be inserted in front of the transform/inverse transform.
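The zero-skip argument can be illustrated with a small Python sketch. The 2-point `toy_butterfly`, its multiplicand of 64, and the default shift of 7 are hypothetical stand-ins chosen for brevity; they are not the butterfly structure of FIG. 4.

```python
from typing import Callable, List

def round_shift(value: int, shift: int) -> int:
    """Offset addition followed by a right shift; the offset is 2^(shift-1)."""
    offset = 1 << (shift - 1)
    return (value + offset) >> shift

def toy_butterfly(tpu: List[int], shift: int = 7) -> List[int]:
    """Hypothetical 2-point butterfly: add/subtract, multiply by 64, offset, shift."""
    a, b = tpu
    return [round_shift(64 * (a + b), shift), round_shift(64 * (a - b), shift)]

def transform_with_zero_skip(tpu: List[int],
                             butterfly: Callable[[List[int]], List[int]]) -> List[int]:
    """Return zeros directly when the TPU is all zero, otherwise run the butterfly."""
    if all(v == 0 for v in tpu):
        return [0] * len(tpu)
    return butterfly(tpu)

skipped = transform_with_zero_skip([0, 0], toy_butterfly)
computed = toy_butterfly([0, 0])
print(skipped == computed)
```

Skipping and computing give the same all-zero output because `round_shift(0, n)` is 0 for any shift of at least 1: the offset 2^(n-1) is smaller than 2^n and is discarded by the shift.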

The second operation skip technique is as follows. For a given transform size, the multiplicands used in the transform/inverse-transform block are composed of combinations of the multiplicands shown in FIG. 6.

Because the multiplicand values are fixed for a given transform size, the result of the transform/inverse-transform block is determined entirely by the input values of the TPU. The transform/inverse transform is performed on the current TPU, and the next TPU is then processed with the same multiplicands.

That is, if the values of the current TPU are exactly the same as those of the previous TPU, the result is also the same because the same multiplicands are used. In that case, if the previous result has been stored, the transform/inverse transform need not be performed on the current TPU. Exploiting this property requires inserting a buffer that stores the input TPU, a buffer that stores the corresponding transform/inverse-transform result, and a comparator that compares the stored TPU with the current TPU.

The hardware structure for these two operation skip techniques is shown in FIG. 7. FIG. 7 illustrates an HEVC system according to a preferred embodiment of the present invention.

As shown in FIG. 7, the HEVC system includes a transform block 710, a quantization block 720, an entropy coder block 730, an inverse quantization block 740, and an inverse transform block 750.

The transform and inverse transform blocks 710 and 750 each consist of a horizontal transform block and a vertical transform block, and these four blocks can be implemented with the butterfly operation block 800.

The butterfly operation block 800 includes comparator-I 810, comparator-II 820, buffer-I 830, an OR gate 840, buffer-II 850, a butterfly operator 860, and a multiplexer 870.

Comparator-I 810 compares the input TPU with '0' and outputs the comparison result, and comparator-II 820 compares the input TPU with the previous TPU stored in buffer-I 830 and outputs the comparison result.

Buffer-I 830 stores the input TPU, and buffer-II 850 stores the result of the butterfly operation performed by the butterfly operator 860.

The OR gate 840 ORs the comparison result of comparator-I 810 with that of comparator-II 820 and outputs the result to the butterfly operator 860.

The butterfly operator 860 is an operator with the structure shown in FIG. 4, which has already been described.

Based on the comparison results of comparator-I 810 and comparator-II 820, the multiplexer 870 selectively outputs one of: 1) '000...00', 2) the butterfly operation result for the previous TPU stored in buffer-II 850, and 3) the butterfly operation result for the current TPU from the butterfly operator 860.

Specifically, the multiplexer 870 operates as follows:

1) if the comparison result of comparator-I 810 is '1' (the input TPU is all '0'), it outputs '000...00' without a butterfly operation by the butterfly operator 860;

2) if the comparison result of comparator-II 820 is '1' (the input TPU is the same as the previous TPU), it outputs the butterfly operation result for the previous TPU stored in buffer-II 850; and

3) if the comparison results of comparator-I 810 and comparator-II 820 are both '0' (the input TPU contains at least one '1', that is, it is not all zeros, and differs from the previous TPU), it outputs the butterfly operation result for the current TPU from the butterfly operator 860.

Meanwhile, if the comparison result of comparator-I 810 or comparator-II 820 is '1' (cases 1) and 2) above), the output of the OR gate 840 is '1' and the butterfly operator 860 does not perform the butterfly operation.

Conversely, if the comparison results of comparator-I 810 and comparator-II 820 are both '0' (case 3) above), the output of the OR gate 840 is '0' and the butterfly operator 860 performs the butterfly operation.
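The selection logic of the butterfly operation block 800 can be summarized as a small behavioral model. The Python sketch below is not a hardware description: the class name `ButterflyBlock`, the stand-in `double_all`, and the choice to refresh both buffers on every TPU are assumptions made for illustration, since the text does not state when the buffers are updated.

```python
from typing import Callable, List, Optional

Tpu = List[int]

class ButterflyBlock:
    """Behavioral model of block 800 (comparators, buffers, OR gate, multiplexer)."""

    def __init__(self, butterfly: Callable[[Tpu], Tpu]) -> None:
        self.butterfly = butterfly             # stands in for butterfly operator 860
        self.buffer_i: Optional[Tpu] = None    # previous TPU (buffer-I 830)
        self.buffer_ii: Optional[Tpu] = None   # previous result (buffer-II 850)
        self.butterfly_runs = 0                # counts actual butterfly operations

    def process(self, tpu: Tpu) -> Tpu:
        tpu = list(tpu)
        cmp_i = all(v == 0 for v in tpu)       # comparator-I 810: TPU is all zero
        cmp_ii = tpu == self.buffer_i          # comparator-II 820: same as previous TPU
        skip = cmp_i or cmp_ii                 # OR gate 840
        if cmp_i:
            out = [0] * len(tpu)               # multiplexer case 1
        elif cmp_ii:
            out = list(self.buffer_ii)         # multiplexer case 2
        else:
            out = self.butterfly(tpu)          # multiplexer case 3
        if not skip:
            self.butterfly_runs += 1
        # Assumption: both buffers are refreshed on every TPU.
        self.buffer_i, self.buffer_ii = tpu, list(out)
        return out

def double_all(tpu: Tpu) -> Tpu:
    """Hypothetical stand-in for the real butterfly arithmetic."""
    return [2 * v for v in tpu]

block = ButterflyBlock(double_all)
outputs = [block.process(t) for t in ([0, 0], [1, 2], [1, 2], [3, 4])]
print(outputs, block.butterfly_runs)
```

Of the four TPUs in this example, the first is skipped by comparator-I and the third by comparator-II, so the stand-in butterfly runs only twice.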

Simulation results

To examine the effect of skipping butterfly operations, the operation technique of this embodiment was applied to the HM8 software simulator, and the results of simulations with test sequences of various videos are shown in the table of FIG. 8.

In the simulations, the Quantization Parameter (QP) used in quantization and inverse quantization was fixed at 32. HT denotes the horizontal transform, VT denotes the vertical transform, and the prefix I in front of HT and VT denotes the inverse transform.

The simulation results give, for each of the two operation skip techniques, the rate at which the comparator value becomes 1, in percent (%). When all values of a TPU were '0', the TPU was not compared with the previous TPU.

Analysis of the simulation results shows that the two kinds of operation skip occur only rarely in the transform block. The input to the transform block is the residual after intra or inter prediction; these values are often not all '0', and when the picture content changes, the current TPU is less likely to equal the previous one.

If the original picture has little change in image content or color, operation skips are expected to occur more often than in the simulation results. In the inverse transform, by contrast, operation skips occur frequently. In particular, comparator-I fires often in the inverse vertical transform, and comparator-II fires often in the inverse horizontal transform.

This is because the transformed values are scaled by quantization and inverse quantization after the transform, so small values become '0'; in the inverse vertical transform, which is performed first in the inverse transform, the TPU is therefore often all '0'. Even when the TPU is not all '0', if the input to the vertical transform contains only a DC value, as in FIG. 9(a), the result of the vertical transform, viewed row by row as in FIG. 9(b), has identical TPUs. That is, in the horizontal transform, whose input is taken row by row, comparator-II frequently outputs '1', and only the inverse transform for the first column needs to be performed.
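The DC-only case can be reproduced numerically. The Python sketch below uses the HEVC 4-point core transform matrix; the 4×4 block size, the DC value of 40, and the first-stage shift of 7 are assumptions chosen for illustration and are not taken from FIG. 9.

```python
# HEVC 4-point core transform matrix (rows are basis functions).
T4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]
SHIFT = 7  # assumed shift for the first inverse stage

def inverse_vertical(coeff):
    """Apply the inverse transform down each column of a 4x4 coefficient block."""
    n = len(coeff)
    offset = 1 << (SHIFT - 1)
    out = [[0] * n for _ in range(n)]
    for c in range(n):
        for r in range(n):
            acc = sum(T4[k][r] * coeff[k][c] for k in range(n))
            out[r][c] = (acc + offset) >> SHIFT
    return out

# Only the DC coefficient (top-left) is nonzero.
dc_only = [
    [40, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
rows = inverse_vertical(dc_only)
rows_identical = all(row == rows[0] for row in rows)
print(rows, rows_identical)
```

Columns 1 to 3 are all-zero TPUs and would be caught by comparator-I, while the four output rows are identical, so in the following inverse horizontal transform every row after the first would be caught by comparator-II.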

Likewise, when the components after inverse quantization include not only a DC value but also a few values around DC, as in FIG. 10(a), the result of the vertical transform still has identical TPUs in the second and third rows, as in FIG. 10(b). In the inverse horizontal transform, therefore, cases arise in which the TPUs of different rows are the same.

As a result, the probability that an operation can be skipped in the inverse transform depends on the test video and the transform size. In the inverse vertical transform, roughly 60 to 80% or more of the operations can be skipped. In the inverse horizontal transform, roughly 30 to 40% can be skipped for a transform size of 4, 20 to 40% for a transform size of 8, 10 to 20% for a transform size of 16, and about 10% for a transform size of 32. These skips remove a large share of the operations in the transform/inverse transform.

Furthermore, when video must be compressed in real time at 30 or 60 frames per second, parallel processing is required; because the degree of parallelism can be reduced, hardware complexity is reduced as well. The method yields a larger benefit in a decoder than in an encoder, because a decoder performs only the inverse transform and no forward transform, so the overall reduction in its computation is proportionally higher.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present invention.

Claims

An apparatus for performing a transform using a memory, the apparatus comprising:

a horizontal operation unit that performs a horizontal transform row by row on data arranged in equal numbers of rows and columns, and outputs the results;

a memory unit having separate memories equal in number to the rows or columns; and

a horizontal register unit that stores the data items of each row of the row-by-row horizontal transform results separately, one in each of the separate memories.
The apparatus of claim 1,

wherein, when the horizontal register unit stores the data items of each row separately in the separate memories,

the apparatus further comprises a storage address value generation unit that causes the same address value to be assigned to data items in the same column of the data arranged in equal numbers of rows and columns.
The apparatus of claim 2, further comprising:

a vertical register unit that reads data from each of the separate memories in a single clock; and

a vertical transform unit that performs a vertical transform on the data read by the vertical register unit.
The apparatus of claim 3,

wherein the vertical register unit

reads, in a single clock, the data assigned the same address value in each of the separate memories.
A method for performing a transform using a memory, the method comprising:

performing a horizontal transform row by row on data arranged in equal numbers of rows and columns; and

storing the data items of each row of the row-by-row horizontal transform results separately, one in each of separate memories equal in number to the rows or columns.
The method of claim 5,

wherein the storing

comprises, when the horizontal register unit stores the data items of each row separately in the separate memories, storing them such that the same address value is assigned to data items in the same column of the data arranged in equal numbers of rows and columns.
The method of claim 6, further comprising:

reading data from each of the separate memories in a single clock; and

performing a vertical transform on the read data.
The method of claim 7,

wherein the reading

comprises reading, in a single clock, the data assigned the same address value in each of the separate memories.
A butterfly operation method comprising: analyzing an input TPU (Transform Processing Unit); and

skipping a butterfly operation on the TPU based on the result of the analyzing.
The butterfly operation method of claim 9,

wherein the skipping comprises,

when the TPU is all '0', skipping the butterfly operation and outputting '000...00' as the butterfly operation result.
The butterfly operation method of claim 9,

wherein the skipping comprises,

when the TPU is the same as a previously input TPU, skipping the butterfly operation on the TPU and outputting the butterfly operation result for the previous TPU as the butterfly operation result.
The butterfly operation method of claim 9, further comprising:

when the TPU contains at least one '1' or differs from a previously input TPU, performing the butterfly operation on the TPU and outputting the result.
The butterfly operation method of claim 9,

wherein the analyzing and the skipping

are applied to both the transform and the inverse transform of the TPU.
The butterfly operation method of claim 13,

wherein the analyzing and the skipping

are applied to both the vertical transform and the horizontal transform.