KR101722695B1

KR101722695B1 - Reconfigurable processor and method for processing loop having memory dependency

Info

Publication number: KR101722695B1
Application number: KR1020100109998A
Authority: KR
Inventors: 안희진; 유동훈; 이강웅; 안민욱; 이진석; 김태송; 김원섭
Original assignee: 삼성전자주식회사
Priority date: 2010-10-19
Filing date: 2010-11-05
Publication date: 2017-04-04
Also published as: KR20120040630A

Abstract

Disclosed are a reconfigurable processor and a control method that can reduce erroneous operations by analyzing dependencies between memory access instructions and assigning instructions to a plurality of processing elements based on the analysis result. An execution trace is extracted from a simulation result, and the memory dependencies between instructions included in iterations are analyzed based on the portion of the execution trace corresponding to memory access instructions.

Description

RECONFIGURABLE PROCESSOR AND METHOD FOR PROCESSING LOOP HAVING MEMORY DEPENDENCY

The following description relates to a technique for assigning instructions to a plurality of processing elements so that operations are executed correctly when the iterations of a loop are executed in parallel.

A reconfigurable architecture is an architecture in which the hardware configuration of a computing device can be changed so that it is optimized for each task to be performed.

When a task is processed purely in hardware, the fixed hardware functionality makes it difficult to handle the task efficiently once even a slight change is made to it. When a task is processed purely in software, the software can be modified to suit the task, but processing is slower than in hardware.

A reconfigurable architecture can offer the advantages of both hardware and software. Such architectures have drawn particular attention in digital signal processing, where the same operations are performed repeatedly.

Digital signal processing typically involves many loop operations in which the same task is repeated. Loop-level parallelism (LLP) is widely used to speed up loop operations, and software pipelining is a representative LLP technique.

Software pipelining exploits the principle that operations belonging to different iterations can be processed simultaneously as long as there is no dependency between those iterations. Software pipelining performs even better when combined with a reconfigurable array: operations that can be parallelized can be processed simultaneously by the individual processing units that make up the array.

Recently, there has been a growing need for research on techniques that assign instructions to a plurality of processing elements so that the instructions of a loop having memory dependencies are computed correctly under pipelining.

Provided is a reconfigurable processor that can reduce erroneous operations by analyzing dependencies between memory access instructions and assigning instructions to a plurality of processing elements based on the analysis result.

A reconfigurable processor for processing a loop having memory dependency according to an embodiment of the present invention includes an extraction unit that extracts an execution trace from a simulation result, and a scheduler that analyzes memory dependencies between instructions included in iterations based on the portion of the execution trace corresponding to memory access instructions.

The reconfigurable processor may further include a simulation unit that simulates the instructions.

The scheduler may generate an iteration window corresponding to the processing time of the instructions included in one iteration, and may analyze memory dependencies between the instructions of the iterations that fall within the generated iteration window.

The scheduler may calculate a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies.

The scheduler may assign instructions to the processing elements while increasing the iteration distance (II) from the calculated MII, based on the analyzed memory dependencies.

The execution trace may include at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.

A method of processing a loop having memory dependency according to an embodiment of the present invention includes extracting an execution trace from a simulation result, and analyzing memory dependencies between instructions included in iterations based on the portion of the execution trace corresponding to memory access instructions.

The method may further include simulating the instructions.

The analyzing may include generating an iteration window corresponding to the processing time of the instructions included in one iteration, and analyzing memory dependencies between the instructions of the iterations that fall within the generated iteration window.

The method may further include calculating a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies.

The method may further include assigning instructions to the processing elements in consideration of the analyzed memory dependencies while increasing the iteration distance (II) value from the calculated MII.

The execution trace may include at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.

According to the disclosure, memory dependencies between instructions are extracted from an execution trace obtained through profiling, and the instructions included in iterations are assigned to a plurality of processing elements based on the extracted memory dependencies. This improves the accuracy of the computation compared with assignment that ignores memory dependencies.

In addition, analyzing the dependencies between memory access instructions using an iteration window reduces the time required for dependency analysis.

FIG. 1 is a diagram illustrating a reconfigurable processor according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an iteration window according to an embodiment of the present invention.
FIGS. 3A and 3B are diagrams illustrating the MII.
FIG. 4 is a flowchart illustrating a method of controlling a reconfigurable processor according to an embodiment of the present invention.

Hereinafter, specific details for carrying out the invention are described with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a reconfigurable processor according to an embodiment of the present invention.

Referring to FIG. 1, the reconfigurable processor 100 includes a reconfigurable array 110, a memory 120, a simulation unit 130, an extraction unit 140, and a scheduler 150.

Hereinafter, an iteration refers to each individual execution of a loop when the loop is executed multiple times. For example, if a loop is executed three times, its first execution is the first iteration, its second execution is the second iteration, and its third execution is the third iteration. The instructions belonging to an iteration are mapped to different processing elements, and the processing elements operate simultaneously, so the instructions can be processed in parallel. This improves the operation speed.

The reconfigurable processor 100 may operate in a coarse-grained array (CGA) mode, a very long instruction word (VLIW) mode, and so on. For example, the reconfigurable processor 100 may process loop operations in CGA mode, and general operations or loop operations in VLIW mode. Loop operations can be performed in VLIW mode, but less efficiently than in CGA mode. While a single program runs, the reconfigurable processor 100 may alternate between CGA mode and VLIW mode.

The reconfigurable array 110 includes a register file 111 and a plurality of processing elements (PEs) 112. The hardware configuration of the reconfigurable array 110 can be changed so that it performs an operation optimally. For example, the reconfigurable array 110 may change the connections between the processing elements according to the type of operation.

The register file 111 is used to transfer data between the processing elements 112 and stores the data needed to execute instructions. For example, each processing element 112 may access the register file 111 to read or write the data used when executing an instruction. However, because not all of the processing elements 112 are connected to one another, a particular processing element may have to go through another processing element to access the register file 111.

The processing elements 112 execute the instructions assigned to them. The connections between the processing elements 112 and their order of operation may be changed according to the task to be processed.

The memory 120 may store information needed for processing, such as information about the connections between the processing elements 112 and the instructions, as well as the results of processing. For example, the memory 120 may store the data to be processed or the processing results. As another example, the memory 120 may store information needed to operate the reconfigurable processor 100, connection state information for the reconfigurable array, and information about how the reconfigurable array operates.

The simulation unit 130 may run a simulation by applying the instructions to be executed on the processing elements to a test file. For example, the simulation unit 130 may run a simulation in which the instructions process a test file (for example, an MP3 file or a video file).

The extraction unit 140 may extract an execution trace from the result of the simulation run by the simulation unit 130. This is also called profiling. An execution trace is a chronological record of information about the instructions executed during the simulation, and the recorded data may be the values of variables at the moment each instruction was executed. For example, the execution trace may include register addresses, values stored in registers, memory addresses, and values stored in memory.
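The kind of record an execution trace might hold can be sketched as follows. This is a minimal illustration only: the `TraceRecord` class, its field names, and the sample values are hypothetical and do not come from the patent, which specifies only that the trace may contain register addresses, register values, memory addresses, and memory values in chronological order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceRecord:
    """One entry of a hypothetical execution trace, stored in execution order."""
    iteration: int                   # loop iteration that executed the instruction
    label: str                       # position of the instruction in the loop body
    opcode: str                      # mnemonic, e.g. "ld_i" or "st_i"
    mem_addr: Optional[int] = None   # memory address touched, or None
    mem_value: Optional[int] = None  # value read or written, or None


def memory_access_records(trace: List[TraceRecord]) -> List[TraceRecord]:
    """Keep only the records produced by memory access instructions."""
    return [rec for rec in trace if rec.mem_addr is not None]


# Sample trace; the stored values (7 and 9) are invented for illustration.
trace = [
    TraceRecord(0, "A", "ld_i", mem_addr=0x50, mem_value=7),
    TraceRecord(0, "B", "add"),
    TraceRecord(0, "C", "st_i", mem_addr=0x100, mem_value=9),
    TraceRecord(1, "A", "ld_i", mem_addr=0x100, mem_value=9),
]

for rec in memory_access_records(trace):
    print(rec.iteration, rec.label, rec.opcode, hex(rec.mem_addr))
```

Filtering the trace down to memory accesses first keeps the later dependency analysis from having to look at arithmetic instructions at all.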

The scheduler 150 may analyze memory dependencies between instructions based on the portion of the execution trace that corresponds to memory access instructions (the traced variable values). If the instructions included in different iterations include instructions that access the same memory location, a memory dependency exists between those instructions, and they must be processed serially for the computation to be correct. A memory access instruction is an instruction that stores data to the memory 120 or reads data from the memory 120. For example:

k-th iteration:

    A: ld_i r20 <- M[0x50]
    B: add  r2 <- r4 + r5
    C: st_i M[0x100] <- r8
    D: sub  r1 <- r4 - r5
    E: st_i M[0x1000] <- r10

(k+1)-th iteration:

    A: ld_i r20 <- M[0x100]
    B: add  r2 <- r4 + r5
    C: st_i M[0x150] <- r8
    D: sub  r1 <- r4 - r5
    E: st_i M[0x1000] <- r10

Here, ld denotes a load instruction, add an addition instruction, st a store instruction, and sub a subtraction instruction. A memory access instruction is an instruction that contains M[].

Register (r) dependencies are easy to determine, because only the register names need to be compared. Memory dependencies, by contrast, can be determined only by comparing the memory address values held in registers (for example, 0x100 and 0x150). Memory dependency analysis is therefore harder than register dependency analysis.

The execution trace may include the register addresses (r1, r2, r4, r5, r8, r10, r20), the values stored in the registers, and the memory addresses accessed by the memory access instructions of the k-th and (k+1)-th iterations, or the values stored at those memory addresses.

The scheduler 150 may analyze dependencies between instructions based on the trace entries corresponding to memory access instructions. For example, when the memory addresses of two memory access instructions are the same, the scheduler 150 may determine that a memory dependency exists between them. Here, the memory address 0x100 of instruction C in the k-th iteration is the same as the memory address 0x100 of instruction A in the (k+1)-th iteration, so the scheduler 150 may determine that C of the k-th iteration and A of the (k+1)-th iteration are dependent. As another example, when the value stored in M[0x100] for C of the k-th iteration equals the value stored in M[0x100] for A of the (k+1)-th iteration, the scheduler 150 may determine that the two instructions are dependent. The scheduler 150 may analyze such dependencies based on the simulation results. Although the above describes the k-th and (k+1)-th iterations, the scheduler 150 also determines the memory dependencies between the k-th and (k+2)-th iterations, the k-th and (k+3)-th iterations, and so on.
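The address comparison described above can be sketched in a few lines. This is an illustrative sketch, not the patented implementation: the tuple layout and the function name `find_memory_dependencies` are hypothetical, the iterations k and k+1 are numbered 0 and 1, and skipping load/load pairs is an added assumption (two reads of the same address do not conflict) that the description does not state.

```python
from itertools import combinations

# Memory accesses from the two example iterations, in execution order:
# (iteration, instruction label, kind, address)
accesses = [
    (0, "A", "load", 0x50),
    (0, "C", "store", 0x100),
    (0, "E", "store", 0x1000),
    (1, "A", "load", 0x100),
    (1, "C", "store", 0x150),
    (1, "E", "store", 0x1000),
]


def find_memory_dependencies(accesses):
    """Return (earlier, later, address) for cross-iteration accesses that collide."""
    deps = []
    for earlier, later in combinations(accesses, 2):
        it1, label1, kind1, addr1 = earlier
        it2, label2, kind2, addr2 = later
        if it1 == it2:
            continue  # only dependencies between different iterations are of interest
        if addr1 != addr2:
            continue  # different addresses cannot conflict
        if kind1 == "load" and kind2 == "load":
            continue  # assumption: two reads never need to be ordered
        deps.append(((it1, label1), (it2, label2), addr1))
    return deps


for src, dst, addr in find_memory_dependencies(accesses):
    print(f"{src} -> {dst} via M[{addr:#x}]")
```

On the example listing this reports the C-to-A dependency through M[0x100] that the description names, and also the store/store pair on M[0x1000] formed by instruction E in both iterations.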

The scheduler 150 may calculate the minimum iteration distance (MII) based on the memory dependencies.

The scheduler 150 may assign instructions to the processing elements in consideration of the analyzed memory dependencies while increasing the II value from the calculated MII. For example, the scheduler 150 may increase the iteration distance (II) by 1 at a time, starting from the MII. This is also called the trial-and-error method. The trial-and-error method is only one embodiment, however, and other methods of deriving the II value from the calculated MII may be used.
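The trial-and-error search can be sketched as a loop that starts at the MII and raises II by 1 until a mapping attempt succeeds. Everything named here is hypothetical: `search_ii`, the `max_ii` cut-off, and especially `toy_schedule`, which is only a stand-in that checks whether the processing elements offer enough slots and does not model the dependency-aware placement the description calls for.

```python
from typing import Callable, Dict, Optional, Tuple

Placement = Dict[str, Tuple[int, int]]  # instruction label -> (PE index, cycle slot)


def search_ii(mii: int,
              try_schedule: Callable[[int], Optional[Placement]],
              max_ii: int = 64) -> Tuple[int, Placement]:
    """Try II = MII, MII + 1, ... until try_schedule returns a placement."""
    for ii in range(mii, max_ii + 1):
        placement = try_schedule(ii)
        if placement is not None:
            return ii, placement
    raise RuntimeError(f"no schedule found for II in [{mii}, {max_ii}]")


def toy_schedule(ii: int, labels=("A", "B", "C", "D", "E"), num_pes: int = 2):
    """Stand-in mapper: succeeds once num_pes * ii slots can hold every instruction."""
    if num_pes * ii < len(labels):
        return None
    return {label: (i % num_pes, i // num_pes) for i, label in enumerate(labels)}


ii, placement = search_ii(1, toy_schedule)
print(ii, placement)
```

With five instructions and two assumed processing elements, II values 1 and 2 are rejected and the search settles on 3. A real mapper would additionally reject any II at which a dependent instruction pair would execute out of order.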

The scheduler 150 may generate an iteration window corresponding to the processing time of the instructions included in one iteration, and may analyze the dependencies between the instructions of the iterations that fall within the generated iteration window. The scheduler 150 may apply the iteration window to the sequentially input iterations to analyze the dependencies between instructions.

By using the iteration window to analyze the dependencies between memory access instructions, the scheduler 150 can reduce the dependency analysis time. That is, with the iteration window, dependencies between instructions in iterations that do not need to be analyzed can be skipped.

FIG. 2 is a diagram illustrating an iteration window according to an embodiment of the present invention.

In this embodiment, it is assumed that one iteration is input every cycle (II = 1) and that the instructions included in one iteration take 10 cycles to process.

Referring to FIGS. 1 and 2, the iteration window 200 may be generated with a size equal to or greater than the processing time of the instructions included in one iteration. For example, because the instructions of one iteration take 10 cycles to process, the iteration window 200 may be sized to hold the 10 iterations corresponding to 10 cycles, or more than 10 iterations. It takes 10 cycles for 10 iterations to be input.

The scheduler 150 may analyze the dependencies between the instructions of the iterations included in the iteration window. For example, there is no need to analyze the dependency between the currently input iteration (the first iteration) and an iteration input after the processing time (10 cycles) of the current iteration's instructions has elapsed (the eleventh iteration). This is because the first and eleventh iterations are not computed simultaneously (in parallel) but sequentially (serially): the eleventh iteration is computed after the first iteration has been computed. Accordingly, the size of the iteration window may be set equal to or greater than the size corresponding to the processing time of the instructions included in one iteration.
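The effect of the window on the amount of analysis can be sketched as follows. The function names are hypothetical, and sizing the window as ceil(processing time / II) is an assumption that generalizes the II = 1 example; the description itself only requires the window to be at least as large as the processing time of one iteration.

```python
import math


def window_size(iteration_cycles: int, ii: int) -> int:
    """Number of consecutive iterations that can be in flight at the same time."""
    return math.ceil(iteration_cycles / ii)


def iteration_pairs_to_analyze(num_iterations: int, window: int):
    """Pairs (i, j), i < j, where iteration j starts before iteration i finishes."""
    return [(i, j)
            for i in range(num_iterations)
            for j in range(i + 1, min(i + window, num_iterations))]


window = window_size(iteration_cycles=10, ii=1)
pairs = iteration_pairs_to_analyze(num_iterations=12, window=window)
print(window)            # window holds 10 iterations
print((0, 9) in pairs)   # first vs. tenth iteration: analyzed
print((0, 10) in pairs)  # first vs. eleventh iteration: skipped
```

Iterations are numbered from 0 here, so the pair (0, 10) is the first and eleventh iteration of the example, which never overlap and therefore need no dependency check.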

The scheduler 150 may analyze the instructions to be executed on the reconfigurable array 110 and assign the instructions to the processing elements 112 based on the analysis result.

The scheduler 150 may calculate the MII of the iterations. The scheduler 150 may then assign instructions to the processing elements using the analyzed memory dependencies while increasing the iteration distance (II) from the calculated minimum iteration distance (MII).

FIGS. 3A and 3B are diagrams illustrating the MII.

FIG. 3A shows a case in which the MII is 1; A, B, C, D, and E denote instructions.

Referring to FIG. 3A, the scheduler 150 may assign instructions to the processing elements in consideration of the analyzed memory dependencies while increasing the iteration distance (II) value by 1 at a time from the minimum iteration distance (MII). For example, when a dependency exists between A of the first iteration 200a and B of the second iteration 210a, the scheduler 150 may calculate the MII value so that instruction B of the second iteration 210a is executed after instruction A of the first iteration 200a has been executed. The scheduler 150 may then assign instructions to the processing elements in consideration of the analyzed memory dependencies while increasing the II value from the calculated MII.

FIG. 3B shows a case in which the MII is 3; A, B, C, D, and E denote instructions.

Referring to FIG. 3B, the scheduler 150 may assign instructions to the processing elements in consideration of the analyzed memory dependencies while increasing the iteration distance (II) value by 1 at a time from the minimum iteration distance (MII). For example, II may be increased by 1 at a time starting from the MII value of 3. When a dependency exists between D of the first iteration 200a and B of the second iteration 210a, the scheduler 150 may calculate the MII value so that instruction B of the second iteration 210a is executed after instruction D of the first iteration 200a has been executed. The scheduler 150 may then assign instructions to the processing elements in consideration of the analyzed memory dependencies while increasing the II value from the calculated MII.

FIG. 4 is a flowchart illustrating a method of controlling a reconfigurable processor according to an embodiment of the present invention.

Referring to FIG. 4, the instructions to be executed on the plurality of processing elements are applied to a test file and simulated (400). An execution trace is extracted from the simulation result (410). Based on the portion of the execution trace corresponding to memory access instructions, the memory dependencies between the instructions included in the iterations are analyzed (420). For example, an iteration window corresponding to the processing time of the instructions included in one iteration may be generated, and the dependencies between the instructions of the iterations that fall within the generated iteration window may be analyzed. The minimum iteration distance (MII) between the iterations is calculated based on the analyzed memory dependencies (430). The instructions are then assigned to the processing elements in consideration of the memory dependencies while the II value is increased from the calculated MII (440).

The control method of the reconfigurable processor can improve the accuracy of the computation by assigning iterations to a plurality of processing elements based on the dependencies between memory access instructions.

In addition, the control method of the reconfigurable processor can reduce the dependency analysis time by analyzing the dependencies between memory access instructions using an iteration window corresponding to the processing time of one iteration.

All or part of the described embodiments may be selectively combined so that various modifications can be made.

It should also be noted that the embodiments are for illustration and not for limitation. Those of ordinary skill in the art will understand that various embodiments are possible within the scope of the technical idea of the present invention.

According to an embodiment of the present invention, the method described above can be implemented as processor-readable code on a medium on which a program is recorded. Examples of processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementations in the form of a carrier wave (for example, transmission over the Internet).

Claims

A reconfigurable processor for processing a loop having memory dependency, the reconfigurable processor comprising:
an extraction unit configured to extract an execution trace from a simulation result; and
a scheduler configured to analyze memory dependencies between instructions included in iterations based on the portion of the execution trace corresponding to memory access instructions, to calculate a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies, and to assign instructions to processing elements in consideration of the analyzed memory dependencies while increasing an iteration distance (II) from the calculated MII.

The reconfigurable processor of claim 1, further comprising a simulation unit configured to simulate the instructions.

The reconfigurable processor of claim 1, wherein the scheduler generates an iteration window corresponding to the processing time of the instructions included in one iteration, and analyzes memory dependencies between the instructions of the iterations that fall within the generated iteration window.

(Deleted)

The reconfigurable processor of claim 1, wherein the execution trace includes at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.

A method of processing a loop having memory dependency in a reconfigurable processor, the method comprising:
extracting an execution trace from a simulation result;
analyzing memory dependencies between instructions included in iterations based on the portion of the execution trace corresponding to memory access instructions;
calculating a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies; and
assigning the instructions to processing elements in consideration of the analyzed memory dependencies while increasing an iteration distance (II) value from the calculated MII.

The method of claim 7, further comprising simulating the instructions.

The method of claim 7, wherein the analyzing comprises generating an iteration window corresponding to the processing time of the instructions included in one iteration, and analyzing memory dependencies between the instructions of the iterations that fall within the generated iteration window.

(Deleted)

The method of claim 7, wherein the execution trace includes at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.