KR20180072345A

KR20180072345A - Prefetching method and apparatus for pages

Info

Publication number: KR20180072345A
Application number: KR1020160175816A
Authority: KR
Inventors: 김신덕; 윤영선; 윤수경
Original assignee: 연세대학교 산학협력단
Priority date: 2016-12-21
Filing date: 2016-12-21
Publication date: 2018-06-29
Also published as: KR101940382B1

Abstract

Provided are a method and an apparatus for prefetching pages. According to an embodiment of the present invention, the apparatus for prefetching a page from a second memory to a first memory comprises: a cluster generation unit which generates clusters by grouping pages accessed by an application program according to the data access patterns of the application program; a prefetching unit; and an access pattern learning unit which learns the data access pattern of the application program by exchanging the actually requested page with a page of the cluster in which a page miss occurred. The hit rate in main memory can thereby be increased.

Description

PREFETCHING METHOD AND APPARATUS FOR PAGES

The present invention relates to a method and apparatus for prefetching pages, and more particularly to a technique that manages pages in clusters reflecting data access patterns and prefetches well-made clusters, that is, clusters that reflect the data access pattern well, in order to raise the hit rate seen by the CPU.

With recent advances in communication technology and in the manufacturing of devices such as smartphones, many people now create and share data of various sizes and types using a wide range of devices and application programs.

To process data of such varied sizes and types at high speed, data centers are scaling out by increasing the number of exa-scale servers, and large cloud datacenters are improving data processing performance through in-memory databases or caching layers that use large amounts of DRAM.

However, fetching data from a data center's storage into DRAM incurs a large performance penalty.

Software approaches such as Memcached, MemC3, and Redis have been proposed to actively exploit the low latency of DRAM. However, they only store the location of the actual data, or brief information about it, in DRAM to speed up lookups, and they do not raise the effective hit rate of application programs whose access patterns change dynamically and frequently.

In addition, the density characteristics of DRAM limit how far it can be scaled out, and the high energy consumption of DRAM raises the total cost of ownership (TCO) of data servers.

To relieve the burden on DRAM used as main memory, methods have also been proposed that use a large solid state drive (SSD) caching layer, a multi-level cache, or the like as a second cache for DRAM. These methods, however, suffer from a bottleneck in data transfer between DRAM and the second cache and from doubled power consumption.

In environments where file access is complex and sometimes random, such as memory-intensive big data or cloud storage, there is a need for a way to address the scale-out density and power consumption problems of DRAM, to lower the TCO of data centers, and to raise the hit rate in DRAM by prefetching data whose access patterns are complex and sometimes random.

The present invention is intended to solve the problems of the prior art described above. It aims to raise the hit rate in main memory by managing data whose file access is complex and sometimes random in clusters according to access pattern, and by prefetching into main memory the clusters that have a high expected probability of being hit.

To achieve the above object, an apparatus for prefetching pages from a second memory to a first memory according to one embodiment of the present invention includes: a cluster generation unit that generates clusters by grouping pages accessed by an application program according to the data access patterns of the application program; a prefetching unit that, when a page requested by the application program is not present in the first memory, prefetches all pages of the cluster to which the requested page belongs from the second memory to the first memory if the requested page and that cluster satisfy a prefetching condition; and an access pattern learning unit that, when one of the prefetched clusters is not hit by a page request after the prefetching and a page miss occurs, learns the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page. The first memory has less space and faster read and write speeds than the second memory. The access pattern learning unit calculates the cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using the prefetch depth, which represents the hit rate of a cluster, and a queue in which information on clusters with page misses is stored, and decides whether to exchange the pages based on the calculated cluster closeness.

To achieve the above object, an apparatus for prefetching pages from storage to main memory according to another embodiment of the present invention includes: a cluster generation unit that generates clusters by grouping pages accessed by an application program according to the data access patterns of the application program; a prefetching unit that, when a page requested by the application program is not present in the main memory, prefetches all pages of the cluster to which the requested page belongs from the storage to the main memory if the requested page and that cluster satisfy a prefetching condition; and an access pattern learning unit that, when one of the prefetched clusters is not hit by a page request after the prefetching and a page miss occurs, learns the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page. The main memory has less space and faster read and write speeds than the storage. The access pattern learning unit calculates the cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using the prefetch depth, which represents the hit rate of a cluster, and a queue in which information on clusters with page misses is stored, and decides whether to exchange the pages based on the calculated cluster closeness.

To achieve the above object, a method by which a prefetching apparatus prefetches pages from a second memory to a first memory according to one embodiment of the present invention includes the steps of: (a) generating clusters by grouping pages accessed by an application program according to the data access patterns of the application program; (b) when a page requested by the application program is not present in the first memory, prefetching all pages of the cluster to which the requested page belongs from the second memory to the first memory if the requested page and that cluster satisfy a prefetching condition; and (c) when one of the prefetched clusters is not hit by a page request after the prefetching and a page miss occurs, learning the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page. The first memory has less space and faster read and write speeds than the second memory. Step (c) calculates the cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using the prefetch depth, which represents the hit rate of a cluster, and a queue in which information on clusters with page misses is stored, and decides whether to exchange the pages based on the calculated cluster closeness.

To achieve the above object, a method by which a prefetching apparatus prefetches pages from storage to main memory according to another embodiment of the present invention includes the steps of: (a) generating clusters by grouping pages accessed by an application program according to the data access patterns of the application program; (b) when a page requested by the application program is not present in the main memory, prefetching all pages of the cluster to which the requested page belongs from the storage to the main memory if the requested page and that cluster satisfy a prefetching condition; and (c) when one of the prefetched clusters is not hit by a page request after the prefetching and a page miss occurs, learning the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page. The main memory has less space and faster read and write speeds than the storage. Step (c) calculates the cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using the prefetch depth, which represents the hit rate of a cluster, and a queue in which information on clusters with page misses is stored, and decides whether to exchange the pages based on the calculated cluster closeness.

According to an embodiment of the present invention, the scale-out density and power consumption problems of DRAM can be addressed in environments where file access is complex and sometimes random, such as memory-intensive big data or cloud storage.

In addition, the TCO of the data center can be lowered.

In addition, the hit rate can be raised for data whose file access is complex and sometimes random.

The effects of the present invention are not limited to those described above and should be understood to include all effects that can be inferred from the configuration of the invention described in the detailed description or in the claims.

FIG. 1 is a diagram showing the configuration of a page prefetching system according to one embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of a page prefetching system according to another embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a page prefetching apparatus according to one embodiment of the present invention.
FIG. 4 is a diagram showing the process of exchanging pages between clusters according to one embodiment of the present invention.
FIG. 5 is a flowchart showing the process of prefetching a page according to one embodiment of the present invention.
FIG. 6 shows the result of testing for the optimal cluster size according to one embodiment of the present invention.
FIG. 7 is a graph showing the change in hit rate in DRAM according to one embodiment of the present invention.
FIG. 8 is a graph showing the relationship between page exchange and prefetch accuracy according to one embodiment of the present invention.
FIG. 9 is a graph showing the overall hit rate of DRAM according to one embodiment of the present invention.
FIG. 10 is a graph showing the measured overall execution time of the workloads according to one embodiment of the present invention.
FIG. 11 shows normalized energy consumption in a test according to one embodiment of the present invention.
FIG. 12 shows the number of pages evicted for each workload according to one embodiment of the present invention.
FIG. 13 shows test results for the overhead that may arise in the clustering management technique according to one embodiment of the present invention.

Hereinafter, the present invention will be described with reference to the accompanying drawings. The present invention may, however, be implemented in many different forms and is therefore not limited to the embodiments described herein.

In the drawings, parts unrelated to the description are omitted so that the present invention can be explained clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member in between.

Also, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a page prefetching system according to one embodiment of the present invention.

A page prefetching system according to one embodiment of the present invention may include a first memory 10, a second memory 20, and a page prefetching apparatus 100, and may be implemented as a single integrated array for a dual in-line memory module (DIMM) interface.

Here, the first memory 10 has lower write and read latency than the second memory 20 and may be used in a smaller quantity than the second memory 20.

In contrast, the second memory 20 may have high density and consumes relatively less power than the first memory 10.

DRAM may be used as an embodiment of the first memory 10 and NAND Flash as an embodiment of the second memory 20. Hereinafter, the first memory 10 is referred to as DRAM 10 and the second memory 20 as NAND Flash 20.

Turning to the components of the system, the page prefetching apparatus 100 may generate a "cluster" by grouping pages according to data access pattern.

Here, "access pattern" may mean the way pages are accessed depending on the execution characteristics of an application program (for example, whether it mainly performs computation or mainly references graphics such as images).

Accordingly, one cluster may contain pages that reflect the execution characteristics of an application program, that is, pages with the same access pattern.

In the present invention, one cluster may be implemented with a size of 256 KB. This means that one cluster may contain 64 pages of 4 KB each.

For reference, the cluster size is set to 256 KB in consideration of the block size imposed by the structure of the NAND Flash 20 and of experimental results. This will be described later with reference to FIG. 5.
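The cluster geometry stated above can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper `clusters_needed` and the constant names are illustrative assumptions, not part of the specification.

```python
PAGE_SIZE = 4 * 1024        # 4 KB page, as stated in the text
CLUSTER_SIZE = 256 * 1024   # 256 KB cluster, as stated in the text
PAGES_PER_CLUSTER = CLUSTER_SIZE // PAGE_SIZE


def clusters_needed(request_bytes: int) -> int:
    """Hypothetical helper: whole clusters needed to cover a request of the given size."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (request_bytes + CLUSTER_SIZE - 1) // CLUSTER_SIZE
```

For example, a request of 1 MB would span four clusters under these sizes.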

The page prefetching apparatus 100 may manage clusters so that each has the "correct access pattern" for a particular application program.

Here, "correct access pattern" may mean that at least a predetermined proportion (for example, 60 to 70%) of the 64 pages of a cluster prefetched from the NAND Flash 20 to the DRAM 10 are hit.
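The criterion above amounts to a simple threshold test on the fraction of prefetched pages that were hit. A minimal sketch follows; the function name is hypothetical, and the default threshold of 0.6 is just the lower end of the 60 to 70% range given as an example in the text.

```python
def is_well_clustered(hit_pages: int, total_pages: int = 64, threshold: float = 0.6) -> bool:
    """Return True if the fraction of prefetched pages that were hit meets the threshold.

    total_pages defaults to the 64 pages per cluster stated in the text;
    threshold is an assumed value taken from the 60-70% example range.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if not 0 <= hit_pages <= total_pages:
        raise ValueError("hit_pages must be between 0 and total_pages")
    return hit_pages / total_pages >= threshold
```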

If, when an application program requests a page, the requested page is not in the cluster corresponding to the access pattern (hereinafter the "first cluster") but in another cluster (hereinafter the "second cluster"), the page prefetching apparatus 100 may exchange the actually requested page with a page of the first cluster. By performing this page exchange continually, it can train each cluster to have the correct access pattern.

To this end, the present invention uses a "prefetch depth", which indicates whether a cluster is well clustered or not with the correct access pattern, that is, the degree of clustering.

Here, "well clustered" has the same meaning as having the correct access pattern mentioned above: the pages of the cluster reflect the exact access pattern of a particular application program.

The pages of a well-clustered cluster are hit many times after being prefetched from the NAND Flash 20 to the DRAM 10. If a score is awarded for each hit, a cluster that is well clustered with the correct access pattern will have a high prefetch depth value (score).

In short, the "prefetch depth" means the hit rate of the cluster.

The page prefetching apparatus 100 may use the prefetch depth as one parameter when it trains clusters to have the correct access pattern by exchanging pages between clusters.

In addition, when an application program requests a page and the requested page is not present in the DRAM 10, the page prefetching apparatus 100 looks for the requested page in the NAND Flash 20.

It then compares the prefetch depth of the requested page with the average prefetch depth of the cluster to which the requested page belongs and, if the prefetch depth of the requested page is larger, prefetches all pages of that cluster into the DRAM 10, thereby raising the hit rate.
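The comparison described above can be sketched as follows. This is only an illustration under assumptions: the text does not say how a per-page prefetch depth is stored, so the sketch assumes each page of the cluster carries its own depth value and that the cluster average is the mean of those values. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def should_prefetch_cluster(requested_page_depth: float, cluster_page_depths: Sequence[float]) -> bool:
    """Decide whether to prefetch the whole cluster on a DRAM miss.

    Assumption: cluster_page_depths holds one prefetch depth value per page
    of the cluster, and the cluster average is their arithmetic mean.
    The cluster is prefetched only if the requested page's depth is
    strictly larger than that average, as the text describes.
    """
    if not cluster_page_depths:
        raise ValueError("cluster must contain at least one page")
    return requested_page_depth > mean(cluster_page_depths)
```

The strict comparison follows the wording "larger"; whether equality should also trigger a prefetch is not stated in the text.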

In addition, when the space in the DRAM 10 available for storing pages to be prefetched falls below a predetermined reference value, the page prefetching apparatus 100 may evict to the NAND Flash 20 those already-prefetched pages whose eviction score, calculated from the prefetch depth and the page score, falls within a predetermined bottom proportion (for example, 20%).
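The eviction rule above can be sketched as selecting the lowest-scoring fraction of resident pages. The text does not give the formula for the eviction score, so this sketch takes precomputed scores as input; the function name, the dictionary layout, and the rounding-down of the page count are assumptions.

```python
from typing import Dict, List


def pages_to_evict(eviction_scores: Dict[int, float], percent: int = 20) -> List[int]:
    """Return the page addresses whose eviction scores are in the bottom `percent`.

    eviction_scores maps a page address to its (already computed) eviction score.
    The number of victims is rounded down, which is an assumption.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    victim_count = len(eviction_scores) * percent // 100
    # Sort page addresses by ascending score; the lowest scores are evicted first.
    ranked = sorted(eviction_scores, key=lambda page: eviction_scores[page])
    return ranked[:victim_count]
```

In a fuller implementation this would run only when free DRAM space drops below the reference value mentioned in the text.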

The page prefetching apparatus 100 will be described in detail later with reference to FIG. 3.

Meanwhile, because the DRAM 10 reads faster than the NAND Flash 20, it can provide fast access to requested pages; to support this, the NAND Flash 20 may store backup candidates.

Here, the "backup candidates" may consist of pages evicted from the DRAM 10 for lack of space and pages fetched from storage (not shown).

FIG. 2 is a diagram showing the configuration of a page prefetching system according to another embodiment of the present invention.

A page prefetching system according to another embodiment of the present invention may include a main memory 30, storage 40, and a page prefetching apparatus 200.

The page prefetching system shown in FIG. 2 is the system of FIG. 1 with the DRAM 10 and the NAND Flash 20 replaced by the main memory 30 and the storage 40; the page prefetching apparatus 200 operates in the same way as the page prefetching apparatus 100 of FIG. 1.

DRAM may be used as an embodiment of the main memory 30, and an SSD or the like as an embodiment of the storage 40.

Hereinafter, embodiments of page prefetching according to the present invention are described with a focus on the DRAM 10, the NAND Flash 20, and the page prefetching apparatus 100 shown in FIG. 1.

FIG. 3 is a block diagram showing the configuration of a page prefetching apparatus according to one embodiment of the present invention.

The page prefetching apparatus 100 according to one embodiment of the present invention may include a cluster generation unit 110, an access pattern learning unit 120, a prefetching unit 130, and a page eviction unit 140.

Turning to each component, the cluster generation unit 110 may generate clusters by grouping the pages accessed by an application program according to the data access patterns of the application program.

Here, one cluster may be implemented with a size of 256 KB, containing 64 pages of 4 KB each.

When creating a cluster, the cluster generation unit 110 may assign it a cluster number as a unique identifier, and each numbered cluster may be managed in a cluster table.

Here, the "cluster table" may include information such as the cluster number, the cluster success count, the cluster access time, the page addresses, and the prefetch depth.

The "success count" is incremented when a cluster is prefetched and the prefetched cluster is hit (that is, when a page belonging to the cluster is hit), and may be used when learning the access pattern of the cluster.

The "access time" may be recorded as the time at which the success count is incremented. The "prefetch depth", as described above, is a value indicating whether the cluster is well clustered or not with the correct access pattern, that is, the degree of clustering, and means the hit rate of the cluster.
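The cluster table described above can be pictured as a keyed collection of records with the listed fields. The sketch below is one possible in-memory layout, not the layout used by the apparatus; the class and method names are hypothetical, and the sequential numbering scheme is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

MAX_PAGES_PER_CLUSTER = 64  # 256 KB cluster / 4 KB page, from the text


@dataclass
class ClusterEntry:
    """One row of the cluster table, with the fields named in the text."""
    cluster_number: int                 # unique identifier
    success_count: int = 0              # incremented on a hit after prefetch
    access_time: float = 0.0            # time of the most recent counted hit
    page_addresses: List[int] = field(default_factory=list)
    prefetch_depth: float = 0.0         # degree of clustering (cluster hit rate)


class ClusterTable:
    """Hypothetical table that hands out sequential cluster numbers."""

    def __init__(self) -> None:
        self._entries: Dict[int, ClusterEntry] = {}
        self._next_number = 0

    def create_cluster(self, page_addresses: Iterable[int]) -> ClusterEntry:
        pages = list(page_addresses)
        if len(pages) > MAX_PAGES_PER_CLUSTER:
            raise ValueError("a cluster holds at most 64 pages")
        entry = ClusterEntry(cluster_number=self._next_number, page_addresses=pages)
        self._entries[entry.cluster_number] = entry
        self._next_number += 1
        return entry

    def lookup(self, cluster_number: int) -> ClusterEntry:
        return self._entries[cluster_number]
```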

Meanwhile, the access pattern learning unit 120 may train clusters to have the correct access pattern according to the execution characteristics of the application program.

To this end, the access pattern learning unit 120 may calculate the prefetch depth of a cluster.

Specifically, the access pattern learning unit 120 may calculate the prefetch depth of a cluster from the page score, which is computed under a least recently used (LRU) policy from the last access time and access count of the pages belonging to the cluster, together with the success count and last access time of the cluster. This relationship can be expressed as an equation in the terms defined below.

Here, the function pd(C) is the prefetch depth of a particular cluster C, i is the page index, δ_i(s) is a function that calculates the average page score of page i belonging to cluster C, and ρ is the total number of pages belonging to that cluster.

α is the success count of the cluster, and β is the last access time of the cluster.

In addition, to train clusters to have the correct access pattern, the access pattern learning unit 120 uses a queue in which information on clusters that incurred a page miss is stored (hereinafter the "miss cluster queue").

When a page prefetched into the DRAM 10 is not hit, that is, when a page miss occurs, the number of the corresponding cluster may be recorded in the miss cluster queue in first-in, first-out (FIFO) order.
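A FIFO miss cluster queue of this kind can be sketched with a bounded double-ended queue. The class name and the fixed capacity are assumptions for illustration; the text does not state how long the queue is.

```python
from collections import deque
from typing import List


class MissClusterQueue:
    """FIFO record of cluster numbers that incurred a page miss.

    The capacity is an assumed parameter; once the queue is full,
    the oldest entry is dropped when a new miss is recorded.
    """

    def __init__(self, capacity: int = 8) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._queue = deque(maxlen=capacity)

    def record_miss(self, cluster_number: int) -> None:
        # deque with maxlen discards from the opposite end automatically.
        self._queue.append(cluster_number)

    def snapshot(self) -> List[int]:
        """Return the queued cluster numbers, oldest first."""
        return list(self._queue)
```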

When a cluster prefetched from the NAND Flash 20 to the DRAM 10 is hit by the immediately following page request, the access pattern learning unit 120 may increment the success count of that cluster and record the time of the hit as its last access time.

When a prefetched cluster is hit, the access pattern learning unit 120 may update information such as the success count, last access time, and prefetch depth of that cluster in the cluster table. Note that the higher the prefetch depth, the more correct the access pattern of the cluster.
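The bookkeeping on a hit can be sketched as below. Only the success count and last access time are updated here; the prefetch depth is left out because its formula is not reproduced in this text. The names `ClusterStats` and `record_hit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Subset of the cluster table fields touched on a hit."""
    success_count: int = 0
    last_access_time: float = 0.0


def record_hit(stats: ClusterStats, now: float) -> None:
    """Update a prefetched cluster's statistics after one of its pages is hit.

    `now` is the time of the hit, in whatever clock the caller uses
    (the unit is an assumption; the text only says the time is recorded).
    """
    stats.success_count += 1
    stats.last_access_time = now
```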

또한, 액세스 패턴 학습부(120)는 NAND Flash(20)로부터 DRAM(10)으로 프리페칭된 클러스터가 바로 이후의 페이지 요청에 의해 히트되지 못하면, 페이지 미스가 발생된 클러스터의 페이지와 실제 요청된 페이지를 교환함으로써 해당 클러스터가 올바른 액세스 패턴을 가지도록 학습할 수 있다.In addition, if the cluster prefetched from the NAND Flash 20 to the DRAM 10 is not hit by a subsequent page request, the access pattern learning unit 120 determines that the page of the cluster in which the page miss occurred and the actual requested page So that the cluster can learn to have a correct access pattern.

여기서 액세스 패턴 학습부(120)는 프리페칭 됐지만 미스가 발생한 페이지와 실제 요청된 페이지를 교환 시, 미스 클러스터 큐에서 각 페이지가 속한 클러스터간의 거리를 고려하여, 클러스터간 거리가 미리 정해진 조건을 만족하는 경우 두 페이지를 교환할 수 있다.Here, the access pattern learning unit 120, considering the distance between the clusters to which each page belongs in the miss cluster queue, when the actual requested page is exchanged with the page where the missed page has been pre-fetched but the cluster distance satisfies a predetermined condition If you can exchange two pages.

The reason for considering the inter-cluster distance when exchanging pages is as follows.

As described above, one cluster is 256KB in size, but a workload or application generally requests more data than a single cluster holds, so in practice four to five clusters may be regarded as one access pattern.

That is, four to five clusters may be transferred at once when prefetching from the NAND Flash 20 to the DRAM 10.

Therefore, if the cluster that was prefetched but incurred the page miss is close to the cluster containing the actually requested page, the requested page will soon be served from a cluster following the one that missed, on the next page request, even if no pages are exchanged.

In short, when the inter-cluster distance is small enough for the clusters to be regarded as one access pattern, there is no need to exchange pages; pages may be exchanged when the clusters are too far apart to be regarded as one access pattern.

To this end, the access pattern learning unit 120 may use the prefetch depth and the miss cluster queue to calculate a cluster closeness, which represents the distance between clusters. This can be expressed by [Equation 2].

Here, acc() denotes the average cluster closeness, C_t is the total number of clusters, γ is a configurable weight for acc(), n is the total number of entries in the miss cluster queue, and ε is a variable representing the strength of a cluster.

The average cluster closeness acc() can be adjusted by changing either the median value of the cluster closeness or ε.

With the total size n of the miss cluster queue fixed, increasing γ decreases the cluster strength ε, which means fewer opportunities to exchange pages between clusters.

Put differently, the more opportunities there are to exchange pages between clusters, the more accurately the cluster comes to reflect the access pattern.

Note that ε may be set to n/2 by default, and any chosen value must satisfy the condition ε ≤ n.
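The default and the constraint on ε stated above can be captured in a small helper. This is only a sketch: the function name `cluster_strength` is hypothetical, and it covers nothing beyond the two stated rules (default n/2, and ε ≤ n).

```python
from typing import Optional


def cluster_strength(queue_entries: int, epsilon: Optional[float] = None) -> float:
    """Return the cluster strength ε for a miss cluster queue with n entries."""
    if queue_entries <= 0:
        raise ValueError("queue_entries (n) must be positive")
    if epsilon is None:
        # Default stated in the text: ε = n / 2.
        return queue_entries / 2
    if epsilon > queue_entries:
        # Constraint stated in the text: ε <= n.
        raise ValueError("epsilon must satisfy epsilon <= n")
    return float(epsilon)


default_strength = cluster_strength(100)  # n = 100 entries, as in the test setup
```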

The term cc(C₁, C₂) in [Equation 2] is described in detail as follows.

Here, C₁ is the cluster in which the page miss occurred, and C₂ is the cluster to which the actually requested page belongs.

For cluster C₁, the access pattern learning unit 120 calculates the prefetch depth pd(C₁) using [Equation 1].

The access pattern learning unit 120 may then calculate the distance d(C₁, C₂) from C₁ to C₂. Here, the distance is the distance within the miss cluster queue.

Likewise, the access pattern learning unit 120 may calculate pd(C₂) for C₂ and the distance d(C₂, C₁) from C₂ to C₁.

Note that the distance metric used for the cluster closeness is the Euclidean distance.

The access pattern learning unit 120 may calculate the cluster closeness between C₁ and C₂ by adding the product of the prefetch depth of C₁ and the distance from C₁ to C₂ to the product of the prefetch depth of C₂ and the distance from C₂ to C₁, and then dividing the sum by the average cluster closeness acc().
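Written out from the verbal description above, and using only the symbols already defined in the text, the closeness term takes the following form. This is a transcription of the prose, not a reproduction of the original equation image, and the definition of acc() itself is not restated here.

```latex
\[
  cc(C_1, C_2) \;=\;
  \frac{pd(C_1)\, d(C_1, C_2) \;+\; pd(C_2)\, d(C_2, C_1)}{acc()}
\]
```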

If the access pattern learning unit 120 cannot obtain the inter-cluster distance, the cluster closeness is 0 and no page exchange is performed.

In addition, the access pattern learning unit 120 may calculate a 'cluster learning profile', which is the threshold on inter-cluster closeness used to decide whether pages are exchanged.

If the inter-cluster closeness is smaller than the cluster learning profile, the access pattern learning unit 120 determines that the two clusters fall outside the range regarded as one access pattern and may exchange the pages.

If the inter-cluster closeness is greater than or equal to the cluster learning profile, the access pattern learning unit 120 determines that the two clusters lie within the range regarded as one access pattern and does not exchange the pages.
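The closeness calculation and the exchange decision described above can be sketched together. The function name `decide_exchange` and its parameters are hypothetical; the prefetch depths, the average closeness `acc`, and the cluster learning profile are taken as inputs because their formulas are not restated in this passage.

```python
from typing import Optional, Tuple


def decide_exchange(
    pd_c1: float,
    pd_c2: float,
    d_12: Optional[float],
    d_21: Optional[float],
    acc: float,
    learning_profile: float,
) -> Tuple[float, bool]:
    """Return (cluster closeness, whether to exchange pages).

    pd_c1, pd_c2: prefetch depths of the missed cluster C1 and of cluster C2.
    d_12, d_21:   distances within the miss cluster queue, or None if unknown.
    acc:          average cluster closeness (assumed to be supplied by the caller).
    """
    if acc <= 0:
        raise ValueError("acc must be positive")
    if d_12 is None or d_21 is None:
        # Distance unavailable: closeness is 0 and no exchange is performed.
        return 0.0, False
    closeness = (pd_c1 * d_12 + pd_c2 * d_21) / acc
    # Exchange only when the closeness is below the cluster learning profile.
    return closeness, closeness < learning_profile


closeness, exchange = decide_exchange(2.0, 3.0, 4.0, 4.0, acc=10.0, learning_profile=2.5)
```

The missing-distance case is handled before the comparison because a closeness of 0 would otherwise fall below any positive profile and trigger an exchange, contrary to the stated rule.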

Here, the 'cluster learning profile' may be calculated from the median distance and the average prefetch depth of the clusters whose prefetching failed.

The average prefetch depth of the clusters whose prefetching failed can carry useful information, such as new access patterns and repeated access patterns.

In addition, when exchanging pages between the two clusters (C₁, C₂), the access pattern learning unit 120 may select the page with the lowest page score in the cluster in which the page miss occurred (C₁) and exchange it with the actually requested page.

Here, the 'page score' may be calculated from the last access time and access count of the page.

Meanwhile, the prefetching unit 130 may determine, based on the prefetch depth of each cluster, whether or not the pages of the cluster will be requested consecutively in the future.

First, if the requested page does not exist in the DRAM 10, the page prefetching unit 130 looks up the requested page and the cluster to which it belongs in the NAND Flash 20.

Then the page prefetching unit 130 compares the prefetch depth of the requested page with the average prefetch depth of the cluster to which the requested page belongs. If the prefetch depth of the requested page is larger, it determines that the pages of that cluster will be requested consecutively in the future and may prefetch all pages of the cluster into the DRAM 10 (all prefetching policy).
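The prefetch decision above can be sketched as a single function. The name `pages_to_fetch` and its parameters are hypothetical. What happens when the comparison fails is not stated in this passage, so the sketch assumes that only the requested page is fetched in that case.

```python
from typing import List, Set


def pages_to_fetch(
    requested_page: str,
    dram_pages: Set[str],
    cluster_pages: List[str],
    page_prefetch_depth: float,
    cluster_avg_prefetch_depth: float,
) -> List[str]:
    """Decide which pages to bring from NAND Flash into DRAM for one request."""
    if requested_page in dram_pages:
        # Page already resident in DRAM: nothing to fetch.
        return []
    if page_prefetch_depth > cluster_avg_prefetch_depth:
        # "All prefetching policy": bring in every page of the cluster
        # that is not already in DRAM.
        return [p for p in cluster_pages if p not in dram_pages]
    # Assumption (not stated in the text): fetch only the requested page.
    return [requested_page]


cluster = ["p1", "p2", "p3"]
whole_cluster = pages_to_fetch("p2", {"p9"}, cluster, 3.0, 2.0)
single_page = pages_to_fetch("p2", {"p9"}, cluster, 1.0, 2.0)
```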

Because the prefetch depth expresses the access pattern, a cluster with a high prefetch depth can be judged likely to raise the hit rate in the DRAM 10.

Meanwhile, when the DRAM 10 lacks the space to hold prefetched pages, the page eviction unit 140 may evict pages from the DRAM 10 to the NAND Flash 20.

In this case, the pages to be evicted to the NAND Flash 20 may be those whose eviction score, calculated from the prefetch depth and the page score, falls within a predetermined bottom fraction (for example, the bottom 20%).
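Selecting the bottom fraction of pages by eviction score can be sketched as below. The function name `select_evictions` is hypothetical, and the eviction scores are taken as given because the text does not state how prefetch depth and page score are combined. Rounding the victim count down is also an assumption.

```python
from typing import Dict, List


def select_evictions(eviction_scores: Dict[str, float], bottom_percent: int = 20) -> List[str]:
    """Return the pages whose eviction score lies in the bottom `bottom_percent`."""
    if not 0 < bottom_percent <= 100:
        raise ValueError("bottom_percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point surprises; rounds down (assumption).
    count = len(eviction_scores) * bottom_percent // 100
    # Sort page identifiers by ascending eviction score (lowest evicted first).
    ranked = sorted(eviction_scores, key=eviction_scores.get)
    return ranked[:count]


scores = {f"p{i}": float(i) for i in range(10)}
victims = select_evictions(scores)  # bottom 20% of 10 pages
```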

Note that when the NAND Flash 20 lacks the space to hold pages evicted from the DRAM 10, the same rule as described above is applied, and pages may be evicted from the NAND Flash 20 to storage (not shown).

Such selective, page-granular eviction can reduce the average number of pages that are prefetched and also improve the re-hit rate of the remaining pages.

FIG. 4 is a diagram illustrating an inter-cluster page exchange process according to an embodiment of the present invention.

A page miss has occurred: the requested page (d₂) does not exist in cluster A, which was prefetched into the DRAM 10, and the requested page (d₂) instead exists in the prefetched cluster B.

Note that cluster A is the cluster corresponding to the access pattern of the requested page (d₂).

To exchange a page belonging to cluster A with the actually requested page (d₂) belonging to cluster B, the access pattern learning unit 120 calculates the closeness of cluster A and cluster B in the miss cluster queue, compares it with the cluster learning profile, and may exchange the pages depending on the result.

If the closeness of cluster A and cluster B is smaller than the cluster learning profile, the access pattern learning unit 120 exchanges the page (d₁) belonging to cluster A with the actually requested page (d₂) belonging to cluster B.

Note that the page d₁ selected as the exchange page in cluster A is the page with the lowest page score among the pages belonging to cluster A.
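The exchange illustrated in FIG. 4 can be sketched as follows. The function name `exchange_pages` and the list-based cluster representation are hypothetical, and the page scores are supplied as given values because their exact formula is not stated in this passage.

```python
from typing import Dict, List


def exchange_pages(
    miss_cluster: List[str],
    other_cluster: List[str],
    requested_page: str,
    page_scores: Dict[str, float],
) -> str:
    """Swap the lowest-scoring page of the missed cluster with the requested page.

    Both cluster lists are modified in place; the evicted (victim) page is returned.
    """
    # The victim is the page with the lowest page score in the missed cluster.
    victim = min(miss_cluster, key=lambda page: page_scores[page])
    i = miss_cluster.index(victim)
    j = other_cluster.index(requested_page)
    miss_cluster[i], other_cluster[j] = requested_page, victim
    return victim


cluster_a = ["d1", "a2", "a3"]          # cluster in which the page miss occurred
cluster_b = ["d2", "b2"]                # cluster holding the requested page d2
scores_a = {"d1": 0.1, "a2": 0.7, "a3": 0.5}
victim = exchange_pages(cluster_a, cluster_b, "d2", scores_a)
```

After the call, cluster A contains d2 in place of d1, so a later prefetch of cluster A brings in the page that the access pattern actually needed.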

FIG. 5 is a flowchart illustrating a process of prefetching pages according to an embodiment of the present invention.

The process of FIG. 5 may be performed by the page prefetching apparatus 100 shown in FIG. 1.

The page prefetching apparatus 100 generates clusters by grouping the pages accessed by an application according to the application's data access patterns (S501).

Here, one cluster may be implemented with a size of 256KB, containing 64 pages of 4KB each. When a cluster is created, a cluster number may be assigned as its unique identifier, and each numbered cluster may be managed in a cluster table.

The 'cluster table' may include information such as the cluster number, cluster success count, cluster access time, page addresses, and prefetch depth.
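One possible in-memory shape for a cluster table entry is sketched below. The class name `ClusterEntry`, its field names, and the dictionary used as the table are hypothetical; only the listed fields and the 64 x 4KB = 256KB sizing come from the text. Recomputing the prefetch depth is left out because [Equation 1] is not restated here.

```python
from dataclasses import dataclass, field

PAGE_SIZE_KB = 4          # page size stated in the text
PAGES_PER_CLUSTER = 64    # 64 pages x 4KB = 256KB per cluster


@dataclass
class ClusterEntry:
    """Hypothetical row of the cluster table."""

    cluster_number: int                       # unique identifier
    page_addresses: list = field(default_factory=list)
    success_count: int = 0                    # cluster success count
    access_time: float = 0.0                  # last cluster access time
    prefetch_depth: float = 0.0

    def add_page(self, address: int) -> None:
        """Add a page address, in access order, up to the cluster capacity."""
        if len(self.page_addresses) >= PAGES_PER_CLUSTER:
            raise ValueError("cluster is full")
        self.page_addresses.append(address)

    def record_hit(self, now: float) -> None:
        """Update the entry when the prefetched cluster is hit."""
        self.success_count += 1
        self.access_time = now


cluster_table = {}
entry = ClusterEntry(cluster_number=0)
for n in range(PAGES_PER_CLUSTER):
    entry.add_page(n * PAGE_SIZE_KB * 1024)
cluster_table[entry.cluster_number] = entry
entry.record_hit(now=12.5)
```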

Note that a cluster may initially be composed of pages in the order in which they were accessed.

After S501, when a page is requested and the requested page does not exist in the DRAM 10, the page prefetching apparatus 100 compares the prefetch depth of the requested page with the average prefetch depth of the cluster to which the page belongs. If the prefetch depth of the requested page is larger, it prefetches all pages of that cluster from the NAND Flash 20 into the DRAM 10 (S502).

After S502, if, on a page request following the prefetch, a page miss occurs in the prefetched cluster that corresponds to the access pattern of the requested page, the page prefetching apparatus 100 calculates the closeness between the cluster in which the page miss occurred and the other cluster to which the actually requested page belongs (S503).

Here, the inter-cluster closeness is based on the distance within the miss cluster queue.

After S503, the page prefetching apparatus 100 compares the inter-cluster closeness calculated in S503 with the cluster learning profile, and if the closeness is smaller than the cluster learning profile, exchanges a page of the cluster in which the page miss occurred with the actually requested page (S504).

If the inter-cluster closeness is greater than or equal to the cluster learning profile, the page of the cluster in which the miss occurred is not exchanged with the actually requested page.

Note that the 'cluster learning profile' may be calculated from the median distance and the average prefetch depth of the clusters whose prefetching failed.

Thereafter, by repeating S502 to S504, the page prefetching apparatus 100 can learn the access patterns so that each cluster comes to have a correct access pattern.

After S504, if the DRAM 10 lacks the space to prefetch pages, the page prefetching apparatus 100 calculates, for each page, an eviction score from the prefetch depth and the page score, and evicts the pages in the bottom 20% from the DRAM 10 to the NAND Flash 20 (S505).

Because pages are evicted based on their scores rather than evicting all pages of the same cluster together, the number of pages prefetched at prefetch time can be reduced and the re-hit rate of the remaining pages can be improved.

Hereinafter, the results of testing the performance of the page prefetching apparatus and method of the present invention are described with reference to FIGS. 6 to 13.

The DRAM 10, the NAND Flash 20, and the page prefetching apparatus 100 shown in FIG. 1 were tested with cloud service applications.

The basic parameters for the test are as follows.

In addition, to obtain memory access addresses and storage I/O events, a trace-extract module for QEMU (short for Quick Emulator) was separately developed and used.

Applications such as OpenStack Swift, Redis, and ZooKeeper were set up in QEMU, and Apache Storm was used for real-time stream data processing.

The trace-extract module can obtain the application's memory footprints, write/read requests, memory management scheme, and similar information collected from processor requests.

The test system according to an embodiment of the present invention may consist of four different caches, as listed below, and the hybrid main memory of the DRAM 10 and the NAND Flash 20 shown in FIG. 1.

Here, the hybrid main memory comprising the DRAM 10 and the NAND Flash 20 consists of a 128MB DRAM 10 that holds well-clustered data and a 256MB NAND Flash 20 that holds backup candidates.

The management size of each cluster is 256KB, using 64 pages of 4KB each, and the miss cluster queue was set to 100 entries.

In addition, to evaluate the page prefetching system of the present invention, different memory and data workloads were configured for five applications, as listed below.

Most cloud service applications perform more read operations than write operations, a pattern known as WORM (Write Once Read Many), and the tests of the present invention reflect this.

Among the applications, Redis, short for Remote Dictionary Server, is open-source software and the best-known key-value structure among in-memory data stores.

In the tests of the present invention, two different YCSB (Yahoo! Cloud Serving Benchmark) workloads were configured to evaluate Redis performance.

Redis1 exhibits a small random key-value access pattern, and Redis2 performs more write operations.

The tests of the present invention also use OpenStack Swift to evaluate cloud service performance.

OpenStack Swift is a widely known open-source object storage architecture, and Intel COSBench (Cloud Object Storage Benchmark) was used to evaluate the Swift system.

Two workloads, Swift1 and Swift2, were generated by COSBench: Swift1 is a small random file access pattern, and Swift2 is a large and sequential access pattern.

The tests also used ZooKeeper, which creates many virtual machines to coordinate the storage and compute nodes of a data center, running a hash tag parsing application for stream processing on Apache Storm.

FIG. 6 shows the results of testing for the optimal cluster size according to an embodiment of the present invention.

To determine the optimal cluster management size, three different cluster sizes (128KB, 256KB, and 512KB) were tested, as shown in FIG. 6.

With 4KB pages, a 128KB cluster can hold 32 pages, a 256KB cluster can hold 64 pages, and a 512KB cluster can hold 128 pages.

First, the hit rate of the page prefetching apparatus 100 in the DRAM 10 was evaluated.

FIG. 6(a) shows the result: the 128KB cluster shows an average hit rate of 84% in the DRAM 10, while the 256KB cluster achieved a hit rate of 93%, which is more than about 8% higher than the 128KB cluster.

By contrast, although a 512KB cluster holds far more pages than a 256KB cluster, its hit rate was only about 0.4% higher than that of the 256KB cluster.

Consequently, it can be seen that a higher page hit rate is achieved in the DRAM 10 when the 256KB cluster according to the embodiment of the present invention is used.

FIG. 6(b) shows the result of measuring the average prefetch size for each cluster size.

The average prefetch size may increase as the cluster size increases; for most workloads, the average prefetch size tends to increase linearly with the cluster size.

Referring to FIG. 6(b), across the five workloads the 128KB cluster has an average prefetch size of 42KB, the 256KB cluster has an average prefetch size of 86KB, and the 512KB cluster has an average prefetch size of 190KB.

The increase in average prefetch size is larger from 256KB to 512KB than from 128KB to 256KB.
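The comparison of the two gaps follows directly from the average prefetch sizes reported for FIG. 6(b). The worked arithmetic below uses only those three figures; the symbol Δ is introduced here for illustration.

```latex
\begin{align*}
  \Delta_{128 \to 256} &= 86\,\mathrm{KB} - 42\,\mathrm{KB} = 44\,\mathrm{KB},\\
  \Delta_{256 \to 512} &= 190\,\mathrm{KB} - 86\,\mathrm{KB} = 104\,\mathrm{KB}.
\end{align*}
```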

FIG. 6(c) shows the measured number of actual hits on prefetched pages for each cluster size; the raw counts were normalized so that the five workloads could be compared.

As the cluster size increased, the number of hits on prefetched pages also increased in most cases.

In the tests of the present invention, the three cluster sizes were evaluated in different experimental environments, and the results confirmed a performance improvement between 128KB and 256KB.

In fact, in most cases the hit rate in the DRAM 10 largely saturates at a cluster size of 256KB.

Note that the saturation of the hit rate is an important factor in selecting the cluster management size, and the average prefetch size is likewise an important factor in selecting the cluster size.

With a 512KB cluster, the size of the data prefetched from the NAND Flash 20 to the DRAM 10 is 190KB, and transfers of 190KB may create another bottleneck in data transfer.

Therefore, taking these factors together, the optimal cluster management size is preferably set to 256KB.

FIG. 7 is a graph showing the change in hit rate in the DRAM according to an embodiment of the present invention.

In FIG. 7, to observe the access pattern learning effect of the page prefetching apparatus 100 of the present invention, its clustering management technique was compared with a conventional memory management technique and a sequential prefetching technique. The test environment is the same as before.

To observe the learning effect, an experimental environment was considered in which a sudden random access pattern is introduced in Redis.

In FIG. 7, the y-axis is the hit rate of the DRAM 10, and the x-axis is the number of requested pages, in units of one million.

Note that the hit rate is reset at the point where random access begins.

Referring to the graph of FIG. 7, the hit rate continues to decrease until the number of requests reaches 3 million. After 3 million requests, however, the clustering management technique of the present invention can learn the application's irregular access pattern by exchanging pages between clusters.

As a result, at 4 million requests the clustering management technique of the present invention shows a hit rate loss of only about 0.4%, whereas the conventional memory management technique shows a decrease of about 2.8% and the conventional sequential prefetching technique shows a hit rate loss of about 2%.

After 4 million requests, the hit rates of the conventional memory management technique and the sequential prefetching technique continue to decline until the number of requests reaches 7 million, whereas with the cluster management technique of the present invention the hit rate rises because the clusters acquire correct access patterns through learning.

At 7 million requests, the hit rate gap between the clustering management technique of the present invention and the conventional memory management technique is at its largest, a difference of almost 20%.

FIG. 8 is a graph showing the relationship between page exchange and prefetch accuracy according to an embodiment of the present invention.

The graph of FIG. 8 shows the number of pages exchanged by the page prefetching apparatus 100 and the prefetch accuracy under the sudden random access workload.

In FIG. 8, the left y-axis represents the accuracy of prefetched pages, the right y-axis represents the number of pages, and the x-axis represents the number of accessed pages, in units of one million requests.

When the cluster management technique of the present invention first encounters the sudden irregular pattern, the prefetch accuracy drops sharply from about 38% to 30%, but it then rises steadily.

In particular, the main gain in prefetch accuracy is observed between 4 million and 5 million requests, where the hit rate of the system of the present invention increases substantially, by roughly 2%.

After this large increase, the prefetch accuracy remains at around 50%.

In FIG. 8, the number of pages exchanged between clusters rises sharply when the prefetch accuracy drops, because the clusters are learning, through page exchange, to have correct access patterns.

After these page exchanges, the prefetch accuracy increases and the number of exchanged pages levels off.

Through the cluster management technique of the present invention, each cluster can acquire an accurate access pattern. In particular, when a sudden irregular access pattern occurs, the cluster management technique of the present invention can recover the hit rate more strongly than the conventional memory management technique and the sequential prefetching technique.

FIG. 9 is a graph showing the overall hit rate of the DRAM according to an embodiment of the present invention.

As described above, the page prefetching system of the present invention may consist of a 128MB DRAM 10 and a 256MB NAND Flash 20.

To evaluate this model, tests were performed against conventional DRAM main memory models of 128MB, 256MB, 512MB, and 1024MB; all of the comparison models and the model of the present invention have a 128 SSD as base storage.

Performance was evaluated using the workloads mentioned above (Redis1, Redis2, OpenStack Swift1, OpenStack Swift2, and ZooKeeper).

First, in FIG. 9, the hit rates of the 128MB DRAM 10 of the present invention and of the DRAM main memories of the conventional comparison models (128MB, 256MB, 512MB, and 1024MB) were evaluated.

Even though the present invention has only 128MB of DRAM space, the hit rates of workloads such as Redis1, Redis2, and ZooKeeper are satisfactorily improved by the present invention.

The DRAM of the present invention achieves a hit rate about 16% or more higher than a conventional main memory system with the same DRAM size that uses sequential prefetching.

Even when compared with the 1024 MB DRAM main memory model, the model of the present invention achieves a hit rate approximately 2% or more higher.

For the Swift2 workload, which involves a clearly defined sequential file access pattern, the 1024 MB DRAM main memory model with sequential prefetching has a hit rate approximately 1.3% higher than the proposed model.

This is because the DRAM capacity of the proposed model is limited to 128 MB, which leads to a somewhat smaller hit-rate improvement for small-file random access workloads such as Swift1 and Swift2.

The model of the present invention performs slightly worse on file-intensive workloads such as Swift1 and Swift2 than on memory-intensive workloads such as Redis1, Redis2, and Zookeeper.

Nevertheless, the model of the present invention provides a more satisfactory hit rate than a DRAM of the same size.

Despite a total space of only 384 MB (128 MB of DRAM plus 256 MB of NAND Flash), the model of the present invention achieves a considerable hit-rate improvement on memory-intensive workloads.

On average, the model of the present invention improves the DRAM hit rate by approximately 3.3% and 2.1% over 1024 MB DRAM main memories that use the conventional management technique and the sequential prefetching technique, respectively.

Compared with DRAM of the same size, the hit-rate improvements are approximately 15% and 13%, respectively.

FIG. 10 is a graph showing the measured overall execution time of the workloads according to an embodiment of the present invention.

To measure the execution time of the workloads, the memory parameters described above were used in the test of FIG. 10.

Across the workloads as a whole, the model of the present invention shortens the total execution time.

This is due to the high hit rate in the DRAM proposed in the present invention, and the execution time of the application programs is reduced considerably on average.

In particular, the execution time of the Zookeeper workload is reduced by approximately 2.7 through the clustering data management technique when compared with the 128 MB DRAM of the conventional model.

Similarly, in line with the hit-rate results, the execution times of Swift1 and Swift2 on the 1024 MB DRAM main memories using the conventional technique and the prefetching technique increased by approximately 6% and 3%, respectively.

However, the model of the present invention, which applies the clustering data management technique to 384 MB of space, achieved an execution time approximately 29% faster than the 1024 MB DRAM of the conventional main memory.

In particular, the model of the present invention shows execution times approximately 75% and 42% faster than comparison models with DRAM of the same size that use the conventional management technique and the prefetching technique, respectively.

When energy consumption is measured using the memory parameters mentioned above, the following equation can be used.

Here, E_r and E_w denote the energy cost of read and write operations, respectively; E_dr and E_dw denote the energy cost of DRAM reads and writes, respectively; and E_fr, E_fw, and E_fsr denote the energy cost of NAND Flash reads, writes, and sequential reads, respectively.

In addition, N_rp and N_wp denote the number of reads and writes to pages accessed in the DRAM, and N_tr and N_sr denote the total number of reads and sequential reads, respectively, to pages accessed in blocks of the NAND Flash.

FIG. 11 shows the normalized energy consumption in a test according to an embodiment of the present invention.

Across all workloads, the model of the present invention saves approximately 48% of the energy consumed by a DRAM main memory system of the same size.

FIG. 12 shows the number of pages evicted for each workload according to an embodiment of the present invention.

FIG. 12 shows a notable improvement for the Zookeeper workload.

In the conventional main memory system, the Zookeeper workload evicts more pages than in the model of the present invention.

Because the model of the present invention reuses pages through the clustering management technique, the number of pages evicted by the Zookeeper workload drops sharply.

For the other workloads, the model of the present invention evicts almost 8 times fewer pages than the DRAM of a conventional main memory system of the same size.

Compared with conventional models built on 1024 MB of DRAM with sequential prefetching and with the conventional memory management technique, the model of the present invention evicts 4.4 and 7.1 times fewer pages from main memory, respectively.

The model of the present invention can thus improve the TCO (Total Cost of Ownership) of a data center, and the TCO can be calculated using the following equation.

Here, C_d and C_f denote the sizes of the DRAM and the NAND Flash, and P_d and P_f denote the price per unit capacity of DRAM and NAND Flash, respectively.

Based on [Equation 4], the model of the present invention reduces cost by a factor of about 7.14 compared with a conventional system having 1024 MB of DRAM.
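The cost comparison can be illustrated with a short sketch. The equation itself is not reproduced in this text, so the sketch assumes the simplest form consistent with the four symbols defined above, namely TCO = C_d * P_d + C_f * P_f. The function name `tco` and the unit prices are hypothetical; in particular, the NAND Flash price of 0.06 times the DRAM price is an assumed ratio chosen only for illustration and is not a figure stated in this document.

```python
def tco(c_d_mb: float, c_f_mb: float, p_d: float, p_f: float) -> float:
    """Assumed cost model: capacity times unit price, summed over
    DRAM (c_d_mb, p_d) and NAND Flash (c_f_mb, p_f)."""
    return c_d_mb * p_d + c_f_mb * p_f


# Hypothetical unit prices per MB (arbitrary currency units).
P_D = 1.0   # DRAM
P_F = 0.06  # NAND Flash, assumed to cost 6% of DRAM per MB

baseline = tco(1024, 0, P_D, P_F)   # conventional system: 1024 MB DRAM only
hybrid = tco(128, 256, P_D, P_F)    # proposed model: 128 MB DRAM + 256 MB NAND Flash
ratio = baseline / hybrid

print(f"cost ratio: {ratio:.2f}")   # about 7.14 under the assumed prices
```

Under this assumed price ratio the hybrid configuration costs roughly one seventh of the DRAM-only baseline. A different price ratio would give a different factor, so the sketch shows the shape of the calculation rather than the source of the reported figure.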

FIG. 13 shows test results for the overhead that may arise in the clustering management technique according to an embodiment of the present invention.

To perform prefetching and access pattern learning on the pages of the clusters, the model of the present invention incurs, as overhead, more page read operations than the conventional system.

As shown in FIG. 13, the model of the present invention is compared with a conventional 1024 MB DRAM main memory. The pages accessed in the model of the present invention are normalized for comparison with those of the conventional system.

On average, the model of the present invention performs at least three times as many read operations as the conventional main memory.

In particular, the model of the present invention handles a file access pattern with small random reads and writes in the Swift1 workload, where almost 4 times and 1.4 times as many accessed pages are observed, respectively.

Although this overhead is considerable for typical workloads, the model of the present invention achieves a higher hit rate, a faster execution time, and lower energy consumption than a conventional 1024 MB DRAM main memory system.

Accordingly, performance improvement and cost reduction can be achieved together with a lower TCO, and this advantage sufficiently offsets the overhead of the larger number of accessed pages.

The foregoing description of the present invention is given for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention.

The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims that follow, and all changes or modifications derived from the meaning and scope of the claims and from their equivalents should be construed as falling within the scope of the present invention.

10: first memory (DRAM)
20: second memory (NAND Flash)
30: main memory (DRAM)
40: storage
100, 200: page prefetching apparatus
110: cluster generation unit
120: access pattern learning unit
130: prefetching unit
140: page eviction unit

Claims

An apparatus for prefetching a page from a second memory to a first memory, the apparatus comprising:
a cluster generation unit that generates clusters by grouping pages accessed by an application program according to the data access patterns of the application program;
a prefetching unit that, when a page requested by the application program is not present in the first memory and the requested page and the cluster to which the requested page belongs satisfy a prefetching condition, prefetches all pages of the cluster to which the requested page belongs from the second memory to the first memory; and
an access pattern learning unit that, when a page request made after the prefetching fails to hit in the prefetched clusters and a page miss occurs, learns the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page,
wherein the first memory has a smaller capacity and faster read and write speeds than the second memory, and
wherein the access pattern learning unit calculates a cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using a prefetch depth, which represents the hit rate of a cluster, and a queue storing information on clusters in which page misses have occurred, and determines whether to exchange pages based on the calculated cluster closeness.

The apparatus of claim 1, further comprising:
a page eviction unit that, when the space available in the first memory for storing the pages to be prefetched is less than a predetermined reference value, evicts to the second memory those pages of the prefetched clusters whose eviction scores fall within a predetermined bottom percentage,
wherein the eviction score is calculated using the prefetch depth and a page score calculated from the last access time and the access count of a page.

An apparatus for prefetching a page from a storage to a main memory, the apparatus comprising:
a cluster generation unit that generates clusters by grouping pages accessed by an application program according to the data access patterns of the application program;
a prefetching unit that, when a page requested by the application program is not present in the main memory and the requested page and the cluster to which the requested page belongs satisfy a prefetching condition, prefetches all pages of the cluster to which the requested page belongs from the storage to the main memory; and
an access pattern learning unit that, when a page request made after the prefetching fails to hit in the prefetched clusters and a page miss occurs, learns the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page,
wherein the main memory has a smaller capacity and faster read and write speeds than the storage, and
wherein the access pattern learning unit calculates a cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using a prefetch depth, which represents the hit rate of a cluster, and a queue storing information on clusters in which page misses have occurred, and determines whether to exchange pages based on the calculated cluster closeness.

The apparatus of claim 3, further comprising:
a page eviction unit that, when the space available in the main memory for storing the pages to be prefetched is less than a predetermined reference value, evicts to the storage those pages of the prefetched clusters whose eviction scores fall within a predetermined bottom percentage,
wherein the eviction score is calculated using the prefetch depth and a page score calculated from the last access time and the access count of a page.

The apparatus of claim 1 or claim 3,
wherein the prefetching condition is that the prefetch depth of the requested page is greater than the average prefetch depth of the cluster to which the requested page belongs.

The apparatus of claim 1 or claim 3,
wherein the prefetch depth is calculated using a page score, which is calculated based on a Least Recently Used (LRU) policy from the last access time and the access count of the pages belonging to a cluster, together with the success count and the last access time of the cluster.

The apparatus of claim 6,
wherein the prefetch depth is calculated using the following equation.
[Equation]

Here, the function pd(C) is the prefetch depth, which indicates the degree of learning of the data access pattern based on the LRU policy; C is a specific cluster; i is a page index; δ_i(s) is a function that calculates the average score of page i belonging to the specific cluster C; ρ is the total number of pages belonging to the cluster; α is the success count of the cluster; and β is the most recent access time of the cluster.

The apparatus of claim 1 or claim 3,
wherein the cluster closeness is calculated using the prefetch depths of the cluster in which the page miss occurred (hereinafter the "first cluster") and the cluster to which the actually requested page belongs (hereinafter the "second cluster"), and the mutual distance between the first cluster and the second cluster within the queue, and
wherein the access pattern learning unit performs the page exchange when the calculated cluster closeness is smaller than the value of a learning profile calculated from the median distance and the average prefetch depth of the clusters for which prefetching failed.

The apparatus of claim 8,
wherein the cluster closeness cc(C1, C2) is calculated using the following equation.
[Equation]

Here, acc() is the average cluster closeness; C_t is the total number of clusters; γ is a configurable weight for acc(); ε is a variable representing the strength of a cluster; pd() is the prefetch depth; and n is the total number of entries in the miss cluster queue.

The apparatus of claim 1 or claim 3,
wherein the access pattern learning unit selects the page with the lowest page score among the pages belonging to the cluster in which the page miss occurred and exchanges it with the actually requested page, and
wherein the page score is calculated using the last access time and the access count of the page.

A method by which a prefetching apparatus prefetches a page from a second memory to a first memory, the method comprising:
(a) generating clusters by grouping pages accessed by an application program according to the data access patterns of the application program;
(b) when a page requested by the application program is not present in the first memory and the requested page and the cluster to which the requested page belongs satisfy a prefetching condition, prefetching all pages of the cluster to which the requested page belongs from the second memory to the first memory; and
(c) when a page request made after the prefetching fails to hit in the prefetched clusters and a page miss occurs, learning the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page,
wherein the first memory has a smaller capacity and faster read and write speeds than the second memory, and
wherein step (c) comprises calculating a cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using a prefetch depth, which represents the hit rate of a cluster, and a queue storing information on clusters in which page misses have occurred, and determining whether to exchange pages based on the calculated cluster closeness.

The method of claim 11, further comprising:
evicting to the second memory, when the space available in the first memory for storing the pages to be prefetched is less than a predetermined reference value, those pages of the prefetched clusters whose eviction scores fall within a predetermined bottom percentage,
wherein the eviction score is calculated using the prefetch depth and a page score calculated from the last access time and the access count of a page.

A method by which a prefetching apparatus prefetches a page from a storage to a main memory, the method comprising:
(a) generating clusters by grouping pages accessed by an application program according to the data access patterns of the application program;
(b) when a page requested by the application program is not present in the main memory and the requested page and the cluster to which the requested page belongs satisfy a prefetching condition, prefetching all pages of the cluster to which the requested page belongs from the storage to the main memory; and
(c) when a page request made after the prefetching fails to hit in the prefetched clusters and a page miss occurs, learning the data access pattern of the application program by exchanging a page of the cluster in which the page miss occurred with the actually requested page,
wherein the main memory has a smaller capacity and faster read and write speeds than the storage, and
wherein step (c) comprises calculating a cluster closeness between the cluster in which the page miss occurred and the cluster to which the actually requested page belongs, using a prefetch depth, which represents the hit rate of a cluster, and a queue storing information on clusters in which page misses have occurred, and determining whether to exchange pages based on the calculated cluster closeness.

The method of claim 13, further comprising:
evicting to the storage, when the space available in the main memory for storing the pages to be prefetched is less than a predetermined reference value, those pages of the prefetched clusters whose eviction scores fall within a predetermined bottom percentage,
wherein the eviction score is calculated using the prefetch depth and a page score calculated from the last access time and the access count of a page.

The method of claim 11 or claim 13,
wherein the prefetching condition is that the prefetch depth of the requested page is greater than the average prefetch depth of the cluster to which the requested page belongs.

The method of claim 11 or claim 13,
wherein the prefetch depth is calculated using a page score, which is calculated based on a Least Recently Used (LRU) policy from the last access time and the access count of the pages belonging to a cluster, together with the success count and the last access time of the cluster.

The method of claim 11 or claim 13,
wherein the cluster closeness is calculated using the prefetch depths of the cluster in which the page miss occurred (hereinafter the "first cluster") and the cluster to which the actually requested page belongs (hereinafter the "second cluster"), and the mutual distance between the first cluster and the second cluster within the queue, and
wherein step (c) comprises performing the page exchange when the calculated cluster closeness is smaller than the value of a learning profile calculated from the median distance and the average prefetch depth of the clusters for which prefetching failed.

The method of claim 11 or claim 13,
wherein step (c) comprises selecting the page with the lowest page score among the pages belonging to the cluster in which the page miss occurred and exchanging it with the actually requested page, and
wherein the page score is calculated using the last access time and the access count of the page.

A computer program stored in a recording medium, comprising a series of instructions for performing the method of claim 11 or claim 13.