KR20190109638A

KR20190109638A - Method for scheduling task in big data analysis platform based on distributed file system, program and computer readable storage medium therefor

Info

Publication number: KR20190109638A
Application number: KR1020180025780A
Authority: KR
Inventors: 최영리; 황은지; 김현구
Original assignee: 울산과학기술원
Priority date: 2018-03-05
Filing date: 2018-03-05
Publication date: 2019-09-26
Also published as: KR102045997B1

Abstract

The present invention relates to a task scheduling method in a big data processing platform based on a distributed file system. The task scheduling method, for a big data processing platform based on a distributed file system including a name node and a data node, comprises the following steps: receiving, from the name node, information on data blocks stored in an in-memory cache of the data node when a MapReduce job is executed; determining, on the basis of that information, whether a data block used for the MapReduce job exists in the in-memory cache; and scheduling a task of the MapReduce job using the cached data block when the data block used for the MapReduce job exists in the in-memory cache.

Description

Task scheduling method for a big data processing platform based on a distributed file system, and computer program and computer-readable recording medium therefor

The present invention relates to a task scheduling technique for a big data processing platform based on a distributed file system.

In YARN of Hadoop 2.0, one of the big data processing platforms based on a distributed file system, a MapReduce job consists of at least one task. The YARN scheduler schedules these tasks on compute nodes where a node manager runs. When a task is scheduled on a compute node, a container is created on that node, and the task executes in that container. If the compute node on which a task executes is also the data node that stores the data block the task must process, the task is called a data-local task.

Recently, an in-memory cache feature has been added to the Hadoop Distributed File System (HDFS). Besides the data-local case described above, a task may also be executed by reading a data block from the HDFS in-memory cache; such a task is called a cache-local task. A method is therefore needed that makes the most of the data blocks held in the in-memory cache and improves the utility of the HDFS in-memory cache over existing approaches.

Korean Registered Patent No. 10-1661475, registered 2016.09.26

An embodiment of the present invention proposes a task scheduling technique that jointly considers the virtualization environment and cache locality in a big data processing platform based on a distributed file system, in particular a platform that includes an in-memory-cache-based Hadoop Distributed File System (HDFS).

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

According to an embodiment of the present invention, there may be provided a task scheduling method in a big data processing platform based on a distributed file system including a name node and a data node, the method comprising: receiving, from the name node, information about data blocks stored in an in-memory cache of the data node when a MapReduce job is executed; determining, based on the information about the data blocks, whether a data block used for the MapReduce job exists in the in-memory cache; and, if the data block used for the MapReduce job exists in the in-memory cache, scheduling a task of the MapReduce job by using the cached data block.

Here, the scheduling step may schedule the task of the MapReduce job by using a data block cached in a virtual machine within a plurality of physical machines on which the task of the MapReduce job is executed.

In addition, when a plurality of virtual machines co-reside on each of the plurality of physical machines, the scheduling step may schedule the task of the MapReduce job by using a data block cached in a virtual machine on a physical machine other than the one hosting the virtual machine on which the task is executed.

In addition, the method may further include scheduling the task of the MapReduce job by using a data block stored on the disk of the virtual machine when the data block of the MapReduce job does not exist in the in-memory cache or when the waiting time of the scheduling step exceeds a preset time.

In addition, the scheduling step may schedule the task of the MapReduce job by using data blocks cached in virtual machines on different physical machines in the big data processing platform based on the distributed file system.

According to an embodiment of the present invention, tasks of a MapReduce job are scheduled with both the virtualization environment and cache locality taken into account in a big data processing platform based on a distributed file system, in particular a platform that includes an in-memory-cache-based Hadoop Distributed File System (HDFS), so that the MapReduce job can use cached input data efficiently.

FIG. 1 is a system block diagram of a big data processing platform based on a distributed file system for task scheduling according to an embodiment of the present invention, in particular an in-memory-cache-based Hadoop Distributed File System and the big data processing platform.
FIG. 2 is a diagram illustrating task scheduling that takes the virtualization environment into account, as performed in the big data processing platform including the in-memory-cache-based Hadoop Distributed File System of FIG. 1.
FIG. 3 is a flowchart illustrating a task scheduling process in a big data processing platform including a distributed file system, for example an in-memory-cache-based Hadoop Distributed File System, according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, which is defined only by the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted unless they are actually necessary. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

An embodiment of the present invention proposes a technique in which tasks of a MapReduce job are scheduled with both the virtualization environment and cache locality taken into account in a big data processing platform based on a distributed file system, in particular a platform that includes an in-memory-cache-based Hadoop Distributed File System (hereinafter, HDFS), so that MapReduce jobs on HDFS can use cached input data efficiently.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a system block diagram of a big data processing platform including a distributed file system for task scheduling according to an embodiment of the present invention, for example a big data processing platform including an in-memory-cache-based HDFS 10.

Referring to FIG. 1, the in-memory-cache-based HDFS 10 may include a name node 300 and a data node 400. When an application is executed in Hadoop, one of the big data processing platforms that include the in-memory-cache-based HDFS 10, its tasks are executed in the containers 500 that are created. The big data processing platform also includes a scheduling apparatus 100 for scheduling tasks to the containers of compute nodes. The compute nodes, the data nodes 400, and the in-memory cache 420 can run on a cluster composed of a plurality of servers (computers), and such a server may be referred to below as a node. Depending on the case, a node may perform the functions of both a compute node and a data node 400, or only one of the two.

There may be a plurality of data nodes 400. Each stores data blocks, which result from dividing a file into block units, on a local disk 410, which is a kind of storage.

The in-memory cache 420 may be formed in a portion of the memory, for example random access memory (RAM), of the plurality of servers constituting the in-memory-cache-based HDFS 10.

The data node 400 periodically reports to the name node 300 which data blocks are stored in its in-memory cache 420. For example, the data node 400 may send a heartbeat to the name node 300 every 3 seconds.

The name node 300 stores and manages basic information about files and data blocks, as well as metadata indicating which of the plurality of data nodes 400 stores each data block. The name node 300 also manages information about which of the data blocks related to a MapReduce application are stored on the local disk 410 and in the in-memory cache 420, in particular information stored in HDFS. The information about the data blocks stored in the in-memory cache 420 may include, but is not limited to, the ratio of the number of data blocks stored in the in-memory cache 420 to the total number of data blocks. In addition, for each MapReduce job, the name node 300 periodically sends the scheduling apparatus 100 the ratio of input data stored in the in-memory cache 420. The cache manager 200 and the virtual machine topology 210 included in the name node 300 are described later.
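The per-job ratio described above can be illustrated with a minimal Python sketch. The class `CacheDirectory`, its method names, and the block and node identifiers are hypothetical and are not part of HDFS or of the patent; the sketch only assumes that each periodic report from a data node lists the blocks currently held in its in-memory cache.

```python
from collections import defaultdict


class CacheDirectory:
    """Hypothetical name-node-side view of which blocks each data node has cached."""

    def __init__(self):
        # data node id -> set of block ids reported as cached on that node
        self.cached_by_node = defaultdict(set)

    def on_cache_report(self, data_node, cached_block_ids):
        # Assumption: each periodic report replaces the previous view for that node.
        self.cached_by_node[data_node] = set(cached_block_ids)

    def cached_ratio(self, job_block_ids):
        """Fraction of a job's input blocks that are cached on at least one node."""
        job_blocks = set(job_block_ids)
        if not job_blocks:
            return 0.0
        cached_anywhere = set().union(*self.cached_by_node.values())
        return len(job_blocks & cached_anywhere) / len(job_blocks)


directory = CacheDirectory()
directory.on_cache_report("dn1", ["b1", "b2"])
directory.on_cache_report("dn2", ["b2", "b3"])
ratio = directory.cached_ratio(["b1", "b2", "b3", "b4"])  # 3 of 4 blocks cached
```

Counting a block once even when several data nodes cache it keeps the ratio between 0 and 1, which matters if the ratio is later used as a scaling factor.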

The container 500 is the component that executes a task. When the container 500 executes a task, it sends the name node 300 a request to cache, in the in-memory cache 420, the data block that the task must process. This request is sent to the name node 300 even when the data block is already stored in the in-memory cache 420.

The scheduling apparatus 100 receives from the name node 300 the ratio of the number of data blocks stored in the in-memory cache 420 to the total number of data blocks related to the MapReduce job.

Next, the scheduling apparatus 100 schedules the task based on whether a first waiting time, which is the maximum time to wait to be scheduled as a cache-local task, and a second waiting time, which is the maximum time to wait to be scheduled as a data-local task, have elapsed. The first waiting time may be shorter than the second waiting time.
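One way to read the two waiting times is as a relaxation rule in the style of delay scheduling. The sketch below is an interpretation under stated assumptions, not the patent's implementation: the function name `acceptable_placements`, the placement labels, and the choice to measure both waiting times from the moment the task started waiting are all hypothetical.

```python
def acceptable_placements(waited_s, first_wait_s, second_wait_s):
    """Return the placement kinds a waiting task may accept right now.

    Assumption: both limits are measured from the moment the task began waiting,
    and a limit counts as elapsed only once the waited time exceeds it.
    """
    if waited_s <= first_wait_s:
        # Still within the first waiting time: hold out for a cache-local slot.
        return {"cache-local"}
    if waited_s <= second_wait_s:
        # First limit elapsed: a data-local slot is now acceptable as well.
        return {"cache-local", "data-local"}
    # Both limits elapsed: run wherever a container is available.
    return {"cache-local", "data-local", "any"}


# Example with a 2 s first waiting time and a 6 s second waiting time.
early = acceptable_placements(1.0, 2.0, 6.0)
later = acceptable_placements(4.0, 2.0, 6.0)
```

Because the first waiting time is the shorter of the two, a task gives up on cache locality before it gives up on data locality.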

According to an embodiment of the present invention, the scheduling apparatus 100 may schedule tasks in consideration of the virtualization environment and cache locality when multiple MapReduce jobs are executed concurrently. Specifically, the scheduling apparatus 100 may take the virtual machine topology into account and schedule tasks so that they can use data blocks cached or stored in virtual machines on the same physical machine.

The process by which the scheduling apparatus 100 schedules tasks is described in more detail with reference to FIGS. 2 and 3. The scheduling apparatus 100 can be implemented by a memory that stores instructions programmed to perform this process and a microprocessor that executes those instructions.

When the in-memory cache 420 has no room to store an additional data block, the cache manager 200 replaces a data block already stored in the in-memory cache 420 with the data block to be added, based on the MapReduce application's affinity for the in-memory cache 420.

In addition, when adding a data block to the in-memory cache 420, the cache manager 200 takes into account the probability that the data block will be stored in the in-memory cache 420. This probability is determined by the cache affinity of the MapReduce application, the size of the in-memory cache 420, and the size of the application's input data.

The virtual machine topology 210 may be included in the name node 300. It consists of the plurality of virtual machines of the virtualization environment, and data blocks may be cached or stored in these virtual machines.

FIG. 2 is a diagram illustrating how the scheduling apparatus 100 schedules tasks in consideration of the virtualization environment in the HDFS 10 of FIG. 1.

As shown in FIG. 2, the virtualization environment includes a plurality of virtual machines, for example VM1, VM2, VM3, VM4, VM5, and VM6, and some of these virtual machines may be hosted on each of the physical machines (PM1, PM2, PM3). Some physical machines, for example PM1 and PM2, may be housed in a rack, for example Rack1.

A MapReduce job has the shortest execution time when its input data is cached in the memory of the virtual machine on which the task runs, for example VM1. This state is referred to as VM-ML (Virtual Machine Memory Locality).

However, when a task cannot be scheduled with VM-ML and the input data is cached in the memory of another VM running on the same PM, the cached input data can be used through inter-VM communication alone, without disk I/O. With little overhead, this gives performance similar to using input data cached in the memory of the VM on which the task runs. This state is referred to as PM-ML (Physical Machine Memory Locality).

Likewise, input data cached in the memory of a VM running in the same rack can be used through inter-PM communication alone, without disk I/O, so execution time is shorter than when the data is read directly from disk. This state is referred to as RM-ML (Rack Machine Memory Locality).

Accordingly, a task is first scheduled with VM-ML, which gives the shortest task execution time. If the time spent waiting to schedule with VM-ML exceeds a set time, scheduling with PM-ML is attempted. If the time spent waiting to schedule with PM-ML exceeds a set time, scheduling with RM-ML is attempted.

If no input data is cached in memory, or if the time spent waiting to schedule with RM-ML exceeds a set time, task scheduling is attempted so as to use input data stored on the disk of the VM on which the task runs. This state is referred to as VM-DL (Virtual Machine Disk Locality).

As with PM-ML, using data read from the disk of another VM on the same PM incurs less overhead than using data stored on the disk of a VM on a different PM in the same rack. Therefore, if the time spent waiting to schedule with VM-DL exceeds a set time, task scheduling is attempted so as to use input data stored on the disk of another VM on the same PM. This state is referred to as PM-DL (Physical Machine Disk Locality).

Finally, if the task still cannot be scheduled, it is scheduled so as to use input data stored on the disk of a VM on another PM in the same rack. This state is referred to as RM-DL (Rack Machine Disk Locality).
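The six locality states defined above can be derived from a VM-to-PM-to-rack topology. The following Python sketch is illustrative only: the topology tables are assumptions (the text confirms only that VM1 and VM2 are on PM1, VM3 is on PM2, and PM1 and PM2 are in Rack1), and the names `Locality`, `scope`, and `best_locality` are hypothetical.

```python
from enum import IntEnum


class Locality(IntEnum):
    # Lower value = preferred, following the order in which the states are tried.
    VM_ML = 0
    PM_ML = 1
    RM_ML = 2
    VM_DL = 3
    PM_DL = 4
    RM_DL = 5


# Assumed topology; the placement of VM4 to VM6 and the rack of PM3 are guesses.
VM_TO_PM = {"VM1": "PM1", "VM2": "PM1", "VM3": "PM2",
            "VM4": "PM2", "VM5": "PM3", "VM6": "PM3"}
PM_TO_RACK = {"PM1": "Rack1", "PM2": "Rack1", "PM3": "Rack2"}


def scope(task_vm, holder_vm):
    """0 = same VM, 1 = same PM, 2 = same rack, None = outside the rack."""
    if task_vm == holder_vm:
        return 0
    if VM_TO_PM[task_vm] == VM_TO_PM[holder_vm]:
        return 1
    if PM_TO_RACK[VM_TO_PM[task_vm]] == PM_TO_RACK[VM_TO_PM[holder_vm]]:
        return 2
    return None


def best_locality(task_vm, cached_on, stored_on):
    """Best locality for running a task on task_vm, or None if nothing is in-rack.

    cached_on: VMs holding the block in memory; stored_on: VMs holding it on disk.
    """
    candidates = []
    for holders, base in ((cached_on, Locality.VM_ML), (stored_on, Locality.VM_DL)):
        for vm in holders:
            s = scope(task_vm, vm)
            if s is not None:
                candidates.append(Locality(base + s))
    return min(candidates, default=None)


level = best_locality("VM2", cached_on={"VM1"}, stored_on={"VM3"})  # Locality.PM_ML
```

Encoding the states as an ordered enum makes "prefer any memory locality over any disk locality" a plain `min`, which matches the order given in the text.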

In summary, scheduling in a virtualization environment according to an embodiment of the present invention may proceed in the following order.

VM-ML -> PM-ML -> RM-ML -> VM-DL -> PM-DL -> RM-DL
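The fallback order can be sketched as a ladder that widens each time a waiting limit is exceeded. This is a minimal illustration under assumptions: the patent only speaks of "a set time" per level, so the single shared `timeout_s`, the function name `allowed_levels`, and the choice to keep earlier (better) levels acceptable after moving down the ladder are all hypothetical.

```python
LADDER = ["VM-ML", "PM-ML", "RM-ML", "VM-DL", "PM-DL", "RM-DL"]


def allowed_levels(waited_s, timeout_s, has_cached_input=True):
    """Locality levels a waiting task may currently be scheduled with.

    Assumptions: every level shares one timeout, and a job with no cached
    input starts directly at VM-DL, skipping the memory-locality levels.
    """
    start = 0 if has_cached_input else LADDER.index("VM-DL")
    end = start
    # Move one level down the ladder for each timeout that has been exceeded.
    while end < len(LADDER) - 1 and waited_s > timeout_s * (end - start + 1):
        end += 1
    return LADDER[start:end + 1]


# After waiting 6 s with a 5 s per-level timeout, PM-ML becomes acceptable.
levels = allowed_levels(6.0, 5.0)
```

Keeping the better levels in the returned list means a task that has waited long enough to accept PM-ML will still take a VM-ML slot if one happens to free up.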

FIG. 3 is a flowchart illustrating a task scheduling process in a big data processing platform including a distributed file system, for example an in-memory-cache-based HDFS, according to an embodiment of the present invention.

Referring to FIG. 3, when a MapReduce job is executed, the scheduling apparatus 100 first receives from the name node 300 information about the data blocks stored in the in-memory cache 420 of a node (S100, S110). This information may be, for example, the ratio of the number of data blocks stored in the in-memory cache 420 to the total number of data blocks related to the MapReduce job.

Next, based on this information, the scheduling apparatus 100 checks whether a data block used for the MapReduce job exists in the in-memory cache 420. Depending on the result, it decides whether to use a data block cached in the virtual machine on which the task of the MapReduce job is executed or a data block stored on the virtual machine's disk, and then schedules the task of the MapReduce job.

At this time, the scheduling apparatus 100 calculates the first waiting time, which is the limit on how long a task waits to be scheduled as a cache-local task. The first waiting time may be calculated as a predetermined maximum waiting time C multiplied by the ratio received from the name node 300.
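The calculation above is a single multiplication, shown here as a small Python sketch. The function name `first_waiting_time`, the unit (seconds), and the range check on the ratio are assumptions added for illustration.

```python
def first_waiting_time(max_wait_c, cached_ratio):
    """First waiting time = C * (cached blocks / total blocks of the job)."""
    if not 0.0 <= cached_ratio <= 1.0:
        raise ValueError("cached_ratio must be between 0 and 1")
    return max_wait_c * cached_ratio


# With C = 10 s and a quarter of the job's blocks cached, wait at most 2.5 s.
wait_s = first_waiting_time(10.0, 0.25)
```

Under this formula a job with none of its input cached gets a first waiting time of zero, so it never waits for a cache-local slot, while a fully cached job may wait the full C.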

If the determination in step S120 shows that a data block used for the MapReduce job exists in the in-memory cache, the scheduling apparatus 100 may schedule the task of the MapReduce job by using the data block cached in the virtual machine on which the task is executed (S130).

When the data block used for the MapReduce job exists in the in-memory cache, the task may be scheduled in the order VM-ML -> PM-ML -> RM-ML, as described above.

The scheduling step (S130) may schedule the task of the MapReduce job by using a data block cached in a virtual machine within the plurality of physical machines on which the task of the MapReduce job is executed. For example, as shown in FIG. 2, when a plurality of virtual machines, for example VM1 and VM2, co-reside on one of the physical machines, for example PM1, the task may be scheduled by using a data block cached in a virtual machine (for example, VM3) on a physical machine (for example, PM2) other than the physical machine (PM1) hosting the virtual machine (VM1) on which the task is executed.

If the data block of the MapReduce job does not exist in the in-memory cache 420, or if the waiting time of the scheduling step (S130) exceeds a preset time (S140), the scheduling apparatus 100 may schedule the task of the MapReduce job by using a data block stored on the disk of a virtual machine (S150).

When the data block used for the MapReduce job does not exist in the in-memory cache, the task may be scheduled in the order VM-DL -> PM-DL -> RM-DL, as described above.

As described above, according to an embodiment of the present invention, tasks of a MapReduce job are scheduled with both the virtualization environment and cache locality taken into account in a big data processing platform based on a distributed file system, in particular a platform that includes an in-memory-cache-based HDFS, so that the MapReduce job uses input data cached in HDFS efficiently.

Meanwhile, combinations of the blocks of the accompanying block diagram and the steps of the flowchart may be performed by computer program instructions. Because these computer program instructions can be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by that processor create means for performing the functions described in each block of the block diagram.

These computer program instructions can also be stored in a computer-usable or computer-readable recording medium (or memory) that can direct a computer or other programmable data processing equipment to implement functions in a particular manner, so the instructions stored in that medium can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagram.

The computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on that equipment to produce a computer-executed process. The instructions that run on the computer or other programmable data processing equipment can thereby provide steps for executing the functions described in each block of the block diagram.

In addition, each block may represent a module, a segment, or a portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments the functions mentioned in the blocks may occur out of order. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in reverse order, depending on the functions involved.

100: scheduling device
200: cache management unit
210: virtual machine topology
300: name node
400: data node
410: local disk
420: in-memory cache
500: container

Claims

A task scheduling method in a big data processing platform based on a distributed file system including a name node and a data node, the method comprising:
receiving, from the name node, information about data blocks stored in an in-memory cache of the data node when a map reduce job is executed;
determining, based on the information about the data blocks, whether a data block used for the map reduce job exists in the in-memory cache; and
scheduling a task of the map reduce job by using the cached data block when the data block used for the map reduce job exists in the in-memory cache.
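
The three steps recited above (receive the cache information, check whether the needed block is cached, schedule on the cached copy) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Task` record, the `cached_blocks` mapping (a stand-in for the information received from the name node), and the tie-break of picking the alphabetically first node are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical map task: an id plus the data block it must read."""
    task_id: str
    block_id: str


def schedule_with_cache(tasks, cached_blocks):
    """Place each task on a node whose in-memory cache holds its block.

    cached_blocks maps block_id -> set of data node names caching that block
    (assumed shape of the report obtained from the name node).
    Returns (assignments, unplaced): a task_id -> node mapping, and the ids
    of tasks whose block is not cached anywhere.
    """
    assignments = {}
    unplaced = []
    for task in tasks:
        nodes = cached_blocks.get(task.block_id, set())
        if nodes:
            # Block is cached: schedule the task on a caching node.
            # Picking the alphabetically first node is an arbitrary tie-break.
            assignments[task.task_id] = min(nodes)
        else:
            # Block is not cached: left for another scheduling path.
            unplaced.append(task.task_id)
    return assignments, unplaced
```

Called with tasks for blocks `b1` and `b2` and a report that only `b1` is cached, the function places the first task on a caching node and returns the second as unplaced.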

The method of claim 1,
wherein the scheduling comprises
scheduling the task of the map reduce job by using a data block cached in a virtual machine within a plurality of physical machines on which the task of the map reduce job is executed.

The method of claim 2,
wherein the scheduling comprises,
when a plurality of virtual machines are co-resident on each physical machine of the plurality of physical machines, scheduling the task of the map reduce job by using a data block cached in a virtual machine on a physical machine different from the physical machine on which the virtual machine executing the task of the map reduce job resides.
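
The distinction drawn above between a cached copy on a co-resident virtual machine and a cached copy on a different physical machine can be sketched as a lookup against a virtual machine topology. This is only an illustration: the `vm_to_pm` mapping, the function names, and the locality labels are hypothetical and do not come from the specification.

```python
def cache_locality(task_vm, cache_vm, vm_to_pm):
    """Classify where a cached block sits relative to the VM running a task.

    vm_to_pm maps a virtual machine name to the physical machine hosting it
    (an assumed encoding of the virtual machine topology).
    """
    if task_vm == cache_vm:
        return "same-vm"
    if vm_to_pm[task_vm] == vm_to_pm[cache_vm]:
        return "same-pm"      # co-resident VM on the same physical machine
    return "remote-pm"        # VM on a different physical machine


def remote_pm_cache_vms(task_vm, caching_vms, vm_to_pm):
    """List the VMs caching a block that are hosted on another physical machine."""
    return sorted(
        vm for vm in caching_vms
        if cache_locality(task_vm, vm, vm_to_pm) == "remote-pm"
    )
```

With `vm1` and `vm2` on one physical machine and `vm3` on another, a task on `vm1` sees a block cached on `vm3` as a copy on a different physical machine, which is the case this claim addresses.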

The method of claim 1,
further comprising scheduling the task of the map reduce job by using a data block stored on the disk of the virtual machine when the data block of the map reduce job does not exist in the in-memory cache or when the waiting time of the scheduling step exceeds a preset time.
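
The fallback condition above (block not cached, or the wait exceeding a preset time) can be sketched as a single decision function. The sketch rests on assumptions: `free_nodes` (the nodes that can currently accept a task), the `"wait"` outcome, and all parameter names are hypothetical and are not taken from the claims.

```python
def choose_placement(block_id, cached_on, on_disk, free_nodes, waited_s, max_wait_s):
    """Decide whether a task reads its block from a cache or from disk.

    cached_on / on_disk map block_id -> set of node names holding the block
    in memory or on local disk. free_nodes is the set of nodes that can
    accept a task right now (an assumption of this sketch).
    Returns ("cache", node), ("disk", node), or ("wait", None).
    """
    cache_nodes = cached_on.get(block_id, set())
    if cache_nodes and waited_s <= max_wait_s:
        # The block is cached and the wait budget is not exhausted:
        # use a caching node if one is free, otherwise keep waiting.
        ready = cache_nodes & free_nodes
        return ("cache", min(ready)) if ready else ("wait", None)
    # Block not cached, or the preset waiting time was exceeded:
    # fall back to a copy stored on disk.
    disk_ready = on_disk.get(block_id, set()) & free_nodes
    return ("disk", min(disk_ready)) if disk_ready else ("wait", None)
```

The `max_wait_s` threshold trades cache hits against scheduling delay: a larger value holds tasks longer in the hope that a caching node frees up, while a smaller value falls back to disk sooner.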

The method of claim 1,
wherein the scheduling comprises
scheduling the task of the map reduce job by using data blocks cached in virtual machines on different physical machines of the big data processing platform.

A computer-readable recording medium having recorded thereon a program including instructions for performing a task scheduling method in a big data processing platform based on a distributed file system including a name node and a data node, the method comprising:
receiving, from the name node, information about data blocks stored in an in-memory cache of the data node when a map reduce job is executed;
determining, based on the information about the data blocks, whether a data block of the map reduce job exists in the in-memory cache; and
scheduling a task of the map reduce job by using the cached data block when the data block of the map reduce job exists in the in-memory cache.

A computer program stored in a computer-readable recording medium, the program performing a task scheduling method in a big data processing platform based on a distributed file system including a name node and a data node, the method comprising:
receiving, from the name node, information about data blocks stored in an in-memory cache of the data node when a map reduce job is executed;
determining, based on the information about the data blocks, whether a data block of the map reduce job exists in the in-memory cache; and
scheduling a task of the map reduce job by using the cached data block when the data block of the map reduce job exists in the in-memory cache.