KR20160050930A

KR20160050930A - Apparatus for Processing Transaction with Modification of Data in Large-Scale Distributed File System and Computer-Readable Recording Medium with Program

Info

Publication number: KR20160050930A
Application number: KR1020140150131A
Authority: KR
Inventors: 최용진; 이재영; 박근태; 이정룡; 최승운
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2014-10-31
Filing date: 2014-10-31
Publication date: 2016-05-11
Also published as: KR102253841B1

Abstract

Disclosed are an apparatus for processing transactions that include data modification in a large-scale distributed file system, and a computer-readable recording medium. According to one aspect of an embodiment of the present invention, a big data system is provided that processes transactions including data modification while using the big data accumulated in the Hadoop Distributed File System (HDFS) as it is. The computer-readable recording medium stores a program that causes the large-scale distributed file system to perform the steps of: parsing a query received from a client; acquiring a lock on the data related to a transaction when the parsed query concerns a transaction that includes data modification; transmitting information about the transaction to the data node that stores the data related to the transaction; receiving the information about the transaction and storing, in a local database, a change log for the chunk data to be modified; giving notice of the commit after the change log is stored, and approving the commit; and releasing the lock on the data related to the transaction after the commit is received.

Description

Apparatus for Processing Transaction with Modification of Data in Large-Scale Distributed File System and Computer-Readable Recording Medium with Program

This embodiment relates to an apparatus for processing transactions that include data modification in a large-scale distributed file system, and to a computer-readable recording medium.

The content described in this section merely provides background information on this embodiment and does not constitute prior art.

It is noted that the content described below likewise provides only background information related to this embodiment and does not constitute prior art.

As the use of personal computers (PCs), mobile devices, and the Internet has become commonplace, the amount of data that IT service providers must process is growing exponentially. User-created content (UCC) and social network service (SNS) data differ from earlier data not only in growth rate but also in form and quality. Such diverse and vast data can therefore serve as an important factor in determining the future competitiveness of companies and countries. Attempts to analyze large-scale data and extract meaningful information were made in the past as well, but the current big data environment is beyond comparison with the past in terms of data volume and variety.

Hadoop, a recently introduced big data processing system, has been developed on the basis of Google's GFS (Google File System) to process various kinds of large-scale unstructured data, such as HTML and text, in an Internet environment. Hadoop includes HDFS (Hadoop Distributed File System) and an engine that processes, on HDFS, queries similar to the SQL (Structured Query Language) used in relational databases.

However, big data processing systems such as Hadoop have the drawback that data, once loaded, is difficult to modify, for example by updating or deleting it. Using a separate database to process transactions that include data modification removes this drawback, but is constrained in both capacity and cost. Some recently proposed big data processing systems store data in a key/value form that is easy to modify, and thereby support transactions on data already stored in the file system. However, migrating an existing big data processing system to such a system incurs considerable cost.

The main object of this embodiment is to provide a big data system that can process transactions including data modification while using the big data accumulated in Hadoop's HDFS as it is.

According to one aspect of this embodiment, there is provided a computer-readable recording medium having recorded thereon a program that causes a large-scale distributed file system to perform: parsing a query received from a client; acquiring a lock on data related to a transaction when the parsed query concerns a transaction that includes data modification; transmitting the transaction information to the data node in which the data related to the transaction is stored; receiving the transaction information and storing, in a local database, a change log for the chunk data to be modified; giving notice of the commit after the change log is stored, and approving the commit; and releasing the lock on the data related to the transaction after the commit is received.

According to another aspect of this embodiment, there is provided a query processing apparatus comprising: a name node that receives a query from a client, parses the received query, and transmits the parsed query information; a data node that stores data and that receives and forwards the parsed query information; a metadata database that stores metadata for each item of data stored in the data node and that, when the parsed query information concerns a transaction that includes data modification, transmits to the name node a lock on the data related to the transaction; and a local database that, when the query information forwarded from the data node concerns a transaction that includes data modification, stores a change log for the chunk data to be modified and, when the query information forwarded from the data node concerns a data lookup, checks whether a change log for the chunk data exists.

As described above, according to this embodiment, a big data processing system can process transactions that include data modification while satisfying the four ACID properties of a transaction (atomicity, consistency, isolation, and durability), without a separate database and without migrating the entire big data processing system.

FIG. 1 is a block diagram illustrating the configuration of a large-scale distributed file system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of processing a transaction in a large-scale distributed file system according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of checking whether data has been modified in a large-scale distributed file system according to another embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of generating a new file that reflects the modifications made by transactions in a large-scale distributed file system according to another embodiment of the present invention.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as 'unit' and 'module' used in the specification denote a unit that processes at least one function or operation, and may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 is a block diagram illustrating the configuration of a large-scale distributed file system according to an embodiment of the present invention.

The large-scale distributed file system according to an embodiment of the present invention stores the large volume of data collected for big data processing by dividing it across multiple servers. Referring to FIG. 1, the large-scale distributed file system 100 according to an embodiment of the present invention includes a name node (master) 110, a metadata database (hereinafter abbreviated as 'DB') 120, data nodes 130, and log DBs 140, 142, 144.

The name node (master) 110 is the node responsible for distributing the data to be stored to the data nodes 130. It does not itself hold the data; it receives the data to be stored from a client (not shown) and distributes it to the individual data nodes 130. In addition, when it receives a transaction query from a client, it parses the received query and transmits information about the query to the data node.

The metadata DB 120 stores the meta-information of the actual data distributed by the name node (master), that is, the meta-information of the data stored in each data node. In addition, when the name node (master) receives a transaction query from a client, the metadata DB can grant a lock that it already stores, in order to prevent other clients or devices from accessing data whose transaction has not yet completed. Although FIG. 1 shows the metadata DB 120 outside the name node (master) 110, it is not limited to this arrangement and may reside inside the name node (master) 110.
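The lock-granting role of the metadata DB can be pictured with a minimal sketch. The specification does not define a lock API, so the class name MetadataDB, the methods acquire and release, the use of a file path as the lock key, and the transaction identifiers are all assumptions made for illustration.

```python
import threading


class MetadataDB:
    """Hypothetical in-memory stand-in for the metadata DB 120."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the lock table itself
        self._holders = {}              # data path -> id of the transaction holding the lock

    def acquire(self, path, txn_id):
        """Grant the lock on `path` to `txn_id`; return False if another transaction holds it."""
        with self._mutex:
            holder = self._holders.get(path)
            if holder is not None and holder != txn_id:
                return False
            self._holders[path] = txn_id
            return True

    def release(self, path, txn_id):
        """Release the lock only if `txn_id` is the current holder."""
        with self._mutex:
            if self._holders.get(path) == txn_id:
                del self._holders[path]
                return True
            return False
```

In this sketch a second transaction that asks for the same path is simply refused; whether a real system would queue, retry, or reject such a request is not stated in the specification.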

The data node 130 is the space in which the data to be loaded is actually stored, and corresponds to a network-connected server or storage device. That is, the data node 130 stores the data distributed by the name node (master). There may be a plurality of data nodes, such as data node 1 132, data node 2 134, and data node N 136. In addition, the data node forwards the information about a received query to the log DB 140, 142, 144 and, when the log DB has finished recording, notifies the name node (master) of the commit.

On receiving information about a query from the data node, the log DB 140, 142, 144 records a change log for the corresponding chunk data. In particular, when the query concerns a transaction that includes data modification, the log DB records a change log for the chunk data to be modified. In recording the change log, the log DB may assign an identifier (ID) to each chunk of data that needs modification. Although FIG. 1 shows the log DB outside the data node, it is not limited to this arrangement and may reside inside the data node.
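As a rough illustration of a log DB that assigns an identifier to each chunk needing modification, the sketch below keeps one list of changes per identifier. The specification does not say how chunks are addressed or how identifiers are formed, so the (path, chunk_index) key, the counter-based identifiers, and the names LogDB, record, chunk_id, and changes are assumptions.

```python
import itertools


class LogDB:
    """Hypothetical in-memory stand-in for a log DB 140, 142, 144."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._ids = {}   # (path, chunk_index) -> identifier assigned to that chunk
        self._log = {}   # identifier -> list of (op, payload), oldest first

    def record(self, path, chunk_index, op, payload=b""):
        """Append one change for a chunk and return the chunk's identifier."""
        key = (path, chunk_index)
        if key not in self._ids:
            # First modification of this chunk: assign it a new identifier.
            self._ids[key] = next(self._next_id)
        chunk_id = self._ids[key]
        self._log.setdefault(chunk_id, []).append((op, payload))
        return chunk_id

    def chunk_id(self, path, chunk_index):
        """Return the identifier of a chunk, or None if it has no change log."""
        return self._ids.get((path, chunk_index))

    def changes(self, chunk_id):
        """Return a copy of the logged changes for one chunk identifier."""
        return list(self._log.get(chunk_id, []))
```

Keying the log by identifier means that checking whether a chunk has pending changes is a single dictionary lookup, which matches the later remark that identifiers make the existence check easy.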

The large-scale distributed file system according to an embodiment of the present invention may be implemented with HDFS.

FIG. 2 is a flowchart illustrating a method of processing a transaction in a large-scale distributed file system according to an embodiment of the present invention.

A transaction query, such as a data modification, is received from a client (S210). Specifically, the name node (master) receives from the client a transaction query that includes a data modification or the like.

The received query is parsed, and a lock is acquired if the query concerns a transaction (S220). The name node (master) parses the received query and, if it concerns a transaction that includes data modification or the like, acquires a lock from the metadata DB. The lock is acquired so that, for a transaction requiring data modification, other clients or devices cannot access the data and perform separate processing or work on it before the transaction completes. Because the lock information is already stored in the metadata DB, the name node (master) parses the received query and receives the lock information before transmitting the transaction information to the data node.

Information about the transaction is transmitted to the data node related to the transaction (S230). When the name node (master) has acquired a lock from the metadata DB because the query concerns a transaction, it transmits the information about the transaction to the data node that holds the data the transaction applies to. The name node (master) sends the transaction information to the data node because the actual data is stored there.

A change log for the chunk data to be changed by the transaction is stored in the log DB (S240). On receiving the information about the transaction, the data node generates a change log for the chunk data to be changed according to that information and stores it in the log DB. Because it is difficult to modify the data stored in the data node every time transaction information is received, a change log is generated and stored instead.

The name node (master) is notified of the commit (S250). Once the change log for the chunk data to be changed by the transaction has been generated and stored in the log DB, the log DB sends commit information (information indicating that the change log has been stored in the log DB) to the data node, and the data node forwards it to the name node (master).

When commit information is received from the data node related to the transaction, the commit is approved and the lock is released (S260). When the name node (master) is notified of the commit information by the data node, the change log for the chunk data has already been stored in the log DB, so the name node approves the commit reported by the data node, and the metadata DB releases the lock.
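The sequence S210 to S260 can be sketched end to end as follows. This is a simplified, single-process illustration under several assumptions that are not in the specification: the parser is reduced to a keyword check, the lock table is a plain set inside the name node, the log DB is a list inside the data node, and the names is_modifying, DataNode.apply, and NameNode.handle are hypothetical.

```python
def is_modifying(query):
    """Rough stand-in for parsing (S220): treat UPDATE/DELETE as modifying queries."""
    words = query.split()
    return bool(words) and words[0].upper() in {"UPDATE", "DELETE"}


class DataNode:
    def __init__(self):
        self.change_log = []  # stands in for the local log DB

    def apply(self, query, target):
        # S240: record the change instead of rewriting the stored data.
        self.change_log.append((target, query))
        # S250: report that the log entry has been stored (always true in this sketch).
        return True


class NameNode:
    def __init__(self, data_node):
        self.locks = set()  # stands in for the locks held in the metadata DB
        self.data_node = data_node

    def handle(self, query, target):
        # S210/S220: receive and parse; only modifying queries take a lock.
        if not is_modifying(query):
            raise ValueError("lookup queries follow a different path")
        if target in self.locks:
            return "rejected: locked"
        self.locks.add(target)
        try:
            # S230 to S250: forward to the data node, which logs the change and reports back.
            committed = self.data_node.apply(query, target)
            # S260: approve the commit only once the change log has been stored.
            return "committed" if committed else "aborted"
        finally:
            # S260: release the lock whether or not the commit was approved.
            self.locks.discard(target)
```

The point of the sketch is the ordering: the lock is taken before anything is sent to the data node, the commit is approved only after the change log is stored, and the lock is released last.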

FIG. 3 is a flowchart illustrating a method of checking whether data has been modified in a large-scale distributed file system according to another embodiment of the present invention.

A query for a data lookup is received from a client (S310). The name node (master) receives the lookup query from the client. A lookup query is not a transaction that includes data modification; it is a query for checking the data stored in the data node and whether that stored data has been modified.

The received query is parsed, and information about the query is transmitted to the data node (S320). The name node (master) parses the received query and transmits the parsed query information to the data node. In this case the name node (master) does not acquire a lock from the metadata DB, because the query is a lookup rather than a transaction that modifies the data stored in the data node.

It is determined whether a change log exists for the data covered by the query (S330). When the data node receives the information about the query, it checks whether a change log exists in the log DB. Because the log DB can store the chunk data that needs modification with an identifier assigned to each chunk, the existence of a change log can be checked easily.

If a change log exists for the data covered by the query, a snapshot of the chunk data for that data is generated and transmitted (S340). In this case, the log DB generates a snapshot of the chunk data for the data covered by the query and transmits it to the name node (master) via the data node.

If no change log exists for the data covered by the query, the data stored in the data node is transmitted (S350). Because no change log exists for the data covered by the query, no transaction has requested a modification to the data stored in the data node. The data stored in the data node is therefore transmitted to the name node (master).
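The branch between S340 and S350 can be sketched as a single lookup function. The specification does not describe how a snapshot is built, so replaying the logged changes over the stored chunk is an assumption, as are the function name read_chunk, the dictionary layout, and the two operation names "update" and "delete".

```python
def read_chunk(stored, change_log, chunk_id):
    """Return the current view of one chunk.

    stored:     dict chunk_id -> bytes, as held by the data node
    change_log: dict chunk_id -> list of ("update", bytes) or ("delete", None)
    """
    changes = change_log.get(chunk_id)
    if not changes:
        # S350: no change log exists, so the stored data is returned as it is.
        return stored.get(chunk_id)
    # S340: build a snapshot by replaying the logged changes over the stored chunk.
    snapshot = stored.get(chunk_id)
    for op, payload in changes:
        if op == "update":
            snapshot = payload
        elif op == "delete":
            snapshot = None
        else:
            raise ValueError(f"unknown operation: {op}")
    return snapshot
```

Note that the stored chunk is never altered here; the lookup only computes a view, which is consistent with lookups needing no lock.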

FIG. 4 is a flowchart illustrating a method of generating a new file that reflects the modifications made by transactions in a large-scale distributed file system according to another embodiment of the present invention.

A transaction query, such as a data modification, is received from a client (S410).

The received query is parsed, and a lock is acquired if the query concerns a transaction (S420).

Information about the transaction is transmitted to the data node related to the transaction (S430).

A change log for the chunk data to be changed by the transaction is stored in the log DB (S440).

It is determined whether at least a preset amount of chunk data has been stored in the log DB (S450). That is, it is determined whether the chunk data to be changed by transactions and stored in the log DB has reached a preset amount. This check is made because chunk data beyond a certain amount can degrade the performance of the whole HDFS and can exceed the capacity of the log DB.

If at least the preset amount of chunk data has been stored in the log DB, a new data file composed of chunk data with the change log applied is generated in the background, and the name node (master) is notified (S460). In this case, the data node generates a new data file composed of chunk data in which the change log has been applied to the previously stored data. Because the new data file is generated in the background, its generation does not affect the operation of HDFS. However, since generating a new data file often takes considerable time, the preset amount may be adjusted according to the amount of files stored in the data node. When the data node has generated the new data file in the background, it notifies the name node (master).

The metadata information is updated according to the new data file, and then the lock is released (S470). When the name node (master) is notified by the data node that a new data file has been generated, it notifies the metadata DB, and the metadata DB updates the metadata information according to the new data file. After updating the metadata information, the metadata DB releases the lock.
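The threshold check and file regeneration of S450 and S460 can be sketched as follows. Several points are assumptions rather than statements from the specification: the preset amount is counted as the number of logged changes, the log is cleared once it has been applied, and the function runs synchronously here, whereas the description has the new file generated in the background. The name compact_if_needed and the dictionary layout are hypothetical.

```python
def compact_if_needed(stored, change_log, threshold):
    """Return (data_file, compacted).

    stored:     dict chunk_id -> bytes (the existing data file, left untouched)
    change_log: dict chunk_id -> list of ("update", bytes) or ("delete", None)
    threshold:  the preset amount, counted here as the number of logged changes
    """
    pending = sum(len(changes) for changes in change_log.values())
    if pending < threshold:
        # S450: below the preset amount, so the existing file is kept.
        return stored, False
    # S460: build a new file rather than editing the existing one in place.
    new_file = dict(stored)
    for chunk_id, changes in change_log.items():
        for op, payload in changes:
            if op == "update":
                new_file[chunk_id] = payload
            elif op == "delete":
                new_file.pop(chunk_id, None)
    # Assumption: changes that are now part of the new file no longer need to be kept.
    change_log.clear()
    return new_file, True
```

After a call that returns True, the caller would notify the name node so that the metadata can be updated for the new file before the lock is released (S470).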

Although FIG. 2 describes steps S210 to S260, FIG. 3 steps S310 to S350, and FIG. 4 steps S410 to S470 as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present invention. In other words, a person of ordinary skill in the art to which an embodiment of the present invention belongs could apply various modifications and variations, such as changing the order described in FIGS. 2 to 4 or executing one or more of the steps in parallel, without departing from the essential characteristics of the embodiment; FIGS. 2 to 4 are therefore not limited to a time-series order.

Meanwhile, the processes shown in FIGS. 2 to 4 can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device in which data readable by a computer system is stored. That is, the computer-readable recording medium includes storage media such as magnetic storage media (for example, ROM, floppy disks, and hard disks), optically readable media (for example, CD-ROMs and DVDs), and carrier waves (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

The above description merely illustrates the technical idea of this embodiment, and a person of ordinary skill in the art to which this embodiment belongs could make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of this embodiment, and the scope of that technical idea is not limited by them. The scope of protection of this embodiment should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be interpreted as falling within the scope of rights of this embodiment.

110: name node (master) 120: metadata DB
130, 132, 134, 136: data node
140, 142, 144: log DB

Claims

A computer-readable recording medium having recorded thereon a program for causing a large-scale distributed file system to perform:
parsing a query received from a client;
acquiring a lock on data related to a transaction when the parsed query concerns a transaction that includes data modification;
transmitting information about the transaction to the data node in which the data related to the transaction is stored;
receiving the information about the transaction and storing, in a local database, a change log for the chunk data to be modified;
after the change log is stored, giving notice of the commit and approving the commit; and
after the commit is received, releasing the lock on the data related to the transaction.

The computer-readable recording medium of claim 1,
wherein the information about the lock is stored in a metadata database and is acquired from the metadata database.

The computer-readable recording medium of claim 1,
wherein the storing in the local database comprises generating and assigning an identifier for each chunk of data when storing the change log for the chunk data in the local database.

The computer-readable recording medium according to claim 1, the program implementing the processes of:
transmitting information about a query related to data retrieval when the parsed query is a query related to data retrieval;
checking whether a change log of the chunk data exists in the local database for the information about the query related to the data retrieval; and
when a change log of the chunk data exists for the information about the query related to the data retrieval, generating and transmitting a chunk data snapshot of the information about the query related to the data retrieval.
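
For illustration only, the retrieval path recited above can be sketched as follows: the stored chunk is returned unchanged when no change log exists for it, and otherwise a snapshot is built by applying the logged changes to a copy. The function name `read_chunk`, the dictionary layout keyed by chunk identifier, and the `(index, value)` form of a log entry are assumptions made for this sketch, not details taken from the specification.

```python
def read_chunk(chunk_id, base_chunks, change_log):
    """Return the chunk as a reader should see it.

    base_chunks: dict mapping chunk identifier -> list of records (never modified here)
    change_log:  dict mapping chunk identifier -> list of (index, new_value) entries
    """
    base = base_chunks[chunk_id]
    changes = change_log.get(chunk_id)
    if not changes:
        # No change log for this chunk: the stored data can be returned as is.
        return base
    # A change log exists: build a snapshot on a copy so the stored chunk stays intact.
    snapshot = list(base)
    for index, value in changes:
        snapshot[index] = value
    return snapshot
```

Looking up the change log by chunk identifier mirrors the idea of assigning an identifier to each piece of chunk data so that the existence of a log can be checked cheaply.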

The computer-readable recording medium according to claim 3 or 4,
wherein whether a change log of the chunk data exists for the information about the query related to the data retrieval is checked using the identifier.

The computer-readable recording medium according to claim 1,
wherein the process of storing in the local database comprises,
when the change log stored in the local database exceeds a preset amount, generating in the background a new file composed of chunk data in which the change log is reflected.
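
As a rough sketch only, the threshold-triggered background generation recited above could look like the following. The class `ChunkStore`, the parameter `max_log_entries` (standing in for the "preset amount"), and the use of a Python thread are assumptions for illustration; the specification does not prescribe this mechanism.

```python
import threading


class ChunkStore:
    """Hypothetical store: chunk data plus a change log that is folded in when it grows."""

    def __init__(self, chunks, max_log_entries):
        self.chunks = chunks                  # chunk identifier -> list of records
        self.log = []                         # pending (chunk_id, index, value) entries
        self.max_log_entries = max_log_entries
        self.lock = threading.Lock()
        self.compactor = None                 # background thread, if one was started

    def record(self, chunk_id, index, value):
        """Append a change; start a background compaction if the log is too long."""
        with self.lock:
            self.log.append((chunk_id, index, value))
            too_long = len(self.log) > self.max_log_entries
            idle = self.compactor is None or not self.compactor.is_alive()
            if too_long and idle:
                self.compactor = threading.Thread(target=self._compact)
                self.compactor.start()

    def _compact(self):
        """Build new chunk data with the logged changes applied, then clear the log."""
        with self.lock:
            new_chunks = {cid: list(data) for cid, data in self.chunks.items()}
            for cid, index, value in self.log:
                new_chunks[cid][index] = value
            self.chunks = new_chunks          # the "new file" replaces the old data
            self.log = []
```

For simplicity the sketch holds one lock for the whole compaction, so writers wait while the new data is built; a real system would more likely build the new file from a frozen copy of the log and swap it in at the end.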

The computer-readable recording medium according to claim 6,
wherein new metadata reflecting the changes of the new file is generated.

A query processing apparatus comprising:
a name node that receives a query from a client, parses the received query, and transmits the parsed query information;
a data node that stores data and receives and forwards the parsed query information;
a metadata database that stores metadata of each piece of data stored in the data node and, when the parsed query information is a query concerning a transaction that includes a modification of data, transmits to the name node a lock on the data related to the transaction; and
a local database that stores a change log for chunk data to be modified when the query information forwarded from the data node is a query concerning a transaction that includes a modification of data, and checks whether a change log for the chunk data exists when the query information forwarded from the data node is a query related to data retrieval.

The query processing apparatus according to claim 8,
wherein the metadata database
stores a lock on the data related to the transaction before the name node transmits the query information about the transaction.

The query processing apparatus according to claim 8,
wherein, when the local database has stored the change log, whether to commit is transmitted to the name node via the data node, and
the name node approves the commit and releases the lock received from the metadata database.

The query processing apparatus according to claim 8,
wherein the metadata database
generates and assigns an identifier to each piece of chunk data when the change log for the chunk data is stored.

The query processing apparatus according to claim 8,
wherein the data node,
when the change log stored in the local database exceeds a preset amount, generates in the background a new file composed of chunk data in which the change log is reflected.

The query processing apparatus according to claim 12,
wherein the metadata database
generates new metadata reflecting the changes of the new file.