KR102014725B1

KR102014725B1 - Manycore based core partitioning apparatus and method, storage media storing the same

Info

Publication number: KR102014725B1
Application number: KR1020170156589A
Authority: KR
Inventors: 진현욱; 임은진; 조중연; 김주호
Original assignee: 건국대학교 산학협력단; 국민대학교산학협력단
Priority date: 2017-11-22
Filing date: 2017-11-22
Publication date: 2019-08-27
Also published as: KR20190059048A

Abstract

The present invention relates to a manycore-based core partitioning apparatus. The apparatus may include a shared memory that stores a message; a first processing element that creates a process including first and second threads and, when the process performs a system call, accesses the shared memory through the first thread to process the message in each of the phases before and after the system call; and a second processing element that accesses the shared memory through the second thread to process the message, mutually exclusively with the first thread, in each of the phases before and after the system call.

Description

MANYCORE BASED CORE PARTITIONING APPARATUS AND METHOD, STORAGE MEDIA STORING THE SAME

The present invention relates to core partitioning technology and, more particularly, to a manycore-based core partitioning apparatus and method capable of improving network performance in a manycore system.

In manycore systems, existing operating systems have many scalability problems from the standpoint of network I/O. As the number of active cores grows, an existing operating system pays an increasingly high cost to maintain cache coherence, and contention concentrates on the locks that guard shared resources. In addition, an existing operating system may run the kernel image and application code on the same core. Context switches between the kernel image and application code can then pollute the cache and the TLB (Translation Lookaside Buffer), which degrades scalability.

Korean Patent Laid-Open Publication No. 10-2013-0033020 (2013.04.03) relates to a partition scheduling apparatus and method for a manycore system. More specifically, it performs efficient priority-based partitioning for applications running on a manycore system so that resources can be used efficiently in a real-time environment. When partitioning by application priority, it takes the hardware environment of the cores into account to minimize data exchange overhead between cores, minimize I/O delay, and minimize the number of idle cores, and it performs partition scheduling that raises locality to improve cache hits.

Korean Patent Laid-Open Publication No. 10-2014-0125893 (2014.10.30) relates to a job distribution system for a virtualized manycore server, a method therefor, and a recording medium. More specifically, in a system that automatically distributes jobs in a virtualized environment, it includes a service node that analyzes a job to generate information about the job, and a plurality of computing nodes that deliver information about available resources to the service node. The service node partitions the job using the job information and the available-resource information, groups the partitioned job using the available-resource information to generate a plurality of jobs, and distributes the generated jobs to the computing nodes using the available-resource information.

Korean Patent Laid-Open Publication No. 10-2013-0033020 (2013.04.03)
Korean Patent Laid-Open Publication No. 10-2014-0125893 (2014.10.30)

An embodiment of the present invention aims to provide a manycore-based core partitioning apparatus and method capable of improving network performance in a manycore system.

An embodiment of the present invention aims to provide a manycore-based core partitioning apparatus and method capable of improving network performance by exchanging messages through a shared memory.

An embodiment of the present invention aims to provide a manycore-based core partitioning apparatus and method capable of lowering cache coherence overhead by effectively determining the core affinity of a thread.

Among the embodiments, a manycore-based core partitioning apparatus includes a shared memory that stores a message; a first processing element that creates a process including first and second threads and, when the process performs a system call, accesses the shared memory through the first thread to process the message in each of the phases before and after the system call; and a second processing element that accesses the shared memory through the second thread to process the message, mutually exclusively with the first thread, in each of the phases before and after the system call.

The shared memory may store the message in a single buffer to preserve the synchronous behavior of the system call.

The shared memory may store messages sequentially in a plurality of buffers to provide asynchronous system calls.

The second processing element may control the second thread to analyze the messages in the plurality of buffers and perform the corresponding system calls out of order.

The first processing element may raise a first event after storing the message in the shared memory.

When the first event is raised, the second processing element may access the shared memory, read the stored message, and process the system call; it may then access the shared memory, modify the message to reflect the result of processing the system call, and raise a second event.

When the second event is raised, the first processing element may access the shared memory and read the modified message.

The second processing element may include a plurality of processing cores and a last level cache that manages accesses by the plurality of processing cores according to a core affinity policy.

The last level cache may calculate, for the second thread, a core affinity for each of the plurality of processing cores and may permit access by the processing core associated with the highest core affinity.

The last level cache may determine the core affinity of the second thread by assigning progressively higher core affinity in the following order: first processing cores included in the processor socket that owns the I/O bus to which the network device is connected; second processing cores, among the first processing cores, that share a last level cache; and a third processing core, among the second processing cores, that has the highest load within a threshold.
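The three-level ordering described above can be sketched in software as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the `Core` record, the `affinity_tier` and `pick_core` functions, the `target_llc` parameter (which last level cache the candidate must share), and the numeric tiers are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    core_id: int
    socket: int   # processor socket the core belongs to
    llc: int      # identifier of the last level cache the core uses
    load: float   # current utilization, 0.0 to 1.0


def affinity_tier(core, nic_socket, target_llc, load_threshold):
    """Return 0..3; a higher tier stands for a higher core affinity."""
    if core.socket != nic_socket:
        return 0  # not on the socket that owns the NIC's I/O bus
    if core.llc != target_llc:
        return 1  # first tier: on the NIC socket
    if core.load > load_threshold:
        return 2  # second tier: also shares the last level cache
    return 3      # third tier: also within the load threshold


def pick_core(cores, nic_socket, target_llc, load_threshold):
    """Pick the highest tier; inside tier 3 prefer the most loaded core."""
    def key(core):
        tier = affinity_tier(core, nic_socket, target_llc, load_threshold)
        # Load only breaks ties among cores that are within the threshold.
        return (tier, core.load if tier == 3 else 0.0)
    return max(cores, key=key)
```

With five hypothetical cores, where cores 0 to 2 share the last level cache on the NIC socket and core 2 exceeds a threshold of 0.8, `pick_core` would select the most loaded core that is still within the threshold.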

Among the embodiments, a manycore-based core partitioning method is performed in a core partitioning apparatus that includes a shared memory storing a message. The method includes a first processing step of creating a process including first and second threads and, when the process performs a system call, accessing the shared memory through the first thread to process the message in each of the phases before and after the system call; and a second processing step of accessing the shared memory through the second thread to process the message, mutually exclusively with the first thread, in each of the phases before and after the system call.

The first processing step may raise a first event after storing the message in the shared memory.

When the first event is raised, the second processing step may access the shared memory, read the stored message, and process the system call; it may then access the shared memory, modify the message to reflect the result of processing the system call, and raise a second event.

When the second event is raised, the first processing step may access the shared memory and read the modified message.

Among the embodiments, a computer-executable recording medium records a manycore-based core partitioning method performed in a core partitioning apparatus that includes a shared memory storing a message, the method including a first processing step of creating a process including first and second threads and, when the process performs a system call, accessing the shared memory through the first thread to process the message in each of the phases before and after the system call; and a second processing step of accessing the shared memory through the second thread to process the message, mutually exclusively with the first thread, in each of the phases before and after the system call.

The disclosed technology may have the following effects. However, this does not mean that a particular embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited by them.

A manycore-based core partitioning apparatus and method according to an embodiment of the present invention can improve network performance by exchanging messages through a shared memory.

A manycore-based core partitioning apparatus and method according to an embodiment of the present invention can lower cache coherence overhead by effectively determining the core affinity of a thread.

FIG. 1 is a diagram illustrating a manycore-based core partitioning apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a processing element of FIG. 1.
FIG. 3 is a flowchart illustrating an algorithm by which the first processing element of FIG. 1 accesses the shared memory through the first thread.
FIG. 4 is a flowchart illustrating an algorithm by which the second processing element of FIG. 1 accesses the shared memory through the second thread.
FIG. 5 is a flowchart illustrating how a system call is processed through shared-memory-based message exchange in a manycore-based core partitioning apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a manycore-based core partitioning system according to an embodiment of the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of the present invention should not be construed as limited by the embodiments described herein. That is, because the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. In addition, the objects or effects presented in the present invention do not mean that a particular embodiment must include all of them or only such effects, so the scope of the present invention should not be understood as being limited by them.

Meanwhile, the terms used in this application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly, a second component may be named a first component.

When a component is described as being "connected" to another component, it may be directly connected to that other component, but it should be understood that another component may exist in between. In contrast, when a component is described as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The identifiers attached to steps (for example, a, b, c) are used for convenience of description and do not describe the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the one stated. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless clearly defined in this application.

Recent studies have examined the scalability problems of existing operating systems as the number of cores in manycore systems increases. These studies analyze the problems of existing operating systems as follows.

1. In a manycore system based on NUMA (Non-Uniform Memory Access), accesses to caches in other processor packages, made to maintain cache coherence, are severely harmful to scalability.

2. A lock on a shared resource introduces delay while the cores that failed to acquire it wait to use the resource. In a manycore system with many cores, this delay grows longer and can hinder scalability.

3. The kernel image and application code may run on the same core, which can cause cache pollution and TLB (Translation Lookaside Buffer) pollution.

The scalability problems of existing operating systems in manycore environments have also been studied from a network perspective. This research reports that separating cores per TCP (Transmission Control Protocol) connection can reduce the lock overhead of managing connection state and also reduce cache coherence overhead, thereby improving network performance.

To address the scalability problems of existing operating systems on manycore systems described above, some studies modified the kernel structure so that each core has locality, and others processed TCP at user level to reduce transitions between kernel mode and user mode. Further studies applied optimal core affinity to improve I/O performance and use hardware resources efficiently. What these studies have in common is that separating the kernel image from application code yields locality, which reduces lock overhead and the cost of maintaining cache coherence.

Existing studies sought to raise network performance by changing the structure of the operating system or by changing applications. However, this approach has the major drawback that it cannot support the many applications that already exist. The present invention separates the application context from the system call context without changing the operating system or the application, which increases locality, and applies core affinity to separate cores per connection, which can improve network performance.

FIG. 1 is a diagram illustrating a manycore-based core partitioning apparatus according to an embodiment of the present invention.

Referring to FIG. 1, a manycore-based core partitioning apparatus 100 (hereinafter, the core partitioning apparatus) may include a shared memory 110, a first processing element 130, a second processing element 150, and a network interface card 170.

The shared memory 110 may correspond to a storage device used in the manycore system. It may be implemented as non-volatile memory such as an SSD (Solid State Disk) or HDD (Hard Disk Drive) and used to store the messages needed for the operation of the core partitioning apparatus 100.

The shared memory 110 may be implemented as a single buffer or as a plurality of buffers. In one embodiment, the shared memory 110 may store a message in a single buffer to preserve the synchronous behavior of the system call. In another embodiment, the shared memory 110 may store messages sequentially in a plurality of buffers to provide asynchronous system calls.

More specifically, when the shared memory 110 is implemented as a single buffer, the first processing element 130 and the second processing element 150 cannot access the shared memory 110 at the same time and can access it only in a mutually exclusive manner. When the shared memory 110 is implemented as a plurality of buffers, the first processing element 130 and the second processing element 150 can access the shared memory 110 at the same time as long as an available buffer exists among the plurality of buffers.
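The multi-buffer case can be illustrated with a small buffer pool in which the application side claims any free buffer and the system call side serves pending requests in place. This is a hedged sketch only: the `BufferPool` class, its `post`, `serve`, and `collect` methods, and the three slot states are hypothetical names chosen for illustration, and an in-process Python list stands in for the shared memory 110.

```python
import threading

FREE, REQUEST, RESPONSE = "free", "request", "response"


class BufferPool:
    """A fixed set of message buffers shared by two threads."""

    def __init__(self, size):
        self.lock = threading.Lock()
        self.state = [FREE] * size
        self.data = [None] * size

    def post(self, message):
        """Application side: claim a free buffer; return its index or None."""
        with self.lock:
            for index, state in enumerate(self.state):
                if state == FREE:
                    self.state[index] = REQUEST
                    self.data[index] = message
                    return index
        return None  # every buffer is in use

    def serve(self, handler):
        """System call side: process pending requests; return the count."""
        with self.lock:
            pending = [i for i, s in enumerate(self.state) if s == REQUEST]
        for index in pending:
            # The result overwrites the request in the same buffer.
            self.data[index] = handler(self.data[index])
            with self.lock:
                self.state[index] = RESPONSE
        return len(pending)

    def collect(self, index):
        """Application side: take a finished result and free the buffer."""
        with self.lock:
            if self.state[index] != RESPONSE:
                return None
            result, self.data[index] = self.data[index], None
            self.state[index] = FREE
            return result
```

A pool of size 1 degenerates to the single-buffer case, where a second `post` cannot proceed until the first result has been collected.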

The first processing element 130 may correspond to a processor package in a manycore system based on a NUMA (Non-Uniform Memory Access) architecture. When an application runs, the manycore system may create a process responsible for that application's operation, and the process may include an application thread responsible for part of the application's operation. Here, the application thread may correspond to the first thread.

When the process performs a system call, the first processing element 130 may access the shared memory 110 through the first thread and process the message in each of the phases before and after the system call.

In one embodiment, the first processing element 130 may raise a first event after storing a message in the shared memory 110. Here, the first event may correspond to a call to a function used for synchronization between the first processing element 130 and the second processing element 150. Once the first event has been raised, the first processing element 130 cannot access the shared memory 110 until the second processing element 150 raises a specific event.

In one embodiment, when the second event is raised, the first processing element 130 may access the shared memory 110 and read the modified message. Here, the second event may correspond to a call to a function used for synchronization between the first processing element 130 and the second processing element 150. Once the second event has been raised, the first processing element 130 can access the shared memory 110 and read the modified message.

The second processing element 150 may correspond to a processor package in a manycore system based on a NUMA (Non-Uniform Memory Access) architecture. When an application runs, the manycore system may create a process responsible for that application's operation, and the process may include a Syscall thread responsible for the application's network system calls. When the application thread invokes a network system call, a Syscall thread, which is an independent thread, may be created. Here, the Syscall thread may correspond to the second thread.

The second processing element 150 may access the shared memory 110 through the second thread and process the message, mutually exclusively with the first thread, in each of the phases before and after the system call. More specifically, the first thread and the second thread share the shared memory 110; while the first thread is accessing the shared memory 110, the second thread cannot access it, and while the second thread is accessing the shared memory 110, the first thread cannot access it.

In one embodiment, when the shared memory is implemented as a plurality of buffers, the second processing element 150 may control the second thread to analyze the messages in the plurality of buffers and perform the corresponding system calls out of order. More specifically, when the shared memory is implemented as a plurality of buffers, the second processing element 150 may access any one of the buffers, read its message, execute the network system call requested by that message, and store the return value in the same buffer. The second processing element 150 may also access the plurality of buffers sequentially or randomly to read a plurality of messages, execute the network system calls requested by the respective messages in parallel, and store each return value in the corresponding buffer in parallel.
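The parallel, out-of-order handling of buffered requests can be sketched as below. This is a minimal illustration under assumptions, not the patent's implementation: each buffer is modeled as a hypothetical dictionary with `call`, `args`, and `result` keys, the `handlers` table stands in for real network system calls, and the standard-library `ThreadPoolExecutor` is used only to show requests completing in an order that may differ from their buffer order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_out_of_order(buffers, handlers, workers=4):
    """Run every buffered request in parallel.

    Each return value is written back into the buffer that carried the
    request. The returned list holds buffer indices in completion order,
    which need not match the order in which the requests were stored.
    """
    completion_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(handlers[msg["call"]], *msg["args"]): index
            for index, msg in enumerate(buffers)
        }
        for future in as_completed(futures):
            index = futures[future]
            buffers[index]["result"] = future.result()
            completion_order.append(index)
    return completion_order
```

The caller matches results to requests by buffer index rather than by arrival order, which is what lets the requests finish in any order.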

In one embodiment, when the first event is raised, the second processing element 150 may access the shared memory 110, read the stored message, and process the system call requested by that message; it may then access the shared memory 110, modify the message to reflect the result of processing the system call, and raise a second event. The second processing element 150 can detect that the first event has been raised, and it can access the shared memory 110 from the time the first event is raised until it raises the second event.
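The two-event handshake over a single buffer can be sketched with two threads as follows. This is an illustrative sketch under assumptions rather than the patent's code: `threading.Event` objects stand in for the first and second events, an ordinary Python attribute stands in for the shared memory 110, and `SingleBufferChannel`, `application_call`, `syscall_worker`, and the `handlers` table are hypothetical names.

```python
import threading


class SingleBufferChannel:
    """One shared message slot plus the first and second events."""

    def __init__(self):
        self.message = None
        self.first_event = threading.Event()   # raised by the first thread
        self.second_event = threading.Event()  # raised by the second thread


def application_call(channel, name, args):
    """First thread: store the request, raise event 1, wait for event 2."""
    channel.message = {"call": name, "args": args, "result": None}
    channel.first_event.set()
    channel.second_event.wait()   # no access to the buffer while waiting
    channel.second_event.clear()
    return channel.message["result"]


def syscall_worker(channel, handlers, count):
    """Second thread: serve `count` requests, one per first event."""
    for _ in range(count):
        channel.first_event.wait()
        channel.first_event.clear()
        msg = channel.message
        # Modify the message in place so that it carries the result.
        msg["result"] = handlers[msg["call"]](*msg["args"])
        channel.second_event.set()


def run_handshake_demo():
    channel = SingleBufferChannel()
    handlers = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}
    worker = threading.Thread(target=syscall_worker,
                              args=(channel, handlers, 2))
    worker.start()
    results = [application_call(channel, "add", (2, 3)),
               application_call(channel, "upper", ("ping",))]
    worker.join()
    return results
```

Because the first thread blocks until the second event, each call stays synchronous from the application's point of view even though another thread executes it.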

일 실시예에서, 제2 프로세싱 엘리먼트(150)는 복수의 프로세싱 코어(Processing Core)들 및 코어 친화도 정책에 따라 복수의 프로세싱 코어(Processing Core)들의 접근들을 관리하는 라스트 레벨 캐시(Last Level Cache)를 포함할 수 있다. 프로세싱 코어(Processing Core)들 및 라스트 레벨 캐시(Last Level Cache)에 대해서는 도 2에서 자세히 설명한다.In one embodiment, the second processing element 150 is a Last Level Cache that manages access of a plurality of Processing Cores in accordance with a plurality of Processing Cores and a core affinity policy. It may include. Processing Cores and Last Level Cache are described in detail in FIG. 2.

The network interface card (NIC) (170) may correspond to a hardware device used to connect a computer to a network for communication. The network interface card (170) may correspond to a LAN card, and is known by various names, including network interface controller, network adapter, network card, and Ethernet card.

FIG. 2 is a block diagram illustrating a processing element of FIG. 1.

Referring to FIG. 2, the processing element (200) may include a plurality of processing cores (210), comprising processing core C₀ (211) through processing core C₉ (219), and a last level cache (230). In one embodiment, the processing element (200) may be implemented with a total of ten processing cores, C₀ through C₉.

The processing cores (210) may correspond to the plurality of cores constituting a manycore system, and may process the operations of various threads, including application threads, Syscall threads, and the event handlers of network devices. Each processing core (210) may process the operations of the specific thread assigned to it by the core affinity policy.

The last level cache (230) is a type of cache and may correspond to the cache at the last level, which is shared by all the functional units of a given chip, such as its CPU cores. The last level cache (230) may be shared by the plurality of processing cores (210).

In one embodiment, the last level cache (230) may manage accesses by the plurality of processing cores (210) according to a core affinity policy. Here, the core affinity policy may correspond to a rule for determining core affinity.

In one embodiment, the last level cache (230) may calculate, for the second thread, a core affinity for each of the plurality of processing cores, and may permit access by the processing core associated with the highest core affinity. A processing core with a high core affinity for the second thread may correspond to a core whose load is low enough to handle that thread's operations with headroom. The core affinity policy may be set so that a lightly loaded processing core takes charge of a specific thread's operations, thereby minimizing the overall load on the manycore system.

In one embodiment, the last level cache (230) may determine the core affinity of the second thread by assigning progressively higher core affinity in the following order: first processing cores, which are included in the processor socket that owns the I/O bus to which the network device is connected; second processing cores, which are those among the first processing cores that share a last level cache; and a third processing core, which is the one among the second processing cores that has the highest load within a threshold.

More specifically, the core partitioning apparatus (100) may set the following core affinity policy in order to determine the core affinity of the second thread effectively. The core partitioning apparatus (100) may dynamically determine the core affinity of the second thread so as to satisfy the core affinity policy.

Policy 1. Cores included in the processor socket that owns the I/O bus to which the network device is connected

Policy 2. Among the cores satisfying Policy 1, cores that share a last level cache

Policy 3. Among the cores satisfying Policy 2, the core with the highest load that does not exceed the threshold
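The three policies can be read as successive filters over the set of cores. The following is a minimal sketch of that reading, not the patented implementation: the `Core` record, the `select_syscall_core` function, and the idea of passing the preferred last level cache as an explicit `llc` argument are assumptions made for illustration, and the load values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Core:
    core_id: int
    socket: int   # processor socket the core belongs to
    llc: int      # identifier of the last level cache the core uses
    load: float   # current load in the range 0.0 to 1.0

def select_syscall_core(cores: List[Core], nic_socket: int, llc: int,
                        threshold: float) -> Optional[Core]:
    # Policy 1: cores on the socket that owns the NIC's I/O bus.
    p1 = [c for c in cores if c.socket == nic_socket]
    # Policy 2: of those, cores sharing the given last level cache.
    p2 = [c for c in p1 if c.llc == llc]
    # Policy 3: of those, the most loaded core still within the threshold.
    p3 = [c for c in p2 if c.load <= threshold]
    return max(p3, key=lambda c: c.load) if p3 else None

cores = [
    Core(0, socket=0, llc=0, load=0.95),
    Core(1, socket=0, llc=0, load=0.60),
    Core(2, socket=0, llc=0, load=0.20),
    Core(3, socket=0, llc=1, load=0.70),
    Core(4, socket=1, llc=2, load=0.10),
]
chosen = select_syscall_core(cores, nic_socket=0, llc=0, threshold=0.8)
print(chosen.core_id)  # 1
```

Core 0 is excluded because its load exceeds the threshold, core 3 uses a different last level cache, and core 4 sits on another socket, so core 1 is selected over the less loaded core 2.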

The core partitioning apparatus (100) may determine core affinity so that the event handler of the network device runs on the same processing core as the second thread that is in charge of the same network connection. The core partitioning apparatus (100) may have the first thread run on a lightly loaded processing core. Accordingly, when the overall system load is low, the core partitioning apparatus (100) may have the first thread run on a processing core different from that of the second thread. When the overall system load rises and no processing core is available, the core partitioning apparatus (100) may have the first thread run on the same processing core as the second thread.
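The placement rule for the first thread can be sketched as follows. This is an illustrative sketch only: the function name `place_app_thread`, the per-core load dictionary, and the use of a load threshold to decide whether a core counts as "available" are assumptions not stated in the source.

```python
from typing import Dict

def place_app_thread(loads: Dict[int, float], syscall_core: int,
                     threshold: float) -> int:
    """Return the core on which the first (application) thread should run."""
    # Candidate cores: every core except the one running the second thread,
    # provided its load leaves room for more work.
    free = {core: load for core, load in loads.items()
            if core != syscall_core and load < threshold}
    if free:
        return min(free, key=free.get)  # least loaded separate core
    return syscall_core                 # no core available: share the core

low_load = place_app_thread({0: 0.7, 1: 0.2, 2: 0.5}, syscall_core=0, threshold=0.9)
high_load = place_app_thread({0: 0.7, 1: 0.95, 2: 0.99}, syscall_core=0, threshold=0.9)
print(low_load, high_load)  # 1 0
```

Under low load the application thread goes to core 1, away from the second thread; once every other core is saturated it falls back to core 0 and shares it with the second thread.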

The core partitioning apparatus (100) may use a message channel based on the shared memory (110) to reduce the overhead incurred when delivering a message to the second thread. Because requesting a system call from the second thread and receiving the return value is a synchronous operation, at most one message can exist in the shared memory (110) at a time. Therefore, it suffices to allocate, for the second thread, a shared memory (110) just large enough to hold a single message. The second thread must recognize when the first thread has placed a request message in the shared memory (110), and the first thread must recognize, after requesting a system call, when the second thread has written the return value to the shared memory (110). For message notification and for synchronizing reads and writes on the shared memory, the core partitioning apparatus (100) may use the algorithms shown in FIGS. 3 and 4.
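Because at most one message is in flight, the channel can be a single fixed-size slot that is overwritten in place. The sketch below shows one possible layout using Python's `struct` module. The field layout (a state byte, a call identifier, three integer arguments, one return value) and all names are hypothetical; the source does not specify the message format, and a `bytearray` stands in for the shared memory region.

```python
import struct

# Hypothetical layout: state (1 byte), call id (unsigned short),
# three signed 64-bit arguments, one signed 64-bit return value.
SLOT = struct.Struct("<BHqqqq")
EMPTY, REQUEST, REPLY = 0, 1, 2

slot = bytearray(SLOT.size)  # room for exactly one message

def write_request(buf, call_id, a0=0, a1=0, a2=0):
    """Overwrite the slot with a request message."""
    SLOT.pack_into(buf, 0, REQUEST, call_id, a0, a1, a2, 0)

def write_reply(buf, ret):
    """Modify the message in place so that it carries the return value."""
    _, call_id, a0, a1, a2, _ = SLOT.unpack_from(buf, 0)
    SLOT.pack_into(buf, 0, REPLY, call_id, a0, a1, a2, ret)

def read_slot(buf):
    """Return the slot contents as a tuple of fields."""
    return SLOT.unpack_from(buf, 0)

write_request(slot, call_id=7, a0=3, a1=4096)
write_reply(slot, ret=4096)
print(SLOT.size, read_slot(slot))
```

The reply reuses the request's storage, so the allocation never grows beyond one message regardless of how many calls are made over the connection.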

FIG. 3 is a flowchart illustrating an algorithm by which the first processing element of FIG. 1 accesses the shared memory through the first thread.

Referring to FIG. 3, the first processing element (130) may write a system call request message to the shared memory (110) (step S310). After writing the request message to the shared memory (110), the first processing element (130) may generate a first event (step S330). After generating the first event, the first processing element (130) may wait until a second event occurs and may detect its occurrence (step S350). On detecting the second event, the first processing element (130) may read the return value from the shared memory (110) (step S370).

FIG. 4 is a flowchart illustrating an algorithm by which the second processing element of FIG. 1 accesses the shared memory through the second thread.

Referring to FIG. 4, the second processing element (150) may detect the occurrence of the first event (step S410). On detecting the first event, the second processing element (150) may read the system call request message from the shared memory (110) (step S430). The second processing element (150) may execute the system call requested by the message and obtain its return value (step S450). The second processing element (150) may write the return value of the system call to the shared memory (110) (step S470). After writing the return value to the shared memory (110), the second processing element (150) may generate a second event (step S490).
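The two algorithms of FIGS. 3 and 4 pair up into a handshake over two events. The following sketch mirrors the step numbers using `threading.Event`. It is an illustration under assumptions: the `Channel` class and function names are hypothetical, a Python attribute stands in for the shared memory (110), and the built-in `pow` stands in for a system call.

```python
import threading

class Channel:
    """A one-message shared slot guarded by two events."""
    def __init__(self):
        self.slot = None                       # stands in for shared memory 110
        self.first_event = threading.Event()   # a request has been written
        self.second_event = threading.Event()  # a return value has been written

def request(ch, func, *args):
    """First thread: steps S310 to S370."""
    ch.slot = (func, args)       # S310: write the request message
    ch.first_event.set()         # S330: generate the first event
    ch.second_event.wait()       # S350: wait for the second event
    ch.second_event.clear()
    return ch.slot               # S370: read the return value

def serve_once(ch):
    """Second thread: steps S410 to S490."""
    ch.first_event.wait()        # S410: detect the first event
    ch.first_event.clear()
    func, args = ch.slot         # S430: read the request message
    ret = func(*args)            # S450: execute the call, get the return value
    ch.slot = ret                # S470: write the return value
    ch.second_event.set()        # S490: generate the second event

ch = Channel()
worker = threading.Thread(target=serve_once, args=(ch,))
worker.start()
result = request(ch, pow, 2, 10)
worker.join()
print(result)  # 1024
```

Each side touches the slot only while it holds the turn granted by the other side's event, so no additional lock is needed for the single-message case.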

FIG. 5 is a flowchart illustrating a process of handling a system call through shared-memory-based message exchange, performed in a manycore-based core partitioning apparatus according to an embodiment of the present invention.

The core partitioning apparatus (100) may synchronize the first processing element (510) and the second processing element (530) by exchanging messages through the shared memory.

Referring to FIG. 5, the first processing element (510) may store a message in the shared memory (520) (step S541) and generate a first event (step S542). On detecting the first event, the second processing element (530) may read the message from the shared memory (520) (step S543). The second processing element (530) may process the system call requested by the message (step S544). The second processing element (530) may modify the message in the shared memory (520) by storing there the return value obtained from the system call (step S545). After modifying the message in the shared memory, the second processing element (530) may generate a second event (step S546). On detecting the second event, the first processing element (510) may access the shared memory (520), read the message, and obtain the return value of the system call (step S547).

FIG. 6 is a diagram illustrating a manycore-based core partitioning system according to an embodiment of the present invention.

Referring to FIG. 6, each time the core partitioning apparatus (100) obtains a descriptor for a new network connection at the application level (for example, each time socket() is called), it may create an independent thread called a Syscall thread (620). When an application thread (650) invokes a network system call (for example, send() or recv()), the core partitioning apparatus (100) may internally send a message to the corresponding Syscall thread (620) to request the system call. To specify which system call is requested, the message may include an identification number and the argument values for that system call. The Syscall thread (620) that receives the message may then actually invoke the network system call corresponding to the identification number and deliver the returned value, as a message, to the application thread (650) that made the request.
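The role of the identification number can be shown with a small dispatch table. This sketch is an assumption-laden illustration: the constants `SYS_SEND` and `SYS_RECV`, the dict-based message format, and the `handle_message` function are hypothetical, and `socket.socketpair()` is used only to obtain two connected sockets without a real network.

```python
import socket

# Hypothetical identification numbers; the source does not list actual values.
SYS_SEND, SYS_RECV = 1, 2

DISPATCH = {
    SYS_SEND: lambda sock, data: sock.send(data),
    SYS_RECV: lambda sock, size: sock.recv(size),
}

def handle_message(sock, message):
    """Run the network call named by the message and return its result."""
    call_id, args = message["id"], message["args"]
    return DISPATCH[call_id](sock, *args)

a, b = socket.socketpair()
sent = handle_message(a, {"id": SYS_SEND, "args": (b"ping",)})
data = handle_message(b, {"id": SYS_RECV, "args": (16,)})
a.close()
b.close()
print(sent, data)
```

In the arrangement the text describes, a function like `handle_message` would run inside the Syscall thread, while the application thread only builds the message and waits for the returned value.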

The core partitioning apparatus (100) may separate the core (630) on which network system calls are performed from the core (640) on which application threads run, and may thereby reduce the cache pollution that occurs during user-kernel context switches. In addition, the core partitioning apparatus (100) may synchronize the application thread (650) and the Syscall thread (620) by using the shared memory (610) for message exchange between them.
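On Linux, one way to keep the two kinds of threads on separate cores is to have each thread set its own CPU affinity. The sketch below uses `os.sched_setaffinity`, which is available only on some platforms, and falls back gracefully elsewhere. The choice of the lowest and highest allowed cores and the name `run_pinned` are assumptions for illustration; the source does not say which mechanism is used to bind threads to cores.

```python
import os
import threading

def run_pinned(core, results, name):
    """Pin the calling thread to one core, then record where it may run."""
    try:
        os.sched_setaffinity(0, {core})   # 0 refers to the caller
        results[name] = sorted(os.sched_getaffinity(0))
    except (AttributeError, OSError):
        results[name] = None              # affinity control unavailable

results = {}
if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    # Hypothetical split: lowest allowed core for network system calls,
    # highest for application code (the same core on a single-core machine).
    syscall_core, app_core = allowed[0], allowed[-1]
else:
    syscall_core = app_core = 0

threads = [
    threading.Thread(target=run_pinned, args=(syscall_core, results, "syscall")),
    threading.Thread(target=run_pinned, args=(app_core, results, "app")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Each thread restricts only itself, so the main thread's affinity mask is left unchanged.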

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: manycore-based core partitioning apparatus
110: shared memory 130: first processing element
150: second processing element 170: network interface card
200: processing element 210: processing core
211: processing core C₀ 219: processing core C₉
230: last level cache
510: first processing element 520: shared memory
530: second processing element
610: shared memory 620: Syscall thread
630: core on which network system calls are performed
640: core on which application threads run
650: application thread

Claims

1. A manycore-based core partitioning apparatus comprising:
a shared memory that is uniquely allocated to a process and stores a message during message exchange between threads;
a first processing element that creates the process, which includes first and second threads, and that, when a system call is performed by the process, accesses the shared memory through the first thread and writes and reads the message in conjunction with event operations in each of the phases before and after the system call; and
a second processing element that, in conjunction with the event operations through the second thread, reads the message written to the shared memory by the first thread in the phase before the system call is performed and writes the result of the system call to the shared memory in the phase after the system call is performed, the second processing element including a core affinity determination policy that takes the last level cache sharing structure into account.

2. The apparatus of claim 1, wherein the shared memory
stores the message in a single buffer to maintain the synchrony of the system call.

3. The apparatus of claim 1, wherein the shared memory
stores messages sequentially in a plurality of buffers to provide asynchrony for the system call.

4. The apparatus of claim 3, wherein the second processing element
controls the second thread to analyze the messages in the plurality of buffers and performs the corresponding system calls out of order.

5. The apparatus of claim 1, wherein the first processing element
generates a first event after storing the message in the shared memory.

6. The apparatus of claim 5, wherein the second processing element,
when the first event occurs, accesses the shared memory, reads the stored message, and processes the system call, and then accesses the shared memory, modifies the message to hold the result of processing the system call, and generates a second event.

7. The apparatus of claim 6, wherein the first processing element,
when the second event occurs, accesses the shared memory and reads the modified message.

8. The apparatus of claim 1, wherein the second processing element comprises:
a plurality of processing cores; and
a last level cache that manages accesses by the plurality of processing cores according to a core affinity policy.

9. The apparatus of claim 8, wherein the last level cache
calculates, for the second thread, a core affinity for each of the plurality of processing cores and determines the processing core associated with the highest core affinity.

10. The apparatus of claim 1, wherein the core affinity determination policy that takes the last level cache sharing structure into account
determines the core affinity of the second thread by assigning progressively higher core affinity in the following order: first processing cores included in the processor socket that owns the I/O bus to which a network device is connected; second processing cores, among the first processing cores, that share a last level cache; and a third processing core, among the second processing cores, that has the highest load within a threshold.

11. A manycore-based core partitioning method performed in a core partitioning apparatus that includes a shared memory uniquely allocated to a process and storing a message during message exchange between threads, the method comprising:
a first processing step of creating the process, which includes first and second threads, and, when a system call is performed by the process, accessing the shared memory through the first thread and writing and reading the message in conjunction with event operations in each of the phases before and after the system call; and
a second processing step of, in conjunction with the event operations through the second thread, reading the message written to the shared memory by the first thread in the phase before the system call is performed and writing the result of the system call to the shared memory in the phase after the system call is performed, the second processing step including a core affinity determination step that takes the last level cache sharing structure into account.

12. The method of claim 11, wherein the shared memory
stores the message in a single buffer to maintain the synchrony of the system call.

13. The method of claim 11, wherein the first processing step
is a step of generating a first event after storing the message in the shared memory.

14. The method of claim 13, wherein the second processing step
is a step of, when the first event occurs, accessing the shared memory, reading the stored message, and processing the system call, and then accessing the shared memory, modifying the message to hold the result of processing the system call, and generating a second event.

15. The method of claim 14, wherein the first processing step
is a step of, when the second event occurs, accessing the shared memory and reading the modified message.

16. A computer-readable recording medium storing a program for performing a manycore-based core partitioning method performed in a core partitioning apparatus that includes a shared memory uniquely allocated to a process and storing a message during message exchange between threads, the method comprising:
a first processing step of creating the process, which includes first and second threads, and, when a system call is performed by the process, accessing the shared memory through the first thread and writing and reading the message in conjunction with event operations in each of the phases before and after the system call; and
a second processing step of, in conjunction with the event operations through the second thread, reading the message written to the shared memory by the first thread in the phase before the system call is performed and writing the result of the system call to the shared memory in the phase after the system call is performed, the second processing step including a core affinity determination step that takes the last level cache sharing structure into account.