KR20230036945A

KR20230036945A - Method and apparatus for managing resource in edge computing environment

Info

Publication number: KR20230036945A
Application number: KR1020210152915A
Authority: KR
Inventors: 강성주; 전재호; 김용연; 나갑주; 이준희; 전인걸
Original assignee: 한국전자통신연구원
Priority date: 2021-09-08
Filing date: 2021-11-09
Publication date: 2023-03-15

Abstract

A resource management method according to one embodiment of the present invention is performed on an edge server cluster composed of worker nodes and a master node, and comprises: receiving an inference request from an image acquisition terminal; receiving accelerator information and node information from a state analysis module; and selecting, based on the node information and the accelerator information, a node and an accelerator to perform the inference request. The present invention thereby enables efficient management of computation and acceleration resources in an edge server environment.

Description

Method and apparatus for managing resources in an edge computing environment {METHOD AND APPARATUS FOR MANAGING RESOURCE IN EDGE COMPUTING ENVIRONMENT}

The present invention relates to a method for efficiently managing resources such as accelerators in an edge computing environment.

Specifically, the present invention relates to a technique for selecting the most suitable resource in an edge computing environment that contains heterogeneous compute units and accelerators, and for using the selected resource to provide an inference service without waiting.

Advances in 5G mobile communication and edge computing, which underpin ultra-low-latency services, are making a variety of artificial intelligence services possible close to real-world objects. However, because artificial intelligence services require large-scale resources for complex computation and data storage, a way to operate them effectively is needed. Recently, approaches have been tried in which training is performed in a data center cloud and inference is performed on edge servers at real-world sites, with the two linked to deliver the artificial intelligence service.

Among the main functions that make up an artificial intelligence service, image-based inference is a technique that acquires images of various real-world situations and compares them with previously learned data to derive the probabilistically closest result. As edge computing technology advances, various kinds of on-site situational inference that were not previously possible at industrial sites are being attempted.

Representative examples include automatic identification of hazardous materials at major facilities such as airports, ports, and railways; fall detection based on analyzing the gait of workers at harsh sites; and face recognition or license plate recognition for visitors.

An edge computing environment, being close to the real world, has limited computing resources compared with a cloud computing environment. In practice, an edge server cluster deployed in such an environment is a single server or a cluster system of about 4U, and accelerator resources such as GPUs and FPGAs, which are essential for fast inference, are even more constrained because they are difficult to share among multiple applications.

Therefore, to process large volumes of image-based inference requests arriving from many image acquisition devices (e.g., CCTV cameras) with limited compute and accelerator resources, an efficient method for managing the computing resources of an edge server cluster is needed.

Korean Patent Application Publication No. 10-2021-0064033 (Title of Invention: Apparatus and Method for Managing Wireless and Computing Resources Based on Distributed Game Theory in Coexistence Edge Computing)

An object of the present invention is to efficiently manage computation and acceleration resources in an edge server environment that contains heterogeneous compute units and accelerators.

A further object of the present invention is to select an unallocated accelerator on an idle node for each of a variety of inference requests, so that the desired service is performed without waiting.

To achieve the above objects, a resource management method according to an embodiment of the present invention, performed on an edge server cluster composed of worker nodes and a master node, includes receiving an inference request from an image acquisition terminal, receiving accelerator information and node information from a state analysis module, and selecting a node and an accelerator to perform the inference request based on the node information and the accelerator information.

Here, the step of selecting the node and the accelerator to perform the inference request may select an inference application container from a virtualized inference service repository.

Here, the inference application container may have information on a specific accelerator specified in a service definition file.

Here, the step of selecting the node and the accelerator to perform the inference request may select, from among the containers that match the inference type of the received inference request, a node and an accelerator based on the node and accelerator availability information at the time of selection.

Here, the resource management method according to an embodiment of the present invention may further include transmitting information on the selected container to a cluster scheduler, and the cluster scheduler requesting the inference service from the selected worker node.

According to the present invention, computation and acceleration resources can be managed efficiently in an edge server environment that contains heterogeneous compute units and accelerators.

In addition, the present invention can select an unallocated accelerator on an idle node for each of a variety of inference requests, so that the desired service is performed without waiting.

FIG. 1 is a flowchart conceptually illustrating a resource management method in an edge computing environment according to an embodiment of the present invention.
FIG. 2 is a connection diagram showing the configuration of a resource management system according to an embodiment of the present invention.
FIG. 3 is a diagram showing in detail the configuration of a system according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating in detail a resource management method according to an embodiment of the present invention.
FIG. 5 is a diagram showing the configuration of a computer system according to an embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to a person of ordinary skill in the art to which the invention belongs, and the present invention is defined only by the scope of the claims. Like reference numerals designate like elements throughout the specification.

Although terms such as "first" and "second" are used to describe various elements, the elements are not limited by these terms. The terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" or "comprising" means that a stated element or step does not preclude the presence or addition of one or more other elements or steps.

Unless otherwise defined, all terms used in this specification may be interpreted with the meanings commonly understood by a person of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding elements are given the same reference numerals, and redundant descriptions of them are omitted.

Solutions have been proposed to make inference services easier to run on Kubernetes, a container orchestration platform. KFServing provides a high-performance, high-level abstraction interface on Kubernetes for common machine learning frameworks such as TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. KFServing abstracts Kubernetes custom resources well in order to serve these frameworks, so an inference service can be created quickly and easily.

Seldon Core is likewise an open-source platform for deploying machine learning models on Kubernetes at scale. Seldon Core converts machine learning models (TensorFlow, PyTorch, H2O, and so on) into REST-based server programs. It scales to thousands of production machine learning models and provides advanced machine learning capabilities as a service, including advanced metrics, request logging, explainers, outlier detectors, A/B tests, and canary deployments.

Intel's OpenVINO is an open-source toolkit that supports rapid development of deep learning applications. Based on CNNs, OpenVINO enables deep learning on Intel hardware accelerators and the execution of inference on compute units and accelerators such as Xeon CPUs, Iris GPUs, Arria FPGAs, and Movidius NPUs. OpenVINO Model Server allows inference to be offered as a service using models optimized with the OpenVINO toolkit.

A developer or operator who has a machine learning inference model can use these solutions to take a given model as input and produce an inference service (server) program as output. In an actual execution environment, however, compute units and accelerators such as CPUs, GPUs, FPGAs, and NPUs are resources that services (processes) acquire by competing with one another.

Consequently, in an environment that actually serves inference, an inference application (server) handling a request may wait a long time for an accelerator that has already been claimed by another application. Alternatively, inference may fail to start even though another compute unit or accelerator capable of the same inference is available, because the inference service cannot use that idle accelerator. This is because, with solutions such as KFServing, Seldon Core, and OpenVINO, the accelerator used for inference is fixed at the time the model is converted into an inference service.

To solve this problem, in an edge server environment that contains heterogeneous compute units and accelerators, the present invention selects the most suitable resource among the heterogeneous compute/accelerator resources currently on the edge server cluster when an image acquisition device issues an inference request. It then extracts from a container repository an application that performs the inference on the selected type of compute/accelerator resource and activates it on a node in the cluster, so that the desired inference service can be performed without waiting.

FIG. 1 is a flowchart conceptually illustrating a resource management method in an edge computing environment according to an embodiment of the present invention.

The resource management method according to this embodiment may be performed on an edge server cluster composed of worker nodes and a master node.

A plurality of worker nodes may exist in the edge server cluster.

The edge server cluster generally contains a single master node, but the scope of the present invention is not limited thereto.

Referring to FIG. 1, the resource management method according to the embodiment first receives an inference request from an image acquisition terminal (S110).

The inference request may concern, for example, automatic identification of hazardous materials, fall detection based on analyzing the gait of workers at harsh sites, face recognition of visitors, or vehicle license plate recognition.

The step of receiving the inference request (S110) may be performed by the inference request analysis module (inference request analyzer) of a worker node.

Next, accelerator information and node information are received from the state analysis module (state extractor) (S120).

The accelerator information may include the type of each accelerator and whether the accelerator is idle.

The accelerator types may include a CPU (central processing unit), a GPU (graphics processing unit), an FPGA (field-programmable gate array), and an NPU (neural processing unit). However, the scope of the present invention is not limited to these, and various other types of compute units and accelerators may be used.
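As a minimal sketch of the accelerator information described above, the record below carries only an accelerator type and an idle flag. The names `AcceleratorType`, `AcceleratorInfo`, and `idle_accelerators` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from enum import Enum


class AcceleratorType(Enum):
    """Compute/accelerator kinds named in the description (not an exhaustive list)."""
    CPU = "CPU"
    GPU = "GPU"
    FPGA = "FPGA"
    NPU = "NPU"


@dataclass(frozen=True)
class AcceleratorInfo:
    """Hypothetical record: the accelerator type and whether it is idle."""
    kind: AcceleratorType
    idle: bool


def idle_accelerators(infos):
    """Return only the records whose accelerator is currently idle."""
    return [info for info in infos if info.idle]


infos = [
    AcceleratorInfo(AcceleratorType.GPU, idle=False),
    AcceleratorInfo(AcceleratorType.FPGA, idle=True),
]
idle = idle_accelerators(infos)
```

Using an enumeration keeps the set of known types explicit while leaving room to add further members, in line with the statement that the list of types is not limiting.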

Next, a node and an accelerator to perform the inference request are selected based on the node information and the accelerator information (S130).

The step of selecting the node and the accelerator to perform the inference request (S130) may be performed by the accelerator determination module (accelerator determiner).

Although not shown in FIG. 1, the information held by the inference request analysis module and the state analysis module may be passed to the accelerator determination module so that step S130 can be performed.

The virtualized inference service repository may correspond to the space that stores the containers dedicated to inference applications that run on the accelerators in the edge server, together with their service definition files.

Although not shown in FIG. 1, the resource management method according to an embodiment of the present invention may further include transmitting information on the selected container to a cluster scheduler, and the cluster scheduler requesting the inference service from the selected worker node.
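The selection in step S130 can be sketched as follows. The specification does not state a selection policy, so taking the first idle accelerator of a usable type is an assumption made here for illustration, and `IdleAccelerator` and `select_node_and_accelerator` are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence, Set


class IdleAccelerator(NamedTuple):
    """Hypothetical view of one idle accelerator reported for a node."""
    node_name: str
    accel_type: str  # e.g. "GPU"
    accel_id: str


def select_node_and_accelerator(
    usable_types: Set[str],
    idle_accelerators: Sequence[IdleAccelerator],
) -> Optional[IdleAccelerator]:
    """S130 sketch: return the first idle accelerator whose type can serve the request.

    usable_types is the set of accelerator types for which a matching
    inference container exists (an assumed input, not defined in the source).
    """
    for candidate in idle_accelerators:
        if candidate.accel_type in usable_types:
            return candidate
    return None  # nothing suitable is idle right now


idle_now = [
    IdleAccelerator("worker-1", "FPGA", "fpga-0"),
    IdleAccelerator("worker-2", "GPU", "gpu-1"),
]
choice = select_node_and_accelerator({"GPU", "CPU"}, idle_now)
```

Because the result names both the node and the accelerator, it carries what a scheduler would need in order to place the matching container.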

FIG. 2 is a connection diagram showing the configuration of a resource management system according to an embodiment of the present invention.

Referring to FIG. 2, the resource management system according to an embodiment of the present invention may include an image acquisition device 100, a master node 200, and worker nodes 300-1 and 300-2.

The image acquisition device 100, the master node 200, and the worker nodes 300-1 and 300-2 may be connected through a network 10.

The image acquisition device 100 may be any device that issues inference requests, including a smartphone, a personal computer (PC), a tablet PC, or a CCTV camera.

FIG. 3 is a diagram showing in detail the configuration of a system according to an embodiment of the present invention.

Referring to FIG. 3, a plurality of edge server nodes form an edge server cluster. When the system is built on Kubernetes, the cluster contains a master node and worker nodes, and the components of the present invention are added to each node alongside the standard components.

The components of the present invention are an inference request analyzer 310, a state extractor 320, a compute/accelerator determiner 210, a cluster state monitor 220, and a virtualized inference service repository 410.

The inference request analyzer 310 is deployed on a Kubernetes worker node as a pod (the smallest unit of an application in Kubernetes, consisting of a set of containers). Its role is to monitor the accelerator information of the worker node on which it is deployed, either at regular intervals or when an inference request arrives from an external image acquisition device.

Then, based on the worker node state information being collected by the cluster state monitor 220, it determines which compute units and accelerators (CPU, GPU, FPGA, NPU) on that worker node are idle. The resulting state information is ultimately passed to the compute/accelerator determiner 210.

The state extractor 320 collects compute/accelerator information for the node on which it is deployed. The information it stores may include the compute/accelerator type, the accelerator ID, and the accelerator state for that node.

The compute/accelerator determiner 210 receives accelerator information from the inference request analyzers 310 of all worker nodes in the cluster, receives idle node information from the cluster state monitor 220, and updates its <idle accelerator type, accelerator ID, node name> records.

It also looks up, in the virtualized inference service repository 410, the inference container that matches the <inference type> requested by the user. Finally, it asks the cluster scheduler to schedule the selected inference application pod onto the selected node.
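The record keeping described for the compute/accelerator determiner 210 might look like the sketch below. The input shapes are assumptions: each report is a `(node_name, accel_type, accel_id, is_idle)` tuple, and the idle node information is a set of node names. The function name `update_idle_table` is hypothetical.

```python
def update_idle_table(accelerator_reports, idle_nodes):
    """Build <idle accelerator type, accelerator ID, node name> entries.

    accelerator_reports: iterable of (node_name, accel_type, accel_id, is_idle),
        an assumed format for what the inference request analyzers pass on.
    idle_nodes: set of node names that the cluster state monitor reports as idle.
    """
    table = []
    for node_name, accel_type, accel_id, is_idle in accelerator_reports:
        # Keep an entry only if the accelerator is idle and its node is idle.
        if is_idle and node_name in idle_nodes:
            table.append((accel_type, accel_id, node_name))
    return table


reports = [
    ("worker-1", "GPU", "gpu-0", False),
    ("worker-1", "FPGA", "fpga-0", True),
    ("worker-2", "NPU", "npu-0", True),
]
table = update_idle_table(reports, idle_nodes={"worker-1"})
```

Rebuilding the table from the latest reports, rather than patching it in place, is one simple way to avoid stale entries when an accelerator is claimed between two updates.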

The cluster state monitor 220 is a server that scrapes the information collected by the state extractor 320 deployed on each worker node. In this way it gathers <idle accelerator type, accelerator ID> information for each worker node.
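A rough sketch of the scraping performed by the cluster state monitor 220 follows. Each state extractor is modeled as a zero-argument callable, and the report keys `"type"`, `"id"`, and `"state"` are a hypothetical format chosen to mirror the three fields the state extractor is said to store.

```python
def scrape_idle_accelerators(extractors):
    """Poll every worker's state extractor and keep only the idle accelerators.

    extractors: mapping of node name -> callable returning a list of dicts
        with keys "type", "id" and "state" (an assumed report format).
    Returns a mapping of node name -> list of (accelerator type, accelerator ID).
    """
    idle_by_node = {}
    for node_name, extract in extractors.items():
        idle_by_node[node_name] = [
            (acc["type"], acc["id"])
            for acc in extract()
            if acc["state"] == "idle"
        ]
    return idle_by_node


extractors = {
    "worker-1": lambda: [
        {"type": "GPU", "id": "gpu-0", "state": "busy"},
        {"type": "FPGA", "id": "fpga-0", "state": "idle"},
    ],
    "worker-2": lambda: [{"type": "CPU", "id": "cpu-0", "state": "idle"}],
}
snapshot = scrape_idle_accelerators(extractors)
```

In a real cluster the callables would be replaced by network requests to each worker node; that transport is outside what the specification describes.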

The virtualized inference service repository 410 stores the containers dedicated to inference applications that run on the compute units and accelerators (CPU, FPGA, GPU, NPU) in the edge server, together with their service definition files. Each inference application container has exactly one specific accelerator specified in its service definition file, and can be run when the compute/accelerator determiner 210 requests it from the scheduler. The labels of the inference application containers in the virtualized inference service repository 410 of FIG. 3 have the following meanings, where A, B, C, and D denote different kinds of inference applications, such as face recognition and object recognition.

- I_A H_C = application that performs type-A inference on a CPU

- I_B H_G = application that performs type-B inference on a GPU

- I_C H_F = application that performs type-C inference on an FPGA

- I_D H_N = application that performs type-D inference on an NPU
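The label notation above can be decoded mechanically. The sketch below assumes the labels are written as plain text such as `I_A H_C` (with or without the space); the helper names `parse_label` and `containers_for` are hypothetical.

```python
import re

# Hardware suffixes used in the labels of FIG. 3.
HARDWARE = {"C": "CPU", "G": "GPU", "F": "FPGA", "N": "NPU"}
_LABEL = re.compile(r"^I_([A-Z])\s*H_([CGFN])$")


def parse_label(label):
    """Split a label such as "I_A H_C" into (inference type, hardware name)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized container label: {label!r}")
    return match.group(1), HARDWARE[match.group(2)]


def containers_for(inference_type, labels):
    """Return the labels whose inference type matches the requested one."""
    return [label for label in labels if parse_label(label)[0] == inference_type]


labels = ["I_A H_C", "I_A H_G", "I_B H_G", "I_C H_F", "I_D H_N"]
type_a = containers_for("A", labels)
```

Filtering by inference type first and by idle hardware second matches the order in which the description narrows down the candidate containers.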

FIG. 4 is a flowchart illustrating in detail a resource management method according to an embodiment of the present invention.

Referring to FIG. 4, the resource management method according to an embodiment of the present invention includes an inference request analysis step, an idle compute/accelerator determination step, an inference service extraction step, and an inference service execution step.

The following describes in detail, with reference to FIG. 4, the sequence in an actual edge computing environment from the arrival at the server of an inference request from an image acquisition device to the deployment on a worker node of the virtualized inference service application that handles the request.

First, the image acquisition device 100 requests an inference service from the inference request analyzer 310 based on an acquired image (S202).

Having received the inference service request from the image acquisition device 100 (S204), the inference request analyzer 310 forwards the received inference request to the cluster state monitor 220 (S206).

The cluster state monitor 220 obtains, from the state extractors 320 of all worker nodes in the cluster, information on the compute units and accelerators each node currently holds and on their availability.

In more detail, the cluster state monitor 220 sends an accelerator information request to the state extractor 320 of each worker node (S208).

On receiving the accelerator information request, the state extractor 320 monitors the accelerator information (S210) and returns the accelerator information and node information (S212).

Having received the accelerator information and node information, the cluster state monitor 220 passes the idle accelerator and node information to the inference request analyzer 310 (S214).

The inference request analyzer 310 passes the identified idle compute/accelerator information and node information to the compute/accelerator determiner 210 (S216).

Based on the currently idle node and accelerator information, the compute/accelerator determiner 210 selects, from the virtualized inference service repository 410, a container that matches the type of inference requested by the image acquisition device (S218).

The virtualized inference service repository 410 returns information on the selected inference service application (S220).

The compute/accelerator determiner 210 passes the selected inference container information to the cluster scheduler 230 and requests execution of the inference service application (S222).

The cluster scheduler 230 runs the inference service on the selected worker node. The address of the running inference server is passed to the image acquisition device 100, and the inference service begins.
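The sequence above can be walked through in a single function, with comments marking the corresponding step numbers. This is a sketch under stated assumptions: the dictionary shapes for `worker_states` and `repository`, the first-match policy, and the name `handle_inference_request` are all hypothetical stand-ins for the components of FIG. 4.

```python
def handle_inference_request(inference_type, worker_states, repository):
    """Walk one request through steps S202 to S222 and return a placement, or None.

    worker_states: {node_name: [(accel_type, accel_id, is_idle), ...]}, standing in
        for what each state extractor 320 returns (S208 to S212).
    repository: {(inference_type, accel_type): container_name}, standing in for
        the virtualized inference service repository 410.
    """
    # S206 to S214: the cluster state monitor gathers the idle accelerators per node.
    idle = [
        (node, accel_type, accel_id)
        for node, accelerators in worker_states.items()
        for accel_type, accel_id, is_idle in accelerators
        if is_idle
    ]
    # S216 to S220: the determiner looks for a container that matches both the
    # requested inference type and the type of an idle accelerator.
    for node, accel_type, accel_id in idle:
        container = repository.get((inference_type, accel_type))
        if container is not None:
            # S222: this is the information handed to the cluster scheduler.
            return {"node": node, "accelerator": accel_id, "container": container}
    return None  # no idle accelerator has a matching container


worker_states = {
    "worker-1": [("GPU", "gpu-0", False), ("FPGA", "fpga-0", True)],
    "worker-2": [("GPU", "gpu-0", True)],
}
repository = {("face", "GPU"): "face-gpu", ("face", "CPU"): "face-cpu"}
placement = handle_inference_request("face", worker_states, repository)
```

Here the idle FPGA on `worker-1` is skipped because no FPGA container exists for the requested type, and the idle GPU on `worker-2` is chosen instead of waiting for the busy GPU on `worker-1`.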

With the existing approach, the resources required to run an application are written in the service definition file, and the Kubernetes scheduler deploys pods on that basis. To make use of the various idle accelerators, a service operator therefore has to repeatedly edit the pod definition file and repeatedly resubmit the deployment request, which is cumbersome.
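To illustrate why the existing approach is cumbersome, the sketch below builds a Kubernetes Pod manifest as a Python dictionary with one accelerator resource pinned in `resources.limits`. `nvidia.com/gpu` is the extended resource name exposed by the NVIDIA device plugin; the pod name, the image names, and the `example.com/fpga` resource name are hypothetical placeholders.

```python
def build_pod_manifest(name, image, resource_name, count=1):
    """Return a Pod manifest (as a dict) that pins exactly one accelerator kind."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The accelerator kind is fixed here, before scheduling.
                    "resources": {"limits": {resource_name: count}},
                }
            ]
        },
    }


gpu_pod = build_pod_manifest(
    "face-recognition", "registry.example/face:gpu", "nvidia.com/gpu"
)
# Switching to another accelerator means producing and submitting a new definition.
fpga_pod = build_pod_manifest(
    "face-recognition", "registry.example/face:fpga", "example.com/fpga"
)
```

Each accelerator kind needs its own definition, so moving a service to whichever accelerator happens to be idle requires a manual edit and a new deployment request.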

Implementing a system according to the present invention makes it possible to find unallocated accelerators on idle nodes for a variety of inference requests and to perform the desired inference service without waiting. This can contribute to more efficient development and operation of related services and, as a result, can improve overall inference performance in a Kubernetes-based edge computing environment.

FIG. 5 is a diagram showing the configuration of a computer system according to an embodiment.

The apparatus for managing resources in an edge computing environment according to the embodiment may be implemented in a computer system 1000 such as a computer-readable recording medium.

The computer system 1000 may include one or more processors 1010, a memory 1030, a user interface input device 1040, a user interface output device 1050, and storage 1060, which communicate with one another over a bus 1020. The computer system 1000 may further include a network interface 1070 connected to a network 1080. The processor 1010 may be a central processing unit or a semiconductor device that executes programs or processing instructions stored in the memory 1030 or the storage 1060. The memory 1030 and the storage 1060 may be storage media including at least one of a volatile medium, a nonvolatile medium, a removable medium, a non-removable medium, a communication medium, and an information delivery medium. For example, the memory 1030 may include ROM 1031 or RAM 1032.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between the elements shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. Moreover, unless an element is specifically described with a term such as "essential" or "important", it may not be a necessary element for applying the present invention.

Therefore, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set out below but also everything equivalent to, or equivalently modified from, those claims falls within the scope of the spirit of the present invention.

100: image acquisition device
200: master node
300: worker node
10: network

Claims

A resource management method in an edge computing environment, performed on an edge server cluster composed of worker nodes and a master node, the method comprising:
receiving an inference request from an image acquisition terminal;
receiving accelerator information and node information from a state analysis module; and
selecting a node and an accelerator to perform the inference request based on the node information and the accelerator information.