KR20220029004A - Cloud-based deep learning task execution time prediction system and method

Info

Publication number: KR20220029004A
Application number: KR1020200110778A
Authority: KR
Inventors: 이경용
Original assignee: 국민대학교산학협력단
Priority date: 2020-09-01
Filing date: 2020-09-01
Publication date: 2022-03-08
Also published as: KR102504939B1; WO2022050477A1

Abstract

The present invention relates to a system and method for predicting the execution time of a cloud-based deep learning task. The system includes: a feature vector generation unit that generates feature vectors for a plurality of deep learning algorithms; a prediction model construction unit that constructs a performance prediction model by learning a plurality of training data generated by executing each of the plurality of deep learning algorithms on a plurality of cloud instances; a candidate feature vector generation unit that generates a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted; a performance feature vector prediction unit that predicts a performance feature vector by applying the candidate feature vector to the performance prediction model; and an execution time prediction unit that predicts the execution time of the candidate deep learning algorithm on the basis of the performance feature vector. Accordingly, an effective environment can be built by predicting the time required per unit operation when a deep learning task is performed.

Description

CLOUD-BASED DEEP LEARNING TASK EXECUTION TIME PREDICTION SYSTEM AND METHOD

The present invention relates to a technology for predicting the execution time of a deep learning task and, more particularly, to a cloud-based system and method for predicting the execution time of a deep learning task that can help build an effective environment by predicting the time required per unit operation when a deep learning training task is performed on various hardware resources.

Deep learning algorithms have recently shown excellent performance in various fields and are broadening the range of applications of artificial intelligence. Because training a deep learning model requires a large amount of computing resources over a short period, training tasks are mainly carried out in cloud environments.

However, because cloud computing services offer so many types of resources, users have great difficulty building an optimal deep learning training environment from the available services. Prices also differ widely between cloud instances, so selecting the instance that is most efficient in terms of both performance and cost for a training task is very important, yet difficult.

Meanwhile, a deep learning (artificial intelligence) platform may refer to a tool for developing products or services that make artificial intelligence technologies, such as image processing, speech recognition, and natural language processing, available to users as needed. The core artificial intelligence technologies implemented in recent years are general-purpose in nature and applicable to various fields, and artificial intelligence may be the core technology of a deep learning platform.

Korean Patent Application Publication No. 10-2017-0078012 (2017.07.07)

An embodiment of the present invention seeks to provide a cloud-based system and method for predicting the execution time of a deep learning task that can help build an effective environment by predicting the time required per unit operation when a deep learning training task is performed on various hardware resources.

An embodiment of the present invention seeks to provide a cloud-based system and method for predicting the execution time of a deep learning task that enables a cost-effective environment to be built by accurately inferring the optimal resources required to execute user-defined code in a cloud computing environment.

Among the embodiments, a cloud-based system for predicting the execution time of a deep learning task includes: a feature vector generation unit that generates feature vectors for a plurality of deep learning algorithms; a prediction model construction unit that constructs a performance prediction model by learning a plurality of training data generated by executing each of the plurality of deep learning algorithms on a plurality of cloud instances; a candidate feature vector generation unit that generates a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted; a performance feature vector prediction unit that predicts a performance feature vector by applying the candidate feature vector to the performance prediction model; and an execution time prediction unit that predicts the execution time of the candidate deep learning algorithm based on the performance feature vector.

The feature vector generation unit may monitor the training process that results from executing deep learning training code implementing a deep learning algorithm, and may determine the performance metric generated as a result of the monitoring to be the feature vector for that deep learning algorithm.

The feature vector generation unit may generate a compressed feature vector by extracting only specific fields from among the plurality of fields that make up the performance metric.

The prediction model construction unit may generate n first feature vectors by repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on a first cloud instance, generate m second feature vectors by repeatedly executing the specific deep learning algorithm m times (where m is a natural number) on a second cloud instance, and construct the performance prediction model by including, in the plurality of training data, the n*m feature vector pairs formed by combining the first and second feature vectors.

The feature vector generation unit may group a plurality of feature vectors based on the distance between the vectors, and the prediction model construction unit may independently construct the performance prediction model for each of the at least one vector group generated as a result of the grouping.

The candidate feature vector generation unit may generate the candidate feature vector by executing candidate deep learning training code, which implements the candidate deep learning algorithm, on the lowest-cost cloud instance.

The performance feature vector prediction unit may select one of the at least one vector group based on the candidate feature vector, and may predict the performance feature vector using the performance prediction model corresponding to the selected vector group.

The execution time prediction unit may predict the execution time by applying the performance feature vector to a regression (regressor) model.

Among the embodiments, a cloud-based method for predicting the execution time of a deep learning task includes: generating feature vectors for a plurality of deep learning algorithms; constructing a performance prediction model by learning a plurality of training data generated by executing each of the plurality of deep learning algorithms on a plurality of cloud instances; generating a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted; predicting a performance feature vector by applying the candidate feature vector to the performance prediction model; and predicting the execution time of the candidate deep learning algorithm based on the performance feature vector.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited thereby.

The cloud-based system and method for predicting the execution time of a deep learning task according to an embodiment of the present invention can help build an effective environment by predicting the time required per unit operation when a deep learning training task is performed on various hardware resources.

The cloud-based system and method for predicting the execution time of a deep learning task according to an embodiment of the present invention can enable a cost-effective environment to be built by accurately inferring the optimal resources required to execute user-defined code in a cloud computing environment.

FIG. 1 is a diagram illustrating a cloud-based system for predicting the execution time of a deep learning task according to the present invention.
FIG. 2 is a diagram illustrating the system configuration of the execution time prediction apparatus of FIG. 1.
FIG. 3 is a diagram illustrating the functional configuration of the execution time prediction apparatus of FIG. 1.
FIG. 4 is a flowchart illustrating a process of predicting the execution time of a cloud-based deep learning task according to the present invention.
FIG. 5 is an exemplary diagram illustrating a process of generating a feature vector according to the present invention.
FIG. 6 is an exemplary diagram illustrating a process of generating a performance prediction model according to the present invention.
FIG. 7 is an exemplary diagram illustrating a process of predicting the final execution time according to the present invention.

The description of the present invention is merely a set of embodiments provided for structural or functional explanation, so the scope of the present invention should not be construed as being limited by the embodiments described herein. That is, because the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only those effects, so the scope of the present invention should not be understood as being limited thereby.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly, a second component may be named a first component.

When a component is referred to as being "connected" to another component, it may be directly connected to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is referred to as being "directly connected" to another component, it should be understood that no intervening component is present. Other expressions describing the relationship between components, such as "between" and "directly between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The identification codes for the steps (for example, a, b, c, and so on) are used for convenience of description and do not indicate the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention may be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly so defined in the present application.

FIG. 1 is a diagram illustrating a cloud-based system for predicting the execution time of a deep learning task according to the present invention.

Referring to FIG. 1, the execution time prediction system 100 may include a user terminal 110, an execution time prediction apparatus 130, a cloud server 150, and a database 170.

The user terminal 110 may be a computing device capable of using a cloud service and may be implemented as a smartphone, a laptop, or a computer; it is not limited to these and may also be implemented as various other devices such as a tablet PC. The user terminal 110 may be connected to the execution time prediction apparatus 130 through a network, and a plurality of user terminals 110 may be connected to the execution time prediction apparatus 130 simultaneously. The user terminal 110 may also be directly connected to the cloud server 150, and may install and run a dedicated program or application for using the cloud service.

The execution time prediction apparatus 130 may be implemented as a system, or a corresponding server, that runs an algorithm capable of recommending an optimal environment for performing a deep learning training task in a cloud computing environment. The execution time prediction apparatus 130 may be connected to the user terminal 110 through a network and may exchange information with it.

The execution time prediction apparatus 130 may also operate in conjunction with at least one external system. For example, the external system may include the cloud server 150 that provides a cloud service, an artificial intelligence server that performs deep learning training, a payment server for service payment, and the like.

In one embodiment, the execution time prediction apparatus 130 may work with the database 170 to store the data needed to predict the execution time of a deep learning task in a cloud computing environment and to recommend an optimal deep learning environment using cloud services. The execution time prediction apparatus 130 may be implemented to include a processor, a memory, a user input/output unit, and a network input/output unit, which are described in more detail with reference to FIG. 2.

The cloud server 150 may be a server that provides a cloud service. The cloud server 150 may be connected to the execution time prediction apparatus 130 through a network, and may be directly connected to the user terminal 110. The cloud server 150 may provide various cloud instances for the deep learning training performed by the execution time prediction apparatus 130. In one embodiment, the cloud server 150 may serve as a server that provides a deep learning platform.

The database 170 may be a storage device that stores the various information needed during the operation of the execution time prediction apparatus 130. The database 170 may store information about deep learning algorithms and the corresponding deep learning training code, and may store information about the feature vectors and training data for the deep learning algorithms. It is not limited to these and may store information collected or processed in various forms during the process of predicting the execution time of a cloud-based deep learning task.

FIG. 2 is a diagram illustrating the system configuration of the execution time prediction apparatus of FIG. 1.

Referring to FIG. 2, the execution time prediction apparatus 130 may be implemented to include a processor 210, a memory 230, a user input/output unit 250, and a network input/output unit 270.

The processor 210 may execute the procedures that handle each step of the operation of the execution time prediction apparatus 130, may manage the memory 230 that is read or written throughout that process, and may schedule the synchronization time between the volatile and non-volatile memory in the memory 230. The processor 210 may control the overall operation of the execution time prediction apparatus 130, and may be electrically connected to the memory 230, the user input/output unit 250, and the network input/output unit 270 to control the data flow between them. The processor 210 may be implemented as the central processing unit (CPU) of the execution time prediction apparatus 130.

The memory 230 may include an auxiliary storage device, implemented as non-volatile memory such as a solid state drive (SSD) or a hard disk drive (HDD), that stores all the data required by the execution time prediction apparatus 130, and may include a main storage device implemented as volatile memory such as random access memory (RAM).

The user input/output unit 250 may include an environment for receiving user input and an environment for outputting specific information to the user. For example, the user input/output unit 250 may include an input device with an adapter, such as a touch pad, a touch screen, an on-screen keyboard, or a pointing device, and an output device with an adapter, such as a monitor or a touch screen. In one embodiment, the user input/output unit 250 may be a computing device accessed through a remote connection, in which case the execution time prediction apparatus 130 may operate as a server.

The network input/output unit 270 includes an environment for connecting to an external device or system through a network and may include, for example, an adapter for communication over a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), or a value added network (VAN).

FIG. 3 is a diagram illustrating the functional configuration of the execution time prediction apparatus of FIG. 1.

Referring to FIG. 3, the execution time prediction apparatus 130 may include a feature vector generation unit 310, a prediction model construction unit 320, a candidate feature vector generation unit 330, a performance feature vector prediction unit 340, an execution time prediction unit 350, and a control unit 360.

The feature vector generation unit 310 may generate feature vectors for a plurality of deep learning algorithms. The feature vector corresponding to a deep learning algorithm may be the feature information derived when the training code implementing that algorithm is executed on a cloud instance. By newly defining a feature vector that expresses the feature information of each deep learning algorithm, the feature vector generation unit 310 can provide the input data needed to construct a highly accurate prediction model.

In one embodiment, the feature vector generation unit 310 may monitor the training process that results from executing deep learning training code implementing a deep learning algorithm, and may determine the performance metric generated as a result of the monitoring to be the feature vector for that deep learning algorithm. For example, in FIG. 5, the feature vector generation unit 310 may generate the feature vector corresponding to each deep learning algorithm using the performance metric 550 that a deep learning platform 530, such as TensorFlow or PyTorch, provides while the deep learning training code 510 is running. The deep learning platform 530 may provide the performance metric 550 together with a visualization tool so that the user can observe the characteristics of the task and monitor its progress. More specifically, TensorFlow provides n = 2046 feature values during the modeling process, and these feature values may include both blank values and valid values depending on the characteristics of the deep learning algorithm.

In one embodiment, the feature vector generation unit 310 may generate a compressed feature vector by extracting only specific fields from among the plurality of fields that make up the performance metric. When TensorFlow is used, the feature vector generation unit 310 may construct the feature vector by extracting, from the 2046 performance metrics, only the fields closely related to the execution of the deep learning algorithm. For example, the BatchMatMul field may be a metric indicating the time spent on matrix multiplication during deep learning training, and the feature vector generation unit 310 may generate a compressed feature vector by extracting such related fields from the performance metric.
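The field extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the metric is assumed to arrive as a dictionary keyed by field name, only BatchMatMul comes from the description, and the other field names, the `compress_feature_vector` helper, and the choice to replace blank values with 0.0 are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def compress_feature_vector(
    metrics: Dict[str, Optional[float]],
    selected_fields: Sequence[str],
    fill: float = 0.0,
) -> List[float]:
    """Keep only the selected fields, in a fixed order.

    Blank (missing or None) values are replaced by `fill`; this
    handling of blanks is an assumption made for the sketch.
    """
    vector = []
    for field in selected_fields:
        value = metrics.get(field)
        vector.append(fill if value is None else float(value))
    return vector


# Hypothetical raw metric: only "BatchMatMul" is named in the description.
raw_metrics = {"BatchMatMul": 12.5, "Conv2D": 48.0, "Relu": None, "Unused": 3.0}
SELECTED_FIELDS = ["BatchMatMul", "Conv2D", "Relu"]

compressed = compress_feature_vector(raw_metrics, SELECTED_FIELDS)
print(compressed)
```

Fixing the field order keeps vectors from different runs comparable element by element, which matters for the distance-based grouping and the model training described elsewhere in this document.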

In one embodiment, the feature vector generation unit 310 may group a plurality of feature vectors based on the distance between the vectors. When predicting the execution time of a deep learning algorithm, a user may write their own new code to implement the algorithm and thereby define a new training model. In addition, because there are so many deep learning algorithms, it may not be easy to cover the various deep learning training codes that implement them with a single performance prediction model. The feature vector generation unit 310 may therefore group similar deep learning algorithms into a cluster so that a performance prediction model is generated independently for each cluster, and may classify algorithms as similar based on the distance between their feature vectors.
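One way to picture distance-based grouping is the sketch below. The description does not name a clustering algorithm, so the greedy threshold scheme, the Euclidean distance, the `group_by_distance` helper, and the threshold value are all assumptions chosen for brevity.

```python
import math
from typing import List, Sequence


def group_by_distance(
    vectors: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Greedily group vector indices by Euclidean distance to group centroids.

    Each vector joins the nearest existing group whose centroid lies within
    `threshold`; otherwise it starts a new group. This is an illustrative
    stand-in for whatever clustering method an implementation would use.
    """
    groups: List[List[int]] = []
    centroids: List[List[float]] = []
    for idx, vec in enumerate(vectors):
        best, best_dist = None, threshold
        for g, centroid in enumerate(centroids):
            dist = math.dist(vec, centroid)
            if dist <= best_dist:
                best, best_dist = g, dist
        if best is None:
            groups.append([idx])
            centroids.append([float(v) for v in vec])
        else:
            groups[best].append(idx)
            size = len(groups[best])
            # Update the running mean of the group's centroid.
            centroids[best] = [
                (c * (size - 1) + v) / size for c, v in zip(centroids[best], vec)
            ]
    return groups


# Hypothetical two-dimensional feature vectors for four algorithms.
feature_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
groups = group_by_distance(feature_vectors, threshold=1.0)
print(groups)
```

Each resulting group would then receive its own performance prediction model, so that a new algorithm is handled by the model trained on its nearest neighbors.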

The prediction model construction unit 320 may construct a performance prediction model by learning a plurality of training data generated by executing each of a plurality of deep learning algorithms on a plurality of cloud instances. The input of the performance prediction model may be a feature vector extracted by running user-defined deep learning training code on a cloud instance of an arbitrary type. The instance type on which the deep learning task was run may be referred to as the anchor type. That is, the value predicted by the performance prediction model may correspond to the feature vector that would be generated by running the same deep learning code on a cloud instance of an instance type other than the anchor type.

For example, in FIG. 6, if the instance type of the anchor node is g3.2xlarge, the first feature vector 610 generated by executing the user-defined code on g3.2xlarge may be the input of the performance prediction model 630. Based on that input, the performance prediction model 630 may predict the second feature vectors 650 that would result if the code were executed on a cloud instance of another instance type (for example, p2.xlarge). In other words, building the performance prediction model 630 requires feature vectors generated on various instance types as training data, and the prediction model building unit 320 may learn, as training data, the feature vectors generated by executing a single algorithm on various cloud instances.

In one embodiment, the prediction model building unit 320 may generate n first feature vectors as a result of repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on a first cloud instance, generate m second feature vectors as a result of repeatedly executing the same algorithm m times (where m is a natural number) on a second cloud instance, and build the performance prediction model by including the n*m feature vector pairs formed by combining the first and second feature vectors in the plurality of training data. That is, the prediction model building unit 320 may increase the generality of the performance prediction model by applying a new data augmentation technique in the cloud environment to secure a large amount of training data.

More specifically, the prediction model building unit 320 may generate n first feature vectors as a result of repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on the first cloud instance. Here, the first cloud instance may correspond to a cloud instance of the anchor type. Owing to the nature of the cloud, the n first feature vectors may have similar values, but the values may differ slightly because of differences in execution time and operating state.

Next, the prediction model building unit 320 may generate m second feature vectors as a result of repeatedly executing the same deep learning algorithm m times (where m is a natural number) on the second cloud instance. Here, the second cloud instance may correspond to an instance type other than the anchor type, and the second feature vectors may likewise have similar but slightly different values.

Next, the prediction model building unit 320 may build the performance prediction model by including the n*m feature vector pairs formed by combining the first and second feature vectors in the plurality of training data. That is, each feature vector pair may correspond to one item of training data for building the performance prediction model, and the two vectors of the pair may correspond to the input data and the output data, respectively.
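The pairing step can be illustrated with a short sketch. It assumes the pairs are the full Cartesian product of the anchor-type runs and the other-type runs, which is what yields n*m items; the function name and the sample values are hypothetical.

```python
from itertools import product

def build_training_pairs(first_vectors, second_vectors):
    """Combine every first (anchor-type) feature vector with every second
    (other-type) feature vector, giving n*m (input, output) pairs."""
    return list(product(first_vectors, second_vectors))

# Hypothetical feature vectors: n = 3 runs on the anchor instance,
# m = 2 runs on another instance type.
first_vectors = [(0.61, 0.40), (0.63, 0.41), (0.60, 0.39)]
second_vectors = [(0.35, 0.22), (0.36, 0.21)]

pairs = build_training_pairs(first_vectors, second_vectors)
# Each pair is (input feature vector, output feature vector).
```

With only n + m actual executions, the model sees n*m training items, which is the augmentation effect the description refers to.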

In one embodiment, the prediction model building unit 320 may independently build a performance prediction model for each of the at least one vector group generated as a result of the grouping. The feature vector generator 310 may group a plurality of feature vectors based on the distance between the vectors; in this case, the prediction model building unit 320 may increase prediction accuracy by separately building a performance prediction model for each cluster produced by the grouping, that is, for each vector group.

The candidate feature vector generator 330 may generate a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted. Once the performance prediction model has been built by the prediction model building unit 320, the candidate feature vector generator 330 may generate a feature vector as a result of executing, on a cloud instance of the anchor type, the training code that implements the candidate deep learning algorithm whose performance is actually to be predicted. In a later step, the candidate feature vector may be used as the input of the performance prediction model.

In one embodiment, the candidate feature vector generator 330 may generate the candidate feature vector as a result of executing candidate deep learning training code, which implements the candidate deep learning algorithm, on the lowest-cost cloud instance. The candidate feature vector used as the input of the performance prediction model needs to be obtained by running the code on a reference cloud instance, and the candidate feature vector generator 330 may generate it using the cloud instance that can be provisioned at the lowest cost.

The performance feature vector predictor 340 may predict a performance feature vector by applying the candidate feature vector to the performance prediction model. That is, based on the candidate feature vector for the deep learning training code written by the user, the performance prediction model may predict, and provide as its output, the feature vector that would be generated if the code were run on another instance type.

In one embodiment, the performance feature vector predictor 340 may select one of the at least one vector group based on the candidate feature vector and predict the performance feature vector using the performance prediction model corresponding to that vector group. The performance feature vector predictor 340 may determine a specific vector group according to the distance between the candidate feature vector and the grouped vectors, and may select the performance prediction model built for that vector group and use it to predict the performance feature vector.
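A minimal sketch of this selection step follows. It assumes each vector group is summarized by a centroid and that the nearest centroid (by Euclidean distance) determines the group; the description only says the group is chosen according to vector distance. The group names, centroids, and stand-in models are hypothetical.

```python
import math

def nearest_group(candidate, centroids):
    """Return the name of the group whose centroid is closest to the
    candidate feature vector (Euclidean distance via math.dist)."""
    return min(centroids, key=lambda name: math.dist(candidate, centroids[name]))

# Hypothetical group centroids and per-group stand-in models. A real
# performance prediction model would be trained on that group's pairs.
centroids = {"group_a": (0.9, 0.8), "group_b": (0.2, 0.3)}
models = {
    "group_a": lambda v: tuple(0.5 * x for x in v),
    "group_b": lambda v: tuple(0.8 * x for x in v),
}

candidate = (0.85, 0.75)
group = nearest_group(candidate, centroids)
predicted_vector = models[group](candidate)
```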

The execution time prediction unit 350 may predict the execution time of the candidate deep learning algorithm based on the performance feature vector. The performance feature vector may correspond to the performance metrics monitored while deep learning training code runs on a particular cloud instance. Using information collected from past actual runs, namely similar performance metrics and their actual execution times, the execution time of the candidate deep learning algorithm can be predicted. To this end, the execution time prediction unit 350 may use regression analysis, a statistical analysis methodology, to predict the execution time.

In one embodiment, the execution time prediction unit 350 may predict the execution time by applying the performance feature vector to a regression (regressor) model. For example, in FIG. 7, the regression model 730 may be generated in advance through regression analysis between feature vectors of deep learning algorithms and their actual execution times, and the execution time prediction unit 350 may predict the actual execution time by applying the performance feature vector 710, predicted by the performance prediction model, to the regression model 730. That is, the regression model 730 may correspond to an analysis model that relates the feature vectors created when the training data was generated to the training time of the steps executed to produce those feature vectors.
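The description does not specify the form of the regression model. As a deliberately simplified sketch, the following assumes a single scalar feature and ordinary least-squares linear regression; a real feature vector would have several fields and might call for a different regressor. All names and sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_time(feature, slope, intercept):
    """Apply the fitted line to a (predicted) scalar feature value."""
    return slope * feature + intercept

# Hypothetical history: one metric value per past run and the measured
# time per training step in seconds for that run.
metric_values = [0.2, 0.4, 0.6, 0.8]
step_times = [1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_line(metric_values, step_times)
estimated_time = predict_time(0.5, slope, intercept)
```

The regression is fitted once, in advance, on measured runs; at prediction time it is applied to the feature vector produced by the performance prediction model rather than to a measured one.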

The control unit 360 may manage the control flow or data flow among the feature vector generator 310, the prediction model building unit 320, the candidate feature vector generator 330, the performance feature vector predictor 340, and the execution time prediction unit 350.

FIG. 4 is a flowchart illustrating a process of predicting the execution time of a cloud-based deep learning task according to the present invention.

Referring to FIG. 4, the execution time prediction device 130 may generate feature vectors for a plurality of deep learning algorithms through the feature vector generator 310 (step S410). The execution time prediction device 130 may build a performance prediction model through the prediction model building unit 320 by learning a plurality of training data generated as a result of executing each of the plurality of deep learning algorithms on a plurality of cloud instances (step S420).

The execution time prediction device 130 may also generate, through the candidate feature vector generator 330, a candidate feature vector for the candidate deep learning algorithm whose execution time is to be predicted (step S430). The execution time prediction device 130 may predict a performance feature vector through the performance feature vector predictor 340 by applying the candidate feature vector to the performance prediction model (step S440). The execution time prediction device 130 may then predict, through the execution time prediction unit 350, the execution time of the candidate deep learning algorithm based on the performance feature vector (step S450).
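The prediction-time part of the flow (steps S430 to S450) amounts to chaining two previously built models. The sketch below shows only that data flow; the function name and both stand-in models are hypothetical placeholders, not the patented models.

```python
def predict_execution_time(candidate_vector, performance_model, regression_model):
    """Chain the two models: candidate feature vector (from S430) ->
    predicted performance feature vector (S440) -> execution time (S450)."""
    performance_vector = performance_model(candidate_vector)  # step S440
    return regression_model(performance_vector)               # step S450

# Hypothetical stand-ins for the models built in steps S410 and S420.
performance_model = lambda v: [2 * x for x in v]
regression_model = lambda v: sum(v)

estimated = predict_execution_time([1, 2], performance_model, regression_model)
```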

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: execution time prediction system
110: user terminal 130: execution time prediction device
150: cloud server 170: database
210: processor 230: memory
250: user input/output unit 270: network input/output unit
310: feature vector generator 320: prediction model building unit
330: candidate feature vector generator 340: performance feature vector predictor
350: execution time prediction unit 360: control unit
510: deep learning training code 530: deep learning platform
550: performance metrics
610: first feature vector 630: performance prediction model
650: second feature vector
710: performance feature vector 730: regression model

Claims

A system for predicting the execution time of a cloud-based deep learning task, the system comprising:
a feature vector generator that generates feature vectors for a plurality of deep learning algorithms;
a prediction model building unit that builds a performance prediction model by learning a plurality of training data generated as a result of executing each of the plurality of deep learning algorithms on a plurality of cloud instances;
a candidate feature vector generator that generates a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted;
a performance feature vector predictor that predicts a performance feature vector by applying the candidate feature vector to the performance prediction model; and
an execution time prediction unit that predicts the execution time of the candidate deep learning algorithm based on the performance feature vector.

The system of claim 1, wherein the feature vector generator
monitors a training process resulting from execution of deep learning training code that implements a deep learning algorithm, and determines the performance metrics generated as a result of the monitoring as the feature vector for that deep learning algorithm.

The system of claim 2, wherein the feature vector generator
generates a compressed feature vector by extracting only specific fields from among the plurality of fields constituting the performance metrics.

The system of claim 1, wherein the prediction model building unit
generates n first feature vectors as a result of repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on a first cloud instance, generates m second feature vectors as a result of repeatedly executing the specific deep learning algorithm m times (where m is a natural number) on a second cloud instance, and builds the performance prediction model by including the n*m feature vector pairs formed by combining the first and second feature vectors in the plurality of training data.

The system of claim 1, wherein
the feature vector generator groups a plurality of feature vectors based on the distance between the vectors, and
the prediction model building unit independently builds the performance prediction model for each of the at least one vector group generated as a result of the grouping.

The system of claim 2, wherein the candidate feature vector generator
generates the candidate feature vector as a result of executing candidate deep learning training code, which implements the candidate deep learning algorithm, on the lowest-cost cloud instance.

The system of claim 5, wherein the performance feature vector predictor
selects one of the at least one vector group based on the candidate feature vector and predicts the performance feature vector using the performance prediction model corresponding to that vector group.

The system of claim 1, wherein the execution time prediction unit
predicts the execution time by applying the performance feature vector to a regression (regressor) model.

A method of predicting the execution time of a cloud-based deep learning task, the method comprising:
generating feature vectors for a plurality of deep learning algorithms;
building a performance prediction model by learning a plurality of training data generated as a result of executing each of the plurality of deep learning algorithms on a plurality of cloud instances;
generating a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted;
predicting a performance feature vector by applying the candidate feature vector to the performance prediction model; and
predicting the execution time of the candidate deep learning algorithm based on the performance feature vector.