KR20220094564A

KR20220094564A - Method and apparatus for automatically reducing model weight for deep learning model serving optimization, and a method for providing cloud inference services using the same

Info

Publication number: KR20220094564A
Application number: KR1020200185894A
Authority: KR
Inventors: 이경용; 손태선
Original assignee: 국민대학교산학협력단
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2022-07-06
Also published as: WO2022145564A1; KR102613367B1

Abstract

The present invention relates to a method and apparatus for automatically lightweighting a model for deep learning model serving optimization, and a method for providing a cloud inference service using the same. The method comprises the steps of: receiving a deep learning algorithm for building a deep learning model; dividing the deep learning algorithm into a plurality of operation steps; determining at least one branch point existing between the operation steps in a learning process according to the deep learning algorithm; generating at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm; and completing the deep learning model and the at least one intermediate deep learning model upon completion of the learning process. Accordingly, a deep learning model inference service can be provided smoothly.

Description

Method and apparatus for automatically reducing model weight for deep learning model serving optimization, and a method for providing cloud inference services using the same

The present invention relates to deep learning model generation technology and, more particularly, to a method and apparatus for automatically lightweighting a model for deep learning model serving optimization, which automatically generate deep learning models of various levels so that a deep learning model inference service can be provided smoothly, and to a method for providing a cloud inference service using the same.

Recently, as deep learning techniques have been applied to a wide range of applications, their use in embedded environments has been increasing. In general, the more accurate a deep learning model is, the longer it may take to produce a result. However, embedded environments are constrained in time and energy, and existing neural networks cannot cope dynamically with these constraints.

The same situation applies to cloud computing environments. Because deep learning workloads require large-scale computing resources, much of the work for both training and inference services is currently carried out in cloud computing environments. A deep learning inference service uses a generated model to provide a prediction service in response to user requests.

Unlike a training service, an inference service must scale with the number of user requests. Among deep learning inference models, a model with a complex structure takes a long time to infer but is highly accurate, whereas a model with a simple structure infers quickly but has the drawback of lower accuracy. Deep learning algorithm developers typically devote most of their effort to building highly accurate models.

However, when an inference service for a deep learning model is actually being provided and user requests surge, a model that offers fast inference at a slight cost in accuracy may be more effective for the application service than a highly accurate model that requires long computation.

Korean Patent Registration No. 10-0820723 (2008.04.02)

An embodiment of the present invention seeks to provide a method and apparatus for automatically lightweighting a model for deep learning model serving optimization, which automatically generate deep learning models of various levels so that a deep learning model inference service can be provided smoothly, and a method for providing a cloud inference service using the same.

An embodiment of the present invention seeks to provide a method and apparatus for automatically lightweighting a model for deep learning model serving optimization, which, while a deep learning researcher is developing a complex, highly accurate model, can generate from that model a variety of deep learning prediction models with different prediction accuracies and model complexities, and a method for providing a cloud inference service using the same.

Among the embodiments, a method for automatically lightweighting a model for deep learning model serving optimization includes: receiving a deep learning algorithm for building a deep learning model; dividing the deep learning algorithm into a plurality of operation steps; determining at least one branch point existing between the plurality of operation steps in a learning process according to the deep learning algorithm; generating at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the last operation step of the deep learning algorithm; and completing the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.
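As a rough illustration of the branching described above, the following sketch treats a model as an ordered list of operation steps and forms each intermediate model from the steps up to a branch point followed directly by the last operation step. The function name `build_intermediate_models` and the string step labels are hypothetical and are not taken from the specification.

```python
from typing import List


def build_intermediate_models(steps: List[str], branch_points: List[int]) -> List[List[str]]:
    """Return one step list per usable branch point.

    steps         -- ordered operation steps of the full model
    branch_points -- indices i meaning "branch right after steps[i]"
    Each intermediate model keeps steps[0..i] and then jumps to the last step.
    """
    if len(steps) < 3:
        return []  # nothing can be skipped in such a short model
    final_step = steps[-1]
    models = []
    for bp in branch_points:
        # Skip branch points that would reproduce the full model unchanged.
        if 0 <= bp < len(steps) - 2:
            models.append(steps[:bp + 1] + [final_step])
    return models
```

In a real framework the final step of each intermediate model would need its own parameters sized to the output of the branch point; the sketch only shows the structural idea.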

The deep learning algorithm may include a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN).

The deep learning model and the at least one intermediate deep learning model may each differ in prediction accuracy and computation speed.

The dividing into the plurality of operation steps may include: dividing the operations of the deep learning algorithm into a plurality of layers; and determining operation steps corresponding to each of the plurality of layers.

The dividing into the plurality of operation steps may include: determining a repetition section that is performed repeatedly during the operation of the deep learning algorithm; determining a pre-repetition section and a post-repetition section with respect to the repetition section; determining at least one unit section for the repetition section; and arranging the pre-repetition section, the at least one unit section, and the post-repetition section in order to determine the plurality of operation steps.
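The division into a pre-repetition section, unit sections, and a post-repetition section can be sketched as follows. This is a minimal illustration that assumes the algorithm is given as a flat list of layer-type labels and that the repeating unit pattern is already known; the function name `split_into_steps` and both of those assumptions are hypothetical rather than part of the specification.

```python
from typing import List


def split_into_steps(layers: List[str], unit: List[str]) -> List[List[str]]:
    """Split layers into [pre-section, unit sections..., post-section]."""
    if not unit:
        raise ValueError("unit pattern must not be empty")
    n, k = len(layers), len(unit)
    # Locate the first occurrence of the repeating unit pattern.
    start = next((i for i in range(n - k + 1) if layers[i:i + k] == unit), None)
    if start is None:
        return [layers]  # no repetition section: a single operation step
    steps = []
    if start > 0:
        steps.append(layers[:start])       # section before the repetition
    i = start
    while layers[i:i + k] == unit:
        steps.append(layers[i:i + k])      # one unit section per repetition
        i += k
    if i < n:
        steps.append(layers[i:])           # section after the repetition
    return steps
```

For example, `split_into_steps(["input", "conv", "pool", "conv", "pool", "flatten", "dense"], ["conv", "pool"])` yields four ordered operation steps: the input step, two convolution/pooling unit sections, and the closing flatten/dense step.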

The determining of the at least one branch point may include determining, as a branch point, each point at which the repetition section ends.

The determining of the at least one branch point may include, when the deep learning algorithm is a CNN, determining as the branch point the end point of a repetition section in which at least one convolution layer and a pooling layer are performed sequentially.
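For the CNN case, branch point detection might look like the sketch below, which marks a branch point after every pooling layer that closes a run of one or more convolution layers. The layer labels `"conv"` and `"pool"` and the function name `cnn_branch_points` are illustrative assumptions.

```python
from typing import List


def cnn_branch_points(layers: List[str]) -> List[int]:
    """Return indices i such that a branch may leave the network after layers[i]."""
    points = []
    seen_conv = False
    for i, kind in enumerate(layers):
        if kind == "conv":
            seen_conv = True               # inside a convolution run
        elif kind == "pool" and seen_conv:
            points.append(i)               # conv(s) followed by pooling: block ends here
            seen_conv = False
        else:
            seen_conv = False              # any other layer breaks the pattern
    return points
```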

The generating of the at least one intermediate deep learning model may include: defining a candidate intermediate deep learning model according to the branch; calculating the adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model; and determining the candidate intermediate deep learning model to be an intermediate deep learning model when the adequacy of the corresponding branch point satisfies a preset condition.

The calculating of the adequacy may include calculating the adequacy as the product (A*L) of the number of layers (L) and the prediction accuracy (A), and the determining as the intermediate deep learning model may include determining the candidate intermediate deep learning model to be the intermediate deep learning model when its adequacy is twice that of the intermediate deep learning model generated in the previous step.
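The adequacy test can be illustrated with a short sketch. It computes A*L for each candidate in branch order and keeps a candidate when its score is at least twice the score of the most recently kept intermediate model. Two points are assumptions rather than statements from the specification: "twice" is read as "at least twice", and the first candidate is always kept because there is no earlier model to compare against. The function name is hypothetical.

```python
from typing import List, Tuple


def select_intermediate_models(candidates: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """candidates: (number of layers L, prediction accuracy A) in branch order."""
    selected = []
    last_score = None
    for num_layers, accuracy in candidates:
        score = accuracy * num_layers      # adequacy = A * L
        # Assumption: keep the first candidate; afterwards require a 2x increase.
        if last_score is None or score >= 2 * last_score:
            selected.append((num_layers, accuracy))
            last_score = score
    return selected
```

With candidates `(2, 0.60)`, `(4, 0.70)`, `(6, 0.75)`, and `(8, 0.80)`, the scores are 1.2, 2.8, 4.5, and 6.4, so the third candidate is dropped because 4.5 is less than twice 2.8.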

Among the embodiments, an apparatus for automatically lightweighting a model for deep learning model serving optimization includes: an algorithm receiving unit that receives a deep learning algorithm for building a deep learning model; an algorithm analysis unit that divides the deep learning algorithm into a plurality of operation steps; a branch point determining unit that determines at least one branch point existing between the plurality of operation steps in a learning process according to the deep learning algorithm; an intermediate model generating unit that generates at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the last operation step of the deep learning algorithm; and a deep learning model building unit that completes the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.

The intermediate model generating unit may define a candidate intermediate deep learning model according to the branch, calculate the adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model, and determine the candidate intermediate deep learning model to be an intermediate deep learning model when the adequacy of the corresponding branch point satisfies a preset condition.

The intermediate model generating unit may calculate the adequacy as the product (A*L) of the number of layers (L) and the prediction accuracy (A), and may determine the candidate intermediate deep learning model to be the intermediate deep learning model when its adequacy is twice that of the intermediate deep learning model generated in the previous step.

Among the embodiments, a method for providing a cloud inference service using the method for automatically lightweighting a model for deep learning model serving optimization includes: receiving a request for an inference service from a user terminal; determining the status of available cloud resources as of the time the request is received, predicting a response generation time for the request, and selecting one of a plurality of deep learning models according to the predicted response generation time; and generating a response to the request using the selected deep learning model and providing the response to the user terminal.
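One way to picture the model selection step is the sketch below. The specification does not say how the response generation time is predicted or what rule maps it to a model, so everything here is an assumption made for illustration: the `ModelProfile` fields, the simple queue-based latency estimate, the `deadline_ms` target, and the fallback to the fastest model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelProfile:
    name: str
    accuracy: float
    base_latency_ms: float  # hypothetical latency on an idle system


def predict_latency_ms(model: ModelProfile, pending_requests: int, workers: int) -> float:
    # Hypothetical estimate: requests are served in waves of `workers` at a time.
    waves = 1 + pending_requests // max(workers, 1)
    return model.base_latency_ms * waves


def choose_model(models: List[ModelProfile], pending_requests: int,
                 workers: int, deadline_ms: float) -> ModelProfile:
    """Pick the most accurate model predicted to answer within the deadline."""
    fitting = [m for m in models
               if predict_latency_ms(m, pending_requests, workers) <= deadline_ms]
    if fitting:
        return max(fitting, key=lambda m: m.accuracy)
    # Nothing fits: fall back to the fastest model.
    return min(models, key=lambda m: m.base_latency_ms)
```

Under this sketch an idle system serves the full model, while a growing backlog shifts requests to progressively lighter intermediate models.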

The plurality of deep learning models may be generated by the method for automatically lightweighting a model for deep learning model serving optimization so that each differs in prediction accuracy and computation speed.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited thereby.

The method and apparatus for automatically lightweighting a model for deep learning model serving optimization according to an embodiment of the present invention, and the method for providing a cloud inference service using the same, can automatically generate deep learning models of various levels so that a deep learning model inference service is provided smoothly.

The method and apparatus for automatically lightweighting a model for deep learning model serving optimization according to an embodiment of the present invention, and the method for providing a cloud inference service using the same, can generate, while a deep learning researcher is developing a complex, highly accurate model, a variety of deep learning prediction models with different prediction accuracies and model complexities from that model.

FIG. 1 is a diagram illustrating an automatic model lightweighting system according to the present invention.
FIG. 2 is a diagram illustrating the system configuration of the automatic model lightweighting apparatus of FIG. 1.
FIG. 3 is a diagram illustrating the functional configuration of the automatic model lightweighting apparatus of FIG. 1.
FIG. 4 is a flowchart illustrating a method for automatically lightweighting a model for deep learning model serving optimization according to the present invention.
FIG. 5 is a flowchart illustrating a method for providing a cloud inference service according to the present invention.
FIG. 6 is a diagram illustrating the layer configuration of a CNN.
FIGS. 7 and 8 are diagrams illustrating an embodiment of the automatic model lightweighting method according to the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of the present invention should not be construed as being limited by the embodiments described herein. That is, because the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only such effects, so the scope of the present invention should not be understood as being limited thereby.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it may be directly connected to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "immediately between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "comprise" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identification codes for the steps (for example, a, b, c) are used for convenience of description and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, the computer-readable recording medium may be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted as having an ideal or excessively formal meaning unless explicitly so defined in the present application.

FIG. 1 is a diagram illustrating an automatic model lightweighting system according to the present invention.

Referring to FIG. 1, the automatic model lightweighting system 100 may include a user terminal 110, an automatic model lightweighting apparatus 130, and a database 150.

The user terminal 110 may correspond to a computing device operated by a user. That is, the user may request an inference service from the automatic model lightweighting apparatus 130 through the user terminal 110 for applications such as image classification and natural language processing. The user terminal 110 may be implemented as a smartphone, a notebook computer, or a desktop computer, but is not necessarily limited thereto and may also be implemented as various other devices such as a tablet PC.

In addition, the user terminal 110 may be connected to the automatic model lightweighting apparatus 130 through a network, and a plurality of user terminals 110 may be connected to the automatic model lightweighting apparatus 130 at the same time. The user terminal 110 may also install and run a dedicated program or application for interworking with the automatic model lightweighting apparatus 130.

The automatic model lightweighting apparatus 130 may be implemented as a server corresponding to a computer or program capable of building a deep learning model by performing learning based on a deep learning algorithm set by the user. In the process of building a deep learning model corresponding to the deep learning algorithm requested by the user, the automatic model lightweighting apparatus 130 may automatically generate simple, lower-accuracy models from the complex, high-accuracy model without user intervention. The automatic model lightweighting apparatus 130 may be connected to the user terminal 110 through a wired network or a wireless network such as Bluetooth, WiFi, or LTE, and may transmit and receive data to and from the user terminal 110 through the network.

In addition, the automatic model lightweighting apparatus 130 may operate in conjunction with an external system (not shown in FIG. 1) to collect data or provide additional functions. For example, the external system may include a cloud server that provides a cloud service. In this case, the automatic model lightweighting apparatus 130 may process the inference service requested by the user in conjunction with the cloud server. That is, the automatic model lightweighting apparatus 130 may forward the user's inference service request to the cloud server as a query, receive the response, and provide it to the user.

Meanwhile, the automatic model lightweighting apparatus 130 may be implemented as part of the cloud server. In this case, in providing the cloud inference service, the automatic model lightweighting apparatus 130 may build a plurality of deep learning models of varying accuracy and complexity from a single deep learning model, and on that basis may efficiently perform inference for user requests and provide service responses.

The database 150 may correspond to a storage device that stores various information required during the operation of the automatic model lightweighting apparatus 130. The database 150 may store deep learning algorithms and training data and may store the deep learning models built through learning, but is not necessarily limited thereto; it may also store information collected or processed in various forms while the automatic model lightweighting apparatus 130 performs the method for automatically lightweighting a model for deep learning model serving optimization.

FIG. 2 is a diagram illustrating the system configuration of the automatic model lightweighting apparatus of FIG. 1.

Referring to FIG. 2, the automatic model lightweighting apparatus 130 may be implemented to include a processor 210, a memory 230, a user input/output unit 250, and a network input/output unit 270.

The processor 210 may execute procedures that process each step in the operation of the automatic model lightweighting apparatus 130, may manage the memory 230 that is read or written throughout that process, and may schedule the synchronization time between the volatile memory and the non-volatile memory in the memory 230. The processor 210 may control the overall operation of the automatic model lightweighting apparatus 130 and may be electrically connected to the memory 230, the user input/output unit 250, and the network input/output unit 270 to control the data flow among them. The processor 210 may be implemented as the central processing unit (CPU) of the automatic model lightweighting apparatus 130.

The memory 230 may include an auxiliary storage device, implemented as non-volatile memory such as a solid state drive (SSD) or a hard disk drive (HDD), that is used to store all the data required by the automatic model lightweighting apparatus 130, and may include a main memory implemented as volatile memory such as random access memory (RAM).

The user input/output unit 250 may include an environment for receiving user input and an environment for outputting specific information to the user. For example, the user input/output unit 250 may include an input device including an adapter such as a touch pad, a touch screen, an on-screen keyboard, or a pointing device, and an output device including an adapter such as a monitor or a touch screen. In an embodiment, the user input/output unit 250 may correspond to a computing device connected through remote access, in which case the automatic model lightweighting apparatus 130 may operate as an independent server.

The network input/output unit 270 includes an environment for connecting to an external device or system through a network and may include, for example, an adapter for communication over a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), or a value added network (VAN).

FIG. 3 is a diagram illustrating the functional configuration of the automatic model lightweighting apparatus of FIG. 1.

Referring to FIG. 3, the automatic model lightweighting apparatus 130 may include an algorithm receiving unit 310, an algorithm analysis unit 320, a branch point determining unit 330, an intermediate model generating unit 340, a deep learning model building unit 350, and a control unit 360.

The algorithm receiving unit 310 may receive a deep learning algorithm for building a deep learning model. Here, the deep learning algorithm may include a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), and the like. The algorithm receiving unit 310 may store deep learning algorithms in the database 150, and may also handle reception of a deep learning algorithm by receiving only selection information about a specific deep learning algorithm from the user terminal 110. Many variants of these representative algorithms exist, so the algorithm receiving unit 310 may also receive configuration settings for the deep learning algorithm from the user terminal 110 and thereby specify the operation steps of the algorithm and their order.

The algorithm analysis unit 320 may divide the deep learning algorithm into a plurality of operation steps. A deep learning algorithm may be defined to perform the learning process by passing training data sequentially through the operation steps while cumulatively updating the weights of each step. The algorithm analysis unit 320 may analyze the deep learning algorithm selected or input by the user and define independent operation steps based on the points at which unit operations end during execution.

In one embodiment, the algorithm analysis unit 320 may divide the deep learning algorithm into operation steps by dividing its operations into a plurality of layers and determining the operation step corresponding to each layer. A deep learning algorithm may be defined as sets of unit operations grouped into layers. For example, a CNN, one type of deep learning model for image classification, may consist of a convolution layer (image processing layer), a fully-connected layer, an activation layer, a softmax layer, and the like. The algorithm analysis unit 320 may divide the operations of the deep learning algorithm into layers based on the algorithm's basic layers and the specific parameters set by the user, and may determine the operation steps on that basis. The connection points between independent operation steps may then be used in a later step as branch points for generating intermediate deep learning models.

In one embodiment, the algorithm analysis unit 320 may determine a repetition section that is performed repeatedly during operation of the deep learning algorithm, and may determine a pre-repetition section and a post-repetition section relative to it. The algorithm analysis unit 320 may also divide the repetition section into at least one unit section, and may arrange the pre-repetition section, the at least one unit section, and the post-repetition section in the algorithm's order of operation to determine the plurality of operation steps. In a deep learning model, the performance (for example, accuracy) of the final model may depend on a learning process that repeatedly stacks various layers. Accordingly, the algorithm analysis unit 320 may determine the operation steps of the deep learning algorithm based on the start and end points of the repetition section.

Meanwhile, depending on its settings, the repetition section of a deep learning algorithm may be implemented so that identical unit sections run consecutively. For example, in the VGG model (VGGNet), a first unit section consisting of two convolution operations and one max-pooling operation may be performed repeatedly, and a second unit section consisting of three convolution operations and one max-pooling operation may also be performed repeatedly. The algorithm analysis unit 320 may divide the repetition section itself into unit sections and treat each as an independent operation step.
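The division into a pre-repetition section, unit sections, and a post-repetition section can be sketched as follows. This is only an illustration under stated assumptions: the layer names, the function `split_into_steps`, and the rule that a unit section ends at each pooling layer are hypothetical and are not taken from the specification.

```python
from typing import List


def split_into_steps(layers: List[str]) -> List[List[str]]:
    """Group a flat layer list into operation steps.

    Assumption: the repetition section starts at the first convolution
    layer, and each unit section ends at a pooling layer. Layers before
    the first convolution form the pre-repetition section; layers after
    the last pooling layer form the post-repetition section.
    """
    first_conv = next(
        (i for i, name in enumerate(layers) if name == "conv"), len(layers)
    )
    steps: List[List[str]] = []
    if first_conv > 0:
        steps.append(layers[:first_conv])  # pre-repetition section
    current: List[str] = []
    for name in layers[first_conv:]:
        current.append(name)
        if name == "pool":  # a unit section ends here
            steps.append(current)
            current = []
    if current:
        steps.append(current)  # post-repetition section
    return steps


# A VGG-like layout: two (conv, conv, pool) units, one (conv, conv, conv, pool) unit.
vgg_like = [
    "input",
    "conv", "conv", "pool",
    "conv", "conv", "pool",
    "conv", "conv", "conv", "pool",
    "flatten", "dense", "softmax",
]
steps = split_into_steps(vgg_like)
for step in steps:
    print(step)
```

Each boundary between consecutive steps in the result is a candidate connection point between operation steps.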

The branch point determination unit 330 may determine at least one branch point existing between the operation steps in the learning process of the deep learning algorithm. Here, a branch point is a point between the operation steps that make up the deep learning algorithm, at which an intermediate deep learning model branches off independently from the progress direction of the deep learning model. An intermediate deep learning model generated from a branch point therefore has lower accuracy than the original deep learning model, but its reduced complexity may increase its computation speed. The branch point determination unit 330 may determine the branch points so that the accuracies of the intermediate deep learning models generated alongside the original deep learning model are evenly distributed.

In one embodiment, the branch point determination unit 330 may determine a branch point at each point where the repetition section ends. Each time the repetition section is repeated, model performance increases through cumulative learning, but complexity also increases and computation speed may decrease. Accordingly, the branch point determination unit 330 may determine branch points based on the end points of the repetition section. Note, however, that even though identical operation steps accumulate at each end point, the performance of the intermediate deep learning model may not change linearly.

In one embodiment, when the deep learning algorithm is a CNN, the branch point determination unit 330 may determine as a branch point the end point of a repetition section in which at least one convolution layer and a pooling layer are performed in sequence. A CNN may consist of convolution layers, pooling layers, and fully-connected layers. The convolution and pooling layers form the feature extraction stage and may be repeated a predetermined number of times, while the fully-connected layer forms the classification stage and may be performed as a single operation. For a CNN, the branch point determination unit 330 may determine branch points based on the end points of the pooling layers. Simply attaching the fully-connected layer at each branch point then yields multiple intermediate deep learning models that share the same operation structure and differ only in the number of repetitions.
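As a rough sketch of this CNN case, the snippet below marks the end of each pooling layer as a branch point and attaches the same classification head there. The helper names `pooling_branch_points` and `attach_head`, the string layer names, and the choice to exclude the final pooling end (which leads to the original model) are assumptions made for illustration only.

```python
from typing import List


def pooling_branch_points(feature_layers: List[str]) -> List[int]:
    """Return the index just after each pooling layer, except the last.

    Assumption: the end of the last pooling layer leads to the original
    model, so it is not treated as a branch point here.
    """
    ends = [i + 1 for i, name in enumerate(feature_layers) if name == "pool"]
    return ends[:-1]


def attach_head(feature_layers: List[str], branch_point: int, head: List[str]) -> List[str]:
    """Build an intermediate model: layers up to the branch point, then the head."""
    return feature_layers[:branch_point] + head


features = [
    "conv", "conv", "pool",
    "conv", "conv", "pool",
    "conv", "conv", "conv", "pool",
]
head = ["flatten", "dense", "softmax"]  # classification stage

branch_points = pooling_branch_points(features)
intermediate_models = [attach_head(features, bp, head) for bp in branch_points]
original_model = features + head

for model in intermediate_models + [original_model]:
    print(len(model), model)
```

Every model produced this way ends in the same head and differs only in how many unit sections precede it, which matches the "same structure, different number of repetitions" property described above.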

The intermediate model generation unit 340 may generate at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the last operation step of the deep learning algorithm. That is, an intermediate deep learning model may correspond to a deep learning model generated from the original by omitting part of the learning process. Compared with the original, the intermediate deep learning model therefore has reduced complexity and higher computation speed, but its prediction performance (that is, accuracy) may be lower. The intermediate model generation unit 340 may generate an intermediate deep learning model that proceeds directly from the branch point to the last operation step without performing any additional learning process. The last operation step may be configured identically to the last operation step of the original deep learning model or, if necessary, may consist of only part of it.

In one embodiment, the intermediate model generation unit 340 may define a candidate intermediate deep learning model for a branch, calculate the adequacy of the branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate, and adopt the candidate as an intermediate deep learning model if the adequacy satisfies a preset condition. Here, the number of layers (L) may correspond to the total number of layers making up the deep learning model, and the prediction accuracy (A) may correspond to a performance metric of the generated model, expressed relative to the original deep learning model's performance taken as 100%.

That is, instead of generating an intermediate deep learning model at every branch point, the intermediate model generation unit 340 may select suitable branch points. If too many intermediate deep learning models are generated, many models with similar performance must be managed. If too few are generated, the gaps in computation speed and accuracy between them grow large, which may constrain an inference service that uses them.

In one embodiment, the intermediate model generation unit 340 may calculate the adequacy of a branch point as the product (A*L) of the prediction accuracy (A) and the number of layers (L), and may adopt the candidate as an intermediate deep learning model when its adequacy is twice that of the intermediate deep learning model generated in the previous step. That is, the intermediate model generation unit 340 may calculate the A*L value as the adequacy of the branch point and generate an intermediate deep learning model each time that value doubles from its initial value. As a result, more intermediate deep learning models may be generated early in modeling, where accuracy rises quickly, and relatively fewer toward the end, where the layers deepen but accuracy improves little.
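One possible reading of the A*L rule is sketched below. The function `select_branch_points`, the sample (L, A) pairs, and the decision to always keep the first candidate as the baseline are assumptions for illustration; the specification does not fix these details, and the numbers are not measured results.

```python
from typing import List, Optional, Tuple


def select_branch_points(
    candidates: List[Tuple[int, float]], factor: float = 2.0
) -> List[int]:
    """Pick candidate indices whose adequacy A*L has grown by `factor`.

    candidates: (number of layers L, relative accuracy A) per branch
    point, in order of increasing depth.
    Assumption: the first candidate is kept as the baseline, and each
    later candidate is kept once its A*L is at least `factor` times the
    A*L of the most recently kept candidate.
    """
    selected: List[int] = []
    last_score: Optional[float] = None
    for index, (num_layers, accuracy) in enumerate(candidates):
        score = accuracy * num_layers
        if last_score is None or score >= factor * last_score:
            selected.append(index)
            last_score = score
    return selected


# Made-up (L, A) values, with A relative to the original model (1.0 = 100%).
candidates = [(4, 0.50), (7, 0.70), (10, 0.80), (13, 0.85), (16, 0.88), (19, 0.90)]
chosen = select_branch_points(candidates)
print(chosen)
```

With these made-up values the rule keeps candidates 0, 1, and 3 and skips the rest, so not every branch point produces an intermediate model.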

The deep learning model building unit 350 may complete the deep learning model and the at least one intermediate deep learning model upon completion of the learning process. The deep learning model building unit 350 may perform learning on given training data according to the deep learning algorithm and, once learning on all the training data is finished, may complete the original deep learning model together with the at least one intermediate deep learning model derived from it. That is, the deep learning model building unit 350 may automatically build a plurality of deep learning models from a single deep learning algorithm and a single training data set.

The deep learning models generated in this way may differ from one another in prediction accuracy and computation speed, and an inference service that uses them may apply a model selectively according to the operating conditions of the system. For example, when service requests surge while the inference service is running, a lightweight deep learning model with fast computation may be used to respond more quickly, and when service requests are few, a more complex deep learning model may be used to respond more accurately.

The control unit 360 may control the overall operation of the automatic model lightweighting device 130 and manage the control flow or data flow among the algorithm receiving unit 310, the algorithm analysis unit 320, the branch point determination unit 330, the intermediate model generation unit 340, and the deep learning model building unit 350.

FIG. 4 is a flowchart illustrating the automatic model lightweighting method for deep learning model serving optimization according to the present invention.

Referring to FIG. 4, the automatic model lightweighting device 130 may receive, through the algorithm receiving unit 310, a deep learning algorithm for building a deep learning model from the user terminal 110 (step S410). The automatic model lightweighting device 130 may divide the deep learning algorithm into a plurality of operation steps through the algorithm analysis unit 320 (step S430). The automatic model lightweighting device 130 may determine, through the branch point determination unit 330, at least one branch point existing between the operation steps in the learning process of the deep learning algorithm (step S450).

In addition, the automatic model lightweighting device 130 may generate, through the intermediate model generation unit 340, at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the last operation step of the deep learning algorithm (step S470). The automatic model lightweighting device 130 may complete, through the deep learning model building unit 350, the deep learning model and the at least one intermediate deep learning model upon completion of the learning process (step S490).

FIG. 5 is a flowchart illustrating a method of providing a cloud inference service according to the present invention.

Referring to FIG. 5, the automatic model lightweighting method for deep learning model serving optimization according to the present invention can generate a plurality of deep learning models that differ in prediction accuracy and computation speed, and these models can be used to provide a cloud inference service.

More specifically, in step S510, the method of providing a cloud inference service according to the present invention may receive a request for the inference service from the user terminal 110. If the cloud inference service providing device is implemented independently of the automatic model lightweighting device according to the present invention, the automatic model lightweighting device may receive the request and forward it to the cloud inference service providing device.

In step S530, the method of providing a cloud inference service may select one of the plurality of pre-built deep learning models. Here, the method may determine the state of available cloud resources at the time the request is received, predict the time needed to generate a response to the request, and select one of the deep learning models according to that predicted response generation time.

For example, if generating a response to a service request with the original deep learning model is predicted to take 10 seconds given the current state of available cloud resources, a less complex deep learning model may be selected, at some cost in accuracy, to generate the response in less than 10 seconds.
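The selection in step S530 could be sketched as below. Everything here is hypothetical: the `ModelProfile` fields, the simple latency model (idle latency divided by the fraction of resources currently available), the deadline parameter, and the fallback to the fastest model are assumptions chosen to make the idea concrete, not details given in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelProfile:
    name: str
    relative_accuracy: float  # original model = 1.0
    idle_latency_s: float     # response time on an otherwise idle system


def predict_latency(model: ModelProfile, available_fraction: float) -> float:
    """Assumed latency model: latency grows as available resources shrink."""
    return model.idle_latency_s / max(available_fraction, 1e-6)


def choose_model(
    models: List[ModelProfile], available_fraction: float, deadline_s: float
) -> ModelProfile:
    """Pick the most accurate model predicted to meet the deadline.

    If no model is predicted to meet it, fall back to the fastest one.
    """
    feasible = [
        m for m in models if predict_latency(m, available_fraction) <= deadline_s
    ]
    if feasible:
        return max(feasible, key=lambda m: m.relative_accuracy)
    return min(models, key=lambda m: m.idle_latency_s)


models = [
    ModelProfile("original", 1.00, 2.0),
    ModelProfile("model_b", 0.90, 1.0),
    ModelProfile("model_a", 0.75, 0.4),
]

# Plenty of resources: the original model fits within a 3-second deadline.
print(choose_model(models, available_fraction=1.0, deadline_s=3.0).name)
# Only 20% of resources free: the original is predicted to take 10 seconds,
# so a lighter model is chosen instead.
print(choose_model(models, available_fraction=0.2, deadline_s=3.0).name)
```

A real service would replace the assumed latency model with measurements of the actual cloud resources, but the trade-off is the same: accept lower accuracy only when the heavier model is predicted to be too slow.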

In step S550, the method of providing a cloud inference service may generate a response to the user's service request using the selected deep learning model and provide it to the user through the user terminal 110.

FIG. 6 is a diagram illustrating the layer configuration of a CNN.

Referring to FIG. 6, a convolutional neural network (CNN) may consist of a convolution layer (image processing layer), a fully-connected layer, an activation layer, and the like.

The automatic model lightweighting device 130 may analyze the deep learning algorithm and divide it into a plurality of operation steps, and may divide the operation steps based on a repetition section defined in the deep learning algorithm.

In FIG. 6, for a CNN, step S630, in which convolution layers are performed repeatedly, may correspond to the repetition section. The automatic model lightweighting device 130 may determine a pre-repetition section (S610) and a post-repetition section (S630) relative to the repetition section (S630). For a CNN, the pre-repetition section (S610) may correspond to the input reception stage, and the post-repetition section (S610) may correspond to the output generation stage. The output generation stage may include a fully-connected layer (dense layer) and an activation layer, and may be repeated as needed.

In addition, the automatic model lightweighting device 130 may divide the repetition section (S630) into a plurality of unit sections and determine the end points of the unit sections as branch points to generate intermediate deep learning models. That is, in FIG. 6, the points between the convolution layers may be determined as branch points.

FIGS. 7 and 8 are diagrams illustrating an embodiment of the automatic model lightweighting method according to the present invention.

Referring to FIG. 7, the automatic model lightweighting device 130 may receive one deep learning algorithm from the user and build a deep learning model along the original learning direction 750. During the learning process, the automatic model lightweighting device 130 may determine at least one branch point 740 and define a branched learning direction 760 that diverges from the original learning direction 750 at that branch point. That is, the automatic model lightweighting device 130 may generate an intermediate deep learning model that, following the branched learning direction 760, performs the learning termination step (S730) immediately from the branch point 740. Because an intermediate deep learning model generated this way has a simpler structure than the model built along the original learning direction 750, it may have a shorter inference time but lower accuracy.

Meanwhile, for the VGG model, the learning termination step (S730) may consist of a flatten layer, a dense layer, a dropout layer, and an activation layer. It is of course not limited to these, and the layers may be applied selectively as needed. The dense and dropout layers may also be performed repeatedly.

In addition, the automatic model lightweighting device 130 may determine a branch point 740 between the operation steps of the original learning process in order to generate an intermediate deep learning model. Here, the automatic model lightweighting device 130 may determine a repetition section in which given operation steps are performed repeatedly and determine the end point of the repetition section as a branch point. In FIG. 7, steps S710 and S720 each perform two convolution operations (Conv1 and Conv2) and one pooling operation (Pool) in sequence, and may each correspond to a repetition section. That is, the automatic model lightweighting device 130 may determine the point between steps S710 and S720 as the branch point 740 and generate an intermediate deep learning model that branches from it.

Referring to FIG. 8, the automatic model lightweighting device 130 may generate an intermediate deep learning model (Model A or Model B) that branches off during the learning process of the original deep learning model. A deep learning model can improve its performance by repeating the same operations, but the added complexity may also increase its computational overhead. By generating intermediate deep learning models (Model A and Model B) in which part of the original learning process is omitted, the automatic model lightweighting device 130 can therefore automatically obtain deep learning models with a range of accuracies and computational overheads.

Accordingly, the plurality of deep learning models built by the automatic model lightweighting method according to the present invention can be applied to services that perform learning in the cloud, and a deep learning model developer can have intermediate deep learning models generated automatically while a learning job runs, without any separate work to create them. Although the intermediate deep learning models are less accurate, they offer a broad spectrum of computational overheads and can thus enable smooth service even in particular service environments. For example, when requests surge while the inference service is running, the service can be provided with a lightweight model that computes quickly, which can improve user satisfaction.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: automatic model lightweighting system
110: user terminal 130: automatic model lightweighting device
150: database
210: processor 230: memory
250: user input/output unit 270: network input/output unit
310: algorithm receiving unit 320: algorithm analysis unit
330: branch point determination unit 340: intermediate model generation unit
350: deep learning model building unit 360: control unit
740: branch point 750: original learning direction
760: branched learning direction

Claims

An automatic model lightweighting method for deep learning model serving optimization, comprising:
receiving a deep learning algorithm for building a deep learning model;
dividing the deep learning algorithm into a plurality of operation steps;
determining at least one branch point existing between the plurality of operation steps in a learning process according to the deep learning algorithm;
generating at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the last operation step of the deep learning algorithm; and
completing the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.

The method of claim 1, wherein the deep learning algorithm includes a deep neural network (DNN), a convolutional neural network (CNN), and a recurrent neural network (RNN).

3. The method of claim 1, wherein the deep learning model and the at least one intermediate deep learning model each have a different prediction accuracy and a different operation speed.

4. The method of claim 1, wherein the dividing into the plurality of operation steps comprises:
dividing the operations of the deep learning algorithm into a plurality of layers; and
determining operation steps corresponding to each of the plurality of layers.

5. The method of claim 1, wherein the dividing into the plurality of operation steps comprises:
determining a repetition section that is performed repeatedly during operation of the deep learning algorithm;
determining a pre-repetition section and a post-repetition section with respect to the repetition section;
determining at least one unit section for the repetition section; and
arranging the pre-repetition section, the at least one unit section, and the post-repetition section in order and determining them as the plurality of operation steps.
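
The division described in the preceding claim can be illustrated with a minimal Python sketch. It assumes the algorithm is represented as a flat list of layer-type names and that the unit section is given as a fixed pattern of layer types; the function name `split_into_steps` and this representation are hypothetical and do not appear in the specification.

```python
from typing import List, Tuple


def split_into_steps(layers: List[str], unit: Tuple[str, ...]) -> List[List[str]]:
    """Split a layer sequence into [pre-repetition, unit sections..., post-repetition].

    Assumption: the repetition section is a contiguous run of the `unit` pattern.
    """
    if not unit:
        raise ValueError("unit pattern must not be empty")
    n, k = len(layers), len(unit)
    # Locate the first occurrence of the unit pattern.
    start = next(
        (i for i in range(n - k + 1) if tuple(layers[i:i + k]) == unit), None
    )
    if start is None:
        return [list(layers)]  # no repetition section: treat everything as one step
    # Collect unit sections for as long as the pattern keeps repeating.
    end = start
    units: List[List[str]] = []
    while tuple(layers[end:end + k]) == unit:
        units.append(layers[end:end + k])
        end += k
    steps: List[List[str]] = []
    if layers[:start]:
        steps.append(layers[:start])   # pre-repetition section
    steps.extend(units)                # one step per unit section
    if layers[end:]:
        steps.append(layers[end:])     # post-repetition section
    return steps


layers = ["input", "conv", "pool", "conv", "pool", "conv", "pool", "flatten", "dense"]
steps = split_into_steps(layers, ("conv", "pool"))
# steps[0] is the pre-repetition section, steps[1:4] are the unit sections,
# and steps[4] is the post-repetition section.
```

In this sketch each unit section becomes its own operation step, so the boundaries between consecutive steps are natural candidates for branch points.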

6. The method of claim 5, wherein the determining of the at least one branch point comprises determining, as a branch point, each point at which the repetition section ends.

7. The method of claim 6, wherein the determining of the at least one branch point comprises, when the deep learning algorithm is a CNN, determining as the branch point an end point of a repetition section in which at least one convolution layer and a pooling layer are performed sequentially.
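
As a rough illustration of the CNN case in the preceding claim, the following Python sketch scans a list of layer-type names and reports the index of every pooling layer that closes a run of one or more convolution layers. The list representation, the labels `"conv"` and `"pool"`, and the function name `cnn_branch_points` are assumptions made for this example only.

```python
from typing import List


def cnn_branch_points(layers: List[str]) -> List[int]:
    """Return indices i such that a branch may start right after layers[i].

    A branch point is the end of a unit made of at least one "conv" layer
    followed by a "pool" layer.
    """
    points: List[int] = []
    seen_conv = False
    for i, kind in enumerate(layers):
        if kind == "conv":
            seen_conv = True
        elif kind == "pool" and seen_conv:
            points.append(i)   # end of a conv...pool unit
            seen_conv = False
        else:
            seen_conv = False  # any other layer breaks the unit
    return points


layers = ["input", "conv", "conv", "pool", "conv", "pool", "flatten", "dense"]
branch_points = cnn_branch_points(layers)  # [3, 5]
```

Whether the end of the final unit should also spawn an intermediate model is not decided by this sketch; a caller could drop the last index if that branch would coincide with the original model.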

8. The method of claim 1, wherein the generating of the at least one intermediate deep learning model comprises:
defining a candidate intermediate deep learning model according to the branching;
calculating an adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model; and
determining the candidate intermediate deep learning model as an intermediate deep learning model when the adequacy of the corresponding branch point satisfies a preset condition.

9. The method of claim 8, wherein the calculating of the adequacy comprises calculating the adequacy through a multiplication (A*L) of the number of layers (L) and the prediction accuracy (A), and
wherein the determining as the intermediate deep learning model comprises determining the candidate intermediate deep learning model as the intermediate deep learning model when the adequacy has increased twofold compared with that of the intermediate deep learning model generated in the previous step.
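
The adequacy test in the preceding claim can be sketched in Python as follows. Two points are assumptions of this example rather than statements of the claim: the first candidate is accepted unconditionally because there is no earlier intermediate model to compare against, and "increased twofold" is read as "at least twice the previous adequacy". The function name and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def select_intermediate_models(candidates: List[Tuple[int, float]]) -> List[int]:
    """Pick candidates whose adequacy A*L is at least double the last accepted one.

    candidates: (layer count L, prediction accuracy A) per branch point, in
    the order the branch points occur during learning.
    Returns the indices of the accepted candidates.
    """
    accepted: List[int] = []
    last_score: Optional[float] = None
    for idx, (num_layers, accuracy) in enumerate(candidates):
        score = accuracy * num_layers  # adequacy = A * L
        # Assumption: with no previous intermediate model, accept the candidate.
        if last_score is None or score >= 2 * last_score:
            accepted.append(idx)
            last_score = score
    return accepted


# Illustrative values only: adequacies are roughly 1.2, 2.8, 4.8 and 6.8.
candidates = [(2, 0.60), (4, 0.70), (6, 0.80), (8, 0.85)]
accepted = select_intermediate_models(candidates)  # [0, 1, 3]
```

Here the third candidate is skipped because its adequacy of about 4.8 is less than twice the 2.8 of the previously accepted model, while the fourth passes because about 6.8 exceeds 5.6.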

10. A device for automatically lightweighting a model for deep learning model serving optimization, the device comprising:
an algorithm receiving unit that receives a deep learning algorithm for building a deep learning model;
an algorithm analysis unit that divides the deep learning algorithm into a plurality of operation steps;
a branch point determining unit that determines at least one branch point existing between the plurality of operation steps in a learning process according to the deep learning algorithm;
an intermediate model generation unit that generates at least one intermediate deep learning model that branches from the progress direction of the learning process at the at least one branch point and proceeds to the last operation step of the deep learning algorithm; and
a deep learning model building unit that completes the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.

11. The device of claim 10, wherein the intermediate model generation unit defines a candidate intermediate deep learning model according to the branching, calculates an adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model, and determines the candidate intermediate deep learning model as an intermediate deep learning model when the adequacy of the corresponding branch point satisfies a preset condition.

12. The device of claim 11, wherein the intermediate model generation unit calculates the adequacy through a multiplication (A*L) of the number of layers (L) and the prediction accuracy (A), and determines the candidate intermediate deep learning model as the intermediate deep learning model when the adequacy has increased twofold compared with that of the intermediate deep learning model generated in the previous step.

13. A method for providing a cloud inference service using a method for automatically lightweighting a model for deep learning model serving optimization, the method comprising:
receiving a request for an inference service from a user terminal;
determining the status of available cloud resources as of the time the request is received, predicting a response generation time for the request, and determining any one of the plurality of deep learning models according to the predicted response generation time; and
generating a response to the request using the determined deep learning model and providing the response to the user terminal,
wherein the plurality of deep learning models are generated, based on the method for automatically lightweighting a model for deep learning model serving optimization, so as to differ from one another in prediction accuracy and operation speed.
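
The model selection step in the preceding claim might be sketched in Python as below. The claim does not say how the response generation time is predicted or how it maps to a model, so this example assumes a simple rule: predicted latency is a per-model base latency divided by the fraction of cloud resources currently available, and the most accurate model that fits within a deadline is chosen. The class `ServedModel`, the functions, the deadline parameter, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServedModel:
    name: str
    accuracy: float          # prediction accuracy measured offline
    base_latency_ms: float   # latency with all resources available


def predict_latency_ms(model: ServedModel, available_fraction: float) -> float:
    """Assumed latency model: latency grows as available resources shrink."""
    if not 0.0 < available_fraction <= 1.0:
        raise ValueError("available_fraction must be in (0, 1]")
    return model.base_latency_ms / available_fraction


def choose_model(
    models: List[ServedModel], available_fraction: float, deadline_ms: float
) -> ServedModel:
    """Pick the most accurate model predicted to answer within the deadline."""
    feasible = [
        m for m in models
        if predict_latency_ms(m, available_fraction) <= deadline_ms
    ]
    if feasible:
        return max(feasible, key=lambda m: m.accuracy)
    # Nothing meets the deadline: fall back to the fastest model.
    return min(models, key=lambda m: m.base_latency_ms)


models = [
    ServedModel("full", accuracy=0.92, base_latency_ms=80.0),
    ServedModel("mid", accuracy=0.88, base_latency_ms=40.0),
    ServedModel("small", accuracy=0.80, base_latency_ms=15.0),
]
idle = choose_model(models, available_fraction=1.0, deadline_ms=100.0)        # "full"
busy = choose_model(models, available_fraction=0.5, deadline_ms=100.0)        # "mid"
overloaded = choose_model(models, available_fraction=0.1, deadline_ms=100.0)  # "small"
```

Under this assumed rule, the service degrades to a lighter intermediate model as resources become scarce instead of delaying the response.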