KR102086351B1

KR102086351B1 - Apparatus and method for learning machine learning models based on virtual data

Info

Publication number: KR102086351B1
Application number: KR1020190109025A
Authority: KR
Inventors: 양훈민; 유기중; 오세윤
Original assignee: 국방과학연구소 (Agency for Defense Development)
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2020-05-29

Abstract

The present invention relates to an apparatus and method for training a machine learning model using virtual data. The apparatus comprises: a virtual data generation unit that generates virtual data from basic data of an object to be identified, obtained through computer aided design (CAD) software or a game engine; a comparison unit that compares verification data, which includes features of actual images of the object to be identified, with features extracted from the virtual data, and determines whether the virtual data satisfies a learning data condition; a learning unit that trains the machine learning model; and a control unit that, based on the determination result of the comparison unit, controls the learning unit so that the machine learning model is trained on the generated virtual data.

Description

Apparatus and method for learning machine learning models based on virtual data {APPARATUS AND METHOD FOR LEARNING MACHINE LEARNING MODELS BASED ON VIRTUAL DATA}

The present invention relates to deep learning, a branch of machine learning, and more particularly to an apparatus and method for training a deep learning model using virtual data.

Among machine learning techniques, deep learning can improve an algorithm's classification performance as more data becomes available. Because deep learning achieves excellent classification performance when it is trained on sufficient data, it is currently attracting considerable attention.

In general, predicting images (photographs) at a human level requires at least tens of thousands to millions of images as training data. In practice, however, apart from a few publicly available standard benchmark datasets, various non-technical factors make it difficult to obtain the large volume of training data that deep learning needs for sufficient training.

Various methods for generating training data have been proposed to address this problem. One example is virtually generating deep learning training data for image prediction from a game engine or from 3D CAD modeling data.

However, virtually generating a large amount of training data, training a deep learning model on the generated virtual data, and verifying that the trained model reaches the classification performance the user requires takes a great deal of time. This is because a deep learning algorithm with excellent classification performance generally requires more training data and a deeper model structure. In other words, producing a high-performing deep learning algorithm requires a large amount of training data and training on that data, and the larger the training data, the longer the training takes.

The present invention addresses the above problem by shortening the time required to train a deep learning model in an apparatus and method that train the model on virtual data.

To achieve the above object, a machine learning model training apparatus according to an embodiment of the present invention comprises: a virtual data generation unit that generates virtual data from image data of an identification target obtained through CAD (Computer Aided Design) software or a game engine; a comparison unit that compares verification data, which includes features of actual images of the identification target, with features extracted from the virtual data, and determines whether the virtual data satisfies a learning data condition; a learning unit that trains the machine learning model; and a control unit that, according to the determination result of the comparison unit, controls the learning unit so that the machine learning model is trained on the generated virtual data.

In one embodiment, the verification data is a single image acquired of the identification target, and the comparison unit calculates the error between the verification data and the generated virtual data as an error between two images, and determines whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.

In one embodiment, the comparison unit calculates the error between the verification data and the generated virtual data based on at least one of the following algorithms: the Euclidean distance method; the Deformable Part Model (DPM) method, which considers the distance between Histogram of Oriented Gradients (HoG) vectors; the AdaBoost method; and the Maximum Mean Discrepancy (MMD) method, which takes the difference between the mean function values of two samples, so that a larger result (error value) indicates a higher likelihood that the two samples come from different distributions.
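For illustration only, the two metrics in this list that need no trained detector, Euclidean distance and MMD, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patent's implementation: images are assumed to be equal-sized and flattened to lists of floats, and MMD is computed with a Gaussian (RBF) kernel whose bandwidth `sigma` is a hypothetical parameter that the source does not specify. The DPM and AdaBoost variants depend on trained models and are not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Pixel-wise Euclidean distance between two equal-length flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between sample sets xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]; larger values suggest
    that the two samples come from different distributions.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy
```

A larger `mmd_squared` value suggests that the virtual samples and the verification samples differ in distribution; the resulting error would then be compared against the first level.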

In one embodiment, the verification data consists of features common to a plurality of images acquired of the identification target, and the comparison unit calculates the error between at least one probability distribution of the features included in the verification data and at least one probability distribution of the features calculated from the generated virtual data, and determines whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than the preset first level.

In one embodiment, the control unit determines that the virtual data satisfies the learning data condition when the error calculated between the verification data and the generated virtual data is equal to or less than the first level and equal to or greater than a second level, which is smaller than the first level.
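The two-sided acceptance rule can be expressed as a band check. This is a minimal sketch; `first_level` and `second_level` are hypothetical parameter names, the source gives no values for either, and it does not state why a lower bound is applied.

```python
def satisfies_learning_data_condition(error: float, first_level: float,
                                      second_level: float = 0.0) -> bool:
    """Return True when second_level <= error <= first_level.

    With second_level = 0.0 this reduces to the single-threshold test
    (error <= first_level); a positive second_level additionally rejects
    virtual data whose error falls below the lower bound.
    """
    if not 0.0 <= second_level < first_level:
        raise ValueError("expected 0 <= second_level < first_level")
    return second_level <= error <= first_level
```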

In one embodiment, the virtual data generation unit extracts feature vectors corresponding to a plurality of features obtained from the basic data, and generates the virtual data by modifying the extracted feature vectors.

In one embodiment, the control unit determines a weight for at least one feature vector according to the result of comparing the verification data with the features extracted from the virtual data, and generates the virtual data by applying the determined weight to the modification of the extracted feature vectors.

To achieve the above object, a machine learning model training method of a machine learning model training apparatus according to an embodiment of the present invention comprises: a first step of generating virtual data from image data of an identification target obtained through CAD (Computer Aided Design) software or a game engine; a second step of comparing verification data, which includes features of actual images of the identification target, with features extracted from the virtual data, and determining whether the virtual data satisfies a learning data condition; and a third step of training the machine learning model on the virtual data according to the determination result of the second step.

In one embodiment, the verification data is a single image acquired of the identification target, and the second step comprises: a step 2-1 of calculating the error between the verification data and the generated virtual data based on at least one preset algorithm for calculating the error between two images; and a step 2-2 of determining whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.

In one embodiment, the at least one preset algorithm is at least one of: the Euclidean distance method; the Deformable Part Model (DPM) method, which considers the distance between Histogram of Oriented Gradients (HoG) vectors; the AdaBoost method; and the Maximum Mean Discrepancy (MMD) method, which takes the difference between the mean function values of two samples, so that a larger result (error value) indicates a higher likelihood that the two samples come from different distributions.

In one embodiment, the verification data consists of features common to a plurality of images acquired of the identification target, and the second step comprises: a step a of extracting at least one probability distribution of the features included in the verification data; a step b of calculating the error between at least one probability distribution of the features calculated from the generated virtual data and the at least one probability distribution extracted from the verification data; and a step c of determining whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.

In one embodiment, the first step comprises: a step 1-1 of extracting feature vectors corresponding to a plurality of features obtained from the basic data; and a step 1-2 of generating the virtual data by modifying the extracted feature vectors.

In one embodiment, step 1-2 further comprises a step 1-3 of determining a weight for at least one feature vector according to the result of comparing the verification data with the features extracted from the virtual data, and generating new virtual data by applying the determined weight to the modification of the extracted feature vectors.

The effects of the machine learning model training apparatus and method according to the present invention are as follows.

According to at least one of the embodiments, the present invention determines whether generated data is valid training data based on the image features of a verification dataset that can be used to assess the deep learning model's performance, and uses only valid training data for training. This reduces the time wasted on training the deep learning model with unnecessary data.

In addition, according to at least one of the embodiments, the present invention generates training data according to the image features of the verification dataset, so that sufficient training can be achieved with a smaller amount of training data, which reduces the time required for deep learning training.

FIG. 1 is a block diagram showing the configuration of a machine learning model training apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an operation process for training a machine learning model in the machine learning model training apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a process, according to one embodiment, of determining whether virtual data satisfies the learning data condition during the operation process of FIG. 2.
FIG. 4 is a flowchart illustrating a process, according to another embodiment, of determining whether virtual data satisfies the learning data condition during the operation process of FIG. 2.
FIG. 5 is a flowchart illustrating an operation process of generating training data according to the image features of a verification dataset in the machine learning model training apparatus according to an embodiment of the present invention.

Hereinafter, the embodiments disclosed herein will be described in detail with reference to the accompanying drawings; identical or similar components are given the same or similar reference numerals, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not by themselves have distinct meanings or roles. In describing the embodiments disclosed herein, detailed descriptions of related known technologies are omitted where they might obscure the gist of the embodiments. The accompanying drawings are provided only to make the disclosed embodiments easier to understand; the technical idea disclosed herein is not limited by them and should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

First, FIG. 1 is a block diagram showing the configuration of the machine learning model training apparatus 10 according to an embodiment of the present invention.

Referring to FIG. 1, the machine learning model training apparatus 10 according to an embodiment of the present invention may include a control unit 100 and, connected to the control unit 100, an input unit 110, a virtual data generation unit 120, a comparison unit 130, a learning unit 140, and a memory 150. The components shown in FIG. 1 are not all essential for implementing the machine learning model training apparatus 10, so the apparatus 10 described herein may have more or fewer components than those listed above.

First, the input unit 110 may be configured to receive the various data input to the machine learning model training apparatus 10, for example various basic data and verification data.

Here, the basic data is data for generating the virtual data used to train a machine learning model, that is, a deep learning algorithm model, and may refer to data obtained through CAD or a game engine for the identification target whose identification performance is to be improved through training.

For example, basic data generated by CAD may be 2D or 3D modeling data created with CAD (Computer Aided Design) software for the target whose identification performance the deep learning algorithm model is to improve through training, such as a weapon system to be identified. Data obtained from a game engine may refer to a modeled object implemented in the game engine.

The verification data may refer to data for determining whether the virtual data generated from the basic data is valid for training. For example, the verification data may be an image actually acquired of the target whose identification performance the deep learning algorithm model is to improve through training, such as a specific weapon system.

Such verification data may be obtained through specialized means such as satellite imaging or long-range aerial photography. Alternatively, it may consist of images showing the exterior of the weapon system that are readily available through the Internet or other channels. For example, for the M1 Abrams, the main battle tank of the U.S. Army, detailed specifications and images, including images taken from various directions such as the front and rear, can easily be obtained from the Internet or from military magazines. Image information obtained in this way may be used as verification data for determining whether the virtual data generated for the M1 Abrams tank in the machine learning model training apparatus 10 according to an embodiment of the present invention is valid.

The verification data need not be in image form. For example, it may be data on various feature points obtained from the target whose identification performance is to be improved, such as a weapon system to be identified. In this case, the verification data may consist of feature distribution data according to a plurality of preset feature vectors.

For example, the verification data may be data on image features commonly detected across a plurality of images of the target whose identification performance is to be improved. That is, features commonly detected in a plurality of images obtained of the identification target (e.g., a specific weapon system), such as the width-to-length ratio, the size ratio between a first component and a second component, and the average color, may be input as the verification data. In this case, the verification data may be distributions expressed by specific feature vectors obtained from the identification target, for example probability distributions (hereinafter referred to as distribution data).
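The kinds of features named above could be computed from a segmented image roughly as follows. This sketch assumes, hypothetically, that the object and its two components are already available as binary masks (for a tank, the first and second components might be the hull and turret); the source does not describe how features are extracted.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 1 = object pixel, 0 = background
RGBImage = Sequence[Sequence[Tuple[float, float, float]]]


def bounding_box_aspect_ratio(mask: Mask) -> float:
    """Width / height of the bounding box around the object pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("mask contains no object pixels")
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return width / height


def area(mask: Mask) -> int:
    """Number of object pixels in a mask."""
    return sum(1 for row in mask for v in row if v)


def component_size_ratio(first: Mask, second: Mask) -> float:
    """Pixel-area ratio between a first and a second component."""
    return area(first) / area(second)


def mean_color(image: RGBImage, mask: Mask) -> Tuple[float, float, float]:
    """Average RGB value over the object pixels."""
    pixels = [image[r][c] for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def feature_vector(image: RGBImage, mask: Mask, first: Mask, second: Mask) -> List[float]:
    """Concatenate the illustrative features into one vector."""
    return [bounding_box_aspect_ratio(mask), component_size_ratio(first, second),
            *mean_color(image, mask)]
```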

The virtual data generation unit 120 may generate virtual data based on the basic data received through the input unit 110. To prevent unnecessary duplicate training on identical virtual data, the virtual data generation unit 120 may generate a plurality of different virtual data from a single piece of basic data.

To this end, the virtual data generation unit 120 may change the conditions under which virtual data is generated each time virtual data is generated under the control of the control unit 100. For example, when the basic data is 2D modeling data, virtual data may be generated by converting only part of the modeling data into an image, or based on modeling data rotated to various angles. Virtual data may also be generated by changing the horizontal or vertical length of the modeling data, and a plurality of different virtual data may be generated by changing the colors of the generated virtual data, that is, of the image.
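The variations listed above (partial imaging, rotation, horizontal or vertical stretching, recoloring) can be sketched on a small RGB pixel grid. Assumptions: images are lists of rows of `(r, g, b)` tuples, rotation is restricted to multiples of 90 degrees to avoid interpolation, and the random ranges in the hypothetical helper `random_variant` are arbitrary illustrative choices, not values from the source.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def crop(img: Image, top: int, left: int, h: int, w: int) -> Image:
    """Keep only part of the rendered model (partial imaging)."""
    return [row[left:left + w] for row in img[top:top + h]]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def stretch(img: Image, sx: float, sy: float) -> Image:
    """Scale width by sx and height by sy using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * sy)), max(1, round(w * sx))
    return [[img[min(h - 1, int(r / sy))][min(w - 1, int(c / sx))] for c in range(new_w)]
            for r in range(new_h)]


def recolor(img: Image, gains: Tuple[float, float, float]) -> Image:
    """Multiply each RGB channel by a gain and clip the result to [0, 255]."""
    return [[tuple(min(255, max(0, round(v * g))) for v, g in zip(px, gains)) for px in row]
            for row in img]


def random_variant(img: Image, rng: random.Random) -> Image:
    """Apply a random rotation, stretch, and recolor to produce one new variant."""
    out = img
    for _ in range(rng.randrange(4)):
        out = rotate90(out)
    out = stretch(out, rng.uniform(0.8, 1.2), rng.uniform(0.8, 1.2))
    gains = (rng.uniform(0.7, 1.3), rng.uniform(0.7, 1.3), rng.uniform(0.7, 1.3))
    return recolor(out, gains)
```

Seeding `random.Random` differently for each call yields a different variant each time, which matches the goal of never training twice on identical virtual data.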

The conditions for generating the virtual data may be determined according to the feature vectors of features obtained from the basic data and the probability distribution represented by those feature vectors. That is, under the control of the control unit 100, the virtual data generation unit 120 may generate feature vectors based on a plurality of preset features obtained from the basic data, and generate the probability distribution represented by those feature vectors. Using the probability distribution of the basic data as a reference, it may then generate a plurality of virtual data by modifying that probability distribution. In this case, the plurality of virtual data may have different probability distributions from one another.
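Generating virtual data by modifying the basic data's feature distribution might look like the following at the feature-vector level. The source does not name a distribution family; this sketch assumes, for illustration, an independent Gaussian per feature and shifts each mean by a random fraction (up to the hypothetical parameter `shift`) of its standard deviation. Turning the sampled feature vectors back into rendered images is outside its scope.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Gaussian = List[Tuple[float, float]]  # per-feature (mean, standard deviation)


def fit_independent_gaussian(features: Sequence[Sequence[float]]) -> Gaussian:
    """Per-feature mean and population std of the basic data's feature vectors."""
    columns = list(zip(*features))
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in columns]


def perturbed_distributions(base: Gaussian, count: int, shift: float,
                            rng: random.Random) -> List[Gaussian]:
    """Create `count` distributions whose means move by up to `shift` std-devs.

    A feature with zero spread is shifted in absolute units instead.
    """
    out = []
    for _ in range(count):
        out.append([(mu + rng.uniform(-shift, shift) * (sd or 1.0), sd) for mu, sd in base])
    return out


def sample(dist: Gaussian, n: int, rng: random.Random) -> List[List[float]]:
    """Draw n feature vectors from an independent Gaussian."""
    return [[rng.gauss(mu, sd) for mu, sd in dist] for _ in range(n)]
```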

The comparison unit 130 may determine whether the virtual data generated by the virtual data generation unit 120 satisfies a preset learning data condition. To this end, the comparison unit 130 may compare the verification data with the currently generated virtual data.

For example, the comparison unit 130 may use at least one preset image similarity algorithm to calculate the similarity between the image input as verification data and the currently generated virtual data (image). If the calculated similarity is equal to or greater than a preset level, the comparison unit 130 may determine that the currently generated virtual data satisfies the learning data condition; if the calculated similarity is below the preset level, it may determine that the currently generated virtual data does not satisfy the learning data condition.

Alternatively, the comparison unit 130 may extract preset feature vectors from the virtual data and calculate a probability distribution corresponding to the virtual data based on the extracted feature vectors. It may then compare the calculated probability distribution with the verification data and determine, according to the comparison result, whether the virtual data satisfies the learning data condition.

In this case, the verification data may be distribution data rather than an image, and the comparison unit 130 may calculate a similarity or an error amount by comparing each of the plurality of distribution data constituting the verification data with the corresponding probability distribution of the virtual data. If the calculated similarity is equal to or greater than a preset level, or the calculated error amount is equal to or less than a preset level, the comparison unit 130 may determine that the currently generated virtual data satisfies the learning data condition. Conversely, if the calculated similarity is below the preset level, or the calculated error amount exceeds the preset level, it may determine that the currently generated virtual data does not satisfy the learning data condition.
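A per-feature comparison between distribution data and virtual data can be sketched with normalized histograms. The choice of total variation distance, the shared bin edges, and taking the worst feature as the overall error are all assumptions for illustration; the source only says that an error amount is computed for each distribution and compared with a preset level.

```python
from typing import Dict, List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized histogram over bins [edges[i], edges[i+1]); the last bin is closed.

    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distribution_error(verification: Dict[str, List[float]],
                       virtual_features: Dict[str, Sequence[float]],
                       edges: Dict[str, Sequence[float]]) -> float:
    """Worst-case total variation distance over all verification features."""
    return max(
        total_variation(verification[name], histogram(virtual_features[name], edges[name]))
        for name in verification
    )
```

The returned error would then be tested against the preset level, e.g. `distribution_error(...) <= first_level`.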

When the comparison unit 130 determines that the currently generated virtual data satisfies the learning data condition, the control unit 100 may control the learning unit 140 so that training is performed on the currently generated virtual data. Here, the learning unit 140 may include the machine learning algorithm model to be trained, preferably a deep learning algorithm model, and may train that deep learning algorithm model on the virtual data supplied through the control unit 100.

When training is performed in the learning unit 140, the control unit 100 may evaluate the performance of the trained deep learning algorithm model based on a preset loss function (or cost function). According to the evaluated model performance, it may then update the learning weights of the deep learning algorithm in the direction that further reduces the value of the loss function.
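The loss-driven weight update can be illustrated with the smallest possible stand-in for a deep model: a logistic classifier over feature vectors, trained by gradient descent on binary cross-entropy. This is a generic sketch, not the patent's model; the loss choice and the learning rate `lr` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability of the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w: Sequence[float], b: float,
             xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
    """Mean binary cross-entropy (the loss function being reduced)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(xs)


def gradient_step(w: Sequence[float], b: float, xs: Sequence[Sequence[float]],
                  ys: Sequence[int], lr: float = 0.1) -> Tuple[List[float], float]:
    """Move the weights against the gradient of the mean loss."""
    n = len(xs)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        err = predict(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
        for j, xj in enumerate(x):
            grad_w[j] += err * xj / n
        grad_b += err / n
    return [wi - lr * g for wi, g in zip(w, grad_w)], b - lr * grad_b
```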

The control unit 100 may then determine whether a training end condition is satisfied. For example, if the performance evaluation of the deep learning algorithm model indicates that the deep learning algorithm has sufficient identification performance for the identification target, the control unit 100 may end training of the deep learning algorithm. Alternatively, the control unit 100 may end training once it has been performed a preset maximum number of times. Alternatively, if the performance evaluation indicates that overfitting has occurred, the control unit 100 may end model training even before the maximum number of training iterations is reached.
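The three end conditions can be combined into one predicate. The thresholds are hypothetical, and the overfitting test (validation loss failing to improve for `patience` epochs, i.e. early stopping) is a common heuristic assumed here because the source does not say how overfitting is detected.

```python
from typing import Sequence


def should_stop(epoch: int, val_accuracy: float, val_losses: Sequence[float],
                target_accuracy: float = 0.95, max_epochs: int = 100,
                patience: int = 3) -> bool:
    """Return True if any end condition is met.

    `epoch` is the 0-based index of the epoch just finished and `val_losses`
    holds one validation loss per finished epoch.
    """
    if val_accuracy >= target_accuracy:  # sufficient identification performance
        return True
    if epoch + 1 >= max_epochs:  # preset maximum number of training iterations
        return True
    # Assumed overfitting signal: no improvement over the last `patience` epochs.
    if len(val_losses) > patience:
        best_before = min(val_losses[:-patience])
        if min(val_losses[-patience:]) >= best_before:
            return True
    return False
```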

If the preset termination condition is not satisfied, the control unit 100 may control the virtual data generation unit 120 to generate virtual data again. If the comparison unit 130 then determines that the regenerated virtual data satisfies the learning data condition, the control unit 100 may control the learning unit 140 to train on the regenerated virtual data. The regenerated virtual data naturally differs from the previously generated virtual data.

Conversely, the comparison unit 130 may determine that the currently generated virtual data does not satisfy the learning data condition. In that case the control unit 100 may conclude that this virtual data is not useful for training the deep learning model. The control unit 100 then controls the virtual data generation unit 120 to regenerate virtual data. It also controls the comparison unit 130 to determine again whether the regenerated virtual data satisfies the learning data condition. Again, the regenerated virtual data differs from the previously generated virtual data.

When the currently generated virtual data is judged not to be useful for training the deep learning model, the control unit 100 may also control the virtual data generation unit 120 so that the next virtual data reflects the comparison result of the comparison unit 130.

For example, the comparison unit 130 may detect that at least one feature vector extracted from the virtual data differs from the corresponding feature vector of the verification data by at least a preset amount. In that case the control unit 100 may generate a weight corresponding to the detected difference.

The control unit 100 may then apply the generated weight to at least one of the virtual data generation conditions used by the virtual data generation unit 120. For example, the control unit 100 may vary the amount by which each feature vector is changed according to its weight.

As one example, the control unit 100 may generate virtual data by altering the probability distribution of the basic data. It does so by changing the feature vectors related to that distribution. The weight limits how far those feature vectors may be changed. This in turn limits how much the probability distribution associated with a specific feature vector changes in the virtual data produced by the virtual data generation unit 120.

Accordingly, when generated virtual data has an error relative to the verification data that exceeds a certain magnitude, the control unit 100 reduces the amount of change of the feature vector responsible for that error. In this way the control unit 100 steers the virtual data generation unit 120 toward virtual data whose similarity to the verification data stays at a certain level. In other words, new virtual data compensates for the elements whose similarity to the verification data fell below that level.

The memory 150 may store various programs for the operation of the control unit 100, together with the data that operation requires (for example, basic data, virtual data, verification data, and various threshold values).

The machine learning model training apparatus 10 according to an embodiment of the present invention may also be connected to the Internet by wireless or wired communication. In this case the memory 150 may operate in conjunction with a preset external server or web storage. For example, the basic data or verification data may be stored on that external server or in web storage. The apparatus 10 may then control the input unit 110 to download part of the basic data or verification data, and store the downloaded portion in the memory 150.

FIG. 2 is a flowchart illustrating how the machine learning model training apparatus 10 according to an embodiment of the present invention trains a machine learning model.

Referring to FIG. 2, the control unit 100 of the machine learning model training apparatus 10 according to an embodiment of the present invention first generates virtual data from the input basic data (S200). As described above, the basic data is the data from which virtual data is generated. It may be any of various data obtained from CAD software, a game engine, or the like.

In step S200, the control unit 100 may generate virtual data by changing at least one characteristic of the basic data. For example, the control unit 100 may:
- render only part of the basic data as an image;
- change the size or aspect ratio of at least part of the basic data;
- rotate the basic data by a predetermined angle and use an image generated from the rotated data as virtual data;
- change colors to generate multiple distinct virtual data.
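The image-level changes listed above (partial imaging, resizing, rotation, color change) can be sketched in pure Python. This is a minimal sketch under stated assumptions:
- images are grayscale lists of pixel rows;
- rotation is limited to 90-degree steps;
- a brightness shift stands in for color changes.

All function names are hypothetical. A practical implementation would more likely use an image-processing library.

```python
from typing import List

Image = List[List[int]]  # assumed format: grayscale image as rows of 0-255 pixel values


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only part of the image (imaging only a portion of the basic data)."""
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Change size or aspect ratio with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise (stand-in for 'a predetermined angle')."""
    return [list(row) for row in zip(*img[::-1])]


def shift_intensity(img: Image, delta: int) -> Image:
    """Shift pixel values (stand-in for a color change), clipped to 0-255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def make_variants(img: Image) -> List[Image]:
    """Produce several distinct virtual samples from one rendered image."""
    h, w = len(img), len(img[0])
    return [
        crop(img, 0, 0, max(1, h // 2), max(1, w // 2)),
        resize_nearest(img, h, 2 * w),
        rotate90(img),
        shift_intensity(img, 40),
    ]


base = [[10, 20], [30, 40]]
for variant in make_variants(base):
    print(variant)
```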

Alternatively, in step S200 the control unit 100 may generate feature vectors from a plurality of preset features obtained from the basic data, and build a probability distribution represented by those feature vectors. It may then generate multiple virtual data by changing at least one value of the feature vectors that make up the probability distribution of the basic data. The virtual data may therefore have a probability distribution different from that of the basic data.
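Below is a minimal sketch of distribution-based generation. It rests on two illustrative assumptions:
- each feature is modelled as an independent Gaussian;
- "changing a feature-vector value" means shifting that feature's mean.

The embodiment does not fix the distribution family, and the function names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Params = List[Tuple[float, float]]  # (mean, population std) per feature


def fit_feature_distribution(samples: Sequence[Sequence[float]]) -> Params:
    """Model each feature of the basic data as an independent Gaussian (assumption)."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*samples)]


def sample_virtual(params: Params, mean_shifts: Dict[int, float], n: int,
                   rng: random.Random) -> List[List[float]]:
    """Shift the mean of selected features, then draw n virtual feature vectors."""
    shifted = [(mu + mean_shifts.get(i, 0.0), sigma) for i, (mu, sigma) in enumerate(params)]
    return [[rng.gauss(mu, sigma) for mu, sigma in shifted] for _ in range(n)]


basic = [[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]   # toy basic-data feature vectors
params = fit_feature_distribution(basic)
virtual = sample_virtual(params, {1: 5.0}, n=4, rng=random.Random(0))
print(params)
print(virtual)
```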

The characteristics of the basic data that are changed to generate virtual data may be called virtual data generation conditions. Each time virtual data is generated, the control unit 100 changes at least one of these conditions. Different virtual data can therefore be generated from the same basic data.

Once virtual data has been generated, the control unit 100 may determine whether it satisfies a preset learning data condition (S202). The learning data condition may be defined by whether the generated virtual data is sufficiently similar to the verification data input by the user. The comparison may be made against the verification data itself (an image) or against a feature distribution of the verification data (for example, a probability distribution).

In step S202, the control unit 100 may determine the similarity between the virtual data and the verification data in various ways. For example, it may directly calculate the similarity between the verification data and the virtual data. It then uses that similarity to determine whether the currently generated virtual data satisfies the preset learning data condition.

Alternatively, in step S202 the control unit 100 may compare two probability distributions: one based on at least one feature vector extracted from the virtual data, and one obtained from the verification data. The comparison result determines whether the currently generated virtual data satisfies the preset learning data condition.

If the currently generated virtual data does not satisfy the preset learning data condition, the process returns to step S200 and virtual data is generated again from the basic data. Virtual data that differs from the identification target by more than a certain level is essentially an unrelated image. Such data may not help train the deep learning model at all.

Step S202 can be carried out in two ways, each described in more detail below:
- With reference to FIG. 3: the similarity between the verification data and the virtual data is calculated directly and used to decide whether the virtual data satisfies the learning data condition.
- With reference to FIG. 4: the probability distributions of the verification data and the virtual data are compared, and the comparison result is used to make that decision.

If the comparison in step S202 shows that the virtual data satisfies the learning data condition, the control unit 100 may train the deep learning model on the currently generated virtual data (S204).

After the deep learning model has been trained, the control unit 100 may evaluate its performance using a preset loss function (also called a cost function) (S206).

The control unit 100 then determines whether the training termination condition is satisfied (S208). It may end training of the deep learning model in any of the following cases:
- The performance evaluated in step S206 indicates sufficient identification performance for the identification target.
- A preset maximum number of training iterations has been reached.
- The performance evaluation indicates overfitting. In this case training may end even before the maximum number of iterations has been completed.

If the control unit 100 determines in step S208 that the termination condition is not satisfied, it updates the weights of the deep learning model based on the evaluated performance, in the direction that further reduces the loss (S210). The process then returns to step S200, and new virtual data is generated that naturally differs from the previously generated virtual data.

Depending on the result of step S202, either step S200 is performed again, or steps S204 to S206 are performed again to train on the newly generated virtual data. Step S208 is then performed again, and the process repeats according to its result.
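The S200 to S210 loop of FIG. 2 can be summarised as a small driver function. This is a sketch, not the apparatus itself. The callables `generate`, `satisfies_condition`, `train_on`, `evaluate`, `should_stop`, and `update_weights` are hypothetical stand-ins for the virtual data generation unit 120, the comparison unit 130, and the learning unit 140. The `max_generations` cap is an added safeguard that the flowchart does not mention.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_training_loop(generate: Callable[[], T],
                      satisfies_condition: Callable[[T], bool],
                      train_on: Callable[[T], None],
                      evaluate: Callable[[], float],
                      should_stop: Callable[[float, int], bool],
                      update_weights: Callable[[float], None],
                      max_generations: int = 10_000) -> int:
    """Drive the S200-S210 loop; return how many virtual samples were trained on."""
    trained = 0
    for _ in range(max_generations):      # assumed safety cap on regeneration attempts
        sample = generate()               # S200: generate virtual data
        if not satisfies_condition(sample):
            continue                      # S202 failed: back to S200
        train_on(sample)                  # S204: train on the accepted sample
        trained += 1
        loss = evaluate()                 # S206: evaluate with the loss function
        if should_stop(loss, trained):    # S208: termination condition met?
            break
        update_weights(loss)              # S210: update weights, then back to S200
    return trained


# Toy usage: even numbers play the role of "valid" virtual data.
numbers = iter(range(1, 21))
seen, updates = [], []
count = run_training_loop(
    generate=lambda: next(numbers),
    satisfies_condition=lambda x: x % 2 == 0,
    train_on=seen.append,
    evaluate=lambda: 1.0 / len(seen),
    should_stop=lambda loss, k: k >= 3,
    update_weights=updates.append,
)
print(count, seen)
```

Passing each step in as a callable keeps the control flow of FIG. 2 separate from any particular model or data generator.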

Suppose a deep learning model is trained on virtual data generated from basic data about an identification target whose identification performance is to be improved. The present invention trains the model using only valid virtual data, that is, virtual data at least a certain level similar to images actually acquired of the identification target (the verification data). It therefore avoids spending time on invalid virtual data that does not help training. This can significantly shorten the time required to train the deep learning model.

FIG. 3 is a flowchart illustrating in more detail the method described above: directly calculating the similarity between the verification data and the virtual data, and using that similarity to decide whether the virtual data satisfies the learning data condition.

Referring to FIG. 3, on entering step S202, the control unit 100 of the machine learning model training apparatus 10 according to an embodiment of the present invention first calculates the similarity between the virtual data and the verification data using a preset image similarity algorithm (S300). Here, the verification data may be in image form.

Step S300 may use any of various algorithms that calculate the similarity between two images. For example, the similarity may be calculated using:
- the Euclidean distance;
- the Deformable Part Model (DPM) method, which considers distances between Histogram of Oriented Gradients (HoG) vectors;
- the AdaBoost method;
- the Maximum Mean Discrepancy (MMD) method, which measures the difference between the mean function values of two samples. The larger the result (error value), the more likely it is that the samples come from different distributions.

Equation 1 below expresses the Euclidean distance method, Equation 2 the DPM method, Equation 3 the AdaBoost method, and Equation 4 the MMD method.
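The Euclidean-distance and MMD measures named above can be sketched as follows. This uses standard textbook forms: a biased squared-MMD estimate with a Gaussian (RBF) kernel, where `gamma` is an assumed bandwidth parameter. It is not claimed to match Equations 1 and 4 exactly. DPM and AdaBoost are left out because they require trained detectors. The function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rbf_kernel(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD; larger values suggest different distributions."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (len(xs) * len(xs))
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (len(ys) * len(ys))
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
real = [[0.0], [0.1]]
print(mmd_squared(real, [[0.05], [0.15]]), mmd_squared(real, [[5.0], [5.1]]))
```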

Using any one of Equations 1 to 4, the control unit 100 may calculate the similarity between the actual image (the verification data) and the virtual image (the virtual data). It then determines whether the virtual data satisfies the learning data condition by checking whether the similarity is at or above a preset threshold level (S302). If the calculated similarity is at or above the threshold level, the control unit 100 determines that the virtual data satisfies the learning data condition (S304). In this case the control unit 100 proceeds to step S204 of FIG. 2 and trains the deep learning model on the currently generated virtual data.

If instead the determination in step S302 shows that the calculated similarity is below the threshold level, the control unit 100 determines that the virtual data does not satisfy the learning data condition (S306). The control unit 100 then returns to step S200 of FIG. 2 to generate new virtual data.

The similarity may also be expressed as an error value. In that case step S302 determines whether the calculated error value is at or below a preset threshold. If it is, the control unit 100 proceeds to step S304; if it exceeds the threshold, the control unit 100 proceeds to step S306.

The threshold level or threshold value used to decide whether the virtual data satisfies the learning data condition may be changed by the user.
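The S302 decision can be expressed as a single comparison whose direction depends on whether the score is a similarity or an error value. The function name and the `score_is_error` flag are hypothetical. The threshold is passed in as a parameter so that it can be changed by the user, as described above.

```python
def satisfies_learning_condition(score: float, threshold: float,
                                 score_is_error: bool = False) -> bool:
    """Step S302: a similarity must reach the threshold; an error must not exceed it."""
    if score_is_error:
        return score <= threshold   # error at or below threshold -> S304, else S306
    return score >= threshold       # similarity at or above threshold -> S304, else S306


user_threshold = 0.8  # user-adjustable setting (illustrative value)
print(satisfies_learning_condition(0.85, user_threshold))             # True
print(satisfies_learning_condition(0.30, 0.25, score_is_error=True))  # False
```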

FIG. 4 is a flowchart illustrating an alternative to the method of FIG. 3. Here the probability distribution of the verification data is compared with that of the virtual data, and the comparison result is used to decide whether the virtual data satisfies the learning data condition.

Referring to FIG. 4, on entering step S202, the control unit 100 of the machine learning model training apparatus 10 according to an embodiment of the present invention first extracts a plurality of preset feature vectors from the currently generated virtual data. It then calculates at least one probability distribution from the extracted feature vectors (S400).

The control unit 100 then calculates the error between the at least one probability distribution calculated in step S400 and at least one probability distribution of the verification data (S402).

The at least one probability distribution of the verification data may be generated from a plurality of feature vectors extracted from the verification data. To this end, step S400 may further include extracting a plurality of preset feature vectors from the verification data and calculating at least one probability distribution from them.

Alternatively, the verification data may not be in image form. Instead it may contain at least one probability distribution over features (feature vectors) common to a plurality of real images acquired of the identification target. In that case step S402 compares each probability distribution contained in the verification data with the corresponding probability distribution calculated from the virtual data, feature vector by feature vector, and calculates the differences.

When multiple probability distributions are compared and multiple error values result, the control unit 100 may average them to obtain a mean error value.

Once step S402 has produced the error between the distribution(s) of the verification data and those of the virtual data, the control unit 100 determines whether that error is below a preset threshold (S404).

If the calculated error is below the threshold, the control unit 100 determines that the currently generated virtual data satisfies the learning data condition (S406). In this case it proceeds to step S204 of FIG. 2 and trains the deep learning model on the currently generated virtual data.

If instead the calculated error is at or above the threshold, the control unit 100 determines that the currently generated virtual data does not satisfy the learning data condition (S408). The control unit 100 then returns to step S200 of FIG. 2 to generate new virtual data.
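Steps S400 to S408 can be sketched with per-feature histograms. The embodiment does not specify how the error between two distributions is measured. This sketch assumes fixed-range histograms compared with total variation distance, averaged over features as in the text. The function names, bin count, and feature ranges are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalised histogram of one feature over the fixed range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))  # clamp out-of-range values
        counts[idx] += 1
    return [c / len(values) for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Distance in [0, 1] between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mean_distribution_error(virtual: Sequence[Sequence[float]],
                            verification: Sequence[Sequence[float]],
                            ranges: Sequence[Tuple[float, float]], bins: int = 10) -> float:
    """S400-S402: compare per-feature distributions and average the errors."""
    errors = []
    for i, (lo, hi) in enumerate(ranges):
        p = histogram([v[i] for v in virtual], lo, hi, bins)
        q = histogram([v[i] for v in verification], lo, hi, bins)
        errors.append(total_variation(p, q))
    return sum(errors) / len(errors)


def passes_distribution_check(virtual, verification, ranges, threshold: float,
                              bins: int = 10) -> bool:
    """S404: accept (S406) only if the mean error is below the threshold, else S408."""
    return mean_distribution_error(virtual, verification, ranges, bins) < threshold


virtual = [[0.05, 0.05], [0.05, 0.05]]
verification = [[0.05, 0.95], [0.05, 0.95]]
ranges = [(0.0, 1.0), (0.0, 1.0)]
print(mean_distribution_error(virtual, verification, ranges))  # 0.5
```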

FIG. 5 is a flowchart illustrating how the machine learning model training apparatus 10 according to an embodiment of the present invention generates training data according to the image characteristics of the verification data set.

For example, if step S202 of FIG. 2 determines that the virtual data does not satisfy the learning data condition, the control unit 100 returns to step S200 to generate virtual data again. When doing so, the control unit 100 may limit the amount of change of any feature vector whose error between the virtual data and the verification data is at or above a preset level. This lowers the probability of again generating virtual data whose error relative to the verification data is at or above that level. FIG. 5 shows an example of generating virtual data that reflects the error between the virtual data and the verification data in this way.

Referring to FIG. 5, the control unit 100 of the machine learning model training apparatus 10 according to an embodiment of the present invention may determine that the currently generated virtual data does not satisfy the learning data condition. In that case it detects the error between the virtual data and the verification data for each feature vector (S500).

It then identifies at least one feature vector whose detected error is at or above a certain level (S502). It assigns each such feature vector a weight corresponding to the magnitude of its error (S504). Each weight may take a value greater than 0 and less than 1. The smaller the detected error, the closer the weight is to 1; the larger the error, the closer the weight is to 0.

Once the weights have been assigned, the control unit 100 handles each feature vector extracted from the basic data of the identification target. For each one, it determines the amount of change needed to alter the probability distribution when generating virtual data (S506).

The change amounts of the weighted feature vectors are then adjusted according to their weights (S508). The change amounts determined in step S506 are thereby reduced according to the weights determined in step S504. Finally, the control unit 100 generates virtual data using the per-feature-vector change amounts, including those adjusted by the weights (S510).
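Steps S500 to S510 can be sketched with features keyed by name. The mapping from error to weight, `1 / (1 + error / scale)`, is an assumption. It was chosen only because, for a positive error level, it yields values between 0 and 1 that shrink as the error grows, as the text requires. The function names and the feature names in the example are hypothetical.

```python
from typing import Dict


def assign_weights(errors: Dict[str, float], error_level: float,
                   scale: float = 1.0) -> Dict[str, float]:
    """S502-S504: weight each feature whose error reaches error_level (assumed > 0).

    Assumed mapping 1 / (1 + error / scale): small errors give weights near 1,
    large errors give weights near 0.
    """
    return {name: 1.0 / (1.0 + err / scale)
            for name, err in errors.items() if err >= error_level}


def weighted_changes(planned: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """S506-S508: shrink the planned change of each weighted feature; others unchanged."""
    return {name: delta * weights.get(name, 1.0) for name, delta in planned.items()}


def apply_changes(base: Dict[str, float], changes: Dict[str, float]) -> Dict[str, float]:
    """S510: produce virtual feature values from the basic-data features."""
    return {name: value + changes.get(name, 0.0) for name, value in base.items()}


errors = {"brightness": 0.1, "aspect_ratio": 3.0}     # per-feature errors from S500
weights = assign_weights(errors, error_level=1.0)     # only aspect_ratio is weighted
changes = weighted_changes({"brightness": 0.2, "aspect_ratio": 0.8}, weights)
virtual = apply_changes({"brightness": 0.5, "aspect_ratio": 1.5}, changes)
print(weights, changes, virtual)
```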

In other words, the control unit 100 of the machine learning model training apparatus 10 generates virtual data by changing the feature vectors of the basic data. When doing so, it limits the size of the change applied to any feature vector that produced an error at or above a certain level. Virtual data is thus generated in a direction that compensates for elements whose similarity to the verification data fell below a certain level. This lowers the probability of generating virtual data that is useless for training the deep learning model, and further reduces the time spent generating virtual data for training.

The above description discloses one configuration. In it, weights derived from the calculated error between the virtual data and the verification data restrict how much specific feature vectors may change during virtual data generation. This is only one example of generating virtual data in a way that reflects that error, and the present invention is not limited to it.

For example, the control unit 100 may also learn from the calculated error between the virtual data and the verification data so as to generate virtual data with smaller errors relative to the verification data. In this case the control unit 100 works to reduce this error in one of two ways: it may apply weights of preset values sequentially, or apply weights proportional to the magnitude of the error. Either way, the probability of generating virtual data whose error relative to the verification data is at or above a preset level is further lowered.

In the description of FIGS. 3 and 4 above, the learning data condition was illustrated only for the case where the similarity between the virtual data and the verification data is at or above a preset level, or the error is at or below a preset level. Alternatively, the virtual data may be determined to satisfy the learning data condition only when the similarity or error between the virtual data and the verification data falls within a certain range, that is, between a first level and a second level.

In this case, for example, the control unit 100 may determine that the virtual data satisfies the learning data condition only when the similarity between the virtual data and the verification data lies between 50% and 90%. The upper bound prevents an image identical to the input image from being used to train the machine learning algorithm model when an image input as basic data is reused as verification data.
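The range-based learning data condition reduces to a band check. A minimal sketch, assuming similarity is expressed as a fraction in [0, 1] and both bounds are inclusive; the function names are hypothetical. The error form mirrors claim 5, where the second level must be smaller than the first level.

```python
def satisfies_similarity_band(similarity, lower=0.5, upper=0.9):
    """Accept virtual data only if its similarity to the verification data
    lies in [lower, upper]: samples that are too dissimilar are rejected as
    invalid, and near-duplicates of the verification image are rejected too."""
    return lower <= similarity <= upper

def satisfies_error_band(error, first_level, second_level):
    """Error form of the same condition: second_level <= error <= first_level."""
    if second_level >= first_level:
        raise ValueError("second_level must be smaller than first_level")
    return second_level <= error <= first_level

print(satisfies_similarity_band(0.7))   # True
print(satisfies_similarity_band(0.95))  # False: too close to the input image
```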

While specific embodiments of the present invention have been described above, various modifications can be made without departing from the scope of the present invention. In particular, the description above explained generating virtual data by applying weights to limit the amount of change of feature vectors whose error is at or above a certain level. This is only one example assumed in the present invention; other methods may of course be applied that reflect the difference between the virtual data and the verification data so that virtual data is generated in a direction that compensates for elements whose similarity to the verification data is below a certain level.

The present invention described above can be implemented as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable media include a hard disk drive (HDD), solid state disk (SSD), silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device; the medium may also be implemented in the form of a carrier wave (e.g., transmission over the Internet). The computer may also include the control unit 100.

Those skilled in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed herein are intended to explain, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present invention.

10: machine data learning apparatus
100: control unit 110: input unit
120: virtual data generation unit 130: comparison unit
140: learning unit 150: memory

Claims

1. A machine learning model learning apparatus for training a machine learning model using virtual data, the apparatus comprising:
a virtual data generation unit that generates virtual data from image data of an identification target acquired through CAD (Computer Aided Design) software or a game engine;
a comparison unit that compares verification data, which includes features of actual images of the identification target, with features extracted from the virtual data, and determines whether the virtual data satisfies a learning data condition;
a learning unit that trains the machine learning model; and
a control unit that controls the learning unit, according to the determination result of the comparison unit, so that the machine learning model is trained with the generated virtual data.
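The division of roles among the units of claim 1 can be sketched as a small control loop. This is an illustrative sketch only: feature vectors stand in for image data, the error metric and threshold are supplied by the caller, and the "learning unit" merely collects accepted samples where a real system would run a training update. The class `LearningApparatus`, its method names, and `max_abs_error` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Vector = List[float]

def max_abs_error(a: Vector, b: Vector) -> float:
    """Largest per-feature deviation between two feature vectors (assumed metric)."""
    return max(abs(x - y) for x, y in zip(a, b))

@dataclass
class LearningApparatus:
    generate: Callable[[Vector], Vector]       # virtual data generation unit
    error: Callable[[Vector, Vector], float]   # metric used by the comparison unit
    first_level: float                         # preset error threshold
    accepted: List[Vector] = field(default_factory=list)

    def comparison_unit(self, virtual: Vector, verification: Vector) -> bool:
        """Learning data condition: error at or below the first level."""
        return self.error(virtual, verification) <= self.first_level

    def learning_unit(self, virtual: Vector) -> None:
        # Placeholder: a real system would run a training update here.
        self.accepted.append(virtual)

    def control_unit(self, basic: Vector, verification: Vector) -> bool:
        """Generate one virtual sample; train on it only if it is accepted."""
        virtual = self.generate(basic)
        if self.comparison_unit(virtual, verification):
            self.learning_unit(virtual)
            return True
        return False

apparatus = LearningApparatus(
    generate=lambda v: [x + 0.1 for x in v],
    error=max_abs_error,
    first_level=0.2,
)
apparatus.control_unit([1.0, 2.0], verification=[1.1, 2.1])  # accepted
apparatus.control_unit([5.0, 5.0], verification=[1.1, 2.1])  # rejected
```

Keeping the generator and metric as injected callables reflects that the claims leave both open (CAD or game-engine rendering; several candidate error metrics).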

2. The machine learning model learning apparatus according to claim 1, wherein
the verification data is
a single image acquired of the identification target, and
the comparison unit
calculates an error between two images, namely the verification data and the generated virtual data, and determines whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.
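Claims 2 and 3 leave the image-to-image error metric open. Taking the Euclidean distance method listed in claim 3, a minimal sketch might compare two equally sized grayscale images pixel by pixel. Representing images as nested lists of intensities is an assumption, and the function names are hypothetical.

```python
import math

def euclidean_image_error(image_a, image_b):
    """Euclidean (L2) distance between two equally sized grayscale images,
    given as nested lists of pixel intensities."""
    if len(image_a) != len(image_b) or any(
            len(r1) != len(r2) for r1, r2 in zip(image_a, image_b)):
        raise ValueError("images must have the same shape")
    return math.sqrt(sum((p - q) ** 2
                         for row_a, row_b in zip(image_a, image_b)
                         for p, q in zip(row_a, row_b)))

def accept_virtual_image(virtual, verification, first_level):
    """Learning data condition of claim 2: error at or below the first level."""
    return euclidean_image_error(virtual, verification) <= first_level

print(euclidean_image_error([[0, 0], [0, 0]], [[3, 4], [0, 0]]))  # 5.0
```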

3. The machine learning model learning apparatus according to claim 2, wherein the comparison unit
calculates the error between the verification data and the generated virtual data based on at least one algorithm among: the Euclidean distance method; the DPM (Deformable Part Model) method, which considers distances between HoG (Histograms of Oriented Gradients) vectors; the AdaBoost method; and the MMD (Maximum Mean Discrepancy) method, which takes the difference between the mean function values of two samples, a larger result (error value) indicating a higher likelihood that the samples come from different distributions.
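Of the metrics listed above, MMD is the least self-explanatory. A minimal sketch of the standard biased estimate of squared MMD with a Gaussian kernel follows; the kernel choice and bandwidth `sigma` are assumptions not stated in the claims, samples are assumed to be feature vectors given as lists of floats, and DPM and AdaBoost are not illustrated.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between sample sets.

    Larger values suggest the two sets are drawn from different distributions.
    """
    def mean_kernel(a, b):
        return sum(gaussian_kernel(p, q, sigma) for p in a for q in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

real = [[0.0], [0.1], [0.2]]
near = [[0.05], [0.15], [0.25]]
far = [[3.0], [3.1], [3.2]]
# mmd_squared(real, far) is much larger than mmd_squared(real, near).
```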

4. The machine learning model learning apparatus according to claim 1, wherein
the verification data is
data on features that a plurality of images acquired of the identification target have in common, and
the comparison unit
calculates an error between at least one probability distribution based on the features included in the verification data and at least one probability distribution based on features calculated from the generated virtual data, and determines whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.
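One way to realize the distribution comparison of claim 4 is to bin a shared scalar feature into normalized histograms and measure their distance. This sketch assumes each common feature is a single scalar measured per image, and uses fixed-width histogram bins with total variation distance as the "error"; these are illustrative choices, not the method specified in the claims, and the function names are hypothetical.

```python
from collections import Counter

def histogram(values, bin_width):
    """Normalized histogram (discrete probability distribution) of scalar values."""
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {b: c / total for b, c in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    bins = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in bins)

def distribution_error(verification_values, virtual_values, bin_width=0.1):
    """Error between the feature distributions of real and virtual images."""
    return total_variation(histogram(verification_values, bin_width),
                           histogram(virtual_values, bin_width))

real_feature = [0.05, 0.15, 0.25]   # feature values from several real images
virtual_feature = [0.95, 1.05]      # same feature measured on virtual images
```

Because the result is bounded in [0, 1], the preset first level can be chosen on a fixed scale regardless of the feature's units.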

5. The machine learning model learning apparatus according to claim 2 or 4, wherein the control unit
determines that the virtual data satisfies the learning data condition when the error calculated between the verification data and the generated virtual data is equal to or less than the first level and equal to or greater than a second level having a value smaller than the first level.

6. The machine learning model learning apparatus according to claim 1, wherein the virtual data generation unit
extracts feature vectors corresponding to a plurality of features obtained from the image data, and generates the virtual data by changing the extracted feature vectors.

7. The machine learning model learning apparatus according to claim 6, wherein the control unit
determines a weight for at least one feature vector according to a result of comparing the verification data with the features extracted from the virtual data, and generates the virtual data by reflecting the determined weight in the changes to the extracted feature vectors.

8. A machine learning model learning method of a machine learning model learning apparatus that performs learning using virtual data, the method comprising:
a first step of generating virtual data from image data of an identification target acquired through CAD (Computer Aided Design) software or a game;
a second step of comparing verification data, which includes features of actual images of the identification target, with features extracted from the virtual data, and determining whether the virtual data satisfies a learning data condition; and
a third step of training the machine learning model with the virtual data according to the determination result of the second step.

9. The machine learning model learning method according to claim 8, wherein
the verification data is
a single image acquired of the identification target, and
the second step comprises:
a step 2-1 of calculating an error between the verification data and the generated virtual data based on at least one preset algorithm for calculating an error between two images; and
a step 2-2 of determining whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.

10. The machine learning model learning method according to claim 9, wherein the at least one preset algorithm
is at least one of: the Euclidean distance method; the DPM (Deformable Part Model) method, which considers distances between HoG (Histograms of Oriented Gradients) vectors; the AdaBoost method; and the MMD (Maximum Mean Discrepancy) method, which takes the difference between the mean function values of two samples, a larger result (error value) indicating a higher likelihood that the samples come from different distributions.

11. The machine learning model learning method according to claim 8, wherein
the verification data is
data on features that a plurality of images acquired of the identification target have in common, and
the second step comprises:
a step a of extracting at least one probability distribution based on the features included in the verification data;
a step b of calculating an error between at least one probability distribution based on features calculated from the generated virtual data and the at least one probability distribution extracted from the verification data; and
a step c of determining whether the virtual data satisfies the learning data condition according to whether the calculated error is equal to or less than a preset first level.

12. The machine learning model learning method according to claim 8, wherein the first step comprises:
a step 1-1 of extracting feature vectors corresponding to a plurality of features obtained from the image data; and
a step 1-2 of generating the virtual data by changing the extracted feature vectors.

13. The machine learning model learning method according to claim 12, wherein the step 1-2
further comprises a step 1-3 of determining a weight for at least one feature vector according to a result of comparing the verification data with the features extracted from the virtual data, and generating new virtual data by reflecting the determined weight in the changes to the extracted feature vectors.