KR102384009B1 - Learning data augmentation method and apparatus by composing object and background

Info

Publication number: KR102384009B1
Application number: KR1020200185353A
Authority: KR
Inventors: 채승엽; 전세윤; 김소원
Original assignee: 주식회사 마크애니
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-04-08

Abstract

Disclosed is a learning data augmentation method that composites objects onto a background in a learning data augmentation apparatus and thereby increases the realism of the augmented learning data. The method comprises the following steps: extracting an object image (the object image is a learning target) and determining a type of the object image; inputting a background image (the background image includes a plurality of different areas) and distinguishing a first background area and a second background area in the background image; and combining the object image with the first background area and the second background area to augment learning data. The learning data augmentation step includes randomly arranging object images of a first type, corresponding to the first background area, in the first background area, and randomly arranging object images of a second type, corresponding to the second background area, in the second background area.

Description

Learning data augmentation method and apparatus by composing object and background

The present invention relates to a method for augmenting learning data, and more specifically to a method for efficiently synthesizing an object and a background to generate learning data.

Machine learning is a methodology for training a computer using data. It can be broadly divided into supervised learning, unsupervised learning, and reinforcement learning. Of these, supervised learning trains a computer on data for which labels (explicit correct answers) are given.

In general, the more data a machine learning model is trained on, the better it performs. Meanwhile, the convolutional neural network (CNN), one type of machine learning model, shows excellent performance in the field of image detection. Because a CNN has hundreds of thousands of parameters, it must be trained with a sufficient number of training images. Therefore, to improve the performance of an object detection neural network that detects objects in images, it is necessary either to train on a larger amount of data or to improve the neural network itself.

The conventional process for building learning data involves collecting relevant data, cleaning out data that hinders learning or is unnecessary, and labeling every object so that learning is possible.

FIG. 1 is a conceptual diagram for explaining a conventional method of augmenting learning data.

Referring to FIG. 1, the process of building learning data comprises collecting relevant data for object extraction, cleaning out unnecessary data that hinders learning, and a labeling process of marking every object in an answer file so that learning is possible. Building learning data in this way consumes a great deal of time and manpower.

More specifically, learning data can be augmented by cutting out a correct-answer object and compositing it onto a different background. The process of marking the position and area of an object to be learned (for example, a person) in an image so that a computer can identify it is called labeling. In learning data built as described above, the objects are already labeled and are therefore easy to cut out. Moreover, when an object is composited onto a different background, its position is already known during compositing, so no separate labeling work is needed.

In addition, as a method of augmenting learning data, multiple augmented learning data can be generated from a single background image by processing the background image during compositing, for example by inverting it, converting it to black and white, or rotating it. Learning data can also be augmented by applying inversion, black-and-white conversion, rotation, scaling, flipping, perspective transform, and lighting-condition changes to the object image. Additionally, many augmented learning data can be generated by randomly arranging one or more correct-answer objects within a background image.
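As a rough illustration of the conventional processing operations listed above, the following minimal sketch applies a horizontal flip, a black-and-white conversion, and a 90-degree rotation to a tiny image. The image representation (nested lists of RGB tuples) and all function names are assumptions made for this example only; the patent does not prescribe any particular implementation.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255 (assumed representation)
Image = List[List[Pixel]]      # rows of pixels


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def to_grayscale(img: Image) -> Image:
    """Replace each pixel with the integer mean of its three channels."""
    return [[((r + g + b) // 3,) * 3 for (r, g, b) in row] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three processed variants."""
    return [img, flip_horizontal(img), to_grayscale(img), rotate_90_clockwise(img)]
```

Each variant counts as an additional training sample derived from the same source image, which is the sense in which one image yields "multiple augmented learning data".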

However, learning data augmented in this way takes no account of the relationship between the object and the background, so the result looks unnatural, and this unnaturalness degrades the effectiveness of learning such as machine learning. In addition, repeated use of the same background images causes overfitting during neural network training.

An object according to one aspect of the present invention, devised to solve the above problems, is to provide an object-background synthesis method for learning data augmentation that generates augmented learning data by designating a background area according to the object and realistically compositing the object and the background.

According to one aspect of the present invention for achieving the above object, a method for augmenting learning data through object and background synthesis, performed in a learning data augmentation apparatus, may include: extracting an object image (the object image being a learning target); determining a type of the object image; inputting a background image (the background image including a plurality of different areas); distinguishing a first background area and a second background area in the background image; and augmenting learning data by compositing the object image onto the first background area and the second background area. The augmenting step may include randomly arranging, in the first background area, object images of a first type corresponding to the first background area, and randomly arranging, in the second background area, object images of a second type corresponding to the second background area.

The first background area may include a sidewalk area on which people walk, and the first type of object may include a person-type object.

The second background area may include a road area on which vehicles travel, and the second type of object may include a vehicle-type object.

The random arrangement may place at least one copy of the same object image in its corresponding background area, at spatially random positions.

Because the arrangement is spatially random, a plurality of different learning data can be generated from a single background image.
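The idea of spatially random arrangement inside a type-specific area can be sketched as follows. This is only an illustration under simplifying assumptions: each background area is modeled as an axis-aligned rectangle, every object of a type has a fixed size, and the names `random_position` and `arrange` are hypothetical. A real background area (for example a sidewalk) would generally have an irregular shape.

```python
import random
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height); rectangular areas are an assumption


def random_position(region: Rect, obj_w: int, obj_h: int,
                    rng: random.Random) -> Tuple[int, int]:
    """Pick a top-left corner so that the object lies fully inside the region."""
    x, y, w, h = region
    if obj_w > w or obj_h > h:
        raise ValueError("object does not fit in region")
    return rng.randint(x, x + w - obj_w), rng.randint(y, y + h - obj_h)


def arrange(regions: Dict[str, Rect], sizes: Dict[str, Tuple[int, int]],
            counts: Dict[str, int], seed: int) -> List[Tuple[str, int, int]]:
    """Place counts[t] objects of each type t at random positions in regions[t]."""
    rng = random.Random(seed)
    placements = []
    for obj_type, region in regions.items():
        obj_w, obj_h = sizes[obj_type]
        for _ in range(counts[obj_type]):
            px, py = random_position(region, obj_w, obj_h, rng)
            placements.append((obj_type, px, py))
    return placements
```

Calling `arrange` with different seeds on the same regions yields different layouts, which corresponds to generating several training samples from one background image.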

The method may further include distinguishing, within the background image, a third area in which objects cannot be arranged, and filling the third area with noise.
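A minimal sketch of the noise-filling step is shown below. It assumes a grayscale image stored as nested lists, rectangular arrangeable areas, and uniform random noise. The patent does not state the noise distribution or the image format, so these choices and the function name are illustrative only.

```python
import random
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), assumed rectangular


def fill_excluded_with_noise(img: List[List[int]], placeable: List[Rect],
                             rng: random.Random) -> List[List[int]]:
    """Return a copy of img in which every pixel outside the placeable
    rectangles is replaced by a uniform random value in 0..255."""
    out = [row[:] for row in img]
    for y, row in enumerate(out):
        for x in range(len(row)):
            inside = any(rx <= x < rx + rw and ry <= y < ry + rh
                         for rx, ry, rw, rh in placeable)
            if not inside:
                row[x] = rng.randint(0, 255)
    return out
```

Pixels inside the arrangeable areas are left untouched, so objects can still be composited there afterwards.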

The correspondence between the first background area and the first type of object, and the correspondence between the second background area and the second type of object, may be stored in advance.

The categories that define the type of the object image may form a first tree structure, and the background areas corresponding to each type of object image may form a second tree structure. The first tree structure and the second tree structure are associated with each other through a correspondence relationship, and based on this association, the first background area corresponding to the first type of object image and the second background area corresponding to the second type of object image may be determined.
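One way to picture two associated tree structures is sketched below. The category names, the area names, and the rule of falling back to an ancestor category when a category has no direct correspondence are all assumptions for illustration; the patent states only that the two trees are linked by a correspondence relationship.

```python
from typing import Dict, List, Optional

# Child -> parent maps; None marks a root. All names are hypothetical.
OBJECT_TREE: Dict[str, Optional[str]] = {
    "person": None, "adult": "person", "child": "person",
    "vehicle": None, "car": "vehicle", "truck": "vehicle",
}
AREA_TREE: Dict[str, Optional[str]] = {
    "walkable": None, "sidewalk": "walkable", "crosswalk": "walkable",
    "drivable": None, "road": "drivable",
}
# Correspondence between nodes of the first tree and nodes of the second tree.
CORRESPONDENCE: Dict[str, str] = {"person": "walkable", "vehicle": "drivable"}


def matched_area_node(category: str) -> Optional[str]:
    """Walk up the object tree until a node with a stored correspondence is found."""
    node: Optional[str] = category
    while node is not None:
        if node in CORRESPONDENCE:
            return CORRESPONDENCE[node]
        node = OBJECT_TREE.get(node)
    return None


def leaf_areas(area_node: str) -> List[str]:
    """Return the concrete (leaf) background areas below a node of the area tree."""
    children = [c for c, parent in AREA_TREE.items() if parent == area_node]
    if not children:
        return [area_node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaf_areas(child))
    return leaves
```

With these sample trees, a "child" object resolves to the "walkable" node and therefore to the sidewalk and crosswalk areas, while a "truck" resolves to the road area.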

According to one aspect of the present invention for achieving the above object, an apparatus for augmenting learning data through object and background synthesis may include: an object extraction unit that extracts an object image (the object image being a learning target); an object category determination unit that determines a type of the object image; a background image input unit that inputs a background image (the background image including a plurality of different areas); an object arrangement area designation unit that distinguishes a first background area and a second background area in the background image; and an object-background synthesis unit that augments learning data by compositing the object image onto the first background area and the second background area. The object-background synthesis unit may randomly arrange, in the first background area, object images of a first type corresponding to the first background area, and randomly arrange, in the second background area, object images of a second type corresponding to the second background area.

According to another aspect of the present invention for achieving the above object, a method for augmenting learning data through object and background synthesis, performed in a learning data augmentation apparatus, may include: extracting an object image that is a learning target; inputting a background image for learning data augmentation; designating, in the background image, an object arrangement area corresponding to the extracted object image according to an object-background matching policy; and randomly arranging the extracted object image in the designated object arrangement area.

The object images may be categorized, the object-background matching policy may include image feature information of the object arrangement area corresponding to the category of the object image, and, based on the feature information, the object arrangement area corresponding to the category of the object image may be extracted from within the background image.

The object-background matching policy may include a first tree structure defining the categories of the object image and a second tree structure defining the object arrangement areas corresponding to the object image. The policy defines a correspondence relationship between the first tree structure and the second tree structure, and the object arrangement area corresponding to the object image may be designated based on that correspondence.

In determining the category of the object image, the first tree structure may be used to match the object image to the lowest-level category on the first tree structure to which it can be matched.
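Matching to the lowest possible level of a category tree can be sketched as a descent that continues as long as some child category still matches. The tree, the feature keys (`is_person`, `height_cm`, `carrying_bag`), and the matching predicates below are hypothetical stand-ins; the patent does not specify how a category match is tested.

```python
from typing import Callable, Dict, List, Optional

Features = Dict[str, object]

# Parent -> children. Category names and predicates are illustrative assumptions.
CATEGORY_TREE: Dict[str, List[str]] = {
    "person": ["adult", "child"],
    "adult": ["adult_with_bag"],
    "child": [],
    "adult_with_bag": [],
}
MATCHES: Dict[str, Callable[[Features], bool]] = {
    "person": lambda f: bool(f.get("is_person", False)),
    "adult": lambda f: f.get("height_cm", 0) >= 150,
    "child": lambda f: 0 < f.get("height_cm", 0) < 150,
    "adult_with_bag": lambda f: bool(f.get("carrying_bag", False)),
}


def lowest_category(features: Features, root: str = "person") -> Optional[str]:
    """Descend from the root while a child category still matches, and
    return the deepest matching category (None if the root does not match)."""
    if not MATCHES[root](features):
        return None
    node = root
    while True:
        child = next((c for c in CATEGORY_TREE[node] if MATCHES[c](features)), None)
        if child is None:
            return node
        node = child
```

An object with too little information to match any child stays at the more general category, which is what "lowest matchable level" implies.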

The object-background matching policy may define a random arrangement probability indicating how densely a specific object image can be distributed in a specific object arrangement area, and the object image may be randomly arranged in the designated object arrangement area in consideration of the random arrangement probability.
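The text does not say how the random arrangement probability is turned into an actual number of objects. One possible reading, sketched here purely as an assumption, treats it as the chance that each candidate slot in the area is occupied, so that a higher probability gives a denser arrangement. The table values and the function name are invented for illustration.

```python
import random
from typing import Dict, Tuple

# (object category, background area) -> probability that a candidate slot is occupied.
# The pairs and values are hypothetical examples, not taken from the patent.
ARRANGEMENT_PROBABILITY: Dict[Tuple[str, str], float] = {
    ("person", "sidewalk"): 0.6,
    ("person", "crosswalk"): 0.2,
    ("vehicle", "road"): 0.4,
}


def sample_count(category: str, area: str, max_slots: int,
                 rng: random.Random) -> int:
    """Sample how many objects to place: each of max_slots candidate slots
    is filled independently with the stored probability (0 if no entry exists)."""
    p = ARRANGEMENT_PROBABILITY.get((category, area), 0.0)
    return sum(1 for _ in range(max_slots) if rng.random() < p)
```

Under this reading, a pair that is absent from the table (for example a vehicle on a sidewalk) is never placed, and sidewalks receive more people on average than crosswalks.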

According to another aspect of the present invention for achieving the above object, an apparatus for augmenting learning data through object and background synthesis may include: an object extraction unit that extracts an object image that is a learning target; a background image input unit that inputs a background image for learning data augmentation; an object arrangement area designation unit that designates, in the background image, an object arrangement area corresponding to the extracted object image according to an object-background matching policy; and an object-background synthesis unit that randomly arranges the extracted object image in the designated object arrangement area.

According to the method of the present invention for augmenting learning data through object and background synthesis, learning data is augmented based on the relationship between the object and the background, so the augmented learning data is more realistic, which in turn improves the performance of a deep learning engine that uses it.

FIG. 1 is a conceptual diagram for explaining a conventional method of augmenting learning data;
FIG. 2 is a flowchart illustrating a method of augmenting learning data through object and background synthesis according to an embodiment of the present invention;
FIG. 3 is a conceptual diagram for explaining a method of designating an object and a background area corresponding to the object and compositing them realistically;
FIG. 4 is an exemplary view showing an image of learning data augmented by compositing objects onto a background image according to the method of FIG. 3;
FIG. 5 is a detailed flowchart showing the process of filling the object arrangement exclusion area, that is, the area other than the object arrangement areas, with noise;
FIG. 6 is an exemplary view showing actual augmented learning data generated by filling part of a background image with noise and compositing real objects onto another part according to the method of FIG. 5;
FIG. 7 is an exemplary view showing a structure in which person objects are categorized;
FIG. 8 is an exemplary view showing a structure in which vehicle objects are categorized;
FIG. 9 is a conceptual diagram for explaining how the object-background matching table manages the probability that a specific object is arranged in a specific background area; and
FIG. 10 is a block diagram illustrating an apparatus for augmenting learning data through object and background synthesis according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments, and it should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to preclude the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 2 is a flowchart illustrating a method of augmenting learning data through object and background synthesis according to an embodiment of the present invention.

Referring to FIG. 2, the apparatus extracts correct-answer objects from a number of already-held images prepared for machine learning of an object detection algorithm (S210). A correct-answer object is an object that the object detection algorithm is actually meant to detect. It may be any of various types of object, such as a person object, a vehicle object, or a product object. Correct-answer objects may be extracted from the held images by user selection. That is, the user extracts and registers the objects to be learned from the held images according to the purpose of the image analysis. When an object is extracted, information related to the object is annotated (a kind of labeling work) so that it can later be used for labeling when augmented learning data is generated through object arrangement.

The apparatus determines the category of the extracted correct-answer object (S220). The apparatus collects the extracted correct-answer objects and prepares to composite them onto a background image. To composite each object into an area of the background image that is suitable for it, it is desirable to clearly define the nature of the extracted object. If the extracted object contains a human body, it can be explicitly defined as an object of the "person" type so that it is arranged in the corresponding part of the background image. The object may be specified directly by the user, or the apparatus may itself assign it to one of the objects predefined in an object-background matching policy held in advance (when the policy takes the form of a table, it may be called an "object-background matching table"). In the latter case, the apparatus may analyze the features of the extracted objects and compare them with the features of predefined objects of a specific type and/or category (which may be defined by a number of parameters constituting an image).

The apparatus may receive a background image onto which the extracted objects are to be composited (S230). The background image may be stored in advance in a memory of the apparatus, or may be received from a number of devices connected via a wired or wireless network.

Once the background image has been input, the apparatus designates, within the input background image, object-arrangeable areas in which objects can be arranged (S240). This is done by consulting the object-background matching table with reference to the category of the correct-answer object specified in step S220. The object-background matching table defines, for each object category, the features of the corresponding background area. The matching relationship is set based on whether the object could realistically exist in that background area. The matching relationship may be stored in advance in the memory of the apparatus, or may be set by the user directly designating a background area.

For example, objects of the person category are defined to be arranged in background areas such as "sidewalk, crosswalk, inside a building", and the apparatus may hold the image features of each defined background area in parameterized form. The apparatus determines whether a background area corresponding to the object category defined in step S220 exists in the background image input in step S230 and, according to that determination, divides the input background image into one or more areas and designates the area in which objects are to be arranged. The object-background matching table also defines the image features of the background area matched to each object, using a number of image-related parameters. The apparatus uses the background-area parameters defined in the table to extract the object-arrangeable areas from the background image, and manages each extracted area in association with its correct-answer object so that the object can be randomly arranged within that area. When there are multiple correct-answer objects to be learned, multiple corresponding object-arrangeable areas may be designated in the background image. Conversely, if the background image contains no object-arrangeable area, the apparatus skips that background image, receives another background image, and repeats step S240.
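The table lookup and the "skip this background and take the next one" behavior of step S240 can be sketched as follows. The sketch assumes that each background image has already been segmented into named areas with rectangular extents. How that segmentation is obtained from image-feature parameters is not shown, and the table contents, file names, and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)
Segments = Dict[str, List[Rect]]  # area name -> regions found in one background image

# Hypothetical object-background matching table: category -> allowed area names.
MATCHING_TABLE: Dict[str, List[str]] = {
    "person": ["sidewalk", "crosswalk", "building_interior"],
    "vehicle": ["road"],
}


def designate_areas(segments: Segments, categories: Iterable[str]) -> Dict[str, List[Rect]]:
    """For each category, collect the regions of the background that the table allows."""
    designated: Dict[str, List[Rect]] = {}
    for category in categories:
        regions = [r for area in MATCHING_TABLE.get(category, [])
                   for r in segments.get(area, [])]
        if regions:
            designated[category] = regions
    return designated


def first_usable_background(backgrounds: Iterable[Tuple[str, Segments]],
                            categories: List[str]
                            ) -> Optional[Tuple[str, Dict[str, List[Rect]]]]:
    """Skip backgrounds with no arrangeable area and return the first usable one."""
    for name, segments in backgrounds:
        designated = designate_areas(segments, categories)
        if designated:
            return name, designated
    return None
```

A background that contains only sky, for instance, yields no designated area for either category and is passed over in favor of the next image.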

Once the object arrangement areas have been designated in the background image, the objects corresponding to each designated area are randomly arranged in it, completing the compositing of objects and background (S250). The arrangement may be random in that neither the positions nor the number of objects in the corresponding area is fixed. While randomly arranging each object within its designated area, the apparatus also performs labeling. That is, it stores the size and position of each object as a label so that a machine learning program analyzing the image can learn which object is located where. The category of the object may also be labeled, as may the codec and other state information. During compositing, an object may be inverted, converted to black and white, rotated, scaled, flipped, perspective-transformed, or given a different lighting condition, and information about such processing may be recorded in the label.
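Because the apparatus chooses the position, size, and processing of each object itself, the label can be written at the moment of placement. The sketch below illustrates this with a hypothetical `Label` record and `place_and_label` function. The label fields, the set of scale factors, and the rectangular region are assumptions, and actual pixel compositing is omitted.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Label:
    """Annotation recorded for one composited object (fields are illustrative)."""
    category: str
    x: int
    y: int
    width: int
    height: int
    transforms: Dict[str, object] = field(default_factory=dict)


def place_and_label(category: str, obj_size: Tuple[int, int], region: Rect,
                    rng: random.Random) -> Label:
    """Choose a random scale, flip, and position inside the region, and
    return the label describing exactly what was placed and where."""
    scale = rng.choice([0.5, 0.75, 1.0])   # assumed set of scale factors
    flipped = rng.random() < 0.5           # horizontal flip half of the time
    width = max(1, int(obj_size[0] * scale))
    height = max(1, int(obj_size[1] * scale))
    rx, ry, rw, rh = region
    if width > rw or height > rh:
        raise ValueError("scaled object does not fit in region")
    x = rng.randint(rx, rx + rw - width)
    y = rng.randint(ry, ry + rh - height)
    return Label(category, x, y, width, height,
                 {"scale": scale, "flipped": flipped})
```

No separate annotation pass is needed afterwards, since every placed object already carries its category, bounding box, and processing record.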

도 3은 객체와 객체에 대응하는 배경영역을 지정하여 현실적으로 합성하는 방법을 설명하기 위한 개념도이다. 3 is a conceptual diagram for explaining a method of realistically synthesizing an object and a background region corresponding to the object.

도 3을 참조하면, 배경이미지를 입력받으면, 장치는 추출된 객체에 대응하는 객체 배열 영역을 결정한다. 만약, 장치가 하나의 타입의 객체에 대한 학습데이터의 증강을 달성하고자 할 때는, 단일 타입의 객체에 대응하는 배경영역만을 지정하면 된다. 장치가 둘 이상의 타입의 객체에 대한 학습데이터 증강을 달성하고자 할 때는, 둘 이상의 타입의 객체를 모두 고려하여 복수 개의 배경 영역을 지정하는 것이 바람직하다. Referring to FIG. 3 , upon receiving a background image, the device determines an object arrangement area corresponding to the extracted object. If the device intends to achieve augmentation of learning data for one type of object, only a background area corresponding to a single type of object needs to be designated. When the device intends to achieve training data augmentation for two or more types of objects, it is preferable to designate a plurality of background regions in consideration of all two or more types of objects.

도 3의 실시예에서, 장치는 A 타입의 객체와 B 타입 객체에 대한 증강된 학습데이터를 생성하고자 할 때, A 타입 객체에 대응하는 배경영역으로 영역(310)을 지정하고, B 타입 객체에 대응하는 배경영역으로 영역(320)을 지정할 수 있다. 이때, A 타입 객체은 인체를 포함하는 "사람" 객체일 수 있고, 영역(310)은 사람이 지나다닐 수 있는 인도 영역으로 식별하여 지정될 수 있다. 그리고, B 타입 객체는 "차량" 객체일 수 있고, 영역(320)은 차량이 지나다닐 수 있는 도로 영역으로 식별하여 지정된다. In the embodiment of Fig. 3, when the device intends to generate the augmented learning data for the A-type object and the B-type object, the area 310 is designated as the background area corresponding to the A-type object, and the B-type object is The area 320 may be designated as the corresponding background area. In this case, the A-type object may be a "person" object including a human body, and the area 310 may be identified and designated as a guide area through which a person can pass. In addition, the B-type object may be a “vehicle” object, and the area 320 is identified and designated as a road area through which a vehicle may pass.

The matching relationship between type A objects and the image features of the corresponding area 310, and between type B objects and the image features of the corresponding area 320, is defined through an object-background matching table or a matching policy. Based on the defined relationships, the apparatus designates the areas of the background image in which objects will be arranged.

The object arrangement exclusion area 330, which is the part of the background image that does not correspond to any object, may be calculated automatically once the object arrangement areas 310 and 320 have been designated. The apparatus arranges no objects in the exclusion area 330 and randomly arranges the corresponding objects only in areas 310 and 320, producing a realistic learning image. In the embodiment of FIG. 3, the apparatus arranges three type A objects in area 310 and two type B objects in area 320. As described above, when each object is arranged, its type (a single type contains multiple hierarchical categories), size, position, and other environmental information are labeled so that the learning program can recognize them.
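The automatic calculation of the exclusion area can be illustrated as the complement of the designated arrangement areas. The sketch below assumes rectangular areas and a plain nested-list mask; `exclusion_mask` is a hypothetical helper, and a real segmentation would likely produce irregular regions.

```python
def exclusion_mask(height, width, regions):
    """Return a grid that is True wherever no arrangement area covers the pixel.

    regions: iterable of (x0, y0, x1, y1) rectangles, with x1 and y1 exclusive.
    """
    mask = [[True] * width for _ in range(height)]
    for x0, y0, x1, y1 in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = False  # pixel belongs to an arrangement area
    return mask


# Two toy arrangement areas standing in for areas 310 and 320.
mask = exclusion_mask(4, 6, [(0, 0, 2, 2), (4, 2, 6, 4)])
```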

Meanwhile, a plurality of learning images may be generated from one background image by varying the random arrangement of objects. For example, type A objects may be arranged in area 310 with a different position, number, or other parameters (inversion, rotation, scaling, and so on). Accordingly, another learning image may be generated by arranging two type A objects and, in area 320, five type B objects. It is preferable that the apparatus generate as many learning images as possible from one background image until a preset criterion is reached. As one example of such a criterion, the apparatus may set a number of iterations based on the size and/or number of the object arrangement areas and keep augmenting learning images until that number is reached.

FIG. 4 is an exemplary view showing an image of learning data augmented by synthesizing objects onto a background image according to the method of FIG. 3.

Referring to the upper part of FIG. 4, the apparatus divides a background image containing a sidewalk, a road, a river, buildings, and so on into a plurality of areas and defines the areas in which objects can be arranged. In the embodiment of FIG. 4, area 410 may be defined as a sidewalk area used by people, area 420 as a road area used by vehicles, and the remaining areas as areas where objects cannot be arranged.

When detecting arrangeable areas, the apparatus relies on the target object (the correct-answer object to be detected). That is, a plurality of arrangeable areas may be defined for a given target object, and the apparatus analyzes whether at least one of them is present in the background image.

Referring to the lower part of FIG. 4, the apparatus places only person objects 412-1 and 412-2 in area 410 and only vehicle objects (not shown) in area 420, generating augmented learning data in which objects and background are synthesized more realistically.

FIG. 5 is a detailed flowchart showing the process of filling the object arrangement exclusion area, that is, everything other than the object arrangement areas, with noise.

Referring to FIG. 5, the apparatus designates the areas in which objects can be arranged according to the method of FIG. 2 (in particular, step S240) (S510). Once one or more such areas have been determined, the apparatus may calculate the remainder of the background image outside those areas and designate it as the object arrangement exclusion area (S520).

The apparatus may then process the area designated as the exclusion area as noise (S530). From the standpoint of training an object detection algorithm, it is desirable for the background to take random values, since the goal is to detect only the target object accurately. Therefore, for parts of the image where an object is (almost) never expected to appear, treating them as noise and making them as random as possible can be a good way to improve learning performance.

The noise is preferably white noise (AWGN: Additive White Gaussian Noise). Repeated use of the same background image can cause overfitting during training of an artificial neural network. Therefore, when augmenting learning data by synthesizing objects and backgrounds, it is preferable to designate an object arrangement exclusion area and fill it with freshly randomized white noise each time the same background is reused, thereby preventing overfitting.
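Filling the exclusion area with Gaussian noise (S530) might look like the sketch below. It assumes a single-channel image stored as nested lists of 0 to 255 values and replaces excluded pixels with clipped Gaussian samples; the `fill_with_noise` name and the `mean` and `sigma` defaults are illustrative assumptions, since the description gives no noise parameters.

```python
import random


def fill_with_noise(image, mask, rng, mean=127.5, sigma=40.0):
    """Return a copy of `image` whose masked pixels are replaced by Gaussian noise.

    image: 2D list of grayscale values in 0..255.
    mask:  2D list of booleans, True where the pixel is in the exclusion area.
    """
    out = [row[:] for row in image]  # leave the reusable background untouched
    for y, row in enumerate(mask):
        for x, excluded in enumerate(row):
            if excluded:
                value = rng.gauss(mean, sigma)
                out[y][x] = min(255, max(0, int(round(value))))  # clip to pixel range
    return out


image = [[10, 20], [30, 40]]
mask = [[True, False], [False, True]]
out = fill_with_noise(image, mask, random.Random(0))
```

Passing a differently seeded generator on each reuse of the same background gives a different noise pattern every time, which is the property the description relies on to limit overfitting.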

FIG. 6 is an exemplary view showing augmented learning data generated by filling part of a background image with noise and synthesizing objects into another part according to the method of FIG. 5.

Referring to the upper part of FIG. 6, areas 610 and 620 are areas in which person objects and vehicle objects can be arranged, and the apparatus may designate area 630, which excludes these two areas 610 and 620, as the object arrangement exclusion area.

Then, as shown in the lower part of FIG. 6, augmented learning data may be generated by filling area 630 with noise and randomly arranging objects in the two areas 610 and 620.

In some cases, the apparatus may also place noise in at least some of the object arrangement areas. For example, when generating many items of augmented learning data from a single background image, the apparatus basically arranges the corresponding objects in each of areas 610 and 620; however, once a certain number of items have been generated, it may, for greater variety, set area 620 as an exclusion area and fill it with noise as well. In that case person objects are randomly arranged only in area 610, while areas 620 and 630 are filled with noise. The reverse is also possible: vehicle objects may be randomly arranged only in area 620 while areas 610 and 630 are filled with noise, yielding yet another item of augmented learning data.

In another example, augmented learning data may be generated by filling the entire background image with noise and randomly arranging only the objects on it. That is, every area other than where the target objects are placed is filled with noise. Learning data generated in this way contains the largest amount of randomness.

FIG. 7 is an exemplary view showing a structure in which person objects are categorized, and FIG. 8 is an exemplary view showing a structure in which vehicle objects are categorized.

Referring to FIGS. 7 and 8, a target object may be categorized as one of various types, and these categories may form a tree structure. For person objects, the top-level category "person" may be divided into "adult", "child", and so on, and the "adult" category may be further divided into objects such as "senior aged 60 or older" and "ordinary adult aged 60 or younger". The "person" (A) object, the highest-level concept among person-related categories, may correspond to areas such as a sidewalk (A₁), a crosswalk (A₂), a playground (A₃), and so on. Its subcategory "child" (Aa) may correspond to areas such as a playground (Aa₁) and a kids cafe (Aa₂). The "adult" (Ab) object may correspond not to areas such as a playground or kids cafe but to areas such as a sidewalk (Ab₁), a crosswalk (Ab₂), and a golf course (Ab₃). An adult may nevertheless be present in a playground or kids cafe, and the user may take this into account when setting matching relationships in the object-background matching table. By considering the likelihood (probability) of distribution and setting that probability differently for "child" objects and "adult" objects, appropriate object-background synthesis can be achieved (see FIG. 9).

In other words, the background areas (Aaₙ) corresponding to Aa objects, a subcategory of category A (person-related objects), may consist of all or part of the areas (Aₙ) corresponding to category A. That is, the background areas corresponding to a lower-level category may be contained in those corresponding to its upper-level category.

In the embodiment of FIG. 8, the "vehicle" object, the top-level category among vehicle-related categories, may correspond to road areas. Its subcategory "four-wheeled vehicle" may correspond to both "highway" and "general road", whereas another subcategory, "two-wheeled vehicle", may be set to correspond only to the "general road" area, excluding the "highway" area. In this way, the categories that define object types form a tree structure, the corresponding background areas also form a tree structure, and the two tree structures are associated through a correspondence relationship. However, a category at a given level of the object tree does not necessarily correspond to a background area at the same level of the background-area tree. It is therefore preferable that the background area corresponding to each category be designated (defined) individually within the background-area tree.
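One way to picture the two associated structures is a parent map for the category tree plus a per-category set of background areas. The sketch below uses the examples from FIGS. 7 and 8; the dictionary layout, the identifier spellings, and the fallback to the nearest ancestor for a category with no entry of its own (such as the hypothetical "senior" node) are assumptions made for illustration.

```python
# Category tree: each category points to its parent (None for a top-level type).
CATEGORY_PARENT = {
    "person": None, "child": "person", "adult": "person", "senior": "adult",
    "vehicle": None, "four_wheel": "vehicle", "two_wheel": "vehicle",
}

# Background areas designated individually for each category.
CATEGORY_AREAS = {
    "person": {"sidewalk", "crosswalk", "playground"},
    "child": {"playground", "kids_cafe"},
    "adult": {"sidewalk", "crosswalk", "golf_course"},
    "vehicle": {"highway", "general_road"},
    "four_wheel": {"highway", "general_road"},
    "two_wheel": {"general_road"},
}


def areas_for(category):
    """Return the areas defined for `category`, or for its nearest ancestor that has any."""
    while category is not None:
        if category in CATEGORY_AREAS:
            return CATEGORY_AREAS[category]
        category = CATEGORY_PARENT[category]
    return set()
```

For instance, `areas_for("two_wheel")` yields only the general road, while `areas_for("senior")` falls back to the areas listed for "adult".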

FIG. 9 is a conceptual diagram illustrating how the object-background matching table manages the probability that a specific object is arranged in a specific background area.

Referring to FIG. 9, for person-related objects, a sidewalk, a crosswalk, a hiking trail, a rock wall, and so on may correspond as arrangeable areas. Based on how likely an object is to be found in each area, the apparatus may preset a random arrangement probability (which may also be called a distribution likelihood or distribution degree) and arrange objects only up to the set probability. For example, the probability for a person on a sidewalk may be set to 100%, so that person objects may be packed densely on the sidewalk. The probability here means the relative likelihood of distribution compared with other background areas. It is also tied to arrangement saturation: 100% means that objects may be arranged over nearly all of the corresponding area. A hiking trail may be set lower than a sidewalk, at 70%, so that random arrangement fills that area to about 70% saturation. A rock wall may be set to 20% so that not too many people are arranged there. The distribution probability may change for subcategories of "person". For the "adult" category, the rock wall area may keep the same 20% distribution, whereas for the "child" category it may be set to 0% and thus designated as an object arrangement exclusion area. In this way, the distribution for a given area may differ from category to category within the tree of a given type. These probability-based distributions reflect reality and may be preset and managed in the object-background matching table.
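The matching table of FIG. 9 can be sketched as a lookup from (category, area) to a distribution percentage. The description does not say how a percentage translates into an object count, so the sketch below makes one possible assumption: the percentage caps the fraction of the area's pixels that objects may cover. The table layout and the `max_objects` helper are hypothetical.

```python
# Distribution percentages taken from the examples in the description.
DISTRIBUTION_PERCENT = {
    ("person", "sidewalk"): 100,
    ("person", "hiking_trail"): 70,
    ("person", "rock_wall"): 20,
    ("adult", "rock_wall"): 20,
    ("child", "rock_wall"): 0,  # 0% makes the area an exclusion area for this category
}


def max_objects(category, area_kind, area_pixels, object_pixels):
    """Upper bound on how many objects fit under the saturation cap (assumed rule)."""
    percent = DISTRIBUTION_PERCENT.get((category, area_kind), 0)
    # Integer arithmetic avoids floating-point rounding surprises.
    return (percent * area_pixels) // (100 * object_pixels)
```

With a 10,000-pixel area and 1,000-pixel objects, this rule allows up to 10 people on a sidewalk, 7 on a hiking trail, and no children on a rock wall.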

In the vehicle type category, a road area may have a distribution of 100%, while a mountain area may have 10% and a desert area 5%.

In the product type category, a store display shelf area may have a distribution of 100%, while an area inside a building may have 70% and the human body 50%. In particular, the per-area distribution differs by kind of product: a shoe object has a distribution of 50% in the lower-body area of a person but 0% in the upper-body area.

In the animal type category, zoo and grassland areas may have a distribution of 100%, while sidewalk and roadway areas may be set to less than 10%.

Even for a single category, one background image may contain a plurality of corresponding background areas. For example, when adult objects are randomly arranged in a background image containing both a playground and a sidewalk, adult objects may be arranged in both areas, but at a 10% distribution in the playground area and a 100% distribution in the sidewalk area. That is, target objects may be randomly arranged in the multiple areas of one background image at different distributions according to the policy of the table.

When designating arrangeable areas for a specific object that has several such areas, the apparatus may analyze them in order of priority, starting with the areas of highest distribution. For example, when designating arrangeable areas for person objects, the apparatus first finds and delimits the sidewalk and crosswalk areas, which have a 100% distribution, and then designates hiking trails, rock walls, and so on in that order. It then randomly arranges objects in each designated area according to the corresponding distribution.

In addition, different target objects may be arranged in one arrangement area. For example, vehicle objects may be arranged in a roadway area, and person objects may also be set to be arranged there at a low distribution (less than 10%). In this case, it is preferable that the sum of the distributions resulting from the random arrangement of the two target objects not exceed the higher of their two preset distributions. For example, when 90% vehicles and 10% people are combined, the total should not exceed 100%.
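The constraint on mixed arrangements can be expressed as a cap on the combined share. The sketch below scales the requested shares down proportionally when their sum exceeds the higher preset value; proportional scaling is an assumption, as the description only states the limit, and `cap_shares` is a hypothetical helper.

```python
def cap_shares(shares, preset):
    """Limit the combined share of several categories in one area.

    shares: requested distribution (percent) per category for this image.
    preset: preset distribution (percent) per category from the matching table.
    """
    limit = max(preset[category] for category in shares)
    total = sum(shares.values())
    if total <= limit:
        return dict(shares)
    scale = limit / total  # shrink every share by the same factor (assumed policy)
    return {category: share * scale for category, share in shares.items()}
```

With presets of 100% for vehicles and 10% for people on a roadway, a request for 90% vehicles plus 10% people passes unchanged, while 100% plus 10% is scaled back so that the total stays at 100%.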

According to another embodiment of the present invention, new augmented learning data may also be generated by dividing and recombining a plurality of learning images generated based on the distribution probabilities of the object-background matching table. For example, after first to fourth learning images are generated, a fifth learning image may be generated by placing the first learning image at the upper left, the second at the upper right, the third at the lower left, and the fourth at the lower right.
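The four-image recombination can be sketched as a simple 2x2 tiling. The example assumes all four images share the same size and are stored as nested lists of pixel values; `mosaic` is a hypothetical helper name.

```python
def mosaic(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized images into one image, row by row."""
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def solid(value, size=2):
    """Build a size x size test image filled with one value."""
    return [[value] * size for _ in range(size)]


fifth = mosaic(solid(1), solid(2), solid(3), solid(4))
```

In a real pipeline the labels of the second to fourth images would also need their coordinates shifted by the corresponding tile offsets, which this sketch omits.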

FIG. 10 is a block diagram showing an apparatus for augmenting learning data by synthesizing objects and backgrounds according to an embodiment of the present invention. As shown in FIG. 10, the learning data augmentation apparatus according to an embodiment may include an object extraction unit 1010, a background image input unit 1020, an object category determination unit 1030, a background feature determination unit 1040, an object arrangement area designation unit 1050, and an object-background synthesis unit 1060.

Referring to FIG. 10, the learning data augmentation apparatus may include a learning data augmentation unit 1000 and a machine learning engine 1005. The learning data augmentation unit 1000 may generate a plurality of augmented learning images based on target objects and provide them to the machine learning engine 1005. The learning data augmentation unit 1000 may be implemented as a microprocessor and executes instructions stored in a memory (not shown). The individual components of the learning data augmentation unit 1000 are described in more detail below.

The object extraction unit 1010 extracts target objects from a plurality of already-held images prepared for machine learning of an object detection algorithm. A target object is an object that the object detection algorithm is actually meant to detect, and it may be extracted according to the user's selection. In another example, an already-selected object image may be received instead. When an object is extracted, it is preferable to label its size information, codec information, and other environmental information (image creation date, source, and so on) so that these can be used in the labeling performed when the object and background are synthesized. There may be a plurality of target objects.

The background image input unit 1020 receives arbitrary background images. The background images may be stored in advance in a memory within the apparatus or may be received from another device connected through a network. In some cases, the background images may include an image filled entirely with noise.

The object category determination unit 1030 determines the category of each target object extracted by the object extraction unit 1010. So that the object can be synthesized into an appropriate area of the background image, and so that its nature is clearly defined, the category of the extracted object is determined according to the object-background matching policy. If there are several objects, a category is determined for each. Because the categories of one type have a hierarchical structure, it can be very difficult to decide which level of that hierarchy an object belongs to. Lower-level categories classify the target object using more specific information, so it is preferable that the object category determination unit 1030 match the target object to the lowest level possible; this makes the learning data more realistic. For example, for an infant aged 3 or under, in the category hierarchy "person-child-infant", it is preferable to select "infant", the lowest of the three categories. The object then corresponds to the narrowest arrangement area in the arrangement-area tree, which leads to more realistic synthesis.
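Choosing the lowest applicable category can be sketched as picking the deepest node among the candidates that fit the object. The parent map and the `most_specific` helper below are hypothetical, and how the candidate categories are obtained in the first place is outside this sketch.

```python
# Category tree from the "person-child-infant" example; parents point upward.
PARENT = {"person": None, "child": "person", "infant": "child", "adult": "person"}


def depth(category):
    """Number of steps from `category` up to its top-level type."""
    steps = 0
    while PARENT[category] is not None:
        category = PARENT[category]
        steps += 1
    return steps


def most_specific(candidates):
    """Return the candidate category that sits lowest in the tree."""
    return max(candidates, key=depth)
```

For an object that matches "person", "child", and "infant", `most_specific` returns "infant", which in turn maps to the narrowest set of arrangement areas.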

The background feature determination unit 1040 determines the features of the background area corresponding to the category determined by the object category determination unit 1030. The image features of each corresponding background area are held in parameterized form. The background feature determination unit 1040 therefore retrieves the parameters of the corresponding background area and provides them to the object arrangement area designation unit 1050.

The object arrangement area designation unit 1050 designates arrangeable areas within the background image received through the background image input unit 1020, based on the image-related feature parameters of arrangeable areas provided by the background feature determination unit 1040. If, for some background image, no object arrangement area corresponding to the target object is detected at all, that image is excluded from the learning data.

The object-background synthesis unit 1060 generates augmented learning data by randomly arranging the corresponding target objects in the areas designated by the object arrangement area designation unit 1050. Objects may be arranged randomly, with no restriction on their position or number within the corresponding area. The object-background synthesis unit 1060 performs labeling while randomly arranging the objects in the designated areas. It also fills with noise the object arrangement exclusion area, that is, the part of the background image outside the object arrangement areas. The noise is AWGN, so that parts where an object cannot be present are made as random as possible.

The learning data augmented by the learning data augmentation unit 1000 is provided to the machine learning engine 1005 so that the relevant object detection algorithm can be trained there. The machine learning engine 1005 may run within the same apparatus or may reside in a different apparatus.

The machine learning engine 1005 may additionally include a read rate measurement module that measures the read rate. Based on the read rate of the machine learning engine 1005 trained with learning data augmented according to an embodiment of the present invention, the apparatus can learn for itself how to specify objects, how to structure the category tree, and how to match objects to background areas. That is, the augmented learning images that were used and the resulting read rate information are returned to the learning data augmentation unit 1000, which uses them as learning data for establishing object specification, the category tree structure, and the object-background matching relationships. This learning data set includes the labeling information of the augmented learning data that was used (which may include object specification information, category tree structure information, and information on the object-background matching relationships, including distributions) together with the corresponding read rate for target objects in the machine learning engine 1005. Based on this data set, the apparatus changes hyperparameters with the aim of raising the read rate. The hyperparameters changed may relate to the specification of target objects, the tree structure, and the object-background matching relationships. The settings for these hyperparameters are then fixed at the values that yield the highest read rate.
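The final selection step of this feedback loop can be sketched as choosing, among the settings that were tried, the one with the highest measured read rate. The record layout, the `best_setting` helper, and the numbers are placeholders for illustration only, not measurements, and the way new candidate settings are proposed is not covered here.

```python
def best_setting(trials):
    """Return the hyperparameter setting whose trial achieved the highest read rate."""
    return max(trials, key=lambda trial: trial["read_rate"])["setting"]


# Made-up placeholder trials: each pairs a matching-table setting with a read rate.
trials = [
    {"setting": {"rock_wall_percent": 20}, "read_rate": 0.5},
    {"setting": {"rock_wall_percent": 10}, "read_rate": 0.75},
]
chosen = best_setting(trials)
```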

Although the invention has been described above with reference to the drawings and embodiments, the scope of protection of the invention is not limited by those drawings or embodiments, and those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

A method of augmenting learning data through object and background synthesis in a learning data augmentation apparatus, the method comprising:
extracting an object image, the object image being a learning target;
determining a type of the object image;
inputting a background image, the background image including a plurality of different regions;
distinguishing a first background region and a second background region in the background image; and
augmenting learning data by synthesizing the object image into the first background region and the second background region,
wherein the augmenting of the learning data includes randomly arranging, in the first background region, object images of a first type corresponding to the first background region, and randomly arranging, in the second background region, object images of a second type corresponding to the second background region,
wherein categories defining the type of the object image form a first tree structure, and the background regions corresponding to the respective types of the object image form a second tree structure,
wherein the first tree structure and the second tree structure are associated with each other through a correspondence relationship, and
wherein, based on the association, the first background region corresponding to the object image of the first type is determined and the second background region corresponding to the object image of the second type is determined.
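As a rough illustration of the two associated tree structures and the random arrangement recited in the claim above, the following sketch stores each tree as a child-to-parent mapping and links them through a lookup table. All category names, region names, the rectangular region boxes, and the helper functions are assumptions made for the example; the claim does not prescribe any data layout, and real background regions need not be rectangles.

```python
import random

# Hypothetical first tree: object categories, stored as child -> parent.
OBJECT_TREE = {
    "person": None, "adult": "person", "child": "person",
    "vehicle": None, "car": "vehicle", "truck": "vehicle",
}
# Hypothetical second tree: background regions, stored as child -> parent.
REGION_TREE = {
    "sidewalk": None, "crosswalk": "sidewalk",
    "road": None, "bus_lane": "road",
}
# Assumed correspondence between nodes of the two trees.
CATEGORY_TO_REGION = {"person": "sidewalk", "vehicle": "road"}


def region_for(category):
    """Walk up the object tree until a category with a mapped region is found."""
    node = category
    while node is not None:
        if node in CATEGORY_TO_REGION:
            return CATEGORY_TO_REGION[node]
        node = OBJECT_TREE[node]
    raise KeyError(f"no background region mapped for {category!r}")


def belongs_to(region, ancestor):
    """True if region equals ancestor or lies below it in the region tree."""
    node = region
    while node is not None:
        if node == ancestor:
            return True
        node = REGION_TREE[node]
    return False


def place_randomly(objects, labeled_boxes, rng):
    """Place each (name, category) at a random point inside a matching box.

    labeled_boxes: list of (region_label, (x0, y0, x1, y1)), inclusive bounds.
    """
    placements = []
    for name, category in objects:
        target = region_for(category)
        candidates = [box for label, box in labeled_boxes
                      if belongs_to(label, target)]
        x0, y0, x1, y1 = rng.choice(candidates)
        placements.append((name, rng.randint(x0, x1), rng.randint(y0, y1)))
    return placements


boxes = [("sidewalk", (0, 0, 99, 19)), ("bus_lane", (0, 20, 99, 59))]
layout = place_randomly([("a.png", "adult"), ("t.png", "truck")],
                        boxes, random.Random(0))
```

Calling `place_randomly` repeatedly with different random states yields different layouts on the same background, which is how a single background image can produce several distinct learning data items.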

The method of claim 1,
wherein the first background region includes a sidewalk region on which people walk, and
wherein the first type of object includes a person-type object.

The method of claim 1,
wherein the second background region includes a road region on which vehicles travel, and
wherein the second type of object includes a vehicle-type object.

The method of claim 1,
wherein the randomly arranging comprises arranging at least one instance of the same object image in the background region corresponding thereto, at spatially random positions.

The method of claim 4,
wherein the spatially random arrangement generates a plurality of different learning data items from a single background image.

The method of claim 1, further comprising:
distinguishing, in the background image, a third region in which objects cannot be arranged; and
filling the third region with noise.
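The noise-filling step in the claim above might look like the following minimal sketch. It assumes a grayscale image held as nested lists, a boolean mask marking the third region, and uniform random pixel values; the function name `fill_with_noise` is hypothetical, and the claim itself does not specify the kind of noise.

```python
import random


def fill_with_noise(image, blocked_mask, rng):
    """Return a copy of image with masked pixels replaced by random values.

    image: rows of grayscale pixel values in 0..255.
    blocked_mask: rows of booleans, True where objects cannot be arranged.
    Uniform noise is an assumption made for this sketch.
    """
    return [
        [rng.randint(0, 255) if blocked else pixel
         for pixel, blocked in zip(row, mask_row)]
        for row, mask_row in zip(image, blocked_mask)
    ]


image = [[10, 20, 30], [40, 50, 60]]
mask = [[False, True, True], [False, False, True]]
noisy = fill_with_noise(image, mask, random.Random(0))
```

Pixels outside the mask are left untouched, so the regions in which objects can be arranged keep their original appearance.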

The method of claim 1,
wherein the correspondence between the first background region and the first type of object and the correspondence between the second background region and the second type of object are stored in advance.

(Deleted)

An apparatus for augmenting learning data through object and background synthesis, the apparatus comprising:
an object extraction unit that extracts an object image, the object image being a learning target;
an object category determination unit that determines a type of the object image;
a background image input unit that inputs a background image, the background image including a plurality of different regions;
an object arrangement region designation unit that distinguishes a first background region and a second background region in the background image; and
an object-background synthesis unit that augments learning data by synthesizing the object image into the first background region and the second background region,
wherein the object-background synthesis unit randomly arranges, in the first background region, object images of a first type corresponding to the first background region, and randomly arranges, in the second background region, object images of a second type corresponding to the second background region,
wherein categories defining the type of the object image form a first tree structure, and the background regions corresponding to the respective types of the object image form a second tree structure,
wherein the first tree structure and the second tree structure are associated with each other through a correspondence relationship, and
wherein, based on the association, the first background region corresponding to the object image of the first type is determined and the second background region corresponding to the object image of the second type is determined.

A method of augmenting learning data through object and background synthesis in a learning data augmentation apparatus, the method comprising:
extracting an object image that is a learning target;
inputting a background image for learning data augmentation;
designating, in the background image, an object arrangement region corresponding to the extracted object image according to an object-background matching policy; and
randomly arranging the extracted object image in the designated object arrangement region,
wherein the object-background matching policy includes a first tree structure defining categories of the object image and a second tree structure defining object arrangement regions corresponding to the object image,
wherein the object-background matching policy defines a correspondence between the first tree structure and the second tree structure, and
wherein the object arrangement region corresponding to the object image is designated based on the correspondence.

The method of claim 10,
wherein the object image is categorized,
wherein the object-background matching policy includes image feature information of the object arrangement region corresponding to the category of the object image, and
wherein the object arrangement region corresponding to the category of the object image is extracted from within the background image based on the feature information.

(Deleted)

The method of claim 10,
wherein, in determining the category of the object image, the first tree structure is used to match the object image to the lowest-level category that can be matched in the first tree structure.
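One way to picture matching to the lowest matchable category is a descent through the first tree that stops when no child category matches. The sketch below is an assumption-laden illustration: the tree contents, the function `lowest_matching_category`, and the `matches` predicate (standing in for whatever classifier decides that an object image fits a category) are all hypothetical.

```python
# Hypothetical first tree, stored as parent -> list of child categories.
CATEGORY_CHILDREN = {
    "object": ["person", "vehicle"],
    "person": ["adult", "child"],
    "vehicle": ["car", "truck"],
    "car": ["sedan", "suv"],
}


def lowest_matching_category(root, matches):
    """Descend from root while some child category still matches.

    matches: callable taking a category name and returning True when the
    object image can be matched to that category (assumed to exist).
    Returns None if even the root does not match.
    """
    if not matches(root):
        return None
    node = root
    while True:
        child = next((c for c in CATEGORY_CHILDREN.get(node, [])
                      if matches(c)), None)
        if child is None:
            return node
        node = child


# Example: the image is recognisable as a car, but not as a sedan or SUV.
recognised = {"object", "vehicle", "car"}
category = lowest_matching_category("object", recognised.__contains__)
```

When the image can only be recognised at a coarse level, the descent stops early, so the object is still assigned a category, just a higher one in the tree.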

The method of claim 10,
wherein the object-background matching policy defines a random arrangement probability indicating how densely a specific object image can be distributed in a specific object arrangement region, and
wherein the object image is randomly arranged in the designated object arrangement region in consideration of the random arrangement probability.
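The random arrangement probability could be interpreted, as one possibility, as an independent per-slot chance of placing an object in each candidate position of a region. The following sketch adopts that interpretation as an assumption; the table `PLACEMENT_PROBABILITY`, its values, and the function `sample_positions` are hypothetical and are not taken from the claim.

```python
import random

# Assumed policy table: (object category, region) -> chance that any given
# candidate slot in that region receives an object. Values are placeholders.
PLACEMENT_PROBABILITY = {
    ("person", "sidewalk"): 0.6,
    ("vehicle", "road"): 0.3,
}


def sample_positions(category, region, slots, rng):
    """Keep each candidate slot independently with the policy's probability.

    Pairs missing from the policy table get probability 0, so nothing is
    placed in a region that does not correspond to the category.
    """
    p = PLACEMENT_PROBABILITY.get((category, region), 0.0)
    return [slot for slot in slots if rng.random() < p]


candidate_slots = [(x, y) for x in range(0, 100, 10) for y in range(0, 20, 10)]
chosen = sample_positions("person", "sidewalk", candidate_slots,
                          random.Random(0))
```

A higher probability produces a denser arrangement on average, while the exact positions still vary from one generated image to the next.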

An apparatus for augmenting learning data through object and background synthesis, the apparatus comprising:
an object extraction unit that extracts an object image that is a learning target;
a background image input unit that inputs a background image for learning data augmentation;
an object arrangement region designation unit that designates, in the background image, an object arrangement region corresponding to the extracted object image according to an object-background matching policy; and
an object-background synthesis unit that randomly arranges the extracted object image in the designated object arrangement region,
wherein the apparatus includes a first tree structure defining categories of the object image and a second tree structure defining object arrangement regions corresponding to the object image, and
wherein the object-background synthesis unit designates the object arrangement region corresponding to the object image based on a correspondence between the first tree structure and the second tree structure.