KR102370910B1

KR102370910B1 - Method and apparatus for few-shot image classification based on deep learning

Info

Publication number: KR102370910B1
Application number: KR1020200112520A
Authority: KR
Inventors: 이성환; 서진우; 김정준; 정홍규
Original assignee: 고려대학교 산학협력단
Priority date: 2019-09-05
Filing date: 2020-09-03
Publication date: 2022-03-08
Also published as: KR20210029110A

Abstract

A deep learning-based few-shot image classification method according to one aspect of the present application includes: a primary learning step of performing data augmentation on input images according to a self-mix technique, which generates a new input image by replacing a part of an input image with another part of the same input image, performing feature extraction on the augmented input images, learning the extracted features through deep learning, and generating a classification model that classifies input images; a secondary learning step of generating a new auxiliary classifier from the main classifier included in the classification model and causing the results output independently by each classifier to be shared with the other classifiers according to a self-distillation technique; and a step of inputting a new input image into the image classification model that has undergone the primary and secondary learning steps and outputting a classification result.

Description

Deep learning-based few-shot image classification apparatus and method {METHOD AND APPARATUS FOR FEW-SHOT IMAGE CLASSIFICATION BASED ON DEEP LEARNING}

The present invention relates to an apparatus and method capable of classifying few-shot images based on deep learning.

Image processing technology has recently advanced greatly owing to progress in deep learning within the field of artificial intelligence, which has been accompanied by the advent of big data and the development of computer hardware. On this basis, techniques for automatically classifying images in various fields are being developed. Unlike conventional algorithm-based or statistics-based image processing methods, deep learning-based image processing enables a computer to automatically classify even complex images that are difficult for humans to classify.

However, achieving high classification performance in image processing requires a large amount of training data labeled with correct answers. Many industrial and academic fields do not yet hold enough such data, so people must process the data manually, which is costly in both time and money.

Recently, few-shot learning has attracted attention as a way to improve classification performance in image processing. Few-shot learning aims to enable a model trained on a large amount of data to correctly classify a small number of input images when new, previously unseen images are input. Even a model trained on a large amount of data normally needs hundreds or more training samples to classify a new category, whereas few-shot learning makes this possible with only a few images.

Few-shot learning techniques broadly follow the same learning scheme as general image classification, namely feature extraction followed by classification, but in detail they can be divided into distance-based methods, optimization-based methods, data augmentation methods, and other methods.

Prior research on image classification has proceeded by extracting patterns and features from the images of each class on the basis of large-scale data. More recently, research has concentrated on technical improvements that raise accuracy while reducing computation, or on developing classification models specialized for a specific domain.

Although image classification technology continues to advance in this way, it has been pointed out as a limitation of deep learning-based image classification that the technology cannot be applied unless a sufficient amount of data is secured to train the classification model.

To overcome this limitation, few-shot learning-based image classification has been studied in recent years, and few-shot learning systems have been developed to the point of showing relatively high performance within a single specific data set.

However, one study reported that classification performance drops when a data set entirely different from the one used for training is applied at the performance evaluation stage of a few-shot learning system. Another study found that deep learning models tend to memorize the input training data during training. This is linked to the overfitting problem of deep learning models, and such memorization of training data can be seen as a major factor impairing the generalization performance of deep learning.

In short, these prior studies show that there have been limits to achieving high generalization performance in few-shot learning, and demonstrating improved generalization performance through a new learning scheme different from earlier ones has emerged as an important task for subsequent research.

Existing few-shot learning models appear to have focused on improving classification accuracy using purely deep learning-based image processing, and therefore did not take into account recognition accuracy on the entirely different data sets that can arise in practice.

Korean Registered Patent Publication No. 10-1888647 (Title of the invention: Image classification apparatus and method)

The present invention is intended to solve the problems described above. Its object is to provide a few-shot image classification method and apparatus that, by using a self-augmentation technique that applies various changes to the data distribution and output distribution of input images together with a fine-tuning technique called local feature learning, allow a deep learning-based classifier to be trained with only a small amount of data and provide a high level of classification accuracy.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for solving the above technical problem, a deep learning-based few-shot image classification method according to a first aspect of the present disclosure includes: a primary learning step of performing data augmentation on input images according to a self-mix technique, which generates a new input image by replacing a part of an input image with another part of the same input image, performing feature extraction on the augmented input images, learning the extracted features through deep learning, and generating a classification model that classifies input images; a secondary learning step of generating a new auxiliary classifier from the main classifier included in the classification model and causing the results output independently by each classifier to be shared with the other classifiers according to a self-distillation technique; and a step of inputting a new input image into the image classification model that has undergone the primary and secondary learning steps and outputting a classification result.

In addition, a deep learning-based few-shot image classification apparatus according to a second aspect of the present disclosure includes: a communication module; a memory in which a few-shot image classification program is stored; and a processor that executes the program stored in the memory. The few-shot image classification program includes an image classification model generated through a primary learning step and a secondary learning step. The primary learning step performs data augmentation on input images according to a self-mix technique, which generates a new input image by replacing a part of an input image with another part of the same input image, performs feature extraction on the augmented input images, learns the extracted features through deep learning, and generates a classification model that classifies input images. The secondary learning step generates a new auxiliary classifier from the main classifier included in the classification model and causes the results output independently by each classifier to be shared with the other classifiers according to a self-distillation technique. The few-shot image classification program inputs a new input image into the image classification model and outputs a classification result for the new input image.

According to any one of the above means for solving the problem, the data augmentation technique called self-mix is used, so the deep learning model can be prevented from memorizing the training data. In addition, the self-distillation technique based on auxiliary classifiers applies changes to the output distribution of the model. As a result, rather than memorizing the training data, the system picks up the various pattern information contained in the data more sensitively, which improves the accuracy with which objects are correctly recognized even in a small number of new images.

Owing to these effects, it is possible to build an image classification model that can relieve the data shortage experienced in industry and academia without the cumbersome work of manually labeling data sets.

FIG. 1 is a block diagram illustrating a deep learning-based few-shot image classification apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a few-shot image classification method according to another embodiment of the present invention.
FIG. 3 is a diagram for explaining the self-augmentation technique used in the few-shot image classification method according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the feature extraction process used in the few-shot image classification method according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the self-distillation technique used in the few-shot image classification method according to an embodiment of the present invention.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element in between.

Throughout this specification, when a member is said to be positioned "on" another member, this includes not only the case in which the member is in contact with the other member but also the case in which a further member is present between the two.

FIG. 1 is a block diagram illustrating a deep learning-based few-shot image classification apparatus according to an embodiment of the present invention.

The few-shot image classification apparatus 100 according to the present invention may include a communication module 110, a memory 120, a processor 130, a database 140, and various input/output modules 150.

The communication module 110 may transmit and receive various data between the few-shot image classification apparatus 100 and other computing devices or servers over a wired or wireless communication network. In particular, the communication module 110 may receive the image data needed for training, receive a new input image to be classified, or transmit the classification result for a new input image.

The memory 120 stores the few-shot image classification program. The program includes an image classification model generated through a primary learning step and a secondary learning step. The primary learning step performs data augmentation on input images according to a self-mix technique, which generates a new input image by replacing a part of an input image with another part of the same input image, performs feature extraction on the augmented input images, learns the extracted features through deep learning, and generates a classification model that classifies input images. The secondary learning step generates a new auxiliary classifier from the main classifier included in the classification model and causes the results output independently by each classifier to be shared with the other classifiers according to a self-distillation technique. The few-shot image classification program inputs a new input image into the image classification model and outputs a classification result for the new input image.

The memory 120 also stores an operating system for running the few-shot image classification apparatus 100 and the various kinds of data generated while the few-shot image classification program executes. Here, the memory 120 collectively refers to non-volatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices, which require power to retain stored information.

The processor 130 executes the program stored in the memory 120 and controls the entire process involved in executing the few-shot image classification program. Each operation performed by the processor 130 is described in more detail below.

The processor 130 may include any kind of device capable of processing data. For example, it may refer to a data processing device embedded in hardware that has circuitry physically structured to perform functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

The few-shot image classification apparatus 100 may further include a database 140, which stores or provides the data needed to execute the few-shot image classification program under the control of the processor 130. The database may be included as a component separate from the memory 120, or may be built in a partial area of the memory 120.

The few-shot image classification apparatus 100 may also further include an input/output module 150, which corresponds to the various input/output interfaces used for the operation of the apparatus 100.

FIG. 2 is a flowchart illustrating a few-shot image classification method according to another embodiment of the present invention.

First, when an input image is received (S210), and before feature extraction is performed on it, data augmentation is applied to the input image according to the self-mix technique, which generates a new input image by replacing a part of the input image with another part of the same input image (S220).

This can be expressed as the following equation.

[Equation 1]

Here, T is the transformation function that performs the processing of Equation 1: it replaces one patch of the image with a patch taken from another part of the same image.

FIG. 3 is a diagram for explaining the self-augmentation technique used in the few-shot image classification method according to an embodiment of the present invention.

In the input image on the left, data augmentation is performed according to the self-mix technique, in which a patch 300 of the input image is replaced with a patch from another part of the same input image.

With this data augmentation, the model is trained to take into account even fine-grained features of the image that were previously not regarded as important when finding the correct answer for the object, which ultimately improves the performance of the feature extractor. By preventing the training data memorization problem of deep learning, the model learns to extract features that reflect the distinctive patterns of the input image, and can therefore be expected to extract meaningful patterns even from images of categories not seen during training.
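The self-mix step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `self_mix`, the rectangular patch shape, the fixed patch size, and the uniform random choice of source and destination locations are all hypothetical, since the source text does not reproduce Equation 1.

```python
import numpy as np

def self_mix(image, patch_h, patch_w, rng):
    """Return a copy of `image` in which one patch is overwritten by
    another patch of the same image (hypothetical helper)."""
    h, w = image.shape[:2]
    if patch_h > h or patch_w > w:
        raise ValueError("patch must fit inside the image")
    # Top-left corner of the source patch (the region that is copied).
    src_y = int(rng.integers(0, h - patch_h + 1))
    src_x = int(rng.integers(0, w - patch_w + 1))
    # Top-left corner of the destination patch (the region that is overwritten).
    dst_y = int(rng.integers(0, h - patch_h + 1))
    dst_x = int(rng.integers(0, w - patch_w + 1))
    mixed = image.copy()
    # Read from the untouched original so overlapping regions stay well defined.
    mixed[dst_y:dst_y + patch_h, dst_x:dst_x + patch_w] = \
        image[src_y:src_y + patch_h, src_x:src_x + patch_w]
    return mixed

# Assumed toy input: a random 32x32 RGB image and an 8x8 patch.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = self_mix(image, patch_h=8, patch_w=8, rng=rng)
```

The label of the image is left unchanged in this sketch, because the replaced content comes from the same image and therefore from the same class.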

In addition to this data augmentation, a preprocessing step that automatically converts the size or shape of the input image may also be performed.

Referring again to FIG. 2, the primary learning step is performed (S230): features are extracted from the augmented input images and learned through deep learning, thereby generating a classification model that classifies input images.

FIG. 4 is a diagram for explaining the feature extraction process used in the few-shot image classification method according to an embodiment of the present invention.

As shown in FIG. 4, the classification model may be generated through a multi-feature-extraction classifier having a plurality of sub-layers. The classifier uses a classification technique based on cosine similarity. A cosine classifier encourages the feature extractor to learn in the direction of reducing the variation in distribution among data within a class, which is advantageous in few-shot learning, where new categories must be classified.
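As a rough illustration of classification by cosine similarity, the sketch below scores each feature vector against one weight vector per class after L2-normalizing both. The function name `cosine_logits`, the `scale` factor, and the toy feature and weight values are assumptions made for this example; the patent text does not specify them.

```python
import numpy as np

def cosine_logits(features, class_weights, scale=10.0):
    """Scaled cosine similarity between each feature row and each class
    weight row (hypothetical helper; `scale` is an assumed temperature)."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    w = class_weights / (np.linalg.norm(class_weights, axis=1, keepdims=True) + eps)
    return scale * (f @ w.T)

# Assumed toy data: three 2-D feature vectors and two class weight vectors.
features = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
class_weights = np.array([[2.0, 0.0], [0.0, 0.5]])
logits = cosine_logits(features, class_weights)
predicted = logits.argmax(axis=1)
```

Because both sides are normalized, only the direction of a feature vector affects the score, not its magnitude, which is one plausible reading of why such a classifier pushes features of the same class toward a common direction.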

In particular, because the present invention extracts features and generates the classification model from data that has undergone self-mix data augmentation, a classification model that additionally includes several sub-layers (3-1, 4-1, 4-2) can be generated, as illustrated.

Referring again to FIG. 2, the secondary learning step is performed to generate the final image classification model (S240): a new auxiliary classifier is generated from the main classifier included in the classification model, and the results output independently by each classifier are shared with the other classifiers according to the self-distillation technique.

FIG. 5 is a diagram for explaining the self-distillation technique used in the few-shot image classification method according to an embodiment of the present invention.

The self-distillation technique creates, from an intermediate layer of the deep learning model, a sub-model that has a further auxiliary classifier, and shares knowledge between the prediction output by this auxiliary classifier and the prediction of the main classifier through the Kullback-Leibler (KL) divergence. For each input sample, every classifier outputs an independent classification result, and sharing this information regularizes the main classifier so that its output distribution carries richer information.

The related equation is as follows.

[Equation 2]

The symbols in Equation 2 denote, in order, the categories of the training data, the number of classifiers, and the learning parameters. In the self-distillation technique, the feature vectors representing the features of each image are classified so that knowledge is shared, and the class probability distributions output by the classifiers, which constitute the knowledge, are regularized to become similar to one another, so that the main classification model can classify more accurately.
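The knowledge-sharing term can be illustrated with a small NumPy sketch that compares the softmax outputs of the main classifier and each auxiliary classifier using the KL divergence. Equation 2 is not reproduced in the source text, so the symmetric form of the loss, the averaging over auxiliary classifiers, and the names `softmax`, `kl_divergence`, and `self_distillation_loss` are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Batch mean of KL(p || q) for row-wise probability distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def self_distillation_loss(main_logits, aux_logits_list):
    """Assumed symmetric KL between the main classifier and each auxiliary
    classifier, averaged over the auxiliary classifiers."""
    p_main = softmax(main_logits)
    total = 0.0
    for aux_logits in aux_logits_list:
        p_aux = softmax(aux_logits)
        total += kl_divergence(p_main, p_aux) + kl_divergence(p_aux, p_main)
    return total / len(aux_logits_list)

# Assumed toy logits for one sample, three classes, two auxiliary classifiers.
main_logits = np.array([[2.0, 0.5, -1.0]])
aux_a = np.array([[1.0, 1.0, 0.0]])
aux_b = np.array([[2.5, 0.0, -0.5]])
loss = self_distillation_loss(main_logits, [aux_a, aux_b])
```

In training, a term of this kind would typically be added to the ordinary classification loss, so that minimizing it pulls the class probability distributions of the classifiers toward one another.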

Referring again to FIG. 2, the image classification model built through the preceding steps is used to output a classification result for a newly input image (S250). A step of evaluating the performance of the image classification model using the classification result may additionally be performed.

In this evaluation step, features are extracted from the input image, but unlike in the primary and secondary learning steps described above, only a feature extractor with a single hierarchical structure is used to extract the final features. For performance evaluation, a small number of input images that were not used during training are employed, and the class to which a query image belongs is finally determined from its similarity to those images.
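One common way to realize similarity-based evaluation of this kind is to average the features of the few labeled images of each new class into a prototype and assign each query to the most similar prototype. The sketch below follows that approach as an assumption; the patent text does not state how the similarity is aggregated, and the name `classify_queries`, the use of class means, and the toy feature values are hypothetical.

```python
import numpy as np

def classify_queries(support_features, support_labels, query_features):
    """Assign each query to the class whose mean support feature is most
    similar under cosine similarity (hypothetical helper)."""
    classes = np.unique(support_labels)
    # One prototype per class: the mean of that class's support features.
    prototypes = np.stack(
        [support_features[support_labels == c].mean(axis=0) for c in classes]
    )

    def normalise(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)

    similarity = normalise(query_features) @ normalise(prototypes).T
    return classes[similarity.argmax(axis=1)]

# Assumed 2-way 2-shot episode with 2-D features already extracted.
support_features = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 1.1]])
support_labels = np.array([0, 0, 1, 1])
query_features = np.array([[2.0, 0.3], [0.1, 0.7]])
predictions = classify_queries(support_features, support_labels, query_features)
```

Accuracy over many such episodes, each drawn from classes unseen in training, would then serve as the performance measure.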

As described above, the present invention is characterized by generating the image classification model through the primary learning step and the secondary learning step.

The deep learning-based few-shot image classification method according to an embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the method and system of the present invention have been described with reference to specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present application is illustrative, and those of ordinary skill in the art to which the present application pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: few-shot image classification device
110: communication module
120: memory
130: processor
140: database
150: input/output module

Claims

A deep learning-based few-shot image classification method, comprising:
providing an image classification model generated through a primary learning step and a secondary learning step,
wherein the primary learning step performs data augmentation on input images according to a self-mixing technique that generates a new input image by replacing a part of an input image with another part of the same input image, performs feature extraction on the augmented input images, learns the extracted features based on deep learning, and generates a classification model that classifies input images, and
wherein the secondary learning step generates a new auxiliary classifier from a main classifier included in the classification model and causes the result output independently by each classifier to be shared with the other classifiers according to a self-distillation technique; and
inputting a new input image to the image classification model and outputting a classification result.
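
As an illustration only, the self-mixing augmentation recited above (replacing one part of an input image with another part of the same image) might be sketched as follows. The function name `self_mix`, the rectangular patch shape, the uniformly random patch positions, and the nested-list image representation are all assumptions made for this sketch; the claim does not fix any of them.

```python
import random


def self_mix(image, patch_h, patch_w, rng=None):
    """Return a copy of `image` in which one rectangular patch is overwritten
    by another patch taken from the same image.

    `image` is a list of rows (a hypothetical 2-D representation chosen
    for this sketch); the input is left unmodified.
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if patch_h > height or patch_w > width:
        raise ValueError("patch must fit inside the image")

    # Assumption: source and destination positions are drawn uniformly.
    src_y = rng.randint(0, height - patch_h)
    src_x = rng.randint(0, width - patch_w)
    dst_y = rng.randint(0, height - patch_h)
    dst_x = rng.randint(0, width - patch_w)

    # Copy the source patch first so overlapping regions are handled safely.
    patch = [row[src_x:src_x + patch_w]
             for row in image[src_y:src_y + patch_h]]

    mixed = [list(row) for row in image]
    for dy in range(patch_h):
        mixed[dst_y + dy][dst_x:dst_x + patch_w] = patch[dy]
    return mixed


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in self_mix(img, 2, 2, random.Random(0)):
        print(row)
```

Because the replacement patch comes from the same image, the class label of the augmented image is unchanged, which is why no label mixing appears in the sketch.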

The method of claim 1,
wherein the primary learning step generates the classification model through a multi-feature extraction classifier having a plurality of sub-layers, and the classification model performs classification using cosine similarity.
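
A minimal sketch of classification by cosine similarity, as recited above, is given below. It assumes that each class is represented by a single weight (or prototype) vector and that the predicted class is the one whose vector has the highest cosine similarity to the extracted feature; the names `cosine_similarity`, `classify`, and `class_weights` are hypothetical, and the multi-feature extraction classifier itself is not modeled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(feature, class_weights):
    """Return (best_label, scores), scoring each class by cosine similarity.

    `class_weights` maps a label to one vector per class; this layout is an
    assumption made for illustration.
    """
    scores = {label: cosine_similarity(feature, w)
              for label, w in class_weights.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


if __name__ == "__main__":
    weights = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    label, scores = classify([0.9, 0.1], weights)
    print(label, scores)
```

Cosine similarity depends only on the direction of the feature vector, not its magnitude, so scaling a feature leaves the predicted class unchanged.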

The method of claim 1,
wherein the secondary learning step performs knowledge sharing between the prediction result output by the auxiliary classifier and the prediction result of the main classifier through Kullback-Leibler divergence (KL divergence).
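
The knowledge sharing recited above can be illustrated with a small sketch in which the main and auxiliary classifiers each produce logits for the same input and a KL-divergence term penalizes disagreement between their softmax outputs. The symmetric (two-way) sum, the temperature parameter, and the function names are assumptions made for this sketch; the claim states only that KL divergence is used.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def self_distillation_loss(main_logits, aux_logits, temperature=1.0):
    """Assumed two-way loss: each classifier is pulled toward the other."""
    p_main = softmax(main_logits, temperature)
    p_aux = softmax(aux_logits, temperature)
    return kl_divergence(p_main, p_aux) + kl_divergence(p_aux, p_main)


if __name__ == "__main__":
    print(self_distillation_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
```

In training, a term of this kind would typically be added to the ordinary classification loss of each classifier; how the terms are weighted is not specified in the source.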

A deep learning-based few-shot image classification apparatus, comprising:
a communication module;
a memory in which a few-shot image classification program is stored; and
a processor that executes the program stored in the memory,
wherein the few-shot image classification program includes an image classification model generated through a primary learning step and a secondary learning step, the primary learning step performing data augmentation on input images according to a self-mixing technique that generates a new input image by replacing a part of an input image with another part of the same input image, performing feature extraction on the augmented input images, learning the extracted features based on deep learning, and generating a classification model that classifies input images, and the secondary learning step generating a new auxiliary classifier from a main classifier included in the classification model and causing the result output independently by each classifier to be shared with the other classifiers according to a self-distillation technique, and
wherein the few-shot image classification program inputs a new input image to the image classification model and outputs a classification result for the new input image.

The apparatus of claim 4,
wherein the primary learning step generates the classification model through a multi-feature extraction classifier having a plurality of sub-layers, and the classification model performs classification using cosine similarity.

The apparatus of claim 4,
wherein the secondary learning step performs knowledge sharing between the prediction result output by the auxiliary classifier and the prediction result of the main classifier through Kullback-Leibler divergence (KL divergence).

A non-transitory computer-readable recording medium on which is recorded a computer program for performing the deep learning-based few-shot image classification method according to any one of claims 1 to 3.