KR20220138696A

KR20220138696A - Method and apparatus for classifying image

Info

Publication number: KR20220138696A
Application number: KR1020210044691A
Authority: KR
Inventors: 안가온; 송현오; 문승용
Original assignee: 서울대학교산학협력단
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2022-10-13
Also published as: KR102505303B1

Abstract

An image classification method and apparatus are disclosed. The image classification method according to one embodiment of the present disclosure is an image classification method based on a robust classification model trained to identify a class of an input image. The image classification method may include the steps of: extracting a feature map for an input image output from a specific layer of a classification model; extracting style information of the feature map using statistical information of the feature map in a convolution layer; performing normalization on the feature map by applying the style information in a normalization layer; and predicting the label of the input image based on the normalized feature map.

Description

Image classification method and apparatus {METHOD AND APPARATUS FOR CLASSIFYING IMAGE}

The present disclosure relates to an image classification method and apparatus that provide a classifier capable of robust neural network inference by applying the adaptive instance normalization (AdaIN) technique to an artificial neural network.

In general, the output of a trained artificial neural network can change substantially in response to input changes too small for a human to perceive. Exploiting this sensitivity to completely change the output through a small input perturbation (adversarial perturbation), and thereby cause a malfunction, is called an adversarial attack; an input that causes such a malfunction is called an adversarial example.

Vulnerability to adversarial attacks is a major obstacle to applying artificial intelligence in fields where malfunctions carry high risk, such as medicine and autonomous driving. For example, if a malicious person mounts an adversarial attack on a traffic sign, an autonomous vehicle may misrecognize the sign, and a traffic accident may result.

The projected gradient descent (PGD)-based adversarial training technique disclosed in Prior Art 1, the most widely used method for increasing robustness, is a form of data augmentation: it generates adversarial examples from the training data and then trains the neural network so that the loss on those examples approaches zero. Specifically, for each training sample, PGD searches the neighborhood of that sample for a point with a higher loss value, and the network is trained on the point found. This method can be regarded as the best-performing robustness technique proposed to date.
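The PGD search described above can be sketched as follows. This is a minimal illustration, not the procedure of Prior Art 1 as published: the toy linear logistic classifier, the L-infinity neighborhood, and the values of `eps`, `step`, and `iters` are all assumptions chosen for readability.

```python
import math

def loss_and_grad(w, b, x, y):
    """Logistic loss of a toy linear classifier and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))            # predicted probability of class 1
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]        # d loss / d x
    return loss, grad_x

def pgd_attack(w, b, x, y, eps=0.1, step=0.025, iters=10):
    """Search the L-infinity ball of radius eps around x for a higher-loss point."""
    x_adv = list(x)
    for _ in range(iters):
        _, g = loss_and_grad(w, b, x_adv, y)
        # Ascent step along the sign of the input gradient.
        x_adv = [xa + step * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Projection back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical model parameters and training sample.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = pgd_attack(w, b, x, y)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
```

In adversarial training, the model would then be updated on `x_adv` rather than `x`; each attack iteration costs one additional gradient computation, so the training overhead grows with `iters`.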

However, PGD-based adversarial training requires considerable training time, because generating each adversarial example involves an iterative PGD optimization. Specifically, it usually requires at least 10 times the computation of standard neural network training. In addition, because it searches only for adversarial examples within a specific distance in the input image domain, the adversarial examples it can generate are limited.

Prior Art 2, in contrast, moves away from searching for adversarial examples within a distance bound and instead uses style transfer to build a new data set in which the shape of one image is rendered in the style of another (which can be viewed as compositing the statistical characteristics of one image onto another). Training a neural network on the generated data set reduces the network's dependence on style, that is, makes it attend more to shape. This technique aims to improve robustness through style transfer, but the style-transferred images must be generated in the input image domain and stored in advance. Moreover, although it improves robustness to common noise such as changes in weather or brightness, it does not improve robustness to adversarial attacks.

The background art described above is technical information that the inventors possessed for, or acquired in the course of, deriving the present invention, and it is not necessarily known art disclosed to the general public before the filing of the present application.

Prior Art 1: Madry et al., Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018
Prior Art 2: Geirhos et al., ImageNet-trained CNNs are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness, ICLR 2019

One object of embodiments of the present disclosure is to improve the adversarial robustness of a neural network image classifier by applying to it the adaptive instance normalization (AdaIN) technique, which renders one image in the style (color, texture, and so on) of another.

One object of embodiments of the present disclosure is to address the vulnerability to small image noise by deliberately mixing, at an intermediate stage of inference during training of the model, the style information of another image with that of the input image, so that the classifier does not focus on style information when recognizing the input image.

One object of embodiments of the present disclosure is to insert AdaIN layers in place of batch normalization (BN) layers at intermediate points of the neural network, so that the style of another image is applied while one image is being classified. Because style transfer modifies only the style information and preserves the shape information of the original image, there is no need to rely on a distance metric.

One object of embodiments of the present disclosure is to reduce the training cost of improving robustness by mixing the information of two images directly in the middle of the classifier's recognition process, rather than searching for adversarial examples in the input domain.

One object of embodiments of the present disclosure is to increase the stability of artificial intelligence technology by applying it to any computer vision application that uses artificial neural networks, particularly in fields such as medicine and autonomous driving where malfunctions carry high risk.

One object of embodiments of the present disclosure is to apply flexibly to vision-based artificial intelligence technology, so that it can serve as a highly stable artificial intelligence technology in industries that require accurate judgment in arbitrary environments, such as autonomous driving and medicine.

One object of embodiments of the present disclosure is to allow use as a fine-tuning technique for artificial neural networks already deployed in real services such as voice speakers and image recognition, and to be applicable beyond image recognition to fields such as image segmentation and detection.

The objects of embodiments of the present disclosure are not limited to those mentioned above; other objects and advantages of the present invention that are not mentioned can be understood from the following description and will be understood more clearly from the embodiments. It will also be appreciated that the objects and advantages of the present invention can be realized by the means recited in the claims and combinations thereof.

An image classification method according to an embodiment of the present disclosure may include applying the AdaIN technique to a neural network image classifier to enable neural network inference that is robust to adversarial attacks.

Specifically, an image classification method according to an embodiment of the present disclosure is based on a robust classification model trained to identify the class of an input image, and may include: extracting a feature map for the input image output from a specific layer of the classification model; extracting, in a convolution layer, style information of the feature map using statistical information of the feature map; normalizing the feature map in a normalization layer by applying the style information; and predicting a label of the input image based on the normalized feature map.
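The style-extraction and normalization steps can be illustrated with the standard AdaIN operation, which re-normalizes each channel of a feature map to the mean and standard deviation of a second feature map. The sketch below assumes that the "statistical information" is the per-channel mean and standard deviation; the function names, the `eps` constant, and the toy feature maps are hypothetical and not taken from the disclosure.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Mean and standard deviation over the spatial positions of one channel."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)   # eps guards against division by zero

def adain(content, style):
    """Re-normalize each content channel to the style channel's mean and std."""
    out = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = channel_stats(c_ch)
        s_mean, s_std = channel_stats(s_ch)
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_ch])
    return out

# Toy feature maps laid out as [channel][flattened spatial position].
content = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]]
style = [[10.0, 10.0, 20.0, 20.0], [-1.0, 0.0, 1.0, 2.0]]
normalized = adain(content, style)
```

After the call, each channel of `normalized` keeps the spatial pattern of `content` (its shape information) while carrying the channel statistics of `style`; a classifier head would then predict the label from `normalized`.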

With the image classification method according to an embodiment of the present disclosure, AdaIN layers are inserted at intermediate points of the classifier so that the style of another image is applied while one image is being classified, and robustness to adversarial attacks can thereby be improved through style transfer.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may further be provided.

Other aspects, features, and advantages beyond those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to embodiments of the present disclosure, classification training proceeds while the style information of another image is applied to the shape information of one image, so that the neural network does not rely excessively on the style information of an image. This makes it possible to build artificial intelligence that is resistant to adversarial attacks, that is, robust, and can improve user satisfaction and performance.

In addition, by using AdaIN layers in the forward pass of the neural network instead of separately optimizing adversarial examples in the input domain, the amount of computation required to improve robustness can be reduced.

In addition, unlike existing methods, whose attack diversity is limited because they search only for adversarial examples within a small distance, style images are injected at intermediate points of the forward pass, so the forms of adversarial attack diversify along with the diversity of the style images.

In addition, because the information of two images is mixed directly in the middle of the classifier's recognition process rather than adversarial examples being searched for in the input domain, the training cost of improving robustness can be reduced. Lowering the cost while maintaining performance relative to existing methods can promote the adoption of robust AI.

In addition, operational stability can be increased by applying the technique to artificial intelligence products in industries directly related to human life, such as the medical and security system industries.

In addition, operational stability and product reliability can be increased by applying the technique to artificial intelligence products that must work well even in unpredictable external environments, such as unmanned delivery drones and autonomous vehicles.

In addition, because the technique can be applied to every artificial intelligence product related to image recognition (that is, every computer vision application that uses artificial neural networks), the commercial maturity of the technology can be improved.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a schematic illustration of an image classification system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram schematically illustrating an image classification apparatus according to an embodiment of the present disclosure.
FIGS. 3 and 4 are exemplary views for explaining an image classification apparatus according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating an image classification method according to an embodiment of the present disclosure.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments presented below; it may be implemented in a variety of different forms and should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. The embodiments presented below are provided so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the invention pertains. In describing the present invention, detailed descriptions of related known technology are omitted where they would obscure the gist of the invention.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Terms such as first and second may be used to describe various components, but the components should not be limited by these terms; the terms are used only to distinguish one component from another.

Hereinafter, embodiments according to the present invention are described in detail with reference to the accompanying drawings. In that description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a schematic illustration of an image classification system according to an embodiment of the present disclosure.

As shown in FIG. 1, the image classification system 1 may include an image classification apparatus 100, a user terminal 200, a server 300, and a network 400.

In this embodiment, the image classification system 1 concerns a neural network image classifier and is characterized in that the adaptive instance normalization (AdaIN) technique, which renders one image in the style (color, texture, and so on) of another, is applied to the classifier to improve its adversarial robustness.

The output of a trained artificial neural network can change substantially in response to input changes too small for a human to perceive. Exploiting this sensitivity to completely change the output through a small input perturbation (adversarial perturbation), and thereby cause a malfunction, is called an adversarial attack; an input that causes such a malfunction is called an adversarial example. Vulnerability to adversarial attacks is a particular obstacle to applying artificial intelligence in fields where malfunctions carry high risk, such as medicine and autonomous driving. For example, if a malicious person mounts an adversarial attack on a traffic sign, an autonomous vehicle may misrecognize the sign and cause a traffic accident.

Accordingly, this embodiment aims to build artificial intelligence that is resistant to adversarial attacks, that is, robust. Specifically, during training of the neural network classifier, classification training proceeds while the style information of another image is applied to the shape information of one image, so that the network does not rely excessively on the style information of an image and its robustness to adversarial examples improves.

That is, artificial neural networks are more vulnerable than humans to small noise in an image because they rely excessively on the image's style information. In this embodiment, therefore, the style information of another image is deliberately mixed with the style information of the input image at an intermediate stage of inference, so that the classifier does not focus on style information when recognizing the input image. To this end, adaptive instance normalization layers are inserted in place of batch normalization layers at intermediate points of the neural network, so that the style of another image is applied partway through the classification of one image.
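One way to picture this mixing during training is to pair every image in a batch with another image from the same batch and blend their channel statistics inside the normalization layer. The sketch below is an assumption-laden illustration: the linear blending weight `alpha`, the in-batch pairing `partner`, and all function names are hypothetical, since the text above states only that the two styles are mixed.

```python
import math

def stats(channel, eps=1e-5):
    """Per-channel mean and standard deviation (the 'style' statistics)."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)

def mix_style(features, style_source, alpha):
    """Shift each channel's statistics toward those of style_source.

    alpha = 0 keeps the image's own style; alpha = 1 is a full AdaIN swap.
    """
    out = []
    for f_ch, s_ch in zip(features, style_source):
        f_mean, f_std = stats(f_ch)
        s_mean, s_std = stats(s_ch)
        mean = (1 - alpha) * f_mean + alpha * s_mean
        std = (1 - alpha) * f_std + alpha * s_std
        out.append([std * (v - f_mean) / f_std + mean for v in f_ch])
    return out

def mix_batch(batch, partner, alpha):
    """Give every image in the batch the (blended) style of its partner image."""
    return [mix_style(batch[i], batch[partner[i]], alpha) for i in range(len(batch))]

batch = [
    [[1.0, 2.0, 3.0, 4.0]],        # image 0, one channel
    [[10.0, 30.0, 10.0, 30.0]],    # image 1, one channel
]
partner = [1, 0]                   # hypothetical pairing; could be a random permutation
unchanged = mix_batch(batch, partner, alpha=0.0)
swapped = mix_batch(batch, partner, alpha=1.0)
```

Because the labels stay attached to the shape-bearing image, a classifier trained on such mixed feature maps is pushed to predict from shape rather than from channel statistics, and the mixing adds no optimization loop in the input domain.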

Conventional robustness techniques rely on distance information to find suitable adversarial examples. This embodiment does not need to rely on a distance metric, because style transfer modifies only the style information and preserves the shape information of the original image. In addition, whereas existing style transfer approaches look for adversarial examples in the input domain, this embodiment mixes the information of two images directly in the middle of the classifier's recognition process, so the training cost of improving robustness can be considerably lower.

Therefore, the image classification system 1 of this embodiment can be applied without difficulty to any computer vision application that uses artificial neural networks, and it can play an important role in increasing the stability of artificial intelligence technology, particularly in fields such as medicine and autonomous driving where malfunctions carry high risk.

For example, because an artificial neural network may produce an incorrect output when noise enters the input owing to factors such as the natural environment, the image classification system 1 of this embodiment can reduce malfunctions in fields directly tied to human life, such as medicine and autonomous driving. The image classification system 1 can also be applied to artificial intelligence products in industries directly related to human life, such as the medical and security system industries, to increase operational stability; to artificial intelligence products that must work well even in unpredictable external environments, such as unmanned delivery drones and autonomous vehicles, to increase stability; and to any other artificial intelligence product related to image recognition.

Meanwhile, in this embodiment, users may access an application or website implemented on the user terminal 200 and perform processes such as inputting an image to be classified or training a classification model. The user terminal 200 may receive the image classification service through an authentication process after accessing the image classification application or website. The authentication process may include authentication by entering user information, such as membership registration, or authentication of the user terminal, but it is not limited to these; authentication may also be performed simply by accessing a link transmitted from the image classification apparatus 100 and/or the server 300.

In this embodiment, the user terminal 200 may be, without limitation, a user-operated desktop computer, smartphone, notebook computer, tablet PC, smart TV, mobile phone, personal digital assistant (PDA), laptop, media player, micro server, global positioning system (GPS) device, e-book reader, digital broadcast terminal, navigation device, kiosk, MP3 player, digital camera, home appliance, or other mobile or non-mobile computing device. The user terminal 200 may also be a wearable terminal with communication and data processing functions, such as a watch, glasses, a hair band, or a ring. The user terminal 200 is not limited to the above, and any terminal capable of web browsing may be adopted.

Meanwhile, in this embodiment, the image classification system 1 may be implemented by the image classification apparatus 100 and/or the server 300.

The server 300 may be a server for operating the image classification system 1 that includes the image classification apparatus 100. The server 300 may also be a database server that provides the big data needed to apply various artificial intelligence algorithms and the data used to operate the image classification apparatus 100. In addition, the server 300 may include a web server or application server that allows the image classification system 1 to be implemented, and a training server that performs artificial intelligence processes such as deep learning. In this embodiment, the server 300 may include the servers described above or may be networked with them.

In particular, in this embodiment, the server 300 may receive from the image classification apparatus 100 an image to be classified and an image whose style is to be applied, and may infer the actual label of the input image by applying the AdaIN classification algorithm.

The network 400 may connect the image classification apparatus 100, the server 300, and the user terminal 200 in the image classification system 1. The network 400 may encompass wired networks such as local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), and integrated service digital networks (ISDNs), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited to these. The network 400 may also transmit and receive information using short-range and/or long-range communication. Here, short-range communication may include Bluetooth, radio frequency identification (RFID), infrared data association (IrDA), ultra-wideband (UWB), ZigBee, and wireless fidelity (Wi-Fi) technologies, and long-range communication may include code division multiple access (CDMA), frequency division multiple access (FDMA), time division multiple access (TDMA), orthogonal frequency division multiple access (OFDMA), and single carrier frequency division multiple access (SC-FDMA) technologies.

The network 400 may include connections of network elements such as hubs, bridges, routers, switches, and gateways. The network 400 may include one or more connected networks, for example a multi-network environment, including public networks such as the Internet and private networks such as secure enterprise networks. Access to the network 400 may be provided through one or more wired or wireless access networks. Furthermore, the network 400 may support an Internet of Things (IoT) network, which exchanges and processes information between distributed components such as things, and/or 5G communication.

Meanwhile, as shown in FIG. 2, the image classification apparatus 100 may include a memory 110, a communication unit 120, a processor 130, and a user interface 140.

The memory 110 may store various information needed for the operation of the image classification apparatus 100 as well as control software for operating the apparatus, and may include a volatile or non-volatile recording medium.

The memory 110 is connected to one or more processors 130 and may store code that, when executed by the processor 130, causes the processor 130 to control the image classification apparatus 100.

Here, the memory 110 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited thereto. The memory 110 may include internal memory and/or external memory, and may include volatile memory such as DRAM, SRAM, or SDRAM; non-volatile memory such as one-time programmable ROM (OTPROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND flash memory, or NOR flash memory; a flash drive such as an SSD, a compact flash (CF) card, an SD card, a Micro-SD card, a Mini-SD card, an xD card, or a memory stick; or a storage device such as an HDD.

In particular, in this embodiment, the memory 110 may store a neural network model according to the present disclosure and modules that use the neural network model to implement the various embodiments of the present disclosure. The memory 110 may also store information related to an algorithm for performing learning according to the present disclosure. In addition, any other information needed to achieve the object of the present disclosure may be stored in the memory 110, and the stored information may be updated as it is received from a server or an external device or entered by a user.

The communication unit 120 may work with the network 400 to provide the communication interface needed to exchange signals with external devices (including the server) in the form of packet data. The communication unit 120 may also be a device that includes the hardware and software needed to transmit and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices. The communication unit 120 may support various kinds of intelligent-thing communication (Internet of Things (IoT), Internet of Everything (IoE), Internet of Small Things (IoST), etc.), and may support machine-to-machine (M2M) communication, vehicle-to-everything (V2X) communication, device-to-device (D2D) communication, and the like.

That is, the processor 130 may receive various data or information from an external device connected through the communication unit 120, and may also transmit various data or information to the external device. The communication unit 120 may include at least one of a Wi-Fi module, a Bluetooth module, a wireless communication module, and an NFC module.

The user interface 140 may include an input interface through which the images to be classified by the image classification apparatus 100 are collected and through which user requests and commands for image classification are entered. The images may be entered by a user or obtained from a server.

The user interface 140 may also include an output interface through which the results produced by the image classification apparatus 100 are output. For example, it may output the result of inference performed with AdaIN normalization on the input original image (or on the input content image and style image). That is, the user interface 140 may output results in response to user requests and commands for image classification.

The input interface and the output interface of the user interface 140 may be implemented in the same interface.

Meanwhile, the processor 130 may control the overall operation of the image classification apparatus 100. Specifically, the processor 130 is connected to the components of the image classification apparatus 100, including the memory 110 described above, and may control the overall operation of the apparatus by executing at least one instruction stored in the memory 110.

The processor 130 may be implemented in various ways. For example, the processor 130 may be implemented as at least one of an application-specific integrated circuit (ASIC), an embedded processor, a microprocessor, hardware control logic, a hardware finite state machine (FSM), and a digital signal processor (DSP).

The processor 130 is a kind of central processing unit and may control the overall operation of the image classification apparatus 100 by running the control software loaded in the memory 110. The processor 130 may include any type of device capable of processing data. Here, a "processor" may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

In this embodiment, the processor 130 may perform machine learning, such as deep learning, on the input image so that the image classification apparatus 100 outputs an optimal image classification result, and the memory 110 may store the data used for machine learning, the result data, and the like.

That is, in this embodiment, the processor 130 performs image classification using deep-learning-based image processing, and may load a learned classification model trained to predict the label of the image to be classified. In other words, the processor 130 may load a robust classification model trained to identify the class of an input image.

FIGS. 3 and 4 are exemplary views for explaining an image classification apparatus according to an embodiment of the present disclosure.

The robust classification model of this embodiment is described below with reference to FIGS. 3 and 4.

FIG. 3(a) shows the operation of a conventional artificial neural network classifier that uses batch normalization layers; FIG. 3(b) shows the training process of the robust artificial neural network classifier of this embodiment, which uses style transfer through AdaIN normalization; and FIG. 3(c) shows the actual use (inference) process of that robust classifier.

As shown in FIG. 3(a), when an image is input to a conventional artificial neural network classifier, it passes through specific layers (Other Layers), where their weights are applied, and is then normalized by a batch normalization (BN) layer. That is, the conventional classifier places BN layers at intervals along the feed-forward path of the input image to perform normalization. Here, the BN layer may normalize using the mean and variance of the input values over each batch.
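The batch-level normalization described above can be sketched as follows. This is a minimal illustration rather than the classifier of FIG. 3(a): it handles a single channel, omits the learnable scale and shift that BN layers usually carry, and the name `batch_norm` and the stabilizing constant `eps` are assumptions introduced here.

```python
from statistics import fmean, pstdev


def batch_norm(batch, eps=1e-5):
    """Normalize one channel with statistics pooled over the whole batch.

    `batch` is a list of samples; each sample is a flat list of activations
    for a single channel. `eps` is an assumed stabilizing constant.
    """
    pooled = [v for sample in batch for v in sample]
    mean = fmean(pooled)
    var = pstdev(pooled) ** 2  # population variance over the batch
    scale = (var + eps) ** 0.5
    return [[(v - mean) / scale for v in sample] for sample in batch]


# Two samples share one set of statistics, so every sample in the batch
# is pulled toward the same mean and variance.
batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
normalized = batch_norm(batch)
```

Because the statistics are pooled over the batch, each sample is normalized with the same mean and variance regardless of its own content.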

In contrast, because using BN layers in the network of a conventional classifier produces only a single style, this embodiment, as shown in FIGS. 3(b) and 3(c), replaces the BN layers with AdaIN layers so that normalization is performed through AdaIN layers at intervals along the feed-forward path of the input image.

More specifically, the structure of the network F of the classification model whose robustness is improved in this embodiment may be defined as in Equation 1.

Here, the two module terms of Equation 1 denote the modules (specific layers, "Other Layers") that make up the classification model of this embodiment, excluding the normalization layers, and they may be parameterized. These modules may include any neural network layers other than normalization layers. For example, layers such as convolutional and fully-connected layers, as well as various other layers such as ReLU and max pooling, may be used.

The remaining term of Equation 1 denotes a normalization layer. In this embodiment, it may represent the AdaIN normalization layer, which may receive two inputs from the preceding layers. In other words, the AdaIN normalization layer may receive an input to be normalized together with either the mean and standard deviation used for normalization or an input from which that mean and standard deviation are extracted. That is, the AdaIN normalization layer may receive a feature map of the input image (the input to be normalized) from one preceding layer, and the mean and standard deviation for normalizing that feature map from another preceding layer.

Such an AdaIN normalization layer may be expressed as Equation 2.

$$\mathrm{AdaIN}(x, y) = \sigma(y)\left(\frac{x - \mu(x)}{\sigma(x)}\right) + \mu(y)$$

Here, μ(·) denotes a function that outputs the mean of its input and σ(·) a function that outputs the standard deviation of its input. In this embodiment, the AdaIN normalization layer may receive two feature maps, x and y, of the same shape. Here x is the feature map to be normalized, and the normalized feature map is given a scale and a bias using statistics obtained from y. That is, in Equation 2, x is first normalized (the part in large parentheses), then scaled by the standard deviation of y and shifted by the mean of y.
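The AdaIN operation described above (normalize x, scale by the standard deviation of y, shift by the mean of y) can be sketched in a few lines. This is an illustrative sketch, not the patented implementation: it works on one channel stored as a flat list, it assumes the statistics are taken over that channel only, and the names `adain`, `content`, and `style` are introduced here for illustration.

```python
from statistics import fmean, pstdev


def adain(x, y):
    """AdaIN on one channel: normalize x, then scale by std(y), shift by mean(y).

    x and y are flat lists of activations with the same length.
    Assumes x is not constant, so its standard deviation is nonzero.
    """
    mu_x, sigma_x = fmean(x), pstdev(x)
    mu_y, sigma_y = fmean(y), pstdev(y)
    return [sigma_y * (v - mu_x) / sigma_x + mu_y for v in x]


content = [1.0, 2.0, 3.0, 4.0]      # channel of the feature map to be normalized
style = [10.0, 10.0, 30.0, 30.0]    # channel of the feature map supplying statistics
out = adain(content, style)
```

The output keeps the relative ordering of the values in `content` while its mean and standard deviation match those of `style`.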

In this embodiment, as described above, the AdaIN normalization layer may receive a feature map of the input image (first image), which is the input to be normalized, from one preceding layer. To normalize that feature map, it may receive from another preceding layer the mean and standard deviation computed for an arbitrary other image (second image).

That is, in this embodiment, given an input image x (first image), the style of an arbitrary other image x' (second image) is repeatedly mixed into the neural network while x is fed forward through the classification model network F. The purpose is to make F focus on the shape information of x and to give less weight to style information, which changes easily under even small noise.

To this end, in this embodiment, the first input of the AdaIN normalization layer (the input to be normalized, that is, the feature map carrying the content information of the first image) is the intermediate value (feature map) obtained by feeding x forward up to the preceding layer. The second input (the input used to compute the mean and standard deviation for normalization, that is, the feature map carrying the style information of the second image) is the intermediate value (feature map) obtained while feeding x' forward through F. With these inputs, the output of the AdaIN normalization layer is an intermediate value in which the style information of x' is added to the shape information of x. That is, the mean and standard deviation of the content feature map are aligned with those of the style feature map.

As Equation 1 shows, the specific layers of the classification model network F and the AdaIN normalization layer may be repeated. The process above is repeated at each AdaIN normalization layer, and the network is trained so that the final output (prediction) becomes similar to y, the label of x. Whether the layers are repeated and how many times is not limited, and may be set by the network user to suit the given task.

The above process may be expressed as Equation 3.

Here, D denotes the dataset, which may include an input image (first image) x, the label y of the input image, and an arbitrary other input image (second image) x'. The loss term of Equation 3 may represent a classification loss function.

More specifically, let h_{i-1} and h'_{i-1} be the (i-1)-th feature maps obtained from x and x', respectively. The i-th feature maps h_i and h'_i are then obtained as follows. First, h_{i-1} and h'_{i-1} are each forwarded through f_i (the "Other Layers" of FIG. 3). Next, h_i is obtained by passing f_i(h_{i-1}) as the first argument and f_i(h'_{i-1}) as the second argument of the AdaIN normalization layer, and h'_i is obtained by passing f_i(h'_{i-1}) as both the first and the second argument. In this way, the model learns to infer the original label y correctly even when the style information of x' is mixed into the shape information of x during inference. This encourages the classification model of this embodiment to focus on the shape information of an image and may help improve robustness.
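One step of the two-stream update described above might look like the following sketch. It assumes the AdaIN form in which the first argument is normalized and then rescaled with the second argument's statistics, treats each feature map as a single flat channel, and uses a hypothetical `toy_layer` as a stand-in for the "Other Layers" f_i; the names `layer_step` and `toy_layer` do not come from the source.

```python
from statistics import fmean, pstdev


def adain(x, y):
    """Normalize x, then scale by std(y) and shift by mean(y) (single channel)."""
    mu_x, sigma_x = fmean(x), pstdev(x)
    return [pstdev(y) * (v - mu_x) / sigma_x + fmean(y) for v in x]


def layer_step(f_i, h_prev, h_style_prev):
    """Compute (h_i, h'_i) from (h_{i-1}, h'_{i-1}) with shared layers f_i."""
    a = f_i(h_prev)            # content stream after the shared layers
    b = f_i(h_style_prev)      # style stream after the same shared layers
    h_next = adain(a, b)       # content stream takes on the statistics of x'
    h_style_next = adain(b, b) # style stream uses its own statistics
    return h_next, h_style_next


def toy_layer(h):
    """Hypothetical stand-in for f_i: elementwise affine map followed by ReLU."""
    return [max(0.0, 2.0 * v - 1.0) for v in h]


h, h_style = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 9.0]
h, h_style = layer_step(toy_layer, h, h_style)
```

Both streams pass through the same `f_i`, which mirrors the statement that the two networks may share structure and parameters; only the statistics fed to AdaIN differ between them.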

Meanwhile, referring to FIG. 3(c), when the classification model trained as above is put to use, the style information no longer needs to be altered. The network (path) that takes an arbitrary other image is therefore dropped, and inference is performed with AdaIN normalization based on the input image (original image) x alone. This may be expressed as Equation 4.
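The single-stream inference pass described above can be sketched as follows, under the assumption that AdaIN normalizes its first argument and rescales it with the mean and standard deviation of its second argument, with no stabilizing constant and no learned affine parameters. The names `infer`, `double_plus_one`, and `square` are hypothetical, and each feature map is reduced to one flat channel.

```python
from statistics import fmean, pstdev


def adain(x, y):
    """Normalize x, then scale by std(y) and shift by mean(y) (single channel)."""
    mu_x, sigma_x = fmean(x), pstdev(x)
    return [pstdev(y) * (v - mu_x) / sigma_x + fmean(y) for v in x]


def infer(layers, x):
    """Forward pass in which every AdaIN layer uses the input's own statistics."""
    h = x
    for f_i in layers:
        a = f_i(h)
        h = adain(a, a)  # style statistics come from the same feature map
    return h


def double_plus_one(h):
    """Hypothetical stand-in for one block of 'Other Layers'."""
    return [2.0 * v + 1.0 for v in h]


def square(h):
    """Hypothetical stand-in for another block of 'Other Layers'."""
    return [v * v for v in h]


features = infer([double_plus_one, square], [0.0, 1.0, 2.0])
```

Under these assumptions, passing the same feature map as both arguments returns it unchanged up to floating-point rounding, so no second image is needed at this stage. An implementation that adds a stabilizing constant or learned parameters would behave slightly differently.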

Referring to FIG. 4, as described above, x denotes the input image, y the label of the input image, and x' an arbitrary other image used to apply a style. In the training process shown in FIG. 4(a), while inference is performed on x, the style information of x' is fed into the AdaIN normalization layers placed at intervals in the classification model network F. When a feature map that mixes the shape information of x with the style information of x' is output, the classification model network F is trained so that it still predicts y, the label of x. A standard classification loss function may be used for this purpose, and the applicable classification loss functions are not limited.

In the test (use and application) process shown in FIG. 4(b), inference is performed with the trained classification model network F, but because the style information does not need to be altered, the style information of the original image x itself is fed into the AdaIN normalization layers. The actual label y can then be inferred from the label predicted by the classification model network F.

To summarize the image classification apparatus 100 in terms of the classification model described above, the processor 130 may extract the feature map of the input image output from a specific layer of the classification model, and may extract the style information of the feature map using the statistical information of the feature map. The processor 130 may then normalize the feature map by applying the style information, and may predict the label of the input image based on the normalized image. As shown in FIGS. 3 and 4, the steps from extracting the feature map through performing normalization may be configured to run consecutively a set number of times or more, but whether they are repeated and how many times is not limited and may vary by embodiment.

The classification model of this embodiment used here may be trained by a training phase that includes: inputting a first image (x) whose class is to be identified and a first label (y) of the first image to a first network; inputting a second image (x') used for style transfer normalization (AdaIN) of the first image to a second network; extracting a content feature map of the first image output from a specific layer of the first network; extracting a style feature map of the second image output from a specific layer of the second network and extracting the style information of the style feature map using its statistical information; applying the style information to the first network to normalize the content feature map of the first image; and learning to predict the first label of the input first image based on the normalized first image. In this way the classification model learns to predict the first label; in other words, it is trained so that the final prediction output after inference with AdaIN normalization is similar to the first label (y).

Here, the style information may include the mean and standard deviation of the style feature map. The first network refers to the artificial neural network through which the first image (x) is inferred, and the second network refers to the artificial neural network through which the second image (x'), used for style transfer normalization (AdaIN) of the first image (x), is inferred.

In the normalization step of the classification model, the processor 130 may normalize the content feature map, then multiply the normalized content feature map by the standard deviation of the style feature map and add the mean of the style feature map. In addition, the classification model of this embodiment may be configured so that the steps from inputting the first label of the first image through performing normalization are repeated a set number of times; whether they are repeated and how many times is not limited and may vary by embodiment.

When those steps are repeated a set number of times, the classification model of this embodiment may be trained by a training phase that further includes applying the style information to the second network to normalize the style feature map of the second image. In this case, the normalized second image may be input to the next layer of the second network.

In this embodiment, the first network and the second network may be configured to have the same structure and the same parameters, and the classification model may be configured so that the first network and the second network are each optimized through a loss function.

This embodiment makes the artificial neural network more robust by appropriately mixing training images during training, and classification performance and robustness can be improved further by mixing the training labels as well. Because this embodiment can be viewed as a method that mixes two images, the corresponding training labels can also be mixed, depending on the embodiment.

That is, the classification model of this embodiment may be trained by a training phase that further includes performing label smoothing on the first label of the first image and the second label of the second image. The label smoothing may be performed according to Equation 5 below.

Here, y denotes the first label, y' the second label, and α a hyperparameter whose value may be adjusted as needed.

For example, when the one-hot labels of the first image (x) and the second image (x') are y and y', respectively, the training target label may be set as in Equation 5. Here, a one-hot label is a representation that assigns the value 1 to the index to be expressed and 0 to every other index.
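The label mixing described above can be illustrated with a short sketch. The exact form of Equation 5 is not reproduced in this text, so the convex combination α·y + (1 − α)·y' used below is an assumption made purely for illustration, as are the names `one_hot` and `mix_labels` and the value α = 0.9.

```python
def one_hot(index, num_classes):
    """Return a one-hot label: 1.0 at `index`, 0.0 everywhere else."""
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


def mix_labels(y, y_style, alpha):
    """Assumed target: alpha * y + (1 - alpha) * y' (a hypothetical form)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(y, y_style)]


y = one_hot(2, 4)        # label of the first image x
y_style = one_hot(0, 4)  # label of the second image x'
target = mix_labels(y, y_style, alpha=0.9)
```

With this assumed form the target still sums to one, and most of the weight stays on the label of the first image when α is close to 1.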

Meanwhile, the practicality of the classification model of this embodiment can be confirmed through an experiment on the CIFAR-10 dataset, which is widely used in studies of image classifier robustness. CIFAR-10 is an RGB dataset of 10 natural-image classes (dog, cat, airplane, truck, etc.), and each image is 32x32 pixels. The classifier was a WideResNet 34-10 artificial neural network trained for 200 epochs with the SGD optimizer. The momentum was set to 0.1, and the learning rate started at 0.1 and was reduced to 1/10 of its value at epochs 100 and 150.
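The learning-rate schedule described above can be written as a small step function. This sketch assumes epochs are counted from 0 and that each reduction takes effect at the start of the milestone epoch; the source does not state either convention, and the function name `learning_rate` is introduced here.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(100, 150), factor=0.1):
    """Step schedule: multiply the rate by `factor` once per milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


# Rate at the first epoch, just before and at each milestone, and at the last epoch.
schedule = {e: learning_rate(e) for e in (0, 99, 100, 149, 150, 199)}
```

In a framework such as PyTorch, the same behavior is commonly obtained with a multi-step scheduler whose milestones are 100 and 150 and whose decay factor is 0.1.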

In Table 1 above, each row indicates which defense technique was used. "Natural" is the case trained in the standard way with no defense technique, and "Madry" is the case in which the classifier was robustly trained with a baseline widely used in robustness research. "Ours" is the case robustly trained with the technique of this embodiment. The "clean" and "PGD" columns give the classification accuracy when the classifier receives the original test-set images and when it receives adversarial examples generated by applying a PGD attack to those images, respectively. Classification accuracy is measured over all 10,000 test-set images.

As shown in Table 1, the classification model of this embodiment improves clean accuracy (accuracy when the image has not been attacked) by 8 percentage points over the existing Madry method, while also achieving slightly higher PGD accuracy (accuracy under a PGD attack).

Second, the difference in training time between the classification model of this embodiment and the existing techniques can be measured, with one GPU (RTX 2080Ti) used to train each technique.

Referring to Table 2, the classification model of this embodiment takes longer to train than the standard training method because two images must be forwarded for each evaluation of the loss function during training. However, the Madry technique, which requires roughly 7 to 10 times as many forward passes for each evaluation of the loss function, takes about 3 to 4 times longer to train than the classification model of this embodiment.

In conclusion, Tables 1 and 2 show that, compared with the existing technique, the classification model of this embodiment achieves much higher classification performance on non-attacked images and slightly better classification performance on attacked images, while its training time is about 3 to 4 times shorter. That is, the classification model of this embodiment is confirmed to be practical.

FIG. 5 is a flowchart illustrating an image classification method according to an embodiment of the present disclosure.

Referring to FIG. 5, in the test process for image classification, in step S10 the image classification apparatus 100 extracts a feature map for an input image output from a specific layer of the classification model.

Here, the classification model may be a learning model trained by a training phase that includes: inputting a first image whose class is to be identified, together with a first label of the first image, to a first network; inputting a second image used for style-transfer normalization of the first image to a second network; extracting a content feature map for the first image output from a specific layer of the first network; extracting a style feature map for the second image output from a specific layer of the second network, and extracting style information of the style feature map using statistical information of the style feature map; performing normalization on the content feature map for the first image by applying the style information to the first network; and learning to predict the first label of the input first image based on the normalized first image.

Here, the style information may include the mean and standard deviation of the style feature map. In the normalization step of the classification model, the content feature map is normalized, the normalized content feature map is multiplied by the standard deviation of the style feature map, and the mean of the style feature map is added.
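
The normalization just described (standardize the content feature map, multiply by the style standard deviation, add the style mean) can be sketched as follows. This is a minimal sketch, assuming a feature map is represented as a list of channels, each a flat list of activations, and that statistics are taken per channel. The function names and the small constant `eps` added to the content standard deviation are assumptions for illustration and are not taken from the disclosure.

```python
from statistics import fmean, pstdev


def channel_stats(feature_map):
    """Per-channel (mean, standard deviation) of a feature map."""
    return [(fmean(channel), pstdev(channel)) for channel in feature_map]


def adain(content, style, eps=1e-5):
    """Standardize each content channel, then scale by the style std and shift by the style mean."""
    stylized = []
    for channel, (c_mean, c_std), (s_mean, s_std) in zip(
        content, channel_stats(content), channel_stats(style)
    ):
        # eps (an assumption) avoids division by zero for constant channels.
        stylized.append([s_std * (v - c_mean) / (c_std + eps) + s_mean for v in channel])
    return stylized


# Hypothetical two-channel feature maps for a content image and a style image.
content = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
style = [[10.0, 10.0, 14.0, 14.0], [5.0, 6.0, 7.0, 8.0]]
stylized = adain(content, style)
```

After the call, each channel of `stylized` keeps the spatial pattern of the content channel while its mean and standard deviation closely match those of the corresponding style channel.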

In step S20, the image classification apparatus 100 extracts style information of the feature map in a convolution layer, using statistical information of the feature map.

Here, the style information may include the mean and standard deviation of the feature map.

In step S30, the image classification apparatus 100 normalizes the feature map by applying the style information in a normalization layer, and in step S40 it predicts the label of the input image based on the normalized feature map.

In this embodiment, the steps from extracting the feature map (S10) through performing the normalization (S30) may be configured to be performed consecutively at least a set number of times.
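
The test-time flow of steps S10 to S40, with S10 to S30 repeated over several blocks, can be sketched as below. This is only a toy illustration: `toy_layer` is a hypothetical stand-in for a real convolution block, the final per-channel score is a hypothetical stand-in for a classification head, and the per-channel statistics and the constant `eps` are assumptions rather than details from the disclosure.

```python
from statistics import fmean, pstdev


def toy_layer(feature_map, weight):
    """Hypothetical stand-in for a convolution block: scale each activation, then apply ReLU."""
    return [[max(0.0, weight * v) for v in channel] for channel in feature_map]


def style_info(feature_map):
    """S20: per-channel mean and standard deviation of the feature map."""
    return [(fmean(channel), pstdev(channel)) for channel in feature_map]


def normalize_with_style(feature_map, style, eps=1e-5):
    """S30: standardize each channel, then rescale with the supplied style statistics."""
    own = style_info(feature_map)
    return [
        [s_std * (v - mean) / (std + eps) + s_mean for v in channel]
        for channel, (mean, std), (s_mean, s_std) in zip(feature_map, own, style)
    ]


def classify(image, weights):
    features = image
    for weight in weights:  # S10 to S30 repeated once per block
        features = toy_layer(features, weight)             # S10: extract feature map
        style = style_info(features)                       # S20: extract style information
        features = normalize_with_style(features, style)   # S30: normalize with style information
    scores = [fmean(channel) for channel in features]      # S40: toy score per class
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 3-channel input; each channel doubles as a class score in this toy model.
image = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.0, 0.1, 0.0, 0.1]]
label = classify(image, weights=[1.5, 0.8, 2.0])  # label == 1 for this toy input
```

Note that in this sketch the style statistics are taken from the same feature map that is being normalized, so the normalization step reproduces its input up to the effect of `eps`. Whether the trained model behaves this way depends on implementation details the text above does not spell out.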

In addition, in this embodiment, the normalization layer and the convolution layer are layers of a learning model trained to classify a training image into the label of the training image. The normalization layer may be a layer that, when the learning model is trained, performs normalization with the feature map of a style image, whose style is transferred to the training image, and the feature map of the training image as inputs.

The embodiments of the present invention described above may be implemented in the form of a computer program executable through various components on a computer, and such a computer program may be recorded on a computer-readable medium. The medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

The computer program may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of the computer program include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

In the specification of the present invention (particularly in the claims), the term "the" and similar referential terms may refer to both the singular and the plural. In addition, when a range is described in the present invention, the invention includes inventions to which individual values belonging to the range are applied (unless stated otherwise), which is equivalent to describing each individual value constituting the range in the detailed description of the invention.

Unless an order is explicitly stated for the steps constituting the method according to the present invention, or there is a statement to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited by the order in which the steps are described. The use of any examples or exemplary terms (for example, "and so on") in the present invention is merely intended to describe the present invention in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Therefore, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set forth below but also all scopes equivalent to, or equivalently modified from, those claims fall within the scope of the spirit of the present invention.

1: image classification system
100: image classification apparatus
110: memory
120: communication unit
130: processor
140: user interface

Claims

An image classification method based on a robust classification model trained to identify a class of an input image, the method comprising:
extracting a feature map for an input image output from a specific layer of the classification model;
extracting style information of the feature map in a convolution layer, using statistical information of the feature map;
performing normalization on the feature map by applying the style information in a normalization layer; and
predicting a label of the input image based on the normalized feature map,
wherein the steps from extracting the feature map through performing the normalization are configured to be performed consecutively at least a set number of times, and
wherein the normalization layer and the convolution layer are layers of a learning model trained to classify a training image into a label of the training image, the normalization layer being a layer that, when the learning model is trained, performs normalization with a feature map of a style image, whose style is transferred to the training image, and a feature map of the training image as inputs.

The method of claim 1, wherein the classification model is trained by a training phase comprising:
inputting a first image whose class is to be identified and a first label of the first image to a first network;
inputting a second image for style-transfer normalization of the first image to a second network;
extracting a content feature map for the first image output from a specific layer of the first network;
extracting a style feature map for the second image output from a specific layer of the second network, and extracting style information of the style feature map using statistical information of the style feature map;
performing normalization on the content feature map for the first image by applying the style information to the first network; and
learning to predict the first label of the input first image based on the normalized first image.

The method of claim 2, wherein the style information includes a mean and a standard deviation of the style feature map, and
wherein performing the normalization comprises:
normalizing the content feature map; and
multiplying the normalized content feature map by the standard deviation of the style feature map and adding the mean of the style feature map.

The method of claim 2, wherein, for the classification model, the steps from inputting the first label of the first image through performing the normalization are configured to be performed consecutively at least a set number of times.

The method of claim 4, wherein, when the steps are performed consecutively at least the set number of times, the classification model is trained by a training phase further comprising performing normalization on the style feature map for the second image by applying the style information to the second network, and
wherein the normalized second image is configured to be input to a next layer of the second network.

The method of claim 2, wherein intermediate layers of the first network and the second network are configured to have the same structure and the same parameters.

The method of claim 2, wherein the classification model is configured such that the first network and the second network are each optimized through a loss function.

The method of claim 2, wherein the classification model is trained by a training phase further comprising performing label smoothing on the first label of the first image and a second label of the second image,
wherein the label smoothing is performed by the following formula,

where y is the first label, y' is the second label, and α is a hyperparameter.

An image classification apparatus based on a robust classification model trained to identify a class of an input image, the apparatus comprising:
a memory; and
at least one processor connected to the memory and configured to execute computer-readable instructions contained in the memory,
wherein the at least one processor is configured to perform, as an image classification method based on a robust classification model trained to identify a class of an input image:
an operation of extracting a feature map for an input image output from a specific layer of the classification model;
an operation of extracting style information of the feature map in a convolution layer, using statistical information of the feature map;
an operation of performing normalization on the feature map by applying the style information in a normalization layer; and
an operation of predicting a label of the input image based on the normalized feature map,
wherein the operations from extracting the feature map through performing the normalization are configured to be performed consecutively at least a set number of times, and
wherein the normalization layer and the convolution layer are layers of a learning model trained to classify a training image into a label of the training image, the normalization layer being a layer that, when the learning model is trained, performs normalization with a feature map of a style image, whose style is transferred to the training image, and a feature map of the training image as inputs.

The apparatus of claim 9, wherein the classification model is trained by a training phase comprising:
inputting a first image whose class is to be identified and a first label of the first image to a first network;
inputting a second image for style-transfer normalization of the first image to a second network;
extracting a content feature map for the first image output from a specific layer of the first network;
extracting a style feature map for the second image output from a specific layer of the second network, and extracting style information of the style feature map using statistical information of the style feature map;
performing normalization on the content feature map for the first image by applying the style information to the first network; and
learning to predict the first label of the input first image based on the normalized first image.

The apparatus of claim 10, wherein the style information includes a mean and a standard deviation of the style feature map, and
wherein performing the normalization of the classification model comprises:
normalizing the content feature map; and
multiplying the normalized content feature map by the standard deviation of the style feature map and adding the mean of the style feature map.

The apparatus of claim 10, wherein, for the classification model, the steps from inputting the first label of the first image through performing the normalization are configured to be performed consecutively at least a set number of times.

The apparatus of claim 12, wherein, when the steps are performed consecutively at least the set number of times, the classification model is trained by a training phase further comprising performing normalization on the style feature map for the second image by applying the style information to the second network, and
wherein the normalized second image is configured to be input to a next layer of the second network.

The apparatus of claim 10, wherein intermediate layers of the first network and the second network are configured to have the same structure and the same parameters.

The apparatus of claim 10, wherein the classification model is configured such that the first network and the second network are each optimized through a loss function.

The apparatus of claim 10, wherein the classification model is trained by a training phase further comprising performing label smoothing on the first label of the first image and a second label of the second image,
wherein the label smoothing is performed by the following formula,

where y is the first label, y' is the second label, and α is a hyperparameter.