KR102507273B1

KR102507273B1 - Electronic device and operation method of electronic device for detecting and classifying object

Info

Publication number: KR102507273B1
Application number: KR1020220108841A
Authority: KR
Inventors: 표성백; 김은규; 김기우
Original assignee: (주)트루엔
Priority date: 2022-07-21
Filing date: 2022-08-30
Publication date: 2023-03-08

Abstract

The present invention provides an electronic device capable of accurately detecting the objects it is actually meant to detect. According to various embodiments of the present invention, the electronic device comprises: a memory storing a first training image set, a second training image set, a first model that outputs a detection result for an object included in a first input image based on feature values of the first input image, and a second model that outputs a value associated with whether an object included in a second input image is genuine based on feature values of the second input image; and a processor. The processor trains the first model by using the first training image set and trains the second model by using the second training image set. The first training image set is a set of one or more images to which classes are assigned. The second training image set is a set of images that were detected as a first object by the first model and in which each class is assigned a different label value. Various other embodiments are possible.

Description

Electronic device and operation method of electronic device for detecting and classifying object

Various embodiments disclosed in this document relate to an electronic device for detecting and classifying objects and to a method of operating the electronic device.

Image processing technology refers to technology that analyzes an image for a given purpose. In particular, techniques that detect objects in an image using an artificial intelligence model are widely used.

For example, image processing technology may be used to detect cars, people, signs, and the like in road images and to generate events corresponding to the detected objects.

However, current image processing technology for roads recognizes objects reliably only at detection distances of 100 to 200 m; at distances of about 600 m it cannot recognize objects accurately and reliably without false positives.

As the detection distance increases, cracks in the road surface or overlapping lane markings may be recognized as people or vehicles, raising the probability of false positives.

In other words, the features of a distant object are represented less accurately than those of a nearby object, so the probability of misrecognizing it increases with distance.

A general detection model is trained to detect things that resemble the features of the objects it has learned. For a distant object those features are hard to distinguish, so the probability of false detection increases.

To solve the problems described above, the present invention provides an electronic device, and a method of operating the electronic device, that accurately detects the objects actually intended to be detected: a cropped image of an object detected by a detection model is input to a classification model, the classification model determines whether the object is real, and an event is generated accordingly.

In particular, the electronic device of the present invention can prevent false object detections by training the classification model on images that the detection model detected as an object, including both images that are real objects and images that are not.

The technical problems to be addressed in this document are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

An electronic device according to various embodiments disclosed in this document includes a processor and a memory that stores a first training image set, a second training image set, a first model that outputs a detection result for an object included in a first input image based on feature values of the first input image, and a second model that outputs a value related to the authenticity of an object included in a second input image based on feature values of the second input image. The processor trains the first model using the first training image set and trains the second model using the second training image set. The first training image set may be a set of at least one image to which a class is assigned, and the second training image set may be a set of images that were detected as a first object by the first model and in which each class is assigned a different label value.

An electronic device according to various embodiments may provide a technique for accurately detecting an object.

An electronic device according to various embodiments may provide a technique for accurately detecting objects by learning from images that the detection model recognized as an object but that are not actually the object.

An electronic device according to various embodiments may repeatedly perform reinforcement training of a model by using an input image that satisfies a specified criterion as a training image for an unknown object.

An electronic device according to various embodiments may provide an auto transfer learning model, in that with each generation the model learns and improves starting from the weights of the previous generation.

In the description of the drawings, the same or similar reference numerals may be used for the same or similar elements.
FIG. 1 is a block diagram of an electronic device in a network environment, according to various embodiments.
FIG. 2 is a flowchart illustrating a method by which a processor trains a first model and a second model, according to various embodiments.
FIG. 3 is a diagram illustrating an example of a second training image set according to various embodiments.
FIG. 4 is a flowchart illustrating a method by which a processor performs an event using a first model and a second model, according to various embodiments.
FIG. 5 is a diagram illustrating the structures of a first model and a second model according to various embodiments.
FIG. 6A is a diagram that visualizes and compares the activation function output values of second models built with different methods, according to various embodiments.
FIG. 6B is a diagram that visualizes the activation function output values of a second model with and without an attention module, according to various embodiments.
FIGS. 7A, 7B, and 7C are images related to results of detecting and classifying objects by the electronic device of the present invention, according to various embodiments.

FIG. 1 is a block diagram of an electronic device according to various embodiments.

Referring to FIG. 1, the electronic device 100 may include a processor 110, a camera 120, a memory 130, and/or a communication module 140. FIG. 1 shows only some of the components of the electronic device 100, which may include various other components.

The processor 110 may, for example, execute software (e.g., a program) to control at least one other component (e.g., a hardware or software component) of the electronic device 100 connected to the processor 110, and may perform various data processing or computations. According to an embodiment, as at least part of the data processing or computation, the processor 110 may store commands or data received from another component in volatile memory, process the commands or data stored in the volatile memory, and store the resulting data in non-volatile memory. According to an embodiment, the processor 110 may include a main processor (e.g., a central processing unit or an application processor) or an auxiliary processor (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor. For example, when the electronic device 100 includes a main processor and an auxiliary processor, the auxiliary processor may be configured to use less power than the main processor or to be specialized for a designated function. The auxiliary processor may be implemented separately from the main processor or as part of it.

According to an embodiment, the auxiliary processor (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera 120 or the communication module 140). According to an embodiment, the auxiliary processor (e.g., a neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be generated through machine learning. Such learning may be performed, for example, in the electronic device 100 itself, on which the artificial intelligence model runs. The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to these examples. The artificial intelligence model may include a plurality of artificial neural network layers. The artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a deep Q-network, or a combination of two or more of these, but is not limited to these examples. In addition to or instead of the hardware structure, the artificial intelligence model may include a software structure.

According to an embodiment, the processor 110 may train an artificial intelligence model (e.g., the first model and/or the second model). Details are given below in the description of FIG. 2.

According to an embodiment, the processor 110 may perform a designated operation based on an output value obtained by inputting an image to the trained artificial intelligence model. Details are given below in the description of FIG. 4.

The memory 130 may store various data used by at least one component (e.g., the processor 110) of the electronic device 100. The data may include, for example, software (e.g., a program) and input data or output data for commands related to the software. The memory 130 may include volatile memory or non-volatile memory. The program may be stored in the memory 130 as software and may include, for example, an operating system, middleware, or an application.

According to an embodiment, the memory 130 may temporarily or non-temporarily store at least one of an artificial intelligence model (e.g., the first model or the second model) and data related to the artificial intelligence model. For example, the memory 130 may store the parameters of the artificial intelligence model as determined by its training.

The camera 120 may capture still images and video. According to an embodiment, the camera 120 may include one or more lenses, image sensors, image signal processors, or flashes.

According to an embodiment, the camera 120 may include a speed dome camera and at least one camera capable of capturing an omnidirectional area.

The communication module 140 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 100 and an external electronic device (e.g., another electronic device or a server) and communicating over the established channel. The communication module 140 may include one or more communication processors that operate independently of the processor 110 (e.g., an application processor) and support direct (e.g., wired) communication or wireless communication. According to an embodiment, the communication module 140 may include a wireless communication module (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module (e.g., a local area network (LAN) communication module or a power line communication module). The applicable module among these may communicate with an external electronic device through a first network (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or a second network (e.g., a long-range communication network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or WAN)). These various types of communication modules may be integrated into one component (e.g., a single chip) or implemented as a plurality of separate components (e.g., multiple chips). The wireless communication module may identify or authenticate the electronic device 100 within a communication network such as the first network or the second network using subscriber information (e.g., an international mobile subscriber identity (IMSI)) stored in a subscriber identification module.

FIG. 2 is a flowchart illustrating a method by which the processor 110 trains a first model and a second model, according to various embodiments.

According to various embodiments, in operation 210 the processor 110 may train the first model using the first training image set.

The first model may be a learning model that outputs a detection result (e.g., an object region and/or the probability that the region is an object) for an object included in a first input image, based on feature values of the first input image. For example, the first model may be a model trained to detect objects in the first input image based on a convolutional neural network (CNN) algorithm. For example, the processor 110 may crop and extract the object region based on the result of detecting the object in the first input image with the first model.
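The cropping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the image is assumed to be a row-major grid of pixel values, and the box layout `(x_min, y_min, x_max, y_max)` and the function name `crop_region` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), max values exclusive.
Box = Tuple[int, int, int, int]


def crop_region(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Return the pixels inside `box`, clamping the box to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


# A 4x6 toy image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
crop = crop_region(image, (2, 1, 5, 3))
```

The cropped region, rather than the full frame, is what would then be passed on as the second input image.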

According to an embodiment, the first model may include one or more layers for extracting feature values of the first input image. For example, the first model may include a convolution layer, a pooling layer, a fully connected layer, and the like, but is not limited to these and may be configured in various forms depending on the embodiment. For example, the first model may extract a feature vector containing one or more feature values of the first input image.

According to an embodiment, the first model may include one or more layers for outputting a classification result for an object included in the first input image. Specifically, based on the feature values of the first input image, the first model may output the probability that an object included in the first input image belongs to a specific class among a plurality of preset classes.

According to an embodiment, the first training image set may be a set of at least one image to which a class is assigned. For example, the first training image set may be a set of images whose classes are preset objects such as “person”, “car”, and/or “sign”. The classes in the first training image set may be assigned by a user.

According to an embodiment, the processor 110 may train the first model by supervised learning on the first training image set. For example, the processor 110 may train the first model using each class-assigned image as input data and the class assigned to that image as target data. While training the first model, the processor 110 may update the parameters of the first model.

According to various embodiments, in operation 220 the processor 110 may train the second model using the second training image set and store the resulting data in the memory 130.

The second model may be a learning model that outputs a value related to the authenticity of an object included in a second input image, based on feature values of the second input image. For example, the second model may be a model trained, based on a convolutional neural network (CNN) algorithm, to classify whether an object included in the second input image is genuine.

The second model may include one or more layers for outputting a classification result on the authenticity of an object included in the second input image. For example, based on the feature values of the second input image, the second model may output the probability that the object detected in the second input image is genuine.

According to an embodiment, the second training image set may be a set of at least one image detected as the first object by the first model. For example, the second training image set may be a set of images that the first model detected as a designated object such as “person”.

According to an embodiment, the second training image set may be a set of images that were detected as the first object by the first model and are classified as “class A”, “class B”, “class C”, and/or “unknown”. “Class A”, “class B”, and/or “class C” are sets of images related to the first object, while “unknown” is the set of images that do not fall into “class A”, “class B”, or “class C” and are therefore not the first object. For example, the second training image set may be a set of images that the first model detected as “person” and that are classified as “upper body”, “full body”, “lower body”, and/or “unknown”. As another example, the second training image set may comprise a set of images that share common features and are classified as cones, signs, bicycles, motorcycles, road cracks, lane markings, and the like, together with a set of images of arbitrary regions without common features that are classified as “unknown”.

Although this document refers to “class A”, “class B”, and “class C”, the sets of images related to the first object are not limited to these three; there may be N such sets, for example “class D” through “class N”.

According to an embodiment, the second training image set may be a set of images that were detected as the first object by the first model and in which each class is assigned a different label value. For example, a “class A” image may be labeled [1, 0, 0], a “class B” image [0, 1, 0], a “class C” image [0, 0, 1], and an “unknown” image [0, 0, 0].
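The labeling scheme in the example above can be sketched as a small encoder. The names `CLASSES` and `encode_label` are hypothetical; the point illustrated is that “unknown” is not given a slot of its own and is instead encoded as the all-zero vector.

```python
from typing import List

# Classes related to the first object, in the order used by the example labels.
CLASSES = ["class A", "class B", "class C"]


def encode_label(name: str) -> List[float]:
    """One slot per known class; any other name (e.g. "unknown") is all zeros."""
    return [1.0 if name == known else 0.0 for known in CLASSES]


label_b = encode_label("class B")
label_unknown = encode_label("unknown")
```

Because “unknown” maps to all zeros, a model trained on these targets can signal "none of the known classes" by keeping every output low.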

According to various embodiments, the processor 110 may apply label smoothing to the label values of the training data for the second model. Label smoothing is a way of adjusting label values. For example, with the label smoothing scale set to ε, the label of “class A” may be set to [1-ε, 0+ε, 0+ε], the label of “class B” to [0+ε, 1-ε, 0+ε], the label of “class C” to [0+ε, 0+ε, 1-ε], and the label of “unknown” to [0+ε, 0+ε, 0+ε]. This prevents the learning model from becoming overconfident and thereby mitigates overfitting.

According to an embodiment, the processor 110 may train the second model by supervised learning on the second training image set. For example, the processor 110 may train the second model using images classified as “class A”, “class B”, “class C”, and/or “unknown” as input data and the class assigned to each image as target data. While training the second model, the processor 110 may update the parameters of the second model.
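The label adjustment described above can be sketched as follows. The function name `smooth_label` and the default value of `epsilon` are assumptions made for illustration. Note that this follows the variant stated in the text (every 1 becomes 1-ε and every 0 becomes 0+ε), which differs from the commonly used formulation that spreads ε uniformly over all classes.

```python
from typing import List, Sequence


def smooth_label(label: Sequence[float], epsilon: float = 0.1) -> List[float]:
    """Replace each 1 with 1 - epsilon and each 0 with 0 + epsilon."""
    return [1.0 - epsilon if value == 1.0 else epsilon for value in label]


smoothed_a = smooth_label([1.0, 0.0, 0.0])        # "class A"
smoothed_unknown = smooth_label([0.0, 0.0, 0.0])  # "unknown"
```

With ε = 0.1, a “class A” target becomes approximately [0.9, 0.1, 0.1], and an “unknown” target becomes [0.1, 0.1, 0.1].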

According to an embodiment, the second model may use a cross-entropy-based loss function in outputting the classification result on the authenticity of an object included in the second input image. For example, the second model may use a sigmoid function.

For example, for an input image the second model may output a vector whose values lie between 0 and 1.
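One way to obtain a vector of values between 0 and 1 is to apply a sigmoid to each class score independently and to train against the label vectors with a per-class binary cross-entropy. The sketch below assumes that pairing; the text only states that a sigmoid and a cross-entropy-based loss are used, so the exact loss, the names `class_scores` and `binary_cross_entropy`, and the sample logits are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def class_scores(logits: Sequence[float]) -> List[float]:
    """Map raw per-class scores to independent values in (0, 1)."""
    return [sigmoid(z) for z in logits]


def binary_cross_entropy(scores: Sequence[float], targets: Sequence[float]) -> float:
    """Mean per-class binary cross-entropy between scores and target labels."""
    tiny = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, t in zip(scores, targets):
        p = min(max(p, tiny), 1.0 - tiny)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(scores)


scores = class_scores([4.0, -3.0, -5.0])  # hypothetical logits for one crop
loss_if_class_a = binary_cross_entropy(scores, [1.0, 0.0, 0.0])
loss_if_class_b = binary_cross_entropy(scores, [0.0, 1.0, 0.0])
```

Unlike a softmax, independent sigmoids allow every output to be low at once, which is compatible with an all-zero “unknown” target.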

For example, when a second input image detected as the first object by the first model is classified as “class A”, “class B”, and/or “class C”, the second model may output a vector value close to 1. When the output value of the second model is greater than or equal to a specified threshold, the processor 110 may determine that the object included in the second input image is a real first object (real object). When the output of the second model contains a vector element greater than or equal to the specified threshold, the processor 110 may classify the object included in the second input image into the corresponding class (e.g., “class A”, “class B”, and/or “class C”).

For example, when a second input image detected as the first object by the first model is not classified as any of “class A”, “class B”, and “class C”, the second model may output vector values close to 0. When the output value of the second model is less than the specified threshold, the processor 110 may determine that the object included in the second input image is an unknown object (unknown).
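The thresholding logic of the two preceding paragraphs can be sketched as below. The threshold of 0.5, the name `decide`, and the choice to report only the highest-scoring class when several exceed the threshold are assumptions; the text specifies only that a value at or above the threshold indicates a real object and a value below it indicates an unknown object.

```python
from typing import Sequence

CLASS_NAMES = ("class A", "class B", "class C")


def decide(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Return the best class if its score reaches the threshold, else "unknown"."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[best] if scores[best] >= threshold else "unknown"


real = decide([0.95, 0.10, 0.05])
rejected = decide([0.10, 0.20, 0.05])
```

A detection that the first model reported but that `decide` maps to "unknown" would be treated as a false positive and not acted upon.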

According to various embodiments, the processor 110 may train the first model and the second model by repeating operations 210 to 220 at least a specified number of times.

According to various embodiments, when the output value obtained by inputting a second input image to the trained second model falls within a specified range (e.g., 0.9 or more and/or 0.1 or less), the processor 110 may label the second input image with that output value and store it as a second training image.
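The selection rule above can be sketched as a filter over model outputs. The text does not say how the range is applied to a vector output, so this sketch assumes the largest element is compared against the bounds; the names `select_confident` and `Sample` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

# (image identifier, second-model output vector); both are illustrative.
Sample = Tuple[str, Sequence[float]]


def select_confident(
    samples: Sequence[Sample], high: float = 0.9, low: float = 0.1
) -> List[Tuple[str, List[float]]]:
    """Keep samples whose top score is >= high or <= low, labeled with the output."""
    kept = []
    for image_id, scores in samples:
        top = max(scores)
        if top >= high or top <= low:
            kept.append((image_id, list(scores)))
    return kept


outputs = [
    ("crop_001", [0.97, 0.02, 0.01]),  # confidently a known class
    ("crop_002", [0.60, 0.30, 0.10]),  # ambiguous, not reused
    ("crop_003", [0.05, 0.03, 0.02]),  # confidently unknown
]
new_training_samples = select_confident(outputs)
```

Only the confidently classified crops are fed back, so ambiguous outputs do not become training labels.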

FIG. 3 is a diagram illustrating an example of a second training image set according to various embodiments.

According to an embodiment, the second training image set may be a set of at least one image detected as the first object by the first model. For example, the second training image set may be a set of images that the first model detected as “person”.

According to an embodiment, the second training image set may be a set of images that, having been classified as the first object by the first model, are further classified as “class A”, “class B”, “class C”, and/or “unknown”. “class A”, “class B”, and/or “class C” are sets of images related to the first object, and “unknown” may be a set of images that are not classified as “class A”, “class B”, and/or “class C” and therefore are not the first object. For example, the second training image set may be a set of images that, having been classified as “person” by the first model, are further classified as “upper body”, “full body”, “lower body”, and/or “unknown”. As another example, the second training image set may consist of sets of images classified as cones, signs, bicycles, motorcycles, road cracks, road lanes, and the like, which share common features, together with a set of images in which arbitrary regions having no common features are classified as “unknown”.

Panel (a) is an example of an image classified as “class A” of the first object. “class A” may be an image classified as “upper body” among the images detected as “person” by the first model.

Panel (b) is an example of an image classified as “class B” of the first object. “class B” may be an image classified as “full body” among the images detected as “person” by the first model.

Panel (c) is an example of an image classified as “class C” of the first object. “class C” may be an image classified as “lower body” among the images detected as “person” by the first model.

Panel (d) is an example of an image classified as “unknown” for the first object. “unknown” may be an image, among the images detected as “person” by the first model, that is not classified as “upper body”, “full body”, and/or “lower body” and therefore is not a person (e.g., a road post).

FIG. 4 is a flowchart illustrating a method by which the processor 110 performs an event using the first model and the second model, according to various embodiments.

According to various embodiments, in operation 410, the processor 110 may detect and extract a first object from a first input image using the first model.

The first input image may be an image acquired from the camera 120.

The first model may be a learning model that outputs a detection result (e.g., an object region and/or the probability of being an object) for an object included in the first input image based on feature values of the first input image. For example, the first model may be a model trained to detect objects in the first input image based on a convolutional neural network (CNN) algorithm.

According to an embodiment, the processor 110 may input the first input image to the first model to detect a first object, a second object, and/or a third object. For example, the processor 110 may use the first model to detect a “person”, a “car”, and/or a “sign” included in the first input image.

According to an embodiment, the processor 110 may crop and extract an object region based on the result of detecting an object included in the first input image using the first model. For example, the processor 110 may detect the first object included in the first input image and extract the region of the first input image corresponding to the first object.

According to various embodiments, in operation 420, the processor 110 may input a second input image to the second model.

According to an embodiment, the processor 110 may input the region corresponding to the first object, extracted using the first model, to the second model as the second input image.

According to various embodiments, in operation 430, the processor 110 may output a probability value from the second model.

According to various embodiments, in operation 440, the processor 110 may check whether the output probability value is greater than or equal to a threshold.

The second model may output a vector of values between 0 and 1 as the output value for the second input image.

The processor 110 may check whether the output value is greater than or equal to the threshold (e.g., 0.4).

According to various embodiments, in operation 450, the processor 110 may classify the second input image as the first object in response to the probability value being greater than or equal to the threshold (e.g., operation 440 - Yes).

For example, the second model may output a value close to 1 in response to the second input image, detected as the first object by the first model, being classified as “class A”, “class B”, and/or “class C”. The processor 110 may determine that the object included in the second input image is a real first object (real object) in response to the output value of the second model being greater than or equal to the specified threshold. In response to the output of the second model containing a vector element greater than or equal to a specified threshold (e.g., 0.9), the processor 110 may classify the object included in the second input image into the class corresponding to that element (e.g., “class A”, “class B”, and/or “class C”).

According to various embodiments, in operation 460, the processor 110 may classify the extracted object region as “unknown” in response to the probability value being less than the threshold (e.g., operation 440 - No).

For example, the second model may output a value close to 0 in response to the second input image, detected as the first object by the first model, not being classified as any of “class A”, “class B”, and “class C”. The processor 110 may determine that the object included in the second input image is an unknown object (unknown) in response to the output value of the second model being less than the specified threshold.

According to various embodiments, in operation 470, the processor 110 may perform an event based on the classification result.

According to an embodiment, the processor 110 may perform an event related to the first object in response to determining that the second input image is a real first object (real object).

According to an embodiment, the processor 110 may refrain from performing the event related to the first object in response to determining that the second input image is an unknown object (unknown).
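
Operations 440 to 470 can be sketched as a small decision routine. This is an illustrative sketch only: the function names, the class names, and the rule of picking the highest-scoring class are assumptions, since the text only states that a class whose output reaches the threshold may be chosen. The default threshold of 0.4 is taken from the example in the text.

```python
def classify_crop(outputs, class_names, threshold=0.4):
    """Return the best-scoring class if it reaches the threshold, else 'unknown'.

    `outputs` is the second model's vector of values in [0, 1],
    one value per class in `class_names`.
    """
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[best_index] >= threshold:
        return class_names[best_index]
    return "unknown"


def should_trigger_event(label):
    """The event tied to the first object runs only for real objects."""
    return label != "unknown"


CLASS_NAMES = ["upper body", "full body", "lower body"]
real = classify_crop([0.05, 0.93, 0.10], CLASS_NAMES)   # one value above threshold
fake = classify_crop([0.08, 0.12, 0.03], CLASS_NAMES)   # all values below threshold
```

With these hypothetical outputs, the first crop would be treated as a real “person” (full body) and trigger the event, while the second would be discarded as “unknown”.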

FIG. 5 is a diagram illustrating the structure of the first model and the second model according to various embodiments.

According to an embodiment, the processor 110 may input the first input image 10 to the first model 510.

According to an embodiment, the first model 510 may output a detection result for an object included in the first input image 10 based on feature values of the first input image 10.

According to an embodiment, the processor 110 may input the first input image 10 to the first model 510 to detect a first object, a second object, and/or a third object. For example, the processor 110 may use the first model 510 to detect a “person”, a “car”, and/or a “sign” included in the first input image 10.

According to an embodiment, the processor 110 may crop and extract an object region based on the result of detecting an object included in the first input image 10 using the first model 510. For example, the processor 110 may detect the first object included in the first input image 10 and extract the region 21 of the first input image corresponding to the first object. Likewise, the processor 110 may detect the second object included in the first input image 10 and extract the region 22 of the first input image corresponding to the second object, and may detect the third object included in the first input image 10 and extract the region 23 of the first input image corresponding to the third object.

According to an embodiment, the processor 110 may input the region corresponding to an object extracted using the first model to the second model as the second input image. For example, the processor 110 may input the image region 21 detected as the first object to the second model 521 of the first object, the image region 22 detected as the second object to the second model 522 of the second object, and the image region 23 detected as the third object to the second model 523 of the third object.

According to various embodiments, the processor 110 may output probability values from the second models 521, 522, and 523.

The second models 521, 522, and 523 may output values related to the authenticity of the objects included in the input images 21, 22, and 23 based on feature values of the input images 21, 22, and 23.

According to an embodiment, the second models 521, 522, and 523 may use a cross-entropy-based loss function to output a classification result regarding the authenticity of the objects included in the input images 21, 22, and 23. For example, the second models 521, 522, and 523 may use a sigmoid function.
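
As a rough illustration of how a sigmoid output can be paired with a cross-entropy-based loss, the sketch below computes a per-element binary cross-entropy over a small target vector. The patent does not give the exact loss formula, so this particular form, the function names, and the sample logits are assumptions.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(targets, logits, eps=1e-12):
    """Mean binary cross-entropy between targets and sigmoid(logits).

    Each output element is treated as an independent yes/no decision,
    which is what allows an all-zero target for 'unknown' images.
    """
    total = 0.0
    for t, z in zip(targets, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


# Target: image belongs to the first class only.
loss_good = binary_cross_entropy([1.0, 0.0, 0.0], [4.0, -4.0, -4.0])
loss_bad = binary_cross_entropy([1.0, 0.0, 0.0], [-4.0, 4.0, 4.0])
```

Predictions that agree with the target yield a small loss and contradictory ones a large loss, which is the signal used to train the second model in this sketch.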

According to various embodiments, the processor 110 may check whether the probability value is greater than or equal to a threshold.

The second models 521, 522, and 523 may output vectors of values between 0 and 1 as the output values for the input images 21, 22, and 23.

According to various embodiments, the processor 110 may determine whether a detected object is a real object or an unknown object (unknown) depending on whether the probability value is greater than or equal to the threshold.

For example, in response to the output value for the image 21 detected as the first object being greater than or equal to the threshold, the second model 521 of the first object may determine that the image 21 is the first object 31. In response to the output value for the image 21 being less than the threshold, the second model 521 of the first object may determine that the image 21 is an unknown object 32.

For example, in response to the output value for the image 22 detected as the second object being greater than or equal to the threshold, the second model 522 of the second object may determine that the image 22 is the second object 33. In response to the output value for the image 22 being less than the threshold, the second model 522 of the second object may determine that the image 22 is an unknown object 34.

For example, in response to the output value for the image 23 detected as the third object being greater than or equal to the threshold, the second model 523 of the third object may determine that the image 23 is the third object 35. In response to the output value for the image 23 being less than the threshold, the second model 523 of the third object may determine that the image 23 is an unknown object 36.

According to various embodiments, the processor 110 may perform an event based on the classification result.

According to an embodiment, the processor 110 may perform an event related to the object when the result is classified as a real object 31, 33, or 35.

According to an embodiment, the processor 110 may refrain from performing the event related to the object when the result is classified as an unknown object 32, 34, or 36.

FIG. 6A is a diagram visualizing and comparing the outputs of the activation function in second models trained in different ways, according to various embodiments.

Panel (a) is the input image. Panel (b) visualizes the ReLU activation function when the second model is trained using the sigmoid function with unknown objects (unknown) labeled for the class of the first object. Panel (c) visualizes the ReLU activation function when the second model is trained using the softmax function with only the first object labeled.

The ReLU function is given by Equation 1.
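
Equation 1 itself is not reproduced in this text. For reference, the ReLU function is conventionally defined as below; the symbol x for the input value is an assumption about the notation used in the original figure.

```latex
\mathrm{ReLU}(x) = \max(0, x)
```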

In the activation map, the bright areas are where the ReLU function value is activated, that is, the regions in which the network is activated when extracting person features for object classification (the regions of interest).

Referring to panel (b), the second model trained with unknown objects labeled as the zero vector is not activated for the “person” class (dark areas) in the regions of the input image (a) that could be mistaken for a “person” (e.g., the bright parts of the input image).

Referring to panel (c), the second model trained with only the first object labeled is highly activated (bright areas) in the regions of the input image (a) that could be mistaken for a “person” (e.g., the bright parts of the input image).

Methods (b) and (c) both use the same neural layer structure. When the image (a) was classified, method (b) classified it as the Unknown class because all output vector values were below a specific threshold (e.g., 0.4), whereas method (c) predicted the “person” class with high probability.
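
One way to see why the two output functions behave differently on an image that matches no class is to apply both to the same weak scores. The sketch below is illustrative only: the three-class setup and the logit values are hypothetical, and only the 0.4 threshold comes from the example in the text.

```python
import math


def softmax(logits):
    """Normalize scores so that they sum to 1 across classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_each(logits):
    """Score each class independently in (0, 1); no normalization."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Weak evidence for every class, as for a crop that is not a person at all.
logits = [-2.0, -3.0, -4.0]
soft = softmax(logits)
sig = sigmoid_each(logits)

THRESHOLD = 0.4
softmax_says_known = max(soft) >= THRESHOLD
sigmoid_says_known = max(sig) >= THRESHOLD
```

Because softmax outputs must sum to 1, at least one class receives a relatively high score even when all evidence is weak, whereas independent sigmoid outputs can all stay below the threshold and so leave room for an “unknown” decision.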

FIG. 6B is a diagram visualizing the output values of the activation function in the second model depending on whether an Attention module is applied, according to various embodiments.

According to various embodiments, the second model may include an Attention module in its layer structure.

For the first to N-th channels of a feature map that has passed through a convolution operation with N filters, the Attention module may relatively lower the feature values of the channels that are of low importance to the final classification task, thereby improving the reliability and performance of the classification result.

According to various embodiments, the second model may learn which channels are advantageous for feature extraction by adding an Attention module after the activation function of the convolutional layer structure.

According to an embodiment, the Attention module may use an SE (Squeeze-and-Excitation) Block. The Squeeze step takes the summarized feature information, which has passed through the convolutional layer and the activation function and has many channels, and flattens it into a single dimension whose length equals the number of channels. For example, feature information of shape (WIDTH, HEIGHT, CHANNEL) = (8, 8, 128) is summarized into local information of shape (1, 1, 128). The local information is compressed by Global Average Pooling, which may be expressed as in Equation 2.
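
Equation 2 itself is not reproduced in this text. Global Average Pooling over one channel is commonly written as below; the symbols are assumptions about the original notation, with u_c denoting the c-th channel of the feature map, H and W its height and width, and z_c the resulting per-channel summary.

```latex
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)
```

For the (8, 8, 128) example, this averages the 64 values of each of the 128 channels, giving a vector of length 128.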

Once the summarized local features are obtained, the Excitation step passes the feature vector through a Fully Connected Layer and ReLU to find the relationships between channels, that is, the per-channel dependencies.

The resulting relationship vector must then pass through a final activation function so that the relationships can be interpreted. When weighting the channels that need attention, the relationships must be extracted flexibly so that the weights of several channels can be emphasized at once. The sigmoid function is used for this purpose, which allows more complex relationships between channels to be captured. Finally, to match the shape of the SE Block input and to reflect the extracted relationships, the inter-channel relationship vector that has passed through the sigmoid function is applied to the Squeeze input by a matrix multiplication operation.
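
The Squeeze, Excitation, and rescaling steps can be sketched in plain Python as follows. This is a minimal sketch under several assumptions: the feature map is stored as nested lists indexed [channel][row][column], the Excitation step uses two small fully connected layers (ReLU after the first, sigmoid after the second), and the final step multiplies each channel by its own weight, as SE blocks are commonly described, rather than performing a general matrix product. All function names and weight values are hypothetical.

```python
import math


def squeeze(feature_map):
    """Global Average Pooling: one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vector, weights, biases):
    """Fully connected layer; weights[j][i] links input i to output j."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def excite(z, w1, b1, w2, b2):
    """FC -> ReLU -> FC -> sigmoid, giving one weight in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_block(feature_map, w1, b1, w2, b2):
    """Rescale every channel of the input by its learned channel weight."""
    scales = excite(squeeze(feature_map), w1, b1, w2, b2)
    return [
        [[scale * x for x in row] for row in channel]
        for channel, scale in zip(feature_map, scales)
    ]


# Toy input: 2 channels of size 2x2 (hypothetical values and weights).
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[-1.0, 1.0]], [0.0]             # 2 channels -> 1 hidden unit
w2, b2 = [[2.0], [-2.0]], [0.0, 0.0]      # 1 hidden unit -> 2 channel weights
out = se_block(fmap, w1, b1, w2, b2)
```

The output keeps the shape of the input while the second channel is scaled down more strongly than the first, which mirrors the idea of lowering the feature values of less important channels.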

FIG. 6B compares, for the electronic device, the activation maps of a model without an SE Block after the ReLU activation function and of a second model with the SE Block added.

FIG. 6B may show the activation maps of the ReLU function for the “person” class, for a second model trained on “person” and “unknown” data.

Panel (a) is the input image. Panel (b) visualizes the activation function of the model without an SE Block after the ReLU activation function, and panel (c) visualizes the activation function of the model with an SE Block added after the ReLU activation function.

In the activation map, the more strongly a layer is activated, the brighter the area may be.

Referring to panel (b) (Attention not applied), the layer is activated at localized arm and leg parts of panel (a). Without Attention, when the feature vector holds information about the “hands”, “legs”, “head”, and so on of a “person” in separate channels, the model may recognize a “person” by trusting the most strongly activated channel without considering the relationships among them. If a single characteristic part of a class contributes most of the decision in this way, false detections may occur.

In contrast, referring to panel (c) (Attention applied), the layer is activated over the overall shape of the human body in panel (a). With Attention, the model captures the complex relationships among the characteristic parts of a “person”, such as the “hands”, “legs”, and “head”, so that all of these features contribute to the decision that the object is a “person”. Accordingly, when Attention is applied, the overall shape of the person is activated and the convolutional neural network can be trained stably.

FIGS. 7A, 7B, and 7C are images related to the results of detecting and classifying objects by the electronic device of the present invention according to various embodiments.

The photographs in FIGS. 7A, 7B, and 7C are test scenes on a road at distances of up to 300 m.

Referring to FIG. 7A, a specific object at a point starting at about 200 m was detected as “person” by the first model (Detection Model), but the second model (Classification Vector Output Model) classified the object as an unknown object (unknown). The area marked with an orange box is an object detected as “person” by the first model but classified as “unknown” by the second model.

Referring to FIG. 7B, the first model responded quite sensitively to light and may misrecognize light cast on terrain features as a person. In contrast, the second model classified the object as an unknown object (unknown). The area marked with an orange box is an object detected as “person” by the first model but classified as “unknown” by the second model.

Referring to FIG. 7C, the first model may mistake markings painted on the road for a person. Because such markings appear to overlap as the distance increases, they may be recognized as a person. In contrast, the second model classified the object as an unknown object (unknown). The area marked with an orange box is an object detected as “person” by the first model but classified as “unknown” by the second model.

An electronic device according to various embodiments includes: a memory storing a first training image set, a second training image set, a first model that outputs a detection result for an object included in a first input image based on feature values of the first input image, and a second model that outputs a value related to the authenticity of an object included in a second input image based on feature values of the second input image; and a processor. The processor may train the first model using the first training image set and train the second model using the second training image set. The first training image set may be a set of at least one image to which a class is assigned, and the second training image set may be a set of images, detected as a first object by the first model, to which a different label value is assigned for each class.

In the electronic device according to various embodiments, the second training image set may be a set of images that, having been detected as the first object by the first model, are classified as “class A”, “class B”, “class C”, and/or “unknown”. The “class A”, the “class B”, and/or the “class C” are sets of images related to the first object, and the “unknown” may be a set of images that are not classified as the “class A”, the “class B”, and/or the “class C” and therefore are not the first object.

In the electronic device according to various embodiments, the second training image set may be a set of images in which the “class A” images are labeled [1-ε, 0+ε, 0+ε], the “class B” images are labeled [0+ε, 1-ε, 0+ε], the “class C” images are labeled [0+ε, 0+ε, 1-ε], and the “unknown” images are labeled [0+ε, 0+ε, 0+ε], where ε may be a specified value.

In the electronic device according to various embodiments, the second model may be a learning model that uses a sigmoid function.

In the electronic device according to various embodiments, the second model may further include an attention module after the activation function. For the first to N-th channels of a feature map that has passed through a convolution operation with filters, the attention module may relatively lower the feature values of the channels that are of low importance to the final classification task, thereby improving the reliability and performance of the classification result.

In the electronic device according to various embodiments, the second model may apply a label smoothing technique to the label values of the training data. The label smoothing technique subtracts a specific constant ε from the correct-answer value (1) and adds it to the incorrect-answer value (0), which may mitigate overconfidence and overfitting in the trained model.
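
The smoothed label vectors described above can be generated with a few lines of code. In this sketch the function name `smooth_label`, the convention of passing `None` for “unknown”, and the value ε = 0.1 are assumptions; the text only states that ε is a specified value.

```python
def smooth_label(class_index, num_classes, epsilon=0.1):
    """Build a smoothed target vector.

    The position of the correct class gets 1 - epsilon, every other
    position gets 0 + epsilon. Passing class_index=None stands for
    'unknown', whose target is all zeros before smoothing.
    """
    return [
        (1.0 - epsilon) if i == class_index else (0.0 + epsilon)
        for i in range(num_classes)
    ]


label_a = smooth_label(0, 3)           # "class A"  -> [1-e, 0+e, 0+e]
label_b = smooth_label(1, 3)           # "class B"  -> [0+e, 1-e, 0+e]
label_unknown = smooth_label(None, 3)  # "unknown"  -> [0+e, 0+e, 0+e]
```

Because no target is exactly 0 or 1, a sigmoid output is never pushed toward an extreme value during training, which is the mechanism by which overconfidence would be reduced in this sketch.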

The electronic device according to various embodiments may further include a camera. The processor may input an image acquired from the camera to the trained first model to detect and extract a first object, and may input the detected and extracted first object to the trained second model to obtain an output value. In response to the output value being greater than or equal to a threshold, the processor may determine that the detected and extracted first object is a real first object; in response to the output value being less than the threshold, the processor may determine that the detected and extracted first object is an unknown object (unknown).

In the electronic device according to various embodiments, the processor may perform an event related to the first object in response to determining that the detected and extracted first object is a real first object, and may refrain from performing the event related to the first object in response to determining that the detected and extracted first object is an unknown object.
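
The threshold comparison and the event gating can be sketched as follows. The source does not say how the second model's output is reduced to a single value, so this sketch simply assumes a scalar `output_value`. The names `judge`, `handle_detection`, and the `event` callback are hypothetical.

```python
def judge(output_value, threshold):
    """Return "real" when the output reaches the threshold, else "unknown"."""
    return "real" if output_value >= threshold else "unknown"


def handle_detection(output_value, threshold, event):
    """Run the event callback only for detections judged to be real."""
    verdict = judge(output_value, threshold)
    if verdict == "real":
        event()  # e.g. raise an alarm or start recording (assumed example)
    return verdict


fired = []
first = handle_detection(0.8, 0.5, lambda: fired.append("alarm"))
second = handle_detection(0.2, 0.5, lambda: fired.append("alarm"))
```

Only the first call triggers the callback. The second detection is treated as unknown and produces no event.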

In the electronic device according to various embodiments, in response to the output value obtained by inputting the detected and extracted first object to the trained second model falling within a designated range, the processor may label the detected and extracted first object with the output value and add it to the second training image set stored in the memory.
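
A minimal sketch of this data collection step is given below. The bounds of the designated range are not stated in the source, so `low` and `high` are hypothetical parameters, the range is assumed to be inclusive, and the training set is modeled as a plain list of (image, label) pairs.

```python
def maybe_add_to_training_set(training_set, region, output_value, low, high):
    """Append (region, output_value) when the output lies in [low, high].

    Returns True if the sample was added, False otherwise.
    """
    if low <= output_value <= high:
        training_set.append((region, output_value))
        return True
    return False


second_training_set = []
added = maybe_add_to_training_set(second_training_set, "crop_001", 0.95, 0.9, 1.0)
skipped = maybe_add_to_training_set(second_training_set, "crop_002", 0.55, 0.9, 1.0)
```

Here `"crop_001"` and `"crop_002"` stand in for extracted object regions. Only the first falls inside the assumed range and is stored with its output value as the label.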

A method of operating an electronic device according to various embodiments may include training a first model using a first training image set and training a second model using a second training image set. The first training image set is a set of one or more images to which classes are assigned. The second training image set is a set of images in which, for images detected as a first object by the first model, a different label value is assigned to each class. The first model may output a detection result for an object included in a first input image based on feature values of the first input image, and the second model may output a value related to the authenticity of an object included in a second input image based on feature values of the second input image.

In the method of operating an electronic device according to various embodiments, the second training image set may be a set of images that, among the images detected as the first object by the first model, are classified as “class A”, “class B”, “class C”, and/or “unknown”. “Class A”, “class B”, and/or “class C” are sets of images related to the first object, and “unknown” may be a set of images that are not classified as “class A”, “class B”, and/or “class C” and therefore are not the first object.

In the method of operating an electronic device according to various embodiments, the second training image set may be a set of images in which a “class A” image is labeled [1-ε, 0+ε, 0+ε], a “class B” image is labeled [0+ε, 1-ε, 0+ε], a “class C” image is labeled [0+ε, 0+ε, 1-ε], and an “unknown” image is labeled [0+ε, 0+ε, 0+ε].
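
The four label vectors can be generated with a short helper. The value of ε is not fixed in this passage, so 0.1 is used as a hypothetical example, and `make_label` is an illustrative name rather than anything defined in the source.

```python
CLASSES = ["class A", "class B", "class C"]


def make_label(name, eps):
    """Build the smoothed three-element label vector for a class name.

    A known class gets 1 - eps in its own slot and 0 + eps elsewhere.
    "unknown" matches no slot, so every element is 0 + eps.
    """
    if name not in CLASSES and name != "unknown":
        raise ValueError(f"unexpected class name: {name}")
    return [1.0 - eps if name == c else 0.0 + eps for c in CLASSES]


eps = 0.1  # hypothetical smoothing constant
label_a = make_label("class A", eps)
label_unknown = make_label("unknown", eps)
```

Note that the “unknown” vector has no element near 1, so it does not behave like a fourth one-hot class.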

In the method of operating an electronic device according to various embodiments, the second model may be a learning model that uses a sigmoid function.
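
This passage gives no rationale for the sigmoid, but one plausible reading is that independent sigmoid outputs can all be low at once, which matches a target in which every element is near zero. A softmax output, by contrast, always sums to 1. The sketch below compares the two on hypothetical logits.

```python
import math


def sigmoid_outputs(logits):
    """Apply the logistic function to each logit independently."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_outputs(logits):
    """Normalize logits into a distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-3.0, -3.0, -3.0]  # hypothetical: the model sees no known class
sig = sigmoid_outputs(logits)
soft = softmax_outputs(logits)
```

With these logits every sigmoid output is small, while softmax still assigns one third to each class, so a threshold on the sigmoid outputs can reject the input as unknown.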

In the method of operating an electronic device according to various embodiments, the second model may further include an attention module after the activation function. For the first through N-th channels of a feature map that has passed through a convolution operation with filters, the attention module relatively lowers the feature values of channels that are of low importance to the final classification task, which may improve the reliability and performance of the classification result.

In the method of operating an electronic device according to various embodiments, the second model may apply a label smoothing technique to the label values of the training data. The label smoothing technique subtracts a specific constant ε from the correct-answer value (1) and adds it to the incorrect-answer value (0), which may mitigate overconfidence and overfitting in the trained model.

The method of operating an electronic device according to various embodiments may include: inputting an image obtained from a camera to the trained first model to detect and extract a first object; inputting the detected and extracted first object to the trained second model to obtain an output value; in response to the output value being greater than or equal to a threshold, determining that the detected and extracted first object is a real first object; and in response to the output value being less than the threshold, determining that the detected and extracted first object is an unknown object.
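
The sequence of operations above can be sketched as a two-stage flow. Both models are represented by plain callables, since the source does not fix their interfaces: `detector` is assumed to return bounding boxes as (top, left, bottom, right) tuples, and `verifier` is assumed to return a single score for a cropped region. All names are hypothetical.

```python
def crop(image, box):
    """Return the sub-image inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def detect_and_verify(image, detector, verifier, threshold):
    """Detect candidate objects, crop each one, and judge it real or unknown."""
    results = []
    for box in detector(image):   # first model: detection
        region = crop(image, box)  # extract the object region
        score = verifier(region)   # second model: authenticity score
        verdict = "real" if score >= threshold else "unknown"
        results.append((box, verdict))
    return results


# Stand-in models for illustration only.
def stub_detector(image):
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def stub_verifier(region):
    pixels = [v for row in region for v in row]
    return sum(pixels) / len(pixels)


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
results = detect_and_verify(image, stub_detector, stub_verifier, threshold=0.5)
```

The stub verifier scores a region by its mean pixel value, purely so the example runs. A real second model would be the trained classifier described in the text.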

The method of operating an electronic device according to various embodiments may include: performing an event related to the first object in response to determining that the detected and extracted first object is a real first object; and refraining from performing the event related to the first object in response to determining that the detected and extracted first object is an unknown object.

The method of operating an electronic device according to various embodiments may include, in response to the output value obtained by inputting a second input image to the trained second model falling within a designated range, labeling the second input image with the output value and adding it to the second training image set.

Electronic devices according to the various embodiments disclosed in this document may take various forms. An electronic device may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. An electronic device according to an embodiment of this document is not limited to the devices listed above.

The various embodiments of this document and the terms used in them are not intended to limit the technical features described in this document to specific embodiments, and should be understood to include various modifications, equivalents, or substitutes of those embodiments. In the description of the drawings, similar reference numerals may be used for similar or related components. The singular form of a noun corresponding to an item may include one or more of the items, unless the relevant context clearly indicates otherwise. In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of the items listed together in that phrase, or all possible combinations of them. Terms such as "first" and "second" may be used simply to distinguish one component from another, and do not limit the components in other respects (e.g., importance or order). When one (e.g., a first) component is referred to as being "coupled" or "connected" to another (e.g., a second) component, with or without the terms "functionally" or "communicatively", it means that the one component may be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.

The term "module" as used in the various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed component, or a minimum unit of such a component, or a part of it, that performs one or more functions. For example, according to an embodiment, a module may be implemented in the form of an application-specific integrated circuit (ASIC).

The various embodiments of this document may be implemented as software including one or more instructions stored in a storage medium readable by a machine. For example, a processor of the machine may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" only means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between data being stored semi-permanently in the storage medium and data being stored temporarily.

According to an embodiment, the method according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded as a commodity between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be at least temporarily stored in, or temporarily created in, a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

According to various embodiments, each of the components described above (e.g., a module or a program) may include a single entity or multiple entities, and some of the multiple entities may be placed separately in other components. According to various embodiments, one or more of the components or operations described above may be omitted, or one or more other components or operations may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In that case, the integrated component may perform one or more functions of each of the plurality of components in the same or a similar manner as they were performed by the corresponding component before the integration. According to various embodiments, operations performed by a module, program, or other component may be executed sequentially, in parallel, repeatedly, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

Claims

An electronic device comprising:
a memory storing a first training image set, a second training image set, a first model that outputs a detection result for an object included in a first input image based on feature values of the first input image, and a second model that outputs a value related to the authenticity of an object included in a second input image based on feature values of the second input image; and
a processor,
wherein the processor:
inputs the first input image to the first model trained using the first training image set,
detects a first object included in the first input image based on the first model,
extracts a first object region corresponding to the detected first object,
sets the extracted first object region as the second input image,
inputs the second input image to the second model trained using the second training image set, and
determines, based on the second model, whether the first object included in the second input image is a real object,
wherein the first training image set is a set of one or more images to which classes are assigned, and
wherein the second training image set is a set of images in which, for the first object region corresponding to the first object detected based on the first model, a different label value is assigned to each class.

The electronic device of claim 1,
wherein the second training image set is a set of images that, among the images detected as the first object by the first model, are classified as “class A”, “class B”, “class C”, and “unknown”,
wherein “class A”, “class B”, and “class C” are sets of images related to the first object, and
wherein “unknown” is a set of images that are not classified as “class A”, “class B”, or “class C” and therefore are not the first object.

The electronic device of claim 2,
wherein the second training image set is a set of images in which
a “class A” image is labeled [1-ε, 0+ε, 0+ε],
a “class B” image is labeled [0+ε, 1-ε, 0+ε],
a “class C” image is labeled [0+ε, 0+ε, 1-ε], and
an “unknown” image is labeled [0+ε, 0+ε, 0+ε], and
wherein ε is a designated value.

The electronic device of claim 3,
wherein the second model is a learning model that uses a sigmoid function.

The electronic device of claim 1,
wherein the second model further includes an attention module after the activation function, and
wherein, for the first through N-th channels of a feature map that has passed through a convolution operation with filters, the attention module relatively lowers the feature values of channels that are of low importance to the final classification task, thereby improving the reliability and performance of the classification result.

The electronic device of claim 1,
wherein the second model applies a label smoothing technique to the label values of the training data, and
wherein the label smoothing technique subtracts a specific constant ε from the correct-answer value (1) and adds it to the incorrect-answer value (0), thereby mitigating overconfidence and overfitting in the trained model.

The electronic device of claim 1,
further comprising a camera,
wherein the processor:
inputs an image obtained from the camera to the trained first model to detect and extract the first object,
inputs the detected and extracted first object to the trained second model to obtain an output value,
in response to the output value being greater than or equal to a threshold, determines that the detected and extracted first object is a real first object, and
in response to the output value being less than the threshold, determines that the detected and extracted first object is an unknown object.

The electronic device of claim 7,
wherein the processor:
in response to determining that the detected and extracted first object is the real first object, performs an event related to the first object, and
in response to determining that the detected and extracted first object is the unknown object, does not perform the event related to the first object.

The electronic device of claim 7,
wherein the processor,
in response to the output value obtained by inputting the detected and extracted first object to the trained second model falling within a designated range, labels the detected and extracted first object with the output value and adds it to the second training image set stored in the memory.

A method of operating an electronic device, the method comprising:
inputting a first input image to a first model trained using a first training image set;
detecting a first object included in the first input image based on the first model;
extracting a first object region corresponding to the detected first object;
setting the extracted first object region as a second input image;
inputting the second input image to a second model trained using a second training image set; and
determining, based on the second model, whether the first object included in the second input image is a real object,
wherein the first training image set is a set of one or more images to which classes are assigned,
wherein the second training image set is a set of images in which, for the first object region corresponding to the first object detected based on the first model, a different label value is assigned to each class,
wherein the first model outputs a detection result for the first object included in the first input image based on feature values of the first input image, and
wherein the second model outputs a value related to whether the first object included in the second input image is a real object based on feature values of the second input image.

The method of claim 10,
wherein the second training image set is a set of images that, among the images detected as the first object by the first model, are classified as “class A”, “class B”, “class C”, and “unknown”,
wherein “class A”, “class B”, and “class C” are sets of images related to the first object, and
wherein “unknown” is a set of images that are not classified as “class A”, “class B”, or “class C” and therefore are not the first object.

The method of claim 11,
wherein the second training image set is a set of images in which
a “class A” image is labeled [1-ε, 0+ε, 0+ε],
a “class B” image is labeled [0+ε, 1-ε, 0+ε],
a “class C” image is labeled [0+ε, 0+ε, 1-ε], and
an “unknown” image is labeled [0+ε, 0+ε, 0+ε].

The method of claim 12,
wherein the second model is a learning model that uses a sigmoid function.

The method of claim 10,
wherein the second model further includes an attention module after the activation function, and
wherein, for the first through N-th channels of a feature map that has passed through a convolution operation with filters, the attention module relatively lowers the feature values of channels that are of low importance to the final classification task, thereby improving the reliability and performance of the classification result.

The method of claim 10,
wherein the second model applies a label smoothing technique to the label values of the training data, and
wherein the label smoothing technique subtracts a specific constant ε from the correct-answer value (1) and adds it to the incorrect-answer value (0), thereby mitigating overconfidence and overfitting in the trained model.

The method of claim 10, further comprising:
inputting an image obtained from a camera to the trained first model to detect and extract the first object;
inputting the detected and extracted first object to the trained second model to obtain an output value;
in response to the output value being greater than or equal to a threshold, determining that the detected and extracted first object is a real first object; and
in response to the output value being less than the threshold, determining that the detected and extracted first object is an unknown object.

The method of claim 16, further comprising:
in response to determining that the detected and extracted first object is the real first object, performing an event related to the first object; and
in response to determining that the detected and extracted first object is the unknown object, not performing the event related to the first object.

The method of claim 16, further comprising:
in response to the output value obtained by inputting the second input image to the trained second model falling within a designated range, labeling the second input image with the output value and adding it to the second training image set.