WO2020045903A1

WO2020045903A1 - Method and device for detecting object size-independently by using cnn

Info

Publication number: WO2020045903A1
Application number: PCT/KR2019/010766
Authority: WO
Inventors: 김대진; 김용현
Original assignee: 포항공과대학교 산학협력단
Priority date: 2018-08-28
Filing date: 2019-08-23
Publication date: 2020-03-05
Also published as: KR102213600B1; KR102213600B9; KR20200027078A

Abstract

Disclosed are a method and device for detecting an object size-independently by using a convolutional neural network (CNN). The method for detecting an object size-independently includes: a step for acquiring an input image including an object to be detected; a step for inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed size feature extracted from an image in which the size of the object has been normalized, and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object; and a step for inputting the extracted size-independent feature to an object detection neural network, and detecting the object.

Description

Method and apparatus for size-independent object detection using a CNN

The present invention relates to a method and apparatus for detecting an object independently of its size using a CNN. More particularly, it relates to a method and apparatus that use a CNN to learn the correlation between features of an object normalized to a fixed size and features of objects having various sizes, and that use the trained CNN to detect an object regardless of its size.

Object detection is a core technology widely used in applications such as robotics, video surveillance, and automotive safety. Recently, as convolutional neural networks (CNNs) have been applied to object detection, object detection from a single image has advanced dramatically.

Object detection methods using a CNN can be classified into those based on region extraction (ROI pooling) and those based on grid cells.

An object detection method based on region extraction uses a CNN to compute convolutional features for an entire single image, and then computes convolutional features by region extraction for the object candidate regions identified from those features. This method divides each object candidate region according to a predefined extraction-region size and assigns to each divided region its maximum or average value. Because a predefined extraction-region size is used, convolutional features of the same size are extracted regardless of the size of the object in the image. Consequently, since objects that appear at various sizes in different images are all reduced to convolutional features of the same region size, the resolution of the convolutional features varies with the object size in the image, and features are duplicated or omitted.
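The fixed-output behaviour described above can be illustrated with a minimal sketch in plain Python. The function name `roi_max_pool`, the `(top, left, bottom, right)` box convention, and the 2x2 output grid are assumptions made for illustration; they do not come from the specification.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool a rectangular region of a 2D feature map into a fixed grid.

    roi is (top, left, bottom, right) with bottom/right exclusive
    (a hypothetical convention chosen for this sketch).
    """
    top, left, bottom, right = roi
    roi_h, roi_w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one cell.
        r0 = top + (i * roi_h) // out_h
        r1 = max(top + ((i + 1) * roi_h) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = left + (j * roi_w) // out_w
            c1 = max(left + ((j + 1) * roi_w) // out_w, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

With this sketch, a 4x4 region and a 1x1 region both produce a 2x2 output. In the 1x1 case the single value is copied into every bin, which is the kind of feature duplication the paragraph refers to; for large regions, many values collapse into one bin and detail is omitted.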

An object detection method based on grid cells uses a CNN to compute convolutional features for an entire single image and associates each grid cell of the resulting convolutional features with an object. A grid cell represents the objects located at its center regardless of their size. Because this method represents an object by a single grid-cell value without spatial information, it cannot learn the size information of the object.

In summary, current CNN-based techniques for detecting objects in a single image rely on region extraction or grid cells without considering that object sizes differ from image to image. As a result, different features are extracted even for the same object, and detection accuracy is low.

An object of the present invention, devised to solve the above problems, is to provide a method for detecting an object independently of its size using a convolutional neural network (CNN).

Another object of the present invention, devised to solve the above problems, is to provide an apparatus for detecting an object independently of its size using a CNN.

To achieve the above object, one aspect of the present invention provides a method for detecting an object independently of its size using a CNN.

The method for detecting an object independently of its size using a CNN may include: acquiring an input image containing an object to be detected; inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the size of the object has been normalized and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object; and inputting the extracted size-independent feature to an object detection neural network to detect the object.

Before or after the step of acquiring the input image, the method may further include generating a fixed-size feature database (DB) by collecting the fixed-size features.

Generating the fixed-size feature DB may include: collecting size information of the object; determining a reference size using the collected size information; normalizing the size of the object contained in the input image using the determined reference size; and inputting the normalized input image to a CNN to extract a fixed-size feature.

In determining the reference size, the median of the values contained in the size information of the object may be selected as the reference size.

The size recognition-based CNN may classify the size-dependent features according to the reference size, and the partial neural network corresponding to the classified size may be individually activated to extract the size-independent feature.

The size recognition-based CNN may include three partial neural networks that are individually activated depending on whether the size of the object from which the size-dependent feature was computed is larger than, smaller than, or similar to the reference size.

The correlation between the fixed-size feature and the size-dependent feature may be determined by applying the smooth L1 loss function ($\mathrm{smooth}_{L1}$) to the difference between the value obtained by summing the feature values of the size-dependent feature over the spatial axes ($r_c$) and the value obtained by summing the feature values of the fixed-size feature over the spatial axes (written here as $\hat{r}_c$).

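The correlation ($L_{san}$) between the fixed-size feature and the size-dependent feature may be defined, over the channels ($C$) determined by the representation format of the input image, by the equation

$$L_{san} = \sum_{c=1}^{C} \mathrm{smooth}_{L1}\left(r_c - \hat{r}_c\right),$$

where $r_c$ and $\hat{r}_c$ are the values obtained by summing, over the spatial axes, the feature values of the size-dependent feature and of the fixed-size feature, respectively, for channel $c$.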

The size recognition-based CNN may include a first size recognition-based CNN that learns the correlation to set the internal parameters of the neural network, and a second size recognition-based CNN that shares the internal parameters of the first size recognition-based CNN to extract the size-independent feature.

In the step of detecting the object, the result of combining the size-independent feature with the size-dependent feature may be input to the object detection neural network, so that the object is detected taking into account the in-image position information of the object carried by the size-dependent feature.

To achieve the above object, another aspect of the present invention provides an apparatus for detecting an object independently of its size using a CNN.

The apparatus for detecting an object independently of its size using a CNN may include at least one processor and a memory storing instructions that direct the at least one processor to perform at least one step.

The at least one step may include: acquiring an input image containing an object to be detected; inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed-size feature extracted from an image in which the size of the object has been normalized and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object; and inputting the extracted size-independent feature to an object detection neural network to detect the object.


With the method and apparatus for detecting an object independently of its size using a CNN according to the present invention, an object can be detected using the intrinsic features of the object itself, regardless of the size at which it appears in the image, so detection performance can be greatly improved.

In addition, when an object is detected by combining the conventional convolutional feature (that is, the size-dependent feature) with the size-independent feature, both the in-image position information of the object carried by the conventional convolutional feature and the size-independent feature can be used for detection.

FIG. 1 is a conceptual diagram of a method and apparatus for detecting an object independently of its size using a CNN according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram of collecting fixed-size features of objects in images and building them into a database according to an embodiment of the present invention.

FIG. 3 is a conceptual diagram of a size recognition-based CNN according to an embodiment of the present invention.

FIG. 4 is a conceptual diagram illustrating a process of training a size recognition-based CNN according to an embodiment of the present invention.

FIG. 5 is a conceptual diagram in which the size recognition-based CNN is configured as a dual network to block back-propagation of the size-aware loss according to an embodiment of the present invention.

FIG. 6 is a flowchart of a method for detecting an object independently of its size using a CNN according to an embodiment of the present invention.

FIG. 7 is a flowchart of a method of building a database by collecting fixed-size features according to an embodiment of the present invention.

FIG. 8 is a block diagram of an apparatus for detecting an object independently of its size using a CNN according to an embodiment of the present invention.

FIGS. 9A to 9D are graphs showing the object detection performance of the method and apparatus for detecting an object independently of its size using a CNN according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to cover all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be termed a second component, and similarly a second component may be termed a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of the plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may be present. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

An embodiment of the present invention proposes a method of extracting the intrinsic feature of an object's shape itself (hereinafter referred to as a size-independent feature), regardless of the size at which the object appears in an image, and detecting the object using the extracted feature. To this end, the present invention trains a size recognition-based CNN capable of extracting size-independent features and detects objects using the trained size recognition-based CNN.

Specifically, referring to FIG. 1, an image containing an object to be detected may first be input to a CNN 10 to extract a size-dependent feature of the object. Because the size-dependent feature is extracted from objects that have various sizes depending on the image, it may differ according to object size even when the object shown in the images is the same.

The CNN 10 may include a convolutional layer that extracts features from the input image. The convolutional layer may include at least one of a filter that extracts features of the input image, an activation function that converts the filter output into a nonlinear value, and a pooling layer. A filter is a function that detects characteristic parts of the input image and is generally expressed as a matrix. Features of the object can be extracted by convolving the input image, itself expressed as a matrix, with the filter; the extracted features may also be referred to as a feature map, an activation map, or a region of interest (ROI).

The interval at which the convolution is applied may be referred to as the stride, and feature maps of different sizes may be extracted depending on the stride value. If the filter is smaller than the input image, the feature map is smaller than the input image, so a padding process may additionally be performed to prevent features from being lost over several stages. The padding process may keep the size of the feature map equal to the size of the input image by adding a preset value (for example, 0) around the border of the generated feature map.
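The effect of stride and padding on feature-map size can be made concrete with a small sketch. It assumes a square filter and zero padding applied to the layer input, which is common practice but is not spelled out in the specification; the function names are illustrative. The size rule used is the standard one, `floor((n + 2p - k) / s) + 1`.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Length of one output axis for input length n and filter length k."""
    return (n + 2 * padding - k) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D image (list of rows) with zero padding.

    As in most deep-learning frameworks, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Surround the input with `padding` rows/columns of zeros.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = conv_output_size(h, k, stride, padding)
    out_w = conv_output_size(w, k, stride, padding)
    return [
        [
            sum(padded[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

For a 4x4 input and a 3x3 filter, no padding gives a 2x2 feature map, padding of 1 keeps the 4x4 size, and a stride of 2 with padding of 1 gives 2x2 again.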

The activation function converts a feature extracted as a value (or matrix) into a nonlinear value; a sigmoid function, a ReLU function, or the like may be used.
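The two activation functions named above can be written in a few lines. This is a minimal sketch using their standard definitions; the helper `apply_activation` is a hypothetical name introduced only to show element-wise application to a feature map.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)


def apply_activation(feature_map, fn):
    """Apply an activation function to every element of a 2D feature map."""
    return [[fn(v) for v in row] for row in feature_map]
```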

The pooling layer selects features representative of the extracted feature map by performing subsampling, or pooling, on it. For example, max pooling extracts the largest value in each region of the feature map, and average pooling extracts the average value. The pooling layer does not have to follow the activation function and may be applied optionally.
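Max pooling and average pooling differ only in how each window is reduced, as the following sketch shows. Non-overlapping square windows and the function name `pool2d` are assumptions made for illustration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Reduce each non-overlapping size x size window to one value.

    mode="max" keeps the largest value; any other mode takes the average.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, h - size + 1, size):
        row = []
        for c in range(0, w - size + 1, size):
            window = [feature_map[r + a][c + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out
```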

The remaining details of the configuration and operation of the CNN 10 will be readily understood by a person skilled in the art to which the present invention pertains, so a detailed description is omitted.

Meanwhile, according to an embodiment of the present invention, objects of various sizes shown in images may be normalized to a preset size, and the features extracted from images showing objects of the preset size (hereinafter referred to as fixed-size features) may be collected to build a fixed-size feature database (DB) 11 in advance.

The size recognition-based CNN 12 is then trained to learn the correlation between the fixed-size features obtained from the pre-built fixed-size feature DB 11 and the size-dependent features obtained from images showing objects of various sizes. Having learned this correlation, the size recognition-based CNN 12 can extract a size-independent feature from an input image.

Next, once the size-independent feature has been extracted by the size recognition-based CNN 12, the object detection neural network 13 may detect the object shown in the input image using the extracted size-independent feature.

The object detection neural network 13 may consist of a classifier that classifies the object shown in the input image into one or more candidate objects that may be detected, and a regressor that predicts which object it is through regression analysis. Because the object detection neural network 13 can be readily implemented by a person skilled in the art to which the present invention pertains, a detailed description is omitted.

According to an embodiment of the present invention, fixed-size features may be used to train the size recognition-based CNN 12 of FIG. 1. A fixed-size feature is a feature extracted from an object normalized to a fixed size, and therefore needs to be collected in advance.

The process by which fixed-size features are collected and built into a database is described with reference to FIG. 2. First, the size of the objects shown in one or more images (or pictures) may be normalized to a preset size (21). When one or more objects are present in an image, the region showing each individual object may be extracted and the size of each extracted region normalized to the preset size, so that a plurality of normalized images may be derived from a single image.
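The normalization step (21) can be sketched as "crop each object region, then resize it to the preset size". The nearest-neighbour resize, the `(top, left, bottom, right)` box convention, and the separate `ref_h`/`ref_w` target dimensions are assumptions chosen to keep the example dependency-free; the specification does not prescribe a particular resampling method.

```python
def crop(image, box):
    """Cut out box=(top, left, bottom, right), bottom/right exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def resize_nearest(patch, out_h, out_w):
    """Nearest-neighbour resize of a 2D patch (list of rows)."""
    in_h, in_w = len(patch), len(patch[0])
    return [
        [patch[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def normalize_objects(image, boxes, ref_h, ref_w):
    """Return one size-normalized patch per object box in the image."""
    return [resize_nearest(crop(image, b), ref_h, ref_w) for b in boxes]
```

An image with two object boxes yields two normalized patches of identical size, matching the statement that several normalized images may be derived from one input image.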

The preset size may be set according to a reference size obtained from a separately built object DB 22. The object DB 22 stores images showing individual objects, or stores the size of each individual object (area, width, height, and so on; typically the area). The size to be used for normalization may therefore be set by taking a size selected from those contained in the object DB 22 (for example, the median) as the reference size.
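Choosing the median as the reference size can be sketched with Python's standard `statistics.median`. Representing each object by a `(width, height)` pair and using the area as its size follows the "typically the area" remark above; the function name `reference_size` is hypothetical.

```python
from statistics import median


def reference_size(boxes):
    """Median object area over a collection of (width, height) pairs."""
    return median(w * h for w, h in boxes)
```

For an even number of objects, `statistics.median` returns the mean of the two middle values; whether the specification intends that tie-breaking rule is not stated.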

Once the size-normalized image has been derived, it may be input to a CNN 23 to extract the features of the object contained in the image. Because these features are extracted from size-normalized objects, they may be referred to as the fixed-size features described with reference to FIG. 1. The CNN 23 may have the same structure as the CNN 10 of FIG. 1, or may be implemented in various other ways by a person skilled in the art.

Referring to FIG. 3, the size recognition-based CNN according to an embodiment of the present invention is a neural network that extracts features of an object independently of (that is, irrespective of) its size. It may include at least one of a classifier 31 that classifies the regions of interest extracted from the input image according to the size of the object contained in each region, and a plurality of partial neural networks 32 that are activated independently according to object size.

The classifier 31 may classify an input region of interest according to object size and activate the partial neural network 32 corresponding to that size. The size used as the classification criterion may be the reference size described with reference to FIG. 2.

For example, at least three partial neural networks 32 may be provided, one for each of the following cases: the object size is smaller than the reference size by a preset threshold or more, larger than the reference size by more than the preset threshold, or within the preset threshold of the reference size. Each partial neural network 32 may extract the size-independent feature by using the previously learned relationship between the fixed-size feature and the size-dependent feature to remove the influence of object size from the input object feature.
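The three-way routing performed before a partial neural network is activated can be sketched as a simple comparison against the reference size. The labels `"small"`, `"similar"`, and `"large"`, the function name, and the exact handling of the boundary cases are assumptions; the specification describes only the three cases in words.

```python
def select_partial_network(object_size, reference_size, threshold):
    """Pick which partial network to activate for a region of interest.

    Boundary handling is an assumption of this sketch: an object smaller
    than the reference by the threshold or more counts as "small", and an
    object larger by strictly more than the threshold counts as "large".
    """
    if reference_size - object_size >= threshold:
        return "small"
    if object_size - reference_size > threshold:
        return "large"
    return "similar"
```

In a full implementation, the returned label would index into the set of partial neural networks so that only the matching one processes the region of interest.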

도 4를 참조하면, 본 발명의 일 실시예에 따른 크기 인식 기반 CNN은 먼저 입력 영상(40)에 대하여 고정 크기 특징(42)을 추출하는 제1 과정(40a)과 입력 영상에서 추출된 관심 영역(43)을 크기 인식 기반 CNN(44)에 입력하여 크기 의존적 특징(45)을 추출하는 제2 과정(40b)을 통해 학습될 수 있다. Referring to FIG. 4, a size recognition based CNN according to an embodiment of the present invention first extracts a fixed size feature 42 from an input image 40 and a region of interest extracted from the input image. It can be learned through the second process 40b of extracting the size dependent feature 45 by inputting 43 to the size recognition based CNN 44.

여기서 고정 크기 특징(42)은 고정된 크기로 정규화된 관심 영역들(41)을 필터와 합성 곱(convolution)하고, 풀링을 선택적으로 수행하여 획득될 수 있다.The fixed size feature 42 may be obtained by convolutionally composing the regions of interest 41 normalized to a fixed size with a filter and optionally performing pooling.

이때, 크기 인식 기반 CNN(44)은 관심 영역(43)을 참조 크기에 따라 분류하고, 분류된 관심 영역(43)을 필터와 컨볼루션한 후 활성화 함수(ReLU)를 적용함으로써 크기 의존적 특징(45)을 추출할 수 있다. 여기서 추출된 크기 의존적 특징(45)은 입력 영상이 RGB에 기반한 경우, R,G,B 각각 3개의 채널에 따라 3개의 특징값일 수 있다.In this case, the size recognition-based CNN 44 classifies the region of interest 43 according to the reference size, convolves the classified region of interest 43 with a filter, and then applies an activation function (ReLU) to thereby determine the size-dependent feature 45. ) Can be extracted. The extracted size dependent feature 45 may be three feature values according to three channels of R, G, and B when the input image is based on RGB.

Here, the size recognition-based CNN 44 may learn the relationship between the fixed size feature 42 and the size-dependent feature 45 by computing, with a smooth L1 loss function, the difference between the global average of the fixed size feature 42 and the global average of the size-dependent feature 45.
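A minimal sketch of this loss in plain Python is given below. The function names and the nested-list layout `[channel][row][column]` are assumptions made for illustration, and the smooth L1 function uses the commonly seen breakpoint at 1, which the patent does not state.

```python
def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise (assumed breakpoint)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def global_mean(channel):
    """Average one 2-D feature map over both spatial axes."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def size_aware_loss(size_dependent, fixed_size):
    """Sum, over channels, of smooth L1 applied to the difference of global means."""
    return sum(
        smooth_l1(global_mean(d) - global_mean(f))
        for d, f in zip(size_dependent, fixed_size)
    )
```

Because each map is reduced to one number per channel before the comparison, the two feature maps do not need to have the same spatial size in this sketch.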

Specifically, the relationship between the fixed size feature 42 and the size-dependent feature 45 may be defined as a function according to Equation 1 below.

[Equation 1]

Referring to Equation 1, for each channel c (for example, one each for R, G, and B), the difference between the value r_c obtained by summing the size-dependent feature 45 over the spatial axes and the value obtained by summing the fixed size feature 42 over the spatial axes (x, y) is input to the smooth L1 loss function (smooth_L1), and the results for all channels are summed. This defines the relationship between the fixed size feature 42 and the size-dependent feature 45. The relationship according to Equation 1 may also be referred to as the size-aware loss.
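Equation 1 itself is not reproduced in this text. Based only on the verbal description, one possible reading is sketched below. The symbol \hat{r}_c for the spatially summed fixed size feature, the feature-map symbols F_c and \hat{F}_c, and the breakpoint at 1 in the smooth L1 function are assumptions introduced here, not the patent's own notation.

```latex
L_{san} = \sum_{c \in C} \mathrm{smooth}_{L1}\!\left(r_c - \hat{r}_c\right),
\qquad
r_c = \sum_{x,y} F_c(x, y),
\quad
\hat{r}_c = \sum_{x,y} \hat{F}_c(x, y)

\mathrm{smooth}_{L1}(z) =
\begin{cases}
0.5\, z^{2} & \text{if } |z| < 1 \\
|z| - 0.5 & \text{otherwise}
\end{cases}
```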

Once the size recognition-based CNN 44 has been trained, the feature extraction process 40b is performed by the size recognition-based CNN 44 with the parameters determined by the training, so that a size-independent feature can be extracted for an object included in an image. The extracted size-independent feature is then input to the object detection neural network 46 to finally detect the object.

In general, an artificial neural network consists of multiple layers, and as the layers become more complex, the computation becomes more complex and the optimal values may not be computable. To address this problem, artificial neural networks generally use back propagation, in which the output of the network is computed back in the reverse direction to arrive at a result. However, if the size-aware loss according to an embodiment of the present invention (that is, the learning of the relationship between the size-independent feature and the size-dependent feature) is back-propagated, detection performance may degrade. Therefore, the size recognition-based CNN according to an embodiment of the present invention may be configured as a dual network to block this back propagation.

Specifically, referring to FIG. 5, a first size recognition-based CNN 51 learns the size-aware loss and thereby sets the parameters inside the neural network, and a second size recognition-based CNN 52, which shares the set parameters, is used to detect objects in actual images. In this way, back propagation can be blocked. The second size recognition-based CNN 52, on the other hand, may back-propagate the detection result (or detection loss) of the detection neural network through the entire neural network.

Meanwhile, in FIG. 5, the size-independent feature derived from the second size recognition-based CNN may be input as-is to the object detection neural network to detect the object. Alternatively, the existing convolutional feature and the size-independent feature may be combined with each other, and the combined feature may be input to the detection neural network.

Because the size-independent feature excludes information about the position of the object in the image, combining it with the existing convolutional feature can make the representation richer. The combined feature reflects the object position carried by the existing convolutional feature and may also be size-independent. Referring to FIG. 5, an adder 53 is added to perform this feature combination.
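The role of the adder can be illustrated with an element-wise sum. This is only a sketch: the text does not state the shapes of the two features, so it is assumed here that both are laid out as `[channel][row][column]` with identical dimensions. The function name is hypothetical.

```python
def combine_features(conv_feature, size_independent_feature):
    """Element-wise sum of two feature maps with identical (assumed) shapes."""
    return [
        [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(channel_a, channel_b)
        ]
        for channel_a, channel_b in zip(conv_feature, size_independent_feature)
    ]
```

The result keeps the spatial layout of the convolutional feature, which is how position information would survive into the input of the detection network under this reading.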

FIG. 6 is a flowchart of a method for detecting an object size-independently by using a CNN according to an embodiment of the present invention. FIG. 7 is a flowchart of a method for building a database by collecting fixed size features according to an embodiment of the present invention.

Referring to FIG. 6, the method for detecting an object size-independently by using a convolutional neural network (CNN) may include: a step of acquiring an input image including an object to be detected (S100); a step of inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed size feature extracted from an image in which the size of the object has been normalized and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object (S110); and a step of inputting the extracted size-independent feature to an object detection neural network and detecting the object (S120).

The method may further include, before or after the step of acquiring the input image (S100), a step of collecting the fixed size features to generate a fixed size feature database (DB).

Referring to FIG. 7, the step of generating the fixed size feature DB may include: a step of collecting size information of the object (S200); a step of determining a reference size by using the collected size information of the object (S210); a step of normalizing the size of the object included in the input image by using the determined reference size (S220); and a step of inputting the normalized input image to a CNN and extracting a fixed size feature (S230).

In the step of determining the reference size (S210), the median of the values included in the size information of the object may be determined as the reference size.
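Choosing the median as the reference size, and then using it to normalize an object, can be sketched with the Python standard library. The function names are hypothetical, and treating normalization as a simple resize factor (reference size divided by object size) is an assumption. The text only says that the object size is normalized using the reference size.

```python
from statistics import median


def reference_size(object_sizes):
    """Reference size taken as the median of the collected object sizes."""
    return median(object_sizes)


def normalization_scale(object_size, ref_size):
    """Resize factor that would bring an object to the reference size (assumed)."""
    return ref_size / object_size
```

For collected sizes of 32, 48, 64, 96 and 200, the reference size is 64, so an object of size 128 would be scaled by 0.5 and an object of size 32 by 2.0. The median is less sensitive than the mean to a few very large objects such as the 200 in this list.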


In the step of detecting the object (S120), a result of combining the size-independent feature and the size-dependent feature may be input to the object detection neural network, so that the object is detected in consideration of the position information of the object in the image carried by the size-dependent feature.

Referring to FIG. 8, a device 100 for detecting an object size-independently by using a convolutional neural network (CNN) may include at least one processor 110 and a memory 120 storing instructions that direct the at least one processor 110 to perform at least one step.

The at least one processor 110 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed. Each of the memory 120 and the storage device 160 may be configured as at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 120 may be configured as at least one of a read only memory (ROM) and a random access memory (RAM).

In addition, the device 100 for detecting an object size-independently by using a CNN may include a transceiver 130 that communicates over a wireless network. The device 100 may further include an input interface device 140, an output interface device 150, a storage device 160, and the like. The components included in the device 100 may be connected by a bus 170 to communicate with one another.

The at least one step may include: a step of acquiring an input image including an object to be detected; a step of inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed size feature extracted from an image in which the size of the object has been normalized and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object; and a step of inputting the extracted size-independent feature to an object detection neural network and detecting the object.

In the step of determining the reference size, the median of the values included in the size information of the object may be determined as the reference size.


FIGS. 9a to 9d show the performance of detecting an object included in an image when pictures (or images) showing a background, an aeroplane, a bicycle, and a bird, respectively, are used as input images. Each graph compares the conventional region-of-interest-based object detection method (pooling) with the method of detecting an object size-independently by using the size recognition-based CNN according to an embodiment of the present invention (Pooling with SAN), and plots the root mean square error (RMSE) against the object size (scale).

Referring to FIGS. 9a to 9d, the false detection rate measured with the method and device for detecting an object size-independently according to an embodiment of the present invention is lower than that of the conventional technique over the entire size range.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

A method for detecting an object size-independently by using a convolutional neural network (CNN), the method comprising:

a step of acquiring an input image including an object to be detected;

a step of inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed size feature extracted from an image in which the size of the object has been normalized and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object; and

a step of inputting the extracted size-independent feature to an object detection neural network and detecting the object.
The method of claim 1,

further comprising, before or after the step of acquiring the input image,

a step of collecting the fixed size features to generate a fixed size feature database (DB).
The method of claim 2,

wherein the step of generating the fixed size feature DB includes:

a step of collecting size information of the object;

a step of determining a reference size by using the collected size information of the object;

a step of normalizing the size of the object included in the input image by using the determined reference size; and

a step of inputting the normalized input image to a CNN and extracting a fixed size feature.
The method of claim 3,

wherein the step of determining the reference size

determines the median of the values included in the size information of the object as the reference size.
The method of claim 3,

wherein the size recognition-based CNN

classifies the size-dependent feature according to the reference size, and a partial neural network corresponding to the classified size is individually activated to extract the size-independent feature.
The method of claim 5,

wherein the size recognition-based CNN

includes three partial neural networks that are individually activated according to whether the size of the object from which the size-dependent feature was computed is, compared with the reference size, larger than, smaller than, or similar to the reference size.
The method of claim 1,

wherein the correlation between the fixed size feature and the size-dependent feature

is determined by computing, with a smooth L1 loss function (smooth_L1), the difference between the value (r_c) obtained by summing the feature values of the size-dependent feature over the spatial axes and the value obtained by summing the feature values of the fixed size feature over the spatial axes.
The method of claim 7,

wherein the correlation (L_san) between the fixed size feature and the size-dependent feature is defined, over the channels (C) according to the display format of the input image, by the equation

[Equation]
The method of claim 1,

wherein the size recognition-based CNN includes:

a first size recognition-based CNN that learns the correlation to set parameters inside the neural network; and

a second size recognition-based CNN that shares the parameters inside the neural network of the first size recognition-based CNN to extract the size-independent feature.
The method of claim 1,

wherein the step of detecting the object

inputs a result of combining the size-independent feature and the size-dependent feature to the object detection neural network, thereby detecting the object in consideration of the position information of the object in the image carried by the size-dependent feature.
A device for detecting an object size-independently by using a convolutional neural network (CNN), the device comprising:

at least one processor; and

a memory storing instructions that direct the at least one processor to perform at least one step,

wherein the at least one step includes:

a step of acquiring an input image including an object to be detected;

a step of inputting the input image to a size recognition-based CNN that has learned the correlation between a fixed size feature extracted from an image in which the size of the object has been normalized and a size-dependent feature extracted from the input image, and extracting a size-independent feature for the object; and

a step of inputting the extracted size-independent feature to an object detection neural network and detecting the object.
The device of claim 11,

wherein the at least one step further includes, before or after the step of acquiring the input image,

a step of collecting the fixed size features to generate a fixed size feature database (DB).
The device of claim 12,

wherein the step of generating the fixed size feature DB includes:

a step of collecting size information of the object;

a step of determining a reference size by using the collected size information of the object;

a step of normalizing the size of the object included in the input image by using the determined reference size; and

a step of inputting the normalized input image to a CNN and extracting a fixed size feature.
The device of claim 13,

wherein the step of determining the reference size

determines the median of the values included in the size information of the object as the reference size.
The device of claim 13,

wherein the size recognition-based CNN

classifies the size-dependent feature according to the reference size, and a partial neural network corresponding to the classified size is individually activated to extract the size-independent feature.
The device of claim 15,

wherein the size recognition-based CNN

includes three partial neural networks that are individually activated according to whether the size of the object from which the size-dependent feature was computed is, compared with the reference size, larger than, smaller than, or similar to the reference size.
The device of claim 11,

wherein the correlation between the fixed size feature and the size-dependent feature

is determined by computing, with a smooth L1 loss function (smooth_L1), the difference between the value (r_c) obtained by summing the feature values of the size-dependent feature over the spatial axes and the value obtained by summing the feature values of the fixed size feature over the spatial axes.
The device of claim 17,

wherein the correlation (L_san) between the fixed size feature and the size-dependent feature is defined, over the channels (C) according to the display format of the input image, by the equation

[Equation]
The device of claim 1,

wherein the size recognition-based CNN includes:

a first size recognition-based CNN that learns the correlation to set parameters inside the neural network; and

a second size recognition-based CNN that shares the parameters inside the neural network of the first size recognition-based CNN to extract the size-independent feature.
The device of claim 11,

wherein the step of detecting the object

inputs a result of combining the size-independent feature and the size-dependent feature to the object detection neural network, thereby detecting the object in consideration of the position information of the object in the image carried by the size-dependent feature.