KR20220073914A

KR20220073914A - Device and Method for Recognizing Face Using Light-weight Neural Network

Info

Publication number: KR20220073914A
Application number: KR1020200161826A
Authority: KR
Inventors: 이상윤; 장성준; 배한별; 전태재; 이용주
Original assignee: 연세대학교 산학협력단
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2022-06-03
Also published as: WO2022114314A1

Abstract

A face recognition apparatus and method using a lightweight neural network are disclosed. The disclosed apparatus includes: a preprocessor that detects a face region from an input image; a feature extraction network that outputs a feature map by performing a neural network operation on the image output from the preprocessor; and a softmax network that outputs, through a neural network operation on the feature map, a probability value for each preset reference person. The feature extraction network includes a stem block subnetwork and a plurality of dense block subnetworks. The stem block subnetwork and the dense block subnetworks each perform a convolution operation using at least one 1 x 1 convolution kernel, and each includes a structure that performs convolution operations in parallel over two paths. According to the disclosed apparatus and method, the computational complexity of the neural network operation is minimized, so that real-time face recognition is possible without being affected by hardware performance.

Description

Device and Method for Recognizing Face Using Light-weight Neural Network

The present invention relates to a face recognition apparatus and method, and more particularly, to a face recognition apparatus and method using a lightweight neural network.

Face recognition technology is divided into two tasks, face identification and face verification. In both, facial feature information is extracted from an input image as a vector, and recognition is performed, as appropriate to the task, based on the similarity to existing reference persons.

Face recognition systems are used in various applications such as user lock/login on terminals, mobile payment, identity verification, and access control, and are deployed in many mobile and embedded devices.

As neural networks have been applied to face recognition technology, the accuracy of face recognition has risen dramatically. However, neural network operations are computationally very complex, and their speed varies greatly depending on the capability of the hardware.

Because of this computational complexity, real-time detection has not been achieved in fields where real-time face recognition is important.

To achieve the above object, according to one aspect of the present invention, there is provided a face recognition apparatus using a lightweight network, including: a preprocessor that detects a face region from an input image; a feature extraction network that outputs a feature map by performing a neural network operation on the image output from the preprocessor; and a softmax network that outputs, through a neural network operation on the feature map, a probability value for each preset reference person, wherein the feature extraction network includes a stem block subnetwork and a plurality of dense block subnetworks, the stem block subnetwork and the plurality of dense block subnetworks perform a convolution operation using at least one 1 x 1 convolution kernel, and the stem block subnetwork and the plurality of dense block subnetworks include a structure that performs convolution operations in parallel using two paths.

The softmax network and the feature extraction network are trained by converting the probability value output by the softmax network into an angle of a sinusoid, adding a preset margin to the converted angle, and then backpropagating the loss between the ground truth and the margin-added value.

The stem block subnetwork performs a convolution operation using at least one 2 X 2 convolution kernel on a first path and performs a convolution operation using at least one 1 X 1 convolution kernel on a second path. The convolution operations of the first path and the second path are performed in parallel, and the outputs of the first path and the second path are combined with each other.

The dense block subnetwork performs a convolution operation using at least one 1 X 1 convolution kernel on a third path and performs a convolution operation using at least one 1 X 1 convolution kernel on a fourth path, and the outputs of the third path and the fourth path are combined with each other.

According to another aspect of the present invention, there is provided a face recognition method using a lightweight network, including: (a) performing preprocessing that detects a face region from an input image; (b) outputting a feature map by performing, with a feature extraction network, a neural network operation on the image output from the preprocessing; and (c) outputting, through a neural network operation by a softmax network on the feature map, a probability value for each preset reference person, wherein the feature extraction network includes a stem block subnetwork and a plurality of dense block subnetworks, the stem block subnetwork and the plurality of dense block subnetworks perform a convolution operation using at least one 1 x 1 convolution kernel, and the stem block subnetwork and the plurality of dense block subnetworks include a structure that performs convolution operations in parallel using two paths.

According to the present invention, the computational complexity of the neural network operation is minimized, so that real-time face recognition is possible without being affected by hardware performance.

FIG. 1 is a block diagram showing a schematic structure of a face recognition apparatus using a lightweight neural network according to an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of a feature extraction network according to an embodiment of the present invention.
FIG. 3 is a diagram showing the structure of a stem block subnetwork according to an embodiment of the present invention.
FIG. 4 is a diagram showing the structure of a dense block subnetwork according to an embodiment of the present invention.
FIG. 5 is a flowchart showing a schematic operation of a learning method using a margin according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the overall flow of a face extraction method using a lightweight network according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the contents described in those drawings.

Hereinafter, the present invention will be described in detail by describing preferred embodiments with reference to the accompanying drawings. However, the present invention may be embodied in various different forms and is not limited to the described embodiments. To describe the present invention clearly, parts irrelevant to the description are omitted, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to “include” a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as “... unit”, “... device”, “module”, and “block” used in the specification mean a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.

FIG. 1 is a block diagram showing a schematic structure of a face recognition apparatus using a lightweight neural network according to an embodiment of the present invention.

Referring to FIG. 1, a face recognition apparatus using a lightweight neural network according to an embodiment of the present invention may include a preprocessor 100, a feature extraction network 110, and a softmax network 120.

The face recognition apparatus of the present invention receives an image or photo of a person that includes a face and determines which of the pre-stored reference persons the input corresponds to. For example, when there are 200 reference persons permitted to enter a specific place and a person attempts to enter, the apparatus captures an image of that person and determines which reference person he or she corresponds to.

The preprocessor 100 detects a face region from an input photo or image. Besides the face, an acquired photo or image contains objects, such as the background or other things, that act as noise for face recognition.

The preprocessor 100 detects only the face region in the input photo or image and removes the regions of the remaining objects.

The feature extraction network 110 extracts a feature map from the face image output from the preprocessor. Feature extraction using a neural network is a well-known technique, but previously known networks require a large amount of computation to generate a feature map.

In a typical neural network, a feature map is generated by applying a convolution kernel to the input image. Such networks use relatively large convolution kernels, for example 3 X 3, and the feature maps output across multiple stages contain a very large number of channels. The resulting computational complexity is the biggest obstacle to detecting faces in real time.
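To make the cost argument concrete, the sketch below counts multiply-accumulate operations (MACs) for a single convolution layer using the standard formula (output height x output width x kernel area x input channels x output channels). The function name `conv_macs` and the layer sizes are illustrative assumptions, not values prescribed by the specification.

```python
def conv_macs(out_h, out_w, kernel, c_in, c_out):
    """Multiply-accumulate count of one kernel x kernel convolution layer."""
    return out_h * out_w * kernel * kernel * c_in * c_out


# Illustrative layer: 56 x 56 output, 32 input channels, 32 output channels.
macs_3x3 = conv_macs(56, 56, 3, 32, 32)
macs_1x1 = conv_macs(56, 56, 1, 32, 32)

# With everything else fixed, a 3 x 3 kernel costs 9 times as much as 1 x 1.
ratio = macs_3x3 // macs_1x1

# Halving the number of output channels halves the cost as well.
macs_1x1_half = conv_macs(56, 56, 1, 32, 16)
```

The count scales with the kernel area and with the product of the channel counts, which is why the smaller kernels and narrower feature maps described here reduce computation.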

The feature extraction network 110 of the present invention has a structure that reduces computational complexity to enable real-time detection. The structure of the feature extraction network is described below with reference to FIGS. 2 to 4.

FIG. 2 is a diagram showing the structure of a feature extraction network according to an embodiment of the present invention.

Referring to FIG. 2, the feature extraction network 110 according to an embodiment of the present invention may include a stem block subnetwork 210, a first dense block subnetwork 220, a second dense block subnetwork 230, a third dense block subnetwork 240, and a fourth dense block subnetwork 250.

FIG. 2 shows a structure that derives the feature map through a total of five stages, but this is only an example, and it will be apparent to those skilled in the art that the number of stages can be adjusted as needed.

The feature extraction network of FIG. 2 has a structure in which one stem block subnetwork 210 and a plurality of dense block subnetworks 220, 230, 240, and 250 are connected.

FIG. 3 is a diagram showing the structure of a stem block subnetwork according to an embodiment of the present invention.

The stem block subnetwork 210 is the subnetwork that first processes the output image of the preprocessor 100. For example, the input image has a size of 112 X 112 X 3; because it is an ordinary color image, it has three channels, R, G, and B.

The stem block subnetwork first performs a convolution operation using a 3 X 3 convolution kernel, applying stride 2 to halve the spatial size.

The feature map output from the first convolution operation unit 300 of the stem block subnetwork 210 may have, for example, a size of 56 X 56 X 32.

The stem block subnetwork 210 is characterized by a parallel operation structure: the convolution operation of the second convolution operation unit 310 and the convolution operations of the third and fourth convolution operation units 320 and 330 are performed in parallel over two paths.

On the first path, the second convolution operation unit 310 performs a convolution operation by applying a 2 X 2 convolution kernel and halves the size through stride-2 max pooling.

On the second path, the third and fourth convolution operation units 320 and 330 perform their convolution operations in parallel with the second convolution operation unit 310. The third convolution operation unit 320 performs a 1 x 1 convolution operation and generates a feature map with 16 channels. The fourth convolution operation unit 330 performs a convolution operation using a 3 X 3 kernel, generates a feature map with 32 channels, and applies stride 2 to halve the size of the feature map.

The feature map output from the second convolution operation unit 310 and the feature map output from the fourth convolution operation unit 330 are concatenated in the combining unit 350. The concatenated feature map may have a size of 28 X 28 X 64.
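The feature-map sizes quoted for the stem block can be checked with a small shape trace. The sketch below is only bookkeeping, not an implementation of the network: it assumes padding is chosen so that only the stride changes the spatial size, and it assumes the first path keeps 32 channels (which is what the stated 64-channel concatenation implies). The function names are hypothetical.

```python
def strided_size(size, stride):
    """Spatial size after a strided layer, assuming 'same'-style padding."""
    return -(-size // stride)  # ceiling division


def trace_stem(height=112, width=112):
    """Return the (H, W, C) shape after each unit of the stem block."""
    shapes = {}

    # Unit 300: 3 x 3 convolution, stride 2, 32 output channels.
    h1, w1 = strided_size(height, 2), strided_size(width, 2)
    shapes["unit_300"] = (h1, w1, 32)

    # First path, unit 310: 2 x 2 kernel with stride-2 max pooling.
    # Assumption: the channel count stays at 32.
    h2, w2 = strided_size(h1, 2), strided_size(w1, 2)
    shapes["unit_310"] = (h2, w2, 32)

    # Second path, unit 320: 1 x 1 convolution down to 16 channels.
    shapes["unit_320"] = (h1, w1, 16)
    # Second path, unit 330: 3 x 3 convolution, stride 2, 32 channels.
    shapes["unit_330"] = (h2, w2, 32)

    # Combining unit 350: concatenation along the channel axis.
    a, b = shapes["unit_310"], shapes["unit_330"]
    assert a[:2] == b[:2], "paths must agree on spatial size before concatenation"
    shapes["concat_350"] = (a[0], a[1], a[2] + b[2])
    return shapes


stem_shapes = trace_stem()
```

Under these assumptions `stem_shapes["unit_300"]` is `(56, 56, 32)` and `stem_shapes["concat_350"]` is `(28, 28, 64)`, matching the sizes given in the description.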

The fifth convolution operation unit 360 outputs the final feature map through a convolution operation using a 1 X 1 convolution kernel.

As described with reference to FIG. 3, the stem block subnetwork of the present invention performs convolution operations in parallel using at least one 1 X 1 or 2 X 2 convolution kernel and then concatenates the resulting feature maps. This structure simplifies the convolution operations while producing a feature map with a small number of channels.

FIG. 4 is a diagram showing the structure of a dense block subnetwork according to an embodiment of the present invention.

As shown in FIG. 4, the feature extraction network 110 according to an embodiment of the present invention includes a plurality of dense block subnetworks 220, 230, 240, and 250, and the number of dense block subnetworks can be adjusted as needed.

Referring to FIG. 4, the dense block subnetwork also performs convolution operations in parallel.

The first convolution operation unit 400 and the second convolution operation unit 410 operate in parallel with the third convolution operation unit 420, the fourth convolution operation unit 430, and the fifth convolution operation unit 440. That is, the convolution operations of the first and second convolution operation units 400 and 410 are performed on a first path while the third, fourth, and fifth convolution operation units 420, 430, and 440 simultaneously perform their convolution operations on a second path.

The first convolution operation unit 400 performs a convolution operation using a 1 X 1 convolution kernel while doubling the channel axis. The second convolution operation unit 410 performs a convolution operation using a 3 X 3 convolution kernel while halving the channel axis.

The third convolution operation unit 420 performs a convolution operation using a 1 x 1 convolution kernel while doubling the channel axis. The fourth convolution operation unit 430 and the fifth convolution operation unit 440 use 3 X 3 convolution kernels and perform convolution operations that halve the channel axis.

The dense block subnetwork likewise performs convolution operations in parallel over two paths. Some of the operations use 1 X 1 convolution kernels, and the operations are carried out so that the channel axis does not grow, so the computational complexity can be reduced.

The output of the second convolution operation unit 410 and the output of the fifth convolution operation unit 440 are combined with each other in the combining unit 450.

Referring again to FIG. 1, the softmax network 120 outputs a probability for each reference person through a neural network operation on the feature map output from the feature extraction network 110.

When the feature extraction network 110 outputs the facial feature map of a particular person, the softmax network 120 searches for the reference person most similar to that feature map. To do so, the softmax network 120 computes, for each reference person, the probability of a match with the feature map from the feature extraction network 110.

Among the reference persons, the one with the highest probability value is determined to be the person in the input image.
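As a minimal sketch of this decision rule, the code below applies the standard softmax function to a list of per-person scores and selects the reference person with the highest probability. The scores, the person names, and the function names `softmax` and `recognize` are hypothetical; the specification does not describe the internal layers of the softmax network.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(scores, reference_names):
    """Return the most probable reference person and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return reference_names[best], probs[best]


# Hypothetical scores for three reference persons.
name, prob = recognize([0.1, 2.0, -1.0], ["person_a", "person_b", "person_c"])
```

Here `name` is `"person_b"`, the entry with the largest score, because softmax preserves the ordering of its inputs.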

The structure of a softmax network for class classification is well known, so a detailed description is omitted.

The present invention consists of two neural networks, the feature extraction network 110 and the softmax network 120. The loss function for training them is set so that a loss corresponding to the difference between the output of the softmax network 120 and the ground truth is backpropagated.

According to a preferred embodiment of the present invention, training is preferably performed with an added margin so that the features in each person's feature map become more distinctive for that person.

FIG. 5 is a flowchart showing a schematic operation of a learning method using a margin according to an embodiment of the present invention.

First, the probability value for each reference person output from the softmax network 120 is obtained (step 500).

Once the per-person probability values are obtained, each probability value is converted into an angle of a sinusoid (step 502). For example, the obtained probability value is converted into the angle of a cosine function. Specifically, when the probability value of a particular person is 0.5, it is converted into 60 degrees, the angle whose cosine is 0.5.

When the angle conversion is complete, a preset margin value is added to the converted angle (step 504). If the converted angle is θ and the preset margin value is m, the operation (θ+m) is performed.

Once the margin has been added to the angle, the probability value of the margin-added angle is obtained again (step 506). For example, when a cosine function is used as the sinusoid, the probability value cos(θ+m) is computed.

When the probability value of the margin-added angle has been obtained, the loss is backpropagated through a loss function that sets the difference between that probability value and the ground truth as the loss, thereby training the softmax network 120 and the feature extraction network 110 (step 508).
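Steps 502 to 506 can be illustrated numerically. The sketch below converts a probability to an angle with the inverse cosine, adds a margin, and converts back with the cosine. The 10-degree margin and the function name `margin_probability` are assumptions for illustration only; the specification does not state a margin value or the exact form of the loss, so the loss itself is not shown.

```python
import math


def margin_probability(prob, margin_deg):
    """Return cos(theta + m), where theta is the angle whose cosine is prob."""
    theta_deg = math.degrees(math.acos(prob))    # step 502: probability -> angle
    shifted_deg = theta_deg + margin_deg         # step 504: add the margin m
    return math.cos(math.radians(shifted_deg))   # step 506: angle -> probability


# Example from the description: 0.5 corresponds to 60 degrees.
theta = math.degrees(math.acos(0.5))

# Hypothetical margin of 10 degrees: cos(70 degrees) is smaller than 0.5.
penalized = margin_probability(0.5, 10.0)
```

As long as θ+m stays within 0 to 180 degrees, the margin lowers the value for the same input, so the networks must produce a higher raw probability for the correct person to reach the same loss, which is consistent with the stated goal of making each person's features more distinctive.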

FIG. 6 is a flowchart showing the overall flow of a face extraction method using a lightweight network according to an embodiment of the present invention.

Referring to FIG. 6, an image or photo is received and preprocessed (step 600). The preprocessing detects the person region in the image or photo and drops the remaining regions.

When preprocessing is complete, the preprocessed image is input to the feature extraction network 110, which outputs a feature map (step 602). As described above, the feature extraction network performs parallel convolution operations over two paths at each stage. In addition, at least one of the convolution operations applies a 1 X 1 convolution, and the channel axis is not increased by more than a factor of two.

Because neither the convolution kernels nor the channel axis is large, the feature map can be output quickly.

When the feature map is output from the feature extraction network 110, the softmax network 120 outputs a probability value for each reference person for that feature map (step 604).

When the per-person probability values are output, the reference person corresponding to the highest probability value is recognized as the person in the input image or photo (step 606).

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely examples, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible.

Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

1. A face recognition apparatus using a lightweight network, comprising:
a preprocessor that detects a face region from an input image;
a feature extraction network that outputs a feature map by performing a neural network operation on the image output from the preprocessor; and
a softmax network that outputs, through a neural network operation on the feature map, a probability value for each preset reference person,
wherein the feature extraction network includes a stem block subnetwork and a plurality of dense block subnetworks, the stem block subnetwork and the plurality of dense block subnetworks perform a convolution operation using at least one 1 x 1 convolution kernel, and the stem block subnetwork and the plurality of dense block subnetworks include a structure that performs convolution operations in parallel using two paths.

2. The apparatus of claim 1,
wherein the softmax network and the feature extraction network are trained by converting the probability value output by the softmax network into an angle of a sinusoid, adding a preset margin to the converted angle, and then backpropagating the loss between the ground truth and the margin-added value.

3. The apparatus of claim 1,
wherein the stem block subnetwork performs a convolution operation using at least one 2 X 2 convolution kernel on a first path and performs a convolution operation using at least one 1 X 1 convolution kernel on a second path, the convolution operations of the first path and the second path are performed in parallel, and the outputs of the first path and the second path are combined with each other.

4. The apparatus of claim 3,
wherein the dense block subnetwork performs a convolution operation using at least one 1 X 1 convolution kernel on a third path and performs a convolution operation using at least one 1 X 1 convolution kernel on a fourth path, and the outputs of the third path and the fourth path are combined with each other.

5. A face recognition method using a lightweight network, comprising:
(a) performing preprocessing that detects a face region from an input image;
(b) outputting a feature map by performing, with a feature extraction network, a neural network operation on the image output from the preprocessing; and
(c) outputting, through a neural network operation by a softmax network on the feature map, a probability value for each preset reference person,
wherein the feature extraction network includes a stem block subnetwork and a plurality of dense block subnetworks, the stem block subnetwork and the plurality of dense block subnetworks perform a convolution operation using at least one 1 x 1 convolution kernel, and the stem block subnetwork and the plurality of dense block subnetworks include a structure that performs convolution operations in parallel using two paths.

6. The method of claim 5,
wherein the softmax network and the feature extraction network are trained by converting the probability value output by the softmax network into an angle of a sinusoid, adding a preset margin to the converted angle, and then backpropagating the loss between the ground truth and the margin-added value.

7. The method of claim 5,
wherein the stem block subnetwork performs a convolution operation using at least one 2 X 2 convolution kernel on a first path and performs a convolution operation using at least one 1 X 1 convolution kernel on a second path, the convolution operations of the first path and the second path are performed in parallel, and the outputs of the first path and the second path are combined with each other.

8. The method of claim 7,
wherein the dense block subnetwork performs a convolution operation using at least one 1 X 1 convolution kernel on a third path and performs a convolution operation using at least one 1 X 1 convolution kernel on a fourth path, and the outputs of the third path and the fourth path are combined with each other.