KR102613887B1

KR102613887B1 - Method and apparatus for face image reconstruction using video identity clarification model

Info

Publication number: KR102613887B1
Application number: KR1020220050392A
Authority: KR
Inventors: 이영기; 이주헌
Original assignee: 서울대학교산학협력단
Priority date: 2021-04-22
Filing date: 2022-04-22
Publication date: 2023-12-14
Also published as: KR20220145792A

Abstract

A method and apparatus are provided for reconstructing a high-quality face image from a low-quality face image using an identity restoration model. According to an embodiment, the method and apparatus contribute to improving face recognition accuracy.

Description

Method and apparatus for face image reconstruction using video identity clarification model {METHOD AND APPARATUS FOR FACE IMAGE RECONSTRUCTION USING VIDEO IDENTITY CLARIFICATION MODEL}

The present invention relates to a method and apparatus for face image reconstruction, and more particularly to a method and apparatus for reconstructing a high-quality face image from a low-quality face image using an identity restoration model and/or a video identity restoration model.

The content described below is provided only as background information related to embodiments of the present invention and does not necessarily constitute prior art.

Face recognition in complex urban spaces requires accurately recognizing the low-quality faces, captured from a long distance, that appear in an input image. Deep neural network (DNN)-based face recognition has recently achieved high accuracy, but its accuracy drops significantly on low-quality images.

Meanwhile, DNN-based research on reconstructing low-quality face images in high quality is also active, but it focuses on producing visually plausible images and does not help improve recognition accuracy.

To overcome this limited accuracy of face recognition, a technique is needed that can reconstruct a high-quality face image from a small, low-quality face image captured from a long distance.

In addition, existing DNN models assume that a single low-quality image is given as input. As a result, when a target face is captured across consecutive frames of a video, they cannot use that information for quality reconstruction.

Meanwhile, the background art described above is technical information that the inventors possessed for, or acquired in the course of, deriving the present invention, and is not necessarily known art disclosed to the general public before the filing of the present application.

One object of the present invention is to provide a face image reconstruction method and apparatus that reconstruct a high-quality face image by restoring the identity of a low-quality face image.

One object of the present invention is to provide an identity restoration model (Identity Clarification Network; ICN) that restores the identity of a low-quality input image.

One object of the present invention is to provide a video identity restoration model (Video Identity Clarification Network; VICN) that reconstructs a high-quality image of a target face from a series of low-quality image frames captured from consecutive frames of a video, and a face image reconstruction method and apparatus using the same.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages not mentioned can be understood from the following description and will be understood more clearly from the embodiments of the present invention. It will also be appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

A face image reconstruction method according to an embodiment of the present invention includes acquiring training data that includes a face image and a ground-truth face image for the face image, and training an identity restoration model (Identity Clarification Network; ICN) based on the training data. The training may include a generation step of executing a generator of the identity restoration model to generate a reconstructed face image in which the identity of the face appearing in the face image is restored, and a discrimination step of executing a discriminator of the identity restoration model, which stands in the adversarial relationship of a generative adversarial network (GAN) with the generator, to discriminate the reconstructed face image based on the ground-truth face image.

A face image reconstruction apparatus according to an embodiment of the present invention includes a memory that stores an identity restoration model including a generator and a discriminator that stands in the adversarial relationship of a generative adversarial network with the generator, and a processor configured to train the identity restoration model based on training data that includes a face image and a ground-truth face image for the face image. To perform the training, the processor may be configured to perform a generation task of executing the generator to generate a reconstructed face image in which the identity of the face appearing in the face image is restored, and a discrimination task of executing the discriminator to discriminate the reconstructed face image based on the ground-truth face image.

A face image reconstruction method according to an embodiment of the present invention, executed by a face image reconstruction apparatus including a processor, includes acquiring training data that includes at least one face image tracked from a series of frames of an input video and a ground-truth face image for the at least one face image, and training a video identity restoration model (Video Identity Clarification Network; VICN) based on the training data. The training may include a generation step of executing a generator of the video identity restoration model to generate a reconstructed face image in which the identity of the face appearing in the at least one face image is restored, and a discrimination step of executing a discriminator of the identity restoration model, which stands in the adversarial relationship of a generative adversarial network (GAN) with the generator, to discriminate the reconstructed face image based on the ground-truth face image.

A face image reconstruction apparatus according to an embodiment of the present invention includes a memory that stores a video identity restoration model including a generator and a discriminator that stands in the adversarial relationship of a generative adversarial network with the generator, and a processor configured to train the video identity restoration model based on training data that includes at least one face image tracked from a series of frames of an input video and a ground-truth face image for the at least one face image. To perform the training, the processor may be configured to perform a generation task of executing the generator to generate a reconstructed face image in which the identity of the face appearing in the at least one face image is restored, and a discrimination task of executing the discriminator to discriminate the reconstructed face image based on the ground-truth face image.

Other aspects, features, and advantages besides those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to an embodiment, a high-quality face image can be reconstructed by restoring the identity of a low-quality face image.

According to an embodiment, the accuracy of detecting a search target from low-quality face images is improved.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a schematic illustration of the operating environment of a face image reconstruction apparatus according to an embodiment.
FIG. 2 is a block diagram of a face image reconstruction apparatus according to an embodiment.
FIG. 3 is a flowchart of a face image reconstruction method according to an embodiment.
FIG. 4 is a diagram for explaining an identity restoration model and its training structure according to an embodiment.
FIG. 5 is a flowchart of the training process of a face image reconstruction method according to an embodiment.
FIG. 6 is a diagram for explaining the network structure of the generator of an identity restoration model according to an embodiment.
FIG. 7 is a diagram showing example results of a face image reconstruction process according to an embodiment.
FIG. 8 is a flowchart of a face image reconstruction method using a video identity restoration model according to an embodiment.
FIG. 9 is a diagram for explaining the face tracking process of face image reconstruction using a video identity restoration model according to an embodiment.
FIG. 10 is a diagram for explaining, by way of example, the face tracking process of face image reconstruction using a video identity restoration model according to an embodiment.
FIG. 11 is a diagram for explaining a video identity restoration model and its training structure according to an embodiment.
FIG. 12 is a diagram for explaining the network structure of the multi-frame face quality enhancer of the generator of an identity restoration model according to an embodiment.

Hereinafter, the present invention is described in more detail with reference to the drawings. The present invention may be implemented in many different forms and is not limited to the embodiments described herein. In the following embodiments, parts not directly related to the description are omitted in order to describe the present invention clearly; this does not mean that the omitted elements are unnecessary when implementing an apparatus or system to which the spirit of the present invention is applied. In addition, the same reference numerals are used for identical or similar components throughout the specification.

In the following description, terms such as first and second may be used to describe various components, but the components should not be limited by these terms; the terms are used only to distinguish one component from another. In addition, singular expressions in the following description include plural expressions unless the context clearly indicates otherwise.

In the following description, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the present invention is described in detail with reference to the drawings.

FIG. 1 is a schematic illustration of the operating environment of a face image reconstruction apparatus according to an embodiment.

The face image reconstruction process according to the embodiment is a technique for high-precision face recognition; applied to a deep neural network (DNN)-based face recognition algorithm, it can improve the accuracy of image-based face recognition. High-complexity spaces include spaces crowded with many people, for example urban areas with heavy foot traffic, transfer stations during rush hour, and sports stadiums and shopping malls packed with crowds.

The face image reconstruction process according to the embodiment can reconstruct high-quality face images so that the many small, distant faces contained in footage captured in a high-complexity space by the camera of a terminal, such as a smartphone, wearable glasses, or a closed-circuit television (CCTV) camera, can be recognized accurately.

The face image reconstruction apparatus 100 according to the embodiment may generate a reconstructed face image from an input face image by executing the face image reconstruction method according to the embodiment.

The face image reconstruction apparatus 100 can improve the quality of a low-quality face image and reconstruct it into a high-quality face image so that a face recognition algorithm can accurately recognize small faces captured from a long distance.

To this end, the face image reconstruction apparatus 100 according to the embodiment may provide a deep neural network (DNN)-based identity restoration model (Identity Clarification Network; ICN).

In one example, the identity restoration model introduces a model structure and a training objective function (training loss function) that reconstruct face images so as to improve the accuracy of face recognition by a face recognition algorithm.

In one example, the face image reconstruction apparatus 100 may train the identity restoration model using training data provided from the server 200 through the network 300. In one example, the face image reconstruction apparatus 100 may transmit the trained identity restoration model to the server 200 or another terminal device through the network 300.

In one example, the face image reconstruction apparatus 100 may receive a pre-trained identity restoration model through the network 300. For example, the face image reconstruction apparatus 100 may receive, through the network 300, an identity restoration model trained on the server 200 or another terminal device.

The face image reconstruction apparatus 100 may execute the trained identity restoration model to reconstruct a low-quality face image contained in an input image into a high-quality face image. Here, the face image reconstruction apparatus 100 may capture the input image directly or receive it from the server 200 or another terminal device through the network 300.

The face image reconstruction apparatus 100 may be implemented in a terminal or in the server 200. Here, the terminal may be, but is not limited to, a user-operated desktop computer, smartphone, laptop, tablet PC, smart TV, mobile phone, personal digital assistant (PDA), digital camera, home appliance, or other mobile or non-mobile computing device. The terminal may also be a wearable device, such as a watch, glasses, a hair band, or a ring, equipped with communication and data processing functions.

In one example, the terminal or the server 200 may reconstruct a face image contained in an input image by running an application (app) that executes the face image reconstruction method according to the embodiment.

The server 200 may train the identity restoration model by analyzing the training data and provide the trained identity restoration model to the face image reconstruction apparatus 100 through the network 300. In another example, the face image reconstruction apparatus 100 may train the identity restoration model on-device, without a connection to the server 200.

The network 300 may be any suitable communication network, including wired and wireless networks such as a local area network (LAN), a wide area network (WAN), the Internet, an intranet, and an extranet; mobile networks such as cellular, 3G, LTE, 5G, and WiFi networks; ad hoc networks; and combinations thereof.

The network 300 may include connections of network elements such as hubs, bridges, routers, switches, and gateways. The network 300 may include one or more connected networks, for example a multi-network environment, including public networks such as the Internet and private networks such as a secure enterprise private network. Access to the network 300 may be provided through one or more wired or wireless access networks.

Hereinafter, the face image reconstruction method and apparatus according to the embodiment are described in more detail with reference to FIGS. 2 to 7.

FIG. 2 is a block diagram of a face image reconstruction apparatus according to an embodiment.

The face image reconstruction apparatus 100 according to the embodiment may include a memory 120 and a processor 110. This configuration is exemplary; the face image reconstruction apparatus 100 may include only some of the components shown in FIG. 2, or may additionally include components not shown in FIG. 2 that are needed for the apparatus to operate.

The processor 110 is a kind of central processing unit and can control the operation of the face image reconstruction apparatus 100 by executing one or more instructions stored in the memory 120.

The processor 110 may include any type of device capable of processing data. The processor 110 may refer to, for example, a hardware-embedded data processing device having circuitry physically structured to perform functions expressed as code or instructions contained in a program.

Examples of such hardware-embedded data processing devices include, but are not limited to, a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA). The processor 110 may include one or more processors.

The face image reconstruction apparatus 100 may include the memory 120, which stores an identity restoration model including a generator and a discriminator that stands in the adversarial relationship of a generative adversarial network with the generator, and the processor 110, which is configured to train the identity restoration model based on training data that includes a face image and a ground-truth face image for the face image.

To train the identity restoration model, the processor 110 may perform a generation task of executing the generator to generate a reconstructed face image in which the identity of the face appearing in the face image is restored.

To train the identity restoration model, the processor 110 may be configured to perform a discrimination task of executing the discriminator to discriminate, based on the ground-truth face image, the face image reconstructed by the generator.

In one example, the generator may include a face landmark predictor and a face upsampler. To perform the generation task, the processor 110 may be configured to execute the face landmark predictor to predict a plurality of face landmarks based on the face image, and to execute the face upsampler to upsample the face image using the plurality of face landmarks.

In one example, the generator may further include an intermediate image generator that includes a plurality of residual blocks. To perform the generation task, the processor 110 may be configured to generate, using the intermediate image generator, an intermediate image in which the quality of the face image is improved; to execute the face landmark predictor to predict a plurality of face landmarks based on the intermediate image; and to execute the face upsampler to upsample the intermediate image using the plurality of face landmarks predicted from the intermediate image.
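
The data flow through the generator described above can be illustrated with the following minimal Python sketch. It is not the disclosed network: the three stages are passed in as plain callables, and the stand-in functions `identity_stage`, `corner_landmarks`, and `nearest_upsample_2x` (as well as the name `run_generator` and the list-of-rows image type) are hypothetical placeholders for the learned intermediate image generator, face landmark predictor, and face upsampler.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]               # grayscale image as rows of pixel values
Landmarks = List[Tuple[float, float]]   # (x, y) landmark coordinates


def run_generator(
    face: Image,
    intermediate_generator: Callable[[Image], Image],
    landmark_predictor: Callable[[Image], Landmarks],
    face_upsampler: Callable[[Image, Landmarks], Image],
) -> Tuple[Image, Landmarks]:
    """Chain the three generator stages in the order the text describes."""
    intermediate = intermediate_generator(face)               # quality-improved intermediate image
    landmarks = landmark_predictor(intermediate)              # landmarks predicted from that image
    reconstructed = face_upsampler(intermediate, landmarks)   # landmark-guided upsampling
    return reconstructed, landmarks


# --- Hypothetical stand-ins for the learned sub-networks (illustration only) ---
def identity_stage(img: Image) -> Image:
    # A real intermediate image generator would apply residual blocks here.
    return [row[:] for row in img]


def corner_landmarks(img: Image) -> Landmarks:
    # A real predictor would output facial key points; this returns the corners.
    h, w = len(img), len(img[0])
    return [(0.0, 0.0), (w - 1.0, 0.0), (0.0, h - 1.0), (w - 1.0, h - 1.0)]


def nearest_upsample_2x(img: Image, landmarks: Landmarks) -> Image:
    # A real upsampler would condition on the landmarks; this stub ignores them
    # and simply doubles each pixel horizontally and vertically.
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


low_quality = [[0.1, 0.2], [0.3, 0.4]]
reconstructed, landmarks = run_generator(
    low_quality, identity_stage, corner_landmarks, nearest_upsample_2x
)
```

Passing the stages as arguments keeps the ordering (intermediate image, then landmarks, then upsampling) explicit while leaving the actual networks unspecified.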

In one example, the identity restoration model may further include a face feature extractor. To train the identity restoration model, the processor 110 may be configured to execute the face feature extractor to extract a feature map of the face image reconstructed by the generator and a feature map of the ground-truth face image.

In one example, to train the identity restoration model, the processor 110 may be configured to compute a training objective function and to train the generator and the discriminator alternately so as to minimize the value of the training objective function.
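
The alternating schedule can be sketched as below. This is only an illustration under stated assumptions: the function name `train_alternating`, the one-update-each-per-batch schedule, the choice to update the discriminator first, and the dummy step functions are not taken from the embodiment, which states only that the generator and the discriminator are trained alternately to minimize the training objective function.

```python
from typing import Callable, Iterable, List, Tuple


def train_alternating(
    batches: Iterable[object],
    discriminator_step: Callable[[object], float],
    generator_step: Callable[[object], float],
) -> List[Tuple[str, float]]:
    """Alternate one discriminator update and one generator update per batch."""
    history: List[Tuple[str, float]] = []
    for batch in batches:
        # Discriminator update: generator weights are held fixed (assumed order).
        history.append(("D", discriminator_step(batch)))
        # Generator update: discriminator weights are held fixed.
        history.append(("G", generator_step(batch)))
    return history


# Dummy step functions (hypothetical): a real step would compute the
# corresponding objective on the batch, backpropagate, and update weights,
# then return the loss value.
def dummy_discriminator_step(batch: object) -> float:
    return 1.0


def dummy_generator_step(batch: object) -> float:
    return 2.0


history = train_alternating(
    ["batch0", "batch1"], dummy_discriminator_step, dummy_generator_step
)
print([name for name, _ in history])
```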

Here, the training objective function may include a first objective function that includes a GAN loss function for the generator, and a second objective function based on a GAN loss function for the discriminator.

The first objective function may further include a pixel reconstruction accuracy function between the reconstructed face image and the ground-truth face image, a prediction accuracy function for the face landmarks predicted in the task of generating the reconstructed face image, and a face feature similarity function between the reconstructed face image and the ground-truth face image.
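
One way to assemble the two objective functions is sketched below. The embodiment names the terms but, in this passage, not their formulas, so every concrete choice here is an assumption: mean squared error for the pixel and landmark terms, cosine distance for the feature term, the standard GAN log losses (non-saturating form for the generator), the weights `w_pixel`, `w_landmark`, and `w_feature`, and the use of reference landmarks `gt_landmarks`. Images, landmarks, and feature maps are flattened to plain lists of floats.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def generator_objective(
    d_fake: float,                       # discriminator score D(G(x)) in (0, 1)
    recon: Sequence[float], gt: Sequence[float],
    pred_landmarks: Sequence[float], gt_landmarks: Sequence[float],
    recon_feat: Sequence[float], gt_feat: Sequence[float],
    w_pixel: float = 1.0, w_landmark: float = 1.0, w_feature: float = 1.0,
) -> float:
    """First objective: generator GAN loss plus three weighted auxiliary terms."""
    gan_loss = -math.log(d_fake)         # assumed non-saturating generator loss
    return (
        gan_loss
        + w_pixel * mse(recon, gt)                        # pixel reconstruction accuracy
        + w_landmark * mse(pred_landmarks, gt_landmarks)  # landmark prediction accuracy
        + w_feature * cosine_distance(recon_feat, gt_feat)  # face feature similarity
    )


def discriminator_objective(d_real: float, d_fake: float) -> float:
    """Second objective: standard GAN loss for the discriminator (assumed form)."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


g_loss = generator_objective(
    d_fake=0.5,
    recon=[0.2, 0.4], gt=[0.2, 0.4],
    pred_landmarks=[1.0, 2.0], gt_landmarks=[1.0, 2.0],
    recon_feat=[3.0, 4.0], gt_feat=[3.0, 4.0],
)
d_loss = discriminator_objective(d_real=0.5, d_fake=0.5)
```

With a perfect reconstruction the three auxiliary terms vanish and only the GAN term remains, which makes the contribution of each term easy to inspect in isolation.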

In one example, the processor 110 may be configured to perform a second training that fine-tunes the identity restoration model based on second training data that includes a face image of a search target and a reference face image for the face image of the search target.

To perform the second training, the processor 110 may be configured to perform the generation task and the discrimination task based on the second training data.
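
The relationship between the first training and the fine-tuning second training can be illustrated with a deliberately tiny example. Nothing here is the ICN: a single scalar weight is fitted by gradient descent on a squared error, and the names `train`, `general_data`, and `target_data`, the learning rates, and the step counts are all hypothetical. The only point is that the second training starts from the weights produced by the first training and continues on the second training data.

```python
from typing import Sequence


def train(weight: float, data: Sequence[float], lr: float, steps: int) -> float:
    """Gradient descent on mean((weight - y)^2) over the data."""
    mean_y = sum(data) / len(data)
    for _ in range(steps):
        grad = 2.0 * (weight - mean_y)   # derivative of the mean squared error
        weight -= lr * grad
    return weight


general_data = [1.0, 3.0]   # stands in for the first training data
target_data = [2.5, 3.5]    # stands in for the second training data (search target)

# First training: start from an arbitrary initial weight.
pretrained = train(0.0, general_data, lr=0.1, steps=100)

# Second training (fine-tuning): start from the pretrained weight and take
# fewer, smaller steps on the second training data.
fine_tuned = train(pretrained, target_data, lr=0.01, steps=50)
```

The fine-tuned weight moves from the first-training optimum toward the optimum of the second training data without being re-initialized.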

메모리(120)는 실시예에 따른 얼굴 이미지 재구성 과정을 실행하기 위한 하나 이상의 명령을 포함하는 프로그램을 저장할 수 있다. 프로세서(110)는 메모리(120)에 저장된 프로그램, 명령어들에 기반하여 실시예에 따른 얼굴 이미지 재구성 과정을 실행할 수 있다.The memory 120 may store a program including one or more commands for executing a facial image reconstruction process according to an embodiment. The processor 110 may execute a facial image reconstruction process according to an embodiment based on programs and instructions stored in the memory 120.

메모리(120)는 신원 복원 모델(ICN) 및 신원 복원 모델(ICN)에 의한 얼굴 이미지 재구성을 위한 연산 과정에서 발생하는 중간 데이터 및 연산 결과 등을 더 저장할 수 있다.The memory 120 may further store the identity restoration model (ICN) and intermediate data and calculation results generated during the calculation process for facial image reconstruction by the identity restoration model (ICN).

메모리(120)는 내장 메모리 및/또는 외장 메모리를 포함할 수 있으며, DRAM, SRAM, 또는 SDRAM 등과 같은 휘발성 메모리, OTPROM(one time programmable ROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND 플래시 메모리, 또는 NOR 플래시 메모리 등과 같은 비휘발성 메모리, SSD, CF(compact flash) 카드, SD 카드, Micro-SD 카드, Mini-SD 카드, Xd 카드, 또는 메모리 스틱(memory stick) 등과 같은 플래시 드라이브, 또는 HDD와 같은 저장 장치를 포함할 수 있다. 메모리(120)는 자기 저장 매체(magnetic storage media) 또는 플래시 저장 매체(flash storage media)를 포함할 수 있으나, 이에 한정되는 것은 아니다.Memory 120 may include internal memory and/or external memory, such as volatile memory such as DRAM, SRAM, or SDRAM, one time programmable ROM (OTPROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, and NAND. Non-volatile memory such as flash memory or NOR flash memory, flash drives such as SSD, compact flash (CF) card, SD card, Micro-SD card, Mini-SD card, Xd card, or memory stick, etc. Alternatively, it may include a storage device such as an HDD. The memory 120 may include, but is not limited to, magnetic storage media or flash storage media.

실시예에 따른 얼굴 이미지 재구성 장치(100)는 통신부(130)를 더 포함할 수 있다.The facial image reconstruction apparatus 100 according to the embodiment may further include a communication unit 130.

The communication unit 130 includes a communication interface for transmitting and receiving data of the facial image reconstruction apparatus 100. The communication unit 130 may provide the facial image reconstruction apparatus 100 with various wired and wireless communication paths and thereby connect it to the network 300 shown in FIG. 1.

얼굴 이미지 재구성 장치(100)는 통신부(130)를 통해 입력 이미지, 학습 데이터, 제 2 학습 데이터, 중간 이미지 및 재구성된 이미지 등을 송/수신할 수 있다. 통신부(130)는 예를 들어 각종 무선 인터넷 모듈, 근거리 통신 모듈, GPS 모듈, 이동 통신을 위한 모뎀 등에서 적어도 하나 이상을 포함하도록 구성될 수 있다.The facial image reconstruction device 100 may transmit/receive input images, learning data, second learning data, intermediate images, and reconstructed images through the communication unit 130. The communication unit 130 may be configured to include at least one of various wireless Internet modules, short-range communication modules, GPS modules, modems for mobile communication, etc., for example.

얼굴 이미지 재구성 장치(100)는 프로세서(110), 메모리(120) 및 통신부(130) 간에 물리적/논리적 연결 경로를 제공하는 버스(140)를 더 포함할 수 있다.The facial image reconstruction apparatus 100 may further include a bus 140 that provides a physical/logical connection path between the processor 110, the memory 120, and the communication unit 130.

도 3은 실시예에 따른 얼굴 이미지 재구성 방법의 흐름도이다.Figure 3 is a flowchart of a facial image reconstruction method according to an embodiment.

The facial image reconstruction method according to the embodiment may include a step (S1) of acquiring learning data that includes a face image and a correct face image for that face image, and a step (S2) of training an identity restoration model (Identity Clarification Network; ICN) based on the learning data.

단계(S1)에서 프로세서(110)는 얼굴 이미지 및 얼굴 이미지에 대한 정답 얼굴 이미지를 포함하는 학습 데이터를 획득한다.In step S1, the processor 110 acquires learning data including a face image and a correct answer face image for the face image.

Here, the face image is the input data to the identity restoration model, and the correct face image for that input is the ground-truth data for the reconstructed face image (Reconstructed Face Image) that the identity restoration model generates from it.

예를 들어, 신원 복원 모델에 대한 입력 데이터인 얼굴 이미지는 저화질 얼굴 이미지이고, 정답 얼굴 이미지는 저화질 얼굴 이미지보다 고화질의 얼굴 이미지일 수 있다.For example, the face image that is the input data for the identity restoration model may be a low-quality face image, and the correct answer face image may be a higher-quality face image than the low-quality face image.

일 예에서 프로세서(110)는 얼굴 이미지에 대한 정답 얼굴 이미지를 다운샘플링(down sampling)하여 신원 복원 모델에 입력할 얼굴 이미지를 생성할 수 있다.In one example, the processor 110 may generate a face image to be input to an identity restoration model by down sampling the correct facial image for the face image.

For example, the processor 110 may downsample high-quality face photos of people with various identities to build a training dataset consisting of <high-quality correct face image, low-quality face image> pairs. For example, the processor 110 may use the FFHQ dataset, which consists of about 70,000 high-quality faces (see T. Karras et al., "A Style-Based Generator Architecture for Generative Adversarial Networks," CVPR 2019).
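The pair construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the average-pooling method, the factor of 8, the 128x128 image size, and the names `downsample` and `build_pairs` are all assumptions, and random arrays stand in for FFHQ images.

```python
import numpy as np

def downsample(hr: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an (H, W, C) image by an integer factor (assumed method)."""
    h, w, c = hr.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    # Group pixels into factor x factor tiles and average each tile.
    tiles = hr.reshape(h // factor, factor, w // factor, factor, c)
    return tiles.mean(axis=(1, 3))

def build_pairs(hr_images, factor=8):
    """Return a list of (high-quality, low-quality) training pairs."""
    return [(hr, downsample(hr, factor)) for hr in hr_images]

rng = np.random.default_rng(0)
faces = [rng.random((128, 128, 3)) for _ in range(4)]  # stand-ins for real photos
pairs = build_pairs(faces, factor=8)
```

Each pair keeps the untouched high-quality image as the correct answer and uses its downsampled copy as the model input.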

In one example, referring to FIG. 1, the processor 110 may receive a training dataset consisting of <high-quality correct face image, low-quality face image> pairs from the server 200 or from another terminal through the network 300.

단계(S2)에서 프로세서(110)는 단계(S1)에서 획득한 학습 데이터에 기반하여 신원 복원 모델을 학습한다.In step S2, the processor 110 learns an identity recovery model based on the training data obtained in step S1.

Step S2 includes a generation step (S21 in FIG. 5), in which the generator of the identity restoration model is executed to generate a reconstructed face image that restores the identity of the face shown in the face image, and a discrimination step (S22 in FIG. 5), in which the discriminator of the identity restoration model, which competes with the generator as in a generative adversarial network (GAN), is executed to discriminate the reconstructed face image based on the correct face image.

단계(S2)에서 프로세서(110)는 단계(S1)에서 구성된 학습 데이터셋을 기반으로 신원 복원 모델을 학습한다. 이 과정을 거쳐 학습된 신원 복원 모델은 임의의 저화질 입력을 신원 정보를 보존하면서 고화질 얼굴로 재구성할 수 있는 능력을 가지게 된다. 신원 정보는 대상의 얼굴의 시각적 특징에 의하여 부여되는 아이덴티티 정보를 의미한다. 단계(S2)에 대하여는 도 5를 참조하여 구체적으로 살펴본다.In step S2, the processor 110 learns an identity recovery model based on the training dataset constructed in step S1. The identity restoration model learned through this process has the ability to reconstruct arbitrary low-quality inputs into high-quality faces while preserving identity information. Identity information refers to identity information given by the visual characteristics of the target's face. Step (S2) will be looked at in detail with reference to FIG. 5.

실시예에 따른 얼굴 이미지 재구성 방법은, 탐색 대상의 얼굴 이미지 및 해당 탐색 대상의 얼굴 이미지에 대한 기준 얼굴 이미지를 포함하는 제 2 학습 데이터를 획득하는 단계(S3) 및 제 2 학습 데이터에 기반하여 단계(S2)에서 학습된 신원 복원 모델을 미세튜닝(fine-tuning)하는 제 2 학습 단계(S4)를 더 포함할 수 있다.The facial image reconstruction method according to the embodiment includes acquiring second learning data including a face image of a search target and a reference face image for the face image of the search target (S3) and a step based on the second learning data. It may further include a second learning step (S4) of fine-tuning the identity recovery model learned in (S2).

단계(S3)에서 프로세서(110)는 탐색 대상에 대한 <고화질 기준 얼굴 이미지(probe), 저화질 얼굴 이미지>로 구성된 제 2 학습 데이터셋을 구성할 수 있다. 예를 들어, 프로세서(110)는 탐색 대상별로 복수의 기준 얼굴 이미지 및 각 기준 얼굴 이미지에 대한 저화질 얼굴 이미지로 제 2 학습 데이터를 구성할 수 있다.In step S3, the processor 110 may configure a second learning dataset consisting of <a high-quality reference face image (probe) and a low-quality face image> for the search target. For example, the processor 110 may configure the second learning data with a plurality of reference face images for each search target and a low-quality face image for each reference face image.

일 예에서 단계(S3)에서 프로세서(110)는 단계(S1)에서 학습 데이터를 획득한 방식을 단계(S3)에서 제 2 학습 데이터 획득에 적용할 수 있다.In one example, in step S3, the processor 110 may apply the method of acquiring the learning data in step S1 to acquiring the second learning data in step S3.

단계(S4)에서 프로세서(110)는 제 2 학습 데이터에 기반하여 단계(S2)에서 학습된 신원 복원 모델을 미세튜닝하는 제 2 학습을 수행할 수 있다.In step S4, the processor 110 may perform second learning to fine-tune the identity recovery model learned in step S2 based on the second learning data.

일 예에서, 제 2 학습 단계는 단계(S3)에서 획득한 제 2 학습 데이터에 기반하여 후술할 도 5를 참조하여 생성 단계(S21) 및 판별 단계(S22)를 실행하는 단계를 포함할 수 있다.In one example, the second learning step may include executing a generating step (S21) and a determining step (S22) based on the second learning data obtained in step S3 with reference to FIG. 5, which will be described later. .

단계(S4)에서 프로세서(110)는 탐색 대상에 대한 기준 이미지에 기반한 제 2 학습 데이터를 활용하여, 단계(S2)에서 학습된 신원 복원 모델의 미세 튜닝을 위한 제 2 학습을 실행한다. 단계(S4)에서 제 2 학습된 신원 복원 모델은 탐색 대상에 특화되어, 탐색 대상의 저화질 얼굴을 더욱 탐색 대상과 유사하게 재구성할 수 있는 능력을 가지게 된다. 단계(S4)에 대하여는 도 7을 참조하여 살펴본다.In step S4, the processor 110 uses second learning data based on the reference image for the search target to perform second learning for fine tuning of the identity recovery model learned in step S2. In step S4, the second learned identity recovery model is specialized for the search target and has the ability to reconstruct the low-quality face of the search target to be more similar to the search target. Step S4 will be described with reference to FIG. 7.

Additionally, the facial image reconstruction method according to the embodiment may further include a step (S5) of recognizing the search target in an input video using the identity restoration model that underwent the second learning in step S4.

단계(S5)에서 프로세서(110)는 입력 영상에서 추출된 저화질 얼굴 이미지를 입력 데이터로 하여 학습된 신원 복원 모델을 실행하여 저화질 얼굴 이미지로부터 고화질 얼굴 이미지를 재구성할 수 있다.In step S5, the processor 110 may reconstruct a high-quality face image from the low-quality face image by executing the learned identity restoration model using the low-quality face image extracted from the input image as input data.

In addition, in step S5 the processor 110 may determine, based on the high-quality face image reconstructed by the identity restoration model, the similarity between the search target and at least one face region detected in the input video, and may determine, based on that similarity, whether the search target is among the detected face regions.
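The similarity decision in step S5 might look like the sketch below. The patent does not specify the similarity measure or the decision rule, so cosine similarity over face embeddings, the threshold of 0.5, and the names `cosine_similarity` and `find_target` are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_target(face_embeddings, probe_embedding, threshold=0.5):
    """Return (index, similarity) of the best match at or above the threshold.

    face_embeddings: embeddings of the reconstructed faces found in the input video.
    probe_embedding: embedding of the search target's reference image.
    Returns None when no face is similar enough.
    """
    scores = [cosine_similarity(e, probe_embedding) for e in face_embeddings]
    if not scores:
        return None
    best = int(np.argmax(scores))
    return (best, scores[best]) if scores[best] >= threshold else None

match = find_target([[0.0, 1.0], [1.0, 0.1]], [1.0, 0.0])
```

In practice the threshold would be tuned on validation pairs, since it trades true positives against true negatives.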

이하에서 실시예에 따른 얼굴 이미지 재구성 방법에서 사용하는 신원 복원 모델 및 학습 구조에 대하여 도 4를 참조하여 살펴본다.Hereinafter, the identity restoration model and learning structure used in the facial image reconstruction method according to the embodiment will be described with reference to FIG. 4.

도 4는 실시예에 따른 신원 복원 모델 및 학습 구조를 설명하기 위한 도면이다.Figure 4 is a diagram for explaining an identity recovery model and learning structure according to an embodiment.

In the embodiment, to accurately recognize small faces captured from a distance, an identity restoration model (Identity Clarification Network; ICN) was designed as a deep neural network (DNN) that improves the quality of a low-quality input face and reconstructs it into a high-quality face.

The identity restoration model includes a deep neural network based on a generative adversarial network structure. It includes a generator (G) that generates a reconstructed face image (Reconstructed ŷ) from an input face image (LR), and a discriminator (D) that determines, based on the correct face image (Ground Truth y) for the input face image (LR), whether the reconstructed face image (Reconstructed ŷ) generated by the generator (G) corresponds to the correct face image (Ground Truth y).

The generator (G) and the discriminator (D) compete as in a generative adversarial network (GAN); a GAN loss function (L_GAN) is defined for the generator (G) and a GAN loss function (L_{Discriminator}) is defined for the discriminator (D).

The generator (G) includes a facial landmark predictor (Face Landmark Estimator) (G_FLE). The facial landmark predictor (G_FLE) may extract at least one facial landmark from the low-quality face image (LR) input to the generator (G). Here, the facial landmarks include, for example, facial contour information and information on the facial features (eyes, nose, and mouth).

A landmark accuracy function (L_landmark), described later, is defined based on the facial landmarks predicted by the facial landmark predictor (G_FLE) and the facial landmarks extracted from the correct face image.

Introducing the facial landmark predictor (G_FLE) into the identity restoration model can improve the accuracy with which the generator (G) reconstructs the high-quality face image (Reconstructed ŷ).

The generator (G) includes a face upsampler (Face Upsampler) (G_FUP). The face upsampler (G_FUP) generates a high-quality face image (Reconstructed ŷ) reconstructed from the low-quality face image (LR), based on at least one facial landmark extracted by the facial landmark predictor (G_FLE).

A pixel accuracy function (L_pixel), described later, is defined based on the pixel values of the face image reconstructed by the face upsampler (G_FUP) and those of the correct face image.

생성기(G)의 구조는 도 6을 참조하여 후술한다. 판별기(D)는 GAN 학습을 위한 잔차 블록(Residual block) 기반 구조를 포함하여 구성될 수 있다.The structure of the generator (G) will be described later with reference to FIG. 6. The discriminator (D) may be configured to include a residual block-based structure for GAN learning.

Additionally, the identity restoration model includes a face feature extractor (Face Feature Extractor). The face feature extractor extracts a feature map of the reconstructed face image (Reconstructed ŷ) and a feature map of the correct face image (Ground Truth y).

A facial feature similarity function (L_face), described later, is defined based on the feature map of the reconstructed face image (Reconstructed ŷ) and the feature map of the correct face image (Ground Truth y) extracted by the face feature extractor.

The face feature extractor may use, for example, a face recognition network with a residual-block-based ArcFace network structure, but is not limited thereto and may include various neural network structures for face recognition.

The facial image reconstruction apparatus 100 according to the embodiment trains the model by alternately minimizing a first objective function (L_total), which drives the generator (G) to reconstruct realistic faces, and a second objective function (L_{Discriminator}), which drives the discriminator (D) to distinguish faces reconstructed by the generator (G) from correct faces, so that the identity restoration model attains high reconstruction accuracy. This is described in steps S24 and S25 with reference to FIG. 5.

도 5는 실시예에 따른 얼굴 이미지 재구성 방법의 학습 과정에 대한 흐름도이다.Figure 5 is a flowchart of a learning process of a facial image reconstruction method according to an embodiment.

일 예에서 도 5에 도시된 단계(S21) 내지 단계(S25)는 도 3을 참조하여 단계(S2)의 학습 단계 또는 단계(S4)의 제 2 학습 단계에서 실행될 수 있다.In one example, steps S21 to S25 shown in FIG. 5 may be executed in the learning step of step S2 or the second learning step of step S4 with reference to FIG. 3 .

In step S21, the processor 110 may execute the generator (G) of the identity restoration model, described with reference to FIG. 4, to generate a reconstructed face image (Reconstructed ŷ) in which the identity of the face shown in the input face image (LR) is restored.

이하에서 도 6을 참조하여 생성기(G)의 네트워크 구조에 대하여 보다 상세히 살펴본다.Below, the network structure of the generator (G) will be examined in more detail with reference to FIG. 6.

도 6은 실시예에 따른 신원 복원 모델의 생성기의 네트워크 구조를 설명하기 위한 도면이다.FIG. 6 is a diagram illustrating the network structure of a generator of an identity recovery model according to an embodiment.

생성기(G)는 저화질 얼굴 이미지(LR)의 화질을 일차적으로 개선한 중간 이미지(IN)를 생성하고, 중간 이미지(IN)로부터 얼굴 랜드마크를 예측한 후 이를 활용하여 최종 고화질 얼굴(HR)을 출력할 수 있다.The generator (G) generates an intermediate image (IN) that primarily improves the quality of the low-quality face image (LR), predicts facial landmarks from the intermediate image (IN), and uses this to create the final high-quality face (HR). Can be printed.

In one example, the generator (G) includes an intermediate image generator (G_IN) comprising a neural network that generates an intermediate image (IN) from a low-quality input face image (LR); a facial landmark predictor (G_FLE) comprising a neural network that predicts facial landmarks from the intermediate image (IN); and a face upsampler (G_FUP) comprising a neural network that performs face upsampling based on the intermediate image (IN) and the facial landmarks to generate an output face image (HR). Here, the output face image (HR) corresponds to the reconstructed face image (Reconstructed ŷ) of FIG. 4.

생성기(G)는 다양한 이미지 처리에서 높은 정확도를 달성하는 잔차 블록(Residual block)(K. He et al., “Deep residual learning for image recognition,” CVPR 2016 참조)을 기본 블록 구조로 활용할 수 있다.The generator (G) can utilize the residual block (see K. He et al., “Deep residual learning for image recognition,” CVPR 2016), which achieves high accuracy in various image processing, as the basic block structure.

일 예에서 중간 이미지 생성기(G_IN)는 복수 개의 잔차 블록을 포함할 수 있다. 예를 들어 중간 이미지 생성기(G_IN)는 12개의 잔차 블록을 포함할 수 있다.In one example, the intermediate image generator (G_IN) may include a plurality of residual blocks. For example, the intermediate image generator (G_IN) may contain 12 residual blocks.

일 예에서 얼굴 랜드마크 예측기(G_FLE)는 적어도 하나의 적층 모래시계(Stacked Hourglass) 블록 구조 기반으로 설계할 수 있다. 예를 들어 얼굴 랜드마크 예측기(G_FLE)는 4개의 적층 모래시계 블록을 포함할 수 있다.In one example, the facial landmark predictor (G_FLE) may be designed based on at least one stacked hourglass block structure. For example, the facial landmark predictor (G_FLE) may include four stacked hourglass blocks.

일 예에서 얼굴 업샘플러(G_FUP)는 복수 개의 잔차 블록의 세트를 복수 개 포함할 수 있다. 예를 들어 얼굴 업샘플러(G_FUP)는 3 개의 잔차 블록으로 구성된 잔차 블록 세트를 2 개 포함할 수 있다.In one example, the face upsampler (G_FUP) may include a plurality of sets of residual blocks. For example, the face upsampler (G_FUP) may include two sets of residual blocks composed of three residual blocks.

중간 이미지 생성기(G_IN)에서 생성된 중간 이미지(IN)는 얼굴 업샘플러(G_FUP)에 입력되어 복수 개의 잔차 블록으로 구성된 잔차 블록 세트를 적어도 한 번 거친다. 이후 얼굴 랜드마크 예측기(G_FLE)로부터 예측된 얼굴 랜드마크 정보를 도입하여 얼굴 업샘플러(G_FUP)의 복수 개의 잔차 블록 세트 중 나머지 잔차 블록 세트를 수행하여 출력 얼굴 이미지(HR)를 생성한다.The intermediate image (IN) generated in the intermediate image generator (G_IN) is input to the face upsampler (G_FUP) and goes through a residual block set consisting of a plurality of residual blocks at least once. Afterwards, the facial landmark information predicted from the facial landmark predictor (G_FLE) is introduced and the remaining residual block sets among the plurality of residual block sets of the facial upsampler (G_FUP) are performed to generate an output facial image (HR).
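The data flow just described can be sketched as a composition of the three components. This is only an illustration of the ordering: the components below are shape-preserving placeholders rather than neural networks, channel concatenation as the way landmark information is introduced is an assumption, and the names `generator_forward`, `fup_first`, and `fup_rest` are hypothetical.

```python
import numpy as np

def generator_forward(lr, g_in, g_fle, fup_first, fup_rest):
    """Data flow of the generator: LR -> IN -> landmarks -> HR."""
    inter = g_in(lr)              # intermediate image (IN)
    landmarks = g_fle(inter)      # landmark heatmaps predicted from IN
    feats = fup_first(inter)      # first residual-block set of G_FUP
    # Assumption: landmark information is introduced by channel concatenation.
    fused = np.concatenate([feats, landmarks], axis=-1)
    hr = fup_rest(fused)          # remaining residual-block set(s) of G_FUP
    return inter, landmarks, hr

# Placeholder components (hypothetical stand-ins that do not change resolution).
N_LANDMARKS = 68
g_in = lambda x: x
g_fle = lambda x: np.zeros(x.shape[:2] + (N_LANDMARKS,))
fup_first = lambda x: x
fup_rest = lambda x: x[..., :3]

lr = np.ones((16, 16, 3))
inter, landmarks, hr = generator_forward(lr, g_in, g_fle, fup_first, fup_rest)
```

The point of the sketch is the ordering: landmarks are predicted from the intermediate image, and they enter the upsampler only after its first residual-block set has run.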

도 5로 돌아와서 단계(S21) 내지 단계(S25)에 대하여 살펴본다.Returning to FIG. 5, steps S21 to S25 will be examined.

단계(S21)에서 생성기(G)는 입력 얼굴 이미지(LR)로부터 얼굴 랜드마크를 예측한 후 이를 활용하여 출력 얼굴 이미지(HR)를 생성할 수 있다.In step S21, the generator (G) may predict facial landmarks from the input face image (LR) and then use them to generate the output face image (HR).

즉, 일 예에서 단계(S21)은 생성기(G)의 얼굴 랜드마크 예측기(G_FLE)를 실행하여 입력 얼굴 이미지(LR)에 기반하여 복수 개의 얼굴 랜드마크를 예측하는 단계 및 생성기(G)의 얼굴 업샘플러(G_FUP)를 실행하여 앞서 예측된 복수 개의 얼굴 랜드마크를 이용하여 입력 얼굴 이미지(LR)를 업샘플링하는 단계를 포함할 수 있다.That is, in one example, step S21 is a step of executing the facial landmark predictor (G_FLE) of the generator (G) to predict a plurality of facial landmarks based on the input face image (LR) and the face of the generator (G) It may include executing an upsampler (G_FUP) to upsample the input face image (LR) using a plurality of previously predicted facial landmarks.

단계(S21)에서 생성기(G)는 저화질 얼굴 이미지(LR)의 화질을 일차적으로 개선한 중간 이미지(IN)를 생성하고, 중간 이미지(IN)로부터 얼굴 랜드마크를 예측한 후 이를 활용하여 최종 고화질 얼굴(HR)을 출력할 수 있다.In step S21, the generator (G) generates an intermediate image (IN) that primarily improves the quality of the low-quality face image (LR), predicts facial landmarks from the intermediate image (IN), and uses this to generate the final high-quality image. The face (HR) can be printed.

즉, 일 예에서 단계(S21)은 복수 개의 잔차 블록을 포함하는 중간 이미지 생성기(G_IN)를 이용하여 입력된 얼굴 이미지(LR)의 화질을 개선한 중간 이미지(IN)를 생성하는 단계, 생성기(G)의 얼굴 랜드마크 예측기(G_FLE)를 실행하여 중간 이미지(IN)에 기반하여 복수 개의 얼굴 랜드마크를 예측하는 단계 및 생성기(G)의 얼굴 업샘플러(G_FUP)를 실행하여 얼굴 랜드마크 예측기(G_FLE)에서 예측된 복수 개의 얼굴 랜드마크를 이용하여 중간 이미지(IN)를 업샘플링하는 단계를 포함할 수 있다.That is, in one example, step S21 is a step of generating an intermediate image (IN) with improved image quality of the input face image (LR) using an intermediate image generator (G_IN) including a plurality of residual blocks, the generator ( Executing the facial landmark predictor (G_FLE) of G) to predict a plurality of facial landmarks based on the intermediate image (IN), and executing the facial upsampler (G_FUP) of the generator (G) to generate the facial landmark predictor ( It may include upsampling the intermediate image (IN) using a plurality of facial landmarks predicted in G_FLE).

In step S22, the processor 110 may execute the discriminator (D) of the identity restoration model, which competes with the generator (G) as in a generative adversarial network, to discriminate the reconstructed face image (Reconstructed ŷ) based on the correct face image (Ground Truth y).

In step S23, the processor 110 may execute the face feature extractor of the identity restoration model to extract a feature map of the reconstructed face image (Reconstructed ŷ) and a feature map of the correct face image (Ground Truth y).

Referring to FIG. 3, step S2 may further include a step (S24) of computing a learning objective function (training loss function) and a step (S25) of training the generator (G) and the discriminator (D) alternately so as to minimize the value of the learning objective function.

단계(S24)에서 프로세서(110)는 신원 복원 모델의 학습 목표 함수를 연산할 수 있다.In step S24, the processor 110 may calculate the learning target function of the identity recovery model.

In one example, the learning objective function of the identity restoration model may include a first objective function (L_total) that includes the GAN loss function (L_GAN) for the generator (G), and a second objective function based on the GAN loss function (L_{Discriminator}) for the discriminator (D).

In one example, the first objective function (L_total) may further include a pixel reconstruction accuracy function (L_pixel) between the reconstructed face image (HR) and the correct face image (Ground Truth y), a prediction accuracy function (L_landmark) for the facial landmarks predicted in the generation step (S21) of the reconstructed face image (HR), and a facial feature similarity function (L_face) between the reconstructed face image (HR) and the correct face image (Ground Truth y).

That is, the first objective function (L_total) may be defined based on the GAN loss function (L_GAN) for the generator (G), the pixel reconstruction accuracy function (L_pixel), the landmark prediction accuracy function (L_landmark), and the facial feature similarity function (L_face).
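As a rough sketch of how the four terms could be combined into the first objective function, the function below computes mean squared L2 distances for the pixel, landmark, and feature terms and adds a generator GAN term. The weights (all 1.0 here), the non-saturating form of the GAN term, and the names `l2_mean` and `generator_objective` are assumptions; the text does not give them.

```python
import numpy as np

def l2_mean(a, b) -> float:
    """Mean squared difference between two arrays of the same shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def generator_objective(hr, gt, lm_pred, lm_true, feat_hr, feat_gt, d_fake,
                        w_pixel=1.0, w_landmark=1.0, w_face=1.0, w_gan=1.0):
    """Weighted sum of the four generator-side terms (weights are hypothetical)."""
    l_pixel = l2_mean(hr, gt)               # pixel reconstruction accuracy
    l_landmark = l2_mean(lm_pred, lm_true)  # landmark prediction accuracy
    l_face = l2_mean(feat_hr, feat_gt)      # face feature similarity
    # Assumed non-saturating GAN term: -log D(G(x)), with D outputs in (0, 1].
    d_fake = np.clip(np.asarray(d_fake, dtype=float), 1e-12, 1.0)
    l_gan = float(-np.mean(np.log(d_fake)))
    total = (w_pixel * l_pixel + w_landmark * l_landmark
             + w_face * l_face + w_gan * l_gan)
    parts = {"pixel": l_pixel, "landmark": l_landmark, "face": l_face, "gan": l_gan}
    return total, parts
```

Returning the individual terms alongside the total makes it easier to see which term dominates when tuning the weights.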

The first objective function (L_total) and the second objective function (L_{Discriminator}) are examined in more detail below.

실시예에 따른 얼굴 이미지 재구성 방법은 신원 복원 모델이 고화질 얼굴 이미지(HR)을 재구성하는 동시에 입력 얼굴 이미지에 대응하는 대상의 신원 정보를 보존하기 위하여, 다양한 학습 목표 함수들을 도입하였다.The face image reconstruction method according to the embodiment introduces various learning objective functions so that the identity restoration model reconstructs a high-quality face image (HR) while simultaneously preserving the identity information of the subject corresponding to the input face image.

(1) Pixel reconstruction accuracy (L_pixel): an L2 distance function between the pixel values of the face (HR) reconstructed by the generator (G) and those of the original face, that is, the correct face image (Ground Truth).

H and W denote the height and width of the correct face image (Ground Truth). The pixel terms are the (i,j)-th pixel values of the correct image (Ground Truth y), of the intermediate image (IN), and of the final reconstructed face image (HR), respectively.
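The equation for this term did not survive extraction. One plausible form consistent with the definitions above is given below; the symbols $y_{i,j}$, $\hat{y}^{IN}_{i,j}$, and $\hat{y}_{i,j}$ for the correct, intermediate, and final pixel values, and the $1/(HW)$ normalization, are assumptions rather than the patent's published notation.

```latex
\[
L_{pixel} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W}
  \left[ \left( y_{i,j} - \hat{y}^{IN}_{i,j} \right)^{2}
       + \left( y_{i,j} - \hat{y}_{i,j} \right)^{2} \right]
\]
```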

(2) 얼굴 랜드마크 예측 정확도: 생성기(G)의 재구성된 얼굴 이미지 생성 과정에서 예측한 랜드마크 좌표와 정답 간의 L2 거리 함수(2) Facial landmark prediction accuracy: L2 distance function between the landmark coordinates predicted during the reconstructed face image generation process of the generator (G) and the correct answer.

N denotes the total number of facial landmarks, and the two heatmap terms are, respectively, the correct and the predicted probability of the n-th landmark at the (i,j)-th pixel. For example, a total of 68 landmarks corresponding to the eyes, nose, mouth, and facial contour may be used.
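The equation for the landmark term is also missing from the extracted text. A form consistent with the definitions above might be the following, where $z^{n}_{i,j}$ and $\hat{z}^{n}_{i,j}$ are assumed symbols for the correct and predicted landmark probabilities, and the $1/N$ normalization is likewise an assumption.

```latex
\[
L_{landmark} = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{H} \sum_{j=1}^{W}
  \left( z^{n}_{i,j} - \hat{z}^{n}_{i,j} \right)^{2}
\]
```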

(3) 얼굴 인식 피쳐(feature) 유사도: 재구성된 얼굴(HR)과 원본 얼굴(Ground Truth)의 얼굴 인식 네트워크 출력 피쳐 간 L2 거리 함수(3) Face recognition feature similarity: L2 distance function between the face recognition network output features of the reconstructed face (HR) and the original face (ground truth)

The two feature vectors in this term are the outputs of the face feature extractor for the correct face (Ground Truth y) and for the high-quality face reconstructed by the identity restoration model (Reconstructed ŷ), respectively. d is the number of output features, for example 512 in total.
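The feature similarity equation is missing as well. Under the definitions above it could be written as follows; $\phi$ is an assumed symbol for the face feature extractor (the original symbol was lost), and the $1/d$ normalization is an assumption.

```latex
\[
L_{face} = \frac{1}{d} \sum_{k=1}^{d}
  \left( \phi(y)_{k} - \phi(\hat{y})_{k} \right)^{2}
\]
```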

(4) Generative adversarial network (GAN) learning objective function: a competition function between the generator (G) and the discriminator (D) that makes the reconstructed face (Reconstructed ŷ) look realistic.

G는 생성기, D는 판별기를 나타낸다.G represents the generator and D represents the discriminator.
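The GAN equations themselves are not present in the extracted text. For orientation, a commonly used pair of GAN losses is shown below, with $x$ denoting the low-quality input (LR) and $y$ the correct face. This is the standard non-saturating formulation and is offered as an assumption; the patent may use a different variant.

```latex
\[
L_{GAN} = -\,\mathbb{E}_{x}\left[ \log D\big(G(x)\big) \right]
\]
\[
L_{Discriminator} = -\,\mathbb{E}_{y}\left[ \log D(y) \right]
  - \mathbb{E}_{x}\left[ \log\big(1 - D(G(x))\big) \right]
\]
```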

종합적으로, 수학식 1 내지 수학식 4에 정의된 목표 함수들을 수학식 5와 같이 통합하여 신원 복원 모델의 학습에 사용한다.Overall, the goal functions defined in Equations 1 to 4 are integrated as shown in Equation 5 and used to learn the identity recovery model.
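Equation 5 is missing from the extracted text. A weighted sum is the usual way to integrate such terms, so one plausible form is the following; the weights $\lambda$ are hypothetical and their values are not given in the text.

```latex
\[
L_{total} = \lambda_{pixel} L_{pixel} + \lambda_{landmark} L_{landmark}
          + \lambda_{face} L_{face} + \lambda_{GAN} L_{GAN}
\]
```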

Learning alternately minimizes L_total (the objective function that drives the generator (G) to reconstruct realistic faces) and L_{Discriminator} (the objective function that drives the discriminator (D) to distinguish the face ŷ reconstructed by the generator (G) from the correct face y), so that the identity restoration model attains high reconstruction accuracy.
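The alternating schedule can be sketched as a loop that updates the discriminator and the generator in turn. This shows only the control flow: `generator_step` and `discriminator_step` are hypothetical callbacks that would each perform one optimizer update and return a loss value, and the one-to-one update ratio and the order within an iteration are assumptions.

```python
def train_alternating(num_iters, generator_step, discriminator_step):
    """Alternate one discriminator update and one generator update per iteration."""
    history = []
    for _ in range(num_iters):
        d_loss = discriminator_step()  # minimize L_{Discriminator} with G fixed
        g_loss = generator_step()      # minimize L_total with D fixed
        history.append((d_loss, g_loss))
    return history

# Toy callbacks that only record the call order (stand-ins for real updates).
call_log = []

def fake_discriminator_step():
    call_log.append("D")
    return 0.0

def fake_generator_step():
    call_log.append("G")
    return 0.0

history = train_alternating(2, fake_generator_step, fake_discriminator_step)
```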

For example, training with this learning objective function on a dataset of 70,000 images in total takes about one day on an NVIDIA RTX 2080Ti GPU.

도 7은 실시예에 따른 얼굴 이미지 재구성 과정의 실행 결과를 예시적으로 보여주는 도면이다.Figure 7 is a diagram exemplarily showing the execution result of a facial image reconstruction process according to an embodiment.

(a)는 주어진 정답 얼굴 이미지(Ground Truth y), (b)는 (a)를 다운샘플링하여 생성한 입력 얼굴 이미지(LR), (c)는 도 3을 참조하여 단계(S2)의 학습을 완료한 신원 복원 모델의 생성기(G)를 실행한 결과로 획득한 베이스라인 이미지이고, (d)는 도 3을 참조하여 단계(S4)의 제 2 학습을 완료한 신원 복원 모델의 생성기(G)를 실행한 결과로 미세튜닝된 이미지이다.(a) is the given correct face image (Ground Truth y), (b) is the input face image (LR) generated by downsampling (a), and (c) is the learning of step (S2) with reference to Figure 3. (d) is a baseline image obtained as a result of executing the generator (G) of the completed identity recovery model, and (d) is the generator (G) of the identity recovery model that completed the second learning of step (S4) with reference to FIG. 3. This is a fine-tuned image as a result of running .

도 3을 참조하여 단계(S3) 및 단계(S4)의 제 2 학습 과정은 주어진 탐색 대상의 기준 이미지(probe)를 활용한 미세-튜닝(fine-tuning) 기법을 제공한다.Referring to FIG. 3, the second learning process of steps S3 and S4 provides a fine-tuning technique using a reference image (probe) of a given search target.

An analysis of why face recognition accuracy drops markedly for low-quality faces showed that the dominant factor is the accuracy of judging two faces of the same person to be the same (True Positive), which is lower than the accuracy of judging two faces of different people to be different (True Negative).
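The two accuracies compared above can be computed from labelled face pairs as in the sketch below. The threshold-based decision on a similarity score and the name `verification_rates` are assumptions made for illustration; the example scores are made up.

```python
def verification_rates(similarities, same_person, threshold):
    """Return (true positive rate, true negative rate) for face-pair decisions.

    similarities: similarity score for each face pair.
    same_person:  True when the pair shows the same person.
    A pair is judged "same" when its score is at or above the threshold.
    """
    tp = fn = tn = fp = 0
    for score, same in zip(similarities, same_person):
        accepted = score >= threshold
        if same and accepted:
            tp += 1
        elif same:
            fn += 1
        elif accepted:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr

tpr, tnr = verification_rates([0.9, 0.4, 0.2, 0.7], [True, True, False, False], 0.5)
```

A low true positive rate with a high true negative rate matches the failure mode described above: the same person's low-quality face is too often rejected.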

To address this, in steps S3 and S4 of FIG. 3, high-quality reference face images (probes) of the search target are collected and downsampled to build second training data consisting of <high-quality ground truth, low-quality input> pairs. Using this data, a second training is executed that fine-tunes the identity clarification model based on the training loss function described above for step S24 of FIG. 5.
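The construction of <high-quality ground truth, low-quality input> pairs can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and uses simple average pooling as the downsampling operator; the function names `downsample` and `build_pairs`, the pooling method, and the default factor are illustrative assumptions, not details given in this description.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed layout)


def downsample(image: Image, factor: int) -> Image:
    """Average-pool a grayscale image by an integer factor (assumed operator)."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            # Mean of the factor x factor block whose top-left corner is (r, c).
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def build_pairs(probes: List[Image], factor: int = 2) -> List[Tuple[Image, Image]]:
    """Return (high-quality ground truth, low-quality input) pairs from probes."""
    return [(hq, downsample(hq, factor)) for hq in probes]
```

Each probe is kept unchanged as the ground truth, and only the downsampled copy serves as the model input during fine-tuning.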

The second training is specialized for the search target, and because the training set is relatively small, it completes in a short time (e.g., within one hour on an NVIDIA RTX 2080Ti GPU).

The second training technique using reference images (probes) of the search target improved the true positive rate by about 78%.

The proposed identity clarification model introduces a model structure and a training loss function for reconstruction that improves face recognition accuracy, and proposes a second training based on a fine-tuning technique that uses reference images (probes) of the search target.

Facial image reconstruction using the video identity clarification model (VICN) is described below.

The facial image reconstruction method using the video identity clarification model (VICN) according to an embodiment extends the method, described above with reference to FIG. 3, of reconstructing a face image contained in an input image using the identity clarification model (ICN), so that it can process multiple face images contained in an input video.

That is, the facial image reconstruction method using the VICN according to the embodiment additionally includes a configuration for reconstructing, in high quality, low-quality face images contained in a series of image frames. The description below focuses on these added and extended configurations, with reference to FIGS. 8 to 11.

The facial image reconstruction method using the video identity clarification model (VICN) according to the embodiment may be executed by the facial image reconstruction device 100 including the processor 110, described above with reference to FIG. 2.

FIG. 8 is a flowchart of a facial image reconstruction method using a video identity clarification model according to an embodiment.

The facial image reconstruction method using the video identity clarification model according to the embodiment may include a step (SS1) of acquiring, by the processor 110, training data including at least one face image tracked from a series of frames of an input video and a ground-truth face image for the at least one face image, and a step (SS2) of training the video identity clarification model (VICN) based on the acquired training data.

In step SS1, the processor 110 acquires training data including at least one face image tracked from a series of frames of the input video and a ground-truth face image for the at least one face image. This is described below with reference to FIGS. 9 and 10.

In step SS2, the base VICN model is trained on the training data set acquired in step SS1. A VICN trained through this process is able to reconstruct an arbitrary low-quality face input sequence into a high-quality face while preserving identity information.

Step SS2 includes a generation step (corresponding to step S21 of FIG. 5) of executing the generator (G) of the VICN to generate a reconstructed face image in which the identity of the face appearing in the at least one face image is restored, and a discrimination step (corresponding to step S22 of FIG. 5) of executing the discriminator (D) of the VICN, which competes with the generator (G) in a generative adversarial network (GAN), to discriminate the reconstructed face image based on the ground-truth face image.

The facial image reconstruction method according to the embodiment may further include a step (SS3) of acquiring second training data including at least one face image of a search target and a reference face image for the at least one face image of the search target, and a second training step (SS4) of fine-tuning, based on the second training data, the VICN trained in step SS2.

Steps SS3 and SS4 correspond to steps S3 and S4 described above with reference to FIG. 3, modified to process at least one image of the search target.

In step SS3, reference images (probes) of the search target are collected from video in which the search target appears, and a training data set consisting of <high-quality ground-truth face sequence, low-quality input face> pairs is built through the same process as in step SS1.

In step SS4, the fine-tuning process of the VICN is executed using the training data set acquired in step SS3. A VICN trained in this way is specialized for the search target and can reconstruct low-quality faces of the search target more accurately.

The facial image reconstruction method according to the embodiment may further include a step (SS5) of recognizing the search target in an input video using the trained VICN. Step SS5 performs step S5, described above with reference to FIG. 3, using the VICN.

In step SS5, faces are detected in the real-time video input and, using the facial landmark tracking technique of step SS1 described below with reference to FIG. 9, the low-quality face sequences detected in the real-time video input are reconstructed into high-quality faces.

FIG. 9 is a diagram illustrating the face tracking process of facial image reconstruction using a video identity clarification model according to an embodiment.

In step SS1 of FIG. 8, the processor 110 performs landmark-based face tracking on the input video and acquires a training data set.

For example, in step SS1 the processor 110 detects the faces of people with various identities in video frames capturing an urban space. To group face images detected across consecutive frames, which contain camera movement, face movement within the scene, and so on, by identity, the processor 110 may use a facial landmark-based tracking technique.

The processor 110 may downsample the face images obtained through the facial landmark-based tracking technique to build a training data set consisting of <high-quality ground-truth face sequence, low-quality input face> pairs. For example, a high-definition video dataset such as the WILDTRACK dataset (Tatjana Chavdarova et al., "WILDTRACK: A Multi-Camera HD Dataset for Dense Unscripted Pedestrian Detection," CVPR 2018.) can be used as the source of the training data set.

The VICN, described below with reference to FIG. 11, takes as input a sequence of face frames of a person with the same identity. To this end, faces are detected in each frame of the input video; however, across consecutive frames the face of the same person may appear at different positions over time because of camera movement, movement of subjects in the scene, and so on.

Therefore, in step SS1 the face recognition method according to the embodiment uses a landmark-based face tracking technique that accounts for face movement between consecutive frames of the input video, in order to group the faces detected in each frame by identity. FIG. 9 shows the overall operational structure of this face tracking process.

The face tracking process operates in the following order.

(i) Step (SS11) - Face Detection: First, faces are detected in each of two consecutive input frames (frame t, frame t+1).

(ii) Step (SS12) - Landmark Estimation: Landmarks (e.g., the positions of the eyes, nose, and mouth) are extracted as feature points from each detected face. For this purpose, the RetinaFace detector (J. Deng et al., "RetinaFace: Single-stage Dense Face Localisation in the Wild," CVPR 2020.), which performs face detection and landmark extraction at the same time, can be used, for example.

(iii) Step (SS13) - Optical Flow Tracking: Next, the optical flow is computed to find corresponding landmarks between frame t and frame t+1. For example, the Lucas-Kanade optical flow tracker (B. D. Lucas, T. Kanade et al., "An iterative image registration technique with an application to stereo vision." Vancouver, British Columbia, 1981.) can be used.

(iv) Step (SS14) - Motion Compensation: The optical flows of the landmark feature points (e.g., eyes, nose, mouth) computed in step SS13 are averaged. This average is taken to be the motion of the object between the frames, and the coordinates of the face bounding box in frame t are shifted accordingly.

(v) Step (SS15) - IoU-based Bounding Box Matching: The IoU (Intersection over Union) between the frame t bounding boxes that have gone through step SS14 and the bounding boxes of frame t+1 is computed to measure how much they overlap. When two bounding boxes have an IoU at or above a certain value, the two faces are judged to have the same identity.
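Steps SS14 and SS15 above can be sketched as follows. This is a minimal illustration that assumes the per-landmark optical flow vectors from step SS13 are already available as (dx, dy) tuples and that boxes are given as (x1, y1, x2, y2). The names `compensate`, `iou`, and `match`, the threshold of 0.5, and the greedy best-match rule (which does not enforce one-to-one assignment) are assumptions made for illustration; the description only specifies "an IoU at or above a certain value".

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Flow = Tuple[float, float]               # (dx, dy) optical flow of one landmark


def compensate(box: Box, flows: List[Flow]) -> Box:
    """Step SS14: shift a frame-t box by the mean landmark optical flow."""
    dx = sum(f[0] for f in flows) / len(flows)
    dy = sum(f[1] for f in flows) / len(flows)
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(boxes_t: List[Box], flows_t: List[List[Flow]],
          boxes_t1: List[Box], threshold: float = 0.5) -> List[Optional[int]]:
    """Step SS15: for each frame-t box, return the index of the frame-(t+1)
    box with the highest IoU at or above the threshold, or None."""
    matches: List[Optional[int]] = []
    for box, flows in zip(boxes_t, flows_t):
        moved = compensate(box, flows)
        best, best_iou = None, threshold
        for j, candidate in enumerate(boxes_t1):
            score = iou(moved, candidate)
            if score >= best_iou:
                best, best_iou = j, score
        matches.append(best)
    return matches
```

A face whose compensated box finds no sufficiently overlapping box in frame t+1 is returned as `None`, which a caller could treat as the end of that identity's track.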

FIG. 10 is a diagram illustrating an example of the face tracking process of facial image reconstruction using a video identity clarification model according to an embodiment.

In step SS11, described above with reference to FIG. 9, faces are detected in two consecutive frames (frame t, frame t+1), yielding bounding boxes such as bbox_{t,i} and bbox_{t+1,j}. In step SS12, the landmarks of each bounding box are extracted.

In step SS13, the optical flow is computed to find corresponding landmarks between frame t and frame t+1. In step SS14, the optical flows of the landmark feature points are averaged, the average is taken to be the motion of the object between the frames, and the coordinates of the frame t bounding boxes (e.g., bbox_{t,i}) are shifted accordingly.

In step SS15, the IoU between the bounding boxes of frame t and the bounding boxes of frame t+1 is computed to measure how much they overlap. For example, when the IoU of bbox_{t,i} and bbox_{t+1,j} is at or above a certain value, the face images in the two bounding boxes (bbox_{t,i} and bbox_{t+1,j}) are judged to have the same identity.

FIG. 11 is a diagram illustrating a video identity clarification model and its training structure according to an embodiment.

FIG. 11 shows an example training structure of the video identity clarification model (VICN).

The generator (G) takes a low-quality face sequence (FRM_SEQ) as input and reconstructs it into a high-quality face (F_R). Specifically, face reconstruction by the generator (G) includes a first stage performed by a multi-frame face resolution enhancer and a second stage performed by a landmark-guided face upsampler.

(i) Stage 1 - Multi-Frame Face Resolution Enhancement: The multi-frame face resolution enhancer (G_MFRE) fuses the low-quality face image sequence (FRM_SEQ) obtained from the input video (e.g., frame 1, frame 2, frame 3) around a reference frame (FRM_REF) to obtain an intermediate-reconstructed face image (y_int) (F_IR) whose quality has received a first improvement.

The multi-frame face resolution enhancer (G_MFRE) includes motion estimation (G_ME), warping (G_W), and a multi-frame fuser (G_MFF). Its detailed structure is described below with reference to FIG. 12.

(ii) Stage 2 - Landmark-guided Face Upsampling: This stage is performed by a face landmark estimator (G_FLE) and a face upsampler (G_FUP). It is executed as described above with reference to FIG. 4, with the intermediate-reconstructed face image (F_IR) taking the place of the low-quality image (LR) of FIG. 4.

The reconstructed face image (F_R) is thereby output, and training is performed using the ground-truth image (F_GT).

FIG. 12 is a diagram illustrating the network structure of the multi-frame face resolution enhancer of the generator of the identity clarification model according to an embodiment.

First, to correct differences in pose and angle among the face images of the individual video frames in the series of frames (FRM_SEQ), inter-frame motion estimation (G_ME) and warping (G_W) are performed with respect to a reference frame (FRM_REF).

For example, the center frame of the frame sequence (FRM_SEQ) may be chosen as the reference frame (FRM_REF) so that the inter-frame motion does not become too large. The embodiment is not limited to this, and the reference frame (FRM_REF) may be chosen in various ways.

For example, given three frames (frame 1, frame 2, frame 3), the center frame, frame 2, is chosen as the reference frame (FRM_REF). Inter-frame motion estimation (G_ME) and warping (G_W) are then performed for frame 1 and frame 2, and for frame 2 and frame 3.

For example, given five frames (frame 1, frame 2, frame 3, frame 4, frame 5), the center frame, frame 3, is chosen as the reference frame (FRM_REF). Inter-frame motion estimation (G_ME) and warping (G_W) can then be performed for frame 1 and frame 3, frame 2 and frame 3, frame 4 and frame 3, and frame 5 and frame 3.

For example, given four frames (frame 1, frame 2, frame 3, frame 4), either frame 2 or frame 3 can be arbitrarily chosen as the reference frame (FRM_REF).

Alternatively, given four frames (frame 1, frame 2, frame 3, frame 4), inter-frame motion estimation (G_ME) and warping (G_W) can be performed for frame 1 and frame 2, and for frame 3 and frame 4.
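The center-frame examples above can be expressed as a small sketch that picks the reference frame and lists the (frame, reference) pairs to which motion estimation and warping would be applied. The function names and the 0-based indexing are assumptions for illustration. For an even number of frames the sketch picks the lower of the two middle frames, which is one of the two choices the text allows; it does not cover the alternative pairing of frame 1 with frame 2 and frame 3 with frame 4.

```python
from typing import List, Tuple


def reference_index(num_frames: int) -> int:
    """0-based index of the center frame; the lower middle for even counts."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (num_frames - 1) // 2


def alignment_pairs(num_frames: int) -> List[Tuple[int, int]]:
    """(frame index, reference index) pairs for motion estimation and warping."""
    ref = reference_index(num_frames)
    return [(i, ref) for i in range(num_frames) if i != ref]
```

With 0-based indices, three frames give the pairs (0, 1) and (2, 1), matching the frame 1 / frame 2 and frame 3 / frame 2 pairing of the three-frame example.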

The motion estimation network (G_ME) can adopt the VESPCN structure (J. Caballero et al., "Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation," CVPR 2017.), which is widely used in related tasks.

Here, the residual block (K. He et al., "Deep residual learning for image recognition," CVPR 2016.) can be used as the basic building block, so that the network can be trained to extract effective feature points for estimating motion between face frames.

The input frames warped with respect to the reference frame (FRM_REF) are concatenated along the channel axis of the image (concat operation) and fed to the multi-frame fuser (G_MFF). For example, the warped input frames alone may be concatenated, or the warped input frames may be concatenated together with the reference frame (FRM_REF).
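As a minimal sketch of the concat operation, the following assumes each frame is stored as a nested list in height x width x channel order; the layout and the function name `concat_channels` are assumptions for illustration, and a real implementation would more likely use a tensor library's concatenate call.

```python
from typing import List

Frame = List[List[List[float]]]  # height x width x channels (assumed layout)


def concat_channels(frames: List[Frame]) -> Frame:
    """Concatenate frames of equal height and width along the channel axis."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same height and width")
    # For every pixel position, append the channel values of each frame in order.
    return [[sum((frame[r][c] for frame in frames), [])
             for c in range(width)]
            for r in range(height)]
```

Under this layout, concatenating three RGB frames yields a single array with nine channels per pixel, which would be the input to the fuser.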

In one example, the processor 110 moves a sliding window of N frames over the input frame sequence (FRM_SEQ) and performs the motion estimation (G_ME), warping (G_W), and multi-frame fusion (G_MFF) described above on each window, so that N frames are merged at once to output an intermediate-reconstructed face image (F_IR).
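The sliding-window traversal can be sketched as below. The description does not state how far the window advances at each step, so the `stride` parameter (defaulting to one frame) is an assumption, as is the function name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], n: int, stride: int = 1) -> List[Sequence[T]]:
    """Return every full window of n consecutive frames, advancing by stride."""
    if n <= 0 or stride <= 0:
        raise ValueError("n and stride must be positive")
    return [frames[start:start + n]
            for start in range(0, len(frames) - n + 1, stride)]
```

Each returned window would be passed through motion estimation, warping, and fusion to produce one intermediate-reconstructed face image; sequences shorter than N yield no windows.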

In another example, the processor 110 may perform the motion estimation (G_ME), warping (G_W), and multi-frame fusion (G_MFF) described above on two consecutive frames of the input frame sequence (FRM_SEQ) at a time, improving the image quality progressively.

An example multi-frame fusion network (G_MFF) uses the residual block, which achieves high accuracy in a variety of image processing tasks, as its basic building block, and can be trained to extract the features that matter for improving image quality.

The facial image reconstruction method and device according to the embodiment provide long-distance, low-quality face recognition, and can substantially improve on the accuracy of existing mobile face recognition applications, which are limited to recognizing the faces of one or two people at close range in a conversation setting.

The facial image reconstruction method and device according to the embodiment can reconstruct a high-quality face image from a series of low-quality face images captured across consecutive frames of a video.

The facial image reconstruction method according to the embodiment can be developed as software that runs on Android-based smartphones, and can be installed and executed on commercial smartphones. The DNN-based face recognition technology, including the identity clarification model, is implemented in Google TensorFlow and can be converted to Google TensorFlow-Lite for Android for execution.

The facial image reconstruction technology according to the embodiment can be used in a range of useful mobile AR applications based on face recognition (e.g., finding missing children, tracking criminals).

The method according to an embodiment of the present invention described above can be implemented as computer-readable code on a medium on which a program is recorded. Computer-readable media include all types of recording devices that store data readable by a computer system. Examples of computer-readable media include an HDD (Hard Disk Drive), SSD (Solid State Disk), SDD (Silicon Disk Drive), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device.

The facial image reconstruction method according to the embodiment may be stored on a computer-readable non-transitory recording medium on which a computer program including one or more instructions for executing the method is recorded.

The above description of the embodiments of the present invention is provided for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Facial image reconstruction device
200: Server
300: Network

Claims

A facial image reconstruction method executed by a facial image reconstruction device including a processor, the method comprising:
acquiring training data including at least one face image tracked from a series of frames of an input video and a ground-truth face image for the at least one face image; and
training a video identity clarification model (Video Identity Clarification Network; VICN) based on the training data,
wherein the training comprises:
a generation step of executing a generator of the video identity clarification model to generate a reconstructed face image in which the identity of the face appearing in the at least one face image is restored; and
a discrimination step of executing a discriminator of the video identity clarification model, which competes with the generator in a generative adversarial network (GAN), to discriminate the reconstructed face image based on the ground-truth face image, and
wherein the generation step comprises
executing a multi-frame face resolution enhancer of the generator to generate an intermediate-reconstructed face image from the at least one face image.

The facial image reconstruction method of claim 1,
wherein acquiring the training data comprises
extracting the at least one face image from the series of frames based on facial landmark tracking information between consecutive frames of the series of frames.

(Deleted)

The facial image reconstruction method of claim 1,
wherein the generation step comprises:
executing a face landmark estimator of the generator to predict a plurality of facial landmarks based on the intermediate-reconstructed face image; and
executing a face upsampler of the generator to upsample the intermediate-reconstructed face image using the plurality of facial landmarks.

The facial image reconstruction method of claim 4,
wherein the generation step further comprises
generating, using an intermediate image generator including a plurality of residual blocks, an intermediate image in which the image quality of the intermediate-reconstructed face image is improved,
wherein the predicting comprises
predicting the plurality of facial landmarks based on the intermediate image, and
wherein the upsampling comprises
upsampling the intermediate image using the predicted plurality of facial landmarks.

The facial image reconstruction method of claim 1,
wherein the training further comprises
an extraction step of executing a face feature extractor of the video identity clarification model to extract a feature map of the reconstructed face image and a feature map of the ground-truth face image.

The face image reconstruction method of claim 1,
wherein the training step further comprises:
computing a training loss function; and
alternately training the generator and the discriminator so as to minimize the value of the training loss function.
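Alternating training can be sketched as a loop that updates one network per sub-step while the other is held fixed. The skeleton below is a minimal illustration: `train_alternating` and its two callbacks are hypothetical names, and the order (discriminator first, then generator) is an assumption, since the claim only requires that the two be trained alternately.

```python
from typing import Callable, List, Tuple


def train_alternating(
    num_iterations: int,
    update_discriminator: Callable[[int], float],
    update_generator: Callable[[int], float],
) -> List[Tuple[float, float]]:
    """Runs one discriminator update and one generator update per iteration.

    Each callback is expected to perform a single optimisation step on its
    own network (with the other network frozen) and return the loss value.
    """
    history = []
    for step in range(num_iterations):
        d_loss = update_discriminator(step)  # generator held fixed
        g_loss = update_generator(step)      # discriminator held fixed
        history.append((d_loss, g_loss))
    return history
```

In practice each callback would wrap a forward pass, a loss computation, and an optimizer step in whatever deep learning framework is used.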

The face image reconstruction method of claim 7,
wherein the training loss function comprises:
a first objective function including a GAN loss function for the generator; and
a second objective function based on a GAN loss function for the discriminator.
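The claim does not fix the form of the two GAN loss functions. As one common possibility, and purely as an assumption, the standard non-saturating formulation can be written with $G$ the generator, $D$ the discriminator, $x$ the tracked low-quality face images, and $y$ the ground-truth face image:

```latex
\begin{aligned}
\mathcal{L}_{G}^{\mathrm{GAN}} &= -\,\mathbb{E}_{x}\big[\log D(G(x))\big], \\
\mathcal{L}_{D} &= -\,\mathbb{E}_{y}\big[\log D(y)\big]
                   - \mathbb{E}_{x}\big[\log\big(1 - D(G(x))\big)\big].
\end{aligned}
```

Under this reading, $\mathcal{L}_{G}^{\mathrm{GAN}}$ would be one term of the first objective function and $\mathcal{L}_{D}$ would serve as the second objective function. Other GAN losses (for example least-squares or hinge variants) would fit the claim language equally well.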

The face image reconstruction method of claim 8,
wherein the first objective function includes a pixel reconstruction accuracy function between the reconstructed face image and the ground-truth face image, a prediction accuracy function of the facial landmarks predicted in the step of generating the reconstructed face image, and a facial feature similarity function between the reconstructed face image and the ground-truth face image.
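One way to read this claim is as a weighted sum of four terms. The sketch below is an illustration under several assumptions that are not stated in the claim: mean squared error for the pixel and landmark terms, cosine distance for the feature term, the availability of ground-truth landmarks, and the weights `w_pixel`, `w_landmark`, and `w_feature`. All function names are hypothetical.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 when the feature vectors are aligned."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def generator_objective(
    gan_loss: float,
    recon_pixels: Sequence[float], gt_pixels: Sequence[float],
    pred_landmarks: Sequence[float], gt_landmarks: Sequence[float],
    recon_features: Sequence[float], gt_features: Sequence[float],
    w_pixel: float = 1.0, w_landmark: float = 1.0, w_feature: float = 1.0,
) -> float:
    """First objective: GAN term plus pixel, landmark, and feature terms."""
    return (
        gan_loss
        + w_pixel * mse(recon_pixels, gt_pixels)
        + w_landmark * mse(pred_landmarks, gt_landmarks)
        + w_feature * cosine_distance(recon_features, gt_features)
    )
```

When the reconstruction matches the ground truth in pixels, landmarks, and features, the three auxiliary terms vanish and only the GAN term remains.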

The face image reconstruction method of claim 1, further comprising
a second training step of fine-tuning the video identity restoration model based on second training data including at least one face image of a search target and a reference face image for the at least one face image of the search target.

The face image reconstruction method of claim 10,
wherein the second training step comprises
executing the generating step and the discriminating step based on the second training data.
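Because the second training step reuses the same generating and discriminating steps, fine-tuning can be viewed as running the same training routine a second time, starting from the already trained model and feeding it the search-target data. The sketch below illustrates only that two-phase structure. `run_phase`, `train_then_finetune`, and the `train_step` callback are hypothetical names, and the model state is left abstract.

```python
from typing import Callable, Iterable, Sequence, Tuple, TypeVar

State = TypeVar("State")
Sample = Tuple[Sequence, object]  # (face images, target face image)


def run_phase(state: State, dataset: Iterable[Sample],
              train_step: Callable[[State, Sequence, object], State]) -> State:
    """Applies the same generate-and-discriminate training step per sample."""
    for face_images, target_image in dataset:
        state = train_step(state, face_images, target_image)
    return state


def train_then_finetune(initial_state: State,
                        first_data: Iterable[Sample],
                        second_data: Iterable[Sample],
                        train_step) -> State:
    """First training on general data, then fine-tuning on the search target."""
    pretrained = run_phase(initial_state, first_data, train_step)
    return run_phase(pretrained, second_data, train_step)
```

In the second phase the reference face image of the search target plays the role that the ground-truth face image plays in the first phase.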

A face image reconstruction apparatus, comprising:
a memory storing a video identity restoration model that includes a generator and a discriminator in the adversarial relationship of a generative adversarial network (GAN) with the generator; and
a processor configured to train the video identity restoration model based on training data including at least one face image tracked from a series of frames of an input video and a ground-truth face image for the at least one face image,
wherein, to perform the training, the processor is configured to perform:
a generation operation of executing the generator to generate a reconstructed face image in which the identity of the face appearing in the at least one face image is restored; and
a discrimination operation of executing the discriminator to discriminate the reconstructed face image based on the ground-truth face image,
wherein the generator includes a multi-frame face image quality enhancer, and
wherein, to perform the generation operation, the processor is configured to execute the multi-frame face image quality enhancer to generate an intermediate-reconstructed face image from the at least one face image.

The face image reconstruction apparatus of claim 12,
wherein the processor is configured to acquire the training data, and
wherein, to acquire the training data, the processor is configured to extract the at least one face image from the series of frames based on facial feature point tracking information between consecutive frames of the series of frames.
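Extracting face images of the same person from consecutive frames requires linking detections across frames. The sketch below shows one simple way to do that, as an assumption rather than the patented tracker: faces in frame t are greedily matched to tracks from frame t-1 by the mean distance between corresponding feature points. The names `landmark_distance` and `track_faces` and the threshold `max_distance` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Face = List[Point]                  # feature points of one detected face
Track = List[Tuple[int, int]]       # (frame index, face index) pairs


def landmark_distance(a: Face, b: Face) -> float:
    """Mean Euclidean distance between corresponding feature points."""
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(a, b)) / len(a)


def track_faces(frames: List[List[Face]],
                max_distance: float = 10.0) -> List[Track]:
    """Links faces across consecutive frames by feature-point proximity."""
    tracks: List[Track] = []
    active: List[Tuple[Track, Face]] = []   # tracks alive in previous frame
    for t, faces in enumerate(frames):
        unmatched = set(range(len(faces)))
        next_active = []
        for track, last_face in active:
            best, best_d = None, max_distance
            for i in sorted(unmatched):
                d = landmark_distance(last_face, faces[i])
                if d <= best_d:
                    best, best_d = i, d
            if best is not None:            # extend the track into frame t
                unmatched.remove(best)
                track.append((t, best))
                next_active.append((track, faces[best]))
        for i in sorted(unmatched):         # start a new track
            track = [(t, i)]
            tracks.append(track)
            next_active.append((track, faces[i]))
        active = next_active
    return tracks
```

Each returned track lists where one face appears in the frame sequence, and the crops at those positions would form the "at least one face image" passed to the generator.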

(Deleted)

The face image reconstruction apparatus of claim 12,
wherein the generator further includes a face landmark estimator and a face upsampler, and
wherein, to perform the generation operation, the processor is configured to:
execute the face landmark estimator to predict a plurality of facial landmarks based on the intermediate-reconstructed face image; and
execute the face upsampler to upsample the intermediate-reconstructed face image using the plurality of facial landmarks.

The face image reconstruction apparatus of claim 15,
wherein the generator further includes an intermediate image generator including a plurality of residual blocks, and
wherein, to perform the generation operation, the processor is configured to:
generate, using the intermediate image generator, an intermediate image in which the image quality of the intermediate-reconstructed face image is improved;
execute the face landmark estimator to predict the plurality of facial landmarks based on the intermediate image; and
execute the face upsampler to upsample the intermediate image using the plurality of facial landmarks predicted based on the intermediate image.

The face image reconstruction apparatus of claim 12,
wherein the video identity restoration model further includes a face feature extractor, and
wherein, to perform the training, the processor is configured to execute the face feature extractor to extract a feature map of the reconstructed face image and a feature map of the ground-truth face image.

The face image reconstruction apparatus of claim 12,
wherein, to perform the training, the processor is configured to:
compute a training loss function; and
alternately train the generator and the discriminator so as to minimize the value of the training loss function,
wherein the training loss function comprises:
a first objective function including a GAN loss function for the generator; and
a second objective function based on a GAN loss function for the discriminator, and
wherein the first objective function includes a pixel reconstruction accuracy function between the reconstructed face image and the ground-truth face image, a prediction accuracy function of the facial landmarks predicted in the generation operation of the reconstructed face image, and a facial feature similarity function between the reconstructed face image and the ground-truth face image.

The face image reconstruction apparatus of claim 12,
wherein the processor is configured to perform second training that fine-tunes the video identity restoration model based on second training data including at least one face image of a search target and a reference face image for the at least one face image of the search target.

The face image reconstruction apparatus of claim 19,
wherein, to perform the second training, the processor is configured to perform the generation operation and the discrimination operation based on the second training data.