KR102193469B1

KR102193469B1 - Computer device and method to perform image conversion

Info

Publication number: KR102193469B1
Application number: KR1020190073498A
Authority: KR
Inventors: 강민석; 이록규; 박지혁; 김현기
Original assignee: 엔에이치엔 주식회사
Priority date: 2019-06-20
Filing date: 2019-06-20
Publication date: 2020-12-21

Abstract

According to an embodiment of the present invention, a computer device detects at least one target object from a first image in response to a first user input selecting the first image; displays, on a display device, the target object overlapped on a part of a second image in response to a second user input that overlaps the target object on the part of the second image; converts the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks; and displays, on the display device, the converted target object overlapped on the part of the second image as a third image. Each of the convolutional neural networks includes a plurality of convolutional encoder layers and at least one convolutional decoder layer.

Description

Computer device and method to perform image conversion

Technical field

The present invention relates to an electronic device and, more particularly, to a computer device and method for performing image conversion.

As demand grows for converted images that retain some information from a source image but have different characteristics (or style), various image filtering methods are being developed. Existing methods can provide a variety of converted images from a single source image by applying different image filters, but they make it difficult to modify the source image itself.

Meanwhile, deep learning is a field of machine learning that includes research on converting data into a matrix form that a computer can process and building models that learn from it with artificial neural networks. Deep learning is based on artificial neural networks inspired by neuroscience, such as the behavior of neurons, and has a hierarchical structure of multiple layers, similar to the way humans perceive.

The above is provided only to aid understanding of the background of the technical ideas of the present invention, and therefore it should not be understood as content corresponding to prior art known to those skilled in the art to which the present invention pertains.

Embodiments of the present invention provide a computer device and method capable of converting an object selected by a user, from among objects included in various images, so that it has the characteristics of an image selected by the user.

A computer device according to an embodiment of the present invention includes: at least one processor; a display device that operates in response to control by the at least one processor; and a memory that stores a first image and a second image and operates in response to control by the at least one processor. The at least one processor detects at least one target object from the first image in response to a first user input selecting the first image; displays, on the display device, the target object overlapped on a part of the second image in response to a second user input that overlaps the target object on the part of the second image; converts the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks, each of which includes a plurality of convolutional encoder layers and at least one convolutional decoder layer; and displays, on the display device, the converted target object overlapped on the part of the second image as a third image.

The plurality of convolutional neural networks may be provided as sequential stages.

The plurality of convolutional neural networks may have different numbers of convolutional encoder layers.

The plurality of convolutional neural networks may include a first convolutional neural network and a second convolutional neural network. The at least one processor may obtain a fourth image from the first convolutional neural network by inputting the overlapped target object and the second image to it, and may obtain the third image from the second convolutional neural network by inputting the fourth image and the second image to it.

The at least one processor may generate first feature maps by passing the second image through the convolutional encoder layers of the first convolutional neural network, generate second feature maps by passing the overlapped target object through the same convolutional encoder layers, obtain first swap maps by reflecting the first feature maps in at least some of the second feature maps, and obtain the fourth image by passing the first swap maps through the at least one convolutional decoder layer of the first convolutional neural network.

The at least one processor may generate third feature maps by passing the second image through the convolutional encoder layers of the second convolutional neural network, generate fourth feature maps by passing the fourth image through the same convolutional encoder layers, obtain second swap maps by reflecting the third feature maps in at least some of the fourth feature maps, and obtain the third image by passing the second swap maps through the at least one convolutional decoder layer of the second convolutional neural network.
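The text does not specify how one set of feature maps is "reflected" in the other. As a minimal sketch, assuming a nearest-neighbor feature swap (an illustrative choice, not necessarily the patented method), each feature vector inside the target object could be replaced by the most similar feature vector taken from the second image. The names `cosine` and `swap_features`, and the flattened list-of-vectors layout, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def swap_features(
    target_feats: List[Vector],     # features of the overlapped target object
    reference_feats: List[Vector],  # features of the second (reference) image
    inside_object: List[bool],      # True where a position belongs to the object
) -> List[List[float]]:
    """Assumed swap rule: inside the object, take the best-matching reference feature."""
    swapped: List[List[float]] = []
    for feat, is_object in zip(target_feats, inside_object):
        if not is_object:
            swapped.append(list(feat))  # positions outside the object are left unchanged
            continue
        best = max(reference_feats, key=lambda ref: cosine(feat, ref))
        swapped.append(list(best))
    return swapped


swap_map = swap_features(
    target_feats=[[1.0, 0.1], [0.0, 1.0], [5.0, 5.0]],
    reference_feats=[[2.0, 0.0], [0.0, 3.0]],
    inside_object=[True, True, False],
)
```

In this sketch only part of the target feature map is modified, which mirrors the wording "at least some of the second feature maps"; a decoder would then turn the swap map back into an image.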

The number of convolutional encoder layers of the first convolutional neural network may be greater than the number of convolutional encoder layers of the second convolutional neural network.

Alternatively, the number of convolutional encoder layers of the second convolutional neural network may be greater than the number of convolutional encoder layers of the first convolutional neural network.

Another aspect of the present invention relates to a method of operating a computer device that includes a display device. The method includes: storing a first image and a second image; detecting at least one target object from the first image in response to a first user input selecting the first image; displaying, on the display device, the target object overlapped on a part of the second image in response to a second user input that overlaps the target object on the part of the second image; converting the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks, each of which includes a plurality of convolutional encoder layers and at least one convolutional decoder layer; and displaying, on the display device, the converted target object overlapped on the part of the second image as a third image.

The plurality of convolutional neural networks may include a first convolutional neural network and a second convolutional neural network. Converting the overlapped target object in association with the second image may include obtaining a fourth image from the first convolutional neural network by inputting the overlapped target object and the second image to it, and obtaining the third image from the second convolutional neural network by inputting the fourth image and the second image to it.

Yet another aspect of the present invention relates to a computer-readable storage medium that stores a program. The program includes instructions that, when executed by the computer, detect at least one target object from a first image stored in the computer in response to a first user input selecting the first image; display, on a display device, the target object overlapped on a part of a second image displayed on the display device in response to a second user input that overlaps the target object on the part of the second image; convert the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks, each of which includes a plurality of convolutional encoder layers and at least one convolutional decoder layer; and display, on the display device, the converted target object overlapped on the part of the second image as a third image.

The computer device and method according to embodiments of the present invention convert an object selected by a user so that it has the characteristics of an image selected by the user, and can thereby provide an output image in which the converted object is composited into the selected image. For example, a user can generate a variety of output images by combining objects and images through the computer device and method according to embodiments of the present invention.

FIG. 1 is a block diagram showing a computer device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a method of operating the computer device of FIG. 1.
FIG. 3 is a diagram showing an example of a reference image displayed on the computer device.
FIG. 4 is a diagram showing an example of a target image displayed on the computer device.
FIG. 5 is a diagram showing an example of a target object overlapped on the reference image.
FIG. 6 is a diagram showing an example of a converted image generated by processing the reference image and the overlapped target object.
FIG. 7 is a block diagram showing an embodiment of the convolutional neural network group of FIG. 1.
FIGS. 8 and 9 are block diagrams showing embodiments of the first and second convolutional neural networks of FIG. 7.
FIG. 10 is a flowchart showing an embodiment of step S140 of FIG. 2.
FIG. 11 is a block diagram showing another embodiment of the computer device of FIG. 1.
FIG. 12 is a block diagram showing a client server capable of communicating with the computer device of FIG. 11.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the following description covers only the parts needed to understand operations according to the present invention; descriptions of other parts are omitted so as not to obscure the gist of the present invention. The present invention is not limited to the embodiments described herein and may be embodied in other forms. The embodiments described herein are provided only to explain the technical idea of the present invention in enough detail that a person of ordinary skill in the art to which the present invention pertains can readily practice it.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another element interposed between them. The terms used herein are for describing specific embodiments and are not intended to limit the present invention. Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless stated otherwise. "At least one of X, Y, and Z" and "at least one selected from the group consisting of X, Y, and Z" may be interpreted as X alone, Y alone, Z alone, or any combination of two or more of X, Y, and Z (for example, XYZ, XYY, YZ, ZZ). Here, "and/or" includes any and all combinations of one or more of the listed items.

FIG. 1 is a block diagram showing a computer device according to an embodiment of the present invention.

Referring to FIG. 1, the computer device 100 may include a communicator 105, a user interface 110, an A/V (audio/video) input unit 120, a display device 130, a memory 140, a nonvolatile storage medium 150, and an image conversion device 160.

The communicator 105 may communicate with an external device over a network in response to control by the image conversion device 160. The communicator 105 may receive an image from the external device, and the received image may be stored in the memory 140 and/or the nonvolatile storage medium 150, or displayed on the display device 130.

The user interface 110 receives user inputs for controlling the operation of the computer device 100 and may provide the received user inputs to the image conversion device 160. The user interface 110 may include a keypad, a dome switch, a touch pad (resistive/capacitive), a jog wheel, a jog switch, a finger mouse, and the like. In particular, when the touch pad is formed integrally with the display device 130, the combination may be called a touch screen. In this case, the user interface 110 may be visualized by the display device 130.

The A/V input unit 120 is for inputting audio and video signals and may include a camera. An image captured by the A/V input unit 120 may be stored in the memory 140 and/or the nonvolatile storage medium 150, or displayed on the display device 130.

The display device 130 displays information processed by the computer device 100. The display device 130 may operate in response to control by the image conversion device 160. The display device 130 may display a graphic interface that includes an image. When the display device 130 is formed integrally with a touch pad to constitute a touch screen, the display device 130 may display the user interface.

The memory 140 operates in response to control by the image conversion device 160 and may be provided as a buffer memory for the image conversion device 160. In this case, the image conversion device 160 may load a plurality of images into the memory 140 and display at least some of the loaded images on the display device 130. For example, the plurality of images may be loaded from the communicator 105, the A/V input unit 120, or the nonvolatile storage medium 150. In embodiments, the memory 140 may include any memory that provides relatively fast read and write speeds, such as random access memory (RAM).

The image conversion device 160 may operate in response to user inputs received through the user interface 110. In response to the user inputs, the image conversion device 160 may control the communicator 105, the A/V input unit 120, the display device 130, the memory 140, and the nonvolatile storage medium 150. The image conversion device 160 may process images selected by the user inputs to generate a converted image CIMG.

The image conversion device 160 may include a controller 161, an image viewer 162, an object detector 163, and a convolutional neural network (CNN) group 164.

The controller 161 is configured to control the image viewer 162, the object detector 163, and the convolutional neural network group 164. The controller 161 may load a plurality of images into the memory 140 and display at least some of the loaded images on the display device 130 through the image viewer 162. That is, the image viewer 162 may provide a graphic interface that includes an image.

The controller 161 may provide an image of interest selected by a user input to the object detector 163. The object detector 163 is configured to detect at least one target object from the image of interest. In embodiments, the object detector 163 may employ a neural network that includes any of various algorithms, such as instance segmentation, to detect the target object.
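As an illustration of what an instance-segmentation result could hand to the rest of the pipeline, the sketch below assumes the detector outputs a per-pixel label map (0 for background, a positive integer per detected instance). That output format and the helper names `instances_from_label_map` and `mask_for` are assumptions for this example; the text does not describe the detector's interface.

```python
from typing import Dict, List, Tuple

LabelMap = List[List[int]]   # assumed convention: 0 = background, k > 0 = instance id
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def instances_from_label_map(labels: LabelMap) -> Dict[int, Box]:
    """Collect one bounding box per detected instance."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(labels):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue  # skip background pixels
            if instance_id not in boxes:
                boxes[instance_id] = (y, x, y, x)
            else:
                top, left, bottom, right = boxes[instance_id]
                boxes[instance_id] = (min(top, y), min(left, x), max(bottom, y), max(right, x))
    return boxes


def mask_for(labels: LabelMap, instance_id: int) -> List[List[bool]]:
    """Binary mask that is True exactly on the pixels of one instance."""
    return [[value == instance_id for value in row] for row in labels]


labels = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
boxes = instances_from_label_map(labels)
second_mask = mask_for(labels, 2)
```

Each detected instance is a candidate target object; the mask of the one the user picks can then be used to cut the object out of the image of interest.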

The controller 161 may display the detected target objects on the display device 130 through the image viewer 162. Accordingly, one of the detected target objects may be selected by a user input.

In addition, in response to a user input that overlaps the selected target object on a part of a reference image RIMG, the controller 161 may display the target object overlapped on the reference image RIMG on the display device 130 through the image viewer 162. Accordingly, the user can check and select the part of the reference image RIMG on which the target object is to be overlapped. The reference image RIMG may also be selected by the user.
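The preview shown while the user positions the object can be pictured as a masked paste. The sketch below assumes single-channel images stored as nested lists and a hypothetical `overlay` helper; how the actual device composites the preview is not described in the text.

```python
from typing import List

Image = List[List[int]]
Mask = List[List[bool]]


def overlay(reference: Image, obj: Image, obj_mask: Mask, top: int, left: int) -> Image:
    """Return a copy of `reference` with the masked object pasted at (top, left)."""
    preview = [row[:] for row in reference]  # leave the reference image untouched
    height, width = len(preview), len(preview[0])
    for dy, mask_row in enumerate(obj_mask):
        for dx, inside in enumerate(mask_row):
            y, x = top + dy, left + dx
            # Paste only object pixels, and only where they land inside the reference.
            if inside and 0 <= y < height and 0 <= x < width:
                preview[y][x] = obj[dy][dx]
    return preview


reference = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
obj = [[9, 9], [9, 9]]
obj_mask = [[True, False], [True, True]]
preview = overlay(reference, obj, obj_mask, top=1, left=1)
```

Calling `overlay` again with a different `top` and `left` corresponds to the user dragging the object to another part of the reference image.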

Thereafter, the controller 161 may use the convolutional neural network group 164 to convert the target object overlapped on the reference image RIMG so that it has the characteristics (or style) of the reference image RIMG. The controller 161 may provide the overlapped target object to the convolutional neural network group 164 as a target image TIMG. In embodiments, the target image TIMG may include the target object and data values padded into the pixels other than the target object.
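One way to read the padded target image is as a canvas the size of the reference image that holds only the object. The sketch below makes that assumption explicit: the canvas size, the pad value of 0, and the `make_target_image` helper are illustrative choices that the text does not specify.

```python
from typing import List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]


def make_target_image(
    ref_height: int,
    ref_width: int,
    obj: Image,
    obj_mask: Mask,
    top: int,
    left: int,
    pad_value: int = 0,  # assumed padding value for non-object pixels
) -> Tuple[Image, Mask]:
    """Build a TIMG-like canvas plus a mask marking where the object sits."""
    timg = [[pad_value] * ref_width for _ in range(ref_height)]
    timg_mask = [[False] * ref_width for _ in range(ref_height)]
    for dy, mask_row in enumerate(obj_mask):
        for dx, inside in enumerate(mask_row):
            y, x = top + dy, left + dx
            if inside and 0 <= y < ref_height and 0 <= x < ref_width:
                timg[y][x] = obj[dy][dx]
                timg_mask[y][x] = True
    return timg, timg_mask


timg, timg_mask = make_target_image(
    3, 4, obj=[[7, 8]], obj_mask=[[True, True]], top=1, left=2
)
```

Returning the mask alongside the padded image keeps track of which pixels belong to the object, which a later stage would need in order to restrict the conversion to the object region.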

The convolutional neural network group 164 may include a first convolutional neural network 165 and a second convolutional neural network 166. Each of the first and second convolutional neural networks 165 and 166 may include a plurality of convolutional encoder layers and at least one convolutional decoder layer, based on any of various algorithms such as Deep Painterly Harmonization. The convolutional encoder layers may process an input image sequentially to extract feature data from it, and the convolutional decoder layer may perform deconvolution on the received feature data to generate an output image.

The first and second convolutional neural networks 165 and 166 may be provided as sequentially connected stages. As a first stage, the first convolutional neural network 165 may process the reference image RIMG and the target image TIMG to generate a semi-converted image SIMG and transmit the generated semi-converted image SIMG to the second convolutional neural network 166. As a second stage, the second convolutional neural network 166 may process the reference image RIMG and the semi-converted image SIMG to generate the converted image CIMG and transmit the generated converted image CIMG to the controller 161. The converted image CIMG includes the converted target object overlapped on a part of the reference image RIMG.
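The data flow between the two stages can be sketched independently of the networks themselves. In the example below each stage is a plain callable taking the reference image and an input image; the `blend_stage` placeholder merely moves pixel values toward the reference and stands in for a real convolutional network. Both `convert` and `blend_stage` are hypothetical names used for illustration.

```python
from typing import Callable, List

Image = List[List[float]]
Stage = Callable[[Image, Image], Image]  # (reference image, input image) -> output image


def convert(rimg: Image, timg: Image, first_stage: Stage, second_stage: Stage) -> Image:
    """Two sequential stages; the reference image is fed to both."""
    simg = first_stage(rimg, timg)   # semi-converted image (SIMG)
    cimg = second_stage(rimg, simg)  # converted image (CIMG)
    return cimg


def blend_stage(alpha: float) -> Stage:
    """Placeholder stage: not a CNN, just a pixel-wise blend toward the reference."""
    def stage(rimg: Image, image: Image) -> Image:
        return [
            [(1.0 - alpha) * pixel + alpha * ref for pixel, ref in zip(row, ref_row)]
            for row, ref_row in zip(image, rimg)
        ]
    return stage


rimg = [[10.0, 10.0]]
timg = [[0.0, 4.0]]
cimg = convert(rimg, timg, blend_stage(0.5), blend_stage(0.5))
```

Because the stages are passed in as arguments, two networks with different numbers of encoder layers could be plugged in without changing `convert`.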

The controller 161 may display the converted image CIMG on the display device 130 through the image viewer 162 and, depending on the user's selection, may also store the converted image CIMG in the nonvolatile storage medium 150.

According to an embodiment of the present invention, the image conversion device 160 may detect at least one target object included in an image of interest selected by a user input, and may select one of the detected target objects according to a user input. The image conversion device 160 may then convert the target object overlapped on the reference image RIMG so that it has the characteristics of the reference image RIMG. Accordingly, the image conversion device 160 allows the user to convert an object of the user's choice, from among objects included in various images, so that it has the characteristics of a reference image RIMG of the user's choice.

Each of the communicator 105, the user interface 110, the A/V input unit 120, the display device 130, the memory 140, the nonvolatile storage medium 150, and the image conversion device 160 may be implemented through hardware, software, firmware, or a combination thereof. In embodiments, the communicator 105, the user interface 110, the A/V input unit 120, the display device 130, the memory 140, and the nonvolatile storage medium 150 may be implemented in hardware, and the image conversion device 160 may be implemented in software. In addition, the controller 161, the image viewer 162, the object detector 163, and the convolutional neural network group 164 of the image conversion device 160 may be split into more components or merged into fewer components, depending on the embodiment.

FIG. 2 is a flowchart showing a method of operating the computer device of FIG. 1. FIG. 3 is a diagram showing an example of a reference image displayed on the computer device. FIG. 4 is a diagram showing an example of a target image displayed on the computer device. FIG. 5 is a diagram showing an example of a target object overlapped on the reference image. FIG. 6 is a diagram showing an example of a converted image generated by processing the reference image and the overlapped target object.

Referring to FIGS. 1 and 2, in step S110, the image conversion device 160 displays the reference image RIMG in response to a first user input selecting the reference image RIMG. Referring to FIG. 3, the image conversion device 160 may display a first graphic interface GUI1 that includes the reference image RIMG. In embodiments, the first graphic interface GUI1 may include thumbnails SN1 to SN4 corresponding respectively to a plurality of candidate images and, in response to a first user input selecting one of them, may display the selected candidate image as the reference image RIMG on the first graphic interface GUI1. FIG. 3 shows four thumbnails SN1 to SN4, but this is only an example, and more or fewer thumbnails may be provided. In embodiments, a feedback indicator identifying the selected thumbnail may be displayed in response to the selection. For example, the selected thumbnail may be highlighted in various ways, such as by displaying a symbol such as "V" or by shading it. The reference image RIMG may be determined or confirmed in response to an additional user input selecting a first area AR1 included in the first graphic interface GUI1.

다시 도 1 및 도 2를 참조하면, S120단계에서, 이미지 변환 장치(160)는 관심 이미지를 선택하는 제 2 사용자 입력에 응답하여 관심 이미지로부터 적어도 하나의 타겟 오브젝트를 감지한다. 도 4를 참조하면, 이미지 변환 장치(160)는 선택된 관심 이미지(NIMG)를 포함하는 제 2 그래픽 인터페이스(GUI2)를 디스플레이할 수 있다. 이때, 이미지 변환 장치(160)는 관심 이미지(NIMG)를 프로세싱하여 관심 이미지(NIMG)에 포함된 자동차를 타겟 오브젝트(TG)로서 감지할 수 있다. 예를 들면, 관심 이미지(NIMG)로부터 타겟 오브젝트(TG)만 추출되고, 추출된 타겟 오브젝트(TG)만 디스플레이될 수 있다. 예를 들면, 관심 이미지(NIMG) 상에서 타겟 오브젝트(TG)를 하이라이트하는 지시 영역이 디스플레이될 수 있다. 타겟 오브젝트(TG)를 선택하고 제 2 그래픽 인터페이스(GUI2)에 포함된 제 2 영역(AR2)을 선택하는 추가적인 사용자 입력들에 응답하여, 타겟 오브젝트(TG)가 결정 혹은 승인될 수 있다.Referring back to FIGS. 1 and 2, in step S120, the image conversion apparatus 160 detects at least one target object from the image of interest in response to a second user input for selecting the image of interest. Referring to FIG. 4, the image conversion apparatus 160 may display a second graphic interface GUI2 including the selected image of interest NIMG. In this case, the image conversion apparatus 160 may process the image of interest NIMG to detect a vehicle included in the image of interest NIMG as the target object TG. For example, only the target object TG may be extracted from the image of interest NIMG, and only the extracted target object TG may be displayed. For example, an indication area that highlights the target object TG on the image of interest NIMG may be displayed. In response to additional user inputs for selecting the target object TG and selecting the second area AR2 included in the second graphic interface GUI2, the target object TG may be determined or approved.

실시 예들에서, 제 2 그래픽 인터페이스(GUI2) 내에 도 3의 썸네일들(SN1~SN4) 혹은 선택된 썸네일이 더 디스플레이될 수 있다. 실시 예들에서, S110단계 및 S120단계의 순서는 변경될 수 있다. S110단계가 S120단계 이후에 수행되는 경우, 제 1 그래픽 인터페이스(GUI1)에는 관심 이미지(NIMG) 혹은 결정된 타겟 오브젝트(TG)가 더 디스플레이될 수 있다.In embodiments, the thumbnails SN1 to SN4 of FIG. 3 or the selected thumbnail may be further displayed in the second graphic interface GUI2. In embodiments, the order of steps S110 and S120 may be changed. If step S110 is performed after step S120, the image of interest NIMG or the determined target object TG may be further displayed on the first graphic interface GUI1.

Referring back to FIGS. 1 and 2, in step S130, the image conversion apparatus 160 displays the target object overlapped on a part of the reference image RIMG, in response to a third user input for overlapping the target object on a part of the reference image RIMG. Referring to FIG. 5, the image conversion apparatus 160 may display a third graphic interface GUI3 including the target object TG overlapped on a part of the reference image RIMG. The third graphic interface GUI3 may include various user interfaces UI for transforming the target object TG on the reference image RIMG, such as rotating the target object TG and adjusting its position and size. User inputs made through the user interfaces UI may be provided as the third user input. In addition, the third graphic interface GUI3 may further include third areas AR3 that provide other functions when selected.

Referring back to FIGS. 1 and 2, in step S140, the image conversion apparatus 160 may, in response to a fourth user input, convert the overlapped target object in association with the reference image RIMG by processing the reference image RIMG and the overlapped target object through the first and second convolutional neural networks 165 and 166. For example, in response to a fourth user input selecting a fourth area AR4 included in the third graphic interface GUI3 of FIG. 5, the image conversion apparatus 160 may process the reference image RIMG and the overlapped target object using the first and second convolutional neural networks 165 and 166. In this case, a target image TIMG may be provided as the overlapped target object.

기준 이미지(RIMG)에 오버랩된 변환된 타겟 오브젝트는 변환 이미지(CIMG)로서 제공될 수 있다. 도 6을 참조하면, 이미지 변환 장치(160)는 변환 이미지(CIMG)를 포함하는 제 4 그래픽 인터페이스(GUI4)를 디스플레이할 수 있다. 변환 이미지(CIMG)는 변환된 타겟 오브젝트(CTG)를 포함하며, 변환된 타겟 오브젝트(CTG)에는 기준 이미지(RIMG)의 특징이 반영되어 있다.The converted target object overlapping the reference image RIMG may be provided as the converted image CIMG. Referring to FIG. 6, the image conversion apparatus 160 may display a fourth graphic interface GUI4 including a converted image CIMG. The converted image CIMG includes the converted target object CTG, and the characteristics of the reference image RIMG are reflected in the converted target object CTG.

본 발명의 실시 예에 따르면, 사용자 입력에 따라 선택된 관심 이미지에 포함된 적어도 하나의 타겟 오브젝트가 감지될 수 있으며, 감지된 타겟 오브젝트들 중 하나가 사용자 입력에 의해 선택될 수 있다. 나아가, 기준 이미지(RIMG)에 오버랩된 타겟 오브젝트가 기준 이미지(RIMG)의 특징을 갖도록 변환될 수 있다. 이는, 사용자로 하여금 다양한 이미지들 내에 포함된 오브젝트들 중 자신이 선택한 오브젝트를 자신이 선택한 기준 이미지(RIMG)의 특징을 갖도록 변환하게 할 수 있다.According to an embodiment of the present invention, at least one target object included in an image of interest selected according to a user input may be detected, and one of the detected target objects may be selected according to a user input. Furthermore, the target object overlapping the reference image RIMG may be converted to have the characteristics of the reference image RIMG. This enables the user to convert an object selected by the user among objects included in various images to have the characteristics of the reference image RIMG selected by the user.

FIG. 7 is a block diagram illustrating an embodiment of the convolutional neural network group of FIG. 1.

도 7을 참조하면, 컨볼루션 신경망 네트워크 그룹(200)은 제 1 컨볼루션 신경망 네트워크(210) 및 제 2 컨볼루션 신경망 네트워크(220)를 포함한다.Referring to FIG. 7, a convolutional neural network group 200 includes a first convolutional neural network 210 and a second convolutional neural network 220.

The first convolutional neural network 210 may include a first convolution encoder 211, a first feature swap unit 212, and a first convolution decoder 213.

제 1 컨볼루션 인코더(211)는 기준 이미지(RIMG) 및 타겟 이미지(TIMG)를 수신한다. 타겟 이미지(TIMG)는 타겟 오브젝트를 포함할 수 있다. 타겟 이미지(TIMG)는 타겟 오브젝트 및 타겟 오브젝트를 제외한 나머지 픽셀들에 패딩된 “0”과 같은 데이터 값들을 포함할 수 있다.The first convolution encoder 211 receives a reference image RIMG and a target image TIMG. The target image TIMG may include a target object. The target image TIMG may include a target object and data values such as “0” padded in pixels other than the target object.
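The zero-padded target image described above can be illustrated with a minimal sketch. It assumes the target object is given as a binary mask over the image of interest; the function name `make_target_image`, the mask representation, and the pixel values are hypothetical and are not taken from the patent.

```python
def make_target_image(image, mask, pad_value=0):
    """Keep pixels where the mask is truthy; fill all other pixels with pad_value.

    `image` and `mask` are equally sized 2D lists (hypothetical representation).
    """
    return [
        [pixel if keep else pad_value for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy single-channel image and a cross-shaped mask marking the target object.
image = [[5, 6, 7],
         [8, 9, 1],
         [2, 3, 4]]
mask = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]

timg = make_target_image(image, mask)
print(timg)  # [[0, 6, 0], [8, 9, 1], [0, 3, 0]]
```

Pixels outside the mask carry the padding value, so only the target object contributes non-zero data to the encoder input.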

제 1 컨볼루션 인코더(211)는 기준 이미지(RIMG)에 대해 컨볼루션 및 서브 샘플링(혹은 풀링(pooling))을 수행하여 제 1 특징 맵들(FM1)을 출력할 수 있다. 여기에서, 특징 맵은 n * n의 요소들을 갖는 매트릭스로 이해될 수 있다. 마찬가지로, 제 1 컨볼루션 인코더(211)는 타겟 이미지(TIMG)에 대해 컨볼루션 및 서브 샘플링을 수행하여 제 2 특징 맵들(FM2)을 출력할 수 있다.The first convolution encoder 211 may perform convolution and sub-sampling (or pooling) on the reference image RIMG to output first feature maps FM1. Here, the feature map can be understood as a matrix having n * n elements. Likewise, the first convolution encoder 211 may output second feature maps FM2 by performing convolution and sub-sampling on the target image TIMG.

The first feature swap unit 212 may generate first swap maps SWM1 by swapping at least some of the elements of the second feature maps FM2 with elements of the first feature maps FM1. That is, the first feature swap unit 212 may reflect the data of the first feature maps FM1 in the data of the second feature maps FM2. For each element of the second feature maps FM2, the first feature swap unit 212 may identify the element of the first feature maps FM1 having the most similar value, and may set the identified element as the corresponding element of the first swap maps SWM1.

제 1 컨볼루션 디코더(213)는 제 1 스왑 맵들(SWM1)에 대해 디컨볼루션 및 업 샘플링(혹은 언풀링)을 수행하여 세미 변환 이미지(SIMG)를 생성할 수 있다. 이에 따라 생성된 세미 변환 이미지(SIMG)는 기준 이미지(RIMG) 혹은 타겟 이미지(TIMG)와 동일한 사이즈를 가질 수 있다. 세미 변환 이미지(SIMG)는 제 2 컨볼루션 신경망 네트워크(220)에 입력될 수 있다.The first convolution decoder 213 may generate a semi-transformed image SIMG by performing deconvolution and up-sampling (or unpooling) on the first swap maps SWM1. Accordingly, the generated semi-transformed image SIMG may have the same size as the reference image RIMG or the target image TIMG. The semi-transformed image SIMG may be input to the second convolutional neural network 220.

제 2 컨볼루션 신경망 네트워크(220)는 제 2 컨볼루션 인코더(221), 제 2 특징 스왑부(222), 및 제 2 컨볼루션 디코더(223)를 포함할 수 있다.The second convolutional neural network 220 may include a second convolutional encoder 221, a second feature swap unit 222, and a second convolutional decoder 223.

The second convolution encoder 221, the second feature swap unit 222, and the second convolution decoder 223 may be configured similarly to the first convolution encoder 211, the first feature swap unit 212, and the first convolution decoder 213, respectively.

제 2 컨볼루션 인코더(221)는 기준 이미지(RIMG) 및 세미 변환 이미지(SIMG)를 수신한다. 제 2 컨볼루션 인코더(221)는 기준 이미지(RIMG)에 대해 컨볼루션 및 서브 샘플링을 수행하여 제 3 특징 맵들(FM3)을 출력할 수 있다. 또한, 제 2 컨볼루션 인코더(221)는 세미 변환 이미지(SIMG)에 대해 컨볼루션 및 서브 샘플링을 수행하여 제 4 특징 맵들(FM4)을 출력할 수 있다.The second convolutional encoder 221 receives a reference image RIMG and a semi-transformed image SIMG. The second convolution encoder 221 may output third feature maps FM3 by performing convolution and sub-sampling on the reference image RIMG. Also, the second convolutional encoder 221 may perform convolution and sub-sampling on the semi-transformed image SIMG to output fourth feature maps FM4.

제 2 특징 스왑부(222)는 제 4 특징 맵들(FM4)의 요소들 중 적어도 일부를 제 3 특징 맵들(FM3)의 요소들로 스왑하여 제 2 스왑 맵들(SWM2)을 생성할 수 있다. 제 2 스왑 맵들(SWM2)은 제 1 스왑 맵들(SWM1)의 생성을 위한 연산과 동일한 방식을 통해 제 3 특징 맵들(FM3)과 제 4 특징 맵들(FM4)을 이용하여 연산될 수 있다.The second feature swap unit 222 may generate second swap maps SWM2 by swapping at least some of the elements of the fourth feature maps FM4 with elements of the third feature maps FM3. The second swap maps SWM2 may be calculated using the third feature maps FM3 and the fourth feature maps FM4 in the same manner as the operation for generating the first swap maps SWM1.

제 2 컨볼루션 디코더(223)는 제 2 스왑 맵들(SWM2)에 대해 디컨볼루션 및 업 샘플링을 수행하여 변환 이미지(CIMG)를 생성할 수 있다. 이에 따라 생성된 변환 이미지(CIMG)는 기준 이미지(RIMG) 혹은 타겟 이미지(TIMG)와 동일한 사이즈를 가질 수 있다.The second convolution decoder 223 may generate a transformed image CIMG by performing deconvolution and up-sampling on the second swap maps SWM2. Accordingly, the converted image CIMG generated may have the same size as the reference image RIMG or the target image TIMG.

FIGS. 8 and 9 are block diagrams illustrating embodiments of the first and second convolutional neural networks of FIG. 7.

먼저 도 8을 참조하면, 제 1 컨볼루션 신경망 네트워크(300)는 제 1 컨볼루션 인코더(310), 제 1 특징 스왑부(320), 및 제 1 컨볼루션 디코더(330)를 포함할 수 있다.First, referring to FIG. 8, the first convolutional neural network 300 may include a first convolutional encoder 310, a first feature swap unit 320, and a first convolutional decoder 330.

The first convolution encoder 310 may include a plurality of convolution encoder layers, such as first to third convolution encoder layers CV11 to CV13, and a plurality of sub-sampling layers, such as first to third sub-sampling layers MP11 to MP13.

As is well known in the art, each of the first to third convolution encoder layers CV11 to CV13 may generate feature maps by convolving its input data with one or more filters. The number of filters used for the convolution may be understood as a depth. When the input data is convolved with two or more filters, a plurality of feature maps may be generated. The filters may be determined and updated through deep learning. As shown in FIG. 8, each of the reference image RIMG and the target image TIMG may be provided as input data to the first convolution encoder layer CV11.
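As a rough illustration of how convolving one input with several filters yields several feature maps, the following sketch applies two hand-written 2x2 filters to a small single-channel input. It is an assumption-laden toy: the filter values are made up rather than learned, no padding or stride is used, and, as is common in CNN libraries, the filter is not flipped (cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image, filters):
    """One feature map per filter; the number of filters is the output depth."""
    return [conv2d_valid(image, f) for f in filters]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
filters = [
    [[1, 0], [0, 1]],   # sums the main diagonal of each 2x2 window
    [[1, -1], [0, 0]],  # horizontal difference along the top row of each window
]

feature_maps = conv_layer(image, filters)
print(len(feature_maps))  # 2
print(feature_maps[0])    # [[6, 8], [12, 14]]
print(feature_maps[1])    # [[-1, -1], [-1, -1]]
```

Two filters produce an output of depth two, which matches the description of a plurality of feature maps being generated when two or more filters are used.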

제 1 내지 제 3 서브 샘플링 레이어들(MP11~MP13) 각각은 입력되는 특징 맵들을 다운 샘플링하여 특징 맵들의 사이즈를 감소시킴으로써, 모델의 복잡도(complexity)를 완화할 수 있다. 서브 샘플링은 평균 풀링, 맥스 풀링(max pooling) 등 다양한 방식들에 따라 수행될 수 있다. 실시 예들에서, 제 1 내지 제 3 서브 샘플링 레이어들(MP11~MP13) 각각은 맥스 풀링 레이어일 수 있다.Each of the first to third sub-sampling layers MP11 to MP13 down-samples the input feature maps to reduce the size of the feature maps, thereby reducing model complexity. Sub-sampling may be performed according to various methods, such as average pooling and max pooling. In embodiments, each of the first to third sub-sampling layers MP11 to MP13 may be a max pooling layer.
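The two sub-sampling variants named above can be sketched as follows. The window size of 2 with a stride of 2 is an assumption; the patent does not state the pooling window.

```python
def pool2d(fmap, size=2, mode="max"):
    """Down-sample a 2D feature map with non-overlapping size x size windows."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:  # average pooling
                row.append(sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]

print(pool2d(fmap, mode="max"))  # [[4, 2], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[2.5, 1.0], [0.75, 6.5]]
```

Either mode reduces the 4x4 map to 2x2, which is the size reduction the sub-sampling layers provide.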

A convolution encoder layer and a sub-sampling layer form one group, and the first convolution encoder 310 may include a plurality of such groups. As the reference image RIMG passes through three groups, each including a convolution encoder layer and a sub-sampling layer, feature maps FM11, feature maps FM12, and feature maps FM13 may be generated sequentially. For example, the reference image RIMG passes through the first convolution encoder layer CV11 and the first sub-sampling layer MP11 and is converted into the feature maps FM11; the feature maps FM11 pass through the second convolution encoder layer CV12 and the second sub-sampling layer MP12 and are converted into the feature maps FM12; and the feature maps FM12 pass through the third convolution encoder layer CV13 and the third sub-sampling layer MP13 and are converted into the feature maps FM13. The feature maps FM11 may be deeper than the reference image RIMG, the feature maps FM12 may be deeper than the feature maps FM11, and the feature maps FM13 may be deeper than the feature maps FM12. In FIG. 8, these depths are depicted as the horizontal widths of the hexahedrons representing the feature maps FM11, FM12, and FM13. The feature maps FM13 may be the first feature maps FM1 of FIG. 7.
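The progression through the three groups can be traced with a small shape calculation. This is only a sketch: the 64x64 input size and the filter counts 16, 32, and 64 are hypothetical, and each convolution is assumed to preserve the spatial size so that only the 2x2 sub-sampling shrinks it.

```python
def encoder_shapes(height, width, filter_counts, pool=2):
    """Return (height, width, depth) after each convolution + sub-sampling group.

    Assumes size-preserving convolutions and pool x pool sub-sampling.
    """
    shapes = []
    h, w = height, width
    for depth in filter_counts:
        h, w = h // pool, w // pool  # sub-sampling halves each spatial dimension
        shapes.append((h, w, depth))  # depth equals the number of filters
    return shapes


# Hypothetical sizes standing in for FM11, FM12, and FM13.
shapes = encoder_shapes(64, 64, [16, 32, 64])
for name, shape in zip(["FM11", "FM12", "FM13"], shapes):
    print(name, shape)
# FM11 (32, 32, 16)
# FM12 (16, 16, 32)
# FM13 (8, 8, 64)
```

The spatial size shrinks while the depth grows from group to group, which is what the widening hexahedrons in FIG. 8 depict.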

Similarly, as the target image TIMG passes through three groups, each including a convolution encoder layer and a sub-sampling layer, feature maps FM21, feature maps FM22, and feature maps FM23 may be generated sequentially. The feature maps FM21 may be deeper than the target image TIMG, the feature maps FM22 may be deeper than the feature maps FM21, and the feature maps FM23 may be deeper than the feature maps FM22. In FIG. 8, these depths are depicted as the horizontal widths of the hexahedrons representing the feature maps FM21, FM22, and FM23. The feature maps FM23 may be the second feature maps FM2 of FIG. 7.

The first feature swap unit 320 receives the feature maps FM13 and the feature maps FM23, and may swap at least some of the elements of the feature maps FM23 with elements of the feature maps FM13. As described for the first feature swap unit 212 of FIG. 7, for each element of the second feature maps FM2 (here, the feature maps FM23), the element of the first feature maps FM1 (here, the feature maps FM13) having the most similar value may be identified, and the identified element may be set as the value of the corresponding element of the first swap maps SWM1. For example, the elements of the first swap maps SWM1 may be determined according to a correlation measure such as Equation 1 below.

[Equation 1 and its symbols are not reproduced in this text.]

In Equation 1, the three symbols denote, in order, an element of the first swap maps SWM1, an element of the feature maps FM13, and an element of the feature maps FM23.
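Because Equation 1 itself is not available here, the following is only a sketch of one plausible correlation-based swap and not the patent's formula. It assumes each element is a feature vector taken across the depth of the maps at one spatial position, and it uses normalized cross-correlation (cosine similarity) as the similarity measure. The names `normalized_correlation` and `feature_swap` and all numeric values are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # e.g. zero-padded positions of the target image
    return dot / (norm_a * norm_b)


def feature_swap(target_feats, reference_feats):
    """Replace each target element with its best-correlated reference element."""
    swapped = []
    for t in target_feats:
        best = max(reference_feats, key=lambda r: normalized_correlation(t, r))
        swapped.append(list(best))
    return swapped


# Toy stand-ins: three elements of FM13 (reference) and three of FM23 (target).
reference_feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
target_feats = [[0.9, 0.1], [0.2, 5.0], [1.0, 1.1]]

swapped = feature_swap(target_feats, reference_feats)
print(swapped)  # [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
```

Every element of the result comes from the reference features, while the choice of which reference element goes where is driven by the target features. This is how the swap maps can carry the reference image's characteristics in the target's spatial arrangement.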

제 1 컨볼루션 디코더(330)는 적어도 하나의 컨볼루션 디코더 레이어(DCV11) 및 적어도 하나의 업 샘플링 레이어(UP11)를 포함할 수 있다. 제 1 컨볼루션 디코더(330)에 포함되는 컨볼루션 디코더 레이어들의 수 및 업 샘플링 레이어들의 수는 실시 예들에 따라 변할 수 있다.The first convolution decoder 330 may include at least one convolution decoder layer DCV11 and at least one up-sampling layer UP11. The number of convolution decoder layers and the number of up-sampling layers included in the first convolution decoder 330 may vary according to exemplary embodiments.

제 1 스왑 맵들(SWM1)은 업 샘플링 레이어(UP11) 및 컨볼루션 디코더 레이어(DCV11)를 통과하여 세미 변환 이미지(SIMG)로 변환될 수 있다.The first swap maps SWM1 may pass through the up-sampling layer UP11 and the convolution decoder layer DCV11 to be converted into a semi-transformed image SIMG.

업 샘플링 레이어(UP11)는 제 1 스왑 맵들(SWM1)에 대해, 다운 샘플링에 반대되는 업 샘플링을 수행하여 제 1 스왑 맵들(SWM1)의 사이즈를 증가시킬 수 있다. 실시 예들에서, 업 샘플링 레이어(UP11)는 언 풀링(un-pooling) 레이어를 포함하며, 제 1 내지 제 3 서브 샘플링 레이어들(MP11~MP13)에 대응하는 언 풀링 인덱스들을 가질 수 있다.The up-sampling layer UP11 may increase the size of the first swap maps SWM1 by performing up-sampling opposite to the down-sampling on the first swap maps SWM1. In embodiments, the up-sampling layer UP11 includes an un-pooling layer and may have un-pooling indices corresponding to the first to third sub-sampling layers MP11 to MP13.
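Un-pooling with stored indices can be sketched as below. The sketch assumes 2x2 max pooling and a single pooling stage; the patent's up-sampling layer may hold indices for several sub-sampling layers, which is not modeled here. Function names are hypothetical.

```python
def max_pool_with_indices(fmap, size=2):
    """Max-pool and remember where each maximum came from."""
    pooled, indices = [], []
    for i in range(0, len(fmap) - size + 1, size):
        pooled_row, index_row = [], []
        for j in range(0, len(fmap[0]) - size + 1, size):
            candidates = [(fmap[i + di][j + dj], (i + di, j + dj))
                          for di in range(size) for dj in range(size)]
            value, position = max(candidates, key=lambda c: c[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]

pooled, indices = max_pool_with_indices(fmap)
restored = max_unpool(pooled, indices, 4, 4)
print(pooled)    # [[4, 2], [2, 8]]
print(restored)  # [[0, 0, 2, 0], [4, 0, 0, 0], [0, 0, 0, 0], [0, 2, 0, 8]]
```

The restored map has the original size, with each maximum returned to the position it was pooled from.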

컨볼루션 디코더 레이어(DCV11)는 입력 데이터에 대한 디컨볼루션을 수행할 수 있다. 하나 또는 그 이상의 필터들이 디컨볼루션에 이용될 수 있으며, 해당 필터들은 컨볼루션 인코더 레이어들(CV11~CV13)에서 이용되는 필터들과 연관될 수 있다. 예를 들면, 해당 필터들은 컨볼루션 인코더 레이어들(CV11~CV13)에서 이용되는 필터들을 전치(transpose)한 것들일 수 있다. The convolution decoder layer DCV11 may perform deconvolution on input data. One or more filters may be used for deconvolution, and the filters may be associated with filters used in the convolutional encoder layers CV11 to CV13. For example, the filters may be those obtained by transposing filters used in the convolutional encoder layers CV11 to CV13.

이어서 도 9를 참조하면, 제 2 컨볼루션 신경망 네트워크(400)는 제 2 컨볼루션 인코더(410), 제 2 특징 스왑부(420), 및 제 2 컨볼루션 디코더(430)를 포함할 수 있다.Next, referring to FIG. 9, the second convolutional neural network 400 may include a second convolutional encoder 410, a second feature swap unit 420, and a second convolutional decoder 430.

The second convolution encoder 410 may receive the reference image RIMG and the semi-transformed image SIMG. The second convolution encoder 410 may include a plurality of convolution encoder layers, such as first and second convolution encoder layers CV21 and CV22, and a plurality of sub-sampling layers, such as first and second sub-sampling layers MP21 and MP22.

Like each of the first to third convolution encoder layers CV11 to CV13 of FIG. 8, each of the first and second convolution encoder layers CV21 and CV22 may convolve its input data with one or more filters. In embodiments, the filters of the first and second convolution encoder layers CV21 and CV22 may be configured in the same manner as those of the first and second convolution encoder layers CV11 and CV12 of FIG. 8.

Like each of the first to third sub-sampling layers MP11 to MP13 of FIG. 8, each of the first and second sub-sampling layers MP21 and MP22 may down-sample its input feature maps to reduce their size. In embodiments, the first and second sub-sampling layers MP21 and MP22 may be configured in the same manner as the first and second sub-sampling layers MP11 and MP12 of FIG. 8.

기준 이미지(RIMG)가 컨볼루션 인코더 레이어 및 서브 샘플링 레이어를 각각 포함하는 2개의 그룹들을 순차적으로 통과함에 따라, 특징 맵들(FM31) 및 특징 맵들(FM32)이 순차적으로 생성될 수 있다. 특징 맵들(FM32)의 깊이는 특징 맵들(FM31)보다 깊을 수 있으며, 이는 도 9에서 특징 맵들(FM31) 및 특징 맵들(FM32)을 나타내는 육면체들의 가로 방향의 너비들로서 도식화되어 있다. 특징 맵들(FM32)은 도 7의 제 3 특징 맵들(FM3)일 수 있다. 세미 변환 이미지(SIMG)가 컨볼루션 인코더 레이어 및 서브 샘플링 레이어를 각각 포함하는 2개의 그룹들을 순차적으로 통과함에 따라 특징 맵들(FM41) 및 특징 맵들(FM42)이 순차적으로 생성될 수 있다. 특징 맵들(FM42)의 깊이는 특징 맵들(FM41)보다 깊을 수 있으며, 이는 도 9에서 특징 맵들(FM41) 및 특징 맵들(FM42)을 나타내는 육면체들의 가로 방향의 너비들로서 도식화되어 있다. 특징 맵들(FM42)은 도 7의 제 4 특징 맵들(FM4)일 수 있다.As the reference image RIMG sequentially passes through two groups each including a convolutional encoder layer and a sub-sampling layer, the feature maps FM31 and the feature maps FM32 may be sequentially generated. The depth of the feature maps FM32 may be deeper than the feature maps FM31, which is illustrated as widths in the horizontal direction of hexahedrons representing the feature maps FM31 and the feature maps FM32 in FIG. 9. The feature maps FM32 may be the third feature maps FM3 of FIG. 7. Feature maps FM41 and feature maps FM42 may be sequentially generated as the semi-transformed image SIMG sequentially passes through two groups each including a convolutional encoder layer and a sub-sampling layer. The depth of the feature maps FM42 may be deeper than the feature maps FM41, which is illustrated as widths in the horizontal direction of the cubes representing the feature maps FM41 and the feature maps FM42 in FIG. 9. The feature maps FM42 may be the fourth feature maps FM4 of FIG. 7.

제 2 특징 스왑부(420) 및 제 2 컨볼루션 디코더(430)는 도 8의 제 1 특징 스왑부(320) 및 제 1 컨볼루션 디코더(330)와 각각 마찬가지로 구성될 수 있다. 제 2 특징 스왑부(420)는 특징 맵들(FM32) 및 특징 맵들(FM42)을 수신하며, 특징 맵들(FM42)의 요소들 중 적어도 일부를 특징 맵들(FM32)의 요소들로 스왑할 수 있다. 제 2 컨볼루션 디코더(430)는 적어도 하나의 컨볼루션 디코더 레이어(DCV21) 및 적어도 하나의 업 샘플링 레이어(UP21)를 포함하며, 제 2 스왑 맵들(SWM2)은 업 샘플링 레이어(UP21) 및 컨볼루션 디코더 레이어(DCV21)를 통과하여 변환 이미지(CIMG)로 변환될 수 있다.The second feature swap unit 420 and the second convolution decoder 430 may be configured similarly to the first feature swap unit 320 and the first convolution decoder 330 of FIG. 8, respectively. The second feature swap unit 420 receives the feature maps FM32 and the feature maps FM42, and may swap at least some of the elements of the feature maps FM42 with elements of the feature maps FM32. The second convolution decoder 430 includes at least one convolution decoder layer DCV21 and at least one up-sampling layer UP21, and the second swap maps SWM2 include an up-sampling layer UP21 and a convolution. It may be converted into a converted image CIMG by passing through the decoder layer DCV21.

The number of convolution encoder layers CV21 and CV22 included in the second convolution encoder 410 may differ from the number of convolution encoder layers CV11 to CV13 included in the first convolution encoder 310.

As the number of convolution encoder layers that sequentially process an input image decreases, the feature maps may contain data related to features of relatively fine regions. If the reference image RIMG and the target image TIMG pass through a relatively small number of convolution encoder layers and a converted image is generated accordingly, the characteristics (or style) of the reference image RIMG may be reflected in the target image TIMG in units of relatively fine regions, so the detailed shapes of the target image TIMG may be expressed relatively well in the converted image.

As the number of convolution encoder layers that sequentially process an input image increases, the feature maps may contain features of relatively large regions (that is, macroscopic features). If the reference image RIMG and the target image TIMG pass through a relatively large number of convolution encoder layers and a converted image is generated accordingly, the characteristics of the reference image RIMG may be reflected in the target image TIMG in units of relatively large regions, so a viewer may perceive the characteristics of the reference image RIMG as being reflected relatively strongly in the converted image.

According to an embodiment of the present invention, the first and second convolutional neural networks 300 and 400 form sequential stages that process the reference image RIMG and the target image TIMG, and the numbers of convolution encoder layers they include may differ from each other. The semi-transformed image SIMG output by the first convolutional neural network 300 is provided as an input to the second convolutional neural network 400. More specifically, the first convolutional neural network 300 generates the semi-transformed image SIMG by applying the characteristics of the reference image RIMG to the target image TIMG a first time, and the second convolutional neural network 400 generates the converted image CIMG by applying the characteristics of the reference image RIMG to the semi-transformed image SIMG a second time. One of these two applications makes the characteristics of the reference image RIMG relatively strongly reflected in the converted image CIMG, and the other allows the converted image CIMG to express the detailed shapes of the target image TIMG relatively well. Accordingly, the converted image CIMG may include a converted target object that effectively reflects the characteristics of the reference image RIMG while also effectively expressing the detailed shapes of the target image TIMG.

In embodiments, as in the embodiments described with reference to FIGS. 8 and 9, the first convolutional neural network 300 may include more convolution encoder layers than the second convolutional neural network 400. Through repeated experiments, the inventors of the present application found that, in such embodiments, the converted image CIMG includes a converted target object that expresses the detailed shapes of the target image TIMG more effectively. In other embodiments, the second convolutional neural network 400 may include more convolution encoder layers than the first convolutional neural network 300.

FIG. 10 is a flowchart illustrating an embodiment of step S140 of FIG. 2.

Referring to FIGS. 1 and 10, in step S210, the image conversion apparatus 160 inputs the reference image RIMG and the target image TIMG to the first convolutional neural network 165 to obtain a semi-transformed image.

The first convolutional neural network 165 may generate, using a plurality of convolution encoder layers, first feature maps associated with the reference image RIMG and second feature maps associated with the target image TIMG, swap at least some of the elements of the second feature maps with elements of the first feature maps, and pass the result through one or more convolution decoder layers to obtain the semi-transformed image SIMG.

S220단계에서, 이미지 변환 장치(160)는 기준 이미지(RIMG) 및 세미 변환 이미지(SIMG)를 제 2 컨볼루션 신경망 네트워크(166)에 입력시켜 변환 이미지(CIMG)를 획득한다. 제 2 컨볼루션 신경망 네트워크(166)는 복수의 컨볼루션 인코더 레이어들을 이용하여 기준 이미지(RIMG)와 연관된 제 3 특징 맵들 및 세미 변환 이미지(SIMG)와 연관된 제 4 특징 맵들을 생성하고, 제 4 특징 맵들의 요소들 중 적어도 일부를 제 3 특징 맵들의 요소들로 스왑하고, 이를 하나 또는 그 이상의 컨볼루션 디코더 레이어들에 통과시켜 변환 이미지(CIMG)를 획득할 수 있다.In step S220, the image conversion apparatus 160 obtains the converted image CIMG by inputting the reference image RIMG and the semi-transformed image SIMG to the second convolutional neural network network 166. The second convolutional neural network network 166 generates third feature maps associated with the reference image RIMG and fourth feature maps associated with the semi-transformed image SIMG using a plurality of convolutional encoder layers, and generates a fourth feature map. At least some of the elements of the maps may be swapped with elements of the third feature maps, and the transformed image CIMG may be obtained by passing this through one or more convolutional decoder layers.

In this case, the first convolutional neural network 165 and the second convolutional neural network 166 may include different numbers of convolutional encoder layers. Accordingly, the converted image CIMG may include a converted target object that effectively reflects the characteristics of the reference image RIMG while still effectively expressing the detailed shapes of the target image TIMG.
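The two-stage flow of steps S210 and S220 can be sketched as follows. This is only a toy illustration under stated assumptions: the learned convolutional encoder and decoder layers are replaced by 2x2 average pooling and nearest-neighbor upsampling, the swap uses nearest-neighbor matching by Euclidean distance, and all function names and the layer counts (3 and 1) are hypothetical. Only the wiring (RIMG and TIMG into the first stage, RIMG and SIMG into the second stage, with different encoder depths) follows the description.

```python
import numpy as np

def encode(image, num_layers):
    """Toy encoder: each 'layer' is a 2x2 average pool (stand-in for a learned layer)."""
    x = image
    for _ in range(num_layers):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
    return x

def decode(feats, out_shape):
    """Toy decoder: nearest-neighbor upsampling back to out_shape = (H, W)."""
    h, w = out_shape
    rows = np.arange(h) * feats.shape[0] // h
    cols = np.arange(w) * feats.shape[1] // w
    return feats[rows][:, cols]

def swap(image_feats, reference_feats):
    """Replace every image feature vector with its closest reference feature vector."""
    t = image_feats.reshape(-1, image_feats.shape[-1])
    r = reference_feats.reshape(-1, reference_feats.shape[-1])
    dist = ((t[:, None, :] - r[None, :, :]) ** 2).sum(axis=2)
    return r[dist.argmin(axis=1)].reshape(image_feats.shape)

def stage(reference, image, num_encoder_layers):
    """One network: encode both inputs, swap features, decode to an image."""
    ref_feats = encode(reference, num_encoder_layers)
    img_feats = encode(image, num_encoder_layers)
    return decode(swap(img_feats, ref_feats), image.shape[:2])

def convert(reference, target, layers_stage1=3, layers_stage2=1):
    """S210 then S220: the second stage refines the output of the first."""
    semi = stage(reference, target, layers_stage1)   # semi-transformed image (SIMG)
    return stage(reference, semi, layers_stage2)     # converted image (CIMG)
```

Images here are arrays of shape `(H, W, C)`. In this toy setup a deeper encoder matches coarser structure and a shallower one matches finer detail, which gives some intuition for why the two networks may use different numbers of encoder layers.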

FIG. 11 is a block diagram showing another embodiment of the computer device of FIG. 1.

Referring to FIG. 11, the computer device 1000 includes a communicator 1100, an A/V input unit 1200, a user interface 1300, a display device 1400, a nonvolatile storage medium 1500, a processor 1600, and a system memory 1700.

The communicator 1100 is configured to communicate with an external device through a network in response to control of the processor 1600. The communicator 1100 is configured to transmit wired or wireless signals to at least one of a base station, an external server, and an external terminal on a mobile communication network. Here, the wireless signals may include a voice call signal, a video call signal, or various types of data associated with transmitting and receiving text or multimedia messages. The communicator 1100 is also configured to access the wireless Internet. Furthermore, the communicator 1100 is configured to perform short-range wireless communication, which may include communication using at least one of the following communication protocols: Bluetooth, Wi-Fi, LTE D2D, NFC, magnetic secure transmission, ZigBee, infrared (IrDA, Infrared Data Association), UWB (Ultra Wideband), Ant+, and/or the like. The communicator 1100 may be provided as the communicator 105 of FIG. 1.

The A/V input unit 1200 is for inputting an audio signal and a video signal, and may include a camera. The A/V input unit 1200 processes an image obtained through an image sensor of the camera. The A/V input unit 1200 may be provided as the A/V input unit 120 of FIG. 1.

The user interface 1300 receives user inputs for controlling operations of the computer device 1000 or the processor 1600. The user interface 1300 may be provided as the user interface 110 of FIG. 1.

The display device 1400 displays information processed by the computer device 1000 or the processor 1600. In embodiments, the display device 1400 may include at least one of a liquid crystal display, an organic light-emitting diode display, a flexible display, and the like. The display device 1400 may be provided as the display device 130 of FIG. 1.

The nonvolatile storage medium 1500 may be at least one of a flash memory, a hard disk, a multimedia card, and the like. The nonvolatile storage medium 1500 is configured to write and read data in response to control of the processor 1600. The nonvolatile storage medium 1500 may be provided as the nonvolatile storage medium 150 of FIG. 1.

The processor 1600 may include either a general-purpose or a dedicated processor, and controls operations of the communicator 1100, the A/V input unit 1200, the user interface 1300, the display device 1400, the nonvolatile storage medium 1500, and the system memory 1700.

The processor 1600 may load program codes from the nonvolatile storage medium 1500 into the system memory 1700 and execute the loaded program codes. The processor 1600 may load, into the system memory 1700, an image conversion module 1710 that, when executed by the processor 1600, performs the functions of the image conversion device 160 described with reference to FIGS. 1, 2, and 7 to 10, and may execute the loaded image conversion module 1710. In addition, the processor 1600 may load an operating system 1720 into the system memory 1700 and execute the loaded operating system 1720 to control the overall operations of the computer device 1000. The operating system 1720 may provide the image conversion module 1710 with interfaces to the communicator 1100, the A/V input unit 1200, the user interface 1300, the display device 1400, the nonvolatile storage medium 1500, and the system memory 1700.

The system memory 1700 may include at least one of random access memory (RAM), read only memory (ROM), and other types of computer-readable storage media. In FIG. 11, the system memory 1700 is shown as a component separate from the processor 1600, but this is only an example, and at least a portion of the system memory 1700 may be integrated into the processor 1600. The system memory 1700 may be provided as a buffer memory of the processor 1600. The system memory 1700 may function as the memory 140 of FIG. 1.

In embodiments, a system bus may further be provided as an interface connecting the communicator 1100, the A/V input unit 1200, the user interface 1300, the display device 1400, the nonvolatile storage medium 1500, the processor 1600, and the system memory 1700 to one another.

FIG. 12 is a block diagram showing a client server capable of communicating with the computer device of FIG. 11.

A program (or application) including instructions corresponding to the image conversion module 1710 of FIG. 11 may be provided from a client server 2000. Referring to FIG. 12, the client server 2000 may include a communicator 2100, a processor 2200, and a database 2300. The communicator 2100 may communicate with the computer device 1000 through a network. The database 2300 may store the program including the instructions corresponding to the image conversion module 1710. The processor 2200 may provide the program stored in the database 2300 to the computer device 1000 through the communicator 2100 in response to a request from the computer device 1000 of FIG. 11. The computer device 1000 may receive the program through the communicator 1100, install the received program, load at least some of the instructions of the installed program into the system memory 1700 as the image conversion module 1710, and execute the loaded image conversion module 1710.
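The request, install, and load sequence between the computer device 1000 and the client server 2000 can be sketched in memory as below. This is a hypothetical illustration: the class and method names, the use of dictionaries for the database 2300, the nonvolatile storage medium 1500, and the system memory 1700, and the direct method call standing in for network communication are all assumptions, not details from the description.

```python
from dataclasses import dataclass, field

@dataclass
class ClientServer:
    # Stand-in for the database 2300: program name -> program bytes.
    database: dict = field(default_factory=dict)

    def handle_request(self, program_name):
        """Return the stored program for a request, or None if it is not stored."""
        return self.database.get(program_name)

@dataclass
class ComputerDevice:
    # Stand-ins for the nonvolatile storage medium 1500 and the system memory 1700.
    installed: dict = field(default_factory=dict)
    system_memory: dict = field(default_factory=dict)

    def fetch_install_load(self, server, program_name):
        """Request the program, install it, then load it into system memory."""
        program = server.handle_request(program_name)  # network call in practice
        if program is None:
            raise LookupError(f"server has no program named {program_name!r}")
        self.installed[program_name] = program         # install
        self.system_memory[program_name] = program     # load for execution
        return program
```

A real deployment would replace the direct call with a network request through the communicators 1100 and 2100 and would verify the received program before installing it.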

Although specific embodiments and application examples have been described herein, they are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set forth below but also everything that is equivalent to those claims, or an equivalent modification of them, falls within the scope of the spirit of the present invention.

105: communicator
110: user interface
120: A/V input unit
130: display device
140: memory
150: nonvolatile storage medium
160: image conversion device

Claims

A computer device comprising:
at least one processor;
a display device configured to operate in response to control of the at least one processor; and
a memory configured to store a first image and a second image and to operate in response to control of the at least one processor,
wherein the at least one processor is configured to:
detect at least one target object from the first image in response to a first user input selecting the first image;
display, on the display device, the target object overlapped on a part of the second image in response to a second user input overlapping the target object on the part of the second image;
convert the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks, each of the plurality of convolutional neural networks including a plurality of convolutional encoder layers and at least one convolutional decoder layer; and
display, on the display device, the converted target object overlapped on the part of the second image as a third image,
wherein the plurality of convolutional neural networks include first and second convolutional neural networks, and
wherein the at least one processor is configured to:
input the overlapped target object and the second image to the first convolutional neural network to obtain a fourth image from the first convolutional neural network; and
input the fourth image and the second image to the second convolutional neural network to obtain the third image from the second convolutional neural network.

The computer device of claim 1,
wherein the plurality of convolutional neural networks are provided as sequential stages.

The computer device of claim 1,
wherein the plurality of convolutional neural networks have different numbers of convolutional encoder layers.

(Deleted)

The computer device of claim 1,
wherein the at least one processor is configured to:
generate first feature maps by passing the second image through convolutional encoder layers of the first convolutional neural network;
generate second feature maps by passing the overlapped target object through the convolutional encoder layers of the first convolutional neural network;
obtain first swap maps by reflecting the first feature maps in at least some of the second feature maps; and
obtain the fourth image by passing the first swap maps through at least one convolutional decoder layer of the first convolutional neural network.

The computer device of claim 5,
wherein the at least one processor is configured to:
generate third feature maps by passing the second image through convolutional encoder layers of the second convolutional neural network;
generate fourth feature maps by passing the fourth image through the convolutional encoder layers of the second convolutional neural network;
obtain second swap maps by reflecting the third feature maps in at least some of the fourth feature maps; and
obtain the third image by passing the second swap maps through at least one convolutional decoder layer of the second convolutional neural network.

The computer device of claim 1,
wherein the number of convolutional encoder layers of the first convolutional neural network is greater than or equal to the number of convolutional encoder layers of the second convolutional neural network.

The computer device of claim 1,
wherein the number of convolutional encoder layers of the second convolutional neural network is greater than or equal to the number of convolutional encoder layers of the first convolutional neural network.

A method of operating a computer device including a display device, the method comprising:
storing a first image and a second image;
detecting at least one target object from the first image in response to a first user input selecting the first image;
displaying, on the display device, the target object overlapped on a part of the second image in response to a second user input overlapping the target object on the part of the second image;
converting the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks, each of the plurality of convolutional neural networks including a plurality of convolutional encoder layers and at least one convolutional decoder layer; and
displaying, on the display device, the converted target object overlapped on the part of the second image as a third image,
wherein the plurality of convolutional neural networks include first and second convolutional neural networks, and
wherein converting the overlapped target object in association with the second image comprises:
inputting the overlapped target object and the second image to the first convolutional neural network to obtain a fourth image from the first convolutional neural network; and
inputting the fourth image and the second image to the second convolutional neural network to obtain the third image from the second convolutional neural network.

The method of claim 9,
wherein the plurality of convolutional neural networks are provided as sequential stages.

The method of claim 9,
wherein the plurality of convolutional neural networks have different numbers of convolutional encoder layers.

(Deleted)

The method of claim 9,
wherein the number of convolutional encoder layers of the first convolutional neural network is greater than or equal to the number of convolutional encoder layers of the second convolutional neural network.

The method of claim 9,
wherein the number of convolutional encoder layers of the second convolutional neural network is greater than or equal to the number of convolutional encoder layers of the first convolutional neural network.

A computer-readable storage medium storing a program,
wherein the program includes instructions that, when executed by a computer:
detect at least one target object from a first image stored in the computer in response to a first user input selecting the first image;
display, on a display device, the target object overlapped on a part of a second image displayed on the display device in response to a second user input overlapping the target object on the part of the second image;
convert the overlapped target object in association with the second image, in response to a third user input, by processing the overlapped target object and the second image through a plurality of convolutional neural networks, each of the plurality of convolutional neural networks including a plurality of convolutional encoder layers and at least one convolutional decoder layer; and
display, on the display device, the converted target object overlapped on the part of the second image as a third image,
wherein the plurality of convolutional neural networks include first and second convolutional neural networks, and
wherein converting the overlapped target object in association with the second image includes:
inputting the overlapped target object and the second image to the first convolutional neural network to obtain a fourth image from the first convolutional neural network; and
inputting the fourth image and the second image to the second convolutional neural network to obtain the third image from the second convolutional neural network.