KR102424538B1

KR102424538B1 - Method and apparatus for image restoration

Info

Publication number: KR102424538B1
Application number: KR1020210034480A
Authority: KR
Inventors: 권기남; 김희원; 이경무; 이형욱
Original assignee: 삼성전자주식회사; 서울대학교산학협력단
Priority date: 2021-01-26
Filing date: 2021-03-17
Publication date: 2022-07-27

Abstract

Disclosed are an image restoration method and apparatus. According to one embodiment, the method may include receiving an input image and a first task vector indicating a first image effect among a plurality of candidate image effects, extracting a common feature shared by the plurality of candidate image effects from the input image based on a task-agnostic architecture of a source neural network, and restoring the common feature into a first restored image corresponding to the first image effect based on a task-specific architecture of the source neural network and the first task vector.

Description

METHOD AND APPARATUS FOR IMAGE RESTORATION

The following embodiments relate to a method and apparatus for image restoration.

Image restoration is a technique for restoring a degraded image into an image of improved quality. A deep learning-based neural network may be used for image restoration. Once trained through deep learning, a neural network can perform inference suited to its purpose by mapping input data and output data that are in a nonlinear relationship to each other. The trained ability to generate such a mapping may be called the learning ability of the neural network. Moreover, a neural network trained for a specialized purpose such as image restoration may have a generalization ability, for example the ability to produce a relatively accurate output for an input pattern on which it was not trained.

According to an embodiment, an image restoration method includes: receiving an input image and a first task vector indicating a first image effect among a plurality of candidate image effects; extracting a common feature shared by the plurality of candidate image effects from the input image based on a task-agnostic architecture of a source neural network; and restoring the common feature into a first restored image corresponding to the first image effect based on a task-specific architecture of the source neural network and the first task vector.

According to an embodiment, a training method includes: receiving a first training data set including a first training input image, a first task vector indicating a first image effect among a plurality of candidate image effects, and a first training target image according to the first image effect; extracting a common feature shared by the plurality of candidate image effects from the first training input image based on a task-agnostic architecture of a source neural network; restoring the common feature into a first restored image based on a task-specific architecture of the source neural network and the first task vector; and updating the source neural network based on a difference between the first training target image and the first restored image, and on an amount of computation associated with the extraction of the common feature and the restoration of the first restored image.

According to an embodiment, an electronic device includes: a camera that generates an input image; and a processor that receives the input image and a first task vector indicating a first image effect among a plurality of candidate image effects, extracts a common feature shared by the plurality of candidate image effects from the input image based on a task-agnostic architecture of a source neural network, and restores the common feature into a first restored image corresponding to the first image effect based on a task-specific architecture of the source neural network and the first task vector.

FIG. 1 illustrates the general operation of an image restoration apparatus according to an embodiment.
FIG. 2 illustrates a source neural network and modified networks according to an embodiment.
FIG. 3 illustrates a task-specific architecture and a control architecture according to an embodiment.
FIG. 4 is a flowchart illustrating an image restoration operation based on a first task vector according to an embodiment.
FIG. 5 is a block diagram illustrating a training apparatus according to an embodiment.
FIG. 6 illustrates an architecture of a source neural network according to an embodiment.
FIG. 7 illustrates a channel selection operation according to an embodiment.
FIG. 8 illustrates a configuration of an architecture control network according to an embodiment.
FIG. 9 illustrates a training data set having an absolute target.
FIG. 10 illustrates a training data set having a relative target according to an embodiment.
FIG. 11 illustrates a configuration of a training data set according to an embodiment.
FIG. 12 is a flowchart illustrating a training operation based on a first training data set according to an embodiment.
FIG. 13 is a block diagram illustrating an image restoration apparatus according to an embodiment.
FIG. 14 is a block diagram illustrating an electronic device according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustration only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, and substitutes that fall within the technical idea described by the embodiments.

Although terms such as "first" and "second" may be used to describe various components, these terms should be interpreted only as distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that an intervening component may also be present.

A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this specification.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted.

FIG. 1 illustrates the general operation of an image restoration apparatus according to an embodiment. Referring to FIG. 1, the image restoration apparatus 100 may receive an input image 101 and various task vectors 102, and may output various restored images 103. The task vectors 102 may correspond to various image effects. Each task vector 102 may have one or more dimensions. Each dimension may represent an effect type, and the value of each dimension may represent an adjustment level. The adjustment level may represent the magnitude of the effect level adjusted by the task vector 102. From the viewpoint of degradation, the effect type and effect level may also be referred to as the degradation type and degradation level. The task vectors 102 may be set in advance by a designer and/or operator of the image restoration apparatus 100, or may be set by a user while using the image restoration apparatus 100.

The effect types of the various image effects may include a noise effect, a blur effect, a JPEG (Joint Photographic Experts Group) compression effect, a white balance effect, an exposure effect, a contrast effect, a lens distortion effect, and a combination of at least one of these. For example, in a three-dimensional task vector, the first dimension may represent the noise effect and its value the noise level; the second dimension may represent the blur effect and its value the blur level; and the third dimension may represent the JPEG compression effect and its value the JPEG compression level. This is only one example, however, and a task vector may have a different number of dimensions, different effect types, and/or different effect levels.
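The three-dimensional example above can be sketched as a small data structure. This is only an illustration: the `TaskVector` class, the `EFFECT_TYPES` ordering, and the numeric levels are hypothetical names and values chosen here, not part of the disclosure.

```python
from dataclasses import dataclass

# Hypothetical ordering of effect types, following the three-dimensional example:
# dimension 0 = noise, dimension 1 = blur, dimension 2 = JPEG compression.
EFFECT_TYPES = ("noise", "blur", "jpeg")


@dataclass(frozen=True)
class TaskVector:
    """One adjustment level per effect type (illustrative only)."""
    levels: tuple

    def describe(self):
        # Pair each dimension (effect type) with its value (adjustment level).
        return dict(zip(EFFECT_TYPES, self.levels))


# Example: moderate noise adjustment, no blur adjustment, mild JPEG adjustment.
tv = TaskVector(levels=(0.5, 0.0, 0.25))
print(tv.describe())
```

A combined effect is then simply a vector with more than one non-zero dimension.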

Image restoration may include the application of an image effect. Assuming that a clean image is an image of good quality, image quality may be either improved or degraded by the application of an image effect. For example, image quality may be improved by a noise removal effect and degraded by a noise addition effect. Image restoration may thus result in an improvement and/or a degradation of image quality.

The image restoration apparatus 100 may generate the various restored images 103 by applying the image effects indicated by the task vectors 102 to the input image 101. The image restoration apparatus 100 may determine modified networks 120 by applying the task vectors 102 to a source neural network 110, and may generate the restored images 103 using the modified networks 120. By using the source neural network 110 and the modified networks 120, the image restoration apparatus 100 may minimize the computation required for image restoration.

The source neural network 110 and the modified networks 120 may include a deep neural network (DNN) including a plurality of layers. The plurality of layers may include an input layer, at least one hidden layer, and an output layer.

The deep neural network may include at least one of a fully connected network (FCN), a convolutional neural network (CNN), and a recurrent neural network (RNN). For example, at least some of the layers in the neural network may correspond to a CNN, and others may correspond to an FCN. In this case, the CNN may be referred to as a convolutional layer, and the FCN may be referred to as a fully connected layer.

In the case of a CNN, the data input to each layer may be referred to as an input feature map, and the data output from each layer may be referred to as an output feature map. The input feature map and the output feature map may also be referred to as activation data. When a convolutional layer corresponds to the input layer, the input feature map of the input layer may be the input image. An output feature map may be generated through a convolution operation between the input feature map and a weight kernel. The input feature map, the output feature map, and the weight kernel may each be handled in units of tensors.

Once trained through deep learning, a neural network can perform inference suited to its training purpose by mapping input data and output data that are in a nonlinear relationship to each other. Deep learning is a machine learning technique for solving problems such as image or speech recognition from big data sets. Deep learning may be understood as the process of solving an optimization problem: finding the point at which energy is minimized while training a neural network with prepared training data.

Through supervised or unsupervised deep learning, the structure of a neural network, or the weights corresponding to a model, may be obtained, and input data and output data may be mapped to each other through these weights. If the width and depth of a neural network are sufficiently large, it may have enough capacity to implement an arbitrary function. A neural network may achieve optimal performance when it learns a sufficiently large amount of training data through an appropriate training process.

Below, a neural network may be described as having been trained "in advance," where "in advance" may mean before the neural network is "started." A neural network being "started" may mean that the neural network is ready for inference. For example, a neural network being "started" may include the neural network having been loaded into memory, or input data for inference having been input to the neural network after it was loaded into memory.

The source neural network 110 may include a task-agnostic architecture, a task-specific architecture, and a control architecture. The task-agnostic architecture may extract, from the input image 101, a feature that is used in common by every task. Such a feature may be referred to as a common feature. The task-specific architecture may extract a feature specialized for each task based on the common feature. Such a feature may be referred to as a specialized feature. The task-specific architecture may restore the specialized feature into a restored image. The control architecture may determine each task-specific network based on each task vector and the task-specific architecture. The source neural network 110 and the modified networks 120 are described further with reference to FIGS. 2 and 3.

FIG. 2 illustrates a source neural network and modified networks according to an embodiment. Referring to FIG. 2, the source neural network 200 may include a task-agnostic architecture 201 and a task-specific architecture 202. A first task vector 203 may be applied to the source neural network 200 to generate a first modified network 210, and a second task vector 204 may be applied to generate a second modified network 220. Additional modified networks may be generated based on additional task vectors, and the description below may apply to such additional task vectors and additional modified networks.

The first modified network 210 may restore a first restored image 206 based on an input image 205. The first modified network 210 may include a task-agnostic network 211 and a first task-specific network 212. The task-agnostic network 211 may be determined by applying a shared parameter to the task-agnostic architecture 201, and the first task-specific network 212 may be determined by applying the first task vector 203 to the task-specific architecture 202. For example, the first task-specific network 212 may be determined by performing channel pruning on the task-specific architecture 202 using the first task vector 203. Such pruning may reduce computation. The task-agnostic network 211 may extract a common feature from the input image 205, and the first task-specific network 212 may extract, from the common feature, a first specialized feature specialized for the first image effect indicated by the first task vector 203. The first task-specific network 212 may restore the first specialized feature into the first restored image 206.

The second modified network 220 may restore a second restored image 207 based on the input image 205. The second modified network 220 may include a task-agnostic network 221 and a second task-specific network 222. The task-agnostic network 211 and the task-agnostic network 221 may be identical. The task-agnostic network 221 may be determined by applying the shared parameter to the task-agnostic architecture 201, and the task-agnostic network 221 may extract a common feature from the input image 205. This common feature may be the same as the output of the task-agnostic network 211. Accordingly, the output of the task-agnostic network 211 may be reused for the restoration of the second restored image 207, and both the operation of determining the task-agnostic network 221 and its feature extraction operation may be omitted. The second task-specific network 222 may be determined by applying the second task vector 204 to the task-specific architecture 202. The second task-specific network 222 may extract, from the common feature, a second specialized feature specialized for the second image effect indicated by the second task vector 204, and may restore the second specialized feature into the second restored image 207.
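The reuse of the common feature across task vectors can be sketched as follows. This is a minimal illustration under stated assumptions: `SourceNetworkSketch` and its methods are hypothetical, and the arithmetic inside `task_agnostic` and `task_specific` is a placeholder standing in for real network layers. The call counter only exists to show that the shared stage runs once per input image.

```python
class SourceNetworkSketch:
    """Toy stand-in for a source neural network (all names are hypothetical)."""

    def __init__(self):
        self.agnostic_calls = 0  # counts how often the shared stage runs

    def task_agnostic(self, image):
        # Placeholder for common-feature extraction with shared parameters.
        self.agnostic_calls += 1
        return [2.0 * pixel for pixel in image]

    def task_specific(self, common_feature, task_vector):
        # Placeholder for the task-specific stage; the task vector changes
        # the computation applied to the same common feature.
        gain = 1.0 + sum(task_vector)
        return [gain * value for value in common_feature]

    def restore_many(self, image, task_vectors):
        # The common feature is extracted once and reused for every task vector.
        common_feature = self.task_agnostic(image)
        return [self.task_specific(common_feature, tv) for tv in task_vectors]


net = SourceNetworkSketch()
restored = net.restore_many([1.0, 2.0], [(0.0,), (1.0,)])
print(restored, net.agnostic_calls)
```

Two restored outputs are produced from one pass through the shared stage, which is the saving the paragraph above describes.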

FIG. 3 illustrates a task-specific architecture and a control architecture according to an embodiment. Referring to FIG. 3, the task-specific architecture 310 may include channel selectors 311 to 313 and a plurality of layers 315 to 317, and the control architecture 320 may include a plurality of architecture control networks 321 to 323. Each of the architecture control networks 321 to 323 may include at least one convolutional layer and at least one activation function. For example, the convolutional layer may be a 1*1 convolutional layer, and the activation function may be a ReLU function. This is only one example, however, and a convolutional layer of a dimension other than 1*1 and/or another nonlinear function such as a sigmoid or hyperbolic tangent (tanh) may be used. The pairs of channel selectors 311 to 313 and architecture control networks 321 to 323 may correspond to the plurality of layers 315 to 317.
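A single architecture control network of the kind described above can be sketched in a few lines. The sketch assumes the task vector is treated as a feature map with one channel per dimension and a spatial size of 1*1, in which case a 1*1 convolution reduces to one dot product per output channel. The function names, the weights, and the bias values are hypothetical, and a real network may stack several such layers.

```python
def relu(values):
    # ReLU activation: negative entries become zero.
    return [max(0.0, v) for v in values]


def conv1x1(vector, weights, bias):
    # 1*1 convolution over a (channels x 1 x 1) input:
    # each output channel is a weighted sum of the input channels plus a bias.
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def architecture_control_network(task_vector, weights, bias):
    # One convolutional layer followed by one activation function.
    return relu(conv1x1(task_vector, weights, bias))


# Hypothetical parameters: 3 task-vector dimensions in, 4 channel scores out.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, -1.0, 0.0],
    [0.0, 0.0, 2.0],
]
bias = [0.0, 0.0, 0.0, 0.5]

scores = architecture_control_network((0.5, 0.0, 0.25), weights, bias)
print(scores)
```

The output has one real value per output channel of the layer that this control network is paired with.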

The image restoration apparatus may determine a task-specific network by applying a task vector 301 to the task-specific architecture 310.

The image restoration apparatus may generate channel selection information for each of the layers 315 to 317 using the architecture control networks 321 to 323 and the channel selectors 311 to 313. Each of the architecture control networks 321 to 323 may determine, based on the task vector, a channel importance for the task (or a task preference for the channel). The channel importance (or task preference) may take the form of a real vector. The channel importances output by the architecture control networks 321 to 323 may each have different values. Each of the channel selectors 311 to 313 may generate channel selection information based on the channel importance, by converting each real element of the real vector representing the channel importance into true or false. The channel selection information may take the form of a binary vector.
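The conversion from a real-valued channel importance vector to binary channel selection information can be sketched as below. The text above does not state which rule maps a real element to true or false, so the comparison against a threshold of 0.0 is purely an assumption made for illustration, as is the function name `channel_selector`.

```python
def channel_selector(channel_importance, threshold=0.0):
    """Convert each real element to True (keep) or False (remove).

    The threshold rule is an assumption of this sketch; the description only
    says that each real element is converted to true or false.
    """
    return [value > threshold for value in channel_importance]


importance = [0.5, 0.0, -1.2, 1.0]      # hypothetical real vector for one layer
selection = channel_selector(importance)  # binary vector, one entry per channel
print(selection)
```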

The image restoration apparatus may determine the task-specific network corresponding to the task vector 301 based on the channel selection information for each of the layers 315 to 317. The image restoration apparatus may determine the task-specific network by applying channel pruning to each of the layers 315 to 317 based on the channel selection information. For example, when the first layer 315 has c output channels, at least some of the c output channels may be removed according to the channel selection information generated by the first channel selector 311. A channel corresponding to true in the channel selection information may be kept, and a channel corresponding to false may be removed. Removing a channel may mean skipping the channel. For example, when the weight kernel is divided into weight tensors each corresponding to an output channel, the image restoration apparatus may perform the convolution operation of the layer using the weight tensors of the remaining channels, without loading the weight tensors of the channels to be removed into a register. In this way, a task-specific network specialized for a particular task vector 301 may be implemented through channel skipping according to that task vector 301.
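Channel skipping in a single convolutional layer can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: the function names are hypothetical, the "convolution" is the stride-1, no-padding cross-correlation commonly used in deep learning, and the sketch simply returns the kept output channels without modelling how a following layer consumes them.

```python
def conv_one_channel(feature_map, weight_tensor):
    """Produce one output channel.

    feature_map:   [C_in][H][W]
    weight_tensor: [C_in][kH][kW], the slice of the weight kernel
                   belonging to a single output channel.
    """
    c_in = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    kh, kw = len(weight_tensor[0]), len(weight_tensor[0][0])
    return [
        [
            sum(
                feature_map[c][y + dy][x + dx] * weight_tensor[c][dy][dx]
                for c in range(c_in)
                for dy in range(kh)
                for dx in range(kw)
            )
            for x in range(w - kw + 1)
        ]
        for y in range(h - kh + 1)
    ]


def conv2d_with_channel_skip(feature_map, weight_kernel, channel_selection):
    """Compute only the output channels whose selection entry is True."""
    kept = []
    for weight_tensor, keep in zip(weight_kernel, channel_selection):
        if not keep:
            continue  # skipped channel: its weight tensor is never read
        kept.append(conv_one_channel(feature_map, weight_tensor))
    return kept


# One 3x3 input channel and a weight kernel with two output channels (2x2 each).
feature_map = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
weight_kernel = [
    [[[1.0, 0.0], [0.0, 1.0]]],
    [[[1.0, 1.0], [1.0, 1.0]]],
]

pruned = conv2d_with_channel_skip(feature_map, weight_kernel, [False, True])
print(pruned)
```

With the selection `[False, True]`, only the second weight tensor is used, so the layer performs half of the multiply-accumulate work of the unpruned layer.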

FIG. 4 is a flowchart illustrating an image restoration operation based on a first task vector according to an embodiment. Steps 410 to 430 of FIG. 4 may be performed sequentially or non-sequentially. For example, the order of steps 410 to 430 may be changed, and/or at least two of steps 410 to 430 may be performed in parallel. Steps 410 to 430 may be performed by at least one component (e.g., the processor 1310 or 1410) of the image restoration apparatus 100 or 1300 and/or the electronic device 1400.

Referring to FIG. 4, in step 410, an input image and a first task vector indicating a first image effect among a plurality of candidate image effects may be received. In step 420, a common feature shared by the plurality of candidate image effects is extracted from the input image based on the task-agnostic architecture of the source neural network. In step 420, a task-agnostic network may be determined by applying a shared parameter to the task-agnostic architecture, and the common feature may be extracted from the input image based on the task-agnostic network.

In step 430, the common feature may be restored into a first restored image corresponding to the first image effect based on the task-specific architecture of the source neural network and the first task vector. In step 430, a first task-specific network may be determined by applying the first task vector to the task-specific architecture, and the common feature may be restored into the first restored image based on the first task-specific network. Here, a first specialized feature specialized for the first image effect may be extracted from the common feature based on the first task-specific network, and the first specialized feature may be restored, based on the first task-specific network, into the first restored image corresponding to the first image effect.

In addition, first channel selection information corresponding to the first task vector may be generated using an architecture control network, and the first task-specific network may be determined by removing at least some channels of the task-specific architecture based on the first channel selection information. In this case, a first real vector may be generated by processing the first task vector through the architecture control network, and the first channel selection information may be generated by converting each real element of the first real vector into true or false through a transformation function.
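
The path from a task vector to channel selection information can be sketched as follows. This is a minimal illustration, not the patented implementation: the single fully connected layer, its weights and biases, and the zero threshold used as the transformation function are all assumptions chosen for the example.

```python
def control_network(task_vector, weights, biases):
    """Toy fully connected layer: one real-valued score per channel (hypothetical)."""
    return [
        sum(w * t for w, t in zip(row, task_vector)) + b
        for row, b in zip(weights, biases)
    ]


def transform(real_vector, threshold=0.0):
    """Convert each real element to True (keep channel) or False (remove channel)."""
    return [value > threshold for value in real_vector]


task_vector = [0.4, 0.0]                      # hypothetical first task vector
weights = [[1.0, 0.0], [-1.0, 0.5], [0.5, 0.5], [0.0, -1.0]]
biases = [0.0, 0.1, -0.1, -0.2]

real_vector = control_network(task_vector, weights, biases)
selection = transform(real_vector)            # first channel selection information
kept_channels = [i for i, keep in enumerate(selection) if keep]
```

Channels whose entry is false would be removed from the task-specific architecture, leaving the first task-specific network.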

In connection with operations 410 to 430, an image restoration operation based on a second task vector may also be performed. For example, a second task vector corresponding to a second image effect among the plurality of candidate image effects may be received, and the common feature may be reconstructed into a second reconstructed image corresponding to the second image effect based on the task-specific architecture and the second task vector. This common feature may be the common feature extracted in operation 420, which is reused to reconstruct the second reconstructed image. Once a common feature is extracted from an input image, it can be reused to reconstruct images of various image effects for the same input image. This reuse reduces the computation required for feature extraction. In addition, the descriptions of FIGS. 1 to 3 and 5 to 14 may apply to the image restoration.
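
The reuse of the common feature across task vectors can be illustrated with a small sketch. The two stand-in functions below are hypothetical placeholders for the task-agnostic and task-specific networks; only the call pattern (extract once, reconstruct once per task vector) reflects the description.

```python
class CountingExtractor:
    """Stand-in for the task-agnostic network; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return [2 * pixel for pixel in image]  # placeholder "common feature"


def task_specific_reconstruct(common_feature, task_vector):
    """Stand-in for the task-specific network conditioned on a task vector."""
    scale = 1.0 + sum(task_vector)
    return [scale * value for value in common_feature]


extract_common = CountingExtractor()
image = [1.0, 2.0, 3.0]
task_vectors = [[0.0], [0.5], [1.0]]  # three hypothetical image effects

common = extract_common(image)  # computed once for the input image
reconstructed = [task_specific_reconstruct(common, t) for t in task_vectors]
```

However many task vectors are supplied, the extractor runs once per input image.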

FIG. 5 is a block diagram illustrating a training apparatus according to an embodiment. Referring to FIG. 5, the training apparatus 500 may include a processor 510 and a memory 520. The processor 510 may train a source neural network 530 stored in the memory 520 based on training data. Training the source neural network 530 may include updating the source neural network 530 and/or updating parameters (e.g., weights) of the source neural network 530. The source neural network 530 may be trained in advance and/or trained on-device during use. The training data may include a training input and a training output. The training output may also be referred to as a training target. The training input may include a training input image and a task vector, and the training output may include a training target image.

The source neural network 530 may include a task-agnostic architecture, a task-specific architecture, and a control architecture. The training apparatus 500 may find an efficient architecture through task-specific pruning and task-agnostic pruning. Task-specific pruning learns to adaptively remove network parameters that are irrelevant to each task, and task-agnostic pruning learns to find an efficient architecture by sharing the early layers of the network across multiple tasks.

Controllable image restoration, or image modulation, can restore images with a different degree of effect for each effect type. Given D effect types, a D-dimensional task vector encodes the m-th image restoration task, that is, the m-th image effect (m ∈ {1, 2, ..., M}), and each d-th component of the task vector determines the adjustment level for the corresponding d-th degradation type. While the neural network is trained, the task vector may be randomly sampled together with a corresponding training pair of an input image and a target image. At inference, the task vector serves as a control variable that determines the image effect.

For a real-world degraded image, the optimal task vector that produces the best image effect with respect to a predetermined measure (e.g., peak signal-to-noise ratio (PSNR), learned perceptual image patch similarity (LPIPS), or user preference) may be assumed to be unknown. Finding such a task vector may therefore require the controllable image restoration network to generate a large number of image effects per input image. Here, M denotes the arbitrary number of image effects generated for a given task until the user preference or requirement is met.

In prior work, the architecture is fixed and a full network inference is performed for each image effect. According to embodiments, a network architecture may be provided that accurately generates multiple image effects per input image while minimizing the computational cost of the restoration process. The average computational cost of generating the given M image effects can be expressed as Equation 1.
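
A form of Equation 1 consistent with this description would be the per-effect cost averaged over the M image effects. In the sketch below, the cost function name R and the notation t^m for the m-th task vector are assumptions introduced here; f and x are the network architecture and the input image.

```latex
\frac{1}{M} \sum_{m=1}^{M} R\left(f, x, t^{m}\right)
```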

In Equation 1, the cost term denotes the number of floating-point operations (FLOPs) or the latency required to generate the m-th image effect using the network architecture f, the input image x, and the m-th task vector. Task-specific pruning can search for an efficient network architecture specialized for each image effect, which yields the average computational cost of Equation 2.

In Equation 2, the fixed architecture f is replaced with an efficient network specialized for the m-th image effect, which carries an auxiliary computational cost required by the task-specific pruning process. Task-agnostic pruning can then determine a task-agnostic architecture that shares the feature maps of the early layers across tasks to enable feature reuse. This can be expressed as Equation 3.

In Equation 3, the task-specific part consists of the remaining task-specific layers of the m-th efficient network that follow the task-agnostic architecture, and the intermediate term is the feature map output by the task-agnostic architecture. The feature map output may correspond to a feature common to the tasks. As a result, only a single computation of the task-agnostic architecture is required for all M image effects, and the M−1 redundant computations of the feature maps of the shared early layers can be eliminated. This can be expressed as Equation 4.
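
The saving from computing the shared early layers once can be illustrated numerically. The sketch below is not a reproduction of Equations 1 to 4: the cost units and values are hypothetical, and the auxiliary pruning cost mentioned above is left out for brevity.

```python
def average_cost_full_inference(per_effect_costs):
    """Average cost when the whole network runs once per image effect."""
    return sum(per_effect_costs) / len(per_effect_costs)


def average_cost_with_reuse(shared_cost, specific_costs):
    """Shared layers run once; task-specific layers run once per effect."""
    num_effects = len(specific_costs)
    return (shared_cost + sum(specific_costs)) / num_effects


M = 4                           # number of image effects
shared, specific = 60.0, 40.0   # hypothetical cost units per forward pass

without_reuse = average_cost_full_inference([shared + specific] * M)
with_reuse = average_cost_with_reuse(shared, [specific] * M)
```

With these numbers the average cost per effect drops from 100.0 to 55.0, and the gap widens as M grows or as more layers become shared.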

In Equation 4, the shared-cost term is the computational cost of the single computation of the task-agnostic architecture. The training apparatus 500 may train the source neural network 530 based on a loss function. The loss function may include a first loss component related to restoration performance and a second loss component related to the amount of computation. The training apparatus 500 may train the source neural network 530 so that its restoration performance improves and its amount of computation decreases. More specifically, the training apparatus 500 may compare the output (reconstructed image) of the source neural network 530 for a training input (training input image and task vector) with the training output (training target image), and determine the first loss component of the loss function based on the comparison result. The training apparatus 500 may also train the source neural network 530 so that the amount of computation is reduced while the loss of restoration performance is minimized. For example, the amount of computation may be reduced by increasing the number of layers included in the task-agnostic architecture and/or the number of channels removed from the task-specific architecture.

The search algorithm of the training apparatus 500 may be a supernetwork-based approach that aims to find an efficient network, or the best network for a given performance level, within a large network called a supernetwork. The search is performed over a search space of operations or components, and each combination in the search space yields a candidate network derived from the supernetwork. The source neural network 530 may correspond to the supernetwork, and the modified networks derived from the source neural network 530 may correspond to candidate networks. The training apparatus 500 may determine, in an end-to-end manner together with the architecture controller, whether a layer should be shared between tasks and whether a channel should be removed from the supernetwork.

FIG. 6 illustrates an architecture of a source neural network according to an embodiment. Referring to FIG. 6, the source neural network 600 may include a task-agnostic architecture 610, a task-specific architecture 620, and a control architecture 630. The task-agnostic architecture 610 may include a plurality of layers 6101 to 6103 and a plurality of channel selectors 6111 to 6114. The task-agnostic architecture 610 may further include a skip connection 6121 corresponding to an operation of adding the output of the channel selector 6111 to the output of the channel selector 6114. The layers 6101 to 6103 may correspond to a convolution operation and/or an activation function operation. For example, the layers 6101 and 6103 may correspond to a 3×3 convolution operation, and the layer 6102 may correspond to a 3×3 convolution operation and an activation operation (e.g., a ReLU operation). The stride of the layer 6101 may be twice that of the layers 6102 and 6103. Values such as 3×3 and the factor of two may be adjusted.

The task-specific architecture 620 may include a feature extraction part 621 and an image reconstruction part 622. The feature extraction part 621 may include a plurality of channel selectors 6211 to 6214 and a plurality of layers 6215 and 6216. The feature extraction part 621 may further include a multiplication operation that multiplies the output of the channel selector 6213 by the result of convolving the task vector in the convolution block 6219, and an addition operation that adds the output of the task-agnostic architecture 610, carried along the skip connection 6218, to the multiplication result. The layers 6215 and 6216 may correspond to a convolution operation and/or an activation function operation. For example, the layer 6215 may correspond to a 3×3 convolution operation and an activation operation (e.g., a ReLU operation), and the layer 6216 may correspond to a 3×3 convolution operation. The stride of the layers 6215 and 6216 may be the same as that of the layers 6102 and 6103.

The image reconstruction part 622 may include a plurality of layers 6221 and 6222 and a channel selector 6224. The image reconstruction part 622 may further include a multiplication operation that multiplies the output of the layer 6222 by the result of convolving the task vector in the convolution block 6229, and an addition operation that adds the input of the task-agnostic architecture 610, carried along the skip connection 6227, to the multiplication result. The control architecture 630 may include a plurality of architecture control networks 6301 to 6304. The layers 6221 and 6222 may correspond to at least one of a convolution operation, an activation function operation, and a pixel shuffle operation. For example, the layer 6221 may correspond to a ×2 pixel shuffle operation, a 3×3 convolution operation, and an activation operation (e.g., a ReLU operation), and the layer 6222 may correspond to a 3×3 convolution operation. The doubled stride of the layer 6101 and the ×2 pixel shuffle of the layer 6221 allow the input image and the reconstructed image to have the same size.
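
The size bookkeeping behind the stride and pixel shuffle pairing can be checked with a short sketch. It assumes a "same"-padded convolution, so a stride of 2 halves each spatial dimension, and the channel layout convention commonly used for pixel shuffle in deep learning frameworks; both are assumptions of the example, as are the toy tensor sizes.

```python
def strided_size(size, stride):
    """Spatial size after a 'same'-padded convolution with the given stride."""
    return (size + stride - 1) // stride


def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r]."""
    channels, height, width = len(x) // (r * r), len(x[0]), len(x[0][0])
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(channels)]
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


height = width = 8
half = strided_size(height, 2)  # stride-2 layer halves the resolution
features = [[[k] * half for _ in range(half)] for k in range(4)]  # 4 channels
upscaled = pixel_shuffle(features, 2)  # x2 pixel shuffle doubles it again
```

The 8×8 input is reduced to 4×4 feature maps and then returned to 8×8, which matches the statement that input and output sizes stay equal.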

A training apparatus can find an efficient network by determining whether each channel is important for a given task, important for all tasks, or important for neither. To find the task-specific architecture 620, channels that are important for the given task are kept and irrelevant channels are removed. Similarly, for the task-agnostic architecture 610, channels that are important for most tasks are kept and irrelevant channels are removed. Hereinafter, quantities belonging to the task-specific architecture and to the task-agnostic architecture are distinguished by the labels s and a, respectively. Whether a channel is kept is determined by two channel importance values (equivalently, task preferences for a channel): a task-specific channel importance and a task-agnostic channel importance. The task-specific channel importance may correspond to the output of the control architecture 630. As described further below, the task-agnostic channel importance may be determined based on the task-specific channel importance. Here, m, N, and C denote a task index, a channel selection module index, and a channel index, respectively.

The channel selection operation is described with further reference to FIG. 7. A channel selector 710 may convert a channel importance 701 into channel selection information 702 and determine a modified feature map 706 by selecting at least some channels of (or removing at least some channels from) a super feature map 705 based on the channel selection information 702. The channel importance 701 may be a real vector, and the channel selection information 702 may be a binary vector. The channel selector 710 may determine the binary vector by converting each real element of the real vector into true or false through a transformation function 711. The transformation function 711 may correspond to a differentiable gating function given by Equation 5.

In Equation 5, * ∈ {a, s}, the argument of the gating function is a component of the corresponding channel importance, and the bracket [·] denotes an indicator function that returns 1 when its input is true and 0 otherwise. Accordingly, each parameter of the task-agnostic and task-specific channel importance determines whether the corresponding channel of the supernetwork is activated or deactivated in the task-agnostic and task-specific architectures, respectively. During training, the modified feature map 706 may be generated by multiplying the super feature map 705 by the channel selection information 702 through the multiplication operation 712. At inference, the multiplication operation 712 may be replaced with skip processing, which realizes the reduction in computation. More specifically, loading of the weight tensors that correspond to false entries of the channel selection information 702 may be skipped, and only the weight tensors that correspond to true entries may be selectively loaded and used for the convolution operation.
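
The difference between the training path (multiply by the mask) and the inference path (skip the masked channels entirely) can be sketched as follows. The zero threshold in `gate`, the toy weight rows standing in for convolution kernels, and the function names are assumptions of this example; the gradient handling that makes the gate differentiable during training is not shown.

```python
def gate(importance):
    """Indicator-style gate: 1 where the importance is positive, else 0."""
    return [1 if value > 0 else 0 for value in importance]


def training_path(weights, inputs, mask):
    """Compute every output channel, then multiply by the mask."""
    outputs = [sum(w * v for w, v in zip(row, inputs)) for row in weights]
    return [o * m for o, m in zip(outputs, mask)]


def inference_path(weights, inputs, mask):
    """Load and compute only the rows whose mask entry is true."""
    return {
        index: sum(w * v for w, v in zip(weights[index], inputs))
        for index, keep in enumerate(mask)
        if keep
    }


weights = [[1, 0], [0, 1], [1, 1]]  # one weight row per output channel
inputs = [2, 3]
mask = gate([0.5, -0.1, 0.2])       # hypothetical channel importance

trained = training_path(weights, inputs, mask)
served = inference_path(weights, inputs, mask)
```

Both paths agree on the kept channels, but the inference path never touches the weights of the removed channel.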

The architecture control network is described with further reference to FIG. 8. Referring to FIG. 8, the architecture control network may include a convolutional layer 811 and an activation function 812, and may be configured as a fully connected network. The architecture control network can adaptively modify the network architecture of the task-specific architecture. The architecture control network can be defined as in Equation 6.

In Equation 6, the network indexed by n is the architecture control network of the n-th channel selector. Its output represents the task preference for a channel and is a function of the task vector, because each task vector adaptively activates channels in the supernetwork.

Referring back to FIG. 6, to find the task-agnostic layers, the task-specific preferences for each channel may be collected from the tasks throughout training, as in Equation 7, to determine the task-agnostic preference for each channel.

The task-agnostic preference may be initialized to 0. Here, c is the channel index within the n-th channel selection module, and α is the hyperparameter of the exponential moving average. The task-agnostic preference can be used to estimate how far the tasks in a mini-batch of size M agree on the preference for each channel, by computing the agreement criterion of Equation 8.
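
The running collection of preferences described for Equation 7 can be sketched as a standard exponential moving average. The exact form of Equation 7 is not reproduced here; blending the previous value with the mini-batch mean is an assumption of this sketch, as are the function name and the sample values.

```python
def update_running_preference(running, batch_preferences, alpha):
    """Blend the previous running value with the mean over the mini-batch tasks."""
    num_tasks = len(batch_preferences)
    batch_mean = [sum(column) / num_tasks for column in zip(*batch_preferences)]
    return [alpha * r + (1.0 - alpha) * b for r, b in zip(running, batch_mean)]


running = [0.0, 0.0]                 # initialized to 0, one entry per channel
batch = [[1.0, -1.0], [1.0, 1.0]]    # rows: tasks in the mini-batch, columns: channels
running = update_running_preference(running, batch, alpha=0.5)
```

In this toy batch both tasks prefer the first channel, so its running value moves to 0.5, while the tasks disagree on the second channel and its value stays at 0.0.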

γ is a threshold hyperparameter. Whether Equation 8 holds is indicated by a Boolean variable η. If Equation 8 holds (η = 1), most tasks agree on pruning the channels and sharing the layer. However, the condition of Equation 8 may or may not hold depending on the tasks in the current training mini-batch. Therefore, similarly to Equation 7, η may be accumulated into a running value throughout training, as in Equation 9, to obtain the agreement of the tasks over the entire data set.

The accumulated value may be initialized to 0. The larger the accumulated value, the more tasks agree on the channel preferences of the n-th channel selection module, and the more strongly that module is favored to become task-agnostic. Task-agnostic layers may be located together at the early stage of the network to enable feature reuse across tasks. The n-th channel selection module may be task-agnostic when the n-th channel selector and all preceding channel selectors have an accumulated value greater than the given threshold γ. This can be expressed as Equation 10.

φ denotes the decision variable. When the n-th channel selector is task-agnostic, the n-th component of φ is 1.
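
The rule that a module is task-agnostic only if it and every preceding module exceed the threshold can be sketched as a prefix check. The function name and the sample agreement values are hypothetical; the rule itself follows the description of Equation 10.

```python
def task_agnostic_decisions(accumulated_agreement, gamma):
    """Return 1 for module n only if modules 0..n all exceed gamma, else 0."""
    decisions = []
    all_previous_agree = True
    for value in accumulated_agreement:
        all_previous_agree = all_previous_agree and value > gamma
        decisions.append(1 if all_previous_agree else 0)
    return decisions


# Hypothetical accumulated agreement per channel selection module.
phi = task_agnostic_decisions([0.9, 0.8, 0.3, 0.95], gamma=0.5)
```

The fourth module has high agreement but is still marked task-specific, because the third module breaks the chain; this keeps the shared layers contiguous at the start of the network.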

For efficient architecture search, embodiments may use regularization terms. The restoration loss, a function of two arguments, is a standard loss function for image restoration tasks. The resource regularization function computes the amount of resources of the currently searched architecture according to Equation 4. A second regularization function may be used to maximize the number of task-agnostic layers so that multiple image effects are generated more efficiently. The overall objective function can be expressed as Equation 11.
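
A weighted-sum form consistent with this description might read as follows. All symbol names are assumptions introduced for this sketch: L_rec for the restoration loss, R_1 for the resource regularization function, R_2 for the regularization function on task-agnostic layers, and λ_1, λ_2 for the balancing hyperparameters. The arguments of each term are omitted.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{rec}} + \lambda_{1}\,\mathcal{R}_{1} + \lambda_{2}\,\mathcal{R}_{2}
```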

θ denotes the learnable parameters of the restoration network f (comprising the task-agnostic and task-specific architectures), ø denotes the learnable parameters of the architecture control network, and the two weighting coefficients are hyperparameters that balance the terms. To make the network task-agnostic while sacrificing as little performance as possible, the second regularization function may penalize disagreement between tasks on channel importance, as in Equation 12.

The layer with n = 0 represents the input image, and because the input image is shared across the multiple image effects for a given task, the corresponding decision variable may be identically 1. In Equation 11, the restoration loss may correspond to the first loss component related to restoration performance, and the two regularization functions may correspond to the second loss component related to the amount of computation. The first loss component may train the source neural network so that the difference between the training target image and the reconstructed image decreases, and the second loss component may train the source neural network so that the number of layers included in the task-agnostic architecture increases and the amount of computation decreases.

FIG. 9 illustrates a training data set with an absolute target. A conventional image restoration task may be defined as restoring degraded images of various degradation levels into an original image. For example, training input images 911 to 913 have different degradation levels (e.g., 1 to 3), yet each forms a training pair with the training target image 921 of the same degradation level (e.g., 0). In the example of FIG. 9, the training target image 921 may correspond to an absolute target.

FIG. 10 illustrates a training data set with relative targets according to an embodiment, and FIG. 11 illustrates a configuration of a training data set according to an embodiment. Controllable image restoration according to embodiments aims to generate a variety of visually pleasing image effects, which may be difficult to achieve with conventional training that focuses on restoration into a single original image. According to embodiments, the restoration task may be redefined as imparting various effects by adjusting the degradation level (or effect level). For example, each of the training input images 1011 to 1013 may form a training pair with any one of the training target images 1021 to 1023. In the example of FIG. 10, the training target images 1021 to 1023 may correspond to relative targets.

The degree of restoration may be given as an adjustment level that represents the level difference between the input and the target. For example, restoring the training input image 1011 into the training target image 1021, the training input image 1012 into the training target image 1022, and the training input image 1013 into the training target image 1023 may each correspond to an adjustment level of 0. Restoring the training input image 1012 into the training target image 1021, or the training input image 1013 into the training target image 1022, may correspond to an adjustment level of 1. Restoring the training input image 1013 into the training target image 1021 may correspond to an adjustment level of 2. Conversely, adjustment levels of -1 and -2 may exist for adding a degradation effect. Based on these adjustment levels, the task vector may be defined as in Equation 13.

In Equation 13, the two level vectors denote the degradation levels (or effect levels) of the input image and the target image. For the d-th degradation type, each level may be defined in the range [0, 1]. For example, the images 1101 to 1106 of FIG. 11 have noise with a standard deviation σ of 0 to 50, and degradation levels l of 0 to 1 may be assigned to the images 1101 to 1106 based on this noise. When the target level is lower than the input level, the target image is less degraded than the input image, that is, the input image is restored into a target image of better quality. When the target level is higher than the input level, the target image is more degraded than the input image, that is, a degradation effect is added to the input image.

When the second image 1102 corresponds to a first training input image and the fourth image 1104 corresponds to a first training target image, a first task vector may indicate a first image effect that lowers the noise level by 0.4, and the first training input image, the first task vector, and the first training target image may constitute a first training set. When the third image 1103 corresponds to a second training input image and the fifth image 1105 corresponds to a second training target image, a second task vector may indicate a second image effect that lowers the noise level by 0.4, and the second training input image, the second task vector, and the second training target image may constitute a second training set. In this case, the difference between the input effect level of the second training input image and the target effect level of the second training target image is the same as the difference between the input effect level of the first training input image and the target effect level of the first training target image. Accordingly, the first task vector and the second task vector may have the same value. The first task vector and the second task vector may thus set the training direction toward a relative target, namely a level difference of 0.4. For each mini-batch, training image pairs may be sampled equally from a uniform distribution over a single degradation type, a binary distribution over all degradation types, and a uniform distribution over all degradation types.
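The relative nature of the task vector can be illustrated with a minimal Python sketch. The effect type names, the concrete noise levels, and the sign convention (target level minus input level, so that a negative entry lowers the effect) are assumptions made for illustration only; the description states only that the adjustment level is determined by the difference between the two levels.

```python
# Hypothetical effect types; the description does not fix this list.
EFFECT_TYPES = ("noise", "blur", "jpeg")


def task_vector(input_levels, target_levels, effect_types=EFFECT_TYPES):
    """Return one adjustment level per effect type.

    Assumed convention: adjustment = target level - input level,
    so a negative entry means the effect is reduced.
    """
    return tuple(
        round(target_levels.get(name, 0.0) - input_levels.get(name, 0.0), 6)
        for name in effect_types
    )


# Two training pairs with different absolute noise levels (hypothetical
# values) but the same relative change of 0.4.
first_vector = task_vector({"noise": 0.6}, {"noise": 0.2})
second_vector = task_vector({"noise": 0.8}, {"noise": 0.4})

# Both pairs yield the same task vector, so they point the training
# in the same direction.
same_direction = first_vector == second_vector
```

Because only the level difference enters the vector, pairs that start from different degradation levels can share a single training target.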

FIG. 12 is a flowchart illustrating a training operation based on a first training data set according to an embodiment. Operations 1210 to 1240 of FIG. 12 may be performed sequentially or non-sequentially. For example, the order of operations 1210 to 1240 may be changed, and/or at least two of operations 1210 to 1240 may be performed in parallel. Operations 1210 to 1240 may be performed by at least one component (e.g., the processors 510 and 1410) of the training apparatus 500 and/or the electronic device 1400.

In operation 1210, a first training data set may be received, the first training data set including a first training input image, a first task vector indicating a first image effect among a plurality of candidate image effects, and a first training target image according to the first image effect. In operation 1220, a common feature shared by the plurality of candidate image effects may be extracted from the first training input image based on the task-independent architecture of the source neural network. In operation 1230, the common feature is reconstructed into a first reconstructed image based on the task-specific architecture of the source neural network and the first task vector. In operation 1240, the source neural network is updated based on the difference between the first training target image and the first reconstructed image, and on the amount of computation related to the extraction of the common feature and the reconstruction of the first reconstructed image. For example, the source neural network may be updated such that the number of layers included in the task-independent architecture is increased so that the amount of computation is reduced.
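The update criterion of operation 1240 combines two quantities: the difference between the target and reconstructed images, and the amount of computation. A minimal sketch of such an objective follows. The mean absolute difference, the weighted sum, the `weight` parameter, and the normalization by `full_ops` are all assumptions chosen for illustration; the description does not specify how the two terms are measured or combined.

```python
def mean_abs_difference(target, reconstructed):
    """Mean absolute difference between two equally sized flat pixel lists."""
    if len(target) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(t - r) for t, r in zip(target, reconstructed)) / len(target)


def training_objective(target, reconstructed, used_ops, full_ops, weight=0.1):
    """Image difference plus a penalty on the fraction of computation used.

    used_ops: operations spent on extraction and reconstruction (hypothetical count)
    full_ops: operations of the unpruned source network (hypothetical count)
    weight:   assumed trade-off between quality and computation
    """
    return mean_abs_difference(target, reconstructed) + weight * (used_ops / full_ops)


# Toy example with four pixels and half of the full computation budget.
objective = training_objective(
    target=[0.0, 1.0, 0.5, 0.5],
    reconstructed=[0.0, 0.5, 0.5, 1.0],
    used_ops=50,
    full_ops=100,
)
```

Under this kind of objective, two candidate updates with equal reconstruction quality are ranked by how little computation they need.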

The first task vector may include an adjustment level of each effect type of the first image effect, and the value of the adjustment level may be determined by the difference between the input effect level of the first training input image and the target effect level of the first training target image. When there is a second training set including a second training input image, a second task vector indicating a second image effect, and a second training target image according to the second image effect, and the difference between the input effect level of the second training input image and the target effect level of the second training target image is the same as the difference between the input effect level of the first training input image and the target effect level of the first training target image, the second task vector may have the same value as the first task vector. In addition, the descriptions of FIGS. 1 to 11 and FIGS. 13 and 14 may be applied to the training.

FIG. 13 is a block diagram illustrating an image restoration apparatus according to an embodiment. Referring to FIG. 13, the apparatus 1300 includes a processor 1310 and a memory 1320. The memory 1320 is connected to the processor 1310 and may store instructions executable by the processor 1310, data to be operated on by the processor 1310, or data processed by the processor 1310. The memory 1320 may include a non-transitory computer-readable medium, for example, a high-speed random access memory and/or a non-volatile computer-readable storage medium (e.g., one or more disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).

The processor 1310 may execute instructions for performing the operations of FIGS. 1 to 12 and FIG. 14. For example, the processor 1310 may receive an input image and a first task vector indicating a first image effect among a plurality of candidate image effects, extract a common feature shared by the plurality of candidate image effects from the input image based on the task-independent architecture of the source neural network, and reconstruct the common feature into a first reconstructed image corresponding to the first image effect based on the task-specific architecture of the source neural network and the first task vector. In addition, the descriptions of FIGS. 1 to 12 and FIG. 14 may be applied to the image restoration apparatus 1300.
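The claims describe one way the task vector may shape the task-specific network: an architecture control network maps the task vector to a real-valued vector, a conversion function turns each element into true or false, and channels marked false are removed. The following sketch illustrates that flow. The single linear layer standing in for the control network, its weights and biases, and the thresholding used as the conversion function are hypothetical choices, not taken from the source.

```python
def control_network(task_vec, weights, biases):
    """Hypothetical one-layer control network: one real value per channel."""
    return [
        sum(w * t for w, t in zip(row, task_vec)) + b
        for row, b in zip(weights, biases)
    ]


def to_channel_selection(real_vec, threshold=0.0):
    """Assumed conversion function: True keeps a channel, False removes it."""
    return [value > threshold for value in real_vec]


def prune_channels(channels, selection):
    """Keep only the channels whose selection entry is True."""
    return [ch for ch, keep in zip(channels, selection) if keep]


# Hypothetical setup: a 2-element task vector controlling 4 channels.
task_vec = [-0.4, 0.0]
weights = [[1.0, 0.0], [-1.0, 0.0], [0.5, 1.0], [-2.0, 0.5]]
biases = [0.1, 0.0, 0.3, -1.0]

real_vec = control_network(task_vec, weights, biases)
selection = to_channel_selection(real_vec)
kept = prune_channels(["c0", "c1", "c2", "c3"], selection)
```

A different task vector would generally produce a different selection, so each image effect can use its own subset of channels while sharing one task-specific architecture.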

FIG. 14 is a block diagram illustrating an electronic device according to an embodiment. Referring to FIG. 14, the electronic device 1400 may include a processor 1410, a memory 1420, a camera 1430, a storage device 1440, an input device 1450, an output device 1460, and a network interface 1470, which may communicate with each other through a communication bus 1480. For example, the electronic device 1400 may be implemented as at least a part of a mobile device such as a mobile phone, a smartphone, a PDA, a netbook, a tablet computer, or a laptop computer; a wearable device such as a smart watch, a smart band, or smart glasses; a computing device such as a desktop or a server; a home appliance such as a television, a smart television, or a refrigerator; a security device such as a door lock; or a vehicle such as an autonomous vehicle or a smart vehicle. The electronic device 1400 may structurally and/or functionally include at least one of the image restoration apparatus 100 of FIG. 1, the training apparatus 500 of FIG. 5, and the image restoration apparatus 1300 of FIG. 13.

The processor 1410 may execute functions and instructions to be executed in the electronic device 1400. For example, the processor 1410 may process instructions stored in the memory 1420 or the storage device 1440. The processor 1410 may perform the operations described with reference to FIGS. 1 to 13. For example, the processor 1410 may receive an input image and a first task vector indicating a first image effect among a plurality of candidate image effects, extract a common feature shared by the plurality of candidate image effects from the input image based on the task-independent architecture of the source neural network, and reconstruct the common feature into a first reconstructed image corresponding to the first image effect based on the task-specific architecture of the source neural network and the first task vector. The memory 1420 may include a computer-readable storage medium or a computer-readable storage device. The memory 1420 may store instructions to be executed by the processor 1410, and may store related information while software and/or an application is executed by the electronic device 1400.
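The two-stage operation performed by the processor (extract a common feature once, then reconstruct it per task vector) allows the common feature to be reused when a second task vector is received, as the claims note. The sketch below shows only this control flow. The two functions are arithmetic placeholders standing in for the task-independent and task-specific networks, and the `Restorer` class and its call counter are hypothetical names introduced for illustration.

```python
def extract_common_feature(image, shared_gain=2.0):
    """Placeholder for the task-independent network (not a real model)."""
    return [shared_gain * pixel for pixel in image]


def reconstruct(common_feature, task_vec):
    """Placeholder for the task-specific network conditioned on a task vector."""
    offset = sum(task_vec)
    return [value + offset for value in common_feature]


class Restorer:
    """Runs the shared stage once per image and the specific stage once per task."""

    def __init__(self):
        self.shared_calls = 0  # counts task-independent passes

    def restore_many(self, image, task_vecs):
        self.shared_calls += 1
        common = extract_common_feature(image)  # computed once
        # The same common feature is reused for every task vector.
        return [reconstruct(common, vec) for vec in task_vecs]


restorer = Restorer()
outputs = restorer.restore_many([0.0, 0.5], [(-0.5,), (0.25,)])
```

With this structure, requesting a further image effect for the same input costs only one additional pass through the task-specific stage.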

The camera 1430 may generate an input image (a photo and/or a video). The storage device 1440 includes a computer-readable storage medium or a computer-readable storage device. The storage device 1440 may store a larger amount of information than the memory 1420 and may store the information for a long period of time. For example, the storage device 1440 may include a magnetic hard disk, an optical disk, a flash memory, a floppy disk, or any other form of non-volatile memory known in the art.

The input device 1450 may receive input from a user through traditional input methods using a keyboard and a mouse, and through newer input methods such as touch input, voice input, and image input. For example, the input device 1450 may include a keyboard, a mouse, a touch screen, a microphone, or any other device capable of detecting input from a user and transferring the detected input to the electronic device 1400. The output device 1460 may provide output of the electronic device 1400 to the user through a visual, auditory, or tactile channel. The output device 1460 may include, for example, a display, a touch screen, a speaker, a vibration generating device, or any other device capable of providing output to the user. The network interface 1470 may communicate with an external device through a wired or wireless network.

The embodiments described above may be implemented by hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on them. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of a system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

An image restoration method performed by a processor of an electronic device, the method comprising:
receiving an input image and a first task vector indicating a first image effect among a plurality of candidate image effects;
extracting a common feature shared by the plurality of candidate image effects from the input image based on a task-independent architecture of a source neural network; and
reconstructing the common feature into a first reconstructed image corresponding to the first image effect based on a first task-specific network determined by applying the first task vector to a task-specific architecture of the source neural network.

(Deleted)

The method of claim 1,
wherein the reconstructing comprises:
extracting a first specific feature specialized for the first image effect from the common feature based on the first task-specific network; and
reconstructing the first specific feature into the first reconstructed image corresponding to the first image effect based on the first task-specific network.

The method of claim 1,
wherein determining the first task-specific network comprises:
generating first channel selection information corresponding to the first task vector using an architecture control network; and
determining the first task-specific network by removing at least some channels of the task-specific architecture based on the first channel selection information.

The method of claim 4,
wherein generating the first channel selection information comprises:
generating a first real vector by processing the first task vector through the architecture control network; and
generating the first channel selection information by converting each real element of the first real vector into true or false through a conversion function.

The method of claim 1,
wherein the extracting comprises:
determining a task-independent network by applying a shared parameter to the task-independent architecture; and
extracting the common feature from the input image based on the task-independent network.

The method of claim 1, further comprising:
receiving a second task vector corresponding to a second image effect among the plurality of candidate image effects; and
reconstructing the common feature into a second reconstructed image corresponding to the second image effect based on the task-specific architecture and the second task vector,
wherein the common feature is reused for the reconstruction of the second reconstructed image.

The method of claim 1,
wherein the first task vector includes an adjustment level of each effect type of the first image effect.

A training method performed by a processor of an electronic device, the method comprising:
receiving a first training data set including a first training input image, a first task vector indicating a first image effect among a plurality of candidate image effects, and a first training target image according to the first image effect;
extracting a common feature shared by the plurality of candidate image effects from the first training input image based on a task-independent architecture of a source neural network;
reconstructing the common feature into a first reconstructed image based on a first task-specific network determined by applying the first task vector to a task-specific architecture of the source neural network; and
updating the source neural network based on a difference between the first training target image and the first reconstructed image, and on an amount of computation related to the extraction of the common feature and the reconstruction of the first reconstructed image.

The training method of claim 9,
wherein updating the source neural network comprises
updating the source neural network such that the number of layers included in the task-independent architecture is increased so that the amount of computation is reduced.

The training method of claim 9,
wherein the first task vector includes an adjustment level of each effect type of the first image effect, and
the value of the adjustment level is determined by a difference between an input effect level of the first training input image and a target effect level of the first training target image.

The training method of claim 11,
wherein, when there is a second training set including a second training input image, a second task vector indicating a second image effect, and a second training target image according to the second image effect, and a difference between an input effect level of the second training input image and a target effect level of the second training target image is the same as the difference between the input effect level of the first training input image and the target effect level of the first training target image, the second task vector has the same value as the first task vector.

A computer program stored in a computer-readable recording medium and combined with hardware to execute the method of any one of claims 1 and 3 to 12.

An electronic device comprising:
a camera configured to generate an input image; and
a processor configured to:
receive the input image and a first task vector indicating a first image effect among a plurality of candidate image effects,
extract a common feature shared by the plurality of candidate image effects from the input image based on a task-independent architecture of a source neural network, and
reconstruct the common feature into a first reconstructed image corresponding to the first image effect based on a task-specific architecture of the source neural network and the first task vector.

The electronic device of claim 14,
wherein the processor is configured to:
determine a first task-specific network by applying the first task vector to the task-specific architecture, and
reconstruct the common feature into the first reconstructed image based on the first task-specific network.

The electronic device of claim 15,
wherein the processor is configured to:
extract a first specific feature specialized for the first image effect from the common feature based on the first task-specific network, and
reconstruct the first specific feature into the first reconstructed image corresponding to the first image effect based on the first task-specific network.

The electronic device of claim 15,
wherein the processor is configured to:
generate first channel selection information corresponding to the first task vector using an architecture control network, and
determine the first task-specific network by removing at least some channels of the task-specific architecture based on the first channel selection information.

The electronic device of claim 17,
wherein the processor is configured to:
generate a first real vector by processing the first task vector through the architecture control network, and
generate the first channel selection information by converting each real element of the first real vector into true or false through a conversion function.

The electronic device of claim 14,
wherein the processor is configured to:
determine a task-independent network by applying a shared parameter to the task-independent architecture, and
extract the common feature from the input image based on the task-independent network.

The electronic device of claim 14,
wherein the processor is configured to:
receive a second task vector corresponding to a second image effect among the plurality of candidate image effects, and
reconstruct the common feature into a second reconstructed image corresponding to the second image effect based on the task-specific architecture and the second task vector,
wherein the common feature is reused for the reconstruction of the second reconstructed image.