KR102485872B1

KR102485872B1 - Image quality improving method improving image quality using context vector and image quality improving module performing the same

Info

Publication number: KR102485872B1
Application number: KR1020220113502A
Authority: KR
Inventors: 박세직; 이용균
Original assignee: (주)내스타일
Priority date: 2022-09-07
Filing date: 2022-09-07
Publication date: 2023-01-09

Abstract

Disclosed are an image quality improvement method for improving image quality by utilizing a context vector, and an image quality improvement module for performing the same. The image quality improvement method according to the technical idea of the present disclosure comprises the steps of: generating a full context vector corresponding to a query image and a first context vector corresponding to a plurality of first partition images split from the query image to have a first size; generating an associated feature map by broadcasting the full context vector and the first context vector; generating a pre-processed feature map by performing pre-processing on the associated feature map; and improving image quality of the query image by using the pre-processed feature map and a super-resolution model. According to the present invention, by utilizing a context vector for segmented images, the recognition rate of objects in the images can be increased.

Description

Image quality improvement method for improving image quality by utilizing a context vector, and image quality improvement module performing the same {IMAGE QUALITY IMPROVING METHOD IMPROVING IMAGE QUALITY USING CONTEXT VECTOR AND IMAGE QUALITY IMPROVING MODULE PERFORMING THE SAME}

The present invention relates to neural networks and, more particularly, to an image quality improvement method that improves image quality by utilizing context vectors for partitioned images, and to an image quality improvement module performing the same.

Recently, with the spread of camera-equipped mobile devices and advances in camera technology, the quality of obtainable images has improved greatly. Nevertheless, despite these advances, it is still common that objects cannot be identified in CCTV footage captured on a dark evening or in low-quality images.

In addition, as camera-based autonomous driving technology is developed and advanced, the field of computer vision is growing very rapidly. Computer vision mimics the structure of the human visual system: using deep learning models on camera images and video, a computer learns to accurately identify and classify objects in digital images and performs data processing on the identified objects. Along with the development of computer vision, various approaches to improving the quality of digital images or video based on deep learning are being discussed.

An object of the present invention is to provide an image quality improvement method that improves image quality by utilizing context vectors of partition images obtained by dividing a query image into various sizes, and an image quality improvement module performing the same.

An image quality improvement method according to the technical idea of the present disclosure may include: generating a full context vector corresponding to a query image and a first context vector corresponding to a plurality of first partition images obtained by dividing the query image to have a first size; generating an associated feature map by broadcasting the full context vector and the first context vector; generating a preprocessed feature map by performing preprocessing on the associated feature map; and improving the image quality of the query image by utilizing the preprocessed feature map and a super-resolution model.

In one embodiment, generating the associated feature map may include: replicating the full context vector by the pixel size of the corresponding query image; replicating the first context vector by the pixel size of the corresponding first partition image; and generating the associated feature map by associating, for each pixel included in the query image, the full context vector, the first context vector, and the color channels corresponding to that pixel.
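The broadcasting step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the disclosed implementation: the function name `build_associated_feature_map`, the square partition side `part_size`, the grid layout of `part_ctx`, and the concatenation order (color channels, then full context vector, then first context vector) are all hypothetical choices made for the example.

```python
def build_associated_feature_map(image, full_ctx, part_ctx, part_size):
    """Concatenate, per pixel, color channels + full context + partition context.

    image:     H x W nested list of color channel lists, e.g. [R, G, B]
    full_ctx:  one context vector for the whole query image
    part_ctx:  grid of context vectors, part_ctx[r][c] for partition (r, c)
    part_size: side length in pixels of each square partition (assumption)
    """
    feature_map = []
    for y, row in enumerate(image):
        out_row = []
        for x, color in enumerate(row):
            # Every pixel inside a partition receives that partition's vector,
            # which is what "replicating by the pixel size" amounts to.
            local_ctx = part_ctx[y // part_size][x // part_size]
            out_row.append(list(color) + list(full_ctx) + list(local_ctx))
        feature_map.append(out_row)
    return feature_map


# Hypothetical 4 x 4 image with 3 color channels, split into 2 x 2 partitions.
image = [[[y, x, 0] for x in range(4)] for y in range(4)]
full_ctx = [0.5, 0.5]
part_ctx = [[[1.0], [2.0]], [[3.0], [4.0]]]
fmap = build_associated_feature_map(image, full_ctx, part_ctx, part_size=2)
```

In this sketch each pixel ends up with 3 + 2 + 1 = 6 channels; real context vectors would typically be much longer.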

In one embodiment, generating the preprocessed feature map may include generating a first preprocessed feature map through a pixel-wise convolution operation on the associated feature map composed of the color channels, the full context vector, and the first context vector, wherein the channel size of the first preprocessed feature map corresponds to the size of the color channels.
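A pixel-wise convolution that brings the channel count back to that of the color channels can be read as a 1 x 1 convolution. The sketch below assumes that reading; the function name `pointwise_conv`, the weight layout (output channels x input channels), and the example values are hypothetical and are not taken from the disclosure.

```python
def pointwise_conv(feature_map, weights, bias):
    """Apply a 1 x 1 convolution: each output channel is a weighted sum of
    the input channels of the same pixel.

    feature_map: H x W nested list of per-pixel channel lists
    weights:     out_channels x in_channels matrix (assumed layout)
    bias:        list of out_channels values
    """
    return [
        [
            [sum(w * v for w, v in zip(w_row, pixel)) + b
             for w_row, b in zip(weights, bias)]
            for pixel in row
        ]
        for row in feature_map
    ]


# Hypothetical 1 x 2 feature map with 4 channels per pixel, reduced to 3.
associated = [[[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]]
weights = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
bias = [0, 0, 0]
reduced = pointwise_conv(associated, weights, bias)
```

The spatial size is unchanged; only the per-pixel channel count changes, here from 4 to 3. In a trained module the weights would be learned rather than fixed.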

In one embodiment, generating the preprocessed feature map may include generating a second preprocessed feature map by compressing the associated feature map composed of the color channels, the full context vector, and the first context vector, wherein the second preprocessed feature map has the feature map size utilized in the super-resolution model.

In one embodiment, generating the preprocessed feature map may include: calculating a context similarity loss rate for each pixel of the associated feature map composed of the color channels, the full context vector, and the first context vector; and training the image quality improvement module by utilizing the calculated context similarity loss rate.

In one embodiment, generating the first context vector may include generating a second context vector corresponding to a plurality of second partition images obtained by dividing the query image to have a second size smaller than the first size, and generating the associated feature map may include generating the associated feature map by broadcasting the full context vector, the first context vector, and the second context vector.
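Dividing the query image at a first size and at a smaller second size can be sketched as follows. The helper name `split_into_partitions`, the use of square tiles, and the requirement that the image dimensions be multiples of the tile size are assumptions made for the example; the disclosure does not specify them.

```python
def split_into_partitions(image, size):
    """Split an H x W image (nested lists) into a grid of size x size tiles.

    Assumes H and W are multiples of `size`; padding is not handled here.
    """
    height, width = len(image), len(image[0])
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of the partition size")
    return [
        [
            [row[left:left + size] for row in image[top:top + size]]
            for left in range(0, width, size)
        ]
        for top in range(0, height, size)
    ]


# Hypothetical 4 x 4 image whose pixels are labeled with their coordinates.
image = [[(y, x) for x in range(4)] for y in range(4)]
first_partitions = split_into_partitions(image, 2)   # first size
second_partitions = split_into_partitions(image, 1)  # second, smaller size
```

Each tile would then be passed to a context vector extractor, giving one vector per tile at each scale.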

In one embodiment, generating the preprocessed feature map may include: checking whether at least one of the full context vector, the first context vector, and the second context vector included in the associated feature map is masked; and, when at least one of the full context vector, the first context vector, and the second context vector is masked, generating the preprocessed feature map by utilizing the remaining vectors other than the masked vector.

An image quality improvement module for improving the image quality of a query image according to an embodiment of the present disclosure may include: a context vector extraction module that generates a full context vector corresponding to the query image and a first context vector corresponding to a plurality of first partition images obtained by dividing the query image to have a first size; a broadcasting module that generates an associated feature map by broadcasting the full context vector and the first context vector; a preprocessing module that generates a preprocessed feature map by performing preprocessing on the broadcast context vectors; and a resolution improvement module that improves the image quality of the query image by inputting the preprocessed context vectors into a super-resolution model.

In one embodiment, the broadcasting module may replicate the full context vector by the pixel size of the corresponding query image, replicate the first context vector by the pixel size of the corresponding first partition image, and generate the associated feature map by associating, for each pixel included in the query image, the full context vector, the first context vector, and the color channels corresponding to that pixel.

In one embodiment, the preprocessing module may generate a first preprocessed feature map through a pixel-wise convolution operation on the associated feature map composed of the color channels, the full context vector, and the first context vector, wherein the pixel size of the first preprocessed feature map corresponds to the size of the color channels.

In one embodiment, the preprocessing module may generate a second preprocessed feature map by compressing the associated feature map composed of the color channels, the full context vector, and the first context vector, wherein the second preprocessed feature map has the feature map size utilized in the super-resolution model.

According to the technical idea of the present invention, utilizing context vectors for partitioned images when improving image quality can increase the recognition rate of objects in the image and, as a result, increase the efficiency of image quality improvement.

FIG. 1 is a block diagram of a neural network device according to an exemplary embodiment of the present disclosure.
FIG. 2 is a diagram for explaining the computation processing of a neural network.
FIG. 3 is a block diagram of an image quality improvement module according to exemplary embodiments of the present disclosure.
FIG. 4 is a detailed block diagram of the context vector extraction module of FIG. 3 according to exemplary embodiments of the present disclosure.
FIG. 5 shows outputs of a partition image generation module according to exemplary embodiments of the present disclosure.
FIG. 6 illustrates an example of generating a context vector according to exemplary embodiments of the present disclosure.
FIG. 7 illustrates an example of generating an associated feature map according to exemplary embodiments of the present disclosure.
FIG. 8 illustrates an example of generating an output image according to exemplary embodiments of the present disclosure.
FIG. 9 illustrates an example of generating an output image according to exemplary embodiments of the present disclosure.
FIG. 10 illustrates an example of generating an output image according to exemplary embodiments of the present disclosure.

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Advantages and features of the present disclosure, and methods of achieving them, will become clear with reference to the embodiments described below in detail in conjunction with the accompanying drawings. However, the technical idea of the present disclosure is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to make the technical idea of the present disclosure complete and to fully inform those of ordinary skill in the art to which the present disclosure belongs of its scope, and the technical idea of the present disclosure is defined only by the scope of the claims.

In adding reference numerals to the components of each drawing, it should be noted that the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present disclosure, a detailed description of a related known configuration or function is omitted when it is determined that it may obscure the gist of the present disclosure.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those of ordinary skill in the art to which this disclosure belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined. The terminology used herein is for describing the embodiments and is not intended to limit the present disclosure. In this specification, singular forms also include plural forms unless the context specifically states otherwise.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present disclosure. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the corresponding component. When a component is described as being "connected", "coupled", or "joined" to another component, that component may be directly connected or joined to the other component, but it should be understood that yet another component may be "connected", "coupled", or "joined" between the two.

As used in this disclosure, "comprises" and/or "comprising" mean that a stated component, step, operation, and/or element does not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

A component included in one embodiment and a component having a common function may be described using the same name in other embodiments. Unless stated to the contrary, a description given for one embodiment may be applied to other embodiments, and detailed descriptions may be omitted where they would overlap or where those skilled in the art would clearly understand them.

Hereinafter, the present invention will be described in detail with reference to preferred embodiments of the present invention and the accompanying drawings.

FIG. 1 is a block diagram of a neural network device 10 according to an exemplary embodiment of the present disclosure.

Referring to FIG. 1, the neural network device 10 may include a processor 100, a RAM 200, a storage 300, and a camera (not shown).

The neural network device 10 may analyze input data to extract valid information and generate output data based on the extracted information. The input data may be a query image including various objects, and the output data may be the query image with improved image quality.

The neural network device 10 may be implemented as a personal computer (PC), an Internet of Things (IoT) device, or a portable electronic device. The portable electronic device may be implemented as any of various devices, such as a laptop computer, a mobile phone, a smartphone, a tablet PC, a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, an audio device, a portable multimedia player (PMP), a personal navigation device (PND), an MP3 player, a handheld game console, an e-book reader, or a wearable device.

According to various embodiments, the processor 100 may control the neural network device 10, for example by training a neural network based on training images and improving the image quality of a query image using the trained neural network.

According to various embodiments, the processor 100 may include a CPU 110 and an NPU 120 and, although not shown, may also include a GPU. The CPU 110 may control the overall operation of the neural network device 10. The CPU 110 may include a single processor core or a plurality of processor cores (multi-core). The CPU 110 may process or execute programs and/or data stored in the storage 300. For example, the CPU 110 may control the functions of the NPU 120 by executing programs and/or modules stored in the storage 300.

According to various embodiments, the NPU 120 may generate a neural network, train or learn the neural network, perform operations based on training data and generate an information signal based on the results, or retrain the neural network.

According to various embodiments, the neural network models may include various types of models, such as a CNN (Convolutional Neural Network) like GoogLeNet, AlexNet, or VGG Network, an R-CNN (Region with Convolutional Neural Network), an RPN (Region Proposal Network), an RNN (Recurrent Neural Network), an S-DNN (Stacking-based Deep Neural Network), an S-SDNN (State-Space Dynamic Neural Network), a Deconvolution Network, a DBN (Deep Belief Network), an RBM (Restricted Boltzmann Machine), a Fully Convolutional Network, an LSTM (Long Short-Term Memory) Network, and a Classification Network, and are not limited to the models listed above.

According to various embodiments, the NPU 120 may include a separate memory for storing programs corresponding to the neural network models. The NPU 120 may further include separate intellectual property (IP) blocks for processing the many operations required to run the neural network. For example, the separate IP blocks may further include a graphics processing unit (GPU) or an accelerator for quickly performing specific operations.

According to various embodiments, the RAM 200 may temporarily store programs, data, or instructions. For example, the RAM 200 may temporarily load programs and/or data stored in the storage 300 under the control of the CPU 110 or according to booting code. For example, the RAM 200 may include DRAM (Dynamic RAM), SRAM (Static RAM), or SDRAM (Synchronous DRAM).

The storage 300 is a place for storing data and may store an operating system (OS), various programs, and various data. For example, the storage 300 may correspond to non-volatile memory, and may include ROM (Read Only Memory), flash memory, PRAM (Phase-change RAM), MRAM (Magnetic RAM), RRAM (Resistive RAM), FRAM (Ferroelectric RAM), and the like. According to an embodiment, the storage 300 may be implemented as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like.

FIG. 2 is a diagram for explaining the computation processing of a neural network.

The processor 100 of FIG. 1 may perform the image quality improvement method described herein by utilizing a neural network NN. Referring to FIG. 2, the neural network NN may include a plurality of layers L1 to Ln. Each of the layers L1 to Ln may be a linear layer or a non-linear layer, and, depending on the embodiment, at least one linear layer and at least one non-linear layer may be combined and referred to as a single layer. For example, linear layers may include a convolution layer and a fully connected layer, and non-linear layers may include a sampling layer, a pooling layer, and an activation layer.

According to an embodiment, the first layer L1 may be a convolution layer and the second layer L2 may be a sampling layer. The neural network may further include an activation layer and may further include at least one layer that performs another type of operation.

Each of the plurality of layers may receive input image data or an input feature map generated by the previous layer, and may generate an output feature map by operating on the input feature map. Here, a feature map refers to data in which various characteristics of the input data are expressed.

The feature maps FM1, FM2, and FM3 may have the form of, for example, a two-dimensional or three-dimensional matrix. The feature maps FM1, FM2, and FM3 have a width (also called columns), a height (also called rows), and a depth, which correspond to the x-axis, y-axis, and z-axis of a coordinate system, respectively. The depth may be referred to as the number of channels.

The first layer L1 may generate the second feature map FM2 by convolving the first feature map FM1 with a weight map WM. The weight map WM filters the first feature map FM1 and may be referred to as a filter or a kernel. For example, the depth of the weight map WM, that is, its number of channels, is the same as the depth of the first feature map FM1, and matching channels of the weight map WM and the first feature map FM1 are convolved with each other. The weight map WM is shifted across the first feature map FM1 in the manner of a sliding window. The amount of each shift may be referred to as the "stride length" or "stride". During each shift, each weight in the weight map WM is multiplied by the feature value it overlaps in the first feature map FM1, and the products are summed. Convolving the first feature map FM1 with the weight map WM generates one channel of the second feature map FM2.

Although FIG. 2 shows a single weight map WM, in practice a plurality of weight maps are convolved with the first feature map FM1 to generate a plurality of channels of the second feature map FM2. In other words, the number of channels of the second feature map FM2 may correspond to the number of weight maps.
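The sliding-window convolution and the relationship between weight maps and output channels can be sketched in plain Python. The function name `conv2d`, the channel-first layout, the absence of padding, and the example kernels are assumptions for illustration. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def conv2d(feature_map, weight_maps, stride=1):
    """Convolve a C x H x W feature map with several C x kH x kW weight maps.

    Returns one output channel per weight map (no padding).
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    output = []
    for wm in weight_maps:
        k_h, k_w = len(wm[0]), len(wm[0][0])
        out_channel = []
        # Shift the window by `stride` pixels in each direction.
        for top in range(0, height - k_h + 1, stride):
            out_row = []
            for left in range(0, width - k_w + 1, stride):
                acc = 0.0
                # Multiply overlapping weights and features, then sum
                # over all channels and window positions.
                for c in range(channels):
                    for i in range(k_h):
                        for j in range(k_w):
                            acc += wm[c][i][j] * feature_map[c][top + i][left + j]
                out_row.append(acc)
            out_channel.append(out_row)
        output.append(out_channel)
    return output


# Hypothetical single-channel 4 x 4 feature map and two 2 x 2 weight maps.
fm1 = [[[r * 4 + c for c in range(4)] for r in range(4)]]
diagonal = [[[1, 0], [0, 1]]]
all_ones = [[[1, 1], [1, 1]]]
fm2 = conv2d(fm1, [diagonal, all_ones], stride=2)
```

With two weight maps, `fm2` has two channels, which matches the statement that the channel count of the second feature map corresponds to the number of weight maps.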

The second layer L2 may generate the third feature map FM3 by changing the spatial size of the second feature map FM2. As one example, the second layer L2 may be a sampling layer. The second layer L2 may perform up-sampling or down-sampling, and may select some of the data included in the second feature map FM2. For example, a two-dimensional window WD may be shifted over the second feature map FM2 in units of the window size (for example, a 4 × 4 matrix), and the value at a specific position (for example, row 1, column 1) in the region overlapping the window WD may be selected. The second layer L2 may output the selected data as data of the third feature map FM3. As another example, the second layer L2 may be a pooling layer. In this case, the second layer L2 may select the maximum of the feature values (max pooling) or the average of the feature values (average pooling) in the region of the second feature map FM2 that overlaps the window WD, and may output the selected data as data of the third feature map FM3.
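The difference between a sampling layer and a pooling layer can be sketched on a single channel. The function name `downsample`, the `mode` strings, and the 2 x 2 window used in the example (the text mentions 4 × 4 as one possibility) are assumptions made for illustration.

```python
def downsample(channel, window, mode="sample"):
    """Reduce an H x W channel using non-overlapping window x window blocks.

    mode="sample": pick the value at row 1, column 1 of each block
    mode="max":    max pooling
    mode="avg":    average pooling
    """
    output = []
    for top in range(0, len(channel) - window + 1, window):
        out_row = []
        for left in range(0, len(channel[0]) - window + 1, window):
            block = [channel[top + i][left + j]
                     for i in range(window) for j in range(window)]
            if mode == "sample":
                out_row.append(block[0])  # no arithmetic, just a lookup
            elif mode == "max":
                out_row.append(max(block))
            elif mode == "avg":
                out_row.append(sum(block) / len(block))
            else:
                raise ValueError(f"unknown mode: {mode}")
        output.append(out_row)
    return output


# Hypothetical 4 x 4 channel reduced to 2 x 2 with each strategy.
fm2_channel = [[r * 4 + c for c in range(4)] for r in range(4)]
sampled = downsample(fm2_channel, 2, "sample")
max_pooled = downsample(fm2_channel, 2, "max")
avg_pooled = downsample(fm2_channel, 2, "avg")
```

Sampling only reads one value per block, whereas pooling has to inspect every value in the block, which is the intuition behind the speed comparison made in the description.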

Accordingly, the third feature map FM3, whose spatial size has been changed, may be generated from the second feature map FM2. The number of channels of the third feature map FM3 may be the same as that of the second feature map FM2. Meanwhile, according to an exemplary embodiment of the present disclosure, a sampling layer may compute faster than a pooling layer, and a sampling layer may improve the quality of the output image (for example, in terms of PSNR (Peak Signal to Noise Ratio)). For example, an operation by a pooling layer must calculate a maximum or average value and may therefore take longer than an operation by a sampling layer.

Depending on the embodiment, the second layer (L2) is not limited to a sampling layer or a pooling layer. That is, the second layer (L2) may be a convolution layer similar to the first layer (L1). The second layer (L2) may generate the third feature map (FM3) by convolving the second feature map (FM2) with a weight map. In this case, the weight map used for the convolution operation in the second layer (L2) may differ from the weight map (WM) used for the convolution operation in the first layer (L1).

An N-th feature map may be generated in an N-th layer after passing through a plurality of layers including the first layer (L1) and the second layer (L2). The N-th feature map may be input to a reconstruction layer located at the back end of the neural network (NN), where the output data is produced. The reconstruction layer may generate an output image based on the N-th feature map. The reconstruction layer may also receive, in addition to the N-th feature map, a plurality of feature maps such as the first feature map (FM1) and the second feature map (FM2), and may generate the output image based on the plurality of feature maps.

The third layer (L3) may generate a feature vector (FV) or a context vector by combining the features of the third feature map (FM3). For example, the input data may be data of an image or a video frame. In this case, the third layer (L3) may extract the feature vector (FV) or the context vector based on the third feature map (FM3) provided from the second layer (L2).

FIG. 3 is a block diagram of an image quality improvement module (400) according to exemplary embodiments of the present disclosure.

Referring to FIG. 3, the image quality improvement module (400) may include a context vector extraction module (410), a broadcasting module (420), a preprocessing module (430), and a resolution improvement module (440). The image quality improvement module (400) may be at least a part of the CPU (110) or the NPU (120) of FIG. 1, and the various operations of the image quality improvement module (400) may be operations that the CPU (110) or the NPU (120) of FIG. 1 performs using a computer program that includes at least one instruction stored in the RAM (200) or the storage (300). In addition, the context vector extraction module (410), the broadcasting module (420), the preprocessing module (430), and the resolution improvement module (440) included in the image quality improvement module (400) may be components divided by instruction or program module, and may be executed by a single hardware device (e.g., the CPU (110) or the NPU (120)).

The image quality improvement module (400) may be based on a feedforward neural network or a recurrent neural network among neural networks (NN). For example, the image quality improvement module (400) may include a sequence-to-sequence model. The image quality improvement module (400) may encode information of a source sequence corresponding to a user input. When based on a recurrent neural network, the image quality improvement module (400) may include, for example, a long short-term memory (LSTM) or a gated recurrent unit (GRU).

The context vector extraction module (410) may sequentially receive source sequences corresponding to the query image (Img_Qr) and generate a context vector (CV) by compressing all of the source sequences. In one embodiment, the context vector extraction module (410) may divide the input query image (Img_Qr) into a plurality of partition images and generate context vectors (CV) not only for the entire query image but also for the partition images. That is, the context vectors (CV) output by the context vector extraction module (410) may include a full context vector for the entire query image (Img_Qr), first context vectors for first partition images obtained by dividing the query image (Img_Qr) into a first size, and second context vectors for second partition images obtained by dividing the query image (Img_Qr) into a second size.

The broadcasting module (420) may generate an associated feature map (CFM) by broadcasting the context vectors (CV). In this specification, broadcasting may refer to the operation of concatenating the context vectors (CV) that correspond to the same pixel of the query image (Img_Qr). In one example, the broadcasting module (420) may replicate the full context vector, the first context vector, and the second context vector to match the corresponding image sizes, and may broadcast them by arranging, at the same level, the full context vector value, the first context vector value, and the second context vector value for a first pixel included in the query image (Img_Qr). To this end, the broadcasting module (420) may perform broadcasting using a concatenation function (Concat). According to an embodiment of the present disclosure, because the broadcasting module (420) broadcasts a plurality of context vectors (CV), all of the context vectors (CV) can be used for image quality improvement.

The preprocessing module (430) may generate a preprocessed feature map (PFM) by preprocessing the associated feature map (CFM). Because the associated feature map (CFM) results from broadcasting, the resolution improvement module (440) may be unable to use it directly owing to issues such as its size. Accordingly, the preprocessing module (430) may preprocess the size or data format of the associated feature map (CFM) to generate a preprocessed feature map (PFM) that the resolution improvement module (440) can use.

In one embodiment, the preprocessing module (430) may generate the preprocessed feature map (PFM) by performing a 1:1 convolution on the associated feature map (CFM). In one embodiment, the preprocessing module (430) may generate the preprocessed feature map (PFM) by compressing the associated feature map (CFM). In one embodiment, the preprocessing module (430) may calculate a context similarity loss rate between the context vectors of the associated feature map (CFM) and train the resolution improvement module (440) using the calculated context similarity loss rate. In one embodiment, the preprocessing module (430) may identify masked vectors among the context vectors included in the associated feature map (CFM) and, when a masked vector exists, generate the preprocessed feature map (PFM) for the corresponding pixel using only the remaining vectors, excluding the masked vector.

According to an embodiment of the present disclosure, because the preprocessing module (430) generates the preprocessed feature map (PFM) based on the associated feature map (CFM), the plurality of context vectors can be reflected in image quality improvement even when an existing resolution improvement model is used, without developing a new one.

The resolution improvement module (440) may generate an output image (Img_Out) by improving the query image (Img_Qr) using the preprocessed feature map (PFM). In one embodiment, the preprocessed feature map (PFM) may include detailed per-pixel object information, and the resolution improvement module (440) may improve the image quality of the query image (Img_Qr) by applying a per-object resolution improvement algorithm according to that per-pixel object information. In one example, the resolution improvement module (440) may improve the image quality of the query image (Img_Qr) using a super-resolution model or an interpolation-based algorithm.

FIG. 4 is a detailed block diagram of the context vector extraction module (410) of FIG. 3 according to exemplary embodiments of the present disclosure, and FIG. 5 shows outputs of the segmented image generation module according to exemplary embodiments of the present disclosure.

Referring to FIG. 4, the context vector extraction module (410) may include a segmented image generation module (411), an image interpolation module (412), an image encoding module (413), a text encoding module (414), and a context vector tuning module (415).

The segmented image generation module (411) may generate a plurality of partition images based on the input query image. The segmented image generation module (411) may divide the query image into a predetermined number of partition images, or may divide it into partition images based on bounding boxes. That is, the segmented image generation module (411) may divide the query image into image patches of various sizes.

According to one embodiment, referring also to FIG. 5, the segmented image generation module (411) may receive a query image (510) and divide the query image (510) into a predetermined number of partition images. For example, the segmented image generation module (411) may divide the query image (510) into nine first partition images. To this end, the segmented image generation module (411) may generate the first partition images by dividing the X axis and the Y axis of the query image (510) into three equal parts. When the segmented image generation module (411) divides the query image into a predetermined number of partition images, the generated partition images may all have the same size.
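
The equal-interval split described above can be sketched as follows. This is an illustration only: the image is a nested list of pixel values, its height and width are assumed to be divisible by the number of divisions, and the name `split_grid` is hypothetical.

```python
def split_grid(image, n):
    """Split an H x W image (nested lists) into n * n equal tiles.

    Tiles are returned in row-major order. H and W are assumed to be
    divisible by n; this restriction is a simplification for the sketch.
    """
    h, w = len(image), len(image[0])
    if h % n or w % n:
        raise ValueError("image size must be divisible by n")
    th, tw = h // n, w // n  # tile height and width
    return [
        [row[c * tw:(c + 1) * tw] for row in image[r * th:(r + 1) * th]]
        for r in range(n)
        for c in range(n)
    ]


# Toy 6x6 "query image" whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
tiles = split_grid(image, 3)  # nine first partition images, 2x2 pixels each
```

Calling `split_grid(image, 6)` on the same input would give the 36 smaller partition images used in a later example.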

According to one embodiment, referring also to FIG. 5, the segmented image generation module (411) may obtain a plurality of bounding boxes. Each bounding box may refer to the rectangle (one of the dotted rectangles) that encloses, with minimal area, one of the objects identified by object recognition. The segmented image generation module (411) may generate the first partition images so as to minimize the number of bounding boxes in the query image (510) that are cut when the query image (510) is divided. Referring also to FIG. 5, the segmented image generation module (411) may divide the query image (510) into four first partition images. If the query image (510) were divided into first partition images of equal size, the person object in the lower-right (3,3) first partition image could be split between the lower-center (3,2) first partition image and the lower-right (3,3) first partition image. When the first partition images are generated based on bounding boxes, the segmented image generation module (411) may divide the image along the X axis into a left region ((1,1), (2,1), (3,1)) and a right region ((1,2), (2,2), (3,2)) in which the bounding boxes are concentrated. The segmented image generation module (411) may also divide the image along the Y axis into an upper region ((1,1), (1,2)) and a lower region ((2,1), (2,2)) in which the bounding boxes are concentrated.
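
One way to picture "minimizing the number of cut bounding boxes" is to try every candidate cut line on an axis and keep the one that crosses the fewest boxes. The sketch below is an assumption-laden illustration, not the method of the disclosure: boxes are `(x0, y0, x1, y1)` tuples, ties are broken in favor of the cut closest to the image center (a choice made here only to keep the result deterministic), and the names `boxes_cut` and `best_cut` are hypothetical.

```python
def boxes_cut(boxes, pos, axis):
    """Count boxes (x0, y0, x1, y1) strictly crossed by a cut at pos.

    axis=0 means a vertical cut line x = pos; axis=1 means a horizontal
    cut line y = pos.
    """
    return sum(1 for b in boxes if b[axis] < pos < b[axis + 2])


def best_cut(boxes, length, axis):
    """Return the cut position in 1..length-1 that crosses the fewest boxes.

    Ties are broken by distance to the center (assumption of this sketch).
    """
    center = length / 2
    return min(
        range(1, length),
        key=lambda p: (boxes_cut(boxes, p, axis), abs(p - center)),
    )


# Three toy bounding boxes inside a 12 x 12 image.
boxes = [(1, 1, 5, 4), (5, 6, 8, 11), (9, 2, 11, 5)]

x_cut = best_cut(boxes, 12, axis=0)  # vertical cut position
y_cut = best_cut(boxes, 12, axis=1)  # horizontal cut position
```

Here the exact center cut `x = 6` would slice the second box, so the search moves the cut to `x = 5`; the two cuts together yield four partition images of unequal size.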

The image interpolation module (412) may perform interpolation on the partition images. Interpolation here means computing the values of pixel holes from existing pixel information when an image signal is transformed (e.g., enlarged). For example, the image interpolation module (412) may use nearest-neighbor interpolation, which fills a pixel hole with the value of the nearest pixel; bilinear interpolation (first-order interpolation), which uses the values of the four adjacent pixels and their distance ratios; or bicubic interpolation, which uses the values of the 16 adjacent pixels multiplied by distance-dependent weights. For example, referring also to FIG. 5, the size of each of the nine first partition images may be 1/9 of the size of the query image (510). The image interpolation module (412) may enlarge each of the first partition images to the same size as the query image (510).
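
As a minimal sketch of the simplest of the listed methods, nearest-neighbor enlargement by an integer factor can be written as below. The function name `upscale_nearest` is hypothetical, and restricting the scale to an integer factor is a simplification; a tile from a 3 × 3 split would be enlarged by a factor of 3 per axis to reach the size of the query image.

```python
def upscale_nearest(tile, factor):
    """Enlarge a tile by an integer factor using nearest-neighbor interpolation.

    Each output pixel (r, c) copies the source pixel (r // factor, c // factor),
    so every source pixel becomes a factor x factor block.
    """
    height, width = len(tile), len(tile[0])
    return [
        [tile[r // factor][c // factor] for c in range(width * factor)]
        for r in range(height * factor)
    ]


tile = [[1, 2],
        [3, 4]]                    # a 2x2 partition image
enlarged = upscale_nearest(tile, 3)  # 6x6, same size as the toy query image
```

Bilinear and bicubic interpolation would replace the plain copy with a weighted combination of 4 or 16 neighboring source pixels, respectively.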

The image encoding module (413) may receive an input image corresponding to an input sequence and generate a context vector. For example, the image encoding module (413) may be based on a convolutional neural network such as ResNet or on a transformer of a sequence-to-sequence model (e.g., a vision transformer (ViT)). The context vector may be a vector generated so as to reflect the situational information of the input image and the information of the objects included in the input image. The text encoding module (414) may receive input text corresponding to an input sequence and generate a context vector. For example, the text encoding module (414) may be based on a transformer.

The context vector tuning module (415) may determine the similarity between several context vectors and mask any context vector whose similarity is at or below a threshold. The similarity may be based on cosine similarity. For example, by masking context vectors that are unnecessary for improving the image quality of the query image (510), the weight of the context vectors that are needed for image quality improvement can be set higher. In addition, in one embodiment, the context vector tuning module (415) may, based on the similarity between several context vectors, train the context vector extraction module (410) so that the meanings of the context vectors are consistent with one another (e.g., so that the similarity between the context vectors is high).

According to one embodiment, the context vector tuning module (415) may determine whether to mask a context vector generated from the partition images. Referring also to FIG. 5, the context vector tuning module (415) may determine whether to mask each of the context vectors obtained from the nine first partition images. For example, the context vector tuning module (415) may determine whether to mask the context vector of the upper-left (1,1) partition. In doing so, the context vector tuning module (415) may decide whether to mask by comparing its similarity with the full context vector corresponding to the query image (510), which is the parent vector, and with the adjacent context vectors (e.g., the context vector corresponding to the first partition image at (1,2), the context vector corresponding to the first partition image at (2,1), and so on).
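
The masking decision can be sketched with cosine similarity as follows. The text does not say how the similarities to the parent vector and to the neighbors are combined, so averaging them is an assumption of this sketch, as are the function names and the use of 4-connected neighbors. Indices are zero-based here, so the patent's (1,1) partition is `grid[0][0]`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_mask(grid, r, c, whole, threshold):
    """Decide whether the context vector at grid[r][c] should be masked.

    The vector is compared with the full-image vector and with its
    4-connected neighbors; it is masked when the mean similarity is at or
    below the threshold (the mean is an assumption of this sketch).
    """
    references = [whole]
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            references.append(grid[rr][cc])
    sims = [cosine_similarity(grid[r][c], ref) for ref in references]
    return sum(sims) / len(sims) <= threshold


whole = [1.0, 0.0]                     # toy full context vector
grid = [[[0.0, 1.0], [1.0, 0.0]],      # toy 2x2 grid of partition vectors
        [[1.0, 0.0], [1.0, 0.0]]]

mask_top_left = should_mask(grid, 0, 0, whole, threshold=0.5)
mask_bottom_right = should_mask(grid, 1, 1, whole, threshold=0.5)
```

In this toy grid the top-left vector is orthogonal to everything around it and is masked, while the bottom-right vector agrees with its parent and neighbors and is kept.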

FIG. 6 shows an example of generating context vectors according to exemplary embodiments of the present disclosure.

Referring to FIG. 6, the image encoding module (413) may include a plurality of image encoders. The number of image encoders may depend on the number of different sizes into which the query image (510) is partitioned. For example, referring also to FIG. 6, the segmented image generation module (411) may receive the query image (510) and divide it into nine first partition images of equal size and into 36 second partition images of equal size. In this case, the image encoding module (413) may include a first image encoder (413_1) through a third image encoder (413_3). However, the present disclosure is not limited thereto, and the image encoding module may include N image encoders (where N is a natural number of 4 or more) depending on the various sizes into which the query image (510) is divided.

The first image encoder (413_1) may receive the entire query image (510) and extract the full context vector (whole image context vector). The second image encoder (413_2) may extract first context vectors (middle image context vectors) for each of the nine first partition images into which the query image (510) is divided. Although each of the nine partition images is 1/9 the size of the query image (510), the image input to the second image encoder (413_2) may be one that the image interpolation module (412) has enlarged to the same size as the query image (510). Accordingly, each of the first context vectors extracted for the nine partition images may have the same size as the full context vector output by the first image encoder (413_1).

The third image encoder (413_3) may extract second context vectors (small image context vectors) for each of the 36 second partition images into which the query image (510) is divided. Although each of the 36 partition images is 1/36 the size of the query image (510), the image input to the third image encoder (413_3) may be one that the image interpolation module (412) has enlarged to the same size as the query image (510). Accordingly, each of the second context vectors extracted for the 36 partition images may have the same size as the full context vector output by the first image encoder (413_1).

In the embodiment described above, the segmented image generation module (411) is shown dividing the query image (510) into 9 and 36 partition images of equal size, but the disclosure is not limited thereto. According to various embodiments, the segmented image generation module (411) may divide the query image (510) into N² partition images of equal size (where N is a natural number).

Also, in the embodiment described above, the segmented image generation module (411) is shown dividing the query image (510) into two kinds of partition images, 9 and 36, but the disclosure is not limited thereto. According to various embodiments, the segmented image generation module (411) may divide the query image into one or more kinds of partition images.

In one embodiment, the context vector tuning module (415) may receive a plurality of context vectors. For example, the context vector tuning module (415) may receive the full context vector corresponding to the query image (510) from the first image encoder (413_1), the plurality of first context vectors corresponding to the first partition images from the second image encoder (413_2), and the plurality of second context vectors corresponding to the second partition images from the third image encoder (413_3).

FIG. 7 shows an example of generating an associated feature map according to exemplary embodiments of the present disclosure.

Referring to FIG. 7, the broadcasting module (420) may receive the full context vector (WV) corresponding to the query image (Img_Qr), the first context vectors (MV) corresponding to the first partition images, and the second context vectors (SV) corresponding to the second partition images, and may generate the associated feature map (CFM) by broadcasting the received full context vector (WV), first context vectors (MV), and second context vectors (SV).

In detail, the broadcasting module (420) may replicate the single full context vector (WV) corresponding to the query image (Img_Qr) as many times as there are pixels in the query image (Img_Qr), that is, width (W) times height (H), and may map the replicated full context vector (WV) to each pixel of the query image (Img_Qr).

In addition, the broadcasting module (420) may replicate the first context vector (MV) of each first partition image as many times as there are pixels in that partition image, that is, width (W/3) times height (H/3), and may map the replicated first context vector (MV) to each pixel included in the first partition image. In one example, the first context vector (MV) corresponding to the first partition image at (1,3) may be replicated W/3 × H/3 times and mapped to every pixel included in the first partition image at (1,3). As a result, each pixel included in the query image (Img_Qr) may be associated with the one of the nine first context vectors (MV), (1,1) through (3,3), that corresponds to the position of the pixel.

In addition, the broadcasting module (420) may replicate the second context vector (SV) of each second partition image as many times as there are pixels in that partition image, that is, width (W/6) times height (H/6), and may map the replicated second context vector (SV) to each pixel included in the second partition image. In one example, the second context vector (SV) corresponding to the second partition image at (2,6) may be replicated W/6 × H/6 times and mapped to every pixel included in the second partition image at (2,6). As a result, each pixel included in the query image (Img_Qr) may be associated with the one of the 36 second context vectors (SV), (1,1) through (6,6), that corresponds to the position of the pixel.

As a result, each pixel may be associated with three vectors, namely the full context vector (WV), a first context vector (MV), and a second context vector (SV), according to the first partition image and the second partition image that contain the pixel's position. The broadcasting module (420) may generate the associated feature map (CFM) by concatenating the associated vectors for every pixel included in the query image (Img_Qr). In one example, the pixels included in the second partition image at (2,6) are also included in the first partition image at (1,3) and in the query image (Img_Qr), so the associated feature map (CFM) may be generated by concatenating the full context vector (WV), the first context vector (MV), and the second context vector (SV) shown in FIG. 7. In addition, the broadcasting module (420) may generate the associated feature map (CFM) by further concatenating, besides the context vectors (WV, MV, SV), a color channel (CC) carrying the color information of each pixel.
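
The replicate-and-concatenate step can be sketched for a toy 6 × 6 image as follows. Context vectors are shortened to a single number each so the result is easy to read, the image size is assumed to be divisible by each grid size, and the names `broadcast_grid` and `build_cfm` are hypothetical. Indices are zero-based, so the patent's (1,3) first partition is `mv_grid[0][2]` and its (2,6) second partition is `sv_grid[1][5]`.

```python
def broadcast_grid(vectors, height, width):
    """Replicate an n x n grid of context vectors over a height x width image.

    Pixel (y, x) receives the vector of the tile that contains it.
    """
    n = len(vectors)
    th, tw = height // n, width // n  # tile height and width in pixels
    return [
        [vectors[y // th][x // tw] for x in range(width)]
        for y in range(height)
    ]


def build_cfm(color, wv, mv_grid, sv_grid):
    """Concatenate color, whole, middle, and small vectors for every pixel."""
    h, w = len(color), len(color[0])
    whole = broadcast_grid([[wv]], h, w)   # one vector copied W * H times
    middle = broadcast_grid(mv_grid, h, w)
    small = broadcast_grid(sv_grid, h, w)
    return [
        [color[y][x] + whole[y][x] + middle[y][x] + small[y][x]
         for x in range(w)]
        for y in range(h)
    ]


H = W = 6
color = [[[y, x, 0] for x in range(W)] for y in range(H)]           # toy CC
wv = [100]                                                          # WV
mv_grid = [[[10 + r * 3 + c] for c in range(3)] for r in range(3)]  # 9 MVs
sv_grid = [[[50 + r * 6 + c] for c in range(6)] for r in range(6)]  # 36 SVs

cfm = build_cfm(color, wv, mv_grid, sv_grid)
```

The pixel at row 1, column 5 lies in the (2,6) second partition and the (1,3) first partition, so `cfm[1][5]` holds its color followed by the full vector, the (1,3) middle vector, and the (2,6) small vector.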

That is, the associated feature map (CFM) may include the color information of each pixel (CC), the object information at the largest level (WV), the object information at the middle level (MV), and the object information at the small level (SV). The query image (Img_Qr) may be recognized as different objects depending on the degree of magnification. For example, in the query image (510) at the top of FIG. 5, the whole image may be recognized as "sunny weather", the first partition image at (3,1) as "people", and the second partition image at the lower left of (3,1) as "dog".

According to an embodiment of the present disclosure, different context information may be recognized for the same image at the large, intermediate, and small levels. Because more detailed information is thereby available when the image quality is improved, the image quality improvement rate may increase.

Although FIG. 7 shows an example in which the associated feature map (CFM) is generated using three context vectors, the associated feature map (CFM) may also be generated using two context vectors or more than four context vectors. Likewise, the first partition image being a 9-division image and the second partition image being a 36-division image is only an example, and the technical idea of the present disclosure is not limited thereto.

FIG. 8 illustrates an example of generating an output image according to exemplary embodiments of the present disclosure.

Referring to FIG. 8, the preprocessing module 430 may generate a first preprocessed feature map (PFM1) using the associated feature map (CFM). In one example, the preprocessing module 430 may include a 1:1 convolution module and may generate the first preprocessed feature map (PFM1) by performing convolution on the associated feature map (CFM). In one embodiment, the dimension of the color channel (CC') of the first preprocessed feature map (PFM1) may be the same as the dimension of the color channel (CC), so that the first preprocessed feature map (PFM1) satisfies the input requirements of the existing resolution enhancement module 440.
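As a rough illustration of this step, the sketch below reads the "1:1 convolution" as a pointwise (1x1) convolution, which is an assumption, and applies the same linear map to the channel vector of every pixel so that the output has as many channels as the color channel. The function name `pointwise_conv` and the weight values are hypothetical; in practice the weights would be learned.

```python
def pointwise_conv(cfm, weights, bias):
    """Apply a 1x1 convolution: one shared linear map per pixel.

    cfm     : H x W grid of per-pixel channel lists
    weights : out_channels x in_channels matrix
    bias    : out_channels values
    """
    out = []
    for row in cfm:
        out_row = []
        for pixel in row:
            out_row.append([
                sum(w * v for w, v in zip(w_row, pixel)) + b
                for w_row, b in zip(weights, bias)
            ])
        out.append(out_row)
    return out


# Toy 1x2 map (assumed values): 3 color channels + 2 context channels.
cfm = [[[1.0, 2.0, 3.0, 0.5, 0.5], [0.0, 0.0, 0.0, 1.0, 1.0]]]
# Hypothetical weights: keep each color channel and mix in context channels.
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
]
bias = [0.0, 0.0, 0.0]
pfm1 = pointwise_conv(cfm, weights, bias)  # 3 channels per pixel, like CC
```

Because the output channel count equals the color channel count, a map produced this way has the same shape as an ordinary image and could be passed to a model that expects image-shaped input.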

In one embodiment, when the associated feature map (CFM) includes a masked vector, the preprocessing module 430 may generate the first preprocessed feature map (PFM1) using the remaining vectors other than the masked vector.
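One possible way to exclude masked vectors is sketched below. How masking is represented is not specified in the text, so the set of masked level names, the function name `select_unmasked`, and the values are all assumptions made for illustration.

```python
def select_unmasked(color, context, masked):
    """Concatenate color channels with every context vector that is not masked.

    color   : color values of one pixel (CC)
    context : dict mapping a level name ("WV", "MV", "SV") to its vector
    masked  : set of level names whose vectors are masked (assumed encoding)
    """
    features = list(color)
    for level in ("WV", "MV", "SV"):
        if level not in masked:
            features.extend(context[level])
    return features


# Toy pixel (assumed values) with the intermediate-level vector masked.
pixel = select_unmasked(
    color=(0.2, 0.4, 0.6),
    context={"WV": [1.0, 1.0], "MV": [2.0, 2.0], "SV": [3.0, 3.0]},
    masked={"MV"},
)
```

Dropping a vector changes the number of input channels, so a sketch like this would need preprocessing weights sized for the reduced channel count.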

The resolution enhancement module 440 may generate an output image (Img_Out) using the first preprocessed feature map (PFM1). According to an embodiment of the present disclosure, the color channel (CC') of the first preprocessed feature map (PFM1) may include context information of each pixel at the full level, the intermediate level, and the small level, and the image quality improvement rate may increase because all of this context information is used to improve the image quality.

FIG. 9 illustrates an example of generating an output image according to exemplary embodiments of the present disclosure.

Referring to FIG. 9, the preprocessing module 430 may generate a second preprocessed feature map (PFM2) using the associated feature map (CFM). In one example, the preprocessing module 430 may generate the second preprocessed feature map (PFM2) by compressing the associated feature map (CFM) to fit the size of an intermediate feature map used in the resolution enhancement module 440. In one example, the intermediate feature map may be any one of the first feature map (FM1) to the third feature map (FM3) of FIG. 2.
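The text does not state how the compression is performed. As one hedged illustration, the sketch below uses non-overlapping average pooling to shrink the spatial size of a feature map so that it matches a smaller intermediate feature map. The pooling choice, the function name `average_pool`, and the toy values are assumptions.

```python
def average_pool(cfm, factor):
    """Shrink an H x W x C feature map by averaging factor x factor blocks."""
    height, width = len(cfm), len(cfm[0])
    channels = len(cfm[0][0])
    pooled = []
    for y0 in range(0, height - factor + 1, factor):
        row = []
        for x0 in range(0, width - factor + 1, factor):
            # Collect the pixels of one block and average them per channel.
            block = [
                cfm[y][x]
                for y in range(y0, y0 + factor)
                for x in range(x0, x0 + factor)
            ]
            row.append([
                sum(p[c] for p in block) / len(block) for c in range(channels)
            ])
        pooled.append(row)
    return pooled


# Toy 4x4 single-channel map (assumed values) pooled down to 2x2.
cfm = [[[float(y * 4 + x)] for x in range(4)] for y in range(4)]
pfm2 = average_pool(cfm, 2)
```

A learned strided convolution would be another plausible way to reach the intermediate feature map size; average pooling is used here only because it needs no trained parameters.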

The resolution enhancement module 440 may receive the query image (Img_Qr) as an input feature map, receive the second preprocessed feature map (PFM2) as an intermediate feature map, and generate the output image (Img_Out) by improving the image quality of the query image (Img_Qr) using the second preprocessed feature map (PFM2). According to an embodiment of the present disclosure, the image quality improvement rate may increase because the plurality of pieces of context information for each pixel in the second preprocessed feature map (PFM2), used as an intermediate feature map, are used to improve the image quality.

FIG. 10 illustrates an example of generating an output image according to exemplary embodiments of the present disclosure.

Referring to FIG. 10, the preprocessing module 430 may calculate a context similarity loss rate (CSL) using the associated feature map (CFM). The context similarity loss rate (CSL) may be information obtained by comparing, for each of the full context vector (WV), the first context vector (MV), and the second context vector (SV), the similarity to the surrounding context, and it may indicate the difference in that similarity.
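The exact formula for the context similarity loss rate is not given in the text. Purely as one possible reading, the sketch below scores a grid of partition context vectors by the mean cosine dissimilarity between horizontally and vertically adjacent partitions. The choice of cosine similarity, the neighborhood definition, and the function names are assumptions and should not be taken as the disclosed method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def neighbor_dissimilarity(grid):
    """Mean (1 - cosine similarity) over adjacent partitions in a grid."""
    rows, cols = len(grid), len(grid[0])
    terms = []
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # right-hand neighbor
                terms.append(1.0 - cosine_similarity(grid[i][j], grid[i][j + 1]))
            if i + 1 < rows:  # neighbor below
                terms.append(1.0 - cosine_similarity(grid[i][j], grid[i + 1][j]))
    return sum(terms) / len(terms) if terms else 0.0


# Toy grids of 2-dimensional context vectors (assumed values).
same = neighbor_dissimilarity([[[1.0, 0.0], [1.0, 0.0]]])
different = neighbor_dissimilarity([[[1.0, 0.0], [0.0, 1.0]]])
```

Under this reading, identical neighboring contexts give a value of 0 and orthogonal ones give 1; such a value could be computed per level (MV, SV) and combined into a training signal.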

The resolution enhancement module 440 may train the image quality improvement module 400 using the context similarity loss rate (CSL). According to an embodiment of the present disclosure, the accuracy with which the image quality improvement module 400 extracts context vectors may increase, and the image quality improvement rate may increase accordingly.

As described above, exemplary embodiments have been disclosed in the drawings and the specification. Although the embodiments have been described using specific terms, these terms are used only to explain the technical idea of the present disclosure and are not used to limit their meaning or to limit the scope of the present disclosure set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present disclosure should be determined by the technical idea of the appended claims.

Claims

A method for improving image quality performed by a neural network device including a processor, the method comprising:
generating, by the processor, a full context vector corresponding to a query image and a plurality of first context vectors corresponding to a plurality of first partition images obtained by dividing the query image to have a first size;
generating, by the processor, an associated feature map by broadcasting the full context vector and the first context vectors;
generating, by the processor, a preprocessed feature map by performing preprocessing on a size or data format of the associated feature map; and
improving, by the processor, the image quality of the query image using the preprocessed feature map and a super-resolution model.

The method of claim 1,
wherein generating the associated feature map comprises:
replicating, by the processor, the full context vector as many times as the pixel size of the corresponding query image;
replicating, by the processor, the first context vector as many times as the pixel size of the corresponding first partition image; and
generating, by the processor, the associated feature map by associating, for each pixel included in the query image, the full context vector, the first context vector, and a color channel corresponding to the pixel.

The method of claim 2,
wherein generating the preprocessed feature map comprises:
generating, by the processor, a first preprocessed feature map through a pixel-wise convolution operation on the associated feature map composed of the color channel, the full context vector, and the first context vector,
wherein a pixel size included in the first preprocessed feature map corresponds to the size of the color channel.

The method of claim 2,
wherein generating the preprocessed feature map comprises:
generating, by the processor, a second preprocessed feature map by compressing the associated feature map composed of the color channel, the full context vector, and the first context vector,
wherein the second preprocessed feature map has a feature map size used in the super-resolution model.

The method of claim 2,
wherein generating the preprocessed feature map comprises:
calculating, by the processor, a context similarity loss rate for each pixel of the associated feature map composed of the color channel, the full context vector, and the first context vector; and
training, by the processor, an image quality improvement module using the calculated context similarity loss rate.

The method of claim 1,
wherein generating the first context vector comprises:
generating, by the processor, a second context vector corresponding to a plurality of second partition images obtained by dividing the query image to have a second size smaller than the first size, and
wherein generating the associated feature map comprises:
generating, by the processor, the associated feature map by broadcasting the full context vector, the first context vector, and the second context vector.

The method of claim 6,
wherein generating the preprocessed feature map comprises:
checking, by the processor, whether at least one of the full context vector, the first context vector, and the second context vector included in the associated feature map is masked; and
when at least one of the full context vector, the first context vector, and the second context vector is masked, generating, by the processor, the preprocessed feature map using the remaining vectors other than the masked vector.

An image quality improvement module for improving the image quality of a query image, the module comprising:
a context vector extraction module configured to generate a full context vector corresponding to the query image and a plurality of first context vectors corresponding to a plurality of first partition images obtained by dividing the query image to have a first size;
a broadcasting module configured to generate an associated feature map by broadcasting the full context vector and the first context vectors;
a preprocessing module configured to generate a preprocessed feature map by performing preprocessing on a size or data format of the broadcast context vectors; and
a resolution enhancement module configured to improve the image quality of the query image by inputting the preprocessed context vectors to a super-resolution model.

The image quality improvement module of claim 8,
wherein the broadcasting module
replicates the full context vector as many times as the pixel size of the corresponding query image, replicates the first context vector as many times as the pixel size of the corresponding first partition image, and generates the associated feature map by associating, for each pixel included in the query image, the full context vector, the first context vector, and a color channel corresponding to the pixel.

The image quality improvement module of claim 9,
wherein the preprocessing module generates a first preprocessed feature map through a pixel-wise convolution operation on the associated feature map composed of the color channel, the full context vector, and the first context vector, and
wherein a pixel size included in the first preprocessed feature map corresponds to the size of the color channel.

The image quality improvement module of claim 9,
wherein the preprocessing module generates a second preprocessed feature map by compressing the associated feature map composed of the color channel, the full context vector, and the first context vector, and
wherein the second preprocessed feature map has a feature map size used in the super-resolution model.