KR102617023B1 - System and method for improving light field spatial resolution

Info

Publication number: KR102617023B1
Application number: KR1020220107583A
Authority: KR
Inventors: 김영섭; 박준형; 천현우
Original assignee: 단국대학교 산학협력단
Priority date: 2021-08-26
Filing date: 2022-08-26
Publication date: 2023-12-27
Also published as: KR20230031170A

Abstract

The present invention discloses a system and method for improving the spatial resolution of a light field. More specifically, it improves spatial resolution through individual networks that use an attention technique and deformable convolution. Compared with existing techniques that improve spatial resolution through a single network, it captures the features of the individual images in the light field more effectively.
According to an embodiment of the present invention, an individual network is provided for each image in the light field. Each network receives the entire light field image as input and outputs its assigned image at improved resolution. Improving the spatial resolution of the entire light field image in this way makes it possible to restore a high-resolution image.

Description

System and method for improving light field spatial resolution

The present invention relates to a system and method for improving the spatial resolution of a light field. In particular, it relates to a technique that improves spatial resolution through individual networks using an attention technique and deformable convolution, so that the characteristics of the individual images in the light field are captured better than with existing techniques that use a single network.

Light field technology provides three-dimensional information to the user by acquiring the intensity and direction of light and then reproducing that three-dimensional information on a display.

Whereas a conventional camera forms an image by summing the light that enters through every part of the lens, a camera that uses light field technology measures the light arriving at the same position separately for each direction.

Methods of implementing such a light field camera include: using multiple cameras, each of which captures the image for one direction; using a microlens array (MLA) to split the incident light by direction before it forms an image; and mounting multiple lenses separately from the main lens of the camera to form a distinct image for each direction, which the camera then acquires.

Compared with a conventional two-dimensional image, a light field image acquired in this way additionally contains the direction of the light rays. This additional information can be used for various kinds of image processing, such as refocusing and three-dimensional depth estimation.

However, a light field image has both the resolution of the spatial domain, as a conventional two-dimensional image does, and the resolution of the angular domain, which carries the directional information. Because this multidimensional information is acquired with a two-dimensional sensor, the resolution of a light field image is constrained by a trade-off between spatial resolution and angular resolution.

Because of this trade-off, resolution is reduced in commercial light field acquisition equipment, where the number and size of lenses are limited. That is, with a limited amount of information, increasing the angular resolution inevitably decreases the spatial resolution, which lowers user satisfaction in applications such as refocusing and three-dimensional depth estimation.
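The trade-off can be illustrated with simple arithmetic. The sketch below assumes an idealized microlens-array camera in which the sensor pixels are divided evenly among the angular samples; the sensor size and the function name `spatial_resolution` are hypothetical and do not come from the patent.

```python
def spatial_resolution(sensor_w, sensor_h, ang_u, ang_v):
    """Per-view spatial resolution when one 2-D sensor is shared by ang_u x ang_v views.

    Idealized assumption: every spatial sample spends ang_u x ang_v sensor
    pixels, one per ray direction, and no pixels are wasted.
    """
    return sensor_w // ang_u, sensor_h // ang_v


# Hypothetical 4000 x 3000 sensor: more angular samples leave fewer pixels per view.
for ang in (1, 5, 9):
    w, h = spatial_resolution(4000, 3000, ang, ang)
    print(f"{ang}x{ang} views -> {w}x{h} pixels per view")
```

Under these assumptions, going from a single view to 9×9 views reduces each view to roughly one ninth of the sensor resolution along each axis.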

Light field super-resolution (LFSR) algorithms have been proposed to solve this problem. They take longer to run than general linear interpolation but offer higher accuracy and precision.

Korean Registered Patent Publication No. 10-1957735 (publication date: 2019.03.13.)

J. Jin, J. Hou, J. Chen and S. Kwong, "Light Field Spatial Super-Resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2257-2266, doi: 10.1109/CVPR42600.2020.00233.
Y. Wang et al., "Light Field Image Super-Resolution Using Deformable Convolution," in IEEE Transactions on Image Processing, vol. 30, pp. 1057-1071, 2021, doi: 10.1109/TIP.2020.3042059.
J. Dai et al., "Deformable Convolutional Networks," 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 764-773, doi: 10.1109/ICCV.2017.89.
Woo S., Park J., Lee J.Y., Kweon I.S. (2018) CBAM: Convolutional Block Attention Module. In: Ferrari V., Hebert M., Sminchisescu C., Weiss Y. (eds) Computer Vision - ECCV 2018. ECCV 2018. Lecture Notes in Computer Science, vol 11211. Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_1
Zhang Y., Li K., Li K., Wang L., Zhong B., Fu Y. (2018) Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In: Ferrari V., Hebert M., Sminchisescu C., Weiss Y. (eds) Computer Vision - ECCV 2018. ECCV 2018. Lecture Notes in Computer Science, vol 11211. Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_18
K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1874-1883, doi: 10.1109/CVPR.2016.207.

The present invention was devised to solve the problems described above. Its object is to provide a learning-based method of improving the spatial resolution of a light field that improves spatial resolution through individual networks using an attention technique and deformable convolution, so that the characteristics of the individual images in the light field are captured better than with existing techniques that use a single network.

To solve the above problem, a light field spatial resolution improvement system according to an embodiment of the present invention may include: an input unit that receives a light field image; a plurality of individual networks, one for each angular position of the light field image, that extract features with weights isolated for the image at each angular position; and a loss function calculation unit that calculates a loss function for each output value output from the plurality of individual networks and corrects each output value through backpropagation.

Each individual network may include: a feature extraction unit that extracts features for each angular position by applying the image for each angular position to Res blocks that include a bottleneck structure and a DCN; a feature highlighting unit that emphasizes low-frequency information within the features extracted by the feature extraction unit through one or more convolutional block attention modules; and an upsampler that upscales the features with emphasized low-frequency information through an efficient sub-pixel convolution layer and constructs a light field image through a 1×1 CNN layer.

The DCN of the feature extraction unit may be expressed by the following equation:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$$

where p_n is a coordinate within the CNN filter grid, w is the weight matrix, p_0 is the position in the matrix input to the CNN at which the filter is applied, and Δp_n is the displacement of p_n. The symbols x (input), y (output), and R (the set of grid coordinates) are taken from the formulation of Dai et al. cited above.

The feature highlighting unit may include: a first convolutional block attention module that receives the features extracted for each angle, performs MaxPool and AvgPool to produce matrices of size 1×1×(number of channels), extracts a channel attention map (M_C), and multiplies the input features element-wise by the channel attention map to weight the feature channels differently; a second convolutional block attention module that receives the features of all angular positions, performs MaxPool and AvgPool to obtain two matrices, concatenates the two matrices, extracts a spatial attention map (M_S) using a CNN layer, and multiplies each channel-weighted feature element-wise by it to weight the regions within the features differently; and a skip module that performs a long skip connection on the features at each angular position to emphasize the low-frequency information in the features.
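The two attention maps described above can be sketched in NumPy as follows. This is a minimal illustration that follows the general CBAM formulation of Woo et al., not the patented implementation: the shared two-layer MLP, the sigmoid, the 7×7 kernel, and all sizes and function names are assumptions that do not appear in the text.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Returns the channel attention map M_C, shape (C, 1, 1)."""
    max_desc = feat.max(axis=(1, 2))    # MaxPool over space -> (C,)
    avg_desc = feat.mean(axis=(1, 2))   # AvgPool over space -> (C,)

    def mlp(desc):
        # Shared two-layer MLP with ReLU (assumed, as in the CBAM paper).
        return w2 @ np.maximum(w1 @ desc, 0.0)

    return sigmoid(mlp(max_desc) + mlp(avg_desc))[:, None, None]


def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k). Returns M_S, shape (1, H, W)."""
    # Pool across channels, then stack the two resulting maps.
    pooled = np.stack([feat.max(axis=0), feat.mean(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    height, width = feat.shape[1:]
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]


rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2                       # hypothetical sizes
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, 7, 7))

m_c = channel_attention(feat, w1, w2)
channel_refined = feat * m_c                  # element-wise, broadcast over H and W
m_s = spatial_attention(channel_refined, kernel)
refined = channel_refined * m_s               # element-wise, broadcast over C
```

Both maps hold values between 0 and 1, so the element-wise products rescale channels and spatial locations without changing the shape of the feature tensor.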

The efficient sub-pixel convolution layer of the upsampler may perform upscaling by expanding the feature maps extracted from the hidden layers and combining them.

According to an embodiment of the present invention, an individual network is provided for each image in the light field. Each network receives the entire light field image as input and outputs its assigned image at improved resolution. Improving the spatial resolution of the entire light field image in this way makes it possible to restore a high-resolution image.

FIG. 1 is a diagram showing a conventional light field spatial resolution improvement system with a single-network structure.
FIG. 2 is a diagram showing a light field spatial resolution improvement system with an individual-network structure according to an embodiment of the present invention.
FIG. 3 is a diagram showing the structure of an individual network of the light field spatial resolution improvement system according to an embodiment of the present invention.
FIG. 4 is a diagram showing the bottleneck structure applied to the feature extraction unit of the light field spatial resolution improvement system according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the feature extraction method of the deformable convolutional network (DCN) included in the feature extraction unit of the light field spatial resolution improvement system according to an embodiment of the present invention.
FIG. 6 is a diagram showing the structure of the convolutional block attention module included in the feature highlighting unit of the light field spatial resolution improvement system according to an embodiment of the present invention.
FIG. 7 is a diagram showing the structure of the channel attention module of the convolutional block attention module shown in FIG. 6.
FIG. 8 is a diagram showing the structure of the spatial attention module of the convolutional block attention module shown in FIG. 6.
FIG. 9 is a diagram showing the network structure of the efficient sub-pixel convolution layer included in the upsampling unit of the light field spatial resolution improvement system according to an embodiment of the present invention.
FIG. 10 is a diagram showing the upscaling method performed by the efficient sub-pixel convolution layer of FIG. 9.

The present invention will now be described in detail with reference to the accompanying drawings and embodiments.

It should be noted that the technical terms used in the present invention are used only to describe specific embodiments and are not intended to limit the invention. Unless specifically defined otherwise herein, technical terms should be interpreted as they are generally understood by a person of ordinary skill in the art to which the invention pertains, and should be interpreted neither in an excessively broad sense nor in an excessively narrow sense. If a technical term used herein is an incorrect term that fails to express the idea of the invention accurately, it should be replaced by a technical term that a person skilled in the art can understand correctly. General terms used herein should be interpreted according to their dictionary definitions or according to context, and should not be interpreted in an excessively narrow sense.

In addition, singular expressions used herein include plural expressions unless the context clearly indicates otherwise. Terms such as "consists of" or "comprises" should not be construed as necessarily including all of the components or steps described; some of those components or steps may be omitted, and additional components or steps may be included.

In addition, terms that include ordinal numbers, such as first and second, may be used to describe components, but the components are not limited by those terms. The terms are used only to distinguish one component from another. For example, a first component may be called a second component, and similarly a second component may be called a first component, without departing from the scope of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numbers regardless of the figure, and duplicate descriptions are omitted.

In addition, where a detailed description of related known technology would obscure the gist of the present invention, that description is omitted. The accompanying drawings are provided only to make the idea of the invention easier to understand and should not be construed as limiting it.

FIG. 1 is a diagram showing a conventional light field spatial resolution improvement system with a single-network structure, and FIG. 2 is a diagram showing a light field spatial resolution improvement system with an individual-network structure according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the light field spatial resolution improvement system calculates a loss function for the output value of the network and, based on it, learns the weights (w) through backpropagation. This can be viewed as identifying what the output value (img_2), and in turn the input value (img_i), lacks relative to the ground-truth value (img_2), and then correcting it.

FIG. 1 shows the structure when a single network is used, taking as reference a network that improves a single image based on the entire light field image. With a single network (10), the image at every angular position passes through the same network when its resolution is improved.

With this structure, the network learns during training to correct mainly what is lacking on average across all angular positions, whereas what is lacking may differ from one angular position to another. As a result, the single-network structure can limit the performance of spatial resolution improvement for light field images, which have the distinctive property of being different viewpoints of the same scene.

In contrast, FIG. 2 shows the structure when individual networks are used. As shown in FIG. 2, the light field spatial resolution improvement system according to an embodiment of the present invention may include: an input unit (90) that receives a light field image; a plurality of individual networks (100), one for each angular position of the light field image (img_i), that extract features with weights isolated for the image at each angular position; and a loss function calculation unit (110) that calculates the loss function of each output value (img_1) output from the plurality of individual networks (100) and corrects each output value (img_1) through backpropagation. In this structure, because the individual networks (100) isolate the weights (w) for each angular position, each network can concentrate on learning what is lacking for its particular angular position.
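The effect of isolating weights per angular position can be sketched with a toy model. The code below is an illustrative assumption, not the patented network: each "individual network" is reduced to a linear mix of all input views, the loss is a mean-squared error, and names such as `toy_network` and `sgd_step` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
U, V, H, W = 3, 3, 4, 4                           # hypothetical 3x3 light field
light_field = rng.standard_normal((U * V, H, W))  # all views, stacked
targets = rng.standard_normal((U * V, H, W))      # stand-in ground truth per view


def toy_network(weights, lf):
    """Stand-in for one individual network: sees all views, outputs one view."""
    return np.tensordot(weights, lf, axes=1)      # (H, W)


def loss(weights, lf, target):
    return np.mean((toy_network(weights, lf) - target) ** 2)


def sgd_step(weights, lf, target, lr=0.01):
    """One gradient-descent step on the mean-squared error of a single view."""
    err = toy_network(weights, lf) - target
    grad = 2.0 * np.tensordot(lf, err, axes=2) / err.size
    return weights - lr * grad


# One isolated weight set per angular position.
weights = {k: rng.standard_normal(U * V) for k in range(U * V)}
before = {k: w.copy() for k, w in weights.items()}

# Backpropagating the loss of view 0 changes only the weights of network 0.
weights[0] = sgd_step(weights[0], light_field, targets[0])
```

Because every view has its own weight set, an update driven by the error at one angular position leaves the other networks untouched, which is the property the single shared network of FIG. 1 lacks.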

Hereinafter, the light field spatial resolution improvement system based on individual networks according to an embodiment of the present invention is described in detail with reference to the drawings.

Individual network

FIG. 3 is a diagram showing the structure of an individual network of the light field spatial resolution improvement system according to an embodiment of the present invention.

Referring to FIG. 3, the individual network (100) of the light field spatial resolution improvement system according to an embodiment of the present invention may include: a feature extraction unit (200) that extracts features for each angular position by applying the image for each angular position (img_i) to Res blocks (DBR) that include a bottleneck structure and a DCN; a feature highlighting unit (300) that emphasizes low-frequency information within the features extracted by the feature extraction unit through one or more convolutional block attention modules; and an upsampler (400) that upscales the features with emphasized low-frequency information through an efficient sub-pixel convolution layer and constructs a light field image through a 1×1 CNN layer.

The feature extraction unit (200) may extract features from the input by applying a deformable convolution network (DCN) to the bottleneck structure derived from the Res block (DBR).

In detail, the feature extraction unit (200) extracts features (F2) for each angular position by passing the images for each angular position (F1) through a feature extraction layer composed of any number (n) of deformable bottleneck Res blocks (DBR).

The feature highlighting unit (300) passes the features (F2) extracted for each angular position through a first convolutional block attention module (CBAM; 310) to weight the features differently, and then passes them through a first 1×1 CNN layer (320) to compress the number of feature channels to any desired number. The features of all angular positions are then concatenated by a concatenate function (330), passed through a second CBAM (340), and compressed in channels by a second 1×1 CNN layer (350).

At this point, after the feature channels have been compressed and before the features pass through the activation function leading to the upsampling layer, the image at the angular position to be improved, as originally input to the network, is added to the features. This long skip connection emphasizes the low-frequency information in the features so that the network can concentrate on learning high-frequency information.
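The data flow of the feature highlighting unit can be sketched as a sequence of tensor shapes. The NumPy code below is a minimal illustration under stated assumptions: the two CBAM stages are omitted, the first 1×1 layer shares its weights across views, the low-resolution image is broadcast over all channels for the skip connection, and every size and name is hypothetical.

```python
import numpy as np


def conv1x1(feat, weight):
    """1x1 convolution: mixes channels independently at every pixel.

    feat: (C_in, H, W); weight: (C_out, C_in) -> (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", weight, feat)


rng = np.random.default_rng(0)
n_views, C, C_small, H, W = 4, 16, 4, 8, 8        # hypothetical sizes

# Features F2 for every angular position (first CBAM omitted here).
per_view = [rng.standard_normal((C, H, W)) for _ in range(n_views)]

# First 1x1 layer: compress the channels of each view.
w_first = rng.standard_normal((C_small, C))
compressed = [conv1x1(f, w_first) for f in per_view]

# Concatenate all angular positions along the channel axis (second CBAM omitted).
merged = np.concatenate(compressed, axis=0)       # (n_views * C_small, H, W)

# Second 1x1 layer: compress the merged channels again.
w_second = rng.standard_normal((C_small, n_views * C_small))
fused = conv1x1(merged, w_second)                 # (C_small, H, W)

# Long skip connection: add the low-resolution target view to every channel,
# so the layers above only need to model the residual detail.
target_view = rng.standard_normal((H, W))
with_skip = fused + target_view[None, :, :]
```

How the single-channel image is matched to the feature channels is not specified in the text; broadcasting is only one plausible choice.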

The upsampler (400) may include an upscale unit (410) and a 1×1 CNN layer (420). It upscales the features through any number of efficient sub-pixel convolution layers and outputs a single super-resolved full image (img_o) through the 1×1 CNN layer (420).
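The rearrangement step of an efficient sub-pixel convolution layer, often called pixel shuffle, can be sketched as follows. This follows the general formulation of Shi et al. and is not taken from the patent; the convolution that precedes the shuffle is omitted, and the function name and sizes are hypothetical.

```python
import numpy as np


def pixel_shuffle(feat, r):
    """Rearrange (C * r * r, H, W) feature maps into (C, H * r, W * r).

    Each group of r * r low-resolution channels supplies the r x r block of
    output pixels at the corresponding location.
    """
    c_rr, h, w = feat.shape
    c = c_rr // (r * r)
    x = feat.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)       # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)    # interleave i with h and j with w


# Four 2x2 feature maps become one 4x4 image when r = 2.
feat = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
upscaled = pixel_shuffle(feat, r=2)
```

The upscaling itself has no learnable parameters; all learning happens in the convolution that produces the `C * r * r` channels at low resolution, which is what makes the layer efficient.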

Hereinafter, the technical idea of the present invention is described in detail through the structure and function of each component of the light field spatial resolution improvement system according to an embodiment of the present invention, with reference to the drawings.

Feature extraction unit

FIG. 4 is a diagram showing the bottleneck structure applied to the feature extraction unit of the light field spatial resolution improvement system according to an embodiment of the present invention, and FIG. 5 is a diagram for explaining the feature extraction method of the deformable convolutional network (DCN) included in the feature extraction unit.

Referring to FIG. 4, the feature extraction unit according to an embodiment of the present invention uses residual learning to address problems that arise as depth increases in deep networks, such as overfitting, vanishing gradients, and increased computation. The residual learning may use a block in which the building block is modified into a bottleneck structure: a three-layer stack of 1×1, 3×3, and 1×1 convolutions.
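The computational benefit of the bottleneck structure can be checked with a parameter count. The sketch below compares a 1×1, 3×3, 1×1 stack with a plain block of two 3×3 convolutions; the channel widths (256 reduced to 64) are the example widths used in the ResNet paper of He et al. and are assumptions here, as are the function names. Biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def bottleneck_params(channels, reduced):
    """1x1 reduce -> 3x3 at reduced width -> 1x1 restore."""
    return (conv_params(channels, reduced, 1)
            + conv_params(reduced, reduced, 3)
            + conv_params(reduced, channels, 1))


def plain_block_params(channels):
    """Two 3x3 convolutions at full width."""
    return 2 * conv_params(channels, channels, 3)


print(bottleneck_params(256, 64))   # 69632
print(plain_block_params(256))      # 1179648
```

With these widths the bottleneck block uses about 17 times fewer weights than the plain block, because the expensive 3×3 convolution runs at the reduced width.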

In addition, according to an embodiment of the present invention, the Res block of the feature extraction unit applies a deformable convolutional network (DCN) to the bottleneck structure described above.

In detail, a DCN adds one layer to a conventional CNN so that the shape of the filter, and not only its values, becomes a kind of learnable weight. A conventional CNN can be expressed as Equation 1 below.

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n) \qquad \text{(Equation 1)}$$

In Equation 1, p_n denotes a coordinate within the CNN filter grid, w denotes the weight matrix (that is, the filter), and p_0 denotes the position in the matrix input to the CNN at which the filter is applied. The symbols x (input), y (output), and R (the set of grid coordinates) are taken from the formulation of Dai et al. cited above.

도 5를 참조하면, (a)는 종래의 컨볼루션에서 값을 추출하는 영역을 나타낸 것으로, p₀은 입력값 내에서의 가운데의 초록색 점의 좌표이고, 은 p_n입력값과 무관하게 필터 내에서 가운데의 초록색 점을 기준으로 모든 초록색 점을 가르키는 좌표이고, w는 필터 내에서 각 초록색 점의 값을 가리킬 수 있다. 이를 참조하면, DCN은 상기의 수학식 1을 이하의 수학식 2로 변형할 수 있다.Referring to Figure 5, (a) shows the area where values are extracted in a conventional convolution, where p ₀ is the coordinate of the green dot in the center within the input value, and p _n is the coordinate within the filter regardless of the input value. It is a coordinate pointing to all green points based on the green point in the center, and w can point to the value of each green point within the filter. With reference to this, DCN can transform Equation 1 above into Equation 2 below.

In Equation 2, an additional layer learns the displacement Δp of p_n, so that the weight matrix itself is unchanged and only the positions at which it is actually applied to the input are changed. This makes the shape of the filter variable, so the shape of the filter also becomes an object of learning.
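The image of Equation 2 is likewise not reproduced in this text. Given that the claims define Δp as the displacement of p_n, a form consistent with the definitions might be written as below. As an assumption made here, x denotes the input feature map, y the output feature map, and R the set of filter grid coordinates; none of these three symbols is taken from the source.

```latex
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)
```

Setting every Δp_n to zero recovers the ordinary convolution, which matches the statement that only the sampling positions change while the weights stay the same.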

In FIG. 5, the remaining panels (b), (c), and (d) show the regions from which values are sampled by deformable convolution. Each arrow acts as a vector representing the displacement of p_n: the start of the arrow is the position at which the weight would have been applied under the original CNN formula, and the end of the arrow is the position, shifted through learning in the additional layer, at which the weight is actually applied.
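The shifted sampling described above can be sketched for a single output position as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the 3×3 grid, the zero padding outside the input, and the use of bilinear interpolation to read values at fractional positions are all assumptions. In a real DCN the offsets would be predicted by the additional learned layer, whereas here they are passed in by hand.

```python
import math

# Filter grid coordinates p_n for a 3x3 filter, relative to the center.
GRID = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def bilinear(x, r, c):
    """Read 2-D list x at a fractional (row, col); outside positions count as 0."""
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    total = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                total += (1 - abs(r - rr)) * (1 - abs(c - cc)) * x[rr][cc]
    return total


def deform_conv_at(x, w, p0, offsets=None):
    """Sum over n of w[n] * x(p0 + p_n + offset_n); no offsets means plain convolution."""
    if offsets is None:
        offsets = [(0.0, 0.0)] * len(GRID)
    y = 0.0
    for (dr, dc), (odr, odc), wn in zip(GRID, offsets, w):
        y += wn * bilinear(x, p0[0] + dr + odr, p0[1] + dc + odc)
    return y


x = [[r * 5 + c for c in range(5)] for r in range(5)]  # simple ramp image
w = [1 / 9] * 9                                        # averaging filter
plain = deform_conv_at(x, w, (2, 2))
shifted = deform_conv_at(x, w, (2, 2), [(0.0, 0.5)] * 9)  # every tap moved half a column
```

The weights `w` are identical in both calls; only the sampling positions differ, which is the property the paragraph above attributes to the DCN.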

Feature highlighting unit

FIG. 6 is a diagram showing the structure of the Convolutional Block Attention Module (CBAM; 310) included in the feature highlighting unit of the light field spatial resolution improvement system according to an embodiment of the present invention.

The CBAM 310 is a modularized attention mechanism for use in CNNs.

In general, three factors are known to improve the performance of a CNN model: depth, width, and cardinality. Here, depth refers to the number of layers, width refers to the number of filters, and cardinality refers to the number of groups in the group convolution proposed in Xception and ResNeXt.

The CBAM 310 improves model performance by using an attention module rather than any of the three factors above.

Referring to FIG. 6, the attention module consists of a channel attention module 311 and a spatial attention module 315, which generate an attention map for the channel dimension and the spatial dimension, respectively. Each generated attention map is multiplied with the input features so that unnecessary information is suppressed and important information is emphasized. The CBAM can be applied to a CNN structure with negligible additional computation.

That is, the attention module is intended to attend to the more important features among those extracted, and is implemented by constructing layers that weight the features differently.

As described above, in the CBAM 310 the input features (F, F') first pass through the channel attention module 311, which weights the feature channels differently, and then pass through the spatial attention module 315, which weights the regions within each feature differently.

Specifically, FIG. 7 is a diagram showing the structure of the channel attention module of the convolutional block attention module shown in FIG. 6. As shown in FIG. 7, the channel attention module 311 first applies MaxPool and AvgPool to the input features F, reducing each result to a matrix of size 1 × 1 × (number of channels), and passes each of them through the same shared MLP.

Next, the channel attention module 311 sums the two matrices output by the MLP, passes the sum through an activation function, and inputs the result to a CNN layer to extract the channel attention map M_C. Element-wise multiplication of the input features by M_C weights the channels differently.
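The pooling, shared MLP, summation, and activation steps of the channel attention module can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function name, the two-layer bias-free MLP with ReLU, the sigmoid activation, and the example weights are hypothetical, and the sketch stops at the activation, omitting the subsequent CNN layer mentioned in the text.

```python
import math


def channel_attention(feat, w1, w2):
    """feat: C x H x W nested lists; w1: hidden x C and w2: C x hidden (shared MLP).

    Returns (M_C, feat with each channel scaled by its entry of M_C).
    """
    flat = [[v for row in ch for v in row] for ch in feat]
    max_vec = [max(ch) for ch in flat]             # MaxPool -> 1 x 1 x C
    avg_vec = [sum(ch) / len(ch) for ch in flat]   # AvgPool -> 1 x 1 x C

    def mlp(vec):
        # Same weights for both pooled vectors; ReLU on the hidden layer (assumed).
        hidden = [max(0.0, sum(a * b for a, b in zip(row, vec))) for row in w1]
        return [sum(a * b for a, b in zip(row, hidden)) for row in w2]

    summed = [a + b for a, b in zip(mlp(max_vec), mlp(avg_vec))]
    m_c = [1.0 / (1.0 + math.exp(-s)) for s in summed]  # sigmoid (assumed activation)
    scaled = [[[v * m for v in row] for row in ch] for ch, m in zip(feat, m_c)]
    return m_c, scaled


feat = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0: strong response
        [[0.0, 0.0], [0.0, 0.0]]]   # channel 1: no response
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical MLP weights
m_c, scaled = channel_attention(feat, identity, identity)
```

In this toy input the channel with a strong response receives a weight close to 1 while the empty channel receives the neutral value 0.5, so the first channel is emphasized relative to the second.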

Next, FIG. 8 is a diagram showing the structure of the spatial attention module of the convolutional block attention module shown in FIG. 6. As shown in FIG. 8, MaxPool and AvgPool are each performed along the channel axis to obtain two matrices.

Unlike pooling in a typical CNN, which pools each channel separately along the vertical axis, pooling here is performed along the horizontal axis across all feature channels, taking the maximum and the mean over the channels. The two matrices are then joined by a concatenate function and passed through a CNN layer to extract the spatial attention map M_S. This map is multiplied element-wise with the features that were already weighted across channels in the previous step, so that the regions within each feature are also weighted differently.
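The spatial attention steps can be sketched in the same style. As assumptions for illustration only, the CNN layer is reduced to a 1×1 convolution over the two concatenated maps (two scalar weights and a bias), and a sigmoid is applied to obtain M_S; the function name and parameter names are hypothetical, and the kernel size used in the embodiment is not stated in this section.

```python
import math


def spatial_attention(feat, w_max=1.0, w_avg=1.0, bias=0.0):
    """feat: C x H x W nested lists. Returns (M_S, feat scaled per position by M_S)."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Pool along the channel axis: one H x W map each for max and mean.
    max_map = [[max(feat[k][i][j] for k in range(c)) for j in range(w)]
               for i in range(h)]
    avg_map = [[sum(feat[k][i][j] for k in range(c)) / c for j in range(w)]
               for i in range(h)]
    stacked = [max_map, avg_map]  # concatenation -> 2 x H x W
    # Assumed 1x1 convolution over the two maps followed by a sigmoid.
    m_s = [[1.0 / (1.0 + math.exp(-(w_max * stacked[0][i][j]
                                    + w_avg * stacked[1][i][j] + bias)))
            for j in range(w)] for i in range(h)]
    out = [[[feat[k][i][j] * m_s[i][j] for j in range(w)] for i in range(h)]
           for k in range(c)]
    return m_s, out


feat = [[[1.0, 0.0], [0.0, 0.0]],
        [[3.0, 0.0], [0.0, 0.0]]]
m_s, out = spatial_attention(feat)
```

Here only the top-left position carries a response in any channel, so M_S is close to 1 there and stays at the neutral value 0.5 elsewhere.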

Upsampling unit

FIG. 9 is a diagram showing the network structure of the Efficient Sub-pixel Convolution (ESPC) layer included in the upsampling unit of the light field spatial resolution improvement system according to an embodiment of the present invention, and FIG. 10 is a diagram showing the upscaling method performed by the efficient sub-pixel convolution layer of FIG. 9.

For the features extracted and emphasized by the configuration described above, the upsampling unit is a model that performs upsampling at the end of the network, which reduces the amount of computation and the number of parameters and improves performance. The upsampling unit of the present invention may perform the upsampling procedure using the ESPC layer described above and a 1×1 CNN layer.

Referring to FIGS. 9 and 10, when the original image has size H × W (a), the image upscaled by a factor of r has size rH × rW. Accordingly, the ESPC layer, which is the last layer, increases the number of channels (the number of feature maps) by a factor of r^2 relative to the low-resolution (LR) input (b), and then combines the feature maps in order to output the high-resolution (HR) image (c). Here, since the input LR image has one channel, that is, one image, the final HR image also has one channel and one image (a→b→c).
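The rearrangement from r^2 feature maps of size H × W into one image of size rH × rW can be sketched as follows. The function name and the specific ordering (channel k is written to row offset k // r and column offset k % r inside each r × r output cell) are assumptions for illustration; the source only states that the feature maps are combined in order.

```python
def pixel_shuffle(feat, r):
    """Rearrange (r*r) x H x W feature maps into a single rH x rW image."""
    if len(feat) != r * r:
        raise ValueError("expected r*r feature maps")
    h, w = len(feat[0]), len(feat[0][0])
    out = [[0] * (r * w) for _ in range(r * h)]
    for k in range(r * r):
        dy, dx = divmod(k, r)  # position of channel k inside each r x r cell
        for i in range(h):
            for j in range(w):
                out[i * r + dy][j * r + dx] = feat[k][i][j]
    return out


# Four 1 x 2 feature maps (r = 2) become one 2 x 4 image.
lr_maps = [[[0, 4]], [[1, 5]], [[2, 6]], [[3, 7]]]
hr = pixel_shuffle(lr_maps, 2)
# hr == [[0, 1, 4, 5], [2, 3, 6, 7]]
```

Each LR pixel position contributes one r × r block of the HR image, which is why the channel count must first be increased by a factor of r^2.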

In particular, the upsampling unit according to an embodiment of the present invention may perform upscaling on a low-resolution input by expanding the feature maps extracted from a plurality of hidden layers and combining them with one another.

Although many details are specifically described above, they should be construed as examples of preferred embodiments rather than as limiting the scope of the invention. Accordingly, the scope of the invention should be determined not by the described embodiments but by the claims and their equivalents.

10: single network 100: (plurality of) individual networks
200: feature extraction unit 300: feature highlighting unit
320, 350, 420: CNN layers
310, 340: convolutional block attention module (CBAM)
311: channel attention module 315: spatial attention module
330: concatenate function
400: upsampler 410: upscaling
img_i: input value img_o: output value
F1, F2, F3: features

Claims

A light field spatial resolution improvement system comprising:
an input unit that receives a light field image;
a plurality of individual networks, configured for each angular position of the light field image, that extract features by isolating weights for the image at each angular position; and
a loss function calculation unit that calculates a loss function for each output value output from the plurality of individual networks and corrects each output value through backpropagation,
wherein each individual network comprises:
a feature extraction unit that extracts features for each angular position by applying the image at each angular position to a Res block including a bottleneck structure and a DCN;
a feature highlighting unit that emphasizes low-frequency information within the features extracted by the feature extraction unit through one or more Convolutional Block Attention Modules; and
an upsampler that upscales the features in which the low-frequency information has been emphasized through an Efficient Sub-pixel Convolution layer, which performs upscaling by expanding feature maps extracted from hidden layers and combining them with one another (the efficient sub-pixel convolution layer increases the number of channels by a factor of r^2 relative to the low resolution, where r is the upscale factor, and then combines the feature maps in order), and that constructs the light field image through a 1 × 1 CNN layer.

(Deleted)

The light field spatial resolution improvement system according to claim 1,
wherein the DCN of the feature extraction unit is expressed by the following equation,

(where p_n is a coordinate within the CNN filter grid, w is the weight matrix, p₀ is the position in the matrix input to the CNN filter at which the filter is applied, and Δp is the displacement of p_n).

The light field spatial resolution improvement system according to claim 1,
wherein the feature highlighting unit comprises:
a first convolutional block attention module that receives the features extracted for each angle, performs MaxPool and AvgPool to form matrices of size 1 × 1 × (number of channels), extracts a channel attention map (M_C), and performs element-wise multiplication of the input features by the channel attention map to weight the feature channels differently;
a second convolutional block attention module that receives the features of all angular positions, performs MaxPool and AvgPool to obtain two matrices, joins the two matrices, extracts a spatial attention map (M_S) using a CNN layer, and performs element-wise multiplication with each channel-weighted feature to weight the regions within the feature differently; and
a skip module that performs a long skip connection on the features of each angular position to emphasize the low-frequency information within the features.

(Deleted)