CN117274067A

CN117274067A - Light field image blind super-resolution processing method and system based on reinforcement learning

Info

Publication number: CN117274067A
Application number: CN202311561062.5A
Authority: CN
Inventors: 居法银; 李宁; 朱虎
Original assignee: Jiangsu Youzhong Micro Nano Semiconductor Technology Co ltd; Zhejiang Unisom New Material Technology Co ltd
Current assignee: Jiangsu Youzhong Micro Nano Semiconductor Technology Co ltd; Zhejiang Unisom New Material Technology Co ltd
Priority date: 2023-11-22
Filing date: 2023-11-22
Publication date: 2023-12-22

Abstract

The invention discloses a light field image blind super-resolution processing method and system based on reinforcement learning, relating to the technical field of image processing. The method comprises the following steps: acquiring a low-resolution light field image and a corresponding high-resolution light field image, and storing them in a data set; extracting spatial features and angular features of a light field image to be super-resolved in the data set; processing the light field image to be super-resolved with a blur kernel predictor to obtain a blur kernel; and inputting the extracted spatial features, the extracted angular features and the obtained blur kernel into a light field image blind super-resolution network based on reinforcement learning to obtain a high-resolution light field image. By combining the spatial information and the angular information of the light field image with the blind super-resolution network, the invention decouples the spatial and angular information in the light field image, makes full use of the light field image information, and solves the sub-optimality problem.

Description

Light field image blind super-resolution processing method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of image processing, in particular to a light field image blind super-resolution processing method and system based on reinforcement learning.

Background

Super-resolution (SR) is the process of recovering a high-resolution (HR) image from a given low-resolution (LR) image, and has important application value in fields such as surveillance equipment, satellite remote sensing, digital high definition, microscopic imaging, video coding and communication, video restoration, and medical imaging. SR is also a long-standing problem in computer vision: to reconstruct an image well, an SR method needs to obtain as much useful information as possible from the input LR image. For ordinary single image super-resolution (SISR), good performance can be obtained by fully exploiting the neighborhood context (i.e., spatial information) of the image. In contrast, a light field (LF) camera records multiple views of a scene, unlike a conventional camera. It records not only spatial information but also angular information through apertures at different angles, so more information is available during processing and the reconstruction performance can be further improved. Light field images are widely used in refocusing, depth estimation, saliency detection and 3D scene perception. However, owing to hardware limitations, there is a trade-off between the spatial resolution (the image resolution of each view) and the angular resolution (the sampling density of the views) of a light field image. Moreover, spatial and angular information are highly coupled in the 4D light field and contribute to light field image SR in different ways, so it is difficult for a network to use the coupled information directly.

Current image super-resolution methods fall into two main types: non-blind SR and blind SR. Non-blind SR methods assume that the blur kernel (degradation kernel) is known during training. That is, a specific blur kernel is applied to an image to obtain a degraded LR image, and the network is then trained on the LR and HR images, so that the resulting network restores LR images generated by the same blur kernel well. Most early deep-learning work on super-resolution is non-blind, yet in most practical applications no blur kernel is provided. The SR problem with an unknown blur kernel, i.e., blind SR, has therefore attracted more attention in academia and industry. Early approaches to blind SR mainly decompose it into two sub-problems: estimating the blur kernel from the input LR image, and generating the SR image based on the estimated kernel. However, this approach is not trained end to end and can lead to suboptimal results. How to recover an HR image under blind SR is therefore the problem to be solved by the invention.

Disclosure of Invention

In order to solve the problem of optimizing non-differentiable evaluation metrics in light field image blind super-resolution processing while ensuring that the network can be trained and tested quickly, the invention provides a light field image blind super-resolution processing method based on reinforcement learning, which comprises the following steps:

S1: acquiring a low-resolution light field image and a corresponding high-resolution light field image, and storing them in a data set;

S2: extracting spatial features and angular features of a light field image to be super-resolved in the data set;

S3: processing the light field image to be super-resolved with a blur kernel predictor to obtain a blur kernel;

S4: inputting the extracted spatial features, the extracted angular features and the obtained blur kernel into a light field image blind super-resolution network based on reinforcement learning, and obtaining a high-resolution light field image.

Further, in the step S1, before the light field image is stored in the data set, the method further includes the steps of:

extracting pixel blocks at the same position in each sub-aperture image to form a macro pixel block;

and orderly arranging the macro pixel blocks according to the relative position relation to obtain macro pixel images and storing the macro pixel images into a data set.

Further, in the step S2, the spatial features and the angular features are extracted by a spatial feature extractor and an angular feature extractor, respectively, wherein the angular feature extractor is a convolution whose kernel size equals its stride and which uses no padding, and the spatial feature extractor is a convolution whose dilation equals the stride of the angular feature extractor.

Further, in the step S3, the blur kernel predictor predicts the blur kernel by aggregating the spatial features and the angular features.

Further, in the step S4, in the light field image blind super-resolution network based on reinforcement learning, reinforcement learning is used to optimize the blind super-resolution network with respect to non-differentiable perceptual metrics.

Further, the reinforcement learning adopts the deep deterministic policy gradient (DDPG) as the learning method, expressed by the following formula:

where $\theta^Q$ denotes the critic network parameters, $L$ is the loss function to be minimized, $i$ indexes the image blocks and takes values 1 to $N$, $Q$ is the critic network, $\gamma$ is the discount factor, $r$ is the actual reward, $s$ is the low-resolution image, and $a$ is the blur kernel.

The invention also provides a light field image blind super-resolution processing system based on reinforcement learning, comprising:

a data acquisition module, used for acquiring the low-resolution light field image and the corresponding high-resolution light field image and storing them in a data set;

a feature extraction module, used for extracting the spatial features and the angular features of the light field image to be super-resolved in the data set;

a blur kernel prediction module, used for adapting to blind super-resolution, in which the spatial information and angular information of the light field image to be super-resolved are aggregated by residual blocks and global average pooling to obtain a blur kernel;

and an adaptive modulation network module, used for inputting the extracted spatial features, the extracted angular features and the obtained blur kernel into a light field image blind super-resolution network based on reinforcement learning, and obtaining a high-resolution light field image.

Compared with the prior art, the invention at least has the following beneficial effects:

(1) According to the light field image blind super-resolution processing method and system based on reinforcement learning, spatial information and angular information of the light field image are combined with a blind super-resolution network, so that the spatial information and the angular information in the light field image are decoupled, and the light field image information can be fully utilized;

(2) A reinforcement learning algorithm is introduced into the overall blind super-resolution framework, which solves the problem that the evaluation metrics for blind super-resolution are non-differentiable.

Drawings

FIG. 1 is a flowchart of the steps of a light field image blind super-resolution processing method based on reinforcement learning;

FIG. 2 is a block diagram of a light field image blind super-resolution processing system based on reinforcement learning.

Detailed Description

The following are specific embodiments of the present invention and the technical solutions of the present invention will be further described with reference to the accompanying drawings, but the present invention is not limited to these embodiments.

Example 1

In order to realize the recovery of HR images based on blind SR, as shown in FIG. 1, the invention provides a light field image blind super-resolution processing method based on reinforcement learning, which comprises the following steps:

S1: acquiring a low-resolution light field image and a corresponding high-resolution light field image, and storing them in a data set;

LR patches are obtained by bicubic downsampling of the sub-aperture light field images in the data set with scale factors of 2 and 4. The LR patches are then rearranged into macro-pixel images, which yields the training data set used as the input of the blind SR network.

A light field image has a 4D structure and can be expressed as $L \in \mathbb{R}^{U \times V \times H \times W}$, where $U$ and $V$ represent the angular dimensions (e.g., $U = 3$, $V = 4$ for a $3 \times 4$ light field image), and $H$ and $W$ represent the height and width of each sub-aperture image (SAI), respectively. Intuitively, the light field (LF) may be regarded as a 2D angular collection of SAIs, and the SAI at each angular coordinate $(u, v)$ can be expressed as $L(u, v, :, :) \in \mathbb{R}^{H \times W}$. Similarly, the LF can also be regarded as a 2D spatial collection of macro-pixels that together form the macro-pixel image (MacPI), i.e., the macro-pixel at each spatial coordinate $(h, w)$ may be represented as $L(:, :, h, w) \in \mathbb{R}^{U \times V}$. In the present invention, $U = V = A$ is set, where $A$ represents the angular resolution.

Sub-aperture images (SAIs) store the light field in a multi-aperture manner: information captured at different angles is recorded in different images, each of which records spatial information at a certain spatial resolution. In a conventional digital image, one pixel point is represented by a single pixel, whereas in a light field image stored as macro-pixels, one pixel point is represented by a small block of pixels, and each such block is a macro-pixel block.

When the light field image is stored as SAIs, the angular information is hidden across the different SAIs and is difficult to extract, so the invention stores the light field image in macro-pixel image form. The sub-aperture images therefore need to be converted into a macro-pixel image before the light field image is stored. First, the pixels at the same position in each sub-aperture image are extracted to form a macro-pixel block; then the macro-pixel blocks are arranged according to their corresponding positions; finally, the light field image in macro-pixel image form is obtained and stored in the data set.
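The SAI-to-MacPI conversion described above can be sketched as an array rearrangement. This is a minimal illustration, assuming the light field is held as a NumPy array of shape (U, V, H, W); the function names `sai_to_macpi` and `macpi_to_sai` are hypothetical and do not come from the patent.

```python
import numpy as np

def sai_to_macpi(lf):
    """Rearrange a light field of shape (U, V, H, W) into a MacPI of shape (H*U, W*V).

    The pixel of SAI (u, v) at spatial position (h, w) ends up at
    row h*U + u and column w*V + v, so each U x V tile is one macro-pixel.
    """
    U, V, H, W = lf.shape
    # Reorder axes to (H, U, W, V), then merge (H, U) and (W, V).
    return lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)

def macpi_to_sai(macpi, U, V):
    """Inverse rearrangement: MacPI of shape (H*U, W*V) back to (U, V, H, W)."""
    H, W = macpi.shape[0] // U, macpi.shape[1] // V
    return macpi.reshape(H, U, W, V).transpose(1, 3, 0, 2)

# Toy light field with U = V = 2 views of size 3 x 4 (assumed sizes).
lf = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
macpi = sai_to_macpi(lf)
```

In this layout the macro-pixel at spatial coordinate (h, w) is the tile `macpi[h*U:(h+1)*U, w*V:(w+1)*V]`, which equals `lf[:, :, h, w]`.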

For the acquired light field image, the invention uses a Spatial Feature Extractor (SFE) and an Angular Feature Extractor (AFE) to extract and decouple the spatial features and the angular features. The AFE is a convolution with kernel size $A \times A$ and stride $A$, applied without padding, so the feature generated by the AFE has size $H \times W \times C$ (where $C$ is the feature depth). Correspondingly, the SFE is a dilated convolution with kernel size $3 \times 3$, stride 1, and dilation equal to the AFE stride $A$ (to keep the feature sizes consistent and the information continuous). Because the AFE stride equals its kernel size, each kernel position covers a distinct region of the input, so that multiple local features are extracted; because no padding is used, the spatial size of the output is reduced after the convolution, which lowers the parameter count and computational complexity of the model. For the SFE, zero padding ensures that the output features have the same spatial size $AH \times AW$ as the input MacPI. When angular features are extracted, the AFE convolves exactly one macro-pixel at a time, so information from different macro-pixels is not aliased. When spatial features are extracted, each SAI is convolved by the SFE without involving angular information, so the spatial and angular information in the LF is decoupled.
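The decoupling property of the two extractors can be checked numerically. The following is a single-channel NumPy sketch, not the patent's network: `angular_conv` stands in for the AFE (kernel A x A, stride A, no padding) and `spatial_conv` for the SFE (kernel 3 x 3, stride 1, dilation A, zero padding A). The function names, the toy sizes and the random weights are all assumptions for illustration.

```python
import numpy as np

def sai_to_macpi(lf):
    """(A, A, H, W) light field -> (A*H, A*W) macro-pixel image."""
    A, _, H, W = lf.shape
    return lf.transpose(2, 0, 3, 1).reshape(H * A, W * A)

def angular_conv(macpi, kernel):
    """A x A convolution, stride A, no padding: one output value per macro-pixel."""
    A = kernel.shape[0]
    H, W = macpi.shape[0] // A, macpi.shape[1] // A
    blocks = macpi.reshape(H, A, W, A)          # blocks[h, u, w, v]
    return np.einsum("huwv,uv->hw", blocks, kernel)

def spatial_conv(macpi, kernel, A):
    """3 x 3 convolution, stride 1, dilation A, zero padding A (output keeps input size)."""
    Hm, Wm = macpi.shape
    padded = np.pad(macpi, A)                   # zero padding of width A on every side
    out = np.zeros((Hm, Wm), dtype=float)
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input at offset ((i - 1) * A, (j - 1) * A).
            out += kernel[i, j] * padded[i * A:i * A + Hm, j * A:j * A + Wm]
    return out

A, H, W = 3, 4, 5
rng = np.random.default_rng(0)
lf = rng.standard_normal((A, A, H, W))
macpi = sai_to_macpi(lf)
k_ang = rng.standard_normal((A, A))
k_spa = rng.standard_normal((3, 3))

ang = angular_conv(macpi, k_ang)                # shape (H, W)
spa = spatial_conv(macpi, k_spa, A)             # shape (A*H, A*W)
```

Because the dilation equals A, every tap of the spatial kernel lands on pixels of the same view, so `spa` rearranged back into SAIs matches an ordinary 3 x 3 convolution applied to each SAI separately.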

To predict the blur kernel $k$, the invention constructs a blur kernel predictor that uses residual blocks (RB) as its basic blocks and applies global average pooling (GAP) at the end of the network to aggregate the spatial information and angular information and obtain the blur kernel.

Given the obtained spatial features, angular features and blur kernel, blind super-resolution of the image can be performed. Unlike the traditional end-to-end training method, however, the invention first trains a non-blind SR network $G$ (a function of the blur kernel $k$ and the LR image), and then optimizes the predictor network $P$ (a function that estimates the blur kernel $k$ from the LR image) through the following optimization formula:

$$\theta_P^{*} = \arg\min_{\theta_P} \mathcal{L}\big(G(I^{LR}, P(I^{LR}; \theta_P)),\ I^{HR}\big)$$

where $\theta_P^{*}$ is the optimization target, i.e., the model parameters $\theta_P$ of $P$ to be obtained, $I^{HR}$ is the HR image, $I^{LR}$ is the LR image, and the loss $\mathcal{L}$ may be either a fidelity-related loss (e.g., L1 loss) or a perception-related loss (e.g., GAN loss), chosen according to the task requirements. In this way, an end-to-end network is re-established, and blind SR training becomes easier and faster.

That is, the invention first proposes an adaptive modulation network (AMNet) as the non-blind SR model, equips AMNet with a kernel predictor, and optimizes the model through the above optimization formula, thereby upgrading it into a blind SR model.

Further, to reduce the computational cost and speed up training and testing, the invention flattens the blur kernel $k$ and reduces it by principal component analysis (PCA) to obtain a simplified kernel $\bar{k} \in \mathbb{R}^{t}$ (a real vector of size $t$), and uses the simplified kernel directly to control the output features of the network. Building on the existing adaptive instance normalization (AdaIN), which is used for image synthesis in StyleGAN, the invention proposes an improved AdaIN. Given an input image feature $x \in \mathbb{R}^{C \times H \times W}$ and the simplified kernel $\bar{k}$, AdaIN may be defined as:

$$\mathrm{AdaIN}(x_c, \bar{k}) = f_s(\bar{k})_c \, \frac{x_c - \mu(x_c)}{\sigma(x_c)} + f_b(\bar{k})_c$$

where $f_s$ and $f_b$ denote functions that convert the simplified kernel $\bar{k}$ into a scaling value (scale) and a bias value (bias), respectively, and $\mu(x_c)$ and $\sigma(x_c)$ are the mean and standard deviation of $x$ over all spatial dimensions ($H \times W$) in each channel $c$:

$$\mu(x_c) = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{chw}, \qquad \sigma(x_c) = \sqrt{\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \big(x_{chw} - \mu(x_c)\big)^2}$$

However, using only $f_s$ and $f_b$ cannot adjust the mean and standard deviation of the input image features well. When AdaIN is used for image synthesis in StyleGAN, each kernel controls features at a specific level (features related to the feature resolution), which gives the synthesized image better performance. The invention therefore constructs an adaptive modulation layer (AMLayer) that uses the simplified kernel $\bar{k}$ to guide the modification of the mean and standard deviation of the current feature. The mappings in the AMLayer are all fully connected layers, and its inputs are combined by a concatenation operation across the channels $c$.
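The kernel reduction and the plain AdaIN modulation described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patent's AMLayer: the isotropic Gaussian kernel family, the kernel size 21, the reduced size t = 10, the single linear maps `w_s` and `w_b` standing in for the scale and bias functions, and the small `eps` added for numerical stability are all hypothetical choices.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Isotropic Gaussian blur kernel, normalized to sum to 1 (assumed kernel family)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def fit_pca(flat_kernels, t):
    """Return the mean and the top-t principal directions of flattened kernels."""
    mean = flat_kernels.mean(axis=0)
    _, _, vt = np.linalg.svd(flat_kernels - mean, full_matrices=False)
    return mean, vt[:t]                          # rows of vt are principal directions

def reduce_kernel(kernel, mean, components):
    """Flatten a kernel and project it to a t-dimensional simplified kernel."""
    return components @ (kernel.ravel() - mean)

def adain(x, k_bar, w_s, w_b, eps=1e-5):
    """AdaIN on features x of shape (C, H, W), modulated by the simplified kernel.

    w_s and w_b are (C, t) linear maps used here as stand-ins for the scale
    and bias functions; eps is a numerical-stability assumption.
    """
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)
    scale = (w_s @ k_bar)[:, None, None]
    bias = (w_b @ k_bar)[:, None, None]
    return scale * (x - mu) / sigma + bias

rng = np.random.default_rng(0)
kernels = np.stack([gaussian_kernel(21, s) for s in np.linspace(0.5, 4.0, 50)])
mean, comps = fit_pca(kernels.reshape(len(kernels), -1), t=10)
k_bar = reduce_kernel(kernels[7], mean, comps)   # simplified kernel of size t

x = rng.standard_normal((4, 8, 8))               # toy feature map with C = 4
w_s = rng.standard_normal((4, 10))
w_b = rng.standard_normal((4, 10))
y = adain(x, k_bar, w_s, w_b)
```

After modulation, the per-channel mean of `y` equals the bias derived from the simplified kernel and the per-channel standard deviation is close to the absolute value of the scale, which is the sense in which the kernel controls the output features.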

For the blind SR problem guided by non-differentiable evaluation metrics, a bridge needs to be built between SR and the evaluation metric. The aim is to select the blur kernel so that the resulting SR image scores better on the specified evaluation metric. This is defined as automatic parameter selection, for which the invention adopts reinforcement learning (RL). In the RL framework, a tuple $(S, A, p, r)$ needs to be defined, where $S$ is the image space, $A$ is the action space, $p$ is the transition function that maps the input state $s \in S$ to the resulting state $s' \in S$ after the action $a \in A$ is taken, and $r$ is the reward function. For light field image super-resolution specifically, $S$ is the image space containing LR images and SR images, $A$ is the space of blur kernels, and $p$ is the non-blind SR function $G$ that obtains the output state $s'$ (SR image) from the input state $s$ (LR image) and the action $a$ (blur kernel). The most important component is $r$, which is built from predefined evaluation metrics such as PSNR (differentiable) and NIQE (non-differentiable) and evaluates an action (blur kernel) for a given state (image).
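The single-step formulation above can be sketched as a tiny environment in which the action is a blur kernel and the reward is a metric of the resulting SR image. Everything named here is a hypothetical stand-in: `toy_non_blind_sr` is not the patent's AMNet, and PSNR is used as the metric only because a non-differentiable metric such as NIQE has no short self-contained implementation; the reward function is treated as a black box either way.

```python
import numpy as np

def psnr(sr, hr, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((sr - hr) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def step(lr, kernel, non_blind_sr, metric):
    """One RL step: state `lr` and action `kernel` give next state `sr` and a reward."""
    sr = non_blind_sr(lr, kernel)     # transition p: (s, a) -> s'
    return sr, metric(sr)             # reward r scores the chosen action

def toy_non_blind_sr(lr, kernel):
    """Hypothetical stand-in for the non-blind SR function: 2x nearest-neighbour
    upsampling whose gain depends on the kernel, so that actions differ in quality."""
    return np.kron(lr, np.ones((2, 2))) * kernel.sum()

rng = np.random.default_rng(0)
lr = rng.uniform(0.2, 0.8, size=(4, 4))
hr = np.kron(lr, np.ones((2, 2)))     # toy ground truth matching the stand-in model

good_kernel = np.full((3, 3), 0.99 / 9.0)   # almost normalized
bad_kernel = np.full((3, 3), 0.5 / 9.0)     # strongly mis-scaled

_, r_good = step(lr, good_kernel, toy_non_blind_sr, lambda sr: psnr(sr, hr))
_, r_bad = step(lr, bad_kernel, toy_non_blind_sr, lambda sr: psnr(sr, hr))
```

Since `step` only calls the metric and never differentiates through it, any scoring function, differentiable or not, can be supplied as `metric`.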

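Specifically, the invention selects the deep deterministic policy gradient (DDPG) as the RL method. The original DDPG optimizes the critic network parameters $\theta^Q$ by minimizing the loss $L$:

$$L = \frac{1}{N} \sum_{i=1}^{N} \Big( r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big) - Q(s_i, a_i \mid \theta^Q) \Big)^2$$

where $Q'$, $\mu'$ and $\gamma$ denote the target critic network, the target actor and the discount factor, respectively. Since the task only requires a single step, the equation reduces to:

$$L = \frac{1}{N} \sum_{i=1}^{N} \big( r_i - Q(s_i, a_i \mid \theta^Q) \big)^2$$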

For a given actual reward $r_i$, two estimates $Q(s_i, a_i \mid \theta^Q)$ that deviate from it in opposite directions can both have the same error of 0.1 under the equation above. However, one of these estimates is expected to receive a greater penalty than the other, so the equation above is further extended with a penalty term.

The extended formula distinguishes two opposite cases of the reward according to sign. When the two quantities compared in the penalty term have different signs, the result is a penalty; when their signs are the same, the result is 0.
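The relation between the general DDPG critic loss and its single-step form can be checked with a few lines of NumPy. This is a sketch of the standard mean-squared critic loss only; the sign-dependent penalty extension is not reproduced, and the sample rewards and estimates are made-up numbers for illustration.

```python
import numpy as np

def ddpg_critic_loss(r, q, q_target_next, gamma):
    """Mean squared error between the bootstrapped target and the critic estimate."""
    y = r + gamma * q_target_next      # target: reward plus discounted target-critic value
    return np.mean((y - q) ** 2)

def one_step_critic_loss(r, q):
    """Single-step case: there is no next state, so the target is the reward itself."""
    return np.mean((r - q) ** 2)

r = np.array([0.05, -0.02, 0.10])      # actual rewards (illustrative values)
q = np.array([0.15, -0.05, 0.00])      # critic estimates Q(s_i, a_i)

full = ddpg_critic_loss(r, q, q_target_next=np.zeros_like(r), gamma=0.99)
single = one_step_critic_loss(r, q)
```

With no successor state the bootstrapped term vanishes, so `full` and `single` coincide regardless of the discount factor.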

Example two

For a better understanding of the technical content of the invention, this embodiment describes the invention in the form of a system structure. As shown in FIG. 2, a light field image blind super-resolution processing system based on reinforcement learning comprises:

In summary, in the light field image blind super-resolution processing method and system based on reinforcement learning, the spatial information and angular information of the light field image are combined with the blind super-resolution network, so that the spatial and angular information in the light field image is decoupled and the light field image information can be fully utilized. In addition, a reinforcement learning algorithm is introduced into the overall blind super-resolution framework, which solves the problem that the evaluation metrics for blind super-resolution are non-differentiable.

It should be noted that all directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.

Furthermore, descriptions such as those referred to herein as "first," "second," "a," and the like are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.

In the present invention, unless specifically stated and limited otherwise, the terms "connected," "affixed," and the like are to be construed broadly, and for example, "affixed" may be a fixed connection, a removable connection, or an integral body; can be mechanically or electrically connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.

In addition, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present invention.

Claims

1. A light field image blind super-resolution processing method based on reinforcement learning is characterized by comprising the following steps:

2. The method for blind super-resolution processing of a light field image based on reinforcement learning as set forth in claim 1, wherein in said step S1, before the light field image is stored in the data set, the method further comprises the steps of:

3. The method for blind super-resolution processing of a light field image based on reinforcement learning according to claim 1, wherein in the step S2, the spatial features and the angular features are extracted by a spatial feature extractor and an angular feature extractor, respectively, the angular feature extractor is a convolution whose kernel size equals its stride and which uses no padding, and the spatial feature extractor is a convolution whose dilation equals the stride of the angular feature extractor.

4. The method for blind super-resolution processing of a light field image based on reinforcement learning according to claim 1, wherein in the step S3, the blur kernel predictor predicts the blur kernel by aggregating the spatial features and the angular features.

5. The method for blind super-resolution processing of a light field image based on reinforcement learning according to claim 1, wherein in the step S4, in the light field image blind super-resolution network based on reinforcement learning, reinforcement learning is used to optimize the blind super-resolution network with respect to non-differentiable perceptual metrics.

6. The method for blind super-resolution processing of a light field image based on reinforcement learning according to claim 5, wherein the reinforcement learning adopts the deep deterministic policy gradient (DDPG) as the learning method, expressed by the following formula:

7. The light field image blind super-resolution processing system based on reinforcement learning is characterized by comprising: