CN112308085B - Light field image denoising method based on convolutional neural network


Info

Publication number
CN112308085B
Authority
CN
China
Prior art keywords
feature
spatial
convolution
output
angle
Prior art date
Legal status
Active
Application number
CN202011144012.3A
Other languages
Chinese (zh)
Other versions
CN112308085A (en)
Inventor
蒋刚毅
陈晔曜
郁梅
Current Assignee
Ningbo University
Original Assignee
Ningbo University
Priority date
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN202011144012.3A
Publication of CN112308085A
Application granted
Publication of CN112308085B

Classifications

    • G06V 10/30: Noise filtering (image or video recognition or understanding; image preprocessing)
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (pattern recognition)
    • G06F 18/253: Fusion techniques of extracted features (pattern recognition)
    • G06N 3/045: Combinations of networks (neural network architectures, e.g. interconnection topology)
    • G06N 3/08: Learning methods (neural networks)
    • Y02T 10/40: Engine management systems (climate change mitigation technologies related to transportation)

Abstract

The invention discloses a light field image denoising method based on a convolutional neural network. The 4D light field image is first reorganized into sub-aperture images and a micro-lens array image. An initial stacked spatial convolution block and an initial angular convolution block are then constructed to extract spatial features from the sub-aperture images and angular features from the micro-lens array image, respectively. A spatial-angular joint encoder group is introduced to model the information-compensation relation between the spatial and angular features and to improve the expressive power of the features. Based on the extracted spatial and angular features, a spatial-angular feature fusion group is constructed to make full use of the features and enrich the detail information of the reconstructed denoised light field image. Finally, the fused features output by the spatial-angular feature fusion group are reconstructed into a denoised light field image by the constructed decoder. The method effectively removes the noise present in the light field image, reconstructs the texture information of the denoised light field image, and preserves its structural consistency.

Description

Light field image denoising method based on convolutional neural network
Technical Field
The invention relates to an image denoising method, in particular to a light field image denoising method based on a convolutional neural network.
Background
Light is one of the most important media through which humans perceive the natural world, but traditional imaging methods that record light intensity lose the directional information of light traveling in free space. Light field imaging is an emerging imaging technology that can simultaneously capture the intensity and direction of light rays, and it has received widespread attention in both industry and academia. Many industrial and commercial applications of light field images have been developed, such as post-capture refocusing, virtual reality display, and depth estimation. However, because industrial and consumer light field cameras currently on the market, such as Lytro cameras, are designed around micro-lens arrays, the rays reaching the sensor imaging plane are sparsely sampled; the captured raw light field images are therefore contaminated by a large amount of noise, which on the one hand degrades the visual quality of the light field image and on the other hand reduces the performance of light field image applications.
Light field image denoising has therefore become an indispensable preprocessing step for improving light field image quality and for supporting subsequent light field vision tasks. Existing light field image denoising methods can be divided into three categories:
The first category of methods regards a 4D light field image as a collection of 2D sub-aperture images (SAIs) and applies a representative 2D image denoising method, such as Block-Matching and 3D Filtering (BM3D), to each SAI independently. However, each SAI discards the angular information of the light field, so later work proposed applying block-matching 3D filtering to each epipolar plane image (EPI) of the light field instead; each EPI contains 1D spatial and 1D angular information of the light field. The second category of methods builds on the first by replacing the 2D image denoiser with a video denoiser, such as Video Block-Matching and 4D Filtering (VBM4D), to achieve a better denoising effect. Specifically, the SAIs of the light field image are arranged into a pseudo video sequence, which is then processed by an existing video denoising method; similarly, arranging the EPIs of a light field image also yields a corresponding pseudo video sequence. Both categories apply existing 2D image or video denoising methods to light field data according to its visualization modes, i.e. SAIs and EPIs, but neither fully exploits the inherent 4D structure of the light field image, which can damage the structural consistency of the denoised result.
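As an illustration of how the second category of methods arranges light field data, the following sketch (hypothetical helper name; a NumPy layout with the angular indices leading is an assumption) turns the U×V SAIs of a light field into a pseudo video sequence that a video denoiser such as VBM4D could process frame by frame:

```python
import numpy as np

def sais_to_pseudo_video(lf: np.ndarray) -> np.ndarray:
    """Arrange the sub-aperture images of a 4D light field as a pseudo video.

    lf : array of shape (V, U, H, W), one grayscale SAI per angular position
         (angular resolution U x V, spatial resolution W x H).
    Returns an array of shape (V*U, H, W), i.e. a "video" with V*U frames.
    """
    v, u, h, w = lf.shape
    # A simple row-major scan of the angular grid; zig-zag or spiral scan
    # orders are also common and only change the frame ordering.
    return lf.reshape(v * u, h, w)
```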
The third category designs denoising methods specific to 4D light field images by exploiting their special structure. Dansereau et al. proposed a linear 4D frequency-domain hyperfan filter for removing noise from light field images, which explores the characteristics of the light field in the frequency domain. Allain et al. proposed a light field image denoising method based on 4D anisotropic diffusion, which can obtain a good denoising effect through multiple iterations. However, when processing light field images contaminated with high-intensity noise, both methods suffer from incomplete noise removal or over-smoothed image textures. Subsequently, Chen et al. proposed a light field image denoising method based on learning and anisotropic analysis, which improves the removal of high-intensity noise, but it only considers the horizontal and vertical directions when using the 2D angular information of the light field and does not fully exploit the 4D structural characteristics and intrinsic geometric information of the light field image.
In summary, although existing research has achieved good denoising results for light field images, there are still shortcomings in removing high-intensity noise; in particular, there is room for improvement in reconstructing the texture information of the denoised light field image and in preserving its structural consistency.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a light field image denoising method based on a convolutional neural network that can make full use of the 4D structural characteristics of the light field image, namely the information of the 2D spatial domain and the 2D angular domain, effectively remove the noise present in the light field image, reconstruct the texture information of the denoised light field image, and preserve its structural consistency.
The technical scheme adopted by the invention to solve the above technical problem is as follows: a light field image denoising method based on a convolutional neural network, characterized by comprising the following steps:
Step one: select Num noisy light field images with spatial resolution W×H and angular resolution U×V, together with the corresponding Num reference noise-free light field images; reorganize each noisy light field image into U×V single-channel noisy sub-aperture images of width W and height H, and also reorganize each noisy light field image into one single-channel noisy micro-lens array image of width U×W and height V×H; then form a training set from the Num noisy light field images, the corresponding Num reference noise-free light field images, the corresponding Num×U×V noisy sub-aperture images, and the corresponding Num noisy micro-lens array images; where Num > 1;
Step two: construct a convolutional neural network as a spatial-angular joint coding network. The spatial-angular joint coding network comprises an initial stacked spatial convolution block for extracting spatial features of the light field image, an initial angular convolution block for extracting angular features of the light field image, a spatial-angular joint encoder group for jointly processing the spatial and angular features and improving their expressive power, a spatial-angular feature fusion group for fusing the spatial and angular features, and a decoder for reconstructing the target denoised light field image; the spatial-angular joint encoder group consists of a first, a second and a third spatial-angular joint encoder connected in sequence, and the spatial-angular feature fusion group consists of a first, a second and a third spatial-angular feature fusion device;
For the initial stacked spatial convolution block: it consists of U×V first convolution layers; the input of each first convolution layer receives one single-channel noisy sub-aperture image of width W and height H, and its output produces 32 spatial feature maps of width W and height H. The set of all spatial feature maps output by the i-th first convolution layer is denoted F_{s0,i}, and the set of all spatial feature maps output by the initial stacked spatial convolution block is denoted F_{S0}, F_{S0} = {F_{s0,1}, …, F_{s0,U×V}}; where 1 ≤ i ≤ U×V, F_{s0,1} denotes the set of all spatial feature maps output by the 1st first convolution layer and F_{s0,U×V} denotes the set of all spatial feature maps output by the (U×V)-th first convolution layer. The convolution kernels of the U×V first convolution layers are of size 3×3 with stride 1, the number of input channels is 1, the number of output channels is 32, the activation function is the Leaky ReLU, and the U×V first convolution layers share the same kernel weight parameters;
For the initial angular convolution block: it consists of 1 second convolution layer; the input of the second convolution layer receives the single-channel noisy micro-lens array image of width U×W and height V×H, and its output produces 32 angular feature maps of width W and height H; the set of all angular feature maps output by the second convolution layer is denoted F_{A0}. The convolution kernel of the second convolution layer is of size U×V, the horizontal convolution stride is U, the vertical convolution stride is V, the number of input channels is 1, the number of output channels is 32, and the activation function is the Leaky ReLU;
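As a minimal sketch of the two initial blocks described above (PyTorch is assumed; the Leaky ReLU negative slope and the (B, U·V, H, W) and (B, 1, V·H, U·W) tensor layouts are assumptions not stated in the text), weight sharing across the U×V first convolution layers is realised by folding the views into the batch dimension and applying a single convolution:

```python
import torch
import torch.nn as nn

class InitialSpatialBlock(nn.Module):
    """U*V first convolution layers with shared kernel weights: in practice a
    single 3x3 Conv2d (1 -> 32 channels) applied to every sub-aperture image."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)   # negative slope assumed

    def forward(self, sais):               # sais: (B, U*V, H, W)
        b, n, h, w = sais.shape
        x = sais.reshape(b * n, 1, h, w)   # fold views into the batch axis
        f = self.act(self.conv(x))         # (B*U*V, 32, H, W)
        return f.reshape(b, n, 32, h, w)   # F_S0: U*V groups of 32 maps

class InitialAngularBlock(nn.Module):
    """One convolution with a V x U kernel and matching strides over the
    micro-lens array image, giving 32 angular feature maps of size H x W."""
    def __init__(self, u, v):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=(v, u), stride=(v, u))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, mla):                # mla: (B, 1, V*H, U*W)
        return self.act(self.conv(mla))    # F_A0: (B, 32, H, W)
```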
For the spatial-angular joint encoder group: the first input of the first spatial-angular joint encoder receives all spatial feature maps in F_{S0} and its second input receives all angular feature maps in F_{A0}; its first output produces U×V groups of 32 spatial feature maps of width W and height H, whose set is denoted F_{S1}, F_{S1} = {F_{s1,1}, …, F_{s1,U×V}}, and its second output produces 32 angular feature maps of width W and height H, whose set is denoted F_{A1}. The first input of the second spatial-angular joint encoder receives all spatial feature maps in F_{S1} and its second input receives all angular feature maps in F_{A1}; its first output produces U×V groups of 32 spatial feature maps of width W and height H, whose set is denoted F_{S2}, F_{S2} = {F_{s2,1}, …, F_{s2,U×V}}, and its second output produces 32 angular feature maps of width W and height H, whose set is denoted F_{A2}. The first input of the third spatial-angular joint encoder receives all spatial feature maps in F_{S2} and its second input receives all angular feature maps in F_{A2}; its first output produces U×V groups of 32 spatial feature maps of width W and height H, whose set is denoted F_{S3}, F_{S3} = {F_{s3,1}, …, F_{s3,U×V}}, and its second output produces 32 angular feature maps of width W and height H, whose set is denoted F_{A3}. Here F_{s1,1}, F_{s2,1} and F_{s3,1} denote the sets formed by the 1st group of spatial feature maps output by the first outputs of the first, second and third spatial-angular joint encoders respectively, and F_{s1,U×V}, F_{s2,U×V} and F_{s3,U×V} denote the corresponding sets formed by the (U×V)-th groups;
for the set of spatial angle feature fusion devices, a first input of the first spatial angle feature fusion device receives F S1 The second input end of the first spatial angle feature fusion device receives F A1 The output end of the first spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the first spatial angle feature fusion device is marked as F Fused1 ,F Fused1 ={F fused1,1 ,…,F fused1,U×V -a }; second spatial angleA first input terminal of the feature fusion device receives F S2 All the space feature graphs in (a), the second input end of the second space angle feature fusion device receives F A2 The output end of the second spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the second spatial angle feature fusion device is marked as F Fused2 ,F Fused2 ={F fused2,1 ,…,F fused2,U×V -a }; the first input end of the third space angle feature fusion device receives F S3 The second input end of the third spatial angle feature fusion device receives F A3 The output end of the third spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the third spatial angle feature fusion device is marked as F Fused3 ,F Fused3 ={F fused3,1 ,…,F fused3,U×V -a }; wherein F is fused1,1 A set of 1 st group of fusion feature images output by the output end of the first space angle feature fusion device is represented, F fused1,U×V A set of the fusion feature graphs of the first group of the U x V groups output by the output end of the first space angle feature fusion device, F fused2,1 A set of 1 st group of fusion feature graphs output by the output end of the second space angle feature fusion device, F fused2,U×V Representing the set formed by the U X V group fusion characteristic diagrams output by the output end of the second space angle characteristic fusion device, F fused3,1 A set of 1 st group of fusion feature graphs output by the output end of the third space angle feature fusion device, F fused3,U×V Representing a set formed by a U multiplied by V group fusion characteristic diagram output by an output end of the third space angle characteristic fusion device;
for F Fused1 All of the fused feature maps F Fused2 All of the fused feature maps F Fused3 Performing cascading operation on all the fusion characteristic graphs in the database, and marking a set formed by the fusion characteristic graphs with the width W and the height H of 96U multiplied by V groups obtained after the cascading operation as F CatF ,F CatF ={F catF,1 ,…,F catF,U×V -a }; wherein F is catF,1 Representing a set of 1 st group of fusion feature graphs obtained after cascading operation, F catF,U×V Representing a set formed by a U multiplied by V group fusion characteristic diagram obtained after cascading operation; for the decoder, the first stack space convolution block consists of U×V third convolution layers, and the input end of each third convolution layer receives a corresponding group of fusion feature images obtained after cascade operation, namely a set F formed by the i-th group of fusion feature images obtained after cascade operation catF,i The output end of each third convolution layer outputs 64 decoding feature images with width W and height H, and the set formed by all decoding feature images output by the output end of the ith third convolution layer is marked as F d1,i The set of all decoding feature maps output by the first stack space convolution block is denoted as F D1 ,F D1 ={F d1,1 ,…,F d1,U×V -a }; the second stack space convolution block consists of U×V fourth convolution layers, and the input end of each fourth convolution layer receives F D1 A corresponding set of decoding profiles, i.e. the input of the ith fourth convolutional layer receives F D1 F in (F) d1,i The output end of each fourth convolution layer outputs 32 decoding characteristic diagrams with width W and height H, and the set formed by all decoding characteristic diagrams output by the output end of the ith fourth convolution layer is denoted as F d2,i The set of all decoding characteristic diagrams output by the second stack space convolution block is denoted as F D2 ,F D2 ={F d2,1 ,…,F d2,U×V -a }; the third stack space convolution block consists of U×V fifth convolution layers, and the input end of each fifth convolution layer receives F D2 A corresponding set of decoding profiles, i.e. 
the input of the ith fifth convolutional layer receives F D2 F in (F) d2,i The output end of each fifth convolution layer outputs a single-channel reconstructed denoising sub-aperture image with the width W and the height H, and U multiplied by V reconstructed denoising sub-aperture images output by the third stack space convolution block are recombined into a reconstructed denoising light field image; which is a kind ofIn F d1,1 Representing the set of all decoded feature maps output by the output end of the 1 st third convolution layer, F d1,U×V Representing the set of all decoded feature maps output by the output terminal of the third convolutional layer of the U×V number, F d2,1 Representing the set of all decoded feature maps output by the output end of the 1 st fourth convolution layer, F d2,U×V The method comprises the steps that the size of a convolution kernel of U multiplied by V third convolution layers is 3 multiplied by 3, the convolution step length is 1, the number of input channels is 96, the number of output channels is 64, the adopted activation function is 'leakage ReLU', the U multiplied by V third convolution layers share the weight parameters of the convolution kernel, the size of the convolution kernel of U multiplied by V fourth convolution layers is 3 multiplied by 3, the convolution step length is 1, the number of input channels is 64, the number of output channels is 32, the adopted activation function is 'leakage ReLU', the size of the convolution kernel of U multiplied by V fourth convolution layers is 1 multiplied by 1, the number of input channels is 32, the number of output channels is 1, the activation function is not adopted, and the weight parameters of the convolution kernel of U multiplied by V fifth convolution layers share the same;
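A sketch of the decoder under the same assumptions as before (PyTorch; the U×V views are folded into the batch axis so that the per-view convolutions share their weights):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Three stacked, weight-shared per-view convolutions that map the 96
    concatenated fusion maps back to one denoised sub-aperture image per view."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(96, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 1, 1)          # 1x1 output layer, no activation
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, fused1, fused2, fused3):    # each: (B, U*V, 32, H, W)
        x = torch.cat([fused1, fused2, fused3], dim=2)   # F_CatF: (B, U*V, 96, H, W)
        b, n, c, h, w = x.shape
        x = x.reshape(b * n, c, h, w)
        x = self.act(self.conv1(x))               # F_D1: 64 maps per view
        x = self.act(self.conv2(x))               # F_D2: 32 maps per view
        x = self.conv3(x)                          # (B*U*V, 1, H, W)
        return x.reshape(b, n, h, w)               # U*V denoised sub-aperture images
```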
Step three: take each reference noise-free light field image in the training set as a label image; input the noisy sub-aperture images and the corresponding noisy micro-lens array images of all noisy light field images in the training set into the spatial-angular joint coding network for training, obtaining a reconstructed denoised light field image for each noisy light field image in the training set; after training, the optimal weight parameters of every convolution kernel in the spatial-angular joint coding network are obtained, i.e. the trained spatial-angular joint coding network model;
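A hedged training sketch for step three. `LightFieldDenoiseNet` and `LFTrainSet` are hypothetical names for the assembled network and for a dataset yielding the triples built in step one; the loss function and optimizer are not specified in this passage, so an L1 reconstruction loss against the reference noise-free sub-aperture images and the Adam optimizer are assumptions:

```python
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=100, lr=1e-4, device="cuda"):
    model = model.to(device)
    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    optim = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer assumed
    loss_fn = torch.nn.L1Loss()                            # loss assumed
    for epoch in range(epochs):
        for noisy_sais, noisy_mla, clean_sais in loader:
            # clean_sais: reference noise-free light field arranged as (B, U*V, H, W)
            noisy_sais, noisy_mla = noisy_sais.to(device), noisy_mla.to(device)
            clean_sais = clean_sais.to(device)
            denoised = model(noisy_sais, noisy_mla)        # reconstructed light field
            loss = loss_fn(denoised, clean_sais)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model   # weights now hold the learned convolution kernels
```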
Step four: randomly select a noisy light field image as a test image; reorganize the test image into several single-channel noisy sub-aperture images and into a single-channel noisy micro-lens array image; input the single-channel noisy sub-aperture images and the corresponding single-channel noisy micro-lens array image of the test image into the trained spatial-angular joint coding network model, and obtain by testing the reconstructed denoised light field image corresponding to the test image.
In step two, the first, second and third spatial-angular joint encoders have the same structure, each consisting of a fourth stacked spatial convolution block, a first angular convolution block and a 1×1 convolution layer, where the fourth stacked spatial convolution block consists of U×V sixth convolution layers and the first angular convolution block consists of 1 seventh convolution layer;
For the first spatial-angular joint encoder: the angular feature maps in F_{A0} are replicated U×V times, and the U×V copies are concatenated with the U×V groups of spatial feature maps in F_{S0}; the set formed by the resulting U×V groups of 64 feature maps of width W and height H is denoted F_{Cat,SA0}, F_{Cat,SA0} = {F_{cat,SA0,1}, …, F_{cat,SA0,U×V}}. The input of each sixth convolution layer receives the corresponding group of concatenated feature maps, i.e. the i-th sixth convolution layer receives F_{cat,SA0,i}, and its output produces 32 spatial feature maps of width W and height H; the set of all spatial feature maps output by the i-th sixth convolution layer is F_{s1,i}, and the set of all spatial feature maps output by the fourth stacked spatial convolution block is F_{S1}. All spatial feature maps in F_{S0} are then reorganized, and the set of 32 reorganized feature maps of width U×W and height V×H is denoted F_{temp,S0}; the input of the seventh convolution layer receives all feature maps in F_{temp,S0}, and its output produces 32 temporary angular feature maps of width W and height H, whose set is denoted F_{temp,A1}. All temporary angular feature maps in F_{temp,A1} are concatenated with all angular feature maps in F_{A0}; the set of the resulting 64 feature maps of width W and height H is denoted F_{Cat,A10}. The input of the 1×1 convolution layer receives all feature maps in F_{Cat,A10}, and its output produces 32 angular feature maps of width W and height H, whose set is F_{A1}. Here F_{cat,SA0,1} denotes the set formed by the 1st group of feature maps obtained by concatenating the U×V copies of the angular feature maps with the U×V groups of spatial feature maps in F_{S0}, and F_{cat,SA0,U×V} denotes the corresponding set formed by the (U×V)-th group;
For the second spatial-angular joint encoder: the angular feature maps in F_{A1} are replicated U×V times, and the U×V copies are concatenated with the U×V groups of spatial feature maps in F_{S1}; the set formed by the resulting U×V groups of 64 feature maps of width W and height H is denoted F_{Cat,SA1}, F_{Cat,SA1} = {F_{cat,SA1,1}, …, F_{cat,SA1,U×V}}. The input of each sixth convolution layer receives the corresponding group of concatenated feature maps, i.e. the i-th sixth convolution layer receives F_{cat,SA1,i}, and its output produces 32 spatial feature maps of width W and height H; the set of all spatial feature maps output by the i-th sixth convolution layer is F_{s2,i}, and the set of all spatial feature maps output by the fourth stacked spatial convolution block is F_{S2}. All spatial feature maps in F_{S1} are reorganized, and the set of 32 reorganized feature maps of width U×W and height V×H is denoted F_{temp,S1}; the input of the seventh convolution layer receives all feature maps in F_{temp,S1}, and its output produces 32 temporary angular feature maps of width W and height H, whose set is denoted F_{temp,A2}. All temporary angular feature maps in F_{temp,A2} are concatenated with all angular feature maps in F_{A1}; the set of the resulting 64 feature maps of width W and height H is denoted F_{Cat,A21}. The input of the 1×1 convolution layer receives all feature maps in F_{Cat,A21}, and its output produces 32 angular feature maps of width W and height H, whose set is F_{A2}. Here F_{cat,SA1,1} denotes the set formed by the 1st group of feature maps obtained by concatenating the U×V copies of the angular feature maps with the U×V groups of spatial feature maps in F_{S1}, and F_{cat,SA1,U×V} denotes the corresponding set formed by the (U×V)-th group;
For the third spatial-angular joint encoder: the angular feature maps in F_{A2} are replicated U×V times, and the U×V copies are concatenated with the U×V groups of spatial feature maps in F_{S2}; the set formed by the resulting U×V groups of 64 feature maps of width W and height H is denoted F_{Cat,SA2}, F_{Cat,SA2} = {F_{cat,SA2,1}, …, F_{cat,SA2,U×V}}. The input of each sixth convolution layer receives the corresponding group of concatenated feature maps, i.e. the i-th sixth convolution layer receives F_{cat,SA2,i}, and its output produces 32 spatial feature maps of width W and height H; the set of all spatial feature maps output by the i-th sixth convolution layer is F_{s3,i}, and the set of all spatial feature maps output by the fourth stacked spatial convolution block is F_{S3}. All spatial feature maps in F_{S2} are reorganized, and the set of 32 reorganized feature maps of width U×W and height V×H is denoted F_{temp,S2}; the input of the seventh convolution layer receives all feature maps in F_{temp,S2}, and its output produces 32 temporary angular feature maps of width W and height H, whose set is denoted F_{temp,A3}. All temporary angular feature maps in F_{temp,A3} are concatenated with all angular feature maps in F_{A2}; the set of the resulting 64 feature maps of width W and height H is denoted F_{Cat,A32}. The input of the 1×1 convolution layer receives all feature maps in F_{Cat,A32}, and its output produces 32 angular feature maps of width W and height H, whose set is F_{A3}. Here F_{cat,SA2,1} denotes the set formed by the 1st group of feature maps obtained by concatenating the U×V copies of the angular feature maps with the U×V groups of spatial feature maps in F_{S2}, and F_{cat,SA2,U×V} denotes the corresponding set formed by the (U×V)-th group;
In each of the first, second and third spatial-angular joint encoders, the convolution kernels of the U×V sixth convolution layers are of size 3×3 with stride 1, 64 input channels and 32 output channels, the activation function is the Leaky ReLU, and the U×V sixth convolution layers share the same kernel weight parameters; the convolution kernel of the seventh convolution layer is of size U×V with horizontal stride U and vertical stride V, 32 input channels and 32 output channels, and its activation function is the Leaky ReLU; the convolution kernel of the 1×1 convolution layer is of size 1×1 with 64 input channels and 32 output channels, and its activation function is the Leaky ReLU.
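The following PyTorch sketch mirrors the joint encoder just described (spatial branch: replicate and concatenate the angular maps, shared 3×3 convolution; angular branch: rearrange the spatial maps into a lenslet layout, V×U convolution, concatenation with the previous angular maps, 1×1 convolution). The row-major ordering of the U×V views and the Leaky ReLU slope are assumptions:

```python
import torch
import torch.nn as nn

class SAJointEncoder(nn.Module):
    """One spatial-angular joint encoder (a sketch of the structure above;
    channel counts and kernel sizes follow the text)."""
    def __init__(self, u, v):
        super().__init__()
        self.u, self.v = u, v
        self.spatial_conv = nn.Conv2d(64, 32, 3, padding=1)          # shared 6th conv
        self.angular_conv = nn.Conv2d(32, 32, (v, u), stride=(v, u)) # 7th conv
        self.fuse_1x1 = nn.Conv2d(64, 32, 1)                         # 1x1 conv
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, f_s, f_a):
        # f_s: (B, N, 32, H, W) with N = U*V ;  f_a: (B, 32, H, W)
        b, n, c, h, w = f_s.shape
        # --- spatial branch: concatenate the replicated angular features ---
        f_a_rep = f_a.unsqueeze(1).expand(b, n, c, h, w)
        x = torch.cat([f_s, f_a_rep], dim=2).reshape(b * n, 2 * c, h, w)
        f_s_out = self.act(self.spatial_conv(x)).reshape(b, n, c, h, w)
        # --- angular branch: rearrange spatial features into a lenslet layout ---
        lenslet = (f_s.reshape(b, self.v, self.u, c, h, w)
                       .permute(0, 3, 4, 1, 5, 2)           # (B, C, H, V, W, U)
                       .reshape(b, c, self.v * h, self.u * w))
        f_a_tmp = self.act(self.angular_conv(lenslet))       # temporary angular maps
        f_a_out = self.act(self.fuse_1x1(torch.cat([f_a_tmp, f_a], dim=1)))
        return f_s_out, f_a_out
```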
In step two, the first, second and third spatial-angular feature fusion devices have the same structure, each consisting of a fifth stacked spatial convolution block, where the fifth stacked spatial convolution block consists of U×V eighth convolution layers;
For the first spatial-angular feature fusion device: the angular feature maps in F_{A1} are replicated U×V times, and the U×V copies are concatenated with the U×V groups of spatial feature maps in F_{S1}; the set formed by the resulting U×V groups of 64 feature maps of width W and height H is denoted F_{Cat,f1}, F_{Cat,f1} = {F_{cat,f1,1}, …, F_{cat,f1,U×V}}. The input of each eighth convolution layer receives the corresponding group of concatenated feature maps, i.e. the i-th eighth convolution layer receives F_{cat,f1,i}, and its output produces 32 fused feature maps of width W and height H; the set of all fused feature maps output by the i-th eighth convolution layer is F_{fused1,i}, and the set of all fused feature maps output by the fifth stacked spatial convolution block is F_{Fused1}. Here F_{cat,f1,1} denotes the set formed by the 1st group of feature maps obtained by concatenating the U×V copies of the angular feature maps with the U×V groups of spatial feature maps in F_{S1}, and F_{cat,f1,U×V} denotes the corresponding set formed by the (U×V)-th group;
For the second spatial-angular feature fusion device: the angular feature maps in F_{A2} are replicated U×V times, and the U×V copies are concatenated with the U×V groups of spatial feature maps in F_{S2}; the set formed by the resulting U×V groups of 64 feature maps of width W and height H is denoted F_{Cat,f2}, F_{Cat,f2} = {F_{cat,f2,1}, …, F_{cat,f2,U×V}}. The input of each eighth convolution layer receives the corresponding group of concatenated feature maps, i.e. the i-th eighth convolution layer receives F_{cat,f2,i}, and its output produces 32 fused feature maps of width W and height H; the set of all fused feature maps output by the i-th eighth convolution layer is F_{fused2,i}, and the set of all fused feature maps output by the fifth stacked spatial convolution block is F_{Fused2}. Here F_{cat,f2,1} denotes the set formed by the 1st group of feature maps obtained by concatenating the U×V copies of the angular feature maps with the U×V groups of spatial feature maps in F_{S2}, and F_{cat,f2,U×V} denotes the corresponding set formed by the (U×V)-th group;
For the third spatial-angular feature fusion device: the angular feature maps in F_{A3} are replicated U×V times, and the U×V copies are concatenated with the U×V groups of spatial feature maps in F_{S3}; the set formed by the resulting U×V groups of 64 feature maps of width W and height H is denoted F_{Cat,f3}, F_{Cat,f3} = {F_{cat,f3,1}, …, F_{cat,f3,U×V}}. The input of each eighth convolution layer receives the corresponding group of concatenated feature maps, i.e. the i-th eighth convolution layer receives F_{cat,f3,i}, and its output produces 32 fused feature maps of width W and height H; the set of all fused feature maps output by the i-th eighth convolution layer is F_{fused3,i}, and the set of all fused feature maps output by the fifth stacked spatial convolution block is F_{Fused3}. Here F_{cat,f3,1} denotes the set formed by the 1st group of feature maps obtained by concatenating the U×V copies of the angular feature maps with the U×V groups of spatial feature maps in F_{S3}, and F_{cat,f3,U×V} denotes the corresponding set formed by the (U×V)-th group;
In each of the first, second and third spatial-angular feature fusion devices, the convolution kernels of the U×V eighth convolution layers are of size 3×3 with stride 1, 64 input channels and 32 output channels, the activation function is the Leaky ReLU, and the U×V eighth convolution layers share the same kernel weight parameters.
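A corresponding sketch of one spatial-angular feature fusion device (same assumptions as the earlier PyTorch sketches); structurally it is the spatial branch of the joint encoder without the angular update:

```python
import torch
import torch.nn as nn

class SAFeatureFusion(nn.Module):
    """Replicate the angular maps U*V times, concatenate them with each group
    of spatial maps, and apply a weight-shared 3x3 convolution (64 -> 32)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(64, 32, 3, padding=1)   # shared 8th conv layer
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, f_s, f_a):          # f_s: (B, N, 32, H, W), f_a: (B, 32, H, W)
        b, n, c, h, w = f_s.shape
        f_a_rep = f_a.unsqueeze(1).expand(b, n, c, h, w)
        x = torch.cat([f_s, f_a_rep], dim=2).reshape(b * n, 2 * c, h, w)
        return self.act(self.conv(x)).reshape(b, n, c, h, w)   # F_Fused*
```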
Compared with the prior art, the invention has the advantages that:
1) The method considers that a light field image has a 4D structure containing rich spatial and angular information, whereas traditional 2D image or video denoising methods cannot fully exploit the redundancy of the light field in the angular domain and do not explore its complete 4D structure, leading to incomplete noise removal or over-smoothed textures in the denoised light field image; in particular, because traditional methods process each sub-aperture image or epipolar plane image independently, they damage the structural consistency of the denoised light field. The method of the invention therefore processes the spatial and angular information of the light field image jointly, i.e. it constructs a spatial-angular joint coding network to address these problems, so that the noise present in the light field image is effectively removed and the texture information of the denoised light field image is reconstructed. In addition, the method reconstructs the whole light field image at once, i.e. it reconstructs all sub-aperture images simultaneously, in order to preserve the structural consistency of the denoised light field image.
2) The method considers that the spatial information of a light field image can be represented by its sub-aperture images and the angular information by its micro-lens array image, so an initial stacked spatial convolution block and an initial angular convolution block are constructed to extract effective spatial and angular features from the sub-aperture images and the micro-lens array image, respectively.
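To make these two representations concrete, the following NumPy sketch (hypothetical helper name; the (V, U, H, W) input layout is an assumption) reorganizes a single-channel 4D light field into the U×V sub-aperture images and the micro-lens array image used as network inputs in step one:

```python
import numpy as np

def reorganize_light_field(lf: np.ndarray):
    """Split a 4D light field into its two 2D representations.

    lf : array of shape (V, U, H, W), angular resolution U x V,
         spatial resolution W x H, single channel.
    Returns:
      sais : (U*V, H, W)  -- U*V single-channel sub-aperture images
      mla  : (V*H, U*W)   -- single-channel micro-lens array image
    """
    v, u, h, w = lf.shape
    sais = lf.reshape(u * v, h, w)
    # Micro-lens array image: every spatial position (y, x) becomes a
    # V x U macro-pixel holding all angular samples of that position.
    mla = lf.transpose(2, 0, 3, 1).reshape(v * h, u * w)
    return sais, mla
```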
Drawings
FIG. 1 is a block diagram of the overall implementation of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of a spatial angle joint coding network constructed by the method of the invention;
FIG. 3 is a schematic diagram of the structure of a spatial angle joint encoder in a spatial angle joint encoding network constructed by the method of the present invention;
FIG. 4 is a schematic diagram of the composition and structure of a spatial angle feature fusion device in a spatial angle joint coding network constructed by the method of the invention;
FIG. 5a is a test noise light field image corresponding to the buildings_24 scene, here shown as a sub-aperture image at the center coordinates;
FIG. 5b is a reconstructed denoising light field image obtained by processing the noisy light field image shown in FIG. 5a by BM3D, wherein the reconstructed denoising light field image is shown by taking a sub-aperture image under a central coordinate;
FIG. 5c is a reconstructed denoising light field image obtained by processing the noisy light field image shown in FIG. 5a by VBM4D, wherein the reconstructed denoising light field image is shown by taking a sub-aperture image under a central coordinate;
FIG. 5d is a reconstructed denoised light field image obtained by processing the noisy light field image shown in FIG. 5a with the method of Dansereau et al., here shown as the sub-aperture image at the center coordinates;
FIG. 5e is a reconstructed denoised light field image obtained by processing the noisy light field image shown in FIG. 5a using the method of Chen et al, here shown with a sub-aperture image at the center coordinates;
FIG. 5f is a reconstructed denoising light field image obtained by processing the noisy light field image shown in FIG. 5a by the method of the present invention, wherein the reconstructed denoising light field image is shown by taking a sub-aperture image under a central coordinate;
FIG. 5g is a reference noiseless light field image corresponding to the noisy light field image shown in FIG. 5a, here shown as a sub-aperture image at the center coordinates;
FIG. 5h is a polar plane image corresponding to the noisy light field image shown in FIG. 5 a;
FIG. 5i is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5 b;
FIG. 5j is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5 c;
FIG. 5k is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5 d;
FIG. 5l is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5 e;
FIG. 5m is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5 f;
FIG. 5n is a polar plane image corresponding to the reference noiseless light field image shown in FIG. 5 g;
FIG. 6a is a test noise light field image corresponding to the Peole_5 scene, here shown as a sub-aperture image at the center coordinates;
FIG. 6b is a reconstructed denoising light field image obtained by processing the noisy light field image shown in FIG. 6a by BM3D, here shown by taking a sub-aperture image under the center coordinates;
FIG. 6c is a reconstructed denoising light field image obtained by processing the noisy light field image shown in FIG. 6a with VBM4D, here shown by taking a sub-aperture image under the center coordinates;
FIG. 6d is a reconstructed denoised light field image obtained by processing the noisy light field image shown in FIG. 6a with the method of Dansereau et al., here shown as the sub-aperture image at the center coordinates;
FIG. 6e is a reconstructed denoised light field image obtained by processing the noisy light field image shown in FIG. 6a using the method of Chen et al, here shown with a sub-aperture image at the center coordinates;
FIG. 6f is a reconstructed denoising light field image obtained by processing the noisy light field image shown in FIG. 6a by the method of the present invention, here shown by taking a sub-aperture image at the center coordinates;
FIG. 6g is a reference noiseless light field image corresponding to the noisy light field image shown in FIG. 6a, here shown as a sub-aperture image at the center coordinates;
FIG. 6h is a polar plane image corresponding to the noisy light field image shown in FIG. 6 a;
FIG. 6i is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6 b;
FIG. 6j is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6 c;
FIG. 6k is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6 d;
FIG. 6l is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6 e;
FIG. 6m is a polar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6 f;
fig. 6n is a polar plane image corresponding to the reference noiseless light field image shown in fig. 6 g.
Detailed Description
The invention is described in further detail below with reference to the embodiments of the drawings.
The development of immersive media such as virtual reality allows users to view image/video content with depth perception, which greatly improves the visual quality of the viewing experience. However, the traditional 2D visual content acquisition mode, i.e. the traditional imaging method, can only capture the 2D intensity information of a scene; subsequent stereoscopic imaging improved on this but can only provide limited viewing-angle information to the user. The invention provides a light field image denoising method based on a convolutional neural network. Considering that the 4D structural information of a light field image can be represented by 2D spatial information and 2D angular information, with the 2D spatial information expressed in the sub-aperture images and the 2D angular information expressed in the micro-lens array image, the method first constructs an initial stacked spatial convolution block and an initial angular convolution block to extract the spatial and angular information of the light field image, and then designs a spatial-angular joint encoder group to improve the expressive power of the features; a spatial-angular feature fusion group is then introduced to make full use of the extracted spatial and angular features; finally, the target denoised light field image is reconstructed by the constructed decoder.
The invention provides a light field image denoising method based on a convolutional neural network, which is generally implemented as shown in a flow chart in fig. 1 and comprises the following steps:
Step one: select Num noisy light field images with spatial resolution W×H and angular resolution U×V, together with the corresponding Num reference noise-free light field images; reorganize each noisy light field image into U×V single-channel noisy sub-aperture images of width W and height H, and also reorganize each noisy light field image into one single-channel noisy micro-lens array image of width U×W and height V×H; then form a training set from the Num noisy light field images, the corresponding Num reference noise-free light field images, the corresponding Num×U×V noisy sub-aperture images, and the corresponding Num noisy micro-lens array images; here Num > 1, and Num = 70 in this example.
Step two: construct a convolutional neural network as a spatial-angular joint coding network. As shown in fig. 2, the spatial-angular joint coding network includes an initial stacked spatial convolution block for extracting spatial features of the light field image, an initial angular convolution block for extracting angular features of the light field image, a spatial-angular joint encoder group for jointly processing the spatial and angular features and improving their expressive power, a spatial-angular feature fusion group for fusing the spatial and angular features, and a decoder for reconstructing the target denoised light field image. The spatial-angular joint encoder group consists of a first, a second and a third spatial-angular joint encoder connected in sequence; the spatial-angular feature fusion group consists of a first, a second and a third spatial-angular feature fusion device; and the decoder consists of a first stacked spatial convolution block, a second stacked spatial convolution block and a third stacked spatial convolution block connected in sequence.
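To show how these components fit together, here is a hedged sketch of the overall forward pass; it reuses the `InitialSpatialBlock`, `InitialAngularBlock`, `SAJointEncoder`, `SAFeatureFusion` and `Decoder` module sketches given with the corresponding steps above, and the class name is hypothetical:

```python
import torch.nn as nn

class LightFieldDenoiseNet(nn.Module):
    """Spatial-angular joint coding network: initial blocks -> three joint
    encoders -> three fusion devices -> concatenation -> decoder."""
    def __init__(self, u: int, v: int):
        super().__init__()
        self.init_spatial = InitialSpatialBlock()
        self.init_angular = InitialAngularBlock(u, v)
        self.encoders = nn.ModuleList([SAJointEncoder(u, v) for _ in range(3)])
        self.fusions = nn.ModuleList([SAFeatureFusion() for _ in range(3)])
        self.decoder = Decoder()

    def forward(self, sais, mla):
        # sais: (B, U*V, H, W) noisy sub-aperture images
        # mla:  (B, 1, V*H, U*W) noisy micro-lens array image
        f_s, f_a = self.init_spatial(sais), self.init_angular(mla)   # F_S0, F_A0
        fused = []
        for enc, fus in zip(self.encoders, self.fusions):
            f_s, f_a = enc(f_s, f_a)            # F_S1..3, F_A1..3
            fused.append(fus(f_s, f_a))         # F_Fused1..3
        return self.decoder(*fused)             # (B, U*V, H, W) denoised SAIs
```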
For the initial stacked spatial convolution block: it consists of U×V first convolution layers; the input of each first convolution layer receives one single-channel noisy sub-aperture image of width W and height H, and its output produces 32 spatial feature maps of width W and height H. The set of all spatial feature maps output by the i-th first convolution layer is denoted F_{s0,i}, and the set of all spatial feature maps output by the initial stacked spatial convolution block is denoted F_{S0}, F_{S0} = {F_{s0,1}, …, F_{s0,U×V}}; where 1 ≤ i ≤ U×V, F_{s0,1} denotes the set of all spatial feature maps output by the 1st first convolution layer and F_{s0,U×V} denotes the set of all spatial feature maps output by the (U×V)-th first convolution layer. The convolution kernels of the U×V first convolution layers are of size 3×3 with stride 1, 1 input channel and 32 output channels; the activation function is the Leaky ReLU (Leaky Rectified Linear Unit); and the U×V first convolution layers share the same kernel weight parameters, which effectively reduces the number of parameters to be trained and avoids over-fitting.
For the initial angular convolution block: it consists of 1 second convolution layer; the input of the second convolution layer receives the single-channel noisy micro-lens array image of width U×W and height V×H, and its output produces 32 angular feature maps of width W and height H; the set of all angular feature maps output by the second convolution layer is denoted F_{A0}. The convolution kernel of the second convolution layer is of size U×V, the horizontal convolution stride is U, the vertical convolution stride is V, the number of input channels is 1, the number of output channels is 32, and the activation function is the Leaky ReLU.
For the spatial-angular joint encoder group: the first input of the first spatial-angular joint encoder receives all spatial feature maps in F_{S0} and its second input receives all angular feature maps in F_{A0}; its first output produces U×V groups of 32 spatial feature maps of width W and height H, whose set is denoted F_{S1}, F_{S1} = {F_{s1,1}, …, F_{s1,U×V}}, and its second output produces 32 angular feature maps of width W and height H, whose set is denoted F_{A1}. The first input of the second spatial-angular joint encoder receives all spatial feature maps in F_{S1} and its second input receives all angular feature maps in F_{A1}; its first output produces U×V groups of 32 spatial feature maps of width W and height H, whose set is denoted F_{S2}, F_{S2} = {F_{s2,1}, …, F_{s2,U×V}}, and its second output produces 32 angular feature maps of width W and height H, whose set is denoted F_{A2}. The first input of the third spatial-angular joint encoder receives all spatial feature maps in F_{S2} and its second input receives all angular feature maps in F_{A2}; its first output produces U×V groups of 32 spatial feature maps of width W and height H, whose set is denoted F_{S3}, F_{S3} = {F_{s3,1}, …, F_{s3,U×V}}, and its second output produces 32 angular feature maps of width W and height H, whose set is denoted F_{A3}. Here F_{s1,1}, F_{s2,1} and F_{s3,1} denote the sets formed by the 1st group of spatial feature maps output by the first outputs of the first, second and third spatial-angular joint encoders respectively, and F_{s1,U×V}, F_{s2,U×V} and F_{s3,U×V} denote the corresponding sets formed by the (U×V)-th groups.
For the set of spatial angle feature fusion devices, a first input of the first spatial angle feature fusion device receives F S1 The second input end of the first spatial angle feature fusion device receives F A1 The output end of the first spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the first spatial angle feature fusion device is marked as F Fused1 ,F Fused1 ={F fused1,1 ,…,F fused1,U×V -a }; the first input end of the second space angle feature fusion device receives F S2 All the space feature graphs in (a), the second input end of the second space angle feature fusion device receives F A2 The output end of the second spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the second spatial angle feature fusion device is marked as F Fused2 ,F Fused2 ={F fused2,1 ,…,F fused2,U×V -a }; the first input end of the third space angle feature fusion device receives F S3 The second input end of the third spatial angle feature fusion device receives F A3 The output end of the third spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the third spatial angle feature fusion device is marked as F Fused3 ,F Fused3 ={F fused3,1 ,…,F fused3,U×V -a }; wherein F is fused1,1 A set of 1 st group of fusion feature images output by the output end of the first space angle feature fusion device is represented, F fused1,U×V A set of the fusion feature graphs of the first group of the U x V groups output by the output end of the first space angle feature fusion device, F fused2,1 A set of 1 st group of fusion feature graphs output by the output end of the second space angle feature fusion device, F fused2,U×V Representing the set formed by the U X V group fusion characteristic diagrams output by the output end of the second space angle characteristic fusion device, F fused3,1 A set of 1 st group of fusion feature graphs output by the output end of the third space angle feature fusion device, F fused3,U×V And the set of the U multiplied by V group fusion characteristic diagrams output by the output end of the third space angle characteristic fusion device is shown.
For F Fused1 All of the fused feature maps F Fused2 All of the fused feature maps F Fused3 Performing cascading operation on all the fusion characteristic graphs in the database, and marking a set formed by the fusion characteristic graphs with the width W and the height H of 96U multiplied by V groups obtained after the cascading operation as F CatF ,F CatF ={F catF,1 ,…,F catF,U×V -a }; wherein F is catF,1 Representing a set of 1 st group of fusion feature graphs obtained after cascading operation, F catF,U×V Representing a set formed by a U multiplied by V group fusion characteristic diagram obtained after cascading operation; for the decoder, the first stack space convolution block consists of U×V third convolution layers, and the input end of each third convolution layer receives a corresponding group of fusion feature images obtained after cascade operation, namely a set F formed by the i-th group of fusion feature images obtained after cascade operation catF,i The output end of each third convolution layer outputs 64 decoding feature images with width W and height H, and the set formed by all decoding feature images output by the output end of the ith third convolution layer is marked as F d1,i The set of all decoding feature maps output by the first stack space convolution block is denoted as F D1 ,F D1 ={F d1,1 ,…,F d1,U×V -a }; the second stack space convolution block consists of U×V fourth convolution layers, and the input end of each fourth convolution layer receives F D1 A corresponding set of decoding profiles, i.e. the ith fourth volumeInput terminal of the lamination receives F D1 F in (F) d1,i The output end of each fourth convolution layer outputs 32 decoding characteristic diagrams with width W and height H, and the set formed by all decoding characteristic diagrams output by the output end of the ith fourth convolution layer is denoted as F d2,i The set of all decoding characteristic diagrams output by the second stack space convolution block is denoted as F D2 ,F D2 ={F d2,1 ,…,F d2,U×V -a }; the third stack space convolution block consists of U×V fifth convolution layers, and the input end of each fifth convolution layer receives F D2 A corresponding set of decoding profiles, i.e. 
the input end of the i-th fifth convolution layer receives all decoding feature maps in F_d2,i of F_D2; the output end of each fifth convolution layer outputs a single-channel reconstructed denoising sub-aperture image with width W and height H, and the U×V reconstructed denoising sub-aperture images output by the third stack space convolution block are recombined into a reconstructed denoising light field image; wherein F_d1,1 represents the set of all decoding feature maps output by the output end of the 1st third convolution layer, F_d1,U×V represents the set of all decoding feature maps output by the output end of the (U×V)-th third convolution layer, F_d2,1 represents the set of all decoding feature maps output by the output end of the 1st fourth convolution layer, and F_d2,U×V represents the set of all decoding feature maps output by the output end of the (U×V)-th fourth convolution layer. The convolution kernels of the U×V third convolution layers are of size 3×3, with convolution stride 1, 96 input channels, 64 output channels and "Leaky ReLU" as the activation function, and the U×V third convolution layers share the weight parameters of the convolution kernel; the convolution kernels of the U×V fourth convolution layers are of size 3×3, with convolution stride 1, 64 input channels, 32 output channels and "Leaky ReLU" as the activation function; the convolution kernels of the U×V fifth convolution layers are of size 1×1, with convolution stride 1, 32 input channels and 1 output channel, no activation function is adopted, and the U×V fifth convolution layers share the weight parameters of the convolution kernel.
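The decoder description above can be summarized by the following hedged TensorFlow sketch. Reusing a single Conv2D object across all U×V sub-aperture streams is one way to realize the stated weight sharing within each stacked spatial convolution block; the function names and structure are assumptions for illustration only.

```python
import tensorflow as tf

def stacked_spatial_conv(view_features, filters, kernel_size=3, activation=True):
    """Apply one shared 2D convolution to every sub-aperture stream.

    view_features: list of U*V tensors, each (batch, H, W, C_in).
    Reusing the same Conv2D object on every view makes the U*V
    convolution layers of the block share their kernel weights.
    """
    conv = tf.keras.layers.Conv2D(filters, kernel_size, strides=1, padding='same')
    outputs = []
    for f in view_features:
        y = conv(f)                            # shared weights across views
        if activation:
            y = tf.nn.leaky_relu(y, alpha=0.2)
        outputs.append(y)
    return outputs

def decoder(f_cat):
    """Decoder sketch: 96 -> 64 -> 32 -> 1 channels per view; the last
    block is a 1x1 convolution without an activation function."""
    d1 = stacked_spatial_conv(f_cat, 64)                            # F_D1
    d2 = stacked_spatial_conv(d1, 32)                               # F_D2
    out = stacked_spatial_conv(d2, 1, kernel_size=1, activation=False)
    return out   # U*V reconstructed denoised sub-aperture images
```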
Step three: taking each reference noiseless light field image in the training set as a label image; then inputting the noise sub-aperture images corresponding to all the noise light field images in the training set and the corresponding noise microlens array images into a spatial-angular joint coding network for training to obtain reconstructed denoising light field images corresponding to each noise light field image in the training set; and after training is finished, obtaining optimal weight parameters of all convolution kernels in the spatial angle joint coding network, and obtaining the trained spatial angle joint coding network model.
The trained spatial-angular joint coding network model is used for removing noise existing in the noise light field image, so that the visual quality of the light field image can be improved, and the performance of the visual task of the subsequent light field image can be improved.
Step four: randomly selecting one noise light field image as a test image; then recombining the test image into a plurality of single-channel noise sub-aperture images, and recombining the test image into a single-channel noise micro-lens array image; and inputting the multiple single-channel noise sub-aperture images corresponding to the test image and the corresponding single-channel noise microlens array image into a spatial angle joint coding network model, and testing to obtain a reconstructed denoising light field image corresponding to the test image.
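The recombination between a light field image and its sub-aperture or microlens-array representations, used in the training and testing steps above, is a conventional reordering of pixels. The following NumPy sketch shows one possible implementation; the array layout and function names are assumptions for illustration and are not taken from the original.

```python
import numpy as np

def sai_to_mla(lf):
    """Sub-aperture images -> microlens array image.

    lf: light field of shape (V, U, H, W); lf[v, u] is the single-channel
        sub-aperture image at angular position (u, v).
    Returns an image of shape (V*H, U*W) in which every V x U macro-pixel
    gathers all angular samples of one spatial position.
    """
    V, U, H, W = lf.shape
    return lf.transpose(2, 0, 3, 1).reshape(V * H, U * W)

def mla_to_sai(mla, U, V):
    """Inverse reorganization: microlens array image -> (V, U, H, W)."""
    H, W = mla.shape[0] // V, mla.shape[1] // U
    return mla.reshape(H, V, W, U).transpose(1, 3, 0, 2)
```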
In this embodiment, in step two, the first spatial angle joint encoder, the second spatial angle joint encoder and the third spatial angle joint encoder have the same structure, shown in fig. 3: each is composed of a fourth stacked spatial convolution block, a first angular convolution block and one 1×1 convolution layer, where the fourth stacked spatial convolution block consists of U×V sixth convolution layers and the first angular convolution block consists of 1 seventh convolution layer.
For the first spatial angle joint encoder, for F A0 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S0 In the U X V group space feature diagram, the cascading operation is carried out, and a set formed by 64 feature diagrams with width W and height H of the U X V group obtained after the cascading operation is marked as F Cat,SA0 ,F Cat,SA0 ={F cat,SA0,1 ,…,F cat,SA0,U×V The input end of each sixth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,SA0,i The output end of each sixth convolution layer outputs 32 space feature images with the width W and the height H, and the set formed by all the space feature images output by the output end of the ith sixth convolution layer is F s1,i The set of all the spatial feature patterns output by the fourth stack spatial convolution block is F S1 The method comprises the steps of carrying out a first treatment on the surface of the For F S0 The recombination is carried out on all the space feature images (the recombination operation is a conventional processing means of the light field image, the recombination operation only changes the arrangement sequence of each feature value in the feature images and does not change the size of the feature values), and the collection formed by 32 feature images with the width of U multiplied by W and the height of V multiplied by H obtained after the recombination is marked as F temp,S0 The input of the seventh convolution layer receives F temp,S0 The output end of the seventh convolution layer outputs 32 temporary angle characteristic diagrams with the width W and the height H, and the set formed by all the temporary angle characteristic diagrams output by the output end of the seventh convolution layer is denoted as F temp,A1 For F temp,A1 All temporary angle feature maps and F A0 Performing cascading operation on all angle characteristic graphs in the table, and marking a set formed by 64 characteristic graphs with width W and height H obtained after cascading operation as F Cat,A10 The input of the 1 x 1 convolutional layer receives F Cat,A10 The output end of the 1X 1 convolution layer outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the output end of the 1X 1 convolution layer is F A1 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,SA0,1 Representing the angular characteristic of U x V times S0 A set of 1 st group of feature maps obtained by cascade operation of the UXV group of space feature maps, F cat,SA0,U×V Representing the angular characteristic of U x V times S0 The U X V group space feature images in the model (C) are subjected to cascading operation to obtain a set formed by the U X V group feature images.
For the second spatial angle joint encoder, for F A1 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S1 In the U X V group space feature diagram, the cascading operation is carried out, and a set formed by 64 feature diagrams with width W and height H of the U X V group obtained after the cascading operation is marked as F Cat,SA1 ,F Cat,SA1 ={F cat,SA1,1 ,…,F cat,SA1,U×V The input end of each sixth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,SA1,i The output end of each sixth convolution layer outputs 32 space feature images with the width W and the height H, and the set formed by all the space feature images output by the output end of the ith sixth convolution layer is F s2,i The set of all the spatial feature patterns output by the fourth stack spatial convolution block is F S2 The method comprises the steps of carrying out a first treatment on the surface of the For F S1 The recombination is carried out on all the space feature images (the recombination operation is a conventional processing means of the light field image, the recombination operation only changes the arrangement sequence of each feature value in the feature images and does not change the size of the feature values), and the collection formed by 32 feature images with the width of U multiplied by W and the height of V multiplied by H obtained after the recombination is marked as F temp,S1 The input of the seventh convolution layer receives F temp,S1 The output end of the seventh convolution layer outputs 32 temporary angle characteristic diagrams with the width W and the height H, and the set formed by all the temporary angle characteristic diagrams output by the output end of the seventh convolution layer is denoted as F temp,A2 For F temp,A2 All temporary angle feature maps and F A1 Performing cascading operation on all angle characteristic graphs in the table, and marking a set formed by 64 characteristic graphs with width W and height H obtained after cascading operation as F Cat,A21 The input of the 1 x 1 convolutional layer receives F Cat,A21 The output end of the 1X 1 convolution layer outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the output end of the 1X 1 convolution layer is F A2 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,SA1,1 Representing the angular characteristic of U x V times S1 Cascade operation of U X V group space feature diagrams in the computer systemThe obtained set of 1 st group of characteristic diagrams F cat,SA1,U×V Representing the angular characteristic of U x V times S1 The U X V group space feature images in the model (C) are subjected to cascading operation to obtain a set formed by the U X V group feature images.
For the third spatial angle joint encoder, for F A2 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S2 In the U X V group space feature diagram, the cascading operation is carried out, and a set formed by 64 feature diagrams with width W and height H of the U X V group obtained after the cascading operation is marked as F Cat,SA2 ,F Cat,SA2 ={F cat,SA2,1 ,…,F cat,SA2,U×V The input end of each sixth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,SA2,i The output end of each sixth convolution layer outputs 32 space feature images with the width W and the height H, and the set formed by all the space feature images output by the output end of the ith sixth convolution layer is F s3,i The set of all the spatial feature patterns output by the fourth stack spatial convolution block is F S3 The method comprises the steps of carrying out a first treatment on the surface of the For F S2 The recombination is carried out on all the space feature images (the recombination operation is a conventional processing means of the light field image, the recombination operation only changes the arrangement sequence of each feature value in the feature images and does not change the size of the feature values), and the collection formed by 32 feature images with the width of U multiplied by W and the height of V multiplied by H obtained after the recombination is marked as F temp,S2 The input of the seventh convolution layer receives F temp,S2 The output end of the seventh convolution layer outputs 32 temporary angle characteristic diagrams with the width W and the height H, and the set formed by all the temporary angle characteristic diagrams output by the output end of the seventh convolution layer is denoted as F temp,A3 For F temp,A3 All temporary angle feature maps and F A2 Performing cascading operation on all angle characteristic graphs in the table, and marking a set formed by 64 characteristic graphs with width W and height H obtained after cascading operation as F Cat,A32 The input of the 1 x 1 convolutional layer receives F Cat,A32 All of (3)The characteristic diagram is that the output end of the 1X 1 convolution layer outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the output end of the 1X 1 convolution layer is F A3 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,SA2,1 Representing the angular characteristic of U x V times S2 A set of 1 st group of feature maps obtained by cascade operation of the UXV group of space feature maps, F cat,SA2,U×V Representing the angular characteristic of U x V times S2 The U X V group space feature images in the model (C) are subjected to cascading operation to obtain a set formed by the U X V group feature images.
The size of the convolution kernel of the u×v sixth convolution layers in each of the first, second, and third spatial angle joint encoders is 3×3, the convolution step size is 1, the number of input channels is 64, the number of output channels is 32, the activation function used is "leak ReLU", and the u×v sixth convolution layers share the weight parameter of the convolution kernel, the size of the convolution kernel of the seventh convolution layer in each of the first, second, and third spatial angle joint encoders is u×v, the horizontal convolution step size is U, the vertical convolution step size is V, the number of input channels is 32, the number of output channels is 32, the activation function used is "leak ReLU", the size of the convolution kernel of the 1×1 convolution layer in each of the first, second, and third spatial angle joint encoders is 1×1, the number of input channels is 64, the number of output channels is 32, and the activation function used is "leak ReLU".
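The structure of one spatial-angle joint encoder described above can be sketched in TensorFlow as follows. This is a minimal illustration under stated assumptions: the view list is assumed to be ordered v-major with static spatial dimensions, and the helper and variable names are not those of the original implementation.

```python
import tensorflow as tf

def views_to_mla(spatial_views, U, V):
    """Reorganize U*V view feature maps (batch, H, W, C) into the
    microlens-array layout (batch, V*H, U*W, C).  The list is assumed
    to be ordered v-major (index i = v*U + u); static H, W, C assumed."""
    H, W, C = spatial_views[0].shape[1:]
    x = tf.stack(spatial_views, axis=1)            # (batch, U*V, H, W, C)
    x = tf.reshape(x, (-1, V, U, H, W, C))
    x = tf.transpose(x, (0, 3, 1, 4, 2, 5))        # (batch, H, V, W, U, C)
    return tf.reshape(x, (-1, V * H, U * W, C))

def sa_joint_encoder(spatial_views, angular_feat, U, V):
    """One spatial-angle joint encoder (sketch).

    spatial_views: list of U*V tensors (batch, H, W, 32)  -- previous F_S
    angular_feat:  tensor (batch, H, W, 32)               -- previous F_A
    Returns the new spatial feature maps and the new angular feature map.
    """
    shared_3x3 = tf.keras.layers.Conv2D(32, 3, padding='same')      # sixth conv layers, shared weights
    ang_conv = tf.keras.layers.Conv2D(32, (V, U), strides=(V, U))   # seventh conv layer
    fuse_1x1 = tf.keras.layers.Conv2D(32, 1)                        # 1x1 conv layer

    # Spatial branch: copy the angular features to every view, concatenate
    # (32 + 32 = 64 channels) and apply the shared 3x3 convolution.
    new_spatial = [tf.nn.leaky_relu(
                       shared_3x3(tf.concat([s, angular_feat], axis=-1)), alpha=0.2)
                   for s in spatial_views]

    # Angular branch: reorganize the previous spatial features into the
    # microlens-array layout, apply the V x U angular convolution, then
    # fuse with the previous angular features through the 1x1 convolution.
    temp_ang = tf.nn.leaky_relu(ang_conv(views_to_mla(spatial_views, U, V)), alpha=0.2)
    new_angular = tf.nn.leaky_relu(
        fuse_1x1(tf.concat([temp_ang, angular_feat], axis=-1)), alpha=0.2)
    return new_spatial, new_angular
```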
In this embodiment, in step two, the first spatial angle feature fusion device, the second spatial angle feature fusion device and the third spatial angle feature fusion device have the same structure, shown in fig. 4: each is composed of a fifth stacked spatial convolution block, where the fifth stacked spatial convolution block consists of U×V eighth convolution layers.
For the first space angle feature fusion device, for F A1 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S1 In U X V group space feature diagram, to be cascade operatedThe set of the obtained UXV groups and 64 characteristic graphs with width W and height H after cascading operation is denoted as F Cat,f1 ,F Cat,f1 ={F cat,f1,1 ,…,F cat,f1,U×V The input end of each eighth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,f1,i The output end of each eighth convolution layer outputs 32 fused feature images with the width W and the height H, and the set formed by all the fused feature images output by the output end of the ith eighth convolution layer is F fused1,i The set of all the fusion feature graphs output by the fifth stack space convolution block is F Fused1 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,f1,1 Representing the angular characteristic of U x V times S1 A set of 1 st group of feature maps obtained by cascade operation of the UXV group of space feature maps, F cat,f1,U×V Representing the angular characteristic of U x V times S1 The U X V group space feature images in the model (C) are subjected to cascading operation to obtain a set formed by the U X V group feature images.
For the second spatial angle feature fusion cage, for F A2 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S2 In the U X V group space feature diagram, the cascading operation is carried out, and a set formed by 64 feature diagrams with width W and height H of the U X V group obtained after the cascading operation is marked as F Cat,f2 ,F Cat,f2 ={F cat,f2,1 ,…,F cat,f2,U×V The input end of each eighth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,f2,i The output end of each eighth convolution layer outputs 32 fused feature images with the width W and the height H, and the set formed by all the fused feature images output by the output end of the ith eighth convolution layer is F fused2,i The set of all the fusion feature graphs output by the fifth stack space convolution block is F Fused2 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,f2,1 Representation ofFor angle characteristic diagram of U multiplied by V and F S2 A set of 1 st group of feature maps obtained by cascade operation of the UXV group of space feature maps, F cat,f2,U×V Representing the angular characteristic of U x V times S2 The U X V group space feature images in the model (C) are subjected to cascading operation to obtain a set formed by the U X V group feature images.
For the third spatial angle feature fusion cage, for F A3 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S3 In the U X V group space feature diagram, the cascading operation is carried out, and a set formed by 64 feature diagrams with width W and height H of the U X V group obtained after the cascading operation is marked as F Cat,f3 ,F Cat,f3 ={F cat,f3,1 ,…,F cat,f3,U×V The input end of each eighth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,f3,i The output end of each eighth convolution layer outputs 32 fused feature images with the width W and the height H, and the set formed by all the fused feature images output by the output end of the ith eighth convolution layer is F fused3,i The set of all the fusion feature graphs output by the fifth stack space convolution block is F Fused3 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,f3,1 Representing the angular characteristic of U x V times S3 A set of 1 st group of feature maps obtained by cascade operation of the UXV group of space feature maps, F cat,f3,U×V Representing the angular characteristic of U x V times S3 The U X V group space feature images in the model (C) are subjected to cascading operation to obtain a set formed by the U X V group feature images.
The sizes of convolution kernels of the u×v eighth convolution layers in the first spatial angle feature fusion device, the second spatial angle feature fusion device, and the third spatial angle feature fusion device are 3×3, convolution steps are 1, input channel numbers are 64, output channel numbers are 32, an adopted activation function is "leak ReLU", and the u×v eighth convolution layers share weight parameters of the convolution kernels.
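A spatial-angle feature fusion device as described above can be sketched as follows; this reuses the shared-convolution idea from the earlier decoder sketch, and all names are illustrative assumptions rather than the original implementation.

```python
import tensorflow as tf

def sa_feature_fusion(spatial_views, angular_feat):
    """Spatial-angle feature fusion device (sketch).

    Copies the angular feature map to every sub-aperture stream,
    concatenates it with the spatial features (32 + 32 = 64 channels)
    and applies one shared 3x3 convolution (the eighth conv layers),
    producing 32 fused feature maps per view.
    """
    shared_3x3 = tf.keras.layers.Conv2D(32, 3, padding='same')
    fused = []
    for s in spatial_views:
        x = tf.concat([s, angular_feat], axis=-1)
        fused.append(tf.nn.leaky_relu(shared_3x3(x), alpha=0.2))
    return fused
```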
The method of the invention is implemented with the TensorFlow deep learning framework. The light field images used for training and testing come from the Lytro light field database provided by Stanford University, which contains noise-free light field images of different scenes; 100 noise-free light field images with an angular resolution of 14×14 are selected in total. Additive white Gaussian noise with noise intensities of 10, 20 and 50 is then added to each noise-free light field image respectively, generating 100 noise light field images with additive white Gaussian noise of intensity 10, 100 noise light field images of intensity 20 and 100 noise light field images of intensity 50.
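The noisy training data described above can be generated with a short NumPy sketch such as the following, assuming the noise intensity denotes the standard deviation of the Gaussian noise on the 0-255 intensity scale (an assumption; the text does not state the scale explicitly).

```python
import numpy as np

def add_awgn(clean_lf, sigma, seed=None):
    """Add additive white Gaussian noise of standard deviation `sigma`
    (assumed on the 0-255 scale) to a clean light field image."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=clean_lf.shape)
    return np.clip(clean_lf.astype(np.float64) + noise, 0, 255)

# Example: the three noise levels used in the experiments.
# noisy_10 = add_awgn(clean_lf, 10)
# noisy_20 = add_awgn(clean_lf, 20)
# noisy_50 = add_awgn(clean_lf, 50)
```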
Considering the vignetting effect of the Lytro light field camera, the sub-aperture images at the boundary angular positions are black and of extremely low visual quality, so each noise light field image is angularly cropped: the angular resolution of the cropped noise light field image is 8×8, i.e., the central 8×8 sub-aperture images are kept and recombined. After the angular cropping, 70% of the noise light field images with additive white Gaussian noise of intensity 10, together with the corresponding noise-free light field images, noise sub-aperture images and noise microlens array images, 70% of the noise light field images of intensity 20 together with their corresponding images, and 70% of the noise light field images of intensity 50 together with their corresponding images are randomly selected to form an image set. Each image in the image set is cropped into overlapping image blocks of size 64×64 with a stride of 32; the spatial resolution of a label image is therefore 64×64 and its angular resolution is 8×8, the size of a noise sub-aperture image used for training is 64×64, and the size of a noise microlens array image used for training is 512×512 (i.e., the sub-aperture image block size multiplied by the angular resolution). These image blocks constitute the training set.
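The patch extraction with a 64×64 block size and stride 32 can be sketched as below; the same crop grid would presumably be applied to every sub-aperture view and to the label image so their spatial positions stay aligned (an assumption consistent with the stated patch sizes). The function name is illustrative.

```python
import numpy as np

def crop_patches(image, patch=64, stride=32):
    """Crop overlapping patch x patch blocks with the given stride."""
    H, W = image.shape[:2]
    patches = []
    for top in range(0, H - patch + 1, stride):
        for left in range(0, W - patch + 1, stride):
            patches.append(image[top:top + patch, left:left + patch])
    return patches
```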
After the angle clipping, the test set is formed by the rest 30% of noise light field images with the additive Gaussian white noise with the noise intensity of 10, the corresponding noiseless light field images, the corresponding noise sub-aperture images and the corresponding noise micro-lens array images, the rest 30% of noise light field images with the additive Gaussian white noise with the noise intensity of 20, the corresponding noiseless light field images, the corresponding noise sub-aperture images and the corresponding noise micro-lens array images, and the rest 30% of noise light field images with the additive Gaussian white noise with the noise intensity of 50, the corresponding noiseless light field images, the corresponding noise sub-aperture images and the corresponding noise micro-lens array images.
When training the spatial angle joint coding network, the learning rate is set to 10⁻⁴. All convolution kernel parameters in the spatial angle joint coding network are initialized with the Xavier initializer. The network is trained with an L1-norm loss and a structural similarity loss, using the ADAM optimizer.
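A hedged TensorFlow sketch of this training configuration is given below. The relative weighting between the L1 and structural-similarity terms is an assumption, since the text only states that both losses are used.

```python
import tensorflow as tf

def denoising_loss(pred, label, ssim_weight=1.0, max_val=1.0):
    """L1-norm loss plus a structural-similarity loss (sketch).

    pred, label: (batch, H, W, C) tensors in [0, max_val].
    The weight of the SSIM term is an assumption, not given in the text.
    """
    l1 = tf.reduce_mean(tf.abs(pred - label))
    ssim = tf.reduce_mean(tf.image.ssim(pred, label, max_val=max_val))
    return l1 + ssim_weight * (1.0 - ssim)

initializer = tf.keras.initializers.GlorotUniform()          # "Xavier" initializer
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)     # ADAM, lr = 10^-4
```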
To verify the effectiveness and generality of the method of the invention, testing is performed on the noise light field images in the test set, which are different from the training samples. Basic information of the noise light field images used for training is given in Table 1, and basic information of the noise light field images used for testing is given in Table 2.
Table 1 training scenario information
Scene category Angular resolution Spatial resolution Number of scenes Scene name
Bikes 8×8 375×540 10 Bikes 10-19
Buildings 8×8 375×540 10 Buildings 10-19
Cars 8×8 375×540 10 Cars 10-19
Flowers plants 8×8 375×540 10 Flowers plants 10-19
Fruits vegetables 8×8 375×540 10 Fruits vegetables 10-19
People 8×8 375×540 10 People 1-2,People 10-17
General 8×8 375×540 10 General 10-19
Table 2 test scenario information
To illustrate the performance of the method of the present invention, it is compared with four existing light field image denoising methods, which fall into three categories: first, an existing 2D image denoising method (applied to the sub-aperture images); second, an existing video denoising method (applied to the sub-aperture images); third, existing denoising methods specially designed for light field images, namely a light field image denoising method based on a linear 4D frequency-domain hyperfan filter and a light field image denoising method based on learning and anisotropic analysis proposed by Chen et al.
Here, the objective quality evaluation indices used are PSNR and SSIM. PSNR evaluates the objective quality of the denoised image in terms of pixel-wise differences; the higher its value, the better the image quality. SSIM evaluates the objective quality of the denoised image from the viewpoint of visual perception; its value lies between 0 and 1, and the higher the value, the better the image quality.
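For reference, both indices can be computed with standard TensorFlow image utilities, as in the brief sketch below; the function name is illustrative.

```python
import tensorflow as tf

def evaluate(denoised, reference, max_val=1.0):
    """Per-image PSNR (dB) and SSIM between denoised sub-aperture images
    and their noise-free references, both of shape (batch, H, W, C)."""
    psnr = tf.image.psnr(denoised, reference, max_val=max_val)
    ssim = tf.image.ssim(denoised, reference, max_val=max_val)
    return psnr, ssim
```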
Table 3 compares the method of the invention with the four existing light field image denoising methods on the PSNR (dB) index, and Table 4 compares them on the SSIM index. The values in Tables 3 and 4 are obtained by averaging the quality scores of the reconstructed denoising light field images corresponding to all tested noise light field images. As can be seen from the data listed in Tables 3 and 4, compared with all the existing light field image denoising methods, the method of the present invention obtains higher quality scores on both the PSNR and SSIM objective indices, and achieves the best denoising effect for all three noise levels.
TABLE 3 comparison of PSNR (dB) index with the four existing light field image denoising methods
Table 4 shows the comparison of the method of the present invention with the existing four light field image denoising methods on SSIM index
FIG. 5a shows the test noisy light field image corresponding to the buildings_24 scene, shown here as the sub-aperture image at the central coordinates; FIG. 5b shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 5a with BM3D, shown here as the sub-aperture image at the central coordinates; FIG. 5c shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 5a with VBM4D, shown here as the sub-aperture image at the central coordinates; FIG. 5d shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 5a with the method of Dansereau et al., shown here as the sub-aperture image at the central coordinates; FIG. 5e shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 5a with the method of Chen et al., shown here as the sub-aperture image at the central coordinates; FIG. 5f shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 5a with the method of the present invention, shown here as the sub-aperture image at the central coordinates; FIG. 5g shows the reference noise-free light field image corresponding to the noisy light field image of FIG. 5a, shown here as the sub-aperture image at the central coordinates. FIG. 5h is the epipolar plane image corresponding to the noisy light field image shown in FIG. 5a; FIG. 5i is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5b; FIG. 5j is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5c; FIG. 5k is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5d; FIG. 5l is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5e; FIG. 5m is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 5f; FIG. 5n is the epipolar plane image corresponding to the reference noise-free light field image shown in FIG. 5g. As can be seen from the figures, the denoising light field image reconstructed by the method of the present invention has clear texture information and the noise has been removed, approaching the reference noise-free light field image in visual quality, whereas the BM3D and VBM4D methods severely blur the texture of the image, as shown in the lower-right regions of FIG. 5b and FIG. 5c; the method of Dansereau et al. cannot completely remove the noise, as shown in the lower-left region of FIG. 5d; and the method of Chen et al. may generate visual artifacts at object boundaries, as shown in the lower-left region of FIG. 5e. In addition, the epipolar plane image obtained by the method of the present invention contains continuous clear straight lines, which also shows that the reconstructed denoising light field image has good structural consistency.
FIG. 6a shows the test noisy light field image corresponding to the People_5 scene, shown here as the sub-aperture image at the central coordinates; FIG. 6b shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 6a with BM3D, shown here as the sub-aperture image at the central coordinates; FIG. 6c shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 6a with VBM4D, shown here as the sub-aperture image at the central coordinates; FIG. 6d shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 6a with the method of Dansereau et al., shown here as the sub-aperture image at the central coordinates; FIG. 6e shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 6a with the method of Chen et al., shown here as the sub-aperture image at the central coordinates; FIG. 6f shows the reconstructed denoising light field image obtained by processing the noisy light field image of FIG. 6a with the method of the present invention, shown here as the sub-aperture image at the central coordinates; FIG. 6g shows the reference noise-free light field image corresponding to the noisy light field image of FIG. 6a, shown here as the sub-aperture image at the central coordinates. FIG. 6h is the epipolar plane image corresponding to the noisy light field image shown in FIG. 6a; FIG. 6i is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6b; FIG. 6j is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6c; FIG. 6k is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6d; FIG. 6l is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6e; FIG. 6m is the epipolar plane image corresponding to the reconstructed denoising light field image shown in FIG. 6f; FIG. 6n is the epipolar plane image corresponding to the reference noise-free light field image shown in FIG. 6g. As can be seen from the figures, compared with the existing light field image denoising methods, the method of the present invention removes the noise in the noisy light field image more effectively and restores more texture details, as shown in FIG. 6f; in particular, the denoising light field image reconstructed by the method of the present invention has better structural consistency, which can be seen from FIG. 6m, whose epipolar plane image contains continuous clear straight lines.
The innovation of the method of the invention is mainly as follows: firstly, considering that a 4D light field image can be characterized by a sub-aperture image and a micro-lens array image respectively, wherein the sub-aperture image explicitly reflects the spatial information of the light field image, and the micro-lens array image explicitly reflects the angular information of the light field image, a specific stack spatial convolution block and an angular convolution block are constructed to effectively extract the spatial characteristics and the angular characteristics of the light field image; secondly, an effective spatial-angular joint encoder set is constructed to explore the information compensation relation between the spatial characteristics and the angular characteristics of the light field image, and the expression capacity of the characteristics is improved; thirdly, the extracted features are fully utilized through the constructed space angle feature fusion device group, the target denoising image is reconstructed through the constructed decoder, and particularly, the reconstructed denoising light field image can recover better texture details based on multi-layer space and angle feature fusion; in addition, the method of the invention simultaneously reconstructs all sub-aperture images of the light field image, thereby being capable of well preserving the structural consistency of the denoising light field image.

Claims (3)

1. The light field image denoising method based on the convolutional neural network is characterized by comprising the following steps of:
Step one: selecting Num noise light field images with spatial resolution of W multiplied by H and angular resolution of U multiplied by V and corresponding Num reference noise-free light field images; then recombining each noise light field image into a noise sub-aperture image with the width W and the height H of a U X V single channel, and recombining each noise light field image into a noise micro-lens array image with the width U X W and the height V X H of a single channel; then, forming a training set by using the Num noise light field images, the corresponding Num reference noise-free light field images, the corresponding Num multiplied by U multiplied by V noise sub-aperture images and the corresponding Num noise micro lens array images; wherein Num > 1;
step two: constructing a convolutional neural network as a spatial angle joint coding network: the spatial-angular joint coding network comprises an initial stack spatial convolution block for extracting spatial features of a light field image, an initial angular convolution block for extracting the angular features of the light field image, a spatial-angular joint coder group for jointly processing the spatial features and the angular features of the light field image and improving the feature expression capability, a spatial-angular feature fusion device group for fusing the spatial features and the angular features of the light field image, and a decoder for reconstructing a target denoising light field image, wherein the spatial-angular joint coder group comprises a first spatial-angular joint coder, a second spatial-angular joint coder and a third spatial-angular joint coder which are sequentially connected, and the spatial-angular feature fusion device group comprises a first spatial-angular feature fusion device, a second spatial-angular feature fusion device and a third spatial-angular feature fusion device;
For an initial stack space convolution block, the initial stack space convolution block consists of U multiplied by V first convolution layers, the input end of each first convolution layer receives a noise sub-aperture image with the width W and the height H of a single channel, the output end of each first convolution layer outputs 32 space feature images with the width W and the height H, and the set formed by all the space feature images output by the output end of the ith first convolution layer is denoted as F s0,i The set of all the spatial feature maps output by the initial stack spatial convolution block is denoted as F S0 ,F S0 ={F s0,1 ,…,F s0,U×V -a }; wherein i is more than or equal to 1 and less than or equal to U multiplied by V, F s0,1 Representing the set of all spatial feature patterns output by the output end of the 1 st first convolution layer, F s0,U×V Representing a set formed by all spatial feature graphs output by the output ends of the U×V first convolution layers, wherein the sizes of convolution kernels of the U×V first convolution layers are 3×3, convolution step sizes are 1, the numbers of input channels are 1, the numbers of output channels are 32, and the adopted activation functions are 'leakage ReLU', and the U×V first convolution layers share weight parameters of the convolution kernels;
for an initial angle convolution block, the initial angle convolution block consists of 1 second convolution layer, the input end of the second convolution layer receives a single-channel noise micro lens array image with the width of U multiplied by W and the height of V multiplied by H, the output end of the second convolution layer outputs 32 angle characteristic images with the width of W and the height of H, and the set formed by all the angle characteristic images output by the output end of the second convolution layer is denoted as F A0 The method comprises the steps of carrying out a first treatment on the surface of the The size of the convolution kernel of the second convolution layer is U multiplied by V, the horizontal convolution step length is U, the vertical convolution step length is V, the number of input channels is 1, the number of output channels is 32, and the adopted activation function is 'leakage ReLU';
for a set of spatial-angle joint encoders, a first input of a first spatial-angle joint encoder receives F S0 The second input of the first spatial angle joint encoder receives F A0 Is output by a first output end of a first spatial angle joint encoderU x V sets of 32 space feature images with width W and height H, and the set of all the space feature images outputted from the first output end is denoted as F S1 ,F S1 ={F s1,1 ,…,F s1,U×V The second output end of the first spatial angle joint encoder outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the second output end is denoted as F A1 The method comprises the steps of carrying out a first treatment on the surface of the The first input end of the second spatial angle joint encoder receives F S1 The second input end of the second spatial angle joint encoder receives F A1 The first output end of the second spatial angle joint encoder outputs U X V groups of 32 spatial feature images with width W and height H, and the set formed by all the spatial feature images output by the first output end is marked as F S2 ,F S2 ={F s2,1 ,…,F s2,U×V The second output end of the second spatial angle joint encoder outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the second output end is denoted as F A2 The method comprises the steps of carrying out a first treatment on the surface of the The first input end of the third spatial angle joint encoder receives F S2 The second input end of the third spatial angle joint encoder receives F A2 The first output end of the third spatial angle joint encoder outputs U X V groups of 32 spatial feature images with width W and height H, and the set formed by all the spatial feature images output by the first output end is marked as F S3 ,F S3 ={F s3,1 ,…,F s3,U×V The second output end of the third spatial angle joint encoder outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the second output end is denoted as F A3 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is s1,1 Representing a set of 1 st set of spatial feature maps output by a first output of a first spatial angle joint encoder, F s1,U×V Representing a set of U x V group spatial feature maps output by a first output of a first spatial angle joint encoder, F s2,1 Representing a second spatial angle joint weaveA set of 1 st group of spatial feature patterns output by the first output end of the encoder, F s2,U×V Representing a set of U x V group spatial feature maps output by a first output of a second spatial angle joint encoder, F s3,1 Representing a set of 1 st set of spatial feature maps output by the first output of the third spatial angle joint encoder, F s3,U×V Representing a set of the U x V group spatial feature maps output by the first output end of the third spatial angle joint encoder;
for the set of spatial angle feature fusion devices, a first input of the first spatial angle feature fusion device receives F S1 The second input end of the first spatial angle feature fusion device receives F A1 The output end of the first spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the first spatial angle feature fusion device is marked as F Fused1 ,F Fused1 ={F fused1,1 ,…,F fused1,U×V -a }; the first input end of the second space angle feature fusion device receives F S2 All the space feature graphs in (a), the second input end of the second space angle feature fusion device receives F A2 The output end of the second spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the second spatial angle feature fusion device is marked as F Fused2 ,F Fused2 ={F fused2,1 ,…,F fused2,U×V -a }; the first input end of the third space angle feature fusion device receives F S3 The second input end of the third spatial angle feature fusion device receives F A3 The output end of the third spatial angle feature fusion device outputs U multiplied by V groups of 32 fusion feature images with the width W and the height H, and the collection formed by all the fusion feature images output by the third spatial angle feature fusion device is marked as F Fused3 ,F Fused3 ={F fused3,1 ,…,F fused3,U×V -a }; wherein F is fused1,1 A set of 1 st group of fusion feature images output by the output end of the first space angle feature fusion device is represented, F fused1,U×V A set of the fusion feature graphs of the first group of the U x V groups output by the output end of the first space angle feature fusion device, F fused2,1 A set of 1 st group of fusion feature graphs output by the output end of the second space angle feature fusion device, F fused2,U×V Representing the set formed by the U X V group fusion characteristic diagrams output by the output end of the second space angle characteristic fusion device, F fused3,1 A set of 1 st group of fusion feature graphs output by the output end of the third space angle feature fusion device, F fused3,U×V Representing a set formed by a U multiplied by V group fusion characteristic diagram output by an output end of the third space angle characteristic fusion device;
for F Fused1 All of the fused feature maps F Fused2 All of the fused feature maps F Fused3 Performing cascading operation on all the fusion characteristic graphs in the database, and marking a set formed by the fusion characteristic graphs with the width W and the height H of 96U multiplied by V groups obtained after the cascading operation as F CatF ,F CatF ={F catF,1 ,…,F catF,U×V -a }; wherein F is catF,1 Representing a set of 1 st group of fusion feature graphs obtained after cascading operation, F catF,U×V Representing a set formed by a U multiplied by V group fusion characteristic diagram obtained after cascading operation; for the decoder, the first stack space convolution block consists of U×V third convolution layers, and the input end of each third convolution layer receives a corresponding group of fusion feature images obtained after cascade operation, namely a set F formed by the i-th group of fusion feature images obtained after cascade operation catF,i The output end of each third convolution layer outputs 64 decoding feature images with width W and height H, and the set formed by all decoding feature images output by the output end of the ith third convolution layer is marked as F d1,i The set of all decoding feature maps output by the first stack space convolution block is denoted as F D1 ,F D1 ={F d1,1 ,…,F d1,U×V -a }; the second stack space convolution block consists of U×V fourth convolution layersThe input of each fourth convolution layer receives F D1 A corresponding set of decoding profiles, i.e. the input of the ith fourth convolutional layer receives F D1 F in (F) d1,i The output end of each fourth convolution layer outputs 32 decoding characteristic diagrams with width W and height H, and the set formed by all decoding characteristic diagrams output by the output end of the ith fourth convolution layer is denoted as F d2,i The set of all decoding characteristic diagrams output by the second stack space convolution block is denoted as F D2 ,F D2 ={F d2,1 ,…,F d2,U×V -a }; the third stack space convolution block consists of U×V fifth convolution layers, and the input end of each fifth convolution layer receives F D2 A corresponding set of decoding profiles, i.e. 
the input of the ith fifth convolutional layer receives F D2 F in (F) d2,i The output end of each fifth convolution layer outputs a single-channel reconstructed denoising sub-aperture image with the width W and the height H, and U multiplied by V reconstructed denoising sub-aperture images output by the third stack space convolution block are recombined into a reconstructed denoising light field image; wherein F is d1,1 Representing the set of all decoded feature maps output by the output end of the 1 st third convolution layer, F d1,U×V Representing the set of all decoded feature maps output by the output terminal of the third convolutional layer of the U×V number, F d2,1 Representing the set of all decoded feature maps output by the output end of the 1 st fourth convolution layer, F d2,U×V The size of the convolution kernel of the U x V third convolution layers is 3 x 3, the convolution step length is 1, the number of input channels is 96, the number of output channels is 64, the adopted activation function is 'leakage ReLU', the U x V third convolution layers share the weight parameters of the convolution kernel, the size of the convolution kernel of the U x V fourth convolution layers is 3 x 3, the convolution step length is 1, the number of input channels is 64, the number of output channels is 32, the adopted activation function is 'leakage ReLU', the size of the convolution kernel of the U x V fourth convolution layers is 1 x 1, the step length of the convolution kernel of the U x V fifth convolution layers is 1, the number of input channels is 32, and the output is onThe number of the channels is 1, an activation function is not adopted, and the U multiplied by V fifth convolution layers share the weight parameters of the convolution kernel;
step three: taking each reference noiseless light field image in the training set as a label image; then inputting the noise sub-aperture images corresponding to all the noise light field images in the training set and the corresponding noise microlens array images into a spatial-angular joint coding network for training to obtain reconstructed denoising light field images corresponding to each noise light field image in the training set; obtaining optimal weight parameters of each convolution kernel in the spatial angle joint coding network after training is finished, namely obtaining a trained spatial angle joint coding network model;
Step four: randomly selecting one noise light field image as a test image; then recombining the test image into a plurality of single-channel noise sub-aperture images, and recombining the test image into a single-channel noise micro-lens array image; and inputting the multiple single-channel noise sub-aperture images corresponding to the test image and the corresponding single-channel noise microlens array image into a spatial angle joint coding network model, and testing to obtain a reconstructed denoising light field image corresponding to the test image.
2. The method for denoising the light field image based on the convolutional neural network according to claim 1, wherein in the second step, the first spatial angle joint encoder, the second spatial angle joint encoder and the third spatial angle joint encoder have the same structure and are composed of a fourth stack spatial convolution block, a first angular convolution block and 1 x 1 convolution layers, the fourth stack spatial convolution block is composed of a plurality of U x V sixth convolution layers, and the first angular convolution block is composed of 1 seventh convolution layer;
for the first spatial angle joint encoder, for F A0 The angle characteristic diagram of the three-dimensional image is copied by U multiplied by V, and the angle characteristic diagram of the U multiplied by V is compared with F S0 In the U X V group space feature diagram, the cascading operation is carried out, and a set formed by 64 feature diagrams with width W and height H of the U X V group obtained after the cascading operation is marked as F Cat,SA0 ,F Cat,SA0 ={F cat,SA0,1 ,…,F cat,SA0,U×V The input end of each sixth convolution layer receives a corresponding group of feature images obtained after cascade operation, namely a set F formed by the i th group of feature images obtained after cascade operation cat,SA0,i The output end of each sixth convolution layer outputs 32 space feature images with the width W and the height H, and the set formed by all the space feature images output by the output end of the ith sixth convolution layer is F s1,i The set of all the spatial feature patterns output by the fourth stack spatial convolution block is F S1 The method comprises the steps of carrying out a first treatment on the surface of the For F S0 All the space feature images in the model (1) are recombined, and a set formed by 32 feature images with width of U multiplied by W and height of V multiplied by H obtained through recombination is denoted as F temp,S0 The input of the seventh convolution layer receives F temp,S0 The output end of the seventh convolution layer outputs 32 temporary angle characteristic diagrams with the width W and the height H, and the set formed by all the temporary angle characteristic diagrams output by the output end of the seventh convolution layer is denoted as F temp,A1 For F temp,A1 All temporary angle feature maps and F A0 Performing cascading operation on all angle characteristic graphs in the table, and marking a set formed by 64 characteristic graphs with width W and height H obtained after cascading operation as F Cat,A10 The input of the 1 x 1 convolutional layer receives F Cat,A10 The output end of the 1X 1 convolution layer outputs 32 angle characteristic diagrams with the width W and the height H, and the set formed by all the angle characteristic diagrams output by the output end of the 1X 1 convolution layer is F A1 The method comprises the steps of carrying out a first treatment on the surface of the Wherein F is cat,SA0,1 Representing the angular characteristic of U x V times S0 A set of 1 st group of feature maps obtained by cascade operation of the UXV group of space feature maps, F cat,SA0,U×V Representing the angular characteristic of U x V times S0 A set formed by the U X V group space feature images obtained after cascade operation;
for the second spatial angle joint encoder, the angular feature maps in F_A1 are copied U×V times and cascaded with the U×V groups of spatial feature maps in F_S1; the set formed by the U×V groups of 64 feature maps, each with width W and height H, obtained after the cascade operation is denoted as F_Cat,SA1, F_Cat,SA1 = {F_cat,SA1,1, …, F_cat,SA1,U×V}; the input end of each sixth convolution layer receives a corresponding group of feature maps obtained after the cascade operation, namely the input end of the i-th sixth convolution layer receives the set F_cat,SA1,i formed by the i-th group of feature maps obtained after the cascade operation; the output end of each sixth convolution layer outputs 32 spatial feature maps with width W and height H, the set formed by all spatial feature maps output by the output end of the i-th sixth convolution layer is denoted as F_s2,i, and the set formed by all spatial feature maps output by the fourth stack spatial convolution block is denoted as F_S2; all spatial feature maps in F_S1 are recombined, and the set formed by the 32 feature maps with width U×W and height V×H obtained after recombination is denoted as F_temp,S1; the input end of the seventh convolution layer receives F_temp,S1, the output end of the seventh convolution layer outputs 32 temporary angular feature maps with width W and height H, and the set formed by all temporary angular feature maps output by the output end of the seventh convolution layer is denoted as F_temp,A2; all temporary angular feature maps in F_temp,A2 are cascaded with all angular feature maps in F_A1, and the set formed by the 64 feature maps with width W and height H obtained after the cascade operation is denoted as F_Cat,A21; the input end of the 1×1 convolution layer receives F_Cat,A21, the output end of the 1×1 convolution layer outputs 32 angular feature maps with width W and height H, and the set formed by all angular feature maps output by the output end of the 1×1 convolution layer is denoted as F_A2; wherein F_cat,SA1,1 denotes the set formed by the 1st group of feature maps obtained after the angular feature maps copied U×V times are cascaded with the U×V groups of spatial feature maps in F_S1, and F_cat,SA1,U×V denotes the set formed by the U×V-th group of feature maps obtained after that cascade operation;
for the third spatial angle joint encoder, the angular feature maps in F_A2 are copied U×V times and cascaded with the U×V groups of spatial feature maps in F_S2; the set formed by the U×V groups of 64 feature maps, each with width W and height H, obtained after the cascade operation is denoted as F_Cat,SA2, F_Cat,SA2 = {F_cat,SA2,1, …, F_cat,SA2,U×V}; the input end of each sixth convolution layer receives a corresponding group of feature maps obtained after the cascade operation, namely the input end of the i-th sixth convolution layer receives the set F_cat,SA2,i formed by the i-th group of feature maps obtained after the cascade operation; the output end of each sixth convolution layer outputs 32 spatial feature maps with width W and height H, the set formed by all spatial feature maps output by the output end of the i-th sixth convolution layer is denoted as F_s3,i, and the set formed by all spatial feature maps output by the fourth stack spatial convolution block is denoted as F_S3; all spatial feature maps in F_S2 are recombined, and the set formed by the 32 feature maps with width U×W and height V×H obtained after recombination is denoted as F_temp,S2; the input end of the seventh convolution layer receives F_temp,S2, the output end of the seventh convolution layer outputs 32 temporary angular feature maps with width W and height H, and the set formed by all temporary angular feature maps output by the output end of the seventh convolution layer is denoted as F_temp,A3; all temporary angular feature maps in F_temp,A3 are cascaded with all angular feature maps in F_A2, and the set formed by the 64 feature maps with width W and height H obtained after the cascade operation is denoted as F_Cat,A32; the input end of the 1×1 convolution layer receives F_Cat,A32, the output end of the 1×1 convolution layer outputs 32 angular feature maps with width W and height H, and the set formed by all angular feature maps output by the output end of the 1×1 convolution layer is denoted as F_A3; wherein F_cat,SA2,1 denotes the set formed by the 1st group of feature maps obtained after the angular feature maps copied U×V times are cascaded with the U×V groups of spatial feature maps in F_S2, and F_cat,SA2,U×V denotes the set formed by the U×V-th group of feature maps obtained after that cascade operation;
The convolution kernels of the U×V sixth convolution layers in each of the first, second and third spatial angle joint encoders have size 3×3, convolution step 1, 64 input channels and 32 output channels, the activation function used is "Leaky ReLU", and the U×V sixth convolution layers share the weight parameters of the convolution kernel; the convolution kernel of the seventh convolution layer in each of the first, second and third spatial angle joint encoders has size U×V, horizontal convolution step U, vertical convolution step V, 32 input channels and 32 output channels, and the activation function used is "Leaky ReLU"; the convolution kernel of the 1×1 convolution layer in each of the first, second and third spatial angle joint encoders has size 1×1, 64 input channels and 32 output channels, and the activation function used is "Leaky ReLU".
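For illustration only, the following sketch shows how one spatial angle joint encoder described in claim 2 could be realized in PyTorch; the tensor layout ((B, U×V, 32, H, W) for spatial features, (B, 32, H, W) for angular features), the module names, and the batching trick used to share the sixth-convolution weights across the U×V views are assumptions, not the patent's reference code.

```python
# A minimal sketch (assumed layout and names) of one spatial angle joint encoder:
# U*V weight-shared 3x3 spatial convolutions on concatenated spatial+angular
# features, a kernel-(V,U)/stride-(V,U) angular convolution on the macro-pixel
# rearrangement of the spatial features, and a 1x1 convolution that fuses the
# temporary angular features with the incoming angular features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAngularJointEncoder(nn.Module):
    def __init__(self, U: int, V: int, channels: int = 32):
        super().__init__()
        self.U, self.V = U, V
        # "sixth" convolution layer: 3x3, stride 1, 64 -> 32, shared by all U*V views
        self.spatial_conv = nn.Conv2d(2 * channels, channels, 3, stride=1, padding=1)
        # "seventh" convolution layer: one V x U macro-pixel per output position
        self.angular_conv = nn.Conv2d(channels, channels, kernel_size=(V, U), stride=(V, U))
        # 1x1 convolution layer: 64 -> 32
        self.fuse_1x1 = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_s: torch.Tensor, f_a: torch.Tensor):
        # f_s: (B, U*V, C, H, W) spatial features, views ordered u-major;
        # f_a: (B, C, H, W) angular features
        B, N, C, H, W = f_s.shape
        U, V = self.U, self.V
        # Spatial branch: copy angular features to every view, cascade, shared conv.
        f_a_rep = f_a.unsqueeze(1).expand(B, N, C, H, W)
        x = torch.cat([f_s, f_a_rep], dim=2).reshape(B * N, 2 * C, H, W)
        f_s_out = F.leaky_relu(self.spatial_conv(x)).reshape(B, N, C, H, W)
        # Angular branch: recombine incoming spatial features into a
        # (V*H) x (U*W) macro-pixel image, then one stride-(V, U) convolution.
        ml = f_s.reshape(B, U, V, C, H, W).permute(0, 3, 4, 2, 5, 1).reshape(B, C, V * H, U * W)
        f_temp_a = F.leaky_relu(self.angular_conv(ml))  # (B, C, H, W)
        f_a_out = F.leaky_relu(self.fuse_1x1(torch.cat([f_temp_a, f_a], dim=1)))
        return f_s_out, f_a_out
```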
3. The light field image denoising method based on the convolutional neural network according to claim 1 or 2, wherein in the second step, the first spatial angle feature fusion device, the second spatial angle feature fusion device and the third spatial angle feature fusion device have the same structure, each being composed of a fifth stack spatial convolution block, and the fifth stack spatial convolution block is composed of U×V eighth convolution layers;
for the first spatial angle feature fusion device, the angular feature maps in F_A1 are copied U×V times and cascaded with the U×V groups of spatial feature maps in F_S1; the set formed by the U×V groups of 64 feature maps, each with width W and height H, obtained after the cascade operation is denoted as F_Cat,f1, F_Cat,f1 = {F_cat,f1,1, …, F_cat,f1,U×V}; the input end of each eighth convolution layer receives a corresponding group of feature maps obtained after the cascade operation, namely the input end of the i-th eighth convolution layer receives the set F_cat,f1,i formed by the i-th group of feature maps obtained after the cascade operation; the output end of each eighth convolution layer outputs 32 fused feature maps with width W and height H, the set formed by all fused feature maps output by the output end of the i-th eighth convolution layer is denoted as F_fused1,i, and the set formed by all fused feature maps output by the fifth stack spatial convolution block is denoted as F_Fused1; wherein F_cat,f1,1 denotes the set formed by the 1st group of feature maps obtained after the angular feature maps copied U×V times are cascaded with the U×V groups of spatial feature maps in F_S1, and F_cat,f1,U×V denotes the set formed by the U×V-th group of feature maps obtained after that cascade operation;
for the second spatial angle feature fusion device, the angular feature maps in F_A2 are copied U×V times and cascaded with the U×V groups of spatial feature maps in F_S2; the set formed by the U×V groups of 64 feature maps, each with width W and height H, obtained after the cascade operation is denoted as F_Cat,f2, F_Cat,f2 = {F_cat,f2,1, …, F_cat,f2,U×V}; the input end of each eighth convolution layer receives a corresponding group of feature maps obtained after the cascade operation, namely the input end of the i-th eighth convolution layer receives the set F_cat,f2,i formed by the i-th group of feature maps obtained after the cascade operation; the output end of each eighth convolution layer outputs 32 fused feature maps with width W and height H, the set formed by all fused feature maps output by the output end of the i-th eighth convolution layer is denoted as F_fused2,i, and the set formed by all fused feature maps output by the fifth stack spatial convolution block is denoted as F_Fused2; wherein F_cat,f2,1 denotes the set formed by the 1st group of feature maps obtained after the angular feature maps copied U×V times are cascaded with the U×V groups of spatial feature maps in F_S2, and F_cat,f2,U×V denotes the set formed by the U×V-th group of feature maps obtained after that cascade operation;
for the third spatial angle feature fusion device, the angular feature maps in F_A3 are copied U×V times and cascaded with the U×V groups of spatial feature maps in F_S3; the set formed by the U×V groups of 64 feature maps, each with width W and height H, obtained after the cascade operation is denoted as F_Cat,f3, F_Cat,f3 = {F_cat,f3,1, …, F_cat,f3,U×V}; the input end of each eighth convolution layer receives a corresponding group of feature maps obtained after the cascade operation, namely the input end of the i-th eighth convolution layer receives the set F_cat,f3,i formed by the i-th group of feature maps obtained after the cascade operation; the output end of each eighth convolution layer outputs 32 fused feature maps with width W and height H, the set formed by all fused feature maps output by the output end of the i-th eighth convolution layer is denoted as F_fused3,i, and the set formed by all fused feature maps output by the fifth stack spatial convolution block is denoted as F_Fused3; wherein F_cat,f3,1 denotes the set formed by the 1st group of feature maps obtained after the angular feature maps copied U×V times are cascaded with the U×V groups of spatial feature maps in F_S3, and F_cat,f3,U×V denotes the set formed by the U×V-th group of feature maps obtained after that cascade operation;
The convolution kernels of the U×V eighth convolution layers in the first spatial angle feature fusion device, the second spatial angle feature fusion device and the third spatial angle feature fusion device have size 3×3, convolution step 1, 64 input channels and 32 output channels, the activation function used is "Leaky ReLU", and the U×V eighth convolution layers share the weight parameters of the convolution kernel.
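Likewise, a minimal sketch of one spatial angle feature fusion device described in claim 3, under the same assumed tensor layout as the encoder sketch above; weight sharing across the U×V eighth convolution layers is again realized by folding the views into the batch dimension, which is an implementation choice rather than something stated in the claims.

```python
# A minimal sketch (assumed tensor layout, not the patent's reference code):
# the angular features are copied to every view, cascaded with the spatial
# features, and passed through U*V weight-shared 3x3 convolutions
# (64 -> 32 channels, Leaky ReLU) to produce the fused feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAngularFeatureFusion(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # "eighth" convolution layer: 3x3, stride 1, 64 -> 32, shared across views
        self.fuse_conv = nn.Conv2d(2 * channels, channels, 3, stride=1, padding=1)

    def forward(self, f_s: torch.Tensor, f_a: torch.Tensor):
        # f_s: (B, U*V, C, H, W) spatial features; f_a: (B, C, H, W) angular features
        B, N, C, H, W = f_s.shape
        f_a_rep = f_a.unsqueeze(1).expand(B, N, C, H, W)
        x = torch.cat([f_s, f_a_rep], dim=2).reshape(B * N, 2 * C, H, W)
        fused = F.leaky_relu(self.fuse_conv(x)).reshape(B, N, C, H, W)
        return fused  # U*V groups of 32 fused feature maps, each W x H
```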
CN202011144012.3A 2020-10-23 2020-10-23 Light field image denoising method based on convolutional neural network Active CN112308085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011144012.3A CN112308085B (en) 2020-10-23 2020-10-23 Light field image denoising method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011144012.3A CN112308085B (en) 2020-10-23 2020-10-23 Light field image denoising method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN112308085A CN112308085A (en) 2021-02-02
CN112308085B (en) 2023-06-09

Family

ID=74327283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011144012.3A Active CN112308085B (en) 2020-10-23 2020-10-23 Light field image denoising method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN112308085B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139898B (en) * 2021-03-24 2022-04-19 宁波大学 Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning
CN114066777B (en) * 2021-11-30 2022-07-15 安庆师范大学 Light field image angle reconstruction method
CN118071603A (en) * 2024-04-19 2024-05-24 浙江优众新材料科技有限公司 Light field image super-resolution method, device and medium for space angle information interaction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910336A (en) * 2019-10-30 2020-03-24 宁波大学 Three-dimensional high dynamic range imaging method based on full convolution neural network
AU2020101229A4 (en) * 2020-07-02 2020-08-06 South China University Of Technology A Text Line Recognition Method in Chinese Scenes Based on Residual Convolutional and Recurrent Neural Networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910336A (en) * 2019-10-30 2020-03-24 宁波大学 Three-dimensional high dynamic range imaging method based on full convolution neural network
AU2020101229A4 (en) * 2020-07-02 2020-08-06 South China University Of Technology A Text Line Recognition Method in Chinese Scenes Based on Residual Convolutional and Recurrent Neural Networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Qi Yubin; Yu Mei; Jiang Hao; Shao Hua; Jiang Gangyi. Multi-exposure image fusion based on tensor decomposition and convolutional sparse representation. Opto-Electronic Engineering. 2019, (001), full text. *
Hu Xueying; Guo Hairu; Zhu Rong. Image super-resolution reconstruction based on hybrid deep convolutional network. Journal of Computer Applications. (07), full text. *

Also Published As

Publication number Publication date
CN112308085A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112308085B (en) Light field image denoising method based on convolutional neural network
CN113139898B (en) Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning
Zhang et al. One-two-one networks for compression artifacts reduction in remote sensing
CN112907479B (en) Residual single image rain removing method based on attention mechanism
CN113345082B (en) Characteristic pyramid multi-view three-dimensional reconstruction method and system
CN109919832A (en) One kind being used for unpiloted traffic image joining method
Shen et al. AFFNet: attention mechanism network based on fusion feature for image cloud removal
Chen et al. Image denoising via deep network based on edge enhancement
CN110889868B (en) Monocular image depth estimation method combining gradient and texture features
CN109523508B (en) Dense light field quality evaluation method
Wang et al. Multi-stream progressive restoration for low-light light field enhancement and denoising
CN113628143A (en) Weighted fusion image defogging method and device based on multi-scale convolution
Xiao et al. Effective PRNU extraction via densely connected hierarchical network
Huang et al. FFNet: A simple image dedusting network with feature fusion
Sankaran et al. Non local image restoration using iterative method
CN115330874B (en) Monocular depth estimation method based on superpixel processing shielding
CN116703750A (en) Image defogging method and system based on edge attention and multi-order differential loss
CN116402908A (en) Dense light field image reconstruction method based on heterogeneous imaging
CN115861108A (en) Image restoration method based on wavelet self-attention generation countermeasure network
Siddiqua et al. MACGAN: an all-in-one image restoration under adverse conditions using multidomain attention-based conditional GAN
CN113222879A (en) Generation countermeasure network for fusion of infrared and visible light images
CN111951159A (en) Processing method for super-resolution of light field EPI image under strong noise condition
CN112200728A (en) Single-image super-resolution method based on tree polymerization lightweight
Jyothi A robust and efficient pre processing techniques for stereo images
Hao et al. Pollen texture migration deblurring method based on self-coding generator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant