CN116740515A - CNN-based intensity image and polarization image fusion enhancement method - Google Patents

CNN-based intensity image and polarization image fusion enhancement method

Info

Publication number
CN116740515A
CN116740515A
Authority
CN
China
Prior art keywords
image
fusion
polarization
intensity
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310572366.5A
Other languages
Chinese (zh)
Inventor
王晨光
马如鉞
申冲
曹慧亮
唐军
刘俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China filed Critical North University of China
Priority to CN202310572366.5A priority Critical patent/CN116740515A/en
Publication of CN116740515A publication Critical patent/CN116740515A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a CNN-based intensity image and polarization image fusion enhancement method built on an unsupervised, end-to-end convolutional neural network (CNN) framework. A convolutional neural network model is constructed and trained with a hybrid loss function composed of the multi-scale weighted structural similarity (MWSSIM) and a multi-scale weighted fusion quality index. An intensity image and a polarization image are each input into an encoder module to extract image features, the resulting feature maps are input into a fusion module for feature fusion, and the output fused feature map finally enters a decoder that reconstructs the final fused image. The invention effectively alleviates the loss of important information during polarization image fusion, and the obtained fused image has better quality than traditionally fused images, laying a good foundation for applications in fields such as target detection and target tracking.

Description

CNN-based intensity image and polarization image fusion enhancement method
Technical Field
The invention belongs to the field of image fusion, and particularly relates to a CNN-based intensity image and polarization image fusion enhancement method.
Background
Polarization is one of the important physical properties of light. When a target on the earth's surface or in the atmosphere reflects, scatters, transmits, or radiates electromagnetic waves, it generates specific polarization information determined by its own properties; this information can be used to analyze the target's shape, surface roughness, texture orientation, and the physical and chemical properties of its materials. Polarization imaging obtains intrinsic property information of a target by acquiring images of its degree of linear polarization (DoLP) and angle of polarization (AoP). As a frontier technology, it not only yields the polarization information of the target but also provides the two-dimensional light intensity distribution, and it has broad application prospects in target detection, communication, underwater detection, medical imaging, and other fields.
However, the polarization characteristics of light are easily affected by the environment (e.g., haze, overcast or rainy conditions, and low light), so the quality of polarized images is often difficult to raise to a usable level. Because visible-light images and polarized images often carry complementary information, researchers at home and abroad mainly fuse DoLP images with AoP images, or DoLP images with intensity images, to increase the information content and thereby enrich the multidimensional detail of a single-frame polarized image. Polarized image fusion methods fall into two main categories: traditional methods and neural network methods. Most traditional methods apply the same transform to different source images, so they are not optimal for fusing intensity and polarization images. Furthermore, the activity level measurements and fusion rules in most such methods are designed manually, which makes it difficult to handle different scenes. Neural network-based methods, by contrast, offer strong adaptability, fault tolerance, and noise resistance, advantages that have allowed neural networks to be applied successfully in many fields, including image fusion. However, in some non-ideal environments the light intensity image contains little information and the DoLP and AoP images may contain noise, so the quality of the polarized fused image is poor, which hinders further development of polarization-based target detection; the fused image therefore needs a certain degree of enhancement.
Disclosure of Invention
The invention aims to provide a CNN-based intensity image and polarization image fusion enhancement method that solves the prior-art problem of poor fusion quality caused by the loss of important information during polarization image fusion in non-ideal environments.
The technical scheme is as follows: the CNN-based intensity image and polarized image fusion enhancement method comprises the following steps:
step one, acquiring polarized images, establishing a polarized image data set, and calculating an intensity image and a polarization degree image from the polarized images to serve as the source images input to the convolutional neural network model;
step two, constructing a convolutional neural network model which sequentially comprises an encoder module, a fusion module and a decoder module; the convolutional neural network model adopts a loss function to evaluate the difference between the model's predicted value and the true value, the loss function being obtained from the multi-scale weighted structural similarity and the multi-scale weighted fusion quality index Q_W;
training the convolutional neural network model by utilizing the polarized image data set in the first step;
and step four, inputting the two source images into a trained convolutional neural network model, and outputting a final fusion enhancement result.
Further, in the second step, in the convolutional neural network model, the Loss function Loss is calculated as follows:
wherein Loss_MWSSIM is a loss function based on the multi-scale weighted structural similarity, Loss_QW is a loss function based on the multi-scale weighted fusion quality index, w ∈ {3, 5, 7, 9, 11} denotes the different windows, SSIM(x, y; w) denotes the structural similarity of the two images within window w, β_w is a weight coefficient, c(w) assigns a higher weight to windows with high image saliency, λ(w) denotes the relative importance of image x with respect to image y, Q_0(x, y; w) measures the similarity of x and y within window w, and ψ is a balance parameter.
Further, in formula (1), SSIM(x, y; w) is the structural similarity SSIM(x, y) of image x and image y within window w, where the structural similarity of the two images is calculated as
SSIM(x, y) = [(2μ_x μ_y + C_1)(2σ_xy + C_2)] / [(μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)]
and the weight coefficient β_w is calculated as follows:
wherein μ_i denotes the mean of a given image, σ_i^2 denotes the variance of a given image, and σ_xy denotes the covariance of the two images; C_1 and C_2 are constants, set to 1×10^-4 and 9×10^-4, respectively; g(x) = max(x, 0.0001) is a correction function that increases the robustness of the solution.
Further, in formula (2), s(x; w) reflects the local correlation of image x within window w, w′ denotes a different window, and λ(w) denotes the relative importance of image x with respect to image y.
Further, the balance parameter ψ=0.1.
Further, in the second step, the encoder module is divided into two branches, the two branches are respectively used for inputting an intensity image and a polarization degree image, each branch sequentially comprises a first convolution layer, a dense block module and a second convolution layer, each layer of data output is processed by a ReLU function, the first convolution layer is a 16-channel convolution layer, and the second convolution layer is a 64-channel convolution layer; the intensity image and the polarization degree image are respectively input into two branches of the encoder network, and respectively output an intensity characteristic image and a polarization degree characteristic image; the fusion module is used for physically splicing the intensity characteristic diagram and the polarization degree characteristic diagram to obtain a fusion characteristic diagram; the decoder module sequentially comprises five convolution layers and a Laplace enhancement layer, the channel numbers of the five convolution layers are 128, 64, 32, 16 and 1 respectively, wherein the data output of the first four convolution layers is processed by a ReLU function.
Further, in the second step, the filters of the first convolution layer and the second convolution layer in the encoder module are 3 × 3 with a stride of 1; the filters of the five convolution layers in the decoder module are each 3 × 3 with a stride of 1.
Further, in the second step, the Laplace enhancement layer in the decoder module applies the Laplacian operator by differential approximation:
∇²f(x, y) = 4f(x, y) − f(x+1, y) − f(x−1, y) − f(x, y+1) − f(x, y−1)
wherein x and y are the discrete pixel coordinates along the two directions of the planar image; the Laplacian of a point on the image can thus be understood as four times its own gray level minus the sum of the gray levels of the pixels above, below, to its left and to its right; f(x, y) is the image obtained after the five convolution layers of the decoder, c is a detail coefficient, and the fused image I_f is finally output.
In the second step, in the convolutional neural network model, the learning capacity of each convolutional layer in the decoder module is greater than 1/6, the receptive field size of the uppermost layer of the convolutional layers is 19, zero padding operation is adopted, and no sampling layer exists in the convolutional neural network model.
Further, in the first step, the method for calculating the intensity image and the polarization degree image is as follows:
decomposing the polarized image according to groups of four adjacent pixels to obtain four images with different polarization directions, namely 0°, 45°, 90° and 135°, and then solving an S0 image and a DoLP image according to the Stokes vector, with the specific formulas as follows:
wherein S_0 and I both represent the original light intensity information, i.e., the intensity image, in the absence of a polarizing element; S_1 and Q represent the difference between the horizontally and vertically polarized components of the light wave; S_2 and U represent the difference between the 45° and 135° polarization components of the light wave; S_3 and V represent the difference between the right-hand and left-hand circular polarization components of the light wave; I_i represents the transmitted light intensity after light polarized in the corresponding direction passes through the polarizer; I_R is the right-hand circular polarization component of the light wave; I_L is the left-hand circular polarization component of the light wave.
Beneficial effects: compared with the prior art, the CNN-based intensity image and polarization image fusion enhancement method effectively addresses the loss of important information during polarization image fusion, so that the obtained fused image has better quality than traditionally fused images, laying a good foundation for applications such as target detection and target tracking;
by designing a hybrid loss function that combines the multi-scale weighted structural similarity (MWSSIM) and the multi-scale weighted fusion quality index Q_W, an unsupervised learning process is formed, overcoming the limitations of manually designed activity level measurements and fusion rules;
by introducing the Laplacian enhancement layer, the contrast at gray-level transitions and the fine details of the image are enhanced while the background tone of the image is preserved, so the details are clearer than in the original image, further improving the quality of the fused image.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network model of the present invention;
FIG. 2 shows the fused images produced by each method in scene one;
FIG. 3 shows the fused images produced by each method in scene two.
Detailed Description
The invention is further illustrated by the following description in conjunction with the accompanying drawings and specific embodiments.
The CNN-based intensity image and polarized image fusion enhancement method comprises the following steps:
Step one, acquiring polarized images of size 1224 × 1024 with a Lucid Phoenix PHX050S-PC polarization camera, and establishing a polarized image data set for training, validating and testing the fusion network. An intensity (S0) image and a degree of polarization (DoLP) image are calculated from each polarized image as the source images input to the convolutional neural network model. The resolving method is as follows:
Decomposing the polarized image according to groups of four adjacent pixels to obtain four images with different polarization directions, namely 0°, 45°, 90° and 135°, each decomposed image being of size 612 × 512, and then calculating the S0 image and the DoLP image according to the Stokes vector, with the specific formulas as follows:
wherein S_0 and I both represent the original light intensity information, i.e., the S0 image, in the absence of a polarizing element; S_1 and Q represent the difference between the horizontally and vertically polarized components of the light wave; S_2 and U represent the difference between the 45° and 135° polarization components of the light wave; S_3 and V represent the difference between the right-hand and left-hand circular polarization components of the light wave; I_α represents the transmitted light intensity after light polarized in the corresponding direction passes through the polarizer; I_R is the right-hand circular polarization component of the light wave; I_L is the left-hand circular polarization component of the light wave.
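By way of illustration, the following is a minimal NumPy sketch of this decomposition and Stokes-based resolution. The 2 × 2 super-pixel layout and the S0 normalization convention are assumptions (they depend on the particular sensor), and the function names are hypothetical.

```python
import numpy as np

def decompose_mosaic(raw):
    """Split a 1224 x 1024 polarization mosaic into four 612 x 512 images.

    The 2 x 2 super-pixel layout assumed here is [[90, 45], [135, 0]] degrees;
    the actual layout depends on the sensor and may differ."""
    i90 = raw[0::2, 0::2].astype(np.float64)
    i45 = raw[0::2, 1::2].astype(np.float64)
    i135 = raw[1::2, 0::2].astype(np.float64)
    i0 = raw[1::2, 1::2].astype(np.float64)
    return i0, i45, i90, i135

def stokes_s0_dolp(i0, i45, i90, i135, eps=1e-8):
    """Resolve the S0 (intensity) and DoLP images from the four polarization images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity (one common convention)
    s1 = i0 - i90                        # horizontal minus vertical component
    s2 = i45 - i135                      # 45 deg minus 135 deg component
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    return s0, dolp

# Example usage:
# raw = np.fromfile("frame.raw", dtype=np.uint8).reshape(1024, 1224)
# i0, i45, i90, i135 = decompose_mosaic(raw)
# s0, dolp = stokes_s0_dolp(i0, i45, i90, i135)
```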
Step two, constructing a convolutional neural network model, as shown in FIG. 1, which sequentially comprises an encoder module, a fusion module and a decoder module. The convolutional neural network model adopts a loss function to evaluate the difference between the model's predicted value and the true value, and the loss function is obtained from the multi-scale weighted structural similarity (MWSSIM) and the multi-scale weighted fusion quality index Q_W.
1. In the convolutional neural network model, the encoder module is divided into two branches, which respectively take the two source images, the S0 image and the DoLP image, as input. Each branch sequentially comprises a first convolution layer, a dense block module and a second convolution layer, and the output of each layer is processed by a ReLU function; the first convolution layer has 16 channels, the second convolution layer has 64 channels, the filters of both are 3 × 3, and the stride is 1. The dense block can extract higher-dimensional, deeper image features and reduces overfitting. This structure allows images of arbitrary size to be input and ensures that all salient features can be used in the fusion strategy. Because the constructed convolutional neural network has an end-to-end structure, the output is a weight map of the input parameters. The S0 image and the DoLP image are respectively input into the two branches of the encoder network, which output an S0 feature map and a DoLP feature map, respectively.
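As an illustration of this encoder branch, the following PyTorch sketch follows the 16-channel convolution → dense block → 64-channel convolution layout described above; the internal configuration of the dense block (three layers with a growth rate of 16) is an assumption, since the text only names a dense block.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block sketch: three 3x3 convolutions whose outputs are concatenated
    with their inputs (DenseNet style). Three layers with a growth rate of 16
    (16 -> 64 channels) is an assumption; the text only names a dense block."""
    def __init__(self, in_ch=16, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, stride=1, padding=1),
                nn.ReLU(inplace=True)))
            ch += growth
        self.out_channels = ch  # 16 + 3 * 16 = 64

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

class EncoderBranch(nn.Module):
    """One encoder branch: 16-channel convolution -> dense block -> 64-channel
    convolution, each output passed through ReLU; all filters 3x3, stride 1."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, stride=1, padding=1)
        self.dense = DenseBlock(16)
        self.conv2 = nn.Conv2d(self.dense.out_channels, 64, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.dense(x)
        return self.relu(self.conv2(x))

# Two independent branches, one per source image:
# encoder_s0, encoder_dolp = EncoderBranch(), EncoderBranch()
```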
The S0 feature map and the DoLP feature map are input into the fusion module, which physically concatenates them into a fused feature map with 128 channels; this fused feature map then serves as the input to the decoder module. The activity level measurements and fusion rules are learned autonomously in the subsequent convolution layers without manual design.
The fused feature map is input into the decoder module, which sequentially comprises five convolution layers and one Laplace enhancement layer, specifically:
the first layer in the decoder module is a 128-channel convolution layer with a 3 × 3 filter and a stride of 1;
after the first convolution layer, the second layer is a 64-channel convolution layer with a 3 × 3 filter and a stride of 1;
after the second convolution layer, the third layer is a 32-channel convolution layer with a 3 × 3 filter and a stride of 1;
after the third convolution layer, the fourth layer is a 16-channel convolution layer with a 3 × 3 filter and a stride of 1;
after the fourth convolution layer, the fifth layer is a 1-channel convolution layer with a 3 × 3 filter and a stride of 1;
after the fifth convolution layer, the sixth layer, the Laplace enhancement layer, applies the Laplacian operator by differential approximation:
∇²f(x, y) = 4f(x, y) − f(x+1, y) − f(x−1, y) − f(x, y+1) − f(x, y−1)
wherein x and y are the discrete pixel coordinates along the two directions of the planar image; the Laplacian of a point on the image can thus be understood as four times its own gray level minus the sum of the gray levels of the pixels above, below, to its left and to its right; f(x, y) is the image obtained after the five convolution layers of the decoder, c is a detail coefficient, and the fused image I_f is finally output.
Only the outputs of the first four convolution layers in the decoder module are processed by the ReLU function; the learning capacity c-value of each convolution layer is greater than 1/6; the receptive field of the top convolution layer is 19; and, owing to the zero padding operation, no upsampling layer is used in the network. The specific network configuration parameters of this embodiment are shown in Table 1.
Table 1 specific network configuration parameters
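As a concrete illustration of the fusion module and decoder described above, a PyTorch sketch is given below. The way the detail coefficient c combines the Laplacian response with the decoder output (I_f = f + c·∇²f) and the value c = 0.5 are assumptions, since the combination formula is not reproduced in this text; the 3 × 3 Laplacian kernel follows the neighbour description given above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaplaceEnhance(nn.Module):
    """Laplacian enhancement layer using the differential approximation described
    above (four times the centre grey level minus the four neighbours). Combining
    the response as I_f = f + c * laplacian(f), and c = 0.5, are assumptions."""
    def __init__(self, c=0.5):
        super().__init__()
        kernel = torch.tensor([[0., -1., 0.],
                               [-1., 4., -1.],
                               [0., -1., 0.]]).view(1, 1, 3, 3)
        self.register_buffer("kernel", kernel)
        self.c = c  # detail coefficient

    def forward(self, f):
        lap = F.conv2d(f, self.kernel, padding=1)
        return f + self.c * lap

class FusionDecoder(nn.Module):
    """Physically concatenate the two 64-channel feature maps (-> 128 channels),
    then decode with five 3x3, stride-1 convolutions of 128, 64, 32, 16 and 1
    output channels; ReLU follows the first four only, then Laplacian enhancement."""
    def __init__(self):
        super().__init__()
        chs = [128, 128, 64, 32, 16, 1]
        self.convs = nn.ModuleList(
            [nn.Conv2d(chs[i], chs[i + 1], 3, stride=1, padding=1) for i in range(5)])
        self.enhance = LaplaceEnhance()

    def forward(self, feat_s0, feat_dolp):
        x = torch.cat([feat_s0, feat_dolp], dim=1)  # fusion module: concatenation
        for i, conv in enumerate(self.convs):
            x = conv(x)
            if i < 4:
                x = torch.relu(x)
        return self.enhance(x)
```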
2. The convolutional neural network needs a loss function to evaluate the degree of difference between the model's predicted value and the true value and to feed this difference back in the backward pass. The loss function is obtained from the multi-scale weighted structural similarity (MWSSIM) and the multi-scale weighted fusion quality index Q_W:
in the formula (5), w epsilon {3,5,7,9, 11} represents different windows, the sizes are 11×11, SSIM (x, y; w) represents the structural similarity of two images under the w window, and the calculation method is as shown in the formula (6), beta w Is a weight coefficient, and the calculation method is shown as formula (7), whenWhen this means that the intensity image has more detailed information in the local area, the weight coefficient of the corresponding intensity image should be larger, where g (x) =max (x, 0.0001) is a correction function to increase the robustness of the solution.
wherein μ_i denotes the mean of a given image, σ_i^2 denotes the variance of a given image, and σ_xy denotes the covariance of the two images; C_1 and C_2 are constants, set to 1×10^-4 and 9×10^-4, respectively; g(x) = max(x, 0.0001) is a correction function that increases the robustness of the solution. To better train the model, the multi-scale weighted fusion quality index Q_W is introduced into the loss function:
wherein s(x; w) reflects the local correlation of image x within window w, w′ denotes a different window, λ(w) denotes the relative importance of image x with respect to image y, and Q_0(x, y; w) measures the similarity of x and y within window w. The final loss function is expressed as:
where ψ is a balance parameter used to balance the orders of magnitude of the two terms; in the present invention, ψ = 0.1.
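The following PyTorch sketch illustrates the overall structure of this hybrid loss. The exact expressions for β_w, c(w), λ(w) and Q_0 are not reproduced in this text, so the sketch uses local variance as the saliency measure, the universal image quality index as Q_0, a plain average over the five window sizes in place of c(w), and Loss = Loss_MWSSIM + ψ·Loss_QW; it is a sketch of the idea under these assumptions, not the patent's exact loss.

```python
import torch
import torch.nn.functional as F

C1, C2, PSI = 1e-4, 9e-4, 0.1       # constants from the text; images assumed in [0, 1]
WINDOWS = (3, 5, 7, 9, 11)

def _local_stats(a, b, w):
    # Local means, variances and covariance over a w x w sliding window.
    mu_a = F.avg_pool2d(a, w, stride=1)
    mu_b = F.avg_pool2d(b, w, stride=1)
    var_a = F.avg_pool2d(a * a, w, stride=1) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, w, stride=1) - mu_b ** 2
    cov = F.avg_pool2d(a * b, w, stride=1) - mu_a * mu_b
    return mu_a, mu_b, var_a, var_b, cov

def ssim_map(a, b, w):
    # SSIM(a, b; w) over a w x w window.
    mu_a, mu_b, var_a, var_b, cov = _local_stats(a, b, w)
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))

def q0_map(a, b, w):
    # Universal image quality index, used here as the similarity measure Q_0(a, b; w).
    mu_a, mu_b, var_a, var_b, cov = _local_stats(a, b, w)
    return (4 * cov * mu_a * mu_b + 1e-8) / (
        (var_a + var_b + 1e-8) * (mu_a ** 2 + mu_b ** 2 + 1e-8))

def hybrid_loss(s0, dolp, fused):
    """Loss = Loss_MWSSIM + psi * Loss_QW (assumed combination).

    Local variance is used as the saliency s(.; w); the source with the larger
    local variance (more local detail) receives the larger weight, mirroring the
    role of beta_w / lambda(w); g(x) = max(x, 1e-4) appears as the clamp."""
    mwssim_terms, qw_terms = [], []
    for w in WINDOWS:
        var_s0 = _local_stats(s0, s0, w)[2].clamp(min=1e-4)
        var_dolp = _local_stats(dolp, dolp, w)[2].clamp(min=1e-4)
        lam = var_s0 / (var_s0 + var_dolp)
        mwssim_terms.append((lam * ssim_map(fused, s0, w)
                             + (1 - lam) * ssim_map(fused, dolp, w)).mean())
        qw_terms.append((lam * q0_map(s0, fused, w)
                         + (1 - lam) * q0_map(dolp, fused, w)).mean())
    loss_mwssim = 1 - torch.stack(mwssim_terms).mean()
    loss_qw = 1 - torch.stack(qw_terms).mean()
    return loss_mwssim + PSI * loss_qw
```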
Step three, training the convolutional neural network model with the polarized image data set from step one. In the network model training phase, this embodiment uses 150 image pairs for training the network and the remaining pairs for validation and testing. Regarding the parameter settings, the learning rate is set to 0.0001, the number of epochs to 30, and the batch size to 128.
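A schematic training loop under the parameters given above might look as follows; the Adam optimizer, the in-memory dataset, and the reuse of the encoder, decoder and hybrid_loss sketches from earlier in this description are assumptions rather than details stated in the text.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(encoder_s0, encoder_dolp, fusion_decoder, s0_patches, dolp_patches,
          epochs=30, lr=1e-4, batch_size=128, device="cuda"):
    """Schematic unsupervised training loop with the stated parameters
    (learning rate 1e-4, 30 epochs, batch size 128)."""
    modules = [encoder_s0, encoder_dolp, fusion_decoder]
    for m in modules:
        m.to(device).train()
    params = [p for m in modules for p in m.parameters()]
    optimizer = torch.optim.Adam(params, lr=lr)   # optimizer choice is an assumption
    loader = DataLoader(TensorDataset(s0_patches, dolp_patches),
                        batch_size=batch_size, shuffle=True)
    for epoch in range(epochs):
        total = 0.0
        for s0, dolp in loader:
            s0, dolp = s0.to(device), dolp.to(device)
            fused = fusion_decoder(encoder_s0(s0), encoder_dolp(dolp))
            loss = hybrid_loss(s0, dolp, fused)   # no ground-truth fused image needed
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / len(loader):.4f}")
```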
Step four, inputting the polarized images to be fused into the trained convolutional neural network model and outputting the fusion enhancement result.
The effect of the present method is verified by the following comparative experiments. The compared methods are: the average of the S0 image and the DoLP image (Ave), the non-subsampled contourlet transform (NSCT), the correlation coefficient fusion method (CC), the wavelet transform (DWT), PFNet, a variant that enhances only the S0 image (L-1), and the present method (CLNet). As can be seen from FIG. 2 and FIG. 3, the fusion network proposed by the method produces a better visual fusion effect. The image quality evaluation metrics are: average gradient (AG), entropy (EN), mean square error (MSE), standard deviation (SD), spatial frequency (SF), and image noise (N_ab/f). From Table 2 it can be seen that, compared with the other methods, the fusion network proposed by the method achieves a better objective fusion effect on the quantified data.
Table 2 quality evaluation table of images obtained by different fusion methods
Metric Ave NSCT CC DWT PFNet L-1 CLNet
EN 6.6680 6.6680 6.6941 6.4023 6.6830 6.7067 6.9305
SD 24.7581 45.1511 41.6973 29.9284 42.5580 49.6488 48.4512
SF 9.9961 13.0996 13.0467 19.4373 16.2962 17.5047 22.8588
MSE 0.0278 0.0567 0.0528 0.0307 0.0524 0.0542 0.0547
AG 3.0847 4.0683 4.0672 5.8565 4.8617 5.6394 7.4127
N_ab/f 0.0963 0.0218 0.0285 0.0243 0.0316 0.0106 0.0122
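For reference, the following NumPy sketches show common definitions of several of the metrics in Table 2 (AG, EN, SD, SF); the exact definitions used in the experiments, and of N_ab/f and MSE in particular, are not given in this text and may differ.

```python
import numpy as np

def average_gradient(img):
    # AG: mean gradient magnitude, a common definition.
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))

def entropy(img, bins=256):
    # EN: Shannon entropy of the grey-level histogram (8-bit images assumed).
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img):
    # SD: standard deviation of the grey levels.
    return float(np.std(img.astype(np.float64)))

def spatial_frequency(img):
    # SF: sqrt(row frequency^2 + column frequency^2).
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))
```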

Claims (10)

1. The CNN-based intensity image and polarization image fusion enhancement method is characterized by comprising the following steps of:
step one, acquiring polarized images, establishing a polarized image data set, and calculating an intensity image and a polarization degree image from the polarized images to serve as the source images input to the convolutional neural network model;
step two, constructing a convolutional neural network model which sequentially comprises an encoder module, a fusion module and a decoder module; the convolutional neural network model adopts a loss function to evaluate the difference between the model's predicted value and the true value, the loss function being obtained from the multi-scale weighted structural similarity and the multi-scale weighted fusion quality index Q_W;
training the convolutional neural network model by utilizing the polarized image data set in the first step;
and step four, inputting the two source images into a trained convolutional neural network model, and outputting a final fusion enhancement result.
2. The CNN-based intensity image and polarization image fusion enhancement method according to claim 1, wherein in the second step, in the convolutional neural network model, the Loss function Loss is calculated as follows:
wherein Loss_MWSSIM is a loss function based on the multi-scale weighted structural similarity, Loss_QW is a loss function based on the multi-scale weighted fusion quality index, w ∈ {3, 5, 7, 9, 11} denotes the different windows, SSIM(x, y; w) denotes the structural similarity of the two images within window w, β_w is a weight coefficient, c(w) assigns a higher weight to windows with high image saliency, λ(w) denotes the relative importance of image x with respect to image y, Q_0(x, y; w) measures the similarity of x and y within window w, and ψ is a balance parameter.
3. The CNN-based intensity image and polarization image fusion enhancement method according to claim 2, wherein in formula (1), SSIM(x, y; w) is the structural similarity SSIM(x, y) of image x and image y within window w, the structural similarity SSIM(x, y) of the two images being calculated by the following formula:
the weight coefficient β_w being calculated by the following formula:
wherein μ_i denotes the mean of a given image, σ_i^2 denotes the variance of a given image, and σ_xy denotes the covariance of the two images; C_1 and C_2 are constants, set to 1×10^-4 and 9×10^-4, respectively; g(x) = max(x, 0.0001) is a correction function that increases the robustness of the solution.
4. The CNN-based intensity image and polarization image fusion enhancement method according to claim 2 or 3, wherein, in formula (2),
s(x; w) reflects the local correlation of image x within window w, w′ denotes a different window, and λ(w) denotes the relative importance of image x with respect to image y.
5. A CNN-based intensity image and polarization image fusion enhancement method according to claim 2 or 3, wherein the balance parameter ψ = 0.1.
6. The method for enhancing the fusion of the intensity image and the polarization image based on the CNN according to any one of claims 1 to 3, wherein in the second step, the encoder module is divided into two branches, the two branches are respectively used for inputting the intensity image and the polarization degree image, each branch sequentially comprises a first convolution layer, a dense block module and a second convolution layer, each layer of data output is processed by a ReLU function, the first convolution layer is a 16-channel convolution layer, and the second convolution layer is a 64-channel convolution layer; the intensity image and the polarization degree image are respectively input into two branches of the encoder network, and respectively output an intensity characteristic image and a polarization degree characteristic image; the fusion module is used for physically splicing the intensity characteristic diagram and the polarization degree characteristic diagram to obtain a fusion characteristic diagram; the decoder module sequentially comprises five convolution layers and a Laplace enhancement layer, the channel numbers of the five convolution layers are 128, 64, 32, 16 and 1 respectively, wherein the data output of the first four convolution layers is processed by a ReLU function.
7. The CNN-based intensity image and polarization image fusion enhancement method according to claim 6, wherein in the second step, the filters of the first convolution layer and the second convolution layer in the encoder module are 3 × 3 with a stride of 1; the filters of the five convolution layers in the decoder module are each 3 × 3 with a stride of 1.
8. The CNN-based intensity image and polarization image fusion enhancement method according to claim 6, wherein in the second step, the Laplace enhancement layer in the decoder module applies the Laplacian operator by differential approximation, the formula being:
wherein x and y are the discrete pixel coordinates along the two directions of the planar image; the Laplacian of a point on the image can be understood as four times its own gray level minus the sum of the gray levels of the pixels above, below, to its left and to its right; f(x, y) is the image obtained after the five convolution layers of the decoder, c is a detail coefficient, and the fused image I_f is finally output.
9. The CNN-based intensity image and polarization image fusion enhancement method according to claim 6, wherein in the second step, the learning capacity of each convolutional layer in the decoder module is greater than 1/6, the receptive field size of the uppermost convolutional layer is 19, and zero padding operation is adopted, and there is no sampling layer in the convolutional neural network model.
10. A CNN-based intensity image and polarization image fusion enhancement method according to any one of claims 1 to 3, wherein in step one, the method for resolving the intensity image and the polarization degree image is as follows:
decomposing the polarized image according to groups of four adjacent pixels to obtain four images with different polarization directions, namely 0°, 45°, 90° and 135°, and then solving an S0 image and a DoLP image according to the Stokes vector, with the specific formulas as follows:
wherein S_0 and I both represent the original light intensity information, i.e., the intensity image, in the absence of a polarizing element; S_1 and Q represent the difference between the horizontally and vertically polarized components of the light wave; S_2 and U represent the difference between the 45° and 135° polarization components of the light wave; S_3 and V represent the difference between the right-hand and left-hand circular polarization components of the light wave; I_i represents the transmitted light intensity after light polarized in the corresponding direction passes through the polarizer; I_R is the right-hand circular polarization component of the light wave; I_L is the left-hand circular polarization component of the light wave.
CN202310572366.5A 2023-05-19 2023-05-19 CNN-based intensity image and polarization image fusion enhancement method Pending CN116740515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310572366.5A CN116740515A (en) 2023-05-19 2023-05-19 CNN-based intensity image and polarization image fusion enhancement method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310572366.5A CN116740515A (en) 2023-05-19 2023-05-19 CNN-based intensity image and polarization image fusion enhancement method

Publications (1)

Publication Number Publication Date
CN116740515A true CN116740515A (en) 2023-09-12

Family

ID=87902012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310572366.5A Pending CN116740515A (en) 2023-05-19 2023-05-19 CNN-based intensity image and polarization image fusion enhancement method

Country Status (1)

Country Link
CN (1) CN116740515A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117928565A (en) * 2024-03-19 2024-04-26 中北大学 Polarization navigation orientation method under complex shielding environment
CN117928565B (en) * 2024-03-19 2024-05-31 中北大学 Polarization navigation orientation method under complex shielding environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination