Disclosure of Invention
The invention aims to provide a general, non-locally optimized image fusion method that can be used for non-convex function optimization; the fused image obtained by the method has stronger robustness and is clearly superior to current state-of-the-art image fusion methods in both objective evaluation indexes and visual effect.
The invention also aims to provide a system using the fusion method.
The invention also aims to provide application of the fusion method or system.
The invention also aims to provide an evaluation method for evaluating the image fusion effect.
The invention firstly provides the following technical scheme:
an image fusion method based on total-variation deep learning, comprising: obtaining a fused image through a convolutional neural network by taking the characteristic values of the source image(s) and/or a pre-fused image as input and taking an optimized objective function as the loss function, wherein the optimized objective function is obtained through a total variation model.
According to some embodiments of the invention, it comprises the steps of:
S1: pre-fusing one or more source images to obtain a pre-fused image;
S2: setting initialization parameters, extracting characteristic information from the one or more source images through a convolutional neural network, and adding the characteristic information to the pre-fused image to obtain the current fused image;
S3: establishing a total variation model from the current fused image and the pre-fused image to obtain the optimized objective function;
S4: taking the optimized objective function as the loss function of the convolutional neural network, performing a reverse (back-propagation) calculation to obtain new parameters, and, under the new parameters, extracting characteristic information from the one or more source images through the convolutional neural network and adding it to the latest fused image to obtain a new fused image;
S5: establishing a total variation model from the new current fused image and the previous fused image to obtain a new optimized objective function, and repeating steps S4-S5 until the loss function reaches its minimum, so as to obtain an outputtable fused image (a minimal sketch of this loop follows the preferred features below);
preferably, the adding is performed one or more times per fusion;
preferably, the adding extracts information from the source image or from the convolved source image;
preferably, the pre-fused image or the previous fused image is self-convolved before information is added;
preferably, the fused image is self-convolved after the addition is completed.
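A minimal sketch of the above loop, assuming a PyTorch implementation, is given below; pre_fusion, fusion_net and tv_objective are hypothetical placeholders for the pre-fusion algorithm, the convolutional neural network and the total-variation objective described above, and the fixed step count stands in for iterating until the loss reaches its minimum.

```python
import torch

def fuse(sources, pre_fusion, fusion_net, tv_objective,
         lr=5e-4, steps=600, lam=120.0):
    """Sketch of steps S1-S5 (hypothetical helper APIs)."""
    f_prev = pre_fusion(sources)                            # S1: pre-fused image
    optimizer = torch.optim.Adam(fusion_net.parameters(),
                                 lr=lr, weight_decay=1e-6)  # S2: initialization
    for _ in range(steps):
        optimizer.zero_grad()
        fused = fusion_net(sources, f_prev)                 # S2/S4: add extracted features
        loss = tv_objective(fused, f_prev, sources, lam)    # S3/S5: TV model as the loss
        loss.backward()                                     # S4: reverse calculation
        optimizer.step()                                    # S4: new parameters
        f_prev = fused.detach()                             # S5: latest fused image becomes the reference
    return fused
```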
According to some embodiments of the invention, the fusion method comprises the steps of:
S1: taking one of the source images as a base image;
S2: setting initialization parameters, extracting characteristic information from one or more source images through a convolutional neural network, and adding the characteristic information to the base image to obtain a fused image;
S3: establishing a total variation model from the fused image and the base image to obtain the optimized objective function;
S4: taking the optimized objective function as the loss function of the convolutional neural network, performing a reverse calculation to obtain new parameters, and, under the new parameters, extracting characteristic information from the one or more source images through the convolutional neural network and adding it to the latest fused image to obtain a new fused image;
S5: establishing a total variation model from the new current fused image and the previous fused image to obtain a new optimized objective function, and repeating steps S4-S5 until the loss function reaches its minimum, so as to obtain an outputtable fused image;
preferably, the adding is performed one or more times per fusion;
preferably, the adding extracts information from the source image or from the convolved source image;
preferably, the pre-fused image or the previous fused image is self-convolved before information is added;
preferably, the fused image is self-convolved after the addition is completed.
According to some embodiments of the invention, the pre-fusion image is obtained by fusing the source images through a fusion algorithm.
Preferably, the fusion algorithm is selected from any one of the DWT, NST and SR algorithms.
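By way of illustration only, a wavelet-domain (DWT) pre-fusion could be sketched as follows; the coefficient-selection rules (averaging the approximation band, keeping the larger-magnitude detail coefficients) are common conventions assumed here, not details prescribed by the invention.

```python
import numpy as np
import pywt

def dwt_pre_fusion(img_a, img_b, wavelet="db1", level=2):
    """Illustrative DWT pre-fusion of two grayscale images (NumPy arrays)."""
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2.0]          # average the approximation coefficients
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append((                       # keep the larger-magnitude detail coefficients
            np.where(np.abs(ha) >= np.abs(hb), ha, hb),
            np.where(np.abs(va) >= np.abs(vb), va, vb),
            np.where(np.abs(da) >= np.abs(db), da, db),
        ))
    return pywt.waverec2(fused, wavelet)
```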
According to some embodiments of the invention, the optimization objective function is E(F) = ||F − F_p||_α^α + λ·E_r(F), wherein F represents the current fused image, F_p represents the previous fused image (or the pre-fused image or the base image), α represents the norm order of the corresponding constraint term, E_r(F) is a regularization constraint term, and λ is a regularization coefficient.
According to some embodiments of the invention, the optimization objective function is defined in terms of the feature matrices V and R of the visible-light image and the infrared image respectively, the gradient operator ∇, the element-wise maximum max{·,·} taken at corresponding matrix positions, the height H and width W of the input image, and the Frobenius norm ||·||_F of a matrix.
According to some embodiments of the present invention, the characteristic information is extracted from the one or more source images through a convolutional neural network and added to the pre-fused image, the base image or the latest fused image as follows: complementary information is extracted from the one or more source images, and the complementary information is fused through a connection (concatenation) operation.
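In convolutional-network terms, the connection operation is a channel-wise concatenation. A minimal sketch for two source images is given below; the channel widths and activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConcatFuse(nn.Module):
    """Extract complementary features from two sources and merge them into the
    current fused image via channel-wise concatenation followed by a convolution."""
    def __init__(self, channels=16):
        super().__init__()
        self.branch_v = nn.Conv2d(1, channels, 3, padding=1)   # visible-light branch
        self.branch_r = nn.Conv2d(1, channels, 3, padding=1)   # infrared branch
        self.merge = nn.Conv2d(2 * channels + 1, 1, 3, padding=1)

    def forward(self, v, r, fused):
        feat_v = torch.relu(self.branch_v(v))
        feat_r = torch.relu(self.branch_r(r))
        # connection operation: concatenate the complementary features with the
        # pre-fused / latest fused image, then convolve
        return self.merge(torch.cat([feat_v, feat_r, fused], dim=1))
```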
The invention also provides an effect evaluation method for the fusion method, characterized in that the fusion effect is comprehensively evaluated through eight indexes: information entropy, average gradient, standard deviation, mutual information, structural mutual information, the Petrovic index, the Piella index and spatial frequency.
The invention also provides a system based on total-variation deep learning, comprising a deep convolutional neural network, wherein the deep convolutional neural network comprises at least 3 common convolutional layers and at least 2 independent convolutional layers after the common convolutional layers; each common convolutional layer comprises a trunk part and a branch part, the trunk part being used for adding information from a plurality of source images and for convolution after each addition, the branch part being used for self-convolution of the plurality of source images, and the independent convolutional layers being used for self-convolution of the fused image after all additions are completed.
The self-convolution in the above scheme refers to convolution of the image information itself without adding new information.
According to some embodiments of the invention, the system employs common initialization parameters.
According to some embodiments of the invention, the learning rate of the algorithm in the common initialization parameter is 1e-4 to 1e-3.
According to some embodiments of the invention, the regularization coefficient of the total variation model in the common initialization parameter is 60.0-200.0.
According to some embodiments of the invention, the optimizer in the common initialization parameters is an Adam optimizer, and the decay is 1e-6.
According to some embodiments of the invention, the number of iterative optimizations in the common initialization parameter is 400-1000.
The invention also provides an application of the fusion method or the system to the fusion of infrared and visible-light images.
The invention has the following beneficial effects: it provides a universal image fusion method that is not prone to falling into local optima, has good interpretability, and can perform non-convex total-variation optimization. The system comprises a total-variation deep-learning optimization framework for image fusion, in which the total-variation model function is used as the loss function of the neural network; the network is then trained with the source images and/or the pre-fused image so that the convolutional deep neural network optimizes the total variation model and generates the fused image, replacing the traditional iterative optimization method, and the structure of the system better matches the physical meaning of image fusion. The evaluation method can effectively and accurately evaluate the fusion effect. The fused image obtained by the invention has good robustness and consistency, is clear, and contains rich edge, detail and texture information.
Detailed Description
The present invention will be described in detail with reference to the following examples and drawings, but it should be understood that the examples and drawings are only for illustrative purposes and are not intended to limit the scope of the present invention in any way. All reasonable variations and combinations that are included within the scope of the inventive concept fall within the scope of the present invention.
As shown in FIG. 1, the optimization framework uses a deep convolutional neural network comprising 3 common convolutional layers and 2 independent convolutional layers after the common convolutional layers, wherein each of the 3 common convolutional layers comprises a trunk part and a branch part. The modules b_i (i = 1, ..., 5) form the trunk (backbone) part of the network, and the modules s_{i,j} (i = 1, ..., n; j = 1, ..., 3) form the branch part. The trunk part adds source-image information through a connection (concatenation) operation and performs a self-convolution after each addition; the branch parts simultaneously perform self-convolutions of the different source images, and after each convolution is completed they provide new supplementary information to the trunk part. After 3 rounds of supplementation the fused image is obtained, and 2 further self-convolutions are then performed by the 2 independent convolutional layers of the system.
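One way such a trunk-and-branch layout could be organized is sketched below for two source images (visible and infrared); the channel widths, activations and exact module grouping are illustrative assumptions rather than details fixed by FIG. 1.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TVFusionNet(nn.Module):
    """Illustrative layout: 3 common layers (branch self-convolutions per source
    plus a trunk concatenation-and-convolution) followed by 2 independent layers."""
    def __init__(self, ch=16):
        super().__init__()
        # branch parts s_{i,j}: self-convolutions of the two source images
        self.branch_v = nn.ModuleList([conv_block(1, ch), conv_block(ch, ch), conv_block(ch, ch)])
        self.branch_r = nn.ModuleList([conv_block(1, ch), conv_block(ch, ch), conv_block(ch, ch)])
        # trunk parts b_1..b_3: convolution after each concatenation
        self.trunk = nn.ModuleList([conv_block(1 + 2 * ch, ch),
                                    conv_block(3 * ch, ch),
                                    conv_block(3 * ch, ch)])
        # independent layers b_4, b_5: self-convolutions of the fused result
        self.tail = nn.Sequential(conv_block(ch, ch), nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, v, r, f_pre):
        fv, fr, f = v, r, f_pre
        for i in range(3):
            fv = self.branch_v[i](fv)   # branch self-convolution (visible)
            fr = self.branch_r[i](fr)   # branch self-convolution (infrared)
            # trunk: add the branches' supplementary information by concatenation,
            # then self-convolve
            f = self.trunk[i](torch.cat([f, fv, fr], dim=1))
        return self.tail(f)             # the 2 independent self-convolutions
```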
Under the above system, the embodiment performs image fusion by the following method:
S1: pre-fusing the source images S_1–S_n to obtain a pre-fused image F_p, wherein the pre-fusion may use any one of the DWT, NST and SR algorithms, as shown in formula (1):
F_p = pre_fusion(S_1, …, S_n);  (1)
preferably, the pre-fusion uses the NST algorithm.
S2: extracting complementary information from the source images S_1–S_n using the system shown in FIG. 1 and merging it into the pre-fused image to obtain the fused image F, with the objective shown in formula (2):
E(F) = ||F − F_p||_α^α + λ·E_r(F);  (2)
In formula (2), α denotes the norm order of the corresponding constraint term; the first term is the data constraint term, whose function is to keep the fused image close to the pre-fused image as a whole; the second term E_r(F) is a regularization constraint term that preserves the features of the fused image that conform to the constraint-term definition; and λ is a regularization coefficient. E(F) is the loss function of the entire neural network when the system is running.
S3: substituting the regularization constraint term into the optimization model to obtain the optimization objective function, and training the system shown in FIG. 1 according to the optimization objective function to obtain the fused image.
Preferably, in step S3, the optimization objective function is as given in formula (3), in which V and R represent the visible-light image and the infrared image respectively, ∇ represents the gradient operator, max{·,·} takes the larger of the corresponding matrix elements (element-wise maximum), H and W represent the height and width of the input image, and ||·||_F denotes the Frobenius norm of a matrix.
In this function, the regularization term enables the fused image to retain the gradient information of the original input images, so that the fused image has better edge and texture information, is clearer and has a better visual effect.
Unlike conventional optimization functions, formula (3) is non-convex, nonlinear and non-differentiable, indicating that it can be used to solve non-convex, nonlinear, non-differentiable optimization problems that conventional optimization methods cannot handle.
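A sketch of how formula (3) could be implemented as the network loss is given below. Since the exact formula is not reproduced here, the combination used (a Frobenius-norm data term toward the infrared features plus a gradient term constrained to the element-wise maximum of the source gradients, both normalized by H·W) is an assumption consistent with the description above, not a verbatim transcription.

```python
import torch
import torch.nn.functional as nnf

def gradients(x):
    """Forward-difference gradients along the horizontal and vertical directions."""
    gx = nnf.pad(x[..., :, 1:] - x[..., :, :-1], (0, 1, 0, 0))
    gy = nnf.pad(x[..., 1:, :] - x[..., :-1, :], (0, 0, 0, 1))
    return gx, gy

def tv_fusion_loss(fused, visible, infrared, lam=120.0):
    """Assumed form of the total-variation objective E(F) described above."""
    h, w = fused.shape[-2:]
    data_term = torch.norm(fused - infrared, p="fro") ** 2 / (h * w)
    fx, fy = gradients(fused)
    vx, vy = gradients(visible)
    rx, ry = gradients(infrared)
    # regularization: pull |grad F| toward the element-wise maximum of the source gradients
    reg_term = (torch.norm(fx.abs() - torch.max(vx.abs(), rx.abs()), p="fro")
                + torch.norm(fy.abs() - torch.max(vy.abs(), ry.abs()), p="fro")) / (h * w)
    return data_term + lam * reg_term
```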
In some embodiments of the invention, NSCT uses the default parameter configuration of the algorithm.
The system adopts common initialization parameters, for example: the learning rate of the algorithm is 1e-4 to 1e-3, the regularization coefficient of the total variation model is 60.0 to 200.0, the optimizer is an Adam optimizer with a decay of 1e-6, and the number of iterative optimizations is 400 to 1000.
In the following examples, the learning rate of the algorithm was set to 5e-4 and the regularization coefficient of the total variation model was set to 120.0; the optimizer is an Adam optimizer with a decay of 1e-6, and the number of iterative optimization steps is set to 600, so that the fused image with the minimum loss value is output.
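Under these settings, the optimizer could be configured as shown below; interpreting the stated decay as the Adam weight decay is an assumption, and TVFusionNet and tv_fusion_loss refer to the illustrative sketches given earlier.

```python
import torch

LAMBDA = 120.0     # regularization coefficient of the total variation model
STEPS = 600        # number of iterative optimization steps

fusion_net = TVFusionNet()   # illustrative architecture sketched above
optimizer = torch.optim.Adam(fusion_net.parameters(),
                             lr=5e-4,            # learning rate
                             weight_decay=1e-6)  # stated decay, assumed to be weight decay
```

The training loop of the earlier sketch is then run for STEPS iterations with tv_fusion_loss weighted by LAMBDA, and the fused image with the minimum loss is kept as the output.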
The following examples evaluate the quality of the final fused image using the following evaluation method:
The evaluation method comprehensively measures the quality of the fused image by combining objective evaluation indexes with the visual effect of the fused image, and thereby evaluates the merits of an image fusion algorithm.
Specifically, the evaluation method comprehensively evaluates the fusion effect by the following 8 indexes:
information Entropy (EN), average Gradient (AG), standard Deviation (SD), mutual Information (MI), structural mutual information (FMI), petrovic index (Q) AB/F ) Piella index (Q) E ) Spatial Frequency (SF).
The specific definitions of the above indexes are as follows (in the following formulas, V and I denote the visible-light image and the infrared image respectively, F denotes the fused image, W and H denote the width and height of the image respectively, and L denotes the number of gray levels of the image).
The information entropy (Entropy, EN) is an evaluation index measuring the amount of information in the fused image; the larger the value, the more information the fused image contains. The corresponding definition is shown in formula (4), where p_i denotes the proportion of pixels with gray level i:
EN = −Σ_{i=0}^{L−1} p_i · log2(p_i).  (4)
The average gradient (AG) is an evaluation index measuring the richness of the edge and texture information of the fused image. The corresponding definition is shown in formula (5), wherein ∇F_x(i, j) and ∇F_y(i, j) respectively represent the gradient values of the fused image in the horizontal and vertical directions at the corresponding point:
AG = (1/((W−1)(H−1))) · Σ_{i,j} sqrt( (∇F_x(i, j)² + ∇F_y(i, j)²) / 2 ).  (5)
The standard deviation (Standard Deviation, SD) is a fused-image evaluation index that reflects, based on statistical ideas, the gray-level distribution of the fused image; the larger the standard deviation, the more dispersed the gray-level distribution and the richer the information contained in the fused image. The corresponding definition is shown in formula (6), wherein μ represents the mean of the fused image:
SD = sqrt( (1/(W·H)) · Σ_{i,j} (F(i, j) − μ)² ).  (6)
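For reference, the three statistics above can be computed for an 8-bit grayscale fused image (given as a NumPy array) roughly as follows; the implementations follow the usual forms of these indexes rather than the exact formulas (4)-(6), which are not reproduced here.

```python
import numpy as np

def entropy(img, levels=256):
    """Information entropy EN: -sum_i p_i * log2(p_i) over the gray-level histogram."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def average_gradient(img):
    """Average gradient AG from horizontal and vertical finite differences."""
    img = img.astype(float)
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def standard_deviation(img):
    """Standard deviation SD about the mean gray level of the fused image."""
    img = img.astype(float)
    return float(np.sqrt(np.mean((img - img.mean()) ** 2)))
```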
Mutual information (Mutual Information, MI) measures the similarity between the gray-level information of the fused image and that of the source images; the larger the value, the more source-image information the fused image contains, and the higher its quality. The corresponding definitions are shown in formulas (7)–(8), wherein X denotes a source image, P_X(i) and P_F(j) denote the histograms of the source image and the fused image respectively, and P_{X,F}(i, j) denotes the joint histogram of images X and F:
MI = MI_{V,F} + MI_{I,F};  (7)
MI_{X,F} = Σ_{i,j} P_{X,F}(i, j) · log2( P_{X,F}(i, j) / (P_X(i) · P_F(j)) ).  (8)
The structural mutual information (Feature Mutual Information, FMI) is an evaluation index based on mutual information (MI) and structural information; it measures the degree of similarity between the structural information of the fused image and that of the source images, and the larger the value, the more structural information of the source images the fused image contains. The corresponding definition is shown in formula (9), wherein V_f, I_f and F_f represent the features of the visible-light, infrared and fused images, respectively.
The Petrovic index (Petrovic metric, Q^AB/F) measures how well the fused image retains the gradient information of the source images. The corresponding definitions are shown in formulas (10)–(11), wherein Q_g^XF(i, j) and Q_a^XF(i, j) respectively represent the edge intensity and direction information at the corresponding point.
piella metric (Q) E ) Based on the fusion image evaluation index Q W Is an improved fusion image evaluation index and Q W In comparison, Q E More accords with the characteristics of the human visual system. The corresponding definition is shown in formula (12), wherein V ', I ', F ' respectively represent edge weight matrixes corresponding to the images V, I and F, alpha represents weight coefficients, and the calculation formula is shown in formula (12):
Q E =Q W (V,I,F) 1-α ·Q W (V',I',F') α (12)
Spatial frequency (Spatial Frequency, SF) is a fused-image quality evaluation index based on gradient information; a higher SF value indicates that the fused image contains richer edge and texture information and is of higher quality. The corresponding definitions are shown in formulas (13)–(15), wherein RF and CF represent the row-frequency and column-frequency values of the image:
SF = sqrt( RF² + CF² );  (13)
RF = sqrt( (1/(W·H)) · Σ_{i,j} (F(i, j) − F(i, j−1))² );  (14)
CF = sqrt( (1/(W·H)) · Σ_{i,j} (F(i, j) − F(i−1, j))² ).  (15)
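Mutual information and spatial frequency can likewise be sketched from the descriptions above (histogram-based MI summed over both source images, and SF from row and column frequencies); the forms used are the common ones and are assumptions where formulas (7)-(8) and (13)-(15) are not reproduced.

```python
import numpy as np

def mutual_information(x, f, levels=256):
    """MI between a source image x and the fused image f from their histograms."""
    joint, _, _ = np.histogram2d(x.ravel(), f.ravel(),
                                 bins=levels, range=[[0, levels], [0, levels]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # P_X(i)
    pf = pxy.sum(axis=0, keepdims=True)          # P_F(j)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ pf)[mask])))

def mi_total(v, ir, f):
    """MI = MI_{V,F} + MI_{I,F} as in formula (7)."""
    return mutual_information(v, f) + mutual_information(ir, f)

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from row and column finite differences."""
    img = img.astype(float)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))
```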
example 1
Image fusion was performed on 12 groups of visible-light and infrared data using the optimization objective function of formula (3); each group was run 10 times, and the mean and variance were calculated. The results for all evaluation indexes are shown in Table 1 (where avg and std denote the mean and the standard deviation).
Table 1 Evaluation of the fusion effect of different fused images
As can be seen from Table 1, the standard deviations of all indexes of the 12 groups of fused images are very small, most being less than 1% of the corresponding mean value, indicating that the results of different runs of the method differ little, the consistency is good, and the algorithm is robust.
Example 2
The different algorithms were compared and evaluated by the following procedure:
(1) Collecting visible light image and infrared image data under different conditions, and establishing a data set, wherein an illustration is shown in fig. 2;
(2) Based on all data of the data set, carrying out overall evaluation on the quality of fusion methods adopting different algorithms;
(3) Selecting representative data, namely a group of urban building-scene data and a group of wild natural-scene images, and comparing the fusion effects on these representative data.
The fusion methods compared comprise the existing prior-art methods FusionGAN, DenseFuse, DeepMSTFuse, SWTDCTSF, FeatureExtractFuse, MEGC, DeepFuse, GTF, NSCT and JSR, and the method of the invention.
The results of the overall evaluation of the different fusion methods on the validation dataset are shown in table 2 below:
Table 2 Evaluation of the fusion effect of the different methods
Algorithm / index (mean) | EN | AG | SD | MI | Q^AB/F | Q_E | FMI | SF
FusionGAN | 6.483 | 3.091 | 27.603 | 2.402 | 0.231 | 0.160 | 0.363 | 10.697
DenseFuse | 6.912 | 4.714 | 37.264 | 2.805 | 0.438 | 0.311 | 0.412 | 14.354
DeepMSTFuse | 6.588 | 4.841 | 30.180 | 2.719 | 0.559 | 0.384 | 0.482 | 17.493
SWTDCTSF | 7.001 | 6.597 | 41.080 | 3.656 | 0.600 | 0.404 | 0.469 | 22.541
FeatureExtractFuse | 6.981 | 6.506 | 42.067 | 3.880 | 0.555 | 0.381 | 0.511 | 20.837
MEGC | 6.515 | 4.330 | 29.405 | 2.817 | 0.412 | 0.333 | 0.472 | 15.987
DeepFuse | 6.867 | 4.910 | 36.902 | 2.872 | 0.508 | 0.353 | 0.475 | 16.706
GTF | 6.731 | 4.046 | 31.840 | 2.316 | 0.421 | 0.291 | 0.461 | 16.428
NSCT | 6.809 | 6.742 | 34.629 | 2.354 | 0.612 | 0.390 | 0.481 | 23.177
JSR | 6.972 | 6.309 | 40.836 | 3.708 | 0.633 | 0.434 | 0.484 | 20.972
The invention | 7.121 | 6.752 | 45.719 | 4.544 | 0.674 | 0.456 | 0.521 | 23.624
The methods were then ranked according to the relationship between each index and image fusion quality: since for all of these indexes a larger value indicates better fusion quality, the method with the largest value of an index is ranked first for that index. The rankings shown in Table 3 were obtained:
Table 3 Average ranking of the different methods on the evaluation indexes
As can be seen from Tables 2-3, the method of the present invention is better than the remaining methods on all eight indexes (average gradient, information entropy, standard deviation, mutual information, structural mutual information, Petrovic index, Piella index and spatial frequency); except for the average gradient, where it is only close to the NSCT method, it ranks highest on every index, with a particularly clear advantage in information entropy, standard deviation, the Petrovic index and the Piella index.
Wherein the method of the invention is slightly superior to the second ranked methods SWTDCTSF and NSCT in terms of information entropy and average gradient.
The method of the invention is superior to the second-ranked methods FeatureExtractFuse and NSCT in terms of structural mutual information and spatial frequency.
The method of the invention is significantly better than the second-ranked methods FeatureExtractFuse and JSR in terms of standard deviation, mutual information, the Petrovic index and the Piella index; the standard deviation and mutual information are improved by more than 10% relative to the second-ranked method.
From Tables 2-3 it can be seen that the method of the present invention performs well on all indexes, while the other prior-art methods fluctuate greatly across the different indexes.
For example, the FeatureExtractFuse method ranks second in standard deviation and mutual information, where it performs well, but its Petrovic index and Piella index are clearly worse.
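The per-index ranking and rank averaging used to produce Table 3 can be sketched as follows; the score-matrix layout is an assumption for illustration, and all eight indexes are treated as larger-is-better, as stated above.

```python
import numpy as np

def average_ranks(scores, methods):
    """Rank the methods per index (rank 1 = largest value) and average the
    ranks across all indexes, as done for Table 3."""
    scores = np.asarray(scores, dtype=float)   # shape: (n_methods, n_indexes)
    order = np.argsort(-scores, axis=0)        # descending order per index
    ranks = np.empty_like(order)
    for col in range(scores.shape[1]):
        ranks[order[:, col], col] = np.arange(1, scores.shape[0] + 1)
    return dict(zip(methods, ranks.mean(axis=1)))
```

Applied to the rows of Table 2, the smallest average rank identifies the best-performing method overall.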
In addition to the overall evaluation, the fusion effects of the infrared and visible-light images were compared on two classical image pairs selected from the dataset: the urban building scene (pair_1) and the outdoor natural scene (pair_2).
The evaluation indexes of the different algorithms on the 2 groups of images are shown in Tables 4 and 5, and the obtained fused images are shown in FIG. 3 and FIG. 4.
Table 4 evaluation index of pair_1 under different algorithms
Table 5 evaluation index of pair_2 under different algorithms
Algorithm | EN | AG | SD | MI | Q^AB/F | Q_E | FMI | SF
FusionGAN | 6.519 | 2.218 | 30.206 | 2.136 | 0.248 | 0.155 | 0.379 | 16.266
DenseFuse | 6.882 | 2.224 | 30.486 | 1.876 | 0.438 | 0.173 | 0.299 | 12.733
DeepMSTFuse | 6.580 | 2.956 | 25.387 | 1.942 | 0.559 | 0.320 | 0.462 | 25.146
SWTDCTSF | 7.067 | 4.110 | 42.410 | 3.436 | 0.600 | 0.360 | 0.460 | 30.471
FeatureExtractFuse | 7.027 | 3.210 | 39.041 | 3.331 | 0.550 | 0.312 | 0.452 | 21.259
MEGC | 6.529 | 2.428 | 24.938 | 2.013 | 0.412 | 0.267 | 0.449 | 19.707
DeepFuse | 6.808 | 2.728 | 30.251 | 2.019 | 0.508 | 0.302 | 0.451 | 18.995
GTF | 6.622 | 3.571 | 40.449 | 2.017 | 0.461 | 0.274 | 0.456 | 25.901
NSCT | 6.679 | 4.196 | 27.571 | 1.590 | 0.612 | 0.344 | 0.469 | 30.817
JSR | 7.049 | 3.769 | 42.742 | 3.354 | 0.633 | 0.388 | 0.451 | 28.952
The invention | 7.069 | 4.089 | 43.651 | 3.934 | 0.673 | 0.418 | 0.517 | 31.777
For the urban-scene image pair_1, it can be seen from Table 4 that the method of the invention ranks first in 6 indexes (information entropy, standard deviation, mutual information, structural mutual information, the Petrovic index and the Piella index), and on the other 2 indexes, average gradient and spatial frequency, its results are close to those of the best-performing method.
For the natural-scene image pair_2, it can be seen from Table 5 that the method of the present invention ranks first on 7 of the indexes, and on the remaining index, the average gradient, it is very close to the top-ranked methods.
As for the actual appearance of the fused images, observing the different fused images of pair_1 shown in FIG. 3, it can be found that the four fused images in panels (e), (f), (k) and (l) are clearer than those of the other methods, and that image (l) is clearer than the other three fused images in some detail regions, such as the light panel and the window area above it, and better fuses the edge and texture information of the source images.
Observing the different fused images of pair_2 shown in FIG. 4, it can be found that the five fused images in panels (d), (e), (h), (k) and (l) are better than those of the other methods, having not only very rich detail and texture information but also better consistency; in the lakeshore area, panels (e) and (l) are clearer than the other three images, and in the woodland area at the top of the picture, panel (l) is clearer than the other four fused images.
In summary, the objective evaluation results in Tables 4 and 5 are consistent with the conclusions drawn from visual observation, and both show that the method of the present invention is superior to the other prior-art methods in fused-image quality.
The above examples are only preferred embodiments of the present invention, and the scope of the present invention is not limited to the above examples. All technical schemes belonging to the concept of the invention belong to the protection scope of the invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.