CN115358961A - Multi-focus image fusion method based on deep learning - Google Patents

Multi-focus image fusion method based on deep learning

Info

Publication number
CN115358961A
Authority
CN
China
Prior art keywords
image
fused
fusion
channel
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211110378.8A
Other languages
Chinese (zh)
Inventor
陈滨
熊峰
邵艳利
魏丹
王兴起
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211110378.8A priority Critical patent/CN115358961A/en
Publication of CN115358961A publication Critical patent/CN115358961A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-focus image fusion method based on deep learning. Edge information of the multi-focus image pair is obtained with the Laplacian operator, and a fused-image label is obtained with a maximum-selection strategy. The multi-focus image pair is input into the generator fusion network to obtain a fused image. In addition, the designed texture-enhancement module extracts the high-frequency texture information of the source images, and a corresponding loss function is designed to optimize the fused image. Finally, the fused-image label and the fused image are input into the discriminator to be judged real or fake, and the generator is optimized according to the judgment result. The invention addresses the loss of texture and edge information during multi-focus image fusion, and the fused image can be used for further image processing.

Description

Multi-focus image fusion method based on deep learning
Technical Field
The invention discloses a fusion method that generates a sharp image from multiple images acquired with different focus settings. The aim is that the fused image produced by this technique has a fully sharp visual appearance, selectively retains the detailed information contained in the several source images, and can be used for subsequent image-processing tasks.
Background
Because a lens is thick in the middle and thin at the edges, the image formed by the lens is distorted at locations far from the focal point, and the captured image becomes blurred. The demand for sharper pictures has made multi-focus image fusion an important topic in image processing. Owing to the limitations of optical lenses, only objects near the focal point and within the depth of field are imaged fully focused and sharp, while objects far from the focal point and outside the depth of field are imaged blurred. Multi-focus image fusion algorithms were proposed to address this problem: they synthesize a fully focused image by capturing the sharp appearance of multiple source images of the same scene taken with different focal points. The fused image can be applied to fields such as photographic visualization, object tracking, medical diagnosis, and remote-sensing monitoring.
To date, image fusion methods can be divided into two categories: traditional fusion methods and deep-learning-based fusion methods. Traditional image fusion algorithms focus on processing the transform domain or the spatial domain of the image. Transform-domain algorithms transform the images into different feature domains, fuse the features by weighting, and finally generate the fused image by the inverse transform of the fused features. Even images of different modalities share similar properties in a feature domain, so these algorithms are suitable for fusion across modalities, such as infrared-visible image fusion and CT-MR image fusion. Spatial-domain algorithms first divide an input image into many small blocks or regions, then measure the saliency of each block, and finally fuse the most salient regions into a new image. Spatial-domain algorithms are also suitable for images of the same modality, such as multi-focus image fusion. However, traditional image fusion algorithms have unavoidable shortcomings such as poor generality, low efficiency, and blurred edges. Deep-learning algorithms, with their strong feature-representation capability, have been widely applied to image fusion. Researchers complete the fusion task by designing a suitable network structure and a corresponding loss function. Deep-learning-based methods first use a convolutional neural network to learn the features of each image block automatically, learning from labels that partition focused and defocused regions, and the network parameters are continuously optimized according to the designed loss function. By adjusting the structure of the deep network, researchers handle different application scenarios and improve the quality and efficiency of image fusion, for example with pixel-level fusion CNNs, encoder-decoder fusion networks, residual fusion networks, and end-to-end multi-focus image fusion algorithms. Although the fusion effect is good, shortcomings such as a large amount of computation and complex networks remain.
Typical algorithms still have shortcomings in three respects: (1) as the number of network layers grows, the expansion or shrinkage of the gradient keeps accumulating, so the model converges with difficulty and the final fusion effect suffers; (2) during feature fusion, some algorithms lack adequate image-reconstruction capability, so redundant information remains in the fused image; (3) a large number of parameters and loss functions must be computed in a deep-learning model, so the image fusion algorithm is complex and the computation time is costly. To address these problems, this patent proposes a multi-focus image fusion algorithm based on texture enhancement. The algorithm is built on a generative adversarial network: first, a texture-enhancement module extracts the high-frequency information of the source images and a reasonable loss function is designed; second, a generator with a dual-channel mechanism extracts the features of the images to be fused; the obtained features are then fused by concatenation (concat); finally, the generated fused image and the real image label are input into the discriminator for adversarial learning.
Disclosure of Invention
The invention provides an improved algorithm for deep-learning-based multi-focus image fusion: source images with different focuses are fused into an image with a fully focused area and rich texture, and the fused image can be used for image-processing tasks such as image segmentation and target recognition.
The method specifically comprises the following steps:
Step 1: Convert the training sample images and the test sample images from the RGB space to the YCbCr space, and keep the Y-channel images as the training data set and the test data set.
Step 2: Use the Laplacian operator to obtain the edge information of each image in the training data set, and obtain the fused-image label I_r according to a maximum-selection strategy (Maximum). The concrete formula is as follows:

I_r = Maximum(Laplacian(I_1), Laplacian(I_2))

where I_1 and I_2 are a pair of multi-focus images in the training data set.
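A short sketch of steps 1 and 2 with OpenCV and NumPy follows; the function names and the pixel-wise reading of the maximum-selection strategy (compare the absolute Laplacian responses of the two sources and take each pixel from the sharper one) are assumptions for illustration, not taken verbatim from the patent.

import cv2
import numpy as np

def to_y_channel(rgb_image: np.ndarray) -> np.ndarray:
    """Step 1: convert an RGB image to YCbCr and keep only the luminance (Y) channel."""
    ycrcb = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YCrCb)  # OpenCV stores the channels as Y, Cr, Cb
    return ycrcb[:, :, 0]

def fused_label(i1: np.ndarray, i2: np.ndarray) -> np.ndarray:
    """Step 2: build the fused-image label I_r by maximum selection on Laplacian activity."""
    act1 = np.abs(cv2.Laplacian(i1.astype(np.float64), cv2.CV_64F))  # focus activity of source 1
    act2 = np.abs(cv2.Laplacian(i2.astype(np.float64), cv2.CV_64F))  # focus activity of source 2
    return np.where(act1 >= act2, i1, i2)  # take each pixel from the sharper source image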
And step 3: constructing a generation countermeasure network model, wherein the generation countermeasure network model comprises a generator network and a discriminator network;
Step 4: Construct the texture-enhancement module (ITEB), and input the multi-focus image pair I_1 and I_2 of the training data set into the texture-enhancement module to extract high-frequency information.
The texture-enhancement module extracts the deep features of the image and obtains its high-frequency texture information.
The specific steps are as follows: (1) construct the texture-enhancement model; (2) obtain the shallow feature information of the input image by convolution; (3) add a channel-attention mechanism that rescales the extracted feature channels, assigning a different weight to each channel; (4) add a ReLU function so that part of the network neurons output 0, which reduces the interdependence of parameters and mitigates overfitting; (5) add a residual connection, taking the extracted shallow information as input and connecting it to the output of the next extraction stage; (6) stack these five steps repeatedly to obtain the deep feature information of the source image.
Step 5: Texture information and edge information are introduced, and a reasonable loss function is designed according to the image structure. The loss function is divided into a generator loss function and a discriminator loss function, where the generator loss comprises a content loss
L_content, an adversarial loss L_adv, and an SSIM loss L_SSIM. The content loss is used to extract and reconstruct information, the adversarial loss is used to enhance texture detail, and the SSIM loss constrains the generator to produce images consistent with the structure of the real image. The generator loss function is the weighted combination of these three terms; the hyperparameters α and β balance the three to the same level and are set to 100 and 0.01, respectively.
The content loss of the generator, L_content, is the mean square error between the pixels of the fused image and those of the input images; it constrains the fused image and the in-focus regions of the source images to have the same intensity distribution and texture details. In its definition, G(z) is the fused image, z represents the pixel distribution of the fused image, X_1 and X_2 are the input source images, i and j index the i-th row and j-th column of the gradient map or source image, and W and H are the width and height of the image, while ∇X_1^ITEB and ∇X_2^ITEB are the gradient images of the source images after enhancement by the texture-enhancement module and have the same size as the fused image.
The adversarial loss of the generator, L_adv, further enhances the texture detail of the fused image. In its definition, N is the number of fused images during training, a is the probability label that the generator expects the discriminator to assign to fused images (set here to 1), and ∇ denotes the operation of producing an image gradient map with the Laplacian operator. This adversarial game yields a fused image with finer texture.
Since the mean square error is very sensitive to large errors, using it alone may produce an overly smooth image and does not consider the overall structure of the image. To keep the structure of the fused image consistent with that of the source images, the SSIM loss L_SSIM is added to the loss function. In it, I_f denotes the fused image and I_1 and I_2 denote the source images; the SSIM function measures the differences in luminance, contrast, and structure between the fused image and a source image, and is computed as

SSIM(x, y) = ((2·μ_x·μ_y + c_1)·(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)·(σ_x² + σ_y² + c_2))

where μ and σ denote the mean and standard deviation of the image pixel matrices, σ_xy their covariance, and c_1 and c_2 are two small constants that prevent the denominator from being zero. The larger the SSIM value, the higher the structural similarity between the two images.
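The patent does not spell out how the SSIM values enter L_SSIM; a common form is 1 - (SSIM(I_f, I_1) + SSIM(I_f, I_2)) / 2, and the sketch below assumes that form and, for brevity, computes SSIM from global image statistics rather than a sliding window. The constants c1 and c2 are illustrative.

import torch

def ssim_global(x: torch.Tensor, y: torch.Tensor, c1: float = 1e-4, c2: float = 9e-4) -> torch.Tensor:
    """SSIM from global statistics (the standard formulation uses a local sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(unbiased=False), y.var(unbiased=False)
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_loss(fused: torch.Tensor, src1: torch.Tensor, src2: torch.Tensor) -> torch.Tensor:
    """One plausible L_SSIM: penalize low average structural similarity to both sources."""
    return 1 - 0.5 * (ssim_global(fused, src1) + ssim_global(fused, src2))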
The loss function of the discriminator enables it to distinguish real data from fake data accurately. The inputs of the discriminator are the fused image produced by the generator and the fused-image label obtained according to the maximum-selection rule and reconstruction, denoted I_fake_fused and I_real_fused, respectively. In the discriminator loss, N is the number of fused images during training, AVE is the pixel-averaging function, b is the probability with which the discriminator is expected to recognize real data (b = 1), and c is the probability with which it is expected to recognize fake data (c = 0). Under these constraints, the discriminator keeps improving its ability to tell real data from fake data and thereby drives the generator to produce fused images with strong texture.
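The adversarial and discriminator losses are described above only through their variables (N, a, b, c, AVE); a least-squares reading consistent with a = b = 1 and c = 0 is sketched below in PyTorch. This is one plausible formulation, not necessarily the exact one used by the patent.

import torch
import torch.nn.functional as F

def generator_adv_loss(d_on_fused: torch.Tensor, a: float = 1.0) -> torch.Tensor:
    """Push the discriminator's output on generated fused images toward the label a."""
    return F.mse_loss(d_on_fused, torch.full_like(d_on_fused, a))

def discriminator_loss(d_on_real: torch.Tensor, d_on_fake: torch.Tensor,
                       b: float = 1.0, c: float = 0.0) -> torch.Tensor:
    """Push outputs on the fused-image label toward b and on generator outputs toward c."""
    real_term = F.mse_loss(d_on_real, torch.full_like(d_on_real, b))
    fake_term = F.mse_loss(d_on_fake, torch.full_like(d_on_fake, c))
    return real_term + fake_term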
Step 6: Input the multi-focus image pair I_1 and I_2 of the training data set into the generator network to obtain the feature sets F_1 = (x_1, x_2, …, x_n) and F_2 = (x_1, x_2, …, x_n) of I_1 and I_2, and then obtain the generator's fused image I_f through concat fusion and reconstruction. The concrete formula is as follows:
I_f = reconstruct(concat(F_1, F_2))
Step 7: Input the generator's fused image I_f and the fused-image label I_r into the discriminator of the generative adversarial network for discrimination, and establish an adversarial rule between the generator and the discriminator to optimize the fused image. The specific optimization steps comprise:
7-1: the discriminator judges whether the fused image is a real image;
7-2: if not, the difference between the fused image and the fused-image label is minimized through the loss function, the judgment result is fed back to the generator, the generator adjusts its fusion rule according to the result, and the fusion result is optimized;
7-3: if so, the generator's fused image is the optimal fused image.
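In practice, the optimization of step 7 is run as alternating updates of the discriminator and the generator rather than a literal real/fake branch; the sketch below shows one such training step under that reading. G, D, the optimizers, and the loss helpers (content_loss_fn, adv_loss_fn, d_loss_fn) are placeholders assumed to be defined elsewhere, for example as in the loss sketches above.

import torch

def train_step(G, D, opt_g, opt_d, img1, img2, label_ir,
               content_loss_fn, adv_loss_fn, d_loss_fn):
    """One adversarial optimization step of the generator/discriminator pair (schematic)."""
    # Discriminator update: learn to tell the label I_r from the current fused output.
    with torch.no_grad():
        fused = G(img1, img2)
    opt_d.zero_grad()
    loss_d = d_loss_fn(D(label_ir), D(fused))
    loss_d.backward()
    opt_d.step()

    # Generator update: match content/structure and fool the discriminator.
    opt_g.zero_grad()
    fused = G(img1, img2)
    loss_g = content_loss_fn(fused, img1, img2) + adv_loss_fn(D(fused))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()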
Step 8: Perform color-space conversion on the fused grayscale image to obtain the final fused image.
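Step 8 can be implemented by attaching the fused Y channel to the chrominance channels of one of the color source images and converting back to RGB; taking Cb/Cr from the first source image is an assumption, as the patent does not specify which chrominance is reused.

import cv2
import numpy as np

def restore_color(fused_y: np.ndarray, rgb_source: np.ndarray) -> np.ndarray:
    """Merge the fused luminance with a source image's chrominance and convert back to RGB."""
    ycrcb = cv2.cvtColor(rgb_source, cv2.COLOR_RGB2YCrCb)
    ycrcb[:, :, 0] = np.clip(fused_y, 0, 255).astype(ycrcb.dtype)  # replace Y with the fused result
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2RGB)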
The invention has the following beneficial effects:
The deep-learning-based multi-focus image fusion method disclosed by the invention uses the texture-enhancement module to extract the high-frequency texture information of the source images and designs a loss function that promotes fast network convergence and adaptively adjusts the fusion rule of the generator, which effectively reduces the loss of important information during feature extraction. In addition, an adversarial rule and the designed loss function are established between the generator and the discriminator to optimize the fusion result.
Drawings
FIG. 1 is a schematic flow chart of the deep-learning-based multi-focus image fusion method according to the present invention;
FIG. 2 is a schematic diagram of the structure of the generator network in the practice of the present invention;
FIG. 3 is a schematic diagram of the structure of the discriminator network in the practice of the present invention;
FIG. 4 is a block diagram of the texture-enhancement module in an embodiment of the present invention.
Detailed Description
The present invention is further explained below with reference to the attached drawings, so that its purpose, technical solution, and key points are described clearly and precisely. Referring to FIG. 1, the overall process of the invention comprises the following steps:
Step 1: Convert the training sample set and the test sample set from the RGB space to the YCbCr space (Y: luminance component, Cb: blue chrominance component, Cr: red chrominance component), and keep the Y-channel images as the training data set Train_set and the test data set Test_set.
Step 2: Use the Laplacian operator to obtain the edge information of the multi-focus image pairs in the training set, and obtain the fused-image label I_r according to the maximum-selection strategy.
Step 3: Input the multi-focus images of the training set into the texture-enhancement module to obtain the high-frequency texture information of the images, and design the loss function used to optimize the generator.
Step 4: Input the training-set multi-focus image pairs into the generator network to obtain the fused image I_f. The detailed structure of the generator network is shown in FIG. 2.
Step 5: Input the fused-image label I_r and the fused image I_f into the discriminator, obtain a probability label, and judge from it whether the input image is a real image. The specific discrimination process is as follows: if the discriminator judges that the input image is not a real image, the difference between the input image and the real image is reduced with the loss function of step 3, the judgment result is fed back to the generator, and the generator adaptively adjusts its fusion rule according to the feedback to optimize the fusion result; if the discriminator judges that the input image is a real image, the fused image is the optimal result. The detailed structure of the discriminator is shown in FIG. 3.
Further, in step 3, the computation of the texture-enhancement module is divided into three parts: image feature extraction, reconstruction, and loss-function design.
3-1. Extract the deep features of the image to obtain its high-frequency texture information.
3-1-1. Acquire the image edge information. First, the Laplacian operator extracts the edge information of the in-focus regions of the image:

image_p = Laplacian(image)

where image is a source image in the training data set and image_p is the edge-information map after the Laplacian transform.
3-1-2. Extract the deep features of the image. A 3×3 convolution kernel first converts image_p from a single channel to 32 channels, and another 3×3 convolution kernel converts the 32 channels to 128 channels to obtain the feature set Origin; this convolution process is called the conv operation.
3-1-3. Channel attention assigns weights. After the channels are extracted, the feature map of each channel is compressed into a single real number; the set of real numbers is written {x_1, x_2, …, x_128}. Each feature channel is then assigned a weight through a learned parameter W, which explicitly models the correlation among the feature channels, and the assigned weights are applied to the original feature channels to obtain the channel-attention set {w_1, w_2, …, w_128}. In combination with deep learning, the importance of the different channels is learned.
3-1-4. Residual skip connection. After each feature channel has been assigned its weight, the feature set Origin is added to the channel-attention set Channel_set to obtain the output of each ITEB layer of the texture-enhancement module:

ITEB = Origin + Channel_set

where ITEB is the result produced by each layer of the texture-enhancement module.
3-1-5. Extract the deeper feature maps of the image. Iterating the ITEB process of each layer of the texture-enhancement block yields the extraction result F:

F = ITEB(ITEB(…ITEB(image_p)))
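A minimal PyTorch sketch of one ITEB layer following 3-1-2 to 3-1-4: a convolution for features, a squeeze-and-excitation-style channel attention over the 128 channels, a ReLU, and the residual addition ITEB = Origin + Channel_set. The attention reduction ratio and the exact attention form are assumptions for illustration.

import torch
import torch.nn as nn

class ITEB(nn.Module):
    """One texture-enhancement layer: conv -> channel attention -> ReLU -> residual add."""
    def __init__(self, channels: int = 128, reduction: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Channel attention: squeeze each feature map to one number {x_1..x_128},
        # then learn per-channel weights {w_1..w_128}.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, origin: torch.Tensor) -> torch.Tensor:
        feat = self.conv(origin)
        channel_set = self.relu(feat * self.attention(feat))  # rescaled feature channels
        return origin + channel_set                           # ITEB = Origin + Channel_set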
and 3-2, reconstructing the image to restore the original size of the image. The 128-channel feature map is restored to a single-channel image by using a 1 × 1 convolution kernel, and the specific formula is as follows:
F_ITEB = constructor(F)

where F_ITEB is the output of the texture-enhancement module.
3-3. Design the loss function. The evaluation information of the texture-enhancement module is introduced, and a loss function that promotes fast convergence of the network is designed. In the formula of the loss function, W and H are the width and height of the image, I_1 and I_2 are a pair of multi-focus images in the training data set, I_f is the fused image output by the generator, and ∇I_1^ITEB and ∇I_2^ITEB are the gradient images of the source images after enhancement by the texture-enhancement module, with the same size as the fused image. The detailed structure of the texture-enhancement module is shown in FIG. 4.
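A sketch of the gradient-based loss of section 3-3, under the assumption that it is the mean-squared difference between the Laplacian gradient map of the fused image I_f and the element-wise stronger of the two texture-enhanced gradient maps ∇I_1^ITEB and ∇I_2^ITEB; the exact combination used in the patent's formula is not reproduced here.

import torch
import torch.nn.functional as F

# Fixed 3x3 Laplacian kernel used to produce gradient maps of (N, 1, H, W) image batches.
LAPLACIAN_KERNEL = torch.tensor([[0., 1., 0.],
                                 [1., -4., 1.],
                                 [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian(x: torch.Tensor) -> torch.Tensor:
    return F.conv2d(x, LAPLACIAN_KERNEL.to(x.device, x.dtype), padding=1)

def texture_loss(fused: torch.Tensor, grad1_iteb: torch.Tensor, grad2_iteb: torch.Tensor) -> torch.Tensor:
    """Mean-squared error between the fused image's gradient map and the stronger
    texture-enhanced source gradient at each pixel (an assumed combination)."""
    target = torch.maximum(grad1_iteb, grad2_iteb)
    return F.mse_loss(laplacian(fused), target)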
Further, in step 4, the generator model consists of two channels, and its computation is divided into two parts: image feature extraction and image restoration.
4-1. Extract the features of the image.
4-1-1. First-layer feature extraction. A 1×1 convolution kernel converts the source image from a single channel to 16 feature channels; the output is denoted layer1.
4-1-2. Second-layer feature extraction. A 3×3 convolution kernel converts the 16 channels of the first layer to 32 feature channels; the output is denoted layer2.
4-1-3. Concatenation layer. The outputs of the first and second layers are concatenated as the input of the third layer:

layer2_3 = concat(layer1, layer2)

where layer2_3 denotes the output of the concatenation layer and has 48 feature channels.
4-1-4. Third-layer feature extraction. A 3×3 convolution kernel converts the concatenated output from 48 channels to 16 channels; the result is denoted layer3.
4-1-5. Concatenation layer. The outputs of the first three layers are concatenated as the input of the fourth layer:

layer3_4 = concat(layer1, layer2, layer3)

where layer3_4 is the output of the concatenation layer and has 64 feature channels.
4-1-6. Fourth-layer feature extraction. A 3×3 convolution kernel converts the concatenated output from 64 channels to 16 channels; the result is denoted layer4.
4-1-7. Concatenation layer. The outputs of the first four layers are concatenated as the input of the fusion layer:

layer4_5 = concat(layer1, layer2, layer3, layer4)

where layer4_5 denotes the output of the concatenation layer and has 128 feature channels.
4-2. Restore the image. A 1×1 convolution kernel converts the image from 128 channels to a single channel, denoted layer5, and the hyperbolic tangent function tanh is applied to the final single-channel image:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

The output of the generator is expressed as:

I_f = tanh(layer5)
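A PyTorch sketch of the generator of sections 4-1 and 4-2: each of the two channels is a densely concatenated stack of the convolutions listed above, and a final 1×1 convolution plus tanh reconstructs the single-channel fused image. The ReLU activations and the way the two branches are joined before reconstruction are assumptions; the patent only fixes the kernel sizes and per-layer channel counts.

import torch
import torch.nn as nn

class GeneratorBranch(nn.Module):
    """One feature-extraction channel: densely concatenated convolution layers."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(1, 16, kernel_size=1)               # 1 -> 16
        self.layer2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)   # 16 -> 32
        self.layer3 = nn.Conv2d(48, 16, kernel_size=3, padding=1)   # concat(16+32) -> 16
        self.layer4 = nn.Conv2d(64, 16, kernel_size=3, padding=1)   # concat(16+32+16) -> 16
        self.act = nn.ReLU(inplace=True)                            # activation choice assumed

    def forward(self, x):
        l1 = self.act(self.layer1(x))
        l2 = self.act(self.layer2(l1))
        l3 = self.act(self.layer3(torch.cat([l1, l2], dim=1)))
        l4 = self.act(self.layer4(torch.cat([l1, l2, l3], dim=1)))
        return torch.cat([l1, l2, l3, l4], dim=1)                   # dense feature stack

class Generator(nn.Module):
    """Two-channel generator: extract features from each source, concat, reconstruct."""
    def __init__(self):
        super().__init__()
        self.branch1 = GeneratorBranch()
        self.branch2 = GeneratorBranch()
        stacked = 2 * (16 + 32 + 16 + 16)                           # per-branch width from the layers above
        self.reconstruct = nn.Conv2d(stacked, 1, kernel_size=1)     # back to a single channel
        self.tanh = nn.Tanh()

    def forward(self, i1, i2):
        fused_features = torch.cat([self.branch1(i1), self.branch2(i2)], dim=1)
        return self.tanh(self.reconstruct(fused_features))          # I_f = tanh(layer5)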
further, in step 5, the discriminator is composed of four convolutional layers and one linear layer, and the specific steps are as follows:
and 5-1, extracting image characteristics for identification.
5-1-1. First, the input image is converted from a single channel to 16 channels using a 3 x 3 convolution kernel;
5-1-2. Converting the input features from 16 channels to 32 channels using a convolution kernel of 3 x 3;
5-1-3. Converting the input features from 32 channels to 64 channels using a convolution kernel of 3 x 3;
5-1-4. Converting the input features from 64 channels to 128 channels using a convolution kernel of 3 x 3;
5-1-5, converting the input feature from 128 channels to 256 channels by using a convolution kernel of 3 x 3, and acquiring the height H of the output feature;
and 5-2, acquiring a probability label of the input image by using a linear layer.
5-2-1, firstly, the dimension of the output characteristic channel is readjusted, the channel number is 32, the size of each channel is H multiplied by 256, and the output can be expressed as reshape _ var;
5-2-2. Set the normalization matrix, receiver, which is expressed as:
recover=[H×H×256,1]
5-2-3, multiplying the readjusted characteristic channel by a normalization matrix to obtain a probability distribution label:
Probability=reshape_var×recover
where Proavailability represents the output of the discriminator.
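A PyTorch sketch of the discriminator of sections 5-1 and 5-2: five 3×3 convolutions widening the channels from 1 to 256, followed by a linear head that maps the features to a single probability score. The stride-2 downsampling, the LeakyReLU activations, the sigmoid, and the flatten (used here in place of the reshape to 32 channels of size H × 256) are assumptions.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Five conv layers (1->16->32->64->128->256 channels) plus a linear probability head."""
    def __init__(self, image_size: int = 64):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            # stride-2 downsampling is an assumption; the patent only fixes 3x3 kernels
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.features = nn.Sequential(*layers)
        h = image_size // 2 ** 5                       # spatial size after five stride-2 convs
        self.linear = nn.Linear(256 * h * h, 1)        # maps the features to one probability label

    def forward(self, x):
        feat = self.features(x)
        return torch.sigmoid(self.linear(feat.flatten(start_dim=1)))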
Analysis of Experimental results
To show the advantages of the invention more intuitively, the algorithm used in the invention is compared with five state-of-the-art fusion algorithms: BF, GRW, Quadtree, CNN, and MFF-GAN. BF, GRW, and Quadtree are traditional fusion algorithms, while CNN and MFF-GAN are deep-learning-based methods.
To evaluate the image fusion algorithm used in the invention objectively, six popular statistical metrics are selected as objective indexes of the fusion result: the average gradient (Q_AG), information entropy (Q_EN), spatial frequency (Q_SF), gradient-based fusion performance (Q_AB|F), visual information fidelity (Q_VIF), and the sum of the correlations of differences (Q_SCD). The results of the objective comparison are shown in the following table:
Comparison of the present invention with other algorithms (table of Q_AG, Q_EN, Q_SF, Q_AB|F, Q_VIF, and Q_SCD scores for BF, GRW, Quadtree, CNN, MFF-GAN, and the proposed method)
The indexes in the table are defined as follows:
Q_AG is the average value of the image gradient and measures the sharpness of the fused image: the larger Q_AG is, the sharper the image. In its definition, M and N are the height and width of the image, I_f is the fused image, and i and j index the i-th row and j-th column of the image.
Q_EN measures the amount of information in the image: the larger Q_EN is, the more information the fused image contains. Q_EN is defined as:

Q_EN = -Σ_i p_i · log2(p_i)

where p_i is the normalized probability of gray value i.
Q_SF measures the texture of the fused image: the larger Q_SF is, the richer the edges and texture of the image. Q_SF is defined as:

Q_SF = sqrt(RF² + CF²)

where RF and CF are the row and column frequencies, respectively:

RF = sqrt( (1 / (M·N)) · Σ_i Σ_j (I_f(i, j) - I_f(i, j-1))² )
CF = sqrt( (1 / (M·N)) · Σ_i Σ_j (I_f(i, j) - I_f(i-1, j))² )

Q_AB|F measures the degree to which edge information from the source images is retained in the fused image: the larger Q_AB|F is, the more edge information is retained. Its definition combines the edge-strength and edge-orientation preservation values at each image location (i, j), weighted by w_A and w_B, the weights of the two source images with respect to the fused image.
Q_VIF measures the information fidelity of the fused image in a way that resembles the human visual system. First, the source images and the fused image are filtered and divided into blocks; the visual information and the distortion of each block are then evaluated, the visual fidelity of each block is computed, and finally the overall visual fidelity is calculated.
Q_SCD measures the degree of correlation between the information of the source images and that of the fused image and can assess the spurious information contained in the fused image: the larger Q_SCD is, the better the fusion performance and the less spurious information. Q_SCD is defined as:

Q_SCD = r(I_1, I_f) + r(I_2, I_f)

where I_1 and I_2 are the source images, I_f is the fused image, and r(·) is the correlation-coefficient function:

r(A, B) = Σ_i Σ_j (A(i, j) - mean(A))·(B(i, j) - mean(B)) / sqrt( Σ_i Σ_j (A(i, j) - mean(A))² · Σ_i Σ_j (B(i, j) - mean(B))² )
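The first three metrics can be computed directly from the definitions above; a NumPy sketch follows (Q_AB|F, Q_VIF, and Q_SCD need the source images and more involved models, so they are omitted). The 8-bit histogram range and the averaging conventions are common choices and may differ slightly from those used in the patent's evaluation.

import numpy as np

def q_ag(img: np.ndarray) -> float:
    """Average gradient: mean magnitude of horizontal/vertical differences (sharpness)."""
    gx = np.diff(img.astype(np.float64), axis=1)[:-1, :]
    gy = np.diff(img.astype(np.float64), axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))

def q_en(img: np.ndarray) -> float:
    """Information entropy of the gray-level histogram (assumes an 8-bit image)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def q_sf(img: np.ndarray) -> float:
    """Spatial frequency: sqrt(RF^2 + CF^2) from row and column differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))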

Claims (6)

1. A multi-focus image fusion method based on deep learning, characterized by comprising the following steps:
Step 1: preprocess the training sample images and the test sample images and convert their color space to obtain a training data set and a test data set;
Step 2: use the Laplacian algorithm to obtain the edge-information image image_p of each multi-focus image pair in the training set, and obtain the fused-image label I_r according to the maximum-selection strategy;
Step 3: construct a deep convolutional neural network model and a texture-enhancement module; extract the deep features of the image through the texture-enhancement module and obtain the high-frequency texture information of the image;
the specific steps are: (1) construct the texture-enhancement model; (2) obtain the shallow feature information of the input image by convolution; (3) add a channel-attention mechanism that rescales the extracted feature channels, assigning a different weight to each channel; (4) add a ReLU function so that part of the network neurons output 0, which reduces the interdependence of parameters and mitigates overfitting; (5) add a residual connection, taking the extracted shallow information as input and connecting it to the output of the next extraction stage; (6) stack these five steps repeatedly to obtain the deep feature information of the source image;
Step 4: reconstruct the image to restore its original size; obtain the high-frequency information set of the test sample with the texture-enhancement module of step 2, and design a corresponding loss function from the high-frequency information set; in the formula of the loss function, W and H are the width and height of the image, I_1 and I_2 are a pair of multi-focus images in the training data set, I_f is the fused image output by the generator, and ∇I_1^ITEB and ∇I_2^ITEB are the gradient images of the source images after enhancement by the texture-enhancement module, with the same size as the fused image;
Step 5: train the deep convolutional neural network model of step 2 with the training data set to obtain a trained deep convolutional neural network model;
Step 6: obtain the deep feature set of the training data set with the trained deep convolutional neural network model, and perform concat fusion on the deep feature set to obtain a fused grayscale map;
Step 7: perform color-space conversion on the fused grayscale map to obtain the final fused image.
2. The method for multi-focus image fusion based on deep learning of claim 1, wherein: in step 1, the training sample set and the test sample set are converted from RGB color space to YCbCr color space, wherein Y-channel images serve as the training data set and the test data set.
3. The method for multi-focus image fusion based on deep learning of claim 1, wherein: the deep convolutional neural network adopts a generative adversarial network architecture comprising a generator and a discriminator; the generator extracts features through two channels, takes the multi-focus image pair I_1 and I_2 of the training data set as input, and outputs the fused image I_f; the input of the discriminator is the fused image I_f and the fused-image label I_r, and the output is a probability label; an adversarial rule is established between the generator and the discriminator to optimize the fused image.
4. The method for multi-focus image fusion based on deep learning of claim 1, wherein: the image is reconstructed to restore it to its original size, specifically: a 1×1 convolution kernel restores the 128-channel feature map to a single-channel image, with the formula

F_ITEB = constructor(F)

where F_ITEB is the output of the texture-enhancement module.
5. The method of claim 3, wherein: the fused image I_f and the fused-image label I_r are input into the discriminator for discrimination, an adversarial rule is established between the discriminator and the generator, and the fused image is optimized; the steps comprise: (1) if the discriminator judges the fused image I_f to be a fake image, the difference between the fused image I_f and the fused-image label I_r is minimized through the loss function of step 4, the judgment result is fed back to the generator, the fusion rule of the generator is adaptively adjusted, and the fused image is optimized; (2) if the discriminator judges the fused image I_f to be a real image, the fused image I_f is the optimal fused image.
6. The method for multi-focus image fusion based on deep learning of claim 1, wherein in step 3 the deep features of the image are extracted through the texture-enhancement module to obtain its high-frequency texture information, specifically:
3-1. extract the deep features of the image: a 3×3 convolution kernel first converts image_p from a single channel to 32 channels, and another 3×3 convolution kernel converts the 32 channels to 128 channels to obtain the feature set Origin; this convolution process is called the conv operation;
3-2. attention assigns weights: after the channels are extracted, the feature map of each channel is compressed into a single real number, and the set of real numbers is written {x_1, x_2, …, x_128}; each feature channel is then assigned a weight through a learned parameter W, which explicitly models the correlation among the feature channels, and the assigned weights are applied to the original feature channels to obtain the channel-attention set {w_1, w_2, …, w_128}; in combination with deep learning, the importance of the different channels is learned;
3-3. residual skip connection: after each feature channel has been assigned its weight, the feature set Origin is added to the channel-attention set Channel_set to obtain the output of each ITEB layer of the texture-enhancement module:

ITEB = Origin + Channel_set

where ITEB is the result produced by each layer of the texture-enhancement module;
3-4. extract the deeper feature maps of the image: iterating the ITEB process of each layer of the texture-enhancement block yields the extraction result F, expressed as:

F = ITEB(ITEB(…ITEB(image_p))).
CN202211110378.8A 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning Pending CN115358961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211110378.8A CN115358961A (en) 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211110378.8A CN115358961A (en) 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning

Publications (1)

Publication Number Publication Date
CN115358961A true CN115358961A (en) 2022-11-18

Family

ID=84007069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211110378.8A Pending CN115358961A (en) 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning

Country Status (1)

Country Link
CN (1) CN115358961A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109539A (en) * 2023-03-21 2023-05-12 智洋创新科技股份有限公司 Infrared image texture information enhancement method and system based on generation of countermeasure network


Similar Documents

Publication Publication Date Title
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN114998210B (en) Retinopathy of prematurity detecting system based on deep learning target detection
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
CN116958825B (en) Mobile remote sensing image acquisition method and highway maintenance monitoring method
CN114283158A (en) Retinal blood vessel image segmentation method and device and computer equipment
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN110070574A (en) A kind of binocular vision Stereo Matching Algorithm based on improvement PSMNet
CN113610732B (en) Full-focus image generation method based on interactive countermeasure learning
CN114511502A (en) Gastrointestinal endoscope image polyp detection system based on artificial intelligence, terminal and storage medium
CN113344933A (en) Glandular cell segmentation method based on multi-level feature fusion network
CN115358961A (en) Multi-focus image fusion method based on deep learning
CN112926667B (en) Method and device for detecting saliency target of depth fusion edge and high-level feature
CN117095471B (en) Face counterfeiting tracing method based on multi-scale characteristics
Zheng et al. Overwater image dehazing via cycle-consistent generative adversarial network
CN113538363A (en) Lung medical image segmentation method and device based on improved U-Net
CN116091793A (en) Light field significance detection method based on optical flow fusion
CN117315735A (en) Face super-resolution reconstruction method based on priori information and attention mechanism
CN115424337A (en) Iris image restoration system based on priori guidance
CN114067187A (en) Infrared polarization visible light face translation method based on countermeasure generation network
CN113901916A (en) Visual optical flow feature-based facial fraud action identification method
CN113553895A (en) Multi-pose face recognition method based on face orthogonalization
CN112634239A (en) Cerebral hemorrhage detecting system based on deep learning
CN111882495A (en) Image highlight processing method based on user-defined fuzzy logic and GAN
Joshi et al. Enhancing Two dimensional magnetic resonance image using generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination