CN117808691A - Image fusion method based on difference significance aggregation and joint gradient constraint

Info

Publication number
CN117808691A
Authority
CN
China
Prior art keywords
image
fusion
gradient
map
saliency
Prior art date
Legal status
Pending
Application number
CN202311705681.7A
Other languages
Chinese (zh)
Inventor
李璇
王杰
陈荣富
冯昭明
张国敏
丁一凡
程莉
Current Assignee
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202311705681.7A
Publication of CN117808691A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image fusion method based on difference saliency aggregation and joint gradient constraint, which comprises the following steps: the infrared and visible light source images are input into a fusion network and integrated through a region aggregation strategy to obtain a difference joint saliency map; the difference joint saliency map and the two source images are input together into a feature fusion sub-network, and the features are reconstructed through convolution to obtain a primary fusion image; a joint gradient map containing the complementary texture information of the source images is constructed by a two-channel gradient aggregation module in the fusion network; in the generator of the fusion network, the content loss between the primary fusion image and the infrared and visible light source images is calculated; in the discriminator of the fusion network, the adversarial loss between the joint gradient map and the gradient map of the primary fusion image is calculated; the content loss and the adversarial loss are used together to train the fusion network and generate the final fusion image. The invention realizes the fusion of infrared and visible light images.

Description

Image fusion method based on difference significance aggregation and joint gradient constraint
Technical Field
The invention belongs to the field of image processing, and particularly relates to an image fusion method based on difference saliency aggregation and joint gradient constraint.
Background
The image fusion technology synthesizes the data acquired by a plurality of sensors or shooting conditions in the same scene to form an image with information of different descriptions on the same scene, so that the computer can conveniently further recognize and process the image. In the task of fusing infrared and visible light images, problems such as distortion and blurring are inevitably caused in the final obtained image due to the influence of hardware conditions of imaging equipment, interference in imaging and transmission processes, natural environment and other factors. The image fusion technology can well solve the defect of insufficient single image information, improves the richness of image content, enables the expression of pictures to be more abundant, reduces the redundancy of information, and achieves better visual effect.
The purpose of image fusion is to obtain a higher visual effect image. However, due to technical conditions and environmental factors, background high-brightness information such as smoke, strong light, and night strong light like fog mist is generally retained in the visible light image. Once there is a situation in the visible image where the object is blocked by the background highlighting information, the texture information and the integrity of the infrared intensity features of the foreground object of the fused image are inevitably affected. Meanwhile, it is difficult to preserve meaningful background highlighting information while displaying the complete foreground object in the fused image. If the fusion process cannot retain meaningful image content, redundant information in the fused image can impair the expression of effective information.
Disclosure of Invention
The invention aims to provide an image fusion method based on difference significance aggregation and joint gradient constraint, which realizes infrared and visible light image fusion.
In order to solve the technical problems, the technical scheme of the invention is as follows: an image fusion method based on difference saliency aggregation and joint gradient constraint comprises the following steps:
S1, inputting the infrared and visible light source images into a fusion network; the fusion network comprises a generator, a two-channel gradient aggregation module and a discriminator, and the generator comprises a saliency difference perception aggregation sub-network and a feature fusion sub-network; the two source images are first input into the saliency difference perception aggregation sub-network, which comprises a multi-scale stereoscopic attention module and a region aggregation strategy; the attention information of the salient regions of the infrared and visible light source images is extracted by the multi-scale stereoscopic attention module, and a difference joint saliency map is obtained by integration through the region aggregation strategy;
S2, inputting the generated difference joint saliency map and the two source images into the feature fusion sub-network; the feature fusion sub-network comprises chain-structured gradient residual modules with skip connections; the sequentially connected gradient residual modules extract features from the difference joint saliency map and the infrared and visible light source images, and the features are then reconstructed through convolution to obtain a primary fusion image;
S3, constructing a joint gradient map containing the complementary texture information of the source images through the two-channel gradient aggregation module in the fusion network; inputting the joint gradient map and the gradient map of the primary fusion image into the discriminator of the fusion network to enhance the texture details of the primary fusion image;
S4, in the generator of the fusion network, respectively calculating the content loss between the primary fusion image and the infrared and visible light source images; in the discriminator of the fusion network, calculating the adversarial loss between the joint gradient map and the gradient map of the primary fusion image; the content loss and the adversarial loss are used together to train the fusion network, so that the image fusion effect is optimized; when the number of training rounds reaches the preset number, the network training is completed and the final fusion image is generated.
The step S1 specifically comprises the following steps:
S11, respectively inputting the infrared image and the visible light image into the saliency difference perception aggregation sub-network; the spatial and channel attention information of the salient regions of the two source images at different scales is perceived by the multi-scale stereoscopic attention module to obtain a saliency feature map for each source image;
S12, integrating the saliency feature maps through the region aggregation strategy to obtain the difference joint saliency map I_mask.
The step S2 is specifically as follows:
S21, in the feature fusion sub-network, multiplying the difference joint saliency map I_mask element by element with the infrared and visible light images respectively to obtain a salient target region map I_t and a salient background region map I_d; for I_t and the infrared image, and for I_d and the visible light image, extracting shallow features on two parallel feature extraction branches using a 3x3 convolution layer; in each parallel feature extraction branch, concatenating the extracted shallow features along the channel dimension and further extracting deep features with the chain-structured gradient residual modules with skip connections in the feature fusion sub-network;
S22, the concatenated shallow features are further processed by the sequentially connected gradient residual modules to extract deep features, and adjacent gradient residual modules are linked by skip connections to avoid the loss of context information; in each gradient residual module, the main stream extracts deep features through two densely connected 3x3 convolutions, while the residual stream performs a gradient operation with the Sobel operator to retain the fine-grained features of the source images; the deep features from the main stream and the fine-grained features from the residual stream are concatenated along the channel dimension to integrate the deep and fine-grained information; the features extracted by the gradient residual modules are then reconstructed through four 3x3 convolution layers, where the final layer uses a Tanh activation function to obtain the primary fusion image and all other layers use BN normalization and LReLU activation functions;
the step S3 is specifically as follows:
s31, constructing a joint gradient map through a two-channel gradient aggregation module: firstly, respectively calculating gradients of infrared and visible light channels through Sobel operators; then integrating the gradient information of the infrared and visible light channels through a double-channel aggregation strategy to obtain a combined gradient map I containing source image complementarity texture information grad This process is expressed as:
wherein,representing gradient operation, in particular calculating gradient by a sobel operator; abs (-) represents absolute value operations and max (-) represents the maximum selection policy at pixel level;
s32, inputting the combined gradient map with the complementary texture characteristics and the gradient map of the primary fusion image into a discriminator. The discriminator is a four-layer network structure, the first three layers use a convolution kernel of 3x3, the last layer is a full-connection layer, and a sigmoid function is used for outputting discrimination probability; the discriminator is used for calculating the similarity degree of texture information between the gradient map of the combined gradient map and the gradient map of the primary fusion image, so as to further describe texture details of foreground objects and background semantic information in the primary fusion image.
The step S4 specifically comprises the following steps:
S41, respectively calculating, in the generator, the content loss between the primary fusion image and the two source images; the content loss includes at least a pixel intensity loss, a structural similarity loss and the adversarial loss of the generator;
S42, in the discriminator, calculating the adversarial loss between the joint gradient map and the gradient map of the primary fusion image; this loss and the content loss are used together to train the network and optimize the image fusion effect; the total number of training rounds is preset, and when the number of training rounds reaches the preset total, the network training is completed and the final fusion image is generated.
The working mechanism of the multi-scale stereoscopic attention module in step S11 is as follows:
In this module, the input image first passes through a 3x3 depthwise separable convolution to obtain the general features F_0 of each channel; F_0 is then divided into features of different scales by 3x3 depthwise separable convolutions with different dilation rates; the attention weights of the different branches of the multi-scale features in the spatial and channel dimensions are calculated through the stereoscopic attention, whose calculation formula is expressed as:
v = c ⊗ s
where v denotes the stereoscopic attention weight, c and s denote the channel attention and the spatial attention respectively, and ⊗ denotes element-by-element multiplication;
the stereoscopic attention weights are processed by a softmax function to obtain the final stereoscopic attention weights, which are used to integrate the features of different scales; the fused feature F is passed through a 1x1 convolution to adjust the number of channels and then summed with the input image to obtain the saliency feature map of the corresponding source image.
The step S12 is specifically as follows:
The features of the saliency feature maps corresponding to the infrared and visible light images are integrated through the region aggregation strategy: all salient regions of the source images are integrated by pixel-level maximum selection:
I_joint(i, j) = max(I_ir_mask(i, j), I_vi_mask(i, j))
where I_joint(i, j) denotes the pixel value at position (i, j) of the integrated saliency map, and I_ir_mask(i, j) and I_vi_mask(i, j) denote the pixel values at position (i, j) of the saliency maps of the infrared and visible light images respectively;
the integrated saliency feature map is binarized through the OTSU threshold segmentation method to obtain the difference joint saliency map, expressed as:
I_mask = Threshold_OTSU(I_joint)
where Threshold_OTSU(·) denotes the OTSU threshold segmentation method.
The step S41 specifically comprises the following steps:
The content loss of the generator includes the pixel intensity loss, the similarity loss and the adversarial loss of the generator, denoted L_int, L_sim and L_adv respectively; wherein,
the pixel intensity loss L_int is defined as:
L_int = (1/(H·W)) · ||I_f - max(I_ir, I_vi)||_1
where H and W are the height and width of the image respectively, I_f is the primary fusion image, ||·||_1 denotes the L1 norm, and max(·) denotes the pixel-level maximum selection; the intensity loss constrains the pixel intensity distribution of the fusion image by integrating the pixel intensity distributions of the infrared and visible light images through the maximum selection strategy;
the similarity loss L_sim is defined in terms of SSIM(I_f, I_ir) and SSIM(I_f, I_vi), the structural similarity measures between the fusion image I_f and the infrared image I_ir and the visible light image I_vi respectively, and keeps the overall structure of the fusion image consistent with the source images;
the adversarial loss L_adv of the generator is defined in terms of the discriminator outputs, where c is the probability label with value 1, D(·) denotes the output of the discriminator, n denotes the n-th image and N denotes the total number of images;
the generator content loss L_G constructed for the image fusion task is defined as:
L_G = λ_1·L_int + λ_2·L_sim + λ_3·L_adv
where λ_1, λ_2 and λ_3 are the weights controlling each loss term;
the step S42 specifically includes:
s42, calculating the similarity between the gradient map of the fusion image and the combined gradient map through the antagonism loss function of the discriminator so as to achieve the purpose of reducing the difference of texture features between the primary fusion image and the combined gradient map; antagonistic loss function of arbiterThe definition is as follows:
wherein a is a label of a gradient map of the fusion image, and the value is 0; b is a label of the joint gradient map, and the value is 1; d (-) represents the output result of the discriminator; n represents the nth image and N represents the total number of images.
And presetting the total training round number as E, and when the training round number reaches E, finishing the network training to generate a final fusion image.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention provides a novel generative adversarial network based on difference saliency aggregation and joint gradient constraint to realize the fusion of infrared and visible light images. The method coordinates well the relationship between the complete representation of the target and the retention of meaningful highlight information. The generated fusion image has rich semantic information, which helps to meet the requirements of high-level vision tasks.
(2) The invention designs a saliency difference perception aggregation sub-network, in which a multi-scale stereoscopic attention mechanism is used to perceive the spatial location information and channel attention information of the salient regions of source images of different modalities, and the region aggregation strategy integrates the differences between the salient regions of the different modalities to construct the difference joint saliency map. The difference joint saliency map effectively alleviates the difficulty of retaining target features blocked by background highlight information.
(3) The invention designs chain-structured gradient residual modules with skip connections, in which the gradient residual modules are connected sequentially to enhance the extraction of image target features and fine-grained information, and skip connections are added between adjacent gradient residual modules to avoid the loss of context information. In each gradient residual module, the main stream uses dense connections to enhance the reuse of network features, and the residual stream uses a gradient operator to improve the description of fine-grained information.
(4) The two-channel gradient aggregation module strengthens the connection between the gradient information of the infrared image and that of the visible light image, and generates a joint gradient map containing the complementary texture information of the source images. Under the adversarial constraint of the joint gradient map, the fusion image can effectively display rich texture details covering both foreground targets and background semantic information.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-scale stereoscopic attention module structure according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a converged network architecture in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a gradient residual module structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a network structure of a arbiter according to an embodiment of the present invention;
fig. 7 is a fusion effect diagram of an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Referring to fig. 1 to 7, the technical scheme of the invention is as follows:
an image fusion method based on difference significance aggregation and joint gradient constraint comprises the following steps:
1. The infrared and visible light source images are input into the constructed fusion network. The fusion network mainly comprises a generator, a two-channel gradient aggregation module and a discriminator, and the generator consists of a saliency difference perception aggregation sub-network and a feature fusion sub-network. The two source images are first input into the saliency difference perception aggregation sub-network, which comprises a multi-scale stereoscopic attention module and a region aggregation strategy; the attention information of the salient regions of each image is extracted by the multi-scale stereoscopic attention module, and the difference joint saliency map is obtained by integration through the region aggregation strategy;
(1) The infrared and visible light images are respectively input into the saliency difference perception aggregation sub-network. First, the multi-scale stereoscopic attention module is used to perceive the channel and spatial position information of the salient regions of the two source images and obtain the saliency feature map of the corresponding source image:
In the multi-scale stereoscopic attention module, the input image is first passed through a 3x3 depthwise separable convolution (DSConv3x3) to obtain the general features F_0 of each channel. On this basis, F_0 is divided into features of different scales by DSConv3x3 with different dilation rates. To make full use of the correlation of the multi-scale features in the spatial and channel dimensions, the stereoscopic attention is used to calculate the attention weights of the different branches. The calculation formula of the stereoscopic attention is:
v = c ⊗ s
where v denotes the stereoscopic attention weight, c and s denote the channel attention and the spatial attention respectively, and ⊗ denotes element-by-element multiplication. The final stereoscopic attention weights are obtained through a softmax function and used to integrate the features of different scales. The fused feature F is passed through a 1x1 convolution to adjust the number of channels and then summed with the image features to obtain the final output result.
(2) The saliency feature maps are integrated through a simple and effective region aggregation strategy to obtain the difference joint saliency map I_mask. The process is as follows:
The saliency feature maps corresponding to the infrared and visible light images are integrated by region aggregation. First, all salient regions of the source images are integrated by pixel-level maximum selection:
I_joint(i, j) = max(I_ir_mask(i, j), I_vi_mask(i, j))
where I_joint(i, j) denotes the pixel value at position (i, j) of the integrated saliency map, and I_ir_mask(i, j) and I_vi_mask(i, j) are the pixel values at position (i, j) of the saliency maps of the infrared and visible light images respectively. Then the integrated saliency feature map is binarized by OTSU threshold segmentation to obtain the final difference joint saliency map:
I_mask = Threshold_OTSU(I_joint)
where Threshold_OTSU(·) denotes the OTSU threshold segmentation method.
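A minimal NumPy/OpenCV sketch of this aggregation step is given below, assuming the two saliency feature maps are single-channel float arrays in [0, 1]; the scaling to 8-bit before cv2.threshold is an implementation detail, not part of the described method.

```python
# Sketch of pixel-level maximum selection followed by OTSU binarization.
import cv2
import numpy as np


def difference_joint_saliency(ir_mask: np.ndarray, vi_mask: np.ndarray) -> np.ndarray:
    # pixel-level maximum selection: I_joint(i, j) = max(I_ir_mask(i, j), I_vi_mask(i, j))
    joint = np.maximum(ir_mask, vi_mask)
    joint_u8 = np.clip(joint * 255.0, 0, 255).astype(np.uint8)
    # OTSU threshold segmentation binarizes the integrated map into I_mask
    _, mask = cv2.threshold(joint_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return (mask > 0).astype(np.float32)    # binary difference joint saliency map
```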
2. The generated difference joint saliency map and the two source images are input into the feature fusion sub-network. The feature fusion sub-network comprises chain-structured gradient residual modules with skip connections: the sequentially connected gradient residual modules extract features from the difference joint saliency map and the infrared and visible light source images, adjacent modules are linked by skip connections to avoid the loss of context information, and the features are reconstructed through convolution to obtain the primary fusion image. The process is as follows:
In the feature fusion sub-network, the difference joint saliency map I_mask is multiplied element by element with the infrared and visible light images respectively to obtain the salient target region map I_t and the salient background region map I_d; for I_t and the infrared image, and for I_d and the visible light image, shallow features are extracted on two parallel feature extraction branches using a 3x3 convolution layer; in each parallel branch the extracted shallow features are concatenated along the channel dimension, and deep features are further extracted by the chain-structured gradient residual modules with skip connections in the feature fusion sub-network.
The concatenated shallow features are further processed by the sequentially connected gradient residual modules, and adjacent gradient residual modules are linked by skip connections to avoid losing context information. In each gradient residual module, the main stream extracts deep features through two densely connected 3x3 convolutions, while the residual stream performs a gradient operation with the Sobel operator to retain the fine-grained features of the source images; the deep features from the main stream and the fine-grained features from the residual stream are concatenated along the channel dimension to integrate the deep and fine-grained information. The features extracted by the gradient residual modules are then reconstructed through four 3x3 convolution layers, where the final layer uses a Tanh activation function to obtain the primary fusion image and all other layers use BN normalization and LReLU activation functions.
3. A joint gradient map containing the complementary texture information of the source images is constructed by the two-channel gradient aggregation module in the fusion network; the joint gradient map and the gradient map of the primary fusion image are input into the discriminator of the fusion network to enhance the texture details of the primary fusion image. The specific steps are as follows:
The joint gradient map is constructed by the two-channel gradient aggregation module: first, the gradients of the infrared and visible light channels are calculated separately with the Sobel operator; then the gradient information of the two channels is integrated through the two-channel aggregation strategy to obtain the joint gradient map I_grad containing the complementary texture information of the source images. This process is expressed as:
I_grad = max(abs(∇I_ir), abs(∇I_vi))
where ∇ denotes the gradient operation, calculated with the Sobel operator, abs(·) denotes the absolute value operation and max(·) denotes the pixel-level maximum selection strategy.
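A minimal sketch of this two-channel aggregation, assuming single-channel float inputs and a 3x3 Sobel kernel, is:

```python
# Sketch of the two-channel gradient aggregation: I_grad = max(abs(∇I_ir), abs(∇I_vi)).
import cv2
import numpy as np


def joint_gradient_map(ir: np.ndarray, vi: np.ndarray) -> np.ndarray:
    def sobel_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        return np.abs(gx) + np.abs(gy)      # abs(∇I): absolute Sobel response
    # pixel-level maximum keeps the stronger texture from either modality
    return np.maximum(sobel_mag(ir), sobel_mag(vi))
```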
the combined gradient map with the complementary texture features and the gradient map of the preliminary fusion image are input together into a arbiter. The discriminator is a four-layer network structure, the first three layers use a convolution kernel of 3x3, the last layer is a full-connection layer, and a sigmoid function is used for outputting discrimination probability; the discriminator is used for calculating the similarity degree of texture information between the gradient map of the combined gradient map and the gradient map of the primary fusion image, so as to further describe texture details of foreground objects and background semantic information in the primary fusion image.
4. In the generator of the fusion network, the content loss between the generated primary fusion image and the two source images is calculated; in the discriminator, the adversarial loss between the joint gradient map and the gradient map of the primary fusion image is calculated; the two losses are used together to train the fusion network and optimize the image fusion effect. When the number of training rounds reaches the preset number, the network training is completed and the final fusion image is generated. The specific steps are as follows:
(1) The content loss of the generator includes the pixel intensity loss, the similarity loss and the adversarial loss of the generator, denoted L_int, L_sim and L_adv. The pixel intensity loss L_int is defined as:
L_int = (1/(H·W)) · ||I_f - max(I_ir, I_vi)||_1
where H and W are the height and width of the image respectively, I_f is the primary fusion image, ||·||_1 denotes the L1 norm, and max(·) denotes the pixel-level maximum selection. The intensity loss constrains the pixel intensity distribution of the fusion image by integrating the pixel intensity distributions of the infrared and visible light images through the maximum selection strategy.
The role of the similarity loss is to keep the overall structure of the fusion image consistent with the source images. The similarity loss L_sim is defined in terms of SSIM(I_1, I_2), the structural similarity measure between two images I_1 and I_2, evaluated between the fusion image I_f and each of the infrared image I_ir and the visible light image I_vi.
The objective of the adversarial loss of the generator is to make the fusion image retain more of the texture information of the source images. The adversarial loss L_adv of the generator is defined in terms of the discriminator outputs, where c is the probability label with value 1, D(·) denotes the output of the discriminator, n denotes the n-th image and N denotes the total number of images.
The generator content loss L_G constructed for the image fusion task is defined as:
L_G = λ_1·L_int + λ_2·L_sim + λ_3·L_adv
where λ_1, λ_2 and λ_3 are the weights controlling each loss term.
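As a sketch only, the following Python code assembles a generator content loss of this form. The exact similarity and adversarial terms are assumptions (a symmetric 1 - SSIM term against each source image and a least-squares penalty toward the label c = 1), as the text above fixes only the intensity term and the weighted sum; the ssim argument stands for any SSIM implementation (e.g. pytorch_msssim.ssim) passed in by the caller, and the default weights are placeholders.

```python
# Sketch of L_G = λ1·L_int + λ2·L_sim + λ3·L_adv under the stated assumptions.
import torch


def generator_content_loss(fused, ir, vi, d_out, ssim, lambdas=(10.0, 1.0, 0.1)):
    l1, l2, l3 = lambdas                                   # λ1, λ2, λ3 (assumed values)
    # L_int: L1 distance to the pixel-wise maximum of the two source images
    l_int = torch.mean(torch.abs(fused - torch.maximum(ir, vi)))
    # L_sim: structural similarity against both source images (assumed symmetric form)
    l_sim = (1 - ssim(fused, ir)) + (1 - ssim(fused, vi))
    # L_adv: push the discriminator output on the fused gradient map toward c = 1
    c = torch.ones_like(d_out)
    l_adv = torch.mean((d_out - c) ** 2)
    return l1 * l_int + l2 * l_sim + l3 * l_adv
```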
(2) The similarity between the gradient map of the fusion image and the joint gradient map is calculated through the adversarial loss function of the discriminator, so as to reduce the difference in texture features between the primary fusion image and the joint gradient map. The adversarial loss function L_D_adv of the discriminator is defined in terms of the discriminator outputs, where a is the label of the gradient map of the fusion image with value 0, b is the label of the joint gradient map with value 1, D(·) denotes the output of the discriminator, n denotes the n-th image and N denotes the total number of images.
The total number of training rounds is preset to E; when the number of training rounds reaches E, the network training is completed and the final fusion image is generated.
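A corresponding sketch of the discriminator loss, again assuming a least-squares form around the stated labels a = 0 and b = 1, is:

```python
# Sketch of L_D_adv with labels a = 0 (fused-image gradient map) and b = 1 (joint gradient map).
import torch


def discriminator_adv_loss(d_fused_grad, d_joint_grad, a=0.0, b=1.0):
    # push D toward a on fused-image gradient maps and toward b on joint gradient maps
    loss_fake = torch.mean((d_fused_grad - a) ** 2)
    loss_real = torch.mean((d_joint_grad - b) ** 2)
    return loss_fake + loss_real
```

In such a setup the discriminator and the generator would be updated alternately with their respective losses for the preset number of training rounds E.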
Fig. 7 shows the fusion results of the invention on the test images. The results show that the invention can well preserve targets blocked by highlight information while retaining the meaningful highlight information in the image, thereby enriching the semantic information of the fusion image.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. An image fusion method based on difference significance aggregation and joint gradient constraint is characterized by comprising the following steps:
S1, inputting the infrared and visible light source images into a fusion network; the fusion network comprises a generator, a two-channel gradient aggregation module and a discriminator, and the generator comprises a saliency difference perception aggregation sub-network and a feature fusion sub-network; the two source images are first input into the saliency difference perception aggregation sub-network, which comprises a multi-scale stereoscopic attention module and a region aggregation strategy; the attention information of the salient regions of the infrared and visible light source images is extracted by the multi-scale stereoscopic attention module, and a difference joint saliency map is obtained by integration through the region aggregation strategy;
S2, inputting the generated difference joint saliency map and the two source images into the feature fusion sub-network; the feature fusion sub-network comprises chain-structured gradient residual modules with skip connections; the sequentially connected gradient residual modules extract features from the difference joint saliency map and the infrared and visible light source images, and the features are then reconstructed through convolution to obtain a primary fusion image;
S3, constructing a joint gradient map containing the complementary texture information of the source images through the two-channel gradient aggregation module in the fusion network; inputting the joint gradient map and the gradient map of the primary fusion image into the discriminator of the fusion network to enhance the texture details of the primary fusion image;
S4, in the generator of the fusion network, respectively calculating the content loss between the primary fusion image and the infrared and visible light source images; in the discriminator of the fusion network, calculating the adversarial loss between the joint gradient map and the gradient map of the primary fusion image; the content loss and the adversarial loss are used together to train the fusion network, so that the image fusion effect is optimized; when the number of training rounds reaches the preset number, the network training is completed and the final fusion image is generated.
2. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 1, wherein S1 specifically is:
S11, respectively inputting the infrared image and the visible light image into the saliency difference perception aggregation sub-network; the spatial and channel attention information of the salient regions of the two source images at different scales is perceived by the multi-scale stereoscopic attention module to obtain a saliency feature map for each source image;
S12, integrating the saliency feature maps through the region aggregation strategy to obtain the difference joint saliency map I_mask.
3. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 2, wherein S2 is specifically:
S21, in the feature fusion sub-network, multiplying the difference joint saliency map I_mask element by element with the infrared and visible light images respectively to obtain a salient target region map I_t and a salient background region map I_d; for I_t and the infrared image, and for I_d and the visible light image, extracting shallow features on two parallel feature extraction branches using a 3x3 convolution layer; in each parallel feature extraction branch, concatenating the extracted shallow features along the channel dimension and further extracting deep features with the chain-structured gradient residual modules with skip connections in the feature fusion sub-network;
S22, the concatenated shallow features are further processed by the sequentially connected gradient residual modules to extract deep features, and adjacent gradient residual modules are linked by skip connections to avoid the loss of context information; in each gradient residual module, the main stream extracts deep features through two densely connected 3x3 convolutions, while the residual stream performs a gradient operation with the Sobel operator to retain the fine-grained features of the source images; the deep features from the main stream and the fine-grained features from the residual stream are concatenated along the channel dimension to integrate the deep and fine-grained information; the features extracted by the gradient residual modules are then reconstructed through four 3x3 convolution layers, where the final layer uses a Tanh activation function to obtain the primary fusion image and all other layers use BN normalization and LReLU activation functions.
4. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 3, wherein S3 specifically is:
S31, constructing the joint gradient map through the two-channel gradient aggregation module: first, the gradients of the infrared and visible light channels are calculated separately with the Sobel operator; then the gradient information of the two channels is integrated through the two-channel aggregation strategy to obtain the joint gradient map I_grad containing the complementary texture information of the source images, and this process is expressed as:
I_grad = max(abs(∇I_ir), abs(∇I_vi))
where ∇ denotes the gradient operation, calculated with the Sobel operator, abs(·) denotes the absolute value operation and max(·) denotes the pixel-level maximum selection strategy;
S32, inputting the joint gradient map with the complementary texture features and the gradient map of the primary fusion image into the discriminator; the discriminator is a four-layer network in which the first three layers use 3x3 convolution kernels and the last layer is a fully connected layer whose output passes through a sigmoid function to give the discrimination probability; the discriminator measures the similarity of texture information between the joint gradient map and the gradient map of the primary fusion image, so as to further refine the texture details of the foreground targets and the background semantic information in the primary fusion image.
5. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 4, wherein S4 is specifically:
S41, respectively calculating, in the generator, the content loss between the primary fusion image and the two source images; the content loss includes at least a pixel intensity loss, a structural similarity loss and the adversarial loss of the generator;
S42, in the discriminator, calculating the adversarial loss between the joint gradient map and the gradient map of the primary fusion image; this loss and the content loss are used together to train the network and optimize the image fusion effect; the total number of training rounds is preset, and when the number of training rounds reaches the preset total, the network training is completed and the final fusion image is generated.
6. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 2, wherein the working mechanism of the multi-scale stereoscopic attention module in S11 is as follows:
in this module, the input image first passes through a 3x3 depthwise separable convolution to obtain the general features F_0 of each channel; F_0 is then divided into features of different scales by 3x3 depthwise separable convolutions with different dilation rates; the attention weights of the different branches of the multi-scale features in the spatial and channel dimensions are calculated through the stereoscopic attention, whose calculation formula is expressed as:
v = c ⊗ s
where v denotes the stereoscopic attention weight, c and s denote the channel attention and the spatial attention respectively, and ⊗ denotes element-by-element multiplication;
the stereoscopic attention weights are processed by a softmax function to obtain the final stereoscopic attention weights, which are used to integrate the features of different scales; the fused feature F is passed through a 1x1 convolution to adjust the number of channels and then summed with the input image to obtain the saliency feature map of the corresponding source image.
7. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 2, wherein S12 is specifically:
the features of the saliency feature maps corresponding to the infrared and visible light images are integrated through the region aggregation strategy: all salient regions of the source images are integrated by pixel-level maximum selection:
I_joint(i, j) = max(I_ir_mask(i, j), I_vi_mask(i, j))
where I_joint(i, j) denotes the pixel value at position (i, j) of the integrated saliency feature map, and I_ir_mask(i, j) and I_vi_mask(i, j) denote the pixel values at position (i, j) of the saliency maps of the infrared and visible light images respectively;
the integrated saliency feature map is binarized through the OTSU threshold segmentation method to obtain the difference joint saliency map, expressed as:
I_mask = Threshold_OTSU(I_joint)
where Threshold_OTSU(·) denotes the OTSU threshold segmentation method.
8. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 5, wherein S41 specifically is:
the content loss of the generator includes the pixel intensity loss, the similarity loss and the adversarial loss of the generator, denoted L_int, L_sim and L_adv respectively; wherein,
the pixel intensity loss L_int is defined as:
L_int = (1/(H·W)) · ||I_f - max(I_ir, I_vi)||_1
where H and W are the height and width of the image respectively, I_f is the primary fusion image, ||·||_1 denotes the L1 norm, and max(·) denotes the pixel-level maximum selection; the intensity loss constrains the pixel intensity distribution of the fusion image by integrating the pixel intensity distributions of the infrared and visible light images through the maximum selection strategy;
the similarity loss L_sim is defined in terms of SSIM(I_f, I_ir) and SSIM(I_f, I_vi), the structural similarity measures between the fusion image I_f and the infrared image I_ir and the visible light image I_vi respectively;
the adversarial loss L_adv of the generator is defined in terms of the discriminator outputs, where c is the probability label with value 1, D(·) denotes the output of the discriminator, n denotes the n-th image and N denotes the total number of images;
the generator content loss L_G constructed for the image fusion task is defined as:
L_G = λ_1·L_int + λ_2·L_sim + λ_3·L_adv
where λ_1, λ_2 and λ_3 are the weights controlling each loss term.
9. The image fusion method based on difference saliency aggregation and joint gradient constraint according to claim 5, wherein S42 specifically is:
calculating the similarity between the gradient map of the fusion image and the joint gradient map through the adversarial loss function of the discriminator, so as to reduce the difference in texture features between the primary fusion image and the joint gradient map; the adversarial loss function L_D_adv of the discriminator is defined in terms of the discriminator outputs, where a is the label of the gradient map of the fusion image with value 0, b is the label of the joint gradient map with value 1, D(·) denotes the output of the discriminator, n denotes the n-th image and N denotes the total number of images;
the total number of training rounds is preset to E, and when the number of training rounds reaches E, the network training is completed and the final fusion image is generated.
CN202311705681.7A 2023-12-12 2023-12-12 Image fusion method based on difference significance aggregation and joint gradient constraint Pending CN117808691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311705681.7A CN117808691A (en) 2023-12-12 2023-12-12 Image fusion method based on difference significance aggregation and joint gradient constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311705681.7A CN117808691A (en) 2023-12-12 2023-12-12 Image fusion method based on difference significance aggregation and joint gradient constraint

Publications (1)

Publication Number Publication Date
CN117808691A true CN117808691A (en) 2024-04-02

Family

ID=90428928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311705681.7A Pending CN117808691A (en) 2023-12-12 2023-12-12 Image fusion method based on difference significance aggregation and joint gradient constraint

Country Status (1)

Country Link
CN (1) CN117808691A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097363A (en) * 2024-04-28 2024-05-28 南昌大学 Face image generation and recognition method and system based on near infrared imaging


Similar Documents

Publication Publication Date Title
US11562498B2 (en) Systems and methods for hybrid depth regularization
Srinivasan et al. Aperture supervision for monocular depth estimation
DE102019130889A1 (en) ESTIMATE THE DEPTH OF A VIDEO DATA STREAM TAKEN BY A MONOCULAR RGB CAMERA
Liu et al. Image de-hazing from the perspective of noise filtering
Xiao et al. Single image dehazing based on learning of haze layers
CN117808691A (en) Image fusion method based on difference significance aggregation and joint gradient constraint
Goncalves et al. Deepdive: An end-to-end dehazing method using deep learning
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
Li et al. Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module
Zhuang et al. A dense stereo matching method based on optimized direction-information images for the real underwater measurement environment
Cho et al. Event-image fusion stereo using cross-modality feature propagation
CN114372931A (en) Target object blurring method and device, storage medium and electronic equipment
Wang et al. Agcyclegan: Attention-guided cyclegan for single underwater image restoration
CN113763300A (en) Multi-focus image fusion method combining depth context and convolution condition random field
CN112115786A (en) Monocular vision odometer method based on attention U-net
CN116958393A (en) Incremental image rendering method and device
CN116883303A (en) Infrared and visible light image fusion method based on characteristic difference compensation and fusion
Li et al. Single image depth estimation using edge extraction network and dark channel prior
Kumar et al. Underwater image enhancement using deep learning
Mathew et al. Monocular depth estimation with SPN loss
Ivanecký Depth estimation by convolutional neural networks
Haji-Esmaeili et al. Large-scale monocular depth estimation in the wild
Kim et al. Bidirectional Deep Residual learning for Haze Removal.
Honnutagi et al. Underwater video enhancement using manta ray foraging lion optimization-based fusion convolutional neural network
Kim et al. Real-time human segmentation from RGB-D video sequence based on adaptive geodesic distance computation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination