CN116993639A - Visible light and infrared image fusion method based on structural re-parameterization

Visible light and infrared image fusion method based on structural re-parameterization

Info

Publication number
CN116993639A
Authority
CN
China
Prior art keywords
fusion
image
layer
structural
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310932335.6A
Other languages
Chinese (zh)
Inventor
蒋汶臻
胡荣林
王林涛
李文超
王佳雯
马甲林
李翔
邵鹤帅
张海艳
何艳婷
冯万利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202310932335.6A
Publication of CN116993639A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visible light and infrared image fusion method based on structural re-parameterization, which comprises the following steps: in the encoder, features of the visible light and infrared images are extracted with a deep convolutional neural network model built from stacked RepVGG blocks, convolution layers and a dropout layer, and the features are divided by channel count into low-level and high-level features; the low-level and high-level features are then input into a decoder, where a feature fusion module merges them into new depth features, producing the fused image. By applying RepVGG blocks, a dropout layer and structural re-parameterization to the visible light and infrared image fusion task, the invention extracts image features effectively, mitigates overfitting, and at the same time improves the model's inference speed and memory utilization.

Description

Visible light and infrared image fusion method based on structural re-parameterization
Technical Field
The invention relates to the field of visible light and infrared image fusion in computer vision, in particular to a visible light and infrared image fusion method based on structural re-parameterization.
Background
In the field of image fusion, visible light and infrared image fusion is an important technology that combines information from visible light and infrared sensors to produce a composite image with more comprehensive and richer information. Visible light and infrared images capture information in different wavebands, have complementary characteristics, and together can provide a more comprehensive visual perception capability.
Existing visible light and infrared image fusion methods are mainly based on pixel-level or region-level fusion strategies, such as weighted averaging, multi-scale decomposition and wavelet transforms. However, these methods often fail to fully exploit the structural information of the image, leading to artifacts, distortion and incomplete information in the fusion result. In addition, they adapt poorly to different scenes and lighting conditions, and cannot achieve precise control and adjustment of the image content.
Ding X et al., in RepVGG: Making VGG-style ConvNets Great Again ([C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 13733-13742), propose the RepVGG network, whose training-time model has a multi-branch topology; training and inference structures are decoupled by a structure re-parameterization technique, so that the inference-time body consists only of stacked 3×3 convolutions and ReLU. A multi-branch model is trained, equivalently converted into a single-path model, and the single-path model is finally deployed. The method thus enjoys both the advantage of multi-branch training (high performance) and the benefits of single-path inference (high speed, memory savings). The network achieves high accuracy and fast inference on image classification tasks; however, when introduced into the image fusion field, its many RepVGG blocks, pooling layers and fully connected layers not only incur a large computational cost but also distort the fused image, and, lacking a dropout layer, it also risks overfitting.
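For context, the branch-fusion step of structure re-parameterization can be sketched as follows. This is a minimal PyTorch illustration of the conversion described in the RepVGG paper, not the patent's implementation; all function and variable names are illustrative assumptions. It folds each batch-normalization layer into its preceding (bias-free) convolution and merges the parallel 3×3, 1×1 and identity branches into a single 3×3 kernel.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv_weight, bn):
    # Fold a BatchNorm layer into the preceding (bias-free) convolution:
    # y = gamma * (W*x - mean) / sqrt(var + eps) + beta
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                          # per-channel scale
    fused_w = conv_weight * scale.reshape(-1, 1, 1, 1)
    fused_b = bn.bias - bn.running_mean * scale
    return fused_w, fused_b

def reparameterize(conv3, bn3, conv1, bn1, bn_id, channels):
    # Merge the three training-time branches (3x3 conv, 1x1 conv, identity,
    # each followed by BN) into the weight and bias of a single 3x3 conv.
    # Assumes stride 1, groups 1 and equal input/output channels.
    w3, b3 = fuse_conv_bn(conv3.weight, bn3)
    w1, b1 = fuse_conv_bn(conv1.weight, bn1)
    w1 = nn.functional.pad(w1, [1, 1, 1, 1])         # embed 1x1 kernel in 3x3
    w_id = torch.zeros(channels, channels, 3, 3)     # identity as a 3x3 conv
    for c in range(channels):
        w_id[c, c, 1, 1] = 1.0
    w_id, b_id = fuse_conv_bn(w_id, bn_id)
    return w3 + w1 + w_id, b3 + b1 + b_id
```

At deployment, the returned weight and bias initialize a plain nn.Conv2d(channels, channels, 3, padding=1) that replaces the whole block; stacking such layers yields the branch-free inference topology.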
Li H et al., in LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images ([J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023), propose a representation-learning-guided fusion network (LRRNet) that uses a learned image decomposition model (learnable low-rank representation, LLRR) for the infrared-visible (IR-VI) image fusion task. To train this network, a new detail-to-semantic information loss function is proposed, containing loss terms at four levels: pixel level, shallow feature level, intermediate feature level and deep feature level. The fusion performance of this network (including metrics such as the improved fusion artifact measure and multi-scale structural similarity) is superior to most existing fusion methods, with fewer parameters and shorter training and inference times. However, the LLRR blocks in LRRNet extract the deep features of visible light and infrared images insufficiently, so evaluation metrics of the fused image such as entropy and mutual information are not ideal.
Disclosure of Invention
The invention aims to: provide a visible light and infrared image fusion method based on structural re-parameterization that improves inference speed and memory utilization.
The technical scheme is as follows: the invention discloses a visible light and infrared image fusion method based on structural re-parameterization, which comprises the following steps:
(1) The input visible light image passes through several RepVGG blocks and one convolution layer to extract low-level features L_x and high-level features S_x; L_x and S_x are then input into separate convolution layers to obtain the two tensors C_11 and C_12;
(1.1) Read the input visible light image as a gray image and scale it to a uniform size; take the RepVGG convolutional neural network architecture as the base convolutional neural network, remove its final pooling layer and fully connected layer, and add one convolution layer.
(1.2) Input the scaled gray image into the modified RepVGG convolutional neural network architecture; after convolution, batch normalization and rectified linear unit (ReLU) operations, output the feature map C_1.
(1.3) Slice the first 128 channels of feature map C_1 as L_x and the last 128 channels as S_x.
(1.4) Input the two sub-tensors L_x and S_x into separate convolution layers to obtain C_11 and C_12 in turn.
(2) The input infrared image passes through several RepVGG blocks and one convolution layer to extract low-level features L_y and high-level features S_y; L_y and S_y are then input into separate convolution layers to obtain the two tensors C_21 and C_22;
(2.1) Read the input infrared image as a gray image and scale it to a uniform size; take the RepVGG convolutional neural network architecture as the base convolutional neural network, remove its final pooling layer and fully connected layer, and add one convolution layer.
(2.2) Input the scaled gray image into the modified RepVGG convolutional neural network architecture; after convolution, batch normalization and ReLU operations, output the feature map C_2.
(2.3) Slice the first 128 channels of feature map C_2 as L_y and the last 128 channels as S_y.
(2.4) Input the two sub-tensors L_y and S_y into separate convolution layers to obtain C_21 and C_22 in turn.
(3) Concatenate the two tensors C_11 and C_21 along dimension 1 to obtain C_3, and concatenate the two tensors C_12 and C_22 along dimension 1 to obtain C_4;
(4) Input the two tensors C_3 and C_4 into separate convolution layers to obtain a low tensor and a high tensor;
(5) Perform element-wise addition on the low and high tensors to generate the fused image.
A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the visible light and infrared image fusion method based on structural re-parameterization described above.
The beneficial effects are that: compared with the prior art, the invention has the following advantages:
1. The invention is reasonably designed: the low-level and high-level features of the image are extracted with a deep convolutional neural network model; the convolution layers and dropout layer in the decoder then reduce the risk of overfitting and enhance generalization; fusing the low-level and high-level features yields a stronger feature representation;
2. During inference, the decoupling technique of structural re-parameterization converts each RepVGG block into stacked plain-topology (branch-free) convolution layers, improving inference speed and memory utilization.
Drawings
FIG. 1 is a diagram of a visible and infrared image fusion network framework based on structural reparameterization;
FIG. 2 is a schematic diagram of the operation of the RepVGG block;
FIG. 3 is a schematic diagram of the stacked convolution operation after decoupling.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, in the visible light and infrared image fusion method based on structural re-parameterization, a deep convolutional neural network model extracts the low-level and high-level features of the image in the encoder; in the decoder, the low-level and high-level features are combined to generate the fused image. In the encoder, to capture the source-image features effectively while accelerating model inference and improving memory utilization, RepVGG blocks are used: the image passes through a 3×3 convolution with a parallel-connected 1×1 convolution, followed by ReLU; finally, a convolution layer and a dropout layer are applied, and the feature map is divided into low-level and high-level features for fusion by the decoder. The network outputs a fused image with the same resolution as the source images; VGG-19 serves as the loss network, and a detail-to-semantic information loss function based on multi-level features evaluates network performance for training. The method specifically comprises the following steps:
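The training-time RepVGG block described above can be sketched as follows; this is a minimal PyTorch illustration (cf. FIG. 2) assuming stride 1 and equal input/output channels, not the patent's exact configuration.

```python
import torch.nn as nn

class RepVGGBlock(nn.Module):
    # Training-time RepVGG block: parallel 3x3, 1x1 and identity branches,
    # each batch-normalized, are summed and passed through ReLU. At
    # inference the three branches are fused into a single 3x3 convolution.
    def __init__(self, channels: int):
        super().__init__()
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.identity = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.branch3(x) + self.branch1(x) + self.identity(x))
```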
Step S1: in the encoder, extract the basic depth features of the image with a deep learning convolutional neural network model, and divide the features into low-level and high-level features according to the number of channels. The specific implementation is as follows (an illustrative encoder sketch is given after step S1.4):
Step S1.1: read the input image as a gray image and scale it to a uniform size; take the RepVGG convolutional neural network architecture as the pre-trained base convolutional neural network, remove its final pooling layer and fully connected layer, and add one convolution layer;
Step S1.2: input the scaled gray images into the modified RepVGG convolutional neural network architecture; after a series of convolution, batch normalization and ReLU operations, output the feature maps C_1 and C_2;
Step S1.3: for feature maps C_1 and C_2, slice the first 128 channels as L and the last 128 channels as S;
Step S1.4: process the input visible light image and the corresponding infrared image through steps S1.1, S1.2 and S1.3 respectively to obtain the four sub-tensors L_x, S_x, L_y and S_y;
Step S2: input the acquired low-level and high-level features into the decoder, where a feature fusion module merges them into new depth features and generates the fused image. The decoder consists of a series of convolution layers and a dropout layer; passing the convolution outputs through the dropout layer reduces the model's overfitting risk and enhances its generalization. The specific implementation is as follows (an illustrative decoder sketch is given after step S2.4):
Step S2.1: input the four sub-tensors L_x, S_x, L_y and S_y into separate convolution layers to obtain C_11, C_12, C_21 and C_22 in turn;
Step S2.2: concatenate the two tensors C_11 and C_21 along dimension 1 to obtain C_3, and concatenate the two tensors C_12 and C_22 along dimension 1 to obtain C_4;
Step S2.3: input the two tensors C_3 and C_4 into separate convolution layers to obtain the low and high tensors;
Step S2.4: perform element-wise addition on the low and high tensors to generate the fused image.
Step S3: use the VGG-19 network as the loss network and evaluate network performance with a detail-to-semantic information loss function based on multi-level features. The specific implementation is as follows (a sketch of the loss computation is given after step S3.5):
Step S3.1: use VGG-19 trained on ImageNet as the loss network, selecting 4 convolution blocks to extract features;
Step S3.2: the total loss has the form L_total = γ_1·L_pixel + γ_2·L_shallow + L_middle + γ_4·L_deep, where γ_1, γ_2 and γ_4 are the weights of the corresponding loss terms; L_pixel denotes the pixel-level loss, and L_shallow, L_middle and L_deep denote the shallow, intermediate and deep feature losses respectively, with the features extracted by the pre-trained network;
Step S3.3: normalize and scale the fused image output in step S2.4, map its values into the range 0-255, and compute the loss value L_pixel with the mean-squared-error loss function (mse_loss);
Step S3.4: input the fused image from step S2.4, the visible light image and the infrared image into the VGG-19 network to output I_f, I_vis and I_ir respectively;
Step S3.5: combine I_f, I_vis and I_ir to compute the loss values L_shallow, L_middle and L_deep with the mean-squared-error loss function (mse_loss); then sum these terms with L_pixel according to the formula in step S3.2 to obtain L_total, and update the weights with the back-propagation algorithm.
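Steps S3.1-S3.5 can be illustrated with the following hedged PyTorch sketch. The VGG-19 tap indices, the pairing of the fused image with both source images in each MSE term, and the mapping of the four taps onto the shallow/middle/deep levels are illustrative assumptions, not the patent's exact choices.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG-19 loss network; the tap indices (after four convolution
# blocks) are illustrative choices.
_vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)
_TAPS = (3, 8, 17, 26)

def vgg_feats(img):
    feats, x = [], img.repeat(1, 3, 1, 1)       # gray -> 3 channels for VGG
    for i, layer in enumerate(_vgg):
        x = layer(x)
        if i in _TAPS:
            feats.append(x)
    return feats

def total_loss(fused, vis, ir, g1=1.0, g2=1.0, g4=1.0):
    # Each level compares the fused image against both source images.
    l_pixel = F.mse_loss(fused, vis) + F.mse_loss(fused, ir)
    ff, fv, fi = vgg_feats(fused), vgg_feats(vis), vgg_feats(ir)
    lvl = [F.mse_loss(a, b) + F.mse_loss(a, c)
           for a, b, c in zip(ff, fv, fi)]
    # Assumed mapping of the four taps onto shallow/middle/deep levels.
    l_shallow, l_middle, l_deep = lvl[0], 0.5 * (lvl[1] + lvl[2]), lvl[3]
    return g1 * l_pixel + g2 * l_shallow + l_middle + g4 * l_deep
```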
The effect of the invention is demonstrated by the following experiments conducted according to the method described above.
Test environment: Python 3.9; PyTorch framework; Windows 11 system; NVIDIA RTX 3070 GPU.
Test data: the selected training dataset is KAIST, an image dataset for visible light and infrared image fusion. The selected test datasets are TNO and VOT2020-RGBT, from the public multi-modal datasets. 21 pairs of IR-VI images are selected from TNO for testing, and 40 pairs of images are selected from VOT2020-RGBT and TNO to construct a new test dataset. These images are of arbitrary size and are converted to grayscale.
Test metrics: the invention selects 6 quality metrics to evaluate fusion performance objectively: entropy (En); standard deviation (SD); mutual information (MI); the improved fusion artifact measure (Nabf); the sum of the correlations of differences (SCD); and multi-scale structural similarity (MS-SSIM). For all metrics except Nabf, larger values indicate better fusion performance (for Nabf, smaller is better). The computation of En and MI is sketched below.
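For reference, entropy (En) and mutual information (MI) are commonly computed from image histograms as in the following sketch; these are the standard definitions, not formulas specific to the patent.

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    # Shannon entropy of an 8-bit grayscale image (larger = more information).
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(a: np.ndarray, b: np.ndarray) -> float:
    # MI between two 8-bit images via their joint histogram.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=256)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())
```

For fusion evaluation, MI is usually reported as MI(fused, visible) + MI(fused, infrared).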
The test results are shown in tables 1 and 2.
Table 1 shows the averages of the 6 quality metrics for the invention (RepVGGfuse) and other algorithms on the fused images of the 21 pairs of infrared and visible images.
Table 1 average comparison of quality indicators
Table 2 shows the averages of the 6 quality metrics for the invention (RepVGGfuse) and other algorithms on the fused images of the 40 pairs of infrared and visible images.
Table 2 average comparison of quality indicators
As can be seen from the comparison of the data, the fusion performance of the method is superior to that of most of the existing fusion methods.

Claims (10)

1. A visible light and infrared image fusion method based on structural re-parameterization, characterized by comprising the following steps:
(1) passing the input visible light image through several RepVGG blocks and one convolution layer to extract low-level features L_x and high-level features S_x, then inputting L_x and S_x into separate convolution layers to obtain the two tensors C_11 and C_12;
(2) passing the input infrared image through several RepVGG blocks and one convolution layer to extract low-level features L_y and high-level features S_y, then inputting L_y and S_y into separate convolution layers to obtain the two tensors C_21 and C_22;
(3) concatenating the two tensors C_11 and C_21 along dimension 1 to obtain C_3, and concatenating the two tensors C_12 and C_22 along dimension 1 to obtain C_4;
(4) inputting the two tensors C_3 and C_4 into separate convolution layers to obtain a low tensor and a high tensor;
(5) performing element-wise addition on the low and high tensors to generate the fused image.
2. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (1) comprises the following step:
(1.1) reading the input visible light image as a gray image, scaling it to a uniform size, taking the RepVGG convolutional neural network architecture as the base convolutional neural network, removing its final pooling layer and fully connected layer, and adding one convolution layer.
3. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (1) comprises the following step:
(1.2) inputting the scaled gray image into the modified RepVGG convolutional neural network architecture and, after convolution, batch normalization and rectified linear unit operations, outputting the feature map C_1.
4. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (1) comprises the following step:
(1.3) slicing the first 128 channels of feature map C_1 as L_x and the last 128 channels as S_x.
5. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (1) comprises the following step:
(1.4) inputting the two sub-tensors L_x and S_x into separate convolution layers to obtain C_11 and C_12 in turn.
6. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (2) comprises the following step:
(2.1) reading the input infrared image as a gray image, scaling it to a uniform size, taking the RepVGG convolutional neural network architecture as the base convolutional neural network, removing its final pooling layer and fully connected layer, and adding one convolution layer.
7. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (2) comprises the following step:
(2.2) inputting the scaled gray image into the modified RepVGG convolutional neural network architecture and, after convolution, batch normalization and rectified linear unit operations, outputting the feature map C_2.
8. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (2) comprises the following step:
(2.3) slicing the first 128 channels of feature map C_2 as L_y and the last 128 channels as S_y.
9. The visible light and infrared image fusion method based on structural re-parameterization according to claim 1, characterized in that step (2) comprises the following step:
(2.4) inputting the two sub-tensors L_y and S_y into separate convolution layers to obtain C_21 and C_22 in turn.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the visible light and infrared image fusion method based on structural re-parameterization according to any one of claims 1-8.
CN202310932335.6A 2023-07-27 2023-07-27 Visible light and infrared image fusion method based on structural re-parameterization Pending CN116993639A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310932335.6A CN116993639A (en) 2023-07-27 2023-07-27 Visible light and infrared image fusion method based on structural re-parameterization

Publications (1)

Publication Number Publication Date
CN116993639A true CN116993639A (en) 2023-11-03

Family

ID=88525970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310932335.6A Pending CN116993639A (en) 2023-07-27 2023-07-27 Visible light and infrared image fusion method based on structural re-parameterization

Country Status (1)

Country Link
CN (1) CN116993639A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688936A (en) * 2024-02-04 2024-03-12 江西农业大学 Low-rank multi-mode fusion emotion analysis method for graphic fusion
CN117688936B (en) * 2024-02-04 2024-04-19 江西农业大学 Low-rank multi-mode fusion emotion analysis method for graphic fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination