CN117058507A - Fourier convolution-based visible light and infrared image multi-scale feature fusion method - Google Patents

Fourier convolution-based visible light and infrared image multi-scale feature fusion method Download PDF

Info

Publication number
CN117058507A
Authority
CN
China
Prior art keywords
convolution
fusion
feature map
feature
infrared image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311037544.0A
Other languages
Chinese (zh)
Other versions
CN117058507B (en)
Inventor
程文明
陈国强
魏振兴
张国财
唐长华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Aerospace Runbo Measurement And Control Technology Co ltd
Original Assignee
Zhejiang Aerospace Runbo Measurement And Control Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Aerospace Runbo Measurement And Control Technology Co ltd filed Critical Zhejiang Aerospace Runbo Measurement And Control Technology Co ltd
Priority to CN202311037544.0A priority Critical patent/CN117058507B/en
Publication of CN117058507A publication Critical patent/CN117058507A/en
Application granted granted Critical
Publication of CN117058507B publication Critical patent/CN117058507B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a Fourier convolution-based visible light and infrared image multi-scale feature fusion method, which comprises the following steps: A. acquiring an RGB image and an infrared image to be fused; B. extracting deep semantic information from the RGB image and the infrared image through a multi-scale feature extractor to obtain an RGB image and an infrared image with deep semantic information; C. performing multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map; D. fusing the features of different layers in the multi-source information fusion feature map using a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map; E. processing the multi-scale feature fusion feature map with a covariance pooling module using global covariance pooling to obtain a comprehensive fusion feature map. The application effectively fuses the infrared image and the RGB image together, yielding more comprehensive and more accurate image data.

Description

Fourier convolution-based visible light and infrared image multi-scale feature fusion method
Technical Field
The application relates to an image feature extraction and processing method, in particular to a Fourier convolution-based multi-scale feature fusion method for visible light and infrared images.
Background
In the military and security fields, infrared or visible light detection images are commonly used for object detection and identification. Infrared detection images capture infrared radiation outside the visible spectrum, which cannot be perceived by the human eye. Infrared radiation penetrates certain substances and environments well and can pass through smoke, haze, cloud layers and other visual obstructions, so infrared images can still provide effective image information under severe weather conditions, which benefits observation and monitoring in complex environments. However, infrared images have limitations, such as relatively low resolution and imaging quality that is affected by environmental factors. Visible light detection images have weaker penetrability but higher resolution and good imaging quality. Combining the infrared image with the RGB image can overcome their respective limitations and improve the comprehensiveness and usability of the imagery. Therefore, there is a need for a method that effectively fuses infrared and RGB images together to obtain more comprehensive and accurate image data.
Disclosure of Invention
The application aims to provide a Fourier convolution-based multi-scale feature fusion method for visible light and infrared images. The application effectively fuses the infrared image and the RGB image together, yielding more comprehensive and more accurate image data.
The technical scheme of the application is as follows: the Fourier convolution-based visible light and infrared image multi-scale feature fusion method comprises the following steps:
A. acquiring an RGB image and an infrared image to be fused;
B. deep semantic information in the RGB image and the infrared image is extracted through a multi-scale feature extractor, and the RGB image and the infrared image with the deep semantic information are obtained;
C. carrying out multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map;
D. fusing the features of different layers in the multi-source information fusion feature map by using a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map;
E. and the covariance pooling module processes the multi-scale feature fusion feature map by adopting a global covariance pooling mode to obtain a comprehensive fusion feature map.
In the aforementioned Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific process of multi-source information fusion by the fast Fourier convolution module is as follows:

C1, representing the RGB image with deep semantic information as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the band and $r \times c$ denotes the pixel height and width; representing the infrared image with deep semantic information as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the band and $r \times c$ denotes the pixel height and width;

C2, explicitly decomposing $X_r$ and $X_i$ along the channel dimension with the fast Fourier convolution module, yielding a feature map $Y_l^{H \to H}$ for the mapping of high-frequency branch H to high-frequency branch H, a feature map $Y_l^{H \to L}$ for the mapping of high-frequency branch H to low-frequency branch L, a feature map $Y_h^{L \to H}$ for the mapping of low-frequency branch L to high-frequency branch H, and a feature map $Y_h^{L \to L}$ for the mapping of low-frequency branch L to low-frequency branch L;

C3, concatenating $Y_l^{H \to H}$ with $Y_h^{L \to H}$, and $Y_l^{H \to L}$ with $Y_h^{L \to L}$, to obtain two concatenated feature maps $X \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively;

C4, splitting the concatenated feature map X along the feature-channel dimension with the fast Fourier convolution module, i.e., $X = \{X^l, X^g\}$, where the local part $X^l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ is used for learning from local neighborhoods, the global part $X^g \in \mathbb{R}^{H \times W \times \alpha_{in} C}$ is used for capturing long-range context, and $\alpha_{in} \in [0,1]$ denotes the fraction of feature channels assigned to the global part;

C5, using $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor, letting $Y = \{Y^l, Y^g\}$, and updating it with equation (1):

$Y^l = Y^{l \to l} + Y^{g \to l} = f_l(X^l) + f_{g \to l}(X^g)$
$Y^g = Y^{g \to g} + Y^{l \to g} = f_g(X^g) + f_{l \to g}(X^l)$    (1)

C6, applying a 3×3 convolution to $Y^l$ and $Y^g$, then fusing the two to obtain the output tensor Y, i.e., the multi-source information fusion feature map.
in the method for fusing the multi-scale features of the visible light and the infrared image based on Fourier convolution, the specific processing procedure of the multi-scale feature fusion module is as follows: sequentially carrying out bottleneck processing on the multisource information fusion feature map through a plurality of bottleneck blocks which are connected in series to obtain a multiscale feature fusion feature map;
the convolution window moving stride of the bottleneck block comprises two modes, namely 1 and 2; when the convolution window moving step of the bottleneck block is 1, firstly carrying out feature extraction by using 1X 1 convolution processing on the bottleneck block through depth convolution, and finally carrying out point convolution processing; when the convolution window moving step of the bottleneck block is 2, the bottleneck block firstly uses 1×1 convolution processing, then uses multi-scale convolution to extract features, and finally carries out point convolution processing.
In the aforementioned Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific processing procedure of the multi-scale convolution is as follows: the input feature map is divided equally into s groups by channel; features are then extracted from the first group's input feature map using a 3×3 convolution; the extracted feature output of the first group is sent to the second group and added to the input of the second group, and the result of the addition is sent to the second group's 3×3 convolution; this is repeated until the final group of feature maps has been processed; finally, all extracted feature outputs are concatenated by channel and a 1×1 point convolution performs information fusion.
In the aforementioned Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific processing procedure of the covariance pooling module is as follows:

E1, first converting the multi-scale feature fusion feature map of size h×w×d into a feature map of size n×d, where n = h×w; h and w denote the height and width of the feature map, respectively, and d denotes the size of the third (channel) dimension of the feature map;

E2, computing the covariance matrix $\Sigma = X^{T} \bar{I} X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, I is the n×n identity matrix, $\mathbf{1}$ is the n×n matrix whose elements are all 1, and X denotes the original input feature map fed to the covariance pooling module;

E3, pre-normalizing the covariance matrix Σ by the formula $A = \frac{1}{\operatorname{tr}(\Sigma)}\Sigma$;

E4, performing iterative processing with the Newton-Schulz iteration formula;

E5, performing post-compensation processing and splicing processing in sequence.
In the aforementioned Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the Newton-Schulz iteration formula is

$Y_k = \frac{1}{2} Y_{k-1}\left(3I - Z_{k-1} Y_{k-1}\right), \qquad Z_k = \frac{1}{2}\left(3I - Z_{k-1} Y_{k-1}\right) Z_{k-1}$

where I denotes the identity matrix; $Y_{k-1}$ denotes the result obtained after k−1 iterations starting from the matrix A, and likewise $Y_k$ the result after k iterations; $Z_{k-1}$ denotes the result obtained after k−1 iterations starting from the identity matrix I, and likewise $Z_k$ the result after k iterations.
In the aforementioned Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the calculation formula of the post-compensation processing is: $C = \left(\operatorname{tr}(\Sigma)\right)^{1/2} Y_N$, where tr(Σ) is the trace of the covariance matrix and $Y_N$ is the result obtained after N iterations.
In the aforementioned Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific process of the splicing processing is to splice the upper triangular part of the symmetric matrix obtained by the post-compensation processing into a d(d−1)/2-dimensional vector, thereby obtaining the comprehensive fusion feature map.
Compared with the prior art, the application sequentially performs, on the RGB image and the infrared image, feature extraction by the multi-scale feature extractor, multi-source information fusion by the fast Fourier convolution module, fusion of different layer features by the multi-scale feature fusion module, and global covariance pooling by the covariance pooling module. The RGB image and the infrared image are thereby effectively fused together, capturing thermal and color information simultaneously, achieving efficient feature fusion and more comprehensive target analysis and feature extraction, and yielding more comprehensive and more accurate image data that provides powerful support for analysis and research in many fields. For example, in the military and security fields, combining infrared images with RGB images enables more accurate target detection and identification and improves night vision and target tracking capabilities.
Specifically, deep semantic information in an image is extracted through a multi-scale feature extractor; the fast Fourier convolution module performs multi-source information fusion and retains discrimination information; fusing the features of different layers in the feature map by utilizing a multi-scale feature fusion module; global covariance pooling replaces global average pooling, and high-order information is extracted from RGB images and infrared images to obtain richer depth feature statistical information.
In summary, the application effectively fuses the infrared image and the RGB image together to obtain more comprehensive and more accurate image data.
Extensive experiments on a baseline dataset showed that, compared with using only infrared images or only RGB images, the classification accuracy obtained by fusion increased by 2.036% and 1.926%, respectively.
Drawings
FIG. 1 is a schematic flow chart of the present application;
FIG. 2 is a schematic diagram of the overall structure of the present application;
FIG. 3 is a schematic diagram of a fast Fourier convolution module according to the present application;
FIG. 4 is a schematic diagram of the structure of a fast Fourier convolution layer in the fast Fourier convolution module of the present application, wherein (a) is the overall diagram of the Fourier convolution module and (b) is the specific structure of the Spectral Transformer branch in (a);
FIG. 5 is a schematic diagram of a bottleneck block according to the present application;
FIG. 6 is a schematic diagram of a multi-scale convolution in a bottleneck block according to an embodiment of the present application;
fig. 7 is a schematic flow chart of a covariance pooling module according to an embodiment of the application.
Detailed Description
The application is further illustrated by the following figures and examples, which are not intended to be limiting.
Example 1. A Fourier convolution-based visible light and infrared image multi-scale feature fusion method comprises the following steps:
A. acquiring an RGB image and an infrared image to be fused;
B. deep semantic information in the RGB image and the infrared image is extracted through a multi-scale feature extractor, and the RGB image and the infrared image with the deep semantic information are obtained;
C. carrying out multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map;
D. fusing the features of different layers in the multi-source information fusion feature map by using a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map;
E. and the covariance pooling module processes the multi-scale feature fusion feature map by adopting a global covariance pooling mode to obtain a comprehensive fusion feature map.
The specific process of multi-source information fusion by the fast Fourier convolution module is as follows:

C1, representing the RGB image with deep semantic information as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the band and $r \times c$ denotes the pixel height and width; representing the infrared image with deep semantic information as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the band and $r \times c$ denotes the pixel height and width;

C2, explicitly decomposing $X_r$ and $X_i$ along the channel dimension with the fast Fourier convolution module, yielding a feature map $Y_l^{H \to H}$ for the mapping of high-frequency branch H to high-frequency branch H, a feature map $Y_l^{H \to L}$ for the mapping of high-frequency branch H to low-frequency branch L, a feature map $Y_h^{L \to H}$ for the mapping of low-frequency branch L to high-frequency branch H, and a feature map $Y_h^{L \to L}$ for the mapping of low-frequency branch L to low-frequency branch L;

C3, concatenating $Y_l^{H \to H}$ with $Y_h^{L \to H}$, and $Y_l^{H \to L}$ with $Y_h^{L \to L}$, to obtain two concatenated feature maps $X \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively;

C4, splitting the concatenated feature map X along the feature-channel dimension with the fast Fourier convolution module, i.e., $X = \{X^l, X^g\}$, where the local part $X^l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ is used for learning from local neighborhoods, the global part $X^g \in \mathbb{R}^{H \times W \times \alpha_{in} C}$ is used for capturing long-range context, and $\alpha_{in} \in [0,1]$ denotes the fraction of feature channels assigned to the global part;

C5, using $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor, letting $Y = \{Y^l, Y^g\}$, and updating it with equation (1):

$Y^l = Y^{l \to l} + Y^{g \to l} = f_l(X^l) + f_{g \to l}(X^g)$
$Y^g = Y^{g \to g} + Y^{l \to g} = f_g(X^g) + f_{l \to g}(X^l)$    (1)

C6, applying a 3×3 convolution to $Y^l$ and $Y^g$, then fusing the two to obtain the output tensor Y, i.e., the multi-source information fusion feature map.
The specific processing procedure of the multi-scale feature fusion module is as follows: the multi-source information fusion feature map is passed sequentially through a plurality of bottleneck blocks connected in series to obtain the multi-scale feature fusion feature map;
the convolution window moving stride of the bottleneck block has two modes, 1 and 2; when the stride of the bottleneck block is 1, the bottleneck block first applies a 1×1 convolution, then performs feature extraction by depthwise convolution, and finally applies a point convolution; when the stride of the bottleneck block is 2, the bottleneck block first applies a 1×1 convolution, then extracts features with a multi-scale convolution, and finally applies a point convolution.
The specific processing procedure of the multi-scale convolution is as follows: the input feature map is divided equally into s groups by channel; features are then extracted from the first group's input feature map using a 3×3 convolution; the extracted feature output of the first group is sent to the second group and added to the input of the second group, and the result of the addition is sent to the second group's 3×3 convolution; this is repeated until the final group of feature maps has been processed; finally, all extracted feature outputs are concatenated by channel and a 1×1 point convolution performs information fusion.
The specific processing procedure of the covariance pooling module is as follows:

E1, first converting the multi-scale feature fusion feature map of size h×w×d into a feature map of size n×d, where n = h×w; h and w denote the height and width of the feature map, respectively, and d denotes the size of the third (channel) dimension of the feature map;

E2, computing the covariance matrix $\Sigma = X^{T} \bar{I} X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, I is the n×n identity matrix, $\mathbf{1}$ is the n×n matrix whose elements are all 1, and X denotes the original input feature map fed to the covariance pooling module;

E3, pre-normalizing the covariance matrix Σ by the formula $A = \frac{1}{\operatorname{tr}(\Sigma)}\Sigma$;

E4, performing iterative processing with the Newton-Schulz iteration formula;

E5, performing post-compensation processing and splicing processing in sequence.
The Newton-Schulz iteration formula is

$Y_k = \frac{1}{2} Y_{k-1}\left(3I - Z_{k-1} Y_{k-1}\right), \qquad Z_k = \frac{1}{2}\left(3I - Z_{k-1} Y_{k-1}\right) Z_{k-1}$

where I denotes the identity matrix; $Y_{k-1}$ denotes the result obtained after k−1 iterations starting from the matrix A, and likewise $Y_k$ the result after k iterations; $Z_{k-1}$ denotes the result obtained after k−1 iterations starting from the identity matrix I, and likewise $Z_k$ the result after k iterations.
The calculation formula of the post-compensation processing is: $C = \left(\operatorname{tr}(\Sigma)\right)^{1/2} Y_N$, where tr(Σ) is the trace of the covariance matrix and $Y_N$ is the result obtained after N iterations.
The specific process of the splicing processing is to splice the upper triangular part of the symmetric matrix obtained by the post-compensation processing into a d(d−1)/2-dimensional vector, obtaining the comprehensive fusion feature map.
Example 2. Based on the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the framework of the application is designed to perform pixel-level classification by fusing multi-source remote sensing images, as shown in Fig. 2. It mainly consists of two parts: 1) multi-source frequency decomposition and fusion based on the fast Fourier convolution module (FFCN) (first part); 2) feature extraction by the multi-scale layer feature fusion module and the covariance pooling module (GCP module) (second part).
A multi-scale feature fusion covariance network based on the fast Fourier convolution module (F²MCN) is constructed, focusing on efficient feature fusion and comprehensive feature extraction. First, the FFCN uses fast Fourier convolution layers to fuse multi-source information while retaining discriminative information. Then, a multi-scale feature fusion (MF²) module fuses the features of different layers in the F²MCN. Finally, global covariance pooling (GCP) replaces global average pooling (GAP), extracting high-order information from the RGB and infrared images to obtain richer deep feature statistics.
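As an illustration only, the following minimal PyTorch sketch shows how steps A-E could be wired together; the four injected module classes are hypothetical placeholders for the extractor, FFCN, MF² and GCP components described above, not the patented implementation itself.

import torch
import torch.nn as nn

class F2MCN(nn.Module):
    """Skeleton of the F2MCN pipeline; the component modules are injected."""
    def __init__(self, extractor, ffc_fusion, msf_fusion, gcp_pool):
        super().__init__()
        self.extractor = extractor    # step B: multi-scale feature extractor
        self.ffc_fusion = ffc_fusion  # step C: fast Fourier convolution module
        self.msf_fusion = msf_fusion  # step D: multi-scale feature fusion module
        self.gcp_pool = gcp_pool      # step E: global covariance pooling

    def forward(self, rgb, infrared):        # step A: the two inputs to fuse
        x_r = self.extractor(rgb)            # deep semantic RGB features
        x_i = self.extractor(infrared)       # deep semantic infrared features
        fused = self.ffc_fusion(x_r, x_i)    # multi-source information fusion feature map
        multiscale = self.msf_fusion(fused)  # multi-scale feature fusion feature map
        return self.gcp_pool(multiscale)     # comprehensive fusion feature map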
Fig. 3 shows the fast Fourier convolution (FFC Conv) layer in the lower half of Fig. 2; this more efficient convolution layer has previously been used for visible image classification. Simple feature splicing or superposition operations very easily produce redundant information. Fusing visible light and infrared image information with classical feature extraction and fusion methods reduces part of this redundancy, but redundancy still remains in the low-frequency part. The present application first uses a Fourier convolution layer to decompose the input image into a multi-resolution representation, which makes it easier to reduce spatial redundancy.
In this step, the visible light image (RGB image) is represented as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the band and $r \times c$ denotes the pixel height and width. The infrared image is represented as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the band and $r \times c$ denotes the pixel height and width.
The fast Fourier convolution (FFC) explicitly decomposes $X_r$ and $X_i$ along the channel dimension, where $Y_l^{H \to H}$ denotes the feature map for the mapping of the high-frequency branch (H) to the high-frequency branch (H), $Y_l^{H \to L}$ the feature map for the mapping of the high-frequency branch (H) to the low-frequency branch (L), $Y_h^{L \to H}$ the feature map for the mapping of the low-frequency branch (L) to the high-frequency branch (H), and $Y_h^{L \to L}$ the feature map for the mapping of the low-frequency branch (L) to the low-frequency branch (L).
The FFC architecture is shown in Fig. 4(a), and Fig. 4(b) is a block diagram of the Spectral Transformer. Conceptually, the FFC consists of two interconnected paths: a spatial (or local) path that applies an ordinary convolution to a portion of the input feature channels, and a spectral (or global) path that operates in the spectral domain. Each path captures complementary information with a different receptive field, and information is exchanged between the paths internally.
Formally, let $X \in \mathbb{R}^{H \times W \times C}$ be the input feature map of the FFC, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively. At the FFC entrance, X is first split along the feature-channel dimension, i.e., $X = \{X^l, X^g\}$. The local part $X^l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ learns from local neighborhoods; the global part $X^g \in \mathbb{R}^{H \times W \times \alpha_{in} C}$ is intended to capture long-range context. $\alpha_{in} \in [0,1]$ denotes the fraction of feature channels assigned to the global part. To simplify the network, the output is assumed to have the same size as the input. Use $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor. Similarly, let $Y = \{Y^l, Y^g\}$ be the local-global partition, where the global proportion of the output tensor is controlled by the hyperparameter $\alpha_{out} \in [0,1]$. The update process inside the FFC can be described by the following formula:

$Y^l = Y^{l \to l} + Y^{g \to l} = f_l(X^l) + f_{g \to l}(X^g)$
$Y^g = Y^{g \to g} + Y^{l \to g} = f_g(X^g) + f_{l \to g}(X^l)$    (1)
The component $Y^{l \to l}$ is intended to capture small-scale information using conventional convolution. Likewise, the other two components ($Y^{g \to l}$ and $Y^{l \to g}$) are obtained by inter-path conversion and are also implemented with conventional convolutions, so as to make full use of the multi-scale receptive field. The main complexity lies in the computation of $Y^{g \to g}$. For clarity of description, we call $f_g$ the spectral transformer, as shown in Fig. 4(b).
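A compact sketch of an FFC layer in this spirit is given below, assuming PyTorch. The local-global split, the three conventional-convolution components of equation (1), and a Fourier unit for $f_g$ follow the published FFC design; the 1×1 spectral convolution, the normalization/activation placement, and the split ratio alpha are illustrative assumptions rather than the patented layer.

import torch
import torch.nn as nn

class FourierUnit(nn.Module):
    # Spectral path used inside f_g: real FFT -> 1x1 conv on stacked
    # real/imaginary channels -> inverse real FFT.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.bn = nn.BatchNorm2d(channels * 2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # (b, c, h, w//2 + 1)
        spec = torch.cat([spec.real, spec.imag], dim=1)  # stack re/im as channels
        spec = self.relu(self.bn(self.conv(spec)))       # pointwise conv in spectrum
        real, imag = spec.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

class FFC(nn.Module):
    def __init__(self, channels, alpha=0.5):
        super().__init__()
        cg = int(channels * alpha)                    # global channels (alpha share)
        cl = channels - cg                            # local channels
        self.cl = cl
        self.f_l = nn.Conv2d(cl, cl, 3, padding=1)    # Y^{l->l}: conventional conv
        self.f_l2g = nn.Conv2d(cl, cg, 3, padding=1)  # Y^{l->g}: inter-path conv
        self.f_g2l = nn.Conv2d(cg, cl, 3, padding=1)  # Y^{g->l}: inter-path conv
        self.f_g = FourierUnit(cg)                    # Y^{g->g}: spectral transformer

    def forward(self, x):
        xl, xg = x[:, :self.cl], x[:, self.cl:]       # split X = {X^l, X^g}
        yl = self.f_l(xl) + self.f_g2l(xg)            # equation (1), local output
        yg = self.f_g(xg) + self.f_l2g(xl)            # equation (1), global output
        return torch.cat([yl, yg], dim=1)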
The Bottleneck blocks in Fig. 3 use stride in two modes, 1 and 2. When the Bottleneck stride is set to 1, the Bottleneck has the structure shown in the left half of Fig. 5: the dimension is first raised using a 1×1 convolution, feature extraction is performed by a depthwise convolution (DW Conv), and finally a point convolution is applied. When the stride is set to 2, the Bottleneck has the structure shown in the right half of Fig. 5: the dimension is raised using a 1×1 convolution, features are then extracted with a multi-scale convolution (MS Conv, whose detailed structure is shown in Fig. 6), and finally a point convolution is applied.
Fig. 6 shows the MS Conv structure with split s = 4; the input feature map is divided equally into s groups by channel. Features are extracted from the input feature map of the first group using a 3×3 convolution. The output of the first group is then sent to the second group and added to the second group's input; the result of the addition is sent to the second group's 3×3 convolution. This process is repeated until the final group of feature maps has been processed. Finally, all outputs are concatenated by channel and a 1×1 point convolution performs information fusion.
Compared with the bottleneck block of ResNet, the first 1×1 convolution of the bottleneck block of the present application raises the dimension of the input feature map, which provides sufficient channels for MS Conv to perform multi-scale feature extraction. Since in MS Conv the output of the previous group is added to the input of the current group, the feature maps must be of the same size, so MS Conv is used only when the bottleneck stride is 1.
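The following sketch shows the group-wise MS Conv wiring described above, assuming PyTorch; the group count s = 4, the equal channel split, and the absence of normalization layers are illustrative simplifications.

import torch
import torch.nn as nn

class MSConv(nn.Module):
    def __init__(self, channels, s=4):
        super().__init__()
        assert channels % s == 0, "channels must divide evenly into s groups"
        self.s = s
        g = channels // s
        # one 3x3 convolution per channel group
        self.convs = nn.ModuleList(
            nn.Conv2d(g, g, kernel_size=3, padding=1) for _ in range(s)
        )
        self.point = nn.Conv2d(channels, channels, kernel_size=1)  # 1x1 fusion

    def forward(self, x):
        groups = torch.chunk(x, self.s, dim=1)  # split the input by channel
        outs, prev = [], None
        for i, g_in in enumerate(groups):
            # add the previous group's extracted features to this group's input
            inp = g_in if prev is None else g_in + prev
            prev = self.convs[i](inp)
            outs.append(prev)
        # concatenate all extracted outputs by channel, then fuse with 1x1 conv
        return self.point(torch.cat(outs, dim=1))

# e.g. y = MSConv(64, s=4)(torch.randn(1, 64, 32, 32))  # y: (1, 64, 32, 32)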
The architecture of GCP is shown in Fig. 7. The feature map of size h×w×d output by the multi-scale feature fusion is converted into a feature map of size n×d, where n = h×w. First, the covariance matrix is computed as $\Sigma = X^{T} \bar{I} X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, I is the n×n identity matrix, and $\mathbf{1}$ is the n×n matrix whose elements are all 1.
Then, in a pre-normalization step, the covariance matrix Σ is divided by its trace, $A = \frac{1}{\operatorname{tr}(\Sigma)}\Sigma$, where tr(·) denotes the trace of a matrix. This is done so that the subsequent Newton-Schulz iteration converges. The iteration formula is:

$Y_k = \frac{1}{2} Y_{k-1}\left(3I - Z_{k-1} Y_{k-1}\right), \qquad Z_k = \frac{1}{2}\left(3I - Z_{k-1} Y_{k-1}\right) Z_{k-1}$

where I denotes the identity matrix; $Y_{k-1}$ denotes the result obtained after k−1 iterations starting from the matrix A, and likewise $Y_k$ the result after k iterations; $Z_{k-1}$ denotes the result obtained after k−1 iterations starting from the identity matrix I, and likewise $Z_k$ the result after k iterations.
in the post-compensation, the result Y obtained after N iterations N Multiplying the square root of the covariance matrix trace, c= (tr (Σ)) 1/2 Y N To eliminate the adverse effects of pre-normalization. And finally, splicing the upper triangular matrix of the symmetrical matrix C obtained by post-compensation into a d (d-1)/2-dimensional vector, and transmitting the d (d-1)/2-dimensional vector to the FC layer.
C=(tr(∑)) 1/2 Y N The function of this formula is to eliminate the adverse effects of pre-normalization.
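A minimal sketch of the GCP forward pass (steps E1-E5) follows, assuming PyTorch; the iteration count N = 5 is an illustrative assumption, and the strict upper triangle of length d(d−1)/2 follows the wording of the description.

import torch

def gcp_pool(feat, num_iter=5):
    # feat: (h, w, d) feature map -> d(d-1)/2-dimensional covariance feature vector
    h, w, d = feat.shape
    n = h * w
    x = feat.reshape(n, d)                               # E1: reshape to (n, d)
    # E2: Sigma = X^T I_bar X, with I_bar = (1/n)(I - (1/n) * all-ones matrix)
    i_bar = (torch.eye(n) - torch.ones(n, n) / n) / n
    sigma = x.T @ i_bar @ x                              # (d, d) covariance
    # E3: pre-normalize by the trace so the Newton-Schulz iteration converges
    tr = sigma.diagonal().sum()
    y, z = sigma / tr, torch.eye(d)                      # Y_0 = A, Z_0 = I
    # E4: coupled Newton-Schulz iteration for the matrix square root
    for _ in range(num_iter):
        t = 0.5 * (3.0 * torch.eye(d) - z @ y)
        y, z = y @ t, t @ z
    # E5: post-compensation, then splice the strict upper triangle
    c = torch.sqrt(tr) * y                               # C = tr(Sigma)^(1/2) * Y_N
    iu = torch.triu_indices(d, d, offset=1)
    return c[iu[0], iu[1]]                               # length d(d-1)/2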
The FC layer, also known as the fully connected layer, is a common layer type in deep learning neural networks. In an FC layer, each neuron is connected to all neurons of the previous layer, forming a fully connected structure; each neuron of the FC layer therefore has a weighted connection to every input neuron of the previous layer. The main function of the fully connected layer is to map the feature representation of the previous layer to the final output space. It can learn complex nonlinear relations among the input features, applying linear combination through the weight parameters followed by an activation function to generate the output. In deep learning, the fully connected layer is typically used for the final classification or regression task.
The application can also perform back propagation through the final result (the parameters in the GCP module), which facilitates learning and training of the Fourier convolution-based visible light and infrared image multi-scale feature fusion model. In back propagation, the partial derivatives of the loss function l with respect to the inputs of the covariance layers are obtained from the gradients associated with the network structure via the matrix back-propagation algorithm: the chain rule for general matrix functions is established by a first-order Taylor approximation, and the corresponding gradients $\partial l / \partial Y_{k-1}$ and $\partial l / \partial Z_{k-1}$ are then computed. From the chain rule of matrix back propagation and the Newton-Schulz iteration, applying a series of operations for k = N, …, 2 yields the gradients at each iteration step. In the pre-normalization step, $\partial l / \partial \Sigma$ is obtained by combining the gradient of the loss function l with respect to Σ with the gradient back-propagated through the post-compensation layer. Finally, the gradient of the loss function l with respect to the input matrix X can be derived.
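For concreteness, the first-order matrix chain rule underlying this matrix back-propagation can be stated as follows (a standard identity given here for orientation; the layer-specific gradient expressions are those of the patent's own formulas):

$\mathrm{d}l = \operatorname{tr}\!\left(\left(\frac{\partial l}{\partial Y}\right)^{T}\mathrm{d}Y\right) = \operatorname{tr}\!\left(\left(\frac{\partial l}{\partial X}\right)^{T}\mathrm{d}X\right), \qquad Y = f(X)$

so expressing dY in terms of dX through the first-order Taylor expansion of f yields $\partial l/\partial X$ from $\partial l/\partial Y$, layer by layer: from the FC layer back through the post-compensation, the Newton-Schulz iterations (k = N, …, 2), the pre-normalization, and the covariance computation.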
the parameters in the GCP module may be updated by back propagation formulas. GCP retains semantic information better than GAP. Most importantly, the GCP module is more suitable for GPU parallel operation.

Claims (8)

1. A Fourier convolution-based visible light and infrared image multi-scale feature fusion method, characterized by comprising the following steps:
A. acquiring an RGB image and an infrared image to be fused;
B. deep semantic information in the RGB image and the infrared image is extracted through a multi-scale feature extractor, and the RGB image and the infrared image with the deep semantic information are obtained;
C. carrying out multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map;
D. fusing the features of different layers in the multi-source information fusion feature map by using a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map;
E. and the covariance pooling module processes the multi-scale feature fusion feature map by adopting a global covariance pooling mode to obtain a comprehensive fusion feature map.
2. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 1, wherein the specific process of multi-source information fusion by the fast Fourier convolution module is as follows:

C1, representing the RGB image with deep semantic information as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the band and $r \times c$ denotes the pixel height and width; representing the infrared image with deep semantic information as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the band and $r \times c$ denotes the pixel height and width;

C2, explicitly decomposing $X_r$ and $X_i$ along the channel dimension with the fast Fourier convolution module, yielding a feature map $Y_l^{H \to H}$ for the mapping of high-frequency branch H to high-frequency branch H, a feature map $Y_l^{H \to L}$ for the mapping of high-frequency branch H to low-frequency branch L, a feature map $Y_h^{L \to H}$ for the mapping of low-frequency branch L to high-frequency branch H, and a feature map $Y_h^{L \to L}$ for the mapping of low-frequency branch L to low-frequency branch L;

C3, concatenating $Y_l^{H \to H}$ with $Y_h^{L \to H}$, and $Y_l^{H \to L}$ with $Y_h^{L \to L}$, to obtain two concatenated feature maps $X \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively;

C4, splitting the concatenated feature map X along the feature-channel dimension with the fast Fourier convolution module, i.e., $X = \{X^l, X^g\}$, where the local part $X^l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ is used for learning from local neighborhoods, the global part $X^g \in \mathbb{R}^{H \times W \times \alpha_{in} C}$ is used for capturing long-range context, and $\alpha_{in} \in [0,1]$ denotes the fraction of feature channels assigned to the global part;

C5, using $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor, letting $Y = \{Y^l, Y^g\}$, and updating it with equation (1):

$Y^l = Y^{l \to l} + Y^{g \to l} = f_l(X^l) + f_{g \to l}(X^g)$
$Y^g = Y^{g \to g} + Y^{l \to g} = f_g(X^g) + f_{l \to g}(X^l)$    (1)

C6, applying a 3×3 convolution to $Y^l$ and $Y^g$, then fusing the two to obtain the output tensor Y, i.e., the multi-source information fusion feature map.
3. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 1, wherein the specific processing procedure of the multi-scale feature fusion module is as follows: the multi-source information fusion feature map is passed sequentially through a plurality of bottleneck blocks connected in series to obtain the multi-scale feature fusion feature map;
the convolution window moving stride of the bottleneck block has two modes, 1 and 2; when the stride of the bottleneck block is 1, the bottleneck block first applies a 1×1 convolution, then performs feature extraction by depthwise convolution, and finally applies a point convolution; when the stride of the bottleneck block is 2, the bottleneck block first applies a 1×1 convolution, then extracts features with a multi-scale convolution, and finally applies a point convolution.
4. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 3, wherein the specific processing procedure of the multi-scale convolution is as follows: the input feature map is divided equally into s groups by channel; features are then extracted from the first group's input feature map using a 3×3 convolution; the extracted feature output of the first group is sent to the second group and added to the input of the second group, and the result of the addition is sent to the second group's 3×3 convolution; this is repeated until the final group of feature maps has been processed; finally, all extracted feature outputs are concatenated by channel and a 1×1 point convolution performs information fusion.
5. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 1, wherein the specific processing procedure of the covariance pooling module is as follows:

E1, first converting the multi-scale feature fusion feature map of size h×w×d into a feature map of size n×d, where n = h×w; h and w denote the height and width of the feature map, respectively, and d denotes the size of the third (channel) dimension of the feature map;

E2, computing the covariance matrix $\Sigma = X^{T} \bar{I} X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, I is the n×n identity matrix, $\mathbf{1}$ is the n×n matrix whose elements are all 1, and X denotes the original input feature map fed to the covariance pooling module;

E3, pre-normalizing the covariance matrix Σ by the formula $A = \frac{1}{\operatorname{tr}(\Sigma)}\Sigma$;

E4, performing iterative processing with the Newton-Schulz iteration formula;

E5, performing post-compensation processing and splicing processing in sequence.
6. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 5, wherein the Newton-Schulz iteration formula is

$Y_k = \frac{1}{2} Y_{k-1}\left(3I - Z_{k-1} Y_{k-1}\right), \qquad Z_k = \frac{1}{2}\left(3I - Z_{k-1} Y_{k-1}\right) Z_{k-1}$

where I denotes the identity matrix; $Y_{k-1}$ denotes the result obtained after k−1 iterations starting from the matrix A, and likewise $Y_k$ the result after k iterations; $Z_{k-1}$ denotes the result obtained after k−1 iterations starting from the identity matrix I, and likewise $Z_k$ the result after k iterations.
7. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 5, wherein the calculation formula of the post-compensation processing is: $C = \left(\operatorname{tr}(\Sigma)\right)^{1/2} Y_N$, where tr(Σ) is the trace of the covariance matrix and $Y_N$ is the result obtained after N iterations.
8. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 5, wherein the specific process of the splicing processing is to splice the upper triangular part of the symmetric matrix obtained by the post-compensation processing into a d(d−1)/2-dimensional vector, thereby obtaining the comprehensive fusion feature map.
CN202311037544.0A 2023-08-17 2023-08-17 Fourier convolution-based visible light and infrared image multi-scale feature fusion method Active CN117058507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311037544.0A CN117058507B (en) 2023-08-17 2023-08-17 Fourier convolution-based visible light and infrared image multi-scale feature fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311037544.0A CN117058507B (en) 2023-08-17 2023-08-17 Fourier convolution-based visible light and infrared image multi-scale feature fusion method

Publications (2)

Publication Number Publication Date
CN117058507A 2023-11-14
CN117058507B (en) 2024-03-19

Family

ID=88658487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311037544.0A Active CN117058507B (en) 2023-08-17 2023-08-17 Fourier convolution-based visible light and infrared image multi-scale feature fusion method

Country Status (1)

Country Link
CN (1) CN117058507B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401373A (en) * 2020-03-04 2020-07-10 武汉大学 Efficient semantic segmentation method based on packet asymmetric convolution
CN111738314A (en) * 2020-06-09 2020-10-02 南通大学 Deep learning method of multi-modal image visibility detection model based on shallow fusion
CN111899206A (en) * 2020-08-11 2020-11-06 四川警察学院 Medical brain image fusion method based on convolutional dictionary learning
CN111899209A (en) * 2020-08-11 2020-11-06 四川警察学院 Visible light infrared image fusion method based on convolution matching pursuit dictionary learning
CN111899207A (en) * 2020-08-11 2020-11-06 四川警察学院 Visible light and infrared image fusion method based on local processing convolution dictionary learning
CN112801040A (en) * 2021-03-08 2021-05-14 重庆邮电大学 Lightweight unconstrained facial expression recognition method and system embedded with high-order information
WO2021120404A1 (en) * 2019-12-17 2021-06-24 大连理工大学 Infrared and visible light fusing method
CN113159067A (en) * 2021-04-13 2021-07-23 北京工商大学 Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN114005046A (en) * 2021-11-04 2022-02-01 长安大学 Remote sensing scene classification method based on Gabor filter and covariance pooling
CN114445430A (en) * 2022-04-08 2022-05-06 暨南大学 Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN115019132A (en) * 2022-06-14 2022-09-06 哈尔滨工程大学 Multi-target identification method for complex background ship
CN115100301A (en) * 2022-07-19 2022-09-23 重庆七腾科技有限公司 Image compression sensing method and system based on fast Fourier convolution and convolution filtering flow
CN115688040A (en) * 2022-11-08 2023-02-03 西安交通大学 Mechanical equipment fault diagnosis method, device, equipment and readable storage medium
CN116310688A (en) * 2023-03-16 2023-06-23 城云科技(中国)有限公司 Target detection model based on cascade fusion, and construction method, device and application thereof
CN116486288A (en) * 2023-04-23 2023-07-25 东南大学 Aerial target counting and detecting method based on lightweight density estimation network
CN116486251A (en) * 2023-03-01 2023-07-25 中国矿业大学 Hyperspectral image classification method based on multi-mode fusion

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220044374A1 (en) * 2019-12-17 2022-02-10 Dalian University Of Technology Infrared and visible light fusion method
WO2021120404A1 (en) * 2019-12-17 2021-06-24 大连理工大学 Infrared and visible light fusing method
CN111401373A (en) * 2020-03-04 2020-07-10 武汉大学 Efficient semantic segmentation method based on packet asymmetric convolution
CN111738314A (en) * 2020-06-09 2020-10-02 南通大学 Deep learning method of multi-modal image visibility detection model based on shallow fusion
CN111899206A (en) * 2020-08-11 2020-11-06 四川警察学院 Medical brain image fusion method based on convolutional dictionary learning
CN111899209A (en) * 2020-08-11 2020-11-06 四川警察学院 Visible light infrared image fusion method based on convolution matching pursuit dictionary learning
CN111899207A (en) * 2020-08-11 2020-11-06 四川警察学院 Visible light and infrared image fusion method based on local processing convolution dictionary learning
CN112801040A (en) * 2021-03-08 2021-05-14 重庆邮电大学 Lightweight unconstrained facial expression recognition method and system embedded with high-order information
CN113159067A (en) * 2021-04-13 2021-07-23 北京工商大学 Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN114005046A (en) * 2021-11-04 2022-02-01 长安大学 Remote sensing scene classification method based on Gabor filter and covariance pooling
CN114445430A (en) * 2022-04-08 2022-05-06 暨南大学 Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN115019132A (en) * 2022-06-14 2022-09-06 哈尔滨工程大学 Multi-target identification method for complex background ship
CN115100301A (en) * 2022-07-19 2022-09-23 重庆七腾科技有限公司 Image compression sensing method and system based on fast Fourier convolution and convolution filtering flow
CN115688040A (en) * 2022-11-08 2023-02-03 西安交通大学 Mechanical equipment fault diagnosis method, device, equipment and readable storage medium
CN116486251A (en) * 2023-03-01 2023-07-25 中国矿业大学 Hyperspectral image classification method based on multi-mode fusion
CN116310688A (en) * 2023-03-16 2023-06-23 城云科技(中国)有限公司 Target detection model based on cascade fusion, and construction method, device and application thereof
CN116486288A (en) * 2023-04-23 2023-07-25 东南大学 Aerial target counting and detecting method based on lightweight density estimation network

Also Published As

Publication number Publication date
CN117058507B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
CN109741256B (en) Image super-resolution reconstruction method based on sparse representation and deep learning
CN111709902B (en) Infrared and visible light image fusion method based on self-attention mechanism
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN112233026A (en) SAR image denoising method based on multi-scale residual attention network
CN107730482B (en) Sparse fusion method based on regional energy and variance
Panigrahy et al. Parameter adaptive unit-linking dual-channel PCNN based infrared and visible image fusion
CN114862731B (en) Multi-hyperspectral image fusion method guided by low-rank priori and spatial spectrum information
CN113887645B (en) Remote sensing image fusion classification method based on joint attention twin network
CN112381144B (en) Heterogeneous deep network method for non-European and Euclidean domain space spectrum feature learning
de Souza Brito et al. Combining max-pooling and wavelet pooling strategies for semantic image segmentation
CN112967210A (en) Unmanned aerial vehicle image denoising method based on full convolution twin network
Feng et al. Fully convolutional network-based infrared and visible image fusion
CN114972748A (en) Infrared semantic segmentation method capable of explaining edge attention and gray level quantization network
CN115984323A (en) Two-stage fusion RGBT tracking algorithm based on space-frequency domain equalization
CN113762277B (en) Multiband infrared image fusion method based on Cascade-GAN
CN112418203B (en) Robustness RGB-T tracking method based on bilinear convergence four-stream network
Huang et al. MAGAN: Multiattention generative adversarial network for infrared and visible image fusion
CN117853596A (en) Unmanned aerial vehicle remote sensing mapping method and system
CN113421198A (en) Hyperspectral image denoising method based on subspace non-local low-rank tensor decomposition
CN117058507B (en) Fourier convolution-based visible light and infrared image multi-scale feature fusion method
CN110648332B (en) Image discriminable area extraction method based on multi-branch convolutional neural network feature orthogonality
Sun et al. IMGAN: Infrared and visible image fusion using a novel intensity masking generative adversarial network
Salem et al. Image fusion models and techniques at pixel level
CN116051444A (en) Effective infrared and visible light image self-adaptive fusion method
CN112990230B (en) Spectral image compression reconstruction method based on two-stage grouping attention residual error mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant