CN117765410B - Remote sensing image double-branch feature fusion solid waste identification method and system and electronic equipment - Google Patents
- Publication number
- CN117765410B (application CN202410018256.9A)
- Authority: CN (China)
- Prior art keywords: feature, map, layer, feature map, fusion
- Legal status: Active
Abstract
The invention discloses a remote sensing image double-branch feature fusion solid waste identification method, system, and electronic device. The method comprises the following steps. S1: construct a double-branch feature fusion model and input a remote sensing image sample-label dataset into it for model training; an improved EfficientNet feature extraction branch and a Transformer feature extraction branch perform feature extraction in parallel. S2: a four-layer feature fusion module of the double-branch feature fusion model performs multi-level feature fusion on the four layers of feature maps in turn, outputs a fused feature map, and convolves it to obtain the final fused feature map. S3: acquire remote sensing image data, slice it into a plurality of image slices, and input each slice into the double-branch feature fusion model in turn to obtain the corresponding solid waste identification results. The invention focuses on the key information of solid waste areas, strengthens feature mining for small solid waste areas, and reduces the interference of solid waste scale randomness on model identification.
Description
Technical Field
The invention relates to the field of solid waste identification and processing from urban remote sensing images, and in particular to a remote sensing image double-branch feature fusion solid waste identification method, system, and electronic device.
Background
Urban solid waste refers to solid or semi-solid waste generated by human production, construction, daily life, and other activities. Its year-on-year growth has become a significant burden on sustainable urban development, so timely and accurate acquisition of the spatial distribution and area of urban solid waste accumulations provides important reference information for land resource management and urban ecological environment monitoring. Traditional urban solid waste monitoring and management rely mainly on manual field investigation and on-site measurement, which are poorly timed, time- and labor-intensive, and incapable of comprehensive coverage. With the growth of satellite remote sensing data sources and advances in remote sensing image interpretation methods, extracting urban solid waste patches and measuring their area from remote sensing images has become an important method and means of timely urban solid waste monitoring.
At present, many methods perform refined urban solid waste identification from remote sensing image data, especially high-resolution imagery, including spectrum-based identification, object-oriented classification, and machine learning methods. Traditional spectral and object-oriented solid waste identification methods struggle with the low identification accuracy caused by the diversity of solid waste types and regions. Machine learning methods extract sample features with a feature extraction algorithm and feed the feature vectors to a classifier for training and prediction to obtain solid waste regions of interest; such methods place demands on the choice of feature extraction algorithm and classifier type, and rarely show sufficient generalization in practical tasks. With the development of deep learning, convolutional neural network classifiers have been applied to urban solid waste identification in high-resolution remote sensing images. Convolutional neural networks learn image features automatically, semantic segmentation accuracy can be improved through model pre-training, multi-model fusion, and similar strategies, and the results surpass traditional spectral identification and classical machine learning methods. However, the spatial distribution and geometric size of urban solid waste in remote sensing images are random, the internal structure of waste deposits is disordered, and the spectral confusion among waste types gives the imagery high heterogeneity, all of which make fine-grained solid waste identification difficult.
The prior art therefore cannot effectively achieve high-precision solid waste identification from urban remote sensing images, and suffers from insufficient model generalization and transferability.
Disclosure of Invention
The invention aims to overcome the technical problems identified in the background and provides a remote sensing image double-branch feature fusion solid waste identification method, system, and electronic device. It adopts a parallel structure of an improved EfficientNet feature extraction branch and a Transformer feature extraction branch, connected at four layers by feature fusion modules to form a four-layer parallel structure. The model thus captures both local information and global features, strengthens feature exchange and fusion across multiple scales, and shows better robustness in urban solid waste identification tasks.
The aim of the invention is achieved by the following technical scheme:
A remote sensing image double-branch feature fusion solid waste identification method comprises the following steps:
S1: construct a remote sensing image sample-label dataset that stores image slice samples and corresponding solid waste labels, the labels distinguishing urban solid waste from non-urban solid waste; construct a double-branch feature fusion model and input the dataset into it for model training, the model comprising an improved EfficientNet feature extraction branch and a Transformer feature extraction branch. The improved EfficientNet branch is based on the EfficientNet-B3 model, which captures the spatial detail features of an input image slice sample through four successive improved EfficientNet modules: each module raises the feature dimension by an expansion ratio with a 1×1 pointwise convolution, then obtains the channel relationships and position information of the features through depthwise convolution and coordinate attention. The four improved EfficientNet modules sequentially output feature maps at four scales, denoted E1, E2, E3 and E4.
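As a concrete illustration, the per-module data flow just described (1×1 pointwise expansion, depthwise convolution, coordinate attention) can be sketched in NumPy. This is a minimal sketch under assumptions, not the patented implementation: the function names, the 4× expansion, and the simplified coordinate attention (plain axis-wise pooling with sigmoid gates, no learned projections) are illustrative, and batch normalization and activations are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pointwise_expand(x, w):
    # 1x1 pointwise convolution: mixes channels only, raising C_in to C_out.
    # x: (C_in, H, W), w: (C_out, C_in)
    return np.einsum("oc,chw->ohw", w, x)

def depthwise_conv3x3(x, k):
    # Depthwise 3x3 convolution with zero padding: one kernel per channel.
    # x: (C, H, W), k: (C, 3, 3)
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def coordinate_attention(x):
    # Simplified coordinate attention: average along each spatial axis to get
    # row-wise and column-wise context, then gate the map with both profiles.
    h_ctx = sigmoid(x.mean(axis=2, keepdims=True))  # (C, H, 1)
    w_ctx = sigmoid(x.mean(axis=1, keepdims=True))  # (C, 1, W)
    return x * h_ctx * w_ctx

def improved_block(x, w_expand, k_dw):
    # expand -> depthwise conv -> coordinate attention (BN/activations omitted)
    x = pointwise_expand(x, w_expand)
    x = depthwise_conv3x3(x, k_dw)
    return coordinate_attention(x)
```

The expansion ratio is realized simply by the shape of `w_expand` (for example 8 channels in, 32 out for a 4× expansion).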
The Transformer feature extraction branch uniformly divides the input image slice sample into N image blocks and embeds each block by a linear mapping to obtain an embedded sequence. The branch comprises four Transformer modules, each aggregating information through L layers of a multi-head attention mechanism and a multi-layer perceptron; the four Transformer modules sequentially output feature maps at four scales, denoted T1, T2, T3 and T4.
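The block partition and linear embedding step can be illustrated with a small NumPy sketch; the name `patch_embed`, the 16-pixel block size, and the 64-dimensional embedding are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def patch_embed(img, patch, w_embed):
    # Split an (H, W, C) image into non-overlapping patch x patch blocks,
    # flatten each block, and project it with a shared linear map.
    H, W, C = img.shape
    gh, gw = H // patch, W // patch
    patches = (img[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch * patch * C))
    return patches @ w_embed          # (N, D) embedded sequence

rng = np.random.default_rng(1)
img = rng.standard_normal((512, 512, 3))            # one 512x512 RGB slice
w_embed = rng.standard_normal((16 * 16 * 3, 64)) * 0.01
seq = patch_embed(img, 16, w_embed)                 # 32*32 = 1024 tokens, dim 64
```

Each row of `seq` is one image block's embedding; the multi-head attention layers then operate on this sequence.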
S2: the double-branch feature fusion model further comprises four feature fusion modules, arranged layer by layer in correspondence with the four improved EfficientNet modules and the four Transformer modules; each feature fusion module contains a channel attention ECA module and an attention CBAM module. The ECA modules of the four feature fusion modules perform channel attention processing on the corresponding layer's feature maps and obtain, in turn, feature maps P1, P2, P3 and P4; the CBAM modules perform spatial attention weighting on the corresponding layer's feature maps and obtain, in turn, feature maps Q1, Q2, Q3 and Q4. Each feature fusion module also fuses the same layer's Transformer feature map with the improved EfficientNet feature map, obtaining feature maps C1, C2, C3 and C4, and then fuses the three maps obtained at its layer to produce output feature maps F1, F2, F3 and F4. The four-layer feature fusion module performs multi-level feature fusion on F1 through F4 in turn and outputs a fused feature map, which is convolved to obtain the final fused feature map; the double-branch feature fusion model establishes the relationship between the final fused feature map of an image slice sample and its solid waste label.
S3: acquire remote sensing image data and slice it into a plurality of image slices; input each slice into the double-branch feature fusion model in turn to obtain the corresponding solid waste identification results. Alternatively, input the remote sensing image data into the model for pixel-by-pixel solid waste identification, obtaining a result for every pixel; or input the data into the model and perform solid waste identification in a sliding-window fashion.
To better implement the invention, in step S2 the four feature fusion modules obtain the feature maps P1, P2, P3 and P4 as follows: the ECA module of the i-th layer feature fusion module (i = 1, ..., 4) applies a one-dimensional convolution, global average pooling and an activation function to its input feature map to obtain channel attention weights, and multiplies these weights element-wise with the input feature map to obtain feature map Pi.
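A minimal NumPy sketch of the ECA computation described above follows; the uniform averaging kernel standing in for the learned 1-D convolution weights and the kernel size k = 3 are assumptions made for illustration.

```python
import numpy as np

def eca(x, k=3):
    # Efficient Channel Attention: per-channel global average pooling, a
    # k-wide 1-D convolution across neighbouring channels, a sigmoid, and
    # element-wise reweighting of the input (C, H, W) feature map.
    gap = x.mean(axis=(1, 2))                      # (C,) channel descriptors
    kernel = np.full(k, 1.0 / k)                   # stand-in for learned 1-D weights
    mixed = np.convolve(np.pad(gap, k // 2, mode="edge"), kernel, mode="valid")
    weights = 1.0 / (1.0 + np.exp(-mixed))         # sigmoid -> weights in (0, 1)
    return x * weights[:, None, None]
```

Because the weights lie in (0, 1), ECA only attenuates channels; the edge-mode padding keeps the output length equal to the channel count for odd k.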
A preferred technical scheme of the invention: in step S2, the attention CBAM module of each feature fusion module comprises a channel attention CAM module and a spatial attention SAM module, and the feature maps Q1, Q2, Q3 and Q4 are obtained as follows. The CBAM module of the i-th layer feature fusion module (i = 1, ..., 4) processes its input feature map in two stages. First, global average pooling over the spatial dimensions yields the mean of each channel; a fully connected layer produces channel weights, which are applied to the input feature map to obtain a channel-weighted feature map. Second, global max pooling and global average pooling over the channel dimension yield, for each spatial position, the maximum and the mean across channels; these are concatenated and passed through a fully connected layer to obtain a spatially weighted feature map. The channel-weighted and spatially weighted feature maps are multiplied element-wise to obtain feature map Qi.
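The CBAM processing described above (channel attention from spatial average pooling plus a fully connected layer, then spatial attention from channel-wise max and average maps) can be sketched as follows; the 2-weight linear map used for the spatial branch and all weight shapes are illustrative assumptions, not the patented parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, w_fc, w_sp):
    # Channel attention (CAM): global average pooling over the spatial
    # dimensions, a fully connected layer, then per-channel weighting.
    # x: (C, H, W), w_fc: (C, C), w_sp: (2,)
    chan_w = sigmoid(w_fc @ x.mean(axis=(1, 2)))          # (C,)
    x_c = x * chan_w[:, None, None]
    # Spatial attention (SAM): channel-wise max and mean maps, concatenated
    # and reduced to one weight per pixel (a 2-weight linear map stands in
    # for the fully connected layer in the text).
    mx, av = x_c.max(axis=0), x_c.mean(axis=0)            # (H, W) each
    sp_w = sigmoid(w_sp[0] * mx + w_sp[1] * av)           # (H, W)
    return x_c * sp_w[None, :, :]
```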
A preferred technical scheme of the invention: in step S2, the four feature fusion modules obtain the output feature maps F1, F2, F3 and F4 as follows: the fourth-layer feature fusion module fuses the three feature maps obtained at the fourth layer (P4, Q4 and C4) to produce its output feature map F4; the third-layer module likewise fuses P3, Q3 and C3 into F3; the second-layer module fuses P2, Q2 and C2 into F2; and the first-layer module fuses P1, Q1 and C1 into F1.
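The subsequent multi-level fusion of the layer outputs, from the deepest map up to the shallowest, can be illustrated with a NumPy sketch. The patent does not spell out the fusion operator, so nearest-neighbour upsampling followed by element-wise addition is used here as an assumption, with all layers sharing one channel count for simplicity (in the real model the channel counts differ per layer and would need projection).

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def multilevel_fuse(f1, f2, f3, f4):
    # Fuse the per-layer output maps from the deepest (f4) up to the
    # shallowest (f1): upsample the running fused map, add the next map.
    fused = f4
    for f in (f3, f2, f1):
        fused = f + upsample2x(fused)
    return fused
```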
A preferred technical scheme of the invention: in step S2, the fused feature map is processed by a 3×3 convolution to obtain the final fused feature map.
A preferred technical scheme of the invention: in step S1, the image slice samples are 512 pixels × 512 pixels, and the feature maps E1, E2, E3 and E4 are output at four successively halved spatial scales. Before entering the EfficientNet-B3 model, each image slice sample is processed by 3×3 convolutional feature extraction, batch normalization and a Sigmoid-weighted linear unit function. The feature maps T1, T2, T3 and T4 likewise have four successively halved spatial sizes.
Preferably, the total loss function L of the double-branch feature fusion model is expressed as:
L = L_pred + λ1·L_E + λ2·L_T;
where L_pred denotes the prediction result loss, L_E the loss function of the improved EfficientNet feature extraction branch, L_T the loss function of the Transformer feature extraction branch, and λ1 and λ2 the weights.
Preferably, the double-branch feature fusion model cuts the input remote sensing image data into image blocks using a preset sliding-window size and overlap, identifies solid waste block by block, and retains only the identification result of each block's central region, yielding a solid waste prediction free of stitching seams.
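The seam-free sliding-window prediction described here can be sketched as follows; the 512-pixel window, the retained 256-pixel centre, the reflect padding, and the per-pixel thresholding that stands in for the model are all assumptions made for illustration.

```python
import numpy as np

def predict_tile(tile):
    # Stand-in for the double-branch model: per-pixel scores for one tile.
    return (tile > 0.5).astype(np.uint8)

def sliding_window_predict(image, win=512, keep=256):
    # Slide a win x win window with stride `keep` over a reflect-padded
    # image, predict each tile, and retain only the central keep x keep
    # region of each prediction so tile borders never reach the output.
    H, W = image.shape
    margin = (win - keep) // 2
    padded = np.pad(image, margin, mode="reflect")
    out = np.zeros((H, W), dtype=np.uint8)
    for y in range(0, H, keep):
        for x in range(0, W, keep):
            tile = padded[y:y + win, x:x + win]
            if tile.shape != (win, win):      # edge tiles: pad to full size
                tile = np.pad(tile, ((0, win - tile.shape[0]),
                                     (0, win - tile.shape[1])), mode="reflect")
            pred = predict_tile(tile)
            cy, cx = min(keep, H - y), min(keep, W - x)
            out[y:y + cy, x:x + cx] = pred[margin:margin + cy, margin:margin + cx]
    return out
```

Because only tile centres are kept, the stitched map has no visible block seams; with a pointwise stand-in model the result equals the prediction on the whole image.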
A remote sensing image double-branch feature fusion solid waste identification system comprises a double-branch feature fusion model with a storage module, an improved EfficientNet feature extraction branch and a Transformer feature extraction branch. The storage module stores a remote sensing image sample-label dataset and remote sensing image data; the dataset stores image slice samples and corresponding solid waste labels, which distinguish urban solid waste from non-urban solid waste. The improved EfficientNet branch is based on the EfficientNet-B3 model, which captures the spatial detail features of an input image slice sample through four successive improved EfficientNet modules: each module raises the feature dimension by an expansion ratio with a 1×1 pointwise convolution, then obtains the channel relationships and position information of the features through depthwise convolution and coordinate attention; the four modules sequentially output feature maps at four scales, denoted E1 through E4. The Transformer branch uniformly divides the input image slice sample into N image blocks and embeds each block by linear mapping into an embedded sequence; it comprises four Transformer modules, each aggregating information through L layers of a multi-head attention mechanism and a multi-layer perceptron, and the four modules sequentially output feature maps at four scales, denoted T1 through T4. The model further comprises four feature fusion modules arranged layer by layer in correspondence with the improved EfficientNet and Transformer modules, each containing a channel attention ECA module and an attention CBAM module. The ECA modules perform channel attention processing on the corresponding layer's feature maps, obtaining P1 through P4 in turn; the CBAM modules perform spatial attention weighting, obtaining Q1 through Q4; each fusion module fuses the same layer's Transformer and improved EfficientNet feature maps into C1 through C4, then fuses the three maps of its layer into output feature maps F1 through F4. The four-layer feature fusion module performs multi-level feature fusion on F1 through F4 in turn and outputs a fused feature map, which is convolved to obtain the final fused feature map; the model establishes the relationship between the final fused feature map of an image slice sample and its solid waste label. For identification, the model slices the remote sensing image data into a plurality of image slices and obtains the corresponding solid waste identification results in turn; or it identifies solid waste pixel by pixel over the data and obtains results for all pixels; or it performs solid waste identification over the data in a sliding-window fashion.
An electronic device comprises at least one processor, at least one memory and a data bus, the processor and the memory communicating with each other over the data bus; the memory stores program instructions executable by the processor, and the processor calls these instructions to carry out the steps of the remote sensing image double-branch feature fusion solid waste identification method.
Compared with the prior art, the invention has the following advantages:
(1) The invention adopts a parallel structure of an improved EfficientNet feature extraction branch and a Transformer feature extraction branch, connected at four layers by feature fusion modules to form a four-layer parallel structure, and thus captures both local information and global features, strengthens feature exchange and fusion across multiple scales, and shows better robustness in urban solid waste identification tasks.
(2) The four improved EfficientNet core modules of the improved EfficientNet feature extraction branch better capture spatial detail features and use an attention mechanism to obtain more accurate channel relationships and position information, while the four Transformer core modules of the Transformer feature extraction branch aggregate information; the invention thus focuses on the key information of solid waste areas, strengthens feature mining for small solid waste areas, and reduces the interference of solid waste scale randomness on model identification.
(3) The attention CBAM module and channel attention ECA module within the core four-layer feature fusion module fuse feature information effectively and capture global context more efficiently, helping the model understand the overall image structure; discrepancies between feature maps under direct fusion are avoided, fusion efficiency is improved, spatial information loss is reduced, and segmentation and solid waste identification accuracy are improved.
Drawings
FIG. 1 is a schematic flow chart of the double-branch feature fusion solid waste identification method;
FIG. 2 is a schematic diagram of the double-branch feature fusion model of the invention;
FIG. 3 is a schematic diagram of the combination of the improved EfficientNet feature extraction branch, the Transformer feature extraction branch and the feature fusion modules in the embodiment;
FIG. 4 is a schematic structural diagram of the ECA module in the feature fusion module in the embodiment;
FIG. 5 is a schematic structural diagram of the improved EfficientNet module in the embodiment;
FIG. 6 is a schematic structural diagram of the CBAM module in the feature fusion module in the embodiment;
FIG. 7 is a schematic block diagram of the double-branch feature fusion solid waste identification system of the invention.
Description of the embodiments
The invention is further illustrated by the following examples:
Examples
As shown in fig. 1, a remote sensing image double-branch feature fusion solid waste identification method includes:
S1: construct a remote sensing image sample-label dataset. The dataset stores image slice samples (512 pixels × 512 pixels in this embodiment; other sizes may be chosen according to the actual situation; in some embodiments the image slice samples require data cutting, sample screening, data enhancement and dataset-partitioning preprocessing operations) and the corresponding solid waste labels, which cover urban solid waste and non-urban solid waste. In some embodiments the image slice samples are further processed by radiometric correction, orthorectification and image fusion, and are vector-labelled with a value of 1 for urban solid waste and 0 for non-urban solid waste areas; after rasterization this yields the remote sensing image sample-label dataset of the study area (image slice samples plus solid waste labels). The dataset can be expanded as follows: slices are cut from the remote sensing image source and the solid waste label data with a sliding window at 25% overlap, the slice size again being 512 pixels × 512 pixels; the acquired image slice samples are then rotated, scaled, color-converted, noise-injected and so on to increase the number and diversity of samples.
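The 25% overlap sliding-window cutting can be sketched as follows; `cut_slices` is an illustrative name, and border remainders smaller than one full slice are simply dropped in this sketch.

```python
import numpy as np

def cut_slices(image, size=512, overlap=0.25):
    # Cut an image into size x size slices with the given fractional overlap,
    # i.e. a stride of size * (1 - overlap) pixels (384 for 25% overlap).
    stride = int(size * (1 - overlap))
    H, W = image.shape[:2]
    slices = []
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            slices.append(image[y:y + size, x:x + size])
    return slices

tiles = cut_slices(np.zeros((1280, 1280)), size=512, overlap=0.25)  # 3x3 grid
```

The same cutting is applied to the label raster so that every image slice keeps a pixel-aligned label slice.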
As shown in fig. 2, a double-branch feature fusion model is constructed, and the remote sensing image sample tag data set is input into the double-branch feature fusion model for model training; the data of the remote sensing image sample tag data set are divided in the ratio 7:2:1 into a training set, a verification set and a test set. The training set is used to train the double-branch feature fusion model to learn features; the verification set is used to evaluate the performance of the model during the training stage, so that training parameters can be conveniently adjusted; the test set is used to test and evaluate the recognition effect and accuracy of the network once training of the double-branch feature fusion model is complete.
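The 7:2:1 division can be sketched as a shuffled index split; the function name `split_dataset` and the fixed seed are illustrative assumptions:

```python
import numpy as np

def split_dataset(n_samples, ratios=(0.7, 0.2, 0.1), seed=0):
    # shuffle sample indices, then split them 7:2:1 into train / validation / test
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = round(n_samples * ratios[0])
    n_val = round(n_samples * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_dataset(1000)
```

The three index sets are disjoint and together cover every sample.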
The double-branch feature fusion model includes an improved EfficientNet feature extraction branch and a Transformer feature extraction branch. The improved EfficientNet feature extraction branch comprises an EfficientNet-B3 model, which captures the spatial detail features of the input image slice sample through four layers of improved EfficientNet modules in sequence (preferably, before entering the EfficientNet-B3 model the image slice sample undergoes feature extraction by a 3×3 convolution with batch normalization and Sigmoid-weighted linear unit (SiLU) activation, which favors capturing multi-level spatial detail; at each layer the resolution of the feature map is halved and the number of channels is doubled). Within each module, the features are expanded in dimension by a 1×1 point-by-point convolution according to an expansion ratio, and the channel relationships and position information of the features are then obtained through depthwise convolution and coordinate attention processing (improving the feature extraction capability for small-target solid waste). As shown in fig. 5, the four layers of improved EfficientNet modules output feature maps at four scales (in this embodiment, the spatial size of the feature map is halved and the channel count doubled from layer to layer).
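A minimal NumPy sketch of the per-module flow just described: 1×1 point-by-point expansion, 3×3 depthwise convolution, then a coordinate-attention-style reweighting. The coordinate attention here is heavily simplified (plain factorized average pooling along height and width with sigmoid gates, no shared transform), and all weights are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pointwise_expand(x, w):
    # 1x1 point-by-point convolution == per-pixel matmul; x: (H, W, Cin), w: (Cin, Cout)
    return x @ w

def depthwise_conv3x3(x, k):
    # depthwise 3x3 convolution, zero padding, stride 1; x: (H, W, C), k: (3, 3, C)
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k, axis=(0, 1))
    return out

def coordinate_attention(x):
    # simplified: factorized pooling along H and W retains positional information
    h_pool = x.mean(axis=1, keepdims=True)   # (H, 1, C)
    w_pool = x.mean(axis=0, keepdims=True)   # (1, W, C)
    return x * sigmoid(h_pool) * sigmoid(w_pool)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))                              # small input feature map
expanded = pointwise_expand(x, rng.standard_normal((4, 8)))     # expansion ratio 2
refined = coordinate_attention(depthwise_conv3x3(expanded, rng.standard_normal((3, 3, 8))))
```

The expansion doubles the channel count while the depthwise convolution and attention preserve spatial size, mirroring the module's expand-then-refine structure.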
The Transformer feature extraction branch takes the input image slice sample (denoted X, with H and W representing the height and width of the input) and evenly divides it into N image blocks of size P×P, so that N = HW/P² (this embodiment exemplifies a patch size P of 16). Each image block is embedded using a linear mapping to obtain an embedded sequence (preferably, each image block is flattened and passed to a linear embedding of output dimension D, yielding the original embedded sequence z0). The Transformer feature extraction branch comprises four layers of Transformer modules; each Transformer module aggregates information through an L-layer multi-head attention mechanism together with a multi-layer perceptron, and the four layers of Transformer modules output feature maps at four scales (in this embodiment, the output of the last Transformer module is layer-normalized to obtain the coded sequence, and spatial resolution is then recovered by layer-by-layer upsampling; the four output scales match those of the improved EfficientNet branch layer by layer).
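The patch division and linear embedding can be sketched as follows; the embedding dimension of 64 and the random projection matrix are illustrative assumptions:

```python
import numpy as np

def patch_embed(img, patch=16, dim=64, seed=0):
    # split (H, W, C) into non-overlapping P x P patches, flatten, linearly project
    rng = np.random.default_rng(seed)
    h, w, c = img.shape
    n = (h // patch) * (w // patch)            # N = HW / P^2 patches
    patches = (img.reshape(h // patch, patch, w // patch, patch, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(n, patch * patch * c))
    proj = rng.standard_normal((patch * patch * c, dim))
    return patches @ proj                      # embedded sequence z0: (N, dim)

z0 = patch_embed(np.zeros((512, 512, 3)))
```

For a 512 × 512 input and P = 16 this yields N = 1024 patch embeddings.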
S2, the double-branch feature fusion model further comprises four layers of feature fusion modules, and the four layers of feature fusion modules, the four layers of improved EfficientNet modules and the four layers of Transformer modules are arranged in correspondence layer by layer. Each feature fusion module comprises a channel attention ECA module and an attention CBAM module. The channel attention ECA modules of the four layers apply channel attention processing to the corresponding-layer feature maps and obtain four channel-refined feature maps in sequence. The attention CBAM modules of the four layers apply spatial attention weighting processing and obtain four spatially weighted feature maps in sequence. Each feature fusion module also fuses the feature map of the same-layer Transformer module with the feature map of the same-layer improved EfficientNet module, obtaining four directly fused feature maps in sequence; preferably, the first layer directly fuses its two branch feature maps to obtain the first-layer directly fused map, and the second, third and fourth layers do likewise. Taking the first layer as representative, F1 = C1D(T1 ⊕ E1), where F1 denotes the first-layer directly fused feature map, T1 and E1 the first-layer Transformer-branch and improved-EfficientNet-branch feature maps, C1D a one-dimensional convolution operation, and ⊕ element-wise addition. As shown in fig. 3, each of the four feature fusion modules then fuses the three feature maps obtained in its own layer and obtains the four output feature maps in sequence.
The four layers of feature fusion modules then perform multi-level feature fusion on the four layer-output feature maps in sequence and output a fused feature map, which is convolved to obtain the final fused feature map (preferably, a 3×3 convolution is applied to the fused feature map to obtain the final fused feature map). The double-branch feature fusion model thereby establishes the relationship between the final fused feature map of the image slice sample and the solid waste label.
In some embodiments, let O_l denote the output feature map of the l-th layer, where l is the layer index of the fusion module. An Attention Gate (AG) module performs the multi-level feature fusion, and the final feature map of the coding stage is obtained as D_l = AG(O_l, Up(D_{l+1})), with D_4 = O_4, where Up denotes the upsampling operation, O_l arrives through a skip connection, and the attention coefficients produced by the AG module are applied to the skip feature through a Hadamard product operation.
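A minimal sketch of one such attention-gate fusion step, assuming nearest-neighbour upsampling, single-channel additive attention coefficients and random placeholder weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(x):
    # nearest-neighbour upsampling of an (H, W, C) map
    return x.repeat(2, axis=0).repeat(2, axis=1)

def attention_gate(skip, gated, w_s, w_g):
    # attention coefficients from skip + gating signal, applied to skip (Hadamard product)
    alpha = sigmoid(skip @ w_s + gated @ w_g)   # (H, W, 1)
    return skip * alpha

rng = np.random.default_rng(0)
o_l = rng.standard_normal((16, 16, 8))      # layer-l output, via the skip connection
d_next = rng.standard_normal((8, 8, 8))     # fused map from the layer below
d_l = attention_gate(o_l, upsample2x(d_next),
                     rng.standard_normal((8, 1)), rng.standard_normal((8, 1)))
```

The deeper map is upsampled to the skip map's resolution before gating, so the fused output keeps the skip map's spatial size.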
In some preferred embodiments, as shown in FIG. 4, the four layers of feature fusion modules obtain the channel-refined feature maps as follows:
The channel attention ECA module of the first-layer feature fusion module applies a one-dimensional convolution operation, a global average pooling operation and an activation function operation to its input feature map to obtain the channel attention weight, and multiplies the attention weight element by element with the input feature map to obtain the first-layer channel-refined feature map. In a further technical scheme, the output may adopt the expression F' = σ(C1D(GAP(F))) ⊗ F, where F denotes the feature map input to the channel attention ECA module, σ the activation function, C1D the one-dimensional convolution operation, GAP the global average pooling operation and ⊗ element-by-element multiplication; that is, the feature input to the channel attention ECA module is multiplied by the channel attention weight σ(C1D(GAP(F))) to give the output of the module.
The channel attention ECA modules of the second-, third- and fourth-layer feature fusion modules apply the same one-dimensional convolution, global average pooling and activation function operations to their respective input feature maps and multiply the resulting channel attention weights element by element with those inputs, obtaining the second-, third- and fourth-layer channel-refined feature maps in turn by the same expression.
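The ECA operation described above (global average pooling, a one-dimensional convolution across channels, a sigmoid, then element-wise reweighting) can be sketched in NumPy; the kernel size of 3 and the fixed averaging kernel are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eca(x, k=3):
    # GAP over space -> 1-D convolution across channels -> sigmoid -> reweight
    c = x.shape[-1]
    gap = x.mean(axis=(0, 1))                  # (C,) global average pooling
    gp = np.pad(gap, k // 2, mode="edge")
    kernel = np.full(k, 1.0 / k)               # illustrative fixed 1-D kernel
    conv = np.array([gp[i:i + k] @ kernel for i in range(c)])
    return x * sigmoid(conv)                   # element-by-element channel reweighting

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 16))
y = eca(x)
```

Because the sigmoid weights lie in (0, 1), every channel is attenuated rather than amplified, while the spatial layout of the map is untouched.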
In some preferred embodiments, as shown in FIG. 5, the attention CBAM modules of the four layers of feature fusion modules each include a channel attention CAM module and a spatial attention SAM module, and the spatially weighted feature maps are obtained as follows:
The attention CBAM module of the first-layer feature fusion module processes its input feature map as follows: global average pooling is applied in the spatial dimension to obtain the average of each channel, the channel weights are obtained with a fully connected layer, and the input feature map is weighted with them to give the channel-weighted feature map; global maximum pooling and global average pooling are then applied in the channel dimension to obtain, respectively, the maximum and the average at each spatial position, which are concatenated and passed through a fully connected layer to give the spatially weighted feature map; finally, the channel-weighted feature map and the spatially weighted feature map are multiplied element by element to give the first-layer output of the attention CBAM module.
The attention CBAM modules of the second-, third- and fourth-layer feature fusion modules process their respective input feature maps in exactly the same way, giving the second-, third- and fourth-layer outputs in turn.
In some preferred embodiments, the further technical scheme is: the CBAM module structure comprises two sub-modules, namely a channel attention CAM module and a spatial attention SAM module, and the spatially attention-weighted output of the CBAM module is computed as follows. With F denoting the input feature map, global average pooling of F in the spatial dimension gives the per-channel average GAP(F); the channel weight is learned with a fully connected layer FC, and the output of the channel attention CAM module is F_c = σ(FC(GAP(F))) ⊗ F. The input of the spatial attention SAM module is the feature map F_c weighted by the channel attention CAM module; global maximum pooling and global average pooling in the channel dimension give the per-position maximum F_max and average F_avg, which are concatenated and passed through fully connected layer learning to give the spatial attention weight M_s = σ(FC([F_max; F_avg])); the spatially attention-weighted output of the CBAM module is then F_out = M_s ⊗ F_c, where σ denotes the activation function, GAP the global average pooling operation, FC a fully connected layer and ⊗ element-wise multiplication.
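The CAM-then-SAM computation above can be sketched as follows; the fully connected layers are reduced to single matrix products with random placeholder weights (the original CBAM uses a convolution for spatial attention, but the text here specifies fully connected layers, which is what is followed):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(x, w_c, w_s):
    # channel attention CAM: spatial GAP -> fully connected layer -> channel weights
    m_c = sigmoid(x.mean(axis=(0, 1)) @ w_c)            # (C,)
    xc = x * m_c                                         # channel-weighted map
    # spatial attention SAM: channel-wise max & mean, concatenated -> fully connected
    f_max = xc.max(axis=-1, keepdims=True)               # (H, W, 1)
    f_avg = xc.mean(axis=-1, keepdims=True)              # (H, W, 1)
    m_s = sigmoid(np.concatenate([f_max, f_avg], axis=-1) @ w_s)  # (H, W, 1)
    return xc * m_s                                      # spatially weighted output

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
y = cbam(x, rng.standard_normal((8, 8)), rng.standard_normal((2, 1)))
```

Channel weights are applied first and the spatial mask is computed from the already channel-weighted map, matching the CAM-before-SAM ordering in the text.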
In some preferred embodiments, the four layers of feature fusion modules obtain their output feature maps as follows:
The fourth-layer feature fusion module performs feature fusion processing on the three feature maps obtained at the fourth layer (the channel-refined map from the ECA module, the spatially weighted map from the CBAM module and the directly fused map of the two branches) to obtain the feature map output by the fourth layer. The third-layer feature fusion module performs feature fusion processing on the three feature maps obtained at the third layer to obtain the feature map output by the third layer. The second-layer feature fusion module performs feature fusion processing on the three feature maps obtained at the second layer to obtain the feature map output by the second layer. The first-layer feature fusion module performs feature fusion processing on the three feature maps obtained at the first layer to obtain the feature map output by the first layer.
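The per-layer fusion of the three maps can be sketched as channel concatenation followed by a 1×1 convolution; the concatenation-plus-convolution choice is an assumption, since the text does not fix the fusion operator here:

```python
import numpy as np

def fuse_layer(eca_map, cbam_map, direct_map, w):
    # concatenate the three same-layer maps along channels, then a 1x1 convolution
    return np.concatenate([eca_map, cbam_map, direct_map], axis=-1) @ w

rng = np.random.default_rng(0)
a, b, f = (rng.standard_normal((16, 16, 8)) for _ in range(3))
o = fuse_layer(a, b, f, rng.standard_normal((24, 8)))   # project back to 8 channels
```

The 1×1 convolution projects the tripled channel count back to the layer's working width while leaving spatial size unchanged.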
S3, acquiring remote sensing image data (preferably high-resolution remote sensing images), slicing to obtain a plurality of image slices, sequentially inputting each image slice into a double-branch feature fusion model, and sequentially obtaining solid waste identification results corresponding to the image slices. Or acquiring remote sensing image data, inputting the remote sensing image data into a double-branch feature fusion model, and identifying the solid waste of each pixel of the remote sensing image data by the double-branch feature fusion model and acquiring solid waste identification results of all pixels. Or acquiring remote sensing image data, inputting the remote sensing image data into a double-branch feature fusion model, and adopting a sliding window form to perform solid waste identification.
In some preferred embodiments, the total loss function L_total of the double-branch feature fusion model is expressed as L_total = L_pred + λ1·L_E + λ2·L_T, where L_pred denotes the predicted-result loss, L_E the loss function of the improved EfficientNet feature extraction branch, L_T the loss function of the Transformer feature extraction branch, and λ1 and λ2 the weights.
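The weighted-sum loss can be sketched directly; the weight values of 0.5 are placeholders, as the patent does not state them:

```python
def total_loss(l_pred, l_eff, l_trans, lam1=0.5, lam2=0.5):
    # weighted sum of the prediction loss and the two branch losses
    return l_pred + lam1 * l_eff + lam2 * l_trans

loss = total_loss(1.0, 2.0, 4.0)
```

The branch terms act as auxiliary supervision on each encoder, with λ1 and λ2 controlling how strongly each branch's own loss influences training.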
In some preferred embodiments, step S3 may further adopt the following method: the double-branch feature fusion model cuts the input remote sensing image data into image blocks with a preset sliding-window size and overlap, performs solid waste identification on the image blocks one by one, and retains only the solid waste identification result of the central area of each image block, obtaining a solid waste prediction result free of stitching artifacts.
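The centre-region retention strategy can be sketched as follows; the 50% overlap and the toy all-ones model are illustrative assumptions (border tiles keep their outer margins so the full image is covered):

```python
import numpy as np

def predict_seamless(img, model, tile=512, overlap=0.5):
    """Slide a window over the image, predict tile by tile, and keep only each
    tile's centre region so tile borders never reach the stitched output.
    Assumes image dimensions align with the stride."""
    stride = int(tile * (1 - overlap))
    margin = (tile - stride) // 2
    h, w = img.shape[:2]
    out = np.zeros((h, w))
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            pred = model(img[y:y + tile, x:x + tile])    # (tile, tile) mask
            y0 = 0 if y == 0 else margin                 # keep outer margins at edges
            x0 = 0 if x == 0 else margin
            out[y + y0:y + tile, x + x0:x + tile] = pred[y0:, x0:]
    return out

img = np.zeros((1024, 1024))
mask = predict_seamless(img, lambda t: np.ones(t.shape[:2]))
```

Dropping each tile's border strip discards exactly the region where convolutional predictions are least reliable, which is what removes the visible seams.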
As shown in fig. 7, a remote sensing image double-branch feature fusion solid waste identification system comprises a double-branch feature fusion model, which includes a storage module, an improved EfficientNet feature extraction branch and a Transformer feature extraction branch. The storage module stores the remote sensing image sample tag data set and the remote sensing image data; the remote sensing image sample tag data set stores image slice samples and corresponding solid waste tags, and the solid waste tags distinguish urban solid waste from non-urban solid waste. The improved EfficientNet feature extraction branch comprises an EfficientNet-B3 model, which captures the spatial detail features of the input image slice sample through four layers of improved EfficientNet modules in sequence; the features are expanded in dimension by a 1×1 point-by-point convolution according to an expansion ratio, and the channel relationships and position information of the features are then obtained through depthwise convolution and coordinate attention processing; the four layers of improved EfficientNet modules output feature maps at four scales. The Transformer feature extraction branch evenly divides the input image slice sample into N image blocks and embeds each image block with a linear mapping to obtain an embedded sequence; the branch further comprises four layers of Transformer modules, each of which aggregates information through an L-layer multi-head attention mechanism together with a multi-layer perceptron, and the four layers of Transformer modules output feature maps at four scales.
The double-branch feature fusion model further comprises four layers of feature fusion modules, arranged layer by layer in correspondence with the four layers of improved EfficientNet modules and the four layers of Transformer modules; each feature fusion module comprises a channel attention ECA module and an attention CBAM module. The channel attention ECA modules of the four layers apply channel attention processing and obtain four channel-refined feature maps in sequence. The attention CBAM modules of the four layers apply spatial attention weighting processing and obtain four spatially weighted feature maps in sequence. Each feature fusion module fuses the feature map of the same-layer Transformer module with the feature map of the same-layer improved EfficientNet module, obtaining four directly fused feature maps in sequence, and then fuses the three feature maps obtained in its own layer, obtaining the four output feature maps in sequence. The four layers of feature fusion modules perform multi-level feature fusion on the output feature maps in sequence and output a fused feature map, which is convolved to obtain the final fused feature map. The double-branch feature fusion model establishes the relationship between the final fused feature map of the image slice sample and the solid waste label. The double-branch feature fusion model slices the remote sensing image data into a plurality of image slices and obtains in sequence the solid waste identification result corresponding to each image slice.
Or the double-branch feature fusion model identifies the solid waste of the remote sensing image data pixel by pixel and acquires the solid waste identification result of all pixels. Or the double-branch feature fusion model is used for carrying out solid waste identification on the remote sensing image data in a sliding window mode.
An electronic device includes at least one processor, at least one memory, and a data bus, wherein the processor and the memory communicate with each other via the data bus. The memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the steps of the remote sensing image double-branch feature fusion solid waste identification method described above.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (10)
1. A remote sensing image double-branch feature fusion solid waste identification method, characterized by comprising the following steps:
S1, constructing a remote sensing image sample tag data set, wherein the remote sensing image sample tag data set stores image slice samples and corresponding solid waste tags, and the solid waste tags distinguish urban solid waste from non-urban solid waste; constructing a double-branch feature fusion model, and inputting the remote sensing image sample tag data set into the double-branch feature fusion model for model training, wherein the double-branch feature fusion model comprises an improved EfficientNet feature extraction branch and a Transformer feature extraction branch; the improved EfficientNet feature extraction branch comprises an EfficientNet-B3 model, which captures the spatial detail features of the input image slice sample through four layers of improved EfficientNet modules in sequence; the features are expanded in dimension by a 1×1 point-by-point convolution according to an expansion ratio, and the channel relationships and position information of the features are then obtained through depthwise convolution and coordinate attention processing; the four layers of improved EfficientNet modules output feature maps at four scales;
the Transformer feature extraction branch evenly divides the input image slice sample into N image blocks and embeds each image block with a linear mapping to obtain an embedded sequence; the Transformer feature extraction branch comprises four layers of Transformer modules, each of which aggregates information through an L-layer multi-head attention mechanism together with a multi-layer perceptron, and the four layers of Transformer modules output feature maps at four scales;
S2, the double-branch feature fusion model further comprises four layers of feature fusion modules, arranged layer by layer in correspondence with the four layers of improved EfficientNet modules and the four layers of Transformer modules, each feature fusion module comprising a channel attention ECA module and an attention CBAM module; the channel attention ECA modules of the four layers apply channel attention processing and obtain four channel-refined feature maps in sequence; the attention CBAM modules of the four layers apply spatial attention weighting processing and obtain four spatially weighted feature maps in sequence; each feature fusion module fuses the feature map of the same-layer Transformer module with the feature map of the same-layer improved EfficientNet module, obtaining four directly fused feature maps in sequence; each feature fusion module then fuses the three feature maps obtained in its own layer, obtaining the four layer-output feature maps in sequence; the four layers of feature fusion modules perform multi-level feature fusion on the layer-output feature maps in sequence and output a fused feature map, which is convolved to obtain the final fused feature map; the double-branch feature fusion model establishes the relationship between the final fused feature map of the image slice sample and the solid waste label;
S3, acquiring remote sensing image data, slicing to obtain a plurality of image slices, sequentially inputting each image slice into a double-branch feature fusion model, and sequentially obtaining solid waste identification results corresponding to the image slices; or acquiring remote sensing image data and inputting the remote sensing image data into a double-branch feature fusion model, and identifying pixel-by-pixel solid waste of the remote sensing image data by the double-branch feature fusion model and acquiring solid waste identification results of all pixels; or acquiring remote sensing image data, inputting the remote sensing image data into a double-branch feature fusion model, and adopting a sliding window form to perform solid waste identification.
2. The remote sensing image double-branch feature fusion solid waste identification method according to claim 1, characterized in that: in step S2, the four layers of feature fusion modules obtain the channel-refined feature maps as follows:
the channel attention ECA module of the first-layer feature fusion module applies a one-dimensional convolution operation, a global average pooling operation and an activation function operation to its input feature map to obtain the channel attention weight, and multiplies the attention weight element by element with the input feature map to obtain the first-layer channel-refined feature map;
the channel attention ECA module of the second-layer feature fusion module applies a one-dimensional convolution operation, a global average pooling operation and an activation function operation to its input feature map to obtain the channel attention weight, and multiplies the attention weight element by element with the input feature map to obtain the second-layer channel-refined feature map;
the channel attention ECA module of the third-layer feature fusion module applies a one-dimensional convolution operation, a global average pooling operation and an activation function operation to its input feature map to obtain the channel attention weight, and multiplies the attention weight element by element with the input feature map to obtain the third-layer channel-refined feature map;
the channel attention ECA module of the fourth-layer feature fusion module applies a one-dimensional convolution operation, a global average pooling operation and an activation function operation to its input feature map to obtain the channel attention weight, and multiplies the attention weight element by element with the input feature map to obtain the fourth-layer channel-refined feature map.
3. The remote sensing image double-branch feature fusion solid waste identification method according to claim 1, characterized in that: in step S2, the attention CBAM modules of the four layers of feature fusion modules each include a channel attention CAM module and a spatial attention SAM module, and the spatially weighted feature maps are obtained as follows:
The attention CBAM module of the first-layer feature fusion module processes its input feature map as follows: global average pooling is applied over the spatial dimensions to obtain the mean value of each channel, channel weights are then computed by a fully connected layer and applied to the input feature map to obtain a channel-weighted feature map; global max pooling and global average pooling are applied along the channel dimension to obtain the maximum value and the mean value at each spatial position, the two pooled maps are concatenated and passed through a fully connected layer to obtain a spatially weighted feature map, and the channel-weighted feature map and the spatially weighted feature map are multiplied element-wise to obtain the output feature map of the first layer;
The attention CBAM module of the second-layer feature fusion module applies the same processing to its input feature map to obtain the output feature map of the second layer;
The attention CBAM module of the third-layer feature fusion module applies the same processing to its input feature map to obtain the output feature map of the third layer;
The attention CBAM module of the fourth-layer feature fusion module applies the same processing to its input feature map to obtain the output feature map of the fourth layer.
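The channel-then-spatial weighting described above can be sketched in NumPy. The array shapes, the single weight matrices standing in for the fully connected layers, and the sigmoid activation are illustrative assumptions, not the claimed network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(x, w_ch, w_sp):
    """Channel-then-spatial attention in the style described in the claim.

    x:    feature map of shape (C, H, W)
    w_ch: (C, C) matrix standing in for the channel fully-connected layer
    w_sp: (2,) weights standing in for the layer that mixes the two
          channel-pooled maps into one spatial weight map
    """
    # Channel attention: global average pooling over the spatial dimensions
    # gives the mean of each channel; a fully-connected layer (here: w_ch)
    # turns the means into one weight per channel.
    avg_per_channel = x.mean(axis=(1, 2))          # (C,)
    ch_weights = sigmoid(w_ch @ avg_per_channel)   # (C,)
    x_ch = x * ch_weights[:, None, None]           # channel-weighted map
    # Spatial attention: max- and average-pooling along the channel
    # dimension give per-position maxima and means; the two maps are
    # combined into a spatial weight map.
    max_map = x_ch.max(axis=0)                     # (H, W)
    avg_map = x_ch.mean(axis=0)                    # (H, W)
    sp_weights = sigmoid(w_sp[0] * max_map + w_sp[1] * avg_map)
    # Element-wise product of channel-weighted and spatially weighted maps.
    return x_ch * sp_weights[None, :, :]
```

The output has the same shape as the input, so the same module can be applied at every layer of the fusion pyramid.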
4. The remote sensing image dual-branch feature fusion solid waste identification method according to claim 1, wherein in step S2 the four layers of feature fusion modules obtain their output feature maps as follows:
the fourth-layer feature fusion module performs feature fusion on the three feature maps obtained at the fourth layer to obtain the feature map output by the fourth layer; the third-layer feature fusion module performs feature fusion on the three feature maps obtained at the third layer to obtain the feature map output by the third layer; the second-layer feature fusion module performs feature fusion on the three feature maps obtained at the second layer to obtain the feature map output by the second layer; and the first-layer feature fusion module performs feature fusion on the three feature maps obtained at the first layer to obtain the feature map output by the first layer.
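The layer-wise fusion described above can be sketched as follows. Channel concatenation followed by a 1×1 projection is an assumed fusion operator; the claim states only that each layer's three feature maps are fused into one output map, proceeding from the fourth layer to the first:

```python
import numpy as np

def fuse_three(maps, w):
    """Fuse one layer's three same-shape (C, H, W) feature maps into one
    output map. Channel concatenation followed by a learned 1x1 projection
    (the matrix w of shape (C, 3C)) is an illustrative choice."""
    cat = np.concatenate(maps, axis=0)             # (3C, H, W)
    return np.tensordot(w, cat, axes=([1], [0]))   # (C, H, W)

def fuse_all_layers(layer_maps, ws):
    """Fuse each layer's three maps, proceeding from the fourth layer down
    to the first, matching the order recited in the claim."""
    return {layer: fuse_three(layer_maps[layer], ws[layer])
            for layer in (4, 3, 2, 1)}
```

`layer_maps` maps each layer index to its list of three feature maps and `ws` holds one projection matrix per layer; both names are hypothetical.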
5. The remote sensing image dual-branch feature fusion solid waste identification method according to claim 1, wherein in step S2 a 3×3 convolution is applied to the fusion feature map to obtain the final fusion feature map.
6. The remote sensing image dual-branch feature fusion solid waste identification method according to claim 1, wherein in step S1 the sample size of each image slice is 512 pixels × 512 pixels, and the four feature maps have the sizes …, …, …, and …, respectively; before being input to the EfficientNet-B3 model, each image slice sample is processed by 3×3 convolutional feature extraction, batch normalization, and a Sigmoid-weighted linear unit function; the four feature maps of the other branch have the sizes …, …, …, and …, respectively.
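The pre-processing stem applied before the EfficientNet-B3 model (3×3 convolution, batch normalization, Sigmoid-weighted linear unit) can be sketched as follows. The kernel shapes and the omission of learned batch-norm scale/shift parameters are simplifying assumptions:

```python
import numpy as np

def silu(x):
    """Sigmoid-weighted linear unit: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def batch_norm(x, eps=1e-5):
    """Per-channel normalization (learned scale/shift omitted for brevity)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv3x3_same(x, k):
    """Naive 3x3 'same' convolution of a (C, H, W) map with kernels k of
    shape (C_out, C, 3, 3); zero padding keeps the spatial size."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((k.shape[0], H, W))
    for o in range(k.shape[0]):
        for c in range(C):
            for i in range(3):
                for j in range(3):
                    out[o] += k[o, c, i, j] * xp[c, i:i + H, j:j + W]
    return out

def stem(x, k):
    """3x3 convolution -> batch normalization -> SiLU, as in the claim."""
    return silu(batch_norm(conv3x3_same(x, k)))
```

The loop-based convolution is for clarity only; a real implementation would use a framework convolution.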
7. The remote sensing image dual-branch feature fusion solid waste identification method according to claim 1, wherein the total loss function of the dual-branch feature fusion model is expressed as:
…; where … denotes the prediction result loss, … denotes the loss function of the improved EfficientNet feature extraction branch, … denotes the loss function of the Transformer feature extraction branch, and … and … denote the weights.
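Assuming the total loss combines the three named terms linearly, with the two weights applied to the branch losses — an assumption, since the claim names the terms and weights without rendering the exact formula — a minimal sketch:

```python
def total_loss(l_pred, l_eff, l_trans, lam1, lam2):
    """Assumed linear combination: prediction loss plus the weighted losses
    of the improved EfficientNet branch and the Transformer branch.
    lam1 and lam2 are the two weights named in the claim."""
    return l_pred + lam1 * l_eff + lam2 * l_trans
```

With both weights set to 0.5, for example, `total_loss(1.0, 2.0, 3.0, 0.5, 0.5)` evaluates to 3.5.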
8. The remote sensing image dual-branch feature fusion solid waste identification method according to claim 1, wherein the dual-branch feature fusion model cuts the input remote sensing image data into image blocks using a preset sliding-window size and overlap, performs solid waste identification on the image blocks one by one, and retains only the solid waste identification result of the central region of each image block, thereby obtaining a solid waste prediction result free of stitching seams.
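The seam-free stitching of claim 8 — overlapping windows with only the central region of each prediction retained — can be sketched as follows. The `window`, `stride`, and `margin` parameters are illustrative; the claim specifies only a preset window size and overlap:

```python
import numpy as np

def sliding_window_predict(image, window, stride, predict, margin):
    """Tile a (H, W) image with overlapping windows, run `predict` on each
    window, and write back only the central region of each prediction so
    the stitched output has no seam artifacts at window borders."""
    H, W = image.shape
    out = np.zeros((H, W))
    for top in range(0, H - window + 1, stride):
        for left in range(0, W - window + 1, stride):
            pred = predict(image[top:top + window, left:left + window])
            # Keep the central part, except at the image border where the
            # margin would leave pixels uncovered.
            t0 = 0 if top == 0 else margin
            l0 = 0 if left == 0 else margin
            t1 = window if top + window == H else window - margin
            l1 = window if left + window == W else window - margin
            out[top + t0:top + t1, left + l0:left + l1] = pred[t0:t1, l0:l1]
    return out
```

Full coverage requires the stride to step evenly to the image edge (i.e. `H - window` divisible by `stride`); real pipelines usually pad the image to guarantee this.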
9. A remote sensing image dual-branch feature fusion solid waste identification system, characterized in that: the system comprises a dual-branch feature fusion model, the dual-branch feature fusion model comprising a storage module, an improved EfficientNet feature extraction branch, and a Transformer feature extraction branch; the storage module stores a remote sensing image sample data set and remote sensing image data, the remote sensing image sample data set stores image slice samples and corresponding solid waste labels, and the solid waste labels distinguish urban solid waste from non-urban solid waste; the improved EfficientNet feature extraction branch comprises an EfficientNet-B3 model, which captures the spatial detail features of an input image slice sample through four layers of improved EfficientNet modules in sequence: the features are up-scaled according to an expansion ratio by 1×1 point-wise convolution, and the channel relations and position information of the features are then obtained by depth-wise convolution and coordinate attention processing; the four layers of improved EfficientNet modules output feature maps at four scales in sequence; the Transformer feature extraction branch uniformly divides the input image slice sample into N image blocks and embeds each image block by linear mapping to obtain an embedded sequence; the Transformer feature extraction branch further comprises four layers of Transformer modules, each of which performs information aggregation through an L-layer multi-head attention mechanism together with a multi-layer perceptron, and the four layers of Transformer modules output feature maps at four scales in sequence; the dual-branch feature fusion model further comprises four layers of feature fusion modules, the four layers of feature fusion modules, the four layers of improved EfficientNet modules, and the four layers of Transformer modules being arranged in layer-wise correspondence; each feature fusion module comprises a channel attention ECA module and an attention CBAM module; the channel attention ECA modules of the four layers of feature fusion modules apply channel attention processing to the feature maps of their respective layers and obtain processed feature maps in sequence; the attention CBAM modules of the four layers of feature fusion modules apply spatial attention weighting to the feature maps of their respective layers and obtain weighted feature maps in sequence; each feature fusion module fuses the feature map of the same-layer Transformer module with the feature map of the same-layer improved EfficientNet module, and then fuses the three feature maps obtained at its layer into an output feature map; the four layers of feature fusion modules perform multi-level feature fusion on these output feature maps in sequence and output a fusion feature map, and the fusion feature map is convolved to obtain a final fusion feature map; the dual-branch feature fusion model establishes the relationship between the final fusion feature map of an image slice sample and its solid waste label; the dual-branch feature fusion model slices the remote sensing image data into a plurality of image slices and obtains the solid waste identification result corresponding to each image slice in sequence; or the dual-branch feature fusion model identifies solid waste in the remote sensing image data pixel by pixel and obtains the solid waste identification results of all pixels; or the dual-branch feature fusion model performs solid waste identification on the remote sensing image data in a sliding-window manner.
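The Transformer branch's first step — uniformly dividing the input into N image blocks and embedding each block by a linear mapping — can be sketched as follows. The patch size and embedding width are illustrative assumptions:

```python
import numpy as np

def patch_embed(image, patch, w):
    """Uniformly divide a (C, H, W) image into non-overlapping patch x patch
    blocks and embed each block with the linear map w, yielding the embedded
    sequence consumed by the Transformer modules."""
    C, H, W = image.shape
    ph, pw = H // patch, W // patch
    seq = []
    for i in range(ph):
        for j in range(pw):
            block = image[:, i * patch:(i + 1) * patch,
                             j * patch:(j + 1) * patch]
            seq.append(w @ block.reshape(-1))  # linear mapping per block
    return np.stack(seq)                       # (N, embed_dim)
```

For a (C, H, W) input this produces N = (H / patch) × (W / patch) embedded tokens; `w` has shape (embed_dim, C · patch²).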
10. An electronic device, characterized by comprising at least one processor, at least one memory, and a data bus; wherein the processor and the memory communicate with each other through the data bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the steps of the remote sensing image dual-branch feature fusion solid waste identification method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410018256.9A CN117765410B (en) | 2024-01-05 | 2024-01-05 | Remote sensing image double-branch feature fusion solid waste identification method and system and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117765410A CN117765410A (en) | 2024-03-26 |
CN117765410B true CN117765410B (en) | 2024-05-28 |
Family
ID=90318369
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117765410B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115249332A (en) * | 2022-09-23 | 2022-10-28 | 山东锋士信息技术有限公司 | Hyperspectral image classification method and device based on space spectrum double-branch convolution network |
WO2023045231A1 (en) * | 2021-09-22 | 2023-03-30 | 浙江大学 | Method and apparatus for facial nerve segmentation by decoupling and divide-and-conquer |
CN116310701A (en) * | 2022-11-25 | 2023-06-23 | 中国人民解放军战略支援部队信息工程大学 | Cross-view image geographic positioning method and device based on attention efficiency network |
CN116665053A (en) * | 2023-05-30 | 2023-08-29 | 浙江时空智子大数据有限公司 | High-resolution remote sensing image building identification method and system considering shadow information |
Non-Patent Citations (1)
Title |
---|
Remote sensing image segmentation model based on an attention mechanism; Liu Hang; Wang Xili; Laser & Optoelectronics Progress; 2020, No. 04, pp. 170-180 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||