CN115861635B - Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion - Google Patents

Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion

Info

Publication number
CN115861635B
CN115861635B
Authority
CN
China
Prior art keywords
feature
scale
features
semantic
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310125661.6A
Other languages
Chinese (zh)
Other versions
CN115861635A (en)
Inventor
郑先伟
丁友丽
宦麟茜
马启源
熊汉江
陈学业
聂可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center
Wuhan University WHU
Original Assignee
Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Planning and Natural Resources Data Management Center (Shenzhen Spatial Geographic Information Center) and Wuhan University
Priority to CN202310125661.6A
Publication of CN115861635A
Application granted
Publication of CN115861635B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a perspective-distortion-resistant method and device for extracting semantic information from unmanned aerial vehicle oblique images. A dense context learning network is designed to robustly extract semantic information about objects in unmanned aerial vehicle oblique images. The network uses a generic encoder-decoder as its main framework to perform feature learning and semantic prediction for objects in the oblique images. The encoder is built from a parallel dual-branch feature extractor that takes a pair of images at different resolutions as input and extracts multi-scale features both within and across hierarchy levels. A cross-scale context selector is constructed to adaptively fuse the multi-scale features encoded from the different-resolution inputs. Joint supervision of the encoder and decoder is adopted: during training, the semantic features encoded by the dual-branch feature extractor and the semantic prediction map produced by the decoder are iteratively optimized, enhancing the effectiveness of the multi-scale features and the robustness of the prediction results.

Description

Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion
Technical Field
The invention belongs to the field of computer applications and mainly relates to a perspective-distortion-resistant method for extracting semantic information from unmanned aerial vehicle oblique images.
Background
Oblique images acquired by unmanned aerial vehicles are one of the most important data sources for urban scene observation. They capture both the top and side surfaces of urban objects, combining the advantages of ground vehicle-mounted images and remote sensing images. Semantic information extraction from unmanned aerial vehicle oblique images plays a vital role in many urban applications, including urban planning, dynamic monitoring, and three-dimensional semantic modeling of cities. The purpose of pixel-level semantic extraction, i.e., semantic segmentation, is to assign a unique semantic label to each pixel in an image. To date, there has been a great deal of research on semantic segmentation of ground vehicle-mounted images and remote sensing images, but little on semantic segmentation of unmanned aerial vehicle oblique images.
In recent years, with the rapid development of convolutional neural networks (CNNs), semantic segmentation has achieved tremendous success. Many CNN models show strong segmentation performance, especially on ground vehicle-mounted images and remote sensing images. However, compared with those image types, unmanned aerial vehicle oblique images pose a more complex scale problem, which makes existing semantic segmentation models difficult to apply directly. Specifically, for the same urban scene, the scale problem faced in segmenting oblique images stems from two aspects. 1) More complex scale variation. In remote sensing images, scale variation arises from differences in the physical size of different objects. In oblique images, in addition to this inter-object difference, perspective distortion induces scale variation along the depth direction: objects of the same type appear at different scales depending on their distance from the camera. 2) A larger scene range. Scale variation between objects in ground vehicle-mounted images is also affected by perspective distortion, but it is limited by the more local scene range of such images. In contrast, unmanned aerial vehicle oblique images cover larger swaths containing more, and more densely packed, objects, making pixel-level semantic segmentation more challenging. To address these difficulties, current mainstream methods fuse multi-scale features encoded from images of different resolutions, or fuse predicted multi-scale segmentation results. Although these methods achieve a certain segmentation effect, their performance is limited by insufficient use of the multi-scale features.
Disclosure of Invention
To solve the above technical problems, the invention provides a perspective-distortion-resistant method for extracting semantic information from unmanned aerial vehicle oblique images, realizing high-precision semantic segmentation of such images.
The perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method disclosed by the invention is characterized by comprising the following steps:
Step 1: multi-scale feature encoding based on a dual-branch feature extractor. Taking a dual-resolution image pair as input, multi-scale features are extracted at different levels and at the same level: features at different levels come from different coding layers of the encoder, while features at the same level come from the same coding layer applied to inputs of different resolutions.
Step 2: dense feature extraction based on a cross-scale context selector. The constructed selector extracts context information from the dual-scale features of the same coding layer, obtains a feature weight map corresponding to the dual-scale features, and adaptively fuses the dual-scale feature maps according to the learned weight map.
The cross-scale context selector first upsamples the multi-scale features encoded from the lower-resolution image to the same size as those encoded from the higher-resolution image and splices them along the channel dimension, obtaining a feature map fusing the dual-scale features. A cross-scale context encoding structure then learns the context relations between cross-scale pixels in this dual-scale feature map, assigning an importance weight to each of its pixels so that the dual-scale features can be fused selectively.
Step 3: feature decoding based on a multi-scale feature aggregator. Low-level geometric features of the encoder are first embedded layer by layer into the high-level semantic features of the decoder in a top-down feature transfer mode; a multi-scale feature aggregator is then built to fuse semantic features of different scales across several decoding layers, enhancing the robustness of the decoded feature map in expressing the semantics of multi-scale objects in the unmanned aerial vehicle image.
The feature aggregator applies a convolution to each semantic feature map, merges the maps along the channel direction, obtains a global feature description vector and the context relations among the features, and finally obtains an importance weight vector for the features of each channel.
Further, in step 1, wide-resnet38 is adopted to extract the multi-scale features.
Further, the cross-scale context selector consists of two 1×1 convolution layers, two consecutive 3×3 convolution layers, and a sigmoid activation function, wherein the 1×1 convolution layers mix the dual-scale features and perform dimension reduction, the 3×3 convolution layers extract local context information from the mixed dual-scale features, and the sigmoid activation function normalizes the encoded mixed features into the feature weight map corresponding to the dual-scale features.
Further, according to the learned feature weight map, the dual-scale feature maps are adaptively fused by the following formula:

$$F_i = f_{1\times1}\left(\left(W_i \otimes X_i^{1.0}\right) \oplus \left(W_i \otimes U\left(X_i^{0.5}\right)\right)\right) \quad (1)$$

In formula 1, $F_i$ denotes the fused feature map obtained from the dual-scale feature maps by context selection, $f_{1\times1}$ denotes a convolution with a $1\times1$ kernel used for dimension reduction, the symbol $\otimes$ denotes matrix multiplication, $\oplus$ denotes matrix addition, and $U(\cdot)$ denotes the upsampling operation on a feature map. The multi-scale features encoded from the dual-resolution input images are denoted $X_i^{1.0}$ and $X_i^{0.5}$ respectively; both are fused according to the feature weight map $W_i$.
Further, the decoder adopts a top-down feature transfer mode to embed the low-level geometric features of the encoder into the high-level features of the decoder layer by layer, specifically using the following formula:

$$D_i = f_{3\times3}\left(F_i \oplus U_{\times2}\left(D_{i+1}\right)\right) \quad (2)$$

In formula 2, $F_i$ is the feature map fusing the dual-scale features, $f_{3\times3}$ denotes a convolution with a $3\times3$ kernel used to fuse the different features, the symbol $\oplus$ denotes matrix addition, and $U_{\times2}(\cdot)$ denotes upsampling the decoding feature map $D_{i+1}$ to the same size as $F_i$ for feature fusion.
Still further, the multi-scale feature aggregator first applies a 3×3 convolution to each semantic feature map, then upsamples the feature maps of the third and fourth decoding layers to the same size as that of the second decoding layer and merges them along the channel direction. Next, the aggregator obtains a global feature description vector through a global average pooling operation along the channel direction, and further models the context relations among the features in the global feature description vector using two fully connected layers. Finally, the importance weight vector of each channel's features is obtained through sigmoid normalization.
Based on the same inventive concept, the present solution also provides a system for implementing the above perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method, characterized in that it comprises an encoding module, a feature extraction module, and a decoding module;
the encoding module performs multi-scale feature encoding based on a dual-branch feature extractor, taking a dual-resolution image pair as input; the extracted multi-scale features include features at different levels and at the same level, where features at different levels come from different coding layers of the encoder and features at the same level come from the same coding layer applied to inputs of different resolutions;
the feature extraction module performs dense feature extraction based on a cross-scale context selector, which extracts context information from the dual-scale features of the same coding layer, obtains the feature weight map corresponding to the dual-scale features, and adaptively fuses the dual-scale feature maps according to the learned weight map;
the decoding module performs feature decoding based on the multi-scale feature aggregator: it first embeds the low-level geometric features of the encoder into the high-level features of the decoder layer by layer in a top-down feature transfer mode, obtaining semantic feature maps with detail information of different granularities; it then builds a multi-scale context aggregator that applies a convolution to each semantic feature map, merges the maps along the channel direction, performs global average pooling along the channel direction to obtain a global feature description vector, models the context relations among features, and finally obtains the importance weight vector of each channel's features for adaptive aggregation of the multi-layer semantic feature maps.
Based on the same inventive concept, the present solution also designs a network training method for the above system, characterized in that:
network training is performed with joint supervision of the encoder and decoder: semantic supervision is applied to the final decoding features, and additional semantic supervision is added on the highest-layer features of the encoder to guide the encoded features to back-propagate gradients in a more effective manner, thereby further optimizing the semantic prediction results;
the total loss equation for this joint supervision model can be expressed as follows:
in the method, in the process of the invention,representing semantic truth value->And->Representing the final prediction map from the decoder and the prediction map from the highest layer features of the encoder, respectively; />Representing the number of different resolution images of the network input,/-for>Andrespectively represent for supervision->And->The loss equation of (2), the loss is exploited->And->As a weight to balance->And->Importance in the overall loss function.
Based on the same inventive concept, the present solution also designs an electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
when one or more programs are executed by the one or more processors, the one or more processors implement a method for extracting semantic information of unmanned aerial vehicle inclined images with anti-transmission distortion.
Based on the same inventive concept, the present solution also designs a computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the above perspective-distortion-resistant method for extracting semantic information from unmanned aerial vehicle oblique images.
The innovation points of the invention are:
1) A novel deep neural network for dense context learning is used for semantic segmentation of unmanned aerial vehicle oblique images; it effectively aggregates multi-scale context information from multi-resolution image encodings, enhancing the distortion resistance of features during semantic segmentation of oblique images.
2) A cross-scale context selector embedded in multiple coding layers is constructed, densely and selectively fusing context information from the multi-level dual-scale feature maps and thereby enhancing the information expression capability of the encoded features.
3) A multi-scale feature aggregator is introduced to effectively aggregate long-range context information from multiple decoding layers, finally yielding an accurate semantic prediction map for the unmanned aerial vehicle oblique image.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a cross-scale context selector according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a multi-scale context aggregator of an embodiment of the invention.
Detailed Description
To facilitate the understanding and practice of the invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here are for illustration and explanation only and are not intended to limit the invention.
The technical scheme adopted by the invention is a perspective-distortion-resistant method for extracting semantic information from unmanned aerial vehicle oblique images; its flow is shown in Fig. 1, and the specific implementation steps are explained as follows:
step 1, multi-scale feature coding based on a dual-branch feature extractor. With the dual-resolution image as input, in this embodiment, the oblique image and the scaled 0.5-fold image of the unmanned aerial vehicle with the original resolution are adopted, and the wide-resnet38 is adopted to extract the multi-scale features. The multi-scale features include multi-scale features of different levels and the same level. Wherein the multi-scale features of different levels come from different coding layers of the encoder, and the multi-scale features of the same level come from the same coding layer of different resolution inputs.
Step 2: dense feature extraction based on the cross-scale context selector. To fuse the multi-scale features of the same hierarchy level, the invention constructs a cross-scale context selector, as shown in Fig. 2. In the dual-branch feature extractor, the multi-scale features encoded from the original-resolution image are denoted $X_i^{1.0}$, and those encoded from the 0.5×-resolution image are denoted $X_i^{0.5}$. To extract useful context information from the corresponding dual-scale features of the same coding layer, $X_i^{0.5}$ is first upsampled to the same size as $X_i^{1.0}$, and the two are spliced together along the channel dimension to obtain a feature map fusing the dual-scale features. A cross-scale context encoding structure then learns the context relations between cross-scale pixels in this dual-scale feature map and assigns an importance weight to each of its pixels for fusing the dual-scale features. The cross-scale context encoding structure consists of two 1×1 convolution layers, two consecutive 3×3 convolution layers, and a sigmoid activation function. The 1×1 convolution layers mix the dual-scale features and perform dimension reduction, the 3×3 convolution layers extract local context information from the mixed dual-scale features, and the sigmoid activation function normalizes the encoded mixed features into the feature weight map $W_i$ corresponding to the dual-scale features. The feature weight map represents the importance weight of each pixel of the corresponding feature map within the dual-scale features and models long-range context relations in the dual-scale feature map. According to this map, the cross-scale context selector effectively retains the useful context information in the dual-scale features while suppressing redundant features. Finally, according to the learned feature weight map, the dual-scale feature maps can be adaptively fused by the following formula:
$$F_i = f_{1\times1}\left(\left(W_i \otimes X_i^{1.0}\right) \oplus \left(W_i \otimes U\left(X_i^{0.5}\right)\right)\right) \quad (1)$$
In formula 1, $F_i$ denotes the fused feature map obtained from the dual-scale feature maps by context selection, and $f_{1\times1}$ denotes a convolution with a $1\times1$ kernel used for dimension reduction. The symbol $\otimes$ denotes matrix multiplication, $\oplus$ denotes matrix addition, and $U(\cdot)$ denotes the upsampling operation on a feature map.
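A sketch of the cross-scale context selector follows. The layer layout (1×1 mixing convolution, two consecutive 3×3 convolutions, sigmoid) follows the description above, while the final weighted fusion encodes the reconstruction of formula 1 given here and may differ in detail from the patent's Fig. 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleContextSelector(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # First 1x1 conv: mixes the spliced dual-scale features and reduces
        # the channel dimension from 2*channels back to channels.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Two consecutive 3x3 convs: extract local context of the mixture.
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))
        # Second 1x1 conv: the f_1x1 term of formula 1.
        self.reduce = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x_full, x_half):
        # Upsample the lower-resolution feature to the higher-resolution
        # size and splice the pair along the channel dimension.
        x_up = F.interpolate(x_half, size=x_full.shape[-2:],
                             mode='bilinear', align_corners=False)
        mixed = self.mix(torch.cat([x_full, x_up], dim=1))
        # Feature weight map W_i: per-pixel importance in [0, 1].
        w = torch.sigmoid(self.context(mixed))
        # Formula 1 (our reconstruction): weight both scales by W_i,
        # add them, then apply the 1x1 convolution.
        return self.reduce(w * x_full + w * x_up)

# Usage: fuse the level-i features of the two encoder branches.
sel = CrossScaleContextSelector(channels=256)
fused = sel(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 32, 32))
```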
Step 3: feature decoding based on the multi-scale feature aggregator. Because convolution and downsampling operations during encoding inevitably lose some detail features while extracting semantics, and these details are critical for pixel-level semantic prediction, a decoder must be designed to restore the spatial resolution for pixel-level prediction. Considering that lower-layer encoded features contain more detail information while higher-layer encoded features carry more explicit semantic information, the decoder first adopts the top-down feature transfer mode of formula 2 to embed the lower-layer features into the higher-layer features layer by layer, obtaining semantic feature maps with detail information of different granularities.
$$D_i = f_{3\times3}\left(F_i \oplus U_{\times2}\left(D_{i+1}\right)\right) \quad (2)$$
In formula 2, $F_i$ is the feature map fusing the dual-scale features obtained above, and $f_{3\times3}$ denotes a convolution with a $3\times3$ kernel used to fuse the different features. The symbol $\oplus$ denotes matrix addition, and $U_{\times2}(\cdot)$ denotes upsampling the decoding feature map $D_{i+1}$ to the same size as $F_i$ for feature fusion.
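The top-down transfer of formula 2 can be sketched as follows, assuming all fused encoder features have already been projected to a common channel width; sharing a single 3×3 convolution across layers is a simplification for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def top_down_decode(fused_feats, channels=256):
    """fused_feats: [F_1, ..., F_4] from shallow to deep, all with
    `channels` channels; returns decoding features [D_1, ..., D_4]."""
    conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
    decoded = [fused_feats[-1]]          # D_4 = F_4 at the deepest layer
    for f in reversed(fused_feats[:-1]):
        # U(D_{i+1}): upsample the deeper decoding feature to F_i's size.
        up = F.interpolate(decoded[0], size=f.shape[-2:], mode='bilinear',
                           align_corners=False)
        decoded.insert(0, conv(f + up))  # D_i = f_3x3(F_i (+) U(D_{i+1}))
    return decoded
```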
To further merge these semantic feature maps carrying detail information of different granularities, the invention introduces a multi-scale feature aggregator. To enhance the local context information of the features, the aggregator first applies a 3×3 convolution to each semantic feature map $D_i$, and then upsamples $D_3$ and $D_4$ to the same size as $D_2$ and merges them along the channel direction. Next, to extract from the merged feature map $\hat{D}$ the features most effective for the final semantic prediction, the aggregator obtains a global feature description vector using a global average pooling operation along the channel direction, and further models the context relations among the features in this vector using two fully connected layers. Finally, the importance weight vector $V$ of each channel's features is obtained by sigmoid normalization. Letting $A$ denote the aggregated output, the feature aggregator aggregates the semantic feature maps with detail information of different granularities by formula 3:
$$A^k = V_k \odot \hat{D}^k \quad (3)$$
In formula 3, $A^k$ and $\hat{D}^k$ denote the feature maps of the $k$-th channel of $A$ and $\hat{D}$ respectively, and the symbol $\odot$ denotes element-wise multiplication.
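A sketch of the multi-scale feature aggregator follows. The 3×3 convolutions, the two-layer fully connected bottleneck, and the reduction ratio are illustrative assumptions consistent with the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureAggregator(nn.Module):
    def __init__(self, channels, num_maps=3, reduction=4):
        super().__init__()
        # One 3x3 conv per decoding feature map to enhance local context.
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_maps))
        merged = channels * num_maps
        # Two fully connected layers model context among channel features;
        # sigmoid normalizes them into the importance weight vector V.
        self.fc = nn.Sequential(
            nn.Linear(merged, merged // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(merged // reduction, merged),
            nn.Sigmoid())

    def forward(self, feats):
        # feats: [D_2, D_3, D_4]; deeper maps are upsampled to D_2's size
        # and merged along the channel direction.
        target = feats[0].shape[-2:]
        maps = [F.interpolate(conv(d), size=target, mode='bilinear',
                              align_corners=False)
                for conv, d in zip(self.convs, feats)]
        merged = torch.cat(maps, dim=1)
        v = self.fc(merged.mean(dim=(2, 3)))   # GAP -> two FCs -> sigmoid
        # Formula 3: A^k = V_k (.) D^k, channel-wise reweighting.
        return merged * v[:, :, None, None]
```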
Based on the same inventive concept, the present solution also designs a network model structure for the above method, comprising an encoding module, a feature extraction module, and a decoding module.
The encoding module performs multi-scale feature encoding based on the dual-branch feature extractor, taking a dual-resolution image pair as input; the extracted multi-scale features include features at different levels and at the same level, where features at different levels come from different coding layers of the encoder and features at the same level come from the same coding layer applied to inputs of different resolutions.
The feature extraction module performs dense feature extraction based on the cross-scale context selector, which extracts context information from the dual-scale features of the same coding layer, obtains the feature weight map corresponding to the dual-scale features, and adaptively fuses the dual-scale feature maps according to the learned weight map.
The decoding module performs feature decoding based on the multi-scale feature aggregator: it first embeds the low-level geometric features of the encoder into the high-level features of the decoder layer by layer in a top-down feature transfer mode, obtaining semantic feature maps with detail information of different granularities; it then builds a multi-scale context aggregator that applies a convolution to each semantic feature map, merges the maps along the channel direction, performs global average pooling along the channel direction to obtain a global feature description vector, models the context relations among features, and finally obtains the importance weight vector of each channel's features for adaptive aggregation of the multi-layer semantic feature maps.
Refined semantic prediction based on encoder-decoder joint supervision. To enhance the effectiveness of feature expression during dense context feature extraction, the invention trains the network model structure with joint supervision of the encoder and decoder. This supervision applies semantic supervision to the final decoding features and adds additional semantic supervision on the highest-layer features of the encoder, guiding the encoded features to back-propagate gradients in a more effective manner and thereby further optimizing the semantic prediction results. The total loss of this joint supervision model can be expressed as follows:
$$L = \lambda_d\, \ell_d\left(\hat{Y}_d, Y\right) + \lambda_e \sum_{n=1}^{N} \ell_e\left(\hat{Y}_e^n, Y\right) \quad (4)$$
In formula 4, $Y$ denotes the semantic ground truth, and $\hat{Y}_d$ and $\hat{Y}_e^n$ denote the final prediction map from the decoder and the prediction map from the highest-layer features of the encoder, respectively. $N$ denotes the number of different-resolution images input to the network and is set to 2 in the dense context learning network. $\ell_d$ and $\ell_e$ denote the losses used to supervise $\hat{Y}_d$ and $\hat{Y}_e^n$, both computed using the region mutual information (RMI) loss. Furthermore, the total loss uses $\lambda_d$ and $\lambda_e$ as weights to balance the importance of the two terms in the overall loss function.
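The joint supervision of formula 4 can be sketched as below, with ordinary cross-entropy standing in for the region mutual information (RMI) loss and with illustrative weight values for the two balancing terms.

```python
import torch
import torch.nn as nn

def joint_supervision_loss(decoder_logits, encoder_logits_list, target,
                           w_dec=1.0, w_enc=0.4):
    """Formula 4 (sketch): supervise the decoder's final prediction and one
    auxiliary prediction per input resolution (N = 2 in this embodiment).
    Cross-entropy replaces the RMI loss; weights are illustrative."""
    ce = nn.CrossEntropyLoss()
    loss = w_dec * ce(decoder_logits, target)
    for enc_logits in encoder_logits_list:
        # Auxiliary supervision on the encoder's highest-layer features.
        loss = loss + w_enc * ce(enc_logits, target)
    return loss

# Usage: logits have shape (B, num_classes, H, W), target (B, H, W).
```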
Based on the same inventive concept, the present solution also designs an electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
when one or more programs are executed by the one or more processors, the one or more processors implement a method for extracting semantic information of unmanned aerial vehicle inclined images with anti-transmission distortion.
Based on the same inventive concept, the present solution also designs a computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the above perspective-distortion-resistant method for extracting semantic information from unmanned aerial vehicle oblique images.
It should be understood that the foregoing description of preferred embodiments is illustrative and does not limit the protection scope of the invention, which is defined by the appended claims; those skilled in the art may make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims.

Claims (10)

1. A perspective-distortion-resistant method for extracting semantic information from unmanned aerial vehicle oblique images, characterized by comprising the following steps:
step 1, multi-scale feature encoding based on a dual-branch feature extractor: taking a dual-resolution unmanned aerial vehicle oblique image pair as input, multi-scale features are extracted at different levels and at the same level, where features at different levels come from different coding layers of the encoder and features at the same level come from the same coding layer applied to inputs of different resolutions;
step 2, dense feature extraction based on a cross-scale context selector: the constructed cross-scale context selector extracts context information from the dual-scale features of the same coding layer, obtains a feature weight map corresponding to the dual-scale features, and adaptively fuses the dual-scale feature maps according to the learned weight map;
the cross-scale context selector first upsamples the multi-scale features encoded from the lower-resolution image to the same size as those encoded from the higher-resolution image and splices them along the channel dimension, obtaining a feature map fusing the dual-scale features; a cross-scale context encoding structure then learns the context relations between cross-scale pixels in this dual-scale feature map, assigning an importance weight to each of its pixels so that the dual-scale features are fused selectively;
step 3, feature decoding based on a multi-scale feature aggregator: low-level geometric features of the encoder are first embedded layer by layer into the high-level semantic features of the decoder in a top-down feature transfer mode, and a multi-scale feature aggregator is then built to fuse semantic features of different scales across several decoding layers, enhancing the robustness of the decoded feature map in expressing the semantics of multi-scale objects in the unmanned aerial vehicle image;
the multi-scale feature aggregator first applies a convolution to each semantic feature map, merges the maps along the channel direction, then obtains a global feature description vector and the context relations among the features, and finally obtains an importance weight vector for the features of each channel.
2. The perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method according to claim 1, characterized in that: in step 1, wide-resnet38 is adopted to extract the multi-scale features.
3. The perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method according to claim 1, characterized in that: the cross-scale context selector consists of two 1×1 convolution layers, two consecutive 3×3 convolution layers, and a sigmoid activation function, wherein the 1×1 convolution layers mix the dual-scale features and perform dimension reduction, the 3×3 convolution layers extract local context information from the mixed dual-scale features, and the sigmoid activation function normalizes the encoded mixed features into the feature weight map corresponding to the dual-scale features.
4. The perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method according to claim 1, characterized in that: according to the learned feature weight map, the dual-scale feature maps are adaptively fused by the following formula:

$$F_i = f_{1\times1}\left(\left(W_i \otimes X_i^{1.0}\right) \oplus \left(W_i \otimes U\left(X_i^{0.5}\right)\right)\right)$$

where $F_i$ denotes the fused feature map obtained from the dual-scale feature maps by context selection, $f_{1\times1}$ denotes a convolution with a $1\times1$ kernel used for dimension reduction, the symbol $\otimes$ denotes matrix multiplication, $\oplus$ denotes matrix addition, $U(\cdot)$ denotes the upsampling operation on a feature map, the multi-scale features encoded from the dual-resolution input images are denoted $X_i^{1.0}$ and $X_i^{0.5}$ respectively, and $W_i$ is the feature weight map.
5. The perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method according to claim 1, characterized in that: based on the weight vector of the feature aggregator, the decoder adopts a top-down feature transfer mode to embed the lower-layer features into the higher-layer features layer by layer, specifically using the following formula:

$$D_i = f_{3\times3}\left(F_i \oplus U_{\times2}\left(D_{i+1}\right)\right)$$

where $F_i$ is the feature map fusing the dual-scale features, $f_{3\times3}$ denotes a convolution with a $3\times3$ kernel used to fuse the different features, the symbol $\oplus$ denotes matrix addition, and $U_{\times2}(\cdot)$ denotes upsampling the decoding feature map $D_{i+1}$ to the same size as $F_i$ for feature fusion.
6. The perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method according to claim 5, characterized in that: the multi-scale feature aggregator first applies a 3×3 convolution to each semantic feature map, then upsamples the feature maps of the third and fourth decoding layers to the same size as that of the second decoding layer and merges them along the channel direction; next, the aggregator obtains a global feature description vector through a global average pooling operation along the channel direction and further models the context relations among the features in the global feature description vector using two fully connected layers; finally, the importance weight vector of each channel's features is obtained through sigmoid normalization.
7. A system for implementing the perspective-distortion-resistant unmanned aerial vehicle oblique image semantic information extraction method according to any one of claims 1 to 6, characterized in that: it comprises an encoding module, a feature extraction module, and a decoding module;
the encoding module performs multi-scale feature encoding based on a dual-branch feature extractor, taking a dual-resolution unmanned aerial vehicle oblique image pair as input; the extracted multi-scale features include features at different levels and at the same level, where features at different levels come from different coding layers of the encoder and features at the same level come from the same coding layer applied to inputs of different resolutions;
the feature extraction module performs dense feature extraction based on a cross-scale context selector, which extracts context information from the dual-scale features of the same coding layer, obtains the feature weight map corresponding to the dual-scale features, and adaptively fuses the dual-scale feature maps according to the learned weight map;
the decoding module performs feature decoding based on a multi-scale feature aggregator: it first embeds the low-level geometric features of the encoder into the high-level features of the decoder layer by layer in a top-down feature transfer mode, obtaining semantic feature maps with detail information of different granularities; it then builds a multi-scale context aggregator that applies a convolution to each semantic feature map, merges the maps along the channel direction, performs global average pooling along the channel direction to obtain a global feature description vector, models the context relations among features, and finally obtains the importance weight vector of each channel's features for adaptive aggregation of the multi-layer semantic feature maps.
8. A network training method for the system of claim 7, characterized in that:
network training is performed with joint supervision of the encoder and decoder: semantic supervision is applied to the final decoding features, and additional semantic supervision is added on the highest-layer features of the encoder to guide the encoded features to back-propagate gradients in a more effective manner, thereby further optimizing the semantic prediction results;
the total loss equation for this joint supervision model can be expressed as follows:
in the method, in the process of the invention,representing semantic truth value->And->Representing the final prediction map from the decoder and the highest layer bits from the encoder, respectivelyA predictive graph of symptoms; />Representing the number of different resolution images of the network input,/-for>Andrespectively represent for supervision->And->Is to use +.>And->As a weight to balance->And->Importance in the overall loss function.
9. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 6.
10. A computer readable medium having a computer program stored thereon, characterized by: the program, when executed by a processor, implements the method of any of claims 1-6.
CN202310125661.6A 2023-02-17 2023-02-17 Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion Active CN115861635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310125661.6A CN115861635B (en) 2023-02-17 2023-02-17 Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310125661.6A CN115861635B (en) 2023-02-17 2023-02-17 Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion

Publications (2)

Publication Number Publication Date
CN115861635A CN115861635A (en) 2023-03-28
CN115861635B (en) 2023-07-28

Family

ID=85658259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310125661.6A Active CN115861635B (en) 2023-02-17 2023-02-17 Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion

Country Status (1)

Country Link
CN (1) CN115861635B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883476A (en) * 2023-06-29 2023-10-13 唐山学院 Monocular depth estimation method based on attention feature fusion and multistage correction
CN117152441B (en) * 2023-10-19 2024-05-07 中国科学院空间应用工程与技术中心 Biological image instance segmentation method based on cross-scale decoding

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916435B (en) * 2010-08-30 2011-12-28 武汉大学 Method for fusing multi-scale spectrum projection remote sensing images
WO2018052586A1 (en) * 2016-09-14 2018-03-22 Konica Minolta Laboratory U.S.A., Inc. Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
US20220262002A1 (en) * 2019-07-01 2022-08-18 Optimum Semiconductor Technologies Inc. Feedbackward decoder for parameter efficient semantic image segmentation
CN112509001A (en) * 2020-11-24 2021-03-16 河南工业大学 Multi-scale and multi-feature fusion feature pyramid network blind restoration method
CN113780296B (en) * 2021-09-13 2024-02-02 山东大学 Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN113850824B (en) * 2021-09-27 2024-03-29 太原理工大学 Remote sensing image road network extraction method based on multi-scale feature fusion
CN114693929A (en) * 2022-03-31 2022-07-01 西南科技大学 Semantic segmentation method for RGB-D bimodal feature fusion
CN115512103A (en) * 2022-09-01 2022-12-23 中国海洋大学 Multi-scale fusion remote sensing image semantic segmentation method and system

Also Published As

Publication number Publication date
CN115861635A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN115861635B (en) Unmanned aerial vehicle oblique image semantic information extraction method and device resistant to perspective distortion
CN111062951B (en) Knowledge distillation method based on semantic segmentation intra-class feature difference
Zhou et al. GMNet: Graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation
CN109377530B (en) Binocular depth estimation method based on depth neural network
US11430134B2 (en) Hardware-based optical flow acceleration
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN110782490A (en) Video depth map estimation method and device with space-time consistency
CN108491763B (en) Unsupervised training method and device for three-dimensional scene recognition network and storage medium
CN113657388A (en) Image semantic segmentation method fusing image super-resolution reconstruction
CN112598053A (en) Active significance target detection method based on semi-supervised learning
CN115294282A (en) Monocular depth estimation system and method for enhancing feature fusion in three-dimensional scene reconstruction
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN112241959A (en) Attention mechanism generation semantic segmentation method based on superpixels
Ke et al. Mdanet: Multi-modal deep aggregation network for depth completion
CN116229394A (en) Automatic driving image recognition method, device and recognition equipment
CN114693744A (en) Optical flow unsupervised estimation method based on improved cycle generation countermeasure network
Berenguel-Baeta et al. Fredsnet: Joint monocular depth and semantic segmentation with fast fourier convolutions
CN114758203B (en) Residual intensive visual transformation method and system for hyperspectral image classification
CN116597135A (en) RGB-D multi-mode semantic segmentation method
CN115661482A (en) RGB-T significant target detection method based on joint attention
CN111726621B (en) Video conversion method and device
Bi et al. EBStereo: edge-based loss function for real-time stereo matching
Li et al. SGNet: a fast and accurate semantic segmentation network based on semantic guidance
CN116824308B (en) Image segmentation model training method and related method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Floor 2, Land and Property Transaction Building, No. 8007, Hongli West Road, Xiangmihu Street, Futian District, Shenzhen, Guangdong 518034
Applicant after: Shenzhen Planning and Natural Resources Data Management Center (Shenzhen Spatial Geographic Information Center)
Applicant after: WUHAN University
Address before: 430072 No. 299 Bayi Road, Wuchang District, Hubei, Wuhan
Applicant before: WUHAN University
Applicant before: Shenzhen Planning and Natural Resources Data Management Center (Shenzhen Spatial Geographic Information Center)
GR01 Patent grant