CN114639020A

CN114639020A - Segmentation network, segmentation system and segmentation device for target object of image

Info

Publication number: CN114639020A
Application number: CN202210303635.3A
Authority: CN
Inventors: 刘琦 (Liu Qi); 李阳 (Li Yang); 肖博 (Xiao Bo); 路慧 (Lu Hui); 杨志云 (Yang Zhiyun)
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2022-06-17

Abstract

The invention provides a segmentation network, a segmentation system and a segmentation device for a target object in an image, and relates to the field of semantic segmentation of remote sensing images. The segmentation network, system and device comprise a backbone network, a multi-scale middle-layer component and a decoding module, wherein the multi-scale middle-layer component comprises an attention module and a feature fusion module. The attention module extracts a feature map, which the feature fusion module then processes to represent multi-scale information. The invention can capture deep semantic information and multi-scale information in high-resolution remote sensing images and improves prediction accuracy.

Description

Segmentation network, segmentation system and segmentation device for target object of image

Technical Field

The invention relates to the field of semantic segmentation of remote sensing images, and in particular to a segmentation network, a segmentation system and a segmentation device for a target object in an image.

Background

With the improvement in the technology and number of China's remote sensing satellite series, and the gradual formation of network service platforms, the quantity and quality of remote sensing images continue to rise, so researchers can obtain high-resolution remote sensing images more conveniently. Remote sensing images play an important role in land resource exploration, environmental monitoring and protection, urban planning, crop yield estimation, disaster prevention and mitigation, and other fields. However, because the data volume is huge, extracting relevant information from images automatically, efficiently and quickly has become a very important research direction. Automatically identifying surface buildings in high-resolution satellite images is of great value for updating urban spatial databases, monitoring urban dynamics and building "smart cities".

Pan et al. propose a generative adversarial network with spatial and channel attention mechanisms (SCA) for accurate segmentation of buildings. Protopapadakis et al. [4] propose a deep neural network (DNN) driven by a stacked autoencoder (SAE) and trained in a semi-supervised (SSL) manner for extracting buildings from low-cost satellite imagery. Wang et al. [5] propose a non-local residual U-shaped network that uses an encoder-decoder structure to extract and recover feature maps and a self-attention mechanism to obtain global context information. Hu et al. [6] build new modules from purpose-designed components to improve performance, and introduce an attention mechanism into the network to improve segmentation accuracy. Liu et al. [7] propose a lightweight network that recovers details by means of a spatial pyramid. Chen et al. [8] propose an adaptive iterative segmentation method. Cheng et al. [9] propose the Deep Active Ray Network (DARNet), trained end to end, which achieves accurate building segmentation through energy minimization and back-propagation through the backbone CNN. Shi et al. [10] combine a graph convolutional network (GCN) with deep structured feature embedding (DSFE) in a gated graph convolutional network that produces sharp boundaries and fine-grained pixel-level classification.

High-resolution remote sensing images have higher resolution than ordinary images and contain more texture features and fine details. Moreover, buildings with the same hue and characteristics may look different. As a result, semantic extraction may be incomplete or may produce false extractions. To extract more features, some researchers scale up the network to improve accuracy. However, deepening the network and increasing its computational cost drives the number of parameters into the millions, so lightweight, real-time prediction cannot be achieved. The field therefore needs methods that are both real-time and capable of handling multiple scales.

Disclosure of Invention

(I) Technical problem to be solved

To address the shortcomings of the prior art, the invention provides a segmentation network, a segmentation system and a segmentation device for a target object in an image, which can capture deep semantic information and multi-scale information in high-resolution remote sensing images and improve prediction accuracy.

(II) Technical scheme

To achieve the above purpose, the invention adopts the following technical scheme:

In one aspect, a segmentation network for a target object of an image is provided, the segmentation network comprising:

cropping the data set with a sliding window;

extracting semantic feature vectors from the cropped images through an encoder, in which the ordinary convolutions of the encoder are replaced with Res2Net-series convolutions;

fusing the channel information of the semantic feature vectors in the middle layer; and

restoring the fused semantic feature vectors to the size of the input image through a decoder.
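The Res2Net-series convolutions named in the second step are not detailed further. As a rough illustration only, the sketch below shows the hierarchical split used in Res2Net blocks: channels are divided into groups, and each group's convolution also receives the previous group's output, so later groups see a wider context. It is a simplified sketch, not the invention's implementation: 1-D lists stand in for feature maps, and the fixed 3-tap kernel and the names `conv3` and `res2net_split` are hypothetical stand-ins for learned 3×3 convolutions.

```python
def conv3(signal, kernel=(0.25, 0.5, 0.25)):
    """Same-size 1-D convolution with a 3-tap kernel and zero padding."""
    padded = [0.0] + list(signal) + [0.0]
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal))]


def res2net_split(groups):
    """Hierarchical processing of channel groups x_1..x_s:
    y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_(i-1)) for i > 2."""
    outputs = []
    for i, x in enumerate(groups):
        if i == 0:
            y = list(x)                                        # identity for the first group
        elif i == 1:
            y = conv3(x)                                       # plain convolution for the second
        else:
            y = conv3([a + b for a, b in zip(x, outputs[-1])])  # add previous group's output first
        outputs.append(y)
    return outputs  # Res2Net concatenates these and fuses them with a 1x1 convolution


ys = res2net_split([[1, 1, 1, 1], [0, 4, 0, 0], [0, 0, 0, 0]])
print(ys[1])  # [1.0, 2.0, 1.0, 0.0]
print(ys[2])  # [1.0, 1.5, 1.0, 0.25]
```

Because the third group is convolved after adding the second group's output, its result depends on a 5-position neighbourhood while the second group's depends on 3, which is the multi-scale effect Res2Net aims for.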

Further, the Res2Net convolutions are replaced with dilated (atrous) convolutions.
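Dilated (atrous) convolution, also rendered as "hole convolution", spaces the kernel taps `d` positions apart, so a k-tap kernel spans k + (k − 1)(d − 1) inputs without adding parameters. The pure-Python sketch below illustrates this in 1-D; the helper names are hypothetical, and the dilation rates actually used in the invention are not stated.

```python
def effective_kernel_size(k, d):
    """Number of input positions spanned by a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as CNN layers compute it)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


print([effective_kernel_size(3, d) for d in (1, 2, 4)])                 # [3, 5, 9]
print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))     # [6, 9, 12]
```

With dilation 2, each output sums every second input (0 + 2 + 4, 1 + 3 + 5, 2 + 4 + 6), which widens the receptive field while keeping three weights.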

Furthermore, the channel information fusion adopts a high-level semantic fusion method.
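The "high-level semantic fusion method" is not specified further. One common way to fuse channel information, assumed here purely for illustration, is to concatenate feature maps along the channel axis and mix them with a pointwise (1×1) convolution. The function name and the averaging weights below are hypothetical; in a real network the weights would be learned.

```python
def fuse_channels_1x1(features, weights, bias=None):
    """Pointwise (1x1) convolution over a list of C_in channel maps (each H x W).

    weights is a C_out x C_in matrix; output channel o at pixel (y, x) is
    bias[o] + sum_c weights[o][c] * features[c][y][x].
    """
    h, w = len(features[0]), len(features[0][0])
    bias = bias if bias is not None else [0.0] * len(weights)
    return [[[bias[o] + sum(wc * features[c][y][x] for c, wc in enumerate(row))
              for x in range(w)]
             for y in range(h)]
            for o, row in enumerate(weights)]


low_level = [[1, 2], [3, 4]]     # hypothetical low-level channel
high_level = [[3, 2], [1, 0]]    # hypothetical high-level channel
stacked = [low_level, high_level]              # concatenation along the channel axis
fused = fuse_channels_1x1(stacked, weights=[[0.5, 0.5]])
print(fused)  # [[[2.0, 2.0], [2.0, 2.0]]]
```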

Further, the decoder is constructed with a dual-branch decoder structure comprising a deconvolution branch and a feature enhancement branch.

Furthermore, each time the deconvolution branch of the decoder enlarges the feature map to twice its previous size, the feature enhancement branch supplements the decoder with details.
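One step of the dual-branch decoder might look like the sketch below. It assumes nearest-neighbour upsampling as a stand-in for the learned deconvolution and channel-wise concatenation as the way the feature enhancement branch supplements details; the helper names `upsample2x` and `decoder_step` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W map (stand-in for a learned deconvolution)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]   # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                      # repeat each row vertically
    return out


def decoder_step(deconv_channels, enhance_channels):
    """Double the deconvolution-branch resolution, then concatenate the
    feature-enhancement-branch maps along the channel axis."""
    up = [upsample2x(c) for c in deconv_channels]
    size = (len(up[0]), len(up[0][0]))
    for c in enhance_channels:
        if (len(c), len(c[0])) != size:
            raise ValueError("enhancement features must match the upsampled size")
    return up + [[list(r) for r in c] for c in enhance_channels]


deconv = [[[1, 2], [3, 4]]]                  # one 2 x 2 channel
enhance = [[[0] * 4 for _ in range(4)]]      # one 4 x 4 channel of details
out = decoder_step(deconv, enhance)
print(len(out))   # 2
print(out[0])     # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The size check reflects the requirement that the supplementary features are taken at the resolution the deconvolution branch has just reached.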

In another aspect, a segmentation system for a target object of an image is provided, the segmentation system comprising:

a backbone network, a multi-scale middle-layer component and a decoding module, wherein the multi-scale middle-layer component comprises an attention module and a feature fusion module. The attention module extracts a feature map, which is then processed by the feature fusion module to represent multi-scale information.

Further, the decoding module is configured with a dual-branch decoder architecture, the dual-branch decoder comprising a deconvolution branch and a feature enhancement branch.

Further, the deconvolution branch captures basic information and adds low-level semantic details, and the feature enhancement branch strengthens high-level semantic information and deepens multi-scale information.

Further, the low-level semantic information is the information that the encoder passes to the decoder at the same level (that is, between stages of equal resolution) during encoding.
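Passing information from the encoder to the decoder at the same level can be read as the usual U-Net-style skip connection: each decoder stage receives the encoder features with the same spatial resolution. The bookkeeping sketch below assumes four halving stages and a 256 × 256 input; the function name and the labels `enc1` to `enc4` are hypothetical.

```python
def same_level_skips(input_size=256, num_downsamplings=4):
    """Pair each decoder resolution with the encoder stage of equal resolution.

    Encoder stage i (1-based) is assumed to output input_size / 2**i.
    Returns (decoder_size, encoder_stage) pairs; None means no encoder
    feature exists at that size (e.g. the full input resolution).
    """
    encoder = {input_size // 2 ** i: f"enc{i}" for i in range(1, num_downsamplings + 1)}
    pairs = []
    size = input_size // 2 ** num_downsamplings   # middle-layer (bottleneck) size
    while size < input_size:
        size *= 2                                 # one decoder upsampling step
        pairs.append((size, encoder.get(size)))
    return pairs


print(same_level_skips())
# [(32, 'enc3'), (64, 'enc2'), (128, 'enc1'), (256, None)]
```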

In still another aspect, a segmentation device for a target object of an image is provided, the segmentation device comprising:

the segmentation network for a target object of an image according to any one of claims 1 to 5, and the segmentation system for a target object of an image according to any one of claims 6 to 9.

(III) Advantageous effects

The invention provides a segmentation network, a segmentation system and a segmentation device for a target object of an image, which can capture deep semantic information and multi-scale information of a high-resolution remote sensing image and improve the accuracy of prediction.

Drawings

FIG. 1 is a diagram of the overall network framework of the present invention;

FIG. 2 is a diagram of the attention mechanism used in the present invention;

FIG. 3 shows comparison results between the present model and other models.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the present invention, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Examples

Referring to FIGS. 1 to 3, which show the overall network framework of the segmentation network for a target object of an image, the segmentation network comprises:

cropping the data set with a sliding window: because the images in the original data set are very large, a sliding window is used to cut them into individual tiles, which are then fed to the network;
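The sliding-window cropping can be sketched as computing tile boxes over a large image. The 256-pixel window and stride below are assumptions (the 256 × 256 input size appears later in this description as an example), and shifting the last window on each axis back to the border, rather than padding, is one possible design choice; the function names are hypothetical.

```python
def window_origins(length, window, stride):
    """Start offsets along one axis; the last window is shifted to end at the border."""
    if length <= window:
        return [0]                      # image smaller than a tile: pad it before cropping
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # cover the remaining strip at the far edge
    return starts


def sliding_window_boxes(height, width, window=256, stride=256):
    """(top, left, bottom, right) boxes of window x window tiles covering the image."""
    return [(top, left, top + window, left + window)
            for top in window_origins(height, window, stride)
            for left in window_origins(width, window, stride)]


boxes = sliding_window_boxes(600, 600)
print(len(boxes))   # 9
print(boxes[-1])    # (344, 344, 600, 600)
```

A stride smaller than the window would give overlapping tiles, which some pipelines use to reduce border artifacts at prediction time.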

The Res2Net convolutions are replaced with dilated (atrous) convolutions.

The channel information fusion adopts a high-level semantic fusion method.

The decoder is built with a dual-branch structure consisting mainly of a deconvolution branch and a feature enhancement branch. Restoring the input image size in the decoder relies mainly on upsampling, during which the semantic feature components of each layer of the network are combined to enhance details. This is necessary because the encoder reduces the feature map to 1/2 of its size at each stage, while the semantic segmentation encoder-decoder network must keep the output the same size as the input.
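The halving in the encoder and doubling in the decoder follow from the standard output-size formulas for convolution and transposed convolution (as documented for PyTorch `Conv2d` and `ConvTranspose2d`). The kernel, stride and padding values below are common choices assumed for illustration, not values stated in the invention.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one axis (floor division)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    """Output length of a transposed convolution along one axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# Assumed settings: a 3x3 stride-2 convolution (padding 1) halves the size,
# a 4x4 stride-2 transposed convolution (padding 1) doubles it.
size = 256
for _ in range(4):
    size = conv_out(size, kernel=3, stride=2, padding=1)
print(size)  # 16
for _ in range(4):
    size = deconv_out(size, kernel=4, stride=2, padding=1)
print(size)  # 256

# A 3x3 dilated convolution with padding equal to the dilation keeps the size.
print(conv_out(64, kernel=3, padding=2, dilation=2))  # 64
```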

Each time the deconvolution branch enlarges the feature map to twice its previous size, the decoder supplements it with details; a spatial and channel attention mechanism is also added.
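The exact form of the spatial and channel attention (FIG. 2) is not given in the text. The sketch below shows the general idea in a deliberately simplified, weight-free form loosely modelled on CBAM-style attention: a channel gate computed from each channel's global average, then a spatial gate from the per-pixel channel mean, both passed through a sigmoid. Real implementations learn a small MLP and a convolution for these gates; the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(fmaps):
    """Scale every channel by sigmoid(global average of that channel)."""
    out = []
    for c in fmaps:
        gate = sigmoid(sum(map(sum, c)) / (len(c) * len(c[0])))
        out.append([[gate * v for v in row] for row in c])
    return out


def spatial_attention(fmaps):
    """Scale every pixel by sigmoid(mean over channels at that pixel)."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    gate = [[sigmoid(sum(c[y][x] for c in fmaps) / len(fmaps)) for x in range(w)]
            for y in range(h)]
    return [[[gate[y][x] * c[y][x] for x in range(w)] for y in range(h)]
            for c in fmaps]


x = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]   # two toy 2 x 2 channels
y = spatial_attention(channel_attention(x))                # channel gate, then spatial gate
```

Applying the channel gate before the spatial gate follows the ordering used in CBAM; whether the invention uses the same order is not stated.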

A segmentation system for an object of an image, the segmentation system comprising:

a backbone network, a multi-scale middle-layer component and a dual-branch decoding module. The multi-scale middle-layer component is divided into two parts: an attention module and a feature fusion module. After the feature map extracted by the attention module is processed by the feature fusion module, rich multi-scale information about surface buildings can be represented effectively. In the decoding module, a dual-branch decoder architecture is constructed, which includes a deconvolution branch and a feature enhancement branch. The deconvolution branch captures basic information and adds low-level semantic details, while the feature enhancement branch strengthens high-level semantic information and deepens multi-scale information.
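How the feature fusion module builds its multi-scale representation is not specified. As one possible illustration, swapping in a pyramid-pooling scheme in the style of PSPNet (an assumption, not the invention's stated design), the sketch below average-pools a feature map into 1 × 1, 2 × 2 and 4 × 4 grids, resizes each back to full size, and stacks them with the input. The function names are hypothetical.

```python
def avg_pool(fmap, bins):
    """Average-pool an H x W map into a bins x bins grid (H and W divisible by bins)."""
    ch, cw = len(fmap) // bins, len(fmap[0]) // bins
    return [[sum(fmap[y][x]
                 for y in range(i * ch, (i + 1) * ch)
                 for x in range(j * cw, (j + 1) * cw)) / (ch * cw)
             for j in range(bins)]
            for i in range(bins)]


def resize_nearest(fmap, h, w):
    """Nearest-neighbour resize of a small grid back to h x w."""
    bh, bw = len(fmap), len(fmap[0])
    return [[fmap[y * bh // h][x * bw // w] for x in range(w)] for y in range(h)]


def pyramid_fusion(fmap, scales=(1, 2, 4)):
    """Stack the input map with pooled-and-resized context maps at several scales."""
    h, w = len(fmap), len(fmap[0])
    return [fmap] + [resize_nearest(avg_pool(fmap, s), h, w) for s in scales]


fmap = [[4 * y + x for x in range(4)] for y in range(4)]   # toy 4 x 4 map
levels = pyramid_fusion(fmap)
print(len(levels))        # 4
print(levels[1][0][0])    # 7.5
print(levels[2][0])       # [2.5, 2.5, 4.5, 4.5]
```

Coarse grids carry context from large regions and fine grids preserve local detail; a subsequent 1×1 convolution would typically mix the stacked levels.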

An apparatus for segmenting an object of an image, the apparatus comprising:

the segmentation network for a target object of an image according to any one of claims 1 to 5, and the segmentation system for a target object of an image according to any one of claims 6 to 9.

In the present invention, the overall framework is based on an encoder-decoder network. First, features are extracted from the input data. Because building segmentation is essentially a pixel-wise classification problem, global information matters as much as local information, and global high-level semantic information is extracted by the encoder. Next, the middle-layer features are fused so that the semantic information contains more multi-scale information. Finally, the dual-branch decoder defined in the present invention restores the resolution step by step while progressively supplementing detail information, which can be understood as the feature vectors extracted by the feature enhancement branch. Assuming a 256 × 256 input image, the feature map after the middle layer is 16 × 16, and each deconvolution doubles its size: from 16 to 32, then 64, 128 and finally 256, a stepwise restoration. During restoration, the semantic features of the feature enhancement branch are joined directly with a concatenate operation, so the prediction is obtained by progressive supplementation. Because the predictions of the network designed in this way are better than those of earlier networks, the accuracy is high and segmentation accuracy is effectively improved.
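The size schedule described above can be checked with a few lines of Python. The sketch assumes, as in the example, a 256 × 256 input and a 16 × 16 middle-layer feature map, which implies four halvings in the encoder and four doublings in the decoder; `trace_sizes` is a hypothetical helper.

```python
def trace_sizes(input_size=256, bottleneck_size=16):
    """Return the spatial sizes visited by the encoder and by the decoder."""
    encoder = [input_size]
    while encoder[-1] > bottleneck_size:
        encoder.append(encoder[-1] // 2)      # each encoder stage halves the size
    if encoder[-1] != bottleneck_size:
        raise ValueError("input_size must be bottleneck_size times a power of two")
    decoder = [bottleneck_size]
    while decoder[-1] < input_size:
        decoder.append(decoder[-1] * 2)       # each deconvolution doubles the size
    return encoder, decoder


enc, dec = trace_sizes()
print("encoder:", enc)                  # encoder: [256, 128, 64, 32, 16]
print("decoder:", dec)                  # decoder: [16, 32, 64, 128, 256]
print("deconvolutions:", len(dec) - 1)  # deconvolutions: 4
```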

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A segmentation network of an object of an image, characterized in that the segmentation network comprises:

cutting out a data set by using a sliding window;

2. The segmentation network of objects of an image according to claim 1, characterized in that the Res2Net convolutions are replaced with dilated (atrous) convolutions.

3. The segmentation network, the segmentation system and the segmentation device for the target object of the image according to claim 1, wherein: the channel information fusion adopts a high-level semantic fusion method.

4. The segmentation network, the segmentation system and the segmentation device for the target object of the image according to claim 1, wherein: the decoder is built with a dual-branch decoder structure comprising a deconvolution branch and a feature enhancement branch.

5. The segmentation network, the segmentation system and the segmentation device for the target object of the image according to claim 1, wherein: each time the deconvolution branch enlarges the feature map to twice its previous size, the feature enhancement branch supplements the decoder with details.

6. An object segmentation system for an image, the segmentation system comprising:

7. The system for segmenting an object of an image according to claim 6, wherein: the decoding module is constructed with a dual-branch decoder architecture comprising a deconvolution branch and a feature enhancement branch.

8. The system for segmenting an object in an image according to claim 7, wherein: the deconvolution branch is used for capturing basic information and adding low-level semantic details, and the feature enhancement branch is used for strengthening high-level semantic information and deepening multi-scale information.

9. The system for segmenting an object in an image according to claim 8, wherein: the low-level semantic information is information passed from the encoder to the decoder at the same level during encoding.

10. An apparatus for segmenting an object of an image, the apparatus comprising: