CN110443248B

CN110443248B - Method and system for eliminating semantic segmentation blocking effect of large-amplitude remote sensing image

Info

Publication number: CN110443248B
Application number: CN201910560692.8A
Authority: CN
Inventors: 张觅; 胡翔云; 赵丽科; 魏域君
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2019-06-26
Filing date: 2019-06-26
Publication date: 2021-12-03
Anticipated expiration: 2039-06-26
Also published as: CN110443248A

Abstract

The invention relates to a method and a system for eliminating a semantic segmentation blocking effect of a large-amplitude remote sensing image. And eliminating the extended boundary by using the size of the sliding window and the fusion factor so as to obtain a final fusion result and eliminate the semantic segmentation blocking effect of the large-amplitude remote sensing image.

Description

Method and system for eliminating semantic segmentation blocking effect of large-amplitude remote sensing image

Technical Field

The invention relates to the field of computer vision and remote sensing, in particular to a method and a system for eliminating a semantic segmentation blocking effect of a remote sensing image.

Background

In recent years, with the promotion of large-scale applications such as deep learning, big data and Graphic Processing Unit (GPU), the intelligent interpretation of remote sensing images faces many opportunities and challenges. The high-resolution remote sensing image semantic segmentation technology is a technology for endowing each pixel with a category attribute, can be widely applied to emergency processing tasks such as homeland change detection, geographical national situation general survey, earthquake prevention and disaster reduction and the like, and has great economic and social values.

Semantic segmentation methods for images are commonly used in indoor/outdoor image processing. With the advent of large-scale data sets such as ImageNet and MS-COCO, semantic segmentation methods have been rapidly developed, and methods represented by Deep Convolutional Neural Networks (DCNN) have been widely studied. Compared with the traditional method, such as TextonBoost and the like, the DCNN-based semantic segmentation method has better robustness, can find the optimal approximation function of image semantic segmentation through the combination of linear and nonlinear mapping under the condition of having enough labeled data, and is the method with the best effect in the current indoor/outdoor image semantic segmentation task. In the field of remote sensing image processing, a semantic segmentation technology is also called as a remote sensing image classification technology, and is different from indoor/outdoor images, the problem faced by the semantic segmentation of the remote sensing images is more complex, and the problems of semantic segmentation scale, direction, spatial context information expression and the like exist, and the problems of local information loss, same-spectrum foreign matter, same-object different-spectrum and the like of a training sample can also occur. The consistency of the block prediction results of large-image remote sensing images (usually, the size is larger than 15000 multiplied by 15000 pixels) becomes a bottleneck of applying the indoor/outdoor image semantic segmentation method to remote sensing image processing.

The DCNN-based semantic segmentation method can be divided into three categories according to processing levels and fusion units: the first kind of characteristics is that based on image classification network, such as VGGNet, GoogleNet, ResNet, etc., multiple strategies are fused, and semantic information extraction of end-to-end (input to output) is completed by adjusting the structure of the image classification network. For example, a "hole" convolution kernel is used in a dilated convolution (DilatedConv) network to maintain the perceptual view of a Full Convolution Network (FCN); RefineNet represents semantic segmentation information through multipath and multiresolution; the ExFuse model fuses bottom-layer and high-layer features and the like. The second category is the method using object detection to assist the semantic segmentation task. The method integrates the tasks of target detection and semantic segmentation, and segments the examples in the image by using the outsourcing rectangle and the target mask branch. For example, the Mask-RCNN framework has achieved champions on the MS-COCO dataset using this strategy. However, for some specific categories in the remote sensing image, the labeling data are not single instances enclosed by rectangular frames, and the labeling data have large subjective differences. The third type of scene constraint semantic segmentation aims to blend scene information in a semantic segmentation task so as to inhibit interference of irrelevant scenes. Semantically segmented scene information generally originates from two aspects: one is scene category information of the image block; another aspect is the combination of different hierarchical features of DCNN. For the former, the scene information constraint is derived from statistics of each category information in the semantic segmentation labeling data, and the dominant category information is used as the scene constraint information; for the latter, mainly through integration of different hierarchical features in the network structure, it may lead to an increasingly complex structural design and an excessive consumption of GPU computing resources.

Although the above-described semantic segmentation method based on DCNN can perform segmentation using local image blocks (patch) as processing units, the processing objects are limited to small images due to GPU resource limitations and model design, and information fusion between predicted local image blocks is not considered. The remote sensing image has a large image size which is usually 3-4 times of the size of the natural image, and by using a sliding window or weighted overlapping sliding window method, optimal stitching is still difficult to achieve between prediction results of each local image block after block prediction, and a blocking effect cannot be eliminated (as shown in fig. 1(a) - (c)). Therefore, when the semantic segmentation prediction of the large-scale remote sensing image is required, a blocking effect elimination method is introduced, so that the transition of the prediction result of the large-scale image is smoother, and the semantic segmentation result keeps better global consistency.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a global weighted fusion (GWFose) method (shown in figure 1(d)) which is used for eliminating the semantic segmentation blocking effect of the remote sensing image greatly, so that the semantic segmentation result of the remote sensing image is transited more smoothly, and the optimal stitching among the segmentation results of local image blocks is achieved.

In order to achieve the purpose, the invention provides the technical scheme that: the method for eliminating the semantic segmentation blocking effect of the large-amplitude remote sensing image comprises the following steps:

step 1, calculating weighted fusion preprocessing parameters of an image to be semantically segmented, comprising the following substeps;

step 1.1, expanding the boundary of the remote sensing image to be interpreted;

step 1.2, a window weighting function;

step 1.3, calculating the total step length of batch processing;

step 2, a semantic segmentation method based on a Convolutional Neural Network (CNN) introduces a batch processing mode in a prediction stage, and uses a window weighting function to fuse batch processing image prediction results to obtain a final interpretation result, and the method comprises the following substeps;

step 2.1, initializing the current step length s to be 0;

step 2.2, judging whether the current step length is smaller than the total step length, if so, initializing the parameter lambda of the current batch to be 0, and turning to the step 2.3; otherwise, the image is processed, and the interpretation result image is output;

step 2.3, judging whether the parameter lambda of the current batch is smaller than the total batch bs, if so, calculating a weighted interpretation result of the current batch by using a window weighting function, and turning to the step 2.4;

step 2.4, updating the current step length parameter s ═ s + bs until the current step length is larger than the total step length, indicating that the whole image is processed, and turning to the step 2.5;

step 2.5, according to the fusion factor beta, calculating the extended boundary image I_p×qIs interpreted to be a result M^b；

Step 2.6, according to the window size k and the fusion factor beta, obtaining the interpretation result M after eliminating the expansion boundary^T。

Further, the specific implementation manner of step 1.1 is as follows,

the input remote sensing image to be interpreted is assumed to be I_m×nWherein m and n are the width and height of the image to be interpreted respectively; the window size during image interpretation is k, and the fusion factor is beta; the extended image is I_p×qWherein p and q are the width and height of the extended image respectively, and the extended width w and height h are calculated by the size of the interpretation window and the fusion factor:

wherein the symbols

Indicating rounding up, so the width and height of the extended image are:

and finally, expanding the image to be interpreted in a mirror image boundary expansion mode according to the expansion width and the expansion height.

Further, the specific implementation manner of step 1.2 is as follows,

using a window weighting function for calculating a global weight value between overlapping windows,

wherein the window size is k, and when processing the image blocks in the window, the overlap degree between each divided blocks is kept to be

Expanding the formula (5) to two dimensions to obtain a second-order smooth function W under the two-dimensional condition_k×k＝[f(x) f(y)]^T。

Further, the specific implementation manner of step 1.3 is as follows,

according to the window size k, the image width and height p and q after the boundary is expanded and the fusion factor beta, the total step length of batch processing in the x direction and the y direction is respectively calculated as follows:

wherein sm represents a scaling factor, and the calculation mode is as follows:

the total step size of the final batch process is therefore

Furthermore, the specific implementation manner of calculating the current batch weighting interpretation result in step 2.3 is as follows,

wherein the symbol denotes a two-dimensional convolution operator; the function F (-) represents a CNN semantic segmentation prediction network; i is^λRepresenting the block image to be processed in the current batch by the image I after the boundary expansion_p×qObtaining:

I^λ＝I_p×q[x_λ:x_λ+k,y_λ:y_λ+k,:] (10)

y_λ＝(index％l_y)×sm (12)

x in the formula (10)_λ:x_λ+ k represents the extended image matrix I_p×qThe range taken along the x-axis is [ x ]_λ,x_λ+k]；y_λ:y_λ+ k represents the extended image matrix I_p×qThe range taken along the y-axis is [ y ]_λ,y_λ+k](ii) a In the formulas (11) and (12), index ∈ [ s, s + bs) represents a positive integer vernier, s is a current step length parameter, and if the current batch parameter λ is smaller than the total batch bs, the current batch image I continues to be obtained^λAnd updating the current batch parameter lambda + 1.

Further, the interpretation result M in step 2.5^bThe way of calculating (a) is as follows,

in the formula (13), the first and second groups,

represents the ith lot weighted interpretation result, all the lot results together form the image I with the size of p × q_p×qIs interpreted to be a result M^b。

Further, the interpretation result M in step 2.6^TIn a manner such asIn the following, the first and second parts of the material,

M^T＝M^b[w:p-w,h:q-h,:] (14)

in the formula (14), w: p-w represents the image matrix M for the interpretation result^bThe range taken along the x-axis is [ w, p-w ]](ii) a Q-h represents the image matrix M for the interpretation result^bTaken along the y-axis, is in the range [ h, q-h]。

The invention also provides a system for eliminating the semantic segmentation blocking effect of the large-amplitude remote sensing image, which comprises the following modules:

the parameter calculation module is used for calculating weighted fusion preprocessing parameters of the image to be semantically segmented and comprises the following sub-modules;

the boundary expansion submodule is used for expanding the boundary of the remote sensing image to be interpreted;

the window weighting function calculation submodule is used for calculating a window weighting function;

the batch processing total step length calculation submodule is used for calculating the batch processing total step length;

the interpretation result output module is used for introducing a batch processing mode in a prediction stage based on a semantic segmentation method of a Convolutional Neural Network (CNN), fusing a batch processing image prediction result by using a window weighting function to obtain a final interpretation result, and comprises the following sub-modules;

a first sub-module, configured to initialize a current step size β equal to 0;

the second submodule is used for judging whether the current step length is smaller than the total step length or not, if so, initializing the parameter lambda of the current batch to be 0, and transferring to the third submodule; otherwise, the image is processed, and the interpretation result image is output;

the third sub-module is used for judging whether the parameter lambda of the current batch is smaller than the total batch bs, if so, calculating a weighted interpretation result of the current batch by using a window weighting function, and transferring the weighted interpretation result to the fourth sub-module;

the fourth sub-module is used for updating the current step length parameter s ═ s + bs until the current step length parameter s is larger than the total step length, and then the whole image is processed and the fourth sub-module is switched to the fifth sub-module;

a fifth sub-module for calculating an extended boundary shadow according to the fusion factor betaLike I_p×qIs interpreted to be a result M^b；

A sixth sub-module for obtaining the interpretation result M after eliminating the extended boundary according to the window size k and the fusion factor beta^T。

Compared with the existing sliding window or weighted overlapping sliding window fusion technology, the global weighted fusion GWFose method is provided, a second-order global weighted smoothing function is adopted, the high-efficiency calculation advantage of the GPU is considered, and a GPU batch processing mode is introduced in the semantic segmentation prediction stage of the remote sensing image, so that batch processing images under different step lengths are subjected to global weighted fusion. And eliminating the extended boundary by using the size of the sliding window and the fusion factor, thereby eliminating the semantic segmentation blocking effect of the large-amplitude remote sensing image and obtaining the optimal fusion result.

Drawings

FIG. 1 is a block diagram of a semantic segmentation blocking effect of local image blocks of a large-amplitude remote sensing image and a processing result thumbnail of the block effect. Wherein, the image (a) is an original image; FIG. (b) shows the result of the sliding window process; FIG. (c) shows the result of weighted overlap sliding window processing; FIG. d shows the results of the patented process of the present invention.

Fig. 2 is a flowchart of a global weighted fusion (GWFuse) blocking effect elimination method adopted in the present invention.

FIG. 3 is a schematic diagram of a remote sensing image boundary expansion to be interpreted.

Fig. 4 is a diagram illustrating a global window weighted smoothing function. Wherein, diagram (a) is a schematic diagram of the case of extending the second order smoothing function to two dimensions; graph (b) is a two-dimensional weighted smoothing function visualization.

FIG. 5 is a more semantic segmentation blocking effect and processing result thumbnail thereof. Wherein, (a) is the original image; (b) listing as a sliding window processing result; (c) the columns are weighted overlapping sliding window processing results; (d) listed as the patent processing result of the present invention. The original image size floats in the range of 7000 × 7000 pixels to 80000 × 80000 pixels.

Detailed Description

The invention adopts a global weighting fusion method to solve the problem of eliminating the blocking effect of the semantic segmentation result of the large-amplitude remote sensing image, the method utilizes the size of a sliding window and a global fusion factor to comprehensively calculate a window weighting function, simultaneously considers the advantages of batch processing prediction of a Convolutional Neural Network (CNN), and carries out global weighting fusion on batch processing images under different step sizes by taking an image block processed by the total batch processing size as a processing unit. And eliminating the extended boundary by using the size of the sliding window and the fusion factor so as to obtain a final fusion result and eliminate the semantic segmentation blocking effect of the large-amplitude remote sensing image.

In order to better understand the technical solution of the present invention, the following description will be made with reference to the accompanying drawings. The global weighting fusion (GWFose) semantic segmentation blocking effect elimination method provided by the invention is shown in figure 2, and the core of the method lies in window weighting function calculation and batch processing image weighting fusion. The specific implementation steps are as follows:

step 1, calculating weighted fusion preprocessing parameters of the image to be semantically segmented.

The calculation of the preprocessing parameters is a precondition for eliminating the semantic segmentation blocking effect, and comprises three aspects of image boundary expansion, window weighting function calculation and batch processing total step length calculation. The method comprises the following specific steps:

1.1 remote sensing image boundary expansion to be interpreted

The input remote sensing image to be interpreted is assumed to be I_m×nWherein m and n are the width and height of the image to be interpreted respectively; the window size during image interpretation is k, and the fusion factor is beta; the extended image is I_p×qWherein p and q are the width and height of the extended image, respectively. As shown in fig. 3, from the interpretation window size and the fusion factor, the extended width w and height h can be calculated as:

wherein the symbols

Indicating rounding up. Therefore, the width and height of the expanded image are:

the invention adopts a mirror image boundary expansion mode to expand the image to be interpreted according to the expansion width and height. Wherein the fusion factor is chosen to be β -2 and the window size is chosen to be k-512.

1.2 calculation of Window weighting function

In fig. 1(c), the semantic segmentation result is weighted by using overlapping window voting, and the smoothness of the boundary of the window overlapping is not considered fundamentally. In order to make the transition between predicted semantic segmentation blocks more natural, the patent designs a second-order window weighting smoothing function for calculating a global weight value between overlapping windows.

The reason why the window size is k, which causes the blocking effect of the semantic segmentation result, is that the voting function is a first-order linear function, the weight is higher at a position closer to the center of the window, and a step is generated at the window boundary:

in order to overcome the disadvantage of the step function at the boundary, the second-order window weighted smoothing function designed by the patent is as follows:

in formula (5), x represents a coordinate position in the x direction. When processing image blocks in the window, the overlapping degree between each divided block is kept as

The formula (5) is expanded to two dimensions, namely a second-order smooth function W under the condition of two dimensions is obtained_k×k＝[f(x) f(y)]^TWhere f (x), f (y) represent the window weighting functions in the x-direction and y-direction, respectively. Fig. 4 is a diagram illustrating a case where the second-order smoothing function is extended to two dimensions.

1.3, batch Total step size calculation

According to the window size k, the image width and height p and q after the boundary is expanded, and the fusion factor beta, the total step length of batch processing in the x direction and the y direction can be calculated and respectively:

wherein sm represents a scaling factor, and the calculation mode is as follows:

the total step size of the final batch process is therefore

And 2, eliminating the blocking effect of the batch processing semantic segmentation.

The semantic segmentation method based on Convolutional Neural Network (CNN) usually uses a single image block as a processing unit in the prediction stage. The invention introduces a batch processing mode in a prediction stage of a semantic segmentation method by virtue of the advantages of GPU batch processing in a CNN training stage so as to conveniently use a weighted window function to fuse batch processing image prediction results. The method specifically comprises the following steps:

step 2.1, initialize the current step size s to 0.

Step 2.2, determine whether the current step is less than the total step l ═ l_x×l_y. If the current batch length is less than the total step length, initializing the parameter lambda of the current batch to be 0, and turning to the step 2.3; otherwise, it indicates that the image has been processed,outputting the interpretation result image M^T。

And 2.3, judging whether the parameter lambda of the current batch is smaller than the total batch bs. If the total batch size is larger than the total batch size, a window weighting function W is utilized_k×k＝[f(x) f(y)]^TCalculating the current batch weighting interpretation result, and turning to the step 2.4:

wherein the symbol denotes a two-dimensional convolution operator; the function F (-) represents a CNN semantic segmentation prediction network, and a Full Convolution Network (FCN) with a dense connection structure is adopted as a prediction network function; i is^λRepresenting the block image to be processed in the current batch by the image I after the boundary expansion_p×qObtaining:

I^λ＝I_p×q[x_λ:x_λ+k,y_λ:y_λ+k,:] (10)

y_λ＝(index％l_y)×sm (12)

x in the formula (10)_λ:x_λ+ k represents the extended image matrix I_p×qThe range taken along the x-axis is [ x ]_λ,x_λ+k]；y_λ:y_λ+ k represents the extended image matrix I_p×qThe range taken along the y-axis is [ y ]_λ,y_λ+k]. Index ∈ [ s, s + bs) in equations (11) and (12) represents a positive integer vernier, s being the current stride parameter. If the parameter lambda of the current batch is smaller than the total batch bs, continuously obtaining the image I of the current batch^λAnd updating the current batch parameter lambda + 1.

And 2.4, updating the current step length parameter s + bs until the current step length is larger than the total step length l_x×l_y. At this time, the whole image is processed by the semantic segmentation network of the dense connection structure, and the step 2.5 is carried out;

in the step 2.5, the step of the method,calculating an extended boundary image I according to the fusion factor beta_p×qIs interpreted to be a result M^bThe calculation method is as follows:

in the formula (13), the first and second groups,

representing the ith lot weighted interpretation result, all the lots of prediction results together form the image I with the size of p × q_p×qIs interpreted to be a result M^b。

Step 2.6, according to the window size k and the fusion factor beta, obtaining the interpretation result M after eliminating the expansion boundary^T. The process of eliminating the extended boundary is opposite to the process of expanding the boundary in the step 1.1, the width and the height of the clipping are calculated by the formula (1), and finally, the interpretation result after the extended boundary is fused and eliminated is as follows:

M^T＝M^b[w:p-w,h:q-h,:] (14)

in the formula (14), w: p-w represents the image matrix M for the interpretation result^bThe range taken along the x-axis is [ w, p-w ]](ii) a Q-h represents the image matrix M for the interpretation result^bTaken along the y-axis, is in the range [ h, q-h]. Fig. 5 is an example of a thumbnail applied to large-format remote sensing image semantic segmentation blocking effect elimination by using the method provided by the present patent, where an original image includes types of GeoEye, ZY-3, and the like, and the image size floating range is 7000 × 7000 pixels to 80000 × 80000 pixels.

The embodiment of the invention also provides a system for eliminating the semantic segmentation blocking effect of the large-amplitude remote sensing image, which comprises the following modules:

a first sub-module, configured to initialize a current step size β equal to 0;

a fifth sub-module for calculating an extended boundary image I according to the fusion factor beta_p×qIs interpreted to be a result M^b；

The specific implementation of each module corresponds to each step, and the invention is not described.

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims

1. The method for eliminating the blocking effect of the semantic segmentation of the large-amplitude remote sensing image is characterized by comprising the following steps of:

step 1, calculating weighted fusion preprocessing parameters of a remote sensing image to be interpreted, wherein the method comprises the following substeps;

step 1.1, expanding the boundary of the remote sensing image to be interpreted;

step 1.2, determining a window weighting function;

step 1.3, calculating the total step length of batch processing;

step 2, a semantic segmentation method based on a convolutional neural network CNN introduces a batch processing mode in a prediction stage, and uses a window weighting function to fuse batch processing image prediction results to obtain a final interpretation result, and the method comprises the following substeps;

step 2.1, initializing the current step length s to be 0;

the specific implementation of calculating the current lot weighting interpretation result in step 2.3 is as follows,

I^λ＝I_p×q[x_λ:x_λ+k,y_λ:y_λ+k,:] (10)

y_λ＝(index％l_y)×sm (12)

x in the formula (10)_λ:x_λ+ k represents the extended image matrix I_p×qThe range taken along the x-axis is [ x ]_λ,x_λ+k]；y_λ:y_λ+ k represents the extended image matrix I_p×qThe range taken along the y-axis is [ y ]_λ,y_λ+k](ii) a In the formulas (11) and (12), index ∈ [ s, s + bs) represents a positive integer vernier, s is a current step length parameter, and if the current batch parameter λ is smaller than the total batch bs, the current batch image I continues to be obtained^λUpdating the parameter lambda of the current batch to lambda + 1;

Interpretation of the results M in step 2.5^bThe way of calculating (a) is as follows,

in the formula (13), the first and second groups,

represents the ith lot weighted interpretation result, all the lot results together form the image I with the size of p × q_p×qIs interpreted to be a result M^b；

Step 2.6, according to the window size k and the fusion factor beta, obtaining the interpretation result M after eliminating the expansion boundary^T；

Interpretation of the results M in step 2.6^TThe way of calculating (a) is as follows,

M^T＝M^b[w:p-w,h:q-h,:] (14)

in the formula (14), w: p-w represents the image matrix M for the interpretation result^bTaken along the x-axisThe range is [ w, p-w](ii) a Q-h represents the image matrix M for the interpretation result^bTaken along the y-axis, is in the range [ h, q-h]。

2. The method for eliminating the semantic segmentation blocking effect of the large-format remote sensing image according to claim 1, characterized by comprising the following steps: the specific implementation of step 1.1 is as follows,

wherein the symbols

Indicating rounding up, so the width and height of the extended image are:

3. The method for eliminating the semantic segmentation blocking effect of the large-format remote sensing image according to claim 2, characterized by comprising the following steps: the specific implementation of step 1.2 is as follows,

4. The method for eliminating the semantic segmentation blocking effect of the large-format remote sensing image according to claim 3, characterized by comprising the following steps: the specific implementation of step 1.3 is as follows,

wherein sm represents a scaling factor, and the calculation mode is as follows:

the total step size of the final batch process is therefore

5. A large-amplitude remote sensing image semantic segmentation blocking effect elimination system for realizing the method of any one of claims 1 to 4 is characterized by comprising the following modules:

the parameter calculation module is used for calculating the weighted fusion preprocessing parameters of the remote sensing image to be interpreted and comprises the following sub-modules;

a first sub-module, configured to initialize a current step size β equal to 0;