CN110443822B - Semantic edge-assisted high-resolution remote sensing target fine extraction method - Google Patents
- Publication number
- CN110443822B CN110443822B CN201910638370.0A CN201910638370A CN110443822B CN 110443822 B CN110443822 B CN 110443822B CN 201910638370 A CN201910638370 A CN 201910638370A CN 110443822 B CN110443822 B CN 110443822B
- Authority
- CN
- China
- Prior art keywords
- target
- boundary
- edge
- image
- detection model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
An edge-assisted fine extraction method for high-resolution remote sensing targets comprises: designing deep convolutional neural networks for the remote sensing target extraction task, producing samples, and training a target detection model and an edge detection model. The remote sensing image is then input into the target detection model to extract ground object bounding boxes, thereby determining the class and position range of each ground object target; the image is also input into the edge detection model to extract ground object edges, thereby obtaining a ground object target edge intensity map. Finally, for the target within each bounding box, boundary extraction is performed with the edge intensity map at the corresponding position as a guide: salient edges of higher intensity are directly thinned into boundaries, while blurred and especially interrupted edges are repaired with a complete boundary as the goal. A fine target boundary is thus determined and, according to task requirements, vectorized into polygon elements. The invention achieves fine extraction of high-resolution remote sensing targets.
Description
Technical Field
The invention belongs to the field of remote sensing and the field of target detection and edge extraction in computer vision, and provides a two-stage remote sensing target fine extraction method combining target detection and edge detection based on deep learning.
Background Art
Target extraction is an important means of deriving information from remote sensing data. As the spatial resolution of remote sensing images keeps improving, both the range of observable targets and the required extraction precision rise accordingly. Although instance segmentation from general computer vision can serve some applications, its results remain far from actual ground object boundaries, so the currently common two-stage detection-then-mask paradigm (e.g., Mask R-CNN) is difficult to apply directly to fine target extraction from high-resolution remote sensing images. In fact, deep-learning-based target detection is generally accurate, since it only needs to locate a target and determine its bounding box; what mainly limits the extraction result is the fine determination of the actual ground object boundary in the mask segmentation stage. Because ground object targets vary in complex ways in spectral appearance and in spatial shape and scale, current deep segmentation models still struggle to delineate boundaries accurately within a detection box.
Mainstream deep-learning target detection algorithms fall roughly into two types: region-proposal-based convolutional neural networks and end-to-end integrated detection networks. End-to-end methods pursue faster detection by taking the result of a single pass as the final detection, and their accuracy is less ideal than that of multi-stage, region-proposal-based training. Among region-proposal methods, R-CNN was the pioneering work: it replaced the original exhaustive sliding-window scheme by extracting a small number of suitable candidate regions through selective search, normalized the candidate regions in size, extracted features with a deep convolutional neural network, and finally classified them with an SVM to obtain the detection result. Algorithms such as Fast R-CNN, FPN, and Mask R-CNN are targeted improvements on this basis. Because target detection only requires locating and classifying targets, such methods achieve good detection precision.
Edge detection requires accurate determination of the target boundary and is technically harder than mere target localization. Among deep-learning edge detectors, RCF (Richer Convolutional Features) is a comparatively effective convolutional neural network model. It fine-tunes a VGG backbone, reduces the dimension of the convolutional layers in each of the five VGG stages to obtain per-stage feature fusion maps, deconvolves each stage and computes a stage-wise loss, and finally fuses the five stages at multiple scales into a final fusion map on which the overall loss is computed. Using the features of all convolutional layers improves on HED, which uses only the last convolutional layer of each stage. Where the boundary is strong, the edge carries relatively rich semantic information and convolution captures global context, so edge detection works well. Where the boundary is weak, however, the target edge provides little effective semantic information, and shadow occlusion from tall objects may erase the distinction between target and non-target edges entirely; the network then fails to recognize the target edge and the extracted edge breaks. In practice a broken edge cannot count as complete boundary extraction, so boundary repair aimed at completeness is an essential part of the target boundary extraction task.
The invention seeks to combine the advantages of target detection and edge extraction: guided by the high-precision detection bounding box, the target boundary is finely located using the edge detection result, and occluded or disturbed edges are actively inferred and completed based on the detection result, thereby achieving fine extraction of remote sensing targets.
Disclosure of Invention
The invention provides a high-resolution remote sensing target fine extraction method based on edge assistance, aiming at overcoming the defects of imprecise and incomplete extraction of the edges of ground objects caused by fuzzy and mutual shielding of the boundaries of the ground objects in high-resolution remote sensing images.
Through edge assistance, the invention overcomes the inaccurate mask boundaries of general instance segmentation; through the two-stage process assisted by the bounding box, it also guarantees the completeness of the target boundary, overcoming the frequent line breaks of general edge extraction, and finally achieves fine extraction of high-resolution remote sensing targets.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a high-resolution remote sensing target fine extraction method based on edge assistance comprises the following steps:
step 1: making a remote sensing image target extraction sample, drawing a fine boundary of a target by contrasting an image and determining a type; generating two different labels for the same image sample, namely generating a target outer covering frame label according to the requirement of a target detection model, and generating a target boundary label according to the requirement of an edge detection model; the method specifically comprises the following steps:
step 1.1: obtaining a high-resolution remote sensing image: optical satellite remote sensing data with a visible light-near infrared sensor or aviation remote sensing data carrying a common optical camera are adopted, and multispectral images or fusion panchromatic images can be directly used according to resolution requirements;
step 1.2: cutting a remote sensing image: selecting a range of a typical target in a production area, and cutting the remote sensing image according to uniform pixels;
step 1.3: manufacturing a deep learning training sample: drawing a deep learning training sample by using ArcGIS or other GIS software, marking the boundary of each ground feature target to obtain a corresponding shp file, and generating a target outer frame mark according to the requirements of a target detection model; generating a target boundary label according to the requirement of the edge detection model;
step 1.4: collecting more than 200 images and corresponding labels as training samples according to task requirements, and independently preparing test samples for detection precision;
step 2: training a target detection model and an edge detection model by using the prepared image sample; training a Faster R-CNN neural network to obtain a target detection model, training an RCF neural network to obtain an edge detection model, and correspondingly replacing and modifying the network according to a production target; the method specifically comprises the following steps:
step 2.1: designing a deep convolutional neural network: in order to train a target detection model and an edge detection model, two neural networks of RCF and Faster R-CNN are selected, and the networks can be replaced and modified correspondingly according to a production target;
step 2.2: initializing the weight: initializing RCF network weights using a VGG pre-training model, initializing weights of Faster R-CNN using a pre-training model on a COCO data set;
step 2.3: setting a training hyper-parameter: configuring hyper-parameters, and specifically setting numerical values after model tuning;
training parameter setting of RCF: 8000 for iteration number, 4 for batch _ size, step for learning rate update strategy, 3200,4800,6400,8000 for learning rate update step, 0.001 for initial learning rate, 0.1 for learning rate update coefficient;
training parameter settings for Faster R-CNN: training phase number is 3, each phase iteration round number is [40,120, 40], each iteration round number is 1000, verification interval is 50, batch _ size is 4, learning rate updating strategy is step, learning rate updating step size is 1000, initial learning rate is 0.001, learning rate updating momentum is 0.9, and weight decay is 0.0001;
step 2.4: inputting a sample, training a model: inputting the training samples into the RCF model, and training according to the hyper-parameters in the step 2.3 to obtain an edge detection model capable of extracting the edge contour of the ground object target; inputting the training sample into the Faster R-CNN, training according to the hyperparameter in the step 2.3, and obtaining a target detection model capable of extracting a surface feature target outer wrapping frame;
and step 3: inputting an image sample for testing into a target detection model and an edge detection model, and acquiring a surface feature target outer frame and an edge intensity map; the object target edge intensity graph can show the possibility that the corresponding position in the remote sensing image is the target edge; the ground object target outer covering frame is a rectangular limiting frame for marking the position range of the target and the type of the target; the method specifically comprises the following steps:
step 3.1: inputting the high-resolution remote sensing image into a target detection model, and obtaining a rectangular outer frame of the ground object target; inputting the high-resolution remote sensing image into an edge detection model to obtain a surface feature target edge intensity map;
step 3.2: parameterizing the bounding box: converting the rectangular bounding box parameters output by the target detection model into the coordinates of its lower-left and upper-right vertices; specifically:
x1 = x - w/2, y1 = y - h/2
x2 = x + w/2, y2 = y + h/2
wherein x, y, w and h respectively denote the center abscissa, center ordinate, width and height of the rectangular bounding box;
step 4: refining the edge intensity map of step 3 into a single-pixel-wide boundary; the method specifically comprises the following steps:
step 4.1: converting the ground object target edge intensity map from a grayscale map into a binary map according to a set threshold; with binary_image(x, y) denoting the edge decision at image position (x, y) (1 for edge, 0 for not edge), this can be expressed as:
binary_image(x, y) = 1 if edge_intensity(x, y) ≥ threshold, else binary_image(x, y) = 0
where threshold is a real number in the interval [0,1] which can be set by the user, with an initial default value of 0.5, and x and y are the horizontal and vertical image coordinates.
Step 4.2: for the thick edge lines with a plurality of pixel widths in the binary image, the center of each edge line is continuously etched until all the edge lines are only one pixel width, so that the aim of skeleton extraction is fulfilled.
step 5: repairing the single-pixel-wide boundary of step 4 under the constraint of the ground object target bounding box of step 3, so as to obtain a complete and fine polygonal ground object target boundary.
The method specifically comprises the following steps:
step 5.1: finding the unclosed parts of the ground object target boundary lines and dividing them into three types: boundary breaks at the image edge (boundaries left incomplete at the image border by the skeleton extraction algorithm), boundary breaks inside the image (parts not correctly identified by the edge detection model), and dangling line ends inside the image (excess boundary segments produced by the skeleton extraction algorithm).
Step 5.2: three types of boundary breaks are handled in three different ways.
Step 5.2.1: and regarding the boundary broken line at the edge of the image, taking the edge strength image at the corresponding position as an indication, selecting pixel points with higher edge strength values to fill the target boundary broken line, and if a gap still exists between the pixel points and the image boundary, adopting a straight line perpendicular to the image boundary to connect the pixel points and the target boundary broken line and form a closed target boundary line by connecting the pixel points and the image boundary.
Step 5.2.2: and resetting the threshold for the boundary broken line in the image, mapping pixel points with the intensity values higher than the new threshold in the edge intensity graph into a binary graph by taking the edge intensity graph at the corresponding position as an indication, and connecting the two broken points by the original geometric characteristics if the target boundary broken line is not closed. The steps of repairing the break point with the original geometric characteristics are as follows.
Step 5.2.2.1: and taking out the innermost boundary line connected with the two break points.
Step 5.2.2.2: : the most inner circle boundary line is divided into a plurality of parts according to the change condition of the slope, and the approximate geometrical shape of the boundary line is determined.
Step 5.2.2.3: and (4) repairing the broken points into a closed graph according to the geometric shape, so that the repaired positions keep the original geometric characteristics.
Step 5.2.3: dangling line ends inside the image, i.e., isolated boundary lines far from any other broken line, are deleted.
Step 5.3: performing skeleton extraction again, refining the parts filled in step 5.2 into target boundary lines of single-pixel width.
Step 5.4: traversing the image again and deleting all unclosed target boundary lines, to obtain a complete and fine polygonal ground object target boundary.
Due to the adoption of the technical scheme, the invention has the following advantages and beneficial effects:
1. the invention adopts a method of combining the target detection model and the edge detection model, thereby not only avoiding the defect of inaccurate mask boundary in the common example segmentation algorithm, but also solving the problems of easy line breakage and thick outline in the common edge extraction algorithm, and ensuring the integrity and the refinement of the target boundary.
2. Compared with the traditional method for manually identifying the remote sensing target, the method has higher identification speed, and for a remote sensing image with hundreds of targets, the method can identify the remote sensing image in less than one second, and the manual drawing usually needs several hours; the invention has higher accuracy, and avoids the condition of accuracy reduction caused by manual drawing misoperation due to the adoption of an end-to-end design method.
3. The invention adopts a deep neural network algorithm in machine learning, can identify various targets and extract target boundaries only by changing sample training, does not need to redesign the algorithm, and has strong reusability and robustness.
Drawings
FIG. 1 is a schematic diagram of the edge-assisted fine extraction method of high-resolution remote sensing targets of the invention;
FIG. 2 is a sample graph of a target detection model in an embodiment of the invention;
FIG. 3 is a sample graph of an edge detection model in an embodiment of the invention;
FIG. 4 is a diagram illustrating an example of a prediction result of an object detection model in an embodiment of the invention;
FIG. 5 is a diagram illustrating an example of the prediction results of the edge detection model in an embodiment of the invention;
FIG. 6 is a graph illustrating a comparison of final boundary results in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of a method for finely extracting a high-resolution remote sensing target with edge assistance.
Referring to fig. 1, a preferred embodiment of the present invention is provided, which includes the following steps:
step 1: making a remote sensing image target extraction sample, drawing a fine boundary of a target by contrasting an image and determining a type;
step 2: training a target detection model and an edge detection model by using the prepared image sample;
and step 3: inputting an image sample for testing into a target detection model and an edge detection model, and acquiring a surface feature target outer frame and an edge intensity map;
step 4: refining the edge intensity map of step 3 into a single-pixel-wide boundary;
and 5: and (4) repairing the boundary with the single pixel width in the step (4) by taking the foreign object target outer covering frame in the step (3) as a constraint so as to obtain a complete and fine polygonal foreign object target boundary.
According to the above example, step 1 is detailed as follows:
step 1.1: obtaining a high-resolution remote sensing image: the multispectral image or the fusion panchromatic image can be directly used according to the resolution requirement by adopting optical satellite remote sensing data with a visible light-near infrared sensor or aviation remote sensing data carrying a common optical camera.
Step 1.2: cutting the remote sensing image: within the range of typical targets in the production area, the image is uniformly cut into 512 × 512 pixel tiles (the cutting extent varies somewhat with the size of the production area and covers it widely).
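The uniform cutting described above can be sketched as follows; the 512-pixel tile size follows this embodiment, while the function name and the choice of discarding incomplete border tiles are illustrative assumptions:

```python
import numpy as np

def crop_into_tiles(image, tile=512):
    """Cut an H x W x C remote sensing image into non-overlapping
    tile x tile patches, discarding incomplete border tiles."""
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            tiles.append(image[top:top + tile, left:left + tile])
    return tiles

# A 1024 x 1536 pixel 3-band scene yields 2 x 3 = 6 tiles.
scene = np.zeros((1024, 1536, 3), dtype=np.uint8)
print(len(crop_into_tiles(scene)))  # 6
```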
Step 1.3: manufacturing a deep learning training sample: using ArcGIS or other GIS software to draw a deep learning training sample, in the embodiment, the boundary of each ground feature target needs to be marked to obtain a corresponding shp file, and generating a target outer enclosure frame mark according to the requirements of a target detection model, as shown in FIG. 2; the target boundary label is generated according to the requirements of the edge detection model, as shown in fig. 3.
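Deriving the two kinds of labels of step 1.3 from one annotated boundary polygon can be illustrated as below; reading the shp file itself is omitted, and both function names and the vertex-only edge label are simplifying assumptions (a full pipeline would rasterize the whole polygon outline):

```python
import numpy as np

def polygon_to_bbox(polygon):
    """Axis-aligned bounding box label (xmin, ymin, xmax, ymax) for
    the detection model, from one annotated boundary polygon."""
    pts = np.asarray(polygon, dtype=float)
    return (float(pts[:, 0].min()), float(pts[:, 1].min()),
            float(pts[:, 0].max()), float(pts[:, 1].max()))

def polygon_to_edge_label(polygon, shape):
    """Sparse boundary label for the edge detection model: mark the
    polygon's vertex pixels (a full rasterizer would also draw the
    connecting segments)."""
    label = np.zeros(shape, dtype=np.uint8)
    for x, y in polygon:
        label[int(y), int(x)] = 1
    return label

building = [(10, 12), (40, 12), (40, 30), (10, 30)]
print(polygon_to_bbox(building))  # (10.0, 12.0, 40.0, 30.0)
```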
Step 1.4: more than 200 images and corresponding labels are generally collected as training samples according to task requirements, and test samples can be independently prepared if detection precision is required.
Step 2 detailed steps are as follows:
according to the above example, step 2.1: designing a deep convolutional neural network: in order to train the target detection model and the edge detection model, two neural networks, namely RCF and Faster R-CNN, are selected in the invention, and can be replaced and modified correspondingly according to the production target network.
Step 2.2: initializing the weight: the VGG pre-training model is used to initialize RCF network weights, and the pre-training model on the COCO data set is used to initialize the weights of Faster R-CNN.
Step 2.3: setting a training hyper-parameter: the hyper-parameters are configured, and the specific values are set as follows after the model is tuned and optimized.
Training parameter setting of RCF: the number of iterations is 8000, batch _ size is 4, the learning rate update strategy is step, the learning rate update step is 3200,4800,6400,8000, the initial learning rate is 0.001, and the learning rate update coefficient is 0.1.
Training parameter settings for Faster R-CNN: the training phase number is 3, the number of iteration rounds of each phase is [40,120, 40], the number of iterations of each round is 1000, the verification interval is 50, the batch _ size is 4, the learning rate updating strategy is step, the learning rate updating step size is 1000, the initial learning rate is 0.001, the learning rate updating momentum is 0.9, and the weight decay is 0.0001.
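The hyper-parameters of step 2.3 can be gathered into configuration mappings such as the following; the values are exactly those listed above, while the key names are illustrative:

```python
# Training configuration for the RCF edge detection model (step 2.3).
rcf_config = {
    "iterations": 8000,
    "batch_size": 4,
    "lr_policy": "step",
    "lr_steps": [3200, 4800, 6400, 8000],
    "base_lr": 0.001,
    "lr_gamma": 0.1,        # learning-rate update coefficient
}

# Training configuration for the Faster R-CNN detection model.
faster_rcnn_config = {
    "stages": 3,
    "rounds_per_stage": [40, 120, 40],
    "iterations_per_round": 1000,
    "val_interval": 50,
    "batch_size": 4,
    "lr_policy": "step",
    "lr_step": 1000,
    "base_lr": 0.001,
    "momentum": 0.9,
    "weight_decay": 0.0001,
}

# Total Faster R-CNN iterations across all stages:
print(sum(faster_rcnn_config["rounds_per_stage"])
      * faster_rcnn_config["iterations_per_round"])  # 200000
```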
Step 2.4: inputting a sample, training a model: inputting the training samples into the RCF model, and training according to the hyper-parameters in the step 2.3 to obtain an edge detection model capable of extracting the edge contour of the ground object target; and (4) inputting the training sample into the Faster R-CNN, and training according to the hyperparameter in the step 2.3 to obtain a target detection model capable of extracting the surface feature target outer covering frame.
According to the above embodiment, step 3 is detailed as follows:
step 3.1: inputting the high-resolution remote sensing image into a target detection model, and obtaining a rectangular outer frame of the ground object target, as shown in FIG. 4; and inputting the high-resolution remote sensing image into an edge detection model to obtain a surface feature target edge intensity map, as shown in fig. 5.
Step 3.2: parameterizing the bounding box: converting the rectangular bounding box parameters output by the target detection model into the coordinates of its lower-left and upper-right vertices. Specifically:
x1 = x - w/2, y1 = y - h/2
x2 = x + w/2, y2 = y + h/2
wherein x, y, w and h respectively denote the center abscissa, center ordinate, width and height of the rectangular bounding box.
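The parameterization of step 3.2 amounts to a center-to-corner box conversion, sketched below with the half-width/half-height convention; the function name is an assumption:

```python
def box_center_to_corners(x, y, w, h):
    """Convert a detector's (center x, center y, width, height) box
    to lower-left (x1, y1) and upper-right (x2, y2) corners."""
    x1, y1 = x - w / 2, y - h / 2
    x2, y2 = x + w / 2, y + h / 2
    return x1, y1, x2, y2

print(box_center_to_corners(100, 80, 40, 20))  # (80.0, 70.0, 120.0, 90.0)
```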
According to the above embodiment, step 4 is detailed as follows:
Step 4.1: converting the ground object target edge intensity map from a grayscale map into a binary map according to a set threshold; with binary_image(x, y) denoting the edge decision at image position (x, y) (1 for edge, 0 for not edge), this can be expressed as:
binary_image(x, y) = 1 if edge_intensity(x, y) ≥ threshold, else binary_image(x, y) = 0
where threshold is a real number in the interval [0,1] which can be set by the user, with an initial default value of 0.5, and x and y are the horizontal and vertical image coordinates.
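The thresholding of step 4.1 can be written as a one-line NumPy operation; the function name is an assumption:

```python
import numpy as np

def binarize_edge_map(edge_intensity, threshold=0.5):
    """Step 4.1: turn a [0, 1] edge intensity map into a binary edge
    map (1 = edge, 0 = not edge) by thresholding."""
    return (edge_intensity >= threshold).astype(np.uint8)

strength = np.array([[0.1, 0.6],
                     [0.5, 0.3]])
print(binarize_edge_map(strength))  # edges at the two pixels >= 0.5
```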
Step 4.2: for the thick edge lines with a plurality of pixel widths in the binary image, the center of each edge line is continuously etched until all the edge lines are only one pixel width, so that the aim of skeleton extraction is fulfilled.
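The patent does not name a specific thinning algorithm for step 4.2; the classical Zhang-Suen scheme below is one possible realization of "etching the center of each edge line until all lines are one pixel wide" (a deliberately simple, unoptimized sketch):

```python
import numpy as np

def zhang_suen_thinning(binary):
    """Iteratively erode thick edge lines in a binary map until only
    one-pixel-wide lines remain (skeleton extraction)."""
    img = np.pad(binary.astype(np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for first_pass in (True, False):
            to_delete = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] == 0:
                        continue
                    # Neighbours p2..p9, clockwise from north.
                    p = [img[r-1, c], img[r-1, c+1], img[r, c+1],
                         img[r+1, c+1], img[r+1, c], img[r+1, c-1],
                         img[r, c-1], img[r-1, c-1]]
                    b = sum(p)                          # non-zero neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))          # 0 -> 1 transitions
                    if first_pass:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            for r, c in to_delete:
                img[r, c] = 0
                changed = True
    return img[1:-1, 1:-1]

thick = np.zeros((7, 11), dtype=np.uint8)
thick[2:5, 1:10] = 1                    # a 3-pixel-wide bar
thin = zhang_suen_thinning(thick)
print(thick.sum(), thin.sum())          # the skeleton is strictly smaller
```

In production one would normally call an optimized library routine (e.g. a morphological skeletonize) instead of this pure-Python loop.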
According to the above embodiment, step 5 is detailed as follows:
Step 5.1: finding the unclosed parts of the ground object target boundary lines and dividing them into three types: boundary breaks at the image edge (boundaries left incomplete at the image border by the skeleton extraction algorithm), boundary breaks inside the image (parts not correctly identified by the edge detection model), and dangling line ends inside the image (excess boundary segments produced by the skeleton extraction algorithm).
Step 5.2: three types of boundary breaks are handled in three different ways.
Step 5.2.1: for a boundary break at the image edge, with the edge intensity map at the corresponding position as a guide, pixels with higher edge intensity values are selected to fill the broken target boundary; if a gap to the image border still remains, a straight line perpendicular to the image border is used to connect them, and together with the image border this forms a closed target boundary line.
Step 5.2.2: for a boundary break inside the image, the threshold is reset, and, with the edge intensity map at the corresponding position as a guide, pixels whose intensity exceeds the new threshold are mapped into the binary map; if the target boundary is still not closed, the two break points are connected according to the original geometric characteristics. Repairing a break point with the original geometric characteristics proceeds as follows.
Step 5.2.2.1: and taking out the innermost boundary line connected with the two break points.
Step 5.2.2.2: the innermost boundary line is divided into several parts according to changes in slope, and the approximate geometric shape of the boundary line is determined.
Step 5.2.2.3: and (4) repairing the broken points into a closed graph according to the geometric shape, so that the repaired positions keep the original geometric characteristics.
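The slope-based decomposition of step 5.2.2.2 can be sketched as follows; the 30-degree turning threshold and the function name are illustrative assumptions:

```python
import numpy as np

def split_by_slope(points, angle_thresh_deg=30.0):
    """Split a boundary polyline into parts wherever the direction of
    travel turns by more than angle_thresh_deg, approximating the
    decomposition of step 5.2.2.2."""
    pts = np.asarray(points, dtype=float)
    vecs = np.diff(pts, axis=0)
    angles = np.degrees(np.arctan2(vecs[:, 1], vecs[:, 0]))
    parts, start = [], 0
    for i in range(1, len(angles)):
        # Smallest absolute turning angle between consecutive segments.
        turn = abs((angles[i] - angles[i - 1] + 180) % 360 - 180)
        if turn > angle_thresh_deg:
            parts.append(pts[start:i + 1])
            start = i
    parts.append(pts[start:])
    return parts

# An L-shaped boundary splits into its two straight legs.
boundary = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(len(split_by_slope(boundary)))  # 2
```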
Step 5.2.3: dangling line ends inside the image, i.e., isolated boundary lines far from any other broken line, are deleted.
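The removal of isolated boundary lines in step 5.2.3 can be approximated with a connected-component filter; using component size instead of distance to other lines as the isolation criterion, and the `min_pixels` value, are assumptions of this sketch:

```python
import numpy as np
from collections import deque

def remove_isolated_lines(binary, min_pixels=10):
    """Delete small isolated 8-connected boundary components, a
    stand-in for the isolated-line removal of step 5.2.3."""
    out = binary.copy()
    seen = np.zeros_like(out, dtype=bool)
    h, w = out.shape
    for r in range(h):
        for c in range(w):
            if out[r, c] == 0 or seen[r, c]:
                continue
            # Breadth-first search over one 8-connected component.
            comp, queue = [], deque([(r, c)])
            seen[r, c] = True
            while queue:
                cr, cc = queue.popleft()
                comp.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and out[nr, nc] == 1 and not seen[nr, nc]):
                            seen[nr, nc] = True
                            queue.append((nr, nc))
            if len(comp) < min_pixels:
                for cr, cc in comp:
                    out[cr, cc] = 0
    return out

edges = np.zeros((8, 20), dtype=np.uint8)
edges[1, 1:15] = 1        # a long boundary line (kept)
edges[6, 2:5] = 1         # a short isolated fragment (removed)
cleaned = remove_isolated_lines(edges)
print(edges.sum(), cleaned.sum())  # prints: 17 14
```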
Step 5.3: performing skeleton extraction again, refining the parts filled in step 5.2 into target boundary lines of single-pixel width.
Step 5.4: traversing the image again and deleting all unclosed target boundary lines yields the complete and fine ground object target boundary shown in fig. 6-A; fig. 6-B shows the boundary predicted by Mask R-CNN, and comparison of the two results shows that the method of the invention has a clear precision advantage.
The embodiments described in this specification are merely illustrative of implementations of the inventive concept and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments but rather by the equivalents thereof as may occur to those skilled in the art upon consideration of the present inventive concept.
Claims (1)
1. An edge-assisted fine extraction method for high-resolution remote sensing targets, comprising the following steps:
step 1: prepare remote sensing image target extraction samples: draw the fine boundary of each target against the image and determine its class; generate two different labels for the same image sample, namely a target bounding-box label according to the requirements of the target detection model, and a target boundary label according to the requirements of the edge detection model; the method specifically comprises the following steps:
step 1.1: obtain high-resolution remote sensing images: use optical satellite remote sensing data from a visible/near-infrared sensor, or aerial remote sensing data captured with an ordinary optical camera; depending on the resolution requirements, multispectral images or fused panchromatic images can be used directly;
step 1.2: crop the remote sensing images: select areas containing typical targets within the production region, and crop the remote sensing images to a uniform pixel size;
step 1.3: produce the deep learning training samples: draw the training samples with ArcGIS or other GIS software, labeling the boundary of each ground-object target to obtain the corresponding shp file; generate target bounding-box labels according to the requirements of the target detection model, and target boundary labels according to the requirements of the edge detection model;
step 1.4: collect more than 200 images and their corresponding labels as training samples according to the task requirements, and prepare a separate set of test samples for measuring accuracy;
step 2: train the target detection model and the edge detection model with the prepared image samples: train a Faster R-CNN neural network to obtain the target detection model and an RCF neural network to obtain the edge detection model, replacing or modifying the networks as appropriate for the production target; the method specifically comprises the following steps:
step 2.1: design the deep convolutional neural networks: to train the target detection model and the edge detection model, the RCF and Faster R-CNN neural networks are selected; the networks can be replaced or modified according to the production target;
step 2.2: initialize the weights: initialize the RCF network weights with a VGG pre-trained model, and initialize the Faster R-CNN weights with a model pre-trained on the COCO data set;
step 2.3: set the training hyper-parameters: configure the hyper-parameters, with the specific values set after model tuning;
RCF training parameters: number of iterations 8000, batch_size 4, learning-rate update steps [3200, 4800, 6400, 8000], initial learning rate 0.001, learning-rate update coefficient 0.1;
Faster R-CNN training parameters: 3 training stages with [40, 120, 40] epochs per stage and 1000 iterations per epoch, validation interval 50, batch_size 4, learning-rate update step 1000, initial learning rate 0.001, learning-rate update momentum 0.9, weight decay 0.0001;
step 2.4: input the samples and train the models: input the training samples into the RCF network and train with the hyper-parameters of step 2.3 to obtain an edge detection model that can extract the edge contours of ground-object targets; input the training samples into Faster R-CNN and train with the hyper-parameters of step 2.3 to obtain a target detection model that can extract the bounding boxes of ground-object targets;
step 3: input the image samples for testing into the target detection model and the edge detection model to obtain the ground-object target bounding boxes and the edge intensity map; the edge intensity map indicates the likelihood that each position in the remote sensing image is a target edge; the bounding box is a rectangle that delimits the position range of a target and marks its class; the method specifically comprises the following steps:
step 3.1: input the high-resolution remote sensing image into the target detection model to obtain the rectangular bounding boxes of the ground-object targets; input the high-resolution remote sensing image into the edge detection model to obtain the ground-object target edge intensity map;
step 3.2: parameterize the bounding box: convert the rectangle parameters output by the target detection model into the coordinates of its lower-left and upper-right vertices; specifically:
x1 = x - w, y1 = y - h
x2 = x + w, y2 = y + h
where x and y are the center coordinates of the rectangular bounding box, and w and h are its half-width and half-height;
step 4: thin the edge intensity map of step 3 into boundaries a single pixel wide; the method specifically comprises the following steps:
step 4.1: convert the ground-object target edge intensity map from a grayscale map into a binary map according to a set threshold; the edge decision at image position (x, y), denoted binary_image(x, y), can be expressed as:
binary_image(x, y) = 1 if intensity(x, y) >= threshold, and 0 otherwise,
where 1 denotes an edge and 0 denotes a non-edge; threshold is a user-settable real number in the interval [0, 1], with an initial default value of 0.5; x and y are the horizontal and vertical image coordinates;
step 4.2: for thick edge lines several pixels wide in the binary map, repeatedly apply an erosion operation about the centerline of each edge line until all edge lines are only one pixel wide, thereby achieving skeleton extraction;
step 5: using the ground-object target bounding boxes of step 3 as constraints, repair the single-pixel boundaries of step 4 to obtain complete and fine polygonal ground-object target boundaries; the detailed steps are as follows:
step 5.1: find the unclosed parts of the ground-object target boundary lines and divide them into three types: boundary breaks at the image edge and boundary breaks inside the image, which are both parts not correctly identified by the edge detection model, and isolated line heads inside the image, which are redundant boundary segments produced by the skeleton extraction algorithm;
step 5.2: handle the three types of boundary defects with three different methods;
step 5.2.1: for a boundary break at the image edge, using the edge intensity map at the corresponding position as a guide, select pixels with higher edge intensity values to fill the break in the target boundary; if a gap still remains between those pixels and the image border, connect them to the target boundary line with a straight line perpendicular to the image border, so that the pixels, the target boundary line, and the image border together form a closed target boundary line;
step 5.2.2: for a boundary break inside the image, reset the threshold and, using the edge intensity map at the corresponding position as a guide, map pixels whose intensity values exceed the new threshold into the binary map; if the target boundary line is still not closed, connect the two break points according to the original geometric characteristics; the steps of repairing the break points from the original geometric characteristics are as follows;
step 5.2.2.1: extract the innermost boundary line connected to the two break points;
step 5.2.2.2: divide the innermost boundary line into several parts according to the changes in its slope, and determine the approximate geometric shape of the boundary line;
step 5.2.2.3: repair the break points into a closed figure according to that geometric shape, so that the repaired section keeps the original geometric characteristics;
step 5.2.3: delete isolated boundary lines inside the image, i.e., boundary lines that are far from any other broken line;
step 5.3: perform skeleton extraction again, thinning the parts filled in step 5.2 into target boundary lines a single pixel wide;
step 5.4: traverse the image again and delete all unclosed target boundary lines to obtain complete and fine polygonal ground-object target boundaries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910638370.0A CN110443822B (en) | 2019-07-16 | 2019-07-16 | Semantic edge-assisted high-resolution remote sensing target fine extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110443822A CN110443822A (en) | 2019-11-12 |
CN110443822B true CN110443822B (en) | 2021-02-02 |
Family
ID=68430338
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111818557B (en) * | 2020-08-04 | 2023-02-28 | 中国联合网络通信集团有限公司 | Network coverage problem identification method, device and system |
CN112084871B (en) * | 2020-08-10 | 2024-02-13 | 浙江工业大学 | High-resolution remote sensing target boundary extraction method based on weak supervised learning |
CN112084872A (en) * | 2020-08-10 | 2020-12-15 | 浙江工业大学 | High-resolution remote sensing target accurate detection method fusing semantic segmentation and edge |
CN111967526B (en) * | 2020-08-20 | 2023-09-22 | 东北大学秦皇岛分校 | Remote sensing image change detection method and system based on edge mapping and deep learning |
US11836223B2 (en) * | 2020-11-13 | 2023-12-05 | Meta Platforms, Inc. | Systems and methods for automated detection of building footprints |
CN113160258B (en) * | 2021-03-31 | 2022-11-29 | 武汉汉达瑞科技有限公司 | Method, system, server and storage medium for extracting building vector polygon |
CN113128388B (en) * | 2021-04-14 | 2022-09-02 | 湖南大学 | Optical remote sensing image change detection method based on space-time spectrum characteristics |
CN114241326B (en) * | 2022-02-24 | 2022-05-27 | 自然资源部第三地理信息制图院 | Progressive intelligent production method and system for ground feature elements of remote sensing images |
CN115273154B (en) * | 2022-09-26 | 2023-01-17 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium |
CN115797633B (en) * | 2022-12-02 | 2023-06-27 | 中国科学院空间应用工程与技术中心 | Remote sensing image segmentation method, remote sensing image segmentation system, storage medium and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7341841B2 (en) * | 2003-07-12 | 2008-03-11 | Accelr8 Technology Corporation | Rapid microbial detection and antimicrobial susceptibility testing |
CN103247032B (en) * | 2013-04-26 | 2015-12-02 | 中国科学院光电技术研究所 | A kind of faint Extended target localization method based on pose compensation |
JP6834417B2 (en) * | 2016-11-30 | 2021-02-24 | ブラザー工業株式会社 | Image processing equipment and programs |
JP2018147199A (en) * | 2017-03-03 | 2018-09-20 | ブラザー工業株式会社 | Image processing device, and computer program |
CN109712140B (en) * | 2019-01-02 | 2021-01-26 | 国电内蒙古东胜热电有限公司 | Method and device for training fully-connected classification network for leakage detection |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||