CN109948697B - Method for extracting urban built-up area by using multi-source data to assist remote sensing image classification - Google Patents


Info

Publication number
CN109948697B
CN109948697B CN201910208202.8A
Authority
CN
China
Prior art keywords
data
remote sensing
area
training sample
image
Prior art date
Legal status
Active
Application number
CN201910208202.8A
Other languages
Chinese (zh)
Other versions
CN109948697A (en
Inventor
苗则朗
史文中
贺跃光
肖粤龙
Current Assignee
Central South University
Original Assignee
Central South University
Priority date
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN201910208202.8A
Publication of CN109948697A
Application granted
Publication of CN109948697B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00: Adapting or protecting infrastructure or their operation
    • Y02A30/60: Planning or developing urban green infrastructure

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the field of remote sensing image processing and discloses a method for extracting urban built-up areas by using multi-source data to assist remote sensing image classification. The method first preprocesses the remote sensing data; then performs preprocessing operations such as coordinate conversion, redundant-data elimination, and filtering on the acquired multi-source data, and calculates the frequency and spectral similarity of the training samples within a single pixel; and finally extracts the urban built-up area by using a one-class support vector machine. The method achieves rapid extraction of the urban built-up area, automatically generates real-time and nearly zero-cost training samples from the multi-source data, fuses multi-source geographic data and multi-source remote sensing image data to extract the urban built-up area, and greatly improves the extraction accuracy of the urban built-up area.

Description

Method for extracting urban built-up area by using multi-source data to assist remote sensing image classification
Technical Field
The invention relates to a method for extracting urban built-up areas by using multi-source data to assist remote sensing image classification, and belongs to the field of remote sensing image processing.
Background
The area of the urban built-up area is an important index for measuring the urbanization process. The accurate extraction of the boundary of the built-up area is beneficial to reasonably planning the land and provides important reference for relevant researches such as urban waterlogging, heat island effect and the like. The expansion of a built-up area also has a significant impact on regional climate change.
Remote sensing technology plays a very important role in monitoring land-cover changes over wide areas. At present, techniques for extracting urban built-up areas from multi-source remote sensing images, such as DMSP/OLS nighttime light data, MODIS, Landsat, ENVISAT ASAR, and SPOT, have been widely applied. To date, many methods have been developed for unsupervised or supervised satellite image interpretation. Supervised classification in particular has gained wide application due to its higher accuracy. Although using remote sensing images to extract urban built-up areas has many advantages, the acquisition of the training samples required by supervised classification is often an important factor affecting extraction accuracy. Supervised classification needs a large number of field-collected training samples and consumes a great deal of manpower, material, and financial resources, especially when the classification area rises to a global scale; this creates an urgent need for real-time, low-cost training sample collection techniques. Meanwhile, selecting training samples requires operators to have prior knowledge of the land types; manual selection is time-consuming and labor-intensive and limits the efficiency of urban built-up area extraction.
Disclosure of Invention
The invention aims to provide a method for extracting urban built-up areas by using multi-source data to assist remote sensing image classification.
In order to achieve this purpose, the invention provides a method for extracting urban built-up areas by using multi-source data to assist remote sensing image classification, which comprises the following steps:
(1) carrying out data preprocessing on the obtained remote sensing image;
(2) acquiring crowd-sourced data, selecting training samples from the crowd-sourced data, calculating the number of training samples in a single pixel according to the spatial resolution of the remote sensing image, and further calculating the frequency and spectral similarity;
(3) extracting the impervious layer by using a single-class support vector machine;
(4) and performing EM clustering on the impervious layer to obtain the urban built-up area to be extracted.
Further, the remote sensing image in the step (1) is taken from Landsat-8, and the data preprocessing comprises data correction, abnormal value detection and cloud masking.
Further, the data correction converts the terrain-corrected data into radiance values using the provided calibration values in the case of multispectral data; the outlier detection identifies abnormal values caused by acquisition inconsistencies or calibration errors; the cloud masking selects multispectral images with cloud cover below 10% to avoid cloud-coverage effects.
Further, the crowd-sourced data in step (2) includes two open data sources, namely social media data and OpenStreetMap.
Further, in step (2), the social media data is taken from Twitter; the number of tweets within a single pixel of the Landsat image is first calculated, and then only pixels containing at least one tweet are taken as training samples; Ω = {x_i | i = 1, ..., l} denotes the training sample set, where x_i is a spectral vector whose dimension equals the number of bands of the multispectral image;
The frequency and spectral similarity of the Twitter data are measured as follows:
1) measuring the frequency of the Twitter data: the tweet frequency of the ith training sample is defined as
F_i = T_i / max_{j=1,...,l} T_j (1)
where l is the number of Landsat pixels containing tweets and T_i is the number of tweets in the corresponding pixel; a larger value of T_i indicates a higher probability that the training sample is located on the impervious layer;
2) measuring the spectral similarity of the Twitter data: assuming the training samples derived from Twitter form a cluster, the distance from the impervious class to the cluster center of the training samples is smaller than the distance from the pervious class to that center, and the distance from a training sample to the cluster center is measured quantitatively with the Minimum Covariance Determinant (MCD); when the point cloud is symmetrically distributed around a center, the probability density function of the ith training sample is expressed as
f(x_i) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(1/2)(x_i - u)^T Σ^(-1) (x_i - u)) (2)
where x_i is the spectral vector of the ith training sample, u is the sample mean, Σ is the covariance matrix, and d is the number of bands; the distance between the ith training sample and the mean is defined as the spectral similarity S_i:
S_i = (x_i - û)^T Σ̂^(-1) (x_i - û) (3)
where û and Σ̂ are the robust sample mean and sample covariance matrix estimated by the MCD, respectively; a smaller value of S_i indicates a higher probability that the ith training sample falls in the impervious area;
based on the frequency and spectral similarity of the Twitter data, the weight of the ith training sample is expressed as:
w_i = w(F_i, S_i) (4) (closed form shown only as an image in the original; the weight increases with the frequency F_i and decreases with the spectral similarity S_i)
Further, in step (2), for the data taken from OpenStreetMap, the impervious layer is first converted into two OSM raster images with spatial resolutions of 1 m and 30 m, respectively; the ith impervious pixel of the 30 m OSM raster image covers a region R_i formed by 900 pixels of the 1 m OSM raster image. Thus, the frequency of the ith impervious pixel is defined as:
F_i^O = l_i^O / 900 (5)
where l_i^O is the number of impervious pixels within region R_i of the 1 m resolution OSM raster image and l^o is the number of impervious-layer pixels of the 30 m resolution OSM raster image;
The spectral similarity S_i^O of the impervious pixels taken from OSM as training samples is expressed as follows:
S_i^O = (y_i - û)^T Σ̂^(-1) (y_i - û) (6)
where y_i is the spectral vector of the ith training sample and û and Σ̂ are the sample mean and sample covariance matrix, respectively; a smaller value of S_i^O indicates a higher probability that the ith training sample falls in the impervious area; the weight w_i^O of a training sample generated by OSM is expressed as follows:
w_i^O = w(F_i^O, S_i^O) (7) (closed form shown only as an image in the original; obtained by substituting F_i^O and S_i^O into equation (4))
Further, in step (3), the remote sensing image is classified by using a one-class support vector machine (OCSVM) to extract the impervious-layer information;
the OCSVM trains a hypersphere of minimum volume that contains as many training samples of the single class as possible; the distance from a sample to the boundary is interpreted as a degree of similarity and provides a reference for judging whether the sample belongs to the specific class;
The OCSVM is a constrained convex optimization problem:
min_{R, a, ξ} R^2 + C Σ_{i=1}^{l} ξ_i
s.t. ||x_i - a||^2 ≤ R^2 + ξ_i,
ξ_i ≥ 0, i = 1, ..., l (8)
where a and R are the center and radius of the hypersphere, respectively; ||·|| is the Euclidean norm, ξ_i is a slack variable controlling the degree of slack, and C is a trade-off parameter; the dual form of equation (8) is as follows:
max_α Σ_{i=1}^{l} α_i <x_i, x_i> - Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j <x_i, x_j>
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l
Σ_{i=1}^{l} α_i = 1 (9)
where <·,·> is the inner product and {α_1, ..., α_l} are the Lagrange multipliers;
The weight of training samples from the impervious area is made greater than that of training samples from the pervious area, and the weights of the training samples in equation (4) are introduced into equation (8), as shown below:
min_{R, a, ξ} R^2 + C Σ_{i=1}^{l} w_i ξ_i
s.t. ||x_i - a||^2 ≤ R^2 + ξ_i,
ξ_i ≥ 0, i = 1, ..., l (10)
Equation (9) accordingly becomes:
max_α Σ_{i=1}^{l} α_i <x_i, x_i> - Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j <x_i, x_j>
s.t. 0 ≤ α_i ≤ w_i C, i = 1, ..., l
Σ_{i=1}^{l} α_i = 1 (11)
By replacing the inner product <x_i, x_j> in equations (9) and (11) with a kernel function K<x_i, x_j>, kernelized versions of the OCSVM and the weighted one-class support vector machine (WOCSVM) are obtained; the most appropriate parameters are determined by five-fold cross-validation, and the free parameters are selected by the following formula:
θ̂ = argmax_θ [criterion combining OA and nSV] (12) (closed form shown only as an image in the original; it selects the parameter set maximizing the overall accuracy while keeping the number of support vectors small)
where θ is the entire set of free parameters, θ̂ is the optimal parameter set obtained by five-fold cross-validation, OA is the overall accuracy, and nSV is the number of support vectors.
Further, the EM clustering algorithm in step (4) clusters the impervious-layer probability map obtained in step (3) to remove image noise from the probability map and fill holes in the image.
Further, the EM clustering algorithm divides the probability map into five classes, and the class or classes belonging to the impervious layer are extracted from the five classes.
The invention fuses crowd-sourced data and remote sensing image data and achieves high-accuracy urban built-up area extraction with a one-class support vector machine: first, open and free crowd-sourced geographic data are obtained and preprocessed; the remote sensing data are preprocessed; and finally a one-class support vector machine is used to extract the urban built-up area. In this way, real-time and nearly zero-cost training samples are generated automatically from the crowd-sourced data, multi-source geographic data and multi-source remote sensing image data are fused to extract the urban built-up area, the urban built-up area is extracted rapidly, and the extraction accuracy is greatly improved.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of one embodiment of the present invention.
Fig. 2 is Landsat-8 remote sensing data of Tokyo, Japan, in an embodiment of the present invention.
Fig. 3 is a partial image of fig. 2.
Fig. 4 is a distribution diagram of social media data Twitter as a training sample.
FIG. 5 is a graph of the results of extracting impermeable layers using Twitter data as training samples.
Fig. 6 is a distribution diagram of open source data OSM as a training sample.
Fig. 7 is a graph of the results of extracting impermeable layers using OSM data as a training sample.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
In an embodiment of the present invention, as shown in fig. 1, a flowchart of a method for extracting a built-up city area from remote sensing image data by using a crowd-sourced data automatic generation training sample includes the following specific steps:
s1, preprocessing original data.
The remote sensing data selected in the embodiment of the invention is a Landsat-8 image. The study area is Tokyo, Japan, and several preprocessing steps are employed to reduce the temporal and spatial radiometric differences of the different Landsat datasets used in the examples, specifically data correction, outlier detection, and cloud masking.
(a) Data correction: in the case of multispectral data, the terrain-corrected data are converted into radiance values using the provided calibration values.
(b) Outlier detection: acquisition inconsistencies or calibration errors produce outliers, especially in multispectral data (labeled 0 or NaN), which may lead to classification errors. This often occurs at the borders of the image, because each band captured by the sensor is slightly delayed, resulting in some missing position information. As previously mentioned, this phenomenon affects the multispectral dataset and enlarges the edge portions that must be excluded from processing. To prevent this problem, an automatic masking method is used.
(c) Cloud masking: the presence of cloud cover reduces classification accuracy, so multispectral images are selected to avoid cloud-coverage effects. In practice, images with cloud cover below 10% are selected.
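The cloud-masking selection above can be sketched as a simple metadata filter. This is a minimal illustration, not the patent's implementation: the scene dictionaries and the CLOUD_COVER field mimic the cloud-cover percentage reported in Landsat scene metadata, and the scene identifiers are hypothetical.

```python
# Hedged sketch: keep only Landsat-8 scenes whose reported cloud cover
# is below the 10% threshold described in step (c).
MAX_CLOUD_COVER = 10.0  # percent

def select_clear_scenes(scenes):
    """Return the scenes with cloud cover strictly below the threshold."""
    return [s for s in scenes if s["CLOUD_COVER"] < MAX_CLOUD_COVER]

# Illustrative scene list (identifiers and values are made up).
scenes = [
    {"id": "LC08_L1TP_107035_20150509", "CLOUD_COVER": 3.2},
    {"id": "LC08_L1TP_107035_20150626", "CLOUD_COVER": 41.7},
    {"id": "LC08_L1TP_107035_20151016", "CLOUD_COVER": 8.9},
]
print([s["id"] for s in select_clear_scenes(scenes)])
```

In practice the threshold would be applied to the CLOUD_COVER value parsed from each scene's metadata file before any pixels are read.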
FIG. 2 shows a preprocessed Landsat-8 OLI image of Tokyo, Japan, acquired in 2015, with 7 bands at 30 m spatial resolution. The embodiment of the present invention selects a local area of 2666 (rows) × 2728 (columns) from the whole scene for description (fig. 3).
S2. Generate training samples from open data. The embodiment of the invention selects two open data sources: social media data (Twitter data as an example) and OpenStreetMap (OSM). These two datasets illustrate the strategy of automatically generating training samples from crowd-sourced data. Twitter is treated here as the only source of geo-located social media. First, the number of tweets within a single pixel of the Landsat image is calculated; then only pixels containing at least one tweet are taken as training samples. Ω = {x_i | i = 1, ..., l} denotes the training sample set, where x_i is a spectral vector whose dimension equals the number of bands of the multispectral image. It should be noted that geo-referenced Twitter data has two basic features: 1) tweets can be sent repeatedly within a 30 m pixel; 2) tweets can be sent from either an impervious surface or a pervious surface. The embodiment of the invention provides two measures (the frequency and the spectral similarity of the Twitter data) to describe these characteristics.
1) Frequency of the Twitter data. Twitter data typically recur within the same coordinate range (the Landsat-8 ground spatial resolution is 30 m). On this basis, the tweet frequency of the ith training sample is defined as F_i:
F_i = T_i / max_{j=1,...,l} T_j (1)
where l is the number of Landsat pixels containing tweets and T_i is the number of tweets in the corresponding pixel. A larger value of T_i means the training sample is more likely to be located on the impervious layer. For example, if 5 tweets are sent within a certain pixel of the Landsat-8 image and the maximum number of tweets sent within any pixel of the image is 10, that pixel has a frequency of F_i = 5/10 = 0.5.
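The frequency measure of equation (1) reduces to a single normalization over per-pixel tweet counts. A minimal sketch, with illustrative counts rather than real Twitter data:

```python
import numpy as np

def tweet_frequency(counts):
    """Equation (1): per-pixel tweet counts normalized by the maximum count,
    giving frequencies in (0, 1] over pixels that contain at least one tweet."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.max()

# Toy counts: tweets per Landsat pixel (only pixels with >= 1 tweet kept).
T = np.array([5, 10, 1, 2])
F = tweet_frequency(T)
print(F[0])  # 0.5: a pixel with 5 tweets when the image maximum is 10
```

The output for the first pixel reproduces the worked example in the text (5 tweets against a maximum of 10 gives F_i = 0.5).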
2) Spectral similarity of the Twitter data. As mentioned above, although a small number of tweets come from pervious areas such as the sea/beach, farmland, and mountainous areas, most tweets are concentrated in impervious areas. This means that training samples from Twitter have similar spectral values. Assume that the training samples derived from Twitter form a cluster: the distance from the impervious class to the cluster center of the training samples is small, while the distance from the pervious class to that center is large. The distance of a training sample to the cluster center can be measured quantitatively with the Minimum Covariance Determinant (MCD). The core idea of the MCD algorithm is to find the h observations with minimum spatial divergence; the determinant of the covariance matrix is a good measure of the divergence of the point cloud when the point cloud (here the h most central observations) is symmetrically distributed around a center. Assuming that the ith training sample follows a multivariate normal distribution, its probability density function can be expressed as
f(x_i) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(1/2)(x_i - u)^T Σ^(-1) (x_i - u)) (2)
where x_i is the spectral vector of the ith training sample, u is the sample mean, Σ is the covariance matrix, and d is the number of bands. The distance between the ith training sample and the mean is defined as the spectral similarity S_i:
S_i = (x_i - û)^T Σ̂^(-1) (x_i - û) (3)
where û and Σ̂ are the robust sample mean and sample covariance matrix estimated by the MCD, respectively. A smaller value of S_i means the ith training sample is more likely to fall in the impervious area. Given the frequency and spectral similarity of the Twitter data, the weight of the ith training sample is expressed as:
w_i = w(F_i, S_i) (4) (closed form shown only as an image in the original; the weight increases with F_i and decreases with S_i)
For example, when the frequency of the ith training sample is F_i = 0.5 and S_i = 2, the weight is w_i = 0.8244. Fig. 4 shows the distribution of the social media data (Twitter) used as training samples.
For OSM, the impervious layers (buildings, traffic, roads, etc.) are first converted into two raster images with spatial resolutions of 1 m and 30 m, respectively. The ith impervious pixel of the 30 m OSM raster image covers 900 pixels of the 1 m OSM raster image (region R_i). Thus, the frequency of the ith impervious pixel is defined as:
F_i^O = l_i^O / 900 (5)
where l_i^O is the number of impervious pixels within region R_i of the 1 m resolution OSM raster image and l^o is the number of impervious-layer pixels of the 30 m resolution OSM raster image. When region R_i contains 300 impervious pixels, the frequency in the region is F_i^O = 300/900 = 0.3333.
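The 1 m to 30 m aggregation of equation (5) is a block sum over 30×30 windows. A minimal sketch, assuming the binary OSM impervious mask is already rasterized at 1 m and its dimensions are multiples of 30; the toy mask is illustrative, not real OSM data.

```python
import numpy as np

def osm_frequency(mask_1m):
    """Equation (5): aggregate a binary 1 m impervious mask to 30 m.
    mask_1m has shape (30*r, 30*c); returns an (r, c) array of
    F_i^O = (impervious 1 m pixel count in R_i) / 900."""
    r, c = mask_1m.shape[0] // 30, mask_1m.shape[1] // 30
    blocks = mask_1m.reshape(r, 30, c, 30)
    return blocks.sum(axis=(1, 3)) / 900.0

# One 30 m cell containing 300 impervious 1 m pixels, as in the example.
mask = np.zeros((30, 30), dtype=int)
mask.flat[:300] = 1
print(osm_frequency(mask))  # approximately 0.3333
```

The single-cell result matches the text's worked example (300/900 = 0.3333).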
Similar to Twitter, the spectral similarity of the OSM impervious pixels can be calculated in the same way. Substituting F_i^O for F_i and updating the spectral similarity in equation (4) yields the weight of the training samples generated by OSM. The spectral similarity S_i^O of the impervious pixels taken from OSM as training samples is expressed as follows:
S_i^O = (y_i - û)^T Σ̂^(-1) (y_i - û) (6)
where y_i is the spectral vector of the ith training sample and û and Σ̂ are the sample mean and sample covariance matrix, respectively; a smaller value of S_i^O indicates a higher probability that the ith training sample falls in the impervious area. The weight w_i^O of a training sample generated by OSM is expressed as follows:
w_i^O = w(F_i^O, S_i^O) (7) (closed form shown only as an image in the original; obtained by substituting F_i^O and S_i^O into equation (4))
Fig. 6 shows the distribution of the other open data source (OSM) used as training samples.
S3. Extract the impervious layer by using a one-class support vector machine.
The embodiment of the invention classifies the satellite images with a one-class support vector machine (OCSVM) to extract the impervious-layer information. In contrast to a standard support vector machine, which finds the optimal boundary separating two classes, the OCSVM trains a hypersphere of minimum volume that contains as many training samples of the single class as possible. The distance from a sample to the boundary may be interpreted as a degree of similarity, which provides a reference for whether the sample belongs to the specific class. The OCSVM is a constrained convex optimization problem:
min_{R, a, ξ} R^2 + C Σ_{i=1}^{l} ξ_i
s.t. ||x_i - a||^2 ≤ R^2 + ξ_i,
ξ_i ≥ 0, i = 1, ..., l (8)
where a and R are the center and radius of the hypersphere, respectively; ||·|| is the Euclidean norm, ξ_i is a slack variable, and C is a trade-off parameter that controls the degree of slack. The dual form of equation (8) is as follows:
max_α Σ_{i=1}^{l} α_i <x_i, x_i> - Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j <x_i, x_j>
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l
Σ_{i=1}^{l} α_i = 1 (9)
where <·,·> is the inner product and {α_1, ..., α_l} are the Lagrange multipliers.
Twitter data used as training samples may come from both impervious and pervious areas; in other words, the training samples contain a certain proportion of mislabeled pixels. As described in step S2, training samples from impervious areas are given higher weights and those from pervious areas lower weights. We therefore assume that a training sample with a small weight has a weak influence on the parameters (such as the center and radius) of the fitted hypersphere. To this end, the weights of the training samples in equation (4) are introduced into equation (8), as shown below:
min_{R, a, ξ} R^2 + C Σ_{i=1}^{l} w_i ξ_i
s.t. ||x_i - a||^2 ≤ R^2 + ξ_i,
ξ_i ≥ 0, i = 1, ..., l (10)
Equation (9) accordingly becomes:
max_α Σ_{i=1}^{l} α_i <x_i, x_i> - Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j <x_i, x_j>
s.t. 0 ≤ α_i ≤ w_i C, i = 1, ..., l
Σ_{i=1}^{l} α_i = 1 (11)
By replacing the inner product <x_i, x_j> in equations (9) and (11) with a kernel function K<x_i, x_j>, the kernelized versions of the OCSVM and WOCSVM are obtained. The most suitable parameters are determined by five-fold cross-validation. In general, the free parameters are difficult to tune when only samples of the target class are available: in this case only the true positive rate (sensitivity) can be calculated, while the complementary error measure (specificity) cannot. The free parameters are therefore selected by the following equation:
θ̂ = argmax_θ [criterion combining OA and nSV] (12) (closed form shown only as an image in the original; it selects the parameter set that maximizes the overall accuracy while keeping the number of support vectors small)
where θ is the entire set of free parameters, θ̂ is the optimal parameter set obtained by five-fold cross-validation, OA is the overall accuracy, and nSV is the number of support vectors. This performance constraint limits the complexity of the model by keeping the number of support vectors low, thereby achieving higher overall accuracy. To reduce the computational cost of the large training sample size, the OCSVM classification is implemented with a GPU-accelerated LIBSVM package.
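The weighted one-class classification step can be sketched with scikit-learn's OneClassSVM, the nu-parameterized relative of the SVDD in equations (8)-(11). This is a hedged substitute for the GPU-accelerated LIBSVM setup the text mentions: the per-sample weights passed to fit() stand in for the w_i of equation (4), and the spectra are synthetic.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(0.3, 0.03, size=(200, 7))   # impervious-like 7-band spectra
weights = rng.uniform(0.5, 1.0, size=200)        # stand-in for the weights w_i

# RBF-kernelized one-class SVM; sample_weight down-weights suspect samples,
# mimicking the weighted formulation in equations (10)-(11).
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
clf.fit(X_train, sample_weight=weights)

X_test = np.vstack([
    rng.normal(0.3, 0.03, size=(5, 7)),  # spectra similar to the target class
    rng.normal(0.9, 0.03, size=(5, 7)),  # clearly dissimilar spectra
])
pred = clf.predict(X_test)  # +1 for in-class, -1 for outliers
print(pred)
```

In the patent's pipeline the decision values over the whole image would form the impervious-layer probability map passed to step S4; here the dissimilar spectra are rejected as outliers.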
S4. Cluster the probability map obtained in step S3.
The impervious-layer probability map obtained in the previous step is clustered with the expectation-maximization (EM) clustering algorithm; the purpose of the clustering is to remove image noise and fill holes in the image. The general steps of the EM algorithm are:
E-step: Q_i(z^(i)) := p(z^(i) | x^(i); θ)
M-step: θ := argmax_θ Σ_i Σ_{z^(i)} Q_i(z^(i)) log [ p(x^(i), z^(i); θ) / Q_i(z^(i)) ]
The probability map is divided into 5 classes with the EM algorithm, the class or classes belonging to the impervious layer are extracted, and the urban built-up area (impervious layer) to be extracted is finally obtained.
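The five-class EM clustering of step S4 can be sketched with a Gaussian mixture fitted by EM. This is a minimal illustration under stated assumptions: a 1-D synthetic probability map stands in for the real output of step S3, and selecting the highest-mean component as "impervious" is one plausible reading of extracting the impervious class(es).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic per-pixel impervious probabilities (flattened probability map).
rng = np.random.default_rng(2)
prob_map = np.concatenate([
    rng.uniform(0.0, 0.2, 500),   # clearly pervious pixels
    rng.uniform(0.8, 1.0, 300),   # clearly impervious pixels
    rng.uniform(0.3, 0.7, 200),   # ambiguous pixels
]).reshape(-1, 1)

# EM-fitted mixture with 5 components, as in the patent's 5-class clustering.
gmm = GaussianMixture(n_components=5, random_state=0).fit(prob_map)
labels = gmm.predict(prob_map)

# Keep the component with the highest mean probability as the impervious class.
impervious_class = int(np.argmax(gmm.means_.ravel()))
built_up = labels == impervious_class
print(int(built_up.sum()))
```

On real data the cluster labels would be reshaped back to the image grid, which is how the clustering suppresses isolated noise pixels and fills holes.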
Fig. 5 and 7 are graphs showing the result of extracting an impervious layer using Twitter data as a training sample and OSM data as a training sample, respectively.
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solutions of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.
In addition, any combination of various different implementation manners of the embodiments of the present invention can be made, and the embodiments of the present invention should also be regarded as the disclosure of the embodiments of the present invention as long as the combination does not depart from the spirit of the embodiments of the present invention.

Claims (6)

1. A method for extracting urban built-up areas by using multi-source data to assist remote sensing image classification is characterized by comprising the following steps:
(1) carrying out data preprocessing on the obtained remote sensing image;
(2) acquiring crowd-sourced data, selecting training samples from the crowd-sourced data, calculating the number of training samples in a single pixel according to the spatial resolution of the remote sensing image, and further calculating the frequency and spectral similarity; the crowd-sourced data comprises two open data sources, namely social media data and OpenStreetMap, wherein the social media data is taken from Twitter; the number of tweets within a single pixel of the Landsat image is first calculated, and then only pixels containing at least one tweet are taken as training samples; Ω = {x_i | i = 1, ..., l} denotes the training sample set, where x_i is a spectral vector whose dimension equals the number of bands of the multispectral image;
the frequency and spectral similarity of the Twitter data are measured as follows:
1) measuring the frequency of the Twitter data: the tweet frequency of the ith training sample is defined as
F_i = T_i / max_{j=1,...,l} T_j (1)
where l is the number of Landsat pixels containing tweets and T_i is the number of tweets in the corresponding pixel; a larger value of T_i indicates a higher probability that the training sample is located on the impervious layer;
2) measuring the spectral similarity of the Twitter data: assuming the training samples derived from Twitter form a cluster, the distance from the impervious class to the cluster center of the training samples is smaller than the distance from the pervious class to that center, and the distance from a training sample to the cluster center is measured quantitatively with the Minimum Covariance Determinant (MCD); when the point cloud is symmetrically distributed around a center, the probability density function of the ith training sample is expressed as
f(x_i) = (2π)^(-d/2) |Σ|^(-1/2) exp(-(1/2)(x_i - u)^T Σ^(-1) (x_i - u)) (2)
where x_i is the spectral vector of the ith training sample, u is the sample mean, Σ is the covariance matrix, and d is the number of bands; the distance of the ith training sample from the mean is defined as the spectral similarity S_i:
S_i = (x_i - û)^T Σ̂^(-1) (x_i - û) (3)
where û and Σ̂ are the robust sample mean and covariance estimated by the MCD, and a smaller value of S_i indicates a higher probability that the ith training sample falls in the impervious area;
based on the frequency and spectral similarity of the Twitter data, the weight of the ith training sample is expressed as:
w_i = w(F_i, S_i) (4) (closed form shown only as an image in the original; the weight increases with F_i and decreases with S_i)
for the data taken from OpenStreetMap, the impervious layer is first converted into two OSM raster images with spatial resolutions of 1 m and 30 m, respectively, and the ith impervious pixel of the 30 m OSM raster image covers a region R_i formed by 900 pixels of the 1 m OSM raster image; thus, the frequency of the ith impervious pixel is defined as:
F_i^O = l_i^O / 900 (5)
where l_i^O is the number of impervious pixels within region R_i of the 1 m resolution OSM raster image and l^o is the number of impervious-layer pixels of the 30 m resolution OSM raster image;
the spectral similarity S_i^O of the impervious pixels taken from OSM as training samples is expressed as follows:
S_i^O = (y_i - û)^T Σ̂^(-1) (y_i - û) (6)
where y_i is the spectral vector of the ith training sample and û and Σ̂ are the sample mean and sample covariance matrix, respectively; a smaller value of S_i^O indicates a higher probability that the ith training sample falls in the impervious area; the weight w_i^O of a training sample generated by OSM is expressed as follows:
w_i^O = w(F_i^O, S_i^O) (7) (closed form shown only as an image in the original; obtained by substituting F_i^O and S_i^O into equation (4))
(3) extracting the impervious layer by using a one-class support vector machine;
(4) performing EM clustering on the impervious layer to obtain the urban built-up area to be extracted.
2. The method for extracting urban built-up areas using crowd-sourced data to assist remote sensing image classification according to claim 1, wherein the remote sensing image in step (1) is taken from Landsat-8, and the data preprocessing comprises data correction, outlier detection and cloud masking.
3. The method for extracting urban built-up areas using crowd-sourced data to assist remote sensing image classification according to claim 2, wherein the data correction converts the terrain-corrected data into radiance values using the provided calibration coefficients in the case of multispectral data; the outlier detection removes abnormal values caused by acquisition inconsistencies or calibration errors; and the cloud masking selects multispectral images with a cloud cover of less than 10% to avoid cloud-coverage effects.
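The three preprocessing steps of claims 2–3 can be sketched as below. The gain/offset values stand in for the per-band radiance rescaling coefficients shipped in Landsat-8 metadata, and the scene list, thresholds, and helper names are illustrative assumptions, not part of the patent:

```python
import numpy as np

def dn_to_radiance(dn, mult, add):
    """Convert quantized DN values to radiance: L = mult * DN + add."""
    return mult * dn.astype(np.float64) + add

def mask_outliers(radiance, low, high):
    """Flag values outside a plausible range (acquisition/calibration errors)."""
    return np.where((radiance < low) | (radiance > high), np.nan, radiance)

def select_clear_scenes(scenes, max_cloud=10.0):
    """Keep only scenes whose reported cloud cover is below the threshold."""
    return [s for s in scenes if s["cloud_cover"] < max_cloud]

# Hypothetical band with one saturated value; coefficients are placeholders.
dn = np.array([[100, 200], [65535, 150]], dtype=np.uint16)
rad = mask_outliers(dn_to_radiance(dn, mult=0.01, add=0.0), low=0.0, high=500.0)
scenes = select_clear_scenes(
    [{"id": "sceneA", "cloud_cover": 4.2}, {"id": "sceneB", "cloud_cover": 35.0}]
)
print(np.isnan(rad).sum(), [s["id"] for s in scenes])  # → 1 ['sceneA']
```

In practice the coefficients come from the scene metadata file and the cloud-cover figure from the scene catalog; here both are hard-coded for the sketch.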
4. The method for extracting urban built-up areas using crowd-sourced data to assist remote sensing image classification according to claim 1, wherein in step (3) the remote sensing image is classified with a one-class support vector machine (OCSVM) to extract the impervious-layer information;
the OCSVM trains a hypersphere of minimum volume that contains the maximum number of training samples of a single class; the distance from a sample to the boundary is interpreted as a degree of similarity and provides a reference for judging whether the sample belongs to the specific class;
the OCSVM is formulated as the following constrained convex optimization problem:

$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.} \quad \|x_i - a\|^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \dots, l \quad (8)$
where $a$ and $R$ are the center and the radius of the hypersphere, respectively, $\|\cdot\|$ is the Euclidean norm, $\xi_i$ is a slack variable controlling the degree of relaxation, and $C$ is a trade-off parameter; the dual form of equation (8) is as follows:
$\max_{\alpha}\; \sum_{i=1}^{l} \alpha_i \langle x_i, x_i \rangle - \sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i = 1,\;\; 0 \le \alpha_i \le C \quad (9)$
where $\langle \cdot, \cdot \rangle$ is the inner product and $\{\alpha_1, \dots, \alpha_l\}$ are the Lagrange multipliers;
to make the weight of the training samples from the impervious area larger than that of the training samples from the pervious area, the weights of equation (4) are introduced into equation (8), as shown below:

$\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{l} w_i \xi_i \quad \text{s.t.} \quad \|x_i - a\|^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \dots, l \quad (10)$
equation (9) accordingly becomes:

$\max_{\alpha}\; \sum_{i=1}^{l} \alpha_i \langle x_i, x_i \rangle - \sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i = 1,\;\; 0 \le \alpha_i \le w_i C \quad (11)$
by replacing the inner products $\langle x_i, x_j \rangle$ in equations (9) and (11) with a kernel function $K(x_i, x_j)$, the kernelized versions of the OCSVM and of the weighted one-class support vector machine (WOCSVM) are obtained; the most appropriate parameters are determined by five-fold cross-validation, and the free parameters are selected by the following formula:
[Equation (12): the criterion selecting the optimal free parameters $\hat{\theta}$ over the parameter set $\Theta$ based on the overall accuracy OA and the number of support vectors nSV; equation image not reproduced]

where $\Theta$ is the entire set of free parameters, $\hat{\theta}$ is the optimal parameter set obtained by five-fold cross-validation, OA is the overall accuracy, and nSV is the number of support vectors.
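A weighted one-class SVM in the spirit of claim 4 can be sketched with scikit-learn. Note the library implements the ν-formulation of the OCSVM rather than the hypersphere form of equation (8); with an RBF kernel the two are equivalent, and `sample_weight` plays the role of the per-sample weights in the weighted primal problem. The data and weights below are synthetic:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
# Synthetic single-class training set: spectral vectors of candidate
# impervious pixels, some more trustworthy than others.
X = rng.normal(0.5, 0.05, size=(200, 6))
w = rng.uniform(0.2, 1.0, size=200)   # stand-in for the per-sample weights

# nu-OCSVM with RBF kernel; sample_weight rescales each slack term,
# mirroring the weighted slack penalty of the weighted primal problem.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
clf.fit(X, sample_weight=w)

test_points = np.vstack([X.mean(axis=0), np.full(6, 2.0)])
print(clf.predict(test_points))  # inlier → 1, far-away point → -1
```

In the method itself the kernel width and ν would be chosen by five-fold cross-validation rather than fixed as here.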
5. The method for extracting urban built-up areas using crowd-sourced data to assist remote sensing image classification according to claim 1, wherein the EM clustering algorithm in step (4) clusters the impervious-layer probability map obtained in step (3), so as to remove image noise from the probability map and fill holes existing in the image.
6. The method for extracting urban built-up areas using crowd-sourced data to assist remote sensing image classification according to claim 5, wherein the EM clustering algorithm classifies the probability map into five classes and extracts from them at least one class belonging to the impervious layer.
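Claims 5–6 can be sketched with scikit-learn's `GaussianMixture`, which is fitted by EM; the five-class setting follows claim 6, while the probability map below is synthetic (in the method it would come from the one-class SVM of step (3)), and picking the highest-mean component as impervious is an illustrative assumption:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic probability map (values in [0, 1]); high values suggest
# impervious surface, low values pervious surface.
prob_map = np.clip(
    np.concatenate([rng.normal(0.9, 0.05, 500), rng.normal(0.2, 0.1, 1500)]),
    0.0, 1.0,
).reshape(-1, 1)

# EM clustering into five classes, as in claim 6.
gmm = GaussianMixture(n_components=5, random_state=0).fit(prob_map)
labels = gmm.predict(prob_map)

# Take the component with the highest mean probability as impervious.
impervious_class = int(np.argmax(gmm.means_.ravel()))
built_up_mask = labels == impervious_class
print(built_up_mask.sum() > 0)  # → True
```

On a real probability map the pixel grid would be flattened before clustering and the mask reshaped back, after which small holes could be filled morphologically.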
CN201910208202.8A 2019-03-19 2019-03-19 Method for extracting urban built-up area by using multi-source data to assist remote sensing image classification Active CN109948697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910208202.8A CN109948697B (en) 2019-03-19 2019-03-19 Method for extracting urban built-up area by using multi-source data to assist remote sensing image classification

Publications (2)

Publication Number Publication Date
CN109948697A CN109948697A (en) 2019-06-28
CN109948697B true CN109948697B (en) 2022-08-26

Family

ID=67008433

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781948A (en) * 2019-10-22 2020-02-11 Beijing SenseTime Technology Development Co., Ltd. Image processing method, apparatus, device and storage medium
CN111125553B (en) * 2019-11-22 2022-05-31 Institute of Urban Environment, Chinese Academy of Sciences Intelligent urban built-up area extraction method supporting multi-source data
CN115270904B (en) * 2022-04-13 2023-04-18 Guangzhou Urban Planning & Design Survey Research Institute Method and system for spatialization of the school-age permanent population in the compulsory-education stage

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109034233A (en) * 2018-07-18 2018-12-18 Wuhan University Multi-classifier combination classification method for high-resolution remote sensing images combining OpenStreetMap
CN109063754A (en) * 2018-07-18 2018-12-21 Wuhan University Multi-feature joint classification method for remote sensing images based on OpenStreetMap
CN109299673A (en) * 2018-09-05 2019-02-01 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences Method and medium for extracting the greenness space of urban agglomerations
CN109325085A (en) * 2018-08-08 2019-02-12 Central South University Urban land-use function identification and change detection method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20160379388A1 (en) * 2014-07-16 2016-12-29 Digitalglobe, Inc. System and method for combining geographical and economic data extracted from satellite imagery for use in predictive modeling

Non-Patent Citations (2)

Title
Semisupervised One-Class Support Vector Machines for Classification of Remote Sensing Data; Jordi Muñoz-Marí et al.; IEEE Transactions on Geoscience and Remote Sensing; August 2010; Vol. 48, No. 8; pp. 3188-3197 *
Research and application of a weighted SVDD algorithm in human pose estimation; Han Guijin; Computer Engineering and Applications; 2017-08-01; Vol. 53, No. 15; pp. 132-136 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant