CN102867195A - Method for detecting and identifying a plurality of types of objects in remote sensing image - Google Patents


Info

Publication number
CN102867195A
Authority
CN
China
Prior art keywords
image
matrix
class
target
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012103006458A
Other languages
Chinese (zh)
Other versions
CN102867195B (en)
Inventor
韩军伟
周培诚
王东阳
郭雷
程塨
李晖晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201210300645.8A priority Critical patent/CN102867195B/en
Publication of CN102867195A publication Critical patent/CN102867195A/en
Application granted granted Critical
Publication of CN102867195B publication Critical patent/CN102867195B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting and recognizing multiple classes of targets in remote sensing images, based on sparse-representation dictionary learning. The method comprises: training a dictionary from preprocessed training data with a sparse-representation dictionary-learning method; sparse-coding each sub-image block of a test image over the trained dictionary, computing its sparse-representation coefficients and from them its reconstruction error, and thresholding the reconstruction error to determine candidate target regions; and applying post-processing to accurately detect and recognize the multiple classes of targets in the remote sensing image. The method can detect and recognize several classes of targets in remote sensing images with complex backgrounds, with high detection and recognition accuracy and a low false-alarm rate.

Description

A method for detecting and recognizing multiple classes of targets in remote sensing images
Technical field
The present invention relates to a method for detecting and recognizing multiple classes of targets in remote sensing images, applicable to multi-type target detection and recognition in remote sensing images with complex backgrounds.
Background technology
As an application of remote sensing image processing technology, object detection and recognition in remote sensing images with complex backgrounds is a key technique in fields such as military reconnaissance and precision strike. It has long been both a research hotspot and a difficulty in the field, carries important military and civilian value, and is attracting increasing attention.
Current remote sensing target detection mainly follows two approaches. The first solves the detection problem by searching the image for shape and geometric features characteristic of the target; but because remote sensing backgrounds are complex and contain many shapes and geometric structures similar to the target, relying on these features alone produces many missed and false detections. The second is based on classification. The most common variant is the Bag-of-Words (BoW) method: SIFT features are extracted from the image and clustered, the cluster centers serve as a set of standard bases (standard image regions) of the image space, the image is represented as a vector over these bases, and the resulting vector is classified and thresholded with an SVM classifier to obtain the detection result. Although the extracted SIFT features are scale- and rotation-invariant, BoW uses only the statistics of the feature regions and ignores their spatial information, so its detection rate is low and its false-alarm rate is high. Another classification method, Linear Spatial Pyramid Matching using Sparse Coding (ScSPM), does consider the spatial information of the feature regions, but the resulting classification vectors have too high a dimension and the computational cost is excessive. Moreover, most classification-based detection methods are limited to a single target type and cannot detect and recognize multiple targets simultaneously.
Summary of the invention
Technical problem to be solved
To overcome the shortcomings of the prior art, the present invention proposes a method for detecting and recognizing multiple classes of targets in remote sensing images based on sparse-representation dictionary learning. The method automatically detects and recognizes targets of different types in remote sensing images with complex backgrounds, with high detection accuracy and a low false-alarm rate.
Technical scheme
A method for detecting and recognizing multiple classes of targets in remote sensing images, characterized by the following steps:
Step 1: use the method training dictionary based on the rarefaction representation dictionary learning, concrete steps are as follows:
Step a1, training-image preparation: first align all targets of the same class in the original images to a common principal direction; then rotate each direction-aligned image from 0° to 360° in steps of φ, producing ⌊360°/φ⌋ images of different orientations. Processing the original images of all target classes in this way yields c = p·⌊360°/φ⌋ classes of training images, where p is the number of distinct target classes to be detected, φ is the rotation angle, c is the total number of classes of different-orientation images in the resulting training set, and ⌊·⌋ denotes rounding down;
Step b1, data preprocessing: convert the RGB components of each of the c classes of training images to a grayscale image by weighted averaging; down-sample each grayscale image to size n × n; energy-normalize the n × n image to obtain a normalized image; convert each normalized image to an n² × 1 column vector and use it as one column of the training data, yielding the preprocessed training data set U = [U₁, U₂, …, U_c], where U_i is the sub-data set of the i-th class in U, i = 1, 2, …, c;
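As a concrete illustration of the preprocessing chain (weighted RGB-to-gray conversion, down-sampling to n × n, energy normalization, vectorization), a minimal Python sketch follows; the function name and the grid-sampling choice for down-sampling are assumptions, since the patent does not fix an interpolation method:

```python
import numpy as np

def preprocess_image(rgb, n=15):
    """Convert an RGB training image to an energy-normalized n^2 x 1
    column vector, following the patent's preprocessing steps.
    `rgb` is an (H, W, 3) array; `n` is the downsampled side length."""
    # Weighted average of the RGB components (weights from the patent).
    gray = 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]
    # Down-sample to n x n by sampling a regular grid
    # (an assumed scheme; the patent does not specify interpolation).
    rows = np.linspace(0, gray.shape[0] - 1, n).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, n).astype(int)
    small = gray[np.ix_(rows, cols)]
    # Energy normalization: divide by the square root of the total energy.
    small = small / np.sqrt(np.sum(small ** 2))
    # Flatten to an n^2 x 1 column vector (one column of the training set U).
    return small.reshape(-1, 1)

v = preprocess_image(np.random.rand(60, 60, 3), n=15)
print(v.shape)                            # (225, 1)
print(round(float(np.sum(v ** 2)), 6))    # unit energy: 1.0
```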
Step c1, dictionary training: train on the known data set U = [U₁, U₂, …, U_c] with the FDDL software package released with the paper Fisher Discrimination Dictionary Learning for Sparse Representation, obtaining the dictionary D = [D₁, D₂, …, D_c], where D_i is the sub-dictionary corresponding to the i-th class;
Step 2, sparse coding: using the trained dictionary D = [D₁, D₂, …, D_c], sparse-code each sub-image block of the test image to obtain its sparse coefficients. The concrete steps are:
Step a2, test-image preprocessing: first convert the test image to a test grayscale image with the weighted-average method of step b1; then slide a window of size S × S over the test grayscale image with step b to obtain sub-image blocks. Down-sample each block to size n × n, energy-normalize it, and convert the normalized image to an n² × 1 column vector β, so that β represents the pixel gray values of the block extracted by the sliding window;
Step b2, sparse coding: for each sub-image block, solve the optimization model

α̂ = argmin_α ‖α‖₁  s.t.  ‖β − Dα‖₂² ≤ ε

to obtain the sparse coding coefficient α̂ = [α̂₁; α̂₂; …; α̂_c] of the block, where α̂_i is the coefficient vector corresponding to sub-dictionary D_i, ε > 0 is the allowable error, ‖·‖₁ is the l₁ norm, and ‖·‖₂ is the l₂ norm;
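The constrained l₁ model can be approximated by solving its Lagrangian (lasso) form; the sketch below uses plain ISTA as a stand-in solver, with `lam` and the iteration count as illustrative choices not taken from the patent:

```python
import numpy as np

def sparse_code_ista(beta, D, lam=0.1, n_iter=500):
    """Sparse coding of a block vector `beta` over dictionary `D`.
    The patent's constrained model min ||a||_1 s.t. ||beta - D a||_2^2 <= eps
    is solved here in its Lagrangian (lasso) form with plain ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ alpha - beta)    # gradient of the quadratic term
        z = alpha - grad / L
        # Soft-thresholding: the proximal operator of the l1 norm.
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return alpha

rng = np.random.default_rng(0)
D = rng.standard_normal((225, 300))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms, as is customary
truth = np.zeros(300); truth[[3, 40]] = [1.0, -0.5]
beta = D @ truth
alpha = sparse_code_ista(beta, D, lam=0.01)
print(np.argsort(-np.abs(alpha))[:2])      # largest coefficients on the true support
```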
Step c2, computing the reconstruction error: from the sparse coding coefficient α̂, compute the reconstruction error e_i of the sub-image block with respect to each class; take e = min_i{e_i} as the reconstruction error of the block and record the corresponding class C = argmin_i{e_i}. Then compare e with a predefined threshold τ: if e < τ, the block contains a target; otherwise the block is background;
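A minimal sketch of the per-class error computation and thresholding, assuming the FDDL-style error e_i = ‖β − D_i α̂_i‖₂² + γ‖α̂ − m_i‖₂² given later in the description; all names are illustrative:

```python
import numpy as np

def classify_block(beta, alpha, D_blocks, m_vecs, gamma=0.5, tau=0.3):
    """Per-class reconstruction error and thresholding (step c2).
    `D_blocks[i]` is sub-dictionary D_i, `alpha` the full coefficient
    vector (split per class by column count), `m_vecs[i]` the class-i
    mean coefficient vector; gamma and tau follow the patent's examples."""
    errors = []
    start = 0
    for D_i, m_i in zip(D_blocks, m_vecs):
        k = D_i.shape[1]
        a_i = alpha[start:start + k]       # coefficients on sub-dictionary D_i
        start += k
        # e_i = ||beta - D_i a_i||_2^2 + gamma * ||alpha - m_i||_2^2
        e_i = np.sum((beta - D_i @ a_i) ** 2) + gamma * np.sum((alpha - m_i) ** 2)
        errors.append(e_i)
    e = min(errors)
    cls = int(np.argmin(errors))           # recorded class C
    return (cls, e) if e < tau else (None, e)   # None means background

# Tiny two-class check: beta reconstructed exactly by class 0.
D0 = np.eye(4)[:, :2]; D1 = np.eye(4)[:, 2:]
alpha = np.array([1.0, 0.0, 0.0, 0.0])
beta = np.array([1.0, 0.0, 0.0, 0.0])
m = [alpha.copy(), alpha + 1.0]
print(classify_block(beta, alpha, [D0, D1], m))   # (0, 0.0)
```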
Step 3 object detection and recognition:
Step a3: take the reconstruction error e of every sub-image block judged in step c2 to contain a target and form a reconstruction-error matrix E = (e_st)_{P×Q}, of the same size as the test grayscale image, representing the candidate target regions, where e_st, the value of the matrix at coordinate (s, t), is e_st = 0 if e ≥ τ and e_st = e if e < τ; P × Q is the size of the test image, s = 1, 2, …, P, t = 1, 2, …, Q.
Likewise take the class C of every such block and form a class matrix L = (C_st)_{P×Q}, of the same size as the test grayscale image, representing the candidate target classes, where C_st, the value of the class matrix at coordinate (s, t), is C_st = 0 if e ≥ τ and C_st = C if e < τ;
Step b3: change the sliding-window size S × S a total of G times, repeating steps 2 through a3 each time, to obtain G reconstruction-error matrices and G class matrices; G ranges from 5 to 10. Stack the G reconstruction-error matrices into a multi-scale reconstruction-error matrix MAP = (e_stg)_{P×Q×G}, where e_stg, the element of MAP, is the e_st of the error matrix obtained at the g-th window size, P × Q × G is the size of MAP, and g = 1, 2, …, G.
Stack the G class matrices into a multi-scale class matrix CLASS = (C_stg)_{P×Q×G}, where C_stg, the element of CLASS, is the C_st of the class matrix obtained at the g-th window size. From MAP obtain the minimum reconstruction-error matrix (map(s, t))_{P×Q}, where map(s, t), its value at coordinate (s, t), is map(s, t) = 0 if e_st = 0 and min_g{e_stg} otherwise; then obtain the corresponding minimum class matrix (class(s, t))_{P×Q}, where class(s, t), its value at coordinate (s, t), is class(s, t) = 0 if e_st = 0 and C_st,g* with g* = argmin_g{e_stg} otherwise; and obtain the scale matrix scale = (scale(s, t))_{P×Q} from MAP, where scale(s, t), its value at coordinate (s, t), is scale(s, t) = 0 if e_st = 0 and argmin_g{e_stg} otherwise;
Step c3: find the local-neighborhood minima of the minimum reconstruction-error matrix (map(s, t))_{P×Q} as the responses of detected targets; the coordinate of each local minimum in (map(s, t))_{P×Q} is the center of a target, and the class and scale of that target are read off at the corresponding position of (class(s, t))_{P×Q} and (scale(s, t))_{P×Q}.
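The multi-scale aggregation and local-minimum search of steps b3 and c3 can be sketched as follows; the neighborhood size is an assumption, since the patent does not fix it:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def detect_targets(MAP, CLASS, neighborhood=5):
    """Collapse the multi-scale error stack MAP (P x Q x G) to the
    minimum-error map, then take local-neighborhood minima as target
    centers (steps b3/c3). Zeros in MAP mark background."""
    valid = MAP > 0
    masked = np.where(valid, MAP, np.inf)      # ignore background cells
    g_star = np.argmin(masked, axis=2)         # best scale per pixel
    map_min = np.min(masked, axis=2)           # minimum reconstruction error
    s_idx, t_idx = np.indices(map_min.shape)
    class_min = CLASS[s_idx, t_idx, g_star]    # class at the best scale
    # A pixel is a detection if it is finite and equals its local minimum.
    local_min = minimum_filter(map_min, size=neighborhood)
    centers = np.argwhere(np.isfinite(map_min) & (map_min == local_min))
    return [(int(s), int(t), int(class_min[s, t]), int(g_star[s, t]))
            for s, t in centers]

# One candidate at (2, 3), lower error at the second scale, class 7.
MAP = np.zeros((6, 6, 2))
MAP[2, 3, 0] = 0.2
MAP[2, 3, 1] = 0.1
CLASS = np.zeros((6, 6, 2), dtype=int)
CLASS[2, 3, 1] = 7
print(detect_targets(MAP, CLASS))   # [(2, 3, 7, 1)]
```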
The weighted-average formula is f(x, y) = 0.3R(x, y) + 0.59G(x, y) + 0.11B(x, y), where f(x, y) is the gray value of the resulting grayscale image at pixel (x, y), and R(x, y), G(x, y) and B(x, y) are the three RGB component values of the input training image at pixel (x, y).
The energy-normalization formula is

f_norm(x, y) = f(x, y) / √( Σ_{x=1}^{u} Σ_{y=1}^{v} [f(x, y)]² )

where f_norm(x, y) is the gray value after energy normalization of f(x, y), and u and v are the numbers of rows and columns of the grayscale image.
The l₁ norm is computed as

‖z‖₁ = Σ_{k=1}^{M} |ξ_k|

where z is a vector of size M × 1 and ξ_k, k = 1, 2, …, M, are its elements.
The l₂ norm is computed as

‖z‖₂ = √( Σ_{k=1}^{M} ξ_k² )

where z is a vector of size M × 1 and ξ_k, k = 1, 2, …, M, are its elements.
The reconstruction error e_i is computed as

e_i = ‖β − D_i α̂_i‖₂² + γ‖α̂ − m_i‖₂²

where γ is a predefined weight with value in the range 0 to 1, m_i is the mean vector obtained by averaging each row of Y_i, and Y_i is the optimal coding coefficient of U_i obtained by sparse coding over the dictionary D.
The rotation angle φ ranges from 0° to 90°.
The FDDL parameters λ₁ and λ₂ range from 0.001 to 0.01 and from 0.01 to 0.1, respectively.
S is an integer from 40 to 90, and b is an integer from 1 to 15.
The threshold τ ranges from 0 to 1.
Beneficial effect
The proposed method for detecting and recognizing multiple classes of targets in remote sensing images based on sparse-representation dictionary learning first trains a redundant dictionary from preprocessed training data; it then sparse-codes each sub-image block of the test image over the trained dictionary to obtain its sparse-representation coefficients, derives the block's reconstruction error from these coefficients, and thresholds the error to determine candidate target regions; finally, post-processing yields accurate detection and recognition of the multiple target classes.
The invention automatically detects and recognizes targets of several classes in remote sensing images with complex backgrounds. Experiments show that the method achieves high detection and recognition accuracy with a low false-alarm rate.
Description of drawings
Fig. 1: flowchart of the method
Fig. 2: training data used by the method
Fig. 3: partial detection results of the method
(a) aircraft detection result (red boxes mark aircraft targets; the yellow box is a false alarm)
(b) ship detection result (white boxes mark ship targets)
(c) oil-depot detection result (blue boxes mark oil-depot targets)
(d) aircraft and ship detection result
(e) aircraft and oil-depot detection result
(f) ship and oil-depot detection result
Embodiment
The invention is now further described with reference to the embodiments and drawings:
The hardware environment for the implementation is a computer with an Intel Pentium 2.93 GHz CPU and 2.0 GB of memory; the software environment is Matlab R2011a on Windows XP. One hundred remote sensing images obtained from Google Earth were selected for the multi-class target detection experiment, containing three target types (aircraft, ships, and oil depots): 200 aircraft targets, 120 ship targets, and 420 oil-depot targets in total.
Implementation of the present invention is as follows:
1. Training the redundant dictionary: the dictionary is trained with the sparse-representation dictionary-learning method, as follows:
(1.1) Training-image preparation: first align all targets of the same class in the original images to a common principal direction; then rotate each direction-aligned image from 0° to 360°, once every 10°, giving 36 classes of training data. Processing the original images of all target classes in this way finally yields 55 classes of training images, i.e. c = 55: 36 classes of aircraft, 18 classes of ships, and 1 class of oil depots;
(1.2) Data preprocessing: convert the RGB components of the 55 classes of training images to grayscale by weighted averaging; down-sample each grayscale image to 15 × 15; energy-normalize the 15 × 15 image to obtain a normalized image; convert the normalized image to a 225 × 1 column vector and use it as one column of the training data, giving the preprocessed training data set U = [U₁, U₂, …, U_c], where U_i is the sub-data set of the i-th class in U, i = 1, 2, …, c;
(1.3) Train on the known data set U = [U₁, U₂, …, U_c] with the FDDL software package released by Lei Zhang, obtaining the dictionary D = [D₁, D₂, …, D_c], where D_i is the sub-dictionary of the i-th class; the package parameters are λ₁ = 0.005 and λ₂ = 0.05.
The FDDL software package is described in: Meng Yang, Lei Zhang, Xiangchu Feng, David Zhang. Fisher Discrimination Dictionary Learning for Sparse Representation. ICCV, 2011.
2. Sparse coding: using the trained dictionary D = [D₁, D₂, …, D_c], sparse-code each sub-image block of the test image to obtain its sparse coefficients, as follows:
(2.1) Test-image preprocessing: first convert the test image to a test grayscale image with the weighted-average method of (1.2); then slide a window of size S × S, with initial value S = 90, over the test grayscale image with a step of 5 pixels to obtain sub-image blocks. Down-sample each block to 15 × 15, energy-normalize it, and convert the normalized image to a 225 × 1 column vector β, so that β represents the pixel gray values of the block extracted by the sliding window;
(2.2) Sparse coding: for each sub-image block, solve the optimization model

α̂ = argmin_α ‖α‖₁  s.t.  ‖β − Dα‖₂² ≤ ε

to obtain the sparse coding coefficient vector α̂ = [α̂₁; α̂₂; …; α̂_c] of the block, where α̂_i is the coefficient vector corresponding to sub-dictionary D_i, the allowable error is ε = 0.15, ‖·‖₁ is the l₁ norm, and ‖·‖₂ is the l₂ norm;
(2.3) Computing the reconstruction error: from the sparse coding coefficient α̂, compute the reconstruction error e_i of the sub-image block with respect to each class, with weight γ = 0.5; take e = min_i{e_i} as the reconstruction error of the block and record the corresponding class C = argmin_i{e_i}. Then compare e with the predefined threshold τ = 0.3: if e < τ, the block contains a target; otherwise the block is background;
3, object detection and recognition:
(3.1) Take the reconstruction error e of every sub-image block judged in (2.3) to contain a target and form a reconstruction-error matrix E = (e_st)_{P×Q}, of the same size as the test grayscale image, representing the candidate target regions, where e_st, the value of the matrix at coordinate (s, t), is e_st = 0 if e ≥ τ and e_st = e if e < τ; P × Q is the size of the test image, s = 1, 2, …, P, t = 1, 2, …, Q. Likewise take the class of every such block and form a class matrix L = (C_st)_{P×Q}, of the same size as the test grayscale image, representing the candidate target classes, where C_st, the value of the class matrix at coordinate (s, t), is C_st = 0 if e ≥ τ and C_st = C if e < τ;
(3.2) Change the sliding-window size S × S according to S = 90 − 10j, j = 1, 2, …, G, where j counts the changes, and repeat step 2 and step (3.1) G times in total, obtaining G reconstruction-error matrices and G class matrices. Stack the G reconstruction-error matrices into a multi-scale reconstruction-error matrix MAP = (e_stg)_{P×Q×G}, where e_stg, the element of MAP, is the e_st of the error matrix obtained at the g-th window size, P × Q × G is the size of MAP, and g = 1, 2, …, G. Stack the G class matrices into a multi-scale class matrix CLASS = (C_stg)_{P×Q×G}, where C_stg, the element of CLASS, is the C_st of the class matrix obtained at the g-th window size. From MAP obtain the minimum reconstruction-error matrix (map(s, t))_{P×Q}, where map(s, t), its value at coordinate (s, t), is map(s, t) = 0 if e_st = 0 and min_g{e_stg} otherwise; then obtain the corresponding minimum class matrix (class(s, t))_{P×Q}, where class(s, t), its value at coordinate (s, t), is class(s, t) = 0 if e_st = 0 and C_st,g* with g* = argmin_g{e_stg} otherwise; and obtain the scale matrix scale = (scale(s, t))_{P×Q} from MAP, where scale(s, t), its value at coordinate (s, t), is scale(s, t) = 0 if e_st = 0 and argmin_g{e_stg} otherwise;
(3.3) Find the local-neighborhood minima of the minimum reconstruction-error matrix (map(s, t))_{P×Q} as the responses of detected targets; the coordinate of each local minimum in (map(s, t))_{P×Q} is the center of a target, and the class and scale of that target are read off at the corresponding position of (class(s, t))_{P×Q} and (scale(s, t))_{P×Q}.
The weighted-average formula is

f(x, y) = 0.3R(x, y) + 0.59G(x, y) + 0.11B(x, y)

where f(x, y) is the gray value of the resulting grayscale image at pixel (x, y), and R(x, y), G(x, y) and B(x, y) are the three RGB component values of the input training image at pixel (x, y).
The energy-normalization formula is

f_norm(x, y) = f(x, y) / √( Σ_{x=1}^{u} Σ_{y=1}^{v} [f(x, y)]² )

where f_norm(x, y) is the gray value after energy normalization of f(x, y), and u and v are the numbers of rows and columns of the grayscale image, u = 15, v = 15.
The l₁ norm is computed as

‖z‖₁ = Σ_{k=1}^{M} |ξ_k|

where z is a vector of size M × 1 and ξ_k, k = 1, 2, …, M, are its elements.
The l₂ norm is computed as

‖z‖₂ = √( Σ_{k=1}^{M} ξ_k² )
The reconstruction error e_i is computed as

e_i = ‖β − D_i α̂_i‖₂² + γ‖α̂ − m_i‖₂²

where γ is the predefined weight, γ = 0.5, m_i is the mean vector obtained by averaging each row of Y_i, and Y_i is the optimal coding coefficient of U_i obtained by sparse coding over the dictionary D.
The correct detection rate and the false-alarm rate are used to assess the effectiveness of the invention. The correct detection rate is defined as the ratio of correctly detected targets to the total number of targets; the false-alarm rate is defined as the ratio of false alarms to the sum of correctly detected targets and false alarms. The detection results of the invention are also compared with a BoW-based multi-class target detection algorithm; the comparison is shown in Table 1. Both the correct detection rate and the false-alarm rate demonstrate the effectiveness of the method.
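The two metrics as defined in the text can be computed directly; the helper below is illustrative, and the example counts are hypothetical, not taken from Table 1:

```python
def evaluate(n_targets_total, n_correct, n_false_alarms):
    """Correct detection rate and false-alarm rate as defined in the text."""
    detection_rate = n_correct / n_targets_total
    false_alarm_rate = n_false_alarms / (n_correct + n_false_alarms)
    return detection_rate, false_alarm_rate

# e.g. if 190 of 200 aircraft were found with 10 false alarms:
print(evaluate(200, 190, 10))   # (0.95, 0.05)
```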
Table 1. Evaluation results

Claims (10)

1. A method for detecting and recognizing multiple classes of targets in remote sensing images, comprising the following steps:
Step 1: use the method training dictionary based on the rarefaction representation dictionary learning, concrete steps are as follows:
Step a1, training-image preparation: first align all targets of the same class in the original images to a common principal direction; then rotate each direction-aligned image from 0° to 360° in steps of φ, producing ⌊360°/φ⌋ images of different orientations. Processing the original images of all target classes in this way yields c = p·⌊360°/φ⌋ classes of training images, where p is the number of distinct target classes to be detected, φ is the rotation angle, c is the total number of classes of different-orientation images in the resulting training set, and ⌊·⌋ denotes rounding down;
Step b1, data preprocessing: convert the RGB components of each of the c classes of training images to a grayscale image by weighted averaging; down-sample each grayscale image to size n × n; energy-normalize the n × n image to obtain a normalized image; convert each normalized image to an n² × 1 column vector and use it as one column of the training data, yielding the preprocessed training data set U = [U₁, U₂, …, U_c], where U_i is the sub-data set of the i-th class in U, i = 1, 2, …, c;
Step c1, dictionary training: train on the known data set U = [U₁, U₂, …, U_c] with the FDDL software package released with the paper Fisher Discrimination Dictionary Learning for Sparse Representation, obtaining the dictionary D = [D₁, D₂, …, D_c], where D_i is the sub-dictionary corresponding to the i-th class;
Step 2, sparse coding: using the trained dictionary D = [D₁, D₂, …, D_c], sparse-code each sub-image block of the test image to obtain its sparse coefficients. The concrete steps are:
Step a2, test-image preprocessing: first convert the test image to a test grayscale image with the weighted-average method of step b1; then slide a window of size S × S over the test grayscale image with step b to obtain sub-image blocks. Down-sample each block to size n × n, energy-normalize it, and convert the normalized image to an n² × 1 column vector β, so that β represents the pixel gray values of the block extracted by the sliding window;
Step b2, sparse coding: for each sub-image block, solve the optimization model

α̂ = argmin_α ‖α‖₁  s.t.  ‖β − Dα‖₂² ≤ ε

to obtain the sparse coding coefficient α̂ = [α̂₁; α̂₂; …; α̂_c] of the block, where α̂_i is the coefficient vector corresponding to sub-dictionary D_i, ε > 0 is the allowable error, ‖·‖₁ is the l₁ norm, and ‖·‖₂ is the l₂ norm;
Step c2, computing the reconstruction error: from the sparse coding coefficient α̂, compute the reconstruction error e_i of the sub-image block with respect to each class; take e = min_i{e_i} as the reconstruction error of the block and record the corresponding class C = argmin_i{e_i}. Then compare e with a predefined threshold τ: if e < τ, the block contains a target; otherwise the block is background;
Step 3 object detection and recognition:
Step a3: take the reconstruction error e of every sub-image block judged in step c2 to contain a target and form a reconstruction-error matrix E = (e_st)_{P×Q}, of the same size as the test grayscale image, representing the candidate target regions, where e_st, the value of the matrix at coordinate (s, t), is e_st = 0 if e ≥ τ and e_st = e if e < τ; P × Q is the size of the test image, s = 1, 2, …, P, t = 1, 2, …, Q.
Likewise take the class C of every such block and form a class matrix L = (C_st)_{P×Q}, of the same size as the test grayscale image, representing the candidate target classes, where C_st, the value of the class matrix at coordinate (s, t), is C_st = 0 if e ≥ τ and C_st = C if e < τ;
Step b3: Change the sliding-window size S×S a total of G times, repeating step a2 through step a3 each time, to obtain G reconstruction-error matrices and G class matrices; the value of G ranges from 5 to 10. The G reconstruction-error matrices form a multi-scale reconstruction-error matrix MAP = (e_stg)_{P×Q×G}, where e_stg, an element of MAP, takes the value e_st of the reconstruction-error matrix obtained with the g-th sliding-window size; P×Q×G is the size of the multi-scale reconstruction-error matrix, g = 1, 2, …, G;
The G class matrices form a multi-scale class matrix CLASS = (C_stg)_{P×Q×G}, where C_stg, an element of CLASS, takes the value C_st of the class matrix obtained with the g-th sliding-window size. From the multi-scale reconstruction-error matrix MAP, a minimal reconstruction-error matrix (map(s, t))_{P×Q} is obtained, where map(s, t), the value of the minimal reconstruction-error matrix at coordinate (s, t), is map(s, t) = 0 if e_st = 0, and map(s, t) = min_g{e_stg} if e_st ≠ 0;
Then the minimal-class matrix (class(s, t))_{P×Q} corresponding to the minimal reconstruction-error matrix is obtained, where class(s, t), the value of the minimal-class matrix at coordinate (s, t), is class(s, t) = 0 if e_st = 0, and class(s, t) = C_stg* with g* = arg min_g{e_stg} if e_st ≠ 0;
From the multi-scale reconstruction-error matrix MAP, a scale matrix scale = (scale(s, t))_{P×Q} is obtained, where scale(s, t) is the value of the scale matrix at coordinate (s, t): scale(s, t) = 0 if e_st = 0, and scale(s, t) = arg min_g{e_stg} if e_st ≠ 0;
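The multi-scale fusion of step b3 (minimum error over scales, with the class and the scale index read off at the arg-min) can be sketched as follows. This is a hedged illustration: the function and variable names are hypothetical, and treating a position as empty only when it is zero at every scale is an assumption based on the zero cases stated in the claim.

```python
import numpy as np

def fuse_scales(MAP, CLASS):
    """Fuse G per-scale error/class maps (shape P x Q x G) into the
    minimal-error map, its class map, and the best-scale index map."""
    g_star = np.argmin(MAP, axis=2)           # arg min_g e_stg at each (s, t)
    p, q = np.indices(g_star.shape)
    map_min = MAP[p, q, g_star]               # min_g e_stg
    class_min = CLASS[p, q, g_star]           # class at the minimizing scale
    # Positions that are zero at every scale carry no candidate target
    empty = (MAP == 0).all(axis=2)
    class_min = np.where(empty, 0, class_min)
    scale = np.where(empty, 0, g_star + 1)    # 1-based scale index, 0 if empty
    return map_min, class_min, scale

# Toy example: P = 1, Q = 2, G = 2 scales
MAP = np.array([[[0.5, 0.2], [0.0, 0.0]]])
CLASS = np.array([[[1, 2], [0, 0]]])
map_min, class_min, scale = fuse_scales(MAP, CLASS)
```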
Step c3: Take the local-neighborhood minima of the minimal reconstruction-error matrix (map(s, t))_{P×Q} as the response values of detected targets; the coordinate of each local-neighborhood minimum in (map(s, t))_{P×Q} is the center of a target. From the center position, the class and scale of the target are found at the corresponding positions in (class(s, t))_{P×Q} and (scale(s, t))_{P×Q}.
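The local-neighborhood minima search of step c3 can be sketched with a plain-NumPy 3×3 neighborhood minimum; the 3×3 window size is an assumption, since the claim does not fix the neighborhood size, and positions with no candidate (value 0) are masked out with infinity.

```python
import numpy as np

def local_minima(map_min):
    """Find local-neighborhood minima of the minimal reconstruction-error
    map; each such coordinate is taken as a target center (step c3)."""
    m = np.where(map_min > 0, map_min, np.inf)   # ignore empty positions
    P, Q = m.shape
    padded = np.pad(m, 1, constant_values=np.inf)
    # Stack the 3x3 neighborhood of every pixel and take its minimum
    neigh = np.min([padded[i:i + P, j:j + Q]
                    for i in range(3) for j in range(3)], axis=0)
    return np.argwhere(np.isfinite(m) & (m == neigh))

# Toy map: two candidates, only one is a neighborhood minimum
err_map = np.zeros((5, 5))
err_map[2, 2] = 0.1
err_map[2, 3] = 0.3
centers = local_minima(err_map)
```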
2. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the weighted-mean formula is f(x, y) = 0.3R(x, y) + 0.59G(x, y) + 0.11B(x, y), where f(x, y) is the gray value at pixel (x, y) of the grayscale image obtained by the weighted-mean method, and R(x, y), G(x, y) and B(x, y) are the three RGB component values of the input training image at pixel (x, y).
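The conversion in claim 2 is a luminance-style weighted mean over the RGB channels; a minimal sketch (array layout H×W×3 is an assumption):

```python
import numpy as np

def rgb_to_gray(img):
    """Weighted-mean grayscale conversion from claim 2:
    f(x, y) = 0.3*R(x, y) + 0.59*G(x, y) + 0.11*B(x, y)."""
    return 0.3 * img[..., 0] + 0.59 * img[..., 1] + 0.11 * img[..., 2]

# A pure-red pixel maps to 0.3; a white pixel maps to 0.3 + 0.59 + 0.11 = 1.0
img = np.zeros((1, 2, 3))
img[0, 0] = [1.0, 0.0, 0.0]
img[0, 1] = [1.0, 1.0, 1.0]
gray = rgb_to_gray(img)
```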
3. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the energy-normalization formula is
f_norm(x, y) = f(x, y) / sqrt( Σ_{x=1..u} Σ_{y=1..v} f(x, y)² ),
where f_norm(x, y) is the gray value after f(x, y) undergoes energy normalization, and u and v are the numbers of rows and columns of the grayscale image, respectively.
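Assuming the formula divides the image by the square root of its total energy, so that the normalized image has unit l2 energy (one common reading; the original formula is rendered as an image in the source), a sketch:

```python
import numpy as np

def energy_normalize(f):
    """Divide the grayscale image by the square root of its total energy,
    so that the sum of squares of the result equals 1."""
    return f / np.sqrt(np.sum(f ** 2))

# 1x2 image [3, 4] has energy 25, so it normalizes to [0.6, 0.8]
f = np.array([[3.0, 4.0]])
f_norm = energy_normalize(f)
```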
4. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the l1-norm formula is
||z||_1 = Σ_{k=1..M} |ξ_k|,
where z is a vector of size M×1, ξ_k is an element of z, and k = 1, 2, …, M.
5. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the l2-norm formula is
||z||_2 = sqrt( Σ_{k=1..M} ξ_k² ),
where z is a vector of size M×1, ξ_k is an element of z, and k = 1, 2, …, M.
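The l1 and l2 norms of claims 4 and 5, written out directly (the vector values are a toy example):

```python
import numpy as np

z = np.array([3.0, -4.0])
l1 = np.sum(np.abs(z))          # ||z||_1 = sum_k |xi_k|
l2 = np.sqrt(np.sum(z ** 2))    # ||z||_2 = sqrt(sum_k xi_k^2)
```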
6. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the formula for the reconstruction error e_i is
Figure FDA00002042658400042
where γ is a predefined weight whose value ranges from 0 to 1, m_i is the mean vector obtained by averaging each row of Y_i, and Y_i is the optimal coding coefficient obtained by sparse coding U_i over the dictionary D.
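The exact reconstruction-error formula of claim 6 is rendered as an image in the source and is not recoverable here. The sketch below therefore uses the standard FDDL-style class residual (class-wise reconstruction error plus a γ-weighted distance of the code to the class mean), which is an assumption consistent with the quantities the claim names (γ, m_i, Y_i, D), not the patent's definitive formula; all names are illustrative.

```python
import numpy as np

def class_residual(x, D_i, a_i, a, m_i, gamma):
    """FDDL-style class residual: reconstruction error of x by the class-i
    sub-dictionary D_i with its sub-code a_i, plus a gamma-weighted squared
    distance of the full code a to the class mean code m_i."""
    return (np.linalg.norm(x - D_i @ a_i) ** 2
            + gamma * np.linalg.norm(a - m_i) ** 2)

# Toy example: perfect reconstruction, unit distance to the class mean
x = np.array([1.0, 0.0])
D_i = np.eye(2)
a_i = np.array([1.0, 0.0])
a = np.array([1.0, 0.0])
m_i = np.array([0.0, 0.0])
r = class_residual(x, D_i, a_i, a, m_i, gamma=0.5)
```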
7. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the rotation angle
Figure FDA00002042658400043
ranges from 0° to 90°.
8. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the FDDL software package parameter λ1 ranges from 0.001 to 0.01, and λ2 ranges from 0.01 to 0.1.
9. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein S is an integer between 40 and 90, and b is an integer between 1 and 15.
10. The multi-class target detection and identification method for remote sensing images according to claim 1, wherein the threshold τ ranges from 0 to 1.
CN201210300645.8A 2012-08-22 2012-08-22 Method for detecting and identifying a plurality of types of objects in remote sensing image Active CN102867195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210300645.8A CN102867195B (en) 2012-08-22 2012-08-22 Method for detecting and identifying a plurality of types of objects in remote sensing image


Publications (2)

Publication Number Publication Date
CN102867195A true CN102867195A (en) 2013-01-09
CN102867195B CN102867195B (en) 2014-11-26

Family

ID=47446059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210300645.8A Active CN102867195B (en) 2012-08-22 2012-08-22 Method for detecting and identifying a plurality of types of objects in remote sensing image

Country Status (1)

Country Link
CN (1) CN102867195B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100124377A1 (en) * 2008-11-19 2010-05-20 Nec Laboratories America, Inc. Linear spatial pyramid matching using sparse coding
CN102129573A (en) * 2011-03-10 2011-07-20 西安电子科技大学 SAR (Synthetic Aperture Radar) image segmentation method based on dictionary learning and sparse representation
CN102324047A (en) * 2011-09-05 2012-01-18 西安电子科技大学 High spectrum image atural object recognition methods based on sparse nuclear coding SKR


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG Tianyi et al.: "An image semantic classifier model based on sparse coding", Journal of East China University of Science and Technology (Natural Science Edition), vol. 33, no. 6, 31 December 2007 (2007-12-31), pages 827 - 892 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258210A (en) * 2013-05-27 2013-08-21 中山大学 High-definition image classification method based on dictionary learning
CN103258210B (en) * 2013-05-27 2016-09-14 中山大学 A kind of high-definition image classification method based on dictionary learning
CN103632164B (en) * 2013-11-25 2017-03-01 西北工业大学 The volume firm state classification recognition methodss of the KNN coil image data based on KAP sample optimization
CN103632164A (en) * 2013-11-25 2014-03-12 西北工业大学 Reeling steel state identifying method based on KNN reeling steel picture data of optimized KAP sample
CN104517121A (en) * 2014-12-10 2015-04-15 中国科学院遥感与数字地球研究所 Spatial big data dictionary learning method based on particle swarm optimization
CN105740422A (en) * 2016-01-29 2016-07-06 北京大学 Pedestrian retrieval method and apparatus
CN106067041A (en) * 2016-06-03 2016-11-02 河海大学 A kind of multi-target detection method of based on rarefaction representation of improvement
CN106067041B (en) * 2016-06-03 2019-05-31 河海大学 A kind of improved multi-target detection method based on rarefaction representation
CN107451595A (en) * 2017-08-04 2017-12-08 河海大学 Infrared image salient region detection method based on hybrid algorithm
CN109190457A (en) * 2018-07-19 2019-01-11 北京市遥感信息研究所 A kind of oil depot complex target rapid detection method based on large format remote sensing images
CN109190457B (en) * 2018-07-19 2021-12-03 北京市遥感信息研究所 Oil depot cluster target rapid detection method based on large-format remote sensing image
CN109946076A (en) * 2019-01-25 2019-06-28 西安交通大学 A kind of planet wheel bearing fault identification method of weighted multiscale dictionary learning frame
CN109946076B (en) * 2019-01-25 2020-04-28 西安交通大学 Planetary wheel bearing fault identification method of weighted multi-scale dictionary learning framework
CN110189328A (en) * 2019-06-11 2019-08-30 北华航天工业学院 A kind of Remote sensing image processing system and its processing method
CN110189328B (en) * 2019-06-11 2021-02-23 北华航天工业学院 Satellite remote sensing image processing system and processing method thereof

Also Published As

Publication number Publication date
CN102867195B (en) 2014-11-26

Similar Documents

Publication Publication Date Title
CN102867195B (en) Method for detecting and identifying a plurality of types of objects in remote sensing image
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN107346436B (en) Visual saliency detection method fusing image classification
CN105608456B (en) A kind of multi-direction Method for text detection based on full convolutional network
Sirmacek et al. Urban area detection using local feature points and spatial voting
Yuan et al. Large-scale solar panel mapping from aerial images using deep convolutional networks
CN103390164B (en) Method for checking object based on depth image and its realize device
CN102542295B (en) Method for detecting landslip from remotely sensed image by adopting image classification technology
CN102324047B (en) Hyper-spectral image ground object recognition method based on sparse kernel representation (SKR)
CN102819740B (en) A kind of Single Infrared Image Frame Dim targets detection and localization method
CN110555475A (en) few-sample target detection method based on semantic information fusion
CN106023220A (en) Vehicle exterior part image segmentation method based on deep learning
CN101930549B (en) Second generation curvelet transform-based static human detection method
CN104008370A (en) Video face identifying method
CN103942540A (en) False fingerprint detection algorithm based on curvelet texture analysis and SVM-KNN classification
CN104182985A (en) Remote sensing image change detection method
CN106845341A (en) A kind of unlicensed vehicle identification method based on virtual number plate
CN102663724B (en) Method for detecting remote sensing image change based on adaptive difference images
CN103745233B (en) The hyperspectral image classification method migrated based on spatial information
CN103646256A (en) Image characteristic sparse reconstruction based image classification method
CN104809471A (en) Hyperspectral image residual error fusion classification method based on space spectrum information
CN111275070B (en) Signature verification method and device based on local feature matching
CN106446854A (en) High-resolution optical remote sensing image target detection method based on rotation invariant HOG feature
CN108073940A (en) A kind of method of 3D object instance object detections in unstructured moving grids
CN103455826A (en) Efficient matching kernel body detection method based on rapid robustness characteristics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant