CN110443207B

CN110443207B - Target progressive detection and identification method based on hierarchical feature tensor

Info

Publication number: CN110443207B
Application number: CN201910727379.9A
Authority: CN
Inventors: 陈浩; 高通; 陈稳
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2019-08-07
Filing date: 2019-08-07
Publication date: 2022-10-11
Anticipated expiration: 2039-08-07
Also published as: CN110443207A

Abstract

A progressive target detection and identification method based on hierarchical feature tensors, belonging to the technical field of target detection and identification in remote sensing images. The method addresses the low accuracy of existing methods for detecting and identifying targets in remote sensing images. It establishes a tensor-mode target identification model, which overcomes two weaknesses of typical vector-mode identification methods: they ignore the internal structural information of remote sensing image targets, and they overfit easily when only a small number of samples is available. Accurate identification is achieved by learning the most discriminative hierarchical feature tensors and classifying them with a soft-hard interval support tensor machine. The method is suited to detecting and identifying targets in large-scene remote sensing images, and achieves a detection accuracy above 98% on remote sensing images with a large amount of information and complex backgrounds.

Description

Target progressive detection and identification method based on hierarchical feature tensor

Technical Field

The invention belongs to the technical field of target detection and identification in remote sensing images, and particularly relates to a target progressive detection and identification method.

Background

As remote sensing technology matures, the quality of acquired remote sensing images has improved greatly, and target identification based on remote sensing images has become an active research topic. Detection and identification of the various kinds of ground targets has significant research value and a wide range of applications.

A great deal of research on remote sensing image target detection and identification has been carried out at home and abroad, with some success. Target detection methods for optical image slices can be classified in several ways: by target type, into static target detection and moving target detection; by implementation strategy, into bottom-up data-driven detection and top-down knowledge-driven (hypothesis-driven) detection; and by feature type, into detection based on local features and detection based on global features. In addition, from the viewpoint of human visual recognition, exploiting visual saliency is an important means of target detection.

However, a conventional spectral feature describes only a single pixel and ignores the spatial features that can be extracted from a local image region. Moreover, a vector is only a one-dimensional array, so arranging the spatial-spectral information of an image into a vector inevitably loses the structural information of the pixel features. As a result, the accuracy of target detection and identification in remote sensing images is low.

Disclosure of Invention

The invention aims to solve the problem that the accuracy of detection and identification of a target in a remote sensing image is low in the existing method, and provides a target progressive detection and identification method based on hierarchical feature tensor.

The technical scheme adopted by the invention for solving the technical problems is as follows: a target progressive detection and identification method based on hierarchical feature tensor comprises the following steps:

step one, extracting real target slices and false target slices to construct a training sample set, and constructing a multi-resolution feature tensor for each slice in the training sample set;

step two, constructing a feature discrimination criterion, learning a projection matrix from the multi-resolution feature tensors constructed in step one, and extracting the most discriminative feature sub-tensors at the different resolutions according to the projection matrix;

training a soft-hard interval support tensor machine with the extracted feature sub-tensors until all false targets are correctly classified or the number of layers of the multi-resolution layers reaches a threshold Q, thereby obtaining a trained target identification model;

step three, performing rapid target detection on the remote sensing image to be detected and identified to obtain candidate target positions, and then cropping an image block at each candidate target position to obtain a candidate target slice;

step four, performing direction estimation on each candidate target slice, and rotating each candidate target slice according to its direction estimation result to obtain all rotated candidate target slices;

step five, extracting the feature tensors at the different resolutions of each slice obtained in step four, and extracting from them the most discriminative feature sub-tensors at the different resolutions by using the projection matrix obtained in step two;

and inputting the extracted feature sub-tensors into the target identification model trained in step two to obtain the target identification result, output by the model, for the remote sensing image to be detected and identified.

The beneficial effects of the invention are as follows. The invention provides a progressive target detection and identification method based on hierarchical feature tensors. It establishes a tensor-mode target identification model, which solves the problem that typical vector-mode identification methods ignore the internal structural information of remote sensing image targets and overfit easily with small samples. Accurate identification of the target is achieved by learning the most discriminative hierarchical feature tensors and using a soft-hard interval support tensor machine.

The method is suited to detecting and identifying targets in large-scene remote sensing images. It achieves a detection accuracy above 98% on remote sensing images with a large amount of information and complex backgrounds, which is of great significance in both military and civil fields.

Drawings

FIG. 1 is a flow chart of the progressive target detection and identification method based on hierarchical feature tensors of the present invention;

FIG. 2 is a schematic diagram of the classification hyperplanes of a standard STM and of the soft-hard interval support tensor machine of the present invention;

FIG. 3 is a schematic illustration of the slice training samples used in the experiments.

Detailed Description

Embodiment one: as shown in FIG. 1, the progressive target detection and identification method based on hierarchical feature tensors according to this embodiment comprises the following steps:

step one, extracting real target slices and false target slices to construct a training sample set, and constructing a multi-resolution feature tensor for each slice in the training sample set;

the multi-resolution feature tensor is obtained as follows: a Q-layer image pyramid is constructed, and a feature tensor is constructed from the image of each pyramid layer; the resolution increases gradually from the first layer to the Q-th layer of the pyramid, and the layers together form the multi-resolution layers;

step three, performing rapid target detection on the remote sensing image to be detected and identified to obtain candidate target positions, and then cropping an image block at each candidate target position to obtain a candidate target slice;

Direction estimation means estimating the direction of the target that a candidate target slice may contain. The target slices of the training sample set are divided into M classes according to the direction of the target they contain, and a K-nearest-neighbor classifier is constructed; for any candidate target slice, a feature tensor is constructed and input to the K-nearest-neighbor classifier to complete the direction estimation of the candidate target. If the target direction in a target slice is X degrees, the slice belongs to class ceil(X × M/360).
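The class rule above can be sketched in a few lines of Python. This is only an illustration: the function name `direction_class` is hypothetical, and mapping a direction of exactly 0 degrees to class M (rather than class 0) is an assumption, since the text does not say how that boundary is handled.

```python
import math


def direction_class(angle_deg: float, num_classes: int) -> int:
    """Map a target direction X (degrees) to a class index via ceil(X * M / 360)."""
    # Assumption: wrap angles into (0, 360] so that classes run from 1 to M.
    wrapped = angle_deg % 360.0
    if wrapped == 0.0:
        wrapped = 360.0
    return math.ceil(wrapped * num_classes / 360.0)


# With M = 10 classes, each class covers a 36-degree sector.
classes = [direction_class(x, 10) for x in (36, 37, 180, 359.9, 0)]
```

With M = 10, directions in (0, 36] fall into class 1, directions in (36, 72] into class 2, and so on.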

Objects in digital images are essentially tensors. Describing the extracted image features with tensors preserves the spatial features of the image and can improve target identification accuracy to some extent.
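As an illustration of the Q-layer image pyramid described in step one, the following sketch builds the layers by repeated 2×2 averaging. The downsampling factor of 2, the averaging filter, and the names `downsample2x` and `build_pyramid` are assumptions made for the example; the text only requires that the resolution increase from the first layer to the Q-th layer.

```python
def downsample2x(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(img, q):
    """Return q layers ordered from lowest resolution (layer 1) to highest (layer q)."""
    layers = [img]
    for _ in range(q - 1):
        layers.append(downsample2x(layers[-1]))
    layers.reverse()  # layer 1 is the coarsest, layer q is the original image
    return layers


image = [[r * 8 + c for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, 3)
sizes = [(len(layer), len(layer[0])) for layer in pyramid]
```

A feature tensor (for example a Gabor response stack) would then be computed separately on each layer; that step is not shown here.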

Conventional vector-mode feature selection selects a subset of features from a set of candidate features, so conventional vector-mode feature selection algorithms are not applicable to tensor-mode data. The invention designs a progressive target identification technique based on a hierarchical feature tensor model: using hierarchical feature tensors from low to high spatial resolution, false alarms that differ obviously from the target at the current resolution are removed layer by layer, and the target identification result that satisfies the specific features of the specified model is finally output.

The method fully analyzes the specific characteristics of typical targets in large-scene remote sensing images and constructs the feature tensors accordingly, which gives good detection and identification results and good application prospects. Experiments on typical aircraft targets identified aircraft in a Google Earth data source. The results show that the method reliably and effectively identifies aircraft targets in high-resolution remote sensing images and remains robust in complex scenes.

Embodiment two: this embodiment differs from embodiment one in that the specific process of step two is as follows:

Among the feature tensors of different resolutions extracted in step one, a local sub-tensor often contributes most to target identification. The sub-tensors are therefore extracted with a tensor-mode feature selection algorithm based on tensor manifold discriminative alignment, in order to achieve effective target identification.

Manifold learning theory holds that a high-dimensional feature contains a low-dimensional manifold structure; that is, the neighborhood relations of the original samples can still be maintained in the low-dimensional space. However, some conventional manifold learning methods ignore the difficult-sample problem and do not analyze in depth how heterogeneous neighbors should be separated.

A difficult sample is a sample that is easily confused with samples of other classes. Because remote sensing image backgrounds are complex, a target is easily confused with the background (a false target); for example, an aircraft target and a cross-shaped road intersection are highly similar in their spatial features. This is especially true under low-resolution feature descriptions, because the low-resolution feature tensor describes only the coarse characteristics of the target and has a lower feature dimensionality, so false targets that are hard to distinguish are more likely to arise. The task of each multi-resolution identification layer is to separate the false targets that can be distinguished at the current resolution; difficult samples should be left to the subsequent higher-resolution identification layers.

Multi-resolution feature tensors are extracted from the real target slices and the false target slices to construct the training set. The false target slices $\chi_{false\_target}$ are divided into two kinds: false targets that are difficult to separate from real targets (difficult samples), $\chi_{hard\_samples}$, and false targets that are easy to remove (simple samples), $\chi_{simple\_samples}$. The goal of each multi-resolution identification layer is to train, on the feature tensors of the current resolution, a soft-hard interval support tensor machine that removes as many of the $\chi_{simple\_samples}$ as possible while keeping all real target samples correctly identified. The training set of the next layer is then constructed from the real target samples and the false target samples that the previous layer could not classify correctly.
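The layer-by-layer training described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the patented procedure: `train_cascade` and `train_threshold_layer` are hypothetical names, each sample is reduced to one scalar feature per layer, and the per-layer classifier is a toy threshold that stands in for the soft-hard interval support tensor machine (it keeps every real target on the positive side, mimicking the hard constraint on targets).

```python
def train_cascade(real, false, train_layer, max_layers):
    """Train one classifier per layer; only unresolved false targets move on."""
    layers = []
    remaining = list(false)
    for level in range(max_layers):
        if not remaining:
            break  # every false target has been rejected by an earlier layer
        classify = train_layer(level, real, remaining)
        layers.append(classify)
        # False targets still labelled +1 (target) go to the next layer's training set.
        remaining = [s for s in remaining if classify(s) == 1]
    return layers, remaining


def train_threshold_layer(level, real, false):
    """Toy stand-in classifier: +1 if the layer feature is at least the smallest real value."""
    threshold = min(sample[level] for sample in real)

    def classify(sample):
        return 1 if sample[level] >= threshold else -1

    return classify


real = [[5, 5, 5], [6, 7, 8]]
false = [[1, 9, 9], [9, 2, 9], [9, 9, 9]]
layers, leftover = train_cascade(real, false, train_threshold_layer, max_layers=3)
```

In this toy run the first layer removes the first false target, the second layer removes the second, and the third false target survives all three layers, which corresponds to a residual false alarm.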

A feature discrimination criterion is then constructed; samples that have difficulty satisfying the criterion are defined as difficult samples, and the remaining samples as simple samples. The feature tensor selection algorithm should make the feature sub-tensors of same-class samples preserve the manifold structure of the original feature tensors, that is, samples that are same-class neighbors in the original feature tensor space should remain neighbors in the sub-tensor space, while sub-tensors of different classes should be separated as far as possible. A difficult sample is a false target that is difficult to separate from the real targets.

Step 2.1: for the first layer of the multi-resolution layers, take a false target $\chi_{j'} \in \chi_{false\_target}$, where $\chi_{false\_target}$ denotes the set of false targets. The squared Frobenius norms of the differences between $\chi_{j'}$ and each real target $\chi_{i'} \in \chi_{target}$ are summed, and the sum represents the difference between the false target $\chi_{j'}$ and the real targets. Here $\chi_{target}$ denotes the set of real targets, $1 \le i' \le N_t$, and $N_t$ is the total number of real targets;

the maximum separation between the false target $\chi_{j'}$ and all the real targets is given by formula (1):

[formula (1) not reproduced in the source text]

wherein $U_{i'}$ is a projection matrix, and the mode products of $U_{i'}$ with $\chi_{j'}$ and with the real target tensor are both taken in the $i'$-th mode. The maximum separations of all false targets are sorted in descending order; the top 80% of the false targets are the difficult samples of the first layer, and the other false targets are the simple samples of the first layer;
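The separation score and the descending ranking can be illustrated with a short sketch. It is a simplification under stated assumptions: samples are second-order tensors stored as nested lists, the projection matrices are omitted (the score is computed on the raw tensors), and the names `squared_frobenius` and `rank_false_targets` are hypothetical. The function only returns the ranked split; which part is treated as difficult follows the text above.

```python
def squared_frobenius(a, b):
    """Squared Frobenius norm of the difference of two equally sized matrices."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def rank_false_targets(false_targets, real_targets, top_fraction):
    """Score each false target by its summed squared distance to all real targets."""
    scores = [sum(squared_frobenius(f, r) for r in real_targets) for f in false_targets]
    # Indices of false targets in descending order of separation.
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    n_top = round(top_fraction * len(order))
    return scores, order[:n_top], order[n_top:]


real_targets = [[[0, 0], [0, 0]], [[1, 0], [0, 0]]]
false_targets = [[[v, 0], [0, 0]] for v in (2, 5, 3, 1, 4)]
scores, top, rest = rank_false_targets(false_targets, real_targets, top_fraction=0.8)
```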

to ensure that a sample and the neighboring feature tensors that belong to the other group are kept as far from each other as possible, neighboring samples of the original feature tensors are selected to construct neighbor constraints;

for a given sample, the same-group neighbor set NH consists of its k nearest neighbors within its own group (same group means that the sample and the members of NH are all false targets or all real targets); the neighbor set to be separated, NM, consists of k neighbors from the other group (the members of NM are false targets when the sample is a real target, and real targets when the sample is a false target): if the sample is a false target, its k neighbors to be separated are real targets, and if the sample is a real target, its k neighbors to be separated are false targets. For example, if the sample is a real target, its same-group k neighbors are the k real targets nearest to it, and its different-group k neighbors are the k false targets nearest to it;

when one sample (which may be a false target or a real target) is a same-group k neighbor of another sample, or the other sample is a same-group k neighbor of it, the same-group weight of the pair is 1; otherwise the weight is 0;

when one sample is a different-group k neighbor of another sample, or the other sample is a different-group k neighbor of it, the different-group weight of the pair is 1; otherwise the weight is 0;
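The neighbor sets and the two 0/1 weight matrices described above can be sketched as follows. The sketch assumes that each sample has been flattened into a plain list of numbers and that distance is squared Euclidean distance; the names `k_nearest` and `neighbor_weights` are hypothetical.

```python
def k_nearest(idx, candidates, dist, k):
    """Return the k candidates closest to sample idx (idx itself excluded)."""
    others = [c for c in candidates if c != idx]
    return sorted(others, key=lambda c: dist(idx, c))[:k]


def neighbor_weights(samples, labels, k):
    """Build symmetric same-group and different-group 0/1 weight matrices."""
    n = len(samples)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(samples[a], samples[b]))

    same = [[0] * n for _ in range(n)]
    diff = [[0] * n for _ in range(n)]
    for i in range(n):
        same_group = [j for j in range(n) if labels[j] == labels[i]]
        other_group = [j for j in range(n) if labels[j] != labels[i]]
        for j in k_nearest(i, same_group, dist, k):
            same[i][j] = same[j][i] = 1  # weight is 1 if either is a neighbor of the other
        for j in k_nearest(i, other_group, dist, k):
            diff[i][j] = diff[j][i] = 1
    return same, diff


# Labels: +1 for real targets, -1 for false targets.
samples = [[0.0], [1.0], [10.0], [2.0], [11.0]]
labels = [1, 1, 1, -1, -1]
same, diff = neighbor_weights(samples, labels, k=1)
```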

according to the neighbor relations, the sub-tensor of each sample should minimize formula (2):

[formula (2) not reproduced in the source text]

wherein k is a regularization parameter whose purpose is to balance the importance of keeping same-group neighbors close against separating samples of different groups;

the meaning of formula (2) is that samples that are same-group nearest neighbors in the original tensor space should be as close as possible in the sub-tensor space, while the $\chi_{simple\_samples}$ to be separated should be moved away from the sample.

Formula (2) is rewritten as formula (3):

[formula (3) not reproduced in the source text]

wherein the intermediate variables include an identity matrix; g and h denote the g-th row and the h-th column of the matrix, respectively; the two sub-tensor terms denote the sub-tensor of the g-th sample and the sub-tensor of the h-th sample (each of which may be a false target or a real target) in the sub-tensor set formed by a sample and its 2k neighbors; and diag(·) denotes a diagonal matrix whose diagonal is the input argument;

the optimization objective of formula (3) is the objective function of a single sample and its neighboring samples; the overall objective function is obtained by stacking the optimization objective functions of all N samples, $1 \le i \le N$;

to select the sub-tensor of a sample together with the sub-tensors of its neighbors (both same-group neighbors and different-group neighbors), a selection matrix is defined as shown in formula (4):

[formula (4) not reproduced in the source text]

wherein the element in row $i_0$, column $k_0$ of the selection matrix is determined by the neighbor relations: the cases of formula (4) distinguish whether a sample is the $k_0$-th same-group neighbor of the sample in question (two cases) or its $k_0$-th different-group neighbor (two cases);

the overall feature discrimination objective function is expressed as formula (5):

[formula (5) not reproduced in the source text]

wherein the intermediate variables are built from the sub-tensor of the g-th sample and the sub-tensor of the h-th sample (each of which may be a false target or a real target) in the sub-tensor set of all N samples, N being the total number of false targets and real targets;

the final feature discrimination criterion is formula (6):

[formula (6) not reproduced in the source text]

wherein W is an intermediate variable matrix whose elements are determined by the neighbor relations, the cases distinguishing whether one sample is a same-group k neighbor of the other (two cases) or a different-group k neighbor of the other (two cases), and D is a diagonal matrix;

formula (6) is equivalent to the form of formula (7):

[formula (7) not reproduced in the source text]

wherein $L_{g,h}$ is an element of the Laplacian matrix;

after the substitutions defined for $i = 1, 2, \dots, l$ (not reproduced in the source text), formula (7) is transformed into the form of formula (8):

[formula (8) not reproduced in the source text]

an alternating iteration method is then used to solve for each $U_i$, $i = 1, 2, \dots, l$, in turn, and the most discriminative feature sub-tensors of the first layer of the multi-resolution layers are computed from the obtained projection matrices $U_i$, for $1 \le j' \le N$ (see Zhang Lefei, "Research on tensor representation and manifold learning methods for remote sensing images").
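For readers unfamiliar with the Laplacian matrix mentioned above, the following sketch shows the usual graph-Laplacian construction $L = D - W$ and the identity that lets a sum of weighted pairwise distances be written as a quadratic form. Using $L = D - W$ with $D$ holding the row sums of $W$ is the standard definition and is assumed here; the source does not spell it out, and the scalar embedding `y` and the function names are purely illustrative.

```python
def graph_laplacian(w):
    """L = D - W, where D is diagonal with the row sums of the symmetric weight matrix W."""
    n = len(w)
    return [[(sum(w[g]) if g == h else 0) - w[g][h] for h in range(n)] for g in range(n)]


def pairwise_cost(w, y):
    """Sum over all ordered pairs of w[g][h] * (y[g] - y[h])**2."""
    n = len(w)
    return sum(w[g][h] * (y[g] - y[h]) ** 2 for g in range(n) for h in range(n))


def quadratic_form(lap, y):
    """Compute y^T L y."""
    n = len(lap)
    return sum(y[g] * lap[g][h] * y[h] for g in range(n) for h in range(n))


w = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
y = [1, 3, 6]
lap = graph_laplacian(w)
cost = pairwise_cost(w, y)
quad = quadratic_form(lap, y)
```

For a symmetric weight matrix the pairwise cost equals twice the quadratic form, which is why neighbor-preserving objectives of this kind are commonly rewritten in terms of a Laplacian.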

FIG. 2(a) shows that the standard STM uses a soft interval (soft margin), so that samples of both the black class and the white class are misclassified. Taking black as the target class and white as the false targets, the classifier of a multi-resolution identification layer should instead behave as in FIG. 2(b) (SHM-STM): all targets are classified correctly with a hard interval, while misclassification of false targets is tolerated under a maximized soft interval. The soft-hard interval support tensor machine (SHM-STM) is established from the objective function of the standard STM, which maximizes the class interval and minimizes the slack variables, together with the required soft-hard interval constraints, as follows:

Step 2.2: to obtain the parameters of the soft-hard interval support tensor machine, the following optimization problem is constructed:

[optimization problem not reproduced in the source text]

wherein $w_i$ denotes the parameters of the i-th hyperplane of the soft-hard interval support tensor machine, $i = 1, 2, \dots, l$, and $l$ is the number of hyperplanes of the soft-hard interval support tensor machine; $\circ$ denotes the outer product of vectors; $C$ denotes the penalty factor of the soft-hard interval support tensor machine; $\xi_{i''}$ denotes the slack variable of the $i''$-th simple sample, $1 \le i'' \le N_{sft}$, where $N_{sft}$ is the number of false target samples; the sub-tensor of the $i'$-th sample whose class is real target is indexed by $1 \le i' \le N_t$, where $N_t$ is the number of real target samples; $b$ denotes the bias of the hyperplane; the sub-tensor of the $i''$-th sample is one whose class is false target; and the mode products of $w_i$ with each of these sub-tensors are taken in the i-th mode;

the soft-hard interval support tensor machine can be regarded as a limiting form of the C-STM, which can be expressed as follows:

[C-STM formula not reproduced in the source text]

$\xi_i \ge 0$, $1 \le i \le N_{simple} + N_{target}$

When $C_1 = +\infty$, the C-STM is equivalent to the SHM-STM. The SHM-STM guarantees the maximum target recognition rate in the training process.
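For orientation, one formulation that is consistent with the symbol definitions given above can be written as follows. It is a reconstruction under assumptions, not the patent's own formula: the tensor symbols $\mathcal{X}_{i'}$ and $\mathcal{X}_{i''}$, the factor $\tfrac{1}{2}$, and the exact form of the constraints are taken from the standard STM and from the description of a hard interval for real targets and a soft interval for false targets.

```latex
\begin{aligned}
\min_{w_1,\dots,w_l,\; b,\; \xi}\quad
  & \tfrac{1}{2}\,\bigl\lVert w_1 \circ w_2 \circ \cdots \circ w_l \bigr\rVert_F^{2}
    + C \sum_{i''=1}^{N_{sft}} \xi_{i''} \\
\text{s.t.}\quad
  & \mathcal{X}_{i'} \times_1 w_1 \times_2 w_2 \cdots \times_l w_l + b \;\ge\; 1,
    && 1 \le i' \le N_t \quad \text{(real targets, hard interval)} \\
  & -\bigl(\mathcal{X}_{i''} \times_1 w_1 \times_2 w_2 \cdots \times_l w_l + b\bigr) \;\ge\; 1 - \xi_{i''},
    && 1 \le i'' \le N_{sft} \quad \text{(false targets, soft interval)} \\
  & \xi_{i''} \ge 0, && 1 \le i'' \le N_{sft}
\end{aligned}
```

Read this way, a C-STM with separate penalties for the two classes reduces to the same problem when the penalty on real-target slack is driven to $+\infty$, which matches the statement that the C-STM with $C_1 = +\infty$ is equivalent to the SHM-STM.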

The most discriminative feature sub-tensors of the first layer are then classified, and the class of each feature sub-tensor is obtained from the decision formula:

[decision formula not reproduced in the source text]

wherein, if the output for a sub-tensor is 1, the sample corresponding to that sub-tensor is classified as a real target, and if the output is -1, the sample corresponding to that sub-tensor is classified as a false target;
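A decision rule of this kind can be sketched for a second-order sub-tensor (a matrix). The sketch assumes the usual rank-one STM form, in which the score is the sub-tensor contracted with one weight vector per mode plus the bias, and the class is the sign of the score; the function name `stm_decision` and the choice to map a score of exactly 0 to +1 are assumptions.

```python
def stm_decision(x, w1, w2, b):
    """Return +1 (real target) or -1 (false target) for a matrix-shaped sub-tensor x."""
    # Contract x with w1 along mode 1 (rows) and with w2 along mode 2 (columns).
    score = sum(
        w1[r] * x[r][c] * w2[c]
        for r in range(len(w1))
        for c in range(len(w2))
    ) + b
    return 1 if score >= 0 else -1


x = [[1.0, 2.0], [3.0, 4.0]]
label_a = stm_decision(x, w1=[1.0, 0.0], w2=[0.0, 1.0], b=-1.0)  # score = 2 - 1
label_b = stm_decision(x, w1=[1.0, 0.0], w2=[0.0, 1.0], b=-3.0)  # score = 2 - 3
```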

Step 2.3: the process of step 2.1 to step 2.2 is repeated for the other layers of the multi-resolution layers until all false targets are correctly classified or the number of layers of the multi-resolution layers reaches the threshold Q; training then stops and the trained target identification model is obtained.

For the second layer of the multi-resolution layers, the false targets that the first layer could not classify correctly, together with all real target samples, form the training set of the second layer; likewise, for the N-th layer, the false targets that the (N-1)-th layer could not classify correctly, together with all real target samples, form the training set of the N-th layer;

the layers are processed in order (first layer, second layer, third layer, and so on) until all false targets are correctly classified or the number of layers of the multi-resolution layers reaches the threshold Q, at which point training stops and the trained target identification model is obtained.

Embodiment three: this embodiment differs from embodiment two in that the specific process of step three is as follows:

Rapid target detection based on visual saliency is performed on the remote sensing image to obtain the candidate target positions in the image; an image block is then cropped at each candidate target position according to the center of the position and prior knowledge of the real target size and the resolution of the remote sensing image, giving an image slice for each candidate target.

In large-scene aircraft target detection, rapid detection based on visual saliency yields a large number of candidate targets, and image slices matching the size of an aircraft are cropped as the slices to be identified; for the remote sensing image to be detected, suspected target positions can also be marked by methods such as selective search.

A square image block centered on the candidate target position is cropped, and the side length of the image block is between 1 and 2 times the target length.
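The cropping rule can be sketched as follows. The sketch makes several assumptions that are not in the text: the target length is already expressed in pixels (physical length divided by image resolution), the default side length is 1.5 times the target length, and windows that would cross the image border are shifted back inside the image. The function name `crop_square` is hypothetical.

```python
def crop_square(image, center_row, center_col, target_length_px, scale=1.5):
    """Crop a square slice centered on a candidate position, clamped to the image."""
    if not 1.0 <= scale <= 2.0:
        raise ValueError("scale must lie between 1 and 2 times the target length")
    side = max(1, round(scale * target_length_px))
    side = min(side, len(image), len(image[0]))
    half = side // 2
    # Shift the window back inside the image when the candidate is near a border.
    top = min(max(center_row - half, 0), len(image) - side)
    left = min(max(center_col - half, 0), len(image[0]) - side)
    return [row[left:left + side] for row in image[top:top + side]]


image = [[r * 10 + c for c in range(10)] for r in range(10)]
center_slice = crop_square(image, 5, 5, target_length_px=4)
corner_slice = crop_square(image, 9, 9, target_length_px=4)
```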

Embodiment four: this embodiment differs from embodiment three in that the multi-resolution feature tensor in step one is a multi-resolution Gabor feature tensor, a multi-resolution Gist feature tensor, or a multi-resolution morphological difference operator feature tensor.

Embodiment five: this embodiment differs from embodiment four in that the threshold Q of the number of layers of the multi-resolution layers is 6.

The following example demonstrates the beneficial effects of the invention:

A large number of experiments were carried out on the progressive target detection and identification method based on hierarchical feature tensors. The experimental data are optical remote sensing images of 20 airports from a Google Earth data source. Example slices from the experiments are shown in FIG. 3, where (a) shows real target training samples and (b) shows false target training samples. Sliding-window identification was performed with a step length of 10.

The model was trained according to the overall training algorithm of the hierarchical feature tensor progressive target identification model proposed by the invention, with the training parameters set to $\delta\, S_{cellsize} = [40, 40]$, $\delta_1 = [16, 16]$, $\delta_2 = 20$, $\delta_3 = [4, 4]$. The difficult samples of each multi-resolution identification layer were set to the 20% of samples with the smaller separation values, and the dimension of the feature sub-tensor was set by a formula (not reproduced in the source text) in which round(·) denotes rounding and size(·) returns the dimensions of the input tensor. In the target attitude estimation layer, the target attitudes were divided into 10 classes, and σ in target attitude estimation was the minimum distance between samples of different classes within a single attitude.

Each level of the model correctly identifies part of the background samples, until almost all background samples have been removed at the last level. Because the proposed SHM-STM and OC-STM guarantee 100% correct classification of the targets during training, at the end of the final model training 4 of the 912 background samples were retained as false alarms and all 600 target samples were correctly classified.
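The training counts reported above translate into rates as follows. The metric names and the function `training_summary` are illustrative; the rates are derived here from the four reported counts and are not figures stated in the source.

```python
def training_summary(n_targets, n_targets_correct, n_background, n_false_alarms):
    """Derive simple rates from the reported training counts."""
    background_removed = n_background - n_false_alarms
    return {
        "target_recall": n_targets_correct / n_targets,
        "false_alarm_rate": n_false_alarms / n_background,
        "overall_accuracy": (n_targets_correct + background_removed)
        / (n_targets + n_background),
    }


# Counts from the text: 600 targets all correct, 4 of 912 background samples kept as false alarms.
summary = training_summary(600, 600, 912, 4)
```

With these counts, 908 of the 912 background samples are removed, so 1508 of the 1512 training samples are classified correctly.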

The above examples merely explain the computational model and computational flow of the invention in detail and do not limit its embodiments. Those skilled in the art can make other variations and modifications on the basis of the foregoing description; it is not possible to enumerate all embodiments here, and all obvious variations and modifications derived from the technical solution of the invention remain within its scope of protection.

Claims

1. A target progressive detection and identification method based on hierarchical feature tensor is characterized by comprising the following steps:

step four, respectively carrying out direction estimation on each obtained candidate target slice, and carrying out rotation transformation on each candidate target slice according to a direction estimation result to obtain all candidate target slices after rotation transformation;

2. The method for progressively detecting and identifying the target based on the hierarchical feature tensor of claim 1, wherein the specific process of the second step is as follows:

step 2.1, for the first layer of the multi-resolution layers, taking a false target $\chi_{j'} \in \chi_{false\_target}$ and summing the squared Frobenius norms of the differences between the false target $\chi_{j'}$ and each real target $\chi_{i'}$, $1 \le i' \le N_t$, $N_t$ being the total number of real targets;

the maximum separation between the false target $\chi_{j'}$ and all the real targets being given by formula (1):

[formula (1) not reproduced in the source text]

wherein the maximum separations of all false targets are ranked, and the mode products of $U_{i'}$ with $\chi_{j'}$ and with the real target tensor are both taken in the $i'$-th mode;

when one sample is a same-group k neighbor of another sample, or the other sample is a same-group k neighbor of it, the same-group weight of the pair is 1; otherwise the weight is 0;

when one sample is a different-group k neighbor of another sample, or the other sample is a different-group k neighbor of it, the different-group weight of the pair is 1; otherwise the weight is 0;

according to the neighbor relations, the sub-tensor of each sample minimizes formula (2):

[formula (2) not reproduced in the source text]

wherein k is a regularization parameter;

formula (2) is rewritten as formula (3):

[formula (3) not reproduced in the source text]

wherein the intermediate variables include an identity matrix; g and h denote the g-th row and the h-th column of the matrix, respectively; and the two sub-tensor terms denote the sub-tensor of the g-th sample and the sub-tensor of the h-th sample in the sub-tensor set formed by a sample and its 2k neighbors;

to select the sub-tensor of a sample together with the sub-tensors of its neighbors, a selection matrix is defined as shown in formula (4):

[formula (4) not reproduced in the source text]

wherein the element in row $i_0$, column $k_0$ of the selection matrix is determined by the neighbor relations, the cases of formula (4) distinguishing whether a sample is the $k_0$-th same-group neighbor of the sample in question (two cases) or its $k_0$-th different-group neighbor (two cases);

the intermediate variables of the overall feature discrimination objective function being built from the sub-tensor of the g-th sample and the sub-tensor of the h-th sample in the sub-tensor set of all N samples;

the final feature discrimination criterion is formula (6):

[formula (6) not reproduced in the source text]

wherein W is an intermediate variable matrix whose elements are determined by the neighbor relations, the cases distinguishing whether one sample is a same-group k neighbor of the other (two cases) or a different-group k neighbor of the other (two cases), and D is a diagonal matrix;

formula (6) is equivalent to the form of formula (7):

[formula (7) not reproduced in the source text]

wherein $L_{g,h}$ is an element of the Laplacian matrix;

after the defined substitutions (not reproduced in the source text), formula (7) is transformed into the form of formula (8):

[formula (8) not reproduced in the source text]

wherein the sub-tensor of the $i'$-th sample whose class is real target is indexed by $1 \le i' \le N_t$, $N_t$ being the number of real target samples, and $b$ represents the bias of the hyperplane;

the sub-tensor of the $i''$-th sample is one whose class is false target, and the mode products of $w_i$ with each of these sub-tensors are taken in the i-th mode;

wherein, if the output of the decision formula (not reproduced in the source text) for a sub-tensor is 1, the sample corresponding to that sub-tensor is classified as a real target, and if the output is -1, the sample corresponding to that sub-tensor is classified as a false target;

3. The method for progressively detecting and identifying the target based on the hierarchical feature tensor of claim 2, wherein the specific process of the third step is as follows:

and carrying out target rapid detection on the remote sensing image based on the visual saliency to obtain candidate target positions in the remote sensing image, and intercepting an image block at each candidate target position according to the center of each candidate target position, the size of a real target and the prior of the resolution of the remote sensing image to obtain an image slice of each candidate target.

4. The method as claimed in claim 3, wherein the multi-resolution feature tensor in the first step is a multi-resolution Gabor feature tensor, a multi-resolution Gist feature tensor or a multi-resolution morphological difference operator feature tensor.

5. The method according to claim 4, wherein the threshold Q of the number of layers of the multi-resolution layer is 6.