CN115082801A - Airplane model identification system and method based on remote sensing image - Google Patents


Info

Publication number
CN115082801A
CN115082801A (application CN202210888641.XA)
Authority
CN
China
Prior art keywords
layer
target
fine
airplane
bilinear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210888641.XA
Other languages
Chinese (zh)
Other versions
CN115082801B (en)
Inventor
杨澜
杨晓冬
刘建明
严华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Daoda Tianji Technology Co ltd
Original Assignee
Beijing Daoda Tianji Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Daoda Tianji Technology Co ltd filed Critical Beijing Daoda Tianji Technology Co ltd
Priority to CN202210888641.XA priority Critical patent/CN115082801B/en
Publication of CN115082801A publication Critical patent/CN115082801A/en
Application granted granted Critical
Publication of CN115082801B publication Critical patent/CN115082801B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/10 Terrestrial scenes (G06V20/00 Scenes; scene-specific elements)
    • G06N3/08 Learning methods (G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models)
    • G06V10/764 Image or video recognition using machine-learning classification, e.g. of video objects
    • G06V10/765 Classification using rules for partitioning the feature space
    • G06V10/82 Image or video recognition using neural networks
    • G06V2201/07 Target detection
    • G06V2201/08 Detecting or categorising vehicles


Abstract

The invention relates to an airplane model identification system and method based on remote sensing images, comprising: a target detection module for detecting whether an airplane target exists in the remote sensing image and, if so, sending the detected airplane target to the model identification module; and a model identification module for predicting the model of the airplane target with a fine-grained classification network. The invention constructs a combined framework of a target detection module and a bilinear-pooling fine-grained classification network based on dilated convolution. The framework is suitable for identifying the model of an airplane target in a remote sensing image, and is also applicable to identifying targets of the same size and similar appearance in images from other fields.

Description

Airplane model identification system and method based on remote sensing image
Technical Field
The invention relates to the technical field of target identification, in particular to an airplane model identification system and method based on remote sensing images.
Background
Mainstream neural network architectures for deep-learning target detection fall into one-stage and two-stage network models. A one-stage model directly regresses the class probability and position coordinates of the object target; it is faster than a two-stage model but less accurate. A two-stage model completes target detection with a convolutional neural network that extracts CNN features, and is trained in two steps: first the region proposal network (RPN) is trained, and then the region-based detection network is trained. Its accuracy is high, but its speed is slower than that of a one-stage model.
Most aircraft-target identification methods for remote sensing images stop at deciding whether a target is an airplane and lack further classification of the aircraft model. Remote sensing images contain complex backgrounds and interference factors that are difficult to distinguish from the aircraft target, and although different aircraft models do differ to some degree, their obvious common features make them look very similar, especially in lower-resolution images. Therefore, neither the one-stage nor the two-stage network model can accurately and finely distinguish two aircraft of different models with similar spatial structures, and practical precision cannot be reached.
Disclosure of Invention
The invention aims to improve the identification precision of the specific model of the airplane target through a combined framework of a target detection module and a fine-grained classification module, and provides an airplane model identification system and method based on a remote sensing image.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
an aircraft model identification system based on remote sensing images, comprising:
the target detection module is used for detecting whether an airplane target exists in the remote sensing image, and if the airplane target exists, the detected airplane target is sent to the model identification module;
and the model identification module is used for predicting the model of the airplane target based on the fine-grained classification network.
The fine-grained classification network comprises 3 neural network units and 1 bilinear output unit connected in sequence, wherein the input of the 1st neural network unit is connected to the output of the target detection module, and the output of the 3rd neural network unit is connected to the input of the bilinear output unit;
each neural network unit comprises a first residual dilated convolution block, a second residual dilated convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated convolution block, the second residual dilated convolution block and the first convolution layer are connected in sequence, the output of the first convolution layer and the output of the second convolution layer are each connected to the input of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence.
The first residual dilated convolution block and the second residual dilated convolution block each comprise a dilated convolution layer, a BN layer and a second activation function layer connected in sequence.
The bilinear output unit is used to transpose the high-dimensional feature matrix output by the 3rd neural network unit and to take the outer product of the matrix before the transpose and the matrix after the transpose, obtaining the fused bilinear feature.
The classification loss function of the fine-grained classification network is:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i denotes an input sample, namely the i-th aircraft target input to the fine-grained classification network; N denotes the total number of input samples, i ∈ N; y_i denotes the label truth value, namely the true model-class label of the i-th aircraft target; j denotes the j-th model class, M denotes the total number of model classes, j ∈ M; y_ci denotes the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, c ∈ M; m denotes the angular margin separating aircraft targets of different model classes; cos(θ_{y_i} + m) denotes the margin-shifted cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
An airplane model identification method based on remote sensing images comprises the following steps:
Step S1: the target detection module detects whether an airplane target exists in the remote sensing image, and if so, sends the detected airplane target to the model identification module;
Step S2: the model identification module predicts the model of the airplane target based on the fine-grained classification network.
The model identification module predicting the model of the airplane target based on the fine-grained classification network comprises the following steps:
the fine-grained classification network extracts features of the airplane target, outputs the bilinear feature of the airplane target, and predicts the model of the airplane target from the bilinear feature.
The fine-grained classification network comprises 3 neural network units and 1 bilinear output unit connected in sequence; the input of the 1st neural network unit is connected to the output of the target detection module, and the output of the 3rd neural network unit is connected to the input of the bilinear output unit. The 3 neural network units output a high-dimensional feature matrix of the airplane target, and the bilinear output unit outputs the bilinear feature of the airplane target.
Each neural network unit comprises a first residual dilated convolution block, a second residual dilated convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated convolution block, the second residual dilated convolution block and the first convolution layer are connected in sequence, the output of the first convolution layer and the output of the second convolution layer are each connected to the input of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence.
The bilinear output unit outputs the bilinear feature of the aircraft target as follows: the bilinear output unit transposes the high-dimensional feature matrix output by the 3rd neural network unit, then takes the outer product of the matrix before the transpose and the matrix after the transpose to obtain the fused bilinear feature.
The classification loss function of the fine-grained classification network is:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i denotes an input sample, namely the i-th aircraft target input to the fine-grained classification network; N denotes the total number of input samples, i ∈ N; y_i denotes the label truth value, namely the true model-class label of the i-th aircraft target; j denotes the j-th model class, M denotes the total number of model classes, j ∈ M; y_ci denotes the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, c ∈ M; m denotes the angular margin separating aircraft targets of different model classes; cos(θ_{y_i} + m) denotes the margin-shifted cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention constructs a combined framework of a target detection module and a bilinear-pooling fine-grained classification network based on dilated convolution; the framework is suitable for identifying the model of an airplane target in a remote sensing image and is also applicable to identifying targets of the same size and similar appearance in images from other fields.
(2) The invention builds a fine-grained classification network that extracts bilinear features with dilated convolution. A neural network unit for feature sampling combines ordinary convolution with dilated convolution and uses residual connections, which reduces the parameter count while avoiding loss of key information; the output stage transposes the extracted high-dimensional feature matrix and outputs the feature obtained by fusing the outer product.
(3) To address the small inter-class distance in the loss function of the fine-grained classification network, the invention adds a cosine angular margin to enlarge the classification boundary, increase the inter-class spacing and tighten the intra-class distance, thereby achieving identification of the specific aircraft model rather than simple classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a block diagram of an aircraft model identification system according to the present invention;
fig. 2 is a schematic structural diagram of a fine-grained classification network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a neural network unit according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a residual dilated convolution block according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Also, in the description of the present invention, the terms "first", "second", and the like are used for distinguishing between descriptions and not necessarily for describing a relative importance or implying any actual relationship or order between such entities or operations.
Example (b):
the invention is realized by the following technical scheme, as shown in figure 1, the airplane model identification system based on the remote sensing image comprises a target detection module and a model identification module. The target detection module is used for detecting whether an airplane target exists in the input remote sensing image, and if the airplane target exists, the detected airplane target is sent to the model identification module. The model identification module is used for predicting the model of the airplane target based on the fine-grained classification network.
For single-class target detection on remote sensing images in the field of deep-learning visual algorithms, the target detection module of this scheme is based on the YOLOv5 algorithm. YOLOv5 is a one-stage network model with excellent performance and high precision, and in particular improved small-target detection precision, which makes it very suitable for aircraft detection in remote sensing images. Referring to FIG. 1, the output of the target detection module includes coordinate information [xmin, ymin, xmax, ymax] and category information [class, conf] (class denotes the classification, conf the confidence). Since the target detection module is only used to decide whether an airplane target exists in the remote sensing image, only the coordinate information [xmin, ymin, xmax, ymax] is used here: xmin and ymin denote the coordinates of the lower-left vertex of the box enclosing the airplane target, and xmax and ymax the coordinates of the upper-right vertex.
According to the obtained coordinate information, the original remote sensing image is cropped to the area where the airplane target is located, and the cropped airplane target is sent to the fine-grained classification network.
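As an illustrative sketch (not part of the patented scheme itself), the cropping step can be expressed as follows, assuming the image is a row-major array and the box follows the [xmin, ymin, xmax, ymax] convention above:

```python
def crop_target(image, box):
    """Crop an aircraft target out of a remote sensing image.

    image: H x W array-like (nested lists here, for simplicity);
    box:   [xmin, ymin, xmax, ymax] as output by the detection module.
    """
    xmin, ymin, xmax, ymax = box
    # Rows are indexed by y, columns by x.
    return [row[xmin:xmax] for row in image[ymin:ymax]]

# 4 x 5 single-channel toy "image" whose pixel at (r, c) is 10*r + c
img = [[10 * r + c for c in range(5)] for r in range(4)]
patch = crop_target(img, [1, 1, 4, 3])   # rows 1..2, columns 1..3
```

The slice is then resized and passed to the fine-grained classification network as described below.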
In remote sensing images, the appearance differences between different aircraft models are small; their obvious common features make them look very similar, and the appearance is particularly sensitive to spatial resolution. Ordinary convolution reduces image resolution, and spatial hierarchy information is lost during downsampling (pooling). An aircraft target in a remote sensing image is usually small, so if ordinary convolution is used for sampled learning, the effective feature information extracted after multiple convolution layers is heavily compressed. Moreover, the spacing between aircraft-target classes is small, and a generic image-classification task can hardly identify the specific model of the aircraft target.
Therefore, the fine-grained classification network used by the model identification module is a bilinear-pooling fine-grained classification network based on dilated (hole) convolution. Dilated convolution expands the receptive field by injecting holes into ordinary convolution; compared with ordinary convolution it has an extra parameter, the dilation rate, which defines the spacing between the values sampled by the convolution kernel.
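The effect of the dilation rate can be sketched with two small helper functions (names are illustrative, not from the patent): a k-tap kernel with rate r spans k + (k − 1)(r − 1) input positions while keeping only k parameters.

```python
def effective_kernel_size(k, rate):
    """Input span covered by a k-tap kernel with the given dilation
    rate: holes of size (rate - 1) are injected between adjacent taps."""
    return k + (k - 1) * (rate - 1)

def tap_positions(k, rate):
    """Input offsets actually sampled by the dilated kernel."""
    return [i * rate for i in range(k)]

# rate = 1 is ordinary convolution; rate = 2 widens the span of a
# 3-tap kernel from 3 to 5 without adding any parameters.
```

This is why the network can enlarge its receptive field on small aircraft targets without the information loss of extra pooling stages.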
Referring to fig. 2, the structure of the fine-grained classification network includes 3 neural network units and 1 bilinear output unit, where the 3 neural network units are a first neural network unit, a second neural network unit, and a third neural network unit, and the structures of the neural network units are the same. The first neural network unit, the second neural network unit and the third neural network unit are sequentially connected, and the output end of the third neural network unit is connected with the input end of the bilinear output unit.
Please refer to fig. 3, which shows the structure of a neural network unit, comprising a first residual dilated convolution block, a second residual dilated convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first and second residual dilated convolution blocks have the same structure. The first residual dilated convolution block, the second residual dilated convolution block and the first convolution layer are connected in sequence, the output of the first convolution layer and the output of the second convolution layer are each connected to the input of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence.
Referring to fig. 4, the residual dilated convolution block comprises a dilated convolution layer, a BN layer and a second activation function layer connected in sequence. The dilated convolution layer performs the downsampling with a stride of 2 instead of pooling, which better preserves texture features.
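The downsampling arithmetic of such a stride-2 dilated layer can be sketched with the standard convolution output-size formula (the kernel size and padding below are illustrative assumptions, not values from the patent):

```python
def conv_out_len(n, k, stride=1, pad=0, rate=1):
    """Output length of a (dilated) convolution along one spatial axis."""
    k_eff = k + (k - 1) * (rate - 1)     # span of the dilated kernel
    return (n + 2 * pad - k_eff) // stride + 1

# A 3x3 kernel with dilation rate 2, padding 2 and stride 2 halves a
# 112-pixel axis, replacing a pooling layer for downsampling.
half = conv_out_len(112, 3, stride=2, pad=2, rate=2)
```

With stride 2 the block halves the spatial resolution while every output value is still computed from actual pixel values rather than a pooled summary.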
Referring to fig. 2, the bilinear output unit transposes the high-dimensional feature matrix output by the 3rd neural network unit, then takes the outer product of the matrix before the transpose and the matrix after the transpose to obtain the fused bilinear feature: X(I) = A(I) · A(I)^T, where X(I) is the bilinear feature, A(I) is the high-dimensional feature matrix before the transpose, and A(I)^T is the matrix after the transpose.
A high-order feature representation is obtained after square-root and two-norm (L2) normalization. Bilinear pooling provides a stronger feature representation than a linear model, so the classification accuracy of this high-order representation is far higher than that of an ordinary one-stage network model.
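A minimal pure-Python sketch of this bilinear output unit, using a toy 2 × 2 feature matrix (the signed square root and L2 normalization follow the description above; the helper name is illustrative):

```python
import math

def bilinear_pool(A):
    """A: C x D feature matrix (C channels, D spatial positions).
    Returns the flattened, normalized bilinear feature of A · A^T."""
    C, D = len(A), len(A[0])
    # Outer product with the transpose: (C x D)(D x C) -> C x C
    B = [[sum(A[i][k] * A[j][k] for k in range(D)) for j in range(C)]
         for i in range(C)]
    flat = [v for row in B for v in row]
    # Signed square root, then L2 (two-norm) normalization
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

feat = bilinear_pool([[1.0, 2.0], [3.0, 4.0]])  # 4-dim bilinear feature
```

Because A · A^T captures pairwise channel interactions, the resulting feature is second-order in the activations, which is the source of the extra discriminative power over a linear head.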
Before the fine-grained classification network is used, it must be trained. The classification loss function of the fine-grained classification network is:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i denotes an input sample, namely the i-th aircraft target input to the fine-grained classification network; N denotes the total number of input samples, i ∈ N; y_i denotes the label truth value, namely the true model-class label of the i-th aircraft target; j denotes the j-th model class, M denotes the total number of model classes, j ∈ M; y_ci denotes the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, c ∈ M; m denotes the angular margin separating aircraft targets of different model classes; cos(θ_{y_i} + m) denotes the margin-shifted cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
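As a numeric sketch only: the formulation below is an assumption, instantiating the additive angular-margin (ArcFace-style) softmax suggested by the variable definitions above; the scale s, margin m and the toy cosine values are all illustrative.

```python
import math

def margin_softmax_loss(cosines, true_idx, m=0.5, s=8.0):
    """Cross-entropy over cosine logits with an additive angular margin
    m applied to the true class; s is a logit scale factor."""
    theta = math.acos(max(-1.0, min(1.0, cosines[true_idx])))
    logits = list(cosines)
    logits[true_idx] = math.cos(theta + m)   # shrink the true-class logit
    exps = [math.exp(s * z) for z in logits]
    return -math.log(exps[true_idx] / sum(exps))

cos_sims = [0.8, 0.3, 0.1]   # toy cosine similarities to 3 model classes
loss_with_margin = margin_softmax_loss(cos_sims, 0, m=0.5)
loss_plain = margin_softmax_loss(cos_sims, 0, m=0.0)
```

The margin makes the same prediction costlier during training, which is exactly the "enlarge the classification boundary, increase inter-class spacing" effect claimed for the loss.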
Based on the system, please refer to fig. 1, the scheme also provides an airplane model identification method based on remote sensing images, which comprises the following steps:
and step S1, the target detection module detects whether the remote sensing image has an airplane target, and if so, the detected airplane target is sent to the model identification module.
The size of a remote sensing image is usually far larger than that of a natural-scene image. If the remote sensing image were fed directly to the target detection module, the airplane targets would be severely compressed, and the detection network would struggle to learn the features of small targets such as airplanes. Therefore, the very wide remote sensing image requires cropping preprocessing. To avoid cutting a target in two, this scheme slides a window of a set size with a set step to obtain N small images, with an overlap area between adjacent small images; the cropped remote-sensing small images are then input to the target detection module.
After image preprocessing, the remote sensing image yields N small images that are input to the target detection module in sequence; the target detection module predicts the coordinate information [xmin, ymin, xmax, ymax] of each airplane target in the small images, and the airplane target is sliced out according to the coordinate information.
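The sliding-crop step can be sketched as follows; the tile size of 1024 and overlap of 128 pixels are illustrative assumptions (the patent only specifies "a certain size" and "a certain step" with an overlap between adjacent tiles):

```python
def tile_origins(length, tile, overlap):
    """1-D origins for sliding crops of size `tile`, with `overlap`
    pixels shared by adjacent tiles; a final tile is added so the
    crops end exactly at the image border."""
    step = tile - overlap
    xs = list(range(0, max(length - tile, 0) + 1, step))
    if xs[-1] + tile < length:           # cover the far edge
        xs.append(length - tile)
    return xs

def tiles(width, height, tile=1024, overlap=128):
    """[xmin, ymin, xmax, ymax] boxes covering the whole image."""
    return [(x, y, x + tile, y + tile)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]
```

The overlap ensures that an aircraft lying on a tile boundary appears whole in at least one tile, at the cost of detecting some targets twice (deduplicated later via the detector's confidence and box coordinates).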
And step S2, the model identification module predicts the model of the airplane target based on the fine-grained classification network.
The input image size of the fine-grained classification network is set to 112 × 112. To ensure that the airplane target is not stretched or deformed when scaled, it is padded before being input to the fine-grained classification network. Specifically, a base image filled with pixel value 128 is generated with side length equal to the longer side of the airplane-target image (W or H); the airplane-target image is then pasted onto the base image to form a square (W = H) image, so that when it is scaled to 112 × 112 the airplane shape is scaled proportionally.
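A sketch of this pad-then-scale arithmetic; the 112-pixel output and the 128 fill value come from the text above, while centring the crop on the base image is an assumption (the patent only says the target image is pasted onto the base image):

```python
def letterbox_params(w, h, out=112, fill=128):
    """Pad a w x h crop onto a square base image of side max(w, h)
    filled with `fill`, then scale to out x out. Returns the square
    side, the uniform scale factor and the paste offset (assumed
    centred) inside the square."""
    side = max(w, h)
    scale = out / side
    dx, dy = (side - w) // 2, (side - h) // 2   # centring assumption
    return side, scale, (dx, dy)

side, scale, (dx, dy) = letterbox_params(200, 100)
# Both axes share one scale factor, so the aircraft keeps its
# aspect ratio inside the 112 x 112 network input.
```

Because a single scale factor is applied to both axes, wing-to-fuselage proportions (the key fine-grained cue) survive the resize.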
The fine-grained classification network extracts features of the airplane target, outputs the bilinear feature of the airplane target, and predicts the model of the airplane target from the bilinear feature.
The fine-grained classification network comprises 3 neural network units and 1 bilinear output unit connected in sequence; the input of the 1st neural network unit is connected to the output of the target detection module, and the output of the 3rd neural network unit is connected to the input of the bilinear output unit. The 3 neural network units output a high-dimensional feature matrix of the airplane target, and the bilinear output unit outputs the bilinear feature of the airplane target.
Each neural network unit comprises a first residual dilated convolution block, a second residual dilated convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated convolution block, the second residual dilated convolution block and the first convolution layer are connected in sequence, the output of the first convolution layer and the output of the second convolution layer are each connected to the input of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence.
The bilinear output unit outputs the bilinear feature of the aircraft target as follows: the bilinear output unit transposes the high-dimensional feature matrix output by the 3rd neural network unit, then takes the outer product of the matrix before the transpose and the matrix after the transpose to obtain the fused bilinear feature.
The classification loss function of the fine-grained classification network is:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i denotes an input sample, namely the i-th aircraft target input to the fine-grained classification network; N denotes the total number of input samples, i ∈ N; y_i denotes the label truth value, namely the true model-class label of the i-th aircraft target; j denotes the j-th model class, M denotes the total number of model classes, j ∈ M; y_ci denotes the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, c ∈ M; m denotes the angular margin separating aircraft targets of different model classes; cos(θ_{y_i} + m) denotes the margin-shifted cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An airplane model identification system based on remote sensing images, characterized in that it comprises:
the target detection module is used for detecting whether an airplane target exists in the remote sensing image, and if the airplane target exists, the detected airplane target is sent to the model identification module;
and the model identification module is used for predicting the model of the airplane target based on the fine-grained classification network.
2. The airplane model identification system based on remote sensing images according to claim 1, characterized in that: the fine-grained classification network comprises 3 neural network units and 1 bilinear output unit connected in sequence, wherein the input end of the 1st neural network unit is connected with the output end of the target detection module, and the output end of the 3rd neural network unit is connected with the input end of the bilinear output unit;
each neural network unit comprises a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence; the output end of the first convolution layer and the output end of the second convolution layer are respectively connected with the input end of the fully connected layer; and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence.
3. The airplane model identification system based on remote sensing images according to claim 2, characterized in that: the first residual dilated-convolution block or the second residual dilated-convolution block comprises a dilated convolution layer, a BN layer and a second activation function layer connected in sequence.
4. The airplane model identification system based on remote sensing images according to claim 2, characterized in that: the bilinear output unit is used for transposing the high-dimensional feature matrix output by the 3rd neural network unit, and taking the outer product of the matrix before transposition and the matrix after transposition to obtain the fused bilinear feature.
5. The airplane model identification system based on remote sensing images according to claim 1, characterized in that: the classification loss function of the fine-grained classification network is as follows:
Figure 155732DEST_PATH_IMAGE002
wherein x is i Representing input samples, namely the ith aircraft target of the fine-grained classification network, wherein N represents the total number of the input samples, and i belongs to N; y is i Representing a label truth value, namely inputting a model type real label of the ith aircraft target of the fine-grained classification network; j represents the jth model type, M represents the total number of the model types, and j belongs to M; y is ci For the prediction output of the fine-grained classification network, the ith aircraft target belongs to the category of the type c, and c belongs to M; m represents the distance interval between airplane targets of different types and classes input into the fine-grained classification network;
Figure 757746DEST_PATH_IMAGE003
denotes y i The cosine angle of (d);
Figure 984328DEST_PATH_IMAGE004
prediction output y representing a fine-grained classification network ci And input sample x i The included angle therebetween.
6. An airplane model identification method based on remote sensing images, characterized in that it comprises the following steps:
step S1, the target detection module detects whether an airplane target exists in the remote sensing image, and if the airplane target exists, the detected airplane target is sent to the model identification module;
step S2, the model identification module predicts the model of the airplane target based on the fine-grained classification network.
7. The aircraft model identification method based on the remote sensing image as claimed in claim 6, wherein: the model identification module predicts the model of the airplane target based on the fine-grained classification network, and comprises the following steps:
the fine-grained classification network carries out feature extraction on the airplane target, outputs bilinear features of the airplane target, and predicts the model of the airplane target according to the bilinear features;
the fine-grained classification network comprises 3 neural network units and 1 bilinear output unit connected in sequence; the input end of the 1st neural network unit is connected with the output end of the target detection module, and the output end of the 3rd neural network unit is connected with the input end of the bilinear output unit; the 3 neural network units output the high-dimensional feature matrix of the airplane target, and the bilinear output unit outputs the bilinear feature of the airplane target.
8. The aircraft model identification method based on the remote sensing image as claimed in claim 7, characterized in that: each neural network unit comprises a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence; the output end of the first convolution layer and the output end of the second convolution layer are respectively connected with the input end of the fully connected layer; and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence.
9. The aircraft model identification method based on the remote sensing image as claimed in claim 7, characterized in that the bilinear output unit outputs the bilinear feature of the airplane target as follows: the bilinear output unit transposes the high-dimensional feature matrix output by the 3rd neural network unit, and then takes the outer product of the matrix before transposition and the matrix after transposition to obtain the fused bilinear feature.
10. The aircraft model identification method based on the remote sensing image as claimed in claim 6, wherein: the classification loss function of the fine-grained classification network is as follows:
$$L=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos\left(\theta_{y_i}+m\right)}}{e^{\cos\left(\theta_{y_i}+m\right)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_{j}}}$$

wherein x_i represents an input sample, namely the i-th airplane target input into the fine-grained classification network, N represents the total number of input samples, and i ∈ N; y_i represents the label truth value, namely the real model-type label of the i-th airplane target input into the fine-grained classification network; j represents the j-th model type, M represents the total number of model types, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, namely that the i-th airplane target belongs to model type c, with c ∈ M; m represents the angular margin separating airplane targets of different model types input into the fine-grained classification network; θ_{y_i} denotes the cosine angle of y_i; and θ_j denotes the included angle between the prediction output y_ci and the input sample x_i.
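The residual cavity (dilated) convolution block of claims 3 and 8 — a dilated convolution layer followed by BN and an activation function — can be sketched as below. The identity skip connection is an assumption suggested by the word "residual", and the per-sample normalisation stands in for a real BN layer:

```python
import numpy as np

def dilated_conv2d(x: np.ndarray, w: np.ndarray, dilation: int = 2) -> np.ndarray:
    """Single-channel 'same'-size 2-D dilated (atrous) convolution.

    `w` is assumed to be an odd-sized square kernel; `dilation` inserts
    (dilation - 1) zeros between kernel taps, enlarging the receptive
    field without adding parameters.
    """
    k = w.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the padded input at dilated offsets around (i, j)
            patch = xp[i:i + dilation * k:dilation, j:j + dilation * k:dilation]
            out[i, j] = (patch * w).sum()
    return out

def residual_dilated_block(x: np.ndarray, w: np.ndarray, dilation: int = 2) -> np.ndarray:
    """Dilated conv -> BN stand-in -> ReLU, with an identity skip connection."""
    y = dilated_conv2d(x, w, dilation)
    y = (y - y.mean()) / (y.std() + 1e-5)  # per-sample normalisation in place of BN
    return np.maximum(x + y, 0.0)          # ReLU over the residual sum
```

Stacking two such blocks before the first convolution layer, as the claims arrange them, widens the receptive field geometrically while the skip connections keep the gradient path short.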
CN202210888641.XA 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image Active CN115082801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210888641.XA CN115082801B (en) 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210888641.XA CN115082801B (en) 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image

Publications (2)

Publication Number Publication Date
CN115082801A true CN115082801A (en) 2022-09-20
CN115082801B CN115082801B (en) 2022-10-25

Family

ID=83242402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210888641.XA Active CN115082801B (en) 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image

Country Status (1)

Country Link
CN (1) CN115082801B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
WO2021129691A1 (en) * 2019-12-23 2021-07-01 长沙智能驾驶研究院有限公司 Target detection method and corresponding device
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114140683A (en) * 2020-08-12 2022-03-04 天津大学 Aerial image target detection method, equipment and medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI LIANG et al.: "FGATR-Net: Automatic Network Architecture Design for Fine-Grained Aircraft Type Recognition in Remote Sensing Images", Remote Sensing *
ZENG YANQING: "Research on Fine-Grained Classification Methods for Aircraft Targets Based on Remote Sensing Images", China Master's Theses Full-text Database, Engineering Science and Technology II *

Also Published As

Publication number Publication date
CN115082801B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN112381004B (en) Dual-flow self-adaptive graph rolling network behavior recognition method based on framework
CN113673510B (en) Target detection method combining feature point and anchor frame joint prediction and regression
Li et al. Sign language recognition based on computer vision
CN112087443B (en) Sensing data anomaly detection method under physical attack of industrial sensing network information
CN112766229B (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN111652171B (en) Construction method of facial expression recognition model based on double branch network
CN111401149B (en) Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
Yang et al. HCNN-PSI: A hybrid CNN with partial semantic information for space target recognition
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
CN113283409A (en) Airplane detection method in aerial image based on EfficientDet and Transformer
Bin et al. Tensor-based approach for liquefied natural gas leakage detection from surveillance thermal cameras: A feasibility study in rural areas
CN117237547B (en) Image reconstruction method, reconstruction model processing method and device
CN114881286A (en) Short-time rainfall prediction method based on deep learning
CN115082801B (en) Airplane model identification system and method based on remote sensing image
CN111626197B (en) Recognition method based on human behavior recognition network model
CN110738123B (en) Method and device for identifying densely displayed commodities
CN117368862A (en) High-efficiency weather radar data quality evaluation system
CN117036266A (en) Industrial image anomaly detection method and system based on knowledge distillation
CN115048873B (en) Residual service life prediction system for aircraft engine
Fang et al. STUNNER: Radar echo extrapolation model based on spatiotemporal fusion neural network
CN116403286A (en) Social grouping method for large-scene video
CN114462491A (en) Behavior analysis model training method, behavior analysis method and equipment thereof
Lin et al. SOPNet method for the fine-grained measurement and prediction of precipitation intensity using outdoor surveillance cameras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant