CN112329538A - Target classification method based on microwave vision - Google Patents
- Publication number: CN112329538A (application CN202011078852.4A)
- Authority: CN (China)
- Prior art keywords: complex, neural network, convolution, microwave
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/107 — Static hand or arm (recognition of human or animal body parts in image or video data)
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Neural network architectures; combinations of networks
- G06N3/084 — Learning methods: backpropagation, e.g. using gradient descent
- G06T3/4007 — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
- G06V2201/07 — Target detection
- G06V40/117 — Biometrics derived from hands
Abstract
The invention discloses a target classification method based on microwave vision. The method comprises the following steps. Step 1: acquire electromagnetic scattered-field data with microwave transmitting and receiving antennas and preprocess the data. Step 2: construct a complex-valued convolutional neural network for target classification, comprising several complex convolutional layers, complex batch normalization layers and complex activation layers. Step 3: introduce an attention mechanism module into the complex convolutional neural network. Step 4: feed the scattered-field data from step 1 into the network built in steps 2 and 3 as training data, and train the network parameters by a back propagation algorithm until the whole model converges. The invention designs a dedicated complex convolutional neural network with an attention mechanism whose learning performance is superior to that of a real-valued convolutional neural network. Meanwhile, the proposed method, based on microwave visual features, can still classify targets effectively in difficult scenes where RGB images are unusable.
Description
Technical Field
The invention relates to target classification using electromagnetic scattered-field data in microwave vision, under poor illumination or when the target is occluded, and in particular to a target classification method based on microwave vision: a classification method built on a complex-valued convolutional neural network with an attention mechanism.
Background
The target classification task is a core problem in computer vision. Its purpose is to distinguish different types of targets according to the features reflected in the data, and correctly identifying targets is a key step toward machine intelligence. In recent years, with the development of deep learning, target classification has made remarkable progress. However, most current techniques rely on machine-vision features of RGB images, and in particularly difficult scenes, such as poor lighting or an occluded target, satisfactory RGB image data are hard or even impossible to acquire.
To address target classification in these cases, we propose performing the task with electromagnetic scattered-field data, based on microwave visual features. With the rapid development of wireless communication systems in recent years, electromagnetic wave sensors are increasingly applied to various tasks. Compared with an ordinary RGB image, the image obtained by an electromagnetic wave sensor has two clear advantages: it is unaffected by illumination, and it is more robust to the viewing angle. In addition, electromagnetic wave images cope more easily with scale variation and are more economical to store. Based on these advantages, researchers have applied electromagnetic waves to tasks such as object perception, object recognition and object reconstruction. Recently, some researchers have applied electromagnetic waves to gesture recognition, but using only conventional real-valued convolutional neural networks.
Unlike ordinary RGB image data, the scattered-field data we use are complex-valued. For data of this form, we propose an end-to-end complex convolutional neural network with an attention mechanism to better learn their features for target classification. Compared with the real-valued convolutional networks used previously, the complex network accounts for the interaction between the real and imaginary parts of the scattered field, so it can learn better microwave visual features. Furthermore, since detecting key locations is important when identifying targets in an image, we introduce an attention mechanism for this purpose.
In difficult scenes where RGB image data are hard or impossible to acquire, the microwave visual features of electromagnetic scattered-field data offer an advantage for the classification task, providing a new feature modality for target classification in such scenarios. The complex convolutional neural network with attention that we propose for scattered-field data learns microwave visual features well and improves classification performance. It is better suited to extracting and learning these features than a real-valued network, and is both novel and practical.
Disclosure of Invention
The invention provides a target classification method based on microwave vision. The invention applies the microwave visual features of electromagnetic scattered-field data to target classification in difficult scenes, and builds an end-to-end complex convolutional neural network model with an attention mechanism. The model learns the microwave visual features of complex-valued scattered-field data, and the attention mechanism detects the key locations of the target. Compared with learning machine-vision features from RGB images, the model can fully exploit the scattered field in a difficult scene and thus achieves better classification performance. At the same time, the complex convolutional neural network with attention extracts microwave visual features from scattered-field data better than a real-valued convolutional network.
A target classification method based on microwave vision comprises the following steps:
and (1) acquiring electromagnetic scattering field data by using a microwave transmitting antenna and a microwave receiving antenna and preprocessing the electromagnetic scattering field data.
Step (2), constructing a complex convolution neural network to realize target classification, wherein the network comprises the following steps: a plurality of convolution layers, a plurality of batch normalization layers and a plurality of active layers.
And (3) introducing an attention mechanism module in the complex convolution neural network in order to capture the details of the target in the image and realize the detection of the key position.
And (4) inputting the preprocessed electromagnetic scattering field data obtained in the step (1) as training data into the network model constructed in the steps (2) and (3), and training network parameters through a back propagation algorithm until the whole network model converges.
The electromagnetic scattered-field data acquired in step (1) are generated by simulation of a human hand model. Two models are used: a simple hand model with only a single skin layer, and a complex hand model consisting of skin, muscle and bone.
The complex convolutional neural network of step (2) is obtained by replacing the convolutional, batch normalization and activation layers of a real-valued network with complex counterparts. Most current deep-learning building blocks, techniques and architectures are based on real-valued computations and representations. However, recent analyses of recurrent neural networks, together with older fundamental theory, indicate that complex numbers have richer representational capacity and can implement a noise-robust memory retrieval mechanism. The mathematics of the complex convolutional layer, complex batch normalization layer and complex activation layer are described below:
2-1. Complex convolutional layers. Unlike a real convolution kernel, a complex kernel splits its parameters into a real part and an imaginary part, and the convolution follows the rules of complex multiplication. That is, for a convolution kernel W = A + iB and an input feature h = x + iy, the complex convolution is defined as

W * h = (A * x - B * y) + i(B * x + A * y)    (Formula 1)

or, written in matrix form,

[ R(W * h) ]   [ A  -B ]   [ x ]
[ I(W * h) ] = [ B   A ] * [ y ]    (Formula 2)

where * denotes the convolution operation, R(·) denotes taking the real part, and I(·) denotes taking the imaginary part.
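The complex convolution rule above can be checked numerically. The following is a minimal NumPy sketch (the helper names are ours, not from the patent): the complex kernel is applied by computing four real convolutions and combining them per Formula 1.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Plain real-valued 2D cross-correlation with "valid" padding.
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def complex_conv2d(h, W):
    # For kernel W = A + iB and input h = x + iy:
    # W * h = (A*x - B*y) + i(B*x + A*y)   (Formula 1)
    x, y = h.real, h.imag
    A, B = W.real, W.imag
    real = conv2d_valid(x, A) - conv2d_valid(y, B)
    imag = conv2d_valid(x, B) + conv2d_valid(y, A)
    return real + 1j * imag
```

Splitting the kernel this way makes a complex layer cost roughly four real convolutions, which is the price paid for modeling the real/imaginary interaction jointly.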
2-2. Complex batch normalization layers. When normalizing a complex-valued array, shifting and scaling it to zero mean and unit variance is not sufficient: it does not guarantee that the real and imaginary parts end up with the same variance. To achieve this, a covariance matrix is introduced:

x̃ = V^(-1/2) (x - E[x])    (Formula 3)

where x denotes the input, x̃ denotes the normalized result, and E[x] denotes the mean of x broadcast to the same shape as x. V denotes the 2×2 covariance matrix between the real and imaginary parts of x:

V = [ Cov(R(x), R(x))   Cov(R(x), I(x)) ]
    [ Cov(I(x), R(x))   Cov(I(x), I(x)) ]    (Formula 4)

where Cov(·, ·) denotes covariance.
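The whitening in Formulas 3–4 can be sketched in NumPy as follows. This is an illustration, not the patent's implementation; the eigendecomposition route to V^(-1/2) and the small eps stabilizer are our choices.

```python
import numpy as np

def complex_batch_norm(z, eps=1e-5):
    # Whiten a batch of complex values: after this, the real and imaginary
    # parts each have (approximately) unit variance and zero covariance.
    x = np.stack([z.real.ravel(), z.imag.ravel()])   # shape (2, N)
    mu = x.mean(axis=1, keepdims=True)
    xc = x - mu                                      # x - E[x]
    V = xc @ xc.T / xc.shape[1] + eps * np.eye(2)    # 2x2 covariance (Formula 4)
    # Inverse square root of the symmetric PD matrix V via eigendecomposition.
    w, U = np.linalg.eigh(V)
    V_inv_sqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    y = V_inv_sqrt @ xc                              # Formula 3
    return (y[0] + 1j * y[1]).reshape(z.shape)
```

A learnable affine transform (as in standard batch normalization) could follow the whitening; it is omitted here for brevity.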
2-3. Complex activation layers. The activation rule is comparatively simple: the real and imaginary parts are each rectified separately. The activation function used in the complex convolutional neural network is

CReLU(z) = ReLU(R(z)) + i·ReLU(I(z))    (Formula 5)

where z denotes the input and ReLU(·) denotes the activation function of a real-valued convolutional network.
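Formula 5 is a one-liner in NumPy (a sketch, not the patent's code):

```python
import numpy as np

def crelu(z):
    # CReLU(z) = ReLU(R(z)) + i*ReLU(I(z)): rectify the real and
    # imaginary parts independently (Formula 5).
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

# Example: crelu([1 - 2j, -3 + 4j]) -> [1 + 0j, 0 + 4j]
```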
And (4) introducing an attention mechanism module into the complex convolutional neural network.
As shown in fig. 4, in the attention mechanism module, first, maximum value sampling and mean value sampling are performed on the intermediate layer features on a spatial scale, then denoising and spatial projection are performed through a multilayer perceptron, after the sum of two sampling results is obtained, element-by-element product operation is performed on the spatial scale with the original features, and a high-response region on a channel scale is captured. Then, maximum value sampling and average value sampling on a channel scale are carried out again, and the maximum value sampling and the average value sampling are fused together by using a convolution layer, so that a high-response region of the feature on a spatial scale can be captured finally.
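The description above follows the channel-then-spatial attention pattern. A minimal NumPy sketch under that reading is given below; the weight names W1, W2, conv_w and their shapes are illustrative assumptions, since the patent does not specify them, and the spatial fusion here uses per-map scalar weights in place of a full convolution.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def attention_module(feat, W1, W2, conv_w):
    # feat: (C, H, W) intermediate feature map.
    # Channel attention: max/mean pooling over space, shared 2-layer MLP,
    # sum, sigmoid, then rescale each channel.
    mx = feat.max(axis=(1, 2))              # (C,) spatial max pooling
    av = feat.mean(axis=(1, 2))             # (C,) spatial mean pooling
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0)
    ch = sigmoid(mlp(mx) + mlp(av))         # (C,) channel weights in (0, 1)
    feat = feat * ch[:, None, None]
    # Spatial attention: max/mean pooling over channels, fuse (here with
    # scalar weights standing in for a conv layer), sigmoid, rescale.
    smx = feat.max(axis=0)                  # (H, W) channel max pooling
    sav = feat.mean(axis=0)                 # (H, W) channel mean pooling
    sp = sigmoid(conv_w[0] * smx + conv_w[1] * sav)
    return feat * sp[None, :, :]
```

Because both attention maps lie in (0, 1), the module can only attenuate features, which is how redundant regions get suppressed while high-response regions survive.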
In step (4), the parameters of the network built in steps (2) and (3) are trained by a back propagation algorithm until the whole model converges; the main objective is to maximize the classification accuracy of the trained model. A cross-entropy loss (categorical cross entropy) is used as the loss function.
The invention has the beneficial effects that:
the invention designs a special complex convolution neural network with an attention mechanism aiming at the microwave visual characteristic learning of electromagnetic scattered field data, and the learning effect of the complex convolution neural network is superior to that of a real convolution neural network. Meanwhile, the target classification method based on the microwave visual characteristics can still effectively classify the target when the RGB image cannot be applied in a specific difficult scene.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a basic framework diagram of a complex convolutional network classification model with attention mechanism constructed in the method of the present invention.
FIG. 3 is a structural diagram comparing the complex-valued network with the real-valued network.
FIG. 4 is a block diagram of the frame of the attention mechanism module incorporated in the model of the method of the present invention.
FIG. 5 shows experimental results of the method with respect to whether the attention mechanism module is included in the model and to the number of network layers.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a target classification method based on microwave vision.
We apply the method to the recognition task of a static gesture data set.
Step (1): acquire and preprocess the electromagnetic scattered-field data set of the target, as follows.
In a deep neural network, the convolutional and pooling layers progressively shrink the data, so the network input must meet a certain size to keep the computation valid. The scattered-field data obtained for this task are of size 16×32, which is inconvenient for the subsequent experiments. Therefore, before the raw scattered-field data are fed to the network, we interpolate them using cubic polynomial interpolation in Matlab; the interpolated scattered field is 64×64. The same interpolation is applied to both the simple and the complex hand model. Meanwhile, since the data set contains only 2,084 samples, the training and test sets are split in a 7:3 ratio, and the test set also serves as the validation set during training.
In summary, the data set obtained in step (1) is summarized in table 1; all subsequent experiments are performed on this preprocessed data set.
TABLE 1 overview of scattered field data
The complex convolutional neural network of step (2) is designed specifically to learn the microwave visual features of electromagnetic scattered-field data; its structure is shown in fig. 2, and it can better mine the relationship between the real and imaginary parts of a complex number. To exclude real-world interference factors such as noise and site constraints, noise-free, full-aperture scattered-field data of the simple hand model are used to verify the effectiveness of the complex network.
We construct a two-layer complex neural network from the complex convolutional, batch normalization and activation layers of step (2). As a control, a two-layer real-valued neural network is built in the same way; fig. 3 compares their structures. The two differ only in the components used, and their structure is otherwise identical.
The scattered-field data preprocessed in step (1) are fed into the complex network A and the real network B of fig. 3 respectively. For network B, we also examine the effect of the real and imaginary parts on the classification result separately. Specifically, for complex network A the complete scattered field is used as input, and the result is denoted "Complete"; for real network B, the result with the real part alone is denoted "Real", with the imaginary part alone "Imaginary", with the modulus of the two "Modulus", and with the complete scattered field (real and imaginary parts stacked) "Concat". The experimental results are shown in table 2:
TABLE 2 comparison of real and complex networks
Two points can be drawn from table 2:
1. Among the real-network results, "Concat" is best. This shows that the real and imaginary parts of the scattered field each carry important information about the target, but simply concatenating them in a real network falls short of what a complex convolutional neural network can achieve.
2. The "Complete" result is clearly better than every result obtained with the real network. Crudely treating the real and imaginary parts as two separate channels is therefore unreasonable: they should be modeled jointly, which demonstrates the necessity of processing scattered-field data with the proposed complex convolutional neural network.
On this basis, we introduce the attention mechanism of step (3); its structure is shown in fig. 4. To demonstrate its effect on capturing subtle features in scattered-field data, we ran a further experiment. Note that as a convolutional network deepens, the learned features become more abstract and focus on finer differences of the object itself, so the experiment also considers the influence of network depth on the attention mechanism.
The networks used here are essentially the complex network A of step (2), differing only in 1) the number of layers and 2) whether an attention layer is introduced after each convolutional layer. The input is the scattered-field data of the simple hand model. The results, averaged over five runs, are shown in fig. 5.
As fig. 5 shows, the complex network with attention performs better in both accuracy and loss. This is easy to interpret: the scattered field also carries redundant information that can interfere with the final classification, and the attention mechanism helps capture the key features while ignoring the irrelevant, redundant ones.
In step (4), the parameters of the network built in steps (2) and (3) are trained by a back propagation algorithm until the whole model converges, as follows:
4-1. The network is trained with stochastic gradient descent, learning rate 0.8, batch size 5, for 50 iterations; the loss function is the cross-entropy loss (categorical cross entropy).
4-2. Testing the network model. After training, the final model is evaluated on the test data using accuracy as the metric; table 3 below compares the accuracy with several traditional methods tested on the same data.
Table 3 results of the comparison of the accuracy (%) with other methods
Data | LeNet | AlexNet | VGG11 | Our-C | Our
---|---|---|---|---|---
Image | 97.9 | 99.2 | 99.4 | - | -
Scattered field | 91.7 | 81.9 | 98.2 | 99.0 | 99.5
LeNet, AlexNet and VGG11 are classical machine-vision neural network models; when they accept scattered-field data, the real and imaginary parts are simply treated as two channels, analogous to the three channels of an RGB image. "Our-C" and "Our" both use the complex convolutional neural network with attention built in steps (2) and (3): "Our-C" takes the scattered field of the complex hand model as input, and "Our" that of the simple hand model. Three conclusions can be drawn from table 3:
1. From the results of LeNet, AlexNet and VGG11, with image input a deeper model learns target features better and reaches higher accuracy. With scattered-field input this no longer holds: a shallow network may suit scattered-field data better, and a deep one may destroy their original intrinsic structure.
2. With scattered-field input, our proposed model is clearly superior to the networks designed for machine-vision features, showing that conventional visual networks (LeNet, AlexNet and VGG11) cannot effectively capture the intrinsic correlation between the real and imaginary parts of the scattered field.
3. "Our-C" differs from "Our" by only 0.5%: even with the complex structure of the hand, our model still recognizes the target effectively. This means that when the microwave frequency is below 3 GHz, the simple hand model is equivalent to the complex hand model.
4-3. Testing the noise resistance of the model. In practice, collected scattered-field data are inevitably noisy. To test the noise immunity of the proposed model, we added white Gaussian noise to the collected data at four levels: 0 dB, 10 dB, 20 dB and 30 dB, where 0 dB means no noise added. The results are shown in table 4.
TABLE 4 Classification results at different noise levels
It can be seen that at 30 dB the classification accuracy already matches the noise-free result, and even under significant noise the model still gives meaningful results. This demonstrates the superior noise resistance of our model.
Claims (4)
1. A target classification method based on microwave vision is characterized by comprising the following steps:
step (1), acquiring electromagnetic scattered-field data with a microwave transmitting antenna and a microwave receiving antenna and preprocessing the data;
step (2), constructing a complex convolutional neural network for target classification, the network comprising: several complex convolutional layers, complex batch normalization layers and complex activation layers;
step (3), introducing an attention mechanism module into the complex convolutional neural network, in order to capture target details in the image and detect key locations;
step (4), feeding the preprocessed electromagnetic scattered-field data of step (1) into the network model built in steps (2) and (3) as training data, and training the network parameters by a back propagation algorithm until the whole network model converges.
2. The method for classifying targets based on microwave vision according to claim 1, wherein the complex convolutional neural network of step (2) is obtained by replacing the convolutional layers, batch normalization layers and activation layers of a real-valued convolutional neural network with complex counterparts, specifically as follows:
2-1. Complex convolutional layers: the complex convolution kernel splits its parameters into a real part and an imaginary part, and the convolution follows the rules of complex multiplication; that is, for a convolution kernel W = A + iB and an input feature h = x + iy, the complex convolution is defined as

W * h = (A * x - B * y) + i(B * x + A * y)    (Formula 1)

written in matrix form:

[ R(W * h) ]   [ A  -B ]   [ x ]
[ I(W * h) ] = [ B   A ] * [ y ]    (Formula 2)

wherein * denotes the convolution operation, R(·) denotes taking the real part, and I(·) denotes taking the imaginary part;
2-2. Complex batch normalization layers: to ensure that the real part and the imaginary part have the same variance, the covariance matrix of the real and imaginary parts is introduced; the specific formula is:
x̃ = V^(-1/2) (x - E[x]) # (formula 3)
wherein x denotes the input, x̃ denotes the normalized result, and E[x] denotes the mean of x, broadcast to a matrix of the same shape as x; V denotes the covariance matrix between the real and imaginary parts of x, calculated as:
V = [Cov(R(x), R(x)), Cov(R(x), I(x)); Cov(I(x), R(x)), Cov(I(x), I(x))] # (formula 4)
wherein Cov(·, ·) denotes the covariance;
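A hedged sketch of this whitening-style normalization, assuming a 1-D batch of complex samples (function name and `eps` stabilizer are illustrative, not from the patent):

```python
import numpy as np

def complex_batch_norm(x, eps=1e-5):
    """Complex batch normalization per formulas 3-4: center the input,
    then multiply the (real, imag) 2-vector by the inverse square root
    of its 2x2 covariance matrix V, equalizing the two variances."""
    centered = x - x.mean()
    r, i = centered.real, centered.imag
    V = np.cov(np.stack([r, i])) + eps * np.eye(2)
    # inverse matrix square root of V via eigendecomposition
    w, U = np.linalg.eigh(V)
    V_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    out = V_inv_sqrt @ np.stack([r, i])
    return out[0] + 1j * out[1]

rng = np.random.default_rng(0)
z = rng.normal(size=200) * 2 + 1j * rng.normal(size=200) * 5
zn = complex_batch_norm(z)
# after whitening, real and imaginary parts have (approximately) unit variance
assert abs(np.var(zn.real) - 1) < 0.05 and abs(np.var(zn.imag) - 1) < 0.05
```

Unlike two independent real batch norms, the V^(-1/2) multiplication also removes the correlation between real and imaginary parts.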
2-3. Complex activation layers: the values of the real part and the imaginary part are rectified separately; the complex activation function used is:
CReLU(z) = ReLU(R(z)) + iReLU(I(z)) # (formula 5)
wherein z denotes the input and ReLU(·) denotes the activation function used in real-valued networks.
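Formula 5 reduces to two element-wise rectifications; a minimal sketch:

```python
import numpy as np

def crelu(z):
    """CReLU (formula 5): apply ReLU separately to the real and
    imaginary parts of the complex input z."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

# negative components are zeroed in each part independently
assert crelu(np.array([1 - 2j, -3 + 4j])).tolist() == [(1 + 0j), 4j]
```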
3. The target classification method based on microwave vision according to claim 2, wherein in the attention mechanism module introduced into the complex convolutional neural network in step (3): first, maximum pooling and mean pooling are each applied once to the intermediate-layer features on the spatial scale; the pooled results are denoised and spatially projected by a multilayer perceptron; after the two pooling results are summed, an element-wise product with the original features is computed, capturing the high-response regions on the channel scale; then, maximum pooling and mean pooling are applied again on the channel scale and fused with a convolutional layer, so that the high-response regions of the features on the spatial scale are finally captured.
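The two-stage (channel, then spatial) attention described above can be sketched as follows on a real-valued feature map; all weight names are illustrative, and a simple weighted sum stands in for the fusing convolution layer:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def attention_block(feat, W1, W2, fuse_w):
    """Attention sketch on a (C, H, W) feature map: channel attention
    from spatial max/mean pooling through a shared MLP, then spatial
    attention from channel max/mean pooling fused by fuse_w."""
    C = feat.shape[0]
    # channel attention: max and mean pooling over the spatial scale
    mx = feat.reshape(C, -1).max(axis=1)
    av = feat.reshape(C, -1).mean(axis=1)
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)  # multilayer perceptron
    ch = sigmoid(mlp(mx) + mlp(av))               # sum of the two results
    feat = feat * ch[:, None, None]               # element-wise product
    # spatial attention: max and mean pooling over the channel scale
    smx, sav = feat.max(axis=0), feat.mean(axis=0)
    sp = sigmoid(fuse_w[0] * smx + fuse_w[1] * sav)
    return feat * sp

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 2))
out = attention_block(feat, W1, W2, np.array([0.5, 0.5]))
assert out.shape == feat.shape
```

In the patented network the same mechanism is applied inside the complex-valued pipeline; here the real-valued case suffices to show the data flow.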
4. The target classification method based on microwave vision according to claim 3, wherein in step (4) the model parameters of the neural network constructed in steps (2) and (3) are trained through the back propagation algorithm until the whole network model converges, the main purpose being to maximize the classification accuracy of the trained model; for this purpose, the cross-entropy loss function is used as the loss function.
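A minimal, numerically stable sketch of the cross-entropy loss named in claim 4 (single-sample form; names are illustrative):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample: shift logits by their max
    for numerical stability, take log-softmax, return the negative
    log-probability of the true class."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

logits = np.array([2.0, 0.5, -1.0])
# the loss is smaller when the label matches the largest logit
assert cross_entropy(logits, 0) < cross_entropy(logits, 2)
```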
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011078852.4A CN112329538B (en) | 2020-10-10 | Target classification method based on microwave vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112329538A true CN112329538A (en) | 2021-02-05 |
CN112329538B CN112329538B (en) | 2024-07-02 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016033990A1 (en) * | 2014-09-01 | 2016-03-10 | 华为技术有限公司 | Method and device for generating detection model, and target detection method and device |
CN106934419A (en) * | 2017-03-09 | 2017-07-07 | 西安电子科技大学 | Classification of Polarimetric SAR Image method based on plural profile ripple convolutional neural networks |
CN111123183A (en) * | 2019-12-27 | 2020-05-08 | 杭州电子科技大学 | Rapid magnetic resonance imaging method based on complex R2U _ Net network |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113109780A (en) * | 2021-03-02 | 2021-07-13 | 西安电子科技大学 | High-resolution range profile target identification method based on complex number dense connection neural network |
CN113109780B (en) * | 2021-03-02 | 2022-08-05 | 西安电子科技大学 | High-resolution range profile target identification method based on complex number dense connection neural network |
CN113537120A (en) * | 2021-07-28 | 2021-10-22 | 中国人民解放军空军预警学院 | Convolutional neural network based on complex coordinate attention module and target identification method |
CN113537120B (en) * | 2021-07-28 | 2023-04-07 | 中国人民解放军空军预警学院 | Complex convolution neural network target identification method based on complex coordinate attention |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yi et al. | An end‐to‐end steel strip surface defects recognition system based on convolutional neural networks | |
CN110084173B (en) | Human head detection method and device | |
Khan et al. | Lungs nodule detection framework from computed tomography images using support vector machine | |
CN108830157B (en) | Human behavior identification method based on attention mechanism and 3D convolutional neural network | |
Zhang et al. | Modified U-Net for plant diseased leaf image segmentation | |
CN110619352A (en) | Typical infrared target classification method based on deep convolutional neural network | |
Rani et al. | Efficient 3D AlexNet architecture for object recognition using syntactic patterns from medical images | |
CN109033978B (en) | Error correction strategy-based CNN-SVM hybrid model gesture recognition method | |
CN110991402A (en) | Skin disease classification device and method based on deep learning | |
Soni et al. | Hybrid meta-heuristic algorithm based deep neural network for face recognition | |
CN112288011A (en) | Image matching method based on self-attention deep neural network | |
CN106650617A (en) | Pedestrian abnormity identification method based on probabilistic latent semantic analysis | |
CN110610143A (en) | Crowd counting network method, system, medium and terminal for multi-task joint training | |
Zhang et al. | High-quality face image generation based on generative adversarial networks | |
CN110675379A (en) | U-shaped brain tumor segmentation network fusing cavity convolution | |
CN112991278A (en) | Method and system for detecting Deepfake video by combining RGB (red, green and blue) space domain characteristics and LoG (LoG) time domain characteristics | |
CN112560710B (en) | Method for constructing finger vein recognition system and finger vein recognition system | |
CN112507621A (en) | Radar interference semi-supervised identification method based on label propagation | |
CN114219824A (en) | Visible light-infrared target tracking method and system based on deep network | |
CN109343701A (en) | A kind of intelligent human-machine interaction method based on dynamic hand gesture recognition | |
Ahmed et al. | Improve of contrast-distorted image quality assessment based on convolutional neural networks. | |
Singh et al. | Performance enhancement of salient object detection using superpixel based Gaussian mixture model | |
CN111144497B (en) | Image significance prediction method under multitasking depth network based on aesthetic analysis | |
Yuan et al. | Explore double-opponency and skin color for saliency detection | |
CN112329538B (en) | Target classification method based on microwave vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |