CN114494339A - Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm - Google Patents

Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm Download PDF

Info

Publication number
CN114494339A
CN114494339A (application CN202111639385.2A; granted as CN114494339B)
Authority
CN
China
Prior art keywords
target
tracking
damdnet
unmanned aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111639385.2A
Other languages
Chinese (zh)
Other versions
CN114494339B (en)
Inventor
于倩倩
郑钰辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202111639385.2A priority Critical patent/CN114494339B/en
Publication of CN114494339A publication Critical patent/CN114494339A/en
Application granted granted Critical
Publication of CN114494339B publication Critical patent/CN114494339B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unmanned aerial vehicle (UAV) target tracking method based on the DAMDNet-EKF algorithm, comprising the following steps. Step 1: collect video frames and train a bounding-box regression model on the initial video frame using a preset tracking model; the tracking model is constructed on a DAMDNet neural network model and is pre-trained and initialized. Step 2: excluding the initial video frame, feed the remaining video frames to the tracking model to obtain each frame's target bounding box and the corresponding target score. Step 3: if the target score obtained in step 2 is smaller than a preset value, update the tracking model; if it is not less than the preset value, adjust the optimal target bounding box with the bounding-box regression model and update the tracking model. The invention effectively addresses the tendency of UAV tracking algorithms to lose the target and their poor real-time performance.

Description

Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm
Technical Field
The invention relates to the technical field of target tracking, in particular to an unmanned aerial vehicle target tracking method based on a DAMDNet-EKF algorithm.
Background
Target tracking is one of the research hotspots in the field of computer vision. Its main task is to follow a specific target and obtain its position, scale, and other information in each frame. Although academia and industry have made exciting progress in target tracking research, many problems and challenges remain: complex application environments, interference from similar backgrounds, external factors such as illumination changes and occlusion, and target-side factors such as pose change, appearance deformation, scale change, fast motion, and motion blur.
Tracking methods based on classification networks treat tracking as a foreground/background binary classification task and select the candidate sample with the highest confidence in each frame. The MDNet method brought classification networks into the tracking field and, to handle the ambiguity between target and background across different videos (the target in one video can be a background object in another), introduced a multi-domain training scheme to learn robust, general target features. However, because features must be extracted from a large number of candidate samples, such classification-based trackers are very slow and cannot track in real time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm.
The invention adopts the following technical scheme for solving the technical problems:
The unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm provided by the invention comprises the following steps:
step 1, collecting video frames and training a bounding-box regression model on the initial video frame through a preset tracking model; the tracking model is constructed on a DAMDNet neural network model and is pre-trained and initialized;
step 2, excluding the initial video frame and using the remaining video frames as input to the tracking model to obtain each frame's target bounding box and the corresponding target score;
step 3, if the target score obtained in step 2 is smaller than a preset value, updating the tracking model; if it is not less than the preset value, adjusting the optimal target bounding box with the bounding-box regression model and updating the tracking model.
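The three steps above can be sketched as a per-frame loop. In the sketch below, `track_model` and `bbox_regressor` are hypothetical stand-ins (not named in the patent) for the pre-trained DAMDNet tracking model and the bounding-box regression model; 0.5 is the failure threshold the description uses later.

```python
# Sketch of the per-frame tracking loop in steps 1-3. `track_model` and
# `bbox_regressor` are hypothetical stand-ins for the pre-trained DAMDNet
# tracking model and the bounding-box regression model.

SCORE_THRESHOLD = 0.5  # "preset value"; 0.5 is used later in the description

def track_sequence(frames, track_model, bbox_regressor):
    """Apply steps 2-3 to every frame after the initial one."""
    results = []
    for frame in frames[1:]:                  # step 2: skip the initial frame
        box, score = track_model(frame)       # bounding box + target score
        if score < SCORE_THRESHOLD:           # step 3: tracking failed
            track_model.update_short_term()   # update the tracking model
        else:
            box = bbox_regressor.refine(box)  # adjust the optimal box
            track_model.update()
        results.append((box, score))
    return results
```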
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the DAMDNet neural network model comprises an input layer, convolutional layers, a domain adaptive component, a spatial pyramid pooling layer, and fully connected layers, connected in sequence.
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the learning target f* of the domain adaptive component is:

$$f^* = \arg\min_{f \in H} \frac{1}{B}\sum_{i=1}^{B} l\big(f(x_i), y_i\big) + \lambda R(f)$$

where f(·) is the objective function, i.e., the probability that x_i in the current frame is the target; B denotes the number of samples; x_i represents the i-th candidate box in the current video frame; y_i is the label of x_i, with y_i = 1 when x_i is the target; l(·) is a loss function; λ is a regularization parameter; and R(·) is a regularization term, i.e., a measure of the complexity of the DAMDNet neural network model. {x_i^s} and {x_i^t} represent samples of the source domain and the target domain, respectively; f is the true label function over the data distribution space H; and 𝒟 is the data distribution space of the samples.
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the convolutional layers and the domain adaptive component in the DAMDNet neural network model are pre-trained using the image-classification dataset ILSVRC.
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the spatial pyramid pooling layer is as follows:
let the output feature of the domain adaptive component have size x, where x is the feature width, so that the input of the spatial pyramid pooling layer is x; with convolution-kernel size k, padding p, and stride s, the size of the output feature map x_j of the spatial pyramid pooling layer is

$$x_j = \left\lfloor \frac{x - k + 2p}{s} \right\rfloor + 1$$

By virtue of the two mapping relations of point mapping and edge mapping, features of different sizes are taken as input, and the spatial pyramid pooling layer outputs candidate boxes of the same size, which are fed directly to the fully connected layer.
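A minimal sketch of the size computation above, assuming the standard convolution/pooling output-size formula ⌊(x − k + 2p)/s⌋ + 1 (the patent's own formula is given only as an image). `spp_params_for_level` is an illustrative helper, not from the patent, showing how an SPP level can pick k and s so the output size is fixed:

```python
import math

def spp_output_size(x: int, k: int, p: int, s: int) -> int:
    """Output feature-map size for input width x, kernel size k, padding p,
    stride s (standard convolution/pooling size formula)."""
    return math.floor((x - k + 2 * p) / s) + 1

def spp_params_for_level(x: int, bins: int):
    """Illustrative helper: kernel and stride that pool an x-wide feature
    map into exactly `bins` output bins, which is how SPP produces a
    fixed-size output from variable-size input."""
    k = math.ceil(x / bins)
    s = math.floor(x / bins)
    return k, s
```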
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, in step 3 the optimal target bounding box is the target bounding box with the highest target score:

$$x^* = \arg\max_{x_i} f^{+}(x_i)$$

where x* is the target bounding box with the highest target score, x_i is a video-frame target bounding box output by the tracking model, and f^+(x_i) is the target score of x_i.
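The selection above is a plain argmax over the per-candidate scores; a minimal sketch:

```python
# x* = argmax_i f+(x_i): pick the candidate bounding box whose
# positive-class (target) score is highest.

def best_bounding_box(boxes, scores):
    """Return (box, score) for the candidate with the highest target score."""
    best_i = max(range(len(scores)), key=lambda i: scores[i])
    return boxes[best_i], scores[best_i]
```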
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the target bounding box obtained in step 2 is corrected; the correction combines the deep convolutional neural network method with the EKF method. The specific steps are:
step 2.1, mapping the target bounding box to obtain, in real time, the motion state and direction of the target to be tracked;
step 2.2, performing Canny algorithm contour detection according to the motion state and direction of the target to be tracked, and solving for the pixel coordinates of the target bounding box;
step 2.3, fine-tuning the contour pixel coordinates of the target edge so that the unmanned aerial vehicle can follow the target at any angle within the same plane.
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the mapping in step 2.1 is:

h(x′) → Hx′,  H_j → H,
x′ = f(x, u) → Fx + u,  F_j → F

where H is the measurement matrix obtained by linearizing h, x′ is the predicted pixel coordinate of the target bounding box, → denotes the mapping, f(x, u) and h(x′) are nonlinear functions, and u is the predicted Gaussian noise; H_j is the target bounding box position matrix at the correction instant, F_j is the predicted future target bounding box position matrix, and F is the target bounding box position transition matrix.
As a further optimization of the unmanned aerial vehicle target tracking method based on the DAMDNet-EKF algorithm, the specific process of steps 2.1–2.3 is as follows:
according to the real-time position matrix of the target bounding box obtained in step 2.1, Canny algorithm contour detection is performed on the target to be tracked, solving for the i-th target bounding-box pixel coordinate (x_i, y_i); the image centroid of the target is then obtained:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where n is the number of pixel coordinates, x̄ represents the mean of the target bounding-box pixel abscissas, and ȳ represents the mean of the target bounding-box pixel ordinates.
The abscissa difference x_e and the ordinate difference y_e of the target bounding-box pixels are computed, and the unmanned aerial vehicle executes the corresponding control law [the difference and control formulas are given only as images in the original], where a and b are fine-tuning coefficients (a > 0, b > 0) and x_max, y_max are the maximum values of the target bounding-box pixel abscissa and ordinate, respectively.
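A pure-Python sketch of the centroid step in steps 2.1–2.3, assuming the contour pixel coordinates (x_i, y_i) are already available (a real pipeline would obtain them with cv2.Canny / cv2.findContours). `center_offset` is one plausible reading of the "difference" (x_e, y_e) — the centroid's offset from the frame centre — since the exact formula appears only as an image in the original:

```python
# Centroid of the target contour and its offset from the frame centre.
# `center_offset` is an assumed interpretation of (x_e, y_e), not the
# patent's exact formula.

def centroid(points):
    """Image centroid: mean of the contour pixel coordinates."""
    n = len(points)
    x_bar = sum(p[0] for p in points) / n
    y_bar = sum(p[1] for p in points) / n
    return x_bar, y_bar

def center_offset(points, frame_w, frame_h):
    """Offset (x_e, y_e) of the contour centroid from the frame centre."""
    x_bar, y_bar = centroid(points)
    return x_bar - frame_w / 2, y_bar - frame_h / 2
```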
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
according to the invention, a target tracking model based on MDNet improvement is adopted, and a tracking task is executed through pre-training and initialization; the hysteresis processed by the tracking algorithm is well compensated through an EKF algorithm; the problem that the tracking failure of the unmanned aerial vehicle tracking algorithm is easy to occur and the real-time performance is poor is effectively solved.
Drawings
Fig. 1 is a block diagram of an unmanned aerial vehicle target tracking control system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a DAMDNet-EKF tracking algorithm model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a block diagram of an unmanned aerial vehicle target tracking control system according to an embodiment of the present invention, and as shown in fig. 1, the structure includes:
the quad-rotor unmanned aerial vehicle adopts ARM as the main processor to control the speed of four motors, utilizes DJI miscalculations to process the target image that the camera obtained, DJI miscalculations output the control command that unmanned aerial vehicle flies and send to unmanned aerial vehicle flight control system.
The vehicle-mounted camera acquires a standard video signal and transmits the standard video signal to the FGPA for image decoding and analog-to-digital conversion. And the wireless transmission module is controlled by the FGPA to transmit the field of view information of the camera to the ground station. The ground station receives the image information and decompresses it for display on the display. While the user will select the target to track on the screen. The data and control parameters are transmitted to the FGPA through a suitable tracking algorithm, and the FGPA transmits the information to the ARM master module. The main control module is used for carrying out attitude calculation to control the airplane based on the template data, the control parameters, the GPS signals and attitude information of the gyroscope, the magnetometer and the accelerometer. At the same time, the FGPA will also control the camera via the attitude signal from the ARM.
The collected video frames are passed through a preset tracking model to obtain each frame's target bounding box and the corresponding target score; the tracking model is constructed on a DAMDNet neural network model and obtained after pre-training and initialization.
As shown in fig. 2, the DAMDNet neural network model comprises, in sequence, an input layer, convolutional layers, a domain adaptation component, an adaptive spatial pyramid pooling layer, and fully connected layers. The input layer takes 107 × 107 images; the convolutional layers may be three sequentially connected layers conv1–conv3; the domain adaptation component may be conv2_1 and conv3_1; the adaptive spatial pyramid pooling layer may be an SPP layer; and the fully connected layer is the binary domain-specific layer fc6, with one such layer per training video during training. Pre-training captures the general characteristics of each video in the training set, yielding a general representation for the tracking model and trained conv1–conv3_1 layers.
Each time a tracking task starts, the tracking model — on top of its pre-trained weights — initializes the fc6 layer for the current task using the first collected video frame. An example of the initialization process:
First, the first video frame is collected and its ground-truth target position determined; 1000 qualifying candidate regions are generated from a multi-dimensional Gaussian distribution, with a candidate-region IoU (Intersection over Union) of 0.6. The candidate regions are fed into the tracking model to obtain the features of the domain adaptation components conv2_1 and conv3_1, which are combined with the ground-truth position for bounding-box regression training. Next, features are extracted for the 500 positive and 5000 negative sample regions generated from the first frame. The tracking model is then trained iteratively: each iteration randomly selects the features of 32 positive sample regions and 1024 negative sample regions to form a mini-batch. The 1024 negative regions are fed through the model in a loop to compute scores, and the 96 highest-scoring negatives are selected as hard negative regions. The scores of the positive regions and the hard negative regions are computed, the loss is obtained by forward propagation, and finally optimization and parameter updates yield the initialized fc6 layer.
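The hard-negative selection above can be sketched as follows; `score_fn` is a hypothetical stand-in for a forward pass of the tracking model, and the pool and keep sizes (1024, 96) follow the numbers in the text:

```python
import random

def mine_hard_negatives(negatives, score_fn, pool_size=1024, keep=96):
    """Score a random pool of negative sample regions and keep the `keep`
    highest-scoring ones as hard negatives. `score_fn` stands in for a
    forward pass through the tracking model."""
    pool = random.sample(negatives, min(pool_size, len(negatives)))
    pool.sort(key=score_fn, reverse=True)  # highest score = hardest negative
    return pool[:keep]
```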
If tracking of a collected video frame is determined to have failed (target score below 0.5), a short-term model update is performed: the most recent frames corresponding to the short-term update window — e.g. the latest 20 frames — are selected, and the pre-stored positive and negative sample regions of those 20 frames are extracted to iteratively train the tracking model.
In fact, whenever a collected frame is tracked successfully, positive and negative sample regions are generated from the obtained target bounding box, forward-propagated, and the corresponding conv2_1 and conv3_1 features are computed and stored.
The update strategy also defines a long-term update: the model is updated periodically at a preset long-term period (e.g. every 8 or 10 frames), using the positive and negative sample regions of the collected frames within the long-term window for iterative training.
The model tracking unit passes the collected video frames through the preset tracking model to obtain each frame's target bounding box and the corresponding target score; the tracking model is constructed on a DAMDNet neural network model, pre-trained, and initialized. The model updating unit performs the short-term update when tracking of a collected frame is determined to have failed according to the preset update strategy, and additionally performs the periodic long-term update.
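The combined update strategy can be sketched as a small scheduler; the class and its defaults (20-frame window, 10-frame period, 0.5 threshold) are illustrative values taken from the examples in the text:

```python
class UpdateScheduler:
    """Decide after each frame whether to run a short-term update (tracking
    failure: score < threshold) or the periodic long-term update."""

    def __init__(self, short_window=20, long_period=10, threshold=0.5):
        self.short_window = short_window  # frames used by a short-term update
        self.long_period = long_period
        self.threshold = threshold
        self.frame_count = 0

    def decide(self, score):
        """Return "short", "long", or None for the current frame."""
        self.frame_count += 1
        if score < self.threshold:
            return "short"   # retrain on the last `short_window` frames
        if self.frame_count % self.long_period == 0:
            return "long"    # periodic long-term update
        return None
```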
An EKF (Extended Kalman Filter) is used to correct the current measurement and perform real-time recursive filtering, preventing instability of the UAV when the followed target shakes violently; the EKF can also predict the next state, compensating well for the lag of the DAMDNet algorithm. The principle is:

h(x′) → Hx′,  H_j → H,
x′ = f(x, u) → Fx + u,  F_j → F

where H is the measurement matrix obtained by linearizing h, x′ is the predicted pixel coordinate of the target bounding box, → denotes the mapping, and u is the predicted Gaussian noise; H_j is the target bounding box position matrix at the correction instant, and F_j is the predicted future target bounding box position matrix; f(x, u) and h(x′) are the nonlinear state-transition and measurement functions. To obtain a more accurate state prediction, the mapped matrices H_j and F_j are the Jacobians

$$F_j = \left.\frac{\partial f}{\partial x}\right|_{\hat{x},\,u}, \qquad H_j = \left.\frac{\partial h}{\partial x'}\right|_{\hat{x}'}$$
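A minimal predict/correct sketch for a single bounding-box coordinate under a constant-velocity model. This is an illustration of the EKF cycle, not the patent's implementation; with this linear f and h the Jacobians F_j and H_j reduce to the constant matrices F = [[1, dt], [0, 1]] and H = [1, 0]:

```python
# One predict+correct step for state x = [position, velocity] with 2x2
# covariance P (nested lists, no dependencies). Measurement z is the
# observed position of the bounding-box coordinate.

def kf_step(x, P, z, q=1e-2, r=1.0, dt=1.0):
    # Predict: x' = F x,  P' = F P F^T + q*I
    xp = [x[0] + dt * x[1], x[1]]
    Pp = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q,
           P[0][1] + dt * P[1][1]],
          [P[1][0] + dt * P[1][1],
           P[1][1] + q]]
    # Correct with H = [1, 0]: S = H P' H^T + r,  K = P' H^T / S
    S = Pp[0][0] + r
    K = [Pp[0][0] / S, Pp[1][0] / S]   # Kalman gain
    y = z - xp[0]                      # innovation
    xn = [xp[0] + K[0] * y, xp[1] + K[1] * y]
    Pn = [[(1 - K[0]) * Pp[0][0], (1 - K[0]) * Pp[0][1]],
          [Pp[1][0] - K[1] * Pp[0][0], Pp[1][1] - K[1] * Pp[0][1]]]
    return xn, Pn
```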
Through this processing, the motion state and direction of the target are obtained in real time. Canny algorithm contour detection is performed on the target to be tracked, solving for the i-th target bounding-box pixel coordinate (x_i, y_i); the image centroid of the target is then obtained:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where n is the number of pixel coordinates, x̄ is the mean of the target bounding-box pixel abscissas, and ȳ is the mean of the target bounding-box pixel ordinates.
The abscissa difference x_e and the ordinate difference y_e of the target bounding-box pixels are computed, and the unmanned aerial vehicle executes the corresponding control law [the difference and control formulas are given only as images in the original], where a and b are fine-tuning coefficients (a > 0, b > 0) and x_max, y_max are the maximum values of the target bounding-box pixel abscissa and ordinate, respectively. Vectorizing x_e and y_e and summing the vectors yields (x_e, y_e), so that the UAV can follow the target at any angle within the same plane. This completes the overall algorithm design for UAV target tracking; the predictive idea of the EKF compensates for the tracker's tendency to fail and for its insufficient real-time performance.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (9)

1. An unmanned aerial vehicle target tracking method based on a DAMDNet-EKF algorithm, characterized by comprising the following steps:
step 1, collecting video frames and training a bounding-box regression model on the initial video frame through a preset tracking model; the tracking model is constructed on a DAMDNet neural network model and is pre-trained and initialized;
step 2, excluding the initial video frame and using the remaining video frames as input to the tracking model to obtain each frame's target bounding box and the corresponding target score;
step 3, if the target score obtained in step 2 is smaller than a preset value, updating the tracking model; if it is not less than the preset value, adjusting the optimal target bounding box with the bounding-box regression model and updating the tracking model.
2. The method of claim 1, wherein the DAMDNet neural network model comprises an input layer, a convolutional layer, a domain adaptive component, a spatial pyramid pooling layer, and a fully connected layer, connected in sequence.
3. The method of claim 2, wherein the learning target f* of the domain adaptive component is:

$$f^* = \arg\min_{f \in H} \frac{1}{B}\sum_{i=1}^{B} l\big(f(x_i), y_i\big) + \lambda R(f)$$

where f(·) is the objective function, i.e., the probability that x_i in the current frame is the target; B denotes the number of samples; x_i represents the i-th candidate box in the current video frame; y_i is the label of x_i, with y_i = 1 when x_i is the target; l(·) is a loss function; λ is a regularization parameter; and R(·) is a regularization term, i.e., a measure of the complexity of the DAMDNet neural network model. {x_i^s} and {x_i^t} represent samples of the source domain and the target domain, respectively; f is the true label function over the data distribution space H; and 𝒟 is the data distribution space of the samples.
4. The method of claim 2, wherein the convolutional layer and the domain adaptive component in the DAMDNet neural network model are pre-trained using the image-classification dataset ILSVRC.
5. The method for unmanned aerial vehicle target tracking based on the DAMDNet-EKF algorithm as claimed in claim 2, wherein the spatial pyramid pooling layer is as follows:
let the output feature of the domain adaptive component have size x, where x is the feature width, so that the input of the spatial pyramid pooling layer is x; with convolution-kernel size k, padding p, and stride s, the size of the output feature map x_j of the spatial pyramid pooling layer is

$$x_j = \left\lfloor \frac{x - k + 2p}{s} \right\rfloor + 1$$

By virtue of the two mapping relations of point mapping and edge mapping, features of different sizes are taken as input, and the spatial pyramid pooling layer outputs candidate boxes of the same size, which are input directly to the fully connected layer.
6. The method for tracking the target of the unmanned aerial vehicle based on the DAMDNet-EKF algorithm as claimed in claim 1, wherein in step 3 the optimal target bounding box is the target bounding box with the highest target score:

$$x^* = \arg\max_{x_i} f^{+}(x_i)$$

where x* is the target bounding box with the highest target score, x_i is a video-frame target bounding box output by the tracking model, and f^+(x_i) is the target score of x_i.
7. The method for unmanned aerial vehicle target tracking based on the DAMDNet-EKF algorithm as claimed in claim 1, wherein the target bounding box obtained in step 2 is corrected, the correction combining a deep convolutional neural network method with the EKF method; the specific steps are:
step 2.1, mapping the target bounding box to obtain, in real time, the motion state and direction of the target to be tracked;
step 2.2, performing Canny algorithm contour detection according to the motion state and direction of the target to be tracked, and solving for the pixel coordinates of the target bounding box;
step 2.3, fine-tuning the contour pixel coordinates of the target edge so that the unmanned aerial vehicle can follow the target at any angle within the same plane.
8. The method for unmanned aerial vehicle target tracking based on the DAMDNet-EKF algorithm as claimed in claim 7, wherein the mapping in step 2.1 is specifically:

h(x′) → Hx′,  H_j → H,
x′ = f(x, u) → Fx + u,  F_j → F

where H is the measurement matrix obtained by linearizing h, x′ is the predicted pixel coordinate of the target bounding box, → denotes the mapping, f(x, u) and h(x′) are nonlinear functions, and u is the predicted Gaussian noise; H_j is the target bounding box position matrix at the correction instant, F_j is the predicted future target bounding box position matrix, and F is the target bounding box position transition matrix.
9. The target tracking method of the unmanned aerial vehicle based on the DAMDNet-EKF algorithm as claimed in claim 7, wherein the specific process of steps 2.1–2.3 is as follows:
according to the real-time position matrix of the target bounding box obtained in step 2.1, Canny algorithm contour detection is performed on the target to be tracked, solving for the i-th target bounding-box pixel coordinate (x_i, y_i); the image centroid of the target is then obtained:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where n is the number of pixel coordinates, x̄ represents the mean of the target bounding-box pixel abscissas, and ȳ represents the mean of the target bounding-box pixel ordinates;
the abscissa difference x_e and the ordinate difference y_e of the target bounding-box pixels are computed, and the unmanned aerial vehicle executes the corresponding control law [the difference and control formulas are given only as images in the original], where a and b are fine-tuning coefficients (a > 0, b > 0) and x_max, y_max are the maximum values of the target bounding-box pixel abscissa and ordinate, respectively.
CN202111639385.2A 2021-12-29 2021-12-29 Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm Active CN114494339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111639385.2A CN114494339B (en) 2021-12-29 2021-12-29 Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111639385.2A CN114494339B (en) 2021-12-29 2021-12-29 Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm

Publications (2)

Publication Number Publication Date
CN114494339A true CN114494339A (en) 2022-05-13
CN114494339B CN114494339B (en) 2024-07-12

Family

ID=81507339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111639385.2A Active CN114494339B (en) 2021-12-29 2021-12-29 Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm

Country Status (1)

Country Link
CN (1) CN114494339B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036417A (en) * 2023-09-12 2023-11-10 南京信息工程大学 Multi-scale transducer target tracking method based on space-time template updating

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807795A (en) * 2019-10-31 2020-02-18 北方工业大学 MDnet-based unmanned aerial vehicle remote sensing target tracking method and device
CN111738111A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Road extraction method of high-resolution remote sensing image based on multi-branch cascade void space pyramid
CN113658110A (en) * 2021-07-22 2021-11-16 西南财经大学 Medical image identification method based on dynamic field adaptive learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hyeonseob Nam et al.: "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking", 2016 IEEE Conference on Computer Vision and Pattern Recognition, 12 December 2016 (2016-12-12), pages 4293-4302 *
Guo Xianjiu et al.: "UAV Target Tracking Method Based on VITAL", Journal of Liaoning Normal University (Natural Science Edition), vol. 42, no. 2, 30 June 2019 (2019-06-30), pages 187-196 *

Also Published As

Publication number Publication date
CN114494339B (en) 2024-07-12

Similar Documents

Publication Publication Date Title
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
CN111507271B (en) Airborne photoelectric video target intelligent detection and identification method
WO2020192736A1 (en) Object recognition method and device
CN109087329B (en) Human body three-dimensional joint point estimation framework based on depth network and positioning method thereof
JP2022515591A (en) 3D detection method, device, medium and device of target object
CN113065645B (en) Twin attention network, image processing method and device
CN113705769A (en) Neural network training method and device
US20210027207A1 (en) Cross-modality image generation
CN110009060B (en) Robustness long-term tracking method based on correlation filtering and target detection
CN111998862B (en) BNN-based dense binocular SLAM method
CN111105439A (en) Synchronous positioning and mapping method using residual attention mechanism network
CN114119659A (en) Multi-sensor fusion target tracking method
CN113359843A (en) Unmanned aerial vehicle autonomous landing method and device, electronic equipment and storage medium
CN116097307A (en) Image processing method and related equipment
CN114494339B (en) Unmanned aerial vehicle target tracking method based on DAMDNet-EKF algorithm
CN117572885B (en) Night tracking method, system and related device based on thermal infrared camera of unmanned aerial vehicle
CN113538527B (en) Efficient lightweight optical flow estimation method, storage medium and device
CN107798329A (en) Adaptive particle filter method for tracking target based on CNN
Schenkel et al. Domain adaptation for semantic segmentation using convolutional neural networks
CN115909110A (en) Lightweight infrared unmanned aerial vehicle target tracking method based on Simese network
KR20240012426A (en) Unconstrained image stabilization
CN114022516A (en) Bimodal visual tracking method based on high rank characteristics and position attention
Zhu et al. YOLO-SDLUWD: YOLOv7-based small target detection network for infrared images in complex backgrounds
Friedel Event-based visual-inertial odometry using smart features
CN117975201B (en) Training data generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant