CN112347993B - Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation - Google Patents

Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation

Info

Publication number
CN112347993B
CN112347993B (application CN202011379618.5A)
Authority
CN
China
Prior art keywords
vehicle
track
loss
module
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011379618.5A
Other languages
Chinese (zh)
Other versions
CN112347993A (en)
Inventor
张素民
支永帅
包智鹏
杨志
卢守义
孟志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202011379618.5A priority Critical patent/CN112347993B/en
Publication of CN112347993A publication Critical patent/CN112347993A/en
Application granted granted Critical
Publication of CN112347993B publication Critical patent/CN112347993B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/017Detecting movement of traffic to be counted or controlled identifying vehicles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a method for predicting the behavior and trajectory of expressway vehicles based on vehicle-unmanned aerial vehicle cooperation, which belongs to the field of intelligent transportation and comprises the following steps. Step S1: training data generation obtains ideal data input. Step S2: training data generation acquires data with an unmanned aerial vehicle and then generates training data using a training data generation method. Step S3: the training data generated by training data generation are used to train a vehicle behavior and trajectory prediction model in offline training. Step S4: the training data output by training data generation are used, the parameters of the behavior and trajectory prediction model in offline training are saved, and the model is then used to predict behavior and trajectory in online prediction. The invention can accurately acquire the state information of surrounding vehicles and the behaviors those vehicles may take, improving both the prediction accuracy and the performance of the prediction algorithm.

Description

Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation
Technical Field
The invention belongs to the field of intelligent transportation, and particularly relates to a highway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation.
Background
With the rapid development of deep learning and intensive research in artificial intelligence, the field of automated driving is undergoing revolutionary change. A good autonomous vehicle should be able to predict the future evolution of traffic conditions accurately and in a timely manner.
As a structured road, the expressway is one of the most important scenarios in automated driving research; at the same time, because its road structure is simpler than that of urban roads, it is also one of the scenarios in which automated driving technology can be deployed relatively early.
However, even in a comparatively simple scenario such as an expressway, long-term (about 5 s) behavior and trajectory prediction for surrounding traffic vehicles still faces great challenges, mainly for the following reasons: 1) vehicle behavior is multi-modal; 2) the lidar sensors mounted on a vehicle have difficulty accurately acquiring the state information of all surrounding vehicles in the current scene, which greatly reduces prediction accuracy; 3) most current vehicle behavior and trajectory prediction methods assume a V2X background, which depends heavily on the deployment of vehicle-networking technology, cannot be realized in the short term, and carries an enormous cost; 4) vehicle behavior on an expressway is highly interactive, so if the behavior and positions of surrounding vehicles are not predicted well, the resulting driving behavior is either too aggressive or too conservative.
In current research on vehicle motion prediction, the behavior and the trajectory of a vehicle are generally studied separately; however, both belong to the motion state of the vehicle and are closely linked, so a method that can predict them simultaneously is needed. Second, existing behavior and trajectory prediction methods mainly consider the historical trajectories of the vehicles and the road conditions, and seldom involve the interaction between vehicles, which greatly reduces prediction accuracy. Finally, most existing methods rely on the various sensors mounted on the vehicle, such as lidar, millimeter-wave radar, and multiple on-board cameras, to obtain the historical trajectories of surrounding vehicles and their current positions. In addition, some methods rely on Internet-of-Vehicles technology, which can acquire information about all vehicles in the same scene, but given its current maturity this technology cannot be widely deployed in the short term.
Disclosure of Invention
The embodiments of the invention aim to provide a method for predicting the behavior and trajectory of expressway vehicles based on vehicle-unmanned aerial vehicle cooperation, solving the following problems of the prior art: the sensing signal is easily occluded by surrounding vehicles, the state information of surrounding vehicles cannot be accurately obtained, the functions of vehicle behavior and trajectory prediction systems are incomplete, the performance of the prediction algorithm is low, and the prediction accuracy is low.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a highway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation comprises the following steps:
step S1: the training data generation module acquires ideal data input;
step S2: the training data generation module acquires data by using the unmanned aerial vehicle provided with the GPS and the camera, and generates training data by using a training data generation method;
step S3: training a vehicle behavior and trajectory prediction model in an offline training module by using the training data generated by the training data generation module;
step S4: using the training data output by the training data generation module, saving the parameters of the behavior and trajectory prediction model in the offline training module, and then using the model to predict behavior and trajectory in the online prediction module.
Further, the data input of step S1 includes: the vehicle number v_ID of the predicted vehicle and its surrounding vehicles, the vehicle abscissa x and ordinate y of the predicted vehicle and its surrounding vehicles, the vehicle speed v_vel of the predicted vehicle and its surrounding vehicles, the vehicle acceleration v_acc of the predicted vehicle and its surrounding vehicles, and the lane number Lane_ID of the predicted vehicle and its surrounding vehicles.
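For illustration only, one possible in-memory representation of such an input record is sketched below; the dataclass, the CSV-loading helper and the column names are assumptions made for this sketch rather than part of the claimed method.

```python
# Illustrative sketch: a possible in-memory representation of one input record.
# Field names follow the identifiers above; the dataclass and CSV layout are assumptions.
import csv
from dataclasses import dataclass

@dataclass
class VehicleRecord:
    frame_ID: int    # video frame index
    v_ID: int        # vehicle number
    x: float         # abscissa in the real-world scene
    y: float         # ordinate in the real-world scene
    v_vel: float     # vehicle speed
    v_acc: float     # vehicle acceleration
    Lane_ID: int     # lane number occupied by the vehicle

def load_records(path):
    """Read a csv produced by the training-data generation stage (assumed column names)."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield VehicleRecord(
                frame_ID=int(row["frame_ID"]), v_ID=int(row["v_ID"]),
                x=float(row["x"]), y=float(row["y"]),
                v_vel=float(row["v_vel"]), v_acc=float(row["v_acc"]),
                Lane_ID=int(row["Lane_ID"]),
            )
```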
Further, the step S2 specifically includes:
step S21: acquiring the traffic scene video sequence transmitted by the unmanned aerial vehicle, and inputting the acquired video sequence into a camera matching module, a vehicle detection module and a lane line detection module, respectively;
step S22: the camera matching module calculates the internal and external parameters of the camera on the unmanned aerial vehicle; the vehicle detection module detects the vehicles in the traffic scene video transmitted by the video sequence module, and inputs the vehicle number v_ID, the frame number frame_ID, the vehicle type v_class, the vehicle bounding box information and the camera internal and external parameters calculated by the camera matching module into the track segment generation module to generate the track segments corresponding to each v_ID;
wherein the vehicle bounding box information includes the vehicle bounding box and the center coordinates of the vehicle bounding box;
step S23: the generated track segments are then input into an appearance modeling module and a loss calculation module; the appearance modeling module models the content in each vehicle bounding box of the track segments and inputs the appearance modeling loss into the loss calculation module; the loss calculation module combines the smoothness loss, the speed-change loss, the time-interval loss and the appearance modeling loss transmitted by the appearance modeling module to calculate the loss function of a track, where a smaller loss function value indicates a better track matching effect;
step S24: the calculated loss function is input into a track clustering module; the track clustering module comprises 5 operations, namely assignment, fusion, segmentation, exchange and interruption, and by comparing the change values of the loss function under these 5 operations, it finds the minimum change value and executes the operation corresponding to the minimum change of the loss function;
step S25: finally, the vehicle number v_ID, the vehicle speed v_vel, the vehicle acceleration v_acc, the vehicle abscissa x, the vehicle ordinate y, the vehicle type v_class, the frame number frame_ID and the Lane number Lane_ID detected by the lane line detection module are input into the csv data output module to obtain the corresponding data.
Further, in step S21, the unmanned aerial vehicle obtains its own position while acquiring a top view of the driving scene, and the information transmitted by the GPS on the unmanned aerial vehicle is used to keep the unmanned aerial vehicle always between 30 m and 150 m above the predicted vehicle;
after the camera matching module receives the video sequence, the internal and external parameters of the unmanned aerial vehicle camera are calibrated using a vanishing point method, and the calibrated parameters are then used to back-project the center coordinates of the vehicle bounding boxes detected by the vehicle detection module into 3-dimensional space to calculate the position coordinates of the vehicles in the real scene;
the vehicle detection module performs vehicle detection using a vehicle detection algorithm based on YOLOv4, and initializes the network with pre-trained weights;
the lane line detection module detects the lane lines in the video sequence using a semantic segmentation method, compares and matches the lane line detection results with the detected vehicle positions, and obtains the lane line information occupied by each vehicle ID in each frame.
Further, the step S22 of generating the track segment corresponding to each v_ID by the track segment generation module includes the following steps:
step S22.1: after the position coordinates in the real scene are obtained, the positions in the real scene corresponding to all detected vehicles in each picture are output, and by continuously inputting the vehicle positions in a number of adjacent pictures, the position coordinates of the vehicles at consecutive times are obtained;
step S22.2: the obtained vehicle position coordinates are grouped using a density-based clustering algorithm and divided into several different track segments, the number of which is the same as the number of vehicles appearing in the camera view;
step S22.3: finally, the track segments are connected using a clustering method to form longer tracks;
wherein the clustering operation is implemented by minimizing a clustering loss function, the loss function being:
$l=\sum_{i=1}^{n_v} l_i$

$l_i=\lambda_{sm}l_{i,sm}+\lambda_{vc}l_{i,vc}+\lambda_{ti}l_{i,ti}+\lambda_{ac}l_{i,ac}$

wherein l represents the loss function of all tracks in the scene; n_v denotes the number of vehicles in the video captured by the camera; l_i is the clustering loss of the i-th track; l_{i,sm} is the smoothness loss of track i; l_{i,vc} is the vehicle speed-change loss; l_{i,ti} is the time-interval loss between two adjacent track segments; l_{i,ac} is the vehicle appearance-change loss; λ_sm is the regularization parameter of the track smoothness loss, λ_vc the regularization parameter of the vehicle speed-change loss, λ_ti the regularization parameter of the time-interval loss between adjacent track segments, and λ_ac the regularization parameter of the vehicle appearance-change loss;
the smoothness loss measures the smoothness of the track segments belonging to the same track; the track segments in each track are ordered according to their time of arrival; the acceleration is used to measure the speed change between two adjacent track segments, and whether track segments belong to the track of the same vehicle is judged from the acceleration change in the connecting region of the track segments and from the time interval between them;
wherein the loss of change in appearance of the vehicle is calculated using an adaptive appearance model based on histograms.
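As an illustration only, the combination of the four loss terms into the per-track clustering loss can be sketched as follows; the individual terms are assumed to be computed by separate routines, and the weights shown are the empirical values 0.2, 8, 25 and 0.5 given later in the description.

```python
# Sketch of the per-track clustering loss l_i = λ_sm·l_sm + λ_vc·l_vc + λ_ti·l_ti + λ_ac·l_ac.
# The four loss terms (smoothness, speed change, time interval, appearance change) are
# assumed to be computed elsewhere; the weights are the empirical values from the description.
LAMBDA_SM, LAMBDA_VC, LAMBDA_TI, LAMBDA_AC = 0.2, 8.0, 25.0, 0.5

def track_loss(l_sm: float, l_vc: float, l_ti: float, l_ac: float) -> float:
    return LAMBDA_SM * l_sm + LAMBDA_VC * l_vc + LAMBDA_TI * l_ti + LAMBDA_AC * l_ac

def scene_loss(per_track_terms) -> float:
    """Total loss l = sum over all tracks in the scene of the per-track losses l_i."""
    return sum(track_loss(*terms) for terms in per_track_terms)
```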
Further, the specific process of step S23 is: the content of the image in the vehicle bounding box detected by the vehicle detection module, corresponding to the position coordinates of each track segment generated by the track segment generation module, is modeled; the features of the vehicle in the bounding box are described using an RGB color histogram, an HSV color histogram, a Lab color histogram, a local binary pattern histogram and a histogram of oriented gradients, so that the image content in each vehicle bounding box is accurately modeled; the consistency of the detected vehicle in each bounding box is determined by comparing the five histogram combinations of different bounding boxes; and the difference between two adjacent histogram combinations of all track segments in a track is used to define the loss of the adaptive appearance model, where a smaller loss value indicates a higher probability that the track segments in the track belong to the same track.
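A minimal sketch of this histogram-based appearance comparison is shown below; only RGB and HSV color histograms are computed (with OpenCV), the Lab, LBP and HOG descriptors are assumed to be added analogously, and the L1 distance used for comparison is an illustrative choice.

```python
# Sketch only: compares vehicle-bounding-box contents via color histograms.
# The patent combines RGB, HSV, Lab, LBP and HOG histograms; only two are shown here.
import cv2
import numpy as np

def appearance_descriptor(bbox_img: np.ndarray) -> np.ndarray:
    hists = []
    hsv = cv2.cvtColor(bbox_img, cv2.COLOR_BGR2HSV)
    for img in (bbox_img, hsv):                      # RGB-space and HSV-space histograms
        for ch in range(3):
            h = cv2.calcHist([img], [ch], None, [32], [0, 256])
            hists.append(cv2.normalize(h, None).flatten())
    return np.concatenate(hists)

def appearance_change_loss(desc_a: np.ndarray, desc_b: np.ndarray) -> float:
    """Smaller value -> the two boxes more likely show the same vehicle."""
    return float(np.abs(desc_a - desc_b).mean())
```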
Further, the assignment operation in the track clustering module in step S24 is specifically: the track set formed by all track segments t_j is recorded as T(j), and T(j) belongs to the track of vehicle j; all tracks are searched, and the track segment t_j is assigned to the track that produces the minimal loss; the loss change after the assignment operation is defined as:

$\Delta l_{j,as}=\left(l(T(j)\setminus t_j)+l(T(i)\cup t_j)\right)-\left(l(T(j))+l(T(i))\right)$

wherein Δl_{j,as} is the loss change after the assignment operation; l(T(j)\t_j) represents the loss of the track set T(j) after removing the track segment t_j from T(j); l(T(i)∪t_j) represents the loss of the track set T(i) after merging the track segment t_j into T(i); and l(T(j)) and l(T(i)) represent the losses of the track set T(j) and the track set T(i), respectively;

the fusion operation in the track clustering module is: for each track segment, if a smaller loss value can be obtained after fusing two track sets, the corresponding track set T(j) is fused with another track set T(i); similarly, the loss change of the fusion operation is defined as:

$\Delta l_{j,mg}=l(T(j)\cup T(i))-\left(l(T(j))+l(T(i))\right)$

wherein Δl_{j,mg} is the loss change after the fusion operation, and l(T(j)∪T(i)) represents the loss after fusing the track set T(j) and the track set T(i);

the segmentation operation in the track clustering module: a segmentation operation is used to separate a track segment from the current track set to form an independent track set; the loss change of the segmentation operation is defined as:

$\Delta l_{j,sp}=\left(l(t_j)+l(T(j)\setminus t_j)\right)-l(T(j))$

wherein Δl_{j,sp} is the loss change after the segmentation operation; l(t_j) is the loss produced by the track segment t_j separated from T(j); l(T(j)\t_j) represents the loss produced after t_j is removed from the track set T(j);

the exchange operation in the track clustering module is: for a track set T(j), the track segments after t_j are denoted T_aft(j) and the track segments before t_j are denoted T_bef(j); T_bef(i) and T_bef(j) are exchanged, and the loss change between after and before the exchange is obtained:

$\Delta l_{sw}=\left(l(T_{aft}(j)\cup T_{bef}(i))+l(T_{bef}(j)\cup T_{aft}(i))\right)-\left(l(T(j))+l(T(i))\right)$

wherein Δl_sw indicates the loss change after the exchange operation; l(T_aft(j)∪T_bef(i)) denotes the loss after fusing T_aft(j) and T_bef(i); l(T_bef(j)∪T_aft(i)) denotes the loss after fusing T_bef(j) and T_aft(i);

the interruption operation in the track clustering module: T(j) is divided into T_aft(j) and T_bef(j) to calculate the loss change of the interruption operation:

$\Delta l_{bk}=\left(l(T_{bef}(j))+l(T_{aft}(j))\right)-l(T(j))$

wherein Δl_bk indicates the loss change after the interruption operation; l(T_bef(j)) represents the loss of the track set T_bef(j), and l(T_aft(j)) represents the loss of the track set T_aft(j).
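The loss-change comparison across these five operations can be sketched as follows; the loss evaluation and candidate generation are assumed to be provided elsewhere, and only the greedy selection logic described above is illustrated.

```python
# Sketch of the greedy step in track clustering: evaluate the loss change Δl of each
# candidate operation (assign / fuse / split / exchange / interrupt) and apply the one
# with the smallest Δl, but only if it actually decreases the loss (Δl < 0).
def best_operation(candidate_ops):
    """candidate_ops: list of (delta_loss, apply_fn) pairs, one per candidate operation."""
    if not candidate_ops:
        return False
    delta, apply_fn = min(candidate_ops, key=lambda c: c[0])
    if delta < 0:   # a non-negative minimum means every operation would increase the loss
        apply_fn()
        return True
    return False

def cluster_until_convergence(segments, make_candidates):
    """segments: track segments; make_candidates(seg) returns the (Δl, apply_fn) list."""
    changed = True
    while changed:
        changed = False
        for seg in segments:
            if best_operation(make_candidates(seg)):
                changed = True
```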
Further, the step S3 specifically includes:
step S31: the grid map representation module is used to represent the positional relationship between the predicted vehicle and the surrounding vehicles;
step S32: deep-level feature extraction is carried out; an adversarial network encoder performs unsupervised feature extraction on the training data and extracts the dynamic characteristics and driving style of each vehicle;
step S33: feature encoding is carried out, and a gated recurrent unit is adopted to encode the vehicle dynamics and driving style;
step S34: after all vehicles in the scene are encoded, the feature vectors are filled according to the position of each vehicle relative to the predicted vehicle; the encoded features are filled into the corresponding grid cells to form a social semantic vector, so that an image-like feature representation is obtained;
step S35: in the convolution module, the social semantic vector is processed with a 3 × 3 convolution, a 3 × 1 convolution and a 2 × 2 dilated convolution with a dilation rate of 2;
step S36: in a full-connection module, using a full-connection layer to connect the coded predicted vehicle characteristics to obtain the description of the state of the predicted vehicle, and then connecting the state description of the predicted vehicle with the scene social semantic vector output by the convolution module to obtain complete scene description;
step S37: predicting the behavior of the vehicle by utilizing a Softmax layer to obtain the probability of the vehicle under each behavior; dividing the behavior probability output by the Softmax layer into two branches, inputting one branch into a behavior prediction output module to output the behavior of the predicted vehicle, and inputting the other branch into a multi-modal trajectory prediction module after being connected with the complete scene description to obtain a multi-modal trajectory of the predicted vehicle;
step S38: the multi-modal trajectory output by the multi-modal trajectory prediction module and the behavior probability output by the behavior prediction output module are both input into the behavior and trajectory output module to obtain the multi-modal behavior and trajectory prediction result, as sketched below.
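The following PyTorch sketch loosely mirrors steps S31-S38; the framework, layer widths, the 17 × 3 grid size (taken from the detailed description), the 10 Hz / 5 s output horizon and the resulting 15 × 1 feature-map size are all assumptions, not the authors' exact network.

```python
# Illustrative sketch of the offline model in steps S31-S38; all sizes are assumptions.
import torch
import torch.nn as nn

class BehaviorTrajectoryNet(nn.Module):
    def __init__(self, feat_dim=32, hid=64, horizon=50):  # horizon: 5 s at 10 Hz (assumed)
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(feat_dim, hid, batch_first=True)        # step S33
        self.conv = nn.Sequential(                                     # step S35
            nn.Conv2d(hid, hid, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hid, hid, kernel_size=(3, 1), padding=(1, 0)), nn.ReLU(),
            nn.Conv2d(hid, hid, kernel_size=2, dilation=2), nn.ReLU(),
        )
        self.fc_ego = nn.Linear(hid, hid)                              # step S36
        self.behavior_head = nn.Linear(hid, 3)                         # step S37: 3 maneuvers
        scene_dim = hid * 15 * 1            # a 17x3 grid shrinks to 15x1 after the last conv
        self.traj_head = nn.Linear(scene_dim + hid + 3, horizon * 2)   # step S38

    def forward(self, grid_feats, ego_hist):
        # grid_feats: (B, hid, 17, 3) social tensor filled as in step S34
        # ego_hist:   (B, T, feat_dim) deep-level features of the predicted vehicle
        scene = self.conv(grid_feats).flatten(1)
        _, h = self.encoder(ego_hist)
        ego = torch.relu(self.fc_ego(h[-1]))
        behavior_prob = torch.softmax(self.behavior_head(ego), dim=-1)
        full = torch.cat([scene, ego, behavior_prob], dim=-1)
        traj = self.traj_head(full).view(-1, self.horizon, 2)          # (x, y) per future step
        return behavior_prob, traj
```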
Further, the adversarial network encoder is composed of a series of encoders and decoders; the encoders encode the input data into a new feature representation, and the decoders process this feature to obtain the output;
the k-th layer of the adversarial network encoder model is:

$H_k=F_k(W_k H_{k-1}+B_{(k-1,k)})$

$Z_k=G_k(W_k^{T} H_k+B_{(k,k-1)})\approx H_{k-1}$

wherein H_{k-1} represents the set of hidden feature expressions of the samples reconstructed by the (k-1)-th layer decoder; H_k represents the set of hidden feature expressions of the samples reconstructed by the k-th layer decoder, and Z_k is the set of k-th layer reconstructed samples; F_k and G_k represent the activation functions of the input layer and the hidden layer of the k-th layer, respectively; W_k and B_{(k-1,k)} respectively represent the weight matrix and offset vector from the k-th layer input layer to the hidden layer; W_k^T and B_{(k,k-1)} are the corresponding weight matrix and offset vector from the k-th layer hidden layer to the output layer;

by adjusting the parameters of the adversarial network encoder model, H_{k-1} and Z_k are made as similar as possible, and a deeper-level feature representation D = [d_1, d_2, ..., d_i, ..., d_N] is obtained, where N represents the number of input samples, i = 1, 2, 3, ..., N, and d represents the deep-level feature of each vehicle at each time;

after the deep-level feature d of each vehicle at each time is obtained, the features belonging to each vehicle are arranged in time order, so that the deep-level features M_i over the past time period corresponding to each vehicle i are obtained:

$M_i=[d_i^{t-t_h},d_i^{t-t_h+1},\ldots,d_i^{t}]$

where t represents the current time and t_h represents the historical observation horizon.
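A minimal numpy sketch of one such tied-weight encoder/decoder layer is given below; the tanh activations, layer sizes and squared-error reconstruction objective are assumptions.

```python
# Minimal sketch of one ANE layer: H_k = F_k(W_k H_{k-1} + B), Z_k = G_k(W_k^T H_k + B') ≈ H_{k-1}.
import numpy as np

rng = np.random.default_rng(0)

class ANELayer:
    def __init__(self, in_dim, hid_dim):
        self.W = rng.normal(scale=0.1, size=(hid_dim, in_dim))   # W_k (tied for decoding)
        self.b_enc = np.zeros(hid_dim)                            # B_(k-1,k)
        self.b_dec = np.zeros(in_dim)                             # B_(k,k-1)

    def encode(self, h_prev):                  # h_prev: (batch, in_dim)
        return np.tanh(h_prev @ self.W.T + self.b_enc)            # H_k

    def reconstruct(self, h_k):
        return np.tanh(h_k @ self.W + self.b_dec)                 # Z_k ≈ H_{k-1}

    def reconstruction_loss(self, h_prev):
        z = self.reconstruct(self.encode(h_prev))
        return float(((z - h_prev) ** 2).mean())                  # drive Z_k toward H_{k-1}
```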
Further, step S4 specifically includes:
step S41: the trajectory of the vehicle is divided into 8 s segments; the historical trajectory of the first 3 s is used to extract the historical features of the vehicle, and the trajectory of the last 5 s is predicted; the input X of the model is the history of the past 3 s trajectories of the predicted vehicle and its surrounding vehicles:

$X=[x^{t-t_h},\ldots,x^{t-2},x^{t-1},x^{t}]$

wherein

$x^{t}=[s_0^{t},s_1^{t},\ldots,s_n^{t}]$

x^t represents the set of states of the predicted vehicle and its surrounding vehicles at time t, t = t-t_h, ..., t-2, t-1, t; t_h represents the historical observation time domain;

s_n^t represents the state information of the n-th vehicle at time t, n = 0, 1, ..., n; n represents the number of vehicles around the predicted vehicle;
step S42: the position coordinates x and y of the vehicle, the lane number Lane_ID, the vehicle type v_class, the vehicle speed v_vel, the vehicle acceleration v_acc and the lateral behavior code LAT of the vehicle are input to represent the state of the vehicle:

s = [x, y, Lane_ID, v_class, v_vel, v_acc, LAT]

LAT = [1,0,0] if the vehicle makes a left lane change in the future 5 s, LAT = [0,1,0] if it makes a right lane change, and LAT = [0,0,1] if the vehicle continues to keep the current lane (see the code sketch following step S43);
step S43: in a traffic scene with n vehicles, the historical features X of all vehicles in the previous 3 s are taken as the input of the model, and the behaviors of the vehicle in the future 5 s are predicted:

M = { left lane change, right lane change, lane keeping }

where M represents the set of vehicle behaviors;

and the trajectory distribution of the vehicle over the future prediction horizon t_f = 5 s is predicted:

$P=[p^{t+1},p^{t+2},\ldots,p^{t+t_f}]$

wherein

$p^{t+1}=[x^{t+1},y^{t+1}]$

where P denotes the future vehicle positions over the prediction horizon t_f, and p^{t+1} represents the position of the vehicle at the corresponding time in the prediction horizon t_f; y^{t+1} represents the longitudinal coordinate of the vehicle at that time, and x^{t+1} represents the lateral coordinate of the vehicle at that time.
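Purely as an illustration, the state vector s and the one-hot lateral-behavior code LAT described in step S42 could be assembled as follows; the numeric encoding of v_class and the stacking convention for the history X are assumptions.

```python
# Sketch of the per-vehicle state vector s = [x, y, Lane_ID, v_class, v_vel, v_acc, LAT]
# and the one-hot lateral-behavior code LAT; the class encoding and stacking are assumptions.
import numpy as np

LAT_LEFT, LAT_RIGHT, LAT_KEEP = [1, 0, 0], [0, 1, 0], [0, 0, 1]

def state_vector(x, y, lane_id, v_class, v_vel, v_acc, lat):
    return np.array([x, y, lane_id, v_class, v_vel, v_acc, *lat], dtype=np.float32)

def history_tensor(per_frame_states):
    """Stack the past-3s states of the predicted vehicle and its n surrounding vehicles
    into the model input X with shape (t_h + 1, n + 1, state_dim)."""
    return np.stack([np.stack(frame) for frame in per_frame_states])
```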
The invention has the beneficial effects that: the invention simultaneously predicts the track and the behavior of the vehicle and also considers the interaction influence among the vehicles; the unmanned aerial vehicle is used for assisting the vehicle to predict the behavior and the track of the surrounding vehicle, so that the state information of the surrounding vehicle can be accurately acquired, the behavior of the vehicle can be accurately predicted, and the prediction accuracy and the performance of a prediction algorithm are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a three lane highway scene graph;
FIG. 2 is a block diagram of an implementation of a vehicle behavior and trajectory prediction method;
FIG. 3 is a diagram of a training data generation method;
FIG. 4 is a CSV data format output diagram;
FIG. 5 is a diagram of an offline training module;
FIG. 6 is an ANE encoding structure;
FIG. 7 is a multi-modal behavior and trajectory prediction effect.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For convenience of explanation, the invention takes a three-lane expressway scenario as an example. As shown in Fig. 1, from bottom to top are the right lane Lane1, the middle lane Lane2 and the left lane Lane3, all in the same plane; the height of the unmanned aerial vehicle above the ground is H, and the detection range of the unmanned aerial vehicle over the expressway is R. For convenience of representation, the origin O of the coordinate system is fixed at the centroid of the predicted vehicle; the direction in which the vehicle travels along the expressway is the Y-axis direction, and the direction perpendicular to the Y-axis is the X-axis. Within the detection range R of the unmanned aerial vehicle, the vehicle in the middle is the predicted vehicle, and the two vehicles nearest to the predicted vehicle in its left and right lanes, together with the vehicles in front of and behind it, are the surrounding vehicles.
The invention defines the behaviors of a vehicle on the expressway as a left lane change (the vehicle moves from the middle lane Lane2 to the left lane Lane3, or from the right lane Lane1 to the middle lane Lane2), a right lane change (the vehicle moves from the middle lane Lane2 to the right lane Lane1, or from the left lane Lane3 to the middle lane Lane2), and keeping the current lane (i.e., the vehicle stays in its current lane for the whole prediction period, with no lane-crossing behavior in between). At the same time, for each vehicle behavior, the invention also provides the trajectory prediction result in the corresponding prediction horizon.
As shown in fig. 2, the framework for implementing the vehicle behavior and trajectory prediction method mainly includes: training data generation 1, offline training 2, and online prediction 3.
The training data generation 1, the off-line training 2 and the on-line prediction 3 specifically comprise the following steps:
step S1: training data generation 1 obtains ideal data input;
step S2: the training data generation 1 acquires data by using an unmanned aerial vehicle provided with a GPS and a camera, and generates training data by using a training data generation method;
step S3: training a vehicle behavior and trajectory prediction model in offline training 2 by using the training data generated by training data generation 1;
step S4: the training data output by training data generation 1 are used, the parameters of the behavior and trajectory prediction model in offline training 2 are saved, and the model is then used to perform behavior and trajectory prediction in online prediction 3.
The training data generation 1: ideal data input must be obtained to train the model, and the vehicle behavior and trajectory prediction model of the invention requires input data including the vehicle number v_ID of the predicted vehicle and its surrounding vehicles, their x and y coordinates, vehicle speed v_vel, acceleration v_acc, and their respective lane numbers Lane_ID. In order to accurately acquire the state information of surrounding vehicles and improve the performance of the trajectory prediction algorithm, the invention uses an unmanned aerial vehicle to assist the vehicle in predicting the behavior and trajectory of surrounding vehicles; the data generation method in training data generation 1 is specifically introduced in Fig. 3.
As shown in Fig. 3, the training data generation 1 of the invention includes acquiring a video sequence 11, camera matching 12, vehicle detection 13, lane line detection 14, track segment generation 15, appearance modeling 16, loss calculation 17, track clustering 18 and csv data output 19. Acquiring a video sequence 11 obtains the traffic scene video sequence transmitted by the unmanned aerial vehicle, and the acquired video sequences are input to camera matching 12, vehicle detection 13 and lane line detection 14, respectively. Camera matching 12 calculates the internal and external parameters of the camera on the unmanned aerial vehicle; vehicle detection 13 detects the vehicles in the traffic scene video transmitted by the acquired video sequence 11 and inputs the vehicle number v_ID, the frame number frame_ID, the vehicle type v_class, the vehicle bounding box information (including the vehicle bounding box and the center coordinates of the vehicle bounding box) and the camera internal and external parameters calculated by camera matching 12 into track segment generation 15 to generate the track segments corresponding to each v_ID, and the generated track segments are input into appearance modeling 16 and loss calculation 17. Appearance modeling 16 models the content in each vehicle bounding box of the track segments and inputs the appearance modeling loss to loss calculation 17. Loss calculation 17 combines the smoothness loss, the speed-change loss, the time-interval loss and the appearance modeling loss delivered by appearance modeling 16 to calculate the loss function of a track. The calculated loss function is input to track clustering 18. Track clustering 18 includes 5 operations, namely assignment, fusion, segmentation, exchange and interruption; by comparing the loss function change values under these 5 operations, the minimum change value is found and the operation corresponding to the minimum change of the loss function is executed. The vehicle number v_ID, vehicle speed v_vel, vehicle acceleration v_acc, vehicle abscissa x, vehicle ordinate y, vehicle type v_class, frame number frame_ID and the Lane number Lane_ID detected by lane line detection 14 are then input into the csv data output 19 to generate data in the format shown in Fig. 4.
In the stage of acquiring the video sequence 11, in order to acquire the required scene information, a GPS and a camera are installed on the unmanned aerial vehicle, so that the position of the unmanned aerial vehicle can be obtained and a top view of the driving scene can be captured. During video acquisition, the information transmitted by the GPS keeps the unmanned aerial vehicle always between 30 m and 150 m above the predicted vehicle; 50 m was selected through testing, which gives a better field of view and a better effect. The data acquisition site is a section of 3-lane expressway, the data acquisition frequency is 10 Hz, and the acquisition time is 8-9 am, 12-1 pm and 5-6 pm, three hours in total, covering three different traffic flow levels.
After the traffic video sequence is acquired, vehicle detection 13 is performed using a YOLOv4-based vehicle detection algorithm. Since the dynamics (such as turning radius, maximum acceleration, etc.) of different types of vehicles are different, the type of each vehicle needs to be determined while detecting it, and the vehicles common on expressways are divided into four types: cars, buses, pickup trucks and minibuses. In order to detect the vehicle and its type at the same time, 4000 sample pictures were sampled uniformly from the NVIDIA AI City Challenge data set as training data, where each picture contains between 5 and 40 objects, and the training data were manually labeled as cars, buses, pickup trucks, minibuses, etc. To reduce training time, the network is initialized with pre-trained weights. The vehicle detection module is required to detect the vehicle and its type information and output the vehicle number v_ID and its type v_class correspondingly to the CSV data output 19.
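As one possible, purely illustrative way to run such a YOLOv4 detector, the sketch below uses OpenCV's Darknet importer; the configuration/weight file names, input size and confidence threshold are assumptions, not values from the patent.

```python
# Sketch: run a YOLOv4 model for the four vehicle classes via OpenCV's dnn module.
# File names, input size and thresholds are assumptions.
import cv2
import numpy as np

CLASSES = ["car", "bus", "pickup_truck", "minibus"]

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
out_names = net.getUnconnectedOutLayersNames()

def detect_vehicles(frame, conf_thresh=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (608, 608), swapRB=True, crop=False)
    net.setInput(blob)
    detections = []
    for out in net.forward(out_names):
        for row in out:                        # row = [cx, cy, bw, bh, obj, class scores...]
            scores = row[5:]
            cls = int(np.argmax(scores))
            if scores[cls] * row[4] > conf_thresh:
                cx, cy, bw, bh = row[:4] * np.array([w, h, w, h])
                detections.append((CLASSES[cls], (cx, cy), (bw, bh)))
    return detections
```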
In the camera matching 12 stage, the internal and external parameters of the unmanned aerial vehicle camera are calibrated using a vanishing point method, and the calibrated parameters are then used to back-project the center coordinates of the vehicle bounding boxes detected by vehicle detection 13 into 3-dimensional space to calculate the position coordinates of the vehicles in the real scene.
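For illustration, if the calibration is condensed into a planar homography from the image to the road plane (an assumption made only for this sketch), the back-projection of a bounding-box center can be written as:

```python
# Sketch of mapping a detected bounding-box center back to road-plane coordinates using
# an assumed image-to-world homography obtained from the calibration.
import numpy as np

def image_to_world(h_img2world: np.ndarray, u: float, v: float):
    """Apply a 3x3 homography to an image point (u, v) and dehomogenize."""
    p = h_img2world @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]   # (x, y) on the road plane

def bbox_world_position(h_img2world, bbox):
    x1, y1, x2, y2 = bbox             # bounding-box corners in pixels
    return image_to_world(h_img2world, (x1 + x2) / 2.0, (y1 + y2) / 2.0)
```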
In order to predict vehicle behaviors such as left lane change, right lane change and lane keeping, the current lane in which each vehicle is located must be determined. In the lane line detection 14 stage, the lane lines in the video sequence are detected using a semantic segmentation method, and the lane line detection results are compared with the detected vehicle positions to obtain the lane line information occupied by each vehicle ID in each frame.
In the track segment generation 15 stage, using the camera internal and external parameters output by camera matching 12 and the vehicle bounding box information detected by vehicle detection 13, the center coordinates of the vehicle bounding boxes can be back-projected into the 3-dimensional real-world coordinate system to obtain the position coordinates of the vehicles in the real scene. The positions of all detected vehicles in each picture in the real scene are output; by continuously inputting the vehicle positions in a number of adjacent pictures, the position coordinates of the vehicles at consecutive times can be obtained. Then a density-based clustering algorithm, such as the DBSCAN algorithm, is used to group the obtained vehicle position coordinates into several different track segments, the number of which is the same as the number of vehicles appearing in the camera view (a code sketch of this grouping follows the loss definition below). Finally, a clustering method (spectral clustering, etc.) is used to connect track segments to form longer tracks. The clustering operation is implemented by minimizing a clustering loss function, which is defined as:
$l=\sum_{i=1}^{n_v} l_i$

$l_i=\lambda_{sm}l_{i,sm}+\lambda_{vc}l_{i,vc}+\lambda_{ti}l_{i,ti}+\lambda_{ac}l_{i,ac}$

where l represents the loss function of all tracks in the scene; n_v denotes the number of vehicles in the video captured by the camera; l_i is the clustering loss of the i-th track; l_{i,sm} is the smoothness loss of track i; l_{i,vc} is the vehicle speed-change loss; l_{i,ti} is the time-interval loss between two adjacent track segments; l_{i,ac} is the vehicle appearance-change loss; λ_sm is the regularization parameter of the track smoothness loss, λ_vc of the vehicle speed-change loss, λ_ti of the time-interval loss between adjacent track segments, and λ_ac of the vehicle appearance-change loss. Based on experience, λ_sm, λ_vc, λ_ti and λ_ac are set to 0.2, 8, 25 and 0.5, respectively.
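The DBSCAN-based grouping of per-frame vehicle positions into track segments mentioned above could, purely as an illustration, look like the following; the eps/min_samples values and the weighting of the frame index against the spatial coordinates are assumptions.

```python
# Sketch of grouping per-frame vehicle positions into track segments with DBSCAN (sklearn).
import numpy as np
from sklearn.cluster import DBSCAN

def group_track_segments(detections, eps=2.0, min_samples=3):
    """detections: array of rows (frame_ID, x, y); returns a segment label per row."""
    pts = np.asarray(detections, dtype=float)
    feats = np.column_stack([pts[:, 0] * 0.1, pts[:, 1], pts[:, 2]])  # weight time vs. space
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
```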
Smoothness loss measures the smoothness of the track segments belonging to the same track, the track segments in each track being ordered according to their time of entry. The speed of a vehicle does not normally change suddenly, so acceleration is used to measure the speed change between two adjacent track segments; if a large acceleration is detected in the junction area of two track segments, they are unlikely to belong to the same vehicle track. Meanwhile, if the time interval between two adjacent track segments is relatively long, the probability that the two track segments belong to the same track is relatively low. A reliable description of vehicle appearance is the key to reducing switching of the vehicle number v_ID, and the invention uses a histogram-based adaptive appearance model to calculate the appearance-change loss.
In the appearance modeling 16 stage, the content of the image in the vehicle bounding box detected in vehicle detection 13, corresponding to the position coordinates of each track segment generated by track segment generation 15, is modeled. To ensure the accuracy of the appearance modeling, a combination of five histograms is used to describe the features of the vehicle in the bounding box: an RGB color histogram, an HSV color histogram, a Lab color histogram, a Local Binary Pattern (LBP) histogram and a Histogram of Oriented Gradients (HOG). For each track segment, the five histogram combinations representing the image content in each bounding box are recorded; with these five histogram combinations, the image content in each vehicle bounding box can be modeled more accurately. The smaller the difference between the respective five histogram combinations of two bounding boxes, the more similar their contents, i.e., the more likely the vehicles in the bounding boxes belong to the same vehicle, so the consistency of the detected vehicle in each bounding box can be determined by comparing the five histogram combinations of different bounding boxes. Therefore, the histograms of the bounding boxes in the track segments belonging to the same vehicle track should be as consistent as possible; the difference between two adjacent histogram combinations of all track segments in a track is used to define the adaptive appearance model loss, and the smaller the loss value, the greater the probability that the track segments in the track belong to the same track.
Loss calculation 17 defines and calculates four losses: the smoothness loss, the speed-change loss, the time-interval loss and the appearance modeling loss delivered by appearance modeling 16. The smaller the loss, the better the track matching effect. Track clustering 18 includes 5 operations, namely assignment, fusion, segmentation, exchange and interruption; by comparing the loss function change values under these 5 operations, the minimum change value is found and the operation corresponding to the minimum change of the loss function is executed. If the minimum loss change value is greater than 0, meaning that the loss would increase after performing the selected operation, no operation is performed on the track segment. All track segments are clustered in this way in turn until convergence.
Assignment operation in track clustering 18: the track set formed by all track segments t_j is denoted T(j), which belongs to the trajectory of vehicle j. All tracks are searched and the track segment t_j is assigned to the track that produces the smallest loss. Specifically, the loss change after the assignment operation is defined as:

$\Delta l_{j,as}=\left(l(T(j)\setminus t_j)+l(T(i)\cup t_j)\right)-\left(l(T(j))+l(T(i))\right)$

where Δl_{j,as} is the loss change after the assignment operation; l(T(j)\t_j) represents the loss of the track set T(j) after removing the track segment t_j from T(j); l(T(i)∪t_j) represents the loss of the track set T(i) after merging the track segment t_j into T(i); and l(T(j)) and l(T(i)) represent the losses of the track set T(j) and the track set T(i), respectively.

Fusion operation in track clustering 18: for each track segment, if a smaller loss value can be obtained after fusing two track sets, the corresponding track set T(j) is fused with another track set T(i); similarly, the loss change of the fusion operation is defined as:

$\Delta l_{j,mg}=l(T(j)\cup T(i))-\left(l(T(j))+l(T(i))\right)$

where Δl_{j,mg} is the loss change after the fusion operation, and l(T(j)∪T(i)) represents the loss after fusing the track set T(j) and the track set T(i).

Segmentation operation in track clustering 18: a segmentation operation is used to separate a track segment from the current track set to form an independent track set. The loss change of the segmentation operation is defined as:

$\Delta l_{j,sp}=\left(l(t_j)+l(T(j)\setminus t_j)\right)-l(T(j))$

where Δl_{j,sp} is the loss change after the segmentation operation; l(t_j) is the loss produced by the track segment t_j separated from T(j); l(T(j)\t_j) represents the loss produced after t_j is removed from the track set T(j).

Exchange operation in track clustering 18: for a track set T(j), the track segments after t_j are denoted T_aft(j) and the track segments before t_j are denoted T_bef(j). T_bef(i) and T_bef(j) are exchanged, and the loss change between after and before the exchange is obtained:

$\Delta l_{sw}=\left(l(T_{aft}(j)\cup T_{bef}(i))+l(T_{bef}(j)\cup T_{aft}(i))\right)-\left(l(T(j))+l(T(i))\right)$

where Δl_sw indicates the loss change after the exchange operation; l(T_aft(j)∪T_bef(i)) denotes the loss after fusing T_aft(j) and T_bef(i); l(T_bef(j)∪T_aft(i)) denotes the loss after fusing T_bef(j) and T_aft(i).

Interruption operation in track clustering 18: T(j) is divided into T_aft(j) and T_bef(j) to calculate the loss change of the interruption operation:

$\Delta l_{bk}=\left(l(T_{bef}(j))+l(T_{aft}(j))\right)-l(T(j))$

where Δl_bk indicates the loss change after the interruption operation; l(T_bef(j)) represents the loss of the track set T_bef(j), and l(T_aft(j)) represents the loss of the track set T_aft(j).
By minimizing these losses, each vehicle number v_ID and its corresponding x, y coordinates can be determined, and by differentiating with respect to time, the corresponding speed v_vel and acceleration v_acc are obtained. Finally, v_ID, v_acc, v_vel, x, y, v_class and frame_ID are output to the csv data output 19.
The csv data output 19 stores the vehicle number v_ID, vehicle type v_class, vehicle acceleration v_acc, vehicle speed v_vel, vehicle coordinates x, y and frame number frame_ID output by track clustering 18, together with the Lane number Lane_ID output by lane line detection 14, in the format shown in Fig. 4.
The invention utilizes training data generated by training data generation 1 to train a vehicle behavior and track prediction model in off-line training 2, and the off-line training 2 is shown in fig. 5 and mainly comprises the following steps: the trellis diagram representation 21, the deep level feature extraction 22, the feature encoding 23, the feature vector filling 24, the convolution module 25, the full-concatenation module 26, the Softmax layer 27, the multi-modal trajectory prediction module 28, the behavior prediction output 29, and the behavior and trajectory output module 30.
The grid map representation 21 represents the driving scene of the vehicle in the form of a grid map, specifically as follows: taking the grid cell in which the predicted vehicle is located as the center, the region of interest extends 80 m forward and backward along the Y axis; the whole region of interest is divided into a 17 × 3 grid map, with 10 m intervals in the longitudinal direction and the lanes as dividing lines in the lateral direction; each surrounding vehicle is filled into the corresponding grid cell according to its position relative to the predicted vehicle, so that a representation of the positional relationship between the predicted vehicle and the surrounding vehicles is obtained.
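A small sketch of how a surrounding vehicle could be mapped to a cell of this 17 × 3 grid is given below; the exact index conventions (row 0 at the far rear, one column per lane) are assumptions.

```python
# Sketch of placing a surrounding vehicle into the 17 x 3 grid map: 10 m longitudinal
# cells over +/- 80 m along the Y axis, one lateral cell per lane (index conventions assumed).
def grid_cell(dy: float, lane_id: int):
    """dy: longitudinal offset (m) of the surrounding vehicle relative to the predicted
    vehicle; lane_id: its lane number (1..3). Returns (row, col) or None if outside."""
    if abs(dy) > 80 or not 1 <= lane_id <= 3:
        return None
    row = int((dy + 80) // 10)        # 0..16
    return min(row, 16), lane_id - 1
```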
In order to obtain the dynamic characteristics and driving style of each vehicle, the deep-level feature extraction 22 uses an Adversarial Network Encoder (ANE) to extract the dynamic characteristics and driving style of each vehicle. In order to solve the problem of not being able to judge which behavior (lane keeping, left lane change, right lane change, etc.) the vehicle will take in the future, the input training data of the invention include the vehicle number v_ID, the lane number Lane_ID, the x and y coordinates of the vehicle, the vehicle speed v_vel, the vehicle acceleration v_acc, etc.; in order to automatically extract effective vehicle dynamics and driving-style features, the Adversarial Network Encoder (ANE) performs unsupervised feature extraction on the training data.
As shown in Fig. 6, the ANE encoder 221 consists of a series of encoders and decoders: the encoders encode the input data into a new feature representation, and the decoders then process this feature to obtain the output. The error is back-propagated and continuously optimized at each training step, so that the input and output are as similar as possible, and the learned feature representation becomes deeper as the layer depth increases. To further illustrate the calculation of the ANE, the first layer of the ANE model is taken as an example:
$H_1=F_1(W_1 X+B_{(in,1)})$

$Z=G_1(W_1^{T} H_1+B_{(1,out)})\approx X$

where X represents the set of encoder input samples, H_1 represents the set of hidden feature expressions of the samples reconstructed by the first-layer decoder, and Z is the set of first-layer reconstructed samples. F_1 and G_1 represent the activation functions of the input layer and the hidden layer of the first layer, respectively; W_1 and B_{(in,1)} respectively represent the weight matrix and offset vector from the first-layer input layer to the hidden layer; W_1^T and B_{(1,out)} are the corresponding weight matrix and offset vector from the first-layer hidden layer to the output layer. Similar to the first layer, for the k-th layer:

$H_k=F_k(W_k H_{k-1}+B_{(k-1,k)})$

$Z_k=G_k(W_k^{T} H_k+B_{(k,k-1)})\approx H_{k-1}$

where H_{k-1} represents the set of hidden feature expressions of the samples reconstructed by the (k-1)-th layer decoder; H_k represents the set of hidden feature expressions of the samples reconstructed by the k-th layer decoder, and Z_k is the set of k-th layer reconstructed samples. F_k and G_k represent the activation functions of the input layer and the hidden layer of the k-th layer, respectively; W_k and B_{(k-1,k)} respectively represent the weight matrix and offset vector from the k-th layer input layer to the hidden layer; W_k^T and B_{(k,k-1)} are the corresponding weight matrix and offset vector from the k-th layer hidden layer to the output layer.

X and Z are made as similar as possible by adjusting the parameters of the ANE model. In this way, a deeper-level feature representation D = [d_1, d_2, ..., d_i, ..., d_N] can be obtained, where N represents the number of input samples and d represents the deep-level feature of each vehicle at each time.
After the deep-level feature d of each vehicle at each time is obtained, the features belonging to each vehicle are arranged in time order, so that the deep-level features M_i over the past time period corresponding to each vehicle i are obtained:

$M_i=[d_i^{t-t_h},d_i^{t-t_h+1},\ldots,d_i^{t}]$

where t represents the current time and t_h represents the historical observation horizon.
Gated Recurrent Units (GRUs) have a simple structure and can process time-series data, so the feature encoding 23 of the invention adopts GRUs to encode the vehicle dynamics and driving style.
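A minimal PyTorch sketch of this encoding step is shown below; the feature and hidden sizes are assumptions.

```python
# Minimal sketch of feature encoding 23: a GRU encodes the sequence of per-vehicle
# deep-level features M_i into one fixed-length vector. Sizes are assumptions.
import torch
import torch.nn as nn

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

def encode_vehicle(m_i: torch.Tensor) -> torch.Tensor:
    """m_i: (T, 32) deep-level feature sequence of one vehicle over the past t_h frames."""
    _, h_n = gru(m_i.unsqueeze(0))     # h_n: (1, 1, 64)
    return h_n.squeeze(0).squeeze(0)   # final hidden state used as the vehicle's encoding
```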
After all vehicles in the scene have been encoded, in the feature vector filling 24 the encoded features are filled in turn into their corresponding grid cells according to the position of each vehicle relative to the predicted vehicle, forming a social semantic vector. An image-like feature representation is thus obtained. In order not to destroy the positional relationship between vehicles, the convolution module 25 processes it sequentially with a 3 × 3 convolution, a 3 × 1 convolution and a 2 × 2 dilated convolution with a dilation rate of 2 to obtain the scene social semantic vector; the dilated convolution enlarges the receptive field over the scene, which gives better prediction accuracy.
In the full-connection module 26, the coded predicted vehicle features are connected by using a full-connection layer to obtain a description of the state of the predicted vehicle, and then the state description of the predicted vehicle is connected with the scene social semantic vector output by the convolution module 25 to obtain a complete scene description. The behavior of the vehicle is predicted by using the Softmax layer 27, and the probability of the vehicle under each behavior is obtained. The behavior probability of the output of the Softmax layer 27 is divided into two branches, one branch is input into the behavior prediction output 29 to output the behavior of the predicted vehicle, and the other branch is connected with the complete scene description and then input into the multi-modal trajectory prediction module 28 to obtain the multi-modal trajectory of the predicted vehicle. Finally, the multi-modal trajectories output by the multi-modal trajectory prediction module 28 and the behavior probabilities output by the behavior prediction output 29 are both input to the behavior and trajectory output module 30 to obtain multi-modal behavior and trajectory prediction results.
The training data output by training data generation 1 are used, the parameters of the behavior and trajectory prediction model in offline training 2 are saved, and this model is then used to make behavior and trajectory predictions in online prediction 3.
In training, the trajectory of the vehicle is divided into 8 s segments; the historical trajectory of the first 3 s is used to extract the historical features of the vehicle, and the trajectory of the last 5 s is predicted. Specifically, the input X of the model is the trajectory history (past 3 s) of the predicted vehicle and its surrounding vehicles:
$X=[x^{t-t_h},\ldots,x^{t-2},x^{t-1},x^{t}]$

wherein

$x^{t}=[s_0^{t},s_1^{t},\ldots,s_n^{t}]$

x^t represents the set of states of the predicted vehicle and its surrounding vehicles at time t, t = t-t_h, ..., t-2, t-1, t.

s_n^t indicates the state information of the n-th vehicle at time t; n represents the number of vehicles around the predicted vehicle, n = 0, 1, ..., n; t_h represents the historical observation time domain.
We input the position coordinates x and y of the vehicle, the Lane number Lane _ ID, the type v _ class of the vehicle, the speed v _ vel of the vehicle, the acceleration v _ acc of the vehicle, the lateral behavior code LAT of the vehicle to represent the state of the vehicle:
s=[x,y,Lane_ID,v_class,v_vel,v_acc,LAT]
wherein LAT = [1,0,0] if the vehicle makes a left lane change within the next 5 s, LAT = [0,1,0] if it makes a right lane change, and LAT = [0,0,1] if the vehicle continues to keep the current lane.
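A small illustrative helper (an assumption, not text from the patent) that assembles the state vector s with the LAT one-hot code described above:

```python
import numpy as np

# behavior indices used for the lateral one-hot code LAT
LEFT, RIGHT, KEEP = 0, 1, 2

def make_state(x, y, lane_id, v_class, v_vel, v_acc, behavior):
    """Build s = [x, y, Lane_ID, v_class, v_vel, v_acc, LAT] for one vehicle."""
    lat = np.zeros(3)
    lat[behavior] = 1.0                    # [1,0,0] left, [0,1,0] right, [0,0,1] keep
    return np.concatenate([[x, y, lane_id, v_class, v_vel, v_acc], lat])

# example: a passenger car (class 2) in lane 3 that will keep its lane
s = make_state(x=12.4, y=3.6, lane_id=3, v_class=2, v_vel=27.5, v_acc=0.1,
               behavior=KEEP)
```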
In a traffic scene with n vehicles, the historical features X of all vehicles over the previous 3 s are taken as the input of the model, and the behavior of the vehicle over the future 5 s is predicted:

M = {left lane change, right lane change, lane keeping}

where M represents the set of vehicle behaviors.
The model also predicts the trajectory distribution of the vehicle over the future prediction horizon t_f of 5 s:

P = [p^(t+1), p^(t+2), ..., p^(t+t_f)]

wherein P represents the future vehicle positions within the prediction horizon t_f; p^(t+1) = [p_x^(t+1), p_y^(t+1)] represents the position of the vehicle at that instant, with p_x^(t+1) and p_y^(t+1) representing the longitudinal and lateral coordinates of the vehicle, respectively.
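For orientation only, the sketch below lays out the input history X and the trajectory target P as arrays; the 5 Hz sampling rate (15 historical steps for 3 s, 25 predicted steps for 5 s) and the number of neighbours are assumptions, since the frame rate is not stated here.

```python
import numpy as np

DT = 0.2            # assumed sampling interval (5 Hz)
T_H = int(3 / DT)   # 15 historical steps  (past 3 s)
T_F = int(5 / DT)   # 25 predicted steps   (future 5 s)
N_VEHICLES = 1 + 8  # predicted vehicle plus an assumed 8 neighbours
STATE_DIM = 9       # [x, y, Lane_ID, v_class, v_vel, v_acc, LAT(3)]

# input X: states of all vehicles over the historical horizon
X = np.zeros((T_H, N_VEHICLES, STATE_DIM))

# target P: longitudinal / lateral position of the predicted vehicle at each future step
P = np.zeros((T_F, 2))
```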
An embodiment is as follows:
10,000 lane-keeping trajectories, 10,000 left-lane-change trajectories, and 10,000 right-lane-change trajectories are extracted from the unmanned aerial vehicle video. The historical state features of the first 3 s of each trajectory are used as input, the behavior and trajectory of the vehicle for the following 5 s are output, and the prediction results are compared with the true behavior labels and true trajectories; the corresponding deviations are minimized to obtain the parameters of the model. Online prediction is then carried out with the trained model, and the trajectory and behavior predictions are evaluated with the root mean square error and the accuracy, respectively; the quantitative prediction results are shown in Tables 1 and 2.
TABLE 1 Root mean square error of the model at different prediction horizons

Prediction horizon:   1 s     2 s     3 s     4 s     5 s
RMSE:                 1.78    2.43    3.34    4.22    6.25
TABLE 2 Accuracy of the maximum-probability behavior prediction at each second within 5 s before the lane change

Time before lane change:   1 s    2 s    3 s    4 s    5 s
Accuracy:                  97%    95%    90%    88%    84%
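The metrics reported in Tables 1 and 2 can be computed along the lines of the following sketch; the array shapes and the 5-steps-per-second rate are the same assumptions as above.

```python
import numpy as np

def rmse_per_horizon(pred, truth, steps_per_second=5):
    """RMSE of the predicted position at 1 s, 2 s, ..., 5 s into the future.

    pred, truth: arrays of shape (n_samples, t_f, 2) holding (x, y) positions.
    """
    err = np.linalg.norm(pred - truth, axis=-1)                 # (n_samples, t_f)
    idx = [(k + 1) * steps_per_second - 1 for k in range(5)]    # steps at 1 s ... 5 s
    return [float(np.sqrt(np.mean(err[:, i] ** 2))) for i in idx]

def behavior_accuracy(probs, labels):
    """Share of samples whose maximum-probability behavior matches the true label."""
    return float(np.mean(np.argmax(probs, axis=-1) == labels))
```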
The visualization results are shown in fig. 7; it can be seen that the method of the invention effectively predicts the behavior and trajectory of the vehicle.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A highway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation is characterized by comprising the following steps:
step S1: the training data generation module acquires ideal data input;
the data input includes: the vehicle identification number v _ ID of the predicted vehicle and the surrounding vehicles, the vehicle abscissa x and the ordinate y of the predicted vehicle and the surrounding vehicles, the vehicle speed v _ vel of the predicted vehicle and the surrounding vehicles, the vehicle acceleration v _ acc of the predicted vehicle and the surrounding vehicles, and the Lane number Lane _ ID of the predicted vehicle and the surrounding vehicles;
step S2: the training data generation module acquires data by using the unmanned aerial vehicle provided with the GPS and the camera, and generates training data by using a training data generation method;
and step S3: training a vehicle behavior and track prediction model in an offline training module by using training data generated by a training data generation module;
the step S3 specifically comprises the following steps:
step S31: the grid diagram representation module is used for representing the position relation between the predicted vehicle and the surrounding vehicles;
step S32: carrying out deep-level feature extraction, performing unsupervised feature extraction on the training data by using an adversarial network encoder, and extracting the dynamic features and driving style of each vehicle;
step S33: carrying out feature encoding, and encoding the vehicle dynamics and driving style by using gated recurrent units;
step S34: after all vehicles in a scene are coded, filling feature vectors according to the position of each vehicle relative to a predicted vehicle, and filling the coded features into corresponding grid graphs to form social semantic vectors so as to obtain a class image feature representation form;
step S35: in the convolution module, processing the social semantic vector by using a 3 × 3 convolution, a 3 × 1 convolution, and a 2 × 2 dilated convolution with a dilation rate of 2;
step S36: in the full-connection module, the full-connection layer is used for connecting the coded predicted vehicle features to obtain the description of the predicted vehicle state, and then the state description of the predicted vehicle is connected with the scene social semantic vector output by the convolution module to obtain a complete scene description;
step S37: predicting the behaviors of the vehicle by utilizing a Softmax layer to obtain the probability of the vehicle under each behavior; dividing the behavior probability output by the Softmax layer into two branches, inputting one branch into a behavior prediction output module to output the behavior of the predicted vehicle, and inputting the other branch into a multi-modal trajectory prediction module after being connected with the complete scene description to obtain the multi-modal trajectory of the predicted vehicle;
step S38: inputting the multi-modal trajectory output by the multi-modal trajectory prediction module and the behavior probability output by the behavior prediction output module into the behavior and trajectory output module to obtain a multi-modal behavior and trajectory prediction result;
and step S4: and using the training data output by the training data generation module, storing parameters of a behavior and trajectory prediction model in the off-line training module, and then predicting the behavior and trajectory in the on-line prediction module by using the model.
2. The method for predicting the vehicle behavior and the track of the expressway based on the vehicle-unmanned aerial vehicle cooperation according to claim 1, wherein the step S2 is specifically as follows:
step S21: acquiring a traffic scene video sequence transmitted by an unmanned aerial vehicle, and respectively inputting the acquired video sequence into a camera matching module, a vehicle detection module and a lane line detection module;
step S22: the camera matching module calculates the internal and external parameters of the camera on the unmanned aerial vehicle; the vehicle detection module detects vehicles in the traffic scene video transmitted by the video sequence module, and inputs the vehicle identification number v_ID, the frame number frame_ID, the vehicle type v_class, the vehicle bounding box information, and the camera internal and external parameters calculated by the camera matching module into the track segment generation module to generate the track segment corresponding to each v_ID;
wherein the vehicle bounding box information includes the vehicle bounding box and center coordinates of the vehicle bounding box;
step S23: then inputting the generated track segments into an appearance modeling module and a loss calculation module, wherein the appearance modeling module models the content in each vehicle bounding box of the track segments and inputs the appearance modeling loss into the loss calculation module; the loss calculation module combines the smoothness loss, the speed change loss, the time interval loss, and the appearance modeling loss transmitted by the appearance modeling module to calculate the loss function of the track, wherein a smaller loss function value indicates a better track matching effect;
step S24: inputting the calculated loss function into a track clustering module; the track clustering module comprises 5 operations: assignment, fusion, segmentation, exchange, and interruption; by comparing the change values of the loss function under the 5 operations, the minimum change value is found and the operation corresponding to the minimum change value of the loss function is executed;
step S25: and finally, inputting the vehicle mark v _ ID, the vehicle speed v _ vel, the vehicle acceleration v _ acc, the vehicle abscissa x, the vehicle ordinate y, the vehicle type v _ class, the frame number frame _ ID and the Lane number Lane _ ID information detected by the Lane line detection module into the csv data output module to obtain corresponding data.
3. The method for predicting the vehicle behavior and track of the expressway based on the vehicle-unmanned aerial vehicle cooperation of claim 2, wherein in the step S21, the unmanned aerial vehicle obtains its own position while acquiring a top view of the driving scene, and the information transmitted by the GPS on the unmanned aerial vehicle is used to keep the unmanned aerial vehicle at all times between 30 and 150 m above the predicted vehicle;
after the camera matching module receives the video sequence, the internal parameters and external parameters of the unmanned aerial vehicle camera are calibrated by a vanishing point method, and the center coordinates of the vehicle bounding boxes detected by the vehicle detection module are then back-projected into 3-dimensional space using the calibrated parameters to calculate the position coordinate information of the vehicles in the real scene;
the vehicle detection module performs vehicle detection by using a vehicle detection algorithm based on YOLOv4, and initializes a network by adopting a pre-trained weight;
the lane line prediction module detects a lane line in a video sequence by using a semantic segmentation method, compares and matches a lane line detection result with the detected vehicle position, and obtains lane line information occupied by each frame of picture by each vehicle ID.
4. The method for predicting vehicle behavior and trajectory on the expressway based on vehicle-unmanned aerial vehicle cooperation according to claim 2, wherein the step S22 of generating the trajectory segment corresponding to each v _ ID by the trajectory segment generating module comprises the following steps:
step S22.1: after position coordinate information in a real scene is obtained, outputting position information under the real scene corresponding to all detected vehicles in each picture, and continuously inputting vehicle position information in a plurality of adjacent pictures to obtain the position coordinates of the vehicles at different continuous moments;
step S22.2: grouping the position coordinates of the obtained vehicles by adopting a density-based clustering algorithm to divide the position coordinates into a plurality of different track segments, wherein the number of the track segments is the same as that of the vehicles appearing in the camera;
step S22.3: finally, connecting the track segments by using a clustering method to form a longer track;
wherein the clustering operation is implemented by minimizing a clustering loss function, the loss function being:
l = Σ_{i=1}^{n_v} l_i

l_i = λ_sm·l_i,sm + λ_vc·l_i,vc + λ_ti·l_i,ti + λ_ac·l_i,ac

wherein l represents the loss function of all tracks in the scene; n_v represents the number of vehicles in the video captured by the camera; l_i is the clustering loss of the ith track; l_i,sm is the smoothness loss of track i; l_i,vc is the vehicle speed change loss; l_i,ti is the time interval loss between two adjacent track segments; l_i,ac is the vehicle appearance change loss; and λ_sm, λ_vc, λ_ti, and λ_ac are the regularization parameters of the track smoothness loss, the vehicle speed change loss, the time interval loss between adjacent track segments, and the vehicle appearance change loss, respectively;
the smoothness loss measures the smoothness of the track segments belonging to the same track; the track segments in each track are ordered according to their entry time, the acceleration is used to measure the speed change between two adjacent track segments, and whether track segments belong to the track of the same vehicle is judged from the acceleration change in the connection area of the track segments and the time interval between the track segments;
wherein the loss of change in appearance of the vehicle is calculated using an adaptive appearance model based on histograms.
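As an editorial illustration of the weighted clustering loss of claim 4 (the weights and the per-term losses are assumed to be precomputed; this is not the claimed implementation):

```python
def track_loss(tracks, lam_sm=1.0, lam_vc=1.0, lam_ti=1.0, lam_ac=1.0):
    """Total clustering loss l = sum_i l_i over all tracks in the scene.

    Each track is assumed to expose precomputed per-term losses
    (smoothness, speed change, time interval, appearance change).
    """
    total = 0.0
    for tr in tracks:
        l_i = (lam_sm * tr["sm"] + lam_vc * tr["vc"]
               + lam_ti * tr["ti"] + lam_ac * tr["ac"])
        total += l_i
    return total
```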
5. The method for predicting the vehicle behavior and the track of the expressway based on the vehicle-unmanned aerial vehicle cooperation according to claim 2, wherein the specific process of the step S23 is as follows: modeling the content of the image in the vehicle bounding box detected by the vehicle detection module at the position coordinates of each track segment generated by the track segment generation module, and describing the features of the vehicle in the bounding box by a combination of an RGB color histogram, an HSV color histogram, a Lab color histogram, a local binary pattern histogram, and a histogram of oriented gradients, so as to accurately model the image content in each vehicle bounding box; determining whether the vehicles detected in different bounding boxes are the same by comparing the five-histogram combinations of the different bounding boxes; and defining the loss of the adaptive appearance model by the difference between two adjacent histogram combinations of all track segments in a track, wherein a smaller loss value indicates a higher probability that the track segments in the track belong to the same track.
6. The method for predicting the vehicle behavior and track of the expressway based on the vehicle-unmanned aerial vehicle cooperation according to claim 2, wherein the assignment operation in the track clustering module in the step S24 is specifically: the track set formed by all track segments t_j is denoted T(j), and T(j) belongs to the track of vehicle j; a search is made over all tracks, and the track segment t_j is assigned to the track that produces the minimum loss; the loss change after the assignment operation is defined as:

Δl_j,as = (l(T(j)\t_j) + l(T(i)∪t_j)) − (l(T(j)) + l(T(i)))

wherein Δl_j,as is the loss change after the assignment operation; l(T(j)\t_j) represents the loss of the track set T(j) after the track segment t_j is removed from T(j); l(T(i)∪t_j) represents the loss of the track set T(i) after the track segment t_j is merged into T(i); and l(T(j)) and l(T(i)) represent the losses of the track sets T(j) and T(i), respectively;

the fusion operation in the track clustering module is as follows: for each track segment, if a smaller loss value can be obtained after fusing two track sets, the corresponding track set T(j) is fused with another track set T(i); similarly, the loss change of the fusion operation is defined as:

Δl_j,mg = l(T(j)∪T(i)) − (l(T(j)) + l(T(i)))

wherein Δl_j,mg is the loss change after the fusion operation, and l(T(j)∪T(i)) represents the loss after fusing the track set T(j) and the track set T(i);

the segmentation operation in the track clustering module: a segmentation operation is used to split a track segment from the current track set to form an independent track set; the loss change of the segmentation operation is defined as:

Δl_j,sp = (l(t_j) + l(T(j)\t_j)) − l(T(j))

wherein Δl_j,sp is the loss change after the segmentation operation; l(t_j) is the loss produced by the track segment t_j split from T(j); and l(T(j)\t_j) represents the loss produced after t_j is split from the track set T(j);

the exchange operation in the track clustering module: for a track set T(j), the track segments after t_j are denoted T_aft(j) and the track segments before t_j are denoted T_bef(j); T_bef(i) and T_bef(j) are exchanged, and the loss change between after and before the exchange is obtained:

Δl_sw = (l(T_aft(j)∪T_bef(i)) + l(T_bef(j)∪T_aft(i))) − (l(T(j)) + l(T(i)))

wherein Δl_sw represents the loss change after the exchange operation; l(T_aft(j)∪T_bef(i)) represents the loss after fusing T_aft(j) and T_bef(i); and l(T_bef(j)∪T_aft(i)) represents the loss after fusing T_bef(j) and T_aft(i);

the interruption operation in the track clustering module: T(j) is divided into T_aft(j) and T_bef(j) to calculate the loss of the interruption operation:

Δl_bk = (l(T_bef(j)) + l(T_aft(j))) − l(T(j))

wherein Δl_bk represents the loss change after the interruption operation; l(T_bef(j)) represents the loss of the track set T_bef(j); and l(T_aft(j)) represents the loss of the track set T_aft(j).
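For illustration, the loss change of the assignment operation in claim 6 can be evaluated as in this sketch, where `loss` is an assumed function mapping a set of track segments to its clustering loss:

```python
def delta_assign(loss, T_j, T_i, t_j):
    """Loss change if track segment t_j moves from track set T_j to track set T_i.

    T_j and T_i are modelled here as Python sets of track-segment ids.
    """
    before = loss(T_j) + loss(T_i)
    after = loss(T_j - {t_j}) + loss(T_i | {t_j})
    return after - before

# the clustering step would evaluate such deltas for all five operations
# (assign, fuse, split, exchange, interrupt) and apply the operation with
# the smallest (most negative) loss change.
```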
7. The method of claim 1, wherein the adversarial network encoder is composed of a series of encoders and decoders, the encoders encoding the input data into a new feature representation, and the decoders processing this feature representation to obtain an output;

the k-th layer of the adversarial network encoder model is:

H_k = F_k(W_k·H_(k-1) + B_(k-1,k))

Z_k = G_k(W_k^T·H_k + B_(k,k-1)) ≈ H_(k-1)

wherein H_(k-1) represents the set of hidden feature expressions of the samples reconstructed by the (k-1)-th layer decoder; H_k represents the set of hidden feature expressions of the samples reconstructed by the k-th layer decoder; Z_k is the set of k-th layer reconstructed samples; F_k and G_k represent the activation functions of the k-th layer input layer and hidden layer, respectively; W_k and B_(k-1,k) represent the weight matrix and offset vector from the k-th layer input layer to the hidden layer, respectively; and W_k^T and B_(k,k-1) are the corresponding weight matrix and offset vector from the k-th layer hidden layer to the output layer;

by adjusting the parameters of the adversarial network encoder model so that H_(k-1) and Z_k are as similar as possible, a deeper-level feature representation D = [d_1, d_2, ..., d_i, ..., d_N] is obtained, wherein N represents the number of input samples, i = 1, 2, 3, ..., N, and d represents the deep-level feature of each vehicle at each time;

after the deep-level feature d of each vehicle at each time is obtained, the features belonging to each vehicle are arranged in chronological order, so that the deep-level features M_i of each vehicle i over the past time period are obtained:

M_i = [d_i^(t-t_h), ..., d_i^(t-1), d_i^t]

wherein t represents time, and t_h represents the historical observation horizon.
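A minimal numerical sketch of one encoder/decoder layer as written in claim 7; the layer sizes, sigmoid activations, and tied weights W_k^T are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# layer k of the encoder/decoder pair: H_k = F_k(W_k H_(k-1) + B_(k-1,k)),
#                                      Z_k = G_k(W_k^T H_k + B_(k,k-1)) ≈ H_(k-1)
in_dim, hid_dim = 7, 32                    # assumed layer sizes
W_k = rng.normal(scale=0.1, size=(hid_dim, in_dim))
b_enc = np.zeros(hid_dim)
b_dec = np.zeros(in_dim)

H_prev = rng.normal(size=in_dim)           # H_(k-1): input features of one sample
H_k = sigmoid(W_k @ H_prev + b_enc)        # encoding
Z_k = sigmoid(W_k.T @ H_k + b_dec)         # reconstruction, trained to approximate H_(k-1)
recon_error = np.mean((Z_k - H_prev) ** 2) # minimised when adjusting W_k, b_enc, b_dec
```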
8. The method for predicting the vehicle behavior and the track of the expressway based on the vehicle-unmanned aerial vehicle cooperation according to claim 1, wherein the step S4 specifically comprises the following steps:
step S41: dividing the track of the vehicle into 8s segments, extracting the historical characteristics of the vehicle by using the historical track of the first 3s, and predicting the track of the last 5 s; the input X to the model is the history of the past 3s trajectories of the predicted vehicle and its surrounding vehicles:
X = [x^(t-t_h), ..., x^(t-1), x^t]

wherein x^t = [s_0^t, s_1^t, ..., s_n^t] represents the set of states of the predicted vehicle and its surrounding vehicles at time t, t = t-t_h, ..., t-2, t-1, t; t_h represents the historical observation horizon; s_n^t represents the state information of the nth vehicle at time t, n = 0, 1, ..., n; and n represents the number of vehicles around the predicted vehicle;
step S42: inputting position coordinates x and y of the vehicle, a Lane number Lane _ ID, a type v _ class of the vehicle, a speed v _ vel of the vehicle, an acceleration v _ acc of the vehicle, a lateral behavior code LAT of the vehicle to represent a state of the vehicle:
s=[x,y,Lane_ID,v_class,v_vel,v_acc,LAT]
LAT = [1,0,0] if the vehicle makes a left lane change in the next 5 s, LAT = [0,1,0] if it makes a right lane change, and LAT = [0,0,1] if the vehicle continues to keep the current lane;
step S43: in a traffic scene with n vehicles, the historical characteristics X of all vehicles in the previous 3s are taken as the input of a model, and the future 5s of behaviors of the vehicles are predicted:
M = {left lane change, right lane change, lane keeping}

wherein M represents the set of vehicle behaviors;
and predicting the trajectory distribution of the vehicle over the future prediction horizon t_f of 5 s:

P = [p^(t+1), p^(t+2), ..., p^(t+t_f)]

wherein P represents the future vehicle positions within the prediction horizon t_f; p^(t+1) = [p_x^(t+1), p_y^(t+1)] represents the position of the vehicle at that instant, with p_x^(t+1) and p_y^(t+1) representing the longitudinal and lateral coordinates of the vehicle at that instant, respectively.
CN202011379618.5A 2020-11-30 2020-11-30 Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation Active CN112347993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011379618.5A CN112347993B (en) 2020-11-30 2020-11-30 Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011379618.5A CN112347993B (en) 2020-11-30 2020-11-30 Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation

Publications (2)

Publication Number Publication Date
CN112347993A CN112347993A (en) 2021-02-09
CN112347993B true CN112347993B (en) 2023-03-17

Family

ID=74366187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011379618.5A Active CN112347993B (en) 2020-11-30 2020-11-30 Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation

Country Status (1)

Country Link
CN (1) CN112347993B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033899B (en) * 2021-03-29 2023-03-17 同济大学 Unmanned adjacent vehicle track prediction method
CN113033443B (en) * 2021-03-31 2022-10-14 同济大学 Unmanned aerial vehicle-based automatic pedestrian crossing facility whole road network checking method
CN113189989B (en) * 2021-04-21 2022-07-01 东风柳州汽车有限公司 Vehicle intention prediction method, device, equipment and storage medium
CN113291320A (en) * 2021-06-15 2021-08-24 苏州智加科技有限公司 Vehicle track prediction method, device, equipment and storage medium
CN113570595B (en) * 2021-08-12 2023-06-20 上汽大众汽车有限公司 Vehicle track prediction method and optimization method of vehicle track prediction model
CN113850237B (en) * 2021-11-29 2022-02-22 华砺智行(武汉)科技有限公司 Internet vehicle target detection and evaluation method and system based on video and track data
CN114495486B (en) * 2021-12-28 2023-06-09 江苏泰坦智慧科技有限公司 Microcosmic traffic flow prediction system and microcosmic traffic flow prediction method based on hierarchical reinforcement learning
CN115187917B (en) * 2022-09-13 2022-11-25 山东建筑大学 Unmanned vehicle historical scene detection method based on video clip retrieval
CN115909749B (en) * 2023-01-09 2023-07-14 广州通达汽车电气股份有限公司 Vehicle running road risk early warning method, device, equipment and storage medium
CN116580329B (en) * 2023-07-13 2023-09-26 杰能科世智能安全科技(杭州)有限公司 Unmanned aerial vehicle heat prediction method, device, equipment and medium
CN117037502B (en) * 2023-10-09 2023-12-22 中关村科学城城市大脑股份有限公司 Vehicle processing information generation method, device, equipment and computer readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110007675A (en) * 2019-04-12 2019-07-12 北京航空航天大学 A kind of Vehicular automatic driving decision system based on driving situation map and the training set preparation method based on unmanned plane
CN110532916A (en) * 2019-08-20 2019-12-03 北京地平线机器人技术研发有限公司 A kind of motion profile determines method and device
CN110738690A (en) * 2019-09-16 2020-01-31 南京理工大学 unmanned aerial vehicle video middle vehicle speed correction method based on multi-target tracking framework
CN111461292A (en) * 2020-03-17 2020-07-28 南京航空航天大学 Real-time trajectory prediction method for unmanned aerial vehicle

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10399676B2 (en) * 2014-03-31 2019-09-03 Working Drones, Inc. Indoor and outdoor aerial vehicles for painting and related applications
WO2019075008A1 (en) * 2017-10-10 2019-04-18 The Government Of The United States Of America, As Represented By The Secretary Of The Navy Method for identifying optimal vehicle paths when energy is a key metric or constraint
US11488393B2 (en) * 2017-11-14 2022-11-01 AWARE Technologies Systems and methods for moving object predictive locating, reporting, and alerting
CN109285348B (en) * 2018-10-26 2022-02-18 深圳大学 Vehicle behavior identification method and system based on unmanned aerial vehicle and long-and-short memory network
CA3239810A1 (en) * 2019-03-08 2020-09-17 Leddartech Inc. Method, system and computer readable medium for evaluating influence of an action performed by an external entity
CN110111566B (en) * 2019-04-19 2021-07-06 腾讯科技(深圳)有限公司 Trajectory prediction method, apparatus and storage medium
CN110751099B (en) * 2019-10-22 2022-05-10 东南大学 Unmanned aerial vehicle aerial video track high-precision extraction method based on deep learning
CN111931902B (en) * 2020-07-03 2024-05-14 江苏大学 Generating countermeasure network model and vehicle track prediction method using generating countermeasure network model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110007675A (en) * 2019-04-12 2019-07-12 北京航空航天大学 A kind of Vehicular automatic driving decision system based on driving situation map and the training set preparation method based on unmanned plane
CN110532916A (en) * 2019-08-20 2019-12-03 北京地平线机器人技术研发有限公司 A kind of motion profile determines method and device
CN110738690A (en) * 2019-09-16 2020-01-31 南京理工大学 unmanned aerial vehicle video middle vehicle speed correction method based on multi-target tracking framework
CN111461292A (en) * 2020-03-17 2020-07-28 南京航空航天大学 Real-time trajectory prediction method for unmanned aerial vehicle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A review of video prediction research based on deep learning; Mo Lingfei et al.; CAAI Transactions on Intelligent Systems; 2018-01-22 (No. 01); full text *
A social force-based pedestrian trajectory prediction model for autonomous vehicles; Yang Wenyan et al.; Journal of Highway and Transportation Research and Development; 2020-08-15 (No. 08); full text *

Also Published As

Publication number Publication date
CN112347993A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN112347993B (en) Expressway vehicle behavior and track prediction method based on vehicle-unmanned aerial vehicle cooperation
CN111310583B (en) Vehicle abnormal behavior identification method based on improved long-term and short-term memory network
CN112133089B (en) Vehicle track prediction method, system and device based on surrounding environment and behavior intention
US9053433B2 (en) Assisting vehicle guidance over terrain
CN110843789B (en) Vehicle lane change intention prediction method based on time sequence convolution network
CN112734808B (en) Trajectory prediction method for vulnerable road users in vehicle driving environment
JP2021515724A (en) LIDAR positioning to infer solutions using 3DCNN network in self-driving cars
CN111899515B (en) Vehicle detection system based on wisdom road edge calculates gateway
CN112249008B (en) Unmanned automobile early warning method aiming at complex dynamic environment
Hu et al. Learning a deep cascaded neural network for multiple motion commands prediction in autonomous driving
CN113128381A (en) Obstacle trajectory prediction method, system and computer storage medium
CN111695737A (en) Group target advancing trend prediction method based on LSTM neural network
CN116050245A (en) Highway automatic driving commercial vehicle track prediction and decision method and system based on complex network theory
CN115523934A (en) Vehicle track prediction method and system based on deep learning
CN115062202A (en) Method, device, equipment and storage medium for predicting driving behavior intention and track
EP2405383A1 (en) Assisting with guiding a vehicle over terrain
Khalil et al. Exploiting multi-modal fusion for urban autonomous driving using latent deep reinforcement learning
CN113238970B (en) Training method, evaluation method, control method and device of automatic driving model
Wang et al. Deep understanding of big geospatial data for self-driving: Data, technologies, and systems
CN116448134B (en) Vehicle path planning method and device based on risk field and uncertain analysis
CN114620059A (en) Automatic driving method and system thereof, and computer readable storage medium
CN113642682B (en) Trajectory primitive extraction and analysis method and system under multi-vehicle interaction environment
Bittel et al. Estimating high definition map parameters with convolutional neural networks
Axenie et al. Fuzzy modeling and inference for physics-aware road vehicle driver behavior model calibration
Bükk et al. Vehicle trajectory prediction using recurrent LSTM neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant