CN117141517A - Method for constructing vehicle track prediction model by combining data driving and knowledge guiding - Google Patents


Info

Publication number
CN117141517A
CN117141517A (application number CN202311179871.XA)
Authority
CN
China
Prior art keywords
vehicle
vector
data
time
driving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311179871.XA
Other languages
Chinese (zh)
Inventor
郭景华
王靖瑶
何智飞
李录斌
焦一洲
王晖年
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202311179871.XA priority Critical patent/CN117141517A/en
Publication of CN117141517A publication Critical patent/CN117141517A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/0097Predicting future conditions
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/0098Details of control systems ensuring comfort, safety or stability not otherwise provided for
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0027Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • B60W60/00272Planning or execution of driving tasks using trajectory prediction for other traffic participants relying on extrapolation of current movement
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0027Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • B60W60/00276Planning or execution of driving tasks using trajectory prediction for other traffic participants for two or more other traffic participants
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/36Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/72Data preparation, e.g. statistical preprocessing of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001Details of the control system
    • B60W2050/0002Automatic control, details of type of controller or control system architecture
    • B60W2050/0004In digital systems, e.g. discrete-time systems involving sampling
    • B60W2050/0005Processor details or data handling, e.g. memory registers or chip architecture
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001Details of the control system
    • B60W2050/0019Control system elements or transfer functions
    • B60W2050/0028Mathematical models, e.g. for simulation
    • B60W2050/0031Mathematical model of the vehicle
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001Details of the control system
    • B60W2050/0043Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/801Lateral distance
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/802Longitudinal distance
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/803Relative lateral speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/804Relative longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/805Azimuth angle
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mechanical Engineering (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Transportation (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Nonlinear Science (AREA)
  • Traffic Control Systems (AREA)

Abstract

A method for constructing a vehicle track prediction model by combining data driving and knowledge guiding relates to intelligent driving. The steps are as follows: 1) Data processing: process the surrounding-vehicle driving data collected by vehicle-mounted sensors such as cameras and millimeter-wave radars, establish rules to define the driving intention of surrounding vehicles, set a label for each piece of sequence data, and obtain the data required for vehicle motion prediction; 2) Adopt an encoder-decoder framework and propose a vehicle track prediction model based on an intention-aware spatio-temporal attention network; 3) Provide decision information: the result of the joint prediction supplies a rich information basis for the subsequent operation of the host vehicle. From the surrounding-vehicle driving information acquired by the sensors of the autonomous vehicle, a deep learning network jointly predicts the driving intention and driving track of surrounding vehicles, which improves the accuracy of long-term track prediction, realizes more precise position prediction, and provides rich decision information to ensure the driving safety of the vehicle.

Description

Method for constructing vehicle track prediction model by combining data driving and knowledge guiding
Technical Field
The invention relates to the technical field of intelligent driving, in particular to a method for constructing a vehicle track prediction model by combining data driving and knowledge guiding.
Background
In recent years, autonomous driving has attracted wide attention as an effective way to improve traffic safety and alleviate the energy and environmental problems faced by intelligent transportation systems. An autonomous vehicle needs the capability to predict the future motion of surrounding vehicles so that it can plan a safe driving path in advance and thereby effectively reduce traffic accidents. In the context of autonomous driving, early prediction of the motion of vehicles around the ego vehicle is a critical factor in ensuring a high level of road safety. Vehicle motion prediction comprises two main tasks: driving intention recognition and trajectory prediction.
With the rapid development of artificial intelligence, recent progress in deep learning provides a powerful tool for the vehicle motion prediction problem, and deep-learning-based methods have become the mainstream. The literature (Ding W, et al. Predicting Vehicle Behaviors Over An Extended Horizon Using Behavior Interaction Network. 2019 International Conference on Robotics and Automation (ICRA), 2019, Montreal, Canada.) proposes a vehicle behavior interaction network based on recurrent neural networks (RNN) that models vehicle interaction to predict the cut-in intent of surrounding vehicles; however, intent recognition alone gives no explicit trajectory information. The literature (Deo N, Trivedi M M. Convolutional social pooling for vehicle trajectory prediction[C]. Proceedings 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018: 1549-1557.) proposes the classical Convolutional Social LSTM prediction algorithm, which uses a convolutional social pooling layer to learn the interdependencies in vehicle motion. Although current deep-learning-based trajectory prediction methods achieve good results, such data-driven black-box models lack interpretability and transparency; moreover, treating driving intention recognition or trajectory prediction in isolation considers vehicle motion prediction one-sidedly, so the decision information provided to the host vehicle is limited.
Disclosure of Invention
The invention aims to solve the problems in the background art and provides a vehicle track prediction method for combining data driving and knowledge guiding.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention comprises the following steps:
1) Data processing: process the surrounding-vehicle driving data collected by vehicle-mounted sensors such as cameras and millimeter-wave radars, establish rules to define the driving intention of surrounding vehicles, set a label for each piece of sequence data, and obtain the data required for vehicle motion prediction;
2) An encoder-decoder framework is adopted to provide a vehicle track prediction model based on an intention perception space-time attention network;
first, constructing a track prediction model of an LSTM-based encoder-decoder framework;
secondly, constructing an intention recognition model based on BiLSTM to recognize the driving intention of surrounding vehicles;
Thirdly, introducing an intention attention mechanism to improve time-series prediction;
fourthly, computing the vehicle feature encoding vectors with the intention attention mechanism and capturing the importance of neighboring vehicles with an interaction-relation capturing module;
fifthly, performing long-horizon track prediction for surrounding vehicles by jointly using their position information, driving intention and interaction relations;
and sixthly, considering both hard constraints and soft constraints: on top of the proposed intention-aware spatio-temporal attention track prediction model, adding a kinematic layer and introducing a traffic-rule auxiliary loss function to improve the performance of the data-driven model.
3) Providing decision information: providing abundant information basis for the subsequent operation of the host vehicle according to the result of the combined prediction;
in step 1), the processing of the surrounding-vehicle driving data collected by vehicle-mounted sensors such as cameras and millimeter-wave radars specifically comprises the following steps:
the first step, the surrounding-vehicle driving data collected by the vehicle-mounted sensors mainly comprise: the longitudinal distance, lateral distance, longitudinal relative speed, lateral relative speed, longitudinal relative acceleration and lateral relative acceleration between the surrounding vehicle and the host vehicle, and the azimuth angle of the surrounding vehicle relative to the host vehicle together with its rate of change.
Second, fill the missing values in the data: a single missing value is filled by nearest-neighbor filling, several consecutive missing values are filled by interpolation; outliers are removed with the 3-sigma rule; finally the data are filtered with a Savitzky-Golay filter to obtain a smoother data curve.
And thirdly, defining a coordinate system to conveniently describe the position of the vehicle, and calculating the position coordinates of the host vehicle and the peripheral vehicle at each moment.
Fourth, the driving intention of surrounding vehicles is defined as changing lanes to the left, changing lanes to the right, and going straight, and an intention label is set for each piece of driving sequence data.
In step 2), to capture the temporal dependency in the track, the vehicle track prediction model based on the intention-aware spatio-temporal attention network is built on an LSTM-based encoder-decoder framework, the most established architecture for handling the track prediction problem; the specific steps are as follows:
the first step, a fully connected layer is used as an embedding layer to embed the input state vector of each vehicle into an embedding vector; LSTMs then separately encode the embedding vectors of each vehicle's historical time steps, yielding feature vectors for the target vehicle and its neighboring vehicles.
The second step, the driving intention of surrounding vehicles is predicted as follows:
(1) Surrounding-vehicle driving intention recognition is a classification problem: the recognition model classifies multi-feature, multi-step time-series input data.
(2) The recognition model is built on a BiLSTM network, which combines a forward LSTM and a backward LSTM and makes full use of context information to improve the accuracy of time-series prediction.
(3) Inspired by ResNet, shortcut connections are added to the BiLSTM network so that gradients can skip layers, which effectively mitigates the vanishing-gradient and network-degradation problems of deep networks.
(4) The motion state of the target vehicle and the spatial position information of its neighboring vehicles are fed into the recognition model, which outputs a driving intention probability vector for the surrounding vehicle; the intention with the highest probability is taken as the final recognition result.
Third, the attention mechanism works as follows:
(1) First, the intention vector of the target vehicle is concatenated with the decoder hidden state of the previous time step; the concatenated vector, after passing through a fully connected layer, serves as the query of a key-value attention mechanism, while the encoder hidden states of the vehicles, each passed through a fully connected layer, form the keys and values;
(2) The attention feature is then computed as a weighted sum of the "values".
(3) A multi-headed attention mechanism is employed to extend attention to higher order interactions.
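As a concrete illustration of steps (1)-(3), the intention-conditioned key-value attention can be sketched in NumPy. The dimensions, random weights, and single-head form are illustrative assumptions only; the model itself uses learned weights and multiple heads:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative dimensions: hidden size, number of intentions, key size, neighbours
d_h, d_i, d_k, n_veh = 64, 3, 32, 6

# Decoder hidden state and surrounding-vehicle intention probability vector
h_dec = rng.standard_normal(d_h)
intent = softmax(rng.standard_normal(d_i))   # e.g. [left, straight, right]

# Encoder hidden states, one per neighbouring vehicle
H_enc = rng.standard_normal((n_veh, d_h))

# Fully connected projections (weights would be learned in practice)
W_q = rng.standard_normal((d_h + d_i, d_k)) * 0.1
W_k = rng.standard_normal((d_h, d_k)) * 0.1
W_v = rng.standard_normal((d_h, d_k)) * 0.1

q = np.concatenate([intent, h_dec]) @ W_q    # query from intention ++ hidden state
K, V = H_enc @ W_k, H_enc @ W_v              # keys and values from encoder states

attn = softmax(K @ q / np.sqrt(d_k))         # one attention weight per neighbour
context = attn @ V                           # attention feature: weighted sum of values
```

A multi-head version would repeat the projection with independent weight sets and concatenate the resulting context vectors.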
And fourthly, an interaction-relation capturing module based on a multi-head attention mechanism is provided to capture the interaction between the target vehicle and other vehicles and to select which surrounding vehicles to attend to when predicting the future track of the target vehicle.
And fifthly, at the t-th prediction time step, the decoder likewise uses a fully connected layer with a LeakyReLU nonlinear activation to embed the coordinates of the previously predicted track point into an embedding vector; the intermediate semantic vector at that moment, which contains the selected vehicle interaction information and the motion-state encoding of the target vehicle, is passed together with the embedding vector to the LSTM decoder, which predicts the track point position of the target vehicle at the future time step.
And sixthly, a kinematic layer embedding the bicycle kinematic model is added between the final hidden layer and the output layer of the decoder to decode the track and achieve more accurate position prediction.
(1) The inputs of the kinematic model are the front-wheel steering angle and the longitudinal acceleration. At the current time t_p, to predict the motion at time step t_p+h, the decoder first predicts via the LSTM the longitudinal acceleration a_(t_p+h-1) and steering angle δ_(t_p+h-1) of the target vehicle at time step t_p+h-1, and then uses them to compute the motion state of the target vehicle at time step t_p+h.
(2) A fully connected layer processes the hidden state vector of the LSTM unit at the final time step, thereby summarizing the history of the target vehicle and estimating its fixed kinematic parameters.
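The kinematic decoding of steps (1) and (2) can be sketched as a discrete rollout of the kinematic bicycle model. The wheelbase values l_f and l_r, the time step, and the Euler integration are illustrative assumptions (in the model, l_f and l_r are estimated from the vehicle's history):

```python
import numpy as np

def bicycle_step(state, a, delta, lf=1.2, lr=1.6, dt=0.1):
    """One Euler step of the kinematic bicycle model.
    state = (x, y, psi, v); a = longitudinal acceleration, delta = front steering
    angle; lf/lr are the (assumed) distances from the centre of gravity to the axles."""
    x, y, psi, v = state
    beta = np.arctan(lr / (lf + lr) * np.tan(delta))  # slip angle at the CG
    x   += v * np.cos(psi + beta) * dt
    y   += v * np.sin(psi + beta) * dt
    psi += v / lr * np.sin(beta) * dt
    v   += a * dt
    return np.array([x, y, psi, v])

# Roll the model over the prediction horizon using the per-step
# (a, delta) pairs that the LSTM decoder would output.
state = np.array([0.0, 0.0, 0.0, 20.0])               # x, y, heading, speed
for a, delta in [(0.5, 0.0), (0.5, 0.02), (0.0, 0.02)]:
    state = bicycle_step(state, a, delta)
```

Because positions are produced by integrating a physically plausible model rather than regressed directly, the predicted track is kinematically feasible by construction.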
And seventh, an auxiliary loss function is designed from traffic rules applicable to the current data set, guiding training toward solutions that conform to this knowledge and making the model adapt to it more strongly.
(1) By traffic regulations and social convention, a vehicle traveling in a lane must travel along the lane direction so as not to interfere with other traffic. A yaw loss is introduced to measure how well the predicted track follows the direction of its lane.
(2) Traffic regulations generally impose a maximum speed limit on road traffic. An overspeed loss is introduced to measure how well the predicted track keeps the speed below this limit.
(3) To drive the model learned from the training data to also conform to these acceptable knowledge rules, the loss function combines the lane yaw loss and the overspeed loss with the mean squared error.
In step 3), the result of the joint prediction provides a rich information basis for the subsequent operation of the host vehicle, specifically: the joint prediction of driving intention and track yields both the current driving intention of a surrounding vehicle and its track positions over the following period; the recognized driving intention allows a rough judgment of its short-term behavior, reserving reaction time for the host vehicle, reducing the risk of collision and ensuring driving safety; from the predicted track positions, the host vehicle can plan its path in advance and obtain the optimal maneuver for reaching its destination, ensuring driving comfort and reducing wasted energy.
The invention has the beneficial effects that:
1. aiming at shortcomings of current vehicle track prediction research, such as neglected interaction among neighboring vehicles, insufficient feature extraction from vehicle information, and shallow treatment of driving intention, a track prediction algorithm based on an intention-aware spatio-temporal attention network is proposed to improve the accuracy of long-term track prediction.
2. On top of the intention-aware spatio-temporal attention track prediction, hard vehicle-kinematics constraints are integrated: a kinematic layer embedding the bicycle kinematic model generates kinematically feasible vehicle tracks and realizes more accurate position prediction.
3. Knowledge of traffic rules is used to design an auxiliary loss function based on knowledge-constraint penalties that optimizes the training of the model and improves its interpretability.
Drawings
FIG. 1 is a schematic diagram of a process framework of the present invention.
Fig. 2 is a schematic view of a neighboring vehicle of the present invention.
Fig. 3 is a schematic diagram of classification of driving intention of a vehicle according to the present invention.
FIG. 4 is a schematic diagram of a vehicle trajectory prediction model based on an intent-aware spatiotemporal attention network of the present invention.
FIG. 5 is a schematic diagram of the attention mechanism of the present invention.
FIG. 6 is a schematic diagram of an interaction relationship capture of the present invention.
FIG. 7 is a schematic diagram of the estimation process for the fixed kinematic parameters l_f and l_r of the present invention.
Fig. 8 is a schematic diagram of a decoder of the present invention with the addition of a kinematic layer.
Detailed Description
The invention will be further illustrated by the following examples in conjunction with the accompanying drawings.
The framework of the method, shown in fig. 1, consists of three parts: data processing, joint prediction combining surrounding-vehicle intention recognition and track prediction, and provision of decision information. The steps are as follows:
step 1: the data processing steps are as follows:
step 1.1: acquiring peripheral vehicle driving data acquired by vehicle-mounted sensors such as cameras and millimeter wave radars equipped in vehicles: longitudinal distance, transverse distance, longitudinal speed, transverse speed, longitudinal acceleration and transverse acceleration of the peripheral car and the main car.
Step 1.2: For a single missing value in the data, fill it with the value of the preceding or following time step (nearest-neighbor filling); for several consecutive missing values, fill them by interpolating between the values before and after the gap. Remove outliers with the 3-sigma criterion: compute the standard deviation σ and mean μ of each feature and discard data distributed outside the interval (μ-3σ, μ+3σ). Finally, filter the data with a Savitzky-Golay filter: around each original data point x(i), take the M sampling points on either side to form a window of 2M+1 points and fit a p-th order polynomial

y(i) = Σ_{k=0}^{p} b_k · i^k, minimizing E = Σ_{i=-M}^{M} [y(i) − x(i)]²

where y(i) is the processed data, i = −M, …, 0, …, M; E is the sum of squared errors; and p ≤ 2M.
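The preprocessing chain of step 1.2 can be sketched with SciPy's Savitzky-Golay implementation. The window length and polynomial order are illustrative, and linear interpolation stands in for both the single-value and multi-value filling strategies:

```python
import numpy as np
from scipy.signal import savgol_filter

def clean_series(x, window=11, poly=3):
    """Fill missing values, drop 3-sigma outliers, then Savitzky-Golay smooth.
    window corresponds to 2M+1 sample points; poly is the polynomial order p."""
    x = np.array(x, dtype=float)                  # work on a copy
    idx = np.arange(len(x))
    miss = np.isnan(x)
    # Interpolation covers both the single- and multi-gap cases.
    x[miss] = np.interp(idx[miss], idx[~miss], x[~miss])
    mu, sigma = x.mean(), x.std()
    out = np.abs(x - mu) > 3 * sigma              # 3-sigma rule
    x[out] = np.interp(idx[out], idx[~out], x[~out])
    return savgol_filter(x, window_length=window, polyorder=poly)
```

Note that `savgol_filter` enforces p < window length, matching the p ≤ 2M constraint above.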
Step 1.3: Define a coordinate system. Fig. 2 is a schematic diagram of neighboring vehicles: the 6 vehicles within a longitudinal range ±L of the target vehicle are defined as its neighbors, i.e. the vehicles other than the target vehicle inside the elliptical dashed line. Prediction is made at time t with the origin fixed on the host vehicle, the x-axis pointing along the direction of motion of the expressway, and the y-axis perpendicular to it. This lets the data acquisition of the invention better match the onboard sensors of an autonomous vehicle.
Step 1.4: according to the literature (Toledo T, et al. Modeling duration of lane changes [J]. Transportation Research Record, 2007, 1999(1)), the average duration of a lane change is 5 s in the city and 5.8 s on the highway; combining the two, the lane-change duration is set to 5.4 s. Fig. 3 is a schematic diagram of the classification of vehicle driving intentions in the present invention: the intersection of the vehicle trajectory with the dashed left lane line is the left lane-change point; the position of the vehicle 2.7 s before the left lane-change point is defined as the left lane-change start point, and the position 2.7 s after it as the left lane-change end point. A trajectory sequence containing a left lane-change segment is labeled with the left lane-change intention; the right lane-change intention is defined in the same way; if the trajectory sequence contains no lane-change segment, it is labeled with the lane-keeping intention. An intention label is set for each piece of driving data according to this rule.
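The labeling rule of step 1.4 can be sketched as follows, assuming 10 Hz data: a crossing of the left lane line defines the left lane-change point, and the samples within ±2.7 s of it (half the 5.4 s lane-change duration) are labeled as left lane change, everything else as lane keeping. The function and variable names are assumptions of this sketch.

```python
# Illustrative per-sample intent labeling for the left-lane-change case.
import numpy as np

DT = 0.1          # 10 Hz sampling assumed
HALF_LC = 2.7     # half of the 5.4 s lane-change duration

def label_left_lane_change(y_lat, left_line_y):
    """y_lat: lateral position per sample; returns 0 (keep) / 1 (left LC)."""
    labels = np.zeros(len(y_lat), dtype=int)
    crossings = np.flatnonzero(np.diff(np.sign(y_lat - left_line_y)) != 0)
    half = round(HALF_LC / DT)
    for c in crossings:   # mark the start-to-end window around each crossing
        labels[max(0, c - half):c + half + 1] = 1
    return labels

# vehicle drifting from y=0 across the lane line at y=1
labels = label_left_lane_change(np.linspace(0.0, 2.0, 100), left_line_y=1.0)
```

The right-lane-change label follows symmetrically with the right lane line.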
Step 2: joint prediction of the driving intention and trajectory of the surrounding vehicles. Fig. 4 is a schematic diagram of the vehicle trajectory prediction model based on the intention-aware spatio-temporal attention network; the implementation steps are as follows:
Step 2.1: use a fully connected layer as the embedding layer to embed the input state vector of each vehicle into an embedding vector, then encode the embedding vectors of each vehicle at the historical time steps with an LSTM, obtaining the corresponding feature vectors of the target vehicle and its surrounding neighbor vehicles.
Step 2.1.1: for the target vehicle and its surrounding neighbor vehicles, encode the historical state information over the period t_p − t_h to t_p. First, using a fully connected layer as the embedding layer, embed the input state vector s_t^i of each vehicle i to form the embedding vector e_t^i:

e_t^i = ψ(s_t^i; W_emb)

where ψ denotes a fully connected layer with a LeakyReLU nonlinear activation function and W_emb denotes the learned embedding-layer weights.
Step 2.1.2: use the LSTM to encode the embedding vectors of each vehicle over the period t_p − t_h to t_p:

h_t^i = LSTM(h_{t−1}^i, e_t^i; W_enc)

where h_t^i denotes the hidden state of the LSTM unit of vehicle i at time step t and W_enc denotes the weight matrix of the LSTM.
Step 2.1.3: for the target vehicle and its surrounding neighbor vehicles, record the obtained feature vectors as H_i = [h_{t_p−t_h}^i, …, h_{t_p}^i], where H_i ∈ R^{t_h×d_e} and d_e is the number of hidden units of the LSTM.
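Steps 2.1.1 to 2.1.3 can be sketched as follows, assuming PyTorch; the state dimension, embedding size and hidden size are illustrative, not taken from the patent.

```python
# Embedding layer with LeakyReLU followed by a shared LSTM encoder,
# as in steps 2.1.1-2.1.3.
import torch
import torch.nn as nn

class VehicleEncoder(nn.Module):
    def __init__(self, state_dim=6, emb_dim=32, hidden_dim=64):
        super().__init__()
        # fully connected embedding layer with LeakyReLU (step 2.1.1)
        self.emb = nn.Sequential(nn.Linear(state_dim, emb_dim),
                                 nn.LeakyReLU())
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, states):
        # states: (n_vehicles, t_h, state_dim) history of each vehicle
        e = self.emb(states)     # embedding vectors e_t
        H, _ = self.lstm(e)      # hidden state at every history time step
        return H                 # (n_vehicles, t_h, d_e): the H_i of 2.1.3

enc = VehicleEncoder()
H = enc(torch.randn(7, 30, 6))   # target + 6 neighbors, 3 s at 10 Hz
```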
Step 2.2: construct an intention-recognition model based on a BiLSTM network to recognize the driving intention of the surrounding vehicles; the specific steps are as follows:
step 2.2.1: considering that the time for one complete lane change is around 5s and the acquisition frequency of the data is 10Hz, the length of the input sequence is chosen to be 50. Zhou Chejia drive intents are marked as 3 of lane change left, lane change right and straight. The identification of the intention of the surrounding vehicles is a classification problem, corresponding labels are required to be set on input sequences, and the invention respectively represents the 3 intentions by 0, 1 and 2.
Step 2.2.2: take the motion state of the target vehicle as input and also consider the spatial position information of the neighbor vehicles; the input features for intention recognition are:

I_t = [s_t, Δs_t]

where s_t is the motion-state feature of the target vehicle, containing the speed v_t of the target vehicle at the current time t, its acceleration a_t, its lateral speed v_xt, and a speed evaluation index based on the driver's expected speed v_e; Δs_t is the interaction-state feature of the target vehicle, containing the relative lateral displacement and the relative longitudinal displacement between the target vehicle and neighbor vehicle i, i = 1, 2, …, 6.
Step 2.2.3: embed the input vector I_t with a fully connected layer to form the embedding vector e_t, and then feed the embedding vectors from time t_p − t_h to the current prediction time t_p into the Bi-LSTM:

e_t = φ(I_t; W_fe)
h_t = BiLSTM(h_{t−1}, e_t; W_bi)    (4)

where φ denotes a fully connected layer and W_fe the weight matrix of this embedding layer; h_t denotes the hidden state of the Bi-LSTM unit at time t and W_bi the weight matrix of the Bi-LSTM layer.
Step 2.2.4: the present invention introduces shortcut connections. An FC layer processes the input vector I_t to obtain a fixed-length vector r_t whose length equals that of the BiLSTM output vector h_t. The shortcut connection is built by an add operation between r_t and h_t, followed by a ReLU activation to obtain the updated output vector h̃_t:

r_t = FC(I_t; W_r)
h̃_t = ReLU(h_t + r_t)

where W_r denotes the weight matrix of the fully connected layer.
Step 2.2.5: the output vector h̃_t of time step t_p is processed by an FC layer, and the probabilities of the 3 driving intentions, namely lane keeping (LK), left lane change (LLC) and right lane change (RLC), are computed with a Softmax function:

P_t = Softmax(FC(h̃_{t_p}; W_f))

where P_t denotes the intention-class vector whose components give the probabilities of the three driving intentions, and W_f denotes the weight matrix of the fully connected layer.
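The intention-recognition head of steps 2.2.3 to 2.2.5 can be sketched as follows, assuming PyTorch: a BiLSTM over the embedded input sequence, an add-style shortcut connection through an FC layer with ReLU, and a softmax over the three intentions {LK, LLC, RLC}. All dimensions are illustrative.

```python
# BiLSTM intent classifier with a residual (shortcut) connection.
import torch
import torch.nn as nn

class IntentNet(nn.Module):
    def __init__(self, in_dim=16, emb_dim=32, hidden=64, n_classes=3):
        super().__init__()
        self.emb = nn.Linear(in_dim, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.shortcut = nn.Linear(in_dim, 2 * hidden)  # r_t, same size as h_t
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, I):
        # I: (batch, 50, in_dim), the 5 s input sequence at 10 Hz
        h, _ = self.bilstm(self.emb(I))
        h = torch.relu(h + self.shortcut(I))            # add + ReLU shortcut
        return torch.softmax(self.head(h[:, -1]), -1)   # probabilities at t_p

net = IntentNet()
probs = net(torch.randn(4, 50, 16))
```

In training one would feed the logits (without softmax) to `nn.CrossEntropyLoss`; the explicit softmax here mirrors the probability vector of step 2.2.5.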
Step 2.3: construct an intention-attention mechanism that, combined with the intention features, assigns attention weights over the historical time steps to the hidden states of the target vehicle and its neighbor vehicles, so that the trajectory features of the vehicle are effectively extracted in the time dimension and can be adjusted dynamically according to the driving intention. Fig. 5 is a schematic diagram of the intention-attention mechanism; the specific steps are as follows:
Step 2.3.1: the intention vector P_t of the target vehicle and the hidden state vector p_{t−1} of the decoder at the previous time step are concatenated, and the vector obtained after processing by the fully connected layer θ_l serves as the "query" Q_l of a key-value attention mechanism; the vehicle hidden states H_i produced by the encoder are processed by the fully connected layers φ_l and ρ_l to form the "key" K_l and "value" V_l:

Q_l = θ_l(Concat(P_t, p_{t−1}); W_θl)
K_l = φ_l(H_i; W_φl)
V_l = ρ_l(H_i; W_ρl)

where W_θl, W_φl and W_ρl denote the weight matrices to be learned in each attention head l, and Concat denotes the concatenation operation.
Step 2.3.2: the attention feature head_l is calculated as the weighted sum of the "values" v_lj, with the weights obtained by scaled dot-product attention (Dot-product Attention):

α_lj = softmax(Q_l K_l^T / √d)
head_l = Σ_j α_lj v_lj

where α_lj denotes the attention weight and d denotes a scaling factor equal to the dimension of the projection space.
Step 2.3.3: a multi-head attention mechanism is employed to extend the attention to higher-order interactions. Using different learned linear projections Q_l, K_l and V_l, n_h attention features head_l, l = 1, 2, …, n_h, are computed; these attention features are concatenated and processed with a fully connected layer:

z_t = FC(Concat(head_1, …, head_{n_h}); W_iq)

where z_t denotes the vehicle historical-trajectory feature encoding vector obtained when the decoder performs the prediction of time step t; the feature vectors extracted by the intention-attention mechanism for the target vehicle and the surrounding vehicles are denoted accordingly, and W_iq denotes the weight matrix of the fully connected layer.
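The scaled dot-product, multi-head computation of steps 2.3.1 to 2.3.3 can be sketched as follows, assuming PyTorch; the query comes from the (projected) concatenation of the intention vector and the previous decoder state, keys and values come from the encoder hidden states H_i, and all sizes are illustrative.

```python
# Multi-head scaled dot-product attention over one vehicle's history.
import math
import torch
import torch.nn as nn

def dot_product_attention(Q, K, V):
    # alpha = softmax(Q K^T / sqrt(d));  head = alpha V
    alpha = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1)), -1)
    return alpha @ V, alpha

n_heads, d, t_h = 4, 16, 30
q_in = torch.randn(1, 1, 64)   # Concat(intent vector P_t, decoder state)
H_i = torch.randn(1, t_h, 64)  # encoder hidden states of one vehicle
theta = nn.ModuleList(nn.Linear(64, d) for _ in range(n_heads))  # queries
phi = nn.ModuleList(nn.Linear(64, d) for _ in range(n_heads))    # keys
rho = nn.ModuleList(nn.Linear(64, d) for _ in range(n_heads))    # values
fc_out = nn.Linear(n_heads * d, 64)

heads = []
for l in range(n_heads):
    head_l, alpha = dot_product_attention(theta[l](q_in),
                                          phi[l](H_i), rho[l](H_i))
    heads.append(head_l)
z_t = fc_out(torch.cat(heads, dim=-1))   # trajectory feature for step t
```

The interaction-capturing module of step 2.4 reuses the same computation with the neighbor feature tensor Z_t in place of H_i.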
Step 2.4: an interaction-relation capturing module is proposed based on the multi-head attention mechanism to capture the interactions between the target vehicle and the other vehicles, and to select which surrounding vehicles to attend to when predicting the future trajectory of the target vehicle. Fig. 6 is a schematic diagram of the interaction-relation capturing of the present invention; the specific steps are as follows:
Step 2.4.1: the historical-trajectory feature vector of the target vehicle processed by the fully connected layer θ_s serves as the "query", while the feature vectors of the neighbor vehicles are processed by the fully connected layers φ_s and ρ_s to form the "keys" and "values". The cuboid Z_t consists of six small cuboids representing the feature vectors of the six neighbor vehicles; a blank transparent cuboid indicates that no neighbor vehicle is present at that position. As in the intention-attention mechanism, the "query" Q_s, "key" K_s and "value" V_s are computed with the scaled dot-product attention mechanism, and multi-head attention is used to enhance the representation capability of the model:

Q_s = θ_s(z_t; W_θs)
K_s = φ_s(Z_t; W_φs)
V_s = ρ_s(Z_t; W_ρs)
head_s = Σ_j α_sj v_sj
c_t = FC(Concat(head_1, …, head_{n_h}); W_si)

where c_t denotes the intermediate semantic vector combining all vehicle interaction information when the decoder performs the prediction of time step t, α_sj denotes the degree of correlation between a neighbor vehicle and the target vehicle, head_s denotes an attention feature over the interactions with the surrounding neighbor vehicles, n_h denotes the number of attention features computed in parallel, and W_θs, W_φs, W_ρs and W_si denote the learnable weight matrices of the corresponding transformations.
Step 2.5: perform longer-horizon trajectory prediction for the surrounding vehicles by combining their position information, driving intention and interaction relations; the specific steps are as follows:
Step 2.5.1: at the prediction of time step t, the decoder likewise uses a fully connected layer with a LeakyReLU nonlinear activation function to embed the coordinate Y_{t−1} of the predicted trajectory position at the previous time step into the embedding vector e_t:

e_t = ψ(Y_{t−1}; W_ed)

where ψ denotes a fully connected layer with a LeakyReLU nonlinear activation function and W_ed denotes the learned embedding-layer weights.
The intermediate semantic vector C_t at this time contains the selected vehicle interaction information c_t and the motion-state encoding of the target vehicle; it is passed to the LSTM decoder together with the embedding vector e_t:

U_t = Concat(C_t, e_t)

where U_t denotes the input vector of the LSTM decoder at the prediction of time step t and Concat denotes the concatenation operation.
The LSTM decoder predicts the trajectory-point positions of the target vehicle at the future time steps t = t_p+1, t_p+2, …, t_p+t_f. The decoder also has the same shortcut connection as in the intention-recognition module:

p_t = LSTM(p_{t−1}, U_t; W_dec)
p̃_t = ReLU(p_t + FC(U_t; W_d))
Y_t = FC(p̃_t; W_p)

where p_t denotes the hidden state vector of the decoder LSTM, p̃_t denotes the new state vector derived by introducing the shortcut connection, and W_dec, W_d and W_p are weight matrices.
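The decoding loop of step 2.5 can be sketched as follows, assuming PyTorch: at each future step the previous predicted position is embedded, concatenated with the context vector C_t, fed to an LSTM cell, and the next position is read out through an FC layer. Dimensions, the horizon and the fixed context are assumptions of this sketch.

```python
# Autoregressive trajectory decoding with an LSTMCell.
import torch
import torch.nn as nn

emb = nn.Sequential(nn.Linear(2, 16), nn.LeakyReLU())
cell = nn.LSTMCell(16 + 32, 64)        # input: Concat(C_t, e_t)
out = nn.Linear(64, 2)                 # hidden state -> (x, y)

def decode(C, Y0, t_f=25):
    # C: (batch, 32) context vector; Y0: (batch, 2) last observed position
    h = torch.zeros(C.size(0), 64)
    c = torch.zeros(C.size(0), 64)
    Y, traj = Y0, []
    for _ in range(t_f):               # t_f future steps
        U = torch.cat([C, emb(Y)], dim=-1)
        h, c = cell(U, (h, c))
        Y = out(h)                     # next predicted trajectory point
        traj.append(Y)
    return torch.stack(traj, dim=1)    # (batch, t_f, 2)

traj = decode(torch.randn(3, 32), torch.zeros(3, 2))
```

In the full model the context C would be recomputed at every step from the attention modules rather than held fixed as here.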
Step 2.6: a kinematic layer embedding a two-wheel bicycle kinematic model is added between the final hidden layer and the output layer of the decoder to decode the trajectory and achieve more accurate position prediction; the specific steps are as follows:
Step 2.6.1: the inputs of the kinematic model are the front-wheel steering angle δ and the longitudinal acceleration a. At the current time t_p, when predicting the motion of time step t_p+h, the decoder first obtains through LSTM prediction the longitudinal acceleration a and the steering angle δ of the target vehicle at time step t_p+h−1, and then uses them to calculate the motion state of the target vehicle at time step t_p+h:

s_{t_p+h} = f(s_{t_p+h−1}, a, δ; k)

where s denotes the motion state of the target vehicle obtained by the decoder, comprising the position coordinates x and y, the vehicle yaw angle ψ and the vehicle speed v; k denotes the fixed kinematic parameters of the vehicle, here the distance l_f from the vehicle centroid to the front axle and the distance l_r to the rear axle; f denotes the two-wheel bicycle kinematic model. The state at t_p+h is obtained by integrating the state derivative ṡ over Δt, the time interval of two prediction steps:

s_{t_p+h} = s_{t_p+h−1} + ṡ·Δt

where the derivative of the state is computed from the bicycle kinematics:

ẋ = v·cos(ψ + β)
ẏ = v·sin(ψ + β)
ψ̇ = (v / l_r)·sin β
v̇ = a
β = arctan( l_r·tan δ / (l_f + l_r) )

where β denotes the slip angle at the vehicle centroid.
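The kinematic layer of step 2.6.1 can be sketched numerically as follows; the state transition uses the standard kinematic bicycle model, and the explicit Euler integration over Δt, the default axle distances and the step size are assumptions of this sketch.

```python
# One integration step of the kinematic bicycle model:
# state = (x, y, yaw psi, speed v); inputs a (accel) and delta (steering).
import math

def bicycle_step(state, a, delta, l_f=1.5, l_r=1.5, dt=0.1):
    x, y, psi, v = state
    beta = math.atan(l_r / (l_f + l_r) * math.tan(delta))  # slip angle
    x += v * math.cos(psi + beta) * dt
    y += v * math.sin(psi + beta) * dt
    psi += v / l_r * math.sin(beta) * dt
    v += a * dt
    return (x, y, psi, v)

# straight driving: zero steering keeps heading, speed grows with a
s = (0.0, 0.0, 0.0, 10.0)
for _ in range(10):
    s = bicycle_step(s, a=1.0, delta=0.0)
```

Because the network outputs controls (a, δ) instead of raw positions, every decoded trajectory is kinematically feasible by construction, which is the point of the hard constraint in step 2.6.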
Step 2.6.2: the fixed kinematic parameters, i.e. the distance l_f from the vehicle centroid to the front axle and the distance l_r to the rear axle, are estimated from the historical state information of the vehicle. An FC layer processes the hidden state vector of the LSTM unit at the final time step of the target vehicle's encoded history to estimate l_f and l_r (see Fig. 7):

[l_f, l_r] = FC(h_{t_p}; W_fr)

where FC denotes a fully connected layer and W_fr denotes its weight matrix. According to the characteristics of a mid-size sedan, the range of l_f and l_r is limited to [1 m, 2 m].
Step 2.6.3: the decoder first generates the front-wheel steering angle δ_{t−1} and the longitudinal acceleration a_{t−1}:

(δ_t, a_t) = FC(p̃_t; W_p)

where p_t denotes the hidden state vector of the decoder LSTM, p̃_t denotes the new state vector derived by introducing the shortcut connection, and W_p is a weight matrix. Also according to the characteristics of a mid-size sedan, the range of the front-wheel steering angle δ_t is limited to [−45°, 45°] and that of the longitudinal acceleration a_t to [−8 m/s², 8 m/s²].
Step 2.6.4: the decoder structure with the added kinematic layer is shown in Fig. 8. The input of the decoder in step 2.5 is changed from the position coordinate Y_{t−1} of the target vehicle at the previous time step to the front-wheel steering angle δ_{t−1} and longitudinal acceleration a_{t−1} of the previous time step, which are embedded by the fully connected layer to form the embedding vector e_t:

e_t = ψ(δ_{t−1}, a_{t−1}; W_ed)

where ψ denotes a fully connected layer with a LeakyReLU nonlinear activation function and W_ed denotes the learned embedding-layer weights; the initial input of the decoder is set to 0.
Step 2.6.5: finally, using l_f and l_r obtained from the encoding unit, the front-wheel steering angle δ_{t−1} and longitudinal acceleration a_{t−1} produced by the decoder at the previous time step, and the motion state s_{t−1} of the previous time step, the motion state s_t of the current time step is obtained by the calculation of step 2.6.1.
Step 2.7: design auxiliary loss functions based on traffic rules available for the current data set, guiding the model to train in a direction consistent with this knowledge so that the model adapts better to it; the specific steps are as follows:
Step 2.7.1: a yaw loss is introduced to measure the ability of the trajectory to stay oriented along the direction of its lane.
Step 2.7.1.1: the heading angle θ_t of the vehicle at the trajectory point (x_t, y_t) is calculated as:

θ_t = arctan( (y_t − y_{t−1}) / (x_t − x_{t−1}) )

Step 2.7.1.2: the angular difference between the heading angle θ_t of a trajectory point in the global frame and the heading θ_NL of the nearest lane can then be calculated as:

δ(θ_t) = |θ_t − θ_NL|

where, for the horizontal straight lanes of the object of study, θ_NL = 0° is taken.
Step 2.7.1.3: normally, only minor adjustments of the vehicle heading are made in order to remain within the lane. Beyond that, the heading angle changes only when the vehicle changes lanes, and during a lane-change maneuver the heading angle normally does not exceed 90° and is usually far smaller. Since the vehicle trajectory should not be considered as yawing during a legal lane change or a small-angle lane correction, the constraint is limited to penalizing only angle differences exceeding a threshold α:

l_OH(θ_t) = max(δ(θ_t) − α, 0)

Take α = 45°. The yaw-constraint loss over all trajectory points in the prediction horizon t_f is then:

L_OH = Σ_{t=t_p+1}^{t_p+t_f} l_OH(θ_t)
Step 2.7.2: an overspeed loss is introduced to measure the ability of the predicted trajectory to keep the speed below the maximum speed limit.
Step 2.7.2.1: taking the longitudinal speed as the vehicle speed, the longitudinal speed of the vehicle can be calculated from the predicted trajectory positions as:

v_t = (x_t − x_{t−1}) / Δt

where v_t denotes the longitudinal speed of the vehicle at the trajectory point (x_t, y_t) and Δt denotes the time interval of two prediction steps.
Step 2.7.2.2: the constraint is limited to penalizing only speeds beyond the maximum speed limit v_m:

l_v(v_t) = max(v_t − v_m, 0)²
step 2.7.2.3: the purpose of using the quadratic function for overspeed losses is to tolerate slight deviations from the optimal driving situation, while still allowing the existence of the abnormal trajectory while penalizing it, in line with the real driving situation. Finally, the predicted time domain range t can be calculated f Overspeed loss at all trace points in the range is expressed as follows:
there are different maximum speed limits for different road conditions. For experimental data set, take the highest speed per hour limit v m Study was performed with =120 km/h.
Step 2.7.3: the intention-recognition model and the trajectory-prediction model are trained separately. For the intention-recognition model, cross entropy is chosen as the loss function and the network is trained end-to-end with an Adam optimizer with an initial learning rate of 0.0005. The trajectory-prediction model is likewise trained with an Adam optimizer with an initial learning rate of 0.0005. In order to drive the model learned from the training data to also conform to the acceptable knowledge rules, the lane-yaw loss and the overspeed loss are incorporated on top of the mean-square-error loss L_MSE:

L_total = L_MSE + λ_OH·L_OH + λ_v·L_v     (5)

where λ_OH and λ_v denote the weights of the yaw loss and the overspeed loss respectively, set as empirically determined hyper-parameters; x̂_t, ŷ_t denote the lateral and longitudinal coordinates of the predicted trajectory and x_t, y_t those of the real trajectory; t_p denotes the current time and t_f the prediction horizon.
In summary, the method for joint prediction of the driving intention and trajectory of the vehicles surrounding an autonomous vehicle can simultaneously recognize the driving intention of the surrounding vehicles and predict their trajectories over a longer time horizon, providing rich decision information for the host vehicle; the trajectory prediction that takes the driving-intention features and the attention mechanism into account guarantees the quality of long-term trajectory prediction; and by reasonably selecting different input features for intention recognition and trajectory prediction according to their different characteristics, computation is saved and prediction efficiency is improved.
The above is a further detailed description of the present invention in connection with the preferred embodiments, and it should not be construed that the invention is limited to these specific embodiments. Those skilled in the art will understand that simple derivations and substitutions may be made without departing from the spirit of the invention.

Claims (9)

1. The method for constructing a vehicle track prediction model by combining data driving and knowledge guiding is characterized by comprising the following steps:
1) Processing data: processing the surrounding-vehicle driving data acquired by on-board sensors such as cameras and millimeter-wave radar, filling missing values of the data, and filtering the data with a Savitzky-Golay filter to obtain smoother data curves; setting up rules defining the surrounding-vehicle driving-intention types and setting a label for each piece of sequence data, obtaining the data required for vehicle motion prediction; the surrounding-vehicle driving data comprise the longitudinal distance, lateral distance, longitudinal relative speed, lateral relative speed, longitudinal relative acceleration and lateral relative acceleration of a surrounding vehicle with respect to the host vehicle, and the rate of change of the azimuth angle of the surrounding vehicle relative to the host vehicle;
2) An encoder-decoder framework is adopted to provide a vehicle track prediction model based on an intention perception space-time attention network;
first, constructing a track prediction model of an LSTM-based encoder-decoder framework;
secondly, constructing an intention recognition model based on BiLSTM, and recognizing the driving intention of the surrounding vehicles;
thirdly, introducing an intention attention mechanism to improve the performance of the time-series prediction;
fourthly, computing the vehicle feature encoding vectors obtained on the basis of the intention attention mechanism, and capturing the importance of the neighbor vehicles with an interaction relation capturing module;
fifthly, carrying out long-time-domain trajectory prediction for the surrounding vehicles by combining their position information, driving intention and interaction relations;
sixthly, considering both hard constraints and soft constraints, adding a kinematic layer and introducing traffic-rule auxiliary loss functions on the basis of the proposed intention-aware spatio-temporal attention network trajectory prediction model so as to improve the performance of the data-driven model;
3) Providing decision information: providing a rich information basis for the subsequent operation of the host vehicle according to the result of the joint prediction.
2. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model according to claim 1, wherein in step 1), the data is processed by:
step 1.1: acquiring the surrounding-vehicle driving data collected by on-board sensors such as cameras and millimeter-wave radar equipped on the vehicle;
step 1.2: for a single missing value in the data, filling it with the value of the previous (or next) time step by nearest-neighbor filling; for several consecutive missing values, filling them by interpolation with the average of the values immediately before and after the gap; removing outliers with the 3-sigma criterion by computing the standard deviation σ and mean μ of each feature and discarding data whose values fall outside the interval (μ−3σ, μ+3σ); finally, filtering the data with a Savitzky-Golay filter, taking M sampling points on each side of the original sample x(i) so that a window of 2M+1 sampling points around x is used to fit a polynomial y(i) of order p by least squares:

y(i) = Σ_{k=0}^{p} b_k·i^k,   E = Σ_{i=−M}^{M} [y(i) − x(i)]²

where y(i) is the processed data, i = −M, …, 0, …, M; E is the sum of squared errors to be minimized; p ≤ 2M;
step 1.3: defining a coordinate system, wherein the vehicles within a longitudinal range ±L of the target vehicle are defined as neighbor vehicles, i.e. the vehicles other than the target vehicle inside the elliptical dashed line; the prediction is made at time t with the origin fixed on the host vehicle, the x-axis pointing along the direction of motion on the highway and the y-axis perpendicular to it; the data acquisition thus better matches the on-board sensors of an autonomous vehicle;
step 1.4: zhou Chejia travel intents are defined as lane change to the left, lane change to the right and straight travel, and intention labels are set for each piece of travel sequence data.
3. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model as claimed in claim 1, wherein in step 2), the intention-aware spatio-temporal attention network vehicle trajectory prediction model is constructed with an LSTM-based encoder-decoder framework to capture the time-dependent relationships in the trajectories, comprising the steps of:
the method comprises the steps that a full connection layer is used as an embedding layer, input state vectors of each vehicle are embedded to form embedded vectors, the embedded vectors at different vehicle historic moments are encoded by utilizing LSTM, and corresponding feature vectors are obtained for a target vehicle and neighboring vehicles around the target vehicle;
secondly, predicting the driving intention of the surrounding vehicles:
(1) Surrounding-vehicle driving-intention recognition is a classification problem: the driving-intention recognition model classifies the input multi-feature, multi-step time-series data;
(2) The surrounding-vehicle driving-intention recognition model is constructed based on a BiLSTM network, which combines a forward LSTM and a backward LSTM and uses context information to improve the accuracy of the time-series prediction;
(3) Shortcut connections are added to the Bi-LSTM network to realize strided gradient propagation, effectively reducing the influence of vanishing gradients and network degradation in deep networks;
(4) The motion state of the target vehicle and the spatial position information of the neighbor vehicles are input into the driving-intention recognition model, which outputs the driving-intention probability vector of the surrounding vehicle; the driving intention with the maximum probability is taken as the final recognition result;
thirdly, the intention attention mechanism:
(1) The intention vector of the target vehicle and the hidden state vector of the decoder at the previous time step are concatenated; the vector obtained after processing by a fully connected layer serves as the "query" of a key-value attention mechanism, while the vehicle hidden states produced by the encoder are processed by fully connected layers to form the "keys" and "values";
(2) Calculating the attention feature as a weighted sum of "values";
(3) A multi-headed attention mechanism is employed to extend attention to higher order interactions;
fourthly, providing an interaction relation capturing module based on a multi-head attention mechanism for capturing interaction between the target vehicle and other vehicles, and selecting surrounding vehicles to pay attention when predicting future tracks of the target vehicle;
fifthly, at the prediction of time step t the decoder likewise uses a fully connected layer with a LeakyReLU nonlinear activation function to embed the coordinates of the predicted trajectory position at the previous time step into an embedding vector; the intermediate semantic vector at this time contains the selected vehicle interaction information and the motion-state encoding of the target vehicle, and is passed together with the embedding vector to the LSTM decoder, which predicts the trajectory-point positions of the target vehicle at the future time steps;
sixthly, adding a kinematic layer embedded in the two-wheel bicycle kinematic model between a final hidden layer and an output layer of the decoder to decode the track so as to realize more accurate position prediction;
(1) The inputs of the kinematic model are the front-wheel steering angle and the longitudinal acceleration; at the current time t_p, when predicting the motion of time step t_p+h, the decoder first obtains through LSTM prediction the longitudinal acceleration and steering angle of the target vehicle at time step t_p+h−1, and then uses them to calculate the motion state of the target vehicle at time step t_p+h;
(2) Processing a hidden state vector of the final moment of the LSTM unit by using a full-connection layer to estimate the fixed kinematic parameters of the vehicle;
seventh, designing an auxiliary loss function based on some traffic rules available in the current data set, and guiding the model to train according to the direction of the conforming knowledge, so that the model has stronger adaptability to the knowledge;
(1) According to traffic rules and social practice, vehicles must travel along the travel direction of the lane when traveling on the lane so as to avoid interference with other traffic; introducing a yaw loss to measure the ability of the track to orient in the direction of the lane being taken;
(2) According to the requirements of traffic rules, vehicles generally have the limitation of highest speed per hour when running on roads; introducing overspeed losses to measure the ability of the predicted trajectory to control speed not exceeding the highest speed per hour;
(3) In order to drive the model learned from the training data to also conform to certain acceptable knowledge rules, the loss function incorporates lane yaw loss and overspeed loss on a mean square error basis.
4. The method for constructing a model for predicting a vehicle trajectory by combining data driving and knowledge guiding according to claim 1, wherein in step 2), the construction of the trajectory prediction model with the LSTM-based encoder-decoder framework uses a fully connected layer as the embedding layer to embed the input state vector of each vehicle into an embedding vector and encodes the embedding vectors at the historical time steps with an LSTM; for the target vehicle and its surrounding neighbor vehicles, the method comprises the steps of:
step 2.1.1: for the target vehicle and its surrounding neighbor vehicles, encoding the historical state information over the period t_p − t_h to t_p; using a fully connected layer as the embedding layer, the input state vector s_t^i of each vehicle i is embedded to form the embedding vector e_t^i:

e_t^i = ψ(s_t^i; W_emb)

where ψ denotes a fully connected layer with a LeakyReLU nonlinear activation function and W_emb denotes the learned embedding-layer weights;
step 2.1.2: using the LSTM to encode the embedding vectors of each vehicle over the period t_p − t_h to t_p:

h_t^i = LSTM(h_{t−1}^i, e_t^i; W_enc)

where h_t^i denotes the hidden state of the LSTM unit of vehicle i at time step t and W_enc denotes the weight matrix of the LSTM;
step 2.1.3: for the target vehicle and its surrounding neighbor vehicles, recording the obtained feature vectors as H_i = [h_{t_p−t_h}^i, …, h_{t_p}^i], where H_i ∈ R^{t_h×d_e} and d_e is the number of hidden units of the LSTM.
5. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model according to claim 1, wherein in step 2), the method for constructing a BiLSTM-based intent recognition model comprises the following specific steps:
step 2.2.1: considering that a complete lane change takes about 5 s and the data acquisition frequency is 10 Hz, the length of the input sequence is chosen as 50; the surrounding-vehicle driving intentions are labeled as 3 classes: lane change left, lane change right and straight; the surrounding-vehicle intention recognition is a classification problem, so a corresponding label is set for each input sequence;
step 2.2.2: take the motion state of the target vehicle as input and account for the spatial position information of the neighboring vehicles; the input features for intention recognition are:

I_t = [s_t, Δs_t]

where s_t is the motion state feature of the target vehicle, comprising v_t (the speed of the target vehicle at the current time t), a_t (the acceleration of the vehicle), v_xt (the lateral speed of the vehicle), and a speed evaluation index based on the driver's expected speed v_e; Δs_t is the interaction state feature of the target vehicle, comprising the relative lateral displacement and the relative longitudinal displacement between the target vehicle and neighboring vehicle i, i = 1, 2, ..., 6;
step 2.2.3: embed the input vector I_t with a fully connected layer to form the embedded vector e_t, and feed the embedded vectors from time t_p − t_h to the current prediction time t_p into the Bi-LSTM:

e_t = FC(I_t; W_fe)
h_t = BiLSTM(h_{t−1}, e_t; W_bi)    (2)

where FC denotes the fully connected layer and W_fe the weight matrix of its embedding; h_t denotes the hidden state of the Bi-LSTM cell at time t and W_bi the weight matrix of the Bi-LSTM layer;
step 2.2.4: introduce a shortcut connection: process the input vector I_t with an FC layer to obtain a fixed-length vector r_t whose length equals that of the BiLSTM output vector h_t; construct the shortcut connection by adding r_t to h_t and applying a ReLU activation to obtain the updated output vector ĥ_t:

r_t = FC(I_t; W_r)
ĥ_t = ReLU(h_t + r_t)

where W_r denotes the weight matrix of the fully connected layer;
step 2.2.5: after the output vector ĥ_t at time step t_p is processed by an FC layer, the probabilities of the three driving intentions, lane keeping (LK), left lane change (LLC), and right lane change (RLC), are computed with a Softmax function:

P = Softmax(W_f ĥ_t)

where P denotes the intention class vector whose components are the probabilities of the three driving intentions, and W_f denotes the weight matrix of the fully connected layer.
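Steps 2.2.4–2.2.5 (shortcut connection plus Softmax head) can be sketched as follows; the dimensions, the random weights, and the stand-in for the BiLSTM output h_t are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

d_in, d_h = 10, 16                 # toy sizes: input feature dim, BiLSTM output dim
I_t = rng.standard_normal(d_in)    # input feature vector I_t = [s_t, Δs_t]
h_t = rng.standard_normal(d_h)     # stand-in for the BiLSTM output at t_p
W_r = rng.standard_normal((d_h, d_in)) * 0.1   # FC layer of the shortcut
W_f = rng.standard_normal((3, d_h)) * 0.1      # classification head

r_t = W_r @ I_t                    # fixed-length projection of I_t (same length as h_t)
h_hat = relu(h_t + r_t)            # shortcut connection: add, then ReLU
P = softmax(W_f @ h_hat)           # probabilities of LK, LLC, RLC
print(P)
```

The add-then-activate shortcut lets the raw input features bypass the recurrent stack, which is the stated purpose of step 2.2.4.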
6. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model according to claim 1, wherein in step 2), an intention attention mechanism is introduced that, combined with the intention features, assigns attention weights at different historical time steps to the hidden states of the target vehicle and its neighboring vehicles, effectively extracting the trajectory features of the vehicle in the time dimension and dynamically adjusting them according to the driving intention; the specific steps are:
step 2.3.1: the intention vector P of the target vehicle and the hidden state vector p_{t−1} of the decoder at the previous time are concatenated, and the result is processed by the fully connected layer θ_l to obtain the "query" Q_l of a key-value attention mechanism; the vehicle hidden states H_i produced by the encoder are processed by the fully connected layers φ_l and ρ_l to form the "key" K_l and "value" V_l:

Q_l = θ_l(Concat(P, p_{t−1}); W_θl)
K_l = φ_l(H_i; W_φl)
V_l = ρ_l(H_i; W_ρl)

where W_θl, W_φl and W_ρl denote the weight matrices to be learned in each attention head l, and Concat denotes the concatenation operation;
step 2.3.2: the attention feature head_l is calculated as the weighted sum of the "values" V_lj:

head_l = Σ_j α_lj V_lj

where α_lj denotes the attention weight, obtained here with dot-product attention (Dot-product Attention) [22]:

α_lj = softmax(Q_l K_l^T / √d)

where d denotes a scaling factor equal to the dimension of the projection space;
step 2.3.3: a multi-head attention mechanism is adopted to extend attention to higher-order interactions; using different learned linear projections Q_l, K_l and V_l, n_h attention features head_l, l = 1, 2, ..., n_h, are computed; these attention features are concatenated and processed by a fully connected layer:

z_t = W_iq Concat(head_1, head_2, ..., head_{n_h})

where z_t denotes the feature vector of the target vehicle and surrounding vehicles extracted by the intention attention mechanism, obtained by the decoder when performing the prediction at time step t, and W_iq denotes the weight matrix of the fully connected layer.
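A minimal sketch of the intention attention of steps 2.3.1–2.3.3, assuming toy dimensions and random stand-ins for the encoder hidden states and the query vector:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention: alpha = softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]                       # scaling factor d = projection dimension
    alpha = softmax(Q @ K.T / np.sqrt(d))
    return alpha @ V, alpha

t_h, d_e, d, n_h = 30, 16, 8, 4           # toy: history length, hidden, projection, heads
H = rng.standard_normal((t_h, d_e))       # stand-in for encoder hidden states over time
q = rng.standard_normal(d_e)              # stand-in for Concat(P, p_{t-1}) after theta_l

heads = []
for _ in range(n_h):                      # one learned projection per head l
    Wq, Wk, Wv = (rng.standard_normal((d_e, d)) * 0.1 for _ in range(3))
    head, alpha = dot_product_attention((q @ Wq)[None, :], H @ Wk, H @ Wv)
    heads.append(head[0])

W_iq = rng.standard_normal((d_e, n_h * d)) * 0.1
z_t = W_iq @ np.concatenate(heads)        # fused intention-aware feature z_t
print(z_t.shape)                          # (16,)
```

Each head attends over the t_h historical time steps with its own projections, matching the multi-head extension of step 2.3.3.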
7. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model according to claim 1, wherein in step 2), the vehicle feature encoding vectors obtained by the intention attention mechanism are used, and an interaction relation capture module captures the importance of the neighboring vehicles; the specific steps are:
step 2.4.1: the historical trajectory feature vector H_0 of the target vehicle, processed by the fully connected layer θ_s, serves as the "query"; the feature vectors of the neighboring vehicles, processed by the fully connected layers φ_s and ρ_s, form the "keys" and "values"; the cuboid Z_t consists of six small cuboids representing the feature vectors of the six neighboring vehicles, with a blank transparent cuboid indicating that no neighbor vehicle is present at that position; as in the intention attention mechanism, the "query" Q_s, "key" K_s and "value" V_s are computed with the scaled dot-product attention mechanism, and multi-head attention is used to enhance the representational capacity of the model:

Q_s = θ_s(H_0; W_θs)
K_s = φ_s(Z_t; W_φs)
V_s = ρ_s(Z_t; W_ρs)
c_t = W_si Concat(head_1, head_2, ..., head_{n_h})

where c_t denotes the intermediate semantic vector combining all vehicle interaction information when the decoder performs the prediction at time step t, α_sj denotes the degree of correlation between neighboring vehicle j and the target vehicle, head_s denotes the attention feature of the interactions with the surrounding neighbor vehicles, n_h denotes the number of attention features computed in parallel, and W_θs, W_φs, W_ρs and W_si denote the corresponding learnable weight matrices.
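One way to realize step 2.4.1's handling of empty neighbor slots (the "blank transparent cuboid") is to mask absent neighbors out of the softmax; the sketch below is a single-head illustration with assumed toy dimensions and random placeholder features:

```python
import numpy as np

rng = np.random.default_rng(3)

d_e = 16
Z = rng.standard_normal((6, d_e))       # feature vectors of the 6 neighbor slots
present = np.array([1, 1, 0, 1, 0, 1], dtype=bool)  # False = no neighbor at that slot
q = rng.standard_normal(d_e)            # target-vehicle history feature (after theta_s)

Wq, Wk, Wv = (rng.standard_normal((d_e, d_e)) * 0.1 for _ in range(3))
scores = (q @ Wq) @ (Z @ Wk).T / np.sqrt(d_e)
scores[~present] = -np.inf              # absent neighbors get zero attention weight
alpha = np.exp(scores - scores[present].max())
alpha /= alpha.sum()                    # alpha_sj: relevance of each neighbor
c_t = alpha @ (Z @ Wv)                  # intermediate semantic vector c_t
print(np.round(alpha, 3))
```

Setting the masked scores to −∞ before the softmax guarantees α_sj = 0 for every empty slot, so only real neighbors contribute to c_t.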
8. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model according to claim 1, wherein in step 2), the position information, driving intention and interaction relations of the surrounding vehicles are combined to perform trajectory prediction of the surrounding vehicles over a longer time horizon, specifically comprising the following steps:
step 2.5.1: when predicting the t-th time step, the decoder likewise uses a fully connected layer with a LeakyReLU nonlinear activation function to embed the coordinates Y_{t−1} of the predicted trajectory point at the previous time into the embedded vector e_t:

e_t = ψ(Y_{t−1}; W_ed)

where ψ denotes a fully connected layer with a LeakyReLU nonlinear activation function and W_ed denotes the learnable embedding-layer weights;
at this time the intermediate semantic vector c_t containing the selected vehicle interaction information is combined with the motion-state encoding of the target vehicle into C_t, which is fed together with the embedded vector e_t into the LSTM decoder:

U_t = Concat(C_t, e_t)

where U_t denotes the input vector of the LSTM decoder when predicting the t-th time step and Concat denotes the concatenation operation;
the LSTM decoder predicts and generates the trajectory point positions of the target vehicle for the future time steps t = t_p+1, t_p+2, ..., t_p+t_f; the decoder also adds the same shortcut connection as the intention recognition module:

p_t = LSTM(p_{t−1}, U_t; W_dec)
p̂_t = ReLU(p_t + FC(U_t; W_d))
Y_t = FC(p̂_t; W_p)

where p_t denotes the hidden state vector of the decoder LSTM, p̂_t denotes the new state vector obtained by introducing the shortcut connection, and W_dec, W_d and W_p are weight matrices.
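The autoregressive decoder loop of claim 8 can be sketched as below; for brevity the context C_t is held fixed across steps and the shortcut connection is omitted, and all weights and dimensions are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(4)

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)

def lstm_step(h, c, u, W):
    """One LSTM step; W packs the four gate weight matrices (W_dec)."""
    d = h.size
    z = W @ np.concatenate([u, h])
    i, f, o = (1 / (1 + np.exp(-z[k*d:(k+1)*d])) for k in range(3))
    g = np.tanh(z[3*d:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

d_e, d_emb, t_f = 16, 8, 25            # toy: hidden size, embedding size, horizon t_f
W_ed = rng.standard_normal((d_emb, 2)) * 0.1          # embeds (x, y) of Y_{t-1}
W_dec = rng.standard_normal((4*d_e, d_emb + 2*d_e)) * 0.1
W_out = rng.standard_normal((2, d_e)) * 0.1           # hidden state -> (x, y)

C = rng.standard_normal(d_e)           # stand-in context C_t (fixed here for brevity)
Y = np.zeros(2)                        # last observed position
p = c = np.zeros(d_e)
track = []
for t in range(t_f):                   # predict t_p+1 ... t_p+t_f
    e = leaky_relu(W_ed @ Y)           # e_t = psi(Y_{t-1}; W_ed)
    U = np.concatenate([C, e])         # U_t = Concat(C_t, e_t)
    p, c = lstm_step(p, c, U, W_dec)
    Y = W_out @ p                      # next predicted track point
    track.append(Y)

track = np.array(track)
print(track.shape)                     # (25, 2)
```

Each step feeds the previous predicted point back in as input, which is what makes the decoder autoregressive over the t_f-step horizon.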
9. The method for constructing a data-driven and knowledge-guided vehicle trajectory prediction model according to claim 1, wherein in step 3), the result of the joint prediction provides a rich information basis for subsequent maneuvers of the host vehicle, specifically as follows: through the joint prediction of driving intention and trajectory, the current driving intention of the surrounding vehicles and their trajectory positions over the following period are obtained; the short-term behavior of a surrounding vehicle can be roughly judged from its recognized driving intention, which reserves reaction time for host-vehicle maneuvers, reduces the risk of collision, and ensures driving safety; and the host vehicle plans its path in advance according to the predicted trajectory positions of the surrounding vehicles over the following period, obtaining the optimal maneuver for reaching the destination, thereby ensuring driving comfort and reducing wasted energy consumption.
CN202311179871.XA 2023-09-13 2023-09-13 Method for constructing vehicle track prediction model by combining data driving and knowledge guiding Pending CN117141517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311179871.XA CN117141517A (en) 2023-09-13 2023-09-13 Method for constructing vehicle track prediction model by combining data driving and knowledge guiding

Publications (1)

Publication Number Publication Date
CN117141517A true CN117141517A (en) 2023-12-01

Family

ID=88907857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311179871.XA Pending CN117141517A (en) 2023-09-13 2023-09-13 Method for constructing vehicle track prediction model by combining data driving and knowledge guiding

Country Status (1)

Country Link
CN (1) CN117141517A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636270A (en) * 2024-01-23 2024-03-01 南京理工大学 Vehicle robbery event identification method and device based on monocular camera
CN117636270B (en) * 2024-01-23 2024-04-09 南京理工大学 Vehicle robbery event identification method and device based on monocular camera
CN117709394A (en) * 2024-02-06 2024-03-15 华侨大学 Vehicle track prediction model training method, multi-model migration prediction method and device
CN117922576A (en) * 2024-03-22 2024-04-26 山东科技大学 Automatic driving vehicle lane change decision method based on data and knowledge double driving
CN117922576B (en) * 2024-03-22 2024-05-17 山东科技大学 Automatic driving vehicle lane change decision method based on data and knowledge double driving
CN118015844A (en) * 2024-04-10 2024-05-10 成都航空职业技术学院 Traffic dynamic control method and system based on deep learning network
CN118015844B (en) * 2024-04-10 2024-06-11 成都航空职业技术学院 Traffic dynamic control method and system based on deep learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination