CN115909239A - Vehicle intention recognition method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN115909239A
Authority
CN
China
Prior art keywords
vehicle
lane
track
information
lane change
Prior art date
Legal status
Pending
Application number
CN202211361174.1A
Other languages
Chinese (zh)
Inventor
黄萌
宋永康
万烨星
邓捷
Current Assignee
Ningbo Lutes Robotics Co ltd
Original Assignee
Wuhan Lotus Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Lotus Technology Co Ltd
Priority to CN202211361174.1A
Publication of CN115909239A


Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a vehicle intention recognition method and device, a computer device, and a storage medium. The method comprises the following steps: acquiring vehicle source data; performing global feature encoding on the vehicle source data to obtain global coding information that includes vehicle and lane information; performing lane-change trajectory prediction on the global coding information to obtain lane-change trajectories, and computing an intention prediction score for each lane-change trajectory and its state, so as to recognize the lane-change intention of the vehicle from that score. Because global feature encoding is applied to the vehicle source data to obtain global coding information covering both vehicle and lane information, and because lane-change trajectory prediction and intention scoring are performed on that shared encoding, the resource consumption of deploying multiple cognitive models on the vehicle side can be reduced, the computation results of trajectory prediction and intention recognition can mutually constrain each other, and the accuracy of lane-change recognition is improved.

Description

Vehicle intention recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of automotive technologies, and in particular, to a method and an apparatus for identifying a vehicle intention, a computer device, and a storage medium.
Background
With the wide application of artificial intelligence in automatic driving, recognizing driving intention has become one of the main autonomous-driving functions: by recognizing the lane-change and cut-in behaviors of surrounding vehicles, the planning and control of the vehicle's driving trajectory can be realized, thereby reducing the traffic accident rate.
In the related prior art, a paper published in the HFES journal in September 2005 discloses a scheme for detecting lane changes with an SVM (Support Vector Machine), as shown in fig. 1. Although the single SVM model considers feature information of interactive behavior, that information stays at the level of simple association features between vehicles; the associations between vehicles and lanes, and between vehicles within the same time period, are not considered. When recognizing a vehicle's lane-change intention, the model relies excessively on information such as inter-vehicle distance and speed difference, and there is considerable computational redundancy between the SVM model and other trajectory prediction models, so the deployment pressure on the vehicle side easily becomes too large. Moreover, when the SVM model processes time-series information, multi-scale feature information is easily lost, which in turn reduces computational efficiency.
Referring to fig. 2, the prior art also computes, through the doubly stochastic process of an HMM model, the probability distribution of an observation sequence under a given model, and analyzes and mines the latent, stochastic behavior states of the driver from easily obtained observation variables. However, this approach depends excessively on the preceding and following state information, cannot capture connection types, and can make no inference when the preceding and following state information is missing and the state transition process is unknown, so long-distance dependencies are difficult to handle.
At present, there is no technical scheme that uses predicted trajectories to judge vehicle intention.
Disclosure of Invention
In view of the above, it is necessary to provide a vehicle intention recognition method, apparatus, computer device, and storage medium that avoid over-reliance on information such as inter-vehicle distance and speed difference, or on the preceding and following states of a vehicle, and instead determine the lane-change intention of a vehicle by reasonably predicting its future trajectory.
A vehicle intent recognition method, the method comprising:
acquiring vehicle source data;
carrying out global feature coding processing on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and performing lane-change trajectory prediction on the global coding information through a preset lane-change prediction model to obtain lane-change trajectories, and performing intention prediction score calculation on the lane-change trajectories and their states, so as to recognize the lane-change intention of the vehicle according to the intention prediction scores of the lane-change trajectories.
In one embodiment, the step of performing global feature encoding processing on the vehicle source data to obtain global encoded information including vehicle and lane information includes:
performing characteristic coding processing on the vehicle source data through a corresponding coder to obtain vehicle track coding information and lane node coding information;
and carrying out feature fusion processing on the vehicle track coding information and the lane node coding information through a fusion network structure to obtain global coding information fusing vehicle and lane information.
In one embodiment, the vehicle source data comprises vehicle historical track data and map node data;
the step of respectively performing feature extraction and coding processing on the vehicle source data through corresponding encoders to obtain vehicle track coding information and lane node coding information includes:
performing feature coding processing on the vehicle historical track data through a first encoder to obtain vehicle track coding information;
and performing feature coding processing on the map node data through a second encoder to obtain lane node coding information.
In one embodiment, the first encoder comprises a one-dimensional convolutional neural network structure and a feature pyramid network structure;
the step of performing feature encoding processing on the vehicle historical track data through the first encoder to obtain the vehicle track encoding information includes:
extracting the characteristics of the vehicle historical track data through the one-dimensional convolutional neural network structure to obtain a plurality of vehicle motion track characteristics comprising space and time information;
fusing the vehicle motion trail features of multiple sizes through the feature pyramid network structure to obtain tensors corresponding to the vehicle motion trail features to serve as the vehicle track coding information.
In one embodiment, the step of performing feature extraction on the vehicle historical track data through a one-dimensional convolutional neural network structure to obtain a plurality of vehicle motion track features including spatial and temporal information includes:
using residual blocks of a residual network structure as the basic network units of the one-dimensional convolutional neural network structure, so that the plurality of residual blocks in the plurality of groups of the one-dimensional convolutional neural network structure perform feature extraction on the vehicle historical track data to obtain the plurality of vehicle motion track features.
In one embodiment, the step of fusing the vehicle motion trail features of multiple sizes through the feature pyramid network structure to obtain a tensor corresponding to the vehicle motion trail features includes:
acquiring a plurality of vehicle motion track characteristics output by the one-dimensional convolutional neural network structure;
marking the vehicle motion track characteristics with preset sizes as vehicle track original nodes to construct a vehicle track original node graph so as to obtain at least one track;
and padding the vehicle motion track features that are shorter than the predetermined size with a predetermined marker, recording the padded positions through a 1 × T mask, and forming a 3 × T tensor after concatenation with the track tensor, wherein T represents the predetermined size.
In one embodiment, the second encoder comprises a graph convolutional neural network structure and a first multi-layer perceptron;
the step of performing feature encoding processing on the map node data by the second encoder to obtain lane node encoding information includes:
carrying out feature extraction processing on the map node data through the graph convolution neural network structure to obtain lane node features;
carrying out parameterization operation on the lane node characteristics through a first multilayer perceptron to obtain lane node characteristic parameters;
and performing a dilated graph convolution operation on each lane node feature parameter and the adjacent lane node feature parameters using matrix powers, to obtain lane node coding information that captures long-distance dependence.
In one embodiment, the converged network architecture comprises: a plurality of feature fusion sub-networks corresponding to vehicle-to-lane, lane-to-vehicle, and vehicle-to-vehicle;
the step of performing feature fusion processing on the vehicle track coding information and the lane node coding information through the fusion network to obtain global coding information fusing road and vehicle information comprises the following steps of:
acquiring the vehicle track coding information and lane node coding information;
updating the lane node coding information through the feature fusion sub-network corresponding to the lane-to-lane;
and performing information transmission and aggregation between vehicles and lanes, lanes and vehicles, and vehicles and vehicles in the vehicle track coding information and the lane node coding information, using spatial attention in the feature fusion sub-networks corresponding to vehicle-to-lane, lane-to-vehicle, and vehicle-to-vehicle, so as to obtain global coding information fusing the road and vehicle information.
In one embodiment, the step of performing global feature encoding processing on the vehicle source data to obtain global encoded information including vehicle and lane information includes:
dividing the vehicle source data into a plurality of local areas to obtain track sections, lane sections and corresponding coordinates in the local areas, and calculating interaction vectors between vehicles and between the vehicles and the lanes according to the coordinates;
performing vector rotation on the interaction vector by using a cross attention network structure, and calculating to obtain local coding information of any vehicle;
and carrying out coordinate difference parameterization on the interaction vectors of the vehicles to obtain paired local coding information, and combining the local coding information through vector transformation to obtain global coding information in the local area.
In one embodiment, the lane change prediction model comprises a spatial attention network structure and a second multi-layer perceptron;
the step of performing lane-change trajectory prediction on the global coding information and its state through a preset lane-change prediction model to obtain lane-change trajectories, and calculating the intention prediction score of the lane-change trajectories so as to judge the vehicle lane-change intention result according to the intention prediction score of each lane-change trajectory, comprises the following steps:
extracting the interactive features of future tracks from the global coding information through the space attention network structure to obtain a plurality of lane change tracks;
and acquiring a global vehicle state vector of each lane change track through the second multilayer perceptron to score the intention prediction of each lane change track and the state of the lane change track.
A vehicle intention recognition device, the device comprising: the system comprises a data acquisition module, a feature coding module and an intention calculation module;
the data acquisition module is used for acquiring vehicle source data;
the feature coding module is used for carrying out global feature coding processing on the vehicle source data to obtain global coding information comprising vehicle and lane information;
the intention calculation module is used for predicting lane change tracks of the global coding information through a preset lane change prediction model to obtain lane change tracks, and performing intention prediction scoring calculation on the lane change tracks and states of the lane change tracks to identify vehicle lane change intention according to the intention prediction scoring of the lane change tracks.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring vehicle source data;
global feature coding processing is carried out on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and predicting the lane change track of the global coding information through a preset lane change prediction model to obtain the lane change track, and performing intention prediction scoring calculation on the lane change track and the state of the lane change track to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change track.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring vehicle source data;
global feature coding processing is carried out on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and predicting the lane change track of the global coding information through a preset lane change prediction model to obtain the lane change track, and performing intention prediction scoring calculation on the lane change track and the state of the lane change track to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change track.
The vehicle intention identification method, the vehicle intention identification device, the computer equipment and the storage medium have at least the following technical effects:
Because global feature encoding is performed on the vehicle source data to obtain global coding information that includes both vehicle and lane information, the scheme relies only on the vehicle's historical track data and map data: there is no need to consider real-time inter-vehicle distance or speed information, nor the preceding and following state information of the vehicle. The global feature encoding therefore solves the prior-art problem of excessive dependence on inter-vehicle distance or speed information, or on preceding and following state information, avoids computational redundancy between the vehicle source data and trajectory prediction, and reduces the computational pressure of vehicle-side deployment. Since multi-scale feature information is not processed on a pure time-series basis, its loss is avoided and computational efficiency is improved; and long-distance dependence can be handled without considering connection types.
Because a preset lane-change prediction model is adopted, lane-change trajectory prediction is performed on the global coding information to obtain lane-change trajectories, intention prediction scores are computed for the lane-change trajectories, and the vehicle lane-change intention result is judged from those scores. The application thus predicts possible lane-change trajectories from the global coding information and judges the vehicle's lane-change intention by scoring them, making reasonable use of predicted trajectories for intention judgment.
Because technologies such as the one-dimensional convolutional neural network structure, the feature pyramid network structure, the fusion network, the spatial attention network structure, and the multilayer perceptron are adopted, the loss of geometric and semantic information from the high-precision map is reduced; the vehicle's lane-change intention is judged on the basis of its predicted lane-change trajectory, the trajectory prediction already present in autonomous-driving software is used effectively, the resource occupation of lane-change intention recognition deployed on the vehicle side is reduced, and the accuracy of lane-change trajectory scoring is increased. Moreover, the interaction between vehicle and lane is encoded through spatial attention to generate future lane-change trajectories; the spatial attention encoding effectively improves the accuracy of lane-change intention prediction while enhancing robustness to the input attributes of the prediction data.
Drawings
FIG. 1 is a schematic diagram illustrating the effect of using SVM to predict lane change intention of surrounding vehicles in the background art;
FIG. 2 is a schematic diagram illustrating an observation of recognizing driving intention by using HMM in the background art;
FIG. 3 is a flow diagram of a vehicle intent recognition method in one embodiment;
FIG. 4 is a schematic diagram of a network structure for lane node extraction based on a graph convolution network in one embodiment;
FIG. 5 is a schematic diagram of a network structure for vehicle trajectory extraction based on 1D-CNN in one embodiment;
FIG. 6 is a schematic diagram of a network architecture incorporating global vehicle information encoding in one embodiment;
FIG. 7 is a diagram illustrating a network architecture for predicting lane change trajectories and score calculations in one embodiment;
FIG. 8 is a schematic diagram of a local area extraction network in one embodiment;
FIG. 9 is a schematic diagram of a convolutional network structure for extracting local information of a vehicle in one embodiment;
FIG. 10 is a schematic diagram of a graph convolution network that uses global interaction computation to capture long-distance dependence in one embodiment;
fig. 11 is a block diagram schematically showing the structure of a vehicle intention identifying device in one embodiment.
Detailed Description
As a further supplement to the background art: in the existing technical scheme where the doubly stochastic process of an HMM computes the probability distribution of an observation sequence under a given model, HMM stands for Hidden Markov Model. The existing HMM uses a doubly stochastic process to relate a time series to the model: one of the two stochastic processes describes the transitions between states through a Markov chain, and the other describes the statistical correspondence between states and observations, so that the information relating the state time series to the model is obtained from the observation sequence generated by the HMM. For a vehicle, taking O(t) = {h(t), fn(t), sa(t), se(t)} as a test sample of the observation sequence, the process of recognizing driving intention is described, where h(t) represents the standard deviation of the horizontal head-turning angle, fn(t) the number of fixations on the rearview mirror within a time window, sa(t) the average panning amplitude, and se(t) the entropy of the steering-wheel angle. As can be seen from fig. 2, the higher the probability value, the better the observation sequence matches the model. The HMM can effectively compute the probability distribution of the observation sequence under a given model, and the latent, stochastic behavior states of the driver are obtained by analysis from easily available observation variables, showing excellent performance in modeling and analyzing the behavior of the driving object. However, if the preceding-and-following relationship is missing, that is, if information is missing and the state transition process is unknown, the HMM can infer nothing. On this basis, the application provides the technical scheme: a vehicle intention recognition method and device, a computer device, and a storage medium.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 3, the present application provides a vehicle intention identifying method, which includes the steps of:
and step S10, vehicle source data are acquired.
Step S20, global feature encoding processing is performed on the vehicle source data to obtain global encoding information including vehicle and lane information.
And S30, predicting lane change tracks of the global coding information through a preset lane change prediction model to obtain lane change tracks, and performing intention prediction scoring calculation on the lane change tracks and states of the lane change tracks to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change tracks.
In one embodiment, in step S10, in the step of acquiring vehicle source data, the vehicle source data includes vehicle historical track data and map node data. Further, the manner of acquiring the vehicle historical track data includes: and extracting historical track data of the vehicle in the driving recording software, or constructing a simulation scene of vehicle driving through simulation software, and then acquiring corresponding historical track data of the vehicle. If the actual historical track data of the vehicle is adopted, the historical track data of the vehicle can be directly obtained from the service platform corresponding to the driving recording software in order to enrich the historical track data of the vehicle.
In one embodiment, the map node data are obtained by constructing a lane graph network based on a vector map and collecting the map node data from it. In particular, the lane graph network may represent at least one set of lanes and the connectivity between lanes, wherein each lane contains a centerline, i.e., a series of 2D BEV points arranged along the lane direction as shown in fig. 4. Thus, for any two directly connected lanes there are four connection types: predecessor, successor, left neighbor, and right neighbor. For example, if A denotes a lane, its predecessor denotes a lane that can lead straight into A, and its successor denotes a lane that emanates from A. The left neighbor and right neighbor, i.e., adjacent lanes, are lanes that can be reached directly without violating traffic regulations. The lane graph network in this embodiment provides the basic geometric meaning and semantic information for motion prediction; a vehicle usually plans its route with reference to lane centerlines and their connectivity.
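For illustration, a minimal Python sketch of such a lane graph data structure, assuming dataclasses, integer lane identifiers, and centerlines stored as ordered 2D BEV points (none of which are specified in the original):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Lane:
    lane_id: int
    centerline: List[Tuple[float, float]]   # ordered 2D BEV points along the lane direction
    predecessors: List[int] = field(default_factory=list)  # lanes that lead straight into this one
    successors: List[int] = field(default_factory=list)    # lanes that emanate from this one
    left_neighbor: Optional[int] = None     # adjacent lane reachable without violating traffic rules
    right_neighbor: Optional[int] = None

# Example: lane 1 continues into lane 2; lane 3 is its right neighbor.
lane_graph = {
    1: Lane(1, [(0.0, 0.0), (5.0, 0.0)], successors=[2], right_neighbor=3),
    2: Lane(2, [(5.0, 0.0), (10.0, 0.0)], predecessors=[1]),
    3: Lane(3, [(0.0, -3.5), (5.0, -3.5)], left_neighbor=1),
}
```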
In one embodiment, in the step S20, the step of performing global feature encoding processing on the vehicle source data to obtain global encoded information including vehicle and lane information further includes the steps of:
and S21, respectively carrying out characteristic coding processing on the vehicle source data through corresponding encoders to obtain vehicle track coding information and lane node coding information.
And S22, carrying out feature fusion processing on the vehicle track coding information and the lane node coding information through a fusion network structure to obtain global coding information fusing road and vehicle information.
Further, in step S21, the vehicle source data is subjected to feature encoding processing by the corresponding encoder to obtain vehicle track encoding information and lane node encoding information, and the vehicle source data includes vehicle history track data and map node data. Correspondingly, the step of respectively carrying out feature extraction and coding processing on the vehicle source data through the corresponding coder to obtain vehicle track coding information and lane node coding information comprises the following steps:
step S211, carrying out characteristic coding processing on the vehicle historical track data through a first encoder to obtain vehicle track coding information; and the number of the first and second groups,
in step S212, feature encoding processing is performed on the map node data by the second encoder to obtain lane node encoding information.
In one embodiment, and as shown with reference to FIG. 5, the first encoder includes a one-dimensional convolutional neural network structure and a feature pyramid network structure. Accordingly, in step S211, the step of performing feature encoding processing on the vehicle history track data by the first encoder to obtain vehicle track encoding information includes:
step S2111, extracting the characteristics of the vehicle historical track data through a one-dimensional convolution neural network structure to obtain a plurality of vehicle motion track characteristics comprising space and time information;
step S2112, fusing the vehicle motion track characteristics of multiple sizes through the characteristic pyramid network structure to obtain a tensor corresponding to the vehicle motion track characteristics to serve as vehicle track coding information.
The one-dimensional convolutional neural network structure, denoted 1D-CNN, is widely applied in sequence modeling and natural language processing; unlike 2D-CNN and 3D-CNN, the 1D-CNN performs the convolution operation along one dimension. The feature pyramid network structure, Feature Pyramid Network (FPN for short), makes top-down lateral connections between high-level features with low resolution but rich semantics and low-level features with high resolution but weak semantics, so that the features at every scale carry rich semantic information.
In one embodiment, step S2111, performing feature extraction on the vehicle historical trajectory data through a one-dimensional convolutional neural network structure to obtain a plurality of vehicle motion trajectory features including spatial and temporal information, includes:
and utilizing the residual blocks of the residual network structure as basic network units of the one-dimensional convolutional neural network structure so that a plurality of residual blocks in a plurality of groups of one-dimensional convolutional neural networks can conveniently extract the characteristics of the historical track data of the vehicle to obtain a plurality of vehicle motion track characteristics.
Further, in the present embodiment, the vehicle historical track data are stored as a time series, and the track element at t = 0 is used as the vehicle feature. Considering the advantages of a one-dimensional convolutional neural network structure (1D-CNN) in extracting multi-scale features and in parallel computation, the raw vehicle historical track data are processed by the 1D-CNN structure, which has several groups/scales of one-dimensional (1D) convolutions and uses residual blocks (Residual Blocks) of a residual network structure as basic network units, each group containing several residual blocks. The vehicle motion track features are extracted by these groups of residual blocks, the multi-scale motion track features are then fused by a Feature Pyramid Network (FPN), and another residual block is used to obtain the output tensor.
Further, in step S2112, the step of fusing the vehicle motion trajectory features of multiple sizes through the feature pyramid network structure to obtain a tensor corresponding to the vehicle motion trajectory features includes:
obtaining a plurality of vehicle motion track characteristics output by a one-dimensional convolutional neural network structure;
and marking the vehicle motion track features with preset sizes as vehicle track original nodes to construct a vehicle track original node graph so as to obtain at least one track.
Each trajectory in this embodiment is represented as:
{Δp_{-(T-1)}, …, Δp_{-1}, Δp_0}
wherein Δp_t represents the coordinate offset (Offset) from time t−1 to time t, and T represents the predetermined size;
the vehicle motion track features shorter than the predetermined size T are padded with a predetermined marker, the padded positions are recorded through a 1 × T mask, and a 3 × T tensor (TenSor) is formed after concatenation with the track tensor. In one embodiment, the predetermined marker may be "0", and the 1 × T mask then records the time steps at which the vehicle motion track feature was padded with 0.
The 3 × T tensor formed in this embodiment is expressed as:
[(x_1, y_1, mask_1), …, (x_T, y_T, mask_T)] ∈ R^{3×T}
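A minimal NumPy sketch of this padding-and-masking step, assuming the per-step offsets arrive as a (t, 2) array; the function name and layout are illustrative assumptions:

```python
import numpy as np

def pack_trajectory(offsets: np.ndarray, T: int) -> np.ndarray:
    """Pad a (t, 2) array of per-step coordinate offsets to length T and
    attach a 1 x T validity mask, yielding the 3 x T tensor described above."""
    t = offsets.shape[0]
    padded = np.zeros((T, 2))
    padded[T - t:] = offsets           # missing steps keep the 0 marker
    mask = np.zeros(T)
    mask[T - t:] = 1.0                 # mask records which steps are real vs padded
    return np.concatenate([padded, mask[:, None]], axis=1).T   # shape (3, T)

# e.g. a 5-step history padded to the predetermined size T = 20
traj = pack_trajectory(np.random.randn(5, 2), T=20)
assert traj.shape == (3, 20)
```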
further, the first encoder in this embodiment may be defined as a vehicle track encoder, where a network structure of the first encoder includes a plurality of sets of one-dimensional convolutional neural network structures (1D-CNN) and a feature pyramid network structure (FCN), where the one-dimensional convolutional neural network structure includes a plurality of residual blocks serving as basic network units, and the one-dimensional convolutional neural network structure is connected to the feature pyramid network structure to perform multi-scale feature fusion on a vehicle motion track feature by using the feature pyramid network structure. Furthermore, in the feature pyramid network structure, vehicle motion track features output by the one-dimensional convolutional neural network structure are subjected to upsampling (Upsample) and downsampling (Sum) to complete multi-scale feature fusion, and then the tensor is calculated after the fusion is connected with the track tensor.
In the one-dimensional convolutional neural network structure and the feature pyramid network structure of the first encoder in this embodiment, the size of the adopted convolutional kernel is greater than or equal to 3, the number of output channels is 128, and after each convolution processing, layer normalization and ReLU function activation are used.
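The following PyTorch sketch illustrates such an encoder under the stated constraints (128 channels, kernel size 3, layer normalization and ReLU after each convolution); the two-scale grouping and every other detail are illustrative assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Res1d(nn.Module):
    """1D residual block: two 3-wide convolutions with a skip connection."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(c_in, c_out, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv1d(c_out, c_out, 3, padding=1)
        self.norm1 = nn.GroupNorm(1, c_out)   # layer-normalization-like over channels
        self.norm2 = nn.GroupNorm(1, c_out)
        self.skip = (nn.Conv1d(c_in, c_out, 1, stride=stride)
                     if stride != 1 or c_in != c_out else nn.Identity())

    def forward(self, x):
        out = F.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return F.relu(out + self.skip(x))

class TrajEncoder(nn.Module):
    """Two scale groups of residual 1D convolutions fused FPN-style."""
    def __init__(self, c=128):
        super().__init__()
        self.group1 = nn.Sequential(Res1d(3, c), Res1d(c, c))             # full resolution
        self.group2 = nn.Sequential(Res1d(c, c, stride=2), Res1d(c, c))   # half resolution
        self.lateral = nn.Conv1d(c, c, 1)
        self.out = Res1d(c, c)

    def forward(self, x):                       # x: (B, 3, T) packed trajectory tensor
        f1 = self.group1(x)
        f2 = self.group2(f1)
        up = F.interpolate(f2, size=f1.shape[-1], mode="linear", align_corners=False)
        fused = self.lateral(f1) + up           # top-down connection: upsample and sum
        return self.out(fused)[..., -1]         # feature at the latest step (t = 0)

feat = TrajEncoder()(torch.randn(4, 3, 20))     # -> shape (4, 128)
```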
In one embodiment, the second encoder includes a graph convolutional neural network structure and a first multi-layered perceptron. Accordingly, the step S212 of performing the feature encoding process on the map node data by the second encoder to obtain the lane node encoding information includes:
step S2121, carrying out feature extraction processing on map node data through a graph convolution neural network structure to obtain lane node features;
step S2122, carrying out parameterization operation on the lane node characteristics through the first multilayer perceptron to obtain lane node characteristic parameters;
and S2123, performing expanded graph convolution operation on the characteristic parameters of the nodes of each lane and the characteristic parameters of the nodes of the adjacent lanes by using the matrix power to obtain long-distance dependent lane node coding information.
The graph convolutional neural network structure in this step is a Graph Convolutional Network (GCN for short). Unlike RNNs and CNNs, it is a feature extractor for graph data, whose convolution is defined via the Fourier transform.
With further reference to fig. 4, the feature extraction on the map node data by the graph convolutional neural network structure is illustrated: the structure includes 4-6 convolutional residual blocks, each internally containing 4-6 multi-scale dilated convolution layers (with dilations 1, 2, 4, 8, 16, 32), a linear network, and a residual connection, where all layers have 128 feature channels, and layer normalization and ReLU activation functions are used after each convolutional and linear layer.
In one embodiment, in step S2121, in the step of performing feature extraction processing on the map node data through the graph-convolution neural network structure to obtain lane node features, a graph-convolution operator is used to obtain the lane node features.
Specifically, the graph convolution operator is defined as: Y = LXW, where X ∈ R^{N×F} denotes the lane node features, W ∈ R^{F×O} denotes the weight matrix, Y ∈ R^{N×O} denotes the output, and L ∈ R^{N×N} denotes the graph Laplacian, constructed from the identity matrix I, the adjacency matrix A, and the degree matrix D (for example, L = D^{-1}(I + A)). I and A represent the connections between different nodes, all connections share the same weight W, and the degree matrix D is used to normalize the output Y. However, in step S2121 it is not clear which node features will retain the information in the lane graph, and a single graph Laplacian cannot capture the connection type, so direction information is lost; graph convolution in this form does not easily handle long-distance dependence.
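A tiny NumPy illustration of this plain graph convolution Y = LXW on a chain of lane nodes; the normalization L = D^{-1}(I + A) is an assumed common form, since the original gives the matrix only as an image:

```python
import numpy as np

N, F_in, F_out = 4, 8, 8
rng = np.random.default_rng(1)
X = rng.standard_normal((N, F_in))        # lane node features, one row per node
A = np.eye(N, k=1) + np.eye(N, k=-1)      # simple chain adjacency between nodes
D = np.diag((np.eye(N) + A).sum(axis=1))  # degree matrix of I + A
L = np.linalg.inv(D) @ (np.eye(N) + A)    # assumed normalized Laplacian form
W = rng.standard_normal((F_in, F_out))    # shared weight matrix
Y = L @ X @ W                             # graph convolution output, shape (N, F_out)
```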
In the present embodiment, a dilated graph convolution operator is therefore introduced to obtain the lane node features.
Step S2122, performing a parameterization operation on the lane node features through the multilayer perceptron to obtain the lane node feature parameters.
In order to encode all the lane node features, the shape (size and direction) and the location (center coordinates) of the corresponding line segments need to be considered at the same time, so the lane node features are parameterized by the following operation:
x_i = MLP_shape(v_i^end − v_i^start) + MLP_loc(v_i)
where MLP denotes a multilayer perceptron and the two subscripts denote shape and location, respectively; v_i is the location of the i-th lane node, i.e., the center between its two end points; v_i^start and v_i^end denote the BEV coordinates of the start and the end of node i; and x_i, the i-th row of the node feature matrix X, is the input feature of the i-th lane node.
The lane node features in this embodiment only represent the local information of one line segment. Therefore, in one embodiment, in order to aggregate the topology information of the lane graph at a larger scale, the lane node features are input to the following graph convolution operator:
Y = X W_0 + Σ_{i ∈ {pre, suc, left, right}} A_i X W_i
where A_i and W_i respectively denote the adjacency matrix and the weight matrix corresponding to the i-th connection type. In this embodiment the lane nodes are ordered from start point to end point; therefore, for convenience of calculation, the matrices A_suc and A_pre are obtained by shifting the identity matrix one step to the upper right (non-zero super-diagonal) and to the lower left (non-zero sub-diagonal). A_suc and A_pre propagate information from the forward and backward neighbors, while A_right and A_left transfer the information of the right and left adjacent lanes.
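A small sketch of this construction: with lane nodes ordered start-to-end, shifting the identity matrix off the diagonal yields A_suc and A_pre, and matrix powers propagate information k steps along the lane, as used in the dilated operator below:

```python
import numpy as np

N = 5                     # lane nodes on one lane, ordered start to end
A_suc = np.eye(N, k=1)    # non-zero super-diagonal: node i -> successor i+1
A_pre = np.eye(N, k=-1)   # non-zero sub-diagonal:  node i -> predecessor i-1

k = 2
A_pre_k = np.linalg.matrix_power(A_pre, k)   # reaches the node two steps behind
```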
Step S2123 performs a dilated graph convolution operation on each lane node feature parameter and the adjacent lane node feature parameters using matrix powers to obtain lane node coding information with long-distance dependence; in this embodiment, the dilated graph convolution operation is applied to the lane node feature parameters as follows.
In order to obtain the long-distance-dependent lane node coding information, the dilated graph convolution on the input lane node feature parameters is expressed as:
Y = X W_0 + A_pre^k X W_pre,k + A_suc^k X W_suc,k
where A_pre^k is the k-th matrix power of A_pre, used to propagate information k steps along the lane, and k is a hyperparameter.
Since A_pre^k can be computed with sparse matrix multiplication, and considering that the long-distance dependence mainly runs along the lane direction, the dilated graph convolution (LaneConv) in this embodiment is used only for the predecessor and successor computations.
Further, after the dilated graph convolution of each lane node feature is completed, the features of the adjacent lanes still need to be considered. Therefore, on the basis of the dilated graph convolution for each lane node feature, a dilated graph convolution operator based on the left and right adjacent lane node features is added, expressed as:
Y = X W_0 + A_left X W_left + A_right X W_right + Σ_{c=1}^{C} (A_pre^{k_c} X W_pre,c + A_suc^{k_c} X W_suc,c)
where k_c denotes the c-th dilation size and (k_1, …, k_C) denotes the multi-scale layers.
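A minimal NumPy sketch of this multi-scale dilated lane graph convolution; the random stand-in weights and the dilation sizes (1, 2, 4, 8) are illustrative assumptions:

```python
import numpy as np

def lane_conv(X, A_pre, A_suc, A_left, A_right, dilations, rng):
    """Multi-scale dilated lane graph convolution; each term uses its own weights."""
    F_in, F_out = X.shape[1], 128
    W = lambda: rng.standard_normal((F_in, F_out)) * 0.01   # random stand-in weights
    Y = X @ W()                                             # X W_0
    Y += A_left @ X @ W() + A_right @ X @ W()               # lateral neighbors, no dilation
    for k in dilations:                                     # dilated terms along the lane
        Y += np.linalg.matrix_power(A_pre, k) @ X @ W()
        Y += np.linalg.matrix_power(A_suc, k) @ X @ W()
    return Y

rng = np.random.default_rng(0)
N = 32
X = rng.standard_normal((N, 128))
A_pre, A_suc = np.eye(N, k=-1), np.eye(N, k=1)
A_left = A_right = np.zeros((N, N))                         # single lane: no lateral neighbors
Y = lane_conv(X, A_pre, A_suc, A_left, A_right, dilations=(1, 2, 4, 8), rng=rng)
```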
In one embodiment, in step S22, feature fusion processing is performed on the vehicle track coding information and the lane node coding information through the fusion network structure to obtain global coding information fusing road and vehicle information. Referring to fig. 6, the fusion network structure comprises several feature fusion sub-networks corresponding to vehicle-to-lane (A2L), lane-to-lane (L2L), lane-to-vehicle (L2A), and vehicle-to-vehicle (A2A).
In this step, the vehicle track coding information and the lane node coding information are obtained and fused by the fusion network structure into new global coding information carrying road and vehicle information. The fusion network structure consists of a stack of these four feature fusion sub-networks, which together capture all information flow between the vehicles and the lane nodes.
Correspondingly, in step S22, the step of performing feature fusion processing on the vehicle track coding information and the lane node coding information through the fusion network structure to obtain global coding information fusing the vehicle and the lane information includes:
acquiring vehicle track coding information and lane node coding information;
updating lane node coding information through a lane-to-lane corresponding feature fusion sub-network;
the information of roads, vehicles and vehicles in the vehicle track coding information and the lane node coding information is transmitted and aggregated by the feature fusion sub-networks corresponding to the vehicles to the lanes, the lanes to the vehicles and the vehicles to obtain global coding information fusing the road and vehicle information by respectively utilizing the spatial attention.
The purpose of the step is to output global coded information fusing road and vehicle information after inputting vehicle track coded information and lane node coded information.
Wherein the feature fusion sub-network corresponding to vehicle-to-lane (A2L) is configured to introduce real-time traffic information, such as congestion or lane usage, into the lane nodes; the sub-network corresponding to lane-to-lane (L2L) is configured to update the lane node features by propagating the traffic information over the lane graph; the sub-network corresponding to lane-to-vehicle (L2A) is configured to fuse the updated lane node features carrying real-time traffic information back to the vehicles; and the sub-network corresponding to vehicle-to-vehicle (A2A) is configured to process the interactions between vehicles and produce the output vehicle features used by the lane-change prediction model for motion prediction.
In the feature fusion sub-networks corresponding to vehicle-to-lane (A2L), lane-to-vehicle (L2A), and vehicle-to-vehicle (A2A), information is transmitted to and aggregated between roads and vehicles, and between vehicles, through spatial attention; the sub-network corresponding to lane-to-lane (L2L) adopts the dilated graph convolution structure. Further, in this embodiment the added attention mechanism removes the existing drawback of giving every input the same vector: different weights are assigned according to different motion trajectories, so the network attends differently over different time periods. Taking the vehicle-to-lane (A2L) feature fusion sub-network as an example, and letting i denote a vehicle node, the feature aggregation over its context road nodes j is expressed as follows:
y_i = x_i W_0 + Σ_j φ(concat(x_i, Δ_{i,j}, x_j) W_1) W_2
where x_i is the feature of the i-th node, the W are weight matrices, the function φ is the composition of layer normalization and ReLU, and Δ_{i,j} = MLP(v_i − v_j), with v denoting a node position; the context nodes are the lane nodes whose Euclidean distance to the vehicle node i is less than a preset threshold.
In one embodiment, the preset thresholds are set to 7, 6 and 100 meters respectively in the plurality of feature fusion sub-networks corresponding to the vehicle-to-lane (A2L), the lane-to-vehicle (L2A) and the vehicle-to-vehicle (A2A).
In one embodiment, two residual blocks are respectively configured in the feature fusion sub-networks corresponding to vehicle-to-lane (A2L), lane-to-vehicle (L2A), and vehicle-to-vehicle (A2A). Each residual block consists of a stack of the proposed attention layer and a linear layer, together with a residual connection, and all layers have 128 output feature channels.
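An illustrative PyTorch sketch of this distance-gated spatial attention aggregation; the 128-channel width and the 7-meter A2L threshold follow the text, while the loop-based implementation and the layer shapes are assumptions:

```python
import torch
import torch.nn as nn

class SpatialAgg(nn.Module):
    """One node aggregates features from context nodes within a distance threshold."""
    def __init__(self, c=128, radius=7.0):
        super().__init__()
        self.radius = radius
        self.delta = nn.Sequential(nn.Linear(2, c), nn.ReLU())   # Δ_ij = MLP(v_i - v_j)
        self.w0 = nn.Linear(c, c, bias=False)
        self.w1 = nn.Linear(3 * c, c, bias=False)
        self.phi = nn.Sequential(nn.LayerNorm(c), nn.ReLU())     # layer norm + ReLU
        self.w2 = nn.Linear(c, c, bias=False)

    def forward(self, x_i, v_i, x_ctx, v_ctx):
        # x_i: (M, c) node features, v_i: (M, 2) positions
        # x_ctx: (N, c) candidate context features, v_ctx: (N, 2) positions
        within = torch.cdist(v_i, v_ctx) < self.radius           # (M, N) context mask
        base = self.w0(x_i)
        rows = []
        for m in range(x_i.shape[0]):
            j = within[m].nonzero(as_tuple=True)[0]
            y_m = base[m]
            if j.numel() > 0:
                d = self.delta(v_i[m] - v_ctx[j])                # (|j|, c)
                msg = torch.cat([x_i[m].expand(j.numel(), -1), d, x_ctx[j]], dim=-1)
                y_m = y_m + self.w2(self.phi(self.w1(msg))).sum(0)
            rows.append(y_m)
        return torch.stack(rows)

agg = SpatialAgg()
y = agg(torch.randn(2, 128), torch.randn(2, 2),
        torch.randn(10, 128), torch.randn(10, 2) * 5.0)          # -> (2, 128)
```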
In one embodiment, in step S30, lane-change trajectory prediction is performed on the global coding information through a preset lane-change prediction model to obtain the lane-change trajectories, and intention prediction scores are computed for the lane-change trajectories and their states, so as to judge the vehicle lane-change intention result from those scores; the lane-change prediction model comprises a spatial attention network structure and a second multilayer perceptron.
Correspondingly, in step S30, the step of predicting the lane-changing track of the global coding information through a preset lane-changing prediction model to obtain the lane-changing track, and performing an intention prediction score calculation on the lane-changing track and the state thereof to judge the intention result of the vehicle lane-changing according to the intention prediction score of the lane-changing track includes:
extracting the interactive characteristics of future tracks from the global coding information through a space attention network structure to obtain a plurality of lane change tracks;
and acquiring a global vehicle state vector of each lane change track through a second multilayer perceptron to score the intention prediction of each lane change track and the state of the lane change track.
In one embodiment, global coding information fusing road and vehicle information is used as an input of a lane change prediction model, a spatial attention network structure and a multilayer perceptron are configured in the lane change prediction model, and intention scores of three predicted lane change tracks are output.
Further, referring to fig. 7, the global coding information fused with road and vehicle information is input into the lane-change prediction model; the interaction features between future trajectories are obtained through one layer of vehicle-to-vehicle spatial attention, the global vehicle state vector of each lane-change trajectory is then obtained through the second multilayer perceptron, and finally the lane-change trajectories and the intention prediction scores of their states are output, where x_n denotes the output of the lane-change prediction model and y_n denotes the ground-truth value of the target.
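An illustrative PyTorch sketch of such a scoring head: one layer of vehicle-to-vehicle attention over the fused encodings followed by an MLP that scores K candidate lane-change trajectories. K = 3 follows the three-trajectory example above; the prediction horizon and all other sizes are assumptions:

```python
import torch
import torch.nn as nn

class LaneChangeScorer(nn.Module):
    """Scores K candidate lane-change trajectories per vehicle."""
    def __init__(self, c=128, K=3, horizon=30):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
        self.traj_head = nn.Linear(c, K * horizon * 2)   # K trajectories of (x, y) points
        self.score_head = nn.Sequential(nn.Linear(c, c), nn.ReLU(), nn.Linear(c, K))
        self.horizon = horizon

    def forward(self, h):                                # h: (B, N_vehicles, c)
        h, _ = self.attn(h, h, h)                        # interaction between future tracks
        trajs = self.traj_head(h).view(*h.shape[:2], -1, self.horizon, 2)
        scores = self.score_head(h).softmax(dim=-1)      # intention score per trajectory
        return trajs, scores

trajs, scores = LaneChangeScorer()(torch.randn(2, 6, 128))
# trajs: (2, 6, 3, 30, 2); scores: (2, 6, 3)
```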
In another embodiment, the step S20 of performing global feature encoding processing on the vehicle source data to obtain global encoded information including vehicle and lane information includes:
and S21', dividing the vehicle source data into a plurality of local areas to obtain track sections, lane sections and corresponding coordinates in the local areas, and calculating interaction vectors between vehicles and between the vehicles and the lanes according to the coordinates.
And S22', utilizing a cross attention network structure to carry out vector rotation on the interaction vector, and calculating to obtain local coding information of any vehicle.
And step S23', carrying out coordinate difference parameterization on the interaction vectors of the vehicles to obtain paired local coding information, and combining the local coding information through vector transformation to obtain global coding information in a local area.
Referring to fig. 8, step S21' is a step of dividing the vehicle source data into a plurality of local areas to obtain track segments, lane segments and corresponding coordinates in the local areas, and calculating interaction vectors between the vehicle and between the vehicle and the lane according to the coordinates.
The vehicle source data includes vehicle trajectory data and map data. In one embodiment, vectorization entities (such as vehicles and lanes) are extracted through a simulation scene, and vehicle track data and map data are acquired. In the embodiment, a reference center vehicle is determined, and a lane in the map data is divided according to the map data where the center vehicle is located, so as to obtain a track segment of the vehicle and a lane segment in the map data. Wherein the trajectory of vehicle i is represented as:
{p_i^t}, t = 1, …, T
wherein p_i^t represents the position of vehicle i at time t, and T is the overall number of historical time steps. For a segmented lane ξ, the geometric attribute is defined as (p_ξ^start, p_ξ^end), wherein p_ξ^start and p_ξ^end represent the start and end coordinates of ξ.
Referring to fig. 9, in step S22' the interaction vectors are rotated using the cross-attention network structure to compute the local coding information of any vehicle: the track segments and lane segments in the local area are vector-rotated through the cross-attention network structure to obtain the local embedding vectors between vehicles and between vehicles and lanes.
Specifically, the interactive operation between vehicles is intended to learn, within each local region, the relationship between the center vehicle and its adjacent vehicles at each time step, where the center vehicle is any vehicle used as the reference.
In the present embodiment, the trajectory segments and lane segments in the local area are aggregated according to the rotational invariance in the cross-attention network structure to take advantage of the symmetry of the problem.
Specifically, in one embodiment, the latest trajectory segment of the center vehicle, p_i^T − p_i^{T−1}, is used as the reference vector of the local region, and all local vectors are rotated according to the direction θ_i of this reference vector.
From the rotated vectors and their associated semantic attributes, the embedded vector z_i^t of the center vehicle i and the embedded vector z_j^t of any adjacent vehicle j are computed with multilayer perceptrons, where t denotes the time step:
z_i^t = MLP([R_i (p_i^t − p_i^{t−1}); a_i])
z_j^t = MLP([R_i (p_j^t − p_j^{t−1}); a_j])
where R_i denotes the rotation matrix parameterized by θ_i, and a_i and a_j represent the semantic attributes of vehicles i and j, respectively.
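A small PyTorch sketch of the rotation step, assuming a 2D rotation matrix parameterized by θ_i and an illustrative stand-in MLP (the semantic attributes are omitted for brevity):

```python
import torch
import torch.nn as nn

def rotation_matrix(theta: torch.Tensor) -> torch.Tensor:
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])   # (2, 2)

phi_center = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64))

theta_i = torch.tensor(0.3)                    # heading direction of center vehicle i
R_i = rotation_matrix(theta_i)
step = torch.tensor([1.0, 0.5])                # p_i^t - p_i^{t-1}, one history offset
z_i_t = phi_center(R_i.T @ step)               # rotation-invariant local embedding
```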
In the present embodiment, the time information in each local area is captured by a temporal transformer/temporal encoder before the vehicle-to-vehicle interactive operation.
For an arbitrary center vehicle i, the embedded vectors z_i^t returned for the different time steps are composed into an input sequence when the vehicle-to-vehicle interactive operation is performed. Similar to the language-representation model BERT, an additional learnable token is appended at the end of the input sequence. Further, learnable position embeddings are added to all tokens, and the tokens are stacked into a matrix S_i. Finally, S_i is fed into the temporal attention mechanism, expressed as:
Q_i = S_i W^Q, K_i = S_i W^K, V_i = S_i W^V
Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i
where W^Q, W^K, and W^V are learnable projection matrices and d_k is the key dimension.
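A compact PyTorch sketch of this temporal attention step; the learnable token and position embeddings follow the description above, while the dimensions and head count are illustrative assumptions:

```python
import torch
import torch.nn as nn

T, c = 20, 64
step_embeddings = torch.randn(T, c)                 # z_i^t for the T history steps
cls_token = nn.Parameter(torch.zeros(1, c))         # appended learnable token
pos_embed = nn.Parameter(torch.zeros(T + 1, c))     # learnable position embeddings

S_i = torch.cat([step_embeddings, cls_token]) + pos_embed   # stacked token matrix
attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
out, _ = attn(S_i.unsqueeze(0), S_i.unsqueeze(0), S_i.unsqueeze(0))  # self-attention
temporal_feature = out[0, -1]                       # the token's slot summarizes history
```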
and performing interactive operation between the vehicle and the lane to obtain more local information of the vehicle. The local map structure in the present embodiment is used to indicate the future intention of the center vehicle. In the interactive operation between the vehicle and the lane, the local map information is encoded and then incorporated into the embedded vector of the vehicle.
Specifically, the relative position vectors between the rotated lane segments in the local area at the current time T and the lane on which the vehicle is located are obtained. In the present embodiment these rotated embedded vectors are encoded by a multilayer perceptron MLP. The spatio-temporal feature of the center vehicle is then used as the query input, the MLP-encoded lane segment features are used as the key/value input, and another multilayer perceptron MLP produces the final local embedding vector h_i of the center vehicle i.
In one embodiment, step S23' parameterizes the coordinate difference of the vehicle-to-vehicle interaction vectors to obtain pairs of locally encoded information, and combines the locally encoded information by vector transformation to obtain globally encoded information within the local area.
Referring to fig. 10, the purpose of this step is to introduce global interactive operation to capture the remote dependency relationship in a specific driving scene.
The coordinate differences between vehicles are parameterized to obtain the paired embedded vectors. Specifically, the coordinate information of the trajectory segments is acquired, and the coordinate difference between vehicles i and j is parameterized as the rotated relative position R_i (p_j^T − p_i^T) together with Δθ_ij, where Δθ_ij denotes θ_j − θ_i.
The paired embedded vector e_ij is then obtained with a multilayer perceptron MLP:
e_ij = MLP([R_i (p_j^T − p_i^T); Δθ_ij])
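A short PyTorch sketch of this pairwise parameterization, with an illustrative stand-in MLP:

```python
import torch
import torch.nn as nn

def pairwise_embedding(p_i, p_j, theta_i, theta_j, mlp):
    """e_ij from j's relative position in i's rotated frame plus Δθ_ij."""
    c, s = torch.cos(theta_i), torch.sin(theta_i)
    R_i = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    rel = R_i.T @ (p_j - p_i)                     # rotated relative position
    dtheta = (theta_j - theta_i).reshape(1)       # Δθ_ij = θ_j - θ_i
    return mlp(torch.cat([rel, dtheta]))          # e_ij

mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
e_ij = pairwise_embedding(torch.tensor([0.0, 0.0]), torch.tensor([3.0, 1.0]),
                          torch.tensor(0.1), torch.tensor(0.4), mlp)   # -> (64,)
```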
in one embodiment, pairs of embedded vectors are combined into a transformation of local embedded vectors to obtain global vehicle feature vectors.
Global coding information is obtained by merging the paired embedded vectors into the transformation of the local embedded vectors: combining the paired embedded vectors with the final local embedded vectors, the global coding vector h_i of any vehicle is obtained through a spatial attention mechanism and a multilayer perceptron (MLP).
Referring again to fig. 7, the final vehicle intention recognition inputs the output global vehicle feature vectors into the lane-change prediction model. The interaction features between future trajectories are obtained through one layer of vehicle-to-vehicle spatial attention, the global vehicle state vector is then obtained through the second multilayer perceptron, and finally the intention prediction score of each lane-change trajectory and its state is output, where x_n denotes the output of the model and y_n denotes the ground-truth value of the target.
In the vehicle intention recognition method, because global feature encoding is performed on the vehicle source data to obtain global coding information that includes both vehicle and lane information, the scheme relies only on the vehicle's historical track data and map data: there is no need to consider real-time inter-vehicle distance or speed information, nor the preceding and following state information of the vehicle. The global feature encoding therefore solves the prior-art problem of excessive dependence on inter-vehicle distance or speed information, or on preceding and following state information, avoids computational redundancy between the vehicle source data and trajectory prediction, and reduces the computational pressure of vehicle-side deployment. Since multi-scale feature information is not processed on a pure time-series basis, its loss is avoided and computational efficiency is improved; and long-distance dependence can be handled without considering connection types.
Because a preset lane-change prediction model is adopted, lane-change trajectory prediction is performed on the global coding information to obtain lane-change trajectories, intention prediction scores are computed for the lane-change trajectories and their states, and the vehicle lane-change intention result is judged from those scores. The application thus predicts the possible lane-change trajectories from the global coding information and judges the vehicle's lane-change intention result by computing the intention prediction score of each trajectory, making reasonable use of predicted trajectories for intention judgment.
Because technologies such as a one-dimensional convolutional neural network structure, a feature pyramid network structure, a fusion network, a spatial attention network structure and a multi-layer perceptron are adopted, the loss of geometric and semantic information from the high-precision map is reduced. The lane change intention of the vehicle is judged from the predicted lane change trajectory, so the trajectory prediction already present in the automatic driving software is effectively reused, the resource occupation of lane change intention recognition deployed at the vehicle end is reduced, and the accuracy of lane change trajectory scoring is increased. By encoding vehicle-lane interaction with spatial attention, future lane change trajectories are generated; spatial attention encoding effectively improves the accuracy of vehicle lane change intention prediction while strengthening robustness to the input attributes of the prediction data.
In the global feature encoding, a backbone model with a spatial attention mechanism is applied, and feature information of the vehicle motion track and the map is effectively extracted through a graph convolution network. A lane change prediction model combining spatial attention and a multi-layer perceptron fully extracts feature information of the vehicle's future motion state and performs lane change score prediction. The trajectory prediction backbone network is flexibly reused to extract vehicle features and complete the global feature encoding, so the lane change prediction model can adapt to different global coding information. Road node information and vehicle motion information are obtained through the graph convolution network, and global vehicle motion state vectors are obtained through 4-6 spatial attention modules. The integrated trajectory prediction backbone can effectively extract time-series information of the vehicle motion state. A vehicle-to-vehicle spatial attention network is added to the lane change prediction model, so interactive information of the vehicles' future motion states can be effectively captured for more accurate lane change prediction. The lane change prediction model can serve as a downstream module for any global feature encoding, which not only reduces the resource consumption of deploying multiple cognitive models at the vehicle end but also lets the calculation results of trajectory prediction and intention recognition constrain each other, further improving the accuracy of lane change recognition.
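The arrangement described in this paragraph, fused features from a graph convolution stage refined by a stack of 4-6 spatial attention modules, could be sketched as follows; the block structure and names are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """One spatial attention module with residual connection and normalization."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class SpatialAttentionBackbone(nn.Module):
    """Stack of 4-6 spatial attention modules refining fused graph features."""

    def __init__(self, dim: int = 64, num_blocks: int = 4):  # 4-6 per the description
        super().__init__()
        self.blocks = nn.ModuleList(SpatialAttentionBlock(dim) for _ in range(num_blocks))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (B, N, dim) vehicle/lane features from the graph convolution stage
        for blk in self.blocks:
            fused = blk(fused)
        return fused  # (B, N, dim) global vehicle motion state vectors
```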
In one embodiment, referring to fig. 11, there is provided a vehicle intention identifying apparatus including: a data acquisition module 101, a feature encoding module 102, and an intent calculation module 103, wherein:
the data acquisition module 101 is used for acquiring vehicle source data;
the feature coding module 102 is configured to perform global feature coding processing on the vehicle source data to obtain global coding information including vehicle and lane information;
the intention calculation module 103 is configured to perform lane change trajectory prediction on the global coding information through a preset lane change prediction model to obtain a lane change trajectory, and perform intention prediction score calculation on the lane change trajectory and a state thereof, so as to identify a vehicle lane change intention according to an intention prediction score of the lane change trajectory.
For the specific definition of the vehicle intention identifying apparatus, reference may be made to the definition of the vehicle intention identifying method above, and details are not repeated here. The respective modules in the above vehicle intention identifying apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
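Purely as an illustration of the module decomposition above, the apparatus could be organized in software roughly as follows; every class and method name here is hypothetical, not the patent's implementation:

```python
class VehicleIntentionRecognizer:
    """Illustrative wiring of modules 101-103; names are hypothetical."""

    def __init__(self, encoder, lane_change_model):
        self.encoder = encoder                      # feature encoding module 102
        self.lane_change_model = lane_change_model  # used by intent calculation module 103

    def acquire(self, source):
        # Data acquisition module 101: obtain vehicle source data
        # (historical tracks + map node data) from an upstream source.
        return source.read()

    def recognize(self, vehicle_source_data):
        # Feature encoding module 102: global feature encoding.
        global_code = self.encoder(vehicle_source_data)
        # Intent calculation module 103: predict lane-change trajectories,
        # score each trajectory and its state, return the best-scored intent.
        trajectories = self.lane_change_model.predict(global_code)
        scores = self.lane_change_model.score(trajectories)
        best = max(range(len(scores)), key=scores.__getitem__)
        return trajectories[best], scores[best]
```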
In one embodiment, a computer device is provided, which may be a server. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store vehicle source data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a vehicle intention recognition method.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a vehicle intention recognition method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse, among others.
It will be appreciated by those skilled in the art that the configurations shown in the figures are block diagrams of only some of the configurations relevant to the present application and do not constitute a limitation on the computer devices to which the present application may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring vehicle source data;
carrying out global feature coding processing on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and predicting lane change tracks of the global coding information through a preset lane change prediction model to obtain lane change tracks, and performing intention prediction scoring calculation on the lane change tracks and states thereof to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change tracks.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring vehicle source data;
global feature coding processing is carried out on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and predicting lane change tracks of the global coding information through a preset lane change prediction model to obtain lane change tracks, and performing intention prediction scoring calculation on the lane change tracks and states thereof to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change tracks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above examples express only several embodiments of the present application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (13)

1. A vehicle intention recognition method, characterized in that the method comprises:
acquiring vehicle source data;
global feature coding processing is carried out on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and predicting the lane change track of the global coding information through a preset lane change prediction model to obtain the lane change track, and performing intention prediction scoring calculation on the lane change track and the state of the lane change track to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change track.
2. The vehicle intention recognition method according to claim 1, wherein the step of performing global feature encoding processing on the vehicle source data to obtain global encoding information including vehicle and lane information includes:
performing characteristic coding processing on the vehicle source data through a corresponding encoder to obtain vehicle track coding information and lane node coding information;
and carrying out feature fusion processing on the vehicle track coding information and the lane node coding information through a fusion network structure to obtain global coding information fusing vehicle and lane information.
3. The vehicle intention recognition method according to claim 2, characterized in that the vehicle source data includes vehicle history track data and map node data;
the step of performing feature encoding processing on the vehicle source data through a corresponding encoder to obtain vehicle track encoding information and lane node encoding information includes:
performing feature coding processing on the vehicle historical track data through a first encoder to obtain vehicle track coding information;
and performing feature coding processing on the map node data through a second coder to obtain the lane node coding information.
4. The vehicle intention recognition method according to claim 3, characterized in that the first encoder includes a one-dimensional convolutional neural network structure and a feature pyramid network structure;
the step of performing feature encoding processing on the vehicle historical track data through a first encoder to obtain the vehicle track encoding information comprises the following steps:
extracting the characteristics of the vehicle historical track data through the one-dimensional convolutional neural network structure to obtain a plurality of vehicle motion track characteristics comprising space and time information;
fusing the vehicle motion trail features of multiple sizes through the feature pyramid network structure to obtain tensors corresponding to the vehicle motion trail features to serve as the vehicle track coding information.
5. The vehicle intention recognition method according to claim 4, wherein the step of performing feature extraction on the vehicle historical track data through a one-dimensional convolutional neural network structure to obtain a plurality of vehicle motion track features including spatial and temporal information comprises:
and utilizing a residual block of a residual network structure as a basic network unit of the one-dimensional convolutional neural network structure, so that a plurality of residual blocks in a plurality of groups of the one-dimensional convolutional neural network structure can conveniently perform feature extraction on the vehicle historical track data to obtain a plurality of vehicle motion track features.
6. The vehicle intention identifying method according to claim 5, wherein the step of fusing the vehicle motion trail features of multiple sizes through the feature pyramid network structure to obtain a tensor corresponding to the vehicle motion trail features comprises:
acquiring a plurality of vehicle motion track characteristics output by the one-dimensional convolutional neural network structure;
marking the vehicle motion track characteristics of the preset size as vehicle track original nodes to construct a vehicle track original node graph, so as to obtain at least one track tensor;
and padding the vehicle motion track characteristics of less than the predetermined size with a predetermined number of marks, recording the padded vehicle motion track characteristics through a 1 × T mask, and connecting the mask with the track tensor to form a 3 × T tensor, wherein T represents the predetermined size.
7. The vehicle intention recognition method of claim 3, wherein the second encoder comprises a graph convolutional neural network structure and a first multi-layer perceptron;
the step of performing feature encoding processing on the map node data by the second encoder to obtain lane node encoding information includes:
carrying out feature extraction processing on the map node data through the graph convolution neural network structure to obtain lane node features;
carrying out parameterization operation on the lane node characteristics through a first multilayer perceptron to obtain lane node characteristic parameters;
and performing a dilated graph convolution operation on each lane node characteristic parameter and its adjacent lane node characteristic parameters by using matrix powers, to obtain lane node coding information with long-distance dependence.
8. The vehicle intention recognition method according to claim 2, characterized in that the fusion network structure includes: a plurality of feature fusion sub-networks corresponding to lane-to-lane, vehicle-to-lane, lane-to-vehicle, and vehicle-to-vehicle;
the step of performing feature fusion processing on the vehicle track coding information and the lane node coding information through a fusion network structure to obtain global coding information fusing the vehicle and lane information includes:
acquiring the vehicle track coding information and lane node coding information;
updating the lane node coding information through the feature fusion sub-network corresponding to the lane-to-lane;
and performing, through the feature fusion sub-networks corresponding to vehicle-to-lane, lane-to-vehicle and vehicle-to-vehicle, information transmission and aggregation between vehicles and lanes and between vehicles on the vehicle track coding information and the lane node coding information by using spatial attention, so as to obtain global coding information fusing the vehicle and lane information.
9. The vehicle intention recognition method according to claim 1, wherein the step of performing global feature encoding processing on vehicle source data to obtain global encoded information including vehicle and lane information includes:
dividing the vehicle source data into a plurality of local areas to obtain track sections, lane sections and corresponding coordinates in the local areas, and calculating interaction vectors between vehicles and between the vehicles and the lanes according to the coordinates;
performing vector rotation on the interaction vector by using a cross attention network structure, and calculating to obtain local coding information of any vehicle;
and carrying out coordinate difference parameterization on the interaction vectors of the vehicles to obtain paired local coding information, and combining the local coding information through vector transformation to obtain global coding information in the local area.
10. The vehicle intent recognition method according to claim 1, wherein the lane change prediction model includes a spatial attention network structure and a second multi-layer perceptron;
the step of predicting the lane change track of the global coding information through a preset lane change prediction model to obtain the lane change track, and calculating the intention prediction score of the lane change track and the state of the lane change track to judge the lane change intention result of the vehicle according to the intention prediction score of each lane change track comprises the following steps:
extracting the interactive features of future tracks from the global coding information through the space attention network structure to obtain a plurality of lane change tracks;
and acquiring a global vehicle state vector of each lane change track through the second multilayer perceptron to score the intention prediction of each lane change track and the state of the lane change track.
11. A vehicle intention recognition apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring vehicle source data;
the characteristic coding module is used for carrying out global characteristic coding processing on the vehicle source data to obtain global coding information comprising vehicle and lane information;
and the intention calculation module is used for predicting lane change tracks of the global coding information through a preset lane change prediction model to obtain lane change tracks, and performing intention prediction scoring calculation on the lane change tracks and states thereof so as to identify the lane change intention of the vehicle according to the intention prediction scoring of the lane change tracks.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 10.
CN202211361174.1A 2022-11-02 2022-11-02 Vehicle intention recognition method and device, computer equipment and storage medium Pending CN115909239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211361174.1A CN115909239A (en) 2022-11-02 2022-11-02 Vehicle intention recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211361174.1A CN115909239A (en) 2022-11-02 2022-11-02 Vehicle intention recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115909239A true CN115909239A (en) 2023-04-04

Family

ID=86475420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211361174.1A Pending CN115909239A (en) 2022-11-02 2022-11-02 Vehicle intention recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115909239A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116890881A (en) * 2023-09-08 2023-10-17 摩尔线程智能科技(北京)有限责任公司 Vehicle lane change decision generation method and device, electronic equipment and storage medium
CN116890881B (en) * 2023-09-08 2023-12-08 摩尔线程智能科技(北京)有限责任公司 Vehicle lane change decision generation method and device, electronic equipment and storage medium
CN117692026A (en) * 2024-02-01 2024-03-12 深圳市博源电力有限公司 Link sensing method and device for power line communication
CN117692026B (en) * 2024-02-01 2024-04-26 深圳市博源电力有限公司 Link sensing method and device for power line communication

Similar Documents

Publication Publication Date Title
Mozaffari et al. Deep learning-based vehicle behavior prediction for autonomous driving applications: A review
US11131993B2 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
Nguyen et al. Deep learning methods in transportation domain: a review
US11537134B1 (en) Generating environmental input encoding for training neural networks
CN115909239A (en) Vehicle intention recognition method and device, computer equipment and storage medium
US20230124864A1 (en) Graph Representation Querying of Machine Learning Models for Traffic or Safety Rules
US20210004966A1 (en) Method for the Assessment of Possible Trajectories
CN113705636B (en) Method and device for predicting track of automatic driving vehicle and electronic equipment
CN113362491B (en) Vehicle track prediction and driving behavior analysis method
CN114084155A (en) Predictive intelligent automobile decision control method and device, vehicle and storage medium
Katariya et al. Deeptrack: Lightweight deep learning for vehicle trajectory prediction in highways
JP5070574B2 (en) Local traffic prediction program generation device, local traffic prediction device, local traffic prediction program generation method, local traffic prediction method and program
JP4420512B2 (en) Moving object motion classification method and apparatus, and image recognition apparatus
CN110281949B (en) Unified hierarchical decision-making method for automatic driving
CN112241756A (en) Machine-learnable system with standardized flow
Dong et al. Interactive trajectory prediction for autonomous driving via recurrent meta induction neural network
Li et al. BRAM-ED: Vehicle trajectory prediction considering the change of driving behavior
Geng et al. Dynamic-learning spatial-temporal Transformer network for vehicular trajectory prediction at urban intersections
Li A scenario-based development framework for autonomous driving
CN116080681A (en) Zhou Chehang identification and track prediction method based on cyclic convolutional neural network
CN116245183A (en) Traffic scene generalization understanding method and device based on graph neural network
CN116071728A (en) Pedestrian track prediction method based on transducer and attitude estimation and storage medium
Sheng et al. EPG-MGCN: Ego-planning guided multi-graph convolutional network for heterogeneous agent trajectory prediction
CN114926823A (en) WGCN-based vehicle driving behavior prediction method
Jazayeri Predicting Vehicle Trajectories at Intersections Using Advanced Machine Learning Techniques

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230914

Address after: Room A101, Building I, No. 7 Zhongchuang Second Road, Hangzhou Bay New Area, Ningbo City, Zhejiang Province, 315335

Applicant after: Ningbo Lutes Robotics Co.,Ltd.

Address before: 430056 A504, Building 3, No. 28, Chuanjiangchi Second Road, Wuhan Economic and Technological Development Zone, Wuhan, Hubei Province

Applicant before: Wuhan Lotus Technology Co.,Ltd.

TA01 Transfer of patent application right