CN114997067A - Trajectory prediction method based on a spatio-temporal graph and a spatial-domain aggregation Transformer network


Info

Publication number: CN114997067A (application CN202210767796.8A); granted as CN114997067B
Authority: CN (China)
Prior art keywords: pedestrian, network, time, space, graph
Legal status: Granted; Active
Inventors: 曾繁虎, 杨欣, 王翔辰, 李恒锐, 樊江锋, 周大可
Assignee (original and current): Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Other languages: Chinese (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]


Abstract

The invention discloses a trajectory prediction method based on a spatio-temporal graph and a spatial-domain aggregation Transformer network, which addresses the insufficient extraction of interaction features in existing pedestrian trajectory prediction. A spatio-temporal graph convolutional neural network and a temporal feature transformation network are used to extract the pedestrian trajectory features in a scene effectively and accurately; at the same time, a new spatial-domain aggregation Transformer architecture is designed to transform the pedestrian temporal features and to aggregate and exploit the spatial pedestrian features efficiently. The predicted pedestrian trajectories are finally output in the form of a probability distribution, so that sudden situations are avoided reasonably and the motion consistency of pedestrian groups is preserved. The relevant metrics show that the framework makes a breakthrough in predicting pedestrian endpoints and predicts the pedestrian trajectory distribution more accurately and efficiently, providing important support for developments in fields such as automatic driving and intelligent transportation.

Description

Trajectory prediction method based on a spatio-temporal graph and a spatial-domain aggregation Transformer network
Technical Field
The invention relates to a trajectory prediction method based on a spatio-temporal graph and a spatial-domain aggregation Transformer network, and belongs to the fields of artificial intelligence and automatic driving.
Background
Pedestrian trajectory prediction has a deep theoretical background and practical application value, and pedestrian trajectory recognition and prediction have long occupied an important position in fields such as unmanned driving and intelligent monitoring. In recent years, with progress in artificial intelligence and deep learning, the deployment and application of intelligent algorithms for the pedestrian trajectory prediction problem have gradually attracted attention.
An intelligent agent must understand and judge the behavior of the traffic participants in a scene, establish a pedestrian trajectory prediction model that carries spatial interaction information, make the relevant predictions, and reach accurate, fast, and reasonable decisions. However, the high complexity and uncertainty of the pedestrian trajectory prediction problem create the following difficulty: complex scene information means that a pedestrian's future trajectory is influenced not only by the historical trajectory and the intended route, but also by obstacles and other traffic participants in the scene across the spatio-temporal dimensions. Whether a reasonable and accurate model can be established, with fast prediction output and decision making, is therefore the key to applying pedestrian trajectory prediction in real scenes.
Thanks to the development of machine learning within artificial intelligence, trajectory prediction methods based on LSTM and CNN algorithms were for a long time the mainstream. These methods use simple models, achieve good prediction results with few parameters and basic model frameworks, and provided ideas and basic modules for subsequent in-depth algorithm research; in that sense they were pioneering.
Since graphs and their network architectures have natural advantages in representing the data of the pedestrian trajectory prediction problem, graph-based pedestrian trajectory prediction has become a popular research direction in recent years. Mohamed A et al. used a spatio-temporal graph neural network in 2020 (Social-STGCNN: A social spatio-temporal graph convolutional neural network for human trajectory prediction [C]) and performed two different convolution operations, on the time domain and the spatial domain respectively, to obtain trajectory feature information and produce prediction output at the same time. The model also considers the randomness and uncertainty of pedestrian trajectories in space: the intended route and endpoint of each pedestrian are not known in advance at prediction time, so a reasonable approach is to assume that the horizontal and vertical coordinates of the predicted trajectory follow a two-dimensional Gaussian distribution and to output trajectories by sampling during validation and prediction. The model completes its prediction under these assumptions and obtains reasonable results. However, such models still do not process the pedestrian interaction information any further, which leaves the spatial interaction capability insufficient: the generated trajectories carry large inertia and cannot produce closely coupled motion predictions that follow the motion patterns within pedestrian groups.
In recent years, many researchers have combined graph representations with other algorithmic tools and research methods and made further progress in pedestrian trajectory prediction. Dan X et al. proposed a pedestrian trajectory prediction architecture based on a spatio-temporal module and LSTM (Spatial-temporal block and LSTM network for pedestrian trajectories prediction [J]): based on a graph representation, the relation feature vector between each pedestrian node and its neighboring pedestrians is obtained through graph embedding, the encoded spatio-temporal pedestrian interaction features are fed into an LSTM, and the prediction is carried out with good results. Rainbow B A et al. proposed the semantics-based spatio-temporal graph model Semantics-STGCNN (Semantics-STGCNN: A semantics-guided spatio-temporal graph convolutional network for multi-class trajectory prediction [C]): starting from scene semantic understanding, the class labels of pedestrian objects are embedded into a label adjacency matrix, which is combined with a velocity adjacency matrix to output a semantic adjacency matrix, completing the modeling of semantic information and finally producing the prediction. Yu C et al. used a Transformer-based network model (Spatio-temporal graph transformer networks for pedestrian trajectory prediction [C]) that exploits the strong performance of Transformers in other fields, directly cascading several basic Transformer blocks to extract the spatio-temporal features of the pedestrians in a scene and complete the prediction.
Aiming at the shortcomings of existing trajectory prediction methods in extracting and exploiting spatial pedestrian interaction features, the invention provides a new network architecture that predicts pedestrian trajectories with a spatio-temporal graph and a spatial-domain aggregation Transformer. The input raw data are given a suitable graph representation and preprocessing; the original pedestrian trajectory features are extracted with a spatio-temporal graph convolutional neural network and a temporal feature transformation network; and a spatial-domain Transformer architecture is introduced so that the deep spatial feature information is fully extracted and aggregated, ensuring the effectiveness and accuracy of the model with respect to spatial pedestrian interaction. The invention pays attention to the reasonableness of the prediction in terms of spatial interaction, preserving the pedestrians' spatial walking characteristics while accounting for mutual influence; in particular it makes a breakthrough in predicting trajectory endpoints, has positive effects on modeling pedestrian interaction and predicting trajectories in complex scenes, and supports research and exploration in fields such as unmanned driving and artificial intelligence.
Disclosure of Invention
The invention discloses a trajectory prediction method based on a spatio-temporal graph and a spatial-domain aggregation Transformer network, aiming at problems of existing pedestrian trajectory prediction methods such as insufficient extraction of spatial pedestrian trajectory information, unclear relative positions of pedestrians while walking, and the inability to make large-angle turns to avoid collisions.
For the spatio-temporal graph convolutional neural network, the pedestrian trajectory information in the scene is represented and preprocessed in graph form, and a graph convolutional neural network is constructed to complete a preliminary extraction of the spatial pedestrian trajectory features, which serves as input to the subsequent network.
In the temporal feature transformation network, the extraction of temporal feature information and the transformation of feature dimensions are completed through a convolutional network; at the same time, the network is designed so that the model parameters stay small and the performance of the model improves.
In the spatial-domain aggregation Transformer network, the features obtained from the spatio-temporal graph convolutional neural network and the temporal feature transformation network are processed further. To mine and model the interaction of pedestrian features in the spatial scene, the model takes the temporal feature vector of each pedestrian as an input vector and feeds it into the spatial-domain aggregation Transformer network, which fully extracts and aggregates the spatial trajectory features of the pedestrians while producing the trajectory prediction output.
The invention mainly comprises the following steps:
Step (1): represent and preprocess the pedestrian trajectory information in the scene from the input raw data using a graph, select a suitable kernel function to construct the adjacency matrix, and provide accurate and efficient in-scene pedestrian information as input to the subsequent network architecture;
Step (2): establish the spatio-temporal graph convolutional neural network module, construct the graph convolutional neural network, and complete the preliminary extraction of the spatial pedestrian trajectory features by choosing the number of graph convolutions, ensuring accurate and effective feature extraction;
Step (3): establish the temporal feature transformation network module, and complete the extraction of temporal features and the transformation of feature dimensions by designing a convolutional neural network;
Step (4): establish the spatial-domain aggregation Transformer network, take the temporal feature vector of each pedestrian in the scene as an input vector, feed it into the Transformer network to further aggregate the spatial features, and output the pedestrian trajectory prediction sequence.
Furthermore, in step (1), a spatio-temporal graph is introduced to represent the input raw pedestrian trajectory data, and a suitable kernel function is selected from several candidates to construct the adjacency matrix in the graph sense, completing an efficient construction and selection of the pedestrian features in the scene and providing accurate input for the subsequent modeling.
Further, representing the input raw pedestrian trajectory data with the introduced spatio-temporal graph is specifically as follows: for each time t, a spatial graph G_t is introduced to represent the interaction relations among the pedestrians at that time. G_t is defined as G_t = (V_t, E_t), where V_t represents the coordinate information of the pedestrians in the scene at time t, i.e.

V_t = { v_t^i | i = 1, …, N }

Each v_t^i is characterized by the observed relative coordinate change (Δx_t^i, Δy_t^i), namely:

Δx_t^i = x_t^i − x_{t−1}^i
Δy_t^i = y_t^i − y_{t−1}^i

where i = 1, …, N and t = 2, …, T_obs; for the initial time, the relative offset of the position is defined as 0, i.e. (Δx_1^i, Δy_1^i) = (0, 0).

E_t represents the adjacency information of the spatial graph G_t and is a matrix of dimension N × N, defined as E_t = { e_t^{ij} | i, j = 1, …, N }. The value of e_t^{ij} is given as follows: if node v_t^i and node v_t^j are connected, then e_t^{ij} = 1; conversely, if node v_t^i and node v_t^j are not connected, then e_t^{ij} = 0.
Further, selecting a suitable kernel function from several candidates to construct the adjacency matrix in the graph sense is specifically as follows:

A weighted adjacency matrix A_t is introduced to weight the node information of the pedestrian spatial graph; the magnitude of the mutual influence among pedestrians is obtained through a kernel-function transformation and stored in A_t.

The reciprocal of the Euclidean distance between two nodes is selected as the kernel function; to avoid the divergence that arises when two nodes are very close, a small constant ε is added, which also accelerates model convergence:

a_t^{ij} = 1 / ( ‖v_t^i − v_t^j‖_2 + ε )

The spatial graphs G_t at the individual time instants are stacked along the time dimension to obtain the spatio-temporal graph sequence G = {G_1, …, G_T} of the pedestrian trajectory prediction problem under the graph representation.
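As an illustrative sketch (not part of the patent text), the inverse-distance kernel and the per-frame weighted adjacency matrix A_t can be written in a few lines of NumPy; the function name `weighted_adjacency` and the `eps` default are assumptions made for this example:

```python
import numpy as np

def weighted_adjacency(coords, eps=1e-6):
    """Build the weighted adjacency matrix A_t for one frame.

    coords: (N, 2) array of pedestrian positions at time t.
    Off-diagonal entries hold 1 / (distance + eps), the
    inverse-Euclidean-distance kernel; the diagonal is left at 0.
    """
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)             # (N, N) pairwise distances
    A = 1.0 / (dist + eps)
    np.fill_diagonal(A, 0.0)                         # no self-weight
    return A

# Three pedestrians on a line: the closer pair gets the larger weight.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
A = weighted_adjacency(pts)
```

Stacking one such matrix per observed frame then yields the weighted counterpart of the sequence G = {G_1, …, G_T}.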
Further, step (2) is specifically as follows:

For the input sequence of feature graphs, the output is obtained through the constructed spatio-temporal graph convolutional neural network:

e_t = GNN(G_t)    (1.6)

where GNN denotes the constructed spatio-temporal graph convolutional neural network, whose output is obtained by multi-layer graph-convolution iteration, and e_t denotes the spatio-temporal feature information preliminarily extracted in the spatial dimension by the graph neural network.

This operation is performed for the output at each time instant; the output of the full graph convolutional neural network is the stack of this time series:

e_g = Stack(e_t)    (1.7)

where Stack(·) denotes superposition of the inputs along an extended dimension and e_g denotes the output of the graph convolution. In actual processing, multiple extended dimensions are fed into the graph neural network in parallel.

The features are then dimension-transformed through a fully connected layer FC:

V_GNN = FC(e_g)    (1.8)

This yields the preliminary feature-extraction output of the spatio-temporal graph convolutional neural network.
Further, in step (3), the output of the spatio-temporal graph convolutional neural network is dimension-transformed, and a CNN-based temporal feature transformation network module with a designed number of convolutions completes the extraction of each pedestrian's own historical trajectory features.
further, the step (3) is specifically as follows:
after the feature extraction information of the space-time graph convolutional neural network is obtained, sending the feature extraction information into a time sequence feature transformation network to extract time sequence features; in the second step, the dimensional characteristics are properly converted through a full connection layer, so that the network module in the second step directly utilizes the obtained characteristic information; in the invention, a multilayer CNN convolutional neural network is selected to process time dimension characteristic information, which can be expressed as:
e c =CNN(V GNN ) (1.9)
wherein, V GNN Representing feature information extracted from the convolutional neural network, e c Representing the output through a time series characteristic transformation network; then, a multi-layer perceptron MLP is used for increasing the expression capability of the network:
V CNN =MLP(e c ) (1.10)
the characteristics are transformed and processed through the network, and the output V of the time sequence characteristic transformation network is obtained CNN
Further, the main construction of step (4) comprises the following. To strengthen the relations among pedestrian features in the spatial domain, a spatial-domain Transformer network is designed to further aggregate the extracted feature information spatially. Specifically, the temporal feature vector of each pedestrian is used as one input vector, and the extracted features of the different pedestrians are input in sequence.

For the spatial-domain aggregation Transformer network, the encoder layer of the Transformer architecture is selected. First, positional encoding is added to the input:

V_in = V_CNN + PE_{pos,i}(V_CNN)    (1.11)

where pos denotes the relative position of the input feature and i denotes the dimension of the input feature. A multi-head attention layer is then introduced: the query (Q), key (K), and value (V) inputs are obtained from the input layer through three matrix transformations, the input features are split according to the configured number of heads, and the attention scores are computed as:

Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i    (1.12)
head_i = Attention(Q_i, K_i, V_i)    (1.13)

where i = 1, …, nhead and nhead denotes the number of heads. The final multi-head output completes the feature extraction by concatenation:

V_Multi = ConCat(head_1, …, head_h) W_o    (1.14)

where ConCat denotes the concatenation operation and W_o denotes the parameter matrix of the attention-layer output.

The final output of the spatial-domain Transformer is then completed through a feed-forward network and layer normalization:

V_out = LN(FeedForward(V_Multi))    (1.15)

Through this structure, the preliminarily extracted spatio-temporal features are aggregated into the pedestrians' spatial interaction features, so that the output pedestrian trajectories better respect the associations and interactions among the pedestrians in the scene.
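The multi-head attention step above follows the standard scaled dot-product form; a minimal NumPy sketch over the pedestrian axis (all shapes and weight initializations are illustrative assumptions) is:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, nhead):
    """Scaled dot-product multi-head self-attention over the pedestrian axis.

    X: (N, d_model) -- one temporal feature vector per pedestrian.
    """
    N, d_model = X.shape
    d_k = d_model // nhead
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(nhead):
        sl = slice(i * d_k, (i + 1) * d_k)
        scores = softmax(Q[:, sl] @ K[:, sl].T / np.sqrt(d_k))  # (N, N)
        heads.append(scores @ V[:, sl])                          # head_i
    return np.concatenate(heads, axis=-1) @ Wo                   # ConCat(...) W_o

N, d_model, nhead = 4, 8, 2
X = rng.normal(size=(N, d_model))
W = [rng.normal(size=(d_model, d_model)) * 0.3 for _ in range(4)]
V_multi = multi_head_attention(X, *W, nhead=nhead)
```

Because attention here runs across pedestrians rather than across time, each pedestrian's output feature is a weighted mixture of every other pedestrian's features, which is the spatial aggregation the patent describes.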
For the loss function, the sum of the negative log-likelihoods of the points on a pedestrian's predicted trajectory is selected. The loss for the i-th pedestrian is:

L^i(W) = − Σ_{t=T_obs+1}^{T_pred} log P( (Δx_t^i, Δy_t^i) | μ_t^i, σ_t^i, ρ_t^i )    (1.16)

where (μ_t^i, σ_t^i, ρ_t^i) are the unknown parameters of the pedestrian trajectory distribution to be predicted, and T_obs, T_pred denote the observed and predicted endpoint times respectively. The sum of the losses over all pedestrians is the final loss:

L(W) = Σ_{i=1}^{N} L^i(W)    (1.17)

Computing the forward loss and updating the parameters backward on the proposed model framework completes the training of the model and yields a reasonable pedestrian trajectory prediction output.
Advantageous effects
The invention provides a new network model architecture. A spatio-temporal graph convolutional neural network, a temporal feature transformation network, and related transformation operations are used to extract the pedestrian features in a scene effectively and accurately; at the same time, a new spatial-domain aggregation Transformer architecture is designed to transform and exploit the pedestrian temporal features; finally, the predicted pedestrian trajectories are output in the form of a probability distribution. The method avoids sudden situations reasonably and keeps the motion of pedestrian groups consistent, achieves a more accurate and reasonable prediction of spatial pedestrian interaction, offers a new idea for further in-depth research on the pedestrian trajectory prediction problem, matters for more accurate and timely prediction and application in real scenes, and supports development in fields such as automatic driving and intelligent transportation.
Drawings
FIG. 1 is an overall diagram of the spatio-temporal graph and spatial-domain aggregation Transformer network framework according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of trajectory prediction in which the temporally transformed features are input into the spatial-domain aggregation Transformer network according to the present invention.
Detailed Description
The invention relates to a pedestrian trajectory prediction method based on a spatio-temporal graph and a spatial-domain aggregation Transformer network, which mainly comprises the following steps:
for the pedestrian trajectory prediction problem under a given scene, N pedestrians are used for each observationThe coordinates of the time of day within the scene. For the coordinate information of the ith pedestrian at the t-th time, the coordinate information is used
Figure BDA0003722819760000072
And (4) showing. With the above definitions in mind, then the general formulation of the problem is that for each known set of given observed pedestrian trajectory sequences:
Figure BDA0003722819760000081
extracting and modeling pedestrian track characteristics by a constructed network framework through input data to obtain proper track characteristic information, and providing reasonable track prediction output in a scene:
Figure BDA0003722819760000082
wherein T is obs And T pred Respectively representing the observation time span and the prediction time span of the pedestrian, () representing the true value of the pedestrian track prediction,
Figure BDA0003722819760000083
and representing the predicted pedestrian track value given by the model.
FIG. 1 is an overall schematic diagram of the spatio-temporal graph and spatial-domain aggregation Transformer network framework according to an embodiment of the present invention.
Step one: represent and preprocess the data appropriately, providing accurate and efficient in-scene pedestrian information

The invention first applies a suitable graph representation and preprocessing to the input raw pedestrian trajectory data, so that the input feature information can be conveniently extracted and used efficiently later.
For each time t, a spatial graph G_t is introduced to represent the interaction relations among the pedestrians at that time. G_t is defined as G_t = (V_t, E_t), where V_t represents the node information of the spatial graph G_t; in this model, V_t represents the coordinate information of the pedestrians in the scene at time t, i.e.

V_t = { v_t^i | i = 1, …, N }

For the model, each v_t^i is characterized by the observed relative coordinate change (Δx_t^i, Δy_t^i), i.e.:

Δx_t^i = x_t^i − x_{t−1}^i    (1.3)
Δy_t^i = y_t^i − y_{t−1}^i    (1.4)

where i = 1, …, N and t = 2, …, T_obs; for the initial time, the relative offset of the position is defined as 0, i.e. (Δx_1^i, Δy_1^i) = (0, 0).

E_t represents the adjacency information of the spatial graph G_t and is a matrix of dimension N × N. In its ordinary sense it is defined as E_t = { e_t^{ij} | i, j = 1, …, N }, and the value of e_t^{ij} is given as follows: if node v_t^i and node v_t^j are connected, then e_t^{ij} = 1; conversely, if node v_t^i and node v_t^j are not connected, then e_t^{ij} = 0.
For the prediction task it is desirable to obtain not only whether pedestrians are correlated, but also the relative magnitude of their mutual influence in space. A weighted adjacency matrix A_t is therefore introduced to weight the node information of the pedestrian spatial graph: the magnitude of the mutual influence among pedestrians is obtained through a kernel-function transformation and stored in A_t. In the invention, the reciprocal of the Euclidean distance between two nodes is used as the kernel function; to avoid the divergence that arises when two nodes are very close, a small constant ε is added, which also accelerates model convergence:

a_t^{ij} = 1 / ( ‖v_t^i − v_t^j‖_2 + ε )    (1.5)

The spatial graphs G_t at the individual time instants are stacked along the time dimension to obtain the spatio-temporal graph sequence G = {G_1, …, G_T} of the pedestrian trajectory prediction problem under the graph representation. Through these definitions and transformations, the graph representation and preprocessing of the data in the pedestrian trajectory prediction problem are completed.
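The conversion of absolute coordinates into the relative offsets used as node attributes amounts to a one-line difference along the time axis, with the first frame pinned to zero; a NumPy sketch (the trajectory values are invented for illustration) is:

```python
import numpy as np

def to_relative(traj):
    """Convert absolute coordinates to per-step relative offsets.

    traj: (T_obs, N, 2) absolute positions; the offset at the initial
    time is defined as 0, matching the preprocessing described above.
    """
    rel = np.zeros_like(traj)
    rel[1:] = traj[1:] - traj[:-1]
    return rel

# Two pedestrians walking at constant velocity along x.
T_obs = 4
t = np.arange(T_obs, dtype=float)
traj = np.stack([np.stack([t, np.zeros(T_obs)], axis=-1),        # pedestrian 0
                 np.stack([2 * t, np.ones(T_obs)], axis=-1)],    # pedestrian 1
                axis=1)                                          # (T_obs, 2, 2)
rel = to_relative(traj)
```

Constant-velocity walkers yield constant offsets, which is why the relative representation is a convenient, translation-invariant node attribute.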
Step two: establish the spatio-temporal graph convolutional neural network for preliminary feature extraction

In the invention, the data obtained from the graph representation of the raw data in step one are fed into a spatio-temporal graph convolutional neural network for a preliminary extraction of the feature information.

In the model architecture, a suitable number of graph-convolution layers is determined for the graph convolutional neural network, i.e. a suitable number of feature iterations, so that the trajectory features in space are extracted well.
For the input sequence of feature graphs, the output is obtained through the constructed spatio-temporal graph convolutional neural network:

e_t = GNN(G_t)    (1.6)

where GNN denotes the constructed spatio-temporal graph convolutional neural network, whose output is obtained by multi-layer graph-convolution iteration, and e_t denotes the spatio-temporal feature information preliminarily extracted in the spatial dimension by the graph neural network.

This operation is performed for the output at each time instant; the output of the full graph convolutional neural network is the stack of this time series:

e_g = Stack(e_t)    (1.7)

where Stack(·) denotes superposition of the inputs along an extended dimension and e_g denotes the output of the graph convolution. In actual processing, multiple extended dimensions are fed into the graph neural network in parallel.

The features are then dimension-transformed through a fully connected layer FC:

V_GNN = FC(e_g)    (1.8)

This yields the preliminary feature-extraction output of the spatio-temporal graph convolutional neural network.
Step three: establishing a time sequence feature transformation network, and finishing the extraction of time sequence features and the transformation of feature dimensions by designing a convolutional neural network;
After the feature information extracted by the space-time graph convolutional neural network is obtained, it is sent into the temporal feature transformation network to extract temporal features. Because the feature dimensions were already appropriately transformed by the fully connected layer in step two, this module can use the obtained feature information directly. The invention selects a multilayer CNN to process the time-dimension feature information, which can be expressed as:
e_c = CNN(V_GNN)    (1.9)
where V_GNN denotes the feature information extracted by the graph convolutional neural network and e_c denotes the output of the convolutional layers of the temporal feature transformation network. A multilayer perceptron MLP is then used to increase the expressive capability of the network:
V_CNN = MLP(e_c)    (1.10)
After these transformations, the output V_CNN of the temporal feature transformation network is obtained.
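The temporal transformation of equations (1.9)–(1.10) can be sketched as below. A common design (assumed here, since the text does not fix one) treats the time axis as channels so that the convolutional stage maps the 8 observed steps to the 12 predicted steps, and a small MLP then refines the features; all shapes and names are illustrative.

```python
import numpy as np

def temporal_transform(V, W_time, W1, b1, W2, b2):
    """Map observed features (T_obs, N, F) to prediction-horizon features
    (T_pred, N, F).  W_time: (T_pred, T_obs) mixes the time axis (treated
    as channels); a two-layer MLP with ReLU then refines the features."""
    e = np.einsum('pt,tnf->pnf', W_time, V)   # linear time-mixing (stands in for the CNN)
    h = np.maximum(e @ W1 + b1, 0.0)          # hidden layer, ReLU
    return h @ W2 + b2                        # back to F channels

rng = np.random.default_rng(0)
T_obs, T_pred, N, F = 8, 12, 3, 16
V = rng.standard_normal((T_obs, N, F))
W_time = rng.standard_normal((T_pred, T_obs)) * 0.1
W1, b1 = rng.standard_normal((F, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.standard_normal((32, F)) * 0.1, np.zeros(F)
out = temporal_transform(V, W_time, W1, b1, W2, b2)
```

The output shape (T_pred, N, F) matches what the spatial-domain Transformer of step four expects as input.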
Step four: establish a spatial-domain aggregation Transformer network to further aggregate spatial features and produce the pedestrian trajectory prediction sequence.
This step addresses two shortcomings of existing pedestrian trajectory prediction caused by insufficient extraction of interaction features and consequently weak spatial characteristics: on one hand, predicted trajectories carry large inertia and cannot execute sharp turns under high-speed or sudden-movement conditions; on the other hand, the motion consistency of pedestrian group behavior is insufficient, so spatially close, strongly associated pedestrians fail to maintain the same motion trend over a period of time.
To strengthen the spatial-domain relations among pedestrian features, a spatial-domain Transformer network is designed to further aggregate the extracted feature information spatially. Specifically, the temporal feature vector of each pedestrian is used as one input vector, and the extracted features of the different pedestrians are input in sequence.
For the spatial-domain aggregation Transformer network, an encoder layer of the Transformer architecture is selected. First, positional encoding is added to the input:
V_in = V_CNN + PE_{pos,i}(V_CNN)    (1.11)
where pos denotes the relative position of the input features and i denotes the feature dimension. A multi-head attention layer is then introduced: the query (Q), key (K) and value (V) inputs of the attention layers are obtained from the input layer by matrix transformations, the input features are split according to the chosen number of heads, and the attention scores are computed as:
Attention(Q_i, K_i, V_i) = softmax( Q_i K_i^T / sqrt(d_k) ) V_i    (1.12)
head_i = Attention(Q_i, K_i, V_i)    (1.13)
where i = 1, …, nhead, nhead denotes the number of attention heads, and d_k denotes the per-head feature dimension. The final multi-head output completes the feature extraction by concatenation:
V_Multi = ConCat(head_1, …, head_h) W_o    (1.14)
where ConCat denotes the concatenation operation and W_o denotes the parameter matrix of the attention-layer output.
The final output of the spatial Transformer is then completed through a feedforward network and layer normalization:
V_out = LN(FFN(V_Multi))    (1.15)
where FFN denotes the feedforward network and LN denotes layer normalization.
Through this structure, the preliminarily extracted spatio-temporal features are aggregated into pedestrian spatial interaction features, so that the output pedestrian trajectories better conform to the associations and interactions among pedestrians in the scene.
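Equations (1.11)–(1.15) follow a standard Transformer encoder applied across pedestrians. A minimal NumPy sketch of the multi-head self-attention of (1.12)–(1.14), with illustrative head count, dimensions, and function names (not the invention's code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(X, Wq, Wk, Wv, Wo, nhead):
    """Multi-head scaled dot-product self-attention across pedestrians.
    X: (N, d) — one aggregated feature vector per pedestrian; the d
    feature channels are split evenly into nhead heads."""
    N, d = X.shape
    dk = d // nhead
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # query/key/value projections
    heads = []
    for i in range(nhead):
        s = slice(i * dk, (i + 1) * dk)
        scores = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dk))  # (N, N)
        heads.append(scores @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo               # (N, d)

rng = np.random.default_rng(1)
N, d, nhead = 5, 16, 4
X = rng.standard_normal((N, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
out = spatial_attention(X, Wq, Wk, Wv, Wo, nhead)
```

In the full encoder layer, this output would pass through the feedforward network and layer normalization, with the usual residual connections.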
For the loss function, the sum over the predicted trajectory points of the negative log-likelihood of each point is used. The loss function of the i-th pedestrian is:
L^i = − Σ_{t=T_obs+1}^{T_pred} log P( (x_t^i, y_t^i) | μ̂_t^i, σ̂_t^i, ρ̂_t^i )    (1.16)
where μ̂_t^i, σ̂_t^i, ρ̂_t^i are the unknown pedestrian trajectory distribution parameters to be predicted, and T_obs, T_pred denote the observation end time and prediction end time, respectively. The sum of the loss functions of all pedestrians is the final loss function:
L = Σ_{i=1}^{N} L^i    (1.17)
the forward loss function calculation and the reverse parameter updating are carried out on the model framework provided by the invention, so that the training of the model can be completed, and the reasonable pedestrian prediction trajectory output is obtained.
To evaluate the accuracy and effectiveness of the model, as in common trajectory prediction evaluation methods, the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are used as evaluation indices of the predicted trajectory's accuracy. The ADE is the average, over every pedestrian in the scene and every prediction time step, of the L2 norm of the error between the predicted and true positions; the FDE is the average, over every pedestrian in the scene, of the L2 norm of that error at the endpoint time. Their expressions are:
ADE = ( Σ_{i=1}^{N} Σ_{t=T_obs+1}^{T_pred} || p̂_t^i − p_t^i ||_2 ) / (N · T_p)    (1.18)
FDE = ( Σ_{i=1}^{N} || p̂_{T_pred}^i − p_{T_pred}^i ||_2 ) / N    (1.19)
where p_t^i denotes the true position of pedestrian i to be predicted at time t and p̂_t^i denotes the predicted position output by the model; T_pred denotes the prediction endpoint time and T_p denotes the number of predicted time steps. The FDE index averages only the endpoint coordinate error of each pedestrian in the scene, placing no further requirement on the chosen walking route, while the ADE index averages the summed coordinate errors of every time point. For both indices, a smaller value indicates a trajectory closer to the actual one and better prediction performance.
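The ADE and FDE metrics defined above, together with the best-of-K sample selection commonly used in this evaluation protocol, can be computed as follows (a minimal NumPy sketch with illustrative names):

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (T_pred, N, 2) predicted / true positions.
    Returns (ADE, FDE): mean L2 error over all steps and pedestrians,
    and mean L2 error at the final step."""
    err = np.linalg.norm(pred - gt, axis=-1)   # (T_pred, N)
    return err.mean(), err[-1].mean()

def best_of_k(samples, gt):
    """samples: (K, T_pred, N, 2). Return the sampled trajectory set
    with the lowest ADE against ground truth (best-of-K protocol)."""
    ades = [ade_fde(s, gt)[0] for s in samples]
    return samples[int(np.argmin(ades))]

# toy check: a constant offset of (3, 4) gives an L2 error of 5 everywhere
T_pred, N = 12, 2
gt = np.zeros((T_pred, N, 2))
pred = gt + np.array([3.0, 4.0])
ade, fde = ade_fde(pred, gt)
best = best_of_k(np.stack([pred, gt]), gt)
```

With K sampled predictions, reporting ADE/FDE of the best sample is the convention used on ETH/UCY.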
Because the actual output is a probability distribution of the trajectory in the two-dimensional plane, trajectory prediction performance is commonly evaluated, to preserve trajectory diversity and generalization capability, by sampling multiple predictions (for example, 20) and taking the sampled trajectory closest to the ground-truth trajectory as the output trajectory when computing ADE/FDE. Specifically, for the five datasets of ETH and UCY, pedestrian trajectory data are sampled every 0.4 seconds, and every 20 frames form one data sample; the model is trained and validated by giving the first 8 frames (3.2 s) of pedestrian trajectory data as input and predicting the pedestrian trajectory of the future 12 frames (4.8 s). The model of the invention is compared with two other algorithms that also use graph network models; the comparison results are shown in Table 1, with the best performance marked in red:
TABLE 1. Comparison of the model with the prediction results of mainstream graph network models [table rendered as images in the original document]
As can be seen from Table 1, the framework proposed by the invention makes a large breakthrough on the endpoint prediction problem: the FDE index is optimal on almost all datasets, and both the average ADE and average FDE indices show the best performance. Compared with the better of the two graph network algorithms, the model improves FDE by 17%, 21%, 5% and 12% on ETH, UNIV, ZARA1 and ZARA2 respectively, and by 16% on the average FDE index. These data show that, by feeding pedestrian temporal feature vectors into the spatial aggregation Transformer framework, the model concentrates on exploiting the features extracted by the spatial graph neural network and the temporal feature transformation network, completes a better aggregation of spatial pedestrian interaction features, and achieves a better prediction effect, with a particularly large breakthrough on FDE and a stronger perception and expression of the interaction features among pedestrians in space.

Claims (8)

1. A trajectory prediction method based on a space-time diagram and a space-domain aggregation Transformer network is characterized by comprising the following steps:
(1) using the characteristics of a graph, representing and preprocessing the pedestrian trajectory feature information in the scene from the input original data, and selecting a suitable kernel function to complete the construction of an adjacency matrix, so as to provide accurate and efficient in-scene pedestrian trajectory feature information for the subsequent network architecture input;
(2) establishing a space-time graph convolutional neural network module, constructing a graph convolutional neural network, and completing the preliminary extraction of the graph representation of step (1) and of the preprocessed pedestrian trajectory feature information by selecting the number of graph convolutions of the pedestrian trajectory features, so as to ensure the accuracy and effectiveness of the extracted features;
(3) establishing a time sequence feature transformation network module, and finishing the extraction of time sequence feature information and the transformation of feature dimensions by designing a convolutional neural network;
(4) and establishing a spatial aggregation Transformer network, using the time sequence characteristic vector of each pedestrian in the scene as an input vector, simultaneously inputting the Transformer network to further aggregate spatial characteristics, and finishing the output of the pedestrian track prediction sequence.
2. The trajectory prediction method based on the spatio-temporal graph and spatial domain aggregation Transformer network as claimed in claim 1, wherein in the step (1), the spatio-temporal graph is introduced to represent the input original pedestrian trajectory data, a proper kernel function is selected from a plurality of kernel functions to construct an adjacency matrix under the graph meaning, so that efficient construction and selection of pedestrian features in a scene are completed, and accurate and efficient information is provided for subsequent modeling.
3. The trajectory prediction method based on the spatio-temporal graph and the spatial aggregation Transformer network as claimed in claim 2, wherein the representation of the input original pedestrian trajectory data by the introduced spatio-temporal graph is specifically as follows: for each time t, a spatial graph G_t is introduced to represent the interactive feature relations among the pedestrians at that time point; G_t is defined as G_t = (V_t, E_t), wherein V_t represents the coordinate information of the pedestrians in the scene at time t, i.e. V_t = { v_t^i | i = 1, …, N }, and each v_t^i is characterized by the observed relative coordinate change (Δx_t^i, Δy_t^i), namely:
Δx_t^i = x_t^i − x_{t−1}^i    (1.1)
Δy_t^i = y_t^i − y_{t−1}^i    (1.2)
where i = 1, …, N and t = 2, …, T_obs; for the initial time, the relative position offset is defined as 0, i.e. Δx_1^i = Δy_1^i = 0;
E_t then represents the edge set of the spatial graph G_t and is a matrix of dimension N × N, defined as E_t = { e_t^{ij} | i, j = 1, …, N }; the value of e_t^{ij} is given as follows:
if node v_t^i and node v_t^j are connected, then e_t^{ij} = 1    (1.3)
conversely, if node v_t^i and node v_t^j are not connected, then e_t^{ij} = 0    (1.4)
4. The trajectory prediction method based on the spatio-temporal graph and spatial aggregation Transformer network as claimed in claim 2, wherein selecting a suitable kernel function from the plurality of kernel functions to construct the adjacency matrix in the graph sense is specifically:
introducing a weighted adjacency matrix A_t to weight the node information of the pedestrian spatial graph; the magnitude of the mutual influence between pedestrians is obtained through a kernel function transformation and stored in the weighted adjacency matrix A_t;
the reciprocal of the distance between two nodes in Euclidean space is selected as the kernel function, and, to avoid the function divergence caused by two nodes being too close, a small constant ε is added to accelerate model convergence; the expression is:
a_t^{ij} = 1 / ( || v_t^i − v_t^j ||_2 + ε ),  i ≠ j    (1.5)
stacking the spatial graphs G_t of every time instant along the time dimension yields the pedestrian trajectory prediction space-time graph sequence G = {G_1, …, G_T} under the graph representation.
5. The trajectory prediction method based on the spatio-temporal graph and the spatial domain aggregation Transformer network as claimed in claim 4, wherein the step (2) is specifically as follows:
for the input time sequence of feature graphs, the output of the constructed space-time graph convolutional neural network is obtained as:
e_t = GNN(G_t)    (1.6)
where GNN denotes the constructed space-time graph convolutional neural network, whose output is produced by iterating multiple graph-convolution layers, and e_t denotes the spatio-temporal feature information preliminarily extracted along the spatial dimension by the graph neural network;
this operation is performed for the output at each time instant; the overall output of the graph convolutional neural network is the stack of this time series:
e_g = Stack(e_t)    (1.7)
where Stack(·) denotes superposition of its inputs along an extended dimension and e_g denotes the output of the graph convolution; in practice, the frames along this extended dimension are sent to the graph neural network in parallel;
the features are then given an appropriate dimension transformation through a fully connected layer FC:
V_GNN = FC(e_g)    (1.8)
thereby obtaining the preliminary feature extraction output of the space-time graph convolutional neural network.
6. The trajectory prediction method based on the spatio-temporal graph and spatial aggregation Transformer network as claimed in claim 1, wherein in the step (3), the output of the spatio-temporal graph convolutional neural network is subjected to appropriate dimension transformation, and the extraction of the pedestrian's own historical trajectory feature information is completed by using a CNN-based time-series feature transformation network module and designing the convolution times.
7. The trajectory prediction method based on the spatio-temporal graph and the spatial domain aggregation Transformer network as claimed in claim 6, wherein the step (3) is specifically as follows:
after the feature information extracted by the space-time graph convolutional neural network is obtained, it is sent into the temporal feature transformation network to extract temporal features; because the feature dimensions were already appropriately transformed by the fully connected layer in step (2), this module can use the obtained feature information directly; a multilayer CNN is selected to process the time-dimension feature information, which is expressed as:
e_c = CNN(V_GNN)    (1.9)
where V_GNN denotes the feature information extracted by the graph convolutional neural network and e_c denotes the output of the convolutional layers of the temporal feature transformation network; a multilayer perceptron MLP is then used to increase the expressive capability of the network:
V_CNN = MLP(e_c)    (1.10)
after these transformations, the output V_CNN of the temporal feature transformation network is obtained.
8. The trajectory prediction method based on the spatio-temporal graph and the spatial domain aggregation Transformer network as claimed in claim 1, wherein the step (4) is specifically as follows:
using the temporal feature vector of each pedestrian as one input vector, and inputting the extracted features of the different pedestrians in sequence;
for the spatial-domain aggregation Transformer network, an encoder layer of the Transformer architecture is selected; first, positional encoding is added to the input:
V_in = V_CNN + PE_{pos,i}(V_CNN)    (1.11)
where pos denotes the relative position of the input features and i denotes the feature dimension; a multi-head attention layer is then introduced: the Query, Key and Value inputs of the attention layers are obtained from the input layer by matrix transformations, the input features are split according to the chosen number of heads, and the attention scores are computed as:
Attention(Q_i, K_i, V_i) = softmax( Q_i K_i^T / sqrt(d_k) ) V_i    (1.12)
head_i = Attention(Q_i, K_i, V_i)    (1.13)
where i = 1, …, nhead, nhead denotes the number of heads, and d_k denotes the per-head feature dimension; the final multi-head output completes the feature extraction by concatenation:
V_Multi = ConCat(head_1, …, head_h) W_o    (1.14)
where ConCat denotes the concatenation operation and W_o denotes the parameter matrix of the attention-layer output;
the final output of the spatial Transformer is then completed through a feedforward network and layer normalization:
V_out = LN(FFN(V_Multi))    (1.15)
for the loss function, the sum over the predicted trajectory points of the negative log-likelihood of each point is selected; the loss function of the i-th pedestrian is:
L^i = − Σ_{t=T_obs+1}^{T_pred} log P( (x_t^i, y_t^i) | μ̂_t^i, σ̂_t^i, ρ̂_t^i )    (1.16)
where μ̂_t^i, σ̂_t^i, ρ̂_t^i are the unknown pedestrian trajectory distribution parameters to be predicted, and T_obs, T_pred denote the observation end time and prediction end time, respectively; the sum of the loss functions of all pedestrians is the final loss function:
L = Σ_{i=1}^{N} L^i    (1.17)
carrying out forward loss computation and backward parameter updates on the model framework completes the training of the model and yields reasonable predicted pedestrian trajectories.
CN202210767796.8A 2022-06-30 2022-06-30 Track prediction method based on space-time diagram and airspace aggregation transducer network Active CN114997067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210767796.8A CN114997067B (en) 2022-06-30 2022-06-30 Track prediction method based on space-time diagram and airspace aggregation transducer network

Publications (2)

Publication Number Publication Date
CN114997067A true CN114997067A (en) 2022-09-02
CN114997067B CN114997067B (en) 2024-07-19

Family

ID=83019465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210767796.8A Active CN114997067B (en) 2022-06-30 2022-06-30 Track prediction method based on space-time diagram and airspace aggregation transducer network

Country Status (1)

Country Link
CN (1) CN114997067B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392595A (en) * 2022-10-31 2022-11-25 北京科技大学 Time-space short-term wind speed prediction method and system based on graph convolution neural network and Transformer
CN115881286A (en) * 2023-02-21 2023-03-31 创意信息技术股份有限公司 Epidemic prevention management scheduling system
CN115966313A (en) * 2023-03-09 2023-04-14 创意信息技术股份有限公司 Integrated management platform based on face recognition
CN117493424A (en) * 2024-01-03 2024-02-02 湖南工程学院 Vehicle track prediction method independent of map information
CN117523821A (en) * 2023-10-09 2024-02-06 苏州大学 System and method for predicting vehicle multi-mode driving behavior track based on GAT-CS-LSTM
CN117933492A (en) * 2024-03-21 2024-04-26 中国人民解放军海军航空大学 Ship track long-term prediction method based on space-time feature fusion
WO2024119489A1 (en) * 2022-12-09 2024-06-13 中国科学院深圳先进技术研究院 Pedestrian trajectory prediction method, system, device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255597A (en) * 2021-06-29 2021-08-13 南京视察者智能科技有限公司 Transformer-based behavior analysis method and device and terminal equipment thereof
CN113762595A (en) * 2021-07-26 2021-12-07 清华大学 Traffic time prediction model training method, traffic time prediction method and equipment
CN113837148A (en) * 2021-11-04 2021-12-24 昆明理工大学 Pedestrian trajectory prediction method based on self-adjusting sparse graph transform
CN114117892A (en) * 2021-11-04 2022-03-01 中通服咨询设计研究院有限公司 Method for predicting road traffic flow under distributed system
CN114267084A (en) * 2021-12-17 2022-04-01 北京沃东天骏信息技术有限公司 Video identification method and device, electronic equipment and storage medium
CN114626598A (en) * 2022-03-08 2022-06-14 南京航空航天大学 Multi-modal trajectory prediction method based on semantic environment modeling
CN114638408A (en) * 2022-03-03 2022-06-17 南京航空航天大学 Pedestrian trajectory prediction method based on spatiotemporal information
CN114757975A (en) * 2022-04-29 2022-07-15 华南理工大学 Pedestrian trajectory prediction method based on transformer and graph convolution network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHE HUANG: ""Learning Sparse Interaction Graphs of Partially Detected Pedestrians for Trajectory Prediction"", 《IEEE ROBOTICS AND AUTOMATION LETTERS》, vol. 7, no. 2, 28 December 2021 (2021-12-28), pages 1198 - 1205 *
CHENG Xingcheng: "Research on Pedestrian Trajectory Prediction Based on Transformer and Graph Convolutional Networks", China Master's Theses Full-text Database, Information Science & Technology, no. 2023, 15 December 2023 (2023-12-15), pages 138 - 34 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392595A (en) * 2022-10-31 2022-11-25 北京科技大学 Time-space short-term wind speed prediction method and system based on graph convolution neural network and Transformer
CN115392595B (en) * 2022-10-31 2022-12-27 北京科技大学 Time-space short-term wind speed prediction method and system based on graph convolution neural network and Transformer
WO2024119489A1 (en) * 2022-12-09 2024-06-13 中国科学院深圳先进技术研究院 Pedestrian trajectory prediction method, system, device, and storage medium
CN115881286A (en) * 2023-02-21 2023-03-31 创意信息技术股份有限公司 Epidemic prevention management scheduling system
CN115881286B (en) * 2023-02-21 2023-06-16 创意信息技术股份有限公司 Epidemic prevention management scheduling system
CN115966313A (en) * 2023-03-09 2023-04-14 创意信息技术股份有限公司 Integrated management platform based on face recognition
CN115966313B (en) * 2023-03-09 2023-06-09 创意信息技术股份有限公司 Integrated management platform based on face recognition
CN117523821A (en) * 2023-10-09 2024-02-06 苏州大学 System and method for predicting vehicle multi-mode driving behavior track based on GAT-CS-LSTM
CN117493424A (en) * 2024-01-03 2024-02-02 湖南工程学院 Vehicle track prediction method independent of map information
CN117493424B (en) * 2024-01-03 2024-03-22 湖南工程学院 Vehicle track prediction method independent of map information
CN117933492A (en) * 2024-03-21 2024-04-26 中国人民解放军海军航空大学 Ship track long-term prediction method based on space-time feature fusion
CN117933492B (en) * 2024-03-21 2024-06-11 中国人民解放军海军航空大学 Ship track long-term prediction method based on space-time feature fusion

Also Published As

Publication number Publication date
CN114997067B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
CN114997067A (en) Trajectory prediction method based on space-time diagram and space-domain aggregation Transformer network
CN113887610B (en) Pollen image classification method based on cross-attention distillation transducer
CN106970615B (en) A kind of real-time online paths planning method of deeply study
Zhang et al. Generative adversarial network based heuristics for sampling-based path planning
CN110599521B (en) Method for generating trajectory prediction model of vulnerable road user and prediction method
CN114611663B (en) Customized pedestrian track prediction method based on online updating strategy
CN114613013A (en) End-to-end human behavior recognition method and model based on skeleton nodes
CN115829171B (en) Pedestrian track prediction method combining space-time information and social interaction characteristics
CN114117259A (en) Trajectory prediction method and device based on double attention mechanism
Zhao et al. Spatial-channel transformer network for trajectory prediction on the traffic scenes
Ye et al. GSAN: Graph self-attention network for learning spatial–temporal interaction representation in autonomous driving
Su et al. Pedestrian trajectory prediction via spatial interaction transformer network
CN116382267B (en) Robot dynamic obstacle avoidance method based on multi-mode pulse neural network
Liu et al. Multi-agent trajectory prediction with graph attention isomorphism neural network
CN116503446A (en) Multi-mode vehicle track prediction method for target driving and distribution thermodynamic diagram output
CN114626598A (en) Multi-modal trajectory prediction method based on semantic environment modeling
CN114580718B (en) Pedestrian track prediction method based on condition variation generation countermeasure network
Liu et al. Data augmentation technology driven by image style transfer in self-driving car based on end-to-end learning
CN115272712A (en) Pedestrian trajectory prediction method fusing moving target analysis
Chen et al. HGCN-GJS: Hierarchical graph convolutional network with groupwise joint sampling for trajectory prediction
CN117314956A (en) Interactive pedestrian track prediction method based on graphic neural network
CN117522920A (en) Pedestrian track prediction method based on improved space-time diagram attention network
Yuan et al. Steeringloss: A cost-sensitive loss function for the end-to-end steering estimation
CN115457657A (en) Method for identifying channel characteristic interaction time modeling behaviors based on BERT model
Zhou et al. Sa-sgan: A vehicle trajectory prediction model based on generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant