CN116504060B - Diffusion graph attention network traffic flow prediction method based on Transformer - Google Patents

Diffusion graph attention network traffic flow prediction method based on Transformer

Info

Publication number
CN116504060B
CN116504060B (application CN202310483068.9A)
Authority
CN
China
Prior art keywords
diffusion
representing
attention
traffic flow
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310483068.9A
Other languages
Chinese (zh)
Other versions
CN116504060A (en)
Inventor
张红
王红燕
巩蕾
张玺君
朱思雨
李扬
伊敏
魏骄云
杨俊译
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou University of Technology filed Critical Lanzhou University of Technology
Priority to CN202310483068.9A priority Critical patent/CN116504060B/en
Publication of CN116504060A publication Critical patent/CN116504060A/en
Application granted granted Critical
Publication of CN116504060B publication Critical patent/CN116504060B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • G08G1/0129Traffic data processing for creating historical data or processing based on historical data
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The traffic flow combination prediction method adopts a Transformer encoder-decoder architecture, in which both the encoder and the decoder comprise several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block). The ST-Conv Block captures the temporal and spatial dependence of traffic flow through a temporal gated convolutional network and a spatial convolutional network, respectively, while the DGA-Block adaptively learns the diffusion parameter of each diffusion step using a query-key-value self-attention mechanism and dynamically updates the adjacency transition matrix to capture the dynamic spatial dependence of traffic flow. In addition, the decoder adds an information auxiliary module to aggregate traffic flow information between the encoder and decoder.

Description

Diffusion graph attention network traffic flow prediction method based on Transformer
Technical Field
The invention relates to the technical field of intelligent transportation, and in particular to a traffic flow prediction technique based on a Transformer diffusion graph attention network (T-DGAN).
Background
Traffic flow prediction is an important component of Intelligent Transportation Systems (ITS) and can provide a scientific basis for the management and planning of urban traffic systems. Based on the predicted traffic state, traffic departments can deploy and guide traffic flow in advance, improving the operating efficiency of the road network and relieving traffic congestion.
Over the past several decades, researchers have conducted extensive research into traffic flow prediction methods, including the autoregressive integrated moving average (ARIMA), the Kalman filter (KF), and the multi-layer perceptron (MLP), among others. However, since these time-series methods rest on stationarity assumptions, they cannot handle complex nonlinear traffic flow data. Accordingly, in order to deal with complex traffic conditions and capture the nonlinear relationships of traffic flow, many machine learning methods have been employed for traffic flow prediction. For example, short-term traffic flow prediction has been performed with the K-nearest neighbor (KNN) method, which considers the spatial correlation of adjacent road segments. Bayesian network methods handle uncertain information and perform probabilistic reasoning for short-term traffic flow prediction. The support vector machine (SVM), as a machine learning method grounded in statistical learning theory, also performs short-term traffic flow prediction well. The long short-term memory network (LSTM) effectively captures the nonlinearity of traffic dynamics and overcomes the decay of back-propagated errors through its memory blocks. However, the above approaches perform poorly on long-term traffic flow prediction tasks due to the high nonlinearity and dynamic spatio-temporal dependence of traffic flow.
In recent years, with the widespread use of deep learning in the traffic field, researchers have used convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to capture the spatial and temporal dependencies of traffic flow, respectively. While this combination captures both kinds of dependence, CNNs are suited to Euclidean data on regular grids, and modeling an irregular road network with them may lose the topology of the traffic network. To address this problem, graph convolutional networks (GCNs) are used instead of CNNs to better handle the non-Euclidean data of traffic road networks. Although existing hybrid methods based on GCNs and RNNs greatly improve prediction performance, they still have drawbacks: because the GCN uses the graph Laplacian matrix to compute and update the feature information of all nodes in the graph, it has poor flexibility and scalability in capturing the spatial correlation of traffic flow.
In the above methods, the spatial structure of the road network is represented by a predefined adjacency matrix, which, given the complexity and dynamics of the road network, limits the ability to learn the dynamic spatio-temporal characteristics of traffic flow. In response to this problem, researchers have proposed gated attention networks that learn the dynamic spatial correlation directly from traffic flow based on the graph attention mechanism, and the graph multi-attention network (GMAN) for traffic flow prediction, which uses spatio-temporal attention mechanisms to capture the dynamic spatio-temporal correlation of traffic flow. At the same time, the Transformer, a deep learning method that models sequences with an encoder-decoder structure and learns dynamic features in the data with a multi-head attention mechanism, helps solve the difficulty of capturing dynamic spatio-temporal correlations that arises from using predefined adjacency matrices.
Disclosure of Invention
The invention aims to better capture the complex spatio-temporal correlation of traffic flow, and provides a Transformer-based diffusion graph attention network (T-DGAN) traffic flow prediction method.
The invention relates to a Transformer-based diffusion graph attention network traffic flow prediction method, characterized in that the T-DGAN adopts a Transformer encoder-decoder architecture, in which both the encoder and the decoder comprise several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block). The ST-Conv Block captures the temporal and spatial dependence of traffic flow through a temporal gated convolutional network and a spatial convolutional network, respectively; the DGA-Block adaptively learns the diffusion parameter of each diffusion step using a query-key-value self-attention mechanism and dynamically updates the adjacency transition matrix to capture the dynamic spatial dependence of traffic flow; and the decoder adds an information auxiliary module to aggregate traffic flow information between the encoder and decoder.
The invention has the following advantages:
1. The invention provides a Transformer-based diffusion graph attention network traffic flow prediction method (T-DGAN). The method adopts an encoder-decoder architecture in which the codec stacks several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block), and road network information is described through a dynamic graph. The decoder adds an information auxiliary module (Auxiliary Block) on top of the encoder to aggregate traffic flow information between the encoder and the decoder.
2. The present invention uses a spatio-temporal convolutional network module (ST-Conv Block) to learn the spatio-temporal correlation of traffic flow: a temporal gated convolutional layer captures the temporal dependence of traffic flow, and a spatial convolutional layer captures its spatial dependence.
3. The invention uses a diffusion graph attention (DGA-Block) method to model the dynamic spatial correlation of traffic flow. The method uses a query-key-value self-attention mechanism to adaptively learn the diffusion parameters of each diffusion step and dynamically update the adjacency transition matrix to reflect the dynamically changing spatial characteristics of traffic flow.
4. Extensive comparison experiments were carried out on two traffic datasets, and the results show that, compared with baseline methods, the proposed method achieves more accurate predictions on both datasets.
Drawings
Fig. 1 shows the architecture of the T-DGAN method; Fig. 2 shows the temporal convolutional network; Fig. 3 shows T-DGAN predictions on PeMS, node=11; Fig. 4 shows T-DGAN predictions on PeMS, node=190; Fig. 5 shows T-DGAN predictions on METR-LA, node=119; Fig. 6 shows T-DGAN predictions on METR-LA, node=176; Fig. 7 shows the adjacency matrices T_e, T_d (step 0) on the PeMS dataset; and Fig. 8 shows the adjacency matrices T_e, T_d (step 5) on the METR-LA dataset.
Description of the embodiments
The present invention will be described in further detail with reference to examples.
1 Method
The invention provides a Transformer-based diffusion graph attention network traffic flow prediction method (T-DGAN), in which each encoder layer consists of a spatio-temporal convolutional network module (ST-Conv Block) and a diffusion graph attention module (DGA-Block), and each decoder layer consists of an ST-Conv Block, a DGA-Block, and an information auxiliary module (Auxiliary Block). The encoder and decoder have L-1 and L'-1 layers, respectively. Given the input X^{t-T'+1,...,t} and the adjacency matrix A of the T-DGAN method, they are first converted into feature matrices and transition matrices:

H_e^(0) = X W_e + b_e,  H_d^(0) = X W_d + b_d,  T_e^(1) = T_d^(1) = D^{-1}(A + I_N)

where D represents the degree matrix of A with self-loops, i.e. D_ii = Σ_j (A + I_N)_ij; W_e and W_d represent the weight matrices of the encoder and decoder applied to X^{t-T+1,...,t}; b_e and b_d represent the biases of the encoder and decoder, respectively; and T_e^(1) and T_d^(1) represent the adjacency transition matrices of the encoder and decoder, respectively. The traffic flow prediction result is computed by Ŷ = H_d^(L') W_fc + b_fc, where W_fc represents the transformation matrix of the fully connected layer and b_fc represents the corresponding bias. The outputs H_e^(L) and T_e^(L) of the last encoder layer are input to the diffusion attention module of each decoder layer to aggregate traffic flow temporal and spatial characteristic information between the encoder and decoder.
2 Problem definition
In the present invention, the road network is represented as a graph G = (V, E, A), where V represents the set of N road network nodes, E represents the set of edges, and A ∈ R^{N×N} represents a weighted adjacency matrix; A_ij is 1 if v_i, v_j ∈ V and (v_i, v_j) ∈ E, and 0 otherwise. At each time step t, the traffic flow on graph G is X_t ∈ R^{N×C}, where C represents the number of features per node. The traffic flow prediction problem aims at learning a function f that takes X^{t-T+1,...,t} as input and predicts the traffic flow for T time steps in the future, with the mapping:

[X_{t-T+1}, ..., X_t] --f--> [X_{t+1}, ..., X_{t+T}]    (1)
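The graph construction above can be sketched in a few lines of NumPy (a minimal illustration; the function name and toy edge list are our own, and real road networks derive edge weights from network data rather than from a hand-written list):

```python
import numpy as np

def build_adjacency(num_nodes, edges):
    """Build the weighted adjacency matrix A of G = (V, E, A).

    `edges` is a list of (i, j, w) tuples; per the definition above,
    an unweighted road network simply uses w = 1 for every edge.
    """
    A = np.zeros((num_nodes, num_nodes))
    for i, j, w in edges:
        A[i, j] = w  # directed edge v_i -> v_j
    return A

# A toy 3-node road network: 0 -> 1 and 1 -> 2
A = build_adjacency(3, [(0, 1, 1.0), (1, 2, 1.0)])
```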
2.1 convolutional encoder for extracting spatio-temporal features
The encoder extracts spatio-temporal features from historical traffic flow data and consists of a spatio-temporal convolutional module (ST-Conv Block) and a diffusion graph attention module (DGA-Block). Specifically, each ST-Conv Block comprises a temporal gated convolutional layer and a spatial convolutional layer, which capture the temporal and spatial features of traffic flow. The DGA-Block learns the diffusion parameters of each diffusion step using query-key-value attention and dynamically updates the adjacency transition matrix to reflect the dynamic spatial characteristics of traffic flow.
(1) Time-gated convolutional layer
The temporal gated convolutional layer comprises a one-dimensional convolution with a gated linear unit (GLU) to capture the temporal dependence of traffic flow. For each node in the traffic network G, the temporal convolution explores adjacent time steps of the input with zero padding so that the size of the time dimension remains unchanged. Given the temporal convolution input x ∈ R^{P×D_in} of each node, a sequence of length P with D_in features, a 1D convolution kernel Γ with kernel size (K_t, 1), input size D_in, and output size 2D_out gives the output [P Q] ∈ R^{P×2D_out}, where P, Q are split along the feature dimension and input to the GLU. Thus, the temporal gated convolutional layer can be expressed as:

Γ * x = P ⊙ σ(Q)    (2)
where P, Q are the inputs of the gates in the GLU, ⊙ denotes the element-wise Hadamard product, and σ(Q), using the Sigmoid function as the activation function, selectively passes the hidden states and information of the input X.
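The gated temporal convolution can be sketched as follows (an illustrative NumPy reading of the GLU gate P ⊙ σ(Q), assuming 'same' zero padding and a single node; the shapes and names are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_gated_conv(x, kernel):
    """Gated temporal convolution P ⊙ σ(Q).

    x:      (P, D_in) input sequence of one node
    kernel: (K_t, D_in, 2*D_out) 1D convolution kernel Γ
    Zero padding keeps the time dimension length unchanged.
    """
    P_len, D_in = x.shape
    K_t, _, two_dout = kernel.shape
    D_out = two_dout // 2
    pad = K_t // 2
    xp = np.pad(x, ((pad, K_t - 1 - pad), (0, 0)))
    out = np.zeros((P_len, two_dout))
    for t in range(P_len):
        window = xp[t:t + K_t]                 # (K_t, D_in)
        out[t] = np.einsum("kd,kdo->o", window, kernel)
    P_part, Q_part = out[:, :D_out], out[:, D_out:]
    return P_part * sigmoid(Q_part)            # GLU gate

x = np.random.randn(12, 2)                     # P = 12 steps, D_in = 2
Gamma = np.random.randn(3, 2, 8)               # K_t = 3, 2*D_out = 8
y = temporal_gated_conv(x, Gamma)              # shape (12, 4)
```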
(2) Graph rolling network
The graph convolution operation aggregates the features of neighboring nodes to the central node based on the graph structure to update node features. The graph convolutional network (GCN) simplifies ChebNet with a first-order approximation:

Z = Â X W    (3)

where Â = D̃^{-1/2}(A + I_N) D̃^{-1/2} represents the normalized adjacency matrix with self-loops, X ∈ R^{N×D_in} represents the input graph signal of N nodes with D_in features, Z represents the output, and W represents a learnable parameter matrix. The basic GCN applies only to undirected graphs, which does not conform to the directed nature of the traffic network. To facilitate convolution on the directed graph, the diffusion convolution can be generalized to the form of equation (4):

Z = Σ_{k=0}^{K} M^k X W_k    (4)
where M^k represents the k-th power of the transition matrix and K represents the number of diffusion steps. In the directed graph, the diffusion process is divided into forward and backward directions, with forward transition matrix M_f = A / rowsum(A) and backward transition matrix M_b = A^T / rowsum(A^T).
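The transition matrices and the K-step diffusion convolution of equation (4) can be sketched as follows (a minimal NumPy illustration with random stand-ins for the learned parameter matrices W_k):

```python
import numpy as np

def transition_matrices(A):
    """Forward and backward transition matrices of a directed graph.

    M_f = A / rowsum(A), M_b = A^T / rowsum(A^T); each row sums to 1
    wherever a node has outgoing (resp. incoming) edges.
    """
    def row_normalize(M):
        d = M.sum(axis=1, keepdims=True)
        d[d == 0] = 1.0                   # isolated nodes: avoid 0-division
        return M / d
    return row_normalize(A), row_normalize(A.T)

def diffusion_conv(X, M, weights):
    """K-step diffusion convolution Z = sum_k M^k X W_k (equation (4))."""
    N = X.shape[0]
    Z = np.zeros((N, weights[0].shape[1]))
    Mk = np.eye(N)                        # M^0
    for W_k in weights:
        Z += Mk @ X @ W_k
        Mk = Mk @ M                       # next power of the transition matrix
    return Z

A = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [1., 0., 0.]])
M_f, M_b = transition_matrices(A)
X = np.random.randn(3, 2)
Ws = [np.random.randn(2, 4) for _ in range(3)]  # K = 2 -> W_0, W_1, W_2
Z = diffusion_conv(X, M_f, Ws)            # shape (3, 4)
```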
(3) Spatial convolution layer
The present invention proposes a spatial convolutional layer to capture the local and global spatial dependencies of traffic flow. A K-step diffusion convolution of the form of equation (4) is performed in both the forward and backward directions with a predefined weighted adjacency matrix to capture K-order local spatial dependencies. Formally, a spatial convolution operation is performed for each time slot X_t of the input tensor of the spatial convolutional layer; the calculation can be expressed as:

Z_t = Σ_{k=0}^{K} ( M_f^k X_t W_{k,f} + M_b^k X_t W_{k,b} )    (5)

where W_{k,f} and W_{k,b} represent the learnable parameter matrices convolved with the adjacency matrix A in the forward and backward directions.
2.2 Diffusion graph attention network encoder
Taking layer l as an example, given the inputs H^{(l-1)} and T^{(l)}, the output feature matrix H^{(l)} is as follows:

H^{(l)} = ResConn( H^{(l-1)}, MDA(H^{(l-1)}, T^{(l)}) W_m ) W_h    (6)

where ResConn(·) represents the residual connection, MDA(·) represents multi-head diffusion attention, W_m represents a learnable weight matrix, and W_h represents a linear transformation matrix. Given that MDA(·) has H heads:

MDA(H^{(l-1)}, T^{(l)}) = ||_{h=1}^{H} DA_h(H^{(l-1)}, T^{(l)})    (7)

where DA_h(·) represents single-head diffusion attention and || represents the concatenation operation.
Here k represents the diffusion step, K represents the maximum diffusion order, and single-head diffusion attention is calculated from equation (8):

DA(H, T) = Σ_{k=0}^{K} θ_k T^k H    (8)

where θ_k represents the diffusion weight coefficient of the corresponding diffusion step T^k. The invention uses Query-Key-Value attention to obtain the appropriate θ_k as follows:

θ_k = Σ_i ( exp(e_ki) / Σ_j exp(e_kj) ) ( S_i W_V )    (9)
where W_V represents the transformation matrix of Value; view represents the reshape operation of a matrix, i.e., given an original matrix of shape R^{N×N}, the output is a single-row vector of dimension R^{1×N²}; and S_i = view(T^(i)) represents an input of the Query-Key-Value attention. e_ik represents the attention score between two different diffusion steps i and k, and e_ij represents the attention score between two different diffusion steps i and j. e_ij is calculated by equation (10):

e_ij = ( S_i W_Q )( S_j W_K )^T / sqrt(d_qs)    (10)

where d_qs represents the size of the Query, W_Q and W_K represent the transformation matrices of Query and Key, respectively, and S_i and S_j are the inputs of Query-Key-Value attention at diffusion steps i and j.
The output adjacency transition matrix T^{(l+1)} is computed through a residual connection:

T^{(l+1)} = ResConn( T^{(l)}, T_u^{(l)} )    (11)

where ResConn(·) represents the residual connection and T_u^{(l)} represents the dynamically updated portion of the adjacency transition matrix, calculated as follows:

T_u[i,j] = (1/M) Σ_{m=1}^{M} softmax_j( ê_ij^m )    (12)

where m ∈ [1, M] represents the replica index, M represents the number of replicas, T_u[i,j] represents the element in row i, column j of T_u, and ê_ij^m represents the attention score of the m-th replica, calculated as follows:

ê_ij^m = LeakyReLU( a_m( [h_i || h_j] ) )    (13)

where LeakyReLU denotes the activation function, a_m(·) denotes the learnable weight vector of the m-th replica, and h_i and h_j, rows i and j of the feature matrix H, represent the feature vectors of nodes i and j, respectively.
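The multi-replica attention update described above can be sketched as follows (a minimal NumPy illustration in the style of graph attention; the array shapes and names are our own, and the weight vectors are random stand-ins for learned parameters):

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_adjacency_update(H, a, M):
    """Average of M attention replicas over node pairs.

    H: (N, D) node feature matrix; a: (M, 2*D) learnable weight vectors.
    Each replica scores pairs with LeakyReLU(a_m [h_i || h_j]), the scores
    are softmax-normalized row-wise, and the replicas are averaged.
    """
    N, D = H.shape
    T_u = np.zeros((N, N))
    for m in range(M):
        s_src = H @ a[m, :D]              # a_m's response to h_i
        s_dst = H @ a[m, D:]              # a_m's response to h_j
        e = leaky_relu(s_src[:, None] + s_dst[None, :])
        T_u += softmax_rows(e)
    return T_u / M

rng = np.random.default_rng(1)
H = rng.standard_normal((5, 3))           # 5 nodes, 3 features
a = rng.standard_normal((2, 6))           # M = 2 replicas
T_u = dynamic_adjacency_update(H, a, M=2)
```

Because each replica's rows are softmax-normalized, the averaged matrix stays row-stochastic, which is what lets it serve as an (updated) transition matrix.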
2.3 Space-time decoder for traffic flow prediction
The decoder receives the spatio-temporal features extracted by the encoder to generate the future traffic flow sequence. A single decoder layer consists of a spatio-temporal convolutional module (ST-Conv Block), a diffusion graph attention module (DGA-Block), and an auxiliary module (Auxiliary Block) that aggregates information between the encoder and decoder. The inputs of the l-th decoder layer are H_d^{(l-1)} and T_d^{(l)}, and the output of its DGA-Block module is as follows:

H_dga^{(l)} = ResConn( H_d^{(l-1)}, MDA(H_d^{(l-1)}, T_d^{(l)}) W_m' ) W_h'    (14)

where MDA(·) represents multi-head diffusion attention, computed as in equation (7); W_m' represents a learnable weight matrix and W_h' represents a linear transformation matrix. T_d^{(l+1)} and T_{u,d}^{(l)} are computed as in equations (11) and (12). H_dga^{(l)} and T_d^{(l+1)}, together with the encoder outputs H_e^{(L)} and T_e^{(L)}, are input into the auxiliary module (Auxiliary Block) to aggregate traffic flow information between the encoder and decoder.
Then, the output of the l-th decoder layer is as follows:

H_d^{(l)} = ResConn( H_dga^{(l)}, ADA(H_dga^{(l)}, H_e^{(L)}, T_e^{(L)}) W_a ) W_b    (15)

where the auxiliary diffusion attention ADA(·) is computed similarly to equation (7), its diffusion parameters follow equation (9), and its attention scores follow equation (10). Denoting by θ'_k and e'_ij the diffusion parameters and attention scores of ADA(·), θ'_k is calculated as follows:

θ'_k = Σ_i ( exp(e'_ki) / Σ_j exp(e'_kj) ) ( S'_i W_V' )    (18)

where W_V' represents the transformation matrix of Value and S'_i represents the input sequence. e'_ij is calculated from equation (19):

e'_ij = ( S'_i W_Q' )( S'_j W_K' )^T / sqrt(d_qs)    (19)

where d_qs represents the size of the Query, W_Q' and W_K' represent the transformation matrices of Query and Key, respectively, and S'_i and S'_j are the inputs of Query-Key-Value attention at diffusion steps i and j.
3. Experiment
3.1 Data description
The present invention uses two traffic datasets, PeMS and METR-LA, to verify the performance of the proposed T-DGAN method. The experimental datasets contain different attributes; detailed information is shown in Table 1:
TABLE 1 description of experimental data sets
The PeMS03 data are collected every 30 seconds by the Caltrans Performance Measurement System (PeMS), which records the spatial location of the traffic flow monitoring sensors. The number of sensors in PeMS is 555. The collection period runs from January 1, 2018 to January 31, 2018, and traffic speeds are aggregated every 5 minutes.
The METR-LA dataset was derived from loop detectors on the Los Angeles highway network, with a time span beginning March 1, 2012; historical traffic speeds collected by 207 sensors were selected and aggregated every 5 minutes.
3.2 Experimental setup
The experiments were compiled and executed on a Windows server (CPU: Intel(R) Core(TM) @ 1.50 GHz, 16 GB RAM; GPU: NVIDIA GeForce RTX 2080 Ti), based on the PyTorch deep learning framework, with the T-DGAN method built and trained in PyCharm.
The invention divides the dataset into training, validation, and test sets at a 60%:10%:30% ratio. The batch size was set to 8, the number of heads for diffusion attention and graph attention in the DGA-Block was set to 8, the dimension of node embedding was set to 16, the maximum diffusion step was set to 3, the training epochs for the PeMS03 and METR-LA datasets were set to 60 and 80, respectively, the historical and predicted data lengths were both set to 12, and the method was trained with the Adam optimizer at an initial learning rate of 0.001.
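The chronological split can be sketched as follows (a minimal illustration of the 60%:10%:30% ratio; the function name is our own, and real pipelines split node-by-time tensors rather than a 1D array):

```python
import numpy as np

def chronological_split(series, ratios=(0.6, 0.1, 0.3)):
    """Split a traffic time series into train/val/test sets in time order."""
    n = len(series)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

data = np.arange(100)                     # stand-in for 100 time steps
train, val, test = chronological_split(data)
```

Splitting chronologically (rather than shuffling) matters for traffic data: the test period must lie strictly after the training period to avoid leaking future information.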
3.3 Evaluation index and baseline method
(1) Evaluation index
To better evaluate the predictive performance of the method, the invention uses mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) as evaluation indices for the T-DGAN method:

1) Mean absolute error (MAE):

MAE = (1/n) Σ_{i=1}^{n} | y_i − ŷ_i |

2) Root mean square error (RMSE):

RMSE = sqrt( (1/n) Σ_{i=1}^{n} ( y_i − ŷ_i )² )

3) Mean absolute percentage error (MAPE):

MAPE = (100%/n) Σ_{i=1}^{n} | (y_i − ŷ_i) / y_i |

where y_i and ŷ_i represent the actual and predicted traffic speeds, respectively, and n represents the number of nodes on the traffic road network.
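The three indices can be computed directly (toy speed values are our own):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error, as a percentage."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y = np.array([60.0, 55.0, 50.0])          # toy ground-truth speeds
y_hat = np.array([58.0, 56.0, 48.0])      # toy predictions
```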
(2) Baseline method
The invention is mainly compared with a deep learning method and other baseline methods, wherein the baseline methods are as follows:
1) Historical average (HA): uses the average traffic information of historical periods as the prediction.
2) Vector autoregression (VAR): models n variables over the same sample period as linear functions of their historical values.
3) Support vector regression (SVR): uses a linear support vector machine to learn the relationship between input and output to predict traffic flow.
4) Feedforward neural network (FNN): a feedforward neural network with two hidden layers and L2 regularization.
5) Autoregressive integrated moving average (ARIMA): an autoregressive integrated moving average method with a Kalman filter.
6) Long short-term memory network (FC-LSTM): a recurrent neural network with fully connected LSTM hidden units.
7) Diffusion convolutional recurrent neural network (DCRNN): combines diffusion convolution with a recurrent neural network for traffic flow prediction.
8) Graph WaveNet (G-WN): combines a graph convolutional network with a dilated causal convolutional network.
9) Spatio-temporal graph convolutional network (STGCN): a spatio-temporal graph convolutional network combining graph convolution and one-dimensional convolution.
10) Attention-based spatio-temporal graph convolutional network (ASTGCN): further integrates a spatio-temporal attention mechanism into the spatio-temporal graph convolutional network to capture the dynamic spatio-temporal patterns of traffic flow.
11) Adaptive graph convolutional recurrent network (AGCRN): improves the conventional graph convolutional network with node adaptive parameter learning and data adaptive graph generation modules, which learn node-specific patterns and capture spatial correlations, respectively.
12) Graph multi-attention network (GMAN): integrates multiple spatio-temporal attention blocks into an encoder-decoder architecture, with transform attention between the encoder and decoder.
3.4 Experimental results and analysis
The present invention visualizes predictions on the PeMS and METR-LA datasets. The time range was set to 288 steps; nodes 11 and 190 were randomly selected for visualization on the PeMS dataset, with the results shown in Figs. 3 and 4, and nodes 119 and 176 were randomly selected on the METR-LA dataset, as shown in Figs. 5 and 6. It can be seen that the predictions of the T-DGAN method closely follow the true traffic speed.
Experiments with the T-DGAN method and the various baseline methods were carried out on the PeMS and METR-LA datasets; the 15-minute, 30-minute, and 60-minute prediction results are shown in Tables 2 and 3. The results show that the proposed T-DGAN method achieves good prediction results on both datasets.
It can be observed from Tables 2 and 3 that the predictions of conventional time-series analysis methods are not ideal, indicating that these methods have limited capability in modeling the nonlinearity and high complexity of traffic flow. Meanwhile, deep-learning-based methods obtain better prediction results than conventional time-series analysis. For example, the DCRNN, STGCN, and ASTGCN methods and the proposed T-DGAN method all consider spatio-temporal correlations and outperform conventional time-series methods such as ARIMA and FC-LSTM. In addition, GMAN performs better than G-WN, STGCN, ASTGCN, and others, indicating that the encoder-decoder architecture used in GMAN can effectively capture the dynamic spatio-temporal correlation of traffic flow.
In contrast, the proposed T-DGAN method obtains better prediction results than the baseline methods, demonstrating its effectiveness in capturing the spatio-temporal correlation of traffic flow. Furthermore, T-DGAN captures this correlation through an encoder-decoder architecture and models the direct relationship between historical and future time steps by combining the spatio-temporal convolutional network and the diffusion graph attention mechanism, which helps alleviate error propagation between predicted time steps.
TABLE 2 comparison of predicted Performance on PeMS data set
TABLE 3 comparison of predicted Performance on METR-LA datasets
In order to evaluate the performance of different modules in the T-DGAN method provided by the invention, an ablation experiment is performed.
(1) Influence of dynamic diagram on prediction result
The invention carries out ablation experiments with dynamic and static graphs on the PeMS and METR-LA datasets to study their influence on traffic flow prediction. As the ablation results in Table 4 show, the dynamic graph yields better prediction performance for traffic flow than the static graph.
TABLE 4 Experimental results for dynamic and static graph settings
(2) Influence of space-time convolution (ST-Conv Block) on prediction results
To study the performance of the different modules in the T-DGAN method, a variant of the method (NST-Conv Block: without the spatio-temporal convolutional network module) was designed to verify the effect of the spatio-temporal convolutional module on prediction performance. Traffic flow predictions of the NST-Conv Block variant and the T-DGAN method at 15, 30, and 60 minutes on the PeMS and METR-LA datasets are shown in Table 5.
TABLE 5 Comparison of the prediction results of the T-DGAN method and the variant method
At 15 minutes, compared with the NST-Conv Block variant, the T-DGAN method reduces MAE on the PeMS and METR-LA datasets by about 6.67% and 1.52%, and RMSE by about 3.47% and 2.02%, respectively. At 30 minutes, MAE is reduced by about 7.16% and 2.01%, and RMSE by about 3.91% and 0.94%. At 60 minutes, MAE is reduced by about 11.56% and 2.04%, and RMSE by about 6.93% and 1.08%. As Table 5 shows, the T-DGAN method gives better prediction performance at every prediction horizon; in long-term prediction especially, the gap between the T-DGAN method and the NST-Conv Block variant widens, demonstrating that the ST-Conv Block module effectively mitigates the influence of error propagation.
(3) Influence of dynamic adjacency matrix on prediction result
The adjacency transition matrix contains edge weight information between vertices, and the edge weights reflect the traffic flow between traffic sensors, so a dynamically updated adjacency matrix reflects the dynamically changing traffic flow on a road segment. Experiments confirm that the adjacency transition matrix is dynamically updated during learning. The results on the PeMS and METR-LA datasets are shown in Figs. 7 and 8, respectively: the last-batch matrices T_e, T_d differ at randomly selected time nodes, demonstrating that T_e and T_d change continuously during learning.
While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (6)

1. A traffic flow combined prediction method of a diffusion graph attention network based on a Transformer, adopting a Transformer encoder-decoder architecture, wherein the encoder and the decoder each comprise a plurality of space-time convolution network modules and a diffusion graph attention module; the space-time convolution network module captures the temporal dependence and the spatial dependence of traffic flow through a time-gated convolution network and a spatial convolution network, respectively; the diffusion graph attention module adaptively learns the diffusion parameter of each diffusion step by using a Query-Key-Value self-attention mechanism, and dynamically updates the adjacency transition matrix so as to capture the dynamic spatio-temporal dependence of traffic flow; an information auxiliary module is added to the decoder to aggregate traffic flow information between the encoder and the decoder, and finally a prediction sequence is output through the decoder to perform prediction;
wherein the encoder and the decoder each have L layers; given the input X_{t-T'+1,...,t} of the traffic flow prediction method and the adjacency matrix A, they are first converted into the feature matrices X_e^(1) = X_{t-T'+1,...,t} W_e + b_e and X_d^(1) = X_{t-T'+1,...,t} W_d + b_d, respectively, and the transition matrix Ã = D^(-1)(A + I_N), wherein D represents the degree matrix of A with self-loops; W_e and W_d are the weight matrices of the encoder and decoder, respectively, applied to X_{t-T'+1,...,t}, and b_e and b_d represent the biases of the encoder and decoder, respectively; T_e^(1) and T_d^(1) represent the adjacency transition matrices of the encoder and decoder, respectively; the result of the traffic flow prediction is calculated by X̂ = X_d^(L+1) W_fc + b_fc, wherein W_fc represents the transformation matrix of the fully connected layer and b_fc represents the corresponding bias; the outputs X_e^(L+1) and T_e^(L+1) of the last encoder layer are input to the diffusion attention module of each decoder layer to aggregate the spatio-temporal traffic flow feature information between the encoder and the decoder;
representing the road network as a graph G = (V, E, A), where V represents the set of N road network nodes, E represents the set of edges, and A ∈ R^(N×N) represents the weighted adjacency matrix, with A_ij = 1 if v_i, v_j ∈ V and (v_i, v_j) ∈ E, and 0 otherwise; at each time step t, the traffic flow on graph G is given as X_t ∈ R^(N×C), where C represents the number of features of each node;
the learning function f of the traffic flow prediction method takes X_{t-T'+1,...,t} as input and predicts the traffic flow of T future time steps; the mapping relation is as follows:
[X_{t+1}, ..., X_{t+T}] = f(X_{t-T'+1}, ..., X_t; G)    (1)
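As an illustration of the shapes involved in the mapping f of claim 1 (not the patented model itself), the following sketch uses a hypothetical linear readout `predict_stub` to map T' = 12 observed steps over N sensors with C features to T = 3 future steps; all names and the choice of N = 207 (the METR-LA sensor count) are illustrative assumptions:

```python
import numpy as np

def predict_stub(X_hist, W, b):
    """Hypothetical linear readout standing in for the learned mapping f:
    (T_in, N, C) observed flow -> (T_out, N, C) forecast."""
    T_in, N, C = X_hist.shape
    T_out = W.shape[1]
    flat = X_hist.transpose(1, 2, 0).reshape(N * C, T_in)  # one series per node/feature
    out = flat @ W + b                                     # (N*C, T_out)
    return out.reshape(N, C, T_out).transpose(2, 0, 1)     # back to (T_out, N, C)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 207, 2))     # T' = 12 past steps, N = 207 sensors, C = 2
W = rng.normal(size=(12, 3))          # forecast T = 3 future steps
y = predict_stub(X, W, rng.normal(size=3))
print(y.shape)                        # (3, 207, 2)
```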
2. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized in that the time-gated convolution layer comprises a one-dimensional convolution and uses a gated linear unit to capture the temporal dependence of traffic flow; for each node in the traffic network G, the temporal convolution explores the adjacent time steps of the input element with zero padding so that the time dimension size remains unchanged; the time convolution input of each node, x ∈ R^(P×D_in), is a sequence of length P with D_in features; using a 1D convolution kernel Γ ∈ R^(K_t×D_in×2D_out) with kernel size (K_t, 1), input size D_in and output size 2D_out gives the output [P Q] ∈ R^(P×2D_out), where P and Q are divided into two parts along the feature dimension and input into the gated linear unit; the time-gated convolution layer can be expressed as:
Γ * x = P ⊙ σ(Q) ∈ R^(P×D_out)    (2)
wherein P and Q are respectively the inputs of the gates in the gated linear unit, ⊙ represents the element-wise Hadamard product, and σ(Q) uses a Sigmoid function as the activation function to selectively retain information from the hidden state and the input x.
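A minimal NumPy sketch of the time-gated convolution of claim 2, assuming zero padding that preserves the sequence length and the GLU gate Γ * x = P ⊙ σ(Q); `time_gated_conv` and its shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def time_gated_conv(x, kernel):
    """1D convolution with zero padding (time length preserved) followed by a
    gated linear unit: Gamma * x = P ⊙ σ(Q).
    x: (P, D_in); kernel: (K_t, D_in, 2*D_out)."""
    P_len, D_in = x.shape
    K_t = kernel.shape[0]
    D_out = kernel.shape[2] // 2
    left = (K_t - 1) // 2                              # zero padding so the
    xp = np.pad(x, ((left, K_t - 1 - left), (0, 0)))   # time dimension stays P
    out = np.empty((P_len, 2 * D_out))
    for t in range(P_len):
        out[t] = np.einsum('kd,kdo->o', xp[t:t + K_t], kernel)
    P, Q = out[:, :D_out], out[:, D_out:]              # split along feature dim
    return P * sigmoid(Q)                              # the gate selects information

x = np.random.default_rng(1).normal(size=(12, 4))      # length 12, 4 features
y = time_gated_conv(x, np.random.default_rng(2).normal(size=(3, 4, 16)))
print(y.shape)                                         # (12, 8)
```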
3. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized in that the graph convolution operation aggregates the features of neighboring nodes to the center node based on the graph structure to update the node features, the graph convolution network being a simplified ChebNet under a first-order approximation:
Z = Ã X W    (3)
wherein Ã represents the normalized adjacency matrix with self-loops, X ∈ R^(N×D_in) represents the input graph signal of N nodes with D_in features, Z ∈ R^(N×D_out) represents the output, and W ∈ R^(D_in×D_out) represents the learnable parameter matrix; the basic graph convolution network is only applicable to undirected graphs and does not conform to the directed nature of the traffic network; to enable convolution on the directed graph, the diffusion convolution is generalized to the form of equation (4):
Z = Σ_{k=0}^{K} (M_f^k X W_{k,f} + M_b^k X W_{k,b})    (4)
wherein M^k represents the k-th power of the transition matrix and K represents the number of diffusion steps; in the directed graph, the diffusion process is divided into forward and backward directions, where the forward transition matrix is M_f = A / rowsum(A) and the backward transition matrix is M_b = A^T / rowsum(A^T).
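The forward and backward transition matrices defined at the end of claim 3 can be computed directly; the sketch below assumes nonzero row sums (every node has at least one outgoing and one incoming edge):

```python
import numpy as np

def transition_matrices(A):
    """Forward and backward transition matrices of the diffusion process:
    M_f = A / rowsum(A), M_b = A^T / rowsum(A^T).
    Assumes nonzero row sums in both A and A^T."""
    M_f = A / A.sum(axis=1, keepdims=True)
    At = A.T
    M_b = At / At.sum(axis=1, keepdims=True)
    return M_f, M_b

A = np.array([[0., 2., 1.],
              [1., 0., 0.],
              [3., 1., 0.]])
M_f, M_b = transition_matrices(A)
print(M_f.sum(axis=1))   # each row sums to 1, as required of a transition matrix
```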
4. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized in that the spatial convolution layer captures local and global spatial dependencies of traffic flow; a K-step diffusion convolution is performed in both the forward and backward directions using a predefined weighted adjacency matrix to capture K-order local spatial dependencies, corresponding to equation (4); formally, given a spatial convolution layer input H ∈ R^(T×N×D), a spatial convolution operation is performed for each time slot H_t of the input tensor, and the calculation process can be expressed as:
H_t' = Σ_{k=0}^{K} (M_f^k H_t W_{k,f} + M_b^k H_t W_{k,b})    (5)
wherein W represents a learnable parameter matrix convolved with the adjacency matrix A.
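Combining the bidirectional diffusion of equation (4) with the slot-by-slot application described for the spatial convolution layer, a K-step diffusion convolution over a (T, N, D) tensor might be sketched as follows (illustrative only; `diffusion_conv` and the weight layout are assumptions):

```python
import numpy as np

def diffusion_conv(H, M_f, M_b, W_f, W_b):
    """K-step bidirectional diffusion convolution over every time slot of H.
    H: (T, N, D_in); W_f, W_b: (K+1, D_in, D_out)."""
    T, N, _ = H.shape
    K = W_f.shape[0] - 1
    out = np.zeros((T, N, W_f.shape[2]))
    Pf, Pb = np.eye(N), np.eye(N)        # M^0 = I
    for k in range(K + 1):
        out += np.einsum('ij,tjd,do->tio', Pf, H, W_f[k])  # forward term
        out += np.einsum('ij,tjd,do->tio', Pb, H, W_b[k])  # backward term
        Pf, Pb = Pf @ M_f, Pb @ M_b      # next power of each transition matrix
    return out

rng = np.random.default_rng(3)
A = rng.random((5, 5)) + 0.1             # positive weights -> nonzero row sums
M_f = A / A.sum(axis=1, keepdims=True)
M_b = A.T / A.T.sum(axis=1, keepdims=True)
H = rng.normal(size=(12, 5, 4))          # T = 12 slots, N = 5 nodes, D_in = 4
Z = diffusion_conv(H, M_f, M_b, rng.normal(size=(3, 4, 8)), rng.normal(size=(3, 4, 8)))
print(Z.shape)                           # (12, 5, 8)
```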
5. The traffic flow prediction method using the Transformer-based diffusion graph attention network according to claim 1, characterized in that, taking layer l of the encoder as an example, given the inputs X_e^(l) and T_e^(l), the output feature matrix X_e^(l+1) of the diffusion graph attention module is as follows:
X_e^(l+1) = ResConn(X_e^(l) W, MDA(X_e^(l), T_e^(l)) W_O)    (6)
wherein ResConn(·) represents the residual connection, MDA(·) represents multi-head diffusion attention, W represents a learnable weight matrix, and W_O represents a linear transformation matrix; given that the number of heads of MDA(·) is H, then:
MDA(X_e^(l), T_e^(l)) = ||_{h=1}^{H} DA_h(X_e^(l), T_e^(l))    (7)
wherein DA_h represents a single-head diffusion attention and || represents the concatenation operation; k represents the diffusion step and K represents the maximum diffusion order; DA is calculated by equation (8):
DA(X_e^(l), T_e^(l)) = Σ_{k=0}^{K} θ_k (T_e^(l))^k X_e^(l)    (8)
wherein θ_k represents the diffusion weight coefficient; for the corresponding diffusion step (T_e^(l))^k, Query-Key-Value attention is used to obtain the appropriate θ_k, as follows:
θ_k = Σ_i [exp(e_ik) / Σ_j exp(e_ij)] · x_s^(i) W_V    (9)
wherein W_V represents the transformation matrix of the Value, and view(·) represents the reshape operation of a matrix: given an original matrix of shape R^(N×N), the output is a single-row vector of dimension R^(1×N²); x_s^(i) = view((T_e^(l))^i) represents the input sequence of the Query-Key-Value attention; e_ik represents the attention score between the two different diffusion steps i and k, and e_ij represents the attention score between the two different diffusion steps i and j; e_ij is calculated by equation (10):
e_ij = (x_s^(i) W_Q)(x_s^(j) W_K)^T / √d_qs    (10)
wherein d_qs represents the dimension of the Query, and W_Q and W_K represent the transformation matrices of the Query and the Key, respectively; x_s^(i) and x_s^(j) respectively represent the inputs of the Query-Key-Value attention for diffusion steps i and j;
the output adjacency transition matrix T_e^(l+1) is calculated through the residual connection as follows:
T_e^(l+1) = ResConn(T_e^(l), ΔT_e^(l))    (11)
wherein ResConn(·) represents the residual connection and ΔT_e^(l) represents the dynamically updated portion of the adjacency transition matrix, which is calculated as follows:
(ΔT_e^(l))_ij = (1/M) Σ_{m=1}^{M} α_ij^m    (12)
wherein m ∈ [1, M] represents the replica index, M represents the number of replicas, (ΔT_e^(l))_ij represents the element in the i-th row and j-th column of ΔT_e^(l), and α_ij^m represents the attention score of the m-th replica, which is calculated as follows:
α_ij^m = softmax_j( LeakyReLU( a_m([x_i || x_j]) ) )    (13)
wherein LeakyReLU represents the activation function and a_m(·) represents the learnable weight vector of the m-th replica; x_i and x_j represent the i-th and j-th rows of the feature matrix X_e^(l), i.e. the feature vectors of nodes i and j, respectively.
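The replica attention of claim 5, α_ij^m = softmax_j(LeakyReLU(a_m([x_i || x_j]))), averaged over M replicas, can be sketched as below; the LeakyReLU negative slope of 0.2 is an assumed default not stated in the claim:

```python
import numpy as np

def replica_attention(X, a, slope=0.2):
    """GAT-style attention scores of one replica:
    alpha_ij = softmax_j(LeakyReLU(a · [x_i || x_j]))."""
    N = X.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            z = a @ np.concatenate([X[i], X[j]])
            e[i, j] = z if z > 0 else slope * z        # LeakyReLU
    e = np.exp(e - e.max(axis=1, keepdims=True))       # numerically stable
    return e / e.sum(axis=1, keepdims=True)            # row-wise softmax over j

def dynamic_update(X, a_list):
    """Average the M replica scores to obtain the dynamic update."""
    return np.mean([replica_attention(X, a) for a in a_list], axis=0)

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 8))                            # 6 nodes, 8 features each
dT = dynamic_update(X, [rng.normal(size=16) for _ in range(4)])  # M = 4 replicas
print(np.allclose(dT.sum(axis=1), 1.0))                # rows still sum to 1
```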
6. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized by a spatio-temporal decoder for traffic flow prediction, which receives the spatio-temporal features extracted by the encoder to generate the future traffic flow sequence; the single-layer decoder consists of a space-time convolution module, a diffusion graph attention module, and an auxiliary module for aggregating information between the encoder and the decoder; the inputs of the l-th decoder layer are X_d^(l) and T_d^(l), and the output of the DGA-Block module of the l-th decoder layer is as follows:
X_d'^(l) = ResConn(X_d^(l) W, MDA(X_d^(l), T_d^(l)) W_O)    (14)
wherein MDA(·) represents the multi-head diffusion attention, whose calculation process is the same as that of formula (7), W represents a learnable weight matrix, and W_O represents a linear transformation matrix; the calculation processes of ΔT_d^(l) and T_d^(l+1) are the same as those of formulas (11) and (12); X_d'^(l) and T_d^(l+1), together with X_e^(L+1) and T_e^(L+1), are input to the auxiliary module to aggregate the traffic flow information between the encoder and the decoder; then the output of the l-th decoder layer is as follows:
X_d^(l+1) = ResConn(X_d'^(l), MDA'(X_d'^(l), X_e^(L+1), T_d^(l+1), T_e^(L+1)) W_O')    (15)
wherein MDA'(·) represents the diffusion attention of the auxiliary module, whose calculation process is similar to formula (7); its diffusion parameters follow the calculation of formula (9) and its attention scores follow formula (10); denoting θ_k' and e_ij' respectively as the diffusion parameters and attention scores of MDA'(·), θ_k' is calculated as follows:
θ_k' = Σ_i [exp(e_ik') / Σ_j exp(e_ij')] · x_s'^(i) W_V'    (18)
wherein W_V' represents the transformation matrix of the Value and x_s'^(i) represents the input sequence; e_ij' is calculated by equation (19):
e_ij' = (x_s'^(i) W_Q')(x_s'^(j) W_K')^T / √d_qs    (19)
wherein d_qs represents the dimension of the Query, and W_Q' and W_K' represent the transformation matrices of the Query and the Key, respectively; x_s'^(i) and x_s'^(j) respectively represent the inputs of the Query-Key-Value attention for diffusion steps i and j.
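A sketch of the Query-Key-Value attention over diffusion steps used for the diffusion parameters (in the spirit of equations (9), (10) and (19)): each step k is represented by the flattened power view(T^k) ∈ R^(1×N²), and W_V maps each step to a scalar coefficient. Matrix names and dimensions are assumptions:

```python
import numpy as np

def diffusion_step_weights(T_mat, K, W_Q, W_K, W_V):
    """Scaled dot-product attention over the K+1 diffusion steps.
    T_mat: (N, N) transition matrix; W_Q, W_K: (N*N, d); W_V: (N*N, 1)."""
    N = T_mat.shape[0]
    P, xs = np.eye(N), []
    for _ in range(K + 1):
        xs.append(P.reshape(-1))           # view(T^k) as a single-row vector
        P = P @ T_mat
    X = np.stack(xs)                       # (K+1, N*N), one row per step
    Q, Km, V = X @ W_Q, X @ W_K, X @ W_V
    e = Q @ Km.T / np.sqrt(W_Q.shape[1])   # scaled dot-product scores e_ij
    a = np.exp(e - e.max(axis=0, keepdims=True))
    a /= a.sum(axis=0, keepdims=True)      # normalize over steps i for each k
    return (a.T @ V).ravel()               # theta_k: one coefficient per step

rng = np.random.default_rng(5)
N, K, d = 4, 2, 8
T_mat = rng.random((N, N))
T_mat /= T_mat.sum(axis=1, keepdims=True)  # make rows sum to 1
theta = diffusion_step_weights(T_mat, K, rng.normal(size=(N * N, d)),
                               rng.normal(size=(N * N, d)), rng.normal(size=(N * N, 1)))
print(theta.shape)                         # (3,) -> one theta per diffusion step
```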
CN202310483068.9A 2023-05-01 2023-05-01 Diffusion diagram attention network traffic flow prediction method based on Transformer Active CN116504060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310483068.9A CN116504060B (en) 2023-05-01 2023-05-01 Diffusion diagram attention network traffic flow prediction method based on Transformer


Publications (2)

Publication Number Publication Date
CN116504060A CN116504060A (en) 2023-07-28
CN116504060B true CN116504060B (en) 2024-05-14

Family

ID=87326175


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117133116B (en) * 2023-08-07 2024-04-19 南京邮电大学 Traffic flow prediction method and system based on space-time correlation network
CN116884222B (en) * 2023-08-09 2024-03-26 重庆邮电大学 Short-time traffic flow prediction method for bayonet nodes
CN117726183A (en) * 2024-02-07 2024-03-19 天津生联智慧科技发展有限公司 Gas operation data prediction method based on space high-order convolution

Citations (5)

Publication number Priority date Publication date Assignee Title
CN112071065A (en) * 2020-09-16 2020-12-11 山东理工大学 Traffic flow prediction method based on global diffusion convolution residual error network
CN113450568A (en) * 2021-06-30 2021-09-28 兰州理工大学 Convolutional network traffic flow prediction method based on space-time attention mechanism
CN113487088A (en) * 2021-07-06 2021-10-08 哈尔滨工业大学(深圳) Traffic prediction method and device based on dynamic space-time diagram convolution attention model
CN115482656A (en) * 2022-05-23 2022-12-16 汕头大学 Method for predicting traffic flow by using space dynamic graph convolution network
CN115828990A (en) * 2022-11-03 2023-03-21 辽宁大学 Time-space diagram node attribute prediction method for fused adaptive graph diffusion convolution network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN112215223B (en) * 2020-10-16 2024-03-19 清华大学 Multidirectional scene character recognition method and system based on multi-element attention mechanism
CN113672865A (en) * 2021-07-27 2021-11-19 湖州师范学院 Traffic flow prediction method based on depth map Gaussian process


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant