CN116504060B - Diffusion graph attention network traffic flow prediction method based on Transformer - Google Patents

Diffusion graph attention network traffic flow prediction method based on Transformer

Info

Publication number
CN116504060B
CN116504060B (application CN202310483068.9A)
Authority
CN
China
Prior art keywords
diffusion
representing
attention
traffic flow
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310483068.9A
Other languages
Chinese (zh)
Other versions
CN116504060A (en)
Inventor
张红
王红燕
巩蕾
张玺君
朱思雨
李扬
伊敏
魏骄云
杨俊译
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou University of Technology
Original Assignee
Lanzhou University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou University of Technology filed Critical Lanzhou University of Technology
Priority to CN202310483068.9A priority Critical patent/CN116504060B/en
Publication of CN116504060A publication Critical patent/CN116504060A/en
Application granted granted Critical
Publication of CN116504060B publication Critical patent/CN116504060B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • G08G1/0129Traffic data processing for creating historical data or processing based on historical data
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The traffic flow combination prediction method adopts a Transformer encoder-decoder architecture, in which both the encoder and the decoder comprise several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block). The ST-Conv Block captures the temporal and spatial dependence of traffic flow through a temporal gated convolutional network and a spatial convolutional network, respectively, while the DGA-Block adaptively learns the diffusion parameter of each diffusion step using a query-key-value self-attention mechanism and dynamically updates the adjacency transition matrix to capture the dynamic spatial dependence of traffic flow. In addition, the decoder adds an information auxiliary module to aggregate traffic flow information between the encoder and decoder.

Description

Diffusion graph attention network traffic flow prediction method based on Transformer
Technical Field
The invention relates to the technical field of intelligent transportation, and in particular to a traffic flow prediction technique based on a Transformer diffusion graph attention network (T-DGAN).
Background
Traffic flow prediction is an important component of Intelligent Transportation Systems (ITS) and can provide a scientific basis for the management and planning of urban traffic systems. Based on the predicted traffic state, traffic departments can deploy and guide traffic flow in advance, improving the operating efficiency of the road network and relieving traffic congestion.
Over the past several decades, researchers have conducted extensive research into traffic flow prediction methods, including the autoregressive integrated moving average (ARIMA), the Kalman filter (KF), and the multi-layer perceptron (MLP), among others. However, since these time-series methods rest on stationarity assumptions, they cannot handle complex nonlinear traffic flow data. Accordingly, in order to deal with complex traffic conditions and capture the nonlinear relationships of traffic flow, many machine learning methods have been employed for traffic flow prediction. For example, short-term traffic flow prediction has been performed with the K-nearest neighbor (KNN) method, which considers the spatial correlation of adjacent road segments. Bayesian network methods handle uncertain information and perform probabilistic reasoning for short-term traffic flow prediction. The support vector machine (SVM), as a machine learning method grounded in statistical learning theory, also performs short-term traffic flow prediction well. The long short-term memory network (LSTM) effectively captures the nonlinearity of traffic dynamics and overcomes the decay of back-propagated errors through its memory blocks. However, the above approaches perform poorly on long-term traffic flow prediction tasks due to the high nonlinearity and dynamic spatio-temporal dependence of traffic flow.
In recent years, with the widespread use of deep learning in the traffic field, researchers have used convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to capture the spatial and temporal dependencies of traffic flow, respectively. While this combination captures both kinds of dependence, CNNs are suited to Euclidean data on regular grids, and modeling an irregular road network with them may lose the topology of the traffic network. To address this problem, graph convolutional networks (GCNs) are used instead of CNNs to better handle the non-Euclidean data of traffic road networks. Although existing hybrid methods based on GCNs and RNNs greatly improve prediction performance, they still have drawbacks: because the GCN uses the graph Laplacian matrix to compute and update the feature information of all nodes in the graph, it has poor flexibility and scalability in capturing the spatial correlation of traffic flow.
In the above methods, the spatial structure of the road network is represented by a predefined adjacency matrix, which, given the complexity and dynamics of the road network, limits the ability to learn the dynamic spatio-temporal characteristics of traffic flow. In response to this problem, researchers have proposed gated attention networks that learn the dynamic spatial correlation directly from traffic flow based on the graph attention mechanism, and the graph multi-attention network (GMAN) for traffic flow prediction, which uses spatio-temporal attention mechanisms to capture the dynamic spatio-temporal correlation of traffic flow. At the same time, the Transformer, a deep learning method that models sequences with an encoder-decoder structure and learns dynamic features in the data with a multi-head attention mechanism, helps solve the difficulty of capturing dynamic spatio-temporal correlations that arises from using predefined adjacency matrices.
Disclosure of Invention
The invention aims to better capture the complex spatio-temporal correlation of traffic flow, and provides a Transformer-based diffusion graph attention network (T-DGAN) traffic flow prediction method.
The invention relates to a Transformer-based diffusion graph attention network traffic flow prediction method, characterized in that the T-DGAN adopts a Transformer encoder-decoder architecture, in which both the encoder and the decoder comprise several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block). The ST-Conv Block captures the temporal and spatial dependence of traffic flow through a temporal gated convolutional network and a spatial convolutional network, respectively; the DGA-Block adaptively learns the diffusion parameter of each diffusion step using a query-key-value self-attention mechanism and dynamically updates the adjacency transition matrix to capture the dynamic spatial dependence of traffic flow; and the decoder adds an information auxiliary module to aggregate traffic flow information between the encoder and decoder.
The invention has the following advantages:
1. The invention provides a Transformer-based diffusion graph attention network traffic flow prediction method (T-DGAN). The method adopts an encoder-decoder architecture in which the codec stacks several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block), and road network information is described through a dynamic graph. The decoder adds an information auxiliary module (Auxiliary Block) on top of the encoder to aggregate traffic flow information between the encoder and the decoder.
2. The present invention uses a spatio-temporal convolutional network module (ST-Conv Block) to learn the spatio-temporal correlation of traffic flow: a temporal gated convolutional layer captures the temporal dependence of traffic flow, and a spatial convolutional layer captures its spatial dependence.
3. The invention uses a diffusion graph attention (DGA-Block) method to model the dynamic spatial correlation of traffic flow. The method uses a query-key-value self-attention mechanism to adaptively learn the diffusion parameters of each diffusion step and dynamically update the adjacency transition matrix to reflect the dynamically changing spatial characteristics of traffic flow.
4. Extensive comparison experiments were carried out on two traffic datasets, and the results show that, compared with baseline methods, the proposed method achieves more accurate predictions on both datasets.
Drawings
Fig. 1 shows the architecture of the T-DGAN method; Fig. 2 shows the temporal convolutional network; Fig. 3 shows T-DGAN predictions on PeMS, node=11; Fig. 4 shows T-DGAN predictions on PeMS, node=190; Fig. 5 shows T-DGAN predictions on METR-LA, node=119; Fig. 6 shows T-DGAN predictions on METR-LA, node=176; Fig. 7 shows the adjacency matrices T_e, T_d (step 0) on the PeMS dataset; and Fig. 8 shows the adjacency matrices T_e, T_d (step 5) on the METR-LA dataset.
Description of the embodiments
The present invention will be described in further detail with reference to examples.
1 Method
The invention provides a Transformer-based diffusion graph attention network traffic flow prediction method (T-DGAN), in which each encoder layer consists of a spatio-temporal convolutional network module (ST-Conv Block) and a diffusion graph attention module (DGA-Block), and each decoder layer consists of an ST-Conv Block, a DGA-Block, and an information auxiliary module (Auxiliary Block). The encoder and decoder have L-1 and L'-1 layers, respectively. Given the input X^{t-T'+1,...,t} and the adjacency matrix A of the T-DGAN method, they are first converted into feature matrices and transition matrices:

H_e^(0) = X W_e + b_e,  H_d^(0) = X W_d + b_d,  T_e^(1) = T_d^(1) = D^{-1}(A + I_N)

where D represents the degree matrix of A with self-loops, i.e. D_ii = Σ_j (A + I_N)_ij; W_e and W_d represent the weight matrices of the encoder and decoder applied to X^{t-T+1,...,t}; b_e and b_d represent the biases of the encoder and decoder, respectively; and T_e^(1) and T_d^(1) represent the adjacency transition matrices of the encoder and decoder, respectively. The traffic flow prediction result is computed by Ŷ = H_d^(L') W_fc + b_fc, where W_fc represents the transformation matrix of the fully connected layer and b_fc represents the corresponding bias. The outputs H_e^(L) and T_e^(L) of the last encoder layer are input to the diffusion attention module of each decoder layer to aggregate traffic flow temporal and spatial characteristic information between the encoder and decoder.
2 Problem definition
In the present invention, the road network is represented as a graph G = (V, E, A), where V represents the set of N road network nodes, E represents the set of edges, and A ∈ R^{N×N} represents a weighted adjacency matrix; A_ij is 1 if v_i, v_j ∈ V and (v_i, v_j) ∈ E, and 0 otherwise. At each time step t, the traffic flow on graph G is X_t ∈ R^{N×C}, where C represents the number of features per node. The traffic flow prediction problem aims at learning a function f that takes X^{t-T+1,...,t} as input and predicts the traffic flow for T time steps in the future, with the mapping:

[X_{t-T+1}, ..., X_t] --f--> [X_{t+1}, ..., X_{t+T}]    (1)
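The graph construction above can be sketched in a few lines of NumPy (a minimal illustration; the function name and toy edge list are our own, and real road networks derive edge weights from network data rather than from a hand-written list):

```python
import numpy as np

def build_adjacency(num_nodes, edges):
    """Build the weighted adjacency matrix A of G = (V, E, A).

    `edges` is a list of (i, j, w) tuples; per the definition above,
    an unweighted road network simply uses w = 1 for every edge.
    """
    A = np.zeros((num_nodes, num_nodes))
    for i, j, w in edges:
        A[i, j] = w  # directed edge v_i -> v_j
    return A

# A toy 3-node road network: 0 -> 1 and 1 -> 2
A = build_adjacency(3, [(0, 1, 1.0), (1, 2, 1.0)])
```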
2.1 convolutional encoder for extracting spatio-temporal features
The encoder extracts spatio-temporal features from historical traffic flow data and consists of a spatio-temporal convolutional module (ST-Conv Block) and a diffusion graph attention module (DGA-Block). Specifically, each ST-Conv Block comprises a temporal gated convolutional layer and a spatial convolutional layer, which capture the temporal and spatial features of traffic flow. The DGA-Block learns the diffusion parameters of each diffusion step using query-key-value attention and dynamically updates the adjacency transition matrix to reflect the dynamic spatial characteristics of traffic flow.
(1) Time-gated convolutional layer
The temporal gated convolutional layer comprises a one-dimensional convolution with a gated linear unit (GLU) to capture the temporal dependence of traffic flow. For each node in the traffic network G, the temporal convolution explores adjacent time steps of the input with zero padding so that the size of the time dimension remains unchanged. Given the temporal convolution input x ∈ R^{P×D_in} of each node, a sequence of length P with D_in features, a 1D convolution kernel Γ with kernel size (K_t, 1), input size D_in, and output size 2D_out gives the output [P Q] ∈ R^{P×2D_out}, where P, Q are split along the feature dimension and input to the GLU. Thus, the temporal gated convolutional layer can be expressed as:

Γ * x = P ⊙ σ(Q)    (2)
where P, Q are the inputs of the gates in the GLU, ⊙ denotes the element-wise Hadamard product, and σ(Q), using the Sigmoid function as the activation function, selectively passes the hidden states and information of the input X.
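The gated temporal convolution can be sketched as follows (an illustrative NumPy reading of the GLU gate P ⊙ σ(Q), assuming 'same' zero padding and a single node; the shapes and names are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_gated_conv(x, kernel):
    """Gated temporal convolution P ⊙ σ(Q).

    x:      (P, D_in) input sequence of one node
    kernel: (K_t, D_in, 2*D_out) 1D convolution kernel Γ
    Zero padding keeps the time dimension length unchanged.
    """
    P_len, D_in = x.shape
    K_t, _, two_dout = kernel.shape
    D_out = two_dout // 2
    pad = K_t // 2
    xp = np.pad(x, ((pad, K_t - 1 - pad), (0, 0)))
    out = np.zeros((P_len, two_dout))
    for t in range(P_len):
        window = xp[t:t + K_t]                 # (K_t, D_in)
        out[t] = np.einsum("kd,kdo->o", window, kernel)
    P_part, Q_part = out[:, :D_out], out[:, D_out:]
    return P_part * sigmoid(Q_part)            # GLU gate

x = np.random.randn(12, 2)                     # P = 12 steps, D_in = 2
Gamma = np.random.randn(3, 2, 8)               # K_t = 3, 2*D_out = 8
y = temporal_gated_conv(x, Gamma)              # shape (12, 4)
```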
(2) Graph rolling network
The graph convolution operation aggregates the features of neighboring nodes to the central node based on the graph structure to update node features. The graph convolutional network (GCN) simplifies ChebNet with a first-order approximation:

Z = Â X W    (3)

where Â = D̃^{-1/2}(A + I_N) D̃^{-1/2} represents the normalized adjacency matrix with self-loops, X ∈ R^{N×D_in} represents the input graph signal of N nodes with D_in features, Z represents the output, and W represents a learnable parameter matrix. The basic GCN applies only to undirected graphs, which does not conform to the directed nature of the traffic network. To facilitate convolution on the directed graph, the diffusion convolution can be generalized to the form of equation (4):

Z = Σ_{k=0}^{K} M^k X W_k    (4)
where M^k represents the k-th power of the transition matrix and K represents the number of diffusion steps. In the directed graph, the diffusion process is divided into forward and backward directions, with forward transition matrix M_f = A / rowsum(A) and backward transition matrix M_b = A^T / rowsum(A^T).
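The transition matrices and the K-step diffusion convolution of equation (4) can be sketched as follows (a minimal NumPy illustration with random stand-ins for the learned parameter matrices W_k):

```python
import numpy as np

def transition_matrices(A):
    """Forward and backward transition matrices of a directed graph.

    M_f = A / rowsum(A), M_b = A^T / rowsum(A^T); each row sums to 1
    wherever a node has outgoing (resp. incoming) edges.
    """
    def row_normalize(M):
        d = M.sum(axis=1, keepdims=True)
        d[d == 0] = 1.0                   # isolated nodes: avoid 0-division
        return M / d
    return row_normalize(A), row_normalize(A.T)

def diffusion_conv(X, M, weights):
    """K-step diffusion convolution Z = sum_k M^k X W_k (equation (4))."""
    N = X.shape[0]
    Z = np.zeros((N, weights[0].shape[1]))
    Mk = np.eye(N)                        # M^0
    for W_k in weights:
        Z += Mk @ X @ W_k
        Mk = Mk @ M                       # next power of the transition matrix
    return Z

A = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [1., 0., 0.]])
M_f, M_b = transition_matrices(A)
X = np.random.randn(3, 2)
Ws = [np.random.randn(2, 4) for _ in range(3)]  # K = 2 -> W_0, W_1, W_2
Z = diffusion_conv(X, M_f, Ws)            # shape (3, 4)
```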
(3) Spatial convolution layer
The present invention proposes a spatial convolutional layer to capture the local and global spatial dependencies of traffic flow. A K-step diffusion convolution of the form of equation (4) is performed in both the forward and backward directions with a predefined weighted adjacency matrix to capture K-order local spatial dependencies. Formally, a spatial convolution operation is performed for each time slot X_t of the input tensor of the spatial convolutional layer; the calculation can be expressed as:

Z_t = Σ_{k=0}^{K} ( M_f^k X_t W_{k,f} + M_b^k X_t W_{k,b} )    (5)

where W_{k,f} and W_{k,b} represent the learnable parameter matrices convolved with the adjacency matrix A in the forward and backward directions.
2.2 Diffusion graph attention network encoder
Taking layer l as an example, given the inputs H^{(l-1)} and T^{(l)}, the output feature matrix H^{(l)} is as follows:

H^{(l)} = ResConn( H^{(l-1)}, MDA(H^{(l-1)}, T^{(l)}) W_m ) W_h    (6)

where ResConn(·) represents the residual connection, MDA(·) represents multi-head diffusion attention, W_m represents a learnable weight matrix, and W_h represents a linear transformation matrix. Given that MDA(·) has H heads:

MDA(H^{(l-1)}, T^{(l)}) = ||_{h=1}^{H} DA_h(H^{(l-1)}, T^{(l)})    (7)

where DA_h(·) represents single-head diffusion attention and || represents the concatenation operation.
Here k represents the diffusion step, K represents the maximum diffusion order, and single-head diffusion attention is calculated from equation (8):

DA(H, T) = Σ_{k=0}^{K} θ_k T^k H    (8)

where θ_k represents the diffusion weight coefficient of the corresponding diffusion step T^k. The invention uses Query-Key-Value attention to obtain the appropriate θ_k as follows:

θ_k = Σ_i ( exp(e_ki) / Σ_j exp(e_kj) ) ( S_i W_V )    (9)
where W_V represents the transformation matrix of Value; view represents the reshape operation of a matrix, i.e., given an original matrix of shape R^{N×N}, the output is a single-row vector of dimension R^{1×N²}; and S_i = view(T^(i)) represents an input of the Query-Key-Value attention. e_ik represents the attention score between two different diffusion steps i and k, and e_ij represents the attention score between two different diffusion steps i and j. e_ij is calculated by equation (10):

e_ij = ( S_i W_Q )( S_j W_K )^T / sqrt(d_qs)    (10)

where d_qs represents the size of the Query, W_Q and W_K represent the transformation matrices of Query and Key, respectively, and S_i and S_j are the inputs of Query-Key-Value attention at diffusion steps i and j.
The output adjacency transition matrix T^{(l+1)} is computed through a residual connection:

T^{(l+1)} = ResConn( T^{(l)}, T_u^{(l)} )    (11)

where ResConn(·) represents the residual connection and T_u^{(l)} represents the dynamically updated portion of the adjacency transition matrix, calculated as follows:

T_u[i,j] = (1/M) Σ_{m=1}^{M} softmax_j( ê_ij^m )    (12)

where m ∈ [1, M] represents the replica index, M represents the number of replicas, T_u[i,j] represents the element in row i, column j of T_u, and ê_ij^m represents the attention score of the m-th replica, calculated as follows:

ê_ij^m = LeakyReLU( a_m( [h_i || h_j] ) )    (13)

where LeakyReLU denotes the activation function, a_m(·) denotes the learnable weight vector of the m-th replica, and h_i and h_j, rows i and j of the feature matrix H, represent the feature vectors of nodes i and j, respectively.
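The multi-replica attention update described above can be sketched as follows (a minimal NumPy illustration in the style of graph attention; the array shapes and names are our own, and the weight vectors are random stand-ins for learned parameters):

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_adjacency_update(H, a, M):
    """Average of M attention replicas over node pairs.

    H: (N, D) node feature matrix; a: (M, 2*D) learnable weight vectors.
    Each replica scores pairs with LeakyReLU(a_m [h_i || h_j]), the scores
    are softmax-normalized row-wise, and the replicas are averaged.
    """
    N, D = H.shape
    T_u = np.zeros((N, N))
    for m in range(M):
        s_src = H @ a[m, :D]              # a_m's response to h_i
        s_dst = H @ a[m, D:]              # a_m's response to h_j
        e = leaky_relu(s_src[:, None] + s_dst[None, :])
        T_u += softmax_rows(e)
    return T_u / M

rng = np.random.default_rng(1)
H = rng.standard_normal((5, 3))           # 5 nodes, 3 features
a = rng.standard_normal((2, 6))           # M = 2 replicas
T_u = dynamic_adjacency_update(H, a, M=2)
```

Because each replica's rows are softmax-normalized, the averaged matrix stays row-stochastic, which is what lets it serve as an (updated) transition matrix.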
2.3 Space-time decoder for traffic flow prediction
The decoder receives the spatio-temporal features extracted by the encoder to generate the future traffic flow sequence. A single decoder layer consists of a spatio-temporal convolutional module (ST-Conv Block), a diffusion graph attention module (DGA-Block), and an auxiliary module (Auxiliary Block) that aggregates information between the encoder and decoder. The inputs of the l-th decoder layer are H_d^{(l-1)} and T_d^{(l)}, and the output of its DGA-Block module is as follows:

H_dga^{(l)} = ResConn( H_d^{(l-1)}, MDA(H_d^{(l-1)}, T_d^{(l)}) W_m' ) W_h'    (14)

where MDA(·) represents multi-head diffusion attention, computed as in equation (7); W_m' represents a learnable weight matrix and W_h' represents a linear transformation matrix. T_d^{(l+1)} and T_{u,d}^{(l)} are computed as in equations (11) and (12). H_dga^{(l)} and T_d^{(l+1)}, together with the encoder outputs H_e^{(L)} and T_e^{(L)}, are input into the auxiliary module (Auxiliary Block) to aggregate traffic flow information between the encoder and decoder.
Then, the output of the l-th decoder layer is as follows:

H_d^{(l)} = ResConn( H_dga^{(l)}, ADA(H_dga^{(l)}, H_e^{(L)}, T_e^{(L)}) W_a ) W_b    (15)

where the auxiliary diffusion attention ADA(·) is computed similarly to equation (7), its diffusion parameters follow equation (9), and its attention scores follow equation (10). Denoting by θ'_k and e'_ij the diffusion parameters and attention scores of ADA(·), θ'_k is calculated as follows:

θ'_k = Σ_i ( exp(e'_ki) / Σ_j exp(e'_kj) ) ( S'_i W_V' )    (18)

where W_V' represents the transformation matrix of Value and S'_i represents the input sequence. e'_ij is calculated from equation (19):

e'_ij = ( S'_i W_Q' )( S'_j W_K' )^T / sqrt(d_qs)    (19)

where d_qs represents the size of the Query, W_Q' and W_K' represent the transformation matrices of Query and Key, respectively, and S'_i and S'_j are the inputs of Query-Key-Value attention at diffusion steps i and j.
3. Experiment
3.1 Data description
The present invention uses two traffic datasets, PeMS and METR-LA, to verify the performance of the proposed T-DGAN method. The experimental datasets contain different attributes; detailed information is shown in Table 1:
TABLE 1 description of experimental data sets
The PeMS03 data are collected every 30 seconds by the Caltrans Performance Measurement System (PeMS), which records the spatial location of the traffic flow monitoring sensors. The number of sensors in PeMS is 555. The collection period runs from January 1, 2018 to January 31, 2018, and traffic speeds are aggregated every 5 minutes.
The METR-LA dataset was derived from loop detectors on the Los Angeles highway network, with a time span beginning March 1, 2012; historical traffic speeds collected by 207 sensors were selected and aggregated every 5 minutes.
3.2 Experimental setup
The experiments were compiled and executed on a Windows server (CPU: Intel(R) Core(TM) @ 1.50 GHz, 16 GB RAM; GPU: NVIDIA GeForce RTX 2080 Ti), based on the PyTorch deep learning framework, with the T-DGAN method built and trained in PyCharm.
The invention divides the dataset into training, validation, and test sets at a 60%:10%:30% ratio. The batch size was set to 8, the number of heads for diffusion attention and graph attention in the DGA-Block was set to 8, the dimension of node embedding was set to 16, the maximum diffusion step was set to 3, the training epochs for the PeMS03 and METR-LA datasets were set to 60 and 80, respectively, the historical and predicted data lengths were both set to 12, and the method was trained with the Adam optimizer at an initial learning rate of 0.001.
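The chronological split can be sketched as follows (a minimal illustration of the 60%:10%:30% ratio; the function name is our own, and real pipelines split node-by-time tensors rather than a 1D array):

```python
import numpy as np

def chronological_split(series, ratios=(0.6, 0.1, 0.3)):
    """Split a traffic time series into train/val/test sets in time order."""
    n = len(series)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

data = np.arange(100)                     # stand-in for 100 time steps
train, val, test = chronological_split(data)
```

Splitting chronologically (rather than shuffling) matters for traffic data: the test period must lie strictly after the training period to avoid leaking future information.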
3.3 Evaluation index and baseline method
(1) Evaluation index
To better evaluate the predictive performance of the method, the invention uses mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) as evaluation indices for the T-DGAN method:

1) Mean absolute error (MAE):

MAE = (1/n) Σ_{i=1}^{n} | y_i − ŷ_i |

2) Root mean square error (RMSE):

RMSE = sqrt( (1/n) Σ_{i=1}^{n} ( y_i − ŷ_i )² )

3) Mean absolute percentage error (MAPE):

MAPE = (100%/n) Σ_{i=1}^{n} | (y_i − ŷ_i) / y_i |

where y_i and ŷ_i represent the actual and predicted traffic speeds, respectively, and n represents the number of nodes on the traffic road network.
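The three indices can be computed directly (toy speed values are our own):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error, as a percentage."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y = np.array([60.0, 55.0, 50.0])          # toy ground-truth speeds
y_hat = np.array([58.0, 56.0, 48.0])      # toy predictions
```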
(2) Baseline method
The invention is mainly compared with a deep learning method and other baseline methods, wherein the baseline methods are as follows:
1) Historical average (HA): uses the average traffic information of historical periods as the prediction.
2) Vector autoregression (VAR): models n variables over the same sample period as linear functions of their historical values.
3) Support vector regression (SVR): uses a linear support vector machine to learn the relationship between input and output to predict traffic flow.
4) Feedforward neural network (FNN): a feedforward neural network with two hidden layers and L2 regularization.
5) Autoregressive integrated moving average (ARIMA): an autoregressive integrated moving average method with a Kalman filter.
6) Long short-term memory network (FC-LSTM): a recurrent neural network with fully connected LSTM hidden units.
7) Diffusion convolutional recurrent neural network (DCRNN): combines diffusion convolution with a recurrent neural network for traffic flow prediction.
8) Graph WaveNet (G-WN): combines a graph convolutional network with a dilated causal convolutional network.
9) Spatio-temporal graph convolutional network (STGCN): a spatio-temporal graph convolutional network combining graph convolution and one-dimensional convolution.
10) Attention-based spatio-temporal graph convolutional network (ASTGCN): further integrates a spatio-temporal attention mechanism into the spatio-temporal graph convolutional network to capture the dynamic spatio-temporal patterns of traffic flow.
11) Adaptive graph convolutional recurrent network (AGCRN): improves the conventional graph convolutional network with node adaptive parameter learning and data adaptive graph generation modules, which learn node-specific patterns and capture spatial correlations, respectively.
12) Graph multi-attention network (GMAN): integrates multiple spatio-temporal attention blocks into an encoder-decoder architecture, with transform attention between the encoder and decoder.
3.4 Experimental results and analysis
The present invention visualizes predictions on the PeMS and METR-LA datasets. The time range was set to 288 steps; nodes 11 and 190 were randomly selected for visualization on the PeMS dataset, with the results shown in Figs. 3 and 4, and nodes 119 and 176 were randomly selected on the METR-LA dataset, as shown in Figs. 5 and 6. It can be seen that the predictions of the T-DGAN method closely follow the true traffic speed.
Experiments with the T-DGAN method and the various baseline methods were carried out on the PeMS and METR-LA datasets; the 15-minute, 30-minute, and 60-minute prediction results are shown in Tables 2 and 3. The results show that the proposed T-DGAN method achieves good prediction results on both datasets.
It can be observed from Tables 2 and 3 that the predictions of conventional time-series analysis methods are not ideal, indicating that these methods have limited capability in modeling the nonlinearity and high complexity of traffic flow. Meanwhile, deep-learning-based methods obtain better prediction results than conventional time-series analysis. For example, the DCRNN, STGCN, and ASTGCN methods and the proposed T-DGAN method all consider spatio-temporal correlations and outperform conventional time-series methods such as ARIMA and FC-LSTM. In addition, GMAN performs better than G-WN, STGCN, ASTGCN, and others, indicating that the encoder-decoder architecture used in GMAN can effectively capture the dynamic spatio-temporal correlation of traffic flow.
In contrast, the proposed T-DGAN method obtains better prediction results than the baseline methods, demonstrating its effectiveness in capturing the spatio-temporal correlation of traffic flow. Furthermore, T-DGAN captures this correlation through an encoder-decoder architecture and models the direct relationship between historical and future time steps by combining the spatio-temporal convolutional network and the diffusion graph attention mechanism, which helps alleviate error propagation between predicted time steps.
TABLE 2 comparison of predicted Performance on PeMS data set
TABLE 3 comparison of predicted Performance on METR-LA datasets
In order to evaluate the performance of different modules in the T-DGAN method provided by the invention, an ablation experiment is performed.
(1) Influence of dynamic diagram on prediction result
The invention carries out ablation experiments with dynamic and static graphs on the PeMS and METR-LA datasets to study their influence on traffic flow prediction. As the ablation results in Table 4 show, the dynamic graph yields better prediction performance for traffic flow than the static graph.
TABLE 4 Experimental results for dynamic and static graph settings
(2) Influence of space-time convolution (ST-Conv Block) on prediction results
To study the performance of the different modules in the T-DGAN method, a variant of the method (NST-Conv Block: without the spatio-temporal convolutional network module) was designed to verify the effect of the spatio-temporal convolutional module on prediction performance. Traffic flow predictions of the NST-Conv Block variant and the T-DGAN method at 15, 30, and 60 minutes on the PeMS and METR-LA datasets are shown in Table 5.
TABLE 5 Comparison of the prediction results of the T-DGAN method and the variant method
At 15 minutes, compared with the NST-Conv Block variant, the T-DGAN method reduces MAE on the PeMS and METR-LA datasets by about 6.67% and 1.52%, and RMSE by about 3.47% and 2.02%, respectively. At 30 minutes, MAE is reduced by about 7.16% and 2.01%, and RMSE by about 3.91% and 0.94%. At 60 minutes, MAE is reduced by about 11.56% and 2.04%, and RMSE by about 6.93% and 1.08%. As Table 5 shows, the T-DGAN method gives better prediction performance at every prediction horizon; in long-term prediction especially, the gap between the T-DGAN method and the NST-Conv Block variant widens, demonstrating that the ST-Conv Block module effectively mitigates the influence of error propagation.
(3) Influence of dynamic adjacency matrix on prediction result
The adjacency transition matrix contains edge weight information between vertices, and the edge weights reflect the traffic flow between traffic sensors, so a dynamically updated adjacency matrix reflects the dynamically changing traffic flow on a road segment. Experiments confirm that the adjacency transition matrix is dynamically updated during learning. The results on the PeMS and METR-LA datasets are shown in Figs. 7 and 8, respectively: the last-batch matrices T_e, T_d differ at randomly selected time nodes, demonstrating that T_e and T_d change continuously during learning.
While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (6)

1. A traffic flow combined prediction method of a diffusion graph attention network based on a Transformer, adopting a Transformer encoder-decoder architecture, wherein the encoder and the decoder each comprise a plurality of space-time convolution network modules and a diffusion graph attention module; the space-time convolution network module captures the temporal dependence and the spatial dependence of traffic flow through a time-gated convolution network and a spatial convolution network, respectively; the diffusion graph attention module adaptively learns the diffusion parameter of each diffusion step by using a Query-Key-Value self-attention mechanism, and dynamically updates the adjacency transition matrix so as to capture the dynamic spatio-temporal dependence of traffic flow; an information auxiliary module is added to the decoder to aggregate traffic flow information between the encoder and the decoder, and finally a prediction sequence is output through the decoder to perform prediction;
wherein the encoder and the decoder each have L layers; given the input X_{t-T'+1,...,t} of the traffic flow prediction method and the adjacency matrix A, they are first converted into the feature matrices X_e^(1) = X_{t-T'+1,...,t} W_e + b_e and X_d^(1) = X_{t-T'+1,...,t} W_d + b_d, respectively, and the transition matrix Ã = D^(-1)(A + I_N), wherein D represents the degree matrix of A with self-loops; W_e and W_d are the weight matrices of the encoder and decoder, respectively, applied to X_{t-T'+1,...,t}, and b_e and b_d represent the biases of the encoder and decoder, respectively; T_e^(1) and T_d^(1) represent the adjacency transition matrices of the encoder and decoder, respectively; the result of the traffic flow prediction is calculated by X̂ = X_d^(L+1) W_fc + b_fc, wherein W_fc represents the transformation matrix of the fully connected layer and b_fc represents the corresponding bias; the outputs X_e^(L+1) and T_e^(L+1) of the last encoder layer are input to the diffusion attention module of each decoder layer to aggregate the spatio-temporal traffic flow feature information between the encoder and the decoder;
representing the road network as a graph G = (V, E, A), where V represents the set of N road network nodes, E represents the set of edges, and A ∈ R^(N×N) represents the weighted adjacency matrix, with A_ij = 1 if v_i, v_j ∈ V and (v_i, v_j) ∈ E, and 0 otherwise; at each time step t, the traffic flow on graph G is given as X_t ∈ R^(N×C), where C represents the number of features of each node;
the learning function f of the traffic flow prediction method takes X_{t-T'+1,...,t} as input and predicts the traffic flow of T future time steps; the mapping relation is as follows:
[X_{t+1}, ..., X_{t+T}] = f(X_{t-T'+1}, ..., X_t; G)    (1)
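As an illustration of the shapes involved in the mapping f of claim 1 (not the patented model itself), the following sketch uses a hypothetical linear readout `predict_stub` to map T' = 12 observed steps over N sensors with C features to T = 3 future steps; all names and the choice of N = 207 (the METR-LA sensor count) are illustrative assumptions:

```python
import numpy as np

def predict_stub(X_hist, W, b):
    """Hypothetical linear readout standing in for the learned mapping f:
    (T_in, N, C) observed flow -> (T_out, N, C) forecast."""
    T_in, N, C = X_hist.shape
    T_out = W.shape[1]
    flat = X_hist.transpose(1, 2, 0).reshape(N * C, T_in)  # one series per node/feature
    out = flat @ W + b                                     # (N*C, T_out)
    return out.reshape(N, C, T_out).transpose(2, 0, 1)     # back to (T_out, N, C)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 207, 2))     # T' = 12 past steps, N = 207 sensors, C = 2
W = rng.normal(size=(12, 3))          # forecast T = 3 future steps
y = predict_stub(X, W, rng.normal(size=3))
print(y.shape)                        # (3, 207, 2)
```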
2. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized in that the time-gated convolution layer comprises a one-dimensional convolution and uses a gated linear unit to capture the temporal dependence of traffic flow; for each node in the traffic network G, the temporal convolution explores the adjacent time steps of the input element with zero padding so that the time dimension size remains unchanged; the time convolution input of each node, x ∈ R^(P×D_in), is a sequence of length P with D_in features; using a 1D convolution kernel Γ ∈ R^(K_t×D_in×2D_out) with kernel size (K_t, 1), input size D_in and output size 2D_out gives the output [P Q] ∈ R^(P×2D_out), where P and Q are divided into two parts along the feature dimension and input into the gated linear unit; the time-gated convolution layer can be expressed as:
Γ * x = P ⊙ σ(Q) ∈ R^(P×D_out)    (2)
wherein P and Q are respectively the inputs of the gates in the gated linear unit, ⊙ represents the element-wise Hadamard product, and σ(Q) uses a Sigmoid function as the activation function to selectively retain information from the hidden state and the input x.
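A minimal NumPy sketch of the time-gated convolution of claim 2, assuming zero padding that preserves the sequence length and the GLU gate Γ * x = P ⊙ σ(Q); `time_gated_conv` and its shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def time_gated_conv(x, kernel):
    """1D convolution with zero padding (time length preserved) followed by a
    gated linear unit: Gamma * x = P ⊙ σ(Q).
    x: (P, D_in); kernel: (K_t, D_in, 2*D_out)."""
    P_len, D_in = x.shape
    K_t = kernel.shape[0]
    D_out = kernel.shape[2] // 2
    left = (K_t - 1) // 2                              # zero padding so the
    xp = np.pad(x, ((left, K_t - 1 - left), (0, 0)))   # time dimension stays P
    out = np.empty((P_len, 2 * D_out))
    for t in range(P_len):
        out[t] = np.einsum('kd,kdo->o', xp[t:t + K_t], kernel)
    P, Q = out[:, :D_out], out[:, D_out:]              # split along feature dim
    return P * sigmoid(Q)                              # the gate selects information

x = np.random.default_rng(1).normal(size=(12, 4))      # length 12, 4 features
y = time_gated_conv(x, np.random.default_rng(2).normal(size=(3, 4, 16)))
print(y.shape)                                         # (12, 8)
```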
3. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized in that the graph convolution operation aggregates the features of neighboring nodes to the center node based on the graph structure to update the node features, the graph convolution network being a simplified ChebNet under a first-order approximation:
Z = Ã X W    (3)
wherein Ã represents the normalized adjacency matrix with self-loops, X ∈ R^(N×D_in) represents the input graph signal of N nodes with D_in features, Z ∈ R^(N×D_out) represents the output, and W ∈ R^(D_in×D_out) represents the learnable parameter matrix; the basic graph convolution network is only applicable to undirected graphs and does not conform to the directed nature of the traffic network; to enable convolution on the directed graph, the diffusion convolution is generalized to the form of equation (4):
Z = Σ_{k=0}^{K} (M_f^k X W_{k,f} + M_b^k X W_{k,b})    (4)
wherein M^k represents the k-th power of the transition matrix and K represents the number of diffusion steps; in the directed graph, the diffusion process is divided into forward and backward directions, where the forward transition matrix is M_f = A / rowsum(A) and the backward transition matrix is M_b = A^T / rowsum(A^T).
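The forward and backward transition matrices defined at the end of claim 3 can be computed directly; the sketch below assumes nonzero row sums (every node has at least one outgoing and one incoming edge):

```python
import numpy as np

def transition_matrices(A):
    """Forward and backward transition matrices of the diffusion process:
    M_f = A / rowsum(A), M_b = A^T / rowsum(A^T).
    Assumes nonzero row sums in both A and A^T."""
    M_f = A / A.sum(axis=1, keepdims=True)
    At = A.T
    M_b = At / At.sum(axis=1, keepdims=True)
    return M_f, M_b

A = np.array([[0., 2., 1.],
              [1., 0., 0.],
              [3., 1., 0.]])
M_f, M_b = transition_matrices(A)
print(M_f.sum(axis=1))   # each row sums to 1, as required of a transition matrix
```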
4. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized in that the spatial convolution layer captures local and global spatial dependencies of traffic flow; a K-step diffusion convolution is performed in both the forward and backward directions using a predefined weighted adjacency matrix to capture K-order local spatial dependencies, corresponding to equation (4); formally, given a spatial convolution layer input H ∈ R^(T×N×D), a spatial convolution operation is performed for each time slot H_t of the input tensor, and the calculation process can be expressed as:
H_t' = Σ_{k=0}^{K} (M_f^k H_t W_{k,f} + M_b^k H_t W_{k,b})    (5)
wherein W represents a learnable parameter matrix convolved with the adjacency matrix A.
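Combining the bidirectional diffusion of equation (4) with the slot-by-slot application described for the spatial convolution layer, a K-step diffusion convolution over a (T, N, D) tensor might be sketched as follows (illustrative only; `diffusion_conv` and the weight layout are assumptions):

```python
import numpy as np

def diffusion_conv(H, M_f, M_b, W_f, W_b):
    """K-step bidirectional diffusion convolution over every time slot of H.
    H: (T, N, D_in); W_f, W_b: (K+1, D_in, D_out)."""
    T, N, _ = H.shape
    K = W_f.shape[0] - 1
    out = np.zeros((T, N, W_f.shape[2]))
    Pf, Pb = np.eye(N), np.eye(N)        # M^0 = I
    for k in range(K + 1):
        out += np.einsum('ij,tjd,do->tio', Pf, H, W_f[k])  # forward term
        out += np.einsum('ij,tjd,do->tio', Pb, H, W_b[k])  # backward term
        Pf, Pb = Pf @ M_f, Pb @ M_b      # next power of each transition matrix
    return out

rng = np.random.default_rng(3)
A = rng.random((5, 5)) + 0.1             # positive weights -> nonzero row sums
M_f = A / A.sum(axis=1, keepdims=True)
M_b = A.T / A.T.sum(axis=1, keepdims=True)
H = rng.normal(size=(12, 5, 4))          # T = 12 slots, N = 5 nodes, D_in = 4
Z = diffusion_conv(H, M_f, M_b, rng.normal(size=(3, 4, 8)), rng.normal(size=(3, 4, 8)))
print(Z.shape)                           # (12, 5, 8)
```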
5. The traffic flow prediction method using the Transformer-based diffusion graph attention network according to claim 1, characterized in that, taking layer l of the encoder as an example, given the inputs X_e^(l) and T_e^(l), the output feature matrix X_e^(l+1) of the diffusion graph attention module is as follows:
X_e^(l+1) = ResConn(X_e^(l) W, MDA(X_e^(l), T_e^(l)) W_O)    (6)
wherein ResConn(·) represents the residual connection, MDA(·) represents multi-head diffusion attention, W represents a learnable weight matrix, and W_O represents a linear transformation matrix; given that the number of heads of MDA(·) is H, then:
MDA(X_e^(l), T_e^(l)) = ||_{h=1}^{H} DA_h(X_e^(l), T_e^(l))    (7)
wherein DA_h represents a single-head diffusion attention and || represents the concatenation operation; k represents the diffusion step and K represents the maximum diffusion order; DA is calculated by equation (8):
DA(X_e^(l), T_e^(l)) = Σ_{k=0}^{K} θ_k (T_e^(l))^k X_e^(l)    (8)
wherein θ_k represents the diffusion weight coefficient; for the corresponding diffusion step (T_e^(l))^k, Query-Key-Value attention is used to obtain the appropriate θ_k, as follows:
θ_k = Σ_i [exp(e_ik) / Σ_j exp(e_ij)] · x_s^(i) W_V    (9)
wherein W_V represents the transformation matrix of the Value, and view(·) represents the reshape operation of a matrix: given an original matrix of shape R^(N×N), the output is a single-row vector of dimension R^(1×N²); x_s^(i) = view((T_e^(l))^i) represents the input sequence of the Query-Key-Value attention; e_ik represents the attention score between the two different diffusion steps i and k, and e_ij represents the attention score between the two different diffusion steps i and j; e_ij is calculated by equation (10):
e_ij = (x_s^(i) W_Q)(x_s^(j) W_K)^T / √d_qs    (10)
wherein d_qs represents the dimension of the Query, and W_Q and W_K represent the transformation matrices of the Query and the Key, respectively; x_s^(i) and x_s^(j) respectively represent the inputs of the Query-Key-Value attention for diffusion steps i and j;
the output adjacency transition matrix T_e^(l+1) is calculated through the residual connection as follows:
T_e^(l+1) = ResConn(T_e^(l), ΔT_e^(l))    (11)
wherein ResConn(·) represents the residual connection and ΔT_e^(l) represents the dynamically updated portion of the adjacency transition matrix, which is calculated as follows:
(ΔT_e^(l))_ij = (1/M) Σ_{m=1}^{M} α_ij^m    (12)
wherein m ∈ [1, M] represents the replica index, M represents the number of replicas, (ΔT_e^(l))_ij represents the element in the i-th row and j-th column of ΔT_e^(l), and α_ij^m represents the attention score of the m-th replica, which is calculated as follows:
α_ij^m = softmax_j( LeakyReLU( a_m([x_i || x_j]) ) )    (13)
wherein LeakyReLU represents the activation function and a_m(·) represents the learnable weight vector of the m-th replica; x_i and x_j represent the i-th and j-th rows of the feature matrix X_e^(l), i.e. the feature vectors of nodes i and j, respectively.
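The replica attention of claim 5, α_ij^m = softmax_j(LeakyReLU(a_m([x_i || x_j]))), averaged over M replicas, can be sketched as below; the LeakyReLU negative slope of 0.2 is an assumed default not stated in the claim:

```python
import numpy as np

def replica_attention(X, a, slope=0.2):
    """GAT-style attention scores of one replica:
    alpha_ij = softmax_j(LeakyReLU(a · [x_i || x_j]))."""
    N = X.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            z = a @ np.concatenate([X[i], X[j]])
            e[i, j] = z if z > 0 else slope * z        # LeakyReLU
    e = np.exp(e - e.max(axis=1, keepdims=True))       # numerically stable
    return e / e.sum(axis=1, keepdims=True)            # row-wise softmax over j

def dynamic_update(X, a_list):
    """Average the M replica scores to obtain the dynamic update."""
    return np.mean([replica_attention(X, a) for a in a_list], axis=0)

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 8))                            # 6 nodes, 8 features each
dT = dynamic_update(X, [rng.normal(size=16) for _ in range(4)])  # M = 4 replicas
print(np.allclose(dT.sum(axis=1), 1.0))                # rows still sum to 1
```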
6. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, characterized by a spatio-temporal decoder for traffic flow prediction, which receives the spatio-temporal features extracted by the encoder to generate the future traffic flow sequence; the single-layer decoder consists of a space-time convolution module, a diffusion graph attention module, and an auxiliary module for aggregating information between the encoder and the decoder; the inputs of the l-th decoder layer are X_d^(l) and T_d^(l), and the output of the DGA-Block module of the l-th decoder layer is as follows:
X_d'^(l) = ResConn(X_d^(l) W, MDA(X_d^(l), T_d^(l)) W_O)    (14)
wherein MDA(·) represents the multi-head diffusion attention, whose calculation process is the same as that of formula (7), W represents a learnable weight matrix, and W_O represents a linear transformation matrix; the calculation processes of ΔT_d^(l) and T_d^(l+1) are the same as those of formulas (11) and (12); X_d'^(l) and T_d^(l+1), together with X_e^(L+1) and T_e^(L+1), are input to the auxiliary module to aggregate the traffic flow information between the encoder and the decoder; then the output of the l-th decoder layer is as follows:
X_d^(l+1) = ResConn(X_d'^(l), MDA'(X_d'^(l), X_e^(L+1), T_d^(l+1), T_e^(L+1)) W_O')    (15)
wherein MDA'(·) represents the diffusion attention of the auxiliary module, whose calculation process is similar to formula (7); its diffusion parameters follow the calculation of formula (9) and its attention scores follow formula (10); denoting θ_k' and e_ij' respectively as the diffusion parameters and attention scores of MDA'(·), θ_k' is calculated as follows:
θ_k' = Σ_i [exp(e_ik') / Σ_j exp(e_ij')] · x_s'^(i) W_V'    (18)
wherein W_V' represents the transformation matrix of the Value and x_s'^(i) represents the input sequence; e_ij' is calculated by equation (19):
e_ij' = (x_s'^(i) W_Q')(x_s'^(j) W_K')^T / √d_qs    (19)
wherein d_qs represents the dimension of the Query, and W_Q' and W_K' represent the transformation matrices of the Query and the Key, respectively; x_s'^(i) and x_s'^(j) respectively represent the inputs of the Query-Key-Value attention for diffusion steps i and j.
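A sketch of the Query-Key-Value attention over diffusion steps used for the diffusion parameters (in the spirit of equations (9), (10) and (19)): each step k is represented by the flattened power view(T^k) ∈ R^(1×N²), and W_V maps each step to a scalar coefficient. Matrix names and dimensions are assumptions:

```python
import numpy as np

def diffusion_step_weights(T_mat, K, W_Q, W_K, W_V):
    """Scaled dot-product attention over the K+1 diffusion steps.
    T_mat: (N, N) transition matrix; W_Q, W_K: (N*N, d); W_V: (N*N, 1)."""
    N = T_mat.shape[0]
    P, xs = np.eye(N), []
    for _ in range(K + 1):
        xs.append(P.reshape(-1))           # view(T^k) as a single-row vector
        P = P @ T_mat
    X = np.stack(xs)                       # (K+1, N*N), one row per step
    Q, Km, V = X @ W_Q, X @ W_K, X @ W_V
    e = Q @ Km.T / np.sqrt(W_Q.shape[1])   # scaled dot-product scores e_ij
    a = np.exp(e - e.max(axis=0, keepdims=True))
    a /= a.sum(axis=0, keepdims=True)      # normalize over steps i for each k
    return (a.T @ V).ravel()               # theta_k: one coefficient per step

rng = np.random.default_rng(5)
N, K, d = 4, 2, 8
T_mat = rng.random((N, N))
T_mat /= T_mat.sum(axis=1, keepdims=True)  # make rows sum to 1
theta = diffusion_step_weights(T_mat, K, rng.normal(size=(N * N, d)),
                               rng.normal(size=(N * N, d)), rng.normal(size=(N * N, 1)))
print(theta.shape)                         # (3,) -> one theta per diffusion step
```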
CN202310483068.9A 2023-05-01 2023-05-01 Diffusion diagram attention network traffic flow prediction method based on Transformer Active CN116504060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310483068.9A CN116504060B (en) 2023-05-01 2023-05-01 Diffusion diagram attention network traffic flow prediction method based on Transformer


Publications (2)

Publication Number Publication Date
CN116504060A CN116504060A (en) 2023-07-28
CN116504060B true CN116504060B (en) 2024-05-14

Family

ID=87326175


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117133116B (en) * 2023-08-07 2024-04-19 南京邮电大学 Traffic flow prediction method and system based on space-time correlation network
CN116884222B (en) * 2023-08-09 2024-03-26 重庆邮电大学 Short-time traffic flow prediction method for bayonet nodes
CN117726183A (en) * 2024-02-07 2024-03-19 天津生联智慧科技发展有限公司 Gas operation data prediction method based on space high-order convolution

Citations (5)

Publication number Priority date Publication date Assignee Title
CN112071065A (en) * 2020-09-16 2020-12-11 山东理工大学 Traffic flow prediction method based on global diffusion convolution residual error network
CN113450568A (en) * 2021-06-30 2021-09-28 兰州理工大学 Convolutional network traffic flow prediction method based on space-time attention mechanism
CN113487088A (en) * 2021-07-06 2021-10-08 哈尔滨工业大学(深圳) Traffic prediction method and device based on dynamic space-time diagram convolution attention model
CN115482656A (en) * 2022-05-23 2022-12-16 汕头大学 Method for predicting traffic flow by using space dynamic graph convolution network
CN115828990A (en) * 2022-11-03 2023-03-21 辽宁大学 Time-space diagram node attribute prediction method for fused adaptive graph diffusion convolution network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN112215223B (en) * 2020-10-16 2024-03-19 清华大学 Multidirectional scene character recognition method and system based on multi-element attention mechanism
CN113672865A (en) * 2021-07-27 2021-11-19 湖州师范学院 Traffic flow prediction method based on depth map Gaussian process


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant