CN116504060B - Diffusion graph attention network traffic flow prediction method based on Transformer - Google Patents
Diffusion graph attention network traffic flow prediction method based on Transformer
- Publication number: CN116504060B
- Application number: CN202310483068.9A
- Authority: CN (China)
- Prior art keywords: diffusion, representing, attention, traffic flow, decoder
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G08G1/0125 — Traffic control systems for road vehicles; measuring and analyzing parameters relative to traffic conditions; traffic data processing
- G06N3/0464 — Neural network architectures; convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural networks; learning methods
- G08G1/0129 — Traffic data processing for creating historical data or processing based on historical data
- Y04S10/50 — Systems or methods supporting power network operation or management, involving interaction with the load-side end user
Abstract
The traffic flow prediction method adopts a Transformer encoder-decoder architecture, wherein the encoder and the decoder each comprise several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block). The ST-Conv Block captures the temporal and spatial dependence of traffic flow through a time-gated convolutional network and a spatial convolutional network, respectively, while the DGA-Block adaptively learns the diffusion parameter of each diffusion step with a query-key-value self-attention mechanism and dynamically updates the adjacency transition matrix to capture the dynamic spatial dependence of traffic flow. In addition, the decoder adds an information auxiliary module to aggregate traffic flow information between the encoder and the decoder.
Description
Technical Field
The invention relates to the technical field of intelligent transportation, and in particular to a traffic flow prediction technique based on a Transformer diffusion graph attention network (T-DGAN).
Background
Traffic flow prediction is an important component of Intelligent Transportation Systems (ITS) and can provide a scientific basis for the management and planning of urban traffic systems. Based on the predicted traffic state, traffic departments can deploy and guide traffic flows in advance, improving the operating efficiency of the road network and relieving traffic congestion.
Over the past several decades, researchers have studied traffic flow prediction methods extensively, including the autoregressive integrated moving average (ARIMA), the Kalman filter (KF), and the multi-layer perceptron (MLP), among others. However, since these methods rest on stationarity assumptions about the time series, they cannot handle complex nonlinear traffic flow data. Accordingly, to deal with complex traffic conditions and capture the nonlinear relationships of traffic flows, many machine learning methods have been employed for traffic flow prediction. For example, the K-nearest neighbor (KNN) method performs short-term traffic flow prediction by considering the spatial correlation of adjacent road segments. Bayesian network methods handle uncertain information and perform probabilistic reasoning for short-term traffic flow prediction. The support vector machine (SVM), a machine learning method grounded in statistical learning theory, also performs well in short-term traffic flow prediction. The long short-term memory (LSTM) network effectively captures the nonlinearity of traffic dynamics and overcomes the decay of back-propagated errors in its memory blocks. However, the above approaches perform poorly on long-term traffic flow prediction tasks due to the high nonlinearity and dynamic spatio-temporal dependence of traffic flow.
In recent years, with the widespread use of deep learning in the traffic field, researchers have used convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to capture the spatial and temporal dependencies of traffic flows, respectively. While this combination captures both dependencies, CNNs are designed for Euclidean data on regular grids, so modeling irregular road networks with them may lose the topology information of the traffic network. To address this problem, graph convolutional networks (GCNs) are used instead of CNNs to better handle the non-Euclidean data of traffic road networks. Although existing hybrid methods based on GCNs and RNNs greatly improve prediction performance, they still have drawbacks: because the GCN uses the graph Laplacian matrix to compute and update the feature information of all nodes in the graph, it has poor flexibility and scalability in capturing the spatial correlation of traffic flows.
In the above methods, the spatial structure of the road network is represented by a predefined adjacency matrix, which, given the complexity and dynamics of traffic, limits the ability to learn the dynamic spatio-temporal characteristics of traffic flow. In response, researchers have proposed gated attention networks that learn the dynamic spatial correlation of traffic flows based on the graph attention mechanism, and the graph multi-attention network (GMAN) for traffic flow prediction, which uses a spatio-temporal attention mechanism to capture the dynamic spatio-temporal correlations of traffic flow. Meanwhile, the Transformer, a deep learning method that models sequences with an encoder-decoder structure and learns dynamic features in the data through multi-head attention, is well suited to the problem that the dynamic spatio-temporal correlations of traffic flow are difficult to capture with predefined adjacency matrices.
Disclosure of Invention
The invention aims to better capture the complex spatio-temporal correlation of traffic flows, and provides a Transformer-based diffusion graph attention network (T-DGAN) traffic flow prediction method.
The invention relates to a Transformer-based diffusion graph attention network traffic flow prediction method, characterized in that the T-DGAN adopts a Transformer encoder-decoder architecture, wherein the encoder and the decoder each comprise several spatio-temporal convolutional network modules (ST-Conv Blocks) and a diffusion graph attention module (DGA-Block). The ST-Conv Block captures the temporal and spatial dependence of traffic flow through a time-gated convolutional network and a spatial convolutional network, respectively; the DGA-Block adaptively learns the diffusion parameter of each diffusion step with a query-key-value self-attention mechanism and dynamically updates the adjacency transition matrix to capture the dynamic spatial dependence of traffic flow. The decoder adds an information auxiliary module to aggregate traffic flow information between the encoder and decoder.
The invention has the following advantages:
1. The invention provides a Transformer-based diffusion graph attention network traffic flow prediction method (T-DGAN). The method adopts an encoder-decoder architecture in which the encoder and decoder each stack several spatio-temporal convolutional network modules (ST-Conv Block) and a diffusion graph attention module (DGA-Block), and road network information is described through a dynamic graph. The decoder adds an information auxiliary module (Auxiliary Block) on top of the encoder structure to aggregate traffic flow information between the encoder and decoder.
2. The invention uses the spatio-temporal convolutional network (ST-Conv Block) to learn the spatio-temporal correlation of traffic flows: the time-gated convolutional layer captures the temporal dependence of traffic flow, and the spatial convolutional layer captures its spatial dependence.
3. The invention uses the diffusion graph attention (DGA-Block) method to model the dynamic spatial correlation of traffic flow, utilizing a query-key-value self-attention mechanism to adaptively learn the diffusion parameters of each diffusion step and dynamically update the adjacency transition matrix to reflect the spatially dynamic characteristics of traffic flow.
4. Extensive comparison experiments were conducted on two traffic datasets, and the experimental results show that, compared with the baseline methods, the proposed method achieves higher prediction accuracy on the different datasets.
Drawings
Fig. 1 is a diagram of the T-DGAN method; Fig. 2 is a diagram of the time convolutional network; Fig. 3 shows T-DGAN predictions on PeMS, node 11; Fig. 4 shows T-DGAN predictions on PeMS, node 190; Fig. 5 shows T-DGAN predictions on METR-LA, node 119; Fig. 6 shows T-DGAN predictions on METR-LA, node 176; Fig. 7 shows the encoder and decoder adjacency transition matrices (step 0) on the PeMS dataset; Fig. 8 shows the encoder and decoder adjacency transition matrices (step 5) on the METR-LA dataset.
Description of the embodiments
The present invention will be described in further detail with reference to examples.
1 Method
The invention provides a Transformer-based diffusion graph attention network traffic flow prediction method (T-DGAN), in which each encoder layer consists of a spatio-temporal convolutional network module (ST-Conv Block) and a diffusion graph attention module (DGA-Block), and each decoder layer consists of an ST-Conv Block, a DGA-Block and an information auxiliary module (Auxiliary Block). The encoder and decoder have L-1 and L'-1 layers, respectively. Given the input X_{t-T'+1,...,t} of the T-DGAN method and the adjacency matrix A, they are first converted into the feature matrices X_e, X_d and the transition matrix Ã, where Ã = D^{-1}(A + I_N) and D is the degree matrix of A with self-loops, i.e. D_ii = Σ_j (A + I_N)_{ij}. W_e and W_d denote the weight matrices of the encoder and decoder applied to X_{t-T+1,...,t}, and b_e and b_d denote the biases of the encoder and decoder, respectively. Ã_e and Ã_d denote the adjacency transition matrices of the encoder and decoder, respectively. The traffic flow prediction result is computed as Ŷ = H W_fc + b_fc, where W_fc denotes the transformation matrix of the fully connected layer and b_fc the corresponding bias. The output H_e^(L) of the last encoder layer and Ã_e^(L) are fed into the diffusion attention modules of each decoder layer to aggregate temporal and spatial traffic flow feature information between the encoder and decoder.
2 Problem definition
In the present invention, the road network is represented as a graph G = (V, E, A), where V denotes a set of N road network nodes, E denotes a set of edges, and A ∈ R^{N×N} denotes a weighted adjacency matrix; A_ij is 1 if v_i, v_j ∈ V and (v_i, v_j) ∈ E, and 0 otherwise. At each time step t, the traffic flow on graph G is X_t ∈ R^{N×C}, where C denotes the number of features per node. The traffic flow prediction problem aims to learn a function f that takes X_{t-T+1,...,t} as input and predicts the traffic flow for T future time steps; the mapping is as follows:

[X_{t-T+1}, ..., X_t] --f--> [X̂_{t+1}, ..., X̂_{t+T}]   (1)
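As a concrete illustration of the problem definition above, the following sketch builds the weighted adjacency matrix A and the historical input consumed by the function f for a hypothetical 4-node ring network with random data (NumPy only; all variable names are invented for illustration):

```python
import numpy as np

# Hypothetical 4-node road network; A[i, j] = 1 where edge (v_i, v_j) exists.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
N = 4
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = 1.0

# Historical traffic flow for T time steps with C features per node:
# the learned function f maps [X_{t-T+1}, ..., X_t] to [X_{t+1}, ..., X_{t+T}].
T, C = 12, 1
X_hist = np.random.rand(T, N, C)
```

With T = 12 and 5-minute aggregation (as in the experiments below), one input window covers one hour of history.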
2.1 convolutional encoder for extracting spatio-temporal features
The encoder is used to extract spatio-temporal features from historical traffic flow data and consists of a spatio-temporal convolution module (ST-Conv Block) and a diffusion graph attention module (DGA-Block). Specifically, each ST-Conv Block comprises a time-gated convolutional layer and a spatial convolutional layer, which capture the temporal and spatial features of the traffic flow. The DGA-Block learns the diffusion parameters of each diffusion step using query-key-value attention and dynamically updates the adjacency transition matrix to reflect the spatially dynamic nature of the traffic flow.
(1) Time-gated convolutional layer
The time-gated convolutional layer comprises a one-dimensional convolution with a Gated Linear Unit (GLU) to capture the time dependence of traffic flow. For each node in the traffic network G, the temporal convolution explores the adjacent time steps of the input with zero padding so that the temporal dimension remains unchanged. Given the temporal convolution input of a node X ∈ R^{P×D_in}, a sequence of length P with D_in features, a 1-D convolution kernel Γ with kernel size (K_t, 1), input size D_in and output size 2D_out produces an output [P Q] ∈ R^{P×2D_out}; P, Q are split along the feature dimension and fed into the GLU. The time-gated convolutional layer can thus be expressed as:

Γ * X = P ⊙ σ(Q)   (2)

where P and Q are the inputs of the gates in the GLU, ⊙ denotes the element-wise (Hadamard) product, and σ(Q) uses the Sigmoid function as the activation function, selectively passing the hidden state and the information in the input X.
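The GLU gating described above can be sketched in NumPy as follows; the function name and the explicit sliding-window loop are illustrative, not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_temporal_conv(X, W, b, Kt):
    """1-D temporal convolution followed by a Gated Linear Unit (GLU).

    X: (P, D_in) sequence for a single node.
    W: (Kt * D_in, 2 * D_out) flattened convolution kernel; b: (2 * D_out,).
    Zero padding keeps the temporal length P unchanged.
    """
    P, D_in = X.shape
    D_out = W.shape[1] // 2
    pad = Kt // 2
    Xp = np.pad(X, ((pad, Kt - 1 - pad), (0, 0)))  # zero padding in time
    out = np.empty((P, 2 * D_out))
    for t in range(P):
        window = Xp[t:t + Kt].reshape(-1)          # (Kt * D_in,)
        out[t] = window @ W + b
    Pgate, Q = out[:, :D_out], out[:, D_out:]      # split along feature dim
    return Pgate * sigmoid(Q)                      # GLU: P ⊙ σ(Q)
```

The sigmoid gate σ(Q) scales each feature of P into (0, 1), which is what lets the layer selectively pass information.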
(2) Graph rolling network
The graph convolution operation aggregates the features of neighboring nodes to a central node based on the graph structure in order to update node features. In the Graph Convolutional Network (GCN), ChebNet is simplified by a first-order approximation:

Z = Ã X W   (3)

where Ã = D^{-1}(A + I_N) denotes the normalized adjacency matrix with self-loops, X ∈ R^{N×D_in} denotes the input graph signal of N nodes with D_in features, Z ∈ R^{N×D_out} denotes the output, and W ∈ R^{D_in×D_out} denotes a learnable parameter matrix. The basic GCN applies only to undirected graphs and does not conform to the directed nature of the traffic network. To enable convolution on directed graphs, it can be generalized to the diffusion convolution of equation (4):

Z = Σ_{k=0}^{K} M^k X W_k   (4)
where M^k denotes the k-th power of the transition matrix and K denotes the number of diffusion steps. In the directed graph, the diffusion process is divided into forward and backward directions, where the forward transition matrix is M_f = A / rowsum(A) and the backward transition matrix is M_b = A^T / rowsum(A^T).
(3) Spatial convolution layer
The present invention proposes a spatial convolutional layer to capture the local and global spatial dependencies of traffic flows. A K-step diffusion convolution is performed in both the forward and backward directions with the predefined weighted adjacency matrix to capture K-order local spatial dependencies, corresponding to equation (4). Formally, a spatial convolution operation is applied to each time slot of the input tensor:

Z = Σ_{k=0}^{K} ( M_f^k X W_{k,f} + M_b^k X W_{k,b} )   (5)

where W_{k,f} and W_{k,b} denote learnable parameter matrices convolved with the transition matrices derived from the adjacency matrix A.
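The bidirectional K-step diffusion convolution described above can be sketched in NumPy as follows (function names and the small epsilon guard on the row sums are illustrative assumptions):

```python
import numpy as np

def transition_matrices(A):
    """Forward M_f = A / rowsum(A) and backward M_b = A^T / rowsum(A^T)."""
    Mf = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-10)
    Mb = A.T / np.maximum(A.T.sum(axis=1, keepdims=True), 1e-10)
    return Mf, Mb

def diffusion_conv(X, A, Wf, Wb, K):
    """K-step bidirectional diffusion convolution:
    Z = sum_k ( M_f^k X W_{k,f} + M_b^k X W_{k,b} ).

    X: (N, D_in) node features; Wf, Wb: (K+1, D_in, D_out) per-step weights.
    """
    Mf, Mb = transition_matrices(A)
    N = A.shape[0]
    Z = np.zeros((X.shape[0], Wf.shape[2]))
    Pf, Pb = np.eye(N), np.eye(N)       # M^0 = identity
    for k in range(K + 1):
        Z += Pf @ X @ Wf[k] + Pb @ X @ Wb[k]
        Pf, Pb = Pf @ Mf, Pb @ Mb       # advance to the next diffusion step
    return Z
```

Forward diffusion follows out-edges and backward diffusion follows in-edges, which is how the directed nature of the road network is respected.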
2.2 Diffusion graph attention network encoder
Taking layer l as an example, given the input H^(l-1) and the adjacency transition matrix Ã^(l-1), the output feature matrix H^(l) is as follows:

H^(l) = ( H^(l-1) + MDA(H^(l-1)) W^(l) ) W_t^(l)   (6)

where the sum implements the residual connection, MDA(·) denotes multi-head diffusion attention, W^(l) denotes a learnable weight matrix, and W_t^(l) denotes a linear transformation matrix. Given that the number of heads of MDA(·) is H, then:

MDA(X) = DA_1(X) || DA_2(X) || ... || DA_H(X)   (7)

where DA_h(·) denotes single-head diffusion attention and || denotes the concatenation operation.
Let k denote a diffusion step and K the maximum diffusion order; DA(·) is calculated from equation (8):

DA(X) = Σ_{k=0}^{K} θ_k M^k X   (8)

where θ_k denotes the diffusion weight coefficient of the corresponding diffusion step M^k X. The invention uses query-key-value attention to obtain an appropriate θ_k as follows:

θ_k = Σ_{i=0}^{K} softmax(e_{ik}) · view(S_i W_V)   (9)

where W_V denotes the transformation matrix of the Value, view denotes the reshape operation on a matrix (given an original matrix of shape R^{N×N}, the output is a single-row vector), and S_0, ..., S_K denote the input sequence of the query-key-value attention. e_{ik} denotes the attention score between the two different diffusion steps i and k, and e_{ij} the attention score between diffusion steps i and j, calculated by equation (10):

e_{ij} = (S_i W_Q)(S_j W_K)^T / √(d_qs)   (10)

where d_qs denotes the size of the Query, W_Q and W_K denote the transformation matrices of the Query and Key, respectively, and S_i and S_j are the inputs of the query-key-value attention at diffusion steps i and j.
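The query-key-value computation over diffusion-step inputs described above can be sketched as follows; note that the reduction of the attended values to one scalar θ_k per step (here, a mean) is an assumption, since the original reshape detail is not fully recoverable from the text:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def diffusion_step_attention(S, WQ, WK, WV):
    """Query-key-value attention over diffusion-step inputs S: (K+1, d).

    Scores e_ij = (S_i W_Q)(S_j W_K)^T / sqrt(d_qs); the softmax-weighted
    Values are reduced (assumed: mean) to one weight theta_k per step.
    """
    Q, Kmat, V = S @ WQ, S @ WK, S @ WV
    d_qs = WQ.shape[1]
    e = Q @ Kmat.T / np.sqrt(d_qs)          # pairwise step-to-step scores
    alpha = softmax(e)                      # row-wise attention weights
    theta = (alpha @ V).mean(axis=1)        # hypothetical scalar reduction
    return theta
```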
The output adjacency transition matrix Ã^(l) is computed through a residual connection, as follows:

Ã^(l) = Ã^(l-1) + ΔÃ^(l)   (11)

where the sum implements the residual connection and ΔÃ^(l) denotes the dynamically updated portion of the adjacency transition matrix, which is calculated as follows:

ΔÃ^(l)_{ij} = (1/M) Σ_{m=1}^{M} α^m_{ij}   (12)

where m ∈ [1, M] denotes the replica index, M denotes the number of replicas, ΔÃ^(l)_{ij} denotes the element in row i and column j of ΔÃ^(l), and α^m_{ij} denotes the attention score of the m-th replica, calculated as follows:

α^m_{ij} = softmax_j( LeakyReLU( a_m^T [h_i || h_j] ) )   (13)

where LeakyReLU denotes the activation function, a_m denotes the learnable weight vector of the m-th replica, and h_i and h_j denote rows i and j of the feature matrix H, i.e., the feature vectors of nodes i and j.
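The replica-averaged dynamic adjacency update described above can be sketched in NumPy as follows (GAT-style scoring; the 0.2 LeakyReLU slope and the explicit double loop are illustrative assumptions):

```python
import numpy as np

def dynamic_adjacency_update(H, a_list):
    """Average GAT-style attention scores over M replicas.

    H: (N, d) node feature matrix; a_list: M learnable vectors of size 2d.
    alpha^m_ij = softmax_j(LeakyReLU(a_m . [h_i || h_j])); the update
    is the mean of alpha over the M replicas.
    """
    N, _ = H.shape
    delta = np.zeros((N, N))
    for a in a_list:
        scores = np.empty((N, N))
        for i in range(N):
            for j in range(N):
                z = a @ np.concatenate([H[i], H[j]])   # a_m^T [h_i || h_j]
                scores[i, j] = z if z > 0 else 0.2 * z  # LeakyReLU
        scores -= scores.max(axis=1, keepdims=True)     # stable softmax
        e = np.exp(scores)
        delta += e / e.sum(axis=1, keepdims=True)       # row-wise softmax_j
    return delta / len(a_list)
```

Each row of the returned matrix is a probability distribution over neighbors, so it can be added to the previous adjacency transition matrix as a residual update.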
2.3 Space-time decoder for traffic flow prediction
The decoder receives the spatio-temporal features extracted by the encoder to generate the future traffic flow sequence. A single decoder layer consists of a spatio-temporal convolution module (ST-Conv Block), a diffusion graph attention module (DGA-Block), and an auxiliary module (Auxiliary Block) that aggregates information between the encoder and decoder. The input of the l-th decoder layer is H_d^(l-1) and Ã_d^(l-1), and the output of the DGA-Block of the l-th decoder layer is as follows:

H_d^(l) = ( H_d^(l-1) + MDA(H_d^(l-1)) W_d^(l) ) W_dt^(l)

where MDA(·) denotes multi-head diffusion attention, computed as in equation (7); W_d^(l) denotes a learnable weight matrix and W_dt^(l) a linear transformation matrix. Ã_d^(l) and ΔÃ_d^(l) are computed as in equations (11) and (12). H_d^(l) and Ã_d^(l), together with the encoder outputs H_e^(L) and Ã_e^(L), are fed into the auxiliary module (Auxiliary Block) to aggregate traffic flow information between the encoder and decoder.
Then, the output of the l-th decoder layer is computed by a diffusion attention analogous to equation (7), with its diffusion parameters computed as in equation (9) and its attention scores as in equation (10). Denoting the diffusion parameters and attention scores of this attention by θ'_k and e'_{ij}, respectively, θ'_k is calculated as follows:

θ'_k = Σ_{i=0}^{K} softmax(e'_{ik}) · view(S'_i W'_V)   (18)

where W'_V denotes the transformation matrix of the Value and S'_0, ..., S'_K denote the input sequence. e'_{ij} is calculated from equation (19):

e'_{ij} = (S'_i W'_Q)(S'_j W'_K)^T / √(d_qs)   (19)

where d_qs denotes the size of the Query, W'_Q and W'_K denote the transformation matrices of the Query and Key, respectively, and S'_i and S'_j are the inputs of the query-key-value attention at diffusion steps i and j.
3. Experiment
3.1 Data description
The present invention uses two traffic datasets, PeMS and METR-LA, to verify the performance of the proposed T-DGAN method. The experimental datasets contain different attributes; their details are shown in Table 1:
TABLE 1 description of experimental data sets
PeMS03 is collected every 30 seconds by the Caltrans Performance Measurement System (PeMS) and records the spatial location information of the traffic flow monitoring sensors. The number of sensors in PeMS is 555. The collection period runs from January 1, 2018 to January 31, 2018, and the traffic speed is aggregated every 5 minutes.
The METR-LA dataset is derived from loop detectors on Los Angeles highways, with a time span from March 1, 2012 to June 30, 2012; historical traffic speeds collected by 207 sensors are selected and aggregated every 5 minutes.
3.2 Experimental setup
The experiments were compiled and executed on a Windows server (CPU: Intel(R) Core(TM) @ 1.50 GHz, 16 GB RAM; GPU: NVIDIA GeForce RTX 2080 Ti), based on the PyTorch deep learning framework; the T-DGAN method was built and trained in PyCharm.
The invention divides the dataset into training, validation and test sets at a 60%:10%:30% ratio. The batch size is set to 8, the number of heads for diffusion attention and graph attention in the DGA-Block is set to 8, the dimension of the node embedding is set to 16, the maximum diffusion step is set to 3, the training epochs for the PeMS03 and METR-LA datasets are set to 60 and 80, respectively, the historical and predicted sequence lengths are both set to 12, and the method is trained with an Adam optimizer at an initial learning rate of 0.001.
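The chronological 60%:10%:30% split can be sketched as follows (an illustrative helper, not the patent's code; splitting along time preserves the temporal order between the sets):

```python
import numpy as np

def chronological_split(X, ratios=(0.6, 0.1, 0.3)):
    """Split a traffic time series along the time axis into
    train/validation/test sets at the given ratios (default 60:10:30)."""
    n = X.shape[0]
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return X[:n_train], X[n_train:n_train + n_val], X[n_train + n_val:]
```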
3.3 Evaluation index and baseline method
(1) Evaluation index
For better evaluation of the predictive performance of the method, the invention uses the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) as the evaluation indices of the T-DGAN method:
1) Mean absolute error (MAE):
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
2) Root mean square error (RMSE):
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
3) Mean absolute percentage error (MAPE):
MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|
where y_i and ŷ_i represent the actual and predicted traffic speed, respectively, and n represents the number of nodes on the traffic road network.
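The three evaluation indices can be implemented directly; a minimal NumPy sketch (the eps guard in MAPE, to avoid division by zero, is an addition not specified by the formula itself):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat, eps=1e-8):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y - y_hat) / np.maximum(np.abs(y), eps))) * 100.0
```

For example, with true speeds [100, 50] and predictions [90, 55], MAE is 7.5 and MAPE is 10%.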
(2) Baseline method
The invention is mainly compared with deep learning methods and other baseline methods; the baseline methods are as follows:
1) History averaging method (HA): the average traffic information for the historical period is used as a prediction.
2) Vector autoregression (VAR): vector autoregression models n variables over the same sample period as linear functions of their historical values.
3) Support vector regression method (SVR): support vector regression uses a linear support vector machine to train a method to obtain a relationship between input and output to predict traffic flow.
4) Feedforward Neural Network (FNN): feedforward neural network with two hidden layers and L2 regularization.
5) Autoregressive integrated moving average (ARIMA): an autoregressive integrated moving average method with a Kalman filter.
6) Long short-term memory network (FC-LSTM): a recurrent neural network with fully connected LSTM hidden units.
7) Diffusion Convolutional Recurrent Neural Network (DCRNN): combines diffusion convolution with a recurrent neural network for traffic flow prediction.
8) Graph WaveNet (G-WN): combines a graph convolutional network with a dilated causal convolutional network.
9) Space-time diagram convolutional network (STGCN): a space-time graph convolution network that combines graph convolution and one-dimensional convolution.
10) Attention-based spatio-temporal graph convolutional network (ASTGCN): integrates a spatio-temporal attention mechanism into a spatio-temporal graph convolutional network to capture the dynamic spatio-temporal patterns of traffic flow.
11) Adaptive Graph Convolutional Recurrent Network (AGCRN): improves conventional graph convolutional networks with node-adaptive parameter learning and data-adaptive graph generation modules, for learning node-specific patterns and capturing spatial correlations, respectively.
12) Graph multi-attention network (GMAN): integrates multiple spatio-temporal attention blocks in an encoder-decoder architecture, with transform attention between the encoder and decoder.
3.4 Experimental results and analysis
The present invention performs predictive visualization on the PeMS and METR-LA datasets. The time range was set to 288 time steps; nodes 11 and 190 were randomly selected for visualization in the PeMS dataset, with the results shown in Figs. 3 and 4, and nodes 119 and 176 were randomly selected for visualization in the METR-LA dataset, as shown in Figs. 5 and 6. The predictions of the T-DGAN method closely follow the true traffic speed values.
The T-DGAN method and the various baseline methods were evaluated on the PeMS and METR-LA datasets; their 15-minute, 30-minute and 60-minute prediction results are shown in Tables 2 and 3. The experimental results show that the proposed T-DGAN method achieves good prediction results on both datasets.
It can be observed from tables 2 and 3 that the predictions of conventional time series analysis methods are not ideal, indicating that these methods have limited modeling capabilities for non-linearities and high complexity of traffic flow. Meanwhile, compared with the traditional time sequence analysis method, the deep learning-based method obtains better prediction results. For example, the DCRNN, STGCN, ASTGCN method and the T-DGAN method of the present invention consider both the spatio-temporal correlation and have better performance than the conventional time series methods such as ARIMA and FC-LSTM. In addition, GMAN methods perform better than G-WN, STGCN, ASTGCN, etc., indicating that the encoder-decoder architecture used in GMAN can effectively capture the dynamic spatio-temporal correlation of traffic streams.
In contrast, the T-DGAN method provided by the invention obtains better prediction results compared with a baseline method, and proves the effectiveness of the T-DGAN method in capturing the time-space correlation of traffic flow. Meanwhile, the T-DGAN method captures the time-space correlation of traffic flow through an encoder-decoder architecture, and models the direct relation between the historical time step and the future time step by combining a time-space convolution network and a diffusion diagram attention mechanism, thereby being beneficial to alleviating the error propagation problem between the predicted time steps.
TABLE 2 Comparison of prediction performance on the PeMS dataset
TABLE 3 Comparison of prediction performance on the METR-LA dataset
In order to evaluate the contribution of the different modules in the T-DGAN method provided by the invention, ablation experiments were performed.
(1) Influence of dynamic diagram on prediction result
According to the invention, ablation experiments with dynamic and static graphs were carried out on the PeMS and METR-LA datasets to study their influence on traffic flow prediction. As can be seen from the ablation results in table 4, the dynamic graph yields better traffic flow prediction performance than the static graph.
TABLE 4 Experimental results of the dynamic and static graph settings
(2) Influence of space-time convolution (ST-Conv Block) on prediction results
To study the performance of the different modules in the T-DGAN method, a variant of the T-DGAN method (NST-Conv Block: without the spatio-temporal convolution network module) was designed to verify the effect of the spatio-temporal convolution module on prediction performance. Traffic flow predictions of the NST-Conv Block variant and the T-DGAN method on the PeMS and METR-LA datasets at 15 min, 30 min, and 60 min are shown in table 5.
TABLE 5 Comparison of the prediction results of the T-DGAN method and the variant method
At 15 minutes, compared with the NST-Conv Block method, the T-DGAN method reduced MAE on the PeMS and METR-LA datasets by about 6.67% and 1.52%, and RMSE by about 3.47% and 2.02%, respectively. At 30 minutes, MAE was reduced by about 7.16% and 2.01%, and RMSE by about 3.91% and 0.94%, respectively. At 60 minutes, MAE was reduced by about 11.56% and 2.04%, and RMSE by about 6.93% and 1.08%, respectively. As can be seen from table 5, the T-DGAN method has better prediction performance at every prediction horizon; in particular, for long-term prediction the gap between the T-DGAN and NST-Conv Block methods is more pronounced, which shows that the ST-Conv Block module effectively mitigates the influence of error propagation.
(3) Influence of dynamic adjacency matrix on prediction result
The adjacency transition matrix contains edge weight information between vertices, and the edge weights reflect the traffic flow between traffic sensors, so the dynamically updated adjacency matrix reflects the dynamically changing traffic flow on the road segment. Experiments verify that the adjacency transition matrix is dynamically updated during learning. The results on the PeMS and METR-LA datasets are shown in figs. 7 and 8, respectively; the last-batch T_e and T_d differ at randomly sampled time nodes, demonstrating that T_e and T_d change continuously during the learning process.
While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to those skilled in the art that modifications and improvements can be made. Accordingly, such modifications or improvements made without departing from the spirit of the invention are intended to fall within the claimed scope of the invention.
Claims (6)
1. A traffic flow prediction method using a Transformer-based diffusion graph attention network, adopting a Transformer encoder-decoder architecture, wherein the encoder and the decoder each comprise a plurality of spatio-temporal convolution network modules and a diffusion graph attention module; the spatio-temporal convolution network module captures the temporal and spatial dependence of traffic flow through a time-gated convolution network and a spatial convolution network, respectively; the diffusion graph attention module adaptively learns the diffusion parameter of each diffusion step using a Query-Key-Value self-attention mechanism and dynamically updates the adjacency transition matrix so as to capture the dynamic spatio-temporal dependence of traffic flow; an information auxiliary module is added to the decoder to aggregate traffic flow information between the encoder and the decoder, and finally a prediction sequence is output through the decoder for prediction;
Wherein the encoder and the decoder each have L layers; given the input X_{t-T'+1,...,t} of the traffic flow prediction method and the adjacency matrix A, they are first converted into the feature matrices X_e^(1) = X_{t-T'+1,...,t} W_e + b_e and X_d^(1) = X_{t-T'+1,...,t} W_d + b_d and the transition matrix T^(1) = D^(-1)(A + I_N), where D represents the degree matrix of A with added self-loops; W_e and W_d are the weight matrices of the encoder and decoder applied to X_{t-T'+1,...,t}, and b_e and b_d represent the biases of the encoder and decoder, respectively; T_e^(1) and T_d^(1) represent the adjacency transition matrices of the encoder and decoder, respectively; the result of the traffic flow prediction is calculated by Ŷ = X_d^(L) W_ŷ + b_ŷ, wherein W_ŷ represents the transformation matrix of the fully connected layer and b_ŷ represents the corresponding bias; the output X_e^(L) of the last encoder layer and T_e^(L) are input to the Diffusion Attention module of each decoder layer to aggregate the spatio-temporal traffic flow feature information between the encoder and decoder;
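As a concrete illustration, the fully connected prediction head described above (mapping the last decoder output through a transformation matrix and bias) can be sketched as follows; the sizes and the names `W_y`, `b_y` are illustrative assumptions, not the patent's trained parameters.

```python
import numpy as np

# Minimal sketch of the prediction head Y = X^(L) W + b:
# N sensors, d hidden features, T_future predicted time steps (all assumed).
rng = np.random.default_rng(0)
N, d, T_future = 4, 8, 12
X_L = rng.normal(size=(N, d))          # output of the last decoder layer
W_y = rng.normal(size=(d, T_future))   # transformation matrix of the FC layer
b_y = rng.normal(size=T_future)        # corresponding bias
Y_hat = X_L @ W_y + b_y                # one predicted value per sensor and step
```

Each row of `Y_hat` is the predicted traffic flow sequence of one sensor over the T future time steps.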
Representing the road network as a graph G = (V, E, A), where V represents the set of N road network nodes, E represents the set of edges, and A ∈ R^(N×N) represents the weighted adjacency matrix, with a_ij = 1 if v_i, v_j ∈ V and (v_i, v_j) ∈ E, and a_ij = 0 otherwise; at each time step t, the traffic flow X_t ∈ R^(N×C) on graph G is given, where C represents the number of features of each node;
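The graph representation G = (V, E, A) above can be sketched as follows; the 4-sensor network, its edge list, and the choice of C = 2 features are hypothetical examples, not data from the patent.

```python
import numpy as np

# Hypothetical 4-sensor road network: a_ij = 1 iff directed edge (v_i, v_j) in E.
N = 4
edges = [(0, 1), (1, 2), (2, 3), (1, 3), (3, 0)]
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = 1.0

# At each time step t the graph signal X_t is an N x C matrix,
# e.g. C = 2 features (speed, occupancy) per sensor.
C = 2
X_t = np.random.default_rng(0).normal(size=(N, C))
```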
The learning function f of the traffic flow prediction method takes X_{t-T'+1,...,t} as input and predicts the traffic flow of T future time steps, with the mapping relation: [X_{t-T'+1,...,t}; G] → f → X̂_{t+1,...,t+T}.
2. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, wherein the time-gated convolution layer comprises a one-dimensional convolution and uses a gated linear unit to capture the temporal dependence of traffic flow; for each node in the traffic network G, the temporal convolution explores adjacent time steps of the input with zero padding so that the size of the time dimension remains unchanged; given the temporal convolution input x ∈ R^(P×D_in) of each node, i.e. a sequence of length P with D_in features, a 1D convolution kernel Γ ∈ R^(K_t×D_in×2D_out) with kernel size (K_t, 1), input size D_in and output size 2D_out gives the output [P Q] ∈ R^(P×2D_out); [P Q] is split into the two parts P and Q along the feature dimension and input into the gated linear unit; the time-gated convolution layer can be expressed as Γ * x = P ⊙ σ(Q);
wherein P and Q are the inputs of the gates in the gated linear unit, ⊙ represents the element-wise Hadamard product, and σ(Q) uses a Sigmoid function as the activation function to selectively retain information from the hidden state and the input X.
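The gated temporal convolution of claim 2 can be sketched as below, assuming random illustrative weights; the helper `time_gated_conv` and its argument shapes are our own naming, and the zero-padding scheme keeps the output length equal to the input length as the claim requires.

```python
import numpy as np

def time_gated_conv(x, W, b):
    """Sketch of P * sigmoid(Q) gating: x is (T, D_in), W is
    (K_t, D_in, 2*D_out), b is (2*D_out,); output is (T, D_out)."""
    T, d_in = x.shape
    K_t, _, two_d_out = W.shape
    d_out = two_d_out // 2
    pad = K_t // 2
    xp = np.pad(x, ((pad, K_t - 1 - pad), (0, 0)))   # zero padding in time
    out = np.zeros((T, two_d_out))
    for t in range(T):
        # 1D convolution over the K_t-step window ending the sum over k and i.
        out[t] = np.einsum('ki,kio->o', xp[t:t + K_t], W) + b
    P, Q = out[:, :d_out], out[:, d_out:]            # split along features
    return P * (1.0 / (1.0 + np.exp(-Q)))            # P ⊙ sigmoid(Q)

rng = np.random.default_rng(1)
x = rng.normal(size=(12, 4))         # P = 12 time steps, D_in = 4
W = rng.normal(size=(3, 4, 8))       # K_t = 3, 2*D_out = 8
y = time_gated_conv(x, W, rng.normal(size=8))
```

Note that the time dimension of `y` is still 12, matching the claim that zero padding keeps the temporal size unchanged.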
3. The Transformer-based diffusion graph attention network traffic flow prediction method according to claim 1, wherein the graph convolution operation aggregates the features of neighbor nodes to the center node based on the graph structure to update the node features, and the graph convolution network simplifies ChebNet by a first-order approximation: Z = ÃXW̃, wherein Ã = D̃^(-1/2)(A + I_N)D̃^(-1/2) represents the normalized adjacency matrix with self-loops, X ∈ R^(N×D_in) represents the input graph signal of N nodes with D_in features, Z ∈ R^(N×D_out) represents the output, and W̃ represents a learnable parameter matrix; the basic graph convolution network is only applicable to undirected graphs and does not conform to the directed nature of the traffic network; to facilitate convolution on the directed graph, the diffusion convolution can be generalized to the form of equation (4): Z = Σ_{k=0}^{K} M^k X W_k;
wherein M^k represents the k-th power of the transition matrix and K represents the number of diffusion steps; in the directed graph, the diffusion process is divided into forward and backward directions, where the forward transition matrix is M_f = A / rowsum(A) and the backward transition matrix is M_b = A^T / rowsum(A^T).
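The bidirectional K-step diffusion convolution of equation (4) can be sketched as follows, with a random hypothetical adjacency matrix and weights; a self-loop is added on the diagonal only so that every row sum is nonzero in this toy example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D_in, D_out, K = 4, 3, 5, 2

A = rng.random((N, N)) * (rng.random((N, N)) > 0.5)
np.fill_diagonal(A, 1.0)                       # guarantee nonzero row sums
M_f = A / A.sum(axis=1, keepdims=True)         # forward: A / rowsum(A)
M_b = A.T / A.T.sum(axis=1, keepdims=True)     # backward: A^T / rowsum(A^T)

X = rng.normal(size=(N, D_in))                 # graph signal
H = np.zeros((N, D_out))
for M in (M_f, M_b):                           # both diffusion directions
    P = np.eye(N)                              # M^0
    for k in range(K + 1):
        W_k = rng.normal(size=(D_in, D_out))   # per-step learnable weights
        H += P @ X @ W_k                       # the M^k X W_k term of the sum
        P = P @ M                              # advance to the next power
```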
4. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, wherein the spatial convolution layer captures the local and global spatial dependence of traffic flow; a K-step diffusion convolution is performed in both the forward and backward directions using the predefined weighted adjacency matrix to capture K-order local spatial dependence, corresponding to equation (4); formally, given a spatial convolution layer input H^(l) ∈ R^(T×N×D), a spatial convolution operation is performed for each time slot H_t^(l) of the input tensor, and the calculation process can be expressed as H_t^(l+1) = Σ_{k=0}^{K} (M_f^k H_t^(l) W_{f,k} + M_b^k H_t^(l) W_{b,k}), where W_{f,k} and W_{b,k} represent learnable parameter matrices convolved with the adjacency matrix A.
5. The Transformer-based diffusion graph attention network traffic flow prediction method according to claim 1, wherein in the diffusion graph attention network encoder, taking layer l as an example, given the inputs X_e^(l) and T_e^(l), the output feature matrix X_e^(l+1) is obtained by applying multi-head diffusion attention followed by a residual connection:
wherein ResConn(·) represents the residual connection, MDA(·) represents the multi-head diffusion attention, W represents a learnable weight matrix, and W_O represents a linear transformation matrix; given that the number of heads of MDA(·) is H, then: MDA(X_e^(l), T_e^(l)) = ‖_{h=1}^{H} DA_h(X_e^(l), T_e^(l)) (7);
wherein DA_h(·) represents single-head diffusion attention and ‖ represents the concatenation operation; k represents the diffusion step and K represents the maximum diffusion order; DA(·) is calculated from equation (8): DA(X_e^(l), T_e^(l)) = Σ_{k=0}^{K} θ_k (T_e^(l))^k X_e^(l);
wherein θ_k represents the diffusion weight coefficient; for the corresponding diffusion step (T_e^(l))^k, Query-Key-Value attention is used to obtain the appropriate θ_k, as follows: θ_k = Σ_i softmax(e_ik) s_i W_V (9);
wherein W_V represents the transformation matrix of the Value, and view(·) represents the reshape operation of the matrix: given an original matrix of shape R^(N×N), it outputs a single-row vector of dimension R^(1×N²); s_i = view((T_e^(l))^i) represents the input sequence of the Query-Key-Value attention; e_ik represents the attention score between the two different diffusion steps i and k, and e_ij represents the attention score between the two different diffusion steps i and j; e_ij is calculated by equation (10): e_ij = (s_i W_Q)(s_j W_K)^T / √d_qs;
wherein d_qs represents the size of the Query, and W_Q and W_K represent the transformation matrices of the Query and Key, respectively; s_i and s_j represent the inputs of the Query-Key-Value attention for diffusion steps i and j, respectively;
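The Query-Key-Value attention over diffusion steps in equations (9)-(10) can be sketched as follows; the sizes, the random transition matrix, and the scalar Value projection `W_V` are illustrative assumptions, not the patented parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, d = 3, 2, 4                          # nodes, max diffusion order, dim

T_e = rng.random((N, N))
T_e /= T_e.sum(axis=1, keepdims=True)      # row-stochastic transition matrix

# s_k = view((T_e)^k): each N x N power reshaped to a single-row vector.
S = np.stack([np.linalg.matrix_power(T_e, k).reshape(-1) for k in range(K + 1)])

W_Q = rng.normal(size=(N * N, d))          # Query transformation
W_Kt = rng.normal(size=(N * N, d))         # Key transformation
W_V = rng.normal(size=(N * N, 1))          # Value transformation (scalar output)

# e_ij = (s_i W_Q)(s_j W_K)^T / sqrt(d_qs), as in equation (10).
E = (S @ W_Q) @ (S @ W_Kt).T / np.sqrt(d)

# Row-wise softmax over the attention scores, then the Value projection
# yields one diffusion weight theta_k per diffusion step, as in equation (9).
expE = np.exp(E - E.max(axis=1, keepdims=True))
alpha = expE / expE.sum(axis=1, keepdims=True)
theta = (alpha @ (S @ W_V)).ravel()        # (K+1,) diffusion weights
```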
the output adjacency transition matrix T_e^(l+1) is calculated through a residual connection as follows: T_e^(l+1) = ResConn(T̂_e^(l), T_e^(l)) (11);
wherein ResConn(·) represents the residual connection and T̂_e^(l) represents the dynamically updated part of the adjacency transition, which is calculated as follows: T̂_e,ij^(l) = (1/M) Σ_{m=1}^{M} softmax(ê_ij^m) (12);
wherein m ∈ [1, M] represents the replica index, M represents the number of replicas, T̂_e,ij^(l) represents the element in row i and column j of T̂_e^(l), and ê_ij^m represents the attention score of the m-th replica, which is calculated as follows: ê_ij^m = LeakyReLU(a_m^T [x_i^(l) ‖ x_j^(l)]) (13);
wherein LeakyReLU represents an activation function and a_m(·) represents the learnable weight vector of the m-th replica; x_i^(l) and x_j^(l) represent rows i and j of the feature matrix X_e^(l), i.e. the feature vectors of nodes i and j, respectively.
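The GAT-style score of equation (13) and the row-wise softmax of equation (12) can be sketched for a single replica m as follows; the feature sizes and random weight vector are illustrative assumptions, and averaging over M replicas is omitted for brevity.

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    # LeakyReLU with a conventional negative slope of 0.2 (assumed value).
    return np.where(z > 0, z, alpha * z)

rng = np.random.default_rng(3)
N, D = 4, 6
X = rng.normal(size=(N, D))            # feature matrix, rows = node features
a_m = rng.normal(size=2 * D)           # learnable weight vector of replica m

# e_ij^m = LeakyReLU(a_m^T [x_i || x_j]) for every node pair (i, j).
E = np.empty((N, N))
for i in range(N):
    for j in range(N):
        E[i, j] = leaky_relu(a_m @ np.concatenate([X[i], X[j]]))

# Row-wise softmax gives the dynamically updated part of the adjacency
# transition for this replica.
expE = np.exp(E - E.max(axis=1, keepdims=True))
T_update = expE / expE.sum(axis=1, keepdims=True)
```

Each row of `T_update` is a probability distribution over neighbors, so it can serve as one replica's update to the row-stochastic adjacency transition matrix.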
6. The Transformer-based diffusion graph attention network traffic flow prediction method of claim 1, wherein a spatio-temporal decoder for traffic flow prediction receives the spatio-temporal features extracted by the encoder to generate the future traffic flow sequence; a single-layer decoder consists of a spatio-temporal convolution module, a diffusion graph attention module, and an auxiliary module aggregating information between the encoder and decoder; the inputs of the layer-l decoder are X_d^(l) and T_d^(l), and the output of the DGA-Block module of the layer-l decoder is as follows:
wherein MDA(·) represents the multi-head diffusion attention, whose calculation process is the same as that of formula (7); W represents a learnable weight matrix and W_O represents a linear transformation matrix; T̂_d^(l) and ê_ij^m are calculated in the same way as in formulas (11) and (12); the DGA-Block output and T_d^(l), together with X_e^(L) and T_e^(L), are passed to the auxiliary module to aggregate the traffic flow information between the encoder and decoder; then, the output of the layer-l decoder is as follows:
wherein IDA(·) represents the information-aided diffusion attention; its calculation process is similar to that of formula (7), its diffusion parameters follow the calculation formula (9), and its attention scores follow formula (10); denoting θ'_k and e'_ij as the diffusion parameters and attention scores of IDA(·), the calculation process of θ'_k is as follows:
wherein W'_V represents the transformation matrix of the Value and s'_i represents the input sequence; e'_ij is calculated from equation (19):
wherein d_qs represents the size of the Query, and W'_Q and W'_K represent the transformation matrices of the Query and Key, respectively; s'_i and s'_j represent the inputs of the Query-Key-Value attention for diffusion steps i and j, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310483068.9A CN116504060B (en) | 2023-05-01 | 2023-05-01 | Diffusion diagram attention network traffic flow prediction method based on Transformer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116504060A CN116504060A (en) | 2023-07-28 |
CN116504060B true CN116504060B (en) | 2024-05-14 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112071065A (en) * | 2020-09-16 | 2020-12-11 | 山东理工大学 | Traffic flow prediction method based on global diffusion convolution residual error network |
CN113450568A (en) * | 2021-06-30 | 2021-09-28 | 兰州理工大学 | Convolutional network traffic flow prediction method based on space-time attention mechanism |
CN113487088A (en) * | 2021-07-06 | 2021-10-08 | 哈尔滨工业大学(深圳) | Traffic prediction method and device based on dynamic space-time diagram convolution attention model |
CN115482656A (en) * | 2022-05-23 | 2022-12-16 | 汕头大学 | Method for predicting traffic flow by using space dynamic graph convolution network |
CN115828990A (en) * | 2022-11-03 | 2023-03-21 | 辽宁大学 | Time-space diagram node attribute prediction method for fused adaptive graph diffusion convolution network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215223B (en) * | 2020-10-16 | 2024-03-19 | 清华大学 | Multidirectional scene character recognition method and system based on multi-element attention mechanism |
CN113672865A (en) * | 2021-07-27 | 2021-11-19 | 湖州师范学院 | Traffic flow prediction method based on depth map Gaussian process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||