CN114493014A - Multivariate time series prediction method, multivariate time series prediction system, computer product and storage medium - Google Patents
- Publication number: CN114493014A
- Application number: CN202210107028.XA
- Authority: CN (China)
- Prior art keywords: time series, vector, space, term, feature
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem" (G—Physics; G06—Computing; G06Q—ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes; G06Q10/00—Administration; Management)
- G06N3/045 — Combinations of networks (G06N—Computing arrangements based on specific computational models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/08 — Learning methods (G06N3/02—Neural networks)
Abstract
The invention discloses a multivariate time series prediction method, system, computer product, and storage medium. Two feature extraction encoders separately extract spatio-temporal feature vectors from the long-term and short-term historical data matrices: each historical time series matrix is fed into a spatial feature extraction encoder to produce weighted attention spatial feature vectors, which are then fed into a gated recurrent unit to produce spatio-temporal feature vectors. The spatio-temporal feature vectors extracted from the long-term historical data matrix are passed through an interactive attention module to produce weighted feature vectors. The short-term historical data matrix is fed into an autoregressive layer to produce a linear prediction of the short-term historical time series. The weighted feature vectors and the encoded feature vector are combined and fed into a fully connected layer to produce a neural network prediction, which is added to the autoregressive layer's linear prediction to obtain the final result. The invention thereby achieves accurate prediction of multivariate time series data.
Description
Technical Field
The invention relates to the field of multivariate time series data prediction, and in particular to a multivariate time series prediction method and system based on feature extraction encoding and an interactive attention module.
Background
With the development of big data technology and the rapid growth of data, predicting future states from time series data has wide application, such as traffic flow prediction on transport routes, stock price prediction in stock markets, and air quality index prediction in different cities. Accurate prediction of new trends or potential events is often what users truly care about; it provides powerful support for future decision making and planning and helps enable advanced applications. However, the complex periodic patterns and dependencies in time series data are not yet well modeled, and the industry has studied these problems in depth.
Research shows that time series data exhibit complex periodic patterns together with temporal and spatial dependencies. Accurately mining short-term and long-term periodic patterns, learning the temporal and spatial dependencies, and effectively combining information from other variables by extracting the internal relations among historical time series therefore remain difficult, and accurate time series prediction is still a challenging task.
To address the time series prediction problem, many methods have been studied. Broadly, they fall into two categories:
The first category comprises traditional time series prediction models, generally statistical models for time series analysis and prediction [1,2,3,4]; common examples include mean regression, autoregressive moving average, and exponential smoothing. These methods model historical data to extract its trend and then project the trend forward to obtain the change in demand over a future period. They have low complexity and high computation speed, and they can provide a reasonable range for the prediction result, which helps calibrate the final output of a black-box model and makes the prediction system more stable.
However, these traditional regression-based models typically assume that the time series follows some distribution or functional form and then analyze and predict on that basis. Such analysis cannot capture complex nonlinear relationships in the sequence and performs unsatisfactorily on many datasets. Furthermore, conventional models based on functional distributions are not suitable for multivariate sequence analysis: models like the autoregressive integrated moving average (ARIMA) can only capture the distribution of a single sequence's history; they can neither model the complex patterns or dependencies in real data nor capture and integrate the relationships between sequences.
With the growth of computing power and advances in machine learning theory, deep neural networks have drawn increasing attention in many fields, and the second research direction, using machine learning to extract features and learn data correlations for prediction, has become popular [5]. A common approach combines an autoencoder [6] or convolutional neural network [7] with a recurrent neural network [8,9] (e.g., gated recurrent units): the autoencoder or convolutional neural network captures the internal spatial relationships of the time series data at each moment, the gated recurrent unit learns the temporal relationships, and the spatio-temporal relationships are finally integrated for prediction.
However, because of the vanishing gradient problem [10], conventional recurrent neural networks do not handle long sequences or tasks requiring long-term historical information well. To address vanishing gradients, researchers proposed the long short-term memory (LSTM) network and the gated recurrent unit (GRU). The latter achieves similar results with less computation and is therefore widely used in current research. Subsequent studies introduced attention mechanisms that let the model selectively focus on part of the available information while ignoring the rest.
Compared with single neural networks, hybrid neural networks [11,12,13] perform better. Finally, and most importantly, we note that time series data contain complex periodic patterns and spatio-temporal correlations, and multi-task learning can effectively improve a model's generalization and accuracy. Existing prediction algorithms are usually insufficient at capturing trend features and multi-period patterns across multivariate time series. For example, in a TCN-based multivariate time series prediction method with a parallel spatio-temporal attention mechanism, the module that obtains spatio-temporal features applies only a simple linear transformation and normalization to the raw input, which cannot capture the complex spatio-temporal correlations among historical time series; moreover, a TCN needs a complete time series to predict.
[1] Elvin Isufi, Andreas Loukas, Nathanael Perraudin, and Geert Leus. Forecasting time series with VARMA recursions on graphs. IEEE Transactions on Signal Processing, 67(18):4870–4885, 2019.
[2] Helmut Lütkepohl. New Introduction to Multiple Time Series Analysis. Springer Science & Business Media, 2005.
[3] Jiahan Li and Weiye Chen. Forecasting macroeconomic time series: Lasso-based approaches and their forecast combinations with dynamic factor models. International Journal of Forecasting, 30(4):996–1015, 2014.
[4] George E. P. Box, Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
[5] Hanpeng Liu, et al. CoSTCo: A neural tensor completion model for sparse tensors. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 324–334, 2019.
[6] Jicong Fan and Tommy W. S. Chow. Deep learning based matrix completion. Neurocomputing, 266:540–549, 2017.
[7] Yann LeCun, Léon Bottou, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[8] Zhuoning Yuan, Xun Zhou, and Tianbao Yang. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 984–992, 2018.
[9] Xu Zhang, Furao Shen, Jinxi Zhao, and Guohai Yang. Time series forecasting using GRU neural network with multi-lag after decomposition. In International Conference on Neural Information Processing, pages 523–532. Springer, 2017.
[10] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[11] Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 95–104, 2018.
[12] Huaxiu Yao, Xianfeng Tang, Hua Wei, Guanjie Zheng, and Zhenhui Li. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 5668–5675, 2019.
[13] Zhenxiong Yan, Kun Xie, Xin Wang, Dafang Zhang, Gaogang Xie, Kenli Li, and Jigang Wen. Multivariate time series forecasting exploiting tensor projection embedding and gated memory network. In 29th IEEE/ACM International Symposium on Quality of Service (IWQoS 2021), Tokyo, Japan, June 25–28, 2021, pages 1–6. IEEE, 2021.
Disclosure of Invention
The technical problem addressed by the invention is, in view of the defects of the prior art, to provide a multivariate time series prediction method, system, computer product, and storage medium that overcome the prior art's inability to learn complex periodic patterns and temporal and spatial dependencies well and its low prediction accuracy, and that achieve accurate prediction of multivariate time series data whose spatio-temporal relationships change dynamically.
To solve the above technical problems, the invention adopts the following technical scheme. A multivariate time series prediction method comprises the following steps:
S1. Partition the historical time series data [X_{k-nT}, ..., X_{k-1}] into long-term historical time series data [X_{k-nT}, ..., X_{k-T-1}] and short-term historical time series data [X_{k-T}, ..., X_{k-1}], where T is the window length of the long-term or short-term historical time series segments;
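The S1 partition can be sketched minimally as follows; the concrete values n = 8, T = 24, and d = 5 and the variable names are illustrative stand-ins, not taken from the patent:

```python
import numpy as np

# Hypothetical sketch of step S1: a history of n*T steps is split into a
# long-term segment [X_{k-nT}, ..., X_{k-T-1}] and a short-term segment
# [X_{k-T}, ..., X_{k-1}] of the most recent T steps.
n, T, d = 8, 24, 5                      # d = number of variables per time step
history = np.random.rand(n * T, d)      # rows: time steps, columns: variables

long_term = history[: (n - 1) * T]      # oldest (n-1)*T steps
short_term = history[(n - 1) * T:]      # most recent T steps
```

Stacking the two segments back together recovers the original history, so the split loses no data.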
S2. Use the long-term historical time series data [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time series data [X_{k-T}, ..., X_{k-1}] to obtain the weighted attention spatial feature vectors of the long-term historical time series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} and of the short-term historical time series [z_{k-T}, ..., z_{k-1}]; input the short-term historical time series data [X_{k-T}, ..., X_{k-1}] into the autoregressive layer to obtain a linear prediction matrix;
S3. From the acquired weighted attention spatial feature vectors of the long-term historical time series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} and of the short-term historical time series [z_{k-T}, ..., z_{k-1}], obtain respectively a sequence of feature code vectors {m_i} containing spatio-temporal correlations and a feature code vector u containing spatio-temporal correlations;
S4. Input {m_i} and u into the interactive attention network to obtain the attention weight distribution vector p_i, and compute the product of p_i and {m_i} to obtain the weighted feature vectors o_i;
S5. Combine the weighted feature vectors o_i of size T × d with the feature code vector u of size 1 × d containing spatio-temporal correlations into a new vector of size (T+1) × d, and input the combined vector into the generative model to produce a nonlinear prediction matrix;
S6. Add the prediction matrix output by the generative model to the prediction matrix output by the autoregressive layer to obtain the final prediction result.
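The S6 combination is an element-wise sum of the two forecasts, sketched here with placeholder values (the numbers are illustrative, not results from the patent):

```python
import numpy as np

# Step S6: the network's nonlinear forecast and the autoregressive linear
# forecast are simply summed, so the linear part acts as a scale-sensitive
# baseline and the network models the residual nonlinearity.
y_nonlinear = np.array([0.2, -0.1, 0.05])   # placeholder network output
y_linear = np.array([1.0, 2.0, 3.0])        # placeholder AR output
y_final = y_nonlinear + y_linear
```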
The feature extraction encoders used by the invention significantly improve the speed of feature learning; the interactive attention network captures as much of the correlation among spatio-temporal feature vectors as possible, largely preserving the useful information in the historical data, so accurate prediction of multivariate time series data can be achieved. The invention thus overcomes the prior art's inability to learn complex periodic patterns and temporal and spatial dependencies well and its low prediction accuracy, and achieves accurate prediction of multivariate time series data whose spatio-temporal relationships change dynamically. The specific implementation of step S2 is as follows: pass the long-term historical time series data [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time series data [X_{k-T}, ..., X_{k-1}] through a first and a second spatial feature extractor, respectively, to obtain the spatial feature vectors of the long-term historical time series [e_{k-nT}, ..., e_{k-T-1}] and of the short-term historical time series [e_{k-T}, ..., e_{k-1}]; then input [e_{k-nT}, ..., e_{k-T-1}] and [e_{k-T}, ..., e_{k-1}] into a first and a second attention layer, respectively, to obtain the weighted attention spatial feature vectors of the long-term historical time series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} and of the short-term historical time series [z_{k-T}, ..., z_{k-1}]. Because the matrix sequences contain complex spatio-temporal correlations, the spatial feature extractor captures the spatial relations of adjacent positions in the matrix, and the attention mechanism further captures the influence between data at different positions, yielding the weighted attention spatial feature vectors.
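The extractor-plus-attention stage of S2 might be sketched as below; the one-dimensional kernel over adjacent variables, the projection, and the mean-based scoring are simplified stand-ins for the patent's convolutional and attention layers, with random placeholder weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_features(X, W_conv, W_fc):
    """Toy spatial extractor: slide a 1-D kernel over adjacent variables at
    each time step, then apply a fully connected projection with tanh."""
    k = W_conv.shape[0]                      # convolution kernel width
    conv = np.array([[X[t, j:j + k] @ W_conv
                      for j in range(X.shape[1] - k + 1)]
                     for t in range(X.shape[0])])
    return np.tanh(conv @ W_fc)              # one feature vector e_t per step

def attention_weight(E):
    """Score each step's feature against the mean feature and reweight,
    producing the weighted attention spatial vectors z_t."""
    scores = softmax(E @ E.mean(axis=0))
    return scores[:, None] * E
```

A practical implementation would learn `W_conv` and `W_fc` by backpropagation; here they are fixed random matrices just to show the data flow.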
The specific implementation of step S3 is as follows: input the acquired weighted attention spatial feature vectors of the long-term historical time series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} and of the short-term historical time series [z_{k-T}, ..., z_{k-1}] into a first and a second gated recurrent unit in time order, which output respectively the sequence of feature code vectors {m_i} containing spatio-temporal correlations and the feature code vector u containing spatio-temporal correlations. The gated recurrent unit captures nonlinear temporal relations, and the resulting spatio-temporal feature vectors retain the spatio-temporal information hidden in the historical data.
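A hand-rolled GRU over the weighted spatial vectors illustrates how {m_i} and u could be produced; the gate equations are the standard GRU update, but the untrained random weights and the helper names are illustrative, not from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, z, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: hidden state h accumulates temporal context from the
    weighted spatial vector z (standard update/reset-gate equations)."""
    u = sigmoid(z @ Wz + h @ Uz)              # update gate
    r = sigmoid(z @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(z @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - u) * h + u * h_tilde

def encode(Z, dim):
    """Run the GRU over a sequence of weighted spatial vectors, returning all
    hidden states (the m_i sequence); the last one plays the role of u."""
    rng = np.random.default_rng(0)
    Ws = [rng.standard_normal((Z.shape[1], dim)) * 0.1 for _ in range(3)]
    Us = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(3)]
    h, out = np.zeros(dim), []
    for z in Z:
        h = gru_step(h, z, Ws[0], Us[0], Ws[1], Us[1], Ws[2], Us[2])
        out.append(h)
    return np.stack(out)
```

Because each update is a convex combination of the previous state and a tanh candidate, every hidden coordinate stays strictly inside (-1, 1).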
o_i = p_i × m_i;  p_i = Softmax(u^T m_i).
The obtained sequence of feature code vectors {m_i} containing spatio-temporal correlations and the feature code vector u containing spatio-temporal correlations are input into the interactive attention network, which better captures the relations between the spatio-temporal feature vectors output by the prediction encoders. The attention weight distribution vector p_i is obtained by taking the inner product of the feature vectors and applying Softmax(), and the weighted feature vectors o_i follow as the product. Through the interactive attention network, the method further mines and dynamically captures the interdependencies between the feature vector matrices, so the final spatio-temporal feature vectors largely retain the useful information in the historical data.
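The formulas p_i = Softmax(u^T m_i) and o_i = p_i × m_i translate almost directly into code; this sketch stacks the m_i as rows of a matrix M:

```python
import numpy as np

def interactive_attention(M, u):
    """Interactive attention as in the formulas above: score every long-term
    code vector m_i against the short-term code u, normalize with Softmax,
    and reweight. M is (T, d); u is (d,)."""
    scores = M @ u                               # u^T m_i for each i
    e = np.exp(scores - scores.max())            # numerically stable Softmax
    p = e / e.sum()                              # attention weight distribution
    return p[:, None] * M                        # weighted feature vectors o_i
```

With M = I and u aligned to the first row, the first row receives the largest weight, as expected from the inner-product scoring.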
In step S5, the generative model comprises multiple cascaded fully connected layers. The historical time series data are weighted and mixed through the weight matrices, an activation function introduces nonlinearity, and the relations between different elements of the matrix are finally extracted. In principle, the more fully connected layers, the stronger the extraction capability of the feature extraction network.
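The cascaded fully connected layers can be sketched as follows; the layer sizes, random placeholder weights, and ReLU choice are assumptions for illustration (the patent only specifies fully connected layers with an activation function):

```python
import numpy as np

def mlp_predict(v, layer_dims, seed=0):
    """Cascaded fully connected layers with a nonlinearity between them.
    v is the flattened (T+1) x d combined vector; weights are placeholders."""
    rng = np.random.default_rng(seed)
    h = v
    for i, (d_in, d_out) in enumerate(zip(layer_dims[:-1], layer_dims[1:])):
        W = rng.standard_normal((d_in, d_out)) * 0.1
        h = h @ W                          # weighted mixing by the layer matrix
        if i < len(layer_dims) - 2:
            h = np.maximum(h, 0.0)         # ReLU adds the nonlinear relation
    return h
```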
The invention also provides a multivariate time series prediction system, which comprises:
a first feature extraction encoder, whose input is the long-term historical time series data [X_{k-nT}, ..., X_{k-T-1}] and whose output is the sequence of feature code vectors {m_i} containing spatio-temporal correlations;
a second feature extraction encoder, whose input is the short-term historical time series data [X_{k-T}, ..., X_{k-1}] and whose output is the feature code vector u containing spatio-temporal correlations;
an autoregressive layer, whose input is the short-term historical time series data [X_{k-T}, ..., X_{k-1}] and whose output is a linear prediction matrix;
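The autoregressive layer can be sketched as a linear map over each variable's last T values, in the style of LSTNet's highway component; the shared-lag-weights design and the function name are assumptions, not spelled out in the patent:

```python
import numpy as np

def ar_forecast(X_short, weights, bias):
    """Autoregressive component: each variable's next value is a linear
    combination of its own last T values, using one shared set of lag
    weights across variables. X_short: (T, d); weights: (T,)."""
    return X_short.T @ weights + bias            # -> (d,) linear prediction
```

With weights that select only the most recent step, the forecast simply repeats the last observation, a useful sanity check.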
an interactive attention network, whose input is the sequence of feature code vectors {m_i} containing spatio-temporal correlations and the feature code vector u containing spatio-temporal correlations, and whose output is a new vector of size (T+1) × d;
a generative model, whose input is the new vector of size (T+1) × d and whose output is a nonlinear prediction matrix;
a prediction module, which adds the prediction matrix output by the generative model to the prediction matrix output by the autoregressive layer to obtain the final prediction result;
wherein the long-term historical time series data [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time series data [X_{k-T}, ..., X_{k-1}] are obtained by partitioning the historical time series data [X_{k-nT}, ..., X_{k-1}], and T is the window length of the long-term or short-term historical time series segments.
In the multivariate time series prediction system provided by the invention, the feature extraction encoder first extracts, from the spatio-temporal information contained in the matrix sequence, the spatial relations of adjacent positions in the matrix, further captures the correlations between data across different time dimensions, and captures nonlinear temporal relations, so the resulting spatio-temporal feature vectors retain the spatio-temporal information hidden in the historical data. The interactive attention network then further mines and dynamically captures the interdependencies between the feature vector matrices, so the final spatio-temporal feature vectors largely retain the useful information in the historical data. The method integrates the further-captured spatio-temporal feature vectors with the spatial feature information of the current moment, trains the generative model through a loss function, and finally combines the linear prediction output by the autoregressive layer to produce the prediction for future moments. It thereby overcomes the prior art's inability to learn complex periodic patterns and temporal and spatial dependencies well and its low prediction accuracy, and achieves accurate prediction of multivariate time series data whose spatio-temporal relationships change dynamically.
The first feature extraction encoder and the second feature extraction encoder have the same structure. The first feature extraction encoder comprises:
a first spatial feature extractor, whose input is the long-term historical time series data [X_{k-nT}, ..., X_{k-T-1}] and whose output is the spatial feature vectors of the long-term historical time series [e_{k-nT}, ..., e_{k-T-1}];
a first attention layer, whose input is [e_{k-nT}, ..., e_{k-T-1}] and whose output is the weighted attention spatial feature vectors of the long-term historical time series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]};
a first gated recurrent unit, whose input is the weighted attention spatial feature vectors of the long-term historical time series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} and whose output is the sequence of feature code vectors {m_i} containing spatio-temporal correlations.
Adopting the feature extraction encoder significantly improves the speed at which the model learns the spatio-temporal correlations of the sequence data. The feature extraction encoder and the gated recurrent unit extract the spatio-temporal relations of adjacent positions in the matrix from the spatio-temporal information contained in the matrix sequence, and the attention layer further captures the correlations between data across different time dimensions. The gated recurrent unit captures nonlinear temporal relations, and the resulting spatio-temporal feature vectors retain the spatio-temporal information hidden in the historical data.
The interactive attention network is configured to: take {m_i} and u as inputs of the self-attention layer to obtain the attention weight distribution vector p_i; compute the product of p_i and {m_i} to obtain the weighted feature vectors o_i; and combine the weighted feature vectors o_i of size T × d with the feature code vector u of size 1 × d containing spatio-temporal correlations into a new vector of size (T+1) × d.
As one inventive concept, the invention also provides a computer-readable storage medium comprising a program executed by a processor, the program being configured to perform the steps of the method of the invention.
As one inventive concept, the invention also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of the invention.
In the invention, the long-term historical time series can be divided into several groups, for example 7 groups, and a sliding window can be applied to the long-term historical time series so that the long-term and short-term series all have the specified window length T (because the feature extraction encoders have the same structure, setting a sliding window lets different groups of data be input while ensuring every input has length T).
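The sliding-window grouping can be sketched as follows; the stride parameter is an illustrative generalization, since the patent only requires that every group have length T:

```python
import numpy as np

def sliding_windows(seq, T, stride):
    """Cut the long-term history into fixed-length windows of T steps so every
    group fed to the (identically structured) encoders has the same length."""
    return [seq[s:s + T] for s in range(0, len(seq) - T + 1, stride)]
```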
Compared with the prior art, the invention has the following beneficial effects. The invention arranges the historical time series data matrices into a sequence in time order; this sequence contains the spatial and temporal correlations among the data. The historical time series data are divided into a long-term and a short-term historical sequence, arranged in time order as input to the feature extraction encoders.

The multivariate time series prediction method based on feature extraction encoding and an interactive attention module first extracts the spatio-temporal information contained in the matrix sequence through a feature extraction encoder and a gated recurrent unit. In this process, the feature extraction encoder extracts the spatial relations of adjacent positions in the matrix, the attention layer further captures the correlations between data across different time dimensions, and the gated recurrent unit captures nonlinear temporal relations, so the resulting spatio-temporal feature vectors retain the spatio-temporal information hidden in the historical data. The method then further mines and dynamically captures the interdependencies between the feature vector matrices through an interactive attention network, so the final spatio-temporal feature vectors largely retain the useful information in the historical data. Next, the method integrates the further-captured spatio-temporal feature vectors with the spatial feature information of the current moment, trains the generative model with a composite loss function, and finally combines the linear prediction output by the autoregressive layer to produce the prediction for future moments.

In this process, compared with other algorithms, spatio-temporal predictive coding markedly improves the speed of representation learning, and joint training with a relative root mean square error loss and an empirical correlation coefficient markedly accelerates the convergence of the generative model. The invention can feed long-term historical data from a memory component into the feature extraction encoder, and when data for a new moment arrives, only a few steps of the method need to be executed to complete the data filling, instead of retraining the model parameters from scratch as other algorithms do, so the time and computational complexity of the method are low. By introducing the interactive attention network, the correlations among spatio-temporal feature vectors are captured as fully as possible and the useful information in the historical data is largely retained, enabling accurate prediction of multivariate time series data.

The method uses two independent feature extraction encoders to acquire, accurately and efficiently, the complex spatio-temporal correlations among historical time series data: it first extracts spatial features through a convolutional neural network and weights them through an attention layer, further capturing spatial correlations; it then acquires the weighted attention spatio-temporal features through the gated recurrent unit, alleviating the long-dependency problem of RNNs. The long-term historical time series data fully acquire the influence between different time steps through the interactive attention network, while the short-term historical data further improve prediction accuracy and experimental robustness through the autoregressive component.
Drawings
FIG. 1 is a network model for predicting future time data of a plurality of time-series data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of extracting spatial information from historical time-series data and outputting a spatial feature vector according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of extracting temporal correlations by inputting the spatial feature vectors one by one into a gated recurrent unit according to an embodiment of the present invention;
FIG. 4 is a diagram of the prediction of the neural network portion obtained by combining the weighted feature vectors and the short-term spatio-temporal features in accordance with an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a multivariate time-series prediction method based on a feature extraction encoder and an interactive attention module, which comprises the following steps:
Step B1: divide the historical time-series data [X_{k-nT}, ..., X_{k-1}] into long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and short-term historical time-series data [X_{k-T}, ..., X_{k-1}]. In this embodiment of the invention, n takes the value 8, and the input long-term and short-term historical time-series segments all have length T.
the time series data matrix mentioned in B1 has rows representing time nodes and columns representing variable dimensions, and the element values in the matrix represent the values of different indexes at corresponding time instants.
Step B2: input the long-term historical time series [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time series [X_{k-T}, ..., X_{k-1}], each segment of length T, into a spatial feature extractor composed of a convolutional layer and several fully-connected layers, obtaining respectively the spatial feature vectors [e_{k-nT}, ..., e_{k-T-1}] of the long-term historical series and [e_{k-T}, ..., e_{k-1}] of the short-term historical series; immediately afterwards pass them through an attention layer to obtain the weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical series and [z_{k-T}, ..., z_{k-1}] of the short-term historical series.
The spatial feature extractor, reasonably constructed from a convolutional layer and multiple fully-connected layers, mainly extracts the spatial correlation of the historical time-series data and outputs it in the form of spatial feature vectors. Taking the fully-connected layer of the short-term feature extraction encoder as an example, the original input [X_{k-T}, ..., X_{k-1}] yields the weighted attention space feature vectors of this layer through:

[z_{k-T}, ..., z_{k-1}] = Softmax(f(W · [X_{k-T}, ..., X_{k-1}] + b))
where W represents the weight matrix, b is the bias term, and f () is the activation function.
The historical time-series data are weighted and mixed through the weight matrix, and the activation function introduces a nonlinear relationship, so that the relationships between different elements of the matrix are finally extracted. In theory, the more fully-connected layers there are, the stronger the extraction capability of the feature extraction network. This step obtains the spatial correlation contained in each matrix slice and outputs it in the form of vector sequences.
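A minimal sketch of this weighted-attention step, z = Softmax(f(W·x + b)), for a single input vector. The patent's extractor also includes a convolutional layer and several stacked fully-connected layers, which are omitted here; W, b and the ReLU activation are illustrative stand-ins, not trained parameters:

```python
import math

# One fully-connected layer followed by Softmax weighting:
# z = Softmax(relu(W·x + b)) for a single input vector x.
def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)                              # shift for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def fc_attention(x, W, b):
    h = [sum(w * xj for w, xj in zip(row, x)) + bi
         for row, bi in zip(W, b)]          # h = W·x + b
    return softmax(relu(h))

z = fc_attention([2.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The Softmax output sums to one, so the larger component of the input receives the larger attention weight.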
Step B3: input the weighted attention space feature vectors of the long-term historical series {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} acquired in step B2 and those of the short-term historical series [z_{k-T}, ..., z_{k-1}] into gated recurrent units in chronological order to learn the temporal correlation of the sequences, outputting respectively a sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation and a feature encoding vector u containing the spatio-temporal correlation. The recurrent layer with gated recurrent units (GRUs) uses the ReLU function as its hidden activation function. The hidden state of the recurrent unit at time t follows the standard GRU update

h_t = (1 − g_t) ⊙ h_{t−1} + g_t ⊙ h̃_t,

where g_t is the update gate and h̃_t is the candidate state computed from the current input and the reset-gated previous hidden state.
the weighted attention space feature vectors are input into a gated round robin unit (GRU) one by one to obtain context vectors. Thus, the short-term history matrix sequence [ z ]k-T,...,zk-1]Outputting a feature-coded vector u containing a spatio-temporal correlation, where u ═ ck-1K is a time value; long-term history matrix sequence { [ z ]k-nT,...,zk-(n-1)T-1],...,[zk-2T,...,zk-T-1]Outputs a feature code vector sequence (m) containing space-time correlationiIn which m isi}=[ck-(n-1)T,...,ck-T-1]. In fact GRU inputs a weighted attention space feature vector z each timek-n(1. ltoreq. n. ltoreq.T) results in a corresponding context vector ck-nAnd a hidden state hk-n. In which the hidden state hk-nInput c for passing history information to the next momentk-n+1. It is the transfer of the hidden state that enables the learning of the temporal correlation of the input sequence.
Finally, through the feature extraction encoder module, the long-term historical time series [X_{k-nT}, ..., X_{k-T-1}] is successfully converted into a sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation, and the short-term historical time series [X_{k-T}, ..., X_{k-1}] is converted by another feature extraction encoder into a feature encoding vector u containing the spatio-temporal correlation. The expressions are as follows:

m_i = Encoder_l([X_{k-nT}, ..., X_{k-T-1}])

u = Encoder_s([X_{k-T}, ..., X_{k-1}])

where m_i ∈ R^d and u ∈ R^d; the m_i form the sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation, and u is the feature encoding vector containing the spatio-temporal correlation.
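The role of the GRU in the encoders can be sketched with a minimal single-unit cell: each weighted feature yields a context value while the hidden state carries the temporal information forward. The scalar weights are toy assumptions, not trained parameters, and the candidate uses tanh here for a bounded sketch (the patent's recurrent layer uses ReLU):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h):
    g = sigmoid(x + h)                 # update gate
    r = sigmoid(x + h)                 # reset gate
    h_cand = math.tanh(x + r * h)      # candidate hidden state
    return (1.0 - g) * h + g * h_cand  # new hidden state

h, contexts = 0.0, []
for x in [0.5, -0.2, 0.8]:             # toy weighted feature sequence
    h = gru_step(x, h)
    contexts.append(h)                 # one context value c per input
```

Because each new state is a convex combination of the previous state and a bounded candidate, the hidden state stays in (−1, 1) while still depending on the whole input history.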
Step B4: input the sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation obtained in step B3 and the feature encoding vector u containing the spatio-temporal correlation into an interactive attention network, so as to better capture the relationships between the spatio-temporal feature vectors output by the predictive encoders, to speed up computation, and to avoid the shortcomings of recurrent neural networks. The attention weight distribution vector p_i and the weighted feature vector o_i between the feature encoding vector u and each member of the sequence {m_i} are computed from the inner product of the feature vectors followed by a Softmax():

p_i = Softmax(u^T m_i);

o_i = p_i × m_i;

where Softmax() normalizes the inner-product scores into attention weights, each input m_i has a corresponding weighted feature vector o_i, and d is the variable dimension of the weighted feature vectors.
Step B5: combine the trained weighted feature vectors o_i containing the spatio-temporal features with the feature encoding vector u containing the spatio-temporal correlation, so as to merge the spatio-temporal correlation with the information of the most recent moment.

Specifically, the weighted feature vectors o_i of total size T × d and the feature encoding vector u of size 1 × d containing the spatio-temporal correlation are combined into a new vector of size (T+1) × d, and the combined new vector is input into a generative model composed of fully-connected layers to generate the nonlinear prediction matrix. The output value of the fully-connected layer is computed as

ŷ^D_t = W · [u; o_1; o_2; ...; o_T] + b,

where b is a bias term, [u; o_1; o_2; ...; o_T] is the concatenation of the feature vector u and the set of weighted feature vectors {o_i}, and ŷ^D_t is the prediction of the neural network portion.
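A sketch of this concatenate-then-project step with illustrative placeholder weights and sizes (T = 1, d = 2 here):

```python
# Concatenate u with the weighted vectors o_1..o_T, then apply one
# fully-connected layer: y = W·[u; o_1; ...; o_T] + b.
def generate(u, os_, W, b):
    concat = list(u)
    for o in os_:
        concat.extend(o)               # (T+1)*d concatenated values
    return [sum(w * c for w, c in zip(row, concat)) + bi
            for row, bi in zip(W, b)]

y = generate([1.0, 2.0], [[0.5, 0.5]], [[1.0, 1.0, 1.0, 1.0]], [0.0])
# concat = [1.0, 2.0, 0.5, 0.5], so y = [4.0]
```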
Step B6: add the nonlinear prediction matrix ŷ^D output by the generative model in step B5 to the prediction matrix ŷ^L output by the autoregressive layer to obtain the final prediction, fully combining the spatio-temporal correlation with the short-term historical time-series data. A classical autoregressive (AR) model is used as the linear component, and all variable dimensions share the same set of linear parameters. The AR model is expressed as

ŷ^L_t = Σ_{i=0}^{s^{ar}−1} W^{ar}_i · y_{t−i−1} + b^{ar},

where ŷ^L_t denotes the prediction of the AR component at time stamp t, s^{ar} is the size of the input window, and the coefficients of the AR model are W^{ar} ∈ R^{s^{ar}} and b^{ar} ∈ R.

The final prediction of the method is obtained by integrating the neural network portion and the output of the AR model:

ŷ_t = ŷ^D_t + ŷ^L_t.
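The linear component and the final sum can be sketched for a single variable (the window size s_ar = 3 and the coefficients are toy assumptions; as stated above, in the patent all variable dimensions share the AR parameters):

```python
def ar_predict(recent, w_ar, b_ar):
    # recent: last s_ar values of one variable, oldest first;
    # w_ar: AR coefficients, newest-lag first
    return sum(w * x for w, x in zip(w_ar, reversed(recent))) + b_ar

def final_prediction(y_neural, recent, w_ar, b_ar):
    return y_neural + ar_predict(recent, w_ar, b_ar)  # ŷ = ŷ^D + ŷ^L

y = final_prediction(0.5, [1.0, 2.0, 3.0], [0.6, 0.3, 0.1], 0.0)
# AR part: 0.6*3 + 0.3*2 + 0.1*1 = 2.5, so y ≈ 3.0
```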
in the training process, we use the mean absolute error and the objective function as follows,
where N is the number of training samples and D is the dimension of the target data. All neural models were trained using Adam optimizer.
The generative model is trained by jointly using the relative root squared error (RRSE) loss function and the empirical correlation coefficient (CORR). With this joint training the model converges faster and achieves higher accuracy in terms of relative error.
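The two signals mentioned above can be computed as follows for a single variable (RRSE, lower is better; CORR, higher is better):

```python
import math

def rrse(y_true, y_pred):
    # relative root squared error: RMSE of predictions relative to the
    # RMSE of always predicting the mean of the targets
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)

def corr(y_true, y_pred):
    # empirical (Pearson) correlation coefficient
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.0, 2.9, 4.2]
```

A perfect prediction gives RRSE = 0 and CORR = 1; an RRSE above 1 would mean the model is worse than predicting the mean.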
Step B7: iterate steps B1 to B6 over the existing historical data until convergence. At this point the early-stage training of the model is complete, and at subsequent moments the long-term data are fed into the feature extraction module group by group.
The multivariate time-series prediction algorithm provided by the invention arranges multivariate time-series data carrying spatial and temporal correlation in chronological order, taking a group of long-term and short-term historical data matrices as input. First, the feature extraction encoder and the gated recurrent unit extract the spatio-temporal information contained in the matrix sequence: the encoder extracts the spatial relationships of adjacent positions in each matrix, the attention layer further captures the associations of the data across different time steps, and the gated recurrent unit captures the nonlinear temporal relationship, so the resulting spatio-temporal feature vectors retain the spatio-temporal information hidden in the historical data. The method then further mines and dynamically captures the interdependence between the feature vector matrices through the interactive attention network, so that the final spatio-temporal feature vectors retain as much of the effective information contained in the historical data as possible. The method trains the generative model with the loss function by integrating the further-captured spatio-temporal feature vectors with the spatial feature information of the current moment, and finally produces the prediction of the future moment by combining the linear prediction output by the autoregressive layer.
In this process, compared with other algorithms, the feature extraction encoding module markedly improves the speed of representation learning, and joint training with the relative root squared error loss and the empirical correlation coefficient markedly accelerates the convergence of the generative model, so both the time complexity and the computational complexity are lower. By introducing the interactive attention network, the correlations among the spatio-temporal feature vectors are captured as fully as possible, the effective information contained in the historical data is largely retained, and accurate prediction of multivariate time-series data is achieved.
The method provided by the invention uses a traditional autoregressive linear model in parallel with the nonlinear neural network, providing a reliable guarantee of the robustness and accuracy of the prediction model.
Table 1 summarizes the results, measured by the indices RRSE and CORR, of experiments on 4 multivariate data sets (Traffic, Electricity, Exchange-Rate and Solar-Energy) against 6 baseline algorithms. A smaller RRSE indicates a better experimental result, while a larger CORR indicates a better experimental result. We set Horizon = 3 and 6; the larger the horizon, the more difficult the prediction. Comparing neural networks with statistical algorithms, it can be seen directly that the neural-network-based methods (RNN-GRU, DA-RNN, DSANet, MTNet, LSTNet and our model) have a clear overall advantage over the statistical method based on an autoregressive model (VAR), which has been confirmed by many studies. Because of the limited modeling capability of traditional autoregression-based time-series prediction, its results are poor. Comparing the recurrent-network variants with our method, the RNN-GRU and DA-RNN models are only suited to extracting temporal features, are not suited to processing spatial features, and lack long-term memory capability, so they show obvious deficiencies on multivariate time-series prediction tasks and give poor results. Comparing other hybrid neural networks with our method, LSTNet provides a solution for extracting long- and short-term complex patterns and spatial correlations, MTNet learns long-term patterns using an end-to-end memory network with a single memory layer, and DSANet exploits self-attention mechanisms to mine the relationships between multivariate time series; however, their experimental results are weaker because they cannot comprehensively capture the spatio-temporal correlations and the complex periodic patterns among the historical time series.
The method first converts the latest short-term historical time-series data into spatio-temporal feature vectors, then computes a weighted attention distribution matrix using the long-term historical data, and finally obtains the weighted spatio-temporal feature vector sequence. Intuitively, the method of the present invention knows which moments support the prediction and can therefore produce better predictions for data sets with periodic patterns. The experimental results show that the method outperforms the other prediction methods in most cases. To demonstrate the effectiveness of the model design of the present invention, we performed a careful ablation study, with the results shown in Table 2. Specifically, one module of the model is deleted at a time and the remaining modules are run in the same environment on the same data sets. The "without Encoder" experiment deletes the spatial feature extractor from the feature extraction encoder; its results are poor because the spatial correlation among the historical time-series data can no longer be extracted. The "without InteractiveAttention" experiment deletes the interactive attention network; its results are poor because the spatio-temporal correlation among the historical data can no longer be fully obtained. The "without AR" experiment deletes the autoregressive layer; its results are poor because the linear prediction of the short-term historical data can no longer be obtained. The experimental results show that deleting any module of the model directly degrades prediction accuracy, further verifying the effectiveness of each experimental module of the model.
Table 1 prediction accuracy of four different data sets at different future times for six different methods in the experiment
Table 2 experimental data of ablation experiments performed on four different data sets according to the invention
Claims (10)
1. A multivariate time series prediction method is characterized by comprising the following steps:
S1: divide the historical time-series data [X_{k-nT}, ..., X_{k-1}] into long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and short-term historical time-series data [X_{k-T}, ..., X_{k-1}]; T is the length of each segment of the long-term or short-term historical time-series data;
S2: use the long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time-series data [X_{k-T}, ..., X_{k-1}] to obtain the weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical series and [z_{k-T}, ..., z_{k-1}] of the short-term historical series; input the short-term historical time-series data [X_{k-T}, ..., X_{k-1}] into an autoregressive layer to obtain the prediction matrix ŷ^L;
S3: use the acquired weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical series and [z_{k-T}, ..., z_{k-1}] of the short-term historical series to obtain respectively a sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation and a feature encoding vector u containing the spatio-temporal correlation;
S4: input {m_i} and u into an interactive attention network to obtain the attention weight distribution vectors p_i, and obtain the weighted feature vectors o_i by computing the product of p_i and {m_i};
S5: combine the weighted feature vectors o_i of total size T × d and the feature encoding vector u of size 1 × d containing the spatio-temporal correlation into a new vector of size (T+1) × d, and input the combined new vector into a generative model to generate the nonlinear prediction matrix ŷ^D.
2. The multivariate time series prediction method as defined in claim 1, wherein step S2 is implemented by: passing the long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time-series data [X_{k-T}, ..., X_{k-1}] respectively through a first spatial feature extractor and a second spatial feature extractor to obtain the spatial feature vectors [e_{k-nT}, ..., e_{k-T-1}] of the long-term historical series and [e_{k-T}, ..., e_{k-1}] of the short-term historical series; inputting [e_{k-nT}, ..., e_{k-T-1}] and [e_{k-T}, ..., e_{k-1}] respectively into a first attention layer and a second attention layer to obtain the weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical series and [z_{k-T}, ..., z_{k-1}] of the short-term historical series.
3. The multivariate time series prediction method as defined in claim 1, wherein step S3 is implemented by: inputting the acquired weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical series and [z_{k-T}, ..., z_{k-1}] of the short-term historical series in chronological order into a first gated recurrent unit and a second gated recurrent unit respectively, outputting respectively a sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation and a feature encoding vector u containing the spatio-temporal correlation.
4. The multivariate time series prediction method as defined in claim 1, wherein in step S4: o_i = p_i × m_i; p_i = Softmax(u^T m_i).
5. The multivariate time series prediction method as defined in claim 1, wherein the generative model comprises a plurality of cascaded fully-connected layers in step S5.
6. A multivariate time series prediction system, comprising:
a first feature extraction encoder, whose input is the long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and whose output is the sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation;
a second feature extraction encoder, whose input is the weighted attention space feature vectors [z_{k-T}, ..., z_{k-1}] of the short-term historical time series and whose output is the feature encoding vector u containing the spatio-temporal correlation;
an autoregressive layer, whose input is the short-term historical time-series data [X_{k-T}, ..., X_{k-1}] and whose output is the prediction matrix ŷ^L;
an interactive attention network, whose input is the sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation and the feature encoding vector u containing the spatio-temporal correlation, and whose output is a new vector of size (T+1) × d;
a generative model, whose input is the new vector of size (T+1) × d and whose output is the nonlinear prediction matrix ŷ^D;
a prediction module, which adds the prediction matrix ŷ^D output by the generative model to the prediction matrix ŷ^L output by the autoregressive layer to obtain the final prediction;
wherein the long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and the short-term historical time-series data [X_{k-T}, ..., X_{k-1}] are obtained by dividing the historical time-series data [X_{k-nT}, ..., X_{k-1}]; and T is the length of each segment of the long-term or short-term historical time-series data.
7. The multivariate time series prediction system as defined in claim 6, wherein the first and second feature extraction encoders are identical in structure; wherein the first feature extraction encoder comprises:
a first spatial feature extractor, whose input is the long-term historical time-series data [X_{k-nT}, ..., X_{k-T-1}] and whose output is the spatial feature vectors [e_{k-nT}, ..., e_{k-T-1}] of the long-term historical time series;
a first attention layer, whose input is [e_{k-nT}, ..., e_{k-T-1}] and whose output is the weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical time series;
a first gated recurrent unit, whose input is the weighted attention space feature vectors {[z_{k-nT}, ..., z_{k-(n-1)T-1}], ..., [z_{k-2T}, ..., z_{k-T-1}]} of the long-term historical time series and whose output is the sequence of feature encoding vectors {m_i} containing the spatio-temporal correlation.
8. The multivariate time series prediction system as defined in claim 6 or 7, wherein the interactive attention network is configured to perform operations comprising: taking {m_i} and u as the input of the self-attention layer to obtain the attention weight distribution vectors p_i; obtaining the weighted feature vectors o_i by computing the product of p_i and {m_i}; and combining the weighted feature vectors o_i of total size T × d with the feature encoding vector u containing the spatio-temporal correlation of size 1 × d into a new vector of size (T+1) × d.
9. A computer-readable storage medium, characterized in that it comprises a program which, when run in a processor, carries out the steps of the method according to any one of claims 1 to 5.
10. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, perform the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210107028.XA CN114493014A (en) | 2022-01-28 | 2022-01-28 | Multivariate time series prediction method, multivariate time series prediction system, computer product and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114493014A true CN114493014A (en) | 2022-05-13 |
Family
ID=81476715
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210107028.XA Pending CN114493014A (en) | 2022-01-28 | 2022-01-28 | Multivariate time series prediction method, multivariate time series prediction system, computer product and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114493014A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115081586A (en) * | 2022-05-19 | 2022-09-20 | 中国科学院计算机网络信息中心 | Short-term time sequence prediction method and system based on time and space attention |
CN115146842A (en) * | 2022-06-24 | 2022-10-04 | 沈阳建筑大学 | Multivariate time series trend prediction method and system based on deep learning |
CN115204535A (en) * | 2022-09-16 | 2022-10-18 | 湖北信通通信有限公司 | Purchasing business volume prediction method based on dynamic multivariate time sequence and electronic equipment |
CN115796407A (en) * | 2023-02-13 | 2023-03-14 | 中建科技集团有限公司 | Production line fault prediction method and related equipment |
CN115938112A (en) * | 2022-11-23 | 2023-04-07 | 华侨大学 | Traffic demand prediction method, system, electronic device, and computer storage medium |
CN117688453A (en) * | 2024-02-02 | 2024-03-12 | 山东科技大学 | Traffic flow prediction method based on space-time embedded attention network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111651504A (en) * | 2020-06-03 | 2020-09-11 | 湖南大学 | Multi-element time sequence multilayer space-time dependence modeling method based on deep learning |
CN112183862A (en) * | 2020-09-29 | 2021-01-05 | 长春理工大学 | Traffic flow prediction method and system for urban road network |
CN112508173A (en) * | 2020-12-02 | 2021-03-16 | 中南大学 | Traffic space-time sequence multi-step prediction method, system and storage medium |
-
2022
- 2022-01-28 CN CN202210107028.XA patent/CN114493014A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111651504A (en) * | 2020-06-03 | 2020-09-11 | 湖南大学 | Multi-element time sequence multilayer space-time dependence modeling method based on deep learning |
CN112183862A (en) * | 2020-09-29 | 2021-01-05 | 长春理工大学 | Traffic flow prediction method and system for urban road network |
CN112508173A (en) * | 2020-12-02 | 2021-03-16 | 中南大学 | Traffic space-time sequence multi-step prediction method, system and storage medium |
Non-Patent Citations (2)
Title |
---|
ZHENXIONG YAN ET.AL: "Multivariate Time Series Forecasting exploiting Tensor Projection Embedding and Gated Memory Network", 《2021 IEEE/ACM 29TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS)》, 26 August 2021 (2021-08-26) * |
ZHANG Jianjin; WANG Yunbo; LONG Mingsheng; ***; WANG Haifeng: "Predictive Recurrent Networks for Seasonal Spatio-Temporal Data with Applications in Urban Computing", Chinese Journal of Computers, vol. 43, no. 02, 16 October 2019 (2019-10-16) *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115081586A (en) * | 2022-05-19 | 2022-09-20 | 中国科学院计算机网络信息中心 | Short-term time sequence prediction method and system based on time and space attention |
CN115081586B (en) * | 2022-05-19 | 2023-03-31 | 中国科学院计算机网络信息中心 | Photovoltaic power generation time sequence prediction method and system based on time and space attention |
CN115146842A (en) * | 2022-06-24 | 2022-10-04 | 沈阳建筑大学 | Multivariate time series trend prediction method and system based on deep learning |
CN115146842B (en) * | 2022-06-24 | 2023-07-18 | 沈阳建筑大学 | Multi-element time sequence trend prediction method and system based on deep learning |
CN115204535A (en) * | 2022-09-16 | 2022-10-18 | 湖北信通通信有限公司 | Purchasing business volume prediction method based on dynamic multivariate time sequence and electronic equipment |
CN115938112A (en) * | 2022-11-23 | 2023-04-07 | 华侨大学 | Traffic demand prediction method, system, electronic device, and computer storage medium |
CN115796407A (en) * | 2023-02-13 | 2023-03-14 | 中建科技集团有限公司 | Production line fault prediction method and related equipment |
CN115796407B (en) * | 2023-02-13 | 2023-05-23 | 中建科技集团有限公司 | Production line fault prediction method and related equipment |
CN117688453A (en) * | 2024-02-02 | 2024-03-12 | 山东科技大学 | Traffic flow prediction method based on space-time embedded attention network |
CN117688453B (en) * | 2024-02-02 | 2024-04-30 | 山东科技大学 | Traffic flow prediction method based on space-time embedded attention network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114493014A (en) | Multivariate time series prediction method, multivariate time series prediction system, computer product and storage medium | |
Xuan et al. | Multi-model fusion short-term load forecasting based on random forest feature selection and hybrid neural network | |
Wang et al. | Res2Fusion: Infrared and visible image fusion based on dense Res2net and double nonlocal attention models | |
Ma et al. | A hybrid attention-based deep learning approach for wind power prediction | |
JP2021060992A (en) | Machine learning system and method | |
Zhou et al. | Exploiting operation importance for differentiable neural architecture search | |
Fang et al. | A new sequential image prediction method based on LSTM and DCGAN | |
Zheng et al. | Prompt vision transformer for domain generalization | |
Khan et al. | A survey of the vision transformers and their CNN-transformer based variants | |
Zheng et al. | Model compression based on differentiable network channel pruning | |
Xue et al. | Deep Correlated Predictive Subspace Learning for Incomplete Multi-View Semi-Supervised Classification. | |
Tan et al. | Openstl: A comprehensive benchmark of spatio-temporal predictive learning | |
An et al. | Object recognition algorithm based on optimized nonlinear activation function-global convolutional neural network | |
CN115659254A (en) | Power quality disturbance analysis method for power distribution network with bimodal feature fusion | |
Zhou et al. | IF2CNN: Towards non-stationary time series feature extraction by integrating iterative filtering and convolutional neural networks | |
Yao et al. | ModeRNN: Harnessing spatiotemporal mode collapse in unsupervised predictive learning | |
Yi et al. | An Effective Lightweight Crowd Counting Method Based on an Encoder-Decoder Network for the Internet of Video Things | |
CN116680456A (en) | User preference prediction method based on graph neural network session recommendation system | |
CN116502774A (en) | Time sequence prediction method based on time sequence decomposition and Legend projection | |
CN113835964B (en) | Cloud data center server energy consumption prediction method based on small sample learning | |
Liu et al. | Diverse hyperspectral remote sensing image synthesis with diffusion models | |
Li et al. | Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities | |
Kashyap et al. | Quantum convolutional neural network architecture for multi-class classification | |
CN115348182A (en) | Long-term spectrum prediction method based on depth stack self-encoder | |
Shang et al. | Mshyper: Multi-scale hypergraph transformer for long-range time series forecasting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||