WO2022053064A1 - 用于时间序列预测的方法和装置 - Google Patents

用于时间序列预测的方法和装置 Download PDF

Info

Publication number
WO2022053064A1
WO2022053064A1 (PCT/CN2021/118272)
Authority
WO
WIPO (PCT)
Prior art keywords
future
historical
data sequence
time series
neural network
Prior art date
Application number
PCT/CN2021/118272
Other languages
English (en)
French (fr)
Inventor
朱云依
Original Assignee
胜斗士(上海)科技技术发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 胜斗士(上海)科技技术发展有限公司 filed Critical 胜斗士(上海)科技技术发展有限公司
Publication of WO2022053064A1 publication Critical patent/WO2022053064A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/219Managing data history or versioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • the present application relates to time series forecasting, and in particular, to a method, apparatus, and computer-readable storage medium for predicting future data of an object based on historical data of the object.
  • Forecasting sales expectations for future times based on product sales over a past period of time is known as product time series forecasting.
  • the current mainstream technologies for time series forecasting include two categories: one is the traditional statistics-based forecasting algorithm represented by Arima/Prophet, and the other is the deep learning-based forecasting algorithm represented by the LSTM neural network.
  • time series prediction algorithms based on traditional statistics are linear algorithms, and it is difficult for them to capture the nonlinear and long-term patterns in a time series.
  • the time series prediction algorithm based on the LSTM neural network is prone to vanishing or exploding gradients when the scale of the time series becomes larger, which distorts the prediction results; it also runs inefficiently, with redundant data and computation.
  • the present application proposes a method, an apparatus, and a computer-readable storage medium for time series prediction, which address at least one defect of the prior art solutions by extracting regularities from the historical data of an object and combining them with future influencing factors to predict the object's future data.
  • a method for time series prediction comprising:
  • a predicted feature data sequence is generated based on the regularity data sequence, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object, wherein the future dynamic feature sequence includes future dynamic features of the object corresponding to future times in the future time series, the future dynamic features being associated with the corresponding future times;
  • an apparatus for time series prediction comprising:
  • a historical data acquisition unit configured to acquire a historical data sequence of the object corresponding to a historical time series, the historical data in the historical data sequence including the historical dynamic features and historical values of the object corresponding to the historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
  • a regularity extraction unit configured to use a first neural network model to extract, based on the historical data sequence, the regularity data sequence of the object corresponding to the future time series;
  • a predicted feature generation unit configured to generate a predicted feature data sequence based on the regularity data sequence, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object, wherein the future dynamic feature sequence includes the future dynamic features of the object corresponding to the future times of the future time series, the future dynamic features being associated with the corresponding future times; and
  • a prediction unit configured to use a second neural network model to predict, based on the predicted feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence including the predicted future values of the object corresponding to the future times of the future time series.
  • a computer-readable storage medium on which a computer program is stored, the computer program including executable instructions that, when executed by at least one processor, implement the above-mentioned method.
  • an electronic device comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the executable instructions to implement the method described above.
  • the time series prediction method and apparatus can meet the requirements of efficient calculation, accurately capture the nonlinear effects of trend factors, seasonal factors, external factors, etc. on the predicted object, and make both short-range and long-range time predictions.
  • FIG. 1 is a schematic diagram of a seq2seq neural network model architecture for time series prediction according to an embodiment of the present application
  • FIG. 2 is an exemplary flowchart of a method for time series forecasting according to an embodiment of the present application
  • FIG. 3 is a schematic block diagram of an apparatus for time series prediction according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of the present application.
  • although the method and apparatus for predicting the future data of an object based on the historical data of the object are introduced hereinafter with a specific neural network model structure according to the embodiments, the solution of the present application is not limited to this example; it can be extended to other neural network structures capable of realizing the time series forecasting concept of the present application, and also to other deep-learning-based forecasting model structures.
  • the time series prediction method is introduced with the sales products of the catering industry at the sales place as the object, but the method of the present application can be applied to any objects and scenarios that require time series prediction.
  • the neural network generally refers to an artificial neural network (ANN).
  • a common convolutional neural network can be used for the neural network, and a fully convolutional neural network can be further used as the case may be.
  • Other specific types and structures of neural networks that are not relevant to the time series forecasting method of the present application have not been described too much herein to avoid confusion.
  • both the Autoregressive Integrated Moving Average (Arima) model forecasting algorithm based on traditional statistics and the Prophet time series model forecasting algorithm can be used to predict time-related patterns such as trend and seasonality. They first decompose the historical data corresponding to the historical time series into a linear superposition of trend factors, seasonal factors, and external influencing factors, predict the effect of each factor on the data corresponding to future times separately, and finally superimpose the three predicted effects to obtain the final prediction result.
  • Arima: Autoregressive Integrated Moving Average
  • linear algorithms have difficulty capturing the patterns that exist in time series; different time series are predicted independently, so the relationships between different time series are not considered, which makes each individual forecast insufficiently accurate, and their simple bounded superposition cannot accurately reflect the real trend of process change; when a time series is relatively short and the corresponding amount of historical data is relatively small, a linear algorithm cannot capture long-term patterns and cannot borrow patterns from other series.
  • in deep-learning-based forecasting algorithms, the LSTM (Long Short-Term Memory) network structure is usually adopted.
  • the historical observations, historical influencing factors and future influencing factors of the variables corresponding to the time series are used as the input of the neural network model structure, and the future predicted values of the variables are used as the output of the neural network model structure.
  • LSTM network is a temporal recurrent neural network specially designed to solve the long-term dependency problem of general RNN (Recurrent Neural Network), which is suitable for processing and predicting important events with very long intervals and delays in time series.
  • LSTM networks outperform temporal recurrent neural networks and hidden Markov models (HMMs).
  • HMM: hidden Markov model
  • the important structures in the LSTM network are the gates: the forget gate determines which information is discarded from the block's memory, the input gate determines whether an input is accepted into the block, and the output gate determines whether the information in the block's memory is passed out.
  • LSTM networks are usually trained using gradient descent.
  • although the LSTM network model can overcome the poor long-term forecasting of the Arima/Prophet algorithms, it still cannot meet many requirements of time series forecasting.
  • time series prediction process of the present application is described below with reference to the seq2seq neural network model architecture of FIG. 1 and the method flow for time series prediction of FIG. 2 according to an embodiment of the present application.
  • the basic structure of the neural network model architecture 100 in FIG. 1 is a seq2seq (sequence to sequence) network model.
  • the seq2seq neural network model can be regarded as a transformation model.
  • the basic idea is that, of two neural network models connected in series, the former is used as the encoder network and the latter as the decoder network.
  • the encoder network converts a sequence of data into a vector or sequence of vectors, and the decoder network generates another sequence of data from that vector or sequence of vectors.
  • one usage scenario of the seq2seq network model is speech recognition, in which the encoder network converts or segments an English sentence into English or Chinese semantic data or a semantic sequence, and the decoder network converts the semantic data or semantic sequence into the corresponding Chinese sentence.
  • the optimization of the seq2seq network model can use the maximum likelihood estimation method to maximize the probability of the data sequence generated by the decoder to obtain the optimal conversion effect.
  • the seq2seq neural network model architecture 100 includes a first neural network model 110 as an encoder and a second neural network model 120 as a decoder.
  • the first neural network model 110 is used for extracting information in the historical data, especially regular data reflecting the regularity in the historical data.
  • the first neural network model 110 is a WaveNet neural network.
  • the WaveNet network is designed to predict the value of the n-th element of a data sequence based on its first n-1 elements.
  • WaveNet is particularly suitable for high-throughput input of one-dimensional data sequences whose elements are multi-dimensional vectors, and such a one-dimensional network enables fast computation.
  • the standard WaveNet network model is a convolutional neural network in which each convolutional layer convolves the previous layer. The larger the convolution kernel of the network and the more layers, the stronger the perception ability in the time domain and the larger the perception range.
  • each time a node is generated, it is appended after the last node of the input layer, and generation proceeds iteratively.
  • the activation function of the WaveNet network can use gate units, for example.
  • the hidden layers between the input layer and the output layer of the network use residual and skip connections; that is, each convolutional-layer node in the hidden layers adds its original value to the output value of the activation function and passes the sum to the next convolutional layer.
  • the operation of reducing the number of channels can be achieved through a 1x1 convolution kernel. Then, the results of the activation function output of each hidden layer are added and finally output through the output layer.
  • the first neural network model 110 has an input layer (i.e., the first convolutional layer) 112, a hidden layer 113, and an output layer 114.
  • the number of hidden layers 113 may be zero, one or more.
  • the input layer 112 , the hidden layer 113 and the output layer 114 each have a plurality of nodes 111 .
  • the number of nodes in the input layer 112 should at least correspond to the data length in the historical data sequence to ensure that the neural network can receive information at each historical time.
  • when the first neural network model 110 using an ordinary WaveNet network is a one-dimensional causal convolutional network,
  • the first n-1 elements of the input data sequence are needed to predict the n-th element, so the number of nodes used decreases by 1 with each convolutional layer. If the length of the historical data is large, many layers must be added to the first neural network model 110 to accommodate n passes, or a very large filter is required; when the gradient selected during gradient descent is then too small, the training of the network becomes complicated and the fit is poor.
  • Dilated CNN: dilated convolutional neural network
  • a dilated convolutional neural network is a convolutional network with "holes".
  • the first convolutional layer (i.e., the input layer) of the dilated convolutional neural network may be a one-dimensional causal convolutional network with a dilation coefficient of 1.
  • starting from the second convolutional layer, the dilation coefficient of each convolutional layer is the dilation coefficient of the previous convolutional layer multiplied by the dilation index (Dilation Index), where the dilation index is a positive integer not less than 2 and not greater than the convolution kernel size.
  • This dilated convolutional neural network configuration can be employed in both the hidden layer and the output layer of the first neural network model 110 .
  • for example, when the dilation index is 2,
  • the second convolutional layer will only use nodes n, n-2, n-4, ... for convolution,
  • and the third convolutional layer will only use nodes n, n-4, n-8, ..., and so on.
  • the expanded neural network structure can significantly speed up the information transfer process in the neural network, avoid gradient disappearance or gradient explosion, and improve the processing speed and prediction accuracy of the first neural network model 110 .
  • when the convolution kernel size is 2 and the dilation index is 2,
  • the number of convolutional layers through which information passes from the node corresponding to the first historical time to the node corresponding to the last historical time is log2(N), where N is the data length of the historical data sequence.
  • the second neural network model 120 as a decoder may be a multi-layer perceptron (MLP) network.
  • the MLP network also includes an input layer, a hidden layer and an output layer, where each neuron node has an activation function (such as a sigmoid function) and is trained using a loss function.
  • the MLP network predicts the future values of the object based on the historical regularities extracted by the encoder network and the future influencing factors (including dynamic and static factors).
  • the time series method of the present application can use any other neural network capable of feature extraction and prediction of sequence data.
  • Network structures such as, but not limited to, various types of recurrent neural networks (RNNs) that can implement the function of time series prediction of the present application.
  • RNN: recurrent neural network
  • the encoder network can use an LSTM network
  • the decoder network uses an MLP network.
  • although the LSTM network has shortcomings, combining it with an MLP network and adjusting the input and output data sequences of the network can still, to a certain extent, obtain better results than the existing schemes. A WaveNet network can also be chosen as the encoder network, with an LSTM network or another RNN network as the decoder network, and so on.
  • the unit of historical time and/or future time can be selected from hour, day, month, year, week, quarter, etc. as required.
  • the historical time and/or the future time may be a time point (for example, time t_1, the end of the first quarter, 10 a.m., etc.), or may be a continuous time period (for example, period t_2, week 2, month 3, October of the current year, etc.).
  • the time intervals between the respective historical times and/or future times may be the same, so that this constant interval serves as the period at which the periodic information of the historical data and future data is extracted and predicted.
  • the lengths of the time periods can also be the same, serving as the period for extracting and predicting the periodic information of the above-mentioned historical data and future data.
  • the method 200 first acquires the historical data sequence 101 of the object corresponding to the historical time series T1 in step S210.
  • the historical value y_i is the measured value of the object at the historical time t_i, such as the actual sales of a product.
  • the historical value arises from the internal factors of the object, so it can also be called the object's internal factors or internal feature data.
  • the historical dynamic feature x_i comprises the dynamic features in the historical data that affect the object's historical value y_i, for example including one or more of whether the time is a holiday, the number of working days, the number of days or weeks until a holiday, and so on.
  • historical dynamic features are associated with time, for example including periodic factors that cyclically affect the object with a certain period (also called periodic historical dynamic features) and aperiodic factors that affect the object non-periodically (also called aperiodic historical dynamic features).
  • the period of the periodic factor may be determined by the length of the same time interval between each historical time point in the historical time series, or by the length of the historical time as a time period of the same length.
  • the way in which aperiodic factors affect the object is related to a specific historical time; that is, they are random or event-triggered.
  • at each historical time t_i in the historical time series T1, the corresponding aperiodic factors may be different.
  • the number n of historical times represents the number or length of historical data.
  • the historical value y_i may be a multidimensional variable or vector.
  • when the historical dynamic feature x_i that affects the historical value y_i of the object includes many factors, the historical dynamic feature is regarded as a combination of multiple historical sub-dynamic features, and x_i can also be a multidimensional variable or vector.
  • the historical dynamic feature x_i and the historical value y_i can form a two-component vector (x_i, y_i)^T (also called a binary data tuple; hereinafter uniformly a two-component vector), in which each of the sub-vectors x_i and y_i is a multidimensional vector as described above.
  • the historical data sequence 101 can therefore be represented as a one-dimensional sequence of such vectors {(x_1, y_1)^T, (x_2, y_2)^T, ..., (x_n, y_n)^T}.
  • the first neural network model 110 serving as an encoder does not process historical static data, which can reduce the redundancy of data and calculation, and improve the operation speed of the network model.
  • the first neural network model 110 is used to complete the regularity extraction function of extracting the regularity data sequence 102 of the object corresponding to the future time series T2 based on the historical data sequence 101.
  • the historical data sequence 101 is used as the input of the first neural network model 110, and the extracted regular data sequence 102 is output through the transmission and calculation of each convolutional layer of the WaveNet network.
  • the dilated convolutional neural network described above can speed up the regular information extraction process of the regular data sequence 102 and improve the information extraction accuracy.
  • the periodic regularity feature c_a corresponds to the periodic historical dynamic features and cyclically affects the object's future values y_{n+j}, j = 1, 2, ..., m, with a certain period; it includes, for example, the periodic patterns by which seasonality, day of week, and/or month affect the object's historical values.
  • since the time interval between the future times of the future time series T2 (when the future times are time points) and/or the length of the future times (when they are time periods) is set to be the same as that of the historical times in the historical time series T1, the periodic regularity feature c_a is the same for every future time t_{n+j}.
  • the aperiodic regularity feature c_{n+j} affects the object's future value y_{n+j} based on the specific future time, so c_{n+j} may be different for each corresponding future time t_{n+j} of the future time series T2.
  • m is the number of future times in the future time series T2, indicating the number or length of future data to be predicted.
  • the periodic regularity feature c_a also includes multiple sub-periodic regularity features and can be expressed as a multi-dimensional vector.
  • the dimension of the periodic regularity feature c_a may be the same as the number of sub-periodic historical dynamic features in the periodic historical dynamic features, or smaller than the latter to reduce the amount of computation.
  • an aperiodic historical dynamic feature may also have multiple sub-aperiodic dynamic features, so the aperiodic regularity feature c_{n+j} also includes multiple sub-aperiodic regularity features, which can be represented as a multi-dimensional vector.
  • the dimension of the aperiodic regularity feature c_{n+j} can likewise be the same as the number of sub-features in the aperiodic historical dynamic features, or smaller than the latter to reduce the amount of computation.
  • the regularity data sequence 102 can be represented as a one-dimensional sequence of two-component vectors whose elements consist of the two multi-dimensional sub-vectors c_a and c_{n+j}: {(c_a, c_{n+1})^T, (c_a, c_{n+2})^T, ..., (c_a, c_{n+m})^T}.
  • x_{n+j} in FIG. 1 is the future dynamic feature affecting the future value y_{n+j} of the object corresponding to the future time t_{n+j} in the future time series T2.
  • the future dynamic feature x_{n+j} may be, for example, one or more of a promotion at a certain future time, whether the time is a holiday, the number of working days, the number of days or weeks until a holiday, and so on.
  • the future dynamic features are also associated with time and may be multi-dimensional vectors that include multiple sub-future dynamic features.
  • the future dynamic features x_{n+j} form a one-dimensional sequence of multi-dimensional vectors {x_{n+1}, x_{n+2}, ..., x_{n+m}}.
  • the future static features x_s may include attributes of the object (which are generally related only to the object itself and not to future time) and other features that are not time-dependent.
  • the future static features x_s can be the category of the product, the temperature of the product, the sales location of the product (for example, represented by the location of the distribution center), etc.; these features are associated only with the object and are not time-varying.
  • the future static feature x_s can be a multidimensional vector composed of multiple sub-features.
  • the future static features x_s can be processed further.
  • the future static features x_s fall into different types, and the correlation between the types differs. An embedding operation can transform sparse discrete variables into continuous variables. The future static features are embedded according to their types, for example divided into two groups x_s1 and x_s2 according to location-related features and product-attribute-related features, so that different groups of future static features are uncorrelated, i.e., orthogonality is maintained. This avoids treating each specific static influencing factor as a variable or as one dimension of a vector, thereby reducing the overall dimension of the future static features x_s and the computational load of the model.
  • the future static feature group x_s1 includes the multi-dimensional future static feature e_1, which can affect the object at every future time t_{n+j};
  • the future static feature group x_s2 includes the multi-dimensional future static feature e_2, which can likewise affect the object at every future time t_{n+j};
  • the number of future static features or future static feature groups may be 0, 1, or more;
  • the number of specific features included in each future static feature determines whether its dimension is one or more;
  • the future static features x_s form a one-dimensional sequence of length m, {x_s, x_s, ..., x_s}, whose elements are zero-, one-, or multi-dimensional vectors;
  • this one-dimensional sequence can be expressed as {(e_1, e_2)^T, (e_1, e_2)^T, ..., (e_1, e_2)^T}.
  • the predicted feature data sequence 103 may be generated based on the regular data sequence 102, the future dynamic feature sequence of the object corresponding to the future time series, and the future static feature of the object.
  • the generation can be accomplished by splicing the one-dimensional sequences of the four influencing factors to form a one-dimensional predicted feature data sequence 103 whose elements are multi-part (for example, four-part) vectors (also called quaternary data tuples), as shown in FIG. 1.
  • the one-dimensional sequence 103 can be represented as {(c_a, c_{n+1}, e_1, e_2, x_{n+1})^T, (c_a, c_{n+2}, e_1, e_2, x_{n+2})^T, ..., (c_a, c_{n+m}, e_1, e_2, x_{n+m})^T}.
  • the predicted feature data sequence 103 is input into the second neural network model 120, and through the transfer and computation of a decoder such as a multi-layer perceptron (MLP) network, the prediction of the future data sequence 104 corresponding to the future time series T2 is completed.
  • each predicted future value y_{n+j} of the object in the future data sequence 104 is a multidimensional vector having the same dimensions as the historical values y_i.
  • the method 200 may also optionally include, prior to using at least one of the first and second neural network models 110 and 120 as the encoder and decoder networks, respectively, a step S250 of training the neural network models on a training data set to determine the optimal parameters of the models.
  • the parameters of a neural network model can remain unchanged during use after training is completed, can be updated or adjusted on a new data set after a period of use or at a predetermined interval, or can be updated in real time by means of online supervision.
  • FIG. 3 shows an exemplary structure of an apparatus 300 for time series prediction according to an embodiment of the present application.
  • the apparatus 300 includes a historical data acquisition unit 310 , a regularity extraction unit 320 , a prediction feature generation unit 330 and a prediction unit 340 .
  • the historical data acquisition unit 310 is configured to acquire the historical data sequence 101 of the object corresponding to the historical time series T1.
  • the historical data in the historical data sequence 101 includes the time-associated historical dynamic features x_i corresponding to the historical times t_i in the historical time series T1, and the historical values y_i of the object.
  • the regularity extraction unit 320 includes, for example, the first neural network model 110 as an encoder network in the seq2seq neural network model to extract the regularity of historical data. This unit is used to extract the regular data sequence 102 of the object corresponding to the future time series T2 from the historical data sequence 101 provided by the historical data acquisition unit 310 by using the neural network model.
  • the regularity data sequence 102 includes the periodic regularity feature c_a of the object corresponding to the future times t_{n+j} of the future time series T2 and the aperiodic regularity features c_{n+j} associated with the corresponding future times.
  • the encoder network can choose a sequence data network model such as the WaveNet network, and can further adopt a structure such as a dilated convolutional network to speed up information transfer and computation.
  • the prediction feature generation unit 330 is configured to combine the regularity data sequence 102 output by the regularity extraction unit 320, the future dynamic feature sequence composed of the future dynamic features x_{n+j} corresponding to the future times t_{n+j} in the future time series T2, and the future static features x_s to generate the predicted feature data sequence 103.
  • a future dynamic feature x_{n+j} is associated with the future time t_{n+j}.
  • in generating the predicted feature data sequence 103, the predicted feature generation unit 330 may further group the static features x_s to orthogonalize the groups of static features, thereby reducing the vector dimension of each data element of the predicted feature data sequence.
  • the prediction unit 340 comprises, for example, the second neural network model 120 as a decoder network in a seq2seq neural network model to predict future values of the object.
  • the unit 340 is used to predict, using the second neural network model 120, the future data sequence 104 of the object corresponding to the future time series T2 from the predicted feature data sequence 103 provided by the predicted feature generation unit 330.
  • the second neural network model 120 may use a network such as a multi-layer perceptron (MLP) network.
  • the apparatus 300 also optionally includes a model training unit 350 for training the corresponding neural network models to determine the optimal parameters of the models before the neural network models in the above-mentioned extraction unit 320 and prediction unit 340 are used, and it can supervise or update the parameters of the models.
  • the experiment is carried out in the scenario of product prediction in the catering industry, and the test task requires to predict the sales volume of each product (object) in each distribution center in the next 1-4 weeks.
  • the test data set covers about 20 distribution centers, each including on average about 200 products; the historical product sales data ranges from 128 weeks at the longest to 1 week at the shortest.
  • the test task involves considering 23 dynamic influencing factors (such as whether the time is a holiday, the number of working days, the number of weeks until the Spring Festival, etc.) and 7 static influencing factors (such as product classification, temperature, the location of the distribution center, etc.) in the prediction.
  • Table 1 shows the training time, prediction time and prediction error of the models using different time series forecasting methods.
  • the deep learning methods using the seq2seq neural network model require a large number of floating-point operations and use one additional graphics processing unit (GPU) to accelerate computation compared with the traditional statistical algorithm Prophet.
  • the prediction accuracy (error) of the scheme according to the embodiment of the present application using the WaveNet-MLP seq2seq model (WaveNet network as the encoder and MLP network as the decoder) is better than that of the traditional statistical algorithm, and also better than that of a seq2seq neural network model structure in which both the encoder and decoder networks use the LSTM network model.
  • in terms of prediction time, the schemes using neural network models are faster than the traditional statistical algorithm; and among the neural network schemes, the training time of the WaveNet-MLP seq2seq neural network model structure of the present application is significantly reduced.
  • the advantages of the time series prediction method and apparatus lie in the following aspects: using two neural network models, such as a WaveNet network and an MLP network, as the encoder and decoder networks, respectively, allows the historical data sequence and the future data sequence to be computed in parallel across the different historical times of the corresponding historical time series and the different future times of the future time series, thereby improving the speed of model training and use; using a neural network model such as a WaveNet network as the encoder, and especially the dilated convolutional network structure, shortens the path along which information in the object's historical data sequence travels from the first historical time to the last historical time, avoiding vanishing and exploding gradients during training of the neural network, so that long-range time series prediction can be performed; the influencing factors that do not change with time are introduced only at the input of the second neural network model serving as the decoder, avoiding duplication and computation at every time point of the encoder network, thereby reducing the redundancy of data and computation; and the embedded grouping of the time-invariant influencing factors reduces the overall dimension of the static features and thus the computational load of the model.
  • although modules or units of the apparatus for time series prediction are mentioned in the above detailed description, this division is not mandatory. Indeed, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units. Components shown as modules or units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement this without creative effort.
  • a computer-readable storage medium on which a computer program is stored, the program including executable instructions which, when executed by, for example, a processor, can implement the steps of the method for time series forecasting described in any one of the above embodiments.
  • various aspects of the present application can also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present application described in the method for time series prediction in this specification.
  • the program product for implementing the above method according to the embodiments of the present application may adopt a portable compact disc read only memory (CD-ROM) and include program codes, and may be executed on a terminal device such as a personal computer.
  • CD-ROM: compact disc read-only memory
  • the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code therein. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • a readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • program code embodied on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, etc., as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user computing device, partly on the user device as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
  • LAN: local area network
  • WAN: wide area network
  • an electronic device which may include a processor, and a memory for storing executable instructions of the processor.
  • the processor is configured to perform the steps of the method for time series prediction in any one of the foregoing embodiments by executing the executable instructions.
  • aspects of the present application may be implemented as a system, method, or program product. Therefore, various aspects of the present application can be embodied in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "system".
  • the electronic device 400 according to this embodiment of the present application is described below with reference to FIG. 4 .
  • the electronic device 400 shown in FIG. 4 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • electronic device 400 takes the form of a general-purpose computing device.
  • Components of the electronic device 400 may include, but are not limited to, at least one processing unit 410, at least one storage unit 420, a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410), a display unit 440, and the like.
  • the storage unit stores program code that can be executed by the processing unit 410, so that the processing unit 410 executes the steps according to the various exemplary embodiments of the present application described in the method for time series prediction in this specification.
  • the processing unit 410 may perform the steps shown in FIG. 2 .
  • the storage unit 420 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 4201 and/or a cache storage unit 4202 , and may further include a read only storage unit (ROM) 4203 .
  • RAM: random access memory unit
  • ROM: read-only memory unit
  • the storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, including but not limited to an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • the bus 430 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
  • the electronic device 400 may also communicate with one or more external devices 500 (e.g., keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may occur through the input/output (I/O) interface 450. Also, the electronic device 400 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 460. The network adapter 460 may communicate with other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the exemplary embodiments described herein may be implemented by software, or by software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the present application may be embodied in the form of software products, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a portable hard drive, etc.) or on a network and include several instructions to cause a computing device (which may be a personal computer, a server, a network device, etc.) to execute the method for time series prediction according to an embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Method, apparatus, and computer-readable storage medium for time series prediction. The method comprises: acquiring a historical data sequence of an object corresponding to a historical time series (S210); using a first neural network model to extract, based on the historical data sequence, a regularity data sequence corresponding to a future time series (S220); generating a prediction feature data sequence based on the regularity data sequence, a future dynamic feature sequence corresponding to the future time series, and future static features (S230); and using a second neural network model to predict, based on the prediction feature data sequence, a future data sequence of the object corresponding to the future time series (S240). The method meets the requirements of efficient computation, accurately captures the nonlinear effects of trend factors, seasonal factors, external factors, and the like on the predicted object, and makes both short-range and long-range time predictions.

Description

Method and apparatus for time series prediction
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202010959817.7, filed on September 14, 2020, the entire disclosure of which is incorporated herein by reference as part of this application.
TECHNICAL FIELD
The present application relates to time series prediction, and in particular to a method, an apparatus, and a computer-readable storage medium for predicting future data of an object based on historical data of the object.
BACKGROUND
In industries such as retail and catering, sales data for a future period must be estimated from the historical sales of products for stocking, distribution, and product updates. Accurately predicting future product sales can effectively reduce costs, reveal business opportunities in time, and allow business strategies to be adjusted quickly to improve competitiveness.
Forecasting expected sales at future times based on product sales over a past period is known as product time series forecasting. Current mainstream techniques for time series forecasting fall into two categories: forecasting algorithms based on traditional statistics, represented by Arima/Prophet, and forecasting algorithms based on deep learning, represented by LSTM neural networks.
However, time series forecasting algorithms based on traditional statistics are linear algorithms and have difficulty capturing nonlinear and long-term patterns in a time series. Time series forecasting algorithms based on LSTM neural networks are prone to vanishing or exploding gradients as the time series grows, which distorts the prediction results; they also run inefficiently and are redundant in data and computation.
Therefore, there is a need to improve time series forecasting methods.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present application and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
SUMMARY
The present application proposes a method, an apparatus, and a computer-readable storage medium for time series prediction, which address at least one defect of existing solutions by extracting regularities from the historical data of an object and combining them with future influencing factors to predict the object's future data.
According to one aspect of the present application, a method for time series prediction is proposed, comprising:
acquiring a historical data sequence of an object corresponding to a historical time series, the historical data in the historical data sequence including historical dynamic features and historical values of the object corresponding to historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
using a first neural network model to extract, based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
generating a prediction feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence includes future dynamic features of the object corresponding to future times in the future time series, the future dynamic features being associated with the corresponding future times; and
using a second neural network model to predict, based on the prediction feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence including the predicted future values of the object corresponding to the future times of the future time series.
According to another aspect of the present application, an apparatus for time series prediction is proposed, comprising:
a historical data acquisition unit configured to acquire a historical data sequence of the object corresponding to a historical time series, the historical data in the historical data sequence including historical dynamic features and historical values of the object corresponding to historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
a regularity extraction unit configured to use a first neural network model to extract, based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
a prediction feature generation unit configured to generate a prediction feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence includes future dynamic features of the object corresponding to future times of the future time series, the future dynamic features being associated with the corresponding future times; and
a prediction unit configured to use a second neural network model to predict, based on the prediction feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence including the predicted future values of the object corresponding to the future times of the future time series.
According to yet another aspect of the present application, a computer-readable storage medium is proposed, on which a computer program is stored, the computer program including executable instructions that, when executed by at least one processor, implement the method described above.
According to still another aspect of the present application, an electronic device is proposed, comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the executable instructions to implement the method described above.
The time series prediction method and apparatus according to the embodiments of the present application meet the requirements of efficient computation, accurately capture the nonlinear effects of trend factors, seasonal factors, external factors, and the like on the predicted object, and make both short-range and long-range time predictions.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features and advantages of the present application will become more apparent from the detailed description of its exemplary embodiments with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a seq2seq neural network model architecture for time series prediction according to an embodiment of the present application;
FIG. 2 is an exemplary flowchart of a method for time series prediction according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of an apparatus for time series prediction according to an embodiment of the present application; and
FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of the present application.
DETAILED DESCRIPTION
Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be implemented in many forms and should not be construed as limited to the implementations set forth herein; rather, these implementations are provided so that the present application will be thorough and complete and will fully convey the concepts of the exemplary embodiments to those skilled in the art. In the figures, the sizes of some elements may be exaggerated or distorted for clarity. The same reference numerals in the figures denote the same or similar structures, and their detailed descriptions will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present application. However, those skilled in the art will recognize that the technical solutions of the present application may be practiced without one or more of the specific details, or that other methods, elements, and the like may be employed. In other cases, well-known structures, methods, or operations are not shown or described in detail to avoid obscuring aspects of the present application.
Those skilled in the art will understand that, although the method and apparatus for predicting the future data of an object based on its historical data are introduced below with a specific neural network model structure according to the embodiments, the solution of the present application is not limited to this exemplary neural network model; it can be extended to other neural network structures capable of realizing the time series prediction concept of the present application, and also to other deep-learning-based prediction model structures. In the exemplary embodiments herein, the time series prediction method is introduced with products sold at a sales location in the catering industry as the object, but the method of the present application can be applied to any object and scenario that requires time series prediction. In the following, unless otherwise specified, a neural network generally refers to an artificial neural network (ANN). A common convolutional neural network (CNN) can be used, and a fully convolutional neural network can further be used as appropriate. Other specific types and structures of neural networks that are not relevant to the time series prediction method of the present application are not described at length herein, to avoid confusion.
Among mainstream time series forecasting algorithms, both the Autoregressive Integrated Moving Average (Arima) model forecasting algorithm based on traditional statistics and the Prophet time series model forecasting algorithm can be used to predict time-related patterns such as trend and seasonality. They first decompose the historical data corresponding to the historical time series into a linear superposition of trend factors, seasonal factors, and external influencing factors, predict the effect of each of these factors on the data corresponding to future times separately, and finally superimpose the predicted effects of the three factors to obtain the final prediction result.
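As an illustration of this linear-superposition idea (not part of the original disclosure), the following minimal numpy sketch decomposes an invented weekly sales series into trend, seasonal, and external components and forecasts by extrapolating each component separately; all series and coefficients are made up for illustration.

```python
import numpy as np

# Hypothetical weekly sales series: the statistical approach models it as a
# linear superposition of a trend term, a seasonal term and external effects.
weeks = np.arange(104)                            # two years of weekly data
trend = 0.5 * weeks                               # linear growth component
seasonal = 10.0 * np.sin(2 * np.pi * weeks / 52)  # yearly seasonality
external = np.where(weeks % 26 == 0, 15.0, 0.0)   # e.g. promotion spikes
noise = np.random.default_rng(0).normal(0, 1, weeks.size)

sales = trend + seasonal + external + noise       # the observed series

# Forecasting then amounts to extrapolating each component separately and
# summing the results -- which is exactly why a purely linear superposition
# cannot capture nonlinear interactions between the components.
forecast_week = 110
forecast = (0.5 * forecast_week
            + 10.0 * np.sin(2 * np.pi * forecast_week / 52)
            + 0.0)                                # no promotion assumed
print(f"naive additive forecast for week {forecast_week}: {forecast:.1f}")
```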
However, the main problems of traditional statistical time series forecasting algorithms are as follows: linear algorithms have difficulty capturing the patterns present in time series; different time series are predicted independently, so the relationships between them are not considered, which makes each individual forecast insufficiently accurate, and the simple bounded superposition cannot accurately reflect the real trend of process change; and when a time series is short and the corresponding amount of historical data is small, a linear algorithm cannot capture long-term patterns and cannot borrow patterns from other series.
Among deep-learning-based forecasting algorithms, the LSTM (Long Short-Term Memory) network structure is usually adopted. The historical observations of the variables corresponding to the time series, the historical influencing factors, and the future influencing factors of the variables are used as the input of the neural network model structure, and the future predicted values of the variables are used as its output.
An LSTM network is a temporal recurrent neural network specifically designed to solve the long-term dependency problem of general RNNs (recurrent neural networks); it is suitable for processing and predicting important events with very long intervals and delays in a time series. LSTM networks outperform temporal recurrent neural networks and hidden Markov models (HMMs). The important structures in an LSTM network are the gates: the forget gate determines which information is discarded from the block, the input gate determines whether an input is accepted into the block, and the output gate determines whether the information in the block's memory is passed out. LSTM networks are usually trained using gradient descent.
Time series forecasting algorithms based on the LSTM network model have several defects. First, gradients vanish or explode as information is passed through the LSTM network model, so prediction results may be distorted in long-range time series prediction; the gates in an LSTM network can alleviate this problem but cannot eliminate it. Second, in the LSTM network structure, information is passed node by node, front to back and bottom to top, through the layers of the network, which makes it difficult to process the multiple pieces of information in a time series in parallel and renders the training of the network model slow and inefficient. Third, to feed the time-invariant part of the influencing factors contained in the time series into the LSTM network structure, this information must be copied at every node, causing redundancy of data and computation and further reducing the processing speed of the LSTM network model. Therefore, although the LSTM network model can overcome the poor long-term prediction of the Arima/Prophet algorithms, it still cannot meet many requirements of time series forecasting.
The time series prediction process of the present application is described below with reference to the seq2seq neural network model architecture of FIG. 1 and the method flow for time series prediction of FIG. 2 according to embodiments of the present application.
The basic structure of the neural network model architecture 100 in FIG. 1 is a seq2seq (sequence-to-sequence) network model. A seq2seq neural network model can be regarded as a transformation model; the basic idea is that, of two neural network models connected in series, the former serves as the encoder network and the latter as the decoder network. The encoder network converts a data sequence into a vector or a sequence of vectors, and the decoder network generates another data sequence from that vector or sequence of vectors. One usage scenario of the seq2seq network model is speech recognition, in which the encoder network converts or segments an English sentence into English or Chinese semantic data or a semantic sequence, and the decoder network converts the semantic data or semantic sequence into the corresponding Chinese sentence. The seq2seq network model can be optimized by maximum likelihood estimation, maximizing the probability of the data sequence generated by the decoder to obtain the best transformation.
According to an embodiment of the present application, the seq2seq neural network model architecture 100 includes a first neural network model 110 as the encoder and a second neural network model 120 as the decoder.
The first neural network model 110 is used to extract information from the historical data, in particular regularity data reflecting the patterns in the historical data. According to one embodiment, the first neural network model 110 is a WaveNet neural network. As a sequence generation model, a WaveNet network is designed to predict the value of the n-th element of a data sequence from its first n-1 elements. WaveNet is particularly suitable for high-throughput input of one-dimensional data sequences whose elements are multi-dimensional vectors; such a one-dimensional network enables fast computation. The standard WaveNet network model is a convolutional neural network in which each convolutional layer convolves the previous layer; the larger the convolution kernels and the more layers the network has, the stronger its perception ability and the larger its perception range in the time domain. During generation with a WaveNet network, each newly generated node is appended after the last node of the input layer, and generation proceeds iteratively. The activation function of a WaveNet network can use gated units, for example. The hidden layers between the input layer and the output layer of the network use residual and skip connections; that is, each convolutional layer in the hidden layers adds its original value to the output of the activation function and passes the sum to the next convolutional layer. The number of channels can be reduced through 1x1 convolution kernels. The activation outputs of all hidden layers are then summed and finally emitted through the output layer.
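The patent gives no code, but a minimal PyTorch sketch of one such hidden layer, read as a dilated causal convolution with a gated activation, a 1x1 channel-mixing convolution, and a residual connection, might look as follows; all channel sizes and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """One WaveNet-style hidden layer: dilated causal conv -> gated activation
    -> 1x1 conv, with the input added back (residual). The same output would
    also feed a skip path collected by the final output layer."""

    def __init__(self, channels: int, kernel_size: int = 2, dilation: int = 1):
        super().__init__()
        # Left-pad so the convolution is causal: output t sees only inputs <= t.
        self.pad = (kernel_size - 1) * dilation
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.reduce = nn.Conv1d(channels, channels, kernel_size=1)  # 1x1 channel mix

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        padded = nn.functional.pad(x, (self.pad, 0))
        # Gated activation unit: tanh branch modulated by a sigmoid branch.
        z = torch.tanh(self.filter_conv(padded)) * torch.sigmoid(self.gate_conv(padded))
        z = self.reduce(z)
        return x + z  # residual connection

# x: batch of historical sequences, shape (batch, feature_channels, n_times)
x = torch.randn(8, 16, 64)
block = GatedResidualBlock(channels=16, kernel_size=2, dilation=2)
print(block(x).shape)  # torch.Size([8, 16, 64])
```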
As shown in FIG. 1, the first neural network model 110 has an input layer (i.e., the first convolutional layer) 112, hidden layers 113, and an output layer 114. The number of hidden layers 113 may be zero, one, or more. Each convolutional layer among the input layer 112, the hidden layers 113, and the output layer 114 has a plurality of nodes 111. The number of nodes in the input layer 112 should at least correspond to the data length of the historical data sequence, to ensure that the neural network can receive the information at every historical time.
When the first neural network model 110 using an ordinary WaveNet network is a one-dimensional causal convolutional network, the first n-1 elements of the input data sequence are needed to predict the n-th element, so the number of nodes used decreases by 1 with each convolutional layer. If the historical data is long, many layers, or a very large filter, must be added to the first neural network model 110 to accommodate the n passes; when the gradient chosen during gradient descent is then too small, training the network becomes complicated and the fit is poor.
According to an embodiment of the present application, the concept of a dilated convolutional neural network (Dilated CNN) can be introduced. A dilated convolutional neural network is a convolutional network with "holes". According to an embodiment of the present application, the first convolutional layer (i.e., the input layer) of the dilated convolutional neural network may be a one-dimensional causal convolutional network with a dilation coefficient of 1. From the second convolutional layer onward, the dilation coefficient of each convolutional layer is the dilation coefficient of the previous convolutional layer multiplied by the dilation index (Dilation Index), where the dilation index is a positive integer not less than 2 and not greater than the convolution kernel size. This dilated convolution configuration can be adopted in both the hidden layers and the output layer of the first neural network model 110. For example, when the dilation index is 2, the second convolutional layer only convolves nodes n, n-2, n-4, ..., the third convolutional layer only uses nodes n, n-4, n-8, ..., and so on.
The dilated network structure can significantly speed up the transfer of information through the neural network, avoid vanishing or exploding gradients, and improve the processing speed and prediction accuracy of the first neural network model 110. For example, when the convolution kernel size is 2 and the dilation index is 2, the number of convolutional layers through which information passes from the node corresponding to the first historical time to the node corresponding to the last historical time is log2(N), where N is the data length of the historical data sequence.
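The doubling dilation schedule and the log2(N) depth can be checked with a short sketch (illustrative, not from the patent):

```python
def dilation_schedule(seq_len: int, kernel_size: int = 2, dilation_index: int = 2):
    """Dilation coefficients per layer: the first (causal) layer uses 1, and
    each later layer multiplies the previous coefficient by the dilation index."""
    dilations, receptive_field, dilation = [], 1, 1
    while receptive_field < seq_len:
        dilations.append(dilation)
        receptive_field += (kernel_size - 1) * dilation
        dilation *= dilation_index
    return dilations, receptive_field

# With kernel size 2 and dilation index 2, a history of length N is covered
# in about log2(N) layers, as stated above.
dilations, rf = dilation_schedule(seq_len=128)
print(dilations)           # [1, 2, 4, 8, 16, 32, 64]
print(len(dilations), rf)  # 7 layers, receptive field 128 -- log2(128) == 7
```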
According to an embodiment of the present application, the second neural network model 120 serving as the decoder may be a multi-layer perceptron (MLP) network. An MLP network likewise includes an input layer, hidden layers, and an output layer, where each neuron node has an activation function (such as a sigmoid function), and the network is trained using a loss function. The MLP network predicts the future values of the object based on the historical regularities extracted by the encoder network and the future influencing factors (including dynamic and static factors).
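A minimal sketch of such an MLP decoder, applied independently at each future time so that all m predictions can be computed in parallel, could read as follows; the layer widths and the ReLU activation (the patent names sigmoid as one example) are assumptions.

```python
import torch
import torch.nn as nn

class MLPDecoder(nn.Module):
    """Per-future-time decoder: maps one prediction-feature vector
    (c_a, c_{n+j}, e_1, e_2, x_{n+j}) to the predicted future value y_{n+j}."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, m_future_times, in_dim); the same MLP is applied
        # at every future time, so the m predictions are mutually independent.
        return self.net(features)

decoder = MLPDecoder(in_dim=32, hidden_dim=64, out_dim=1)
print(decoder(torch.randn(8, 4, 32)).shape)  # torch.Size([8, 4, 1])
```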
Although a WaveNet network is used above as the example encoder network of the seq2seq neural network model architecture and an MLP network as the example decoder network, the time series forecasting method of the present application can adopt any other neural network structure capable of feature extraction and prediction on sequence data, such as, but not limited to, the various types of recurrent neural networks (RNNs) that can realize the time series prediction function of the present application. For example, when the seq2seq neural network model architecture is used, the encoder network can be an LSTM network and the decoder network an MLP network; although the LSTM network has shortcomings, combining it with an MLP network and adjusting the input and output data sequences of the network can still, to a certain extent, obtain better results than existing solutions. A WaveNet network can also be chosen as the encoder network with an LSTM network or another RNN as the decoder network, and so on.
In the flowchart of FIG. 2, the method 200 for time series prediction is used to predict, based on the historical data sequence 101 corresponding to a historical time series T1 = {t_1, t_2, t_3, ..., t_n} containing n historical times, the future data sequence 104 corresponding to a future time series T2 = {t_{n+1}, t_{n+2}, t_{n+3}, ..., t_{n+m}} containing m future times. The unit of the historical and/or future times can be chosen as hours, days, months, years, weeks, quarters, and so on, as required. For example, for predicting the number of passengers boarding and alighting at a bus stop, hours, or even minutes or quarter-hours, may be used as the time unit or interval. For a fast-food restaurant, days, months, or weeks may be used. Given the customer flow of fast-food restaurants, measuring and predicting food product sales by week reflects the historical patterns and future trends of the industry better than other units; weeks are used as the example below. According to the embodiments of the present application, a historical and/or future time can be a time point (for example, time t_1, the end of the first quarter, 10 a.m., etc.) or a continuous time period (for example, period t_2, week 2, month 3, October of the current year, etc.). When the historical and future times are time points, the time intervals between them can be equal, so that this constant interval serves as the period at which the periodic information of the historical and future data is extracted and predicted. When the historical and future times are time periods, the lengths of the periods can likewise be equal, serving as the period for extracting and predicting the periodic information of the historical and future data.
The method 200 first acquires, in step S210, the historical data sequence 101 of the object corresponding to the historical time series T1.
The historical data of the historical data sequence 101 includes the historical dynamic features x_i and the historical values y_i corresponding to the historical times t_i in the historical time series T1, where i = 1, 2, ..., n. The historical value y_i is the measured value of the object at the historical time t_i, for example the actual sales of a product. Historical values arise from factors internal to the object and can therefore also be called the object's internal factors or internal feature data. The historical dynamic feature x_i comprises the dynamic features in the historical data that affect the object's historical value y_i, for example one or more of whether the time is a holiday, the number of working days, the number of days or weeks until a holiday, and so on. Historical dynamic features are associated with time; they include, for example, periodic factors that affect the object cyclically with a certain period (also called periodic historical dynamic features) and aperiodic factors that affect the object non-periodically (also called aperiodic historical dynamic features). The period of the periodic factors can be determined by the equal time intervals between the historical time points of the historical time series, or by the length of the historical times as equal-length time periods. The way aperiodic factors affect the object is related to specific historical times; that is, they are random or event-triggered, and the aperiodic factors at each historical time t_i of the historical time series T1 may differ. The number n of historical times represents the quantity or length of the historical data.
When the object includes multiple parts or sub-objects (for example, a product that is a collection of multiple products), the historical value y_i can be a multidimensional variable or vector. Likewise, when the historical dynamic feature x_i affecting the object's historical value y_i comprises several kinds of factors, it is regarded as a combination of multiple historical sub-dynamic features, and x_i can also be a multidimensional variable or vector. A historical dynamic feature x_i and a historical value y_i can form a two-component vector (x_i, y_i)^T (which may also be called a binary data tuple; hereinafter, uniformly a two-component vector), each of whose sub-vectors x_i and y_i is a multidimensional vector as described above. The historical data sequence 101 can therefore be represented as the one-dimensional sequence of such vectors {(x_1, y_1)^T, (x_2, y_2)^T, ..., (x_n, y_n)^T}.
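A small numpy sketch of assembling this input sequence, with all feature values invented for illustration, is given below.

```python
import numpy as np

# Minimal sketch of the historical data sequence {(x_1, y_1)^T, ..., (x_n, y_n)^T}.
n = 6                                   # number of historical weeks
x_hist = np.stack([                     # dynamic features per week, here 3-dim:
    np.array([w % 52,                   #   week-of-year (periodic factor)
              1.0 if w % 52 in (5, 30) else 0.0,  # holiday flag (aperiodic)
              5.0])                     #   working days in the week
    for w in range(n)])
y_hist = np.array([[120.], [135.], [128.], [150.], [160.], [155.]])  # sales

# Each element of the sequence is the two-component vector (x_i, y_i)^T; note
# that no static (time-invariant) features are included on the encoder side.
hist_sequence = np.concatenate([x_hist, y_hist], axis=1)   # shape (n, 4)
print(hist_sequence.shape)
```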
According to the embodiments of the present application, historical static features unrelated to historical time are not added to the historical data 101. Because the first neural network model 110 serving as the encoder does not process historical static data, the redundancy of data and computation is reduced and the operating speed of the network model is improved.
In step S220, the first neural network model 110 performs the regularity extraction function of extracting, based on the historical data sequence 101, the regularity data sequence 102 of the object corresponding to the future time series T2. The historical data sequence 101 is the input of the first neural network model 110, and after transfer and computation through the convolutional layers of the WaveNet network, the extracted regularity data sequence 102 is output. The dilated convolutional network described above can accelerate this regularity extraction and improve the extraction accuracy.
Since the historical dynamic features x_i of the input historical data sequence 101 include periodic and aperiodic dynamic features, the regularity information represented in the regularity data sequence 102 output by the first neural network model 110 includes a periodic regularity feature c_a derived from the periodic historical dynamic features and aperiodic regularity features c_{n+j} derived from the aperiodic historical dynamic features, where j = 1, 2, ..., m. The periodic regularity feature c_a corresponds to the periodic historical dynamic features and cyclically affects the object's future values y_{n+j}, j = 1, 2, ..., m, with a certain period; it includes, for example, the periodic patterns by which seasonality, day of week, and/or month affect the object's historical values. Since the time intervals between the future times of the future time series T2 (when the future times are time points) and/or the lengths of the future times (when they are time periods) are set equal to those of the historical times t_{n+j} in the historical time series T1, the periodic regularity feature c_a is the same for every future time t_{n+j}. Corresponding to the aperiodic historical feature data, the aperiodic regularity feature c_{n+j} affects the object's future value y_{n+j} based on the specific future time, so c_{n+j} may differ for each future time t_{n+j} of the future time series T2. m is the number of future times in the future time series T2, i.e., the quantity or length of the future data to be predicted.
由于历史动态特征x i中包括的周期性历史动态特征可能是多个子周期性因素的组合,因此周期性规律特征c a也包括多个子周期性规律特征,其可以表示为多维向量。根据本申请的实施例,周期性规律特征c a的维数可以与周期性历史动态特征中的子周期性历史动态特征的数量相同,或者比后者小以减小运算量。类似的,非周期性历史动态特征也可能具有多个子非周期性动态特征,因此非周期性规律特征c n+j也包括多个子非周期性规律特征,其可以表示为多维向量。非周期性规律特征c n+j的维数同样可以与非周期性历史动态特征中的子周期性历史动态特征的数量相同,或者比后者小以减小运算量。这样,规律数据序列102可以表示为由周期性规律特征c a和非周期性规律特征c n+j两个多维子向量组成其元素的二维向量的一维序列{(c a,c n+1) T,(c a,c n+2) T,…,(c a,c n+m) T}。
When predicting the future values y_{n+j} of the object's future data sequence 104, other factors that influence the object at the future times may also need to be considered.

Similarly to the historical dynamic features x_i, x_{n+j} in FIG. 1 is the future dynamic feature corresponding to the future time t_{n+j} of the future time series T2 that influences the object's future value y_{n+j}. The future dynamic feature x_{n+j} may include, for example, one or more of a promotional campaign at some future time, whether the time is a holiday, the number of working days, and the number of days or weeks until a holiday. Future dynamic features are likewise associated with time and may be multi-dimensional vectors comprising several future sub-dynamic features. The future dynamic features x_{n+j} form a one-dimensional sequence of multi-dimensional vectors {x_{n+1}, x_{n+2}, …, x_{n+m}}.

The other factors may also include future static features x_s that influence the object's future values y_{n+j} but are unrelated to time. The future static features x_s may include attributes of the object (which usually relate only to the object itself and not to the future time) and other time-independent features. For example, when the object is a product, the future static features x_s may be the product's category, the product's temperature, the product's sales location (represented, for example, by the location of its distribution center), and so on; these features are associated only with the object and do not vary over time. Depending on the number of time-independent influencing factors, the future static feature x_s may be a multi-dimensional vector combining several sub-features.

According to embodiments of the present application, the future static features x_s can be processed further. The future static features x_s fall into different types, and the correlation between types differs. An embedding operation can convert sparse discrete variables into continuous ones. The future static features are embedded according to their types: for example, they are divided into two groups x_s1 and x_s2 according to location-related features and product-attribute-related features, so that different groups of future static features are uncorrelated, i.e., orthogonality is maintained. This avoids treating every concrete static influencing factor as one variable, or one dimension of a vector, thereby reducing the overall dimension of the future static features x_s and the computational load of the model. In FIG. 1, the future static feature group x_s1 comprises the multi-dimensional future static feature e_1, which can influence the object at every future time t_{n+j}, and the group x_s2 comprises the multi-dimensional future static feature e_2, which can likewise influence the object at every future time t_{n+j}. Depending on the circumstances, the number of future static features or future static feature groups may be zero, one, or more. The number of concrete features contained in each future static feature determines its dimension, which may be one or more.

The future static features x_s form a one-dimensional sequence of length m whose elements are zero-, one-, or multi-dimensional vectors: {x_s, x_s, …, x_s}. Taking the embodiment of FIG. 1 as an example, this sequence can be written as {(e_1, e_2)^T, (e_1, e_2)^T, …, (e_1, e_2)^T}.
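By way of example only, grouped embeddings of this kind might be realized with embedding tables as sketched below; the vocabulary sizes and embedding widths are invented for the illustration.

    # Illustrative sketch only: embedding the two groups of static features.
    import torch
    import torch.nn as nn

    location_emb = nn.Embedding(num_embeddings=20, embedding_dim=4)   # e.g. ~20 distribution centers
    attribute_emb = nn.Embedding(num_embeddings=50, embedding_dim=8)  # e.g. product attribute codes

    loc_id = torch.tensor([3])          # index of a distribution center
    attr_id = torch.tensor([17])        # index of a product attribute class
    e1, e2 = location_emb(loc_id), attribute_emb(attr_id)
    x_s = torch.cat([e1, e2], dim=-1)   # static feature vector, identical at all m future times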
The above describes the four groups of influencing factors that can affect the object's future data y_{n+j}. In step S230 of the method 200, the prediction feature data sequence 103 can be generated based on the regularity data sequence 102, the object's future dynamic feature sequence corresponding to the future time series, and the object's future static features. This generation can be accomplished by concatenating the one-dimensional sequences of the four kinds of influencing factors, forming the one-dimensional prediction feature data sequence 103 whose elements are multi-part vectors (which, grouping by the four kinds of factors, may also be called quadruple data tuples). As shown in FIG. 1, this one-dimensional sequence 103 can be written as {(c_a, c_{n+1}, e_1, e_2, x_{n+1})^T, (c_a, c_{n+2}, e_1, e_2, x_{n+2})^T, …, (c_a, c_{n+m}, e_1, e_2, x_{n+m})^T}.
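A minimal sketch of this concatenation step, with placeholder tensors and invented dimensions, could look as follows.

    # Illustrative sketch only: building the prediction feature sequence 103.
    import torch

    m = 4                                # future steps, e.g. 4 weeks
    c_a = torch.randn(8)                 # periodic regularity feature (shared by all steps)
    c_np = torch.randn(m, 8)             # aperiodic regularity features, one per step
    x_s = torch.randn(12)                # embedded static features (shared by all steps)
    x_fut = torch.randn(m, 23)           # future dynamic features, one per step

    pred_features = torch.stack([
        torch.cat([c_a, c_np[j], x_s, x_fut[j]]) for j in range(m)
    ])                                   # shape (m, 8 + 8 + 12 + 23)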
In the following step S240, the prediction feature data sequence 103 is input into the second neural network model 120 and, through the propagation and computation of a decoder such as a multi-layer perceptron (MLP) network, the prediction function of predicting the future data sequence 104 corresponding to the future time series T2, i.e., {y_{n+1}, y_{n+2}, …, y_{n+m}}, is performed. Each predicted future value y_{n+j} of the object in the future data sequence 104 is a multi-dimensional vector of the same dimension as the historical values y_i.

The method 200 according to embodiments of the present application optionally further comprises a step S250 of training a neural network model on a training data set to determine the model's optimal parameters before using at least one of the first and second neural network models 110 and 120, which serve as the encoder and decoder networks, respectively. After training is completed, the parameters of a neural network model may remain unchanged during use, may be updated or adjusted based on new data sets after a period of use or at predetermined intervals, or may be updated in real time in an online supervised manner.
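One possible form of such a training step, reusing the two sketches above and minimizing a squared error, is outlined below. The way the encoder output is reduced to regularity features (a last-step summary), the choice of optimizer, and all sizes are assumptions of the sketch, not the mechanism of the present application.

    # Illustrative sketch only.
    import torch

    encoder = DilatedCausalEncoder(in_channels=24, hidden_channels=8, num_layers=7)
    decoder = make_mlp_decoder(feature_dim=8 + 35, hidden_dim=64, target_dim=1)
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

    def train_step(hist_batch, fut_features, future_targets):
        # hist_batch: (B, 24, n); fut_features: (B, m, 35); targets: (B, m, 1)
        optimizer.zero_grad()
        regularity = encoder(hist_batch)[:, :, -1]      # (B, 8) summary of the history
        reg_rep = regularity.unsqueeze(1).expand(-1, fut_features.size(1), -1)
        y_hat = decoder(torch.cat([reg_rep, fut_features], dim=-1))
        loss = torch.nn.functional.mse_loss(y_hat, future_targets)
        loss.backward()
        optimizer.step()
        return loss.item()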
FIG. 3 shows an exemplary structure of an apparatus 300 for time series prediction according to embodiments of the present application. The apparatus 300 comprises a historical data acquisition unit 310, a regularity extraction unit 320, a prediction feature generation unit 330, and a prediction unit 340.

The historical data acquisition unit 310 is used to acquire the historical data sequence 101 of the object corresponding to the historical time series T1. The historical data in the historical data sequence 101 include the time-associated historical dynamic features x_i corresponding to the historical times t_i of the historical time series T1, as well as the object's historical values y_i.

The regularity extraction unit 320 comprises the first neural network model 110, which serves, for example, as the encoder network in a seq2seq neural network model to extract the regularities of the historical data. The unit uses this neural network model to extract, from the historical data sequence 101 provided by the historical data acquisition unit 310, the regularity data sequence 102 of the object corresponding to the future time series T2. The regularity data sequence 102 includes the object's periodic regularity feature c_a corresponding to the future times t_{n+j} of the future time series T2 and the aperiodic regularity features c_{n+j} associated with the corresponding future times. In the seq2seq neural network model structure, the encoder network may be a sequence data network model such as a WaveNet network, and may further adopt a structure such as a dilated convolutional network to speed up information propagation and computation.

The prediction feature generation unit 330 is used to combine the regularity data sequence 102 output by the regularity extraction unit 320, the future dynamic feature sequence composed of the future dynamic features x_{n+j} corresponding to the future times t_{n+j} of the future time series T2, and the future static features x_s, so as to generate the prediction feature data sequence 103. The future dynamic features x_{n+j} are associated with the future times t_{n+j}. In generating the prediction feature data sequence 103, the prediction feature generation unit 330 may further group the static features x_s so that the groups of static features are orthogonal to one another, thereby reducing the vector dimension of each data element of the prediction feature data sequence.

The prediction unit 340 comprises the second neural network model 120, which serves, for example, as the decoder network in the seq2seq neural network model to predict the object's future values. The unit 340 uses the second neural network model 120 to predict, from the prediction feature data sequence 103 provided by the prediction feature generation unit 330, the object's future data sequence 104 corresponding to the future time series T2. The second neural network model 120 may be a network such as a multi-layer perceptron (MLP) network.

The apparatus 300 optionally further comprises a model training unit 350 for training the neural network models in the extraction unit 320 and the prediction unit 340 before they are used, in order to determine the models' optimal parameters, and for supervising or updating the models' parameters.

Details of the functions performed by the individual units that are identical or similar to those of the method 200 for time series prediction described above are not repeated here.
To compare the performance of the time series prediction method and apparatus according to embodiments of the present application with existing time series prediction solutions, the following experiment was conducted.

The experiment was carried out in a product forecasting scenario in the catering industry. The test task required predicting the sales volume of each product (object) at each distribution center for the coming 1 to 4 weeks. The test data set covered about 20 distribution centers, each including about 200 products on average. The historical product sales data ranged from 128 weeks at the longest to 1 week at the shortest. The test task involved taking 23 dynamic influencing factors (for example, whether the time is a holiday, the number of working days, the number of weeks until the Spring Festival, etc.) and 7 static influencing factors (for example, product category, temperature, distribution center location, etc.) into account in the prediction.

Table 1 shows the training time, prediction time, and prediction error of models using different time series prediction methods. The deep learning methods using seq2seq neural network models require a large number of floating-point operations and use one more graphics processing unit (GPU) than the traditional statistical algorithm Prophet to accelerate computation.
Table 1: training time, prediction time, and prediction error of the compared methods (reproduced as an image in the original publication; the numerical values are not recoverable here).
The results show that the prediction accuracy (error) of the WaveNet-MLP seq2seq scheme according to embodiments of the present application (WaveNet network as the encoder, MLP network as the decoder) is better than that of the traditional statistical algorithm, and also better than that of the scheme that uses the seq2seq neural network model structure but LSTM networks for both the encoder and the decoder. In terms of prediction time, the neural network schemes are faster than the traditional statistical algorithm; and among the neural network schemes, the training time of the WaveNet-MLP seq2seq neural network model structure of the present application is significantly lower.

The advantages of the time series prediction method and apparatus according to embodiments of the present application therefore lie in the following aspects: using two neural network models such as a WaveNet network and an MLP network as the encoder and decoder networks, respectively, allows the computations on the historical data sequence and the future data sequence at the different historical times of the historical time series and the different future times of the future time series to proceed in parallel, increasing the speed of model training and use; using a neural network model such as a WaveNet network as the encoder, in particular with a dilated convolutional structure, shortens the path along which the information in the object's historical data sequence propagates from the first historical time to the last, avoiding vanishing and exploding gradients during training and thereby enabling long-range time series prediction; introducing the time-invariant influencing factors only at the input of the second neural network model serving as the decoder avoids copying and computing them at every time point of the encoder network, reducing data and computation redundancy; and embedding and grouping the time-invariant influencing factors, such as the static features, reduces the input data dimension while maintaining the orthogonality between the influencing factors.

It should be noted that, although several modules or units of the apparatus for time series prediction are mentioned in the detailed description above, this division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among, and embodied by, several modules or units. Components shown as modules or units may or may not be physical units; they may be located in one place or distributed over several network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. A person of ordinary skill in the art can understand and implement this without inventive effort.
In an exemplary embodiment of the present application, a computer-readable storage medium is also provided, on which a computer program is stored; the program comprises executable instructions which, when executed by, for example, a processor, can implement the steps of the method for time series prediction described in any one of the above embodiments. In some possible implementations, the various aspects of the present application may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps, according to the various exemplary embodiments of the present application, described in the method for time series prediction of this specification.

The program product for implementing the above method according to embodiments of the present application may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program usable by, or in combination with, an instruction execution system, apparatus, or device.

The program product may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.

Program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In scenarios involving a remote computing device, the remote computing device may be connected to the user's computing device via any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
In an exemplary embodiment of the present application, an electronic device is also provided, which may comprise a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform, by executing the executable instructions, the steps of the method for time series prediction of any one of the above embodiments.

Those skilled in the art will understand that the various aspects of the present application may be implemented as a system, a method, or a program product. Therefore, the various aspects of the present application may be embodied in the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".

An electronic device 400 according to such an embodiment of the present application is described below with reference to FIG. 4. The electronic device 400 shown in FIG. 4 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present application.

As shown in FIG. 4, the electronic device 400 takes the form of a general-purpose computing device. The components of the electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, a bus 430 connecting the different system components (including the storage unit 420 and the processing unit 410), a display unit 440, and so on.
The storage unit stores program code executable by the processing unit 410, causing the processing unit 410 to perform the steps, according to the various exemplary embodiments of the present application, described in the method for automatic time series prediction of this specification. For example, the processing unit 410 may perform the steps shown in FIG. 2.

The storage unit 420 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 4201 and/or a cache storage unit 4202, and may further include a read-only storage unit (ROM) 4203.

The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205, such program modules 4205 including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.

The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 400 may also communicate with one or more external devices 500 (such as a keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device that enables the electronic device 400 to communicate with one or more other computing devices (such as a router, modem, etc.). Such communication may take place via an input/output (I/O) interface 450. Furthermore, the electronic device 400 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 460. The network adapter 460 may communicate with the other modules of the electronic device 400 via the bus 430. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and so on.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to embodiments of the present application may be embodied in the form of a software product, which may be stored on a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a portable hard disk, etc.) or on a network, and which includes a number of instructions causing a computing device (which may be a personal computer, a server, or a network device, etc.) to perform the method for time series prediction according to embodiments of the present application.

Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of what is disclosed here. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art not disclosed by the present application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application being indicated by the appended claims.

Claims (22)

  1. A method for time series prediction, characterized in that it comprises:
    acquiring a historical data sequence of an object corresponding to a historical time series, the historical data in the historical data sequence comprising historical dynamic features and historical values of the object corresponding to the historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
    extracting, using a first neural network model and based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
    generating a prediction feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence comprises future dynamic features of the object corresponding to the future times in the future time series, the future dynamic features being associated with the corresponding future times; and
    predicting, using a second neural network model and based on the prediction feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence comprising predicted future values of the object corresponding to the future times of the future time series.
  2. The method according to claim 1, characterized in that the regularity data in the regularity data sequence comprise periodic regularity features and aperiodic regularity features of the object corresponding to the future times of the future time series, wherein the aperiodic regularity features are associated with the corresponding future times.
  3. The method according to claim 1, characterized in that the first neural network model constitutes the encoder in a seq2seq network model, and the second neural network model constitutes the decoder in the seq2seq network model.
  4. The method according to any one of claims 1 to 3, characterized in that the first neural network model is a WaveNet network.
  5. The method according to claim 4, characterized in that the WaveNet network is a dilated convolutional neural network.
  6. The method according to claim 5, characterized in that the WaveNet network comprises at least two convolutional layers, the first of the at least two convolutional layers being a one-dimensional convolutional layer with a dilation factor of 1, and the dilation factor of each convolutional layer after the first being the dilation factor of the previous convolutional layer multiplied by a dilation index.
  7. The method according to any one of claims 1 to 3, characterized in that the second neural network model is a multi-layer perceptron (MLP) network.
  8. The method according to any one of claims 1 to 3, characterized in that the periodic regularity features of the object are the same for every future time in the future time series.
  9. The method according to any one of claims 1 to 3, characterized in that generating the prediction feature data sequence based on the regularity data sequence, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object comprises:
    for the corresponding future times in the future time series, concatenating the periodic regularity features and the aperiodic regularity features of the regularity data sequence, the future dynamic features of the future dynamic feature sequence, and the future static features into the prediction feature data sequence.
  10. The method according to any one of claims 1 to 3, characterized in that the future static features are the same for every future time in the future time series.
  11. The method according to any one of claims 1 to 3 and 9, characterized in that it further comprises:
    embedding and grouping the future static features.
  12. The method according to any one of claims 1 to 3, characterized in that it further comprises training the neural network model to be used, before using at least one of the first neural network model and the second neural network model.
  13. The method according to any one of claims 1 to 3, characterized in that the object is a product, the historical values and the future values of the object are respectively the historical sales volume and the future sales volume of the product, and the unit of at least one of the historical times and the future times includes one of: hour, day, month, year, week, quarter.
  14. The method according to claim 13, characterized in that at least one of the historical dynamic features and the future dynamic features of the object includes at least one of: whether the time is a holiday, the number of working days, the number of days or weeks until a holiday.
  15. The method according to claim 13, characterized in that the future static features include at least one of: the product's category, the product's temperature, the product's sales location.
  16. An apparatus for time series prediction, characterized in that it comprises:
    a historical data acquisition unit configured to acquire a historical data sequence of the object corresponding to a historical time series, the historical data in the historical data sequence comprising historical dynamic features and historical values of the object corresponding to the historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
    a regularity extraction unit configured to extract, using a first neural network model and based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
    a prediction feature generation unit configured to generate a prediction feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence comprises future dynamic features of the object corresponding to the future times of the future time series, the future dynamic features being associated with the corresponding future times; and
    a prediction unit configured to predict, using a second neural network model and based on the prediction feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence comprising predicted future values of the object corresponding to the future times of the future time series.
  17. The apparatus according to claim 16, characterized in that the regularity data in the regularity data sequence comprise periodic regularity features and aperiodic regularity features of the object corresponding to the future times of the future time series, wherein the aperiodic regularity features are associated with the corresponding future times.
  18. The apparatus according to claim 16, characterized in that the first neural network model is the encoder network in a seq2seq network model, and the second neural network model is the decoder network in the seq2seq network model.
  19. The apparatus according to any one of claims 16 to 18, characterized in that the first neural network model is a WaveNet network.
  20. The apparatus according to any one of claims 16 to 18, characterized in that the second neural network model is a multi-layer perceptron (MLP) network.
  21. A computer-readable storage medium on which a computer program is stored, the computer program comprising executable instructions which, when executed by at least one processor, implement the method according to any one of claims 1 to 15.
  22. An electronic device, characterized in that it comprises:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to execute the executable instructions to implement the method according to any one of claims 1 to 15.
PCT/CN2021/118272 2020-09-14 2021-09-14 Method and apparatus for time series prediction WO2022053064A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010959817.7 2020-09-14
CN202010959817.7A CN112053004A (zh) 2020-09-14 2020-09-14 Method and apparatus for time series prediction

Publications (1)

Publication Number Publication Date
WO2022053064A1 true WO2022053064A1 (zh) 2022-03-17

Family

ID=73610632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118272 WO2022053064A1 (zh) 2020-09-14 2021-09-14 Method and apparatus for time series prediction

Country Status (2)

Country Link
CN (1) CN112053004A (zh)
WO (1) WO2022053064A1 (zh)

Also Published As

Publication number Publication date
CN112053004A (zh) 2020-12-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21866116; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21866116; Country of ref document: EP; Kind code of ref document: A1)