CN111859264A

CN111859264A - Time sequence prediction method and device based on Bayes optimization and wavelet decomposition

Info

Publication number: CN111859264A
Application number: CN202010659067.1A
Authority: CN
Inventors: 金学波; 张家辉; 苏婷立; 白玉廷; 孔建磊
Original assignee: Beijing Technology and Business University
Current assignee: Beijing Technology and Business University
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2020-10-30
Anticipated expiration: 2040-07-09
Also published as: CN111859264B

Abstract

The invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, comprising the following steps: optimizing model hyper-parameters with a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyper-parameters of a GRU sub-predictor; acquiring collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result; building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain a training result; and obtaining a prediction result according to the training result. The invention uses a Bayesian optimization algorithm to optimize the hyper-parameters and achieves high accuracy in long-term time sequence prediction tasks.

Description

Time sequence prediction method and device based on Bayes optimization and wavelet decomposition

Technical Field

The application relates to the field of time sequence prediction, in particular to a time sequence prediction method and device based on Bayesian optimization and wavelet decomposition.

Background

With the continuing progress of industrialization and urbanization, information storage, sensor networks, and computer technologies have developed rapidly, and technologies such as the Internet play an increasingly important role in daily life. A great deal of information comes from the various interactive tasks of the Internet, and most of it takes the form of time series, that is, data generated sequentially and continuously at equal time intervals, such as the temperature at a weather monitoring station or the concentration of PM2.5 in the atmosphere. Such data are not merely a record of historical events; they also contain a great deal of useful information, such as the year-round temperature variation pattern at a weather monitoring station. Therefore, by studying time series and mining the information hidden in them, the pattern of change can be grasped and future data can be predicted in advance.

Predicting data for a future period by modeling historical time series data is known as time series prediction. Research in this field already has a certain foundation, and the methods can be roughly divided into two categories. The first is the traditional probabilistic approach, which depends heavily on prior knowledge of the given data and imposes harsh modeling conditions, so its results are poor. The second is the machine learning approach, in which a parameter-learning algorithm can be designed according to the task requirements from historical data alone; modeling is comparatively easy, and machine learning methods perform better on nonlinear prediction tasks.

Machine-learning-based time series prediction started from shallow neural networks. Because of their limited depth, shallow networks cannot accurately model complex data, so they are suitable only for short-term prediction and cannot perform accurate long-term prediction. To overcome these drawbacks, network structures have gradually been deepened, and deep neural networks such as the Recurrent Neural Network (RNN) and the Gated Recurrent Unit (GRU) have become the mainstream research direction in time series prediction. However, research shows that most time series data are obtained from real environments and are therefore often strongly volatile, random, and complex, so prediction accuracy is difficult to guarantee when the data are analyzed and learned by a deep neural network alone.

Disclosure of Invention

To solve one of the above technical problems, the present invention provides a time sequence prediction method and device based on Bayesian optimization and wavelet decomposition.

A first aspect of the embodiments of the present invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, the method including:

optimizing model hyper-parameters according to a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyper-parameters of a GRU sub-predictor;

acquiring collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result;

building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the hyperparameter of the optimized GRU sub-predictor to obtain a training result;

and obtaining a prediction result according to the training result.

Preferably, optimizing the model hyper-parameters according to the Bayesian optimization method to obtain the optimal hyper-parameters includes:

defining an objective function for model hyper-parameter optimization, wherein the objective function obeys a Gaussian distribution;

obtaining a Bayesian optimization objective function from the objective function for model hyper-parameter optimization;

applying a Gaussian process to the objective function for model hyper-parameter optimization to obtain its posterior probability;

and updating the parameters of the Bayesian optimization objective function with a UCB acquisition function, according to the mean and variance of the posterior probability, to obtain the optimal hyper-parameters.

Preferably, acquiring the collected data and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result includes:

decomposing the collected data into low-frequency components and high-frequency components according to the optimized mother wavelet function and the father wavelet function corresponding to it, wherein the number of decomposition layers is the optimized number of wavelet decomposition layers;

processing the low-frequency component through a low-frequency filter to obtain a low-frequency subsequence;

and processing the high-frequency component through a high-frequency filter to obtain a high-frequency subsequence.

Preferably, building the GRU-based sub-predictor and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain the training result includes:

building a GRU sub-predictor based on the Keras/TensorFlow framework;

and respectively learning and predicting the low-frequency subsequence and the high-frequency subsequence obtained after wavelet decomposition through the GRU sub-predictor to obtain the training result of each subsequence.

Preferably, the process of obtaining the prediction result according to the training result includes:

and summing the training results of the subsequences to obtain a prediction result.

A second aspect of the embodiments of the present invention provides a timing prediction apparatus based on bayesian optimization and wavelet decomposition, where the apparatus includes a processor configured with operating instructions executable by the processor to perform the following operations:

and obtaining a prediction result according to the training result.

Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:

Preferably, the apparatus further comprises a low frequency filter and a high frequency filter, the processor configured with processor-executable operating instructions to perform operations comprising:

the low-frequency filter processes the low-frequency component to obtain a low-frequency subsequence;

and the high-frequency filter processes the high-frequency component to obtain a high-frequency subsequence.

building a GRU sub-predictor based on a Keras Tensorflow framework;

The invention has the following beneficial effects: in view of the strong nonlinearity and randomness of time series, the invention provides a hybrid deep learning model that combines a time series decomposition method with a deep neural network. Wavelet decomposition reduces the complexity of a complex sequence, a GRU network predicts each decomposed component, and the component predictions are finally fused to obtain the prediction result. The invention effectively improves prediction accuracy, uses a Bayesian optimization algorithm to optimize the hyper-parameters, and achieves high accuracy in long-term time sequence prediction tasks.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flow chart of a timing prediction method based on Bayesian optimization and wavelet decomposition;

FIG. 2 is a schematic diagram of a wavelet decomposition process;

FIG. 3 is a diagram illustrating the result of wavelet decomposition, wherein the left region is a low frequency subsequence and the right region is a high frequency subsequence;

FIG. 4 is a schematic diagram of a GRU sub-predictor;

FIG. 5 is a schematic diagram of the overall structural framework of a WD-GRU hybrid model;

FIG. 6 shows model prediction curves based on the Bayesian optimization and random search methods;

FIG. 7 shows the prediction results of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN, and WD-LSTM for hourly PM2.5 in Beijing from March 22, 2016 to April 9, 2016;

FIG. 8 is a schematic diagram comparing the RMSE and MAE of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN, and WD-LSTM in detail;

FIG. 9 is a schematic diagram comparing the NRMSE, SMAPE, and R of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN, and WD-LSTM in detail.

Detailed Description

In order to make the technical solutions and advantages of the embodiments of the present application more apparent, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only some of the embodiments of the present application and are not an exhaustive list. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises.

Embodiment 1

As shown in FIG. 1, this embodiment provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, the method including:

S101: optimizing the model hyper-parameters according to a Bayesian optimization method to obtain optimal hyper-parameters.

Specifically, the choice of hyper-parameters directly determines the performance of a deep learning model. In this embodiment, one Bayesian optimization method, Sequential Model-Based Optimization (SMBO), is implemented with the Python-based hyperopt library.

When Bayesian optimization is used to determine model parameters, an objective function and the hyper-parameter space to be optimized need to be defined. Since the training process of deep learning is effectively a black box, the Root Mean Square Error (RMSE) of the hybrid model is used as the objective function for model hyper-parameter optimization:

$$g(w) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i(w) - \hat{y}_i\right)^2}$$

where $m$ is the number of input samples, $y_i(w)$ is the predicted value, and $\hat{y}_i$ is the actual value.

The objective function of Bayesian optimization can be expressed as:

$$w^* = \arg\min_{w \in W} g(w)$$

where $w^*$ is the optimal parameter set determined by Bayesian optimization, $w$ is a set of input hyper-parameters, and $W$ is the multidimensional hyper-parameter space.

Bayesian optimization consists of a Gaussian Process (GP) stage and a hyper-parameter selection stage. In the Gaussian process stage, the objective function $g(w)$ is assumed to obey the following Gaussian distribution:

$$g(w) \sim GP(\mu(w), O(w, w'))$$

where $\mu(w)$ is the mean of $g(w)$ and $O(w, w')$ is the covariance matrix of $g(w)$. The initial $O(w, w')$ can be expressed as:

During Bayesian optimization, the covariance matrix of the Gaussian process changes with each iteration. Assuming that the set of parameters input at step $t+1$ is $w_{t+1}$, the covariance matrix can then be expressed as:

where $o = [o(w_{t+1}, w_1), o(w_{t+1}, w_2), \ldots, o(w_{t+1}, w_t)]$. The posterior probability of the objective function can then be obtained:

where $\theta$ is the observed data, $\mu_{t+1}(w)$ is the mean of $g(w)$ at step $t+1$, and $\sigma^2_{t+1}(w)$ is the variance of $g(w)$ at step $t+1$.

After the posterior probability is obtained, the optimal hyper-parameters are sought by a hyper-parameter search based on the mean and variance of the posterior probability. The search is carried out with the Upper Confidence Bound (UCB) acquisition function:

where $\zeta_{t+1}$ is a constant, $S(w \mid \theta_t)$ is the UCB acquisition function, and $w_{t+1}$ is the set of hyper-parameters selected at step $t+1$. The hyper-parameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyper-parameters of the GRU sub-predictor.
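The UCB selection step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the posterior mean and variance are already available for a finite list of candidate hyper-parameter values, that the surrogate models the negated RMSE so that larger scores are better, and that the exploration term is $\sqrt{\zeta}\,\sigma$. The function name `ucb_select` and all numbers are hypothetical.

```python
import math

def ucb_select(candidates, mean, variance, zeta=4.0):
    """Pick the candidate with the largest upper confidence bound.

    mean[i] and variance[i] are the posterior mean and variance of the
    (negated) objective at candidates[i]; zeta weights exploration.
    """
    scores = [
        m + math.sqrt(zeta) * math.sqrt(max(v, 0.0))
        for m, v in zip(mean, variance)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]

# Hypothetical posterior over three candidate decomposition depths.
levels = [2, 4, 8]
neg_rmse_mean = [-30.0, -25.0, -28.0]   # higher means lower predicted RMSE
neg_rmse_var = [1.0, 0.25, 16.0]        # level 8 is the least explored

chosen, score = ucb_select(levels, neg_rmse_mean, neg_rmse_var)
```

With these numbers the uncertain candidate wins even though its mean is not the best, which is the exploration behaviour that the variance term is meant to provide.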

S102: acquiring collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result.

Specifically, in this embodiment, once the collected data have been acquired, wavelet decomposition is performed on them to reduce the complexity of the data.

To perform wavelet decomposition on the collected data, a mother wavelet function is selected first; the mother wavelet function chosen by the optimization in S101 can be used directly:

Each mother wavelet function has a corresponding father wavelet function:

where $k$ is the scaling coefficient, $k \in R$, $k \neq 0$; $h$ is the translation coefficient, $h \in R$; and $t$ is the time index.

A complex sequence can be decomposed into a low-frequency subsequence and a high-frequency subsequence by a wavelet basis composed of the mother wavelet function and the father wavelet function:

where $M(t)$ is the decomposed sequence, $a_{k,h}$ denotes a low-frequency component with scaling coefficient $k$ and translation coefficient $h$, $d_{k,h}$ denotes a high-frequency component with scaling coefficient $k$ and translation coefficient $h$, $m$ is the length of the original sequence, and $n$ is the number of wavelet decomposition layers, which is the number obtained by the optimization in S101. A Low Pass Filter (LPF) and a High Pass Filter (HPF) are then applied to $a_{k,h}$ and $d_{k,h}$ to obtain the low-frequency subsequence $A_{k,h}$ and the high-frequency subsequence $D_{k,h}$. The wavelet decomposition process is illustrated in FIG. 2, and FIG. 3 shows the result of an 8-level wavelet decomposition of a PM2.5 sequence using the "db35" mother wavelet function.
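The decomposition into one low-frequency subsequence and several high-frequency subsequences can be illustrated with a small sketch. The example in the text uses the "db35" mother wavelet; the code below substitutes the much simpler Haar wavelet so that it needs no wavelet library, and it is therefore only an illustration of the idea, not the decomposition used in the patent. The function names are hypothetical. Each subsequence is rebuilt to the full length of the input, so the subsequences sum back to the original series.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One analysis level: split x into approximation and detail coefficients."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """One synthesis level: exact inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def rebuild(coeffs, level, is_detail):
    """Bring one coefficient set back to full length, all others set to zero."""
    cur = list(coeffs)
    for step in range(level):
        zeros = [0.0] * len(cur)
        if step == 0 and is_detail:
            cur = haar_inverse_step(zeros, cur)
        else:
            cur = haar_inverse_step(cur, zeros)
    return cur

def wavelet_subsequences(x, levels):
    """Return [A_n, D_n, ..., D_1]; every subsequence has len(x) samples."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)          # details[j] holds level j + 1
    subs = [rebuild(approx, levels, is_detail=False)]
    for level in range(levels, 0, -1):
        subs.append(rebuild(details[level - 1], level, is_detail=True))
    return subs

series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
subs = wavelet_subsequences(series, levels=2)
total = [sum(vals) for vals in zip(*subs)]   # element-wise sum of A_2, D_2, D_1
```

Because the subsequences are additive, a separate predictor can be trained on each one and the per-subsequence predictions can be summed, which is the fusion used in S104.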

S103: building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the optimized hyper-parameters of the GRU sub-predictor to obtain a training result.

Specifically, the GRU is a variant of the Long Short-Term Memory (LSTM) network. It is simpler and more effective than the LSTM network, and its structure contains only an update gate and a reset gate. In this embodiment, two-layer GRU sub-predictors may be used to learn and predict each component obtained by wavelet decomposition; FIG. 4 shows the structure of the GRU sub-predictor in this embodiment. The hyper-parameters of the GRU sub-predictor specifically include: the number of neurons in the first GRU layer, the Dropout rate, the number of training iterations, the batch size, and the optimizer.
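To make the role of the update gate and the reset gate concrete, the following is a minimal sketch of a single GRU step with a scalar input and a scalar hidden state. It follows one common textbook formulation of the GRU and is not the Keras/TensorFlow sub-predictor of this embodiment; the parameter names (`w_z`, `u_z`, and so on) and their values are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """One GRU time step for scalar input x and scalar hidden state h_prev."""
    # Update gate: how much of the new candidate state to let in.
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    # Candidate state computed from the input and the reset-scaled history.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Blend the old state and the candidate according to the update gate.
    return (1.0 - z) * h_prev + z * h_cand

def run_gru(xs, p, h0=0.0):
    """Feed a whole sequence through the cell and return the final state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h

# Hypothetical weights, chosen only to exercise the code.
params = {"w_z": 0.5, "u_z": -0.3, "b_z": 0.1,
          "w_r": 0.8, "u_r": 0.2, "b_r": 0.0,
          "w_h": 1.2, "u_h": 0.7, "b_h": -0.1}
final_state = run_gru([0.2, 0.5, 0.1, 0.9], params)
```

Some libraries swap the roles of `z` and `1 - z` in the final blend; the two conventions are equivalent up to relabelling the gate.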

GRU algorithm pseudo code:

(1) normalizing the data set theta

(2) Model learning of training data

Learn H based on θ

return H

The process from S101 to S103 above implements the model training part. First, hyper-parameter optimization is performed: the optimal hyper-parameters of the training model are determined by the Bayesian optimization method, and the model is then trained with this set of hyper-parameters. Second, during model training, the original sequence is first decomposed by wavelet decomposition into the corresponding low-frequency and high-frequency components, and a GRU sub-predictor then learns the pattern of each component. The pseudo code of the Bayesian optimization algorithm can thus be outlined:

Input: $\theta$ is the data set, $g(w)$ is the RMSE of the model, $W$ is the hyper-parameter space ($w \in W$), $H(w \mid \theta_i)$ is the UCB acquisition function, $T$ is the number of hyper-parameter selections, and $l$ is the number of subsequences of the wavelet decomposition.

Output: the optimal hyper-parameters $w^*$.

(1) Initialize $\theta^{(l)} \leftarrow \mathrm{InitSamples}(g(w), \theta, l)$

(2) for $i \leftarrow |\theta^{(l)}|$ to $T$ do

(3) Model the objective function $g(w)$ and calculate the posterior probability

(4) Update the parameters using the UCB acquisition function,

(5) Train the model provided by the invention with the hyper-parameters $w^*$ to obtain the prediction $y_i \leftarrow g(w^*)$, then calculate and update

(6)

(7)end for

(8)

(9)return w^*
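The loop in the pseudo code can be sketched end to end for a one-dimensional hyper-parameter. The code below is a simplified illustration and not the hyperopt implementation mentioned in the text: it assumes a zero-mean Gaussian process with an RBF kernel as the surrogate, a finite list of integer candidates, a UCB score on the negated objective, and a toy function `toy_rmse` standing in for training the hybrid model. All names and values are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar hyper-parameter values."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(ws, ys, w_new, noise=1e-6):
    """Posterior mean and variance at w_new given observations (ws, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(ws)]
         for i, a in enumerate(ws)]
    k = [rbf(w_new, a) for a in ws]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(w_new, w_new) - sum(ki * bi for ki, bi in zip(k, solve(K, k)))
    return mean, max(var, 0.0)

def bayes_opt(g, candidates, init, T, zeta=4.0):
    """Evaluate g at T distinct candidates, choosing each new one by UCB."""
    ws = list(init)
    ys = [-g(w) for w in ws]            # negate: the GP models "higher is better"
    while len(ws) < T:
        remaining = [w for w in candidates if w not in ws]
        if not remaining:
            break
        def ucb(w):
            m, v = gp_posterior(ws, ys, w)
            return m + math.sqrt(zeta * v)
        w_next = max(remaining, key=ucb)
        ws.append(w_next)
        ys.append(-g(w_next))
    best = max(range(len(ws)), key=ys.__getitem__)
    return ws[best], -ys[best], ws

def toy_rmse(levels):
    """Stand-in for the hybrid model's RMSE; smallest at 6 levels."""
    return (levels - 6) ** 2 / 10.0 + 1.0

best_w, best_val, history = bayes_opt(
    toy_rmse, candidates=list(range(1, 11)), init=[1, 10], T=6)
```

In the setting described in the text, each call to `g` would train the wavelet-plus-GRU model with the proposed hyper-parameters and return its RMSE, so the number of evaluations `T` dominates the cost of the search.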

S104: obtaining a prediction result according to the training result.

Specifically, after each component obtained by wavelet decomposition has been learned and predicted by a GRU sub-predictor in S103 to obtain a training result for each subsequence, the training results of the subsequences are summed to obtain the prediction result.

The method proposed in this embodiment is further illustrated below by two specific examples.

Example 1

In the method provided by this embodiment, the optimal hyper-parameters are determined by Bayesian optimization. This example demonstrates the use of the Bayesian optimization method and verifies its results.

The data used in the experiment are described first. Studies have shown that PM2.5 sequences are strongly nonlinear and strongly random, so this experiment uses the PM2.5 data set from the U.S. Department of State, which records the hourly average PM2.5 concentration in Beijing over the 5 years from 2013 to 2017, 37,704 records in total, in units of μg/m³. The model prediction horizon is set to 24 steps, that is, the model predicts the values for the next 24 hours from the historical data of the previous 24 hours. FIG. 5 shows the overall structure of the model of this embodiment. Testing was performed on the hourly PM2.5 concentration data for Beijing from March 22, 2016 to April 9, 2016.
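The 24-steps-in, 24-steps-out setup can be illustrated by how training pairs might be cut from an hourly series. This is a sketch under the assumption of a stride-1 sliding window; the text does not state how the windows were actually built, and the function name `make_windows` is hypothetical.

```python
def make_windows(series, n_in=24, n_out=24):
    """Cut (input, target) pairs: n_in past values followed by n_out future values."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets

# Hypothetical stand-in for 60 hourly PM2.5 readings.
hourly = list(range(60))
X, Y = make_windows(hourly)
```

In the hybrid model, the same windowing would be applied to each wavelet subsequence separately, since each subsequence has its own sub-predictor.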

Next, hyper-parameter optimization is performed. According to step S101, a hyper-parameter space and an objective function need to be defined. Table 1 shows the hyper-parameter space used in this example, which defines the ranges of the number of wavelet decomposition layers, the mother wavelet function, the number of neurons in the first layer, the Dropout rate, the batch size, the number of training iterations, and the optimizer; the RMSE of the hybrid model is chosen as the objective function of the optimization. After 100 selections in this example, Bayesian optimization yields a set of optimal hyper-parameters. Table 2 shows the hyper-parameters determined by Bayesian optimization, together with a set of hyper-parameters obtained by the traditional random search method for comparison.

Next, the model is trained. The final results are shown in Table 3, in which Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), Mean Absolute Error (MAE), Symmetric Mean Absolute Percentage Error (SMAPE), and the Pearson correlation coefficient (R) are used as the evaluation criteria for model performance. Compared with the traditional random search method, the model trained with the Bayesian optimization method performs better: its RMSE reaches 21.7300 μg/m³ and its R is 0.9276. FIG. 6 shows the model prediction curves based on the Bayesian optimization and random search methods: curve A is the prediction curve based on random search, curve B is the prediction curve based on Bayesian optimization, and curve C is the true value. The prediction curve based on Bayesian optimization is visibly closer to the true value.
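The five evaluation criteria can be computed as in the sketch below. The text does not give their formulas, so two choices here are assumptions: NRMSE is taken as RMSE divided by the range of the actual values, and SMAPE is taken as the mean of $2|p - a| / (|a| + |p|)$ expressed as a fraction rather than a percentage. The function name `evaluate` and the sample numbers are hypothetical.

```python
import math

def evaluate(actual, pred):
    """Return RMSE, NRMSE, MAE, SMAPE and Pearson R for two equal-length lists."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: normalize by the range of the actual values.
    nrmse = rmse / (max(actual) - min(actual))
    # Assumption: symmetric form; pairs where both values are zero add nothing.
    smape = sum(2.0 * abs(p - a) / (abs(a) + abs(p))
                for a, p in zip(actual, pred) if abs(a) + abs(p) > 0) / n
    mean_a, mean_p = sum(actual) / n, sum(pred) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, pred))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    r = cov / math.sqrt(var_a * var_p)
    return {"RMSE": rmse, "NRMSE": nrmse, "MAE": mae, "SMAPE": smape, "R": r}

# Hypothetical values, not taken from the patent's tables.
scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

RMSE and MAE carry the unit of the data (μg/m³ here), while NRMSE, SMAPE, and R are dimensionless, which is why the text quotes units only for RMSE.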

To further verify the feasibility of Bayesian optimization, this example includes an experiment on how the number of wavelet decomposition layers affects the performance of the whole model. Ten models are constructed, all using the hyper-parameters determined by Bayesian optimization in Table 2 and differing only in the number of wavelet decomposition layers. The specific experimental settings are as follows:

(1) model 1: performing 1-layer wavelet decomposition, and training 2 GRUs for A1 and D1 respectively;

(2) model 2: performing 2-layer wavelet decomposition, and training 3 GRUs for A2, D1 and D2 respectively;

(3) model 3: performing 3-layer wavelet decomposition, and training 4 GRUs for A3 and D1-D3 respectively;

(4) model 4: performing 4-layer wavelet decomposition, and training 5 GRUs for A4 and D1-D4 respectively;

(5) model 5: performing 5-layer wavelet decomposition, and training 6 GRUs for A5 and D1-D5 respectively;

(6) model 6: performing 6-layer wavelet decomposition, and training 7 GRUs for A6 and D1-D6 respectively;

(7) model 7: performing 7-layer wavelet decomposition, and training 8 GRUs for A7 and D1-D7 respectively;

(8) model 8: performing 8-layer wavelet decomposition, and training 9 GRUs for A8 and D1-D8 respectively;

(9) model 9: performing 9-layer wavelet decomposition, and training 10 GRUs for A9 and D1-D9 respectively;

(10) model 10: performing 10-layer wavelet decomposition, and training 11 GRUs for A10 and D1-D10 respectively.

Table 4 gives the 5 evaluation indexes for each model; model 8 is the model that uses the Bayesian-optimized parameters throughout. According to the results, the five indexes improve as the number of wavelet decomposition layers increases, and once the number of layers reaches 6 the indexes tend to stabilize, with the RMSE falling from 48.5712 μg/m³ to 22.0185 μg/m³. Model 8 has the best values of MAE and NRMSE, and its other 3 indexes are very close to the best values among the 10 models, differing by only 0.0132. Taken as a whole, model 8 therefore performs best, which verifies the effectiveness of the Bayesian optimization method.

The experimental results show that hyper-parameter optimization with the Bayesian optimization algorithm is feasible in a hybrid deep learning model and that, compared with traditional hyper-parameter optimization methods such as random search, a model trained with the Bayesian optimization method reaches a better level of performance.

TABLE 1 Bayesian optimization of hyperparametric space

TABLE 2 Bayesian optimization and random search determined optimal hyperparameters

TABLE 3 model Performance based on Bayesian optimization and stochastic search methods

TABLE 4 analysis of predictive Performance of different wavelet decomposition levels

TABLE 5 prediction indexes of six models in the same test set

Example 2

Example 1 implemented the Bayesian optimization algorithm and verified its feasibility. This example demonstrates the accuracy advantage of the model proposed in this embodiment (WD-GRU) by comparison with other models.

The data used in the experiment are described first. The data set and test set are the same as those in Example 1, and the prediction horizon is likewise set to 24 steps.

In this example, a comparison is made with five combined models, each of which likewise combines time series decomposition with a deep network. The models compared are Decomposition-ARIMA-GRU-GRU, EMD-RNN (EMD combined with RNN), EMDCNN-GRU (EMD, CNN, and GRU combined), WD-RNN (wavelet decomposition combined with RNN), WD-LSTM (wavelet decomposition combined with LSTM), and the WD-GRU (wavelet decomposition combined with GRU) proposed in this embodiment.

The prediction results of these six models are shown in FIG. 7, where curves 1 to 6 represent the Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN, WD-LSTM, and WD-GRU models respectively, and curve 7 represents the true curve. All six curves follow approximately the same trend as the true curve, and the WD-GRU model is the closest to it. Table 5 shows the five evaluation indexes of the six models; the values in red in the table are the best values for each index. All evaluation indexes of the WD-GRU model provided by the invention are the best values, with the RMSE reaching 21.7300 μg/m³. FIG. 8 and FIG. 9 show the 5 indexes in more detail: compared with the EMDCNN-GRU model, which has comparatively good predictive performance, the WD-GRU model improves RMSE, MAE, NRMSE, SMAPE, and R by 38.3%, 31.5%, 51.4%, 9.8%, and 17.9% respectively, a substantial improvement in accuracy.

The advantages of the model of this embodiment are analyzed further by comparison. First, wavelet decomposition proves effective in the study of complex PM2.5 sequences. In a hybrid model, the PM2.5 sequence is decomposed before prediction to reduce its complexity, typically by wavelet decomposition, empirical mode decomposition (EMD), or seasonal-trend decomposition based on Loess (STL), and prediction is then performed with an RNN or GRU network. Among the three hybrid models WD-RNN, EMD-RNN, and Decomposition-ARIMA-GRU-GRU in Table 5, the WD-RNN model leads on all indexes. Compared with EMD-RNN, the better-scoring of the other two, and although both use the same RNN network as the sub-predictor, the wavelet-decomposition-based WD-RNN model improves RMSE by 36.0%, MAE by 36.1%, NRMSE by 34.6%, SMAPE by 23.9%, and R by 25.0%. The Decomposition-ARIMA-GRU-GRU model uses the better-performing GRU network, yet its overall prediction performance is worse.

Second, choosing the GRU model as the predictor for each component of the wavelet decomposition gives the model better performance. In Table 5, the WD-RNN and WD-LSTM models differ from the WD-GRU model only in the choice of sub-predictor, yet the WD-GRU model, with the GRU network as its sub-predictor, has a clear advantage on all indexes. Compared with the WD-LSTM model, the root mean square error of the proposed model is reduced by 4.7035 μg/m³ and the Pearson correlation coefficient increases from 0.8932 to 0.9276. Similarly, between the EMD-RNN and EMDCNN-GRU models, the model using the GRU network performs better.

Finally, the performance lead of the model of this embodiment comes not only from wavelet decomposition and the GRU but also from determining the model's hyper-parameters with the Bayesian optimization algorithm, which lets the model reach its best performance. According to the data in Table 5, the WD-LSTM model, whose hyper-parameters are not determined by Bayesian optimization, has an NRMSE of 0.0935, whereas the NRMSE of the WD-GRU model is 0.0682, an improvement that would be difficult to achieve by relying on the GRU network alone. The WD-GRU model shows similar improvements on the other indexes.

Result analysis: first, the training process of the model provided by this embodiment is reasonable; the hyper-parameters determined by Bayesian optimization allow the model to perform closer to its full potential, and wavelet decomposition is effective for analyzing the data of a complex sequence. Second, the model outperforms common hybrid predictors on the long-term task of predicting atmospheric PM2.5 concentration over a 24-hour horizon, which supports the practical value of the WD-GRU model provided by this embodiment for general time series prediction.

Embodiment 2

Corresponding to Embodiment 1, this embodiment provides a time sequence prediction device based on Bayesian optimization and wavelet decomposition, the device including a processor configured with processor-executable operating instructions to perform the following operations:

and obtaining a prediction result according to the training result.

Specifically, for the working principle and calculation steps of the device provided in this embodiment, refer to the description in Embodiment 1; they are not repeated here. In this embodiment, wavelet decomposition reduces the complexity of a complex sequence, a GRU network then predicts each decomposed component, and the component predictions are finally fused to obtain the prediction result. This effectively improves prediction accuracy; a Bayesian optimization algorithm is used to optimize the hyper-parameters, and the device achieves high accuracy in long-term time sequence prediction tasks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A time sequence prediction method based on Bayesian optimization and wavelet decomposition is characterized by comprising the following steps:

and obtaining a prediction result according to the training result.

2. The method according to claim 1, wherein the process of optimizing the model hyper-parameters according to the bayesian optimization method to obtain the optimal hyper-parameters comprises:

3. The method according to claim 1 or 2, wherein the process of acquiring the acquired data, performing wavelet decomposition on the acquired data according to the number of wavelet decomposition layers obtained after optimization and a mother wavelet function in the wavelet decomposition to obtain a decomposition result comprises:

4. The method of claim 3, wherein the building is based on a GRU sub-predictor, and the process of learning and predicting the decomposition result according to the hyperparameter of the GRU sub-predictor obtained after optimization to obtain the training result comprises:

building a GRU sub-predictor based on a Keras Tensorflow framework;

5. The method of claim 4, wherein obtaining the predicted outcome from the training outcome comprises:

6. An apparatus for temporal prediction based on bayesian optimization and wavelet decomposition, the apparatus comprising a processor configured with processor-executable operational instructions to perform the following operations:

and obtaining a prediction result according to the training result.

7. The apparatus of claim 6, wherein the processor is configured with processor-executable operating instructions to:

8. The apparatus of claim 6 or 7, further comprising a low frequency filter and a high frequency filter, the processor configured with processor-executable operating instructions to:

9. The apparatus of claim 8, wherein the processor is configured with processor-executable operating instructions to:

building a GRU sub-predictor based on a Keras Tensorflow framework;

10. The apparatus of claim 9, wherein the processor is configured with processor-executable operating instructions to: