CN111859264A - Time sequence prediction method and device based on Bayes optimization and wavelet decomposition - Google Patents


Publication number
CN111859264A
Authority
CN
China
Prior art keywords
gru
wavelet decomposition
model
decomposition
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010659067.1A
Other languages
Chinese (zh)
Other versions
CN111859264B (en)
Inventor
金学波
张家辉
苏婷立
白玉廷
孔建磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN202010659067.1A
Publication of CN111859264A
Application granted
Publication of CN111859264B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/148 Wavelet transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Operations Research (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, comprising the following steps: optimizing model hyperparameters by a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of a GRU sub-predictor; obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result; building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain a training result; and obtaining a prediction result from the training result. The invention uses a Bayesian optimization algorithm to optimize the hyperparameters and achieves high accuracy in long-term time sequence prediction tasks.

Description

Time sequence prediction method and device based on Bayes optimization and wavelet decomposition
Technical Field
The application relates to the field of time sequence prediction, in particular to a time sequence prediction method and device based on Bayesian optimization and wavelet decomposition.
Background
With the continuing progress of industrialization and urbanization, information storage, sensor networks and computer technologies have developed rapidly, and technologies such as the Internet play an increasingly important role in daily life. A great deal of information comes from the various interactive tasks of the Internet, and most of it takes the form of time sequences generated continuously at equal time intervals, such as the temperature recorded at a weather monitoring station or the concentration of PM2.5 in the atmosphere. Such data are not merely a record of historical events; they also contain a great deal of useful information, such as the year-round pattern of temperature change at a weather monitoring station. By studying and mining the information hidden in time sequence data, the pattern of change can be grasped and future data can be predicted in advance.
Predicting data for a future period by modeling historical time sequence data falls within the scope of time sequence prediction. Research in this field is well established, and existing methods can be roughly divided into two categories. The first is the traditional probabilistic approach, which is heavily constrained by prior knowledge of the given data and imposes harsh modeling conditions, so its results are poor. The second is the machine learning approach, in which a parameter-learning algorithm can be designed according to the task requirements using only historical data; modeling is comparatively easy, and machine learning methods perform better on nonlinear prediction tasks.
Machine-learning-based time sequence prediction started from shallow neural networks. Limited by network depth, a shallow neural network cannot accurately model complex data, so it is suitable only for short-term prediction and cannot perform accurate long-term prediction. To overcome this drawback, network structures have gradually been deepened, and deep neural networks such as the Recurrent Neural Network (RNN) and the Gated Recurrent Unit (GRU) have become the mainstream research direction in time sequence prediction. However, most time sequence data are obtained from real environments and therefore tend to be highly volatile, random and complex, so prediction accuracy is difficult to guarantee when the data are analyzed and learned by a deep neural network alone.
Disclosure of Invention
In order to solve one of the above technical problems, the present invention provides a time sequence prediction method and apparatus based on Bayesian optimization and wavelet decomposition.
A first aspect of the embodiments of the present invention provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, the method including:
optimizing model hyperparameters by a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of a GRU sub-predictor;
obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result;
building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain a training result;
and obtaining a prediction result according to the training result.
Preferably, the process of optimizing the model hyper-parameters according to the bayesian optimization method to obtain the optimal hyper-parameters includes:
defining an objective function of model hyper-parametric optimization, wherein the objective function of model hyper-parametric optimization obeys Gaussian distribution;
obtaining a Bayesian optimized objective function according to the model hyperparametric optimized objective function;
carrying out Gaussian process processing on the objective function optimized by the model hyperparameter to obtain the posterior probability of the objective function optimized by the model hyperparameter;
and updating parameters of the Bayesian optimized target function by adopting a UCB acquisition function according to the mean value and the variance of the posterior probability to obtain the optimal hyper-parameter.
Preferably, the process of obtaining the collected data and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result includes:
decomposing the collected data into low-frequency components and high-frequency components according to the optimized mother wavelet function and the father wavelet function corresponding to the mother wavelet function, wherein the number of decomposition layers is set to the optimized number of wavelet decomposition layers;
processing the low-frequency component through a low-frequency filter to obtain a low-frequency subsequence;
and processing the high-frequency component through a high-frequency filter to obtain a high-frequency subsequence.
Preferably, the process of building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain the training result, includes:
building a GRU sub-predictor based on the Keras/TensorFlow framework;
and learning and predicting the low-frequency subsequence and the high-frequency subsequence obtained from the wavelet decomposition separately with the GRU sub-predictor to obtain the training result of each subsequence.
Preferably, the process of obtaining the prediction result according to the training result includes:
and summing the training results of the subsequences to obtain a prediction result.
A second aspect of the embodiments of the present invention provides a time sequence prediction apparatus based on Bayesian optimization and wavelet decomposition, where the apparatus includes a processor configured with operating instructions executable by the processor to perform the following operations:
optimizing model hyperparameters by a Bayesian optimization method to obtain optimal hyperparameters, wherein the model hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of a GRU sub-predictor;
obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result;
building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the optimized hyperparameters of the GRU sub-predictor to obtain a training result;
and obtaining a prediction result according to the training result.
Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:
defining an objective function of model hyper-parametric optimization, wherein the objective function of model hyper-parametric optimization obeys Gaussian distribution;
obtaining a Bayesian optimized objective function according to the model hyperparametric optimized objective function;
carrying out Gaussian process processing on the objective function optimized by the model hyperparameter to obtain the posterior probability of the objective function optimized by the model hyperparameter;
and updating parameters of the Bayesian optimized target function by adopting a UCB acquisition function according to the mean value and the variance of the posterior probability to obtain the optimal hyper-parameter.
Preferably, the apparatus further comprises a low frequency filter and a high frequency filter, the processor configured with processor-executable operating instructions to perform operations comprising:
decomposing the collected data into low-frequency components and high-frequency components according to the optimized mother wavelet function and the father wavelet function corresponding to the mother wavelet function, wherein the number of decomposition layers is set to the optimized number of wavelet decomposition layers;
the low-frequency filter processes the low-frequency component to obtain a low-frequency subsequence;
and the high-frequency filter processes the high-frequency component to obtain a high-frequency subsequence.
Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:
building a GRU sub-predictor based on the Keras/TensorFlow framework;
and respectively learning and predicting the low-frequency subsequence and the high-frequency subsequence obtained after wavelet decomposition through the GRU sub-predictor to obtain the training result of each subsequence.
Preferably, the processor is configured with processor-executable operating instructions to perform the following operations:
and summing the training results of the subsequences to obtain a prediction result.
The invention has the following beneficial effects: in view of the strong nonlinearity and strong randomness of time sequences, the invention provides a hybrid deep learning model that combines a time sequence data decomposition method with a deep neural network. Wavelet decomposition reduces the complexity of the complex sequence, a GRU network predicts each decomposed component, and the component predictions are finally fused to obtain the prediction result. The invention can effectively improve prediction accuracy, uses a Bayesian optimization algorithm to optimize the hyperparameters, and achieves high accuracy in long-term time sequence prediction tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a time sequence prediction method based on Bayesian optimization and wavelet decomposition;
FIG. 2 is a schematic diagram of a wavelet decomposition process;
FIG. 3 is a diagram illustrating the result of wavelet decomposition, wherein the left region is a low frequency subsequence and the right region is a high frequency subsequence;
FIG. 4 is a schematic diagram of a GRU sub-predictor;
FIG. 5 is a schematic diagram of the overall structural framework of a WD-GRU hybrid model;
FIG. 6 is a diagram of the model prediction curves based on the Bayesian optimization and random search methods;
FIG. 7 is a graph showing the prediction results of Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN and WD-LSTM for hourly PM2.5 in Beijing from 3/22/2016 to 4/9/2016;
FIG. 8 is a schematic diagram comparing the details of RMSE and MAE for Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN and WD-LSTM;
FIG. 9 is a schematic diagram comparing the details of NRMSE, SMAPE and R for Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN and WD-LSTM.
Detailed Description
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not an exhaustive list. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other provided there is no conflict.
Embodiment 1
As shown in fig. 1, the present embodiment provides a time sequence prediction method based on Bayesian optimization and wavelet decomposition, the method including:
S101, optimizing the model hyperparameters by a Bayesian optimization method to obtain optimal hyperparameters.
Specifically, the choice of hyperparameters directly determines the performance of a deep learning model. In this embodiment, Sequential Model-Based Optimization (SMBO), one of the Bayesian optimization methods, is implemented with the Python-based hyperopt library.
When Bayesian optimization is used to determine the model parameters, an objective function and the hyperparameter space to be optimized need to be defined. Since the training process of deep learning is effectively a black box, the Root Mean Square Error (RMSE) of the hybrid model is used as the objective function for model hyperparameter optimization:
$$g(w) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i(w) - \hat{y}_i\right)^2}$$
where m is the number of input samples, $y_i(w)$ is the predicted value, and $\hat{y}_i$ is the actual value.
The objective function of bayesian optimization can be expressed as:
$$w^{*} = \arg\min_{w \in W} g(w)$$
where $w^{*}$ is the optimal set of hyperparameters determined by Bayesian optimization, w is a set of input hyperparameters, and W is the parameter space of the multidimensional hyperparameters.
Bayesian optimization consists of a Gaussian process (GP) and a hyperparameter selection process. In the Gaussian process, the objective function g(w) is assumed to obey the following Gaussian distribution:
$$g(w) \sim GP\left(\mu(w),\, O(w, w')\right)$$
where μ(w) is the mean of g(w) and O(w, w') is the covariance matrix of g(w). The initial O(w, w') can be expressed as:
$$O(w, w') = \begin{bmatrix} o(w_1, w_1) & \cdots & o(w_1, w_t) \\ \vdots & \ddots & \vdots \\ o(w_t, w_1) & \cdots & o(w_t, w_t) \end{bmatrix}$$
During Bayesian optimization, the covariance matrix of the Gaussian process changes with each iteration. Assuming that the set of parameters input at step t+1 is $w_{t+1}$, the covariance matrix can then be expressed as:
$$O' = \begin{bmatrix} O & o^{T} \\ o & o(w_{t+1}, w_{t+1}) \end{bmatrix}$$
where $o = [o(w_{t+1}, w_1),\, o(w_{t+1}, w_2),\, \ldots,\, o(w_{t+1}, w_t)]$. The posterior probability of the objective function can then be obtained:
$$P\left(g_{t+1} \mid \theta_{t},\, w_{t+1}\right) \sim N\left(\mu_{t+1}(w),\, \sigma_{t+1}^{2}(w)\right)$$
where $\theta_t$ is the observed data, $\mu_{t+1}(w)$ is the mean of g(w) at step t+1, and $\sigma_{t+1}^{2}(w)$ is the variance of g(w) at step t+1.
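The posterior computation described above can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the squared-exponential kernel `rbf`, the zero prior mean, the small `noise` jitter, the restriction to a one-dimensional w, and all function names are assumptions made for illustration, since the source does not specify the kernel o(·,·).

```python
import math

def rbf(a, b, length=1.0):
    """Assumed kernel o(a, b): squared-exponential with unit variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(ws, gs, w_new, noise=1e-6):
    """Posterior mean and variance of g(w_new) given t observations (ws, gs)."""
    # O: t x t covariance matrix of the observed points (jitter on the diagonal).
    O = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(ws)]
         for i, a in enumerate(ws)]
    # o: covariances between the new point and each observed point.
    o = [rbf(w_new, a) for a in ws]
    alpha = solve(O, gs)   # O^{-1} g
    beta = solve(O, o)     # O^{-1} o^T
    mean = sum(oi * ai for oi, ai in zip(o, alpha))
    var = rbf(w_new, w_new) - sum(oi * bi for oi, bi in zip(o, beta))
    return mean, max(var, 0.0)
```

Near an already evaluated point the posterior variance shrinks towards zero, while far from all observations it returns to the prior variance; this is the quantity the acquisition function trades off against the mean.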
After the posterior probability is obtained, the optimal hyper-parameter is searched by a hyper-parameter search method based on the mean and variance of the posterior probability, and the hyper-parameter search is completed by a UCB acquisition function:
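$$w_{t+1} = \arg\max_{w \in W} S(w \mid \theta_t) = \arg\max_{w \in W}\left[\mu_t(w) + \zeta_{t+1}^{1/2}\,\sigma_t(w)\right]$$
where $\zeta_{t+1}$ is a constant, $S(w \mid \theta_t)$ is the UCB acquisition function, and $w_{t+1}$ is the hyperparameter set selected at step t+1. The hyperparameters comprise the number of wavelet decomposition layers, the mother wavelet function used in the wavelet decomposition, and the hyperparameters of the GRU sub-predictor.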
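As a hedged illustration of the selection step, the sketch below picks the next candidate from a finite list given posterior means and variances. The score form mean + sqrt(ζ)·std is the commonly used UCB expression and is an assumption here, as are the function name `ucb_select` and the default `zeta`. Because the objective in this document is an RMSE to be minimized, the sketch also assumes the surrogate is fitted to the negated RMSE, so that maximizing the score is meaningful.

```python
import math

def ucb_select(candidates, means, variances, zeta=2.0):
    """Return (candidate, score) maximizing mean + sqrt(zeta) * std.

    means/variances are the posterior mean and variance of the (negated)
    objective at each candidate; zeta controls exploration.
    """
    best, best_score = None, -math.inf
    for w, mu, var in zip(candidates, means, variances):
        score = mu + math.sqrt(zeta) * math.sqrt(max(var, 0.0))
        if score > best_score:
            best, best_score = w, score
    return best, best_score
```

With a large `zeta` the rule favours uncertain candidates (exploration); with `zeta = 0` it reduces to picking the best posterior mean (exploitation).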
S102, obtaining collected data, and performing wavelet decomposition on the collected data according to the optimized number of wavelet decomposition layers and the optimized mother wavelet function to obtain a decomposition result.
Specifically, in this embodiment, after the collected data are obtained, wavelet decomposition is performed on them to reduce the complexity of the data.
When wavelet decomposition is performed on the collected data, a mother wavelet function is selected first; the mother wavelet function obtained by the optimization in S101 can be used directly:
$$\psi_{k,h}(t) = \frac{1}{\sqrt{|k|}}\,\psi\!\left(\frac{t - h}{k}\right)$$
Each mother wavelet function has a corresponding father wavelet function:
$$\varphi_{k,h}(t) = \frac{1}{\sqrt{|k|}}\,\varphi\!\left(\frac{t - h}{k}\right)$$
where k is the scaling coefficient, $k \in R$, $k \neq 0$; h is the translation coefficient, $h \in R$; and t is the time index.
A complex sequence can be decomposed into low-frequency and high-frequency subsequences by a wavelet basis composed of the mother wavelet function and the father wavelet function:
Figure BDA0002577829280000082
where M(t) is the decomposed sequence, $a_{k,h}$ is the low-frequency component with scaling coefficient k and translation coefficient h, $d_{k,h}$ is the high-frequency component with scaling coefficient k and translation coefficient h, m is the length of the original sequence, and n is the number of wavelet decomposition layers, which is set to the value obtained by the optimization in S101. A low-pass filter (LPF) and a high-pass filter (HPF) are then applied to $a_{k,h}$ and $d_{k,h}$ to obtain the low-frequency subsequence $A_{k,h}$ and the high-frequency subsequence $D_{k,h}$. The wavelet decomposition process is illustrated in fig. 2, and fig. 3 shows the result of an 8-level wavelet decomposition of a PM2.5 sequence using the "db35" mother wavelet function.
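The multi-level decomposition into one low-frequency subsequence and n high-frequency subsequences can be sketched in pure Python. This sketch deliberately swaps in the Haar wavelet for simplicity, not the "db35" wavelet named above; the function names and the requirement that the length be divisible by 2^n are assumptions of the illustration. Each band is reconstructed back to full length on its own, so the subsequences add up to the original series.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One Haar level: low-pass (approximation) and high-pass (detail) outputs."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return a, d

def haar_inverse(a, d):
    """Invert one Haar level, doubling the length."""
    out = []
    for ai, di in zip(a, d):
        out.extend([(ai + di) / SQRT2, (ai - di) / SQRT2])
    return out

def wavelet_subsequences(x, levels):
    """Return full-length subsequences {'A<n>', 'D1', ..., 'D<n>'} of x."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[j - 1] holds the level-j detail coefficients

    def rebuild(keep):
        # keep == 0 keeps only the approximation; keep == j keeps only level-j details.
        cur = approx if keep == 0 else [0.0] * len(approx)
        for lvl in range(levels, 0, -1):
            d = details[lvl - 1]
            cur = haar_inverse(cur, d if keep == lvl else [0.0] * len(d))
        return cur

    subs = {f"A{levels}": rebuild(0)}
    for lvl in range(1, levels + 1):
        subs[f"D{lvl}"] = rebuild(lvl)
    return subs
```

Because the inverse transform is linear, summing the returned subsequences element by element recovers the input, which is the property the later fusion step relies on.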
S103, building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the hyperparameter of the GRU sub-predictor obtained after optimization to obtain a training result.
Specifically, the GRU is a variant of the Long Short-Term Memory (LSTM) network that is simpler and more effective than the LSTM network; the structure of the GRU network contains only an update gate and a reset gate. In this embodiment, a two-layer GRU sub-predictor may be used to learn and predict each part obtained by the wavelet decomposition; fig. 4 shows the structure of the GRU sub-predictor in this embodiment. The hyperparameters of the GRU sub-predictor specifically include: the number of neurons in the first GRU layer, the Dropout rate, the number of training epochs, the batch size, and the optimizer.
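To make the two-gate structure concrete, here is a minimal scalar GRU cell in pure Python. It is an illustrative sketch only, not the Keras/TensorFlow sub-predictor of the embodiment: the hidden size of 1, the parameter names in `p`, and the update convention h' = (1 − z)·h + z·h̃ (some libraries swap the roles of z and 1 − z) are all assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU step for scalar input x and scalar hidden state h.

    p is a dict of scalar weights (hypothetical names): w* act on the input,
    u* act on the previous hidden state, b* are biases.
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])               # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])               # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_cand

def gru_run(xs, p, h0=0.0):
    """Run the cell over a sequence and return the final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h
```

The update gate z decides how much of the previous state is replaced, and the reset gate r decides how much of the previous state feeds the candidate; there is no separate cell state or output gate as in an LSTM.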
GRU algorithm pseudo code:
(1) normalizing the data set theta
Figure BDA0002577829280000083
(2) Model learning of training data
Learn H based on θ
return H
Steps S101 to S103 above implement the model training part. First, hyperparameter optimization is performed: the optimal hyperparameters of the training model are determined by the Bayesian optimization method, and the model is then trained with this set of hyperparameters. Second, during model training, the original sequence is decomposed by wavelet decomposition into the corresponding low-frequency and high-frequency components, and a GRU sub-predictor then learns the pattern of each component. The pseudo code of the Bayesian optimization algorithm can therefore be given as follows:
Input: θ is the data set, g(w) is the RMSE of the model, w is a point in the hyperparameter space W (w ∈ W), S(w | θ_i) is the UCB acquisition function, T is the number of hyperparameter selections, and l is the number of subsequences of the wavelet decomposition.
Output: the optimal hyperparameters w*
(1) Initialize θ(l) ← InitSamples(g(w), θ, l)
(2) for i ← |θ(l)| to T do
(3) Model the objective function g(w) and calculate the posterior probability
Figure BDA0002577829280000091
(4) Update the parameters using the UCB acquisition function
(5) Train the model provided by the invention with the hyperparameters w* to obtain the prediction y_i ← g(w*), then calculate and update
Figure BDA0002577829280000093
(6)
Figure BDA0002577829280000094
(7)end for
(8)
Figure BDA0002577829280000095
(9)return w*
S104, obtaining a prediction result according to the training result.
Specifically, in S103, after learning and predicting each part obtained by wavelet decomposition using the GRU sub-predictor to obtain a training result of each sub-sequence, the training results of each sub-sequence are summed to obtain a prediction result.
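The fusion step is an element-wise sum over the per-subsequence forecasts. A minimal sketch follows; the function name `fuse_predictions` and the dictionary layout keyed by subsequence name are assumptions for illustration.

```python
def fuse_predictions(sub_predictions):
    """Element-wise sum of per-subsequence forecasts.

    sub_predictions maps a subsequence name (e.g. 'A2', 'D1') to its forecast;
    all forecasts must cover the same horizon.
    """
    lengths = {len(p) for p in sub_predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all subsequence forecasts must have the same length")
    return [sum(vals) for vals in zip(*sub_predictions.values())]
```

For example, with purely hypothetical forecasts `{"A2": [10.0, 11.0], "D1": [0.5, -0.5], "D2": [1.0, 1.0]}` the fused forecast is `[11.5, 11.5]`.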
The method proposed in this embodiment is further illustrated by two specific examples.
Example 1
In the method provided by this embodiment, the optimal hyperparameters are determined by a Bayesian optimization method. This example demonstrates the use of the Bayesian optimization method and verifies the result.
The data used in the experiment are described first. Studies have shown that PM2.5 sequences are strongly nonlinear and strongly random, so this experiment uses the PM2.5 data set from the U.S. Department of State, which records the hourly average PM2.5 concentration in Beijing over the 5 years from 2013 to 2017, 37,704 records in total, in units of μg/m³. The model prediction period is set to 24 steps, that is, the model predicts the values of the next 24 hours from the historical data of the previous 24 hours; fig. 5 shows the overall structure of the model of this embodiment. The tests were conducted on hourly PM2.5 concentration data for Beijing from March 22, 2016 to April 9, 2016.
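The 24-in, 24-out setup can be illustrated by a sliding-window helper that turns an hourly series into input and target pairs. This is a sketch under assumptions: the function name `make_windows` and the stride of one hour between consecutive samples are not stated in the source.

```python
def make_windows(series, n_in=24, n_out=24):
    """Split a series into (input, target) windows.

    Each input holds n_in consecutive values and its target holds the
    following n_out values; consecutive samples are shifted by one step.
    """
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return X, Y
```

A series of 50 hourly values yields 3 samples with the defaults, the first mapping hours 0 to 23 onto hours 24 to 47.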
Next, the hyperparameter optimization process is performed. According to step S101, a hyperparameter space and an objective function need to be defined. Table 1 shows the hyperparameter space used in this example, which defines the ranges of the number of wavelet decomposition layers, the mother wavelet function, the number of neurons in the first layer, the Dropout rate, the batch size, the number of training epochs, and the optimizer; the RMSE of the hybrid model is selected as the objective function of the optimization process. After 100 selections in this example, Bayesian optimization yields a set of optimal hyperparameters. Table 2 shows the hyperparameters determined by Bayesian optimization, together with a set of hyperparameters obtained by the traditional random search method for comparison.
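For orientation, the random search baseline used for comparison can be sketched as follows. The candidate values in `SPACE` are hypothetical placeholders (the actual ranges are those of Table 1, which is not reproduced in the text), and `toy_objective` is a stand-in for training the hybrid model and returning its RMSE; neither reflects real settings or results.

```python
import random

# Hypothetical search space: same hyperparameter names as in the text,
# illustrative candidate values only.
SPACE = {
    "wavelet_levels": list(range(1, 11)),
    "mother_wavelet": ["db4", "db35", "sym8"],
    "units": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "batch_size": [32, 64, 128],
    "epochs": [50, 100, 200],
    "optimizer": ["adam", "rmsprop", "sgd"],
}

def sample(space, rng):
    """Draw one hyperparameter set uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(objective, space, n_trials=100, seed=0):
    """Return the sampled set with the lowest objective value."""
    rng = random.Random(seed)
    best_w, best_score = None, float("inf")
    for _ in range(n_trials):
        w = sample(space, rng)
        score = objective(w)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score

def toy_objective(w):
    """Stand-in for the hybrid model's RMSE; NOT a real measurement."""
    return abs(w["wavelet_levels"] - 8) + w["dropout"]
```

Random search evaluates each trial independently, whereas Bayesian optimization uses the results of earlier trials to choose the next set, which is the difference the comparison in this example is designed to expose.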
Next, the model is trained. The final results are shown in table 3, in which the root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), symmetric mean absolute percentage error (SMAPE), and Pearson correlation coefficient (R) are used as the evaluation criteria of model performance. Compared with the traditional random search method, the model trained with the Bayesian optimization method performs better: its RMSE reaches 21.7300 μg/m³ and its R index is 0.9276. Fig. 6 shows the model prediction curves based on the Bayesian optimization and random search methods: curve A is the prediction curve based on the random search method, curve B is the prediction curve based on the Bayesian optimization method, and curve C is the true value. It can be seen that the prediction curve based on the Bayesian optimization method is closer to the true value.
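The five evaluation criteria can be computed as in the sketch below. RMSE, MAE and Pearson R follow their standard definitions; the NRMSE normalization (by the range of the actual values) and the SMAPE variant (percentage form with the mean of the absolute actual and predicted values in the denominator) are assumptions, since the source does not state which definitions it uses.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def nrmse(y_true, y_pred):
    # Assumption: RMSE normalized by the range of the actual values.
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))

def smape(y_true, y_pred):
    # Assumption: percentage form; terms with a zero denominator count as 0.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        total += abs(p - t) / denom if denom else 0.0
    return 100.0 * total / len(y_true)

def pearson_r(y_true, y_pred):
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)
```

Lower is better for RMSE, NRMSE, MAE and SMAPE, while R is better the closer it is to 1.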
In this example, to further verify the feasibility of Bayesian optimization, an experiment was designed around the fact that the number of wavelet decomposition layers affects the performance of the whole model. Ten models were constructed, all using the hyperparameters determined by Bayesian optimization in table 2 and differing only in the number of wavelet decomposition layers. The specific experimental settings are as follows:
(1) model 1: performing 1-layer wavelet decomposition, and training 2 GRUs for A1 and D1 respectively;
(2) model 2: performing 2-layer wavelet decomposition, and training 3 GRUs for A2 and D1-D2 respectively;
(3) model 3: performing 3-layer wavelet decomposition, and training 4 GRUs for A3 and D1-D3 respectively;
(4) model 4: performing 4-layer wavelet decomposition, and training 5 GRUs for A4 and D1-D4 respectively;
(5) model 5: performing 5-layer wavelet decomposition, and training 6 GRUs for A5 and D1-D5 respectively;
(6) model 6: performing 6-layer wavelet decomposition, and training 7 GRUs for A6 and D1-D6 respectively;
(7) model 7: performing 7-layer wavelet decomposition, and training 8 GRUs for A7 and D1-D7 respectively;
(8) model 8: performing 8-layer wavelet decomposition, and training 9 GRUs for A8 and D1-D8 respectively;
(9) model 9: performing 9-layer wavelet decomposition, and training 10 GRUs for A9 and D1-D9 respectively;
(10) model 10: performing 10-layer wavelet decomposition, and training 11 GRUs for A10 and D1-D10 respectively.
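The pattern in the list above, where an n-layer decomposition yields one low-frequency subsequence and n high-frequency subsequences and therefore n + 1 GRU sub-predictors, can be written as a small helper. The function name is hypothetical.

```python
def subsequence_names(levels):
    """Names of the subsequences produced by an n-layer wavelet decomposition."""
    return [f"A{levels}"] + [f"D{j}" for j in range(1, levels + 1)]

# One GRU sub-predictor is trained per subsequence, so model n uses n + 1 GRUs.
gru_counts = {n: len(subsequence_names(n)) for n in range(1, 11)}
```

For model 8, `subsequence_names(8)` lists A8 followed by D1 through D8, which corresponds to the 9 GRUs stated in item (8).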
Table 4 gives the 5 evaluation indexes for each model; model 8 is the model that uses the Bayesian-optimized hyperparameters throughout. According to the results, the five indexes improve as the number of wavelet decomposition layers increases, and once the number of decomposition layers reaches 6 the indexes tend to stabilize; the RMSE value falls from 48.5712 μg/m³ to 22.0185 μg/m³. Model 8 has the best value on the MAE and NRMSE indexes, and its other 3 indexes are also very close to the best values among the 10 models, differing by only 0.0132. Taken together, model 8 performs best, which verifies the effectiveness of the Bayesian optimization method.
The experimental results show that hyperparameter optimization with the Bayesian optimization algorithm is feasible in a hybrid deep learning model, and that, compared with traditional hyperparameter optimization methods such as random search, a model trained with the Bayesian optimization method reaches a better level of performance.
TABLE 1 Hyperparameter search space for Bayesian optimization
[Table image not reproduced: BDA0002577829280000111, BDA0002577829280000121]
TABLE 2 Optimal hyperparameters determined by Bayesian optimization and random search
[Table image not reproduced: BDA0002577829280000122]
TABLE 3 Model performance with Bayesian optimization and random search
[Table image not reproduced: BDA0002577829280000123]
TABLE 4 Prediction performance for different numbers of wavelet decomposition levels
[Table image not reproduced: BDA0002577829280000131]
TABLE 5 Prediction indexes of the six models on the same test set
[Table image not reproduced: BDA0002577829280000132]
Example 2
Example 1 implemented the Bayesian optimization algorithm and verified its feasibility. This example demonstrates the accuracy advantage of the proposed model (WD-GRU) by comparing it with other models.
First, the data used in the experiment are described. The data set and test set are the same as those in example 1, and the prediction horizon is likewise set to 24 steps.
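A 24-step prediction task is usually trained on sliding windows cut from the series. The sketch below shows one way to build such pairs; the input window length (`n_in=48`) and the helper name `make_windows` are assumptions, since the text states only the 24-step horizon.

```python
def make_windows(series, n_in, n_out=24):
    """Split a series into (input window, multi-step target) training pairs."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]                    # past observations
        y = series[start + n_in:start + n_in + n_out]     # next n_out steps
        pairs.append((x, y))
    return pairs

series = list(range(100))            # toy series, not the patent's data
pairs = make_windows(series, n_in=48)
print(len(pairs))                    # 29
print(pairs[0][1][0], pairs[0][1][-1])  # 48 71
```

In a decomposition-based model the same windowing would be applied to each subsequence separately before it is passed to its sub-predictor.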
In this example, the proposed model is compared with five hybrid models that also combine time series decomposition with deep networks. The models compared are Decomposition-ARIMA-GRU-GRU, EMD-RNN (EMD combined with RNN), EMDCNN-GRU (EMD, CNN and GRU combined), WD-RNN (wavelet decomposition combined with RNN), WD-LSTM (wavelet decomposition combined with LSTM), and the WD-GRU model (wavelet decomposition combined with GRU) proposed in this example.
The prediction results of the six models are shown in FIG. 7, where curves 1 to 6 represent the Decomposition-ARIMA-GRU-GRU, EMD-RNN, EMDCNN-GRU, WD-RNN, WD-LSTM and WD-GRU models, respectively, and curve 7 is the true curve. All six predicted curves follow approximately the same trend as the true curve, and the WD-GRU curve is the closest to it. Table 5 lists the five evaluation indexes of the six models; the values marked in red in the table are the best values for each index. All evaluation indexes of the WD-GRU model provided by the invention are the best values, with the RMSE reaching 21.7300 μg/m³, which is already a very low level. FIG. 8 and FIG. 9 show further details of the five indexes: compared with the EMDCNN-GRU model, which itself has relatively good prediction performance, the RMSE, MAE, NRMSE, SMAPE and R indexes are improved by 38.3%, 31.5%, 51.4%, 9.8% and 17.9%, respectively, so the WD-GRU model clearly achieves a substantial improvement in accuracy.
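For reference, the five indexes named above can be computed as follows. This is a sketch under stated assumptions: the patent's exact formulas are not reproduced in this section, so NRMSE is assumed to be RMSE divided by the range of the observed values, and SMAPE is assumed to use the mean of the absolute true and predicted values in the denominator. R is the Pearson correlation coefficient.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def nrmse(y_true, y_pred):
    # Assumption: normalise RMSE by the range of the observed values.
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))

def smape(y_true, y_pred):
    # Assumption: symmetric form, returned as a fraction; zero-sum pairs are skipped.
    terms = [abs(t - p) / ((abs(t) + abs(p)) / 2)
             for t, p in zip(y_true, y_pred) if abs(t) + abs(p) > 0]
    return sum(terms) / len(y_true)

def pearson_r(y_true, y_pred):
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

# Toy values, not taken from Table 5.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
print(mae(y_true, y_pred))  # 2.0
print(rmse(y_true, y_pred), nrmse(y_true, y_pred))
print(smape(y_true, y_pred), pearson_r(y_true, y_pred))
```

Lower is better for RMSE, MAE, NRMSE and SMAPE, while a higher R indicates closer agreement with the true curve.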
The advantages of the model of this embodiment are further analyzed by comparison. First, wavelet decomposition proves effective for complex PM2.5 sequences. In a hybrid model, the PM2.5 sequence is decomposed before prediction to reduce its complexity, typically by wavelet decomposition, empirical mode decomposition (EMD), or seasonal-trend decomposition using LOESS, and the prediction is then made with an RNN or GRU network. Among the three hybrid models WD-RNN, EMD-RNN and Decomposition-ARIMA-GRU-GRU in Table 5, the WD-RNN model leads on every index. Compared with the EMD-RNN model, the higher-scoring of the other two, the wavelet-based WD-RNN model improves RMSE by 36.0%, MAE by 36.1%, NRMSE by 34.6%, SMAPE by 23.9% and R by 25.0%, even though both models use the same RNN network as the sub-predictor. The Decomposition-ARIMA-GRU-GRU model uses the better-performing GRU network, yet its overall prediction performance is worse.
Second, choosing the GRU model as the predictor for each component of the wavelet decomposition gives the model better performance. In Table 5, the WD-RNN, WD-LSTM and WD-GRU models differ only in the choice of sub-predictor, yet the WD-GRU model, with the GRU network as the sub-predictor, has a clear advantage on all indexes. Compared with the WD-LSTM model, the root mean square error of the proposed model is reduced by 4.7035 μg/m³, and the Pearson correlation coefficient increases from 0.8932 to 0.9276. Similarly, of the EMD-RNN and EMDCNN-GRU models, the one that uses the GRU network performs better.
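For readers comparing the three sub-predictors, one common formulation of the GRU cell is given below. It is offered as background only: the notation (weights W and U, biases b, update gate z_t, reset gate r_t) is an assumption, and references differ on whether z_t or 1 - z_t multiplies the previous state.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

The GRU has two gates and no separate cell state, whereas the LSTM has three gates and a cell state, so a GRU layer of the same width has fewer parameters to train.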
Finally, the leading performance of the model of this embodiment comes not only from the combination of wavelet decomposition and GRU but also from determining the model hyperparameters with the Bayesian optimization algorithm, which allows the model to reach its best performance. According to Table 5, the WD-LSTM model, whose hyperparameters were not determined by Bayesian optimization, has an NRMSE of 0.0935, while the WD-GRU model has an NRMSE of 0.0682, an improvement that is difficult to obtain from the GRU network alone; the WD-GRU model shows similar improvements on the other indexes.
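The comparisons above mix absolute differences and percentages. The helper below converts a pair of quoted values into a relative improvement, assuming the usual definition (change divided by the baseline value); the text does not state which definition the patent's percentages use, and `relative_improvement` is a hypothetical name.

```python
def relative_improvement(baseline, proposed, lower_is_better=True):
    """Percentage improvement of `proposed` over `baseline`."""
    change = (baseline - proposed) if lower_is_better else (proposed - baseline)
    return 100.0 * change / abs(baseline)

# NRMSE of WD-LSTM vs WD-GRU as quoted in the text (lower is better).
print(round(relative_improvement(0.0935, 0.0682), 1))                          # 27.1
# Pearson R of WD-LSTM vs WD-GRU as quoted in the text (higher is better).
print(round(relative_improvement(0.8932, 0.9276, lower_is_better=False), 1))   # 3.9
```

Under this definition, the NRMSE reduction from 0.0935 to 0.0682 corresponds to roughly a 27% relative improvement.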
Result analysis: first, the training process of the model proposed in this embodiment is reasonable; the hyperparameters determined by Bayesian optimization allow the model to perform closer to its potential, and wavelet decomposition is effective for analyzing complex sequence data. Second, the model outperforms common hybrid predictors on the long-term task of predicting atmospheric PM2.5 concentration over a 24-hour horizon, which verifies the practical value of the WD-GRU model proposed in this embodiment for general time series prediction.
Embodiment 2
Corresponding to embodiment 1, this embodiment proposes a time sequence prediction apparatus based on Bayesian optimization and wavelet decomposition, where the apparatus includes a processor configured with operating instructions executable by the processor to perform the following operations:
optimizing model hyper-parameters according to a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise wavelet decomposition layer number, mother wavelet function in wavelet decomposition and hyper-parameters of a GRU sub predictor;
acquiring acquired data, and performing wavelet decomposition on the acquired data according to the number of wavelet decomposition layers acquired after optimization and a mother wavelet function in the wavelet decomposition to acquire a decomposition result;
building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the hyperparameter of the optimized GRU sub-predictor to obtain a training result;
and obtaining a prediction result according to the training result.
Specifically, for the working principle and calculation steps of the apparatus provided in this embodiment, reference can be made to the description in embodiment 1, which is not repeated here. In this embodiment, wavelet decomposition reduces the complexity of the complex sequence, the GRU network then predicts each decomposed component, and the component predictions are finally fused to obtain the prediction result. This effectively improves prediction accuracy; with the hyperparameters optimized by the Bayesian optimization algorithm, the method achieves high accuracy in long-term time series prediction tasks.
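The decompose, predict and fuse flow described above can be sketched end to end as follows. Two substitutions are made for illustration and are assumptions, not the patent's method: the Haar wavelet is used as the mother wavelet (components are computed as dyadic block means), and a trivial persistence rule stands in for each trained GRU sub-predictor. The function names are hypothetical. Fusion is done by summing the component forecasts, as in the claims.

```python
def haar_components(signal, levels):
    """Additive Haar multiresolution: [A_n, D_1, ..., D_n], each as long as the signal."""
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approximations = [list(signal)]              # A_0 is the signal itself
    for level in range(1, levels + 1):
        block = 2 ** level
        smooth = []
        for start in range(0, n, block):
            mean = sum(signal[start:start + block]) / block
            smooth.extend([mean] * block)        # A_level: dyadic block means
        approximations.append(smooth)
    details = [[a - b for a, b in zip(approximations[j - 1], approximations[j])]
               for j in range(1, levels + 1)]    # D_j = A_(j-1) - A_j
    return [approximations[-1]] + details

def persistence_predictor(subsequence, horizon):
    """Stand-in for a trained GRU sub-predictor: repeat the last value."""
    return [subsequence[-1]] * horizon

def predict(signal, levels, horizon, predictor=persistence_predictor):
    """Decompose, forecast each component, then fuse the forecasts by summation."""
    components = haar_components(signal, levels)
    forecasts = [predictor(c, horizon) for c in components]
    return [sum(step) for step in zip(*forecasts)]

# Toy data, not taken from the patent's PM2.5 data set.
signal = [float(v) for v in (4, 6, 10, 12, 8, 6, 5, 5, 3, 1, 2, 4, 9, 11, 7, 7)]
components = haar_components(signal, levels=3)
print(len(components))                   # 4 components: A3, D1, D2, D3
forecast = predict(signal, levels=3, horizon=24)
print(len(forecast))                     # 24
```

Because the components add back to the original signal point by point, summing the component forecasts is the natural fusion step; with the persistence stand-in the fused forecast simply repeats the last observed value, and a learned sub-predictor would replace `persistence_predictor`.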
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A time sequence prediction method based on Bayesian optimization and wavelet decomposition is characterized by comprising the following steps:
optimizing model hyper-parameters according to a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise wavelet decomposition layer number, mother wavelet function in wavelet decomposition and hyper-parameters of a GRU sub predictor;
acquiring acquired data, and performing wavelet decomposition on the acquired data according to the number of wavelet decomposition layers acquired after optimization and a mother wavelet function in the wavelet decomposition to acquire a decomposition result;
building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the hyperparameter of the optimized GRU sub-predictor to obtain a training result;
and obtaining a prediction result according to the training result.
2. The method according to claim 1, wherein the process of optimizing the model hyper-parameters according to the Bayesian optimization method to obtain the optimal hyper-parameters comprises:
defining an objective function of model hyper-parameter optimization, wherein the objective function of model hyper-parameter optimization obeys a Gaussian distribution;
obtaining a Bayesian optimization objective function according to the objective function of model hyper-parameter optimization;
applying Gaussian process processing to the objective function of model hyper-parameter optimization to obtain the posterior probability of the objective function of model hyper-parameter optimization;
and updating parameters of the Bayesian optimization objective function by adopting a UCB acquisition function according to the mean value and the variance of the posterior probability to obtain the optimal hyper-parameters.
3. The method according to claim 1 or 2, wherein the process of acquiring the acquired data, performing wavelet decomposition on the acquired data according to the number of wavelet decomposition layers obtained after optimization and a mother wavelet function in the wavelet decomposition to obtain a decomposition result comprises:
decomposing the acquired data into low-frequency components and high-frequency components according to the mother wavelet function obtained after optimization and the father wavelet function corresponding to the mother wavelet function, wherein the number of decomposition layers is determined according to the number of wavelet decomposition layers obtained after optimization;
processing the low-frequency component through a low-frequency filter to obtain a low-frequency subsequence;
and processing the high-frequency component through a high-frequency filter to obtain a high-frequency subsequence.
4. The method of claim 3, wherein the process of building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the hyper-parameters of the GRU sub-predictor obtained after optimization to obtain the training result comprises:
building a GRU sub-predictor based on a Keras/TensorFlow framework;
and respectively learning and predicting the low-frequency subsequence and the high-frequency subsequence obtained after wavelet decomposition through the GRU sub-predictor to obtain the training result of each subsequence.
5. The method of claim 4, wherein obtaining the predicted outcome from the training outcome comprises:
and summing the training results of the subsequences to obtain a prediction result.
6. A time sequence prediction apparatus based on Bayesian optimization and wavelet decomposition, the apparatus comprising a processor configured with processor-executable operating instructions to perform the following operations:
optimizing model hyper-parameters according to a Bayesian optimization method to obtain optimal hyper-parameters, wherein the model hyper-parameters comprise wavelet decomposition layer number, mother wavelet function in wavelet decomposition and hyper-parameters of a GRU sub predictor;
acquiring acquired data, and performing wavelet decomposition on the acquired data according to the number of wavelet decomposition layers acquired after optimization and a mother wavelet function in the wavelet decomposition to acquire a decomposition result;
building a GRU-based sub-predictor, and learning and predicting the decomposition result according to the hyperparameter of the optimized GRU sub-predictor to obtain a training result;
and obtaining a prediction result according to the training result.
7. The apparatus of claim 6, wherein the processor is configured with processor-executable operating instructions to:
defining an objective function of model hyper-parameter optimization, wherein the objective function of model hyper-parameter optimization obeys a Gaussian distribution;
obtaining a Bayesian optimization objective function according to the objective function of model hyper-parameter optimization;
applying Gaussian process processing to the objective function of model hyper-parameter optimization to obtain the posterior probability of the objective function of model hyper-parameter optimization;
and updating parameters of the Bayesian optimization objective function by adopting a UCB acquisition function according to the mean value and the variance of the posterior probability to obtain the optimal hyper-parameters.
8. The apparatus of claim 6 or 7, further comprising a low frequency filter and a high frequency filter, the processor configured with processor-executable operating instructions to:
decomposing the acquired data into low-frequency components and high-frequency components according to the mother wavelet function obtained after optimization and the father wavelet function corresponding to the mother wavelet function, wherein the number of decomposition layers is determined according to the number of wavelet decomposition layers obtained after optimization;
the low-frequency filter processes the low-frequency component to obtain a low-frequency subsequence;
and the high-frequency filter processes the high-frequency component to obtain a high-frequency subsequence.
9. The apparatus of claim 8, wherein the processor is configured with processor-executable operating instructions to:
building a GRU sub-predictor based on a Keras/TensorFlow framework;
and respectively learning and predicting the low-frequency subsequence and the high-frequency subsequence obtained after wavelet decomposition through the GRU sub-predictor to obtain the training result of each subsequence.
10. The apparatus of claim 9, wherein the processor is configured with processor-executable operating instructions to:
and summing the training results of the subsequences to obtain a prediction result.
CN202010659067.1A 2020-07-09 2020-07-09 Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition Active CN111859264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010659067.1A CN111859264B (en) 2020-07-09 2020-07-09 Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition

Publications (2)

Publication Number Publication Date
CN111859264A true CN111859264A (en) 2020-10-30
CN111859264B CN111859264B (en) 2024-02-02

Family

ID=73152695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010659067.1A Active CN111859264B (en) 2020-07-09 2020-07-09 Time sequence prediction method and device based on Bayesian optimization and wavelet decomposition

Country Status (1)

Country Link
CN (1) CN111859264B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288193A (en) * 2020-11-23 2021-01-29 国家海洋信息中心 Ocean station surface salinity prediction method based on GRU deep learning of attention mechanism
CN112434888A (en) * 2020-12-17 2021-03-02 中国计量大学上虞高等研究院有限公司 PM2.5 prediction method of bidirectional long and short term memory network based on deep learning
CN112651543A (en) * 2020-11-10 2021-04-13 沈阳工程学院 Daily electric quantity prediction method based on VMD decomposition and LSTM network
CN112749845A (en) * 2021-01-13 2021-05-04 中国工商银行股份有限公司 Model training method, resource data prediction method, device and computing equipment
CN113128132A (en) * 2021-05-18 2021-07-16 河南工业大学 Grain pile humidity and condensation prediction method based on depth time sequence
CN113360848A (en) * 2021-06-04 2021-09-07 北京工商大学 Time sequence data prediction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413494A (en) * 2019-06-19 2019-11-05 浙江工业大学 A kind of LightGBM method for diagnosing faults improving Bayes's optimization
WO2019229528A2 (en) * 2018-05-30 2019-12-05 Alexander Meyer Using machine learning to predict health conditions
CN111192453A (en) * 2019-12-30 2020-05-22 深圳市麦谷科技有限公司 Short-term traffic flow prediction method and system based on Bayesian optimization

Also Published As

Publication number Publication date
CN111859264B (en) 2024-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant