CN104504475A - AR*-SVM (support vector machine) hybrid modeling based haze time series prediction method - Google Patents


Info

Publication number
CN104504475A
Authority
CN
China
Prior art keywords
model
svm
time series
sequence
haze
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410837471.8A
Other languages
Chinese (zh)
Inventor
李卫民
张礼名
周扬
王盛
毛敏娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201410837471.8A
Publication of CN104504475A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Abstract

The invention relates to a haze time series prediction method based on AR*-SVM (support vector machine) hybrid modeling. The method first establishes an AR* model for a haze time series, and then uses an SVM to build the AR*-SVM hybrid model from the original series and the innovation series obtained from the AR* model. The AR* model captures the linear part of the haze time series and the SVM captures the nonlinear part, and combining them improves modeling and prediction performance for the series as a whole. Because the two kinds of model capture different aspects of the hidden patterns in the series, the fit of the model and the prediction accuracy for the haze series both improve, and tests indicate that the hybrid method gives better results than either method applied alone.

Description

Haze time series prediction method based on AR*-SVM hybrid modeling
Technical field
The present invention relates to a method for predicting haze, and in particular to a haze time series prediction method based on AR*-SVM hybrid modeling.
Background art
A multi-factor haze time series describes how the object under study develops and changes over a period of time. Predicting the characteristic values of such a series means taking a set of measured haze-index and pollution-source-index time series for the object, analyzing them by mathematical means to find the characteristics, trends and regularities of the data, and then estimating the state of the object at some future moment. In this way, all the factors that influence the object are described together as functions of time. Because a multi-factor haze time series is itself noisy, non-stationary and chaotic, it is very difficult to extract all of the information in the historical data, and therefore it is also difficult to establish a functional relationship between future values and the historical record.
A haze time series is non-stationary. It is very difficult to map a non-stationary series onto a suitable linear model, so predictions from time series models such as the autoregressive moving average (ARMA) model are usually unsatisfactory. Many studies show that the volatility of time series exhibits conditional heteroskedasticity: the variance not only changes over time, but large and small fluctuations also tend to concentrate in separate periods (volatility clustering). This phenomenon is found in fields such as finance, electric power and weather. GARCH is the generalized ARCH model, and the purpose of GARCH modeling is to understand and model the variability of a time series. By modeling the time-varying conditional variance, a GARCH model aims to predict variances and covariances accurately, and it handles excess kurtosis and the volatility clustering described above well.
For a haze time series, however, it is difficult to build a suitable GARCH model, and predictions based on it are also unsatisfactory. Because GARCH has many shortcomings in modeling haze data, alternative methods such as nonlinear models have been proposed to improve modeling and forecasting.
In the present invention, the AR, ARMA, ARIMA, ARCH and GARCH models are collectively referred to as AR* models. The same difficulty applies to the whole class: it is hard to build a suitable AR* model for such a time series, and predictions based on it are unsatisfactory. Because the linear AR* time series models have many shortcomings here, alternative methods such as nonlinear models have been proposed to improve modeling and forecasting.
Neural network models are widely applied, and in recent years a growing number of researchers have used them to predict trends in time series. Some researchers have compared the predictive performance of neural network models and linear models on time series using data from areas such as industry, finance, weather and microeconomics, and the reported experimental results show a clear advantage for neural networks over linear models. The main strength of a neural network model is its capacity for nonlinear modeling, and many studies report experiments in which nonlinear neural network models forecast time series more accurately than linear models.
The support vector machine (SVM) is a learning algorithm based on the structural risk minimization principle, proposed by Vapnik et al. to better solve practical problems involving small samples, nonlinearity and high dimensionality. An SVM makes predictions by linear regression in a high-dimensional feature space, which adds almost no computational complexity and avoids the curse of dimensionality that raising the dimension might otherwise cause. It also avoids problems of methods such as artificial neural networks, including the difficulty of determining the network structure, overfitting and underfitting, and local minima, and it is currently regarded as the best theory for small-sample problems such as classification and regression.
An SVM performs better than AR* time series models. For a given haze time series, however, it is hard to judge whether it is a purely linear or a nonlinear process, and therefore hard to select a suitable model to fit it. Although the literature reports many methods that have been applied to time series prediction with fairly accurate results, a haze time series contains many unstable factors, so neither the SVM nor other neural network techniques alone is the best model for predicting it.
Summary of the invention
The object of the invention is to address the shortcomings of the prior art by providing a haze time series prediction method based on AR*-SVM hybrid modeling, so as to obtain better haze time series forecasts.
To solve the above technical problem, the present invention adopts the following technical solution:
A haze time series prediction method based on AR*-SVM hybrid modeling, with the following steps:
Step 1: Establish an AR* model for the haze time series. First identify the order of the model, then determine and estimate its parameters, and finally use the AR* model to analyze the linear part of the series. The information in this linear part is obtained through the innovation sequence $\{\varepsilon_t\}$ of the time series produced by the AR* model, which contains the statistical and volatility information of the series. Using the innovation sequence as one component of the AR*-SVM hybrid model both reduces the noise level and improves prediction accuracy by capturing that statistical and volatility information.
Step 2: Use an SVM model to build the AR*-SVM hybrid model from the original series and the innovation sequence obtained from the AR* model. The hybrid AR*-SVM model thus captures the linear and nonlinear parts of the haze time series with the AR* and SVM models respectively, and combines them to improve modeling and prediction performance for the whole series.
Preferably, the AR* model includes linear models such as the autoregressive moving average (ARMA) model, the autoregressive integrated moving average (ARIMA) model and the generalized autoregressive conditional heteroskedasticity (GARCH) model, each of which is described below:
1) ARMA is the autoregressive moving average model:
$$x_t = \sum_{m=1}^{p} a_m x_{t-m} + \varepsilon_t + \sum_{n=1}^{q} b_n \varepsilon_{t-n}$$
where $x_t$ is the observation at time $t$, $x_{t-m}$ is the observation at time $t-m$, $(a_1, a_2, \ldots, a_p)$ are the autoregressive coefficients, $(b_1, b_2, \ldots, b_q)$ are the moving average coefficients, $\{\varepsilon_t\}$ is a white noise sequence, also called the innovation sequence, and $(p, q)$ is the order of the ARMA model. Building an ARMA(p, q) model of a time series first requires determining the values of p and q.
The orders p and q of the ARMA(p, q) model can be determined by computing the AIC. The AIC, or Akaike information criterion, is a standard for measuring the goodness of fit of a statistical model. The AIC is computed for different values of p and q, and the order with the smallest AIC determines the AR* model for the series. The AIC is computed as follows:
$$\mathrm{AIC}(p, q) = \ln \hat{\sigma}_{\alpha}^{2}(p, q) + \frac{2(p + q)}{N}$$
where $\hat{\sigma}_{\alpha}^{2}$ is the estimate of the noise variance and $N$ is the length of the series.
2) ARIMA is the autoregressive integrated moving average model:
If the d-th order difference $y_t = (1 - B)^d x_t$ of the haze time series $\{x_t\}$ is a stationary ARMA(p, q) sequence, where $B$ is the one-step delay (backshift) operator that moves the current value of the series back by one time step, i.e. $B x_t = x_{t-1}$, and $d \ge 1$ is an integer, then $\{x_t\}$ is said to follow an autoregressive integrated moving average (ARIMA) model of orders p, d and q, which this document also calls the seasonal ARMA model, written $\{x_t\} \sim \mathrm{ARIMA}(p, d, q)$;
3) GARCH is the generalized autoregressive conditional heteroskedasticity model:
$$y_t = f(t-1, X) + \varepsilon_t$$
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} a_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$$
The first equation is the mean equation for $y_t$, written in terms of the series $X$ with error term $\varepsilon_t$. The second equation gives $\sigma_t^2$, the one-period-ahead forecast variance based on past information, and has three parts: (i) the constant (mean) term $\omega$; (ii) the ARCH (autoregressive conditional heteroskedasticity) term $\sum_{i=1}^{p} a_i \varepsilon_{t-i}^2$, which measures the volatility information obtained from the lagged squared residuals of the mean equation; and (iii) the GARCH (generalized autoregressive conditional heteroskedasticity) term $\sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$, which is the forecast variance of the previous periods. Here $a_i$ and $\beta_j$ are parameters.
The AR*(p, q) model in the AR*-SVM framework can be determined by the procedure above.
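As an illustration of the ARMA recursion and the AIC criterion described above, the following minimal Python sketch recovers an innovation sequence from given coefficients and evaluates AIC(p, q). The function names, the treatment of pre-sample values as zero, the use of the mean squared innovation as the variance estimate and the sample numbers are all assumptions made for illustration. The sketch does not estimate the coefficients themselves.

```python
import math


def arma_innovations(x, a, b):
    """Recover eps_t = x_t - sum_m a_m x_{t-m} - sum_n b_n eps_{t-n}.

    Assumption (not from the source): observations and innovations
    before the start of the series are treated as zero.
    """
    eps = []
    for t in range(len(x)):
        ar = sum(a[m] * x[t - 1 - m] for m in range(len(a)) if t - 1 - m >= 0)
        ma = sum(b[n] * eps[t - 1 - n] for n in range(len(b)) if t - 1 - n >= 0)
        eps.append(x[t] - ar - ma)
    return eps


def aic(eps, p, q):
    """AIC(p, q) = ln(sigma_hat^2) + 2 (p + q) / N.

    Assumption: sigma_hat^2 is estimated as the mean squared innovation.
    """
    n = len(eps)
    sigma2 = sum(e * e for e in eps) / n
    return math.log(sigma2) + 2.0 * (p + q) / n


# Hypothetical series and AR(1) coefficient, chosen only for illustration.
series = [2.0, 1.0, 0.5, 1.25]
innov = arma_innovations(series, a=[0.5], b=[])
print(innov)
print(aic(innov, p=1, q=0))
```

In practice the coefficients would come from a fitting routine, and the same `aic` call would be repeated over a grid of candidate (p, q) orders.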
Preferably, the support vector machine (SVM) is as follows:
The theoretical foundation of the SVM is statistical learning theory. Like multilayer perceptron networks and radial basis function networks, it can be used for pattern classification and nonlinear regression. Its core idea is to map samples from the input space into a high-dimensional feature space through a kernel function and to find the optimal separating surface in that feature space, thereby distinguishing the samples. The choice of kernel type, and of the kernel's parameters once the kernel is fixed, is therefore the key to SVM performance. Because there is not yet an effective way to construct a suitable kernel for a particular problem, in practice standard kernels such as the polynomial kernel, the RBF kernel and the perceptron kernel are mostly used. The RBF kernel is a general-purpose kernel that can be adapted to samples of arbitrary distribution by adjusting its parameters.
Given training samples $(x_i, y_i)$, $i = 1, 2, \ldots, l$, with $x \in R^m$ and $y \in R$, where $x_i$ is an input vector, $y_i$ is the corresponding output value and $l$ is the number of samples, support vector regression maps the data $x_i$ into a high-dimensional feature space $G$ through a nonlinear mapping $\xi$ and performs linear regression in that space:
$$y = g(x) = \sigma^{T} \xi(x) + b$$
where $\sigma$ is the weight vector of the hyperplane and $b$ is the bias term;
The analytic expression of the linear regression hyperplane determined by the SVM is:
$$f(x) = \sum_{i=1}^{l} (\bar{\alpha}_i^{*} - \bar{\alpha}_i) K(x_i, x) + \bar{b}$$
where $f(x)$ is the decision function, $i = 1, 2, 3, \ldots, l$, $l$ is the number of training samples, $K(x_i, x)$ is the Gaussian radial basis function kernel, $\bar{\alpha}_i^{*}$ and $\bar{\alpha}_i$ are the optimal solution of the dual problem, and $\bar{b}$ is the threshold.
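To make the decision function concrete, here is a minimal Python sketch that evaluates f(x) for an already trained support vector regressor. The kernel parametrization exp(-gamma * ||u - v||^2), the function names and all numeric values are assumptions for illustration. The source does not give the kernel formula or any trained coefficients.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel; the exp(-gamma * ||u - v||^2) form is an assumption."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i^* - alpha_i) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i^* - alpha_i for support vector i.
    """
    return bias + sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    )


# Hypothetical support vectors and coefficients (not taken from the patent).
svs = [[0.0, 0.0], [1.0, 0.0]]
coefs = [2.0, -1.0]
prediction = svr_predict([0.0, 0.0], svs, coefs, bias=0.5, gamma=1.0)
print(prediction)
```

Only the support vectors contribute to the sum, which is why the trained model can be stored as a short list of vectors, coefficients and one bias.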
Compared with the prior art, the present invention has the following evident substantive features and notable technical progress:
The invention overcomes the inability of AR* models to perform nonlinear modeling of time series and also compensates for the weakness of the SVM in predicting linear sequences. The proposed AR*-SVM hybrid modeling method draws on the strengths of both classes of model and is suited to both linear and nonlinear modeling, which makes it a good strategy for practical problems. By combining an AR* model with an SVM model, the two capture different aspects of the hidden patterns in the time series, which improves the fit of the model and the prediction accuracy for the haze series. Tests also indicate that the hybrid modeling method gives better results than either method used alone. The invention compares the AR*-SVM model with a generalized regression neural network (GRNN) and a plain SVM model: the predicted values are compared with the actual values of the target data, and MAE, RMSE and MAPE are computed for each method as the test results.
Brief description of the drawings
Fig. 1 is a structural diagram of the AR*-SVM model.
Fig. 2 shows PM2.5 data for Shanghai haze from January to March.
Fig. 3 shows AQI index data for Shanghai over the 24 hours of September 17.
Detailed description of the embodiments
The preferred embodiments of the present invention are described below with reference to the accompanying drawings:
The present invention proposes the AR*-SVM hybrid modeling method, which combines an AR* model with an SVM model. It draws on the strengths of both, is suited to both linear and nonlinear modeling, and is a good strategy for practical problems. The two classes of model capture different aspects of the hidden patterns in the time series.
AR* is a collective name for the ARMA, ARIMA, ARCH and GARCH models, which are linear models: ARMA is the autoregressive moving average model, ARIMA is the autoregressive integrated moving average model, and GARCH is the generalized autoregressive conditional heteroskedasticity model. The SVM model is the support vector machine model.
The SVM describes a nonlinear function relating the historical observations $(X_{t-1}, X_{t-2}, \ldots, X_{t-k})$ to the future value $X_t$, which can be expressed as:
$$X_t = F(X_{t-1}, X_{t-2}, \ldots, X_{t-k}, \varepsilon)$$
where $\varepsilon$ is a parameter vector and $F$ denotes the function formed by the SVM together with its connection weights. Fig. 1 shows the architecture of the hybrid AR*-SVM model. The input to the SVM comprises the original series data and the innovation sequence derived from the AR* model. The input vector $Y_{i-1}$ and the output vector $Y_i$ are related by:
$$Y_i = F(Y_{i-1}, \varepsilon), \quad i = t, t+1, \ldots, t+l$$
The vector $Y_{i-1}$ corresponds to $(X_{i-1}, X_{i-2}, \ldots, X_{i-k})$ but contains the innovation sequence as well as the historical observations. Compared with a comparable nonlinear autoregressive model, the AR*-SVM network thus adds a component that extracts the volatility and statistical information of the time series. If the vector $Y_{i-1}$ is m-dimensional, an m-step prediction is performed.
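One way to picture the SVM input described above is as a table of lagged observations with innovation values appended. The Python sketch below builds such input rows and their targets. The source does not say how many innovation terms enter each input vector, so appending only the most recent innovation is an assumption, as are the function name and the sample values.

```python
def build_hybrid_samples(x, eps, k):
    """Build SVM training pairs from a series and its innovation sequence.

    Each input row is [x_{t-1}, ..., x_{t-k}, eps_{t-1}] and the target is x_t.
    Assumption: only the most recent innovation is appended to the lags.
    """
    inputs, targets = [], []
    for t in range(k, len(x)):
        lags = [x[t - j] for j in range(1, k + 1)]
        inputs.append(lags + [eps[t - 1]])
        targets.append(x[t])
    return inputs, targets


# Hypothetical series and innovations, for illustration only.
rows, targets = build_hybrid_samples(
    x=[1.0, 2.0, 3.0, 4.0], eps=[0.1, 0.2, 0.3, 0.4], k=2
)
print(rows)
print(targets)
```

The resulting rows and targets could then be passed to any support vector regression implementation.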
The test data of the present invention are haze index data. Predictions are evaluated by the mean absolute error (MAE), the root mean square error (RMSE) and the mean absolute percentage error (MAPE). Let $X$ be the given time series, $Y$ the predicted values and $N$ the number of points compared. The smaller the difference between the predicted values and the given values, the more accurate the prediction, so smaller values of MAE, RMSE and MAPE indicate a more accurate and effective method. The formulas are as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} |X_t - Y_t|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (X_t - Y_t)^2}$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{X_t - Y_t}{Y_t} \right|$$
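The three error measures can be written directly from the formulas above. This Python sketch follows the document's definitions, including MAPE divided by the predicted value $Y_t$ (many references divide by the observed value instead). The function names and the sample numbers are assumptions for illustration.

```python
import math


def mae(x, y):
    """Mean absolute error between observed x and predicted y."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def rmse(x, y):
    """Root mean square error between observed x and predicted y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mape(x, y):
    """Mean absolute percentage error, dividing by the predicted value y_t
    as in the formula given in this document."""
    return sum(abs((a - b) / b) for a, b in zip(x, y)) / len(x)


# Hypothetical observed and predicted values.
observed = [2.0, 4.0]
predicted = [1.0, 5.0]
print(mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```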
The comparison is as follows:
Table 1. Comparative analysis of MAE, RMSE and MAPE for each method
Two preferred embodiments of the haze time series prediction method based on the AR*-SVM hybrid model follow:
Embodiment one:
Referring to Fig. 1, this haze time series prediction method based on the AR*-SVM hybrid model makes full use of the strengths of the two models and overcomes the weaknesses of each. The steps are as follows:
Step 1: Establish an AR* model for the haze time series. First identify the order of the model, then determine and estimate its parameters, and finally use the AR* model to analyze the linear part of the series. The information in this linear part is obtained through the innovation sequence $\{\varepsilon_t\}$ of the time series produced by the AR* model, which contains the statistical and volatility information of the series. Using the innovation sequence as one component of the AR*-SVM hybrid model both reduces the noise level and improves prediction accuracy by capturing that statistical and volatility information.
Step 2: Use an SVM model to build the AR*-SVM hybrid model from the original series and the innovation sequence obtained from the AR* model. The hybrid AR*-SVM model thus captures the linear and nonlinear parts of the haze time series with the AR* and SVM models respectively, and combines them to improve modeling and prediction performance for the whole series.
Preferably, the AR* model includes linear models such as the autoregressive moving average (ARMA) model, the autoregressive integrated moving average (ARIMA) model and the generalized autoregressive conditional heteroskedasticity (GARCH) model, each of which is described below:
1) ARMA is the autoregressive moving average model:
$$x_t = \sum_{m=1}^{p} a_m x_{t-m} + \varepsilon_t + \sum_{n=1}^{q} b_n \varepsilon_{t-n}$$
where $x_t$ is the observation at time $t$, $x_{t-m}$ is the observation at time $t-m$, $(a_1, a_2, \ldots, a_p)$ are the autoregressive coefficients, $(b_1, b_2, \ldots, b_q)$ are the moving average coefficients, $\{\varepsilon_t\}$ is a white noise sequence, also called the innovation sequence, and $(p, q)$ is the order of the ARMA model. Building an ARMA(p, q) model of a time series first requires determining the values of p and q.
The orders p and q of the ARMA(p, q) model can be determined by computing the AIC. The AIC is a standard for measuring the goodness of fit of a statistical model. It was founded and developed by the Japanese statistician Hirotugu Akaike and is therefore known as the Akaike information criterion. It is based on the concept of entropy and weighs the complexity of the estimated model against how well the model fits the data. The AIC is computed for different values of p and q, and the order with the smallest AIC determines the AR* model for the series. The AIC is computed as follows:
$$\mathrm{AIC}(p, q) = \ln \hat{\sigma}_{\alpha}^{2}(p, q) + \frac{2(p + q)}{N}$$
where $\hat{\sigma}_{\alpha}^{2}$ is the estimate of the noise variance and $N$ is the length of the series.
2) ARIMA is the autoregressive integrated moving average model:
If the d-th order difference $y_t = (1 - B)^d x_t$ of the haze time series $\{x_t\}$ is a stationary ARMA(p, q) sequence, where $B$ is the one-step delay (backshift) operator that moves the current value of the series back by one time step, i.e. $B x_t = x_{t-1}$, and $d \ge 1$ is an integer, then $\{x_t\}$ is said to follow an autoregressive integrated moving average (ARIMA) model of orders p, d and q, which this document also calls the seasonal ARMA model, written $\{x_t\} \sim \mathrm{ARIMA}(p, d, q)$;
3) GARCH is the generalized autoregressive conditional heteroskedasticity model:
$$y_t = f(t-1, X) + \varepsilon_t$$
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} a_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$$
The first equation is the mean equation for $y_t$, written in terms of the series $X$ with error term $\varepsilon_t$. The second equation gives $\sigma_t^2$, the one-period-ahead forecast variance based on past information, and has three parts: (i) the constant (mean) term $\omega$; (ii) the ARCH (autoregressive conditional heteroskedasticity) term $\sum_{i=1}^{p} a_i \varepsilon_{t-i}^2$, which measures the volatility information obtained from the lagged squared residuals of the mean equation; and (iii) the GARCH (generalized autoregressive conditional heteroskedasticity) term $\sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$, which is the forecast variance of the previous periods. Here $a_i$ and $\beta_j$ are parameters.
The AR*(p, q) model in the AR*-SVM framework can be determined by the procedure above.
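The differencing step of the ARIMA model and the conditional variance recursion of the GARCH model can both be sketched in a few lines of Python. The function names, the start-up rule that replaces missing pre-sample terms with an initial variance, and the numeric values are assumptions for illustration. Parameter estimation is not shown.

```python
def difference(x, d=1):
    """Apply (1 - B)^d to a series, where B x_t = x_{t-1}."""
    y = list(x)
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def garch_variance(eps, omega, a, beta, initial_variance):
    """sigma2[t] = omega + sum_i a_i eps_{t-i}^2 + sum_j beta_j sigma2_{t-j}.

    Assumption: squared residuals and variances before the start of the
    sample are replaced by initial_variance.
    """
    sigma2 = []
    for t in range(len(eps)):
        arch = sum(
            a[i] * (eps[t - 1 - i] ** 2 if t - 1 - i >= 0 else initial_variance)
            for i in range(len(a))
        )
        garch = sum(
            beta[j] * (sigma2[t - 1 - j] if t - 1 - j >= 0 else initial_variance)
            for j in range(len(beta))
        )
        sigma2.append(omega + arch + garch)
    return sigma2


# Hypothetical inputs for a GARCH(1, 1) recursion and a second difference.
print(difference([1, 4, 9, 16, 25], d=2))
variances = garch_variance(
    [1.0, 0.0, 2.0], omega=0.1, a=[0.2], beta=[0.7], initial_variance=1.0
)
print(variances)
```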
Preferably, the support vector machine (SVM) is as follows:
The theoretical foundation of the SVM is statistical learning theory. Like multilayer perceptron networks and radial basis function networks, it can be used for pattern classification and nonlinear regression. Its core idea is to map samples from the input space into a high-dimensional feature space through a kernel function and to find the optimal separating surface in that feature space, thereby distinguishing the samples. The choice of kernel type, and of the kernel's parameters once the kernel is fixed, is therefore the key to SVM performance. Because there is not yet an effective way to construct a suitable kernel for a particular problem, in practice standard kernels such as the polynomial kernel, the RBF kernel and the perceptron kernel are mostly used. The RBF kernel is a general-purpose kernel that can be adapted to samples of arbitrary distribution by adjusting its parameters.
Given training samples $(x_i, y_i)$, $i = 1, 2, \ldots, l$, with $x \in R^m$ and $y \in R$, where $x_i$ is an input vector, $y_i$ is the corresponding output value and $l$ is the number of samples, support vector regression maps the data $x_i$ into a high-dimensional feature space $G$ through a nonlinear mapping $\xi$ and performs linear regression in that space:
$$y = g(x) = \sigma^{T} \xi(x) + b$$
where $\sigma$ is the weight vector of the hyperplane and $b$ is the bias term;
The analytic expression of the linear regression hyperplane determined by the SVM is:
$$f(x) = \sum_{i=1}^{l} (\bar{\alpha}_i^{*} - \bar{\alpha}_i) K(x_i, x) + \bar{b}$$
where $f(x)$ is the decision function, $i = 1, 2, 3, \ldots, l$, $l$ is the number of training samples, $K(x_i, x)$ is the Gaussian radial basis function kernel, $\bar{\alpha}_i^{*}$ and $\bar{\alpha}_i$ are the optimal solution of the dual problem, and $\bar{b}$ is the threshold.
Referring to Fig. 2, the test data for this haze time series prediction method based on AR*-SVM hybrid modeling are haze index data for Shanghai from the most recent three months, namely the January-to-March PM2.5 data shown in Fig. 2, where the abscissa represents time and the ordinate represents the measured index value.
1) The original haze data are divided into two parts: the first 75 observations serve as the training data for the AR*-SVM model, and the last 10 observations serve as its target data.
2) The training data are processed: their volatility is modeled and an AR* model is established, and the same data are used to determine the parameters of the SVM model.
For the ARMA model of the data, computing the AIC for different orders (Table 2) shows that ARMA(3,3) is the suitable model.
Table 2. AIC values for different ARMA orders
Order (p, q)    AIC value
(1,1)           99.2171
(2,1)           100.5325
(1,2)           100.6693
(2,2)           102.3874
(3,3)           98.8735
(4,4)           104.3692
For the GARCH model of the data, the AIC is likewise computed for different orders (Table 3), which shows that GARCH(1,1) is the suitable model.
Table 3. AIC values for different GARCH orders
Order (p, q)    AIC value
(1,1)           115.6939
(2,1)           117.6939
(1,2)           116.2857
(2,2)           118.2970
(3,3)           121.6797
(4,4)           124.5786
3) The training data and the established AR*-SVM model are used to predict the target data.
Using the model combination method of Fig. 1, the AR* model established from the training data is combined with the SVM model to predict the data. The different prediction methods are then compared, giving the results shown in Table 1.
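The order selection in step 2) amounts to picking the candidate with the smallest AIC. The short Python sketch below applies that rule to the values listed in Tables 2 and 3. The dictionary layout and the helper name `best_order` are assumptions for illustration.

```python
# AIC values copied from Table 2 (ARMA) and Table 3 (GARCH).
arma_aic = {
    (1, 1): 99.2171, (2, 1): 100.5325, (1, 2): 100.6693,
    (2, 2): 102.3874, (3, 3): 98.8735, (4, 4): 104.3692,
}
garch_aic = {
    (1, 1): 115.6939, (2, 1): 117.6939, (1, 2): 116.2857,
    (2, 2): 118.2970, (3, 3): 121.6797, (4, 4): 124.5786,
}


def best_order(aic_by_order):
    """Return the (p, q) pair with the smallest AIC value."""
    return min(aic_by_order, key=aic_by_order.get)


print(best_order(arma_aic))
print(best_order(garch_aic))
```

This reproduces the choices stated in the text: ARMA(3,3) and GARCH(1,1).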
Embodiment two:
This embodiment is essentially the same as embodiment one, with the following differences:
As shown in Fig. 3, the AQI in the figure is the air quality index (AQI), a dimensionless index that quantitatively describes air quality. The test data for this embodiment are the haze index data for Shanghai over the 24 hours of September 17, 2014. The original haze data are divided into two parts: the first 14 observations serve as the training data for the AR*-SVM model, and the last 10 observations serve as its target data. The remaining steps are the same as in embodiment one.
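Both embodiments split the series the same way: the leading observations train the model and the trailing observations are held out as targets (75 and 10 in embodiment one, 14 and 10 here). A minimal Python sketch of that split follows. The function name and the placeholder hourly values are assumptions for illustration, not measured data.

```python
def split_series(data, n_train, n_target):
    """Return (first n_train observations, last n_target observations)."""
    if n_train + n_target > len(data):
        raise ValueError("series is too short for the requested split")
    return list(data[:n_train]), list(data[-n_target:])


# Placeholder for 24 hourly readings; the values are just indices.
hourly = list(range(24))
train, target = split_series(hourly, n_train=14, n_target=10)
print(len(train), len(target))
```

Keeping the target block at the end of the series means the model is always evaluated on observations later than anything it was trained on.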
Tests were also carried out on short-term prediction of the index data. Regardless of the sample size, the proposed method shows stable predictions.
Table 4. Comparative analysis of MAE, RMSE and MAPE for each method
Using the model combination method of Fig. 1, the AR* model established from the training data is combined with the SVM to predict the data. The different prediction methods are then compared, giving the results shown in Table 4.
In summary, compared with traditional prediction methods, the haze time series prediction method based on the AR*-SVM hybrid model provided by the invention gives more stable predictions and higher accuracy.

Claims (3)

1. A haze time series prediction method based on an AR*-SVM hybrid model, characterized in that the steps are as follows:
Step 1: Establish an AR* model for the haze time series. First identify the order of the model, then determine and estimate its parameters, and finally use the AR* model to analyze the linear part of the series. The information in this linear part is obtained through the innovation sequence $\{\varepsilon_t\}$ of the time series produced by the AR* model, which contains the statistical and volatility information of the series. Using the innovation sequence as one component of the AR*-SVM hybrid model both reduces the noise level and improves prediction accuracy by capturing that statistical and volatility information.
Step 2: Use an SVM model to build the AR*-SVM hybrid model from the original series and the innovation sequence obtained from the AR* model. The hybrid AR*-SVM model captures the linear and nonlinear parts of the haze time series with the AR* and SVM models respectively, and combines them to improve modeling and prediction performance for the whole series.
2. the haze Time Series Forecasting Methods based on AR*-SVM mixture model according to claim 1, it is characterized in that: described AR* model comprises auto regressive moving average arma modeling, seasonal auto regressive moving average ARIMA model and generilized auto regressive conditional heteroskedastic GARCH model line pattern type, this each model is as follows:
1) ARMA is ARMA model:
Wherein x trefer to the observed reading of t, x t-mrefer to the observed reading in t-m moment, (a 1, a 2... a p) be called autoregressive coefficient, argument (b 1, b 2... b q) be called running mean coefficient, { ε tbe white noise sequence, be also referred to as innovation sequence, the exponent number that (p, q) is arma modeling, first build seasonal effect in time series ARMA (p, q) model needs to determine its p, q value;
By calculating AIC, the exponent number p of ARMA (p, q) model, q can determine that AIC information criterion and Akaike information criterion are a kind of standards of measure statistical models fitting Optimality; By calculating the different p of this sequence, the AIC value of q value, get the AR* model that minimum AIC value decides this sequence, AIC computing formula is as follows:
Wherein be the estimation to noise item variance, N is the length of sequence;
2) ARIMA model is seasonal ARMA model:
If haze time series { x td jump divide y t=(1-B) dx tbe stable ARMA (p, a q) sequence, wherein B is One-step delay operator, represents the time of current sequence value to pulling out moment, i.e. a Bx in the past t=x t-1; D>=1 is integer, then claim { x tfor having rank p, the autoregression summation moving average ARIMA model of d and q, also referred to as seasonal ARMA model, is designated as { x t} ~ ARIMA (p, d, q);
3) GARCH model is EC GARCH:
y t=f(t-1,X)+ε t
First formula y tbe one with error term ε tthe average equation about sequence X; Second formula be the first phase forward prediction variance based on foregoing information, be made up of three parts: one is average ω; Two are be called ARCH (ARCH) item, by the delayed undulatory property information of measuring from obtaining of the residuals squares of average equation above wherein a i, β jfor parameter; Three are being called generilized auto regressive conditional heteroskedastic (GARCH) item, is the prediction variance of upper first phase
The AR*(p, q) model in the AR*-SVM framework is determined by the procedure above.
3. The haze time series prediction method based on the AR*-SVM hybrid model according to claim 1, characterized in that the support vector machine (SVM) is as follows:
The theoretical foundation of the support vector machine (SVM) is statistical learning theory. Like multi-layer perceptron networks and radial basis function networks, it can be used for pattern classification and nonlinear regression. Its core idea is to map samples from the input space into a high-dimensional feature space through a kernel transformation and to find the optimal separating surface in that space, thereby distinguishing the samples. The choice of kernel type and, once the kernel is fixed, of its parameters is therefore key to SVM performance. Because there is not yet an effective way to construct a suitable kernel for a particular problem, standard kernels such as the polynomial kernel, the RBF kernel and the perceptron kernel are mostly used in practice. The RBF kernel is a general-purpose kernel that can be adapted to samples of arbitrary distribution by adjusting its parameters;
Given training sample data $(x_i, y_i)$, $i = 1, 2, \ldots, l$, $x \in R^m$, $y \in R$, where $x_i$ is the input vector, $y_i$ is the corresponding output value, and l is the number of samples, support vector regression maps the data $x_i$ into a high-dimensional feature space G through a nonlinear mapping ξ and performs linear regression in that space:
$y = g(x) = \sigma^{T}\xi(x) + b$
where σ is the weight vector of the hyperplane and b is the bias term;
The analytic expression of the linear regression hyperplane determined by the support vector machine is as follows:
where f(x) is the decision function, $i = 1, 2, 3, \ldots, l$, l is the number of training samples, $K(x_i, x)$ is the Gaussian radial basis function kernel, the kernel coefficients are the optimal solution of the dual problem, and the constant term is the threshold.
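A kernel expansion of this kind can be evaluated as in the following Python sketch. It assumes the usual form f(x) = sum over i of c_i * K(x_i, x) + threshold, with the Gaussian kernel K(x_i, x) = exp(-gamma * ||x_i - x||^2), where each c_i stands for the coefficient obtained from the dual solution. The support vectors, coefficients, threshold and gamma below are made-up numbers rather than the output of a trained model, and the function names are hypothetical.

```python
import math


def gaussian_rbf(x, z, gamma):
    """Gaussian radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, coefficients, threshold, gamma):
    """Evaluate f(x) = sum_i c_i * K(x_i, x) + threshold."""
    return threshold + sum(
        c * gaussian_rbf(sv, x, gamma)
        for sv, c in zip(support_vectors, coefficients)
    )


# Illustrative values only; a real model would obtain these from training.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
coefficients = [0.5, -0.25]
prediction = svr_predict((0.0, 0.0), support_vectors, coefficients,
                         threshold=0.1, gamma=1.0)
```

Only training samples with nonzero coefficients contribute to the sum, so prediction cost grows with the number of support vectors rather than with the size of the full training set.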
CN201410837471.8A 2014-12-24 2014-12-24 AR*-SVM (support vector machine) hybrid modeling based haze time series prediction method Pending CN104504475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410837471.8A CN104504475A (en) 2014-12-24 2014-12-24 AR*-SVM (support vector machine) hybrid modeling based haze time series prediction method

Publications (1)

Publication Number Publication Date
CN104504475A true CN104504475A (en) 2015-04-08

Family

ID=52945870

Country Status (1)

Country Link
CN (1) CN104504475A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673463A (en) * 2009-09-17 2010-03-17 北京世纪高通科技有限公司 Traffic information predicting method based on time series and device thereof
CN101846753A (en) * 2010-04-29 2010-09-29 南京信息工程大学 Climate time sequence forecasting method based on empirical mode decomposition and support vector machine
CN102542167A (en) * 2011-12-31 2012-07-04 东北电力大学 Wind-speed time series forecasting method for wind power station

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
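李卫民 (Li Weimin): "流数据查询算法若干关键技术研究" [Research on several key technologies of stream data query algorithms], 《中国博士学位论文全文数据库 信息科技辑》 [China Doctoral Dissertations Full-text Database, Information Science and Technology] *
郑荣等 (Zheng Rong et al.): "基于ARIMA与SVM的国际铀资源价格预测" [International uranium resource price forecasting based on ARIMA and SVM], 《计算机工程与应用》 [Computer Engineering and Applications] *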

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850734A (en) * 2015-04-21 2015-08-19 武大吉奥信息技术有限公司 Air quality index prediction method based on spatial and temporal distribution characteristics
CN104850734B (en) * 2015-04-21 2017-09-15 武大吉奥信息技术有限公司 A kind of air quality index Forecasting Methodology based on spatial-temporal distribution characteristic
CN104834978A (en) * 2015-05-21 2015-08-12 国家电网公司 Load restoration and prediction method
CN105139079A (en) * 2015-07-30 2015-12-09 广州时韵信息科技有限公司 Tax revenue prediction method and device based on hybrid model
CN106126483A (en) * 2016-06-21 2016-11-16 湖北天明气和网络科技有限公司 A kind of method and device of weather forecasting
CN109146111A (en) * 2017-06-27 2019-01-04 中国农业大学 A method of based on ARIMA-LSSVM Combined model forecast grain yield
CN107909084A (en) * 2017-11-15 2018-04-13 电子科技大学 A kind of haze concentration prediction method based on convolution linear regression network
CN107909084B (en) * 2017-11-15 2021-07-13 电子科技大学 Haze concentration prediction method based on convolution-linear regression network
CN107908891B (en) * 2017-11-28 2019-10-18 河海大学 A kind of Hydrological Time Series rejecting outliers method based on ARIMA-SVR
CN107908891A (en) * 2017-11-28 2018-04-13 河海大学 A kind of Hydrological Time Series rejecting outliers method based on ARIMA SVR
CN109375292A (en) * 2018-08-30 2019-02-22 昆明理工大学 A kind of prediction of precipitation method based on autoregression integral sliding average and support vector regression
CN110874802A (en) * 2018-09-03 2020-03-10 苏文电能科技股份有限公司 Electricity consumption prediction method based on ARMA and SVM model combination
CN109784528A (en) * 2018-12-05 2019-05-21 鲁东大学 Water quality prediction method and device based on time series and support vector regression
CN109883692A (en) * 2019-04-04 2019-06-14 西安交通大学 Generalized Difference filtering method based on built-in encoder information
CN109883692B (en) * 2019-04-04 2020-01-14 西安交通大学 Generalized differential filtering method based on built-in encoder information
CN111784022A (en) * 2019-08-08 2020-10-16 沈阳工业大学 Short-time adjacent fog prediction method based on combination of Wrapper method and SVM method
CN111784022B (en) * 2019-08-08 2024-01-30 沈阳工业大学 Short-time adjacent large fog prediction method based on combination of Wrapper method and SVM method
CN112257947A (en) * 2020-10-30 2021-01-22 红云红河烟草(集团)有限责任公司 Method, device and equipment for predicting temperature and humidity of cigarette making environment

Similar Documents

Publication Publication Date Title
CN104504475A (en) AR*-SVM (support vector machine) hybrid modeling based haze time series prediction method
Yunpeng et al. Multi-step ahead time series forecasting for different data patterns based on LSTM recurrent neural network
Wang et al. Optimal forecast combination based on neural networks for time series forecasting
Guo et al. A case study on a hybrid wind speed forecasting method using BP neural network
CN104699894A (en) JITL (just-in-time learning) based multi-model fusion modeling method adopting GPR (Gaussian process regression)
CN104574220A (en) Power customer credit assessment method based on least square support vector machine
CN108170885B (en) Method for identifying multiple harmonic sources in power distribution network
CN111079856B (en) Multi-period intermittent process soft measurement modeling method based on CSJITL-RVM
CN103853939A (en) Combined forecasting method for monthly load of power system based on social economic factor influence
CN110070202A (en) A method of economic output is predicted by electricity consumption data
Mostafaei et al. Hybrid grey forecasting model for Iran’s energy consumption and supply
Zhao et al. Short-term microgrid load probability density forecasting method based on k-means-deep learning quantile regression
CN103942298A (en) Recommendation method and system based on linear regression
CN108808657B (en) Short-term prediction method for power load
CN113988415A (en) Medium-and-long-term power load prediction method
CN103605493A (en) Parallel sorting learning method and system based on graphics processing unit
CN105913144B (en) A kind of method for predicting service life of product based on goal orientation Optimum Matching similitude
Eğri et al. Bayesian model selection in ARFIMA models
CN111259340B (en) Saturation load prediction method based on logistic regression
Fang Prediction and analysis of regional economic income multiplication capability based on fractional accumulation and integral model
Chen et al. Application of principal component regression analysis in economic analysis
Granziera et al. A predictability test for a small number of nested models
Roh et al. Tfe-net: time and feature focus embedding network for multivariate-to-multivariate time series forecasting
Tan et al. Long-term load forecasting based on feature fusion and lightgbm
CN103345581B (en) Based on online from the Dynamic Network Analysis system and method for center model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150408