CN111582567A - Wind power probability prediction method based on hierarchical integration - Google Patents
Wind power probability prediction method based on hierarchical integration
- Publication number
- CN111582567A (application CN202010348291.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- prediction
- egpr
- subspace
- gpr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a wind power probability prediction method based on hierarchical integration. The method constructs a set of subspaces through resampling and partial least squares (PLS), obtains several local regions on each subspace by Gaussian mixture model (GMM) clustering, builds a local Gaussian process regression (GPR) model for each region, and fuses the local models with a Bayesian inference strategy and a finite mixture mechanism to form the first-layer integrated models. A genetic algorithm then selects suitable first-layer integrated models for selective adaptive integration, yielding a selective hierarchical integrated Gaussian process regression (SHEGPR) probabilistic prediction model. To counter the performance deterioration caused by changes in the characteristics of wind power data, an adaptive updating strategy gives the prediction model the ability to update itself. The method applies the selective hierarchical ensemble learning framework to ultra-short-term wind power prediction; compared with traditional ensemble learning prediction methods it achieves higher prediction accuracy and stability, and the generated prediction intervals provide an effective reference for power scheduling.
Description
Technical Field
The invention relates to the technical field of wind power prediction, in particular to a wind power probability prediction method based on hierarchical integration.
Background
Wind energy is a pollution-free and widely distributed renewable energy source, and wind power generation technology has developed rapidly in recent years. However, because wind energy is random and fluctuating, connecting unstable wind power to the grid affects the safety and stability of the power system and therefore the stable operation of grid equipment. Accurate and efficient wind power prediction promotes reasonable power scheduling, provides a reliable reference for generation planning and shutdown maintenance, and helps the system operate safely, reliably and economically. Wind power prediction thus plays a crucial role in moving the power generation industry toward clean, environmentally friendly operation, and has great engineering value.
Ensemble learning is a strategy that completes a learning task by constructing and combining several sub-models. Because an ensemble can outperform a single model, ensemble learning is widely applied in wind power prediction. It is well known that an ensemble performs better when its sub-models are both accurate and diverse. However, most ensemble-based wind power prediction research neglects building sub-model diversity from the input data, which makes it hard to obtain richly diverse sub-models. In addition, because the model is built from historical data, concept drift inevitably occurs as the prediction time grows, so the model should have some adaptive capacity. The adaptivity of an integrated model has two parts: each sub-model should be able to update itself, and the weights of the integrated sub-models should change adaptively rather than stay fixed. However, the adaptation of integrated models has only been discussed in recent studies.
Finally, because wind energy is strongly random and highly uncertain, traditional single-point prediction cannot estimate the uncertainty of the wind well. For the stability of the power system, grid connection of wind power requires an accurate estimate of the fluctuation range of the wind power, for which single-point prediction is far from sufficient. Therefore, a probabilistic modeling method capable of generating a probabilistic prediction interval should be used for the sub-models.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a wind power probability prediction method based on hierarchical integration, which effectively improves the accuracy and stability of a prediction model.
The invention adopts the following technical scheme for solving the technical problems: a wind power probability prediction method based on hierarchical integration comprises the following steps:
Step (1): select a period of historical meteorological data D of a wind farm as the modeling sample set, and divide the sample set into a training set $D_{train}$, a verification set $D_{val}$ and a test set $D_{test}$; resample $D_{train}$ multiple times with the Bootstrapping method to obtain L sub-sample sets $\{(X_1,y_1),\dots,(X_L,y_L)\}$; select the input feature variables of each sub-sample set with the partial least squares (PLS) method by ranking their importance, delete identical sample subsets, and construct N subspaces $\{S_1,\dots,S_N\}$; save the input feature variable indexes of the training set samples corresponding to the N subspaces;
Step (2): map the indexes of the subspaces to the training set $D_{train}$ to obtain N subspace training data sets $\{D_{tra,1},\dots,D_{tra,N}\}$, and cluster each subspace with a Gaussian mixture model (GMM); supposing that z local regions $\{LD_1,LD_2,\dots,LD_z\}$ are obtained on the i-th subspace training data set $D_{tra,i}$, model each local region with Gaussian process regression to obtain a GPR model set $\{GPR_1,GPR_2,\dots,GPR_z\}$; for a new sample $x_*$, obtain the prediction output of the first-layer integrated EGPR model on the i-th subspace using a Bayesian inference strategy and a finite mixture mechanism; similarly, the N subspaces yield the prediction outputs of N first-layer integrated EGPR models $\{EGPR_1,EGPR_2,\dots,EGPR_N\}$;
Step (3): according to step (2), compute on the verification set $D_{val}$ the prediction accuracy (RMSE) and the standard deviation (STD) of the N first-layer integrated EGPR models; take the weighted combination of RMSE and STD as the optimization target of model selection; use a genetic algorithm to select the first-layer integrated EGPR models with good and stable performance; suppose M first-layer integrated EGPR models are selected as the sub-models of the second-layer integration;
Step (4): integrate the second-layer sub-models in an adaptive integration mode to obtain the final SHEGPR model;
Step (5): update the local regions LD, the GPR models and the GMM model as the prediction time increases.
Further, in step (1), the historical meteorological data D comprises the meteorological data and operation data of the wind farm over the past 2-4 months, $D=\{X,y\}$ with $X\in\mathbb{R}^{p\times q}$ and $y\in\mathbb{R}^{p\times 1}$, where p is the number of samples and $q=f\times l$, f being the number of input features and l the number of delay variables; y is the predicted power, and the input features comprise historical wind speed $W_S$, historical power $P$ and historical wind direction $W_D$.
Further, the specific process of performing feature selection on the sub-sample sets by partial least squares (PLS) in step (1) is as follows:
① Train PLS on each of the L sub-sample sets to obtain the regression coefficients $\beta_r\in\mathbb{R}^{q}$, $r\in\{1,\dots,L\}$, on that sub-sample set; $\beta_r$ represents the importance of the features in input X to y on this sub-sample set;
② Sort the entries of $\beta_r$ from large to small to obtain $\{b_1,b_2,b_3,\dots,b_q\}$, and judge according to formula (1):
$$\frac{\sum_{j=1}^{i} b_j}{\sum_{j=1}^{q} b_j}\ \ge\ th \qquad (1)$$
In formula (1), $b_i$ is the i-th sorted entry of $\beta_r$, and the threshold $th$ is set to 0.8-0.9; if formula (1) holds, store the indexes corresponding to the first i features;
③ Repeat step ② until L subspaces have been selected from the L sample subsets, then delete the repeated subspaces to obtain the final N subspaces.
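The selection and de-duplication steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes that formula (1) compares the cumulative share of the sorted importances with the threshold `th`, and that the importances are non-negative (for example, absolute PLS coefficients). The function names `select_subspace` and `unique_subspaces` are hypothetical.

```python
def select_subspace(importances, th=0.85):
    """Return the feature indexes whose cumulative importance share first reaches th.

    Assumption: importances are non-negative, e.g. absolute PLS coefficients.
    """
    # Rank feature indexes from most to least important.
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    total = sum(importances)
    running = 0.0
    chosen = []
    for j in order:
        chosen.append(j)
        running += importances[j]
        if running / total >= th:  # assumed form of formula (1)
            break
    return tuple(sorted(chosen))


def unique_subspaces(importance_sets, th=0.85):
    """Build one subspace per sub-sample set and drop repeated subspaces."""
    seen = []
    for importances in importance_sets:
        subspace = select_subspace(importances, th)
        if subspace not in seen:
            seen.append(subspace)
    return seen


# Three bootstrap sub-sample sets; the first two yield the same subspace.
subspaces = unique_subspaces([[5, 1, 3, 1], [5, 1, 3, 1], [1, 1, 1, 9]])
```

Here L = 3 sub-sample sets collapse to N = 2 distinct subspaces, which mirrors how N can be smaller than L after duplicates are removed.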
Further, the process of clustering the subspaces with the Gaussian mixture model (GMM) and establishing the first-layer integrated EGPR model in step (2) is as follows:
On the training set $D_{train}$, consider the nth subspace $S_n\in\mathbb{R}^{p\times c}$, where p is the number of samples and c is the number of features in the subspace; set the maximum number of clusters v and build a GMM on the nth subspace; suppose the data of the nth subspace are grouped into z clusters with z ≤ v, i.e. z local regions $\{LD_1,LD_2,\dots,LD_z\}$; then build a local model for each of the z local regions by Gaussian process regression to obtain z GPR models, denoted $\{GPR_1,GPR_2,\dots,GPR_z\}$;
In detail, for a new sample $x_*$, the GPR model of the i-th local region can be described as
$$\hat{y}_i=k_{i,*}^{T}C^{-1}y_i,\qquad \sigma_i^2=C(x_*,x_*)-k_{i,*}^{T}C^{-1}k_{i,*} \qquad (3)$$
In formula (3), $k_{i,*}=[C(x_*,x_{i,1}),\dots,C(x_*,x_{i,p})]^{T}$, $C$ is the $p\times p$ positive definite covariance matrix, $y_i$ is the vector of training outputs in the i-th local region, and $\hat{y}_i$ and $\sigma_i^2$ are the predicted mean and variance of sub-model $GPR_i$, respectively;
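To make the local GPR predictor concrete, the following sketch computes a predictive mean and variance of the standard form $k_*^{T}C^{-1}y$ and $C(x_*,x_*)-k_*^{T}C^{-1}k_*$ using only the Python standard library. The squared-exponential kernel, its fixed hyperparameters and the small noise term added to the diagonal are assumptions made for illustration; the patent does not state which covariance function it uses, and a practical implementation would fit the hyperparameters and use a numerical library.

```python
import math


def sq_exp(a, b, length=1.0, signal=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return signal * math.exp(-0.5 * d2 / length ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict(X, y, x_star, noise=1e-6):
    """Predictive mean and variance of one local GPR model at x_star."""
    n = len(X)
    C = [[sq_exp(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(C)
    k_star = [sq_exp(x_star, xi) for xi in X]
    alpha = chol_solve(L, y)       # C^{-1} y
    v = chol_solve(L, k_star)      # C^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)


X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 0.5]
near_mean, near_var = gpr_predict(X_train, y_train, [1.0])   # at a training point
far_mean, far_var = gpr_predict(X_train, y_train, [50.0])    # far from all data
```

Near a training point the variance collapses toward the noise level, while far from the data it returns to the prior variance; this behavior is what lets the local models supply a meaningful uncertainty estimate.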
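In the actual prediction process, for a new sample $x_*$, suppose the number of local regions obtained by GMM clustering on the nth subspace is z; the posterior probability of $x_*$ is then obtained by the Bayesian inference strategy:
$$P(LD_i\mid x_*)=\frac{P(x_*\mid LD_i)\,P(LD_i)}{\sum_{j=1}^{z}P(x_*\mid LD_j)\,P(LD_j)} \qquad (4)$$
In formula (4), $i\in\{1,2,3,\dots,z\}$ and $LD_i$ denotes the i-th local region; $P(x_*\mid LD_i)$ is the conditional probability and $P(LD_i)$ is the prior probability. The predicted output on the nth subspace is obtained by the finite mixture mechanism as:
$$\hat{y}_n=\sum_{i=1}^{z}P(LD_i\mid x_*)\,\hat{y}_i \qquad (5)$$
In formula (5), $\hat{y}_i$ is the predicted value of the i-th local region GPR model and $P(LD_i\mid x_*)$ is the posterior probability from formula (4);
Similarly, the mixed variance is calculated as:
$$\sigma_n^2=\sum_{i=1}^{z}P(LD_i\mid x_*)\left[\sigma_i^2+(\hat{y}_i-\hat{y}_n)^2\right] \qquad (6)$$
where $\sigma_i^2$ is the predicted variance of the i-th local region GPR model. Then, for a new sample $x_*$ on the nth subspace, the prediction output and prediction variance of the first-layer EGPR model are the mixture mean of formula (5) and the mixed variance of formula (6), respectively.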
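The Bayesian weighting and finite mixture combination can be illustrated with a short Python sketch. It assumes the local likelihoods $P(x_*\mid LD_i)$ and priors $P(LD_i)$ are already available (in practice they would come from the fitted GMM), and it assumes the mixed variance follows the usual mixture rule, i.e. each local variance plus the squared deviation of the local mean from the mixture mean. The function names are hypothetical.

```python
def posteriors(likelihoods, priors):
    """Posterior P(LD_i | x*) from likelihoods P(x* | LD_i) and priors P(LD_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def mix(means, variances, post):
    """Finite-mixture mean and variance of the local GPR predictions."""
    y = sum(p * m for p, m in zip(post, means))
    # Assumed mixture-variance rule: within-model variance plus spread of the means.
    var = sum(p * (v + (m - y) ** 2) for p, m, v in zip(post, means, variances))
    return y, var


post = posteriors([0.2, 0.6], [0.5, 0.5])
y_n, var_n = mix([1.0, 3.0], [0.5, 0.5], post)
```

With posteriors of 0.25 and 0.75, the mixture mean is 2.5 and the mixed variance is 1.25, larger than either local variance because the two local models disagree.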
Further, the detailed process of step (3) is as follows:
① Map the indexes of the subspaces to the verification set $D_{val}$ to obtain the verification data sets $\{D_{val,1},\dots,D_{val,N}\}$ on the N subspaces; obtain the N EGPR models according to step (2) and apply them to the verification set $D_{val}$, thereby obtaining the predicted outputs of the N EGPR models;
② Set the initial population size and the number of iterations of the genetic algorithm, and take the weighted sum of the prediction accuracy and the mixed standard deviation of the EGPR models as the objective function:
$$f_{obj}=\lambda\,RMSE+(1-\lambda)\,\sigma \qquad (8)$$
In formula (8), $\lambda$ is a parameter between 0 and 1, $\sigma$ is the predicted mixed standard deviation, and RMSE is the root mean square error in the optimization process;
In more detail: suppose that at some point in the optimization the prediction outputs of the m currently selected EGPR models are integrated by simple averaging to obtain the integrated prediction result $\hat{y}_{ens}$, calculated as:
$$\hat{y}_{ens}=\frac{1}{m}\sum_{i=1}^{m}\hat{y}_{EGPR_i} \qquad (9)$$
In formula (9), m is the number of EGPR models currently selected; the RMSE with respect to the true values is then:
$$RMSE=\sqrt{\frac{1}{N_{val}}\sum_{j=1}^{N_{val}}\left(y_j-\hat{y}_{ens,j}\right)^2} \qquad (10)$$
In formula (10), $N_{val}$ is the number of samples in the verification set $D_{val}$;
Find $\min\{f_{obj}\}$ through multiple iterations, select the models that perform well on the verification set $D_{val}$, and store their indexes.
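As a rough illustration of the selection objective, the sketch below evaluates $f_{obj}$ for one binary selection mask. The source does not say how the mixed standard deviation $\sigma$ is aggregated, so this sketch assumes it is the average predictive standard deviation of the selected models over the verification samples; that choice, and the name `objective`, are assumptions.

```python
import math


def objective(mask, preds, stds, y_true, lam=0.5):
    """f_obj = lam * RMSE + (1 - lam) * sigma for the models flagged in mask.

    preds[i][j] and stds[i][j] are the prediction and predictive standard
    deviation of model i on verification sample j.
    """
    chosen = [i for i, keep in enumerate(mask) if keep]
    if not chosen:
        return math.inf  # an empty ensemble is not a valid candidate
    m, n = len(chosen), len(y_true)
    # Simple average of the selected models' predictions.
    y_ens = [sum(preds[i][j] for i in chosen) / m for j in range(n)]
    rmse = math.sqrt(sum((y - e) ** 2 for y, e in zip(y_true, y_ens)) / n)
    # Assumed aggregation of the mixed standard deviation.
    sigma = sum(stds[i][j] for i in chosen for j in range(n)) / (m * n)
    return lam * rmse + (1 - lam) * sigma


preds = [[1.0, 2.0], [3.0, 4.0]]
stds = [[0.2, 0.2], [0.4, 0.4]]
y_val = [2.0, 3.0]
f_both = objective([1, 1], preds, stds, y_val)
f_first = objective([1, 0], preds, stds, y_val)
```

In this toy case keeping both models gives a lower objective than keeping only the first, because their errors cancel in the average even though the second model is less certain.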
Further, the detailed process of step (4) is as follows:
Assume that the number of EGPR models selected in step (3) is M. When a new test sample $x_*$ is predicted, the second-layer integrated prediction output $\hat{y}$ and prediction variance $\sigma^2$ are obtained by weighting the selected models; the prediction output is:
$$\hat{y}=\sum_{i=1}^{M}w_i\,\hat{y}_{EGPR_i} \qquad (11)$$
where $\hat{y}_{EGPR_i}$ is the output of the i-th selected EGPR model and $w_i$ is its integration weight, given by:
$$w_i=P(EGPR_i\mid x_*)=\frac{P(x_*\mid EGPR_i)\,P(EGPR_i)}{\sum_{j=1}^{M}P(x_*\mid EGPR_j)\,P(EGPR_j)} \qquad (13)$$
where $P(x_*\mid EGPR_i)$ is the conditional probability and $P(EGPR_i)$ is the prior probability; without prior knowledge, the priors of all models are assumed equal, with value $1/M$. The conditional probability $P(x_*\mid EGPR_i)$ is expressed with a parameter $\gamma$ that controls the weights.
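The adaptive weighting can be sketched as follows. The exact expression for the conditional probability is not recoverable from this text, so the sketch substitutes a hypothetical stand-in, $\exp(-\gamma\,\sigma_i^2)$, which gives larger weight to models that are more certain about the query sample; this form is an assumption and not the patent's formula. With equal priors $1/M$, the priors cancel in the normalization.

```python
import math


def second_layer_predict(means, variances, gamma=1.0):
    """Weighted fusion of the selected EGPR outputs for one query sample.

    The likelihood exp(-gamma * variance) is a hypothetical stand-in for the
    conditional probability; equal priors 1/M cancel when normalizing.
    """
    scores = [math.exp(-gamma * v) for v in variances]
    total = sum(scores)
    weights = [s / total for s in scores]
    y = sum(w * m for w, m in zip(weights, means))
    return y, weights


y_equal, w_equal = second_layer_predict([1.0, 3.0], [0.5, 0.5])
y_skew, w_skew = second_layer_predict([1.0, 3.0], [0.1, 2.0])
```

Because the weights are recomputed for every query sample, the fusion adapts as the relative confidence of the sub-models changes; a larger `gamma` makes the weighting more selective.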
Further, the detailed process of step (5) is as follows:
When a new sample $(x_{t+1},y_{t+1})$ arrives, first estimate the posterior probabilities that it belongs to each local region, then update the EGPR model in the region with the maximum posterior probability. Suppose the new sample point $x_{t+1}$ has its maximum posterior probability $P(LD_k\mid x_{t+1})$ in $LD_k$; the update operation then includes two steps:
① Update the covariance matrix $\Sigma_{GPR}$ of the GPR model of the k-th local region with a moving window;
② Incrementally update the mean vector $\mu_k$, the covariance matrix $\Sigma_k$ and the GMM mixing coefficient $\pi_k$ of the k-th local region; the mixing coefficient is updated as:
$$\pi_k^{(t+1)}=\pi_k^{(t)}+\alpha\left(P(k\mid x_{t+1})-\pi_k^{(t)}\right) \qquad (17)$$
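The mixing coefficient update of formula (17) is simple enough to show directly. The sketch below applies it to every component at once, which is an assumption about how the rule is used; the text does not define $\alpha$, so it is treated here as a step size between 0 and 1. The function name `update_mixing` is hypothetical.

```python
def update_mixing(pi, post, alpha=0.05):
    """Apply formula (17) to every component: move pi toward the posterior.

    pi   -- current mixing coefficients, summing to 1
    post -- posteriors P(k | x_{t+1}) of the new sample, summing to 1
    alpha -- assumed step size in (0, 1)
    """
    return [p + alpha * (r - p) for p, r in zip(pi, post)]


new_pi = update_mixing([0.5, 0.5], [1.0, 0.0], alpha=0.1)
```

When both inputs sum to 1, the updated coefficients also sum to 1, so no renormalization is needed after the step.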
The invention has the following characteristics. First, the wind power probability prediction method based on hierarchical integration generates data diversity by perturbing both the sample information and the feature information: diverse subspaces are built through feature selection, and each subspace is clustered with a GMM before modeling, which speeds up training and markedly improves the performance of the subspace model after mixture modeling. Next, the first-layer integrated sub-models are pruned by optimization, which improves the performance of the second-layer integrated model and reduces the computational complexity of adaptive updating. Finally, the second-layer sub-models are fused by weighting in an adaptive integration mode, so that the final SHEGPR model has a degree of adaptive capacity. Because GPR is used as the modeling sub-model, the integrated SHEGPR model not only predicts well but also provides a prediction interval.
Compared with the prior art, the invention has the following beneficial effects: the method applies a selective hierarchical ensemble learning framework to ultra-short-term wind power prediction; compared with traditional ensemble learning prediction methods, it has higher prediction accuracy and stability, and the generated prediction intervals provide an effective reference for power scheduling.
Drawings
FIG. 1 is a flowchart of the prediction of the SHEGPR wind power;
FIG. 2 is a three-dimensional map of the mapping relationship between the power of the wind farm and the wind speed and direction;
FIG. 3 is a GPR and EGPR comparison diagram on a 4h wind power prediction subspace;
FIG. 4 is a wind power prediction trend graph with prediction intervals of 15min, 1h, 2h and 4 h;
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1
As shown in FIG. 1, this embodiment takes the wind power data of a wind farm from the National Renewable Energy Laboratory (NREL) in the United States as an example. Historical wind speed, historical power and historical wind direction are the inputs, the number of delay variables is set to 8, and power is the output of the SHEGPR model.
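One way to build the delayed inputs described here is sketched below: each row stacks the last 8 values of wind speed, power and wind direction (so q = 3 × 8 = 24), and the target is the power a given number of steps ahead. The column ordering, the `horizon` argument and the name `build_lagged` are assumptions made for illustration.

```python
def build_lagged(ws, p, wd, lags=8, horizon=1):
    """Build lagged input rows and future-power targets from three aligned series."""
    X, y = [], []
    for t in range(lags - 1, len(p) - horizon):
        row = []
        for series in (ws, p, wd):
            # The most recent `lags` values of this series, ending at time t.
            row.extend(series[t - lags + 1: t + 1])
        X.append(row)
        y.append(p[t + horizon])  # power `horizon` steps ahead
    return X, y


ws = [float(i) for i in range(12)]
p = [10.0 * i for i in range(12)]
wd = [100.0 + i for i in range(12)]
X, y = build_lagged(ws, p, wd, lags=8, horizon=1)
```

At a 15-minute resolution, `horizon=8` would correspond to a 2-hour-ahead target and `horizon=16` to a 4-hour-ahead target.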
Step 1: selecting historical data (96 data points in 1 day) of wind power, wind speed and wind direction with time resolution of 15 minutes in a certain wind power plant of a renewable energy laboratory (NREL) in America at 1-3 months, and dividing the data into a training set D in sequencetrain(3000) Verification set Dval(1000) And test set Dtest(4000) The specific mapping relationship between the power of the wind farm and the wind speed and direction is shown in fig. 2.
Step 2: using Bootstrapping mode to pairDtrainPerforming multiple resampling to obtain L sub-sample sets { (X)1,y1),...,(XL,yL) And (5) performing importance ranking on the characteristics of the samples by using a Partial Least Squares (PLS), repeating R times and deleting a repeated subspace to obtain N DtrainSubspace { S }1,...,SNAnd saving feature indexes of the training samples corresponding to the N subspaces.
The process of feature selection of the sub-sample set by Partial Least Squares (PLS) is as follows:
① pairs of training set DtrainPerforming Z-score normalization, training the L subset with PLS, determining the number of principal components of PLS by cross validation to obtain subset regression coefficients βrWhereinr ∈ { 1.., L }, which represents the importance of the features in input X to y on this subset.
② pairs βrThe data in (1) are sorted from large to small to obtain { b1,b2,b3,...,bq}, judging
Wherein, biIs βrTh is set to 0.85. If the above formula is true, the indexes corresponding to the first i characteristics are saved.
And thirdly, repeating the step two until L subspaces are selected from the L sample subsets, and deleting the repeated subspaces to obtain the final N subspaces.
Step 3: ① Map the indexes of the subspaces to $D_{train}$ to obtain N subspace training data sets $\{D_{tra,1},\dots,D_{tra,N}\}$, set the maximum number of GMM clusters to v, build a Gaussian mixture model (GMM) on each subspace training data set, and store the GMM of each subspace to obtain N GMM models. Suppose clustering the i-th subspace training data set $D_{tra,i}$ yields z local regions $\{LD_1,LD_2,\dots,LD_z\}$. The GMM is:
$$p(x\mid\Theta)=\sum_{k=1}^{c}\lambda_k\,\mathcal{N}(x\mid\mu_k,\Sigma_k)$$
where $\Theta=\{\lambda_k,\mu_k,\Sigma_k\}_{k=1}^{c}$ denotes the model parameters of the GMM, c is the number of Gaussian components, $\lambda_k$ is the weight of the k-th Gaussian component, and $\mu_k$ and $\Sigma_k$ are the mean and covariance matrix of the k-th Gaussian component; the parameters of the GMM are found by the expectation-maximization algorithm.
② For the z local regions $\{LD_1,LD_2,\dots,LD_z\}$ of $D_{tra,i}$, model each LD with Gaussian process regression (GPR) to obtain a GPR model set, denoted $\{GPR_1,GPR_2,\dots,GPR_z\}$. In detail, for a new sample $x_*$, the GPR model of the i-th local region can be described as:
$$\hat{y}_i=k_{i,*}^{T}C^{-1}y_i,\qquad \sigma_i^2=C(x_*,x_*)-k_{i,*}^{T}C^{-1}k_{i,*}$$
where $k_{i,*}=[C(x_*,x_{i,1}),\dots,C(x_*,x_{i,p})]^{T}$, $C$ is the $p\times p$ positive definite covariance matrix, $y_i$ is the vector of training outputs in the i-th local region, and $\hat{y}_i$ and $\sigma_i^2$ are the predicted mean and variance of sub-model $GPR_i$, respectively.
③ Repeat step ② N times to build a GPR model set for each of the N subspace training data sets.
Step 4: Map the indexes of the subspaces to $D_{val}$ to obtain N subspace verification data sets $\{D_{val,1},\dots,D_{val,N}\}$, and apply Z-score normalization to the data on each subspace. Then, according to the N GMM models and GPR model sets built in step 3, the Bayesian inference strategy and the finite mixture mechanism, obtain the prediction outputs and variances of the N EGPR models on $\{D_{val,1},\dots,D_{val,N}\}$. Specifically, one EGPR model is established as follows:
For a new sample $x_*$, suppose the number of local regions obtained by GMM clustering on the nth subspace is z; the posterior probability of $x_*$ is then obtained by the Bayesian inference strategy:
$$P(LD_i\mid x_*)=\frac{P(x_*\mid LD_i)\,P(LD_i)}{\sum_{j=1}^{z}P(x_*\mid LD_j)\,P(LD_j)} \qquad (4)$$
where $i\in\{1,2,3,\dots,z\}$ and $LD_i$ denotes the i-th local region. $P(x_*\mid LD_i)$ is the conditional probability and $P(LD_i)$ is the prior probability.
The predicted output on the nth subspace is obtained by the finite mixture mechanism as:
$$\hat{y}_n=\sum_{i=1}^{z}P(LD_i\mid x_*)\,\hat{y}_i \qquad (5)$$
where $\hat{y}_i$ is the predicted value of the i-th local region GPR model and $P(LD_i\mid x_*)$ is the posterior probability.
Similarly, the mixed variance is calculated as:
$$\sigma_n^2=\sum_{i=1}^{z}P(LD_i\mid x_*)\left[\sigma_i^2+(\hat{y}_i-\hat{y}_n)^2\right] \qquad (6)$$
where $\sigma_i^2$ is the predicted variance of sub-model $GPR_i$.
Then, for a new sample $x_*$ on the nth subspace, the prediction output and prediction variance of the first-layer EGPR model are the mixture mean of formula (5) and the mixed variance of formula (6), respectively.
Step 5: This step constructs an optimization problem to select EGPR models for the second-layer integration. The first-layer integration yields N EGPR models $\{EGPR_1,EGPR_2,\dots,EGPR_N\}$; the indexes of all EGPR models are binary-coded, where 1 means the model is selected and 0 means it is not. Then, the weighted sum of the prediction accuracy and the mixed standard deviation of the EGPR models obtained in step 4 on the verification set is taken as the objective function, a genetic algorithm (GA) is adopted as the optimization algorithm, and $\min\{f_{obj}\}$ is sought through multiple iterations; the models with good performance and diversity are selected and their indexes stored. Suppose M EGPR models are finally selected for the second-layer integration.
The optimization objective is constructed as follows:
$$f_{obj}=\lambda\,RMSE+(1-\lambda)\,\sigma \qquad (8)$$
where $\lambda$ is a parameter between 0 and 1, $\sigma$ is the predicted mixed standard deviation, and RMSE is the root mean square error in the optimization process, as detailed below.
Suppose that at some point in the optimization the prediction outputs of the m currently selected EGPR models are integrated by simple averaging to obtain the integrated prediction result $\hat{y}_{ens}$, calculated as:
$$\hat{y}_{ens}=\frac{1}{m}\sum_{i=1}^{m}\hat{y}_{EGPR_i} \qquad (9)$$
where m is the number of EGPR models currently selected; the RMSE with respect to the true values is:
$$RMSE=\sqrt{\frac{1}{N_{val}}\sum_{j=1}^{N_{val}}\left(y_j-\hat{y}_{ens,j}\right)^2} \qquad (10)$$
where $N_{val}$ is the number of samples in the verification set $D_{val}$.
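The binary-coded search in this step can be sketched with a small genetic algorithm over selection masks. The operators used here (truncation selection, one-point crossover, bit-flip mutation, elitism) and all default parameters are assumptions, since the text does not specify them; `toy_fitness` is a hypothetical stand-in for $f_{obj}$ evaluated on the verification set.

```python
import random


def genetic_select(fitness, n_models, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search binary masks (1 = model selected) that minimize fitness; needs n_models >= 2."""
    rng = random.Random(seed)
    # Include the all-ones mask so the result is never worse than using every model.
    pop = [[1] * n_models]
    pop += [[rng.randint(0, 1) for _ in range(n_models)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        parents = ranked[:max(2, pop_size // 2)]   # truncation selection
        children = [ranked[0][:]]                   # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_models)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if p_mut > rng.random() else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return min(pop, key=fitness)


def toy_fitness(mask, errors=(0.9, 0.1, 0.8, 0.2)):
    """Hypothetical objective: mean error of the selected models (inf if none selected)."""
    chosen = [e for e, keep in zip(errors, mask) if keep]
    return sum(chosen) / len(chosen) if chosen else float("inf")


best_mask = genetic_select(toy_fitness, n_models=4)
```

Thanks to elitism and the all-ones seed, the returned mask is guaranteed to score no worse than integrating all N models, although a genetic algorithm does not guarantee the global optimum.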
Step 6: In the online prediction phase, a sample $x_*$ of the test set $D_{test}$ is predicted as follows:
① Map the indexes of the subspaces to $x_*$ to obtain its N subspace representations, and apply Z-score normalization to the data on each subspace; as in step 4, obtain the prediction outputs and variances of the N EGPR models for $x_*$ according to the N GMM models and GPR model sets built in step 3, the Bayesian inference strategy and the finite mixture mechanism;
② Perform the second-layer integration of the M EGPR models selected in step 5 by means of variance integration; the second-layer integrated prediction output $\hat{y}$ and prediction variance $\sigma^2$ are obtained by weighting the selected models, and the prediction output is:
$$\hat{y}=\sum_{i=1}^{M}w_i\,\hat{y}_{EGPR_i} \qquad (11)$$
where $\hat{y}_{EGPR_i}$ is the output of the i-th selected EGPR model and $w_i$ is its integration weight, given by:
$$w_i=P(EGPR_i\mid x_*)=\frac{P(x_*\mid EGPR_i)\,P(EGPR_i)}{\sum_{j=1}^{M}P(x_*\mid EGPR_j)\,P(EGPR_j)} \qquad (13)$$
where $P(x_*\mid EGPR_i)$ is the conditional probability and $P(EGPR_i)$ is the prior probability. Without prior knowledge, the priors of all models are assumed equal, with value $1/M$. The conditional probability $P(x_*\mid EGPR_i)$ is expressed with a parameter $\gamma$ that controls the weights.
And 7: when the prediction time is longer, the performance of the model is not enough to be degraded, so that the model is matchedAdaptive updating of the model becomes necessary. When a new sample (x)t+1,yt+1) At the time of arrival, a new sample (x) is first estimatedt+1,yt+1) Posterior probability belonging to different local areas is selected, and then the EGPR model is updated by selecting the value with the maximum posterior probability, and a new sample point x is assumedt+1At LDkHas the maximum posterior probability (LD)k|xt+1) The update operation then includes two steps:
① covariance matrix ∑ in GPR model for the kth local region by moving windowGPRAnd (6) updating.
② mean vector μ in the kth local area by incremental updatekCovariance matrix ∑kMixing coefficient of and GMMkUpdating:
πk (t+1)=πk (t)+α(P(k|xt+1)-πk (t)) (17)
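The embodiment of the invention uses the root mean square error (RMSE) and the coefficient of determination ($R^2$) to evaluate the prediction performance, defined as:
$$RMSE=\sqrt{\frac{1}{N_{test}}\sum_{i=1}^{N_{test}}\left(y_i-\hat{y}_i\right)^2},\qquad R^2=1-\frac{\sum_{i=1}^{N_{test}}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{N_{test}}\left(y_i-\bar{y}\right)^2}$$
where $N_{test}$ is the number of test samples, $y_i$ and $\hat{y}_i$ are the actual and predicted values of the i-th sample, and $\bar{y}$ is the mean of the actual values.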
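Both metrics are straightforward to compute. The sketch below uses their standard definitions, with hypothetical function names; it assumes the actual values are not all identical, since $R^2$ is undefined in that case.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


perfect = (rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
mean_only = (rmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]), r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

A model that always predicts the mean of the actual values scores $R^2 = 0$, which gives a useful reference point when reading the comparison tables.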
The invention compares the following methods: (1) a global GPR model; (2) the persistence method; (3) the Gaussian process regression model based on selective hierarchical integration (SHEGPR); (4) the SHEGPR model with adaptive updating, SHEGPR (with update) (Example 1). The comparison results are shown in Table 1 and Table 2.
TABLE 1 comparison of predicted Performance at 2 hours Advance for different prediction methods
TABLE 2 comparison of predicted performance at 4 hours in advance for different prediction methods
As can be seen from Tables 1 and 2, the method proposed in this example improves substantially on the global GPR and persistence methods in both RMSE and $R^2$, which supports the effectiveness and generality of the invention. Notably, the global GPR model performs only comparably to the persistence method: the GPR model is built from historical samples, so its performance on the test set degrades because of concept drift, whereas the persistence method simply outputs the most recent observed sample as the next prediction and therefore always uses the latest information. Adaptive updating of the model is therefore a key part of accurate wind power prediction.
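For reference, the persistence baseline described above takes only a few lines. The sketch pairs each forecast with the value it is meant to predict; the function name and the step-based horizon argument are assumptions.

```python
def persistence_forecast(power, horizon_steps):
    """Forecast power[t + horizon_steps] with the latest observed value power[t].

    Returns (forecasts, targets) as aligned lists; horizon_steps must be >= 1.
    """
    forecasts = power[:-horizon_steps]
    targets = power[horizon_steps:]
    return forecasts, targets


series = list(range(10))
forecasts, targets = persistence_forecast(series, horizon_steps=8)
```

With 15-minute data, `horizon_steps=8` corresponds to the 2-hour-ahead comparison and `horizon_steps=16` to the 4-hour-ahead comparison.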
As can be seen from FIG. 3, the performance of the EGPR model, obtained by clustering a subspace with GMM and then modeling with GPR, differs significantly from that of a single GPR model on the same subspace. The proposed approach of building sub-models per cluster after GMM clustering is therefore both faster and better performing. FIG. 4 shows, from top to bottom, the 15 min, 1 h, 2 h and 4 h wind power prediction curves of the SHEGPR (with update) method; the predicted values fit the actual values well, and, as expected, the shorter the prediction horizon, the better the fit. The method not only predicts the trend of the wind power but also produces a prediction interval that quantifies its uncertainty, which supports stable scheduling of the power system. FIG. 4 also shows that the shorter the prediction horizon, the narrower the 95% confidence interval, which indicates more effective interval prediction and is more beneficial to stable scheduling of the power system.
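A 95% interval of the kind shown in FIG. 4 can be derived from a predictive mean and variance as sketched below. The sketch assumes the predictive distribution is Gaussian, so the interval is the mean plus or minus 1.96 standard deviations; the `coverage` helper, which reports how often actual values fall inside the interval, is a hypothetical addition for checking interval quality.

```python
import math


def interval_95(mean, variance):
    """Two-sided 95% interval under an assumed Gaussian predictive distribution."""
    half_width = 1.96 * math.sqrt(variance)
    return mean - half_width, mean + half_width


def coverage(y_true, means, variances):
    """Fraction of actual values that fall inside their 95% interval."""
    hits = 0
    for y, m, v in zip(y_true, means, variances):
        lo, hi = interval_95(m, v)
        if lo <= y <= hi:
            hits += 1
    return hits / len(y_true)


lo, hi = interval_95(10.0, 4.0)
cov = coverage([10.0, 20.0], [10.0, 10.0], [4.0, 4.0])
```

A narrow interval is only useful if its empirical coverage stays close to the nominal 95%, so width and coverage are best read together.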
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.
Claims (7)
1. A wind power probability prediction method based on hierarchical integration, characterized by comprising the following steps:
step (1): selecting a period of historical meteorological data $D$ of a wind farm as the modeling sample set, and dividing the sample set into a training set $D_{train}$, a validation set $D_{val}$ and a test set $D_{test}$; resampling $D_{train}$ multiple times with the Bootstrapping method to obtain $L$ sub-sample sets $\{(X_1, y_1), \ldots, (X_L, y_L)\}$; selecting the input feature variables of each sub-sample set with the partial least squares (PLS) method and ranking them by importance; deleting identical sample subsets to construct $N$ subspaces $\{S_1, \ldots, S_N\}$; and saving the input feature variable indices of the training-set samples corresponding to the $N$ subspaces;
step (2): mapping the subspace indices onto the training set $D_{train}$ to obtain $N$ subspace training data sets $\{D_{tra,1}, \ldots, D_{tra,N}\}$, and clustering each subspace with a Gaussian mixture model (GMM); supposing that the training data set $D_{tra,i}$ of the $i$-th subspace yields $z$ local regions $\{LD_1, LD_2, \ldots, LD_z\}$, modeling each local region with Gaussian process regression to obtain the GPR model set $\{GPR_1, GPR_2, \ldots, GPR_z\}$; for a new sample $x_*$, obtaining the prediction output of the first-layer integrated EGPR model on the $i$-th subspace through a Bayesian inference strategy and a finite mixture mechanism; likewise, for the $N$ subspaces, obtaining the prediction outputs of the $N$ first-layer integrated EGPR models $\{EGPR_1, EGPR_2, \ldots, EGPR_N\}$;
step (3): according to step (2), calculating on the validation set $D_{val}$ the prediction accuracy (RMSE) and the standard deviation (STD) of the $N$ first-layer integrated EGPR models, and combining them by weighting as the optimization objective for model selection; selecting, with a genetic algorithm, the first-layer integrated EGPR models that perform well and stably; supposing that $M$ first-layer integrated EGPR models are selected, using them as the sub-models of the second-layer integration;
step (4): integrating the second-layer sub-models in an adaptive integration manner to obtain the final SHEGPR model;
step (5): updating the local regions $LD$, the GPR models and the GMM model as the prediction time increases.
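The Bootstrapping resampling in step (1) can be sketched as drawing sample indices with replacement. This is a minimal sketch using only the standard library: the function name `bootstrap_indices`, the choice to draw as many indices as there are training samples, and the fixed seed are assumptions, not details stated in the claim.

```python
import random

def bootstrap_indices(n_samples, n_subsets, seed=0):
    """Draw n_subsets index lists, each sampled with replacement from range(n_samples)."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_subsets)
    ]

# L = 3 sub-sample sets drawn from a training set of 10 samples (toy sizes).
subsets = bootstrap_indices(10, 3, seed=1)
```

Each index list selects the rows of the training set that form one sub-sample set $(X_r, y_r)$.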
2. The wind power probability prediction method based on hierarchical integration as claimed in claim 1, wherein in step (1) the historical meteorological data $D$ are the meteorological data and operation data of the wind farm over the past 2-4 months, with $D = \{X, y\}$ and $X \in \mathbb{R}^{p \times q}$, where $p$ is the number of samples and $q = f \times l$, $f$ being the number of input features and $l$ the number of delay variables; $y$ is the predicted power; the input features include the historical wind speed $W_S$, the historical power $P$ and the historical wind direction $W_D$.
3. The wind power probability prediction method based on hierarchical integration according to claim 2, wherein the specific process of performing feature selection on the sub-sample set by Partial Least Squares (PLS) in the step (1) is as follows:
① training PLS on each of the $L$ sub-sample sets to obtain the regression coefficients $\beta_r$ on that sub-sample set, where $r \in \{1, \ldots, L\}$; $\beta_r$ represents the importance to $y$ of each variable in the input $X$ on that sub-sample set;
② sorting the entries of $\beta_r$ in descending order to obtain $\{b_1, b_2, b_3, \ldots, b_q\}$, and judging according to formula (1):
in formula (1), $b_i$ is the $i$-th entry of the sorted $\beta_r$, and the threshold $th$ is set to 0.8-0.9; if formula (1) holds, the indices corresponding to the first $i$ features are stored;
③ repeating step ② until $L$ subspaces have been selected from the $L$ sample subsets, and deleting the repeated subspaces to obtain the final $N$ subspaces.
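Formula (1) is not reproduced in this text, so the sketch below rests on a loudly stated assumption: that the criterion is a cumulative-contribution ratio, keeping the top-ranked coefficients until their share of the total reaches `th`. Ranking by coefficient magnitude is a second assumption, and the names `select_features` and `unique_subspaces` are hypothetical. The PLS fit itself is omitted; the coefficients are taken as given.

```python
def select_features(coefs, th=0.85):
    """Return indices of the top-ranked features whose cumulative share reaches th.

    ASSUMPTION: formula (1) is read as sum(b_1..b_i) / sum(b_1..b_q) >= th,
    with features ranked by the magnitude of their PLS regression coefficient.
    """
    order = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
    total = sum(abs(c) for c in coefs)
    if total == 0:
        return []
    chosen, running = [], 0.0
    for j in order:
        chosen.append(j)
        running += abs(coefs[j])
        if running / total >= th:
            break
    return chosen

def unique_subspaces(subspaces):
    """Drop repeated subspaces, treating a subspace as a set of feature indices."""
    seen, result = set(), []
    for s in subspaces:
        key = tuple(sorted(s))
        if key not in seen:
            seen.add(key)
            result.append(list(key))
    return result
```

Applying `select_features` to each of the $L$ coefficient vectors and then `unique_subspaces` to the results would leave the $N \le L$ distinct subspaces described in step ③.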
4. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the process of clustering the subspace by using the Gaussian mixture model GMM and establishing the first-layer integrated EGPR model in the step (2) is as follows:
on the training set $D_{train}$, consider the $n$-th subspace, with $p$ samples and $c$ features; set the maximum number of clusters $v$ and build a GMM model on the $n$-th subspace; suppose the data of the $n$-th subspace are clustered into $z$ classes, with $z \le v$, giving $z$ local regions $\{LD_1, LD_2, \ldots, LD_z\}$; then build a local model on each of the $z$ local regions by Gaussian process regression to obtain $z$ GPR models, denoted $\{GPR_1, GPR_2, \ldots, GPR_z\}$;
for a new sample $x_*$, the GPR model of the $i$-th local region can be described as:
in formula (3), $k_{i,*} = [C(x_*, x_{i,1}), \ldots, C(x_*, x_{i,p})]$, $C$ denotes the $p \times p$ positive definite covariance matrix, and the two quantities given by the formula are, respectively, the predicted mean and the predicted variance of the sub-model $GPR_i$;
in the actual prediction process, for a new sample $x_*$, suppose that the number of local regions of the $n$-th subspace after GMM clustering is $z$; the posterior probability of $x_*$ is then obtained through the Bayesian inference strategy:
$$P(LD_i \mid x_*) = \frac{P(x_* \mid LD_i)\,P(LD_i)}{\sum_{j=1}^{z} P(x_* \mid LD_j)\,P(LD_j)} \quad (4)$$
in formula (4), $i \in \{1, 2, 3, \ldots, z\}$, $LD_i$ denotes the $i$-th local region, $P(x_* \mid LD_i)$ is the conditional probability and $P(LD_i)$ is the prior probability; then, through the finite mixture mechanism, the prediction output on the $n$-th subspace is obtained as:
in formula (5), the quantities combined are the predicted value of the $i$-th local-region GPR model and the posterior probability $P(LD_i \mid x_*)$;
similarly, the mixed variance can be calculated as:
then, for a new sample $x_*$, the prediction output and prediction variance of the first-layer EGPR model on the $n$-th subspace are:
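The first-layer combination can be sketched as a Bayes-rule posterior followed by a finite mixture of the local GPR predictions. The posterior matches the description of formula (4). The mean is the posterior-weighted sum that the description of formula (5) suggests. The variance uses the standard moment-matching formula for a mixture of Gaussians, which is an assumption, because formula (6) is not reproduced in this text. The function names are hypothetical, and the local likelihoods, means and variances are taken as inputs instead of being computed by a GMM and GPR.

```python
def region_posteriors(likelihoods, priors):
    """Bayes rule: P(LD_i | x*) is proportional to P(x* | LD_i) * P(LD_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def mix_predictions(means, variances, weights):
    """Finite-mixture mean and variance of the local GPR predictions.

    ASSUMPTION: the variance is the usual mixture formula
    sum_i w_i * (var_i + (mean_i - mean)^2); the patent's formula (6) may differ.
    """
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in zip(weights, means, variances))
    return mean, var

# Two local regions with equal priors; the second explains x* three times better.
weights = region_posteriors([0.2, 0.6], [0.5, 0.5])
mean, var = mix_predictions([1.0, 3.0], [0.5, 0.5], weights)
```

The spread term `(mean_i - mean)^2` makes the combined variance grow when the local models disagree, which is one reason this form is a common choice.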
5. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the detailed process of step (3) is as follows:
① mapping the subspace indices onto the validation set $D_{val}$ to obtain the validation data sets $\{D_{val,1}, \ldots, D_{val,N}\}$ on the $N$ subspaces; obtaining $N$ EGPR models according to step (2); and, on the validation set $D_{val}$, obtaining the prediction outputs of the $N$ EGPR models as
② setting the initial population size and the number of iterations of the genetic algorithm, and taking the weighted sum of the prediction accuracy of the EGPR models and the mixed standard deviation as the objective function:
$$f_{obj} = \lambda\,\mathrm{RMSE} + (1 - \lambda)\,\sigma \quad (8)$$
in formula (8), $\lambda$ is a parameter between 0 and 1, $\sigma$ is the predicted mixed standard deviation, and RMSE is the root mean square error during the optimization;
in further detail: suppose that, at some point in the optimization, the prediction outputs of the $m$ currently selected EGPR models are integrated by simple averaging to obtain the integrated prediction result, which is calculated as follows:
in formula (9), $m$ is the number of EGPR models currently selected; the RMSE with respect to the true values is then:
in formula (10), $N_{val}$ is the number of samples in the validation set $D_{val}$;
through multiple iterations, $\min\{f_{obj}\}$ is found, the models that perform well on the validation set $D_{val}$ are selected, and their indices are stored.
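The objective in formula (8) can be sketched as follows. Two things here are assumptions and not the patent's procedure: the mixed standard deviation $\sigma$ is taken as the mean predictive standard deviation over the selected models and validation samples, and the genetic algorithm is replaced by an exhaustive search over all non-empty subsets, which is only practical for a small number of models. The names `objective` and `best_subset` are hypothetical.

```python
import math
from itertools import combinations

def objective(preds, stds, y_true, selected, lam=0.5):
    """f_obj = lam * RMSE + (1 - lam) * sigma for a simple-average ensemble."""
    m, n = len(selected), len(y_true)
    # Simple average of the selected models' predictions at each validation sample.
    avg = [sum(preds[k][t] for k in selected) / m for t in range(n)]
    rmse = math.sqrt(sum((a - y) ** 2 for a, y in zip(avg, y_true)) / n)
    # ASSUMPTION: sigma is the mean predictive std over selected models and samples.
    sigma = sum(sum(stds[k]) / n for k in selected) / m
    return lam * rmse + (1.0 - lam) * sigma

def best_subset(preds, stds, y_true, lam=0.5):
    """Exhaustive stand-in for the genetic algorithm: try every non-empty subset."""
    models = range(len(preds))
    candidates = (c for r in range(1, len(preds) + 1) for c in combinations(models, r))
    return min(candidates, key=lambda c: objective(preds, stds, y_true, c, lam))

preds = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]   # toy validation predictions
stds = [[0.1, 0.1], [0.1, 0.1], [0.5, 0.5]]    # toy predictive standard deviations
y_val = [2.0, 3.0]
chosen = best_subset(preds, stds, y_val, lam=0.5)
```

In this toy case the third model is exact but uncertain, so the objective prefers averaging the first two, whose errors cancel and whose standard deviations are small.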
6. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the detailed process of the step (4) is as follows:
supposing that the number of EGPR models selected in step (3) is $M$, when a new test sample $x_*$ is predicted, the second-layer integrated prediction output and prediction variance are:
in these formulas, the output of the $i$-th selected EGPR model is combined using the integration weight $w_i$, which is given by:
in this formula, the two factors are the conditional probability and the prior probability; in the absence of prior knowledge, the prior probabilities of all models are assumed to be equal, with value $1/M$; the conditional probability can be expressed as:
where $\gamma$ is a parameter that controls the weights.
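With equal priors of $1/M$, the adaptive weights reduce to the normalized conditional probabilities. The sketch below takes those conditional probabilities as inputs, because the formula that defines them through $\gamma$ is not reproduced in this text. The names `adaptive_weights` and `combine`, and the fallback to uniform weights when every conditional probability is zero, are assumptions.

```python
def adaptive_weights(cond_probs):
    """Posterior weights w_i with equal priors 1/M over the M selected EGPR models."""
    m = len(cond_probs)
    prior = 1.0 / m
    joint = [prior * c for c in cond_probs]
    total = sum(joint)
    if total == 0.0:
        # ASSUMPTION: fall back to uniform weights if no model explains the sample.
        return [prior] * m
    return [j / total for j in joint]

def combine(outputs, weights):
    """Weighted second-layer prediction from the selected EGPR outputs."""
    return sum(w * y for w, y in zip(weights, outputs))

w = adaptive_weights([1.0, 3.0])   # toy conditional probabilities
y_hat = combine([10.0, 20.0], w)   # toy EGPR outputs
```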
7. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the detailed process of the step (5) is as follows:
when a new sample $(x_{t+1}, y_{t+1})$ arrives, first estimating the posterior probabilities that the new sample belongs to the different local regions, and then updating the EGPR model for the region with the maximum posterior probability; supposing that the new sample point $x_{t+1}$ has the maximum posterior probability $P(LD_k \mid x_{t+1})$ on $LD_k$, the update operation comprises two steps:
① updating the covariance matrix $\Sigma_{GPR}$ of the GPR model of the $k$-th local region by means of a moving window;
② updating the mean vector $\mu_k$, the covariance matrix $\Sigma_k$ and the GMM mixing coefficient $\pi_k$ of the $k$-th local region by incremental update:
$$\pi_k^{(t+1)} = \pi_k^{(t)} + \alpha\,\bigl(P(k \mid x_{t+1}) - \pi_k^{(t)}\bigr) \quad (17)$$
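The mixing-coefficient update of formula (17) and the moving window of step ① can be sketched with the standard library. The claim does not define $\alpha$; it is treated here as a small learning rate, which is an assumption. Applying the update to every component at once, so that the coefficients keep summing to 1, is also an assumption, as are the names `update_mixing` and `make_window`. The updates of $\mu_k$ and $\Sigma_k$ are omitted because their formulas are not reproduced in this text.

```python
from collections import deque

def update_mixing(pi, posteriors, alpha=0.05):
    """Formula (17) applied to each component: pi_k += alpha * (P(k | x_new) - pi_k)."""
    return [p + alpha * (r - p) for p, r in zip(pi, posteriors)]

def make_window(size):
    """Fixed-size moving window: appending beyond `size` discards the oldest sample."""
    return deque(maxlen=size)

# The new sample belongs to component 0 with posterior 1.0 (toy values).
new_pi = update_mixing([0.5, 0.5], [1.0, 0.0], alpha=0.1)

window = make_window(3)
for sample in range(5):
    window.append(sample)   # only the three most recent samples remain
```

The GPR covariance matrix of the $k$-th region would then be rebuilt from the samples currently in that region's window.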
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010348291.9A CN111582567B (en) | 2020-04-28 | 2020-04-28 | Wind power probability prediction method based on hierarchical integration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111582567A (en) | 2020-08-25 |
CN111582567B (en) | 2022-07-01 |
Family
ID=72112613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010348291.9A Active CN111582567B (en) | 2020-04-28 | 2020-04-28 | Wind power probability prediction method based on hierarchical integration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111582567B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN111582567B (en) | 2022-07-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||