CN111582567A - Wind power probability prediction method based on hierarchical integration - Google Patents

Info

Publication number
CN111582567A
Authority
CN
China
Prior art keywords
model
prediction
egpr
subspace
gpr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010348291.9A
Other languages
Chinese (zh)
Other versions
CN111582567B (en)
Inventor
金怀平
石立贤
金怀康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202010348291.9A
Publication of CN111582567A
Application granted
Publication of CN111582567B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a wind power probability prediction method based on hierarchical integration. A set of subspaces is constructed through resampling and partial least squares; on each subspace, GMM clustering yields several local regions, a local GPR model is built for each region, and the local models are fused by a Bayesian inference strategy and a finite mixture mechanism to form a first-layer integration model. A genetic algorithm then selects suitable first-layer integration models, which are combined by selective adaptive integration to obtain a selective hierarchical ensemble Gaussian process regression (SHEGPR) probabilistic prediction model. To counter the performance degradation caused by changes in the characteristics of wind power data, an adaptive updating strategy is introduced so that the prediction model can update itself. The method applies the selective hierarchical ensemble learning framework to ultra-short-term wind power prediction; compared with traditional ensemble learning prediction methods it achieves higher prediction accuracy and stability, and the prediction intervals it generates can provide an effective reference for power scheduling.

Description

Wind power probability prediction method based on hierarchical integration
Technical Field
The invention relates to the technical field of wind power prediction, in particular to a wind power probability prediction method based on hierarchical integration.
Background
Wind energy is a pollution-free and widely distributed renewable energy source, and wind power generation technology has developed rapidly in recent years. However, because wind energy is random and fluctuating, connecting unstable wind power to the grid threatens the safety and stability of the power system and thereby the stable operation of grid equipment. Accurate and efficient wind power prediction therefore promotes reasonable power scheduling, provides a reliable reference for generation planning and shutdown maintenance, and helps keep the system safe, reliable and economical. Wind power prediction plays a crucial role in moving the power generation industry toward environmentally friendly, clean operation, and has great engineering value.
Ensemble learning is a strategy that completes a learning task by constructing and combining multiple sub-models. Because an ensemble can perform better than a single model, ensemble learning is widely applied in wind power prediction. It is well known that accurate and diverse sub-models produce a better-performing ensemble. However, most ensemble learning research on wind power prediction neglects building sub-model diversity from the input data, which hinders obtaining richly diverse sub-models. In addition, because the model is built from historical data, concept drift inevitably occurs as the prediction time grows, so the model should have some adaptive capacity. Adaptation of an ensemble model has two parts: each sub-model has some capacity for adaptive updating, and the weights of the combined sub-models are not fixed but change adaptively. However, adaptation of ensemble models has only been discussed in recent studies.
Finally, because wind energy is strongly random and highly uncertain, traditional single-point prediction cannot estimate the uncertainty of wind well. For the stability of the power system, grid connection of wind power requires an accurate estimate of the fluctuation range of the wind power, for which single-point prediction is far from sufficient. Therefore, a probabilistic modeling method capable of generating prediction intervals should be used for the sub-models.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a wind power probability prediction method based on hierarchical integration, which effectively improves the accuracy and stability of the prediction model.
To solve the technical problem, the invention adopts the following technical scheme: a wind power probability prediction method based on hierarchical integration, comprising the following steps:
Step (1): selecting historical meteorological data $D$ of a wind farm over a period of time as the modeling sample set, and dividing it into a training set $D_{train}$, a validation set $D_{val}$ and a test set $D_{test}$; resampling $D_{train}$ multiple times by bootstrapping to obtain $L$ sub-sample sets $\{(X_1, y_1), \ldots, (X_L, y_L)\}$; selecting the input feature variables of each sub-sample set by partial least squares (PLS) with importance ranking, deleting repeated feature subsets, and constructing $N$ subspaces $\{S_1, \ldots, S_N\}$; and saving the input feature variable indices of the training samples corresponding to the $N$ subspaces;
Step (2): mapping the subspace indices to the training set $D_{train}$ to obtain $N$ subspace training data sets $\{D_{tra,1}, \ldots, D_{tra,N}\}$, and clustering each subspace with a Gaussian mixture model (GMM); supposing that the $i$-th subspace training data set $D_{tra,i}$ yields $z$ local regions $\{LD_1, LD_2, \ldots, LD_z\}$, modeling each local region by Gaussian process regression to obtain the GPR model set $\{GPR_1, GPR_2, \ldots, GPR_z\}$; for a new sample $x^*$, obtaining the prediction output of the first-layer integrated EGPR model on the $i$-th subspace by a Bayesian inference strategy and a finite mixture mechanism; similarly, the $N$ subspaces give the prediction outputs of $N$ first-layer integrated EGPR models $\{EGPR_1, EGPR_2, \ldots, EGPR_N\}$;
Step (3): according to step (2), calculating the prediction accuracy (RMSE) and standard deviation (STD) of the $N$ first-layer integrated EGPR models on the validation set $D_{val}$, taking the weighted combination of RMSE and STD as the optimization objective of model selection, and using a genetic algorithm to select the first-layer integrated EGPR models with good and stable performance; supposing $M$ EGPR models are selected, they serve as the sub-models of the second-layer integration;
Step (4): integrating the second-layer sub-models in an adaptive integration manner to obtain the final SHEGPR model;
Step (5): updating the local regions LD, the GPR models and the GMM models as the prediction time increases.
Further, in step (1), the historical meteorological data $D$ are the meteorological data and operating data of the wind farm over the past 2 to 4 months, $D = \{x, y\}$, with
$$x \in \mathbb{R}^{p \times q}, \qquad y \in \mathbb{R}^{p \times 1},$$
where $p$ is the number of samples and $q = f \times l$, $f$ being the number of input features and $l$ the number of delay variables; $y$ is the predicted power, and the input features comprise the historical wind speed $W_S$, the historical power $P$ and the historical wind direction $W_D$.
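The sample layout described here can be illustrated with a minimal sketch. It assumes that each input row stacks the $l$ most recent values of the three input features (so $q = 3 \times l$) and that the target is the power a fixed number of steps ahead. The function name `build_lagged_samples` and the `horizon` parameter are illustrative and do not come from the patent.

```python
def build_lagged_samples(wind_speed, power, wind_dir, n_lags, horizon=1):
    """Return (X, y); each row of X holds n_lags past values of every feature."""
    series = [wind_speed, power, wind_dir]          # f = 3 input features
    n = len(power)
    X, y = [], []
    # t is the index of the most recent observation available to the model
    for t in range(n_lags - 1, n - horizon):
        row = []
        for s in series:
            row.extend(s[t - n_lags + 1 : t + 1])   # oldest to newest
        X.append(row)
        y.append(power[t + horizon])                # power `horizon` steps ahead
    return X, y


# Toy data: 20 time steps, 8 delay variables, so q = 3 * 8 = 24 inputs per row.
ws = [float(i) for i in range(20)]
p = [2.0 * v for v in ws]
wd = [180.0] * 20
X, y = build_lagged_samples(ws, p, wd, n_lags=8)
```

With 8 delay variables, as in the embodiment, every sample has 24 input columns.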
Further, the specific process of feature selection on the sub-sample sets by partial least squares (PLS) in step (1) is as follows:
① train PLS on each of the $L$ sub-sample sets to obtain the regression coefficient vector $\beta_r$ of the $r$-th sub-sample set, $r \in \{1, \ldots, L\}$, whose $q$ entries represent the importance of the features in the input $X$ to $y$ on that sub-sample set;
② sort the entries of $\beta_r$ from large to small to obtain $\{b_1, b_2, b_3, \ldots, b_q\}$, and test formula (1):
$$\frac{\sum_{j=1}^{i} b_j}{\sum_{j=1}^{q} b_j} \ge th \tag{1}$$
In formula (1), $b_i$ is the $i$-th sorted entry of $\beta_r$ and $th$ is a threshold set to 0.8 to 0.9; if formula (1) holds, the indices corresponding to the first $i$ features are stored;
③ repeat step ② until $L$ subspaces have been selected from the $L$ sub-sample sets, then delete the repeated subspaces to obtain the final $N$ subspaces.
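The selection and de-duplication steps above can be sketched as follows. The sketch takes the PLS regression coefficients as given and does not fit PLS itself. It assumes that formula (1) tests the cumulative share of the sorted importances against the threshold, and that the magnitude of a coefficient serves as its importance. The helper names are illustrative.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap resample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def select_features(coefs, th=0.85):
    """Keep the top-ranked features whose cumulative importance share reaches th."""
    importance = [abs(c) for c in coefs]   # assumption: magnitude as importance
    order = sorted(range(len(coefs)), key=lambda j: importance[j], reverse=True)
    total = sum(importance)
    running, chosen = 0.0, []
    for j in order:
        chosen.append(j)
        running += importance[j]
        if running / total >= th:
            break
    return tuple(sorted(chosen))


def unique_subspaces(coef_sets, th=0.85):
    """Apply select_features to every sub-sample set and drop repeated subspaces."""
    seen, subspaces = set(), []
    for coefs in coef_sets:
        s = select_features(coefs, th)
        if s not in seen:
            seen.add(s)
            subspaces.append(s)
    return subspaces
```

For example, `unique_subspaces([[4, 1, 3, 2], [8, 2, 6, 4], [1, 4, 2, 3]])` keeps two distinct subspaces, because the first two coefficient vectors rank the features identically.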
Further, the process in step (2) of clustering each subspace with the Gaussian mixture model (GMM) and building the first-layer integrated EGPR model is as follows:
on the training set $D_{train}$, let the training data on the $n$-th subspace be
$$X_n \in \mathbb{R}^{p \times c},$$
where $p$ is the number of samples and $c$ is the number of features in the subspace; set the maximum number of clusters $v$ and fit a GMM on the $n$-th subspace; suppose the data of the $n$-th subspace are grouped into $z$ clusters, $z \le v$, i.e. $z$ local regions $\{LD_1, LD_2, \ldots, LD_z\}$; then build a local model for each of the $z$ local regions by Gaussian process regression to obtain $z$ GPR models, denoted $\{GPR_1, GPR_2, \ldots, GPR_z\}$;
In detail, for a new sample $x^*$, the GPR model of the $i$-th local region can be described as
$$\hat{y}_{GPR,i}^{*} = k_{i,*}\, C^{-1} y_i, \qquad \sigma_{GPR,i}^{2} = C(x^*, x^*) - k_{i,*}\, C^{-1} k_{i,*}^{T} \tag{3}$$
In formula (3), $k_{i,*} = [C(x^*, x_{i,1}), \ldots, C(x^*, x_{i,p})]$, $C$ is the $p \times p$ positive definite covariance matrix, $y_i$ is the vector of training outputs of the $i$-th local region, and $\hat{y}_{GPR,i}^{*}$ and $\sigma_{GPR,i}^{2}$ are respectively the predicted mean and variance of the sub-model $GPR_i$;
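A local GPR prediction of this kind can be sketched in pure Python, assuming the standard Gaussian process predictive equations (mean $k C^{-1} y$, variance $C(x^*, x^*) - k C^{-1} k^{T}$). The squared-exponential covariance, its fixed hyperparameters and the small noise term are assumptions made for illustration. The patent does not state which covariance function is used, and in practice the hyperparameters would be fitted.

```python
import math


def sq_exp(a, b, length=1.0, signal=1.0):
    """Squared-exponential covariance between two feature vectors (assumed kernel)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return signal ** 2 * math.exp(-0.5 * d2 / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gpr_predict(X, y, x_star, noise=1e-3):
    """Predictive mean and variance of a GP fitted to one local region."""
    n = len(X)
    C = [[sq_exp(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [sq_exp(x_star, xi) for xi in X]
    mean = sum(ki * ai for ki, ai in zip(k, solve(C, y)))
    var = sq_exp(x_star, x_star) - sum(ki * vi for ki, vi in zip(k, solve(C, k)))
    return mean, max(var, 0.0)   # clip tiny negative values from round-off
```

Near a training point the predicted variance is small, and far from all training data it returns to the prior variance. The prediction intervals rely on this behaviour.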
In the actual prediction process, for a new sample $x^*$, suppose that on the $n$-th subspace the number of local regions obtained by GMM clustering is $z$; the posterior probability that $x^*$ belongs to each local region is then obtained by the Bayesian inference strategy:
$$P(LD_i \mid x^*) = \frac{P(x^* \mid LD_i)\, P(LD_i)}{\sum_{j=1}^{z} P(x^* \mid LD_j)\, P(LD_j)} \tag{4}$$
In formula (4), $i \in \{1, 2, 3, \ldots, z\}$ and $LD_i$ denotes the $i$-th local region; $P(x^* \mid LD_i)$ is the conditional probability and $P(LD_i)$ is the prior probability; the prediction output on the $n$-th subspace is obtained by the finite mixture mechanism as:
$$\hat{y}_{EGPR,n}^{*} = \sum_{i=1}^{z} P(LD_i \mid x^*)\, \hat{y}_{GPR,i}^{*} \tag{5}$$
In formula (5), $\hat{y}_{GPR,i}^{*}$ is the predicted value of the GPR model of the $i$-th local region and $P(LD_i \mid x^*)$ is the posterior probability;
similarly, the mixed variance can be calculated as:
$$\sigma_{EGPR,n}^{2*} = \sum_{i=1}^{z} P(LD_i \mid x^*) \left[ \sigma_{GPR,i}^{2} + \left( \hat{y}_{GPR,i}^{*} - \hat{y}_{EGPR,n}^{*} \right)^{2} \right] \tag{6}$$
In formula (6), $\sigma_{GPR,i}^{2}$ is the predicted variance of the local model $GPR_i$;
then, for a new sample $x^*$ on the $n$-th subspace, the prediction output and predicted variance of the first-layer EGPR model are:
$$\left\{ \hat{y}_{EGPR,n}^{*},\ \sigma_{EGPR,n}^{2*} \right\} \tag{7}$$
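The first-layer fusion can be sketched as below: Bayes' rule turns the per-region likelihoods and priors into posterior weights, and the local GPR means and variances are then mixed. The variance is written in the usual finite-mixture form (weighted local variance plus the weighted squared deviation of each local mean from the mixed mean), which is an assumption about the patent's formula (6). The function names are illustrative.

```python
def region_posteriors(likelihoods, priors):
    """P(LD_i | x*) from P(x* | LD_i) and P(LD_i) by Bayes' rule."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def mix_predictions(post, means, variances):
    """Finite-mixture mean and variance of the local GPR predictions."""
    mean = sum(w * m for w, m in zip(post, means))
    var = sum(w * (v + (m - mean) ** 2)
              for w, m, v in zip(post, means, variances))
    return mean, var


# Two local regions: the sample is three times more likely under region 2.
post = region_posteriors([0.2, 0.6], [0.5, 0.5])
egpr_mean, egpr_var = mix_predictions(post, [10.0, 20.0], [1.0, 1.0])
```

The mixed variance exceeds every local variance whenever the local models disagree, so disagreement between regions widens the prediction interval.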
Further, the detailed process of step (3) is as follows:
① map the subspace indices to the validation set $D_{val}$ to obtain the validation data sets $\{D_{val,1}, \ldots, D_{val,N}\}$ on the $N$ subspaces, and obtain the $N$ EGPR models according to step (2); the prediction outputs of the $N$ EGPR models on the validation set $D_{val}$ are
$$\left\{ \hat{y}_{EGPR,1}, \ldots, \hat{y}_{EGPR,N} \right\}$$
② set the initial population size and the number of iterations of the genetic algorithm, and take the weighted sum of the prediction accuracy and the mixed standard deviation of the EGPR models as the objective function:
$$f_{obj} = \lambda\, RMSE + (1 - \lambda)\, \sigma \tag{8}$$
In formula (8), $\lambda$ is a parameter between 0 and 1, $\sigma$ is the predicted mixed standard deviation, and RMSE is the root mean square error during the optimization;
in more detail: suppose that at some point in the optimization the prediction outputs of the $m$ selected EGPR models are combined by simple averaging to obtain the integrated prediction $\hat{y}_{ens}$, calculated as:
$$\hat{y}_{ens} = \frac{1}{m} \sum_{i=1}^{m} \hat{y}_{EGPR,i} \tag{9}$$
In formula (9), $m$ is the number of EGPR models currently selected; the RMSE against the true values is then:
$$RMSE = \sqrt{ \frac{1}{N_{val}} \sum_{j=1}^{N_{val}} \left( \hat{y}_{ens,j} - y_j \right)^{2} } \tag{10}$$
In formula (10), $N_{val}$ is the number of samples in the validation set $D_{val}$;
$\min\{f_{obj}\}$ is found through multiple iterations, the models that perform well on the validation set $D_{val}$ are selected, and their indices are stored.
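The selection objective can be sketched for a binary mask over the first-layer models. Two points are assumptions rather than statements of the patent: $\sigma$ is taken as the average predictive standard deviation of the selected models over the validation set, and the genetic algorithm is replaced by exhaustive search over all masks, which is only feasible for a handful of models. All names are illustrative.

```python
import itertools
import math


def objective(mask, preds, stds, y_true, lam=0.5):
    """f_obj = lam * RMSE + (1 - lam) * sigma for the models flagged in mask.

    preds[k][j] and stds[k][j] are the prediction and predictive standard
    deviation of model k on validation sample j.
    """
    chosen = [k for k, bit in enumerate(mask) if bit]
    if not chosen:
        return float("inf")                       # an empty ensemble is invalid
    m, n = len(chosen), len(y_true)
    avg = [sum(preds[k][j] for k in chosen) / m for j in range(n)]
    rmse = math.sqrt(sum((a - t) ** 2 for a, t in zip(avg, y_true)) / n)
    sigma = sum(stds[k][j] for k in chosen for j in range(n)) / (m * n)
    return lam * rmse + (1 - lam) * sigma


def best_subset(preds, stds, y_true, lam=0.5):
    """Exhaustive stand-in for the genetic algorithm (small N only)."""
    masks = itertools.product([0, 1], repeat=len(preds))
    return min(masks, key=lambda mask: objective(mask, preds, stds, y_true, lam))


preds = [[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]]
stds = [[0.1, 0.1], [0.1, 0.1], [0.5, 0.5]]
y_val = [1.0, 2.0]
selected = best_subset(preds, stds, y_val)
```

In this toy case the third model is as accurate as the first but less certain, so only the first is kept. A genetic algorithm would search the same binary encoding without enumerating every mask.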
Further, the detailed process of step (4) is as follows:
suppose the number of EGPR models selected in step (3) is $M$; when a new test sample $x^*$ is predicted, the second-layer integrated prediction output $\hat{y}_{SHEGPR}^{*}$ and predicted variance $\sigma_{SHEGPR}^{2*}$ are:
$$\hat{y}_{SHEGPR}^{*} = \sum_{i=1}^{M} w_i\, \hat{y}_{EGPR,i}^{*} \tag{11}$$
[Formula (12), the second-layer predicted variance, is an image in the source and is not reproduced here.]
where $\hat{y}_{EGPR,i}^{*}$ is the output of the $i$-th selected EGPR model and $w_i$ is the integration weight, given by:
$$w_i = P(EGPR_i \mid x^*) = \frac{P(x^* \mid EGPR_i)\, P(EGPR_i)}{\sum_{j=1}^{M} P(x^* \mid EGPR_j)\, P(EGPR_j)} \tag{13}$$
where $P(x^* \mid EGPR_i)$ is the conditional probability and $P(EGPR_i)$ is the prior probability; without prior knowledge, the priors $P(EGPR_i)$ of all models are assumed equal, with value $1/M$, and $P(x^* \mid EGPR_i)$ can be expressed as:
[Formula (14), the conditional probability, is an image in the source and is not reproduced here.]
where $\gamma$ is a parameter that controls the weights.
Further, the detailed process of step (5) is as follows:
when a new sample $(x_{t+1}, y_{t+1})$ arrives, first estimate the posterior probabilities that it belongs to the different local regions, then update the EGPR model for the region with the largest posterior probability; suppose the new sample point $x_{t+1}$ has the maximum posterior probability $P(LD_k \mid x_{t+1})$ in $LD_k$; the update then comprises two steps:
① update the covariance matrix $\Sigma_{GPR}$ of the GPR model of the $k$-th local region by a moving window;
② incrementally update the mean vector $\mu_k$, the covariance matrix $\Sigma_k$ and the mixing coefficient $\pi_k$ of the GMM for the $k$-th local region:
[Formula (15), the update of $\mu_k$, is an image in the source and is not reproduced here.]
[Formula (16), the update of $\Sigma_k$, is an image in the source and is not reproduced here.]
$$\pi_k^{(t+1)} = \pi_k^{(t)} + \alpha \left( P(k \mid x_{t+1}) - \pi_k^{(t)} \right) \tag{17}$$
where the definition of $\alpha$ is an image in the source and is not reproduced here, and $T$ is the number of samples used in the mixture update.
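The parts of the update that are fully stated in the text can be sketched as follows: choosing the region with the largest posterior, keeping a fixed-length moving window of samples for that region's GPR model, and applying the mixing-coefficient recursion of formula (17). The learning rate `alpha` is passed in as a plain number because its definition is not available here. Applying the recursion to every component, so that the coefficients keep summing to one, is an assumption. The window length of 3 is arbitrary.

```python
from collections import deque


def most_likely_region(posteriors):
    """Index k of the local region with the largest posterior P(LD_k | x_{t+1})."""
    return max(range(len(posteriors)), key=lambda i: posteriors[i])


def update_mixing_coefficients(pis, posteriors, alpha):
    """pi_k <- pi_k + alpha * (P(k | x_{t+1}) - pi_k), applied to every component."""
    return [pi + alpha * (p - pi) for pi, p in zip(pis, posteriors)]


# Moving window for one region: the oldest sample drops out when a new one arrives.
window = deque(maxlen=3)
for sample in [(0.1, 1.0), (0.2, 1.1), (0.3, 1.2), (0.4, 1.3)]:
    window.append(sample)

posteriors = [0.9, 0.1]
k = most_likely_region(posteriors)
new_pis = update_mixing_coefficients([0.5, 0.5], posteriors, alpha=0.1)
```

After each window change, the GPR covariance matrix of region `k` would be rebuilt from the samples currently in the window.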
The invention has the following characteristics. First, data diversity is generated by perturbing both the sample information and the feature information: diverse subspaces are built through feature selection, and each subspace is clustered with a GMM before modeling, which speeds up training and markedly improves the performance of the mixed model on each subspace. Next, the sub-models obtained from the first-layer integration are pruned by optimization, which improves the performance of the second-layer integration model and reduces the computational complexity of adaptive updating. Finally, the second-layer sub-models are fused by adaptive weighting, so that the final SHEGPR model has some adaptive capacity. Because GPR is used as the base sub-model, the integrated SHEGPR model not only predicts well but also provides prediction intervals.
Compared with the prior art, the invention has the following beneficial effects: a selective hierarchical ensemble learning framework is applied to ultra-short-term wind power prediction; compared with traditional ensemble learning prediction methods, the method has higher prediction accuracy and stability, and the prediction intervals it generates can provide an effective reference for power scheduling.
Drawings
FIG. 1 is a flowchart of the prediction of the SHEGPR wind power;
FIG. 2 is a three-dimensional map of the mapping relationship between the power of the wind farm and the wind speed and direction;
FIG. 3 is a GPR and EGPR comparison diagram on a 4h wind power prediction subspace;
FIG. 4 is a wind power prediction trend graph with prediction intervals of 15min, 1h, 2h and 4 h;
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1
As shown in FIG. 1, this embodiment takes the wind power data of a wind farm from the National Renewable Energy Laboratory (NREL) in the United States as an example; historical wind speed, historical power and historical wind direction data are the inputs, the number of delay variables is set to 8, and power is the output of the SHEGPR model.
Step 1: select the historical wind power, wind speed and wind direction data of a wind farm from the National Renewable Energy Laboratory (NREL) in the United States for January to March, with a time resolution of 15 minutes (96 data points per day), and divide the data in chronological order into a training set $D_{train}$ (3000 samples), a validation set $D_{val}$ (1000 samples) and a test set $D_{test}$ (4000 samples). The mapping between the power of the wind farm and the wind speed and wind direction is shown in FIG. 2.
Step 2: resample $D_{train}$ multiple times by bootstrapping to obtain $L$ sub-sample sets $\{(X_1, y_1), \ldots, (X_L, y_L)\}$; rank the importance of the sample features by partial least squares (PLS), repeat $R$ times and delete the repeated subspaces to obtain the $N$ subspaces $\{S_1, \ldots, S_N\}$ of $D_{train}$; and save the feature indices of the training samples corresponding to the $N$ subspaces.
The process of feature selection on the sub-sample sets by partial least squares (PLS) is as follows:
① apply Z-score normalization to the training set $D_{train}$ and train PLS on each of the $L$ subsets, with the number of PLS principal components determined by cross-validation, to obtain the regression coefficient vector $\beta_r$ of the $r$-th subset, $r \in \{1, \ldots, L\}$, whose $q$ entries represent the importance of the features in the input $X$ to $y$ on that subset.
② sort the entries of $\beta_r$ from large to small to obtain $\{b_1, b_2, b_3, \ldots, b_q\}$, and test
$$\frac{\sum_{j=1}^{i} b_j}{\sum_{j=1}^{q} b_j} \ge th \tag{1}$$
where $b_i$ is the $i$-th sorted entry of $\beta_r$ and the threshold $th$ is set to 0.85. If the above formula holds, the indices corresponding to the first $i$ features are saved.
③ repeat step ② until $L$ subspaces have been selected from the $L$ sample subsets, then delete the repeated subspaces to obtain the final $N$ subspaces.
Step 3: build the GMM and GPR models on each subspace.
① map the subspace indices to $D_{train}$ to obtain the $N$ subspace training data sets $\{D_{tra,1}, \ldots, D_{tra,N}\}$, set the maximum number of GMM clusters to $v$, fit a Gaussian mixture model (GMM) to each subspace training data set, and store the GMM of each subspace to obtain $N$ GMM models. Suppose that clustering the $i$-th subspace training data set $D_{tra,i}$ yields $z$ local regions $\{LD_1, LD_2, \ldots, LD_z\}$. The GMM is defined as follows: for any $x \in D_{tra,i}$,
$$p(x \mid \Theta) = \sum_{k=1}^{c} \lambda_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),$$
where $\Theta = \{\lambda_k, \mu_k, \Sigma_k\}_{k=1}^{c}$ are the model parameters of the GMM, $c$ is the number of Gaussian components, $\lambda_k$ is the weight of the $k$-th Gaussian component, and $\mu_k$ and $\Sigma_k$ are respectively the mean and covariance matrix of the $k$-th Gaussian component; the parameters of the GMM are found by the expectation-maximization algorithm.
② for the $z$ local regions $\{LD_1, LD_2, \ldots, LD_z\}$ of $D_{tra,i}$, model each LD by Gaussian process regression (GPR) to obtain the GPR model set, denoted $\{GPR_1, GPR_2, \ldots, GPR_z\}$. In detail, for a new sample $x^*$, the GPR model of the $i$-th local region can be described as:
$$\hat{y}_{GPR,i}^{*} = k_{i,*}\, C^{-1} y_i, \qquad \sigma_{GPR,i}^{2} = C(x^*, x^*) - k_{i,*}\, C^{-1} k_{i,*}^{T} \tag{3}$$
where $k_{i,*} = [C(x^*, x_{i,1}), \ldots, C(x^*, x_{i,p})]$, $C$ is the $p \times p$ positive definite covariance matrix, $y_i$ is the vector of training outputs of the $i$-th local region, and $\hat{y}_{GPR,i}^{*}$ and $\sigma_{GPR,i}^{2}$ are respectively the predicted mean and variance of the sub-model $GPR_i$.
③ repeat step ② $N$ times to build a GPR model set for each of the $N$ subspace training data sets.
Step 4: map the subspace indices to $D_{val}$ to obtain the $N$ subspace validation data sets $\{D_{val,1}, \ldots, D_{val,N}\}$, and apply Z-score normalization to the data on each subspace. Then, from the $N$ GMM models and GPR model sets built in step 3, obtain by the Bayesian inference strategy and the finite mixture mechanism the prediction outputs and variances of the $N$ EGPR models on $\{D_{val,1}, \ldots, D_{val,N}\}$, denoted
$$\left\{ \hat{y}_{EGPR,n},\ \sigma_{EGPR,n}^{2} \right\}_{n=1}^{N}$$
Specifically, one EGPR model is built as follows:
for a new sample $x^*$, suppose that on the $n$-th subspace the number of local regions obtained by GMM clustering is $z$; the posterior probability that $x^*$ belongs to each local region is then obtained by the Bayesian inference strategy:
$$P(LD_i \mid x^*) = \frac{P(x^* \mid LD_i)\, P(LD_i)}{\sum_{j=1}^{z} P(x^* \mid LD_j)\, P(LD_j)} \tag{4}$$
where $i \in \{1, 2, 3, \ldots, z\}$ and $LD_i$ denotes the $i$-th local region. $P(x^* \mid LD_i)$ is the conditional probability and $P(LD_i)$ is the prior probability.
The prediction output on the $n$-th subspace is obtained by the finite mixture mechanism as:
$$\hat{y}_{EGPR,n}^{*} = \sum_{i=1}^{z} P(LD_i \mid x^*)\, \hat{y}_{GPR,i}^{*} \tag{5}$$
where $\hat{y}_{GPR,i}^{*}$ is the predicted value of the GPR model of the $i$-th local region and $P(LD_i \mid x^*)$ is the posterior probability.
Similarly, the mixed variance can be calculated as:
$$\sigma_{EGPR,n}^{2*} = \sum_{i=1}^{z} P(LD_i \mid x^*) \left[ \sigma_{GPR,i}^{2} + \left( \hat{y}_{GPR,i}^{*} - \hat{y}_{EGPR,n}^{*} \right)^{2} \right] \tag{6}$$
where $\sigma_{GPR,i}^{2}$ is the predicted variance of the sub-model $GPR_i$.
Then, for a new sample $x^*$ on the $n$-th subspace, the prediction output and predicted variance of the first-layer EGPR model are:
$$\left\{ \hat{y}_{EGPR,n}^{*},\ \sigma_{EGPR,n}^{2*} \right\} \tag{7}$$
Step 5: this step constructs an optimization problem to select the EGPR models for the second-layer integration. The first-layer integration yields $N$ EGPR models $\{EGPR_1, EGPR_2, \ldots, EGPR_N\}$; the indices of all EGPR models are binary coded, where 1 means the model is selected and 0 means it is not. The weighted sum of the prediction accuracy and the mixed standard deviation of the EGPR models on the validation set, obtained in step 4, is taken as the objective function, and a genetic algorithm (GA) is used as the optimizer; $\min\{f_{obj}\}$ is found through multiple iterations, the models with good performance and diversity are selected and their indices are stored. Suppose $M$ EGPR models are finally selected for the second-layer integration.
The optimization objective is constructed as follows:
$$f_{obj} = \lambda\, RMSE + (1 - \lambda)\, \sigma \tag{8}$$
where $\lambda$ is a parameter between 0 and 1, $\sigma$ is the predicted mixed standard deviation, and RMSE is the root mean square error during the optimization, as detailed below:
suppose that at some point in the optimization the prediction outputs of the $m$ selected EGPR models are combined by simple averaging to obtain the integrated prediction $\hat{y}_{ens}$, calculated as:
$$\hat{y}_{ens} = \frac{1}{m} \sum_{i=1}^{m} \hat{y}_{EGPR,i} \tag{9}$$
where $m$ is the number of EGPR models currently selected; the RMSE against the true values is:
$$RMSE = \sqrt{ \frac{1}{N_{val}} \sum_{j=1}^{N_{val}} \left( \hat{y}_{ens,j} - y_j \right)^{2} } \tag{10}$$
where $N_{val}$ is the number of samples in the validation set $D_{val}$.
Step 6: in the online prediction phase, a sample $x^*$ of the test set $D_{test}$ is predicted as follows:
① map the subspace indices to $x^*$ to obtain the $N$ subspace samples $\{x_1^*, \ldots, x_N^*\}$, and apply Z-score normalization to the data on each subspace; as in step 4, use the $N$ GMM models and GPR model sets built in step 3, the Bayesian inference strategy and the finite mixture mechanism to obtain the prediction outputs and variances of the $N$ EGPR models, $\{\hat{y}_{EGPR,n}^{*}, \sigma_{EGPR,n}^{2*}\}_{n=1}^{N}$.
② combine the $M$ EGPR models selected in step 5 in the second layer by variance-based integration; the prediction output $\hat{y}_{SHEGPR}^{*}$ and predicted variance $\sigma_{SHEGPR}^{2*}$ are:
$$\hat{y}_{SHEGPR}^{*} = \sum_{i=1}^{M} w_i\, \hat{y}_{EGPR,i}^{*} \tag{11}$$
[Formula (12), the second-layer predicted variance, is an image in the source and is not reproduced here.]
where $\hat{y}_{EGPR,i}^{*}$ is the output of the $i$-th selected EGPR model and $w_i$ is the integration weight, given by:
$$w_i = P(EGPR_i \mid x^*) = \frac{P(x^* \mid EGPR_i)\, P(EGPR_i)}{\sum_{j=1}^{M} P(x^* \mid EGPR_j)\, P(EGPR_j)} \tag{13}$$
where $P(x^* \mid EGPR_i)$ is the conditional probability and $P(EGPR_i)$ is the prior probability. Without prior knowledge, the priors $P(EGPR_i)$ of all models are assumed equal, with value $1/M$, and $P(x^* \mid EGPR_i)$ can be expressed as:
[Formula (14), the conditional probability, is an image in the source and is not reproduced here.]
where $\gamma$ is a parameter that controls the weights.
③ finally, for a test sample, the prediction interval at the 95% confidence level is
$$\left[ \hat{y}_{SHEGPR}^{*} - 1.96\, \sigma_{SHEGPR}^{*},\ \ \hat{y}_{SHEGPR}^{*} + 1.96\, \sigma_{SHEGPR}^{*} \right]$$
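A minimal sketch of the interval computation, assuming the predictive distribution is Gaussian so that the 95% interval is the mean plus or minus 1.96 standard deviations. The function name is illustrative.

```python
import math


def prediction_interval(mean, variance, z=1.96):
    """Two-sided Gaussian prediction interval; z = 1.96 gives about 95% coverage."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


# Predicted power 10.0 with predictive variance 4.0 (standard deviation 2.0).
lower, upper = prediction_interval(10.0, 4.0)
```

Wind power cannot be negative or exceed the installed capacity, so in practice the bounds might also be clipped to that range. The patent does not mention such a step.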
Step 7: as the prediction time grows, the performance of the model inevitably degrades, so adaptive updating of the model becomes necessary. When a new sample $(x_{t+1}, y_{t+1})$ arrives, first estimate the posterior probabilities that it belongs to the different local regions, then update the EGPR model for the region with the largest posterior probability. Suppose the new sample point $x_{t+1}$ has the maximum posterior probability $P(LD_k \mid x_{t+1})$ in $LD_k$; the update then comprises two steps:
① update the covariance matrix $\Sigma_{GPR}$ of the GPR model of the $k$-th local region by a moving window.
② incrementally update the mean vector $\mu_k$, the covariance matrix $\Sigma_k$ and the mixing coefficient $\pi_k$ of the GMM for the $k$-th local region:
[Formula (15), the update of $\mu_k$, is an image in the source and is not reproduced here.]
[Formula (16), the update of $\Sigma_k$, is an image in the source and is not reproduced here.]
$$\pi_k^{(t+1)} = \pi_k^{(t)} + \alpha \left( P(k \mid x_{t+1}) - \pi_k^{(t)} \right) \tag{17}$$
where the definition of $\alpha$ is an image in the source and is not reproduced here, and $T$ is the number of samples used in the mixture update.
This embodiment evaluates the prediction performance with the root mean square error RMSE and the coefficient of determination $R^2$, defined as:
$$RMSE = \sqrt{ \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \left( y_i - \hat{y}_i \right)^{2} }$$
$$R^2 = 1 - \frac{ \sum_{i=1}^{N_{test}} \left( y_i - \hat{y}_i \right)^{2} }{ \sum_{i=1}^{N_{test}} \left( y_i - \bar{y} \right)^{2} }$$
where $N_{test}$ is the number of test samples, $y_i$ and $\hat{y}_i$ are the actual value and the predicted value of the $i$-th sample, and $\bar{y}$ is the average of the actual values.
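The two evaluation metrics can be computed as in the following sketch, which uses the standard definitions of RMSE and the coefficient of determination. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
```

A lower RMSE and an $R^2$ closer to 1 both indicate a better fit. $R^2$ is undefined when all actual values are identical, because the denominator is then zero.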
The invention compares the following methods: (1) a global GPR model; (2) the persistence method; (3) the Gaussian process regression model based on selective hierarchical integration (SHEGPR); (4) the Gaussian process regression model based on selective hierarchical integration with adaptive updating, SHEGPR (with update) (Example 1). The comparison results are shown in Table 1 and Table 2.
Table 1. Comparison of the prediction performance of the different methods 2 hours ahead
[Table 1 is an image in the source and is not reproduced here.]
Table 2. Comparison of the prediction performance of the different methods 4 hours ahead
[Table 2 is an image in the source and is not reproduced here.]
As can be seen from Tables 1 and 2, the method proposed in this example greatly improves on the global GPR model and the persistence method; both RMSE and $R^2$ improve significantly, which demonstrates the effectiveness and generality of the invention. The global GPR model performs only comparably to the persistence method, because GPR is trained on historical samples and its performance on the test set degrades due to concept drift, whereas the persistence method outputs the most recent sample as the next prediction and therefore always uses the latest sample information. Therefore, to predict the power more accurately, adaptive updating of the model is a key part of wind power prediction.
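The persistence baseline discussed above can be sketched in a few lines: the forecast for a future time step is the most recent observed power. At the 15-minute resolution of the embodiment, a 2-hour horizon corresponds to 8 steps and a 4-hour horizon to 16 steps. The function name and the pairing of forecasts with targets are illustrative choices.

```python
def persistence_forecast(power, horizon):
    """Pair each forecast with its target: the forecast for t + horizon is power[t].

    horizon must be at least 1 and smaller than len(power).
    """
    forecasts = power[:-horizon]
    targets = power[horizon:]
    return forecasts, targets


series = [1.0, 2.0, 3.0, 4.0, 5.0]
forecasts, targets = persistence_forecast(series, horizon=2)
```

The baseline has no parameters to go stale, which is why a model trained once on historical data and never updated may do no better than it under concept drift.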
As can be seen from FIG. 3, the EGPR model obtained by clustering a subspace with GMM and then modeling with GPR performs markedly differently from a single GPR model on the same subspace. The approach proposed by the invention, building sub-models per cluster after GMM clustering, is therefore faster and performs better. FIG. 4 shows, from top to bottom, the 15 min, 1 h, 2 h and 4 h wind power prediction trend curves of the SHEGPR (with update) method; the predicted values fit the actual values well, and, as expected, the shorter the prediction horizon, the better the fit. Notably, the method not only predicts the trend of the wind power but also provides prediction intervals that quantify its uncertainty, which supports stable scheduling of the power system. As can also be seen from FIG. 4, the shorter the prediction horizon, the narrower the 95% confidence interval, which indicates that the interval prediction is more effective and more beneficial to stable scheduling of the power system.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (7)

1. A wind power probability prediction method based on hierarchical integration is characterized by comprising the following steps:
step (1): selecting historical meteorological data D of a wind farm over a period of time as the modeling sample set, and dividing the sample set into a training set D_train, a validation set D_val and a test set D_test; resampling D_train multiple times using the Bootstrapping method to obtain L sub-sample sets {(X_1, y_1), ..., (X_L, y_L)}; selecting the input feature variables of each sub-sample set using the partial least squares (PLS) method and ranking them by importance; deleting identical subsets to construct N subspaces {S_1, ..., S_N}; and saving the input feature variable indices of the training set samples corresponding to the N subspaces;
step (2): mapping the subspace indices onto the training set D_train to obtain N subspace training data sets {D_{tra,1}, ..., D_{tra,N}}; clustering each subspace with a Gaussian mixture model (GMM); supposing that z local regions {LD_1, LD_2, ..., LD_z} are obtained on the i-th subspace training data set D_{tra,i}, modeling each local region with Gaussian process regression to obtain a set of GPR models {GPR_1, GPR_2, ..., GPR_z}; for a new sample x*, obtaining the prediction output of the first-layer integrated EGPR model on the i-th subspace using a Bayesian inference strategy and a finite mixture mechanism; likewise, for the N subspaces, obtaining the prediction outputs of the N first-layer integrated EGPR models {EGPR_1, EGPR_2, ..., EGPR_N};
step (3): according to step (2), computing on the validation set D_val the prediction accuracy RMSE and the standard deviation STD of the N first-layer integrated EGPR models, taking their weighted combination as the optimization objective for model selection, and using a genetic algorithm to select the first-layer integrated EGPR models with good performance and stability; supposing that M first-layer integrated EGPR models are selected, these serve as the sub-models of the second-layer integration;
step (4): integrating the sub-models of the second-layer integration in an adaptive manner to obtain the final SHEGPR model;
step (5): updating the local regions LD, the GPR models and the GMM model as the prediction time advances.
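The Bootstrapping resampling in step (1) can be sketched as below. This is a minimal illustration under stated assumptions: each sub-sample set is drawn with replacement and has the same size as the training set, and the function name and seed parameter are hypothetical.

```python
import random

def bootstrap_subsets(X, y, n_subsets, seed=0):
    """Draw n_subsets resamples of (X, y) with replacement, keeping rows paired."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_subsets):
        idx = [rng.randrange(n) for _ in range(n)]   # sample row indices with replacement
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

# Toy data: three samples with one feature each.
subsets = bootstrap_subsets([[1], [2], [3]], [10, 20, 30], n_subsets=4, seed=1)
```

Each of the L sub-sample sets would then go through feature selection independently, which is what makes the resulting subspaces diverse.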
2. The wind power probability prediction method based on hierarchical integration as claimed in claim 1, wherein in step (1) the historical meteorological data D are the meteorological data and operating data of the wind farm over the past 2-4 months, and D = {X, y},
Figure FDA0002470974870000011
wherein p is the number of samples and q = f × l, where f is the number of input features and l is the number of delay variables; y is the predicted power; the input features include historical wind speed W_S, historical power P and historical wind direction W_D.
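The relation q = f × l can be made concrete by building the lagged input matrix. The sketch below is an assumption about the layout (features ordered alphabetically by name, lags stored oldest first, target taken `horizon` steps ahead); the function name, dictionary keys and values are hypothetical.

```python
def build_lagged_samples(features, power, n_lags, horizon=1):
    """Build rows with n_lags past values of every feature (q = f * n_lags columns)
    and the power `horizon` steps ahead as the target."""
    names = sorted(features)
    X, y = [], []
    for t in range(n_lags - 1, len(power) - horizon):
        row = []
        for name in names:
            row.extend(features[name][t - n_lags + 1 : t + 1])   # lags, oldest first
        X.append(row)
        y.append(power[t + horizon])
    return X, y

# Made-up series for wind speed (WS), power (P) and wind direction (WD).
P = [10, 20, 30, 40, 50]
features = {"WS": [1, 2, 3, 4, 5], "P": P, "WD": [0, 90, 180, 270, 0]}
X, y = build_lagged_samples(features, P, n_lags=2, horizon=1)
```

With f = 3 features and l = 2 lags, every row of X has q = 6 columns.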
3. The wind power probability prediction method based on hierarchical integration according to claim 2, wherein the specific process of performing feature selection on the sub-sample set by Partial Least Squares (PLS) in the step (1) is as follows:
① training PLS on each of the L sub-sample sets to obtain the regression coefficients β_r on that sub-sample set, wherein
Figure FDA0002470974870000021
r ∈ {1, ..., L}, which represents the importance of the variables in the input X to y on this sub-sample set;
② sorting the entries of β_r in descending order to obtain {b_1, b_2, b_3, ..., b_q}, and judging according to formula (1):
Figure FDA0002470974870000022
in formula (1), b_i is the i-th entry of β_r, and the threshold th is set to 0.8-0.9; if formula (1) holds, the indices corresponding to the first i features are stored;
③ repeating step ② until L subspaces have been selected from the L sample subsets, and deleting the repeated subspaces to obtain the final N subspaces.
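The selection and de-duplication in steps ② and ③ can be sketched as below. Formula (1) is not legible in this text, so the criterion used here is an assumption: features are ranked by the absolute value of their coefficient and kept until their cumulative share of the total reaches th. The coefficients are taken as given (in practice they would come from a fitted PLS model), and both function names are hypothetical.

```python
def select_features(coefficients, th=0.85):
    """Keep the top-ranked features whose cumulative importance share reaches th.
    ASSUMPTION: importance = |coefficient| and formula (1) is a cumulative ratio."""
    importance = [abs(b) for b in coefficients]
    order = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    total = sum(importance)
    cumulative, selected = 0.0, []
    for j in order:
        selected.append(j)
        cumulative += importance[j]
        if cumulative / total >= th:
            break
    return sorted(selected)

def unique_subspaces(index_sets):
    """Drop repeated feature-index sets, keeping the first occurrence of each."""
    seen, unique = set(), []
    for s in index_sets:
        key = tuple(s)
        if key not in seen:
            seen.add(key)
            unique.append(list(s))
    return unique

chosen = select_features([5.0, 1.0, 3.0, 1.0], th=0.75)
subspaces = unique_subspaces([[0, 2], [1], [0, 2]])
```

Here L = 3 candidate index sets collapse to N = 2 distinct subspaces.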
4. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the process of clustering the subspace by using the Gaussian mixture model GMM and establishing the first-layer integrated EGPR model in the step (2) is as follows:
on the training set D_train, let the n-th subspace be
Figure FDA0002470974870000023
wherein p is the number of samples and c is the number of features in the subspace; setting the maximum number of clusters to v, a GMM model is established on the n-th subspace; supposing the data of the n-th subspace are clustered into z classes, where z ≤ v, z local regions {LD_1, LD_2, ..., LD_z} are obtained; local models are then built on the z local regions by Gaussian process regression, yielding z GPR models denoted {GPR_1, GPR_2, ..., GPR_z};
for a new sample x*, the i-th local-region GPR model can be described as
Figure FDA0002470974870000024
in formula (3), k_{i,*} = [C(x*, x_{i,1}), ..., C(x*, x_{i,p})], C denotes the p × p positive definite covariance matrix,
Figure FDA0002470974870000025
and
Figure FDA0002470974870000026
are, respectively, the predicted mean and variance of sub-model GPR_i;
in the actual prediction process, for a new sample x*, suppose that on the n-th subspace,
Figure FDA0002470974870000027
the number of local regions after GMM clustering is z; the posterior probability of x* is then obtained by a Bayesian inference strategy:
Figure FDA0002470974870000031
in formula (4), i ∈ {1, 2, 3, ..., z}, and LD_i denotes the i-th local region; P(x* | LD_i) is the conditional probability and P(LD_i) is the prior probability; then, through the finite mixture mechanism, the prediction output on the n-th subspace is obtained as:
Figure FDA0002470974870000032
in formula (5),
Figure FDA0002470974870000033
is the predicted value of the i-th local-region GPR model, and P(LD_i | x*) is the joint posterior probability;
similarly, the mixture variance can be calculated as:
Figure FDA0002470974870000034
in formula (6),
Figure FDA0002470974870000035
is the predicted variance of sub-model GPR_i;
then, for a new sample x*, the prediction output and the prediction variance of the first-layer EGPR model on the n-th subspace are:
Figure FDA0002470974870000036
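The Bayesian weighting and finite mixture described in this claim can be sketched as follows. The posterior follows Bayes' rule as stated in the text. The mixture variance formula is not legible here, so the sketch assumes the standard law-of-total-variance form for a mixture of Gaussians, which may differ from formula (6). The likelihoods P(x* | LD_i) are taken as inputs, and the function names are hypothetical.

```python
def posteriors(likelihoods, priors):
    """Bayes' rule: P(LD_i | x*) is proportional to P(x* | LD_i) * P(LD_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def mixture_prediction(post, means, variances):
    """Finite mixture of the local GPR predictions.
    ASSUMPTION: variance = sum_i P_i * (var_i + (mean_i - mixture_mean)^2)."""
    mean = sum(p * m for p, m in zip(post, means))
    var = sum(p * (v + (m - mean) ** 2) for p, m, v in zip(post, means, variances))
    return mean, var

# Two local regions with made-up likelihoods, equal priors and local GPR outputs.
post = posteriors([0.2, 0.6], [0.5, 0.5])
egpr_mean, egpr_var = mixture_prediction(post, [10.0, 20.0], [1.0, 1.0])
```

The second term of the assumed variance grows when the local models disagree, so the EGPR interval widens for samples that sit between local regions.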
5. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the detailed process of the step (3) is as follows:
① mapping the subspace indices onto the validation set D_val to obtain the validation data sets {D_{val,1}, ..., D_{val,N}} on the N subspaces; obtaining N EGPR models according to step (2); and, on the validation set D_val, obtaining the prediction outputs of the N EGPR models as
Figure FDA0002470974870000037
setting the initial population size and the number of iterations of the genetic algorithm, and taking the weighted sum of the prediction accuracy of the EGPR models and the mixture standard deviation as the objective function:
f_obj = λ·RMSE + (1 - λ)·σ    (8)
in formula (8), λ is a parameter between 0 and 1, σ is the predicted mixture standard deviation, and RMSE is the root mean square error during the optimization;
the detailed process is as follows: suppose that, at some point in the optimization, the prediction outputs of the m selected EGPR models are integrated by simple averaging to obtain the integrated prediction result
Figure FDA0002470974870000041
It is calculated as follows:
Figure FDA0002470974870000042
in formula (9), m is the number of EGPR models currently selected; the RMSE with respect to the true values is then:
Figure FDA0002470974870000043
in formula (10), N_val is the number of samples in the validation set D_val;
min{f_obj} is found through multiple iterations, the models that perform well on the validation set D_val are selected, and their indices are stored.
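The objective of formula (8) and the model selection can be sketched as below. Two departures from the text are made on purpose and should be read as assumptions: an exhaustive search over subsets stands in for the genetic algorithm (practical only for small N), and σ is taken as the average predictive standard deviation of the selected models. All names are hypothetical.

```python
import math
from itertools import combinations

def ensemble_objective(selected, preds, stds, y_true, lam=0.5):
    """f_obj = lam * RMSE + (1 - lam) * sigma for a simple-average ensemble.
    ASSUMPTION: sigma = mean predictive std of the selected models."""
    m, n = len(selected), len(y_true)
    avg = [sum(preds[i][t] for i in selected) / m for t in range(n)]
    rmse = math.sqrt(sum((avg[t] - y_true[t]) ** 2 for t in range(n)) / n)
    sigma = sum(stds[i][t] for i in selected for t in range(n)) / (m * n)
    return lam * rmse + (1.0 - lam) * sigma

def select_models(preds, stds, y_true, lam=0.5):
    """Exhaustive search over non-empty subsets (stand-in for the genetic algorithm)."""
    best = None
    for size in range(1, len(preds) + 1):
        for subset in combinations(range(len(preds)), size):
            score = ensemble_objective(subset, preds, stds, y_true, lam)
            if best is None or score < best[0]:
                best = (score, subset)
    return best[1], best[0]

# Made-up validation targets and per-model predictions / standard deviations.
y_val = [1.0, 2.0]
preds = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
stds = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
best_subset, best_score = select_models(preds, stds, y_val)
```

A genetic algorithm would explore the same search space with a binary chromosome per model, evaluating `ensemble_objective` as the fitness.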
6. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the detailed process of the step (4) is as follows:
assuming that M EGPR models are selected according to step (3), when a new test sample x* is predicted, the second-layer integrated prediction output
Figure FDA0002470974870000044
and the prediction variance
Figure FDA0002470974870000045
are:
Figure FDA0002470974870000046
Figure FDA0002470974870000047
wherein
Figure FDA0002470974870000048
is the output of the i-th selected EGPR model and w_i is the integration weight, given by:
Figure FDA0002470974870000049
wherein
Figure FDA00024709748700000410
is the conditional probability and
Figure FDA00024709748700000411
is the prior probability; in the absence of prior knowledge, the prior probabilities
Figure FDA00024709748700000412
of all models are assumed equal, with value
Figure FDA00024709748700000413
Figure FDA00024709748700000414
can be expressed as:
Figure FDA00024709748700000415
wherein γ is a parameter that controls the weights.
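The Bayesian weighting with equal priors can be sketched as below. The expression for the conditional probability (the one involving γ) is not legible in this text, so the likelihoods are simply taken as inputs; treating the second-layer output as the weighted sum of the sub-model outputs is likewise an assumption. The function names are hypothetical.

```python
def adaptive_weights(likelihoods):
    """Bayes' rule with equal priors 1/M: weights are the normalized likelihoods."""
    prior = 1.0 / len(likelihoods)
    joint = [l * prior for l in likelihoods]
    total = sum(joint)
    return [j / total for j in joint]

def second_layer_prediction(weights, means):
    """ASSUMPTION: the second-layer output is the weighted sum of sub-model outputs."""
    return sum(w * m for w, m in zip(weights, means))

# Made-up likelihoods and EGPR outputs for M = 2 selected models.
weights = adaptive_weights([1.0, 3.0])
shegpr_mean = second_layer_prediction(weights, [10.0, 20.0])
```

Because the weights are recomputed for every test sample, models that explain the current sample better contribute more to the final output.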
7. The wind power probability prediction method based on hierarchical integration according to claim 1, wherein the detailed process of the step (5) is as follows:
when a new sample (x_{t+1}, y_{t+1}) arrives, the posterior probabilities that the new sample belongs to the different local regions are first estimated, and the EGPR model is then updated in the local region with the maximum posterior probability; supposing the new sample point x_{t+1} has the maximum posterior probability P(LD_k | x_{t+1}) in LD_k, the update operation comprises two steps:
① updating the covariance matrix Σ_GPR in the GPR model of the k-th local region by a moving window;
② updating the mean vector μ_k, the covariance matrix Σ_k and the mixing coefficient π_k of the GMM for the k-th local region by incremental updating:
Figure FDA0002470974870000051
Figure FDA0002470974870000052
π_k^(t+1) = π_k^(t) + α (P(k | x_{t+1}) - π_k^(t))    (17)
wherein α is
Figure FDA0002470974870000053
and T is the number of samples used in the mixture update.
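The two update steps can be sketched as below. Only equation (17) is implemented, since the update formulas for the mean and covariance and the expression for α are not legible in this text; α is therefore passed in as a parameter. Applying (17) to every component, which keeps the coefficients summing to 1 when the posteriors do, is an assumption, as are the window length and all names.

```python
from collections import deque

def update_mixing_coefficients(pi, posteriors, alpha):
    """Equation (17) for each component: pi_k <- pi_k + alpha * (P(k | x) - pi_k)."""
    return [p + alpha * (r - p) for p, r in zip(pi, posteriors)]

# Moving window over the samples of one local region (length 3 is hypothetical).
window = deque(maxlen=3)
for sample in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]:
    window.append(sample)   # once full, the oldest sample is discarded automatically

# Made-up mixing coefficients and posteriors for a new sample.
pi = update_mixing_coefficients([0.5, 0.5], [0.9, 0.1], alpha=0.1)
```

After the window changes, the local GPR covariance matrix would be rebuilt or updated from the samples currently in `window`.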
CN202010348291.9A 2020-04-28 2020-04-28 Wind power probability prediction method based on hierarchical integration Active CN111582567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010348291.9A CN111582567B (en) 2020-04-28 2020-04-28 Wind power probability prediction method based on hierarchical integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010348291.9A CN111582567B (en) 2020-04-28 2020-04-28 Wind power probability prediction method based on hierarchical integration

Publications (2)

Publication Number Publication Date
CN111582567A true CN111582567A (en) 2020-08-25
CN111582567B CN111582567B (en) 2022-07-01

Family

ID=72112613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010348291.9A Active CN111582567B (en) 2020-04-28 2020-04-28 Wind power probability prediction method based on hierarchical integration

Country Status (1)

Country Link
CN (1) CN111582567B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060282306A1 (en) * 2005-06-10 2006-12-14 Unicru, Inc. Employee selection via adaptive assessment
CN103793887A (en) * 2014-02-17 2014-05-14 华北电力大学 Short-term electrical load on-line predicting method based on self-adaptation enhancing algorithm
CN106505631A (en) * 2016-10-29 2017-03-15 塞壬智能科技(北京)有限公司 Intelligent wind power wind power prediction system
CN107451101A (en) * 2017-07-21 2017-12-08 江南大学 It is a kind of to be layered integrated Gaussian process recurrence soft-measuring modeling method
CN109145949A (en) * 2018-07-19 2019-01-04 山东师范大学 Non-intrusive electrical load monitoring and decomposition method and system based on integrated study
CN110046378A (en) * 2019-02-28 2019-07-23 昆明理工大学 A kind of integrated Gaussian process recurrence soft-measuring modeling method of the selective layering based on Evolutionary multiobjective optimization
US20190303783A1 (en) * 2016-06-09 2019-10-03 Hitachi, Ltd. Data prediction system and data prediction method

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
X. CHEN 等: "Semi-Supervised Feature Selection via Sparse Rescaled Linear Square Regression", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 *
刘蕾 等: "基于特征子集相关度和偏最小二乘法的特征选择策略", 《江西中医药大学学报》 *
吕朋朋 等: "基于BIM技术的弱电***集成控制平台设计", 《自动化与仪器仪表》 *
张伟等: "基于实时学习的高斯过程回归多模型融合建模", 《信息与控制》 *
田明光等: "基于K均值聚类及高斯过程回归集成的铅酸电池荷电状态预测", 《软件》 *
石立贤 等: "基于局部学习和多目标优化的选择性异质集成超短期风电功率预测方法", 《电网技术》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012766A (en) * 2021-04-27 2021-06-22 昆明理工大学 Self-adaptive soft measurement modeling method based on online selective integration
CN113012766B (en) * 2021-04-27 2022-07-19 昆明理工大学 Self-adaptive soft measurement modeling method based on online selective integration
CN115017671A (en) * 2021-12-31 2022-09-06 昆明理工大学 Industrial process soft measurement modeling method and system based on data flow online clustering analysis

Also Published As

Publication number Publication date
CN111582567B (en) 2022-07-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant