CN112418545A - Load characteristic and model fusion based electricity sales amount prediction method and system - Google Patents

Info

Publication number
CN112418545A
Authority
CN
China
Prior art keywords
electricity sales
model
predicting
load characteristics
sales amount
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011399227.XA
Other languages
Chinese (zh)
Inventor
李键
李凯
唐军
吴佼
张迎平
肖克江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Hunan Electric Power Co Ltd
Original Assignee
State Grid Hunan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Hunan Electric Power Co Ltd
Priority to CN202011399227.XA
Publication of CN112418545A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Accounting & Taxation (AREA)
  • Tourism & Hospitality (AREA)
  • Finance (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for predicting electricity sales amount based on load characteristics and model fusion, comprising the following steps: acquiring and preprocessing modeling data; selecting and extracting characteristics relevant to the predicted target electricity sales amount, and screening them by correlation coefficient, the relevant characteristics comprising power load characteristics, time characteristics and weather factor characteristics; decomposing the power load characteristics into a trend term, a seasonal term and a residual term, predicting the three terms respectively, and reconstructing them to obtain estimated power load characteristics; inputting the estimated power load characteristics, the time characteristics and the weather factor characteristics into short-term and medium-term prediction learner models, optimizing the model parameters by grid search, and fusing the learner models to obtain short-term and medium-term predicted values of the electricity sales amount; and tuning a Prophet model on the power load characteristics, the time characteristics and the weather characteristics to obtain a long-term predicted value of the electricity sales amount. The invention addresses the problem of large prediction errors in the electricity sales amount after a special event occurs.

Description

Load characteristic and model fusion based electricity sales amount prediction method and system
Technical Field
The invention relates to the technical field of electric power, in particular to a method and a system for predicting electric power sale amount based on load characteristics and model fusion.
Background
In existing methods for predicting the electricity sales amount, the volume of monthly data is very small, so prediction is usually carried out either with a time series method, such as the Holt-Winters algorithm, the ARIMA algorithm or the X11 decomposition method, or with a simple machine learning model, such as linear regression, SVM or the perceptron (strong learner models are unsuitable when there are too few samples). Machine learning models are not well suited to the task, because the variables needed to predict the electricity sales amount must be known for the forecast period, yet the important characteristics related to electricity consumption (for example, industrial added value for industrial electricity consumption, or residents' disposable income for residential electricity consumption) are real-time characteristics: their future values cannot be supplied in advance, and predicting them introduces large errors. Time series models have a different weakness: after a special event, electricity consumption tends to rebound, and a time series model cannot capture this trend. The existing methods therefore produce large errors when predicting the electricity sales amount after a special event occurs.
Disclosure of Invention
(I) technical problem to be solved
Based on the above problems, the invention provides a method and a system for predicting electricity sales amount based on load characteristics and model fusion, which can solve the problem of large prediction error of electricity sales amount after a special event occurs.
(II) technical scheme
Based on the above technical problem, the invention provides a method for predicting electricity sales amount based on load characteristics and model fusion, which comprises the following steps:
S1, modeling data acquisition and preprocessing, wherein the acquired data comprise monthly electricity data and daily electricity data;
S2, feature engineering: selecting and extracting characteristics relevant to the predicted target electricity sales amount from the modeling data, and screening the relevant characteristics by correlation coefficient; the relevant characteristics comprise power load characteristics, time characteristics and weather factor characteristics;
S3, decomposing the power load characteristics into three decomposition terms, namely a trend term, a seasonal term and a residual term, predicting the three decomposition terms respectively, and reconstructing them to obtain estimated power load characteristics;
S4, inputting the time characteristics, the weather factor characteristics and the estimated power load characteristics into short-term and medium-term prediction learner models, and optimizing the parameters of the learner models by grid search;
S5, fusing the learner models to obtain the short-term and medium-term predicted values of the electricity sales amount.
Further, the method comprises the following steps:
S6, tuning a Prophet model on the power load characteristics, the time characteristics and the weather characteristics to obtain a long-term predicted value of the electricity sales amount.
Further, the time characteristics in step S2 include: day_of_year (day), day_of_month (day), day_of_week (day), year, month, week_total_id (week), week_year_id (week), if_holiday (whether the day is a holiday), month_year_holiday (month), month_total_id (month), and weedalay_num (weekend).
Further, the weather factor characteristic in the step S2 is obtained by:
S2.2.1, setting a low temperature threshold and a high temperature threshold: drawing a fitting curve of daily electricity sales against daily average air temperature, and setting the thresholds according to the trend of the fitting curve;
S2.2.2, calculating a heating coefficient and a cooling coefficient according to the low temperature threshold and the high temperature threshold:
$hd_i = \max(T_{low} - T_i, 0)$,
$cd_i = \max(T_i - T_{high}, 0)$,
wherein $hd_i$ and $cd_i$ respectively represent the heating coefficient and the cooling coefficient of the i-th day; $T_{low}$ and $T_{high}$ are respectively the low temperature threshold and the high temperature threshold; $T_i$ is the daily average temperature of the i-th day;
S2.2.3, obtaining the temperature coefficient of the day from the heating coefficient and the cooling coefficient:
$HCD_i = \alpha \cdot hd_i + cd_i$
Figure BDA0002816465230000031
wherein the electricity sales amount of the m-th month of year X is denoted $S_{m:m+1}$; the electricity sales amount peaks in months m and n, so the ratio of peak electricity sales is $S_{m:m+1}/S_{n:n+1}$, and $\alpha$ is the average of this ratio over years X and X+1.
Further, step S3 includes the following steps:
S3.1, decomposing the power load characteristics by X11 into three decomposition terms, namely a trend term, a seasonal term and a residual term;
S3.2, predicting the three decomposition terms respectively;
S3.3, adding the predicted data of the three decomposition terms according to the X11 additive model to obtain the estimated power load characteristics.
Further, the prediction method of step S3.2 includes:
S3.2.1, trend term prediction: obtaining a trend term prediction result by using a Prophet regression model;
S3.2.2, seasonal term periodic filling: the seasonal term is strongly regular and is filled by repeating its period;
S3.2.3, residual term noise filling: the residual term is subjected to the Ljung-Box (LB) test and, if judged to be a white noise sequence, is filled with white noise using a random Gaussian model.
Further, the learner models for short and medium term prediction in step S4 include LightGBM, XGBoost, and GBDT.
Further, the grid optimization method in step S4 is to perform grid search on the parameters by using GridSearchCV to select an optimal parameter combination.
Further, the fusion method in step S5 is Stacking fusion of the models using Mlxtend.
The invention also discloses a power selling amount forecasting system based on load characteristics and model fusion, which comprises the following components:
at least one processor; and at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method for predicting the electricity sales amount based on the load characteristics and the model fusion.
(III) advantageous effects
The technical scheme of the invention has the following advantages:
(1) the method recognizes that the power load is a strongly correlated real-time characteristic for electricity sales prediction; it decomposes and predicts the power load characteristics, reconstructs them into estimated power load characteristics that serve as a characteristic of future electricity sales, and inputs them into the prediction model together with the other relevant characteristics, thereby overcoming the inability of a time series model to capture the change in the electricity sales trend caused by special events and reducing prediction errors;
(2) the method combines internal data (daily electricity sales, monthly electricity sales and power load data) with external data such as weather factors and holiday data, selects the relevant characteristics influencing electricity sales reasonably, and obtains an accurate predicted value of the electricity sales amount on that basis;
(3) parameter optimization and model fusion further improve the prediction accuracy of the model.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a schematic flow chart of a method for predicting electricity sales amount based on load characteristics and model fusion according to an embodiment of the present invention;
FIG. 2 is a block flow diagram of a method for predicting electricity sales amount based on load characteristics and model fusion according to an embodiment of the present invention;
FIG. 3 is a scatter diagram of daily average air temperature against daily electricity sales (total electricity sales) according to an embodiment of the present invention;
FIG. 4 shows the Spearman correlation coefficients between all features and electricity sales according to an embodiment of the present invention;
FIG. 5 is a power load curve according to an embodiment of the present invention;
FIG. 6 is a graph comparing the electricity sales curve and the power load curve according to an embodiment of the present invention;
FIG. 7 is an X11 decomposition graph of the power load according to an embodiment of the present invention;
FIG. 8 is a prediction curve of the power load trend term according to an embodiment of the present invention;
FIG. 9 is a filling curve of the power load seasonal term according to an embodiment of the present invention;
FIG. 10 is a histogram of the power load residual term according to an embodiment of the present invention;
FIG. 11 is a filling curve of the power load residual term according to an embodiment of the present invention;
FIG. 12 is a predicted power load curve according to an embodiment of the present invention;
FIG. 13 shows the optimal parameters of each prediction model according to an embodiment of the present invention;
FIG. 14 is a flow chart of Stacking model fusion according to an embodiment of the present invention;
FIG. 15 is a graph of the parameter components of the Prophet model according to an embodiment of the present invention;
FIG. 16 shows manually set changepoints of the Prophet model according to an embodiment of the present invention;
FIG. 17 is a long-term electricity prediction curve of the Prophet model according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The embodiment of the invention discloses a method for predicting electricity sales amount based on load characteristics and model fusion, which comprises the following steps as shown in figures 1 and 2:
S1, modeling data acquisition and preprocessing:
The collected modeling data comprise 114 monthly electricity records from 2011 to June 2020 and 1461 daily electricity records from 2018 to July 8, 2020; preprocessing consists of identifying and handling abnormal values in the modeling data. The daily electricity data are used for electricity prediction, and the monthly electricity data are combined with the relevant characteristics for factor analysis.
S2, feature engineering: selecting and extracting characteristics relevant to the predicted target electricity sales amount from the modeling data, and screening the relevant characteristics by correlation coefficient;
S2.1, preliminarily extracting the characteristics relevant to the predicted target electricity sales amount, including power load characteristics, time characteristics and weather factor characteristics: the invention extracts 12 characteristics, namely purchase_elecqt (power load), day_of_year (day), day_of_month (day), day_of_week (day), year, month, week_total_id (week), week_year_id (week), if_holiday (whether the day is a holiday), month_year_id (month), month_total_id (month), and month_year_num (week).
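As a minimal sketch, most of the calendar characteristics listed above could be derived from a date with Python's standard library. The meaning assigned to each name, the `START` origin used for the cumulative indices, and the `HOLIDAYS` set are assumptions made for illustration; the source does not define them.

```python
from datetime import date

# Hypothetical holiday calendar; a real one would come from an official schedule.
HOLIDAYS = {date(2020, 1, 1), date(2020, 5, 1)}
# Assumed origin for the cumulative week and month indices.
START = date(2018, 1, 1)

def time_features(d: date) -> dict:
    """Calendar features for one day (names follow the text, semantics are assumed)."""
    _, iso_week, iso_weekday = d.isocalendar()
    return {
        "day_of_year": d.timetuple().tm_yday,
        "day_of_month": d.day,
        "day_of_week": iso_weekday,               # 1 = Monday ... 7 = Sunday
        "year": d.year,
        "month": d.month,
        "week_year_id": iso_week,                 # ISO week number within the year
        "week_total_id": (d - START).days // 7,   # whole weeks elapsed since START
        "month_total_id": (d.year - START.year) * 12 + (d.month - START.month),
        "if_holiday": int(d in HOLIDAYS),
    }

features = time_features(date(2020, 5, 1))
```

Each daily record would get one such dictionary, which can then be joined with the load and weather columns.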
Besides date and load, the invention also considers weather factors. Weather is one of the main factors influencing the electricity sales amount, and how the weather data are processed is crucial to the prediction result. The invention preprocesses the weather data by setting a comfortable temperature interval.
People generally take no cooling or heating measures at comfortable temperatures. The invention therefore sets a low-temperature threshold and a high-temperature threshold for each industry; heating or cooling measures arise only when the actual temperature is below the low-temperature threshold or above the high-temperature threshold. A heating coefficient, a cooling coefficient and an overall temperature coefficient are defined accordingly, and their quantized values are fed into the prediction model as factors influencing monthly electricity sales.
S2.2.1 setting low and high temperature thresholds
The daily electricity sales and daily average air temperature of Hunan Province in 2019 are first compiled; a scatter diagram of daily average air temperature against daily electricity sales and its fitting curve are then drawn; finally, the curve trend is analyzed to set the thresholds. FIG. 3 is the scatter diagram of daily average air temperature against daily electricity sales; according to the analysis, the low-temperature threshold and the high-temperature threshold for residential electricity consumption are 15 ℃ and 25 ℃ respectively.
S2.2.2, calculating a heating coefficient and a cooling coefficient according to the low temperature threshold and the high temperature threshold:
The heating coefficient measures how far the daily average air temperature falls below the low-temperature threshold: the larger the heating coefficient, the lower the air temperature and the greater the heating demand. The cooling coefficient measures how far the daily average air temperature rises above the high-temperature threshold: the larger the cooling coefficient, the higher the air temperature and the greater the cooling demand. When the air temperature is above the low-temperature threshold and below the high-temperature threshold, it lies in the comfortable interval and no heating or cooling measure is taken. The daily heating and cooling coefficients are calculated as follows:
$hd_i = \max(T_{low} - T_i, 0)$
$cd_i = \max(T_i - T_{high}, 0)$
wherein $hd_i$ and $cd_i$ respectively represent the heating coefficient and the cooling coefficient of the i-th day; $T_{low}$ and $T_{high}$ are respectively the low-temperature threshold and the high-temperature threshold, whose values are determined for each industry from its scatter diagram of daily average air temperature against daily electricity sales and the corresponding fitting curve; $T_i$ is the daily average temperature of the i-th day.
S2.2.3, obtaining the temperature coefficient of the day from the heating coefficient and the cooling coefficient:
After the daily heating coefficient and cooling coefficient are obtained, the corresponding temperature coefficient is calculated:
$HCD_i = \alpha \cdot hd_i + cd_i$
wherein $hd_i$ and $cd_i$ are respectively the heating coefficient and the cooling coefficient of the i-th day, and:
Figure BDA0002816465230000081
The electricity sales amount of the m-th month of year X is denoted $S_{m:m+1}$. The electricity sales amount generally peaks in July and January, so the ratio of peak electricity sales is $S_{7:8}/S_{1:2}$, and $\alpha$ is the average of this ratio over 2018 and 2019.
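The coefficient formulas above can be sketched in a few lines of Python. The default thresholds of 15 ℃ and 25 ℃ are the residential values given in the text; the function names and the sales figures passed to `alpha_from_peaks` are made up for illustration.

```python
def heating_coefficient(t_avg, t_low):
    """hd_i = max(T_low - T_i, 0)."""
    return max(t_low - t_avg, 0.0)

def cooling_coefficient(t_avg, t_high):
    """cd_i = max(T_i - T_high, 0)."""
    return max(t_avg - t_high, 0.0)

def temperature_coefficient(t_avg, alpha, t_low=15.0, t_high=25.0):
    """HCD_i = alpha * hd_i + cd_i."""
    return alpha * heating_coefficient(t_avg, t_low) + cooling_coefficient(t_avg, t_high)

def alpha_from_peaks(peak_pairs):
    """Average of the yearly ratios S_{7:8} / S_{1:2}.

    peak_pairs holds one (july_sales, january_sales) tuple per year.
    """
    ratios = [july / january for july, january in peak_pairs]
    return sum(ratios) / len(ratios)

# Illustrative (invented) peak sales for two years.
alpha = alpha_from_peaks([(120.0, 100.0), (130.0, 100.0)])
cold_day = temperature_coefficient(10.0, alpha)   # below the low threshold
mild_day = temperature_coefficient(20.0, alpha)   # inside the comfortable interval
hot_day = temperature_coefficient(30.0, alpha)    # above the high threshold
```

A day inside the comfortable interval contributes zero, so only cold and hot days influence the feature.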
S2.3, optimizing the relevant characteristics through a relevant coefficient:
After the relevant feature information is extracted and collated, the features are screened using either the Spearman correlation coefficient (the Pearson coefficient is only suitable for linear correlation) or the chi-square test. The invention adopts the Spearman correlation coefficient; the coefficients are shown in FIG. 4, and the features whose correlation coefficient is greater than 0.05 are selected as the features of the final data set.
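The screening step can be illustrated with a small pure-Python sketch of the Spearman coefficient (the Pearson correlation of the ranks, with tied values sharing their mean rank). The 0.05 threshold is taken from the text; the helper names and the toy columns are assumptions.

```python
def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)   # undefined for constant columns

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def select_features(columns, target, threshold=0.05):
    """Keep the features whose Spearman coefficient with the target exceeds the threshold."""
    return [name for name, col in columns.items() if spearman(col, target) > threshold]

# Toy data: 'load' rises monotonically with sales, 'inverse' falls.
sales = [10, 12, 15, 20, 30]
columns = {"load": [1, 2, 4, 8, 16], "inverse": [5, 4, 3, 2, 1]}
selected = select_features(columns, sales)
```

Because the coefficient is computed on ranks, a monotone but nonlinear relation such as `load` above still scores close to 1.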
S3, decomposing the power load characteristics into three decomposition terms, namely a trend term, a seasonal term and a residual term, predicting the three decomposition terms respectively, and reconstructing them to obtain estimated power load characteristics;
S3.1, decomposing the power load characteristics by X11 into three decomposition terms, namely a trend term, a seasonal term and a residual term;
The power load characteristic is a real-time characteristic: to use it for predicting the electricity sales amount, the power load itself must first be predicted, and the predicted power load is then used as a characteristic of future electricity sales. The power load curve is shown in FIG. 5. The periodic trend of the load curve is stable, so its prediction error should not be large. After normalization, the electricity sales curve and the power load curve are found to be very similar (FIG. 6). The conclusion is that the power load is a strongly correlated real-time characteristic for electricity sales prediction and can be predicted accurately.
The power load characteristic is therefore decomposed by X11, using the X11 additive decomposition model: $Y = T + S + R$
The power load data sequence Y is split into a trend term T, a seasonal term S and a residual term R for the subsequent prediction; the decomposition result is shown in FIG. 7.
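To make the additive model Y = T + S + R concrete, here is a minimal sketch of a classical additive decomposition (centred moving-average trend plus per-position seasonal means). It is a simplified stand-in chosen for illustration, not the X11 procedure itself, which applies further iterated filters; the function name and the toy series are assumptions.

```python
def decompose_additive(y, period):
    """Classical additive decomposition: y[t] = trend[t] + seasonal[t] + residual[t]."""
    n = len(y)
    half = period // 2
    trend = [None] * n                      # undefined at the series edges
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2:                      # odd period: plain centred mean
            trend[t] = sum(window) / period
        else:                               # even period: half weights at both ends
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Seasonal term: mean detrended value per position in the cycle, centred on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]
    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual

# Toy daily series: linear trend plus a weekly pattern that sums to zero.
pattern = [3, 1, -2, -4, 0, 1, 1]
y = [100 + 2 * t + pattern[t % 7] for t in range(28)]
trend, seasonal, residual = decompose_additive(y, period=7)
```

On this noise-free toy series the residual is zero wherever the trend is defined, and the three terms add back to the original values.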
S3.2, predicting the three decomposition terms respectively:
S3.2.1, trend term prediction: obtaining a trend term prediction result by using a Prophet regression model;
The trend term is predicted with the Prophet model proposed by Facebook. Compared with traditional time series models, Prophet has the following advantages: it is more flexible to operate; missing values need not be handled; fitting is very fast, which makes interactive exploration possible; and the parameters of the model are easy to interpret, so an analyst can set some of them from experience.
The specific tuning parameters of the Prophet model are explained under long-term prediction; only the default parameters are used when predicting the power load. The Prophet prediction curve is shown in FIG. 8; the MAPE of the prediction is 6.379% for April, 5.929% for May and 0.761% for June. As the curve and the errors show, the influence of special events (abnormal drops) is removed, so the overall trend term is more accurate.
S3.2.2, seasonal term periodic filling: the seasonal term is strongly regular and is filled by repeating its period;
The seasonal term is the periodic component obtained from the decomposition of the curve, so it does not need to be predicted by a complex method and only needs to be filled periodically. The filling result is shown in FIG. 9.
S3.2.3, residual term noise filling: performing the Ljung-Box (LB) test on the residual term and, if it is judged to be a white noise sequence, filling it with white noise using a random Gaussian model;
According to the probability distribution in its histogram (FIG. 10) and the LB test, the residual term left by the decomposition conforms to a standard Gaussian noise model. The residual term therefore cannot be predicted (it is random data); it is filled simply by continuing to generate Gaussian noise (white noise). The filling result is shown in FIG. 11.
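The white-noise check and the noise filling can be sketched as follows, assuming that "LB" refers to the Ljung-Box test. The statistic is Q = n(n+2) Σ ρ_k² / (n−k) over the first h lags, compared with a chi-square critical value; the constant below is the 95% value for 10 degrees of freedom and so only applies to `lags=10`. Function names and the seed are illustrative.

```python
import random

def ljung_box_q(x, lags):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k), k = 1..lags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

CHI2_95_DF10 = 18.307   # 95% chi-square critical value, 10 degrees of freedom

def looks_like_white_noise(x, lags=10, critical=CHI2_95_DF10):
    """True when the test does not reject the white-noise hypothesis."""
    return ljung_box_q(x, lags) < critical

def gaussian_fill(residual, horizon, seed=0):
    """Draw `horizon` Gaussian values with the residual's sample mean and std."""
    n = len(residual)
    mean = sum(residual) / n
    std = (sum((v - mean) ** 2 for v in residual) / (n - 1)) ** 0.5
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(horizon)]

# A strictly alternating series is strongly autocorrelated, so the test rejects it.
alternating = [1.0, -1.0] * 50
q_alternating = ljung_box_q(alternating, lags=10)
filled = gaussian_fill(alternating, horizon=30)
```

In the workflow described above, `gaussian_fill` would only be used after `looks_like_white_noise` returns true for the residual term.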
S3.3, adding the predicted data of the three decomposition terms to obtain an estimated power load characteristic:
After the trend term, the seasonal term and the residual term have been predicted and filled respectively, the predicted data are recombined according to the X11 additive model: predicted power load = trend term + seasonal term + residual term. The reconstruction result is shown in FIG. 12. The MAPE is 5.93% for April, 3.38% for May and 0.761% for June; judged by the error for each month of the second quarter, the load estimate is reasonable.
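A short sketch of the filling and reconstruction step, together with the MAPE metric used to evaluate it, might look like this. The function names and the numbers are illustrative only; the trend forecast would come from Prophet and the residual forecast from the Gaussian filling described earlier.

```python
def seasonal_fill(seasonal, period, horizon):
    """Extend the seasonal term by repeating its last full cycle."""
    last_cycle = seasonal[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

def reconstruct(trend_forecast, seasonal_forecast, residual_forecast):
    """Additive model: estimated load = trend + seasonal + residual."""
    return [t + s + r for t, s, r in
            zip(trend_forecast, seasonal_forecast, residual_forecast)]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

season_future = seasonal_fill([1, 2, 3, 1, 2, 3], period=3, horizon=4)
load_future = reconstruct([10, 11, 12, 13], season_future, [0, 0, 0, 0])
error = mape([100.0, 200.0], [110.0, 180.0])
```

Monthly MAPE values such as those quoted above would be obtained by applying `mape` to the actual and reconstructed load of each month.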
S4, inputting the time characteristics, the weather factor characteristics and the estimated power load characteristics into short-term and medium-term prediction learner models, and optimizing the parameters of the learner models by grid search:
The estimated power load is added as a new characteristic to the short-term and medium-term prediction of the electricity sales amount. Three prediction models are tentatively selected: LightGBM, XGBoost and GBDT. These strong learner models for short-term and medium-term electricity sales prediction contain a large number of parameters, so GridSearchCV is used to grid-search the parameters and select the optimal parameter combination.
GBDT stands for Gradient Boosting Decision Tree, an iterative decision tree algorithm composed of multiple decision trees whose outputs are summed to give the final answer. The algorithm first estimates an initial value γ, which is a tree with only a root node; it then computes the negative gradient of the loss function at the current model and takes it as an estimate of the residual; it fits a regression tree to this approximate residual, which determines the leaf node regions; it then estimates the value of each leaf node region by line search so as to minimize the loss function, and updates the regression tree; finally it outputs the final model.
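The boosting loop described above can be sketched for squared-error loss with one-split regression trees (stumps) on a single feature. This is a toy illustration under those assumptions, not the library implementations the source uses; for squared error the negative gradient equals the ordinary residual, and the leaf means already minimize the loss, so no separate line search is needed.

```python
def fit_stump(x, residuals):
    """Best single-split regression tree on a 1-D feature (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda xi: lv if xi <= threshold else rv

def fit_gbdt(x, y, n_trees=50, learning_rate=0.1):
    init = sum(y) / len(y)                 # initial constant model (root-only tree)
    trees = []
    pred = [init] * len(y)
    for _ in range(n_trees):
        # Negative gradient of 1/2 (y - F)^2 with respect to F is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)     # fit a tree to the pseudo-residuals
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for xi, pi in zip(x, pred)]
    return lambda xi: init + learning_rate * sum(tree(xi) for tree in trees)

x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
model = fit_gbdt(x, y)

def mse(predict):
    return sum((predict(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)

baseline_mse = mse(lambda xi: sum(y) / len(y))   # constant-mean predictor
boosted_mse = mse(model)
```

Each round shrinks the remaining residual a little, which is why the summed trees fit the data better than the initial constant.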
The XGBoost algorithm is an efficient implementation of GBDT and improves on it in three main respects. The first is the algorithm itself: for weak learner selection, GBDT supports only decision trees, whereas XGBoost can also use other weak learners directly; for the loss function, a regularization component is added to the loss itself; and for optimization, GBDT expands the error part of the loss function only to the negative gradient (first-order Taylor expansion), whereas XGBoost uses a second-order Taylor expansion, which is more accurate. The second is running efficiency: the construction of each weak learner, such as the search for suitable split features and split values when building a decision tree, is parallelized. Before the parallel search, all feature values are sorted and grouped into blocks to make the parallel search easier; a suitable block size is chosen so that the CPU cache accelerates reads, and the blocks can be stored on multiple hard disks to increase IO speed. The third is robustness: for features with missing values, the handling is decided by enumerating, at the current node, whether all missing values go to the left subtree or to the right subtree. The algorithm also adds L1 and L2 regularization terms, which helps prevent overfitting and improves generalization.
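For reference, the second-order expansion and the regularization mentioned above are usually written as follows. The notation ($g_i$, $h_i$, $f_t$, $\Omega$, $\gamma$, $\lambda$, $\alpha$, $T$, $w_j$) follows the common XGBoost formulation and is not taken from the source; note that $\gamma$ and $\alpha$ here are regularization weights, unrelated to the initial value γ and the temperature weight α used elsewhere in this document.

```latex
% Objective at boosting round t, expanded to second order around the
% previous prediction \hat{y}_i^{(t-1)} (constant terms dropped):
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}

% Regularization on a tree with T leaves and leaf weights w_j
% (L2 term with weight \lambda, L1 term with weight \alpha):
\Omega(f) = \gamma\, T + \tfrac{1}{2}\, \lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|
```

Plain GBDT corresponds to keeping only the first-order term $g_i$ and omitting $\Omega$.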
LightGBM (Light Gradient Boosting Machine) is a framework implementing the GBDT algorithm; it supports efficient parallel training and offers faster training speed, lower memory consumption and better accuracy.
These three models are widely used in data prediction. They integrate weak learners (such as decision trees) by Boosting and similar techniques to increase prediction accuracy, so they usually have many parameters (LightGBM especially). The invention selects the optimal model parameters with the GridSearchCV grid search method, applied one parameter at a time: the parameter that currently has the largest influence on the model is tuned until it is optimal; the parameter with the next largest influence is then tuned, and so on until all parameters have been tuned. The drawback of this approach is that it may reach a local optimum rather than the global optimum, but it saves time and effort. The parameters determined by the grid search are shown in FIG. 13; they are the optimal parameters of each model on the training set.
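The difference between an exhaustive grid search and the one-parameter-at-a-time tuning described above can be sketched in plain Python. The `toy_score` function is a hypothetical stand-in for a cross-validated score (which GridSearchCV would compute in practice), and the parameter names and grid values are illustrative.

```python
from itertools import product

def exhaustive_grid_search(score, grid):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

def one_at_a_time_search(score, grid, order, start):
    """Tune parameters sequentially, most influential first, fixing each once tuned."""
    params = dict(start)
    for name in order:
        params[name] = max(grid[name], key=lambda v: score({**params, name: v}))
    return params, score(params)

def toy_score(p):
    # Hypothetical validation score, highest at max_depth=5, learning_rate=0.1.
    return -(p["max_depth"] - 5) ** 2 - (p["learning_rate"] * 10 - 1) ** 2

grid = {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.2]}
full_best, full_score = exhaustive_grid_search(toy_score, grid)
greedy_best, greedy_score = one_at_a_time_search(
    toy_score, grid, order=["max_depth", "learning_rate"],
    start={"max_depth": 3, "learning_rate": 0.05},
)
```

Here the sequential search needs 6 evaluations instead of 9 and finds the same optimum because the toy score has no interaction between the parameters; when parameters interact, the sequential search may stop at a local optimum, which is the drawback noted in the text.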
S5, fusing the learner models: Mlxtend is used to perform Stacking on the models to obtain the short- and medium-term predicted values of the electricity sales;
Model fusion can be done in various ways, such as Voting, Averaging and Stacking. This project uses the Stacking method; its flow chart is shown in FIG. 14. Briefly, Stacking designs several individual models, runs K-fold cross validation on them to output prediction results, merges the prediction results output by each model into new features, and trains the base model on these features, thereby further improving model accuracy.
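The Stacking flow can be sketched in plain Python under simplifying assumptions. The two toy learners (`fit_mean`, `fit_line`), the interleaved fold assignment and the closed-form blend weight used as the second-level model are all illustrative choices; they stand in for the XGBoost, LightGBM and GBDT learners and for Mlxtend, which this sketch does not use.

```python
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Model]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Toy learner 1: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Toy learner 2: one-dimensional least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def out_of_fold(fitter: Fitter, xs: Sequence[float], ys: Sequence[float], k: int) -> List[float]:
    """K-fold cross validation: each sample is predicted by a model that never saw it."""
    n = len(xs)
    oof = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fitter([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                oof[i] = model(xs[i])
    return oof


def fit_stack(xs: Sequence[float], ys: Sequence[float], k: int = 3) -> Model:
    """Train a second-level blend on the out-of-fold predictions of both learners."""
    a = out_of_fold(fit_mean, xs, ys, k)
    b = out_of_fold(fit_line, xs, ys, k)
    # Least-squares weight w for the blend w * a + (1 - w) * b.
    den = sum((p - q) ** 2 for p, q in zip(a, b))
    w = sum((p - q) * (y - q) for p, q, y in zip(a, b, ys)) / den if den else 0.5
    # Refit both learners on all data for use at prediction time.
    m1, m2 = fit_mean(xs, ys), fit_line(xs, ys)
    return lambda x: w * m1(x) + (1.0 - w) * m2(x)


xs = [float(i) for i in range(9)]
ys = [2.0 * x + 1.0 for x in xs]
stack = fit_stack(xs, ys)
print(stack(10.0))
```

On this exactly linear toy data the second-level model puts essentially all of its weight on the line learner. For time-ordered data such as daily electricity sales, folds that respect time order may be preferable to the interleaved folds used here.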
S6, performing Prophet model optimization on the power load characteristics, the time characteristics and the weather characteristics to obtain a long-term predicted value of the electricity sales amount:
The Prophet model used for long-term prediction of the electricity sales is tuned by setting changepoints, adding perturbations and setting thresholds, so that the long-sequence electricity sales curve better matches reality.
The Prophet model is used for predicting the trend term of the power load; since the electricity sales fluctuate more and are more sensitive than the power load, the default Prophet model needs to be tuned. For seasonal perturbation, weekly (7 days), monthly (30.5 days) and yearly (365.5 days) variations are added, together with weekend and holiday fluctuations; the growth trend is set to piecewise linear, and a threshold is set so that the overall trend matches reality as closely as possible. The Prophet component curves are shown in fig. 15, which contains, from top to bottom, a trend component, a holiday component, a week component, a year component, a month component and a weekend component. In addition, changepoints can be set manually in Prophet to capture changes in the overall trend of the data more accurately; the changepoint setting curve is shown in fig. 16.
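A minimal sketch of the two model pieces involved, a linear trend whose slope changes at manually set trend change points and Fourier terms for a seasonal period, may make these settings more concrete. It follows the general model form documented for Prophet but is not Prophet's implementation; the function names, the rates and the Fourier orders are assumptions.

```python
import math
from typing import List, Sequence


def piecewise_linear_trend(t: float, k: float, m: float,
                           changepoints: Sequence[float],
                           deltas: Sequence[float]) -> float:
    """Linear trend with base rate k and offset m; the rate changes by deltas[j]
    at changepoints[j], and the offset is adjusted so the trend stays continuous."""
    rate, offset = k, m
    for s, d in zip(changepoints, deltas):
        if t >= s:
            rate += d
            offset -= s * d
    return rate * t + offset


def fourier_features(t: float, period: float, order: int) -> List[float]:
    """Sine and cosine terms used to model one seasonal period (t and period in days)."""
    feats: List[float] = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats += [math.sin(angle), math.cos(angle)]
    return feats


# Periods taken from the text: week (7), month (30.5) and year (365.5) days.
PERIODS = {"weekly": 7.0, "monthly": 30.5, "yearly": 365.5}

# One change point at day 10 where the daily growth rate rises from 1 to 3.
print(piecewise_linear_trend(10.0, 1.0, 0.0, [10.0], [2.0]))
print(piecewise_linear_trend(11.0, 1.0, 0.0, [10.0], [2.0]))
print(fourier_features(0.0, PERIODS["monthly"], 2))
```

With the Prophet library itself, comparable settings would normally be passed through `Prophet(growth="linear", changepoints=[...])` and `add_seasonality(name="monthly", period=30.5, fourier_order=...)`; the Fourier orders and change point dates used in the patent are not given in this section.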
Verification of the electricity sales prediction results: April, May and June 2020 are used as the test set, and the total electricity sales of Hunan are predicted. The short- and medium-term prediction results are shown in Table 1. As can be seen from Table 1, the daily error of each model is about 6.5% to 7%, the 7-day error about 3.5% to 4%, the 15-day error about 1.8% to 2.8%, and the monthly error about 0% to 3%. The model fusion with XGBoost as the base model performs best: its 15-day error reaches 1.97%, and its average error over April, May and June is reduced to 1.472%.
Table 1. Short- and medium-term electricity sales prediction errors of each model
[Table 1 appears as images in the original publication: Figure BDA0002816465230000131 and Figure BDA0002816465230000141]
According to the prediction errors, the selected model is the fusion of XGBoost, LightGBM and GBDT with XGBoost as the base model. The model's prediction results for the 15-day periods, the monthly electricity of April to June, and the whole-quarter electricity of the second quarter are shown in Table 2.
Table 2. Short- and medium-term electricity sales prediction results for April, May and June 2020
Date         True electricity    Predicted electricity    Error % (MAPE)
4.1-4.15     5721915407          5743509700               0.377396229
4.16-4.30    5762598366          5756301090               0.109278411
5.1-5.15     5908824579          5778365800               2.207863472
5.16-5.30    5965766882          6071295130               1.768896608
5.31-6.14    6179979786          5813112060               5.936390385
6.15-6.29    6812189773          6640018480               2.527400122
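The error column of Table 2 can be reproduced from the true and predicted values as the absolute percentage error of each 15-day period. The short check below assumes that definition; the function name `pct_error` is illustrative.

```python
from typing import List, Tuple

# (period, true electricity, predicted electricity) copied from Table 2.
ROWS: List[Tuple[str, int, int]] = [
    ("4.1-4.15", 5721915407, 5743509700),
    ("4.16-4.30", 5762598366, 5756301090),
    ("5.1-5.15", 5908824579, 5778365800),
    ("5.16-5.30", 5965766882, 6071295130),
    ("5.31-6.14", 6179979786, 5813112060),
    ("6.15-6.29", 6812189773, 6640018480),
]


def pct_error(true_value: float, predicted: float) -> float:
    """Absolute percentage error of a single period, in percent."""
    return abs(predicted - true_value) / true_value * 100.0


errors = {period: pct_error(t, p) for period, t, p in ROWS}
for period, err in errors.items():
    print(f"{period:10s} {err:.6f}")
```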
Long-term predictions are currently assessed from the prediction results and the overall curve-fitting trend for June 2020 and July 1-8, 2020. The average error is 8.24% over 7 days, 7.49% over 15 days, 3.64% for the whole month of June, and 0.29% for July 1-8. The long-term predicted curve of the electricity sales is shown in fig. 17.
Here, short-term prediction means predicting the electricity sales of the next 7 to 15 days from multiple angles; medium-term prediction means predicting the electricity sales of the next 1 to 3 months from multiple angles; and long-term prediction means predicting the electricity sales of the next 0.5 to 2 years from multiple angles.
Finally, it should be noted that the above-described method may be converted into software program instructions and implemented either by running a system comprising a processor and a memory or by computer instructions stored in a non-transitory computer-readable storage medium. The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In summary, the method and the system for predicting the electricity sales amount based on load characteristics and model fusion have the following advantages:
(1) the method treats the power load as a strongly correlated real-time feature for electricity sales prediction: the power load characteristic is decomposed and predicted, then reconstructed into an estimated power load characteristic, which serves as a feature of future electricity sales and is input into the prediction model together with the other relevant features. This avoids the problem that a time-based model cannot capture the trend changes in electricity sales caused by special events, and reduces the prediction error;
(2) the method combines internal data (daily electricity sales, monthly electricity sales and power load data) with external data such as weather factors and holiday data, reasonably selects the relevant features that influence electricity sales, and can obtain an accurate predicted value of the electricity sales based on them;
(3) parameter optimization and model fusion further improve the prediction accuracy of the model.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for predicting electricity sales amount based on load characteristics and model fusion is characterized by comprising the following steps:
S1, acquiring and preprocessing modeling data, wherein the acquired data comprise monthly electricity data and daily electricity data;
S2, feature engineering: selecting and extracting, from the modeling data, relevant characteristics for the predicted target electricity sales amount, and optimizing the relevant characteristics by means of correlation coefficients; the relevant characteristics comprise power load characteristics, time characteristics and weather factor characteristics;
S3, decomposing the power load characteristics into three decomposition terms, namely a trend term, a seasonal term and a residual term, predicting the three decomposition terms respectively, and reconstructing them to obtain estimated power load characteristics;
S4, inputting the time characteristics, the weather factor characteristics and the estimated power load characteristics into learner models for short- and medium-term prediction, and optimizing the parameters of the learner models by grid optimization;
S5, fusing the learner models to obtain the short- and medium-term predicted values of the electricity sales.
2. The method for predicting the electricity sales amount based on the fusion of the load characteristics and the model according to claim 1, further comprising the steps of:
S6, performing Prophet model optimization on the power load characteristics, the time characteristics and the weather characteristics to obtain a long-term predicted value of the electricity sales amount.
3. The method for predicting electricity sales amount based on load characteristics and model fusion according to claim 1, wherein the time characteristics in step S2 include: day_of_year (day of the year), day_of_month (day of the month), day_of_week (day of the week), year, month, week_total_id (week), week_year_id (week), if_holiday (whether the day is a holiday), month_year_holiday (month), month_total_id (month), and weedalay_num (weekend).
4. The method for predicting the electricity sales amount based on the fusion of the load characteristics and the model according to claim 1, wherein the weather factor characteristics in the step S2 are obtained by:
S2.2.1, setting a low temperature threshold and a high temperature threshold: drawing a fitting curve of the daily electricity sales against the daily average air temperature, and setting the thresholds according to the trend of the fitting curve;
S2.2.2, calculating a heating coefficient and a cooling coefficient according to the low temperature threshold and the high temperature threshold:
$hd_i = \max(T_{low} - T_i, 0)$,
$cd_i = \max(T_i - T_{high}, 0)$,
wherein $hd_i$ and $cd_i$ respectively denote the heating coefficient and the cooling coefficient of the i-th day; $T_{low}$ and $T_{high}$ are respectively the low temperature threshold and the high temperature threshold; $T_i$ is the daily average temperature of that day;
S2.2.3, obtaining the temperature coefficient of that day from the heating coefficient and the cooling coefficient:
$HCD_i = \alpha \cdot hd_i + cd_i$
Figure FDA0002816465220000021
wherein the electricity sales of the m-th month of year X are denoted $S_{m:m+1}$, and the electricity sales peak in months m and n, so the ratio of the peak electricity sales is $S_{m:m+1}/S_{n:n+1}$, and $\alpha$ is the average of this ratio over years X and X+1.
5. The method for predicting the electricity sales amount based on the fusion of the load characteristics and the model according to claim 1, wherein the step S3 comprises the steps of:
S3.1, decomposing the power load characteristics by X11 into three decomposition terms, namely a trend term, a seasonal term and a residual term;
S3.2, predicting the three decomposition terms respectively;
S3.3, adding the predicted data of the three decomposition terms according to the X11 additive model to obtain the estimated power load characteristics.
6. The method for predicting the electricity sales amount based on the load characteristics and the model fusion of claim 5, wherein the predicting method of the step S3.2 comprises the following steps:
S3.2.1, trend term prediction: obtaining a trend term prediction result using a Prophet regression model;
S3.2.2, seasonal term period filling: the seasonal term is highly regular and is filled according to its period;
S3.2.3, residual term noise filling: the residual term is confirmed to be a white noise sequence by the Ljung-Box (LB) test and is filled with white noise using a random Gaussian model.
7. The method for predicting electricity sales amount based on load characteristics and model fusion of claim 1, wherein the learner models for short and medium term prediction in step S4 include LightGBM, XGBoost and GBDT.
8. The method according to claim 1, wherein the grid optimization method in step S4 is to perform grid search on the parameters by GridSearchCV to select an optimal parameter combination.
9. The method for predicting the electricity sales amount based on the load characteristics and the model fusion of claim 1, wherein the fusion method in the step S5 is Stacking fusion of the models using Mlxtend.
10. An electricity sales amount prediction system based on load characteristics and model fusion, characterized by comprising:
at least one processor; and at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method for predicting the electricity sales amount based on the load characteristics and the model fusion according to any one of claims 1 to 9.
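As an illustration of the temperature coefficient defined in claim 4, the heating, refrigeration (cooling) and combined coefficients can be computed as in the sketch below. The thresholds of 10 and 25 degrees and the weight alpha = 0.8 are hypothetical values chosen for the example, not values from the patent, and the function names are illustrative.

```python
def heating_coefficient(t_avg: float, t_low: float) -> float:
    """hd_i = max(T_low - T_i, 0): degrees below the low temperature threshold."""
    return max(t_low - t_avg, 0.0)


def cooling_coefficient(t_avg: float, t_high: float) -> float:
    """cd_i = max(T_i - T_high, 0): degrees above the high temperature threshold."""
    return max(t_avg - t_high, 0.0)


def temperature_coefficient(t_avg: float, t_low: float, t_high: float, alpha: float) -> float:
    """HCD_i = alpha * hd_i + cd_i for one day with daily average temperature t_avg."""
    return alpha * heating_coefficient(t_avg, t_low) + cooling_coefficient(t_avg, t_high)


# Hypothetical thresholds and weight, for illustration only.
T_LOW, T_HIGH, ALPHA = 10.0, 25.0, 0.8

for t in (5.0, 18.0, 30.0):
    print(t, temperature_coefficient(t, T_LOW, T_HIGH, ALPHA))
```

A day between the two thresholds gets a coefficient of zero, a cold day contributes through the heating term scaled by alpha, and a hot day contributes through the cooling term.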
CN202011399227.XA 2020-12-04 2020-12-04 Load characteristic and model fusion based electricity sales amount prediction method and system Pending CN112418545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011399227.XA CN112418545A (en) 2020-12-04 2020-12-04 Load characteristic and model fusion based electricity sales amount prediction method and system

Publications (1)

Publication Number Publication Date
CN112418545A true CN112418545A (en) 2021-02-26

Family

ID=74829887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011399227.XA Pending CN112418545A (en) 2020-12-04 2020-12-04 Load characteristic and model fusion based electricity sales amount prediction method and system

Country Status (1)

Country Link
CN (1) CN112418545A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610174A (en) * 2021-08-13 2021-11-05 中南大学 Power grid host load prediction method, equipment and medium based on Phik feature selection
CN114243702A (en) * 2022-01-28 2022-03-25 国网湖南省电力有限公司 Prediction method and system for operation parameters of power grid AVC system and storage medium
CN116502278A (en) * 2023-06-30 2023-07-28 长江三峡集团实业发展(北京)有限公司 Data privacy protection method, system, computer equipment and medium
CN116502278B (en) * 2023-06-30 2023-10-20 长江三峡集团实业发展(北京)有限公司 Data privacy protection method, system, computer equipment and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination