CN111291782B

CN111291782B - Accumulated load prediction method based on information accumulation k-Shape clustering algorithm

Info

Publication number: CN111291782B
Application number: CN202010032213.8A
Authority: CN
Inventors: 张宇帆; 艾芊; 王历晔; 于琪; 刘育权; 熊文; 王莉; 蔡莹; 吴任博; 李俊格; 黄开艺; 余志文; 张扬; 李诗颖
Original assignee: Shanghai Jiaotong University; Guangzhou Power Supply Bureau Co Ltd
Current assignee: Shanghai Jiaotong University; Guangzhou Power Supply Bureau Co Ltd
Priority date: 2020-01-13
Filing date: 2020-01-13
Publication date: 2022-09-09
Anticipated expiration: 2040-01-13
Also published as: CN111291782A

Abstract

The invention discloses an accumulated load prediction method based on an information accumulation k-Shape clustering algorithm, comprising the following steps: performing k-Shape clustering according to the shape characteristics of the users' electrical load curves; converting the resulting division of the load curves into a similarity matrix, and then a distance matrix, of the load curves between users; obtaining from the distance matrix a hierarchical structure describing the distances between users; selecting different numbers of clusters to obtain different cluster partitions of the users, training learning models, and performing probabilistic and deterministic prediction of the users' cumulative load; and weighting the probabilistic and deterministic cumulative load predictions of each cluster partition and combining them into the final cumulative load prediction. The method covers the shape information of user electrical load more comprehensively without relying on extracted features, helps describe the electricity consumption characteristics of users, realizes ensemble learning for cumulative load prediction, and improves both probabilistic and deterministic prediction accuracy.

Description

Accumulated load prediction method based on information accumulation k-Shape clustering algorithm

Technical Field

The invention relates to the technical field of load prediction, in particular to an accumulated load prediction method based on an information accumulation k-Shape clustering algorithm.

Background

With the deregulation of the electricity industry, load aggregators (agents that aggregate a group of users equipped with smart meters) are becoming important participants in demand-side management. Cumulative load forecasts provide a basis for the load aggregator's decision-making. Currently, methods for cumulative load prediction fall into three categories: 1) fully aggregated methods; 2) fully decentralized methods; 3) clustering-based methods. The fully aggregated approach sums the loads of all users and then forecasts the total. In contrast, the fully decentralized approach forecasts each user's load separately and then sums the forecasts. The clustering-based approach first divides the user loads into several clusters. The total load of each cluster is then predicted separately, and the predicted cluster loads are summed to form the final prediction result.

Cluster-based cumulative load prediction has been the focus of many studies. As the first step of this approach, selecting a suitable clustering method is crucial. As a shape-based time series clustering method, k-Shape has outperformed other clustering methods in load prediction, energy management, cumulative load prediction, and related fields. Selecting appropriate input features for the clustering algorithm is another important issue. So far, most of the literature takes the Representative Load Patterns (RLP) of power consumers as the clustering input feature and groups the consumers' load curves accordingly. The average load over a period of time is a typical RLP; however, such RLP curves reflect no statistical characteristics of the load other than its average.

Existing cumulative load prediction methods have the following problems, which pose serious challenges for a load aggregator trying to make accurate decisions:

1) The load of a single power user fluctuates widely, and fully decentralized prediction methods suffer from low prediction accuracy because this uncertainty is difficult to handle.

2) Current clustering-based load prediction methods usually rely on a single prediction result and ignore the accuracy gain that could be obtained by reasonably combining different results.

3) The input feature of current clustering algorithms is the average load over a period of time, which cannot reflect statistical characteristics of the load other than the average.

Therefore, those skilled in the art are dedicated to developing an accumulated load prediction method based on an information accumulation k-Shape clustering algorithm that realizes data-driven cumulative load prediction at a load aggregator.

Disclosure of Invention

In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to provide an accumulated load prediction method based on an information accumulation k-Shape clustering algorithm that realizes data-driven cumulative load prediction at a load aggregator.

In order to achieve the above object, the present invention provides an accumulated load prediction method based on an information accumulation k-Shape clustering algorithm, which is characterized in that the method comprises the following steps:

step 1, performing k-Shape clustering according to the shape characteristics of the users' power load curves;

step 2, converting the division of the load curves into a similarity matrix of the load curves between users by combining the information obtained from the clustering;

step 3, converting the similarity matrix into a distance matrix;

step 4, applying a single-linkage hierarchical clustering algorithm to the distance matrix to obtain a hierarchical structure describing the distances between users;

step 5, selecting different numbers of clusters according to the hierarchical structure obtained in step 4 to obtain different cluster partitions of the users, training learning models, and performing probabilistic prediction and deterministic prediction of the users' cumulative load;

step 6, determining the weights of the probabilistic and deterministic cumulative load prediction results of each cluster partition, and combining these results into a final cumulative load prediction result.

Further, the step 1 specifically includes the following steps:

step 1.1, representing the electrical load curves as a training set

The load data set of user i is represented as

wherein N is the number of power users, m is the length of the load sequence, and N_tr is the size of the training set;

step 1.2, solving an NP-hard optimization problem heuristically to realize k-Shape clustering.

Further, the specific formula of the NP-hard optimization problem of step 1.2 is:

wherein

is the centroid of cluster p_j ∈ P, and

is a measure of the shape distance between sequences of length m.

Further, the specific formula of the shape distance measure is as follows:

wherein

is a measure of the cross-correlation of the sequences, with w = 1, ..., 2m − 1.

Further, the step 2 specifically includes the following steps:

step 2.1, after the clustering is finished, counting the number of load curves of each user u_j contained in each cluster p_i, denoted as

wherein i = 1, ..., k and j = 1, ..., N;

step 2.2, defining the similarity matrix

as

Further, the step 3 specifically includes the following steps:

converting the similarity matrix into the distance matrix by D = I − S,

wherein every element of the matrix I is 1.

Further, the hierarchical structure in the step 4 is described by using a clustering tree diagram.

Further, the step 5 specifically includes the following steps:

by selecting different numbers of clusters

|N_C| different cluster partitions of the N users are obtained;

for the ith cluster partition, the number of models to be trained is

and the jth model f_{i,j} is trained on the jth cluster, i.e.

wherein n_{i,j} is the number of users in the jth cluster of the ith partition;

for probabilistic prediction, the predicted value for quantile q is expressed as:

for deterministic prediction, the model

is trained to obtain the prediction result

for the ith cluster partition, the cumulative load prediction for probabilistic prediction is expressed as

and the cumulative load prediction for deterministic prediction is expressed as

Further, the weights of the probabilistic cumulative load prediction results of each cluster partition in step 6 are determined by solving the following optimization problem:

w_{i,q} ≥ 0

wherein

is the cumulative load prediction result of the ith cluster partition at time l; the objective function minimizes the pinball loss function on the validation set; the pinball loss function is:

wherein N_va is the number of samples in the validation set.

Further, the weights of the deterministic cumulative load prediction results of each cluster partition in step 6 are determined by solving the following optimization problem:

wherein the objective function minimizes the MAPE value on the validation set.

The invention has the beneficial effects that:

1. The invention provides a clustering method based on information accumulation, which covers the shape information of user electrical load more comprehensively without relying on extracted features. In addition, the method produces a tree diagram describing the hierarchical correlation of users' electricity consumption, which helps characterize their electricity consumption behavior.

2. Compared with methods that rely on a single prediction, the proposed method realizes ensemble learning for cumulative load prediction and improves both probabilistic and deterministic prediction accuracy.

3. At present, no comparable method has been implemented or operated at scale domestically or abroad, so the method offers strong novelty and practicality.

Drawings

FIG. 1 is a comparison of deterministic predictions for method P-10 of a preferred embodiment of the present invention and prior art method C-4;

FIG. 2a is a probabilistic prediction result of the method P-10 according to a preferred embodiment of the present invention;

FIG. 2b is a probabilistic prediction result of prior art method C-4;

FIG. 3 is a probabilistic prediction and deterministic prediction weight heatmap of a preferred embodiment of the present invention;

FIG. 4 is a 10-Shape clustering tree based on information accumulation according to a preferred embodiment of the present invention;

FIG. 5 is a statistical information analysis of the clustering results according to a preferred embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings for clarity and understanding of technical contents. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

The method improves cumulative load prediction based on k-Shape clustering and raises prediction accuracy through ensemble learning. It proposes a k-Shape clustering algorithm based on information accumulation to divide the users' electricity consumption behavior hierarchically. For probabilistic forecasting, determining the weights of the different forecasting results is formulated as a linear programming problem whose objective is to minimize the pinball loss function, so that the reliability of the probabilistic forecast is optimized; for deterministic forecasting, determining the weights is formulated as a linear programming problem whose objective is to minimize the Mean Absolute Percentage Error (MAPE), so that the accuracy of the deterministic forecast is optimized.

The method mainly comprises two parts: a k-Shape clustering algorithm based on information accumulation, and probabilistic and deterministic cumulative load prediction.

(1) k-Shape clustering algorithm based on information accumulation

1) Load shape information mining based on the k-Shape clustering algorithm

This step groups all of the users' daily load curves according to their shape characteristics. k-Shape clustering is performed on the training set

where N is the number of power users. The load data set of user i can be represented as

where m is the length of the load sequence and N_tr is the size of the training set.

Like other centroid-based clustering methods, k-Shape clustering heuristically solves the following NP-hard optimization problem:

where

is the centroid of cluster p_j ∈ P, and

is a measure of the shape distance between sequences of length m, defined by the following formula:

where

measures the cross-correlation of the sequences and can be defined as

where

is calculated from the following formula:
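
As an illustration, the shape distance used by k-Shape can be sketched in plain Python. The sketch assumes the standard shape-based distance (SBD) from the k-Shape literature, SBD(x, y) = 1 − max_w CC_w(x, y) / sqrt(R_0(x, x)·R_0(y, y)), with CC_w(x, y) = R_{w−m}(x, y) and w = 1, ..., 2m − 1. The function names `cross_correlation` and `sbd` are hypothetical and do not come from the patent.

```python
import math


def cross_correlation(x, y, shift):
    """R_shift(x, y): inner product of x with y slid by `shift` positions."""
    m = len(x)
    if shift >= 0:
        return sum(x[l + shift] * y[l] for l in range(m - shift))
    # For negative shifts, R_{-k}(x, y) is defined as R_k(y, x).
    return cross_correlation(y, x, -shift)


def sbd(x, y):
    """Shape-based distance (assumed standard k-Shape definition)."""
    m = len(x)
    if len(y) != m:
        raise ValueError("sequences must have equal length")
    norm = math.sqrt(cross_correlation(x, x, 0) * cross_correlation(y, y, 0))
    if norm == 0.0:
        raise ValueError("SBD is undefined for an all-zero sequence")
    # w = 1, ..., 2m - 1 corresponds to shifts w - m = -(m - 1), ..., m - 1.
    best = max(cross_correlation(x, y, w - m) for w in range(1, 2 * m))
    return 1.0 - best / norm
```

An SBD near 0 indicates curves with similar shape up to scaling and shifting. The direct O(m²) loop is used here for clarity; k-Shape implementations commonly compute the cross-correlation with an FFT instead.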

2) Combining the clustering information

After the clustering is completed, the number of load curves of each user assigned to each cluster is counted and denoted as

where i = 1, ..., k and j = 1, ..., N. The similarity matrix

is then defined as follows:

S(p, q) comprehensively describes the degree of similarity between the load curves of users p and q; thus, by combining the clustering information, the division of the load curves is converted into a measure of similarity between users. Next, the similarity matrix is converted into a distance matrix by D = I − S,

where every element of the matrix I is 1.
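
The conversion from cluster assignments to a user distance matrix can be sketched as follows. The counting step and D = I − S follow the text. The similarity formula is an assumption: the sketch uses a histogram-intersection similarity, S(p, q) = Σ_i min(share of user p's curves in cluster i, share of user q's curves in cluster i), which is only one plausible choice and is not taken from the patent. All function names are hypothetical.

```python
def count_matrix(labels_per_user, k):
    """counts[i][j] = number of load curves of user j assigned to cluster i."""
    counts = [[0] * len(labels_per_user) for _ in range(k)]
    for j, labels in enumerate(labels_per_user):
        for i in labels:
            counts[i][j] += 1
    return counts


def similarity_matrix(counts):
    """ASSUMED similarity: histogram intersection of per-user cluster shares.

    Every user is assumed to have at least one load curve.
    """
    k, n_users = len(counts), len(counts[0])
    totals = [sum(counts[i][j] for i in range(k)) for j in range(n_users)]
    sim = [[0.0] * n_users for _ in range(n_users)]
    for p in range(n_users):
        for q in range(n_users):
            sim[p][q] = sum(
                min(counts[i][p] / totals[p], counts[i][q] / totals[q])
                for i in range(k)
            )
    return sim


def distance_matrix(sim):
    """D = I - S, with I the all-ones matrix as stated in the text."""
    return [[1.0 - s for s in row] for row in sim]
```

With this assumed similarity, two users whose daily curves fall into the same clusters in the same proportions have distance 0, and two users who never share a cluster have distance 1.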

The invention applies a single-linkage hierarchical clustering algorithm to the distance matrix. This yields a hierarchy that characterizes the distances between users and can be described using a tree diagram.
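
A minimal sketch of this step, assuming that the hierarchical clustering in question is standard single-linkage agglomerative clustering, is given below. It merges the two closest clusters repeatedly and records the partition whenever the number of clusters matches a requested value. The function name `single_linkage_partitions` is hypothetical, and the naive search is chosen for readability rather than speed.

```python
def single_linkage_partitions(dist, cluster_counts):
    """Return {c: list of clusters} for each requested number of clusters c."""
    clusters = [[u] for u in range(len(dist))]
    wanted = set(cluster_counts)
    partitions = {}
    while True:
        if len(clusters) in wanted:
            partitions[len(clusters)] = [sorted(c) for c in clusters]
        if len(clusters) == 1:
            break
        # Single linkage: cluster distance = smallest member-to-member distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[u][v] for u in clusters[a] for v in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return partitions
```

Cutting the same hierarchy at several cluster counts is what yields the different user partitions used later for ensemble learning. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` could replace the loop.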

(2) Ensemble learning based cumulative load prediction

1) Training phase

The purpose of this stage is to train learning models that produce probabilistic and deterministic predictions of the cumulative load of the N power users, and to realize ensemble learning using the dendrogram obtained in the clustering stage, which describes the hierarchical relationship among the users, so as to improve prediction accuracy. By selecting different numbers of clusters

different cluster partitions of the N power users are obtained, giving |N_C| different cluster partitions in total. For the ith partition, the number of models to be trained is

and the jth model f_{i,j} is trained on the jth cluster, i.e.

where n_{i,j} is the number of users in the jth cluster of the ith partition. Thus, the probabilistic prediction for quantile q can be expressed as:

Similarly, for deterministic prediction, the model

is trained to obtain the prediction result

Thus, for the ith cluster partition, the probabilistic and deterministic cumulative load predictions are each obtained by accumulating the corresponding predictions of its clusters.
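
The per-partition procedure described above (sum the user loads within each cluster, predict each cluster load with its own model, then accumulate the cluster predictions) can be sketched as follows. The forecaster `naive_forecast` is a hypothetical stand-in for a trained model f_{i,j}; the patent does not specify the learning model in this text, and all function names are assumptions.

```python
def cluster_loads(user_loads, partition):
    """Sum the load series of the users in each cluster of one partition."""
    horizon = len(user_loads[0])
    return [
        [sum(user_loads[u][t] for u in cluster) for t in range(horizon)]
        for cluster in partition
    ]


def naive_forecast(history, steps):
    """HYPOTHETICAL stand-in for a trained model f_{i,j}: repeat the last value."""
    return [history[-1]] * steps


def partition_forecast(user_loads, partition, steps, model=naive_forecast):
    """Cumulative forecast of one partition: sum of its per-cluster forecasts."""
    per_cluster = [
        model(series, steps) for series in cluster_loads(user_loads, partition)
    ]
    return [sum(forecast[t] for forecast in per_cluster) for t in range(steps)]
```

Calling `partition_forecast` once per cluster partition gives one cumulative forecast per partition; these are the candidates that the ensemble learning phase then weights.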

2) Ensemble learning phase

The ensemble learning phase determines the weights of the cumulative load predictions of each cluster partition and combines them into the final prediction. For both probabilistic and deterministic prediction, weight determination is formulated as an optimization problem solved on the validation set.

For probabilistic prediction, the objective is to minimize the pinball loss function:

where N_va is the number of samples in the validation set. The following optimization problem is therefore constructed for each quantile:

w_{i,q} ≥ 0

where

is the cumulative load prediction result of the ith cluster partition at time l.

For the probabilistic prediction of quantile q, as proven in the existing literature, introducing the auxiliary variables

converts the above optimization problem into the following linear optimization problem:

For deterministic prediction, the objective is to minimize the MAPE on the validation set. Similarly, by introducing the auxiliary variables

the following linear optimization problem can be constructed:
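
For orientation, one standard way to write the two weight-optimization problems as linear programs is sketched below. The notation is assumed rather than taken from the patent: y_l is the actual cumulative load at validation time l, ŷ_{i,l} is the forecast of the ith cluster partition, and u_l, v_l, z_l are auxiliary variables. The sum-to-one constraint on the weights is also an assumption, since the text states only w_{i,q} ≥ 0, and the MAPE form assumes y_l > 0.

```latex
% Assumed notation; the sum-to-one constraints are an assumption.
\begin{align*}
L_q(y, \hat{y}) &=
  \begin{cases}
    q\,(y - \hat{y}), & y \ge \hat{y} \\
    (1 - q)\,(\hat{y} - y), & y < \hat{y}
  \end{cases} \\[4pt]
\text{Probabilistic:}\quad
\min_{w,\,u,\,v}\; & \frac{1}{N_{va}} \sum_{l=1}^{N_{va}} \bigl( q\,u_l + (1 - q)\,v_l \bigr) \\
\text{s.t.}\; & y_l - \sum_{i=1}^{|N_C|} w_{i,q}\,\hat{y}^{q}_{i,l} = u_l - v_l,
  \quad u_l \ge 0,\; v_l \ge 0, \\
& \sum_{i=1}^{|N_C|} w_{i,q} = 1, \quad w_{i,q} \ge 0 \\[4pt]
\text{Deterministic:}\quad
\min_{w,\,z}\; & \frac{1}{N_{va}} \sum_{l=1}^{N_{va}} \frac{z_l}{y_l} \\
\text{s.t.}\; & -z_l \le y_l - \sum_{i=1}^{|N_C|} w_i\,\hat{y}_{i,l} \le z_l, \\
& \sum_{i=1}^{|N_C|} w_i = 1, \quad w_i \ge 0
\end{align*}
```

At an optimum, u_l and v_l equal the positive and negative parts of the residual, so the first objective equals the mean pinball loss; likewise z_l equals the absolute residual, so the second objective equals the MAPE (as a fraction).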

3) Testing phase

The pinball loss function and the MAPE on the test set are used as the evaluation indices for probabilistic and deterministic prediction, respectively. The smaller the value, the better the prediction performance.
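
The two evaluation indices can be computed as in the following sketch, which uses the usual textbook definitions of the pinball loss and the MAPE. The function names are hypothetical, and reporting the MAPE in percent is an assumption.

```python
def pinball_loss(actual, predicted, q):
    """Mean pinball loss of quantile-q forecasts over a set of samples."""
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        # Under-prediction is penalized by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    errors = [abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For a high quantile such as q = 0.8, forecasting too low costs four times as much as forecasting too high by the same amount, which pushes the forecast toward the upper part of the load distribution.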

Example:

1. Description of data

The smart meter measurements come from the smart meter dataset provided by the Low Carbon London (LCL) project. The invention randomly selects 36 users whose consumption was measured every half hour from 1/2013 to 31/12/2013. The numbers of clusters are selected from N_C = [1, 2, 4, 8, 16, 32, 36]. According to the user statistics provided by LCL, the 36 users can be divided into groups by income and by electricity tariff policy. By income, there are three grades: affluent (Acorn-A), moderate (Acorn-H), and poor (Acorn-L). By tariff policy, there are two categories: Time of Use (ToU) and Standard (Std).

2. Prediction results

To demonstrate the effectiveness of the proposed method, it is compared with a k-Shape-clustering-based cumulative prediction method that uses only the RLP curve of the load to divide the users into k clusters and then follows the typical clustering-based procedure to form the final cumulative load prediction. Since choosing the number of clusters is a persistent difficulty of that method, several cluster numbers are tried and the result of the best-performing cluster number is taken as its final prediction. In the following discussion, P-k denotes the proposed method and C-k denotes the comparison method.

Table 1 lists the deterministic prediction results on the test set. The fully decentralized method performs worst because of the larger uncertainty of individual user loads. Among the comparison methods with different cluster numbers, C-4 performs best, even better than the fully aggregated method. However, the proposed method P-10 performs best of all the methods compared. FIG. 1 shows the predicted load curves of P-10 and C-4 over 168 hours. Although the peak load is highly uncertain and volatile, P-10 captures it better than C-4.

The probabilistic prediction results are shown in Table 2. The pinball loss function measures the reliability of probabilistic predictions. Quantiles of 20%, 40%, 60% and 80% are predicted and denoted Q20, Q40, Q60 and Q80. Values in bold are the best result for each quantile. C-4 and P-10 are chosen for probabilistic prediction because they perform best in deterministic prediction. The results show that the clustering-based approaches greatly reduce the pinball loss compared with the fully aggregated approach. Furthermore, the proposed method has the lowest pinball loss at almost all quantiles. FIGS. 2a and 2b show the probabilistic predictions of the 168-hour load curves for P-10 and C-4. The prediction intervals formed by the predicted loads at different quantiles cover the actual load well. Moreover, the prediction intervals of P-10 are narrower than those of C-4. To quantify this, the average widths of the 20% and 60% prediction intervals are given in Table 3. The smaller the value, the higher the sharpness. These results show that the proposed method has better reliability and sharpness.
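
The sharpness measure reported in Table 3 can be illustrated with a short sketch. It assumes that the 20% interval is the central interval between Q40 and Q60 and the 60% interval is the central interval between Q20 and Q80; this reading is consistent with the four predicted quantiles but is not stated explicitly in the text. The function name and the sample values are made up for illustration.

```python
def mean_interval_width(lower, upper):
    """Average width of a prediction interval over all time steps."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("quantile series must be non-empty and equally long")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up quantile forecasts (kW) for three time steps, for illustration only.
q20 = [1.0, 1.5, 2.0]
q40 = [1.4, 2.0, 2.6]
q60 = [1.8, 2.4, 3.2]
q80 = [2.5, 3.0, 4.0]

width_20 = mean_interval_width(q40, q60)  # assumed central 20% interval
width_60 = mean_interval_width(q20, q80)  # assumed central 60% interval
```

A narrower average width at the same nominal coverage means a sharper forecast, which is the sense in which smaller values in Table 3 are better.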

TABLE 1 deterministic load prediction results

TABLE 2 probabilistic load prediction results

TABLE 3 Mean width of the probabilistic prediction intervals

Method	20% prediction interval (kW)	60% prediction interval (kW)
P-10	0.628	2.449
C-4	1.068	3.607

To visualize the optimized weights, they are converted into heat maps, as shown in FIG. 3. The weights in w = [w₁, w₂, ..., w₇] are those assigned to the results obtained with the cluster numbers in N_C = [1, 2, 4, 8, 16, 32, 36]. The results show that, for both probabilistic and deterministic prediction, the weight w₁ is generally large, while w₇ is assigned 0 in all predictions. Thus, the prediction results that perform better generally receive larger weights.

3. Results of the information accumulation k-Shape clustering algorithm

Since P-10 gives the best prediction performance, the analysis here is based on the information accumulation 10-Shape clustering results. The tree diagram is shown in FIG. 4. It indicates that the electricity consumption of the 12th user differs markedly from that of the other users. Therefore, the tree is cut at the position where the users are divided into 2 groups, and the clustering results are analyzed using the statistical data, as shown in FIG. 5. The results show that the 12th user belongs to the poor income grade and is on the standard tariff.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. An accumulated load prediction method based on an information accumulation k-Shape clustering algorithm is characterized by comprising the following steps:

the step 1 specifically comprises the following steps:

step 1.1, representing the electrical load curves as a training set

The load data set of user i is represented as

step 1.2, solving the NP-hard optimization problem heuristically to realize k-Shape clustering;

the step 2 specifically comprises the following steps:

step 2.1, after the clustering is finished, counting the number of load curves of each user u_j contained in each cluster p_i, denoted as

wherein i = 1, ..., k and j = 1, ..., N;

step 2.2, defining the similarity matrix

as

step 3, converting the similarity matrix into a distance matrix;

step 4, applying a single-linkage hierarchical clustering algorithm to the distance matrix to obtain a hierarchical structure describing the distances between users;

2. The accumulated load prediction method based on the information accumulated k-Shape clustering algorithm of claim 1, wherein the specific formula of the NP-hard optimization problem of step 1.2 is as follows:

wherein

is the centroid of cluster p_j ∈ P, and

is a measure of the shape distance between sequences of length m.

3. The accumulated load prediction method based on the information accumulation k-Shape clustering algorithm of claim 2, wherein the specific formula of the shape distance measure is as follows:

wherein

is a measure of the cross-correlation of the sequences, with w = 1, ..., 2m − 1;

4. The method for predicting the cumulative load based on the information cumulative k-Shape clustering algorithm as claimed in claim 1, wherein the step 3 specifically comprises the following steps:

converting the similarity matrix into the distance matrix by D = I − S,

wherein every element of the matrix I is 1.

5. The method for predicting cumulative load based on information accumulation k-Shape clustering algorithm as claimed in claim 1, wherein the hierarchical structure in the step 4 is described by using clustering tree.

6. The method for predicting cumulative load based on information cumulative k-Shape clustering algorithm as claimed in claim 5, wherein said step 5 comprises the following steps:

by selecting different numbers of clusters

|N_C| different cluster partitions of the N users are obtained;

for the ith cluster partition, the number of models to be trained is

and the jth model f_{i,j} is trained on the jth cluster, i.e.

for deterministic prediction, the model

is trained to obtain the prediction result

The cumulative load prediction for deterministic prediction is expressed as

7. The method for predicting cumulative load based on information accumulated k-Shape clustering algorithm as claimed in claim 6, wherein the weight of the cumulative load prediction result of probabilistic prediction of each cluster partition determined in the step 6 is specifically an optimization problem:

w_{i,q} ≥ 0

wherein N_va is the number of samples in the validation set.

8. The method for predicting the cumulative load based on the information cumulative k-Shape clustering algorithm as claimed in claim 6, wherein the weight of the cumulative load prediction result of the deterministic prediction of each cluster partition determined in the step 6 is specifically an optimization problem:

the objective function is to minimize the MAPE value on the validation set.