CN116960991B - Probability-oriented power load prediction method based on graph convolution network model - Google Patents

Probability-oriented power load prediction method based on graph convolution network model

Info

Publication number
CN116960991B
CN116960991B (application CN202311222388.5A)
Authority
CN
China
Prior art keywords
matrix
self
graph
head
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311222388.5A
Other languages
Chinese (zh)
Other versions
CN116960991A (en)
Inventor
何州
裘一蕾
陈细平
宋小波
陈卫强
姚家渭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Half Cloud Technology Co ltd
Original Assignee
Hangzhou Half Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Half Cloud Technology Co ltd filed Critical Hangzhou Half Cloud Technology Co ltd
Priority to CN202311222388.5A priority Critical patent/CN116960991B/en
Publication of CN116960991A publication Critical patent/CN116960991A/en
Application granted granted Critical
Publication of CN116960991B publication Critical patent/CN116960991B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Power Engineering (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a probability-oriented power load prediction method based on a graph convolution network model. Historical power load sequence data is decomposed into components that are processed by dual convolution channels and fused by a gating mechanism to obtain a space-time feature matrix. The space-time feature matrix and an adaptive adjacency matrix are passed through a stacked graph convolution network to obtain a graph convolution feature matrix. Finally, the Q, K and V matrices of a self-attention mechanism are obtained from the graph convolution feature matrix by linear transformation, the self-attention mechanism is performed to obtain a self-attention feature matrix, the self-attention feature matrix is connected to the graph convolution feature matrix by a residual connection, and normalization is performed to obtain a predicted load value matrix. The method extracts spatial correlation information at both the geographic and semantic levels, enriching the receptive field and capturing multi-scale enhanced spatial features, thereby obtaining more accurate prediction results.

Description

Probability-oriented power load prediction method based on graph convolution network model
Technical Field
The application belongs to the technical field of power load prediction, and particularly relates to a probability-oriented power load prediction method based on a graph convolution network model.
Background
In recent years, power load prediction has attracted wide attention because of the economic and social benefits it can provide. However, as modernization accelerates, the scale of power systems is expanding rapidly, and the complexity and difficulty of power load prediction are growing with it. Accurate power load prediction has therefore become a challenging task. Probabilistic power load prediction is a well-known method that effectively addresses this challenge: it accounts for the uncertainty and variability of power load data and gives a prediction interval. Probabilistic power load prediction can improve the accuracy of load prediction and provide a scientific basis for power dispatching plans.
Most existing research adopts a time-series decomposition strategy to process the temporal characteristics of load data, but ignores the differing importance of the decomposition components, so the captured temporal correlation information is incomplete. In addition, prior research mainly attends to the geographic correlation between adjacent load areas and neglects the semantic correlation between load areas, resulting in insufficient capability to extract spatial correlation information.
Disclosure of Invention
The purpose of the application is to provide a probability-oriented power load prediction method based on a graph convolution network model, so as to solve the problem that the extraction capacity of spatial correlation information is insufficient in the prior art.
In order to achieve the above purpose, the technical scheme of the application is as follows:
a probabilistic power load prediction method based on a graph rolling network model comprises the following steps:
the historical power load sequence data of the area to be predicted is decomposed into a trend component, a seasonal component and a residual component, the trend component and the seasonal component respectively pass through a global convolution channel, then are spliced to obtain a global time sequence feature matrix, and the residual component passes through a local convolution channel to obtain a local time sequence feature matrix;
processing the local time sequence feature matrix and the global time sequence feature matrix by adopting a gating mechanism, and obtaining a space-time feature matrix through full connection processing;
the space-time feature matrix and an adaptive adjacency matrix are passed through a stacked graph convolution network to obtain a graph convolution feature matrix;
obtaining the Q, K and V matrices of a self-attention mechanism from the graph convolution feature matrix by linear transformation, and then performing the self-attention mechanism to obtain a self-attention feature matrix;
and carrying out residual connection on the self-attention characteristic matrix and the graph convolution characteristic matrix, and then carrying out normalization processing to obtain a predicted load value matrix.
Further, obtaining the Q, K and V matrices of the self-attention mechanism from the graph convolution feature matrix by linear transformation and then performing the self-attention mechanism to obtain the self-attention feature matrix comprises:
obtaining the Q and K matrices of a multi-head attention mechanism from the graph convolution feature matrix by linear transformation, and obtaining the V matrix from the graph convolution feature matrix by linear transformation; passing each head of the Q and K matrices through a multi-scale convolutional neural network to obtain each head of the multi-scale Q and K matrices; performing the self-attention mechanism between each head of the multi-scale Q and K matrices and the V matrix; and finally splicing the self-attention results corresponding to the heads to obtain the self-attention feature matrix.
Further, the global convolution channel comprises a convolution layer and a full connection layer.
Further, the local convolution channels include a convolution layer, a pooling layer, and a full connection layer.
Further, the processing of the local time sequence feature matrix and the global time sequence feature matrix by a gating mechanism comprises:
performing an activation operation on the local time sequence feature matrix with an activation function, and then adding the result to the global time sequence feature matrix.
Further, the Q, K and V matrices of the self-attention mechanism are obtained from the graph convolution feature matrix by linear transformation as follows:

$Q = X_{GCN} W^{Q}$, $K = X_{GCN} W^{K}$, $V = X_{GCN} W^{V}$

wherein $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $W^{Q}$, $W^{K}$ and $W^{V}$ are the learnable parameter matrices for generating the query, key and value matrices respectively, and $X_{GCN}$ is the graph convolution feature matrix.
Further, the Q and K matrices of the multi-head attention mechanism are obtained from the graph convolution feature matrix by linear transformation as follows:

$Q_i = X_{GCN} W_i^{Q}$, $K_i = X_{GCN} W_i^{K}$, $i = 1, \dots, h$

wherein the input matrix is divided into $h$ heads, $Q_i$ is the $i$-th head of the query matrix, $K_i$ is the $i$-th head of the key matrix, $W_i^{Q}$ and $W_i^{K}$ are the learnable parameter matrices of the $i$-th head of the query matrix and the key matrix respectively, and $X_{GCN}$ is the graph convolution feature matrix;

the V matrix is obtained from the graph convolution feature matrix by linear transformation as follows:

$V = X_{GCN} W^{V}$

wherein $V$ is the value matrix and $W^{V}$ is the learnable parameter matrix for generating the value matrix;

each head of the Q and K matrices is passed through the multi-scale convolutional neural network to obtain each head of the multi-scale Q and K matrices as follows:

$\hat{Q}_i = \mathrm{MSCN}(Q_i)$, $\hat{K}_i = \mathrm{MSCN}(K_i)$, with $\mathrm{MSCN}(X_{in}) = \sum_{k=1}^{n} \alpha_k (W_k * X_{in} + b_k)$

wherein $\hat{Q}_i$ and $\hat{K}_i$ are the $i$-th heads of the multi-scale query matrix and key matrix obtained by the multi-scale convolutional neural network, $n$ is the number of convolution layers, $\alpha_k$ is the weight of the $k$-th convolution layer, $W_k$ is the learnable parameter matrix of the $k$-th convolution layer, $X_{in}$ is the input matrix of the MSCN, and $b_k$ is the bias coefficient of the $k$-th convolution layer.
Further, performing the self-attention mechanism between each head of the multi-scale Q and K matrices and the V matrix, and splicing the self-attention results corresponding to the heads to obtain the self-attention feature matrix, comprises:

$F_{att} = \mathrm{Concat}\left( \mathrm{SoftMax}\left( \frac{\hat{Q}_i \hat{K}_i^{T}}{\sqrt{d_k}} \right) V \right)_{i=1}^{h}$

wherein $F_{att}$ is the self-attention feature matrix, $T$ denotes the transpose operation, $\mathrm{Concat}$ is the splicing function and $\mathrm{SoftMax}$ is the normalization function.
According to the probabilistic power load prediction method based on the graph convolution network model, the different decomposition components are processed separately by a network model with dual convolution channels, which improves the model's ability to extract temporal correlation information. On this basis, a novel graph convolution network enhanced with a multi-scale self-attention mechanism is provided, which can extract spatial correlation information at both the geographic and semantic levels, enriching the receptive field and capturing multi-scale enhanced spatial features, thereby obtaining better prediction results.
Drawings
Fig. 1 is a flowchart of a probabilistic power load prediction method based on a graph convolutional network model.
Fig. 2 is a schematic diagram of a prediction network model constructed in the present application.
Fig. 3 is a schematic diagram of space-time feature matrix extraction based on improved STL decomposition according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a multi-scale convolutional neural network according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Example 1:
Fig. 1 shows a probabilistic power load prediction method based on a graph convolution network model, comprising:
s1, historical power load sequence data of a region to be predicted is decomposed into a trend component, a season component and a residual component, the trend component and the season component respectively pass through a global convolution channel, then are spliced to obtain a global time sequence feature matrix, and the residual component passes through a local convolution channel to obtain a local time sequence feature matrix.
The object of the present application is to obtain a mapping function G that predicts the load values at the future T times, as shown in formula (1):

$(\hat{Y}_{t+1}, \dots, \hat{Y}_{t+T}) = G(X_{t-H+1}, \dots, X_{t})$ (1)

wherein $X_t$ is the space-time feature matrix at time t, H is the length of the history data, and T is the length of the predicted data.
For this purpose, a prediction network model is constructed as shown in fig. 2: a graph convolution network model based on improved STL decomposition and multi-scale self-attention enhancement (MSGCN-ISTL for short).
When training the network model, the input sequence data is taken from an existing data set. When the trained network model is used for prediction, the input sequence data is the historical power load sequence data of the area to be predicted.
In this embodiment, STL decomposition is first performed on the input sequence data, decomposing it into a trend component, a seasonal component and a residual component, as shown in formula (2):

$x_t = T_t + S_t + R_t$ (2)

wherein $x_t$ is the input sequence data at time t, $T_t$ is the trend component, $S_t$ is the seasonal component and $R_t$ is the residual component. Seasonal-trend decomposition (STL) is an advantageous preprocessing step that reduces the complexity of the input sequence data; it is a common time-series decomposition method with good robustness.
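By way of illustration only (the patent does not prescribe any particular library), the STL step of formula (2) can be sketched in Python with statsmodels; the random data and the daily period of 24 are assumptions:

```python
# Illustrative sketch of the STL decomposition step (formula (2)); the data,
# the hourly sampling and the period of 24 are assumptions.
import numpy as np
from statsmodels.tsa.seasonal import STL

load = np.random.rand(24 * 30)                   # hypothetical 30 days of hourly load
result = STL(load, period=24, robust=True).fit()
trend, seasonal, residual = result.trend, result.seasonal, result.resid
# x_t = T_t + S_t + R_t holds: trend + seasonal + residual reconstructs load
```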
The individual components differ in importance when representing the input sequence data. The trend component facilitates analysis of long-term changes in the load data, and the seasonal component facilitates analysis of periodic changes. The residual component contains less regular information and is less important than the other two components in capturing the temporal characteristics of the data. This embodiment therefore uses the GDCNN approach to improve STL into ISTL, with the aim of applying targeted strategies to the different decomposition components, as shown in fig. 3.
As shown in fig. 3, this embodiment first inputs the trend and seasonal components into the global convolution channel to capture the key features representing trend changes and seasonal changes in the load data, as shown in formula (3):

$F_{glo} = \mathrm{Concat}\big(\mathrm{FC}(W_T * T_t + b_T),\ \mathrm{FC}(W_S * S_t + b_S)\big)$ (3)

wherein $F_{glo}$ is the global time sequence feature matrix obtained by the global convolution channel, which comprises a convolution layer and a fully connected layer; $\mathrm{Concat}$ is the splicing function; $W_T$ and $b_T$ are the learnable parameter matrix and bias coefficient for the trend component, and $W_S$ and $b_S$ are the learnable parameter matrix and bias coefficient for the seasonal component. That is, the trend component and the seasonal component each pass through a global convolution channel and are then spliced to obtain the global time sequence feature matrix $F_{glo}$.
Since the residual component is less regular and of smaller magnitude than the other two components, a pooling layer is used in the local convolution channel to further eliminate the redundant information of this component, as shown in formula (4):

$F_{loc} = \mathrm{FC}\big(\mathrm{Pool}(W_R * R_t + b_R)\big)$ (4)

wherein $F_{loc}$ is the local time sequence feature matrix obtained by the local convolution channel, which comprises a convolution layer, a pooling layer and a fully connected layer; $\mathrm{Pool}$ is the pooling operation, and $W_R$ and $b_R$ are the learnable parameter matrix and bias coefficient for the residual component.
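A minimal PyTorch sketch of the dual convolution channels of formulas (3) and (4) follows; the channel width, kernel size and pooling choice are assumptions, since the patent only specifies convolution plus fully connected layers for the global channel, and convolution, pooling and fully connected layers for the local channel:

```python
# Sketch of the global and local convolution channels (formulas (3)-(4));
# layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GlobalChannel(nn.Module):
    def __init__(self, seq_len: int, hidden: int):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=3, padding=1)  # convolution layer
        self.fc = nn.Linear(8 * seq_len, hidden)               # fully connected layer
    def forward(self, x):                      # x: (batch, 1, seq_len)
        return self.fc(self.conv(x).flatten(1))

class LocalChannel(nn.Module):
    def __init__(self, seq_len: int, hidden: int):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(2)            # pooling removes redundant residual info
        self.fc = nn.Linear(8 * (seq_len // 2), hidden)
    def forward(self, x):
        return self.fc(self.pool(self.conv(x)).flatten(1))

# F_glo = Concat(global(T), global(S)); F_loc = local(R)
```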
And S2, processing the local time sequence feature matrix and the global time sequence feature matrix by adopting a gating mechanism, and obtaining a space-time feature matrix through full connection processing.
This embodiment uses a gating mechanism to control the data flow of $F_{loc}$ and $F_{glo}$: an activation operation is performed on $F_{loc}$ and the result is added to $F_{glo}$, as shown in formula (5):

$F_{gate} = F_{glo} + \sigma(F_{loc})$ (5)

wherein $F_{gate}$ is the time sequence feature matrix obtained by the gating mechanism and $\sigma$ is the activation function.
Then, a fully connected layer is used to perform dimension conversion on $F_{gate}$ to obtain the space-time feature matrix $X$, as shown in formula (6):

$X = W_{fc} F_{gate} + b_{fc}$ (6)

wherein $X$ is the space-time feature matrix, and $W_{fc}$ and $b_{fc}$ are the learnable parameter matrix and bias coefficient of the fully connected layer.
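The gating fusion of formulas (5) and (6) then reduces to a few lines; treating the activation as a Sigmoid is an assumption of this sketch, since the specification leaves the function unnamed:

```python
# Sketch of the gating mechanism (formula (5)) and the fully connected
# dimension conversion (formula (6)); the Sigmoid gate is an assumption.
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.act = nn.Sigmoid()
        self.fc = nn.Linear(dim_in, dim_out)
    def forward(self, f_glo, f_loc):
        f_gate = f_glo + self.act(f_loc)   # formula (5): activate local, add global
        return self.fc(f_gate)             # formula (6): convert to space-time matrix X
```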
And S3, passing the space-time feature matrix and the adaptive adjacency matrix through a stacked graph convolution network to obtain a graph convolution feature matrix.
This embodiment adopts an adaptive adjacency matrix generation method to generate an adjacency matrix that is not predefined and can be updated adaptively. The adjacency matrix $\tilde{A}$ is defined as shown in formula (7):

$\tilde{A} = \mathrm{SoftMax}\big(\mathrm{ReLU}(E_s E_d^{T})\big)$ (7)

wherein $\mathrm{SoftMax}$ and $\mathrm{ReLU}$ are activation functions used to normalize the adjacency matrix; $E_s$ and $E_d$ are randomly initialized source-node and target-node embeddings, and $d$ is the depth of the node embeddings. As the graph convolutions iterate, the adjacency matrix is updated adaptively to extract the spatial correlation between the load areas.
This embodiment uses an unweighted graph $G = (V, E)$ to represent the spatial structure of the load area network. The graph regards each load area in the network as a node, where $V$ is the set of load area nodes, $|V| = N$, $N$ is the number of nodes in the load area network, and $E$ is the set of edges in the network. The adjacency matrix is then defined as $A \in \mathbb{R}^{N \times N}$ to represent the associations between the load areas. The adjacency matrix uses 0 and 1 to represent the association relationship between load areas, where 0 indicates no association between two load areas and 1 indicates an association between them.
Based on the load network graph $G$, the space-time feature matrix $X$ and the adjacency matrix $\tilde{A}$, the graph convolution network GCN may be constructed by stacking several convolution layers, as shown in formula (8):

$H_l = \tilde{A}_l H_{l-1} W_l, \quad H_0 = X, \quad l = 1, \dots, L$ (8)

wherein $H_l$ is the output matrix of the $l$-th GCN layer, $L$ is the number of stacked GCN layers, $\tilde{A}_l$ is the adjacency matrix of the $l$-th GCN layer, and $W_l$ is the learnable parameter matrix of the $l$-th GCN layer. That is, the graph convolution feature matrix $X_{GCN}$ is obtained through the stacked graph convolution network.
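The adaptive adjacency matrix of formula (7) and the stacked graph convolution of formula (8) can be sketched together; the dimensions are illustrative, and the absence of inter-layer nonlinearity simply mirrors formula (8) as written:

```python
# Sketch of the adaptive adjacency matrix (formula (7)) and the stacked GCN
# (formula (8)); node count, embedding depth and feature width are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGCN(nn.Module):
    def __init__(self, num_nodes: int, embed_depth: int, feat_dim: int, num_layers: int = 2):
        super().__init__()
        self.e_src = nn.Parameter(torch.randn(num_nodes, embed_depth))  # source embeddings
        self.e_dst = nn.Parameter(torch.randn(num_nodes, embed_depth))  # target embeddings
        self.weights = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim, bias=False) for _ in range(num_layers)])
    def forward(self, x):                  # x: (batch, num_nodes, feat_dim)
        adj = F.softmax(F.relu(self.e_src @ self.e_dst.T), dim=-1)  # formula (7)
        h = x
        for w in self.weights:             # formula (8): H_l = A~_l H_{l-1} W_l
            h = adj @ w(h)
        return h                           # graph convolution feature matrix X_GCN
```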
S4, obtaining the Q, K and V matrices of the self-attention mechanism from the graph convolution feature matrix by linear transformation, and then performing the self-attention mechanism to obtain the self-attention feature matrix.
This embodiment performs a linear transformation on $X_{GCN}$ to obtain the query matrix and key matrix, as shown in formulas (9) and (10):

$Q = X_{GCN} W^{Q}$ (9)

$K = X_{GCN} W^{K}$ (10)

wherein $Q$ is the query matrix, $K$ is the key matrix, and $W^{Q}$ and $W^{K}$ are the learnable parameter matrices for generating the query matrix and the key matrix respectively.
Next, a linear transformation is performed on $X_{GCN}$ to obtain the value matrix, as shown in formula (11):

$V = X_{GCN} W^{V}$ (11)

wherein $V$ is the value matrix and $W^{V}$ is the learnable parameter matrix for generating the value matrix.
Then, the self-attention mechanism is performed to obtain the self-attention feature matrix, as shown in formula (12):

$F_{att} = \mathrm{SoftMax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V$ (12)

wherein $F_{att}$ is the calculated self-attention feature matrix, $T$ denotes the transpose operation and $\mathrm{SoftMax}$ is the normalization function.
It should be noted that the self-attention mechanism is implemented by a self-attention neural network, which is a relatively mature technology in the art and is not described further here.
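A compact sketch of formulas (9) to (12), with the standard scaled dot-product convention:

```python
# Single-scale self-attention over the graph convolution feature matrix
# (formulas (9)-(12)); the sqrt(d_k) scaling is the usual convention.
import torch
import torch.nn.functional as F

def self_attention(x_gcn, w_q, w_k, w_v):
    q, k, v = x_gcn @ w_q, x_gcn @ w_k, x_gcn @ w_v        # formulas (9)-(11)
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v                    # formula (12)
```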
And S5, carrying out residual connection on the self-attention characteristic matrix and the graph convolution characteristic matrix, and then carrying out normalization processing to obtain a predicted load value matrix.
Residual connection is a widely used technique in deep network training and has proven very effective for transferring information between neural network layers. Therefore, this embodiment adds the self-attention feature matrix $F_{att}$ to $X_{GCN}$ and inputs the result to a batch normalization layer. Finally, the predicted load value matrix $\hat{Y}$ of the model, i.e. the prediction result, is obtained as shown in formula (13):

$\hat{Y} = \mathrm{BN}(F_{att} + X_{GCN})$ (13)

wherein $\mathrm{BN}$ is the batch normalization function.
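Formula (13) corresponds to a residual addition followed by batch normalization; normalizing across the feature dimension is one plausible reading of the "batch normalization layer":

```python
# Sketch of the residual connection and batch normalization (formula (13));
# the feature width of 64 and the normalized axis are assumptions.
import torch.nn as nn

bn = nn.BatchNorm1d(64)
def predict(f_att, x_gcn):                 # both: (batch, num_nodes, 64)
    out = f_att + x_gcn                    # residual connection
    return bn(out.transpose(1, 2)).transpose(1, 2)
```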
Example 2:
Unlike embodiment 1, this embodiment proposes a multi-scale convolutional neural network (abbreviated as MSCN) and improves the self-attention mechanism into a multi-scale self-attention mechanism (abbreviated as MSSA).
The spatial correlation between load areas is reflected not only at the geographical level of the load network but also at the semantic level of the load network. However, existing self-attention mechanisms can only work on a single scale, and cannot effectively extract complex spatial correlation features.
For this reason, this embodiment proposes a multi-scale convolutional neural network (abbreviated as MSCN) and improves the self-attention mechanism into a multi-scale self-attention mechanism (abbreviated as MSSA). The MSCN comprises multiple convolution layers with kernels of different scales, giving the MSSA a rich receptive field and enabling it to comprehensively extract multi-scale enhanced spatial features at both the geographic and semantic levels.
In step S4 of this embodiment, the Q and K matrices of the multi-head attention mechanism are obtained from the graph convolution feature matrix by linear transformation, and the V matrix is obtained from the graph convolution feature matrix by linear transformation; each head of the Q and K matrices is passed through the multi-scale convolutional neural network to obtain each head of the multi-scale Q and K matrices; the self-attention mechanism is performed between each head of the multi-scale Q and K matrices and the V matrix; and finally the self-attention results corresponding to the heads are spliced to obtain the self-attention feature matrix.
The framework of the multi-scale self-attention mechanism MSSA of this embodiment is shown in fig. 2. First, a multi-head mechanism is adopted to improve the accuracy of the model's attention weight allocation. Next, a linear transformation is performed on $X_{GCN}$ to obtain the query matrix and key matrix, as shown in formulas (14) and (15):

$Q_i = X_{GCN} W_i^{Q}$ (14)

$K_i = X_{GCN} W_i^{K}$ (15)

wherein the input matrix is divided into $h$ heads, $Q_i$ is the $i$-th head of the query matrix, $K_i$ is the $i$-th head of the key matrix, and $W_i^{Q}$ and $W_i^{K}$ are the learnable parameter matrices for generating the $i$-th head of the query matrix and the key matrix respectively.
Next, a linear transformation is performed on $X_{GCN}$ to obtain the value matrix, as shown in formula (16):

$V = X_{GCN} W^{V}$ (16)

wherein $V$ is the value matrix and $W^{V}$ is the learnable parameter matrix for generating the value matrix.
In this embodiment, each head of the query matrix and the key matrix is passed through the multi-scale convolutional neural network MSCN. The computation function of the MSCN is shown in formula (17):

$\mathrm{MSCN}(X_{in}) = \sum_{k=1}^{n} \alpha_k (W_k * X_{in} + b_k)$ (17)

wherein $n$ is the number of convolution layers, $\alpha_k$ is the weight of the $k$-th convolution layer, $W_k$ is the learnable parameter matrix of the $k$-th convolution layer, $X_{in}$ is the input matrix of the MSCN, and $b_k$ is the bias coefficient of the $k$-th convolution layer. The multi-scale convolutional neural network MSCN in this embodiment comprises multiple convolution layers with kernels of different scales, as shown in fig. 4: each convolution layer performs a convolution operation on the input matrix to obtain results under different receptive fields, and the results of the different convolution layers are then summed with weights to obtain the final processing result.
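A sketch of formula (17) as parallel 1-D convolutions with learnable combination weights; the kernel sizes (1, 3, 5) are assumptions, since fig. 4 only indicates kernels of different scales:

```python
# Sketch of the multi-scale convolutional network MSCN (formula (17));
# kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MSCN(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(channels, channels, k, padding=k // 2) for k in kernel_sizes])
        self.alpha = nn.Parameter(torch.ones(len(kernel_sizes)))  # layer weights a_k
    def forward(self, x):                  # x: (batch, channels, length)
        # weighted sum of convolution outputs under different receptive fields
        return sum(a * conv(x) for a, conv in zip(self.alpha, self.convs))
```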
According to the input matrix used in formula (17), the computation of the multi-scale query matrix and key matrix by the MSCN is shown in formulas (18) and (19) respectively:

$\hat{Q}_i = \mathrm{MSCN}(Q_i)$ (18)

$\hat{K}_i = \mathrm{MSCN}(K_i)$ (19)

wherein $\hat{Q}_i$ and $\hat{K}_i$ are the $i$-th heads of the multi-scale query matrix and key matrix obtained by the MSCN method.
Then, the processing result of the multi-scale self-attention mechanism is shown in formula (20):

$F_{att} = \mathrm{Concat}\left( \mathrm{SoftMax}\left( \frac{\hat{Q}_i \hat{K}_i^{T}}{\sqrt{d_k}} \right) V \right)_{i=1}^{h}$ (20)

wherein $F_{att}$ is the weighted load matrix (self-attention feature matrix), $T$ denotes the transpose operation, $\mathrm{Concat}$ is the splicing function and $\mathrm{SoftMax}$ is the normalization function.
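Assembling formulas (18) to (20), one head at a time; the head layout and the shared MSCN instance are assumptions of this sketch:

```python
# Sketch of the multi-scale self-attention MSSA (formulas (18)-(20)).
import torch
import torch.nn.functional as F

def mssa(heads_q, heads_k, v, mscn):
    # heads_q, heads_k: lists of (batch, N, d_k) tensors; v: (batch, N, d_v)
    outs = []
    for q_i, k_i in zip(heads_q, heads_k):
        q_i = mscn(q_i.transpose(1, 2)).transpose(1, 2)     # formula (18)
        k_i = mscn(k_i.transpose(1, 2)).transpose(1, 2)     # formula (19)
        att = F.softmax(q_i @ k_i.transpose(-2, -1) / (q_i.shape[-1] ** 0.5), dim=-1)
        outs.append(att @ v)
    return torch.cat(outs, dim=-1)                          # formula (20): Concat
```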
And S5, performing residual connection between the self-attention feature matrix and the graph convolution feature matrix, and then performing normalization to obtain the predicted load value matrix.
Residual connection is a widely used technique in deep network training and has proven very effective for transferring information between neural network layers. Therefore, this embodiment adds the weighted load matrix $F_{att}$ to $X_{GCN}$ and inputs the result to the batch normalization layer. Finally, the predicted load value matrix $\hat{Y}$ of the model is obtained, as shown in formula (21):

$\hat{Y} = \mathrm{BN}(F_{att} + X_{GCN})$ (21)

wherein $\mathrm{BN}$ is the batch normalization function.
The present application verifies the above technical solution through experiments on two public data sets, GEFCom2012 and GEFCom2017. The GEFCom2012 data set comprises 32944 load records from 20 load areas in the United States collected from January 2004 to June 2008. The GEFCom2017 data set comprises 397464 load records from hundreds of load areas in the United States collected from January 2005 to December 2011.
The proposed model is compared with several typical models to verify its superiority in probabilistic load prediction. First, models that extract only the temporal correlation of the load data are selected, including Q-LSTM, CNN-LSTM, DA-QLSTM, CNN-BiLSTM and GDCNN-AR-AMPO. Second, a model that extracts the temporal-spatial correlations of the load data is selected, namely Ada-GWN. The details of these models are as follows:
(1) Q-LSTM: a hybrid model combining the LSTM model with the pinball loss function.
(2) CNN-LSTM: a hybrid model combining the CNN and LSTM models.
(3) DA-QLSTM: a hybrid model combining a dual-stage attention mechanism with the LSTM model.
(4) CNN-BiLSTM: a hybrid model combining the CNN and BiLSTM models.
(5) GDCNN-AR-AMPO: a hybrid model combining a gated dual convolutional neural network, an attention mechanism and a pooling operation.
(6) Ada-GWN: a spatio-temporal graph neural network hybrid model.
In the experiments, the number of GCN layers is set to 2, the embedding depth of the adjacency matrix nodes is set to 10, and the number of heads in the multi-head mechanism is set to 8. When training the model, the initial learning rate is set to 0.005, the optimizer is Adam, and the batch size is 512.
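For reference, the reported training configuration gathered into one place (the dictionary form itself is just a convenience, not from the patent):

```python
# Experimental settings as reported in the text.
config = {
    "gcn_layers": 2,            # stacked GCN layers
    "node_embed_depth": 10,     # adjacency embedding depth d
    "attention_heads": 8,       # heads in the multi-head mechanism
    "learning_rate": 0.005,     # initial learning rate
    "optimizer": "Adam",
    "batch_size": 512,
}
```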
For a comprehensive evaluation of the model, four evaluation indexes are used in the application: prediction interval coverage probability (PICP), mean prediction interval width (MPIW), Winkler score (WS) and pinball loss (PL). A larger PICP value indicates better performance, while smaller values of the other three indexes indicate better performance.
(1) PICP: represents the coverage of the prediction interval (PI) over the actual load values, as shown in formula (22):

$\mathrm{PICP} = \frac{1}{T} \sum_{t=1}^{T} c_t, \quad c_t = \begin{cases} 1, & L_t \le y_t \le U_t \\ 0, & \text{otherwise} \end{cases}$ (22)

wherein $T$ is the length of the prediction period, $y_t$, $L_t$ and $U_t$ are the actual load, the lower bound and the upper bound of the PI at time t respectively, and $c_t$ is an intermediate transition variable.
(2) MPIW: represents the width of the PI, as shown in formula (23):

$\mathrm{MPIW} = \frac{1}{T} \sum_{t=1}^{T} (U_t - L_t)$ (23)
(3) WS: represents both the coverage and the width of the PI, as shown in formula (24):

$\mathrm{WS} = \frac{1}{T} \sum_{t=1}^{T} w_t, \quad w_t = \begin{cases} \delta_t + \frac{2}{\alpha}(L_t - y_t), & y_t < L_t \\ \delta_t, & L_t \le y_t \le U_t \\ \delta_t + \frac{2}{\alpha}(y_t - U_t), & y_t > U_t \end{cases}, \quad \delta_t = U_t - L_t$ (24)

wherein $\alpha$ is the confidence interval level and $w_t$ is an intermediate transition variable.
(4) PL: the loss function of the probability prediction, as shown in formula (25):

$PL_t = \begin{cases} (y_t - \hat{y}_{t,q})\, q, & y_t \ge \hat{y}_{t,q} \\ (\hat{y}_{t,q} - y_t)(1 - q), & y_t < \hat{y}_{t,q} \end{cases}$ (25)

wherein $PL_t$ is the PL value at time t, $q$ is the quantile of the confidence interval, and $\hat{y}_{t,q}$ is the predicted quantile at time t.
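The four indexes of formulas (22) to (25) are straightforward to compute; a NumPy sketch:

```python
# Sketches of PICP, MPIW, WS and PL (formulas (22)-(25)); y is the actual
# load, (lower, upper) the prediction interval, alpha the interval level,
# q the quantile and y_q the predicted quantile series.
import numpy as np

def picp(y, lower, upper):                 # formula (22)
    return np.mean((y >= lower) & (y <= upper))

def mpiw(lower, upper):                    # formula (23)
    return np.mean(upper - lower)

def winkler(y, lower, upper, alpha):       # formula (24)
    width = upper - lower
    penalty = 2.0 / alpha * (np.maximum(lower - y, 0) + np.maximum(y - upper, 0))
    return np.mean(width + penalty)

def pinball(y, y_q, q):                    # formula (25)
    diff = y - y_q
    return np.mean(np.maximum(q * diff, (q - 1) * diff))
```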
Tables 1 and 2 show the comparative experimental results under different experimental parameter settings. PICP and MPIW are commonly used evaluation indexes in probabilistic power load prediction, while WS and PL consider both the coverage and the width of the PI and therefore evaluate the overall performance of a prediction model more scientifically. The experimental results show that the model of the application is the best model and Ada-GWN is the second-best.
TABLE 1
TABLE 2
Here, quantile refers to the quantile of the confidence interval in the probability prediction, i.e. the confidence level. Table 1 shows the experimental results on the GEFCom2012 data set and table 2 the results on the GEFCom2017 data set. From the experimental data of the four evaluation indexes in tables 1 and 2, the following conclusions can be drawn:
(1) Ada-GWN and the MSGCN-ISTL proposed herein have significant advantages over the other five baseline models in every set of comparative experiments, especially on the WS and PL indexes. This shows that capturing only the temporal correlation information in the historical data is not enough: the spatial correlation information between load areas is also an important support for accurate probabilistic power load prediction.
(2) In most comparative experiments, the MSGCN-ISTL model is evaluated more favourably than Ada-GWN, indicating that improving the GCN with the ISTL and MSCN methods captures the correlation information between load areas more completely. In addition, under different experimental parameters, the overall performance of MSGCN-ISTL is superior to all baseline models, which shows that the model has good robustness.
(3) Comparing Q-LSTM and DA-QLSTM shows that DA-QLSTM performs better, indicating that the attention mechanism is effective in extracting the characteristic information of the load data. Furthermore, the performance of CNN-LSTM is superior to Q-LSTM, indicating that the CNN layer is efficient in feature extraction.
The application also verifies the effectiveness of each module in the MSGCN-ISTL model by an ablation experiment. The ablation experiment was mainly performed by comparing MSGCN-ISTL with its three simplified models as follows:
(1) MSGCN-ISTL w/o ISTL: the model removes ISTL modules on the basis of MSGCN-ISTL, i.e., does not contain an improved STL decomposition strategy.
(2) MSGCN-ISTL w/o MSSA: the model removes the MSSA module on the basis of MSGCN-ISTL, i.e., does not include a self-attention mechanism.
(3) MSGCN-ISTL w/o MSCN: the model removes the MSCN method on the basis of MSGCN-ISTL, i.e. the self-attention mechanism of the model is single-scale.
Tables 3 and 4 show the results of ablation experiments on the data sets GEFCom2012 and GEFCom2017, respectively.
TABLE 3
TABLE 4
From the experimental data of the four evaluation indexes in tables 3 and 4, the following conclusions can be drawn:
(1) MSGCN-ISTL achieves the best performance under the different experimental conditions of both data sets, indicating that each module of the model is effective.
(2) Compared with MSGCN-ISTL w/o ISTL, MSGCN-ISTL performs better, especially on the MPIW, WS and PL indexes, showing that using the GDCNN method to separately process decomposition components of different importance helps capture temporal correlation information more comprehensively.
(3) The prediction performance of MSGCN-ISTL and MSGCN-ISTL w/o MSCN is superior to MSGCN-ISTL w/o MSSA, which shows that using the self-attention mechanism to capture the association information between load areas effectively improves the prediction performance of the model.
(4) MSGCN-ISTL gives better results than MSGCN-ISTL w/o MSCN. This shows that the MSSA method is effective: it enriches the receptive field and can comprehensively extract multi-scale enhanced spatial features at the geographic and semantic levels, thereby realizing comprehensive extraction of the spatial correlation features.
The application provides a new probabilistic power load prediction model, MSGCN-ISTL, which aims to capture space-time correlation information from load data more comprehensively so as to achieve accurate probabilistic power load prediction. Specifically, the STL decomposition strategy is improved into ISTL so that decomposition components of different importance are handled by different processing methods, improving the model's ability to extract temporal features from the load data stream. In addition, the application provides a new space-time correlation extraction method, MSGCN, to enrich the receptive field and further improve the extraction of multi-scale enhanced spatial features at the geographic and semantic levels. Compared with the existing baseline models, MSGCN-ISTL shows superiority and robustness across the different experimental groups.
The above examples merely represent a few embodiments of the present application; they are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (6)

1. A probability-oriented power load prediction method based on a graph convolution network model, characterized by comprising the following steps:
the historical power load sequence data of the area to be predicted is decomposed into a trend component, a seasonal component and a residual component, the trend component and the seasonal component respectively pass through a global convolution channel, then are spliced to obtain a global time sequence feature matrix, and the residual component passes through a local convolution channel to obtain a local time sequence feature matrix;
processing the local time sequence feature matrix and the global time sequence feature matrix by adopting a gating mechanism, and obtaining a space-time feature matrix through full connection processing;
the space-time feature matrix and an adaptive adjacency matrix are passed through a stacked graph convolution network to obtain a graph convolution feature matrix;
obtaining the Q, K and V matrices of a self-attention mechanism from the graph convolution feature matrix by linear transformation, and then performing the self-attention mechanism to obtain a self-attention feature matrix;
residual connection is carried out on the self-attention characteristic matrix and the graph convolution characteristic matrix, and then normalization processing is carried out, so that a predicted load value matrix is obtained;
wherein obtaining the Q, K and V matrices of the self-attention mechanism from the graph convolution feature matrix by linear transformation and then performing the self-attention mechanism to obtain the self-attention feature matrix comprises:
obtaining the Q and K matrices of a multi-head attention mechanism from the graph convolution feature matrix by linear transformation, and obtaining the V matrix from the graph convolution feature matrix by linear transformation; passing each head of the Q and K matrices through a multi-scale convolutional neural network to obtain each head of the multi-scale Q and K matrices; performing the self-attention mechanism between each head of the multi-scale Q and K matrices and the V matrix; and finally splicing the self-attention results corresponding to the heads to obtain the self-attention feature matrix;
wherein the Q and K matrices of the multi-head attention mechanism are obtained from the graph convolution feature matrix by linear transformation as follows:

$Q_i = X_{GCN} W_i^{Q}$, $K_i = X_{GCN} W_i^{K}$, $i = 1, \dots, h$

wherein the input matrix is divided into $h$ heads, $Q_i$ is the $i$-th head of the query matrix, $K_i$ is the $i$-th head of the key matrix, $W_i^{Q}$ and $W_i^{K}$ are the learnable parameter matrices of the $i$-th head for generating the query matrix and the key matrix respectively, and $X_{GCN}$ is the graph convolution feature matrix;

the V matrix is obtained from the graph convolution feature matrix by linear transformation as follows:

$V = X_{GCN} W^{V}$

wherein $V$ is the value matrix and $W^{V}$ is the learnable parameter matrix for generating the value matrix;

each head of the Q and K matrices is passed through the multi-scale convolutional neural network to obtain each head of the multi-scale Q and K matrices as follows:

$\hat{Q}_i = \mathrm{MSCN}(Q_i)$, $\hat{K}_i = \mathrm{MSCN}(K_i)$, with $\mathrm{MSCN}(X_{in}) = \sum_{k=1}^{n} \alpha_k (W_k * X_{in} + b_k)$

wherein $\hat{Q}_i$ and $\hat{K}_i$ are the $i$-th heads of the multi-scale query matrix and key matrix obtained by the multi-scale convolutional neural network, $n$ is the number of convolution layers, $\alpha_k$ is the weight of the $k$-th convolution layer, $W_k$ is the learnable parameter matrix of the $k$-th convolution layer, $X_{in}$ is the input matrix of the MSCN, and $b_k$ is the bias coefficient of the $k$-th convolution layer.
2. The probabilistic power load prediction method based on a graph convolution network model according to claim 1, wherein the global convolution channel comprises a convolution layer and a fully connected layer.
3. The probabilistic power load prediction method based on a graph convolution network model according to claim 1, wherein the local convolution channel comprises a convolution layer, a pooling layer and a fully connected layer.
4. The probabilistic power load prediction method based on a graph convolution network model according to claim 1, wherein processing the local time sequence feature matrix and the global time sequence feature matrix by a gating mechanism comprises:
performing an activation operation on the local time sequence feature matrix with an activation function, and then adding the result to the global time sequence feature matrix.
5. The probabilistic power load prediction method based on a graph convolution network model according to claim 1, wherein the Q, K and V matrices of the self-attention mechanism are obtained from the graph convolution feature matrix by linear transformation as follows:

$Q = X_{GCN} W^{Q}$, $K = X_{GCN} W^{K}$, $V = X_{GCN} W^{V}$

wherein $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $W^{Q}$, $W^{K}$ and $W^{V}$ are the learnable parameter matrices for generating the query, key and value matrices respectively, and $X_{GCN}$ is the graph convolution feature matrix.
6. The probabilistic power load prediction method based on a graph convolution network model according to claim 1, wherein performing the self-attention mechanism between each head of the multi-scale Q and K matrices and the V matrix, and splicing the self-attention results corresponding to the heads to obtain the self-attention feature matrix, comprises:

$F_{att} = \mathrm{Concat}\left( \mathrm{SoftMax}\left( \frac{\hat{Q}_i \hat{K}_i^{T}}{\sqrt{d_k}} \right) V \right)_{i=1}^{h}$

wherein $F_{att}$ is the self-attention feature matrix, $T$ denotes the transpose operation, $\mathrm{Concat}$ is the splicing function and $\mathrm{SoftMax}$ is the normalization function.
CN202311222388.5A 2023-09-21 2023-09-21 Probability-oriented power load prediction method based on graph convolution network model Active CN116960991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311222388.5A CN116960991B (en) 2023-09-21 2023-09-21 Probability-oriented power load prediction method based on graph convolution network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311222388.5A CN116960991B (en) 2023-09-21 2023-09-21 Probability-oriented power load prediction method based on graph convolution network model

Publications (2)

Publication Number Publication Date
CN116960991A (en) 2023-10-27
CN116960991B (en) 2023-12-29

Family

ID=88458790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311222388.5A Active CN116960991B (en) 2023-09-21 2023-09-21 Probability-oriented power load prediction method based on graph convolution network model

Country Status (1)

Country Link
CN (1) CN116960991B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117709394A (en) * 2024-02-06 2024-03-15 华侨大学 Vehicle track prediction model training method, multi-model migration prediction method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991090A (en) * 2021-02-05 2021-06-18 江南大学 Photovoltaic power prediction method based on Transformer model
CN114707772A (en) * 2022-06-06 2022-07-05 山东大学 Power load prediction method and system based on multi-feature decomposition and fusion
CN115578851A (en) * 2022-07-14 2023-01-06 西北师范大学 Traffic prediction method based on MGCN
CN115600744A (en) * 2022-10-20 2023-01-13 中国烟草总公司重庆市公司(Cn) Method for predicting population quantity of shared space-time attention convolutional network based on mobile phone data
WO2023030513A1 (en) * 2021-09-05 2023-03-09 汉熵通信有限公司 Internet of things system
CN115862324A (en) * 2022-11-24 2023-03-28 南京邮电大学 Space-time synchronization graph convolution neural network for intelligent traffic and traffic prediction method
CN116258260A (en) * 2023-02-20 2023-06-13 浙江财经大学 Probability power load prediction method based on gating double convolution neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991090A (en) * 2021-02-05 2021-06-18 江南大学 Photovoltaic power prediction method based on Transformer model
WO2023030513A1 (en) * 2021-09-05 2023-03-09 汉熵通信有限公司 Internet of things system
CN114707772A (en) * 2022-06-06 2022-07-05 山东大学 Power load prediction method and system based on multi-feature decomposition and fusion
CN115578851A (en) * 2022-07-14 2023-01-06 西北师范大学 Traffic prediction method based on MGCN
CN115600744A (en) * 2022-10-20 2023-01-13 中国烟草总公司重庆市公司(Cn) Method for predicting population quantity of shared space-time attention convolutional network based on mobile phone data
CN115862324A (en) * 2022-11-24 2023-03-28 南京邮电大学 Space-time synchronization graph convolution neural network for intelligent traffic and traffic prediction method
CN116258260A (en) * 2023-02-20 2023-06-13 浙江财经大学 Probability power load prediction method based on gating double convolution neural network

Also Published As

Publication number Publication date
CN116960991A (en) 2023-10-27

Similar Documents

Publication Publication Date Title
Zhang et al. A novel combination forecasting model for wind power integrating least square support vector machine, deep belief network, singular spectrum analysis and locality-sensitive hashing
CN110379506B (en) Arrhythmia detection method using binarization neural network for electrocardiogram data
CN116960991B (en) Probability-oriented power load prediction method based on graph convolution network model
CN112633478A (en) Construction of graph convolution network learning model based on ontology semantics
CN113673775A (en) Time-space combination prediction method based on CNN-LSTM and deep learning
CN109492748B (en) Method for establishing medium-and-long-term load prediction model of power system based on convolutional neural network
CN103440525B (en) Lake and reservoir water bloom emergency treatment decision-making method based on Vague value similarity measurement improved algorithm
CN101324926B (en) Method for selecting characteristic facing to complicated mode classification
Patel et al. An algorithm to construct decision tree for machine learning based on similarity factor
CN106779219A (en) A kind of electricity demand forecasting method and system
Zuo et al. Representation learning of knowledge graphs with entity attributes and multimedia descriptions
CN112182221A (en) Knowledge retrieval optimization method based on improved random forest
CN113268370A (en) Root cause alarm analysis method, system, equipment and storage medium
CN111062511B (en) Aquaculture disease prediction method and system based on decision tree and neural network
CN115115113A (en) Equipment fault prediction method and system based on graph attention network relation embedding
Hu et al. Lightweight multi-scale network with attention for facial expression recognition
CN113918727A (en) Construction project knowledge transfer method based on knowledge graph and transfer learning
CN116401561B (en) Time-associated clustering method for equipment-level running state sequence
CN116842848A (en) Industrial process soft measurement modeling method and system based on hybrid search evolution optimization
CN114819253A (en) Urban crowd gathering hotspot area prediction method, system, medium and terminal
Zhang et al. Compressing knowledge graph embedding with relational graph auto-encoder
Wu et al. Lightweight compressed depth neural network for tomato disease diagnosis
Ren et al. Recognition of common pests in agriculture and forestry based on convolutional neural networks
CN110188692A (en) A kind of reinforcing that effective target quickly identifies circulation Cascading Methods
CN114343676B (en) Electroencephalogram emotion recognition method and device based on self-adaptive hierarchical graph neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant