CN111160620B - Short-term wind power prediction method based on end-to-end memory network - Google Patents

Short-term wind power prediction method based on end-to-end memory network

Info

Publication number
CN111160620B
CN111160620B CN201911247976.8A
Authority
CN
China
Prior art keywords
data
vector
node
memory network
historical data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911247976.8A
Other languages
Chinese (zh)
Other versions
CN111160620A (en)
Inventor
祝永晋
查满霞
严佳欣
谢林枫
马吉科
龙玲莉
李同哲
司加胜
周德宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Jiangsu Fangtian Power Technology Co Ltd
Original Assignee
Southeast University
Jiangsu Fangtian Power Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University and Jiangsu Fangtian Power Technology Co Ltd
Priority to CN201911247976.8A
Publication of CN111160620A
Application granted
Publication of CN111160620B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply


Abstract

The invention discloses a short-term wind power prediction method based on an end-to-end memory network, comprising the following steps: collecting wind farm data; dividing the collected data into historical data, predicted multi-node weather data, and power data, and normalizing the historical data and the predicted multi-node weather data; encoding the historical data with a multi-head self-attention model and storing the encoded feature vectors in the memory pool of an end-to-end memory network; encoding the predicted multi-node weather data with an attention mechanism and using the encoding result as the input vector of the end-to-end memory network; training the end-to-end memory network with the power data as its output vector; and predicting the wind turbines' output power with the trained end-to-end memory network. Compared with conventional short-term wind power prediction methods, the disclosed method attends to information implicit in the historical data and achieves higher prediction accuracy and stability.

Description

Short-term wind power prediction method based on end-to-end memory network
Technical Field
The invention relates to the field of wind power prediction, in particular to a short-term wind power prediction method.
Background
Short-term wind power prediction means predicting the power that a wind farm will output over the next 1-4 hours from numerical weather prediction (NWP) data such as wind speed, temperature, wind direction, radiation, precipitation, and cloud cover, together with the wind farm's historical data. Accurately predicting a wind farm's generated power over the short term lets the national grid adjust power dispatch in time, improving power quality and economic benefit and safeguarding the safe, stable operation of the power system.
Wind power prediction methods fall into three main classes: physical methods, statistical methods, and artificial-intelligence methods. Physical methods build a physical model from existing knowledge to describe the conversion of wind energy to kinetic energy and compute the turbine's output power from its power curve; because the model parameters are difficult to select accurately, their prediction accuracy is low. Statistical methods mine statistical regularities in the wind farm's historical data to establish a nonlinear mapping between data samples; they include the autoregressive integrated moving average (ARIMA) method, the persistence method, and Kalman filtering.
Artificial-intelligence methods combine models such as support vector machines and artificial neural networks and cast wind power prediction as a time-series modeling problem through nonlinear analysis of large amounts of historical data. Among existing short-term prediction methods, some BP-neural-network approaches initialize the network's weight thresholds with the imperialist competitive algorithm or particle swarm optimization and outperform plain BP networks, but they still ignore the temporal correlation of wind power. Other LSTM-based methods use principal component analysis to screen out the weather factors that strongly influence power and then model the weather time series with an LSTM.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, the invention provides a short-term wind power prediction method based on numerical weather prediction data and data collected by a SCADA (supervisory control and data acquisition) system. It first obtains feature codes of the historical data with a multi-head self-attention (Transformer) model, then obtains feature codes of the predicted multi-node weather data with an attention mechanism, and finally combines the obtained feature codes in an end-to-end memory network (MemN2N) model to predict the power.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
a short-term wind power prediction method based on an end-to-end memory network is characterized by comprising the following steps:
collecting data of a wind power plant;
dividing the collected wind power plant data into historical data, predicted multi-node weather data and power data, and carrying out normalization processing on the historical data and the predicted multi-node weather data;
encoding historical data with a Transformer model and storing the encoded feature vectors in the memory pool of MemN2N; encoding the predicted multi-node weather data with an attention mechanism and using the encoding result as the input vector of MemN2N; training MemN2N with the power data as its output vector;
and predicting the wind turbines' output power with the trained MemN2N network.
Collecting the wind farm data further comprises preprocessing the collected data, specifically: examine the data of each node; if the proportion of null data in a node is greater than or equal to 35%, delete the node; otherwise, fill the node's null data with zeros.
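The 35% null-data rule can be sketched as follows; the function name `preprocess_nodes`, the NaN marking of null entries, and the list-of-arrays node layout are illustrative assumptions rather than the patent's exact data structures.

```python
import numpy as np

def preprocess_nodes(nodes, null_ratio_threshold=0.35):
    """Drop nodes whose null-data ratio is >= 35%; zero-fill the rest.

    `nodes` is assumed to be a list of 1-D arrays (one per data node),
    with NaN marking null entries.
    """
    kept = []
    for node in nodes:
        arr = np.asarray(node, dtype=float)
        null_ratio = np.isnan(arr).mean()
        if null_ratio >= null_ratio_threshold:
            continue  # delete the node entirely
        kept.append(np.nan_to_num(arr, nan=0.0))  # zero-element filling
    return kept
```

For example, a node with one null value out of four (25%) is kept with the null replaced by zero, while a node with two nulls out of three (67%) is deleted.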
The historical data are constructed by concatenating numerical weather prediction data with SCADA-collected data; the numerical weather prediction data comprise wind speed and wind direction at different altitudes, air temperature, air pressure, radiation, precipitation, and cloud cover, and the SCADA-collected data comprise the turbine number and the wind speed, wind direction, temperature, and power measured by the turbine's sensors.
The method for dividing and normalizing the data is specifically: for each time node, take the numerical weather prediction data and SCADA-collected data of the N preceding nodes as historical data, i.e. X = {x_1, x_2, ..., x_N}; take the numerical weather prediction data of the M following nodes as the predicted multi-node weather data, i.e. B = {b_1, b_2, ..., b_M}; take the actual power data of the M following nodes as the power data, i.e. the model output; then normalize the historical data and the predicted multi-node weather data to the interval [0, 1], leaving the model output unnormalized.
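The [0, 1] normalization applied to X and B (but not to the power targets) can be sketched as standard column-wise min-max scaling; the handling of constant columns is an assumption added here to keep the sketch well-defined.

```python
import numpy as np

def minmax_normalize(data):
    """Normalize each feature column to [0, 1], as done for the
    historical data X and the predicted weather data B; the power
    targets are left unnormalized. Constant columns map to 0."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (data - lo) / span
```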
The method for encoding historical data with the Transformer model comprises:
(1) the Transformer multi-head mechanism divides the encoder into T heads; in each head, randomly initialize three matrices W^Q, W^K, W^V of size |x| × d_k, where x ∈ X is any historical data node, |x| is the number of data in x, and d_k is the subspace coding length;
(2) multiply each historical datum x_n ∈ X, 1 ≤ n ≤ N, by W^Q, W^K, and W^V respectively to obtain the query vector Q_n, key vector K_n, and value vector V_n;
(3) for the current historical data node x_i, dot-multiply its query vector Q_i with the key vectors K_1, K_2, ..., K_N of all historical data to compute the score of each historical datum x_n at x_i, S = {s_1, s_2, ..., s_N};
(4) transform each score s_n ∈ S into s̃_n = exp(s_n/√d_k) / Σ_{j=1}^N exp(s_j/√d_k), i.e. divide by √d_k and apply a softmax operation; then multiply each s̃_n by its corresponding value vector V_n and sum to obtain x_i's single-head precoding vector e_1;
(5) perform (2)-(4) on x_i in all heads to obtain x_i's precoding vectors {e_1, e_2, ..., e_T} in all heads; concatenate all items of {e_1, e_2, ..., e_T} to obtain x_i's precoding; sum x_i's precoding with x_i to obtain ĥ_i; pass ĥ_i through a feed-forward neural network to obtain h̃_i; sum ĥ_i and h̃_i to obtain x_i's single-layer coding result h_i;
(6) perform (2)-(5) on all historical data nodes x_n ∈ X to obtain the single-layer coding results of all historical data;
(7) for an L-layer encoder, use the single-layer coding result of (6) as the input of the next-layer encoder and perform (1)-(6) to obtain the final coding of all historical data, H = {h_1, h_2, ..., h_N};
where T, N, d_k, and L are hyperparameters: T is the number of heads in each encoder, N is the number of historical data nodes, d_k is the scaling factor, generally set to the dimension of the key vectors K, and L is the number of encoder layers.
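The per-layer encoding steps above (scaled dot-product attention per head, head concatenation, residual sum, feed-forward network) can be sketched in NumPy as follows. The function names, the weight shapes, and the single linear matrix standing in for the feed-forward network are simplifying assumptions, not the patent's exact parameterization.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_layer(X, heads, W_ffn):
    """One encoder layer over N historical data nodes.

    X: (N, d) node features; `heads` is a list of (W_Q, W_K, W_V)
    triples, each of shape (d, d_k) with d_k = d // T; W_ffn: (d, d)
    linear map standing in for the feed-forward network.
    """
    outputs = []
    for W_Q, W_K, W_V in heads:
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V   # step (2)
        d_k = K.shape[1]
        attn = softmax(Q @ K.T / np.sqrt(d_k))  # steps (3)-(4): scaled scores
        outputs.append(attn @ V)                # weighted sum of value vectors
    pre = np.concatenate(outputs, axis=1)       # step (5): concatenate heads
    h_hat = pre + X                             # residual sum with the input
    h_tilde = h_hat @ W_ffn                     # feed-forward (linear stand-in)
    return h_hat + h_tilde                      # single-layer coding result
```

Stacking L such layers, each consuming the previous layer's output, yields the final coding H.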
The operation of encoding the predicted multi-node weather data with the attention mechanism is specifically: for the predicted multi-node weather data B = {b_1, b_2, ..., b_M}, assign each node b_i ∈ B a weight α_i, 1 ≤ i ≤ M; multiply each node by its corresponding weight and sum, taking the result u as the input of the MemN2N model; M is a hyperparameter, the number of predicted nodes.
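The weighted-sum encoding of the forecast nodes reduces to u = Σ_i α_i b_i, sketched below; in the patent the weights α_i are learned, whereas here they are simply passed in, and the function name is an assumption.

```python
import numpy as np

def encode_forecast(B, alpha):
    """Weighted sum of the M predicted-weather node vectors.

    B: (M, d) forecast features; alpha: (M,) attention weights
    (learned in the patent, given here). Returns the MemN2N
    input vector u = sum_i alpha_i * b_i.
    """
    B = np.asarray(B, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return B.T @ alpha
```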
The method for predicting power with the MemN2N model comprises:
(1) load the feature vectors H into the memory pool of the MemN2N model's long-term memory component;
(2) randomly initialize two embedding matrices A and C of size |x| × len, where x ∈ X is any historical data node, |x| is the number of data in x, and len is the coding length; for each coding vector h_i ∈ H, 1 ≤ i ≤ N, multiply h_i by A and C respectively to obtain h_i's mappings m_i and c_i in two different feature spaces; perform the same operation on all coding vectors in the memory pool to obtain two feature matrices M_1 = {m_1, m_2, ..., m_N} and M_2 = {c_1, c_2, ..., c_N};
(3) multiply the input vector u with the first feature matrix M_1 and apply a softmax operation to the result to compute the attention vector P = [p_1, p_2, ..., p_N], where p_i is the attention of the input vector u on the i-th feature vector h_i;
(4) multiply each feature vector c_i in the feature matrix M_2 by its corresponding attention p_i in P; sum all the weighted feature vectors to obtain the context vector o; add o to the input u to obtain the result r_1, the input of the next layer;
(5) for a G-layer MemN2N network, repeat steps (2)-(4); dot-multiply the last layer's output vector r_G with M different vectors, each dot product yielding one predicted power; G is a hyperparameter, the number of MemN2N layers.
The short-term wind power prediction method based on the end-to-end memory network mines long-distance dependency information in the historical data with the Transformer and loads this information into the MemN2N memory pool; compared with an LSTM network, MemN2N has clear advantages in memory storage, so the prediction accuracy is higher. For multi-step prediction, the input and output components of the MemN2N network are further improved with an attention mechanism, and the correlation between consecutive multi-step weather data is modeled separately for each step size, giving higher stability in multi-step prediction.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
The invention is further illustrated below in connection with specific embodiments. It should be understood that these examples are illustrative rather than limiting; various equivalent modifications of the invention will occur to those skilled in the art upon reading this disclosure and are intended to fall within the scope of the appended claims.
A short-term wind power prediction method based on an end-to-end memory network, as shown in FIG. 1, comprises two stages: preprocessing the raw data and training the model.
The problem can be described as follows: let X = {x_1, x_2, ..., x_N} denote the historical data and Q = {q_1, q_2, ..., q_M} the predicted multi-node weather data; the task of short-term wind power prediction is to predict the wind farm's output power at M future time points from the historical data X and the weather data Q at those M future time points.
Firstly, preprocessing original data
Step 1: examine each node z_i in the raw data set Z; if the proportion of null data in the node is greater than or equal to 35%, delete the node; otherwise, fill the node's null data with zeros.
Step 2: divide the processed Z into historical data, predicted multi-node weather data, and power data. Specifically, for each node z_i in Z, take the weather data and actual power data of the preceding N nodes as historical data, i.e. X = {x_1, x_2, ..., x_N}, where N is the number of historical data nodes; take the weather data of the following M nodes (including z_i) as the predicted multi-node weather data, i.e. B = {b_1, b_2, ..., b_M}, where M is the number of predicted nodes; take the actual power data of the following M nodes as the power data, i.e. the model output. Then normalize the historical data X and the predicted multi-node weather data B to the interval [0, 1]; the model output is not normalized.
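The sliding-window split of Step 2 can be sketched as follows; the function name, the 1-D weather array, and the exact window alignment are assumptions used only to make the windowing concrete.

```python
import numpy as np

def make_samples(weather, power, N, M):
    """Sliding-window split into (X, B, y) triples.

    X: weather and power of the N preceding nodes (historical data);
    B: weather of the M following nodes (forecast input);
    y: actual power of the M following nodes (target).
    """
    weather = np.asarray(weather, dtype=float)
    power = np.asarray(power, dtype=float)
    samples = []
    for t in range(N, len(power) - M + 1):
        X = np.column_stack([weather[t - N:t], power[t - N:t]])
        B = weather[t:t + M]
        y = power[t:t + M]
        samples.append((X, B, y))
    return samples
```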
Secondly, training an end-to-end memory network prediction model
Step 3: in the Transformer encoder training stage, divide each encoder into T heads; in each head, randomly initialize three |x| × d_k matrices W^Q, W^K, W^V, where x is any historical data node, |x| is the number of data in x, and d_k is the subspace coding length. Multiply each historical data node x_n ∈ X (1 ≤ n ≤ N) by W^Q, W^K, and W^V respectively to obtain the query vector Q_n, key vector K_n, and value vector V_n.
For the current historical data node x_i, dot-multiply its query vector Q_i with the key vectors K_1, K_2, ..., K_N of all historical data to compute the score of each historical data node x_n at x_i, S = {s_1, s_2, ..., s_N}. Divide each score s_n ∈ S (1 ≤ n ≤ N) by √d_k and apply a softmax operation to obtain s̃_n; then multiply each s̃_n by its corresponding value vector V_n and sum to obtain x_i's single-head precoding vector e_1, where d_k is the scaling factor, set to the dimension of the key vectors K.
Performing the same operation on x_i in all heads gives x_i's precoding vectors {e_1, e_2, ..., e_T} in all heads; concatenating all items of {e_1, e_2, ..., e_T} gives x_i's precoding. Summing x_i's precoding with x_i gives ĥ_i; passing ĥ_i through a feed-forward neural network gives h̃_i; summing ĥ_i and h̃_i gives x_i's single-layer coding result h_i. Performing the same operation on the other historical data nodes gives the single-layer coding results of all historical data.
For an L-layer encoder, the single-layer coding result is used as the input of the next-layer encoder and the same operation is performed, giving the final coding of all historical data, H = {h_1, h_2, ..., h_N}, where N is the number of historical data nodes.
Step 4: for the predicted multi-node weather data B = {b_1, b_2, ..., b_M}, assign each node b_i ∈ B a weight α_i (1 ≤ i ≤ M); multiply each node by its corresponding weight and sum, taking the result u as the input of the MemN2N model.
Step 5: in the MemN2N parameter training stage, first load the coding result H of step 3 into the memory pool of the MemN2N model's long-term memory component. Then, for a single-layer MemN2N network, randomly initialize two |x| × len embedding matrices A and C; for each coding vector h_i ∈ H (1 ≤ i ≤ N), multiply h_i by A and C respectively to obtain h_i's mappings m_i and c_i in two different feature spaces; perform the same operation on all coding vectors in the memory pool to obtain two feature matrices M_1 = {m_1, m_2, ..., m_N} and M_2 = {c_1, c_2, ..., c_N}. Multiply the input vector u obtained in step 4 with the first feature matrix M_1 and apply a softmax operation to the result to compute the attention vector P = [p_1, p_2, ..., p_N], where p_i (1 ≤ i ≤ N) is the attention of the input vector u on the i-th feature vector. Multiply each feature vector c_i in the feature matrix M_2 by its corresponding attention p_i in P, sum all the weighted feature vectors to obtain the context vector o, and add o to the input u; the result r_1 is the input of the next layer. Here len denotes the coding length.
Finally, for the G-layer MemN2N network, the single-layer operations are repeated; the last layer's output vector r_G is dot-multiplied with M different vectors, each dot product yielding one predicted power.
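The final readout above amounts to M independent dot products against the last layer's output; a minimal sketch follows, where the function name and the assumption that the M step-specific vectors are learned parameters passed in as a matrix are illustrative.

```python
import numpy as np

def predict_powers(r_G, output_vectors):
    """Dot-multiply the last layer's output r_G with M step-specific
    vectors; each dot product is one predicted power.

    r_G: (len,) output of the final MemN2N layer;
    output_vectors: (M, len) learned readout vectors, one per
    prediction step.
    """
    return np.asarray(output_vectors, dtype=float) @ np.asarray(r_G, dtype=float)
```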
Thirdly, testing the trained model
For the weather data nodes to be predicted and their corresponding historical data nodes, input the historical data into the Transformer, compute over the predicted weather data with the attention mechanism, and feed both results into MemN2N; the model's output is the predicted power.
The algorithm flow of the short-term wind power prediction method based on the end-to-end memory network is given in the original as an image-rendered listing (not reproduced here).
In the experiments, the parameters were set as follows: the number of Transformer encoder layers L is 6, the number of heads T in each encoder is 4, the number of MemN2N layers G is 3, and the coding length len is 100. The proposed short-term wind power prediction method based on the end-to-end memory network was tested on a measured data set from a power company in Jiangsu; the root-mean-square errors at prediction steps 2, 4, 8, and 16 are 401 kW, 408 kW, 418 kW, and 435 kW respectively, versus 571 kW, 549 kW, 522 kW, and 596 kW for an LSTM model on the same data set. The error of the proposed method fluctuates around 72% of the LSTM model's, so its prediction performance is superior to the LSTM model's.
The short-term wind power prediction method based on the end-to-end memory network not only mines the temporal correlation of wind power but also alleviates the information loss that arises in long-sequence analysis, helping the national grid adjust power dispatch in time.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (7)

1. A short-term wind power prediction method based on an end-to-end memory network is characterized by comprising the following steps:
collecting data of a wind power plant;
dividing the collected wind power plant data into historical data, predicted multi-node weather data and power data, and carrying out normalization processing on the historical data and the predicted multi-node weather data;
encoding historical data by using a multi-head self-attention mechanism model, and storing the encoded feature vectors into a memory pool of an end-to-end memory network; encoding the predicted multi-node weather data by adopting an attention mechanism, and taking an encoding result as an input vector of an end-to-end memory network; taking the power data as an output vector of the end-to-end memory network to train the end-to-end memory network;
and predicting the wind turbines' output power with the trained end-to-end memory network.
2. The short-term wind power prediction method based on the end-to-end memory network according to claim 1, wherein collecting the wind farm data further comprises preprocessing the collected data, specifically: examine the data of each node; if the proportion of null data in a node is greater than or equal to 35%, delete the node; otherwise, fill the node's null data with zeros.
3. The short-term wind power prediction method based on the end-to-end memory network according to claim 1, wherein the historical data are constructed by concatenating numerical weather prediction data with SCADA-collected data; the numerical weather prediction data comprise wind speed and wind direction at different altitudes, temperature, air pressure, radiation, precipitation, and cloud cover, and the SCADA-collected data comprise the turbine number and the wind speed, wind direction, temperature, and power measured by the turbine's sensors.
4. The short-term wind power prediction method based on the end-to-end memory network according to claim 1, wherein the method for dividing and normalizing data is specifically: for each time node, take the numerical weather prediction data and SCADA-collected data of the N preceding nodes as historical data, i.e. X = {x_1, x_2, ..., x_N}; take the numerical weather prediction data of the M following nodes as the predicted multi-node weather data, i.e. B = {b_1, b_2, ..., b_M}; take the actual power data of the M following nodes as the power data, i.e. the model output; then normalize the historical data and the predicted multi-node weather data to the interval [0, 1], leaving the model output unnormalized.
5. The short-term wind power prediction method based on the end-to-end memory network according to claim 1, wherein the method for encoding historical data with the multi-head self-attention model comprises:
(1) the multi-head mechanism of the multi-head self-attention model divides the encoder into T heads; in each head, randomly initialize three matrices W^Q, W^K, W^V of size |x| × d_k, where x ∈ X is any historical data node, |x| is the number of data in x, and d_k is the subspace coding length;
(2) multiply each historical datum x_n ∈ X, 1 ≤ n ≤ N, by W^Q, W^K, and W^V respectively to obtain the query vector Q_n, key vector K_n, and value vector V_n;
(3) for the current historical data node x_i, dot-multiply its query vector Q_i with the key vectors K_1, K_2, ..., K_N of all historical data to compute the score of each historical datum x_n at x_i, S = {s_1, s_2, ..., s_N};
(4) transform each score s_n ∈ S into s̃_n = e^(s_n/√d_k) / Σ_{j=1}^N e^(s_j/√d_k), where e is the exponential operator and d_k is the scaling factor, set to the dimension of the key vector K_n; then multiply each s̃_n by its corresponding value vector V_n and sum to obtain x_i's single-head precoding vector e_1;
(5) perform operations (2)-(4) on x_i in all heads to obtain x_i's precoding vectors {e_1, e_2, ..., e_T} in all heads; concatenate all items of {e_1, e_2, ..., e_T} to obtain x_i's precoding; sum x_i's precoding with x_i to obtain ĥ_i; pass ĥ_i through a feed-forward neural network to obtain h̃_i; sum ĥ_i and h̃_i to obtain x_i's single-layer coding result h_i;
(6) perform (2)-(5) on all historical data nodes x_n ∈ X to obtain the single-layer coding results of all historical data;
(7) for an L-layer encoder, use the single-layer coding result of (6) as the input of the next-layer encoder and perform (1)-(6) to obtain the final coding of all historical data, H = {h_1, h_2, ..., h_N};
where T, N, d_k, and L are hyperparameters: T is the number of heads in each encoder, N is the number of historical data nodes, d_k is the scaling factor, set to the dimension of the key vectors K, and L is the number of encoder layers.
6. The short-term wind power prediction method based on the end-to-end memory network according to claim 5, wherein the operation of encoding the predicted multi-node weather data with the attention mechanism is specifically: for the predicted multi-node weather data B = {b_1, b_2, ..., b_M}, assign each node b_i ∈ B a weight α_i, 1 ≤ i ≤ M; multiply each node by its corresponding weight and sum, taking the result u as the input of the end-to-end memory network model; M is a hyperparameter, the number of predicted nodes.
7. The end-to-end memory network-based short-term wind power prediction method according to claim 6, wherein the method for performing power prediction by using an end-to-end memory network model comprises the following steps:
(1) introducing the characteristic vector H into a memory pool of a long-term memory component of the end-to-end memory network model;
(2) two embedding matrices A and C are randomly initialized, wherein x ∈ X represents any one historical data node, |x| represents the number of data items in x, and len represents the code length; for each encoding vector h_i ∈ H (1 ≤ i ≤ N), h_i is multiplied with A and C respectively to obtain the mappings m_i and c_i of h_i in two different feature spaces; the same operation is performed on all encoding vectors in the memory pool to obtain two feature matrices M_1 = {m_1, m_2, …, m_N} and M_2 = {c_1, c_2, …, c_N};
(3) the obtained input vector u is multiplied with the first feature matrix M_1, and the attention vector P = [p_1, p_2, …, p_N] is calculated from the result by a softmax operation, wherein p_i is the attention of the input vector u to the i-th feature vector h_i;
(4) each feature vector c_i in the feature matrix M_2 is multiplied with its corresponding attention p_i in the vector P; all weighted feature vectors are summed to obtain a context vector o, and the context vector o is added to the input u to obtain the result r_1 as the input of the next layer;
(5) for the G-layer end-to-end memory network, steps (2)-(4) are repeated; the output vector r_G of the last layer is dot-multiplied with M different vectors respectively, the result of each dot product being one predicted power; wherein G is a hyper-parameter, the number of layers of the end-to-end memory network.
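The prediction procedure of claim 7 mirrors the read operation of an end-to-end memory network (MemN2N). A minimal NumPy sketch under stated assumptions: square embedding matrices A and C so that the residual o + u type-checks, and a hypothetical output matrix W_out whose M columns stand in for the "M different vectors" dotted with r_G in the final step:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memory_layer(H, u, A, C):
    """One memory layer (steps (2)-(4)). H: (N, len) encodings in the
    memory pool; u: (len,) input vector; A, C: (len, len) embedding
    matrices mapping each h_i into two feature spaces."""
    M1 = H @ A                  # rows are the mappings m_i
    M2 = H @ C                  # rows are the mappings c_i
    p = softmax(M1 @ u)         # attention of u over the N memory slots
    o = p @ M2                  # context vector: attention-weighted sum
    return o + u                # r = o + u, input to the next layer

def predict(H, u, layers, W_out):
    """Stack G memory layers, then dot r_G with the M columns of W_out,
    each dot product yielding one predicted power (step (5))."""
    r = u
    for A, C in layers:         # G (A, C) pairs, one per layer
        r = memory_layer(H, r, A, C)
    return r @ W_out            # (M,) predicted powers
```

In the original MemN2N formulation adjacent layers typically share or tie their embedding matrices; the claim leaves this open, so each layer here gets its own (A, C) pair.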
CN201911247976.8A 2019-12-06 2019-12-06 Short-term wind power prediction method based on end-to-end memory network Active CN111160620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911247976.8A CN111160620B (en) 2019-12-06 2019-12-06 Short-term wind power prediction method based on end-to-end memory network


Publications (2)

Publication Number Publication Date
CN111160620A CN111160620A (en) 2020-05-15
CN111160620B true CN111160620B (en) 2022-06-17

Family

ID=70556524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911247976.8A Active CN111160620B (en) 2019-12-06 2019-12-06 Short-term wind power prediction method based on end-to-end memory network

Country Status (1)

Country Link
CN (1) CN111160620B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653142B (en) * 2020-12-18 2022-09-02 武汉大学 Wind power prediction method and system for optimizing depth transform network
CN112991090B (en) * 2021-02-05 2023-10-20 江南大学 Photovoltaic power prediction method based on transducer model
CN113011673A (en) * 2021-03-31 2021-06-22 新奥数能科技有限公司 Method and device for monitoring and early warning water level of cooling tower
CN112949945B (en) * 2021-04-15 2022-09-02 河海大学 Wind power ultra-short-term prediction method for improving bidirectional long-term and short-term memory network
CN113408785B (en) * 2021-05-20 2023-04-07 上海晨翘智能科技有限公司 Method, device, equipment and storage medium for predicting optical power
CN113988449B (en) * 2021-11-05 2024-04-12 国家电网有限公司西北分部 Wind power prediction method based on transducer model
CN114239971A (en) * 2021-12-20 2022-03-25 浙江大学 Daily precipitation prediction method based on Transformer attention mechanism
CN114091361B (en) * 2022-01-24 2022-05-17 中汽数据(天津)有限公司 Weather event-based transform model construction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108711847A (en) * 2018-05-07 2018-10-26 国网山东省电力公司电力科学研究院 A kind of short-term wind power forecast method based on coding and decoding shot and long term memory network
CN109214575A (en) * 2018-09-12 2019-01-15 河海大学 A kind of super short-period wind power prediction technique based on small wavelength short-term memory network
CN109802430A (en) * 2018-12-29 2019-05-24 上海电力学院 A kind of wind-powered electricity generation power grid control method based on LSTM-Attention network




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant