CN115622047B - Power Transformer load prediction method based on Transformer model


Info

Publication number
CN115622047B
CN115622047B (application CN202211379043.6A)
Authority
CN
China
Prior art keywords
layer
power transformer
data
model
head attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211379043.6A
Other languages
Chinese (zh)
Other versions
CN115622047A (en)
Inventor
何霆
王屾
朱文龙
陈世茂
曾建华
杨子骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhonghai Energy Storage Technology Beijing Co Ltd
Original Assignee
Zhonghai Energy Storage Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhonghai Energy Storage Technology Beijing Co Ltd filed Critical Zhonghai Energy Storage Technology Beijing Co Ltd
Priority to CN202211379043.6A
Publication of CN115622047A
Application granted
Publication of CN115622047B
Legal status: Active

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/003Load forecast, e.g. methods or systems for forecasting future load demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Power Engineering (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention provides a power transformer load prediction method based on a Transformer model, which comprises the following steps: collecting load data of a power transformer and arranging the collected data in time order to obtain a sequential sample data set; dividing the data set into a training set, a test set and a verification set, ensuring that each set's sampling period represents characteristic variation over the same period; defining and establishing an interactive multi-head attention Transformer model and initializing the network's internal parameters and learning rate; and constructing a three-layer decoder from a multi-head attention layer and a multi-head attention interaction layer. The proposed method better captures long-range dependencies in long sequence data, thereby achieving accurate prediction of the power transformer load, and has practical value in smart grid construction.

Description

Power Transformer load prediction method based on Transformer model
Technical Field
The invention belongs to the technical field of power metering data processing, and particularly relates to a method for predicting the load of a power transformer.
Background
A smart grid achieves reliable, safe, economical, efficient and environmentally friendly operation of the power grid through advanced sensing, measurement and control technologies. The power transformer is a key device in power grid construction, and making accurate long-term load predictions from historical operating data is an important precondition for building a smart grid. Power transformer load prediction takes historical time series data as the data source, establishes a mathematical load prediction model using techniques such as data mining and deep learning, and predicts the transformer load from the established model, which helps realize reasonable power distribution and reduce power waste.
With the continuously increasing installed capacity of wind power, grid-connected wind power has an ever larger technical and economic effect on the main grid, and transformer data processing becomes more challenging. Because grid-connected operation of a wind farm can negatively affect the grid's power quality, voltage stability and safety, accurate prediction of the power transformer load can effectively improve power quality and voltage stability. Therefore, reasonably estimating the power transformer load can effectively reduce unnecessary power waste and fully exploit the smart grid's decision-support role.
A power transformer has a complex structure and nonlinear material parameters, so during power distribution the transformer is usually adjusted conservatively. In practice, predicting the load of a power transformer is very difficult, because it is affected by factors such as weather, temperature, season and environment and therefore exhibits complex variation characteristics. Existing power transformer load prediction methods generally fall into two categories: statistical models represented by ARIMA and Prophet, and autoregressive models represented by RNNs. These methods typically make short-term predictions from single or few variables, offer limited prediction horizon and accuracy, and struggle with the large volumes of high-dimensional data and complex temporal dependencies found in real applications, so they are not well suited to practical use.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a power transformer load prediction method based on an interactive multi-head attention Transformer model. Built on the encoder-decoder framework of the Transformer model, it uses depthwise separable convolution to realize information interaction among the different subspaces of conventional multi-head attention, improving the model's ability to fit the data, and uses a max pooling layer to distill the time series data, reducing memory overhead during model training and achieving accurate prediction of the power transformer load.
A second object of the invention is to propose an application using the above prediction method.
A third object of the invention is to propose a device using the above prediction method.
The technical scheme for realizing the purposes of the invention is as follows:
a power Transformer load prediction method based on a transducer model comprises the following steps:
s1, collecting load data of a power transformer, and arranging the collected load data of the power transformer according to time to obtain a sequence sample data setx i Indicating the value of the observed variable at time i, L x Represents the length of the observed time series, d x Representing the number of observed variables;
carrying out normalization processing on the sequence sample data set so that sample values lie between 0 and 1, and using the resulting data set as samples for supervised learning;
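For illustration only, a minimal sketch of such min-max normalization (the exact scaling used by the invention is not specified; column-wise scaling and the epsilon guard are assumptions here):

```python
import numpy as np

def min_max_normalize(data: np.ndarray, eps: float = 1e-8):
    """Scale each observed variable to [0, 1] column-wise.

    data: array of shape (L_x, d_x) -- time steps x observed variables.
    Returns the scaled array and the (min, max) pair needed to invert
    the transform on predictions later.
    """
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    scaled = (data - col_min) / (col_max - col_min + eps)
    return scaled, (col_min, col_max)
```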
s2, dividing the normalized data set into a training set, a testing set and a verification set, and ensuring that each data set sampling period (the acquisition interval time) can represent a characteristic change sample of the same period;
s3: defining and establishing an interactive multi-head attention transducer model, and initializing network internal parameters and learning rate; the original data is converted into a feature vector with position information after passing through an embedding layer and a position coding layer, wherein the time sequence coding comprises a global time sequence coding and a local time sequence coding, the global time sequence coding consists of year, month and week information in a data time stamp, and a local time sequence coding formula is as follows:
$$PE_{(pos,\,2j)} = \sin\!\left(\frac{pos}{10000^{2j/d_{model}}}\right), \qquad PE_{(pos,\,2j+1)} = \cos\!\left(\frac{pos}{10000^{2j/d_{model}}}\right)$$

where $PE$ denotes the position encoding, $pos$ the position, $j$ the dimension index, and $d_{model}$ the model dimension;
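A minimal sketch of this local sinusoidal encoding (assuming an even model dimension; the concrete dimension is not fixed by the text):

```python
import torch

def local_position_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones.
    Assumes d_model is even."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    two_j = torch.arange(0, d_model, 2, dtype=torch.float32)        # the "2j" values
    div = torch.pow(10000.0, two_j / d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos / div)
    pe[:, 1::2] = torch.cos(pos / div)
    return pe  # added to the embedded inputs to inject position information
```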
s4, the transducer model consists of an encoder and a decoder, wherein in the encoder, a multi-head attention layer and a multi-head attention interaction layer are adopted for feature extraction, and the method comprises the following steps: inputting the vector with the timing information into the multi-head attention layer to obtain an intermediate value:
$$Q = XW^Q,\quad K = XW^K,\quad V = XW^V,\quad \mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

where $W^Q$, $W^K$, $W^V$ are the weight matrices and $Q$, $K$, $V$ are the resulting query, key and value vectors;
the multi-head output consists of multiple parts, each representing a subspace:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(O_1, \ldots, O_h), \qquad O_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V);$$
information interaction across the different subspaces is achieved using a depthwise separable convolution:

$$O' = \mathrm{Conv1}\big(\mathrm{Elu}(\mathrm{Conv2}(O))\big)$$

where Conv1 and Conv2 denote the depth-wise convolution and the point-wise convolution, respectively, and Elu denotes the activation function;
then, a linear transformation layer is used for feature dimension conversion, and finally a pooling layer is used for downsampling to obtain the output.
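A minimal PyTorch sketch of one such encoder block, with standard multi-head attention followed by the interaction layer (point-wise convolution, ELU, depth-wise convolution, linear transformation, stride-2 max pooling). The layer sizes and kernel width are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class AttentionInteractionBlock(nn.Module):
    """Multi-head attention plus interaction layer (sketch).

    The interaction layer mixes information across attention subspaces
    with a depthwise-separable convolution, then halves the sequence
    length with stride-2 max pooling (the 'distillation' step)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, kernel: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.act = nn.ELU()
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size=kernel,
                                   padding=kernel // 2, groups=d_model)
        self.linear = nn.Linear(d_model, d_model)
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        o, _ = self.attn(x, x, x)
        o = o.transpose(1, 2)                             # (batch, d_model, seq_len)
        o = self.depthwise(self.act(self.pointwise(o)))   # subspace interaction
        o = self.linear(o.transpose(1, 2))                # feature-dimension conversion
        o = self.pool(o.transpose(1, 2)).transpose(1, 2)  # halve seq_len
        return o
```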
s5: adopting a multi-head attention layer and a multi-head attention interaction layer to construct a three-layer decoder; first using features f from the multi-head attention interaction layer 1 And feature f from residual connection 2 Calculating weight ratioWherein->Representing a weight matrix, b g Representing the bias, sigmoid represents the activation function. Then based on the ratio, the two features f are compared 1 And f 2 Weighted summation is performed
$$\mathrm{Fusion}(f_1, f_2) = g \odot f_1 + (1 - g) \odot f_2$$
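A sketch of this gated fusion; computing the gate from the concatenation $[f_1, f_2]$ is an assumption, since the text does not spell out the exact input to $W_g$:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """g = sigmoid(W_g [f1, f2] + b_g); out = g * f1 + (1 - g) * f2."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)  # holds W_g and b_g

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([f1, f2], dim=-1)))
        return g * f1 + (1.0 - g) * f2
```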
S6: constructing a decoder from a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer performs an inner product between the Query matrix and the Key matrix to obtain contribution scores, then multiplies the scores by the Value matrix to obtain feature vectors. The multi-head attention interaction layer performs subspace information interaction on the resulting feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
The data points in S1 are arranged in time order; sampling may be at intervals of 1 hour, 15 minutes or 1 minute, and the shorter the interval the finer the data. In the feature extraction part of S4, the conventional multi-head attention mechanism splits the features into several blocks without considering information interaction between the different subspaces, which limits the model's ability to extract features from time series data. The invention improves the attention mechanism in the model: through convolution, the blocks become interrelated and longer-term data can be predicted. Specifically, a multi-head attention interaction layer is introduced on top of the multi-head attention mechanism, using depthwise separable convolution to realize information interaction across subspaces. This reduces memory overhead during model training, allows features to be selected adaptively, and filters out redundant information.
The sensors collect data related to the power transformer load using temperature measuring elements, ammeters and voltmeters; the data comprise one or more of load, oil temperature, location, climate and demand.
Further, in step S4:
the output vector generated by the multi-head attention layer undergoes information interaction through the multi-head attention interaction layer, which consists of a depthwise separable convolution, a linear transformation layer and a max pooling layer. For the output tensor formed by the multi-head self-attention mechanism, a 1×1 point-wise convolution first aggregates information along the channel dimension; after an ELU activation function, a depth-wise convolution performs information interaction along the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a max pooling layer with a stride of 2 performs the time series distillation operation. This halves the sequence length at each encoder layer and filters out redundant information, thereby reducing memory consumption during training.
In S2, the preprocessed data set is divided into a training set, a test set and a verification set in a 7:2:1 ratio, and each data set's sampling period (i.e. the acquisition interval) represents characteristic variation over the same period.
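A sketch of such a chronological 7:2:1 split (order-preserving, so each split spans whole acquisition periods; the helper name is illustrative):

```python
def chronological_split(data, train_frac=0.7, test_frac=0.2):
    """Split a time-ordered array into train/test/verification (7:2:1),
    preserving temporal order."""
    n = len(data)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    return (data[:n_train],                       # training set
            data[n_train:n_train + n_test],       # test set
            data[n_train + n_test:])              # verification set
```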
Further, in step S4:
the input part of the decoder is denoted asWherein (1)>The value of the last k time steps from the Encoder input,/->Placeholders (filled with 0 s) as target sequences to be predicted; finally, full connectionThe layer is used to output a predicted value whose dimension depends on the number of variables that need to be predicted.
In step S4, a mean squared error (MSE) loss function and the Adam stochastic gradient descent algorithm are used during the network convergence process.
Adam dynamically adapts the learning rate of each parameter and introduces momentum, giving parameter updates more opportunity to escape local optima and thereby accelerating and stabilizing network convergence.
The training process feeds inputs to the model and iterates via gradient descent to reduce the error.
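As an illustration, one training epoch with the MSE loss and Adam might look as follows (the learning rate is an assumption, not a value given by the text):

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of MSE training; x is the history window, y the target."""
    criterion = nn.MSELoss()
    model.train()
    total = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr illustrative
```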
The power transformer prediction method based on the Transformer model further comprises S7: performing a fitting evaluation of the model and using early stopping to prevent overfitting during training; after each training round, the model is validated on the verification set obtained in step S2, and if the test error on the verification set rises as the number of training rounds increases, training is stopped; the weights at stopping are taken as the final network parameters.
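A minimal early-stopping helper in the spirit of S7 (the patience value is an illustrative assumption; the patent only states that training stops once the verification error rises):

```python
import copy

class EarlyStopping:
    """Track verification loss; stop after `patience` rounds without
    improvement and keep the best weights as the final parameters."""
    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0
        self.best_state = None

    def step(self, val_loss: float, model) -> bool:
        """Return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            self.best_state = copy.deepcopy(model.state_dict())
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```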
Application of the power transformer prediction method based on the Transformer model, wherein the model is used for prediction: after model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values.
The method may be used in wind farms or other facilities with similar characteristics, preferably for transformer load prediction in wind farms.
The power transformer load prediction model based on the interactive multi-head attention Transformer receives a historical load sequence as input and predicts load values for several future time steps; information interaction between the attention heads improves the model's ability to extract features from long sequence data, thereby achieving high-precision long-term prediction of the power transformer load.
An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method when executing the program.
The invention has the beneficial effects that:
the power Transformer load prediction method based on the interactive multi-head attention transducer model provided by the invention has the advantages that compared with the existing prediction method, the power Transformer load prediction method based on the interactive multi-head attention transducer model has the following advantages: the traditional time sequence prediction method cannot accurately predict long sequence data, introduces interactive multi-head attention on the basis of a transducer, is used for enhancing the characteristic extraction capability of a model on the sequence data, and simultaneously realizes the distillation operation on the sequence data by utilizing a maximum pooling layer in order to reduce the memory overhead in the model training process.
The power transformer load prediction method provided by the invention can better capture the dependency relationship between long sequence data, thereby realizing accurate prediction of the power transformer load and having certain practicability in smart grid construction.
The prediction method utilizes the maximum pooling layer to distill time series data, reduces memory overhead in the model training process, and realizes accurate prediction of the power transformer load.
Drawings
FIG. 1 is a flow chart of the power transformer load prediction based on the interactive multi-head attention Transformer model of the present invention;
FIG. 2 is a model diagram of the power transformer load prediction based on the interactive multi-head attention Transformer model of the present invention;
FIG. 3 compares the prediction results of the proposed method IMHAN with the real data.
Detailed Description
The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
Unless otherwise indicated, all technical means employed in the specification are those known in the art.
The invention, a power transformer load prediction method based on an interactive multi-head attention Transformer model, is described in further detail below with reference to the drawings and embodiments.
The training data set used in this embodiment records the load conditions of power transformers in two different areas of the same Chinese province from 2016 to 2018. The variant with one data point recorded every 15 minutes is labeled m and named ETT-small-m1; it contains 2 years × 365 days × 24 hours × 4 = 70,080 data points. The data set also provides variants at one-hour granularity (labeled h), namely ETT-small-h1 and ETT-small-h2. Each data point contains 8-dimensional features: the recording date, the prediction target "oil temperature", and 6 different types of external load values, namely High UseFul Load, High UseLess Load, Middle UseFul Load, Middle UseLess Load, Low UseFul Load and Low UseLess Load.
Example 1:
Fig. 1 is a flowchart of the power transformer load prediction method based on the interactive multi-head attention Transformer model according to the present invention. The method specifically comprises the following steps:
s1, collecting load data of a power transformer, and arranging the collected load data of the power transformer according to time to obtain a sequence sample data setx i Indicating the value of the observed variable at time i, L x Represents the length of the observed time series, d x Representing the number of observed variables;
carrying out normalization processing on the sequence sample data set so that sample values lie between 0 and 1, and using the resulting data set as samples for supervised learning;
s2, normalizing the data set according to 7:2:1 is divided into a training set, a test set and a verification set, and each data set sampling period can represent a characteristic change sample of the same period.
And ensuring that each data set sampling period can represent a characteristic change sample of the same period;
s3: defining and establishing an interactive multi-head attention transducer model, and initializing network internal parameters and learning rate; the original data is converted into a feature vector with position information after passing through an embedding layer and a position coding layer, wherein the time sequence coding comprises a global time sequence coding and a local time sequence coding, the global time sequence coding consists of year, month and week information in a data time stamp, and a local time sequence coding formula is as follows:
$$PE_{(pos,\,2j)} = \sin\!\left(\frac{pos}{10000^{2j/d_{model}}}\right), \qquad PE_{(pos,\,2j+1)} = \cos\!\left(\frac{pos}{10000^{2j/d_{model}}}\right)$$

where $PE$ denotes the position encoding, $pos$ the position, $j$ the dimension index, and $d_{model}$ the model dimension;
s4, the transducer model consists of an encoder and a decoder, wherein in the encoder, a multi-head attention layer and a multi-head attention interaction layer are adopted for feature extraction, and the method comprises the following steps: inputting the vector with the timing information into the multi-head attention layer to obtain an intermediate value:
$$Q = XW^Q,\quad K = XW^K,\quad V = XW^V,\quad \mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

where $W^Q$, $W^K$, $W^V$ are the weight matrices and $Q$, $K$, $V$ are the resulting query, key and value vectors;
the multi-head output consists of multiple parts, each representing a subspace:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(O_1, \ldots, O_h), \qquad O_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V);$$
information interaction across the different subspaces is achieved using a depthwise separable convolution:

$$O' = \mathrm{Conv1}\big(\mathrm{Elu}(\mathrm{Conv2}(O))\big)$$

where Conv1 and Conv2 denote the depth-wise convolution and the point-wise convolution, respectively, and Elu denotes the activation function;
then, a linear transformation layer is used for feature dimension conversion, and finally a pooling layer is used for downsampling to obtain the output.
In step S4, the output vector generated by the multi-head attention layer undergoes information interaction through the multi-head attention interaction layer, which consists of a depthwise separable convolution, a linear transformation layer and a max pooling layer. For the output tensor formed by the multi-head self-attention mechanism, a 1×1 point-wise convolution first aggregates information along the channel dimension; after an ELU activation function, a depth-wise convolution performs information interaction along the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a max pooling layer with a stride of 2 performs the time series distillation operation.
In step S4, the output vector generated by each attention head, $O_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$, undergoes information interaction through the multi-head attention interaction layer.
The input part of the decoder is denoted $X_{de} = \mathrm{Concat}(X_{token}, X_0)$, where $X_{token}$ is the value of the last $k$ time steps of the encoder input and $X_0$ is a placeholder for the target sequence to be predicted (filled with 0s); finally, a fully connected layer outputs the predicted value, whose dimension depends on the number of variables to be predicted.
In step S4, a mean squared error (MSE) loss function and the Adam stochastic gradient descent algorithm are used during the network convergence process.
S5: constructing a three-layer decoder from a multi-head attention layer and a multi-head attention interaction layer; first, the feature $f_1$ from the multi-head attention interaction layer and the feature $f_2$ from the residual connection are used to compute a weight ratio $g = \mathrm{Sigmoid}(W_g[f_1, f_2] + b_g)$, where $W_g$ denotes a weight matrix, $b_g$ the bias, and Sigmoid the activation function; the two features are then weighted and summed according to this ratio: $\mathrm{Fusion}(f_1, f_2) = g \odot f_1 + (1 - g) \odot f_2$.
S6: constructing a decoder from a multi-head attention layer and a multi-head attention interaction layer. The multi-head attention layer performs an inner product between the Query matrix and the Key matrix to obtain contribution scores, then multiplies the scores by the Value matrix to obtain feature vectors. The multi-head attention interaction layer performs subspace information interaction on the resulting feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
S7: performing a fitting evaluation of the model and using early stopping to prevent overfitting during training; after each training round, the model is validated on the verification set obtained in step S2, and if the test error on the verification set rises as the number of training rounds increases, training is stopped; the weights at stopping are taken as the final network parameters.
After model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values. Fig. 3 shows partial prediction results of the method on the ETT data set, and Tables 1 and 2 compare the method with other prediction methods under univariate and multivariate conditions, respectively; the effectiveness and advancement of the model can be seen from them.
Table 1 univariate time series prediction results
In Table 1, IMHAN is the method proposed by the present invention, and Informer, LSTMa, DeepAR, ARIMA and Prophet are the comparison methods.
MAE (mean absolute error) and MSE (mean squared error) are the evaluation indices.
Example 2:
the same power Transformer load prediction method as in example 1 was used to obtain a Transformer model. In this embodiment, a plurality of variables including load, oil temperature, location, climate, demand are input for prediction. The data in the original data set is obtained by means of temperature measuring elements, current and user side power measurement and the like. The present embodiment predicts the load as a variable using multiple variables; the dimensions of the formula input are different from those of example 1.
The results obtained by the Transformer model are shown in Table 2:
TABLE 2 multivariate time series prediction results
In Table 2, IMHAN is the method presented herein, and Informer, LSTMa and LSTNet are the comparison prediction methods.
Example 3: application of
After model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values, thereby guiding the selection and configuration of transformers in the power grid.
For transformers at wind turbine grid-connection points, the location and climate variables among the inputs change with the wind farm configuration; the prediction method is therefore particularly suitable for load prediction of wind farm transformers.
Although the invention has been described above by way of examples, those skilled in the art will appreciate that modifications and variations may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A power transformer load prediction method based on a Transformer model, characterized by comprising the following steps:
s1, collecting load data of a power transformer, and arranging the collected load data of the power transformer according to time to obtain a sequence sample data setx i Indicating the value of the observed variable at time i, L x Represents the length of the observed time series, d x Representing the number of observed variables;
carrying out normalization processing on the sequence sample data set so that sample values lie between 0 and 1, and using the resulting data set as samples for supervised learning;
s2, dividing the normalized data set into a training set, a testing set and a verification set, and ensuring that each data set sampling period can represent a characteristic change sample in the same period;
s3: defining and establishing an interactive multi-head attention transducer model, and initializing network internal parameters and learning rate; the original data is converted into a feature vector with position information after passing through an embedding layer and a position coding layer, wherein the time sequence coding comprises a global time sequence coding and a local time sequence coding, the global time sequence coding consists of year, month and week information in a data time stamp, and a local time sequence coding formula is as follows:
$$PE_{(pos,\,2j)} = \sin\!\left(\frac{pos}{10000^{2j/d_{model}}}\right), \qquad PE_{(pos,\,2j+1)} = \cos\!\left(\frac{pos}{10000^{2j/d_{model}}}\right)$$

where $PE$ denotes the position encoding, $pos$ the position, $j$ the dimension index, and $d_{model}$ the model dimension;
s4, the transducer model consists of an encoder and a decoder, wherein in the encoder, a multi-head attention layer and a multi-head attention interaction layer are adopted for feature extraction, and the method comprises the following steps: the vector with timing information is input into the multi-head attention layer to obtain an intermediate value:
$$Q = XW^Q,\quad K = XW^K,\quad V = XW^V,\quad \mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

where $W^Q$, $W^K$, $W^V$ are the weight matrices and $Q$, $K$, $V$ are the resulting query, key and value vectors;
the multi-head output consists of multiple parts, each representing a subspace:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(O_1, \ldots, O_h), \qquad O_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V);$$
information interaction across the different subspaces is achieved using a depthwise separable convolution:

$$O' = \mathrm{Conv1}\big(\mathrm{Elu}(\mathrm{Conv2}(O))\big)$$

where Conv1 and Conv2 denote the depth-wise convolution and the point-wise convolution, respectively, and Elu denotes the activation function;
then, a linear transformation layer is used for feature dimension conversion, and finally a pooling layer is used for downsampling to obtain the output;
s5: adopting a multi-head attention layer and a multi-head attention interaction layer to construct a three-layer decoder; first using features f from the multi-head attention interaction layer 1 And feature f from residual connection 2 Calculating weight ratioWherein->Representing a weight matrix, b g Representing the bias, sigmoid represents the activation function; then based on the ratio, the characteristic f is calculated 1 And f 2 Weighted summation Fusion (f 1, f 2) =g+.f 1 +(1-g)f 2
S6: constructing a decoder from a multi-head attention layer and a multi-head attention interaction layer; the multi-head attention layer performs an inner product between the Query matrix and the Key matrix to obtain contribution scores, then multiplies the scores by the Value matrix to obtain feature vectors; the multi-head attention interaction layer performs subspace information interaction on the resulting feature vectors, and finally a linear transformation layer outputs the final prediction sequence.
2. The power transformer load prediction method based on the Transformer model according to claim 1, wherein sensors comprising temperature measuring elements, ammeters and voltmeters collect data related to the power transformer load, the data comprising one or more of load, oil temperature, location, climate and demand.
3. The power Transformer load prediction method based on the Transformer model according to claim 1, wherein in step S4:
the output vector generated by the multi-head attention layer undergoes information interaction through the multi-head attention interaction layer, which consists of a depthwise separable convolution, a linear transformation layer and a max pooling layer; for the output tensor formed by the multi-head self-attention mechanism, a 1×1 point-wise convolution first aggregates information along the channel dimension; after an ELU activation function, a depth-wise convolution performs information interaction along the spatial dimension, so that spatial correlation and inter-channel correlation are learned simultaneously; finally, a max pooling layer with a stride of 2 performs the time series distillation operation.
4. The power transformer load prediction method based on the Transformer model according to claim 1, wherein in S2 the preprocessed data set is divided into a training set, a test set and a verification set in a 7:2:1 ratio, and each data set's sampling period represents characteristic variation over the same period.
5. The power Transformer load prediction method based on the Transformer model according to claim 1, wherein in step S4:
the input part of the decoder is denoted $X_{de} = \mathrm{Concat}(X_{token}, X_0)$, where $X_{token}$ is the value of the last $k$ time steps of the encoder input and $X_0$ is a placeholder for the target sequence to be predicted, filled with 0s; finally, a fully connected layer outputs the predicted value, whose dimension depends on the number of variables to be predicted.
6. The power transformer load prediction method based on the Transformer model according to claim 1, wherein the network convergence process in step S4 uses a mean squared error (MSE) loss function and the Adam stochastic gradient descent algorithm.
7. The power Transformer load prediction method based on the Transformer model according to any one of claims 1 to 6, further comprising S7:
performing a fitting evaluation of the model and using early stopping to prevent overfitting during training; after each training round, validating the model on the verification set obtained in step S2, and stopping training if the test error on the verification set rises as the number of training rounds increases; the weights at stopping are taken as the final network parameters.
8. Use of the power transformer load prediction method according to any one of claims 1 to 7, characterized in that the prediction is performed by means of the model: after model evaluation and verification, the test set data obtained in step S2 are input into the model verified in step S7 to predict future time values.
9. The use according to claim 8, characterized by a transformer load prediction for a wind farm.
10. A computer program running device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 7 when executing the program.
CN202211379043.6A 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model Active CN115622047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211379043.6A CN115622047B (en) 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211379043.6A CN115622047B (en) 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model

Publications (2)

Publication Number Publication Date
CN115622047A (en) 2023-01-17
CN115622047B (en) 2023-07-18

Family

ID=84877989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211379043.6A Active CN115622047B (en) 2022-11-04 2022-11-04 Power Transformer load prediction method based on Transformer model

Country Status (1)

Country Link
CN (1) CN115622047B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070799B (en) * 2023-03-30 2023-05-30 南京邮电大学 Photovoltaic power generation amount prediction system and method based on attention and deep learning
CN117034175B (en) * 2023-10-07 2023-12-05 北京麟卓信息科技有限公司 Time sequence data anomaly detection method based on channel fusion self-attention mechanism
CN117292243B (en) * 2023-11-24 2024-02-20 合肥工业大学 Method, equipment and medium for predicting magnetocardiogram signal space-time image based on deep learning
CN117435918B (en) * 2023-12-20 2024-03-15 杭州市特种设备检测研究院(杭州市特种设备应急处置中心) Elevator risk early warning method based on spatial attention network and feature division
CN117851897A (en) * 2024-03-08 2024-04-09 国网山西省电力公司晋城供电公司 Multi-dimensional feature fusion oil immersed transformer online fault diagnosis method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297885B (en) * 2019-05-27 2021-08-17 中国科学院深圳先进技术研究院 Method, device and equipment for generating real-time event abstract and storage medium
CN111080032B (en) * 2019-12-30 2023-08-29 成都数之联科技股份有限公司 Load prediction method based on transducer structure
CN112288595A (en) * 2020-10-30 2021-01-29 腾讯科技(深圳)有限公司 Power grid load prediction method, related device, equipment and storage medium

Also Published As

Publication number Publication date
CN115622047A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
CN115622047B (en) Power Transformer load prediction method based on Transformer model
CN108711847B (en) A short-term wind power forecasting method based on an encoder-decoder long short-term memory network
CN109492823B (en) Method for predicting icing thickness of power transmission line
CN112633604B (en) Short-term power consumption prediction method based on I-LSTM
CN113554466B (en) Short-term electricity consumption prediction model construction method, prediction method and device
CN104951836A (en) Posting predication system based on nerual network technique
CN113379164B (en) Load prediction method and system based on deep self-attention network
Liu et al. Heating load forecasting for combined heat and power plants via strand-based LSTM
CN111160626B (en) Power load time sequence control method based on decomposition fusion
CN111553543A (en) Power load prediction method based on TPA-Seq2Seq and related assembly
CN115660161A (en) Medium-term and small-term load probability prediction method based on time sequence fusion Transformer model
CN111242351A (en) Tropical cyclone track prediction method based on self-encoder and GRU neural network
Kalogirou et al. Prediction of maximum solar radiation using artificial neural networks
CN112990597B (en) Ultra-short-term prediction method for industrial park power consumption load
CN111178585A (en) Fault reporting amount prediction method based on multi-algorithm model fusion
CN113111592A (en) Short-term wind power prediction method based on EMD-LSTM
CN114498619A (en) Wind power prediction method and device
CN116865251A (en) Short-term load probability prediction method and system
CN116014722A (en) Sub-solar photovoltaic power generation prediction method and system based on seasonal decomposition and convolution network
CN117977587A (en) Power load prediction system and method based on deep neural network
CN117277304A (en) Photovoltaic power generation ultra-short-term power prediction method and system considering sunrise and sunset time
CN116703644A (en) Attention-RNN-based short-term power load prediction method
Li et al. Electricity Sales Forecasting Based on Model Fusion and Prophet Model
CN117895473A (en) Power grid operation checking method and system based on various historical load values
Wang et al. Research of combination of electricity GM (1, 1) and seasonal time series forecasting model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant