CN116680532A

CN116680532A - Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling

Info

Publication number: CN116680532A
Application number: CN202310630341.6A
Authority: CN
Inventors: 汪李忠; 朱鹏; 唐洪良; 高俊青; 陈悦; 周潮; 姚海燕; 崔金栋; 郭强
Original assignee: State Grid Zhejiang Electric Power Co Ltd Hangzhou Linping District Power Supply Co; State Grid Zhejiang Electric Power Co Ltd Hangzhou Yuhang District Power Supply Co; Hangzhou Power Equipment Manufacturing Co Ltd; Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: State Grid Zhejiang Electric Power Co Ltd Hangzhou Linping District Power Supply Co; State Grid Zhejiang Electric Power Co Ltd Hangzhou Yuhang District Power Supply Co; Hangzhou Power Equipment Manufacturing Co Ltd; Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2023-05-31
Filing date: 2023-05-31
Publication date: 2023-09-01

Abstract

The invention discloses an online transformer fault diagnosis method that handles unbalanced small samples based on NNTR-SMOTE oversampling, comprising the following five steps: first, standardizing the collected transformer fault sample data and obtaining balanced data with the nearest-neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method; second, constructing dissolved-gas-in-oil features with the non-coding ratio method to obtain a feature data set; third, fusing the features of the feature data set with multidimensional scaling (MDS); fourth, optimizing the parameters of the XGBoost model with a genetic algorithm (GA), constructing the transformer fault diagnosis model, and performing ten-fold cross-validation; and fifth, performing online diagnosis of transformer faults with the trained XGBoost model. The invention effectively addresses misjudgment and missed judgment of unbalanced fault samples and improves the diagnostic accuracy of the model.

Description

Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling

The invention belongs to the field of transformer fault diagnosis research, and particularly relates to an on-line transformer fault diagnosis method for processing unbalanced small samples based on NNTR-SMOTE oversampling.

Background

The oil-immersed transformer is one of the key pieces of equipment in a power transmission and transformation system, and its operating state determines whether the power system can run safely and reliably. However, a large share of the oil-immersed transformers currently operating in China have been in service for many years and carry hidden fault risks such as insulation degradation. To ensure reliable and economical operation of the power system, an efficient and accurate real-time transformer fault diagnosis model must be established.

When an oil-immersed transformer suffers insulation aging, a small amount of gas dissolves in the insulating oil, and the composition and proportions of the dissolved gases reflect the operating condition of the transformer. Even in normal operation, an oil-immersed transformer generates a very small amount of gas through insulation aging and cracking, mainly H₂, CH₄, C₂H₄, C₂H₂, C₂H₆, CO₂ and CO. Transformer fault types correlate strongly with changes in the gas components, so dissolved gas analysis (DGA), which analyzes the gas components and their contents, is currently one of the most important methods for transformer condition detection and fault diagnosis. Rule-based methods derived from DGA, such as the improved three-ratio method, the Duval triangle method and the Rogers ratio method, are simple and have played an important role, but they all suffer from incomplete state codes and fuzzy code boundaries, so their diagnostic accuracy is low. With the development of machine learning theory, models such as the artificial neural network (ANN), the support vector machine (SVM) and the extreme learning machine (ELM) have achieved good results in transformer fault identification. However, although these methods are flexible and accurate, they need a large amount of data to train the model. In actual production, the occurrence rates of different transformer faults differ markedly, so the samples accumulated for different faults are extremely unbalanced, and a model trained on unbalanced data easily misjudges or misses small-sample faults.

Currently, unbalanced data are balanced mainly by downsampling or oversampling. Oversampling may add redundant data and cause overfitting, making the data distribution unreasonable or even causing the data to deviate from the real situation. Downsampling may discard useful information in the data set.

Aiming at the problems, the invention provides an online diagnosis method for transformer faults under the condition of unbalanced samples, which can effectively solve the problems of misjudgment and missed judgment of unbalanced fault samples.

Disclosure of Invention

The invention provides an online transformer fault diagnosis method that handles unbalanced small samples based on NNTR-SMOTE oversampling and effectively improves the accuracy of transformer fault diagnosis.

The invention is realized by adopting the following technical scheme: an on-line diagnosis method for transformer faults based on NNTR-SMOTE oversampling for processing unbalanced small samples comprises the following steps:

first, standardizing the collected transformer fault sample data and obtaining balanced data with the nearest-neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method;

second, constructing dissolved-gas-in-oil features with the non-coding ratio method to obtain a feature data set;

third, fusing the features of the feature data set with multidimensional scaling (MDS);

fourth, optimizing the parameters of the XGBoost model with the GA, constructing the transformer fault diagnosis model, and performing ten-fold cross-validation to realize transformer fault diagnosis;

fifth, performing online diagnosis of transformer faults with the trained XGBoost model.

Further, the specific steps of the first step are as follows:

Step 1: collecting transformer DGA sample data, where n is the number of dissolved gases collected from the oil and m is the number of samples collected.

Specifically, the dissolved gases in the oil include H₂, CH₄, C₂H₄, C₂H₂ and C₂H₆. According to the temperature and discharge energy of the fault, the transformer states are divided into eight types: normal, medium-temperature overheating, medium-low-temperature overheating, high-temperature overheating, discharge combined with overheating, partial discharge, low-energy discharge and high-energy discharge, labeled 1 to 8 respectively.

Step 2: performing Z-score standardization on the collected fault sample data to obtain standardized sample data.

Specifically, the Z-score standardization formula is:

$$x' = \frac{x - \mu}{\sigma}$$

where μ is the mean of the data set x, σ is the standard deviation of the data set, and x' is the standardized value of x.
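The Z-score step can be sketched as follows. This is a minimal illustration, assuming that the mean and standard deviation are taken per gas (per column) of the m × n sample matrix and that a constant column is left unscaled; the function name `z_score` and both of those choices are assumptions, not details stated in the source.

```python
import numpy as np

def z_score(X):
    """Standardize each column of X as (x - mu) / sigma."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                       # per-gas mean (assumed per column)
    sigma = X.std(axis=0)                     # per-gas standard deviation
    sigma = np.where(sigma == 0, 1.0, sigma)  # assumption: avoid division by zero
    return (X - mu) / sigma, mu, sigma

# Two samples, two gases (made-up numbers, for illustration only).
Z, mu, sigma = z_score([[1.0, 10.0], [3.0, 30.0]])
```

Returning `mu` and `sigma` alongside the standardized data lets the same statistics be reused when new samples arrive for online diagnosis.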

Step 3: obtaining balanced data with the nearest-neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method.

Specifically, assuming that the number of majority-class samples is M and the number of minority-class samples is P, the minority-class sampling rate N is calculated as:

$$N = \frac{M - P}{P}$$

An arbitrary minority-class sample y₀ is selected, and a new artificial sample y′_new is generated in the triangle region formed by y₀ and its 2 nearest minority-class neighbors y₁ and y₂, according to:

$$y'_{new} = y_2 + \mathrm{rand}(0,1)\,\bigl[\,y_0 + \mathrm{rand}(0,1)\,(y_1 - y_0) - y_2\,\bigr]$$

where rand(0, 1) denotes a random number between 0 and 1.

After all artificial samples have been generated, the nearest neighbor of each artificial sample is checked: if it belongs to the same class, the artificial sample is retained; if it belongs to a different class, the artificial sample is deleted. The sampling rate is then recalculated, and artificial samples continue to be synthesized until the sampling rate reaches 0.
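A rough sketch of this oversampling loop is given below. It is not the patent's reference implementation: the function name `nntr_smote`, the `max_rounds` safeguard, taking M as the size of the largest class, and checking each candidate's nearest neighbor against the real samples only are all assumptions made for illustration.

```python
import numpy as np

def nntr_smote(X, y, minority_label, seed=0, max_rounds=100):
    """Oversample one minority class towards the size of the largest class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    M = np.unique(y, return_counts=True)[1].max()  # majority-class size M
    X_min = X[y == minority_label]                 # the P real minority samples
    X_other = X[y != minority_label]               # real samples of other classes
    if len(X_min) < 3:
        raise ValueError("need at least 3 minority samples to form a triangle")
    kept = []                                      # accepted artificial samples
    for _ in range(max_rounds):                    # assumed safeguard against endless loops
        deficit = M - (len(X_min) + len(kept))     # samples still missing
        if deficit <= 0:                           # sampling rate has reached 0
            break
        for _ in range(deficit):
            i = rng.integers(len(X_min))
            y0 = X_min[i]
            dist = np.linalg.norm(X_min - y0, axis=1)
            dist[i] = np.inf                       # exclude y0 itself
            j1, j2 = np.argsort(dist)[:2]          # its 2 nearest minority neighbors
            y1, y2 = X_min[j1], X_min[j2]
            p = y0 + rng.random() * (y1 - y0)      # random point on edge y0-y1
            cand = y2 + rng.random() * (p - y2)    # random point between y2 and p
            # Keep the candidate only if its nearest real sample is a minority sample.
            d_same = np.linalg.norm(X_min - cand, axis=1).min()
            d_diff = (np.linalg.norm(X_other - cand, axis=1).min()
                      if len(X_other) else np.inf)
            if d_same <= d_diff:
                kept.append(cand)
    new = np.array(kept).reshape(-1, X.shape[1])
    return np.vstack([X, new]), np.concatenate([y, np.full(len(new), minority_label)])
```

Because the candidate is drawn between y₂ and a point on the edge y₀-y₁, every artificial sample lies inside the triangle spanned by the three real minority samples, which is what distinguishes this scheme from interpolating along a single line segment.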

Further, the specific steps of the second step are as follows:

Step 1: constructing the dissolved-gas-in-oil features from the standardized fault sample data with the non-coding ratio method.

The specific dissolved-gas-in-oil features are: CH₄/H₂, C₂H₂/H₂, C₂H₂/C₂H₄, C₂H₄/C₂H₆, C₂H₆/CH₄, C₂H₂/CH₄, C₂H₄/CH₄, H₂/THC, CH₄/THC, C₂H₄/THC, C₂H₆/THC, C₂H₂/THC, (CH₄+C₂H₄)/THC, H₂/ALL, CH₄/ALL, C₂H₂/ALL, C₂H₄/ALL and C₂H₆/ALL,

where THC = CH₄ + C₂H₂ + C₂H₄ + C₂H₆ and ALL = H₂ + CH₄ + C₂H₂ + C₂H₄ + C₂H₆.
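The 18 ratio features can be computed as in the sketch below. The function name `ratio_features`, the argument order and the small `eps` guard against a zero denominator are assumptions added for illustration; the source does not say how zero denominators are handled.

```python
def ratio_features(h2, ch4, c2h4, c2h2, c2h6, eps=1e-12):
    """Return the 18 non-coding ratio features for one DGA sample."""
    thc = ch4 + c2h2 + c2h4 + c2h6          # total hydrocarbons
    all_ = h2 + thc                         # all five gases

    def div(a, b):
        # Assumption: replace a (near-)zero denominator by eps.
        return a / (b if abs(b) > eps else eps)

    return [
        div(ch4, h2), div(c2h2, h2), div(c2h2, c2h4), div(c2h4, c2h6),
        div(c2h6, ch4), div(c2h2, ch4), div(c2h4, ch4),
        div(h2, thc), div(ch4, thc), div(c2h4, thc), div(c2h6, thc),
        div(c2h2, thc), div(ch4 + c2h4, thc),
        div(h2, all_), div(ch4, all_), div(c2h2, all_), div(c2h4, all_),
        div(c2h6, all_),
    ]

# Made-up gas values, for illustration only.
features = ratio_features(h2=10.0, ch4=20.0, c2h4=30.0, c2h2=5.0, c2h6=15.0)
```

The last five features are each gas's share of ALL, so they sum to 1 for any sample whose total is nonzero.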

Further, the specific steps of the third step are as follows:

Step 1: calculating the Euclidean distances σ between the m d-dimensional samples and generating the distance matrix Δ.

Assuming there are m d-dimensional samples x_1, ..., x_m, the matrix of Euclidean distances between the samples is defined as:

$$\Delta = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1m} \\ \vdots & \ddots & \vdots \\ \sigma_{m1} & \cdots & \sigma_{mm} \end{bmatrix}$$

where σ_ij is the distance between the i-th sample and the j-th sample, namely:

$$\sigma_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$$

Step 2: calculating the inner product matrix B of the samples after dimension reduction, i.e., the double-centered matrix of Δ.

The solution X can be obtained by performing singular value decomposition on the double-centered matrix of the distance matrix Δ, which is:

$$B = -\tfrac{1}{2}\, J \Delta^{(2)} J$$

where $\Delta^{(2)} = [\sigma_{ij}^2]$ and the centering matrix J is:

$$J = I - \tfrac{1}{m}\,\mathbf{1}\mathbf{1}^{T}$$

with I the m × m identity matrix and 1 the m-dimensional all-ones column vector.

Step 3: performing eigenvalue decomposition on the inner product matrix B to obtain k eigenvalues and the corresponding eigenvectors, and arranging the eigenvalues in descending order.

Because B is a symmetric positive semi-definite matrix, its singular value decomposition is:

$$B = U \Lambda U^{T}$$

where Λ is the diagonal matrix composed of the eigenvalues of B and U is the matrix of the corresponding eigenvectors. The eigenvalues of B are sorted from largest to smallest.

Step 4: obtaining the samples X after dimension reduction.

The a largest eigenvalues and their corresponding eigenvectors are selected, and the dimension-reduced samples X are then calculated as:

$$X = U_a \Lambda_a^{1/2}$$
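The four MDS steps correspond to classical (metric) multidimensional scaling, which can be sketched with NumPy as follows. The function name `classical_mds` is hypothetical, and clipping tiny negative eigenvalues to zero is an assumption added for numerical safety rather than something the source specifies.

```python
import numpy as np

def classical_mds(samples, a):
    """Reduce m d-dimensional samples to a dimensions by classical MDS."""
    S = np.asarray(samples, dtype=float)
    m = S.shape[0]
    # Step 1: squared Euclidean distances sigma_ij^2 between all sample pairs.
    diff = S[:, None, :] - S[None, :, :]
    delta_sq = (diff ** 2).sum(axis=2)
    # Step 2: double centering, B = -1/2 * J * Delta^(2) * J.
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ delta_sq @ J
    # Step 3: eigendecomposition of the symmetric matrix B, largest first.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:a]
    lam = np.clip(eigvals[order], 0.0, None)   # assumption: clip round-off negatives
    # Step 4: coordinates from the a leading eigenpairs.
    return eigvecs[:, order] * np.sqrt(lam)

# Five made-up 3-dimensional points, for illustration only.
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
X_low = classical_mds(pts, a=3)
```

When a equals the rank of the centered data, the pairwise distances of the output match those of the input; choosing a smaller a keeps only the directions with the largest eigenvalues.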

Further, the specific steps of the fourth step are as follows:

step 1: dividing the sample data after dimension reduction into a training set, a verification set and a test set according to a certain proportion;

step 2: constructing an XGBoost model by utilizing data in a training set;

Specifically, for a data set D = {(x_i, y_i)} (x_i ∈ R^m, y_i ∈ R) with n samples and m features, the final prediction output of the K CARTs is:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

$$F = \{\, f(x) = \omega_{q(x)} \,\}, \quad q: \mathbb{R}^m \to T, \quad \omega \in \mathbb{R}^T$$

where each function f_k corresponds to an independent tree structure q and leaf weights ω; q maps a sample to its leaf index; each leaf node of each CART carries a continuous score, namely its weight, and the score of the i-th leaf node is ω_i; T is the number of leaf nodes; F is the set of CARTs; and ω_q(x) is the score of sample x, i.e., the model prediction. For each sample, each CART assigns the sample to a leaf node according to its own classification rules, and the final prediction is obtained by summing the scores ω of the corresponding leaves.
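The additive prediction can be illustrated with a toy ensemble. In this sketch each tree is represented as a pair (q, ω), where q is a plain Python function returning a leaf index and ω is a list of leaf scores; this representation and the two hand-written stumps are assumptions for illustration and do not reflect how the XGBoost library stores trees.

```python
def predict(trees, x):
    """Sum the leaf scores omega[q(x)] over all trees in the ensemble."""
    return sum(omega[q(x)] for q, omega in trees)

# Two hand-written depth-1 trees (stumps) with made-up thresholds and scores.
trees = [
    (lambda x: 0 if x[0] < 1.0 else 1, [-0.5, 0.5]),   # splits on feature 0
    (lambda x: 0 if x[1] < 2.0 else 1, [0.1, 0.3]),    # splits on feature 1
]

score = predict(trees, [0.0, 3.0])   # leaf 0 of tree 1, leaf 1 of tree 2
```

Here the sample falls into the leaf scored -0.5 in the first tree and the leaf scored 0.3 in the second, so the ensemble output is their sum.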

Step 3: setting the initial parameters of the XGBoost model, pre-training it, and continuously adjusting the model parameters with the GA.

Specifically, six hyperparameters are optimized by the GA to improve the performance of the diagnostic model: the number of iterations, the learning rate η, the maximum depth of the decision tree d_max, the random sample extraction ratio r_subsample, the feature extraction ratio r_colsample, and the decision tree node splitting criterion γ_split.

Step 4: judging whether the maximum number of iterations or the termination condition has been reached; if so, taking the current parameter values as the optimal parameters of the model; otherwise, returning to Step 3.

Step 5: performing ten-fold cross-validation, testing the diagnostic performance of the model, and outputting the fault classification result.
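The GA loop over the six hyperparameters might look like the sketch below. Everything specific in it is an assumption: the search ranges in `BOUNDS`, the selection, crossover and mutation operators, the population size, and the placeholder `toy_fitness`. In the described method the fitness would be the validation or cross-validated accuracy of an XGBoost model trained with the candidate parameters; a toy function is used here so the sketch runs without the XGBoost library.

```python
import random

# Illustrative search ranges (assumed, not taken from the source).
BOUNDS = {
    "n_iterations": (50, 500),
    "eta": (0.01, 0.5),
    "d_max": (2, 10),
    "r_subsample": (0.5, 1.0),
    "r_colsample": (0.5, 1.0),
    "gamma_split": (0.0, 5.0),
}
INT_KEYS = {"n_iterations", "d_max"}

def sample_gene(key, rng):
    lo, hi = BOUNDS[key]
    return rng.randint(lo, hi) if key in INT_KEYS else rng.uniform(lo, hi)

def random_individual(rng):
    return {k: sample_gene(k, rng) for k in BOUNDS}

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}

def mutate(ind, rng, rate=0.2):
    # Resample each gene with probability `rate`.
    return {k: (sample_gene(k, rng) if rng.random() < rate else v)
            for k, v in ind.items()}

def ga_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):               # terminate at max iterations
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
        best = max([best] + pop, key=fitness)
    return best

def toy_fitness(ind):
    # Placeholder for model accuracy; peaks at eta = 0.3 and d_max = 6.
    return -((ind["eta"] - 0.3) ** 2) - (ind["d_max"] - 6) ** 2

best_params = ga_search(toy_fitness)
```

Keeping the better half of each generation unchanged means the best fitness never decreases from one generation to the next, which matches the "keep the current parameters once the iteration limit is reached" stopping rule.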

Further, the specific steps of the fifth step are as follows:

step 1: the collected transformer fault sample data is subjected to standardized treatment, and the characteristic of dissolved gas in oil is constructed by adopting a non-coding ratio method, so that characteristic data are obtained;

step 2: feature fusion is carried out on the feature data by adopting an MDS method, so that fusion data are obtained;

step 3: and importing the fusion data into the trained XGBoost model, and judging the fault type.

The beneficial effects of the invention are as follows:

The invention provides an online transformer fault diagnosis method for unbalanced sample conditions. First, the method takes DGA data as the feature quantities of the model and constructs dissolved-gas-in-oil features after standardization; second, balanced data are obtained with the NNTR-SMOTE oversampling method, and feature fusion is performed on the balanced data with multidimensional scaling (MDS); third, an XGBoost model is constructed, its parameters are continuously optimized with the GA, and the transformer fault diagnosis model is built from the optimal parameters; finally, the trained XGBoost model is used for online diagnosis of transformer faults. The method accurately determines the fault state of the transformer, effectively addresses the low classification accuracy caused by unbalanced fault data, and thus provides decision support for online transformer fault diagnosis.

Drawings

Figure 1 is a block flow diagram of a method according to the present invention.

FIG. 2 is a graph comparing the characteristic distribution of training data before and after sampling according to an embodiment.

FIG. 3 is a graph of diagnostic results for different input features of an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein are to be considered in an illustrative sense only and are not intended to limit the invention.

Firstly, standardized processing is carried out on collected transformer fault sample data, and a nearest neighbor triangle region synthesis minority class oversampling (NNTR-SMOTE) method is used for obtaining balance data, wherein the method comprises the following specific steps:

S101: collecting transformer DGA sample data.

Specifically, the dissolved gases in the oil include H₂, CH₄, C₂H₄, C₂H₂ and C₂H₆. According to the temperature and discharge energy of the fault, the transformer states are divided into eight types: normal, medium-temperature overheating, medium-low-temperature overheating, high-temperature overheating, discharge combined with overheating, partial discharge, low-energy discharge and high-energy discharge, labeled 1 to 8 respectively.

S102: performing Z-score standardization on the collected fault sample data to obtain standardized sample data.

Specifically, the Z-score standardization formula is:

$$x' = \frac{x - \mu}{\sigma}$$

S103: obtaining balanced data with the nearest-neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method.

$$N = \frac{M - P}{P}$$

$$y'_{new} = y_2 + \mathrm{rand}(0,1)\,\bigl[\,y_0 + \mathrm{rand}(0,1)\,(y_1 - y_0) - y_2\,\bigr]$$

where rand(0, 1) denotes a random number between 0 and 1.

Secondly, constructing the characteristic of dissolved gas in oil by adopting a coding-free ratio method to obtain a characteristic data set, wherein the specific steps are as follows:

S201: constructing the dissolved-gas-in-oil features from the standardized fault sample data with the non-coding ratio method,

where THC = CH₄ + C₂H₂ + C₂H₄ + C₂H₆ and ALL = H₂ + CH₄ + C₂H₂ + C₂H₄ + C₂H₆.

Thirdly, carrying out feature fusion on the feature data set by adopting a multi-dimensional scale analysis (MDS) method, wherein the method comprises the following specific steps of:

S301: calculating the Euclidean distances σ between the m d-dimensional samples and generating the distance matrix Δ.

There are 724 18-dimensional samples (m = 724, d = 18), and the matrix of Euclidean distances between the samples is defined as:

$$\Delta = [\sigma_{ij}]_{m \times m}$$

where σ_ij is the distance between the i-th sample and the j-th sample, namely:

$$\sigma_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$$

S302: calculating the inner product matrix B of the samples after dimension reduction, i.e., the double-centered matrix of Δ:

$$B = -\tfrac{1}{2}\, J \Delta^{(2)} J$$

where $\Delta^{(2)} = [\sigma_{ij}^2]$ and the centering matrix J is:

$$J = I - \tfrac{1}{m}\,\mathbf{1}\mathbf{1}^{T}$$

S303: performing eigenvalue decomposition on the inner product matrix B to obtain k eigenvalues and the corresponding eigenvectors, and arranging the eigenvalues in descending order.

Because B is a symmetric positive semi-definite matrix, singular value decomposition is performed on it, namely:

$$B = U \Lambda U^{T}$$

S304: obtaining the samples X after dimension reduction.

The 8 largest eigenvalues and their corresponding eigenvectors are selected, and the dimension-reduced samples X are then calculated as:

$$X = U_8 \Lambda_8^{1/2}$$

Fourthly, utilizing a GA algorithm to optimize parameters of the XGBoost model, constructing a transformer fault diagnosis model, and finally performing ten-fold cross validation to realize transformer fault diagnosis, wherein the method comprises the following specific steps of:

s401: dividing the sample data after dimension reduction into a training set, a verification set and a test set according to a certain proportion;

s402: constructing an XGBoost model by utilizing data in a training set;

$$F = \{\, f(x) = \omega_{q(x)} \,\}, \quad q: \mathbb{R}^m \to T, \quad \omega \in \mathbb{R}^T$$

where each function f_k corresponds to an independent tree structure q and leaf weights ω; q maps a sample to its leaf index; each leaf node of each CART carries a continuous score, namely its weight, and the score of the i-th leaf node is ω_i; T is the number of leaf nodes; F is the set of CARTs; and ω_q(x) is the score of sample x, i.e., the model prediction. For each sample, each CART assigns the sample to a leaf node according to its own classification rules, and the final prediction is obtained by summing the scores ω of the corresponding leaves.

S403: setting initial parameters of an XGBoost model, pre-training, and continuously adjusting the model parameters by utilizing GA;

S404: judging whether the maximum number of iterations or the termination condition has been reached; if so, taking the current parameter values as the optimal parameters of the model; otherwise, returning to S403.

s405: and performing ten-fold cross validation, testing the diagnosis effect of the model, and outputting a fault classification result.
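Ten-fold cross-validation partitions the samples into ten folds and uses each fold once as the held-out part. The sketch below only generates the index splits; the function name `ten_fold_indices`, the shuffling seed and the absence of class stratification are assumptions, since the source does not describe how the folds are drawn.

```python
import random

def ten_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)            # assumed: plain random shuffle
    folds = [idx[i::k] for i in range(k)]       # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 724 is the sample count quoted for the balanced data set in this document.
splits = list(ten_fold_indices(724))
```

The model would be trained on each `train` part and scored on the matching `test` part, with the ten scores averaged to give the reported accuracy.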

Fifthly, performing online diagnosis on transformer faults by using the trained XGBoost model, wherein the method comprises the following specific steps of:

s501: the collected transformer fault sample data is subjected to standardized treatment, and the characteristic of dissolved gas in oil is constructed by adopting a non-coding ratio method, so that characteristic data are obtained;

s502: feature fusion is carried out on the feature data by adopting an MDS method, so that fusion data are obtained;

s503: and importing the fusion data into the trained XGBoost model, and judging the fault type.

Examples:

First, 425 transformer fault samples are collected: 91 normal, 46 medium-temperature overheating, 33 medium-low-temperature overheating, 53 high-temperature overheating, 17 discharge combined with overheating, 29 partial discharge, 72 low-energy discharge and 84 high-energy discharge.
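As a worked illustration of the sampling rate N = (M - P)/P on these counts, the sketch below takes M to be the size of the largest class (the 91 normal samples). That reading of "majority class" is an assumption; the source does not state which class serves as the reference.

```python
# Class sizes as listed in the embodiment.
counts = {
    "normal": 91,
    "medium-temperature overheating": 46,
    "medium-low-temperature overheating": 33,
    "high-temperature overheating": 53,
    "discharge combined with overheating": 17,
    "partial discharge": 29,
    "low-energy discharge": 72,
    "high-energy discharge": 84,
}

M = max(counts.values())                        # assumed reference: largest class
rates = {name: (M - P) / P for name, P in counts.items()}

for name, rate in rates.items():
    print(f"{name}: N = {rate:.2f}")
```

The rarest class (discharge combined with overheating, 17 samples) needs the highest rate, (91 - 17)/17, while the largest class has rate 0 and is left unchanged.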

The minority class fault samples are expanded by using NNTR-SMOTE oversampling algorithm, and the distribution of the sample numbers before and after the treatment is shown in table 1.

TABLE 1 sample number comparison before and after treatment

In the embodiment, the XGBoost model is selected as the classifier, and the GA is used to optimize its learning capacity and classification performance. The dimension-reduced sample data are divided into a training set, a validation set and a test set in a 6:2:2 ratio. The initial parameters of XGBoost are preset as follows: learning rate η = 0.3; decision tree node splitting criterion γ_split = 0; maximum decision tree depth d_max = 6; random sample extraction ratio r_subsample = 1; feature extraction ratio r_colsample = 1. When the maximum number of iterations is reached, the optimal GA-XGBoost classifier is obtained; finally, the test set is input into the optimal classifier, giving the classification results shown in FIG. 2.
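The 6:2:2 split and the preset parameters can be written down as in the sketch below. The function name `split_622`, the unstratified random shuffle, and the mapping of r_colsample to XGBoost's `colsample_bytree` (rather than another column-sampling parameter) are assumptions; the other dictionary keys are the XGBoost parameter names that correspond to the quantities listed above.

```python
import numpy as np

def split_622(n_samples, seed=0):
    """Return index arrays for a 6:2:2 train/validation/test split."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Initial XGBoost parameters as preset in the embodiment.
initial_params = {
    "eta": 0.3,               # learning rate
    "gamma": 0,               # node splitting criterion gamma_split
    "max_depth": 6,           # d_max
    "subsample": 1,           # r_subsample
    "colsample_bytree": 1,    # r_colsample (assumed mapping)
}

# 724 is the sample count quoted for the balanced data set in this document.
train_idx, val_idx, test_idx = split_622(724)
```

Rounding down the training and validation sizes leaves any remainder in the test set, so the three parts always cover every sample exactly once.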

To illustrate the superiority of the MDS-based feature fusion method, the MDS-fused features, the IEC three-ratio features, the Rogers four-ratio features and the 18-dimensional features are each input into the optimal GA-XGBoost classifier for fault diagnosis; the diagnosis results are shown in FIG. 3.

To verify the superiority of the method, five models (GA-XGBoost, PSO-XGBoost, XGBoost, SVM and ELM) are compared. Ten-fold cross-validation is used for all models, and the diagnostic accuracy of each model is shown in Table 2.

TABLE 2 Comparison of diagnostic accuracy across models

As can be seen from Table 2, the method achieves higher accuracy under unbalanced small-sample conditions.

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the invention, which is defined by the following claims.

Claims

1. An on-line diagnosis method for transformer faults based on NNTR-SMOTE oversampling to process unbalanced small samples is characterized by comprising the following steps:

2. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the first step are as follows:

step 1: collecting transformer DGA sample data

Specifically, the Z-score standardization formula is:

$$x' = \frac{x - \mu}{\sigma}$$

$$N = \frac{M - P}{P}$$

$$y'_{new} = y_2 + \mathrm{rand}(0,1)\,\bigl[\,y_0 + \mathrm{rand}(0,1)\,(y_1 - y_0) - y_2\,\bigr]$$

where rand(0, 1) denotes a random number between 0 and 1.

3. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the second step are as follows:

4. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the third step are as follows:

where $\Delta^{(2)} = [\sigma_{ij}^2]$ and the centering matrix is $J = I - \tfrac{1}{m}\,\mathbf{1}\mathbf{1}^{T}$.

Step 4: and obtaining a sample X after dimension reduction.

5. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the fourth step are as follows:

step 2: constructing an XGBoost model by utilizing data in a training set;

$$F = \{\, f(x) = \omega_{q(x)} \,\}, \quad q: \mathbb{R}^m \to T, \quad \omega \in \mathbb{R}^T$$

6. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the fifth step are as follows: