CN116680532A - Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling - Google Patents

Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling

Info

Publication number
CN116680532A
CN116680532A CN202310630341.6A CN202310630341A CN116680532A CN 116680532 A CN116680532 A CN 116680532A CN 202310630341 A CN202310630341 A CN 202310630341A CN 116680532 A CN116680532 A CN 116680532A
Authority
CN
China
Prior art keywords
samples
sample
transformer
data
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310630341.6A
Other languages
Chinese (zh)
Inventor
汪李忠
朱鹏
唐洪良
高俊青
陈悦
周潮
姚海燕
崔金栋
郭强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Zhejiang Electric Power Co Ltd Hangzhou Linping District Power Supply Co
State Grid Zhejiang Electric Power Co Ltd Hangzhou Yuhang District Power Supply Co
Hangzhou Power Equipment Manufacturing Co Ltd
Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd Hangzhou Linping District Power Supply Co
State Grid Zhejiang Electric Power Co Ltd Hangzhou Yuhang District Power Supply Co
Hangzhou Power Equipment Manufacturing Co Ltd
Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Zhejiang Electric Power Co Ltd Hangzhou Linping District Power Supply Co, State Grid Zhejiang Electric Power Co Ltd Hangzhou Yuhang District Power Supply Co, Hangzhou Power Equipment Manufacturing Co Ltd, Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd filed Critical State Grid Zhejiang Electric Power Co Ltd Hangzhou Linping District Power Supply Co
Priority to CN202310630341.6A priority Critical patent/CN116680532A/en
Publication of CN116680532A publication Critical patent/CN116680532A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/50 Testing of electric apparatus, lines, cables or components for short-circuits, continuity, leakage current or incorrect line connections
    • G01R31/62 Testing of transformers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Medical Informatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Power Engineering (AREA)
  • Testing Electric Properties And Detecting Electric Faults (AREA)

Abstract

The invention discloses an online transformer fault diagnosis method that handles unbalanced small samples by NNTR-SMOTE oversampling. The method comprises five steps: first, the collected transformer fault sample data are standardized, and balanced data are obtained with the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method; second, features of the gas dissolved in the oil are constructed with the non-coding ratio method to obtain a feature data set; third, the feature data set is fused with multidimensional scaling (MDS); fourth, a genetic algorithm (GA) optimizes the parameters of the XGBoost model to build the transformer fault diagnosis model, and ten-fold cross-validation is performed to realize transformer fault diagnosis; fifth, the trained XGBoost model diagnoses transformer faults online. The invention effectively reduces misjudgment and missed judgment of unbalanced fault samples and improves the diagnostic accuracy of the model.

Description

Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling
The invention belongs to the field of transformer fault diagnosis research, and particularly relates to an on-line transformer fault diagnosis method for processing unbalanced small samples based on NNTR-SMOTE oversampling.
Background
The oil-immersed transformer is a key piece of equipment in power transmission and transformation systems, and its operating state determines whether the power system can run safely and reliably. However, a large share of the oil-immersed transformers currently operating in China have been in service for many years and carry latent fault risks such as insulation degradation. To ensure the reliability and economy of power system operation, an efficient and accurate real-time transformer fault diagnosis model must be established.
When the insulation of an oil-immersed transformer ages, small amounts of gas dissolve in the insulating oil, and the composition and proportions of the dissolved gases reflect the operating condition of the transformer. Even in normal operation, an oil-immersed transformer generates very small amounts of gas through insulation aging, cracking and similar processes; the main components are H2, CH4, C2H4, C2H2, C2H6, CO2 and CO. Because the fault type correlates strongly with changes in the gas components, dissolved gas analysis (DGA), which analyzes the gas components and their contents, is currently one of the most important methods for transformer condition monitoring and fault diagnosis. Rule-based methods built on DGA, such as the improved three-ratio method, the Duval triangle method and the Rogers ratio method, are simple and have played an important role, but they all suffer from incomplete state codes and fuzzy code boundaries, so their diagnostic accuracy is low. With the development of machine learning theory, models such as the artificial neural network (ANN), the support vector machine (SVM) and the extreme learning machine (ELM) have been applied to transformer fault identification with good results. Although these methods are flexible and accurate, they require a large amount of data to train the model. In actual production, the occurrence rates of different transformer faults differ markedly, so the samples accumulated for different faults are extremely unbalanced, and a model trained on unbalanced data easily misjudges or misses faults that have few samples.
Currently, unbalanced data are mainly balanced by downsampling or oversampling. Oversampling adds redundant data and may cause overfitting, making the data distribution unreasonable or even inconsistent with reality. Downsampling may discard useful information in the data set.
To address these problems, the invention provides an online transformer fault diagnosis method for unbalanced samples, which effectively reduces misjudgment and missed judgment of unbalanced fault samples.
Disclosure of Invention
The invention provides an online transformer fault diagnosis method that handles unbalanced small samples by NNTR-SMOTE oversampling and effectively improves the accuracy of transformer fault diagnosis.
The invention is realized by the following technical scheme: an online transformer fault diagnosis method that handles unbalanced small samples by NNTR-SMOTE oversampling, comprising the following steps:
First, standardize the collected transformer fault sample data and obtain balanced data with the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method;
Second, construct features of the gas dissolved in the oil with the non-coding ratio method to obtain a feature data set;
Third, fuse the feature data set with multidimensional scaling (MDS);
Fourth, optimize the parameters of the XGBoost model with a genetic algorithm (GA), build the transformer fault diagnosis model, and perform ten-fold cross-validation to realize transformer fault diagnosis;
Fifth, diagnose transformer faults online with the trained XGBoost model.
Further, the specific steps of the first step are as follows:
Step 1: Collect transformer DGA sample data as an m × n data matrix,
where n is the number of dissolved gases in the collected oil and m is the number of samples collected.
Specifically, the dissolved gases in the oil include H2, CH4, C2H4, C2H2 and C2H6. According to the temperature and discharge energy of the fault, the transformer states are divided into eight types: normal, medium-temperature overheating, medium-low-temperature overheating, high-temperature overheating, discharge with overheating, partial discharge, low-energy discharge and high-energy discharge, labeled 1 to 8 respectively.
Step 2: Apply Z-score standardization to the collected fault sample data to obtain standardized sample data.
Specifically, the Z-score standardization formula is:
$$x' = \frac{x - \mu}{\sigma}$$
where μ is the mean of the data set x, σ is the standard deviation of the data set, and x' is the standardized value of x.
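The Z-score step can be sketched as follows. This is a minimal illustration, assuming that each gas column is standardized separately and that σ is the population standard deviation; the function name `z_score` and the sample readings are hypothetical and not taken from the patent.

```python
import numpy as np

def z_score(samples: np.ndarray) -> np.ndarray:
    """Standardize each column: x' = (x - mu) / sigma."""
    mu = samples.mean(axis=0)
    sigma = samples.std(axis=0)  # population standard deviation (assumption)
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid dividing by zero on constant columns
    return (samples - mu) / sigma

# Hypothetical readings; rows are samples, columns are H2, CH4, C2H4, C2H2, C2H6.
dga = np.array([
    [30.0, 12.0, 5.0, 0.1, 8.0],
    [250.0, 90.0, 60.0, 12.0, 25.0],
    [80.0, 40.0, 110.0, 1.0, 30.0],
])
standardized = z_score(dga)
```

After this step every column of `standardized` has zero mean and unit standard deviation.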
Step 3: Obtain balanced data with the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method.
Specifically, let the number of majority-class samples be M and the number of minority-class samples be P; the minority-class sampling rate N is then:
$$N = \frac{M - P}{P}$$
Select an arbitrary minority-class sample $y_0$ and, within the triangular region formed by $y_0$ and its two nearest minority-class neighbors $y_1$ and $y_2$, generate a new artificial sample $y'_{new}$:
$$y'_{new} = y_2 + \mathrm{rand}(0,1)\,\bigl[\,y_0 + \mathrm{rand}(0,1)\,(y_1 - y_0) - y_2\,\bigr]$$
where rand(0, 1) denotes a random number between 0 and 1.
Once all artificial samples have been generated, check the nearest neighbor of each one: if the neighbor belongs to the same class, keep the artificial sample; if it belongs to a different class, delete it. Then recalculate the sampling rate and continue synthesizing artificial samples until the sampling rate reaches 0.
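The triangle-region synthesis and the neighbor check can be sketched as below. This is an illustrative reading of the step, not the patented implementation: the function names, the toy data, and the interpretation of the neighbor check (keep a synthetic sample only if its nearest original sample is a minority sample) are assumptions. The minority class needs at least three samples so that two neighbors exist.

```python
import numpy as np

def nntr_smote(minority: np.ndarray, n_new: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n_new points inside triangles spanned by a seed and its 2 nearest minority neighbours."""
    synthetic = np.empty((n_new, minority.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(minority))
        y0 = minority[i]
        dist = np.linalg.norm(minority - y0, axis=1)
        dist[i] = np.inf  # exclude the seed itself
        j1, j2 = np.argsort(dist)[:2]
        y1, y2 = minority[j1], minority[j2]
        edge_point = y0 + rng.random() * (y1 - y0)              # point on the edge y0-y1
        synthetic[s] = y2 + rng.random() * (edge_point - y2)    # point between y2 and that edge
    return synthetic

def filter_by_nearest_neighbour(synthetic, minority, majority):
    """Keep a synthetic sample only if its nearest original sample is a minority sample."""
    keep = np.array([
        np.linalg.norm(minority - y, axis=1).min() <= np.linalg.norm(majority - y, axis=1).min()
        for y in synthetic
    ], dtype=bool)
    return synthetic[keep]

rng = np.random.default_rng(0)
# Hypothetical 2-D data: P = 3 minority samples, M = 6 majority samples.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
majority = rng.normal(loc=10.0, scale=0.5, size=(6, 2))
n_new = len(majority) - len(minority)  # N * P = M - P samples to add
synthetic = nntr_smote(minority, n_new, rng)
kept = filter_by_nearest_neighbour(synthetic, minority, majority)
```

Because both random factors lie in [0, 1], each synthetic point is a convex combination of $y_0$, $y_1$ and $y_2$ and therefore falls inside (or on the boundary of) their triangle.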
Further, the specific steps of the second step are as follows:
Step 1: Construct the dissolved-gas features from the standardized fault sample data with the non-coding ratio method.
The specific dissolved-gas features are: CH4/H2, C2H2/H2, C2H2/C2H4, C2H4/C2H6, C2H6/CH4, C2H2/CH4, C2H4/CH4, H2/THC, CH4/THC, C2H4/THC, C2H6/THC, C2H2/THC, (CH4+C2H4)/THC, H2/ALL, CH4/ALL, C2H2/ALL, C2H4/ALL and C2H6/ALL,
where THC = CH4 + C2H2 + C2H4 + C2H6 and ALL = H2 + CH4 + C2H2 + C2H4 + C2H6.
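The 18 ratio features can be computed as in the sketch below. The function name `ratio_features`, the small `eps` added to every denominator to avoid division by zero, and the example concentrations are assumptions made for illustration; the patent does not say how zero denominators are handled.

```python
from typing import Dict

def ratio_features(h2: float, ch4: float, c2h4: float, c2h2: float, c2h6: float,
                   eps: float = 1e-9) -> Dict[str, float]:
    """Return the 18 non-coding ratio features for one DGA sample."""
    thc = ch4 + c2h2 + c2h4 + c2h6  # total hydrocarbons
    all_gas = h2 + thc              # all five gases

    def div(a: float, b: float) -> float:
        return a / (b + eps)        # eps guard is an assumption, not from the source

    return {
        "CH4/H2": div(ch4, h2), "C2H2/H2": div(c2h2, h2),
        "C2H2/C2H4": div(c2h2, c2h4), "C2H4/C2H6": div(c2h4, c2h6),
        "C2H6/CH4": div(c2h6, ch4), "C2H2/CH4": div(c2h2, ch4),
        "C2H4/CH4": div(c2h4, ch4),
        "H2/THC": div(h2, thc), "CH4/THC": div(ch4, thc),
        "C2H4/THC": div(c2h4, thc), "C2H6/THC": div(c2h6, thc),
        "C2H2/THC": div(c2h2, thc), "(CH4+C2H4)/THC": div(ch4 + c2h4, thc),
        "H2/ALL": div(h2, all_gas), "CH4/ALL": div(ch4, all_gas),
        "C2H2/ALL": div(c2h2, all_gas), "C2H4/ALL": div(c2h4, all_gas),
        "C2H6/ALL": div(c2h6, all_gas),
    }

# Hypothetical concentrations for a single sample.
features = ratio_features(h2=120.0, ch4=45.0, c2h4=30.0, c2h2=2.0, c2h6=15.0)
```

The five `/ALL` features sum to 1 (up to the `eps` guard), which is a quick sanity check on the inputs.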
Further, the specific steps of the third step are as follows:
Step 1: Calculate the Euclidean distances σ between the m d-dimensional samples and generate the distance matrix Δ.
Given m d-dimensional samples, the matrix of Euclidean distances between the samples is defined as:
$$\Delta = [\sigma_{ij}]_{m \times m}$$
where $\sigma_{ij}$ is the distance between the i-th sample and the j-th sample, namely:
$$\sigma_{ij} = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{l=1}^{d} (x_{il} - x_{jl})^2}$$
Step 2: Calculate the inner product matrix of the dimension-reduced samples, that is, the double-centered matrix $B$ of Δ.
The solution for the matrix X is obtained by singular value decomposition of the double-centered form of the distance matrix Δ, which is:
$$B = -\tfrac{1}{2}\, J \Delta^{(2)} J$$
where $\Delta^{(2)} = [\sigma_{ij}^2]$ is the matrix of squared distances and the centering matrix $J$ is:
$$J = I_m - \tfrac{1}{m}\, \mathbf{1}\mathbf{1}^{\mathsf{T}}$$
Step 3: Perform eigenvalue decomposition on the inner product matrix $B$ to obtain k eigenvalues and the corresponding eigenvectors, and arrange the eigenvalues in descending order.
Because $B$ is a symmetric positive semidefinite matrix, its singular value decomposition is:
$$B = U \Lambda U^{\mathsf{T}}$$
In this formula, Λ is the diagonal matrix of the eigenvalues of $B$ and U is the matrix of the corresponding eigenvectors. The eigenvalues of $B$ are sorted from largest to smallest.
Step 4: Obtain the dimension-reduced samples X.
Select the a largest eigenvalues and their corresponding eigenvectors; the dimension-reduced samples are then calculated as:
$$X = U_a \Lambda_a^{1/2}$$
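The four MDS steps correspond to classical (metric) multidimensional scaling, which can be sketched with NumPy as follows. The function name `classical_mds`, the random input data, and the clipping of slightly negative eigenvalues caused by rounding are illustrative assumptions.

```python
import numpy as np

def classical_mds(samples: np.ndarray, n_components: int) -> np.ndarray:
    """Classical MDS: double-center the squared distances, then eigendecompose."""
    m = samples.shape[0]
    diff = samples[:, None, :] - samples[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)              # squared Euclidean distances
    centering = np.eye(m) - np.ones((m, m)) / m     # J = I - (1/m) 1 1^T
    inner = -0.5 * centering @ sq_dist @ centering  # B = -1/2 J D^(2) J
    eigvals, eigvecs = np.linalg.eigh(inner)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top_vals)    # X = U_a * Lambda_a^(1/2)

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 18))   # hypothetical 18-dimensional ratio features
reduced = classical_mds(features, 8)   # fused 8-dimensional representation
```

One point worth keeping in mind for the online step: classical MDS embeds the samples it was computed on, so applying it to newly collected samples requires either recomputing the embedding together with the training data or an out-of-sample extension; the text does not state which is used.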
Further, the specific steps of the fourth step are as follows:
Step 1: Divide the dimension-reduced sample data into a training set, a validation set and a test set in a given proportion.
Step 2: Build the XGBoost model from the training-set data.
Specifically, for a data set $D = \{(x_i, y_i)\}$ ($x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with n samples and m features, the final prediction output of K CARTs is:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
$$F = \{\, f(x) = \omega_{q(x)} \,\}, \quad q: \mathbb{R}^m \to T, \quad \omega \in \mathbb{R}^T$$
where each function $f_k$ corresponds to an independent tree structure q and leaf weights ω; q maps a sample to the corresponding leaf label; each leaf node of each CART carries a continuous score, namely its weight, and the score of the i-th leaf node is $\omega_i$; T is the number of leaf nodes; F is the set of all CARTs; and $\omega_{q(x)}$ is the score of sample x, that is, the model prediction. For each sample, every CART assigns the sample to a leaf node according to its own splitting rules, and the final prediction is obtained by summing the scores ω of the corresponding leaves.
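The additive prediction rule can be illustrated with a toy ensemble in plain Python. The two hand-written trees, their split thresholds and their leaf weights are invented for illustration and do not come from a trained XGBoost model.

```python
from typing import Callable, List, Sequence, Tuple

# A tree is a pair (q, omega): q maps a sample to a leaf index, omega holds the leaf weights.
Tree = Tuple[Callable[[Sequence[float]], int], List[float]]

def predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Sum the leaf scores omega[q(x)] over all trees."""
    return sum(omega[q(x)] for q, omega in trees)

# Hypothetical trees with T = 2 and T = 3 leaves.
tree_1: Tree = (lambda x: 0 if x[0] < 0.5 else 1, [-0.4, 0.6])
tree_2: Tree = (lambda x: 0 if x[1] < 1.0 else (1 if x[0] < 2.0 else 2), [0.1, -0.2, 0.3])

# The sample falls into leaf 1 of both trees, so the score is 0.6 + (-0.2), about 0.4.
score = predict([tree_1, tree_2], [0.7, 1.5])
```

For multi-class fault diagnosis, XGBoost keeps one such additive score per class and converts the scores to class probabilities; the sketch shows only the single-score case described in the formula.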
Step 3: Set the initial parameters of the XGBoost model, pre-train it, and iteratively adjust the model parameters with the GA.
Specifically, six hyperparameters are selected for GA optimization to improve the performance of the diagnostic model: the number of iterations, the learning rate $\eta$, the maximum tree depth $d_{max}$, the random sample subsampling ratio $r_{subsample}$, the feature subsampling ratio $r_{colsample}$, and the node splitting criterion $\gamma_{split}$.
Step 4: Check whether the maximum number of iterations or another termination condition has been reached; if so, take the current parameter values as the optimal model parameters; otherwise, return to Step 3.
Step 5: Perform ten-fold cross-validation, test the diagnostic performance of the model, and output the fault classification result.
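The GA loop of Steps 3 and 4 can be sketched as follows. This is a generic genetic algorithm with tournament selection, uniform crossover and elitism, which are design choices assumed here because the text does not specify the GA operators. The search bounds and the stand-in fitness function are hypothetical; the dictionary keys follow the parameter names of XGBoost's scikit-learn interface. In practice the fitness would train an XGBoost model with the candidate parameters and return its validation accuracy.

```python
import random
from typing import Callable, Dict, List

Params = Dict[str, float]

# Hypothetical search bounds for the six hyperparameters.
BOUNDS = {
    "n_estimators": (50, 500), "learning_rate": (0.01, 0.5), "max_depth": (2, 10),
    "subsample": (0.5, 1.0), "colsample_bytree": (0.5, 1.0), "gamma": (0.0, 5.0),
}
INTEGER_PARAMS = {"n_estimators", "max_depth"}

def sample_gene(name: str, rng: random.Random) -> float:
    low, high = BOUNDS[name]
    return rng.randint(low, high) if name in INTEGER_PARAMS else rng.uniform(low, high)

def random_individual(rng: random.Random) -> Params:
    return {name: sample_gene(name, rng) for name in BOUNDS}

def tournament(population: List[Params], fitness: Callable[[Params], float],
               rng: random.Random) -> Params:
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def ga_search(fitness: Callable[[Params], float], pop_size: int = 20,
              generations: int = 30, mutation_rate: float = 0.2, seed: int = 0) -> Params:
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):              # termination: maximum number of iterations
        children = [dict(best)]               # elitism: carry the best individual over
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            child = {n: (p1[n] if rng.random() < 0.5 else p2[n]) for n in BOUNDS}
            for n in BOUNDS:                  # mutation: resample a gene within its bounds
                if rng.random() < mutation_rate:
                    child[n] = sample_gene(n, rng)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

# Stand-in fitness (hypothetical): closeness to an arbitrary target configuration.
TARGET = {"n_estimators": 200, "learning_rate": 0.1, "max_depth": 6,
          "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0.5}

def stand_in_fitness(params: Params) -> float:
    return -sum(((params[n] - TARGET[n]) / (BOUNDS[n][1] - BOUNDS[n][0])) ** 2 for n in BOUNDS)

best_params = ga_search(stand_in_fitness)
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.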
Further, the specific steps of the fifth step are as follows:
Step 1: Standardize the collected transformer fault sample data and construct the dissolved-gas features with the non-coding ratio method to obtain feature data.
Step 2: Fuse the feature data with the MDS method to obtain fused data.
Step 3: Feed the fused data into the trained XGBoost model to determine the fault type.
The beneficial effects of the invention are as follows:
The invention provides an online transformer fault diagnosis method for unbalanced samples. First, the method takes DGA data as the model's characteristic quantities and constructs the dissolved-gas features after standardization; second, balanced data are obtained with NNTR-SMOTE oversampling and fused with multidimensional scaling (MDS); third, an XGBoost model is built, its parameters are optimized with the GA, and the transformer fault diagnosis model is constructed from the optimal parameters; finally, the trained XGBoost model diagnoses transformer faults online. The method judges the fault state of the transformer accurately, effectively mitigates the low classification accuracy caused by unbalanced fault data, and provides decision support for online transformer fault diagnosis.
Drawings
Figure 1 is a block flow diagram of a method according to the present invention.
FIG. 2 is a graph comparing the characteristic distribution of training data before and after sampling according to an embodiment.
FIG. 3 is a graph of diagnostic results for different input features of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein are to be considered in an illustrative sense only and are not intended to limit the invention.
First, the collected transformer fault sample data are standardized, and balanced data are obtained with the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method. The specific steps are as follows:
S101: Collect transformer DGA sample data as an m × n data matrix,
where n is the number of dissolved gases in the collected oil and m is the number of samples collected.
Specifically, the dissolved gases in the oil include H2, CH4, C2H4, C2H2 and C2H6. According to the temperature and discharge energy of the fault, the transformer states are divided into eight types: normal, medium-temperature overheating, medium-low-temperature overheating, high-temperature overheating, discharge with overheating, partial discharge, low-energy discharge and high-energy discharge, labeled 1 to 8 respectively.
S102: Apply Z-score standardization to the collected fault sample data to obtain standardized sample data.
Specifically, the Z-score standardization formula is:
$$x' = \frac{x - \mu}{\sigma}$$
where μ is the mean of the data set x, σ is the standard deviation of the data set, and x' is the standardized value of x.
S103: Obtain balanced data with the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method.
Specifically, let the number of majority-class samples be M and the number of minority-class samples be P; the minority-class sampling rate N is then:
$$N = \frac{M - P}{P}$$
Select an arbitrary minority-class sample $y_0$ and, within the triangular region formed by $y_0$ and its two nearest minority-class neighbors $y_1$ and $y_2$, generate a new artificial sample $y'_{new}$:
$$y'_{new} = y_2 + \mathrm{rand}(0,1)\,\bigl[\,y_0 + \mathrm{rand}(0,1)\,(y_1 - y_0) - y_2\,\bigr]$$
where rand(0, 1) denotes a random number between 0 and 1.
Once all artificial samples have been generated, check the nearest neighbor of each one: if the neighbor belongs to the same class, keep the artificial sample; if it belongs to a different class, delete it. Then recalculate the sampling rate and continue synthesizing artificial samples until the sampling rate reaches 0.
Second, features of the gas dissolved in the oil are constructed with the non-coding ratio method to obtain a feature data set. The specific steps are as follows:
S201: Construct the dissolved-gas features from the standardized fault sample data with the non-coding ratio method.
The specific dissolved-gas features are: CH4/H2, C2H2/H2, C2H2/C2H4, C2H4/C2H6, C2H6/CH4, C2H2/CH4, C2H4/CH4, H2/THC, CH4/THC, C2H4/THC, C2H6/THC, C2H2/THC, (CH4+C2H4)/THC, H2/ALL, CH4/ALL, C2H2/ALL, C2H4/ALL and C2H6/ALL,
where THC = CH4 + C2H2 + C2H4 + C2H6 and ALL = H2 + CH4 + C2H2 + C2H4 + C2H6.
Third, the feature data set is fused with multidimensional scaling (MDS). The specific steps are as follows:
S301: Calculate the Euclidean distances σ between the m d-dimensional samples and generate the distance matrix Δ.
There are 724 18-dimensional samples, and the matrix of Euclidean distances between the samples is defined as:
$$\Delta = [\sigma_{ij}]_{m \times m}$$
where $\sigma_{ij}$ is the distance between the i-th sample and the j-th sample, namely:
$$\sigma_{ij} = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{l=1}^{d} (x_{il} - x_{jl})^2}$$
S302: Calculate the inner product matrix of the dimension-reduced samples, that is, the double-centered matrix $B$ of Δ.
The solution for the matrix X is obtained by singular value decomposition of the double-centered form of the distance matrix Δ, which is:
$$B = -\tfrac{1}{2}\, J \Delta^{(2)} J$$
where $\Delta^{(2)} = [\sigma_{ij}^2]$ is the matrix of squared distances and the centering matrix $J$ is:
$$J = I_m - \tfrac{1}{m}\, \mathbf{1}\mathbf{1}^{\mathsf{T}}$$
S303: Perform eigenvalue decomposition on the inner product matrix $B$ to obtain k eigenvalues and the corresponding eigenvectors, and arrange the eigenvalues in descending order.
Because $B$ is a symmetric positive semidefinite matrix, its singular value decomposition is:
$$B = U \Lambda U^{\mathsf{T}}$$
In this formula, Λ is the diagonal matrix of the eigenvalues of $B$ and U is the matrix of the corresponding eigenvectors. The eigenvalues of $B$ are sorted from largest to smallest.
S304: Obtain the dimension-reduced samples X.
Select the 8 largest eigenvalues and their corresponding eigenvectors; the dimension-reduced samples are then calculated as:
$$X = U_8 \Lambda_8^{1/2}$$
Fourth, the parameters of the XGBoost model are optimized with the GA, the transformer fault diagnosis model is built, and ten-fold cross-validation is performed to realize transformer fault diagnosis. The specific steps are as follows:
S401: Divide the dimension-reduced sample data into a training set, a validation set and a test set in a given proportion.
S402: Build the XGBoost model from the training-set data.
Specifically, for a data set $D = \{(x_i, y_i)\}$ ($x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$) with n samples and m features, the final prediction output of K CARTs is:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
$$F = \{\, f(x) = \omega_{q(x)} \,\}, \quad q: \mathbb{R}^m \to T, \quad \omega \in \mathbb{R}^T$$
where each function $f_k$ corresponds to an independent tree structure q and leaf weights ω; q maps a sample to the corresponding leaf label; each leaf node of each CART carries a continuous score, namely its weight, and the score of the i-th leaf node is $\omega_i$; T is the number of leaf nodes; F is the set of all CARTs; and $\omega_{q(x)}$ is the score of sample x, that is, the model prediction. For each sample, every CART assigns the sample to a leaf node according to its own splitting rules, and the final prediction is obtained by summing the scores ω of the corresponding leaves.
S403: Set the initial parameters of the XGBoost model, pre-train it, and iteratively adjust the model parameters with the GA.
Specifically, six hyperparameters are selected for GA optimization to improve the performance of the diagnostic model: the number of iterations, the learning rate $\eta$, the maximum tree depth $d_{max}$, the random sample subsampling ratio $r_{subsample}$, the feature subsampling ratio $r_{colsample}$, and the node splitting criterion $\gamma_{split}$.
S404: Check whether the maximum number of iterations or another termination condition has been reached; if so, take the current parameter values as the optimal model parameters; otherwise, return to S403.
S405: Perform ten-fold cross-validation, test the diagnostic performance of the model, and output the fault classification result.
Fifth, transformer faults are diagnosed online with the trained XGBoost model. The specific steps are as follows:
S501: Standardize the collected transformer fault sample data and construct the dissolved-gas features with the non-coding ratio method to obtain feature data.
S502: Fuse the feature data with the MDS method to obtain fused data.
S503: Feed the fused data into the trained XGBoost model to determine the fault type.
Examples:
First, 425 transformer fault samples were collected: 91 normal, 46 medium-temperature overheating, 33 medium-low-temperature overheating, 53 high-temperature overheating, 17 discharge with overheating, 29 partial discharge, 72 low-energy discharge and 84 high-energy discharge.
The minority class fault samples are expanded by using NNTR-SMOTE oversampling algorithm, and the distribution of the sample numbers before and after the treatment is shown in table 1.
Table 1: Sample counts before and after oversampling
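As a worked illustration of the sampling-rate formula N = (M - P)/P, the sketch below applies it to the eight class counts of this embodiment. Taking the largest class (normal, 91 samples) as the majority class M for every other class is an assumption about how the formula is used in the multi-class case.

```python
# Class counts from the embodiment (425 samples in total).
counts = {
    "normal": 91,
    "medium-temperature overheating": 46,
    "medium-low-temperature overheating": 33,
    "high-temperature overheating": 53,
    "discharge with overheating": 17,
    "partial discharge": 29,
    "low-energy discharge": 72,
    "high-energy discharge": 84,
}

majority = max(counts.values())  # M, assumed to be the largest class

# Sampling rate N = (M - P) / P and number of synthetic samples M - P per class.
rates = {name: (majority - p) / p for name, p in counts.items()}
to_add = {name: majority - p for name, p in counts.items()}

for name in counts:
    print(f"{name}: N = {rates[name]:.2f}, synthetic samples needed = {to_add[name]}")
```

The rarest class (discharge with overheating, 17 samples) needs 74 synthetic samples, which corresponds to a sampling rate of 74/17, while the normal class needs none.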
In this embodiment, the XGBoost model is used as the classifier, and the GA is used to optimize its learning ability and classification performance. The dimension-reduced sample data are divided into a training set, a validation set and a test set in the ratio 6:2:2. The initial XGBoost parameters are preset as follows: learning rate $\eta = 0.3$; node splitting criterion $\gamma_{split} = 0$; maximum tree depth $d_{max} = 6$; random sample subsampling ratio $r_{subsample} = 1$; feature subsampling ratio $r_{colsample} = 1$. When the maximum number of iterations is reached, the optimal GA-XGBoost classifier is obtained; finally, the test set is input to the optimal classifier, and the classification result is shown in Figure 2.
To illustrate the superiority of the MDS-based feature fusion method, the MDS-fused features, the IEC three-ratio features, the Rogers four-ratio features and the 18-dimensional ratio features were each input to the optimal GA-XGBoost classifier for fault diagnosis; the diagnosis results are shown in Figure 3.
To verify the superiority of the method, five models (GA-XGBoost, PSO-XGBoost, XGBoost, SVM and ELM) were compared; ten-fold cross-validation was used for all models, and the diagnostic accuracy of each model is shown in Table 2.
Table 2: Comparison of diagnostic accuracy across models
As Table 2 shows, the proposed method achieves higher accuracy than the compared models under unbalanced small-sample conditions.
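The ten-fold cross-validation applied to all models can be sketched as an index split in plain Python. The function name `k_fold_indices`, the shuffling seed, and the use of 724 samples (the sample count given for the MDS step) are illustrative choices; the text does not say whether the folds are stratified by fault class, so this sketch uses a plain shuffled split.

```python
import random
from typing import Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(724))
```

Each sample is used for testing exactly once and for training in the other nine folds; the reported accuracy of a model would then be the average of its accuracy over the ten test folds.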
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, which is defined by the following claims.

Claims (6)

1. An online transformer fault diagnosis method that handles unbalanced small samples by NNTR-SMOTE oversampling, characterized by comprising the following steps:
first, standardizing the collected transformer fault sample data and obtaining balanced data with the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method;
second, constructing features of the gas dissolved in the oil with the non-coding ratio method to obtain a feature data set;
third, fusing the feature data set with multidimensional scaling (MDS);
fourth, optimizing the parameters of the XGBoost model with a genetic algorithm (GA), building the transformer fault diagnosis model, and performing ten-fold cross-validation to realize transformer fault diagnosis;
fifth, diagnosing transformer faults online with the trained XGBoost model.
2. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the first step are as follows:
step 1: collecting transformer DGA sample data
where n is the number of dissolved gases in oil that are collected and m is the number of samples collected.
Specifically, the dissolved gases in oil include H₂, CH₄, C₂H₄, C₂H₂ and C₂H₆. According to the temperature and discharge energy of the fault, the transformer states are divided into: normal, medium-temperature overheating, medium-low-temperature overheating, high-temperature overheating, discharge combined with overheating, partial discharge, low-energy discharge and high-energy discharge, with labels 1 to 8 respectively.
Step 2: performing Z-score standardization processing on the collected fault sample data to obtain standardized sample data
Specifically, the Z-score standardization formula is:
$$x' = \frac{x - \mu}{\sigma}$$
where $\mu$ is the mean of the data set containing x, $\sigma$ is the standard deviation of the data set elements, and $x'$ is the result of standardizing x.
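The standardization above can be sketched in a few lines of NumPy. This is a minimal illustration, not the applicant's implementation: it assumes that each column of the sample matrix is one gas and each row one sample, that the population standard deviation is used, and that constant columns are left unscaled. The function name `zscore` is hypothetical.

```python
import numpy as np

def zscore(X):
    """Column-wise Z-score: x' = (x - mu) / sigma.

    Assumption: rows are samples, columns are gases.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)                      # population std (ddof=0), an assumption
    sigma = np.where(sigma == 0, 1.0, sigma)   # assumption: leave constant columns unscaled
    return (X - mu) / sigma, mu, sigma

# Example with three samples of two gases (made-up numbers).
Z, mu, sigma = zscore([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

Returning `mu` and `sigma` alongside the standardized data lets the same statistics be reused when new samples arrive during online diagnosis.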
Step 3: obtaining balanced data by the nearest neighbor triangle region synthetic minority oversampling (NNTR-SMOTE) method;
specifically, assuming that the number of majority-class samples is M and the number of minority-class samples is P, the minority-class sampling rate N is calculated as:
$$N = \frac{M - P}{P}$$
An arbitrary minority-class sample $y_0$ is selected, and a new artificial sample $y'_{new}$ is generated in the triangular region formed by $y_0$ and its 2 nearest minority-class neighbors $y_1$ and $y_2$, according to the formula:
$$y'_{new} = y_2 + \mathrm{rand}(0,1)\,[\,y_0 + \mathrm{rand}(0,1)\,(y_1 - y_0) - y_2\,]$$
where rand(0,1) denotes a random number between 0 and 1.
After all artificial samples have been generated, the nearest neighbor of each artificial sample is checked: if the nearest neighbor belongs to the same class, the artificial sample is retained; if it belongs to a different class, the artificial sample is deleted. The sampling rate is then recalculated, and artificial samples are synthesized until the sampling rate reaches 0.
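The oversampling procedure of Step 3 can be sketched as follows. This is only an illustrative reading of the claim, under several assumptions that the text does not fix: the nearest-neighbor check is made against the original samples of both classes, all other classes are pooled into one "majority" array, and a round limit guards against endless resampling. The names `nntr_smote`, `_triangle_sample` and `max_rounds` are hypothetical.

```python
import numpy as np

def _triangle_sample(X_min, rng):
    """Draw one point inside the triangle (y0, y1, y2) using the claimed formula."""
    i = rng.integers(len(X_min))
    y0 = X_min[i]
    dist = np.linalg.norm(X_min - y0, axis=1)
    dist[i] = np.inf                          # exclude y0 itself
    j1, j2 = np.argsort(dist)[:2]             # two nearest minority neighbors
    y1, y2 = X_min[j1], X_min[j2]
    p = y0 + rng.random() * (y1 - y0)         # point on the edge y0 -> y1
    return y2 + rng.random() * (p - y2)       # point between y2 and that edge point

def nntr_smote(X_min, X_maj, seed=0, max_rounds=100):
    """Oversample the minority rows until they match the majority count."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    if len(X_min) < 3:
        raise ValueError("at least 3 minority samples are needed to form a triangle")
    synthetic = np.empty((0, X_min.shape[1]))
    for _ in range(max_rounds):
        need = len(X_maj) - len(X_min) - len(synthetic)   # remaining N * P
        if need <= 0:
            break
        batch = np.array([_triangle_sample(X_min, rng) for _ in range(need)])
        # Distance from each artificial sample to its nearest original sample per class.
        d_min = np.linalg.norm(batch[:, None] - X_min[None], axis=2).min(axis=1)
        d_maj = np.linalg.norm(batch[:, None] - X_maj[None], axis=2).min(axis=1)
        # Keep a sample only if its nearest neighbor is a minority sample.
        synthetic = np.vstack([synthetic, batch[d_min <= d_maj]])
    return np.vstack([X_min, synthetic])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_maj = np.array([[10.0 + k, 10.0] for k in range(10)])
balanced_minority = nntr_smote(X_min, X_maj)
```

Because every artificial point is a convex combination of three minority samples, it cannot fall outside the region already occupied by the minority class, which is the property the triangle construction relies on.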
3. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the second step are as follows:
step 1: constructing the characteristic of dissolved gas in oil by adopting a non-coding ratio method for fault sample data after standardized treatment;
the specific dissolved-gas-in-oil features are: CH₄/H₂, C₂H₂/H₂, C₂H₂/C₂H₄, C₂H₄/C₂H₆, C₂H₆/CH₄, C₂H₂/CH₄, C₂H₄/CH₄, H₂/THC, CH₄/THC, C₂H₄/THC, C₂H₆/THC, C₂H₂/THC, (CH₄+C₂H₄)/THC, H₂/ALL, CH₄/ALL, C₂H₂/ALL, C₂H₄/ALL, C₂H₆/ALL,
where THC = CH₄ + C₂H₂ + C₂H₄ + C₂H₆ and ALL = H₂ + CH₄ + C₂H₂ + C₂H₄ + C₂H₆.
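The 18 ratio features listed above could be computed as in the following sketch. It assumes the gas values are supplied as arrays keyed by gas name, and it adds a small constant `eps` to every denominator to avoid division by zero; both the input layout and `eps` are assumptions made for illustration and are not stated in the claim.

```python
import numpy as np

GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2")

# (numerator, denominator) pairs in the order given in the claim.
PAIRS = [
    ("CH4", "H2"), ("C2H2", "H2"), ("C2H2", "C2H4"), ("C2H4", "C2H6"),
    ("C2H6", "CH4"), ("C2H2", "CH4"), ("C2H4", "CH4"),
    ("H2", "THC"), ("CH4", "THC"), ("C2H4", "THC"), ("C2H6", "THC"),
    ("C2H2", "THC"), ("CH4+C2H4", "THC"),
    ("H2", "ALL"), ("CH4", "ALL"), ("C2H2", "ALL"), ("C2H4", "ALL"), ("C2H6", "ALL"),
]

def ratio_features(gas, eps=1e-12):
    """Return a dict of the 18 non-coding ratio features.

    gas: mapping from gas name to an array of values (one entry per sample).
    eps: assumed guard against zero denominators.
    """
    g = {name: np.asarray(gas[name], dtype=float) for name in GASES}
    g["THC"] = g["CH4"] + g["C2H2"] + g["C2H4"] + g["C2H6"]
    g["ALL"] = g["H2"] + g["THC"]
    g["CH4+C2H4"] = g["CH4"] + g["C2H4"]
    feats = {}
    for num, den in PAIRS:
        label = f"({num})" if "+" in num else num
        feats[f"{label}/{den}"] = g[num] / (g[den] + eps)
    return feats

# One made-up sample for illustration only.
sample = {"H2": [10.0], "CH4": [20.0], "C2H6": [30.0], "C2H4": [40.0], "C2H2": [0.0]}
features = ratio_features(sample)
```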
4. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the third step are as follows:
step 1: calculating the Euclidean distances σ between the m d-dimensional samples and generating the distance matrix Δ;
assuming that there are m d-dimensional samples, the Euclidean distance matrix between the samples is defined as:
$$\Delta = [\sigma_{ij}]_{m \times m}$$
where $\sigma_{ij}$ is the distance between the i-th sample and the j-th sample, namely:
$$\sigma_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$$
Step 2: calculating the inner product matrix $B$ of the samples after dimension reduction, that is, solving the double-centered matrix of Δ;
the solution of the matrix X can be obtained by decomposing the double-centered matrix of the distance matrix Δ, where the double-centered matrix of Δ is:
$$B = -\tfrac{1}{2}\, J \Delta^{(2)} J$$
where $\Delta^{(2)} = [\sigma_{ij}^2]$ and the centering matrix $J$ is:
$$J = I - \tfrac{1}{m}\, \mathbf{1}\mathbf{1}^{T}$$
Step 3: performing eigenvalue decomposition on the inner product matrix $B$ to obtain k eigenvalues and the corresponding eigenvectors, and arranging the eigenvalues in descending order;
because the matrix $B$ is symmetric and positive semidefinite, singular value decomposition is performed on $B$, namely:
$$B = U \Lambda U^{T}$$
where $\Lambda$ is the diagonal matrix composed of the eigenvalues of $B$, and $U$ is the matrix composed of the corresponding eigenvectors. The eigenvalues of $B$ are sorted from largest to smallest.
Step 4: obtaining the sample X after dimension reduction.
The first a largest eigenvalues and their corresponding eigenvectors are selected, and the sample X after dimension reduction is then calculated as:
$$X = U_a \Lambda_a^{1/2}$$
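The four MDS steps can be sketched with NumPy as follows. This is a minimal classical-MDS illustration under stated assumptions: the function name `classical_mds` is hypothetical, `numpy.linalg.eigh` is used because the double-centered matrix is symmetric, and small negative eigenvalues caused by rounding are clipped to zero.

```python
import numpy as np

def classical_mds(X, a):
    """Reduce m samples (rows of X) to a dimensions by classical MDS."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    # Step 1: squared Euclidean distances sigma_ij^2.
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=2)
    # Step 2: double centering, B = -1/2 * J * D2 * J.
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ D2 @ J
    # Step 3: eigen-decomposition of the symmetric matrix B, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:a]
    lam = np.clip(eigvals[order], 0.0, None)   # assumption: clip rounding noise below zero
    # Step 4: embedding built from the a largest eigenpairs.
    return eigvecs[:, order] * np.sqrt(lam)

rng = np.random.default_rng(0)
X18 = rng.normal(size=(30, 18))     # stand-in for 30 samples of 18 ratio features
X_low = classical_mds(X18, a=5)     # a=5 is an arbitrary choice for the illustration
```

When a equals the rank of the centered data, the embedding reproduces the original pairwise distances exactly; smaller values of a give the best low-dimensional approximation in the classical-MDS sense.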
5. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the fourth step are as follows:
step 1: dividing the dimension-reduced sample data into a training set, a validation set and a test set in a certain proportion;
step 2: constructing an XGBoost model from the data in the training set;
specifically, for a data set $D = \{(x_i, y_i)\}$ ($x_i \in R^m$, $y_i \in R$) with n samples and m features, the final prediction output $\hat{y}_i$ of K CARTs is:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
$$F = \{ f(x) = \omega_{q(x)} \}, \quad q: R^m \to T, \quad \omega \in R^T$$
where each function $f_k$ corresponds to an independent tree structure q and leaf weights ω; q maps a sample to its leaf index; each leaf node of each CART corresponds to a continuous score, namely its weight, and the score of the i-th leaf node is $\omega_i$; T is the number of leaf nodes; F is the set of CARTs; and $\omega_{q(x)}$ is the score of sample x, i.e. the model prediction. For each sample, each CART assigns the sample to a leaf node according to its own classification rules, and the final prediction is obtained by accumulating the scores ω of the corresponding leaves.
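To make the additive form concrete, the toy sketch below builds two hand-written two-leaf trees and sums their leaf scores. It is not XGBoost itself and involves no training: the class `Stump`, its thresholds and its leaf weights are hypothetical values chosen only to show how q(x) selects a leaf and how the scores ω are accumulated.

```python
import numpy as np

class Stump:
    """A CART with T = 2 leaves: q(x) picks a leaf index, w holds the leaf scores."""

    def __init__(self, feature, threshold, w):
        self.feature = feature
        self.threshold = threshold
        self.w = np.asarray(w, dtype=float)

    def q(self, X):
        # Leaf 0 if the feature is at or below the threshold, leaf 1 otherwise.
        return (X[:, self.feature] > self.threshold).astype(int)

    def __call__(self, X):
        # f(x) = w[q(x)]
        return self.w[self.q(X)]

def predict(trees, X):
    """Sum the leaf scores of all trees: y_hat_i = sum_k f_k(x_i)."""
    X = np.asarray(X, dtype=float)
    return sum(tree(X) for tree in trees)

trees = [Stump(0, 0.5, [-1.0, 1.0]), Stump(1, 2.0, [0.25, 0.75])]
y_hat = predict(trees, [[0.0, 0.0], [1.0, 3.0]])
```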
Step 3: setting the initial parameters of the XGBoost model, pre-training it, and continuously adjusting the model parameters with the GA;
specifically, six hyperparameters are selected and optimized by the GA to improve the performance of the diagnostic model: the number of iterations, the learning rate $\eta$, the maximum depth of the decision tree $d_{max}$, the random-sample extraction ratio $r_{subsample}$, the feature extraction ratio $r_{colsample}$ and the decision-tree node splitting criterion $\gamma_{split}$.
Step 4: judging whether the maximum number of iterations or the termination condition has been reached; if so, taking the current parameter values as the optimal parameters of the model; otherwise, returning to step 3;
step 5: performing ten-fold cross-validation, testing the diagnostic performance of the model, and outputting the fault classification result.
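The GA loop of steps 3 and 4 can be sketched generically. The sketch below is not the applicant's algorithm: the selection, crossover and mutation operators, the search ranges and the population settings are all assumptions, and the fitness function is left as a caller-supplied callable. The six keys follow the parameter names of the xgboost scikit-learn wrapper (`n_estimators`, `learning_rate`, `max_depth`, `subsample`, `colsample_bytree`, `gamma`), which are assumed here to correspond to the six hyperparameters named in the claim. In practice the fitness would be something like the ten-fold cross-validation accuracy of a model trained with the candidate parameters; a toy fitness is used here so that the block runs on its own.

```python
import random

# Illustrative search ranges (assumptions, not values from the source).
BOUNDS = {
    "n_estimators": (50, 500),
    "learning_rate": (0.01, 0.3),
    "max_depth": (2, 10),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "gamma": (0.0, 5.0),
}
INTEGER = {"n_estimators", "max_depth"}

def _sample(name, rng):
    lo, hi = BOUNDS[name]
    return rng.randint(lo, hi) if name in INTEGER else rng.uniform(lo, hi)

def random_individual(rng):
    return {name: _sample(name, rng) for name in BOUNDS}

def ga_search(fitness, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Maximize fitness(params) with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):                  # termination: fixed generation count
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]         # truncation selection
        children = [dict(ranked[0])]              # elitism: carry over the best individual
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = {k: (p1 if rng.random() < 0.5 else p2)[k] for k in BOUNDS}
            for k in BOUNDS:                      # mutation: resample a gene at random
                if rng.random() < mutation_rate:
                    child[k] = _sample(k, rng)
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

def toy_fitness(p):
    # Stand-in for cross-validated accuracy; peaks at learning_rate 0.1, max_depth 6.
    return -((p["learning_rate"] - 0.1) ** 2) - 0.01 * (p["max_depth"] - 6) ** 2

best_params = ga_search(toy_fitness)
```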
6. The method for on-line diagnosis of transformer faults under the condition of small unbalanced samples according to claim 1, wherein the specific steps of the fifth step are as follows:
step 1: standardizing the collected transformer fault sample data and constructing dissolved-gas-in-oil features by the non-coding ratio method to obtain feature data;
step 2: performing feature fusion on the feature data by the MDS method to obtain fused data;
step 3: importing the fused data into the trained XGBoost model and determining the fault type.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310630341.6A CN116680532A (en) 2023-05-31 2023-05-31 Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310630341.6A CN116680532A (en) 2023-05-31 2023-05-31 Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling

Publications (1)

Publication Number Publication Date
CN116680532A true CN116680532A (en) 2023-09-01

Family

ID=87780322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310630341.6A Pending CN116680532A (en) 2023-05-31 2023-05-31 Transformer fault online diagnosis method for processing unbalanced small sample based on NNTR-SMOTE oversampling

Country Status (1)

Country Link
CN (1) CN116680532A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination