CN116578833A - IGBT module aging fault diagnosis system based on optimized random forest model - Google Patents

IGBT module aging fault diagnosis system based on optimized random forest model

Info

Publication number
CN116578833A
Authority
CN
China
Prior art keywords
random forest
fault diagnosis
forest model
module
igbt module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310381111.0A
Other languages
Chinese (zh)
Inventor
周荔丹
姚钢
李璟
杨晓帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202310381111.0A
Publication of CN116578833A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J5/00 Radiation pyrometry, e.g. infrared or optical thermometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R19/00 Arrangements for measuring currents or voltages or for indicating presence or sign thereof
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
    • Y04S10/52 Outage or fault management, e.g. fault detection or location

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An IGBT module aging fault diagnosis system based on an optimized random forest model, comprising a data acquisition module, a data processing module, a model construction module, a model optimization module and an aging fault diagnosis module. The invention takes parameter data from the working process of an IGBT module as diagnosis signals, builds an aging fault diagnosis data set with which a random forest model is trained, constructed and optimized, and finally obtains the IGBT state diagnosis result from the optimized random forest model.

Description

IGBT module aging fault diagnosis system based on optimized random forest model
Technical Field
The invention belongs to the field of IGBT module fault diagnosis, and particularly relates to an IGBT module aging fault diagnosis system based on an optimized random forest model.
Background
IGBT modules work in harsh environments year-round and bear complex cyclic stress, so fatigue damage accumulates continuously and eventually produces complete aging failure. If this is not handled effectively, it develops into serious faults such as open circuits and short circuits, which can cause catastrophic failure, damage power equipment in the system, bring large economic losses, and even endanger personal safety. Accurately diagnosing aging faults of the IGBT module is therefore an important means of improving the operating reliability, safety and availability of the system.
Current techniques for diagnosing aging faults of IGBT modules are mainly based on either direct measurement or historical data. The direct measurement method observes the degree of device aging with equipment such as an X-ray or acoustic microscope and estimates the aging failure process of the module from that observation; it lacks generality and is difficult to adapt to practical conditions such as large, multi-sample data sets. The historical-data-driven method learns how the historical data of the object change and builds a machine learning model from them to diagnose the degree of aging failure of the device. However, the machine learning models chosen by existing historical-data-driven methods for IGBT aging fault diagnosis suffer from technical problems such as overfitting, sensitivity to differences in spatial data density, a redundant modeling process, slow convergence, sensitivity to the kernel function, and low diagnosis accuracy on high-dimensional data sets.
Disclosure of Invention
To address the above defects in the prior art, the invention provides an IGBT module aging fault diagnosis system based on an optimized random forest model. Parameter data from the working process of the IGBT module are used as diagnosis signals, an aging fault diagnosis data set is constructed to train, build and optimize a random forest model, and the IGBT state diagnosis result is finally obtained from the optimized random forest model.
The invention is realized by the following technical scheme:
The invention relates to an IGBT module aging fault diagnosis system based on an optimized random forest model, which comprises a data acquisition module, a data processing module, a model construction module, a model optimization module and an aging fault diagnosis module, wherein: the data acquisition module uses sensors to acquire the module temperature, collector current, collector voltage, gate current and gate voltage during operation of the IGBT module; these data form the IGBT module aging fault diagnosis data signals, which are output to the data processing module; the data processing module standardizes the IGBT module aging fault diagnosis data signals to obtain the IGBT module aging fault diagnosis data set and outputs it to the model construction module; the model construction module trains and builds a random forest model with CART decision trees as base estimators; the model optimization module optimizes the model by combining pre-pruning, cross-validation, learning curves and Bagging resampling to form the optimized random forest model; the aging fault diagnosis module outputs the aging fault diagnosis result of the IGBT module based on the optimized random forest model.
The sensors comprise an infrared sensor, a current sensor and a voltage sensor, wherein: the infrared sensor collects the module temperature, and the current and voltage sensors collect the collector current, collector voltage, gate current and gate voltage data signals during operation of the IGBT module.
The standardization refers to: the collected original data signals are centered by the mean and then scaled by the standard deviation, so that the processed signal data have a mean of 0 and a standard deviation of 1: $x^{*} = \frac{x - \mu}{\sigma}$, wherein: $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals and $\sigma$ is the standard deviation of all sample data signals.
The IGBT module aging fault diagnosis data set is a set of data signals obtained by standardized processing of diagnosis signals acquired by the sensor.
The base estimators are a plurality of weak classification models of the same type that jointly form the random forest model. A CART decision tree, i.e. a classification and regression tree that uses the Gini coefficient as its feature evaluation criterion, is preferably used as the base estimator of the random forest model.
The model optimization module uses grid search and pre-pruning to determine the optimal parameters of the base estimators in the random forest model, and uses a learning curve to determine the optimal number of base estimators in the random forest model built afterwards.
The aging fault diagnosis result is output after combining the votes of all base estimators in the optimized random forest model, and is one of the following: the IGBT module is in the normal working state, labeled T0; the IGBT module is in the initial aging stage, labeled T1; the IGBT module is in the aging fault state, labeled T2.
Technical effects
In the invention, first, the sensors in the data acquisition module acquire the IGBT module aging fault diagnosis data signals and input them into the data processing module. Second, the data processing module standardizes these signals to obtain the IGBT module aging fault diagnosis data set and outputs it to the model construction module. Third, the model construction module builds a traditional random forest model from the data set. On the basis of this traditional random forest model, the model optimization module improves modeling efficiency and accuracy through pre-pruning and grid search, then uses the learning curve method to determine the number of base estimators in the random forest model, which avoids the poorly balanced fit and the extra resource consumption caused by an improperly chosen number of base estimators, and finally uses Bagging resampling to lower the average correlation coefficient between base estimators so that the random forest model achieves higher diagnosis accuracy. This completes the optimization of the random forest model. Finally, the aging fault diagnosis module uses the optimized random forest model to output the IGBT aging fault diagnosis result.
Judging from the error distribution over multiple tests, the optimized random forest model has a low and stable mean square error and mean absolute error in IGBT module aging fault diagnosis and the highest correlation with the data set. Applied to an aging fault diagnosis system for IGBT devices, it offers high prediction accuracy and a good fit, and has good practical value and application prospects.
Drawings
FIG. 1 is a flowchart of IGBT module aging fault diagnosis;
FIG. 2 is a flowchart of establishing a traditional random forest model;
FIG. 3 shows the grid search results for the comprehensive evaluation of the base estimator;
FIG. 4 is a 3D surface plot of the comprehensive evaluation of the base estimator;
FIG. 5 is the optimization curve for the number of base estimators;
FIG. 6 is a basic structural diagram of the Bagging resampling method;
FIG. 7 is a schematic diagram of K-fold cross-validation;
FIG. 8 compares the evaluation results of the optimized random forest with those of other models;
FIG. 9 shows the learning curves of the optimized random forest and other models.
Detailed Description
As shown in FIG. 1, the IGBT module aging fault diagnosis method of this embodiment, based on the above system and the optimized random forest model, proceeds as follows. The sensors in the data acquisition module acquire the IGBT module aging fault diagnosis data signals. The acquired data signals are input into the data processing module and standardized to obtain the IGBT module aging fault diagnosis data set. The data set is input into the model construction module, where a traditional random forest model is trained and built from it. The model optimization module then optimizes this traditional random forest model using pre-pruning, grid search, learning curves and Bagging resampling, which completes the optimized random forest model. Finally, the aging fault diagnosis module uses the optimized random forest model to output the IGBT module aging fault diagnosis result, thereby realizing IGBT module aging fault diagnosis.
The specific steps of the embodiment include:
S1, collecting IGBT module aging fault diagnosis data: voltage, current and infrared sensors are set up in the data acquisition module to collect the collector current, gate current, collector voltage, gate voltage and module temperature of the IGBT module in the normal operating state, the initial aging state and the complete aging fault state, with 301680 sampling points collected for each diagnosis signal. The normal operating state of the IGBT module is labeled T0, the initial aging state T1 and the complete aging fault state T2. The collected IGBT module aging fault diagnosis data signals are input into the data processing module.
S2, establishing the IGBT module aging fault diagnosis data set: the IGBT module aging fault diagnosis data signals input to the data processing module are centered by the mean and then scaled by the standard deviation, so that the processed data have a mean of 0 and a standard deviation of 1. That is, the data samples of the IGBT module aging fault diagnosis signals are standardized to obtain the IGBT aging fault diagnosis data set, which helps improve the accuracy and convergence rate of the final random forest model: $x^{*} = \frac{x - \mu}{\sigma}$, wherein: $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals and $\sigma$ is the standard deviation of all sample data signals.
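The standardization in step S2 can be illustrated with a minimal sketch. The readings below are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which is meant.

```python
from statistics import fmean, pstdev


def standardize(signal):
    """Center a signal by its mean and scale it by its standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(x - mu) / sigma for x in signal]


# Hypothetical collector-current readings, not taken from the patent.
collector_current = [10.0, 12.0, 14.0, 16.0, 18.0]
z = standardize(collector_current)
```

After this step the list `z` has zero mean and unit standard deviation, so signals with different physical units (current, voltage, temperature) end up on a comparable scale.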
The standardized IGBT module aging fault diagnosis signal data set is an IGBT module aging fault diagnosis data set, and the obtained IGBT module aging fault diagnosis data set is shown in table 1.
TABLE 1
And inputting the obtained IGBT module aging fault diagnosis data set into a model building module.
S3, FIG. 2 shows the flowchart for establishing the traditional random forest model: the model construction module trains on the IGBT module aging fault diagnosis data set obtained in the previous step and builds a traditional random forest model with CART decision trees as base estimators. Specifically, a traditional random forest model with CART decision trees as base estimators is essentially a set of CART decision trees. The number of decision trees in the model is not limited and the growth of each CART decision tree is not interfered with; once all CART decision trees in the model have finished growing, the traditional random forest model is established.
The base estimator, the CART decision tree, is a decision tree constructed with the classification and regression tree algorithm. This algorithm builds the CART decision tree model using the Gini coefficient as the criterion for the optimal split node. Taking the IGBT module aging fault diagnosis data set as an example: $Gini(t) = 1 - \sum_{i} p(x_i \mid t)^2$, wherein: $p(x_i \mid t)$ is the probability that a sample drawn at random from the data set at node $t$ belongs to category $x_i$; $Gini(t)$ is the Gini coefficient at node $t$, i.e. the probability that two samples drawn at random from the data set belong to different categories. The categories $x_i$ are T0, T1 and T2, 3 categories in total. The node $t$ is any branch node in the decision tree.
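As a small worked example of the Gini coefficient at a node, the following sketch computes one minus the sum of squared class proportions. The label list is hypothetical; only the tags T0, T1 and T2 come from the text.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Hypothetical samples reaching a node t.
node_labels = ["T0", "T0", "T1", "T2"]
print(gini(node_labels))  # 1 - (0.5^2 + 0.25^2 + 0.25^2) = 0.625
```

A node holding a single category gives 0, and the value grows as the categories become more evenly mixed.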
Step S3 specifically comprises the following steps:
3.1) Randomly sample the IGBT module aging fault diagnosis data set to obtain a sample set for each CART decision tree; each round of random sampling generates one sample set and, ultimately, one decision tree. The sample set of each decision tree is divided into a training set and a test set in the ratio 8:2;
3.2) Calculate the Gini coefficient over the categories in the sample set of each CART decision tree. Taking sample set D as an example: $Gini(D) = 1 - \sum_{i} p(x_i \mid D)^2$. The feature A with the smallest Gini coefficient is taken as the root node of the decision tree;
3.3) Starting from the root node, the data set D is divided by feature A into two sub-data sets $D_1$ and $D_2$, and the Gini coefficient of the sample set D is then $Gini(D, A) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)$;
3.4) The split with the minimum Gini(D, A) value is taken as the optimal split of the root node of the decision tree, and nodes continue to be split downwards until the conditions required for splitting are no longer satisfied at a node, or the Gini coefficient reaches its minimum and no further split is possible. The decision tree then stops growing; that is, it stops by itself rather than through human intervention, and the classification result of the node at which growth stops is the final classification result output by the decision tree.
3.5) Repeat steps 3.3) and 3.4); after all decision trees in the model have finished growing, gather them together to establish the traditional random forest model.
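The split selection in steps 3.3) and 3.4) can be sketched as follows, assuming the usual size-weighted form of Gini(D, A) and a numeric feature split at a threshold. The candidate thresholds (midpoints between distinct values), the feature values and the function names are illustrative assumptions, not details from the patent.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted Gini coefficient of a binary split of D into D1 and D2."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best binary split of one feature."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # assumption: midpoints as candidates
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        score = weighted_gini(left, right)
        if score < best[1]:  # strict comparison keeps the first of tied splits
            best = (threshold, score)
    return best


# Hypothetical module temperatures and state labels.
temperatures = [40.0, 42.0, 55.0, 57.0, 80.0, 82.0]
states = ["T0", "T0", "T1", "T1", "T2", "T2"]
threshold, score = best_threshold(temperatures, states)
```

A full CART tree would apply this search to every feature at every node and recurse on the two resulting subsets; the sketch shows a single feature at a single node only.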
S4, optimizing the traditional random forest model established in the step S3 in a model optimization module, wherein the method specifically comprises the following steps:
4.1) First, search for the parameters of the decision tree pre-pruning algorithm with the grid search method: the possible values of each growth parameter are permuted and combined, all possible combinations are listed to form a grid, decision trees are built for them one by one, and the grid search results for the prediction accuracy and running time of a single decision tree are output.
The growth parameters of the decision tree comprise the maximum depth, the minimum branch node number and the minimum branch sample number of the decision tree. Taking the IGBT module aging fault diagnosis data set as an example, the value range of the decision tree depth is set to [1,50] with a step of 1; the value range of the minimum branch node number is [2,25] with a step of 1; the value range of the minimum branch sample number is [2,25] with a step of 1. The values of the other parameters are determined according to the sample conditions. The grid search results for the prediction accuracy and running time of a single decision tree are output as shown in FIG. 3.
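To give a sense of the size of this grid, the sketch below enumerates every combination of the three stated parameter ranges. The variable names are illustrative, and the training and timing of one decision tree per combination are omitted.

```python
from itertools import product

max_depths = range(1, 51)          # [1, 50], step 1
min_branch_nodes = range(2, 26)    # [2, 25], step 1
min_branch_samples = range(2, 26)  # [2, 25], step 1

# Each tuple is one candidate setting: (depth, branch nodes, branch samples).
grid = list(product(max_depths, min_branch_nodes, min_branch_samples))
print(len(grid))  # 50 * 24 * 24 = 28800
```

An exhaustive search therefore builds and times 28800 decision trees, which is why the running time of a single tree is recorded alongside its prediction accuracy.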
To ensure both prediction accuracy and efficiency, the prediction accuracy at each point in FIG. 3 is subtracted from 1, the running time of the decision tree is added to the result, and the grid search result is output again. The lowest points of all sections in FIG. 3 are normalized and connected to give the 3D surface plot shown in FIG. 4. Under the influence of the maximum depth, the minimum branch sample number and the minimum branch node number on the data set, the surface for the base estimator first falls and then rises overall, and its minimum value lies at the lowest point of the depression in the surface. The parameters at that lowest point are used for the base estimator of the random forest algorithm: when the maximum depth of the decision tree is 7, the minimum branch node number is 15 and the minimum branch sample number is 3, the surface reaches its minimum, i.e. the random forest model is optimal in terms of combined prediction accuracy and time consumed.
The normalization is a linear transformation of the data at the lowest points of all sections that maps them to [0,1]: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$, wherein: $x'$ is the normalized data at the lowest point of a section, $x$ is the original data at the lowest point of a section, $\max(x)$ is the maximum of the original data at the lowest points and $\min(x)$ is the minimum of the original data at the lowest points;
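The combined criterion and the min-max normalization can be sketched together. The accuracy and running-time figures are hypothetical, and adding the raw running time in seconds to the error term is an assumption, because the text does not state how the running time is scaled before the two are superposed.

```python
def combined_score(accuracy, runtime):
    """(1 - prediction accuracy) plus running time; lower is better."""
    return (1.0 - accuracy) + runtime


def min_max(values):
    """Linearly map a list of values onto [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical results: (depth, branch nodes, branch samples) -> (accuracy, seconds).
results = {
    (5, 15, 3): (0.95, 0.10),
    (7, 15, 3): (0.98, 0.12),
    (20, 2, 2): (0.985, 0.40),
}

settings = list(results)
scores = [combined_score(*results[s]) for s in settings]
normalized = dict(zip(settings, min_max(scores)))
best_setting = min(normalized, key=normalized.get)
```

With these made-up numbers the deepest tree is slightly more accurate but much slower, so the combined criterion prefers the shallower setting.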
4.2) Pre-prune the CART decision tree using the grid search result: full growth of the tree is restricted by setting the growth parameters of the decision tree during tree building. When the growth of the base estimators is restricted, the complexity of the random forest model is effectively controlled, a balance can be reached between the training error and the complexity of the decision tree, and the detection, correction and optimization of the data set categories are made easier. That is, the main rules of the data set are extracted, abnormal rules are discarded, and hidden errors, noise and isolated points in the manually assigned data set labels are corrected.
Specifically, the decision tree is pre-pruned by limiting its growth parameters: the maximum depth of the decision tree is 7, the minimum branch node number is 15, and the minimum branch sample number is 3. The decision tree stops growing as soon as any one of these conditions, or several of them at once, is met.
4.3) After pre-pruning of the decision trees in the traditional random forest is completed, the framework parameter n_estimators of the random forest modeling process, i.e. the number of base estimators in the random forest model, is determined with the learning curve method. This parameter is one of the important factors determining the complexity of the random forest model. Too large a value of n_estimators leads to a poorly balanced model fit, and as the number of base estimators grows, the time and resources the model needs to run also increase greatly.
As shown in FIG. 5, the value range of n_estimators is set to [1,200] with a step of 1, and the learning curve of the model's prediction accuracy is output as the number of base estimators increases. The abscissa is the number of decision trees in the random forest model after pre-pruning, and the ordinate is the prediction accuracy of the random forest model. As can be seen from FIG. 5, with the base estimator parameters already tuned to their optimum, the random forest model performs best with n_estimators set to 24, and the number of base estimators need not be increased further. Therefore, after pre-pruning of the decision trees in the traditional random forest is completed, the number of base estimators in the random forest model is set to 24.
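One plausible way to read a value off such a learning curve is to take the smallest number of base estimators that attains the best accuracy. The rule and the curve values below are assumptions for illustration; the patent reports only the chosen value, 24.

```python
def smallest_best_n(accuracy_by_n):
    """Smallest n_estimators whose accuracy equals the maximum of the curve."""
    best_accuracy = max(accuracy_by_n.values())
    return min(n for n, acc in accuracy_by_n.items() if acc == best_accuracy)


# Hypothetical learning-curve samples: n_estimators -> prediction accuracy.
curve = {1: 0.90, 8: 0.95, 24: 0.988, 50: 0.988, 200: 0.987}
print(smallest_best_n(curve))  # 24
```

Preferring the smallest such value keeps model complexity and running cost down when a larger forest brings no gain in accuracy.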
4.4) After the number of base estimators in the random forest model has been set, Bagging resampling is applied to the random forest model. As shown in FIG. 6, the training set is randomly sampled with replacement during modeling to form several new data sets that are similar in size to the original training set but differ from one another. Because the resampling of the training sample set is random and independent, the base estimators built on these data sets differ more from one another and the correlation between any two base estimators is markedly reduced, which improves the prediction accuracy of the random forest model and yields the final optimized random forest model.
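Sampling with replacement can be sketched as below, with one bootstrap set per base estimator. The integer stand-ins for training rows, the fixed seed and the function name are illustrative assumptions.

```python
import random


def bootstrap_sets(dataset, n_sets, seed=0):
    """Draw n_sets samples with replacement, each as large as the dataset."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    size = len(dataset)
    return [
        [dataset[rng.randrange(size)] for _ in range(size)]
        for _ in range(n_sets)
    ]


training_set = list(range(100))  # stand-in for 100 training rows
sets = bootstrap_sets(training_set, n_sets=24)
```

Each resampled set repeats some rows and omits others, so the 24 trees see different data, which is the source of the reduced correlation between base estimators.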
S5, in the aging fault diagnosis module, the optimized random forest model obtained in step S4 is used to output the IGBT module aging fault diagnosis result: the classification results of all decision trees in the optimized random forest model are put to a vote, and the category that receives the most votes is taken as the final output of the optimized random forest model: $Y(x) = \arg\max_{i} \lambda\big(y_n(x) = i\big)$, wherein: $Y(x)$ is the output of the optimized random forest model; $y_n(x)$ is the output of the $n$-th decision tree in the optimized random forest, and the expression in brackets states that the final classification result of that decision tree is $i$; $\lambda$ is the number of decision trees satisfying the expression in brackets; $i$ ranges over the $Z$ categories, where $Z$ is the number of categories in the optimized random forest model.
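The voting step amounts to counting the tree outputs and returning the most frequent category. The vote counts below are hypothetical, and the tie-breaking behavior (the first category encountered wins) is an assumption the text does not address.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the category predicted by the largest number of trees."""
    return Counter(tree_outputs).most_common(1)[0][0]


# Hypothetical outputs of 24 decision trees for one sample.
votes = ["T1"] * 14 + ["T0"] * 6 + ["T2"] * 4
print(majority_vote(votes))  # T1
```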
Test data acquired in actual experiments are verified in a Python 3.8 and TensorFlow 2.3 environment, and cross-validation is used to compare and analyze the prediction accuracy and fitting behavior of a random forest regression model, an XGBoost model, a traditional random forest model and the optimized random forest model adopted by the invention:
FIG. 7 is a schematic diagram of the cross-validation method. Each sample set is essentially divided into K equal parts; each part is taken in turn as the test set while the remaining K-1 parts form the training set, and after K rounds of training the average of the test set results is output as the final model prediction result. Cross-validation lets every part of the data set serve as both training data and test data, so the resulting prediction result is an effective measure of the prediction accuracy and generalization of the model.
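The K-fold partition can be sketched with plain index lists. The contiguous, unshuffled folds and the handling of a remainder (the first folds receive one extra sample) are assumptions of this sketch.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, 5)
```

Every index appears in exactly one test fold and in the training part of the other K-1 folds, which is what lets the averaged test score reflect the whole data set.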
The mean square error (MSE), the mean absolute error (MAE), the coefficient of determination ($R^2$) and the average prediction accuracy of the model on the training set and the test set over multiple training runs are used as evaluation indices to compare and analyze the performance of the model established above and of other models in the fault diagnosis system. Specifically, the mean square error is $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, the mean absolute error is $MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$, and the coefficient of determination is $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, wherein: $y_i$ is the true value, $\hat{y}_i$ is the model prediction, $\bar{y}$ is the mean of the true values in the original data set, and $n$ is the number of samples.
Specifically, the smaller the values of MSE and MAE and the larger the value of $R^2$, the higher the prediction accuracy of the model and the stronger its correlation with the test data.
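The three error metrics follow their standard definitions and can be computed as in this sketch. Encoding the states T0, T1 and T2 as the integers 0, 1 and 2 is an assumption made so that numeric errors can be computed; the sample values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; undefined if y_true is constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical encoded labels: T0 -> 0, T1 -> 1, T2 -> 2.
y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]
```

Here one of four predictions is off by one class, giving an MSE and MAE of 0.25 each and an $R^2$ of about 0.636.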
The output model evaluation data are shown in table 2.
TABLE 2
FIG. 8 shows a comparison of the model adopted by the invention and other models in the aging fault diagnosis of the IGBT; as can be seen from the graph, the optimized random forest model applied by the method obtains the highest prediction precision on the training set and the testing set, wherein the prediction precision of the testing set is respectively increased by 17.35%, 17.27% and 1.09% compared with the XGboost model, the random forest regression model and the traditional random forest classification model under the same condition; from the error distribution condition of multiple tests, the mean square error and the mean absolute error of the optimized random forest model are lower and stable, and the correlation with the data set is highest.
The learning curves of the optimized random forest model adopted by the invention and of the other fault diagnosis models, obtained after multiple rounds of cross-validation training on the IGBT aging fault diagnosis data, are shown in Fig. 9: the red and blue lines are the trends of the model's prediction precision on the training set and the test set respectively, the vertical axis is the prediction precision, and the horizontal axis is the number of samples in the training set.
As can be seen from Fig. 9: the random forest regression model has higher prediction precision on the training set but lower prediction precision on the test set, which is characteristic of overfitting; the XGBoost model has poor prediction precision on both the training set and the test set and is underfitting; the traditional random forest model fits moderately well, but its prediction precision on the training set and the test set is lower than that of the optimized random forest model; the optimized random forest model has high prediction precision on both the training set and the test set, fits the training set completely, and the difference between its two prediction curves is only 1.19%.
Compared with the prior art, the optimized random forest model adopted in the invention has no redundant base estimators and has reasonable, appropriate complexity, so it can diagnose aging faults of the IGBT module quickly and reliably and obtain the state information of the IGBT module accurately. The model of the invention has high prediction precision and a good degree of fit. Starting from a traditional random forest model selected from several machine learning models, the invention applies grid search, pre-pruning, learning curves and Bagging resampling to reduce the average correlation coefficient among the decision trees in the random forest model, to improve the prediction precision and modeling efficiency of each single base estimator, and to set the number and parameters of the base estimators in the ensemble model according to actual conditions so as to obtain the best fitting effect. The traditional random forest model is thereby optimized in three aspects, including training difficulty and final output precision, and an optimized random forest IGBT aging fault diagnosis method is established. The system finally obtains 100% accuracy on the training set and 98.81% prediction accuracy on the test set. Compared with other types of models under the same conditions, the optimized random forest model adopted by the invention shows no abnormal behaviour such as overfitting or underfitting; it fits the training set completely, and the difference between the two prediction curves on the final training set and test set is only 1.19%.
In power systems operating in extreme environments where operation and maintenance are unavailable or difficult to carry out, the method can diagnose aging faults of the IGBT module from the IGBT working-condition data collected by real-time monitoring in the power system, and therefore has good practical value and application prospects.
Those skilled in the art may modify the described embodiments in various ways without departing from the principles and spirit of the invention. The scope of protection of the invention is defined by the appended claims and is not limited by the above embodiments; every implementation falling within that scope is bound by the invention.

Claims (8)

1. An IGBT module aging fault diagnosis system based on an optimized random forest model, characterized by comprising: a data acquisition module, a data processing module, a model construction module, a model optimization module and an aging fault diagnosis module, wherein: the data acquisition module acquires, through sensors, module temperature, collector current, collector voltage, gate current and gate voltage data during operation of the IGBT module, these data being the IGBT module aging fault diagnosis data signals, and outputs them to the data processing module; the data processing module standardizes the IGBT module aging fault diagnosis data signals to obtain an IGBT module aging fault diagnosis data set, and outputs the data set to the model construction module; the model construction module trains and constructs a random forest model with CART decision trees as base estimators; the model optimization module optimizes the model by combining pre-pruning, cross-validation, learning curve and Bagging resampling methods to form an optimized random forest model; the aging fault diagnosis module outputs an aging fault diagnosis result of the IGBT module based on the optimized random forest model.
2. The IGBT module aging fault diagnosis system based on the optimized random forest model of claim 1, wherein the sensor comprises: an infrared sensor, a current sensor and a voltage sensor, wherein: the current and voltage sensors collect the collector current, collector voltage, gate current and gate voltage data signals during operation of the IGBT module.
3. The IGBT module aging fault diagnosis system based on the optimized random forest model according to claim 1, wherein the standardization process is: the collected raw data signals are centered by their mean and then scaled by their standard deviation, so that the processed signal data follow a standard normal distribution with mean 0 and standard deviation 1: $x^{*} = \frac{x - \mu}{\sigma}$, where $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals, and $\sigma$ is the standard deviation of all sample data signals;
the IGBT module aging fault diagnosis data set is a set of data signals obtained by standardized processing of diagnosis signals acquired by the sensor.
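The standardization in claim 3 amounts to a z-score transform applied to each diagnosis signal. A minimal sketch is given below; the use of the population standard deviation, the function name and the sample values are assumptions, since the claim only states that μ and σ are taken over all samples.

```python
from statistics import fmean, pstdev

def standardize(signal):
    """Center a signal by its mean and scale by its (population) standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal, mu)
    if sigma == 0:
        raise ValueError("constant signal cannot be standardized (sigma is 0)")
    return [(x - mu) / sigma for x in signal]

if __name__ == "__main__":
    collector_current = [10.2, 10.8, 11.1, 9.9, 10.5]   # hypothetical readings in amperes
    z = standardize(collector_current)
    print(fmean(z), pstdev(z))   # approximately 0 and 1
```

Each of the five signals (module temperature, collector current, collector voltage, gate current, gate voltage) would be standardized separately so that signals with large numeric ranges do not dominate the others.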
4. The IGBT module aging fault diagnosis system based on the optimized random forest model according to claim 1, wherein the base estimators are a plurality of weak classification models of the same type that together constitute the random forest model, and the CART decision tree is used as the base estimator of the constructed random forest model;
the model optimization module uses grid search and pre-pruning to determine the optimal parameters of the base estimators in the random forest model, and uses a learning curve to determine the optimal number of base estimators in the random forest model built subsequently;
the aging fault diagnosis result is output after aggregating the votes of all base estimators in the optimized random forest model, and is one of the following: the IGBT module is in a normal working state, with the label set to T0; the IGBT module is in an initial aging stage, with the label set to T1; the IGBT module is in an aging fault state, with the label set to T2.
5. An IGBT module aging fault diagnosis method based on the optimized random forest model of any one of claims 1-4, characterized in that IGBT module aging fault diagnosis data signals are collected through the sensors in the data acquisition module; the collected data signals are input into the data processing module for standardization to obtain an IGBT module aging fault diagnosis data set; the obtained data set is input into the model construction module, where a traditional random forest model is trained and built from it; the traditional random forest model obtained in the previous step is then optimized in the model optimization module by pre-pruning, grid search, learning curve and Bagging resampling; and finally the aging fault diagnosis module outputs the aging fault diagnosis result of the IGBT module by using the optimized random forest model, thereby realizing aging fault diagnosis of the IGBT module.
6. The method according to claim 5, characterized in that it comprises in particular:
S1, collecting aging fault diagnosis data of the IGBT module: voltage, current and infrared sensors are arranged in the data acquisition module to collect the collector current, gate current, collector voltage, gate voltage and module temperature of the IGBT module in the normal operating state, the initial aging state and the complete aging fault state respectively, as samples of the IGBT module aging fault diagnosis data signals; 301680 sampling points are collected for each diagnosis signal; the label of the normal operating state is set to T0, the label of the initial aging state to T1 and the label of the complete aging fault state to T2; and the collected IGBT module aging fault diagnosis data signals are input into the data processing module;
S2, establishing an IGBT module aging fault diagnosis data set: the IGBT module aging fault diagnosis data signals input to the data processing module are centered by their mean and then scaled by their standard deviation, so that the processed data follow a standard normal distribution with mean 0 and standard deviation 1; that is, the data samples of the IGBT module aging fault diagnosis signals are standardized to obtain the IGBT aging fault diagnosis data set, which helps improve the accuracy and convergence rate of the final random forest model: $x^{*} = \frac{x - \mu}{\sigma}$, where $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals, and $\sigma$ is the standard deviation of all sample data signals;
S3, training on the IGBT module aging fault diagnosis data set in the model construction module to establish a traditional random forest model with CART decision trees as base estimators, specifically: the traditional random forest model with CART decision trees as base estimators is essentially a set of CART decision trees; the number of decision trees in the random forest model is not limited and the growth of each CART decision tree is not interfered with, and once all CART decision trees in the random forest model have finished growing, the establishment of the traditional random forest model is complete;
S4, optimizing, in the model optimization module, the traditional random forest model established in step S3;
S5, outputting the IGBT module aging fault diagnosis result in the aging fault diagnosis module by using the optimized random forest model obtained in step S4: the classification results of all decision trees in the optimized random forest model are put to a vote, and the category receiving the most votes is taken as the final output of the optimized random forest model: $Y(x) = \arg\max_{1 \le i \le Z} \lambda\left(y_n(x) = i\right)$, where $Y(x)$ is the output of the optimized random forest model; $y_n(x)$ is the output of the n-th decision tree in the optimized random forest; the expression in brackets states that the final classification result of the decision tree is $i$; $\lambda$ is the number of decision trees satisfying the expression in brackets; and $Z$ is the number of categories in the optimized random forest model.
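The voting rule of step S5 can be illustrated with a short sketch: each decision tree emits one of the labels T0, T1 or T2, and the ensemble returns the label with the most votes. The function name and the tie-breaking behaviour (the label encountered first wins) are assumptions, as the claim does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(tree_outputs):
    """Return the label predicted by the largest number of decision trees."""
    counts = Counter(tree_outputs)
    # most_common orders equal counts by first appearance, so ties go to the earliest label.
    label, _ = counts.most_common(1)[0]
    return label

if __name__ == "__main__":
    votes = ["T0", "T1", "T1", "T2", "T1"]   # hypothetical outputs of five trees
    print(majority_vote(votes))
```

In the notation of step S5, `counts[i]` plays the role of λ for category i, and the returned label corresponds to Y(x).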
7. The method according to claim 6, wherein step S3 specifically comprises:
3.1 Randomly sampling the IGBT module aging fault diagnosis data set to obtain a sample set for each CART decision tree, with one round of random sampling and one randomly generated sample set for each finally generated decision tree, and dividing the sample set of each decision tree into a training set and a test set at a ratio of 8:2;
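3.2 Calculating the Gini coefficients of the different categories in each CART decision tree sample set; taking sample set D as an example, $Gini(D) = 1 - \sum_{k=1}^{Z} p_k^2$, where $p_k$ is the proportion of samples in D that belong to the k-th category and $Z$ is the number of categories; the feature A with the smallest Gini coefficient value is taken as the root node of the decision tree;
3.3 Starting from the root node, the data set D is divided into two sub-data sets $D_1$ and $D_2$ according to feature A, and the Gini coefficient of sample set D is then $Gini(D, A) = \frac{|D_1|}{|D|}Gini(D_1) + \frac{|D_2|}{|D|}Gini(D_2)$;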
3.4 Taking the split with the minimum Gini(D, A) value as the optimal split of the root node of the decision tree, and continuing to split nodes downwards until a node no longer meets the conditions required for splitting, or the Gini coefficient reaches its minimum and no further split is possible; the decision tree then stops growing on its own, without human intervention, and the classification result of the node at which growth stops is the final classification output of the decision tree;
3.5 Repeating steps 3.3 and 3.4; after all decision trees in the model have finished growing, all decision trees are aggregated to finally establish the traditional random forest model.
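The split selection in steps 3.2 to 3.4 can be sketched with the standard CART definitions of Gini impurity. The code below is an illustrative sketch under assumptions: it handles a single numeric feature, uses a threshold test of the form value <= threshold, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label list: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) \
         + (len(right_labels) / n) * gini(right_labels)

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split, or None if no split exists."""
    best = None
    # The largest value is skipped so that the right-hand side is never empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

if __name__ == "__main__":
    temperature = [1.0, 2.0, 3.0, 4.0]     # hypothetical standardized feature values
    state = ["T0", "T0", "T1", "T1"]
    print(best_threshold(temperature, state))
```

A full CART tree would repeat this search over every feature at every node and recurse into the two child sets until no split lowers the impurity.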
8. The method according to claim 6, wherein step S4 specifically comprises:
4.1 First, searching for the parameters of the decision tree pre-pruning algorithm by the grid search method: the possible values of each growth parameter are permuted and combined, all possible combinations are listed to build a grid, decision trees are built one by one for the grid points, and the grid search results for the prediction precision and running time of a single decision tree are output;
4.2 Pre-pruning the CART decision trees using the grid search results: the full growth of a tree is limited by setting the growth parameters of the decision tree during tree building; limiting the growth of the base estimators effectively controls the complexity of the random forest model, strikes a balance between the training error and the complexity of the decision trees, and helps detect, correct and optimize the data set categories, namely by extracting the main rules of the data set, discarding abnormal rules, and correcting hidden errors, noise and isolated points in the manually set data set labels;
4.3 After the pre-pruning of the decision trees in the traditional random forest is completed, the framework parameter n_estimators of the random forest modeling process, i.e. the number of base estimators in the random forest model, is determined by the learning curve method; this parameter is one of the important factors determining the complexity of the random forest model; too large a value of n_estimators unbalances the fit of the model, and as the number of base estimators grows, the time and resources consumed by normal operation of the model increase greatly;
4.4 After the number of base estimators in the random forest model is set, Bagging resampling is applied to the random forest model: the training set used in modeling is resampled by random sampling with replacement to form several new data sets that are similar in size to the original training set but differ from each other; this increases the randomness and independence of the resampled training sets, so that the diversity of the base estimators built on them increases, the correlation between any two base estimators is markedly reduced, and the prediction precision of the random forest model improves, forming the final optimized random forest model.
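Two of the steps above lend themselves to a short sketch: building the parameter grid of step 4.1 and drawing the Bagging resamples of step 4.4. The parameter names `max_depth` and `min_samples_leaf` are assumptions borrowed from common decision tree libraries (the claims name only `n_estimators`), and the helper names are hypothetical.

```python
import random
from itertools import product

def build_grid(param_values):
    """List every combination of the candidate values as a dict of parameters."""
    names = list(param_values)
    return [dict(zip(names, combo))
            for combo in product(*(param_values[name] for name in names))]

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items uniformly at random with replacement."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]

def bagging_sets(dataset, n_estimators, seed=0):
    """Create one bootstrap training set per base estimator."""
    rng = random.Random(seed)
    return [bootstrap_sample(dataset, rng) for _ in range(n_estimators)]

if __name__ == "__main__":
    # Hypothetical pre-pruning candidates; real values would come from the data set.
    grid = build_grid({"max_depth": [3, 5], "min_samples_leaf": [1, 2, 4]})
    print(len(grid), grid[0])
    sets = bagging_sets(list(range(20)), n_estimators=5)
    print([len(s) for s in sets])
```

Each grid entry would be used to grow and score one pre-pruned tree, and each bootstrap set would train one base estimator; because sampling is with replacement, the sets match the original in size but differ in content, which is what reduces correlation between trees.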
CN202310381111.0A 2023-04-11 2023-04-11 IGBT module aging fault diagnosis system based on optimized random forest model Pending CN116578833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310381111.0A CN116578833A (en) 2023-04-11 2023-04-11 IGBT module aging fault diagnosis system based on optimized random forest model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310381111.0A CN116578833A (en) 2023-04-11 2023-04-11 IGBT module aging fault diagnosis system based on optimized random forest model

Publications (1)

Publication Number Publication Date
CN116578833A true CN116578833A (en) 2023-08-11

Family

ID=87538533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310381111.0A Pending CN116578833A (en) 2023-04-11 2023-04-11 IGBT module aging fault diagnosis system based on optimized random forest model

Country Status (1)

Country Link
CN (1) CN116578833A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436569A (en) * 2023-09-18 2024-01-23 华能核能技术研究院有限公司 Nuclear power equipment fault prediction and intelligent calibration method and system based on random forest
CN117740381A (en) * 2024-01-22 2024-03-22 中国矿业大学 Bearing fault diagnosis method under low-speed heavy-load working condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination