CN116578833A

CN116578833A - IGBT module aging fault diagnosis system based on optimized random forest model

Info

Publication number: CN116578833A
Application number: CN202310381111.0A
Authority: CN
Inventors: 周荔丹; 姚钢; 李璟; 杨晓帆
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2023-04-11
Filing date: 2023-04-11
Publication date: 2023-08-11

Abstract

An IGBT module aging fault diagnosis system based on an optimized random forest model. The invention takes parameter data from the operation of an IGBT module as diagnosis signals, builds an aging fault diagnosis data set with which a random forest model is trained, constructed and optimized, and finally obtains the IGBT state diagnosis result from the optimized random forest model.

Description

IGBT module aging fault diagnosis system based on optimized random forest model

Technical Field

The invention belongs to the field of IGBT module fault diagnosis, and particularly relates to an IGBT module aging fault diagnosis system based on an optimized random forest model.

Background

IGBT modules operate in harsh environments year-round and bear complex cyclic stress. Fatigue damage accumulates continuously and eventually leads to complete aging failure. If this is not handled effectively, it develops into serious faults such as open circuits and short circuits, which can cause catastrophic failures, damage power equipment in the system, bring heavy economic losses, endanger personal safety and create serious safety hazards. Accurate diagnosis of IGBT module aging faults is therefore an important means of improving the operational reliability, safety and availability of the system.

Current techniques for diagnosing IGBT module aging faults are mainly based on either direct measurement or historical data. Direct measurement observes the degree of device aging with equipment such as radiographic or acoustic microscopes and estimates the aging failure process of the module from it; it is not universal and is difficult to adapt to practical conditions such as large multi-sample data sets. Historical-data-driven methods learn how the historical data of the object changes and build a machine learning model to diagnose the degree of aging failure of the device. However, the machine learning models chosen by existing data-driven methods suffer from technical problems such as overfitting, sensitivity to differences in spatial data density, redundant modeling processes, slow convergence, sensitivity to the kernel function, and low diagnosis accuracy on high-dimensional data sets.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an IGBT module aging fault diagnosis system based on an optimized random forest model, wherein parameter data in the working process of the IGBT module is used as diagnosis signals, an aging fault diagnosis data set is constructed to realize training, construction and optimization of the random forest model, and finally an IGBT state diagnosis result is obtained through optimizing the random forest model.

The invention is realized by the following technical scheme:

The invention relates to an IGBT module aging fault diagnosis system based on an optimized random forest model, which comprises a data acquisition module, a data processing module, a model construction module, a model optimization module and an aging fault diagnosis module, wherein: the data acquisition module uses sensors to acquire module temperature, collector current, collector voltage, gate current and gate voltage data during operation of the IGBT module; these data are the aging fault diagnosis data signals of the IGBT module and are output to the data processing module; the data processing module standardizes the IGBT module aging fault diagnosis data signals to obtain an IGBT module aging fault diagnosis data set and outputs it to the model construction module; the model construction module trains and constructs a random forest model with CART decision trees as base estimators; the model optimization module optimizes the model by combining pre-pruning, cross-validation, learning curve and Bagging resampling methods to form an optimized random forest model; and the aging fault diagnosis module outputs the aging fault diagnosis result of the IGBT module based on the optimized random forest model.

The sensors comprise an infrared sensor, a current sensor and a voltage sensor, wherein: the current and voltage sensors collect the collector current, collector voltage, gate current and gate voltage data signals during operation of the IGBT module.

The standardization refers to: the collected raw data signals are centered on their mean and then scaled by their standard deviation, so that the processed signal data obey a standard normal distribution with mean 0 and standard deviation 1:

$$x^{*} = \frac{x - \mu}{\sigma}$$

wherein: $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals and $\sigma$ is the standard deviation of all sample data signals.
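The standardization step can be illustrated with a minimal sketch. It assumes the population standard deviation (dividing by the number of samples), which the text does not specify, and the readings in `collector_current` are hypothetical values chosen only for illustration.

```python
import math


def standardize(signal):
    """Center a sequence of raw readings on its mean and scale by its standard deviation."""
    n = len(signal)
    mu = sum(signal) / n
    # Assumption: population standard deviation over all samples.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in signal) / n)
    if sigma == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(x - mu) / sigma for x in signal]


# Hypothetical collector-current readings, not data from the patent.
collector_current = [10.0, 12.0, 14.0, 16.0, 18.0]
z = standardize(collector_current)
```

After this transformation the values in `z` have mean 0 and standard deviation 1, so signals measured in different units (amperes, volts, degrees) become comparable.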

The IGBT module aging fault diagnosis data set is a set of data signals obtained by standardized processing of diagnosis signals acquired by the sensor.

The base estimators are a plurality of weak classification models of the same type that together constitute the random forest model. A CART decision tree, i.e. a classification and regression tree that uses the Gini coefficient as its feature evaluation criterion, is preferably used as the base estimator of the established random forest model.

The model optimization module uses grid search and pre-pruning to determine the optimal parameters of the base estimators in the random forest model, and uses a learning curve to determine the optimal number of base estimators in the random forest model that is subsequently built.

The aging fault diagnosis result is output after aggregating the votes of all base estimators in the optimized random forest model, and is one of the following: the IGBT module is in the normal working state, labeled T0; the IGBT module is in the initial aging stage, labeled T1; the IGBT module is in the aging fault state, labeled T2.

Technical effects

According to the invention, first, each sensor in the data acquisition module acquires IGBT module aging fault diagnosis data signals and inputs them into the data processing module. Second, the data processing module standardizes these signals to obtain an IGBT module aging fault diagnosis data set and outputs it to the model construction module. Third, the model construction module uses the data set to establish a traditional random forest model. On the basis of this traditional model, the model optimization module first applies pre-pruning and grid search to improve modeling efficiency and accuracy; it then uses a learning curve to determine the number of base estimators in the random forest model, which avoids the poor fit and excess resource consumption caused by an improperly chosen number of base estimators; finally, it applies Bagging resampling to lower the average correlation coefficient between base estimators, so that the random forest model achieves higher diagnosis accuracy and its optimization is complete. Lastly, the aging fault diagnosis module uses the optimized random forest model to output the IGBT aging fault diagnosis result.

Judging from the error distribution over multiple tests, the optimized random forest model has low and stable mean square error and mean absolute error in IGBT module aging fault diagnosis, and the highest correlation with the data set. Applied to an aging fault diagnosis system for IGBT devices, it offers high prediction accuracy and a good fit, and has good practical value and application prospects.

Drawings

FIG. 1 is a flowchart of an IGBT module aging fault diagnosis;

FIG. 2 is a flow chart of a conventional random forest model establishment;

FIG. 3 is a diagram of the results of a basic evaluator comprehensive evaluation grid search;

FIG. 4 is a graph of a base evaluator comprehensively evaluating 3D surfaces;

FIG. 5 is a base evaluator number optimizing curve;

FIG. 6 is a basic structural diagram of a Bagging resampling method;

FIG. 7 is a block diagram of a K-fold cross-validation;

FIG. 8 is a graph comparing the results of optimizing random forests with other model evaluations;

FIG. 9 is an output learning curve of an optimized random forest and other models.

Detailed Description

As shown in fig. 1, in the method for diagnosing an aging fault of an IGBT module based on the optimized random forest model of the system according to the present embodiment, a sensor in a data acquisition module acquires an aging fault diagnosis data signal of the IGBT module; inputting the acquired data signals into a data processing module for standardized processing to obtain an IGBT module aging fault diagnosis data set; inputting the obtained IGBT aging fault diagnosis data set into a model building module, and training and building a traditional random forest model by utilizing the IGBT module aging fault diagnosis data set; then optimizing the traditional random forest model obtained in the previous step in a model optimization module by utilizing a method of pre-pruning, grid searching, learning curve and Bagging resampling; thus, the establishment of the optimized random forest model is completed. And finally, outputting an IGBT module aging fault diagnosis result by using the optimized random forest model in the optimized random forest model aging fault diagnosis module so as to realize the IGBT module aging fault diagnosis.

The specific steps of the embodiment include:

S1, collecting IGBT module aging fault diagnosis data: voltage, current and infrared sensors are set up in the data acquisition module to collect the collector current, gate current, collector voltage, gate voltage and module temperature of the IGBT module in the normal operating state, the initial aging state and the complete aging fault state; 301680 sampling points are collected for each diagnosis signal; the normal operating state is labeled T0, the initial aging state T1 and the complete aging fault state T2; and the collected IGBT module aging fault diagnosis data signals are input into the data processing module.

S2, establishing the IGBT module aging fault diagnosis data set: the IGBT module aging fault diagnosis data signals input to the data processing module are centered on their mean and then scaled by their standard deviation, so that the processed data obey a standard normal distribution with mean 0 and standard deviation 1. In other words, the data samples of the IGBT module aging fault diagnosis signals are standardized to obtain the IGBT aging fault diagnosis data set, which helps improve the accuracy and convergence rate of the final random forest model:

$$x^{*} = \frac{x - \mu}{\sigma}$$

wherein: $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals and $\sigma$ is the standard deviation of all data signal samples.

The standardized IGBT module aging fault diagnosis signal data set is an IGBT module aging fault diagnosis data set, and the obtained IGBT module aging fault diagnosis data set is shown in table 1.

TABLE 1

And inputting the obtained IGBT module aging fault diagnosis data set into a model building module.

S3, as shown in FIG. 2, the model construction module trains on the IGBT module aging fault diagnosis data set obtained in the previous step and builds a traditional random forest model with CART decision trees as base estimators. Specifically: a traditional random forest model with CART decision trees as base estimators is essentially a set of CART decision trees. The number of decision trees in the model is not limited and the growth of each CART decision tree is not interfered with; once all CART decision trees in the model have finished growing, the traditional random forest model is established.

The base estimator, the CART decision tree, is a decision tree constructed with the classification and regression tree algorithm. This algorithm builds the CART decision tree model using the Gini coefficient as the criterion for choosing the optimal split node. Taking the IGBT module aging fault diagnosis data set as an example:

$$\mathrm{Gini}(t) = 1 - \sum_{i} \left[p(x_i \mid t)\right]^{2}$$

wherein: $p(x_i \mid t)$ is the probability that a sample drawn at random from the data set at node $t$ belongs to category $x_i$; $\mathrm{Gini}(t)$ is the Gini coefficient at node $t$, i.e. the probability that two samples drawn at random from the data set belong to different categories; $x_i$ ranges over the 3 categories T0, T1 and T2; and node $t$ is any branch node in the decision tree.
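As a small worked example of the Gini coefficient at a node, the sketch below computes it from the class labels of the samples that reach the node. The label list `node_labels` is hypothetical and serves only to show the arithmetic.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of the class labels at one node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical node holding two T0 samples, one T1 sample and one T2 sample.
node_labels = ["T0", "T0", "T1", "T2"]
node_gini = gini(node_labels)  # 1 - (0.5**2 + 0.25**2 + 0.25**2) = 0.625
```

A node containing a single category has a Gini coefficient of 0, which is why a lower value indicates a purer node.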

Step S3 specifically comprises the following steps:

3.1 The IGBT module aging fault diagnosis data set is randomly sampled to obtain a sample set for each CART decision tree; the number of sampling rounds, the generated sample sets and the resulting decision trees are all random. The sample set of each decision tree is divided into a training set and a test set in a ratio of 8:2;

3.2 The Gini coefficient of the categories in each CART decision tree sample set is calculated. Taking sample set D as an example:

$$\mathrm{Gini}(D) = 1 - \sum_{i} \left[p(x_i \mid D)\right]^{2}$$

the feature A with the smallest Gini coefficient value is taken as the root node of the decision tree;

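3.3 Starting from the root node, the data set D is divided by feature A into two sub-data sets, denoted here $D_1$ and $D_2$; the Gini coefficient of sample set D under this split is then

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$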

3.4 The split with the minimum Gini(D, A) value is taken as the optimal split of the root node, and nodes continue to be split downward until the conditions required for splitting are no longer satisfied at a node, or the Gini coefficient has reached its minimum and no further split is possible. The decision tree then stops growing of its own accord, without human intervention, and the classification result of the node at which it stops is the final classification output of the decision tree.

3.5 Repeating the steps 3.3) and 3.4), and after all decision trees in the model are grown, centralizing all decision trees to finally establish the traditional random forest model.
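The split selection in steps 3.2) to 3.4) can be sketched for a single numeric feature. This is an illustration under stated assumptions rather than the patented procedure: candidate thresholds are taken as midpoints between adjacent distinct values, the split score is the size-weighted Gini coefficient of the two sub-data sets, and the `temperature` and `state` values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini coefficient of the two sub-data sets produced by one threshold."""
    left, right = [], []
    for v, y in zip(values, labels):
        (right if v > threshold else left).append(y)
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Return (threshold, score) with the smallest weighted Gini coefficient."""
    distinct = sorted(set(values))
    # Assumption: candidate thresholds are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = [(t, split_gini(values, labels, t)) for t in candidates]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical module temperatures and state labels, for illustration only.
temperature = [40.0, 45.0, 50.0, 60.0, 65.0, 90.0, 95.0]
state = ["T0", "T0", "T0", "T1", "T1", "T2", "T2"]
threshold, score = best_threshold(temperature, state)
```

With these values the best split separates all T0 samples from the rest, and the same search would then be repeated inside each sub-data set as the tree grows.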

S4, optimizing the traditional random forest model established in the step S3 in a model optimization module, wherein the method specifically comprises the following steps:

4.1 First, grid search is used to find the parameters of the decision tree pre-pruning algorithm: the possible values of each growth parameter are combined, all possible combinations are listed to form a grid, a decision tree is built for each combination in turn, and the grid search results for the prediction accuracy and running time of a single decision tree are output.

The growth parameters of the decision tree comprise the maximum depth, the minimum number of branch nodes and the minimum number of branch samples. Taking the IGBT module aging fault diagnosis data set as an example, the value range of the decision tree depth is set to [1,50] with a step of 1; the value range of the minimum number of branch nodes is [2,25] with a step of 1; the value range of the minimum number of branch samples is [2,25] with a step of 1; the values of the other parameters are determined by the sample conditions. The resulting grid search results for prediction accuracy and single-tree running time are shown in FIG. 3.
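The grid described above can be enumerated with a short sketch. The three ranges come from the text; the function `fake_evaluate` is a hypothetical stand-in for training and timing one decision tree, and its returned numbers are invented purely so that the example runs.

```python
from itertools import product

max_depths = range(1, 51)          # [1, 50], step 1
min_branch_nodes = range(2, 26)    # [2, 25], step 1
min_branch_samples = range(2, 26)  # [2, 25], step 1

# Every combination of the three growth parameters forms one grid point.
grid = list(product(max_depths, min_branch_nodes, min_branch_samples))


def search(grid, evaluate):
    """Evaluate every grid point; `evaluate` returns (accuracy, runtime_seconds)."""
    return {params: evaluate(*params) for params in grid}


def fake_evaluate(depth, nodes, samples):
    """Hypothetical placeholder for fitting and timing a single decision tree."""
    accuracy = 1.0 - abs(depth - 7) / 50.0
    runtime = 0.001 * depth
    return accuracy, runtime


results = search(grid, fake_evaluate)
```

The grid holds 50 x 24 x 24 = 28800 combinations, which is why the running time of each tree matters alongside its accuracy.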

To secure prediction accuracy and efficiency at the same time, the prediction accuracy at each point in FIG. 3 is subtracted from 1, the running time of the decision tree is added to the result, and the grid search result is output again. The lowest points of all sections in FIG. 3 are then normalized and connected, giving the 3D surface shown in FIG. 4. Under the influence of the maximum depth, the minimum number of branch samples and the minimum number of branch nodes, the 3D surface of the base estimator on this data set generally first falls and then rises, and the numerical minimum lies at the lowest point of the depression in the surface. The parameters at this lowest point are used for the base estimators of the random forest algorithm: when the maximum depth of the decision tree is 7, the minimum number of branch nodes is 15 and the minimum number of branch samples is 3, the surface reaches its numerical minimum, i.e. the random forest model is optimal in terms of both prediction accuracy and time consumed.

The normalization is a linear transformation of the data at the lowest points of all sections that maps them to the interval [0,1]:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

wherein: $x'$ is the data at the lowest point of a section after normalization, $x$ is the data at the lowest point of the original section, $\max(x)$ is the maximum of the data at the lowest points of the original sections, and $\min(x)$ is the minimum of the data at the lowest points of the original sections;
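The min-max normalization above amounts to a few lines of code. The values in `section_minima` are hypothetical and stand in for the lowest points of the sections.

```python
def min_max_normalize(values):
    """Linearly map values to [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical section minima, not values read from FIG. 3.
section_minima = [0.12, 0.30, 0.21, 0.48]
normalized = min_max_normalize(section_minima)
```

The smallest input maps to 0 and the largest to 1, so error and running time, which have different scales, can be compared on one surface.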

4.2 Pre-pruning the CART decision tree by using the grid search result: the complete growth of the tree is limited by setting the generation parameters of the decision tree in the tree building process, when the growth of the base evaluator is limited, the complexity of the random forest model is effectively controlled, the balance between the training error and the complexity of the decision tree can be achieved, and the detection, correction and optimization of the data set category are facilitated: namely, extracting main rules of the data set, discarding abnormal rules, and correcting hidden errors, noise and isolated points in the manually set data set labels.

Specifically, the decision tree is pre-pruned with the following growth limits: a maximum depth of 7, a minimum number of branch nodes of 15 and a minimum number of branch samples of 3. The decision tree stops growing as soon as any one of these conditions, or several of them at once, is met.

4.3 After the decision trees in the traditional random forest have been pre-pruned, the learning curve method is used to determine the framework parameter n_estimators of the random forest modeling process, i.e. the number of base estimators in the random forest model; this parameter is one of the important factors determining the complexity of the random forest model. Too large a value of n_estimators upsets the fit of the model, and as the number of base estimators grows, the time and resources consumed by normal operation of the model also increase greatly.

As shown in FIG. 5, the value range of n_estimators is set to [1,200] with a step of 1, and a learning curve of the model's prediction accuracy against an increasing number of base estimators is output; the abscissa is the number of decision trees in the random forest model after pre-pruning, and the ordinate is the prediction accuracy of the random forest model. As FIG. 5 shows, once the base estimator parameters have been tuned to their optimum, the random forest model performs best with n_estimators set to 24, and adding further base estimators is unnecessary. Therefore, after pre-pruning of the decision trees in the traditional random forest, the number of base estimators in the random forest model is set to 24.
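Reading the optimum off such a learning curve can be sketched as follows. The selection rule (the smallest number of base estimators that reaches the highest accuracy) is an assumption made for this example, and the points in `curve` are hypothetical, not values from FIG. 5.

```python
def best_n_estimators(accuracy_by_n):
    """Return the smallest n_estimators whose accuracy equals the maximum of the curve."""
    best = max(accuracy_by_n.values())
    return min(n for n, acc in accuracy_by_n.items() if acc == best)


# Hypothetical learning-curve points (n_estimators -> accuracy), not measured data.
curve = {1: 0.90, 8: 0.96, 24: 0.988, 100: 0.988, 200: 0.987}
chosen = best_n_estimators(curve)
```

Preferring the smallest count among equally accurate settings keeps the forest from carrying redundant trees that only add running time.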

4.4 After the number of base estimators in the random forest model has been set, Bagging resampling is applied to the random forest model. As shown in FIG. 6, the training set is randomly sampled with replacement during modeling to form a number of new data sets that are similar in size to the original training set but differ from one another. Because the resampling of the training sample set is random and independent, the base estimators built on these data sets differ more from one another and the correlation between any two base estimators falls markedly, which improves the prediction accuracy of the random forest model and yields the final optimized random forest model.
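Sampling with replacement can be sketched with the standard library alone. The integer `training_set` is a hypothetical placeholder for real samples, and the fixed `seed` is an assumption added only to make the example reproducible.

```python
import random


def bootstrap_samples(dataset, n_estimators, seed=0):
    """Draw one same-sized sample with replacement per base estimator."""
    rng = random.Random(seed)
    n = len(dataset)
    return [rng.choices(dataset, k=n) for _ in range(n_estimators)]


# Hypothetical training set of 100 sample identifiers.
training_set = list(range(100))
subsets = bootstrap_samples(training_set, n_estimators=24)
```

Each subset has the size of the original training set but typically repeats some samples and omits others, which is what makes the resulting trees differ from one another.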

S5, outputting the IGBT module aging fault diagnosis result in the aging fault diagnosis module using the optimized random forest model obtained in step S4: the classification results of all decision trees in the optimized random forest model are put to a vote, and the category that receives the most votes among the decision tree classification results is taken as the final output of the optimized random forest model:

$$Y(x) = \arg\max_{1 \le i \le z} \lambda\left(y_n(x) = i\right)$$

wherein: $Y(x)$ is the output of the optimized random forest model; $y_n(x)$ is the output of the $n$-th decision tree in the optimized random forest, and the expression in brackets states that the final classification result of that decision tree is $i$; $\lambda$ is the number of decision trees satisfying the expression in brackets; and $z$ is the number of categories in the optimized random forest model.
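The voting rule reduces to counting labels. In the sketch below the 24 entries of `votes` are hypothetical tree outputs; ties are resolved in favour of the label encountered first, which is a property of `Counter.most_common` and not something the text specifies.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the label predicted by the largest number of decision trees."""
    counts = Counter(tree_outputs)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical outputs of 24 decision trees for one sample.
votes = ["T1"] * 15 + ["T0"] * 6 + ["T2"] * 3
diagnosis = majority_vote(votes)
```

Here 15 of the 24 trees vote T1, so the forest reports the initial aging stage for this sample.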

The method is verified on test data acquired in practical experiments, in a Python 3.8 and TensorFlow 2.3 environment. Cross-validation is used to compare and analyze the prediction accuracy and fitting behavior of a random forest regression model, an XGBoost model, a traditional random forest model and the optimized random forest model adopted by the invention:

As shown in FIG. 7, the cross-validation method in essence divides each sample set into K equal parts; each part is taken in turn as the test set while the remaining K-1 parts form the training set, and after K rounds of training the average of the test-set results is output as the final model prediction result. Cross-validation lets the whole data set serve as both training set and test set, so the final prediction result can effectively evaluate the prediction accuracy and generalization of the model.
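The K-fold scheme can be sketched as an index splitter. This version assumes contiguous, unshuffled folds and spreads any remainder over the first folds; the callback `score_fold` is a hypothetical hook that would train on one index list and score on the other.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    base, extra = divmod(n_samples, k)
    # The first `extra` folds take one additional sample each.
    sizes = [base + 1] * extra + [base] * (k - extra)
    indices = list(range(n_samples))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, score_fold):
    """Average the per-fold scores returned by the hypothetical `score_fold` callback."""
    scores = [score_fold(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 5))
```

Every sample appears in exactly one test fold and in the training set of all other folds, which is the property the paragraph above relies on.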

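The mean square error (MSE), the mean absolute error (MAE), the model coefficient of determination ($R^2$) and the average prediction accuracy of the model on the training set and the test set over multiple training runs are used as evaluation indices to compare and analyze the performance, in a fault diagnosis system, of the model established above and of the other models. Specifically, the model mean square error is

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$$

the mean absolute error is

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

and the model coefficient of determination is

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

wherein: $y_i$ is the true value, $\hat{y}_i$ is the model prediction, $\bar{y}$ is the mean of the original data set, and the sums run over all $n$ samples.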

Specifically, the smaller the MSE and MAE values and the larger the $R^2$ value, the higher the prediction accuracy of the model and the stronger its correlation with the test data.
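The three error measures can be computed directly from their definitions. In this sketch the state labels T0, T1 and T2 are encoded as 0, 1 and 2, which is an assumption made for the example, and `y_true` and `y_pred` are hypothetical values.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical encoded labels: one of four predictions is off by one class.
y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]
errors = (mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

A perfect prediction gives MSE and MAE of 0 and an R² of 1, so lower error values and a higher R² both indicate a better model.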

The output model evaluation data are shown in table 2.

TABLE 2

FIG. 8 compares the model adopted by the invention with other models in IGBT aging fault diagnosis. As the figure shows, the optimized random forest model achieves the highest prediction accuracy on both the training set and the test set; its test-set prediction accuracy is 17.35%, 17.27% and 1.09% higher than that of the XGBoost model, the random forest regression model and the traditional random forest classification model, respectively, under the same conditions. Judging from the error distribution over multiple tests, the mean square error and mean absolute error of the optimized random forest model are low and stable, and its correlation with the data set is the highest.

FIG. 9 shows the learning curves, for IGBT aging fault diagnosis, of the optimized random forest model adopted by the invention and of the other fault diagnosis models after multiple rounds of cross-validation training: the red and blue lines are the trends of the model's prediction accuracy on the training set and the test set respectively, the vertical axis is the prediction accuracy and the horizontal axis is the number of samples in the training set;

As can be seen from FIG. 9: the random forest regression model has high prediction accuracy on the training set but low prediction accuracy on the test set, i.e. it overfits; the XGBoost model has poor prediction accuracy on both the training set and the test set, i.e. it underfits; the traditional random forest model fits moderately well, but its prediction accuracy on both sets is lower than that of the optimized random forest model; the optimized random forest model has high prediction accuracy on both the training set and the test set, fits the training set completely, and the difference between its two prediction curves is only 1.19%.

Compared with the prior art, the optimized random forest model adopted in the invention has no redundant base estimators and is of reasonable, appropriate complexity, so it can diagnose IGBT module aging faults quickly and reliably and obtain the state information of the IGBT module accurately.

The model of the invention has high prediction accuracy and a good fit. The invention selects the traditional random forest model from among several machine learning models and, by means of grid search, pre-pruning, learning curve and Bagging resampling, reduces the average correlation coefficient among the decision trees in the random forest model, improves the prediction accuracy and modeling efficiency of each single base estimator, and tunes the number and parameters of the base estimators in the ensemble to actual conditions to obtain the best fit. The traditional random forest model is thereby optimized in terms of training difficulty and final output accuracy, establishing an optimized random forest method for IGBT aging fault diagnosis. In the end, the system reaches 100% accuracy on the training set and 98.81% prediction accuracy on the test set.

Comparison with other types of models under the same conditions shows that the optimized random forest model adopted in the invention exhibits neither overfitting nor underfitting; it fits the training set completely, and the final difference between the two prediction curves on the training set and the test set is only 1.19%.

In power systems in extreme environments where operation and maintenance are unavailable or difficult to carry out, the method can diagnose IGBT module aging faults from IGBT operating-condition data collected by real-time monitoring in the power system, and has good practical value and application prospects.

Those skilled in the art may modify the described embodiments in various ways without departing from the principles and spirit of the invention. The scope of protection of the invention is defined by the appended claims and not by the above embodiments, and every implementation within that scope is bound by the invention.

Claims

1. An IGBT module aging fault diagnosis system based on an optimized random forest model, characterized by comprising a data acquisition module, a data processing module, a model construction module, a model optimization module and an aging fault diagnosis module, wherein: the data acquisition module uses sensors to acquire module temperature, collector current, collector voltage, gate current and gate voltage data during operation of the IGBT module; these data are the aging fault diagnosis data signals of the IGBT module and are output to the data processing module; the data processing module standardizes the IGBT module aging fault diagnosis data signals to obtain an IGBT module aging fault diagnosis data set and outputs it to the model construction module; the model construction module trains and constructs a random forest model with CART decision trees as base estimators; the model optimization module optimizes the model by combining pre-pruning, cross-validation, learning curve and Bagging resampling methods to form an optimized random forest model; and the aging fault diagnosis module outputs the aging fault diagnosis result of the IGBT module based on the optimized random forest model.

2. The IGBT module aging fault diagnosis system based on the optimized random forest model of claim 1, wherein the sensors comprise an infrared sensor, a current sensor and a voltage sensor, wherein: the current and voltage sensors collect the collector current, collector voltage, gate current and gate voltage data signals during operation of the IGBT module.

3. The IGBT module aging fault diagnosis system based on the optimized random forest model according to claim 1, wherein the standardization is: the collected raw data signals are centered on their mean and then scaled by their standard deviation, so that the processed signal data obey a standard normal distribution with mean 0 and standard deviation 1:

$$x^{*} = \frac{x - \mu}{\sigma}$$

wherein: $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals, and $\sigma$ is the standard deviation of all sample data signals;

4. The IGBT module aging fault diagnosis system based on the optimized random forest model according to claim 1, wherein the base estimators are a plurality of weak classification models of the same type that together constitute the random forest model, and CART decision trees are used as the base estimators of the established random forest model;

the model optimization module adopts a grid searching and pre-pruning method to define the optimal parameters of the base evaluator in the random forest model, and adopts a learning curve to determine the optimal number of the base evaluator in the random forest model which is built later;

5. An IGBT module aging fault diagnosis method based on the IGBT module aging fault diagnosis system of any one of claims 1-4, characterized in that: IGBT module aging fault diagnosis data signals are collected through the sensor in the data acquisition module; the collected data signals are input to the data processing module and standardized to obtain an IGBT module aging fault diagnosis data set; the data set is input to the model construction module, where a conventional random forest model is trained and constructed from it; the conventional random forest model is then optimized in the model optimization module using pre-pruning, grid search, learning curves and Bagging resampling; and finally the aging fault diagnosis module outputs an aging fault diagnosis result for the IGBT module using the optimized random forest model, thereby realizing aging fault diagnosis of the IGBT module.

6. The method according to claim 5, characterized in that it specifically comprises:

S1, collecting aging fault diagnosis data of the IGBT module: voltage, current and infrared sensors are arranged in the data acquisition module to collect the collector current, gate current, collector voltage, gate voltage and module temperature of the IGBT module in the normal operating state, the initial aging state and the complete aging fault state, respectively, as samples of the IGBT module aging fault diagnosis data signals; 301680 sampling points are collected for each diagnostic signal; the normal operating state is labeled T0, the initial aging state is labeled T1 and the complete aging fault state is labeled T2; and the collected IGBT module aging fault diagnosis data signals are input to the data processing module;

S2, establishing an IGBT module aging fault diagnosis data set: the IGBT module aging fault diagnosis data signals input to the data processing module are centered by the mean and then scaled by the standard deviation, so that the processed data follow a standard normal distribution with a mean of 0 and a standard deviation of 1; that is, the data samples of the IGBT module aging fault diagnosis signals are standardized to obtain an IGBT aging fault diagnosis data set, which helps improve the accuracy and convergence rate of the final random forest model:

$$x^{*} = \frac{x - \mu}{\sigma}$$

wherein: $x$ is the original data signal, $x^{*}$ is the standardized data signal, $\mu$ is the mean of all sample data signals, and $\sigma$ is the standard deviation of all sample data signals;

S3, training on the IGBT module aging fault diagnosis data set with the model construction module and establishing a conventional random forest model that uses CART decision trees as base estimators, specifically: the conventional random forest model with CART decision trees as base estimators is essentially a set of a plurality of CART decision trees; the number of decision trees in the random forest model is not limited and the growth of any CART decision tree in the model is not interfered with, and once all CART decision trees in the random forest model have finished growing, the establishment of the conventional random forest model is complete;

S5, outputting an IGBT module aging fault diagnosis result in the aging fault diagnosis module using the optimized random forest model obtained in step 4: the classification results of all decision trees in the optimized random forest model are put to a vote, and the class that receives the most votes among the decision tree classification results is taken as the final output of the optimized random forest model:

$$Y(x) = \underset{i \in \{1, 2, \ldots, Z\}}{\arg\max}\; \lambda\big(y_n(x) = i\big)$$

wherein: $Y(x)$ is the output of the optimized random forest model; $y_n(x)$ is the output of the $n$-th decision tree in the optimized random forest; the expression in brackets states that the final classification result of the decision tree is $i$; $\lambda$ is the number of decision trees satisfying the expression in brackets; and $Z$ is the number of categories in the optimized random forest model.
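The voting rule of step S5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `majority_vote` and the example tree outputs are hypothetical, and the tie-breaking behavior (the label seen first wins) is a property of this sketch, not something the claims specify.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the class label predicted by the largest number of trees."""
    counts = Counter(tree_outputs)  # label -> number of trees voting for it
    # most_common(1) yields [(label, votes)]; among tied labels the one
    # encountered first is returned (an assumption of this sketch).
    return counts.most_common(1)[0][0]


# Hypothetical outputs of five decision trees for one sample, using the
# state labels T0 (normal), T1 (initial aging) and T2 (complete aging fault).
votes = ["T1", "T0", "T1", "T2", "T1"]
result = majority_vote(votes)
```

Here three of the five trees vote for T1, so the forest reports the initial aging state.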

7. The method according to claim 6, wherein the step 3 specifically comprises:

3.4 Taking the split with the minimum Gini(D, A) value as the optimal split of the root node of the decision tree, and continuing to split the nodes downward until a node no longer meets the conditions required for splitting, or the Gini coefficient reaches its minimum value and the node cannot be split further, at which point the decision tree stops growing; that is, the decision tree stops growing on its own rather than through manual intervention, and the classification result of the node at which growth stops is the final classification output of the decision tree.
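The split selection in step 3.4 can be illustrated with the following Python sketch. Because the steps that define Gini(D, A) are not present in this section, the sketch assumes the standard CART definitions: Gini(D) is 1 minus the sum of squared class proportions, and Gini(D, A) is the size-weighted Gini of the two subsets produced by a threshold on feature A. All function names and sample values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(values, labels, threshold):
    """Weighted Gini impurity after splitting one feature at a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Pick the candidate threshold with the minimum weighted Gini impurity."""
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("feature is constant; no split is possible")
    return min(candidates, key=lambda t: gini_split(values, labels, t))


# Hypothetical module temperatures (deg C) with state labels, illustrative only.
temps = [40.0, 45.0, 50.0, 80.0, 85.0, 90.0]
states = ["T0", "T0", "T0", "T2", "T2", "T2"]
threshold = best_threshold(temps, states)
```

A full CART tree would repeat this search over every feature at every node and stop when a node is pure or cannot be split further, which corresponds to the unrestricted growth described in the step.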

8. The method according to claim 6, wherein the step 4 specifically includes:

4.1 First, searching for the parameters of the decision tree pre-pruning algorithm by grid search: the possible values of each growth parameter are permuted and combined, all possible combinations are listed to build a grid, a decision tree is built for each combination in turn, and the grid search results, namely the prediction accuracy and running time of each single decision tree, are output;

4.2 Pre-pruning the CART decision trees using the grid search results: the full growth of each tree is limited by setting the growth parameters of the decision tree during tree construction; limiting the growth of the base estimators effectively controls the complexity of the random forest model, balances the training error against the complexity of the decision trees, and facilitates detecting, correcting and optimizing the data set categories, that is, extracting the main rules of the data set, discarding anomalous rules, and correcting hidden errors, noise and outliers in the manually assigned data set labels;

4.3 After pre-pruning of the decision trees in the conventional random forest is completed, the framework parameter n_estimators of the random forest modeling process, namely the number of base estimators in the random forest model, is determined by the learning curve method; this parameter is one of the important factors determining the complexity of the random forest model; an excessively large n_estimators value unbalances the model fit, and as the number of base estimators increases, the time and resources consumed by running the model increase substantially;

4.4 After the number of base estimators in the random forest model is set, Bagging resampling is applied to the random forest model: the training set used for modeling is resampled by random sampling with replacement to form a plurality of new data sets that are similar in size to the original training set but differ from one another; this improves the randomness and independence of the resampled training sets, increases the diversity of the base estimators built on them, markedly reduces the correlation between any two base estimators, and thereby improves the prediction accuracy of the random forest model, forming the final optimized random forest model.
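Two of the mechanisms in step 4 can be sketched in plain Python: building the parameter grid of step 4.1 and drawing the Bagging resamples of step 4.4. This is a minimal sketch under stated assumptions, not the patented implementation. The parameter names `max_depth` and `min_samples_leaf`, their candidate values, the seed, and the toy training set are hypothetical; the source does not list the growth parameters it searches over.

```python
import itertools
import random


def parameter_grid(options):
    """Yield one dict for every combination of the candidate parameter values."""
    names = sorted(options)
    for combo in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, combo))


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items uniformly at random with replacement."""
    return [rng.choice(dataset) for _ in range(len(dataset))]


# Hypothetical pre-pruning parameters and candidate values (illustrative only).
grid = list(parameter_grid({"max_depth": [3, 5, 7], "min_samples_leaf": [1, 5]}))

# Hypothetical training set of ten sample indices; one resample per base estimator.
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
training_set = list(range(10))
resampled_sets = [bootstrap_sample(training_set, rng) for _ in range(3)]
```

Each entry of `grid` would be used to grow and score one pre-pruned decision tree, and each entry of `resampled_sets` would train one base estimator. Because sampling is done with replacement, every resample has the same size as the original training set but typically repeats some samples and omits others.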