CN111240279A - Confrontation enhancement fault classification method for industrial unbalanced data - Google Patents


Info

Publication number: CN111240279A (application CN201911369696.4A; granted publication CN111240279B)
Authority: CN (China)
Prior art keywords: data, generated, samples, generator, classification
Inventors: 葛志强, 江肖禹
Original and current assignee: Zhejiang University (ZJU)
Other languages: Chinese (zh)
Legal status: Granted; active

Classifications

    • G05B19/418 — Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B19/41875 — Total factory control characterised by quality surveillance of production
    • G05B2219/31359 — Object oriented model for fault, quality control
    • Y02P90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

The invention discloses a confrontation-enhancement (adversarial-augmentation) fault classification method for imbalanced industrial data, belonging to the field of fault diagnosis and classification in industrial processes. Through adversarial training between a multi-class discriminator and a small-sample generator module, the generators produce data directionally for each imbalanced small-sample class, and data screening based on the Mahalanobis distance in the principal component space retains the generated data closer to the real data. A dynamic table in a supplementary database is established, and the real data are supplemented by a quantitatively updated sample set to obtain a new data set, resolving the imbalance of the industrial data. The classification method couples the training processes of the generators and the multi-class discriminator, making more effective use of computing resources.

Description

Confrontation enhancement fault classification method for industrial unbalanced data
Technical Field
The invention belongs to the field of industrial process fault diagnosis and classification, and particularly relates to a confrontation-enhancement (adversarial) fault classification method for imbalanced industrial data.
Background
With the development of modern industry, industrial data have accumulated in large quantities, providing a basis for data-driven process analysis. Among the applications, fault diagnosis is a typical use of industrial data. Many data-driven methods, such as support vector machines and back-propagation neural networks, have been widely applied to fault classification in industrial processes.
However, since fault conditions occur rarely in industry, the collected fault data are very limited. Compared with the large amount of non-fault data, i.e. data under normal conditions, the proportion of fault data is low. In addition, the numbers of samples of faults with different occurrence probabilities are themselves imbalanced. This poses difficulties for classification algorithms that assume a balanced data distribution. Industrial fault diagnosis is therefore essentially a multi-class classification problem on imbalanced data, and urgently needs to be solved.
Supplementing data at the data level is the most direct way to solve the imbalance problem. The generative adversarial network (GAN) is currently a promising generative model, composed of a generator and a discriminator. Through adversarial training between the generator and the discriminator, the generator learns to produce data that deceive the discriminator; such generated data can therefore be applied to supplement small-sample classes. However, GAN training is extremely unstable and easily produces noise points deviating from the real data, or suffers from mode collapse, which harms the authenticity of the generated data. Moreover, GAN training and classifier training are usually two independent processes, which increases model complexity and wastes computing resources.
Disclosure of Invention
Aiming at the classification problem of imbalanced industrial data, the invention provides a confrontation-enhancement fault classification method that achieves accurate classification using an adversarial network structure composed of a small-sample generator module and a multi-class discriminator.
The purpose of the invention is realized by the following technical scheme: a confrontation-enhancement fault classification method for imbalanced industrial data, comprising the following steps:
(1) Collect offline data from the historical industrial process as an original data set X, where X comprises m classes of large-sample data X_big and n classes of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, where y_i ∈ {1, 2, …, m+n} and i indexes the samples. The m large-sample classes each contain the same number of samples, while each small-sample class contains fewer samples than the large-sample classes. The original data set X is preprocessed by mapping the raw data through a linear function into the [0, 1] range, yielding the normalized training data set X̂.
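The linear preprocessing of step (1) amounts to min-max scaling of each process variable into [0, 1]. A minimal sketch (function and variable names are ours, not the patent's):

```python
import numpy as np

def minmax_scale(X, eps=1e-12):
    """Linearly map each column (variable) of X into the [0, 1] range."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo + eps)  # eps guards constant columns

# Toy original data set: 4 samples, 2 process variables
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
X_hat = minmax_scale(X)
```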
(2) The data generation stage specifically comprises the following substeps:
(2.1) The small-sample generator module constructs n generators, one for each of the n small-sample classes of the imbalanced data. Each generator is a fully connected neural network with the same structure; its input is Gaussian noise z of dimension p, and its output is a generated feature vector of dimension q. The n generators are mutually independent and form a parallel multi-generator structure. Before the Gaussian noise z is input to the generators, the hyperparameters of each generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random function. The noise z is input to the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, …, G_n}.
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis (PCA) on the training data set X̂ to obtain a reference principal component matrix T_1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T_2. The number of principal components retained in T_1 is determined by accumulating the explained variance percentage up to 98%; T_2 retains the same number of principal components as T_1.
(3.2) Within the reference principal component matrix T_1, compute the Mahalanobis distance from each small-sample point to the centroid of its class and take the maximum, obtaining the farthest distance MD_1max for each class. The upper threshold for data screening is set to k·MD_1max, where k is a screening coefficient.
(3.3) For the corresponding principal component matrix T_2, compute the Mahalanobis distance MD_2 from each generated sample to the centroid of the corresponding small-sample class in T_1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data set X̂ and is a valid point G_valid; if MD_2 > k·MD_1max, the generated sample is considered to deviate from the training data set X̂ and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y; the outlier noise G_invalid is discarded.
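Steps (3.1)–(3.4) can be illustrated with NumPy. This is a minimal sketch under simplifying assumptions: a single small-sample class, PCA via SVD, and an illustrative screening coefficient k = 2; all names and the toy data are ours, not the patent's:

```python
import numpy as np

def pca_fit(X, var_ratio=0.98):
    """Fit PCA on centered X; keep components explaining 98% of variance."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    expl = (s ** 2) / (s ** 2).sum()
    r = int(np.searchsorted(np.cumsum(expl), var_ratio)) + 1
    return mu, Vt[:r].T                     # mean and loading matrix (q x r)

def project(X, mu, P):
    return (X - mu) @ P                     # scores in the reference PC space

rng = np.random.default_rng(0)
X_small = rng.normal(0.0, 1.0, size=(200, 4))       # real small-sample data
G = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),  # in-distribution samples
               rng.normal(8.0, 1.0, size=(5, 4))])  # far-off "noise" samples

mu, P = pca_fit(X_small)
T1 = project(X_small, mu, P)   # reference principal component matrix
T2 = project(G, mu, P)         # generated data projected into the same space

# Mahalanobis distance to the class centroid in the PC space
centroid = T1.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(T1, rowvar=False))
def mahalanobis(T):
    d = T - centroid
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))

k = 2.0                                     # screening coefficient (assumed)
MD1_max = mahalanobis(T1).max()
valid_mask = mahalanobis(T2) < k * MD1_max  # MD_2 < k*MD_1max keeps the point
G_valid = G[valid_mask]
```

The far-off samples land well beyond k·MD_1max and are screened out as G_invalid, while the in-distribution samples pass as G_valid.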
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total. For each class, the length of its sample sequence plus the number of real samples of that small-sample class equals the number of real samples in each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, newly generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. Once the accumulated number reaches or exceeds the sequence length, the oldest generated data at the end of the sequence are evicted and the new generated data are written in, yielding an updated supplementary data set X′.
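The dynamic table of step (4) behaves like a fixed-length FIFO queue per small-sample class: each sequence is sized so that generated plus real samples match the large-sample count, and old generated data are evicted as new data arrive. A minimal sketch (class labels and counts are illustrative):

```python
from collections import deque

n_big = 1000               # real samples per large-sample class (assumed)
n_small = {1: 50, 2: 20}   # real samples per small-sample class (assumed)

# One bounded sequence per small-sample class: length + real count == n_big.
dynamic_table = {c: deque(maxlen=n_big - n_small[c]) for c in n_small}

def write_generated(cls, samples):
    """Append valid generated samples; the deque evicts the oldest when full."""
    dynamic_table[cls].extend(samples)

write_generated(1, range(2000))   # more samples than the sequence can hold
```

After the call, only the newest 950 generated samples for class 1 remain, mirroring the quantitative update of the sample set.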
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a neural-network multi-class discriminator D(x) composed of multiple hidden layers and a softmax output layer. Its input is p-dimensional data x and its output covers m+n+1 sample class labels y, where the first m items are large-sample class labels, the next n items are small-sample class labels, and the (m+n+1)-th item is the label for generated (fake) data.
(5.2) Mix the training data set X̂ with the supplementary data set X′ and treat the mixture as real data x ∼ P_data; the generated data are denoted x ∼ P_G. Input x into the multi-class discriminator to obtain the softmax probability p(y|x) for each class.
(5.3) Construct the loss function of the classification discriminator:
L_D = −E_{x∼Pdata}[log p(y | x)] − E_{x∼PG}[log p(y = m+n+1 | x)]
(5.4) Update the network parameters by error back-propagation and optimize the classification discriminator model until the discriminator loss function converges;
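Under this (m+n+1)-class formulation, the discriminator loss of step (5.3) is an ordinary cross-entropy in which real samples carry their true label and generated samples carry the extra "fake" label. A NumPy sketch (all names and sizes are illustrative):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def discriminator_loss(logits_real, y_real, logits_fake, n_classes):
    """L_D = -E_real[log p(y|x)] - E_fake[log p(m+n+1|x)]."""
    p_real = softmax(logits_real)
    p_fake = softmax(logits_fake)
    fake_label = n_classes - 1                # index of the (m+n+1)-th item
    loss_real = -np.log(p_real[np.arange(len(y_real)), y_real]).mean()
    loss_fake = -np.log(p_fake[:, fake_label]).mean()
    return loss_real + loss_fake

rng = np.random.default_rng(0)
logits_r = rng.normal(size=(8, 7))   # e.g. m=4, n=2 -> m+n+1 = 7 outputs
logits_f = rng.normal(size=(8, 7))   # discriminator outputs on generated data
y_r = rng.integers(0, 6, size=8)     # true classes, 0-indexed here
L_D = discriminator_loss(logits_r, y_r, logits_f, 7)
```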
(6) the generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators in the small-sample generator module; the loss function of the i-th generator is:
L_Gi = −E_{z∼pz}[log p(y = m+i | G_i(z))]
i.e. the i-th generator is optimized so that its samples are classified by the discriminator into the i-th small-sample class (label m+i) rather than into the generated-data class.
(6.2) Update the network parameters by error back-propagation and optimize the generator models until the generated data can deceive the discriminator's authenticity judgment, i.e. until the discriminator loss function converges.
(6.3) Repeat steps (2.1)–(6.2) until every sample sequence in the dynamic table of the supplementary database is filled, completing the training of the confrontation-enhancement fault classifier.
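The per-generator loss of step (6.1), as reconstructed above, simply drives the discriminator's softmax toward the generator's own small-sample class. A sketch mirroring the discriminator-loss code (names illustrative):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def generator_loss(logits_fake, target_class):
    """L_Gi = -E_z[log p(y = m+i | G_i(z))] for the i-th generator."""
    p = softmax(logits_fake)
    return -np.log(p[:, target_class]).mean()

rng = np.random.default_rng(1)
logits = rng.normal(size=(8, 7))               # discriminator outputs on G_i(z)
L_Gi = generator_loss(logits, target_class=4)  # class m+i, e.g. index 4
```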
(7) When new data need to be classified, input them into the trained confrontation-enhancement fault classifier, ignore the probability of the (m+n+1)-th softmax item, obtain the posterior probability of each fault class, and assign the data to the class with the maximum posterior probability, realizing fault classification of the data.
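The inference rule of step (7) — drop the (m+n+1)-th softmax item, renormalize over the real classes, and take the argmax — can be sketched as:

```python
import numpy as np

def classify(probs):
    """probs: softmax output with the fake class in the last column.
    Ignore it, renormalize over the m+n real fault classes, and argmax."""
    real = probs[:, :-1]
    posterior = real / real.sum(axis=1, keepdims=True)
    return posterior.argmax(axis=1), posterior

probs = np.array([[0.05, 0.60, 0.10, 0.25],   # last column = fake class
                  [0.30, 0.10, 0.20, 0.40]])
labels, posterior = classify(probs)
```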
Compared with the prior art, the invention has the following beneficial effects. Through continuous iteration of the above process, the data produced by the generators gradually approach the real samples; the sample sequences of the dynamic table of the supplementary data set are updated with generated data, and each minority class eliminates the imbalance of the original data set through this supplementation. Meanwhile, data screening retains only the high-quality generated data, further improving the performance of the classifier. The proposed confrontation-enhancement classification method is an end-to-end model and a data augmentation method that exploits generated data more conveniently.
Drawings
FIG. 1 is a flow chart of the training of the confrontation-enhancement fault classifier for imbalanced industrial data;
FIG. 2 is a flow chart of the Tennessee Eastman (TE) process;
FIG. 3 is a comparison of the classification results of the confrontation-enhancement fault classifier and other oversampling methods.
Detailed Description
The present invention is further described in detail with reference to the accompanying drawings.
The confrontation-enhancement fault classifier adopted by the invention is structurally divided into four parts. The first part is the small-sample generator module. The second part is a data filter, which screens the generated data based on the Mahalanobis distance in the principal component space. The third part is the dynamic table of the supplementary database, which quantitatively stores the screened data and mixes them with the real data. The fourth part is the multi-class discriminator, formed by combining a multi-hidden-layer neural network with a softmax layer; the output of the last layer has m+n+1 items, where m is the number of large-sample classes and n is the number of small-sample classes.
A confrontation-enhancement fault classification method for imbalanced industrial data specifically comprises the following steps:
(1) Collect offline data from the historical industrial process as an original data set X, where X comprises m classes of large-sample data X_big and n classes of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, where y_i ∈ {1, 2, …, m+n} and i indexes the samples. The m large-sample classes each contain the same number of samples, while each small-sample class contains fewer samples than the large-sample classes. The original data set X is preprocessed by mapping the raw data through a linear function into the [0, 1] range, yielding the normalized training data set X̂.
The training of the confrontation-enhancement fault classifier is an adversarial game process and must be iterated cyclically. One iteration cycle can be divided into 5 stages; the specific flow is shown in FIG. 1:
(2) the data generation stage specifically comprises the following substeps:
(2.1) The small-sample generator module constructs n generators, one for each of the n small-sample classes of the imbalanced data. Each generator is a fully connected neural network with the same structure; its input is Gaussian noise z of dimension p, and its output is a generated feature vector of dimension q. The n generators are mutually independent and form a parallel multi-generator structure. Before the Gaussian noise z is input to the generators, the hyperparameters of each generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random function. The noise z is input to the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, …, G_n}.
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis (PCA) on the training data set X̂ to obtain a reference principal component matrix T_1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T_2. The number of principal components retained in T_1 is determined by accumulating the explained variance percentage up to 98%; T_2 retains the same number of principal components as T_1.
(3.2) Within the reference principal component matrix T_1, compute the Mahalanobis distance from each small-sample point to the centroid of its class and take the maximum, obtaining the farthest distance MD_1max for each class. The upper threshold for data screening is set to k·MD_1max, where k is a screening coefficient.
(3.3) For the corresponding principal component matrix T_2, compute the Mahalanobis distance MD_2 from each generated sample to the centroid of the corresponding small-sample class in T_1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data set X̂ and is a valid point G_valid; if MD_2 > k·MD_1max, the generated sample is considered to deviate from the training data set X̂ and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y; the outlier noise G_invalid is discarded.
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total. For each class, the length of its sample sequence plus the number of real samples of that small-sample class equals the number of real samples in each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, newly generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. Once the accumulated number reaches or exceeds the sequence length, the oldest generated data at the end of the sequence are evicted and the new generated data are written in, yielding an updated supplementary data set X′.
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a neural-network multi-class discriminator D(x) composed of multiple hidden layers and a softmax output layer. Its input is p-dimensional data x and its output covers m+n+1 sample class labels y, where the first m items are large-sample class labels, the next n items are small-sample class labels, and the (m+n+1)-th item is the label for generated (fake) data.
(5.2) Mix the training data set X̂ with the supplementary data set X′ and treat the mixture as real data x ∼ P_data; the generated data are denoted x ∼ P_G. Input x into the multi-class discriminator to obtain the softmax probability p(y|x) for each class.
(5.3) Construct the loss function of the classification discriminator:
L_D = −E_{x∼Pdata}[log p(y | x)] − E_{x∼PG}[log p(y = m+n+1 | x)]
where x ∼ P_G denotes samples x drawn from the n generators.
(5.4) Update the network parameters by error back-propagation and optimize the classification discriminator model until the discriminator loss function converges;
(6) the generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators in the small-sample generator module; the loss function of the i-th generator is:
L_Gi = −E_{z∼pz}[log p(y = m+i | G_i(z))]
i.e. the i-th generator is optimized so that its samples are classified by the discriminator into the i-th small-sample class (label m+i) rather than into the generated-data class.
(6.2) Update the network parameters by error back-propagation and optimize the generator models until the generated data can deceive the discriminator's authenticity judgment, i.e. until the discriminator loss function converges.
(6.3) Repeat steps (2.1)–(6.2) until every sample sequence in the dynamic table of the supplementary database is filled, completing the training of the confrontation-enhancement fault classifier. Through continuous iteration of this process, the data produced by the generators gradually approach the real samples, the sample sequences of the dynamic table are updated with the generated data, the imbalance of the data set is resolved by supplementing valid generated data, and the classification performance of the multi-class discriminator is improved through the adversarial training.
(7) When new data need to be classified, input them into the trained confrontation-enhancement fault classifier, ignore the probability of the (m+n+1)-th softmax item, obtain the posterior probability of each fault class, and assign the data to the class with the maximum posterior probability, realizing fault classification of the data.
Examples
The performance of the confrontation-enhancement fault classifier for imbalanced industrial data is illustrated below with a specific Tennessee Eastman (TE) process example. The TE process is a standard data set widely used in the field of fault diagnosis and fault classification; the whole data set contains 53 process variables, and the process flow is shown in FIG. 2. The process consists of 5 operating units, namely a gas-liquid separation column, a continuous stirred-tank reactor, a dephlegmator (partial condenser), a centrifugal compressor, and a reboiler. It can be described by a number of algebraic and differential equations, and its main characteristics are the nonlinearity and strong coupling of the process sensing data.
The TE process defines 21 fault types, comprising 16 known faults and 5 unknown faults; the fault types include step changes of flow, slow ramp increases, valve sticking, and so on, covering typical nonlinear and dynamic faults. In this embodiment, normal data and five fault states are selected for study; descriptions and sample counts of the different states are listed in Table 1.
Table 1: fault list of the present embodiment
Numbering Type (B) State description Number of
0 Is normal Is free of 1000
1 Step fault The A/C feed flow ratio was varied, the content of component B being kept constant (stream 4) 50
2 Step fault The content of component B was varied and the A/C feed flow ratio was constant (stream 4) 50
3 Step fault Loss of Material A (stream 1) 50
4 Random variable fault The temperature of the cooling water inlet of the condenser changes 20
5 Unknown fault Is unknown 20
In this example, a total of 16 variables were selected for analysis, as shown in table 2.
Table 2: variable list of the present embodiment
Numbering Measuring variable Numbering Measuring variable
1 A feed rate 9 Product separator temperature
2 D amount of feed 10 Humidity of product classifier
3 E amount of feed 11 Product separator bottoms flow
4 Total amount of feed 12 Pressure of stripper
5 Flow rate of recirculation 13 Stripper temperature
6 Reactor feed 14 Flow rate of gas stripper
7 Reactor temperature 15 Reactor cooling water outlet temperature
8 Discharge velocity 16 Separator cooling water outlet temperature
In this example, the number of generators in the small-sample generator module is 5. Each generator has 2 hidden layers with 32 and 64 nodes respectively, and is trained with the ADAM optimizer at a learning rate of 0.01. The multi-class discriminator has 2 hidden layers with 100 and 200 nodes respectively, and is trained with the SGD optimizer at a learning rate of 0.1. Training proceeds in mini-batches of size 60; each epoch traverses all samples, and 100 epochs are iterated.
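The embodiment's generator dimensions (2 hidden layers of 32 and 64 nodes, output matching the 16 selected variables) can be sketched as a plain forward pass. This is only an illustrative NumPy version under our own assumptions (ReLU activations, random initialization; ADAM training is omitted), not the patent's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 16, 16            # noise dim (assumed) and output dim (16 variables)

def init_layer(fan_in, fan_out):
    return rng.normal(0, 0.1, size=(fan_in, fan_out)), np.zeros(fan_out)

# Generator: p -> 32 -> 64 -> q, as in the embodiment
W1, b1 = init_layer(p, 32)
W2, b2 = init_layer(32, 64)
W3, b3 = init_layer(64, q)

def generator_forward(z):
    h1 = np.maximum(0, z @ W1 + b1)    # hidden layer, 32 nodes (ReLU assumed)
    h2 = np.maximum(0, h1 @ W2 + b2)   # hidden layer, 64 nodes (ReLU assumed)
    return h2 @ W3 + b3                # q-dimensional generated features

z = rng.normal(0.0, np.sqrt(0.1), size=(60, p))  # batch 60, variance 0.1
fake = generator_forward(z)
```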
100 samples from each state were selected as test data. FIG. 3 compares the classification results (classification accuracy) of the confrontation-enhancement discriminator with those of common data oversampling methods combined with a neural-network classifier. As the figure shows, the proposed method achieves higher classification accuracy than SMOTE-based oversampling combined with a BPNN classifier, demonstrating its superiority.

Claims (1)

1. A confrontation-enhancement fault classification method for imbalanced industrial data, characterized by comprising the following steps:
(1) Collect offline data from the historical industrial process as an original data set X, where X comprises m classes of large-sample data X_big and n classes of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, where y_i ∈ {1, 2, …, m+n} and i indexes the samples; the m large-sample classes each contain the same number of samples, while each small-sample class contains fewer samples than the large-sample classes; the original data set X is preprocessed by mapping the raw data through a linear function into the [0, 1] range, yielding the normalized training data set X̂;
(2) The data generation stage specifically comprises the following substeps:
(2.1) The small-sample generator module constructs n generators, one for each of the n small-sample classes of the imbalanced data; each generator is a fully connected neural network with the same structure, whose input is Gaussian noise z of dimension p and whose output is a generated feature vector of dimension q; the n generators are mutually independent and form a parallel multi-generator structure; before the Gaussian noise z is input to the generators, the hyperparameters of each generator network are initialized;
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random function; the noise z is input to the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, …, G_n};
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis (PCA) on the training data set X̂ to obtain a reference principal component matrix T_1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T_2; the number of principal components retained in T_1 is determined by accumulating the explained variance percentage up to 98%, and T_2 retains the same number of principal components as T_1;
(3.2) Within the reference principal component matrix T_1, compute the Mahalanobis distance from each small-sample point to the centroid of its class and take the maximum, obtaining the farthest distance MD_1max for each class; the upper threshold for data screening is set to k·MD_1max, where k is a screening coefficient;
(3.3) For the corresponding principal component matrix T_2, compute the Mahalanobis distance MD_2 from each generated sample to the centroid of the corresponding small-sample class in T_1, and compare MD_2 with k·MD_1max; if MD_2 < k·MD_1max, the generated sample is considered close to the training data set X̂ and is a valid point G_valid; if MD_2 > k·MD_1max, the generated sample is considered to deviate from the training data set X̂ and is outlier noise G_invalid;
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y; the outlier noise G_invalid is discarded;
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database; the dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total; for each class, the length of its sample sequence plus the number of real samples of that small-sample class equals the number of real samples in each large-sample class;
(4.2) While the accumulated number of generated samples is smaller than the sequence length, newly generated samples are continuously written into the dynamic table L of the supplementary database during the iterations; once the accumulated number reaches or exceeds the sequence length, the oldest generated data at the end of the sequence are evicted and the new generated data are written in, yielding an updated supplementary data set X′;
(5) The classifier training stage specifically comprises the following steps:

(5.1) Construct a neural network multi-classification discriminator D(x) combining multiple hidden layers with a softmax output layer; it takes p-dimensional data x as input and outputs m+n+1 sample class labels y, where m labels correspond to large-sample classes, n labels to small-sample classes, and the (m+n+1)-th label to generated pseudo data;

(5.2) Mix the training data set X with the supplementary data set X' and treat the mixture as real data x ~ Pdata; denote the generated data as x ~ PG; input x into the multi-classification discriminator to obtain the softmax probability p(y|x) for each class;

(5.3) Construct the loss function of the classification discriminator:

L_D = -E_{x~Pdata}[log p(y|x)] - E_{x~PG}[log p(y = m+n+1 | x)]

(5.4) Update the network parameters by error back-propagation and optimize the classification discriminator model until the discriminator's loss function converges;
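A hedged numerical sketch of the discriminator loss of step (5.3), following the standard semi-supervised GAN objective: real samples are pushed toward their true label, generated samples toward the extra (m+n+1)-th "fake" label. The function and argument names are illustrative; rows of `p_real`/`p_fake` are softmax outputs.

```python
import numpy as np

def discriminator_loss(p_real, y_real, p_fake, fake_idx):
    """L_D = -E_{x~Pdata}[log p(y|x)] - E_{x~PG}[log p(m+n+1|x)]."""
    eps = 1e-12  # numerical floor to avoid log(0)
    # cross-entropy on real (and supplementary) data against true labels
    loss_real = -np.mean(np.log(p_real[np.arange(len(y_real)), y_real] + eps))
    # generated data pushed to the (m+n+1)-th fake class
    loss_fake = -np.mean(np.log(p_fake[:, fake_idx] + eps))
    return loss_real + loss_fake
```

In practice this loss would be minimized by back-propagation through the discriminator network, as step (5.4) states.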
(6) The generator training stage specifically comprises the following steps:

(6.1) Construct a loss function for each of the n independent generators in the small-sample generator; the loss function of the i-th generator is:

L_{Gi} = -E_{z~Pz}[log p(y = m+i | Gi(z))]

(6.2) Update the network parameters by error back-propagation and optimize the generator model until the generated data can fool the discriminator's authenticity judgment, i.e. until the discriminator's loss function converges.
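The i-th generator's loss from step (6.1) can be sketched as the non-saturating cross-entropy toward its own small-sample class label; this form is an assumption consistent with the semi-supervised GAN setup above, and the names are illustrative.

```python
import numpy as np

def generator_loss(p_gen, target_label):
    """L_Gi: the generator wins when the discriminator assigns its
    samples the real class label m+i rather than the fake label."""
    eps = 1e-12  # numerical floor to avoid log(0)
    return -np.mean(np.log(p_gen[:, target_label] + eps))
```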
(6.3) Repeat steps (2.1)-(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled, completing the training of the confrontation-enhancement fault classifier.
(7) When new data need to be classified for faults, input them into the trained confrontation-enhancement fault classifier, ignore the probability of the (m+n+1)-th softmax output, obtain the posterior probability of each fault class, and assign the data to the class with the maximum posterior probability, thereby realizing fault classification.
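The inference rule of step (7) reduces to dropping the fake-class probability, renormalizing over the m+n fault classes, and taking the maximum a posteriori class; a minimal sketch (function name illustrative):

```python
import numpy as np

def classify(softmax_out):
    """Step (7) sketch: ignore the (m+n+1)-th generated-data probability,
    renormalize, and return the maximum-a-posteriori fault class."""
    p = np.asarray(softmax_out, dtype=float)
    posterior = p[:-1] / p[:-1].sum()   # drop the fake label, renormalize
    return int(np.argmax(posterior)), posterior
```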
CN201911369696.4A 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data Active CN111240279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911369696.4A CN111240279B (en) 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911369696.4A CN111240279B (en) 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data

Publications (2)

Publication Number Publication Date
CN111240279A true CN111240279A (en) 2020-06-05
CN111240279B CN111240279B (en) 2021-04-06

Family

ID=70874084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911369696.4A Active CN111240279B (en) 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data

Country Status (1)

Country Link
CN (1) CN111240279B (en)


Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6882992B1 (en) * 1999-09-02 2005-04-19 Paul J. Werbos Neural networks for intelligent control
DE10349094A1 (en) * 2003-10-22 2005-05-25 Rieter Ingolstadt Spinnereimaschinenbau Ag Textile machine and method for improving the production process
US20080010531A1 (en) * 2006-06-12 2008-01-10 Mks Instruments, Inc. Classifying faults associated with a manufacturing process
JP4312930B2 (en) * 2000-06-09 2009-08-12 富士重工業株式会社 Automobile failure diagnosis device
CN102254177A (en) * 2011-04-22 2011-11-23 哈尔滨工程大学 Bearing fault detection method for unbalanced data SVM (support vector machine)
US20140365195A1 (en) * 2013-06-07 2014-12-11 Scientific Design Company, Inc. System and method for monitoring a process
US8983882B2 (en) * 2006-08-17 2015-03-17 The United States Of America As Represented By The Administrator Of The National Aeronautics Space Administration Autonomic and apoptopic systems in computing, robotics, and security
WO2016144586A1 (en) * 2015-03-11 2016-09-15 Siemens Industry, Inc. Prediction in building automation
CN107239789A (en) * 2017-05-09 2017-10-10 浙江大学 A kind of industrial Fault Classification of the unbalanced data based on k means
CN107657274A (en) * 2017-09-20 2018-02-02 浙江大学 A kind of y-bend SVM tree unbalanced data industry Fault Classifications based on k means
CN107884706A (en) * 2017-11-09 2018-04-06 合肥工业大学 The analog-circuit fault diagnosis method approached based on vector value canonical kernel function
CN108875771A (en) * 2018-03-30 2018-11-23 浙江大学 A kind of failure modes model and method being limited Boltzmann machine and Recognition with Recurrent Neural Network based on sparse Gauss Bernoulli Jacob
CN109062177A (en) * 2018-06-29 2018-12-21 无锡易通精密机械股份有限公司 A kind of Trouble Diagnostic Method of Machinery Equipment neural network based and system
CN109190665A (en) * 2018-07-30 2019-01-11 国网上海市电力公司 A kind of general image classification method and device based on semi-supervised generation confrontation network
CN109800895A (en) * 2019-01-18 2019-05-24 广东电网有限责任公司 A method of based on augmented reality in the early warning of metering automation pipeline stall and maintenance
CN109858352A (en) * 2018-12-26 2019-06-07 华中科技大学 A kind of method for diagnosing faults based on compressed sensing and the multiple dimensioned network of improvement
US20190187688A1 (en) * 2016-05-09 2019-06-20 Strong Force Iot Portfolio 2016, Llc Systems and methods for data collection and frequency analysis
CN109977094A (en) * 2019-01-30 2019-07-05 中南大学 A method of the semi-supervised learning for structural data
CN110059631A (en) * 2019-04-19 2019-07-26 中铁第一勘察设计院集团有限公司 The contactless monitoring defect identification method of contact net
CN110070060A (en) * 2019-04-26 2019-07-30 天津开发区精诺瀚海数据科技有限公司 A kind of method for diagnosing faults of bearing apparatus
CN110208660A (en) * 2019-06-05 2019-09-06 国网江苏省电力有限公司电力科学研究院 A kind of training method and device for power equipment shelf depreciation defect diagonsis
US20190354094A1 (en) * 2018-05-17 2019-11-21 National Cheng Kung University System and method that consider tool interaction effects for identifying root causes of yield loss
CN110567720A (en) * 2019-08-07 2019-12-13 东北电力大学 method for diagnosing depth confrontation of fault of fan bearing under unbalanced small sample scene

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIXIANG DUAN: "Support vector data description for machinery multi-fault classification with unbalanced datasets", 2016 IEEE International Conference on Prognostics and Health Management (ICPHM) *
ZHU ZHIBO: "Fault reconstruction and diagnosis of non-Gaussian processes based on support vector data description", China Doctoral Dissertations Full-text Database, Information Science and Technology *
GE ZHIQIANG: "Research on statistical monitoring methods for processes under load operating conditions", China Doctoral Dissertations Full-text Database, Information Science and Technology *
DENG WENKAI: "Research on imbalanced data classification and its application in wastewater treatment systems", China Master's Theses Full-text Database, Engineering Science and Technology I *
CHEN GECHENG: "Research on clustering-based classification methods for imbalanced industrial fault data", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328588A (en) * 2020-11-27 2021-02-05 哈尔滨工程大学 Industrial fault diagnosis unbalanced time sequence data expansion method
WO2022166325A1 (en) * 2021-02-05 2022-08-11 华为技术有限公司 Multi-label class equalization method and device
CN117932457A (en) * 2024-03-22 2024-04-26 南京信息工程大学 Model fingerprint identification method and system based on error classification
CN117932457B (en) * 2024-03-22 2024-05-28 南京信息工程大学 Model fingerprint identification method and system based on error classification

Also Published As

Publication number Publication date
CN111240279B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN111240279B (en) Confrontation enhancement fault classification method for industrial unbalanced data
CN108228716B (en) SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine
CN110033021B (en) Fault classification method based on one-dimensional multipath convolutional neural network
CN108875772B (en) Fault classification model and method based on stacked sparse Gaussian Bernoulli limited Boltzmann machine and reinforcement learning
US7362892B2 (en) Self-optimizing classifier
CN108875771B (en) Fault classification model and method based on sparse Gaussian Bernoulli limited Boltzmann machine and recurrent neural network
CN110287983A (en) Based on maximal correlation entropy deep neural network single classifier method for detecting abnormality
CN109781411A (en) A kind of combination improves the Method for Bearing Fault Diagnosis of sparse filter and KELM
CN102750286A (en) Novel decision tree classifier method for processing missing data
CN103914705A (en) Hyperspectral image classification and wave band selection method based on multi-target immune cloning
CN105760888A (en) Neighborhood rough set ensemble learning method based on attribute clustering
CN107239789A (en) A kind of industrial Fault Classification of the unbalanced data based on k means
CN111046961A (en) Fault classification method based on bidirectional long-and-short-term memory unit and capsule network
CN107392155A (en) The Manuscripted Characters Identification Method of sparse limited Boltzmann machine based on multiple-objection optimization
CN109164794B (en) Multivariable industrial process Fault Classification based on inclined F value SELM
CN114357870A (en) Metering equipment operation performance prediction analysis method based on local weighted partial least squares
CN107728476B (en) SVM-forest based method for extracting sensitive data from unbalanced data
Shen et al. A novel meta learning framework for feature selection using data synthesis and fuzzy similarity
CN113222046A (en) Feature alignment self-encoder fault classification method based on filtering strategy
CN116776245A (en) Three-phase inverter equipment fault diagnosis method based on machine learning
CN113609480B (en) Multipath learning intrusion detection method based on large-scale network flow
CN115017978A (en) Fault classification method based on weighted probability neural network
CN114997378A (en) Inductive graph neural network pruning method, system, device and storage medium
CN111488520B (en) Crop planting type recommendation information processing device, method and storage medium
El-Amin Detection of Hydrogen Leakage Using Different Machine Learning Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant