CN115859142A

CN115859142A - Small sample rolling bearing fault diagnosis method based on convolution transformer generation countermeasure network

Info

Publication number: CN115859142A
Application number: CN202211233344.8A
Authority: CN
Inventors: 高慧慧; 张潇然; 韩红桂; 高学金; 李方昱
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2022-10-10
Filing date: 2022-10-10
Publication date: 2023-03-28

Abstract

A small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network relates to the field of fault diagnosis of rolling bearings and other rotating equipment, and addresses the difficulty of achieving accurate fault diagnosis when operating data are scarce. First, signal data of a rolling bearing are acquired under actual operating conditions and standardized. Second, a generator and a discriminator with a convolution and transformer cross structure are constructed: transformer layers extract the global time-domain features of the time-series signal, and convolution layers then further extract its local time-domain features. Meanwhile, positional encodings are embedded into the time-series signal so that the model can fully learn the positional characteristics of the signal. Finally, high-quality time-series signal samples are generated to expand the original training samples, thereby improving fault diagnosis accuracy under small-sample conditions.

Description

Small sample rolling bearing fault diagnosis method based on convolution transformer generation countermeasure network

Technical Field

The invention relates to the field of fault diagnosis of rotating equipment such as rolling bearings, and in particular to a small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network.

Background

In recent years, deep neural networks have been successfully applied to rolling bearing fault diagnosis owing to their strong feature extraction capability. These methods assume that a large amount of valid data is available for training the fault diagnosis model. In actual engineering scenarios, however, operational safety concerns and complex, variable working conditions mean that the data acquisition system can often record only a small amount of operating data, which greatly degrades diagnostic performance. It is therefore critical to design an effective fault diagnosis method for conditions of scarce operating data.

Researchers have proposed various methods to deal with limited data in fault diagnosis. Data sampling is one of them: undersampling the majority classes and oversampling the minority classes to balance the data of each class is a common way to handle imbalanced sample sizes. Although data sampling works well when data are limited and samples are imbalanced, it can only reuse existing data information and cannot effectively map the original data distribution, so the data cannot be expanded enough to meet the demand of intelligent fault diagnosis methods for large amounts of data. Transfer learning addresses cross-domain problems by transferring knowledge acquired in a source domain to a target domain, and transfer-learning-based methods typically use model pre-training and fine-tuning to diagnose faults under limited data. The greatest limitation of this approach, however, is that it does not fundamentally solve the data deficiency problem; moreover, pre-training the original model still requires a large number of samples.

With the development of generative models, solving the sample scarcity problem through data generation has received considerable attention. Among these models, the generative adversarial network (GAN) is a mainstream generative model in artificial intelligence. A GAN can generate data whose distribution is similar to that of the raw data; it was originally applied in image processing and, owing to its powerful data generation capability, has since been successfully applied to rolling bearing fault diagnosis. Yang et al. developed a fused diagnostic model, CGAN-2D-CNN, which converts vibration signals into two-dimensional grayscale images and uses a CGAN and a 2D-CNN to expand and classify the image data for small-sample bearing fault diagnosis. Liang et al. extracted time-frequency image features from the one-dimensional raw time-domain signal by wavelet transform and used a GAN to generate a large number of time-frequency image samples. However, converting a one-dimensional time-series signal into a two-dimensional image does not represent the vibration information carried by the signal well, so the generated samples are of poor quality and the final diagnostic performance suffers. As GANs have matured for time-series generation, methods that use a GAN to expand rolling bearing vibration signals directly have also progressed rapidly. Guo et al. proposed a fault diagnosis framework called the multi-label all-1D generative adversarial network (ML 1D-GAN), which can directly generate one-dimensional vibration signal data. Sonal Dixit et al. proposed a fault diagnosis model based on a novel one-dimensional conditional auxiliary classifier generative adversarial network to generate bearing signal samples directly. Zhang et al. developed a small-sample intelligent fault diagnosis method based on a multi-module gradient penalty generative adversarial network (MGPGAN) to generate mechanical fault signals with high similarity.

However, each of the above solutions has shortcomings. 1) GANs built on fully connected layers have insufficient feature extraction capability, and their parameter count becomes too large when processing long bearing signal sequences. 2) GANs built on one-dimensional convolution have strong local feature extraction capability but severely lack global feature extraction capability, so they cannot effectively model long-sequence signals. 3) Current GANs do not take into account the relative or absolute position information of the entire original vibration signal sequence when generating bearing signal samples, which degrades the quality of the generated signals.

Disclosure of Invention

The invention aims to solve the problem that diagnostic accuracy degrades when rolling bearing operating data are scarce, and provides a small-sample rolling bearing fault diagnosis method based on a Convolutional Transformer Generative Adversarial Network (CoT-GAN). To enable the model to better extract both global and local features of the vibration signal, a generator and a discriminator with a transformer and convolution cross structure are designed. The transformer is well suited to long sequences and has strong global feature extraction capability, so it can effectively model vibration signals globally. In addition, adding positional encoding to the vibration signal sequence enables the model to learn the relative and absolute position information of the signal, thereby preserving its inherent vibration characteristics. On this basis, convolutional layers further strengthen the model's ability to learn local signal features. Starting from the characteristics of vibration signals, the method fully considers their time-series nature, combines the respective advantages of transformers and convolution to model bearing vibration signals both locally and globally, and makes full use of the position information the signals carry. Finally, sufficient vibration signal samples are generated, effectively improving fault diagnosis performance.

In order to realize the purpose, the invention adopts the following technical scheme:

A small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network, characterized by comprising the following steps:

(1) Firstly, historical operating data of the rolling bearing is acquired, data standardization processing is carried out, and then signal samples after data standardization are divided into training samples and testing samples.

(2) A generative adversarial network (CoT-GAN) with a convolution and transformer cross structure is constructed. The generator maps random noise to generated signals whose distribution is similar to that of the real signals, and the discriminator performs real/fake discrimination and class discrimination on the generated and real signals. The generator and discriminator are trained alternately as a zero-sum game, improving the model until a Nash equilibrium is reached, and signal samples are finally generated. The generated signal samples are added to the original training samples to form an enhanced data set, which is used to train a fault classifier;

(3) The fault classifier trained in step (2) is used to identify and classify the faults in the test samples, completing the final fault diagnosis task.

The small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network is characterized in that the specific process of step (1) is as follows:

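1) Obtain historical data of the rolling bearing under actual operating conditions, $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{n \times m}$, where n represents the number of samples (the total number of samples collected) and m represents the sample dimension. Calculate the mean $\bar{x}$ and standard deviation $\sigma$ of the historical data X, and normalize X to obtain $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n]$:

$$\tilde{x}_i = \frac{x_i - \bar{x}}{\sigma}, \quad i = 1, 2, \ldots, n \qquad (1)$$

2) Divide the normalized data $\tilde{X}$ into a training sample set $X_{train} = [\tilde{x}_1, \ldots, \tilde{x}_p]$ and a test sample set $X_{test} = [\tilde{x}_{p+1}, \ldots, \tilde{x}_{p+q}]$, where p + q = n;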
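The standardization of equation (1) and the split into p training and q test samples can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the mean and standard deviation are taken as scalars over all historical data, the split is random, and the function name `standardize_and_split` is hypothetical.

```python
import numpy as np

def standardize_and_split(X: np.ndarray, p: int, seed: int = 0):
    """Equation (1) followed by the train/test split of step (1).

    X: array of shape (n, m), one signal sample per row (assumed layout).
    p: number of training samples; the remaining q = n - p are test samples.
    """
    mean = X.mean()            # scalar mean over all historical data (assumption)
    std = X.std()              # scalar standard deviation sigma (assumption)
    X_norm = (X - mean) / std  # x~_i = (x_i - mean) / sigma, i = 1..n

    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_norm))   # random split (assumption)
    return X_norm[idx[:p]], X_norm[idx[p:]]
```

For example, `standardize_and_split(X, p=8)` on a (100, 1024) array returns 8 training rows and 92 test rows whose pooled values have zero mean and unit standard deviation.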

The small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network is characterized in that signals are generated by the generative adversarial network with a convolution and transformer cross structure, and the specific process of step (2) is as follows:

1) Set random noise z and embed the corresponding fault class label c into it to obtain random noise containing the fault class label, Z = [z, c]. Specifically,

first, obtain random noise following the standard normal distribution (mean 0, variance 1), $z = [z_1, z_2, \ldots, z_k] \in \mathbb{R}^{k \times l}$, where k represents the number of random noise vectors and l represents their dimensionality;

second, embed the corresponding fault class label $c_i$, $i \in \{1, 2, 3, 4\}$, into the random noise $z = [z_1, z_2, \ldots, z_k]$ to obtain the random noise Z containing the fault class label.
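The text does not specify how the label c is embedded into the noise. One common choice, assumed here purely for illustration, is to concatenate a one-hot label vector to each noise vector. The function `noise_with_labels` is hypothetical, and labels are 0-based (0 to 3) rather than the 1-based $c_i$, $i \in \{1, 2, 3, 4\}$, used above.

```python
import numpy as np

NUM_CLASSES = 4  # healthy, outer ring fault, inner ring fault, ball fault

def noise_with_labels(k: int, l: int, labels, seed: int = 0) -> np.ndarray:
    """Build Z = [z, c]: k standard-normal noise vectors of dimension l, each
    concatenated with a one-hot fault label (one-hot concatenation is an
    assumption; the text only says the label is embedded into the noise)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((k, l))          # z ~ N(0, 1), shape (k, l)
    c = np.eye(NUM_CLASSES)[np.asarray(labels)]  # one-hot labels, shape (k, 4)
    return np.concatenate([z, c], axis=1)    # Z, shape (k, l + 4)
```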

2) So that the signals produced by the generator can be processed conveniently by the subsequent discriminator, and so that the transformer module can process the input vectors, the input random noise Z is reshaped and converted into fixed-size patches. Specifically,

first, the dimension of the input random noise is changed to a fixed value L, so that the discriminator can subsequently process the generated signals;

second, the random noise $Z = [Z_1, Z_2, \ldots, Z_k]$ is converted into a number of fixed-size patches by one-dimensional convolutional embedding. Specifically, the random noise is partitioned into N patches of dimension M, $Z_p = [Z_{p,1}, Z_{p,2}, \ldots, Z_{p,N}]$, $Z_{p,j} \in \mathbb{R}^{M}$, where M represents the patch size, N = L/M represents the number of patches, and $j \in \{1, 2, \ldots, N\}$.

To reduce the computational load, the embedding module is built from a one-dimensional convolution, exploiting the weight sharing and good local feature extraction of convolutional neural networks. The convolution kernel size is set to M × 1 and the stride to M, so the kernel processes the random noise in a non-overlapping manner and finally yields N patches of dimension M. Specifically, a learnable embedding matrix $E \in \mathbb{R}^{M \times D_{model}}$ projects each patch by convolution into a vector of the model dimension $D_{model}$. The one-dimensional convolution operation is:

$$u_j = \sum_{i \in M_j} v_i * k_{ij} + b_j \qquad (2)$$

where $v_i$ and $u_j$ are the input of the i-th channel and the output of the j-th channel, respectively, $k_{ij}$ is the convolution kernel, $b_j$ is the bias, * denotes the convolution operation, and $M_j$ is the set of input channels used to compute the j-th output channel.
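Because the kernel size equals the stride M, the convolutional embedding never lets two windows overlap, so it is equivalent to cutting the sequence into N = L/M patches and multiplying each patch by the embedding matrix E. The NumPy sketch below shows this equivalence for a single input channel; the sizes (L = 1024, M = 32, D_model = 64) and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def conv_patch_embed(x: np.ndarray, E: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1-D convolution with kernel size M and stride M, one input channel and
    D_model output channels, written as an explicit sliding-window loop.
    x: (L,), E: (M, D_model) kernel / embedding matrix, b: (D_model,) bias."""
    M, D = E.shape
    N = len(x) // M                     # number of patches, N = L / M
    out = np.empty((N, D))
    for j in range(N):                  # stride M: windows never overlap
        window = x[j * M:(j + 1) * M]
        out[j] = window @ E + b         # sum over kernel taps, plus bias
    return out

def reshape_patch_embed(x: np.ndarray, E: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Same result: split into N patches of size M, then project each by E."""
    M = E.shape[0]
    return x.reshape(-1, M) @ E + b
```

The reshape form is what makes this embedding cheap: it is a single matrix product over all patches.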

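The fixed-size patches are then tagged with position information so that the generated signal carries position information more similar to that of the real signal, thereby improving the quality of the generated samples. Specifically, a position information matrix $E_{pos} \in \mathbb{R}^{N \times D_{model}}$ of dimension $D_{model}$ is encoded and added to the patches, giving the patches with position information:

$$T_Z = [T_{Z,1}, T_{Z,2}, \ldots, T_{Z,N}], \quad T_{Z,j} = Z_{p,j} E + E_{pos,j} \qquad (3)$$

where E is the learnable embedding matrix of the patch embedding and $E_{pos,j}$ is the j-th row of $E_{pos}$.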
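The text does not say whether the position information matrix $E_{pos}$ is learned or fixed. As one possibility, the sketch below assumes the fixed sinusoidal encoding of the original transformer (Vaswani et al.) and adds it to the patch embeddings as in equation (3). Both function names are hypothetical, and $D_{model}$ is assumed to be even.

```python
import numpy as np

def sinusoidal_positions(N: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal position matrix E_pos of shape (N, d_model)."""
    pos = np.arange(N)[:, None]                  # patch index 0..N-1
    i = np.arange(0, d_model, 2)[None, :]        # even feature indices
    angle = pos / np.power(10000.0, i / d_model)
    E_pos = np.zeros((N, d_model))
    E_pos[:, 0::2] = np.sin(angle)               # even columns: sine
    E_pos[:, 1::2] = np.cos(angle)               # odd columns: cosine
    return E_pos

def add_positions(patch_embeddings: np.ndarray) -> np.ndarray:
    """Equation (3): T_Z = patch embeddings + E_pos."""
    N, d_model = patch_embeddings.shape
    return patch_embeddings + sinusoidal_positions(N, d_model)
```

Every row of $E_{pos}$ is distinct, so identical patch contents at different positions receive different representations. A learned $E_{pos}$ would simply replace `sinusoidal_positions` with a trainable (N, D_model) array.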

Finally, the patch sequence carrying position information, $T_Z = [T_{Z,1}, T_{Z,2}, \ldots, T_{Z,N}]$, is fed into the transformer module and passes successively through the generator's convolution and transformer cross structure to produce the generated signal samples $X_g = [x_{g,1}, x_{g,2}, \ldots, x_{g,l}]$, where l represents the number of generated samples. Specifically,

the patches carrying position information are fed into the transformer module to extract global features of the input.

By means of its multi-head attention mechanism, the transformer module dynamically captures the feature information of the input vectors, allowing the generator to grasp global feature information to a great extent. Self-attention updates each component of the sequence by aggregating global context information from the complete input sequence. Self-attention can be expressed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \qquad (4)$$

where $d_k$ is the dimension of the key vectors into which the signal is projected, and Q, K and V are the matrices of query, key and value vectors, respectively.
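Equation (4) can be sketched directly in NumPy. This is a generic scaled dot-product attention, not code from the invention; the names `softmax` and `attention` are illustrative. The second check in the usage below shows the "aggregate global context" property: with all-zero keys every patch attends uniformly, so each output row is the average of all value rows.

```python
import numpy as np

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Equation (4): softmax(Q K^T / sqrt(d_k)) V.
    Q, K: (N, d_k); V: (N, d_v). Each output row is a weighted average of all
    value rows, i.e. every patch aggregates context from the whole sequence."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (N, N), rows sum to 1
    return weights @ V
```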

Multi-head attention is a mechanism consisting of multiple self-attention operations that can capture multiple complex relationships between different elements of a sequence. Assuming h self-attention heads, multi-head attention transforms a given input into three different groups of vectors, each group containing h vectors of dimension D/h. The vectors derived from different inputs are then packed into different matrices $\{Q_i\}_{i=1}^{h}$, $\{K_i\}_{i=1}^{h}$ and $\{V_i\}_{i=1}^{h}$. Multi-head attention can therefore be expressed as:

$$\mathrm{MultiHead}(Q', K', V') = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) \qquad (5)$$

where Q', K' and V' are the concatenations of $\{Q_i\}_{i=1}^{h}$, $\{K_i\}_{i=1}^{h}$ and $\{V_i\}_{i=1}^{h}$, respectively, and $W^{O} \in \mathbb{R}^{D \times D}$ is a linear projection matrix.
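A minimal NumPy sketch of equation (5) follows, assuming the projections are full (D, D) weight matrices whose output columns are split evenly among the h heads; the weight names `Wq`, `Wk`, `Wv`, `Wo` are hypothetical. Head i works on columns i·D/h to (i+1)·D/h − 1 of Q', K' and V', and the concatenated heads are projected by $W^{O}$.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, h):
    """Equation (5). x: (N, D); Wq, Wk, Wv, Wo: (D, D); D must be divisible by h."""
    N, D = x.shape
    d = D // h                                         # per-head width D/h

    def split(t):
        # (N, D) -> (h, N, d): head i gets columns i*d .. (i+1)*d - 1
        return t.reshape(N, h, d).transpose(1, 0, 2)

    Qh, Kh, Vh = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d)  # (h, N, N)
    heads = softmax(scores, axis=-1) @ Vh              # (h, N, d)
    concat = heads.transpose(1, 0, 2).reshape(N, D)    # Concat(head_1, ..., head_h)
    return concat @ Wo                                 # final projection W^O
```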

The transformer module applies layer normalization (LN) before the multi-head attention operation and then strengthens the information flow with a residual connection to achieve higher performance:

$$x' = x + \mathrm{MultiHead}(\mathrm{LN}(x)) \qquad (6)$$

where x is the input vector of the transformer module.

After these steps, the final output of the transformer module is produced by a multilayer perceptron (MLP), as follows:

$$x'' = x' + \mathrm{MLP}(\mathrm{LN}(x')) \qquad (7)$$
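The pre-normalization block of equations (6) and (7) can be sketched as below. Assumptions, not stated in the text: LN has no learnable scale or shift, the MLP has two layers with a GELU (tanh approximation) activation, and `attn` is any callable standing in for the multi-head attention of equation (5). All function names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LN over the feature axis (learnable scale/shift omitted for brevity)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with GELU (tanh approximation); GELU is an assumption."""
    hdn = x @ W1 + b1
    hdn = 0.5 * hdn * (1 + np.tanh(np.sqrt(2 / np.pi) * (hdn + 0.044715 * hdn**3)))
    return hdn @ W2 + b2

def transformer_block(x, attn, mlp_params):
    """Pre-LN block:  x'  = x  + attn(LN(x))     (6)
                      x'' = x' + MLP(LN(x'))     (7)
    attn maps an (N, D) array to an (N, D) array."""
    x1 = x + attn(layer_norm(x))
    return x1 + mlp(layer_norm(x1), *mlp_params)
```

Because both sub-layers are residual, a block whose attention and MLP output zero passes its input through unchanged, which is one reason this arrangement trains stably when several blocks are stacked.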

After processing by the transformer module, the output is fed into a deconvolution layer to capture its local characteristics. The feature vector output by the deconvolution layer is then input again to the next module with the transformer and deconvolution cross structure. The generator contains 4 such transformer and deconvolution cross modules in total; after the input vector passes through the last deconvolution layer, a generated sample with the same dimension as the real signal is produced.
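The text does not give the kernel size or stride of the deconvolution layers. The sketch below only illustrates how a single-channel 1-D transposed convolution ("deconvolution") without padding lengthens a sequence: each input value scatters a scaled copy of the kernel into the output, giving a length of (len(x) − 1) · stride + len(k). With a hypothetical kernel of 4 and stride 4, a 256-point input becomes 1024 points, the length of a CWRU sample used later.

```python
import numpy as np

def conv_transpose_1d(x: np.ndarray, k: np.ndarray, stride: int) -> np.ndarray:
    """Single-channel 1-D transposed convolution, no padding.
    Output length is (len(x) - 1) * stride + len(k)."""
    out = np.zeros((len(x) - 1) * stride + len(k))
    for i, xi in enumerate(x):
        out[i * stride:i * stride + len(k)] += xi * k   # scatter scaled kernel
    return out
```

Equivalently, it is an ordinary full convolution of the kernel with the input after inserting stride − 1 zeros between samples.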

The signals generated by the generator, $X_g$, and the real signals, $X_{train}$, are mixed and fed to the discriminator. Specifically,

first, the generated and real signals input to the discriminator are converted into a number of fixed-size patches by one-dimensional convolutional embedding, in the same way as in step 2) above: a one-dimensional convolutional network processes the discriminator input in a non-overlapping manner to obtain fixed-size patches.

Second, a corresponding position encoding is added to each patch so that the discriminator pays more attention to the relative and absolute position information of the signal when learning its features, which in turn helps the generator produce better signals.

Then, the patches carrying position information are fed into the subsequent transformer module and pass successively through the discriminator's convolution and transformer cross structure. Specifically,

the vector output by the transformer module is passed to a convolutional layer to extract the local features of the input vector. The output vector of the convolutional layer is then fed into the next transformer module to obtain global features. The input vector is processed in the network in the same way as in the generator, and the discriminator contains 4 convolution and transformer cross modules in total.

Finally, the feature vector output by the last convolutional layer is reshaped into several vectors of size 1 × 1024, on which binary and multi-class discrimination are performed with the Sigmoid and Softmax activation functions, respectively:

the output vectors of dimension 1 × 1024 are passed through a binary-classification fully connected layer and a multi-class fully connected layer, giving outputs of dimension 1 and of dimension equal to the number of fault classes, respectively. These two outputs are then fed into the Sigmoid and Softmax activation functions for real/fake discrimination and class discrimination, respectively. The Sigmoid activation function is:

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}} \qquad (8)$$

where x represents the input to the Sigmoid activation function.

The Softmax activation function is:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{K} e^{z_k}} \qquad (9)$$

where z represents the input vector, $z_k$ and $z_i$ represent its k-th and i-th elements, and K represents the number of classes in the multi-class classification.
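The two output heads and the activations of equations (8) and (9) can be sketched as follows. The weight names and shapes are hypothetical (a 1024-dimensional feature, one real/fake unit and K = 4 class units). The Softmax subtracts the maximum input before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import numpy as np

def sigmoid(x):
    """Equation (8)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    """Equation (9); subtracting max(z) does not change the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def discriminator_heads(feature, W_bin, b_bin, W_cls, b_cls):
    """Two fully connected heads on a 1 x 1024 feature vector:
    a 1-unit real/fake head and a K-unit class head."""
    p_real = sigmoid(feature @ W_bin + b_bin)    # shape (1,), in (0, 1)
    p_class = softmax(feature @ W_cls + b_cls)   # shape (K,), sums to 1
    return p_real, p_class
```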

3) Finally, the generated signal samples are added to the original training samples to form an enhanced data set, which is used to train the fault classifier.

A small sample rolling bearing fault diagnosis method based on a convolution transformer generation countermeasure network is characterized in that in step (2), the specific calculation process is as follows:

1) The generator and discriminator are trained alternately as a zero-sum game until a Nash equilibrium is reached. The objective function of CoT-GAN is expressed as:

$$V(D, G) = \mathbb{E}_{s \sim P_{data}}[\log D(s)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] + \mathbb{E}_{s \sim P_{data}}[\log P(Y = y \mid S_{real})]$$

where $P_{data}$ is the real data distribution, $P_g$ is the distribution of the generated samples G(z), D(s) represents the probability that s comes from the real data, and D(G(z)) represents the probability that a generated sample is judged to come from the real data. $\mathbb{E}_{s \sim P_{data}}$ denotes the expectation over the real data distribution, and $\mathbb{E}_{z \sim P_z}$ denotes the expectation over the data generated from noise. $P(Y = y \mid S_{real})$ represents the conditional probability distribution over class labels. The optimization of the generator and discriminator is a two-player minimax problem, which can be formalized as:

$$\min_G \max_D V(D, G)$$
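On a mini-batch, the expectations in the objective above are replaced by batch means. The NumPy sketch below estimates V(D, G) from discriminator outputs and computes the generator's term, assuming the objective has the adversarial-plus-class form written above. The function names are hypothetical, and a small constant guards against log(0). A perfect discriminator attains the maximum value of 0. A discriminator that outputs 0.5 everywhere and uniform class probabilities over 4 classes scores log(1/16).

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

def discriminator_objective(d_real, d_fake, p_cls_real, y_real):
    """Batch estimate of V(D, G), which D maximizes.
    d_real, d_fake: Sigmoid outputs D(s) and D(G(z)), shape (B,)
    p_cls_real: Softmax class probabilities for the real batch, shape (B, K)
    y_real: integer class labels of the real batch, shape (B,)"""
    adv = np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS))
    cls = np.mean(np.log(p_cls_real[np.arange(len(y_real)), y_real] + EPS))
    return adv + cls

def generator_loss(d_fake):
    """G minimizes the batch estimate of E[log(1 - D(G(z)))]."""
    return np.mean(np.log(1.0 - d_fake + EPS))
```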

2) The fault classifier is trained on the enhanced data set so that it attains better generalization capability. The objective function of the fault classifier is expressed as follows:

The CoT-GAN network structure is as follows. CoT-GAN consists of a generator and a discriminator with a convolution and transformer cross structure, which can effectively model both the global and local characteristics of vibration signals and fully consider the relative and absolute position information contained in the signals, so as to generate sufficient signal data. The generator consists of L deconvolution and transformer cross modules. Its input consists of random noise and existing fault class labels; the data points are converted into patches by one-dimensional convolutional embedding, position information is embedded, the result passes through the L transformer and deconvolution cross layers, and a generated signal with the same dimension as the real signal is finally output. The discriminator consists of L convolution and transformer cross modules. Its input consists of generated and real signals, which are converted into patches by one-dimensional convolution and embedded with position information; the patches pass through the L transformer and convolution cross layers, and the output layer of the discriminator finally gives the probabilities for binary and multi-class classification.

The outputs of the transformer modules of the generator and the discriminator, $x_l^{G}$ and $x_l^{D}$, are expressed as follows, for l = 1, 2, ..., L:

$$x_l^{G} = f_{G,l}(x_{l-1}^{G}), \qquad x_l^{D} = f_{D,l}(x_{l-1}^{D})$$

where $x_{l-1}^{G}$ represents the output vector of the (l-1)-th transformer module in the generator and $x_l^{G}$ represents the output vector of the l-th transformer module in the generator. $f_{G,l}(\cdot)$ represents the l-th set of transformer module and deconvolution operation in the generator. When l = 1, $x_0^{G} = T_Z$, i.e., the fixed-size patches carrying position encoding information; when l = L, $x_L^{G}$ is the output vector of the generator. Likewise, $x_{l-1}^{D}$ represents the output vector of the (l-1)-th transformer module in the discriminator and $x_l^{D}$ represents the output vector of the l-th transformer module in the discriminator. $f_{D,l}(\cdot)$ represents the l-th set of transformer module and convolution operation in the discriminator. When l = 1, $x_0^{D}$ is the position-encoded patch sequence of the discriminator input; when l = L, $x_L^{D}$ is the output vector of the discriminator. More specifically, the generator consisting of L cross transformer and deconvolution modules can be expressed as:

$$G(Z) = x_L^{G} = f_{G,L}\left(f_{G,L-1}\left(\cdots f_{G,1}(T_Z)\right)\right)$$
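The recursion $x_l = f_l(x_{l-1})$ is simply a left-to-right composition of the L cross modules. The sketch below expresses it with `functools.reduce`. The modules passed in are hypothetical stand-ins (for example, a module that doubles the sequence length in place of a transformer plus deconvolution stage), not the actual layers.

```python
from functools import reduce
from typing import Callable, Sequence
import numpy as np

def stack_modules(modules: Sequence[Callable[[np.ndarray], np.ndarray]],
                  x0: np.ndarray) -> np.ndarray:
    """Apply x_l = f_l(x_{l-1}) for l = 1..L and return x_L.
    x0 plays the role of T_Z (generator) or the embedded input (discriminator)."""
    return reduce(lambda x, f: f(x), modules, x0)
```

For instance, four stand-in modules that each double the length turn a 64-point input into a 1024-point output, mirroring how the generator's four stages end at the length of a real sample.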

Advantageous Effects

The invention designs a generative adversarial network with a transformer and convolution cross structure that exploits the respective strengths of the two: transformer layers extract the global characteristics of the time-series signal and convolutional layers extract its local characteristics, so that the model can fully capture the time-domain characteristics of the vibration. In addition, positional encoding is embedded into the vibration signal so that the model can fully learn the relative and absolute position information of the signal, strengthening the inherent time-series characteristics of the generated signal. Sufficient signal data are thus generated, effectively improving fault diagnosis performance under small-sample conditions. The method fully considers the characteristics of time-series signals during sample generation and models them both globally and locally; it offers strong feature representation, strong specificity and high diagnostic accuracy, and is of significant value for rolling bearing fault diagnosis.

Drawings

FIG. 1 is a flow chart of the CoT-GAN method of the present invention;

FIG. 2 is a schematic diagram of a generator;

FIG. 3 is a schematic diagram of the discriminator;

FIG. 4 is a schematic view of the Case Western Reserve University (CWRU) bearing test stand;

FIG. 5 shows signals generated by the present invention for the CWRU bearing data set;

FIG. 6 shows the effect of varying the number of training samples on the diagnostic performance of the model;

FIG. 7 shows the diagnostic results of the model with 1 training sample per class;

FIG. 8 shows the diagnostic results of the model with 2 training samples per class;

FIG. 9 shows the diagnostic results of the model with 4 training samples per class;

FIG. 10 shows the diagnostic results of the model with 8 training samples per class;

FIG. 11 shows the diagnostic results of the model with 16 training samples per class;

FIG. 12 shows the diagnostic results of the model with 32 training samples per class;

Detailed Description

Addressing the shortcomings of the prior art, the invention provides a small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network, which can effectively generate time-series signal samples to expand the original training sample set and thereby improve rolling bearing fault diagnosis accuracy under small-sample conditions.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art based on the embodiments of the present invention without inventive step, are within the scope of the present invention.

Referring to FIG. 1, the invention provides a small-sample rolling bearing fault diagnosis method based on a Convolutional Transformer Generative Adversarial Network (CoT-GAN), which overcomes the difficulty of achieving accurate fault diagnosis when operating data are scarce. First, signal data of a rolling bearing are acquired under actual operating conditions and standardized. Next, a generator and a discriminator with a convolution and transformer cross structure are constructed, as shown in FIG. 2 and FIG. 3, respectively; this structure can effectively extract both the local and global time-domain characteristics of the time-series signal. Meanwhile, positional encodings are embedded into the time-series signals so that the model can fully learn the positional characteristics of the signals. Finally, sufficient time-series signal samples are generated to maintain fault diagnosis accuracy under small-sample conditions.

The Case Western Reserve University (CWRU) public bearing data set is widely used to verify the performance of fault diagnosis methods. FIG. 4 shows the CWRU bearing test stand, which consists of a 2 hp motor, a torque sensor, a dynamometer and control electronics. Single-point faults were introduced on the inner ring, outer ring and ball of the bearing by electro-discharge machining, with fault diameters of 0.007, 0.014 and 0.021 inches. Accelerometers collected vibration signals under loads of 0 to 3 hp, and the signals were recorded at sampling frequencies of 12 kHz and 48 kHz with a 16-channel DAT recorder. In this experiment, vibration data collected from the drive-end bearing with a fault diameter of 0.021 inches, a load of 0 hp and a sampling frequency of 12 kHz are used for analysis. Four bearing health conditions are selected for classification: healthy, outer ring fault, inner ring fault and ball fault. Each class contains 100 samples, each containing 1024 data points. For training, 1 to 32 samples per class are randomly selected, and the remaining samples are used as test data.
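The sample preparation described above (100 samples of 1024 points per class, with a few samples per class drawn at random for training) can be sketched as below. Two points are assumptions not stated in the text: each class record is cut into consecutive non-overlapping windows, and the draw uses NumPy's random generator with a fixed seed. The function names `segment` and `few_shot_split` are hypothetical.

```python
import numpy as np

def segment(signal: np.ndarray, n_samples: int = 100, length: int = 1024) -> np.ndarray:
    """Cut a long vibration record into n_samples non-overlapping windows of
    `length` points (non-overlapping segmentation is an assumption)."""
    needed = n_samples * length
    if len(signal) < needed:
        raise ValueError(f"need {needed} points, got {len(signal)}")
    return signal[:needed].reshape(n_samples, length)

def few_shot_split(samples_per_class, n_train: int, seed: int = 0):
    """samples_per_class: list of (100, 1024) arrays, one per health condition.
    Randomly draws n_train samples per class for training; the rest are test data."""
    rng = np.random.default_rng(seed)
    Xtr, ytr, Xte, yte = [], [], [], []
    for label, S in enumerate(samples_per_class):
        idx = rng.permutation(len(S))
        Xtr.append(S[idx[:n_train]])
        Xte.append(S[idx[n_train:]])
        ytr += [label] * n_train
        yte += [label] * (len(S) - n_train)
    return np.concatenate(Xtr), np.array(ytr), np.concatenate(Xte), np.array(yte)
```

With `n_train=8` and four classes, this yields 32 training samples and 368 test samples, the "4 × 8" setting in the notation of Table 1.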

For the hyper-parameters, CoT-GAN is optimized with the Adam optimizer. To make training more stable, a label smoothing strategy is adopted, with real labels set to 0.9 and fake labels set to 0.1. The batch size is 4, the learning rate of the discriminator is 0.0003, the learning rate of the generator is 0.0005, and the model is trained for 1000 iterations in total.
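The settings above can be collected into a short configuration, together with a smoothed binary cross-entropy and a plain Adam update, as sketched below. The values 0.9, 0.1, 4, 3e-4, 5e-4 and 1000 come from the text. The Adam moment coefficients and epsilon are the common defaults and are assumptions, because the text does not state them. The class and function names are hypothetical.

```python
import numpy as np

REAL_LABEL, FAKE_LABEL = 0.9, 0.1          # smoothed targets from the text
LR_D, LR_G, BATCH_SIZE, ITERATIONS = 3e-4, 5e-4, 4, 1000

def bce(p, target, eps=1e-12):
    """Binary cross-entropy of Sigmoid outputs p against (smoothed) targets."""
    return -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

class Adam:
    """Plain Adam update; beta1, beta2 and eps are assumed common defaults."""
    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad**2
        m_hat = self.m / (1 - self.b1**self.t)   # bias-corrected first moment
        v_hat = self.v / (1 - self.b2**self.t)   # bias-corrected second moment
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

opt_D, opt_G = Adam(LR_D), Adam(LR_G)      # separate optimizers for D and G
```

With the smoothed target 0.9, the cross-entropy is minimized when the discriminator outputs 0.9 rather than 1, so it is discouraged from becoming over-confident on real samples.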

Based on the above description, according to the invention, the specific process is implemented as follows:

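1) Standardize the experimental data $X = [x_1, x_2, \ldots, x_{100}]$, with each $x_i \in \mathbb{R}^{1 \times 1024}$: calculate the mean $\bar{x}$ and standard deviation $\sigma$ of X, and normalize X according to equation (1) to obtain $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_{100}]$;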

2) Divide the normalized data $\tilde{X}$ into training samples $X_{train}$ and test samples $X_{test}$;

3) Generate random noise following the standard normal distribution (mean 0, variance 1), $z = [z_1, z_2, \ldots, z_k]$, and embed the corresponding fault class label $c_i$, $i \in \{1, 2, 3, 4\}$, into it to obtain random noise Z containing the fault class label;

4) According to equation (2), input the random noise Z containing the fault class label to the one-dimensional convolutional embedding module, converting the data points into fixed-size patches $Z_p = [Z_{p,1}, Z_{p,2}, \ldots, Z_{p,N}]$;

5) According to equation (3), add position information to the patches to obtain the patches carrying position information, $T_Z = [T_{Z,1}, T_{Z,2}, \ldots, T_{Z,N}]$, and feed them into the network layers of the transformer and convolution cross structure;

6) Obtain the output vector of the transformer module according to equations (5), (6) and (7), and feed it into the deconvolution layer following the transformer to obtain the output vector $x_l^{G}$ of the transformer and deconvolution cross module; this operation can be expressed by equation (10);

7) According to equation (12), obtain the output vector of the final generator, $x_L^{G}$, i.e., the generated signal samples $X_g$;

8) Input the generated signals $X_g$ and the training samples $X_{train}$ into the discriminator to train it;

9) As in step 4), convert the generated and real signals into fixed-size patches by one-dimensional convolutional embedding;

10) As in step 5), add position encoding to each fixed-size patch and input the patches into the network layers of the transformer and convolution cross structure;

11) Obtain the output vector of the discriminator's last transformer and convolution cross module according to equations (15) and (17), and reshape it;

12) Feed the final output vector into the binary-classification and multi-class fully connected layers, respectively, and process their outputs with the activation functions of equations (8) and (9) to obtain the probability that the input is real data and the class probabilities;

13) Train the generator and discriminator alternately until a Nash equilibrium is reached, and generate signal samples $X_g$.

The generated signals are plotted in FIG. 5, where the upper part shows the original signals and the lower part shows the generated signals;

14) Add the generated signals to the original training samples to obtain the enhanced data set $X_{aug} = [x_1, x_2, \ldots, x_H]$, where H represents the total number of samples;

15) Use the enhanced data set $X_{aug}$ to train the fault classifier, and use the test data set $X_{test}$ for fault diagnosis. The diagnostic results obtained with the enhanced and original data sets are shown in Table 1. In the first column of Table 1, 4 indicates that there are four classes in total, and the number it is multiplied by represents the number of training samples per class. As Table 1 shows, training the fault classifier with the enhanced data gives a far better final diagnostic result than using only the original small-sample data set, and the diagnostic results improve as the number of samples used to train CoT-GAN and the number of generated samples increase. FIG. 6 shows how the diagnostic performance of the model varies with the number of training samples, with 10 generated samples per class. As FIG. 6 shows, as the number of training samples increases, CoT-GAN can effectively generate synthetic samples to train the classifier, effectively improving fault diagnosis accuracy under small-sample conditions. To further show the classification accuracy of each fault class under different numbers of training samples, confusion matrices are used to present the classification results for each class. As shown in FIGS. 7-12, the classification results for each class also improve markedly as the number of training samples increases.
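Step 14, building the enhanced data set, amounts to stacking the original training samples and the generated samples, as sketched below. The shuffle before classifier training is an assumption, as is the function name `build_enhanced_set`. In the setting used for FIG. 6 (10 generated samples per class), 2 real samples per class over 4 classes would give H = 4 × (2 + 10) = 48.

```python
import numpy as np

def build_enhanced_set(X_train, y_train, X_gen, y_gen, seed=0):
    """Step 14: append generated samples to the original training samples.
    Returns X_aug of shape (H, 1024) with H = len(X_train) + len(X_gen),
    shuffled together with its labels (shuffling is an assumption)."""
    X_aug = np.concatenate([X_train, X_gen], axis=0)
    y_aug = np.concatenate([y_train, y_gen], axis=0)
    perm = np.random.default_rng(seed).permutation(len(X_aug))
    return X_aug[perm], y_aug[perm]
```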

In summary, the method can effectively diagnose faults when only a small number of samples is available, and is therefore highly beneficial for small-sample fault diagnosis of rolling bearings.

Table 1 Comparison of diagnostic accuracy (%) using the enhanced and original data sets


Claims

1. A small-sample rolling bearing fault diagnosis method based on a convolutional transformer generative adversarial network, characterized by comprising the following steps:

(1) Acquiring historical operating data of a rolling bearing, standardizing the data, and dividing the standardized signal samples into training samples and test samples;

(2) Constructing a generative adversarial network with a cross structure of convolution and transformer layers; using the generator to turn random noise into generated signals whose distribution resembles that of the real signals; using the discriminator to perform real/fake discrimination and category discrimination on the generated and real signals; training the generator and the discriminator alternately in a zero-sum game to improve the model until a Nash equilibrium is reached, and finally generating signal samples; and adding the generated signal samples to the original training samples as an enhanced data set to train a fault classifier.

2. The small-sample rolling bearing fault diagnosis method based on the convolutional transformer generative adversarial network according to claim 1, characterized in that step (1) specifically comprises:

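1) Obtaining historical data of the rolling bearing under actual operating conditions,

$$X = [x_1, x_2, \ldots, x_n]^{\mathrm{T}} \in \mathbb{R}^{n \times m},$$

where n denotes the number of samples, i.e., the total number of collected samples, and m denotes the sample dimension; calculating the mean $\bar{x}$ and the standard deviation σ of the historical data X, and obtaining the normalized data $\tilde{X}$ as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \tilde{x}_i = \frac{x_i - \bar{x}}{\sigma},$$

where i = 1, 2, ..., n;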
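A minimal sketch of the standardization step, assuming the mean and standard deviation are taken over the n samples separately for each of the m dimensions and that σ uses the population form (division by n). The document does not specify either choice, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(X: Matrix) -> Tuple[Matrix, List[float], List[float]]:
    """Z-score each column of the n x m matrix X: (x_i - mean) / sigma."""
    n, m = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(m)]
    sigma = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in X) / n) for j in range(m)]
    sigma = [s if s > 0 else 1.0 for s in sigma]  # avoid dividing by zero for constant columns
    X_norm = [[(row[j] - mean[j]) / sigma[j] for j in range(m)] for row in X]
    return X_norm, mean, sigma


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # n = 3 samples, m = 2 dimensions
X_norm, mean, sigma = standardize(X)
```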

2) Dividing the normalized data $\tilde{X}$ into a training sample set $X_{\mathrm{train}} \in \mathbb{R}^{p \times m}$ and a test sample set $X_{\mathrm{test}} \in \mathbb{R}^{q \times m}$, where p + q = n.
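The split into p training samples and q = n - p test samples can be sketched as below. Random shuffling with a fixed seed is an assumption; the document does not say how samples are assigned to the two sets.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def train_test_split(X: Matrix, p: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Shuffle the n samples and split them into p training and q = n - p test samples."""
    if not 0 < p < len(X):
        raise ValueError("p must satisfy 0 < p < n")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle (assumed, not from the source)
    train = [X[i] for i in idx[:p]]
    test = [X[i] for i in idx[p:]]
    return train, test


X = [[float(i)] for i in range(10)]  # n = 10 toy samples
train, test = train_test_split(X, p=7)
```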

3. The small-sample rolling bearing fault diagnosis method based on the convolutional transformer generative adversarial network according to claim 1, characterized in that, in step (2), the signal samples are generated with the generative adversarial network having a cross structure of convolution and transformer layers, specifically comprising:

1) Setting standard normally distributed random noise z with mean 0 and variance 1, and embedding the corresponding fault class label c into the noise to obtain the label-carrying random noise Z = [z, c];
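One way to form Z = [z, c] is to concatenate the Gaussian noise with a one-hot encoding of the label. The sketch assumes one-hot encoding; the document only says the label is embedded, which could also be done with a learned embedding. The names and sizes are hypothetical.

```python
import random
from typing import List


def one_hot(c: int, n_classes: int) -> List[float]:
    return [1.0 if k == c else 0.0 for k in range(n_classes)]


def labeled_noise(noise_dim: int, c: int, n_classes: int, rng: random.Random) -> List[float]:
    """Z = [z, c]: standard normal noise (mean 0, variance 1) followed by a one-hot label."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(c, n_classes)


rng = random.Random(42)
Z = labeled_noise(noise_dim=8, c=2, n_classes=4, rng=rng)  # fault class 2 of 4
```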

2) Dividing the input signal into multiple fixed-size patches by one-dimensional convolutional embedding, and embedding positional encoding information into each patch;

3) Constructing the generative adversarial network with a cross structure of convolution and transformer layers, in which the transformer layers extract the global features of the signal and the convolution layers extract its local features; feeding the sequence of random-noise patches carrying position information into the generator, which has a transformer-convolution cross structure, to generate signal samples; dividing the generated signals and the real signals into patches and embedding position information, then mixing them and feeding them into the discriminator, which also has a transformer-convolution cross structure, for learning; and outputting a two-class prediction label and a multi-class prediction label through Sigmoid and Softmax activation functions at the end of the discriminator, which are compared with the true labels to perform real/fake discrimination and category discrimination;

4) Training the generator and the discriminator alternately in a zero-sum game until a Nash equilibrium is reached, and finally generating signal samples;

5) Adding the generated signal samples to the original training samples as an enhanced data set to train the fault classifier.

4. The small-sample rolling bearing fault diagnosis method based on the convolutional transformer generative adversarial network according to claim 3, characterized in that, in step (2), the specific calculation process is as follows:

1) Performing a non-overlapping sliding convolution on the input signal with a one-dimensional convolution kernel, so that the input signal is divided into multiple fixed-size patches, and embedding into each patch a positional encoding that is learned during model training; the one-dimensional convolution and the positional encoding are computed as follows:

$$u_j = \sum_{i \in M_j} v_i * k_{ij} + b_j,$$

where $v_i$ and $u_j$ denote the input of the i-th channel and the output of the j-th channel, respectively; k is the convolution kernel, b is the bias, and * is the convolution operation; $M_j$ is the set of input channels used to compute the output of the j-th channel;
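The one-dimensional convolution of step 1) (output $u_j$ from inputs $v_i$, kernel k, and bias b over the channel set $M_j$) can be sketched for the non-overlapping case, where the stride equals the kernel length so that each output position covers exactly one patch. This assumes $M_j$ contains every input channel and, as in common deep-learning libraries, that the kernel is not flipped (cross-correlation). The function name and toy values are hypothetical.

```python
from typing import List

Signal = List[List[float]]  # channels x length


def conv1d_nonoverlap(v: Signal, k: List[List[List[float]]], b: List[float]) -> Signal:
    """u_j[t] = b_j + sum over input channels i of (patch t of v_i) . k_ij.

    v[i]: input channel i (length L); k[j][i]: kernel from input channel i to
    output channel j (length K); b[j]: bias of output channel j. The stride equals
    K, so patches do not overlap; trailing samples that do not fill a patch are dropped.
    """
    L, K = len(v[0]), len(k[0][0])
    n_patches = L // K
    u = []
    for j, kernels in enumerate(k):
        row = []
        for t in range(n_patches):
            start = t * K
            s = b[j]
            for i, kern in enumerate(kernels):  # M_j = all input channels (assumption)
                s += sum(v[i][start + q] * kern[q] for q in range(K))
            row.append(s)
        u.append(row)
    return u


v = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]         # one input channel, L = 6
k = [[[1.0, 1.0, 1.0]], [[1.0, 0.0, -1.0]]]  # two output channels, K = 3
b = [0.0, 0.5]
u = conv1d_nonoverlap(v, k, b)               # two patches per output channel
```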

$$T_U = [U_1 E;\ U_2 E;\ \ldots;\ U_N E] + E_{pos},$$

where $U_p$ (p = 1, 2, ..., N) denotes the p-th of the N patches, E denotes a learnable embedding matrix, $E_{pos}$ denotes a learnable position information matrix, and $T_U$ denotes the final patch sequence combined with the positional encoding;
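A sketch of the patch embedding with a learnable position matrix, assuming the common vision-transformer form in which each patch $U_p$ is multiplied by the embedding matrix E and $E_{pos}$ is added row by row to give $T_U$. The patch length, embedding size, and random initialization are hypothetical; during training, E and $E_{pos}$ would be updated by gradient descent.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def embed_patches(signal: List[float], patch_len: int, E: Matrix, E_pos: Matrix) -> Matrix:
    """T_U = [U_1 E; ...; U_N E] + E_pos for N non-overlapping patches."""
    N = len(signal) // patch_len
    U = [signal[p * patch_len:(p + 1) * patch_len] for p in range(N)]  # N x patch_len
    tokens = matmul(U, E)                                              # N x D
    return [[t + e for t, e in zip(tr, er)] for tr, er in zip(tokens, E_pos)]


# Hypothetical sizes: signal of length 16, patches of length 4, embedding size D = 3.
rng = random.Random(0)
patch_len, D = 4, 3
signal = [rng.gauss(0.0, 1.0) for _ in range(16)]
E = [[rng.gauss(0.0, 0.02) for _ in range(D)] for _ in range(patch_len)]  # learnable
E_pos = [[rng.gauss(0.0, 0.02) for _ in range(D)] for _ in range(len(signal) // patch_len)]
T_U = embed_patches(signal, patch_len, E, E_pos)
```

Multiplying each patch by E is the same linear map as a non-overlapping convolution with a single input channel, kernel length equal to the patch length, and D output channels whose kernels are the columns of E, which is why convolutional and linear patch embeddings are often used interchangeably.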

2) Training the generator and the discriminator alternately in a zero-sum game until a Nash equilibrium is reached, the objective function of the CoT-GAN being expressed as follows:

where the terms denote, respectively, the probability computed from the noise data, the expectation over the real data distribution, and the expectation over the data synthesized from noise; $P(Y = y \mid S_{real})$ denotes the conditional probability distribution over the class labels; the optimization of the generator and the discriminator is a two-player minimax problem, formalized as follows:
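Because the discriminator outputs both a real/fake probability and a class distribution, the objective presumably follows the auxiliary-classifier GAN (AC-GAN) form with a source term $L_S$ and a class term $L_C$. The sketch below assumes that form, with the discriminator maximizing $L_S + L_C$ and the generator maximizing $L_C - L_S$. It only evaluates the losses from given probabilities and is not the exact CoT-GAN equation; all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-12  # keeps log() finite when a probability is exactly 0


def _mean_log(ps: Sequence[float]) -> float:
    return sum(math.log(p + EPS) for p in ps) / len(ps)


def acgan_terms(d_real: Sequence[float], d_fake: Sequence[float],
                cls_real: Sequence[float], cls_fake: Sequence[float]) -> Tuple[float, float]:
    """d_*: Sigmoid outputs P(real | x).
    cls_*: Softmax probability of the true label (for fakes, the label the generator was given).

    L_S = E[log P(real | x_real)] + E[log P(fake | x_fake)]
    L_C = E[log P(Y = c | x_real)] + E[log P(Y = c | x_fake)]
    """
    l_s = _mean_log(d_real) + _mean_log([1.0 - p for p in d_fake])
    l_c = _mean_log(cls_real) + _mean_log(cls_fake)
    return l_s, l_c


def discriminator_loss(d_real, d_fake, cls_real, cls_fake) -> float:
    l_s, l_c = acgan_terms(d_real, d_fake, cls_real, cls_fake)
    return -(l_s + l_c)  # minimizing this maximizes L_S + L_C


def generator_loss(d_real, d_fake, cls_real, cls_fake) -> float:
    l_s, l_c = acgan_terms(d_real, d_fake, cls_real, cls_fake)
    return -(l_c - l_s)  # minimizing this maximizes L_C - L_S


# One alternating round would minimize discriminator_loss w.r.t. the discriminator's
# weights, then generator_loss w.r.t. the generator's weights, and repeat.
```

The generator part of $L_S$ is often swapped for the non-saturating variant (maximizing $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$); the document does not say whether CoT-GAN does this.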

3) Training the fault classifier with the enhanced data set, the objective function of the fault classifier being expressed as follows:

where x denotes an input sample of the fault classifier, y denotes the data label output by the classifier, $P_{data}$ and $P_g$ denote the data distributions of the real and generated samples, respectively, and $P(Y = y \mid x)$ likewise denotes the conditional probability distribution over the class labels.
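Given these definitions, the classifier objective is presumably a cross-entropy taken over both the real and the generated samples. The sketch assumes the form $-\mathbb{E}_{x \sim P_{data}}[\log P(Y=y \mid x)] - \mathbb{E}_{x \sim P_g}[\log P(Y=y \mid x)]$, with each expectation replaced by a batch mean; the inputs are the classifier's Softmax outputs, and the names and values are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite


def mean_nll(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-probability assigned to the true labels."""
    return -sum(math.log(p[y] + EPS) for p, y in zip(probs, labels)) / len(labels)


def classifier_loss(probs_real, labels_real, probs_gen, labels_gen) -> float:
    """Cross-entropy over real samples plus cross-entropy over generated samples."""
    return mean_nll(probs_real, labels_real) + mean_nll(probs_gen, labels_gen)


# Toy Softmax outputs for four fault classes.
probs_real = [[0.7, 0.2, 0.1, 0.0], [0.1, 0.6, 0.2, 0.1]]
labels_real = [0, 1]
probs_gen = [[0.1, 0.1, 0.8, 0.0]]
labels_gen = [2]
loss = classifier_loss(probs_real, labels_real, probs_gen, labels_gen)
```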