CN117949823A - Motor fault diagnosis method and device based on improved transfer learning model - Google Patents

Motor fault diagnosis method and device based on improved transfer learning model

Info

Publication number
CN117949823A
Authority
CN
China
Prior art keywords
frequency
transfer learning
learning model
time
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410354193.4A
Other languages
Chinese (zh)
Other versions
CN117949823B (en)
Inventor
孙宁
朱思成
蒋亮
王松雷
王海军
陈江
汤家辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410354193.4A priority Critical patent/CN117949823B/en
Publication of CN117949823A publication Critical patent/CN117949823A/en
Application granted granted Critical
Publication of CN117949823B publication Critical patent/CN117949823B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01RMEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/34Testing dynamo-electric machines
    • G01R31/343Testing dynamo-electric machines in operation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01DMEASURING NOT SPECIALLY ADAPTED FOR A SPECIFIC VARIABLE; ARRANGEMENTS FOR MEASURING TWO OR MORE VARIABLES NOT COVERED IN A SINGLE OTHER SUBCLASS; TARIFF METERING APPARATUS; MEASURING OR TESTING NOT OTHERWISE PROVIDED FOR
    • G01D21/00Measuring or testing not otherwise provided for
    • G01D21/02Measuring two or more variables by means not covered by a single other subclass
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a motor fault diagnosis method and device based on an improved transfer learning model, wherein the method comprises the following steps: collecting multi-source heterogeneous signals when a motor operates, comprising: a three-axis vibration acceleration signal and a three-phase current signal; continuous wavelet transformation is carried out on multi-source heterogeneous signals when a motor operates, and wavelet time-frequency diagrams are respectively generated, and the method comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram; performing image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image; and inputting the fusion image into a trained improved transfer learning model EfficientNetV2-M0, and outputting a fault diagnosis result of the motor. According to the invention, the multi-source heterogeneous signals are transformed and fused when the motor operates, so that the diagnosis efficiency and accuracy of motor faults are effectively improved.

Description

Motor fault diagnosis method and device based on improved transfer learning model
Technical Field
The invention belongs to the technical field of motor fault diagnosis, and particularly relates to a motor fault diagnosis method and device based on an improved transfer learning model.
Background
The motor is an important electromechanical device that converts between electric energy and mechanical energy, and it occupies a very important position in industrial fields such as electric transmission, transportation, and servo control. Ensuring the safety and reliability of motor operation is therefore of great significance. A normally operating motor generates various characteristic signals, such as current, temperature, and vibration, which can serve as representations of motor performance and operating condition. With the continuous development of artificial intelligence and deep learning, intelligent diagnosis technologies based on sensor signals are constantly emerging. However, the data obtained by a single sensor are limited and easily affected by the sensor itself, and it is difficult for them to reflect all the characteristics of the equipment in operation; fault diagnosis that relies on a single sensor signal therefore struggles to achieve the expected identification effect.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a motor fault diagnosis method and device based on an improved transfer learning model, which effectively improve the efficiency and accuracy of motor fault diagnosis by transforming and fusing multi-source heterogeneous signals collected during motor operation.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
In a first aspect, a motor fault diagnosis method based on an improved transfer learning model is provided, including: collecting multi-source heterogeneous signals when a motor operates, comprising: a three-axis vibration acceleration signal and a three-phase current signal; continuous wavelet transformation is carried out on multi-source heterogeneous signals when a motor operates, and wavelet time-frequency diagrams are respectively generated, and the method comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram; performing image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image; and inputting the fusion image into a trained improved transfer learning model EfficientNetV2-M0, and outputting a fault diagnosis result of the motor.
Further, image fusion is carried out on the time-frequency diagram of the triaxial vibration acceleration signal and the time-frequency diagram of the three-phase current signal to obtain a fusion image, and the method comprises the following steps: decomposing the time-frequency diagram of the triaxial vibration acceleration signal and the time-frequency diagram of the three-phase current signal by adopting a two-dimensional Mallat algorithm of wavelet transformation to obtain a low-frequency subband coefficient and a high-frequency subband coefficient of the time-frequency diagram of the triaxial vibration acceleration signal and a low-frequency subband coefficient and a high-frequency subband coefficient of the time-frequency diagram of the three-phase current signal; fusing different levels of sub-band coefficients of the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram by adopting a set fusion rule to obtain fused different levels of sub-band coefficients; and reconstructing the fused sub-band coefficients of different layers by adopting a two-dimensional Mallat algorithm to obtain a fused image.
Further, for different levels of subband coefficients of the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram, a set fusion rule is adopted for fusion, and the method comprises the following steps: fusing the low-frequency subband coefficient of the triaxial vibration acceleration signal time-frequency chart and the low-frequency subband coefficient of the three-phase current signal time-frequency chart by adopting a weighted average method; and fusing the high-frequency subband coefficients of the triaxial vibration acceleration signal time-frequency chart and the high-frequency subband coefficients of the three-phase current signal time-frequency chart by adopting a coefficient absolute value maximum method.
Further, the improved transfer learning model EfficientNetV2-M0 is obtained by optimizing the multiplying factor, introducing the DBB module, introducing the MCA attention mechanism and improving the loss function on the basis of the M version of the transfer learning model EfficientNetV2; the multiplying power factor is optimized and is used for reducing the parameter number and the calculation complexity of the model; introducing a DBB module for improving the feature extraction capability of the model; introducing an MCA attention mechanism for enhancing the expression capability of the learned features and precisely positioning the object of interest; and improving a loss function, which is used for improving the generalization capability of the model and preventing the training from being fitted.
Further, optimizing the multiplying factor includes: the multiplying factor in the B0 version of the transfer learning model EfficientNetV2 is combined with the network structure of the M version of the transfer learning model EfficientNetV2 to generate the improved transfer learning model EfficientNetV2-M0, and the model parameters and computational complexity of EfficientNetV2-M are reduced while model accuracy is improved.
Further, introducing a DBB module, comprising: adding a DBB module in the Fused-MBConv structure of the improved transfer learning model EfficientNetV2-M0, the DBB module containing six transformations in the inference/deployment phase: (1) Conv layer and BN layer merging; (2) branch merging; (3) convolution sequence merging; (4) depth concatenation merging; (5) mean pooling conversion; (6) multi-scale convolution conversion. During inference, the complex structure of the DBB can be converted into a single convolution, thereby minimizing loss of accuracy and shortening the inference time. The DBB may be embedded directly into any existing architecture as an equivalent module, which reflects the diversity and flexibility of the module and can significantly improve the feature extraction capability of various backbone feature extraction networks.
Further, introducing the MCA attention mechanism includes: replacing the SE module based on the SE attention mechanism with the MCA module based on the MCA attention mechanism in the MBConv structure of the improved transfer learning model EfficientNetV2-M0, wherein the MCA module comprises three branches which are respectively responsible for capturing feature interdependencies along the spatial dimensions W and H and for capturing inter-channel interactions; finally, in the integration stage, all outputs of the three branches are simply averaged and aggregated, and the attention weights generated by the different dimensions are recalibrated to derive the final refined feature map.
Further, improving the loss function includes:
Label smoothing (Label Smooth) and the Focal loss function are used to improve the original cross entropy loss function; the probability distribution after label smoothing is:

$y_i' = (1-\varepsilon)\,y_i + \dfrac{\varepsilon}{K}$

wherein y_i' is the probability distribution after label smoothing; ε is a hyperparameter; K is the total number of categories in the multi-class classification; i is a certain class among the multiple classes; and y_i is the real label;
The formula of Focal loss is:

$FL(p_t) = -\alpha_t\,(1-p_t)^{\gamma}\,\log(p_t)$

wherein p_t is the probability predicted by the model for the positive class; FL(p_t) is the Focal loss under the probability predicted by the model for the positive class; γ is the modulation factor; and α_t is the balance coefficient.
Further, a training method for the improved transfer learning model EfficientNetV2-M0 includes: obtaining the source domain D_S and the target domain D_T using the ImageNet dataset; all initial weights of the improved transfer learning model EfficientNetV2-M0 are transferred from a model optimized and pre-trained on the ImageNet dataset, and are frozen before training with the fused sensor images begins; during training, only the final fully connected layer of the classification output is modified to match the number of fault categories in the experiment. The training process of the improved transfer learning model EfficientNetV2-M0 includes: feeding the fusion images into the previously trained network; training all models with the same hyperparameters and preprocessing them with the same data pipeline; and adjusting the learning rate with an exponential decay strategy, where the batch size is 8, the maximum number of epochs is limited to 100, the initial learning rate is set to 0.01 and decays to 0.0001 after 100 rounds of iteration, and the model whose loss function converges stably is selected as the final classification model.
In a second aspect, there is provided a motor fault diagnosis apparatus based on an improved transfer learning model, comprising: the signal acquisition module is used for acquiring multi-source heterogeneous signals when the motor operates, and comprises: a three-axis vibration acceleration signal and a three-phase current signal; the data processing module is used for carrying out continuous wavelet transformation on multi-source heterogeneous signals when the motor operates, respectively generating wavelet time-frequency diagrams, and comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram; the image fusion module is used for carrying out image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image; the fault diagnosis module is used for inputting the fusion image into the trained improved transfer learning model EfficientNetV2-M0 and outputting a fault diagnosis result of the motor.
Compared with the prior art, the invention has the beneficial effects that: the invention collects multi-source heterogeneous signals when the motor operates, comprising: a three-axis vibration acceleration signal and a three-phase current signal; continuous wavelet transformation is carried out on multi-source heterogeneous signals when a motor operates, and wavelet time-frequency diagrams are respectively generated, and the method comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram; performing image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image; the fusion image is input into a trained improved transfer learning model EfficientNetV2-M0, and a motor fault diagnosis result is output, so that the motor fault diagnosis efficiency and accuracy are effectively improved.
Drawings
Fig. 1 is a schematic flow diagram of a motor fault diagnosis method based on an improved transfer learning model according to an embodiment of the present invention;
FIG. 2 is a time-frequency image conversion of an X-axis vibration sensor in a normal state of a motor according to an embodiment of the present invention;
FIG. 3 is a three-layer decomposition principle of wavelet transformation in an embodiment of the present invention;
FIG. 4 is a fusion result of fusion of multi-source heterogeneous signals acquired by a heterogeneous multi-mode sensor for a motor in an embodiment of the invention;
FIG. 5 is a schematic diagram of the Fused-MBConv structure;
FIG. 6 is a schematic diagram of MBConv structures;
FIG. 7 is a schematic diagram of a DBB module conversion architecture;
FIG. 8 is a schematic diagram of the DBB model conversion and modified Fused-MBConv architecture;
fig. 9 is a schematic diagram of the MCA attention mechanism structure;
FIG. 10 is a schematic diagram of a modified MBConv structure;
FIG. 11 is a motor fault integrated simulation test stand in an embodiment of the present invention;
FIG. 12 is a confusion matrix;
FIG. 13 is a confusion matrix for the improved transfer learning model EfficientNetV2-M0;
FIG. 14 is a fused image under different failure modes in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Embodiment one:
As shown in fig. 1, a motor fault diagnosis method based on an improved transfer learning model includes: collecting multi-source heterogeneous signals when a motor operates, comprising: a three-axis vibration acceleration signal and a three-phase current signal; continuous wavelet transformation is carried out on multi-source heterogeneous signals when a motor operates, and wavelet time-frequency diagrams are respectively generated, and the method comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram; performing image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image; and inputting the fusion image into a trained improved transfer learning model EfficientNetV2-M0, and outputting a fault diagnosis result of the motor.
1) Collecting multi-source heterogeneous signals when a motor operates, comprising: a triaxial vibration acceleration signal and a three-phase current signal.
And acquiring multi-source heterogeneous data such as triaxial vibration acceleration channel signals, three-phase current signals and the like when the motor to be diagnosed runs by using a heterogeneous multi-mode sensor.
2) Continuous wavelet transformation is carried out on multi-source heterogeneous signals when a motor operates, and wavelet time-frequency diagrams are respectively generated, and the method comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram.
In processing the signals received from the sensors, different preprocessing methods may be used depending on the type of signal, and they can operate in both the time domain and the frequency domain. The original time-series data can be converted into a frequency-domain representation using a transformation method such as the Fourier transform; however, the time component is no longer available in the Fourier-transformed signal representation. Compared with the Fourier transform, both the continuous wavelet transform and the short-time Fourier transform have the advantage that the signal can be observed in the time and frequency domains simultaneously. The continuous wavelet transform (CWT) plays a great role in the analysis of non-stationary signals. Compared with the short-time Fourier transform, the CWT has the characteristic of window self-adaptation: for high-frequency components it provides high time resolution (but poor frequency resolution), and for low-frequency components it provides high frequency resolution (but poor time resolution). In engineering, the frequency content of the low-frequency components and the occurrence time of the high-frequency components are usually of most concern, so the CWT is more advantageous. The CWT can convert the acquired data into a two-dimensional time-frequency image, referred to as a time-frequency map. Let the function ψ(t) be the basic wavelet; for any signal f(t) ∈ L²(R), its continuous wavelet transform is defined as W_f(a, b), with the formula:

$W_f(a,b)=\dfrac{1}{\sqrt{|a|}}\int_{-\infty}^{+\infty} f(t)\,\psi^{*}\!\left(\dfrac{t-b}{a}\right)\mathrm{d}t$

wherein b is the translation factor, which determines the position of the time-frequency window in the time domain; a is the scale factor, which determines the size of the time-frequency window and its position in the frequency domain; f(t) is the source time-varying signal; t is the time variable; ψ* denotes the complex conjugate of ψ; and ψ_{a,b}(t) is the wavelet basis function (also called the mother wavelet) after translation and scaling, expressed as:

$\psi_{a,b}(t)=\dfrac{1}{\sqrt{|a|}}\,\psi\!\left(\dfrac{t-b}{a}\right)$
The wavelet transform can automatically adjust a and b according to the characteristics of the signal, so it has self-adaptability and multi-resolution characteristics for the vibration signals and current signals of the motor over different time intervals. The conversion process of the CWT is as follows: first, the sensor data from the multiple sensors are buffered and read as rolling window segments. In the time domain, the wavelet is moved in time and compared with the windowed signal at each position one by one to obtain the wavelet coefficients. In the frequency domain, the length (and hence the frequency) of the wavelet is changed by stretching or compressing it, yielding wavelet coefficients at different frequencies. The wavelet coefficients at the different frequencies are combined to obtain the wavelet coefficient map (time-frequency map) of the time-frequency transformation. Fig. 2 shows the time-frequency image conversion of the X-axis vibration sensor in the normal state of the motor.
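For illustration only (not the patent's own code), the following Python sketch shows how a one-dimensional sensor segment could be converted into a CWT time-frequency image with the PyWavelets library; the wavelet name ('morl'), the scale range, and the figure size are assumed values, chosen only to roughly match the 300×300 images described later in the embodiment.

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

def signal_to_tf_image(segment, fs=12800, wavelet="morl", num_scales=128, out_path=None):
    """Convert a 1-D signal segment into a continuous-wavelet time-frequency map."""
    scales = np.arange(1, num_scales + 1)                       # assumed scale sweep
    coeffs, freqs = pywt.cwt(segment, scales, wavelet, sampling_period=1.0 / fs)
    power = np.abs(coeffs)                                      # |W_f(a, b)| forms the time-frequency map

    fig, ax = plt.subplots(figsize=(3, 3), dpi=100)             # ~300x300 px image
    ax.imshow(power, aspect="auto")
    ax.axis("off")
    if out_path is not None:
        fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return power

# Example: one 1024-point window of an X-axis vibration channel (synthetic signal here)
window = np.sin(2 * np.pi * 50 * np.arange(1024) / 12800) + 0.1 * np.random.randn(1024)
tf_map = signal_to_tf_image(window, out_path="x_axis_vibration_tf.png")
```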
3) And carrying out image fusion on the time-frequency diagram of the triaxial vibration acceleration signal and the time-frequency diagram of the triphase current signal to obtain a fusion image.
3.1 Multi-sensor image fusion algorithm based on wavelet transformation
In a multi-sensor image fusion algorithm, the image data acquired through multi-source channels are processed by image processing, computer technology and the like, the useful information and features in the respective channels are extracted to the greatest extent, and finally a high-quality image is synthesized, so that a more comprehensive and accurate description of the measured object is obtained. The multi-sensor image fusion algorithm aims to improve the accuracy of the detection results and provide a basis for the decision layer by integrating the data of multiple sensors.
The invention provides a pixel-level image fusion algorithm based on wavelet transformation, which selects different criteria for fusion according to different characteristics of a low-frequency component and a high-frequency component after wavelet transformation, and obtains a fusion image through wavelet inverse transformation. Experimental results show that the pixel-level image fusion algorithm based on wavelet transformation fully considers the characteristics of wavelet transformation and the visual characteristics of human eyes, has the capability of enhancing the spatial details of images, and has good fusion effect.
In wavelet transformation, the Mallat algorithm is one of the most commonly used decomposition algorithms, and the main process of the Mallat image decomposition algorithm is as follows:
Designing the filters: first, a pair of two-dimensional wavelet filters must be designed, generally a low-pass filter h(x, y) and a high-pass filter g(x, y) applied in the horizontal and vertical directions. The two filters are typically orthogonal and satisfy a certain scaling-function and wavelet-function relationship.
Wavelet decomposition: for a two-dimensional image f(x, y), the wavelet decomposition proceeds as follows, as shown in fig. 3:
S1, low-frequency sub-image: the original image f(x, y) is convolved with the low-pass filter h(x, y) in both the horizontal and vertical directions, and the result is downsampled to obtain the low-frequency (approximation) sub-image f_LL(x, y).
S2, horizontal-direction high-frequency sub-image: the original image f(x, y) is convolved with the high-pass filter g(x, y) in the horizontal direction and with the low-pass filter h(x, y) in the vertical direction, and the result is downsampled to obtain the sub-image f_HL(x, y).
S3, vertical-direction high-frequency sub-image: the original image f(x, y) is convolved with the low-pass filter h(x, y) in the horizontal direction and with the high-pass filter g(x, y) in the vertical direction, and the result is downsampled to obtain the sub-image f_LH(x, y).
S4, diagonal high-frequency sub-image: the original image f(x, y) is convolved with the high-pass filter g(x, y) in both the horizontal and vertical directions, and the result is downsampled to obtain the sub-image f_HH(x, y).
S5, repeated decomposition: the low-frequency sub-image f_LL(x, y) is decomposed further to obtain lower-frequency and higher-frequency sub-images. In this way, multi-scale decomposition is realized and image features at different scales can be extracted.
And repeating S1-S5 until the required decomposition layer number is reached.
These formulas describe the basic process of the Mallat image decomposition algorithm by which multi-scale analysis and processing of images can be achieved. Notably, the Mallat image decomposition algorithm is an important application of wavelet transformation in image processing, and can be used in the fields of image compression, edge detection, texture analysis and the like.
The three-layer decomposition principle of wavelet transformation can be represented by fig. 3, and four sub-bands can be obtained by the first decomposition, wherein three are high-frequency sub-bands, one is low-frequency sub-band, and the next decomposition only decomposes the low-frequency sub-band.
The reconstruction process of the two-dimensional Mallat algorithm is as follows:
t1, inverse wavelet decomposition: assuming that a multi-scale decomposition of the image has been performed, resulting in a low frequency sub-image and a high frequency sub-image of the image, it is now necessary to reconstruct these sub-images into a fused image. The process of reconstruction is the inverse of decomposition and includes the following steps.
T2, upsampling: the low frequency sub-image first needs to be up-sampled to restore it to its original size. Upsampling is achieved by inserting zero values in each direction, thereby enlarging the size of the sub-image.
T3, inverse filtering: for the up-sampled low frequency sub-image, convolution operation is required to be performed with the inverse filter responses of the low pass filters in the horizontal direction and the vertical direction, respectively, and then the two convolution results are added to obtain the result of the inverse wavelet transform.
T4, inverse wavelet reconstruction: the reconstructed inverse wavelet transformation result is the approximation of the original image.
T5, inverse wavelet reconstructing high frequency sub-image: for high frequency sub-images, inverse wavelet reconstruction is required to obtain high frequency detail information in the original image. The process of inverse wavelet reconstruction is similar to the inverse wavelet transform, but requires an inverse filter response using a high pass filter.
And T6, adding the low-frequency part and the high-frequency part of the inverse wavelet reconstruction to obtain an inverse wavelet transformation result of the fusion image.
Through T1-T6, a reconstruction process of a two-dimensional Mallat algorithm can be realized, and a multi-scale decomposition result of the image is reconstructed into a fusion image. Therefore, multi-scale analysis and processing of the image can be realized, image features under different scales can be extracted, and compression and denoising processing can be carried out on the image without losing important information.
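As a hedged illustration of the decomposition (S1-S5) and reconstruction (T1-T6) steps above, the following Python sketch uses the PyWavelets implementation of the two-dimensional Mallat algorithm; the 'db2' wavelet and three decomposition levels are assumed choices, not values fixed by the patent.

```python
import numpy as np
import pywt

def mallat_decompose(image, wavelet="db2", levels=3):
    """Multi-level 2-D wavelet (Mallat) decomposition.
    Returns [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]:
    one low-frequency sub-band plus three high-frequency sub-bands per level."""
    return pywt.wavedec2(image, wavelet=wavelet, level=levels)

def mallat_reconstruct(coeffs, wavelet="db2"):
    """Inverse 2-D Mallat transform: rebuild the image from its sub-band coefficients."""
    return pywt.waverec2(coeffs, wavelet=wavelet)

# Round-trip check on a dummy 300x300 "time-frequency image"
img = np.random.rand(300, 300)
coeffs = mallat_decompose(img)
rec = mallat_reconstruct(coeffs)
print(np.allclose(img, rec[:300, :300], atol=1e-8))   # reconstruction is (near-)exact
```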
3.2 Multi-sensor image fusion based on wavelet transform
Let the original images to be fused be A and B and the fused image be C; the steps of image fusion are as follows: (1) decompose the original images A and B using the two-dimensional Mallat algorithm of the wavelet transform to obtain the low-frequency sub-band and high-frequency sub-band coefficients of each image; (2) fuse the sub-band coefficients of the different levels of the two images using different rules; (3) reconstruct the fused wavelet coefficients using the two-dimensional Mallat algorithm to obtain the fused image C; namely:
decomposing the time-frequency diagram of the triaxial vibration acceleration signal and the time-frequency diagram of the three-phase current signal by adopting a two-dimensional Mallat algorithm of wavelet transformation to obtain a low-frequency subband coefficient and a high-frequency subband coefficient of the time-frequency diagram of the triaxial vibration acceleration signal and a low-frequency subband coefficient and a high-frequency subband coefficient of the time-frequency diagram of the three-phase current signal;
fusing different levels of sub-band coefficients of the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram by adopting a set fusion rule to obtain fused different levels of sub-band coefficients;
And reconstructing the fused sub-band coefficients of different layers by adopting a two-dimensional Mallat algorithm to obtain a fused image.
For the low-frequency and high-frequency sub-band coefficients of the images, the following fusion rules are adopted: (1) the coefficient absolute value maximum method, which is suitable for source images with rich high-frequency components and high brightness and contrast; (2) the weighted average method, which has adjustable weight coefficients and a wide application range, can eliminate part of the noise, and loses little source image information. The fusion strategy of the invention is that the low-frequency components adopt the weighted average method and the high-frequency components adopt the coefficient absolute value maximum method; namely:
Fusing the low-frequency subband coefficient of the triaxial vibration acceleration signal time-frequency chart and the low-frequency subband coefficient of the three-phase current signal time-frequency chart by adopting a weighted average method;
And fusing the high-frequency subband coefficients of the triaxial vibration acceleration signal time-frequency chart and the high-frequency subband coefficients of the three-phase current signal time-frequency chart by adopting a coefficient absolute value maximum method.
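A minimal sketch of the fusion rules just stated, assuming PyWavelets, a 'db2' wavelet, three decomposition levels, equal 0.5/0.5 weights, and single-channel images (a colour time-frequency image would be processed per channel): the low-frequency sub-band is fused by weighted averaging and the high-frequency sub-bands by the coefficient absolute value maximum rule.

```python
import numpy as np
import pywt

def fuse_tf_images(img_a, img_b, wavelet="db2", levels=3, w_a=0.5, w_b=0.5):
    """Fuse two time-frequency images of the same shape with wavelet-domain rules."""
    ca = pywt.wavedec2(img_a, wavelet, level=levels)
    cb = pywt.wavedec2(img_b, wavelet, level=levels)

    # Low-frequency sub-band: weighted average
    fused = [w_a * ca[0] + w_b * cb[0]]

    # High-frequency sub-bands: keep the coefficient with the larger absolute value
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                           for x, y in ((ha, hb), (va, vb), (da, db))))

    return pywt.waverec2(fused, wavelet)

# Example with two dummy 300x300 single-channel time-frequency maps
a = np.random.rand(300, 300)
b = np.random.rand(300, 300)
fused_image = fuse_tf_images(a, b)
```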
For the multi-source heterogeneous signals (such as triaxial vibration and three-phase current signals) acquired by the heterogeneous multi-mode sensor used by the motor, the results of fusion by adopting the flow are shown in fig. 4 after the signals are processed by utilizing continuous wavelet transformation.
4) And inputting the fusion image into a trained improved transfer learning model EfficientNetV2-M0, and outputting a fault diagnosis result of the motor.
4.1 EfficientNetV2-M
EfficientNetV2 is a smaller, faster network model that is mainly built by stacking fused inverted residual layers (Fused-MBConv) and inverted linear bottleneck layers with depthwise separable convolution (MBConv).
The Fused-MBConv structure is shown in fig. 5: the image input first passes through a 3×3 standard convolution, and a Stochastic Depth type Dropout layer is applied to the output feature map. When the stride is 1 and the input image and the convolution output of the module have the same shape, a residual connection links the input and output; in the downsampling stage with stride 2, the convolution output feature map is output directly. The Fused-MBConv module can be divided into two different architectures according to the channel expansion factor. When the channel expansion factor is not equal to 1, the number of channels is first raised by the 3×3 standard convolution and then lowered by a 1×1 convolution. When the channel expansion factor equals 1, the input passes directly through the 3×3 standard convolution.
The MBConv structure is shown in fig. 6: the image input first passes through a 1×1 convolution to raise the number of channels; a depthwise convolution is then applied in the high-dimensional space; the feature map is optimized by the SE attention mechanism; and a 1×1 convolution (with a linear activation function) then lowers the number of channels. The MBConv module can be divided into two different architectures according to the stride of the depthwise convolution layer. When the stride of the depthwise convolution layer is 1 and the shape of the input feature map is the same as that of the output feature map, a Stochastic Depth type Dropout layer is added after the dimension-reducing 1×1 convolution to prevent overfitting, and finally a residual connection links the input and output. When the stride of the depthwise convolution layer is not 1, neither the Dropout layer nor the residual connection is used, and the feature map is output directly after the dimension-reducing 1×1 convolution.
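The PyTorch sketch below illustrates the Fused-MBConv behaviour just described (3×3 expansion convolution, optional 1×1 projection when the expansion factor is not 1, and a Dropout plus residual connection only when the stride is 1 and the shapes match). It is a simplified stand-in, not the patent's implementation: the SiLU (Swish) activation and drop rate are assumed, and Dropout2d only approximates Stochastic Depth.

```python
import torch
import torch.nn as nn

class FusedMBConv(nn.Module):
    """Simplified Fused-MBConv: 3x3 conv (expansion) -> optional 1x1 projection -> residual."""
    def __init__(self, in_ch, out_ch, expand=4, stride=1, drop_rate=0.2):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        if expand != 1:
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, mid, 3, stride, 1, bias=False),   # 3x3 conv raises channel count
                nn.BatchNorm2d(mid), nn.SiLU(),
                nn.Conv2d(mid, out_ch, 1, bias=False),             # 1x1 conv lowers channel count
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.block = nn.Sequential(                             # expansion factor = 1: plain 3x3 conv
                nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
                nn.BatchNorm2d(out_ch), nn.SiLU(),
            )
        # Stand-in for the Stochastic Depth style Dropout on the residual path
        self.dropout = nn.Dropout2d(drop_rate) if self.use_residual else nn.Identity()

    def forward(self, x):
        out = self.dropout(self.block(x))
        return x + out if self.use_residual else out

y = FusedMBConv(24, 24, expand=4, stride=1)(torch.randn(1, 24, 56, 56))  # shape is preserved
```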
EfficientNetV2 has versions B0, B1, B2, B3, S, M and L, of which EfficientNetV2-B0 is the lightweight version of the series. Since EfficientNetV2-M contains an effective number of parameters and achieves better accuracy during training, the EfficientNetV2-M network is chosen as the base model.
The network structures of EfficientNetV2-M and EfficientNetV2-B0 are shown in Table 1.
TABLE 1 EfficientNetV2-M and EfficientNetV2-B0 network structure table
The EfficientNetV2-M network is divided into 9 stages. Stage0 is an ordinary convolution layer with a 3×3 kernel and a stride of 2, including BN and the Swish activation function; Stage1 to Stage3 are repetitions of the Fused-MBConv structure; Stage4 to Stage7 are repetitions of the MBConv structure; Stage8 consists of a 1×1 ordinary convolution layer, an average pooling layer and a fully connected layer. Fused-MBConv1 and Fused-MBConv4 indicate that the first convolution layer in the Fused-MBConv structure expands the number of feature channels of the input matrix by a factor of 1 and 4, respectively; k3×3 denotes the convolution kernel size employed for the depthwise convolution (Depthwise Conv); similarly, MBConv4 and MBConv6 indicate that the first 1×1 convolution layer in the MBConv structure expands the number of feature channels of the input matrix by a factor of 4 and 6, respectively; Channels is the number of feature channels, i.e., the number of channels of the output matrix obtained after the input passes through the Stage; SE0.25 indicates that the number of nodes of the first fully connected layer in the Squeeze-and-Excitation (SE) module is 1/4 of the number of feature channels input into the MBConv module; Layers indicates how many times each module is repeated.
4.2 Optimization of multiplying factor
EfficientNetV2 uses the NAS (Neural Architecture Search) technique to search for a rational configuration of three parameters: the image input resolution r of the network, the depth of the network, and the width of the channels; by changing these three parameters, the performance of the network is improved. Width multiplying factor: the width of the network is adjusted by changing the number of convolution kernels, which changes the number of channels of the output feature matrix. Depth multiplying factor: the depth of the network is adjusted by changing the number of times each Stage repeatedly stacks its network structure.
TABLE 2 EfficientNetV2-M0 network Structure Table
In EfficientNetV2-B0, the multiplying factors in both the channel dimension and the depth dimension are 1.0, which greatly reduces the number of parameters and the computational complexity of the model; its network structure has only eight layers, and compared with EfficientNetV2-M it has fewer stages with the MBConv structure, so in the training experiments the accuracy of EfficientNetV2-B0 is far lower than that of EfficientNetV2-M. In EfficientNetV2-M, the multiplying factor in the channel dimension is 1.4 and the multiplying factor in depth is 1.8, which makes the EfficientNetV2-M model large and complex. To obtain the greatest advantage of the EfficientNetV2-M network, the multiplying factors of B0 are combined with the EfficientNetV2-M network structure to generate the improved EfficientNetV2-M0 network, which improves model accuracy while reducing the model parameters and computational complexity of EfficientNetV2-M; the improved network structure is shown in Table 2.
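The following sketch illustrates how the B0 multiplying factors (1.0 width, 1.0 depth) and the M factors (1.4 width, 1.8 depth) change a stage configuration; the EfficientNet-style rounding rules and the example stage values are assumptions for illustration, not the patent's exact procedure.

```python
import math

def round_filters(channels: int, width_factor: float, divisor: int = 8) -> int:
    """Scale the channel count by the width multiplying factor, rounded to a multiple of `divisor`."""
    channels *= width_factor
    new_ch = max(divisor, int(channels + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * channels:          # avoid shrinking by more than 10 %
        new_ch += divisor
    return int(new_ch)

def round_repeats(repeats: int, depth_factor: float) -> int:
    """Scale the number of layer repeats per stage by the depth multiplying factor."""
    return int(math.ceil(depth_factor * repeats))

base_stage = {"channels": 48, "repeats": 4}                       # illustrative stage configuration
for name, (w, d) in {"B0-style (1.0, 1.0)": (1.0, 1.0),
                     "M-style  (1.4, 1.8)": (1.4, 1.8)}.items():
    print(name, round_filters(base_stage["channels"], w), round_repeats(base_stage["repeats"], d))
```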
4.3 Introduction of DBB Module for structural re-parameterization
In order to improve the feature extraction capability of EfficientNetV2-M0, the Diverse Branch Block (DBB) is introduced into the Fused-MBConv structure to realize structural re-parameterization. The DBB module is an innovative way to realize re-parameterized convolution; it combines the multi-branch and multi-scale concepts of Inception with the over-parameterization idea to form the DBB module introduced in the present invention. The multi-branch DBB module utilizes a complex multi-branch microstructure during training while maintaining the overall network structure, which enables efficient inference and deployment. During inference, the complex structure of the DBB can be converted into a single convolution, thereby minimizing loss of accuracy and shortening the inference time. The DBB can be embedded directly into any existing architecture as an equivalent module, which reflects the diversity and flexibility of the module and can significantly improve the feature extraction capability of various backbone feature extraction networks.
In the inference/deployment stage, the DBB module uses six transformations for multi-branch merging: Conv layer and BN layer merging, branch merging, convolution sequence merging, depth concatenation merging, mean pooling conversion, and multi-scale convolution conversion. Fig. 7 shows six alternative configurations of the DBB module, where K×K denotes the convolution kernel size and 1×1 denotes a convolution kernel of size 1×1.
The introduction of the DBB module solves the problem of slow inference caused by the increase of network width. The DBB module uses the six transformations described above; its structure for model conversion at deployment/inference time is shown in fig. 8.
The invention adds a DBB module to the Fused-MBConv structure to improve accuracy while ensuring a faster inference speed. The structure of the modified Fused-MBConv is shown in FIG. 8.
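As an illustration of the first DBB transformation (merging a Conv layer with its following BN layer into a single convolution for inference), the hedged PyTorch sketch below folds the BN statistics into the convolution weights and bias and checks the equivalence numerically; it shows the standard re-parameterization identity rather than the patent's own code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm into the preceding convolution: y = gamma * (W*x + b - mean) / std + beta."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                                   # gamma / std, one value per output channel
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

# Quick equivalence check (BN in eval mode so that running statistics are used)
conv, bn = nn.Conv2d(8, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16)
bn.eval()
x = torch.randn(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True
```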
4.4 Introduction of MCA attention mechanism
To improve the attention mechanism in EfficientNetV2-M0, the present invention adopts the Multidimensional Collaborative Attention (MCA) mechanism. It is a lightweight and efficient multidimensional collaborative attention mechanism that models complementary attention simultaneously in the channel, height, and width dimensions using a three-branch architecture, so as to enhance the expressive power of the learned features and pinpoint the objects of interest. As shown in fig. 9, the MCA module consists of three branches: the left and middle branches are responsible for capturing feature interdependencies along the spatial dimensions W and H, while the right branch is mainly used for capturing inter-channel interactions. Finally, in the integration stage, all outputs of the three branches are simply averaged and aggregated, and the attention weights generated by the different dimensions are recalibrated to derive the final refined feature map. To construct the core components of the MCA, the squeeze and excitation transformations are redesigned on the basis of the SE channel attention. Specifically, in the squeeze transformation, not only are global average pooling and standard deviation pooling used to aggregate cross-dimensional feature responses, but a combination mechanism is developed to adaptively merge the average-pooled and standard-deviation-pooled features so as to enhance the representation of the feature descriptors. In the excitation transformation, local feature interactions are captured adaptively in a highly lightweight manner, instead of the inefficient dimension reduction strategy in SE, to better resolve the trade-off between performance and computational overhead. The present invention also replaces the SE attention mechanism in MBConv with the MCA attention mechanism. The MCA attention mechanism prevents the feature map from losing information due to the dimension reduction operation, so the network can extract picture features more fully. The improved MBConv module structure is shown in figure 10.
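To make the three-branch idea concrete, the following is a deliberately simplified PyTorch sketch of an attention block that recalibrates the feature map along the channel, height, and width dimensions and averages the three outputs; it only illustrates the structure described above and is not the actual MCA module (the combined average/standard-deviation squeeze and the exact excitation design of MCA are omitted).

```python
import torch
import torch.nn as nn

class SimpleTripleBranchAttention(nn.Module):
    """Simplified three-branch (channel / height / width) attention, for illustration only."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # One lightweight 1-D convolution per branch for the "excitation" step
        self.conv_c = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.conv_h = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.conv_w = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def _branch(self, x, conv):
        # x: (B, D, L1, L2) -> squeeze the two trailing dims with average pooling
        squeezed = x.mean(dim=(2, 3))                           # (B, D)
        att = conv(squeezed.unsqueeze(1)).squeeze(1)            # local interaction along D
        att = self.sigmoid(att).view(x.size(0), -1, 1, 1)
        return x * att                                          # recalibrate this branch

    def forward(self, x):                                       # x: (B, C, H, W)
        out_c = self._branch(x, self.conv_c)                                              # channel branch
        out_h = self._branch(x.permute(0, 2, 1, 3), self.conv_h).permute(0, 2, 1, 3)      # height branch
        out_w = self._branch(x.permute(0, 3, 2, 1), self.conv_w).permute(0, 3, 2, 1)      # width branch
        return (out_c + out_h + out_w) / 3.0                    # simple average aggregation

refined = SimpleTripleBranchAttention()(torch.randn(2, 64, 28, 28))
```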
4.5 Loss function improvement
In the training process, the time-frequency image has a larger target and stable appearance characteristics and is easy to learn, while the time-frequency signal generated during a fault has diverse characteristics and a smaller target, and its features are more difficult to train and learn. An imbalance in the number of samples of different classes can also make it difficult for the model to learn the remaining features and affect the gradient update direction of the loss function. The present invention uses Label smoothing (Label Smooth) and Focal loss to improve the original cross entropy loss function. Label Smooth assigns a larger-probability label coefficient to the target category and distributes smaller-probability label coefficients to the other categories, thereby improving the generalization capability of the model and preventing overfitting. The probability distribution after label smoothing is:

$y_i' = (1-\varepsilon)\,y_i + \dfrac{\varepsilon}{K}$

wherein y_i' is the probability distribution after label smoothing; ε is a hyperparameter; K is the total number of categories in the multi-class classification; i is a certain class among the multiple classes; and y_i is the real label. The updated probability distribution is equivalent to adding noise to the true distribution; for ease of computation, this noise is subject to a simple uniform distribution.
Focal Loss dynamically adjusts the weights of samples of different categories in a Loss function, reduces the weight of samples easy to classify, improves the weight of samples difficult to classify, and relieves the problem of sample imbalance, and the formula is as follows:
$FL(p_t) = -\alpha_t\,(1-p_t)^{\gamma}\,\log(p_t)$

wherein p_t is the probability predicted by the model for the positive class; FL(p_t) is the Focal loss under that predicted probability; γ is the modulation factor; and α_t is the balance coefficient. When p_t tends to 0, the modulation factor tends to 1 and the sample contributes significantly to the overall loss; when p_t tends to 1, the modulation factor tends to 0 and the sample contributes little to the overall loss.
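A hedged PyTorch sketch of the improved loss: cross entropy with label smoothing, re-weighted by the Focal modulation term. Combining the two in this way is one plausible reading of the description, and ε = 0.1, γ = 2, α = 0.25 are assumed values rather than figures taken from the patent.

```python
import torch
import torch.nn.functional as F

def smoothed_focal_loss(logits, targets, eps=0.1, gamma=2.0, alpha=0.25):
    """Label-smoothed cross entropy re-weighted by the focal modulation (1 - p_t)^gamma."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)

    # Label-smoothed target distribution: (1 - eps) + eps/K on the true class, eps/K elsewhere
    smooth = torch.full_like(log_probs, eps / num_classes)
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps + eps / num_classes)
    ce_per_sample = -(smooth * log_probs).sum(dim=1)

    # Focal modulation based on the predicted probability of the true class
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return (alpha * (1.0 - p_t) ** gamma * ce_per_sample).mean()

loss = smoothed_focal_loss(torch.randn(8, 6), torch.randint(0, 6, (8,)))  # 6 fault classes, batch of 8
```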
4.6 Transfer learning
Transfer learning is a Machine Learning (ML) method that improves model performance by transferring knowledge of the same or related source domain to the target domain. Rather than training the model from scratch, a pre-trained model may be used to refine a particular target task.
For a formal definition, let D denote a domain of an ML classification problem, consisting of two parts, D = {X, P(X)}, where X represents the feature space and P(X) represents the marginal probability distribution. A feature vector x ∈ X is a specific element of the feature space, and y is the corresponding class label belonging to the label space Y. For a domain D, a task can be defined as T = {Y, f(·)}, where f(·) is the prediction function learned from the training data. The source domain dataset can be defined as D_S = {(x_i^S, y_i^S)}, where x_i^S ∈ X_S and y_i^S ∈ Y_S are the corresponding class labels. Similarly, the target domain can be defined as D_T = {(x_i^T, y_i^T)}. The tasks of the source domain and the target domain are T_S and T_T, respectively. If the prediction functions of the source task and the target task are f_S(·) and f_T(·), respectively, transfer learning can be formally defined as utilizing the knowledge obtained from D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T, to improve f_T(·).
The present invention uses the popular ImageNet dataset to obtain D_S and T_S, and performs image classification training using the improved transfer learning model EfficientNetV2-M0. The model is pre-trained on the images of the ImageNet dataset, and its weights are frozen before training with the fused sensor images begins. During training, only the final fully connected layer of the classification output is modified to match the number of fault categories in the experiments of this application. The optimized source model can identify features from more than one million images, which is intended to contribute to the fault diagnosis task of the target domain.
In order to verify and analyze the effectiveness of the invention in fault diagnosis, a comprehensive mechanical fault simulation test bed was constructed and a series of experiments on different faults was carried out. In the experiments, PyCharm is used for training and testing the transfer learning model, and the PyTorch 1.10.1 framework is used for training. All programs were run on a computer configured with an AMD Ryzen 7 5800H CPU, an NVIDIA RTX 3070 GPU, and 16 GB of RAM.
All weights of the deep learning model are transferred from a model previously optimized and pre-trained on the ImageNet dataset for transfer learning. Model training feeds the fused time-frequency images into the previously trained network. The improved EfficientNetV2-M0 model adopts an exponential decay strategy to adjust the learning rate; the batch size is 8, the maximum number of epochs is limited to 100, the initial learning rate is set to 0.01 and decays to 0.0001 after 100 rounds of iteration, and the model whose loss function converges stably is selected as the final classification model. The explored pre-training model is the improved EfficientNetV2-M0; it is pre-trained on the images of the ImageNet dataset using the transfer learning method, and the weights of all layers except the last classification-output layer of the network structure are frozen before the formal training process with the fused sensor images begins.
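A sketch of the described training setup under stated assumptions: torchvision's efficientnet_v2_m is used here as a stand-in for the improved EfficientNetV2-M0, the optimizer type (SGD) is not specified in the text and is assumed, and the data loader and training function are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in backbone pre-trained on ImageNet (the patent uses its improved EfficientNetV2-M0)
model = models.efficientnet_v2_m(weights=models.EfficientNet_V2_M_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                                        # freeze transferred weights

num_fault_classes = 6                                              # NS, ITSC, BR, EF, IRF, ORF
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, num_fault_classes)   # trainable new classification head
model.to(device)

optimizer = torch.optim.SGD(model.classifier[-1].parameters(), lr=0.01)   # optimizer type assumed
# Exponential decay: 0.01 -> 0.0001 over 100 epochs  =>  gamma = (1e-4 / 1e-2) ** (1 / 100)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=(1e-4 / 1e-2) ** (1 / 100))

# for epoch in range(100):                                         # batch size 8 in the loader
#     train_one_epoch(model, fused_image_loader, optimizer, device)   # placeholders, not real APIs
#     scheduler.step()
```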
The comprehensive mechanical fault simulation test bed consists of a 1.5 kW motor, a rotor supported by bearings at both ends, a planetary gearbox, and a magnetic brake connected in series. The layout of the test bed is shown in fig. 11. The invention adopts motor current analysis and vibration spectrum analysis technology to test and verify the basic characteristics of motors with electrical and mechanical faults, so as to obtain valuable experimental data under the same operating conditions. The triaxial vibration acceleration sensor collects the vibration signals of faulty or normal rolling bearings, and the three-phase current detection sensor collects the current fluctuation signals of the faulty or normal motor; the sampling frequency is 12.8 kHz and the sampling time is 11 s. The experimental dataset is as follows: the vibration data and current data cover 6 categories, namely the 6 conditions of Normal State (NS), turn-to-turn short circuit (ITSC), broken rotor bar (BR), eccentricity fault (EF), bearing inner race fault (IRF), and bearing outer race fault (ORF). For each fault mode, 1750 samples of length 1024 are intercepted, giving a total of 1750 × 6 = 10500 samples. In addition, 70% of the whole dataset is used as the training set, 20% as the validation set for model selection and cross-validation, and 10% as the test set for the final test. In each training and test run, the dataset is randomly partitioned to ensure a comprehensive evaluation of the model performance.
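For illustration, the sketch below assembles a dataset of the described size: 1750 windows of length 1024 for each of the 6 conditions and a random 70/20/10 split into training, validation, and test sets; any segmentation and splitting details beyond these figures are assumptions.

```python
import numpy as np

CLASSES = ["NS", "ITSC", "BR", "EF", "IRF", "ORF"]
SAMPLES_PER_CLASS, WINDOW = 1750, 1024

def segment(signal: np.ndarray, n_windows: int = SAMPLES_PER_CLASS, length: int = WINDOW):
    """Cut a long 1-D recording into non-overlapping fixed-length windows."""
    return signal[: n_windows * length].reshape(n_windows, length)

def split_indices(n_total: int, seed: int = 0):
    """Random 70/20/10 split into train / validation / test index arrays."""
    idx = np.random.default_rng(seed).permutation(n_total)
    n_train, n_val = int(0.7 * n_total), int(0.2 * n_total)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

n_total = SAMPLES_PER_CLASS * len(CLASSES)            # 1750 x 6 = 10500 samples
train_idx, val_idx, test_idx = split_indices(n_total)
print(len(train_idx), len(val_idx), len(test_idx))    # 7350 2100 1050
```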
The invention adopts the continuous wavelet transform (CWT) to process the raw data, using non-overlapping sliding windows to obtain the time-frequency diagrams of the various signals, with an image size of 300×300. After all the time-frequency diagrams are generated, the time-frequency diagrams of the various signals are fused by the wavelet-transform-based image fusion technique, and the size of the fused time-frequency diagram is adjusted to the image input size of the specific transfer learning model. The fused images under the different fault modes are shown in fig. 14; they serve as the inputs of the proposed fault diagnosis model.
In line with common machine learning practice, accuracy evaluates only part of the model performance, which is therefore measured using various metrics derived from the 2×2 confusion matrix shown in fig. 12.
The invention selects three metrics, precision, recall, and specificity, to evaluate the model. Precision is the proportion of samples predicted as positive that are actually positive; recall is the proportion of actual positive samples that are correctly predicted as positive; specificity is the proportion of actual negative samples that are correctly judged as negative:

$Precision = \dfrac{TP}{TP+FP},\quad Recall = \dfrac{TP}{TP+FN},\quad Specificity = \dfrac{TN}{TN+FP}$

wherein TP (true positive) is the number of samples for which a faulty motor is detected, the detection result is a fault, and the judgment is correct; FP (false positive) is the number of misdiagnoses, i.e., samples for which the fault diagnosis result does not match the actual state of the motor; TN (true negative) is the number of samples for which no motor fault is detected and this judgment is correct; and FN (false negative) is the number of samples that are actually positive but are detected as negative, i.e., motor faults that are not detected.
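A small sketch computing the three evaluation metrics from the confusion-matrix counts defined above; the numeric counts in the example are illustrative only, not the patent's experimental results.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)

# Illustrative counts only
tp, fp, tn, fn = 173, 1, 872, 2
print(f"precision={precision(tp, fp):.4f}  recall={recall(tp, fn):.4f}  "
      f"specificity={specificity(tn, fp):.4f}")
```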
The confusion matrix of the test results of the improved transfer learning model EfficientNetV2-M0 on the test set is shown in FIG. 13, in which the darker the color, the higher the value. From the confusion matrix it can be intuitively seen that the method achieves accurate classification of the various motor fault signals.
Table 3 shows the performance of the improved transfer learning model EfficientNetV2-M0 in terms of precision, recall, and specificity for each motor data category. From the table it can be seen that the model achieves good results in the precision, recall, and specificity of every category.
TABLE 3 Performance of the improved transfer learning model EfficientNetV2-M0
From the confusion matrix and the precision, recall, and specificity results, it can be seen intuitively that the method achieves accurate classification of the various motor fault signals.
The results show that the method can effectively classify motor equipment faults; evaluated on the test set, the identification accuracy of the method across the various working conditions reaches 99.52%.
Embodiment two:
Based on the motor fault diagnosis method based on the improved transfer learning model provided in Embodiment one, this embodiment provides a motor fault diagnosis device based on the improved transfer learning model, which includes:
the signal acquisition module is used for acquiring multi-source heterogeneous signals when the motor operates, and comprises: a three-axis vibration acceleration signal and a three-phase current signal;
The data processing module is used for carrying out continuous wavelet transformation on multi-source heterogeneous signals when the motor operates, respectively generating wavelet time-frequency diagrams, and comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram;
the image fusion module is used for carrying out image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image;
the fault diagnosis module is used for inputting the fusion image into the trained improved transfer learning model EfficientNetV2-M0 and outputting a fault diagnosis result of the motor.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

1. A motor fault diagnosis method based on an improved transfer learning model is characterized by comprising the following steps:
collecting multi-source heterogeneous signals when a motor operates, comprising: a three-axis vibration acceleration signal and a three-phase current signal;
Continuous wavelet transformation is carried out on multi-source heterogeneous signals when a motor operates, and wavelet time-frequency diagrams are respectively generated, and the method comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram;
performing image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image;
And inputting the fusion image into a trained improved transfer learning model EfficientNetV2-M0, and outputting a fault diagnosis result of the motor.
2. The method for diagnosing motor faults based on an improved transfer learning model as claimed in claim 1, wherein the image fusion of the time-frequency diagram of the triaxial vibration acceleration signal and the time-frequency diagram of the three-phase current signal is carried out to obtain a fused image, comprising:
decomposing the time-frequency diagram of the triaxial vibration acceleration signal and the time-frequency diagram of the three-phase current signal by adopting a two-dimensional Mallat algorithm of wavelet transformation to obtain a low-frequency subband coefficient and a high-frequency subband coefficient of the time-frequency diagram of the triaxial vibration acceleration signal and a low-frequency subband coefficient and a high-frequency subband coefficient of the time-frequency diagram of the three-phase current signal;
fusing the subband coefficients of different levels of the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram by adopting a set fusion rule to obtain fused subband coefficients of different levels;
and reconstructing the fused subband coefficients of different levels by adopting the two-dimensional Mallat algorithm to obtain the fusion image.
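The sketch below illustrates the two-dimensional Mallat decomposition and reconstruction step on a single time-frequency image using PyWavelets; the wavelet basis and the single decomposition level are illustrative choices, not values fixed by the claim.

```python
import numpy as np
import pywt

# Minimal sketch of the two-dimensional Mallat (DWT) decomposition and
# reconstruction of one time-frequency image, with an assumed 'haar' wavelet.
img = np.random.rand(256, 256)                   # stand-in for one time-frequency image

# Decomposition: one low-frequency subband (cA) and three high-frequency
# subbands (horizontal cH, vertical cV, diagonal cD).
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')

# Reconstruction from the subband coefficients (unchanged here, so the
# original image is recovered up to numerical error).
rec = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
print(np.allclose(rec, img))                     # True
```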
3. The motor fault diagnosis method based on the improved transfer learning model according to claim 2, wherein fusing the subband coefficients of different levels of the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram by adopting the set fusion rule comprises:
fusing the low-frequency subband coefficients of the triaxial vibration acceleration signal time-frequency diagram and the low-frequency subband coefficients of the three-phase current signal time-frequency diagram by adopting a weighted average method;
and fusing the high-frequency subband coefficients of the triaxial vibration acceleration signal time-frequency diagram and the high-frequency subband coefficients of the three-phase current signal time-frequency diagram by adopting a maximum-absolute-value-of-coefficient method.
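Combining the decomposition and the fusion rules described above, the following sketch fuses two time-frequency images at one decomposition level, using a weighted average for the low-frequency subbands and the maximum-absolute-value rule for the high-frequency subbands; the equal weights and the wavelet basis are illustrative assumptions.

```python
import numpy as np
import pywt

# Minimal sketch of the two fusion rules applied at one decomposition level.
# The 0.5/0.5 weights and the 'haar' wavelet are assumptions for illustration.
def fuse_pair(img_vib, img_cur, wavelet='haar', w_vib=0.5, w_cur=0.5):
    cA1, highs1 = pywt.dwt2(img_vib, wavelet)
    cA2, highs2 = pywt.dwt2(img_cur, wavelet)

    # Low-frequency rule: weighted average of the approximation coefficients.
    cA_f = w_vib * cA1 + w_cur * cA2

    # High-frequency rule: keep the coefficient with the larger absolute value.
    highs_f = tuple(np.where(np.abs(h1) >= np.abs(h2), h1, h2)
                    for h1, h2 in zip(highs1, highs2))

    # Reconstruct the fused image from the fused subband coefficients.
    return pywt.idwt2((cA_f, highs_f), wavelet)

fused = fuse_pair(np.random.rand(128, 128), np.random.rand(128, 128))
print(fused.shape)                               # (128, 128)
```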
4. The motor fault diagnosis method based on the improved transfer learning model according to claim 1, wherein the improved transfer learning model EFFICIENTNETV2-M0 is obtained by optimizing the multiplying factor, introducing a DBB module, introducing an MCA attention mechanism, and improving the loss function on the basis of the M version of the transfer learning model EFFICIENTNETV2;
optimizing the multiplying factor is used for reducing the number of parameters and the computational complexity of the model;
introducing the DBB module is used for improving the feature extraction capability of the model;
introducing the MCA attention mechanism is used for enhancing the expression capability of the learned features and precisely locating the object of interest;
and improving the loss function is used for improving the generalization capability of the model and preventing overfitting during training.
5. The motor fault diagnosis method based on the improved transfer learning model according to claim 4, wherein optimizing the multiplying factor comprises: combining the multiplying factor of the B0 version of the transfer learning model EFFICIENTNETV2 with the network structure of the M version of the transfer learning model EFFICIENTNETV2 to generate the improved transfer learning model EFFICIENTNETV2-M0, which reduces the model parameters and computational complexity relative to EFFICIENTNETV2-M while improving model accuracy.
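As a rough illustration of multiplying-factor optimization, the sketch below scales one stage configuration by width and depth factors with the usual channel rounding; the factor values and the example stage are hypothetical placeholders, not the actual EFFICIENTNETV2-M0 configuration.

```python
import math

# Hedged sketch: apply width/depth multiplying factors to one stage of a
# network configuration. The stage (80 channels, 5 layers) and the factors
# 0.75 / 0.8 are illustrative placeholders only.
def scale_stage(channels, layers, width_factor, depth_factor, divisor=8):
    # Round the scaled channel count to a multiple of `divisor`, a common
    # EfficientNet-style convention.
    c = int(channels * width_factor + divisor / 2) // divisor * divisor
    c = max(divisor, c)
    n = int(math.ceil(layers * depth_factor))
    return c, n

# Shrinking a larger stage with factors below 1 cuts parameters and computation.
print(scale_stage(channels=80, layers=5, width_factor=0.75, depth_factor=0.8))  # (64, 4)
```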
6. The motor fault diagnosis method based on the improved transfer learning model according to claim 5, wherein introducing the DBB module comprises: adding a DBB module to the Fused-MBConv structure of the improved transfer learning model EFFICIENTNETV2-M0, the DBB module containing six transformations in the inference/deployment phase: (1) Conv layer and BN layer combination; (2) branch merging; (3) convolution sequence combination; (4) depth concatenation merging; (5) average pooling conversion; (6) multi-scale convolution conversion.
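The sketch below implements transform (1), folding a BatchNorm layer into the preceding convolution at inference time, which is one of the six DBB re-parameterization transforms; it is a generic PyTorch illustration, not the patent's own code.

```python
import torch
import torch.nn as nn

# Minimal sketch of Conv + BN folding: at inference time the pair is replaced
# by a single convolution with rescaled weights and bias.
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                                  # gamma / sigma
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    with torch.no_grad():
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        fused.bias.copy_(bn.bias + (bias - bn.running_mean) * scale)
    return fused

# Quick check: the fused convolution matches Conv + BN in eval mode.
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.eval()
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))  # True
```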
7. The motor fault diagnosis method based on the improved transfer learning model according to claim 5, wherein introducing the MCA attention mechanism comprises: replacing the SE module based on the SE attention mechanism with the MCA module based on the MCA attention mechanism in the Fused-MBConv structure of the improved transfer learning model EFFICIENTNETV2-M0; the MCA module comprises three branches, which are respectively responsible for capturing feature interdependence along the spatial dimensions W and H and for capturing inter-channel interactions; finally, in an integration stage, all outputs of the three branches are simply averaged, and the attention weights generated for the different dimensions are recalibrated to derive the final refined feature map.
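The following is a simplified three-branch attention sketch in the spirit of the MCA module described above: one branch gates the channel dimension, the other two gate the spatial dimensions H and W, and the branch outputs are averaged; it is an illustrative approximation, not the exact MCA implementation.

```python
import torch
import torch.nn as nn

# Simplified three-branch attention sketch (channel, H, and W branches averaged).
class ThreeBranchAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate_c = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.gate_h = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.gate_w = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    @staticmethod
    def _branch(x, gate):
        # x: (N, D, A, B) where D is the dimension being gated.
        s = x.mean(dim=(2, 3))                                 # squeeze -> (N, D)
        a = torch.sigmoid(gate(s.unsqueeze(1).unsqueeze(-1)))  # (N, 1, D, 1)
        return x * a.squeeze(1).unsqueeze(-1)                  # broadcast over A, B

    def forward(self, x):                                      # x: (N, C, H, W)
        out_c = self._branch(x, self.gate_c)
        out_h = self._branch(x.permute(0, 2, 1, 3), self.gate_h).permute(0, 2, 1, 3)
        out_w = self._branch(x.permute(0, 3, 2, 1), self.gate_w).permute(0, 3, 2, 1)
        return (out_c + out_h + out_w) / 3.0                   # average the three branches

att = ThreeBranchAttention()
print(att(torch.randn(2, 16, 8, 8)).shape)                     # torch.Size([2, 16, 8, 8])
```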
8. The motor fault diagnosis method based on the improved transfer learning model of claim 5, wherein improving the loss function comprises:
the original cross-entropy loss function is improved by using Label Smoothing and the Focal loss function; the probability distribution after label smoothing is:
$q_i = (1-\varepsilon)\,y_i + \dfrac{\varepsilon}{K}$
wherein $q_i$ is the probability distribution after label smoothing; $\varepsilon$ is a hyperparameter; $K$ is the total number of classes in the multi-class classification; $i$ is a certain class in the multi-class classification, and $y_i$ is the true label;
the formula of the Focal loss is:
$FL(p_t) = -\alpha\,(1-p_t)^{\gamma}\log(p_t)$
wherein $p_t$ is the probability that the model predicts the positive class; $FL(p_t)$ is the Focal loss at predicted positive-class probability $p_t$; $\gamma$ is the modulation factor, and $\alpha$ is the balance coefficient.
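A minimal sketch of one way to combine label smoothing with focal modulation in a single loss is given below; the values of epsilon, gamma, and alpha are illustrative defaults, and the exact combination used in the patent may differ.

```python
import torch
import torch.nn.functional as F

# Minimal sketch: cross entropy with label smoothing plus focal down-weighting
# of easy samples. eps, gamma and alpha are illustrative defaults.
def smoothed_focal_loss(logits, target, eps=0.1, gamma=2.0, alpha=0.25):
    num_classes = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp()

    # Label smoothing: (1 - eps) on the true class plus eps / K on every class.
    with torch.no_grad():
        q = torch.full_like(log_p, eps / num_classes)
        q.scatter_(1, target.unsqueeze(1), 1.0 - eps + eps / num_classes)

    # Focal modulation: down-weight well-classified samples by (1 - p)^gamma.
    focal_weight = alpha * (1.0 - p).pow(gamma)
    loss = -(q * focal_weight * log_p).sum(dim=1)
    return loss.mean()

logits = torch.randn(4, 8)                  # 4 samples, 8 fault classes
target = torch.randint(0, 8, (4,))
print(smoothed_focal_loss(logits, target))
```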
9. The motor fault diagnosis method based on the improved transfer learning model according to claim 1, wherein the training method of the improved transfer learning model EFFICIENTNETV2-M0 comprises:
obtaining a source domain $D_s$ using the ImageNet dataset and a target domain $D_t$; all initial weights of the improved transfer learning model EFFICIENTNETV2-M0 are transferred, via transfer learning, from a model pre-trained and optimized on the ImageNet dataset;
the training process of the improved transfer learning model EFFICIENTNETV2-M0 comprises: integrating the fused images into the previously trained network; training all models with the same hyperparameters and preprocessing them with the same data; and adjusting the learning rate with an exponential decay strategy, wherein the batch size is 8, the maximum number of epochs is limited to 100, the initial learning rate is set to 0.01 and decays to 0.0001 after 100 iterations, and the model whose loss function converges stably is selected as the final classification model.
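The sketch below reproduces the stated schedule with PyTorch's ExponentialLR: a per-epoch decay factor chosen so the learning rate falls from 0.01 to roughly 0.0001 over 100 epochs; the optimizer choice and the stand-in model are assumptions for illustration.

```python
import torch

# Minimal sketch of the training schedule: 100 epochs, initial lr 0.01,
# exponential decay down to ~0.0001. The SGD optimizer and the dummy linear
# model are stand-ins, not the patent's EFFICIENTNETV2-M0.
model = torch.nn.Linear(10, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

epochs = 100
gamma = (0.0001 / 0.01) ** (1.0 / epochs)            # ~0.955 per-epoch decay factor
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

for epoch in range(epochs):
    # ... one pass over a DataLoader with batch_size=8 would go here ...
    optimizer.step()                                  # placeholder optimization step
    scheduler.step()

print(round(optimizer.param_groups[0]['lr'], 6))      # ~0.0001 after 100 epochs
```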
10. A motor fault diagnosis device based on an improved transfer learning model, comprising:
the signal acquisition module is used for acquiring multi-source heterogeneous signals when the motor operates, and comprises: a three-axis vibration acceleration signal and a three-phase current signal;
The data processing module is used for carrying out continuous wavelet transformation on multi-source heterogeneous signals when the motor operates, respectively generating wavelet time-frequency diagrams, and comprises the following steps: a triaxial vibration acceleration signal time-frequency diagram and a three-phase current signal time-frequency diagram;
the image fusion module is used for carrying out image fusion on the triaxial vibration acceleration signal time-frequency diagram and the three-phase current signal time-frequency diagram to obtain a fusion image;
the fault diagnosis module is used for inputting the fusion image into the trained improved transfer learning model EFFICIENTNETV2-M0 and outputting a fault diagnosis result of the motor.
CN202410354193.4A 2024-03-27 2024-03-27 Motor fault diagnosis method and device based on improved transfer learning model Active CN117949823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410354193.4A CN117949823B (en) 2024-03-27 2024-03-27 Motor fault diagnosis method and device based on improved transfer learning model

Publications (2)

Publication Number Publication Date
CN117949823A true CN117949823A (en) 2024-04-30
CN117949823B CN117949823B (en) 2024-05-31

Family

ID=90792510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410354193.4A Active CN117949823B (en) 2024-03-27 2024-03-27 Motor fault diagnosis method and device based on improved transfer learning model

Country Status (1)

Country Link
CN (1) CN117949823B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861787A (en) * 2021-03-09 2021-05-28 上海电力大学 Fault diagnosis method for planetary gear box of wind turbine generator
KR102471427B1 (en) * 2021-07-14 2022-11-28 (주)주인정보시스템 Predictive diagnosis system for mechanical and electrical failures through motor noise and thermal distribution analysis
CN116561655A (en) * 2023-05-06 2023-08-08 苏州汇川控制技术有限公司 Fault diagnosis method, device, equipment and storage medium for motor current signal
CN116821730A (en) * 2023-08-30 2023-09-29 北京科锐特科技有限公司 Fan fault detection method, control device and storage medium
CN116911378A (en) * 2023-07-10 2023-10-20 国网湖南省电力有限公司电力科学研究院 Hydropower unit fault diagnosis method and system based on transfer learning and PSO algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Xiaoping, Xu Wenbo, Liu Tao, Shao Fan: "Fault diagnosis method for small-sample gearboxes based on transfer learning", Measurement & Control Technology, vol. 42, no. 06, 30 June 2023 (2023-06-30), pages 52-77 *

Also Published As

Publication number Publication date
CN117949823B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
Surendran et al. Deep learning based intelligent industrial fault diagnosis model.
Luo et al. A case study of conditional deep convolutional generative adversarial networks in machine fault diagnosis
Wang et al. Motor fault diagnosis based on short-time Fourier transform and convolutional neural network
Rodriguez et al. Deep adaptive wavelet network
CN110657984B (en) Planetary gearbox fault diagnosis method based on reinforced capsule network
CN110705525A (en) Method and device for diagnosing rolling bearing fault
Qian et al. An intelligent fault diagnosis method for rolling bearings based on feature transfer with improved DenseNet and joint distribution adaptation
Wang et al. Denoising auto-encoding priors in undecimated wavelet domain for MR image reconstruction
CN115962946A (en) Bearing fault diagnosis method based on improved WGAN-GP and Alxnet
CN113139904B (en) Image blind super-resolution method and system
CN113951830B (en) Brain disease classification method based on 3D attention convolution and self-supervision learning
CN115578427A (en) Unsupervised single-mode medical image registration method based on deep learning
Tripathi Facial image noise classification and denoising using neural network
CN117036380A (en) Brain tumor segmentation method based on cascade transducer
Zuluaga et al. Blind microscopy image denoising with a deep residual and multiscale encoder/decoder network
Yu et al. SKND-TSACNN: A novel time-scale adaptive CNN framework for fault diagnosis of rotating machinery
Li et al. Dconformer: A denoising convolutional transformer with joint learning strategy for intelligent diagnosis of bearing faults
CN116597167B (en) Permanent magnet synchronous motor small sample demagnetization fault diagnosis method, storage medium and system
Chen et al. Attention mechanism feedback network for image super-resolution
CN117949823B (en) Motor fault diagnosis method and device based on improved transfer learning model
Fan et al. RGB-D indoor semantic segmentation network based on wavelet transform
Lee et al. Discrete cosine transformed images are easy to recognize in vision transformers
He et al. Network lightweight method based on knowledge distillation is applied to RV reducer fault diagnosis
CN115861062A (en) Multi-scale learning wavelet attention mechanism network and image super-resolution reconstruction method
Gao et al. Random noise suppression of seismic data through multi-scale residual dense network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant