CN117672255B - Abnormal equipment identification method and system based on artificial intelligence and equipment operation sound - Google Patents

Abnormal equipment identification method and system based on artificial intelligence and equipment operation sound

Info

Publication number
CN117672255B
CN117672255B (Application CN202311768501.XA)
Authority
CN
China
Prior art keywords
frequency domain
feature
audio frame
features
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311768501.XA
Other languages
Chinese (zh)
Other versions
CN117672255A (en)
Inventor
黄毅伟
史超
邢子龙
樊燊
李少洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Disheng Technology Co ltd
Original Assignee
Beijing Disheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Disheng Technology Co ltd filed Critical Beijing Disheng Technology Co ltd
Priority to CN202311768501.XA priority Critical patent/CN117672255B/en
Publication of CN117672255A publication Critical patent/CN117672255A/en
Application granted granted Critical
Publication of CN117672255B publication Critical patent/CN117672255B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses an abnormal equipment identification method and system based on artificial intelligence and equipment operation sound, comprising the following steps: first, the operation sound of the equipment to be determined is acquired and parsed into audio frame feature vectors, and the matched frequency domain features are obtained through further analysis. Frequency spectra are constructed from these features, and feature integration is performed to obtain joint feature vectors. The corresponding joint feature vectors are merged to obtain target audio frame feature vectors, and the equipment operation state is detected from the target audio frame feature vectors. If the equipment state is abnormal, the equipment is marked as abnormal. With this design, the operation state of equipment can be identified automatically and quickly without frequent manual monitoring, which greatly improves the efficiency of equipment detection, prevents potential faults in advance, and avoids the high maintenance or replacement costs caused by equipment damage.

Description

Abnormal equipment identification method and system based on artificial intelligence and equipment operation sound
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an abnormal equipment identification method and system based on artificial intelligence and equipment operation sound.
Background
In the prior art, the running state of equipment is mainly judged by physical inspection and manual observation. This approach is not only inefficient, but also limited in accuracy. With the development of artificial intelligence and acoustic analysis techniques, machine learning algorithms are widely used in anomaly detection and prediction, including detection of device operating conditions. However, these methods typically require a large amount of computational resources and complex preprocessing steps, as well as a high degree of expertise in acoustic feature extraction and parsing. Furthermore, conventional methods may not accurately identify anomalies from the operating sound of the device because they typically focus on only a single frequency domain feature, ignoring other important information that may reveal device problems. Therefore, there is a need to develop a new method capable of effectively and accurately identifying equipment anomalies.
Disclosure of Invention
The invention aims to provide an abnormal equipment identification method and system based on artificial intelligence and equipment operation sound.
In a first aspect, an embodiment of the present invention provides a method for identifying an abnormal device based on artificial intelligence and device operation sound, where the method includes:
acquiring operation sound of the equipment to be determined, analyzing the operation sound of the equipment to be determined to obtain each audio frame, extracting the characteristics of each audio frame, and obtaining the characteristic vector of each audio frame;
Analyzing each audio frame feature vector respectively to obtain a first frequency domain feature matched with each audio frame feature vector to form a first frequency domain feature set, and obtaining a second frequency domain feature matched with each audio frame feature vector to form a second frequency domain feature set;
Constructing a first frequency spectrum corresponding to the first frequency domain feature set according to the pearson correlation coefficient between each first frequency domain feature in the first frequency domain feature set, and constructing a second frequency spectrum corresponding to the second frequency domain feature set according to the sequence of each audio frame;
Performing feature integration operation according to the first frequency domain features in the first frequency spectrum and the neighborhood features corresponding to the first frequency domain features to obtain first joint feature vectors respectively matched with each first frequency domain feature in the first frequency domain feature set, and performing feature integration operation according to the second frequency domain features in the second frequency spectrum and the neighborhood features corresponding to the second frequency domain features to obtain second joint feature vectors respectively matched with each second frequency domain feature in the second frequency domain feature set;
Performing merging operation according to the first joint feature vector and the second joint feature vector which are matched with the same audio frame feature vector, obtaining target audio frame feature vectors matched with each audio frame feature vector, and performing equipment running state detection according to the target audio frame feature vectors matched with each audio frame feature vector, so as to obtain equipment running states corresponding to the to-be-determined equipment running sound;
and when the equipment operation state represents equipment abnormality, marking the target equipment corresponding to the operation sound of the equipment to be determined as abnormal equipment.
In a second aspect, an embodiment of the present invention provides a server system, including a server, where the server is configured to perform the method described in the first aspect.
Compared with the prior art, the invention has the following beneficial effects: with the abnormal equipment identification method and system based on artificial intelligence and equipment operation sound disclosed by the invention, the operation sound of the equipment to be determined is acquired and parsed into audio frame feature vectors, and the matched frequency domain features are obtained through further analysis. Frequency spectra are constructed from these features, and feature integration is performed to obtain joint feature vectors. The corresponding joint feature vectors are merged to obtain target audio frame feature vectors, and the equipment operation state is detected from the target audio frame feature vectors. If the equipment state is abnormal, the equipment is marked as abnormal. With this design, the operation state of equipment can be identified automatically and quickly without frequent manual monitoring, which greatly improves the efficiency of equipment detection, prevents potential faults in advance, and avoids the high maintenance or replacement costs caused by equipment damage.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described. It is appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. Other relevant drawings may be made by those of ordinary skill in the art without undue burden from these drawings.
FIG. 1 is a schematic block diagram of a step flow of an abnormal device identification method based on artificial intelligence and device operation sound provided by an embodiment of the invention;
fig. 2 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
In order to solve the foregoing technical problems in the background art, fig. 1 is a schematic flow chart of an abnormal device identification method based on artificial intelligence and device operation sound according to an embodiment of the present disclosure, and the abnormal device identification method based on artificial intelligence and device operation sound is described in detail below.
Step S201, acquiring operation sound of the undetermined equipment, analyzing the operation sound of the undetermined equipment to obtain each audio frame, extracting characteristics of each audio frame, and obtaining characteristic vectors of each audio frame;
Step S202, analyzing each audio frame feature vector to obtain a first frequency domain feature matched with each audio frame feature vector, forming a first frequency domain feature set, and obtaining a second frequency domain feature matched with each audio frame feature vector, forming a second frequency domain feature set;
Step S203, constructing a first frequency spectrum corresponding to the first frequency domain feature set according to pearson correlation coefficients between each first frequency domain feature in the first frequency domain feature set, and constructing a second frequency spectrum corresponding to the second frequency domain feature set according to the sequence of each audio frame;
Step S204, performing feature integration operation according to the first frequency domain features in the first frequency spectrum and the neighborhood features corresponding to the first frequency domain features to obtain first joint feature vectors respectively matched with each first frequency domain feature in the first frequency domain feature set, and performing feature integration operation according to the second frequency domain features in the second frequency spectrum and the neighborhood features corresponding to the second frequency domain features to obtain second joint feature vectors respectively matched with each second frequency domain feature in the second frequency domain feature set;
step S205, a merging operation is carried out according to the first joint feature vector and the second joint feature vector which are respectively matched with the same audio frame feature vector, so as to obtain target audio frame feature vectors respectively matched with each audio frame feature vector, and equipment operation state detection is carried out according to the target audio frame feature vectors respectively matched with each audio frame feature vector, so as to obtain equipment operation states corresponding to the undetermined equipment operation sound;
And step S206, when the equipment operation state represents equipment abnormality, marking the target equipment corresponding to the operation sound of the equipment to be determined as abnormal equipment.
In an embodiment of the invention, there are, for example, multiple machines in an industrial plant, such as presses, drills and conveyor belts. Microphones are installed to capture the sounds produced by these machines in real time while they are running. Each audio frame (typically a few milliseconds or tens of milliseconds long) is transformed into a frequency domain representation by a Fourier transform, and key features are extracted, such as spectral energy, spectral shape and frequency distribution. These features are grouped into an audio frame feature vector. By further analyzing each audio frame feature vector, a first frequency domain feature, such as the dominant frequency or the distribution of spectral energy, and a second frequency domain feature, such as the phase difference or the spectral slope, can be extracted. These features are used for subsequent matching and analysis. The first frequency spectrum can be constructed from the correlations between the features in the first frequency domain feature set; for example, by calculating pearson correlation coefficients between different features, a matrix representing the correlation of the different features is obtained as the first spectrum. For the second frequency domain feature set, a second spectrum can be constructed from the sequence of audio frames to represent the correlations between the audio frames. Based on the features in the first and second spectra, a feature integration operation can be performed to obtain first and second joint feature vectors. For example, for the first joint feature vector, each first frequency domain feature and its neighborhood features can be combined to form a composite feature vector; similarly, for the second joint feature vector, the second frequency domain feature and its neighborhood features can be combined to obtain another integrated feature vector. The first joint feature vector and the second joint feature vector matched with each audio frame feature vector are then merged to obtain a target audio frame feature vector, which contains information of multiple dimensions and reflects the various characteristics of the audio frame. The running state of the equipment is detected using the target audio frame feature vector. The operating state of the device can be judged by comparing the target feature vector with the feature vector of a known normal device used as a reference; for example, if the difference between the target feature vector and the normal device feature vector exceeds a certain threshold, this may indicate that the device is abnormal. When the equipment operation state is judged to be abnormal, the target equipment corresponding to the operation sound of the equipment to be determined is marked as abnormal equipment. This helps maintenance personnel or systems discover and handle equipment failures in a timely manner. Marking a device as abnormal can be done by recording or flagging the corresponding device state in a database or a real-time monitoring system.
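The framing and Fourier-transform feature extraction described above can be illustrated with a minimal Python sketch. The function names (frame_signal, extract_frame_features), the frame and hop lengths and the particular spectral descriptors are illustrative assumptions, not values prescribed by the patent.

```python
# Minimal sketch of the framing + Fourier-transform feature extraction described above.
# All names and the particular descriptors are illustrative assumptions.
import numpy as np

def frame_signal(signal, sr, frame_ms=25.0, hop_ms=10.0):
    """Split a mono signal into overlapping frames of a few tens of milliseconds."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    return np.stack([signal[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)])

def extract_frame_features(frame, sr):
    """Fourier-transform one frame and collect simple spectral descriptors
    (total spectral energy, dominant frequency, spectral centroid, band energies)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    energy = np.sum(spectrum ** 2)
    dominant_freq = freqs[np.argmax(spectrum)]
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    bands = [np.sum(spectrum[(freqs >= lo) & (freqs < hi)] ** 2)
             for lo, hi in [(0, 500), (500, 2000), (2000, 8000)]]
    return np.array([energy, dominant_freq, centroid, *bands])

sr = 16000
sound = np.random.randn(sr)  # stand-in for one second of captured machine sound
frames = frame_signal(sound, sr)
frame_vectors = np.stack([extract_frame_features(f, sr) for f in frames])  # one vector per audio frame
```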
In a more detailed embodiment, consider, for example, an industrial production line with a press used for stamping metal parts. Microphones are installed to capture the sounds produced when the press is in operation, and abnormal equipment identification is performed using the method described in the previous schemes. During operation of the press, the acoustic signals it generates are captured by the microphone. Each audio frame is analyzed and features such as spectral energy and frequency distribution are extracted to form an audio frame feature vector. A first frequency domain feature, such as the dominant frequency or the spectral energy distribution, and a second frequency domain feature, such as the phase difference or the spectral slope, are extracted from the audio frame feature vector. A first spectrum is constructed from the pearson correlation coefficients between the first frequency domain features, representing the correlation between the different features. A second spectrum is constructed from the sequence of audio frames, representing the correlation between different audio frames. A feature integration operation is performed based on the features in the first and second spectra to obtain a first joint feature vector and a second joint feature vector. The first joint feature vector and the second joint feature vector matched with each audio frame feature vector are merged to obtain the target audio frame feature vector. The target audio frame feature vector is compared with the feature vector of a known normal press, and the running state of the equipment is judged by a classifier or a rule engine. If the difference exceeds a preset threshold, an abnormality in the press may be indicated. When the operating state of the press is judged to be abnormal, the equipment to be determined (i.e., the press) is marked as abnormal equipment. This may be recorded in a database or in a real-time monitoring system so that maintenance personnel or the system can take timely action to repair or replace it. With this design, the method can be applied to industrial equipment identification scenarios, and abnormal equipment can be accurately marked to promote timely response to and treatment of equipment faults.
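The threshold comparison against a known-normal reference mentioned in this example can be sketched as follows. The distance metric, the per-dimension normalization and the threshold value are assumptions for illustration, not the patent's prescribed rule.

```python
# Hedged sketch of comparing a target audio frame feature vector against the feature
# vector of a known normal press; metric and threshold are illustrative assumptions.
import numpy as np

def is_abnormal(target_vec, normal_ref, threshold=3.0):
    """Flag a frame when its normalized Euclidean distance to the reference exceeds the threshold."""
    scale = np.abs(normal_ref) + 1e-12           # crude per-dimension normalization
    distance = np.linalg.norm((target_vec - normal_ref) / scale)
    return distance > threshold

# Usage idea: a device could be marked abnormal when most recent frames are flagged, e.g.
#   flags = [is_abnormal(v, reference_vector) for v in recent_frame_vectors]
#   device_abnormal = sum(flags) > 0.5 * len(flags)
# (reference_vector and recent_frame_vectors are hypothetical arrays prepared elsewhere.)
```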
In one possible implementation, the foregoing step S201 may be implemented by the following example execution.
(1) Extracting the signal characteristics of each audio frame to obtain each audio signal characteristic;
(2) Acquiring a sequence of each audio frame, and executing feature extraction operation on the sequence of each audio frame to obtain each audio time sequence feature;
(3) And executing integration operation on each audio signal characteristic and the corresponding audio time sequence characteristic to obtain each audio frame characteristic vector.
In an embodiment of the present invention, for each audio frame, its signal characteristics, such as amplitude, energy and spectrum shape, can be extracted. These signal characteristics can be used to represent local characteristics of the audio frame. For example, in the sound collection process of a press, a time domain analysis method, such as root mean square (RMS) energy, can be used to extract the signal characteristics of each audio frame; by calculating the energy of each audio frame, the sound intensity of each audio frame can be known. Across the whole audio signal, the audio is split into a series of consecutive audio frames, and feature extraction operations can be performed on the sequence of these audio frames. Taking the press as an example, the acquired sound signal can be divided into a series of consecutive audio frames, and some audio timing characteristics can then be extracted from this sequence, such as the time interval between audio frames, the duration of the audio frames and the average energy of the audio frames. These timing characteristics provide dynamic information about the audio signal. For each audio frame, the signal characteristics and the corresponding audio timing characteristics are integrated to obtain the feature vector of that audio frame. Continuing the press example, the signal characteristics (e.g., amplitude, spectral shape) of each audio frame can be integrated with the corresponding audio timing characteristics (e.g., time interval, average energy). By combining these features, a more comprehensive audio frame feature vector can be obtained to describe the time and frequency domain characteristics of the audio frame.
By the design, the signal characteristic and the corresponding audio time sequence characteristic of each audio frame are extracted, and the signal characteristic and the corresponding audio time sequence characteristic are integrated, so that the characteristic vector of each audio frame is finally obtained. This allows more accurate characterization of the audio characteristics of each audio frame, providing more valuable information for subsequent device state detection and anomaly identification.
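A compact sketch of step S201 as described above, concatenating per-frame signal features with timing features derived from the frame sequence. RMS energy, peak amplitude, frame start time and a 5-frame moving average of energy are illustrative choices, not the only features the method allows.

```python
# Minimal sketch: per-frame signal features concatenated with timing features derived
# from the frame sequence. The particular features are illustrative assumptions.
import numpy as np

def frame_feature_vectors(frames, hop_s=0.010):
    """frames: (n_frames, frame_len) array of audio frames; hop_s: hop between frames in seconds."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))                 # signal feature: RMS energy
    peak = np.max(np.abs(frames), axis=1)                       # signal feature: peak amplitude
    t_offset = np.arange(len(frames)) * hop_s                   # timing feature: frame start time
    avg_energy = np.convolve(rms, np.ones(5) / 5, mode="same")  # timing feature: local average energy
    return np.column_stack([rms, peak, t_offset, avg_energy])   # one feature vector per audio frame
```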
In one possible implementation, the aforementioned step S203 may be implemented by the following example execution.
(1) Calculating characteristic pearson correlation coefficients among the first frequency domain features, and determining frequency point relevance among the first frequency domain features according to the characteristic pearson correlation coefficients;
(2) And respectively taking each first frequency domain feature as a frequency point, and correlating each first frequency domain feature according to the frequency point correlation to obtain the first frequency spectrum.
In an embodiment of the invention, for example, taking a press as an example, it is assumed that a set of first frequency domain features, such as dominant frequencies, distribution of spectral energy, etc., have been extracted from the audio signal. Next, a first spectrum may be constructed as follows: the pearson correlation coefficients between all the first frequency domain features may be calculated for them. The pearson correlation coefficient measures the degree of linear correlation between two variables, ranging from-1 to 1, where 1 represents a complete positive correlation, -1 represents a complete negative correlation, and 0 represents no correlation. In this scenario, the degree of correlation between each first frequency domain feature may be calculated using pearson correlation coefficients. In this way, the frequency point correlation between each of the first frequency domain features can be determined. Based on the calculated characteristic pearson correlation coefficients, each first frequency domain characteristic may be regarded as a frequency point and associated according to the association between the frequency points. This may form a first spectrum. For example, in the press scenario, assume that there are three first frequency domain features: frequency point A, frequency point B and frequency point C. By calculating the characteristic pearson correlation coefficient between them, the frequency point correlation between them can be determined. If the frequency point A and the frequency point B have higher correlation coefficients and the frequency point A and the frequency point C have lower correlation coefficients, the frequency point A and the frequency point B can be correlated to form a first frequency spectrum.
By the design, the characteristic pearson correlation coefficient between each first frequency domain characteristic is calculated, and a corresponding first frequency spectrum is constructed according to the frequency point correlation. This allows a more comprehensive description of the correlation between the first frequency domain features, providing more information for subsequent abnormal device identification and fault detection.
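The construction of the first frequency spectrum from pearson correlation coefficients can be sketched as a correlation and adjacency matrix over the first frequency domain features. The 0.5 linking threshold is an assumption made for illustration only.

```python
# Sketch: each first frequency domain feature is a frequency point; points are linked
# when the pearson correlation of their values over the frames is strong.
import numpy as np

def build_first_spectrum(first_features, link_threshold=0.5):
    """first_features: (n_frames, n_features) values of each first frequency domain
    feature per audio frame. Returns the correlation matrix and a boolean adjacency."""
    corr = np.corrcoef(first_features.T)        # pearson correlation between features
    adjacency = np.abs(corr) >= link_threshold  # frequency point relevance
    np.fill_diagonal(adjacency, False)          # a point is not its own neighbor
    return corr, adjacency
```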
In one possible implementation, the aforementioned step S203 may be implemented by the following example execution.
Determining sequence indexes of the second frequency domain features matched with the feature vectors of each audio frame according to the sequence of each audio frame, and determining sequence relations among the second frequency domain features in the second frequency domain feature set according to the sequence indexes;
And respectively taking each second frequency domain feature as a frequency point, and correlating each second frequency domain feature according to the sequence relation to obtain the second frequency spectrum.
In the embodiment of the present invention, taking a press as an example, it is assumed that a feature vector of each audio frame has been extracted from an audio signal. Next, a second spectrum may be constructed as follows: for each feature vector of an audio frame, the sequence index of the second frequency domain feature it matches may be determined from its position in the sequence of audio frames. In this way an association between the audio frame and the second frequency domain feature can be established. For example, in the press scenario, a series of consecutive audio frames is assumed, where each audio frame has a corresponding feature vector. By matching each feature vector to its position in the sequence of audio frames, a sequence index of the second frequency domain feature to which each feature vector matches can be determined. For example, in the press scenario, assume that there is a set of second frequency domain features: x frequency point, Y frequency point and Z frequency point. By matching the feature vector with the sequence index of the second frequency domain feature, the sequence relationship between them can be determined. If the feature vector A matches the X frequency point, the feature vector B matches the Y frequency point, and the feature vector C matches the Z frequency point, then the sequence relationship of the feature vectors A, B and C in the second frequency domain feature set can be determined. Based on the calculated sequence relationships, each of the second frequency domain features may be regarded as frequency points and associated according to the sequence relationships. This may form a second spectrum. In the example of the press machine described above, the X frequency point, the Y frequency point, and the Z frequency point may be regarded as frequency points, and they may be associated according to the sequence relationship therebetween. For example, if the feature vector a matches the X frequency point, the feature vector B matches the Y frequency point, and the feature vector C matches the Z frequency point, the X frequency point, the Y frequency point, and the Z frequency point may be correlated to form a second spectrum. And determining the sequence index of each matched second frequency domain feature through the sequence of each audio frame, and determining the sequence relation between each second frequency domain feature in the second frequency domain feature set according to the sequence index. Then, each second frequency domain feature is respectively used as a frequency point, and the frequency points are associated according to a sequence relation, so that a second frequency spectrum is constructed. This allows a more comprehensive description of the correlation between the audio frame and the second frequency domain feature, providing more information for subsequent abnormal device identification and fault detection.
For example, assume that the operating state of an industrial machine is being monitored. The sound signal of the device is converted into a series of audio frames and feature vectors for each audio frame are extracted. These feature vectors may include information on sound intensity, frequency distribution, etc.
By performing feature extraction and sequence index matching, a sequence index of the second frequency domain feature corresponding to each audio frame feature vector can be determined. For example, if a certain audio frame feature vector a corresponds to the 3 rd feature in the second frequency domain feature set, the audio frame feature vector B corresponds to the 1 st feature in the second frequency domain feature set, and the audio frame feature vector C corresponds to the 2 nd feature in the second frequency domain feature set, then the sequence relationship between them is obtained.
Then, each second frequency domain feature is used as a frequency point and is associated according to a sequence relation. In the above example, the third feature may be associated with frequency point A, the first feature with frequency point B, and the second feature with frequency point C. Thus, a second spectrum is constructed.
With this design, a second frequency spectrum corresponding to the second frequency domain feature set can be constructed according to the sequence of each audio frame, as described in the steps above. In this way, the sequence relation between the audio frame feature vectors and the second frequency domain features can be shown more concretely, providing more valuable information for subsequent abnormal equipment identification and fault detection.
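A small sketch of the second spectrum, where the linking simply follows the audio-frame order; connecting each feature to the features of its preceding and following frames is one plausible reading of the sequence relation, used here as an assumption.

```python
# Sketch: second frequency domain features taken as frequency points and linked by the
# order of the audio frames they belong to (consecutive frames connected).
import numpy as np

def build_second_spectrum(n_points):
    adjacency = np.zeros((n_points, n_points), dtype=bool)
    idx = np.arange(n_points - 1)
    adjacency[idx, idx + 1] = True   # link feature of frame i to feature of frame i+1
    adjacency[idx + 1, idx] = True
    return adjacency
```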
In one possible implementation, the foregoing step S204 may be implemented by the following example execution.
Calculating the mean value characteristic of the neighborhood characteristic corresponding to the first frequency domain characteristic to obtain a first mean value characteristic, and calculating the deviation characteristic between the first frequency domain characteristic and the neighborhood characteristic corresponding to the first frequency domain characteristic to obtain a first deviation characteristic;
Performing a merging operation on the first frequency domain feature, the first deviation feature and the first mean feature to obtain a first merged feature, and performing a complete coupling operation according to the first merged feature to obtain a first joint feature vector corresponding to the first frequency domain feature;
And polling each first frequency domain feature in the first frequency spectrum to obtain a first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set.
In an embodiment of the invention, taking a press as an example, it is assumed that a first frequency spectrum has been constructed according to the previous steps and that a set of first frequency domain features, such as dominant frequencies and the spectral energy distribution, has been obtained. A feature integration operation can then be performed to obtain the first joint feature vector according to the following steps: for each first frequency domain feature, the mean value of its neighborhood features can be calculated. The neighborhood features may be the features of neighboring frequency points, or features within a time window. By calculating the average value of the neighborhood features, a first mean feature representing the overall trend of the neighborhood features is obtained. Between the first frequency domain feature and its corresponding neighborhood features, the difference or deviation between them can be calculated; in this way a first deviation feature reflecting the relative change of the feature within the neighborhood is obtained. The first frequency domain feature, the first deviation feature and the first mean feature are then merged to obtain a first merged feature that jointly considers the frequency domain feature, its deviation and the trend of the neighborhood features. A complete coupling operation (fully connected operation) is performed using the first merged feature as input to obtain a first joint feature vector representing the overall information of the first frequency domain feature and its neighborhood features.
And performing feature integration operation on each first frequency domain feature in the first frequency spectrum and the corresponding neighborhood feature to obtain a first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set. In this way, the related information among different features can be fused, and a more comprehensive feature representation is provided for subsequent abnormal equipment identification and fault detection.
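The feature integration above (neighborhood mean, deviation, merge, complete coupling) can be sketched as follows; the same routine applies unchanged to the second spectrum in the next sub-steps. The 16-dimensional output and the random tanh fully connected layer are illustrative stand-ins for a trained layer.

```python
# Hedged sketch of step S204: [feature, deviation, neighborhood mean] merged and passed
# through a fully connected (complete coupling) layer. Weights here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def joint_feature_vectors(features, adjacency, out_dim=16):
    """features: (n_points, d) per-frequency-point features; adjacency: boolean spectrum."""
    n, d = features.shape
    weights = rng.standard_normal((3 * d, out_dim)) / np.sqrt(3 * d)
    joint = np.empty((n, out_dim))
    for i in range(n):
        neigh = features[adjacency[i]] if adjacency[i].any() else features[[i]]
        mean_feat = neigh.mean(axis=0)                                # mean feature
        dev_feat = features[i] - mean_feat                            # deviation feature
        merged = np.concatenate([features[i], dev_feat, mean_feat])   # merged feature
        joint[i] = np.tanh(merged @ weights)                          # complete coupling
    return joint
```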
In one possible implementation, the foregoing step S204 may be implemented by the following example execution.
(1) Calculating the mean value characteristic of the neighborhood characteristic corresponding to the second frequency domain characteristic to obtain a second mean value characteristic, and calculating the deviation characteristic between the second frequency domain characteristic and the neighborhood characteristic corresponding to the second frequency domain characteristic to obtain a second deviation characteristic;
(2) Performing a merging operation on the second frequency domain feature, the second deviation feature and the second mean feature to obtain a second merged feature, and performing a complete coupling operation according to the second merged feature to obtain a second joint feature vector corresponding to the second frequency domain feature;
(3) And polling each second frequency domain feature in the second frequency spectrum to obtain a second joint feature vector matched with each second frequency domain feature in the second frequency domain feature set.
In an embodiment of the invention, it is assumed, by way of example, that the operating state of a wind turbine is being monitored. The vibration signals generated by the wind turbine are converted into a series of audio frames, and the feature vector of each audio frame is extracted. After the frequency spectra have been constructed, frequency domain features such as vibration intensity and frequency distribution are available. A feature integration operation can then be performed to obtain the second joint feature vector according to the following steps: for each second frequency domain feature, the mean of its neighborhood features can be calculated. The neighborhood features may be the features of neighboring frequency points, or features within a time window. Calculating the average value of the neighborhood features yields a second mean feature representing the overall trend of the neighborhood features. Between the second frequency domain feature and its corresponding neighborhood features, the difference or deviation between them can be calculated; this yields a second deviation feature reflecting the relative variation of the feature within the neighborhood. The second frequency domain feature, the second deviation feature and the second mean feature are then merged to obtain a second merged feature that jointly considers the frequency domain feature, its deviation and the trend of the neighborhood features. A complete coupling operation (fully connected operation) is performed using the second merged feature as input to obtain a second joint feature vector representing the overall information of the second frequency domain feature and its neighborhood features.
And the feature integration operation is performed on each second frequency domain feature in the second frequency spectrum and its corresponding neighborhood features to obtain second joint feature vectors matched with each second frequency domain feature in the second frequency domain feature set. In this way, the related information among different features can be fused, providing a more comprehensive feature representation for subsequent abnormal equipment identification and fault detection.
In one possible implementation, the aforementioned step S205 may be implemented by the following example execution.
(1) Acquiring a first data enhancement factor, and applying a nonlinear conversion function to a first joint feature vector which is matched with each first frequency domain feature in the first frequency domain feature set according to the first data enhancement factor to obtain first data enhancement features which are matched with each first frequency domain feature in the first frequency domain feature set;
(2) Applying a nonlinear conversion function to the second joint feature vectors matched with each second frequency domain feature in the second frequency domain feature set according to the first data enhancement factors to obtain second data enhancement features matched with each second frequency domain feature in the second frequency domain feature set;
(3) And executing a merging operation on the first data enhancement feature and the second data enhancement feature matched with the same audio frame feature vector to obtain enhanced audio frame feature vectors matched with each audio frame feature vector, and detecting the running state of the equipment according to the enhanced audio frame feature vectors matched with each audio frame feature vector to obtain the running state of the target equipment corresponding to the running sound of the equipment to be determined.
In an embodiment of the present invention, it is assumed, by way of example, that the operating state of an automobile engine is being monitored. The sound signals produced while the engine is working are collected, and the feature vector of each audio frame is extracted. The first frequency domain feature set and the second frequency domain feature set are obtained, and the first and second frequency spectra are constructed from them. Feature integration and device operating state detection can then be performed as follows: a first data enhancement factor is obtained and applied to the first joint feature vector, which is transformed by a nonlinear conversion function; this enhances the expressive power of each feature in the first frequency domain feature set. The same first data enhancement factor is likewise applied to the second joint feature vector and transformed by the nonlinear conversion function, enhancing the expressive power of each feature in the second frequency domain feature set. The first data enhancement feature and the second data enhancement feature are then merged to obtain an enhanced audio frame feature vector that represents the combined information of the audio frame feature vector. Using the enhanced audio frame feature vector as input, the device running state detection algorithm is executed, so that the running state of the target device corresponding to the running sound of the device to be determined, such as normal, abnormal or faulty, can be judged.
By this design, the first joint feature vector and the second joint feature vector are combined, and the enhanced audio frame feature vector is used to detect the running state of the equipment. In this way, a more representative and richer feature representation can be obtained, improving the accuracy and reliability of equipment running state detection.
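A high-level sketch of step S205 as discussed above. The tanh conversion driven by the data enhancement factor, the concatenation merge and the mean-deviation scoring against a normal reference are illustrative interpretations rather than the patent's mandated formulas.

```python
# Hedged sketch: enhance both joint vectors with the first data enhancement factor,
# merge them per frame, and compare against a known-normal reference.
import numpy as np

def enhance(joint_vec, factor):
    return np.tanh(factor * joint_vec)                    # nonlinear conversion function

def enhanced_frame_vector(first_joint, second_joint, factor):
    return np.concatenate([enhance(first_joint, factor), enhance(second_joint, factor)])

def device_state(enhanced_vectors, normal_mean, threshold):
    """enhanced_vectors: (n_frames, d); normal_mean: reference from a normal device."""
    deviation = np.linalg.norm(enhanced_vectors.mean(axis=0) - normal_mean)
    return "abnormal" if deviation > threshold else "normal"
```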
In a possible implementation manner, the step of applying a nonlinear conversion function, according to the first data enhancement factor, to the first joint feature vectors matched with each first frequency domain feature in the first frequency domain feature set, to obtain the first data enhancement features matched with each first frequency domain feature in the first frequency domain feature set, may be implemented by the following example.
(1) Activating the first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set according to the first data enhancement factor to obtain first activation features matched with each first frequency domain feature in the first frequency domain feature set, and calculating normal distribution errors corresponding to the first activation features to obtain first normal distribution errors;
(2) Performing weight distribution operation on the first joint feature vectors matched with each first frequency domain feature in the first frequency domain feature set to obtain first weight features matched with each first frequency domain feature in the first frequency domain feature set;
(3) And calculating the dot product of the first weight feature and the first normal distribution error to obtain the first data enhancement feature matched with each first frequency domain feature in the first frequency domain feature set.
In an exemplary embodiment of the present invention, it is assumed that a vibration signal of an industrial machine is being monitored and a vibration spectrum during operation of the machine is extracted. And obtaining vibration amplitude values on different frequency points of the machine by constructing a first frequency domain feature set. Next, a nonlinear conversion and computation operation may be performed to obtain a first data enhancement feature in accordance with the following steps: and performing activation operation on the first joint feature vector by using a first data enhancement factor, and mapping the first joint feature vector into a new feature space through a nonlinear conversion function to obtain a first activation feature. This may enhance the expressive power of each feature in the first set of frequency domain features. For each first activation feature, a degree of fitting to the normal distribution may be calculated, resulting in a first normal distribution error. This measures the degree of deviation of the first activation feature from the normal distribution. For each first frequency domain feature, a weight distribution operation is performed according to the first joint feature vector to obtain first weight features representing importance thereof. This allows for the degree of contribution of each feature in the detection of the operating state of the device. And performing dot product operation on the first weight features and the first normal distribution errors to obtain first data enhancement features matched with each first frequency domain feature. Thus, the weight of the feature and the fitting condition of the feature relative to normal distribution can be comprehensively considered, and the expression capacity of the feature is further enhanced. And applying a nonlinear conversion function to the first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set according to the first data enhancement factors to obtain first data enhancement features. Therefore, the feature representation can be further optimized by introducing nonlinear transformation and considering the weight and fitting degree of the feature, and the accuracy and reliability of equipment running state detection are improved.
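The activation / normal-distribution-error / weighting / product chain described above can be sketched as follows. The sigmoid activation, the z-score-style error against a fitted normal distribution, the softmax weight distribution and the element-wise product are one hedged reading of these sub-steps, not the patent's exact formulas.

```python
# Hedged sketch of the first data enhancement features (sub-steps (1)-(3) above).
import numpy as np

def first_data_enhancement(joint_vecs, factor):
    """joint_vecs: (n_points, d) first joint feature vectors."""
    activated = 1.0 / (1.0 + np.exp(-factor * joint_vecs))      # activation with the enhancement factor
    mu = activated.mean(axis=0)
    sigma = activated.std(axis=0) + 1e-12
    norm_error = (activated - mu) / sigma                       # "normal distribution error" (z-score style)
    exp_v = np.exp(joint_vecs - joint_vecs.max(axis=1, keepdims=True))
    weights = exp_v / exp_v.sum(axis=1, keepdims=True)          # weight distribution (softmax)
    return weights * norm_error                                 # product of weights and errors
```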
In a possible implementation manner, the step of applying a nonlinear conversion function to the second joint feature vector that is matched with each second frequency-domain feature in the second frequency-domain feature set according to the first data enhancement factor to obtain the second data enhancement feature that is matched with each second frequency-domain feature in the second frequency-domain feature set may be implemented by the following example.
(1) Activating the second joint feature vectors matched with each second frequency domain feature in the second frequency domain feature set according to the first data enhancement factors to obtain second activation features matched with each second frequency domain feature in the second frequency domain feature set, and calculating normal distribution errors corresponding to the second activation features to obtain second normal distribution errors;
(2) Performing weight distribution operation on the second joint feature vectors matched with each second frequency domain feature in the second frequency domain feature set to obtain second weight features matched with each second frequency domain feature in the second frequency domain feature set;
(3) And calculating the dot product of the second weight feature and the second normal distribution error to obtain the second data enhancement features matched with each second frequency domain feature in the second frequency domain feature set.
In an embodiment of the present invention, it is assumed, by way of example, that an abnormal sound detection system is being developed for monitoring the operating state of a machine on an industrial production line. In this system, sound signals from different sensors are obtained and converted into frequency domain features for further analysis. In processing the sound signal, nonlinear conversion and computation operations may be performed to obtain the second data enhancement features as follows: and performing activation operation on the second joint feature vector by using the first data enhancement factor, and mapping the second joint feature vector into a new feature space through a nonlinear conversion function to obtain a second activation feature. This may enhance the expressive power of each feature in the second set of frequency domain features. For each second activation feature, its degree of fit to the normal distribution may be calculated, resulting in a second normal distribution error. This measures the degree of deviation of the second activation feature from the normal distribution. And for each second frequency domain feature, performing weight distribution operation according to the second joint feature vector to obtain a second weight feature representing the importance of the second frequency domain feature. This allows for the degree of contribution of each feature in abnormal sound detection. And performing dot product operation on the second weight features and the second normal distribution errors to obtain second data enhancement features matched with each second frequency domain feature. Thus, the weight of the feature and the fitting condition of the feature relative to normal distribution can be comprehensively considered, and the expression capacity of the feature is further enhanced. And applying a nonlinear conversion function to the second joint feature vector matched with each second frequency domain feature in the second frequency domain feature set according to the first data enhancement factors to obtain second data enhancement features. Thus, the feature representation can be further optimized by introducing nonlinear transformation and considering the weight and fitting degree of the feature, and the accuracy and reliability of the abnormal sound detection system can be improved.
In a possible implementation manner, the step of detecting the running state of the device according to the respective matched enhanced audio frame feature vector of each audio frame feature vector to obtain the running state of the target device corresponding to the running sound of the device to be determined may be implemented by the following example.
(1) Analyzing the enhanced audio frame feature vectors matched with each audio frame feature vector to obtain a first enhanced frequency domain feature set, a second enhanced frequency domain feature set and a third enhanced frequency domain feature set, wherein the sum of the number of second enhanced frequency domain features in the second enhanced frequency domain feature set and the number of third enhanced frequency domain features in the third enhanced frequency domain feature set is consistent with the number of second frequency domain features;
(2) Constructing a first enhanced frequency spectrum corresponding to the first enhanced frequency domain feature set according to the pearson correlation coefficient between each first enhanced frequency domain feature in the first enhanced frequency domain feature set, and constructing a second enhanced frequency spectrum corresponding to the second enhanced frequency domain feature set according to the sequence of each audio frame;
(3) Determining the neighbor enhanced frequency domain features matched with each third enhanced frequency domain feature in the third enhanced frequency domain feature set according to the sequence of each audio frame, and constructing a third enhanced frequency spectrum corresponding to the third enhanced frequency domain feature set according to the pearson correlation coefficients between the neighbor enhanced frequency domain features matched with each third enhanced frequency domain feature in the third enhanced frequency domain feature set;
(4) Performing feature integration operation according to the first enhanced frequency domain features in the first enhanced frequency spectrum and the neighborhood features corresponding to the first enhanced frequency domain features to obtain first combined enhanced features matched with each first enhanced frequency domain feature in the first enhanced frequency domain feature set;
(5) Executing feature integration operation according to the second enhanced frequency domain features in the second enhanced frequency spectrum and the neighborhood features matched with the second enhanced frequency domain features respectively to obtain second combined enhanced features matched with each second enhanced frequency domain feature in the second enhanced frequency domain feature set;
(6) Performing feature integration operation according to the third enhanced frequency domain features in the third enhanced frequency spectrum and the neighborhood features matched with the third enhanced frequency domain features respectively to obtain third combined enhanced features matched with each third enhanced frequency domain feature in the third enhanced frequency domain feature set;
(7) Performing a merging operation according to the first combined enhanced feature, the second combined enhanced feature and the third combined enhanced feature respectively matched with the same audio frame feature vector, to obtain target enhanced audio frame feature vectors respectively matched with each audio frame feature vector;
(8) And detecting the running state of the equipment according to the target enhanced audio frame feature vector matched with each audio frame feature vector, and obtaining the enhanced equipment running state corresponding to the undetermined equipment running sound.
In an embodiment of the invention, it is assumed, by way of example, that a large turbine in a plant is being monitored. The sound signal emitted by the turbine is collected using a microphone and divided into successive audio frames. A feature vector is then extracted for each audio frame and spectrum analysis is performed. The enhanced feature vector of each audio frame is analyzed to obtain a first enhanced frequency domain feature set, a second enhanced frequency domain feature set and a third enhanced frequency domain feature set. These feature sets may include information such as the sound energy and the spectral average over different frequency ranges. In the audio data of the turbine, a first enhanced spectrum is constructed from the pearson correlation coefficients between the features in the first enhanced frequency domain feature set. For example, one feature in the first enhanced frequency domain feature set may represent low frequency energy while another represents high frequency energy; the degree of correlation between them can be determined by calculating the pearson correlation coefficient between them. Likewise, a second enhanced spectrum is constructed from the sequence of audio frames, in which the different features represent the trend of the sound variation. In order to determine the operational state of the turbine, anomalies need to be identified on the basis of feature matches and correlations. For example, in the third enhanced frequency domain feature set, each feature represents the sound energy in a different frequency range; the third enhanced spectrum can be constructed by determining the neighbor features corresponding to each feature and the correlations between them. In this way, the frequency domain characteristics of the turbine sound can be analyzed, and regions that are abnormal compared with normal operation can be identified. The first combined enhanced feature, the second combined enhanced feature and the third combined enhanced feature matched with the feature vector of the same audio frame are then merged; for example, combining the low frequency energy feature, the high frequency energy feature and the sound variation trend feature of a specific audio frame yields comprehensive information about that audio frame across the different features. The running state of the equipment is then detected using the resulting target features: the running state of the turbine is judged according to the target enhanced audio frame feature vector matched with each audio frame feature vector. By analyzing these target features, it is possible to identify whether the turbine is operating normally or abnormally. With this design, anomaly identification for the equipment can be performed using the steps of spectrum analysis, feature integration and matching; for example, during monitoring of the turbine, its operating state can be analyzed and determined from the sound characteristics, so that potential faults are found in time and corresponding maintenance measures are taken to ensure the stable operation of the industrial production line.
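Sub-step (1) above, which splits each enhanced audio frame feature vector into three enhanced feature sets while keeping the second-plus-third count equal to the number of original second frequency domain features, could look roughly like the following. The split positions and the one-third share assigned to the third set are purely illustrative assumptions.

```python
# Speculative sketch of splitting enhanced audio frame feature vectors into the first,
# second and third enhanced frequency domain feature sets (count constraint preserved).
import numpy as np

def split_enhanced_vectors(enhanced_vecs, n_second):
    """enhanced_vecs: (n_frames, d); n_second: number of original second frequency domain features."""
    d = enhanced_vecs.shape[1]
    n_third = n_second // 3                                   # assumed share for the third set
    first = enhanced_vecs[:, : d - n_second]
    second = enhanced_vecs[:, d - n_second : d - n_third]
    third = enhanced_vecs[:, d - n_third :]
    assert second.shape[1] + third.shape[1] == n_second       # constraint from sub-step (1)
    return first, second, third
```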
In one possible implementation manner, the step of constructing the third enhanced spectrum corresponding to the third enhanced frequency domain feature set according to pearson correlation coefficients between neighboring enhanced frequency domain features that each third enhanced frequency domain feature in the third enhanced frequency domain feature set matches, may be implemented by the following example.
(1) Selecting initial features and final features from each third enhanced frequency domain feature;
(2) Determining each initial neighbor feature corresponding to the initial feature from the third enhanced frequency domain features according to the sequence of each audio frame, and performing data integration on each initial neighbor feature to obtain an initial integrated neighbor feature;
(3) Determining each expected neighbor feature corresponding to the final feature from the third enhanced frequency domain features according to the sequence of each audio frame, and performing data integration on each expected neighbor feature to obtain an expected integrated neighbor feature;
(4) Calculating the pearson correlation coefficient between the initial integrated neighbor feature and the expected integrated neighbor feature to obtain the pearson correlation coefficient between the initial feature and the final feature;
(5) Polling each third enhanced frequency domain feature to obtain the pearson correlation coefficients between the neighbor enhanced frequency domain features respectively matched with each third enhanced frequency domain feature, and taking these pearson correlation coefficients as the target pearson correlation coefficients between the third enhanced frequency domain features;
(6) And determining target relevance among the third enhanced frequency domain features according to the target pearson correlation coefficient, respectively taking the third enhanced frequency domain features as frequency points, and relating the third enhanced frequency domain features according to the target relevance to obtain the third enhanced frequency spectrum.
In the present embodiment, it is assumed, by way of example, that a machine on a production line, such as a steel press, is being monitored. The sound signal of the press as it operates is collected by a microphone and divided into successive audio frames. A feature vector is then extracted from each audio frame and spectrum analysis is performed. The feature vector of each audio frame is analyzed to obtain a first enhanced frequency domain feature set, a second enhanced frequency domain feature set and a third enhanced frequency domain feature set. An initial feature and a final feature are selected from the third enhanced frequency domain feature set; for example, in the sound data of a steel press, an initial feature may represent low frequency vibration energy while the final feature may represent high frequency noise energy. The initial neighbor features corresponding to the initial feature are then determined from the third enhanced frequency domain feature set according to the audio frame sequence, and data integration is performed on these initial neighbor features to obtain the initial integrated neighbor feature. Likewise, the expected neighbor features corresponding to the final feature are determined from the sequence of audio frames and integrated to obtain the expected integrated neighbor feature. The pearson correlation coefficient between the initial integrated neighbor feature and the expected integrated neighbor feature is calculated and taken as the pearson correlation coefficient between the initial feature and the final feature; these correlation coefficients reflect the degree of correlation between the two features. The target pearson correlation coefficients between the third enhanced frequency domain features can be obtained by polling each third enhanced frequency domain feature and calculating the pearson correlation coefficients between their respectively matched neighbor features. According to the target pearson correlation coefficients, the target correlations between the third enhanced frequency domain features can be determined; each third enhanced frequency domain feature is then taken as a frequency point and associated according to the target correlations, thereby constructing the third enhanced frequency spectrum.
With this design, through the steps of spectrum analysis, feature integration and matching, a third enhanced spectrum can be constructed to describe the relevance between different frequency features. If abnormal conditions occur during operation of the steel press, such as excessive vibration or a sudden increase in noise, the problem can be identified by analyzing the abnormal region in the third enhanced spectrum, and measures can be taken in time to maintain or repair the equipment so as to keep the production line running normally.
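Purely as an illustrative sketch of the spectrum construction described above, the following Python fragment links frequency points whose target pearson correlation coefficient, computed between integrated neighbor features, is sufficiently strong. The neighbor window size, the correlation threshold and the representation of each third enhanced frequency domain feature as a vector over audio frames are assumptions introduced for the example, not details fixed by the method.
```python
import numpy as np

def integrate_neighbors(features: np.ndarray, idx: int, window: int = 2) -> np.ndarray:
    """Integrate (here: average) the neighboring features of feature `idx`
    along the audio-frame order; `window` is an assumed neighborhood size."""
    lo, hi = max(0, idx - window), min(len(features), idx + window + 1)
    neighbors = np.delete(features[lo:hi], idx - lo, axis=0)  # drop the feature itself
    return neighbors.mean(axis=0)

def build_third_enhanced_spectrum(features: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """features: array of shape (n_features, n_frames) holding the third
    enhanced frequency domain features. Returns an adjacency matrix that links
    frequency points whose target pearson correlation coefficient exceeds `threshold`."""
    n = len(features)
    integrated = np.stack([integrate_neighbors(features, i) for i in range(n)])
    spectrum = np.zeros((n, n))
    for i in range(n):                 # feature taken as the "initial" feature
        for j in range(i + 1, n):      # feature taken as the "final" feature
            r = np.corrcoef(integrated[i], integrated[j])[0, 1]
            if abs(r) >= threshold:    # target relevance between the two features
                spectrum[i, j] = spectrum[j, i] = r
    return spectrum
```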
In a possible implementation manner, the step of performing the merging operation according to the first combined enhancement feature, the second combined enhancement feature and the third combined enhancement feature that are respectively matched with the same audio frame feature vector, to obtain the target enhanced audio frame feature vector matched with each audio frame feature vector, may be implemented as in the following example.
(1) Acquiring a second data enhancement factor, and applying a nonlinear conversion function to each first combined enhancement feature matched with each first enhancement frequency domain feature in the first enhancement frequency domain feature set according to the second data enhancement factor to obtain each first nonlinear conversion feature matched with each first enhancement frequency domain feature in the first enhancement frequency domain feature set;
(2) Applying a nonlinear conversion function to each second combined enhancement feature of the second enhancement frequency domain feature set, which is matched with each second enhancement frequency domain feature in the second enhancement frequency domain feature set, according to the second data enhancement factors to obtain each second nonlinear conversion feature of the second enhancement frequency domain feature set, which is matched with each second enhancement frequency domain feature in the second enhancement frequency domain feature set;
(3) Applying a nonlinear conversion function to each third combined enhancement feature matched with each third enhancement frequency domain feature in the third enhancement frequency domain feature set according to the second data enhancement factors to obtain each third nonlinear conversion feature matched with each third enhancement frequency domain feature in the third enhancement frequency domain feature set;
(4) And executing a merging operation on the first nonlinear conversion feature, the second nonlinear conversion feature and the third nonlinear conversion feature which are matched with the same audio frame feature vector, so as to obtain the target enhanced audio frame feature vector which is matched with each audio frame feature vector.
In the embodiment of the present invention, assume that the operating state of an industrial robot on an assembly line is being monitored, and that the sound signals generated while the robot runs are collected by a microphone. First, a second data enhancement factor is obtained from the second enhanced frequency domain feature set. For example, there may be a noise source during robot operation whose intensity can be determined by analyzing the features in the second enhanced frequency domain feature set. Then, according to the second data enhancement factor, a nonlinear conversion function is applied to the first combined enhancement features matched with each first enhanced frequency domain feature in the first enhanced frequency domain feature set, yielding the first nonlinear conversion features. Similarly, the nonlinear conversion function is applied to the combined enhancement features matched with each second enhanced frequency domain feature in the second enhanced frequency domain feature set and with each third enhanced frequency domain feature in the third enhanced frequency domain feature set, yielding the second and third nonlinear conversion features. Finally, a merging operation is executed on the first, second and third nonlinear conversion features that are matched with the same audio frame feature vector, so that a target enhanced audio frame feature vector is obtained for each audio frame feature vector.
With this design, the sound signal recorded during robot operation can be further processed by obtaining the second data enhancement factor and applying the nonlinear conversion function. Through the merging operation, target enhanced audio frame feature vectors are obtained that combine information from the different frequency domain feature sets and are optimized by the nonlinear conversion function. The sound characteristics of the robot during operation can therefore be described more accurately, and abnormal conditions can be analyzed further so that maintenance and repair measures can be taken in time.
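As a hedged sketch only (the method does not specify a particular nonlinear conversion function), the fragment below applies a tanh-based conversion scaled by the second data enhancement factor to the three combined enhancement features of one audio frame and merges the results. The factor value and the use of concatenation as the merging operation are assumptions.
```python
import numpy as np

def nonlinear_convert(combined: np.ndarray, enhancement_factor: float) -> np.ndarray:
    """Illustrative nonlinear conversion: scale the combined enhancement
    feature by the second data enhancement factor and squash it with tanh."""
    return np.tanh(enhancement_factor * combined)

def target_enhanced_frame_vector(first_combined: np.ndarray,
                                 second_combined: np.ndarray,
                                 third_combined: np.ndarray,
                                 enhancement_factor: float = 1.5) -> np.ndarray:
    """Apply the conversion to the first, second and third combined enhancement
    features matched with the same audio frame, then merge the results into the
    target enhanced audio frame feature vector (merging realised as concatenation)."""
    converted = [nonlinear_convert(c, enhancement_factor)
                 for c in (first_combined, second_combined, third_combined)]
    return np.concatenate(converted)
```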
In a possible implementation manner, the step of applying a nonlinear conversion function to the third combined enhancement feature that is matched by each third enhancement frequency domain feature in the third enhancement frequency domain feature set according to the second data enhancement factor to obtain a third nonlinear conversion feature that is matched by each third enhancement frequency domain feature in the third enhancement frequency domain feature set may be implemented by the following example.
(1) Activating the third combined strengthening features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set according to the second data strengthening factors to obtain third activation features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set, and calculating normal distribution errors corresponding to the third activation features to obtain third normal distribution errors;
(2) Performing weight distribution operation on the second combined strengthening features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set to obtain third weight features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set;
(3) And calculating a dot product of the third weight feature and the third normal distribution error to obtain a third nonlinear conversion feature of each third reinforced frequency domain feature in the third reinforced frequency domain feature set, wherein the third nonlinear conversion feature is matched with each third reinforced frequency domain feature.
In this embodiment, assume by way of example that the vibration signal of an industrial fan is being monitored to evaluate its operating state and health. The vibration data collected by an acceleration sensor is used as input, the signal processing described above is performed, and a nonlinear conversion function is applied to the features in the third enhanced frequency domain feature set according to the second data enhancement factor. This feature set can be used, for example, to highlight abnormal frequency components that indicate problems such as gear wear or imbalance, and the nonlinear conversion helps to better distinguish normal vibration from abnormal vibration. First, each third combined enhancement feature in the third enhanced frequency domain feature set is activated according to the second data enhancement factor to obtain the third activation features. Next, the normal distribution error corresponding to each third activation feature is calculated to measure how far the feature deviates from the normal vibration mode; if the normal distribution error of a feature is significantly higher than a preset threshold, an abnormal condition may be indicated. A weight distribution operation is then performed to reflect the importance of each feature in the overall vibration signal analysis. These weights may be based on expert knowledge or determined by machine learning methods; for example, if vibration features in certain frequency ranges are known to correlate strongly with the diagnosis and prediction of fan failure, those features can be given higher weights. Finally, the dot product of the third weight feature and the third normal distribution error is calculated, giving the third nonlinear conversion feature matched with each third enhanced frequency domain feature in the third enhanced frequency domain feature set. These features provide further information about anomalies or key characteristics in the vibration signal.
With this design, the nonlinear conversion features of the third enhanced frequency domain feature set can be obtained. Through feature activation, normal distribution error calculation and weight distribution, third nonlinear conversion features are derived that help identify abnormal frequency domain features in the vibration signal and evaluate their importance to the overall vibration mode. In this way, the operating state of the fan can be monitored in real time, and an alarm can be raised or appropriate maintenance measures taken when abnormal vibration occurs, so that potential faults are avoided.
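The activation, normal distribution error, weighting and dot product steps above could be prototyped roughly as follows. The sigmoid activation, the z-score used as the normal distribution error, the softmax weight distribution, and the assumption that both feature vectors have the same length are all choices made for this sketch, not functions fixed by the method.
```python
import numpy as np

def third_nonlinear_conversion(third_combined: np.ndarray,
                               second_combined: np.ndarray,
                               enhancement_factor: float = 1.0) -> float:
    """Rough sketch of steps (1)-(3) for one third enhanced frequency domain feature."""
    # (1) activate the third combined enhancement feature with the second data
    #     enhancement factor (sigmoid chosen here for illustration)
    activated = 1.0 / (1.0 + np.exp(-enhancement_factor * third_combined))
    #     normal distribution error: z-score deviation of the activated feature
    error = (activated - activated.mean()) / (activated.std() + 1e-8)
    # (2) weight distribution over the matched combined enhancement feature
    #     (softmax chosen here for illustration)
    weights = np.exp(second_combined - second_combined.max())
    weights /= weights.sum()
    # (3) dot product of the weight feature and the normal distribution error
    return float(np.dot(weights, error))
```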
In one possible implementation, the following example is also provided by the present embodiment.
(1) The target enhanced audio frame feature vector is used as an enhanced audio frame feature vector, the steps of analyzing the enhanced audio frame feature vector which is matched with each audio frame feature vector respectively are repeatedly executed, and a first enhanced frequency domain feature set, a second enhanced frequency domain feature set and a third enhanced frequency domain feature set are obtained, the frequency spectrum number of the second enhanced frequency domain feature in the second enhanced frequency domain feature set is increased according to the set number, and the frequency spectrum number of the third enhanced frequency domain feature in the third enhanced frequency domain feature set is reduced according to the set number;
(2) And when the preset termination state is met, obtaining the output audio frame feature vector which is matched with each audio frame feature vector, and detecting the equipment running state according to the output audio frame feature vector which is matched with each audio frame feature vector, so as to obtain the final equipment running state corresponding to the running sound of the equipment to be determined.
In an embodiment of the present invention, assume by way of example that the acoustic signal of an industrial machine is being monitored to assess its operating state and health. The sound data collected by a microphone is used as input, the signal processing described above is performed, and the first, second and third enhanced frequency domain feature sets are obtained according to the specific steps. The target enhanced audio frame feature vector is first taken as the enhanced audio frame feature vector, and the step of analyzing the enhanced audio frame feature vector matched with each audio frame feature vector is executed repeatedly. This means that the same enhancement process is applied to each audio frame feature vector to extract the frequency domain features related to the device operating state. During the analysis, the spectrum number of the second enhanced frequency domain features in the second enhanced frequency domain feature set is increased by the set number. For example, it may have been found that sound features in certain frequency ranges are associated with the normal operating state of the device; increasing the spectrum number of the second enhanced frequency domain features allows these key features to be captured better. Meanwhile, the spectrum number of the third enhanced frequency domain features in the third enhanced frequency domain feature set is reduced by the set number. This may be because features in certain frequency ranges are unrelated to the device operating state or are not significant for fault diagnosis; reducing their spectrum number reduces the influence of these uncorrelated features. The previous steps are executed repeatedly until the preset termination state is met, and the output audio frame feature vector matched with each audio frame feature vector is obtained. Device operating state detection is then performed according to these output audio frame feature vectors to determine the final operating state of the undetermined device. For example, if the output audio frame feature vectors indicate the presence of abnormal frequency domain features, it may be inferred that the undetermined device is likely in a fault state.
By the design, key features related to the state of the equipment can be captured by analyzing and processing the audio feature vectors and increasing or decreasing the number of feature spectrums in the enhanced frequency domain feature set according to the set number. Finally, the final running state of the undetermined equipment can be determined according to the output audio frame feature vector, so that the purposes of equipment state monitoring and fault diagnosis are realized.
For example, on an industrial production line, the above technical solution is used to monitor the operating state of machine A and diagnose its faults: the target enhanced audio frame feature vector is taken as the enhanced audio frame feature vector, and each audio frame feature vector is analyzed. In this way, frequency domain features associated with the sound signal of machine A can be extracted. During the analysis, the spectrum number of the second enhanced frequency domain features in the second enhanced frequency domain feature set is increased by the set number. Suppose that, in the normal operating state of machine A, sound features in a specific frequency range are related to the stability of the device; increasing the spectrum number of the second enhanced frequency domain features captures these key features more accurately. At the same time, the spectrum number of the third enhanced frequency domain features in the third enhanced frequency domain feature set is reduced by the set number, because sound features in certain frequency ranges may be independent of the operating state of machine A or insignificant for fault diagnosis; reducing their spectrum number limits the influence of these uncorrelated features on the result. The above steps are executed repeatedly until the preset termination state is met, so that the output audio frame feature vector matched with each audio frame feature vector is obtained. Device operating state detection is then performed according to these output audio frame feature vectors. For example, if an output audio frame feature vector indicates an abnormal frequency domain feature, such as suddenly appearing high-frequency noise or a frequency offset, it may be inferred that machine A is likely in a fault state.
By analyzing and processing the audio feature vectors and increasing or decreasing the number of feature spectrums in the enhanced frequency domain feature set according to the set number, key features related to the state of the machine A can be captured. Finally, the final running state of the machine A and whether a fault condition exists or not can be determined according to the output audio frame feature vector, so that the state monitoring and fault diagnosis of the industrial equipment are realized.
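A loose sketch of the iterative loop described above is given below. The initial spectrum counts, the step size, the concatenation used to rebuild the enhanced frame vectors and the termination condition are all assumptions, and the callables `analyze` and `detect_state` stand in for the analysis and detection steps of the method.
```python
import numpy as np

def iterative_state_detection(frame_vectors, analyze, detect_state,
                              n_second=8, n_third=8, step=2, max_rounds=5):
    """Repeatedly re-analyse the enhanced audio frame feature vectors, shifting
    `step` spectra per round from the third enhanced feature set (state-irrelevant)
    to the second enhanced feature set (state-relevant), until termination."""
    enhanced = frame_vectors
    for _ in range(max_rounds):
        first_set, second_set, third_set = analyze(enhanced, n_second, n_third)
        enhanced = np.concatenate([first_set, second_set, third_set], axis=-1)
        n_second += step                    # grow the state-relevant spectra
        n_third = max(0, n_third - step)    # shrink the uncorrelated spectra
        if n_third == 0:                    # assumed preset termination state
            break
    return detect_state(enhanced)           # final device operating state
```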
In one possible implementation, the aforementioned step S205 may be implemented as in the following example.
(1) Analyzing the target audio frame feature vectors matched with the audio frame feature vectors respectively to obtain a first target frequency domain feature set, a second target frequency domain feature set and a third target frequency domain feature set, wherein the sum of the frequency spectrum number of the second target frequency domain feature in the second target frequency domain feature set and the frequency spectrum number of the third target frequency domain feature in the third target frequency domain feature set is consistent with the frequency spectrum number of the second frequency domain feature;
(2) Constructing a first target frequency spectrum corresponding to the first target frequency domain feature set according to the pearson correlation coefficient between each first target frequency domain feature in the first target frequency domain feature set, and constructing a second target frequency spectrum corresponding to the second target frequency domain feature set according to the sequence of each audio frame;
(3) Determining target neighbor frequency domain features matched with each third target frequency domain feature in the third target frequency domain feature set according to the sequence of each audio frame, and constructing a third target frequency spectrum corresponding to the third target frequency domain feature set according to pearson correlation coefficients between the target neighbor frequency domain features matched with each third target frequency domain feature in the third target frequency domain feature set;
(4) Performing feature integration operation according to first target frequency domain features in the first target frequency domain spectrum and neighborhood features corresponding to the first target frequency domain features to obtain first integrated frequency domain features matched with each first target frequency domain feature in the first target frequency domain feature set;
(5) Executing feature integration operation according to the second target frequency domain features in the second target frequency spectrum and the neighborhood features matched with the second target frequency domain features respectively to obtain second integrated frequency domain features matched with each second target frequency domain feature in the second target frequency domain feature set;
(6) Executing feature integration operation according to the third target frequency domain features in the third target frequency spectrum and the neighborhood features matched with the third target frequency domain features respectively to obtain third integrated frequency domain features matched with each third target frequency domain feature in the third target frequency domain feature set;
(7) Performing a merging operation according to the first integrated frequency domain feature, the second integrated frequency domain feature and the third integrated frequency domain feature which are respectively matched with the same audio frame feature vector to obtain a current audio frame feature vector which is respectively matched with each audio frame feature vector;
(8) And detecting the running state of the equipment according to the current audio frame feature vector matched with each audio frame feature vector, and obtaining the current running state of the equipment corresponding to the running sound of the equipment to be determined.
In this embodiment, assume by way of example that the motor sound of an industrial machine C is being monitored to evaluate its operating state and detect abnormal situations. The sound data collected by a microphone is used as input, the signal processing described above is performed, and the first, second and third target frequency domain feature sets are obtained according to the specific steps. The target audio frame feature vectors are taken as input and each audio frame feature vector is analyzed; by applying a Fourier transform or a similar method, frequency domain features related to the motor sound signal of machine C can be extracted. A first target frequency spectrum is constructed from the pearson correlation coefficients between the first target frequency domain features in the first target frequency domain feature set, and a second target frequency spectrum is constructed according to the sequence of audio frames; these spectra represent the target feature distribution of machine C under normal operating conditions. The target neighbor frequency domain features matched with each third target frequency domain feature in the third target frequency domain feature set are determined according to the sequence of audio frames; these neighbor features provide a reference for the normal operating condition of machine C. A feature integration operation is then executed on the features in the first target frequency spectrum and their corresponding neighborhood features to obtain the first integrated frequency domain features matched with each feature in the first target frequency domain feature set; integrating features reduces noise and interference and better captures the features related to the state of machine C. The first, second and third integrated frequency domain features matched with the same audio frame feature vector are then merged to obtain the current audio frame feature vector matched with each audio frame feature vector; this merging takes the multiple feature sets into account jointly to obtain a more comprehensive characterization. Finally, device operating state detection is performed according to the current audio frame feature vector matched with each audio frame feature vector. By comparing the differences between the current audio frame features and the target audio frame features, the operating state of machine C can be determined; for example, if the current audio frame features show abnormal frequency domain characteristics, such as suddenly appearing high-amplitude noise or a frequency offset, it may be inferred that machine C is likely in a fault state.
With this design, the motor sound signal can be used to monitor the operating state of machine C and diagnose its faults. By analyzing and processing the audio feature vectors, constructing target frequency spectra from the pearson correlation coefficients, and executing the feature integration and merging operations, key features related to the state of machine C can be captured. Finally, the current operating state of machine C can be determined from the current audio frame feature vectors.
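The feature integration and merging steps used here can be illustrated with the following sketch, in which integration combines a feature with the mean of its neighborhood features and its deviation from that mean; approximating the complete coupling operation by a simple concatenation is an assumption made for the example.
```python
import numpy as np

def integrate_feature(feature: np.ndarray, neighborhood: np.ndarray) -> np.ndarray:
    """Feature integration for one target frequency domain feature:
    `neighborhood` has shape (n_neighbors, dim) and `feature` has shape (dim,)."""
    mean_feature = neighborhood.mean(axis=0)   # mean of the neighborhood features
    deviation = feature - mean_feature         # deviation feature
    return np.concatenate([feature, deviation, mean_feature])

def current_audio_frame_vector(first, first_nb, second, second_nb, third, third_nb):
    """Merge the first, second and third integrated frequency domain features
    matched with the same audio frame into the current audio frame feature vector."""
    parts = [integrate_feature(f, nb)
             for f, nb in ((first, first_nb), (second, second_nb), (third, third_nb))]
    return np.concatenate(parts)
```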
In one possible implementation, the following example is also provided in the embodiments of the present invention.
(1) Inputting the operation sound of the equipment to be determined into an equipment operation state detection model, analyzing the operation sound of the equipment to be determined by utilizing the equipment operation state detection model to obtain each audio frame, extracting the characteristics of each audio frame, and obtaining the characteristic vector of each audio frame;
(2) Analyzing each audio frame feature vector by using the equipment running state detection model to obtain a first frequency domain feature respectively matched with each audio frame feature vector, forming a first frequency domain feature set, and obtaining a second frequency domain feature respectively matched with each audio frame feature vector, forming a second frequency domain feature set;
(3) Constructing a first frequency spectrum corresponding to the first frequency domain feature set by using a pearson correlation coefficient between each first frequency domain feature in the first frequency domain feature set by using the equipment running state detection model, and constructing a second frequency spectrum corresponding to the second frequency domain feature set according to the sequence of each audio frame;
(4) Performing feature integration operation by using the first frequency domain features in the first frequency spectrum and neighborhood features corresponding to the first frequency domain features by using the equipment running state detection model to obtain first joint feature vectors respectively matched with each first frequency domain feature in the first frequency domain feature set, and performing feature integration operation according to second frequency domain features in the second frequency spectrum and neighborhood features corresponding to the second frequency domain features to obtain second joint feature vectors respectively matched with each second frequency domain feature in the second frequency domain feature set;
(5) And executing merging operation on the first joint feature vector and the second joint feature vector which are matched with the same audio frame feature vector by using the equipment operation state detection model to obtain target audio frame feature vectors matched with each audio frame feature vector, and detecting the equipment operation state according to the target audio frame feature vectors matched with each audio frame feature vector to obtain the determined equipment operation state corresponding to the undetermined equipment operation sound.
In an embodiment of the present invention, as an example, the sound generated by each device is input into a device operating state detection model for further analysis and processing. The model parses the operating sound of the undetermined device into audio frames and extracts features from each frame; these features may include spectral shape, amplitude variation, energy and the like. The model then analyzes each audio frame feature vector to obtain the first and second frequency domain features matched with it. For example, the first frequency domain feature may be the dominant frequency component of the audio frame, and the second frequency domain feature may be the rate of change of that frequency component. Using the model, a first frequency spectrum is constructed from the pearson correlation coefficients between the features in the first frequency domain feature set, and a second frequency spectrum is constructed from the sequence of audio frames; these spectra represent the target feature distribution of the device sound under normal operation. The model then performs a feature integration operation on the first frequency domain features and their neighborhood features in the first frequency spectrum to generate the first joint feature vectors, and likewise on the second frequency domain features and their neighborhood features to generate the second joint feature vectors. Each audio frame feature vector is then merged with its corresponding first and second joint feature vectors, producing a target audio frame feature vector for each audio frame feature vector. Finally, the model performs device operating state detection according to the target audio frame feature vector corresponding to each audio frame feature vector. By comparing the differences between the current audio frame features and the target audio frame features, the operating state of the undetermined device can be determined; for example, if the difference exceeds a preset threshold, it may be inferred that the device is abnormal or malfunctioning. In this way, the sound data generated by devices in an industrial scene can be used to monitor them and identify abnormal devices: by parsing the audio frames, extracting features, constructing frequency spectra, and performing the feature integration and merging operations, key features related to the device operating state are captured and then analyzed and judged by the device operating state detection model, so that the sound signal of the device can be monitored in real time and normal operation and abnormal conditions can be identified.
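As a toy stand-in for the device operating state detection model described above (the actual model is not specified here), the class below keeps target audio frame feature vectors recorded during normal operation and flags a device as abnormal when incoming vectors drift beyond an assumed threshold; the z-score comparison and the threshold value are illustrative assumptions.
```python
import numpy as np

class DeviceStateDetector:
    """Baseline-comparison detector: `normal_vectors` are target audio frame
    feature vectors collected while the device ran normally."""

    def __init__(self, normal_vectors: np.ndarray, threshold: float = 3.0):
        self.mean = normal_vectors.mean(axis=0)
        self.std = normal_vectors.std(axis=0) + 1e-8
        self.threshold = threshold            # assumed deviation threshold

    def detect(self, frame_vectors: np.ndarray) -> str:
        # mean absolute z-score of the current frames against the normal baseline
        score = np.abs((frame_vectors - self.mean) / self.std).mean()
        return "abnormal" if score > self.threshold else "normal"
```
A device whose detection result is "abnormal" would then be marked as abnormal equipment, mirroring the marking step of the method.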
An overall flow implementation of an embodiment of the present invention is provided below.
Consider a large manufacturing facility with hundreds of heavy machines running. Each machine produces a specific sound during normal operation. However, when a device starts to have a problem, its sound may change due to wear of parts, circuit failure, etc.
First, the operating sound is acquired from the device to be detected and parsed into audio frames, and the features of each audio frame are extracted to form feature vectors. These features include audio signal features and audio timing features: the audio signal features are obtained by extracting signal characteristics from each audio frame, and the audio timing features are obtained by performing a feature extraction operation on the sequence of audio frames. The system then analyzes each audio frame feature vector to obtain the first and second frequency domain features and forms the corresponding frequency domain feature sets. It also calculates the pearson correlation coefficients between the first frequency domain features to construct a first frequency spectrum, and constructs a second frequency spectrum from the sequence of audio frames. Next, the system performs a feature integration operation on the frequency domain features in the first and second frequency spectra and their corresponding neighborhood features to obtain the first and second joint feature vectors, and merges the first and second joint feature vectors matched with the same audio frame feature vector to obtain the target audio frame feature vector. Device operating state detection is then performed according to these target audio frame feature vectors to obtain the operating state corresponding to the operating sound of the undetermined device. If the operating state indicates that the device is abnormal, the system marks it as an abnormal device. This allows a factory manager to learn about the operating condition of equipment in time and to maintain it immediately when a problem is found, avoiding possible production losses.
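A minimal sketch of the front end of this flow, framing the operating sound and combining per-frame signal features with a simple timing feature into audio frame feature vectors, might look as follows. The use of an STFT for the signal features and a frame-energy gradient for the timing feature are assumptions made for the example.
```python
import numpy as np
from scipy.signal import stft

def audio_frame_feature_vectors(sound: np.ndarray, sample_rate: int,
                                frame_len: int = 1024) -> np.ndarray:
    """Split the operating sound into audio frames and build one feature vector
    per frame from signal features and a timing feature."""
    _, _, spec = stft(sound, fs=sample_rate, nperseg=frame_len)
    signal_features = np.abs(spec).T               # per-frame spectral magnitudes
    frame_energy = signal_features.sum(axis=1)
    timing_feature = np.gradient(frame_energy)     # trend of the sound over frames
    return np.hstack([signal_features, timing_feature[:, None]])
```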
In a more specific embodiment, consider a plant in which the first indication of a machine failure may be a change in its operating sound. To capture this, the plant adopts this artificial-intelligence-based technique. First, the monitoring equipment acquires the operating sound from the machine to be checked and parses it into audio frames. The system then extracts signal features from each audio frame, such as the amplitude and frequency of the sound, together with timing features, such as the trend of the sound over time; these two types of features are integrated into one audio frame feature vector for subsequent analysis. The method then analyzes each audio frame feature vector, calculates the pearson correlation coefficients between the first frequency domain features, and associates the frequency points according to this correlation to form a first frequency spectrum. At the same time, the system determines the sequence indexes of the second frequency domain features according to the sequence of audio frames and associates the second frequency domain features according to these indexes to form a second frequency spectrum. Both steps look for patterns or trends that may reflect the status of the device. Next, the system performs a feature integration operation on the frequency domain features and their neighborhood features in the first and second frequency spectra to generate the first and second joint feature vectors; this process involves computing the mean and deviation of the neighborhood features of each frequency domain feature and then combining those values with the original frequency domain feature. The system then obtains a data enhancement factor and uses a nonlinear conversion function to enhance the joint feature vectors, obtaining data enhancement features: it activates the first and second joint feature vectors, calculates their corresponding normal distribution errors, performs a weight distribution operation on the joint feature vectors to obtain weighted features, and finally calculates the dot product of the weighted features and the normal distribution errors to obtain the data enhancement features. Finally, the system detects the device operating state according to the enhanced audio frame feature vector matched with each audio frame feature vector. For example, if the target enhanced audio frame feature vector of the current machine sound differs significantly from the feature vectors recorded during normal operation, the system may determine that the machine is malfunctioning. The plant can use this information to act in advance, for example by scheduling maintenance or checking possible sources of the problem, thereby avoiding more serious equipment failures and production interruptions.
The embodiment of the invention provides a computer device 100, wherein the computer device 100 comprises a processor and a nonvolatile memory storing computer instructions, and when the computer instructions are executed by the processor, the computer device 100 executes the abnormal device identification method based on artificial intelligence and device operation sound. As shown in fig. 2, fig. 2 is a block diagram of a computer device 100 according to an embodiment of the present invention. The computer device 100 comprises a memory 111, a processor 112 and a communication unit 113. For data transmission or interaction, the memory 111, the processor 112 and the communication unit 113 are electrically connected to each other directly or indirectly. For example, one or more communication buses or signal lines may be used to electrically couple these elements to each other.
The foregoing description, for purpose of explanation, has been presented with reference to particular embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. An abnormal device identification method based on artificial intelligence and device operation sound, the method comprising:
acquiring operation sound of the equipment to be determined, analyzing the operation sound of the equipment to be determined to obtain each audio frame, extracting the characteristics of each audio frame, and obtaining the characteristic vector of each audio frame;
Analyzing each audio frame feature vector respectively to obtain a first frequency domain feature matched with each audio frame feature vector to form a first frequency domain feature set, and obtaining a second frequency domain feature matched with each audio frame feature vector to form a second frequency domain feature set;
Constructing a first frequency spectrum corresponding to the first frequency domain feature set according to the pearson correlation coefficient between each first frequency domain feature in the first frequency domain feature set, and constructing a second frequency spectrum corresponding to the second frequency domain feature set according to the sequence of each audio frame;
Performing feature integration operation according to the first frequency domain features in the first frequency spectrum and the neighborhood features corresponding to the first frequency domain features to obtain first joint feature vectors respectively matched with each first frequency domain feature in the first frequency domain feature set, and performing feature integration operation according to the second frequency domain features in the second frequency spectrum and the neighborhood features corresponding to the second frequency domain features to obtain second joint feature vectors respectively matched with each second frequency domain feature in the second frequency domain feature set;
Performing merging operation according to the first joint feature vector and the second joint feature vector which are matched with the same audio frame feature vector, obtaining target audio frame feature vectors matched with each audio frame feature vector, and performing equipment running state detection according to the target audio frame feature vectors matched with each audio frame feature vector, so as to obtain equipment running states corresponding to the to-be-determined equipment running sound;
and when the equipment operation state represents equipment abnormality, marking the target equipment corresponding to the operation sound of the equipment to be determined as abnormal equipment.
2. The method of claim 1, wherein extracting the features of each audio frame to obtain each audio frame feature vector comprises:
extracting the signal characteristics of each audio frame to obtain each audio signal characteristic;
Acquiring a sequence of each audio frame, and executing feature extraction operation on the sequence of each audio frame to obtain each audio time sequence feature;
And executing integration operation on each audio signal characteristic and the corresponding audio time sequence characteristic to obtain each audio frame characteristic vector.
3. The method of claim 1, wherein constructing a first spectrum corresponding to the first set of frequency-domain features from pearson correlation coefficients between each first frequency-domain feature in the first set of frequency-domain features comprises:
calculating characteristic pearson correlation coefficients among the first frequency domain features, and determining frequency point relevance among the first frequency domain features according to the characteristic pearson correlation coefficients;
Respectively taking each first frequency domain feature as a frequency point, and associating each first frequency domain feature according to the frequency point association to obtain the first frequency spectrum;
The constructing a second spectrum corresponding to the second frequency domain feature set according to the sequence of each audio frame includes:
Determining sequence indexes of the second frequency domain features matched with the feature vectors of each audio frame according to the sequence of each audio frame, and determining sequence relations among the second frequency domain features in the second frequency domain feature set according to the sequence indexes;
And respectively taking each second frequency domain feature as a frequency point, and correlating each second frequency domain feature according to the sequence relation to obtain the second frequency spectrum.
4. The method of claim 1, wherein the performing feature integration according to the first frequency domain feature in the first spectrum and the neighborhood feature corresponding to the first frequency domain feature to obtain a first joint feature vector in which each first frequency domain feature in the first frequency domain feature set is matched respectively includes:
Calculating the mean value characteristic of the neighborhood characteristic corresponding to the first frequency domain characteristic to obtain a first mean value characteristic, and calculating the deviation characteristic between the first frequency domain characteristic and the neighborhood characteristic corresponding to the first frequency domain characteristic to obtain a first deviation characteristic;
Performing a merging operation on the first frequency domain feature, the first deviation feature and the first mean feature to obtain a first merging feature, and performing a complete coupling operation according to the first merging feature to obtain a first joint feature vector corresponding to the first frequency domain feature;
Polling each first frequency domain feature in the first frequency spectrum to obtain a first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set;
The performing feature integration operation according to the second frequency domain features in the second frequency spectrum and the neighborhood features corresponding to the second frequency domain features to obtain second joint feature vectors respectively matched with each second frequency domain feature in the second frequency domain feature set, including:
calculating the mean value characteristic of the neighborhood characteristic corresponding to the second frequency domain characteristic to obtain a second mean value characteristic, and calculating the deviation characteristic between the second frequency domain characteristic and the neighborhood characteristic corresponding to the second frequency domain characteristic to obtain a second deviation characteristic;
performing a merging operation on the second frequency domain feature, the second deviation feature and the second mean feature to obtain a second merging feature, and performing a complete coupling operation according to the second merging feature to obtain a second joint feature vector corresponding to the second frequency domain feature;
and polling each second frequency domain feature in the second frequency spectrum to obtain a second joint feature vector matched with each second frequency domain feature in the second frequency domain feature set.
5. The method of claim 1, wherein the performing a merging operation according to the first joint feature vector and the second joint feature vector that are respectively matched with the same audio frame feature vector to obtain the target audio frame feature vector that is respectively matched with each audio frame feature vector, and performing device running state detection according to the target audio frame feature vector that is respectively matched with each audio frame feature vector to obtain the device running state corresponding to the pending device running sound, includes:
acquiring a first data enhancement factor, activating a first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set according to the first data enhancement factor to obtain a first activation feature matched with each first frequency domain feature in the first frequency domain feature set, and calculating a normal distribution error corresponding to the first activation feature to obtain a first normal distribution error;
Performing weight distribution operation on the first joint feature vectors matched with each first frequency domain feature in the first frequency domain feature set to obtain first weight features matched with each first frequency domain feature in the first frequency domain feature set;
Calculating a dot product of the first weight feature and the first normal distribution error to obtain a first data strengthening feature of each first frequency domain feature in the first frequency domain feature set, wherein the first data strengthening feature is matched with each first frequency domain feature;
Activating the second joint feature vectors matched with each second frequency domain feature in the second frequency domain feature set according to the first data enhancement factors to obtain second activation features matched with each second frequency domain feature in the second frequency domain feature set, and calculating normal distribution errors corresponding to the second activation features to obtain second normal distribution errors;
Performing weight distribution operation on the second joint feature vectors matched with each second frequency domain feature in the second frequency domain feature set to obtain second weight features matched with each second frequency domain feature in the second frequency domain feature set;
Calculating the dot product of the second weight feature and the second normal distribution error to obtain second data strengthening features matched with each second frequency domain feature in the second frequency domain feature set;
And executing merging operation on the first data strengthening feature and the second data strengthening feature which are matched with the same audio frame feature vector respectively to obtain strengthening audio frame feature vectors matched with each audio frame feature vector, and detecting the running state of the equipment according to the strengthening audio frame feature vectors matched with each audio frame feature vector to obtain the running state of the target equipment corresponding to the running sound of the equipment to be determined.
6. The method according to claim 5, wherein the detecting the device operation state according to the respective matched enhanced audio frame feature vector of each audio frame feature vector to obtain the target device operation state corresponding to the undetermined device operation sound includes:
analyzing the reinforced audio frame feature vectors matched with each audio frame feature vector to obtain a first reinforced frequency domain feature set, a second reinforced frequency domain feature set and a third reinforced frequency domain feature set, wherein the sum of the frequency spectrum number of the second reinforced frequency domain feature in the second reinforced frequency domain feature set and the frequency spectrum number of the third reinforced frequency domain feature in the third reinforced frequency domain feature set is consistent with the frequency spectrum number of the second frequency domain feature;
Constructing a first enhanced frequency spectrum corresponding to the first enhanced frequency domain feature set according to the pearson correlation coefficient between each first enhanced frequency domain feature in the first enhanced frequency domain feature set, and constructing a second enhanced frequency spectrum corresponding to the second enhanced frequency domain feature set according to the sequence of each audio frame;
Determining neighbor reinforced frequency domain features matched with each third reinforced frequency domain feature in the third reinforced frequency domain feature set according to the sequence of each audio frame, and selecting initial features and final features from each third reinforced frequency domain feature;
Determining each initial neighbor feature corresponding to the initial feature from the third enhanced frequency domain features according to the sequence of each audio frame, and performing data integration on each initial neighbor feature to obtain an initial integrated neighbor feature;
determining each expected neighbor feature corresponding to the final feature from the third enhanced frequency domain features according to the sequence of each audio frame, and performing data integration on each expected neighbor feature to obtain an expected integrated neighbor feature;
Calculating the pearson correlation coefficient between the initial integrated neighbor feature and the expected integrated neighbor feature to obtain the pearson correlation coefficient between the initial feature and the final feature;
Polling each third strengthening frequency domain feature to obtain a pearson correlation coefficient between the neighbor strengthening frequency domain features which are respectively matched with each third strengthening frequency domain feature, and taking the pearson correlation coefficient between the neighbor strengthening frequency domain features which are respectively matched with each third strengthening frequency domain feature as a target pearson correlation coefficient between each third strengthening frequency domain feature;
determining target relevance among the third enhanced frequency domain features according to the target pearson correlation coefficient, respectively taking the third enhanced frequency domain features as frequency points, and relating the third enhanced frequency domain features according to the target relevance to obtain a third enhanced frequency spectrum;
Performing feature integration operation according to the first enhanced frequency domain features in the first enhanced frequency spectrum and the neighborhood features corresponding to the first enhanced frequency domain features to obtain first combined enhanced features matched with each first enhanced frequency domain feature in the first enhanced frequency domain feature set;
Executing feature integration operation according to the second enhanced frequency domain features in the second enhanced frequency spectrum and the neighborhood features matched with the second enhanced frequency domain features respectively to obtain second combined enhanced features matched with each second enhanced frequency domain feature in the second enhanced frequency domain feature set;
Performing feature integration operation according to the third enhanced frequency domain features in the third enhanced frequency spectrum and the neighborhood features matched with the third enhanced frequency domain features respectively to obtain third combined enhanced features matched with each third enhanced frequency domain feature in the third enhanced frequency domain feature set;
Acquiring a second data enhancement factor, and applying a nonlinear conversion function to each first combined enhancement feature matched with each first enhancement frequency domain feature in the first enhancement frequency domain feature set according to the second data enhancement factor to obtain each first nonlinear conversion feature matched with each first enhancement frequency domain feature in the first enhancement frequency domain feature set;
Applying a nonlinear conversion function to each second combined enhancement feature of the second enhancement frequency domain feature set, which is matched with each second enhancement frequency domain feature in the second enhancement frequency domain feature set, according to the second data enhancement factors to obtain each second nonlinear conversion feature of the second enhancement frequency domain feature set, which is matched with each second enhancement frequency domain feature in the second enhancement frequency domain feature set;
Activating the third combined strengthening features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set according to the second data strengthening factors to obtain third activation features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set, and calculating normal distribution errors corresponding to the third activation features to obtain third normal distribution errors;
Performing weight distribution operation on the second combined strengthening features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set to obtain third weight features matched with each third strengthening frequency domain feature in the third strengthening frequency domain feature set;
Calculating a dot product of the third weight feature and the third normal distribution error to obtain a third nonlinear conversion feature of each third reinforced frequency domain feature in the third reinforced frequency domain feature set, wherein the third nonlinear conversion feature is matched with each third reinforced frequency domain feature;
Performing a merging operation on the first nonlinear conversion feature, the second nonlinear conversion feature and the third nonlinear conversion feature which are respectively matched with the same audio frame feature vector to obtain a target enhanced audio frame feature vector which is respectively matched with each audio frame feature vector;
and detecting the running state of the equipment according to the target enhanced audio frame feature vector matched with each audio frame feature vector, and obtaining the enhanced equipment running state corresponding to the undetermined equipment running sound.
7. The method of claim 6, wherein the method further comprises:
The target enhanced audio frame feature vector is used as an enhanced audio frame feature vector, the steps of analyzing the enhanced audio frame feature vector which is matched with each audio frame feature vector respectively are repeatedly executed, and a first enhanced frequency domain feature set, a second enhanced frequency domain feature set and a third enhanced frequency domain feature set are obtained, the frequency spectrum number of the second enhanced frequency domain feature in the second enhanced frequency domain feature set is increased according to the set number, and the frequency spectrum number of the third enhanced frequency domain feature in the third enhanced frequency domain feature set is reduced according to the set number;
And when the preset termination state is met, obtaining the output audio frame feature vector which is matched with each audio frame feature vector, and detecting the equipment running state according to the output audio frame feature vector which is matched with each audio frame feature vector, so as to obtain the final equipment running state corresponding to the running sound of the equipment to be determined.
8. The method according to claim 1, wherein detecting the equipment running state according to the target audio frame feature vector matched with each audio frame feature vector to obtain the equipment running state corresponding to the undetermined equipment operation sound comprises:
analyzing the target audio frame feature vectors matched with each audio frame feature vector to obtain a first target frequency domain feature set, a second target frequency domain feature set and a third target frequency domain feature set, wherein the sum of the number of frequency spectra of the second target frequency domain features in the second target frequency domain feature set and the number of frequency spectra of the third target frequency domain features in the third target frequency domain feature set is consistent with the number of frequency spectra of the second frequency domain features;
constructing a first target frequency spectrum corresponding to the first target frequency domain feature set according to the Pearson correlation coefficients between the first target frequency domain features in the first target frequency domain feature set, and constructing a second target frequency spectrum corresponding to the second target frequency domain feature set according to the sequence of the audio frames;
determining the target neighbor frequency domain features matched with each third target frequency domain feature in the third target frequency domain feature set according to the sequence of the audio frames, and constructing a third target frequency spectrum corresponding to the third target frequency domain feature set according to the Pearson correlation coefficients between the target neighbor frequency domain features matched with each third target frequency domain feature in the third target frequency domain feature set;
performing a feature integration operation according to the first target frequency domain features in the first target frequency spectrum and the neighborhood features corresponding to the first target frequency domain features to obtain a first integrated frequency domain feature matched with each first target frequency domain feature in the first target frequency domain feature set;
performing a feature integration operation according to the second target frequency domain features in the second target frequency spectrum and the neighborhood features matched with the second target frequency domain features to obtain a second integrated frequency domain feature matched with each second target frequency domain feature in the second target frequency domain feature set;
performing a feature integration operation according to the third target frequency domain features in the third target frequency spectrum and the neighborhood features matched with the third target frequency domain features to obtain a third integrated frequency domain feature matched with each third target frequency domain feature in the third target frequency domain feature set;
performing a merging operation according to the first integrated frequency domain feature, the second integrated frequency domain feature and the third integrated frequency domain feature that are matched with the same audio frame feature vector to obtain a current audio frame feature vector matched with each audio frame feature vector;
and detecting the equipment running state according to the current audio frame feature vector matched with each audio frame feature vector to obtain the current equipment running state corresponding to the undetermined equipment operation sound.
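As an illustration of the spectrum construction and feature integration described in claim 8, the sketch below builds one "frequency spectrum" as an adjacency matrix from Pearson correlation coefficients, another from the temporal order of the audio frames, and integrates each feature with the mean of its neighbourhood features. The correlation threshold and the sum-with-neighbourhood-mean aggregation are assumptions, not details fixed by the claim.

```python
import numpy as np

def pearson_spectrum(features: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Adjacency built from Pearson correlation coefficients between features
    (each row is one audio-frame-level frequency domain feature)."""
    corr = np.corrcoef(features)
    adjacency = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

def temporal_spectrum(num_frames: int) -> np.ndarray:
    """Adjacency built from the sequence of the audio frames: each frame is
    linked to its immediate temporal neighbours."""
    adjacency = np.zeros((num_frames, num_frames))
    idx = np.arange(num_frames - 1)
    adjacency[idx, idx + 1] = 1.0
    adjacency[idx + 1, idx] = 1.0
    return adjacency

def integrate(features: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """Feature integration: combine each feature with the mean of its
    neighbourhood features taken from the constructed spectrum."""
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1.0)
    neighbourhood_mean = adjacency @ features / degree
    return features + neighbourhood_mean

# Example: 12 frames with 8-dimensional target frequency domain features.
feats = np.random.default_rng(1).normal(size=(12, 8))
integrated_first = integrate(feats, pearson_spectrum(feats))
integrated_second = integrate(feats, temporal_spectrum(len(feats)))
current_vectors = np.concatenate([integrated_first, integrated_second], axis=1)
```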
9. The method according to claim 1, wherein the method further comprises:
inputting the operation sound of the undetermined equipment into an equipment running state detection model, analyzing the operation sound of the undetermined equipment by using the equipment running state detection model to obtain each audio frame, and extracting features of each audio frame to obtain the feature vector of each audio frame;
analyzing each audio frame feature vector by using the equipment running state detection model to obtain the first frequency domain features matched with each audio frame feature vector, which form a first frequency domain feature set, and the second frequency domain features matched with each audio frame feature vector, which form a second frequency domain feature set;
constructing, by using the equipment running state detection model, a first frequency spectrum corresponding to the first frequency domain feature set according to the Pearson correlation coefficients between the first frequency domain features in the first frequency domain feature set, and constructing a second frequency spectrum corresponding to the second frequency domain feature set according to the sequence of the audio frames;
performing, by using the equipment running state detection model, a feature integration operation according to the first frequency domain features in the first frequency spectrum and the neighborhood features corresponding to the first frequency domain features to obtain a first joint feature vector matched with each first frequency domain feature in the first frequency domain feature set, and performing a feature integration operation according to the second frequency domain features in the second frequency spectrum and the neighborhood features corresponding to the second frequency domain features to obtain a second joint feature vector matched with each second frequency domain feature in the second frequency domain feature set;
and performing, by using the equipment running state detection model, a merging operation on the first joint feature vector and the second joint feature vector that are matched with the same audio frame feature vector to obtain a target audio frame feature vector matched with each audio frame feature vector, and detecting the equipment running state according to the target audio frame feature vector matched with each audio frame feature vector to obtain the determined equipment running state corresponding to the undetermined equipment operation sound.
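To tie the steps of claim 9 together, the sketch below outlines one possible end-to-end pass of a detection model: framing the operation sound, extracting per-frame features, splitting them into two frequency domain branches, and classifying the merged target vectors. The frame length, hop size, log-magnitude FFT features, band split, and the injected classifier are all assumptions used purely for illustration; the spectrum construction and feature integration steps would reuse helpers like those sketched after claim 8.

```python
import numpy as np

def frame_audio(signal: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Split the operation sound into audio frames (frame length and hop size
    are assumed; the claim does not specify them)."""
    count = 1 + max(0, len(signal) - frame_len) // hop
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(count)])

def frame_features(frames: np.ndarray) -> np.ndarray:
    """Per-frame feature vectors; a log-magnitude FFT is assumed here."""
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def detect_running_state(signal: np.ndarray, classifier) -> str:
    """End-to-end sketch: features -> two frequency domain branches ->
    (spectrum construction and integration elided) -> merged target vectors
    -> running state decision via an injected classifier."""
    feats = frame_features(frame_audio(signal))
    half = feats.shape[1] // 2
    first_branch, second_branch = feats[:, :half], feats[:, half:]  # assumed band split
    # Spectrum construction and feature integration would go here (see the
    # Pearson/temporal sketch above); the branches are merged directly for brevity.
    target_vectors = np.concatenate([first_branch, second_branch], axis=1)
    return "abnormal" if classifier(target_vectors.mean(axis=0)) else "normal"

# Usage with a trivial stand-in classifier that flags high average log energy.
sound = np.random.default_rng(2).normal(size=16000)
state = detect_running_state(sound, classifier=lambda v: v.mean() > 3.0)
```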
10. A server system comprising a server for performing the method of any of claims 1-9.

Priority Applications (1)

Application Number: CN202311768501.XA (published as CN117672255B)
Priority Date: 2023-12-21
Filing Date: 2023-12-21
Title: Abnormal equipment identification method and system based on artificial intelligence and equipment operation sound


Publications (2)

CN117672255A (en), published 2024-03-08
CN117672255B (en), published 2024-05-14

Family

ID=90068155

Family Applications (1)

Application Number: CN202311768501.XA (CN117672255B, active)
Priority Date: 2023-12-21
Filing Date: 2023-12-21
Title: Abnormal equipment identification method and system based on artificial intelligence and equipment operation sound

Country Status (1)

CN: CN117672255B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113049252A (en) * 2021-03-25 2021-06-29 成都天佑路航轨道交通科技有限公司 Fault detection method for train bearing box
WO2022048334A1 (en) * 2020-09-03 2022-03-10 Oppo广东移动通信有限公司 Testing method and apparatus, earphones, and readable storage medium
CN114678038A (en) * 2022-03-22 2022-06-28 腾讯音乐娱乐科技(深圳)有限公司 Audio noise detection method, computer device and computer program product
CN115909675A (en) * 2022-10-08 2023-04-04 上海交通大学 Distributed edge computing power equipment sound monitoring method
CN116935892A (en) * 2023-08-15 2023-10-24 浙江大学 Industrial valve anomaly detection method based on audio key feature dynamic aggregation



Similar Documents

Publication Publication Date Title
Helbing et al. Deep Learning for fault detection in wind turbines
US9483049B2 (en) Anomaly detection and diagnosis/prognosis method, anomaly detection and diagnosis/prognosis system, and anomaly detection and diagnosis/prognosis program
Wen et al. Graph modeling of singular values for early fault detection and diagnosis of rolling element bearings
US8036999B2 (en) Method for analyzing and classifying process data that operates a knowledge base in an open-book mode before defining any clusters
KR20200101507A (en) Machine Diagnosis and Prediction System using Machine Learning
KR102067344B1 (en) Apparatus and Method for Detecting Abnormal Vibration Data
CN109960232A (en) The method that the selection method of leading auxiliary parameter and plant maintenance diagnose in advance
CN110632484A (en) ELM-based GIS partial discharge defect diagnosis and classification system and method
CN117150216B (en) Regression analysis method and system for power data
CN117932322B (en) Flour equipment fault diagnosis method and system
CN115437358A (en) Intelligent state monitoring and fault diagnosis system and fault diagnosis method for industrial robot
KR20210060157A (en) Fault diagnosis apparatus and method based on artificial intelligence technology
CN112734977B (en) Equipment risk early warning system and algorithm based on Internet of things
CN117435908A (en) Multi-fault feature extraction method for rotary machine
Ribeiro et al. Rotating machinery fault diagnosis using similarity-based models
Nadir et al. Utilizing principal component analysis for the identification of gas turbine defects
KR102545672B1 (en) Method and apparatus for machine fault diagnosis
CN117672255B (en) Abnormal equipment identification method and system based on artificial intelligence and equipment operation sound
CN111456915A (en) Fault diagnosis device and method for internal components of fan engine room
Srilakshmi et al. A Review on Fault Detection Diagnosis and Prognosis in Vibration Measurement through Wavelets on Machine Elements
Fumagalli et al. Agile diagnostic tool based on electrical signature analysis
KR20230102431A (en) Oil gas plant equipment failure prediction and diagnosis system based on artificial intelligence
CN117854245B (en) Abnormal equipment monitoring method and system based on equipment operation audio
CN116643170B (en) Motor shafting vibration testing method and device and computer equipment
Sobha et al. A comprehensive approach for gearbox fault detection and diagnosis using sequential neural networks

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant