WO2023245991A1 - Power plant equipment state auditory monitoring method merging frequency band top-down attention mechanism

Info

Publication number: WO2023245991A1
Application number: PCT/CN2022/135718
Authority: WO
Inventors: 陈满; 姚建超; 赵增涛; 张晖; 陈弘昊; 张豪; 窦博文; 李重阳; 林伟杰; 郑春; 叶超欣; 黄璐琦; 吴盛彪; 徐添; 何健辉
Original assignee: 南方电网调峰调频发电有限公司储能科研院
Priority date: 2022-06-24
Filing date: 2022-11-30
Publication date: 2023-12-28
Also published as: CN116825131A

Abstract

A power plant equipment state auditory monitoring method merging a frequency band top-down attention mechanism. The method comprises the following steps: setting a sound sensor, and obtaining the operating sounds of electrical equipment; preprocessing the operating sounds of the electrical equipment to obtain preprocessed sound data; performing frequency band top-down attention mechanism processing on the preprocessed sound data; and performing convolutional neural network identification to obtain an identification result. The provided method solves the problem of traditional sound monitoring methods requiring the deep mining of various electrical equipment operating status sound features, resulting in high difficulty and low efficiency; and solves the problem of directly applying a machine learning method, resulting in large sample requirements and many training instances; the provided novel frequency band top-down attention mechanism combines the characteristics of electrical equipment operating sounds to focus on concentrated downwardly spreading areas of a Mel spectrogram, thereby improving identification results and reducing the number of training instances.

Description

Auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism

Technical field

The invention relates to the field of computer audition, and in particular to an auditory monitoring method for power plant equipment state that integrates a frequency band top-down attention mechanism.

Background

Computer audition mainly targets the recognition and processing of non-speech sound, including music signal processing that draws on music domain knowledge and audio information processing that draws on knowledge from other domains. It is an interdisciplinary field that uses computational methods to understand and analyze the content of digital sound and music, and its main foundations are audio signal processing and machine learning. Current work falls into two categories: music-oriented computer audition and environmental-sound-oriented computer audition (also called general-audio computer audition).

In application, computer audition extracts audio features on top of traditional audio signal processing and combines them with machine learning algorithms (mainly pattern recognition methods) to perform condition monitoring and fault diagnosis. In industry, computer audition has also been widely studied and applied: faults are diagnosed from equipment operating noise, chiefly through acoustic target recognition.

In the power sector, operating power equipment produces sound through mechanical vibration. Under normal operation the sound generally follows a regular pattern, but when a fault occurs, the change in operating state or structure changes the sound as well. For example, when a mechanical fault occurs, the vibration characteristics or the vibration energy in certain frequency bands change, accompanied by harsh or sharp noise. Overloaded operation or other faults can also cause abnormal changes in sound. The sound signal of power equipment therefore carries a great deal of operating state information. Experienced engineers can judge from abnormal sounds on site, through changes in audio characteristics such as timbre, volume, and pitch, whether equipment is operating abnormally, and can even identify the type and severity of the fault.

In the prior art, traditional sound monitoring methods for electrical equipment are as follows:

Because of the working state and operating environment of rotating equipment, the target acoustic signal is mixed with a large amount of noise. When computer audition is used for condition monitoring of rotating equipment, it is usually combined with traditional audio processing methods such as spectrum analysis or wavelet transform ([1] Hu Sheng, Hao Jianbo, Luo Zhongqi, et al. Hydro-generator fault diagnosis method based on noise frequency band extraction [J]. Large Electric Machinery Technology, 2017(6): 25-29) to determine whether a fault exists. The signal is then processed with more complex audio feature extraction algorithms such as WPT, EMD, or MFCC, and finally a traditional classifier such as an HMM or SVM performs the condition monitoring and fault warning tasks. Acoustic signal processing for rotating equipment is generally combined with other monitoring parameters such as traditional vibration signals; the audio analysis results serve as an auxiliary judgment that makes the vibration-based results more reliable. However, there is still no unified, reliable acoustic signal processing algorithm that applies to all equipment in a power system. Different acoustic feature extraction methods must be chosen according to each device's operating characteristics and working state, and most applications use traditional audio signal processing techniques.

Compared with rotating equipment, non-rotating equipment mainly comprises small equipment in power plants or primary and secondary equipment in substations. Because no high-power rotating device is running, it is characterized by small vibration amplitude and low ambient noise. It follows from the analysis above that non-rotating equipment monitored through its acoustic signal can complete the condition monitoring task without other auxiliary information. Moreover, because most non-rotating equipment operates in open environments, processing is comparatively easy: extracting features with traditional time-frequency algorithms such as FFT, STFT, or WA and classifying states with statistical methods such as VQ, autocorrelation coefficients, or fuzzy clustering is enough to meet basic monitoring requirements. However, relying on these algorithms weakens the integration of audio processing with machine learning algorithms, which hinders improvement in the recognition rate and stability of condition monitoring.

In the prior art, machine-learning-based sound monitoring methods for electrical equipment are as follows:

Sound-based online monitoring is a simple and reliable non-invasive monitoring method. It does not interfere with the normal operation of electrical equipment and reflects its working state and abnormal conditions well. Electrical equipment comes in many types, with complex structures and diverse fault types, so it is difficult to infer its health directly from sound. Using machine learning for acoustic anomaly monitoring of electrical equipment has received wide attention. Deep learning usually requires more training data than traditional machine learning, but training does not become significantly harder as the data scale and feature dimension grow, and online operation requires only forward propagation of the neural network, which is computationally efficient. However, applying deep learning to unsupervised or semi-supervised anomaly monitoring is still in its infancy. Current deep learning anomaly detection methods ([3] Liang Yanchang. Research on transformer acoustic anomaly detection methods based on machine learning [D]. North China Electric Power University (Beijing)) include autoencoders, variational autoencoders, single-objective generative adversarial active learning, and multi-objective generative adversarial active learning. Principal component analysis and autoencoders work on similar principles: principal component analysis forms linear combinations of vectors using linear algebra, whereas autoencoders form nonlinear combinations of vectors using deep neural networks. Variational autoencoders use a probabilistic method to compute anomaly scores, which makes them more interpretable than autoencoders. Multi-objective generative adversarial active learning extends the single-objective form with multiple generators to improve performance. The prior art builds convolutional neural network and recurrent neural network models using deep neural network methods, and extracts the spectrogram and Mel cepstral coefficients of the sound signal as training samples ([2] Chen Mingquan. Research on insulation discharge monitoring of 12kV medium-voltage switchgear based on sound feature recognition [D]. Xiamen Institute of Technology, 2019. DOI: 10.27866/d.cnki.gxlxy.2019.000074.).

Most existing sound monitoring technologies for electrical equipment use traditional sound monitoring methods, which are inefficient and difficult to apply. Machine learning methods require large amounts of data and many training iterations, and are still at an early stage. Some studies improve sound recognition by using attention mechanisms, but none has built a new attention mechanism around the audio of power plant equipment. ([4] Huang Cong. Speaker identification algorithm based on frequency band attention and multi-metric learning [D]. Nanchang University, 2021. DOI: 10.27232/d.cnki.gnchu.2021.001316.)

Summary of the invention

The present invention combines the sound characteristics of power plant equipment and proposes an auditory monitoring method for power plant equipment state that integrates a frequency band top-down attention mechanism. It requires few training iterations and somewhat less data.

The object of the present invention is achieved by at least one of the following technical solutions.

The auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism effectively addresses the low efficiency and high difficulty of the traditional sound monitoring methods currently in use, as well as the large data scale and many training iterations required by machine learning methods. It comprises the following steps:

S1. Set up a sound sensor and acquire the operating sound of the electrical equipment;

S2. Preprocess the operating sound of the electrical equipment to obtain preprocessed sound data;

S3. Apply the frequency band top-down attention mechanism to the preprocessed sound data;

S4. Perform convolutional neural network recognition to obtain the recognition result.

Further, in step S1, with the sound sensor positions preset, the operating sound of the monitored electrical equipment is collected and stored.

Further, in step S2, preprocessing the operating sound of the electrical equipment comprises sound duration processing, Fourier transform, and Mel spectrum transform.

Further, the sound duration processing is as follows:

A complete audio recording of the operating electrical equipment is cut into segments of a set duration; a segment that does not reach the set duration is discarded.
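The cutting rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `split_into_segments`, the 16 kHz sample rate, and the 0.25 s segment length are assumptions chosen for the example, not values taken from the method.

```python
def split_into_segments(samples, sample_rate, segment_seconds):
    """Cut a recording into equal-length segments; drop a short final remainder."""
    segment_len = int(round(sample_rate * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment duration is shorter than one sample")
    full_segments = len(samples) // segment_len
    return [
        samples[i * segment_len:(i + 1) * segment_len]
        for i in range(full_segments)
    ]


# Hypothetical recording: 16000 samples plus a 1000-sample remainder.
recording = [0.0] * 17000
segments = split_into_segments(recording, sample_rate=16000, segment_seconds=0.25)
```

Here each segment holds 4000 samples, so four full segments are kept and the trailing 1000 samples are discarded, which matches the "discard if too short" rule.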

Further, the Fourier transform performs a time-frequency transformation on each audio segment obtained by the sound duration processing, converting the time-domain signal into a frequency-domain signal, as follows:

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \qquad (1)$$

where f(t) is the time-domain signal, F(ω) is the frequency-domain signal, i is the imaginary unit, and ω is the angular frequency.
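For sampled audio, the continuous transform is normally approximated by a discrete Fourier transform over each segment. The sketch below is a naive pure-Python DFT intended only to show the idea; a real pipeline would presumably use an FFT library, and the 32-sample test tone is a made-up input.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform:
    X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


# Hypothetical test tone: exactly 3 cycles of a sine wave across 32 samples.
N = 32
tone = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
magnitudes = [abs(x) for x in dft(tone)]

# Index of the strongest component in the non-mirrored half of the spectrum.
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
```

The energy of the tone lands in bin 3, the bin matching its 3 cycles per window, which is the time-to-frequency mapping the preprocessing step relies on.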

Further, the Mel spectrum transform converts the frequency-domain signal obtained by the Fourier transform to the Mel scale to obtain the Mel spectrogram of the sound signal, as follows:

where f is the frequency and MEL(f) is the corresponding Mel frequency on the Mel scale.
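As an illustration of a Hz-to-Mel mapping, the sketch below uses the widely used form MEL(f) = 2595 · log10(1 + f / 700). The constants 2595 and 700 are an assumption here, since the formula itself is not reproduced in this text; the helper names `hz_to_mel` and `mel_to_hz` are likewise hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Assumed Hz-to-Mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of the assumed mapping above."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Mel values for a few example frequencies (in Hz).
example_mels = [hz_to_mel(f) for f in (0.0, 700.0, 1000.0, 4000.0)]
```

Under this mapping the scale is roughly linear at low frequencies and compressive at high frequencies, so low-frequency bands receive comparatively finer resolution in the Mel spectrogram.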

Further, in step S3, the frequency band top-down attention mechanism processing is as follows:

The Mel spectrogram shows how the frequency bands are distributed: unevenly from left to right, concentrated in the low-frequency range, and tending to spread downward. The invention therefore proposes a frequency band top-down attention mechanism that attends more closely to the region where the frequency bands of electrical equipment concentrate downward. The Mel spectrum matrix is transformed as follows to obtain an attention matrix, and the attention matrix is multiplied element-wise with the original Mel spectrum matrix:

$$M' = X_{\text{attention}} \cdot M \qquad (4)$$

where $M$ is the Mel spectrum matrix composed of MEL(f) values, $M'$ is the Mel spectrum matrix after attention processing, $x_{j,k}$ is the value in row $j$ and column $k$ of the Mel spectrum matrix, $x'_{j,k}$ is the value after downward accumulation and normalization, which forms the elements of the transformed attention matrix $X_{\text{attention}}$, $j \le n$, $k \le m$, and $n$ and $m$ are the numbers of rows and columns of the Mel spectrum matrix, respectively.
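The formula defining the attention matrix elements is not reproduced in this text, so the following is only one plausible reading of "downward accumulation and normalization", not the patented formula. It assumes that each element is the running sum of its column from the top row down to row j, divided by the column total, and that the Mel matrix holds non-negative values. The function name and the 3x2 matrix are hypothetical.

```python
def top_down_attention(mel):
    """Assumed form of the attention step: column-wise cumulative sum from the
    top row downward, divided by the column total, followed by element-wise
    multiplication with the input matrix (equation (4))."""
    n_rows, n_cols = len(mel), len(mel[0])
    attention = [[0.0] * n_cols for _ in range(n_rows)]
    for k in range(n_cols):
        total = sum(mel[j][k] for j in range(n_rows))
        running = 0.0
        for j in range(n_rows):
            running += mel[j][k]
            # Guard against an all-zero column to avoid division by zero.
            attention[j][k] = running / total if total else 0.0
    weighted = [[attention[j][k] * mel[j][k] for k in range(n_cols)]
                for j in range(n_rows)]
    return attention, weighted


# Hypothetical 3x2 non-negative Mel matrix (rows j, columns k).
M = [[1.0, 2.0],
     [1.0, 2.0],
     [2.0, 4.0]]
X_attention, M_prime = top_down_attention(M)
```

With this reading the weights grow from 0.25 in the top row to 1.0 in the bottom row of each column, so lower rows of the matrix are emphasized after the element-wise product.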

Further, in step S4, the convolutional neural network comprises an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a first fully connected layer, a second fully connected layer, and an output layer;

The input layer receives the Mel spectrogram processed by the frequency band top-down attention mechanism. The first convolutional layer uses 3*3 kernels, 32 in number; the first pooling layer uses a 2*2 window for downsampling; the second convolutional layer uses 3*3 kernels, 64 in number; the second pooling layer uses a 2*2 window for downsampling; the first fully connected layer has a length of 512; the second fully connected layer has a length equal to the number of electrical equipment states to be monitored; the output of the output layer is the recognition accuracy computed over the test set or validation set.

The convolutional neural network is trained; the Mel spectrogram of the audio to be examined is then taken as input and passed to the trained network to obtain the recognition result.
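To make the layer sizes concrete, the following sketch traces tensor shapes through the network described above. Several details are assumptions because the text does not state them: stride 1 and no padding for the convolutions, non-overlapping 2*2 pooling, a single-channel 64x64 input, and four equipment states.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output length after non-overlapping pooling."""
    return size // window


def cnn_shapes(height, width, n_states):
    """List (layer name, output shape) for the two-convolution network."""
    shapes = [("input", (height, width, 1))]
    h, w = conv_out(height), conv_out(width)
    shapes.append(("conv1 3x3 x32", (h, w, 32)))
    h, w = pool_out(h), pool_out(w)
    shapes.append(("pool1 2x2", (h, w, 32)))
    h, w = conv_out(h), conv_out(w)
    shapes.append(("conv2 3x3 x64", (h, w, 64)))
    h, w = pool_out(h), pool_out(w)
    shapes.append(("pool2 2x2", (h, w, 64)))
    shapes.append(("fc1", (512,)))
    shapes.append(("fc2", (n_states,)))
    return shapes


# Hypothetical input size and state count.
layers = cnn_shapes(height=64, width=64, n_states=4)
```

Under these assumptions the second pooling layer outputs a 14x14x64 tensor, so 12544 values feed the 512-unit fully connected layer, and the final layer has one unit per equipment state.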

Compared with the prior art, the advantages of the present invention are:

The proposed method solves the problem that traditional sound monitoring methods require in-depth mining of the sound features of each operating state of different electrical equipment, which makes them difficult and inefficient;

The proposed method solves the problem that applying machine learning methods directly requires large numbers of samples and many training iterations;

The new frequency band top-down attention mechanism proposed by the invention takes the characteristics of electrical equipment operating sound into account and focuses attention on the regions of the Mel spectrogram that are concentrated and spread downward, giving better recognition with fewer training iterations.

Brief description of the drawings

Figure 1 is a flow chart of the steps of the auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism in an embodiment of the present invention;

Figure 2 is a Mel spectrogram in an embodiment of the present invention;

Figure 3 is a structural diagram of the convolutional neural network in an embodiment of the present invention;

Figure 4 is a schematic diagram of the recognition results in Embodiment 1 of the present invention;

Figure 5 is a schematic diagram of the recognition results in Embodiment 2 of the present invention;

Figure 6 is a schematic diagram of the recognition results in Embodiment 3 of the present invention.

Detailed description

The combination of identification and tracking methods described above is a preferred embodiment of the present invention, but embodiments of the invention are not limited to the embodiments described. Any other modification, embellishment, substitution, combination, or simplification that does not depart from the spirit and principle of the invention shall be regarded as an equivalent replacement and falls within the scope of protection of the invention.

Example 1:

The auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism effectively addresses the low efficiency and high difficulty of the traditional sound monitoring methods currently in use, as well as the large data scale and many training iterations required by machine learning methods. As shown in Figure 1, it comprises the following steps:

S1. With the sound sensor positions preset, the operating sound of the monitored electrical equipment is collected and stored as single-channel WAV.
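Storing and reloading a single-channel WAV recording can be sketched with Python's standard `wave` module. The 16-bit sample width, the 8 kHz sample rate, the synthetic 50 Hz hum, and the helper names are assumptions for illustration; the embodiment only states that the audio is single-channel WAV.

```python
import math
import os
import struct
import tempfile
import wave


def write_mono_wav(path, samples, sample_rate):
    """Store float samples in [-1, 1] as 16-bit single-channel PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # single channel, as in the embodiment
        wav.setsampwidth(2)      # 16-bit samples (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def read_mono_wav(path):
    """Load a 16-bit single-channel WAV file as floats plus its sample rate."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit single-channel audio")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = [v / 32767 for (v,) in struct.iter_unpack("<h", raw)]
    return samples, sample_rate


# Hypothetical round trip: 0.1 s of a 50 Hz hum sampled at 8 kHz.
rate = 8000
hum = [0.5 * math.sin(2 * math.pi * 50 * n / rate) for n in range(800)]
wav_path = os.path.join(tempfile.mkdtemp(), "equipment.wav")
write_mono_wav(wav_path, hum, rate)
loaded, loaded_rate = read_mono_wav(wav_path)
```

The reloaded samples differ from the originals only by 16-bit quantization error, so the stored file can feed the duration processing and transforms of the preprocessing step directly.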

S2. Preprocessing the operating sound of the electrical equipment comprises sound duration processing, Fourier transform, and Mel spectrum transform.

The sound duration processing is as follows:

A complete audio recording of the operating electrical equipment is cut into segments with a duration of 0.25ms; a segment that does not reach this duration is discarded.

The Fourier transform performs a time-frequency transformation on each audio segment obtained by the sound duration processing, converting the time-domain signal into a frequency-domain signal, as follows:

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \qquad (1)$$

where f(t) is the time-domain signal, F(ω) is the frequency-domain signal, i is the imaginary unit, and ω is the angular frequency.

As shown in Figure 2, the Mel spectrum transform converts the frequency-domain signal obtained by the Fourier transform to the Mel scale to obtain the Mel spectrogram of the sound signal, as follows:

where f is the frequency and MEL(f) is the corresponding Mel frequency on the Mel scale.

S3. Apply the frequency band top-down attention mechanism to the preprocessed sound data, as follows:

$$M' = X_{\text{attention}} \cdot M \qquad (4)$$

S4. Perform convolutional neural network recognition to obtain the recognition result;

As shown in Figure 3, the convolutional neural network comprises an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a first fully connected layer, a second fully connected layer, and an output layer;

The training data comprise 2400 audio segments and the test data 600 audio segments, each with a duration of 25ms. The input layer receives the Mel spectrogram processed by the frequency band top-down attention mechanism. The first convolutional layer uses 3*3 kernels, 32 in number; the first pooling layer uses a 2*2 window for downsampling; the second convolutional layer uses 3*3 kernels, 64 in number; the second pooling layer uses a 2*2 window for downsampling; the first fully connected layer has a length of 512; the second fully connected layer has a length equal to the number of electrical equipment states to be monitored; the output of the output layer is the recognition accuracy computed over the test set or validation set.

The recognition results are shown in Figure 4. The solid line is the training recognition accuracy of the method of the present invention, and the dashed line is the training recognition accuracy of the two-layer convolutional neural network method.

Comparing the first two accuracy values, the method of the present invention achieves higher accuracy and better recognition than the two-layer convolutional neural network. As the number of training iterations increases, the accuracy of both the method of the present invention and the two-layer convolutional neural network gradually improves.

Here, the two-layer convolutional neural network comprises an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a first fully connected layer, a second fully connected layer, and an output layer. The input layer receives the Mel spectrogram directly. The first convolutional layer uses 3*3 kernels, 32 in number; the first pooling layer uses a 2*2 window for downsampling; the second convolutional layer uses 3*3 kernels, 64 in number; the second pooling layer uses a 2*2 window for downsampling; the first fully connected layer has a length of 512; the second fully connected layer has a length equal to the number of electrical equipment states to be monitored; the output of the output layer is the recognition accuracy computed over the test set or validation set.

Example 2:

The training data comprise 2000 audio segments and the test data 500 audio segments, each with a duration of 25ms. The input layer receives the Mel spectrogram processed by the frequency band top-down attention mechanism. The first convolutional layer uses 3*3 kernels, 32 in number; the first pooling layer uses a 2*2 window for downsampling; the second convolutional layer uses 3*3 kernels, 64 in number; the second pooling layer uses a 2*2 window for downsampling; the first fully connected layer has a length of 512; the second fully connected layer has a length equal to the number of electrical equipment states to be monitored; the output of the output layer is the recognition accuracy computed over the test set or validation set.

The recognition results are shown in Figure 5. The solid line is the training recognition accuracy of the method of the present invention, and the dashed line is the training recognition accuracy of the two-layer convolutional neural network method.

Comparing the first two accuracy values, the method of the present invention achieves higher accuracy and better recognition than the two-layer convolutional neural network. As the number of training iterations increases, the accuracy of both the method of the present invention and the two-layer convolutional neural network gradually improves.

Example 3:

The training data comprise 1200 audio segments and the test data 300 audio segments, each with a duration of 50ms. The input layer receives the Mel spectrogram processed by the frequency band top-down attention mechanism. The first convolutional layer uses 3*3 kernels, 32 in number; the first pooling layer uses a 2*2 window for downsampling; the second convolutional layer uses 3*3 kernels, 64 in number; the second pooling layer uses a 2*2 window for downsampling; the first fully connected layer has a length of 512; the second fully connected layer has a length equal to the number of electrical equipment states to be monitored; the output of the output layer is the recognition accuracy computed over the test set or validation set.

The recognition results are shown in Figure 6. The solid line is the training recognition accuracy of the method of the present invention, and the dashed line is the training recognition accuracy of the two-layer convolutional neural network method.

Comparing the first two accuracy values, the method of the present invention achieves higher accuracy and better recognition than the two-layer convolutional neural network. As the number of training iterations increases, the accuracy of both the method of the present invention and the two-layer convolutional neural network gradually improves.

Claims

An auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism, characterized by comprising the following steps:

S1. Set up a sound sensor and acquire the operating sound of the electrical equipment;

S2. Preprocess the operating sound of the electrical equipment to obtain preprocessed sound data;

S3. Apply the frequency band top-down attention mechanism to the preprocessed sound data;

S4. Perform convolutional neural network recognition to obtain the recognition result.
The auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism according to claim 1, characterized in that, in step S1, with the sound sensor positions preset, the operating sound of the monitored electrical equipment is collected and stored.
The auditory monitoring method for power plant equipment state integrating a frequency band top-down attention mechanism according to claim 1, characterized in that, in step S2, preprocessing the operating sound of the electrical equipment comprises sound duration processing, Fourier transform, and Mel spectrum transform.
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to claim 3, characterized in that the sound duration processing is specifically as follows:

A complete audio recording of the operating electrical equipment is cut into separate segments of a set duration; any segment that does not meet the duration requirement is discarded.
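The duration processing can be illustrated with a minimal sketch. It assumes the recording is available as a sliceable sequence of samples; the function name `split_into_segments` and the parameters `sample_rate` and `segment_seconds` are hypothetical and do not come from the source.

```python
def split_into_segments(samples, sample_rate, segment_seconds):
    """Cut a recording into consecutive fixed-length segments.

    A trailing piece shorter than the set duration is discarded.
    """
    segment_len = int(round(sample_rate * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_full = len(samples) // segment_len  # number of complete segments
    return [samples[i * segment_len:(i + 1) * segment_len] for i in range(n_full)]


# Illustrative values only: 10 samples at 2 Hz, cut into 2-second segments.
# The last two samples do not fill a segment and are dropped.
segments = split_into_segments(list(range(10)), sample_rate=2, segment_seconds=2)
```

Because only slicing and `len` are used, the same function would also work on an array type that supports them.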
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to claim 4, characterized in that the Fourier transform performs a time-frequency domain transformation on each audio segment obtained from the sound duration processing, converting the time-domain signal into a frequency-domain signal, specifically as follows:

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$$

where $f(t)$ is the time-domain signal, $F(\omega)$ is the frequency-domain signal, $i$ is the imaginary unit, and $\omega$ is the angular frequency.
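On sampled audio the transform is evaluated in discrete form. The following is a minimal sketch of a naive discrete Fourier transform applied to a synthetic tone; the function `dft` and the test signal are illustrative assumptions, not part of the source, and a practical implementation would more likely call a fast Fourier transform routine such as `numpy.fft.rfft`.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform: F[k] = sum_t f[t] * exp(-2*pi*i*k*t/N)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Synthetic example: a pure tone completing 3 cycles over 16 samples
n_samples = 16
tone = [math.sin(2 * math.pi * 3 * t / n_samples) for t in range(n_samples)]
magnitudes = [abs(c) for c in dft(tone)]

# For a real-valued signal only the first half of the spectrum is unique
peak_bin = max(range(n_samples // 2), key=lambda k: magnitudes[k])
```

The energy of the tone concentrates in the frequency bin that matches its number of cycles, which is the property the later Mel spectrogram relies on.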
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to claim 5, characterized in that the Mel spectrum transform converts the frequency-domain signal obtained from the Fourier transform to the Mel scale to obtain the Mel spectrogram of the sound signal, specifically as follows:

where $f$ is the frequency and $\mathrm{MEL}(f)$ is the Mel frequency on the Mel scale.
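The mapping from frequency to Mel frequency can be sketched as below. The constants 2595 and 700 belong to one widely used form of the Mel scale and are an assumption here: they are not taken from the source. The function names `hz_to_mel` and `mel_to_hz` are likewise hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common Hz-to-Mel mapping (assumed form, not taken from the source)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed constants."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# With these constants, 1000 Hz maps to roughly 1000 Mel
mel_1k = hz_to_mel(1000.0)
```

The scale is close to linear at low frequencies and compresses higher ones, so a Mel spectrogram spends more of its rows on the lower part of the spectrum.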
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to claim 1, characterized in that, in step S3, the frequency band downward attention mechanism processing is specifically as follows:

The Mel spectrum matrix is transformed as follows to obtain an attention matrix, and the attention matrix is multiplied element-wise with the original Mel spectrum matrix, specifically as follows:

$$M' = X_{\text{attention}} \odot M \qquad (4)$$

where $M$ is the Mel spectrum matrix composed of $\mathrm{MEL}(f)$, $M'$ is the Mel spectrum matrix after attention mechanism processing, $x_{j,k}$ is the value in row $j$ and column $k$ of the Mel spectrum matrix, and $x'_{j,k}$ is the value after downward superposition and normalization, which forms the elements of the transformed attention matrix $X_{\text{attention}}$; $j \le n$, $k \le m$, where $n$ and $m$ are the numbers of rows and columns of the Mel spectrum matrix, respectively.
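The following sketch shows one possible reading of this step. The text does not give the exact formula for "downward superposition and normalization", so the code makes an explicit assumption: each column is accumulated from the top row downward and divided by the column total. The function name `downward_attention` is hypothetical, and the actual formula in the method may differ.

```python
from itertools import accumulate


def downward_attention(mel):
    """Build an attention matrix and apply it element-wise.

    ASSUMPTION: x'[j][k] = (sum of x[0..j][k]) / (sum of the whole column k).
    `mel` is a list of rows holding non-negative values.
    """
    n_rows, n_cols = len(mel), len(mel[0])
    attention = [[0.0] * n_cols for _ in range(n_rows)]
    for k in range(n_cols):
        column = [mel[j][k] for j in range(n_rows)]
        running = list(accumulate(column))  # downward superposition
        total = running[-1]
        for j in range(n_rows):
            # Normalise to [0, 1]; an all-zero column gets zero attention
            attention[j][k] = running[j] / total if total else 0.0
    # Element-wise product of the attention matrix and the original matrix
    weighted = [
        [attention[j][k] * mel[j][k] for k in range(n_cols)]
        for j in range(n_rows)
    ]
    return attention, weighted


# Tiny illustrative matrix (2 rows, 2 columns)
M = [[1.0, 2.0],
     [3.0, 2.0]]
X_attention, M_prime = downward_attention(M)
```

Under this assumption the weights grow monotonically down each column, so rows further down the matrix are emphasized in the product that is passed to the network.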
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to claim 1, characterized in that, in step S4, the convolutional neural network comprises an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a first fully connected layer, a second fully connected layer, and an output layer.
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to claim 8, characterized in that the input layer receives the Mel spectrogram processed by the frequency band downward attention mechanism; the first convolutional layer uses 3*3 convolution kernels, with 32 kernels; the first pooling layer uses a 2*2 window for downsampling; the second convolutional layer uses 3*3 convolution kernels, with 64 kernels; the second pooling layer uses a 2*2 window for downsampling; the first fully connected layer has a length of 512; the second fully connected layer has a length equal to the number of electrical equipment states to be monitored; and the output of the output layer is the recognition accuracy computed over the test set or validation set.
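The layer settings above can be checked with a short shape trace. This is a sketch under stated assumptions that the source does not specify: stride 1 and no padding in the convolutions, pooling stride equal to the window size, and a hypothetical single-channel 64*64 input with 4 equipment states. The helper names `conv_out`, `pool_out`, and `trace_cnn` are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output length after non-overlapping pooling (stride == window)."""
    return size // window


def trace_cnn(height, width, n_states, in_channels=1):
    """Return (layer name, output shape, trainable parameter count) per layer."""
    layers = []
    h, w = conv_out(height), conv_out(width)
    layers.append(("conv1 3x3x32", (h, w, 32), 3 * 3 * in_channels * 32 + 32))
    h, w = pool_out(h), pool_out(w)
    layers.append(("pool1 2x2", (h, w, 32), 0))
    h, w = conv_out(h), conv_out(w)
    layers.append(("conv2 3x3x64", (h, w, 64), 3 * 3 * 32 * 64 + 64))
    h, w = pool_out(h), pool_out(w)
    layers.append(("pool2 2x2", (h, w, 64), 0))
    flat = h * w * 64  # flattened feature vector fed to the first dense layer
    layers.append(("fc1", (512,), flat * 512 + 512))
    layers.append(("fc2", (n_states,), 512 * n_states + n_states))
    return layers


# Hypothetical input size and state count, chosen only for illustration
for name, shape, params in trace_cnn(64, 64, n_states=4):
    print(name, shape, params)
```

Under these assumptions nearly all trainable parameters sit in the first fully connected layer, so the spectrogram size chosen during preprocessing largely determines the size of the model.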
The power plant equipment status auditory monitoring method integrating a frequency band downward attention mechanism according to any one of claims 1 to 8, characterized in that, in step S4, the convolutional neural network is trained; the Mel spectrogram of the audio to be detected is then taken as input, and the trained convolutional neural network is called to obtain the recognition result.