CN112509599A - Acoustic spectrum fault analysis and diagnosis method based on BP neural network and Mel cepstrum - Google Patents


Info

Publication number
CN112509599A
CN112509599A
Authority
CN
China
Prior art keywords
data
cepstrum
audio
training
bpnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011130870.2A
Other languages
Chinese (zh)
Inventor
钱立志
殷希梅
陈栋
陈凯
王曙光
宁全利
张晓龙
马翰宇
凌冲
蒋滨安
石胜斌
朱建生
周生
吴刚
孙姗姗
康焰清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA Army Academy of Artillery and Air Defense
Original Assignee
PLA Army Academy of Artillery and Air Defense
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA Army Academy of Artillery and Air Defense filed Critical PLA Army Academy of Artillery and Air Defense
Priority to CN202011130870.2A
Publication of CN112509599A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/24: characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/30: characterised by the analysis technique, using neural networks
    • G10L25/51: specially adapted for particular use, for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides an acoustic spectrum fault analysis and diagnosis method based on a BP neural network and a Mel cepstrum, relating to the field of automobile maintenance equipment. The cepstrum is generated as follows: the signal is first divided into frames (with the same frame length and step length as for the spectrogram), each frame is windowed, spectrum data are obtained by FFT (fast Fourier transform), a Mel spectrum is obtained through a Mel filter bank, and the log of the Mel spectrum is passed through a DCT (discrete cosine transform) to obtain the cepstrum data. In the invention, the method is used to detect the engine sound of a wheeled transport vehicle.

Description

Acoustic spectrum fault analysis and diagnosis method based on BP neural network and Mel cepstrum
Technical Field
The invention relates to the field of automobile maintenance equipment, in particular to a sound spectrum fault analysis and diagnosis method based on a BP neural network and a Mel cepstrum.
Background
Engine sound detection for wheeled transport vehicles is an important part of routine vehicle maintenance, and the level and quality of this maintenance work directly affect the quality and safety of the vehicles. The invention therefore develops the software and equipment of an audio fault prediction system for transport-vehicle engines.
The audio fault prediction system for wheeled transport-vehicle engines uses terminal acquisition equipment to record the engine sound and automatically send the audio signal to a server. The audio information is analyzed, processed and identified with a neural network and the Mel cepstrum, and various audio analysis images are output to predict equipment faults. At the same time, the collected audio information of a certain type of combat equipment is analyzed and its features extracted to form a model library and an application system.
Disclosure of Invention
The invention aims to provide a sound spectrum fault analysis and diagnosis method based on a BP neural network and a Mel cepstrum, so as to solve the technical problems.
In order to solve the technical problems, the invention adopts the following technical scheme: an acoustic spectrum fault analysis and diagnosis method based on a BP neural network and Mel cepstrum, comprising the following steps:
S1, setting acquisition parameters: configure the system server parameters, namely the server IP, port and acquisition time; the acquisition time defaults to 30 seconds, with a maximum of 120 seconds and a step interval of 10 seconds;
S2, data acquisition, which comprises environment acquisition and equipment acquisition. Environment acquisition records the surrounding environment sound through the terminal equipment; the main parameters include acquisition date, temperature, weather, sound sampling rate and default audio acquisition duration, with files named in the format date-temperature-weather-sampleRate-duration (naming example: -25℃-sunny-44.1KHz-30s.wav). Equipment acquisition records the equipment sound; the main parameters are acquisition date, vehicle model, engine model, kilometrage, acquisition position, fault type and sampling rate, with files named date-vehicleModel-engineModel-kilometers-acquisitionPosition-faultType.wav;
S3, data transmission: the data transmission module collects the recorded audio files offline, establishes communication with the server, and then uploads the selected files;
S4, data analysis, performed by a BP neural network analysis module and a spectrogram-and-cepstrum module. The BP neural network analysis module is configured through three files (a training data file, a guide output data file and a simulation test data file) and finally outputs a trained data model that processes and analyzes the audio files collected by the terminal equipment and returns the corresponding result; the spectrogram-and-cepstrum module generates a spectrogram and a cepstrum for a specified audio file.
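The two naming conventions in step S2 can be sketched as simple string builders. This is written in Python for illustration (the patent's tooling is MatLab), and the exact separators and argument names are assumptions rather than taken from the patent.

```python
def environment_filename(day, temp_c, weather, rate_khz, duration_s):
    # date-temperature-weather-sampleRate-duration.wav (separators assumed)
    return f"{day}-{temp_c}C-{weather}-{rate_khz}KHz-{duration_s}s.wav"

def equipment_filename(day, vehicle, engine, km, position, fault):
    # date-vehicleModel-engineModel-kilometers-acquisitionPosition-faultType.wav
    return f"{day}-{vehicle}-{engine}-{km}km-{position}-{fault}.wav"

# hypothetical values; only the -25C / sunny / 44.1KHz / 30s example is from the text
print(environment_filename("2020-10-21", -25, "sunny", 44.1, 30))
```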
Preferably, the cepstrum generation process in the cepstrum module is as follows: the signal is first divided into frames (with the same frame length and step length as before), each frame is windowed, spectrum data are obtained by FFT (fast Fourier transform), a Mel spectrum is obtained through a Mel filter bank, and the log of the Mel spectrum is passed through a DCT (discrete cosine transform) to obtain the cepstrum data.
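The generation pipeline just described (framing, windowing, FFT, a 48-filter Mel filter bank, log, 48-point DCT) can be sketched in Python with numpy. This is an illustrative sketch, not the patent's implementation: the frame length (1024), step (512), Hamming window and HTK-style mel formula are assumptions, since the patent only says the frame and step lengths are "as before"; the 48 filters and 48 DCT points are from the text.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the patent does not give the formula)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular band-pass filters spaced evenly on the mel scale
    mels = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    # plain DCT-II along the last axis (48 output points per the text)
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2.0 * np.arange(n) + 1.0) / (2.0 * n))
    return x @ basis.T

def mel_cepstrum(signal, sr, frame_len=1024, step=512, n_filters=48):
    # frame -> window -> FFT -> mel filter bank -> log -> DCT
    n_frames = 1 + max(0, (len(signal) - frame_len) // step)
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * step:i * step + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_filters, frame_len, sr).T
    return dct_ii(np.log(mel + 1e-10), n_filters)   # one 48-value row per frame

sig = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # 1 s, 440 Hz test tone
cep = mel_cepstrum(sig, 44100)
print(cep.shape)
```

Each row of the result is the 48-point cepstrum of one frame, the form later averaged for Figs. 2 and 3.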
Preferably, the cepstrum analysis comprises the following steps:
S1, establishing a MatLab project: open MatLab, add the directory containing the audio files to the MatLab path, place the 10 audio files in a single folder for convenient subsequent analysis, download the VoiceBox toolbox and add it to the MatLab path, and write a MatLab script to generate the spectrogram and its data;
S2, extracting training samples: the showMFCC function outputs the cepstrum data of an audio file to a specified file as a column vector; all wav files under the audio folder are read, their cepstrum column-vector data are output to text files with the same name prefix, and the 10 generated text files are spliced into one file, trainData.txt;
S3, guide output values of the training samples: following the previous step, in the trainData training data set the first 5 training samples are extracted from fault audio and the last 5 from normal audio; in the experiment, the guide output value for fault samples is 100 and for normal samples 0;
S4, training and testing the BPNN: write a MatLab script to construct the BPNN, read in trainData.txt and trainDataResult.txt as training data, and test the BPNN using trainData.txt as input data, the expected output being the contents of trainDataResult.txt; training runs for 2000 epochs with a convergence error of 1e-7 and a learning rate of 0.01, and executing the script yields the training behaviour of the BPNN;
S5, adjusting the BPNN settings and testing again. Three possible reasons explain why the experimental result of step S4 did not reach the expected value, and measures are taken for each in turn. For the first reason, the learning rate is changed and the training test repeated. For the second reason, only the number of intermediate layers is changed (to 2 relative to step S4) and the test repeated. For the third reason, the 7 recorded audio files, including drive-shaft knocking audio, girder knocking audio, right-front audio at idle, left-front audio at idle, right-front audio under acceleration and left-front audio under acceleration, are analyzed and tested in turn, each audio file going through steps S2, S3 and S4 to test the output of the BPNN.
Preferably, the BP neural network analysis module uses BP neural network technology. When the BPNN is trained, the audio data in the read template library and their categories are used as training samples for repeated training. For input to the BPNN, an audio cepstrum is converted into text data: all information of each audio cepstrum can be represented as 48 coordinate points whose abscissa values are fixed at 1 to 48, so the abscissa can be ignored and only the ordinate retained. The number of nodes in each intermediate layer of the BPNN can be calculated by a node calculation formula
m = √(n + p) + a (formula ①, where a is an adjustment constant), or m = log₂(n·p) (formula ②).
The invention has the beneficial effects that:
In the invention, the engine sound of a wheeled transport vehicle is detected using the acoustic spectrum fault analysis and diagnosis method based on a BP neural network and the Mel cepstrum.
Drawings
FIG. 1 is a cepstrum obtained by applying the DCT to the log-transformed Mel spectrum according to the present invention;
FIG. 2 is an average cepstrum corresponding to a group of right-front recordings of a fault vehicle at idle according to the present invention;
FIG. 3 is an average cepstrum corresponding to a group of right-front recordings of a normal vehicle at idle according to the present invention;
FIG. 4 is a line graph of the training and testing results of the BPNN of the present invention;
FIG. 5 is a line graph of the experimental results of retesting with BPNN settings adjusted for the first reason according to the present invention;
FIG. 6 is a line graph of the experimental results of retesting for the second reason with 2 intermediate layers according to the present invention;
FIG. 7 is a line graph of the experimental results of retesting for the second reason with 4 intermediate layers according to the present invention;
FIG. 8 is a line graph of the training of the BPNN of the present invention;
FIG. 9 is a scatter plot of the output of the BPNN test on 100 audio sets according to the present invention.
Detailed Description
In order to make the technical means, the original characteristics, the achieved purposes and the effects of the invention easily understood, the invention is further explained below by combining the specific embodiments and the attached drawings, but the following embodiments are only the preferred embodiments of the invention, and not all embodiments are provided. Other embodiments, which can be obtained by persons skilled in the art without creative efforts based on the embodiments, belong to the protection scope of the invention.
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Example 1
As shown in figs. 1-3, an acoustic spectrum fault analysis and diagnosis method based on a BP neural network and Mel cepstrum includes cepstrum analysis of samples. According to the research scheme, the Mel filter bank adopts 48 triangular band-pass filters, the discrete cosine transform (DCT) used to obtain the cepstrum data adopts 48 points, and the per-frame cepstrum data are rendered in RGB to obtain the cepstrogram. The specific generation process is as follows: framing is carried out first (with the same frame length and step length as before), then windowing, then spectrum data are obtained through FFT (fast Fourier transform), a Mel spectrum is obtained through the Mel filter bank, and the log of the Mel spectrum is passed through the DCT to obtain the cepstrum data, as shown in fig. 1, where the abscissa represents time, the ordinate frequency, and the color depth amplitude.
It can be seen that, as with the spectrogram, the cepstrum information within each frame is relatively uniform, so the average cepstrum over the frames can reflect the overall characteristics of the recording. Fig. 2 is the average cepstrum corresponding to a group of right-front idle recordings of a fault vehicle, and fig. 3 the average cepstrum corresponding to a group of right-front idle recordings of a normal vehicle; in both figures the abscissa represents the 48 points of the Mel cepstrum dimension and the ordinate the amplitude.
Average cepstrum information is obtained for the other samples as well: 5 groups of average cepstra corresponding to right-front idle recordings of the fault vehicle, and 5 groups corresponding to right-front idle recordings of the normal vehicle.
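The frame-averaging used for the average cepstra above can be sketched as follows (Python for illustration; the cepstrogram here is synthetic stand-in data, not a real recording):

```python
import numpy as np

# Collapse a per-frame cepstrogram into the frame-averaged 48-point cepstrum
# of the kind shown in Figs. 2 and 3. Shapes are hypothetical (85 frames).
rng = np.random.default_rng(0)
cepstrogram = rng.normal(1.0, 0.1, size=(85, 48))   # (frames, mel points)
avg_cepstrum = cepstrogram.mean(axis=0)             # one amplitude per mel point
print(avg_cepstrum.shape)
```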
By observation alone, distinguishing feature points are difficult to find. However, because the dimension of a single sample is low (a 48-element vector) and it completely and compactly reflects the static information of the sample, the samples can be classified with a BP neural network; the subsequent work trains the neural network to see whether these static features can be used directly for classification.
Example 2
As shown in figs. 1 to 9, an acoustic spectrum fault analysis and diagnosis method based on a BP neural network and Mel cepstrum includes the following cepstrum analysis process:
Step 1, establishing a MatLab project.
After opening MatLab, add the directory containing the audio files to the MatLab path and place the 10 audio files shown in figure 4 in a single folder for convenient subsequent analysis. Download the VoiceBox toolbox and add it to the MatLab path. Write a MatLab script to generate the spectrogram and its data. The abscissa in FIG. 4 represents epochs 1 to 341 (best training performance) and the ordinate the mean square error.
And 2, extracting a training sample.
The showmcc function may output cepstral data of an audio file to a specified file in the form of a vertical vector. The code is as follows, all wav files under the audio folder are read, and the cepstrum longitudinal vector data of the wav files are output to the text file with the same name prefix.
Code:
fileFolder = fullfile('mfcc/analysis/audiox');      % folder containing the wav files
dirOutput = dir(fullfile(fileFolder, '*.wav'));
fileNames = {dirOutput.name}';
for i = 1:10
    audioFile = char(fileNames(i));
    inputFile = strcat('analysis/audiox/', fileNames(i));
    outputFile = strcat('mfcc/txt/', fileNames(i), '.txt');
    showMFCC(char(inputFile), 'f', char(outputFile));
end
The 10 generated text files are then spliced into one file, trainData.txt: first the 5 fault sample data files are spliced into g.txt, then the 5 normal sample data files into z.txt, and finally g.txt and z.txt are spliced into trainData.txt. Part of the code is as follows.
Part of codes:
g = load('mfcc/txt/g.txt');
z = load('mfcc/txt/z.txt');
trainData = [g, z];                    % one column per sample
trainDataT = trainData';               % save() needs a variable name, not an expression
save('mfcc/txt/trainData.txt', 'trainDataT', '-ascii');
Step 3, guide output values of the training samples.
From the previous step it can be seen that, in the trainData training data set, the first 5 training samples are extracted from fault audio and the last 5 from normal audio. In the experiment, the guide output value for a fault sample is 100 and for a normal sample 0. The guide output file trainDataResult.txt therefore contains only 10 space-separated numbers: 100 100 100 100 100 0 0 0 0 0. After training with these guide values, the closer an output value is to 100, the more likely the input data come from fault audio; the closer to 0, the opposite.
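The guide-output file content can be generated mechanically; a trivial sketch in Python (the file name trainDataResult.txt and the 100/0 guide values are from the text, the rest is illustration):

```python
# Build the guide-output content for trainDataResult.txt: 100 for the five
# fault samples, 0 for the five normal samples, space-separated.
targets = [100] * 5 + [0] * 5
content = " ".join(str(t) for t in targets)
print(content)  # "100 100 100 100 100 0 0 0 0 0"
```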
Step 4, training and testing the BPNN.
A MatLab script is written to construct the BPNN, which reads in trainData.txt and trainDataResult.txt as training data. Training runs for 2000 epochs with a convergence error of 1e-7 and a learning rate of 0.01. The script is executed, and the resulting training behaviour and output of the BPNN are shown in fig. 4.
It can be seen from fig. 4 that the result of this experiment does not reach the expected value: before Epoch 50 the mse value has already converged to a local minimum and cannot reach the set value of 1e-7.
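For readers outside MatLab, a single-hidden-layer BPNN of the 48-7-1 shape derived later in the text can be sketched from scratch in numpy. This is an illustration only, not the patent's network: the data are synthetic, well-separated stand-ins for the cepstrum vectors, the targets are the 100/0 guide values rescaled to (0, 1) for the sigmoid output, and the learning rate below is chosen so the toy example converges (the experiment itself used 0.01 with a MatLab toolbox).

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# synthetic stand-ins for the 10 cepstrum samples: 5 "fault", 5 "normal"
X = np.vstack([rng.normal(1.0, 0.2, (5, 48)),
               rng.normal(-1.0, 0.2, (5, 48))])
y = np.array([[100.0]] * 5 + [[0.0]] * 5) / 100.0   # guide values scaled to (0, 1)

W1 = rng.normal(0.0, 0.3, (48, 7)); b1 = np.zeros(7)   # 48-7-1 structure
W2 = rng.normal(0.0, 0.3, (7, 1));  b2 = np.zeros(1)
lr = 0.5                                 # toy learning rate, not the experiment's
for epoch in range(2000):                # 2000 training epochs, as in the text
    h = sigmoid(X @ W1 + b1)             # hidden layer
    out = sigmoid(h @ W2 + b2)           # output layer
    g2 = (out - y) * out * (1.0 - out)   # gradient through the output sigmoid
    g1 = (g2 @ W2.T) * h * (1.0 - h)     # gradient through the hidden sigmoid
    W2 -= lr * (h.T @ g2); b2 -= lr * g2.sum(axis=0)
    W1 -= lr * (X.T @ g1); b1 -= lr * g1.sum(axis=0)

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) * 100.0  # back to the 0..100 scale
print(np.round(pred.ravel(), 1))
```

On separable data like this, the fault predictions end up near 100 and the normal ones near 0, mirroring the guide-value scheme of step 3.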
Step 5, adjusting the BPNN settings and testing again.
The experimental result of step 4 may fall short of expectation for three reasons: first, the learning rate of the BPNN is set inappropriately, so the training easily converges to a local minimum; second, the number of BPNN intermediate layers is too small to abstract the commonalities hidden in the different classes of audio cepstrum data; third, the right-front idle recordings correlate only weakly with the fault point, so the fault does not change them appreciably. These three reasons are examined in turn.
First, measures are taken for the first reason and the test repeated. The learning rate is changed and the training rerun; after repeating the experiment many times the result still falls short, and the BPNN training ends in a locally converged state in every case. In fig. 5, the left side shows the training performance at a learning rate of 0.001 and the right side at 0.1. Learning rates of 0.2, 0.3, 0.02, 0.05, 0.002 and 0.005 were also tried, with equally unsatisfactory results; the first reason can therefore be excluded. The abscissa of the left line graph in fig. 5 represents epochs 1 to 991 (best training performance) and that of the right graph epochs 1 to 250; the ordinates represent the mean square error.
Next, measures are taken for the second reason and the test repeated. Relative to step 4, only the number of intermediate layers is changed, first to 2; the training and test results, shown in fig. 6, still do not reach the expected values. The number of intermediate layers is then increased to 4, with the results shown in fig. 7. Intermediate-layer counts of 3, 5 and 6 were also tested, still without satisfactory results, so the second reason can be excluded. The abscissa of fig. 6 represents epochs 1 to 233 (best training performance) and that of fig. 7 epochs 1 to 686; the ordinates represent the mean square error.
The cause of the unsatisfactory test results is thus narrowed to the third reason. For this item, the 7 recorded audio files, namely drive-shaft knocking audio, girder knocking audio, right-front audio at idle, left-front audio at idle, right-front audio under acceleration and left-front audio under acceleration, are analyzed and tested in turn, each going through steps 2, 3 and 4 to test the output of the BPNN. When the front audio under acceleration is tested, the output of the BPNN meets expectation. Table 1 shows the results of the three tests.
TABLE 1 (rendered as an image in the original; results of the three tests)
Fig. 8 shows the training behaviour of the BPNN: in this training, the mse value decreases steadily as the number of epochs increases and reaches the expected convergence error of 1e-7 at Epoch 1009. The abscissa of the line graph in fig. 8 represents epochs 1 to 1009 (best training performance) and the ordinate the mean square error.
To verify the fault recognition capability of the BPNN, the lab team recorded 100 further groups of audio (the first 50 groups fault audio, the last 50 normal audio) and ran them through the trained BPNN; Table 2 shows the output results.
100.0058 100.0058 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
100.0058 0.0213 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
0.0213 100.0058 100.0058 100.0058 100.0058
100.0058 100.0058 100.0058 100.0058 100.0058
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
0.0213 0.0213 0.0213 0.0213 0.0213
TABLE 2
As can be seen from Table 2 and fig. 9, 98 of the 100 groups of test outputs meet the expected output, again verifying that the BPNN has a certain fault recognition capability. The abscissa of fig. 9 is the group index (1 to 100) and the ordinate the output value.
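The 98-of-100 figure can be checked mechanically by scoring Table 2's outputs against the guide values. A small Python sketch follows; the 50/50 split and the positions of the two near-zero fault outputs are read off Table 2, while the decision threshold of 50 (halfway between the guide values) is an assumption:

```python
# Score Table 2: outputs near 100 should be fault recordings (first 50 groups),
# outputs near 0 normal recordings (last 50 groups).
outputs = [100.0058] * 50       # first 50 rows of Table 2 ...
outputs[16] = 0.0213            # ... except the 17th and 41st values,
outputs[40] = 0.0213            # which came out near 0
outputs += [0.0213] * 50        # last 50 values: normal recordings
labels = [100] * 50 + [0] * 50
correct = sum((o > 50) == (t == 100) for o, t in zip(outputs, labels))
print(correct, "of", len(outputs))  # 98 of 100
```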
Example 3
As shown in FIGS. 1-9, an acoustic spectrum fault analysis and diagnosis method based on a BP neural network and Mel cepstrum comprises a BP neural network analysis module that uses BP neural network technology. When the BPNN is trained, the audio data in the read template library and their categories are used as training samples for repeated training. For input to the BPNN, an audio cepstrum is converted into text data: all information of each audio cepstrum can be represented as 48 coordinate points whose abscissa values are fixed at 1 to 48, so the abscissa can be ignored and only the ordinate retained. The number of nodes in each intermediate layer of the BPNN can be calculated by a node calculation formula
m = √(n + p) + a (formula ①, where a is an adjustment constant), or m = log₂(n·p) (formula ②).
The BPNN training process is supervised learning: each training sample must contain both input data and the expected output. According to the research scheme, the template library to be modeled contains fault audio feature data and normal audio feature data. When the BPNN is trained, the audio data in the read template library and their categories (fault or not) are used as training samples for repeated training. After training, faults can be identified automatically.
Analysis shows that the audio cepstrum completely and compactly reflects the static information of a sample, so the audio feature data in the template library can use the audio cepstrum data. To ease input to the BPNN, the audio cepstrum must be converted into text data. All information of each audio cepstrum can be represented as 48 coordinate points whose abscissa values are fixed at 1 to 48; the abscissa can therefore be ignored and only the ordinate kept, i.e., each audio cepstrum can be represented by 48 numbers.
The following conclusions can be drawn from the above analysis: the BPNN needs 48 input-layer nodes, which receive the textualized audio cepstrum data, and 1 output-layer node, which indicates whether a fault exists.
The number of nodes in each intermediate layer of the BPNN can be calculated with empirical formula ① or ②. In both, m represents the number of nodes per intermediate layer, n the number of input-layer nodes and p the number of output-layer nodes. Substituting n = 48 and p = 1 and combining the results of the two formulas, the number of nodes per intermediate layer is determined to be 7.
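With n = 48 and p = 1, the two empirical formulas evaluate as below (Python for illustration; note the formulas are reconstructions, since they appear only as image placeholders in the text):

```python
import math

# Reconstructed empirical sizing formulas:
# (1) m = sqrt(n + p) + a, with a an adjustment constant; (2) m = log2(n * p)
n, p = 48, 1
m1 = math.sqrt(n + p)        # 7.0 with a = 0, matching the 7 nodes chosen
m2 = math.log2(n * p)        # about 5.58
print(round(m1), round(m2, 2))
```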
The number of intermediate (hidden) layers of the BPNN is not easily determined. In general, the more intermediate layers a BPNN has, the weaker the linear relationship between input and output data it can represent. To ensure the most appropriate BPNN structure is adopted, the experiment starts from a single hidden layer and gradually increases the number of hidden layers over multiple tests.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are only preferred examples of the present invention and are not intended to limit the present invention, but that various changes and modifications may be made without departing from the spirit and scope of the present invention, which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (4)

1. A sound spectrum fault analysis and diagnosis method based on a BP neural network and a Mel cepstrum is characterized in that: the method comprises the following steps:
S1, setting acquisition parameters: configure the system server parameters, namely the server IP, port and acquisition time; the acquisition time defaults to 30 seconds, with a maximum of 120 seconds and a step interval of 10 seconds;
S2, data acquisition, which comprises environment acquisition and equipment acquisition. Environment acquisition records the surrounding environment sound through the terminal equipment; the main parameters include acquisition date, temperature, weather, sound sampling rate and default audio acquisition duration, with files named in the format date-temperature-weather-sampleRate-duration (naming example: -25℃-sunny-44.1KHz-30s.wav). Equipment acquisition records the equipment sound; the main parameters are acquisition date, vehicle model, engine model, kilometrage, acquisition position, fault type and sampling rate, with files named date-vehicleModel-engineModel-kilometers-acquisitionPosition-faultType.wav;
S3, data transmission: the data transmission module collects the recorded audio files offline, establishes communication with the server, and then uploads the selected files;
S4, data analysis, performed by a BP neural network analysis module and a spectrogram-and-cepstrum module. The BP neural network analysis module is configured through three files (a training data file, a guide output data file and a simulation test data file) and finally outputs a trained data model that processes and analyzes the audio files collected by the terminal equipment and returns the corresponding result; the spectrogram-and-cepstrum module generates a spectrogram and a cepstrum for a specified audio file.
2. The acoustic spectrum fault analysis and diagnosis method based on the BP neural network and the Mel cepstrum as claimed in claim 1, wherein the cepstrum generation process in the cepstrum module is as follows: the signal is first divided into frames (with the same frame length and step length as before), each frame is windowed, spectrum data are obtained by FFT (fast Fourier transform), a Mel spectrum is obtained through a Mel filter bank, and the log of the Mel spectrum is passed through a DCT (discrete cosine transform) to obtain the cepstrum data.
3. The acoustic spectrum fault analysis and diagnosis method based on the BP neural network and the Mel cepstrum as claimed in claim 1, wherein the cepstrum analysis comprises the following steps:
s1, establishing a MatLab project, opening the MatLab, adding a directory where audio files are located to a MatLab path, adding 10 displayed audio files to a single folder for subsequent analysis and use, downloading a VoiceBox tool box, adding the VoiceBox tool box to the MatLab path, and compiling a MatLab script to generate a spectrogram and data thereof;
S2, extracting training samples: the showMFCC function outputs the cepstrum data of an audio file to a specified file as a column vector; read all wav files under the audio folder, output each file's cepstrum column-vector data to a text file with the same name prefix, and splice the 10 generated text files into one file, trainData.txt;
S3, guide output values of the training samples: following the previous step, in the trainData training data set the first 5 training samples are extracted from fault audio and the last 5 from normal audio; in the experiment the guide output value of a fault sample is 100 and that of a normal sample is 0;
S4, training and testing the BPNN: write a MatLab script to construct the BPNN, read in trainData.txt and trainDataResult.txt as training data, and test the BPNN using trainData.txt as input data, the expected output being the content of trainDataResult.txt; training is repeated 2000 times, the convergence error is 1e-7 and the learning rate is 0.01; execute the script to obtain the training condition of the BPNN;
and S5, adjusting the BPNN settings and testing again: there are three possible reasons why the experimental result of step S4 does not reach the expected value. For the first reason, change the learning rate and repeat the training test. For the second reason, change only the number of intermediate layers to 2 relative to step S4 and test again. For the third reason, analyse and test 7 audio files in turn, including transmission-shaft knocking audio, girder knocking audio, right-front audio at idle, left-front audio at idle, right-front audio at acceleration and left-front audio at acceleration; each audio file must go through steps S2, S3 and S4 to test the output result of the BPNN.
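Step S2 above uses the MatLab showMFCC helper to dump each cepstrum as a column vector and then splices ten text files into trainData.txt. A rough Python equivalent of the splicing stage, as a sketch under the assumption that each per-file text file holds one equal-length cepstrum column vector (one sample per column; file names are made up):

```python
import glob
import os
import tempfile

import numpy as np

def splice_training_file(audio_dir, out_path):
    """Concatenate per-audio cepstrum column vectors into one training matrix.

    Assumes each *.txt in audio_dir holds one cepstrum column vector
    (one value per line) and that all vectors have equal length.
    """
    cols = [np.loadtxt(txt).ravel()
            for txt in sorted(glob.glob(os.path.join(audio_dir, "*.txt")))]
    mat = np.column_stack(cols)          # one column per audio file / sample
    np.savetxt(out_path, mat)
    return mat.shape

# demo on throwaway files: three fake 48-point cepstrum vectors
demo = tempfile.mkdtemp()
for i in range(3):
    np.savetxt(os.path.join(demo, "audio%d.txt" % i), np.arange(48.0))
shape = splice_training_file(demo, os.path.join(demo, "trainData.txt"))
print(shape)  # (48, 3)
```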
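The training loop of step S4 can be sketched in Python with NumPy. The 2000 repetitions, 1e-7 convergence error, 0.01 learning rate and 100/0 guide outputs come from the claims; the single hidden layer, its size and the toy data are illustrative assumptions, not the patent's MatLab implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bpnn(X, y, n_hidden=10, lr=0.01, epochs=2000, goal=1e-7):
    """One-hidden-layer BPNN trained by plain gradient descent on MSE."""
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, 1));          b2 = np.zeros(1)
    mse = np.inf
    for _ in range(epochs):                   # repeated training, 2000 times
        h = sigmoid(X @ W1 + b1)              # hidden (intermediate) layer
        out = h @ W2 + b2                     # linear output (targets are 100/0)
        err = out - y
        mse = float(np.mean(err ** 2))
        if mse < goal:                        # convergence error 1e-7
            break
        g_out = 2.0 * err / len(X)            # dMSE/dout
        g_h = (g_out @ W2.T) * h * (1.0 - h)  # backpropagate through sigmoid
        W2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(0)
        W1 -= lr * (X.T @ g_h);   b1 -= lr * g_h.sum(0)
    predict = lambda Xq: sigmoid(Xq @ W1 + b1) @ W2 + b2
    return predict, mse

# toy stand-in for trainData: 5 "fault" and 5 "normal" 48-point samples
X = np.vstack([rng.normal(2.0, 0.5, (5, 48)), rng.normal(-2.0, 0.5, (5, 48))])
y = np.vstack([np.full((5, 1), 100.0), np.zeros((5, 1))])
predict, mse = train_bpnn(X, y)
```

With sigmoid hidden units and a linear output layer the 100/0 guide outputs are reachable without rescaling; a sigmoid output layer would need the targets normalized to 1/0 first.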
4. The acoustic spectrum fault analysis and diagnosis method based on the BP neural network and the Mel cepstrum as claimed in claim 1, wherein the BP neural network analysis module uses BP neural network technology. When the BPNN is trained, the audio data in the template library and their categories are read and used as training samples for repeated training. For BPNN input, an audio cepstrum must be converted into text data; all the information of each audio cepstrum can be represented by 48 coordinate points, and since the abscissa values of these points are fixed at 1 to 48, the abscissa can be ignored and only the ordinates are used. The number of nodes m of each intermediate layer of the BPNN can be calculated by the node calculation formula m = √(n + p) + a (where n is the number of input nodes, p the number of output nodes, and a a constant between 1 and 10), or by m = log₂ n.
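The two empirical node-count formulas in claim 4 can be evaluated directly for the 48 cepstrum ordinates used as inputs and a single output node. Treating a as an integer constant between 1 and 10 and rounding to whole nodes are conventional assumptions for this rule of thumb, not values fixed by the claim:

```python
import math

def hidden_nodes_sqrt(n, p, a):
    # m = sqrt(n + p) + a, with a an integer constant between 1 and 10
    return round(math.sqrt(n + p)) + a

def hidden_nodes_log(n):
    # m = log2(n), rounded up to a whole number of nodes
    return math.ceil(math.log2(n))

print(hidden_nodes_sqrt(48, 1, 3))  # sqrt(49) = 7, plus a = 3 -> 10
print(hidden_nodes_log(48))         # log2(48) ~ 5.58 -> 6
```

Both formulas only give a starting point; step S5 of claim 3 still adjusts the layer settings experimentally when the result falls short.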
CN202011130870.2A 2020-10-21 2020-10-21 Acoustic spectrum fault analysis and diagnosis method based on BP neural network and Mel cepstrum Pending CN112509599A (en)

Publications (1)

Publication Number Publication Date
CN112509599A true CN112509599A (en) 2021-03-16

Family

ID=74954225



Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810374A (en) * 2013-12-09 2014-05-21 中国矿业大学 Machine fault prediction method based on MFCC feature extraction
CN106168541A (en) * 2015-05-19 2016-11-30 通用汽车环球科技运作有限责任公司 Automobile, diagnostic system and the method generating vehicle diagnosis data
CN108197651A (en) * 2017-12-31 2018-06-22 浙江大学 A kind of vehicle identification method based on vibrating sensor
CN110718235A (en) * 2019-09-20 2020-01-21 精锐视觉智能科技(深圳)有限公司 Abnormal sound detection method, electronic device and storage medium
CN110890102A (en) * 2019-09-07 2020-03-17 创新奇智(重庆)科技有限公司 Engine defect detection algorithm based on RNN voiceprint recognition
US20200234517A1 (en) * 2019-01-22 2020-07-23 ACV Auctions Inc. Vehicle audio capture and diagnostics
CN111710329A (en) * 2020-06-12 2020-09-25 重庆瑞尔科技发展有限公司 Deep learning-based ship engine voiceprint fault discrimination method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HC7568353: "How to determine the number of hidden nodes in a BP neural network", https://zhidao.baidu.com/question/501839410.html *
CHEN Xiao et al.: "Pattern recognition of vibration faults of a certain type of aircraft engine based on Mel cepstrum", Computer Measurement & Control *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113125135A (en) * 2021-03-31 2021-07-16 中石化石油工程技术服务有限公司 Fault diagnosis method for rotary machine, storage medium, and electronic device
CN113033490A (en) * 2021-04-23 2021-06-25 山东省计算中心(国家超级计算济南中心) Industrial equipment general fault detection method and system based on sound signals
CN113033490B (en) * 2021-04-23 2023-09-19 山东省计算中心(国家超级计算济南中心) Industrial equipment general fault detection method and system based on sound signals
CN115512716A (en) * 2021-06-23 2022-12-23 华晨宝马汽车有限公司 Method, device and system for recognizing abnormal sound of vehicle
CN114057053A (en) * 2022-01-18 2022-02-18 杭州浅水数字技术有限公司 Method for monitoring fatigue degree of component of special machine
CN114057053B (en) * 2022-01-18 2022-04-26 杭州浅水数字技术有限公司 Method for monitoring fatigue degree of component of special machine

Similar Documents

Publication Publication Date Title
CN112509599A (en) Acoustic spectrum fault analysis and diagnosis method based on BP neural network and Mel cepstrum
Manhertz et al. STFT spectrogram based hybrid evaluation method for rotating machine transient vibration analysis
CN109086888B (en) Automobile engine fault determination method and device based on voice recognition
US20030236661A1 (en) System and method for noise-robust feature extraction
CN110987434A (en) Rolling bearing early fault diagnosis method based on denoising technology
CN113763986B (en) Abnormal sound detection method for air conditioner indoor unit based on sound classification model
CN111653289A (en) Playback voice detection method
CN115083422B (en) Voice traceability evidence obtaining method and device, equipment and storage medium
Madain et al. Fault diagnosis in vehicle engines using sound recognition techniques
CN115876473A (en) Bearing fault diagnosis method based on PWVD and DenseNet
CN109580145A (en) The diagnostic method and device of intelligence manufacture equipment fault based on deep learning
CN112151067B (en) Digital audio tampering passive detection method based on convolutional neural network
CN110222390B (en) Gear crack identification method based on wavelet neural network
CN1431650A (en) Antinoise voice recognition method based on weighted local energy
CN102539154A (en) Engine fault diagnosis method and device based on exhaust noise vector quantitative analysis
CN110555457A (en) Engine lubricating oil wear signal characteristic processing system and method
CN116153337A (en) Synthetic voice tracing evidence obtaining method and device, electronic equipment and storage medium
Cheng et al. Cross-Database Replay Detection in Terminal-Dependent Speaker Verification.
CN114333844A (en) Voiceprint recognition method, voiceprint recognition device, voiceprint recognition medium and voiceprint recognition equipment
CN112820318A (en) Impact sound model establishment and impact sound detection method and system based on GMM-UBM
Čavor et al. Vehicle speed estimation from audio signals using 1d convolutional neural networks
CN101853262A (en) Voice frequency fingerprint rapid searching method based on cross entropy
CN113823324A (en) Diesel engine combustion noise detection method and system based on transfer learning
CN116028840A (en) Marine rotor fault diagnosis method for maximum overlapped discrete wavelet packet transformation time spectrum
CN112347917A (en) Gas turbine fault diagnosis method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210316