CN113705418A - Infrasound signal identification method, system and equipment based on MFCC and HMM - Google Patents


Publication number
CN113705418A
CN113705418A
Authority
CN
China
Prior art keywords
frequency, HMM, infrasound, signal, filter
Prior art date
Legal status
Pending
Application number
CN202110972744.XA
Other languages
Chinese (zh)
Inventor
苗家友
吴红莉
肖宏志
杨立学
Current Assignee
Unit 26, Unit 96901 of the Chinese PLA
Original Assignee
Unit 26, Unit 96901 of the Chinese PLA
Priority date
Filing date
Publication date
Application filed by Unit 26, Unit 96901 of the Chinese PLA
Priority to CN202110972744.XA
Publication of CN113705418A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/02: Preprocessing
    • G06F 2218/08: Feature extraction
    • G06F 2218/12: Classification; Matching
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/29: Graphical models, e.g. Bayesian networks
    • G06F 18/295: Markov models or related models, e.g. semi-Markov models; Markov random fields; networks embedding Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides an infrasound signal identification method, system, device, and storage medium based on MFCC and HMM. The frequency of the infrasound signal is converted into a Mel frequency that accords with the auditory characteristics of the human ear; for frequency-band feature extraction, the voiceprint frequency range is divided among a series of triangular filters; Mel-frequency cepstral coefficient features are extracted from frames of quasi-stationary voiceprint data; an output-probability threshold is set for each HMM model to determine the signal class; labeled data of known infrasound signals are input to obtain the HMM parameters corresponding to each infrasound event and to train the models; infrasound signal data are then input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability is selected. The invention improves the accuracy of feature extraction, benefits subsequent classification and identification, and alleviates the problem that unknown sound signals easily cause false alarms.

Description

Infrasound signal identification method, system and equipment based on MFCC and HMM
Technical Field
The invention relates to the technical field of infrasound signal identification.
Background
Infrasound signals are generated by natural phenomena and human activities such as earthquakes, tsunamis, volcanoes, nuclear explosions, missile flight, gun firing, ship navigation, car racing, and the swaying of high-rise buildings and bridges. Because the wavelength of an infrasound signal is often long, it can diffract around large obstacles. Identification of infrasound signals therefore has great practical significance: by recognizing various infrasound events accurately and in a timely manner, it becomes possible to respond to them effectively.
However, in existing infrasound signal identification, frequency feature extraction does not fully consider the auditory characteristics of the human ear, so the extracted features serve infrasound event identification poorly. In complex field environments especially, the many unknown acoustic signals easily trigger false alarms.
Disclosure of Invention
The invention aims to provide an infrasound signal identification method, system, and device based on Mel-frequency cepstral coefficients (MFCC) and hidden Markov models (HMM), so as to solve the technical problems of the prior art: classification and identification accuracy is not high, and false alarms are easily generated.
In order to solve the above technical problem, the infrasound signal recognition method based on MFCC and HMM provided by the present invention includes:
step 1, designing a Mel filter for infrasound signals: according to the actual frequency of the infrasound signal and based on the auditory characteristics of the human ear, converting the actual frequency into a Mel frequency that accords with those characteristics; and performing frequency-band feature extraction by dividing the voiceprint frequency range among a series of triangular filters;
step 2, extracting Mel-frequency cepstral coefficient features from frames of quasi-stationary voiceprint data: framing and windowing the ballistic-wave voiceprint data, applying a fast Fourier transform, calculating the spectral line energy, calculating the energy passing through each Mel filter, and computing the DCT cepstrum to obtain the infrasound signal features;
step 3, signal identification based on a hidden Markov model: to address the problem that unknown sound signals easily cause false alarms, an output-probability threshold is set for each HMM model; a signal's class is determined when the output probability of the model built from the same class of signals exceeds its threshold while the outputs of models built from other classes fall below theirs. Specifically: first, labeled data of known infrasound signals are input and MFCC features are extracted; the HMM models are then trained, yielding an HMM model and its parameters for each infrasound event type. Second, infrasound signal data are input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability gives the event type.
In order to solve the above technical problem, the infrasound signal recognition system based on MFCC and HMM according to the present invention includes:
a Mel filter design module, used to convert the actual frequency of the infrasound signal into a Mel frequency that accords with the auditory characteristics of the human ear, and to perform frequency-band feature extraction by dividing the voiceprint frequency range among a series of triangular filters;
a Mel-frequency cepstral coefficient feature extraction module, which frames and windows the ballistic-wave voiceprint data, applies a fast Fourier transform, calculates the spectral line energy, calculates the energy passing through each Mel filter, and computes the DCT cepstrum to obtain the infrasound signal features;
a signal identification module, used to set an output-probability threshold for each HMM model and determine the signal class when the output probability of the model built from the same class of signals exceeds its threshold while the outputs of models built from other classes fall below theirs; specifically: first, labeled data of known infrasound signals are input, MFCC features are extracted, and the HMM models are trained, finally yielding an HMM model and its parameters for each infrasound event type; second, infrasound signal data are input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability gives the event type.
In order to solve the above technical problem, the infrasound signal recognition apparatus based on MFCC and HMM according to the present invention includes:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the plurality of instructions are stored by the memory, and loaded and executed by the processor to perform the above method.
With the above technical scheme, the infrasound signal identification method, system, and device based on MFCC and HMM provided by the invention make the extracted frequency features better match the auditory characteristics of the human ear through the design of the frequency distribution curve, which helps improve subsequent classification and identification accuracy. Feature extraction divides the voiceprint frequency range among a series of triangular filters, and by setting and applying an output-probability threshold for each HMM model separately, the problem that unknown sound signals easily cause false alarms is alleviated.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an infrasound signal recognition method based on MFCC and HMM according to an embodiment of the present invention;
FIG. 2 is a mapping of a linear spectrum to a Mel spectrum;
FIG. 3 is a schematic diagram of frequency division feature extraction;
FIG. 4 is the Mel filter frequency response curve;
FIG. 5 is a flow chart of MFCC coefficient calculation;
FIG. 6(a) shows the time-domain waveform and MFCC coefficients of a 122 mm cannon;
FIG. 6(b) shows the time-domain waveform and MFCC coefficients of a 122 mm rocket launcher;
FIG. 7 shows the time-domain waveform and MFCC coefficients of a 130 mm cannon;
FIG. 8 shows the time-domain waveform and MFCC coefficients of a 155 mm cannon;
FIG. 9 is a schematic diagram of HMM composition;
FIG. 10(a) is a schematic diagram of an HMM model;
FIG. 10(b) is a HMM model training process;
fig. 10(c) is a flowchart of infrasound signal recognition.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention will be further explained with reference to specific embodiments.
The technical terms involved in this embodiment include: infrasound signals, also known as infrasonic waves or infrasound; the Mel, the unit of the Mel frequency scale; the Mel frequency, a perceptual frequency scale; Mel-frequency cepstral coefficients, abbreviated MFCC; the frequency spectrum, short for spectral density, a distribution curve over frequency; the Mel spectrum, the distribution curve over Mel frequency; and hidden Markov models, abbreviated HMM.
As shown in fig. 1, the present embodiment provides an infrasound signal recognition method based on MFCC and HMM, including the following steps:
step 1, designing a Mel filter for infrasound signals: according to the actual frequency of the infrasound signal and based on the auditory characteristics of the human ear, the actual frequency is converted into a Mel frequency that accords with those characteristics; frequency-band feature extraction divides the voiceprint frequency range among a series of triangular filters.
Step 2, extracting Mel-frequency cepstral coefficient features from frames of quasi-stationary voiceprint data: the ballistic-wave voiceprint data are framed and windowed, a fast Fourier transform is applied, the spectral line energy is calculated, the energy passing through each Mel filter is calculated, and the DCT cepstrum is computed to obtain the infrasound signal features.
Step 3, signal identification based on a hidden Markov model: an output-probability threshold is set for each HMM model, and a signal's class is determined when the output probability of the model built from the same class of signals exceeds its threshold while the outputs of models built from other classes fall below theirs. First, labeled data of known infrasound signals are input, MFCC features are extracted, and the HMM models are trained, yielding an HMM model and its parameters for each infrasound event type. Second, infrasound signal data are input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability gives the event type.
In step 1, a Mel filter for infrasound signals is designed: according to the actual frequency of the infrasound signal and based on the auditory characteristics of the human ear, the actual frequency is converted into the Mel frequency.
Unlike common cepstrum analysis on the actual frequency scale, MFCC (Mel-Frequency Cepstral Coefficient) analysis focuses on the auditory properties of the human ear. This is because the loudness heard by the human ear is not linearly proportional to the frequency of the sound, and the Mel frequency scale better matches the ear's auditory characteristics. Values on the Mel scale correspond roughly to a logarithmic distribution of the actual frequencies. As shown in fig. 2, the linear spectrum is mapped to the Mel spectrum.
The Mel frequency is calculated as:

F_Mel(f) = 2595 * lg(1 + f / 700)

where f is the actual frequency of the signal in Hz and F_Mel is the perceptual frequency in Mels.
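For illustration only (Python code is not part of the patent), the Mel-scale mapping and its inverse, which is used later to place the filter center frequencies, can be sketched as:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert an actual frequency in Hz to the Mel scale:
    F_Mel(f) = 2595 * lg(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse mapping, used to place filter center frequencies."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The mapping is roughly linear at low frequencies and logarithmic
# above, matching the auditory characteristics of the human ear.
print(hz_to_mel(1000.0))  # close to 1000 Mel
```

The near-equality of 1000 Hz and 1000 Mel is a deliberate property of the 2595/700 constants.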
Step 1 also includes frequency-band feature extraction: the voiceprint frequency range is divided among a series of triangular filters.
Fig. 3 is a schematic diagram of frequency division feature extraction.
Several band-pass filters H_m(k), 0 ≤ m < M, are set within the spectral range of the voiceprint, where m is the index of the band-pass filter and M is the number of filters. Each filter has a triangular filtering characteristic; the m-th band-pass filter has center frequency f(m), and on the Mel frequency scale the filters have equal bandwidth. The transfer function H_m(k) of each band-pass filter is:

H_m(k) = 0,                               k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)),  f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)),  f(m) < k ≤ f(m+1)
H_m(k) = 0,                               k > f(m+1)

where k is the frequency index of the transfer function, subject to the normalization

Σ_m H_m(k) = 1.

The center frequency f(m) of the Mel filter is defined as

f(m) = (N / f_s) * F_Mel⁻¹( F_Mel(f_l) + m * (F_Mel(f_h) - F_Mel(f_l)) / (M + 1) )

where f_h and f_l are the highest and lowest frequencies of the filter bank, f_s is the sampling frequency in Hz, N is the number of FFT points, F_Mel(·) is the frequency-to-Mel conversion, and F_Mel⁻¹(b) = 700 * (10^(b/2595) - 1) is its inverse.
For the ballistic-wave voiceprint data, the system sampling rate is 10 kHz (chosen only to reduce the time-delay estimation error during detection), but spectral analysis shows that the signal frequency extends only up to about 1 kHz, so the cutoff frequency of the Mel filter is set to 1 kHz; the frequency response curve of the Mel filter bank is shown in fig. 4.
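As a minimal sketch (not part of the patent), a triangular Mel filter bank with equal bandwidth on the Mel scale could be built as follows; the filter count of 8 is an illustrative assumption, while the 10 kHz sampling rate, 1 kHz cutoff, and 256-point FFT follow the text:

```python
import numpy as np

def mel_filterbank(n_filters=8, n_fft=256, f_s=10000.0, f_l=0.0, f_h=1000.0):
    """Build M triangular filters equally spaced on the Mel scale.
    Returns an (n_filters, n_fft // 2 + 1) weight matrix H[m, k]."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_inv = lambda b: 700.0 * (10.0 ** (b / 2595.0) - 1.0)
    # M + 2 equally spaced Mel points give the boundary/center frequencies f(m)
    mel_pts = np.linspace(mel(f_l), mel(f_h), n_filters + 2)
    bin_pts = np.floor((n_fft / f_s) * mel_inv(mel_pts)).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            H[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            H[m - 1, k] = (right - k) / max(right - center, 1)
    return H
```

Each row of H peaks at 1 at its center bin, matching a triangular frequency response.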
Step 2, based on quasi-steady state voiceprint data framing, extraction of the feature of the Mel frequency cepstrum coefficient includes:
1) ballistic wave voiceprint data is framed and windowed.
Framing: the ballistic-wave voiceprint signal is divided into short frames; each frame is regarded as a stationary signal and processed with methods for stationary signals, and adjacent frames overlap;
Windowing: the purpose of the window function is to reduce leakage in the frequency domain; each frame of the ballistic-wave voiceprint is multiplied by a Hamming window or a Hanning window. After preprocessing, the ballistic-wave voiceprint signal x(n) becomes x_i(m), where the subscript i denotes the i-th frame after framing.
In ballistic-wave voiceprint identification, the frame length is 256 points, the frame shift is 80 points, and a Hamming window is used as the window function.
2) Fast Fourier transform
An FFT is applied to each frame of the signal to convert time-domain data into frequency-domain data:

X(i, k) = FFT[x_i(m)]

where FFT[·] denotes the fast Fourier transform and X(i, k) is its result for the i-th frame.
3) Calculating spectral line energy
The spectral line energy is calculated for each frame of FFT data:

E(i, k) = |X(i, k)|²

where E(i, k) is the spectral line energy, i.e. the energy spectrum, i denotes the i-th frame, and k denotes the k-th spectral line in the frequency domain.
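Steps 1) through 3) above can be sketched together as follows (an illustrative reconstruction using the 256-point frame length, 80-point frame shift, and Hamming window from the text; the variable names and test signal are our own):

```python
import numpy as np

def frame_energy_spectrum(x, frame_len=256, frame_shift=80):
    """Frame the signal, apply a Hamming window, take the FFT of each
    frame, and return the spectral line energies E(i, k) = |X(i, k)|^2."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    E = np.zeros((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = x[i * frame_shift : i * frame_shift + frame_len] * window
        X = np.fft.rfft(frame)       # X(i, k): frequency-domain data
        E[i] = np.abs(X) ** 2        # E(i, k): energy spectrum
    return E

# Example on a synthetic 100 Hz tone sampled at 10 kHz (0.5 s long)
fs = 10000.0
t = np.arange(0, 0.5, 1.0 / fs)
x = np.sin(2 * np.pi * 100.0 * t)
E = frame_energy_spectrum(x)
```

With a 256-point FFT at 10 kHz the bin spacing is about 39 Hz, so the 100 Hz tone concentrates its energy near bins 2 and 3.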
4) Calculating energy through a Mel filter
The energy spectrum of each frame is passed through the Mel filters and the energy within each Mel filter is calculated: in the frequency domain, the energy spectrum E(i, k) of each frame is multiplied by the frequency response H_m(k) of each Mel filter and summed:

S(i, m) = Σ_k E(i, k) * H_m(k),  0 ≤ m < M

where S(i, m) is the filtered sub-band energy of the m-th Mel band.
5) Computing the DCT cepstrum

The FFT cepstrum x̂(n) of a sequence x(n) is

x̂(n) = IDFT[ X̂(k) ]

where

X̂(k) = ln X(k)

with ln the natural logarithm and X(k) the energy at the k-th frequency point after the FFT.

The DCT of the sequence x(n) is

X(k) = c(k) * Σ_{n=0}^{N-1} x(n) * cos( (2n + 1) k π / (2N) ),  k = 0, 1, ..., N - 1

where the parameter N is the length of the sequence x(n) and c(k) is an orthogonality factor:

c(k) = sqrt(1/N),  k = 0
c(k) = sqrt(2/N),  k = 1, 2, ..., N - 1
Taking the logarithm of the Mel filter energies and computing the DCT gives the Mel-frequency cepstral coefficients:

mfcc(i, n) = sqrt(2/M) * Σ_{m=1}^{M} ln S(i, m) * cos( π n (2m - 1) / (2M) )

where S(i, m) is the energy passed by the m-th Mel filter (M filters in total) for the i-th frame, and n is the spectral line index after the DCT.

The MFCC parameters extracted after framing the voiceprint data are recorded as (q_1, q_2, ..., q_T).
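Steps 4) and 5), the filter-bank energies, logarithm, and DCT, can be sketched as below; the DCT matrix implements a cosine sum of the kind described in the text, and the toy energy spectrum and filter-bank matrix are illustrative stand-ins, not the patent's data:

```python
import numpy as np

def mfcc_from_energy(E, H, n_ceps=12):
    """E: (n_frames, n_bins) spectral energies E(i, k);
    H: (M, n_bins) Mel filter weights H_m(k).
    Returns an (n_frames, n_ceps) matrix of MFCC features."""
    M = H.shape[0]
    S = E @ H.T                          # S(i, m): energy per Mel band
    S = np.maximum(S, 1e-12)             # guard the logarithm against zeros
    log_S = np.log(S)
    # DCT basis: mfcc(i, n) = sqrt(2/M) * sum_m ln S(i, m) cos(pi n (2m-1) / (2M))
    m = np.arange(1, M + 1)
    n = np.arange(1, n_ceps + 1)
    D = np.sqrt(2.0 / M) * np.cos(np.pi * np.outer(n, 2 * m - 1) / (2.0 * M))
    return log_S @ D.T

# Toy input: 5 frames, 129 bins, 8 filters with uniform weights summing to 1,
# so every band energy is 1, every log is 0, and all coefficients vanish.
E = np.ones((5, 129))
H = np.full((8, 129), 1.0 / 129)
q = mfcc_from_energy(E, H)               # the (q_1, ..., q_T) feature matrix
```

On real data each row of q would hold the per-frame cepstral coefficients fed to the HMM.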
The time-domain waveforms of a 122 mm cannon, a 122 mm rocket launcher, a 130 mm cannon, and a 155 mm cannon are plotted together with their MFCC coefficients.
Fig. 6(a) shows the time-domain waveform and MFCC coefficients of the 122 mm cannon.
Fig. 6(b) shows the time-domain waveform and MFCC coefficients of the 122 mm rocket launcher.
Fig. 7 shows the time-domain waveform and MFCC coefficients of the 130 mm cannon.
Fig. 8 shows the time-domain waveform and MFCC coefficients of the 155 mm cannon.
Step 3, signal identification based on a hidden Markov model: to address the problem that unknown sound signals easily cause false alarms, an output-probability threshold is set for each HMM model; a signal's class is determined when the output probability of the model built from the same class of signals exceeds its threshold while the outputs of models built from other classes fall below theirs. First, labeled data of known infrasound signals are input, MFCC features are extracted, and the HMM models are trained, yielding an HMM model and its parameters for each infrasound event type. Second, infrasound signal data are input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability gives the event type.
An HMM can be described by M = {A, B, π}, where A = {a_ij} is the state-transition matrix, B = {b_ij(k)} is the observation-probability matrix, and π = {π_i} is the initial state-probability vector. Figuratively speaking, an HMM has two parts: one is a Markov chain, described by π and A, whose output is a state sequence (q_1, q_2, ..., q_T) corresponding to the MFCC parameters extracted from the framed gun-sound voiceprint; the other is a random process that produces the observations.
Fig. 9 is a schematic diagram of HMM composition, where T is the observation length. A state sequence is first generated according to the transition probabilities A = {a_ij}; each state in the sequence then generates actual observation data according to the observation probabilities B = {b_ij(k)}, and this series of observations constitutes the observation sequence. In voiceprint target recognition, the observation sequence consists of the per-frame voiceprint feature parameters obtained by feature extraction, and the states are the different voiceprint units specified in advance during the training stage.
When identifying targets with HMMs, two problems arise: first, how to derive the HMM parameters from known data according to an optimization criterion; second, how to effectively identify the class of a set of unknown observations under the given models. These two problems are the training and recognition of the model.
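The recognition problem, scoring an observation sequence under a given model, can be sketched with Viterbi scoring (the algorithm the patent names for computing output probabilities); the two-state discrete-observation model below is our own toy illustration, not the patent's trained models:

```python
import numpy as np

def viterbi_score(pi, A, B, obs):
    """Probability of the best state path for a discrete-observation HMM
    M = {A, B, pi}: pi (n_states,) initial probabilities,
    A (n_states, n_states) transitions, B (n_states, n_symbols)
    observation probabilities, obs a list of symbol indices."""
    delta = pi * B[:, obs[0]]
    for o in obs[1:]:
        # best predecessor for each state, then emit the next observation
        delta = (delta[:, None] * A).max(axis=0) * B[:, o]
    return float(delta.max())

# Two-state toy model
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # state 0 mostly emits symbol 0
              [0.2, 0.8]])  # state 1 mostly emits symbol 1
p = viterbi_score(pi, A, B, [0, 0, 1])
```

In practice long sequences are scored with log probabilities to avoid underflow; the plain products here keep the sketch short.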
As shown in the HMM model schematic of fig. 10(a), the Baum-Welch algorithm is used to optimize the HMM parameters once the training data are given: an objective function Q with all the HMM parameters as variables is constructed, and the Lagrange-multiplier method is used to derive the relation between the new and old model parameters, which yields an estimate of each HMM parameter. The iteration is repeated using this functional relation between new and old parameters until the model parameters no longer change appreciably. The Viterbi algorithm is then used to obtain the HMM model M_n representing the n-th class of event, 1 ≤ n ≤ N, where N is the number of all possible target classes. The conditional likelihood of the feature sequence O of an unknown signal, given a trained HMM model, is denoted P_max; it is the objective function that decides which class the signal belongs to. The event class number with the maximum conditional likelihood is recorded as n*:

n* = argmax_{1 ≤ n ≤ N} P(O | M_n)
However, this method can only classify among a limited set of known signals, and on the battlefield there are many unknown sound signals that easily trigger false alarms. The method therefore sets an output-probability threshold for each HMM model and determines the signal class only when the output probability of the model built from the same class of signals exceeds its threshold, while the outputs for other classes fall below theirs.
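In outline, the rejection rule described above, accepting the best-scoring class only when its model's output probability clears that model's own threshold, might look like this (the class names, scores, and thresholds are illustrative stand-ins, not trained HMM outputs):

```python
def classify_with_rejection(scores, thresholds):
    """scores: dict class_name -> log output probability P(O | M_n);
    thresholds: dict class_name -> per-model log-probability threshold.
    Returns the best class, or None (unknown signal, no alarm raised)."""
    best = max(scores, key=scores.get)
    if scores[best] >= thresholds[best]:
        return best
    return None  # best model fails its own threshold: treat as unknown

# Illustrative numbers only
thresholds = {"122mm_cannon": -50.0, "130mm_cannon": -55.0}
known = {"122mm_cannon": -42.0, "130mm_cannon": -70.0}      # clears threshold
unknown = {"122mm_cannon": -90.0, "130mm_cannon": -88.0}    # clears nothing
print(classify_with_rejection(known, thresholds))    # "122mm_cannon"
print(classify_with_rejection(unknown, thresholds))  # None
```

Returning None instead of the argmax class is what suppresses false alarms on unknown sounds.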
As shown in fig. 10(b), the HMM training process sets a probability threshold. The specific training steps are as follows:
(1) set a maximum number of training iterations N_max and a normalized convergence threshold T;
(2) give the initial parameters λ = (π, A, B);
(3) re-estimate the parameters λ with the Baum-Welch (BW) re-estimation algorithm to obtain new model parameters λ';
(4) compute the output probability P' of all observation-value sequences under λ' with the Viterbi algorithm;
(5) compute the change in the output probability P of the observation sequences: if

|P' - P| / P > T

take λ' as the new model parameters and return to step (3) to continue the iteration, until the model parameters converge. If the number of iterations exceeds the maximum, training stops even without convergence.
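The convergence control of steps (1) through (5) can be outlined as a loop; the re-estimation step here is a deliberate placeholder standing in for a full Baum-Welch update, so only the stopping logic is faithful to the text:

```python
def train_until_converged(reestimate, score, params, max_iters=100, tol=1e-4):
    """Iterate parameter re-estimation until the normalized change in the
    output probability falls below tol, or max_iters is reached.
    reestimate(params) -> new params; score(params) -> output probability P.
    Assumes score never returns exactly zero."""
    p_old = score(params)
    for _ in range(max_iters):
        params = reestimate(params)
        p_new = score(params)
        if abs(p_new - p_old) / abs(p_old) <= tol:
            break                      # normalized change below threshold T
        p_old = p_new
    return params

# Stand-in model: "re-estimation" just moves a scalar halfway toward 1.0,
# so the loop converges geometrically and stops well before max_iters.
final = train_until_converged(
    reestimate=lambda p: p + 0.5 * (1.0 - p),
    score=lambda p: p,
    params=0.1,
)
```

A real implementation would pass Baum-Welch re-estimation and Viterbi scoring in place of the lambdas.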
Fig. 10(c) shows the flowchart of infrasound signal training and recognition. First, labeled data of known infrasound signals are input, MFCC features are extracted, and the HMM models are trained, finally yielding an HMM model and its parameters for each infrasound event type. Second, infrasound signal data are input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability gives the event type.
Another embodiment further provides an infrasound signal recognition system based on MFCC and HMM, comprising the following modules:
a Mel filter design module, used to convert the actual frequency of the infrasound signal into a Mel frequency that accords with the auditory characteristics of the human ear, and to perform frequency-band feature extraction by dividing the voiceprint frequency range among a series of triangular filters;
a Mel-frequency cepstral coefficient feature extraction module, which frames and windows the ballistic-wave voiceprint data, applies a fast Fourier transform, calculates the spectral line energy, calculates the energy passing through each Mel filter, and computes the DCT cepstrum to obtain the infrasound signal features;
a signal identification module, used to set an output-probability threshold for each HMM model and determine the signal class when the output probability of the model built from the same class of signals exceeds its threshold while the outputs of models built from other classes fall below theirs; specifically: first, labeled data of known infrasound signals are input, MFCC features are extracted, and the HMM models are trained, finally yielding an HMM model and its parameters for each infrasound event type; second, infrasound signal data are input, MFCC features are extracted, the output probability is calculated for each class of HMM, and the infrasound event with the maximum output probability gives the event type.
Another embodiment also provides an infrasound signal recognition apparatus based on MFCC and HMM, including:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the plurality of instructions are stored by the memory, and loaded and executed by the processor to perform the above infrasound signal identification method based on MFCC and HMM.
Another embodiment provides a computer-readable storage medium having a plurality of instructions stored thereon, the instructions being adapted to be loaded by a processor to execute the above infrasound signal identification method based on MFCC and HMM.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An infrasound signal identification method based on MFCC and HMM, characterized in that it comprises:
step 1, designing a Mel filter for infrasound signals: according to the actual frequency of the infrasound signal and based on the auditory characteristics of the human ear, converting the actual frequency into a Mel frequency that accords with those characteristics; and performing frequency-band feature extraction by dividing the voiceprint frequency range among a series of triangular filters;
step 2, extracting Mel-frequency cepstral coefficient features from frames of quasi-stationary voiceprint data: framing and windowing the ballistic-wave voiceprint data, applying a fast Fourier transform, calculating the spectral line energy, calculating the energy passing through each Mel filter, and computing the DCT cepstrum to obtain the infrasound signal features;
step 3, signal identification based on a hidden Markov model: setting an output-probability threshold for each HMM model, and determining a signal's class when the output probability of the model built from the same class of signals exceeds its threshold while the outputs of models built from other classes fall below theirs; specifically: first, inputting labeled data of known infrasound signals, extracting MFCC features, and training the HMM models to obtain an HMM model and its parameters for each infrasound event type; second, inputting infrasound signal data, extracting MFCC features, calculating the output probability for each class of HMM, and taking the infrasound event with the maximum output probability as the event type.
2. The method of claim 1, wherein step 1 comprises,
the Mel frequency is calculated as:
F_Mel(f) = 2595 · lg(1 + f/700)
wherein f represents the actual frequency of the signal in Hz, and F_Mel is the perceptual frequency in units of mel (Mel);
setting several band-pass filters H_m(k), 0 ≤ m ≤ M, within the spectral range of the voiceprint, wherein m is the serial number of the band-pass filter and M is the number of filters; each filter has a triangular filtering characteristic, the m-th band-pass filter has a center frequency f(m), and on the Mel frequency scale the filters have equal bandwidth; the transfer function H_m(k) of each band-pass filter is:
H_m(k) = 0,                                k < f(m−1)
H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)),   f(m−1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)),   f(m) ≤ k ≤ f(m+1)
H_m(k) = 0,                                k > f(m+1)
where k is the frequency-point index of the transfer function, and the filters satisfy

Σ_{m=0}^{M−1} H_m(k) = 1
the center frequency f(m) of the Mel filter is defined as
f(m) = (N / f_s) · F_Mel^{−1}( F_Mel(f_l) + m · (F_Mel(f_h) − F_Mel(f_l)) / (M + 1) )
wherein f_h and f_l are respectively the highest and lowest frequencies of the filter bank, f_s is the sampling frequency in Hz, N is the number of FFT points, and F_Mel^{−1} is the inverse of the frequency-to-Mel-scale conversion, F_Mel^{−1}(b) = 700 · (10^{b/2595} − 1).
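The filter-bank design of this claim can be sketched in Python; this is an illustrative sketch only, not part of the claimed method, and the function names, parameter values and use of NumPy are assumptions:

```python
import numpy as np

def hz_to_mel(f):
    """F_Mel(f) = 2595 * lg(1 + f/700), the conversion in claim 2."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(b):
    """Inverse conversion: F_Mel^-1(b) = 700 * (10^(b/2595) - 1)."""
    return 700.0 * (10.0 ** (b / 2595.0) - 1.0)

def mel_filterbank(M, N, fs, fl, fh):
    """Build M triangular filters H_m(k) over an N-point FFT.

    Center frequencies are spaced equally on the Mel scale between the
    lowest frequency fl and the highest frequency fh of the filter bank.
    """
    # M + 2 equally spaced Mel points give the M filter edges/centers.
    mel_points = np.linspace(hz_to_mel(fl), hz_to_mel(fh), M + 2)
    # FFT-bin positions: f(m) = (N / fs) * F_Mel^-1(...), floored to integers.
    bins = np.floor((N / fs) * mel_to_hz(mel_points)).astype(int)
    H = np.zeros((M, N // 2 + 1))
    for m in range(1, M + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            H[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            H[m - 1, k] = (right - k) / (right - center)
    return H
```

For infrasound data one might call, e.g., `mel_filterbank(6, 512, 100.0, 0.5, 50.0)` for a 100 Hz sampling rate; these numbers are assumed for illustration.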
3. The method of claim 1, wherein step 2 comprises,
framing: the ballistic wave voiceprint signal is divided into short frames; each frame is regarded as a steady-state signal and processed with methods for steady-state signals, with every two adjacent frames overlapping;
windowing: the purpose of the window function is to reduce leakage in the frequency domain; each frame of the ballistic voiceprint is multiplied by a Hamming window or a Hanning window; after preprocessing, the ballistic wave voiceprint signal x(n) becomes x_i(m), wherein the subscript i denotes the i-th frame after framing;
fast Fourier transform: an FFT is performed on each frame of the signal to convert time-domain data into frequency-domain data:
X(i, k) = FFT[x_i(m)],
wherein FFT [ ] is a fast Fourier transform function, and X (i, k) is the result of the fast Fourier transform;
calculating the spectral line energy for each frame of FFT data:
E(i, k) = [X(i, k)]^2
wherein E (i, k) is the spectral line energy, i.e. energy spectrum, where i represents the ith frame and k represents the kth spectral line in the frequency domain;
calculating the energy passing through the Mel filter: the energy spectrum of each frame of spectral lines is passed through the Mel filter and the energy within each Mel filter is calculated; in the frequency domain, this is equivalent to multiplying the energy spectrum E(i, k) of each frame by the frequency-domain response H_m(k) of the Mel filter and summing:

S(i, m) = Σ_{k=0}^{N−1} E(i, k) · H_m(k),  0 ≤ m < M
wherein S (i, m) is the filtered subband energy for each Mel band;
calculating the DCT cepstrum: for a sequence x(n), the FFT cepstrum x̂(n)

x̂(n) = IDFT[X̂(k)]

is composed of

X̂(k) = ln{X(k)}

wherein IDFT is the inverse discrete Fourier transform, ln is the natural logarithm, and X(k) is the energy of the k-th frequency point after the FFT;
the DCT of the sequence x(n) is

X(k) = C(k) · Σ_{n=0}^{N−1} x(n) · cos( (2n + 1)kπ / (2N) ),  k = 0, 1, …, N − 1

wherein the parameter N is the length of the sequence x(n), and C(k) is the orthogonality factor:

C(k) = √(1/N),  k = 0
C(k) = √(2/N),  k = 1, 2, …, N − 1
taking the logarithm of the Mel filter energies and calculating the DCT:

mfcc(i, n) = √(2/M) · Σ_{m=1}^{M} ln[S(i, m)] · cos( πn(2m − 1) / (2M) )

wherein S(i, m) is the energy passing through the m-th Mel filter, M is the total number of Mel filters, i is the i-th frame, and n is the index of the spectral line after the DCT;
the MFCC parameters extracted after framing the voiceprint data are recorded as (q_1, q_2, …, q_T).
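The claim-3 chain (framing → windowing → FFT → spectral line energy → Mel filter energy → logarithm → DCT) might be sketched as follows; the flat fallback filter bank, the frame sizes and the function name are illustrative assumptions, not the patented design:

```python
import numpy as np

def mfcc_features(x, fs, frame_len=256, hop=128, n_mels=12, n_ceps=8,
                  filterbank=None):
    """Sketch of the claim-3 pipeline for a 1-D voiceprint signal x.

    `fs` would parameterize a real Mel filter bank; with the flat fallback
    used here (an assumption for self-containment) it is not consulted.
    `filterbank` is an (n_mels, frame_len//2 + 1) matrix H_m(k).
    """
    if filterbank is None:
        # Flat stand-in for the triangular Mel filter bank.
        filterbank = np.ones((n_mels, frame_len // 2 + 1)) / (frame_len // 2 + 1)
    win = np.hamming(frame_len)                      # windowing function
    n_frames = 1 + (len(x) - frame_len) // hop       # overlapping frames
    feats = np.empty((n_frames, n_ceps))
    m_idx = np.arange(1, n_mels + 1)                 # m = 1 .. M
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len] * win    # frame + window
        X = np.fft.rfft(frame)                            # FFT
        E = np.abs(X) ** 2                                # E(i,k) = |X(i,k)|^2
        S = filterbank @ E                                # S(i,m): Mel energies
        logS = np.log(S + 1e-12)                          # ln S(i,m)
        for n in range(n_ceps):                           # DCT per claim formula
            feats[i, n] = np.sqrt(2.0 / n_mels) * np.sum(
                logS * np.cos(np.pi * (n + 1) * (2 * m_idx - 1) / (2 * n_mels)))
    return feats
```

Each row of the returned matrix is one frame's MFCC vector, i.e. one element of the sequence (q_1, q_2, …, q_T).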
4. The method of claim 1, wherein step 3 comprises,
an HMM can be described by M = {A, B, π}, wherein A = {a_ij} is the state transition matrix, B = {b_j(k)} is the observation probability matrix, and π = {π_i} is the initial state probability vector; figuratively speaking, an HMM can be divided into two parts: one part is a Markov chain, described by π and A, whose output is the state sequence (q_1, q_2, …, q_T), corresponding to the MFCC parameters extracted from the framed projectile voiceprint; the other part is a random process, described by B, which produces the observations.
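The claim does not spell out how the output probability P(O | M) of an observation sequence is computed; the standard forward algorithm (textbook material, not quoted from the patent) can be sketched for a discrete-observation HMM M = {A, B, π} as:

```python
import numpy as np

def forward_prob(A, B, pi, obs):
    """P(O | M) for a discrete HMM via the forward algorithm.

    A[i, j]: transition probability from state i to state j;
    B[j, k]: probability of emitting symbol k in state j;
    pi[i]:   initial state probability; obs: list of symbol indices.
    """
    alpha = pi * B[:, obs[0]]              # initialisation: alpha_1(i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction over time steps
    return alpha.sum()                     # termination: sum over end states
```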
5. The method of claim 1, wherein step 3 comprises,
given training data, the HMM model parameters are optimized with the Baum-Welch algorithm; an objective function Q with all HMM parameters as variables is constructed by the method of Lagrange multipliers, and the relation between the new and old model parameters when Q reaches its extremum is derived by differentiation, thereby obtaining an estimate of each HMM parameter; the iteration is repeated using this functional relation between new and old parameters until the HMM parameters no longer change significantly; the Viterbi algorithm is then used to obtain the HMM model M_n representing the n-th class of events, 1 ≤ n ≤ N, N being the number of all possible target classes; the conditional likelihood of the feature sequence O of an unknown signal given a trained HMM is denoted P_max, the objective function that determines which class the signal belongs to; the event class number corresponding to the maximum conditional likelihood is recorded as n:
n = argmax_{1 ≤ n ≤ N} P(O | M_n)
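The claim-5 decision rule reduces to scoring O against every class model and taking the argmax; a minimal sketch, assuming discrete HMMs given as (A, B, π) tuples (a hypothetical interface, with the forward recursion standing in for the likelihood computation):

```python
import numpy as np

def classify(obs, models):
    """Return (class index n, P_max) for a symbol sequence `obs`.

    `models` is a list of (A, B, pi) tuples, one per event class M_n.
    """
    def likelihood(A, B, pi, obs):
        alpha = pi * B[:, obs[0]]          # forward recursion, P(O | M_n)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()
    scores = [likelihood(A, B, pi, obs) for (A, B, pi) in models]
    return int(np.argmax(scores)), max(scores)
```

In practice log-likelihoods would be used to avoid underflow on long sequences; the plain probabilities are kept here to match the claim's notation.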
6. the method of claim 1, wherein step 3 comprises,
the HMM model training process with a set probability threshold comprises:
(1) setting a maximum number of training iterations N and a normalized convergence threshold T;
(2) giving initial parameters λ = (π, A, B);
(3) re-estimating the initial parameters λ with the Baum-Welch (BW) re-estimation algorithm to obtain new model parameters λ̄;
(4) computing the output probability P(O | λ̄) of all observation sequences with the Viterbi algorithm;
(5) calculating the change in the output probability P of the observation sequences: if

|P(O | λ̄) − P(O | λ)| / P(O | λ) > T

the new model parameters λ̄ are adopted as λ, and the procedure returns to step (3) to continue iterating until the model parameters converge; if the number of iterations exceeds the maximum number of training iterations, the iteration stops even if it has not converged.
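The stopping rule of steps (3)–(5) can be sketched generically; `reestimate` and `prob` below are caller-supplied stand-ins (assumptions for self-containment) for the Baum-Welch re-estimation and the Viterbi scoring steps:

```python
def train_until_converged(lam, reestimate, prob, T=1e-4, N=100):
    """Iterate re-estimation until the relative change of the output
    probability P falls to the threshold T or below, or until the
    maximum number of training iterations N is reached."""
    p_old = prob(lam)
    for _ in range(N):
        lam_new = reestimate(lam)                    # BW re-estimation step
        p_new = prob(lam_new)                        # Viterbi-style scoring
        if abs(p_new - p_old) / abs(p_old) <= T:     # converged
            return lam_new
        lam, p_old = lam_new, p_new                  # adopt new parameters
    return lam                                       # hit iteration cap
```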
7. The method of claim 1, wherein the method is used to identify infrasound signals generated by nuclear explosions, missile flight and artillery firing.
8. An infrasound signal recognition system based on MFCC and HMM, comprising:
a Mel filter design module, configured to convert the actual frequency of the infrasound signal into a Mel frequency that accords with the auditory characteristics of the human ear, and to perform sub-band frequency feature extraction by dividing the voiceprint frequency band into a series of triangular filters;
a Mel frequency cepstrum coefficient feature extraction module, configured to frame and window the ballistic wave voiceprint data, perform a fast Fourier transform, calculate the spectral line energy, calculate the energy passing through the Mel filter, and calculate the DCT cepstrum to obtain the infrasound signal features;
a signal identification module, configured to set an output probability threshold for each HMM and determine the signal category according to the principle that the output probability of a sound signal under the model trained on the same kind of signals is greater than the threshold, while its output probability under models trained on different kinds of signals is less than the threshold; specifically: first, known labelled infrasound signal data are input, MFCC features are extracted, and an HMM is trained for each infrasound event type, yielding the HMM parameters corresponding to each infrasound event; second, infrasound signal data to be identified are input, MFCC features are extracted, the output probability is calculated for each type of HMM, and the type with the maximum output probability is taken as the type of the infrasound event.
9. An infrasound signal recognition apparatus based on MFCC and HMM, comprising:
a processor for executing a plurality of instructions;
a memory to store a plurality of instructions;
wherein the plurality of instructions are stored by the memory and loaded and executed by the processor to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that,
the storage medium stores a plurality of instructions; the plurality of instructions are loaded by a processor to perform the method of any one of claims 1 to 7.
CN202110972744.XA 2021-08-24 2021-08-24 Infrasound signal identification method, system and equipment based on MFCC and HMM Pending CN113705418A (en)

Publications (1)

Publication Number Publication Date
CN113705418A true CN113705418A (en) 2021-11-26

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115762529A (en) * 2022-10-17 2023-03-07 国网青海省电力公司海北供电公司 Method for preventing cable from being broken outside by using voice recognition perception algorithm
CN116108372A (en) * 2023-04-13 2023-05-12 中国人民解放军96901部队 Infrasound event classification and identification method for small samples



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination