CN113571089A - Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture - Google Patents

Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture

Info

Publication number
CN113571089A
CN113571089A
Authority
CN
China
Prior art keywords
voice
identified
signal
sound
characteristic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110908188.XA
Other languages
Chinese (zh)
Inventor
吴华明
陈合谱
戴磊
张业超
肖文波
肖永生
黄丽贞
段军红
苏荃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baohang Technology Co ltd
Nanchang Hangkong University
Original Assignee
Beijing Baohang Technology Co ltd
Nanchang Hangkong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baohang Technology Co ltd, Nanchang Hangkong University filed Critical Beijing Baohang Technology Co ltd
Priority to CN202110908188.XA priority Critical patent/CN113571089A/en
Publication of CN113571089A publication Critical patent/CN113571089A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a voice recognition method and system based on a Mel cepstral coefficient-support vector machine framework. The method comprises: acquiring a sound signal to be identified; extracting sound characteristic data of the sound signal to be identified, the sound characteristic data comprising static characteristic data and dynamic characteristic data of the sound signal to be identified; and inputting the sound characteristic data of the sound signal to be identified into a sound recognition model to obtain a recognition result, wherein the sound recognition model is obtained by training a support vector machine model on historical sound signals. By training the support vector machine model on both the static characteristic data and the dynamic characteristic data of the sound signal, the method and system improve the accuracy of sound recognition.

Description

Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture
Technical Field
The invention relates to the technical field of environmental monitoring, in particular to a voice recognition method and system based on a Mel cepstrum coefficient-support vector machine framework.
Background
Applied to the sensing field, optical fiber acoustic sensing systems can effectively overcome the difficulties that conventional electroacoustic sensors face in extreme field environments with strong electromagnetic interference, humidity, and corrosion, and can be widely used in important fields such as medicine, aviation, energy, and security. In an optical fiber acoustic sensing system, the accurate identification and classification of acoustic signals directly determines how widely the system can be deployed. To improve the accuracy of sound signal recognition and classification, many scholars have proposed solutions: Li Wanling and Zhang Qiuju of Xinjiang University proposed an anti-noise speech feature extraction and optimization method based on an HMM/SVM (hidden Markov model/support vector machine) architecture, and Gao Ming and Sun Rongcheng proposed a feature parameter extraction algorithm based on improved Mel cepstral coefficients (MFCC). However, these methods perform only simple feature extraction on the sound signal and use the extracted features directly for recognition; such features cannot fully support accurate recognition and classification of the sound signal.
Disclosure of Invention
The invention aims to provide a voice recognition method and a voice recognition system based on a Mel cepstrum coefficient-support vector machine framework, which can improve the recognition accuracy of voice signals.
In order to achieve the purpose, the invention provides the following scheme:
a voice recognition method based on a Mel cepstral coefficient-support vector machine architecture comprises the following steps:
acquiring a voice signal to be identified;
extracting sound characteristic data of the sound signal to be identified; the sound characteristic data comprises static characteristic data and dynamic characteristic data of the sound signal to be identified;
inputting the voice characteristic data of the voice signal to be recognized into a voice recognition model to obtain a voice recognition result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals.
Optionally, before the acquiring the voice signal to be recognized, the method further includes:
acquiring a historical sound signal; the historical sound signal comprises a non-invasive sound signal frame and an invasive sound signal frame;
extracting sound characteristic data of the historical sound signal;
and training a support vector machine model by taking the sound characteristic data of the historical sound signal as input and taking whether the historical sound signal contains an invading sound signal frame as output to obtain the sound identification model.
Optionally, after the acquiring the voice signal to be recognized, the method further includes:
carrying out normalization processing on the sound signal to be recognized;
and filtering the normalized voice signal to be identified.
Optionally, the extracting the sound feature data of the sound signal to be recognized specifically includes:
framing the voice signal to be identified to obtain a plurality of voice signal frames to be identified;
windowing each voice signal frame to be identified to obtain a plurality of voice windowed signal frames to be identified;
using the formula

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

respectively carrying out Fourier transform on each voice windowed signal frame to be identified to obtain a plurality of Fourier-transformed voice windowed signal frames to be identified;
using the formula

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

respectively carrying out cosine transform processing on each Fourier-transformed voice windowed signal frame to be identified to obtain the multi-dimensional Mel cepstral coefficients of the signal to be identified;
determining the multi-dimensional Mel cepstral coefficients as the static characteristic data of the sound signal to be identified; determining the first-order difference and the second-order difference of the static characteristic data of the sound signal to be identified as the dynamic characteristic data of the sound signal to be identified;
wherein X_a(k) is the Fourier-transformed voice windowed signal frame to be identified, x(n) is the voice windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, N is the total number of Fourier transform points, C(n) is the n-th dimensional Mel cepstral coefficient, and s(m) is the logarithmic energy output by the m-th filter bank,

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter, M is the number of filters, n = 1, 2, \ldots, L, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
A voice recognition system based on mel-frequency cepstral coefficient-support vector machine architecture, comprising:
the voice signal to be recognized acquisition module is used for acquiring a voice signal to be recognized;
the first sound characteristic data extraction module is used for extracting the sound characteristic data of the sound signal to be identified; the sound characteristic data comprises static characteristic data and dynamic characteristic data of the sound signal to be identified;
the voice signal identification module is used for inputting the voice characteristic data of the voice signal to be identified into a voice identification model to obtain a voice identification result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals.
Optionally, the system further includes:
the historical sound signal acquisition module is used for acquiring a historical sound signal; the historical sound signal comprises a non-invasive sound signal frame and an invasive sound signal frame;
the second sound characteristic data extraction module is used for extracting sound characteristic data of the historical sound signal;
and the voice recognition model determining module is used for training a support vector machine model by taking the voice feature data of the historical voice signal as input and taking whether the historical voice signal contains an invading voice signal frame as output to obtain the voice recognition model.
Optionally, the system further includes:
the normalization module is used for performing normalization processing on the voice signal to be recognized;
and the filtering module is used for filtering the normalized sound signal to be identified.
Optionally, the first sound feature data extraction module specifically includes:
the framing processing unit is used for framing the voice signal to be identified to obtain a plurality of voice signal frames to be identified;
the system comprises a to-be-identified sound windowing signal frame determining unit, a processing unit and a processing unit, wherein the to-be-identified sound windowing signal frame determining unit is used for respectively performing windowing processing on each to-be-identified sound signal frame to obtain a plurality of to-be-identified sound windowing signal frames;
a Fourier transform unit, configured to use the formula

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

to respectively carry out Fourier transform on each sound windowed signal frame to be identified, obtaining a plurality of Fourier-transformed sound windowed signal frames to be identified;
a multi-dimensional Mel cepstral coefficient determining unit, configured to use the formula

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

to respectively carry out cosine transform processing on each Fourier-transformed sound windowed signal frame to be identified, obtaining the multi-dimensional Mel cepstral coefficients of the signal to be identified;
and a sound characteristic data extraction unit, configured to determine the multi-dimensional Mel cepstral coefficients as the static characteristic data of the sound signal to be identified, and to determine the first-order difference and the second-order difference of the static characteristic data of the sound signal to be identified as the dynamic characteristic data of the sound signal to be identified;
wherein X_a(k) is the Fourier-transformed sound windowed signal frame to be identified, x(n) is the sound windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, N is the total number of Fourier transform points, C(n) is the n-th dimensional Mel cepstral coefficient, and s(m) is the logarithmic energy output by the m-th filter bank,

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter, M is the number of filters, n = 1, 2, \ldots, L, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a voice recognition method and a system based on a Mel cepstrum coefficient-support vector machine framework, wherein the method comprises the following steps: acquiring a voice signal to be identified; extracting sound characteristic data of a sound signal to be identified; the voice characteristic data comprises static characteristic data and dynamic characteristic data of the voice signal to be recognized; inputting the voice characteristic data of the voice signal to be recognized into a voice recognition model to obtain a voice recognition result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals. The method and the device can improve the accuracy of voice recognition by training the support vector machine model through the static characteristic data and the dynamic characteristic data of the voice signal to obtain the voice recognition model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a voice recognition method based on a Mel frequency cepstral coefficient-support vector machine architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Sagnac fiber optic sensing system according to an embodiment of the present invention;
FIG. 3 is a diagram of a voice recognition framework in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the result of parameter optimization for SVM in accordance with an embodiment of the present invention;
FIG. 5 is a graph of a confusion matrix for training model accuracy in an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a voice recognition system based on mel-frequency cepstral coefficient-support vector machine architecture in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a voice recognition method and a voice recognition system based on a Mel cepstrum coefficient-support vector machine framework, which can improve the recognition accuracy of voice signals.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a voice recognition method based on a mel-frequency cepstral coefficient-support vector machine architecture in an embodiment of the present invention, and as shown in fig. 1, the present invention provides a voice recognition method based on a mel-frequency cepstral coefficient-support vector machine architecture, which includes:
step 101: acquiring a voice signal to be identified;
step 102: extracting sound characteristic data of a sound signal to be identified; the voice characteristic data comprises static characteristic data and dynamic characteristic data of the voice signal to be recognized;
step 103: inputting the voice characteristic data of the voice signal to be recognized into a voice recognition model to obtain a voice recognition result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals.
Before step 101, further comprising:
acquiring a historical sound signal; the historical sound signal comprises a non-invasive sound signal frame and an invasive sound signal frame;
extracting sound characteristic data of the historical sound signal;
and training the support vector machine model by taking the sound characteristic data of the historical sound signal as input and taking whether the historical sound signal contains the invading sound signal frame as output to obtain a sound identification model.
After step 101, further comprising:
carrying out normalization processing on the sound signal to be recognized;
and filtering the normalized voice signal to be identified. Specifically, a wavelet threshold denoising method is adopted to filter the normalized sound signal to be recognized.
Specifically, in the voice recognition method based on the Mel cepstral coefficient-support vector machine architecture provided by the present invention, the sound feature data of the sound signal to be recognized are extracted in the same way as the sound feature data of the historical sound signals. Taking the sound signal to be recognized as an example, the extraction specifically includes:
performing frame processing on the voice signal to be recognized to obtain a plurality of voice signal frames to be recognized;
windowing each voice signal frame to be identified to obtain a plurality of voice windowed signal frames to be identified;
using the formula

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

respectively carrying out Fourier transform on each voice windowed signal frame to be identified to obtain a plurality of Fourier-transformed voice windowed signal frames to be identified;
using the formula

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

respectively carrying out cosine transform processing on each Fourier-transformed voice windowed signal frame to be identified to obtain the multi-dimensional Mel cepstral coefficients of the signal to be identified;
determining the multi-dimensional Mel cepstral coefficients as the static characteristic data of the sound signal to be identified; determining the first-order difference and the second-order difference of the static characteristic data of the sound signal to be identified as the dynamic characteristic data of the sound signal to be identified;
wherein X_a(k) is the Fourier-transformed voice windowed signal frame to be identified, x(n) is the voice windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, N is the total number of Fourier transform points, C(n) is the n-th dimensional Mel cepstral coefficient, and s(m) is the logarithmic energy output by the m-th filter bank,

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter, M is the number of filters, n = 1, 2, \ldots, L, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
Fig. 3 is a diagram of the voice recognition framework in an embodiment of the present invention. As shown in Fig. 3, the recognition method provided by the invention is a signal processing algorithm built on an MFCC_SVM (Mel cepstral coefficient-support vector machine) framework for a Sagnac optical fiber acoustic sensing system: an optical fiber sensing system is built on the Sagnac principle and collects external sound signals; the signals are normalized and filtered; MFCC feature parameters are extracted; a support vector machine whose hyperparameters are optimized by grid search is trained on these feature parameters to obtain an optimized signal recognition model; and the optimized model is used to discriminate the signals collected by the system, which greatly improves the system's ability to recognize signals in complex environments.
The method comprises the following specific steps:
step one, sound signal collection: the optical fiber sensing system is built according to the Sagnac principle to cope with a complex monitoring environment, the invasion signals are manufactured in modes of knocking, hoeing, digging, pickaxe and the like, and the optical fiber sensing system realizes the collection of the invasion signals with human interference and the non-invasion signals without human interference. The structure of the optical fiber sensing system is shown in fig. 2, wherein Laser is a light source, PD is a photoelectric detector, DAQ is a data acquisition card, and PC is a computer; 1,2, 3 each represent 3 inputs of a 3 × 3 coupler a; 4, 5, 6 denote 3 outputs of the 3 × 3 coupler a; b represents a delay fiber; 7 and 8 respectively represent two input ends of the 2 × 1 coupler c, 9 represents an output end of the 2 × 1 coupler, and 10 represents a disturbance intrusion point position; d represents a 1 × 2 coupler; 11 denotes the concatenated fibre at the output of the 1 x 2 coupler d. cw denotes the clockwise light path, the path of which is 1-a-6-b-8-c-9-10-d-11-d-10-9-c-7-4-a-3, cww denotes the counterclockwise light path, the path of which is 1-a-4-7-c-9-10-d-11-d-10-9-c-8-b-6-a-3.
Step two, signal normalization: the collected sound signals are zero-padded or cut so that all signals have the same length, which facilitates subsequent feature extraction and model training, and the signal data are mapped into the range 0 to 1 to speed up processing.
Step three, filtering: the signals processed in step two are denoised with a wavelet threshold denoising method, filtering out invalid information in the signals.
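Steps two and three can be sketched in Python with NumPy and PyWavelets as follows; the target length, wavelet family, decomposition level, and universal-threshold rule are illustrative assumptions, since the patent does not specify these values:

import numpy as np
import pywt

def preprocess(sig, target_len=8192, wavelet="db4", level=4):
    """Step two: zero-fill or cut to a fixed length and map to [0, 1];
    step three: wavelet threshold denoising with soft thresholding."""
    sig = np.asarray(sig, dtype=float)
    sig = np.pad(sig, (0, max(target_len - len(sig), 0)))[:target_len]  # zero-fill or cut
    sig = (sig - sig.min()) / (sig.max() - sig.min() + 1e-12)           # map to [0, 1]
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745       # noise estimate from finest detail
    thr = sigma * np.sqrt(2.0 * np.log(len(sig)))        # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:target_len]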
Step four, static feature extraction: the frame length of the sound signal is set to 512 ms with a frame overlap of 128 ms. Each frame is windowed with a Hamming window to reduce the influence of the Gibbs effect, suppressing oscillation of the waveform and improving the filtering. The sound signal is then Fourier transformed to convert the time-domain signal into a power spectrum, calculated as:

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

where X_a(k) is the Fourier-transformed windowed signal frame to be identified, x(n) is the windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, and N is the total number of Fourier transform points.
The power spectrum is filtered by a bank of triangular window filters distributed linearly on the Mel frequency scale, and the logarithmic energy output by each filter bank is calculated as:

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter and M is the number of filters.
Finally, a discrete cosine transform is applied to remove the correlation between the dimensions and map the signal into a low-dimensional space, yielding the MFCC. The discrete cosine transform is calculated as:

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

where C(n) is the n-th dimensional Mel cepstral coefficient, s(m) is the logarithmic energy output by the m-th filter bank, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
To maximize the efficiency of model training and recognition, the first 13 dimensions of the MFCC are extracted as the static features of the acoustic signal.
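A Python sketch of step four for a single frame is given below, implementing the three formulas above; the frame is assumed to have been Hamming-windowed already, and the 26-filter mel filter bank is an illustrative choice, since the patent states only that the filters are triangular and linearly spaced on the mel scale:

import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced linearly on the mel scale, i.e. H_m(k)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc_static(frame, sr, n_filters=26, n_ceps=13):
    """First 13 MFCC dimensions C(1..13) of one Hamming-windowed frame."""
    N = len(frame)
    power = np.abs(np.fft.rfft(frame, n=N)) ** 2                  # |X_a(k)|^2
    s = np.log(mel_filterbank(n_filters, N, sr) @ power + 1e-12)  # s(m); eps guards log(0)
    n = np.arange(1, n_ceps + 1)[:, None]
    m = np.arange(n_filters)[None, :]
    return (s * np.cos(np.pi * n * (m + 0.5) / n_filters)).sum(axis=1)  # C(n)

Applying mfcc_static to every frame of a signal yields the static feature matrix whose differences are taken in step five.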
Step five, dynamic feature extraction: and taking the first order difference and the second order difference of the static features as the dynamic features of the sound signals, and combining the extracted static features and the extracted dynamic features into a 39-dimensional signal feature parameter.
Specifically, the first-order and second-order difference spectra of the static features are taken as the dynamic features of the sound signal. The difference parameters are calculated with the following formula:

d_t = \begin{cases} C_{t+1} - C_t, & t < R \\ \frac{\sum_{r=1}^{R} r \,(C_{t+r} - C_{t-r})}{\sqrt{2 \sum_{r=1}^{R} r^2}}, & R \le t < Q - R \\ C_t - C_{t-1}, & t \ge Q - R \end{cases}

where d_t denotes the t-th first-order difference, C_t denotes the t-th cepstral coefficient, C_{t+1} denotes the (t+1)-th cepstral coefficient, Q denotes the order of the cepstral coefficients, and R denotes the time span of the first derivative, taken as 1 or 2.
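A first-order difference sketch following the piecewise formula above is shown below; note that some implementations use 2·Σr² without the square root in the denominator, and the sketch assumes the sequence is longer than R frames:

import numpy as np

def delta(ceps, R=2):
    """First-order difference d_t of a cepstral sequence (frames x dims)."""
    T = len(ceps)
    d = np.zeros_like(ceps)
    denom = np.sqrt(2.0 * sum(r * r for r in range(1, R + 1)))
    for t in range(T):
        if t < R:                        # leading boundary frames
            d[t] = ceps[t + 1] - ceps[t]
        elif t >= T - R:                 # trailing boundary frames
            d[t] = ceps[t] - ceps[t - 1]
        else:                            # interior frames
            d[t] = sum(r * (ceps[t + r] - ceps[t - r]) for r in range(1, R + 1)) / denom
    return d

# 39-dimensional parameters: 13 static + 13 first-order + 13 second-order differences
# feats = np.hstack([mfcc, delta(mfcc), delta(delta(mfcc))])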
Step six, SVM kernel function selection: the construction of a support vector machine depends mainly on the choice of kernel function, which maps the feature data into a high-dimensional space so that they become as linearly separable as possible. In the present invention, based on comparative experiments, a Gaussian kernel function is selected as the classification kernel of the SVM.
Step seven, model hyperparameter optimization: to improve the recognition performance of the model, a grid search method is used to try candidate pairs of the two hyperparameters, the penalty coefficient and the kernel function radius; K-fold cross-validation is then performed, and the penalty coefficient c and kernel function radius g that give the SVM model the highest validation accuracy are selected as the optimal parameters. The result of the hyperparameter optimization is shown in Fig. 4, where GridSearchMethod indicates that grid search is used for optimization and CVAccuracy indicates the cross-validation accuracy.
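A minimal sklearn sketch of this step follows; the exponential grid ranges and the choice of 5-fold cross-validation are illustrative assumptions, as the patent specifies grid search over c and g with K-fold cross-validation but not the exact grid or K:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": 2.0 ** np.arange(-5, 16, 2),       # candidate penalty coefficients c
    "gamma": 2.0 ** np.arange(-15, 4, 2),   # candidate kernel radii g (Gaussian/RBF width)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
# search.fit(X_train, y_train)   # X_train: (n_samples, 39) features; y_train: class labels
# print(search.best_params_, search.best_score_)   # optimal c, g and their CV accuracy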
Step eight, model training and signal recognition: with the optimized hyperparameters, the SVM model reaches its best training and recognition speed and accuracy. The feature data obtained in steps four and five are input into the optimized SVM model; according to the set parameters, the SVM finds a hyperplane that separates the input data, yielding the trained reference model. To verify the accuracy of the model, signals collected by the Sagnac fiber sensing system are processed to extract feature parameters and matched against the reference model, thereby recognizing and classifying the signals. The classification accuracy is represented by the confusion matrix shown in Fig. 5, where A denotes the non-intrusion signal class, B denotes the intrusion signal class, and each cell gives the proportion of samples of the row class that are predicted as the column class.
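The training-and-evaluation loop of step eight can be sketched as follows; the random arrays are stand-in data that only make the snippet self-contained, and the C and gamma values are illustrative placeholders for the grid-search result, not values from the patent:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 39))        # stand-in 39-dimensional feature vectors
y_train = rng.integers(1, 3, size=100)      # labels: 1 = non-intrusion, 2 = intrusion
X_test = rng.normal(size=(20, 39))
y_test = rng.integers(1, 3, size=20)

model = SVC(kernel="rbf", C=8.0, gamma=0.03)    # penalty c and radius g from grid search
model.fit(X_train, y_train)                     # find the separating hyperplane
cm = confusion_matrix(y_test, model.predict(X_test), normalize="true")
print(cm)   # row = true class, column = predicted class; diagonal = per-class accuracy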
Partial verification results, as shown in table 1:
TABLE 1 comparison of model predicted results with actual results
[Table 1 appears only as an image in the original publication.]
Here traininggood1 to traininggood5 denote the five collected non-intrusion signals, trainingbad1 to trainingbad5 denote the five collected intrusion signals, 1 denotes a non-intrusion signal, and 2 denotes an intrusion signal.
Compared with the prior art, the invention has the following advantages: static and dynamic features of the signal are extracted with the MFCC method as parameters and combined with a support vector machine, and the hyperparameters of the model are optimized by grid search, forming an optical fiber acoustic sensing recognition system based on the MFCC-SVM algorithm framework. As shown in Fig. 5, the recognition accuracy reaches more than 91%.
Fig. 6 is a schematic structural diagram of a voice recognition system based on a mel-frequency cepstral coefficient-support vector machine architecture in an embodiment of the present invention, and as shown in fig. 6, the present invention further provides a voice recognition system based on a mel-frequency cepstral coefficient-support vector machine architecture, which includes:
a to-be-recognized sound signal acquisition module 601, configured to acquire a to-be-recognized sound signal;
a first sound feature data extraction module 602, configured to extract sound feature data of a sound signal to be identified; the voice characteristic data comprises static characteristic data and dynamic characteristic data of the voice signal to be recognized;
the voice signal recognition module 603 is configured to input voice feature data of the voice signal to be recognized into the voice recognition model, so as to obtain a voice recognition result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals.
In addition, the system further comprises:
the historical sound signal acquisition module is used for acquiring a historical sound signal; the historical sound signal comprises a non-invasive sound signal frame and an invasive sound signal frame;
the second sound characteristic data extraction module is used for extracting sound characteristic data of the historical sound signal;
and the voice recognition model determining module is used for training the support vector machine model by taking the voice characteristic data of the historical voice signal as input and taking whether the historical voice signal contains the invading voice signal frame as output so as to obtain the voice recognition model.
The invention provides a voice recognition system based on a Mel cepstrum coefficient-support vector machine framework, which also comprises:
the normalization module is used for performing normalization processing on the sound signal to be recognized;
and the filtering module is used for filtering the normalized sound signal to be identified.
The first sound feature data extraction module 602 specifically includes:
the framing processing unit is used for framing the voice signal to be recognized to obtain a plurality of voice signal frames to be recognized;
the system comprises a to-be-identified sound windowing signal frame determining unit, a processing unit and a processing unit, wherein the to-be-identified sound windowing signal frame determining unit is used for respectively performing windowing processing on each to-be-identified sound signal frame to obtain a plurality of to-be-identified sound windowing signal frames;
a Fourier transform unit, configured to use the formula

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

to respectively carry out Fourier transform on each sound windowed signal frame to be identified, obtaining a plurality of Fourier-transformed sound windowed signal frames to be identified;
a multi-dimensional Mel cepstral coefficient determining unit, configured to use the formula

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

to respectively carry out cosine transform processing on each Fourier-transformed sound windowed signal frame to be identified, obtaining the multi-dimensional Mel cepstral coefficients of the signal to be identified;
and a sound characteristic data extraction unit, configured to determine the multi-dimensional Mel cepstral coefficients as the static characteristic data of the sound signal to be identified, and to determine the first-order difference and the second-order difference of the static characteristic data of the sound signal to be identified as the dynamic characteristic data of the sound signal to be identified;
wherein X_a(k) is the Fourier-transformed sound windowed signal frame to be identified, x(n) is the sound windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, N is the total number of Fourier transform points, C(n) is the n-th dimensional Mel cepstral coefficient, and s(m) is the logarithmic energy output by the m-th filter bank,

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter, M is the number of filters, n = 1, 2, \ldots, L, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A voice recognition method based on a Mel cepstral coefficient-support vector machine architecture, the method comprising:
acquiring a voice signal to be identified;
extracting sound characteristic data of the sound signal to be identified; the sound characteristic data comprises static characteristic data and dynamic characteristic data of the sound signal to be identified;
inputting the voice characteristic data of the voice signal to be recognized into a voice recognition model to obtain a voice recognition result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals.
2. The method for recognizing voice based on mel frequency cepstral coefficient-support vector machine architecture as claimed in claim 1, further comprising, before said obtaining the voice signal to be recognized:
acquiring a historical sound signal; the historical sound signal comprises a non-invasive sound signal frame and an invasive sound signal frame;
extracting sound characteristic data of the historical sound signal;
and training a support vector machine model by taking the sound characteristic data of the historical sound signal as input and taking whether the historical sound signal contains an invading sound signal frame as output to obtain the sound identification model.
3. The method for recognizing voice based on mel frequency cepstral coefficient-support vector machine architecture as claimed in claim 1, further comprising, after said obtaining the voice signal to be recognized:
carrying out normalization processing on the sound signal to be recognized;
and filtering the normalized voice signal to be identified.
4. The method as claimed in claim 1, wherein the extracting the voice feature data of the voice signal to be recognized specifically comprises:
framing the voice signal to be identified to obtain a plurality of voice signal frames to be identified;
windowing each voice signal frame to be identified to obtain a plurality of voice windowed signal frames to be identified;
using the formula

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

respectively carrying out Fourier transform on each voice windowed signal frame to be identified to obtain a plurality of Fourier-transformed voice windowed signal frames to be identified;
using the formula

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

respectively carrying out cosine transform processing on each Fourier-transformed voice windowed signal frame to be identified to obtain the multi-dimensional Mel cepstral coefficients of the signal to be identified;
determining the multi-dimensional Mel cepstral coefficients as the static characteristic data of the sound signal to be identified; determining the first-order difference and the second-order difference of the static characteristic data of the sound signal to be identified as the dynamic characteristic data of the sound signal to be identified;
wherein X_a(k) is the Fourier-transformed voice windowed signal frame to be identified, x(n) is the voice windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, N is the total number of Fourier transform points, C(n) is the n-th dimensional Mel cepstral coefficient, and s(m) is the logarithmic energy output by the m-th filter bank,

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter, M is the number of filters, n = 1, 2, \ldots, L, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
5. A voice recognition system based on mel-frequency cepstral coefficient-support vector machine architecture, the system comprising:
the voice signal to be recognized acquisition module is used for acquiring a voice signal to be recognized;
the first sound characteristic data extraction module is used for extracting the sound characteristic data of the sound signal to be identified; the sound characteristic data comprises static characteristic data and dynamic characteristic data of the sound signal to be identified;
the voice signal identification module is used for inputting the voice characteristic data of the voice signal to be identified into a voice identification model to obtain a voice identification result; the voice recognition model is obtained by training a support vector machine model according to historical voice signals.
6. The system of claim 5, wherein the system further comprises:
the historical sound signal acquisition module is used for acquiring a historical sound signal; the historical sound signal comprises a non-invasive sound signal frame and an invasive sound signal frame;
the second sound characteristic data extraction module is used for extracting sound characteristic data of the historical sound signal;
and the voice recognition model determining module is used for training a support vector machine model by taking the voice feature data of the historical voice signal as input and taking whether the historical voice signal contains an invading voice signal frame as output to obtain the voice recognition model.
7. The system of claim 5, wherein the system further comprises:
the normalization module is used for performing normalization processing on the voice signal to be recognized;
and the filtering module is used for filtering the normalized sound signal to be identified.
8. The system according to claim 5, wherein the first voice feature data extracting module specifically includes:
the framing processing unit is used for framing the voice signal to be identified to obtain a plurality of voice signal frames to be identified;
the system comprises a to-be-identified sound windowing signal frame determining unit, a processing unit and a processing unit, wherein the to-be-identified sound windowing signal frame determining unit is used for respectively performing windowing processing on each to-be-identified sound signal frame to obtain a plurality of to-be-identified sound windowing signal frames;
a Fourier transform unit, configured to use the formula

X_a(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad 0 \le k \le N,

to respectively carry out Fourier transform on each sound windowed signal frame to be identified, obtaining a plurality of Fourier-transformed sound windowed signal frames to be identified;
a multi-dimensional Mel cepstral coefficient determining unit, configured to use the formula

C(n) = \sum_{m=0}^{M-1} s(m) \cos\left( \frac{\pi n (m + 0.5)}{M} \right), \quad n = 1, 2, \ldots, L,

to respectively carry out cosine transform processing on each Fourier-transformed sound windowed signal frame to be identified, obtaining the multi-dimensional Mel cepstral coefficients of the signal to be identified;
and a sound characteristic data extraction unit, configured to determine the multi-dimensional Mel cepstral coefficients as the static characteristic data of the sound signal to be identified, and to determine the first-order difference and the second-order difference of the static characteristic data of the sound signal to be identified as the dynamic characteristic data of the sound signal to be identified;
wherein X_a(k) is the Fourier-transformed sound windowed signal frame to be identified, x(n) is the sound windowed signal frame to be identified, k is the index of the Fourier transform points, 0 ≤ k ≤ N, N is the total number of Fourier transform points, C(n) is the n-th dimensional Mel cepstral coefficient, and s(m) is the logarithmic energy output by the m-th filter bank,

s(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k) \right), \quad 0 \le m \le M,

where H_m(k) is the frequency response of the m-th filter, M is the number of filters, n = 1, 2, \ldots, L, and L is the dimension of the multi-dimensional Mel cepstral coefficients.
CN202110908188.XA 2021-08-09 2021-08-09 Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture Pending CN113571089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110908188.XA CN113571089A (en) 2021-08-09 2021-08-09 Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110908188.XA CN113571089A (en) 2021-08-09 2021-08-09 Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture

Publications (1)

Publication Number Publication Date
CN113571089A true CN113571089A (en) 2021-10-29

Family

ID=78170921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110908188.XA Pending CN113571089A (en) 2021-08-09 2021-08-09 Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture

Country Status (1)

Country Link
CN (1) CN113571089A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930870A (en) * 2012-09-27 2013-02-13 福州大学 Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
CN107424625A (en) * 2017-06-27 2017-12-01 南京邮电大学 A kind of multicenter voice activity detection approach based on vectorial machine frame
US10403303B1 (en) * 2017-11-02 2019-09-03 Gopro, Inc. Systems and methods for identifying speech based on cepstral coefficients and support vector machines
CN110155064A (en) * 2019-04-22 2019-08-23 江苏大学 Special vehicle traveling lane identification based on voice signal with from vehicle lane change decision system and method
CN110265035A (en) * 2019-04-25 2019-09-20 武汉大晟极科技有限公司 A kind of method for distinguishing speek person based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张冠华: "Research on whale call classification based on convolutional neural networks" (基于卷积神经网络的鲸鱼叫声分类研究) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114199594A (en) * 2021-12-14 2022-03-18 奇瑞汽车股份有限公司 Vehicle steering abnormal sound identification method and system
CN114199594B (en) * 2021-12-14 2022-10-21 奇瑞汽车股份有限公司 Method and system for identifying abnormal steering sound of vehicle
CN116801456A (en) * 2023-08-22 2023-09-22 深圳市创洺盛光电科技有限公司 Intelligent control method of LED lamp

Similar Documents

Publication Publication Date Title
CN112257521B (en) CNN underwater acoustic signal target identification method based on data enhancement and time-frequency separation
CN113571089A (en) Voice recognition method based on Mel cepstrum coefficient-support vector machine architecture
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN103413113A (en) Intelligent emotional interaction method for service robot
CN103503060A (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
CN111261189B (en) Vehicle sound signal feature extraction method
CN113488073B (en) Fake voice detection method and device based on multi-feature fusion
Gunasekaran et al. Content-based classification and retrieval of wild animal sounds using feature selection algorithm
Shekofteh et al. Feature extraction based on speech attractors in the reconstructed phase space for automatic speech recognition systems
US20110218802A1 (en) Continuous Speech Recognition
Shan-shan et al. Research on bird songs recognition based on MFCC-HMM
CN112397090B (en) Real-time sound classification method and system based on FPGA
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
CN102141812A (en) Robot
CN112418173A (en) Abnormal sound identification method and device and electronic equipment
Zhang et al. Environmental sound recognition using double-level energy detection
Dhakal et al. Detection and identification of background sounds to improvise voice interface in critical environments
Prasad et al. Gender based emotion recognition system for telugu rural dialects using hidden markov models
CN113488069A (en) Method and device for quickly extracting high-dimensional voice features based on generative countermeasure network
Makropoulos et al. Convolutional recurrent neural networks for the classification of cetacean bioacoustic patterns
Wang et al. Multi-Scale Permutation Entropy for Audio Deepfake Detection
Shekofteh et al. Using phase space based processing to extract proper features for ASR systems
Xie et al. MDF-Net: A multi-view dual-attention fusion network for efficient bird sound classification
Heriyanto et al. Comparison of Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction, With and Without Framing Feature Selection, to Test the Shahada Recitation
Bencharif et al. Parallel implementation of distributed acoustic sensor acquired signals: detection, processing, and classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211029