CN114999459A - Voice recognition method and system based on multi-scale recursive quantitative analysis - Google Patents


Info

Publication number
CN114999459A
CN114999459A (Application CN202210481126.XA)
Authority
CN
China
Prior art keywords
recursion
recursive
scale
speech recognition
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210481126.XA
Other languages
Chinese (zh)
Inventor
张晓俊
朱欣程
赵登煌
陶智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202210481126.XA priority Critical patent/CN114999459A/en
Publication of CN114999459A publication Critical patent/CN114999459A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 to G10L21/00
    • G10L25/03 characterised by the type of extracted parameters
    • G10L25/18 the extracted parameters being spectral information of each sub-band
    • G10L25/24 the extracted parameters being the cepstrum
    • G10L25/27 characterised by the analysis technique
    • G10L25/36 using chaos theory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a speech recognition method and system based on multi-scale recursive quantitative analysis. The method comprises the following steps: extracting the glottal wave signal of a speech signal; dividing the glottal wave signal into multiple frequency bands with a Gammatone filter bank to obtain glottal wave signals for a plurality of frequency channels; reconstructing a multi-scale phase space for the glottal wave signal of each frequency channel via time delay and embedding dimension, and constructing a recurrence plot from the distance between every pair of phase points in the phase space; quantifying the nonlinear dynamic recursive characteristics of the glottal wave signal in each frequency channel from the recurrence plot to obtain a set of characteristic parameters for each frequency channel; dividing the speech signals into a training set and a test set, and training a recognition model with the characteristic parameters of the training set; and performing prediction and classification on the characteristic parameters of the test set with the trained recognition model. The invention can accurately quantify the nonlinear characteristics of a speech signal and improve speech recognition accuracy.

Description

Voice recognition method and system based on multi-scale recursive quantitative analysis
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice recognition method and system based on multi-scale recursive quantitative analysis.
Background
With the rapid development of artificial intelligence, speech recognition technology has made remarkable progress and has gradually entered fields such as household appliances, medical treatment, and automotive electronics. The speech recognition process mainly comprises feature extraction and classification with a classifier, and the extracted features largely determine recognition accuracy. Commonly used characteristic parameters include perturbation features, such as fundamental-frequency jitter and amplitude shimmer; spectral and cepstral features, such as linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and Gammatone-frequency cepstral coefficients (GFCC); and complexity measures, such as the largest Lyapunov exponent, the correlation dimension, and entropy features.
Computing the perturbation features depends on choosing a suitable window length and accurately estimating the fundamental frequency, but for aperiodic, irregular speech signals it is clearly difficult to extract the pitch period. Moreover, speech production is neither a deterministic linear process nor a purely random process, but a nonlinear one, so spectral and cepstral features cannot represent the nonlinear characteristics of the speech signal. The largest Lyapunov exponent, correlation dimension, and entropy features can only represent the low-dimensional chaotic characteristics of the speech signal, and the accuracy of existing recursive quantization measures in speech recognition is not ideal, making them difficult to apply in practical scenarios.
Disclosure of Invention
The invention aims to provide a speech recognition method and system based on multi-scale recursive quantitative analysis, which can accurately quantify the nonlinear characteristics of a speech signal and improve speech recognition accuracy.
In order to solve the above technical problem, the present invention provides a speech recognition method based on multi-scale recursive quantization analysis, comprising the following steps:
S1, extracting the glottal wave signal of the speech signal;
S2, dividing the glottal wave signal into multiple frequency bands with a Gammatone filter bank to obtain glottal wave signals for a plurality of frequency channels;
S3, reconstructing a multi-scale phase space for the glottal wave signal of each frequency channel via time delay and embedding dimension, and constructing a recurrence plot from the distance between every pair of phase points in the phase space;
S4, quantifying the nonlinear dynamic recursive characteristics of the glottal wave signal in each frequency channel from the recurrence plot to obtain a set of characteristic parameters for each frequency channel;
S5, dividing the speech signals into a training set and a test set, and training a recognition model with the characteristic parameters of the training set;
S6, performing prediction and classification on the characteristic parameters of the test set with the trained recognition model.
As a further improvement of the present invention, the time-domain impulse response of the Gammatone filter is:

g_i(t) = B^k · t^(k-1) · e^(-2πBt) · cos(2πf_i·t + φ) · u(t)

where the filter order k is set to 4 and the initial phase φ is set to 0; f_i is the center frequency of the i-th channel filter; B is a parameter related to the equivalent rectangular bandwidth; and u(t) is the unit step function.
As a further improvement of the present invention, the center frequencies are ERB-spaced between the lowest and highest filter frequencies:

f_i = −C + (f_h + C) · exp(−(i/K) · ln((f_h + C)/(f_l + C))), i = 1, …, K

where C is a constant related to the quality factor and bandwidth, f_l and f_h are the lowest and highest frequencies of the filter bank, and the number of filters K is 24. B is a parameter related to the equivalent rectangular bandwidth ERB:

B = 1.019 · ERB(f_i)

The equivalent rectangular bandwidth ERB is related to the filter center frequency as follows:

ERB(f_i) = 24.7 + 0.108·f_i
as a further improvement of the present invention, a time series { x (1), x (2),.., x (N) } of length N is set, and the phase space is reconstructed by the Takens embedding theorem:
Figure BDA0003627897560000031
where τ is the time delay, m is the embedding dimension, and the total number of points represented by the vector in the reconstructed phase space is N ═ N- (m-1) τ.
As a further improvement of the present invention, when the distance between two phase points in the phase space is smaller than a threshold, the pair of points is recursive, and the recurrence value obtained is:

R_ij = θ(ε − ||X_i − X_j||), i, j = 1, 2, …, n

where ε is the threshold, θ is the Heaviside function, and ||·|| denotes a norm.
As a further development of the invention, a series of characteristic parameters related to the recurrence values is obtained from the recurrence plot, based on an analysis of the density of recurrence points and of diagonal, vertical, or horizontal line structures.
As a further refinement of the invention, the characteristic parameters include: recursion rate, determinism, maximum diagonal length, entropy of diagonal lengths, average diagonal length, laminarity (degree of stratification), trapping time (capture time), maximum vertical line length, first recurrence time, second recurrence time, recurrence time entropy, clustering coefficient, and transitivity.
As a further improvement of the present invention, the recursion rate is the percentage of recurrence points in the recurrence plot;
the determinism is the ratio of the recurrence points forming diagonal segments in the recurrence plot to all recurrence points;
the maximum diagonal length is the length of the longest diagonal line in the recurrence plot structure;
the entropy of diagonal lengths is the Shannon entropy of the distribution of diagonal line lengths in the recurrence plot, measuring the amount of information contained in the recurrence plot structure;
the average diagonal length is highly correlated with the average prediction time of the dynamic system and the divergence of the system;
the laminarity (degree of stratification) is the ratio of the recurrence points forming vertical structures to all recurrence points in the recurrence plot, reflecting the complexity of the dynamic system;
the trapping time (capture time) is the average length of vertical lines in the recurrence plot structure, measuring the average time the system remains in a very slowly changing state;
the maximum vertical line length is the maximum length of a vertical line in the recurrence plot structure;
the first recurrence time T1(i) and second recurrence time T2(i) are the intervals T(i) = t_(i+1) − t_i, i = 1, 2, …, between successive recurrence times t_i of the first and second type, respectively;
the recurrence time entropy indicates the degree to which the time series repeats the same subsequence;
the clustering coefficient is the probability that two neighbors of any state in the recurrence plot structure are themselves also neighbors;
the transitivity quantifies the geometric properties of the phase space trajectory.
A speech recognition system based on multi-scale recursive quantization analysis adopts the speech recognition method based on multi-scale recursive quantization analysis to perform speech recognition.
As a further improvement of the invention, the recognition model classifier adopts a Bayesian network classifier.
The invention has the following beneficial effects. The proposed multi-scale recursive quantization measures do not depend on extracting the pitch period of speech and can also measure the high-dimensional chaotic characteristics of the speech signal, which helps improve speech recognition accuracy. The recursive quantization measures effectively capture changes in vocal cord vibration: starting from the vocal cord vibration mechanism, the glottal signal is extracted as the source signal; the signal is used to reconstruct a high-dimensional phase space on the Gammatone scale; a recurrence plot is drawn in combination with the characteristics of human auditory perception; and finally the nonlinear dynamic recursive characteristics of the speech signal in each frequency channel are quantified from the recurrences. The speech recognition rate of this nonlinear analysis method exceeds that of traditional linear analysis methods.
Drawings
FIG. 1 is a schematic diagram of the multi-scale recursive quantization measure extraction process of the present invention;
FIG. 2 is a schematic diagram of a speech recognition system of the present invention.
Detailed Description
The present invention is further described below in conjunction with the drawings and embodiments so that those skilled in the art can better understand and practice the invention; the embodiments, however, are not to be construed as limiting the invention.
As described in the background, with respect to the commonly used characteristic parameters:
1. Perturbation features: these describe noise in speech signals caused by irregular vocal cord vibration due to voice disorders, such as fundamental-frequency jitter and amplitude shimmer. Jitter represents short-term perturbation of the fundamental frequency, and shimmer represents short-term perturbation of the amplitude.
2. Spectral and cepstral features: MFCC and GFCC are characteristic parameters that conform to the auditory perception characteristics of the human ear and are commonly used for speech recognition. The basic principle is to map the linear spectrum onto a nonlinear Mel or Gammatone scale, based on the auditory perception characteristics of the human ear, and then transform it to the cepstrum.
3. Complexity measures: the largest Lyapunov exponent (LLE), correlation dimension (CD), and recursive quantization measures (RQMs). The largest Lyapunov exponent characterizes the average exponential divergence rate of adjacent trajectories in phase space. All three are nonlinear features based on phase space reconstruction and represent the degree of chaos of the speech signal.
Since speech signals have complex nonlinear characteristics, conventional nonlinear analysis methods have been applied to speech recognition. However, because of the non-stationary nature of the speech signal, those methods cannot accurately quantify its nonlinear characteristics, so their recognition performance falls short of linear analysis methods. The invention provides a speech recognition method based on multi-scale recursive quantitative analysis. The proposed characteristic parameters, multi-scale recursive quantization measures, do not depend on extracting the pitch period of speech and can also measure the high-dimensional chaotic characteristics of the speech signal. The recursive quantization measures effectively capture changes in vocal cord vibration. The signal is used to reconstruct a high-dimensional phase space on the Gammatone scale, and a recurrence plot is drawn in combination with the characteristics of human auditory perception. Finally, the nonlinear dynamic recursive characteristics of the speech signal in each frequency channel are quantified from the recurrences.
Referring to fig. 1, the present invention provides a speech recognition method based on multi-scale recursive quantization analysis, comprising the following steps:
S1, extracting the glottal wave signal of the speech signal;
S2, dividing the glottal wave signal into multiple frequency bands with a Gammatone filter bank to obtain glottal wave signals for a plurality of frequency channels;
S3, reconstructing a multi-scale phase space for the glottal wave signal of each frequency channel via time delay and embedding dimension, and constructing a recurrence plot from the distance between every pair of phase points in the phase space;
S4, quantifying the nonlinear dynamic recursive characteristics of the glottal wave signal in each frequency channel from the recurrence plot to obtain a set of characteristic parameters for each frequency channel;
S5, dividing the speech signals into a training set and a test set, and training a recognition model with the characteristic parameters of the training set;
S6, performing prediction and classification on the characteristic parameters of the test set with the trained recognition model.
The multi-scale recursive quantization measures proposed by the invention start from the vocal cord vibration mechanism and extract the glottal signal as the source signal. Through multi-scale analysis, the feature can decompose a non-stationary, nonlinear complex sequence into a set of frequency sub-band features. Combining the auditory perception characteristics of the human ear, the invention reconstructs the multi-scale phase space of the glottal signal by computing the time delay and embedding dimension, quantifies the nonlinear, non-stationary recursive structure to obtain the nonlinear characteristics of the speech signal, and then recognizes the speech with an artificial intelligence method. The proposed multi-scale recursive quantization measure parameters do not require extraction of the pitch period, can accurately quantify the nonlinear characteristics of the speech signal, help improve speech recognition accuracy, and outperform traditional linear analysis methods.
Specifically, the invention focuses on feature extraction and studies it from the perspective of the glottal wave. For feature extraction, a Gammatone filter bank performs multi-band division of the glottal wave signal, so that the features express the speech characteristics more finely and carry auditory perception characteristics.
The specific design of the speech recognition system in the invention mainly comprises:
1. Glottal wave extraction: the glottal wave signal is extracted from the original speech signal with a glottal inverse filtering algorithm.
2. Gammatone frequency-division processing:
A Gammatone auditory bionic filter bank is designed to divide the glottal wave signal into multiple frequency bands, yielding glottal wave signals for 24 frequency channels.
The time-domain expression of the Gammatone filter bank is: g_i(t) = B^k · t^(k-1) · e^(-2πBt) · cos(2πf_i·t + φ) · u(t). When the filter order k is set to 4, the filter characteristics of the human basilar membrane are well simulated; the initial phase φ is set to 0; f_i is the center frequency of the i-th channel filter. The center frequencies are ERB-spaced between the lowest and highest filter frequencies:

f_i = −C + (f_h + C) · exp(−(i/K) · ln((f_h + C)/(f_l + C))), i = 1, …, K

where C is a constant related to the quality factor and bandwidth, f_l and f_h are the lowest and highest frequencies of the filter bank, and the number of filters K is 24.
B is a parameter related to the equivalent rectangular bandwidth:

B = 1.019 · ERB(f_i)

The equivalent rectangular bandwidth ERB is related to the filter center frequency as follows:

ERB(f_i) = 24.7 + 0.108·f_i
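As a minimal sketch of the filter-bank design above, the following Python snippet computes 24 ERB-spaced center frequencies and the corresponding bandwidths B = 1.019·ERB(f_i). The constant c ≈ 228.8 and the band edges 50–8000 Hz are illustrative assumptions taken from common Gammatone filterbank implementations, not values fixed by the invention.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth at centre frequency f (Hz),
    using the linear approximation in the text: ERB(f) = 24.7 + 0.108 f."""
    return 24.7 + 0.108 * f

def gammatone_center_freqs(f_low, f_high, num_channels=24, c=228.83):
    """ERB-spaced centre frequencies between f_low and f_high.
    c plays the role of the constant C related to quality factor and
    bandwidth (228.83 is an assumed value from standard implementations)."""
    i = np.arange(1, num_channels + 1)
    return -c + (f_high + c) * np.exp(
        -(i / num_channels) * np.log((f_high + c) / (f_low + c)))

# assumed band edges for illustration only
cfs = gammatone_center_freqs(50.0, 8000.0)
bws = 1.019 * erb(cfs)          # per-channel bandwidth B
```

Channel 1 lies just below the upper band edge and channel 24 lands exactly on the lower edge, so the channels tile the band on the ERB scale.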
3. Nonlinear dynamics analysis:
The first step in analyzing a signal with nonlinear dynamics theory is to reconstruct the phase space. Given a time series {x(1), x(2), …, x(N)} of length N, the phase space can be reconstructed by the Takens embedding theorem:

X_i = (x(i), x(i+τ), …, x(i+(m−1)τ)), i = 1, 2, …, n

where τ is the time delay and m is the embedding dimension. The total number of phase points represented by the vectors {X_1, X_2, X_3, …, X_n} in the reconstructed phase space is n = N − (m−1)τ.
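The delay embedding above can be sketched in a few lines of Python; the toy series and the (m, τ) values are illustrative only.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Takens delay embedding: returns an (n, m) array whose i-th row is
    (x[i], x[i+tau], ..., x[i+(m-1)tau]), with n = N - (m-1)*tau."""
    x = np.asarray(x)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.stack([x[i:i + n] for i in range(0, m * tau, tau)], axis=1)

# example: N = 10, m = 3, tau = 2  ->  n = 10 - (3-1)*2 = 6 phase points
X = delay_embed(np.arange(10), m=3, tau=2)
```

The first phase point is (x(0), x(2), x(4)), matching the embedding formula term by term.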
A. Constructing the recurrence plot:
The recurrence plot is a tool for analyzing the recurrence phenomenon of a signal in a two-dimensional graph. When the distance between two phase points is smaller than the threshold, the pair of points is recursive, represented by a black point; otherwise it is not recursive, represented by a white point or blank space:

R_ij = θ(ε − ||X_i − X_j||), i, j = 1, 2, …, n

where ε is the threshold and θ is the Heaviside function.
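A minimal sketch of the thresholded recurrence matrix defined here, using the Euclidean norm; the three 2-D phase points and the threshold ε = 0.5 are toy values for illustration.

```python
import numpy as np

def recurrence_matrix(X, eps):
    """Binary recurrence matrix R[i, j] = 1 iff ||X_i - X_j|| < eps
    (Euclidean norm between phase points X_i, rows of X)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d < eps).astype(int)

# toy phase points: the first two recur, the third is far away
X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
R = recurrence_matrix(X, eps=0.5)
```

The matrix is symmetric and its main diagonal (each point compared with itself) is always recursive.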
B. Quantifying the recursive measures:
The recursive nature of the time series is reflected in the geometric structures of the recurrence plot. Recursive quantization analysis is a method of quantifying system dynamics based on the recurrence plot. Based on an analysis of the density of recurrence points and of diagonal, vertical, or horizontal line structures, a series of statistical parameters can be derived. This work uses 13 recursive quantization measures, such as the average diagonal length, maximum diagonal length, clustering coefficient, and transitivity.
Recursion rate (RR): the percentage of recurrence points in the recurrence plot:

RR = (1/n²) · Σ_{i,j=1}^{n} R_ij

Determinism (DET): the ratio of the recurrence points forming diagonal segments in the recurrence plot to all recurrence points:

DET = Σ_{l≥l_min} l·P^ε(l) / Σ_{i,j=1}^{n} R_ij

where l is the length of a diagonal segment and l_min is its minimum value; P^ε(l) = {l_i; i = 1, …, N_l} denotes the frequency distribution of diagonal lengths, and N_l is the absolute number of diagonal lines.
Maximum diagonal length (L_max): the length of the longest diagonal line in the recurrence plot structure:

L_max = max({l_i; i = 1, …, N_l})
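The diagonal-based measures RR, DET, and L_max can be sketched as below. This is an illustrative implementation under common RQA conventions (main diagonal excluded from the line statistics, l_min = 2); the toy recurrence matrix is not from the invention.

```python
import numpy as np

def diagonal_lengths(R):
    """Lengths of all diagonal line segments of recurrence points,
    excluding the main diagonal (line of identity)."""
    n = R.shape[0]
    lengths = []
    for k in range(-(n - 1), n):
        if k == 0:
            continue
        run = 0
        for v in np.diagonal(R, offset=k):
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def rqa_measures(R, l_min=2):
    """Recursion rate RR, determinism DET and maximum diagonal length
    L_max of a binary recurrence matrix R."""
    n = R.shape[0]
    rr = R.sum() / n ** 2
    lens = np.array(diagonal_lengths(R))
    long_lens = lens[lens >= l_min] if lens.size else lens
    det = long_lens.sum() / R.sum() if R.sum() else 0.0
    l_max = int(lens.max()) if lens.size else 0
    return rr, det, l_max

# toy example: line of identity plus one diagonal segment of length 2
R = np.eye(4, dtype=int)
R[0, 1] = 1
R[1, 2] = 1
rr, det, l_max = rqa_measures(R)
```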
Entropy of diagonal lengths (ENTR): the Shannon entropy of the distribution of diagonal line lengths in the recurrence plot, measuring the amount of information contained in the recurrence plot structure:

ENTR = −Σ_{l≥l_min} p(l) · ln p(l), with p(l) = P^ε(l) / N_l
Average diagonal length (⟨L⟩): highly correlated with the average prediction time of the dynamic system and the divergence of the system:

⟨L⟩ = Σ_{l≥l_min} l·P^ε(l) / Σ_{l≥l_min} P^ε(l)
Laminarity (LAM), the degree of stratification: the ratio of the recurrence points forming vertical structures to all recurrence points in the recurrence plot, reflecting the complexity of the dynamic system:

LAM = Σ_{v≥v_min} v·P^ε(v) / Σ_{v=1}^{N} v·P^ε(v)

where v is the length of a vertical segment and P^ε(v) = {v_i; i = 1, …, N_v} its frequency distribution.
Trapping time (TT), the capture time: the average length of vertical lines in the recurrence plot structure, measuring the average time the system remains in a very slowly changing state:

TT = Σ_{v≥v_min} v·P^ε(v) / Σ_{v≥v_min} P^ε(v)
Maximum vertical line length (V_max): the maximum length of a vertical line in the recurrence plot structure:

V_max = max({v_i; i = 1, …, N_v})
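The vertical-line measures LAM, TT, and V_max can be sketched in the same way; this is an illustrative implementation with the common choice v_min = 2, and the toy matrix (one vertical run of three points plus an isolated point) is not from the invention.

```python
import numpy as np

def vertical_lengths(R):
    """Lengths of all vertical line segments of recurrence points."""
    lengths = []
    for col in R.T:
        run = 0
        for v in col:
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return np.array(lengths)

def laminarity_measures(R, v_min=2):
    """Laminarity LAM, trapping time TT and maximum vertical line
    length V_max of a binary recurrence matrix R."""
    v = vertical_lengths(R)
    total = v.sum()                                # all recurrence points
    long_v = v[v >= v_min]
    lam = long_v.sum() / total if total else 0.0
    tt = float(long_v.mean()) if long_v.size else 0.0
    v_max = int(v.max()) if v.size else 0
    return lam, tt, v_max

# toy example: one vertical run of length 3 and one isolated point
R = np.zeros((4, 4), dtype=int)
R[0:3, 0] = 1
R[1, 1] = 1
lam, tt, v_max = laminarity_measures(R)
```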
First recurrence time (T1) and second recurrence time (T2): the intervals between successive recurrence times t_i of the first and second type, respectively:

T1(i) = t_(i+1) − t_i, i = 1, 2, …
T2(i) = t_(i+1) − t_i, i = 1, 2, …

where the t_i in T1 are the recurrence times of the first type and the t_i in T2 are those of the second type.
Recurrence time entropy (RPDE): it has been successfully applied in biomedical testing and is advantageous for detecting subtle changes in biological time series, indicating the degree to which the time series repeats the same subsequence:

H_norm = −(1/ln T_max) · Σ_{t=1}^{T_max} P(t) · ln P(t)

For each point of the time series {x(1), x(2), …, x(N)}, the return times within the threshold neighborhood are collected into a histogram, and P(t) is the normalized histogram, where T_max is the maximum repetition period and t is the time between returns.
Clustering coefficient (Clust): in complex network theory, the probability that two neighbors of any state in the recurrence plot structure are themselves also neighbors:

Clust = (1/n) · Σ_{v=1}^{n} C_v, with C_v = Σ_{i,j=1}^{n} R_vi·R_ij·R_jv / (k_v·(k_v − 1))

where k_v = Σ_j R_vj is the number of neighbors of state v, and RR_i = k_i/n denotes the local recursion rate.
Transitivity (Trans): quantifies the geometric properties of the phase space trajectory:

Trans = Σ_{i,j,k=1}^{n} R_ij·R_jk·R_ki / Σ_{i,j,k=1; i≠k}^{n} R_ij·R_jk
4. The speech signals are divided into a training set and a test set, and a recognition model is trained with the characteristic parameters of the speech in the training set.
5. The characteristic parameters of the test set are classified by prediction with the trained model.
Examples
In this embodiment, the effect of the method of the present invention is verified by comparing the speech recognition results of different feature extraction methods:
1. Extracting the characteristic parameters MFCC:
(1) After pre-emphasis, the signal S(n) is windowed and framed with a Hamming window to obtain each frame signal x_n(m); its spectrum X_n(k) is then obtained by the short-time Fourier transform, and the square of the spectrum gives the energy spectrum P_n(k):

P_n(k) = |X_n(k)|²
(2) P_n(k) is filtered with M Mel band-pass filters; since the contributions of the components within each band are superimposed in the human ear, the energy within each filter band is summed:

S_n(m) = Σ_k H_m(k) · P_n(k), m = 1, …, M

where H_m(k) is the frequency response of the m-th Mel filter and S_n(m) is the output of each filter band.
(3) The logarithm of each filter output is taken and the discrete cosine transform is applied to obtain L MFCC coefficients:

C_n(l) = Σ_{m=1}^{M} ln S_n(m) · cos(π·l·(m − 0.5)/M), l = 1, …, L
(4) The obtained MFCC coefficients serve as the characteristic parameters of the n-th frame and reflect the static characteristics of the speech signal; better results are obtained by adding first-order difference coefficients, to which the human ear is more sensitive. The first-order difference is calculated as follows:

d_n = Σ_{l=1}^{L'} l · (c_(n+l) − c_(n−l)) / (2 · Σ_{l=1}^{L'} l²)

where L' is generally taken as 2, so the difference is a linear combination of the two frames before and after the current frame and reflects the dynamic characteristics of the speech.
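The first-order difference over a window of two frames on each side can be sketched as below. The edge-padding-by-repetition convention is an assumption (the text does not specify edge handling), and the ramp input is a toy example.

```python
import numpy as np

def delta(coeffs, L=2):
    """First-order difference of per-frame cepstral coefficients:
    d_n = sum_l l*(c_{n+l} - c_{n-l}) / (2*sum_l l^2), with edge frames
    padded by repetition. coeffs: (num_frames, num_ceps) array."""
    c = np.pad(np.asarray(coeffs, dtype=float), ((L, L), (0, 0)), mode="edge")
    n = len(coeffs)
    d = np.zeros((n, c.shape[1]))
    denom = 2 * sum(l * l for l in range(1, L + 1))
    for l in range(1, L + 1):
        d += l * (c[L + l:L + l + n] - c[L - l:L - l + n])
    return d / denom

# a linearly increasing coefficient track has delta 1 in interior frames
d = delta(np.arange(7.0).reshape(7, 1))
```

A constant coefficient track yields a delta of exactly zero, as expected for a purely static feature.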
2. Largest Lyapunov exponent and correlation dimension (LLE & D2):
(1) For a given speech signal, a smaller embedding dimension m_0 is first selected and the phase space is reconstructed:

X_i = (x(i), x(i+τ), …, x(i+(m_0−1)τ)), i = 1, 2, …, n
(2) The correlation sum C(r) is computed:

C(r) = (2/(n(n−1))) · Σ_{1≤i<j≤n} θ(r − ||X_i − X_j||)

where ||X_i − X_j|| is the distance between two phase points and θ(u) is the Heaviside function:

θ(u) = 1 for u > 0, and θ(u) = 0 for u ≤ 0

C(r) is a cumulative distribution function representing the probability that the distance between two points on the attractor in phase space is smaller than r.
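The correlation sum can be sketched directly from its definition; the three toy phase points and the radius r = 0.5 are illustrative values.

```python
import numpy as np

def correlation_sum(X, r):
    """C(r): fraction of distinct phase-point pairs whose distance is
    smaller than r, i.e. (2/(n(n-1))) * sum_{i<j} theta(r - ||X_i - X_j||)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=1)
    return float((d[i, j] < r).mean())

# toy phase points: only the pair (0, 1) is closer than r = 0.5
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0]])
c = correlation_sum(X, r=0.5)
```

Estimating the correlation dimension then amounts to fitting the slope of ln C(r) against ln r over a scaling region.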
(3) Taking an initial phase point x_0 as the base point, the point x_1 nearest to x_0 is selected from the point set as the end point, forming an initial vector whose Euclidean length is denoted L(t_0). After a time step (evolution time) k, the initial vector evolves forward along the trajectory into a new vector, and the Euclidean distance between the corresponding point and the end point is denoted L(t_1). The exponential growth rate of the system over this period is recorded as:

λ(t_1) = (1/(t_1 − t_0)) · ln(L(t_1)/L(t_0))

(4) This procedure is continued over all phase points, and the average of the exponential growth rates is taken as the estimate of the largest Lyapunov exponent:

λ_1 = (1/M) · Σ_{i=1}^{M} λ(t_i)

where M is the number of evolution steps.
in this embodiment, a bayesian network classifier is used to classify and identify the speech by using the Recursive Quantization Measures (RQMs), the maximum lyapunov exponent and the associated dimension (LLE & D2), the mel-frequency cepstral coefficient (MFCC), and the multi-scale recursive quantization measures, and the experimental results are shown in the following table:
[Table of recognition results for RQMs, LLE & D2, MFCC, and the multi-scale recursive quantization measures; rendered as an image in the original and not reproduced here.]
as can be seen from the table above, the multi-scale recursive quantization measure is superior to the traditional characteristic parameter Mel cepstral coefficient, the maximum Lyapunov exponent of nonlinear characteristics and the correlation and recursive quantization measures.
The accuracy of identifying the characteristic parameters of the multi-scale recursive quantitative measure in the Bayesian network classifier reaches 100%, and other evaluation indexes reach optimal values, which is superior to the traditional method. Therefore, the characteristics provided by the invention improve the recognition rate and reliability of the system.
As shown in fig. 2, the present invention also provides a speech recognition system based on multi-scale recursive quantization analysis, which recognizes speech with the speech recognition method based on multi-scale recursive quantization analysis described above (the classifier including but not limited to a Bayesian network). Its principle of solving the problem is similar to that of the method, and repeated parts are not described again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of the invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art on the basis of the invention all fall within its protection scope, which is defined by the claims.

Claims (10)

1. A speech recognition method based on multi-scale recursive quantization analysis is characterized in that: the method comprises the following steps:
s1, extracting a glottal wave signal of the voice signal;
s2, dividing the glottal wave signals in a multiband mode by using a Gamma filter to obtain glottal wave signals of a plurality of frequency channels;
s3, reconstructing a multi-scale phase space of the glottal wave signals of each frequency channel through time delay and embedding dimension, and constructing a recursion graph according to the distance between every two phase points in the phase space;
s4, quantizing the nonlinear dynamic recursive characteristics of the glottal wave signals in each frequency channel according to the recursive graph to obtain a plurality of characteristic parameters of the glottal wave signals of each frequency channel;
s5, dividing the voice signal into a training set and a testing set, and training a recognition model by using the characteristic parameters of the training set;
and S6, carrying out prediction classification on the characteristic parameters of the test set by using the trained recognition model.
2. The speech recognition method based on multi-scale recursive quantization analysis according to claim 1, characterized in that the time-domain impulse response of the Gammatone filter is:

g_i(t) = B^k t^(k-1) e^(-2πBt) cos(2πf_i t + φ) u(t)

where the filter order k is set to 4 and the initial phase φ is set to 0; f_i is the center frequency of the i-th channel filter; B is a parameter related to the equivalent rectangular bandwidth; u(t) is the unit step function.
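As a minimal numerical sketch of this impulse response (the sampling rate, channel frequency, and duration below are illustrative choices, not values from the patent):

```python
import numpy as np

def gammatone_ir(t, f_c, B, k=4, phi=0.0):
    """g(t) = B^k t^(k-1) e^(-2*pi*B*t) cos(2*pi*f_c*t + phi) u(t)."""
    u = (t >= 0).astype(float)          # unit step function u(t)
    return (B ** k) * t ** (k - 1) * np.exp(-2 * np.pi * B * t) \
        * np.cos(2 * np.pi * f_c * t + phi) * u

# 50 ms of the k = 4, phi = 0 response of a 1 kHz channel at 16 kHz sampling,
# with B = 1.019 * ERB(f_c) as defined in claim 3
t = np.arange(0.0, 0.05, 1.0 / 16000.0)
g = gammatone_ir(t, f_c=1000.0, B=1.019 * (24.7 + 0.108 * 1000.0))
```

The response starts at zero (t^(k-1) vanishes at t = 0), rings at the channel center frequency, and decays at a rate set by the bandwidth parameter B.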
3. The speech recognition method based on multi-scale recursive quantization analysis according to claim 2, characterized in that the center frequency is:

f_i = -C + (f_h + C) exp(-(i/K) ln((f_h + C)/(f_l + C))), i = 1, 2, …, K
where C is related to the quality factor and bandwidth, f_l and f_h are the lowest and highest frequencies of the filter bank, and the number of filters K is 24; B is a parameter related to the equivalent rectangular bandwidth ERB:

B = 1.019·ERB(f_i)

and the equivalent rectangular bandwidth ERB is related to the filter center frequency as:

ERB(f_i) = 24.7 + 0.108·f_i
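A sketch of the ERB and center-frequency computation for a K = 24 channel bank. The claim does not give the constant C explicitly; the `ear_q` and `min_bw` values below follow the common Glasberg–Moore/Slaney convention (C = ear_q · min_bw) and are an assumption:

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth (Hz): ERB(f) = 24.7 + 0.108 f
    return 24.7 + 0.108 * f

def center_frequencies(f_low, f_high, K=24, ear_q=9.26449, min_bw=24.7):
    # ERB-spaced center frequencies between f_low and f_high.
    # C = ear_q * min_bw; these two constants are assumed, not from the claim.
    C = ear_q * min_bw
    i = np.arange(1, K + 1)
    return -C + (f_high + C) * np.exp(-(i / K) * np.log((f_high + C) / (f_low + C)))

cfs = center_frequencies(50.0, 4000.0)    # K = 24 channels, illustrative band edges
bandwidths = 1.019 * erb(cfs)             # B = 1.019 * ERB(f_i) per channel
```

The channel spacing is geometric on the ERB-rate scale: the last center frequency lands near f_l and the first just below f_h.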
4. The speech recognition method based on multi-scale recursive quantization analysis according to claim 1, characterized in that, given a time series {x(1), x(2), …, x(N)} of length N, the phase space is reconstructed by the Takens embedding theorem:

X_i = [x(i), x(i+τ), …, x(i+(m-1)τ)], i = 1, 2, …, n

where τ is the time delay, m is the embedding dimension, and the total number of phase-space vectors in the reconstructed phase space is n = N - (m-1)τ.
5. The speech recognition method based on multi-scale recursive quantization analysis according to claim 4, characterized in that when the distance between two phase points in the phase space is less than a threshold, the pair of points is recurrent, and the recursion value obtained is:

R_ij = θ(ε - ||X_i - X_j||), i, j = 1, 2, …, n

where ε is the threshold, θ is the Heaviside step function, and ||·|| denotes a norm.
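A minimal sketch of the recursion (recurrence) matrix, using the Euclidean norm (the claim leaves the norm unspecified) and a toy 2-D trajectory:

```python
import numpy as np

def recurrence_matrix(X, eps):
    """R_ij = theta(eps - ||X_i - X_j||): 1 where two phase points are within eps."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    return (d <= eps).astype(int)       # Heaviside step applied elementwise

# tiny illustration: points 0, 1, 3 cluster together, point 2 is far away
X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [0.05, 0.02]])
R = recurrence_matrix(X, eps=0.2)
```

The result is symmetric with an all-ones main diagonal (every point recurs with itself), which is why the recursion graph is plotted as a square black-and-white image.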
6. The speech recognition method based on multi-scale recursive quantization analysis according to claim 5, characterized in that, from the recursion graph, a series of characteristic parameters of the recursion values is obtained based on analysis of the density of recurrent points and of the diagonal, vertical, or horizontal line structures.
7. The speech recognition method based on multi-scale recursive quantization analysis according to claim 6, characterized in that the characteristic parameters comprise: recursion rate, certainty (determinism), maximum diagonal length, entropy of diagonal lengths, average diagonal length, degree of stratification (laminarity), capture time (trapping time), maximum vertical line length, first recursion time, second recursion time, recursion time entropy, clustering coefficient, and transitivity.
8. The speech recognition method based on multi-scale recursive quantization analysis of claim 7, characterized in that: the recursion rate is the percentage of recursion points in the recursion graph;
the certainty (determinism) represents the ratio of recursion points forming diagonal segments in the recursion graph to all recursion points;
the maximum diagonal length is the length of the longest diagonal in the recursive graph structure;
the entropy of diagonal lengths is the Shannon entropy of the distribution of diagonal-line lengths in the recursion graph, measuring the information content of the recursion-graph structure;
the average diagonal length is highly correlated with the average prediction time of the dynamic system and with the divergence of the system;
the degree of stratification (laminarity) is the ratio of recursion points forming vertical structures to all recursion points in the recursion graph, reflecting the complexity of the dynamic system;
the capture time (trapping time) represents the average length of vertical lines in the recursion-graph structure, measuring the average time the system stays in a very slowly varying state;
the maximum vertical line length represents the maximum length of a vertical line in the recursive graph structure;
first recursion time T1(i) and second recursion time T2(i):

T1(i) = t_(i+1) - t_i, i = 1, 2, …

where the t_i are the instants at which the trajectory returns to the neighborhood of a given state; T1 counts all recurrence points, while T2(i) is defined in the same way but excludes sojourn points, counting only the first recurrence point of each return;
the recursion time entropy indicates the degree to which the time series repeats the same subsequences;
the clustering coefficient represents the probability that two neighboring points of any state in the recursion-graph structure are also neighbors of each other;
the transitivity quantifies the geometric properties of the phase-space trajectory.
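As an illustration, two of the listed parameters, the recursion rate and the certainty (determinism), can be sketched from a recurrence matrix R. This is a simplified version that, unlike many RQA toolboxes, does not exclude the main diagonal:

```python
import numpy as np

def recurrence_rate(R):
    # Percentage of recurrent points in the recursion graph.
    return R.mean()

def diagonal_line_lengths(R):
    # Lengths of all maximal diagonal runs of 1s in the recurrence matrix.
    n = R.shape[0]
    lengths = []
    for k in range(-(n - 1), n):
        run = 0
        for v in np.diagonal(R, k):
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def determinism(R, lmin=2):
    # Fraction of recurrence points lying on diagonal lines of length >= lmin.
    lengths = diagonal_line_lengths(R)
    total = sum(lengths)
    return sum(l for l in lengths if l >= lmin) / total if total else 0.0

R = np.eye(5, dtype=int)        # a single diagonal line of length 5
```

For the identity matrix every recurrence point lies on one diagonal line, so the determinism is 1.0 while the recursion rate is only 5/25; adding isolated off-diagonal points lowers the determinism without much changing the rate.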
9. A speech recognition system based on multi-scale recursive quantization analysis, characterized in that speech recognition is performed using the speech recognition method based on multi-scale recursive quantization analysis according to any one of claims 1 to 8.
10. A speech recognition system based on multi-scale recursive quantization analysis according to claim 9, characterized by: the recognition model classifier adopts a Bayesian network classifier.
CN202210481126.XA 2022-05-05 2022-05-05 Voice recognition method and system based on multi-scale recursive quantitative analysis Pending CN114999459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210481126.XA CN114999459A (en) 2022-05-05 2022-05-05 Voice recognition method and system based on multi-scale recursive quantitative analysis


Publications (1)

Publication Number Publication Date
CN114999459A true CN114999459A (en) 2022-09-02

Family

ID=83024479


Country Status (1)

Country Link
CN (1) CN114999459A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211574A (en) * 2019-06-03 2019-09-06 哈尔滨工业大学 Speech recognition modeling method for building up based on bottleneck characteristic and multiple dimensioned bull attention mechanism
CN112863517A (en) * 2021-01-19 2021-05-28 苏州大学 Speech recognition method based on perceptual spectrum convergence rate


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
薛隆基 (Xue Longji): "Recursive quantification analysis and classification of pathological voices" *

Similar Documents

Publication Publication Date Title
US8140331B2 (en) Feature extraction for identification and classification of audio signals
CN109599120B (en) Abnormal mammal sound monitoring method based on large-scale farm plant
CN104887263B (en) A kind of identification algorithm and its system based on heart sound multi-dimension feature extraction
Mesgarani et al. Speech discrimination based on multiscale spectro-temporal modulations
Gómez-García et al. On the design of automatic voice condition analysis systems. Part III: Review of acoustic modelling strategies
López-Pabón et al. Cepstral analysis and Hilbert-Huang transform for automatic detection of Parkinson’s disease
CN110647656A (en) Audio retrieval method utilizing transform domain sparsification and compression dimension reduction
Hsu et al. Local wavelet acoustic pattern: A novel time–frequency descriptor for birdsong recognition
Wisniewski et al. Application of tonal index to pulmonary wheezes detection in asthma monitoring
Yarga et al. Efficient spike encoding algorithms for neuromorphic speech recognition
CN112863517A (en) Speech recognition method based on perceptual spectrum convergence rate
CN104036785A (en) Speech signal processing method, speech signal processing device and speech signal analyzing system
Manikandan et al. Quality-driven wavelet based PCG signal coding for wireless cardiac patient monitoring
Wiśniewski et al. Automatic detection of prolonged fricative phonemes with the hidden Markov models approach
CN114999459A (en) Voice recognition method and system based on multi-scale recursive quantitative analysis
Neili et al. Gammatonegram based pulmonary pathologies classification using convolutional neural networks
ABAKARIM et al. Amazigh isolated word speech recognition system using the adaptive orthogonal transform method
CN112233693A (en) Sound quality evaluation method, device and equipment
Therese et al. A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system
CN109215633A (en) The recognition methods of cleft palate speech rhinorrhea gas based on recurrence map analysis
Karam Various speech processing techniques for speech compression and recognition
Sisman et al. A new speech coding algorithm using zero cross and phoneme based SYMPES
CN118248152A (en) Speech-based identity recognition method and related equipment
CN118173102B (en) Bird voiceprint recognition method in complex scene
Feng et al. Underwater acoustic feature extraction based on restricted Boltzmann machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220902