CN109767756B - Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient
Abstract
The invention discloses a sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients, which comprises the following steps: S1, pre-emphasis, framing and windowing preprocessing are carried out on the sound signals; S2, the preprocessed sound signals are transformed from the time domain to the frequency domain; S3, the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2 is calculated with a cluster analysis algorithm, and the two adjacent classes with the greatest similarity are merged in turn; this process is iterated until 24 classes remain, and the resulting dynamic segmentation inverse discrete cosine transform cepstrum coefficients are the sound features. The invention overcomes the defect that the prior art does not fully exploit the dynamic characteristics of sound when performing the frequency-domain transformation, and therefore has wider adaptability and achieves higher accuracy in speaker recognition.
Description
Technical Field
The invention belongs to the technical field of sound feature extraction, applies an unsupervised cluster analysis algorithm to sound feature extraction, and particularly relates to a sound feature extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients.
Background
Speaker recognition technology comprises two parts: feature extraction and recognition modeling. Feature extraction is a key step in speaker recognition and directly affects the overall performance of the recognition system. In general, framing and windowing a speech signal during preprocessing produces a high-dimensional data volume, so redundant information in the original speech must be removed to reduce the data dimensionality when speaker features are extracted. In existing methods, triangular filtering converts the speech signal into a feature vector that meets the requirements of the feature parameters; such a vector approximates the auditory perception characteristics of the human ear and can, to a certain extent, enhance the speech signal and suppress non-speech signals. Commonly used feature parameters include: linear prediction analysis coefficients (LPC), obtained by simulating the human phonation principle and analyzing a model of the vocal tract as a cascade of short tubes; perceptual linear prediction (PLP) coefficients, which apply an auditory-model-based computation to spectral analysis, processing the input speech signal through a human-ear auditory model in place of the time-domain all-pole prediction polynomial used by LPC; Tandem and Bottleneck features, two types of features extracted with neural networks; filter-bank (Fbank) features, which are equivalent to MFCC with the final discrete cosine transform removed and therefore retain more of the original speech data; and linear prediction cepstrum coefficients (LPCC), important feature parameters that, based on the vocal tract model, discard the excitation information of the signal generation process and represent the formant characteristics with a dozen or so cepstrum coefficients.
The MFCC extraction process first preprocesses the speech (framing, windowing, fast Fourier transform, and so on), then filters the energy spectrum through a bank of Mel-scale triangular filters, computes the logarithmic energy output by each filter, and obtains the MFCC coefficients through a discrete cosine transform (DCT); the Mel-scale cepstrum parameters are solved and the dynamic difference parameters extracted, giving the Mel cepstrum coefficients. In 2012, S. Al-Rawahya et al., referring to the MFCC feature extraction method, performed equal frequency-domain segmentation on the DCT cepstrum coefficients obtained after speech preprocessing and proposed the Histogram DCT cepstrum coefficient method. We find that cepstrum coefficients obtained by such fixed frequency-domain segmentation ignore the dynamic characteristics of the sound data. The invention therefore proposes a new sound feature extraction algorithm based on dynamically segmented inverse discrete cosine transform cepstrum coefficients, combining unsupervised learning: a hierarchical clustering method groups the sound data according to the similarity of its dynamic characteristics, thereby extracting a dynamic feature vector that better describes the sound.
In existing research, one of the most widely used speaker recognition approaches takes the MFCC as the voice feature vector and performs speaker pattern matching with machine learning methods such as the Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) and Support Vector Machine (SVM). The MFCC extraction process is as follows: first, pre-emphasis, framing, windowing and fast Fourier transform preprocessing are applied to the speech; the energy spectrum is then filtered through a bank of Mel-scale triangular filters; the logarithmic energy output by each filter is computed and passed through a discrete cosine transform (DCT); finally, the Mel-scale cepstrum parameters are solved and the dynamic difference parameters extracted, giving the Mel cepstrum coefficients (MFCC).
Al-Rawahya et al. introduced the DCT Cepstrum as a new feature in 2012 and proposed an acoustic feature extraction algorithm based on equal frequency-domain DCT Cepstrum coefficients. The preprocessed sound signal is converted to the frequency domain, i.e. from a time-domain convolution into a frequency-domain spectral multiplication; the logarithm is taken so that the resulting components can be expressed additively, yielding the discrete cosine transform cepstrum coefficients (DCT Cepstrum coefficients). The DCT cepstrum coefficients record the periodicity of the frequency range in non-linear increments: the frequency-domain feature interval is divided every 50 Hz between 0 Hz and 600 Hz, and every 100 Hz between 600 Hz and 1000 Hz. This process can be viewed as counting the number of frequency-range cycles in a given speech signal. Compared with the MFCC feature extraction method, this method is simpler and faster.
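The fixed binning of the prior-art method described above can be sketched as follows. This is an illustrative reconstruction: only the bin boundaries (50 Hz steps from 0 to 600 Hz, 100 Hz steps from 600 to 1000 Hz) come from the text, while the function name and the use of a plain histogram over detected cycle frequencies are assumptions.

```python
import numpy as np

# Fixed (equal) frequency-domain segmentation used by the prior-art
# Histogram DCT Cepstrum method: 50 Hz bins between 0 and 600 Hz and
# 100 Hz bins between 600 and 1000 Hz.
edges = np.concatenate([np.arange(0, 600, 50), np.arange(600, 1001, 100)])
# edges -> [0, 50, 100, ..., 550, 600, 700, 800, 900, 1000] (16 intervals)

def histogram_dct_cepstrum(cycle_freqs):
    """Count detected frequency-range cycles per fixed bin (illustrative;
    the cycle-detection step on the DCT cepstrum itself is not shown)."""
    counts, _ = np.histogram(cycle_freqs, bins=edges)
    return counts
```

Because the bin edges are fixed in advance, this segmentation cannot adapt to the dynamic characteristics of a particular speaker's data, which is the shortcoming the invention addresses.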
Disclosure of Invention
The invention mainly aims to provide a sound characteristic extraction algorithm based on a dynamic segmentation inverse discrete cosine transform cepstrum coefficient, aiming at the inaccuracy of the segmentation frequency in the sound characteristic extraction algorithm based on the equal frequency domain segmentation inverse discrete cosine transform cepstrum coefficient. The technical means adopted by the invention are as follows:
a sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients comprises the following steps:
S1, preprocessing the sound signals:
pre-emphasis, framing and windowing are sequentially carried out on the sound signals;
Preprocessing eliminates the influence on signal quality of factors such as aliasing, higher harmonic distortion and high-frequency effects introduced by the human vocal organs and by the equipment used to acquire the sound signal, so that the subsequently processed signal is more uniform and smooth; this provides high-quality parameters for sound feature extraction and improves the quality of subsequent processing.
S2, transforming the preprocessed sound signals from the time domain to the frequency domain:
The preprocessed sound signal is converted to the frequency domain, i.e. from a time-domain convolution into a frequency-domain spectral multiplication; the logarithm is taken so that the resulting components can be expressed additively, yielding the inverse discrete cosine transform cepstrum coefficients (IDCT Cepstrum coefficients). The specific process is given by the following formula:
C(q)=IDCT log|DCT{x(k)}|;
wherein DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input (preprocessed) sound signal, and C(q) is the output, namely the inverse discrete cosine transform cepstrum coefficients;
the inverse discrete cosine transform cepstrum coefficients form a data matrix; because of the inherent frequency attributes of sound, all column attributes are of the same kind when hierarchical clustering is performed, so sequential clustering is carried out by computing the similarity of adjacent column attributes.
S3, calculating the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2 with a cluster analysis algorithm, and merging in turn the two adjacent classes with the greatest similarity; the process is iterated until 24 classes remain, and the resulting dynamic segmentation inverse discrete cosine transform Cepstrum coefficients (DD-IDCT Cepstrum coefficients) are the sound features.
The pre-emphasis is realized by a digital filter, and the specific process is carried out by the following formula:
Y(n)=X(n)-aX(n-1);
where Y(n) is the output signal after pre-emphasis, X(n) is the input sound signal, a is the pre-emphasis coefficient, and n is the time index.
The average power spectrum of the sound signal is affected by glottal excitation and oral-nasal radiation: above roughly 800 Hz the high-frequency end falls off at 6 dB/oct (octave), so the higher the frequency, the smaller the corresponding component. The high-frequency part of the sound signal is therefore boosted before analysis.
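The high-frequency boost provided by the pre-emphasis filter can be checked numerically. The sketch below evaluates the magnitude response of H(z) = 1 - a·z⁻¹ at a few frequencies; the 16 kHz sample rate and the sample frequencies are assumptions for illustration, only a = 0.97 comes from the patent.

```python
import numpy as np

# Magnitude response of the pre-emphasis filter H(z) = 1 - a*z^(-1), a = 0.97.
# The sample rate of 16 kHz is an assumption, not stated in the patent.
a, fs = 0.97, 16000.0
freqs = np.array([100.0, 1000.0, 4000.0])                 # probe frequencies, Hz
H = np.abs(1.0 - a * np.exp(-2j * np.pi * freqs / fs))
# |H| grows with frequency, boosting the high-frequency components that
# glottal excitation and oral-nasal radiation attenuate at about 6 dB/oct.
```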
Sound analysis throughout is a "short-time analysis" technique. The sound signal is time-varying, but within a short range (generally 10-30 ms) its characteristics remain essentially unchanged, i.e. relatively stable, so the signal can be regarded as a quasi-steady-state process: the sound signal exhibits short-time stationarity. Any analysis and processing of a sound signal must therefore be built on "short-time" analysis: the signal is segmented, and the characteristic parameters of each segment, called a "frame" and generally 10-30 ms long, are analysed. For the whole sound signal, the analysis then operates on the time sequence of characteristic parameters formed by the parameters of each frame.
Framing segments the pre-emphasized output signal into frames of 20 ms each.
Windowing is applied to the framed sound signal. Its purpose is to make the signal more continuous at the frame boundaries, avoiding the Gibbs effect, so that the originally non-periodic sound signal exhibits some characteristics of a periodic function. The windowing uses a Hamming window.
The transformation is in the form of a cepstral transform.
The cluster analysis algorithm is a hierarchical clustering algorithm.
The similarity is computed as the Euclidean distance.
Compared with the prior art, the invention has the following advantages:
first, through in-depth analysis of the essence of the equal frequency-domain segmentation DCT Cepstrum coefficient feature extraction algorithm, the invention remedies the defect that the prior art does not fully exploit the dynamic features of sound when performing the frequency-domain transformation; the invention therefore has wider adaptability and achieves higher accuracy in speaker recognition.
Second, applying unsupervised cluster analysis to sound feature extraction gives the method a simple procedure, high speed and a small computing-resource footprint.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a sound feature extraction algorithm based on a dynamic segmentation inverse discrete cosine transform cepstrum coefficient in an embodiment of the present invention.
FIG. 2 is a diagram of a cluster analysis tree in accordance with an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a sound feature extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients has the following steps:
S1, preprocessing the sound signals:
pre-emphasis, framing and windowing are sequentially carried out on the sound signals;
the pre-emphasis is realized by a digital filter, and the specific process is carried out by the following formula:
Y(n)=X(n)-aX(n-1);
where Y(n) is the output signal after pre-emphasis, X(n) is the input sound signal, a is the pre-emphasis coefficient (here a = 0.97), and n is the time index.
Framing segments the pre-emphasized output signal into frames of 20 ms each.
The windowing is hamming window windowing.
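Step S1 can be sketched in NumPy as follows. Only a = 0.97, the 20 ms frame length and the Hamming window come from the patent; the 16 kHz sample rate, the function name and the use of non-overlapping frames are assumptions of this sketch.

```python
import numpy as np

def preprocess(x, fs=16000, alpha=0.97, frame_ms=20):
    """Pre-emphasis, framing and Hamming windowing (step S1, sketch)."""
    x = np.asarray(x, dtype=float)
    # Pre-emphasis: Y(n) = X(n) - a*X(n-1), with a = 0.97
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    # Framing: split into consecutive 20 ms frames (no overlap assumed here)
    frame_len = int(fs * frame_ms / 1000)          # 320 samples at 16 kHz
    n_frames = len(y) // frame_len
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Windowing: apply a Hamming window to every frame
    return frames * np.hamming(frame_len)
```

One second of 16 kHz audio thus yields a 50 x 320 matrix of windowed frames, which is the input to step S2.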
S2, transforming the preprocessed sound signals from the time domain to the frequency domain:
The preprocessed sound signal is converted to the frequency domain, i.e. from a time-domain convolution into a frequency-domain spectral multiplication; the logarithm is taken so that the resulting components can be expressed additively, yielding the inverse discrete cosine transform cepstrum coefficients (IDCT Cepstrum coefficients). The specific process is given by the following formula:
C(q)=IDCT log|DCT{x(k)}|;
wherein DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input (preprocessed) sound signal, and C(q) is the output, namely the inverse discrete cosine transform cepstrum coefficients; the transformation takes the form of a cepstral transform.
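The formula C(q) = IDCT(log|DCT{x(k)}|) can be sketched per frame as below. The DCT type and normalization, and the small epsilon guarding log(0), are assumptions; the patent states only the formula itself.

```python
import numpy as np
from scipy.fft import dct, idct

def idct_cepstrum(frame):
    """C(q) = IDCT(log|DCT{x(k)}|), computed for one preprocessed frame.
    DCT type-II with orthonormal scaling and the 1e-12 epsilon are
    assumptions of this sketch, not specified in the patent."""
    spec = dct(frame, type=2, norm='ortho')      # time domain -> frequency domain
    log_mag = np.log(np.abs(spec) + 1e-12)       # logarithm of the magnitude
    return idct(log_mag, type=2, norm='ortho')   # back to the cepstral domain
```

Stacking the result for every frame produces the cepstrum coefficient matrix whose columns are clustered in step S3.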
S3, calculating the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2 with a cluster analysis algorithm, and merging in turn the two adjacent classes with the greatest similarity; the process is iterated until 24 classes remain, and the resulting dynamic segmentation inverse discrete cosine transform cepstrum coefficients are the sound features. The specific steps are as follows:
Let the matrix A denote the m×n inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2. As shown in FIG. 2, each column vector V_1, V_2, …, V_n of the cepstrum coefficients is initially regarded as its own class, and Dis(V_i, V_j) denotes the Euclidean distance between V_i and V_j. The specific steps of the cluster analysis are as follows:

First clustering:

l_1 = Dis(V_1, V_2)
l_2 = Dis(V_2, V_3)
…
l_{n-1} = Dis(V_{n-1}, V_n)

If i = arg min(l_1, l_2, l_3, …, l_{n-1}), the clustering result is

(V_1), (V_2), …, (V_i + V_{i+1}), …, (V_n)

Update:

l_{i-1} = Dis(V_{i-1}, (V_i + V_{i+1}))
l_i = Dis((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
…
l_{n-2} = l_{n-1}
delete l_{n-1}

Second clustering:

If j = arg min(l_1, l_2, l_3, …, l_{n-2}), the clustering result is

(V_1), (V_2), …, (V_i + V_{i+1}), …, (V_j + V_{j+1}), …, (V_n)

Update again:

l_{j-1} = Dis(V_{j-1}, (V_j + V_{j+1}))
l_j = Dis((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
…
l_{n-3} = l_{n-2}
delete l_{n-2}
Hierarchical clustering proceeds in the same way until the final clustering result contains 24 classes; the resulting dynamic segmentation inverse discrete cosine transform cepstrum coefficients are the sound features, which are then fed into a GMM (Gaussian mixture model) for recognition to assess the feasibility of the algorithm.
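The adjacent-merge hierarchical clustering of step S3 can be sketched as below. Representing a merged class by the mean of its original columns is an assumption: the patent writes (V_i + V_{i+1}) without fixing the class representative, and the function name is also invented for this sketch.

```python
import numpy as np

def dynamic_segment(C, target=24):
    """Merge adjacent columns of the cepstrum matrix C (m x n), always joining
    the closest adjacent pair, until `target` classes remain (step S3 sketch)."""
    cols = [C[:, j].astype(float) for j in range(C.shape[1])]
    sizes = [1] * len(cols)                       # original columns per class
    while len(cols) > target:
        # Euclidean distances l_j between each pair of adjacent classes
        d = [np.linalg.norm(cols[j] - cols[j + 1]) for j in range(len(cols) - 1)]
        i = int(np.argmin(d))                     # most similar adjacent pair
        # merge classes i and i+1 (size-weighted mean as the representative)
        merged = (sizes[i] * cols[i] + sizes[i + 1] * cols[i + 1]) / (sizes[i] + sizes[i + 1])
        cols[i], sizes[i] = merged, sizes[i] + sizes[i + 1]
        del cols[i + 1]
        del sizes[i + 1]
    return np.stack(cols, axis=1)                 # m x target feature matrix
```

Because only adjacent classes may merge, the 24 resulting classes correspond to 24 contiguous frequency segments whose boundaries adapt to the data, in contrast to the fixed bins of the prior-art method.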
The cluster analysis algorithm is a hierarchical clustering algorithm.
The similarity is computed as the Euclidean distance.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (3)
1. A sound feature extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients is characterized by comprising the following steps:
S1, preprocessing sound signals of m individuals:
pre-emphasis, framing and windowing are sequentially carried out on the sound signals of m persons;
the pre-emphasis is realized by a digital filter, and the specific process is carried out by the following formula:
Y(n)=X(n)-aX(n-1);
wherein Y(n) is the output signal after pre-emphasis, X(n) is the input sound signal, a is the pre-emphasis coefficient, and n is the time index; the framing segments the pre-emphasized output signal into frames of 20 ms each;
S2, transforming the preprocessed sound signals of the m persons from the time domain to the frequency domain:
the preprocessed sound signals of the m persons are converted to the frequency domain, i.e. from a time-domain convolution into a frequency-domain spectral multiplication; the logarithm is taken so that the resulting components can be expressed additively, yielding the inverse discrete cosine transform cepstrum coefficients of the m persons; the specific process is given by the following formula:
C(q)=IDCT log|DCT{x(k)}|;
wherein DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input sound signal, namely the preprocessed sound signals of the m persons, and C(q) is the output, namely the inverse discrete cosine transform cepstrum coefficients of the m persons;
S3, calculating the similarity between the inverse discrete cosine transform cepstrum coefficients of the m persons obtained in step S2 with a hierarchical cluster analysis algorithm, and merging in turn the two adjacent columns with the greatest similarity; the process is iterated until 24 columns remain, and the resulting dynamic segmentation inverse discrete cosine transform cepstrum coefficients are the sound features of the m persons; the specific steps are as follows:
let the matrix A denote the m×n inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2; each column vector V_1, V_2, …, V_n of the cepstrum coefficients is initially regarded as its own class, and Dis(V_i, V_j) denotes the Euclidean distance between V_i and V_j;

the specific steps of the cluster analysis are as follows:

first clustering:

l_1 = Dis(V_1, V_2)
l_2 = Dis(V_2, V_3)
…
l_{n-1} = Dis(V_{n-1}, V_n)

if i = arg min(l_1, l_2, l_3, …, l_{n-1}), the clustering result is

(V_1), (V_2), …, (V_i + V_{i+1}), …, (V_n)

update:

l_{i-1} = Dis(V_{i-1}, (V_i + V_{i+1}))
l_i = Dis((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
…
l_{n-2} = l_{n-1}
delete l_{n-1}

second clustering:

if j = arg min(l_1, l_2, l_3, …, l_{n-2}), the clustering result is

(V_1), (V_2), …, (V_i + V_{i+1}), …, (V_j + V_{j+1}), …, (V_n)

update again:

l_{j-1} = Dis(V_{j-1}, (V_j + V_{j+1}))
l_j = Dis((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
…
l_{n-3} = l_{n-2}
delete l_{n-2}
and hierarchical clustering proceeds in the same way until the final clustering result contains 24 columns; the resulting dynamic segmentation inverse discrete cosine transform cepstrum coefficients are the sound features.
2. The extraction algorithm according to claim 1, characterized in that: the windowing is hamming window windowing.
3. The extraction algorithm according to claim 1, characterized in that: the transformation is in the form of a cepstral transform.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910087494.4A CN109767756B (en) | 2019-01-29 | 2019-01-29 | Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient |
JP2019186806A JP6783001B2 (en) | 2019-01-29 | 2019-10-10 | Speech feature extraction algorithm based on dynamic division of cepstrum coefficients of inverse discrete cosine transform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109767756A CN109767756A (en) | 2019-05-17 |
CN109767756B true CN109767756B (en) | 2021-07-16 |