CN109767756B - Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient - Google Patents

Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient

Info

Publication number
CN109767756B
CN109767756B (application number CN201910087494.4A)
Authority
CN
China
Prior art keywords
discrete cosine
cosine transform
inverse discrete
sound
persons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910087494.4A
Other languages
Chinese (zh)
Other versions
CN109767756A (en
Inventor
左毅
马赫
李铁山
贺培超
刘君霞
艾佳琪
肖杨
于仁海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN201910087494.4A priority Critical patent/CN109767756B/en
Publication of CN109767756A publication Critical patent/CN109767756A/en
Priority to JP2019186806A priority patent/JP6783001B2/en
Application granted granted Critical
Publication of CN109767756B publication Critical patent/CN109767756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a sound feature extraction algorithm based on dynamically segmented inverse discrete cosine transform cepstrum coefficients, comprising the following steps: S1, pre-emphasis, framing and windowing preprocessing of the sound signal; S2, transformation of the preprocessed sound signal from the time domain to the frequency domain; S3, computing the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2 with a cluster analysis algorithm and successively merging the two adjacent classes with the greatest similarity, iterating this process until 24 classes remain; the resulting dynamically segmented inverse discrete cosine transform cepstrum coefficients are the sound features. The invention overcomes the shortcoming that the prior art does not fully exploit the dynamic characteristics of sound in the frequency-domain segmentation, so that the invention has wider applicability and achieves higher accuracy in speaker recognition.

Description

Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient
Technical Field
The invention belongs to the technical field of sound feature extraction, applies an unsupervised cluster analysis algorithm to sound feature extraction, and particularly relates to a sound feature extraction algorithm based on dynamically segmented inverse discrete cosine transform cepstrum coefficients.
Background
Speaker recognition technology comprises two parts: feature extraction and recognition modeling. Feature extraction is a key step in speaker recognition and directly affects the overall performance of the recognition system. Generally, after a speech signal is framed and windowed, a large volume of high-dimensional data is produced, so when speaker features are extracted the redundant information in the original speech must be removed to reduce the data dimensionality. In existing methods, triangular filtering is used to convert the speech signal into feature vectors that satisfy the requirements on the feature parameters; such vectors approximate the auditory perception characteristics of the human ear and can, to a certain extent, enhance the speech signal and suppress non-speech signals. Commonly used feature parameters include: linear prediction coefficients (LPC), obtained by simulating the human phonation mechanism and analysing a model of the vocal tract as a cascade of short tubes; perceptual linear prediction coefficients, which apply an auditory model to spectral analysis, processing the input speech through a model of human hearing instead of the time-domain all-pole prediction polynomial used by LPC; Tandem and Bottleneck features, two types of features extracted with neural networks; filter-bank (Fbank) features, which are equivalent to MFCC with the final discrete cosine transform removed and therefore retain more of the original speech information than MFCC; linear prediction cepstrum coefficients, important feature parameters that, based on the vocal tract model, discard the excitation information of the signal generation process and represent the formant characteristics with a dozen or so cepstrum coefficients; and Mel-frequency cepstrum coefficients, whose extraction first preprocesses the speech by framing, windowing and the fast Fourier transform, then filters the energy spectrum with a bank of Mel-scale triangular filters, computes the logarithmic energy output by each filter, applies a discrete cosine transform (DCT) to obtain the Mel-scale cepstrum parameters, and finally extracts the dynamic difference parameters, giving the Mel cepstrum coefficients. In 2012, S. Al-Rawahya et al., referring to the MFCC feature extraction method, performed equal frequency-domain segmentation of the DCT cepstrum coefficients obtained after speech preprocessing and proposed the Histogram DCT cepstrum coefficient method. It is found that such equal frequency-domain segmentation of the cepstrum coefficients ignores the dynamic characteristics of the sound data; the invention therefore proposes a new sound feature extraction algorithm based on dynamically segmented inverse discrete cosine transform cepstrum coefficients, combining unsupervised learning and using hierarchical clustering to cluster the sound data according to the similarity of its dynamic characteristics, thereby extracting dynamic feature vectors that better describe the sound characteristics.
In existing research, one of the most widely used speaker recognition approaches uses MFCC as the voice feature vector and performs speaker pattern matching in combination with machine learning methods such as the Gaussian mixture model (GMM), hidden Markov model (HMM) and support vector machine (SVM). The MFCC extraction process is as follows: first, the speech is preprocessed by pre-emphasis, framing, windowing and the fast Fourier transform; the energy spectrum is then filtered with a bank of Mel-scale triangular filters; the logarithmic energy output by each filter is computed and passed through a discrete cosine transform (DCT) to obtain the Mel-scale cepstrum parameters, from which the dynamic difference parameters are extracted, yielding the Mel cepstrum coefficients (MFCC).
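For reference, the conventional MFCC pipeline described above can be sketched as follows. This is an illustrative sketch only: librosa is assumed as one possible toolkit, and the frame length, hop length and number of coefficients are assumed values rather than parameters taken from this patent.

```python
# Illustrative sketch of the conventional MFCC pipeline (prior art), assuming librosa.
import librosa

def extract_mfcc(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=None)       # load waveform at its native sampling rate
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr),                    # ~25 ms analysis window (assumed)
        hop_length=int(0.010 * sr),               # ~10 ms hop (assumed)
    )                                             # shape: (n_mfcc, n_frames)
```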
Al-Rawahya et al discovered DCT Cepstrum as a new feature in 2012, and the equal frequency domain DCT Cepstrum coefficient-based acoustic feature extraction algorithm proposed by the same is provided. And (2) converting the preprocessed sound signals into frequency domains, namely converting the preprocessed sound signals from time domain convolution into a frequency domain spectrum multiplication form, taking logarithms of the sound signals, and expressing obtained components in an addition form to obtain discrete cosine transform Cepstrum coefficients (DCT Cepstrum coefficients). The DCT cepstral coefficients record the periodicity of the frequency range in non-linear increments, dividing the frequency domain feature interval every 50Hz between 0Hz and 600Hz, and dividing the frequency domain feature interval every 100Hz between 600Hz and 1000Hz, which process can be viewed as a count of the number of frequency range cycles in a given speech signal. Compared with the MFCC feature extraction method, the method is simpler and faster.
Disclosure of Invention
The main purpose of the invention is to provide a sound feature extraction algorithm based on dynamically segmented inverse discrete cosine transform cepstrum coefficients, addressing the inaccurate segmentation frequencies of the sound feature extraction algorithm based on equal frequency-domain segmentation of the inverse discrete cosine transform cepstrum coefficients. The technical means adopted by the invention are as follows:
a sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients comprises the following steps:
S1, preprocessing the sound signals:
pre-emphasis, framing and windowing are sequentially carried out on the sound signals;
Preprocessing eliminates the effects on sound-signal quality of aliasing, higher-harmonic distortion, high-frequency components and other factors introduced by the human vocal organs and by the equipment that captures the sound signal, so that the signal obtained in subsequent processing is more uniform and smooth, providing high-quality parameters for sound feature extraction and improving the quality of the subsequent processing.
S2, performing transformation form processing from time domain to frequency domain on the preprocessed sound signals:
The preprocessed sound signal is transformed to the frequency domain, i.e. the time-domain convolution is converted into a frequency-domain spectral multiplication; the logarithm is then taken so that the resulting components can be expressed as a sum, giving the inverse discrete cosine transform cepstrum coefficients (IDCT Cepstrum coefficients). The specific process is given by the following formula:
C(q) = IDCT(log|DCT{x(k)}|);
where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input sound signal, i.e. the preprocessed sound signal, and C(q) is the output, i.e. the inverse discrete cosine transform cepstrum coefficients.
The inverse discrete cosine transform cepstrum coefficients form a data matrix; because of the inherent frequency ordering of sound, all columns are of the same attribute type, so hierarchical clustering is carried out sequentially by computing the similarity of adjacent columns.
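As a minimal sketch (not part of the patent text) of the per-frame computation C(q) = IDCT(log|DCT{x(k)}|), the following assumes SciPy's orthonormal DCT-II/IDCT pair and a small epsilon added before the logarithm to avoid log(0):

```python
# Minimal per-frame IDCT cepstrum sketch; the DCT type and epsilon are assumptions.
import numpy as np
from scipy.fft import dct, idct

def idct_cepstrum(frame, eps=1e-10):
    spectrum = dct(frame, type=2, norm='ortho')   # time domain -> DCT spectrum, DCT{x(k)}
    log_mag = np.log(np.abs(spectrum) + eps)      # log|DCT{x(k)}|
    return idct(log_mag, type=2, norm='ortho')    # C(q), the IDCT cepstrum coefficients
```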
S3, calculating the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2 with a cluster analysis algorithm, and successively merging the two adjacent classes with the greatest similarity; this process is iterated until 24 classes remain, and the resulting dynamically segmented inverse discrete cosine transform cepstrum coefficients (DD-IDCT Cepstrum coefficients) are the sound features.
The pre-emphasis is realized by a digital filter, and the specific process is carried out by the following formula:
Y(n) = X(n) - aX(n-1);
where Y(n) is the pre-emphasized output signal, X(n) is the input sound signal, a is the pre-emphasis coefficient, and n is the time index.
The average power spectrum of the sound signal is affected by glottal excitation and oral-nasal radiation: above roughly 800 Hz the high-frequency end falls off at about 6 dB/oct (octave), so the higher the frequency the smaller the corresponding component; the high-frequency part of the sound signal is therefore boosted before analysis.
Sound analysis is throughout a "short-time analysis" technique. A sound signal is time-varying, but within a short interval (generally 10-30 ms) its characteristics remain essentially unchanged, i.e. relatively stable, so the signal can be regarded as a quasi-steady-state process; in other words, sound signals possess short-time stationarity. Any analysis and processing of a sound signal must therefore be built on "short-time" segments, i.e. a "short-time analysis" is performed: the signal is divided into segments, each called a "frame" and generally 10-30 ms long, whose characteristic parameters are analysed. For the whole sound signal, the analysis then operates on the time sequence of characteristic parameters formed by the parameters of each frame.
The framing segments the pre-emphasized output signal into frames of 20 ms each.
Windowing is then applied to the framed sound signal; its purpose can be regarded as making the signal more globally continuous, avoiding the Gibbs effect, and giving the originally aperiodic sound signal some of the properties of a periodic function. The windowing uses a Hamming window.
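A minimal preprocessing sketch consistent with the description above; the 16 kHz sampling rate, the pre-emphasis coefficient a = 0.97 (the value used in the embodiment below) and the use of non-overlapping frames are assumptions made only for illustration:

```python
# Preprocessing sketch: pre-emphasis, 20 ms framing, Hamming windowing.
import numpy as np

def preprocess(x, sr=16000, a=0.97, frame_ms=20):
    y = np.append(x[0], x[1:] - a * x[:-1])        # pre-emphasis Y(n) = X(n) - a*X(n-1)
    frame_len = int(sr * frame_ms / 1000)          # 20 ms frame length in samples
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    return frames * np.hamming(frame_len)          # Hamming window applied to every frame
```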
The transformation is in the form of a cepstral transform.
The cluster analysis algorithm is a hierarchical clustering algorithm.
The similarity calculation is the Euclidean distance.
Compared with the prior art, the invention has the following advantages:
Firstly, by analysing in depth the nature of the sound feature extraction algorithm based on equal frequency-domain segmentation of DCT Cepstrum coefficients, the invention remedies the shortcoming that the prior art does not fully exploit the dynamic characteristics of sound in the frequency-domain segmentation, so that the invention has wider applicability and achieves higher recognition accuracy in speaker recognition.
Secondly, unsupervised cluster analysis is applied to sound feature extraction, giving a process that is simple, fast and light on computing resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a sound feature extraction algorithm based on a dynamic segmentation inverse discrete cosine transform cepstrum coefficient in an embodiment of the present invention.
FIG. 2 is a diagram of a cluster analysis tree in accordance with an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a sound feature extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients has the following steps:
S1, preprocessing the sound signals:
pre-emphasis, framing and windowing are sequentially carried out on the sound signals;
the pre-emphasis is realized by a digital filter, and the specific process is carried out by the following formula:
Y(n) = X(n) - aX(n-1);
where Y(n) is the pre-emphasized output signal, X(n) is the input sound signal, a is the pre-emphasis coefficient with a value of 0.97, and n is the time index.
The framing segments the pre-emphasized output signal into frames of 20 ms each.
The windowing uses a Hamming window.
S2, performing transformation form processing from time domain to frequency domain on the preprocessed sound signals:
The preprocessed sound signal is transformed to the frequency domain, i.e. the time-domain convolution is converted into a frequency-domain spectral multiplication; the logarithm is then taken so that the resulting components can be expressed as a sum, giving the inverse discrete cosine transform cepstrum coefficients (IDCT Cepstrum coefficients). The specific process is given by the following formula:
C(q) = IDCT(log|DCT{x(k)}|);
where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input sound signal, i.e. the preprocessed sound signal, and C(q) is the output, i.e. the inverse discrete cosine transform cepstrum coefficients; the transformation takes the form of a cepstral transform.
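For illustration only, the coefficient matrix consumed by step S3 can be assembled by stacking the per-frame cepstrum vectors; the orientation (rows as frames, columns as coefficient dimensions) and the reuse of the preprocess() and idct_cepstrum() sketches given earlier are assumptions, not part of the patent text:

```python
# Sketch: build the coefficient matrix A from a raw signal.
import numpy as np

def build_coefficient_matrix(signal, sr=16000):
    frames = preprocess(signal, sr=sr)                  # (n_frames, frame_len), see the S1 sketch
    A = np.vstack([idct_cepstrum(f) for f in frames])   # one IDCT cepstrum vector per row
    return A                                            # shape: (n_frames, n)
```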
S3, calculating the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2 with a cluster analysis algorithm, and successively merging the two adjacent classes with the greatest similarity; this process is iterated until 24 classes remain, and the resulting dynamically segmented inverse discrete cosine transform cepstrum coefficients are the sound features. The specific steps are as follows:
Let the matrix A denote the m x n inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2; as shown in FIG. 2, each dimension of the coefficients is treated as a column vector, giving n vectors V_1, V_2, ..., V_n, and the Euclidean distance between V_i and V_j is defined as

Dis(V_i, V_j) = sqrt( sum_k ( V_i(k) - V_j(k) )^2 )
The specific steps of cluster analysis are as follows:
First clustering:

A = [V_1, V_2, ..., V_n]

l_1 = Dis(V_1, V_2)
l_2 = Dis(V_2, V_3)
...
l_{n-1} = Dis(V_{n-1}, V_n)

If i = arg min(l_1, l_2, l_3, ..., l_{n-1}), the clustering result is
(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_n), i.e.

A = [V_1, V_2, ..., (V_i + V_{i+1}), ..., V_n]

Update:
l_{i-1} = Dis(V_{i-1}, (V_i + V_{i+1}))
l_i = Dis((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
...
l_{n-2} = l_{n-1}
delete l_{n-1}.

Second clustering:
If j = arg min(l_1, l_2, l_3, ..., l_{n-2}), the clustering result is
(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., (V_n), i.e.

A = [V_1, V_2, ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., V_n]

Update again:
l_{j-1} = Dis(V_{j-1}, (V_j + V_{j+1}))
l_j = Dis((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
...
l_{n-3} = l_{n-2}
delete l_{n-2}.
Hierarchical clustering continues in the same way until the final clustering result is 24 classes; the dynamically segmented inverse discrete cosine transform cepstrum coefficients so obtained are the sound features, which are then fed into a GMM (Gaussian mixture model) for recognition to assess the feasibility of the algorithm.
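A minimal sketch of this dynamic segmentation step, assuming NumPy and assuming that merging V_i and V_{i+1} means summing the two columns, as the notation above suggests:

```python
# Sketch: merge the most similar adjacent columns of A until 24 columns remain.
import numpy as np

def dynamic_segmentation(A, n_classes=24):
    cols = [A[:, j].astype(float) for j in range(A.shape[1])]
    while len(cols) > n_classes:
        # Euclidean distance between every pair of adjacent columns
        dists = [np.linalg.norm(cols[j] - cols[j + 1]) for j in range(len(cols) - 1)]
        i = int(np.argmin(dists))            # most similar adjacent pair
        cols[i] = cols[i] + cols[i + 1]      # merge V_i and V_{i+1}
        del cols[i + 1]                      # drop the absorbed column
    return np.column_stack(cols)             # shape: (m, 24)
```

With an input matrix of shape (m, n) the result has shape (m, 24), corresponding to the dynamically segmented features that are then passed to the GMM.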
The cluster analysis algorithm is a hierarchical clustering algorithm.
The similarity calculation is the Euclidean distance.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (3)

1. A sound feature extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficients is characterized by comprising the following steps:
S1, preprocessing the sound signals of m persons:
pre-emphasis, framing and windowing are sequentially carried out on the sound signals of m persons;
the pre-emphasis is realized by a digital filter, and the specific process is carried out by the following formula:
Y(n) = X(n) - aX(n-1);
wherein Y(n) is the pre-emphasized output signal, X(n) is the input sound signal, a is the pre-emphasis coefficient, and n is the time index; the framing segments the pre-emphasized output signal into frames of 20 ms each;
S2, transforming the preprocessed sound signals of the m persons from the time domain to the frequency domain:
the preprocessed sound signals of the m persons are transformed to the frequency domain, i.e. converted from a time-domain convolution into a frequency-domain spectral multiplication; the logarithm is then taken so that the resulting components can be expressed as a sum, giving the inverse discrete cosine transform cepstrum coefficients of the m persons; the specific process is given by the following formula
C(q) = IDCT(log|DCT{x(k)}|);
wherein DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input sound signal, i.e. the preprocessed sound signals of the m persons, and C(q) is the output, i.e. the inverse discrete cosine transform cepstrum coefficients of the m persons;
S3, calculating the similarity between the inverse discrete cosine transform cepstrum coefficients of the m persons obtained in step S2 with a hierarchical cluster analysis algorithm, and successively merging the two adjacent columns with the greatest similarity; this process is iterated until 24 columns remain, and the resulting dynamically segmented inverse discrete cosine transform cepstrum coefficients are the sound features of the m persons; the specific steps are as follows:
Let the matrix A denote the m-person, n-dimensional inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2, and treat each dimension of the coefficients as a column vector, giving n vectors V_1, V_2, ..., V_n; the Euclidean distance between V_i and V_j is defined as

Dis(V_i, V_j) = sqrt( sum_k ( V_i(k) - V_j(k) )^2 )
The specific steps of cluster analysis are as follows:
First clustering:

A = [V_1, V_2, ..., V_n]

l_1 = Dis(V_1, V_2)
l_2 = Dis(V_2, V_3)
...
l_{n-1} = Dis(V_{n-1}, V_n)

If i = arg min(l_1, l_2, l_3, ..., l_{n-1}), the clustering result is
(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_n), i.e.

A = [V_1, V_2, ..., (V_i + V_{i+1}), ..., V_n]

Update:
l_{i-1} = Dis(V_{i-1}, (V_i + V_{i+1}))
l_i = Dis((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
...
l_{n-2} = l_{n-1}
delete l_{n-1}.

Second clustering:
If j = arg min(l_1, l_2, l_3, ..., l_{n-2}), the clustering result is
(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., (V_n), i.e.

A = [V_1, V_2, ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., V_n]

Update again:
l_{j-1} = Dis(V_{j-1}, (V_j + V_{j+1}))
l_j = Dis((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
...
l_{n-3} = l_{n-2}
delete l_{n-2}.
Hierarchical clustering continues in the same way until the final clustering result is 24 columns, and the dynamically segmented inverse discrete cosine transform cepstrum coefficients so obtained are the sound features.
2. The extraction algorithm according to claim 1, characterized in that: the windowing is Hamming-window windowing.
3. The extraction algorithm according to claim 1, characterized in that: the transformation is in the form of a cepstral transform.
CN201910087494.4A 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient Active CN109767756B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910087494.4A CN109767756B (en) 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient
JP2019186806A JP6783001B2 (en) 2019-01-29 2019-10-10 Speech feature extraction algorithm based on dynamic division of cepstrum coefficients of inverse discrete cosine transform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910087494.4A CN109767756B (en) 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient

Publications (2)

Publication Number Publication Date
CN109767756A CN109767756A (en) 2019-05-17
CN109767756B (en) 2021-07-16

Family

ID=66455625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910087494.4A Active CN109767756B (en) 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient

Country Status (2)

Country Link
JP (1) JP6783001B2 (en)
CN (1) CN109767756B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197657B (en) * 2019-05-22 2022-03-11 大连海事大学 Dynamic sound feature extraction method based on cosine similarity
CN110299134B (en) * 2019-07-01 2021-10-26 中科软科技股份有限公司 Audio processing method and system
CN110488675A (en) * 2019-07-12 2019-11-22 国网上海市电力公司 A kind of substation's Abstraction of Sound Signal Characteristics based on dynamic time warpping algorithm
CN112180762B (en) * 2020-09-29 2021-10-29 瑞声新能源发展(常州)有限公司科教城分公司 Nonlinear signal system construction method, apparatus, device and medium
CN112581939A (en) * 2020-12-06 2021-03-30 中国南方电网有限责任公司 Intelligent voice analysis method applied to power dispatching normative evaluation
CN112669874B (en) * 2020-12-16 2023-08-15 西安电子科技大学 Speech feature extraction method based on quantum Fourier transform
CN113449626B (en) * 2021-06-23 2023-11-07 中国科学院上海高等研究院 Method and device for analyzing vibration signal of hidden Markov model, storage medium and terminal
CN113793614B (en) * 2021-08-24 2024-02-09 南昌大学 Speech feature fusion speaker recognition method based on independent vector analysis
CN114783462A (en) * 2022-05-11 2022-07-22 安徽理工大学 Mine hoist fault source positioning analysis method based on CS-MUSIC

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458950B (en) * 2007-12-14 2011-09-14 安凯(广州)微电子技术有限公司 Method for eliminating interference from A/D converter noise to digital recording
US9606530B2 (en) * 2013-05-17 2017-03-28 International Business Machines Corporation Decision support system for order prioritization
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN107293308B (en) * 2016-04-01 2019-06-07 腾讯科技(深圳)有限公司 A kind of audio-frequency processing method and device
CN109065071B (en) * 2018-08-31 2021-05-14 电子科技大学 Song clustering method based on iterative k-means algorithm
CN109256127B (en) * 2018-11-15 2021-02-19 江南大学 Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter

Also Published As

Publication number Publication date
CN109767756A (en) 2019-05-17
JP6783001B2 (en) 2020-11-11
JP2020140193A (en) 2020-09-03

Similar Documents

Publication Publication Date Title
CN109767756B (en) Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient
CN109147796B (en) Speech recognition method, device, computer equipment and computer readable storage medium
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
CN110931023B (en) Gender identification method, system, mobile terminal and storage medium
WO2023001128A1 (en) Audio data processing method, apparatus and device
CN114495969A (en) Voice recognition method integrating voice enhancement
CN105845126A (en) Method for automatic English subtitle filling of English audio image data
Gamit et al. Isolated words recognition using mfcc lpc and neural network
Goyani et al. Performance analysis of lip synchronization using LPC, MFCC and PLP speech parameters
Nawas et al. Speaker recognition using random forest
KR100897555B1 (en) Apparatus and method of extracting speech feature vectors and speech recognition system and method employing the same
Dave et al. Speech recognition: A review
CN113744715A (en) Vocoder speech synthesis method, device, computer equipment and storage medium
Luo et al. Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform.
CN111785262A (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Chavan et al. Speech recognition in noisy environment, issues and challenges: A review
Makhijani et al. Speech enhancement using pitch detection approach for noisy environment
Akhter et al. An analysis of performance evaluation metrics for voice conversion models
Hizlisoy et al. Text independent speaker recognition based on MFCC and machine learning
Tzudir et al. Low-resource dialect identification in Ao using noise robust mean Hilbert envelope coefficients
CN114298019A (en) Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product
Swathy et al. Review on feature extraction and classification techniques in speaker recognition
Maged et al. Improving speaker identification system using discrete wavelet transform and AWGN
CN114512133A (en) Sound object recognition method, sound object recognition device, server and storage medium
Lalitha et al. An encapsulation of vital non-linear frequency features for various speech applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant