CN112951245B - Dynamic voiceprint feature extraction method integrated with static component - Google Patents

Dynamic voiceprint feature extraction method integrated with static component

Info

Publication number
CN112951245B
CN112951245B · CN202110257723.XA
Authority
CN
China
Prior art keywords
voice data
dynamic
mfcc
target voice
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110257723.XA
Other languages
Chinese (zh)
Other versions
CN112951245A (en)
Inventor
刘涛
刘斌
黄金国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Open University of Jiangsu City Vocational College
Original Assignee
Jiangsu Open University of Jiangsu City Vocational College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Open University of Jiangsu City Vocational College filed Critical Jiangsu Open University of Jiangsu City Vocational College
Priority to CN202110257723.XA
Publication of CN112951245A
Application granted
Publication of CN112951245B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Complex Calculations (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a dynamic voiceprint feature extraction method that incorporates a static component. Target voice data is preprocessed to obtain preprocessed target voice data, and the preprocessed speech is processed with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data. The MFCC coefficients of the target voice data are substituted into a dynamic voiceprint feature extraction model that incorporates the static component to obtain the MFCC dynamic feature differential parameter matrix of the target voice data, and this matrix is defined as the dynamic voiceprint feature of the target voice data. When extracting voiceprint features from voice data, the method preserves sound continuity while reducing the average equal error rate and improving the recognition rate.

Description

Dynamic voiceprint feature extraction method integrated with static component
Technical Field
The invention relates to the technical field of artificial intelligent voiceprint recognition, in particular to a dynamic voiceprint feature extraction method integrated with static components.
Background
At present, smart homes are more and more widely used in people's daily life and work. Smart home systems adopt technologies such as wireless communication, image processing, and speech processing; a smart home system based on voice interaction is more convenient to use, has a wider information acquisition space, and offers a friendlier user experience.
Voiceprint recognition has developed rapidly in recent years, and in some settings its recognition rate already meets people's basic security requirements; it also has the advantages of being economical and convenient, so it has very broad application prospects. Suppressing external noise as much as possible and extracting speech features that are as pure as possible from the collected signal is a prerequisite for putting the various speech processing techniques into practical use.
With the rapid improvement in people's quality of life, the public's requirements for smart home systems are no longer limited to executing standard, common control functions; people expect greater intelligence, convenience, safety, and comfort across the whole home. Adding a voiceprint recognition function to the smart home system, and adopting speech enhancement to improve the system's stability in noisy environments, can further improve the human-computer interaction experience and the efficiency with which users operate the smart home. The system can also establish a permission hierarchy for smart home control and operation, providing different service functions to users with different permission levels, further improving the overall safety and practicality of the system. However, the speech recognition and speech feature extraction methods of the prior art suffer from a high average error rate and a low recognition rate.
Therefore, in order to further reduce the average error rate and improve the recognition rate, the invention provides a dynamic voiceprint feature extraction method which is integrated with static components.
Disclosure of Invention
The purpose of the invention is to provide a dynamic voiceprint feature extraction method with a low average error rate and a high recognition rate.
The technical scheme is as follows: the invention provides a dynamic voiceprint feature extraction method integrated with static components, which is used for extracting voiceprint features of target voice data and is characterized by comprising the following steps:
step 1: preprocessing target voice data to obtain preprocessed target voice data;
step 2: processing the preprocessed target voice by using Fourier transform and Mel filter bank to obtain MFCC coefficients of target voice data;
step 3: and carrying the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model integrated with the static component, obtaining an MFCC dynamic feature differential parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint feature of the target voice data.
As a preferred embodiment of the present invention, in step 1, a method for preprocessing target voice data includes: dividing target voice data into T frames to obtain multi-frame voice data;
in step 2, the method for processing the preprocessed target speech using a fourier transform and a Mel filter bank comprises the steps of:
processing each frame of voice data by using Fourier transformation to obtain the frequency spectrum of each frame of voice data;
the frequency spectrum of each frame of voice data is input into a Mel filter bank, and the MFCC coefficient of each frame of voice data, namely the MFCC coefficient of target voice data, is obtained.
As a preferred solution of the present invention, in step 3, the dynamic voiceprint feature extraction model integrated with the static component is:
d(l,t) = \frac{1}{2}\,C(l,t) + \frac{\sum_{k=1}^{K} k\,\big[C(l,t+k) - C(l,t-k)\big]}{4\sum_{k=1}^{K} k^{2}}
wherein d(l,t) is the l-th order dynamic voiceprint feature extraction result of the t-th frame of voice data, and d(l,t) forms the l-th order, t-th element of the MFCC dynamic feature differential parameter matrix of the target voice data; C(l,t) is the l-th order, t-th parameter of the MFCC coefficients, C(l,t+1) is the l-th order, (t+1)-th parameter, C(l,t+k) is the l-th order, (t+k)-th parameter, and C(l,t-k) is the l-th order, (t-k)-th parameter; k is the frequency ordinal after the Fourier transform of the t-th frame of voice data, and K is the preset total step length of the Fourier transform of the t-th frame of voice data.
As a preferred embodiment of the present invention, the following formula is used:
C(l,t) = \sum_{m=1}^{M} S(m)\,\cos\!\left(\frac{\pi\,l\,(m-0.5)}{M}\right), \qquad l = 1,2,\dots,L
acquiring a first-order characteristic coefficient C (l, t) of the t-th frame voice data in the MFCC coefficient;
where L is the order of the MFCC coefficients, m is the number of the Mel filter bank, and S (m) is the logarithmic energy of the mth Mel filter bank output.
As a preferred embodiment of the present invention, the following formula is used:
S(m) = \ln\!\left(\sum_{k=1}^{N} \lvert X(k)\rvert^{2}\, H_{m}(k)\right), \qquad 1 \le m \le M
obtaining logarithmic energy S (m) output by an mth Mel filter bank;
wherein M represents the total number of filter banks, N represents the data length of the t-th frame voice data, X (k) represents the power corresponding to the k-th frequency, H m (k) Representing the transfer function of the mth Mel filter bank corresponding to the kth frequency.
The beneficial effects are as follows: compared with the prior art, the dynamic voiceprint feature extraction method incorporating a static component extracts voiceprint features based on a dynamic voiceprint feature extraction model fused with the static component, thereby reducing the average equal error rate and improving the recognition rate while preserving sound continuity.
Drawings
FIG. 1 is a flow chart of a dynamic voiceprint feature extraction method provided in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of the equal error rate as a function of the ratio of dynamic to static features, provided in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the variation of the equal error rate with the static feature coefficient, according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Referring to fig. 1, the method for extracting dynamic voiceprint features incorporated into static components provided by the present invention includes the following steps:
step 1: preprocessing target voice data to obtain preprocessed target voice data.
The method for preprocessing the target voice data comprises the following steps: dividing target voice data into T frames to obtain multi-frame voice data;
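For illustration, a minimal framing sketch in Python/NumPy follows. The 512-sample (32 ms at 16 kHz) frame length, the Hamming window, and the half-frame shift are assumptions for the example; the embodiment itself only specifies a 16 kHz sampling rate and a frame shift of half the frame length.

import numpy as np

def frame_signal(x, frame_len=512, frame_shift=256):
    """Split a 1-D speech signal x (len(x) >= frame_len) into T overlapping frames."""
    num_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([x[t * frame_shift: t * frame_shift + frame_len]
                       for t in range(num_frames)])
    # Windowing (Hamming here) is an assumed, conventional preprocessing choice.
    return frames * np.hamming(frame_len)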
step 2: and processing the preprocessed target voice by using Fourier transformation and a Mel filter bank to acquire MFCC coefficients of the target voice data.
The method for processing the preprocessed target speech using a fourier transform and a Mel-filter bank comprises the steps of:
processing each frame of voice data by using Fourier transformation to obtain the frequency spectrum of each frame of voice data;
the frequency spectrum of each frame of voice data is input into a Mel filter bank, and the MFCC coefficient of each frame of voice data, namely the MFCC coefficient of target voice data, is obtained.
The method of the step 1 and the step 2 specifically comprises the following steps:
the extraction of Mel-frequency cepstrum coefficients (MFCCs) is performed on data that has been subjected to speech preprocessing, and desired characteristic coefficients are obtained by performing operations such as fourier transform, mel (Mel) filter filtering, and the like on the data.
(1) Performing Fourier transform on each frame of data after voice pretreatment to obtain a corresponding frequency spectrum and obtaining a power spectrum |X (j) |of each frame 2 The formula of X (j) is as follows:
Figure BDA0002968221760000031
wherein, N is the length of each frame, J is the length of the fast Fourier transform, i.e. the total frame number, J is the value of 1-J, which represents the J frame, and x (N) is the voice data in the N frame.
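A short sketch of this step under the definitions above, using NumPy's one-sided FFT as a stand-in for the J-point transform:

import numpy as np

def power_spectrum(frames, J=512):
    """Per-frame power spectrum |X(j)|^2 on a J-point FFT grid (T x (J//2 + 1))."""
    X = np.fft.rfft(frames, n=J)  # one-sided spectrum of each frame
    return np.abs(X) ** 2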
(2) Design a Mel filter bank and use it to filter the power spectrum of the signal. Apply a logarithmic operation and convert the frequency scale to the Mel scale. The center frequency f(m) of the m-th filter in the bank satisfies:
Mel(f(m+1))-Mel(f(m))=Mel(f(m))-Mel(f(m-1))
where m is the index of the filter in the filter bank, and Mel(f(m)) denotes the conversion of the frequency f(m) to the Mel scale.
The transfer function H_m(f) of each band-pass filter in the Mel filter bank is:
H_{m}(f) = \begin{cases} 0, & f < f(m-1) \\ \dfrac{f - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le f \le f(m) \\ \dfrac{f(m+1) - f}{f(m+1) - f(m)}, & f(m) < f \le f(m+1) \\ 0, & f > f(m+1) \end{cases}
Where f is the frequency.
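A minimal sketch of a triangular Mel filter bank built from these definitions; the conversion Mel(f) = 2595 * log10(1 + f/700) is the standard formula and an assumption here, since the patent does not spell it out:

import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(M=24, J=512, fs=16000):
    """M triangular filters H_m(k) sampled on the one-sided FFT grid (M x (J//2 + 1))."""
    # Centre frequencies equally spaced on the Mel scale, which realizes
    # Mel(f(m+1)) - Mel(f(m)) = Mel(f(m)) - Mel(f(m-1)).
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), M + 2)
    bins = np.floor((J + 1) * mel_to_hz(mels) / fs).astype(int)
    H = np.zeros((M, J // 2 + 1))
    for m in range(1, M + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            H[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            H[m - 1, k] = (hi - k) / max(hi - c, 1)
    return H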
After the voice data is processed by the Mel filters, the logarithmic energy S(m) output by each filter is obtained:
S(m) = \ln\!\left(\sum_{k=1}^{N} \lvert X(k)\rvert^{2}\, H_{m}(k)\right), \qquad 1 \le m \le M
where m is the index of the filter, M is the total number of filters in the filter bank, generally 22 to 26 (M = 24 is taken here), |X(k)|² is the power at the k-th frequency point, and H_m(k) is the transfer function of the m-th filter at that frequency point.
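A one-line sketch of this step for the power spectrum P of a single frame; the 1e-12 floor is an assumed numerical guard, not part of the patent:

import numpy as np

def log_filter_bank_energy(P, H):
    """S(m) = ln(sum_k |X(k)|^2 * H_m(k)), for one frame's power spectrum P."""
    return np.log(np.maximum(H @ P, 1e-12))  # floor avoids log(0) on silent frames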
(3) Perform a discrete cosine transform on the logarithmic Mel power spectrum of each frame to decorrelate the power-spectrum energies, eliminating the correlation between the signal dimensions and mapping the signal to a lower-dimensional space, which yields the corresponding MFCC coefficients C(l):
C(l) = \sum_{m=1}^{M} S(m)\,\cos\!\left(\frac{\pi\,l\,(m-0.5)}{M}\right), \qquad l = 1,2,\dots,L
where L is the total order of the MFCC coefficients, typically 12 to 18; the invention takes L = 15. l takes values from 1 to L and indexes the l-th order MFCC coefficient.
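A sketch of the discrete cosine transform step under these definitions, with L = 15 as in the embodiment:

import numpy as np

def mfcc_from_log_energy(S, L=15):
    """C(l) = sum_m S(m) * cos(pi * l * (m - 0.5) / M) for l = 1..L, M = len(S)."""
    M = len(S)
    m = np.arange(1, M + 1)
    return np.array([np.sum(S * np.cos(np.pi * l * (m - 0.5) / M))
                     for l in range(1, L + 1)])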
Step 3: and carrying the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model integrated with the static component, obtaining an MFCC dynamic feature differential parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint feature of the target voice data.
In step 3, a dynamic voiceprint feature extraction model incorporating static components is constructed according to the following method:
the dynamic feature extraction is essentially a MFCC coefficient differential mode, i.e. the parameters of the t-1 th frame and the t+1 th frame are used for downsampling when calculating the MFCC coefficient differential parameters of the t-th frame. Therefore, the classical dynamic feature extraction formula is as follows:
Figure BDA0002968221760000051
wherein J represents the length of the fast Fourier transform, usually 1 or 2 is taken, and represents a first-order MFCC coefficient differential parameter and a second-order MFCC coefficient differential parameter, and J is the value of J (J is more than or equal to 1 and less than or equal to J); l is the mel cepstrum coefficient order, T is the frame number, T is the total frame number of a section of audio, C (l, T) is the first order T parameter of the mel cepstrum coefficient matrix of the voice signal, and d (l, T) is the MFCC dynamic characteristic parameter.
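A sketch of the classical differential under these definitions; replicating the edge frames so that t-j and t+j stay in range is an assumed boundary treatment:

import numpy as np

def classical_delta(C, J=2):
    """d(l,t) = sum_j j * (C(l,t+j) - C(l,t-j)) / (2 * sum_j j^2); C is L x T."""
    L, T = C.shape
    Cp = np.pad(C, ((0, 0), (J, J)), mode="edge")  # replicate edge frames
    d = np.zeros_like(C, dtype=float)
    for j in range(1, J + 1):
        d += j * (Cp[:, J + j: J + j + T] - Cp[:, J - j: J - j + T])
    return d / (2.0 * sum(j * j for j in range(1, J + 1)))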
The novel dynamic voiceprint feature Mel-frequency cepstral coefficient formula proposed by the invention is:
\mathrm{MFCC}_{\mathrm{new}} = \alpha\,\mathrm{MFCC} + \beta\,\Delta\mathrm{MFCC}
the modification is as follows:
\mathrm{MFCC}_{\mathrm{new}} = \alpha\left(\mathrm{MFCC} + \delta\,\Delta\mathrm{MFCC}\right), \qquad \delta = \beta/\alpha
where MFCC_new is the dynamic voiceprint feature proposed by the invention, MFCC is the static voiceprint feature, ΔMFCC is the classical dynamic voiceprint feature (i.e., the differential dynamic parameter), α is the static feature coefficient, β is the dynamic feature coefficient, and δ is the ratio of the dynamic feature coefficient to the static feature coefficient.
The values of α and δ are determined according to the following method:
assuming α=1, the optimal value of the ratio δ of dynamic coefficient to static coefficient is determined experimentally.
The number of Gaussian components in the experiment was set to 64, and speech data from 100 speakers (50 female, 50 male) in the TIMIT corpus was selected as the experimental speech data. Speech data from 60 speakers was used as training data for UBM training: 10 utterances from each speaker were combined into 10 seconds of speech for training the UBM. After the UBM parameters were obtained and saved, 5 utterances from each of the remaining 40 speakers were combined into 10 seconds of speech data to train the GMM of each specific speaker, and the resulting model parameters were saved. Finally, the remaining speech data of the 40 speakers was cyclically assembled into 10 segments of 5-second speech data for matching tests of the system. The complete test procedure comprises 400 speaker-acceptance trials and 15600 speaker-rejection trials, and the equal error rate is obtained as the output of one experiment.
For the voiceprint features obtained from the speech data, each test utterance generates multiple frames of speech. The MFCC order is set to 15, so one frame of speech data yields 15 MFCC coefficients and, after the differential calculation, 15 dynamic feature coefficients; combined, each speech frame yields 30 coefficients. The sampling frequency in the experiment is 16 kHz, and the frame shift is 1/2 of the frame length.
According to the above experimental conditions, δ takes 5 different values and 5 sets of experiments are performed; the resulting average equal error rates are shown in Table 1:
TABLE 1
[Table 1: the average equal error rate measured for each of the five tested values of the dynamic-to-static feature ratio δ]
From the data shown in Table 1, the curve of the average equal error rate against the dynamic-to-static feature ratio δ is obtained, as shown in FIG. 2.
As can be seen from FIG. 2, the average equal error rate is lowest when δ = 1, so the optimal value of the dynamic-to-static feature ratio δ is 1.
Accordingly, the dynamic voiceprint feature Mel-frequency cepstral coefficient formula proposed by the invention becomes:
\mathrm{MFCC}_{\mathrm{new}} = \alpha\left(\mathrm{MFCC} + \Delta\mathrm{MFCC}\right)
according to the experimental conditions, α takes 5 different values, and 5 experiments are performed respectively, so that average error rate data are shown in table 2:
TABLE 2
[Table 2: the average equal error rate measured for each of the five tested values of the static feature coefficient α]
From the data shown in Table 2, the curve of the average equal error rate against the static feature coefficient α is obtained, as shown in FIG. 3.
As can be seen from FIG. 3, the average equal error rate is lowest when α = 0.5, so the optimal value of the static feature coefficient is 0.5.
Accordingly, the dynamic voiceprint feature Mel-frequency cepstral coefficient formula proposed by the invention becomes:
\mathrm{MFCC}_{\mathrm{new}} = 0.5\left(\mathrm{MFCC} + \Delta\mathrm{MFCC}\right)
equation (5) represents a dynamic feature parameter, namely Δmfcc, and MFCC is a static feature parameter, namely mfcc=d (l, t), and the two parameters are added by taking weight 0.5, so as to obtain a dynamic feature extraction equation integrated into a static component:
Figure BDA0002968221760000071
the dynamic characteristic extraction formula which is integrated with the static component is obtained by arrangement:
Figure BDA0002968221760000072
the built dynamic voiceprint feature extraction model integrated with the static component is as follows:
d(l,t) = \frac{1}{2}\,C(l,t) + \frac{\sum_{k=1}^{K} k\,\big[C(l,t+k) - C(l,t-k)\big]}{4\sum_{k=1}^{K} k^{2}}
d(l,t) is the l-th order dynamic voiceprint feature extraction result of the t-th frame of voice data, and d(l,t) forms the l-th order, t-th element of the MFCC dynamic feature differential parameter matrix of the target voice data, namely: d(l,t) is the l-th order, t-th parameter of the MFCC dynamic feature differential parameter matrix; C(l,t) is the l-th order, t-th parameter of the MFCC coefficients, C(l,t+1) is the l-th order, (t+1)-th parameter, C(l,t+k) is the l-th order, (t+k)-th parameter, and C(l,t-k) is the l-th order, (t-k)-th parameter; k is the frequency ordinal after the Fourier transform of the t-th frame of voice data, and K is the preset total step length of the Fourier transform of the t-th frame of voice data.
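A self-contained sketch of the fused model; the equal 0.5/0.5 weighting follows the experiments above, while the edge-replication boundary treatment is an assumption:

import numpy as np

def fused_dynamic_feature(C, K=2):
    """d(l,t) = 0.5*C(l,t) + 0.5 * sum_k k*(C(l,t+k) - C(l,t-k)) / (2*sum_k k^2)."""
    L, T = C.shape
    Cp = np.pad(C, ((0, 0), (K, K)), mode="edge")
    delta = np.zeros_like(C, dtype=float)
    for k in range(1, K + 1):
        delta += k * (Cp[:, K + k: K + k + T] - Cp[:, K - k: K - k + T])
    delta /= 2.0 * sum(k * k for k in range(1, K + 1))
    return 0.5 * C + 0.5 * delta  # equal static and dynamic weights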
For the constructed dynamic voiceprint feature extraction model incorporating the static component, the following formula is used:
C(l,t) = \sum_{m=1}^{M} S(m)\,\cos\!\left(\frac{\pi\,l\,(m-0.5)}{M}\right), \qquad l = 1,2,\dots,L
acquiring a first-order characteristic coefficient C (l, t) of the t-th frame voice data in the MFCC coefficient;
where L is the order of the MFCC coefficients, m is the number of the Mel filter bank, and S (m) is the logarithmic energy of the mth Mel filter bank output.
According to the following formula:
S(m) = \ln\!\left(\sum_{k=1}^{N} \lvert X(k)\rvert^{2}\, H_{m}(k)\right), \qquad 1 \le m \le M
obtaining logarithmic energy S (m) output by an mth Mel filter bank;
wherein M represents the total number of filter banks, N represents the data length of the t-th frame voice data, X (k) represents the power corresponding to the k-th frequency, H m (k) Watch (watch)The transfer function of the mth Mel filter bank corresponding to the kth frequency is shown.
Based on the above model and method, the static feature parameters can first be calculated from the Mel-cepstral coefficient matrix, the audio duration, and related parameters, and the dynamic feature parameters fused with the static component can then be calculated for voiceprint recognition.
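Composing the sketches above gives a hypothetical end-to-end extractor; the function names are the illustrative ones defined earlier in this description, not names used by the patent:

import numpy as np

def extract_fused_voiceprint(x, fs=16000, frame_len=512, J=512, M=24, L=15):
    frames = frame_signal(x, frame_len, frame_len // 2)   # step 1: framing
    P = power_spectrum(frames, J)                          # step 2: |X(j)|^2
    H = mel_filter_bank(M, J, fs)                          # triangular Mel filters
    S = np.log(np.maximum(P @ H.T, 1e-12))                 # S(m) for every frame
    C = np.stack([mfcc_from_log_energy(S[t], L)            # static MFCC matrix, L x T
                  for t in range(S.shape[0])], axis=1)
    return fused_dynamic_feature(C, K=2)                   # step 3: fused features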
In the voiceprint recognition algorithm, a Gaussian mixture model (GMM) and a universal background model (UBM) are used to model the speaker's voiceprint features. The modeling procedure mainly comprises: inputting the GMM training speech, speech preprocessing, voiceprint feature extraction, inputting the UBM parameters, constructing the GMM, and saving the GMM parameters. In general, voiceprint recognition algorithms mostly adopt the classical dynamic feature extraction algorithm in the voiceprint feature extraction stage; the invention improves this stage by fusing the static component into the calculation of the dynamic feature parameters, which improves the performance of the voiceprint recognition algorithm.
The foregoing is merely a preferred embodiment of the present invention. It will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present invention, and such modifications and variations shall also be regarded as falling within the scope of the invention.

Claims (3)

1. A dynamic voiceprint feature extraction method integrated with static components is used for extracting voiceprint features of target voice data, and is characterized by comprising the following steps:
step 1: preprocessing target voice data to obtain preprocessed target voice data;
in step 1, the method for preprocessing target voice data includes: dividing target voice data into T frames to obtain multi-frame voice data;
step 2: processing the preprocessed target voice by using Fourier transform and Mel filter bank to obtain MFCC coefficients of target voice data;
in step 2, the method for processing the preprocessed target speech using a fourier transform and a Mel filter bank comprises the steps of:
processing each frame of voice data by using Fourier transformation to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into a Mel filter bank, and obtaining the MFCC coefficient of each frame of voice data, namely the MFCC coefficient of target voice data;
step 3: the MFCC coefficient of the target voice data is brought into a dynamic voiceprint feature extraction model which is integrated into the static component, an MFCC dynamic feature differential parameter matrix of the target voice data is obtained, and the matrix is defined as the dynamic voiceprint feature of the target voice data;
in step 3, the dynamic voiceprint feature extraction model integrated with the static component is:
d(l,t) = \frac{1}{2}\,C(l,t) + \frac{\sum_{k=1}^{K} k\,\big[C(l,t+k) - C(l,t-k)\big]}{4\sum_{k=1}^{K} k^{2}}
wherein d(l,t) is the l-th order dynamic voiceprint feature extraction result of the t-th frame of voice data, and d(l,t) forms the l-th order, t-th element of the MFCC dynamic feature differential parameter matrix of the target voice data; C(l,t) is the l-th order, t-th parameter of the MFCC coefficients, C(l,t+1) is the l-th order, (t+1)-th parameter, C(l,t+k) is the l-th order, (t+k)-th parameter, and C(l,t-k) is the l-th order, (t-k)-th parameter; k is the frequency ordinal after the Fourier transform of the t-th frame of voice data, and K is the preset total step length of the Fourier transform of the t-th frame of voice data.
2. The method for dynamic voiceprint feature extraction incorporated into a static component of claim 1, wherein the method is based on the formula:
C(l,t) = \sum_{m=1}^{M} S(m)\,\cos\!\left(\frac{\pi\,l\,(m-0.5)}{M}\right), \qquad l = 1,2,\dots,L
acquiring a first-order characteristic coefficient C (l, t) of the t-th frame voice data in the MFCC coefficient;
where L is the order of the MFCC coefficients, m is the number of the Mel filter bank, and S (m) is the logarithmic energy of the mth Mel filter bank output.
3. The method for dynamic voiceprint feature extraction incorporated into a static component of claim 2 wherein the method is based on the formula:
S(m) = \ln\!\left(\sum_{k=1}^{N} \lvert X(k)\rvert^{2}\, H_{m}(k)\right), \qquad 1 \le m \le M
obtaining logarithmic energy S (m) output by an mth Mel filter bank;
wherein M represents the total number of filter banks, N represents the data length of the t-th frame voice data, X (k) represents the power corresponding to the k-th frequency, H m (k) Representing the transfer function of the mth Mel filter bank corresponding to the kth frequency.
CN202110257723.XA 2021-03-09 2021-03-09 Dynamic voiceprint feature extraction method integrated with static component Active CN112951245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110257723.XA CN112951245B (en) 2021-03-09 2021-03-09 Dynamic voiceprint feature extraction method integrated with static component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110257723.XA CN112951245B (en) 2021-03-09 2021-03-09 Dynamic voiceprint feature extraction method integrated with static component

Publications (2)

Publication Number Publication Date
CN112951245A CN112951245A (en) 2021-06-11
CN112951245B true CN112951245B (en) 2023-06-16

Family

ID=76228612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110257723.XA Active CN112951245B (en) 2021-03-09 2021-03-09 Dynamic voiceprint feature extraction method integrated with static component

Country Status (1)

Country Link
CN (1) CN112951245B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689863B (en) * 2021-09-24 2024-01-16 广东电网有限责任公司 Voiceprint feature extraction method, voiceprint feature extraction device, voiceprint feature extraction equipment and storage medium
CN115762529A (en) * 2022-10-17 2023-03-07 国网青海省电力公司海北供电公司 Method for preventing cable from being broken outside by using voice recognition perception algorithm

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1246745A (en) * 1985-03-25 1988-12-13 Melvyn J. Hunt Man/machine communications system using formant based speech analysis and synthesis
CA2158847A1 (en) * 1993-03-25 1994-09-29 Mark Pawlewski A Method and Apparatus for Speaker Recognition
KR100779242B1 (en) * 2006-09-22 2007-11-26 (주)한국파워보이스 Speaker recognition methods of a speech recognition and speaker recognition integrated system
CN102290048A (en) * 2011-09-05 2011-12-21 南京大学 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN109256138A (en) * 2018-08-13 2019-01-22 平安科技(深圳)有限公司 Auth method, terminal device and computer readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982803A (en) * 2012-12-11 2013-03-20 华南师范大学 Isolated word speech recognition method based on HRSF and improved DTW algorithm
CN104616655B (en) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 The method and apparatus of sound-groove model automatic Reconstruction
CN104835498B (en) * 2015-05-25 2018-12-18 重庆大学 Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter
JP6860901B2 (en) * 2017-02-28 2021-04-21 国立研究開発法人情報通信研究機構 Learning device, speech synthesis system and speech synthesis method
CN107610708B (en) * 2017-06-09 2018-06-19 平安科技(深圳)有限公司 Identify the method and apparatus of vocal print
CN107993663A (en) * 2017-09-11 2018-05-04 北京航空航天大学 A kind of method for recognizing sound-groove based on Android
CN108847244A (en) * 2018-08-22 2018-11-20 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Voiceprint recognition method and system based on MFCC and improved BP neural network
CN110428841B (en) * 2019-07-16 2021-09-28 河海大学 Voiceprint dynamic feature extraction method based on indefinite length mean value
CN111489763B (en) * 2020-04-13 2023-06-20 武汉大学 GMM model-based speaker recognition self-adaption method in complex environment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1246745A (en) * 1985-03-25 1988-12-13 Melvyn J. Hunt Man/machine communications system using formant based speech analysis and synthesis
CA2158847A1 (en) * 1993-03-25 1994-09-29 Mark Pawlewski A Method and Apparatus for Speaker Recognition
KR100779242B1 (en) * 2006-09-22 2007-11-26 (주)한국파워보이스 Speaker recognition methods of a speech recognition and speaker recognition integrated system
CN102290048A (en) * 2011-09-05 2011-12-21 南京大学 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN109256138A (en) * 2018-08-13 2019-01-22 平安科技(深圳)有限公司 Auth method, terminal device and computer readable storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A speaker speech recognition system with improved dynamic feature parameters; 申小虎, 万荣春, 张新野; Computer Simulation (计算机仿真) (04); full text *
Environmental sound classification based on MFCC and weighted dynamic feature combination; 魏丹芳, 李应; Computer & Digital Engineering (计算机与数字工程) (02); full text *
Cough sound identity recognition based on improved MFCC and short-time energy; 赵青, 成谢锋, 朱冬梅; Computer Technology and Development (计算机技术与发展) (06); full text *
Research on an auditory feature extraction algorithm based on nonlinear power functions; 岳倩倩, 周萍, 景新幸; Microelectronics & Computer (微电子学与计算机) (06); full text *
Research on speaker recognition algorithms; 郭春霞; Journal of Xi'an University of Posts and Telecommunications (西安邮电学院学报) (05); full text *

Also Published As

Publication number Publication date
CN112951245A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN112951245B (en) Dynamic voiceprint feature extraction method integrated with static component
CN103236260B (en) Speech recognition system
CN103778920B (en) Speech enhan-cement and compensating for frequency response phase fusion method in digital deaf-aid
CN102982801B (en) Phonetic feature extracting method for robust voice recognition
Sarikaya et al. High resolution speech feature parametrization for monophone-based stressed speech recognition
CN111223493A (en) Voice signal noise reduction processing method, microphone and electronic equipment
CN110085249A (en) The single-channel voice Enhancement Method of Recognition with Recurrent Neural Network based on attention gate
CN110428849A (en) A kind of sound enhancement method based on generation confrontation network
CN106024010B (en) A kind of voice signal dynamic feature extraction method based on formant curve
EP1250699B1 (en) Speech recognition
CN111128209B (en) Speech enhancement method based on mixed masking learning target
CN109256127B (en) Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter
CN103544961B (en) Audio signal processing method and device
CN106373559B (en) Robust feature extraction method based on log-spectrum signal-to-noise ratio weighting
CN107274887A (en) Speaker's Further Feature Extraction method based on fusion feature MGFCC
CN113744749B (en) Speech enhancement method and system based on psychoacoustic domain weighting loss function
CN108364641A (en) A kind of speech emotional characteristic extraction method based on the estimation of long time frame ambient noise
CN110428841B (en) Voiceprint dynamic feature extraction method based on indefinite length mean value
CN107248414A (en) A kind of sound enhancement method and device based on multiframe frequency spectrum and Non-negative Matrix Factorization
CN112017658A (en) Operation control system based on intelligent human-computer interaction
CN108962275A (en) A kind of music noise suppressing method and device
CN103475986A (en) Digital hearing aid speech enhancing method based on multiresolution wavelets
Das et al. Robust front-end processing for speech recognition in noisy conditions
Li et al. An auditory system-based feature for robust speech recognition
CN108022588A (en) A kind of robust speech recognition methods based on bicharacteristic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant