CN112951245B - Dynamic voiceprint feature extraction method integrated with static component - Google Patents
- Publication number: CN112951245B
- Application number: CN202110257723.XA
- Authority
- CN
- China
- Prior art keywords: voice data, dynamic, mfcc, target voice, frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G10L17/02 — Speaker identification or verification techniques; preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
- G10L25/24 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00, characterised by the extracted parameters being the cepstrum
Abstract
The invention discloses a dynamic voiceprint feature extraction method that incorporates a static component. Target voice data are preprocessed, and the preprocessed voice is processed with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data. These MFCC coefficients are then substituted into a dynamic voiceprint feature extraction model that incorporates the static component, yielding the MFCC dynamic-feature differential parameter matrix of the target voice data, which is defined as the dynamic voiceprint feature of the target voice data. When extracting voiceprint features from voice data, the method preserves sound continuity while reducing the average equal error rate and improving the recognition rate.
Description
Technical Field
The invention relates to the technical field of artificial-intelligence voiceprint recognition, and in particular to a dynamic voiceprint feature extraction method that incorporates a static component.
Background
At present, smart homes are increasingly used in people's daily life and work. They employ technologies such as wireless communication, image processing, and voice processing; a smart home system based on voice interaction is more convenient to use, gathers information over a wider space, and offers a friendlier user experience.
Voiceprint recognition has developed rapidly in recent years. In some settings its recognition rate already meets basic security requirements, and its economy and convenience give it a very broad application prospect. Suppressing external noise as much as possible and extracting speech features that are as pure as possible from the collected signal is a precondition for putting any speech processing technique into practical use.
As living standards rise rapidly, the public no longer expects a smart home system merely to execute standard, common control functions, but to make the whole home more intelligent, convenient, safe, and comfortable. Adding a voiceprint recognition function to the smart home system, together with speech enhancement to stabilize it in noisy environments, further improves the human-machine interaction experience and the efficiency with which users operate the system. The system can also define permission levels for control and operation, providing different service functions to users of different authority levels and thereby further improving overall security and practicality. However, the speech recognition and speech feature extraction methods of the prior art suffer from a high average error rate and a low recognition rate.
Therefore, in order to further reduce the average error rate and improve the recognition rate, the invention provides a dynamic voiceprint feature extraction method that incorporates a static component.
Disclosure of Invention
The purpose of the invention is to provide a dynamic voiceprint feature extraction method with a low average error rate and a high recognition rate.
The technical scheme is as follows: the invention provides a dynamic voiceprint feature extraction method integrated with static components, which is used for extracting voiceprint features of target voice data and is characterized by comprising the following steps:
step 1: preprocessing target voice data to obtain preprocessed target voice data;
step 2: processing the preprocessed target voice by using Fourier transform and Mel filter bank to obtain MFCC coefficients of target voice data;
step 3: and carrying the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model integrated with the static component, obtaining an MFCC dynamic feature differential parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint feature of the target voice data.
As a preferred embodiment of the present invention, in step 1, a method for preprocessing target voice data includes: dividing target voice data into T frames to obtain multi-frame voice data;
in step 2, the method for processing the preprocessed target speech using a fourier transform and a Mel filter bank comprises the steps of:
processing each frame of voice data by using Fourier transformation to obtain the frequency spectrum of each frame of voice data;
the frequency spectrum of each frame of voice data is input into a Mel filter bank, and the MFCC coefficient of each frame of voice data, namely the MFCC coefficient of target voice data, is obtained.
As a preferred solution of the present invention, in step 3, the dynamic voiceprint feature extraction model incorporating the static component is:

d(l,t) = 0.5·C(l,t) + 0.5·[ Σ_{k=1}^{K} k·( C(l,t+k) − C(l,t−k) ) ] / ( 2·Σ_{k=1}^{K} k² )

wherein d(l,t) is the l-th-order dynamic voiceprint feature extraction result for the t-th frame of voice data, and the d(l,t) form the t-th element of the l-th order in the MFCC dynamic-feature differential parameter matrix of the target voice data; C(l,t) is the t-th parameter of the l-th order in the MFCC coefficients, and C(l,t+k) and C(l,t−k) are the (t+k)-th and (t−k)-th parameters of the l-th order; k is the step index of the difference, and K is the preset total step length of the difference for the t-th frame of voice data.
As a preferred embodiment of the present invention, the l-th-order characteristic coefficient C(l,t) of the t-th frame of voice data in the MFCC coefficients is acquired using the following formula:

C(l,t) = Σ_{m=1}^{M} S(m)·cos( π·l·(m − 0.5)/M ), 1 ≤ l ≤ L

where L is the total order of the MFCC coefficients, m is the number of the Mel filter, and S(m) is the logarithmic energy output by the m-th Mel filter.
As a preferred embodiment of the present invention, the logarithmic energy S(m) output by the m-th Mel filter is obtained using the following formula:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|²·H_m(k) ), 1 ≤ m ≤ M

where M is the total number of filters in the filter bank, N is the data length of the t-th frame of voice data, |X(k)|² is the power corresponding to the k-th frequency, and H_m(k) is the transfer function of the m-th Mel filter at the k-th frequency.
Beneficial effects: compared with the prior art, the dynamic voiceprint feature extraction method provided by the invention extracts voiceprint features with a model that blends the static component into the dynamic features, thereby reducing the average equal error rate and improving the recognition rate while preserving sound continuity.
Drawings
FIG. 1 is a flow chart of a dynamic voiceprint feature extraction method provided in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of the equal error rate as a function of the ratio of dynamic to static features, provided according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the equal error rate as a function of the static feature coefficient, according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Referring to fig. 1, the method for extracting dynamic voiceprint features incorporated into static components provided by the present invention includes the following steps:
step 1: preprocessing target voice data to obtain preprocessed target voice data.
The method for preprocessing the target voice data comprises the following steps: dividing target voice data into T frames to obtain multi-frame voice data;
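The framing step above can be sketched as follows; this is an illustrative sketch, not the patent's implementation — the frame length here is arbitrary, and only the 1/2-frame shift is taken from the experimental settings described later:

```python
def split_frames(signal, frame_len, frame_shift):
    """Split a signal into overlapping frames; trailing partial frames are dropped."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += frame_shift
    return frames

# 1/2-frame shift, as in the patent's experiments (frame length here is arbitrary).
frames = split_frames(list(range(10)), frame_len=4, frame_shift=2)
```

Each returned frame is then processed independently in step 2.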
step 2: and processing the preprocessed target voice by using Fourier transformation and a Mel filter bank to acquire MFCC coefficients of the target voice data.
The method for processing the preprocessed target speech using a fourier transform and a Mel-filter bank comprises the steps of:
processing each frame of voice data by using Fourier transformation to obtain the frequency spectrum of each frame of voice data;
the frequency spectrum of each frame of voice data is input into a Mel filter bank, and the MFCC coefficient of each frame of voice data, namely the MFCC coefficient of target voice data, is obtained.
The method of the step 1 and the step 2 specifically comprises the following steps:
the extraction of Mel-frequency cepstrum coefficients (MFCCs) is performed on data that has been subjected to speech preprocessing, and desired characteristic coefficients are obtained by performing operations such as fourier transform, mel (Mel) filter filtering, and the like on the data.
(1) Perform a Fourier transform on each frame of the preprocessed voice data to obtain its spectrum and the power spectrum |X_j(k)|² of each frame. X_j(k) is given by:

X_j(k) = Σ_{n=0}^{N−1} x_j(n)·e^(−i·2πkn/N), k = 0, 1, …, N−1

where N is the length of each frame, J is the total number of frames, j (1 ≤ j ≤ J) denotes the j-th frame, x_j(n) is the n-th sample of voice data in the j-th frame, and k is the frequency index.
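A minimal sketch of this step, written as a naive DFT for clarity (a practical implementation would use an FFT; the function name and the impulse test signal are illustrative):

```python
import cmath

def power_spectrum(frame):
    """Naive DFT: returns |X(k)|^2 for k = 0..N-1 (an FFT would be used in practice)."""
    N = len(frame)
    spec = []
    for k in range(N):
        X_k = sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        spec.append(abs(X_k) ** 2)
    return spec

# A unit impulse has a flat spectrum: |X(k)|^2 = 1 for every k.
ps = power_spectrum([1.0, 0.0, 0.0, 0.0])
```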
(2) Design a Mel filter bank and filter the power spectrum of the signal through the configured bank, then take the logarithm, converting the frequency scale to the Mel frequency scale. The center frequency f(m) of the m-th filter in the bank satisfies the following formula:
Mel(f(m+1))-Mel(f(m))=Mel(f(m))-Mel(f(m-1))
where m is the number of filters in the filter bank, and Mel (f (m)) is the operation of converting the frequency f (m) into a Mel frequency.
The transfer function H_m(f) of each band-pass filter in the Mel filter bank is the triangular function

H_m(f) = 0, for f < f(m−1);
H_m(f) = ( f − f(m−1) ) / ( f(m) − f(m−1) ), for f(m−1) ≤ f ≤ f(m);
H_m(f) = ( f(m+1) − f ) / ( f(m+1) − f(m) ), for f(m) ≤ f ≤ f(m+1);
H_m(f) = 0, for f > f(m+1);

where f is the frequency.
After the voice data are processed by the Mel filters, the logarithmic energy S(m) output by each filter is obtained:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|²·H_m(k) ), 1 ≤ m ≤ M

where m is the number of the filter, M is the total number of filters in the bank, generally 22–26 (the invention takes M = 24), |X(k)|² is the power at the k-th frequency, and H_m(k) is the transfer function of the m-th filter evaluated at the k-th frequency.
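The filter-bank construction and the log-energy computation can be sketched as follows. The `mel_filterbank` helper and its FFT-bin mapping are assumptions based on the standard triangular Mel filter design (equal spacing on the Mel scale, matching the center-frequency condition above), not code taken from the patent:

```python
import math

def hz_to_mel(f):
    # Mel(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(M, N, fs):
    """Build M triangular filters H_m(k) over an N-point FFT at sample rate fs.
    Center frequencies are equally spaced on the Mel scale, satisfying
    Mel(f(m+1)) - Mel(f(m)) = Mel(f(m)) - Mel(f(m-1))."""
    mel_pts = [i * hz_to_mel(fs / 2.0) / (M + 1) for i in range(M + 2)]
    bins = [int((N + 1) * mel_to_hz(p) / fs) for p in mel_pts]
    H = [[0.0] * (N // 2 + 1) for _ in range(M)]
    for m in range(1, M + 1):
        for k in range(bins[m - 1], bins[m]):        # rising edge of the triangle
            H[m - 1][k] = (k - bins[m - 1]) / (bins[m] - bins[m - 1])
        for k in range(bins[m], bins[m + 1]):        # falling edge of the triangle
            H[m - 1][k] = (bins[m + 1] - k) / (bins[m + 1] - bins[m])
    return H

def log_energy(power, H_m):
    # S(m) = ln( sum_k |X(k)|^2 * H_m(k) )
    return math.log(sum(p * h for p, h in zip(power, H_m)))

# M = 24 filters as in the invention; N and fs follow the 16 kHz experiments.
H = mel_filterbank(M=24, N=512, fs=16000)
```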
(3) Perform a discrete cosine transform on the logarithmic Mel power spectrum of each frame to decorrelate the power-spectrum energies, eliminating the correlation between the signal dimensions and mapping them to a low-dimensional space, which yields the corresponding MFCC coefficient C(l):

C(l) = Σ_{m=1}^{M} S(m)·cos( π·l·(m − 0.5)/M ), 1 ≤ l ≤ L

where L is the total order of the MFCC coefficients, typically taken as 12–18; the invention takes L = 15. l takes values from 1 to L and denotes the l-th order of the MFCC coefficients.
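This DCT decorrelation step can be sketched as follows (the function name is illustrative; M = 24 and L = 15 follow the values given above):

```python
import math

def mfcc_from_log_energies(S, L):
    """DCT decorrelation: C(l) = sum_{m=1}^{M} S(m) * cos(pi*l*(m-0.5)/M), l = 1..L."""
    M = len(S)
    return [sum(S[m - 1] * math.cos(math.pi * l * (m - 0.5) / M)
                for m in range(1, M + 1))
            for l in range(1, L + 1)]

# With constant filter energies the cosine terms cancel, so every C(l) is ~0.
C = mfcc_from_log_energies([1.0] * 24, L=15)
```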
Step 3: and carrying the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model integrated with the static component, obtaining an MFCC dynamic feature differential parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint feature of the target voice data.
In step 3, a dynamic voiceprint feature extraction model incorporating static components is constructed according to the following method:
the dynamic feature extraction is essentially a MFCC coefficient differential mode, i.e. the parameters of the t-1 th frame and the t+1 th frame are used for downsampling when calculating the MFCC coefficient differential parameters of the t-th frame. Therefore, the classical dynamic feature extraction formula is as follows:
wherein J represents the length of the fast Fourier transform, usually 1 or 2 is taken, and represents a first-order MFCC coefficient differential parameter and a second-order MFCC coefficient differential parameter, and J is the value of J (J is more than or equal to 1 and less than or equal to J); l is the mel cepstrum coefficient order, T is the frame number, T is the total frame number of a section of audio, C (l, T) is the first order T parameter of the mel cepstrum coefficient matrix of the voice signal, and d (l, T) is the MFCC dynamic characteristic parameter.
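The classical differential computation for a single cepstral order can be sketched as follows (the helper name is an assumption; K = 2 is one of the usual choices mentioned above, and frames where t ± k falls outside the sequence are not handled):

```python
def delta(C, t, K=2):
    """Classical first-order dynamic feature for frame t of one cepstral order:
    d(t) = sum_{k=1}^{K} k * (C[t+k] - C[t-k]) / (2 * sum_{k=1}^{K} k^2)."""
    num = sum(k * (C[t + k] - C[t - k]) for k in range(1, K + 1))
    den = 2.0 * sum(k * k for k in range(1, K + 1))
    return num / den

# On a linearly increasing cepstral track the difference recovers the slope.
slope = delta([0, 1, 2, 3, 4, 5], t=2)
```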
The novel dynamic voiceprint feature Mel frequency cepstrum coefficient formula proposed by the invention is:

MFCC_new = α·MFCC + β·ΔMFCC

where MFCC_new is the dynamic voiceprint feature proposed by the invention, MFCC is the static voiceprint feature, ΔMFCC is the classical dynamic voiceprint feature (the differential dynamic parameter), α is the static feature coefficient, β is the dynamic feature coefficient, and δ = β/α is the ratio of the dynamic feature coefficient to the static feature coefficient.
The α and δ values are determined according to the following method:
assuming α=1, the optimal value of the ratio δ of dynamic coefficient to static coefficient is determined experimentally.
The number of Gaussian components in the experiment was set to 64, and speech data from 100 speakers (50 female and 50 male) in the TIMIT corpus were selected as experimental data. Speech from 60 speakers was selected as training data for the UBM: ten utterances per speaker were concatenated into 10 seconds of speech for UBM training, and the resulting UBM parameters were saved. Then, for each of the remaining 40 speakers, five utterances were concatenated into 10 seconds of speech data to train that speaker's GMM, and the obtained model parameters were saved. Finally, the remaining speech of the 40 speakers was cyclically assembled into ten 5-second segments per speaker for matching tests of the system. The complete test process comprises 400 speaker-acceptance trials and 15600 speaker-rejection trials, and the equal error rate is obtained as the output of one experiment.
For the voiceprint features obtained from the voice data, each test utterance generates multiple frames of speech. The MFCC order is set to 15, so each frame of voice data yields 15 MFCC coefficients and, after the difference calculation, 15 dynamic feature coefficients; combined, each speech frame yields 30 coefficients. The sampling frequency in the experiment is 16 kHz, and the frame shift is 1/2 of the frame length.
According to the experimental conditions, δ takes 5 different values and 5 experiments are performed; the resulting average equal error rates are shown in Table 1:
TABLE 1
From the data shown in Table 1, the curve of the average equal error rate against the different dynamic-to-static feature ratios δ is obtained, as shown in fig. 2.
As can be seen from fig. 2, the average equal error rate is lowest when δ = 1, so the optimal value of the dynamic-to-static feature ratio δ is 1.
Accordingly, the dynamic voiceprint feature Mel frequency cepstrum coefficient formula provided by the invention can be rewritten as:

MFCC_new = α·( MFCC + ΔMFCC )
according to the experimental conditions, α takes 5 different values, and 5 experiments are performed respectively, so that average error rate data are shown in table 2:
TABLE 2
From the data shown in Table 2, the curve of the average equal error rate against the different static feature coefficients α is obtained, as shown in FIG. 3.
As can be seen from fig. 3, the average equal error rate is lowest when α = 0.5, so the optimal value of the static feature coefficient is 0.5.
Accordingly, the dynamic voiceprint feature Mel frequency cepstrum coefficient formula provided by the invention becomes:

MFCC_new = 0.5·( MFCC + ΔMFCC )

Here ΔMFCC is the dynamic feature parameter given by the classical differential formula, and MFCC is the static feature parameter, i.e., C(l,t); adding the two with weight 0.5 and rearranging gives the dynamic feature extraction formula incorporating the static component:

d(l,t) = 0.5·C(l,t) + 0.5·[ Σ_{k=1}^{K} k·( C(l,t+k) − C(l,t−k) ) ] / ( 2·Σ_{k=1}^{K} k² )
the built dynamic voiceprint feature extraction model integrated with the static component is as follows:
d (l, t) is the first-order dynamic voiceprint feature extraction result of the t-th frame voice data, and d (l, t) forms the t-th element of the first-order in the MFCC dynamic feature differential parameter matrix of the target voice data, namely: d (l, t) is the first order t parameter of the MFCC dynamic characteristic differential parameter matrix; c (l, t) is the t parameter of the first order in the MFCC coefficient, C (l, t+1) is the t+1th parameter of the first order in the MFCC coefficient, C (l, t+k) is the t+kth parameter of the first order, C (l, t-K) is the t-kth parameter of the first order in the MFCC coefficient, K is the frequency ordinal number after Fourier transformation is performed on the t frame voice data, and K is the preset total step length when Fourier transformation is performed on the t frame voice data.
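The fused computation can be sketched as follows, using the α = 0.5 weight and the dynamic-to-static ratio of 1 derived in the experiments (the function name is illustrative, and edge frames where t ± k falls outside the sequence are not handled here):

```python
def fused_feature(C_row, t, K=2):
    """d(l,t) = 0.5*C(l,t) + 0.5*Delta(l,t): the classical difference with the
    static component blended in at equal weight (alpha = 0.5, ratio delta = 1)."""
    dyn = sum(k * (C_row[t + k] - C_row[t - k]) for k in range(1, K + 1)) \
        / (2.0 * sum(k * k for k in range(1, K + 1)))
    return 0.5 * C_row[t] + 0.5 * dyn

# Static component 2 and dynamic component 1 combine to 1.5 at frame t = 2.
d = fused_feature([0, 1, 2, 3, 4, 5], t=2)
```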
For the constructed dynamic voiceprint feature extraction model incorporating the static component, the l-th-order characteristic coefficient C(l,t) of the t-th frame of voice data in the MFCC coefficients is acquired according to the following formula:

C(l,t) = Σ_{m=1}^{M} S(m)·cos( π·l·(m − 0.5)/M ), 1 ≤ l ≤ L

where L is the total order of the MFCC coefficients, m is the number of the Mel filter, and S(m) is the logarithmic energy output by the m-th Mel filter.
The logarithmic energy S(m) output by the m-th Mel filter is obtained according to the following formula:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|²·H_m(k) ), 1 ≤ m ≤ M

where M is the total number of filters in the filter bank, N is the data length of the t-th frame of voice data, |X(k)|² is the power corresponding to the k-th frequency, and H_m(k) is the transfer function of the m-th Mel filter at the k-th frequency.
Based on the above model and method, given the Mel cepstrum coefficient matrix, the audio duration, and other parameters, the static feature parameters can be computed first, and the dynamic feature parameters incorporating the static component are then computed for voiceprint recognition.
In the voiceprint recognition algorithm, a Gaussian mixture model (GMM) and a universal background model (UBM) are used to model the speaker's voiceprint features; the main steps are GMM training-speech input, speech preprocessing, voiceprint feature extraction, UBM parameter input, GMM construction, and GMM parameter storage. Conventional voiceprint recognition algorithms mostly adopt the classical dynamic feature extraction algorithm in the feature extraction step; the invention improves this step by blending the static component into the dynamic feature parameters, improving the performance of the voiceprint recognition algorithm.
The foregoing is merely a preferred embodiment of the present invention. It will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present invention, and such modifications and variations also fall within the scope of the invention.
Claims (3)
1. A dynamic voiceprint feature extraction method integrated with static components is used for extracting voiceprint features of target voice data, and is characterized by comprising the following steps:
step 1: preprocessing target voice data to obtain preprocessed target voice data;
in step 1, the method for preprocessing target voice data includes: dividing target voice data into T frames to obtain multi-frame voice data;
step 2: processing the preprocessed target voice by using Fourier transform and Mel filter bank to obtain MFCC coefficients of target voice data;
in step 2, the method for processing the preprocessed target speech using a fourier transform and a Mel filter bank comprises the steps of:
processing each frame of voice data by using Fourier transformation to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into a Mel filter bank, and obtaining the MFCC coefficient of each frame of voice data, namely the MFCC coefficient of target voice data;
step 3: substituting the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model incorporating the static component, obtaining an MFCC dynamic-feature differential parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint feature of the target voice data;
in step 3, the dynamic voiceprint feature extraction model incorporating the static component is:

d(l,t) = 0.5·C(l,t) + 0.5·[ Σ_{k=1}^{K} k·( C(l,t+k) − C(l,t−k) ) ] / ( 2·Σ_{k=1}^{K} k² )

wherein d(l,t) is the l-th-order dynamic voiceprint feature extraction result for the t-th frame of voice data, and the d(l,t) form the t-th element of the l-th order in the MFCC dynamic-feature differential parameter matrix of the target voice data; C(l,t) is the t-th parameter of the l-th order in the MFCC coefficients, and C(l,t+k) and C(l,t−k) are the (t+k)-th and (t−k)-th parameters of the l-th order; k is the step index of the difference, and K is the preset total step length of the difference for the t-th frame of voice data.
2. The method for dynamic voiceprint feature extraction incorporating a static component of claim 1, wherein the l-th-order characteristic coefficient C(l,t) of the t-th frame of voice data in the MFCC coefficients is acquired based on the formula:

C(l,t) = Σ_{m=1}^{M} S(m)·cos( π·l·(m − 0.5)/M ), 1 ≤ l ≤ L

where L is the total order of the MFCC coefficients, m is the number of the Mel filter, and S(m) is the logarithmic energy output by the m-th Mel filter.
3. The method for dynamic voiceprint feature extraction incorporating a static component of claim 2, wherein the logarithmic energy S(m) output by the m-th Mel filter is obtained based on the formula:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|²·H_m(k) ), 1 ≤ m ≤ M

where M is the total number of filters in the filter bank, N is the data length of the t-th frame of voice data, |X(k)|² is the power corresponding to the k-th frequency, and H_m(k) is the transfer function of the m-th Mel filter at the k-th frequency.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110257723.XA | 2021-03-09 | 2021-03-09 | Dynamic voiceprint feature extraction method integrated with static component |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112951245A | 2021-06-11 |
| CN112951245B | 2023-06-16 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113689863B (en) * | 2021-09-24 | 2024-01-16 | 广东电网有限责任公司 | Voiceprint feature extraction method, voiceprint feature extraction device, voiceprint feature extraction equipment and storage medium |
CN115762529A (en) * | 2022-10-17 | 2023-03-07 | 国网青海省电力公司海北供电公司 | Method for preventing cable from being broken outside by using voice recognition perception algorithm |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1246745A (en) * | 1985-03-25 | 1988-12-13 | Melvyn J. Hunt | Man/machine communications system using formant based speech analysis and synthesis |
CA2158847A1 (en) * | 1993-03-25 | 1994-09-29 | Mark Pawlewski | A Method and Apparatus for Speaker Recognition |
KR100779242B1 (en) * | 2006-09-22 | 2007-11-26 | (주)한국파워보이스 | Speaker recognition methods of a speech recognition and speaker recognition integrated system |
CN102290048A (en) * | 2011-09-05 | 2011-12-21 | 南京大学 | Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference |
WO2018107810A1 (en) * | 2016-12-15 | 2018-06-21 | 平安科技(深圳)有限公司 | Voiceprint recognition method and apparatus, and electronic device and medium |
CN109256138A (en) * | 2018-08-13 | 2019-01-22 | 平安科技(深圳)有限公司 | Auth method, terminal device and computer readable storage medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
CN104616655B (en) * | 2015-02-05 | 2018-01-16 | 北京得意音通技术有限责任公司 | The method and apparatus of sound-groove model automatic Reconstruction |
CN104835498B (en) * | 2015-05-25 | 2018-12-18 | 重庆大学 | Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter |
JP6860901B2 (en) * | 2017-02-28 | 2021-04-21 | 国立研究開発法人情報通信研究機構 | Learning device, speech synthesis system and speech synthesis method |
CN107610708B (en) * | 2017-06-09 | 2018-06-19 | 平安科技(深圳)有限公司 | Identify the method and apparatus of vocal print |
CN107993663A (en) * | 2017-09-11 | 2018-05-04 | 北京航空航天大学 | A kind of method for recognizing sound-groove based on Android |
CN108847244A (en) * | 2018-08-22 | 2018-11-20 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Voiceprint recognition method and system based on MFCC and improved BP neural network |
CN110428841B (en) * | 2019-07-16 | 2021-09-28 | 河海大学 | Voiceprint dynamic feature extraction method based on indefinite length mean value |
CN111489763B (en) * | 2020-04-13 | 2023-06-20 | 武汉大学 | GMM model-based speaker recognition self-adaption method in complex environment |
Non-Patent Citations (5)
Title |
---|
A speaker speech recognition system with improved dynamic feature parameters; 申小虎, 万荣春, 张新野; Computer Simulation (04), full text * |
Environmental sound classification based on MFCC and weighted dynamic feature combination; 魏丹芳, 李应; Computer & Digital Engineering (02), full text * |
Cough sound identity recognition based on improved MFCC and short-time energy; 赵青, 成谢锋, 朱冬梅; Computer Technology and Development (06), full text * |
Research on an auditory feature extraction algorithm based on a nonlinear power function; 岳倩倩, 周萍, 景新幸; Microelectronics & Computer (06), full text * |
Research on speaker recognition algorithms; 郭春霞; Journal of Xi'an University of Posts and Telecommunications (05), full text * |
Legal Events
| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |