CN112951245A - Dynamic voiceprint feature extraction method integrated with static component

Info

Publication number
CN112951245A
CN112951245A (application CN202110257723.XA)
Authority
CN
China
Prior art keywords
voice data
target voice
dynamic
mfcc
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110257723.XA
Other languages
Chinese (zh)
Other versions
CN112951245B (en)
Inventor
刘涛
刘斌
黄金国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Open University of Jiangsu City Vocational College
Original Assignee
Jiangsu Open University of Jiangsu City Vocational College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Open University of Jiangsu City Vocational College filed Critical Jiangsu Open University of Jiangsu City Vocational College
Priority to CN202110257723.XA priority Critical patent/CN112951245B/en
Publication of CN112951245A publication Critical patent/CN112951245A/en
Application granted granted Critical
Publication of CN112951245B publication Critical patent/CN112951245B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Complex Calculations (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a dynamic voiceprint feature extraction method integrated with a static component, which comprises: preprocessing target voice data to obtain preprocessed target voice data; processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data; and substituting the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model fused with the static component to obtain an MFCC dynamic feature difference parameter matrix of the target voice data, which is defined as the dynamic voiceprint features of the target voice data. When extracting voiceprint features from voice data, the method ensures sound continuity, reduces the average equal error rate, and improves the recognition rate.

Description

Dynamic voiceprint feature extraction method integrated with static component
Technical Field
The invention relates to the technical field of artificial intelligence voiceprint recognition, in particular to a dynamic voiceprint feature extraction method fused with static components.
Background
At present, smart homes are ever more widely used in people's life and work. Smart homes adopt technologies such as wireless communication, image processing and voice processing; a smart home system based on voice interaction is more convenient to use, offers a wider information acquisition space, and provides a friendlier user experience.
Voiceprint recognition has developed greatly in recent years, and in some settings its recognition rate already meets people's basic security requirements; it also offers advantages such as economy and convenience, so it has a very broad application prospect. Suppressing external noise as far as possible and extracting voice features as pure as possible from the acquired signal is a precondition for putting the various voice processing techniques into practical use.
Today, with the rapid improvement in people's quality of life, the public's requirements for smart home systems are no longer limited to standard, common control functions; people expect the whole home to become more intelligent, convenient, safe and comfortable. Adding a voiceprint recognition function to the smart home system, and using speech enhancement to improve the system's stability in noisy environments, can further improve the human-computer interaction experience and the efficiency with which users operate the smart home. A permission hierarchy can also be set for smart home control and operation, providing differentiated service functions to users with different permission levels and thereby further improving the overall safety and practicability of the system. Such a system will have strong appeal in the future market, especially against the background of the currently slow development of the smart home market, and will play an increasingly important role in public life. However, the voice recognition and voice feature extraction methods in the prior art suffer from a high average equal error rate and a low recognition rate.
Therefore, in order to further reduce the average equal error rate and improve the recognition rate, the invention provides a dynamic voiceprint feature extraction method integrated with a static component.
Disclosure of Invention
The purpose of the invention is as follows: to provide a dynamic voiceprint feature extraction method with a low average equal error rate and a high recognition rate.
The technical scheme is as follows: the invention provides a dynamic voiceprint feature extraction method fused with a static component, which is used for extracting voiceprint features from target voice data and comprises the following steps:
step 1: preprocessing the target voice data to obtain preprocessed target voice data;
step 2: processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data;
step 3: substituting the MFCC coefficients of the target voice data into the dynamic voiceprint feature extraction model fused with the static component, obtaining the MFCC dynamic feature difference parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint features of the target voice data.
As a preferred aspect of the present invention, in step 1, the method for preprocessing the target voice data comprises: dividing the target voice data into T frames to acquire multiple frames of voice data;
in step 2, the method for processing the preprocessed target voice with the Fourier transform and the Mel filter bank comprises the following steps:
processing each frame of voice data with the Fourier transform to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into the Mel filter bank to obtain the MFCC coefficients of each frame of voice data, namely the MFCC coefficients of the target voice data.
As a preferable aspect of the present invention, in step 3, the dynamic voiceprint feature extraction model fused with the static component is:
$$d(l,t)=\frac{1}{2}\left[C(l,t)+\frac{\sum_{k=1}^{K}k\,\bigl(C(l,t+k)-C(l,t-k)\bigr)}{2\sum_{k=1}^{K}k^{2}}\right]$$
where d(l, t) is the extraction result of the l-th order dynamic voiceprint feature of the t-th frame of voice data and constitutes the t-th element of the l-th order in the MFCC dynamic feature difference parameter matrix of the target voice data; C(l, t) is the t-th parameter of the l-th order in the MFCC coefficients, C(l, t+1) is the (t+1)-th parameter of the l-th order, C(l, t+k) is the (t+k)-th parameter of the l-th order, and C(l, t-k) is the (t-k)-th parameter of the l-th order; k is the frequency ordinal after the Fourier transform of the t-th frame of voice data, and K is the preset total step length of the Fourier transform of the t-th frame of voice data.
As a preferred aspect of the present invention, according to the following formula:
$$C(l,t)=\sqrt{\frac{2}{M}}\sum_{m=1}^{M}S(m)\cos\!\left(\frac{\pi l\,(m-0.5)}{M}\right),\qquad l=1,2,\dots,L$$
the l-th order characteristic coefficient C(l, t) of the t-th frame of voice data in the MFCC coefficients is obtained;
where L is the order of the MFCC coefficients, m is the serial number of the Mel filter, and S(m) is the logarithmic energy output by the m-th Mel filter.
As a preferred aspect of the present invention, according to the following formula:
$$S(m)=\ln\!\left(\sum_{k=1}^{N}\lvert X(k)\rvert^{2}\,H_{m}(k)\right),\qquad 1\le m\le M$$
the logarithmic energy S(m) output by the m-th Mel filter is obtained;
where M represents the total number of filters in the Mel filter bank, N represents the data length of the t-th frame of voice data, |X(k)|² represents the power corresponding to the k-th frequency, and H_m(k) represents the transfer function of the m-th Mel filter at the k-th frequency.
Beneficial effects: compared with the prior art, the dynamic voiceprint feature extraction method fused with a static component provided by the invention extracts voiceprint features based on a dynamic voiceprint feature extraction model fused with the static component, reducing the average equal error rate and improving the recognition rate while ensuring sound continuity.
Drawings
FIG. 1 is a flow chart of a dynamic voiceprint feature extraction method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the variation of the average equal error rate with the ratio of the dynamic feature to the static feature provided by an embodiment of the invention;
FIG. 3 is a graph illustrating the variation of the average equal error rate with the static feature coefficient according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Referring to FIG. 1, the method for extracting dynamic voiceprint features fused with a static component provided by the invention comprises the following steps:
Step 1: preprocessing the target voice data to obtain preprocessed target voice data.
The method for preprocessing the target voice data comprises: dividing the target voice data into T frames to acquire multiple frames of voice data.
Step 2: processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data.
The method for processing the preprocessed target voice with the Fourier transform and the Mel filter bank comprises the following steps:
processing each frame of voice data with the Fourier transform to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into the Mel filter bank to obtain the MFCC coefficients of each frame of voice data, namely the MFCC coefficients of the target voice data.
The methods of step 1 and step 2 are specifically as follows:
Mel-frequency cepstrum coefficient (MFCC) extraction is performed on the data that has undergone voice preprocessing, and the desired feature coefficients are obtained by applying operations such as the Fourier transform and the Mel filter bank to the data.
(1) A Fourier transform is carried out on each frame of data after voice preprocessing to obtain the corresponding frequency spectrum, and the power spectrum |X(j)|² of each frame is obtained. X(j) is calculated as follows:
$$X(j)=\sum_{n=1}^{N}x(n)\,e^{-\mathrm{i}\,2\pi nj/J},\qquad 1\le j\le J$$
where N is the length of each frame, J is the fast Fourier transform length (i.e. the total number of spectral lines), j takes values from 1 to J and denotes the j-th spectral line, x(n) is the n-th sample of voice data in the frame, and i is the imaginary unit.
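A minimal sketch of this transform step, assuming the frames come from the frame_signal() helper above and using the one-sided FFT (the redundant half of the spectrum is discarded):

```python
import numpy as np

def power_spectrum(frames, nfft):
    """Per-frame power spectrum |X(j)|^2 via an FFT of length nfft
    (J in the text). Returns shape (T, nfft // 2 + 1)."""
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1)) ** 2
```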
(2) A Mel filter bank is designed, and the power spectrum of the signal is filtered by the configured Mel filter bank; a logarithm operation is then carried out, converting the frequency scale into the Mel scale. The center frequency f(m) of the m-th filter in the filter bank satisfies the following formula:
Mel(f(m+1))-Mel(f(m))=Mel(f(m))-Mel(f(m-1))
where m is the serial number of the filter in the filter bank, and Mel(f(m)) denotes converting the frequency f(m) to the Mel frequency.
The transfer function H_m(f) of each band-pass filter in the Mel filter bank is:
$$H_{m}(f)=\begin{cases}0, & f<f(m-1)\\[4pt]\dfrac{f-f(m-1)}{f(m)-f(m-1)}, & f(m-1)\le f\le f(m)\\[4pt]\dfrac{f(m+1)-f}{f(m+1)-f(m)}, & f(m)<f\le f(m+1)\\[4pt]0, & f>f(m+1)\end{cases}$$
Wherein f is the frequency.
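The triangular Mel filter bank described above might be built as follows; the conversion Mel(f) = 2595 log10(1 + f / 700) and the FFT-bin rounding are common conventions and are assumptions here:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, sample_rate):
    """Triangular filters H_m(k) whose centre frequencies f(m) are equally
    spaced on the Mel scale, as the centre-frequency relation requires.
    Returns shape (num_filters, nfft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)          # f(0) .. f(M+1)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, nfft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):                  # rising edge
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):                 # falling edge
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank
```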
After the voice data is processed by the Mel filters, the logarithmic energy S(m) output by each filter is calculated:
$$S(m)=\ln\!\left(\sum_{k=1}^{N}\lvert X(k)\rvert^{2}\,H_{m}(k)\right),\qquad 1\le m\le M$$
where m is the serial number of the filter in the filter bank, M is the total number of filters in the filter bank, generally 22 to 26 (M = 24 in the invention); |X(k)|² represents the power at the k-th spectral line, and H_m(k) represents the transfer function of the m-th filter at the k-th spectral line.
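Applying the filter bank and taking the logarithm, per the S(m) formula above, might look like this (the small floor before the logarithm is an implementation assumption to avoid log(0)):

```python
import numpy as np

def log_mel_energies(pspec, fbank):
    """S(m) for every frame: log of the filter-weighted power sums.

    pspec: power spectra, shape (T, nfft // 2 + 1)
    fbank: output of mel_filterbank(); M = 24 filters in the patent
    """
    energies = pspec @ fbank.T          # sum_k |X(k)|^2 * H_m(k)
    return np.log(np.maximum(energies, 1e-12))
```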
(3) A discrete cosine transform is applied to the logarithmic Mel power spectrum of each frame to decorrelate its energy, eliminating the correlation between the signal dimensions and mapping the signal to a low-dimensional space, so as to obtain the corresponding MFCC coefficients C(l):
$$C(l)=\sqrt{\frac{2}{M}}\sum_{m=1}^{M}S(m)\cos\!\left(\frac{\pi l\,(m-0.5)}{M}\right),\qquad l=1,2,\dots,L$$
where L is the total order of the MFCC coefficients, usually 12 to 18 (the invention takes L = 15); l takes values from 1 to L and denotes the l-th order of the MFCC coefficients.
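A sketch of this DCT decorrelation step; the sqrt(2/M) scale factor is a common MFCC convention and an assumption here:

```python
import numpy as np

def mfcc_from_log_energies(S, num_ceps=15):
    """C(l) for l = 1..L via the discrete cosine transform of the log Mel
    energies S, keeping L = 15 coefficients as in the patent."""
    T, M = S.shape
    m = np.arange(1, M + 1)
    C = np.empty((T, num_ceps))
    for l in range(1, num_ceps + 1):
        basis = np.cos(np.pi * l * (m - 0.5) / M)
        C[:, l - 1] = np.sqrt(2.0 / M) * (S * basis).sum(axis=1)
    return C
```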
Step 3: substituting the MFCC coefficients of the target voice data into the dynamic voiceprint feature extraction model fused with the static component, acquiring the MFCC dynamic feature difference parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint features of the target voice data.
In step 3, the dynamic voiceprint feature extraction model fused with the static component is constructed according to the following method:
the essence of the dynamic feature extraction is the MFCC coefficient difference mode, that is, when the MFCC coefficient difference parameter of the t-th frame is calculated, the parameters of the t-1-th frame and the t + 1-th frame are used for carrying out the downsampling. Therefore, the classical dynamic feature extraction formula is as follows:
$$d(l,t)=\frac{\sum_{j=1}^{J}j\,\bigl(C(l,t+j)-C(l,t-j)\bigr)}{2\sum_{j=1}^{J}j^{2}}$$
where J is the difference step length, usually taken as 1 or 2, corresponding respectively to the first-order and second-order MFCC coefficient difference parameters, and j takes the values 1 ≤ j ≤ J; l is the order of the Mel cepstrum coefficients; t is the frame number and T is the total number of frames of a section of audio; C(l, t) is the t-th parameter of the l-th order of the Mel cepstrum coefficient matrix of the voice signal; and d(l, t) is the MFCC dynamic feature parameter.
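A sketch of this classical difference computation; padding the edge frames by repetition is an assumed convention, since the patent does not state how boundary frames are handled:

```python
import numpy as np

def delta(C, J=2):
    """Classical MFCC difference parameters:
    d(l, t) = sum_j j * (C(l, t+j) - C(l, t-j)) / (2 * sum_j j^2)."""
    T = C.shape[0]
    padded = np.pad(C, ((J, J), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2.0 * sum(j * j for j in range(1, J + 1))
    d = np.zeros_like(C, dtype=float)
    for j in range(1, J + 1):
        d += j * (padded[J + j : J + j + T] - padded[J - j : J - j + T])
    return d / denom
```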
The new dynamic voiceprint feature Mel-frequency cepstrum coefficient formula proposed by the invention is:
$$\mathrm{MFCC}_{\mathrm{new}}=\alpha\cdot\mathrm{MFCC}+\beta\cdot\Delta\mathrm{MFCC}$$
the modification is as follows:
$$\mathrm{MFCC}_{\mathrm{new}}=\alpha\left(\mathrm{MFCC}+\delta\cdot\Delta\mathrm{MFCC}\right)$$
where $\mathrm{MFCC}_{\mathrm{new}}$ is the dynamic voiceprint feature proposed by the invention, MFCC is the static voiceprint feature, ΔMFCC is the classical dynamic voiceprint feature (the difference dynamic parameter), α is the static feature coefficient, β is the dynamic feature coefficient, and δ is the ratio of the dynamic feature coefficient to the static feature coefficient.
The α and δ values are determined according to the following method:
assuming that α is 1, the optimum value of the ratio δ of the dynamic coefficient to the static coefficient is determined by experiment.
The number of Gaussian mixture components in the experiment was set to 64, and the voice data of 100 persons (50 women and 50 men) was selected from the TIMIT corpus as the experimental voice data. The voice data of 60 persons was used as training data for the UBM model, combining 10 segments of each person's voice into 10 seconds of speech. The UBM model parameters were obtained and stored; then 5 segments of speech from each of the remaining 40 persons were combined into 10 seconds of voice data to train the GMM model of each specific speaker, and the resulting model parameters were stored. The remaining voice data of these 40 persons was cut into 10 segments of 5 seconds each and matched against the system. The complete test process comprises 400 speaker acceptance test trials and 15600 speaker rejection test trials, and the equal error rate is obtained as the output of one experiment.
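For reference, the equal error rate reported by each experiment can be computed from the acceptance-test and rejection-test scores roughly as follows (a simple threshold sweep over numpy score arrays; the GMM-UBM scoring itself is outside this sketch):

```python
import numpy as np

def equal_error_rate(accept_scores, reject_scores):
    """EER: the operating point where the rate of falsely rejected true
    speakers equals the rate of falsely accepted impostors."""
    thresholds = np.sort(np.concatenate([accept_scores, reject_scores]))
    eer, best_gap = 1.0, np.inf
    for th in thresholds:
        frr = np.mean(accept_scores < th)    # true speaker rejected
        far = np.mean(reject_scores >= th)   # impostor accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer
```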
For the voiceprint features obtained from the voice data, each section of test voice generates multiple frames of voice segments. The MFCC order is set to 15, so one frame of voice data generates 15 MFCC coefficients and, after calculation, 15 dynamic feature coefficients, giving 30 coefficients per frame after combination. The sampling frequency in the experiment was 16 kHz, and the frame shift was half the frame length.
According to the experimental conditions, δ takes 5 different values, and 5 experiments are carried out respectively, obtaining the average equal error rate data shown in Table 1:
TABLE 1
(Table 1: average equal error rates for the five tested values of δ; presented as an image in the original publication.)
Based on the data shown in Table 1, the curve of the average equal error rate against the ratio δ of the dynamic feature to the static feature can be obtained, as shown in FIG. 2.
As can be seen from FIG. 2, when δ is 1 the average equal error rate is the lowest, so the optimal value of the ratio δ of the dynamic feature to the static feature is 1.
Accordingly, the dynamic voiceprint feature Mel-frequency cepstrum coefficient formula proposed by the invention becomes:
$$\mathrm{MFCC}_{\mathrm{new}}=\alpha\left(\mathrm{MFCC}+\Delta\mathrm{MFCC}\right)$$
according to the experimental conditions, α takes 5 different values, and 5 experiments are performed respectively to obtain average equal error rate data as shown in table 2:
TABLE 2
(Table 2: average equal error rates for the five tested values of α; presented as an image in the original publication.)
Based on the data shown in Table 2, the curve of the average equal error rate against the static feature coefficient α can be obtained, as shown in FIG. 3.
As can be seen from FIG. 3, when α is 0.5 the average equal error rate is the lowest, so the optimal value of the static feature coefficient is 0.5.
Accordingly, the dynamic voiceprint feature Mel-frequency cepstrum coefficient formula proposed by the invention becomes:
$$\mathrm{MFCC}_{\mathrm{new}}=0.5\left(\mathrm{MFCC}+\Delta\mathrm{MFCC}\right)$$
Formula (5) gives the dynamic feature parameter, namely ΔMFCC = d(l, t); MFCC is the static feature parameter, namely MFCC = C(l, t). The two are added with a weight of 0.5 each, so as to obtain the dynamic feature extraction formula fused with the static component:
$$d(l,t)=0.5\,C(l,t)+0.5\cdot\frac{\sum_{k=1}^{K}k\,\bigl(C(l,t+k)-C(l,t-k)\bigr)}{2\sum_{k=1}^{K}k^{2}}$$
and (5) arranging to obtain a dynamic feature extraction formula fused with the static component:
$$d(l,t)=\frac{1}{2}\left[C(l,t)+\frac{\sum_{k=1}^{K}k\,\bigl(C(l,t+k)-C(l,t-k)\bigr)}{2\sum_{k=1}^{K}k^{2}}\right]$$
namely, the constructed dynamic voiceprint feature extraction model fused into the static component is as follows:
$$d(l,t)=\frac{1}{2}\left[C(l,t)+\frac{\sum_{k=1}^{K}k\,\bigl(C(l,t+k)-C(l,t-k)\bigr)}{2\sum_{k=1}^{K}k^{2}}\right]$$
where d(l, t) is the extraction result of the l-th order dynamic voiceprint feature of the t-th frame of voice data, and constitutes the t-th element of the l-th order in the MFCC dynamic feature difference parameter matrix of the target voice data, that is, d(l, t) is the t-th parameter of the l-th order of the MFCC dynamic feature difference parameter matrix; C(l, t) is the t-th parameter of the l-th order in the MFCC coefficients, C(l, t+1) is the (t+1)-th parameter of the l-th order, C(l, t+k) is the (t+k)-th parameter of the l-th order, and C(l, t-k) is the (t-k)-th parameter of the l-th order; k is the frequency ordinal after the Fourier transform of the t-th frame of voice data, and K is the preset total step length of the Fourier transform of the t-th frame of voice data.
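Since the constructed model is a weighted sum of the static coefficients and the classical difference, a sketch on top of the delta() helper given earlier is direct (K = 2 is an assumed total step length):

```python
def fused_dynamic_feature(C, K=2):
    """d(l, t) = 0.5 * C(l, t) + 0.5 * classical difference of C,
    i.e. the dynamic voiceprint feature with the static component fused in."""
    return 0.5 * C + 0.5 * delta(C, J=K)
```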
For the constructed dynamic voiceprint feature extraction model fused with the static component, the following formula is adopted:
$$C(l,t)=\sqrt{\frac{2}{M}}\sum_{m=1}^{M}S(m)\cos\!\left(\frac{\pi l\,(m-0.5)}{M}\right),\qquad l=1,2,\dots,L$$
the l-th order characteristic coefficient C(l, t) of the t-th frame of voice data in the MFCC coefficients is obtained;
where L is the order of the MFCC coefficients, m is the serial number of the Mel filter, and S(m) is the logarithmic energy output by the m-th Mel filter.
According to the following formula:
$$S(m)=\ln\!\left(\sum_{k=1}^{N}\lvert X(k)\rvert^{2}\,H_{m}(k)\right),\qquad 1\le m\le M$$
the logarithmic energy S(m) output by the m-th Mel filter is obtained;
where M represents the total number of filters in the Mel filter bank, N represents the data length of the t-th frame of voice data, |X(k)|² represents the power corresponding to the k-th frequency, and H_m(k) represents the transfer function of the m-th Mel filter at the k-th frequency.
Based on the above model and method, the static feature parameters can first be calculated from parameters such as the Mel cepstrum coefficient matrix and the audio duration, and the dynamic feature parameters fused with the static component can then be calculated for voiceprint recognition.
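Putting the sketches above together, a hypothetical end-to-end run of the feature extraction (all sizes are assumptions: 16 kHz audio, 512-sample frames with half-frame shift, M = 24 filters, L = 15 coefficients) could look like:

```python
import numpy as np

fs = 16000
x = np.random.randn(fs * 3)                      # stand-in for 3 s of speech

frames = frame_signal(x, frame_len=512, frame_shift=256)
pspec = power_spectrum(frames, nfft=512)
fbank = mel_filterbank(num_filters=24, nfft=512, sample_rate=fs)
S = log_mel_energies(pspec, fbank)
C = mfcc_from_log_energies(S, num_ceps=15)       # static MFCC, 15 per frame
d = fused_dynamic_feature(C, K=2)                # fused dynamic feature
features = np.hstack([C, d])                     # 30 coefficients per frame
```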
In voiceprint recognition algorithms, a Gaussian mixture model (GMM) and a universal background model (UBM) are commonly used to model the voiceprint features of a speaker; the main steps are inputting the training voice of the Gaussian mixture model, voice preprocessing, voiceprint feature extraction, inputting the universal background model parameters, constructing the Gaussian mixture model, and storing the Gaussian mixture model parameters. Generally, the classical dynamic feature extraction algorithm is adopted in the voiceprint feature extraction step; the invention improves this step by fusing a static component into the calculation of the dynamic feature extraction parameters, which improves the performance of the voiceprint recognition algorithm.
The above description is only a preferred embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be considered as the protection scope of the present invention.

Claims (5)

1. A dynamic voiceprint feature extraction method fused with static components is used for carrying out voiceprint feature extraction on target voice data, and is characterized by comprising the following steps:
step 1: preprocessing the target voice data to obtain preprocessed target voice data;
step 2: processing the preprocessed target voice by using a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data;
step 3: substituting the MFCC coefficients of the target voice data into the dynamic voiceprint feature extraction model merged into the static component, acquiring the MFCC dynamic feature difference parameter matrix of the target voice data, and defining the matrix as the dynamic voiceprint features of the target voice data.
2. The method for extracting dynamic voiceprint features merged into a static component according to claim 1, wherein in step 1, the method for preprocessing the target voice data comprises: dividing the target voice data into T frames to acquire multiple frames of voice data;
in step 2, the method for processing the preprocessed target voice by using the Fourier transform and the Mel filter bank comprises the following steps:
processing each frame of voice data by using the Fourier transform to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into the Mel filter bank to obtain the MFCC coefficients of each frame of voice data, namely the MFCC coefficients of the target voice data.
3. The method according to claim 2, wherein in step 3, the dynamic voiceprint feature extraction model merged into the static component is:
$$d(l,t)=\frac{1}{2}\left[C(l,t)+\frac{\sum_{k=1}^{K}k\,\bigl(C(l,t+k)-C(l,t-k)\bigr)}{2\sum_{k=1}^{K}k^{2}}\right]$$
where d(l, t) is the extraction result of the l-th order dynamic voiceprint feature of the t-th frame of voice data and constitutes the t-th element of the l-th order in the MFCC dynamic feature difference parameter matrix of the target voice data; C(l, t) is the t-th parameter of the l-th order in the MFCC coefficients, C(l, t+1) is the (t+1)-th parameter of the l-th order, C(l, t+k) is the (t+k)-th parameter of the l-th order, and C(l, t-k) is the (t-k)-th parameter of the l-th order; k is the frequency ordinal after the Fourier transform of the t-th frame of voice data, and K is the preset total step length of the Fourier transform of the t-th frame of voice data.
4. The method for extracting dynamic voiceprint features merged into a static component according to claim 3, wherein, according to the following formula:
$$C(l,t)=\sqrt{\frac{2}{M}}\sum_{m=1}^{M}S(m)\cos\!\left(\frac{\pi l\,(m-0.5)}{M}\right),\qquad l=1,2,\dots,L$$
the l-th order characteristic coefficient C(l, t) of the t-th frame of voice data in the MFCC coefficients is acquired;
where L is the order of the MFCC coefficients, m is the serial number of the Mel filter, and S(m) is the logarithmic energy output by the m-th Mel filter.
5. The method for extracting dynamic voiceprint features merged into a static component according to claim 4, wherein, according to the following formula:
$$S(m)=\ln\!\left(\sum_{k=1}^{N}\lvert X(k)\rvert^{2}\,H_{m}(k)\right),\qquad 1\le m\le M$$
the logarithmic energy S(m) output by the m-th Mel filter is obtained;
where M represents the total number of filters in the Mel filter bank, N represents the data length of the t-th frame of voice data, |X(k)|² represents the power corresponding to the k-th frequency, and H_m(k) represents the transfer function of the m-th Mel filter at the k-th frequency.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110257723.XA CN112951245B (en) 2021-03-09 2021-03-09 Dynamic voiceprint feature extraction method integrated with static component


Publications (2)

Publication Number Publication Date
CN112951245A true CN112951245A (en) 2021-06-11
CN112951245B CN112951245B (en) 2023-06-16

Family

ID=76228612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110257723.XA Active CN112951245B (en) 2021-03-09 2021-03-09 Dynamic voiceprint feature extraction method integrated with static component

Country Status (1)

Country Link
CN (1) CN112951245B (en)


Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1246745A (en) * 1985-03-25 1988-12-13 Melvyn J. Hunt Man/machine communications system using formant based speech analysis and synthesis
CA2158847A1 (en) * 1993-03-25 1994-09-29 Mark Pawlewski A Method and Apparatus for Speaker Recognition
KR100779242B1 (en) * 2006-09-22 2007-11-26 (주)한국파워보이스 Speaker recognition methods of a speech recognition and speaker recognition integrated system
CN102290048A (en) * 2011-09-05 2011-12-21 南京大学 Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference
CN102982803A (en) * 2012-12-11 2013-03-20 华南师范大学 Isolated word speech recognition method based on HRSF and improved DTW algorithm
US20170365259A1 (en) * 2015-02-05 2017-12-21 Beijing D-Ear Technologies Co., Ltd. Dynamic password voice based identity authentication system and method having self-learning function
CN104835498A (en) * 2015-05-25 2015-08-12 重庆大学 Voiceprint identification method based on multi-type combination characteristic parameters
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
US20200135171A1 (en) * 2017-02-28 2020-04-30 National Institute Of Information And Communications Technology Training Apparatus, Speech Synthesis System, and Speech Synthesis Method
CN107610708A (en) * 2017-06-09 2018-01-19 平安科技(深圳)有限公司 Identify the method and apparatus of vocal print
CN107993663A (en) * 2017-09-11 2018-05-04 北京航空航天大学 A kind of method for recognizing sound-groove based on Android
CN109256138A (en) * 2018-08-13 2019-01-22 平安科技(深圳)有限公司 Auth method, terminal device and computer readable storage medium
CN108847244A (en) * 2018-08-22 2018-11-20 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Voiceprint recognition method and system based on MFCC and improved BP neural network
CN110428841A (en) * 2019-07-16 2019-11-08 河海大学 A kind of vocal print dynamic feature extraction method based on random length mean value
CN111489763A (en) * 2020-04-13 2020-08-04 武汉大学 Adaptive method for speaker recognition in complex environment based on GMM model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
岳倩倩; 周萍; 景新幸: "Research on an Auditory Feature Extraction Algorithm Based on a Nonlinear Power Function", 微电子学与计算机 [Microelectronics & Computer], no. 06 *
申小虎; 万荣春; 张新野: "A Speaker Speech Recognition *** with Improved Dynamic Feature Parameters", 计算机仿真 [Computer Simulation], no. 04 *
赵青; 成谢锋; 朱冬梅: "Cough Sound Identity Recognition Based on Improved MFCC and Short-Time Energy", 计算机技术与发展 [Computer Technology and Development], no. 06 *
郭春霞: "Research on Speaker Recognition Algorithms", 西安邮电学院学报 [Journal of Xi'an University of Posts and Telecommunications], no. 05 *
魏丹芳; 李应: "Environmental Sound Classification Based on MFCC and Weighted Dynamic Feature Combination", 计算机与数字工程 [Computer & Digital Engineering], no. 02 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689863A (en) * 2021-09-24 2021-11-23 广东电网有限责任公司 Voiceprint feature extraction method, device, equipment and storage medium
CN113689863B (en) * 2021-09-24 2024-01-16 广东电网有限责任公司 Voiceprint feature extraction method, voiceprint feature extraction device, voiceprint feature extraction equipment and storage medium
CN115762529A (en) * 2022-10-17 2023-03-07 国网青海省电力公司海北供电公司 Method for preventing cable from being broken outside by using voice recognition perception algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant