CN103714818A - Speaker recognition method based on noise-masking kernel - Google Patents

Speaker recognition method based on noise-masking kernel

Info

Publication number
CN103714818A
Authority
CN
China
Prior art keywords
GMM, noise, short, Gaussian, noise shielding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310681894.0A
Other languages
Chinese (zh)
Other versions
CN103714818B (en)
Inventor
张卫强 (Zhang Weiqiang)
刘加 (Liu Jia)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201310681894.0A
Publication of CN103714818A
Application granted
Publication of CN103714818B
Active legal status
Anticipated expiration

Abstract

The invention discloses a speaker recognition method based on a noise-masking kernel in the field of speech signal processing. The method comprises the following steps: step 1, inputting audio data and extracting short-time features from the audio data frame by frame; step 2, training a GMM with M Gaussian mixture components on the short-time features of speech data, denoted the speech GMM; step 3, training a GMM with N Gaussian mixture components on the short-time features of noise data, denoted the noise GMM; step 4, splicing the speech GMM and the noise GMM into a mixed GMM; step 5, using the mixed GMM to generate a noise-masking supervector; and step 6, carrying out SVM training and testing with the generated noise-masking supervector to complete speaker training and recognition. The method automatically masks the noise contained in the audio, is simple to implement, and effectively improves speaker recognition performance under noisy conditions.

Description

Speaker recognition method based on noise-masking kernel
Technical field
The invention belongs to the field of speech signal processing, and in particular relates to a speaker recognition method based on a noise-masking kernel.
Background technology
Speaker recognition technology identifies a speaker's identity from speech and is widely applied in fields such as remote identity authentication and information security. At present, in the speaker recognition field, GSV-SVM (support vector machine based on Gaussian mixture model mean supervectors) is a commonly used method: it first uses a UBM (universal background model) to generate a GSV (Gaussian mixture model mean supervector), and then uses an SVM (support vector machine) to perform speaker recognition. This method is easily affected by noise. To address this problem, speech enhancement is generally performed at the front end, or channel compensation techniques are adopted during modeling. However, these approaches all require extra modules to handle the noise and are relatively complicated to implement.
Summary of the invention
In view of the problems in the above prior art, the present invention proposes a speaker recognition method based on a noise-masking kernel, characterized in that the method specifically comprises the following steps:
Step 1: input audio data and extract short-time features from the audio data frame by frame;
Step 2: train a GMM with M Gaussian mixture components on the short-time features of speech data, denoted the speech GMM;
Step 3: train a GMM with N Gaussian mixture components on the short-time features of noise data, denoted the noise GMM;
Step 4: splice the speech GMM and the noise GMM into a mixed GMM;
Step 5: generate a noise-masking supervector with the mixed GMM;
Step 6: use the generated noise-masking supervector for SVM training and testing, completing speaker training and recognition.
In said step 1, the short-time features are short-time cepstral features, the type of which is linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
In said step 1, the short-time features may also be short-time energy, short-time zero-crossing rate, or short-time autocorrelation coefficients.
In said steps 2 and 3, the GMM models are trained with the EM algorithm.
In said step 2, M is several hundred to several thousand; in said step 3, N is several tens to several hundred; and M is more than 10N.
In said step 4, the GMM splicing method is: let the speech GMM parameters be

$\{w_m^s, \mu_m^s, \Sigma_m^s\}, \quad m = 1, \dots, M$

and the noise GMM parameters be

$\{w_m^n, \mu_m^n, \Sigma_m^n\}, \quad m = 1, \dots, N$

where $w$ is a Gaussian mixture weight, $\mu$ is a Gaussian mixture mean vector, $\Sigma$ is a Gaussian mixture covariance matrix, the subscript $m$ indexes the mixture components, and the superscripts $s$ and $n$ denote speech and noise respectively. The parameters of the mixed GMM are:

$\{w_m, \mu_m, \Sigma_m\} = \begin{cases} \{\tfrac{1}{2} w_m^s,\ \mu_m^s,\ \Sigma_m^s\}, & m = 1, \dots, M \\ \{\tfrac{1}{2} w_{m-M}^n,\ \mu_{m-M}^n,\ \Sigma_{m-M}^n\}, & m = M+1, \dots, M+N \end{cases}$
In said step 5, the noise-masking supervector is generated by computing only the dimensions corresponding to the first M mixture components, masking the dimensions corresponding to noise.
In said step 5, the specific procedure for generating the noise-masking supervector is as follows:
Step 501: suppose the short-time cepstral features of an audio segment are $\{x_t,\ t = 1, \dots, T\}$, where $x_t$ is a frame feature vector, the subscript $t$ is the frame index, and $T$ is the total number of frames. Compute the posterior probability of each Gaussian mixture component frame by frame, for $t = 1, \dots, T$ and $m = 1, \dots, M$:

$\gamma_m(t) = \dfrac{w_m\, p_m(x_t)}{\sum_{m'=1}^{M+N} w_{m'}\, p_{m'}(x_t)}$

where $p_m(x_t)$ is the Gaussian probability density of the $m$-th mixture component, computed as

$p_m(x_t) = \dfrac{1}{(2\pi)^{D/2} |\Sigma_m|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x_t - \mu_m)^T \Sigma_m^{-1} (x_t - \mu_m) \right\}$

with $D$ the feature dimension;
Step 502: compute the updated mean vector of each Gaussian mixture component, for $m = 1, \dots, M$:

$\xi_m = \dfrac{\sum_{t=1}^{T} \gamma_m(t)\, x_t}{\sum_{t=1}^{T} \gamma_m(t)};$

Step 503: normalize each updated mean vector using the GMM weights and covariances, for $m = 1, \dots, M$:

$\xi_m' = w_m\, \Sigma_m^{-1/2}\, \xi_m;$

Step 504: concatenate the M normalized vectors to produce the noise-masking supervector:

$\zeta = \begin{bmatrix} \xi_1' \\ \xi_2' \\ \vdots \\ \xi_M' \end{bmatrix}$
The kernel function for SVM training and testing is a linear kernel.
The beneficial effects of the invention are: the noise-masking supervector automatically masks the noise contained in the audio, and the processing follows the framework of the GSV-SVM method, so the method is simple to implement. With this method, speaker recognition performance under noisy conditions can be effectively improved.
Brief description of the drawings
Fig. 1 is a flowchart of training the mixed GMM in the present invention;
Fig. 2 is a flowchart of generating the noise-masking supervector in the present invention.
Embodiment
The preferred embodiment is described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope or application of the invention.
Fig. 1 is a flowchart of training the mixed GMM provided by the invention. The method specifically comprises the following steps:
Step 1: extract short-time features from the audio data frame by frame.
The short-time features may be short-time cepstral features; the extraction methods (as described in speech signal processing textbooks) yield feature types such as linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
The short-time features may also be short-time energy, short-time zero-crossing rate, short-time autocorrelation coefficients, and the like.
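As an illustration only, MFCC features of the kind mentioned above can be extracted frame by frame with an off-the-shelf library such as librosa; the library choice and the parameter values (sampling rate, frame length, number of coefficients) are assumptions for this sketch and are not specified by the patent.

```python
import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Frame-by-frame MFCC extraction (illustrative parameters)."""
    y, sr = librosa.load(wav_path, sr=sr)
    # 25 ms frames with a 10 ms hop are common choices, not mandated by the patent.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
    return mfcc.T  # shape (T, D): one short-time feature vector x_t per row
```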
Step 2: train a GMM with M Gaussian mixture components on the short-time features of speech data, denoted the speech GMM. M is generally several hundred to several thousand; typical values are 2048, 1024, and 512.
Step 3: train a GMM with N Gaussian mixture components on the short-time features of noise data, denoted the noise GMM. N is generally several tens to several hundred; typical values are 128, 64, and 32.
The GMM training method (as described in speech signal processing textbooks) is the EM (expectation-maximization) algorithm.
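A minimal sketch of steps 2 and 3, assuming scikit-learn's GaussianMixture (which fits GMMs by EM) with diagonal covariances; the component counts in the commented example follow the typical values listed above but are otherwise illustrative.

```python
from sklearn.mixture import GaussianMixture

def train_gmm(features, n_components):
    """Fit a diagonal-covariance GMM by EM on stacked frame features of shape (T, D)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag',
                          max_iter=100, random_state=0)
    gmm.fit(features)
    return gmm

# Step 2: speech GMM with M components; step 3: noise GMM with N components, e.g.
# speech_gmm = train_gmm(speech_feats, n_components=1024)   # M
# noise_gmm  = train_gmm(noise_feats,  n_components=64)     # N
```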
Step 4: splice the speech GMM and the noise GMM into a mixed GMM. The specific splicing method is as follows: let the speech GMM parameters be $\{w_m^s, \mu_m^s, \Sigma_m^s\},\ m = 1, \dots, M$, and the noise GMM parameters be $\{w_m^n, \mu_m^n, \Sigma_m^n\},\ m = 1, \dots, N$, where $w$ is a Gaussian mixture weight, $\mu$ is a Gaussian mixture mean vector, $\Sigma$ is a Gaussian mixture covariance matrix, the subscript $m$ indexes the mixture components, and the superscripts $s$ and $n$ denote speech and noise respectively. The parameters of the mixed GMM are:

$\{w_m, \mu_m, \Sigma_m\} = \begin{cases} \{\tfrac{1}{2} w_m^s,\ \mu_m^s,\ \Sigma_m^s\}, & m = 1, \dots, M \\ \{\tfrac{1}{2} w_{m-M}^n,\ \mu_{m-M}^n,\ \Sigma_{m-M}^n\}, & m = M+1, \dots, M+N \end{cases}$
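A minimal sketch of the splicing in step 4, assuming the scikit-learn GMMs from the previous sketch with diagonal covariances: both weight sets are halved and the weights, means, and covariances are concatenated, with the speech components occupying the first M positions.

```python
import numpy as np

def splice_gmms(speech_gmm, noise_gmm):
    """Mixed-GMM parameters: speech components first, then noise, weights halved."""
    weights = np.concatenate([0.5 * speech_gmm.weights_,   # (1/2) w^s
                              0.5 * noise_gmm.weights_])   # (1/2) w^n
    means = np.concatenate([speech_gmm.means_, noise_gmm.means_], axis=0)
    covars = np.concatenate([speech_gmm.covariances_, noise_gmm.covariances_], axis=0)
    M = speech_gmm.n_components  # only these first M components enter the supervector
    return weights, means, covars, M
```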
Step 5: generate a Gaussian mixture model mean supervector with the mixed GMM, but compute only the dimensions corresponding to the first M mixture components, masking the dimensions corresponding to noise; the result is called the noise-masking supervector.
The specific flow of generating the noise-masking supervector is shown in Fig. 2 and comprises the following steps:
Step 501: suppose the short-time cepstral features of an audio segment are $\{x_t,\ t = 1, \dots, T\}$, where $x_t$ is a frame feature vector, the subscript $t$ is the frame index, and $T$ is the total number of frames. Compute the posterior probability of each Gaussian mixture component frame by frame, for $t = 1, \dots, T$ and $m = 1, \dots, M$:

$\gamma_m(t) = \dfrac{w_m\, p_m(x_t)}{\sum_{m'=1}^{M+N} w_{m'}\, p_{m'}(x_t)}$

where $p_m(x_t)$ is the Gaussian probability density of the $m$-th mixture component, computed as

$p_m(x_t) = \dfrac{1}{(2\pi)^{D/2} |\Sigma_m|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x_t - \mu_m)^T \Sigma_m^{-1} (x_t - \mu_m) \right\}$

with $D$ the feature dimension;
Step 502: compute the updated mean vector of each Gaussian mixture component, for $m = 1, \dots, M$:

$\xi_m = \dfrac{\sum_{t=1}^{T} \gamma_m(t)\, x_t}{\sum_{t=1}^{T} \gamma_m(t)};$

Step 503: normalize each updated mean vector using the GMM weights and covariances, for $m = 1, \dots, M$:

$\xi_m' = w_m\, \Sigma_m^{-1/2}\, \xi_m;$

Step 504: concatenate the M normalized vectors to produce the noise-masking supervector:

$\zeta = \begin{bmatrix} \xi_1' \\ \xi_2' \\ \vdots \\ \xi_M' \end{bmatrix}$
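A minimal sketch of steps 501-504, assuming the diagonal-covariance mixed-GMM parameters returned by the splicing sketch above; scipy's multivariate_normal supplies the Gaussian density, and the normalization follows the formula given in step 503.

```python
import numpy as np
from scipy.stats import multivariate_normal

def masking_supervector(weights, means, covars, M, feats):
    """Steps 501-504: noise-masking supervector from frame features of shape (T, D).

    weights/means/covars describe the mixed GMM (M speech + N noise components);
    covars holds diagonal covariances as variance vectors, one per component."""
    T, K = feats.shape[0], len(weights)          # K = M + N
    lik = np.empty((T, K))
    for m in range(K):                           # w_m * p_m(x_t) for every component
        lik[:, m] = weights[m] * multivariate_normal.pdf(feats, mean=means[m],
                                                         cov=np.diag(covars[m]))
    gamma = lik / lik.sum(axis=1, keepdims=True) # step 501: posteriors over all M+N components
    pieces = []
    for m in range(M):                           # noise dimensions are simply never computed
        xi_m = (gamma[:, m:m + 1] * feats).sum(axis=0) / gamma[:, m].sum()   # step 502
        pieces.append(weights[m] * xi_m / np.sqrt(covars[m]))                # step 503
    return np.concatenate(pieces)                # step 504: supervector of length M * D
```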
Finally, the generated noise-masking supervector is used for SVM training and testing to complete speaker training and recognition.
In the present invention, because the mixed GMM contains noise mixture components, these components automatically absorb noise. When noise is encountered, the Gaussian probability densities of the noise components become larger while those of the speech components become smaller, which makes the posterior probabilities of the speech components in step 501 smaller and hence reduces their contribution to the vectors in step 502, thereby achieving noise masking.
Step 6: use the generated noise-masking supervector for SVM training and testing, completing speaker training and recognition.
The SVM training and testing methods are described in standard pattern recognition textbooks; the kernel function is a linear kernel.
Using the noise-masking supervector for SVM training and testing can effectively improve speaker recognition performance under noisy conditions.
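As an illustration, step 6 can be realized with scikit-learn's linear-kernel SVM; the one-vs-rest setup and the parameter C=1.0 are assumptions for this sketch, not values specified by the patent.

```python
import numpy as np
from sklearn.svm import SVC

def train_speaker_svm(supervectors, speaker_labels):
    """Step 6: linear-kernel SVM trained on noise-masking supervectors."""
    clf = SVC(kernel='linear', C=1.0, decision_function_shape='ovr')
    clf.fit(np.vstack(supervectors), speaker_labels)
    return clf

# Recognition: score a test utterance's supervector against the enrolled speakers, e.g.
# scores  = clf.decision_function(test_supervector.reshape(1, -1))
# speaker = clf.predict(test_supervector.reshape(1, -1))
```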
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the scope of the claims.

Claims (9)

1. A speaker recognition method based on a noise-masking kernel, characterized in that the method specifically comprises the following steps:
Step 1: input audio data and extract short-time features from the audio data frame by frame;
Step 2: train a GMM with M Gaussian mixture components on the short-time features of speech data, denoted the speech GMM;
Step 3: train a GMM with N Gaussian mixture components on the short-time features of noise data, denoted the noise GMM;
Step 4: splice the speech GMM and the noise GMM into a mixed GMM;
Step 5: generate a noise-masking supervector with the mixed GMM;
Step 6: use the generated noise-masking supervector for SVM training and testing, completing speaker training and recognition.
2. The speaker recognition method based on a noise-masking kernel according to claim 1, characterized in that, in said step 1, the short-time features are short-time cepstral features, the type of which is linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), or perceptual linear prediction (PLP) coefficients.
3. The speaker recognition method based on a noise-masking kernel according to claim 1 or 2, characterized in that, in said step 1, the short-time features may also be short-time energy, short-time zero-crossing rate, or short-time autocorrelation coefficients.
4. The speaker recognition method based on a noise-masking kernel according to claim 1, characterized in that, in said steps 2 and 3, the GMM models are trained with the EM algorithm.
5. The speaker recognition method based on a noise-masking kernel according to claim 1, characterized in that, in said step 2, M is several hundred to several thousand; in said step 3, N is several tens to several hundred; and M is more than 10N.
6. The speaker recognition method based on a noise-masking kernel according to claim 1, characterized in that, in said step 4, the GMM splicing method is: let the speech GMM parameters be $\{w_m^s, \mu_m^s, \Sigma_m^s\},\ m = 1, \dots, M$, and the noise GMM parameters be $\{w_m^n, \mu_m^n, \Sigma_m^n\},\ m = 1, \dots, N$, where $w$ is a Gaussian mixture weight, $\mu$ is a Gaussian mixture mean vector, $\Sigma$ is a Gaussian mixture covariance matrix, the subscript $m$ indexes the mixture components, and the superscripts $s$ and $n$ denote speech and noise respectively; the parameters of the mixed GMM are:

$\{w_m, \mu_m, \Sigma_m\} = \begin{cases} \{\tfrac{1}{2} w_m^s,\ \mu_m^s,\ \Sigma_m^s\}, & m = 1, \dots, M \\ \{\tfrac{1}{2} w_{m-M}^n,\ \mu_{m-M}^n,\ \Sigma_{m-M}^n\}, & m = M+1, \dots, M+N \end{cases}$
7. The speaker recognition method based on a noise-masking kernel according to claim 1, characterized in that, in said step 5, the noise-masking supervector is generated by computing only the dimensions corresponding to the first M mixture components, masking the dimensions corresponding to noise.
8. The speaker recognition method based on a noise-masking kernel according to claim 1 or 7, characterized in that, in said step 5, the specific procedure for generating the noise-masking supervector is as follows:
Step 501: suppose the short-time cepstral features of an audio segment are $\{x_t,\ t = 1, \dots, T\}$, where $x_t$ is a frame feature vector, the subscript $t$ is the frame index, and $T$ is the total number of frames; compute the posterior probability of each Gaussian mixture component frame by frame, for $t = 1, \dots, T$ and $m = 1, \dots, M$:

$\gamma_m(t) = \dfrac{w_m\, p_m(x_t)}{\sum_{m'=1}^{M+N} w_{m'}\, p_{m'}(x_t)}$

where $p_m(x_t)$ is the Gaussian probability density of the $m$-th mixture component, computed as

$p_m(x_t) = \dfrac{1}{(2\pi)^{D/2} |\Sigma_m|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x_t - \mu_m)^T \Sigma_m^{-1} (x_t - \mu_m) \right\};$

Step 502: compute the updated mean vector of each Gaussian mixture component, for $m = 1, \dots, M$:

$\xi_m = \dfrac{\sum_{t=1}^{T} \gamma_m(t)\, x_t}{\sum_{t=1}^{T} \gamma_m(t)};$

Step 503: normalize each updated mean vector using the GMM weights and covariances, for $m = 1, \dots, M$:

$\xi_m' = w_m\, \Sigma_m^{-1/2}\, \xi_m;$

Step 504: concatenate the M normalized vectors to produce the noise-masking supervector:

$\zeta = \begin{bmatrix} \xi_1' \\ \xi_2' \\ \vdots \\ \xi_M' \end{bmatrix}$
9. The speaker recognition method based on a noise-masking kernel according to claim 1, characterized in that the kernel function for SVM training and testing is a linear kernel.
CN201310681894.0A 2013-12-12 2013-12-12 Speaker recognition method based on noise-masking kernel Active CN103714818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310681894.0A CN103714818B (en) 2013-12-12 2013-12-12 Speaker recognition method based on noise-masking kernel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310681894.0A CN103714818B (en) 2013-12-12 2013-12-12 Speaker recognition method based on noise-masking kernel

Publications (2)

Publication Number Publication Date
CN103714818A true CN103714818A (en) 2014-04-09
CN103714818B CN103714818B (en) 2016-06-22

Family

ID=50407724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310681894.0A Active CN103714818B (en) 2013-12-12 2013-12-12 Speaker recognition method based on noise-masking kernel

Country Status (1)

Country Link
CN (1) CN103714818B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000075888A (en) * 1998-09-01 2000-03-14 Oki Electric Ind Co Ltd Learning method of hidden markov model and voice recognition system
CN1343968A (en) * 2000-09-18 2002-04-10 日本先锋公司 Speech identification system
CN1924998A (en) * 2005-08-29 2007-03-07 摩托罗拉公司 Method and system for verifying speakers
CN101241699A (en) * 2008-03-14 2008-08-13 北京交通大学 A speaker identification system for remote Chinese teaching
CN101640043A (en) * 2009-09-01 2010-02-03 清华大学 Speaker recognition method based on multi-coordinate sequence kernel and system thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448661A (en) * 2016-09-23 2017-02-22 华南理工大学 Audio type detection method based on pure voice and background noise two-level modeling
CN106448661B (en) * 2016-09-23 2019-07-16 华南理工大学 Audio type detection method based on two-level modeling of clean speech and background noise
CN107424248A (en) * 2017-04-13 2017-12-01 成都步共享科技有限公司 Voiceprint unlocking method for a shared bicycle

Also Published As

Publication number Publication date
CN103714818B (en) 2016-06-22

Similar Documents

Publication Publication Date Title
CN107103903B (en) Acoustic model training method and device based on artificial intelligence and storage medium
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
Xia et al. Using i-Vector Space Model for Emotion Recognition.
De Leon et al. Detection of synthetic speech for the problem of imposture
Yu et al. Uncertainty propagation in front end factor analysis for noise robust speaker recognition
CN107527620A (en) Electronic installation, the method for authentication and computer-readable recording medium
CN102799892B (en) Mel frequency cepstrum coefficient (MFCC) underwater target feature extraction and recognition method
CN104376842A (en) Neural network language model training method and device and voice recognition method
CN105161092B (en) A kind of audio recognition method and device
CN106104674A (en) Mixing voice identification
CN107705802A (en) Phonetics transfer method, device, electronic equipment and readable storage medium storing program for executing
CN104809103A (en) Man-machine interactive semantic analysis method and system
CN105845140A (en) Speaker confirmation method and speaker confirmation device used in short voice condition
CN104575519A (en) Feature extraction method and device as well as stress detection method and device
CN105023570B (en) A kind of method and system for realizing sound conversion
CN104123933A (en) Self-adaptive non-parallel training based voice conversion method
CN103985381A (en) Voice frequency indexing method based on parameter fusion optimized decision
CN110459232A (en) A kind of phonetics transfer method generating confrontation network based on circulation
CN103594084A (en) Voice emotion recognition method and system based on joint penalty sparse representation dictionary learning
CN102664010A (en) Robust speaker distinguishing method based on multifactor frequency displacement invariant feature
CN105845141A (en) Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness
Kumar et al. Significance of GMM-UBM based modelling for Indian language identification
Sheng et al. GANs for children: A generative data augmentation strategy for children speech recognition
CN103559289A (en) Language-irrelevant keyword search method and system
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20161201

Address after: 100084 Zhongguancun Haidian District East Road No. 1, building 8, floor 8, A803B,

Patentee after: Beijing Hua Kong Chuang Wei Information Technology Co., Ltd.

Address before: 100084 Beijing, Beijing, 100084-82 mailbox

Patentee before: Tsinghua University

CP02 Change in the address of a patent holder
CP02 Change in the address of a patent holder

Address after: 100084 Tsinghua University, Haidian District, Beijing

Patentee after: BEIJING HUA KONG CHUANG WEI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 100084 Zhongguancun Haidian District East Road No. 1, building 8, floor 8, A803B,

Patentee before: BEIJING HUA KONG CHUANG WEI INFORMATION TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200430

Address after: 100084 Beijing city Haidian District Shuangqing Road No. 30 box 100084-82

Patentee after: TSINGHUA University

Address before: 100084 Tsinghua University, Haiding District, Haidian District, Beijing

Patentee before: BEIJING HUA KONG CHUANG WEI INFORMATION TECHNOLOGY Co.,Ltd.