WO2004112001A1 - Gmm incremental robust adaptation with forgetting factor for speaker verification - Google Patents

Publication number
WO2004112001A1
WO2004112001A1 PCT/KR2003/001207 KR0301207W WO2004112001A1 WO 2004112001 A1 WO2004112001 A1 WO 2004112001A1 KR 0301207 W KR0301207 W KR 0301207W WO 2004112001 A1 WO2004112001 A1 WO 2004112001A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
speaker
adaptation
outlier
gmm
Prior art date
Application number
PCT/KR2003/001207
Other languages
French (fr)
Inventor
Ki Yong Lee
Jong Joo Lee
Youn Jeong Lee
Original Assignee
Kwangwoon Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kwangwoon Foundation
Priority to AU2003243026A (AU2003243026A1)
Priority to PCT/KR2003/001207 (WO2004112001A1)
Publication of WO2004112001A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Speaker recognition systems use a speaker model adaptation method with small amounts of data in order to obtain good performance. However, in a conventional adaptation method, new data containing an outlier such as noise or an utterance change results in an inaccurate speaker model. In addition, as time goes by, the rate at which new data are adapted to the model is reduced. The new method uses an incremental robust adaptation to reduce the effect of outliers and uses a forgetting factor to maintain the adaptation rate of new data.

Description

GMM INCREMENTAL ROBUST ADAPTATION WITH FORGETTING FACTOR FOR SPEAKER VERIFICATION
Technical Field
The present invention relates to a GMM incremental robust adaptation with a forgetting factor for speaker verification. More specifically, the present invention is robust to the effect of outliers such as utterance changes and noise, and provides a speaker recognition model having a high recognition rate by adapting new data to an existing speaker model at a uniform rate despite the lapse of time, so that the speaker model is adapted accurately.
The present invention relates to a method which registers a small amount of data in a speaker recognition model and adapts the model to new data to be tested. More specifically, the present invention relates to a method which reduces the effect of outliers and uses a forgetting factor to keep the rate at which new data is adapted to the existing model uniform, preventing that rate from being reduced.
Background Art
With the increasing use of the Internet and personal computers, the problem of personal information disclosure has grown significantly. A conventional ID and password are no longer adequate to prevent personal information from being disclosed. Thus, various biometric methods have been studied as authentication methods. Among them, speaker recognition is widely studied because users are familiar with it and can use it easily. Speaker recognition methods are divided into text-independent methods and text-dependent methods. A Gaussian Mixture Model (GMM) is mainly used for the text-independent method, and a Hidden Markov Model (HMM) is mainly used for the text-dependent method. In a GMM-based speaker recognition method, achieving a high recognition rate requires a great deal of data recorded over multiple sessions when the speaker model is first formed. In practical use, however, it is not feasible for a GMM-based speaker recognition method to receive a great deal of data recorded by a speaker. Thus, speaker verification adaptation has been proposed. Speaker verification adaptation forms a speaker model using a small amount of data during registration, and adapts the speaker model with the data received at every test. Speaker adaptation in the GMM is achieved as follows.
It is assumed that a model registered with N data (where N is an integer) is expressed by the following equation.
$$\theta_N = \{p_i^N, \mu_i^N, \Sigma_i^N\}, \quad i = 1, 2, \ldots, M$$
where $p_i^N$ is a mixture weight, $\mu_i^N$ is a mean, and $\Sigma_i^N$ is a variance. During testing, when the (N+1)-th data $y_{N+1}(t)$ is entered, the adaptation equations of the model are expressed by the following equations 1a, 1b, and 1c.
- Mixture weights: [Equation 1a]
(Equation 1a is an image in the original publication and is not reproduced here.)
- Means: [Equation 1b]
(Equation 1b is an image in the original publication and is not reproduced here.)
- Variance: [Equation 1c]
(Equation 1c is an image in the original publication and is not reproduced here.)
In the equations,
$$\gamma_i^N = \sum_{n=1}^{N} \sum_{t=1}^{T_n} p(i \mid y_n(t), \theta)$$
and $\gamma_i^N$ is a value stored during registration. When the (N+1)-th data is entered,
$$\gamma_i^{N+1} = \gamma_i^N + \sum_{t=1}^{T_{N+1}} p(i \mid y_{N+1}(t), \theta).$$
However, when an outlier occurs in newly entered data due to utterance changes or noise, the effect of the outlier is adapted into the model and changes the model inaccurately. In addition, as adaptation advances, $\gamma_i^N$ increases. Accordingly, after a certain time has elapsed, the newly entered data $y_{N+1}(t)$ has less and less effect on the adapted model.
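The shrinking influence of new data can be illustrated numerically. The sketch below does not reproduce the patent's Equation 1b, which is available only as an image. It assumes, purely for illustration, that a new session contributes to the model with a relative weight equal to its summed posterior divided by the accumulated occupancy. The function name, the registered occupancy of 100, and the constant per-session occupancy of 50 are all hypothetical.

```python
def new_data_share(gamma_registered, session_occupancy, num_sessions):
    """Relative weight of each new session when occupancy only accumulates.

    Assumption (not taken from the patent's image equations): the newest
    session's influence is session_occupancy / gamma after the update.
    """
    gamma = gamma_registered
    shares = []
    for _ in range(num_sessions):
        # gamma^{N+1} = gamma^N + sum_t p(i | y_{N+1}(t), theta)
        gamma += session_occupancy
        shares.append(session_occupancy / gamma)
    return shares


# Hypothetical numbers: registration occupancy 100, each test session adds 50.
shares = new_data_share(gamma_registered=100.0, session_occupancy=50.0, num_sessions=5)
print([round(s, 3) for s in shares])
```

Under these assumptions the share of the newest session falls from one third to one seventh over five sessions, which is the effect the paragraph above describes.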
Disclosure of Invention
Therefore, an object of the present invention is to provide a GMM incremental robust adaptation with a forgetting factor for speaker verification, which reduces the effect of an outlier using a robust method, and which uses a variable forgetting factor so that new data affects the model at more than a predetermined rate.
According to the present invention, there is provided a GMM incremental robust adaptation with a forgetting factor for speaker verification, which reduces the effect of an outlier using a Cauchy weight and lets the data influence the model uniformly during registration. The GMM incremental robust adaptation minimizes the influence of the data on the model when an outlier occurs during testing, and adapts new data to the model at a constant rate using a forgetting factor.
Brief Description of Drawings
The foregoing and other objects, features and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings in which:
FIG. 1 is a graph for showing an EER when an outlier does not occur;
FIG. 2 is a graph for showing an EER when an outlier occurs;
FIG. 3 is a block diagram showing a GMM system using an incremental robust adaptation with a forgetting factor method; and
FIG. 4 is a view for illustrating problems of a conventional adaptation method according to time.
Best Mode for Carrying Out the Invention
Hereinafter, a preferred embodiment of the present invention will be described with reference to FIG. 2.
- Registration
It is assumed that the N data used in registration are expressed by the following equation.
$$Y^N = \{Y_n,\ n = 1, \ldots, N\}, \quad Y_n = \{y_n(t),\ t = 1, \ldots, T_n\}$$
When an outlier exists in $Y^N$, a robust estimation method based on M-estimation, which makes the GMM estimation step reliable, is expressed by the following equation 2.
[Equation 2]
$$J = \sum_{n=1}^{N} \lambda^{N-n} \sum_{t=1}^{T_n} \rho\left[\log p(y_n(t) \mid \theta)\right]$$
where $\rho[\cdot]$ is a loss function that reduces the effect of the outlier. In equation 2, $p(y_n(t) \mid \theta)$ is the Gaussian mixture density, a weighted sum of M Gaussian densities. The robust GMM for a speaker model is defined by $\theta_N = \{p_i^N, \mu_i^N, \Sigma_i^N\}$, as in the conventional method. By minimizing $J$ with respect to each of $p_i$, $\mu_i$, and $\Sigma_i$, the re-estimation equations of the model are obtained. By setting $\partial J / \partial p_i = 0$, $\partial J / \partial \mu_i = 0$, and $\partial J / \partial \Sigma_i = 0$, the robust GMM re-estimation equations are expressed by the following equations 3a, 3b, and 3c.
- Mixture Weights [Equation 3a]
(Equation 3a is an image in the original publication and is not reproduced here.)
- Means [Equation 3b]
(Equation 3b is an image in the original publication and is not reproduced here.)
- Variance [Equation 3c]
(Equation 3c is an image in the original publication and is not reproduced here.)
In equation 3c, $p(i \mid y_n(t), \theta)$ is the a posteriori probability and $w_n(t)$ is a weight function defined in terms of $z_n(t) = \log p(y_n(t) \mid \theta)$ (the defining expressions are images in the original publication and are not reproduced here). In an embodiment of the present invention, Cauchy's weight function is used:
$$w_n(t) = \frac{1}{1 + z_n(t)^2 / \beta}$$
where $\beta$ is a scale parameter. In equations 3a, 3b, and 3c, when an outlier occurs, $z_n(t)$ takes a value of large magnitude, which lowers $w_n(t)$ and thereby reduces the outlier's effect on the model equations. $\lambda$ is a forgetting factor and is set to 1 during registration.
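To see how the Cauchy weight suppresses an outlier frame, the following minimal sketch evaluates $w = 1 / (1 + z^2 / \beta)$ for a typical frame and for an outlying frame. As a simplifying assumption, a single one-dimensional Gaussian stands in for the Gaussian mixture density, and the value $\beta = 10$ is hypothetical; the patent does not state the scale parameter it used.

```python
import math


def gaussian_log_density(y, mean, var):
    """Log of a 1-D Gaussian density; a stand-in for log p(y_n(t) | theta)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def cauchy_weight(z, beta):
    """Cauchy weight w = 1 / (1 + z^2 / beta) for a frame log-likelihood z."""
    return 1.0 / (1.0 + z * z / beta)


BETA = 10.0  # hypothetical scale parameter, chosen only for illustration

z_typical = gaussian_log_density(0.0, mean=0.0, var=1.0)  # frame at the mean
z_outlier = gaussian_log_density(6.0, mean=0.0, var=1.0)  # frame six std devs away

w_typical = cauchy_weight(z_typical, BETA)
w_outlier = cauchy_weight(z_outlier, BETA)
print(round(w_typical, 3), round(w_outlier, 3))
```

In this toy setting the outlying frame has a strongly negative log-likelihood, so its weight is a small fraction of the typical frame's weight and it contributes correspondingly little to the weighted sums.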
- Testing (Robust Incremental Adaptation for GMM)
Where a model parameter $\theta_N$ has been registered with N data, when data $Y_{N+1} = \{y_{N+1}(1), \ldots, y_{N+1}(T_{N+1})\}$ is entered, the (N+1)-th re-estimation equations are obtained sequentially by the following equations 4a, 4b, and 4c.
- Mixture Weights [Equation 4a]
(Equation 4a is an image in the original publication and is not reproduced here.)
- Means [Equation 4b]
(Equation 4b is an image in the original publication and is not reproduced here.)
- Variance [Equation 4c]
(Equation 4c is an image in the original publication and is not reproduced here.)
In equations 4a, 4b, and 4c, $W(N)$ is defined by an expression that is an image in the original publication (not reproduced here), and
$$W_p(N) = \sum_{n=1}^{N} \lambda^{N-n} \sum_{t=1}^{T_n} w_n(t)\, p(i \mid y_n(t), \theta_N).$$
When new data $Y_{N+1}$ includes an outlier, $w_{N+1}(t)$ becomes smaller, which reduces the effect of the outlier. $W(N+1)$ and $W_p(N+1)$ are obtained sequentially from equations 4a, 4b, and 4c by the following equations 5a and 5b, respectively.
[Equation 5a]
$$W(N+1) = \lambda W(N) + \sum_{t=1}^{T_{N+1}} w_{N+1}(t)$$
[Equation 5b]
$$W_p(N+1) = \lambda W_p(N) + \sum_{t=1}^{T_{N+1}} w_{N+1}(t)\, p(i \mid y_{N+1}(t), \theta_N)$$
During adaptation, the forgetting factor satisfies $\lambda < 1$. When $\lambda = 1$, the adaptation according to the present invention performs in the same manner as the conventional method. As the forgetting factor $\lambda$ becomes smaller, the effect of new data increases and the effect exerted by the existing model decreases. According to experimental results, $\lambda = 0.9 \sim 0.95$ is suitable for adapting a speaker model.
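The role of the forgetting factor in the recursion $W(N+1) = \lambda W(N) + \sum_t w_{N+1}(t)$ can be checked with a short simulation. This is a sketch under stated assumptions, not the patent's implementation: every test session is assumed to contribute the same weight sum (50), the registered value $W(N)$ is assumed to be 100, and the newest session's "share" is taken to be its weight sum divided by $W(N+1)$. All of these names and numbers are hypothetical.

```python
def newest_session_shares(lam, session_weight_sums, w_registered):
    """Apply W(N+1) = lam * W(N) + sum_t w_{N+1}(t) once per new session.

    Returns the final W and, for each session, the fraction of W(N+1)
    that comes from that newest session (an illustrative measure only).
    """
    w_total = w_registered
    shares = []
    for session_sum in session_weight_sums:
        w_total = lam * w_total + session_sum
        shares.append(session_sum / w_total)
    return w_total, shares


sessions = [50.0] * 50  # hypothetical: 50 sessions with equal weight sums

_, shares_no_forgetting = newest_session_shares(1.0, sessions, w_registered=100.0)
_, shares_forgetting = newest_session_shares(0.9, sessions, w_registered=100.0)

print(round(shares_no_forgetting[-1], 4), round(shares_forgetting[-1], 4))
```

With $\lambda = 1$ the newest session's share keeps decaying toward zero as sessions accumulate, whereas with $\lambda = 0.9$ it settles near $1 - \lambda = 0.1$ in this constant-input setting. This matches the stated goal of keeping the adaptation rate of new data above a fixed level.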
Authentication for 12 speakers is tested using 20 sentences (once per week, 5 recordings) for every section. The results are shown in FIGs. 1 and 2. In FIGs. 1 and 2, each section, such as T0 ~ T1, spans a period of one month. A indicates the case where the method according to the present invention is used. B indicates the case where the conventional method is used. C indicates the case in which a model is registered with a great deal of information and no adaptation is performed. FIGs. 1 and 2 compare the results of cases A, B, and C. FIG. 1 shows the change in Equal Error Rate (EER) over time when no utterance change or outlier occurs. When no outlier occurs, cases A, B, and C have similar EERs. However, after more time has elapsed, in sections T5 ~ T6, the EERs of cases A and B, which use adaptation, are lower than that of case C, which does not. As time goes by, both case A and case B achieve lower EERs.
FIG. 2 shows the change in EER over time when an utterance change or an outlier occurs. When the outlier occurs, cases B and C are influenced by the outlier during testing, so false rejections occur frequently and the EER increases. When authentication succeeds, the speaker model is adapted with a polluted signal, forming an inaccurate speaker model. Thus, a false rejection occurs even when clean data are later tested, and a false acceptance occurs when another person tries to authenticate. The performance of such a wrongly adapted model is found to be lower than that of a model adapted when no outlier occurs. In case A, since testing is performed so as to minimize the effect of the outlier, only a slight difference exists between the performance of case A and the performance when no outlier occurs. In case A, the model adaptation also minimizes the effect of the outlier, so that even when an outlier initially occurs, the model is prevented from being adapted inaccurately. Thus, the performance of the adapted model when an outlier occurs is similar to that when no outlier occurs.
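For readers unfamiliar with the metric, the EER is the error rate at the decision threshold where the false acceptance rate and the false rejection rate are equal. The patent does not describe how its EER values were computed, so the sketch below is only one common way to estimate it from verification scores. The function name and the score lists are hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by scanning every observed score as a threshold.

    A trial is accepted when its score is at or above the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False reject: a genuine trial scoring below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False accept: an impostor trial scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


# Hypothetical verification scores (higher means more likely the claimed speaker).
genuine = [0.9, 0.8, 0.4]
impostor = [0.5, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
print(round(eer, 3))
```

In this toy example one genuine trial scores below one impostor trial, so at the balancing threshold one third of genuine trials are rejected and one third of impostor trials are accepted.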
Industrial Applicability
As can be seen from the foregoing, the present invention solves the problem in which a speaker model changes inaccurately due to the effect of an outlier. In addition, even as time passes, the rate at which new data are adapted to the model remains greater than a predetermined value. The present invention thus obtains an adapted speaker model. Although a preferred embodiment of the present invention has been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims

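1. A GMM incremental robust adaptation with forgetting factor for speaker verification comprising:
adapting new data $y_{N+1}(t)$ to a speaker model ($\theta_N = \{p_i^N, \mu_i^N, \Sigma_i^N\}$) by the following equation when a speaker is authenticated,
- Mixture Weights:
(This equation is an image in the original publication and is not reproduced here.)
- Means:
(This equation is an image in the original publication and is not reproduced here.)
- Variance:
(This equation is an image in the original publication and is not reproduced here.)
$$W(N+1) = \lambda W(N) + \sum_{t=1}^{T_{N+1}} w_{N+1}(t)$$
(A further equation is an image in the original publication and is not reproduced here.)
adapting data having an outlier such as a noise and an utterance change according to a time to a robust speaker model by adapting the equation; and
uniformly maintaining a ratio in which new data is adapted to a model.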
PCT/KR2003/001207 2003-06-19 2003-06-19 Gmm incremental robust adaptation with forgetting factor for speaker verification WO2004112001A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2003243026A AU2003243026A1 (en) 2003-06-19 2003-06-19 Gmm incremental robust adaptation with forgetting factor for speaker verification
PCT/KR2003/001207 WO2004112001A1 (en) 2003-06-19 2003-06-19 Gmm incremental robust adaptation with forgetting factor for speaker verification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2003/001207 WO2004112001A1 (en) 2003-06-19 2003-06-19 Gmm incremental robust adaptation with forgetting factor for speaker verification

Publications (1)

Publication Number Publication Date
WO2004112001A1 true WO2004112001A1 (en) 2004-12-23

Family

ID=33550103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2003/001207 WO2004112001A1 (en) 2003-06-19 2003-06-19 Gmm incremental robust adaptation with forgetting factor for speaker verification

Country Status (2)

Country Link
AU (1) AU2003243026A1 (en)
WO (1) WO2004112001A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2465782A (en) * 2008-11-28 2010-06-02 Univ Nottingham Trent Biometric identity verification utilising a trained statistical classifier, e.g. a neural network
CN102968990A (en) * 2012-11-15 2013-03-13 江苏嘉利德电子科技有限公司 Speaker identifying method and system
WO2013086736A1 (en) * 2011-12-16 2013-06-20 华为技术有限公司 Speaker recognition method and device
US10257191B2 (en) 2008-11-28 2019-04-09 Nottingham Trent University Biometric identity verification
US11978464B2 (en) 2021-01-22 2024-05-07 Google Llc Trained generative model speech coding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KIM Y.J., CHUNG J.H.: "Double compensation framework based on GMM for speaker recognition", JOURNAL OF THE KOREAN SOCIETY OF PHONETIC SCIENCE AND SPEECH TECHNOLOGY, no. 45, March 2003 (2003-03-01) *
KIM Y.J., CHUNG J.H.: "Signal bias removal based GMM for robust speaker recognition", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, PROCEEDINGS, vol. 4, 13 May 2002 (2002-05-13) - 17 May 2002 (2002-05-17), pages IV - 4163, XP008041968 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2465782A (en) * 2008-11-28 2010-06-02 Univ Nottingham Trent Biometric identity verification utilising a trained statistical classifier, e.g. a neural network
US9311546B2 (en) 2008-11-28 2016-04-12 Nottingham Trent University Biometric identity verification for access control using a trained statistical classifier
GB2465782B (en) * 2008-11-28 2016-04-13 Univ Nottingham Trent Biometric identity verification
US10257191B2 (en) 2008-11-28 2019-04-09 Nottingham Trent University Biometric identity verification
WO2013086736A1 (en) * 2011-12-16 2013-06-20 华为技术有限公司 Speaker recognition method and device
CN103562993A (en) * 2011-12-16 2014-02-05 华为技术有限公司 Speaker recognition method and device
CN103562993B (en) * 2011-12-16 2015-05-27 华为技术有限公司 Speaker recognition method and device
US9142210B2 (en) 2011-12-16 2015-09-22 Huawei Technologies Co., Ltd. Method and device for speaker recognition
CN102968990A (en) * 2012-11-15 2013-03-13 江苏嘉利德电子科技有限公司 Speaker identifying method and system
CN102968990B (en) * 2012-11-15 2015-04-15 朱东来 Speaker identifying method and system
US11978464B2 (en) 2021-01-22 2024-05-07 Google Llc Trained generative model speech coding

Also Published As

Publication number Publication date
AU2003243026A1 (en) 2005-01-04

Similar Documents

Publication Publication Date Title
US11335353B2 (en) Age compensation in biometric systems using time-interval, gender and age
US9767806B2 (en) Anti-spoofing
US9646614B2 (en) Fast, language-independent method for user authentication by voice
AU2004300140B2 (en) System and method for providing improved claimant authentication
US6223155B1 (en) Method of independently creating and using a garbage model for improved rejection in a limited-training speaker-dependent speech recognition system
US20030033143A1 (en) Decreasing noise sensitivity in speech processing under adverse conditions
EP1399915B1 (en) Speaker verification
US7904295B2 (en) Method for automatic speaker recognition with hurst parameter based features and method for speaker classification based on fractional brownian motion classifiers
US6134527A (en) Method of testing a vocabulary word being enrolled in a speech recognition system
EP2860706A2 (en) Anti-spoofing
Sanderson et al. Noise compensation in a person verification system using face and multiple speech features
US20070219792A1 (en) Method and system for user authentication based on speech recognition and knowledge questions
US20100204993A1 (en) Confidence levels for speaker recognition
Liu et al. A Spearman correlation coefficient ranking for matching-score fusion on speaker recognition
Munteanu et al. Automatic speaker verification experiments using HMM
WO2004112001A1 (en) Gmm incremental robust adaptation with forgetting factor for speaker verification
EP1414023B1 (en) Method for recognizing speech
Sanderson et al. Information fusion for robust speaker verification
JP3092788B2 (en) Speaker recognition threshold setting method and speaker recognition apparatus using the method
JP3090119B2 (en) Speaker verification device, method and storage medium
Garcia-Romero et al. U-norm likelihood normalization in PIN-based speaker verification systems
JP2001350494A (en) Device and method for collating
Saeta et al. Automatic estimation of a priori speaker dependent thresholds in speaker verification
Lotia et al. A review of various score normalization techniques for speaker identification system
Ouzounov Mean-Delta Features for Telephone Speech Endpoint Detection

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP