WO2004112001A1 - GMM incremental robust adaptation with forgetting factor for speaker verification

Info

Publication number: WO2004112001A1
Application number: PCT/KR2003/001207
Authority: WO
Inventors: Ki Yong Lee; Jong Joo Lee; Youn Jeong Lee
Original assignee: Kwangwoon Foundation
Priority date: 2003-06-19
Filing date: 2003-06-19
Publication date: 2004-12-23
Also published as: AU2003243026A1

Abstract

A speaker recognition system uses a speaker model adaptation method so that good performance can be obtained from a small amount of data. In a conventional adaptation method, however, when new data contains an outlier such as noise or an utterance change, the resulting speaker model becomes inaccurate. In addition, as time goes by, the rate at which new data is adapted to the model decreases. The new method uses incremental robust adaptation to reduce the effect of outliers and uses a forgetting factor to maintain the adaptation rate of new data.

Description

GMM INCREMENTAL ROBUST ADAPTATION WITH FORGETTING FACTOR FOR SPEAKER VERIFICATION

Technical Field

The present invention relates to GMM incremental robust adaptation with a forgetting factor for speaker verification. More specifically, the present invention is robust to outliers such as utterance changes and noise, and provides a speaker recognition model with a high recognition rate by adapting new data to an existing speaker model at a uniform rate despite the lapse of time, so that the speaker model is adapted accurately.

The present invention relates to a method which registers a small amount of data in a speaker recognition model and adapts the model to new data received during testing. More specifically, the present invention relates to a method which reduces the effect of outliers and, by using a forgetting factor, keeps the rate at which new data is adapted to the existing model uniform instead of letting it decrease.

Background Art

With the increasing use of the Internet and personal computers, the problem of personal information being disclosed has grown significantly. A conventional ID and password are no longer adequate to prevent personal information from being disclosed. Thus, various biometric methods have been studied as authentication methods. Among them, speaker recognition is widely studied because users are familiar with it and can use it easily. Speaker recognition methods are divided into text-independent methods and text-dependent methods. A Gaussian Mixture Model (GMM) is mainly used for the text-independent method, and a Hidden Markov Model (HMM) is mainly used for the text-dependent method. In a GMM-based speaker recognition method, a high recognition rate requires a great deal of data recorded over multiple sessions when the speaker model is first formed. In a practical application, however, it is not feasible to collect that much recorded data from a speaker. Thus, speaker verification adaptation has been proposed. Speaker verification adaptation forms a speaker model from a small amount of data during registration and then adapts the speaker model with the data received at every test. Speaker adaptation in the GMM is achieved as follows.

It is assumed that a model registered with N data (where N is an integer) is expressed by the following equation.

$$\theta^N = \{p_i^N, \mu_i^N, \Sigma_i^N\}, \quad i = 1, 2, \ldots, M$$

where $p_i^N$ is a mixture weight, $\mu_i^N$ is a mean, and $\Sigma_i^N$ is a variance. During testing, when the (N+1)-th data $y_{N+1}(t)$ is entered, the adaptation of the model is expressed by the following equations 1a, 1b, and 1c.

- Mixture weights: [Equation 1a]

- Means: [Equation 1b]

- Variance: [Equation 1c]

In the equations,

$$\gamma_i^N = \sum_{n=1}^{N} \sum_{t=1}^{T_n} p(i \mid y_n(t), \theta)$$

is a value stored during the registration. When the (N+1)-th data is entered,

$$\gamma_i^{N+1} = \gamma_i^N + \sum_{t=1}^{T_{N+1}} p(i \mid y_{N+1}(t), \theta).$$

However, when an outlier caused by an utterance change or noise occurs in newly entered data, the outlier is adapted into the model, so that the model changes inaccurately. In addition, $\gamma_i^N$ increases as the adaptation advances. Accordingly, after a certain time has elapsed, the newly entered data $y_{N+1}(t)$ has less and less effect on the adaptation of the model.
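The shrinking influence of new data can be illustrated numerically. The following is a minimal sketch, not the patent's equations 1a to 1c: it assumes that the conventional update forms each new estimate as an occupancy-weighted average of the stored estimate and the new session, so that the new session contributes a fraction c / (γ + c), where γ is the stored occupancy and c is the occupancy of the new session. The function names `new_data_share` and `simulate` are hypothetical.

```python
def new_data_share(gamma_stored, gamma_new):
    """Fraction of the updated estimate contributed by the new session.

    Assumes an occupancy-weighted average (an illustrative assumption,
    not an equation taken from the patent).
    """
    return gamma_new / (gamma_stored + gamma_new)


def simulate(sessions, frames_per_session):
    """Accumulate gamma as gamma^(N+1) = gamma^N + (occupancy of new session)."""
    gamma = frames_per_session  # occupancy stored at registration
    shares = []
    for _ in range(sessions):
        shares.append(new_data_share(gamma, frames_per_session))
        gamma += frames_per_session  # gamma only ever grows
    return shares


if __name__ == "__main__":
    for n, share in enumerate(simulate(5, 100.0), start=2):
        print(f"session {n}: share of new data = {share:.3f}")
```

Under this assumption the share of each new session falls roughly as 1/N, which is the behaviour the forgetting factor is meant to counteract.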

Disclosure of Invention

Therefore, an object of the present invention is to provide GMM incremental robust adaptation with a forgetting factor for speaker verification, which reduces the effect of an outlier using a robust method and, using a variable forgetting factor, keeps the rate at which new data is adapted to the model above a predetermined rate.

According to the present invention, there is provided GMM incremental robust adaptation with a forgetting factor for speaker verification, which, during registration, reduces the effect of an outlier using a Cauchy weight so that the data influence the model uniformly. During testing, the GMM incremental robust adaptation minimizes the influence of the data upon the model when an outlier occurs, and adapts new data to the model at a constant rate using a forgetting factor.

Brief Description of Drawings

The foregoing and other objects, features and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings in which:

FIG. 1 is a graph showing an EER when an outlier does not occur;

FIG. 2 is a graph showing an EER when an outlier occurs;

FIG. 3 is a block diagram showing a GMM system using an incremental robust adaptation with a forgetting factor method; and

FIG. 4 is a view for illustrating problems of a conventional adaptation method according to time.

Best Mode for Carrying Out the Invention

Hereinafter, a preferred embodiment of the present invention will be described with reference to FIG. 2.

- Registration

It is assumed that the N data used in a registration are expressed by the following equation.

$$Y^N = \{Y_n,\; n = 1, \ldots, N\}, \qquad Y_n = \{y_n(t),\; t = 1, \ldots, T_n\}$$

When an outlier exists in $Y^N$, a robust estimation method for reliably estimating the GMM is expressed by the following equation 2, based on an M-estimation method.

[Equation 2]

$$J = \sum_{n=1}^{N} \lambda^{N-n} \sum_{t=1}^{T_n} \rho\left[\log p(y_n(t) \mid \theta)\right]$$

where $\rho[\cdot]$ is a loss function that reduces the effect of the outlier. In the equation 2, $p(y_n(t) \mid \theta)$ is a Gaussian mixture density, that is, a weighted sum of M Gaussian densities. A robust GMM for a speaker model is defined by $\theta^N = \{p_i^N, \mu_i^N, \Sigma_i^N\}$ as in the conventional method. By minimizing J with respect to the respective parameters $p_i$, $\mu_i$, and $\Sigma_i$, a re-estimating equation of the model is obtained. By setting $\partial J / \partial p_i = 0$, $\partial J / \partial \mu_i = 0$, and $\partial J / \partial \Sigma_i = 0$, the robust GMM re-estimating equations are expressed by the following equations 3a, 3b, and 3c.

- Mixture Weights [Equation 3a]

- Means [Equation 3b]

- Variance [Equation 3c]

In the equation 3c, $p(i \mid y_n(t), \theta)$ is an a posteriori probability and $w_n(t)$ is a weight function, where $z_n(t) = \log p(y_n(t) \mid \theta)$. In an embodiment of the present invention, Cauchy's weight function is used:

$$w_n(t) = \frac{1}{1 + (z_n(t))^2 / \beta}$$

where $\beta$ is a scale parameter. In the equations 3a, 3b, and 3c, when an outlier occurs, $z_n(t)$ becomes large in magnitude, which lowers $w_n(t)$ and thereby reduces the effect of the outlier in the model equations. $\lambda$ is a forgetting factor and is set to "1" during registration.
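To make the effect of the Cauchy weight concrete, the following minimal sketch evaluates the log-likelihood z of a frame under a small one-dimensional GMM and converts it to a weight w = 1 / (1 + z² / β). The two-component model, the value β = 10 and the function names `gmm_log_density` and `cauchy_weight` are illustrative assumptions, not values or names from the patent.

```python
import math


def gmm_log_density(y, weights, means, variances):
    """Log of a 1-D Gaussian mixture density: log sum_i p_i N(y; mu_i, var_i)."""
    total = 0.0
    for p, mu, var in zip(weights, means, variances):
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        total += p * norm * math.exp(-0.5 * (y - mu) ** 2 / var)
    return math.log(total)


def cauchy_weight(z, beta):
    """Cauchy weight w = 1 / (1 + z^2 / beta), with beta a scale parameter."""
    return 1.0 / (1.0 + z * z / beta)


if __name__ == "__main__":
    # Hypothetical two-component speaker model (illustration only).
    weights, means, variances = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
    beta = 10.0  # illustrative scale parameter
    for label, y in [("typical frame", 0.1), ("outlier frame", 8.0)]:
        z = gmm_log_density(y, weights, means, variances)
        print(f"{label}: z = {z:.2f}, w = {cauchy_weight(z, beta):.3f}")
```

A frame far from every component has a strongly negative log-likelihood, so its squared value is large and its weight is close to zero, while a frame the model explains well keeps a weight much closer to one.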

- Testing (Robust Incremental Adaptation for GMM)

Where a model parameter $\theta^N$ has been registered with N data, when data $Y_{N+1} = \{y_{N+1}(1), \ldots, y_{N+1}(T_{N+1})\}$ is entered, the (N+1)-th re-estimating equations are sequentially obtained by the following equations 4a, 4b, and 4c.

- Mixture Weights, [Equation 4a]

- Means, [Equation 4b]

- Variance, [Equation 4c]

In the equations 4a, 4b, and 4c,

$$W(N) = \sum_{n=1}^{N} \lambda^{N-n} \sum_{t=1}^{T_n} w_n(t)$$

and

$$W_i(N) = \sum_{n=1}^{N} \lambda^{N-n} \sum_{t=1}^{T_n} w_n(t)\, p(i \mid y_n(t), \theta^N).$$

When new data $Y_{N+1}$ includes an outlier, $w_{N+1}(t)$ becomes smaller to reduce the effect of the outlier. $W(N+1)$ and $W_i(N+1)$ are sequentially obtained by the following equations 5a and 5b from the equations 4a, 4b, and 4c, respectively.

[Equation 5a]

$$W(N+1) = \lambda W(N) + \sum_{t=1}^{T_{N+1}} w_{N+1}(t)$$

[Equation 5b]

$$W_i(N+1) = \lambda W_i(N) + \sum_{t=1}^{T_{N+1}} w_{N+1}(t)\, p(i \mid y_{N+1}(t), \theta^N)$$
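The value of the recursions 5a and 5b is that only two running sums per component need to be stored, rather than every past session. The sketch below checks, for a single mixture component, that applying the recursion session by session gives the same result as summing over all sessions with λ^(N-n) decay. It assumes that the posteriors of earlier sessions are kept as they were when first computed; the function names and sample numbers are hypothetical.

```python
def direct_stats(session_weights, session_posteriors, lam):
    """W(N) and W_i(N) as explicit sums over all stored sessions."""
    n_total = len(session_weights)
    w_sum, wi_sum = 0.0, 0.0
    pairs = zip(session_weights, session_posteriors)
    for n, (w, post) in enumerate(pairs, start=1):
        decay = lam ** (n_total - n)  # older sessions are discounted more
        w_sum += decay * sum(w)
        wi_sum += decay * sum(wt * p for wt, p in zip(w, post))
    return w_sum, wi_sum


def recursive_update(w_prev, wi_prev, new_weights, new_posteriors, lam):
    """Equations 5a and 5b: update the running sums from the new session only."""
    w_next = lam * w_prev + sum(new_weights)
    wi_next = lam * wi_prev + sum(
        wt * p for wt, p in zip(new_weights, new_posteriors)
    )
    return w_next, wi_next


if __name__ == "__main__":
    # Robust weights w_n(t) and posteriors p(i | y_n(t)) for three sessions.
    weights = [[1.0, 0.5], [0.2, 1.0, 0.9], [0.7]]
    posteriors = [[0.6, 0.4], [0.1, 0.8, 0.5], [0.3]]
    w_run, wi_run = 0.0, 0.0
    for w, post in zip(weights, posteriors):
        w_run, wi_run = recursive_update(w_run, wi_run, w, post, lam=0.9)
    print(w_run, wi_run)
    print(direct_stats(weights, posteriors, lam=0.9))
```

A session whose frames received small robust weights adds little to either sum, so it moves the stored statistics less than a clean session of the same length.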

During the adaptation, the forgetting factor λ is set in the range λ < 1. When λ = 1, the adaptation according to the present invention performs in the same manner as the conventional method. As the forgetting factor λ becomes smaller, the effect of new data increases and the effect exerted by the existing model decreases. According to experimental results, a value of λ between 0.9 and 0.95 is suitable for adapting a speaker model.
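The role of λ can be quantified with a geometric series: a session that is k updates old is discounted by λ^k, and the discounts sum to at most 1 / (1 - λ). The sketch below is a simplified illustration that assumes equally sized sessions with unit robust weights (an assumption made here, not stated in the patent); under it, the share of the newest session settles at 1 - λ instead of shrinking towards zero. The function names are hypothetical.

```python
def effective_memory(lam):
    """Limit of 1 + lam + lam^2 + ..., the effective number of sessions kept."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return 1.0 / (1.0 - lam)


def steady_state_share(lam, sessions=1000):
    """Iterate W <- lam * W + 1 and return the newest session's share, 1 / W."""
    w = 0.0
    for _ in range(sessions):
        w = lam * w + 1.0  # each session contributes one unit of weight
    return 1.0 / w


if __name__ == "__main__":
    for lam in (0.9, 0.95):
        print(lam, effective_memory(lam), steady_state_share(lam))
```

For λ = 0.9 the effective memory is about 10 sessions and for λ = 0.95 about 20, so within the range reported as suitable each new session keeps a share of roughly 5 to 10 percent under these assumptions.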

Authentication was tested for 12 speakers using 20 sentences (once per week, 5 recordings) for every section. The results are shown in FIGs. 1 and 2. In FIGs. 1 and 2, each section (for example, T0 ~ T1) spans one month. A indicates the case where the method according to the present invention is used. B indicates the case where the conventional method is used. C indicates the case in which a model is registered with a great deal of data and is not adapted. FIGs. 1 and 2 compare the case A, the case B, and the case C. FIG. 1 shows the change in Equal Error Rate (EER) over time when neither an utterance change nor an outlier occurs. When no outlier occurs, the case A, the case B, and the case C initially have similar EERs. However, after more time has elapsed, in sections T5 ~ T6, the EERs of the case A and the case B, which use adaptation, are lower than that of the case C, which does not. As time goes by, the EERs of both the case A and the case B decrease.

FIG. 2 shows the change in EER over time when an utterance change or an outlier occurs. When the outlier occurs, the case B and the case C are influenced by the outlier during testing, so that false rejections occur frequently and the EER increases. When an authentication succeeds, the speaker model is adapted with the polluted signal, which forms an inaccurate speaker model. Thus, false rejections occur even when clean data are tested afterwards, and false acceptances occur when another person attempts authentication. The performance of such a wrongly adapted model is lower than that of a model adapted when no outlier occurs. In the case A, since testing is performed so as to minimize the effect of the outlier, there is only a slight difference between the performance of the case A and the performance when no outlier occurs. In the case A, the adaptation of the model also minimizes the effect of the outlier. Even when an outlier occurs initially, the model is prevented from being adapted inaccurately. Thus, the performance of the adapted model when the outlier occurs is similar to that when the outlier does not occur.
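For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The patent does not state how its EER values were computed; the following is a minimal sketch of one common approximation, which sweeps the decision threshold over the observed scores and reports the point where the two rates are closest. The function names and the sample scores are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """FAR and FRR when scores at or above the threshold are accepted."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: mean of FAR and FRR where their gap is smallest."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]   # true speaker trials
    impostor_scores = [0.1, 0.2, 0.3, 0.6]  # other speaker trials
    print(equal_error_rate(genuine_scores, impostor_scores))
```

A model adapted with polluted data scores the true speaker lower and other speakers higher, which raises both rates and therefore the EER.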

Industrial Applicability

As can be seen from the foregoing, the present invention solves the problem in which a speaker model changes inaccurately due to the effect of an outlier. Even as time elapses, the rate at which new data are adapted to the model is kept above a predetermined value. Thus, the present invention obtains a speaker adaptation model.

Although a preferred embodiment of the present invention has been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims

1. A GMM incremental robust adaptation with forgetting factor for speaker verification comprising: adapting new data $y_{N+1}(t)$ to a speaker model ($\theta^N = \{p_i^N, \mu_i^N, \Sigma_i^N\}$) by the following equation when a speaker is authenticated,

- Mixture Weights:

- Means:

- Variance:

$$W(N+1) = \lambda W(N) + \sum_{t=1}^{T_{N+1}} w_{N+1}(t)$$

adapting data having an outlier such as a noise and an utterance change according to a time to a robust speaker model by adapting the equation; and uniformly maintaining a ratio in which new data is adapted to a model.