GMM INCREMENTAL ROBUST ADAPTATION WITH FORGETTING FACTOR FOR SPEAKER VERIFICATION
Technical Field

The present invention relates to a GMM incremental robust adaptation with a forgetting factor for speaker verification. More specifically, the present invention is robust to the effect of outliers such as utterance changes and noise, and provides a speaker recognition model having a high recognition rate by uniformly adapting new data to an existing speaker model in spite of the lapse of time, so that the speaker model is adapted accurately.
The present invention relates to a method which registers a small amount of data in a speaker recognition model and adapts the model to new data obtained during testing. More specifically, the present invention relates to a method which reduces the effect of outliers, and which uses a forgetting factor to prevent the rate at which new data is adapted into the existing model from decreasing, so that the rate is maintained uniformly.
Background Art
With the increasing use of the Internet and personal computers, problems concerning the disclosure of personal information have significantly increased. A conventional ID and password are no longer adequate to prevent personal information from being disclosed. Thus, various biometric methods have been studied as authentication methods. Among them, the speaker recognition method is widely studied since a user is familiar with it and can easily use it. A speaker recognition method is divided into a text independent method and a text dependent method. A Gaussian Mixture Model (GMM) is mainly used for the text independent method, and a Hidden Markov Model is mainly used for the text dependent method. In a speaker recognition method based on the GMM, a great deal of data recorded through multiple sessions is required when first forming the speaker model in order to obtain a high recognition rate. In practice, however, it is difficult to require a speaker to record such a great deal of data at registration. Thus, a speaker verification adaptation has been proposed. The speaker verification adaptation is a method which forms a speaker model using a small amount of data during registration, and adapts the model with the data received at every test. A speaker adaptation in the GMM is achieved as follows.
It is assumed that a model registered by N (where N is an integer) data sets is expressed by the following equation:

θ^N = {p_i^N, μ_i^N, Σ_i^N}, i = 1, 2, ..., M

where p_i^N is a mixture weight, μ_i^N is a mean, and Σ_i^N is a variance. During testing, when the (N+1)-th data y_{N+1}(t) is entered, the adaptation of the model is expressed by the following equations 1a, 1b, and 1c.

- Mixture Weights [Equation 1a]
p_i^{N+1} = γ_i^{N+1} / Σ_{j=1}^{M} γ_j^{N+1}

- Means [Equation 1b]
μ_i^{N+1} = ( γ_i^N μ_i^N + Σ_{t=1}^{T_{N+1}} p(i | y_{N+1}(t), θ) y_{N+1}(t) ) / γ_i^{N+1}

- Variance [Equation 1c]
Σ_i^{N+1} = ( γ_i^N ( Σ_i^N + (μ_i^N − μ_i^{N+1})² ) + Σ_{t=1}^{T_{N+1}} p(i | y_{N+1}(t), θ) (y_{N+1}(t) − μ_i^{N+1})² ) / γ_i^{N+1}

In the equations, γ_i^N = Σ_{n=1}^{N} Σ_{t=1}^{T_n} p(i | y_n(t), θ), and γ_i^N is a value stored during the registration. When the (N+1)-th data is entered, γ_i^{N+1} = γ_i^N + Σ_{t=1}^{T_{N+1}} p(i | y_{N+1}(t), θ).
However, when an outlier occurs due to utterance changes or noise in the newly entered data, the effect of the outlier is adapted into the model, so that the model is changed inaccurately. Moreover, as the adaptation advances, γ_i^N keeps increasing. Accordingly, after a predetermined time lapses, the newly entered data y_{N+1}(t) has almost no effect on the model when the model is adapted, which is a problem.
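The diminishing influence of new data can be illustrated with a short sketch. The following Python code (the function and variable names are illustrative, not from the specification) tracks only the mean of a single mixture under the conventional accumulation of equation 1b:

```python
import numpy as np

def conventional_mean_update(mu, gamma, frames, resp):
    """Conventional incremental mean update for one mixture (equation 1b).

    mu     : current mean of mixture i
    gamma  : accumulated posterior count gamma_i^N from past sessions
    frames : new session data y_{N+1}(t), shape (T, D)
    resp   : posteriors p(i | y_{N+1}(t), theta), shape (T,)
    """
    gamma_new = gamma + resp.sum()
    mu_new = (gamma * mu + resp @ frames) / gamma_new
    return mu_new, gamma_new

# As gamma_i^N grows session by session, a fixed amount of new data
# moves the mean less and less -- the drawback described above.
rng = np.random.default_rng(0)
mu, gamma, shifts = np.zeros(2), 0.0, []
for session in range(5):
    frames = rng.normal(1.0, 0.1, size=(50, 2))
    resp = np.ones(50)  # assume full posterior for this mixture
    mu_new, gamma = conventional_mean_update(mu, gamma, frames, resp)
    shifts.append(float(np.linalg.norm(mu_new - mu)))
    mu = mu_new
assert shifts[0] > shifts[-1]  # the influence of new data shrinks
```

Each session contributes the same amount of data, yet later sessions move the mean less because the denominator γ_i^{N+1} keeps growing.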
Disclosure of Invention
Therefore, an object of the present invention is to provide a GMM incremental robust adaptation with a forgetting factor for speaker verification, which reduces the effect of an outlier using a robust method, and which, using a variable forgetting factor, keeps the rate at which new data is adapted into the model above a predetermined level.
According to the present invention, there is provided a GMM incremental robust adaptation with a forgetting factor for speaker verification, which reduces the effect of an outlier using a Cauchy weight so that the data influence the model uniformly during registration. The GMM incremental robust adaptation minimizes the influence of the data upon the model when an outlier occurs during testing, and constantly adapts new data into the model using a forgetting factor.
Brief Description of Drawings
The foregoing and other objects, features and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings in which:
FIG. 1 is a graph showing an EER when an outlier does not occur;
FIG. 2 is a graph showing an EER when an outlier occurs;
FIG. 3 is a block diagram showing a GMM system using an incremental robust adaptation with a forgetting factor method; and
FIG. 4 is a view for illustrating problems of a conventional adaptation method according to time.
Best Mode for Carrying Out the Invention
Hereinafter, a preferred embodiment of the present invention will be described with reference to FIG. 2.

- Registration

It is assumed that the N data used in the registration are expressed by the following equation:

Y^N = {Y_n, n = 1, ..., N}, Y_n = {y_n(t), t = 1, ..., T_n}

When an outlier exists in Y^N, a robust estimating method based on an M-estimation method, for a reliable estimating step of the GMM, is expressed by the following equation 2.

[Equation 2]
J = Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} ρ[ log p(y_n(t) | θ) ]
where ρ[·] is a loss function which reduces the effect of the outlier. In equation 2, p(y_n(t) | θ) is the Gaussian mixture density, a weighted sum of M Gaussian densities. A model of the robust GMM for a speaker model is defined by θ^N = {p_i^N, μ_i^N, Σ_i^N} as in the conventional method. By minimizing J with respect to the respective parameters p_i, μ_i, and Σ_i, a re-estimating equation of the model is obtained. By setting ∂J/∂p_i = 0, ∂J/∂μ_i = 0, and ∂J/∂Σ_i = 0, the robust GMM re-estimating equations are expressed by the following equations 3a, 3b, and 3c.

- Mixture Weights [Equation 3a]
p_i^N = W_i(N) / W(N)

- Means [Equation 3b]
μ_i^N = ( Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} w_n(t) p(i | y_n(t), θ) y_n(t) ) / W_i(N)

- Variance [Equation 3c]
Σ_i^N = ( Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} w_n(t) p(i | y_n(t), θ) (y_n(t) − μ_i^N)² ) / W_i(N)

In the equations, p(i | y_n(t), θ) is a posteriori probability, w_n(t) is a weight function, and z_n(t) = log p(y_n(t) | θ). Here, W(N) = Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} w_n(t) and W_i(N) = Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} w_n(t) p(i | y_n(t), θ). In an embodiment of the present invention, Cauchy's weight function is used: w_n(t) = 1 / (1 + z_n(t)² / β), where β is a scale parameter. In equations 3a, 3b, and 3c, when an outlier occurs, z_n(t) has a great value, which lowers w_n(t) and thereby reduces the effect of the outlier on the model equations. λ is a forgetting factor and is set to 1 during the registration.
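As a concrete sketch of the registration step, the following Python code computes the Cauchy weights and the weighted sufficient statistics that feed equations 3a to 3c, with λ = 1 as during registration. The function names and the scale value β = 10 are assumptions for illustration, not values from the specification:

```python
import numpy as np

def cauchy_weight(z, beta):
    # Cauchy's weight function: w_n(t) = 1 / (1 + z_n(t)^2 / beta).
    # Frames with a large |z_n(t)| (outliers) get weights near zero.
    return 1.0 / (1.0 + z**2 / beta)

def robust_mixture_stats(frames, resp, z, beta=10.0):
    """Cauchy-weighted statistics of one mixture for registration
    (forgetting factor lambda = 1 during registration).

    frames : y_n(t), shape (T, D)
    resp   : posteriors p(i | y_n(t), theta), shape (T,)
    z      : scores z_n(t) = log p(y_n(t) | theta), shape (T,)
    """
    w = cauchy_weight(z, beta)
    W = w.sum()                     # contribution to W(N)
    Wi = (w * resp).sum()           # contribution to W_i(N)
    mean_num = (w * resp) @ frames  # numerator of the mean estimate (3b)
    return W, Wi, mean_num

# A frame with a very poor score contributes almost nothing:
# the third weight below is about 0.011, versus 0.91 for the others.
print(cauchy_weight(np.array([-1.0, -1.0, -30.0]), beta=10.0))
```

The weighted sums above correspond to the inner (per-session) sums of W(N), W_i(N), and the mean numerator; accumulating them over sessions n = 1, ..., N yields the registered model.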
- Testing (Robust Incremental Adaptation for GMM)
Where the model parameter θ^N is registered by N data, when new data Y_{N+1} = {y_{N+1}(1), ..., y_{N+1}(T_{N+1})} is entered, the (N+1)-th re-estimating equations are sequentially obtained by the following equations 4a, 4b, and 4c.

- Mixture Weights [Equation 4a]
p_i^{N+1} = W_i(N+1) / W(N+1)

- Means [Equation 4b]
μ_i^{N+1} = ( λ W_i(N) μ_i^N + Σ_{t=1}^{T_{N+1}} w_{N+1}(t) p(i | y_{N+1}(t), θ^N) y_{N+1}(t) ) / W_i(N+1)

- Variance [Equation 4c]
Σ_i^{N+1} = ( λ W_i(N) ( Σ_i^N + (μ_i^N − μ_i^{N+1})² ) + Σ_{t=1}^{T_{N+1}} w_{N+1}(t) p(i | y_{N+1}(t), θ^N) (y_{N+1}(t) − μ_i^{N+1})² ) / W_i(N+1)

In equations 4a, 4b, and 4c, W(N) = Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} w_n(t) and W_i(N) = Σ_{n=1}^{N} λ^{N−n} Σ_{t=1}^{T_n} w_n(t) p(i | y_n(t), θ^N). When the new data Y_{N+1} includes an outlier, w_{N+1}(t) becomes smaller to reduce the effect of the outlier. W(N+1) and W_i(N+1) are sequentially obtained by the following equations 5a and 5b.

[Equation 5a]
W(N+1) = λ W(N) + Σ_{t=1}^{T_{N+1}} w_{N+1}(t)

[Equation 5b]
W_i(N+1) = λ W_i(N) + Σ_{t=1}^{T_{N+1}} w_{N+1}(t) p(i | y_{N+1}(t), θ^N)
During the adaptation, the forgetting factor λ satisfies λ ≤ 1. When λ = 1, the adaptation according to the present invention performs in the same manner as the conventional method. As the forgetting factor λ becomes smaller, the effect of new data increases and the effect of the existing model becomes smaller. According to experimental results, λ = 0.9 to 0.95 is suitable for the adaptation of a speaker model.
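One testing-phase update can be sketched as follows. This Python fragment uses illustrative names and covers only the mean update; the mixture-weight and variance updates of equations 4a and 4c follow the same pattern. It applies the recursions of equations 5a and 5b together with the mean update:

```python
import numpy as np

def incremental_robust_update(mu, W, Wi, frames, resp, w, lam=0.95):
    """One robust incremental adaptation step for a single mixture mean.

    mu     : current mean mu_i^N
    W, Wi  : accumulated weights W(N) and W_i(N)
    frames : new session y_{N+1}(t), shape (T, D)
    resp   : posteriors p(i | y_{N+1}(t), theta^N), shape (T,)
    w      : Cauchy weights w_{N+1}(t), shape (T,)
    lam    : forgetting factor (0.9 to 0.95 per the text)
    """
    wr = w * resp
    W_new = lam * W + w.sum()        # equation 5a
    Wi_new = lam * Wi + wr.sum()     # equation 5b
    mu_new = (lam * Wi * mu + wr @ frames) / Wi_new
    return mu_new, W_new, Wi_new

# Because W(N) is discounted by lam each session, it is bounded by
# T / (1 - lam): with T = 20 frames and lam = 0.9 it converges to 200,
# so each new session keeps roughly a 10% share of influence instead
# of shrinking toward zero as in the conventional method.
mu, W, Wi = np.zeros(2), 0.0, 0.0
for _ in range(100):
    mu, W, Wi = incremental_robust_update(
        mu, W, Wi, np.ones((20, 2)), np.ones(20), np.ones(20), lam=0.9)
```

Setting lam=1.0 and w to all ones recovers the conventional accumulation in which the influence of new data keeps shrinking.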
Authentications for 12 speakers are tested using 20 sentences (once per week, 5 recordings) for every section. The results are shown in FIGs. 1 and 2. In FIGs. 1 and 2, each section (T0, T1, and so on) has a period of one month. A indicates a case where the method according to the present invention is used. B indicates a case where a conventional method is used. C indicates a case in which a model is registered with a great deal of data without adaptation. FIGs. 1 and 2 show the compared results of cases A, B, and C. FIG. 1 shows the Equal Error Rate (EER) change according to the lapse of time when an utterance change or an outlier does not occur. When no outlier occurs, cases A, B, and C have similar EERs. However, after a longer lapse of time, in sections T5 to T6, the EERs of cases A and B with adaptation are less than that of case C without adaptation. As time goes by, both case A and case B have lower EERs.
FIG. 2 shows the EER change according to the lapse of time when an utterance change or an outlier occurs. When the outlier occurs, cases B and C are influenced by the outlier during testing, so that false rejections occur significantly and the EER increases. When an authentication succeeds, the speaker model is adapted with the polluted signal, forming an inaccurate speaker model. Thus, a false rejection occurs even when clean data is tested, and a false acceptance occurs when another person enters. The performance of such a wrongly adapted model is found to be worse than that of a model adapted when no outlier occurs. In case A, since the test is performed so as to minimize the effect of the outlier, only a slight difference exists between the performance of case A and the performance of the case where the outlier does not occur. The adaptation of the model also minimizes the effect of the outlier in case A, so that, although an outlier occurs initially, the model is prevented from being inaccurately adapted. Thus, the performance of the adapted model when the outlier occurs is similar to that when the outlier does not occur.
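For reference, the Equal Error Rate compared in FIGs. 1 and 2 is the operating point at which the false acceptance rate equals the false rejection rate. The following Python sketch estimates it from verification scores, under the assumed convention (not stated in the specification) that higher scores mean more speaker-like:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER: sweep the decision threshold over all observed
    scores and return the smallest achievable max(FRR, FAR)."""
    best = 1.0
    for thr in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < thr)    # false rejection rate
        far = np.mean(impostor >= thr)  # false acceptance rate
        best = min(best, max(frr, far))
    return best

# Perfectly separated scores give an EER of 0:
print(equal_error_rate(np.array([1.0, 2.0, 3.0]), np.array([-1.0, -2.0])))
```

An inaccurately adapted model pushes genuine and impostor score distributions together, which is exactly what raises the EER for cases B and C in FIG. 2.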
Industrial Applicability

As can be seen from the foregoing, the present invention solves the problem in which a speaker model changes inaccurately due to the effect of outliers. Moreover, even as time lapses, the rate at which new data are adapted into the model remains greater than a predetermined value. Thus, the present invention obtains an accurate speaker adaptation model.
Although a preferred embodiment of the present invention has been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.