CN105139856B - Probabilistic linear discriminant analysis speaker recognition method based on prior-knowledge-regularized covariance - Google Patents

Probabilistic linear discriminant analysis speaker recognition method based on prior-knowledge-regularized covariance

Info

Publication number
CN105139856B
Authority
CN
China
Prior art keywords
regular
speaker
linear discriminant
vector
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510560667.1A
Other languages
Chinese (zh)
Other versions
CN105139856A (en)
Inventor
李明
蔡炜城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Original Assignee
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University and SYSU CMU Shunde International Joint Research Institute
Priority to CN201510560667.1A
Publication of CN105139856A
Application granted
Publication of CN105139856B
Status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention discloses a probabilistic linear discriminant analysis (PLDA) speaker recognition method based on covariance regularized by prior knowledge. The method can use any useful information about the training speech to regularize the covariance assumption and the iterative training process of the PLDA model, ultimately yielding a PLDA model that is more discriminative and better reflects the true conditions. In addition, two regularization coefficients are introduced to make the model tunable, so that it can be adaptively optimized for different kinds of regularization information. On the same data set, a model trained with the present invention achieves noticeably better speaker recognition evaluation results than the conventional model: on an authoritative speaker recognition evaluation database, the equal error rate (EER) and the normalized minimum detection cost (norm minDCF) are reduced by 10%-20% relative.

Description

Probabilistic linear discriminant analysis speaker recognition method based on prior-knowledge-regularized covariance
Technical field
The present invention relates to the field of voiceprint recognition, and in particular to a probabilistic linear discriminant analysis (PLDA) speaker recognition method based on covariance regularized by prior knowledge.
Background art
Speaker recognition is a technology that uses the speaker-specific information contained in a speech signal to determine and verify the true identity behind it. It is now widely used in identity authentication, video conferencing, access control, military applications, criminal investigation, and many other fields, and has become an increasingly important modern biometric authentication technology. In recent years, methods based on the total variability factor have become the mainstream approach in speaker recognition. They do not strictly separate speaker and channel but model them as a whole. With this technique, the first-order statistics supervector of each utterance on the Gaussian mixture universal background model (UBM) is mapped to a fixed-length low-dimensional vector that retains the speaker information to a large extent; this low-dimensional vector is therefore also called the identity vector (i-vector). For these low-dimensional total variability factors, researchers have proposed many channel compensation and back-end modeling techniques based on supervised learning, among which probabilistic linear discriminant analysis (PLDA) has attracted wide attention for its excellent performance.
PLDA is a typical generative model. It decomposes the total variability factor into a speaker component, which describes the between-class differences among speakers, and a channel component, which describes the within-class differences of the same speaker, as follows:
$$\eta_{ij} = \phi \beta_i + \epsilon_{ij}$$
where $\eta_{ij}$ denotes the j-th i-vector of the i-th speaker in the training speech data, $\phi$ is the speaker space matrix, $\beta_i$ is the low-dimensional speaker vector of the i-th speaker, and $\epsilon_{ij}$ is the residual term that the speaker space cannot capture.
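The generative reading of this model can be illustrated with a short sampling sketch. It is only an illustration under stated assumptions: the i-vectors are taken to be centered, the speaker vector is given a standard normal prior N(0, I), and the function name `sample_plda` and its parameters are hypothetical, not taken from the patent.

```python
import numpy as np

def sample_plda(phi, sigma, n_speakers, n_utts, seed=0):
    """Draw i-vectors from eta_ij = phi @ beta_i + eps_ij."""
    rng = np.random.default_rng(seed)
    dim, rank = phi.shape
    # ASSUMPTION: beta_i ~ N(0, I), one latent speaker vector per speaker
    beta = rng.standard_normal((n_speakers, rank))
    # eps_ij ~ N(0, sigma): one residual per utterance, shared global covariance
    eps = rng.multivariate_normal(np.zeros(dim), sigma, size=(n_speakers, n_utts))
    # The speaker offset phi @ beta_i is shared by all utterances of speaker i
    eta = (beta @ phi.T)[:, None, :] + eps
    return eta, beta
```

All utterances of one speaker share the same offset `phi @ beta_i`, so the spread between speakers comes from `phi` while the spread within a speaker comes from `sigma`.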
It is generally assumed that the two components $\beta_i$ and $\epsilon_{ij}$ are statistically independent and Gaussian distributed. The distribution of the residual term is described by a single global covariance matrix $\Sigma$. Both $\phi$ and $\Sigma$ are unknown and must be estimated from a large amount of labeled training data. The optimal $\phi$ and $\Sigma$ are then applied to the enrollment data and the test data to obtain a likelihood score for each enrollment-test pair in this space, which is used to decide whether the test utterance and the enrolled utterance come from the same person.
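One common way to turn such a model into a pairwise score is a log-likelihood ratio between the "same speaker" and "different speakers" hypotheses. The sketch below shows that standard construction for the conventional global-covariance model; the patent text does not spell out its scoring formula, so this is an assumed form. It further assumes centered i-vectors and a standard normal prior on the speaker vector, and the helper names are hypothetical.

```python
import numpy as np

def gauss_logpdf(x, cov):
    """Log-density of a zero-mean Gaussian N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def plda_llr(enroll, test, phi, sigma):
    """Log-likelihood ratio: same speaker vs. different speakers."""
    between = phi @ phi.T      # speaker (between-class) covariance
    total = between + sigma    # marginal covariance of a single i-vector
    # Same speaker: both i-vectors share beta, so they are correlated.
    joint = np.block([[total, between], [between, total]])
    same = gauss_logpdf(np.concatenate([enroll, test]), joint)
    # Different speakers: the two i-vectors are independent.
    diff = gauss_logpdf(enroll, total) + gauss_logpdf(test, total)
    return same - diff
```

A higher score favors the same-speaker hypothesis; a decision threshold would be chosen on held-out data.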
The limitation of this framework, however, is that physical characteristics such as frame length and signal-to-noise ratio differ from utterance to utterance. A PLDA model trained with a single global covariance matrix describing the residual distribution will clearly deviate from the true model, and it erases information inherent in each utterance that could help improve recognition performance.
Summary of the invention
To overcome the above limitation of the existing PLDA model training process in speaker recognition, and after extensive experiments and performance tuning, the present invention provides a PLDA speaker recognition method based on covariance regularized by prior knowledge. The method can use any useful prior knowledge about the training speech, such as utterance duration, signal-to-noise ratio, or even score information obtained from the previously trained model or from other models, to regularize the training of the current PLDA model.
To achieve the above object, the technical solution adopted by the present invention is as follows:
A PLDA speaker recognition method based on covariance regularized by prior knowledge uses the known useful information about the training speech to regularize the covariance assumption and the iterative process of the PLDA model, and comprises the following steps:
1) Collect the intrinsic physical information or the subjective or objective score information of each training utterance, denoted $d_{ij}$, where the subscripts i and j indicate that the information belongs to the j-th training utterance of the i-th speaker;
2) Use the information $d_{ij}$ to regularize the covariance matrix that characterizes the residual term in the PLDA model;
3) Use the regularized covariance matrix to obtain the conditional distribution of the average of the i-vectors of the i-th speaker:
[formula not reproduced in this text]
where $F_i$ denotes the average of all i-vectors of the i-th training speaker; the distribution has mean vector $\phi\beta_i$ and the covariance shown in the formula; $\phi$ is the speaker space matrix, $M_i$ is the total number of utterances of the i-th training speaker, and $\beta_i$ is the low-dimensional speaker vector of the i-th speaker, which is a latent variable;
According to Bayes' rule, the posterior probability of the latent variable $\beta_i$ given the average vector $F_i$ is obtained, with mean vector:
[formula not reproduced in this text]
where I is the identity matrix and $\chi_i$ is the sum of all i-vectors of the i-th person;
According to the EM algorithm, given the mean vector $E(\beta_i)$ of the posterior probability $P(\beta_i|F_i)$, the update formulas for the speaker space matrix $\phi$ and the covariance matrix $\Sigma$ are as follows:
[formula not reproduced in this text]
$E(\beta_i)$ and the values of $\phi$ and $\Sigma$ are updated alternately until convergence, giving the optimal $\phi$ and $\Sigma$. This completes the training of the PLDA model for speaker recognition and yields the trained PLDA model;
4) Use the trained PLDA model obtained in step 3) to identify the speech to be recognized.
The information $d_{ij}$ in step 1) may be the frame length of the utterance, its signal-to-noise ratio, score information obtained from recognition by other models, score information obtained from the previous round of recognition by the present model, and so on.
Further, the regularization in step 2) is performed as follows:
[formula not reproduced in this text]
where $\Sigma$ is the global covariance matrix, and u and v are regularization coefficients whose optimal values are found by repeated tuning. The coefficient expression in the formula, taken as a whole, forms a regularization term that maps the global covariance matrix to an individual covariance adapted to each training utterance.
Compared with the prior art, the beneficial effects of the present invention are as follows. Any useful information about the training speech can be used to regularize the covariance assumption and the iterative process of the PLDA model, ultimately yielding a PLDA model that is more discriminative and better reflects the true conditions. In addition, two regularization coefficients are introduced to make the model tunable, so that it can be adaptively optimized for different kinds of regularization information.
On the same data set, a model trained with the present invention achieves noticeably better speaker recognition evaluation results than the conventional model; on an authoritative speaker recognition evaluation database, the equal error rate (EER) can be reduced by 10%-20% relative.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention in which intrinsic physical information of the training speech is selected as prior knowledge.
Fig. 2 is a flowchart of the iterative regularization of the present invention, in which score information obtained from the previously trained model is selected as prior knowledge for training the current model.
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent. To better illustrate this embodiment, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of an actual product.
Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings. The technical solution of the present invention is further described below with reference to the drawings and an embodiment.
Fig. 1 shows the single-pass regularization process of the present invention, in which intrinsic physical information of the training speech (such as duration or signal-to-noise ratio), or score information obtained from other models, is selected as prior knowledge for training the present model. In this embodiment, the duration of the training speech is selected as the prior knowledge for covariance regularization.
Fig. 2 shows the iterative regularization process of the present invention, in which score information obtained from the previously trained model is selected as prior knowledge for training the current model.
A PLDA speaker recognition method based on covariance regularized by prior knowledge uses the known useful information about the training speech to regularize the covariance assumption and the iterative process of the PLDA model, and comprises the following.
Collect the intrinsic physical information or the subjective or objective score information of each training utterance, denoted $d_{ij}$, where the subscripts i and j indicate that the information belongs to the j-th training utterance of the i-th speaker.
In this embodiment, the duration of the training speech is selected as the prior knowledge.
Because the original duration of an utterance is generally lost after feature extraction, and speech data is usually processed frame by frame, this embodiment uses the frame length of the utterance in place of the original duration. The frame length can be measured from the zeroth-order statistics of the utterance on the Gaussian mixture universal background model, as follows:
[formula not reproduced in this text]
In the above formula, $d_{ij}$ is the total number of frames of the j-th utterance of the i-th person, Mix is the number of Gaussians in the Gaussian mixture background model, and N(n) is the posterior probability of the speech features on each Gaussian.
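A minimal sketch of how a frame count can be recovered from zeroth-order statistics. It assumes that N(n) is the sum over frames of the posterior of Gaussian n, and that the frame count is the sum of N(n) over all Mix Gaussians; both are readings of the definitions above rather than a reproduction of the patent's formula, and the function names are hypothetical.

```python
def zeroth_order_stats(posteriors):
    """posteriors[t][n] is the posterior of Gaussian n for frame t.

    Returns N(n) for every Gaussian: the per-Gaussian sum over frames.
    """
    n_mix = len(posteriors[0])
    return [sum(frame[n] for frame in posteriors) for n in range(n_mix)]

def frame_count(posteriors):
    """ASSUMED form: d_ij = sum of N(n) over the Mix Gaussians."""
    return sum(zeroth_order_stats(posteriors))
```

Because the posteriors of each frame sum to one, adding up the zeroth-order statistics over all Gaussians gives back the number of frames, which is why no separate duration record is needed.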
The covariance of the PLDA model can then be regularized during training on the basis of $d_{ij}$. Different regularization coefficients u and v can be tried experimentally, and the values that give the best recognition performance are finally selected.
For regularization with frame length information in this embodiment, PLDA model training works best when u is set to the mean of all $d_{ij}$, denoted Δd, and v is set to 1.5.
That is, the frame length information is used for regularization according to the following formula:
[formula not reproduced in this text]
where $\Sigma$ is the global covariance matrix, and u and v are regularization coefficients whose optimal values are found by repeated tuning. The coefficient expression in the formula, taken as a whole, forms a regularization term that maps the global covariance matrix to an individual covariance adapted to each training utterance.
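The regularization formula itself is not reproduced in this text, so the following sketch rests on a purely assumed form: each utterance's covariance is the global covariance scaled by (u / d_ij) raised to the power v. That form was chosen only because it is consistent with u being a typical frame count and v an exponent; it should not be read as the patent's formula. The function name is hypothetical.

```python
import numpy as np

def regularized_covariances(sigma, d, v=1.5, u=None):
    """Map a global covariance to one covariance per utterance."""
    d = np.asarray(d, dtype=float)
    if u is None:
        u = d.mean()          # embodiment: u is the mean of all d_ij
    # ASSUMED regularization term; the source formula is not available
    scale = (u / d) ** v
    return scale[:, None, None] * sigma
```

Under this assumed form, an utterance of average length keeps the global covariance, a longer one gets a smaller covariance, and a shorter one gets a larger covariance.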
Using the regularized covariance matrix, the conditional distribution of the average of the i-vectors of the i-th speaker is obtained:
[formula not reproduced in this text]
where $F_i$ denotes the average of all i-vectors of the i-th training speaker; the distribution has mean vector $\phi\beta_i$ and the covariance shown in the formula; $\phi$ is the speaker space matrix, $M_i$ is the total number of utterances of the i-th training speaker, and $\beta_i$ is the low-dimensional speaker vector of the i-th speaker, which is a latent variable.
According to Bayes' rule, the posterior probability of the latent variable $\beta_i$ given the average vector $F_i$ is obtained, with mean vector:
[formula not reproduced in this text]
where I is the identity matrix and $\chi_i$ is the sum of all i-vectors of the i-th person.
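For orientation, the posterior mean of the speaker vector in conventional PLDA with a single global covariance has a well-known closed form, sketched below. This is the baseline, not the regularized formula of the patent, which is not reproduced in this text. The sketch assumes centered i-vectors and a standard normal prior on the speaker vector; the function name is hypothetical.

```python
import numpy as np

def posterior_mean(chi, m, phi, sigma):
    """E[beta_i] for conventional PLDA with a global covariance.

    chi: sum of the speaker's i-vectors; m: number of utterances.
    """
    rank = phi.shape[1]
    # phi^T sigma^{-1}, using the symmetry of sigma
    pt_sinv = np.linalg.solve(sigma, phi).T
    # Posterior precision: I + M_i * phi^T sigma^{-1} phi
    precision = np.eye(rank) + m * pt_sinv @ phi
    return np.linalg.solve(precision, pt_sinv @ chi)
```

With per-utterance covariances, the terms involving `sigma` would have to be replaced by their regularized counterparts.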
According to the EM algorithm, given the mean vector $E(\beta_i)$ of the posterior probability $P(\beta_i|F_i)$, the update formulas for the speaker space matrix $\phi$ and the covariance matrix $\Sigma$ are as follows:
[formula not reproduced in this text]
$E(\beta_i)$ and the values of $\phi$ and $\Sigma$ are updated alternately until convergence, giving the optimal $\phi$ and $\Sigma$. This completes the training of the PLDA model for speaker recognition and yields the trained PLDA model.
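The alternating structure of the training loop can be sketched with the standard EM updates for conventional, global-covariance PLDA. This is a baseline illustration only: the update formulas of the patent's regularized model are not reproduced in this text and may differ. Centered i-vectors and a standard normal prior on the speaker vector are assumed, and `em_step` is a hypothetical name.

```python
import numpy as np

def em_step(speakers, phi, sigma):
    """One EM iteration; speakers is a list of (M_i, dim) i-vector arrays."""
    dim, rank = phi.shape
    pt_sinv = np.linalg.solve(sigma, phi).T      # phi^T sigma^{-1}
    a = np.zeros((dim, rank))    # sum_i chi_i E[beta_i]^T
    b = np.zeros((rank, rank))   # sum_i M_i E[beta_i beta_i^T]
    s = np.zeros((dim, dim))     # sum_ij eta_ij eta_ij^T
    n = 0
    for eta in speakers:
        m = eta.shape[0]
        chi = eta.sum(axis=0)
        # E-step: posterior covariance and mean of beta_i
        post_cov = np.linalg.inv(np.eye(rank) + m * pt_sinv @ phi)
        mean = post_cov @ (pt_sinv @ chi)
        a += np.outer(chi, mean)
        b += m * (post_cov + np.outer(mean, mean))
        s += eta.T @ eta
        n += m
    # M-step: closed-form updates of phi and sigma
    new_phi = a @ np.linalg.inv(b)
    new_sigma = (s - new_phi @ a.T) / n
    return new_phi, new_sigma
```

Calling `em_step` repeatedly, feeding each result back in, alternates between the posterior of the speaker vectors and the parameter updates until the values stop changing.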
On the same data set, the PLDA model trained with this method achieves noticeably better speaker recognition evaluation results than the conventional model. On the core test set of the 2010 Speaker Recognition Evaluation of the U.S. National Institute of Standards and Technology (NIST SRE 2010), the equal error rate (EER) dropped from 2.82% to 2.26%, a relative reduction of 19.8%, and the normalized minimum detection cost (norm minDCF) dropped from 0.311 to 0.268, a relative reduction of 13.8%.
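The relative reductions quoted here follow from the absolute figures by simple arithmetic, as the short check below shows. The helper name is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent when going from baseline to improved."""
    return (baseline - improved) / baseline * 100.0

eer_drop = relative_reduction(2.82, 2.26)     # EER in percent
dcf_drop = relative_reduction(0.311, 0.268)   # norm minDCF
print(f"EER: {eer_drop:.2f}%  norm minDCF: {dcf_drop:.2f}%")
```

The EER figure comes out slightly above 19.8%, which suggests the reported value was truncated rather than rounded; the minDCF figure agrees with the reported 13.8%.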
The same or similar reference numerals correspond to the same or similar components.
The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting this patent.
Obviously, the above embodiment of the present invention is merely an example given to illustrate the invention clearly and does not limit the embodiments of the invention. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (1)

1. A probabilistic linear discriminant analysis (PLDA) speaker recognition method based on covariance regularized by prior knowledge, characterized in that known useful information about the training speech is used to regularize the covariance assumption and the iterative process of the PLDA model, comprising the following steps:
1) collecting the intrinsic physical information or the subjective or objective score information of each training utterance, denoted $d_{ij}$, where the subscripts i and j indicate that the information belongs to the j-th training utterance of the i-th speaker;
2) using the information $d_{ij}$ to regularize the covariance matrix that characterizes the residual term in the PLDA model, the regularization being performed as follows:
[formula not reproduced in this text]
where $\Sigma$ is the global covariance matrix, and u and v are regularization coefficients whose optimal values are found by repeated tuning; the coefficient expression in the formula, taken as a whole, forms a regularization term that maps the global covariance matrix to an individual covariance adapted to each training utterance;
3) using the regularized covariance matrix to obtain the conditional distribution of the average of the i-vectors of the i-th speaker:
[formula not reproduced in this text]
where $F_i$ denotes the average of all i-vectors of the i-th training speaker, and $\beta_i$ is the low-dimensional speaker vector of the i-th speaker, which is a latent variable; the right-hand side of the equation denotes a Gaussian distribution whose covariance is as shown in the formula and whose mean vector is $\phi\beta_i$; $\phi$ is the speaker space matrix, and $M_i$ is the total number of utterances of the i-th training speaker;
according to Bayes' rule, the posterior probability of the latent variable $\beta_i$ given the average vector $F_i$ is obtained, with mean vector:
[formula not reproduced in this text]
where I is the identity matrix and $\chi_i$ is the sum of all i-vectors of the i-th person;
according to the EM algorithm, given the mean vector $E(\beta_i)$ of the posterior probability $P(\beta_i|F_i)$, the update formulas for the speaker space matrix $\phi$ and the covariance matrix $\Sigma$ are as follows:
[formula not reproduced in this text]
$E(\beta_i)$ and the values of $\phi$ and $\Sigma$ are updated alternately until convergence, giving the optimal $\phi$ and $\Sigma$, which completes the training of the PLDA model for speaker recognition and yields the trained PLDA model, where $\eta_{ij}$ is the i-vector of the j-th training utterance of the i-th speaker, and T is the total number of speakers in the training data;
4) using the trained PLDA model obtained in step 3) to identify the speech to be recognized.
CN201510560667.1A 2015-09-02 2015-09-02 Probabilistic linear discriminant analysis speaker recognition method based on prior-knowledge-regularized covariance Active CN105139856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510560667.1A CN105139856B (en) 2015-09-02 2015-09-02 Probability linear discriminant method for distinguishing speek person based on the regular covariance of priori knowledge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510560667.1A CN105139856B (en) 2015-09-02 2015-09-02 Probability linear discriminant method for distinguishing speek person based on the regular covariance of priori knowledge

Publications (2)

Publication Number Publication Date
CN105139856A CN105139856A (en) 2015-12-09
CN105139856B true CN105139856B (en) 2019-07-09

Family

ID=54725178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510560667.1A Active CN105139856B (en) 2015-09-02 2015-09-02 Probabilistic linear discriminant analysis speaker recognition method based on prior-knowledge-regularized covariance

Country Status (1)

Country Link
CN (1) CN105139856B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679323B (en) * 2015-12-24 2019-09-03 讯飞智元信息科技有限公司 A kind of number discovery method and system
CN106297807B (en) * 2016-08-05 2019-03-01 腾讯科技(深圳)有限公司 The method and apparatus of training Voiceprint Recognition System
CN109584884B (en) * 2017-09-29 2022-09-13 腾讯科技(深圳)有限公司 Voice identity feature extractor, classifier training method and related equipment
CN107766892B (en) * 2017-10-31 2020-04-10 Oppo广东移动通信有限公司 Application program control method and device, storage medium and terminal equipment
CN112992157A (en) * 2021-02-08 2021-06-18 贵州师范大学 Neural network noisy line identification method based on residual error and batch normalization
CN113283804B (en) * 2021-06-18 2022-05-31 支付宝(杭州)信息技术有限公司 Training method and system of risk prediction model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7529666B1 (en) * 2000-10-30 2009-05-05 International Business Machines Corporation Minimum bayes error feature selection in speech recognition
CN103077720A (en) * 2012-12-19 2013-05-01 中国科学院声学研究所 Speaker identification method and system
CN103077719A (en) * 2012-12-27 2013-05-01 安徽科大讯飞信息科技股份有限公司 Method for quickly processing total space factor based on matrix off-line precomputation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996527B2 (en) * 2001-07-26 2006-02-07 Matsushita Electric Industrial Co., Ltd. Linear discriminant based sound class similarities with unit value normalization

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NORMALIZATION OF TOTAL VARIABILITY MATRIX FOR I-VECTOR/PLDA SPEAKER VERIFICATION;Wei Rao,et al.;《2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》;20150424;1-4
Within-Class Covariance Normalization for SVM-based Speaker Recognition;Andrew O. Hatch,et al.;《INTERSPEECH 2006 - ICSLP》;20060921;1471-1474
基于PLDA 的"一对多"下的说话人确认方法研究;许云飞等;《第十二届全国人机语音通讯学术会议》;20131231;1-5
高斯PLDA 在说话人确认中的应用及其联合估计;许云飞等;《自动化学报》;20140630;第40卷(第6期);1068-1074

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant