CN102129859B - Voiceprint authentication system and method for rapid channel compensation - Google Patents

Info

Publication number
CN102129859B
Authority
CN
China
Prior art keywords
factor
gaussian component
ubm
selection device
compensation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010100272991A
Other languages
Chinese (zh)
Other versions
CN102129859A (en)
Inventor
黄伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengle Information Technolpogy Shanghai Co Ltd
Original Assignee
Shengle Information Technolpogy Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shengle Information Technolpogy Shanghai Co Ltd
Priority to CN2010100272991A
Publication of CN102129859A
Application granted
Publication of CN102129859B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a voiceprint authentication system with rapid channel compensation. The system comprises a Gaussian selector, a universal background model (UBM) mixture-component selection module, a speaker-factor or channel-factor module, a UBM, and a speaker model. The Gaussian selector classifies the Gaussian components of the UBM. The UBM mixture-component selection module uses the Gaussian selector to select, for each frame's observation vector in the training data, the nearby mixture components, and computes zeroth-order or first-order statistics from them. The speaker-factor or channel-factor module estimates a speaker factor or channel factor from these statistics, and the speaker model is built from that factor. The invention also discloses a voiceprint authentication method with rapid channel compensation. The system and method compensate for mismatch while markedly reducing the amount of computation, and thereby markedly increase training speed.

Description

Voiceprint authentication system and method for rapid channel compensation
Technical field
The present invention relates to a voiceprint authentication system, and in particular to a voiceprint authentication system with rapid channel compensation. The invention further relates to a voiceprint authentication method with rapid channel compensation.
Background art
Voiceprint recognition, also called speaker recognition, is a technology that automatically identifies a speaker from speech parameters in the speech waveform that reflect the speaker's physiological, psychological, and behavioral characteristics. Differences in innate vocal tract structure and in acquired pronunciation habits make every person's voice unique, and this uniqueness allows a person's identity to be determined accurately.
In a voiceprint authentication (or voiceprint recognition) system, mismatch between training speech and test speech is the main factor limiting the performance of current systems. Speech mismatch has several sources: different transmission channels, the influence of recording equipment and environmental noise, and changes in the speaker's physiological and emotional state. In systems based on the statistical GMM-UBM model structure, mismatch means that the statistical model obtained from the training speech cannot describe the parameter distribution of speech under test conditions well, which causes a sharp drop in system performance.
To address this mismatch problem in voiceprint authentication systems, parameter-level compensation methods such as feature mapping are currently available. Feature mapping requires speech with known channel labels to train a channel classifier and channel mapping rules, and then uses the classifier's decision to perform channel compensation. When the channel types are few and simple, this method achieves a certain effect. In complex environments, mismatch compensation based on factor analysis has achieved good results. This method estimates the mismatch information factors of the training speech and the test speech separately and uses them to eliminate the mismatch between the two. It needs no channel classifier to decide the channel class of the speech, and it achieves continuous, comprehensive compensation of speech mismatch. On the NIST '06 database, the factor-analysis method reduces the error rate by about 57%, which shows that factor analysis is effective at improving voiceprint authentication systems with respect to channel mismatch.
However, mismatch compensation based on factor analysis must estimate the EM statistics of every speech frame against all mixture components of the UBM. Compared with feature mapping, the amount of computation is too large and training and testing take a long time, which limits its practical application. Table 1 compares a baseline system with a voiceprint authentication system that uses factor analysis. The factor-analysis system handles the mismatch between training and test environments and channels well: its equal error rate (EER) falls from 8.94% to 3.76%. The average required time, however, rises from 1.26 seconds to 9.53 seconds, which limits the use of factor analysis in practical applications.
Table 1
System            Required time (s)   EER (%)
Baseline system   1.26                8.94
Factor analysis   9.53                3.76
Summary of the invention
The technical problem to be solved by the present invention is to provide a voiceprint authentication system with rapid channel compensation that achieves mismatch compensation with a markedly reduced amount of computation, and can therefore significantly increase training speed. To this end, the invention also provides a voiceprint authentication method with rapid channel compensation.
To solve this technical problem, the invention provides the following technical scheme: a voiceprint authentication system with rapid channel compensation comprising a Gaussian selector, a UBM mixture-component selection module, a speaker-factor or channel-factor module, a UBM model, and a speaker model. The Gaussian selector classifies the Gaussian components of the UBM model. The UBM mixture-component selection module uses the Gaussian selector to select, for each frame's observation vector in the training data, the nearby mixture components, and computes zeroth-order or first-order statistics from them. The speaker-factor or channel-factor module estimates a speaker factor or channel factor from these zeroth-order or first-order statistics, and the speaker model is built from that speaker factor or channel factor.
In addition, the invention provides a voiceprint authentication method with rapid channel compensation, comprising the following steps:
1) construct a Gaussian selector, which classifies the Gaussian components of the UBM model;
2) compute posterior probabilities between the Gaussian components classified by the Gaussian selector and each frame's observation vector in the training data, and select the groups of Gaussian components with the largest posterior probabilities;
3) compute the posterior probabilities of the current observation vector for the UBM Gaussian components that belong to the selected groups, and set the posterior probabilities of the remaining Gaussian components directly to zero;
4) estimate the speaker factor or channel factor from the zeroth-order or first-order statistics computed in step 3);
5) build the speaker model from the speaker factor or channel factor.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention adopts a continuous mismatch compensation method based on factor analysis, which matches real mismatch conditions more closely, so the improvement in system performance after mismatch compensation is more pronounced.
2. The invention proposes a fast mismatch compensation algorithm based on a Gaussian selector. For each frame's observation vector, the corresponding Gaussian components are selected to compute the EM statistics, which reduces the complexity of the factor-analysis algorithm and increases training and testing speed several times over.
3. To improve system performance after mismatch compensation more effectively while the fast mismatch compensation algorithm speeds up training and testing, the invention proposes a mixture-component selection scheme based on a Top-N strategy. This improves the precision with which the Gaussian selector screens UBM Gaussian components and the compensation precision of the fast mismatch compensation algorithm, further improving system performance after mismatch compensation.
Description of drawings
The invention is explained in further detail below with reference to the drawings and embodiments:
Fig. 1 is a schematic diagram of the posterior probabilities of one frame's speech vector over all mixture components of the UBM;
Fig. 2 is a structural block diagram of the voiceprint authentication system with rapid channel compensation of the invention.
Embodiments
The present invention proposes a mismatch compensation method that combines factor analysis with Gaussian selection. For each frame vector, a Gaussian selector selects among the mixture components of the UBM, and only the mixture components that contribute most to the EM (zeroth-order or first-order) statistics are evaluated, which reduces the cost of computing the EM statistics. Mismatch compensation is thus achieved with a markedly lower amount of computation: training speed rises to about 10 times the original, with almost no loss in performance.
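The patent does not write out the EM statistics. As a hedged sketch in the usual GMM-UBM notation (the symbols w_c, μ_c, Σ_c, T, and the per-frame shortlist S_t are notation assumed here, not taken from the source), the zeroth-order and first-order statistics with Gaussian selection can be written as:

```latex
N_c = \sum_{t=1}^{T} \gamma_c(o_t), \qquad
F_c = \sum_{t=1}^{T} \gamma_c(o_t)\, o_t,
\qquad
\gamma_c(o_t) =
\begin{cases}
\dfrac{w_c\, \mathcal{N}(o_t;\, \mu_c, \Sigma_c)}
      {\sum_{j \in S_t} w_j\, \mathcal{N}(o_t;\, \mu_j, \Sigma_j)}, & c \in S_t, \\[2ex]
0, & c \notin S_t .
\end{cases}
```

Here S_t is the set of UBM components shortlisted for frame o_t. Taking S_t to be all components gives the full factor-analysis computation, so the saving comes entirely from keeping S_t small.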
Fig. 1 shows the posterior probability output γ(o_t) of one frame's observation vector o_t over all mixture components of the UBM. As Fig. 1 shows, the speech vector has large posterior probabilities only for the Gaussian components that lie close to it in the feature space; for distant mixture components the posterior probabilities are all very small, close to zero. Only the mixture components with large posterior probabilities contribute to the EM statistics, and their number is far smaller than the total number of UBM mixture components. Computing only these components can therefore greatly reduce the cost of the EM statistics in the factor-analysis method. The components with large posterior probabilities, however, differ from frame to frame.
Fig. 2 is a block diagram of target speaker model training based on Gaussian selection and factor analysis. The dashed box marks the step in which the Gaussian selector quickly selects the nearby mixture components for each frame's speech vector to compute the EM statistics. As shown in Fig. 2, the voiceprint authentication system with rapid channel compensation of the invention comprises a Gaussian selector, a UBM mixture-component selection module, a speaker-factor or channel-factor module, a UBM model, and a speaker model. The Gaussian selector classifies the Gaussian components of the UBM model. The UBM mixture-component selection module uses the Gaussian selector to select, for each frame's observation vector in the training data, the nearby mixture components, and computes zeroth-order or first-order statistics from them. The speaker-factor or channel-factor module estimates a speaker factor or channel factor from these statistics, and the speaker model is built from that factor, so as to achieve voiceprint authentication of the speaker.
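The sparsity that Fig. 1 describes can be illustrated with a small sketch. It assumes a diagonal-covariance GMM and uses a made-up one-dimensional toy model; the function names and all numeric values are hypothetical and only show that a single frame gives non-negligible posterior mass to very few components.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of every mixture component for one frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # log-sum-exp keeps the normalisation stable
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 1-D "UBM" with 8 well-separated components (illustrative values only).
means = [[float(4 * c)] for c in range(8)]
variances = [[1.0] for _ in range(8)]
weights = [1.0 / 8] * 8

gamma = posteriors([8.5], weights, means, variances)
# Components whose posterior exceeds an arbitrary 1% threshold.
significant = [c for c, g in enumerate(gamma) if g > 1e-2]
print(significant, [round(g, 4) for g in gamma])
```

In a real system the components would be high-dimensional and the threshold would not be needed; the point is only that most entries of `gamma` are negligible for any one frame.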
The channel factor is a factor matrix trained on speech data from many people over different channels. This matrix describes the characteristics of the different channels in detail. During training and testing, it is used to compensate speech signals transmitted over different channels, reducing the loss in recognition performance caused by differences between channels.
The speaker factor is a factor matrix trained on repeated utterances from many people. This matrix describes in detail how a speaker's pronunciation varies over time. During training and testing, it is used to compensate for the loss in recognition performance caused by changes in a speaker's pronunciation characteristics from one utterance to the next.
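The patent gives no equation for the factor model. One common way to write a factor-analysis model of this kind in the speaker recognition literature is the joint factor analysis decomposition of the GMM mean supervector. The form below, and the mapping of the patent's two factor matrices onto V and U, are assumptions for illustration only:

```latex
M = m + V y + U x
```

Here m is the UBM mean supervector, V is a low-rank matrix spanning speaker variability with speaker factors y, and U is a low-rank matrix spanning channel (session) variability with channel factors x. The factors y and x are estimated from the zeroth-order and first-order statistics of an utterance.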
The fast mismatch compensation algorithm of the invention proceeds as follows:
1) Cluster the Gaussian components of the UBM model by similarity to construct the Gaussian selector. By clustering the UBM Gaussian components, the invention merges each group of similar Gaussian components into one Gaussian component of the Gaussian selector.
2) For each frame's observation vector in the training and test speech, first compute posterior probabilities against the Gaussian components of the Gaussian selector, and select the groups of Gaussian components with the largest posterior probabilities; that is, use a Top-N strategy to select, from all Gaussian components, the N components that contribute most.
3) Compute the posterior probabilities of the current observation vector for the UBM Gaussian components that belong to the selected groups, and set the posterior probabilities of the remaining Gaussian components directly to zero.
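The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: similarity is taken to be Euclidean distance between component means (plain k-means), each selector Gaussian is reduced to its centroid, and all function names and toy values are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_selector(means, num_clusters, iters=10):
    """Step 1: group UBM components by k-means on their means (assumed
    similarity measure); each centroid stands in for one selector Gaussian."""
    step = len(means) // num_clusters
    centroids = [list(means[k * step]) for k in range(num_clusters)]
    members = [[] for _ in range(num_clusters)]
    for _ in range(iters):
        members = [[] for _ in range(num_clusters)]
        for c, m in enumerate(means):
            nearest = min(range(num_clusters), key=lambda k: sq_dist(m, centroids[k]))
            members[nearest].append(c)
        for k, group in enumerate(members):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [
                    sum(means[c][d] for c in group) / len(group)
                    for d in range(len(means[0]))
                ]
    return centroids, members

def sparse_posteriors(x, ubm, selector, top_n):
    """Steps 2-3: rank selector groups, then compute posteriors only for the
    UBM components in the top-N groups; all other posteriors stay zero."""
    weights, means, variances = ubm
    centroids, members = selector
    # Ranking by distance to the centroid equals ranking by posterior under
    # equal-weight, unit-variance selector Gaussians (a simplification).
    groups = [k for k in range(len(centroids)) if members[k]]
    best = sorted(groups, key=lambda k: sq_dist(x, centroids[k]))[:top_n]
    shortlist = [c for k in best for c in members[k]]
    logs = [
        math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
        for c in shortlist
    ]
    top = max(logs)  # log-sum-exp keeps the normalisation stable
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    gamma = [0.0] * len(weights)
    for c, e in zip(shortlist, exps):
        gamma[c] = e / total
    return gamma

def accumulate_stats(frames, ubm, selector, top_n):
    """Zeroth-order and first-order statistics summed over all frames."""
    num, dim = len(ubm[0]), len(ubm[1][0])
    zeroth = [0.0] * num
    first = [[0.0] * dim for _ in range(num)]
    for x in frames:
        for c, g in enumerate(sparse_posteriors(x, ubm, selector, top_n)):
            if g > 0.0:
                zeroth[c] += g
                for d in range(dim):
                    first[c][d] += g * x[d]
    return zeroth, first

# Toy 1-D UBM with 8 components clustered into 4 groups (illustrative values).
ubm = ([1.0 / 8] * 8, [[float(4 * c)] for c in range(8)], [[1.0] for _ in range(8)])
selector = build_selector(ubm[1], num_clusters=4)
zeroth, first = accumulate_stats([[8.5], [20.2]], ubm, selector, top_n=1)
```

The resulting `zeroth` and `first` lists are what a speaker-factor or channel-factor estimator would consume. A production system would use the merged Gaussian of each group (with its own weight and covariance) to rank groups rather than a bare centroid.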
With selection by the Gaussian selector, each speech frame requires only K + N_i posterior probability computations, where, consistent with the example that follows, K is the number of classes in the Gaussian selector and N_i is the number of UBM components in the selected class. This is far fewer than in the factor-analysis method, which computes the posterior probability for every Gaussian component. Take a UBM with 512 mixture components divided into 16 classes as an example, and assume the components are distributed evenly across the classes. The cost for the CUBM is then 16 + 512/16 = 48 computations, about 1/10 of the original method. Table 2 compares the baseline system, factor analysis, and the voiceprint authentication system with rapid channel compensation.
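The count in this example can be checked with a few lines of arithmetic. The helper name and the `top_n` parameter are hypothetical; `top_n=1` reproduces the single-class case used in the text, and the even-distribution assumption is the one the text states.

```python
def evaluations_per_frame(num_components, num_clusters, top_n=1):
    """Gaussian evaluations per frame: one per selector class, plus the
    members of the top_n selected classes (components spread evenly)."""
    per_cluster = num_components / num_clusters
    return num_clusters + top_n * per_cluster

full_cost = 512                              # every UBM component evaluated
fast_cost = evaluations_per_frame(512, 16)   # 16 + 512/16 = 48.0
ratio = fast_cost / full_cost                # 0.09375, roughly 1/10

print(fast_cost, ratio)
```

Selecting more than one class raises the cost linearly, for example `evaluations_per_frame(512, 16, top_n=2)` gives 80 evaluations, which is the trade-off a Top-N setting controls.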
Table 2
System                       Required time (s)   EER (%)
Baseline system              1.26                8.94
Factor analysis              9.53                3.76
Rapid channel compensation   1.24                3.89
As Table 2 shows, the rapid channel compensation algorithm of the invention effectively solves the channel mismatch problem. Its system performance is comparable to that of factor analysis, while its efficiency improves nearly tenfold, which meets the needs of practical use.
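The trade-off in Table 2 can be worked out directly from the tabulated values. The snippet below is only arithmetic on those four numbers; the dictionary keys and variable names are illustrative.

```python
# (required time in seconds, EER in percent), copied from Table 2
results = {
    "baseline": (1.26, 8.94),
    "factor_analysis": (9.53, 3.76),
    "rapid_compensation": (1.24, 3.89),
}

base_time, base_eer = results["baseline"]
fa_time, fa_eer = results["factor_analysis"]
fast_time, fast_eer = results["rapid_compensation"]

speedup = fa_time / fast_time                     # time ratio vs. factor analysis
eer_gap = fast_eer - fa_eer                       # absolute EER cost of the speed-up
rel_reduction = (base_eer - fast_eer) / base_eer  # relative EER gain vs. baseline

print(round(speedup, 2), round(eer_gap, 2), round(rel_reduction, 3))
```

From the tabulated times alone the ratio works out to roughly 7.7, with an absolute EER difference of 0.13 percentage points relative to full factor analysis and a relative EER reduction of about 56% against the baseline.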
Compared with existing mismatch compensation methods, the invention has the following innovations:
1. Continuous mismatch compensation based on factor analysis
Existing mismatch compensation methods generally compensate by means of a channel classifier and can compensate only for a few isolated channels, whereas the mismatch between training and testing is the combined result of many factors. The invention adopts a continuous mismatch compensation method based on factor analysis, which matches real mismatch conditions more closely, so the improvement in system performance after mismatch compensation is more pronounced.
2. Fast mismatch compensation based on a Gaussian selector
Both training and testing require a very large amount of computation in factor-analysis mismatch compensation. The invention proposes a fast mismatch compensation algorithm based on a Gaussian selector. For each frame's observation vector, the corresponding Gaussian components are selected to compute the EM statistics, which reduces the complexity of the factor-analysis algorithm and increases training and testing speed several times over.
3. Top-N selection strategy
To improve system performance after mismatch compensation more effectively while the fast mismatch compensation algorithm speeds up training and testing, the invention proposes a mixture-component selection scheme based on a Top-N strategy (that is, selecting from all Gaussian components the N components that contribute most). This improves the precision with which the Gaussian selector screens UBM Gaussian components and the compensation precision of the fast mismatch compensation algorithm, further improving system performance after mismatch compensation.

Claims (6)

1. A voiceprint authentication system with rapid channel compensation, characterized in that the voiceprint authentication system comprises a Gaussian selector, a UBM mixture-component selection module, a speaker-factor or channel-factor module, a UBM model, and a speaker model; the Gaussian selector classifies the Gaussian components of the UBM model; the UBM mixture-component selection module uses the Gaussian selector to select, for each frame's observation vector in the training data, the nearby mixture components, and computes zeroth-order or first-order statistics from them; the speaker-factor or channel-factor module estimates a speaker factor or channel factor from these zeroth-order or first-order statistics, and the speaker model is built from that speaker factor or channel factor.
2. The voiceprint authentication system with rapid channel compensation according to claim 1, characterized in that the speaker factor is a factor matrix trained on repeated utterances from many people, and the channel factor is a factor matrix trained on speech data from many people over different channels.
3. The voiceprint authentication system with rapid channel compensation according to claim 1, characterized in that the Gaussian selector classifies the Gaussian components of the UBM model by similarity and, through this classification, merges each group of similar Gaussian components into one Gaussian component of the Gaussian selector.
4. The voiceprint authentication system with rapid channel compensation according to claim 1, characterized in that the UBM mixture-component selection module operates as follows: first, compute the posterior probabilities between each frame's observation vector in the training data and the Gaussian components classified by the Gaussian selector, and select the groups of Gaussian components with the largest posterior probabilities; then compute the posterior probabilities of the current observation vector for the selected groups of Gaussian components, and set the posterior probabilities of the remaining Gaussian components directly to zero.
5. A voiceprint authentication method with rapid channel compensation, characterized in that it comprises the following steps:
1) construct a Gaussian selector, which classifies the Gaussian components of the UBM model;
2) compute posterior probabilities between the Gaussian components classified by the Gaussian selector and each frame's observation vector in the training data, and select the groups of Gaussian components with the largest posterior probabilities;
3) compute the posterior probabilities of the current observation vector for the UBM Gaussian components that belong to the selected groups, and set the posterior probabilities of the remaining Gaussian components directly to zero;
4) estimate the speaker factor or channel factor from the zeroth-order or first-order statistics computed in step 3);
5) build the speaker model from the speaker factor or channel factor.
6. The voiceprint authentication method with rapid channel compensation according to claim 5, characterized in that, in step 1), the Gaussian selector classifies the Gaussian components of the UBM model and, through this classification, merges each group of similar Gaussian components into one Gaussian component of the Gaussian selector.
CN2010100272991A 2010-01-18 2010-01-18 Voiceprint authentication system and method for rapid channel compensation Expired - Fee Related CN102129859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010100272991A CN102129859B (en) 2010-01-18 2010-01-18 Voiceprint authentication system and method for rapid channel compensation

Publications (2)

Publication Number Publication Date
CN102129859A CN102129859A (en) 2011-07-20
CN102129859B true CN102129859B (en) 2013-10-30

Family

ID=44267915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010100272991A Expired - Fee Related CN102129859B (en) 2010-01-18 2010-01-18 Voiceprint authentication system and method for rapid channel compensation

Country Status (1)

Country Link
CN (1) CN102129859B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1871639A (en) * 2003-08-20 2006-11-29 加利福尼亚大学董事会 Topological voiceprints for speaker identification
CN101226742A (en) * 2007-12-05 2008-07-23 浙江大学 Method for recognizing sound-groove based on affection compensation

Also Published As

Publication number Publication date
CN102129859A (en) 2011-07-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131030

Termination date: 20140118