CN106531170B - Spoken-language assessment identity authentication method based on speaker recognition technology - Google Patents
- Publication number
- CN106531170B (application CN201611141701.2A / CN201611141701A)
- Authority
- CN
- China
- Prior art keywords
- spoken
- authentication
- assessment
- recognition technology
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The spoken-language assessment identity authentication method based on speaker recognition technology provided by the invention works as follows. First, at user registration, the user's voice information is acquired and analyzed to obtain a standard voice template, and the identity authentication score is initialized. Then, when the user starts the spoken-language assessment function, the number of authentications and the authentication times are calculated from the total assessment duration T and the identity authentication score S. Next, when an authentication time arrives, the user's voice information is acquired and compared with the standard voice template; if they match, this authentication succeeds; otherwise the method returns to monitoring the next authentication time. Finally, the identity authentication score is updated according to the authentication results of the current spoken-language assessment. The method updates the authentication score according to the student's previous authentication results and from it obtains the number of authentications in the next spoken-language assessment: the lower the credibility, the more authentications are performed in the next assessment, so that the identity of the person being assessed is verified accurately and efficiently during the spoken-language assessment.
Description
Technical field
The invention belongs to the field of information processing and in particular relates to a spoken-language assessment identity authentication method based on speaker recognition technology.
Background art
English is not easy to learn; a sufficient language environment must be built in daily life and in teaching to help students learn, so spoken-language assessment has become one of the widely used means. In the classroom, teachers do their best to build a language environment and help students learn English, but this alone does not let the teacher fully grasp each student's true level or problems such as pronunciation errors that need to be corrected during learning. Spoken-language assessment solves this problem: students complete oral tests on their own after class and upload the results to the teacher, who can then understand each student's true level and correct individual students' pronunciation. This requires adding an identity verification function to the spoken-language evaluation system, so that the identity of the person being assessed can be judged.
Common identity recognition technologies include authentication based on fingerprints, iris, face, handwritten signatures, and voice. Voice is an important carrier of identity information. Compared with other biometric features such as face and fingerprints, voice is cheap to acquire, simple to use, and convenient for remote data collection, and voice-based human-machine interfaces are more user-friendly; speaker recognition has therefore become an important automatic identity authentication technology.
There is thus an urgent need for a spoken-language assessment identity authentication method based on speaker recognition technology that can accurately and efficiently verify the identity of the person being assessed during a spoken-language assessment.
Summary of the invention
To address the defects in the prior art, the present invention provides a spoken-language assessment identity authentication method based on speaker recognition technology that can accurately and efficiently verify the identity of the person being assessed during a spoken-language assessment.
The spoken-language assessment identity authentication method based on speaker recognition technology comprises the following steps:
S1: at user registration, acquire and analyze the user's voice information to obtain a standard voice template, and initialize the identity authentication score;
S2: when the user starts the spoken-language assessment function, calculate the number of authentications and the authentication times from the total assessment duration T and the identity authentication score S;
S3: when an authentication time arrives, acquire the user's voice information and compare it with the standard voice template; if they match, this authentication succeeds; otherwise return to step S3 and monitor the next authentication time;
S4: update the identity authentication score according to the authentication results of this spoken-language assessment.
Preferably, step S1 further includes: detecting in real time whether an identity authentication score fed back by the teacher has been received, and if so, updating the identity authentication score.
Preferably, in step S1 the voice information comprises several utterances, acquired directly through a microphone or retrieved from the user's test database.
Preferably, step S2 specifically comprises:
S2a: calculate the number of authentications A, where A = 5T/S;
S2b: calculate the authentication times P_i = P_{i-1} + B, where P_i is the time of the i-th authentication and B is a random number between 0 and 12S.
Preferably, step S3 specifically comprises:
S3a: when authentication time P_i arrives, set the authentication error count to 0;
S3b: judge whether the user's voice information is received within a preset extension time; if so, execute step S3c; otherwise, increment the authentication failure count by 1 and return to step S2;
S3c: judge whether the authentication error count has reached a preset authentication error upper limit; if so, increment the authentication failure count by 1 and return to step S2; otherwise, execute step S3d;
S3d: compare the received voice information with the standard voice template; if they match, return to step S3a to monitor the next authentication time; if they do not match, increment the authentication error count by 1 and return to step S3b.
Preferably, in step S4, the identity authentication score is the reciprocal of the authentication failure count.
Preferably, when analyzing the user's voice information, the method first constructs several classifiers and then fuses them to obtain the standard voice template.
Preferably, the classifiers are constructed as follows. First, the JFA speaker supervector corresponding to the voice information is extracted, and a new, lower-dimensional subspace is selected from the mean vectors in the JFA supervector. Then, principal component analysis is applied to the feature vectors in that subspace for optimal dimensionality reduction, projecting them into a low-dimensional subspace of dimension J. Next, several random subspaces are obtained from the low-dimensional subspace by random sampling. Finally, within-class covariance normalization and nonparametric linear discriminant analysis are performed on each random subspace to obtain the projection matrix corresponding to each random subspace, i.e., the classifier.
Preferably, the outputs of the classifiers are fused using a dynamic fusion method.
Preferably, the dynamic fusion method specifically comprises: first, analyzing a development voice data set X from a large number of speakers and dividing it into K subsets S_1, ..., S_K according to a certain criterion; then, testing the voice data in each subset with each classifier and recording the corresponding score outputs; finally, using the score averages as the weights of each classifier on each subset.
As can be seen from the above technical solution, the spoken-language assessment identity authentication method based on speaker recognition technology provided by the invention updates the authentication score according to the student's previous authentication results and from it obtains the number of authentications in the next spoken-language assessment. The number of authentications is determined by the credibility of the student's previous assessment: the lower the credibility, the more authentications in the next assessment, so that the identity of the person being assessed is verified accurately and efficiently during the spoken-language assessment.
Brief description of the drawings
To illustrate the specific embodiments of the invention or the prior-art technical solutions more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference signs, and the elements or parts are not necessarily drawn to scale.
Fig. 1 is the flowchart of the spoken-language assessment identity authentication method based on speaker recognition technology.
Fig. 2 is a schematic diagram of the multi-classifier construction based on joint factor analysis supervectors.
Fig. 3 is a schematic diagram of the method for determining the local classification confidence of the basic classifiers.
Specific embodiments
The technical solution of the invention is described in detail below with reference to the drawings and the embodiments. The following embodiments only serve to clearly illustrate the technical solution of the invention; they are examples only and cannot be used to limit the scope of protection of the invention. It should be noted that, unless otherwise indicated, the technical and scientific terms used in this application have the ordinary meaning as understood by those skilled in the art to which the invention belongs.
The spoken-language assessment identity authentication method based on speaker recognition technology, as shown in Fig. 1, comprises the following steps:
S1: at user registration, acquire and analyze the user's voice information to obtain a standard voice template, and initialize the identity authentication score;
S2: when the user starts the spoken-language assessment function, calculate the number of authentications and the authentication times from the total assessment duration T and the identity authentication score S;
S3: when an authentication time arrives, acquire the user's voice information and compare it with the standard voice template; if they match, this authentication succeeds; otherwise return to step S3 and monitor the next authentication time;
S4: update the identity authentication score according to the authentication results of this spoken-language assessment.
When this method performs identity authentication during a spoken-language assessment, it avoids requiring an identity authentication for every answered question, since an excessive number of authentications reduces the efficiency of the assessment; at the same time it avoids too few authentications, which would provide no supervision. The credibility of the previous spoken-language assessment (i.e., the identity authentication score) determines the number of authentications in the next assessment: the lower the score, the lower the credibility and the higher the probability that the student cheated, so for that student the number of authentications in the next assessment is increased. Conversely, for students with a higher score and better credibility, the number of authentications in the next assessment is reduced. In a specific implementation, the identity authentication score ranges from 0 to 10; at user registration the default score is 1, the lowest level. The method thus updates the authentication count according to the student's previous authentication results to obtain the number of authentications in the next assessment, so that the identity of the person being assessed is verified accurately and efficiently during the spoken-language assessment.
Step S1 further includes: detecting in real time whether an identity authentication score fed back by the teacher has been received, and if so, updating the identity authentication score. The method can thus also accept authentication scores fed back by the teacher: if, while reviewing the assessment results, the teacher finds that a student has cheated or has had someone else read in their place, the teacher can assign a score according to the degree of cheating, with more serious cheating yielding a lower score. Evaluating a student's spoken test both through the teacher's feedback score and through the student's accumulated past credibility enables better supervision of the student and an efficient spoken-language assessment.
In step S1, the voice information comprises several utterances, acquired directly through a microphone or retrieved from the user's test database. The method also provides a trial template before the formal spoken-language assessment starts; the trial template simulates the scene of a formal assessment, and when the user enters it, the user's voice information is stored in the test database as the basis from which the standard voice template is later built.
Step S2 specifically comprises:
S2a: calculate the number of authentications A, where A = 5T/S;
S2b: calculate the authentication times P_i = P_{i-1} + B, where P_i is the time of the i-th authentication and B is a random number between 0 and 12S.
The number of authentications A is proportional to the total assessment duration T and inversely proportional to the identity authentication score S: the longer T, the more authentications; the higher S, the fewer authentications. A is rounded to the nearest integer. The authentication times are chosen at random, and random authentication times identify the student's true situation more accurately. When an authentication time P_i exceeds the total duration T, P_i is set equal to T. The units of T and B are seconds, and the unit of S is times.
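As an illustrative sketch (not part of the patent text), the schedule of step S2 might be computed as below. The function name is invented, and B is assumed to be a uniform random increment of 0-12 seconds, following the statement above that B is measured in seconds:

```python
import random

def authentication_schedule(total_duration_t, score_s, seed=None):
    """Compute the number of authentications A = 5T/S (rounded) and the
    random authentication times P_i = P_{i-1} + B, each clamped to T."""
    rng = random.Random(seed)
    a = round(5 * total_duration_t / score_s)   # A = 5T/S, rounded to an integer
    times = []
    p = 0.0
    for _ in range(a):
        b = rng.uniform(0, 12)                  # B: random increment (assumed 0-12 s)
        p = min(p + b, total_duration_t)        # P_i never exceeds the total duration T
        times.append(p)
    return a, times

a, times = authentication_schedule(total_duration_t=60, score_s=10, seed=0)
# A = 5*60/10 = 30 authentications, all times within [0, 60]
```

The clamp implements the rule that any P_i greater than T is set equal to T; the random increments make the schedule unpredictable to the test-taker.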
Step S3 specifically comprises:
S3a: when authentication time P_i arrives, set the authentication error count to 0;
S3b: judge whether the user's voice information is received within a preset extension time; if so, execute step S3c; otherwise, increment the authentication failure count by 1 and return to step S2;
S3c: judge whether the authentication error count has reached a preset authentication error upper limit; if so, increment the authentication failure count by 1 and return to step S2; otherwise, execute step S3d;
S3d: compare the received voice information with the standard voice template; if they match, return to step S3a to monitor the next authentication time; if they do not match, increment the authentication error count by 1 and return to step S3b.
The authentication error upper limit measures the maximum number of errors during one identity authentication and is preferably 3. Authentication errors may be caused by someone else reading in the user's place, or by interference or a complex acoustic environment affecting the received voice signal. The extension time mainly measures the validity of the voice signal: because the authentication times are random, if the student takes the assessment in person, the user's voice information is received quickly whenever an authentication pops up at a random time; if someone else is reading in the student's place, the stand-in must first find the user before authenticating, so the user's voice information naturally takes longer to arrive. The extension time should therefore not be set too long, preferably 5-10 seconds. If the extension time elapses without receiving the user's voice information, this identity authentication is deemed to have failed. If voice information is received within the extension time and matches the template, the identity authentication succeeds; if it does not match, an authentication error is counted, and when the authentication error count reaches the upper limit, this identity authentication is deemed to have failed.
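The per-authentication logic of steps S3a-S3d (extension-time check, error upper limit of 3) can be sketched as follows; `receive_voice` and `matches_template` are hypothetical callbacks standing in for the microphone capture within the extension time and the template comparison:

```python
def run_one_authentication(receive_voice, matches_template, error_limit=3):
    """One authentication attempt per steps S3a-S3d.

    receive_voice() -> voice info received within the extension time, or None on timeout.
    matches_template(voice) -> True if the voice matches the standard template.
    Returns (success, failure_increment).
    """
    errors = 0                                  # S3a: reset the error count
    while True:
        voice = receive_voice()                 # S3b: wait up to the extension time
        if voice is None:
            return False, 1                     # timeout -> this authentication fails
        if errors >= error_limit:
            return False, 1                     # S3c: error upper limit reached
        if matches_template(voice):             # S3d: compare with the template
            return True, 0
        errors += 1                             # mismatch -> count an error, retry

# Example: first two captures mismatch, the third matches
captures = iter([b"a", b"b", b"genuine"])
ok, fails = run_one_authentication(lambda: next(captures), lambda v: v == b"genuine")
# ok is True, fails is 0
```

A failure here increments the overall authentication failure count whose reciprocal becomes the next identity authentication score (step S4).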
In step S4, the identity authentication score is the reciprocal of the authentication failure count: the more authentication failures, the lower the identity authentication score and the more authentications in the next spoken-language assessment; conversely, the fewer authentication failures, the higher the score and the fewer authentications in the next assessment.
For speaker recognition, this embodiment proposes a dynamically adaptive multi-classifier fusion method. The method fully considers the local classification performance of each basic classifier and prevents the higher-weight classifiers in a linear fusion from drowning out the local classification ability of the lower-weight classifiers, thereby improving the reliability of the recognition results on the test speech. When the method analyzes the user's voice information, it first constructs several classifiers and then fuses them to obtain the standard voice template.
1. Construction of the classifiers.
As shown in Fig. 2, the invention uses the joint factor analysis (JFA) speaker supervector as the feature representation of the speaker and constructs multiple basic classifiers with a two-layer subspace sampling method. The first layer of subspace sampling operates on the means of the Gaussian components that make up the JFA speaker supervector; its purpose is to remove part of the redundant information and determine a subspace of suitable dimension. The second layer performs random sampling in the lower-dimensional subspace obtained after optimal PCA dimensionality reduction of the first-layer subspace, forming several new subspaces.
The JFA speaker supervector has the same structure as the traditional GMM-UBM mean supervector: it can be regarded as the mean vectors of the Gaussian components of the GMM concatenated in order. The first-layer subspace sampling of the proposed algorithm therefore takes the mean vectors in the JFA supervector as its basic unit. Specifically, for the h-th utterance of the i-th speaker, assuming the UBM has N Gaussian components, the JFA supervector M_ih of this utterance can be expressed as the concatenation of N Gaussian mean vectors: M_ih = [m_ih1, m_ih2, ..., m_ihN]. The main steps are:
1) Extract the JFA speaker supervector M_ih corresponding to every utterance in the development set.
2) To remove part of the redundancy in the high-dimensional original feature space, select a subset of the mean vectors composing the JFA supervector to form a new, lower-dimensional subspace that retains most of the useful information in the JFA supervector. Denote the low-dimensional feature vector in this subspace by S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
3) The feature vector S_ih still has a relatively high dimension and its values are sparsely distributed over the dimensions, so it still contains a large amount of redundancy. Next, apply principal component analysis to S_ih for optimal dimensionality reduction, projecting it into a low-dimensional subspace of dimension J.
4) In the subspace obtained after PCA dimensionality reduction, obtain several random subspaces by random sampling.
5) For each random subspace, perform within-class covariance normalization and nonparametric linear discriminant analysis, obtaining the projection matrix corresponding to that random subspace as the product of the two projection matrices, i.e., the product of the within-class covariance normalization matrix and the nonparametric linear discriminant analysis projection matrix.
From the subspace analysis results in the above steps, one subspace classifier is obtained for every subspace.
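A simplified sketch of steps 1)-5), with assumptions stated up front: the first-layer selection of Gaussian mean vectors is done at random here, and plain LDA stands in for the nonparametric linear discriminant analysis; all names are illustrative:

```python
import numpy as np

def build_subspace_classifiers(supervectors, labels, n_gauss, keep_gauss,
                               pca_dim, sub_dim, n_classifiers, seed=0):
    """Two-layer subspace sampling (sketch).

    supervectors: (n_utts, n_gauss * d) JFA-style supervectors; labels: speaker ids.
    Layer 1 keeps `keep_gauss` of the Gaussian mean vectors; layer 2 reduces to
    `pca_dim` with PCA, draws `n_classifiers` random `sub_dim`-dimensional
    subspaces, and builds a WCCN-then-LDA projection in each.
    """
    rng = np.random.default_rng(seed)
    n_utts, dim = supervectors.shape
    d = dim // n_gauss
    classes = np.unique(labels)
    # Layer 1: keep a subset of the Gaussian mean vectors (chosen at random here)
    kept = rng.choice(n_gauss, size=keep_gauss, replace=False)
    idx = np.concatenate([np.arange(g * d, (g + 1) * d) for g in kept])
    s = supervectors[:, idx]
    # PCA to pca_dim via SVD of the centered data
    mu = s.mean(axis=0)
    _, _, vt = np.linalg.svd(s - mu, full_matrices=False)
    pca = vt[:pca_dim].T
    x = (s - mu) @ pca
    classifiers = []
    for _ in range(n_classifiers):
        dims = rng.choice(pca_dim, size=sub_dim, replace=False)  # layer-2 sampling
        xs = x[:, dims]
        # WCCN: whitening transform from the average within-class covariance
        w = sum(np.cov(xs[labels == c].T) for c in classes) / len(classes)
        b = np.linalg.cholesky(np.linalg.inv(w + 1e-6 * np.eye(sub_dim)))
        z = xs @ b                              # within-class covariance ~ identity
        # LDA on the whitened data: top eigenvectors of the between-class scatter
        gm = z.mean(axis=0)
        cm = np.stack([z[labels == c].mean(axis=0) - gm for c in classes])
        _, vecs = np.linalg.eigh(cm.T @ cm)
        proj = b @ vecs[:, -2:]                 # WCCN * LDA projection product
        classifiers.append((dims, proj))
    return idx, mu, pca, classifiers
```

Each classifier is the product of its normalization and discriminant matrices, mirroring the statement in step 5) that the per-subspace projection matrix is the product of the two projections.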
2. Adaptive multi-classifier fusion.
As shown in Fig. 3, the invention constructs the basic classifiers by random sampling in the PCA space. Because this is carried out on different feature subsets, the basic classifiers are both diverse and, to a certain degree, complementary. Fusing the outputs of the multiple basic classifiers with a dynamic fusion method can therefore greatly improve the performance of the speaker recognition system.
In the training stage, in order to evaluate the local classification ability of the basic classifiers, the development voice data set X from a large number of speakers is first analyzed and divided into K subsets S_1, S_2, ..., S_K according to a certain criterion, such that the voice data within a subset have a certain degree of similarity. Each basic classifier is then tested on the voice data in each subset and the corresponding score outputs are recorded; finally, the score averages serve as the basis for the confidence of each classifier's test results on each subset. In this way, the classification ability of each basic classifier on each subset is obtained, and the confidence vectors w_1, w_2, ..., w_Q of the basic classifiers on the subsets, used in the final fusion algorithm, are determined. Each confidence vector contains K values, each representing the classifier's classification confidence on one subset.
In connection with the problem addressed here, the dynamic multi-classifier fusion process can be summarized in the following steps:
1) Select a suitable development corpus; suppose it contains N different speakers, each with two utterances. From each speaker's utterances, take one to form the training set X_1; the remainder form the reference set X_2.
2) According to joint factor analysis theory, extract the speaker factors of all utterances in the development corpus; denote the speaker-factor sequence from the training set by Y_1.
3) With Y_1 as input, train a Gaussian mixture model with a small number of mixture components, denoted λ = {w_i, μ_i, Σ_i}, i = 1, ..., K, where the parameters w_i, μ_i, and Σ_i are the weight, mean, and covariance of each Gaussian component. Compute the occupancy of each speaker factor on each Gaussian component of the GMM; when its occupancy on the k-th component is the highest, the factor is assigned to the k-th subset S_k. In this way, all speaker factors in the training set are divided into K different sets.
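Under the reading adopted above (assignment by highest occupancy), steps 2)-3) can be sketched with fixed GMM parameters and diagonal covariances; the function name is illustrative:

```python
import numpy as np

def assign_to_subsets(factors, weights, means, covs):
    """Assign each speaker factor to the subset of the GMM component with the
    highest occupancy (posterior responsibility). Diagonal covariances assumed."""
    log_post = []
    for i in range(len(weights)):
        diff = factors - means[i]
        ll = -0.5 * np.sum(diff ** 2 / covs[i] + np.log(2 * np.pi * covs[i]), axis=1)
        log_post.append(np.log(weights[i]) + ll)
    return np.argmax(np.stack(log_post, axis=1), axis=1)   # subset index per factor

# Two well-separated 2-D components
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
covs = np.ones((2, 2))
y = np.array([[0.1, -0.2], [9.8, 10.1], [0.3, 0.2]])
# assign_to_subsets(y, weights, means, covs) -> [0, 1, 0]
```

The argmax yields a hard partition of the training factors into K sets, as the text requires; in practice the GMM parameters would be fitted on Y_1 rather than fixed.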
4) According to the division of the speaker factors in the previous step, the corresponding training utterances are also divided into K sets.
5) For a given set S_k, project each training utterance in it, together with the corresponding reference utterance of the same speaker from the reference set, into the q-th random subspace, obtaining the training vector and the reference vector.
6) Compute the cosine distance between the training vector and the reference vector as the test score output of the q-th NLDA classifier.
7) Compute the average of all test scores of the q-th NLDA classifier on set S_k as that classifier's classification confidence w_qk on S_k. Correspondingly, the local classification confidence vector of this basic classifier can be expressed as w_q = [w_q1, w_q2, ..., w_qK].
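Steps 5)-7) reduce to cosine scoring followed by per-subset averaging; a minimal sketch with illustrative names:

```python
import numpy as np

def cosine_score(u, v):
    """Cosine similarity between a training vector and a reference vector."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def subset_confidence(train_vecs, ref_vecs):
    """Classification confidence of one classifier on one subset S_k: the
    average cosine score over same-speaker (training, reference) pairs."""
    scores = [cosine_score(t, r) for t, r in zip(train_vecs, ref_vecs)]
    return sum(scores) / len(scores)

# Identical vectors score 1.0, so the confidence on this toy subset is 1.0
vecs = [np.array([1.0, 2.0]), np.array([0.0, 3.0])]
# subset_confidence(vecs, vecs) == 1.0
```

Running this once per classifier and per subset fills in the K values of each confidence vector w_q.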
8) In the multi-classifier fusion stage, for a given test utterance, first extract the corresponding speaker factor according to joint factor analysis theory; then assign the test utterance to a set S_k using the same criterion by which the development data were divided during training; finally, use the classification confidence of each basic classifier on set S_k as its weight in a linear fusion of the outputs of all basic classifiers.
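Step 8) is a confidence-weighted linear fusion. Given precomputed confidence vectors and the subset index of the test utterance, it might look like the following sketch; the weight normalization is an added assumption, not stated in the patent:

```python
import numpy as np

def fuse_scores(classifier_scores, confidence_vectors, subset_k):
    """Linear fusion of Q classifier scores, weighted by each classifier's
    classification confidence on the subset S_k containing the test utterance."""
    weights = np.array([w[subset_k] for w in confidence_vectors])
    weights = weights / weights.sum()           # normalize (an assumption)
    return float(np.dot(weights, classifier_scores))

# Q = 2 classifiers, K = 3 subsets; the test utterance falls in subset 1
conf = [np.array([0.2, 0.8, 0.5]), np.array([0.4, 0.2, 0.6])]
fused = fuse_scores(np.array([1.0, 0.0]), conf, subset_k=1)
# weights on subset 1 are 0.8 and 0.2 -> fused score 0.8
```

Because the confidence vectors are computed before testing, only the table lookup and the dot product run at test time, which is what gives the fusion its real-time property.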
The determination of the local classification confidence of the basic classifiers assumes that the personal characteristics of some speakers have a certain similarity, and that the voice features of those similar speakers also follow similar distributions, lying in a local region of the feature space. In the invention, joint factor analysis converts a speaker's variable-length speech feature vector sequence into a fixed-length JFA speaker supervector from which local channel effects have been removed. The distribution of JFA speaker supervectors in the high-dimensional feature space reflects the distribution of the personal characteristics of different speakers. Here the distribution of the speaker factors is used to approximate the distribution of the JFA speaker supervectors, because:
1) JFA speaker supervectors have very high dimension, and it is difficult to model the distribution of such high-dimensional vectors accurately with common statistical models.
2) To avoid losing most of the useful information, the JFA speaker supervector still has a relatively high dimension after projection into the nonparametric discriminant subspace.
3) The speaker factors have a much lower dimension than the projected JFA speaker vectors of the previous step, and their extraction is also based on the joint factor analysis algorithm, so they contain the necessary speaker-specific information and can reflect the distribution of the JFA speaker supervectors.
It can be seen from the above that, in the proposed multi-classifier fusion method, the weight of each basic classifier in the score fusion is determined by the region of the speaker-factor space in which the test utterance lies. Since the speaker factor of each test utterance is distributed differently, the weights of the basic classifiers change dynamically with the test utterance. Notably, in the multi-classifier fusion algorithm of the invention, the fusion weight of each basic classifier can be determined before the test, which greatly improves the real-time performance of the fused system.
3. Evaluation of system performance.
The experimental data come from the NIST 2008 speaker evaluation database; the training and test speech are taken from the male telephone-training, telephone-test portion of the core evaluation task, which serves as the evaluation data set for measuring the performance of the speaker recognition system. The training data of the UBM come from the telephone speech in Switchboard II Phase 2, Switchboard II Phase 3, Switchboard Cellular Part 2, and NIST SRE 2004, 2005, and 2006; the UBM has 2048 Gaussian components.
The development-set data used to train the nonparametric subspace discriminant analysis projection matrix are taken from the telephone speech in the NIST SRE 2004, 2005 and 2006 databases, comprising 563 speakers in total, each with 8 utterances. The UBM is identical to the one in the joint factor analysis system above; the rank of the speaker-space loading matrix is 300, the rank of the eigenchannel-space loading matrix is 100, and the residual loading matrix is formed by concatenating the diagonal entries of the diagonal covariance matrix of each Gaussian component in the UBM.
The dimensions of the projection matrices used in the present invention for principal component analysis, within-class covariance normalization, and nonparametric linear discriminant analysis are (51 × k) × J, (E1 + E2) × 799, and 799 × 550, respectively. The number of random subspaces, i.e. the number of base classifiers Q, is set to 10. In the nonparametric linear discriminant analysis, the number of nearest-neighbor samples is set to 4.
After subspace sampling in the original feature space, we obtain a new feature vector S_ih. Suppose that in the first-layer subspace sampling we finally select the first 1280 Gaussian mean vectors of the sorted JFA speaker supervector. The dimensionality of this feature vector is, however, still very high relative to the number of training samples in the development set. Therefore, in order to train reliable and stable subspace classifiers, the new feature vector must be further projected into a low-dimensional PCA subspace; here the dimensionality of the feature vector after PCA reduction is denoted J. Before random sampling is carried out, in order to guarantee the performance of each base subspace classifier, the first E1 principal components, which carry most of the data variance, are fixed, and the random sampling algorithm is applied only to the remaining J − E1 principal components, from which E2 components are randomly selected, yielding a random subspace of dimension E1 + E2.
In the second-layer sample-space experiments, the value of J is fixed at 1200 or 1300, the better of the two being determined by cross-validation. The value of E1 + E2 is fixed at 800. For each combination (E1, E2), we randomly create 10 subspaces, i.e. 10 base classifiers.
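The random-subspace construction above (fixed top-E1 principal components plus E2 components drawn at random from the remaining J − E1) can be sketched as follows; the dimensions below are toy values rather than the experiments' J = 1200, E1 + E2 = 800, Q = 10, and the function name is an assumption for illustration.

```python
import numpy as np

def random_subspaces(J, E1, E2, Q, seed=0):
    """Build Q random subspaces over J PCA dimensions: the first E1
    principal components (largest variance) are always kept, and E2
    further components are drawn at random, without replacement, from
    the remaining J - E1 components."""
    rng = np.random.default_rng(seed)
    subspaces = []
    for _ in range(Q):
        extra = rng.choice(np.arange(E1, J), size=E2, replace=False)
        subspaces.append(np.concatenate([np.arange(E1), np.sort(extra)]))
    return subspaces

# Toy dimensions; each subspace is a sorted index set of size E1 + E2.
subs = random_subspaces(J=20, E1=6, E2=4, Q=3)
```

Each returned index set selects the columns of the PCA-reduced feature matrix on which one base classifier (WCCN followed by nonparametric discriminant analysis in the text) would then be trained.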
The first group of experiments investigates how the performance of the dynamic adaptive fusion algorithm varies with the number of clusters K. Since the clustering uses the GMM algorithm and the training data are limited, K is set to 8, 16 and 32 in turn. The experimental results are listed in Table 1.
Table 1. Experimental results of the dynamic adaptive fusion method
In Table 1, for K = 8, 16 and 32, the mean EER and minDCF of the dynamic adaptive fusion over all combinations of E1 and E2 are 4.02 and 2.20; 3.89 and 2.14; and 4.02 and 2.20, respectively. It can be seen that the fused system performs best when K is 16. The reason is that when the number of clusters K is too small, the feature vectors of similar speakers cannot be grouped together effectively, so the local classification ability of the base classifiers is not reflected well and the estimate of their local classification confidence is not accurate enough; conversely, when K is too large for the scale of the training data, the complexity of the GMM used for clustering increases and its parameters tend to overfit during estimation, so the local classification confidence of the base classifiers again cannot be estimated effectively. The first group of experimental results fully shows that K = 16 makes the estimate of the base classifiers' local classification confidence most accurate.
The second group of experiments compares the proposed dynamic adaptive fusion method (DY) against a linear fusion algorithm (LR) and against the fusion based on logistic regression (LG) classically applied in the speaker verification field; the number of clusters in the dynamic adaptive fusion method is K = 16.
Table 2. Comparison of the different fusion methods
Table 2 lists the results of the three fusion algorithms for various combinations of E1 and E2; 10 base classifiers are constructed for each combination. It can be seen that the proposed dynamic adaptive fusion method obtains the lowest EER in every group of experiments, followed by the logistic-regression fusion algorithm, while the linear fusion system has the highest EER and thus the worst performance. In terms of minDCF, the dynamic adaptive fusion algorithm obtains the lowest detection cost in essentially every group of experiments except the third. In particular, in the fifth group the EER of the dynamic adaptive fusion is 3.76 and the minDCF is 2.08, the best system performance, a relative reduction of 3.7% over the lowest EER of the logistic-regression fusion and of 6.6% over the corresponding lowest EER of the linear fusion. This fully demonstrates the effectiveness of the proposed dynamic multiple-classifier fusion algorithm based on random subspace sampling; moreover, the fusion algorithm is applicable to any subspace classifier and generalizes well.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of the claims and the description of the present invention.
Claims (10)
1. A spoken-language assessment identity authentication method based on speaker recognition technology, characterized by comprising the following steps:
S1: when a user registers, acquiring and analyzing the user's voice information to obtain a standard voice template, and initializing an authentication score;
S2: when the user starts the spoken-language assessment function, calculating the number of authentications and the authentication times according to the total assessment duration T and the authentication score S;
S3: when an authentication time arrives, acquiring the user's voice information and comparing it with the standard voice template; if they match, this authentication succeeds; otherwise, returning to step S3 to monitor the next authentication time;
S4: updating the authentication score according to the authentication results of this spoken-language assessment.
2. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that step S1 further comprises: detecting in real time whether an authentication score fed back by a teacher is received, and if so, updating the authentication score.
3. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that in step S1 the voice information comprises a plurality of recordings, acquired directly through a microphone or retrieved from the user's test database.
4. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that step S2 specifically comprises:
S2a: calculating the number of authentications A, where A = 5T/S;
S2b: calculating the authentication times: P_i = P_(i-1) + B, where P_i is the time of the i-th authentication and B is a random number between 0 and 12S.
5. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 4, characterized in that step S3 specifically comprises:
S3a: when authentication time P_i arrives, setting the authentication-error count to 0;
S3b: judging whether the user's voice information is received within a preset grace period; if so, executing step S3c; otherwise, incrementing the authentication-failure count by 1 and returning to step S2;
S3c: judging whether the authentication-error count has reached a preset authentication-error upper limit; if so, incrementing the authentication-failure count by 1 and returning to step S2; otherwise, executing step S3d;
S3d: comparing the received voice information with the standard voice template; if they match, returning to step S3a to monitor the next authentication time; if they do not match, incrementing the authentication-error count by 1 and returning to step S3b.
6. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 5, characterized in that in step S4 the authentication score is the reciprocal of the authentication-failure count.
7. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that when the method analyzes the user's voice information, several classifiers are constructed first and then fused to obtain the standard voice template.
8. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 7, characterized in that the classifiers are constructed as follows:
First, the JFA speaker supervector corresponding to the voice information is extracted, and a new lower-dimensional subspace is chosen from the mean vectors in the JFA supervector; then, the feature vectors in this subspace are optimally reduced with principal component analysis, being projected into a low-dimensional subspace of dimension J; next, several random subspaces are obtained in this low-dimensional subspace using a random sampling technique; finally, for each random subspace, within-class covariance normalization and nonparametric linear discriminant analysis are performed to obtain the projection matrix corresponding to that random subspace, i.e. a classifier.
9. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 7, characterized in that the outputs of the classifiers are fused using a dynamic fusion method.
10. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 9, characterized in that the dynamic fusion method specifically comprises:
First, a development-set voice data collection X from a large number of speakers is analyzed and divided into K subsets S_k according to a given criterion; then, the voice data in each subset are tested with each classifier and the corresponding score outputs are collected; finally, the average scores are used as the weights of each classifier on each subset.
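The scheduling and retry logic of claims 4 and 5 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the time units are assumed, and the `capture` and `matches` callbacks are hypothetical stand-ins for microphone capture and the template comparison.

```python
import random

def authentication_times(T, S, seed=0):
    """Claim 4: schedule A = 5T/S authentications, each at the previous
    time plus a random offset B drawn from [0, 12S]."""
    rng = random.Random(seed)
    A = int(5 * T / S)
    times, p = [], 0.0
    for _ in range(A):
        p += rng.uniform(0.0, 12 * S)
        times.append(p)
    return times

def run_assessment(times, capture, matches, max_errors=3):
    """Claim-5-style loop: at each authentication time, retry the template
    comparison until it matches or the error limit is hit; no speech within
    the grace period or hitting the limit counts as one failure."""
    failures = 0
    for p in times:
        errors = 0
        while True:
            voice = capture(p)                       # None = no speech received
            if voice is None or errors >= max_errors:
                failures += 1
                break
            if matches(voice):                       # template comparison
                break
            errors += 1
    return failures

times = authentication_times(T=600, S=4)
fails = run_assessment(times, capture=lambda p: "voice",
                       matches=lambda v: True)
```

Per claim 6, the final authentication score would then be the reciprocal of the returned failure count (when it is nonzero).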
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611141701.2A CN106531170B (en) | 2016-12-12 | 2016-12-12 | Spoken assessment identity identifying method based on speaker Recognition Technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106531170A CN106531170A (en) | 2017-03-22 |
CN106531170B true CN106531170B (en) | 2019-09-17 |
Family
ID=58342159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611141701.2A Active CN106531170B (en) | 2016-12-12 | 2016-12-12 | Spoken assessment identity identifying method based on speaker Recognition Technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106531170B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108417226A (en) * | 2018-01-09 | 2018-08-17 | 平安科技(深圳)有限公司 | Speech comparison method, terminal and computer readable storage medium |
CN112017694B (en) * | 2020-08-25 | 2021-08-20 | 天津洪恩完美未来教育科技有限公司 | Voice data evaluation method and device, storage medium and electronic device |
CN113053395B (en) * | 2021-03-05 | 2023-11-17 | 深圳市声希科技有限公司 | Pronunciation error correction learning method and device, storage medium and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6205424B1 (en) * | 1996-07-31 | 2001-03-20 | Compaq Computer Corporation | Two-staged cohort selection for speaker verification system |
CN102142254A (en) * | 2011-03-25 | 2011-08-03 | 北京得意音通技术有限责任公司 | Voiceprint identification and voice identification-based recording and faking resistant identity confirmation method |
CN102708867A (en) * | 2012-05-30 | 2012-10-03 | 北京正鹰科技有限责任公司 | Method and system for identifying faked identity by preventing faked recordings based on voiceprint and voice |
CN103065642B (en) * | 2012-12-31 | 2015-06-17 | 安徽科大讯飞信息科技股份有限公司 | Method and system capable of detecting oral test cheating |
CN105810199A (en) * | 2014-12-30 | 2016-07-27 | 中国科学院深圳先进技术研究院 | Identity verification method and device for speakers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||