CN106531170B - Spoken assessment identity authentication method based on speaker recognition technology - Google Patents


Info

Publication number
CN106531170B
CN 106531170 B (application CN201611141701.2A)
Authority
CN
China
Prior art keywords
spoken
authentication
assessment
recognition technology
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611141701.2A
Other languages
Chinese (zh)
Other versions
CN106531170A (en)
Inventor
姜卫武 (Jiang Weiwu)
李娜 (Li Na)
李坤 (Li Kun)
孙立发 (Sun Lifa)
钟静华 (Zhong Jinghua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority: CN201611141701.2A, patent CN106531170B
Publication of CN106531170A
Application granted
Publication of CN106531170B
Legal status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use


Abstract

The invention provides a spoken-assessment identity authentication method based on speaker recognition technology. First, at user registration, the user's voice is captured and analyzed to build a standard voice template, and the authentication score is initialized. Then, when the user starts the spoken-assessment function, the number of authentications and the authentication times are computed from the total assessment duration T and the authentication score S. Next, whenever an authentication time arrives, the user's voice is captured and compared against the standard voice template; if it matches, that authentication succeeds; otherwise the method returns to waiting for the next authentication time. Finally, the authentication score is updated according to the authentication results of this assessment. Because the method updates the authentication score from the student's previous authentication results, it can derive the number of authentications for the next assessment: the lower the student's credibility, the more authentications are scheduled in the next assessment. The identity of the person taking a spoken assessment can thus be verified accurately and efficiently.

Description

Spoken assessment identity authentication method based on speaker recognition technology
Technical field
The invention belongs to the field of information processing, and in particular relates to a spoken-assessment identity authentication method based on speaker recognition technology.
Background technique
English is not easy to learn: an adequate language environment must be built in daily life and in teaching, so spoken assessment has become a widely used means of assisting students. In class, a teacher can do his or her best to create a language environment and help students learn English, but this alone does not let the teacher fully grasp each student's true situation or problems such as pronunciation errors that need correcting. Spoken assessment addresses this: students complete oral tests on their own after class and upload the results, so the teacher can understand each student's real level and correct each student's pronunciation. This, however, requires adding an identity-recognition function to the spoken-assessment system so that the identity of the person being assessed can be judged.
Common identity recognition technologies include authentication based on fingerprints, irises, faces, handwritten signatures and voice. Voice is an important carrier of identity information; compared with other biometric features such as faces and fingerprints, voice is cheap to acquire, simple to use and convenient for remote data collection, and voice-based human-machine interfaces are friendlier. Speaker recognition has therefore become an important automatic identity authentication technology.
There is thus an urgent need for a spoken-assessment identity authentication method based on speaker recognition technology that can accurately and efficiently verify the identity of the person taking a spoken assessment.
Summary of the invention
To address the defects of the prior art, the present invention provides a spoken-assessment identity authentication method based on speaker recognition technology that can accurately and efficiently verify the identity of the person taking a spoken assessment.
The spoken-assessment identity authentication method based on speaker recognition technology comprises the following steps:
S1: at user registration, capture and analyze the user's voice, build a standard voice template, and initialize the authentication score;
S2: when the user starts the spoken-assessment function, compute the number of authentications and the authentication times from the total assessment duration T and the authentication score S;
S3: when an authentication time arrives, capture the user's voice and compare it with the standard voice template; if it matches, this authentication succeeds; otherwise return to step S3 and wait for the next authentication time;
S4: update the authentication score according to the authentication results of this spoken assessment.
Preferably, step S1 further includes: detect in real time whether an authentication score fed back by the teacher has been received, and if so, update the authentication score.
Preferably, in step S1 the voice information consists of several recordings, captured directly through a microphone or retrieved from the user's trial database.
Preferably, step S2 is specifically:
S2a: compute the number of authentications A, where A = 5T/S;
S2b: compute the authentication times P_i = P_(i-1) + B, where P_i is the time of the i-th authentication and B is a random number between 0 and 12 s.
Preferably, step S3 is specifically:
S3a: when authentication time P_i arrives, set the authentication error count to 0;
S3b: judge whether the user's voice is received within a preset extension time; if so, go to step S3c; otherwise add 1 to the authentication failure count and return to step S2;
S3c: judge whether the authentication error count has reached a preset upper limit; if so, add 1 to the authentication failure count and return to step S2; otherwise go to step S3d;
S3d: compare the received voice with the standard voice template; if it matches, return to step S3a and wait for the next authentication time; if not, add 1 to the authentication error count and return to step S3b.
Preferably, in step S4 the authentication score is the reciprocal of the authentication failure count.
Preferably, when analyzing the user's voice, the method first constructs several classifiers and then fuses them to obtain the standard voice template.
Preferably, the classifiers are constructed as follows:
First, the JFA speaker supervector of the voice is extracted, and a new lower-dimensional subspace is chosen from the mean vectors that make up the JFA supervector. Then, principal component analysis is applied to the feature vectors in that subspace for optimal dimensionality reduction, projecting them into a low-dimensional subspace of dimension J. Next, random sampling in this low-dimensional subspace yields several random subspaces. Finally, within-class covariance normalization and nonparametric linear discriminant analysis are performed in each random subspace, giving the projection matrix of each random subspace, i.e., a classifier.
Preferably, the outputs of the classifiers are fused using a dynamic fusion method.
Preferably, the dynamic fusion method is specifically:
First, a development voice data set X from a large number of speakers is analyzed and divided into K subsets S_k according to a given criterion. Then, the voice data in each subset are tested with each classifier and the corresponding score outputs are collected. Finally, the score averages are used as the weights of each classifier on each subset.
As the above technical solution shows, the spoken-assessment identity authentication method based on speaker recognition technology provided by the invention updates the authentication score from the student's previous authentication results and derives from it the number of authentications in the next assessment: the number is determined by the credibility of the student's previous spoken assessment, so a less credible student faces more authentications in the next assessment. The identity of the person taking a spoken assessment is thereby verified accurately and efficiently.
Detailed description of the invention
To explain the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for describing them are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference numerals, and elements or parts are not necessarily drawn to scale.
Fig. 1 is a flow chart of the spoken-assessment identity authentication method based on speaker recognition technology.
Fig. 2 is a schematic diagram of multi-classifier construction based on joint factor analysis supervectors.
Fig. 3 is a schematic diagram of how the local classification confidence of the base classifiers is determined.
Specific embodiment
The technical solution of the invention is described in detail below with reference to the drawings. The following embodiments only illustrate the technical solution of the invention clearly; they serve as examples and do not limit the scope of protection. Unless otherwise indicated, technical and scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which the invention belongs.
The spoken-assessment identity authentication method based on speaker recognition technology, as shown in Fig. 1, comprises the following steps:
S1: at user registration, capture and analyze the user's voice, build a standard voice template, and initialize the authentication score;
S2: when the user starts the spoken-assessment function, compute the number of authentications and the authentication times from the total assessment duration T and the authentication score S;
S3: when an authentication time arrives, capture the user's voice and compare it with the standard voice template; if it matches, this authentication succeeds; otherwise return to step S3 and wait for the next authentication time;
S4: update the authentication score according to the authentication results of this spoken assessment.
When this method is used for identity authentication during a spoken assessment, it avoids requiring an authentication for every answered question, which would make the number of authentications excessive and reduce the efficiency of the assessment; at the same time it avoids too few authentications, which would fail to supervise the student. The credibility of the previous spoken assessment (i.e., the authentication score) determines the number of authentications in the next assessment: the lower the score, the lower the credibility and the higher the likelihood that the student is cheating, so for such a student the number of authentications in the next assessment is increased. Conversely, for a student with a higher score and better credibility, the number of authentications in the next assessment is reduced. In a concrete implementation, the authentication score ranges from 0 to 10; at registration the score defaults to 1, the lowest grade. Because the method updates the authentication score from the student's previous authentication results, the number of authentications in the next assessment follows from the credibility of the previous one (lower credibility means more authentications next time), so the identity of the person taking the assessment can be verified accurately and efficiently.
Step S1 further includes: detect in real time whether an authentication score fed back by the teacher has been received, and if so, update the authentication score. The method can also accept scores fed back by the teacher: if, while reviewing the assessment results, the teacher finds that a student cheated or had someone else read in his or her place, the teacher can score according to the degree of cheating, with more serious cheating producing a lower score. Supervising the student's spoken tests from two sides, teacher feedback and the student's past credibility, achieves better supervision and lets the spoken assessment be completed efficiently.
In step S1 the voice information consists of several recordings, captured directly through a microphone or retrieved from the user's trial database. Before the formal spoken assessment starts, the method also provides a trial template that simulates the scene of a formal assessment; when the user enters the trial template, his or her voice is stored in the trial database as the basis from which the standard voice template is later built.
Step S2 is specifically:
S2a: compute the number of authentications A, where A = 5T/S;
S2b: compute the authentication times P_i = P_(i-1) + B, where P_i is the time of the i-th authentication and B is a random number between 0 and 12 s.
The number of authentications A is proportional to the total assessment duration T and inversely proportional to the authentication score S: the longer T, the more authentications; the higher S, the fewer authentications. A is rounded to the nearest integer. The authentication times are chosen at random; random authentication times reveal the student's true situation more accurately. When an authentication time P_i exceeds the total duration T, P_i is set equal to T. T and B are measured in seconds; the unit of S is points.
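The scheduling in steps S2a and S2b can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function and variable names are mine, B is read as a uniform draw from 0 to 12 seconds, and A is rounded to the nearest integer and P_i clamped to T as described above.

```python
import random

def authentication_schedule(total_duration_t, score_s):
    """Sketch of steps S2a/S2b: derive the number of authentications A
    and a randomized, nondecreasing sequence of authentication times P_i."""
    # S2a: A = 5T/S, rounded to the nearest integer.
    a = round(5 * total_duration_t / score_s)
    times = []
    p = 0.0
    for _ in range(a):
        # S2b: P_i = P_(i-1) + B, with B drawn uniformly from 0-12 seconds
        # (an assumption; the patent gives the range as "0~12 s").
        b = random.uniform(0, 12)
        p += b
        # When P_i exceeds the total duration T, it is set equal to T.
        p = min(p, total_duration_t)
        times.append(p)
    return a, times
```

For example, with T = 60 s and S = 5, the sketch schedules A = 60 authentications, each at a random time no later than the end of the assessment.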
Step S3 is specifically:
S3a: when authentication time P_i arrives, set the authentication error count to 0;
S3b: judge whether the user's voice is received within a preset extension time; if so, go to step S3c; otherwise add 1 to the authentication failure count and return to step S2;
S3c: judge whether the authentication error count has reached a preset upper limit; if so, add 1 to the authentication failure count and return to step S2; otherwise go to step S3d;
S3d: compare the received voice with the standard voice template; if it matches, return to step S3a and wait for the next authentication time; if not, add 1 to the authentication error count and return to step S3b.
The authentication error upper limit caps the number of errors allowed in a single authentication, preferably 3. Authentication errors may be caused by someone reading in the student's place, by interference in the voice signal, or by a complex surrounding environment. The extension time mainly measures the validity of the voice signal: because the authentication times are random, a student doing the assessment personally will have his or her voice received and authenticated quickly when an authentication pops up at random, whereas if someone else is reading, that person may first have to find the student, so receiving the student's voice will naturally take longer. The extension time should therefore not be set too long, preferably 5-10 seconds. If the extension time elapses without receiving the user's voice, this authentication is deemed failed. If a voice is received within the extension time and matches the template, the authentication succeeds; if it does not match, an authentication error is counted, and when the error count reaches the upper limit, this authentication is deemed failed.
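The control flow of step S3 for a single authentication time can be sketched as below. The helpers `get_voice` and `matches_template` are hypothetical stand-ins for voice capture within the extension window and for template comparison; the default values follow the preferred 5-10 s extension time and error limit of 3.

```python
def authenticate_once(get_voice, matches_template,
                      extension_time=8, error_limit=3):
    """Sketch of step S3 for one authentication time P_i.
    get_voice(timeout) returns a voice sample, or None if nothing was
    captured within `timeout` seconds; matches_template(voice) compares
    the sample against the enrolled standard voice template.
    Returns True on success, False on an authentication failure."""
    errors = 0                       # S3a: reset the error counter
    while True:
        voice = get_voice(extension_time)
        if voice is None:            # S3b: nothing heard in the window
            return False             # counts as one authentication failure
        if errors >= error_limit:    # S3c: error limit already reached
            return False
        if matches_template(voice):  # S3d: match, authentication succeeds
            return True
        errors += 1                  # mismatch: count the error and retry
```

Per step S4, the authentication score would then be taken as the reciprocal of the number of `False` results accumulated over the assessment.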
In step S4 the authentication score is the reciprocal of the authentication failure count: the more failures, the lower the score and the more authentications in the next assessment; conversely, the fewer failures, the higher the score and the fewer authentications next time.
For speaker recognition, this embodiment proposes a dynamically adaptive multi-classifier fusion method. The method fully considers the local classification performance of each base classifier, preventing higher-weighted classifiers in a linear fusion from drowning out the local classification ability of lower-weighted ones, and thereby improving the reliability of recognition results on test speech. When the method analyzes the user's voice, it first constructs several classifiers and then fuses them to obtain the standard voice template.
1. Construction of the classifiers.
As shown in Fig. 2, the invention uses the joint factor analysis (JFA) speaker supervector as the feature representation of a speaker and constructs multiple base classifiers with a two-layer subspace sampling method. The first layer of subspace sampling operates on the Gaussian-component means that make up the JFA speaker supervector; its purpose is to remove part of the redundant information and determine a subspace of suitable dimension. The second layer performs random sampling in the lower-dimensional subspace obtained after optimal PCA dimensionality reduction of the first-layer subspace, forming several new subspaces.
A JFA speaker supervector has the same structure as a traditional GMM-UBM mean supervector: it can be regarded as the mean vectors of the Gaussian components of a GMM spliced together in order. The first-layer sampling in the proposed subspace sampling algorithm therefore treats the mean vectors of the JFA supervector as its basic units. Specifically, given the h-th utterance of the i-th speaker, and assuming the UBM has N Gaussian components, the JFA supervector M_ih of this utterance can be expressed as the concatenation of N Gaussian mean vectors: M_ih = [m_ih1, m_ih2, ..., m_ihN]. The main steps are:
1) Extract the JFA speaker supervector M_ih of every utterance in the development set.
2) To remove part of the redundancy of the high-dimensional original feature space, choose a subset of the mean vectors making up the JFA supervector to form a new, lower-dimensional subspace that retains most of the useful information in the supervector. The low-dimensional feature vector in this subspace is denoted S_ih = [m'_ih1, m'_ih2, ..., m'_ihk].
3) The feature vector S_ih still has a rather high dimension, the values in each dimension are sparsely distributed, and it still contains much redundancy. Principal component analysis is therefore applied to S_ih for optimal dimensionality reduction, projecting it into a low-dimensional subspace of dimension J.
4) In the subspace obtained after PCA dimensionality reduction, use random sampling to obtain several random subspaces.
5) For each random subspace, perform within-class covariance normalization and nonparametric linear discriminant analysis, obtaining a projection matrix. The projection matrix of each random subspace can thus be expressed as the product of two projection matrices: the within-class covariance normalization matrix and the nonparametric linear discriminant analysis matrix.
The subspace analysis in the steps above yields one subspace classifier per subspace.
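Steps 1-4 above (first-layer selection of mean vectors, PCA reduction to dimension J, and random second-layer subspace sampling with the first E1 components fixed, as detailed in the experiments below) can be sketched in numpy. The WCCN and NLDA projections of step 5 are omitted, and all names and the small dimensions are illustrative, not the patent's values.

```python
import numpy as np

def sample_subspaces(dev_supervectors, n_keep_gauss, mean_dim,
                     pca_dim_j, e1, e2, n_subspaces, seed=0):
    """Sketch of the two-layer subspace sampling.
    dev_supervectors: (n_utts, N*mean_dim) JFA speaker supervectors,
    laid out as N concatenated Gaussian mean vectors of size mean_dim.
    Returns the PCA projection, the projected data, and the index set
    of each random subspace."""
    rng = np.random.default_rng(seed)
    # Layer 1: keep only the first n_keep_gauss mean vectors
    # (coarse redundancy removal on whole Gaussian means).
    X = dev_supervectors[:, : n_keep_gauss * mean_dim]
    # PCA to J dimensions via eigenvectors of the sample covariance.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:pca_dim_j]
    pca = eigvecs[:, order]            # (n_keep_gauss*mean_dim, J)
    Z = Xc @ pca                       # (n_utts, J)
    # Layer 2: fix the first e1 principal components, then draw e2 of
    # the remaining J-e1 at random for each random subspace.
    subspaces = []
    for _ in range(n_subspaces):
        extra = rng.choice(np.arange(e1, pca_dim_j), size=e2, replace=False)
        idx = np.concatenate([np.arange(e1), np.sort(extra)])
        subspaces.append(idx)          # each of dimension e1 + e2
    return pca, Z, subspaces
```

Each returned index set defines one random subspace; a base classifier would then be trained on `Z[:, idx]` via the step-5 projections.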
2. Adaptive multi-classifier fusion.
As shown in Fig. 3, the invention constructs the base classifiers by random sampling in PCA space. Because this is done on different feature subsets, the base classifiers are both distinct and somewhat complementary. Fusing the outputs of the multiple base classifiers with a dynamic fusion method can therefore greatly improve the performance of the speaker recognition system.
In the training stage, to evaluate the local classification ability of the base classifiers, a development voice data set X from a large number of speakers is first analyzed and divided into K subsets S_1, S_2, ..., S_K according to a given criterion, so that the voice data within a set are similar to some extent. The voice data in each set are then tested with each base classifier and the corresponding score outputs are collected. Finally, the score averages serve as the basis for the confidence of each classifier's test results on each set. In this way the classification ability of each base classifier on each set is obtained, and the confidence vectors w_1, w_2, ..., w_Q of the base classifiers on the sets, used in the fusion algorithm, are determined. Each confidence vector contains K values, representing that classifier's classification confidence on each set.
For the problem addressed here, the dynamic multi-classifier fusion process can be summarized in the following steps:
1) Select a suitable development corpus. Suppose the corpus contains N different speakers, each with two utterances. From each speaker, one utterance is taken to form the training set X_1; the remainder forms the reference set X_2.
2) Following joint factor analysis theory, extract the speaker factors of all utterances in the development corpus; the sequence of speaker factors from the training set is denoted Y_1.
3) Using Y_1 as input, train a Gaussian mixture model with a small number of mixture components, expressed as λ = {w_i, μ_i, Σ_i}, i = 1, ..., K, where w_i, μ_i and Σ_i are the weight, mean and covariance of each Gaussian component of the model. A speaker factor whose occupancy of the k-th Gaussian component of the GMM is the largest is assigned to the k-th subset S_k; in this way all speaker factors in the training set are divided into K different sets.
4) According to this division of the speaker factors, the corresponding training utterances are likewise divided into the K sets of the previous step.
5) For a given set S_k, project each training utterance in it and the corresponding utterance of the same speaker in the reference set into the q-th random subspace, obtaining the training vector and the reference vector.
6) Compute the cosine distance between the training vector and the reference vector, and take it as the test score output of the q-th NLDA classifier.
7) Compute the average of all test scores of the q-th NLDA classifier on set S_k and take it as that classifier's classification confidence w_(q,k) on S_k. Correspondingly, the local classification confidence vector of this base classifier can be expressed as w_q = [w_(q,1), ..., w_(q,K)].
8) In the multi-classifier fusion stage, for a given test utterance, first extract its speaker factor following joint factor analysis theory; then, using the same partitioning criterion applied to the development data during training, assign the test utterance to a set S_k; finally, linearly fuse the outputs of all base classifiers, using each base classifier's classification confidence w_(q,k) on S_k as its weight.
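The scoring and fusion in steps 6-8 can be sketched as follows. This is a simplified reading: the per-subset confidences are plain score means, and the fused score is normalized by the weight sum, a choice of mine that the description does not specify; all names are illustrative.

```python
import numpy as np

def cosine_score(u, v):
    """Step 6: cosine similarity between a training-utterance vector and
    its reference vector, used as one NLDA classifier's raw score."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def dynamic_fusion_weights(scores_by_subset):
    """Step 7: scores_by_subset[q][k] is the list of scores classifier q
    produced on development subset S_k; the per-subset mean becomes that
    classifier's local confidence w_(q,k)."""
    return np.array([[np.mean(s) for s in per_cls]
                     for per_cls in scores_by_subset])   # shape (Q, K)

def fuse(test_scores, cluster_k, weights):
    """Step 8: linear fusion of the Q base-classifier scores for one
    trial, using the weights of the subset S_k the trial was assigned to.
    Normalizing by the weight sum is an assumption made here."""
    w = weights[:, cluster_k]
    return float(np.dot(w, test_scores) / w.sum())
```

Because the weights are indexed only by the subset a test utterance falls into, they can be precomputed before testing, matching the real-time property noted below.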
In determining the local classification confidence of the base classifiers, it is assumed that the personal characteristics of certain speakers are somewhat similar, and that the voice features of such similar speakers are also similarly distributed, lying in some local region of the feature space. In the invention, the variable-length sequences of speech feature vectors extracted from the speakers are converted by joint factor analysis into JFA speaker supervectors of fixed length from which local channel effects have been removed. The distribution of the JFA speaker supervectors in the high-dimensional feature space reflects the distribution of the different speakers' personal characteristics, and the distribution of the speaker factors is used here to approximate the distribution of the JFA speaker supervectors, because:
1) JFA speaker supervectors usually have a very high dimension, and it is difficult to model the distribution of such high-dimensional vectors accurately with common statistical models.
2) To avoid losing most of the useful information, JFA speaker supervectors still have a rather high dimension after being projected into the nonparametric linear discriminant subspace.
3) The speaker factors of the JFA speaker vectors projected in the previous step have a lower dimension, and since speaker-factor extraction is also based on the joint factor analysis algorithm, they contain the necessary speaker-specific information and can reflect the distribution of the JFA speaker supervectors.
It can be seen from the above that, in the proposed multi-classifier fusion method, the weight of each base classifier in score fusion is determined by the region of speaker-factor space in which the test utterance lies. Because the speaker factor of each test utterance is distributed differently, the weight of each base classifier changes dynamically with the test utterance. Notably, in the multi-classifier fusion algorithm of the invention, the fusion weight of each base classifier can be determined before testing, which greatly improves the real-time performance of the fused system.
3. Evaluating system performance.
The experimental data come from the NIST 2008 speaker evaluation database; the male telephone-training, telephone-test part of the core evaluation task is used as the evaluation data set for measuring the performance of the speaker recognition system. The UBM training data come from the telephone speech of Switchboard II phase 2, Switchboard II phase 3, Switchboard Cellular Part 2 and NIST SRE 2004, 2005 and 2006, and the UBM has 2048 Gaussian components.
The development data for training the nonparametric subspace discriminant analysis projection matrices are taken from the telephone speech of the NIST SRE 2004, 2005 and 2006 databases, comprising 563 speakers with 8 utterances each.
The UBM is the same as in the joint factor analysis system above; the rank of the speaker-space loading matrix is 300, the rank of the eigenchannel-space loading matrix is 100, and the residual loading matrix is spliced from the diagonal elements of the diagonal covariance matrices of the UBM's Gaussian components.
The dimensions of the PCA, within-class covariance normalization and nonparametric linear discriminant analysis projection matrices used in the invention are (51 × k) × J, (E1+E2) × 799 and 799 × 550, respectively. The number of random subspaces, i.e., the number of base classifiers Q, is set to 10. In nonparametric linear discriminant analysis, the number of neighbor samples is set to 4.
After subspace sampling of the original feature space we obtain the new feature vector S_ih. In the first layer of subspace sampling, the first 1280 Gaussian mean vectors (after sorting) of the JFA speaker supervector are selected, but the dimension of this feature vector is still very high relative to the number of training samples in the development set. To train reliable and stable subspace classifiers, the new feature vectors are therefore further projected into a low-dimensional PCA subspace; the dimension of the feature vector after PCA reduction is J. Before random sampling, to guarantee the performance of each basic subspace classifier, the first E1 principal components, which carry most of the data's variance, are kept fixed, and the random sampling algorithm is applied only to the remaining J − E1 principal components, from which E2 components are randomly selected to form a random subspace of dimension E1 + E2.
In the experiments on the second-layer sample space, the value of J is fixed at 1200 or 1300, a relatively good value determined by cross-validation. The value of E1 + E2 is fixed at 800. For each combination (E1, E2), we randomly create 10 subspaces, i.e. 10 base classifiers.
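The two-layer sampling described above can be sketched as follows. Here `j`, `e1`, `e2` and `q` correspond to J, E1, E2 and Q in the text; the particular split E1 = 500, E2 = 300 is only an illustration (the text fixes only E1 + E2 = 800), and the function produces index sets rather than trained classifiers:

```python
import numpy as np

def sample_random_subspaces(j, e1, e2, q, seed=0):
    """Fix the first e1 principal components and draw e2 of the
    remaining j - e1 at random for each of q base classifiers."""
    rng = np.random.default_rng(seed)
    fixed = np.arange(e1)                  # high-variance pivots, always kept
    pool = np.arange(e1, j)                # candidates for random sampling
    subspaces = []
    for _ in range(q):
        rand_part = rng.choice(pool, size=e2, replace=False)
        subspaces.append(np.concatenate([fixed, np.sort(rand_part)]))
    return subspaces

# J = 1200, E1 + E2 = 800, Q = 10, as in the experiments
subspaces = sample_random_subspaces(j=1200, e1=500, e2=300, q=10)
```

Each returned index set selects an (E1 + E2)-dimensional random subspace of the PCA-reduced feature vector; one base classifier is then trained per subspace.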
The first group of experiments investigates how the performance of the dynamic adaptive fusion algorithm changes with the number of clusters K. Since the clustering method uses a GMM algorithm and the training data are limited, K is set to 8, 16 and 32. The experimental results are listed in Table 1.
Table 1. Experimental results of the dynamic adaptive fusion method
In Table 1, when K is 8, 16 and 32, the mean EER and minDCF of the dynamic adaptive fusion results over all combinations of E1 and E2 are 4.02 and 2.20; 3.89 and 2.14; and 4.02 and 2.20, respectively. It can be seen that the fused system performs best when K is 16. The reason is that when the number of clusters K is too small, the feature vectors of similar speakers cannot be grouped together effectively, the local classification ability of the base classifiers is not properly reflected, and the estimate of their local classification confidence is not accurate enough; conversely, when K is too large for the scale of the training data, the complexity of the GMM used for clustering increases and its parameters tend to overfit during estimation, so the local classification confidence of the base classifiers again cannot be estimated effectively. The first group of experiments thus shows that K = 16 makes the estimate of the base classifiers' local classification confidence most accurate.
The second group of experiments compares the proposed dynamic adaptive fusion method (DY) with the linear fusion algorithm (LR) and with the classical fusion method based on logistic regression (LG) widely applied in the speaker verification field; the number of clusters in the dynamic adaptive fusion method is K = 16.
Table 2. Comparison of different fusion methods
Table 2 lists the results of the three fusion algorithms for various combinations of E1 and E2; 10 base classifiers are constructed for each combination. It can be seen that the proposed dynamic adaptive fusion method achieves the lowest EER in every group of experiments, followed by the logistic-regression-based fusion algorithm, while the linear fusion system has the highest EER and the worst performance. In terms of minDCF, the dynamic adaptive fusion algorithm obtains essentially the lowest detection cost in every group of experiments except the third. In particular, in the fifth group the dynamic adaptive fusion achieves an EER of 3.76 and a minDCF of 2.08, the best system performance: a relative reduction of 3.7% over the lowest EER of the logistic-regression fusion algorithm and of 6.6% over the corresponding lowest EER of linear fusion. This fully demonstrates the effectiveness of the proposed dynamic multi-classifier fusion algorithm based on random subspace sampling; moreover, the fusion algorithm is applicable to any subspace classifier and generalizes well.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some or all of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the claims and description of the present invention.

Claims (10)

1. A spoken-language assessment identity authentication method based on speaker recognition technology, characterized by comprising the following steps:
S1: at user registration, acquiring and analyzing the user's voice information to obtain a standard voice template, and initializing an authentication score;
S2: when the user starts the spoken-language assessment function, calculating the number of authentications and the authentication times according to the total assessment duration T and the authentication score S;
S3: when an authentication time arrives, acquiring the user's voice information and comparing it with the standard voice template; if they match, this authentication succeeds; otherwise, returning to step S3 to monitor the next authentication time;
S4: updating the authentication score according to the authentication result of this spoken-language assessment.
2. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that step S1 further comprises: detecting in real time whether an authentication score fed back by a teacher is received, and if so, updating the authentication score.
3. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that in step S1 there are a plurality of pieces of voice information, acquired directly through a microphone or retrieved from the user's test database.
4. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that step S2 specifically comprises:
S2a: calculating the number of authentications A, where A = 5T/S;
S2b: calculating the authentication times Pi = Pi-1 + B, where Pi is the time of the i-th authentication and B is a random number between 0 and 12S.
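The schedule of claim 4 can be sketched as below. The claim does not fix how A is rounded or how B is distributed, so interpreting B as uniform on [0, 12·S] and truncating A to an integer are assumptions of this sketch:

```python
import random

def authentication_schedule(total_duration, auth_score, seed=None):
    """Number of checks A = 5T/S; check times P_i = P_(i-1) + B,
    with B a random number between 0 and 12*S (claim 4)."""
    rng = random.Random(seed)
    num_checks = int(5 * total_duration / auth_score)   # A = 5T/S
    times, prev = [], 0.0
    for _ in range(num_checks):
        prev += rng.uniform(0, 12 * auth_score)         # P_i = P_(i-1) + B
        times.append(prev)
    return times

times = authentication_schedule(total_duration=60, auth_score=5, seed=1)
```

Note the coupling the claim builds in: a lower authentication score S yields more checks (larger A) spaced more tightly (smaller B range), so poorly authenticated users are verified more often.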
5. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 4, characterized in that step S3 specifically comprises:
S3a: when the authentication time Pi arrives, setting the authentication error count to 0;
S3b: judging whether the user's voice information is received within a preset extension time; if so, executing step S3c; otherwise, incrementing the authentication failure count by 1 and returning to step S2;
S3c: judging whether the authentication error count has reached a preset authentication error upper limit; if so, incrementing the authentication failure count by 1 and returning to step S2; otherwise, executing step S3d;
S3d: comparing the received voice information with the standard voice template; if they match, returning to step S3a to monitor the next authentication time; if they do not match, incrementing the authentication error count by 1 and returning to step S3b.
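The S3a–S3d loop of claim 5 can be sketched as a small state machine. `get_voice(deadline)` and `matches(voice)` are hypothetical stand-ins for the patent's audio capture and template comparison, and the simplification of "return to step S2" to counting a failure and moving to the next scheduled check is an assumption of this sketch:

```python
def run_authentication(get_voice, matches, schedule, max_errors, extension):
    """Sketch of the claim-5 loop. get_voice(deadline) returns a voice
    sample, or None if nothing arrives within the extension window;
    matches(voice) compares it with the standard voice template."""
    failures = 0
    for t in schedule:                       # S3a: authentication time P_i arrives
        errors = 0
        while True:
            voice = get_voice(t + extension)         # S3b: wait for speech
            if voice is None:                        # nothing within the window
                failures += 1
                break
            if errors >= max_errors:                 # S3c: error limit reached
                failures += 1
                break
            if matches(voice):                       # S3d: template comparison
                break                                # success; next check
            errors += 1                              # mismatch; back to S3b
    return failures
```

The per-check error count tolerates a few noisy mismatches before a check is scored as a failure, while silence within the extension window fails the check immediately.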
6. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 5, characterized in that in step S4 the authentication score is the reciprocal of the authentication failure count.
7. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 1, characterized in that when the method analyzes the user's voice information, several classifiers are first constructed and then fused to obtain the standard voice template.
8. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 7, characterized in that the classifiers are constructed as follows:
First, the JFA speaker supervector corresponding to the voice information is extracted, and a new lower-dimensional subspace is selected from the mean vectors in the JFA supervector. Then, the feature vectors in that subspace are optimally reduced in dimension by principal component analysis and projected into a low-dimensional subspace of dimension J. Next, several random subspaces are obtained in that low-dimensional subspace by a random sampling technique. Finally, within-class covariance normalization and nonparametric linear discriminant analysis are performed on each random subspace, yielding the projection matrix corresponding to each random subspace, i.e. a classifier.
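A minimal numpy sketch of this pipeline on toy data; the nonparametric discriminant analysis step is omitted for brevity (a WCCN-style whitening matrix alone serves as the projection here), and all sizes are illustrative rather than the (51 × k), J = 1200 values used in the experiments:

```python
import numpy as np

def build_subspace_classifiers(X, y, j, e1, e2, q, seed=0):
    """PCA to dimension j, then q random subspaces (first e1 components
    fixed, e2 sampled from the rest); per subspace, a WCCN-style whitening
    matrix serves as the projection. NDA is omitted in this sketch."""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)   # PCA basis
    Z = (X - mu) @ vt[:j].T                                 # j-dim features
    classifiers = []
    for _ in range(q):
        idx = np.concatenate([np.arange(e1),
                              rng.choice(np.arange(e1, j), e2, replace=False)])
        Zi = Z[:, idx]
        classes = np.unique(y)
        # average within-class covariance over the speaker labels y
        W = sum(np.cov(Zi[y == c].T) for c in classes) / len(classes)
        B = np.linalg.cholesky(np.linalg.inv(W + 1e-6 * np.eye(len(idx))))
        classifiers.append((idx, B))                        # projection pair
    return classifiers

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))          # 40 toy utterance vectors
y = np.repeat([0, 1], 20)              # two toy speakers
clfs = build_subspace_classifiers(X, y, j=6, e1=2, e2=2, q=3)
```

Each (index set, whitening matrix) pair defines one base classifier's projection; scoring a trial then happens in that subspace after applying B.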
9. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 7, characterized in that the outputs of the classifiers are fused by a dynamic fusion method.
10. The spoken-language assessment identity authentication method based on speaker recognition technology according to claim 9, characterized in that the dynamic fusion method specifically comprises:
First, a development voice data set X from a large number of speakers is analyzed and divided into K subsets SK according to a certain criterion; then, the voice data in each subset are tested with each classifier and the corresponding score outputs are counted; finally, the score averages are used as the weights that determine each classifier on each subset.
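A sketch of the weighting step of claim 10 with dummy classifiers. The GMM partitioning of X is taken as given (the `assign` array), per-subset accuracy stands in for the averaged scores, and classifiers are modelled as callables returning predicted labels; all of these names are illustrative:

```python
import numpy as np

def fusion_weights(assign, dev_feats, dev_labels, classifiers, k):
    """Per-subset average score (here: accuracy) of each classifier,
    normalized per subset, used as the fusion weights of claim 10."""
    weights = np.zeros((k, len(classifiers)))
    for c in range(k):
        mask = assign == c
        for q, clf in enumerate(classifiers):
            weights[c, q] = np.mean(clf(dev_feats[mask]) == dev_labels[mask])
    return weights / weights.sum(axis=1, keepdims=True)

def fuse(x, centers, weights, base_scores):
    """Weight base-classifier scores by the subset whose center is nearest x."""
    c = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
    return float(np.dot(weights[c], base_scores))

# Dummy development data: two 1-D subsets, two toy classifiers
feats = np.array([[0.0], [0.1], [1.0], [1.1]])
labels = np.array([0, 0, 1, 1])
assign = np.array([0, 0, 1, 1])
clf_good = lambda X: (X[:, 0] > 0.5).astype(int)   # right on both subsets
clf_zero = lambda X: np.zeros(len(X), dtype=int)   # right only on subset 0
W = fusion_weights(assign, feats, labels, [clf_good, clf_zero], k=2)
```

The fusion is dynamic in the sense that the weight vector applied to a test utterance depends on which development subset it falls nearest to, reflecting each base classifier's local reliability.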
CN201611141701.2A 2016-12-12 2016-12-12 Spoken assessment identity identifying method based on speaker Recognition Technology Active CN106531170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611141701.2A CN106531170B (en) 2016-12-12 2016-12-12 Spoken assessment identity identifying method based on speaker Recognition Technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611141701.2A CN106531170B (en) 2016-12-12 2016-12-12 Spoken assessment identity identifying method based on speaker Recognition Technology

Publications (2)

Publication Number Publication Date
CN106531170A CN106531170A (en) 2017-03-22
CN106531170B true CN106531170B (en) 2019-09-17

Family

ID=58342159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611141701.2A Active CN106531170B (en) 2016-12-12 2016-12-12 Spoken assessment identity identifying method based on speaker Recognition Technology

Country Status (1)

Country Link
CN (1) CN106531170B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108417226A (en) * 2018-01-09 2018-08-17 平安科技(深圳)有限公司 Speech comparison method, terminal and computer readable storage medium
CN112017694B (en) * 2020-08-25 2021-08-20 天津洪恩完美未来教育科技有限公司 Voice data evaluation method and device, storage medium and electronic device
CN113053395B (en) * 2021-03-05 2023-11-17 深圳市声希科技有限公司 Pronunciation error correction learning method and device, storage medium and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6205424B1 (en) * 1996-07-31 2001-03-20 Compaq Computer Corporation Two-staged cohort selection for speaker verification system
CN102142254A (en) * 2011-03-25 2011-08-03 北京得意音通技术有限责任公司 Voiceprint identification and voice identification-based recording and faking resistant identity confirmation method
CN102708867A (en) * 2012-05-30 2012-10-03 北京正鹰科技有限责任公司 Method and system for identifying faked identity by preventing faked recordings based on voiceprint and voice
CN103065642B (en) * 2012-12-31 2015-06-17 安徽科大讯飞信息科技股份有限公司 Method and system capable of detecting oral test cheating
CN105810199A (en) * 2014-12-30 2016-07-27 中国科学院深圳先进技术研究院 Identity verification method and device for speakers

Also Published As

Publication number Publication date
CN106531170A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
Yu et al. Spoofing detection in automatic speaker verification systems using DNN classifiers and dynamic acoustic features
An et al. Deep CNNs with self-attention for speaker identification
Hansen et al. Speaker recognition by machines and humans: A tutorial review
CN100363938C (en) Multi-model ID recognition method based on scoring difference weight compromised
CN108281137A (en) A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
Wan Speaker verification using support vector machines
CN105261367B (en) A kind of method for distinguishing speek person
CN101710490A (en) Method and device for compensating noise for voice assessment
CN103198833B (en) A kind of high precision method for identifying speaker
CN106531170B (en) Spoken assessment identity identifying method based on speaker Recognition Technology
CN110299142A (en) A kind of method for recognizing sound-groove and device based on the network integration
Pinto et al. Exploiting contextual information for improved phoneme recognition
Bhardwaj et al. GFM-based methods for speaker identification
Fu et al. Speaker independent emotion recognition based on SVM/HMMs fusion system
CN110364168A (en) A kind of method for recognizing sound-groove and system based on environment sensing
CN110111798A (en) A kind of method and terminal identifying speaker
Shi et al. Visual speaker authentication by ensemble learning over static and dynamic lip details
Shivakumar et al. Simplified and supervised i-vector modeling for speaker age regression
Ng et al. Teacher-student training for text-independent speaker recognition
CN110085236B (en) Speaker recognition method based on self-adaptive voice frame weighting
Michalevsky et al. Speaker identification using diffusion maps
Chen On the use of different speech representations for speaker modeling
Zhang et al. Optimized discriminative kernel for SVM scoring and its application to speaker verification
Ping English speech recognition method based on hmm technology
Golipour et al. Context-independent phoneme recognition using a k-nearest neighbour classification approach

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant