CN104538035B - Speaker recognition method and system based on Fisher supervectors - Google Patents

Info

Publication number
CN104538035B
Authority
CN
China
Prior art keywords
speaker
fisher
vector
subspace
projection matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410802816.6A
Other languages
Chinese (zh)
Other versions
CN104538035A (en)
Inventor
李志锋
李娜
乔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201410802816.6A
Publication of CN104538035A
Application granted
Publication of CN104538035B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the technical field of voice recognition and provides a speaker recognition method and system based on Fisher supervectors. The method includes: extracting Fisher supervectors; dividing the extracted Fisher supervectors into multiple Fisher sub-vector sets; analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model; obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model; and identifying the speaker to be identified according to a preset calculation rule and the two reference vectors. The invention uses the Fisher supervectors of the voice data as the individual information characterizing a speaker and performs speaker recognition on the Fisher supervectors using subspace analysis modeling, which effectively improves the recognition performance of the system.

Description

A speaker recognition method and system based on Fisher supervectors
Technical field
The invention belongs to the technical field of voice recognition, and in particular relates to a speaker recognition method and system based on Fisher supervectors.
Background
With the continuous progress of computer and Internet technology, smart devices have become increasingly indispensable in people's lives. Voice interaction, one of the ways people interact with smart devices, has become a research hotspot because voice is easy to collect, easy to store, difficult to imitate, and cheap to acquire.
Current intelligent speech processing is mainly divided, according to the voice information used, into speech recognition (Speech Recognition), language recognition (Language Recognition), and speaker recognition (Speaker Recognition). Speech recognition aims to determine what semantic information the voice signal conveys; language recognition aims to identify the language or dialect to which the voice signal belongs; speaker recognition identifies the identity of the speaker by extracting personal features that characterize the speaker.
Voice is an important carrier of identity information. Compared with other biometric features such as face and fingerprint, voice is cheap to acquire, simple to use, and convenient for remote data collection, and voice-based human-machine interfaces are friendlier. Speaker recognition has therefore become an important automatic identity authentication technology.
Commonly used speaker recognition methods include those based on the Gaussian mixture model-universal background model (GMM-UBM). Although the GMM-UBM model has some robustness to noise, it does not account for channel effects during training, so its recognition performance drops sharply when the training speech and the test speech come from different channels.
To overcome the performance loss caused by channel mismatch, the prior art proposes joint factor analysis (Joint Factor Analysis, JFA) based on the GMM-UBM model for speaker recognition. However, JFA theory is built on the GMM-UBM framework. It assumes that the main information contained in a speaker's GMM mean supervector can be mapped into two mutually independent low-dimensional subspaces, and it estimates the space loading matrices with an EM iterative algorithm within the GMM framework, so the computation cannot be separated from the GMM framework. JFA-based speaker recognition methods apply channel compensation to the speaker model using parameters estimated during testing, and their test performance is poor.
Summary of the invention
In view of this, embodiments of the present invention provide a speaker recognition method and system based on Fisher supervectors, which use the high-dimensional Fisher supervector of the voice data as the individual information characterizing a speaker and apply subspace analysis modeling to that high-dimensional feature vector for speaker recognition, improving the recognition performance of the system.
The embodiments of the present invention are implemented as a speaker recognition method based on Fisher supervectors, the method including:
extracting Fisher supervectors;
dividing the extracted Fisher supervectors into multiple Fisher sub-vector sets;
analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Another object of the embodiments of the present invention is to provide a speaker recognition system based on Fisher supervectors, the system including:
an extraction unit, configured to extract Fisher supervectors;
a division unit, configured to divide the extracted Fisher supervectors into multiple Fisher sub-vector sets;
a model establishing unit, configured to analyze each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
a recognition unit, configured to obtain the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and to identify the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The embodiments extract the Fisher supervectors of the voice data as the speaker's feature vectors and perform speaker recognition on them using subspace analysis modeling. Because Fisher supervectors are simple to extract, have a higher dimension than JFA supervectors, and require no channel compensation processing, the accuracy and efficiency of speaker recognition are effectively improved. In addition, the embodiments require no additional hardware in the recognition process, which effectively reduces cost and provides strong usability and practicality.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the speaker recognition method based on Fisher supervectors provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the nonparametric discriminant analysis based on Fisher supervectors provided by Embodiment 1 of the present invention;
Fig. 3 is a plot comparing the results of the speaker recognition system based on Fisher supervectors provided by Embodiment 1 of the present invention with those of a speaker recognition system based on JFA supervectors;
Fig. 4 is a structural diagram of the speaker recognition system based on Fisher supervectors provided by Embodiment 2 of the present invention.
Detailed description
In the following description, specific details such as particular system structures and techniques are set forth for purposes of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment 1:
Fig. 1 shows the flow of the speaker recognition method based on Fisher supervectors provided by Embodiment 1 of the present invention. The method proceeds as follows:
In step S101, Fisher supervectors are extracted.
In the embodiments of the present invention, to further improve the accuracy and efficiency of speaker recognition, the Fisher supervector of the voice data is extracted as the speaker's feature vector.
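The Fisher supervector is formed by concatenating the statistics $\mathcal{G}^X_{\alpha_k}$, $\mathcal{G}^X_{\mu_k}$ and $\mathcal{G}^X_{\sigma_k}$ of all Gaussian components in the GMM, and its dimension is $(2d+1)K$, where:
$$\mathcal{G}^X_{\alpha_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left(\gamma_t(k) - w_k\right)$$
$$\mathcal{G}^X_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \left(\frac{x_t - \mu_k}{\sigma_k}\right)$$
$$\mathcal{G}^X_{\sigma_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \frac{1}{\sqrt{2}} \left[\frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1\right]$$
Here $\mathcal{G}^X_{\alpha_k}$ is a scalar, and $\mathcal{G}^X_{\mu_k}$ and $\mathcal{G}^X_{\sigma_k}$ are d-dimensional vectors, $d \ge 1$ (divisions and squares of vectors are taken element by element). $X = \{x_t, t = 1 \ldots T\}$ is the feature vector sequence of the speaker's speech, $x_t$ denotes a feature vector, $T$ denotes the number of feature vectors in the sequence $X$, $K$ denotes the number of Gaussian components in the GMM, and
$$\gamma_t(k) = \frac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$$
The k-th Gaussian component in the GMM is
$$p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)\right\}$$
$w_k$ denotes the weight of the k-th Gaussian component in the GMM, with
$$w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$$
$\mu_k$ denotes the mean vector of the k-th Gaussian component in the GMM, $\Sigma_k$ denotes the covariance matrix of the k-th Gaussian component, and $\sigma_k^2$ denotes the elements on the diagonal of $\Sigma_k$.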
In detail: let the feature vector sequence from one piece of voice data be $X = \{x_t, t = 1 \ldots T\}$, where the feature vectors $x_t$ are mutually independent. $X$ can then be represented as follows:
$$\mathcal{G}^X_{\lambda} = \sum_{t=1}^{T} L_{\lambda} \nabla_{\lambda} \log p_{\lambda}(x_t)$$
where $L_{\lambda}$ denotes the square root of the inverse of the Fisher information matrix. Under the assumption that the feature vectors are mutually independent, the Fisher supervector can be regarded as the sum of the normalized gradient statistics of the individual feature vectors. The following operator:
$$x_t \rightarrow \varphi(x_t) = L_{\lambda} \nabla_{\lambda} \log p_{\lambda}(x_t)$$
can be regarded as embedding a feature vector $x_t$ as a point in a higher-dimensional space, which makes it easier to build a linear classifier. Note that the assumption of mutual independence between feature vectors often does not hold in practice; the corresponding treatment of this problem is mentioned below.
Since a GMM can accurately model any continuous distribution, the probability density model $p_{\lambda}$ is assumed to be a GMM. To obtain the Fisher supervector of each piece of voice data, a universal background model independent of speaker and channel information is needed; $p_{\lambda}$ is a universal background GMM with a large number of Gaussian components, trained on a large amount of voice data from different speakers and different channels. Assuming the GMM has $K$ Gaussian components, its parameters can be written as $\lambda = \{w_k, \mu_k, \Sigma_k, k = 1, \ldots, K\}$, where $w_k$, $\mu_k$ and $\Sigma_k$ denote the weight, mean vector, and covariance matrix of the k-th Gaussian component. The GMM is given by:
$$p_{\lambda}(x) = \sum_{k=1}^{K} w_k\, p_k(x)$$
where $p_k$ denotes the k-th Gaussian component in the GMM:
$$p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)\right\}$$
and the following conditions hold:
$$\forall k: w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1$$
To ensure that $p_{\lambda}(x)$ can effectively describe the distribution of the training data, the covariance matrix of each Gaussian component in the GMM is assumed to be diagonal, with the elements on its diagonal represented by the vector $\sigma_k^2$.
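As an illustration of the diagonal-covariance GMM described above, the following is a minimal NumPy sketch that evaluates $\log p_{\lambda}(x)$. The function name `gmm_log_density`, the array layout, and the toy parameter values are assumptions made for this example and do not come from the patent.

```python
import numpy as np

def gmm_log_density(x, weights, means, variances):
    """Log of p_lambda(x) = sum_k w_k p_k(x) for a diagonal-covariance GMM.

    x: (d,), weights: (K,), means: (K, d), variances: (K, d) holding sigma_k^2.
    """
    d = x.shape[0]
    diff = x - means                                      # (K, d)
    log_pk = -0.5 * (d * np.log(2 * np.pi)
                     + np.sum(np.log(variances), axis=1)
                     + np.sum(diff ** 2 / variances, axis=1))
    a = np.log(weights) + log_pk                          # log(w_k p_k(x))
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))              # log-sum-exp for stability

# Toy model (illustrative values only): K = 2 components, d = 2.
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
log_p = gmm_log_density(np.array([0.0, 0.0]), weights, means, variances)
```

With a diagonal covariance, $|\Sigma_k|$ reduces to the product of the entries of $\sigma_k^2$, which is why only sums over the variance vector appear in the code.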
In addition, for the weights $w_k$ of the Gaussian components in the GMM, to avoid enforcing the constraints above directly, a parameter $\alpha_k$ is introduced and the weight $w_k$ is expressed in the following form:
$$w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$$
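A short sketch of this reparameterization, assuming the usual soft-max form: any real-valued $\alpha_k$ yields non-negative weights that sum to one, so the constraints on $w_k$ hold automatically. The name `weights_from_alpha` is hypothetical.

```python
import numpy as np

def weights_from_alpha(alpha):
    """Map unconstrained parameters alpha_k to mixture weights w_k (soft-max)."""
    shifted = alpha - np.max(alpha)     # subtracting the max does not change the result
    e = np.exp(shifted)
    return e / e.sum()

alpha = np.array([0.0, np.log(3.0)])
w = weights_from_alpha(alpha)           # weights in the ratio 1 : 3
```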
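The GMM parameters can then be rewritten as $\lambda = \{\alpha_k, \mu_k, \Sigma_k, k = 1, \ldots, K\}$, and the gradients of a feature vector $x_t$ with respect to the GMM parameters take the following form:
$$\nabla_{\alpha_k} \log p_{\lambda}(x_t) = \gamma_t(k) - w_k$$
$$\nabla_{\mu_k} \log p_{\lambda}(x_t) = \gamma_t(k) \left(\frac{x_t - \mu_k}{\sigma_k^2}\right)$$
$$\nabla_{\sigma_k} \log p_{\lambda}(x_t) = \gamma_t(k) \left[\frac{(x_t - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k}\right]$$
In the equations above, $\gamma_t(k)$ denotes the occupancy of feature vector $x_t$ on the k-th Gaussian component in the GMM, which can be computed as the following posterior probability:
$$\gamma_t(k) = \frac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$$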
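The occupancy $\gamma_t(k)$ can be sketched in NumPy as follows. This is a minimal example assuming diagonal covariances; the function name `posteriors` and the toy values are illustrative assumptions, not part of the patent.

```python
import numpy as np

def posteriors(X, weights, means, variances):
    """Return gamma with gamma[t, k] = w_k p_k(x_t) / sum_j w_j p_j(x_t)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                       # (T, K, d)
    log_pk = -0.5 * (d * np.log(2 * np.pi)
                     + np.log(variances).sum(axis=1)[None, :]
                     + (diff ** 2 / variances[None, :, :]).sum(axis=2))
    a = np.log(weights)[None, :] + log_pk                          # log(w_k p_k(x_t))
    a -= a.max(axis=1, keepdims=True)                              # numerical stability
    gamma = np.exp(a)
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy values (illustrative only): each frame sits on one Gaussian centre.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
X = np.array([[0.0, 0.0], [3.0, 3.0]])
gamma = posteriors(X, weights, means, variances)
```

In this toy case each frame is assigned almost entirely to the component whose centre it sits on, which is the sparsity the description relies on.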
With the gradient expressions above, the next step is to find the square root of the inverse of the Fisher information matrix. The posterior probabilities are typically very sparse: a feature vector $x_t$ has a high occupancy on only one Gaussian component and a low occupancy on all the others. In terms of the spatial distribution of feature vectors, a feature vector that is close to the centre of one Gaussian is far from the centres of the others. Because of this sparsity, the Fisher information matrix can be taken to be diagonal, which yields the following normalized gradient expressions:
$$\mathcal{G}^X_{\alpha_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left(\gamma_t(k) - w_k\right)$$
$$\mathcal{G}^X_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \left(\frac{x_t - \mu_k}{\sigma_k}\right)$$
$$\mathcal{G}^X_{\sigma_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \frac{1}{\sqrt{2}} \left[\frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1\right]$$
In the equations above, $\mathcal{G}^X_{\alpha_k}$ is a scalar, and $\mathcal{G}^X_{\mu_k}$ and $\mathcal{G}^X_{\sigma_k}$ are d-dimensional vectors. The final Fisher supervector is obtained by concatenating these three statistics for all Gaussian components in the GMM, so its dimension is $(2d+1)K$.
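The extraction step can be sketched as below, assuming the standard Fisher vector statistics for a diagonal-covariance GMM (weight, mean, and standard-deviation gradients, each scaled by $1/\sqrt{w_k}$). The function name, the ordering of the concatenation (all weight terms, then all mean terms, then all deviation terms), and the random toy data are assumptions made for this illustration.

```python
import numpy as np

def fisher_supervector(X, weights, means, variances):
    """X: (T, d) features; weights: (K,); means, variances: (K, d). Returns ((2d+1)K,)."""
    T, d = X.shape
    sigma = np.sqrt(variances)
    z = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]    # (T, K, d)
    # log(w_k p_k(x_t)) for diagonal Gaussians, then posteriors gamma_t(k)
    log_wp = (np.log(weights) - 0.5 * (d * np.log(2 * np.pi)
              + np.log(variances).sum(axis=1)))[None, :] - 0.5 * (z ** 2).sum(axis=2)
    log_wp -= log_wp.max(axis=1, keepdims=True)
    gamma = np.exp(log_wp)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # (T, K)
    scale = 1.0 / np.sqrt(weights)                                 # (K,)
    g_alpha = scale * (gamma - weights).sum(axis=0)                # (K,)
    g_mu = scale[:, None] * np.einsum("tk,tkd->kd", gamma, z)      # (K, d)
    g_sigma = scale[:, None] * np.einsum("tk,tkd->kd", gamma,
                                         (z ** 2 - 1) / np.sqrt(2))
    # Assumed layout: [all alpha terms | all mu terms | all sigma terms]
    return np.concatenate([g_alpha, g_mu.ravel(), g_sigma.ravel()])

# Random toy data (illustrative only).
rng = np.random.default_rng(0)
K, d, T = 4, 3, 50
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, d))
variances = np.ones((K, d))
X = rng.normal(size=(T, d))
fv = fisher_supervector(X, weights, means, variances)
```

For this toy configuration the result has $(2 \cdot 3 + 1) \cdot 4 = 28$ entries. With the 2048-component UBM used in the experiments the same formula gives $(2d+1) \cdot 2048$ entries, which is why the later subspace steps start with a strong dimension reduction.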
In step S102, the extracted Fisher supervectors are divided into multiple Fisher sub-vector sets.
Specifically, all Gaussian mean vectors of the UBM are clustered using a GMM algorithm, and according to the clustering result the Fisher supervector is divided into multiple Fisher sub-vector sets, using either an equal or an unequal partition.
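A minimal sketch of the partition step, assuming the clustering of the UBM mean vectors has already produced one cluster label per Gaussian component. The function `subvector_index_sets`, the hand-written labels, and the supervector layout (all weight terms, then all mean terms, then all deviation terms) are assumptions for illustration; the patent does not specify the layout.

```python
import numpy as np

def subvector_index_sets(labels, d):
    """For each cluster of Gaussians, list the supervector positions it owns.

    labels: length-K cluster label per Gaussian; d: feature dimension.
    Assumed layout: K weight terms, then K*d mean terms, then K*d deviation terms.
    """
    labels = np.asarray(labels)
    K = labels.shape[0]
    index_sets = []
    for c in np.unique(labels):
        ks = np.flatnonzero(labels == c)                       # Gaussians in cluster c
        per_dim = (ks[:, None] * d + np.arange(d)).ravel()     # their d-dim blocks
        index_sets.append(np.concatenate([ks, K + per_dim, K + K * d + per_dim]))
    return index_sets

labels = [0, 1, 0, 1]      # hypothetical cluster labels for K = 4 Gaussians
sets = subvector_index_sets(labels, d=3)
```

Because the index sets depend only on the labels, clusters of different sizes produce sub-vector sets of different lengths, which corresponds to the unequal partition mentioned above.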
In step S103, each Fisher sub-vector set is analyzed with a nonparametric discriminant analysis algorithm to establish a subspace speaker model.
Because Fisher supervectors achieve good recognition results in image classification and are easy to extract, the embodiments of the present invention introduce them into the field of voice recognition and study how well they work there. Fisher supervectors are also obtained from a UBM, so like JFA supervectors they have the structure of a GMM supervector, but with a higher dimension than JFA supervectors. In theory, Fisher supervectors therefore contain more redundant information, so a nonparametric discriminant analysis (NDA) algorithm is used to analyze and model them, as follows (see Fig. 2):
1) The redundant information contained in each Fisher sub-vector set is removed using the principal component analysis (PCA) algorithm, yielding the dimension-reducing projection matrix of each Fisher sub-vector set.
Specifically, principal component analysis (Principal Component Analysis, PCA) is used to remove the redundant information contained in the Fisher sub-vectors. In the nonparametric analysis part of Fig. 2, the sub-projection matrices $W_{11}, W_{21}, \ldots, W_{K1}$ in the projection-matrix expression of each Fisher sub-vector set are the optimal dimension-reducing projection matrices obtained by the PCA algorithm.
2) The dimension-reduced projection is processed with the within-class covariance normalization (WCCN) algorithm, yielding the subspace projection matrix of each Fisher sub-vector set.
Specifically, within-class covariance normalization (Within-Class Covariance Normalization, WCCN) is used to reduce the within-class variation of the same speaker caused by factors such as health or mood changes. The algorithm is applied to the feature vector set obtained after PCA projection. In the nonparametric analysis part of Fig. 2, the sub-projection matrices $W_{12}, W_{22}, \ldots, W_{K2}$ in the projection-matrix expression of each Fisher sub-vector set are the subspace projection matrices obtained by applying the WCCN feature normalization algorithm.
3) The discriminative information on the class boundaries of the subspace projection is extracted with the nonparametric linear discriminant analysis (NLDA) algorithm, yielding the nonparametric linear discriminant analysis projection matrix of each Fisher sub-vector set.
Specifically, nonparametric linear discriminant analysis is used to extract the discriminative information on the class boundaries and thereby increase the between-class difference. After the dimension reduction and the feature-normalization denoising of the preceding two steps, the dimension of the new features has been further reduced, which avoids a singular within-class scatter matrix in the final nonparametric linear discriminant analysis step. In the nonparametric analysis part of Fig. 2, the sub-projection matrices $W_{13}, W_{23}, \ldots, W_{K3}$ in the projection-matrix expression of each Fisher sub-vector set are the projection matrices of the nonparametric linear discriminant analysis algorithm. Nonparametric linear discriminant analysis (Nonparametric Linear Discriminant Analysis, NLDA) is an improvement of the linear discriminant analysis (Linear Discriminant Analysis, LDA) algorithm.
4) The projection matrix after PCA dimension reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix are combined in order to obtain the total subspace projection matrix, which serves as the subspace speaker model.
Specifically, after the subspace analysis above has been applied to each sub-vector set of the Fisher supervector, the projection matrix of each Fisher sub-vector set is obtained as the product of the three projection matrices above, $W_k = W_{k1} W_{k2} W_{k3}$. Once the projection matrices of all Fisher sub-vector sets have been obtained, they are concatenated in order to form the projection matrix of the whole Fisher supervector, $W_{Total} = [W_1 \ldots W_k \ldots W_K]$.
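Steps 1) to 4) can be sketched end to end as follows. This is a simplified illustration under several stated assumptions, not the patented implementation: the NLDA step uses an unweighted nonparametric between-class scatter built from the Q nearest neighbours in every other class (the sample weighting used in published NDA variants is omitted), $W_{Total}$ is applied block by block (each sub-vector is projected with its own $W_k$ and the results are concatenated), mean subtraction at projection time is omitted for brevity, and all function names and the random toy data are hypothetical.

```python
import numpy as np

def class_scatter(Y, ids):
    """Within-class scatter: sum over speakers of centred outer products."""
    S = np.zeros((Y.shape[1], Y.shape[1]))
    for s in np.unique(ids):
        C = Y[ids == s] - Y[ids == s].mean(axis=0)
        S += C.T @ C
    return S

def pca_matrix(F, rank):
    """W_k1: top principal directions of the centred sub-vectors, shape (D, rank)."""
    _, _, Vt = np.linalg.svd(F - F.mean(axis=0), full_matrices=False)
    return Vt[:rank].T

def wccn_matrix(Y, ids):
    """W_k2 with W_k2 W_k2^T = Sw^{-1}, so the within-class covariance becomes I."""
    Sw = class_scatter(Y, ids) / Y.shape[0]
    return np.linalg.cholesky(np.linalg.inv(Sw))

def nlda_matrix(Z, ids, Q, rank):
    """W_k3: simplified nonparametric LDA (no sample weighting)."""
    Sb = np.zeros((Z.shape[1], Z.shape[1]))
    speakers = np.unique(ids)
    for s in speakers:
        Zs = Z[ids == s]
        for o in speakers[speakers != s]:
            Zo = Z[ids == o]
            d2 = ((Zs[:, None, :] - Zo[None, :, :]) ** 2).sum(axis=2)
            nn = np.argsort(d2, axis=1)[:, :Q]        # Q nearest neighbours in class o
            delta = Zs - Zo[nn].mean(axis=1)          # offset to the local mean
            Sb += delta.T @ delta
    L_inv = np.linalg.inv(np.linalg.cholesky(class_scatter(Z, ids)))
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    top = np.argsort(evals)[::-1][:rank]              # largest eigenvalues first
    return L_inv.T @ evecs[:, top]

def train_subspace(F, ids, pca_rank, nlda_rank, Q):
    """Return W_k = W_k1 W_k2 W_k3 for one Fisher sub-vector set F of shape (N, D)."""
    ids = np.asarray(ids)
    Fc = F - F.mean(axis=0)
    W1 = pca_matrix(F, pca_rank)
    W2 = wccn_matrix(Fc @ W1, ids)
    W3 = nlda_matrix(Fc @ W1 @ W2, ids, Q, nlda_rank)
    return W1 @ W2 @ W3

# Toy data (random, illustrative only): 6 speakers x 8 utterances, two sub-vector sets.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(6), 8)
sub_sets = [rng.normal(size=(48, 20)) + np.repeat(rng.normal(size=(6, 20)), 8, axis=0)
            for _ in range(2)]
W = [train_subspace(F, ids, pca_rank=10, nlda_rank=4, Q=4) for F in sub_sets]
# Reference vector: project each sub-vector with its own W_k, then concatenate.
reference = np.concatenate([F[0] @ Wk for F, Wk in zip(sub_sets, W)])
```

The order of the three factors matters: PCA first shrinks the dimension so that the within-class scatter used by WCCN and NLDA is invertible, which is the singularity concern raised in step 3).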
In step S104, the reference vector of the speaker to be identified and the reference vector of the training-sample speaker are obtained according to the subspace speaker model, and the speaker to be identified is identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Specifically, in the speaker-model building and test phases, Fisher supervectors are first extracted from the speech of the training-sample speaker and of the speaker to be identified, using the same processing as when the total projection matrix was trained. The trained total projection matrix $W_{Total}$ then maps the Fisher supervectors to the low-dimensional subspace, giving the reference vectors $R_{train}$ and $R_{test}$ of the training-sample speaker and of the speaker to be identified, respectively. Finally, the cosine distance between the two reference vectors is computed according to the scoring formula and used as the test score;
When the test score is less than a predetermined value, the speaker to be identified and the training-sample speaker are judged to be the same speaker; otherwise, they are judged to be different speakers.
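The scoring rule can be sketched as follows. The scoring formula itself is not reproduced in this text, so the example assumes the test score is the cosine distance defined as one minus the cosine similarity; that choice makes a smaller score mean more similar vectors, which matches the "less than a predetermined value" decision rule above. The function names and the threshold value are hypothetical.

```python
import numpy as np

def cosine_similarity(r_train, r_test):
    """Cosine of the angle between the two reference vectors."""
    return float(r_train @ r_test / (np.linalg.norm(r_train) * np.linalg.norm(r_test)))

def cosine_distance(r_train, r_test):
    """Assumed score: 1 - cosine similarity, so smaller means more similar."""
    return 1.0 - cosine_similarity(r_train, r_test)

def same_speaker(r_train, r_test, threshold):
    """Decision rule from the text: score below the threshold means same speaker."""
    return cosine_distance(r_train, r_test) < threshold

r_train = np.array([1.0, 2.0, 3.0])
accept = same_speaker(r_train, 2.0 * r_train, threshold=0.5)               # same direction
reject = same_speaker(np.array([1.0, 0.0]), np.array([0.0, 1.0]), threshold=0.5)
```

If a system instead scores with the cosine similarity directly, the comparison has to be reversed, since a larger similarity indicates the same speaker.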
To verify the effectiveness of the proposed speaker recognition method based on Fisher supervectors, experiments compare the performance of a speaker recognition system based on Fisher supervectors with that of a speaker recognition system based on JFA supervectors.
The experimental data are taken from the NIST 2008 speaker recognition evaluation database. The male telephone-training, telephone-test portion of the core evaluation task is used as the evaluation set for training and test speech to measure the performance of the speaker recognition systems. The UBM training data come from the telephone speech in Switchboard II Phase 2, Switchboard II Phase 3, Switchboard Cellular Part 2, and NIST SRE 2004, 2005, and 2006; the UBM has 2048 Gaussian components.
The development data used to train the nonparametric subspace discriminant analysis projection matrices are telephone speech from the NIST SRE 2004, 2005, and 2006 databases, comprising 563 speakers with 8 utterances each.
In the nonparametric subspace discriminant analysis algorithm, the parameter Q, which controls the number of nearest-neighbour feature vectors, is set to 4. For both nonparametric subspace discriminant analysis and latent factor analysis, the number of partitions of the Fisher supervector is set to 16.
A JFA system serves as the baseline system. It uses the same UBM as above; the rank of the speaker-space loading matrix V is 300, the rank of the eigenchannel-space loading matrix U is 100, and the residual loading matrix D is formed by concatenating the diagonal elements of the diagonal covariance matrix of each Gaussian component in the UBM.
The first experiment examines system performance under different combinations of the ranks of the projection matrices in the nonparametric discriminant subspace analysis algorithm. Because the development data used to train the subspace projection matrices contain 563 speakers, the rank of the projection matrix $W_{k3}$ is at most 562. To extract the discriminative information on the class boundaries, the rank of $W_{k3}$ should not be too small, so it is set to 550 in this experiment. In addition, PCA performs the largest dimension reduction in the algorithm: if the rank of $W_{k1}$ is too large, the projected feature vectors contain too much redundant information, and if it is too small, necessary discriminative information is lost, so this step directly affects system performance. This part of the experiment mainly examines how system performance changes with the rank of the projection matrices. Table 1 shows the results of the nonparametric discriminant analysis based on Fisher supervectors:
Table 1
Table 1 shows that, for a given rank of the nonparametric linear discriminant analysis projection matrix $W_{k3}$, the performance of the nonparametric discriminant analysis speaker recognition system based on Fisher supervectors improves, within a certain range, as the rank of the PCA projection matrix $W_{k1}$ increases. System performance is best when the rank of $W_{k1}$ is 1300 (the EER, i.e. the equal error rate, is lowest, and the minDCF, i.e. the minimum detection cost, is 2.73). As the rank of $W_{k1}$ increases further, the projected feature vectors in the PCA subspace contain more redundant information, and system performance declines.
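For readers unfamiliar with the EER, the following is a small sketch of how it can be estimated from trial scores. It is not the evaluation tool used in the experiments. The function name and the toy scores are assumptions, and scores are treated here as similarities, where a higher value means the same speaker; a distance-type score would need its sign flipped first.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER: the operating point where miss and false alarm rates meet."""
    target_scores = np.asarray(target_scores)
    nontarget_scores = np.asarray(nontarget_scores)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, None
    for th in thresholds:
        p_miss = np.mean(target_scores < th)        # same-speaker trials rejected
        p_fa = np.mean(nontarget_scores >= th)      # different-speaker trials accepted
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer

# Toy scores (illustrative only).
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
```

Sweeping the same threshold and plotting the miss probability against the false alarm probability gives the kind of detection curve whose axes are described for Fig. 3.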
The second experiment compares the proposed speaker recognition system based on Fisher supervectors with the speaker recognition system based on JFA supervectors. In Fig. 3, the abscissa is the false alarm probability (False Alarm Probability) and the ordinate is the miss probability (Miss Probability). Although the Fisher+NDA system performs slightly worse than the JFA system, it does not need the acoustic features of a large amount of raw speech data to train a speaker information space and a channel information space; the LFA algorithm, using EM iteration, compresses the speaker information in the Fisher supervectors directly through the PCA subspaces into a lower-dimensional subspace. Consequently, in both parameter learning and score computation, the Fisher system has lower computational complexity and a shorter running time than the JFA system.
Embodiment 2:
Fig. 4 shows the structure of the speaker recognition system based on Fisher supervectors provided by Embodiment 2 of the present invention. For ease of description, only the parts relevant to the embodiment of the present invention are shown.
The speaker recognition system based on Fisher supervectors includes:
an extraction unit 41, configured to extract Fisher supervectors;
a division unit 42, configured to divide the extracted Fisher supervectors into multiple Fisher sub-vector sets;
a model establishing unit 43, configured to analyze each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
a recognition unit 44, configured to obtain the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and to identify the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
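Further, the Fisher supervector is formed by concatenating the statistics $\mathcal{G}^X_{\alpha_k}$, $\mathcal{G}^X_{\mu_k}$ and $\mathcal{G}^X_{\sigma_k}$ of all Gaussian components in the GMM, and its dimension is $(2d+1)K$, where:
$$\mathcal{G}^X_{\alpha_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left(\gamma_t(k) - w_k\right)$$
$$\mathcal{G}^X_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \left(\frac{x_t - \mu_k}{\sigma_k}\right)$$
$$\mathcal{G}^X_{\sigma_k} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \frac{1}{\sqrt{2}} \left[\frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1\right]$$
Here $\mathcal{G}^X_{\alpha_k}$ is a scalar, and $\mathcal{G}^X_{\mu_k}$ and $\mathcal{G}^X_{\sigma_k}$ are d-dimensional vectors, $d \ge 1$ (divisions and squares of vectors are taken element by element). $X = \{x_t, t = 1 \ldots T\}$ is the feature vector sequence of the speaker's speech, $x_t$ denotes a feature vector, $T$ denotes the number of feature vectors in the sequence $X$, $K$ denotes the number of Gaussian components in the GMM, and
$$\gamma_t(k) = \frac{w_k\, p_k(x_t)}{\sum_{j=1}^{K} w_j\, p_j(x_t)}$$
The k-th Gaussian component in the GMM is
$$p_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)\right\}$$
$w_k$ denotes the weight of the k-th Gaussian component in the GMM, with
$$w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$$
$\mu_k$ denotes the mean vector of the k-th Gaussian component in the GMM, $\Sigma_k$ denotes the covariance matrix of the k-th Gaussian component, and $\sigma_k^2$ denotes the elements on the diagonal of $\Sigma_k$.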
Further, the model establishing unit 43 includes:
a first processing module 431, configured to remove the redundant information contained in each Fisher sub-vector set using the principal component analysis (PCA) algorithm, obtaining the dimension-reducing projection matrix of each Fisher sub-vector set;
a second processing module 432, configured to process the dimension-reduced projection using the within-class covariance normalization (WCCN) algorithm, obtaining the subspace projection matrix of each Fisher sub-vector set;
a third processing module 433, configured to extract the discriminative information on the class boundaries of the subspace projection using the nonparametric linear discriminant analysis (NLDA) algorithm, obtaining the nonparametric linear discriminant analysis projection matrix of each Fisher sub-vector set;
a model establishing module 434, configured to combine, in order, the projection matrix after PCA dimension reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix, obtaining the total subspace projection matrix.
Further, the recognition unit 44 includes:
a computing module 441, configured to obtain the reference vector $R_{test}$ of the speaker to be identified and the reference vector $R_{train}$ of the training-sample speaker according to the subspace speaker model, and to compute the cosine distance between the two reference vectors according to the scoring formula as the test score;
an identification module 442, configured to judge, when the test score is less than a predetermined value, that the speaker to be identified and the training-sample speaker are the same speaker.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the functions above may be assigned to different functional units or modules as needed; that is, the internal structure of the system may be divided into different functional units or modules to perform all or part of the functions described above. The functional units in the embodiments may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the system above, refer to the corresponding process in the foregoing method embodiment; details are not repeated here.
In summary, the embodiments of the present invention extract the Fisher supervectors of the voice data as the speaker's feature vectors and perform speaker recognition on them using subspace analysis modeling. Because Fisher supervectors are simple to extract, have a higher dimension than JFA supervectors, and require no channel compensation processing, the accuracy and efficiency of speaker recognition are effectively improved, providing strong usability and practicality.
Those of ordinary skill in the art will appreciate that the exemplary units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

  1. A speaker recognition method based on Fisher supervectors, characterized in that the method comprises:
    extracting Fisher supervectors;
    dividing the extracted Fisher supervectors into multiple Fisher subvector sets;
    analyzing each Fisher subvector set based on a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
    obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and identifying the speaker to be identified according to a preset computation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker;
    the Fisher supervector is formed by concatenating the quantities corresponding to all Gaussian components in the GMM model, and the dimension of the Fisher supervector is (2d+1)K, wherein:
    one of the quantities is a scalar and the other two are d-dimensional vectors, d ≥ 1; the feature vector sequence of the speaker's speech is X = {x_t, t = 1...T}, where x_t denotes a feature vector, T denotes the number of feature vectors in the feature vector sequence X, and K denotes the number of Gaussian components in the GMM model;
    for the k-th Gaussian component in the GMM model, w_k denotes its weight, μ_k denotes its mean vector, and Σ_k denotes its covariance matrix, with a separate symbol denoting the elements on the diagonal of Σ_k.
  2. The method according to claim 1, characterized in that analyzing each Fisher subvector set based on the nonparametric discriminant analysis algorithm to establish the subspace speaker model comprises:
    removing the redundant information contained in each Fisher subvector set using the principal component analysis (PCA) algorithm to obtain the dimension-reduced projection matrix of each Fisher subvector set;
    processing the dimension-reduced projection matrix using the within-class covariance normalization (WCCN) algorithm to obtain the subspace projection matrix corresponding to each Fisher subvector set;
    extracting the discriminative information at the class boundaries of the subspace projection matrix using the nonparametric linear discriminant analysis (NLDA) algorithm to obtain the nonparametric linear discriminant analysis projection matrix of each Fisher subvector set;
    concatenating, in that order, the projection matrix after PCA dimensionality reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix to obtain the total subspace projection matrix.
  3. The method according to claim 1, characterized in that the step of obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and performing speaker recognition according to the preset computation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker, comprises:
    obtaining the reference vector R_train of the speaker to be identified and the reference vector R_test of the training-sample speaker according to the subspace speaker model, and computing, according to the formula, the cosine distance between the two reference vectors as the test score;
    when the test score is less than a predetermined value, judging that the speaker to be identified and the training-sample speaker are the same speaker.
  4. A speaker recognition system based on Fisher supervectors, characterized in that the system comprises:
    an extraction unit, configured to extract Fisher supervectors;
    a division unit, configured to divide the extracted Fisher supervectors into multiple Fisher subvector sets;
    a model establishing unit, configured to analyze each Fisher subvector set based on a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
    a recognition unit, configured to obtain the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and to identify the speaker to be identified according to a preset computation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker;
    the Fisher supervector is formed by concatenating the quantities corresponding to all Gaussian components in the GMM model, and the dimension of the Fisher supervector is (2d+1)K, wherein:
    one of the quantities is a scalar and the other two are d-dimensional vectors, d ≥ 1; the feature vector sequence of the speaker's speech is X = {x_t, t = 1...T}, where x_t denotes a feature vector, T denotes the number of feature vectors in the feature vector sequence X, and K denotes the number of Gaussian components in the GMM model;
    for the k-th Gaussian component in the GMM model, w_k denotes its weight, μ_k denotes its mean vector, and Σ_k denotes its covariance matrix, with a separate symbol denoting the elements on the diagonal of Σ_k.
  5. The system according to claim 4, characterized in that the model establishing unit comprises:
    a first processing module, configured to remove the redundant information contained in each Fisher subvector set using the principal component analysis (PCA) algorithm to obtain the dimension-reduced projection matrix of each Fisher subvector set;
    a second processing module, configured to process the dimension-reduced projection matrix using the within-class covariance normalization (WCCN) algorithm to obtain the subspace projection matrix corresponding to each Fisher subvector set;
    a third processing module, configured to extract the discriminative information at the class boundaries of the subspace projection matrix using the nonparametric linear discriminant analysis (NLDA) algorithm to obtain the nonparametric linear discriminant analysis projection matrix of each Fisher subvector set;
    a model building module, configured to concatenate, in that order, the projection matrix after PCA dimensionality reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix to obtain the total subspace projection matrix.
  6. The system according to claim 4, characterized in that the recognition unit comprises:
    a computing module, configured to obtain the reference vector R_train of the speaker to be identified and the reference vector R_test of the training-sample speaker according to the subspace speaker model, and to compute, according to the formula, the cosine distance between the two reference vectors as the test score;
    an identification module, configured to judge, when the test score is less than a predetermined value, that the speaker to be identified and the training-sample speaker are the same speaker.
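
The subspace modeling recited in claims 2 and 5 can be illustrated, for a single Fisher subvector set, by the following minimal sketch of the first two stages only (PCA followed by WCCN). The nonparametric linear discriminant analysis stage is deliberately left out, and composing the stages as a matrix product is an assumption about how the projection matrices are joined in sequence. All function names are hypothetical.

```python
import numpy as np


def pca_projection(X, n_components):
    """X: (N, D) subvectors. Returns a (D, n_components) projection matrix."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components].T


def wccn_projection(Y, labels):
    """Y: (N, p) PCA-reduced vectors, labels: (N,) speaker identities.

    Returns B with B @ B.T equal to the inverse of the average
    within-class covariance, so that Y @ B has identity within-class covariance.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    p = Y.shape[1]
    W = np.zeros((p, p))
    for c in classes:
        Yc = Y[labels == c]
        diff = Yc - Yc.mean(axis=0)
        W += diff.T @ diff / len(Yc)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))


def train_subspace(X, labels, n_components):
    """Compose PCA and WCCN into one (D, n_components) projection (assumed product)."""
    X = np.asarray(X, dtype=float)
    P = pca_projection(X, n_components)
    B = wccn_projection((X - X.mean(axis=0)) @ P, labels)
    return P @ B
```

A reference vector for one subvector would then be obtained as `x @ A`, where `A` is the matrix returned by `train_subspace`; the within-class covariance must be invertible, which requires enough training samples per speaker relative to `n_components`.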
CN201410802816.6A 2014-12-19 2014-12-19 Speaker recognition method and system based on Fisher supervectors Active CN104538035B (en)

Publications (2)

Publication Number Publication Date
CN104538035A (en) 2015-04-22
CN104538035B (en) 2018-05-01

Family

ID=52853551

Country Status (1)

Country Link
CN (1) CN104538035B (en)


Publication Publication Date Title
CN104538035B (en) A kind of method for distinguishing speek person and system based on Fisher super vectors
CN107680600B (en) Sound-groove model training method, audio recognition method, device, equipment and medium
CN104167208B (en) A kind of method for distinguishing speek person and device
Liu et al. GMM and CNN hybrid method for short utterance speaker recognition
CN107610707B (en) A kind of method for recognizing sound-groove and device
Bonastre et al. ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization based
CN109461073A (en) Risk management method, device, computer equipment and the storage medium of intelligent recognition
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN103854645B (en) A kind of based on speaker's punishment independent of speaker's speech-emotion recognition method
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
CN110516696A (en) It is a kind of that emotion identification method is merged based on the adaptive weighting bimodal of voice and expression
CN108597496A (en) Voice generation method and device based on generation type countermeasure network
CN107342077A (en) A kind of speaker segmentation clustering method and system based on factorial analysis
CN108281146A (en) A kind of phrase sound method for distinguishing speek person and device
CN102324232A (en) Method for recognizing sound-groove and system based on gauss hybrid models
CN108932950A (en) It is a kind of based on the tag amplified sound scenery recognition methods merged with multifrequency spectrogram
CN101923855A (en) Test-irrelevant voice print identifying system
CN108922544A (en) General vector training method, voice clustering method, device, equipment and medium
CN109256138A (en) Auth method, terminal device and computer readable storage medium
CN110047504A (en) Method for distinguishing speek person under identity vector x-vector linear transformation
CN109817222A (en) A kind of age recognition methods, device and terminal device
CN109378014A (en) A kind of mobile device source discrimination and system based on convolutional neural networks
CN109800309A (en) Classroom Discourse genre classification methods and device
CN109473102A (en) A kind of robot secretary intelligent meeting recording method and system

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant