CN104538035B - A speaker recognition method and system based on Fisher supervectors
Publication number: CN104538035B (application CN201410802816.6A)
Abstract
The present invention is applicable to the technical field of speech recognition and provides a speaker recognition method and system based on Fisher supervectors. The method comprises: extracting Fisher supervectors; dividing the extracted Fisher supervectors into multiple Fisher sub-vector sets; analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model; and obtaining, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training-sample speaker, then identifying the speaker to be identified according to a preset calculation rule and the two reference vectors. The invention uses the Fisher supervector of the speech data as the individual information characterizing a speaker and performs speaker recognition on Fisher supervectors using subspace analysis modeling, which effectively improves the recognition performance of the system.
Description
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to a speaker recognition method and system based on Fisher supervectors.
Background
With the continuous progress of computer and Internet technology, smart devices have become increasingly indispensable in people's lives. Voice interaction, one of the modes of interaction between people and smart devices, has become a research hotspot because speech is easy to collect, easy to store, difficult to imitate, and cheap to acquire.
Current intelligent speech processing is broadly divided, according to the speech information it uses, into speech recognition, language recognition, speaker recognition, and so on. Speech recognition aims to determine what semantic information is conveyed in the speech signal; language recognition aims to identify the language or dialect to which the speech signal belongs; speaker recognition identifies the speaker by extracting personal features that characterize the speaker.
Speech is an important carrier of identity information. Compared with other biometric features such as the face and fingerprints, speech is cheap to acquire, simple to use, and convenient for remote data collection, and a speech-based human-machine interface is friendlier. Speaker recognition has therefore become an important automatic identity authentication technology.
Speaker recognition methods in common use today include recognition based on the Gaussian mixture model-universal background model (GMM-UBM). Although the GMM-UBM model has a certain robustness to noise, it does not take the influence of the channel into account during training, so its recognition performance drops sharply when the training speech and the test speech come from different channels.
To overcome the loss of recognition performance caused by channel mismatch, the prior art proposes speaker recognition by joint factor analysis (JFA) based on the GMM-UBM model. However, JFA theory is built on the framework of the GMM-UBM model. It assumes that the main information contained in a speaker's GMM mean supervector can be mapped into two mutually independent low-dimensional subspaces, and it estimates the space loading matrices with the EM iterative algorithm within the GMM framework, so the computation cannot depart from that framework. A JFA-based speaker recognition method performs channel compensation on the speaker model during testing according to the estimated parameters, and its test performance is poor.
Summary of the invention
In view of this, the embodiments of the present invention provide a speaker recognition method and system based on Fisher supervectors, which use the high-dimensional Fisher supervector of the speech data as the individual information characterizing a speaker and perform speaker recognition on these high-dimensional feature vectors using subspace analysis modeling, so as to improve the recognition performance of the system.
The embodiments of the present invention are realized as a speaker recognition method based on Fisher supervectors, the method comprising:
extracting Fisher supervectors;
dividing the extracted Fisher supervectors into multiple Fisher sub-vector sets;
analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
obtaining, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training-sample speaker, and identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Another object of the embodiments of the present invention is to provide a speaker recognition system based on Fisher supervectors, the system comprising:
an extraction unit, for extracting Fisher supervectors;
a division unit, for dividing the extracted Fisher supervectors into multiple Fisher sub-vector sets;
a model establishing unit, for analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
a recognition unit, for obtaining, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training-sample speaker, and for identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The embodiments extract the Fisher supervector of the speech data as the speaker's feature vector and perform speaker recognition on Fisher supervectors using subspace analysis modeling. Because the Fisher supervector is simple to extract, has a higher dimension than the JFA supervector, and has not undergone channel compensation, the accuracy and efficiency of speaker recognition are effectively improved. In addition, the embodiments need no additional hardware in the recognition process, which effectively reduces cost and gives them strong usability and practicality.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the speaker recognition method based on Fisher supervectors provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the nonparametric discriminant analysis based on Fisher supervectors provided by Embodiment 1 of the present invention;
Fig. 3 is a plot comparing the results of the speaker recognition system based on Fisher supervectors provided by Embodiment 1 of the present invention with those of a speaker recognition system based on JFA supervectors;
Fig. 4 is a structural diagram of the speaker recognition system based on Fisher supervectors provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present invention can be thoroughly understood. It will be clear to those skilled in the art, however, that the present invention can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The technical solutions of the present invention are illustrated below through specific embodiments.
Embodiment 1:
Fig. 1 shows the implementation flow of the speaker recognition method based on Fisher supervectors provided by Embodiment 1 of the present invention. The process is detailed as follows:
In step S101, Fisher supervectors are extracted.
In this embodiment, to further improve the accuracy and efficiency of speaker recognition, the Fisher supervector of the speech data is extracted as the speaker's feature vector.
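The Fisher supervector is formed by concatenating the statistics $\mathcal{G}_{\alpha_k}^{X}$, $\mathcal{G}_{\mu_k}^{X}$ and $\mathcal{G}_{\sigma_k}^{X}$ of all Gaussian components in the GMM model, and its dimension is $(2d+1)K$, where:
$$\mathcal{G}_{\alpha_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
$$\mathcal{G}_{\mu_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \, \frac{x_t - \mu_k}{\sigma_k}$$
$$\mathcal{G}_{\sigma_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \, \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
Here $\mathcal{G}_{\alpha_k}^{X}$ is a scalar, and $\mathcal{G}_{\mu_k}^{X}$ and $\mathcal{G}_{\sigma_k}^{X}$ are $d$-dimensional vectors with $d \geq 1$ (the divisions and squares on vectors are element-wise). $X = \{x_t, t = 1 \ldots T\}$ is the feature vector sequence of the speaker's speech, $x_t$ denotes a feature vector, $T$ is the number of feature vectors in the sequence $X$, and $K$ is the number of Gaussian components in the GMM model. $\gamma_t(k)$ is the occupancy of feature vector $x_t$ for the $k$-th Gaussian component $p_k$ of the GMM model, $w_k$ is the weight of the $k$-th Gaussian component, $\mu_k$ is its mean vector, $\Sigma_k$ is its covariance matrix, and $\sigma_k^2$ denotes the elements on the diagonal of $\Sigma_k$.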
In detail: let the feature vector sequence from one utterance be $X = \{x_t, t = 1 \ldots T\}$, where the feature vectors $x_t$ are mutually independent. The Fisher supervector of $X$ can then be expressed as:
$$\mathcal{G}_{\lambda}^{X} = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log p_\lambda(x_t)$$
where $L_\lambda$ is the square root of the inverse of the Fisher information matrix $F_\lambda$, that is, $F_\lambda^{-1} = L_\lambda^{\top} L_\lambda$.
Under the assumption that the feature vectors are mutually independent, the Fisher supervector can be regarded as the sum of the normalized gradient statistics of the individual feature vectors. The operator
$$x_t \mapsto L_\lambda \nabla_\lambda \log p_\lambda(x_t)$$
can be regarded as embedding a feature vector $x_t$ as a point in a higher-dimensional space, which makes it easier to build a linear classifier. Note also that the independence assumption often does not hold in practice; a corresponding treatment for this problem is mentioned below.
Since a GMM can accurately model any continuous distribution, the probability density model $p_\lambda$ is assumed to be a GMM. To obtain the Fisher supervector of each utterance, a universal background model that is independent of speaker and channel information is needed; $p_\lambda$ is a universal background GMM with a large number of Gaussian components, trained on a large amount of speech data from different speakers and different channels. Assuming the GMM has $K$ Gaussian components, its parameters can be written as $\lambda = \{w_k, \mu_k, \Sigma_k, k = 1, \ldots, K\}$, where $w_k$, $\mu_k$ and $\Sigma_k$ are the weight, mean vector, and covariance matrix of the $k$-th Gaussian component. The GMM is given by:
$$p_\lambda(x) = \sum_{k=1}^{K} w_k \, p_k(x)$$
where $p_k$ denotes the $k$-th Gaussian component of the GMM:
$$p_k(x) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right\}$$
and the following conditions hold:
$$w_k \geq 0 \;\; \text{for all } k, \qquad \sum_{k=1}^{K} w_k = 1$$
To ensure that $p_\lambda(x)$ describes the distribution of the training data effectively, the covariance matrix of each Gaussian component is assumed to be diagonal, and the elements on its diagonal are denoted by the vector $\sigma_k^2$.
In addition, to avoid enforcing the constraints above directly on the weights $w_k$ of the Gaussian components, a parameter $\alpha_k$ is introduced and the weight $w_k$ is expressed in the following form:
$$w_k = \frac{\exp(\alpha_k)}{\sum_{j=1}^{K} \exp(\alpha_j)}$$
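The reparameterization of the weights can be checked numerically. The following is a minimal sketch, assuming the mapping from $\alpha_k$ to $w_k$ is a softmax; the function name `weights_from_alpha` and the sample values are illustrative and do not come from the patent.

```python
import numpy as np

def weights_from_alpha(alpha):
    """Map unconstrained parameters alpha_k to mixture weights w_k (softmax)."""
    alpha = np.asarray(alpha, dtype=float)
    shifted = alpha - alpha.max()   # subtracting the max avoids overflow in exp
    expd = np.exp(shifted)
    return expd / expd.sum()

# Illustrative values only
alpha = np.array([0.5, -1.0, 2.0])
w = weights_from_alpha(alpha)
```

Whatever values the $\alpha_k$ take, the resulting weights are positive and sum to one, so the constraints never have to be enforced explicitly.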
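The GMM parameters can then be rewritten as $\lambda = \{\alpha_k, \mu_k, \sigma_k, k = 1, \ldots, K\}$, and the gradient of a feature vector $x_t$ with respect to the GMM parameters takes the following form:
$$\nabla_{\alpha_k} \log p_\lambda(x_t) = \gamma_t(k) - w_k$$
$$\nabla_{\mu_k} \log p_\lambda(x_t) = \gamma_t(k) \, \frac{x_t - \mu_k}{\sigma_k^2}$$
$$\nabla_{\sigma_k} \log p_\lambda(x_t) = \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right]$$
In these equations $\gamma_t(k)$ denotes the occupancy of feature vector $x_t$ for the $k$-th Gaussian component of the GMM, which can be computed as the posterior probability:
$$\gamma_t(k) = \frac{w_k \, p_k(x_t)}{\sum_{j=1}^{K} w_j \, p_j(x_t)}$$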
With the gradient expressions above, the next problem is to find the square root of the inverse of the Fisher information matrix. The posterior probabilities are typically very sparse: a feature vector $x_t$ has a high occupancy for only one Gaussian component and low occupancies for all the others. In terms of the spatial distribution of the feature vectors, this means that a feature vector lies close to the center of one Gaussian function and far from the centers of the others. Because of this sparsity, the Fisher information matrix is diagonal, which yields the following normalized gradient expressions:
$$\mathcal{G}_{\alpha_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
$$\mathcal{G}_{\mu_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \, \frac{x_t - \mu_k}{\sigma_k}$$
$$\mathcal{G}_{\sigma_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \, \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
In these equations $\mathcal{G}_{\alpha_k}^{X}$ is a scalar, and $\mathcal{G}_{\mu_k}^{X}$ and $\mathcal{G}_{\sigma_k}^{X}$ are $d$-dimensional vectors. The final Fisher supervector is obtained by concatenating the three statistics $\mathcal{G}_{\alpha_k}^{X}$, $\mathcal{G}_{\mu_k}^{X}$ and $\mathcal{G}_{\sigma_k}^{X}$ of all Gaussian components in the GMM, and its dimension is $(2d+1)K$.
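The extraction in step S101 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patent's implementation: it uses the standard normalized-gradient form of the Fisher vector for a diagonal-covariance GMM, and the function name `fisher_supervector`, the ordering of the concatenation (all weight statistics, then all mean statistics, then all variance statistics), and the toy data are choices made here.

```python
import numpy as np

def fisher_supervector(X, w, mu, sigma2):
    """X: (T, d) features; w: (K,) weights; mu, sigma2: (K, d) means and
    diagonal variances. Returns a vector of length (2d+1)K laid out as
    [G_alpha (K) | G_mu (K*d) | G_sigma (K*d)] (layout is an assumption)."""
    X = np.asarray(X, dtype=float)
    sigma = np.sqrt(sigma2)
    # Standardized differences (x_t - mu_k) / sigma_k, shape (T, K, d)
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    # log N(x_t; mu_k, diag(sigma2_k)), shape (T, K)
    log_gauss = (-0.5 * (diff ** 2).sum(-1)
                 - 0.5 * np.log(2 * np.pi * sigma2).sum(-1)[None, :])
    log_joint = np.log(w)[None, :] + log_gauss
    # Posteriors gamma_t(k); shift by the row max before exp for stability
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)
    inv_sqrt_w = 1.0 / np.sqrt(w)
    g_alpha = inv_sqrt_w * (gamma - w[None, :]).sum(axis=0)             # (K,)
    g_mu = inv_sqrt_w[:, None] * np.einsum("tk,tkd->kd", gamma, diff)   # (K, d)
    g_sigma = inv_sqrt_w[:, None] * np.einsum(
        "tk,tkd->kd", gamma, (diff ** 2 - 1.0) / np.sqrt(2.0))          # (K, d)
    return np.concatenate([g_alpha, g_mu.ravel(), g_sigma.ravel()])

# Toy data standing in for acoustic features and a trained UBM
rng = np.random.default_rng(0)
T, d, K = 50, 3, 4
X = rng.normal(size=(T, d))
w = np.array([0.1, 0.2, 0.3, 0.4])
mu = rng.normal(size=(K, d))
sigma2 = rng.uniform(0.5, 2.0, size=(K, d))
fv = fisher_supervector(X, w, mu, sigma2)
```

With $d = 3$ and $K = 4$ the result has $(2 \cdot 3 + 1) \cdot 4 = 28$ entries, matching the dimension $(2d+1)K$ stated in the text.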
In step S102, the extracted Fisher supervectors are divided into multiple Fisher sub-vector sets.
Specifically, all Gaussian mean vectors of the UBM model are clustered using the GMM algorithm, and according to the clustering result the Fisher supervector is divided into multiple Fisher sub-vector sets. The division may be equal or unequal.
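The division in step S102 can be sketched as follows. This sketch assumes that a cluster label is already available for each Gaussian mean (from whichever clustering is used) and that the supervector is laid out as [weight statistics | mean statistics | variance statistics]; both the layout and the helper names `component_indices` and `split_supervector` are assumptions made for illustration.

```python
import numpy as np

def component_indices(k, K, d):
    """Indices of the (2d+1) supervector entries belonging to Gaussian k,
    assuming the layout [G_alpha (K) | G_mu (K*d) | G_sigma (K*d)]."""
    a = np.array([k])
    m = K + k * d + np.arange(d)
    s = K + K * d + k * d + np.arange(d)
    return np.concatenate([a, m, s])

def split_supervector(fv, labels, d):
    """Group supervector entries by the cluster label of their Gaussian."""
    labels = np.asarray(labels)
    K = labels.size
    subsets = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        idx = np.concatenate([component_indices(k, K, d) for k in members])
        subsets.append(fv[idx])
    return subsets

K, d = 6, 2
fv = np.arange((2 * d + 1) * K, dtype=float)   # placeholder supervector
labels = np.array([0, 1, 0, 2, 1, 1])          # hypothetical cluster labels
subsets = split_supervector(fv, labels, d)
```

Because the clusters here hold 2, 3, and 1 Gaussians, the three sub-vectors have 10, 15, and 5 entries, which is an example of the unequal division mentioned in the text.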
In step S103, each Fisher sub-vector set is analyzed with a nonparametric discriminant analysis algorithm to establish a subspace speaker model.
Because Fisher supervectors have achieved good recognition results in image classification and are easy to extract, this embodiment introduces them into the field of speech recognition and studies their effect there. Because Fisher supervectors are also derived from the UBM model, they have the structure of a GMM supervector, as JFA supervectors do, but with a higher dimension than JFA supervectors. In theory, Fisher supervectors contain more redundant information, so a nonparametric discriminant analysis (NDA) algorithm is needed to analyze and model them, as follows (see Fig. 2):
1) Remove the redundant information contained in each Fisher sub-vector set using the principal component analysis (PCA) algorithm, obtaining the dimension-reducing projection matrix of each Fisher sub-vector set.
Specifically, PCA is used to remove the redundant information contained in the Fisher sub-vectors. In the nonparametric analysis part shown in Fig. 2, the sub-projection matrices $W_{11}, W_{21}, \ldots, W_{K1}$ in the projection-matrix expression of each Fisher sub-vector set are the optimal dimension-reducing projection matrices obtained by PCA.
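The PCA step can be sketched as follows for one Fisher sub-vector set. The sketch computes the principal directions from the singular value decomposition of the centered data; the function name `pca_projection`, the toy sizes, and the chosen rank are illustrative and are not taken from the patent.

```python
import numpy as np

def pca_projection(X, rank):
    """X: (N, D), one Fisher sub-vector per row.
    Returns (mean, W1) where W1 has shape (D, rank)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:rank].T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))        # toy stand-in for a sub-vector set
mean, W1 = pca_projection(X, rank=5)
Y = (X - mean) @ W1                  # dimension-reduced features, shape (40, 5)
```

The columns of `W1` are orthonormal, and the rank plays the role of the rank of $W_{k1}$ discussed in the experiments.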
2) Process the dimension-reduced projection using the within-class covariance normalization (WCCN) algorithm, obtaining the subspace projection matrix of each Fisher sub-vector set.
Specifically, WCCN is used to reduce the within-class differences of the same speaker caused by factors such as health condition or emotional change; the algorithm is applied to the set of feature vectors obtained after PCA projection. In the nonparametric analysis part shown in Fig. 2, the sub-projection matrices $W_{12}, W_{22}, \ldots, W_{K2}$ in the projection-matrix expression of each Fisher sub-vector set are the subspace projection matrices obtained by the WCCN feature normalization algorithm.
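The WCCN step can be sketched as follows. The sketch uses the common formulation in which the projection is the Cholesky factor of the inverse within-class covariance matrix; whether the patent uses exactly this formulation is an assumption, and the names `within_class_cov` and `wccn_projection` are illustrative.

```python
import numpy as np

def within_class_cov(Y, speakers):
    """Average of the per-speaker covariance matrices of the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    speakers = np.asarray(speakers)
    classes = np.unique(speakers)
    D = Y.shape[1]
    Sw = np.zeros((D, D))
    for s in classes:
        Ys = Y[speakers == s]
        centered = Ys - Ys.mean(axis=0)
        Sw += centered.T @ centered / len(Ys)
    return Sw / len(classes)

def wccn_projection(Y, speakers):
    """Returns B of shape (D, D) with B @ B.T equal to the inverse
    within-class covariance, so that Y @ B has identity within-class covariance."""
    Sw = within_class_cov(Y, speakers)
    return np.linalg.cholesky(np.linalg.inv(Sw))

rng = np.random.default_rng(0)
speakers = np.repeat(np.arange(6), 10)             # 6 toy speakers, 10 vectors each
Y = rng.normal(size=(60, 4)) * np.array([1.0, 2.0, 0.5, 3.0]) + speakers[:, None]
B = wccn_projection(Y, speakers)
Z = Y @ B                                           # normalized features
```

After the projection, the within-class covariance of `Z` is the identity matrix, which is what reduces the influence of within-speaker variation on the subsequent discriminant analysis.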
3) Extract the discriminative information on the class boundaries of the subspace projection using the nonparametric linear discriminant analysis (NLDA) algorithm, obtaining the NLDA projection matrix of each Fisher sub-vector set.
Specifically, the NLDA algorithm is used to extract the discriminative information on the class boundaries so as to increase the between-class differences. After the dimensionality reduction and the feature normalization of the two preceding steps, the new feature dimension is further reduced, which avoids a singular within-class scatter matrix in the final NLDA step. In the nonparametric analysis part in Fig. 2, the sub-projection matrices $W_{13}, W_{23}, \ldots, W_{K3}$ in the projection-matrix expression of each Fisher sub-vector set are the projection matrices of the NLDA algorithm. Nonparametric linear discriminant analysis (NLDA) is an improvement on the linear discriminant analysis (LDA) algorithm.
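The idea of extracting class-boundary information can be sketched as follows. This is a simplified illustration, not the patent's algorithm: it builds a nonparametric between-class scatter matrix from the mean of the Q nearest neighbours in every other class, omits the sample weighting function used in full nonparametric discriminant analysis, and assumes the within-class scatter has already been normalized (for example by WCCN) so that the leading eigenvectors of the between-class scatter can serve as the projection. All function names are hypothetical.

```python
import numpy as np

def nonparametric_between_scatter(Z, speakers, Q=4):
    """Between-class scatter built from local differences: each sample is
    compared with the mean of its Q nearest neighbours in every other class."""
    Z = np.asarray(Z, dtype=float)
    speakers = np.asarray(speakers)
    classes = np.unique(speakers)
    D = Z.shape[1]
    Sb = np.zeros((D, D))
    for i in classes:
        for x in Z[speakers == i]:
            for j in classes:
                if j == i:
                    continue
                Zj = Z[speakers == j]
                dist = np.linalg.norm(Zj - x, axis=1)
                nearest = Zj[np.argsort(dist)[:Q]]   # Q nearest neighbours in class j
                delta = x - nearest.mean(axis=0)     # local between-class difference
                Sb += np.outer(delta, delta)
    return Sb / len(Z)

def nlda_projection(Z, speakers, rank, Q=4):
    """Leading eigenvectors of the nonparametric between-class scatter."""
    Sb = nonparametric_between_scatter(Z, speakers, Q)
    eigvals, eigvecs = np.linalg.eigh(Sb)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:rank]
    return eigvecs[:, order]

rng = np.random.default_rng(0)
# Two toy speakers separated along the first axis
Z = np.vstack([rng.normal(scale=0.1, size=(20, 3)),
               rng.normal(scale=0.1, size=(20, 3)) + np.array([5.0, 0.0, 0.0])])
speakers = np.repeat([0, 1], 20)
W3 = nlda_projection(Z, speakers, rank=1, Q=4)
```

In this toy case the recovered direction is close to the first coordinate axis, the only direction that separates the two speakers. The parameter `Q` corresponds to the number of neighbouring feature vectors, which the experiments set to 4.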
4) Join, in sequence, the projection matrix after PCA dimensionality reduction, the subspace projection matrix after WCCN, and the NLDA projection matrix to obtain the total subspace projection matrix, which serves as the subspace speaker model.
Specifically, after the subspace analysis above has been applied to each sub-vector set of the Fisher supervector, the projection matrix of each Fisher sub-vector set is obtained as the product of the three projection matrices above, $W_k = W_{k1} W_{k2} W_{k3}$. Once the projection matrices of all Fisher sub-vector sets have been obtained, they are joined together in order to form the projection matrix of the whole Fisher supervector, $W_{Total} = [W_1 \ldots W_k \ldots W_K]$.
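The composition of the projections can be sketched as follows. The sketch assumes that each $W_k$ is applied to its own sub-vector and that the projected pieces are concatenated to form the low-dimensional vector, which is one reading of the joined matrix $W_{Total}$; mean removal is omitted, and the function name, matrix sizes, and random values are purely illustrative.

```python
import numpy as np

def project_supervector(subvectors, projections):
    """subvectors: list of 1-D arrays, one per Fisher sub-vector set.
    projections: list of matrices W_k = W_k1 @ W_k2 @ W_k3, shape (D_k, r_k).
    Returns the concatenated low-dimensional vector."""
    return np.concatenate([W.T @ v for v, W in zip(subvectors, projections)])

rng = np.random.default_rng(0)
# Hypothetical sizes: two sub-vector sets of dimension 8 and 6
W1 = rng.normal(size=(8, 5)) @ rng.normal(size=(5, 5)) @ rng.normal(size=(5, 3))
W2 = rng.normal(size=(6, 4)) @ rng.normal(size=(4, 4)) @ rng.normal(size=(4, 2))
v1, v2 = rng.normal(size=8), rng.normal(size=6)
R = project_supervector([v1, v2], [W1, W2])
```

Here the three factors of each product stand for the PCA, WCCN, and NLDA matrices of one sub-vector set, and the output has 3 + 2 = 5 entries.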
In step S104, the reference vector of the speaker to be identified and the reference vector of the training-sample speaker are obtained according to the subspace speaker model, and the speaker to be identified is identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
Specifically, in the modeling and test stages for the training-sample speaker, the Fisher supervectors of the speech of the training-sample speaker and of the speaker to be identified are first extracted in the same way as when the total projection matrix was trained. The trained total projection matrix $W_{Total}$ is then used to map the Fisher supervectors into the low-dimensional subspace, giving the reference vectors $R_{train}$ and $R_{test}$ of the training-sample speaker and of the speaker to be identified, respectively. Finally, the cosine distance between the two reference vectors is calculated and used as the test score.
When the test score is less than a predetermined value, the speaker to be identified and the training-sample speaker are judged to be the same speaker; otherwise, they are judged to be different speakers.
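The scoring and decision rule can be sketched as follows. The scoring formula itself did not survive in this text, so the sketch assumes that the cosine distance is one minus the cosine similarity; under that assumption a smaller score means more similar vectors, which agrees with the stated rule that a score below the predetermined value indicates the same speaker. The threshold value used here is arbitrary.

```python
import numpy as np

def cosine_distance(r_train, r_test):
    """One minus the cosine similarity of two reference vectors
    (0 when they point in the same direction)."""
    r_train = np.asarray(r_train, dtype=float)
    r_test = np.asarray(r_test, dtype=float)
    cos_sim = r_train @ r_test / (np.linalg.norm(r_train) * np.linalg.norm(r_test))
    return 1.0 - cos_sim

def same_speaker(r_train, r_test, threshold):
    """Decision rule: same speaker when the score is below the threshold."""
    return bool(cosine_distance(r_train, r_test) < threshold)

# Illustrative reference vectors and threshold
print(same_speaker([1.0, 0.0], [1.0, 0.1], threshold=0.2))
print(same_speaker([1.0, 0.0], [0.0, 1.0], threshold=0.2))
```

In practice the threshold would be tuned on development data, for example at the operating point that gives the equal error rate.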
To verify the effectiveness of the proposed speaker recognition method based on Fisher supervectors, the performance of a speaker recognition system based on Fisher supervectors is compared experimentally with that of a speaker recognition system based on JFA supervectors.
The experimental data come from the NIST 2008 speaker recognition evaluation database. The training and test speech are taken from the male telephone-training, telephone-test portion of the core evaluation task, which serves as the evaluation data set for measuring the performance of the speaker recognition systems.
The training data for the UBM come from the telephone speech in Switchboard II Phase 2, Switchboard II Phase 3, Switchboard Cellular Part 2, and NIST SRE 2004, 2005, and 2006; the UBM has 2048 Gaussian components.
The development set used to train the nonparametric subspace discriminant analysis projection matrices is taken from the telephone speech in the NIST SRE 2004, 2005, and 2006 databases. It contains 563 speakers in total, with 8 utterances per speaker.
The parameter Q, which controls the number of neighbouring feature vectors in the nonparametric subspace discriminant analysis algorithm, is set to 4. For nonparametric subspace discriminant analysis, as for latent factor analysis, the number of partitions of the Fisher supervector is set to 16.
A JFA system is used as the baseline system. It uses the same UBM as above. The rank of the speaker space loading matrix V is 300, the rank of the eigenchannel space loading matrix U is 100, and the residual loading matrix D is formed by concatenating the diagonal elements of the diagonal covariance matrices of the Gaussian components in the UBM model.
The first experiment examines the system performance under different combinations of the ranks of the projection matrices in the nonparametric discriminant subspace analysis algorithm. Because the development set used to train the subspace projection matrices contains 563 speakers in total, the upper limit on the rank of the subspace projection matrix $W_{k3}$ is 562. To extract the discriminative information on the class boundaries, the rank of $W_{k3}$ should not be too small, so it is set to 550 in this experiment. In addition, PCA accounts for the largest dimensionality reduction in the nonparametric discriminant subspace analysis algorithm: if the rank of $W_{k1}$ is too large, the projected feature vectors contain too much redundant information, and if it is too small, necessary discriminative information is lost, so this step directly affects system performance. This part of the experiment therefore mainly examines how system performance changes with the rank of the projection matrix. Table 1 shows the results of the nonparametric discriminant analysis based on Fisher supervectors:
Table 1
Table 1 shows that, with the rank of the NLDA projection matrix $W_{k3}$ fixed, the performance of the nonparametric discriminant analysis speaker recognition system based on Fisher supervectors improves, within a certain range, as the rank of the PCA projection matrix $W_{k1}$ increases. System performance is best when the rank of $W_{k1}$ is 1300: the EER (equal error rate) is lowest, meaning the recognition error rate is lowest, and the MinDCF (minimum detection cost) is 2.73. As the rank of $W_{k1}$ increases further, the projected feature vectors in the PCA subspace contain more redundant information and system performance declines.
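For readers unfamiliar with the EER metric quoted here, the following sketch shows one simple way to estimate it from trial scores. It assumes distance-like scores (a trial is accepted when its score is below the threshold) and sweeps the observed scores as candidate thresholds; the function name and the toy scores are illustrative, and the values are unrelated to the results in Table 1.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER for distance-like scores (accept when score < threshold)."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores, [np.inf]]))
    best_gap, eer = np.inf, 1.0
    for thr in thresholds:
        miss = np.mean(target_scores >= thr)            # same-speaker trials rejected
        false_alarm = np.mean(nontarget_scores < thr)   # different-speaker trials accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer

# Toy scores: same-speaker trials should have small distances
eer_separated = equal_error_rate([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
eer_overlap = equal_error_rate([0.1, 0.6], [0.4, 0.9])
```

The EER is the operating point at which the miss probability and the false alarm probability are equal, so a lower value means better separation of same-speaker and different-speaker trials.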
The second experiment compares the proposed speaker recognition system based on Fisher supervectors with the speaker recognition system based on JFA supervectors. In Fig. 3, the abscissa is the false alarm probability and the ordinate is the miss probability. Although the performance of the Fisher+NDA system is slightly worse than that of the JFA system, it does not need the acoustic features of a large amount of original corpus data to train the speaker information space and the channel information space. Instead of the EM iterations of the LFA algorithm, it compresses the speaker information in the Fisher supervector directly into a lower-dimensional subspace through the PCA subspace. The computational complexity of the Fisher system is therefore lower than that of the JFA system in both parameter learning and score calculation, and its running time is also shorter.
Embodiment 2:
Fig. 4 shows the structure of the speaker recognition system based on Fisher supervectors provided by Embodiment 2 of the present invention. For ease of description, only the parts relevant to this embodiment are shown.
The speaker recognition system based on Fisher supervectors comprises:
an extraction unit 41, for extracting Fisher supervectors;
a division unit 42, for dividing the extracted Fisher supervectors into multiple Fisher sub-vector sets;
a model establishing unit 43, for analyzing each Fisher sub-vector set with a nonparametric discriminant analysis algorithm to establish a subspace speaker model;
a recognition unit 44, for obtaining, according to the subspace speaker model, the reference vector of the speaker to be identified and the reference vector of the training-sample speaker, and for identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker.
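Further, the Fisher supervector is formed by concatenating the statistics $\mathcal{G}_{\alpha_k}^{X}$, $\mathcal{G}_{\mu_k}^{X}$ and $\mathcal{G}_{\sigma_k}^{X}$ of all Gaussian components in the GMM model, and its dimension is $(2d+1)K$, where:
$$\mathcal{G}_{\alpha_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \left( \gamma_t(k) - w_k \right)$$
$$\mathcal{G}_{\mu_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \, \frac{x_t - \mu_k}{\sigma_k}$$
$$\mathcal{G}_{\sigma_k}^{X} = \frac{1}{\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k) \, \frac{1}{\sqrt{2}} \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$$
Here $\mathcal{G}_{\alpha_k}^{X}$ is a scalar, and $\mathcal{G}_{\mu_k}^{X}$ and $\mathcal{G}_{\sigma_k}^{X}$ are $d$-dimensional vectors with $d \geq 1$ (the divisions and squares on vectors are element-wise). $X = \{x_t, t = 1 \ldots T\}$ is the feature vector sequence of the speaker's speech, $x_t$ denotes a feature vector, $T$ is the number of feature vectors in the sequence $X$, and $K$ is the number of Gaussian components in the GMM model. $\gamma_t(k)$ is the occupancy of feature vector $x_t$ for the $k$-th Gaussian component $p_k$ of the GMM model, $w_k$ is the weight of the $k$-th Gaussian component, $\mu_k$ is its mean vector, $\Sigma_k$ is its covariance matrix, and $\sigma_k^2$ denotes the elements on the diagonal of $\Sigma_k$.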
Further, the model establishing unit 43 comprises:
a first processing module 431, for removing the redundant information contained in each Fisher sub-vector set using the principal component analysis (PCA) algorithm, obtaining the dimension-reducing projection matrix of each Fisher sub-vector set;
a second processing module 432, for processing the dimension-reduced projection using the within-class covariance normalization (WCCN) algorithm, obtaining the subspace projection matrix of each Fisher sub-vector set;
a third processing module 433, for extracting the discriminative information on the class boundaries of the subspace projection using the nonparametric linear discriminant analysis (NLDA) algorithm, obtaining the NLDA projection matrix of each Fisher sub-vector set;
a model building module 434, for joining, in sequence, the projection matrix after PCA dimensionality reduction, the subspace projection matrix after WCCN, and the NLDA projection matrix to obtain the total subspace projection matrix.
Further, the recognition unit 44 comprises:
a computing module 441, for obtaining, according to the subspace speaker model, the reference vector $R_{test}$ of the speaker to be identified and the reference vector $R_{train}$ of the training-sample speaker, and for calculating the cosine distance between the two reference vectors as the test score;
an identification module 442, for judging, when the test score is less than a predetermined value, that the speaker to be identified and the training-sample speaker are the same speaker.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the functions above may be assigned to different functional units or modules as needed; that is, the internal structure of the system may be divided into different functional units or modules to perform all or part of the functions described above. The functional units in the embodiments may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of this application. For the specific working process of the units and modules in the system above, reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.
In conclusion, the embodiments of the present invention extract the Fisher supervector of the speech data as the speaker's feature vector and perform speaker recognition on Fisher supervectors using subspace analysis modeling. Because the Fisher supervector is simple to extract, has a higher dimension than the JFA supervector, and has not undergone channel compensation, the accuracy and efficiency of speaker recognition are effectively improved, and the embodiments have strong usability and practicality.
Those of ordinary skill in the art will appreciate that the example units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
- 1. A speaker recognition method based on Fisher supervectors, characterized in that the method comprises: extracting Fisher supervectors; dividing the extracted Fisher supervectors into multiple Fisher subvector sets; analyzing each Fisher subvector set based on a nonparametric discriminant analysis algorithm to establish a subspace speaker model; obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker; wherein the Fisher supervector is formed by concatenating three quantities [symbols not reproduced] corresponding to all Gaussian components in the GMM, and the dimension of the Fisher supervector is (2d+1)K, wherein: [formulas not reproduced]; the value of the first quantity is a scalar, and the values of the other two quantities are d-dimensional vectors, d ≥ 1; the feature vector sequence of the speaker's speech is X = {x_t, t = 1...T}, where x_t denotes a feature vector and T denotes the number of feature vectors in the feature vector sequence X; K denotes the number of Gaussian components in the GMM [formula not reproduced]; the k-th Gaussian component in the GMM is [formula not reproduced]; w_k denotes the weight of the k-th Gaussian component in the GMM [formula not reproduced]; μ_k denotes the mean vector of the k-th Gaussian component in the GMM; Σ_k denotes the covariance matrix of the k-th Gaussian component in the GMM; and [symbol not reproduced] denotes the elements on the diagonal of Σ_k.
- 2. The method of claim 1, characterized in that analyzing each Fisher subvector set based on the nonparametric discriminant analysis algorithm to establish the subspace speaker model comprises: removing the redundant information contained in each Fisher subvector set using the principal component analysis (PCA) algorithm to obtain a dimension-reduced projection matrix for each Fisher subvector set; processing the dimension-reduced projection matrix using the within-class covariance normalization (WCCN) algorithm to obtain the subspace projection matrix corresponding to each Fisher subvector set; extracting the discriminative information at the class boundaries of the subspace projection matrix using the nonparametric linear discriminant analysis (NLDA) algorithm to obtain the nonparametric linear discriminant analysis projection matrix for each Fisher subvector set; and concatenating, in that order, the projection matrix after PCA dimension reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix to obtain the total subspace projection matrix.
- 3. The method of claim 1, characterized in that the step of obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and performing speaker recognition according to the preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker comprises: obtaining, according to the subspace speaker model, the reference vector R_train of the speaker to be identified and the reference vector R_test of the training-sample speaker, and calculating the cosine distance between the two reference vectors as the test score according to the formula [formula not reproduced]; and, when the test score is less than a predetermined value, determining that the speaker to be identified and the training-sample speaker are the same speaker.
- 4. A speaker recognition system based on Fisher supervectors, characterized in that the system comprises: an extraction unit for extracting Fisher supervectors; a division unit for dividing the extracted Fisher supervectors into multiple Fisher subvector sets; a model establishing unit for analyzing each Fisher subvector set based on a nonparametric discriminant analysis algorithm to establish a subspace speaker model; and a recognition unit for obtaining the reference vector of the speaker to be identified and the reference vector of the training-sample speaker according to the subspace speaker model, and identifying the speaker to be identified according to a preset calculation rule, the reference vector of the speaker to be identified, and the reference vector of the training-sample speaker; wherein the Fisher supervector is formed by concatenating three quantities [symbols not reproduced] corresponding to all Gaussian components in the GMM, and the dimension of the Fisher supervector is (2d+1)K, wherein: [formulas not reproduced]; the value of the first quantity is a scalar, and the values of the other two quantities are d-dimensional vectors, d ≥ 1; the feature vector sequence of the speaker's speech is X = {x_t, t = 1...T}, where x_t denotes a feature vector and T denotes the number of feature vectors in the feature vector sequence X; K denotes the number of Gaussian components in the GMM [formula not reproduced]; the k-th Gaussian component in the GMM is [formula not reproduced]; w_k denotes the weight of the k-th Gaussian component in the GMM [formula not reproduced]; μ_k denotes the mean vector of the k-th Gaussian component in the GMM; Σ_k denotes the covariance matrix of the k-th Gaussian component in the GMM; and [symbol not reproduced] denotes the elements on the diagonal of Σ_k.
- 5. The system of claim 4, characterized in that the model establishing unit comprises: a first processing module for removing the redundant information contained in each Fisher subvector set using the principal component analysis (PCA) algorithm to obtain a dimension-reduced projection matrix for each Fisher subvector set; a second processing module for processing the dimension-reduced projection matrix using the within-class covariance normalization (WCCN) algorithm to obtain the subspace projection matrix corresponding to each Fisher subvector set; a third processing module for extracting the discriminative information at the class boundaries of the subspace projection matrix using the nonparametric linear discriminant analysis (NLDA) algorithm to obtain the nonparametric linear discriminant analysis projection matrix for each Fisher subvector set; and a model building module for concatenating, in that order, the projection matrix after PCA dimension reduction, the subspace projection matrix after WCCN, and the nonparametric linear discriminant analysis projection matrix to obtain the total subspace projection matrix.
- 6. The system of claim 4, characterized in that the recognition unit comprises: a computing module for obtaining, according to the subspace speaker model, the reference vector R_train of the speaker to be identified and the reference vector R_test of the training-sample speaker, and calculating the cosine distance between the two reference vectors as the test score according to the formula [formula not reproduced]; and an identification module for determining, when the test score is less than a predetermined value, that the speaker to be identified and the training-sample speaker are the same speaker.
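By way of illustration only, the scoring step recited in claims 3 and 6 might be sketched as below. The scoring formula itself is not reproduced in this text, so the sketch assumes that the cosine distance is one minus the cosine similarity of the two reference vectors, which is consistent with accepting a trial when the score is less than a predetermined value. The function names, the argument names, and the default threshold of 0.5 are hypothetical.

```python
import math


def cosine_distance(r_a, r_b):
    """Assumed score: 1 - cosine similarity of two non-zero reference vectors."""
    dot = sum(a * b for a, b in zip(r_a, r_b))
    norm_a = math.sqrt(sum(a * a for a in r_a))
    norm_b = math.sqrt(sum(b * b for b in r_b))
    return 1.0 - dot / (norm_a * norm_b)


def is_same_speaker(r_probe, r_enrolled, threshold=0.5):
    """Accept the trial when the test score is below the (hypothetical) threshold."""
    return cosine_distance(r_probe, r_enrolled) < threshold


# Toy reference vectors: parallel vectors score near 0, orthogonal ones score 1.
print(is_same_speaker([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
print(is_same_speaker([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors
```

A real system would tune the threshold on held-out trials; the value used here is only a placeholder.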
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410802816.6A CN104538035B (en) | 2014-12-19 | 2014-12-19 | A kind of method for distinguishing speek person and system based on Fisher super vectors |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104538035A CN104538035A (en) | 2015-04-22 |
CN104538035B true CN104538035B (en) | 2018-05-01 |
Family
ID=52853551
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410802816.6A (Active; CN104538035B (en)) | 2014-12-19 | 2014-12-19 | A kind of method for distinguishing speek person and system based on Fisher super vectors |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104538035B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105632502A (en) * | 2015-12-10 | 2016-06-01 | 江西师范大学 | Weighted pairwise constraint metric learning algorithm-based speaker recognition method |
CN105869645B (en) * | 2016-03-25 | 2019-04-12 | 腾讯科技(深圳)有限公司 | Voice data processing method and device |
CN106128466B (en) | 2016-07-15 | 2019-07-05 | 腾讯科技(深圳)有限公司 | Identity vector processing method and device |
CN106297807B (en) | 2016-08-05 | 2019-03-01 | 腾讯科技(深圳)有限公司 | The method and apparatus of training Voiceprint Recognition System |
CN106601258A (en) * | 2016-12-12 | 2017-04-26 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker identification method capable of information channel compensation based on improved LSDA algorithm |
CN107633845A (en) * | 2017-09-11 | 2018-01-26 | 清华大学 | A kind of duscriminant local message distance keeps the method for identifying speaker of mapping |
CN109036437A (en) * | 2018-08-14 | 2018-12-18 | 平安科技(深圳)有限公司 | Accents recognition method, apparatus, computer installation and computer readable storage medium |
CN109065059A (en) * | 2018-09-26 | 2018-12-21 | 新巴特(安徽)智能科技有限公司 | The method for identifying speaker with the voice cluster that audio frequency characteristics principal component is established |
CN111462759B (en) * | 2020-04-01 | 2024-02-13 | 科大讯飞股份有限公司 | Speaker labeling method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102222500A (en) * | 2011-05-11 | 2011-10-19 | 北京航空航天大学 | Extracting method and modeling method for Chinese speech emotion combining emotion points |
CN103077720A (en) * | 2012-12-19 | 2013-05-01 | 中国科学院声学研究所 | Speaker identification method and system |
CN103578481A (en) * | 2012-07-24 | 2014-02-12 | 东南大学 | Method for recognizing cross-linguistic voice emotion |
CN104167208A (en) * | 2014-08-08 | 2014-11-26 | 中国科学院深圳先进技术研究院 | Speaker recognition method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10047723A1 (en) * | 2000-09-27 | 2002-04-11 | Philips Corp Intellectual Pty | Method for determining an individual space for displaying a plurality of training speakers |
Non-Patent Citations (1)
Title |
---|
Na Li et al., "Clustering Similar Acoustic Classes in the Fishervoice Framework", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013-10-21, pp. 7726-7728 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||