CN106057206A - Voiceprint model training method, voiceprint recognition method and device - Google Patents
Voiceprint model training method, voiceprint recognition method and device
- Publication number
- CN106057206A CN106057206A CN201610388231.3A CN201610388231A CN106057206A CN 106057206 A CN106057206 A CN 106057206A CN 201610388231 A CN201610388231 A CN 201610388231A CN 106057206 A CN106057206 A CN 106057206A
- Authority
- CN
- China
- Prior art keywords
- character
- target user
- voiceprint
- model
- character string
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 90
- 238000012549 training Methods 0.000 title claims abstract description 62
- 238000012360 testing method Methods 0.000 claims abstract description 10
- 230000001755 vocal effect Effects 0.000 claims description 114
- 239000012634 fragment Substances 0.000 claims description 30
- 238000000605 extraction Methods 0.000 claims description 22
- 239000000284 extract Substances 0.000 claims description 15
- 238000007476 Maximum Likelihood Methods 0.000 claims description 10
- 230000008447 perception Effects 0.000 claims description 4
- 230000000875 corresponding effect Effects 0.000 description 181
- 230000008569 process Effects 0.000 description 19
- 238000010586 diagram Methods 0.000 description 6
- 230000006870 function Effects 0.000 description 4
- 230000008901 benefit Effects 0.000 description 3
- 230000006978 adaptation Effects 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 230000002596 correlated effect Effects 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 238000013102 re-test Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/16—Hidden Markov models [HMM]
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention discloses a voiceprint model training method, a voiceprint recognition method and a voiceprint recognition device, which belong to the field of speech recognition. The voiceprint recognition method comprises the steps of: acquiring a test voice signal produced when an unknown user reads a second character string aloud, wherein the second character string comprises a plurality of characters arranged in sequence; extracting the voiceprint feature sequence corresponding to the characters from the test voice signal; constructing the HMM corresponding to the second character string from the n GMMs of a target user that correspond to n kinds of basic characters respectively; calculating the similarity score between the voiceprint feature sequence and the HMM; and identifying the unknown user as the target user when the similarity score is greater than a preset threshold. Because the target user's GMMs for the respective basic characters capture the phoneme-level differences between the audio content of the different basic characters, and the HMM further captures the time-domain correlation between the audio content of consecutive characters, the recognition accuracy can be significantly improved.
Description
Technical field
Embodiments of the present invention relate to the field of speech recognition, and in particular to a voiceprint model training method, a voiceprint recognition method, and a device.
Background technology
Voiceprint recognition is a technology that uses voiceprint feature information to verify the identity of an unknown user. Voiceprint recognition can be applied to scenarios that need to identify user identity, such as access control systems and payment systems. Current voiceprint recognition generally uses text-dependent recognition.
Voiceprint recognition generally includes two processes: the registration of a target user and the identification of an unknown user.
During registration, the system provides a registration string for the target user to read aloud. This string typically consists of several digits and/or letters arranged in sequence. The system collects the registration voice signal produced as the target user reads, and trains a Gaussian mixture model (GMM) of the target user from the registration voice signal. During identification, the test voice signal produced when an unknown user reads an identification string aloud is matched against the GMM of the target user; when the similarity exceeds a predetermined threshold, the unknown user is identified as the target user.
In the course of implementing the embodiments of the present invention, the inventors found that the prior art has at least the following problem: in the above method, the audio content corresponding to the respective basic characters in the registration voice signal is correlated, and the registration voice signal contains rich information characterizing the user; however, the GMM of the target user is a text-independent model and cannot exploit this rich information in the registration voice signal.
Summary of the invention
In view of this, embodiments of the present invention provide a voiceprint model training method, a voiceprint recognition method, and a device. The technical solutions are as follows.
In a first aspect, a voiceprint model training method is provided, the method including:
collecting a registration voice signal produced when a target user reads a first character string aloud, the first character string including m characters arranged in sequence, the m characters including n kinds of mutually different basic characters, m and n being positive integers with m >= n;
extracting the voiceprint feature corresponding to each character from the registration voice signal;
training a preset universal background model with the voiceprint features corresponding to each character of the target user as first sample data, to obtain the Gaussian mixture model of the target user;
training the Gaussian mixture model of the target user with the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, to obtain the Gaussian mixture model of the target user corresponding to the i-th kind of basic character;
storing the n Gaussian mixture models of the target user corresponding one-to-one to the n kinds of basic characters, the n Gaussian mixture models being used to build the hidden Markov model corresponding to a second character string.
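The training steps above can be sketched in code. The following is a minimal, hypothetical Python sketch: all function and variable names are assumptions, and plain dicts of feature samples stand in for real UBM/GMM objects; it only illustrates how the per-position voiceprint features are grouped by basic character before the two adaptation passes.

```python
# Hypothetical sketch of the training steps listed above. Dicts of
# feature samples stand in for real UBM/GMM objects; names are assumed.

def train_target_user_models(first_string, features_per_position):
    """first_string: e.g. "12358948"; features_per_position: one
    voiceprint feature (a list of floats) per character position."""
    assert len(first_string) == len(features_per_position)
    # The same basic character may occur at several positions (m >= n),
    # so group the per-position features by basic character.
    by_char = {}
    for ch, feat in zip(first_string, features_per_position):
        by_char.setdefault(ch, []).append(feat)
    # Step 3: all features together adapt the preset UBM into the
    # target user's (text-independent) GMM -- stand-in representation.
    user_gmm = {"samples": [f for fs in by_char.values() for f in fs]}
    # Step 4: per basic character, the user GMM is adapted again into
    # one (text-dependent) GMM per kind of basic character.
    per_char_gmm = {ch: {"samples": fs} for ch, fs in by_char.items()}
    # Step 5: the n per-character models are what gets stored.
    return user_gmm, per_char_gmm

user_gmm, models = train_target_user_models(
    "12358948", [[float(i)] for i in range(8)])
print(sorted(models))  # ['1', '2', '3', '4', '5', '8', '9']
```

Note that the two MAP adaptation passes are elided here; the point of the sketch is only the data flow from one registration utterance to n per-character models.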
In a second aspect, a voiceprint recognition method is provided, the method including:
obtaining a test voice signal produced when an unknown user reads a second character string aloud, the second character string including k characters arranged in sequence, the k characters including all or part of the n kinds of mutually different basic characters, k and n being positive integers;
extracting the voiceprint feature sequence corresponding to the characters from the test voice signal;
building the HMM corresponding to the second character string from the n Gaussian mixture models of the target user corresponding one-to-one to the n kinds of basic characters;
calculating the similarity score between the voiceprint feature sequence and the HMM;
when the similarity score is greater than a predetermined threshold, identifying the unknown user as the target user.
In a third aspect, a voiceprint model training device is provided, the device including:
an acquisition module, configured to collect a registration voice signal produced when a target user reads a first character string aloud, the first character string including m characters arranged in sequence, the m characters including n kinds of mutually different basic characters, m and n being positive integers with m >= n;
an extraction module, configured to extract the voiceprint feature corresponding to each character from the registration voice signal;
a first training module, configured to train a preset universal background model with the voiceprint features corresponding to each character of the target user as first sample data, to obtain the Gaussian mixture model of the target user;
a second training module, configured to train the Gaussian mixture model of the target user with the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, to obtain the Gaussian mixture model of the target user corresponding to the i-th kind of basic character;
a storage module, configured to store the n Gaussian mixture models of the target user corresponding one-to-one to the n kinds of basic characters, the n Gaussian mixture models being used to build the hidden Markov model corresponding to a second character string.
In a fourth aspect, a voiceprint recognition device is provided, the device including:
an acquisition module, configured to obtain a test voice signal produced when an unknown user reads a second character string aloud, the second character string including k characters arranged in sequence, the k characters including all or part of the n kinds of mutually different basic characters, k and n being positive integers;
an extraction module, configured to extract the voiceprint feature sequence corresponding to the characters from the test voice signal;
a building module, configured to build the HMM corresponding to the second character string from the n Gaussian mixture models of the target user corresponding one-to-one to the n kinds of basic characters;
a calculation module, configured to calculate the similarity score between the voiceprint feature sequence and the HMM;
an identification module, configured to identify the unknown user as the target user when the similarity score is greater than a predetermined threshold.
The voiceprint model training method provided by the embodiments of the present invention has the following beneficial effects: the UBM is trained with the voiceprint features corresponding to each character of the target user to obtain the GMM of the target user, and the GMM of the target user is further trained to obtain n GMMs corresponding one-to-one to the n kinds of basic characters, the n GMMs being used to build the HMM corresponding to the second character string. This solves the problem that the GMM of the target user is a text-independent model that cannot exploit the rich information in the registration voice signal. For each target user, training yields a GMM for each of the several basic characters; the GMMs capture the phoneme-level differences between the audio content of the different basic characters, and these GMMs can further be used to build the HMM corresponding to the identification string, which also captures the time-domain correlation between the audio content of the characters, so that the recognition accuracy of the target user's voiceprint model at the identification stage can be greatly improved.
The voiceprint recognition method provided by the embodiments of the present invention has the following beneficial effects: the voiceprint feature sequence of the test voice signal is scored against the HMM built from the GMMs corresponding to the respective basic characters, and the unknown user is thereby identified. This solves the problem that the GMM of the target user is a text-independent model that cannot exploit the rich information in the registration voice signal. For each target user, the GMMs corresponding to the respective basic characters capture the phoneme-level differences between the audio content of the different basic characters, and the HMM also captures the time-domain correlation between the audio content of the characters, so that the recognition accuracy of the target user's voiceprint model at the identification stage can be greatly improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the random-string-based voiceprint recognition method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of the voiceprint model training method provided by one embodiment of the present invention;
Fig. 3 is a schematic diagram of the principle of the voiceprint model training method shown in Fig. 2;
Fig. 4 is a flowchart of the voiceprint model training method provided by another embodiment of the present invention;
Fig. 5 is a schematic diagram of the voice annotation process involved in the voiceprint model training method shown in Fig. 4;
Fig. 6 is a schematic diagram of the model training process involved in the voiceprint model training method shown in Fig. 4;
Fig. 7 is a flowchart of the voiceprint recognition method provided by one embodiment of the present invention;
Fig. 8 is a flowchart of the voiceprint recognition method provided by another embodiment of the present invention;
Fig. 9 is a schematic diagram of the HMM built by the voiceprint recognition method shown in Fig. 8;
Fig. 10 is a block diagram of the voiceprint model training device provided by one embodiment of the present invention;
Fig. 11 is a block diagram of the voiceprint recognition device provided by another embodiment of the present invention.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described below in further detail with reference to the accompanying drawings.
Embodiments of the present invention provide a random-string-based voiceprint recognition method and device, which can be applied in any scenario where the identity of an unknown user needs to be identified. The basic characters used to generate the random string may be Arabic numerals, English letters, or characters of other languages. Each basic character is typically a single digit or a single letter, although the possibility that several digits or several letters together act as one basic character is not excluded. To simplify the description, the embodiments of the present invention are illustrated with each basic character being an Arabic numeral.
The random-string-based voiceprint recognition method is divided into two stages, as shown in Fig. 1.
First, the registration stage 12 of the target user.
In the registration stage, the voiceprint recognition device randomly generates a registration string and displays the digit string on the display interface. The target user reads the registration string aloud, and the voiceprint recognition device collects the registration voice signal produced while the target user reads; it then performs voiceprint feature extraction and voiceprint model training on the registration voice signal to obtain the voiceprint model of the target user. The voiceprint model of each target user contains several GMMs (Gaussian Mixture Models), each corresponding to one digit.
For example, if the registration string is the digit string 0185851, it contains the four digits "0", "1", "5" and "8", and the voiceprint model of the target user then contains a GMM corresponding to digit "0", a GMM corresponding to digit "1", a GMM corresponding to digit "5" and a GMM corresponding to digit "8".
Second, the identification stage 14 of the unknown user.
In the identification stage, the voiceprint recognition device randomly generates an identification string, again from the digit set "0", "1", "5" and "8", and displays the identification string on the display interface. The unknown user reads the identification string aloud, and the voiceprint recognition device collects the test voice signal produced while the unknown user reads. It then performs voiceprint feature extraction on the test voice signal, uses the voiceprint model of each target user to build the HMM (Hidden Markov Model) corresponding to the digit string, calculates the similarity between the unknown user's voiceprint features and each HMM, and takes the target user corresponding to the HMM whose similarity is the highest and higher than a threshold as the identification result for the unknown user.
For example, if the randomly generated identification string is the digit string 85851510, the voiceprint recognition device builds the HMM corresponding to the identification string "85851510" from each target user's GMMs for the digits "0", "1", "5" and "8", and calculates the similarity between the unknown user's voiceprint features and each target user's HMM; when the HMM whose similarity is the highest and above the threshold belongs to target user B, target user B is taken as the identification result for the unknown user.
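The identification stage just described can be illustrated with a toy numeric sketch. Here single one-dimensional Gaussians stand in for the per-digit GMMs, the "HMM" is reduced to the forced state sequence given by the identification string (one feature per character, no transition probabilities or Viterbi pass), and all numbers and model values are illustrative assumptions rather than the patent's actual models.

```python
import math

# Toy sketch: the "HMM" for "85851510" is the chain of a target user's
# per-digit models in reading order; 1-D Gaussians stand in for GMMs.

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def hmm_score(ident_string, per_digit_model, feats):
    # State sequence = the digits of the identification string in
    # order; the score is the summed per-state log-likelihood.
    return sum(log_gauss(f, *per_digit_model[d])
               for d, f in zip(ident_string, feats))

user_a = {"8": (8.0, 1.0), "5": (5.0, 1.0), "1": (1.0, 1.0), "0": (0.0, 1.0)}
user_b = {"8": (7.0, 1.0), "5": (4.0, 1.0), "1": (2.0, 1.0), "0": (1.0, 1.0)}
feats = [8.1, 5.0, 7.9, 5.2, 0.9, 5.1, 1.0, 0.1]  # unknown user, 8 chars
scores = {name: hmm_score("85851510", model, feats)
          for name, model in (("A", user_a), ("B", user_b))}
best = max(scores, key=scores.get)
print(best)  # A
```

A real implementation would use full GMM states and Viterbi decoding, and would still require the best score to exceed the predetermined threshold before accepting the match.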
The above two processes are described below using different embodiments.
Fig. 2 shows a flowchart of the voiceprint model training method provided by one embodiment of the present invention. The voiceprint model training method can be applied in a voiceprint recognition system, and includes the following steps.
Step 201: collect a registration voice signal produced when the target user reads the first character string aloud; the first character string includes m characters arranged in sequence, and the m characters include n kinds of mutually different basic characters.
The first character string is the string used in the registration stage of the target user. Optionally, the first character string is a randomly generated string. m and n are positive integers, and m >= n.
For example, the first character string is "12358948", 8 characters in total, including 7 kinds of mutually different basic characters: "1", "2", "3", "4", "5", "8", "9".
Step 202: extract the voiceprint feature corresponding to each character from the registration voice signal.
For example, from the original voice signal, the voice segments corresponding to the characters "1", "2", "3", "4", "5", "8" and "9" are extracted. Then, from the voice segment corresponding to each character, the voiceprint feature corresponding to that character is extracted.
Step 203: with the voiceprint features corresponding to each character of the target user as first sample data, train the preset UBM to obtain the GMM of the target user.
The UBM (Universal Background Model) is a universal model built in advance by training on all digits. The UBM is identity-independent and text-independent. Identity-independent means that the UBM does not consider differences in user identity and does not correspond to any particular user or users; text-independent means that the UBM does not consider differences between digits (characters) and does not correspond to any particular digit or digits, as shown by UBM 32 in Fig. 3.
Optionally, a maximum a posteriori (MAP) algorithm is used to adjust the parameters in the UBM according to the voiceprint features of the target user, so that the GMM of the target user is obtained by adaptation.
The GMM of the target user is identity-dependent and text-independent. Identity-dependent means that the GMM corresponds to a specific target user; text-independent means that the GMM does not consider differences between digits (basic characters) and does not correspond to any particular digit or digits, as shown by GMM 34 of the target user in Fig. 3.
Step 204: with the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, train the GMM of the target user to obtain the GMM of the target user corresponding to the i-th kind of basic character.
Optionally, the maximum a posteriori (MAP) algorithm is used to adjust the parameters in the GMM of the target user according to the target user's voiceprint features corresponding to the i-th kind of basic character, so that the GMM of the target user corresponding to the i-th kind of basic character is obtained by adaptation. The GMM corresponding to the i-th kind of basic character is identity-dependent and text-dependent. Identity-dependent means that the GMM corresponds to a specific target user; text-dependent means that the GMM corresponds to a specific digit, as shown by the GMMs 36 corresponding to the various basic characters in Fig. 3.
For example, the parameters in the GMM of target user A are adjusted according to target user A's voiceprint features corresponding to digit "8", so as to obtain target user A's GMM corresponding to digit "8".
Step 204 is executed repeatedly to obtain the n GMMs of the target user corresponding one-to-one to the basic characters.
Step 205: store the n GMMs of the target user corresponding one-to-one to the n kinds of basic characters; the n GMMs are used to build the HMM corresponding to the second character string.
The n GMMs of the target user are stored in a model library, so that in the subsequent identification stage of an unknown user, the n GMMs of the target user can be used to build the HMM corresponding to the second character string.
In summary, in the voiceprint model training method provided by this embodiment, the UBM is trained with the voiceprint features corresponding to each character of the target user to obtain the GMM of the target user, and the GMM of the target user is further trained to obtain the n GMMs corresponding one-to-one to the n kinds of basic characters, which are used to build the HMM corresponding to the second character string. This solves the problem that the GMM of the target user is a text-independent model that cannot exploit the rich information in the registration voice signal. For each target user, training yields a GMM for each of the several basic characters; the GMMs capture the phoneme-level differences between the audio content of the different basic characters, and these GMMs can further be used to build the HMM corresponding to the identification string, which also captures the time-domain correlation between the audio content of the characters, so that the recognition accuracy of the target user's voiceprint model at the identification stage can be greatly improved.
Fig. 4 shows a flowchart of the voiceprint model training method provided by another embodiment of the present invention. The voiceprint model training method can be applied in a voiceprint recognition system, and includes the following steps.
Step 401: randomly generate the first character string and display it.
Optionally, a basic character set is stored in the voiceprint recognition system. Taking digits as the basic characters as an example, the basic character set includes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Optionally, the voiceprint recognition system randomly generates the first character string from the basic characters in the basic character set according to a random algorithm. The first character string includes m characters arranged in sequence, the m characters include n kinds of mutually different basic characters, and m and n are positive integers with m >= n. That is, each basic character may appear multiple times at different positions in the first character string. Optionally, to improve the coverage of model training, the first character string may include all the basic characters in the basic character set.
For example, the first character string is 1981753651240; for another example, the first character string is 01580518.
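Step 401 can be sketched as follows. The shuffle-then-pad scheme used to guarantee that every basic character appears at least once is an assumption for illustration, not the patent's actual algorithm.

```python
import random

# Illustrative sketch of step 401: generate a random first character
# string of length m over the digit set, optionally guaranteeing that
# every basic character appears at least once (the coverage option
# mentioned above). The shuffle-then-pad scheme is an assumption.

def gen_first_string(base_chars="0123456789", m=13, cover_all=True,
                     seed=None):
    rng = random.Random(seed)
    if cover_all:
        assert m >= len(base_chars)
        chars = list(base_chars)              # each base char once
        chars += [rng.choice(base_chars) for _ in range(m - len(chars))]
        rng.shuffle(chars)                    # randomize the order
    else:
        chars = [rng.choice(base_chars) for _ in range(m)]
    return "".join(chars)

s = gen_first_string(m=13, seed=7)
print(len(s), set(s) == set("0123456789"))  # 13 True
```

With `cover_all=False` the sketch corresponds to plain random generation, where some basic characters may be missing from the string (and then simply get no per-character GMM from this utterance).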
The voiceprint recognition system displays the first character string on a display screen for the target user to read aloud during registration. Optionally, the voiceprint recognition system also displays auxiliary information on the display screen; illustratively, the auxiliary information is "After the prompt tone, please read the following digit string aloud: 01580518".
Optionally, besides being randomly generated, the first character string may also be a preset, fixed string.
Step 402: collect the registration voice signal produced when the target user reads the first character string aloud.
The voiceprint recognition system collects, through a microphone, the registration voice signal produced when the target user reads the first character string aloud.
Step 403: identify the valid voice segments and the invalid voice segments in the registration voice signal.
Because the target user pauses between adjacent characters when reading the characters aloud, the registration voice signal includes both valid voice segments and invalid voice segments. An invalid voice segment may be a completely silent segment, i.e. a mute segment, or a segment containing noise, i.e. a noise segment.
The voiceprint recognition system needs to identify the valid voice segments and the invalid voice segments in the registration voice signal. Fig. 5 schematically shows the principle of this identification process. The voiceprint recognition system annotates the registration voice signal 50 through a speech recognition engine; the regions between two adjacent valid voice segments (the segments where the waveform is located in the figure) are invalid voice segments and do not participate in subsequent calculation.
Optionally, after the registration voice signal is annotated, the corresponding voice annotation information is saved in the form of (start time, end time, basic character) tuples; for example, the voice annotation information of Fig. 5 is shown in Table 1:
Table 1
Start time (s) | End time (s) | Basic character
1.86 | 2.36 | 0
3.07 | 3.60 | 1
…… | …… | ……
10.11 | 10.55 | 8
Here, 1.86 is the start time of the first basic character "0" in the registration voice signal, and 2.36 is the end time of the first basic character "0" in the registration voice signal; 3.07 is the start time of the second basic character "1" in the registration voice signal, and 3.60 is its end time; 10.11 is the start time of the last basic character "8" in the registration voice signal, and 10.55 is its end time.
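The annotation format above can be shown directly in code; the values are the ones from Table 1, and the tuple layout follows the (start time, end time, basic character) form described in the text.

```python
# (start time, end time, basic character) tuples, as in Table 1.
annotations = [
    (1.86, 2.36, "0"),
    (3.07, 3.60, "1"),
    (10.11, 10.55, "8"),
]

# Each entry marks one valid voice segment; everything between the end
# of one segment and the start of the next is an invalid segment and
# is skipped in later processing.
durations = {c: round(end - start, 2) for start, end, c in annotations}
print(durations["0"])  # 0.5
```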
Step 404: extract the j-th valid voice segment in the registration voice signal as the voice segment corresponding to the j-th character in the first character string.
The voiceprint recognition system extracts the first valid voice segment in the registration voice signal as the voice segment corresponding to the first character in the first character string, extracts the second valid voice segment as the voice segment corresponding to the second character in the first character string, and so on; the last valid voice segment is extracted as the voice segment corresponding to the last character in the first character string.
For example, referring to Fig. 5, the voice segment at "1.86-2.36" in the registration voice signal is extracted as the voice segment corresponding to the first character "0".
Step 405: extract the voiceprint feature of the speech segment corresponding to the j-th character.
Each speech segment is equivalent to a sequence of short-time speech frames. The voiceprint recognition system extracts the MFCC (Mel Frequency Cepstrum Coefficients) or PLP (Perceptual Linear Predictive coefficients) of the speech segment corresponding to the j-th character as the voiceprint feature of that segment.
It should be noted that j is a positive integer with 1 ≤ j ≤ m. Optionally, identical base characters may occur at different positions; for example, in the first character string "01580518", both the first character and the fifth character are the base character "0", in which case two voiceprint features corresponding to the base character "0" are extracted.
If the first character string includes n kinds of base characters, voiceprint features respectively corresponding to the n kinds of base characters are obtained.
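Because the same base character can occur at several positions, the per-position features naturally collapse into one pool per base character. A minimal sketch (the stand-in feature vectors are hypothetical placeholders for MFCC/PLP frames):

```python
from collections import defaultdict

first_string = "01580518"
# One hypothetical feature vector per character position
# (stand-ins for the MFCC/PLP features of that segment).
features_per_position = [[float(i)] for i in range(len(first_string))]

# Pool features by base character: repeated characters such as the two
# "0"s in "01580518" contribute to the same pool.
features_by_char = defaultdict(list)
for ch, feat in zip(first_string, features_per_position):
    features_by_char[ch].append(feat)
```

The resulting pools are exactly the "voiceprint features respectively corresponding to the n kinds of base characters" used as training data in steps 406 and 407.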
Step 406: with the voiceprint features corresponding to each base character of the target user as the first sample data, adjust the parameters of a preset UBM using a maximum a posteriori probability algorithm to obtain the GMM of the target user.
The UBM is a universal background model trained in advance on speech covering all of the digits. The UBM is both speaker-independent and text-independent. Illustratively, the UBM is trained on speech samples from more than 1000 speakers with a total duration of more than 20 hours, without distinguishing between digits.
The mathematical expression of the UBM is:

P(x) = Σ_{i=1}^{C} ω_i · N(x; μ_i, Σ_i)

where P(x) is the probability density of the UBM; C is the total number of Gaussian components in the UBM; ω_i is the weight of the i-th Gaussian component; μ_i is the mean of the i-th Gaussian component (and Σ_i its covariance); N(·) is the Gaussian distribution; and x is the input sample, i.e. a voiceprint feature.
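The mixture density above can be evaluated directly. A minimal univariate sketch (diagonal/1-D covariances and the toy parameter values are illustrative assumptions):

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """P(x) = sum_i w_i * N(x; mu_i, var_i) -- the UBM/GMM density."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Two-component toy UBM; the weights must sum to 1.
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
p = gmm_pdf(0.0, weights, means, variances)
```

In practice x would be a multi-dimensional MFCC or PLP vector and C would be in the hundreds or thousands; the summation over weighted component densities is unchanged.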
In this step, feature differences between base characters are not considered: all voiceprint features corresponding to all base characters of the target user are used as the input first sample data to train the UBM. During training, the parameters of the UBM are adjusted by the maximum a posteriori probability algorithm, yielding the GMM of the target user.
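The patent does not spell out the adjustment formula, so the following is only a hedged 1-D sketch of one relevance-MAP pass (Reynolds-style) restricted to adapting the component means; the relevance factor of 16 and the toy data are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_adapt_means(data, weights, means, variances, relevance=16.0):
    """One illustrative MAP pass: shift UBM means toward the target
    user's data, weighted by per-component soft counts."""
    C = len(weights)
    n = [0.0] * C   # soft counts per component
    ex = [0.0] * C  # posterior-weighted feature sums
    for x in data:
        post = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(post)
        for i in range(C):
            gamma = post[i] / total  # responsibility of component i for x
            n[i] += gamma
            ex[i] += gamma * x
    new_means = []
    for i in range(C):
        alpha = n[i] / (n[i] + relevance)  # data-vs-prior balance
        e_i = ex[i] / n[i] if n[i] > 0 else means[i]
        new_means.append(alpha * e_i + (1 - alpha) * means[i])
    return new_means

ubm_means = [-1.0, 1.0]
adapted = map_adapt_means([1.4, 1.6, 1.5, 1.7],
                          [0.5, 0.5], ubm_means, [1.0, 1.0])
```

Each adapted mean is an interpolation between the UBM prior and the user's data: components responsible for many frames move strongly toward the data, while rarely-used components stay near the UBM.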
Step 407: with the voiceprint features of the target user corresponding to the i-th kind of base character as the second sample data, adjust the parameters of the target user's GMM using the maximum a posteriori probability algorithm to obtain the GMM of the target user corresponding to the i-th kind of base character.
In this step, the feature differences between base characters are considered: only the voiceprint features corresponding to the i-th kind of base character are used as the input second sample data, and the target user's GMM is trained a second time. During this training, the parameters of the target user's GMM are adjusted by the maximum a posteriori probability algorithm, yielding the GMM of the target user corresponding to the i-th kind of base character.
For example, with the voiceprint features of the target user corresponding to the digit "0" as input samples, the target user's GMM is trained a second time to obtain the GMM of the target user corresponding to the digit "0".
When there are voiceprint features corresponding to n kinds of base characters, after step 407 is performed, the voiceprint recognition system checks whether i equals n; if i is less than n, it sets i = i + 1 and performs step 407 again.
For each target user, training finally yields n GMMs respectively corresponding to the n kinds of base characters, with base characters and GMMs in one-to-one correspondence.
Illustratively, referring to Fig. 6, when the first character string is "01580518", the finally trained voiceprint model of the target user includes GMMs corresponding to the 4 base characters: the GMM corresponding to ID_0, the GMM corresponding to ID_1, the GMM corresponding to ID_5, and the GMM corresponding to ID_8.
Step 408: store the n GMMs of the target user respectively corresponding to the n kinds of base characters; the n GMMs are used to build the HMM corresponding to the second character string.
The voiceprint recognition module stores the n GMMs of the target user respectively corresponding to the n kinds of base characters.
The second character string is the character string used in the recognition phase. Optionally, the second character string is randomly generated from all or part of the n kinds of base characters. Each kind of base character may appear at different positions in the second character string, and each kind of base character may appear multiple times at different positions of the second character string.
In summary, in the voiceprint model training method provided by this embodiment, the GMM of the target user is obtained by training the UBM with the voiceprint features corresponding to each character of the target user, and the target user's GMM is then trained to obtain n GMMs respectively corresponding to the n kinds of base characters; the n GMMs are used to build the HMM corresponding to the second character string. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the registration voice signal. For each target user, training yields a GMM for each of several base characters, so the differences between the GMMs capture the phoneme-level diversity of the audio content of each base character; in addition, these GMMs can be used to build the HMM corresponding to the recognition string, and the HMM further captures the temporal dependency of the audio content of each base character, thereby greatly improving the recognition accuracy of the target user's voiceprint model in the recognition phase.
Fig. 7 shows a flowchart of a voiceprint recognition method provided by an embodiment of the present invention. This voiceprint recognition method may be applied in a voiceprint recognition system, which may belong to the same device as the voiceprint recognition system mentioned in Fig. 2 or Fig. 4, or to a different device. The method includes:
Step 701: obtain the test speech signal produced by an unknown user reading the second character string aloud.
Optionally, the second character string includes k sequentially arranged characters, the k characters including all or part of n kinds of mutually different base characters, where k and n are positive integers.
Optionally, the n kinds of mutually different base characters are the n kinds of base characters used in the registration process of the target user.
Optionally, the second character string is randomly generated or fixed, and may be identical to or different from the first character string. For example, the second character string is the digit string "851185".
Step 702: extract the voiceprint feature sequence corresponding to each character from the test speech signal.
Step 703: according to the n GMMs of the target user respectively corresponding to the n kinds of base characters, build the HMM corresponding to the second character string.
For example, the n GMMs of the target user include GMMs corresponding to 4 base characters: the GMM corresponding to ID_0, the GMM corresponding to ID_1, the GMM corresponding to ID_5, and the GMM corresponding to ID_8.
Since the second character string contains only the base characters "1", "5" and "8", the GMMs corresponding to ID_1, ID_5 and ID_8 are used to construct the HMM corresponding to the second character string "851185".
Step 704: calculate the similarity score between the test speech signal and the HMM.
Step 705: when the similarity score is greater than a preset threshold, identify the unknown user as the target user.
In summary, in the voiceprint recognition method provided by this embodiment, the similarity score between the voiceprint feature sequence of the test speech signal and the HMM constructed from the GMMs corresponding to multiple base characters is calculated, and the unknown user is identified accordingly. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the registration voice signal. For each target user, the differences between the GMMs corresponding to each base character capture the phoneme-level diversity of the audio content of each base character, and the HMM further captures the temporal dependency of the audio content of each base character, thereby greatly improving the recognition accuracy of the target user's voiceprint model in the recognition phase.
Fig. 8 shows a flowchart of a voiceprint recognition method provided by an embodiment of the present invention. This voiceprint recognition method may be applied in a voiceprint recognition system, which may belong to the same device as the voiceprint recognition system mentioned in Fig. 2 or Fig. 4, or to a different device. The method includes:
Step 801: based on the n kinds of base characters, randomly generate and display the second character string.
Optionally, a base character set is stored in the voiceprint recognition system. Taking digits as the base characters, the base character set may include: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Optionally, the voiceprint recognition system randomly generates the second character string from the base characters in the base character set according to a random algorithm. The second character string includes k sequentially arranged characters, the k characters including all or part of n kinds of mutually different base characters, where k and n are positive integers and usually k ≥ n. That is, a base character may appear at multiple positions in the second character string. For example, the second character string is "851185".
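Generating such a challenge string is straightforward; a minimal sketch, assuming the digit base character set from the example (the function name and seeded generator are illustrative):

```python
import random

BASE_CHARS = "0123456789"  # the base character set (digits, per the example)

def generate_challenge(k=6, rng=None):
    """Randomly generate a k-character second string; a base character
    may appear at several positions, so usually k >= n."""
    rng = rng or random.Random()
    return "".join(rng.choice(BASE_CHARS) for _ in range(k))

challenge = generate_challenge(6, random.Random(42))
```

Because the prompt is freshly randomized per attempt, a replayed recording of an earlier session is unlikely to match the displayed string, which is the usual motivation for random challenge text.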
Optionally, the n kinds of mutually different base characters are the n kinds of base characters used in the registration process of the target user.
The voiceprint recognition system displays the second character string on a display screen for the unknown user to read aloud. Optionally, the voiceprint recognition system also displays auxiliary information on the display screen; illustratively, the auxiliary information is "Please read the following digit string aloud after the tone: 851185".
Optionally, besides random generation, the second character string may also be a preset fixed character string.
Step 802: extract the voiceprint feature sequence corresponding to each character from the test speech signal.
Since the unknown user pauses between adjacent characters when reading each character aloud, the test speech signal contains both valid speech segments and invalid speech segments. An invalid speech segment may be a silent segment or a noise segment.
The voiceprint recognition system identifies the valid and invalid speech segments in the test speech signal and labels the valid speech segments. This process may refer to the description in step 403.
The voiceprint recognition system extracts the j-th valid speech segment in the test speech signal as the speech segment corresponding to the j-th character in the second character string, and extracts the voiceprint feature of the speech segment corresponding to the j-th character.
Each speech segment is equivalent to a sequence of short-time speech frames. The voiceprint recognition system extracts the MFCC or PLP of the speech segment corresponding to the j-th character as the voiceprint feature of that segment. Since the test speech signal includes k characters, the voiceprint recognition system extracts k sequentially arranged groups of voiceprint features; each group includes the MFCCs or PLPs of an unequal number of speech frames. After all voiceprint features are sorted by timestamp, they form the voiceprint feature sequence of the test speech signal.
For example, for the 1st character "8", a group of voiceprint features spanning 1000 milliseconds is extracted; if the frame length of each speech frame is about 20 milliseconds, this group contains about 50 voiceprint features. For the 2nd character "5", a group of voiceprint features spanning 1020 milliseconds is extracted; if the frame length of each speech frame is about 20 milliseconds, this group contains about 51 voiceprint features; and so on for the remaining characters.
In other words, the first 50 sequentially arranged voiceprint features all correspond to the 1st character "8", the next 51 voiceprint features all correspond to the 2nd character "5", and so on for each character in turn.
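The frame counts in the example follow directly from the segment duration and the frame length. A sketch of that arithmetic, assuming non-overlapping 20 ms frames (real front ends typically use overlapping frames, which would change the count):

```python
def frames_in_segment(duration_ms, frame_ms=20):
    """Approximate number of non-overlapping frames in a segment."""
    return duration_ms // frame_ms

frames_char1 = frames_in_segment(1000)  # 1st character "8": 1000 ms
frames_char2 = frames_in_segment(1020)  # 2nd character "5": 1020 ms
```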
Step 803: obtain the x-th character of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the x-th character as the x-th order state model of the HMM.
Taking the second character string "851185" as an example:
obtain the 1st character "8" of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the 1st character "8" as the 1st order state model of the HMM;
obtain the 2nd character "5" of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the 2nd character "5" as the 2nd order state model of the HMM;
obtain the 3rd character "1" of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the 3rd character "1" as the 3rd order state model of the HMM;
obtain the 4th character "1" of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the 4th character "1" as the 4th order state model of the HMM;
obtain the 5th character "8" of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the 5th character "8" as the 5th order state model of the HMM;
obtain the 6th character "5" of the second character string, and from the n GMMs of the target user respectively corresponding to the n kinds of base characters, determine the GMM corresponding to the 6th character "5" as the 6th order state model of the HMM.
Since the second character string includes k characters, step 803 is performed k times.
Step 804: set the self-transition probability and jump probability of each order state model to preset values, and build the HMM corresponding to the second character string.
Each order state model of the HMM includes a state probability distribution, a self-transition probability and a jump probability. For the voiceprint feature corresponding to time t in the voiceprint feature sequence, the state probability distribution of the x-th order state model represents the probability that this voiceprint feature matches the base character corresponding to the x-th order state model; the self-transition probability represents the probability of remaining in the x-th order state model when the observed feature moves from the voiceprint feature at time t to the voiceprint feature at time t+1; the jump probability represents the probability of jumping from the x-th order state model to the (x+1)-th order state model when the observed feature moves from the voiceprint feature at time t to the voiceprint feature at time t+1.
Optionally, the self-transition probability and jump probability of each order state model are both set to 0.5.
The HMM generated by this step is schematically shown in Fig. 9.
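Steps 803 and 804 together amount to assembling a left-to-right HMM whose states are the per-character GMMs. A minimal sketch, with string placeholders standing in for the trained GMM objects; note that in this sketch the final state has no jump successor, so it only self-loops (how the terminal state is handled is an assumption here, not stated in the text):

```python
def build_hmm(second_string, gmms_by_char, stay=0.5, jump=0.5):
    """Assemble a left-to-right HMM: one state per challenge character,
    emission model = that character's GMM, preset stay/jump probabilities."""
    states = [gmms_by_char[ch] for ch in second_string]
    k = len(states)
    trans = [[0.0] * k for _ in range(k)]
    for i in range(k):
        trans[i][i] = stay          # self-transition (stay in state i)
        if i + 1 < k:
            trans[i][i + 1] = jump  # jump to the next state
    return states, trans

gmms_by_char = {"8": "GMM_8", "5": "GMM_5", "1": "GMM_1"}  # placeholders
states, trans = build_hmm("851185", gmms_by_char)
```

Repeated characters simply reuse the same GMM as the emission model of several states, which is why only the 3 distinct GMMs for "1", "5" and "8" are needed for the 6-state HMM of "851185".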
Step 805: input the voiceprint feature sequence into the HMM, calculate the maximum likelihood probability using the Viterbi decoding algorithm, and determine the maximum likelihood probability as the similarity score.
In the voiceprint feature sequence, each character corresponds to multiple sequentially arranged voiceprint features, so the number of voiceprint features in the sequence is greater than the number of GMMs in the HMM; hence each order state model in the HMM may correspond to multiple sequentially arranged voiceprint features. After the voiceprint feature sequence is input into the HMM, multiple probabilities can be calculated for the sequence according to the different transition paths among the GMMs. The Viterbi algorithm computes the maximum likelihood probability of the voiceprint feature sequence given the HMM, and the voiceprint recognition algorithm determines this maximum likelihood probability as the similarity score between the voiceprint feature sequence and the HMM.
Optionally, the similarity score is expressed as a logarithm (log).
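The maximum-likelihood path score can be computed by a standard log-domain Viterbi recursion over the left-to-right topology. A minimal sketch, assuming the per-frame log emission probabilities log P(feature_t | state s) have already been computed from each state's GMM, that decoding must start in the first state and end in the last, and that stay/jump are the 0.5 presets from step 804:

```python
import math

def viterbi_loglik(log_emit, stay=0.5, jump=0.5):
    """Max log-likelihood path through a left-to-right HMM.
    log_emit[t][s] = log P(feature_t | state s)."""
    T, S = len(log_emit), len(log_emit[0])
    log_stay, log_jump = math.log(stay), math.log(jump)
    NEG = float("-inf")
    v = [log_emit[0][0]] + [NEG] * (S - 1)  # must start in state 0
    for t in range(1, T):
        nv = [NEG] * S
        for s in range(S):
            best = v[s] + log_stay                       # stay in s
            if s > 0:
                best = max(best, v[s - 1] + log_jump)    # jump from s-1
            nv[s] = best + log_emit[t][s]
        v = nv
    return v[S - 1]  # must end in the last state

# Toy example: 3 frames, 2 states; state 0 fits frames 0-1, state 1 fits frame 2.
le = [[math.log(0.9), math.log(0.1)],
      [math.log(0.8), math.log(0.2)],
      [math.log(0.1), math.log(0.9)]]
score = viterbi_loglik(le)
```

In the toy example the best path stays in state 0 for two frames and then jumps, giving log(0.9 · 0.5 · 0.8 · 0.5 · 0.9) = log(0.162); that log value is exactly the kind of similarity score the text describes.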
It should be noted that, based on the n GMMs of each target user, an HMM corresponding to the second character string can be built. Thus, when there are Z target users, there are also Z HMMs corresponding to the second character string, and step 805 may accordingly be performed Z times. In some scenarios, however, it is only necessary to confirm whether the unknown user is one specific target user; in that case, step 805 only needs to be performed once.
Step 806: when the similarity score is greater than a preset threshold, identify the unknown user as the target user.
After the voiceprint feature sequence of the test speech signal is input into the HMM of each target user, multiple similarity scores are obtained. Each similarity score is compared with the preset threshold; if a similarity score is greater than the preset threshold, the voiceprint recognition system identifies the unknown user as the corresponding target user.
Otherwise, if the similarity score is less than the preset threshold, the voiceprint recognition system determines that the unknown user does not match the target user; the voiceprint recognition system may allow the unknown user to retry, or refuse to let the unknown user perform subsequent operations.
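The decision rule of step 806 reduces to a thresholded arg-max over per-user scores. A minimal sketch (user names, score values and the threshold are all hypothetical):

```python
def identify(scores_by_user, threshold):
    """Return the best-matching target user if its similarity score
    clears the threshold, else None (reject / allow retry)."""
    best_user = max(scores_by_user, key=scores_by_user.get)
    return best_user if scores_by_user[best_user] > threshold else None

scores = {"alice": -120.5, "bob": -88.2}       # hypothetical log scores
accepted = identify(scores, threshold=-100.0)  # "bob" clears the bar
rejected = identify(scores, threshold=-50.0)   # nobody clears the bar
```

In the single-target verification scenario mentioned above, `scores_by_user` would contain only one entry and the rule degenerates to a simple threshold comparison.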
In summary, in the voiceprint recognition method provided by this embodiment, the similarity score between the voiceprint feature sequence of the test speech signal and the HMM constructed from the GMMs corresponding to multiple base characters is calculated, and the unknown user is identified accordingly. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the registration voice signal. For each target user, the differences between the GMMs corresponding to each base character capture the phoneme-level diversity of the audio content of each base character, and the HMM further captures the temporal dependency of the audio content of each base character, thereby greatly improving the recognition accuracy of the target user's voiceprint model in the recognition phase.
It should be noted that the voiceprint recognition system may be implemented by a terminal alone, or by a terminal and a server in combination. When implemented by a terminal and a server in combination, the voice acquisition phase and the voiceprint feature extraction phase may be performed by the terminal, while the training process and/or the voiceprint recognition process of the voiceprint model may be performed by the server.
In some possible embodiments, the training process of the voiceprint model is performed by a first voiceprint recognition system, which saves the trained n GMMs of the target user into a shared model library; the voiceprint recognition process is performed by a second voiceprint recognition system, which obtains and uses the n GMMs of the target user from the shared model library to generate the second character string and to build the HMM corresponding to the second character string.
In one specific example, on a training set of 1000 speakers with 290,000 trials (about 10,000 identity-matched trials and about 280,000 non-matched trials), a recall of 68.88% was achieved at a one-in-a-thousand false-acceptance rate, with an equal error rate (EER) of 4.52%; compared with the traditional text-independent modeling method, the performance improvement exceeds 30%.
Fig. 10 shows a block diagram of a voiceprint model training apparatus provided by an embodiment of the present invention. The voiceprint model training apparatus may be implemented as all or part of a voiceprint recognition system by a dedicated hardware circuit or by a combination of software and hardware. The apparatus includes:
an acquisition module 1010, configured to collect the registration voice signal produced by a target user reading the first character string aloud, the first character string including m sequentially arranged characters, the m characters including n kinds of mutually different base characters, where m and n are positive integers and m ≥ n;
an extraction module 1020, configured to extract the voiceprint feature corresponding to each character from the registration voice signal;
a first training module 1030, configured to train a preset universal background model with the voiceprint features corresponding to each character of the target user as the first sample data, to obtain the Gaussian mixture model of the target user;
a second training module 1040, configured to train the Gaussian mixture model of the target user with the voiceprint features of the target user corresponding to the i-th kind of base character as the second sample data, to obtain the Gaussian mixture model of the target user corresponding to the i-th kind of base character;
a storage module 1050, configured to store the n Gaussian mixture models of the target user respectively corresponding to the n kinds of base characters, the n Gaussian mixture models being used to build the hidden Markov model corresponding to the second character string.
In an optional embodiment, the apparatus further includes:
a display module 1060, configured to randomly generate and display the first character string.
In an optional embodiment, the extraction module 1020 includes:
a recognition unit, configured to identify the valid speech segments and invalid speech segments in the registration voice signal, the invalid speech segments including silent segments and/or noise segments;
a segment extraction unit, configured to extract the j-th valid speech segment in the registration voice signal as the speech segment corresponding to the j-th character in the first character string;
a feature extraction unit, configured to extract the voiceprint feature of the speech segment corresponding to the j-th character.
In an optional embodiment, the feature extraction unit is configured to extract the mel cepstrum coefficients (MFCC) or perceptual linear predictive coefficients (PLP) of the speech segment corresponding to the j-th character as the voiceprint feature of that segment.
In an optional embodiment, the first training module 1030 is specifically configured to use the voiceprint features corresponding to each base character of the target user as the first sample data, adjust the parameters of the preset universal background model using the maximum a posteriori probability algorithm, and determine the universal background model with adjusted parameters as the Gaussian mixture model of the target user.
In an optional embodiment, the second training module 1040 is specifically configured to use the voiceprint features of the target user corresponding to the i-th kind of base character as the second sample data, adjust the parameters of the target user's Gaussian mixture model using the maximum a posteriori probability algorithm, and determine the Gaussian mixture model of the target user with adjusted parameters as the Gaussian mixture model of the target user corresponding to the i-th kind of base character.
It should be noted that when the voiceprint recognition system is implemented by a terminal and a server in combination, the above acquisition module 1010, extraction module 1020 and display module 1060 may be implemented by a dedicated hardware circuit or a combination of software and hardware in the terminal, and the above first training module 1030, second training module 1040 and storage module 1050 may be implemented by a dedicated hardware circuit or a combination of software and hardware in the server. However, the embodiment of the present invention does not limit this; for example, the above extraction module 1020 may also be implemented by a dedicated hardware circuit, or a combination of software and hardware, in the server.
Fig. 11 shows a block diagram of a voiceprint recognition apparatus provided by an embodiment of the present invention. The voiceprint recognition apparatus may be implemented as all or part of a voiceprint recognition system by a dedicated hardware circuit or by a combination of software and hardware. The apparatus includes:
an acquisition module 1110, configured to obtain the test speech signal produced by an unknown user reading the second character string aloud, the second character string including k sequentially arranged characters, the k characters including all or part of n kinds of mutually different base characters, where k and n are positive integers;
an extraction module 1120, configured to extract the voiceprint feature sequence corresponding to each character from the test speech signal;
a building module 1130, configured to build the HMM corresponding to the second character string according to the n Gaussian mixture models of the target user respectively corresponding to the n kinds of base characters;
a calculation module 1140, configured to calculate the similarity score between the voiceprint feature sequence and the HMM;
an identification module 1150, configured to identify the unknown user as the target user when the similarity score is greater than a preset threshold.
In an optional embodiment, the apparatus further includes:
a display module 1160, configured to randomly generate and display the second character string based on the n kinds of base characters.
In an optional embodiment, the building module 1130 is specifically configured to obtain the x-th character of the second character string, x being a positive integer with 1 ≤ x ≤ k; from the n Gaussian mixture models of the target user respectively corresponding to the n kinds of base characters, determine the Gaussian mixture model corresponding to the x-th character as the x-th order state model of the hidden Markov model; and set the self-transition probability and jump probability of each order state model to preset values, building the HMM corresponding to the second character string.
In an optional embodiment, the calculation module 1140 is specifically configured to input the voiceprint feature sequence into the HMM, calculate the maximum likelihood probability using the Viterbi decoding algorithm, and determine the maximum likelihood probability as the similarity score.
It should be noted that when the voiceprint recognition system is implemented by a terminal and a server in combination, the above acquisition module 1110, extraction module 1120 and display module 1160 may be implemented by a dedicated hardware circuit or a combination of software and hardware in the terminal, and the above building module 1130, calculation module 1140 and identification module 1150 may be implemented by a dedicated hardware circuit or a combination of software and hardware in the server. However, the embodiment of the present invention does not limit this; for example, the above extraction module 1120 may also be implemented by a dedicated hardware circuit, or a combination of software and hardware, in the server.
The combination of the software and hardware described in the embodiment of the present invention, it is common that in finger processor run memory or one
Above programmed instruction, realize in step that said method embodiment provided or said apparatus embodiment " module or
Unit ".
It should be understood that when the voiceprint model training apparatus provided by the above embodiments trains a voiceprint model, the division into the above functional modules is merely illustrative; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the voiceprint model training apparatus provided by the above embodiments belongs to the same concept as the voiceprint model training method embodiments; for its specific implementation process, refer to the method embodiments, which is not repeated here.
Likewise, when the voiceprint recognition apparatus provided by the above embodiments performs voiceprint recognition, the division into the above functional modules is merely illustrative; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the voiceprint recognition apparatus provided by the above embodiments belongs to the same concept as the voiceprint recognition method embodiments; for its specific implementation process, refer to the method embodiments, which is not repeated here.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (20)
1. A voiceprint model training method, characterized in that the method comprises:
collecting a registration speech signal produced by a target user reading aloud a first character string, wherein the first character string comprises m sequentially arranged characters, the m characters comprise n mutually different basic characters, and m and n are positive integers with m ≥ n;
extracting a voiceprint feature corresponding to each character from the registration speech signal;
training a preset universal background model with the voiceprint features of the target user corresponding to each character as first sample data, to obtain a Gaussian mixture model of the target user;
training the Gaussian mixture model of the target user with the voiceprint features of the target user corresponding to the i-th basic character as second sample data, to obtain a Gaussian mixture model of the target user corresponding to the i-th basic character;
storing the n Gaussian mixture models of the target user corresponding one-to-one to the n basic characters, wherein the n Gaussian mixture models are used to build a hidden Markov model corresponding to a second character string.
2. The method according to claim 1, characterized in that before collecting the registration speech signal produced by the target user reading aloud the first character string, the method further comprises:
randomly generating and displaying the first character string.
3. The method according to claim 1, characterized in that extracting the voiceprint feature corresponding to each character from the registration speech signal comprises:
identifying valid speech segments and invalid speech segments in the registration speech signal, wherein the invalid speech segments comprise silence segments and/or noise segments;
extracting the j-th valid speech segment in the registration speech signal as the speech segment corresponding to the j-th character in the first character string;
extracting the voiceprint feature of the speech segment corresponding to the j-th character.
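The segmentation step in claim 3 can be sketched with a short-time-energy detector. This is only an illustrative assumption — the patent does not specify how valid and invalid segments are distinguished — and the frame length and threshold below are placeholder values:

```python
import numpy as np

def split_valid_segments(signal, frame_len=400, energy_thresh=0.01):
    """Mark frames whose short-time energy exceeds a threshold as valid
    speech, then merge consecutive valid frames into segments; the
    remaining frames are the 'invalid' (silence/noise) segments."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    valid = energy > energy_thresh
    segments, start = [], None
    for i, v in enumerate(valid):
        if v and start is None:
            start = i                                   # segment opens
        elif not v and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None                                # segment closes
    if start is not None:                               # trailing segment
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Each returned (start, end) sample range would then be paired with the j-th character of the displayed string, as the claim describes.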
4. The method according to claim 3, characterized in that extracting the voiceprint feature of the speech segment corresponding to the j-th character comprises:
extracting Mel-frequency cepstral coefficients (MFCC) or perceptual linear prediction (PLP) coefficients from the speech segment corresponding to the j-th character, as the voiceprint feature of the speech segment corresponding to the j-th character.
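As a rough illustration of the MFCC option in claim 4, the following minimal NumPy sketch computes Mel-frequency cepstral coefficients from scratch. All parameter values (sample rate, FFT size, hop, filter and coefficient counts) are illustrative defaults, not taken from the patent:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: pre-emphasis, framing, power spectrum,
    Mel filterbank, log, DCT-II."""
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(sig) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel filterbank between 0 Hz and sr/2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        fb[i, bins[i]:bins[i + 1]] = np.linspace(
            0.0, 1.0, bins[i + 1] - bins[i], endpoint=False)
        fb[i, bins[i + 1]:bins[i + 2]] = np.linspace(
            1.0, 0.0, bins[i + 2] - bins[i + 1], endpoint=False)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II to decorrelate; keep the first n_ceps coefficients
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T        # shape: (n_frames, n_ceps)
```

One row per frame of the segment gives the per-character feature matrix that the later training steps consume.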
5. The method according to any one of claims 1 to 4, characterized in that training the preset universal background model with the voiceprint features of the target user corresponding to each basic character as the first sample data, to obtain the Gaussian mixture model of the target user, comprises:
adjusting the parameters of the preset universal background model by a maximum a posteriori (MAP) algorithm, with the voiceprint features of the target user corresponding to each basic character as the first sample data;
determining the universal background model with adjusted parameters as the Gaussian mixture model of the target user.
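Claim 5's MAP adjustment can be sketched as a Reynolds-style mean-only adaptation of the universal background model toward the user's data. Diagonal covariances, mean-only updates, and the relevance factor `r` are standard assumptions not spelled out in the claim:

```python
import numpy as np

def map_adapt_means(ubm_means, ubm_weights, ubm_vars, features, r=16.0):
    """Posterior-weighted MAP update of UBM component means.
    ubm_means/ubm_vars: (C, D); ubm_weights: (C,); features: (T, D)."""
    # Per-frame log-likelihood of each diagonal-Gaussian component
    diff = features[:, None, :] - ubm_means[None, :, :]        # (T, C, D)
    ll = (-0.5 * (diff ** 2 / ubm_vars).sum(-1)
          - 0.5 * np.log(2 * np.pi * ubm_vars).sum(-1))        # (T, C)
    log_post = np.log(ubm_weights) + ll
    log_post -= log_post.max(axis=1, keepdims=True)            # stabilize
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                    # responsibilities
    n_c = post.sum(axis=0)                                     # occupancy counts
    e_x = post.T @ features / np.maximum(n_c[:, None], 1e-10)  # data means
    alpha = (n_c / (n_c + r))[:, None]                         # adaptation weight
    return alpha * e_x + (1 - alpha) * ubm_means
```

Components with little occupancy stay close to the UBM; well-observed components move toward the user's data, which matches the intent of "adjusting the parameters" per user.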
6. The method according to any one of claims 1 to 4, characterized in that training the Gaussian mixture model of the target user with the voiceprint features of the target user corresponding to the i-th basic character as the second sample data, to obtain the Gaussian mixture model of the target user corresponding to the i-th basic character, comprises:
adjusting the parameters of the Gaussian mixture model of the target user by a maximum a posteriori (MAP) algorithm, with the voiceprint features of the target user corresponding to the i-th basic character as the second sample data;
determining the Gaussian mixture model of the target user with adjusted parameters as the Gaussian mixture model of the target user corresponding to the i-th basic character.
7. A voiceprint recognition method, characterized in that the method comprises:
obtaining a test speech signal produced by an unknown user reading aloud a second character string, wherein the second character string comprises k sequentially arranged characters, the k characters comprise all or part of n mutually different basic characters, and k and n are both positive integers;
extracting a voiceprint feature sequence corresponding to each character from the test speech signal;
building a hidden Markov model corresponding to the second character string according to n Gaussian mixture models of a target user corresponding one-to-one to the n basic characters;
calculating a similarity score between the voiceprint feature sequence and the hidden Markov model;
identifying the unknown user as the target user when the similarity score is greater than a preset threshold.
8. The method according to claim 7, characterized in that before obtaining the test speech signal produced by the unknown user reading aloud the second character string, the method further comprises:
randomly generating and displaying the second character string based on the n basic characters.
9. The method according to claim 7, characterized in that building the hidden Markov model corresponding to the second character string according to the n Gaussian mixture models of the target user corresponding one-to-one to the n basic characters comprises:
obtaining the x-th character of the second character string, wherein x is a positive integer greater than or equal to 1 and less than or equal to k;
determining, from the n Gaussian mixture models of the target user corresponding one-to-one to the n basic characters, the Gaussian mixture model corresponding to the x-th character as the x-th state model of the hidden Markov model;
setting the self-loop probability and the transition probability of each state model to preset values, to build the hidden Markov model corresponding to the second character string.
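The construction in claim 9 — one state per character of the second string, with preset self-loop and forward-transition probabilities — can be sketched as follows. The 0.5 preset value and the strictly left-to-right topology are illustrative assumptions:

```python
def build_hmm(char_string, user_gmms, self_loop=0.5):
    """The x-th state of the HMM is the target user's GMM for the x-th
    character; transition probabilities are fixed preset values."""
    states = [user_gmms[c] for c in char_string]    # per-character GMMs
    k = len(states)
    trans = [[0.0] * k for _ in range(k)]
    for x in range(k):
        trans[x][x] = self_loop                     # stay in state x
        if x + 1 < k:
            trans[x][x + 1] = 1.0 - self_loop       # advance to state x+1
    return states, trans
```

Because the states are just lookups into the stored per-character models, the HMM for any freshly generated string is assembled without retraining, which is the advantage the claim structure suggests.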
10. The method according to claim 7, characterized in that calculating the similarity score between the voiceprint feature sequence and the hidden Markov model comprises:
inputting the voiceprint feature sequence into the hidden Markov model, calculating the maximum likelihood probability by the Viterbi algorithm, and determining the maximum likelihood probability as the similarity score.
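Claim 10's Viterbi scoring over a left-to-right HMM can be sketched in log space. Here `log_emit[t, s]` is assumed to be precomputed as the log-likelihood of frame t under state s's Gaussian mixture model, and the path is constrained to start in the first state and end in the last:

```python
import numpy as np

def viterbi_score(log_emit, log_trans):
    """Best-path (maximum likelihood) log-probability of a feature
    sequence against an HMM with log transition matrix log_trans."""
    T, S = log_emit.shape
    delta = np.full(S, -np.inf)
    delta[0] = log_emit[0, 0]          # must start in the first state
    for t in range(1, T):
        # delta[:, None] + log_trans is the (S, S) matrix of path scores
        delta = (delta[:, None] + log_trans).max(axis=0) + log_emit[t]
    return delta[S - 1]                # must end in the last state
```

The score returned here would then be compared against the preset threshold of claim 7 to accept or reject the unknown user.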
11. A voiceprint model training apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to collect a registration speech signal produced by a target user reading aloud a first character string, wherein the first character string comprises m sequentially arranged characters, the m characters comprise n mutually different basic characters, and m and n are positive integers with m ≥ n;
an extraction module, configured to extract a voiceprint feature corresponding to each character from the registration speech signal;
a first training module, configured to train a preset universal background model with the voiceprint features of the target user corresponding to each character as first sample data, to obtain a Gaussian mixture model of the target user;
a second training module, configured to train the Gaussian mixture model of the target user with the voiceprint features of the target user corresponding to the i-th basic character as second sample data, to obtain a Gaussian mixture model of the target user corresponding to the i-th basic character;
a storage module, configured to store the n Gaussian mixture models of the target user corresponding one-to-one to the n basic characters, wherein the n Gaussian mixture models are used to build a hidden Markov model corresponding to a second character string.
12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
a display module, configured to randomly generate and display the first character string.
13. The apparatus according to claim 11, characterized in that the extraction module comprises:
a recognition unit, configured to identify valid speech segments and invalid speech segments in the registration speech signal, wherein the invalid speech segments comprise silence segments and/or noise segments;
a segment extraction unit, configured to extract the j-th valid speech segment in the registration speech signal as the speech segment corresponding to the j-th character in the first character string;
a feature extraction unit, configured to extract the voiceprint feature of the speech segment corresponding to the j-th character.
14. The apparatus according to claim 13, characterized in that the feature extraction unit is configured to extract Mel-frequency cepstral coefficients (MFCC) or perceptual linear prediction (PLP) coefficients from the speech segment corresponding to the j-th character, as the voiceprint feature of the speech segment corresponding to the j-th character.
15. The apparatus according to any one of claims 11 to 14, characterized in that the first training module is specifically configured to: adjust the parameters of the preset universal background model by a maximum a posteriori (MAP) algorithm, with the voiceprint features of the target user corresponding to each basic character as the first sample data; and determine the universal background model with adjusted parameters as the Gaussian mixture model of the target user.
16. The apparatus according to any one of claims 11 to 14, characterized in that the second training module is specifically configured to: adjust the parameters of the Gaussian mixture model of the target user by a maximum a posteriori (MAP) algorithm, with the voiceprint features of the target user corresponding to the i-th basic character as the second sample data; and determine the Gaussian mixture model of the target user with adjusted parameters as the Gaussian mixture model of the target user corresponding to the i-th basic character.
17. A voiceprint recognition apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain a test speech signal produced by an unknown user reading aloud a second character string, wherein the second character string comprises k sequentially arranged characters, the k characters comprise all or part of n mutually different basic characters, and k and n are both positive integers;
an extraction module, configured to extract a voiceprint feature sequence corresponding to each character from the test speech signal;
a building module, configured to build a hidden Markov model corresponding to the second character string according to n Gaussian mixture models of a target user corresponding one-to-one to the n basic characters;
a computing module, configured to calculate a similarity score between the voiceprint feature sequence and the hidden Markov model;
an identification module, configured to identify the unknown user as the target user when the similarity score is greater than a preset threshold.
18. The apparatus according to claim 17, characterized in that the apparatus further comprises:
a display module, configured to randomly generate and display the second character string based on the n basic characters.
19. The apparatus according to claim 17, characterized in that the building module is specifically configured to: obtain the x-th character of the second character string, wherein x is a positive integer greater than or equal to 1 and less than or equal to k; determine, from the n Gaussian mixture models of the target user corresponding one-to-one to the n basic characters, the Gaussian mixture model corresponding to the x-th character as the x-th state model of the hidden Markov model; and set the self-loop probability and the transition probability of each state model to preset values, to build the hidden Markov model corresponding to the second character string.
20. The apparatus according to claim 17, characterized in that the computing module is specifically configured to input the voiceprint feature sequence into the hidden Markov model, calculate the maximum likelihood probability by the Viterbi algorithm, and determine the maximum likelihood probability as the similarity score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610388231.3A CN106057206B (en) | 2016-06-01 | 2016-06-01 | Sound-groove model training method, method for recognizing sound-groove and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106057206A true CN106057206A (en) | 2016-10-26 |
CN106057206B CN106057206B (en) | 2019-05-03 |
Family
ID=57169475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610388231.3A Active CN106057206B (en) | 2016-06-01 | 2016-06-01 | Sound-groove model training method, method for recognizing sound-groove and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106057206B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107358945A (en) * | 2017-07-26 | 2017-11-17 | 谢兵 | A kind of more people's conversation audio recognition methods and system based on machine learning |
CN108416592A (en) * | 2018-03-19 | 2018-08-17 | 成都信达智胜科技有限公司 | A kind of high speed voice recognition methods |
CN109102812A (en) * | 2017-06-21 | 2018-12-28 | 北京搜狗科技发展有限公司 | A kind of method for recognizing sound-groove, system and electronic equipment |
WO2019000832A1 (en) * | 2017-06-30 | 2019-01-03 | 百度在线网络技术(北京)有限公司 | Method and apparatus for voiceprint creation and registration |
CN109473107A (en) * | 2018-12-03 | 2019-03-15 | 厦门快商通信息技术有限公司 | A kind of relevant method for recognizing sound-groove of text half and system |
CN109871847A (en) * | 2019-03-13 | 2019-06-11 | 厦门商集网络科技有限责任公司 | A kind of OCR recognition methods and terminal |
CN109948481A (en) * | 2019-03-07 | 2019-06-28 | 惠州学院 | A kind of passive human body recognition method based on the sampling of narrow radio frequency link |
CN110335608A (en) * | 2019-06-17 | 2019-10-15 | 平安科技(深圳)有限公司 | Voice print verification method, apparatus, equipment and storage medium |
CN110491393A (en) * | 2019-08-30 | 2019-11-22 | 科大讯飞股份有限公司 | The training method and relevant apparatus of vocal print characterization model |
CN110517671A (en) * | 2019-08-30 | 2019-11-29 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of appraisal procedure of audio-frequency information, device and storage medium |
CN110689895A (en) * | 2019-09-06 | 2020-01-14 | 北京捷通华声科技股份有限公司 | Voice verification method and device, electronic equipment and readable storage medium |
CN111081260A (en) * | 2019-12-31 | 2020-04-28 | 苏州思必驰信息科技有限公司 | Method and system for identifying voiceprint of awakening word |
CN111341307A (en) * | 2020-03-13 | 2020-06-26 | 腾讯科技(深圳)有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN112151018A (en) * | 2019-06-10 | 2020-12-29 | 阿里巴巴集团控股有限公司 | Voice evaluation and voice recognition method, device, equipment and storage medium |
CN112820299A (en) * | 2020-12-29 | 2021-05-18 | 马上消费金融股份有限公司 | Voiceprint recognition model training method and device and related equipment |
CN113056784A (en) * | 2019-01-29 | 2021-06-29 | 深圳市欢太科技有限公司 | Voice information processing method and device, storage medium and electronic equipment |
CN113457096A (en) * | 2020-03-31 | 2021-10-01 | 荣耀终端有限公司 | Method for detecting basketball movement based on wearable device and wearable device |
CN113571054A (en) * | 2020-04-28 | 2021-10-29 | ***通信集团浙江有限公司 | Speech recognition signal preprocessing method, device, equipment and computer storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070294083A1 (en) * | 2000-03-16 | 2007-12-20 | Bellegarda Jerome R | Fast, language-independent method for user authentication by voice |
CN102238190A (en) * | 2011-08-01 | 2011-11-09 | 安徽科大讯飞信息科技股份有限公司 | Identity authentication method and system |
CN104717219A (en) * | 2015-03-20 | 2015-06-17 | 百度在线网络技术(北京)有限公司 | Vocal print login method and device based on artificial intelligence |
CN104821934A (en) * | 2015-03-20 | 2015-08-05 | 百度在线网络技术(北京)有限公司 | Artificial intelligence based voice print login method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106057206B (en) | 2019-05-03 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
2023-07-13 | TR01 | Transfer of patent right | Effective date of registration: 2023-07-13. Address after: 35/F, Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen, Guangdong 518057. Patentee after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.; TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd. Address before: Room 403, East Block 2, SEG Science and Technology Park, Zhenxing Road, Futian District, Shenzhen, Guangdong 518000. Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.