CN106098068B

CN106098068B - Voiceprint recognition method and device

Info

Publication number: CN106098068B
Application number: CN201610416650.3A
Authority: CN
Inventors: 李为; 钱柄桦; 金星明; 李科; 吴富章; 吴永坚; 黄飞跃
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2016-06-12
Filing date: 2016-06-12
Publication date: 2019-07-16
Anticipated expiration: 2036-06-12
Also published as: WO2017215558A1; CN106098068A

Abstract

Embodiments of the invention disclose a voiceprint recognition method and device. The method comprises: obtaining verification voice information produced when a verification user reads aloud a first character string; performing speech recognition on the verification voice information to obtain the speech segments it contains that correspond respectively to multiple characters in the first character string; extracting the voiceprint features of the speech segment corresponding to each character; training, from the voiceprint features of the speech segment corresponding to each character together with a preset universal background model of the corresponding character, a feature vector for each character in the verification voice information; and calculating a similarity score between the feature vector of each character in the verification voice information and the feature vector of the same character in preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining that the verification user is the registered user corresponding to the registration voice information. The invention can effectively improve the accuracy of voiceprint recognition.

Description

Voiceprint recognition method and device

Technical field

The present invention relates to the field of speech recognition technology, and in particular to a voiceprint recognition method and device.

Background

Voiceprint recognition is a biometric identification method comprising two stages: user registration and user identity recognition. In the registration stage, speech is mapped to a user model through a series of processing steps. In the recognition stage, a speech sample of unknown identity is matched against the model for similarity, and a judgment is made as to whether the identity of the unknown speech is consistent with the identity of the registration speech. Existing voiceprint modeling methods usually model at the text-independent level to describe speaker identity characteristics. With text-independent modeling, however, recognition accuracy is low when the user reads different content, making it difficult to meet requirements.

Summary of the invention

In view of this, embodiments of the present invention provide a voiceprint recognition method and device that can effectively improve the accuracy of voiceprint recognition.

To solve the above technical problem, an embodiment of the invention provides a voiceprint recognition method, the method comprising:

obtaining verification voice information produced when a verification user reads aloud a first character string;

performing speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to multiple characters in the first character string;

extracting the voiceprint features of the speech segment corresponding to each character;

training, according to the voiceprint features of the speech segment corresponding to each character and in combination with a preset universal background model of the corresponding character, a feature vector corresponding to each character in the verification voice information; and

calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining the verification user to be the registered user corresponding to the registration voice information.

Correspondingly, an embodiment of the invention further provides a voiceprint recognition device, the device comprising:

a voice acquisition module, configured to obtain verification voice information produced when a verification user reads aloud a first character string;

a speech segment recognition module, configured to perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to multiple characters in the first character string;

a voiceprint feature extraction module, configured to extract the voiceprint features of the speech segment corresponding to each character in the verification voice information;

a feature model training module, configured to train, according to the voiceprint features of the speech segment corresponding to each character and in combination with a preset universal background model of the corresponding character, a feature vector corresponding to each character in the verification voice information;

a similarity judgment module, configured to calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in preset registration voice information; and

a user identification module, configured to determine the verification user to be the registered user corresponding to the registration voice information if the similarity score reaches a preset verification threshold.

In this embodiment, the voiceprint features of the speech segment corresponding to each character in the verification user's verification voice information are combined with the preset UBM of the corresponding character to train a feature vector for each character in the verification voice information. The identity of the verification user is then determined by comparing, character by character, the feature vector in the verification voice information with the feature vector of the same character in the registration voice information. Because the compared user feature vectors correspond to specific characters, the voiceprint characteristics of the user reading different characters are fully taken into account, which effectively improves voiceprint recognition accuracy.

Brief description of the drawings

To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic overview of the stages of the voiceprint recognition method in an embodiment of the present invention;

Fig. 2 is a flow diagram of a voiceprint recognition method in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the principle of recognizing, from voice information, the speech segments corresponding to multiple characters in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the principle of obtaining, from voice information, the feature vector corresponding to each character in an embodiment of the present invention;

Fig. 5 is a flow diagram of voiceprint registration of a registered user in an embodiment of the present invention;

Fig. 6 is a flow diagram of a voiceprint recognition method in another embodiment of the present invention;

Fig. 7 is a structural diagram of a voiceprint recognition device in an embodiment of the present invention;

Fig. 8 is a structural diagram of the speech segment recognition module in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Embodiments of the invention provide a voiceprint recognition method and device, which can be applied in any scenario or device that needs to identify a user of unknown identity. The characters in the character string used for voiceprint recognition may be Arabic numerals, English letters, or characters of other languages. For simplicity, the characters in the embodiments are illustrated with Arabic numerals.

The voiceprint recognition method in the embodiments of the present invention can be divided into two stages, as shown in Fig. 1:

1) Voiceprint registration stage of the registered user

In the voiceprint registration stage, the registered user reads aloud a registration string (the second character string referred to below), and the voiceprint recognition device collects the registration voice information produced while the registered user reads it. The device then performs speech recognition on the registration voice information to obtain the speech segments it contains that correspond respectively to the multiple characters in the registration string, and performs voiceprint feature extraction and voiceprint model training on the speech segment of each character. This includes training, from the voiceprint features of each character's speech segment together with a preset universal background model (UBM, i.e. GMM-UBM) of the corresponding character, the feature vector of each character in the registration voice information. In the voiceprint registration stage, the device can store, for each registered user, the feature vectors of the multiple characters in that user's registration voice information in its model library.
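The per-user, per-character model library described above can be pictured as a nested mapping from registered user to character to feature vector. The following Python sketch is only an illustration under assumptions: the class name `VoiceprintModelLibrary` and its methods are hypothetical, and feature vectors are assumed to be NumPy arrays of a fixed dimension.

```python
from typing import Dict, List, Optional

import numpy as np


class VoiceprintModelLibrary:
    """Hypothetical model library: registered user -> character -> feature vector."""

    def __init__(self) -> None:
        self._models: Dict[str, Dict[str, np.ndarray]] = {}

    def enroll(self, user_id: str, char_vectors: Dict[str, np.ndarray]) -> None:
        # Store a copy so later changes to the caller's arrays do not leak in.
        self._models[user_id] = {
            c: np.asarray(v, dtype=float).copy() for c, v in char_vectors.items()
        }

    def users(self) -> List[str]:
        return sorted(self._models)

    def characters_of(self, user_id: str) -> List[str]:
        return sorted(self._models[user_id])

    def vector(self, user_id: str, char: str) -> Optional[np.ndarray]:
        # None if the user is unknown or never spoke this character at registration.
        return self._models.get(user_id, {}).get(char)


lib = VoiceprintModelLibrary()
lib.enroll("A", {"0": np.ones(3), "1": np.zeros(3)})
print(lib.users(), lib.characters_of("A"))  # ['A'] ['0', '1']
```

Keying by character mirrors the point of the method: only vectors of the same character are ever compared, so lookups are per (user, character) pair. Serializing such a mapping is one way a first device could hand registration vectors to a second device.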

For example, if the registration string is the digit string 0185851, which contains the four digits "0", "1", "5" and "8", the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the speech segment of each character in the registration voice information to obtain the voiceprint features of the speech segments of "0", "1", "5" and "8". Then, combined with the preset UBM of each corresponding character, it trains the feature vector of each character in the registration voice information: a feature vector for digit "0", one for digit "1", one for digit "5" and one for digit "8".

2) Identity recognition stage of the verification user

In the identity recognition stage, the verification user, i.e. a user of unknown identity, reads aloud a verification string (the first character string referred to below; the second character string shares at least one identical character with the first character string). The voiceprint recognition device collects the verification voice information produced while the verification user reads the verification string, performs speech recognition on it to obtain the speech segments it contains that correspond respectively to the multiple characters in the verification string, and performs voiceprint feature extraction and voiceprint model training on the speech segment of each character. This includes training, from the voiceprint features of each character's speech segment together with the preset UBM of the corresponding character, the feature vector of each character in the verification voice information. Finally, the device calculates the similarity score between the feature vector of each character in the verification voice information and the feature vector of the same character in the preset registration voice information; if the similarity score reaches a preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.

For example, if the verification string is the digit string 85851510, the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the speech segment of each character in the verification voice information produced when the verification user reads it, obtaining the GMMs corresponding to "0", "1", "5" and "8". Combined with the preset UBM of each corresponding character, it then computes the feature vectors of the verification user's verification voice information: one each for digits "0", "1", "5" and "8". Next, for each of "0", "1", "5" and "8", it calculates the similarity score between the digit's feature vector in the verification voice information and the feature vector of the same digit in the registration voice information. If the similarity score reaches the preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.
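As a quick check of the two example strings, only characters present in both the registration string and the verification string can be scored. The hypothetical helper `shared_characters` below illustrates this with Python sets; the occurrence count also shows that, in the example verification string, digit "5" is spoken three times and "8" and "1" twice each.

```python
from collections import Counter


def shared_characters(registration: str, verification: str) -> set:
    """Characters appearing in both strings; only these have a pair of vectors to compare."""
    return set(registration) & set(verification)


reg, ver = "0185851", "85851510"
print(sorted(shared_characters(reg, ver)))  # ['0', '1', '5', '8']
print(dict(Counter(ver)))                   # {'8': 2, '5': 3, '1': 2, '0': 1}
```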

Note that the voiceprint registration stage of the registered user and the identity recognition stage of the verification user may be implemented in the same device or in different devices. For example, the voiceprint registration stage may be carried out on a first device, which then sends the feature vectors of the multiple characters in the registration voice information to a second device, so that the identity recognition stage of the verification user can be carried out on the second device.

The two processes are described in detail below through specific embodiments.

Fig. 2 is a flow diagram of a voiceprint recognition method in an embodiment of the present invention. As shown, the voiceprint recognition process in this embodiment may include:

S201: obtain verification voice information produced when the verification user reads aloud a first character string.

The verification user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition device. The first character string is the string used to verify the verification user's identity. It may be generated randomly, or it may be a preset fixed string, for example a string at least partly identical to the second character string corresponding to the previously generated registration voice information. Specifically, the string may contain m characters, of which n are mutually different, where m and n are positive integers and m ≥ n.

For example, the first character string "12358948" has 8 characters in total, including 7 mutually different characters: "1", "2", "3", "4", "5", "8" and "9".
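The quantities m and n, and one way to produce a random first character string that shares at least one character with the registration string, can be sketched as follows. The function names and the rejection-sampling approach are assumptions for illustration, not the generation method specified by the patent.

```python
import random
from typing import Optional, Tuple


def string_stats(s: str) -> Tuple[int, int]:
    """Return (m, n): total number of characters and number of mutually different characters."""
    return len(s), len(set(s))


def random_challenge(m: int, registration: str, alphabet: str = "0123456789",
                     rng: Optional[random.Random] = None) -> str:
    """Hypothetical generator: m random characters sharing >= 1 character with `registration`."""
    if m < 1 or not set(registration) & set(alphabet):
        raise ValueError("need m >= 1 and an alphabet overlapping the registration string")
    rng = rng or random.Random()
    while True:  # rejection sampling: redraw until the overlap condition holds
        s = "".join(rng.choice(alphabet) for _ in range(m))
        if set(s) & set(registration):
            return s


print(string_stats("12358948"))  # (8, 7)
```

With a 10-digit alphabet and a registration string covering several digits, a rejected draw is rare, so the loop almost always returns on the first or second attempt.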

In an optional embodiment, the voiceprint recognition device may generate and display the first character string so that the verification user can read it aloud from the display.

S202: perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to multiple characters in the first character string.

As shown in Fig. 3, the voiceprint recognition device may split the verification voice information into the speech segments corresponding to the multiple characters by means of speech recognition and sound-intensity filtering, and may optionally discard invalid speech segments so that they do not take part in subsequent processing.
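The patent does not specify how sound-intensity filtering is done. One plausible reading, sketched below under stated assumptions, is to take per-character segment boundaries from a speech recognizer's alignment and discard segments whose RMS level falls far below that of the loudest segment. The `(character, start, end)` tuple format, the -30 dB threshold, the minimum length of 160 samples and the function name are all hypothetical.

```python
from typing import List, Tuple

import numpy as np

# (character, start sample, end sample), assumed to come from an ASR alignment
Segment = Tuple[str, int, int]


def filter_segments(signal: np.ndarray, segments: List[Segment],
                    min_db: float = -30.0, min_len: int = 160) -> List[Segment]:
    """Keep segments whose RMS level is within `min_db` dB of the loudest segment."""
    levels = []
    for _, start, end in segments:
        chunk = signal[start:end].astype(float)
        rms = np.sqrt(np.mean(chunk ** 2)) if len(chunk) else 0.0
        levels.append(20.0 * np.log10(rms + 1e-12))  # level in dB (relative scale)
    if not levels:
        return []
    ref = max(levels)
    return [seg for seg, lvl in zip(segments, levels)
            if lvl - ref >= min_db and seg[2] - seg[1] >= min_len]


t = np.arange(800) / 8000.0
loud = np.sin(2 * np.pi * 440 * t)      # a clearly voiced segment
quiet = 1e-4 * np.sin(2 * np.pi * 440 * t)  # near-silence, about 80 dB lower
sig = np.concatenate([loud, quiet])
print(filter_segments(sig, [("1", 0, 800), ("2", 800, 1600)]))  # [('1', 0, 800)]
```

Measuring levels relative to the loudest segment, rather than against an absolute threshold, keeps the rule independent of microphone gain.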

S203: extract the voiceprint features of the speech segment corresponding to each character.

Specifically, the voiceprint recognition device may extract MFCCs (Mel-frequency cepstral coefficients) or PLP (perceptual linear prediction) coefficients from the speech segment of each character as the voiceprint features of that segment.
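For reference, a minimal textbook MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) is sketched below in NumPy. The parameter values (16 kHz audio, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are common defaults assumed for illustration, not values given in the patent; PLP is not shown.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) array of MFCCs for one speech segment."""
    x = np.asarray(signal, dtype=float)
    x = np.append(x[0], x[1:] - preemph * x[:-1])            # pre-emphasis
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)                  # windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft  # (n_frames, n_fft//2 + 1)

    # Triangular filters equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(1, n_mels + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):
            fbank[j - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[j - 1, k] = (right - k) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)

    # Orthonormal DCT-II of the log mel energies, keeping the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    dct *= np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T


feats = mfcc(np.random.default_rng(0).standard_normal(16000))
print(feats.shape)  # (98, 13)
```

Each row of the result is one frame-level feature vector x; the frames of one character's segment together form the input samples used with formula (1) below.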

S204: train, according to the voiceprint features of the speech segment corresponding to each character and in combination with the preset universal background model of the corresponding character, the feature vector corresponding to each character in the verification voice information.

The universal background model (UBM) in the embodiments of the present invention is a Gaussian mixture model trained jointly on speech segments of a specific digit spoken by a large number of speakers. It characterizes the distribution of speech of that digit in the feature space. Because its training data come from many speakers, it does not characterize any particular speaker; it is independent of identity and can therefore be regarded as a universal background model. Illustratively, the UBM may be trained on speech samples from more than 1000 speakers with a total duration of more than 20 hours, in which the occurrence frequencies of the characters are relatively balanced. The mathematical expression of the UBM is:

$$P(x) = \sum_{i=1}^{C} a_i \, N(x \mid \mu_i, \Sigma_i) \qquad (1)$$

where P(x) is the probability distribution of the UBM; C is the number of Gaussian components in the UBM, over which the sum is taken; a_i is the weight of the i-th Gaussian component; μ_i is the mean of the i-th Gaussian component; Σ_i is the covariance of the i-th Gaussian component; N(·) denotes the Gaussian distribution; and x is the input sample, i.e. the voiceprint feature.
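Formula (1) and the offline training of a per-character UBM can be sketched in NumPy as below. This assumes diagonal covariance matrices Σ_i (a common simplification that the patent does not state), plain expectation-maximization (EM) for training, and a farthest-point initialization chosen only to keep the sketch deterministic; all function names are hypothetical. In practice one such model would be trained per digit on frames pooled from many speakers.

```python
import numpy as np


def component_log_probs(x, weights, means, variances):
    """log(a_i * N(x | mu_i, Sigma_i)) per frame and component, diagonal Sigma_i.
    x: (T, D); weights: (C,); means, variances: (C, D). Returns (T, C)."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    return (np.log(weights)[None, :] + log_norm[None, :]
            - 0.5 * (diff ** 2 / variances[None]).sum(axis=2))


def logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def gmm_log_likelihood(x, weights, means, variances):
    """log P(x) of formula (1) for each frame in x."""
    return logsumexp_rows(component_log_probs(np.atleast_2d(x), weights, means, variances))


def train_ubm(x, n_components, n_iter=50, var_floor=1e-3):
    """Plain EM on pooled frames of one character."""
    x = np.asarray(x, dtype=float)
    chosen = [0]  # farthest-point initialization of the means
    for _ in range(1, n_components):
        d = np.min(((x[:, None, :] - x[chosen][None]) ** 2).sum(-1), axis=1)
        chosen.append(int(np.argmax(d)))
    means = x[chosen].copy()
    variances = np.tile(x.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        lp = component_log_probs(x, weights, means, variances)
        resp = np.exp(lp - logsumexp_rows(lp)[:, None])    # E-step: responsibilities (T, C)
        n_k = resp.sum(axis=0) + 1e-10                      # soft frame counts
        weights = n_k / n_k.sum()                           # M-step updates
        means = (resp.T @ x) / n_k[:, None]
        variances = np.maximum((resp.T @ x ** 2) / n_k[:, None] - means ** 2, var_floor)
    return weights, means, variances


rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(-5, 1, (500, 2)), rng.normal(5, 1, (500, 2))])
w, mu, var = train_ubm(frames, 2)
print(np.round(np.sort(mu[:, 0]), 1))  # close to [-5, 5]
```

Working in the log domain with log-sum-exp avoids underflow when the per-frame densities in formula (1) become very small.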

The voiceprint recognition device may use the voiceprint features of the speech segment of each character in the verification voice information as training sample data and adjust the parameters of the preset UBM of the corresponding character with the maximum a posteriori (MAP) algorithm. That is, after substituting the voiceprint features of each character's speech segment into formula (1) as input samples, the device repeatedly adjusts the parameters of the preset UBM of the corresponding character so as to maximize the posterior probability P(x), and determines the feature vector of the corresponding character in the verification voice information from the parameters that maximize P(x).

Since extensive experiments and the literature have shown that the means of the Gaussian components in a UBM can be used to distinguish speaker identity, the mean supervector of the UBM is defined as the concatenation of its C component means:

$$m = \left[\mu_1^{T}, \mu_2^{T}, \ldots, \mu_C^{T}\right]^{T}$$

Accordingly, the voiceprint recognition device may use the voiceprint features of the speech segment of each character in the verification voice information as training sample data and adjust the mean supervector of the preset UBM of the corresponding character with the MAP algorithm. That is, after substituting these voiceprint features into formula (1) as input samples, the device repeatedly adjusts the mean supervector so as to maximize the posterior probability P(x), and uses the mean supervector that maximizes P(x) as the feature vector of the corresponding character in the verification voice information.
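The patent states only that the mean supervector is adjusted by MAP. A widely used concrete form is relevance-factor MAP adaptation of the GMM means (Reynolds et al.), which the sketch below swaps in as an assumption: each component mean moves toward the data mean of the frames it explains, weighted by α_i = n_i / (n_i + r), where n_i is the soft frame count of component i. The relevance factor r = 16, the diagonal covariances and the function names are assumptions.

```python
import numpy as np


def responsibilities(x, weights, means, variances):
    """Posterior probability of each UBM component for each frame (diagonal covariances)."""
    diff = x[:, None, :] - means[None, :, :]
    lp = (np.log(weights)[None, :]
          - 0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
          - 0.5 * (diff ** 2 / variances[None]).sum(axis=2))
    lp -= lp.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(lp)
    return p / p.sum(axis=1, keepdims=True)


def mean_supervector(means):
    """Stack the C component means (C, D) into one (C*D,) supervector [mu_1; ...; mu_C]."""
    return means.reshape(-1)


def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means toward the frames x of one character."""
    x = np.atleast_2d(x)
    resp = responsibilities(x, weights, means, variances)  # (T, C)
    n = resp.sum(axis=0)                                   # zeroth-order statistics
    f = resp.T @ x                                         # first-order statistics (C, D)
    alpha = (n / (n + relevance))[:, None]                 # data-dependent adaptation weight
    ex = f / np.maximum(n[:, None], 1e-10)                 # per-component data mean
    return mean_supervector(alpha * ex + (1.0 - alpha) * means)


# One component at the origin; 16 frames at (2, 2) give alpha = 0.5
print(map_adapt_means(np.full((16, 2), 2.0), np.array([1.0]),
                      np.zeros((1, 2)), np.ones((1, 2))))  # [1. 1.]
```

Components that explain few frames keep means close to the UBM, which is what lets a short spoken digit still yield a stable supervector.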

In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the variation range of the mean supervector is restricted to a subspace by probabilistic principal component analysis (PPCA). The voiceprint recognition device may use the voiceprint features of the speech segment of each character in the verification voice information as training sample data, adjust the mean supervector of the preset UBM of the corresponding character with the MAP algorithm, and combine the result with a preset supervector subspace matrix to obtain the feature vector of each character in the verification voice information. In a specific implementation, the mean supervector of the preset UBM of the corresponding character may be adjusted according to the following formula so that the posterior probability of the adjusted UBM of that character is maximized:

$$M = m + T\omega$$

where M is the adjusted mean supervector of the UBM of a given character, m is the mean supervector of the UBM of that character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the verification voice information. After the voiceprint features of each character's speech segment in the verification voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) is adjusted by repeatedly adjusting ω so as to maximize the posterior probability P(x), and the ω that maximizes P(x) is used as the feature vector of the corresponding character in the verification voice information. The supervector subspace matrix T is determined from the correlations between the dimensions of the mean supervector of the Gaussian mixture model.
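For M = m + Tω, a standard way to obtain ω is the i-vector point estimate, i.e. the posterior mean of ω given the Baum-Welch statistics of the frames. The sketch below uses that closed form, ω = (I + TᵀΣ⁻¹NT)⁻¹ TᵀΣ⁻¹F̃, as an assumed concrete realization of the "adjust ω to maximize the posterior" step. It assumes diagonal UBM covariances and an already trained T, and the function name is hypothetical; training T itself is not shown.

```python
import numpy as np


def extract_ivector(x, weights, means, variances, T):
    """Posterior mean of w in M = m + T w (i-vector point estimate).
    x: (frames, D); weights: (C,); means, variances: (C, D) of the character's UBM;
    T: (C*D, R) supervector subspace matrix."""
    x = np.atleast_2d(x)
    C, D = means.shape
    diff = x[:, None, :] - means[None, :, :]
    lp = (np.log(weights)[None, :]
          - 0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
          - 0.5 * (diff ** 2 / variances[None]).sum(axis=2))
    lp -= lp.max(axis=1, keepdims=True)
    resp = np.exp(lp)
    resp /= resp.sum(axis=1, keepdims=True)              # component posteriors (frames, C)

    n = resp.sum(axis=0)                                 # zeroth-order stats N_c
    f_centered = resp.T @ x - n[:, None] * means         # centred first-order stats F~_c

    prec = 1.0 / variances.reshape(-1)                   # diagonal of Sigma^{-1}, (C*D,)
    n_expanded = np.repeat(n, D)                         # N_c repeated over its D dimensions
    R = T.shape[1]
    # Posterior precision L = I + T^T Sigma^{-1} N T
    L = np.eye(R) + T.T @ ((n_expanded * prec)[:, None] * T)
    # Posterior mean w = L^{-1} T^T Sigma^{-1} F~
    return np.linalg.solve(L, T.T @ (prec * f_centered.reshape(-1)))


# 1 component, 1 dimension, T = [[1]]: four frames at 2.0 give w = 8 / (1 + 4) = 1.6
print(extract_ivector(np.full((4, 1), 2.0), np.array([1.0]),
                      np.zeros((1, 1)), np.ones((1, 1)), np.array([[1.0]])))  # [1.6]
```

Because ω has dimension R rather than C·D, the resulting per-character feature vector is compact, which is the motivation for the subspace restriction given in the text.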

S205: calculate the similarity score between the feature vector of each character in the verification voice information and the feature vector of the same character in the preset registration voice information, and if the similarity score reaches the preset verification threshold, determine the verification user to be the registered user corresponding to the registration voice information.

Specifically, the voiceprint recognition device may obtain the registered user's registration voice information in the voiceprint registration stage and, through voiceprint feature extraction and voiceprint model training similar to those of this embodiment, obtain the feature vector of the speech segment of each character in the registration voice information. The registration voice information may be the voice information, obtained by the voiceprint recognition device, that is produced when the registered user reads aloud a second character string. The second character string shares at least one identical character with the first character string, i.e. the second character string corresponding to the registration voice information is at least partly identical to the first character string. In an optional embodiment, the device may instead obtain the feature vectors of the characters in the registration voice information from an external source: the registered user enters the registration voice information on another device, that device or a server obtains the feature vector of each character's speech segment through voiceprint feature extraction and voiceprint model training, and the voiceprint recognition device retrieves these feature vectors from that device or server, so that in the verification user's identity recognition stage they can be compared with the feature vectors of the corresponding characters in the verification voice information.

In a specific implementation, the similarity score is a value, obtained by the voiceprint recognition device after comparing the feature vector of each character in the verification voice information with the feature vector of the same character in the preset registration voice information, that measures how similar the two feature vectors of the same character are. In an optional embodiment, the cosine distance between these two feature vectors may be used as the similarity score; that is, the similarity score of a character between its feature vector in the verification voice information and its feature vector in the registration voice information is calculated as:

$$\mathrm{score}_i = \frac{\omega_i(\mathrm{tar}) \cdot \omega_i(\mathrm{test})}{\lVert \omega_i(\mathrm{tar}) \rVert \, \lVert \omega_i(\mathrm{test}) \rVert}$$

where the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) is the feature vector of that character in the verification voice information, and ω_i(test) is its feature vector in the registration voice information. If the verification voice information and the registration voice information contain multiple identical characters, the similarity scores of the characters calculated with the above formula may be averaged; if the mean similarity score reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the feature vector of a character of the verification user can be compared with the feature vector of the same character of each registered user. The registered user whose feature vector for that character has the highest similarity score with the verification voice information, provided that score reaches the preset verification threshold, is taken as the identification result for the verification user.
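The scoring and decision rules above (cosine score per shared character, mean over shared characters, best registered user subject to the threshold) can be sketched as follows. The function names, the dictionary layout and the threshold value in the usage example are hypothetical.

```python
from typing import Dict, Optional, Tuple

import numpy as np


def cosine_score(w_tar: np.ndarray, w_test: np.ndarray) -> float:
    """Cosine similarity between a character's verification and registration vectors."""
    return float(np.dot(w_tar, w_test) / (np.linalg.norm(w_tar) * np.linalg.norm(w_test)))


def user_score(verify_vecs: Dict[str, np.ndarray],
               enrolled: Dict[str, np.ndarray]) -> Optional[float]:
    """Mean cosine score over the characters shared by the two utterances."""
    shared = sorted(set(verify_vecs) & set(enrolled))
    if not shared:
        return None
    return float(np.mean([cosine_score(verify_vecs[c], enrolled[c]) for c in shared]))


def identify(verify_vecs: Dict[str, np.ndarray],
             library: Dict[str, Dict[str, np.ndarray]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Best-scoring registered user if the score reaches the threshold, else (None, best)."""
    best_user, best = None, -np.inf
    for user, enrolled in library.items():
        s = user_score(verify_vecs, enrolled)
        if s is not None and s > best:
            best_user, best = user, s
    return (best_user, best) if best >= threshold else (None, best)


library = {"A": {"0": np.array([1.0, 0.0]), "1": np.array([0.0, 1.0])},
           "B": {"0": np.array([-1.0, 0.0]), "1": np.array([0.0, -1.0])}}
probe = {"0": np.array([1.0, 0.1]), "1": np.array([0.1, 1.0])}
print(identify(probe, library, threshold=0.8))  # ('A', 0.995...)
```

For pure verification against a single claimed identity, `user_score` compared with the threshold is sufficient; `identify` covers the multi-user case.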

In an optional embodiment, if the same character occurs more than once in the verification voice information, for example 0, 1, 5 and 8 each occurring twice in the verification voice information as shown in Fig. 2, then the similarity scores between the feature vector obtained from each of the two speech segments of character 0 and the feature vector of character 0 in the preset registration voice information may be averaged, and the average is used as the similarity score between character 0 in this verification voice information and character 0 in the preset registration voice information; the same applies to the other characters.
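The per-occurrence averaging can be sketched as below: each spoken occurrence of a character yields its own feature vector and its own cosine score, and the scores of the same character are averaged. The function names and the `(character, vector)` list format are assumptions for illustration.

```python
from collections import defaultdict

import numpy as np


def cosine(a, b) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def per_character_scores(occurrences, enrolled):
    """occurrences: list of (character, feature_vector), one entry per spoken occurrence.
    Returns character -> mean cosine score against that character's registration vector."""
    scores = defaultdict(list)
    for char, vec in occurrences:
        if char in enrolled:  # characters absent from registration cannot be scored
            scores[char].append(cosine(vec, enrolled[char]))
    return {c: float(np.mean(s)) for c, s in scores.items()}


enrolled = {"0": np.array([1.0, 0.0])}
occ = [("0", np.array([1.0, 0.0])), ("0", np.array([0.0, 1.0])), ("7", np.array([1.0, 1.0]))]
print(per_character_scores(occ, enrolled))  # {'0': 0.5}
```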

It should be noted that there are many other ways to measure the similarity between two feature vectors; the above is only one embodiment provided by the present invention. On the basis of the disclosed scheme, those skilled in the art can, without creative effort, derive further ways of calculating the similarity score between the feature vectors of a character shared by the verification voice information and the registration voice information; the present invention does not enumerate them exhaustively.

In this embodiment, the voiceprint features of the speech segment of each character in the verification user's verification voice information are combined with the preset UBM of the corresponding character to train a feature vector for each character, and the identity of the verification user is determined by comparing, character by character, the feature vector in the verification voice information with that of the same character in the registration voice information. Because the compared user feature vectors correspond to specific characters, the voiceprint characteristics of the user reading different characters are fully taken into account, which effectively improves voiceprint recognition accuracy.

Fig. 5 is a flow diagram of voiceprint registration of a registered user in an embodiment of the present invention. As shown, the voiceprint registration process in this embodiment may include:

S501: obtain registration voice information produced when the registered user reads aloud a second character string, the second character string sharing at least one identical character with the first character string.

The registered user is a user whose legitimate identity has been established, and the second character string is the string used to collect the registered user's voiceprint feature vectors. It may be generated randomly or may be a preset fixed string. Specifically, the second character string may also contain m characters, of which n are mutually different, where m and n are positive integers and m ≥ n.

In an optional embodiment, the voiceprint recognition device may generate and display the second character string so that the registered user can read it aloud from the display.

S502: perform speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that correspond respectively to multiple characters in the second character string.

The voiceprint recognition device may split the registration voice information into the speech segments corresponding to the multiple characters by means of speech recognition and sound-intensity filtering, and may optionally discard invalid speech segments so that they do not take part in subsequent processing.

S503: extract the voiceprint features of the speech segment corresponding to each character in the registration voice information.

S504: train, according to the voiceprint features of the speech segment corresponding to each character in the registration voice information and in combination with the preset universal background model of the corresponding character, the feature vector corresponding to each character in the registration voice information.

For the expression of the UBM, refer to the embodiment above. As in step S204 of the voiceprint recognition process, the voiceprint recognition device may use the voiceprint features of the speech segment of each character in the registration voice information as training sample data and adjust the parameters of the preset UBM of the corresponding character with the maximum a posteriori (MAP) algorithm. That is, after substituting these voiceprint features into formula (1) as input samples, the device repeatedly adjusts the parameters of the preset UBM of the corresponding character so as to maximize the posterior probability P(x), and determines the feature vector of the corresponding character in the registration voice information from the parameters that maximize P(x).

Since the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the voiceprint recognition device may take the voiceprint features of the speech segments corresponding to each character in the registration voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset UBM of the corresponding character. That is, after these voiceprint features are substituted into formula (1) as input samples, the mean supervector is adjusted repeatedly so as to maximize the posterior probability P(x), and the mean supervector that maximizes P(x) is taken as the feature vector of the corresponding character in the registration voice information.
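As a rough illustration of mean-supervector adaptation, the Python sketch below implements classical relevance-MAP adaptation of Gaussian mixture means, a widely used form of MAP adaptation for UBMs. It assumes that formula (1) is a diagonal-covariance Gaussian mixture; the relevance factor of 16 is a common default rather than a value given in this document, and the function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, mean, var))


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of UBM means (sketch, assumed diagonal UBM).

    frames: feature vectors (e.g. MFCC frames of one character's speech segment).
    weights/means/variances: UBM with C components of dimension D.
    Returns the adapted mean supervector (C*D values, component by component).
    """
    C, D = len(weights), len(means[0])
    n = [0.0] * C                        # zeroth-order statistics N_c
    f = [[0.0] * D for _ in range(C)]    # first-order statistics F_c
    for x in frames:
        # log(w_c) + log N(x | mu_c, Sigma_c) for every component
        logp = [math.log(w) + log_gauss_diag(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        top = max(logp)                  # subtract the max for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(C):
            gamma = post[c] / total      # responsibility of component c for frame x
            n[c] += gamma
            for d in range(D):
                f[c][d] += gamma * x[d]
    supervector = []
    for c in range(C):
        alpha = n[c] / (n[c] + relevance)    # more data -> larger move away from the UBM
        for d in range(D):
            data_mean = f[c][d] / n[c] if n[c] > 0 else means[c][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[c][d])
    return supervector
```

Components that receive little data keep means close to the UBM, which is what lets a per-character UBM be adapted from the few frames available for a single spoken character.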

In another optional embodiment, the mean supervector of the preset UBM of the corresponding character may be adjusted using the following formula, so that the posterior probability of the adjusted UBM of the corresponding character is maximized:

M = m + Tω, where M is the adjusted mean supervector of the UBM of a given character, m is the mean supervector of the UBM of that character before adjustment, T is a preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the registration voice information. After the voiceprint features of the speech segments corresponding to each character in the registration voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) is adjusted by repeatedly adjusting ω so as to maximize the posterior probability P(x); the ω that maximizes P(x) is taken as the feature vector of the corresponding character in the registration voice information.
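One common way to obtain ω in M = m + Tω is the closed-form posterior mean used for i-vectors in the total-variability literature. The text does not spell out how ω is computed, so the sketch below is only one possible realisation, assuming a diagonal-covariance UBM and an already trained matrix T stored as C·D rows of length R; the helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                k = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= k * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def extract_ivector(frames, weights, means, variances, T):
    """Posterior mean of w in M = m + T w for one speech segment (i-vector style estimate).

    means/variances: C x D diagonal UBM; T: list of C*D rows, each of length R.
    """
    C, D, R = len(weights), len(means[0]), len(T[0])
    n = [0.0] * C                        # zeroth-order statistics
    f = [[0.0] * D for _ in range(C)]    # first-order statistics, centred on UBM means
    for x in frames:
        logp = [math.log(w) - 0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                                        for xi, mu, v in zip(x, mu_c, var_c))
                for w, mu_c, var_c in zip(weights, means, variances)]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(C):
            gamma = post[c] / total
            n[c] += gamma
            for d in range(D):
                f[c][d] += gamma * (x[d] - means[c][d])
    # Precision L = I + T' Sigma^-1 N T and linear term b = T' Sigma^-1 F
    L = [[1.0 if r == s else 0.0 for s in range(R)] for r in range(R)]
    b = [0.0] * R
    for c in range(C):
        for d in range(D):
            row, inv_var = T[c * D + d], 1.0 / variances[c][d]
            for r in range(R):
                b[r] += row[r] * inv_var * f[c][d]
                for s in range(R):
                    L[r][s] += n[c] * row[r] * inv_var * row[s]
    return solve(L, b)                   # w = L^-1 b
```

Because ω has only R dimensions, the system solved here is R x R regardless of the supervector size C·D, which illustrates the dimensionality reduction that motivates the subspace formulation.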

Fig. 6 is a schematic flowchart of a voiceprint recognition method in another embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include the following steps:

S601: Randomly generate the first character string and display it.

S602: Obtain the verification voice information produced when the user to be verified reads the first character string aloud.

S603: Identify the valid speech segments and invalid speech segments in the verification voice information.

Specifically, the verification voice may be divided according to sound intensity, and speech segments with low sound intensity are treated as invalid speech segments (including, for example, silent segments and impulse noise).
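A minimal sketch of this intensity-based split, assuming short-time frame energy as the measure of sound intensity, a threshold 35 dB below the loudest frame, and a minimum run length of five frames. All three parameters, and the function name, are illustrative assumptions. The duration rule for rejecting impulse noise is an added assumption, since a brief click can be loud rather than quiet.

```python
import math


def split_by_intensity(samples, frame_len=160, threshold_db=-35.0, min_frames=5):
    """Split audio into valid speech segments by short-time energy (sketch).

    Frames quieter than `threshold_db` relative to the loudest frame are invalid
    (silence); valid runs shorter than `min_frames` are also dropped (impulse noise).
    Returns (start_sample, end_sample) pairs for the valid segments.
    """
    energies = [sum(s * s for s in samples[i:i + frame_len]) / frame_len
                for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not energies or max(energies) <= 1e-10:
        return []                                    # no frames, or nothing but silence
    peak = max(energies)
    valid = [10 * math.log10(max(e, 1e-12) / peak) >= threshold_db for e in energies]
    segments, run_start = [], None
    for i, is_valid in enumerate(valid + [False]):   # sentinel closes a trailing run
        if is_valid and run_start is None:
            run_start = i
        elif not is_valid and run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    return segments
```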

S604: Perform speech recognition on the valid speech segments to obtain the speech segments corresponding respectively to the multiple characters in the first character string.

Speech recognition yields the speech segments corresponding respectively to the multiple characters in the first character string.

S605: Determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

To prevent voiceprint recognition from being passed with a registered user's voice information that has been illegally recorded or copied, a different first character string may be randomly generated each time. In this step, the device judges whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string; if it is not, voiceprint recognition may be determined to have failed, and if it is, the subsequent steps are performed.
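The random challenge and the order check can be sketched as follows. This assumes the first character string is drawn from decimal digits (length 8 here is arbitrary) and that speech recognition reports each character together with the start time of its segment; both assumptions, and the function names, are hypothetical. Python's `secrets` module is used so that the challenge is not predictable.

```python
import secrets


def make_challenge(length=8, alphabet="0123456789"):
    """Generate a fresh random 'first character string' for each verification attempt."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def order_matches(challenge, recognised):
    """Check that the recognised character segments, in time order, spell the challenge.

    recognised: list of (start_time_seconds, character) pairs from speech recognition.
    """
    spoken = "".join(ch for _, ch in sorted(recognised))
    return spoken == challenge
```

A replayed recording of an earlier session would, in general, yield the characters of an old challenge and fail this check before any voiceprint scoring is attempted.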

S606: Extract the voiceprint feature of the speech segment corresponding to each character.

S607: Take the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori algorithm to adjust the mean supervector of the preset UBM of the corresponding character, so as to estimate the feature vector corresponding to each character in the verification voice information.

Because extensive experiments and published papers have shown that the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the voiceprint recognition device may take the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset UBM of the corresponding character. That is, after these voiceprint features are substituted into formula (1) as input samples, the mean supervector is adjusted repeatedly so as to maximize the posterior probability P(x), and the mean supervector that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information.

In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the voiceprint recognition device may adjust the mean supervector of the preset UBM of the corresponding character using the following formula, so that the posterior probability of the adjusted UBM of the corresponding character is maximized:

M = m + Tω, where M is the adjusted mean supervector of the UBM of a given character, m is the mean supervector of the UBM of that character before adjustment, T is a preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the verification voice information. After the voiceprint features of the speech segments corresponding to each character in the verification voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) is adjusted by repeatedly adjusting ω so as to maximize the posterior probability P(x); the ω that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information.

S608: Calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information; if the similarity score reaches the preset verification threshold, determine that the user to be verified is the registered user corresponding to the registration voice information.

In this embodiment, the voiceprint recognition device may use, as the similarity score, the cosine distance value (the cosine of the angle) between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information; that is, the similarity score between a character's feature vector in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
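The formula itself does not survive in this text. Assuming it is the standard cosine score, and using the notation ω_i(tar) and ω_i(test) that this description defines for the same computation in the device embodiment, it would read:

```latex
\mathrm{score}\bigl(\omega_i(\mathrm{tar}), \omega_i(\mathrm{test})\bigr)
  = \frac{\omega_i(\mathrm{tar})^{\top}\,\omega_i(\mathrm{test})}
         {\lVert \omega_i(\mathrm{tar}) \rVert \, \lVert \omega_i(\mathrm{test}) \rVert}
```

A score of 1 means the two vectors point in the same direction and 0 means they are orthogonal; the score does not depend on the lengths of the vectors.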

Thus, by comparing the feature vector corresponding to each character in the verification voice information with the feature vector of the corresponding character in the registration voice information, and by additionally checking the temporal order of the speech segments, this embodiment can further ensure that the identity of the user to be verified is determined accurately.

Fig. 7 is a schematic structural diagram of a voiceprint recognition device in an embodiment of the present invention. As shown in the figure, the voiceprint recognition device in this embodiment may include:

A voice acquisition module 710, configured to obtain the verification voice information produced when the user to be verified reads the first character string aloud.

A speech segment recognition module 720, configured to perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to the multiple characters in the first character string.

As shown in Fig. 3, the speech segment recognition module 720 may divide the verification voice information into speech segments corresponding to the multiple characters by means of speech recognition and sound-intensity filtering; optionally, invalid speech segments may also be discarded so that they take no part in subsequent processing.

In an optional embodiment, as shown in Fig. 8, the speech segment recognition module may further include:

A valid segment recognition unit 721, configured to identify the valid speech segments and invalid speech segments in the verification voice information.

Specifically, the valid segment recognition unit 721 may divide the verification voice according to sound intensity and treat speech segments with low sound intensity as invalid speech segments (including, for example, silent segments and impulse noise).

A speech recognition unit 722, configured to perform speech recognition on the valid speech segments to obtain the speech segments corresponding respectively to the multiple characters in the first character string.

A voiceprint feature extraction module 730, configured to extract the voiceprint feature of the speech segment corresponding to each character in the verification voice information.

Specifically, the voiceprint feature extraction module 730 may extract MFCCs (Mel-Frequency Cepstral Coefficients) or PLP (Perceptual Linear Prediction) coefficients from the speech segment corresponding to each character, as the voiceprint feature of that speech segment.
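For readers unfamiliar with MFCCs, the sketch below follows the textbook pipeline: pre-emphasis, Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. The parameter values (8 kHz audio, 25 ms frames with a 10 ms hop, 20 mel filters, 13 coefficients, pre-emphasis 0.97) are common defaults assumed for illustration, not values taken from this document. It uses a naive DFT to stay dependency-free; a real system would use an FFT-based library.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr=8000, frame_len=200, hop=80, n_fft=256, n_mels=20, n_ceps=13):
    """Textbook MFCCs (sketch): pre-emphasis, Hamming window, power spectrum,
    triangular mel filterbank, log, DCT-II. Returns one list of n_ceps values per frame."""
    x = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    # Filter edge frequencies, equally spaced on the mel scale, mapped to FFT bins
    top_mel = hz_to_mel(sr / 2)
    bins = [int((n_fft + 1) * mel_to_hz(i * top_mel / (n_mels + 1)) / sr)
            for i in range(n_mels + 2)]
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)            # zero-pad to n_fft
        # Naive DFT power spectrum (slow but dependency-free)
        power = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                         for n in range(n_fft))) ** 2 / n_fft
                 for k in range(n_fft // 2 + 1)]
        log_mel = []
        for j in range(1, n_mels + 1):
            lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
            energy = 0.0
            for k in range(lo, hi + 1):
                if k < mid:                             # rising edge of the triangle
                    energy += power[k] * (k - lo) / (mid - lo)
                elif hi > mid:                          # falling edge, peak at k == mid
                    energy += power[k] * (hi - k) / (hi - mid)
            log_mel.append(math.log(max(energy, 1e-10)))
        # DCT-II decorrelates the log filterbank energies
        feats.append([sum(log_mel[j] * math.cos(math.pi * c * (j + 0.5) / n_mels)
                          for j in range(n_mels))
                      for c in range(n_ceps)])
    return feats
```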

A feature model training module 740, configured to obtain, according to the voiceprint features of the speech segments corresponding to each character and by training in combination with the preset UBM of the corresponding character, the feature vector corresponding to each character in the verification voice information.

The feature model training module 740 may take the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the parameters of the preset UBM of the corresponding character. That is, after these voiceprint features are substituted into formula (1) as input samples, the parameters of the preset UBM of the corresponding character are adjusted repeatedly so as to maximize the posterior probability P(x), and the feature model training module 740 determines the feature vector of the corresponding character in the verification voice information from the parameters that maximize P(x).

Accordingly, the feature model training module 740 may take the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset UBM of the corresponding character. That is, after these voiceprint features are substituted into formula (1) as input samples, the mean supervector is adjusted repeatedly so as to maximize the posterior probability P(x), and the feature model training module 740 takes the mean supervector that maximizes P(x) as the feature vector of the corresponding character in the verification voice information.

In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, probabilistic principal component analysis (PPCA) is used to restrict the variation of the mean supervector to a subspace. The feature model training module 740 may take the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data, use the maximum a posteriori algorithm to adjust the mean supervector of the preset UBM of the corresponding character, and combine this with the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the feature model training module 740 may adjust the mean supervector of the preset UBM of the corresponding character using the following formula, so that the posterior probability of the adjusted UBM of the corresponding character is maximized: M = m + Tω, where M, m and T are defined as above and ω is the feature vector of the corresponding character in the verification voice information.

A similarity judgment module 750, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information.

Specifically, the voiceprint recognition device may obtain the registration voice information of a registered user in the voiceprint registration stage and, through the speech segment recognition module 720, the voiceprint feature extraction module 730 and the feature model training module 740, obtain the feature vector corresponding to the speech segment of each character in the registration voice information. The registration voice information may be the voice information obtained by the voiceprint recognition device when the registered user reads a second character string aloud, where the second character string shares at least one character with the first character string; that is, the second character string corresponding to the registration voice information is at least partly identical to the first character string. In another optional embodiment, the voiceprint recognition device may instead obtain the feature vectors of the corresponding characters in the registration voice information from an external source: after the registered user enters the registration voice information through another device, that device or a server performs voiceprint feature extraction and voiceprint model training to obtain the feature vector of the speech segment of each character in the registration voice information, and the voiceprint recognition device retrieves these feature vectors from the other device or the server. In the identification stage, the similarity judgment module 750 then compares them with the feature vector corresponding to each character in the verification voice information of the user to be verified.

In a specific implementation, the similarity score is a value, obtained after the voiceprint recognition device compares the feature vector of each character in the verification voice information with the feature vector of the corresponding character in the preset registration voice information, that measures the degree of similarity between the two feature vectors of the same character. In an optional embodiment, the similarity judgment module 750 may use, as the similarity score, the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information; that is, the similarity score between a character's feature vector in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:

Here, subscript i denotes the i-th character shared by the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector of that character in the verification voice information, and ω_i(test) denotes the feature vector of that character in the registration voice information. In an optional embodiment, if the same character occurs more than once in the verification voice information (for example, in the verification voice information shown in Fig. 2, the characters 0, 1, 5 and 8 each occur twice), then the similarity scores between the feature vector of character 0 in the preset registration voice information and each of the two feature vectors obtained from the two speech segments of character 0 may be averaged, and the average used as the similarity score of character 0 for this verification; the same applies to the other characters.
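The per-character scoring and the averaging of repeated characters can be sketched as below. The input layout (one `(character, vector)` pair per spoken occurrence, and a dictionary of registered vectors keyed by character) is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def per_char_scores(test_vectors, enrolled):
    """Score each character shared by the verification and registration utterances.

    test_vectors: list of (char, vector), one entry per spoken occurrence.
    enrolled: dict mapping char -> registered feature vector.
    A character spoken more than once gets the mean of its occurrence scores.
    """
    buckets = {}
    for ch, vec in test_vectors:
        if ch in enrolled:                 # only characters present in both utterances
            buckets.setdefault(ch, []).append(cosine(vec, enrolled[ch]))
    return {ch: sum(s) / len(s) for ch, s in buckets.items()}
```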

A user identification module 760, configured to determine, if the similarity score reaches the preset verification threshold, that the user to be verified is the registered user corresponding to the registration voice information.

If the verification voice information and the registration voice information contain multiple identical characters, the user identification module 760 may average the similarity scores of the characters calculated by the similarity judgment module 750; if the mean similarity score reaches the corresponding preset verification threshold, the user to be verified is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the user identification module 760 may compare the feature vector of a given character of the user to be verified with the feature vector of the corresponding character of each registered user; when the similarity score between a registered user's feature vector for that character and the verification voice's feature vector for the same character is the highest and reaches the preset verification threshold, that registered user is taken as the identification result for the user to be verified.
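A minimal sketch of the decision step, assuming each registered user is summarised by the mean of their per-character similarity scores and that a single threshold applies to every user. Combining the averaging rule and the best-match rule in this way is an interpretation of the paragraph above; the function name and threshold values are hypothetical.

```python
def decide(scores_by_user, threshold):
    """Return the registered user with the highest mean per-character score,
    or None if no user reaches the preset verification threshold.

    scores_by_user: dict mapping user -> dict mapping char -> similarity score.
    """
    best_user, best = None, float("-inf")
    for user, scores in scores_by_user.items():
        if not scores:                     # no shared characters with this user
            continue
        mean = sum(scores.values()) / len(scores)
        if mean > best:
            best_user, best = user, mean
    return best_user if best >= threshold else None
```

With a single registered user this reduces to plain verification: accept if that user's mean score reaches the threshold, otherwise reject.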

Further, in an optional embodiment, the voice acquisition module 710 is also configured to obtain the registration voice information produced when the registered user reads a second character string aloud, the second character string sharing at least one character with the first character string;

The speech segment recognition module 720 is also configured to perform speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that correspond respectively to the multiple characters in the second character string;

The voiceprint feature extraction module 730 is also configured to extract the voiceprint feature of the speech segment corresponding to each character in the registration voice information;

The feature model training module 740 is also configured to obtain, according to the voiceprint feature of the speech segment corresponding to each character in the registration voice information and by training in combination with the preset UBM of the corresponding character, the feature vector corresponding to each character in the registration voice information.

In an optional embodiment, the voiceprint recognition device may further include:

A character order determining module 770, configured to determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

To prevent voiceprint recognition from being passed with a registered user's voice information that has been illegally recorded or copied, a different first character string may be randomly generated each time, and the module judges whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If it is not, voiceprint recognition may be determined to have failed; if it is, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information.

A character string display module 700, configured to randomly generate and display the first character string.

In an actual test (1,000 speakers in the training samples and about 290,000 test trials, of which about 10,000 were identity-matched trials and about 280,000 were non-matched trials), a recall of 79.8% was achieved at an error rate of one in a thousand, and the equal error rate (EER) was 3.39%. Compared with the traditional text-independent modeling method, voiceprint recognition performance improved by more than 40%.
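To make the reported metrics concrete, the sketch below computes an approximate EER and the recall at a given false acceptance rate from lists of genuine (identity-matched) and impostor (non-matched) scores. It assumes that "an error rate of one in a thousand" refers to the false acceptance rate, and it sweeps only the observed scores as thresholds. The functions are illustrative and do not reproduce the reported experiment; for about 290,000 trials one would sort the scores once instead of rescanning them for each threshold.

```python
def error_rates(genuine, impostor, threshold):
    """False rejection and false acceptance rates when scores >= threshold are accepted."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def equal_error_rate(genuine, impostor):
    """Approximate EER: the operating point where FRR and FAR are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def recall_at_far(genuine, impostor, max_far=0.001):
    """Best fraction of genuine trials accepted while FAR stays <= max_far."""
    best = 0.0
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        if far <= max_far:
            best = max(best, 1.0 - frr)
    return best
```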

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above disclosure is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention; therefore, equivalent changes made in accordance with the claims of the present invention still fall within the scope of the present invention.

Claims

1. A voiceprint recognition method, characterized in that the method comprises:

Performing speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to the multiple characters in the first character string, thereby obtaining the speech segment corresponding to each character;

Restricting the variation range of the mean supervector to a subspace by probabilistic principal component analysis; taking the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data, adjusting the mean supervector of the preset universal background model of the corresponding character using a maximum a posteriori algorithm, and training in combination with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information;

Calculating, for each character, the similarity score between the feature vector corresponding to that character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining that the user to be verified is the registered user corresponding to the registration voice information.

2. The voiceprint recognition method according to claim 1, characterized in that, before obtaining the verification voice information produced when the user to be verified reads the first character string aloud, the method further comprises:

Obtaining the registration voice information produced when a registered user reads a second character string aloud, the second character string sharing at least one character with the first character string;

Performing speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that correspond respectively to the multiple characters in the second character string;

Extracting the voiceprint feature of the speech segment corresponding to each character in the registration voice information;

It is corresponding in conjunction with preset respective symbols according to the vocal print feature of the corresponding sound bite of character each in registration voice messaging Universal background model training obtain the corresponding feature vector of each character in registration voice messaging.

3. The voiceprint recognition method according to claim 1, characterized in that taking the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data, adjusting the mean supervector of the preset universal background model of the corresponding character using a maximum a posteriori algorithm, and combining the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information comprises:

Taking the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data, and adjusting the mean supervector of the preset universal background model of the corresponding character using the following formula, so that the posterior probability of the adjusted universal background model of the corresponding character is maximized:

M = m + Tω, where M is the adjusted mean supervector of the universal background model of a given character, m is the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the verification voice information.

4. The voiceprint recognition method according to claim 1, characterized in that the supervector subspace matrix is determined according to the correlation between the weights of the Gaussian components in the universal background model.

5. The voiceprint recognition method according to claim 1, characterized in that calculating the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information comprises:

Calculating the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information as the similarity score.

6. The voiceprint recognition method according to claim 1, characterized in that performing speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to the multiple characters in the first character string comprises:

Identifying the valid speech segments and invalid speech segments in the verification voice information;

Performing speech recognition on the valid speech segments to obtain the speech segments corresponding respectively to the multiple characters in the first character string.

7. The voiceprint recognition method according to claim 1, characterized in that, before determining that the user to be verified is the registered user corresponding to the registration voice information, the method further comprises:

Determining that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

8. The voiceprint recognition method according to any one of claims 1-7, characterized in that, before obtaining the verification voice information produced when the user to be verified reads the first character string aloud, the method further comprises:

Randomly generating and displaying the first character string.

9. A voiceprint recognition device, characterized in that the device comprises:

A speech segment recognition module, configured to perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that correspond respectively to the multiple characters in the first character string, thereby obtaining the speech segment corresponding to each character;

A voiceprint feature extraction module, configured to extract the voiceprint feature of the speech segment corresponding to each character in the verification voice information;

A feature model training module, configured to restrict the variation range of the mean supervector to a subspace by probabilistic principal component analysis, take the voiceprint features of the speech segments corresponding to each character in the verification voice information as training sample data, adjust the mean supervector of the preset universal background model of the corresponding character using a maximum a posteriori algorithm, and train in combination with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information;

A similarity judgment module, configured to calculate, for each character, the similarity score between the feature vector corresponding to that character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information;

A user identification module, configured to determine, if the similarity score reaches a preset verification threshold, that the user to be verified is the registered user corresponding to the registration voice information.

10. The voiceprint recognition device according to claim 9, characterized in that:

The voice acquisition module is further configured to obtain the registration voice information produced when a registered user reads a second character string aloud, the second character string sharing at least one character with the first character string;

The speech segment recognition module is further configured to perform speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that correspond respectively to the multiple characters in the second character string;

The voiceprint feature extraction module is further configured to extract the voiceprint feature of the speech segment corresponding to each character in the registration voice information;

The feature model training module is further configured to obtain, according to the voiceprint feature of the speech segment corresponding to each character in the registration voice information and by training in combination with the preset universal background model of the corresponding character, the feature vector corresponding to each character in the registration voice information.

11. The voiceprint recognition device according to claim 9, characterized in that the feature model training module is specifically configured to:

12. The voiceprint recognition device according to claim 9, characterized in that the supervector subspace matrix is determined according to the correlation between the dimension vectors in the mean supervector of the Gaussian mixture model.

13. The voiceprint recognition device according to claim 9, characterized in that the similarity judgment module is configured to:

14. The voiceprint recognition device according to claim 9, characterized in that the speech segment recognition module comprises:

A valid segment recognition unit, configured to identify the valid speech segments and invalid speech segments in the verification voice information;

A speech recognition unit, configured to perform speech recognition on the valid speech segments to obtain the speech segments corresponding respectively to the multiple characters in the first character string.

15. The voiceprint recognition device according to claim 9, characterized by further comprising:

A character order determining module, configured to determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

16. The voiceprint recognition device according to any one of claims 9-15, characterized by further comprising:

A character string display module, configured to randomly generate and display the first character string.