CN106098068B - Voiceprint recognition method and device - Google Patents

Info

Publication number
CN106098068B
Authority
CN
China
Prior art keywords
voice information
character
verifying
voice
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610416650.3A
Other languages
Chinese (zh)
Other versions
CN106098068A (en
Inventor
李为
钱柄桦
金星明
李科
吴富章
吴永坚
黄飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201610416650.3A
Publication of CN106098068A
Priority to PCT/CN2017/087911 (WO2017215558A1)
Application granted
Publication of CN106098068B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephone Function (AREA)

Abstract

Embodiments of the invention disclose a voiceprint recognition method and device. The method comprises: obtaining verification voice information produced when a verification user reads a first character string aloud; performing speech recognition on the verification voice information to obtain the voice segments contained in it that correspond respectively to a plurality of characters in the first character string; extracting the voiceprint feature of the voice segment corresponding to each character; training, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model of the corresponding character, to obtain the feature vector corresponding to each character in the verification voice information; and calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in preset registration voice information and, if the similarity score reaches a preset verification threshold, determining the verification user to be the registered user corresponding to the registration voice information. The invention can effectively improve voiceprint recognition accuracy.

Description

Voiceprint recognition method and device
Technical field
The present invention relates to the field of voice recognition technology, and in particular to a voiceprint recognition method and device.
Background art
Voiceprint recognition, as a biometric identification method, comprises two stages: user registration and user identity recognition. In the registration stage, a series of processing steps maps a user's voice to a user model. In the recognition stage, a piece of speech of unknown identity is matched against the model for similarity, and a judgment is then made as to whether the identity behind the unknown speech is the same as the identity behind the registered speech. Existing voiceprint modeling methods usually model at the text-independent level to describe the speaker's identity characteristics. With text-independent modeling, however, recognition accuracy is low when users read different content aloud, and it is difficult to meet requirements.
Summary of the invention
In view of this, embodiments of the present invention provide a voiceprint recognition method and device that can effectively improve voiceprint recognition accuracy.
To solve the above technical problem, an embodiment of the present invention provides a voiceprint recognition method, the method comprising:
obtaining verification voice information produced when a verification user reads a first character string aloud;
performing speech recognition on the verification voice information to obtain the voice segments contained in it that correspond respectively to a plurality of characters in the first character string;
extracting the voiceprint feature of the voice segment corresponding to each character;
training, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model of the corresponding character, to obtain the feature vector corresponding to each character in the verification voice information; and
calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in preset registration voice information and, if the similarity score reaches a preset verification threshold, determining the verification user to be the registered user corresponding to the registration voice information.
Correspondingly, an embodiment of the present invention further provides a voiceprint recognition device, the device comprising:
a voice obtaining module, configured to obtain verification voice information produced when a verification user reads a first character string aloud;
a voice segment recognition module, configured to perform speech recognition on the verification voice information to obtain the voice segments contained in it that correspond respectively to a plurality of characters in the first character string;
a voiceprint feature extraction module, configured to extract the voiceprint feature of the voice segment corresponding to each character in the verification voice information;
a feature model training module, configured to train, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model of the corresponding character, to obtain the feature vector corresponding to each character in the verification voice information;
a similarity judgment module, configured to calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in preset registration voice information; and
a user identification module, configured to determine, if the similarity score reaches a preset verification threshold, the verification user to be the registered user corresponding to the registration voice information.
In this embodiment, the voiceprint features of the voice segments corresponding to the individual characters in the verification user's verification voice information are obtained, and the feature vector corresponding to each character in the verification voice information is trained in combination with the preset UBM of the corresponding character. Each of these feature vectors is then compared for similarity with the feature vector of the same character in the registration voice information to determine the identity of the verification user. Because the user feature vectors being compared correspond to specific characters, this approach takes full account of the user's voiceprint characteristics when reading different characters, and can therefore effectively improve voiceprint recognition accuracy.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic overview of the stages of the voiceprint recognition method in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the principle of recognizing, from voice information, the voice segments corresponding to a plurality of characters in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the principle of obtaining, from voice information, the feature vector corresponding to each character in an embodiment of the present invention;
Fig. 5 is a schematic flowchart of voiceprint registration for a registered user in an embodiment of the present invention;
Fig. 6 is a schematic flowchart of the voiceprint recognition method in another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a voiceprint recognition device in an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the voice segment recognition module in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiments of the present invention provide a voiceprint recognition method and device. The method and device can be applied in any scenario or equipment that needs to identify a user of unknown identity. The characters in the character string used for voiceprint recognition may be Arabic numerals, English letters, characters of other languages, and so on. To simplify the description, the embodiments of the present invention use Arabic numerals as the example characters.
The voiceprint recognition method in the embodiments of the present invention can be divided into two stages, as shown in Fig. 1:
1) The voiceprint registration stage of a registered user
In the voiceprint registration stage, the registered user reads a registration character string (the second character string mentioned below) aloud, and the voiceprint recognition device collects the registration voice information produced as the registered user reads it. The device then performs speech recognition on the registration voice information to obtain the voice segments contained in it that correspond respectively to a plurality of characters in the registration character string, and performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character. This includes training, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model (Universal Background Model, UBM, here a GMM-UBM) of the corresponding character, to obtain the feature vector corresponding to each character in the registration voice information. The voiceprint recognition device can then store, in its model library, the feature vectors corresponding to the characters in the registration voice information read aloud by each registered user during the voiceprint registration stage.
For example, the registration character string is the digit string 0185851, which contains four distinct digits: "0", "1", "5" and "8". The voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character in the registration voice information, obtains the voiceprint features of the voice segments corresponding to "0", "1", "5" and "8", and then, in combination with the preset UBMs of the corresponding characters, trains the feature vector corresponding to each character in the registration voice information: one feature vector each for the digits "0", "1", "5" and "8".
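The storage step in this example can be pictured as a mapping from each registered user to one feature vector per character. This is only a minimal sketch: the `enroll` helper, the `model_library` dictionary and the two-dimensional placeholder vectors are illustrative assumptions, not structures named in the patent.

```python
# Hypothetical model library: user id -> {character: feature vector}.
# All names and vector values are illustrative, not taken from the patent.

def enroll(model_library, user_id, char_vectors):
    """Store one feature vector per distinct character for a registered user."""
    entry = model_library.setdefault(user_id, {})
    for ch, vec in char_vectors.items():
        entry[ch] = list(vec)
    return model_library

registration_string = "0185851"
# Placeholder vectors standing in for the trained per-character feature vectors.
trained = {ch: [float(ord(ch)), 1.0] for ch in set(registration_string)}

model_library = enroll({}, "user_A", trained)
print(sorted(model_library["user_A"]))
```

Keying the library by character is what later allows a verification string with a different digit order to be scored against the same enrollment.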
2) The identity recognition stage of a verification user
In the identity recognition stage, the verification user, that is, a user of unknown identity, reads a verification character string aloud (the first character string mentioned below; the second character string shares at least one character with the first character string). The voiceprint recognition device collects the verification voice information produced as the verification user reads it, then performs speech recognition on the verification voice information to obtain the voice segments contained in it that correspond respectively to a plurality of characters in the verification character string, and performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character. This includes training, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset UBM of the corresponding character, to obtain the feature vector corresponding to each character in the verification voice information. Finally, the device calculates the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information and, if the similarity score reaches the preset verification threshold, determines the verification user to be the registered user corresponding to the registration voice information.
For example, the verification character string is the digit string 85851510. The voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character in the verification voice information produced when the verification user reads it aloud, and obtains the GMMs corresponding to "0", "1", "5" and "8". In combination with the preset UBMs of the corresponding characters, it then calculates the feature vectors of the verification voice information: one feature vector each for the digits "0", "1", "5" and "8". It then calculates the similarity scores between the feature vectors for "0", "1", "5" and "8" in the verification voice information and the feature vectors for the same digits in the registration voice information. If the similarity score reaches the preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.
It should be noted that the voiceprint registration stage of the registered user and the identity recognition stage of the verification user can be implemented in the same device or in different devices. For example, the voiceprint registration stage may be implemented in a first device, which then sends the feature vectors corresponding to the characters in the registration voice information to a second device, so that the identity recognition stage of the verification user can be implemented in the second device.
The two processes are described in detail below through specific embodiments.
Fig. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include:
S201: obtain verification voice information produced when a verification user reads a first character string aloud.
The verification user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition device. The first character string is the character string used to authenticate the verification user. It may be randomly generated, or it may be a preset fixed character string, for example one that is at least partly identical to the second character string corresponding to the pre-generated registration voice information. Specifically, the character string may contain m characters, of which n are mutually different, where m and n are positive integers and m >= n.
For example, the first character string is "12358948", which has 8 characters in total and contains 7 mutually different characters: "1", "2", "3", "4", "5", "8" and "9".
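As a small check of the m and n notation, the counts for the example string can be computed directly. The variable names below are ours and purely illustrative.

```python
# Illustration of the m / n notation; variable names are not from the patent.
first_string = "12358948"

m = len(first_string)                 # total number of characters read aloud
distinct = sorted(set(first_string))  # mutually different characters
n = len(distinct)

print(m, n, distinct)
```

Only the n distinct characters yield separate feature vectors; the repeated "8" contributes a second voice segment for the same character.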
In an optional embodiment, the voiceprint recognition device may generate and display the first character string, so that the verification user reads it aloud from the display.
S202: perform speech recognition on the verification voice information to obtain the voice segments contained in it that correspond respectively to a plurality of characters in the first character string.
As shown in Fig. 3, the voiceprint recognition device may use speech recognition and sound intensity filtering to divide the verification voice information into the voice segments corresponding to the individual characters. Optionally, invalid voice segments may also be removed so that they take no part in subsequent processing.
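The patent does not say how the intensity filter is implemented. One simple possibility, assumed here, is to drop any segment whose root-mean-square amplitude falls below a threshold. The `(character, samples)` layout, the `keep_valid_segments` name and the `min_rms` value are all hypothetical, and the segmentation itself is taken as already done by a speech recognizer.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def keep_valid_segments(segments, min_rms=0.01):
    """Keep only segments loud enough to be treated as valid speech.

    segments: list of (character, samples) pairs, assumed to come from a recognizer.
    min_rms:  assumed intensity threshold; not a value given in the patent.
    """
    return [(ch, samples) for ch, samples in segments if rms(samples) >= min_rms]

segments = [
    ("1", [0.2, -0.3, 0.25]),
    ("2", [0.001, -0.002, 0.001]),   # too quiet, treated as invalid
    ("3", [0.2, 0.2, -0.2]),
]
valid = keep_valid_segments(segments)
print([ch for ch, _ in valid])
```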
S203: extract the voiceprint feature of the voice segment corresponding to each character.
Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients of the voice segment corresponding to each character as the voiceprint feature of that voice segment.
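For orientation, MFCC extraction can be sketched with NumPy alone. The frame length, hop size, FFT size, filter count and number of coefficients below are common defaults chosen for illustration; the patent specifies none of them, and a production system would more likely rely on a tested signal-processing library.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale: (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):          # rising edge of the triangle
            bank[j, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge of the triangle
            bank[j, k] = (right - k) / (right - center)
    return bank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs for a mono signal."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:                # pad very short segments to one frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II basis (unnormalized) applied to the log filterbank energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energy @ basis.T

# One second of a 440 Hz tone stands in for the voice segment of one character.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = mfcc(tone)
print(features.shape)
```

Each row of `features` is one frame-level voiceprint feature; the rows for one character's segment are the samples x that the later steps feed into formula (1).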
S204: train, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model of the corresponding character, to obtain the feature vector corresponding to each character in the verification voice information.
The universal background model (UBM) in the embodiments of the present invention is a Gaussian mixture model trained jointly on voice segments of one particular digit spoken by a large number of speakers. It characterizes the distribution of the speech for that digit in feature space. Because the training data comes from a large number of speakers, the model does not characterize any specific speaker and is independent of identity, so it can be regarded as a universal background model. As an illustration, the UBM may be trained on speech samples from more than 1000 speakers with a total duration of more than 20 hours, in which the characters occur with relatively balanced frequency. The mathematical expression of the UBM is:
$$P(x) = \sum_{i=1}^{C} a_i \, N(x \mid \mu_i, \Sigma_i) \qquad \text{(1)}$$
where $P(x)$ is the probability distribution of the UBM; $C$ is the number of Gaussian components in the UBM, over which the sum is taken; $a_i$ is the weight of the i-th Gaussian component; $\mu_i$ is the mean of the i-th Gaussian component; $\Sigma_i$ is the variance of the i-th Gaussian component; $N(\cdot)$ denotes the Gaussian distribution; and $x$ is the input sample, that is, the voiceprint feature.
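Formula (1) can be evaluated directly. The sketch below assumes diagonal covariances, which matches the text's reference to a per-component variance but is otherwise our choice; the function and argument names are illustrative.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Evaluate P(x) = sum_i a_i N(x | mu_i, Sigma_i) with diagonal Sigma_i.

    x: (D,) sample, weights: (C,), means: (C, D), variances: (C, D).
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    # Log of each component's Gaussian density at x, shape (C,).
    log_n = -0.5 * (np.log(2.0 * np.pi * var).sum(axis=1)
                    + ((x - mu) ** 2 / var).sum(axis=1))
    return float(np.sum(a * np.exp(log_n)))

# Toy two-component UBM for one character (values are made up).
p = gmm_density([0.0, 0.0],
                weights=[0.5, 0.5],
                means=[[0.0, 0.0], [3.0, 3.0]],
                variances=[[1.0, 1.0], [1.0, 1.0]])
print(p)
```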
The voiceprint recognition device may take the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the parameters of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the parameters of the preset universal background model of the corresponding character are adjusted iteratively until the posterior probability P(x) is maximized, and the feature vector of the corresponding character in the verification voice information is determined from the parameters that maximize P(x).
Since a large number of experiments and papers have shown that the means of the Gaussian components in a UBM can be used to distinguish speaker identity, we define the mean supervector of the UBM as follows:
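The defining formula is not reproduced in this text. A common convention, assumed here rather than taken from the source, stacks the C component means of formula (1) into a single column vector:

```latex
m = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_C \end{bmatrix}
```

Under this assumption, a UBM with C components and D-dimensional voiceprint features has a mean supervector of dimension C·D.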
Thus, the voiceprint recognition device may take the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset universal background model of the corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector is adjusted iteratively until the posterior probability P(x) is maximized, and the mean supervector that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information.
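The text does not spell out the MAP update. A widely used form, assumed here, is relevance-factor MAP adaptation of the component means only: each UBM mean is pulled toward the data assigned to it, more strongly the more frames it explains. The `relevance` value of 16 is a conventional choice, not a value from the patent, and diagonal covariances are again assumed.

```python
import numpy as np

def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of UBM means (a sketch, not the patent's algorithm).

    features: (T, D) frames of one character; weights: (C,);
    means, variances: (C, D). Returns the adapted means, shape (C, D).
    """
    x = np.asarray(features, dtype=float)
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    diff = x[:, None, :] - mu[None, :, :]                      # (T, C, D)
    # log(a_i * N(x_t | mu_i, Sigma_i)) for every frame and component, (T, C).
    log_p = (np.log(w) - 0.5 * np.log(2.0 * np.pi * var).sum(axis=1)
             - 0.5 * (diff ** 2 / var).sum(axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)                  # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)                    # responsibilities
    n = post.sum(axis=0)                                       # soft counts, (C,)
    data_mean = (post.T @ x) / np.maximum(n, 1e-10)[:, None]   # (C, D)
    alpha = (n / (n + relevance))[:, None]                     # adaptation weight
    return alpha * data_mean + (1.0 - alpha) * mu

# Toy UBM for one character and three frames from the verification segment.
ubm_weights = [0.5, 0.5]
ubm_means = [[-1.0, 0.0], [1.0, 0.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
frames = np.array([[1.2, 0.1], [0.9, -0.2], [1.1, 0.0]])

adapted = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
supervector = adapted.reshape(-1)   # adapted means stacked into one vector
print(supervector)
```

With only a few frames the adapted means stay close to the UBM, which is the usual motivation for adapting from a background model instead of training a GMM per user from scratch.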
In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, we use probabilistic principal component analysis (PPCA) to restrict the variation of the mean supervector to a subspace. The voiceprint recognition device may take the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, use the maximum a posteriori algorithm to adjust the mean supervector of the preset universal background model of the corresponding character, and combine this with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the following formula may be used to adjust the mean supervector of the preset universal background model of the corresponding character so that the posterior probability of the adjusted universal background model of that character is maximized:
$$M = m + T\omega$$
where $M$ is the adjusted mean supervector of the universal background model of a given character, $m$ is the mean supervector of the universal background model of that character before adjustment, $T$ is the preset supervector subspace matrix, and $\omega$ is the feature vector of the corresponding character in the verification voice information. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector in formula (1) is adjusted by iteratively adjusting $\omega$ until the posterior probability P(x) is maximized, and the $\omega$ that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information. The supervector subspace matrix $T$ is determined from the correlations between the dimensions of the mean supervector of the Gaussian mixture model.
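The patent describes adjusting ω until the posterior is maximized but gives no algorithm. In the closely related total-variability (i-vector) approach, which we assume here as a stand-in, the maximizing ω has a closed form in terms of the segment's Baum-Welch statistics: ω = (I + TᵀΣ⁻¹NT)⁻¹ TᵀΣ⁻¹F. The function name, argument layout and toy values are ours; the supervector is assumed to be laid out component by component.

```python
import numpy as np

def estimate_omega(counts, centered_first, variances, T):
    """Closed-form MAP point estimate of omega in M = m + T @ omega (assumed method).

    counts:         (C,) soft frame counts per UBM component.
    centered_first: (C, D) first-order statistics centered on the UBM means,
                    i.e. sum_t post[t, c] * (x_t - mu_c).
    variances:      (C, D) diagonal UBM variances.
    T:              (C * D, R) supervector subspace matrix.
    """
    f = np.asarray(centered_first, dtype=float)
    C, D = f.shape
    T = np.asarray(T, dtype=float)
    R = T.shape[1]
    inv_var = (1.0 / np.asarray(variances, dtype=float)).reshape(C * D)
    n_rep = np.repeat(np.asarray(counts, dtype=float), D)    # N_c once per dimension
    precision = np.eye(R) + T.T @ (T * (inv_var * n_rep)[:, None])
    rhs = T.T @ (inv_var * f.reshape(C * D))
    return np.linalg.solve(precision, rhs)

# Toy setup: 2 components, 2-dimensional features, 1-dimensional subspace.
T = np.array([[1.0], [0.0], [0.0], [1.0]])
omega = estimate_omega(counts=[2.0, 1.0],
                       centered_first=[[0.4, 0.0], [0.0, -0.3]],
                       variances=[[1.0, 1.0], [1.0, 1.0]],
                       T=T)
print(omega)
```

The appeal of the subspace form is that ω has only R dimensions, far fewer than the C·D entries of the supervector, so the comparison in S205 operates on short vectors.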
S205: calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information and, if the similarity score reaches the preset verification threshold, determine the verification user to be the registered user corresponding to the registration voice information.
Specifically, the voiceprint recognition device may obtain the registration voice information of a registered user in the voiceprint registration stage and, through voiceprint feature extraction and voiceprint model training similar to those in this embodiment, obtain the feature vector corresponding to the voice segment of each character in the registration voice information. The registration voice information may be the voice information the device collects when the registered user reads a second character string aloud. The second character string shares at least one character with the first character string; that is, the second character string corresponding to the registration voice information is at least partly identical to the first character string. In an optional embodiment, the voiceprint recognition device may also obtain the feature vectors of the characters in the registration voice information from an external source. That is, after the registered user records the registration voice information through another device, that device or a server obtains the feature vector corresponding to the voice segment of each character in the registration voice information through voiceprint feature extraction and voiceprint model training, and the voiceprint recognition device retrieves those feature vectors from the other device or server so that, in the identity recognition stage of the verification user, they can be compared with the feature vectors corresponding to the characters in the verification voice information.
In a specific implementation, the similarity score is a value that measures the degree of similarity between the two feature vectors of the same character, obtained after the voiceprint recognition device compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the same character in the preset registration voice information. In an optional embodiment, the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information may be calculated and used as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
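The formula itself does not survive in this text. Assuming the standard cosine measure between the two feature vectors, written with the symbols the text defines next, it would read:

```latex
\mathrm{score}_i = \frac{\langle \omega_i(\mathrm{tar}),\ \omega_i(\mathrm{test}) \rangle}
                        {\lVert \omega_i(\mathrm{tar}) \rVert \, \lVert \omega_i(\mathrm{test}) \rVert}
```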
Here, the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, $\omega_i(\mathrm{tar})$ denotes the feature vector of that character in the verification voice information, and $\omega_i(\mathrm{test})$ denotes the feature vector of that character in the registration voice information. If the verification voice information and the registration voice information share several characters, the similarity scores calculated for the individual characters with the above formula may be averaged; if the mean similarity score reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are several registered users, such as the registered users A, B and C shown in Fig. 1, the feature vector of a given character for the verification user can be compared with the feature vector of the same character for each registered user. When the similarity score between a registered user's feature vector for the character and the feature vector of that character in the verification voice is the highest and reaches the preset verification threshold, that registered user is taken as the identity recognition result for the verification user.
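The scoring and decision logic described above can be sketched as follows, assuming cosine similarity as the per-character score. The dictionary layout, the function names and the threshold values are illustrative assumptions; the patent does not prescribe them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_score(verify_vectors, enrolled_vectors):
    """Average the per-character scores over the characters both sides share."""
    shared = set(verify_vectors) & set(enrolled_vectors)
    if not shared:
        return None
    return sum(cosine(verify_vectors[c], enrolled_vectors[c]) for c in shared) / len(shared)

def identify(verify_vectors, model_library, threshold):
    """Return the best-scoring registered user if the score reaches the threshold."""
    best_user, best_score = None, None
    for user, enrolled in model_library.items():
        score = mean_score(verify_vectors, enrolled)
        if score is not None and (best_score is None or score > best_score):
            best_user, best_score = user, score
    if best_score is not None and best_score >= threshold:
        return best_user
    return None

# Toy per-character feature vectors (made-up values).
model_library = {
    "A": {"0": [1.0, 0.0], "1": [0.0, 1.0]},
    "B": {"0": [0.0, 1.0], "1": [1.0, 0.0]},
}
verification = {"0": [0.9, 0.1], "1": [0.1, 0.9]}

print(identify(verification, model_library, threshold=0.8))
```

Returning `None` when the best score misses the threshold keeps an unknown speaker from being assigned to whichever registered user happens to be closest.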
In an optional embodiment, the same character may occur more than once in the verification voice information; for example, 0, 1, 5 and 8 each occur twice in the verification voice information shown in Fig. 2. In that case, the feature vectors obtained from the two voice segments corresponding to character 0 can each be scored against the feature vector of character 0 in the preset registration voice information, and the average of the two similarity scores is taken as the similarity score between the feature vector of character 0 in this verification voice information and the feature vector of character 0 in the preset registration voice information. The other characters are handled in the same way.
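Averaging over repeated characters amounts to grouping the per-segment scores by character. A minimal sketch follows; the `(character, score)` input layout and the score values are assumed for illustration.

```python
from collections import defaultdict
from statistics import mean

def average_repeated(segment_scores):
    """Collapse per-segment scores into one score per character.

    segment_scores: list of (character, score) pairs, one per voice segment.
    """
    by_char = defaultdict(list)
    for ch, score in segment_scores:
        by_char[ch].append(score)
    return {ch: mean(scores) for ch, scores in by_char.items()}

# Character "0" was read twice, so its two scores are averaged.
per_segment = [("0", 0.8), ("1", 0.9), ("0", 0.6)]
per_character = average_repeated(per_segment)
print(per_character)
```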
It should be noted that there are many other ways to measure the similarity between two feature vectors. The above is only one embodiment provided by the present invention. On the basis of the disclosed solution, a person skilled in the art can, without creative effort, obtain further ways of calculating the similarity score between the feature vectors of the characters shared by the verification voice information and the registration voice information; the present invention does not enumerate them exhaustively.
Thus, in this embodiment, the voiceprint features of the voice segments corresponding to the individual characters in the verification user's verification voice information are obtained, and the feature vector corresponding to each character in the verification voice information is trained in combination with the preset UBM of the corresponding character. Each of these feature vectors is then compared for similarity with the feature vector of the same character in the registration voice information to determine the identity of the verification user. Because the user feature vectors being compared correspond to specific characters, this approach takes full account of the user's voiceprint characteristics when reading different characters, and can therefore effectively improve voiceprint recognition accuracy.
Fig. 5 is a schematic flowchart of voiceprint registration by a registered user in an embodiment of the present invention. As shown in the figure, the voiceprint registration flow in this embodiment may include:
S501: obtain registration voice information produced by a registered user reading aloud a second character string, the second character string having at least one character in common with the first character string.
The registered user is a user whose identity has been determined to be legitimate. The second character string is the character string used to collect the voiceprint feature vectors of the registered user; it may be randomly generated or may be a preset fixed character string. Specifically, the second character string may likewise include m characters, of which n are mutually different, where m and n are positive integers and m ≥ n.
In an optional embodiment, the voiceprint recognition device may generate and display the second character string so that the registered user reads it aloud as displayed.
S502: perform speech recognition on the registration voice information to obtain the speech segments contained in the registration voice information that respectively correspond to the multiple characters in the second character string.
Through speech recognition and sound-intensity filtering, the voiceprint recognition device may divide the registration voice information into speech segments corresponding to the multiple characters. Optionally, invalid speech segments may also be removed so that they do not take part in subsequent processing.
S503: extract the voiceprint features of the speech segment corresponding to each character in the registration voice information.
Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Prediction) coefficients of the speech segment corresponding to each character as the voiceprint features of that speech segment.
S504: according to the voiceprint features of the speech segment corresponding to each character in the registration voice information, and in combination with the universal background model (UBM) corresponding to the preset character, train to obtain the feature vector corresponding to each character in the registration voice information.
For the expression of the UBM, refer to the embodiment above. This step of the voiceprint registration flow is similar to S204 of the voiceprint recognition flow: the voiceprint recognition device may take the voiceprint features of the speech segment corresponding to each character in the registration voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the parameters of the UBM corresponding to the preset character. That is, after the voiceprint features of the speech segment corresponding to each character in the registration voice information are substituted into formula (1) as input samples, the parameters of the UBM corresponding to the preset character are adjusted continuously until the posterior probability P(x) is maximized, and the feature vector of the corresponding character in the registration voice information is determined from the parameters that maximize P(x).
Furthermore, because the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the voiceprint recognition device may take the voiceprint features of the speech segment corresponding to each character in the registration voice information as training sample data and use the MAP algorithm to adjust the mean supervector of the UBM corresponding to the preset character. That is, after the voiceprint features are substituted into formula (1) as input samples, the mean supervector is adjusted continuously until the posterior probability P(x) is maximized, and the mean supervector that maximizes P(x) is taken as the feature vector of the corresponding character in the registration voice information.
In another optional embodiment, the mean supervector of the UBM corresponding to the preset character may be adjusted using the following formula so that the posterior probability of the adjusted UBM of that character is maximized:
$$M = m + T\omega$$
where M denotes the adjusted mean supervector of the UBM of a given character, m denotes the mean supervector of that character's UBM before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the registration voice information. After the voiceprint features of the speech segment corresponding to each character in the registration voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) can be adjusted by continuously adjusting ω until the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector of the corresponding character in the registration voice information.
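The relation M = m + Tω can be illustrated numerically. The sketch below only evaluates the adjusted mean supervector for a given ω; it does not perform the search for the ω that maximizes P(x). The dimensions and values are assumptions chosen for readability, and the function name is hypothetical.

```python
def adjusted_supervector(m, T, omega):
    """Return M = m + T * omega.

    m:     mean supervector before adjustment, a list of length D
    T:     supervector subspace matrix, a list of D rows of length R
    omega: low-dimensional feature vector, a list of length R
    """
    if len(T) != len(m) or any(len(row) != len(omega) for row in T):
        raise ValueError("T must have shape len(m) x len(omega)")
    # Each entry of M is the old mean plus one row of T dotted with omega.
    return [
        m_i + sum(t * w for t, w in zip(row, omega))
        for m_i, row in zip(m, T)
    ]


# Toy example with D = 3 and R = 2 (illustrative numbers only).
M = adjusted_supervector(
    [1.0, 2.0, 3.0],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [0.5, -1.0],
)
```

Because ω has far fewer entries than M, the search takes place in a much smaller space than the full supervector.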
Fig. 6 is a schematic flowchart of the voiceprint recognition method in another embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include the following flow:
S601: randomly generate the first character string and display it.
S602: obtain verification voice information produced by the verifying user reading aloud the first character string.
S603: identify the valid speech segments and invalid speech segments in the verification voice information.
Specifically, the verification voice may be divided according to sound intensity, and speech segments with low sound intensity are regarded as invalid speech segments (for example, silent sections and impulse noise).
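The division by sound intensity described above can be sketched as a frame-energy threshold. The text does not specify how intensity is measured, so the use of mean squared amplitude per frame, the frame length and the threshold value are all assumptions of this sketch, and the function name is hypothetical.

```python
def split_by_intensity(samples, frame_len, threshold):
    """Group consecutive frames into (start, end, is_valid) runs.

    A frame is treated as valid when its mean squared amplitude is at
    least `threshold`; adjacent frames with the same label are merged.
    """
    runs = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        valid = energy >= threshold
        end = start + len(frame)
        if runs and runs[-1][2] == valid:
            # Extend the previous run instead of starting a new one.
            runs[-1] = (runs[-1][0], end, valid)
        else:
            runs.append((start, end, valid))
    return runs


# Silence, a short burst of sound, then silence again (toy signal).
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
runs = split_by_intensity(signal, frame_len=4, threshold=0.01)
```

Only the runs labelled valid would be passed on to speech recognition in S604.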
S604: perform speech recognition on the valid speech segments to obtain the speech segments that respectively correspond to the multiple characters in the first character string.
The speech segments that respectively correspond to the multiple characters in the first character string can be obtained by speech recognition.
S605: determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.
To effectively prevent the voice information of a registered user from being illegally recorded or copied and then used for voiceprint recognition, a different first character string may be generated at random each time. In this step it is judged whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If it is not, voiceprint recognition may be judged to have failed; if it is, the subsequent flow is executed.
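The random challenge and the order check above can be sketched as follows. The string length, the digit alphabet and both function names are assumptions for illustration; the sketch also assumes that speech recognition has already produced the recognized characters in the order in which they were spoken.

```python
import secrets


def generate_first_string(length=8, alphabet="0123456789"):
    """Randomly generate a challenge string for the user to read aloud."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def order_matches(recognized_chars, first_string):
    """Check that the recognized characters follow the challenge order."""
    return list(recognized_chars) == list(first_string)


challenge = generate_first_string()
# A replayed recording of an older challenge would usually fail this check.
replay_ok = order_matches(["1", "2"], "21")
```

A fresh challenge per attempt means that a recording made during an earlier session is unlikely to contain the characters in the required order.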
S606: extract the voiceprint features of the speech segment corresponding to each character.
Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Prediction) coefficients of the speech segment corresponding to each character as the voiceprint features of that speech segment.
S607: take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, and adjust the mean supervector of the UBM corresponding to the preset character using the maximum a posteriori algorithm, so as to estimate the feature vector corresponding to each character in the verification voice information.
Because extensive experiments and published papers have shown that the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the voiceprint recognition device may take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the UBM corresponding to the preset character. That is, after the voiceprint features are substituted into formula (1) as input samples, the mean supervector is adjusted continuously until the posterior probability P(x) is maximized, and the mean supervector that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information.
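The text does not spell out the MAP update itself. As an illustration only, one widely used form in the GMM-UBM literature is relevance-factor MAP adaptation of the component means; whether the patent uses exactly this form is not stated. The symbols below are assumptions introduced for this sketch and are not notation taken from formula (1): $w_k$, $\mu_k$ and $\Sigma_k$ are the weight, mean and covariance of the k-th Gaussian component of the character's UBM, $x_t$ are the voiceprint feature frames of the speech segment, and $r$ is a relevance factor.

```latex
\begin{aligned}
\gamma_k(x_t) &= \frac{w_k\,\mathcal{N}(x_t \mid \mu_k, \Sigma_k)}
                      {\sum_{j} w_j\,\mathcal{N}(x_t \mid \mu_j, \Sigma_j)}, \\
n_k &= \sum_{t} \gamma_k(x_t), \qquad
E_k = \frac{1}{n_k} \sum_{t} \gamma_k(x_t)\, x_t, \\
\hat{\mu}_k &= \alpha_k E_k + (1 - \alpha_k)\,\mu_k, \qquad
\alpha_k = \frac{n_k}{n_k + r}.
\end{aligned}
```

Concatenating the adapted means $\hat{\mu}_k$ of all components gives an adjusted mean supervector for that character. Components that see little data (small $n_k$) stay close to the UBM means.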
In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the voiceprint recognition device may adjust the mean supervector of the UBM corresponding to the preset character using the following formula so that the posterior probability of the adjusted UBM of that character is maximized:
$$M = m + T\omega$$
where M denotes the adjusted mean supervector of the UBM of a given character, m denotes the mean supervector of that character's UBM before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the verification voice information. After the voiceprint features of the speech segment corresponding to each character in the verification voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) can be adjusted by continuously adjusting ω until the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information.
S608: calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information; if the similarity score reaches the preset verification threshold, determine the verifying user to be the registered user corresponding to the registration voice information.
In this embodiment, the voiceprint recognition device may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information, and use it as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
$$\mathrm{score}_i = \frac{\langle \omega_i(\mathrm{tar}),\, \omega_i(\mathrm{test}) \rangle}{\lVert \omega_i(\mathrm{tar}) \rVert \, \lVert \omega_i(\mathrm{test}) \rVert}$$
where the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, $\omega_i(\mathrm{tar})$ denotes the feature vector corresponding to that character in the verification voice information, and $\omega_i(\mathrm{test})$ denotes the feature vector corresponding to that character in the registration voice information. If the verification voice information and the registration voice information contain multiple identical characters, the mean of the similarity scores calculated for each character by the above formula may be taken; if this mean similarity score reaches the corresponding preset verification threshold, the verifying user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the similarity between the feature vector of a given character of the verifying user and the feature vector of the same character of each registered user may be calculated; when the similarity score between the feature vector of that character for some registered user and the feature vector of that character in the verification voice is the highest and reaches the preset verification threshold, that registered user is taken as the identification result for the verifying user.
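The cosine score and the mean-over-shared-characters decision described above can be sketched as follows. The dictionary layout (character mapped to feature vector), the function names and the threshold value are assumptions of this sketch; the vectors are toy values, not real voiceprint features.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def verify(verif_vectors, reg_vectors, threshold):
    """Return (accepted, mean_score) over the characters both sides share."""
    shared = verif_vectors.keys() & reg_vectors.keys()
    if not shared:
        return False, None
    mean_score = sum(
        cosine_similarity(verif_vectors[c], reg_vectors[c]) for c in shared
    ) / len(shared)
    return mean_score >= threshold, mean_score


verif = {"1": [1.0, 0.0], "2": [0.0, 2.0]}
reg = {"1": [2.0, 0.0], "2": [0.0, 1.0], "3": [1.0, 1.0]}
accepted, score = verify(verif, reg, threshold=0.8)
```

Character "3" appears only in the registration data, so it does not take part in the comparison.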
Thus, by comparing the feature vector of each character in the verification voice information with the feature vector of the corresponding character in the registration voice information, and combining this with the check on the temporal order of the speech segments, this embodiment can further ensure the accuracy of the identity determined for the verifying user.
Fig. 7 is a schematic structural diagram of a voiceprint recognition device in an embodiment of the present invention. As shown in the figure, the voiceprint recognition device in this embodiment may include:
Voice obtaining module 710, configured to obtain verification voice information produced by a verifying user reading aloud a first character string.
The verifying user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition device. The first character string is the character string used for authenticating the verifying user; it may be randomly generated or may be a preset fixed character string, for example a character string at least partly identical to the second character string corresponding to the pre-generated registration voice information. Specifically, the character string may include m characters, of which n are mutually different, where m and n are positive integers and m ≥ n.
For example, the first character string is "12358948", which has 8 characters in total and contains 7 mutually different characters: "1", "2", "3", "4", "5", "8" and "9".
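The counts m and n in the example above can be checked directly; the function name is hypothetical.

```python
def count_characters(char_string):
    """Return (m, n): total characters and mutually different characters."""
    m = len(char_string)
    n = len(set(char_string))
    return m, n


m, n = count_characters("12358948")  # m == 8, n == 7
```

Here "8" is the only repeated character, which is why n is one less than m.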
Speech segment recognition module 720, configured to perform speech recognition on the verification voice information to obtain the speech segments contained in the verification voice information that respectively correspond to the multiple characters in the first character string.
As shown in Fig. 3, the speech segment recognition module 720 may, through speech recognition and sound-intensity filtering, divide the verification voice information into speech segments corresponding to the multiple characters. Optionally, invalid speech segments may also be removed so that they do not take part in subsequent processing.
In an optional embodiment, the speech segment recognition module may further include, as shown in Fig. 8:
Valid segment recognition unit 721, configured to identify the valid speech segments and invalid speech segments in the verification voice information.
Specifically, the valid segment recognition unit 721 may divide the verification voice according to sound intensity, and speech segments with low sound intensity are regarded as invalid speech segments (for example, silent sections and impulse noise).
Speech recognition unit 722, configured to perform speech recognition on the valid speech segments to obtain the speech segments that respectively correspond to the multiple characters in the first character string.
Voiceprint feature extraction module 730, configured to extract the voiceprint features of the speech segment corresponding to each character in the verification voice information.
Specifically, the voiceprint feature extraction module 730 may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Prediction) coefficients of the speech segment corresponding to each character as the voiceprint features of that speech segment.
Feature model training module 740, configured to train, according to the voiceprint features of the speech segment corresponding to each character and in combination with the UBM corresponding to the preset character, to obtain the feature vector corresponding to each character in the verification voice information.
The feature model training module 740 may take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the parameters of the UBM corresponding to the preset character. That is, after the voiceprint features are substituted into formula (1) as input samples, the parameters of the UBM corresponding to the preset character are adjusted continuously until the posterior probability P(x) is maximized, so that the feature model training module 740 can determine the feature vector of the corresponding character in the verification voice information from the parameters that maximize P(x).
Because extensive experiments and published papers have shown that the mean of each Gaussian component in the UBM can be used to distinguish speaker identity, the mean supervector of the UBM is defined as the vector formed by concatenating the mean vectors of all its Gaussian components.
Accordingly, the feature model training module 740 may take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the UBM corresponding to the preset character. That is, after the voiceprint features are substituted into formula (1) as input samples, the mean supervector is adjusted continuously until the posterior probability P(x) is maximized, and the feature model training module 740 may take the mean supervector that maximizes P(x) as the feature vector of the corresponding character in the verification voice information.
In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the variation range of the mean supervector is restricted to a subspace by means of probabilistic principal component analysis (PPCA). The feature model training module 740 may take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, adjust the mean supervector of the UBM corresponding to the preset character using the maximum a posteriori algorithm, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the feature model training module 740 may adjust the mean supervector of the UBM corresponding to the preset character using the following formula so that the posterior probability of the adjusted UBM of that character is maximized:
$$M = m + T\omega$$
where M denotes the adjusted mean supervector of the UBM of a given character, m denotes the mean supervector of that character's UBM before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector of the corresponding character in the verification voice information. After the voiceprint features of the speech segment corresponding to each character in the verification voice information are substituted into formula (1) as input samples, the mean supervector in formula (1) can be adjusted by continuously adjusting ω until the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector of the corresponding character in the verification voice information. The supervector subspace matrix T is determined from the correlation between the dimensions of the mean supervector of the Gaussian mixture model.
Similarity judgment module 750, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information.
Specifically, the voiceprint recognition device may obtain the registration voice information of the registered user in the voiceprint registration stage and obtain, through the speech segment recognition module 720, the voiceprint feature extraction module 730 and the feature model training module 740, the feature vector corresponding to the speech segment of each character in the registration voice information. The registration voice information may be registration voice information, obtained by the voiceprint recognition device, that is produced by the registered user reading aloud a second character string; the second character string has at least one character in common with the first character string, that is, the second character string corresponding to the registration voice information is at least partly identical to the first character string. In another optional embodiment, the voiceprint recognition device may also obtain the feature vectors of the corresponding characters in the registration voice information from an external source: after the registered user records the registration voice information on another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the speech segment of each character in the registration voice information, and the voiceprint recognition device obtains these feature vectors from the other device or server. In the stage of identifying the verifying user, the similarity judgment module 750 can then compare them with the feature vectors corresponding to the characters in the verification voice information.
In a specific implementation, the similarity score is a value that measures the degree of similarity between the two feature vectors of the same character, obtained after the voiceprint recognition device compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the same character in the preset registration voice information. In an optional embodiment, the similarity judgment module 750 may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in the preset registration voice information, and use it as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:
$$\mathrm{score}_i = \frac{\langle \omega_i(\mathrm{tar}),\, \omega_i(\mathrm{test}) \rangle}{\lVert \omega_i(\mathrm{tar}) \rVert \, \lVert \omega_i(\mathrm{test}) \rVert}$$
where the subscript i denotes the i-th character shared by the verification voice information and the registration voice information, $\omega_i(\mathrm{tar})$ denotes the feature vector corresponding to that character in the verification voice information, and $\omega_i(\mathrm{test})$ denotes the feature vector corresponding to that character in the registration voice information. In an optional embodiment, if the same character occurs more than once in the verification voice information (for example, in the verification voice information shown in Fig. 2 the characters 0, 1, 5 and 8 each occur twice), the feature vectors obtained from the two speech segments corresponding to character 0 may each be scored against the feature vector of character 0 in the preset registration voice information. The average of the two similarity scores is then taken as the similarity score between the feature vector of character 0 in this verification voice information and the feature vector of character 0 in the preset registration voice information. Other repeated characters are handled in the same way.
It should be noted that there are many other ways to measure the similarity between two feature vectors; the above is only one embodiment provided by the present invention. On the basis of the disclosed scheme, those skilled in the art can obtain, without creative effort, further ways of calculating the similarity score between the feature vectors of characters shared by the verification voice information and the registration voice information, which the present invention does not enumerate exhaustively.
User identification module 760, configured to determine the verifying user to be the registered user corresponding to the registration voice information if the similarity score reaches the preset verification threshold.
If the verification voice information and the registration voice information contain multiple identical characters, the user identification module 760 may take the mean of the similarity scores calculated for each character by the similarity judgment module 750; if this mean similarity score reaches the corresponding preset verification threshold, the verifying user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the user identification module 760 may use the similarity between the feature vector of a given character of the verifying user and the feature vector of the same character of each registered user; when the similarity score between the feature vector of that character for some registered user and the feature vector of that character in the verification voice is the highest and reaches the preset verification threshold, that registered user is taken as the identification result for the verifying user.
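The multi-user case above (pick the registered user with the highest score, then apply the threshold) can be sketched as follows. The data layout (user mapped to a dictionary of character to feature vector), the function names and the threshold are assumptions of this sketch, and the vectors are toy values.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def identify(verif_vectors, enrolled, threshold):
    """Return the best-matching registered user, or None if below threshold."""
    best_user, best_score = None, -math.inf
    for user, reg_vectors in enrolled.items():
        shared = verif_vectors.keys() & reg_vectors.keys()
        if not shared:
            continue  # no character in common with this user
        score = sum(cosine(verif_vectors[c], reg_vectors[c]) for c in shared) / len(shared)
        if score > best_score:
            best_user, best_score = user, score
    # The highest score must also reach the verification threshold.
    return best_user if best_score >= threshold else None


enrolled = {
    "A": {"1": [1.0, 0.0]},
    "B": {"1": [0.0, 1.0]},
    "C": {"1": [-1.0, 0.0]},
}
result = identify({"1": [2.0, 0.0]}, enrolled, threshold=0.8)
```

Applying the threshold after the maximum is found means that an unknown speaker is rejected even though one registered user is always the closest.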
Furthermore, in an optional embodiment, the voice obtaining module 710 is further configured to obtain registration voice information produced by a registered user reading aloud a second character string, the second character string having at least one character in common with the first character string;
the speech segment recognition module 720 is further configured to perform speech recognition on the registration voice information to obtain the speech segments contained in the registration voice information that respectively correspond to the multiple characters in the second character string;
the voiceprint feature extraction module 730 is further configured to extract the voiceprint features of the speech segment corresponding to each character in the registration voice information;
the feature model training module 740 is further configured to train, according to the voiceprint features of the speech segment corresponding to each character in the registration voice information and in combination with the UBM corresponding to the preset character, to obtain the feature vector corresponding to each character in the registration voice information.
In an optional embodiment, the voiceprint recognition device may further include:
Character order determining module 770, configured to determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.
To effectively prevent the voice information of a registered user from being illegally recorded or copied and then used for voiceprint recognition, a different first character string may be generated at random each time, and it is judged whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If it is not, voiceprint recognition may be judged to have failed; if it is, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information.
In an optional embodiment, the voiceprint recognition device may further include:
Character string display module 700, configured to randomly generate the first character string and display it.
Thus, in this embodiment, the voiceprint features of the speech segment corresponding to each character in the verification voice information of the verifying user are obtained, the feature vector corresponding to each character in the verification voice information is obtained by training in combination with the UBM of the corresponding preset character, and the feature vector of each character in the verification voice information is compared for similarity with the feature vector of the corresponding character in the registration voice information, so as to determine the identity of the verifying user. In this way the user feature vectors being compared correspond to specific characters, which takes full account of the user's voiceprint features when reading different characters aloud and can therefore effectively improve voiceprint recognition accuracy.
In an actual test example (training samples from 1,000 people and 290,000 tests, of which about 10,000 were identity-matched tests and about 280,000 were mismatched tests), a recall rate of 79.8% was achieved at an error rate of one in a thousand, and the equal error rate (EER) was 3.39%. Compared with the traditional text-independent modeling method, voiceprint recognition performance improved by more than 40%.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the invention.

Claims (16)

1. A voiceprint recognition method, characterized in that the method comprises:
obtaining verification voice information produced by a verifying user reading aloud a first character string;
performing speech recognition on the verification voice information to obtain the speech segments contained in the verification voice information that respectively correspond to multiple characters in the first character string, thereby obtaining the speech segment corresponding to each character;
extracting the voiceprint features of the speech segment corresponding to each character;
restricting the variation range of a mean supervector to a subspace by means of probabilistic principal component analysis, taking the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, adjusting the mean supervector of the universal background model corresponding to the preset character using a maximum a posteriori algorithm, and combining a preset supervector subspace matrix, so as to obtain by training the feature vector corresponding to each character in the verification voice information;
calculating respectively the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the same character in preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining the verifying user to be the registered user corresponding to the registration voice information.
2. The voiceprint recognition method according to claim 1, characterized in that, before the obtaining of the verification voice information produced by the user to be verified reading aloud the first character string, the method further comprises:
Obtaining registration voice information produced by a registered user reading aloud a second character string, the second character string and the first character string having at least one character in common;
Performing speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that respectively correspond to the multiple characters in the second character string;
Extracting the voiceprint features of the speech segment corresponding to each character in the registration voice information;
Training, according to the voiceprint features of the speech segment corresponding to each character in the registration voice information and in combination with the preset universal background model corresponding to the respective character, to obtain the feature vector corresponding to each character in the registration voice information.
3. The voiceprint recognition method according to claim 1, characterized in that the taking of the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, the adjusting, using a maximum a posteriori probability algorithm, of the mean supervector of the preset universal background model corresponding to the respective character, and the training, in combination with a preset supervector subspace matrix, to obtain the feature vector corresponding to each character in the verification voice information comprise:
Taking the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, and adjusting the mean supervector of the preset universal background model corresponding to the respective character using the following formula, so that the posterior probability of the adjusted universal background model corresponding to the respective character is maximized:
M = m + Tω, where M is the adjusted mean supervector of the universal background model of a given character, m is the mean supervector of the universal background model of the respective character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the verification voice information.
4. The voiceprint recognition method according to claim 1, characterized in that the supervector subspace matrix is determined according to the correlation between the weights of the Gaussian components in the universal background model.
5. The voiceprint recognition method according to claim 1, characterized in that the calculating of the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information comprises:
Calculating the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information, and taking it as the similarity score.
6. The voiceprint recognition method according to claim 1, characterized in that the performing of speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that respectively correspond to the multiple characters in the first character string comprises:
Identifying the valid speech segments and the invalid speech segments in the verification voice information;
Performing speech recognition on the valid speech segments to obtain the speech segments that respectively correspond to the multiple characters in the first character string.
7. The voiceprint recognition method according to claim 1, characterized in that, before the determining of the user to be verified to be the registered user corresponding to the registration voice information, the method further comprises:
Determining that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the respective characters in the first character string.
8. The voiceprint recognition method according to any one of claims 1-7, characterized in that, before the obtaining of the verification voice information produced by the user to be verified reading aloud the first character string, the method further comprises:
Randomly generating the first character string and displaying it.
9. A voiceprint recognition device, characterized in that the device comprises:
A voice acquisition module, configured to obtain verification voice information produced by a user to be verified reading aloud a first character string;
A speech segment recognition module, configured to perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that respectively correspond to the multiple characters in the first character string, thereby obtaining the speech segment corresponding to each character;
A voiceprint feature extraction module, configured to extract the voiceprint features of the speech segment corresponding to each character in the verification voice information;
A feature model training module, configured to restrict the range of variation of the mean supervector to a subspace by means of a probabilistic principal component analysis method, take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, adjust, using a maximum a posteriori probability algorithm, the mean supervector of the preset universal background model corresponding to the respective character, and, in combination with a preset supervector subspace matrix, train to obtain the feature vector corresponding to each character in the verification voice information;
A similarity judgment module, configured to separately calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in preset registration voice information;
A user identification module, configured to determine the user to be verified to be the registered user corresponding to the registration voice information if the similarity score reaches a preset verification threshold.
10. The voiceprint recognition device according to claim 9, characterized in that:
The voice acquisition module is further configured to obtain registration voice information produced by a registered user reading aloud a second character string, the second character string and the first character string having at least one character in common;
The speech segment recognition module is further configured to perform speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that respectively correspond to the multiple characters in the second character string;
The voiceprint feature extraction module is further configured to extract the voiceprint features of the speech segment corresponding to each character in the registration voice information;
The feature model training module is further configured to train, according to the voiceprint features of the speech segment corresponding to each character in the registration voice information and in combination with the preset universal background model corresponding to the respective character, to obtain the feature vector corresponding to each character in the registration voice information.
11. The voiceprint recognition device according to claim 9, characterized in that the feature model training module is specifically configured to:
Take the voiceprint features of the speech segment corresponding to each character in the verification voice information as training sample data, and adjust the mean supervector of the preset universal background model corresponding to the respective character using the following formula, so that the posterior probability of the adjusted universal background model corresponding to the respective character is maximized:
M = m + Tω, where M is the adjusted mean supervector of the universal background model of a given character, m is the mean supervector of the universal background model of the respective character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the verification voice information.
12. The voiceprint recognition device according to claim 9, characterized in that the supervector subspace matrix is determined according to the correlation between the dimension vectors in the mean supervector of the Gaussian mixture model.
13. The voiceprint recognition device according to claim 9, characterized in that the similarity judgment module is configured to:
Calculate the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information, and take it as the similarity score.
14. The voiceprint recognition device according to claim 9, characterized in that the speech segment recognition module comprises:
A valid segment recognition unit, configured to identify the valid speech segments and the invalid speech segments in the verification voice information;
A speech recognition unit, configured to perform speech recognition on the valid speech segments to obtain the speech segments that respectively correspond to the multiple characters in the first character string.
15. The voiceprint recognition device according to claim 9, characterized in that it further comprises:
A character order determining module, configured to determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the respective characters in the first character string.
16. The voiceprint recognition device according to any one of claims 9-15, characterized in that it further comprises:
A character string display module, configured to randomly generate the first character string and display it.
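
The decision step recited in claims 1, 5 and 7 (per-character cosine scoring, a threshold test, and a character-order check) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the function names, the default threshold of 0.7, and the choice to average the per-character scores before comparing with the threshold are all hypothetical, since the claims do not say how the per-character scores are combined. The feature vectors are assumed to have been produced already by the training step.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(prompt, recognized, verify_vecs, enrolled_vecs, threshold=0.7):
    """Return (accepted, score) for one verification attempt.

    prompt        -- the first character string shown to the user
    recognized    -- characters recovered by speech recognition, in spoken order
    verify_vecs   -- one feature vector per recognized character, same order
    enrolled_vecs -- dict mapping a character to its registration feature vector
    threshold     -- hypothetical verification threshold (assumption)
    """
    # Order check: spoken characters must match the prompt in sequence.
    if list(recognized) != list(prompt):
        return False, 0.0

    scores = []
    for ch, vec in zip(recognized, verify_vecs):
        if ch not in enrolled_vecs:
            # No registration vector for this character: reject (assumption).
            return False, 0.0
        scores.append(cosine_similarity(vec, enrolled_vecs[ch]))

    if not scores:
        return False, 0.0

    # Assumption: combine per-character scores by a plain average.
    avg = sum(scores) / len(scores)
    return avg >= threshold, avg


if __name__ == "__main__":
    enrolled = {"1": [1.0, 0.0], "2": [0.0, 1.0]}
    print(verify("12", "12", [[2.0, 0.0], [0.0, 3.0]], enrolled))
```

Because the score depends only on the angle between vectors, scaling a feature vector does not change the result. The prompt and the registration string need only share characters, which is why the registration vectors are looked up per character rather than per utterance.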
CN201610416650.3A 2016-06-12 2016-06-12 Voiceprint recognition method and device Active CN106098068B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610416650.3A CN106098068B (en) 2016-06-12 2016-06-12 Voiceprint recognition method and device
PCT/CN2017/087911 WO2017215558A1 (en) 2016-06-12 2017-06-12 Voiceprint recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610416650.3A CN106098068B (en) 2016-06-12 2016-06-12 Voiceprint recognition method and device

Publications (2)

Publication Number Publication Date
CN106098068A CN106098068A (en) 2016-11-09
CN106098068B true CN106098068B (en) 2019-07-16

Family

ID=57846666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610416650.3A Active CN106098068B (en) 2016-06-12 2016-06-12 Voiceprint recognition method and device

Country Status (2)

Country Link
CN (1) CN106098068B (en)
WO (1) WO2017215558A1 (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106098068B (en) * 2016-06-12 2019-07-16 腾讯科技(深圳)有限公司 Voiceprint recognition method and device
CN110169014A (en) 2017-01-03 2019-08-23 诺基亚技术有限公司 Device, method and computer program product for certification
CN108447471B (en) * 2017-02-15 2021-09-10 腾讯科技(深圳)有限公司 Speech recognition method and speech recognition device
CN107610708B (en) * 2017-06-09 2018-06-19 平安科技(深圳)有限公司 Identify the method and apparatus of vocal print
CN109102812B (en) * 2017-06-21 2021-08-31 北京搜狗科技发展有限公司 Voiceprint recognition method and system and electronic equipment
CN107492379B (en) 2017-06-30 2021-09-21 百度在线网络技术(北京)有限公司 Voiceprint creating and registering method and device
CN107248410A (en) * 2017-07-19 2017-10-13 浙江联运知慧科技有限公司 The method that Application on Voiceprint Recognition dustbin opens the door
CN109559759B (en) * 2017-09-27 2021-10-08 华硕电脑股份有限公司 Electronic device with incremental registration unit and method thereof
CN109584884B (en) * 2017-09-29 2022-09-13 腾讯科技(深圳)有限公司 Voice identity feature extractor, classifier training method and related equipment
CN107886943A (en) * 2017-11-21 2018-04-06 广州势必可赢网络科技有限公司 A kind of method for recognizing sound-groove and device
CN108154588B (en) * 2017-12-29 2020-11-27 深圳市艾特智能科技有限公司 Unlocking method and system, readable storage medium and intelligent device
CN110047491A (en) * 2018-01-16 2019-07-23 中国科学院声学研究所 A kind of relevant method for distinguishing speek person of random digit password and device
CN108269590A (en) * 2018-01-17 2018-07-10 广州势必可赢网络科技有限公司 A kind of vocal cords restore methods of marking and device
CN108447489B (en) * 2018-04-17 2020-05-22 清华大学 Continuous voiceprint authentication method and system with feedback
CN110875044B (en) * 2018-08-30 2022-05-03 中国科学院声学研究所 Speaker identification method based on word correlation score calculation
CN109117622B (en) * 2018-09-19 2020-09-01 北京容联易通信息技术有限公司 Identity authentication method based on audio fingerprints
CN109257362A (en) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN111199729B (en) * 2018-11-19 2023-09-26 阿里巴巴集团控股有限公司 Voiceprint recognition method and voiceprint recognition device
CN109473107B (en) * 2018-12-03 2020-12-22 厦门快商通信息技术有限公司 Text semi-correlation voiceprint recognition method and system
CN111669350A (en) * 2019-03-05 2020-09-15 阿里巴巴集团控股有限公司 Identity verification method, verification information generation method, payment method and payment device
CN110600041B (en) * 2019-07-29 2022-04-29 华为技术有限公司 Voiceprint recognition method and device
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium
CN110517695A (en) * 2019-09-11 2019-11-29 国微集团(深圳)有限公司 Verification method and device based on vocal print
CN110971763B (en) * 2019-12-10 2021-01-26 Oppo广东移动通信有限公司 Arrival reminding method and device, storage medium and electronic equipment
CN110956732A (en) * 2019-12-19 2020-04-03 重庆特斯联智慧科技股份有限公司 Safety entrance guard based on thing networking
CN111081260A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Method and system for identifying voiceprint of awakening word
CN111081256A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Digital string voiceprint password verification method and system
CN111597531A (en) * 2020-04-07 2020-08-28 北京捷通华声科技股份有限公司 Identity authentication method and device, electronic equipment and readable storage medium
CN111613230A (en) * 2020-06-24 2020-09-01 泰康保险集团股份有限公司 Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN112435673B (en) * 2020-12-15 2024-05-14 北京声智科技有限公司 Model training method and electronic terminal
CN112820299B (en) * 2020-12-29 2021-09-14 马上消费金融股份有限公司 Voiceprint recognition model training method and device and related equipment
CN113113022A (en) * 2021-04-15 2021-07-13 吉林大学 Method for automatically identifying identity based on voiceprint information of speaker
CN113570754B (en) * 2021-07-01 2022-04-29 汉王科技股份有限公司 Voiceprint lock control method and device and electronic equipment
WO2024077588A1 (en) * 2022-10-14 2024-04-18 Qualcomm Incorporated Voice-based user authentication
CN115641852A (en) * 2022-10-18 2023-01-24 中国电信股份有限公司 Voiceprint recognition method and device, electronic equipment and computer readable storage medium
CN115550075B (en) * 2022-12-01 2023-05-09 中网道科技集团股份有限公司 Anti-counterfeiting processing method and equipment for community correction object public welfare activity data
CN116530944B (en) * 2023-07-06 2023-10-20 荣耀终端有限公司 Sound processing method and electronic equipment
CN116978368B (en) * 2023-09-25 2023-12-15 腾讯科技(深圳)有限公司 Wake-up word detection method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254559A (en) * 2010-05-20 2011-11-23 盛乐信息技术(上海)有限公司 Identity authentication system and method based on vocal print
CN102314877A (en) * 2010-07-08 2012-01-11 盛乐信息技术(上海)有限公司 Voiceprint identification method for character content prompt
CN102737634A (en) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 Authentication method and device based on voice
CN104282303A (en) * 2013-07-09 2015-01-14 威盛电子股份有限公司 Method for conducting voice recognition by voiceprint recognition and electronic device thereof
CN104901808A (en) * 2015-04-14 2015-09-09 时代亿宝(北京)科技有限公司 Voiceprint authentication system and method based on time type dynamic password

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100406307B1 (en) * 2001-08-09 2003-11-19 삼성전자주식회사 Voice recognition method and system based on voice registration method and system
CN101997689B (en) * 2010-11-19 2012-08-08 吉林大学 USB (universal serial bus) identity authentication method based on voiceprint recognition and system thereof
CN102163427B (en) * 2010-12-20 2012-09-12 北京邮电大学 Method for detecting audio exceptional event based on environmental model
CN102238189B (en) * 2011-08-01 2013-12-11 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
CN103679452A (en) * 2013-06-20 2014-03-26 腾讯科技(深圳)有限公司 Payment authentication method, device thereof and system thereof
CN104064189A (en) * 2014-06-26 2014-09-24 厦门天聪智能软件有限公司 Vocal print dynamic password modeling and verification method
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN105096121B (en) * 2015-06-25 2017-07-25 百度在线网络技术(北京)有限公司 voiceprint authentication method and device
CN105656887A (en) * 2015-12-30 2016-06-08 百度在线网络技术(北京)有限公司 Artificial intelligence-based voiceprint authentication method and device
CN106098068B (en) * 2016-06-12 2019-07-16 腾讯科技(深圳)有限公司 Voiceprint recognition method and device

Also Published As

Publication number Publication date
WO2017215558A1 (en) 2017-12-21
CN106098068A (en) 2016-11-09

Similar Documents

Publication Publication Date Title
CN106098068B (en) Voiceprint recognition method and device
CN106057206B (en) Voiceprint model training method, voiceprint recognition method and device
CN107104803B (en) User identity authentication method based on digital password and voiceprint joint confirmation
WO2016150032A1 (en) Artificial intelligence based voiceprint login method and device
KR20190075914A (en) Single-versus-short speaker recognition using depth neural networks
CN111402862B (en) Speech recognition method, device, storage medium and equipment
Mansour et al. Voice recognition using dynamic time warping and mel-frequency cepstral coefficients algorithms
CN109243465A (en) Voiceprint authentication method, device, computer equipment and storage medium
CN104765996B (en) Voiceprint password authentication method and system
Saquib et al. A survey on automatic speaker recognition systems
CN112712809B (en) Voice detection method and device, electronic equipment and storage medium
CN106782603A (en) Intelligent sound evaluating method and system
KR101988165B1 (en) Method and system for improving the accuracy of speech recognition technology based on text data analysis for deaf students
CN107346568A (en) The authentication method and device of a kind of gate control system
Beigi Challenges of LargeScale Speaker Recognition
CN110111798A (en) A kind of method and terminal identifying speaker
Meyer et al. Anonymizing speech with generative adversarial networks to preserve speaker privacy
CN111613230A (en) Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN114220419A (en) Voice evaluation method, device, medium and equipment
KR20210071713A (en) Speech Skill Feedback System
CN110364180A (en) A kind of examination system and method based on audio-video processing
Nazir et al. A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering
Mandalapu et al. Multilingual voice impersonation dataset and evaluation
CN105976819A (en) Rnorm score normalization based speaker verification method
CN106128464B (en) UBM divides the method for building up of word model, vocal print feature generation method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230712

Address after: 35th Floor, Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen, Guangdong Province 518057

Patentee after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

Patentee after: TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd.

Address before: Room 403, East, Building 2, SEG Science and Technology Park, Zhenxing Road, Futian District, Shenzhen, Guangdong 518000

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.
