CN106057206A - Voiceprint model training method, voiceprint recognition method and device - Google Patents


Info

Publication number
CN106057206A
Authority
CN
China
Prior art keywords
character
target user
voiceprint
model
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610388231.3A
Other languages
Chinese (zh)
Other versions
CN106057206B (English)
Inventor
李为
钱柄桦
金星明
李科
吴富章
吴永坚
黄飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN201610388231.3A
Publication of CN106057206A
Application granted
Publication of CN106057206B
Legal status: Active
Anticipated expiration: not listed


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 - Training, enrolment or model building
    • G10L 17/16 - Hidden Markov models [HMM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a voiceprint model training method, a voiceprint recognition method and a voiceprint recognition apparatus, which belong to the field of speech recognition. The voiceprint recognition method comprises: acquiring a test speech signal produced by an unknown user reading aloud a second character string, the second character string comprising a plurality of characters arranged in sequence; extracting from the test speech signal the voiceprint feature sequence corresponding to the characters; constructing an HMM corresponding to the second character string from a target user's n GMMs, which correspond to n kinds of basic characters respectively; calculating a similarity score between the voiceprint feature sequence and the HMM; and identifying the unknown user as the target user when the similarity score exceeds a preset threshold. Because the target user's per-character GMMs capture the phoneme-level differences between the audio content of the different basic characters, and the HMM further captures the temporal correlation between the audio content of successive characters, recognition accuracy is significantly improved.

Description

Voiceprint model training method, voiceprint recognition method and device
Technical field
The embodiments of the present invention relate to the field of speech recognition, and in particular to a voiceprint model training method, a voiceprint recognition method and a corresponding apparatus.
Background
Voiceprint recognition is a technology that uses voiceprint feature information to verify the identity of an unknown user. It can be applied to scenarios that require identifying a user's identity, such as access control systems and payment systems. Current voiceprint recognition generally uses text-dependent recognition.
Voiceprint recognition generally includes two processes: enrollment of a target user and identification of an unknown user. In the enrollment process, the system provides an enrollment string for the target user to read aloud; this string usually consists of several digits and/or letters arranged in sequence. The system collects the enrollment speech signal produced while the target user reads the string aloud and trains a Gaussian mixture model (GMM) of the target user from the enrollment speech signal. In the identification process, the test speech signal produced while an unknown user reads an identification string aloud is matched against the target user's GMM, and the unknown user is identified as the target user when the similarity exceeds a preset threshold.
In implementing the embodiments of the present invention, the inventors found that the prior art has at least the following problem: the audio content corresponding to the individual basic characters in the enrollment speech signal is correlated, and the enrollment speech signal therefore contains rich information characterizing the speaker; however, the target user's GMM is a text-independent model and cannot exploit this rich information in the enrollment speech signal.
Summary
In view of this, embodiments of the present invention provide a voiceprint model training method, a voiceprint recognition method and corresponding apparatus. The technical solutions are as follows:
According to a first aspect, a voiceprint model training method is provided, the method comprising:
collecting an enrollment speech signal produced by a target user reading aloud a first character string, the first character string comprising m characters arranged in sequence, the m characters comprising n kinds of mutually different basic characters, m and n being positive integers with m ≥ n;
extracting from the enrollment speech signal the voiceprint features corresponding to each character;
training a preset universal background model using the voiceprint features of the target user corresponding to each character as first sample data, to obtain a Gaussian mixture model of the target user;
training the target user's Gaussian mixture model using the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, to obtain the target user's Gaussian mixture model corresponding to the i-th kind of basic character; and
storing the target user's n Gaussian mixture models corresponding to the n kinds of basic characters, the n Gaussian mixture models being used to construct a hidden Markov model corresponding to a second character string.
According to a second aspect, a voiceprint recognition method is provided, the method comprising:
acquiring a test speech signal produced by an unknown user reading aloud a second character string, the second character string comprising k characters arranged in sequence, the k characters comprising all or some of the n kinds of mutually different basic characters, k and n being positive integers;
extracting from the test speech signal the voiceprint feature sequence corresponding to each character;
constructing, from the target user's n Gaussian mixture models corresponding to the n kinds of basic characters, the hidden Markov model corresponding to the second character string;
calculating a similarity score between the voiceprint feature sequence and the hidden Markov model; and
identifying the unknown user as the target user when the similarity score exceeds a preset threshold.
According to a third aspect, a voiceprint model training apparatus is provided, the apparatus comprising:
an acquisition module, configured to collect an enrollment speech signal produced by a target user reading aloud a first character string, the first character string comprising m characters arranged in sequence, the m characters comprising n kinds of mutually different basic characters, m and n being positive integers with m ≥ n;
an extraction module, configured to extract from the enrollment speech signal the voiceprint features corresponding to each character;
a first training module, configured to train a preset universal background model using the voiceprint features of the target user corresponding to each character as first sample data, to obtain a Gaussian mixture model of the target user;
a second training module, configured to train the target user's Gaussian mixture model using the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, to obtain the target user's Gaussian mixture model corresponding to the i-th kind of basic character; and
a storage module, configured to store the target user's n Gaussian mixture models corresponding to the n kinds of basic characters, the n Gaussian mixture models being used to construct a hidden Markov model corresponding to a second character string.
According to a fourth aspect, a voiceprint recognition apparatus is provided, the apparatus comprising:
an acquisition module, configured to acquire a test speech signal produced by an unknown user reading aloud a second character string, the second character string comprising k characters arranged in sequence, the k characters comprising all or some of the n kinds of mutually different basic characters, k and n being positive integers;
an extraction module, configured to extract from the test speech signal the voiceprint feature sequence corresponding to each character;
a construction module, configured to construct, from the target user's n Gaussian mixture models corresponding to the n kinds of basic characters, the hidden Markov model corresponding to the second character string;
a calculation module, configured to calculate a similarity score between the voiceprint feature sequence and the hidden Markov model; and
an identification module, configured to identify the unknown user as the target user when the similarity score exceeds a preset threshold.
The voiceprint model training method provided by the embodiments of the present invention has the following beneficial effects:
a GMM of the target user is obtained by training a UBM with the voiceprint features corresponding to each of the target user's characters, and the target user's GMM is then trained into n GMMs corresponding to the n kinds of basic characters, the n GMMs being used to construct the HMM corresponding to the second character string. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the enrollment speech signal. For each target user, a GMM is trained for each basic character, so that the phoneme-level differences between the audio content of the different basic characters are captured across the GMMs; in addition, these GMMs can be used to construct the HMM corresponding to the identification string, and the HMM further captures the temporal correlation of the audio content of successive characters, which greatly improves the recognition accuracy of the target user's voiceprint model at the identification stage.
The voiceprint recognition method provided by the embodiments of the present invention has the following beneficial effects:
the voiceprint feature sequence of the test speech signal is scored against the HMM constructed from the GMMs corresponding to the basic characters, and the unknown user's identity is recognized accordingly. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the enrollment speech signal. For each target user, the per-character GMMs capture the phoneme-level differences between the audio content of the different basic characters, and the HMM further captures the temporal correlation of the audio content of successive characters, which greatly improves the recognition accuracy of the target user's voiceprint model at the identification stage.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the random-string-based voiceprint recognition method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of the voiceprint model training method provided by one embodiment of the present invention;
Fig. 3 is a schematic diagram of the voiceprint model training method shown in Fig. 2;
Fig. 4 is a flowchart of the voiceprint model training method provided by another embodiment of the present invention;
Fig. 5 is a schematic diagram of the speech annotation process involved in the voiceprint model training method shown in Fig. 4;
Fig. 6 is a schematic diagram of the model training process involved in the voiceprint model training method shown in Fig. 4;
Fig. 7 is a flowchart of the voiceprint recognition method provided by one embodiment of the present invention;
Fig. 8 is a flowchart of the voiceprint recognition method provided by another embodiment of the present invention;
Fig. 9 is a schematic diagram of the HMM constructed by the voiceprint recognition method shown in Fig. 8;
Fig. 10 is a block diagram of the voiceprint model training apparatus provided by one embodiment of the present invention;
Fig. 11 is a block diagram of the voiceprint recognition apparatus provided by another embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The embodiments of the present invention provide a voiceprint recognition method and apparatus based on a random character string, which can be applied to any scenario that requires identifying the identity of an unknown user. The basic characters used to generate the random string may be Arabic digits, English letters or characters of other languages; each basic character is typically a single digit or a single letter, although several digits or several letters taken together may also serve as one basic character. For simplicity of description, the embodiments of the present invention are illustrated with Arabic digits as the basic characters.
The voiceprint recognition method based on a random character string is divided into two stages, as shown in Fig. 1:
First, the enrollment stage 12 of the target user.
In the enrollment stage, the voiceprint recognition apparatus randomly generates an enrollment string and displays this digit string on the interface. The target user reads the enrollment string aloud, the voiceprint recognition apparatus collects the enrollment speech signal produced while the target user reads it, and then performs voiceprint feature extraction and voiceprint model training on the enrollment speech signal to obtain the target user's voiceprint model. The voiceprint model of each target user contains several GMMs (Gaussian mixture models), each corresponding to one digit.
For example, if the enrollment string is the digit string 0185851, which contains the four digits "0", "1", "5" and "8", the voiceprint model of the target user contains a GMM corresponding to digit "0", a GMM corresponding to digit "1", a GMM corresponding to digit "5" and a GMM corresponding to digit "8".
Second, the identification stage 14 of the unknown user.
In the identification stage, the voiceprint recognition apparatus randomly generates an identification string from the digit set "0", "1", "5" and "8" and displays it on the interface. The unknown user reads the identification string aloud, the voiceprint recognition apparatus collects the test speech signal produced while the unknown user reads it, performs voiceprint feature extraction on the test speech signal, constructs the HMM (hidden Markov model) corresponding to the digit string from the voiceprint model of each target user, calculates the similarity between the unknown user's voiceprint features and each HMM, and takes the target user corresponding to the HMM with the highest similarity, provided that similarity exceeds a threshold, as the identification result for the unknown user.
For example, if the randomly generated identification string is the digit string 85851510, the voiceprint recognition apparatus constructs, for each target user, the HMM corresponding to the identification string "85851510" from that user's GMMs for digits "0", "1", "5" and "8", and calculates the similarity between the unknown user's voiceprint features and each target user's HMM. If the highest similarity exceeds the threshold and corresponds to the HMM of target user B, target user B is taken as the identification result for the unknown user.
The two processes above are described below in separate embodiments.
Fig. 2 shows a flowchart of the voiceprint model training method provided by one embodiment of the present invention. The voiceprint model training method can be applied in a voiceprint recognition system and includes the following steps:
Step 201: collect an enrollment speech signal produced by a target user reading aloud a first character string, the first character string comprising m characters arranged in sequence, the m characters comprising n kinds of mutually different basic characters.
The first character string is the character string used in the target user's enrollment stage. Optionally, the first character string is randomly generated. m and n are positive integers with m ≥ n.
For example, the first character string "12358948" has 8 characters in total and includes 7 kinds of mutually different basic characters: "1", "2", "3", "4", "5", "8" and "9".
Step 202: extract from the enrollment speech signal the voiceprint features corresponding to each character.
For example, the speech segments corresponding to characters "1", "2", "3", "4", "5", "8" and "9" are extracted from the original speech signal.
Then, the voiceprint features corresponding to each character are extracted from the speech segment of that character.
Step 203: using the voiceprint features of the target user corresponding to each character as first sample data, train a preset UBM to obtain the GMM of the target user.
The UBM (universal background model) is a pre-built universal model trained on all of the digits. The UBM is both speaker-independent and text-independent. Speaker-independent means the UBM does not distinguish between users and does not correspond to any particular user; text-independent means the UBM does not distinguish between digits (characters) and does not correspond to any particular digit, as shown by UBM 32 in Fig. 3.
Optionally, a maximum a posteriori (MAP) algorithm is used to adjust the parameters of the UBM according to the target user's voiceprint features, so that the GMM of the target user is obtained by adaptation.
The target user's GMM is speaker-dependent and text-independent. Speaker-dependent means the GMM corresponds to a specific target user; text-independent means the GMM does not distinguish between digits (basic characters) and does not correspond to any particular digit, as shown by the target user's GMM 34 in Fig. 3.
Step 204: using the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, train the target user's GMM to obtain the target user's GMM corresponding to the i-th kind of basic character.
Optionally, the maximum a posteriori (MAP) algorithm is used to adjust the parameters of the target user's GMM according to the target user's voiceprint features corresponding to the i-th kind of character, so that the target user's GMM corresponding to the i-th kind of basic character is obtained by adaptation. The GMM corresponding to the i-th kind of basic character is both speaker-dependent and text-dependent: speaker-dependent means it corresponds to a specific target user; text-dependent means it corresponds to a specific digit, as shown by the per-character GMMs 36 in Fig. 3.
For example, the parameters of target user A's GMM are adjusted according to target user A's voiceprint features corresponding to digit "8", thereby obtaining target user A's GMM corresponding to digit "8".
Step 204 is executed repeatedly to obtain the target user's n GMMs, one for each kind of basic character.
Step 205: store the target user's n GMMs corresponding to the n kinds of basic characters; the n GMMs are used to construct the HMM corresponding to the second character string.
The target user's n GMMs are stored in a model library, so that in the subsequent identification stage of an unknown user, the HMM corresponding to the second character string can be constructed from the target user's n GMMs.
In summary, in the voiceprint model training method provided by this embodiment, the GMM of the target user is obtained by training a UBM with the voiceprint features corresponding to each of the target user's characters, and the target user's GMM is then trained into n GMMs corresponding to the n kinds of basic characters; the n GMMs are used to construct the HMM corresponding to the second character string. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the enrollment speech signal. For each target user, a GMM is trained for each basic character, so that the phoneme-level differences between the audio content of the different basic characters are captured across the GMMs; these GMMs can also be used to construct the HMM corresponding to the identification string, and the HMM further captures the temporal correlation of the audio content of successive characters, which greatly improves the recognition accuracy of the target user's voiceprint model at the identification stage.
Fig. 4 shows a flowchart of the voiceprint model training method provided by another embodiment of the present invention. The voiceprint model training method can be applied in a voiceprint recognition system and includes the following steps:
Step 401: randomly generate the first character string and display it.
Optionally, a basic character set is stored in the voiceprint recognition system. Taking digits as the basic characters, the basic character set includes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Optionally, the voiceprint recognition system randomly generates the first character string from the basic characters in the basic character set. The first character string comprises m characters arranged in sequence, the m characters comprising n kinds of mutually different basic characters, m and n being positive integers with m ≥ n. That is, a basic character may appear more than once at different positions in the first character string. Optionally, to improve model coverage, the first character string may include all of the basic characters in the basic character set.
For example, the first character string is 1981753651240; as another example, the first character string is 01580518.
The voiceprint recognition system displays the first character string on a display screen for the target user to be enrolled to read aloud. Optionally, the voiceprint recognition system also displays auxiliary information on the display screen, illustratively: "After the prompt tone, please read aloud the following digit string: 01580518".
Optionally, instead of being randomly generated, the first character string may also be a preset fixed character string.
Step 402: collect the enrollment speech signal produced by the target user reading aloud the first character string.
The voiceprint recognition system collects, through a microphone, the enrollment speech signal produced while the target user reads the first character string aloud.
Step 403: identify the valid speech segments and the invalid speech segments in the enrollment speech signal.
Because the target user pauses between adjacent characters when reading the characters aloud, the enrollment speech signal contains both valid speech segments and invalid speech segments. An invalid speech segment may be a completely silent segment, i.e. a silence segment, or a segment containing noise, i.e. a noise segment.
The voiceprint recognition system needs to identify the valid speech segments and the invalid speech segments in the enrollment speech signal. Fig. 5 schematically illustrates this process: the voiceprint recognition system annotates the enrollment speech signal 50 through a speech recognition engine, and the regions between two adjacent valid speech segments (the segments with waveforms in the figure) are invalid speech segments that do not take part in subsequent computation.
Optionally, after the enrollment speech signal is annotated, the corresponding annotation information is stored as (start time, end time, basic character) tuples, as shown in Table 1:
Table 1
Start time (s)    End time (s)    Basic character
1.86              2.36            0
3.07              3.60            1
...               ...             ...
10.11             10.55           8
Here, 1.86 is the start time of the first basic character "0" in the enrollment speech signal and 2.36 is its end time; 3.07 is the start time of the second basic character "1" in the enrollment speech signal and 3.60 is its end time; 10.11 is the start time of the last basic character "8" in the enrollment speech signal and 10.55 is its end time.
Step 404: extract the j-th valid speech segment in the enrollment speech signal as the speech segment corresponding to the j-th character in the first character string.
The voiceprint recognition system extracts the first valid speech segment in the enrollment speech signal as the speech segment corresponding to the first character in the first character string, extracts the second valid speech segment as the speech segment corresponding to the second character, and so on, until the last valid speech segment is extracted as the speech segment corresponding to the last character in the first character string.
For example, with reference to Fig. 5, the speech segment from 1.86 s to 2.36 s in the enrollment speech signal is extracted as the speech segment corresponding to the first character "0".
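The segmentation of steps 403 and 404 can be sketched as follows. This is a minimal illustration rather than code from the patent: the 16 kHz sample rate, the random stand-in signal and the annotation list are assumptions based on the example of Table 1.

```python
import numpy as np

def split_by_annotation(signal, sample_rate, annotations):
    """annotations: (start_sec, end_sec, char) tuples from the speech recognition
    engine; returns a mapping from basic character to its speech segments."""
    segments = {}
    for start_sec, end_sec, char in annotations:
        start = int(start_sec * sample_rate)
        end = int(end_sec * sample_rate)
        segments.setdefault(char, []).append(signal[start:end])
    return segments

sr = 16000
dummy_signal = np.random.randn(11 * sr)            # stand-in for the enrollment recording
annotations = [(1.86, 2.36, "0"), (3.07, 3.60, "1"), (10.11, 10.55, "8")]
per_char_segments = split_by_annotation(dummy_signal, sr, annotations)
```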
Step 405: extract the voiceprint features of the speech segment corresponding to the j-th character.
Each speech segment corresponds to a sequence of short-time speech frames. The voiceprint recognition system extracts the MFCC (Mel-frequency cepstral coefficients) or PLP (perceptual linear prediction coefficients) of the speech segment corresponding to the j-th character as the voiceprint features of that speech segment.
It should be noted that j is a positive integer greater than or equal to 1 and less than or equal to m. Characters at different positions may be the same basic character; for example, in the first character string "01580518", both the first character and the fifth character are the basic character "0", in which case two sets of voiceprint features corresponding to the basic character "0" are extracted.
If the first character string includes n kinds of basic characters, n sets of voiceprint features, one per kind of basic character, are obtained.
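As a hedged sketch of step 405, the following extracts MFCCs from one per-character segment. The patent only names MFCC or PLP as the voiceprint features; the use of librosa and the 13-coefficient setting are assumptions made for illustration.

```python
import numpy as np
import librosa

def extract_mfcc(segment, sample_rate, n_mfcc=13):
    # librosa returns an (n_mfcc, n_frames) matrix; transpose it so that each row
    # is the voiceprint feature vector of one short-time speech frame.
    return librosa.feature.mfcc(y=segment.astype(np.float32), sr=sample_rate, n_mfcc=n_mfcc).T

sr = 16000
segment = np.random.randn(int(0.5 * sr)).astype(np.float32)   # stand-in for the segment of character "0"
features = extract_mfcc(segment, sr)                           # shape: (n_frames, 13)
```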
Step 406: using the voiceprint features of the target user corresponding to each basic character as first sample data, adjust the parameters of the preset UBM with the maximum a posteriori algorithm to obtain the GMM of the target user.
The UBM is a pre-built universal background model trained on all of the digits; it is both speaker-independent and text-independent. Illustratively, the UBM is trained on speech samples from more than 1000 speakers with a total duration of more than 20 hours, without distinguishing between digits.
The mathematical expression of the UBM is:
P(x) = Σ_{i=1}^{C} ω_i N(x | μ_i, Σ_i)
where P(x) is the probability distribution of the UBM, C is the total number of Gaussian components in the UBM, ω_i is the weight of the i-th Gaussian component, μ_i is the mean of the i-th Gaussian component, Σ_i is its covariance, N(·) denotes the Gaussian distribution, and x is the input sample, i.e. a voiceprint feature.
In this step, the feature differences between basic characters are not considered: all of the voiceprint features corresponding to all of the target user's basic characters are used as the input first sample data to train the UBM. During training, the parameters of the UBM are adjusted by the maximum a posteriori algorithm, thereby obtaining the GMM of the target user.
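A minimal sketch of how such a UBM could be estimated with scikit-learn is shown below. The patent does not name a toolkit or a component count; the 64 diagonal-covariance components and the random feature matrix are illustrative assumptions (a real UBM would be trained on the pooled features of the more-than-1000-speaker, more-than-20-hour corpus mentioned above).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

feature_dim = 13
pooled_features = np.random.randn(20000, feature_dim)   # stand-in for pooled MFCCs of many speakers

ubm = GaussianMixture(n_components=64, covariance_type="diag", max_iter=50, random_state=0)
ubm.fit(pooled_features)
# ubm.weights_, ubm.means_ and ubm.covariances_ play the roles of
# omega_i, mu_i and Sigma_i in the UBM expression P(x) above.
```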
Step 407: using the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, adjust the parameters of the target user's GMM with the maximum a posteriori algorithm to obtain the target user's GMM corresponding to the i-th kind of basic character.
In this step, the feature differences between basic characters must be considered: only the voiceprint features corresponding to the i-th kind of basic character are used as the input second sample data, and the target user's GMM is trained a second time. During this training, the parameters of the target user's GMM are adjusted by the maximum a posteriori algorithm, yielding the target user's GMM corresponding to the i-th kind of basic character.
For example, using the target user's voiceprint features corresponding to digit "0" as the input samples, the target user's GMM is trained a second time to obtain the target user's GMM corresponding to digit "0".
When voiceprint features exist for n kinds of basic characters, the voiceprint recognition system checks, after step 407 is executed, whether i equals n; if i is less than n, i is set to i + 1 and step 407 is executed again.
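The MAP adaptation used in steps 406 and 407 can be sketched roughly as follows. Only the Gaussian means are adapted here, which is a common simplification rather than the patent's exact procedure, and the relevance factor of 16, the component count and the random data are assumed values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(prior_gmm, features, relevance=16.0):
    """Return MAP-adapted component means of prior_gmm given per-character features."""
    resp = prior_gmm.predict_proba(features)        # (n_frames, n_components) occupancies
    n_i = resp.sum(axis=0)                          # soft frame counts per component
    f_i = resp.T @ features                         # first-order statistics, (n_components, dim)
    e_i = f_i / np.maximum(n_i, 1e-10)[:, None]     # data-driven mean estimates
    alpha = (n_i / (n_i + relevance))[:, None]      # per-component adaptation coefficient
    return alpha * e_i + (1.0 - alpha) * prior_gmm.means_

dim = 13
user_gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(np.random.randn(2000, dim))
char_features = {"0": np.random.randn(60, dim), "1": np.random.randn(55, dim)}
adapted_means = {c: map_adapt_means(user_gmm, feats) for c, feats in char_features.items()}
```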
For each target user, training finally yields n GMMs, one per kind of basic character, with a one-to-one correspondence between basic characters and GMMs.
Illustratively, with reference to Fig. 6, the first character string is 01580518, and the voiceprint model finally trained for the target user includes GMMs corresponding to the four basic characters: the GMM corresponding to ID_0, the GMM corresponding to ID_1, the GMM corresponding to ID_5 and the GMM corresponding to ID_8.
Step 408: store the target user's n GMMs corresponding to the n kinds of basic characters; the n GMMs are used to construct the HMM corresponding to the second character string.
The voiceprint recognition system stores the target user's n GMMs, one per kind of basic character.
The second character string is the character string used in the identification process. Optionally, the second character string is randomly generated from all or some of the n kinds of basic characters. Each kind of basic character may appear at different positions in the second character string, and a basic character may appear more than once at different positions.
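How the per-character GMMs might be persisted for step 408 is sketched below. joblib and the file layout shown are assumptions for illustration, not part of the patent.

```python
import os
import joblib

def store_user_models(user_id, char_gmms, model_dir="model_library"):
    """char_gmms: dict mapping a basic character, e.g. '0'..'9', to its trained GMM."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, f"{user_id}_gmms.joblib")
    joblib.dump(char_gmms, path)
    return path

def load_user_models(user_id, model_dir="model_library"):
    return joblib.load(os.path.join(model_dir, f"{user_id}_gmms.joblib"))

# store_user_models("target_user_A", {"0": gmm_0, "1": gmm_1, "5": gmm_5, "8": gmm_8})
```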
In summary, in the voiceprint model training method provided by this embodiment, the GMM of the target user is obtained by training a UBM with the voiceprint features corresponding to each of the target user's characters, and the target user's GMM is then trained into n GMMs corresponding to the n kinds of basic characters; the n GMMs are used to construct the HMM corresponding to the second character string. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the enrollment speech signal. For each target user, a GMM is trained for each basic character, so that the phoneme-level differences between the audio content of the different basic characters are captured across the GMMs; these GMMs can also be used to construct the HMM corresponding to the identification string, and the HMM further captures the temporal correlation of the audio content of successive characters, which greatly improves the recognition accuracy of the target user's voiceprint model at the identification stage.
Fig. 7 shows a flowchart of the voiceprint recognition method provided by one embodiment of the present invention. The voiceprint recognition method can be applied in a voiceprint recognition system, which may be the same device as the voiceprint recognition system mentioned in connection with Fig. 2 or Fig. 4, or a different device. The voiceprint recognition method includes the following steps:
Step 701: acquire a test speech signal produced by an unknown user reading aloud a second character string.
Optionally, the second character string comprises k characters arranged in sequence, the k characters comprising all or some of the n kinds of mutually different basic characters, k and n being positive integers.
Optionally, the n kinds of mutually different basic characters are the n kinds of basic characters used in the target user's enrollment process.
Optionally, the second character string is randomly generated or fixed, and may be the same as or different from the first character string. For example, the second character string is the digit string "851185".
Step 702: extract from the test speech signal the voiceprint feature sequence corresponding to each character.
Step 703: construct, from the target user's n GMMs corresponding to the n kinds of basic characters, the HMM corresponding to the second character string.
For example, the target user's n GMMs include GMMs corresponding to four basic characters: the GMM corresponding to ID_0, the GMM corresponding to ID_1, the GMM corresponding to ID_5 and the GMM corresponding to ID_8.
Since the second character string contains only the basic characters "1", "5" and "8", the HMM corresponding to the second character string "851185" is constructed from the GMM corresponding to ID_1, the GMM corresponding to ID_5 and the GMM corresponding to ID_8.
Step 704: calculate the similarity score between the test speech signal and the HMM.
Step 705: when the similarity score exceeds the preset threshold, identify the unknown user as the target user.
In summary, in the voiceprint recognition method provided by this embodiment, the voiceprint feature sequence of the test speech signal is scored against the HMM constructed from the GMMs corresponding to the basic characters, and the unknown user's identity is recognized accordingly. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the enrollment speech signal. For each target user, the per-character GMMs capture the phoneme-level differences between the audio content of the different basic characters, and the HMM further captures the temporal correlation of the audio content of successive characters, which greatly improves the recognition accuracy of the target user's voiceprint model at the identification stage.
Fig. 8 shows a flowchart of the voiceprint recognition method provided by another embodiment of the present invention. The voiceprint recognition method can be applied in a voiceprint recognition system, which may be the same device as the voiceprint recognition system mentioned in connection with Fig. 2 or Fig. 4, or a different device. The voiceprint recognition method includes the following steps:
Step 801: based on the n kinds of basic characters, randomly generate the second character string and display it.
Optionally, a basic character set is stored in the voiceprint recognition system. Taking digits as the basic characters, the basic character set may include: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Optionally, the voiceprint recognition system randomly generates the second character string from the basic characters in the basic character set. The second character string comprises k characters arranged in sequence, the k characters comprising all or some of the n kinds of mutually different basic characters, k and n being positive integers, usually with k ≥ n. That is, a basic character may appear more than once at different positions in the second character string. For example, the second character string is 851185.
Optionally, the n kinds of mutually different basic characters are the n kinds of basic characters used in the target user's enrollment process.
The voiceprint recognition system displays the second character string on a display screen for the unknown user to read aloud. Optionally, the voiceprint recognition system also displays auxiliary information on the display screen, illustratively: "After the prompt tone, please read aloud the following digit string: 851185".
Optionally, instead of being randomly generated, the second character string may also be a preset fixed character string.
Step 802: extract from the test speech signal the voiceprint feature sequence corresponding to each character.
Because the unknown user pauses between adjacent characters when reading the characters aloud, the test speech signal contains both valid speech segments and invalid speech segments. An invalid speech segment may be a silence segment or a noise segment.
The voiceprint recognition system identifies the valid speech segments and the invalid speech segments in the test speech signal and annotates the valid speech segments; this process is similar to the description of step 403.
The voiceprint recognition system extracts the j-th valid speech segment in the test speech signal as the speech segment corresponding to the j-th character in the second character string, and extracts the voiceprint features of the speech segment corresponding to the j-th character.
Each speech segment corresponds to a sequence of short-time speech frames. The voiceprint recognition system extracts the MFCC or PLP of the speech segment corresponding to the j-th character as the voiceprint features of that speech segment. Because the test speech signal includes k characters, the voiceprint recognition system extracts k groups of voiceprint features arranged in sequence, each group containing MFCCs or PLPs for a different number of speech frames; after all the voiceprint features are sorted by timestamp, they form the voiceprint feature sequence of the test speech signal.
For example, for the 1st character "8", one group of voiceprint features with a duration of 1000 ms is extracted; if each speech frame is about 20 ms long, this group contains about 50 voiceprint features. For the 2nd character "5", one group of voiceprint features with a duration of 1020 ms is extracted; if each speech frame is about 20 ms long, this group contains about 51 voiceprint features; and so on for the remaining characters.
In other words, the first 50 consecutive voiceprint features in the sequence all correspond to the 1st character "8", the next 51 voiceprint features all correspond to the 2nd character "5", and so on for the remaining characters.
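A minimal, illustrative sketch of assembling that sequence is shown below; the frame counts mirror the example above and the random matrices are placeholders.

```python
import numpy as np

def build_feature_sequence(per_char_frames):
    """per_char_frames: list of (character, frame_matrix) pairs in prompt order,
    e.g. [("8", mfcc_8), ("5", mfcc_5), ...]; returns a (T, dim) feature sequence."""
    return np.vstack([frames for _, frames in per_char_frames])

sequence = build_feature_sequence([("8", np.random.randn(50, 13)),
                                   ("5", np.random.randn(51, 13))])
```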
Step 803: obtain the x-th character of the second character string, and from the target user's n GMMs corresponding to the n kinds of basic characters, determine the GMM corresponding to the x-th character as the x-th state model of the HMM.
Taking the second character string "851185" as an example:
obtain the 1st character "8" of the second character string, and from the target user's n GMMs corresponding to the n kinds of basic characters, determine the GMM corresponding to the 1st character "8" as the 1st state model of the hidden Markov model;
obtain the 2nd character "5" of the second character string, and determine the GMM corresponding to the 2nd character "5" as the 2nd state model of the HMM;
obtain the 3rd character "1" of the second character string, and determine the GMM corresponding to the 3rd character "1" as the 3rd state model of the HMM;
obtain the 4th character "1" of the second character string, and determine the GMM corresponding to the 4th character "1" as the 4th state model of the HMM;
obtain the 5th character "8" of the second character string, and determine the GMM corresponding to the 5th character "8" as the 5th state model of the HMM;
obtain the 6th character "5" of the second character string, and determine the GMM corresponding to the 6th character "5" as the 6th state model of the HMM.
Because the second character string includes k characters, step 803 is executed k times.
Step 804: set the self-loop probability and the jump probability of each state model to preset values, to build the HMM corresponding to the second character string.
Each state model of the first-order HMM has a state probability distribution, a self-loop probability and a jump probability. For the voiceprint feature corresponding to time t in the voiceprint feature sequence, the state probability distribution of the x-th state model represents the probability that this voiceprint feature matches the basic character corresponding to the x-th state model; the self-loop probability represents the probability of remaining in the x-th state model when the observation moves from the voiceprint feature of time t to the voiceprint feature of time t+1; the jump probability represents the probability of jumping from the x-th state model to the (x+1)-th state model when the observation moves from the voiceprint feature of time t to the voiceprint feature of time t+1.
Optionally, the self-loop probability and the jump probability of each state model are both set to 0.5.
The HMM generated by this step is schematically shown in Fig. 9.
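A minimal sketch of steps 803 and 804 under an assumed representation: the HMM for the prompt "851185" is simply the sequence of per-character GMMs, one state per character, plus a left-to-right transition matrix with self-loop and jump probabilities of 0.5. Giving the last state a self-loop of 1.0 is an added assumption so that trailing frames remain in the final state.

```python
import numpy as np

def build_hmm(prompt, char_gmms):
    """prompt: e.g. "851185"; char_gmms: dict mapping a basic character to its GMM."""
    states = [char_gmms[c] for c in prompt]       # the x-th state model is the GMM of the x-th character
    k = len(states)
    trans = np.zeros((k, k))
    for x in range(k - 1):
        trans[x, x] = 0.5                         # self-loop probability
        trans[x, x + 1] = 0.5                     # jump probability to the next state
    trans[k - 1, k - 1] = 1.0                     # assumed absorbing final state
    return states, trans

# states, trans = build_hmm("851185", {"1": gmm_1, "5": gmm_5, "8": gmm_8})
```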
Step 805: input the voiceprint feature sequence into the HMM, calculate the maximum likelihood probability with the Viterbi algorithm, and determine the maximum likelihood probability as the similarity score.
In the voiceprint feature sequence, each character corresponds to multiple consecutive voiceprint features arranged in time order, so the number of voiceprint features in the sequence is larger than the number of GMMs in the HMM; accordingly, each state model in the HMM may correspond to multiple consecutive voiceprint features. After the voiceprint feature sequence is input into the HMM, multiple probabilities can be computed for the sequence along different transition paths between the GMMs. The Viterbi algorithm computes the maximum likelihood probability of the voiceprint feature sequence given the HMM, and the voiceprint recognition algorithm determines this maximum likelihood probability as the similarity score between the voiceprint feature sequence and the HMM.
Optionally, the similarity score is expressed as a logarithm (log).
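A hedged numpy sketch of the Viterbi scoring of step 805 over the left-to-right HMM built above follows. Requiring the path to start in the first state and end in the last state is an assumption consistent with a prompt HMM, not a detail stated in the patent.

```python
import numpy as np

def viterbi_score(features, states, trans):
    """features: (T, dim) voiceprint feature sequence; states: list of GMMs
    exposing score_samples(); trans: (K, K) transition matrix."""
    T, K = len(features), len(states)
    log_b = np.stack([g.score_samples(features) for g in states], axis=1)   # (T, K) emission log-probs
    with np.errstate(divide="ignore"):
        log_a = np.log(trans)
    delta = np.full(K, -np.inf)
    delta[0] = log_b[0, 0]                        # the path starts in the first state
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_a, axis=0) + log_b[t]
    return delta[-1]                              # best path that ends in the last state

# score = viterbi_score(sequence, states, trans)   # log-domain similarity score
```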
It should be noted that an HMM corresponding to the second character string can be built from the n GMMs of every target user. Therefore, when there are Z target users, there are Z HMMs corresponding to the second character string, and step 805 is executed Z times accordingly. In some scenarios, however, it is only necessary to confirm whether the unknown user is one specific target user, in which case step 805 only needs to be executed once.
Step 806: when the similarity score exceeds the preset threshold, identify the unknown user as the target user.
After the voiceprint feature sequence of the test speech signal is input into the HMM of each target user, multiple similarity scores are obtained. Each similarity score is compared with the preset threshold; if a similarity score exceeds the preset threshold, the voiceprint recognition system identifies the unknown user as the corresponding target user.
Otherwise, if the similarity score is below the preset threshold, the voiceprint recognition system determines that the unknown user does not match the target user; the voiceprint recognition system may allow the unknown user to retry the test, or refuse to let the unknown user perform subsequent operations.
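A small illustrative sketch of that decision is given below; the scores and the threshold value are placeholders, not figures from the patent.

```python
def identify(scores_by_user, threshold):
    """scores_by_user: {user_id: log-domain similarity score}; returns the
    best-scoring target user above the threshold, or None (retry or reject)."""
    best_user = max(scores_by_user, key=scores_by_user.get)
    return best_user if scores_by_user[best_user] > threshold else None

# identify({"target_user_A": -5200.3, "target_user_B": -4100.7}, threshold=-4500.0)  # -> "target_user_B"
```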
In summary, in the voiceprint recognition method provided by this embodiment, the voiceprint feature sequence of the test speech signal is scored against the HMM constructed from the GMMs corresponding to the basic characters, and the unknown user's identity is recognized accordingly. This solves the problem that the target user's GMM is a text-independent model and cannot exploit the rich information in the enrollment speech signal. For each target user, the per-character GMMs capture the phoneme-level differences between the audio content of the different basic characters, and the HMM further captures the temporal correlation of the audio content of successive characters, which greatly improves the recognition accuracy of the target user's voiceprint model at the identification stage.
It should be noted that the voiceprint recognition system may be implemented by a single terminal, or by a terminal and a server in combination. When implemented by a terminal and a server in combination, the speech collection stage and the voiceprint feature extraction stage may be performed by the terminal, while the voiceprint model training process and/or the voiceprint recognition process may be performed by the server.
In some possible embodiments, the voiceprint model training process is performed by a first voiceprint recognition system, which saves the trained n GMMs of the target user into a shared model library; the voiceprint recognition process is performed by a second voiceprint recognition system, which obtains the target user's n GMMs from the shared model library and uses them to generate the second character string and to construct the HMM corresponding to the second character string.
In a specific example, with a training sample of 1000 speakers and 290,000 tests (about 10,000 identity-matched tests and about 280,000 non-matched tests), a recall of 68.88% is achieved at a one-in-a-thousand false acceptance rate, and the equal error rate (EER) is 4.52%; compared with the traditional text-independent modeling method, the performance is improved by more than 30%.
Fig. 10 shows a block diagram of the voiceprint model training apparatus provided by one embodiment of the present invention. The voiceprint model training apparatus can be implemented, by dedicated hardware circuits or by a combination of software and hardware, as all or part of a voiceprint recognition system. The apparatus includes:
an acquisition module 1010, configured to collect an enrollment speech signal produced by a target user reading aloud a first character string, the first character string comprising m characters arranged in sequence, the m characters comprising n kinds of mutually different basic characters, m and n being positive integers with m ≥ n;
an extraction module 1020, configured to extract from the enrollment speech signal the voiceprint features corresponding to each character;
a first training module 1030, configured to train a preset universal background model using the voiceprint features of the target user corresponding to each character as first sample data, to obtain a Gaussian mixture model of the target user;
a second training module 1040, configured to train the target user's Gaussian mixture model using the voiceprint features of the target user corresponding to the i-th kind of basic character as second sample data, to obtain the target user's Gaussian mixture model corresponding to the i-th kind of basic character;
a storage module 1050, configured to store the target user's n Gaussian mixture models corresponding to the n kinds of basic characters, the n Gaussian mixture models being used to construct a hidden Markov model corresponding to a second character string.
In an optional embodiment, the apparatus further includes:
a display module 1060, configured to randomly generate the first character string and display it.
In an optional embodiment, the extraction module 1020 includes:
a recognition unit, configured to identify the valid speech segments and the invalid speech segments in the enrollment speech signal, the invalid speech segments including silence segments and/or noise segments;
a segment extraction unit, configured to extract the j-th valid speech segment in the enrollment speech signal as the speech segment corresponding to the j-th character in the first character string;
a feature extraction unit, configured to extract the voiceprint features of the speech segment corresponding to the j-th character.
In an optional embodiment, the feature extraction unit is configured to extract the Mel-frequency cepstral coefficients (MFCC) or perceptual linear prediction coefficients (PLP) of the speech segment corresponding to the j-th character as the voiceprint features of that speech segment.
In an alternate embodiment of the invention, described first training module 1030, specifically for each institute with described targeted customer State the basis described vocal print corresponding to character and be characterized as the first sample data, use maximal posterior probability algorithm to default general Parameter in background model is adjusted;Described universal background model after adjusting parameter is defined as the mixed of described targeted customer Close Gauss model.
In an alternate embodiment of the invention, described second training module 1040, specifically for described targeted customer with i-th kind Basis vocal print corresponding to character is characterized as the second sample data, uses maximal posterior probability algorithm to mix described targeted customer The parameter closed in Gauss model is adjusted;By the mixed Gauss model of the described targeted customer after adjustment parameter, it is defined as institute State targeted customer with the described mixed Gauss model corresponding to i-th kind of basic character.
It should be noted that, when the voiceprint recognition system is implemented by a terminal and a server in combination, the above acquisition module 1010, extraction module 1020 and display module 1060 may be implemented by a dedicated hardware circuit in the terminal or by a combination of software and hardware, and the above first training module 1030, second training module 1040 and storage module 1050 may be implemented by a dedicated hardware circuit in the server or by a combination of software and hardware. The embodiments of the present invention are not limited in this respect; for example, the above extraction module 1020 may also be implemented by a dedicated hardware circuit in the server, or by a combination of software and hardware.
Figure 11 shows a block diagram of a voiceprint recognition device provided by one embodiment of the present invention. The voiceprint recognition device may be implemented, by a dedicated hardware circuit or by a combination of software and hardware, as all or part of a voiceprint recognition system. The device includes:
an acquisition module 1110, configured to obtain a test voice signal produced when an unknown user reads aloud a second character string, the second character string including k sequentially arranged characters, the k characters including all or some of the n mutually different kinds of basic characters, where k and n are positive integers;
an extraction module 1120, configured to extract the voiceprint feature sequence corresponding to the characters from the test voice signal;
a construction module 1130, configured to construct the HMM corresponding to the second character string according to the n Gaussian mixture models of the target user respectively corresponding to the n kinds of basic characters;
a calculation module 1140, configured to calculate a similarity score between the voiceprint feature sequence and the HMM;
an identification module 1150, configured to identify the unknown user as the target user when the similarity score is greater than a preset threshold.
In an alternate embodiment of the invention, described device, also include:
Display module 1160, for based on described n kind basis character, the second character string described in stochastic generation shows.
In an alternate embodiment of the invention, described structure module 1130, specifically for obtaining the x-th word of described second character string Symbol, x is the positive integer more than or equal to 1 and less than or equal to k;From the n the most corresponding with n kind basis character of described targeted customer In mixed Gauss model, by the described mixed Gauss model corresponding with described x-th character, it is defined as described Hidden Markov mould The xth scalariform states model of type;By the rotation probability of each scalariform states model with redirect probability and be set to preset value, build obtain with The described HMM that described second character string is corresponding.
In an alternate embodiment of the invention, described computing module 1140, specifically for by described for the input of described vocal print characteristic sequence HMM, uses dimension bit distribution algorithm to calculate maximum likelihood probability, is defined as by described maximum likelihood probability Described similarity score.
It should be noted that, when the voiceprint recognition system is implemented by a terminal and a server in combination, the above acquisition module 1110, extraction module 1120 and display module 1160 may be implemented by a dedicated hardware circuit in the terminal or by a combination of software and hardware, and the above construction module 1130, calculation module 1140 and identification module 1150 may be implemented by a dedicated hardware circuit in the server or by a combination of software and hardware. The embodiments of the present invention are not limited in this respect; for example, the above extraction module 1120 may also be implemented by a dedicated hardware circuit in the server, or by a combination of software and hardware.
The combination of software and hardware described in the embodiments of the present invention generally refers to one or more program instructions running in a processor and a memory to implement the steps of the above method embodiments or the modules or units of the above device embodiments.
It should be understood that, when the voiceprint model training device provided in the above embodiments trains a voiceprint model, the division into the above functional modules is given only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the equipment may be divided into different functional modules to complete all or some of the functions described above. In addition, the voiceprint model training device provided in the above embodiments belongs to the same concept as the voiceprint model training method embodiments; for its specific implementation process, reference may be made to the method embodiments, which are not repeated here.
Likewise, when the voiceprint recognition device provided in the above embodiments performs voiceprint recognition, the division into the above functional modules is given only as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the equipment may be divided into different functional modules to complete all or some of the functions described above. In addition, the voiceprint recognition device provided in the above embodiments belongs to the same concept as the voiceprint recognition method embodiments; for its specific implementation process, reference may be made to the method embodiments, which are not repeated here.
The sequence numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (20)

1. A voiceprint model training method, characterized in that the method comprises:
collecting a registration voice signal produced when a target user reads aloud a first character string, the first character string comprising m sequentially arranged characters, the m characters comprising n mutually different kinds of basic characters, m and n being positive integers and m ≥ n;
extracting a voiceprint feature corresponding to each character from the registration voice signal;
training a preset universal background model by using the voiceprint features of the target user corresponding to each of the characters as first sample data, to obtain a Gaussian mixture model of the target user;
training the Gaussian mixture model of the target user by using the voiceprint features of the target user corresponding to an i-th kind of basic character as second sample data, to obtain a Gaussian mixture model of the target user corresponding to the i-th kind of basic character;
storing n Gaussian mixture models of the target user respectively corresponding to the n kinds of basic characters, the n Gaussian mixture models being used to construct a hidden Markov model corresponding to a second character string.
2. The method according to claim 1, characterized in that before the collecting of the registration voice signal produced when the target user reads aloud the first character string, the method further comprises:
randomly generating and displaying the first character string.
3. The method according to claim 1, characterized in that the extracting of the voiceprint feature corresponding to each character from the registration voice signal comprises:
identifying valid voice segments and invalid voice segments in the registration voice signal, the invalid voice segments comprising silent segments and/or noise segments;
extracting a j-th valid voice segment in the registration voice signal as the voice segment corresponding to a j-th character in the first character string;
extracting the voiceprint feature of the voice segment corresponding to the j-th character.
4. The method according to claim 3, characterized in that the extracting of the voiceprint feature of the voice segment corresponding to the j-th character comprises:
extracting Mel-frequency cepstral coefficients (MFCC) or perceptual linear prediction coefficients (PLP) from the voice segment corresponding to the j-th character as the voiceprint feature of the voice segment corresponding to the j-th character.
5. The method according to any one of claims 1 to 4, characterized in that the training of the preset universal background model by using the voiceprint features of the target user corresponding to each of the basic characters as the first sample data, to obtain the Gaussian mixture model of the target user, comprises:
adjusting parameters of the preset universal background model by a maximum a posteriori algorithm, using the voiceprint features of the target user corresponding to each of the basic characters as the first sample data;
determining the universal background model with the adjusted parameters as the Gaussian mixture model of the target user.
6. The method according to any one of claims 1 to 4, characterized in that the training of the Gaussian mixture model of the target user by using the voiceprint features of the target user corresponding to the i-th kind of basic character as the second sample data, to obtain the Gaussian mixture model of the target user corresponding to the i-th kind of basic character, comprises:
adjusting parameters of the Gaussian mixture model of the target user by a maximum a posteriori algorithm, using the voiceprint features of the target user corresponding to the i-th kind of basic character as the second sample data;
determining the adjusted Gaussian mixture model of the target user as the Gaussian mixture model of the target user corresponding to the i-th kind of basic character.
7. A voiceprint recognition method, characterized in that the method comprises:
obtaining a test voice signal produced when an unknown user reads aloud a second character string, the second character string comprising k sequentially arranged characters, the k characters comprising all or some of n mutually different kinds of basic characters, k and n being positive integers;
extracting a voiceprint feature sequence corresponding to the characters from the test voice signal;
constructing a hidden Markov model corresponding to the second character string according to n Gaussian mixture models of a target user respectively corresponding to the n kinds of basic characters;
calculating a similarity score between the voiceprint feature sequence and the hidden Markov model;
identifying the unknown user as the target user when the similarity score is greater than a preset threshold.
8. The method according to claim 7, characterized in that before the obtaining of the test voice signal produced when the unknown user reads aloud the second character string, the method further comprises:
randomly generating and displaying the second character string based on the n kinds of basic characters.
9. The method according to claim 7, characterized in that the constructing of the hidden Markov model corresponding to the second character string according to the n Gaussian mixture models of the target user respectively corresponding to the n kinds of basic characters comprises:
obtaining an x-th character of the second character string, x being a positive integer greater than or equal to 1 and less than or equal to k;
determining, from the n Gaussian mixture models of the target user respectively corresponding to the n kinds of basic characters, the Gaussian mixture model corresponding to the x-th character as an x-th state model of the hidden Markov model;
setting a self-transition probability and a jump probability of each state model to preset values, to construct the hidden Markov model corresponding to the second character string.
10. The method according to claim 7, characterized in that the calculating of the similarity score between the voiceprint feature sequence and the hidden Markov model comprises:
inputting the voiceprint feature sequence into the hidden Markov model, calculating a maximum likelihood probability by a Viterbi algorithm, and determining the maximum likelihood probability as the similarity score.
11. A voiceprint model training device, characterized in that the device comprises:
an acquisition module, configured to collect a registration voice signal produced when a target user reads aloud a first character string, the first character string comprising m sequentially arranged characters, the m characters comprising n mutually different kinds of basic characters, m and n being positive integers and m ≥ n;
an extraction module, configured to extract a voiceprint feature corresponding to each character from the registration voice signal;
a first training module, configured to train a preset universal background model by using the voiceprint features of the target user corresponding to each of the characters as first sample data, to obtain a Gaussian mixture model of the target user;
a second training module, configured to train the Gaussian mixture model of the target user by using the voiceprint features of the target user corresponding to an i-th kind of basic character as second sample data, to obtain a Gaussian mixture model of the target user corresponding to the i-th kind of basic character;
a storage module, configured to store n Gaussian mixture models of the target user respectively corresponding to the n kinds of basic characters, the n Gaussian mixture models being used to construct a hidden Markov model corresponding to a second character string.
12. The device according to claim 11, characterized in that the device further comprises:
a display module, configured to randomly generate and display the first character string.
13. The device according to claim 11, characterized in that the extraction module comprises:
a recognition unit, configured to identify valid voice segments and invalid voice segments in the registration voice signal, the invalid voice segments comprising silent segments and/or noise segments;
a segment extraction unit, configured to extract a j-th valid voice segment in the registration voice signal as the voice segment corresponding to a j-th character in the first character string;
a feature extraction unit, configured to extract the voiceprint feature of the voice segment corresponding to the j-th character.
14. The device according to claim 13, characterized in that the feature extraction unit is configured to extract Mel-frequency cepstral coefficients (MFCC) or perceptual linear prediction coefficients (PLP) from the voice segment corresponding to the j-th character as the voiceprint feature of the voice segment corresponding to the j-th character.
15. The device according to any one of claims 11 to 14, characterized in that the first training module is specifically configured to adjust parameters of the preset universal background model by a maximum a posteriori algorithm, using the voiceprint features of the target user corresponding to each of the basic characters as the first sample data, and to determine the universal background model with the adjusted parameters as the Gaussian mixture model of the target user.
16. The device according to any one of claims 11 to 14, characterized in that the second training module is specifically configured to adjust parameters of the Gaussian mixture model of the target user by a maximum a posteriori algorithm, using the voiceprint features of the target user corresponding to the i-th kind of basic character as the second sample data, and to determine the adjusted Gaussian mixture model of the target user as the Gaussian mixture model of the target user corresponding to the i-th kind of basic character.
17. A voiceprint recognition device, characterized in that the device comprises:
an acquisition module, configured to obtain a test voice signal produced when an unknown user reads aloud a second character string, the second character string comprising k sequentially arranged characters, the k characters comprising all or some of n mutually different kinds of basic characters, k and n being positive integers;
an extraction module, configured to extract a voiceprint feature sequence corresponding to the characters from the test voice signal;
a construction module, configured to construct a hidden Markov model corresponding to the second character string according to n Gaussian mixture models of a target user respectively corresponding to the n kinds of basic characters;
a calculation module, configured to calculate a similarity score between the voiceprint feature sequence and the hidden Markov model;
an identification module, configured to identify the unknown user as the target user when the similarity score is greater than a preset threshold.
18. The device according to claim 17, characterized in that the device further comprises:
a display module, configured to randomly generate and display the second character string based on the n kinds of basic characters.
19. The device according to claim 17, characterized in that the construction module is specifically configured to obtain an x-th character of the second character string, x being a positive integer greater than or equal to 1 and less than or equal to k; to determine, from the n Gaussian mixture models of the target user respectively corresponding to the n kinds of basic characters, the Gaussian mixture model corresponding to the x-th character as an x-th state model of the hidden Markov model; and to set a self-transition probability and a jump probability of each state model to preset values, to construct the hidden Markov model corresponding to the second character string.
20. The device according to claim 17, characterized in that the calculation module is specifically configured to input the voiceprint feature sequence into the hidden Markov model, to calculate a maximum likelihood probability by a Viterbi algorithm, and to determine the maximum likelihood probability as the similarity score.
CN201610388231.3A 2016-06-01 2016-06-01 Voiceprint model training method, voiceprint recognition method and device Active CN106057206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610388231.3A CN106057206B (en) Voiceprint model training method, voiceprint recognition method and device

Publications (2)

Publication Number Publication Date
CN106057206A true CN106057206A (en) 2016-10-26
CN106057206B CN106057206B (en) 2019-05-03

Family

ID=57169475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610388231.3A Active CN106057206B (en) Voiceprint model training method, voiceprint recognition method and device

Country Status (1)

Country Link
CN (1) CN106057206B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070294083A1 (en) * 2000-03-16 2007-12-20 Bellegarda Jerome R Fast, language-independent method for user authentication by voice
CN102238190A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Identity authentication method and system
CN104717219A (en) * 2015-03-20 2015-06-17 百度在线网络技术(北京)有限公司 Vocal print login method and device based on artificial intelligence
CN104821934A (en) * 2015-03-20 2015-08-05 百度在线网络技术(北京)有限公司 Artificial intelligence based voice print login method and device

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102812A (en) * 2017-06-21 2018-12-28 北京搜狗科技发展有限公司 A kind of method for recognizing sound-groove, system and electronic equipment
CN109102812B (en) * 2017-06-21 2021-08-31 北京搜狗科技发展有限公司 Voiceprint recognition method and system and electronic equipment
WO2019000832A1 (en) * 2017-06-30 2019-01-03 百度在线网络技术(北京)有限公司 Method and apparatus for voiceprint creation and registration
US11100934B2 (en) 2017-06-30 2021-08-24 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for voiceprint creation and registration
CN107358945A (en) * 2017-07-26 2017-11-17 谢兵 A kind of more people's conversation audio recognition methods and system based on machine learning
CN108416592B (en) * 2018-03-19 2022-08-05 成都信达智胜科技有限公司 High-speed voice recognition method
CN108416592A (en) * 2018-03-19 2018-08-17 成都信达智胜科技有限公司 A kind of high speed voice recognition methods
CN109473107A (en) * 2018-12-03 2019-03-15 厦门快商通信息技术有限公司 A kind of relevant method for recognizing sound-groove of text half and system
CN109473107B (en) * 2018-12-03 2020-12-22 厦门快商通信息技术有限公司 Text semi-correlation voiceprint recognition method and system
CN113056784A (en) * 2019-01-29 2021-06-29 深圳市欢太科技有限公司 Voice information processing method and device, storage medium and electronic equipment
CN109948481A (en) * 2019-03-07 2019-06-28 惠州学院 A kind of passive human body recognition method based on the sampling of narrow radio frequency link
CN109948481B (en) * 2019-03-07 2024-02-02 惠州学院 Passive human body identification method based on narrowband radio frequency link sampling
CN109871847B (en) * 2019-03-13 2022-09-30 厦门商集网络科技有限责任公司 OCR recognition method and terminal
CN109871847A (en) * 2019-03-13 2019-06-11 厦门商集网络科技有限责任公司 A kind of OCR recognition methods and terminal
CN112151018A (en) * 2019-06-10 2020-12-29 阿里巴巴集团控股有限公司 Voice evaluation and voice recognition method, device, equipment and storage medium
CN110335608A (en) * 2019-06-17 2019-10-15 平安科技(深圳)有限公司 Voice print verification method, apparatus, equipment and storage medium
CN110335608B (en) * 2019-06-17 2023-11-28 平安科技(深圳)有限公司 Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN110517671A (en) * 2019-08-30 2019-11-29 腾讯音乐娱乐科技(深圳)有限公司 A kind of appraisal procedure of audio-frequency information, device and storage medium
CN110491393B (en) * 2019-08-30 2022-04-22 科大讯飞股份有限公司 Training method of voiceprint representation model and related device
CN110491393A (en) * 2019-08-30 2019-11-22 科大讯飞股份有限公司 The training method and relevant apparatus of vocal print characterization model
CN110689895A (en) * 2019-09-06 2020-01-14 北京捷通华声科技股份有限公司 Voice verification method and device, electronic equipment and readable storage medium
CN111081260A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Method and system for identifying voiceprint of awakening word
CN111341307A (en) * 2020-03-13 2020-06-26 腾讯科技(深圳)有限公司 Voice recognition method and device, electronic equipment and storage medium
CN113457096A (en) * 2020-03-31 2021-10-01 荣耀终端有限公司 Method for detecting basketball movement based on wearable device and wearable device
CN113457096B (en) * 2020-03-31 2022-06-24 荣耀终端有限公司 Method for detecting basketball movement based on wearable device and wearable device
CN113571054A (en) * 2020-04-28 2021-10-29 ***通信集团浙江有限公司 Speech recognition signal preprocessing method, device, equipment and computer storage medium
CN113571054B (en) * 2020-04-28 2023-08-15 ***通信集团浙江有限公司 Speech recognition signal preprocessing method, device, equipment and computer storage medium
CN112820299B (en) * 2020-12-29 2021-09-14 马上消费金融股份有限公司 Voiceprint recognition model training method and device and related equipment
CN112820299A (en) * 2020-12-29 2021-05-18 马上消费金融股份有限公司 Voiceprint recognition model training method and device and related equipment

Also Published As

Publication number Publication date
CN106057206B (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN106057206A (en) Voiceprint model training method, voiceprint recognition method and device
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
US20180197548A1 (en) System and method for diarization of speech, automated generation of transcripts, and automatic information extraction
WO2021128741A1 (en) Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium
CN105489221B (en) A kind of audio recognition method and device
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
TWI527023B (en) A voiceprint recognition method and apparatus
CN106098068A (en) A kind of method for recognizing sound-groove and device
CN110782872A (en) Language identification method and device based on deep convolutional recurrent neural network
CN107104803A (en) It is a kind of to combine the user ID authentication method confirmed with vocal print based on numerical password
CN103811009A (en) Smart phone customer service system based on speech analysis
CN110544469B (en) Training method and device of voice recognition model, storage medium and electronic device
CN107077843A (en) Session control and dialog control method
CN109410664A (en) Pronunciation correction method and electronic equipment
CN111128211B (en) Voice separation method and device
CN109410956A (en) A kind of object identifying method of audio data, device, equipment and storage medium
CN107492153A (en) Attendance checking system, method, work attendance server and attendance record terminal
CN110797032A (en) Voiceprint database establishing method and voiceprint identification method
Beigi Challenges of LargeScale Speaker Recognition
CN109545226A (en) A kind of audio recognition method, equipment and computer readable storage medium
WO2021152566A1 (en) System and method for shielding speaker voice print in audio signals
CN105895079A (en) Voice data processing method and device
CN112992155A (en) Far-field voice speaker recognition method and device based on residual error neural network
CN107910005A (en) The target service localization method and device of interaction text
Koolagudi et al. Speaker recognition in the case of emotional environment using transformation of speech features

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230713

Address after: 518057 Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 35 floors

Patentee after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

Patentee after: TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd.

Address before: 2, 518000, East 403 room, SEG science and Technology Park, Zhenxing Road, Shenzhen, Guangdong, Futian District

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

TR01 Transfer of patent right