CN107993663A - An Android-based voiceprint recognition method - Google Patents
An Android-based voiceprint recognition method
- Publication number
- CN107993663A (application CN201710809811.XA)
- Authority
- CN
- China
- Prior art keywords
- model
- speaker
- voice
- training
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an Android-based voiceprint recognition method. Running on the Android operating system, the method calls the built-in recording device to collect training speakers' audio and applies speech enhancement, and during training builds a kd-tree over the vector quantization (VQ) model codewords together with Gaussian mixture models (GMMs). During recognition, the kd-tree of the VQ model is searched for the K trained speakers closest to the test speaker's voiceprint features, and the GMMs are then used for accurate identification among them. By using the kd-tree of the VQ model, the invention avoids traversing every model in the GMM library, increasing recognition speed, while the two-stage identification with the VQ model and the GMMs increases recognition accuracy; the method therefore has strong practicality, usability and robustness.
Description
(1) Technical field:
The invention relates to an Android-based voiceprint recognition method, and belongs to the field of computer technology.
(2) Background art:
In an era of "accounts everywhere, passwords everywhere", people are often annoyed by forgetting or losing passwords. The emergence of voiceprint recognition undoubtedly brings a more convenient and efficient mode of service: voiceprint features are "carried with you", so identity authentication can be performed anytime, anywhere. Based on the Android operating system, the invention extracts voiceprint features and identifies the speaker by constructing a kd-tree (k-dimension tree) over the vector quantization model together with Gaussian mixture models. The invention improves search and recognition speed and precision, and has strong practicality, usability and robustness.
(3) Content of the invention:
The invention provides an Android-based voiceprint recognition method. It extracts the Mel-frequency cepstral coefficients (MFCCs) of the voice, the first-order difference MFCCs, the second-order difference MFCCs and the frame energy, and uses the frame energy to screen out valid features; it then constructs a kd-tree over the vector quantization models of the training speakers' voiceprint features and establishes a library of Gaussian mixture models of those features. At test time, the kd-tree is first searched for the K models most similar to the test voiceprint features, and the Gaussian mixture models are then used to identify the speaker accurately among the K candidates, thereby determining the test speaker's identity.
1. An Android-based voiceprint recognition method, comprising the following steps:
Step 1: Audio data is collected through the AudioRecorder interface using single-channel (mono) recording, a sampling frequency of 22050 Hz, pulse-code modulation and 16 quantization bits per sample. Instances of the AcousticEchoCanceler, NoiseSuppressor and AutomaticGainControl classes are invoked to perform automatic echo cancellation, noise suppression and automatic gain control, achieving speech enhancement. At the same time, Android's asynchronous message handling mechanism lets a child thread update the UI and implement a timing function. As the raw audio data is obtained, a WAV-format header is written for it and the result is stored as a file in the corresponding training speaker's voice library. When recording ends, the software pops up a renaming window, and the user enters the corresponding training speaker's name as the file name.
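Step 1's WAV-header step can be sketched in plain Java. The `WavHeader` class below is an illustrative helper, not the patent's actual code; it assembles the canonical 44-byte RIFF/WAVE header for PCM audio, matching Step 1's 22050 Hz, single-channel, 16-bit settings:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WavHeader {
    // Build the canonical 44-byte RIFF/WAVE header for PCM audio.
    // For Step 1's settings: sampleRate = 22050, channels = 1, bitsPerSample = 16.
    public static byte[] build(int sampleRate, short channels, short bitsPerSample, int pcmDataLen) {
        int byteRate = sampleRate * channels * bitsPerSample / 8;
        short blockAlign = (short) (channels * bitsPerSample / 8);
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes());
        b.putInt(36 + pcmDataLen);      // total RIFF chunk size minus the first 8 bytes
        b.put("WAVE".getBytes());
        b.put("fmt ".getBytes());
        b.putInt(16);                   // size of the PCM fmt sub-chunk
        b.putShort((short) 1);          // audio format 1 = linear PCM
        b.putShort(channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort(blockAlign);
        b.putShort(bitsPerSample);
        b.put("data".getBytes());
        b.putInt(pcmDataLen);           // length of the raw PCM payload that follows
        return b.array();
    }
}
```

The header is prepended to the raw PCM buffer before the file is saved under the speaker's name.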
Step 2: After the training speakers' voice library has been collected, all audio files in the library are preprocessed. Framing is performed first, with a frame length of 16 ms and a frame shift of 8 ms. After framing, endpoint detection is carried out with the double-threshold method; since speech enhancement in Step 1 is effective, the low energy threshold is set to 0.1, the high energy threshold to 1, the low zero-crossing-rate threshold to 0.01 and the high zero-crossing-rate threshold to 10. The longest silence within a speech segment is 12 frames, i.e., the time during which both the frame energy and the zero-crossing rate of a speech segment stay below the low energy threshold and the low zero-crossing-rate threshold may not exceed 12 frames; the shortest speech duration is 10 frames, i.e., the time during which the frame energy of an initial speech segment exceeds the low energy threshold or the zero-crossing rate exceeds the low zero-crossing-rate threshold may not be shorter than 10 frames. After endpoint detection, the speech-segment signal is obtained and each frame is multiplied by a Hamming window. After windowing, pre-emphasis is applied to each frame of the speech-segment signal to compensate for the loss of high-frequency components; the method sets the pre-emphasis coefficient to 0.93.
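The framing, windowing and pre-emphasis operations of Step 2 can be sketched as follows. `Preprocess` is a hypothetical helper (the double-threshold endpoint logic is omitted for brevity); at 22050 Hz, a 16 ms frame with an 8 ms shift corresponds to 352 and 176 samples:

```java
public class Preprocess {
    // Split a signal into overlapping frames (frameLen samples, hop samples apart),
    // e.g. frameLen = 352 and hop = 176 for Step 2's 16 ms / 8 ms at 22050 Hz.
    public static double[][] frame(double[] x, int frameLen, int hop) {
        int n = (x.length - frameLen) / hop + 1;
        double[][] frames = new double[n][frameLen];
        for (int i = 0; i < n; i++)
            System.arraycopy(x, i * hop, frames[i], 0, frameLen);
        return frames;
    }

    // Pre-emphasis y[t] = x[t] - a*x[t-1], with a = 0.93 as the text specifies,
    // compensating the loss of high-frequency components.
    public static double[] preEmphasis(double[] x, double a) {
        double[] y = new double[x.length];
        y[0] = x[0];
        for (int t = 1; t < x.length; t++) y[t] = x[t] - a * x[t - 1];
        return y;
    }

    // Hamming window of length n, multiplied into each frame after endpoint detection.
    public static double[] hamming(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++)
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
        return w;
    }
}
```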
Step 3: After preprocessing, the method extracts Mel-frequency cepstral coefficients, setting the number of triangular band-pass filters to 40 and keeping the first 12 coefficients of each frame. After MFCC extraction, the first 12 dimensions of the first-order and second-order difference MFCCs are extracted. After the difference features have been extracted, feature vectors whose corresponding frame energy is below 1 or above 10 are eliminated, removing identification errors caused by the speaker's volume being too high or too low.
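The frame-energy screening of Step 3 amounts to a simple filter over the feature vectors. A minimal sketch with a hypothetical `EnergyScreen` class, assuming (as the embodiment later states) that the frame energy is appended as the last element of each feature vector:

```java
import java.util.ArrayList;
import java.util.List;

public class EnergyScreen {
    // Keep only feature vectors whose appended frame energy (last element)
    // lies inside [lo, hi]; Step 3 uses lo = 1 and hi = 10.
    public static List<double[]> screen(List<double[]> feats, double lo, double hi) {
        List<double[]> kept = new ArrayList<>();
        for (double[] f : feats) {
            double e = f[f.length - 1];   // frame energy is stored as the last dimension
            if (e >= lo && e <= hi) kept.add(f);
        }
        return kept;
    }
}
```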
Step 4: After feature extraction, the vector quantization model of each speaker is trained, generating a codebook that represents that speaker. Using a balanced kd-tree construction algorithm, a kd-tree is built from the codewords of all trained speakers' codebooks, each codeword existing as a node of the tree.
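The balanced kd-tree of Step 4 can be built by recursively splitting the codewords at the median along cycling axes. A minimal sketch (the `KdTree` class is illustrative, not the patent's code):

```java
import java.util.Arrays;
import java.util.Comparator;

public class KdTree {
    static class Node {
        double[] point;       // one codeword
        int axis;             // splitting dimension at this depth
        Node left, right;
        Node(double[] p, int a) { point = p; axis = a; }
    }

    // Build a balanced kd-tree: sort the points along the current axis,
    // make the median the node, and recurse on the two halves.
    public static Node build(double[][] pts, int depth) {
        if (pts.length == 0) return null;
        int axis = depth % pts[0].length;
        Arrays.sort(pts, Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = pts.length / 2;
        Node n = new Node(pts[mid], axis);
        n.left = build(Arrays.copyOfRange(pts, 0, mid), depth + 1);
        n.right = build(Arrays.copyOfRange(pts, mid + 1, pts.length), depth + 1);
        return n;
    }
}
```

Median splitting guarantees the balance that keeps the later nearest-neighbor searches logarithmic in the number of codewords.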
Step 5:Meanwhile program is clustered characteristic vector using k means clustering algorithms in another thread, classification
Number is 16, after k means clustering process is repeated 10 times by this method, calculates total variance within clusters after cluster every time, choosing respectively
That selects total variance within clusters minimum is once used as final result;After cluster operation is completed, Gauss model is carried out using EM algorithms
Parameter Estimation, the initial ginseng first using each parameter-average, variance and the weight coefficient after cluster as gauss hybrid models
Number, carries out the revaluation of parameters by the parameter revaluation formula of EM algorithms, pair of likelihood function value is calculated during revaluation
The knots modification of numerical value, when the knots modification is less than threshold value 0.01, that is, is judged to restraining, records average, variance and weight at this time
Coefficient, then three parameters and the voice document name of crawl are that training speaker name is stored in speaker model class, and
The speaker model class example is stored in model dynamic array;When the memory size in model dynamic array, that is, model class is real
The number of example, during equal to speaker's number, shows that all voice documents are all trained to finish, and then program is by model dynamic
Array carries out serializing operation, and each model class instance transfer therein is byte sequence, then mixes sequence deposit Gauss
Molding type database file, training finish;
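The repeated k-means step can be sketched as a single run that reports the total within-cluster variance; Step 5 would call it 10 times with different initial centroids and keep the run yielding the smallest value. `KMeans.run` is a hypothetical helper:

```java
public class KMeans {
    // One k-means run over data x with the given (mutated in place) centroids.
    // Returns the total within-cluster variance, i.e. the sum of squared
    // distances from each vector to its assigned centroid.
    public static double run(double[][] x, double[][] centroids, int iters) {
        int k = centroids.length, d = x[0].length;
        int[] assign = new int[x.length];
        for (int it = 0; it < iters; it++) {
            for (int i = 0; i < x.length; i++) {          // assignment step
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0;
                    for (int j = 0; j < d; j++) {
                        double diff = x[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < best) { best = dist; assign[i] = c; }
                }
            }
            double[][] sum = new double[k][d];            // update step
            int[] cnt = new int[k];
            for (int i = 0; i < x.length; i++) {
                cnt[assign[i]]++;
                for (int j = 0; j < d; j++) sum[assign[i]][j] += x[i][j];
            }
            for (int c = 0; c < k; c++)
                if (cnt[c] > 0)
                    for (int j = 0; j < d; j++) centroids[c][j] = sum[c][j] / cnt[c];
        }
        double sse = 0;                                   // total within-cluster variance
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < d; j++) {
                double diff = x[i][j] - centroids[assign[i]][j];
                sse += diff * diff;
            }
        return sse;
    }
}
```

The per-class means, variances and occupancy ratios of the winning run then seed the GMM's initial means, variances and weights.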
Step 6: Once the Gaussian-mixture-model database file has been established, testing can begin. After the test speaker's audio data has been recorded, the test speaker's voiceprint feature vector set is extracted according to Steps 2 and 3. A feature vector is chosen from the set, and in the kd-tree generated in Step 4, the M codewords with the smallest Euclidean distance to this feature vector are found and the codebooks containing them are looked up; this is repeated until the whole test feature vector set has been traversed. The K codebooks looked up most often are then found, where K < M, and the test speaker's voiceprint features are scored against the Gaussian mixture models of the K corresponding trained speakers: the model with the maximum posterior probability in the Gaussian-mixture-model database is found, and the speaker corresponding to that model is judged to be the test speaker.
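The coarse screening in Step 6 reduces to tallying, over all test vectors, which speakers' codebooks were hit by the M-nearest-codeword searches, and keeping the K speakers hit most often. A minimal sketch with a hypothetical `CodebookVote` class:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CodebookVote {
    // hits: one entry per codebook lookup (a speaker identifier for each of the
    // M nearest codewords of each test vector). Returns the K most-hit speakers,
    // the candidate set passed on to the GMM stage.
    public static List<String> topK(List<String> hits, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : hits) counts.merge(s, 1, Integer::sum);
        List<String> speakers = new ArrayList<>(counts.keySet());
        speakers.sort((a, b) -> counts.get(b) - counts.get(a));   // most hits first
        return speakers.subList(0, Math.min(k, speakers.size()));
    }
}
```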
2. A voiceprint recognition system corresponding to the voiceprint recognition method of item 1, comprising the following modules:
Voice acquisition module: the hardware part comprises a microphone and a sound card; the microphone collects the voice and the sound card digitizes it. The software part comprises audio data collection, automatic echo cancellation, noise suppression, automatic gain control and a timing function.
Voiceprint feature extraction module: comprises two submodules, voice preprocessing and feature extraction. Its input is the audio data recorded by the voice acquisition module; in the training stage the features are sent to the model training module for processing, and in the test stage to the model identification module.
Model training module: trains the vector quantization model of the voice, builds the kd-tree of codewords, and trains the Gaussian mixture model of the voice.
Model update module: inserts a newly added training speaker's vector quantization model into the kd-tree, and adds his or her Gaussian mixture model to the Gaussian-mixture-model database file.
Model identification module: its input is the output of the voiceprint feature extraction module, and its output is the test speaker's name and personal information.
Database management module: manages the training speakers' names and personal information, model parameters and voice files. Each model corresponds to one voice file, each voice file is named after the corresponding speaker, and each speaker's name is linked to his or her personal information in the database.
(4) Description of the drawings:
Fig. 1: model training flow chart;
Fig. 2: model identification flow chart;
Fig. 3: asynchronous message handling mechanism;
Fig. 4: preprocessing flow chart;
Fig. 5: voiceprint feature extraction flow chart;
Fig. 6: triangular band-pass filters;
Fig. 7: Gaussian-mixture-model establishment flow chart.
(5) Embodiments:
The technical solution of the invention is further elaborated below with reference to the accompanying drawings.
The software program of the invention mainly comprises the algorithm implementation of model training and identification, and the client software control interface. The algorithm part mainly covers framing and windowing of the audio data, endpoint detection, pre-emphasis, feature extraction, and model training and identification; the training and identification flows are shown in Figs. 1 and 2. The client software control interface is mainly used for the user's control of the software.
Audio collection: mainly used for collecting audio data, timing and speech enhancement. The audio collection subprogram calls an AudioRecorder class instance to record the audio data; after recording, the program writes a WAV-format header for the data, names the file after the speaker, and stores it in the voice library. To achieve speech enhancement, while collecting audio the program calls instances of the AcousticEchoCanceler, NoiseSuppressor and AutomaticGainControl classes to perform automatic echo cancellation, noise suppression and automatic gain control; at the same time, the program implements the timing function through Android's asynchronous message handling mechanism, whose flow is shown in Fig. 3.
Feature extraction: mainly comprises the algorithm implementation of preprocessing and feature extraction. The preprocessing part is shown in Fig. 4. The audio is first framed so that it exhibits relatively stationary characteristics and can be processed digitally in parts. Endpoint detection is then carried out on the acquired audio to locate the speech segments: the zero-crossing rate and frame energy of the current frame are computed; if at least one of the two parameters exceeds its threshold and the condition lasts long enough, the current frame is marked as a start point, otherwise the next frame is examined; if both parameters are below their thresholds and the condition lasts long enough, the frame is marked as an end point, otherwise the next frame is examined. After the start and end frames of the speech segments have been determined, each frame is multiplied by a Hamming window to ensure continuity between frames, and pre-emphasis is applied to boost the high-frequency components. The feature extraction part is shown in Fig. 5. After preprocessing, the program extracts and screens the features. After all frames have been passed in, the frames whose energy lies within the normal range are selected; an FFT is applied to each remaining frame, and the resulting spectrum is squared to obtain the energy spectrum. Meanwhile, the program designs the triangular band-pass filters. Experimental observation shows that the human ear acts like a bank of filters that selectively attends to certain frequency signals; these filters, however, are not uniformly distributed on the frequency axis: they are densely distributed in the low-frequency region and sparsely distributed in the high-frequency region, as shown in Fig. 6. The design procedure of the triangular band-pass filter bank is:
(1) Determine the lowest and highest frequencies of the voice signal and the number of Mel filters;
(2) Compute the Mel frequencies corresponding to the lowest and highest frequencies;
(3) Since the filter center frequencies are evenly spaced on the Mel frequency axis, compute the distance between the center frequencies of two adjacent Mel filters;
(4) Convert each Mel center frequency back to an actual frequency, and compute the FFT bin index corresponding to each frequency point.
(5) According to the formula:

H_m(k) = 0                                  for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1))     for f(m-1) <= k <= f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m))     for f(m) <= k <= f(m+1)
H_m(k) = 0                                  for k > f(m+1)

compute the amplitude values of the triangular band-pass filter bank, where f(m) denotes the FFT bin index corresponding to the m-th center frequency point and H_m denotes the m-th filter. Assuming o(m), c(m) and h(m) are respectively the lower limit frequency, center frequency and upper limit frequency of the m-th triangular filter, adjacent filters satisfy:

c(m) = h(m-1) = o(m+1)
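Design steps (1)-(4) rest on the Hz-to-Mel mapping. A minimal sketch, assuming the common convention mel = 2595 * log10(1 + f/700) (the text does not state which Mel formula variant it uses); `MelScale` is an illustrative helper:

```java
public class MelScale {
    // Hz <-> Mel conversions used in filter-bank design steps (1)-(4).
    public static double hzToMel(double f) {
        return 2595.0 * Math.log10(1.0 + f / 700.0);
    }

    public static double melToHz(double m) {
        return 700.0 * (Math.pow(10.0, m / 2595.0) - 1.0);
    }

    // Center frequencies (in Hz) of nFilt triangular filters, evenly spaced
    // on the Mel axis between lowHz and highHz, then mapped back to Hz.
    public static double[] centerFreqs(double lowHz, double highHz, int nFilt) {
        double lo = hzToMel(lowHz), hi = hzToMel(highHz);
        double step = (hi - lo) / (nFilt + 1);   // nFilt centers + two endpoints
        double[] centers = new double[nFilt];
        for (int m = 1; m <= nFilt; m++) centers[m - 1] = melToHz(lo + m * step);
        return centers;
    }
}
```

Because the Mel scale is roughly logarithmic, equal spacing on the Mel axis yields centers that crowd the low frequencies and spread out at high frequencies, exactly the distribution Fig. 6 depicts.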
After the band-pass filter bank has been designed, the energy spectrum of each frame is filtered with the filter bank, the logarithm of the result is taken, and a discrete cosine transform is applied; the output is the Mel-frequency cepstral coefficients. The first- and second-order difference MFCCs are then computed from the MFCCs by the formula:

d(t) = ( SUM_{n=1..N} n * ( c(t+n) - c(t-n) ) ) / ( 2 * SUM_{n=1..N} n^2 )

where t indexes the t-th difference coefficient and N usually takes the value 2; c is the MFCC when computing the first-order difference and the first-order difference MFCC when computing the second-order difference, every higher-order difference being computed from the first-order difference in the same way. The method extracts the first- and second-order differences, and since the parameter amplitudes beyond the 12th dimension essentially vanish, only the first 12 dimensions are needed. Finally, the program appends the frame energy value as the last dimension of each frame's feature vector, serving as an additional voiceprint feature, and eliminates the feature vectors whose frame energy lies outside 1 to 10.
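The difference formula above can be sketched directly; `DeltaMfcc` is an illustrative helper, and padding the sequence edges by repeating the boundary frames is an assumption (the text does not specify edge handling):

```java
public class DeltaMfcc {
    // Delta coefficients d[t] = sum_{n=1..N} n*(c[t+n]-c[t-n]) / (2*sum_{n=1..N} n^2),
    // with N = 2 as in the text. Applied once to the MFCCs for the first-order
    // difference, and again to that result for the second-order difference.
    // Edges are padded by repeating the first/last frame (an assumed convention).
    public static double[][] delta(double[][] c, int N) {
        int T = c.length, D = c[0].length;
        double denom = 0;
        for (int n = 1; n <= N; n++) denom += 2.0 * n * n;
        double[][] d = new double[T][D];
        for (int t = 0; t < T; t++)
            for (int j = 0; j < D; j++) {
                double num = 0;
                for (int n = 1; n <= N; n++) {
                    int tp = Math.min(T - 1, t + n);   // clamp at the sequence edges
                    int tm = Math.max(0, t - n);
                    num += n * (c[tp][j] - c[tm][j]);
                }
                d[t][j] = num / denom;
            }
        return d;
    }
}
```

For a coefficient track that grows linearly by 1 per frame, the interior delta values come out as exactly 1, which is a quick sanity check on the implementation.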
Construction of the vector quantization model kd-tree: after the training speakers' speech feature vector sets have been obtained, their vector quantization models are trained, and the codewords of all the resulting codebooks are assembled into a balanced kd-tree following the balanced kd-tree construction procedure; every codeword is represented by a node of the kd-tree.
Training of the Gaussian mixture models: the algorithm flow is shown in Fig. 7. The program classifies the acquired training speaker's voiceprint feature vectors with the k-means clustering algorithm, takes the mean and variance of each class as the initial means and variances of the Gaussian mixture model, takes the ratio of the number of vectors in each class to the total number of feature vectors as the initial weights, and takes the initial number of classes as the number of mixture components of the Gaussian mixture model. It then iterates the means, variances and weights of the model with the EM algorithm until convergence.
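The EM loop with the 0.01 log-likelihood convergence threshold can be sketched for the one-dimensional case. The patent's models are multivariate; `Gmm1D` is an illustrative simplification showing the same E-step/M-step structure and stopping rule:

```java
public class Gmm1D {
    public double[] w, mu, var;   // mixture weights, means, variances

    public Gmm1D(double[] w, double[] mu, double[] var) {
        this.w = w; this.mu = mu; this.var = var;
    }

    public double density(double x) {
        double p = 0;
        for (int k = 0; k < w.length; k++)
            p += w[k] * Math.exp(-(x - mu[k]) * (x - mu[k]) / (2 * var[k]))
                      / Math.sqrt(2 * Math.PI * var[k]);
        return p;
    }

    // EM iterations; stop when the log-likelihood improves by less than tol
    // (the text's convergence threshold on the log-likelihood change is 0.01).
    public double fit(double[] x, double tol, int maxIter) {
        double prev = Double.NEGATIVE_INFINITY, ll = 0;
        int K = w.length, N = x.length;
        for (int it = 0; it < maxIter; it++) {
            double[][] r = new double[N][K];          // E-step: responsibilities
            ll = 0;
            for (int i = 0; i < N; i++) {
                double denom = 0;
                for (int k = 0; k < K; k++) {
                    r[i][k] = w[k] * Math.exp(-(x[i] - mu[k]) * (x[i] - mu[k]) / (2 * var[k]))
                                   / Math.sqrt(2 * Math.PI * var[k]);
                    denom += r[i][k];
                }
                for (int k = 0; k < K; k++) r[i][k] /= denom;
                ll += Math.log(denom);
            }
            if (ll - prev < tol) break;               // converged: change below threshold
            prev = ll;
            for (int k = 0; k < K; k++) {             // M-step: parameter re-estimation
                double nk = 0, m = 0, v = 0;
                for (int i = 0; i < N; i++) { nk += r[i][k]; m += r[i][k] * x[i]; }
                m /= nk;
                for (int i = 0; i < N; i++) v += r[i][k] * (x[i] - m) * (x[i] - m);
                w[k] = nk / N; mu[k] = m; var[k] = Math.max(v / nk, 1e-6);
            }
        }
        return ll;
    }
}
```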
Search of the vector quantization model kd-tree: after the test speaker's feature vector set has been obtained, one feature vector is chosen arbitrarily, the M codewords most similar to this feature vector are found with the kd-tree nearest-neighbor search algorithm, and the codebooks containing them are looked up. All feature vectors in the set are traversed and the above steps repeated, after which the K codebooks looked up most often are picked out, where K < M. The training speakers corresponding to the K codebooks are obtained for accurate identification with the Gaussian mixture models.
Gaussian-mixture-model identification: after the K trained speakers closest to the test speaker have been found, their Gaussian mixture models are retrieved from the Gaussian-mixture-model library, and among the K models the model parameter λ_i is found that gives the test speaker's feature vector set X the maximum posterior probability P(λ_i | X); the training speaker corresponding to λ_i is judged to be the test speaker.
According to Bayes' formula:

P(λ_i | X) = P(X | λ_i) P(λ_i) / P(X)

Since P(λ_i) is the probability of the i-th model being selected from a library of N models,

P(λ_i) = 1 / N

and P(X), the probability of the test speaker's feature set, is a definite constant that is the same for all testers. Therefore, the maximum posterior probability corresponds to the maximum P(X | λ_i); to reduce computational complexity, the logarithm is usually taken, as follows:

log P(X | λ_i) = SUM_t log p(x_t | λ_i)
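The derivation above reduces the MAP decision, under equal priors, to an argmax over summed log-likelihoods. A minimal sketch with a hypothetical `MapScorer` class (any density model can be plugged in through the `Model` interface):

```java
public class MapScorer {
    // With equal priors P(λ_i) = 1/N and P(X) common to all models, the MAP
    // decision reduces to the model maximizing log P(X|λ_i) = Σ_t log p(x_t|λ_i).
    public interface Model {
        double density(double[] x);   // p(x | λ_i) for one feature vector
    }

    public static int argmaxLogLikelihood(Model[] models, double[][] X) {
        int best = -1;
        double bestLl = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < models.length; i++) {
            double ll = 0;
            for (double[] x : X) ll += Math.log(models[i].density(x));
            if (ll > bestLl) { bestLl = ll; best = i; }
        }
        return best;   // index of the winning model, i.e. the identified speaker
    }
}
```

Summing log-densities instead of multiplying densities avoids floating-point underflow over long feature sequences, which is the "reduce computational complexity" point made in the text.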
Claims (2)
1. An Android-based voiceprint recognition method, comprising the following steps:
Step 1: Audio data is collected through the AudioRecorder interface using single-channel recording, a sampling frequency of 22050 Hz, pulse-code modulation and 16 quantization bits per sample; instances of the AcousticEchoCanceler, NoiseSuppressor and AutomaticGainControl classes are invoked to perform automatic echo cancellation, noise suppression and automatic gain control, achieving speech enhancement; at the same time, Android's asynchronous message handling mechanism lets a child thread update the UI and implement a timing function; as the raw audio data is obtained, a WAV-format header is written for it and the result is stored as a file in the corresponding training speaker's voice library; when recording ends, the software pops up a renaming window, and the user enters the corresponding training speaker's name as the file name;
Step 2: After the training speakers' voice library has been collected, all audio files in the library are preprocessed: framing is performed first, with a frame length of 16 ms and a frame shift of 8 ms; after framing, endpoint detection is carried out with the double-threshold method, and since speech enhancement in Step 1 is effective, the low energy threshold is set to 0.1, the high energy threshold to 1, the low zero-crossing-rate threshold to 0.01 and the high zero-crossing-rate threshold to 10; the longest silence within a speech segment is 12 frames, i.e., the time during which both the frame energy and the zero-crossing rate of a speech segment stay below the low energy threshold and the low zero-crossing-rate threshold may not exceed 12 frames, and the shortest speech duration is 10 frames, i.e., the time during which the frame energy of an initial speech segment exceeds the low energy threshold or the zero-crossing rate exceeds the low zero-crossing-rate threshold may not be shorter than 10 frames; after endpoint detection, the speech-segment signal is obtained and each frame is multiplied by a Hamming window; after windowing, pre-emphasis is applied to each frame of the speech-segment signal to compensate for the loss of high-frequency components, with the pre-emphasis coefficient set to 0.93;
Step 3: After preprocessing, the method extracts Mel-frequency cepstral coefficients, setting the number of triangular band-pass filters to 40 and keeping the first 12 coefficients of each frame; after MFCC extraction, the first 12 dimensions of the first-order and second-order difference MFCCs are extracted; after the difference features have been extracted, feature vectors whose corresponding frame energy is below 1 or above 10 are eliminated, removing identification errors caused by the speaker's volume being too high or too low;
Step 4: After feature extraction, the vector quantization model of each speaker is trained, generating a codebook that represents that speaker; using a balanced kd-tree construction algorithm, a kd-tree is built from the codewords of all trained speakers' codebooks, each codeword existing as a node of the tree;
Step 5: Meanwhile, in another thread, the program clusters the feature vectors with the k-means algorithm, with the number of classes set to 16; the k-means clustering process is repeated 10 times, the total within-cluster variance is computed after each run, and the run with the smallest total within-cluster variance is chosen as the final result; after clustering, the Gaussian model parameters are estimated with the EM algorithm, the post-clustering mean, variance and weight coefficient of each class first serving as the initial parameters of the Gaussian mixture model, and the parameters are re-estimated with the EM re-estimation formulas; during re-estimation the change in the logarithm of the likelihood value is computed, and when the change falls below the threshold 0.01 the model is judged to have converged and the current means, variances and weight coefficients are recorded; these three parameters, together with the captured voice file name, i.e., the training speaker's name, are stored in a speaker-model class, and the speaker-model instance is stored in a model dynamic array; when the size of the model dynamic array, i.e., the number of model-class instances, equals the number of speakers, all voice files have been trained; the program then serializes the model dynamic array, converting each model-class instance into a byte sequence, and stores the sequence in the Gaussian-mixture-model database file, completing training;
Step 6: Once the Gaussian-mixture-model database file has been established, testing can begin; after the test speaker's audio data has been recorded, the test speaker's voiceprint feature vector set is extracted according to Steps 2 and 3; a feature vector is chosen from the set, and in the kd-tree generated in Step 4 the M codewords with the smallest Euclidean distance to this feature vector are found and the codebooks containing them are looked up; this is repeated until the whole test feature vector set has been traversed; the K codebooks looked up most often are then found, where K < M, and the test speaker's voiceprint features are scored against the Gaussian mixture models of the K corresponding trained speakers; the model with the maximum posterior probability in the Gaussian-mixture-model database is found, and the speaker corresponding to that model is judged to be the test speaker.
2. A voiceprint recognition system corresponding to the voiceprint recognition method of claim 1, comprising the following modules:
Voice acquisition module: the hardware part comprises a microphone and a sound card, the microphone collecting the voice and the sound card digitizing it; the software part comprises audio data collection, automatic echo cancellation, noise suppression, automatic gain control and a timing function;
Voiceprint feature extraction module: comprises two submodules, voice preprocessing and feature extraction; its input is the audio data recorded by the voice acquisition module, and in the training stage the features are sent to the model training module for processing, while in the test stage they are sent to the model identification module;
Model training module: trains the vector quantization model of the voice, builds the kd-tree of codewords, and trains the Gaussian mixture model of the voice;
Model update module: inserts a newly added training speaker's vector quantization model into the kd-tree, and adds his or her Gaussian mixture model to the Gaussian-mixture-model database file;
Model identification module: its input is the output of the voiceprint feature extraction module, and its output is the test speaker's name and personal information;
Database management module: manages the training speakers' names and personal information, model parameters and voice files, where each model corresponds to one voice file, each voice file is named after the corresponding speaker, and each speaker's name is linked to his or her personal information in the database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710809811.XA | 2017-09-11 | 2017-09-11 | An Android-based voiceprint recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107993663A true CN107993663A (en) | 2018-05-04 |
Family
ID=62028944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710809811.XA Pending CN107993663A (en) | 2017-09-11 | 2017-09-11 | A kind of method for recognizing sound-groove based on Android |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107993663A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004088632A2 (en) * | 2003-03-26 | 2004-10-14 | Honda Motor Co., Ltd. | Speaker recognition using local models |
CN102509547A (en) * | 2011-12-29 | 2012-06-20 | 辽宁工业大学 | Method and system for voiceprint recognition based on vector quantization based |
CN104464738A (en) * | 2014-10-31 | 2015-03-25 | 北京航空航天大学 | Vocal print recognition method oriented to smart mobile device |
CN104573652A (en) * | 2015-01-04 | 2015-04-29 | 华为技术有限公司 | Method, device and terminal for determining identity identification of human face in human face image |
CN106682650A (en) * | 2017-01-26 | 2017-05-17 | 北京中科神探科技有限公司 | Mobile terminal face recognition method and system based on technology of embedded deep learning |
Non-Patent Citations (5)
Title |
---|
MARIO VARGA ET AL: "Performance Evaluation of GMM and KD-KNN Algorithms Implemented in Speaker Identification Web-Application Based on Java EE", 56TH INTERNATIONAL SYMPOSIUM ELMAR-2014 * |
张志兵 (ZHANG ZHIBING): "Research on Spatial Data Mining and Related Problems" (《空间数据挖掘及其相关问题研究》), 31 October 2011 * |
曾向阳 (ZENG XIANGYANG): "Intelligent Underwater Target Recognition" (《智能水中目标识别》), 31 March 2016 * |
赵力 (ZHAO LI): "Speech Signal Processing" (《语音信号处理》), 31 May 2009 * |
鲁晓倩 (LU XIAOQIAN): "Research on Real-Time Voiceprint Recognition Based on VQ and GMM" (基于VQ和GMM的实时声纹识别研究), 《计算机***应用》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108766435A (en) * | 2018-05-17 | 2018-11-06 | 东莞市华睿电子科技有限公司 | A kind of robot for space control method based on non-touch |
CN108922543A (en) * | 2018-06-11 | 2018-11-30 | 平安科技(深圳)有限公司 | Model library method for building up, audio recognition method, device, equipment and medium |
CN108922543B (en) * | 2018-06-11 | 2022-08-16 | 平安科技(深圳)有限公司 | Model base establishing method, voice recognition method, device, equipment and medium |
CN109147818A (en) * | 2018-10-30 | 2019-01-04 | Oppo广东移动通信有限公司 | Acoustic feature extracting method, device, storage medium and terminal device |
CN113168837A (en) * | 2018-11-22 | 2021-07-23 | 三星电子株式会社 | Method and apparatus for processing human voice data of voice |
CN109243465A (en) * | 2018-12-06 | 2019-01-18 | 平安科技(深圳)有限公司 | Voiceprint authentication method, device, computer equipment and storage medium |
CN109801635A (en) * | 2019-01-31 | 2019-05-24 | 北京声智科技有限公司 | A kind of vocal print feature extracting method and device based on attention mechanism |
CN110556114A (en) * | 2019-07-26 | 2019-12-10 | 国家计算机网络与信息安全管理中心 | Speaker identification method and device based on attention mechanism |
CN110992930A (en) * | 2019-12-06 | 2020-04-10 | 广州国音智能科技有限公司 | Voiceprint feature extraction method and device, terminal and readable storage medium |
CN111243601A (en) * | 2019-12-31 | 2020-06-05 | 北京捷通华声科技股份有限公司 | Voiceprint clustering method and device, electronic equipment and computer-readable storage medium |
CN111640419A (en) * | 2020-05-26 | 2020-09-08 | 合肥讯飞数码科技有限公司 | Language identification method, system, electronic equipment and storage medium |
CN111640419B (en) * | 2020-05-26 | 2023-04-07 | 合肥讯飞数码科技有限公司 | Language identification method, system, electronic equipment and storage medium |
WO2021174883A1 (en) * | 2020-09-22 | 2021-09-10 | 平安科技(深圳)有限公司 | Voiceprint identity-verification model training method, apparatus, medium, and electronic device |
CN112201275A (en) * | 2020-10-09 | 2021-01-08 | 深圳前海微众银行股份有限公司 | Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium |
CN112201275B (en) * | 2020-10-09 | 2024-05-07 | 深圳前海微众银行股份有限公司 | Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium |
CN112951245A (en) * | 2021-03-09 | 2021-06-11 | 江苏开放大学(江苏城市职业学院) | Dynamic voiceprint feature extraction method integrated with static component |
CN114530163A (en) * | 2021-12-31 | 2022-05-24 | 安徽云磬科技产业发展有限公司 | Method and system for recognizing life cycle of equipment by adopting voice based on density clustering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107993663A (en) | A kind of method for recognizing sound-groove based on Android | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization based | |
WO2019232829A1 (en) | Voiceprint recognition method and apparatus, computer device and storage medium | |
CN107610707A (en) | A kind of method for recognizing sound-groove and device | |
CN110428842A (en) | Speech model training method, device, equipment and computer readable storage medium | |
TW201935464A (en) | Method and device for voiceprint recognition based on memorability bottleneck features | |
CN106952649A (en) | Method for distinguishing speek person based on convolutional neural networks and spectrogram | |
CN101923855A (en) | Test-irrelevant voice print identifying system | |
CN105096955B (en) | A kind of speaker's method for quickly identifying and system based on model growth cluster | |
CN102324232A (en) | Method for recognizing sound-groove and system based on gauss hybrid models | |
CN102968990B (en) | Speaker identifying method and system | |
CN113223536B (en) | Voiceprint recognition method and device and terminal equipment | |
CN109584884A (en) | A kind of speech identity feature extractor, classifier training method and relevant device | |
CN111724770B (en) | Audio keyword identification method for generating confrontation network based on deep convolution | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
CN103794207A (en) | Dual-mode voice identity recognition method | |
CN108922543A (en) | Model library method for building up, audio recognition method, device, equipment and medium | |
CN113221673B (en) | Speaker authentication method and system based on multi-scale feature aggregation | |
CN104887263A (en) | Identity recognition algorithm based on heart sound multi-dimension feature extraction and system thereof | |
CN111899757A (en) | Single-channel voice separation method and system for target speaker extraction | |
CN113488060B (en) | Voiceprint recognition method and system based on variation information bottleneck | |
CN112053694A (en) | Voiceprint recognition method based on CNN and GRU network fusion | |
CN109961794A (en) | A kind of layering method for distinguishing speek person of model-based clustering | |
CN102496366B (en) | Speaker identification method irrelevant with text | |
CN112562725A (en) | Mixed voice emotion classification method based on spectrogram and capsule network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20180504 |