CN107993663A - An Android-based voiceprint recognition method - Google Patents

An Android-based voiceprint recognition method

Info

Publication number
CN107993663A
Authority
CN
China
Prior art keywords
model
speaker
voice
training
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710809811.XA
Other languages
Chinese (zh)
Inventor
陈立江
窦文韬
张旭东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201710809811.XA priority Critical patent/CN107993663A/en
Publication of CN107993663A publication Critical patent/CN107993663A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 - Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 - User authentication
    • G06F21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention is an Android-based voiceprint recognition method that runs on the Android operating system. It captures training speakers' audio through the device's built-in recording hardware, applies speech enhancement, and during training builds a kd-tree of vector quantization (VQ) codewords together with a Gaussian mixture model (GMM) for each speaker. At recognition time, the VQ kd-tree is searched for the K enrolled speakers whose voiceprint features are closest to the test speaker's, and the GMMs then perform the precise identification among those K candidates. By using the VQ kd-tree, the invention avoids traversing every model in the GMM library, which increases recognition speed; the two-stage identification (VQ followed by GMM) also increases recognition accuracy. The method offers strong practicality, ease of use, and robustness.

Description

An Android-based voiceprint recognition method
(1) Technical field:
The present invention, an Android-based voiceprint recognition method, belongs to the field of computer technology.
(2) Background technology:
In an era of ubiquitous accounts and passwords, people are often frustrated by forgotten or lost passwords. Voiceprint recognition offers a more convenient and efficient alternative: voiceprint features are "carried with you," so identity authentication can be performed anytime, anywhere. The present invention, built on the Android operating system, extracts voiceprint features and identifies the speaker by constructing a kd-tree (k-dimensional tree) over a vector quantization model together with Gaussian mixture models. The invention improves search and recognition speed and precision, and offers strong practicality, ease of use, and robustness.
(3) Content of the invention:
The Android-based voiceprint recognition method of the present invention extracts Mel-frequency cepstral coefficients (MFCC), first-order difference MFCC, second-order difference MFCC, and frame energy from the speech, and uses the frame energy to screen out invalid features. It then constructs a kd-tree over the vector quantization (VQ) codewords of the training speakers' voiceprint features and builds a library of Gaussian mixture models (GMM) of those features. During testing, the kd-tree is first searched for the K models most similar to the test voiceprint features, and the GMMs then perform the precise identification among those K candidates, thereby determining the speaker's identity.
1. An Android-based voiceprint recognition method, comprising the following steps:
Step 1: Audio data is captured through the AudioRecorder interface using single-channel (mono) recording, a sampling frequency of 22,050 Hz, pulse-code modulation, and 16 quantization bits per sample. At the same time, instances of the AcousticEchoCanceler, NoiseSuppressor, and AutomaticGainControl classes are invoked to perform acoustic echo cancellation, noise suppression, and automatic gain control, achieving speech enhancement. Android's asynchronous message-handling mechanism is used so that a worker thread can update the UI and implement a timing function. As raw audio data is acquired, a WAV-format header is written for it and the file is stored in the corresponding training-speaker voice library; when recording ends, the software pops up a renaming window and the user enters the corresponding training speaker's name as the file name;
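The patent's recorder runs on Android (Java). Purely as an illustration, the WAV wrapping described in Step 1 can be sketched in Python with the standard-library wave module, using the stated parameters (mono, 22,050 Hz, 16-bit PCM); the function name is my own:

```python
import io
import struct
import wave

def write_wav(pcm_bytes: bytes, sample_rate: int = 22050) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container, as Step 1
    does for each enrolled speaker's recording (parameters from the patent:
    mono, 22,050 Hz, 16 bits per sample)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)         # mono recording
        w.setsampwidth(2)         # 16-bit quantization
        w.setframerate(sample_rate)
        w.writeframes(pcm_bytes)
    return buf.getvalue()

# One hypothetical buffer of silence (1000 zero samples).
pcm = struct.pack("<1000h", *([0] * 1000))
wav = write_wav(pcm)
```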
Step 2: After the training-speaker voice library has been collected, all audio files in the library are preprocessed. Framing is performed first, with a frame length of 16 ms and a frame shift of 8 ms. After framing, endpoint detection is carried out with the double-threshold method. Because the speech enhancement in Step 1 is effective, the low energy threshold is set to 0.1, the high energy threshold to 1, the low zero-crossing-rate threshold to 0.01, and the high zero-crossing-rate threshold to 10. The longest allowed silence within a speech segment is 12 frames; that is, the frame energy and zero-crossing rate of a speech segment may fall below the low energy and low zero-crossing-rate thresholds simultaneously for at most 12 frames. The shortest allowed speech is 10 frames; that is, a candidate speech segment whose frame energy exceeds the low energy threshold or whose zero-crossing rate exceeds the zero-crossing-rate threshold must last at least 10 frames. After endpoint detection, the speech-segment signal is obtained and each frame is multiplied by a Hamming window. After windowing, pre-emphasis is applied to each frame of the speech segment to compensate for the loss of high-frequency components; this method sets the pre-emphasis coefficient to 0.93;
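A minimal pure-Python sketch of the framing, windowing, and pre-emphasis just described (frame and hop sizes follow Step 2's 16 ms / 8 ms at 22,050 Hz, pre-emphasis coefficient 0.93; the double-threshold endpoint detection is omitted for brevity, and all function names are my own):

```python
import math

def preemphasize(frame, alpha=0.93):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1] (alpha = 0.93 per Step 2)."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1] for n in range(1, len(frame))]

def hamming(n_samples):
    """Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_samples - 1))
            for i in range(n_samples)]

def frame_signal(signal, sample_rate=22050, frame_ms=16, hop_ms=8):
    """Split a signal into overlapping frames: 16 ms frames, 8 ms hop."""
    flen = int(sample_rate * frame_ms / 1000)   # 352 samples at 22,050 Hz
    hop = int(sample_rate * hop_ms / 1000)      # 176 samples
    return [signal[start:start + flen]
            for start in range(0, len(signal) - flen + 1, hop)]

def preprocess(signal):
    """Frame, window, then pre-emphasize, in the order Step 2 describes."""
    frames = frame_signal(signal)
    win = hamming(len(frames[0])) if frames else []
    return [preemphasize([s * w for s, w in zip(f, win)]) for f in frames]
```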
Step 3: After preprocessing, MFCC features are extracted, with the number of triangular band-pass filters set to 40 and the first 12 coefficients retained per frame. After MFCC extraction, the first 12 dimensions of the first-order and second-order difference MFCC are extracted. Once the differential features have been extracted, feature vectors whose corresponding frame energy is below 1 or above 10 are discarded, eliminating recognition errors caused by the speaker's voice being too loud or too soft;
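The frame-energy screening in Step 3, and the use of frame energy as an extra feature dimension, can be sketched as follows (illustrative helper names, not from the patent):

```python
def screen_by_energy(features, energies, low=1.0, high=10.0):
    """Keep only feature vectors whose frame energy lies in [low, high],
    discarding frames where the speaker was too quiet or too loud (Step 3)."""
    return [f for f, e in zip(features, energies) if low <= e <= high]

def append_energy(features, energies):
    """Append each frame's energy as the final feature dimension."""
    return [list(f) + [e] for f, e in zip(features, energies)]
```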
Step 4: After feature extraction, each speaker's vector quantization model is trained, generating a codebook that represents that speaker. Using a balanced kd-tree construction algorithm, a kd-tree is built from the codewords of all training speakers' codebooks, each codeword existing as one node of the tree;
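A sketch of the balanced kd-tree construction in Step 4, splitting at the median of a cycling axis so that each (codeword, speaker) pair becomes one node (a minimal illustration; the class and function names are my own):

```python
class KDNode:
    def __init__(self, point, label, left=None, right=None):
        self.point, self.label = point, label    # codeword and the speaker it came from
        self.left, self.right = left, right

def build_kd_tree(points, depth=0):
    """Build a balanced kd-tree from (codeword, speaker) pairs by sorting on
    the cycling axis and recursing on either side of the median (Step 4)."""
    if not points:
        return None
    k = len(points[0][0])
    axis = depth % k
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    point, label = points[mid]
    return KDNode(point, label,
                  build_kd_tree(points[:mid], depth + 1),
                  build_kd_tree(points[mid + 1:], depth + 1))
```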
Step 5: Meanwhile, in another thread, the program clusters the feature vectors into 16 classes with the k-means algorithm. The k-means procedure is repeated 10 times; the total within-cluster variance is computed after each run, and the run with the smallest total within-cluster variance is kept as the final result. After clustering, the Gaussian model parameters are estimated with the EM algorithm: the post-clustering means, variances, and weight coefficients serve as the initial parameters of the Gaussian mixture model, and each parameter is re-estimated with the EM re-estimation formulas. During re-estimation, the change in the log-likelihood value is computed; when the change falls below the threshold 0.01, convergence is declared and the current means, variances, and weight coefficients are recorded. These three parameters, together with the captured voice file name (i.e., the training speaker's name), are stored in a speaker-model class, and the speaker-model instance is stored in a model dynamic array. When the size of the dynamic array (the number of model-class instances) equals the number of speakers, all voice files have been trained; the program then serializes the dynamic array, converting each model-class instance into a byte sequence, and stores the sequence in the Gaussian-mixture-model database file, completing training;
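Step 5's EM re-estimation with a 0.01 log-likelihood convergence threshold can be sketched in one dimension as follows. This is an illustration only: the patent uses 16 mixtures over multi-dimensional MFCC features with k-means initialization, whereas here the initial means are simply spread over the sorted data:

```python
import math

def gmm_em_1d(data, k=2, tol=0.01, max_iter=200):
    """Minimal 1-D EM for a Gaussian mixture: spread initial means, then
    iterate E/M steps until the log-likelihood changes by less than tol
    (the patent's 0.01 threshold). Returns (weights, means, variances)."""
    s = sorted(data)
    means = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev_ll = None
    for _ in range(max_iter):
        resp, ll = [], 0.0
        for x in data:   # E-step: posterior responsibility of each component
            probs = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(probs)
            ll += math.log(total)
            resp.append([p / total for p in probs])
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break        # converged: log-likelihood change below threshold
        prev_ll = ll
        for j in range(k):   # M-step: re-estimate weight, mean, variance
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, 1e-6)
    return weights, means, variances
```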
Step 6: Once the GMM database file has been created, testing can begin. After the test speaker's audio has been recorded, the test speaker's voiceprint feature vector set is extracted according to Steps 2 and 3. One feature vector is chosen from the set; the M codewords nearest to it in Euclidean distance are found in the kd-tree generated in Step 4, and the codebooks those M codewords belong to are looked up. This is repeated until the whole test feature-vector set has been traversed. The K most frequently found codebooks are then selected, where K < M, and GMM identification is performed for the K training speakers corresponding to those K codebooks: among their Gaussian mixture models, the program finds the model with the maximum posterior probability given the test features, and the speaker corresponding to that model is judged to be the test speaker.
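The shortlisting in Step 6 (nearest codewords per test vector, then voting for the K most-hit codebooks) can be sketched as follows; for brevity a linear scan stands in for the kd-tree search, and the names are my own:

```python
from collections import Counter

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def shortlist_speakers(test_vectors, codewords, m=3, k=2):
    """Step 6 shortlist: for every test vector, find its m nearest codewords
    (here by linear scan; the patent searches a kd-tree), count how often each
    speaker's codebook is hit, and keep the k most-hit speakers for GMM scoring."""
    hits = Counter()
    for v in test_vectors:
        nearest = sorted(codewords, key=lambda cw: euclid(v, cw[0]))[:m]
        for _, speaker in nearest:
            hits[speaker] += 1
    return [s for s, _ in hits.most_common(k)]
```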
2. A voiceprint recognition system corresponding to the method described in 1, comprising the following modules:
Voice acquisition module: the hardware part comprises a microphone and a sound card; the microphone collects the voice and the sound card digitizes it. The software part comprises audio data collection, acoustic echo cancellation, noise suppression, automatic gain control, and timing;
Voiceprint feature extraction module: comprises two submodules, voice preprocessing and feature extraction. Its input is the audio data recorded by the voice acquisition module; in the training stage the extracted features are sent to the model training module, and in the testing stage they are sent to the model identification module;
Model training module: trains the vector quantization model of the voice, builds the kd-tree of codewords, and trains the Gaussian mixture model of the voice;
Model update module: inserts a newly added training speaker's vector quantization model into the kd-tree and adds that speaker's Gaussian mixture model to the GMM database file;
Model identification module: its input is the output of the voiceprint feature extraction module, and its output is the test speaker's name and personal information;
Database management module: manages the training speakers' names and personal information, model parameters, and voice files. Each model corresponds to one voice file, each voice file is named after its speaker, and each speaker's name is linked to that speaker's personal information in the database.
(4) Description of the drawings:
Fig. 1: model training flowchart;
Fig. 2: model identification flowchart;
Fig. 3: asynchronous message-handling mechanism;
Fig. 4: preprocessing flowchart;
Fig. 5: voiceprint feature extraction flowchart;
Fig. 6: triangular band-pass filters;
Fig. 7: Gaussian mixture model construction flowchart.
(5) Embodiments:
The technical solutions of the invention are further elaborated below with reference to the accompanying drawings.
The software of the present invention mainly comprises the algorithmic implementation of model training and identification, and the client control interface. The algorithm part mainly covers framing and windowing of the audio data, endpoint detection, pre-emphasis, feature extraction, and model training and identification; the training and identification flows are shown in Figs. 1 and 2. The client control interface is mainly used for the user's operation of the software.
Audio capture: mainly used for collecting audio data, timing, and speech enhancement. The audio capture subroutine calls an AudioRecorder class instance to record the audio data; after recording, the program writes a WAV-format header for the data, names the file after the speaker, and stores it in the voice library. To achieve speech enhancement while capturing audio, the program invokes instances of the AcousticEchoCanceler, NoiseSuppressor, and AutomaticGainControl classes to perform acoustic echo cancellation, noise suppression, and automatic gain control. The program also implements timing through Android's asynchronous message-handling mechanism, whose flow is shown in Fig. 3;
Feature extraction: mainly comprises the preprocessing and feature-extraction algorithms. The preprocessing part is shown in Fig. 4. The audio is first framed so that it exhibits relatively stationary characteristics and can be processed piecewise. Endpoint detection is then performed on the captured audio to locate the speech segments: the zero-crossing rate and frame energy of the current frame are computed; if at least one of the two parameters exceeds its threshold and the condition lasts long enough, the current frame is marked as a start point, otherwise the next frame is examined; if both parameters fall below their thresholds and the condition lasts long enough, the frame is marked as an end point, otherwise the next frame is examined. After the start and end frames of a speech segment are determined, each frame is multiplied by a Hamming window to ensure continuity between frames, and pre-emphasis is applied to boost the high-frequency components. The feature extraction part is shown in Fig. 5. After preprocessing, the program extracts and screens the features. Once all frames have been processed, the frames whose energy lies within the normal range are retained; an FFT is applied to each remaining frame, and the spectrum is squared to obtain the energy spectrum. Meanwhile, the program designs the triangular band-pass filters. Experimental observation shows that the human ear behaves like a bank of filters that selectively attends to certain frequencies; these filters are not uniformly distributed on the frequency axis but are dense in the low-frequency region and sparse in the high-frequency region, as shown in Fig. 6. The design procedure of the triangular band-pass filter bank is:
(1) determine the lowest and highest frequencies of the speech signal and the number of Mel filters;
(2) compute the Mel frequencies corresponding to the lowest and highest frequencies;
(3) since the filter center frequencies are evenly spaced on the Mel frequency axis, compute the distance between the center frequencies of adjacent Mel filters;
(4) convert each Mel center frequency back to an actual frequency and compute the FFT bin index corresponding to each frequency point.
(5) compute the amplitude values of the triangular band-pass filter bank according to the formula:

$$H_m(k)=\begin{cases}0, & k<f(m-1)\\[4pt] \dfrac{k-f(m-1)}{f(m)-f(m-1)}, & f(m-1)\le k\le f(m)\\[4pt] \dfrac{f(m+1)-k}{f(m+1)-f(m)}, & f(m)\le k\le f(m+1)\\[4pt] 0, & k>f(m+1)\end{cases}$$

where $f(m)$ denotes the FFT bin index corresponding to the m-th center frequency point and $H_m$ denotes the m-th filter. Let $o(m)$, $c(m)$, and $h(m)$ be the lower-limit, center, and upper-limit frequencies of the m-th triangular filter, respectively; adjacent filters then satisfy:

$$c(m)=h(m-1)=o(m+1)$$
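Steps (1) through (4) of the filter-bank design can be sketched as follows, using the usual Mel conversion formulas (a minimal illustration; function names are my own):

```python
import math

def hz_to_mel(f):
    """Convert Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert Mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_points(f_low, f_high, n_filters, n_fft, sample_rate):
    """Place n_filters + 2 points evenly on the Mel axis between f_low and
    f_high, convert back to Hz, and map to FFT bin indices. Adjacent filters
    then share boundary points: c(m) = h(m-1) = o(m+1)."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    mels = [m_low + i * (m_high - m_low) / (n_filters + 1)
            for i in range(n_filters + 2)]
    return [int(round((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
```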
After the filter bank has been designed, the energy spectrum of each frame is filtered with the filter bank; the logarithm of the result is taken and a discrete cosine transform is applied, and the output is the Mel-frequency cepstral coefficients. The first-order and second-order difference MFCC are then computed from the MFCC according to:

$$d_t=\frac{\sum_{n=1}^{N} n\,(c_{t+n}-c_{t-n})}{2\sum_{n=1}^{N} n^2}$$

where $d_t$ denotes the t-th difference coefficient and $N$ is generally taken as 2. When computing the first-order difference, $c$ is the MFCC; when computing the second-order difference, $c$ is the first-order difference MFCC, and each higher-order difference is likewise computed from the order below it. This method extracts the first- and second-order differences; since the parameter amplitudes beyond the 12th dimension are essentially zero, only the first 12 dimensions are needed. Finally, the program appends the frame energy as the last dimension of each frame's feature vector, as an additional voiceprint feature, and discards feature vectors whose frame energy lies outside the range 1 to 10.
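A sketch of the difference computation for one cepstral coefficient tracked across frames, using the standard regression form with N = 2 and repeated-edge padding (an illustrative implementation, not the patent's code):

```python
def delta(coeffs, n=2):
    """Delta (difference) coefficients:
    d_t = sum_{i=1..n} i * (c_{t+i} - c_{t-i}) / (2 * sum_{i=1..n} i^2),
    with edge frames padded by repetition. Applying it twice yields
    the second-order difference."""
    denom = 2 * sum(i * i for i in range(1, n + 1))
    padded = [coeffs[0]] * n + list(coeffs) + [coeffs[-1]] * n
    return [sum(i * (padded[t + i] - padded[t - i]) for i in range(1, n + 1)) / denom
            for t in range(n, n + len(coeffs))]
```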
Construction of the vector quantization kd-tree: after the training speakers' speech feature vector sets are obtained, each speaker's vector quantization model is trained, and the codewords of all codebooks obtained from training are assembled into a balanced kd-tree following the balanced-kd-tree construction procedure; every codeword is represented by one node of the tree.
Training of the Gaussian mixture model: the algorithm flow is shown in Fig. 7. The program classifies the captured training speaker's voiceprint feature vectors with the k-means clustering algorithm; the mean and variance of each class serve as the initial mean and variance of the Gaussian mixture model, the ratio of the number of vectors in each class to the total number of feature vectors serves as the initial weight, and the initial number of classes serves as the degree of mixing of the GMM. The EM algorithm then iterates over the model's means, variances, and weights until convergence.
Search of the vector quantization kd-tree: after the test speaker's feature vector set is obtained, one feature vector is chosen from it, and the M codewords most similar to that vector are found with the kd-tree's nearest-neighbor search algorithm; the codebooks they belong to are looked up. All feature vectors in the set are traversed and the above steps repeated, after which the K most frequently found codebooks are selected, where K < M. The training speakers corresponding to those K codebooks are then passed to the Gaussian mixture models for precise identification.
Gaussian mixture model identification: after the K training speakers closest to the test speaker have been found, their Gaussian mixture models are retrieved from the GMM library, and among the K models the parameter set $\lambda_i$ is found that maximizes the posterior probability $P(\lambda_i \mid X)$ of the test speaker's feature vector set $X$; the training speaker corresponding to $\lambda_i$ is determined to be the test speaker.
According to Bayes' formula:

$$P(\lambda_i \mid X)=\frac{P(X \mid \lambda_i)\,P(\lambda_i)}{P(X)}$$

Since $P(\lambda_i)$ is the prior probability that the i-th of the $N$ models in the library is selected,

$$P(\lambda_i)=\frac{1}{N}$$

and $P(X)$, the probability of the test speaker's features, is a fixed constant that is the same for all candidates. Therefore maximizing the posterior probability is equivalent to maximizing $P(X \mid \lambda_i)$. To reduce computational complexity, the logarithm is usually taken, as follows:

$$\log P(X \mid \lambda_i)=\sum_{t=1}^{T}\log\, p(x_t \mid \lambda_i)$$
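The log-domain comparison can be sketched as follows, with diagonal-covariance mixtures and equal priors (an illustrative model format and naming, not the patent's code):

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_likelihood(X, gmm):
    """log P(X | lambda) = sum_t log sum_j w_j N(x_t; mu_j, sigma_j^2);
    gmm is a list of (weight, mean_vector, variance_vector) components."""
    return sum(math.log(sum(w * math.exp(log_gauss(x, m, v)) for w, m, v in gmm))
               for x in X)

def identify(X, models):
    """With equal priors P(lambda_i) = 1/N, the maximum-posterior speaker is
    simply the model maximizing log P(X | lambda_i)."""
    return max(models, key=lambda name: log_likelihood(X, models[name]))
```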

Claims (2)

1. An Android-based voiceprint recognition method, comprising the following steps:
Step 1: Audio data is captured through the AudioRecorder interface using single-channel (mono) recording, a sampling frequency of 22,050 Hz, pulse-code modulation, and 16 quantization bits per sample. At the same time, instances of the AcousticEchoCanceler, NoiseSuppressor, and AutomaticGainControl classes are invoked to perform acoustic echo cancellation, noise suppression, and automatic gain control, achieving speech enhancement. Android's asynchronous message-handling mechanism is used so that a worker thread can update the UI and implement a timing function. As raw audio data is acquired, a WAV-format header is written for it and the file is stored in the corresponding training-speaker voice library; when recording ends, the software pops up a renaming window and the user enters the corresponding training speaker's name as the file name;
Step 2: After the training-speaker voice library has been collected, all audio files in the library are preprocessed. Framing is performed first, with a frame length of 16 ms and a frame shift of 8 ms. After framing, endpoint detection is carried out with the double-threshold method. Because the speech enhancement in Step 1 is effective, the low energy threshold is set to 0.1, the high energy threshold to 1, the low zero-crossing-rate threshold to 0.01, and the high zero-crossing-rate threshold to 10. The longest allowed silence within a speech segment is 12 frames; that is, the frame energy and zero-crossing rate of a speech segment may fall below the low energy and low zero-crossing-rate thresholds simultaneously for at most 12 frames. The shortest allowed speech is 10 frames; that is, a candidate speech segment whose frame energy exceeds the low energy threshold or whose zero-crossing rate exceeds the zero-crossing-rate threshold must last at least 10 frames. After endpoint detection, the speech-segment signal is obtained and each frame is multiplied by a Hamming window. After windowing, pre-emphasis is applied to each frame of the speech segment to compensate for the loss of high-frequency components; the pre-emphasis coefficient is set to 0.93;
Step 3: After preprocessing, MFCC features are extracted, with the number of triangular band-pass filters set to 40 and the first 12 coefficients retained per frame. After MFCC extraction, the first 12 dimensions of the first-order and second-order difference MFCC are extracted. Once the differential features have been extracted, feature vectors whose corresponding frame energy is below 1 or above 10 are discarded, eliminating recognition errors caused by the speaker's voice being too loud or too soft;
Step 4: After feature extraction, each speaker's vector quantization model is trained, generating a codebook that represents that speaker. Using a balanced kd-tree construction algorithm, a kd-tree is built from the codewords of all training speakers' codebooks, each codeword existing as one node of the tree;
Step 5: Meanwhile, in another thread, the program clusters the feature vectors into 16 classes with the k-means algorithm. The k-means procedure is repeated 10 times; the total within-cluster variance is computed after each run, and the run with the smallest total within-cluster variance is kept as the final result. After clustering, the Gaussian model parameters are estimated with the EM algorithm: the post-clustering means, variances, and weight coefficients serve as the initial parameters of the Gaussian mixture model, and each parameter is re-estimated with the EM re-estimation formulas. During re-estimation, the change in the log-likelihood value is computed; when the change falls below the threshold 0.01, convergence is declared and the current means, variances, and weight coefficients are recorded. These three parameters, together with the captured voice file name (i.e., the training speaker's name), are stored in a speaker-model class, and the speaker-model instance is stored in a model dynamic array. When the size of the dynamic array (the number of model-class instances) equals the number of speakers, all voice files have been trained; the program then serializes the dynamic array, converting each model-class instance into a byte sequence, and stores the sequence in the Gaussian-mixture-model database file, completing training;
Step 6: Once the GMM database file has been created, testing can begin. After the test speaker's audio has been recorded, the test speaker's voiceprint feature vector set is extracted according to Steps 2 and 3. One feature vector is chosen from the set; the M codewords nearest to it in Euclidean distance are found in the kd-tree generated in Step 4, and the codebooks those M codewords belong to are looked up. This is repeated until the whole test feature-vector set has been traversed. The K most frequently found codebooks are then selected, where K < M, and GMM identification is performed for the K training speakers corresponding to those K codebooks: among their Gaussian mixture models, the program finds the model with the maximum posterior probability given the test features, and the speaker corresponding to that model is judged to be the test speaker.
2. A voiceprint recognition system corresponding to the method of claim 1, comprising the following modules:
Voice acquisition module: the hardware part comprises a microphone and a sound card; the microphone collects the voice and the sound card digitizes it. The software part comprises audio data collection, acoustic echo cancellation, noise suppression, automatic gain control, and timing;
Voiceprint feature extraction module: comprises two submodules, voice preprocessing and feature extraction. Its input is the audio data recorded by the voice acquisition module; in the training stage the extracted features are sent to the model training module, and in the testing stage they are sent to the model identification module;
Model training module: trains the vector quantization model of the voice, builds the kd-tree of codewords, and trains the Gaussian mixture model of the voice;
Model update module: inserts a newly added training speaker's vector quantization model into the kd-tree and adds that speaker's Gaussian mixture model to the GMM database file;
Model identification module: its input is the output of the voiceprint feature extraction module, and its output is the test speaker's name and personal information;
Database management module: manages the training speakers' names and personal information, model parameters, and voice files. Each model corresponds to one voice file, each voice file is named after its speaker, and each speaker's name is linked to that speaker's personal information in the database.
CN201710809811.XA 2017-09-11 2017-09-11 An Android-based voiceprint recognition method Pending CN107993663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710809811.XA CN107993663A (en) 2017-09-11 2017-09-11 An Android-based voiceprint recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710809811.XA CN107993663A (en) 2017-09-11 2017-09-11 An Android-based voiceprint recognition method

Publications (1)

Publication Number Publication Date
CN107993663A true CN107993663A (en) 2018-05-04

Family

ID=62028944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710809811.XA Pending CN107993663A (en) An Android-based voiceprint recognition method

Country Status (1)

Country Link
CN (1) CN107993663A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004088632A2 (en) * 2003-03-26 2004-10-14 Honda Motor Co., Ltd. Speaker recognition using local models
CN102509547A (en) * 2011-12-29 2012-06-20 辽宁工业大学 Method and system for voiceprint recognition based on vector quantization based
CN104464738A (en) * 2014-10-31 2015-03-25 北京航空航天大学 Vocal print recognition method oriented to smart mobile device
CN104573652A (en) * 2015-01-04 2015-04-29 华为技术有限公司 Method, device and terminal for determining identity identification of human face in human face image
CN106682650A (en) * 2017-01-26 2017-05-17 北京中科神探科技有限公司 Mobile terminal face recognition method and system based on technology of embedded deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MARIO VARGA ET AL: "Performance Evaluation of GMM and KD-KNN Algorithms Implemented in Speaker Identification Web-Application Based on Java EE", 56th International Symposium ELMAR-2014 *
ZHANG ZHIBING: "Spatial Data Mining and Related Issues", 31 October 2011 *
ZENG XIANGYANG: "Intelligent Underwater Target Recognition", 31 March 2016 *
ZHAO LI: "Speech Signal Processing", 31 May 2009 *
LU XIAOQIAN: "Research on Real-Time Voiceprint Recognition Based on VQ and GMM", Computer Systems & Applications *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108766435A (en) * 2018-05-17 2018-11-06 东莞市华睿电子科技有限公司 A kind of robot for space control method based on non-touch
CN108922543A (en) * 2018-06-11 2018-11-30 平安科技(深圳)有限公司 Model library method for building up, audio recognition method, device, equipment and medium
CN108922543B (en) * 2018-06-11 2022-08-16 平安科技(深圳)有限公司 Model base establishing method, voice recognition method, device, equipment and medium
CN109147818A (en) * 2018-10-30 2019-01-04 Oppo广东移动通信有限公司 Acoustic feature extracting method, device, storage medium and terminal device
CN113168837A (en) * 2018-11-22 2021-07-23 三星电子株式会社 Method and apparatus for processing human voice data of voice
CN109243465A (en) * 2018-12-06 2019-01-18 平安科技(深圳)有限公司 Voiceprint authentication method, device, computer equipment and storage medium
CN109801635A (en) * 2019-01-31 2019-05-24 北京声智科技有限公司 A kind of vocal print feature extracting method and device based on attention mechanism
CN110556114A (en) * 2019-07-26 2019-12-10 国家计算机网络与信息安全管理中心 Speaker identification method and device based on attention mechanism
CN110992930A (en) * 2019-12-06 2020-04-10 广州国音智能科技有限公司 Voiceprint feature extraction method and device, terminal and readable storage medium
CN111243601A (en) * 2019-12-31 2020-06-05 北京捷通华声科技股份有限公司 Voiceprint clustering method and device, electronic equipment and computer-readable storage medium
CN111640419A (en) * 2020-05-26 2020-09-08 合肥讯飞数码科技有限公司 Language identification method, system, electronic equipment and storage medium
CN111640419B (en) * 2020-05-26 2023-04-07 合肥讯飞数码科技有限公司 Language identification method, system, electronic equipment and storage medium
WO2021174883A1 (en) * 2020-09-22 2021-09-10 平安科技(深圳)有限公司 Voiceprint identity-verification model training method, apparatus, medium, and electronic device
CN112201275A (en) * 2020-10-09 2021-01-08 深圳前海微众银行股份有限公司 Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium
CN112201275B (en) * 2020-10-09 2024-05-07 深圳前海微众银行股份有限公司 Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium
CN112951245A (en) * 2021-03-09 2021-06-11 江苏开放大学(江苏城市职业学院) Dynamic voiceprint feature extraction method integrated with static component
CN114530163A (en) * 2021-12-31 2022-05-24 安徽云磬科技产业发展有限公司 Method and system for recognizing life cycle of equipment by adopting voice based on density clustering

Similar Documents

Publication Publication Date Title
CN107993663A (en) A kind of method for recognizing sound-groove based on Android
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization based
WO2019232829A1 (en) Voiceprint recognition method and apparatus, computer device and storage medium
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN110428842A (en) Speech model training method, device, equipment and computer readable storage medium
TW201935464A (en) Method and device for voiceprint recognition based on memorability bottleneck features
CN106952649A (en) Method for distinguishing speek person based on convolutional neural networks and spectrogram
CN101923855A (en) Test-irrelevant voice print identifying system
CN105096955B (en) A kind of speaker&#39;s method for quickly identifying and system based on model growth cluster
CN102324232A (en) Method for recognizing sound-groove and system based on gauss hybrid models
CN102968990B (en) Speaker identifying method and system
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
CN109584884A (en) A kind of speech identity feature extractor, classifier training method and relevant device
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN103794207A (en) Dual-mode voice identity recognition method
CN108922543A (en) Model library method for building up, audio recognition method, device, equipment and medium
CN113221673B (en) Speaker authentication method and system based on multi-scale feature aggregation
CN104887263A (en) Identity recognition algorithm based on heart sound multi-dimension feature extraction and system thereof
CN111899757A (en) Single-channel voice separation method and system for target speaker extraction
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN112053694A (en) Voiceprint recognition method based on CNN and GRU network fusion
CN109961794A (en) A kind of layering method for distinguishing speek person of model-based clustering
CN102496366B (en) Speaker identification method irrelevant with text
CN112562725A (en) Mixed voice emotion classification method based on spectrogram and capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180504