CN108648759A - Text-independent voiceprint recognition method - Google Patents

Text-independent voiceprint recognition method

Info

Publication number
CN108648759A
CN108648759A (application CN201810457528.XA)
Authority
CN
China
Prior art keywords
voice
level
frame
voiceprint
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810457528.XA
Other languages
Chinese (zh)
Inventor
郭炜强
平怡强
张宇
郑波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201810457528.XA
Publication of CN108648759A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 — Speaker identification or verification techniques
    • G10L 17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 — Training, enrolment or model building
    • G10L 17/18 — Artificial neural networks; Connectionist approaches
    • G10L 17/22 — Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a text-independent voiceprint recognition method comprising three stages: voiceprint recognition model training, embedding extraction, and decision scoring. The model training stage consists of: 1) speech signal preprocessing; 2) frame-level operations; 3) a statistics pooling layer that aggregates the frame-level output; 4) one-dimensional convolution; 5) a fully connected output layer producing the speaker classes. After training is complete, the embedding is extracted before the nonlinearity of the first fully connected layer. Finally, a decision score is computed with the cosine distance to accept or reject the claimed identity. The invention combines neural-network embeddings with convolutional neural networks: one-dimensional convolution with max pooling reduces dimensionality, and a deeper convolutional stack performs further feature extraction, improving model performance while making cosine-distance scoring faster and simpler.

Description

Text-independent voiceprint recognition method
Technical field
The present invention relates to the technical field of voiceprint recognition, and in particular to a text-independent voiceprint recognition method combining neural-network embeddings with convolutional neural networks.
Background technology
A voiceprint is the acoustic spectrum carried by the verbal information in human speech. Like a fingerprint, it is a unique biological characteristic with identifying power: it is both distinctive and relatively stable. A speech signal is a one-dimensional continuous signal; after discretization it becomes the familiar digital speech signal that a computer can process. Voiceprint recognition (also called speaker recognition) is a biometric technology that, much like the fingerprint recognition now ubiquitous on smartphones, extracts voice features from the speech signal uttered by a speaker and uses them to verify the speaker's identity.
The mainstream voiceprint recognition approach is the i-vector system. Building on joint factor analysis, it proposes that speaker and session variability can be characterized by a single subspace. Using this subspace, the high-dimensional vector obtained from a speech sample can be converted into a low-dimensional vector, the i-vector.
Later, as hardware performance improved, deep neural networks were successfully applied to acoustic modeling and recognition accuracy advanced considerably. Models combining DNNs with i-vectors were also proposed: during sufficient-statistics extraction, the UBM of the original i-vector model is replaced by a DNN over phoneme states, yielding the posterior probability of each class for each frame.
The most recent technique is the acoustic recognition model of David Snyder et al., which extracts embedded features from a time-delay neural network (TDNN), also known as the x-vector. The model computes a speaker embedding from variable-length speech and its structure is an end-to-end system. Its steps are as follows:
Model training is performed first. The speech signal is preprocessed, and the first five layers of the network operate at the frame level. A statistics pooling layer receives the output of the last frame-level layer as input, aggregates all frames of an utterance, and computes their mean and standard deviation. The network then operates at the segment level: fully connected layers with the ReLU activation are applied, and a final fully connected softmax layer outputs the N speaker classes.
After training is complete, speech of any length is mapped directly to a fixed-length speaker embedding. Pairs of enrollment and test utterances are then scored by a PLDA-based back end, and a final accept/reject decision is made.
Current network structures rely entirely on fully connected layers. It is understood that a deeper network is more expressive, but training a deep fully connected network by gradient descent is very difficult because its gradients are hard to propagate through more than about three layers. A very deep fully connected network is therefore impractical, which limits its capability.
Invention content
The purpose of the invention is to overcome the shortcomings and deficiencies of the prior art by proposing a text-independent voiceprint recognition method that improves the neural-network embedding structure with a convolutional neural network. One-dimensional convolution is applied to the output of the statistics pooling layer, max pooling reduces dimensionality, and additional convolutional layers perform further feature extraction. This improves model performance and makes cosine-distance scoring faster and simpler.
To achieve the above, the technical solution provided by the present invention is a text-independent voiceprint recognition method comprising the following steps:
1) Voiceprint recognition model training
1.1) speech signal preprocessing;
1.2) frame-level operations;
1.3) statistics pooling layer aggregating the frame-level output;
1.4) one-dimensional convolution;
1.5) fully connected layer outputting speaker classes;
2) Embedding extraction: after model training is complete, the enrollment and test utterances are fed to the voiceprint recognition model and their embeddings are extracted;
3) Decision scoring: the score between the enrollment and test embeddings is computed with the cosine distance, and a final accept/reject decision is made.
In step 1.1), each utterance in the corpus is split into 25 ms frames and voice activity detection is applied to identify and remove long silent stretches from the signal stream. Twenty Mel-frequency cepstral coefficients (MFCCs) are computed per frame, and first- and second-order difference coefficients are appended, giving a 60-dimensional MFCC feature vector per frame as input.
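As an illustration, the framing and delta-appending of step 1.1) can be sketched in numpy. This is a minimal sketch, not the patent's implementation: the 10 ms hop is an assumed value, and the 20 base coefficients are stubbed with zeros, since real MFCC extraction requires a filterbank/DCT pipeline.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D waveform into overlapping 25 ms frames (10 ms hop assumed)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack([signal[i * hop_len: i * hop_len + frame_len]
                     for i in range(n_frames)])

def add_deltas(feats):
    """Append first- and second-order differences: 20 dims -> 60 dims per frame."""
    delta = np.diff(feats, axis=0, prepend=feats[:1])    # first-order difference
    delta2 = np.diff(delta, axis=0, prepend=delta[:1])   # second-order difference
    return np.concatenate([feats, delta, delta2], axis=1)

# Example: 1 s of audio at 16 kHz, with a zero stub in place of real 20-dim MFCCs
frames = frame_signal(np.zeros(16000))
mfcc_stub = np.zeros((len(frames), 20))
features = add_deltas(mfcc_stub)                         # (n_frames, 60)
```

Each frame is 400 samples (25 ms at 16 kHz), and the appended deltas triple the 20 base dimensions to the 60 dimensions stated in the text.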
In step 1.2), the first five layers of the training network operate at the frame level with a time-delay architecture. Let t be the current frame. At the input, the MFCC vectors of the frames at {t-2, t-1, t, t+1, t+2} are spliced together. The next two layers splice the previous layer's output at times {t-2, t, t+2} and {t-3, t, t+3} respectively. The last two layers also operate at the frame level but without any additional context. The frame-level part of the network thus spans t-7 to t+7, 15 frames in total.
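The context splicing above can be sketched as follows. The layer offsets come from the text; the clamping of context at utterance edges is an assumption made only so the sketch is self-contained.

```python
import numpy as np

# Context offsets of the five frame-level layers (layers 4 and 5 see only {t})
LAYER_OFFSETS = [(-2, -1, 0, 1, 2), (-2, 0, 2), (-3, 0, 3), (0,), (0,)]

def splice(feats, offsets, t):
    """Concatenate the feature rows at t+offset for each offset (edges clamped)."""
    T = len(feats)
    return np.concatenate([feats[min(max(t + o, 0), T - 1)] for o in offsets])

def total_context(layer_offsets):
    """Accumulated receptive field of the stacked splicing layers."""
    lo = hi = 0
    for offsets in layer_offsets:
        lo += min(offsets)
        hi += max(offsets)
    return lo, hi

low, high = total_context(LAYER_OFFSETS)   # accumulates to (-7, 7): 15 frames
```

Splicing five 60-dimensional MFCC vectors at the input yields a 300-dimensional vector, matching the first-layer input size listed in Table 1.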
In step 1.3), the statistics pooling layer receives the output of the last frame-level layer as input, aggregates all frames of an utterance, and computes their mean. Assuming an utterance is divided into T frames in total, the pooling layer aggregates the outputs of all T frames from the fifth frame-level layer and computes their average. The statistic is a 3200-dimensional vector computed once per input utterance. This step aggregates information over the time dimension so that subsequent layers operate on the whole utterance.
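The statistics pooling of step 1.3) reduces, in effect, to a mean over the time axis; a one-line numpy sketch (the T = 200 example utterance length is arbitrary):

```python
import numpy as np

def statistics_pooling(frame_outputs):
    """Average the fifth frame-level layer's outputs over all T frames.

    frame_outputs: (T, 3200) array -> (3200,) utterance-level statistic.
    """
    return frame_outputs.mean(axis=0)

utterance = np.random.default_rng(0).normal(size=(200, 3200))  # T = 200 frames
stat = statistics_pooling(utterance)                            # (3200,)
```

Whatever the utterance length T, the output is always a 3200-dimensional vector, which is what lets the following layers operate on variable-length speech.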
In step 1.4), the output of the statistics pooling layer is processed by one-dimensional convolution, with five convolutional layers in total. The first two convolutional layers use 256 convolution kernels of size 5 with stride 2; the third, fourth, and fifth layers use 256 kernels of size 3 with stride 1. Each convolutional layer is followed by a max pooling layer.
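The kernel sizes and strides above determine how the 3200-dimensional statistic shrinks through the stack. The sketch below traces the sequence length layer by layer; the max-pooling window and stride are not given in the text, so a window of 2 with stride 2 is assumed here.

```python
def conv_out_len(L, kernel, stride):
    """Length after a valid (no-padding) 1-D convolution."""
    return (L - kernel) // stride + 1

def stack_out_len(L, pool_kernel=2, pool_stride=2):
    """Trace the sequence length through the 5 conv layers (+ max pool after each)."""
    layers = [(5, 2), (5, 2), (3, 1), (3, 1), (3, 1)]   # (kernel, stride) per conv layer
    for k, s in layers:
        L = conv_out_len(L, k, s)                       # 1-D convolution
        L = conv_out_len(L, pool_kernel, pool_stride)   # max pooling (assumed 2/2)
    return L

final_len = stack_out_len(3200)   # length of each of the 256 channels at the output
```

Under these assumptions the 3200-dimensional pooled statistic shrinks to 23 positions per channel before the fully connected layers.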
In step 1.5), two fully connected layers follow, with the ReLU and softmax activations respectively; the output of the last fully connected layer is the N speaker classes.
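A toy forward pass through the two fully connected layers can be sketched as follows. The weights are random stand-ins, the 512-dimensional input and N = 100 speakers are example values not taken from the patent; the sketch only shows where the pre-activation embedding of step 2) sits.

```python
import numpy as np

N_SPEAKERS = 100                                  # N, example value
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(512, 1024))     # flattened conv output -> 1024
W2 = rng.normal(scale=0.01, size=(1024, N_SPEAKERS))

def forward(x):
    embedding = x @ W1                            # pre-activation: the 1024-dim embedding
    hidden = np.maximum(embedding, 0.0)           # ReLU
    logits = hidden @ W2
    e = np.exp(logits - logits.max())             # numerically stable softmax
    return embedding, e / e.sum()

emb, probs = forward(rng.normal(size=512))
```

The `embedding` returned before the ReLU is exactly the 1024-dimensional vector that step 2) extracts once training is done; `probs` is a distribution over the N speaker classes.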
In step 2), after model training is complete, the embedding is extracted before the nonlinearity of the first fully connected layer, i.e. its 1024-dimensional output.
In step 3), the score between the enrollment and test embeddings is computed with the cosine distance and compared with a threshold to make the final accept/reject decision: the claim is accepted if the score exceeds the threshold and rejected otherwise. The formula is as follows:
score(w1, w2) = <w1, w2> / (||w1|| · ||w2||)
where w1 and w2 are the enrollment and test embeddings respectively, score(w1, w2) denotes the cosine distance, <w1, w2> is the dot product of the enrollment embedding and the test embedding, ||w1|| and ||w2|| are the lengths of the enrollment and test embeddings, and θ is the preset threshold.
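The cosine scoring and threshold decision above amount to a few lines of numpy. The threshold value θ = 0.6 is an arbitrary example, not a value from the patent:

```python
import numpy as np

def cosine_score(w1, w2):
    """score(w1, w2) = <w1, w2> / (||w1|| * ||w2||)"""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def verify(enroll_emb, test_emb, theta=0.6):
    """Accept the identity claim when the cosine score exceeds the threshold θ."""
    return cosine_score(enroll_emb, test_emb) > theta

a = np.array([1.0, 0.0, 1.0])
same = verify(a, 2.0 * a)                      # same direction -> score 1.0 -> accept
diff = verify(a, np.array([0.0, 1.0, 0.0]))    # orthogonal -> score 0.0 -> reject
```

Because the score depends only on a dot product and two norms, this back end is cheaper than a PLDA scorer, which is the speed/simplicity advantage the invention claims.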
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. In a convolutional network, each neuron is no longer connected to every neuron of the previous layer, but only to a small subset, which removes many parameters.
2. A group of connections shares the same weight instead of each connection having its own, which again removes many parameters.
3. A max pooling layer reduces the sample dimension of each layer, further reducing the number of parameters while improving the robustness of the model.
4. Using the cosine distance for the speaker verification decision score makes the process faster and simpler.
Description of the drawings
Fig. 1 is the logical flow chart of the method for the present invention.
Fig. 2 is the voiceprint recognition model training flow chart of the present invention.
Specific embodiments
The present invention is further explained below with reference to specific embodiments.
As shown in Fig. 1, the text-independent voiceprint recognition method provided by this embodiment is divided into three stages: voiceprint recognition model training, embedding extraction, and decision scoring.
The voiceprint recognition model is trained first on a suitable corpus, for example the AISHELL-ASR0009-OS1 open-source Chinese speech database, which contains a training set and a test set.
As shown in Fig. 2, the voiceprint recognition model training steps are as follows:
1) Speech signal preprocessing
Each utterance in the corpus is split into 25 ms frames and voice activity detection is applied, identifying and removing long silent stretches from the signal stream. Twenty Mel-frequency cepstral coefficients (MFCCs) are computed per frame and first- and second-order difference coefficients are appended, finally giving a 60-dimensional MFCC feature vector per frame as input.
2) Frame-level operations
The first five layers of the voiceprint model network operate at the frame level with a time-delay architecture. Let t be the current frame. At the input, the MFCC vectors of the frames at {t-2, t-1, t, t+1, t+2} are spliced together. The next two layers splice the previous layer's output at times {t-2, t, t+2} and {t-3, t, t+3} respectively. The last two layers also operate at the frame level but without any additional context. The frame-level part of the network thus spans t-7 to t+7, 15 frames in total.
3) Statistics pooling layer aggregating the frame-level output
The statistics pooling layer receives the output of the last frame-level layer as input, aggregates all frames of an utterance, and computes their mean. Assuming an utterance is divided into T frames in total, the pooling layer aggregates the outputs of all T frames from the fifth frame-level layer and computes their average. The statistic is a 3200-dimensional vector computed once per input utterance. This step aggregates information over the time dimension so that subsequent layers operate on the whole utterance.
4) One-dimensional convolution
The output of the statistics pooling layer is processed by one-dimensional convolution. The first two convolutional layers use 256 kernels of size 5 with stride 2; the third, fourth, and fifth layers use 256 kernels of size 3 with stride 1. Each convolutional layer is followed by a max pooling layer.
5) Fully connected output layer
Two fully connected layers follow, with the ReLU and softmax activations respectively; the output of the last fully connected layer is the N speaker classes.
The network structure of the frame-level operations and the statistics pooling layer is shown in Table 1:
Table 1. Frame-level operations and statistics pooling layer network structure
Layer | Frames seen by the layer | Total context frames | Input → output
Frame-level layer 1 | [t-2, t+2] | 5 | 300 → 1024
Frame-level layer 2 | {t-2, t, t+2} | 9 | 3072 → 1024
Frame-level layer 3 | {t-3, t, t+3} | 15 | 3072 → 1024
Frame-level layer 4 | {t} | 15 | 1024 → 1024
Frame-level layer 5 | {t} | 15 | 1024 → 3200
Statistics pooling | [0, T] | T | 3200T → 3200
The convolutional-layer and fully-connected-layer network structure is shown in Table 2:
Table 2. Convolutional layer and fully connected layer network structure
The above steps 2)–5) are applied to the MFCCs of every utterance, continually updating the convolution kernels and fully connected layer parameters, completing the training of the voiceprint recognition model.
Embedding extraction: after model training is complete, the test-set utterances of the corpus are used for evaluation. The enrollment and test utterances are fed to the voiceprint recognition model, and the embedding, i.e. the 1024-dimensional output, is extracted before the nonlinearity of the first fully connected layer of the recognition model.
Decision scoring: the score between the enrollment and test embeddings is computed with the cosine distance and compared with a threshold to make the final accept/reject decision: the claim is accepted if the score exceeds the threshold and rejected otherwise. The formula is as follows:
score(w1, w2) = <w1, w2> / (||w1|| · ||w2||)
where w1 and w2 are the enrollment and test embeddings respectively, score(w1, w2) denotes the cosine distance, <w1, w2> is the dot product of the enrollment embedding and the test embedding, ||w1|| and ||w2|| are the lengths of the enrollment and test embeddings, and θ is the preset threshold.
The embodiment described above is only a preferred embodiment of the invention and is not intended to limit its scope; any change made according to the shape and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A text-independent voiceprint recognition method, characterized by comprising the following steps:
1) voiceprint recognition model training:
1.1) speech signal preprocessing;
1.2) frame-level operations;
1.3) statistics pooling layer aggregating the frame-level output;
1.4) one-dimensional convolution;
1.5) fully connected layer outputting speaker classes;
2) embedding extraction: after model training is complete, the enrollment and test utterances are fed to the voiceprint recognition model and their embeddings are extracted;
3) decision scoring: the score between the enrollment and test embeddings is computed with the cosine distance, and a final accept/reject decision is made.
2. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 1.1), each utterance in the corpus is split into 25 ms frames and voice activity detection is applied, identifying and removing from the signal stream silent stretches longer than a preset value; twenty Mel-frequency cepstral coefficients (MFCCs) are computed per frame and first- and second-order difference coefficients are appended, giving a 60-dimensional MFCC feature vector per frame as input.
3. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 1.2), the first five layers of the training network operate at the frame level with a time-delay architecture; letting t be the current frame, the MFCC vectors of the frames at {t-2, t-1, t, t+1, t+2} are spliced together at the input; the next two layers splice the previous layer's output at times {t-2, t, t+2} and {t-3, t, t+3} respectively; the last two layers also operate at the frame level but without any additional context; the frame-level part of the network spans t-7 to t+7, 15 frames in total.
4. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 1.3), the statistics pooling layer receives the output of the last frame-level layer as input, aggregates all frames of an utterance, and computes their mean; assuming an utterance is divided into T frames in total, the pooling layer aggregates the outputs of all T frames from the fifth frame-level layer and computes their average; the statistic is a 3200-dimensional vector computed once per input utterance; this step aggregates information over the time dimension so that subsequent layers operate on the whole utterance.
5. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 1.4), the output of the statistics pooling layer is processed by one-dimensional convolution with five convolutional layers in total; the first two convolutional layers use 256 kernels of size 5 with stride 2; the third, fourth, and fifth layers use 256 kernels of size 3 with stride 1; each convolutional layer is followed by a max pooling layer.
6. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 1.5), two fully connected layers follow, with the ReLU and softmax activations respectively; the output of the last fully connected layer is the N speaker classes.
7. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 2), after model training is complete, the embedding is extracted before the nonlinearity of the first fully connected layer, i.e. its 1024-dimensional output.
8. The text-independent voiceprint recognition method according to claim 1, characterized in that in step 3), the score between the enrollment and test embeddings is computed with the cosine distance and compared with a threshold to make the final accept/reject decision, the claim being accepted if the score exceeds the threshold and rejected otherwise, with the formula:
score(w1, w2) = <w1, w2> / (||w1|| · ||w2||)
where w1 and w2 are the enrollment and test embeddings respectively, score(w1, w2) denotes the cosine distance, <w1, w2> is the dot product of the enrollment embedding and the test embedding, ||w1|| and ||w2|| are the lengths of the enrollment and test embeddings, and θ is the preset threshold.
CN201810457528.XA 2018-05-14 2018-05-14 Text-independent voiceprint recognition method — Pending — CN108648759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810457528.XA CN108648759A (en) 2018-05-14 2018-05-14 Text-independent voiceprint recognition method


Publications (1)

Publication Number Publication Date
CN108648759A true CN108648759A (en) 2018-10-12

Family

ID=63755316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810457528.XA Pending CN108648759A (en) Text-independent voiceprint recognition method

Country Status (1)

Country Link
CN (1) CN108648759A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060022492A (en) * 2004-09-07 2006-03-10 학교법인연세대학교 Transformation method of speech feature vector for speaker recognition
CN107146624A (en) * 2017-04-01 2017-09-08 清华大学 A kind of method for identifying speaker and device
CN107464568A (en) * 2017-09-25 2017-12-12 四川长虹电器股份有限公司 Based on the unrelated method for distinguishing speek person of Three dimensional convolution neutral net text and system
CN107492382A (en) * 2016-06-13 2017-12-19 阿里巴巴集团控股有限公司 Voiceprint extracting method and device based on neutral net
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584887A (en) * 2018-12-24 2019-04-05 科大讯飞股份有限公司 A kind of method and apparatus that voiceprint extracts model generation, voiceprint extraction
CN109584887B (en) * 2018-12-24 2022-12-02 科大讯飞股份有限公司 Method and device for generating voiceprint information extraction model and extracting voiceprint information
CN110033757A (en) * 2019-04-04 2019-07-19 行知技术有限公司 A kind of voice recognizer
CN110120223A (en) * 2019-04-22 2019-08-13 南京硅基智能科技有限公司 A kind of method for recognizing sound-groove based on time-delay neural network TDNN
CN110136686A (en) * 2019-05-14 2019-08-16 南京邮电大学 Multi-to-multi voice conversion method based on STARGAN Yu i vector
CN110189757A (en) * 2019-06-27 2019-08-30 电子科技大学 A kind of giant panda individual discrimination method, equipment and computer readable storage medium
CN110675878A (en) * 2019-09-23 2020-01-10 金瓜子科技发展(北京)有限公司 Method and device for identifying vehicle and merchant, storage medium and electronic equipment
CN110942777B (en) * 2019-12-05 2022-03-08 出门问问信息科技有限公司 Training method and device for voiceprint neural network model and storage medium
CN110942777A (en) * 2019-12-05 2020-03-31 出门问问信息科技有限公司 Training method and device for voiceprint neural network model and storage medium
CN111081260A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Method and system for identifying voiceprint of awakening word
CN111429921A (en) * 2020-03-02 2020-07-17 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN111429921B (en) * 2020-03-02 2023-01-03 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN113360869A (en) * 2020-03-04 2021-09-07 北京嘉诚至盛科技有限公司 Method for starting application, electronic equipment and computer readable medium
CN112382298A (en) * 2020-11-17 2021-02-19 北京清微智能科技有限公司 Awakening word voiceprint recognition method, awakening word voiceprint recognition model and training method thereof
CN112382298B (en) * 2020-11-17 2024-03-08 北京清微智能科技有限公司 Awakening word voiceprint recognition method, awakening word voiceprint recognition model and training method thereof
CN113488058A (en) * 2021-06-23 2021-10-08 武汉理工大学 Voiceprint recognition method based on short voice
CN113488060A (en) * 2021-06-25 2021-10-08 武汉理工大学 Voiceprint recognition method and system based on variation information bottleneck
CN113488060B (en) * 2021-06-25 2022-07-19 武汉理工大学 Voiceprint recognition method and system based on variation information bottleneck
CN114826709A (en) * 2022-04-15 2022-07-29 马上消费金融股份有限公司 Identity authentication and acoustic environment detection method, system, electronic device and medium
CN115457968A (en) * 2022-08-26 2022-12-09 华南理工大学 Voiceprint confirmation method based on mixed resolution depth separable convolution network
CN115457968B (en) * 2022-08-26 2024-07-05 华南理工大学 Voiceprint confirmation method based on mixed resolution depth separable convolution network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181012