CN109256139A - A Triplet-Loss-based speaker recognition method - Google Patents

A Triplet-Loss-based speaker recognition method

Info

Publication number
CN109256139A
CN109256139A (application CN201810835179.0A)
Authority
CN
China
Prior art keywords
neural network
loss
voice signal
triplet
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810835179.0A
Other languages
Chinese (zh)
Inventor
王艺航
熊晓明
刘祥
李辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201810835179.0A
Publication of CN109256139A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/18: Artificial neural networks; Connectionist approaches
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The present invention relates to a Triplet-Loss-based speaker recognition method comprising the following steps. S1: acquire voice signals comprising three groups of samples: a voice sequence from one speaker, another voice sequence from the same speaker, and a voice sequence from a different speaker. S2: pre-process the voice signals to remove the channel noise generated during voice acquisition. S3: extract speech feature parameters from the denoised voice signals. S4: construct an RNN based on LSTM cells. S5: use 90% of the extracted triplets of speech feature parameters as the input of the RNN to train the network. S6: after the RNN has been trained, feed the remaining 10% of the triplets into the RNN to perform speaker recognition. The present invention offers high accuracy, good recognition performance, and high reliability.

Description

A Triplet-Loss-based speaker recognition method
Technical field
The present invention relates to the technical field of neural networks and deep learning, and more particularly to a Triplet-Loss-based speaker recognition method.
Background technique
As information security problems grow more serious, their impact keeps increasing, and the protection of personal privacy urgently needs to be addressed. How to determine a person's identity accurately and safely has therefore attracted wide attention. Voice, as a key interface of human-computer interaction, plays an important role in identity authentication. Voiceprint recognition, i.e. speaker recognition, exploits the voiceprint as a unique biological characteristic of the speaker and provides a new tool that overcomes the shortcomings of conventional authentication methods. Compared with other methods, speech carrying voiceprint features is convenient and natural to obtain: voiceprint extraction can be completed without the user even noticing, so user acceptance is also high. Acquiring speech for recognition is inexpensive; a single microphone suffices, and no additional recording equipment is needed when a communication device is used. Voiceprint recognition is well suited to remote identity confirmation: with only a microphone, telephone, or mobile phone, remote login can be realized over a network (a communication network or the Internet).
Common early voiceprint recognition methods based on signal processing compute signal parameters from the voice data using signal-processing techniques and then perform template matching, statistical variance analysis, and the like. Such methods are extremely sensitive to the voice data; their accuracy is very low and their recognition performance far from satisfactory.
Recognition methods based on Gaussian mixture models achieve reasonable results and are simple and flexible, but they demand very large amounts of voice data, are very sensitive to channel noise, and cannot meet the requirements of real-world scenarios.
Existing methods based on deep neural networks do not take the context-dependent nature of the voice signal into account; the extracted features cannot represent the speaker well, and the advantages of deep learning are not fully exploited.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a Triplet-Loss-based speaker recognition method with high accuracy, good recognition performance, and high reliability.
To achieve the above object, the technical solution provided by the present invention is as follows:
A Triplet-Loss-based speaker recognition method, comprising the following steps:
S1: acquire voice signals comprising three groups of samples: a voice sequence Xa from one speaker, a voice sequence Xp from the same speaker, and a voice sequence Xn from a different speaker;
S2: pre-process the voice signals to remove the channel noise generated during voice acquisition;
S3: extract speech feature parameters from the denoised voice signals;
S4: construct an RNN based on LSTM cells;
S5: use 90% of the triplets of speech feature parameters extracted in step S3 as the input of the RNN to train the network;
S6: after the RNN has been trained, feed the remaining 10% of the triplets of speech feature parameters into the RNN to perform speaker recognition.
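Steps S5 and S6 split the extracted feature triplets 90/10 between training and evaluation. A minimal sketch of such a split, assuming the triplets are held in a Python list (the function name, seed, and ratio parameter are illustrative, not prescribed by the patent):

```python
import random

def split_triplets(triplets, train_ratio=0.9, seed=0):
    """Shuffle the (Xa, Xp, Xn) triplets and split them into a
    training set and a held-out recognition set."""
    rng = random.Random(seed)
    shuffled = triplets[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Example with 100 dummy triplets: 90 go to training, 10 to evaluation.
dummy = [(i, i, i) for i in range(100)]
train_set, test_set = split_triplets(dummy)
```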
Further, step S2 denoises the voice signal by spectral subtraction; the specific steps are as follows:
S2-1: filter the voice signal;
S2-2: pre-emphasize the filtered signal, divide it into frames, and apply a Hamming window to each frame;
S2-3: apply the fast Fourier transform (FFT) to the windowed signal, compute the power spectrum of each frame, and then estimate the average noise power;
S2-4: use VAD to detect silent segments for noise estimation, then update the noise spectrum with recursive smoothing;
S2-5: perform the spectral subtraction to obtain the estimated power spectrum of the clean voice signal;
S2-6: re-insert the phase spectrum to reconstruct the speech spectrum, then apply the inverse fast Fourier transform to recover each speech frame;
S2-7: recombine the speech frames into a voice signal and apply de-emphasis to obtain the denoised signal.
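Assuming the noise power spectrum has already been estimated from silent segments (step S2-4), the core of steps S2-5 and S2-6 can be sketched per frame as follows; the flooring at zero (half-wave rectification) and the function names are illustrative choices, not prescribed by the patent:

```python
import numpy as np

def spectral_subtract_frame(frame, noise_power):
    """Subtract an estimated noise power spectrum from one windowed
    frame and resynthesize it with the original phase (S2-5/S2-6)."""
    spectrum = np.fft.rfft(frame)
    power = np.abs(spectrum) ** 2
    phase = np.angle(spectrum)
    # Spectral subtraction, floored at zero so power stays non-negative.
    clean_power = np.maximum(power - noise_power, 0.0)
    # Re-insert the phase spectrum and invert the FFT to recover the frame.
    clean_spectrum = np.sqrt(clean_power) * np.exp(1j * phase)
    return np.fft.irfft(clean_spectrum, n=len(frame))
```

With a zero noise estimate the frame is returned unchanged, which is a convenient sanity check on the resynthesis.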
Further, step S3 extracts the acoustic feature parameters from the denoised voice signal as follows:
S3-1: pre-emphasize the three groups of denoised voice signals, divide each into frames, and multiply each frame by a Hamming window;
S3-2: apply the FFT to each frame to obtain its energy distribution over the spectrum;
S3-3: pass the power spectrum through a bank of Mel-scale triangular filters and compute the log energy output by each filter;
S3-4: apply the discrete cosine transform to obtain the output feature parameters.
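Steps S3-2 to S3-4 describe the standard MFCC pipeline. A compact sketch for a single frame; the sample rate, filter count, and cepstral order below are common illustrative defaults, not values fixed by the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale (step S3-3)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frame(frame, sample_rate=16000, n_filters=26, n_ceps=13):
    """Steps S3-2..S3-4 for one pre-emphasized, windowed frame:
    FFT power spectrum -> Mel filterbank log energies -> DCT."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n_fft
    log_energy = np.log(mel_filterbank(n_filters, n_fft, sample_rate) @ power
                        + 1e-10)
    # DCT-II of the log filterbank energies, keeping the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return basis @ log_energy
```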
Further, step S4 constructs the RNN by taking an LSTM network as the basis and appending a normalization layer and a Triplet-Loss layer after the feature output layer of the LSTM.
Further, the Triplet-Loss layer learns, through training, to make the distance between the feature representations of Xa and Xp as small as possible and the distance between the feature representations of Xa and Xn as large as possible, while keeping a minimal margin α between the distance between Xa and Xn and the distance between Xa and Xp;
The corresponding objective function is:
L = Σ_i [ ||f(Xa_i) - f(Xp_i)||^2 - ||f(Xa_i) - f(Xn_i)||^2 + α ]+
where ||f(Xa) - f(Xp)||^2 denotes the squared Euclidean distance between Xa and Xp, and ||f(Xa) - f(Xn)||^2 denotes the squared Euclidean distance between Xa and Xn;
Distance is measured here in the Euclidean metric; when the value inside [ ]+ is greater than zero, that value is taken as the loss, and when it is negative, the loss is zero.
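The objective above can be written as a plain function of the three embeddings. The margin value 0.2 below is an illustrative choice; the patent only requires some minimal margin α:

```python
def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) triplet of
    embeddings: [ ||f(Xa)-f(Xp)||^2 - ||f(Xa)-f(Xn)||^2 + alpha ]+ ."""
    d_ap = sum((a - p) ** 2 for a, p in zip(f_a, f_p))  # anchor-positive distance
    d_an = sum((a - n) ** 2 for a, n in zip(f_a, f_n))  # anchor-negative distance
    return max(d_ap - d_an + alpha, 0.0)

# The negative is farther than the positive by more than the margin,
# so the hinge is inactive and the loss is zero.
loss = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0], alpha=0.2)  # -> 0.0
```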
Further, step S6 performs speaker recognition as follows:
S6-1: obtain the feature representations f(Xa), f(Xp), f(Xn) of the three groups of samples from the LSTM network;
S6-2: normalize the obtained feature representations;
S6-3: optimize the neural network via the Triplet-Loss function;
S6-4: compare the Triplet-Loss metric with a preset threshold; if the metric exceeds the threshold, the speakers are judged to be the same person, otherwise different people.
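Step S6-4 compares a metric against a preset threshold. The patent does not spell the metric out beyond "greater than the threshold means the same person", so the sketch below uses cosine similarity of the embeddings as an assumed similarity metric, with an illustrative threshold:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_speaker(emb_1, emb_2, threshold=0.8):
    """Step S6-4: metric above the preset threshold -> same speaker."""
    return cosine_similarity(emb_1, emb_2) > threshold

# Two nearly parallel embeddings pass the check; orthogonal ones do not.
```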
Compared with the prior art, the principles and advantages of this scheme are as follows:
1. The voice signal is pre-processed by spectral subtraction which, compared with other methods, introduces the fewest constraints, has the most direct physical meaning, and requires little computation, thereby effectively improving recognition accuracy.
2. The model is trained with Triplet-Loss (the triplet loss function): the joint constraint of inter-class and intra-class losses drives the back-propagation optimization of the model, so that samples of the same class lie as close as possible in the feature space while samples of different classes lie as far apart as possible. This improves the robustness of the model and thereby the reliability and accuracy of recognition.
Detailed description of the invention
Fig. 1 is a flow chart of the Triplet-Loss-based speaker recognition method of the present invention;
Fig. 2 is a flow chart of the spectral subtraction in the present invention;
Fig. 3 is a flow chart of the speech feature parameter extraction in the present invention.
Specific embodiment
The present invention is further explained below with reference to specific embodiments:
Referring to Fig. 1, the Triplet-Loss-based speaker recognition method described in this embodiment includes the following steps:
S1: acquire voice signals comprising three groups of samples: a voice sequence Xa from one speaker, a voice sequence Xp from the same speaker, and a voice sequence Xn from a different speaker;
S2: pre-process the voice signals. Considerable channel noise is generated during voice acquisition and makes the recognition task harder, so the input voice data is first denoised by spectral subtraction: the noise spectrum estimate is subtracted from the noisy speech estimate to obtain the spectrum of the clean speech. What is removed here is channel noise, i.e. the noise introduced by the recording equipment; while the channel noise is removed, all information related to the speaker is fully preserved.
As shown in Fig. 2, the voice signal is denoised by spectral subtraction; the specific steps are as follows:
S2-1: filter the voice signal;
S2-2: pre-emphasize the filtered signal, divide it into frames, and apply a Hamming window to each frame;
Specifically, windowing is a necessary step in signal processing, because a computer can only handle signals of finite length: the original signal x(t) must be truncated over a sampling interval T into a finite segment xT(t) before further processing, and this truncation is the windowing operation. In practice a rectangular window is often used, but a rectangular window cuts the signal off abruptly at its edges and discards all time-domain information outside the window, which introduces spurious frequency components in the frequency domain, a phenomenon known as spectral leakage. The main measure for reducing the leakage error caused by windowing is to choose a suitable window function. The Hamming window is one such window: it is a raised-cosine-shaped function, w(n) = 0.54 - 0.46·cos(2πn/(N-1)) for 0 ≤ n ≤ N-1 and zero elsewhere, so that when it multiplies any other function f, only a finite portion of f remains nonzero;
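The Hamming window discussed above has the closed form w(n) = 0.54 - 0.46·cos(2πn/(N-1)). A short sketch of windowing one frame (function names are illustrative):

```python
import math

def hamming(n_points):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def window_frame(frame):
    """Multiply one frame by a Hamming window so the edges taper
    smoothly toward (but not exactly to) zero, reducing spectral leakage."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]

# The window peaks at 1.0 in the middle and falls to 0.08 at the edges,
# unlike a rectangular window which cuts the signal off abruptly.
```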
S2-3: apply the FFT to the windowed signal, compute the power spectrum of each frame, and then estimate the average noise power;
S2-4: use VAD (Voice Activity Detection, i.e. speech endpoint detection) to detect silent segments for noise estimation, then update the noise spectrum with recursive smoothing;
S2-5: perform the spectral subtraction to obtain the estimated power spectrum of the clean voice signal;
S2-6: re-insert the phase spectrum to reconstruct the speech spectrum, then apply the inverse FFT to recover each speech frame;
S2-7: recombine the speech frames into a voice signal and apply de-emphasis to obtain the denoised signal.
S3: as shown in Fig. 3, speech feature parameters are extracted from the denoised voice signal as follows:
S3-1: pre-emphasize the three groups of denoised voice signals, divide each into frames with a frame length of 25 ms and a frame shift of 10 ms, and multiply each frame by a Hamming window;
S3-2: apply the FFT to each frame to obtain its energy distribution over the spectrum;
S3-3: pass the power spectrum through a bank of Mel-scale triangular filters and compute the log energy output by each filter;
S3-4: apply the discrete cosine transform to obtain the output speech feature parameters.
S4: after the speech feature parameters have been obtained, an RNN (recurrent neural network) is constructed by taking an LSTM (long short-term memory) network as the basis and appending a normalization layer and a Triplet-Loss layer after the feature output layer of the LSTM.
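The normalization layer between the LSTM output and the Triplet-Loss layer is typically an L2 normalization, projecting every embedding onto the unit sphere so that Euclidean distances are comparable across samples. A minimal sketch; the unit-norm choice is an assumption, since the patent only says "normalization layer":

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale an embedding to unit Euclidean norm (a common reading of
    the 'normalization layer' before the Triplet-Loss layer)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

# After normalization every embedding lies on the unit sphere, so the
# squared Euclidean distance between two embeddings is bounded by 4.
```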
The Triplet-Loss layer used here learns, through training, to make the distance between the feature representations of Xa and Xp as small as possible and the distance between the feature representations of Xa and Xn as large as possible, while keeping a minimal margin α between the distance between Xa and Xn and the distance between Xa and Xp;
The corresponding objective function is:
L = Σ_i [ ||f(Xa_i) - f(Xp_i)||^2 - ||f(Xa_i) - f(Xn_i)||^2 + α ]+
where ||f(Xa) - f(Xp)||^2 denotes the squared Euclidean distance between Xa and Xp, and ||f(Xa) - f(Xn)||^2 denotes the squared Euclidean distance between Xa and Xn;
Distance is measured here in the Euclidean metric; when the value inside [ ]+ is greater than zero, that value is taken as the loss, and when it is negative, the loss is zero.
S5: 90% of the triplets of speech feature parameters extracted in step S3 are used as the input of the RNN to train the network.
S6: after the RNN has been trained, the remaining 10% of the triplets of speech feature parameters are fed into the RNN to perform speaker recognition; the specific steps are as follows:
S6-1: obtain the feature representations f(Xa), f(Xp), f(Xn) of the three groups of samples from the LSTM network;
S6-2: normalize the obtained feature representations;
S6-3: optimize the neural network via the Triplet-Loss function;
S6-4: compare the Triplet-Loss metric with a preset threshold; if the metric exceeds the threshold, the speakers are judged to be the same person, otherwise different people.
In this embodiment the voice signal is pre-processed by spectral subtraction which, compared with other methods, introduces the fewest constraints, has the most direct physical meaning, and requires little computation, thereby effectively improving recognition accuracy. In addition, this embodiment trains the model with Triplet-Loss (the triplet loss function): the joint constraint of inter-class and intra-class losses drives the back-propagation optimization of the model, so that samples of the same class lie as close as possible in the feature space while samples of different classes lie as far apart as possible, improving the robustness of the model and thereby the reliability and accuracy of recognition.
The embodiments described above are only preferred embodiments of the present invention, and the scope of implementation of the present invention is not limited thereto; any change made according to the shapes and principles of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A Triplet-Loss-based speaker recognition method, characterized by comprising the following steps:
S1: acquire voice signals comprising three groups of samples: a voice sequence Xa from one speaker, a voice sequence Xp from the same speaker, and a voice sequence Xn from a different speaker;
S2: pre-process the voice signals to remove the channel noise generated during voice acquisition;
S3: extract speech feature parameters from the denoised voice signals;
S4: construct an RNN based on LSTM cells;
S5: use 90% of the triplets of speech feature parameters extracted in step S3 as the input of the RNN to train the network;
S6: after the RNN has been trained, feed the remaining 10% of the triplets of speech feature parameters into the RNN to perform speaker recognition.
2. The Triplet-Loss-based speaker recognition method according to claim 1, characterized in that step S2 denoises the voice signal by spectral subtraction, the specific steps being as follows:
S2-1: filter the voice signal;
S2-2: pre-emphasize the filtered signal, divide it into frames, and apply a Hamming window to each frame;
S2-3: apply the fast Fourier transform to the windowed signal, compute the power spectrum of each frame, and then estimate the average noise power;
S2-4: use VAD to detect silent segments for noise estimation, then update the noise spectrum with recursive smoothing;
S2-5: perform the spectral subtraction to obtain the estimated power spectrum of the clean voice signal;
S2-6: re-insert the phase spectrum to reconstruct the speech spectrum, then apply the inverse fast Fourier transform to recover each speech frame;
S2-7: recombine the speech frames into a voice signal and apply de-emphasis to obtain the denoised signal.
3. The Triplet-Loss-based speaker recognition method according to claim 1, characterized in that step S3 extracts the acoustic feature parameters from the denoised voice signal as follows:
S3-1: pre-emphasize the three groups of denoised voice signals, divide each into frames, and multiply each frame by a Hamming window;
S3-2: apply the fast Fourier transform to each frame to obtain its energy distribution over the spectrum;
S3-3: pass the power spectrum through a bank of Mel-scale triangular filters and compute the log energy output by each filter;
S3-4: apply the discrete cosine transform to obtain the output feature parameters.
4. The Triplet-Loss-based speaker recognition method according to claim 1, characterized in that step S4 constructs the RNN by taking an LSTM network as the basis and appending a normalization layer and a Triplet-Loss layer after the feature output layer of the LSTM.
5. The Triplet-Loss-based speaker recognition method according to claim 4, characterized in that the Triplet-Loss layer learns, through training, to make the distance between the feature representations of Xa and Xp as small as possible and the distance between the feature representations of Xa and Xn as large as possible, while keeping a minimal margin α between the distance between Xa and Xn and the distance between Xa and Xp;
The corresponding objective function is:
L = Σ_i [ ||f(Xa_i) - f(Xp_i)||^2 - ||f(Xa_i) - f(Xn_i)||^2 + α ]+
where ||f(Xa) - f(Xp)||^2 denotes the squared Euclidean distance between Xa and Xp, and ||f(Xa) - f(Xn)||^2 denotes the squared Euclidean distance between Xa and Xn; distance is measured in the Euclidean metric, and when the value inside [ ]+ is greater than zero it is taken as the loss, while when it is negative the loss is zero.
6. The Triplet-Loss-based speaker recognition method according to claim 1, characterized in that step S6 performs speaker recognition as follows:
S6-1: obtain the feature representations f(Xa), f(Xp), f(Xn) of the three groups of samples from the LSTM network;
S6-2: normalize the obtained feature representations;
S6-3: optimize the neural network via the Triplet-Loss function;
S6-4: compare the Triplet-Loss metric with a preset threshold; if the metric exceeds the threshold, the speakers are judged to be the same person, otherwise different people.
CN201810835179.0A 2018-07-26 2018-07-26 A Triplet-Loss-based speaker recognition method Pending CN109256139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810835179.0A CN109256139A (en) 2018-07-26 2018-07-26 A Triplet-Loss-based speaker recognition method


Publications (1)

Publication Number Publication Date
CN109256139A true CN109256139A (en) 2019-01-22

Family

ID=65049985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810835179.0A Pending CN109256139A (en) A Triplet-Loss-based speaker recognition method

Country Status (1)

Country Link
CN (1) CN109256139A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637438A (en) * 2012-03-23 2012-08-15 同济大学 Voice filtering method
US20170228641A1 (en) * 2016-02-04 2017-08-10 Nec Laboratories America, Inc. Distance metric learning with n-pair loss
CN107481736A (en) * 2017-08-14 2017-12-15 广东工业大学 A voiceprint identification and authentication system and its authentication and optimization method and system
CN107731233A (en) * 2017-11-03 2018-02-23 王华锋 An RNN-based voiceprint recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUNLEI ZHANG et al.: "End-to-end text-independent speaker verification with flexibility in utterance duration", 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) *
HERVÉ BREDIN: "TristouNet: Triplet loss for speaker turn embedding", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020156153A1 (en) * 2019-01-29 2020-08-06 腾讯科技(深圳)有限公司 Audio recognition method and system, and device
CN110390937B (en) * 2019-06-10 2021-12-24 南京硅基智能科技有限公司 Cross-channel voiceprint recognition method based on ArcFace loss algorithm
CN110390937A (en) * 2019-06-10 2019-10-29 南京硅基智能科技有限公司 A cross-channel voiceprint recognition method based on the ArcFace loss algorithm
CN110570870A (en) * 2019-09-20 2019-12-13 平安科技(深圳)有限公司 Text-independent voiceprint recognition method, device and equipment
CN110570871A (en) * 2019-09-20 2019-12-13 平安科技(深圳)有限公司 TristouNet-based voiceprint recognition method, device and equipment
US11031018B2 (en) 2019-10-31 2021-06-08 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for personalized speaker verification
CN111418009B (en) * 2019-10-31 2023-09-05 支付宝(杭州)信息技术有限公司 Personalized speaker verification system and method
US11244689B2 (en) 2019-10-31 2022-02-08 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for determining voice characteristics
CN111418009A (en) * 2019-10-31 2020-07-14 支付宝(杭州)信息技术有限公司 Personalized speaker verification system and method
WO2020098828A3 (en) * 2019-10-31 2020-09-03 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for personalized speaker verification
US10997980B2 (en) 2019-10-31 2021-05-04 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for determining voice characteristics
CN110838295B (en) * 2019-11-17 2021-11-23 西北工业大学 Model generation method, voiceprint recognition method and corresponding device
CN110838295A (en) * 2019-11-17 2020-02-25 西北工业大学 Model generation method, voiceprint recognition method and corresponding device
CN111312259A (en) * 2020-02-17 2020-06-19 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN111341304A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Method, device and equipment for training speech characteristics of speaker based on GAN
CN112613481A (en) * 2021-01-04 2021-04-06 上海明略人工智能(集团)有限公司 Bearing abrasion early warning method and system based on frequency spectrum


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190122