CN103730121A - Method and device for recognizing disguised sounds - Google Patents


Info

Publication number
CN103730121A
CN103730121A
Authority
CN
China
Prior art keywords
speaker
coefficient
probability
conversion
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310728591.XA
Other languages
Chinese (zh)
Other versions
CN103730121B (en)
Inventor
王泳
黄继武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Sun Yat Sen University
Original Assignee
Shenzhen University
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University, Sun Yat Sen University filed Critical Shenzhen University
Priority to CN201310728591.XA priority Critical patent/CN103730121B/en
Publication of CN103730121A publication Critical patent/CN103730121A/en
Application granted granted Critical
Publication of CN103730121B publication Critical patent/CN103730121B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method and device for recognizing disguised voices. In the method, the voice conversion coefficient is estimated from the fundamental frequency of the speech, and the Mel-frequency cepstral coefficient (MFCC) extraction algorithm is improved: the estimated coefficient is incorporated into the MFCC extraction through linear-interpolation stretching, so that the MFCC of the speech before conversion can be approximately computed. This is then embedded in a GMM-UBM recognition framework to compute the similarity between voices; at the same time, the converted voice can be restored to the original voice using the estimated conversion coefficient. Compared with conventional recognition and forensics methods, the method and device achieve a large improvement in recognition performance, with both the miss rate and the false-alarm rate lower than those of conventional schemes.

Description

A method and device for recognizing disguised voice
Technical field
The present invention relates to the field of multimedia information security, and more particularly to a method and device for recognizing disguised voice.
Background technology
Voice transformation (speech conversion) is one of the most commonly used speech-processing operations. Its function is to change one voice into a completely different yet natural-sounding voice. Voice transformation is generally used for music production or for protecting a speaker's safety and privacy, but it may also be used by criminals to disguise their voice so that their identity cannot be recognized. Speaker identification of transformed speech therefore has important application value.
The general steps of voice transformation are as follows:
1) Divide the signal x(n) into frames and apply a window:
F(k) = Σ_{n=0}^{N-1} x(n)·w(n)·e^{-j(2π/N)·k·n},  0 ≤ n < N   (1)
2) Compute the instantaneous amplitude:
|F(k)| = |Σ_{n=0}^{N-1} x(n)·w(n)·e^{-j(2π/N)·k·n}|,  0 ≤ n < N   (2)
3) Compute the instantaneous frequency from the phase relation between this frame and the previous frame:
ω(k) = (k + Δ)·F_s/N   (3)
where F_s is the sampling frequency and Δ is the frequency deviation relative to the bin's center frequency.
4) Spectrum stretching. First, the instantaneous amplitude is linearly interpolated:
|F(k′)|=μ|F(k)|+(1-μ)|F(k+1)| 0≤k<N/2 0≤k′<N/2 (4)
k = ⌊k′/α⌋   (5)
μ=k′/α-k (6)
Where this causes no confusion, the interpolated instantaneous amplitude is still denoted |F(k)|.
Then the frequency lines are shifted:
ω′(k*α)=ω(k)*α 0≤k<N/2 0≤k*α<N/2 (7)
Where this causes no confusion, the shifted instantaneous frequency is still denoted ω(k).
5) Compute the instantaneous phase φ(k) from the instantaneous frequency, and obtain the FFT coefficients after voice transformation:
F(k) = |F(k)|·e^{jφ(k)}   (8)
6) Apply the inverse FFT to F(k) to obtain the transformed speech signal. A short sketch of the stretching step is given below.
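For concreteness, the spectrum-stretching step of equations (4)-(6) can be written as a short sketch operating on the magnitude spectrum of one frame. This is an illustration only; the function name and the example semitone shift are not part of the original description.

```python
import numpy as np

def stretch_magnitude(mag, alpha):
    """Stretch one frame's FFT magnitude spectrum by factor alpha
    using the linear interpolation of equations (4)-(6)."""
    half = len(mag) // 2
    out = np.zeros_like(mag)
    for k_prime in range(half):
        k = int(k_prime / alpha)           # eq. (5): k = floor(k'/alpha)
        mu = k_prime / alpha - k           # eq. (6)
        if k + 1 < half:
            out[k_prime] = mu * mag[k] + (1 - mu) * mag[k + 1]   # eq. (4)
        elif k < half:
            out[k_prime] = mag[k]
    return out

# usage on one windowed frame x_w of 1024 samples:
# mag = np.abs(np.fft.fft(x_w))
# mag_up = stretch_magnitude(mag, alpha=2 ** (4 / 12))   # shift up by 4 semitones
```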
The MFCC extraction process is shown in Fig. 1. The specific steps are as follows:
1) Windowing and computing the spectrum.
MFCC extraction uses a Hamming window of N = 1024 points:
w(n) = 0.53836 − 0.46164·cos(2πn/(N−1)),  0 ≤ n < N   (9)
The FFT is applied to the windowed source signal x(n):
F(k) = Σ_{n=0}^{N-1} x(n)·w(n)·e^{-j(2π/N)·k·n},  0 ≤ n < N   (10)
2) Mel segmentation (triangular filtering) and log transform:
Triangular weighting windows are used, given by:
H_m(k) = 0,                                  k < k_{m−1}
       = (k − k_{m−1}) / (k_m − k_{m−1}),    k_{m−1} ≤ k ≤ k_m
       = (k_{m+1} − k) / (k_{m+1} − k_m),    k_m < k ≤ k_{m+1}
       = 0,                                  k > k_{m+1}        (11)
where k_m = f(m)·N/F_s and F_s is the sampling frequency.
After weighting the FFT energy spectrum with the triangular windows, the logarithm is taken:
Y(m) = log[Σ_{k=0}^{N-1} |F(k)|²·H_m(k)],  1 ≤ m ≤ M   (12)
3) Inverse cosine transform
Finally, the inverse cosine transform yields the Mel cepstral coefficients, i.e., the MFCC:
MFCC(n) = (1/M)·Σ_{m=1}^{M} Y(m)·cos(n·(m − 0.5)·π/M),  1 ≤ m ≤ M, 0 ≤ n ≤ N−1   (13)
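The steps (9)-(13) can be condensed into a per-frame sketch. The placement of the Mel filter edges, the clamping of the edges to the half-spectrum, and the small constant inside the logarithm are implementation assumptions not stated above.

```python
import numpy as np

def mfcc_frame(frame, fs, n_mel=24, n_coef=12):
    """Per-frame MFCC following equations (9)-(13)."""
    N = len(frame)
    n = np.arange(N)
    w = 0.53836 - 0.46164 * np.cos(2 * np.pi * n / (N - 1))    # eq. (9)
    power = np.abs(np.fft.fft(frame * w)[:N // 2]) ** 2        # eq. (10)

    # Mel-spaced triangular filter bank, eq. (11)
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = mel_inv(np.linspace(0, mel(fs / 2), n_mel + 2))
    k_edges = np.minimum(np.floor(edges * N / fs).astype(int), N // 2 - 1)

    Y = np.zeros(n_mel)
    for m in range(1, n_mel + 1):
        lo, cen, hi = k_edges[m - 1], k_edges[m], k_edges[m + 1]
        H = np.zeros(N // 2)
        H[lo:cen + 1] = (np.arange(lo, cen + 1) - lo) / max(cen - lo, 1)
        H[cen:hi + 1] = (hi - np.arange(cen, hi + 1)) / max(hi - cen, 1)
        Y[m - 1] = np.log(np.sum(power * H) + 1e-12)           # eq. (12)

    m_idx = np.arange(1, n_mel + 1)                            # eq. (13): DCT
    return np.array([np.sum(Y * np.cos(c * (m_idx - 0.5) * np.pi / n_mel))
                     for c in range(n_coef)]) / n_mel
```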
GMM-UBM
Speaker recognition can be regarded as a test between two hypotheses:
H_0: Y comes from speaker S;
H_1: Y does not come from speaker S.
Mathematically, H_0 is represented by the model λ_hyp of speaker S and H_1 by the universal background model λ_bkg. The probability is computed as in equation (14):
Λ(X) = log p(X | λ_hyp) − log p(X | λ_bkg),  accept H_0 if Λ(X) ≥ θ, otherwise accept H_1   (14)
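As a rough illustration of GMM-UBM scoring, the sketch below trains a background model and a speaker model with scikit-learn and computes the average log-likelihood ratio of equation (14). The component count, the use of plain EM instead of MAP adaptation for the speaker model, and the random placeholder feature matrices are simplifications made only to keep the example self-contained.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
bkg_mfcc = rng.normal(size=(5000, 24))   # placeholder pooled background features
spk_mfcc = rng.normal(size=(1000, 24))   # placeholder features of speaker j

ubm = GaussianMixture(n_components=64, covariance_type="diag").fit(bkg_mfcc)
gmm_j = GaussianMixture(n_components=64, covariance_type="diag").fit(spk_mfcc)

def llr_score(X, gmm_spk, gmm_ubm):
    """Equation (14): average log-likelihood ratio of the test features X."""
    return np.mean(gmm_spk.score_samples(X) - gmm_ubm.score_samples(X))

# accept the hypothesis "X was spoken by speaker j" when the score reaches theta:
# accept = llr_score(X, gmm_j, ubm) >= theta
```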
With the widespread use of audio technology, the protection of audio products has become a research hotspot in the field of information security, and audio forensics is one of its important branches. Speaker identification of transformed speech has important application value in judicial, commercial, and other settings. Experimental results show that, after a voice has undergone a large transformation, conventional speaker recognition schemes suffer from high miss and false-alarm rates and recognition fails completely.
Summary of the invention
The primary purpose of the present invention is to propose a method for recognizing disguised voice. The method can identify the speaker of a transformed audio product, which is of great value for speaker identification after voice transformation.
Another object of the present invention is to propose a device for recognizing disguised voice.
To overcome the deficiencies of the prior art, the technical solution adopted by the present invention is as follows:
A method for recognizing disguised voice, the method comprising:
in the training stage, computing a universal background model (UBM) λ_bkg from a background speech corpus using the expectation-maximization (EM) algorithm;
in the training stage, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j using the maximum a posteriori (MAP) algorithm, and computing the mean fundamental frequency f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in a model database;
in the training stage, obtaining a threshold θ. The threshold θ is obtained as follows: client scores and impostor scores are computed, and the threshold θ is selected from the distributions of these two classes of scores so as to achieve the miss rate and false-alarm rate required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under other speakers' models;
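One common way to pick θ from the two score distributions is the equal-error-rate operating point; a sketch is given below, assuming the client and impostor scores are available as arrays. The invention only requires that θ meet the application's miss and false-alarm targets, so any other operating point can be chosen the same way.

```python
import numpy as np

def choose_threshold(client_scores, impostor_scores):
    """Pick the threshold at which the miss rate (clients rejected) is
    closest to the false-alarm rate (impostors accepted), i.e. the EER point."""
    candidates = np.sort(np.concatenate([client_scores, impostor_scores]))
    best_theta, best_gap = candidates[0], np.inf
    for theta in candidates:
        frr = np.mean(client_scores < theta)      # miss rate
        far = np.mean(impostor_scores >= theta)   # false-alarm rate
        if abs(frr - far) < best_gap:
            best_theta, best_gap = theta, abs(frr - far)
    return best_theta
```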
at the test stage, the speech Y being transformed speech, extracting the mean fundamental frequency f_Y of the speech Y; computing the conversion coefficient from f_Y/f_j; computing the original MFCC coefficients X of Y before conversion using the modified MFCC extraction algorithm; and obtaining, through the GMM-UBM based probability estimation algorithm, the probability Λ(X) that Y matches the model V_j;
comparing the probability Λ(X) with the threshold θ: if the probability is greater than the threshold θ, the speech Y is a segment spoken by speaker j; otherwise, the speech Y was not spoken by j;
wherein the modified MFCC extraction algorithm is specifically: after the windowing and the FFT in the MFCC extraction algorithm, the amplitude |F(k)| of the FFT coefficients is stretched by linear interpolation to obtain |F(k′)|, as shown in the following formulas:
|F(k′)|=μ|F(k)|+(1-μ)|F(k+1)| 0≤k<N/2 0≤k′<N/2
k = ⌊k′/(1/α′)⌋
μ=k′/(1/α′)-k
where 1/α′ is the inverse of the estimated conversion coefficient, α′ is the estimated conversion coefficient, and α′ = f_Y/f_j.
In a preferred scheme, the fundamental frequency is extracted as follows:
(1) the signal is windowed, and the signal of a predetermined length before and after an arbitrary time t_mid is taken;
(2) the autocorrelation function of the signal of said predetermined length and the autocorrelation function of the window function are computed;
(3) the two autocorrelation functions are divided; the location of the maximum gives the period T, from which the fundamental frequency F at time t_mid is obtained.
In a preferred scheme, the mean fundamental frequency is mean(F), where mean() denotes averaging.
In a preferred scheme, when α′ > 1, spectrum compensation is required. Let the Nyquist frequency be F_N. The compensation consists of symmetrically copying the spectrum between F_N/α′ − F_N/2 and F_N/(2α′) into the range from F_N/(2α′) to F_N/2. The effect of this compensation is to approximately restore the amplitudes in the frequency band from F_N/(2α′) to F_N/2, so that the restored MFCC is closer to the original MFCC.
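A sketch of one possible reading of this compensation follows: the band just below the cut-off bin is mirrored into the empty band above it. Working with bin indices instead of frequencies and mirroring about the cut-off bin are this sketch's assumptions.

```python
import numpy as np

def compensate_spectrum(mag, alpha_prime):
    """For alpha' > 1, fill the empty band [F_N/(2*alpha'), F_N/2] left by the
    restoring stretch by mirroring the band just below the cut-off bin."""
    half = len(mag) // 2
    cut = int(half / alpha_prime)      # bin corresponding to F_N/(2*alpha')
    for k in range(cut, half):
        src = 2 * cut - k              # mirror image about the cut-off bin
        if 0 <= src < cut:
            mag[k] = mag[src]
    return mag
```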
A device for recognizing disguised voice, comprising:
a training module, configured to compute a universal background model (UBM) λ_bkg from a background speech corpus using the expectation-maximization (EM) algorithm; to extract the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, compute the Gaussian mixture model (GMM) λ_j of speaker j using the maximum a posteriori (MAP) algorithm, and compute the mean fundamental frequency f_j; to build the model V_j = (λ_j, f_j) of speaker j and store it in a model database; and to obtain a threshold θ in the training stage;
wherein the threshold θ is obtained as follows: client scores and impostor scores are computed, and the threshold θ is selected from the distributions of these two classes of scores so as to achieve the miss rate and false-alarm rate required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under other speakers' models;
a test module, configured to, when the speech Y is transformed speech, extract its mean fundamental frequency f_Y, compute the conversion coefficient from f_Y/f_j, compute the original MFCC coefficients X of Y before conversion using the modified MFCC extraction algorithm, and obtain, through the GMM-UBM based probability estimation algorithm, the probability Λ(X) that Y matches the model V_j;
an identification module, configured to compare the probability Λ(X) with the threshold θ: if the probability is greater than the threshold θ, the speech Y is a segment spoken by speaker j; otherwise, the speech Y was not spoken by j;
wherein the modified MFCC extraction algorithm in the test module is specifically implemented as follows:
after the windowing and the FFT in the MFCC extraction algorithm, the amplitude |F(k)| of the FFT coefficients is stretched by linear interpolation to obtain |F(k′)|, as shown in the following formulas:
|F(k′)|=μ|F(k)|+(1-μ)|F(k+1)| 0≤k<N/2 0≤k′<N/2
μ=k′/(1/α′)-k
where 1/α′ is the inverse of the estimated conversion coefficient, α′ is the estimated conversion coefficient, and α′ = f_Y/f_j.
Compared with the prior art, the beneficial effects of the present invention are: the invention achieves a large improvement in recognition performance over conventional recognition and forensics methods. The conversion coefficient is estimated from the mean fundamental frequency, the MFCC extraction algorithm is improved so that the MFCC features of the speech before conversion are computed directly, and the GMM-UBM based probability calculation is used to decide whether the test speech was spoken by a given target speaker; both the miss rate and the false-alarm rate are lower than those of conventional schemes.
Brief description of the drawings
Fig. 1 is a schematic diagram of the extraction process of the Mel frequency cepstral coefficients.
Fig. 2 is a schematic comparison of the estimated conversion coefficients with the true conversion coefficients (true coefficient α(k) = 2^{k/12}, estimated coefficient α′(y) = 2^{y/12}).
Fig. 3 shows the EER curves for the four frequency-domain disguise methods.
Fig. 4 shows the DET curves for the four frequency-domain methods.
Fig. 5 shows the EER curve for TD-PSOLA.
Fig. 6 shows the DET curve for TD-PSOLA.
Embodiment
As shown in Figs. 3-6, in the training stage the present invention computes the UBM (universal background model) λ_bkg from a background speech corpus using the EM (Expectation Maximization) algorithm. Also in the training stage, the MFCC coefficients and the fundamental frequency of the test speech S_j of speaker j are extracted, the GMM (Gaussian Mixture Model) λ_j of speaker j is computed using the MAP (Maximum A Posteriori) algorithm, and the mean fundamental frequency f_j is computed. The model V_j = (λ_j, f_j) of speaker j is built and stored in a model database, and a threshold θ is obtained. At the test stage, the speech Y is the transformed speech; its mean fundamental frequency f_Y is extracted, the conversion coefficient is computed from f_Y/f_j, and the original MFCC coefficients X of Y before conversion are computed with the modified MFCC extraction algorithm. The probability Λ(X) that Y matches the model V_j is then obtained through the GMM-UBM based probability estimation algorithm. If the probability is greater than the threshold θ, the speech Y is identified as a segment spoken by j; otherwise, Y is identified as not spoken by j.
The conversion coefficient is estimated as α′ = f_Y/f_j, where α′ is the estimated conversion coefficient and the mean fundamental frequency is obtained by averaging the fundamental frequency.
The fundamental frequency is extracted in the following steps (a sketch follows the list):
(1) the signal is windowed, and the signal of a predetermined length before and after an arbitrary time t_mid is taken;
(2) the autocorrelation function of the signal of said predetermined length and the autocorrelation function of the window function are computed;
(3) the two autocorrelation functions are divided; the location of the maximum gives the period T, from which the fundamental frequency F at time t_mid is obtained.
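A sketch of these three steps is given below; the Hamming window, the window length, and the F0 search range are assumptions added only for the example.

```python
import numpy as np

def estimate_f0(x, fs, t_mid, win_len=1024, f_min=60, f_max=400):
    """Autocorrelation-based F0 estimate around sample index t_mid."""
    half = win_len // 2
    seg = x[max(0, t_mid - half): t_mid + half]
    w = np.hamming(len(seg))
    seg = seg * w

    r_x = np.correlate(seg, seg, mode="full")[len(seg) - 1:]   # signal ACF
    r_w = np.correlate(w, w, mode="full")[len(w) - 1:]         # window ACF
    r = r_x / np.maximum(r_w, 1e-12)                           # step (3): divide

    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    lag = lag_min + np.argmax(r[lag_min:lag_max])              # period T in samples
    return fs / lag

# mean fundamental frequency over several analysis instants:
# f_mean = np.mean([estimate_f0(x, fs, t) for t in range(1024, len(x) - 1024, 512)])
```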
The modified extraction algorithm works as follows: after the windowing and FFT steps of the Mel frequency cepstral coefficient extraction algorithm, the amplitude |F(k)| of the FFT coefficients is stretched by linear interpolation to obtain |F(k′)|, as given by the following formulas:
|F(k′)| = μ|F(k)| + (1−μ)|F(k+1)|  0 ≤ k < N/2  0 ≤ k′ < N/2
k = ⌊k′/(1/α′)⌋
μ=k′/(1/α′)-k
where the linear-interpolation stretch factor 1/α′ is the inverse of the estimated conversion coefficient.
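This restoring interpolation is the inverse of the conversion-stage stretch shown earlier; a minimal sketch with illustrative names follows. The MFCC of the original voice is then approximated by feeding the restored magnitudes into the remaining MFCC steps.

```python
import numpy as np

def restore_magnitude(mag, alpha_est):
    """Approximate the pre-conversion magnitude spectrum by stretching the
    converted spectrum with factor 1/alpha_est (alpha_est = f_Y / f_j)."""
    half = len(mag) // 2
    restored = np.zeros_like(mag)
    for k_prime in range(half):
        k = int(k_prime * alpha_est)          # k = floor(k' / (1/alpha'))
        mu = k_prime * alpha_est - k
        if k + 1 < half:
            restored[k_prime] = mu * mag[k] + (1 - mu) * mag[k + 1]
        elif k < half:
            restored[k_prime] = mag[k]
    return restored
```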
The matching calculation uses the GMM-UBM based probability calculation method. Matching calculation refers to computing the probability of a speech segment under a given model; this probability reflects how likely it is that the segment was spoken by the speaker represented by that model.
The speech corpus and some experimental results obtained with the method of the invention are given below.
The speech corpus is TIMIT, the most commonly used corpus in speech and speaker recognition. It contains 630 speakers from 8 different regions, 192 female and 438 male. Each speaker reads 10 different utterances, for a total of 6300 utterances. All speech is in WAV format with 8 kHz sampling rate and 16-bit quantization. In this experiment TIMIT is divided into three sub-corpora:
1) UBM corpus: all speech segments of 60 male and 60 female speakers are concatenated to train the UBM.
2) Score-normalization corpus: speech segments of 40 female and 90 male speakers are used for score normalization (TNorm).
3) Development-evaluation corpus: 92 female and 288 male speakers. For each speaker j, 5 of the segments are concatenated into one segment used to train the 2048-component GMM model λ_j and to compute the mean fundamental frequency f_j; the remaining 5 segments are concatenated into another segment, to which disguise with different conversion coefficients is applied. The corpus used to train the speaker models is called the development corpus; the corpus used for disguise is called the evaluation corpus.
Five transformation tools (methods) are considered: the frequency-domain tools Adobe Audition, Audacity, GoldWave and RSITI, and the time-domain method TD-PSOLA. The conversion strength is specified in semitones (12 semitones per octave), and the conversion coefficient is related to the semitone shift k as follows:
α(k) = 2^{k/12}
In the experiments, only voice transformations with −11 ≤ k ≤ 11 are considered, because practical audio (voice) tools generally provide only −11 ≤ k ≤ 11.
The speech signal is pre-emphasized with the transfer function H(z) = 1 − 0.97z^{−1}.
The frame length is 1024 samples; the 24-dimensional MFCC feature consists of 12 MFCC coefficients and 12 ΔMFCC coefficients.
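A minimal preprocessing sketch combining the pre-emphasis filter and the 1024-sample framing is given below; the hop size is an assumption, since it is not specified above.

```python
import numpy as np

def preprocess(x, frame_len=1024, hop=512):
    """Pre-emphasis H(z) = 1 - 0.97 z^-1 followed by framing."""
    emphasized = np.append(x[0], x[1:] - 0.97 * x[:-1])
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    return np.stack([emphasized[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])
```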
An experimental example of estimating the voice conversion coefficient is given below. The estimated conversion coefficients for the same speaker are averaged and compared with the true conversion coefficients in Fig. 2.
The recognition performance is given next. Fig. 3 shows the EER (Equal Error Rate), the operating point at which the miss rate (False Reject Rate, FRR) equals the false-alarm rate (False Alarm Rate, FAR). The overall EERs are listed in Table 1 and Table 2.
Table 1. Overall EER, |k| ≤ 11 (%)
Table 2. Overall EER, |k| ≤ 8 (%)
Fig. 4 shows the DET (Detection Error Tradeoff) curves. It can be seen that the performance of the conventional scheme (baseline) is completely destroyed by the various disguise methods; that is, a conventional speaker recognition system cannot correctly identify the speaker of disguised speech. The method of the present invention (proposed, with estimated scaling factor) greatly reduces the error probability and can to a large extent identify the speaker, reaching a level acceptable for many applications. The performance of the method when the conversion coefficient is exactly known is also given (this is the best performance the present invention can reach). As the charts show, the performance achieved by the present invention is very close to this optimum.
The method of the present invention also covers the identification of TD-PSOLA disguise. The results are shown in Fig. 5 and Fig. 6. Here the conventional scheme performs slightly better than the proposed method. However, because TD-PSOLA cannot preserve the auditory naturalness of the speech even at small conversion strengths, its range of application is small, and current application software in practice no longer uses this method.

Claims (5)

1. A method for recognizing disguised voice, characterized in that the method comprises:
in the training stage, computing a universal background model (UBM) λ_bkg from a background speech corpus using the expectation-maximization (EM) algorithm;
in the training stage, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j using the maximum a posteriori (MAP) algorithm, and computing the mean fundamental frequency f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in a model database;
in the training stage, obtaining a threshold θ, the threshold θ being obtained as follows: computing client scores and impostor scores, and selecting the threshold θ from the distributions of these two classes of scores so as to achieve the miss rate and false-alarm rate required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under other speakers' models;
at the test stage, the speech Y being transformed speech, extracting the mean fundamental frequency f_Y of the speech Y; computing the conversion coefficient from f_Y/f_j; computing the original MFCC coefficients X of Y before conversion using the modified MFCC extraction algorithm; and obtaining, through the GMM-UBM based probability estimation algorithm, the probability Λ(X) that Y matches the model V_j;
comparing the probability Λ(X) with the threshold θ: if the probability is greater than the threshold θ, the speech Y is a segment spoken by speaker j; otherwise, the speech Y was not spoken by j;
wherein the modified MFCC extraction algorithm is specifically: after the windowing and the FFT in the MFCC extraction algorithm, the amplitude |F(k)| of the FFT coefficients is stretched by linear interpolation to obtain |F(k′)|, as shown in the following formulas:
|F(k′)|=μ|F(k)|+(1-μ)|F(k+1)| 0≤k<N/2 0≤k′<N/2
k = ⌊k′/(1/α′)⌋
μ=k′/(1/α′)-k
where 1/α′ is the inverse of the estimated conversion coefficient, α′ is the estimated conversion coefficient, and α′ = f_Y/f_j.
2. The method for recognizing disguised voice according to claim 1, characterized in that the fundamental frequency is extracted as follows:
(1) the signal is windowed, and the signal of a predetermined length before and after an arbitrary time t_mid is taken;
(2) the autocorrelation function of the signal of said predetermined length and the autocorrelation function of the window function are computed;
(3) the two autocorrelation functions are divided; the location of the maximum gives the period T, from which the fundamental frequency F at time t_mid is obtained.
3. The method for recognizing disguised voice according to claim 2, characterized in that the mean fundamental frequency is mean(F), where mean() denotes averaging.
4. The method for recognizing disguised voice according to claim 1, characterized in that, when α′ > 1, spectrum compensation is performed; letting the Nyquist frequency be F_N, the compensation consists of symmetrically copying the spectrum between F_N/α′ − F_N/2 and F_N/(2α′) into the range from F_N/(2α′) to F_N/2.
5. A device for recognizing disguised voice, characterized by comprising:
a training module, configured to compute a universal background model (UBM) λ_bkg from a background speech corpus using the expectation-maximization (EM) algorithm; to extract the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, compute the Gaussian mixture model (GMM) λ_j of speaker j using the maximum a posteriori (MAP) algorithm, and compute the mean fundamental frequency f_j; to build the model V_j = (λ_j, f_j) of speaker j and store it in a model database; and to obtain a threshold θ in the training stage;
wherein the threshold θ is obtained as follows: computing client scores and impostor scores, and selecting the threshold θ from the distributions of these two classes of scores so as to achieve the miss rate and false-alarm rate required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under other speakers' models;
a test module, configured to, when the speech Y is transformed speech, extract its mean fundamental frequency f_Y, compute the conversion coefficient from f_Y/f_j, compute the original MFCC coefficients X of Y before conversion using the modified MFCC extraction algorithm, and obtain, through the GMM-UBM based probability estimation algorithm, the probability Λ(X) that Y matches the model V_j;
an identification module, configured to compare the probability Λ(X) with the threshold θ: if the probability is greater than the threshold θ, the speech Y is a segment spoken by speaker j; otherwise, the speech Y was not spoken by j;
wherein the modified MFCC extraction algorithm adopted in the test module is specifically: after the windowing and the FFT in the MFCC extraction algorithm, the amplitude |F(k)| of the FFT coefficients is stretched by linear interpolation to obtain |F(k′)|, as shown in the following formulas:
|F(k′)|=μ|F(k)|+(1-μ)|F(k+1)| 0≤k<N/2 0≤k′<N/2
k = ⌊k′/(1/α′)⌋
μ=k′/(1/α′)-k
where 1/α′ is the inverse of the estimated conversion coefficient, α′ is the estimated conversion coefficient, and α′ = f_Y/f_j.
CN201310728591.XA 2013-12-24 2013-12-24 Method and device for recognizing disguised sounds Expired - Fee Related CN103730121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310728591.XA CN103730121B (en) 2013-12-24 2013-12-24 Method and device for recognizing disguised sounds

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310728591.XA CN103730121B (en) 2013-12-24 2013-12-24 Method and device for recognizing disguised sounds

Publications (2)

Publication Number Publication Date
CN103730121A true CN103730121A (en) 2014-04-16
CN103730121B CN103730121B (en) 2016-08-24

Family

ID=50454168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310728591.XA Expired - Fee Related CN103730121B (en) 2013-12-24 2013-12-24 Method and device for recognizing disguised sounds

Country Status (1)

Country Link
CN (1) CN103730121B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005345599A (en) * 2004-06-01 2005-12-15 Toshiba Tec Corp Speaker-recognizing device, program, and speaker-recognizing method
CN1914667A (en) * 2004-06-01 2007-02-14 东芝泰格有限公司 Speaker recognizing device, program, and speaker recognizing method
CN1967657A (en) * 2005-11-18 2007-05-23 成都索贝数码科技股份有限公司 Automatic tracking and tonal modification system of speaker in program execution and method thereof
CN101399044A (en) * 2007-09-29 2009-04-01 国际商业机器公司 Voice conversion method and system
CN102354496A (en) * 2011-07-01 2012-02-15 中山大学 PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104183245A (en) * 2014-09-04 2014-12-03 福建星网视易信息***有限公司 Method and device for recommending music stars with tones similar to those of singers
CN105976819A (en) * 2016-03-23 2016-09-28 广州势必可赢网络科技有限公司 Rnorm score normalization based speaker verification method
CN109215680B (en) * 2018-08-16 2020-06-30 公安部第三研究所 Voice restoration method based on convolutional neural network
CN109215680A (en) * 2018-08-16 2019-01-15 公安部第三研究所 A kind of voice restoration method based on convolutional neural networks
CN109741761A (en) * 2019-03-13 2019-05-10 百度在线网络技术(北京)有限公司 Sound processing method and device
CN109920435A (en) * 2019-04-09 2019-06-21 厦门快商通信息咨询有限公司 A kind of method for recognizing sound-groove and voice print identification device
CN110363406A (en) * 2019-06-27 2019-10-22 上海淇馥信息技术有限公司 Appraisal procedure, device and the electronic equipment of a kind of client intermediary risk
CN111462763A (en) * 2019-09-21 2020-07-28 美律电子(深圳)有限公司 Computer-implemented voice command verification method and electronic device
CN111462763B (en) * 2019-09-21 2024-02-27 美律电子(深圳)有限公司 Voice command verification method implemented by computer and electronic device
CN111739547A (en) * 2020-07-24 2020-10-02 深圳市声扬科技有限公司 Voice matching method and device, computer equipment and storage medium
CN112967712A (en) * 2021-02-25 2021-06-15 中山大学 Synthetic speech detection method based on autoregressive model coefficient
CN113270112A (en) * 2021-04-29 2021-08-17 中国人民解放军陆军工程大学 Electronic camouflage voice automatic distinguishing and restoring method and system
CN116013323A (en) * 2022-12-27 2023-04-25 浙江大学 Active evidence obtaining method oriented to voice conversion

Also Published As

Publication number Publication date
CN103730121B (en) 2016-08-24

Similar Documents

Publication Publication Date Title
CN103730121A (en) Method and device for recognizing disguised sounds
CN103236260B (en) Speech recognition system
CN106847292B (en) Method for recognizing sound-groove and device
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
Villalba et al. Detecting replay attacks from far-field recordings on speaker verification systems
CN102968990B (en) Speaker identifying method and system
Yu et al. Uncertainty propagation in front end factor analysis for noise robust speaker recognition
CN102354496B (en) PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof
CN104464724A (en) Speaker recognition method for deliberately pretended voices
CN104900229A (en) Method for extracting mixed characteristic parameters of voice signals
CN103077728B (en) A kind of patient&#39;s weak voice endpoint detection method
CN106409298A (en) Identification method of sound rerecording attack
Zhang et al. Joint information from nonlinear and linear features for spoofing detection: An i-vector/DNN based approach
CA2492204A1 (en) Similar speaking recognition method and system using linear and nonlinear feature extraction
Shchemelinin et al. Examining vulnerability of voice verification systems to spoofing attacks by means of a TTS system
CN104240706A (en) Speaker recognition method based on GMM Token matching similarity correction scores
CN101887722A (en) Rapid voiceprint authentication method
CN106782508A (en) The cutting method of speech audio and the cutting device of speech audio
CN105280181A (en) Training method for language recognition model and language recognition method
Alam et al. Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus.
CN104464738A (en) Vocal print recognition method oriented to smart mobile device
Mohammadi et al. Robust features fusion for text independent speaker verification enhancement in noisy environments
CN116665649A (en) Synthetic voice detection method based on prosody characteristics
Zhang The algorithm of voiceprint recognition model based DNN-RELIANCE
Hanilci et al. VQ-UBM based speaker verification through dimension reduction using local PCA

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160824

Termination date: 20211224

CF01 Termination of patent right due to non-payment of annual fee