CN103730121A - Method and device for recognizing disguised sounds - Google Patents
- Publication number: CN103730121A
- Application number: CN201310728591.XA
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a method and device for recognizing disguised voices. The voice-transformation coefficient is estimated from the fundamental-frequency characteristics of the speech, and the Mel-frequency cepstral coefficient (MFCC) extraction algorithm is modified: the estimated coefficient is incorporated into MFCC extraction through linear-interpolation scaling of the spectrum, so that the MFCCs of the speech before transformation can be computed approximately. The method is then embedded in a GMM-UBM recognition framework to compute the similarity between voices; in addition, the estimated transformation coefficient can be used to restore the transformed speech toward the original voice. Compared with conventional forensic recognition methods, the recognition performance is greatly improved, and both the miss rate and the false-alarm rate are lower than those of conventional schemes.
Description
Technical field
The present invention relates to the field of multimedia information security, and more specifically to a method and device for recognizing disguised voices.
Background art
Voice transformation is one of the most common speech-processing operations. Its purpose is to turn one voice into a natural-sounding but entirely different voice. Voice transformation is generally used for music production or for protecting a speaker's safety and privacy, but it can also be used by criminals to disguise their voices and evade identification. Identifying the speaker of transformed speech therefore has important practical value.
The general steps of voice transformation are:

1) Divide the signal x(n) into frames and apply a window.

2) Compute the instantaneous amplitude.

3) Compute the instantaneous frequency from the phase relation between the current frame and the previous frame, where F_s is the sampling frequency and Δ is the deviation from the centre frequency.

4) Stretch the spectrum. First, linearly interpolate the instantaneous amplitude:

|F(k′)| = μ|F(k)| + (1 − μ)|F(k+1)|,  0 ≤ k < N/2, 0 ≤ k′ < N/2  (4)

μ = k′/α − k  (6)

Where no confusion can arise, the interpolated instantaneous amplitude is still written |F(k)|.

Then shift the frequency lines:

ω′(k·α) = ω(k)·α,  0 ≤ k < N/2, 0 ≤ k·α < N/2  (7)

Where no confusion can arise, the shifted instantaneous frequency is still written ω(k).

5) Compute the instantaneous phase φ(k) from the instantaneous frequency and obtain the FFT coefficients of the transformed speech:

F(k) = |F(k)| e^{jφ(k)}  (8)

6) Apply the inverse FFT to F(k) to obtain the transformed signal.
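As a concrete illustration, the amplitude-interpolation of step 4 (equations (4) and (6)) can be sketched in NumPy as follows. This is a sketch, not the patent's implementation: the function name is ours, out-of-range source bins are zero-filled, and the fractional weight μ is applied to the upper neighbouring bin (standard linear interpolation).

```python
import numpy as np

def stretch_amplitudes(mag, alpha):
    """Stretch the amplitude spectrum |F(k)| by factor alpha via linear
    interpolation between neighbouring bins (step 4 of the conversion)."""
    half = len(mag)
    kp = np.arange(half)                 # target bins k'
    src = kp / alpha                     # fractional source position k'/alpha
    k = np.floor(src).astype(int)        # lower neighbouring bin
    mu = src - k                         # fractional offset mu = k'/alpha - k
    k1 = np.minimum(k + 1, half - 1)     # upper neighbouring bin, clipped
    valid = k < half                     # source positions inside the spectrum
    out = np.zeros(half)
    out[valid] = (1 - mu[valid]) * mag[k[valid]] + mu[valid] * mag[k1[valid]]
    return out
```

With alpha = 1 the interpolation reduces to the identity, which is a quick sanity check on the bin arithmetic.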
The MFCC extraction process is shown in Fig. 1. The concrete steps are:

1) Windowing and spectrum computation. MFCC extraction uses a Hamming window of N = 1024 points, and the FFT is applied to the windowed source signal x(n).

2) Mel filtering (triangular filters) and log transform. The weighting windows are triangular; the m-th filter is centred at bin k_m = f(m)·N/F_s, where F_s is the sampling frequency. After weighting the FFT energy spectrum with the triangular windows, the logarithm is taken.

3) Inverse cosine transform. Finally, the inverse cosine transform of the log filterbank energies yields the Mel cepstral coefficients, i.e., the MFCCs.
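The three steps above can be sketched for a single frame as follows. This is a minimal illustration under assumptions: the mel-scale formula, the filterbank layout, and the default filter/coefficient counts are conventional choices, not parameters stated by the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_mels=24, n_ceps=12):
    """MFCC of one frame: Hamming window -> FFT power spectrum ->
    triangular mel filterbank -> log -> inverse cosine transform (DCT)."""
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n))) ** 2
    # filter edge frequencies f(m), mapped to bins k_m = f(m) * N / Fs
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0),
                                  n_mels + 2))
    bins = np.floor(edges * n / fs).astype(int)
    energies = np.empty(n_mels)
    for m in range(n_mels):
        lo, c, hi = bins[m], bins[m + 1], bins[m + 2]
        w = np.zeros(len(spec))          # triangular weighting window
        if c > lo:
            w[lo:c] = (np.arange(lo, c) - lo) / (c - lo)
        if hi > c:
            w[c:hi] = (hi - np.arange(c, hi)) / (hi - c)
        energies[m] = np.log(np.dot(w, spec) + 1e-10)
    # DCT-II of the log filterbank energies -> cepstral coefficients
    m_idx = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), m_idx + 0.5) / n_mels)
    return dct @ energies
```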
GMM-UBM

Speaker recognition can be cast as a test between two hypotheses:

H_0: Y comes from speaker S;
H_1: Y does not come from speaker S.

Mathematically, H_0 is represented by the model λ_hyp of speaker S, and H_1 by the universal background model λ_bkg. The probability is computed as shown in Equation (14).
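The hypothesis test is scored as a log-likelihood ratio between the speaker model λ_hyp and the background model λ_bkg. The sketch below assumes diagonal-covariance GMMs whose parameters (weights, means, variances) are already trained; the function names are ours.

```python
import numpy as np

def gmm_loglik(X, weights, means, covs):
    """Average per-frame log-likelihood of frames X (T, D) under a
    diagonal-covariance GMM given as lists of (weight, mean, variance)."""
    T = X.shape[0]
    ll = np.empty((T, len(weights)))
    for i, (w, mu, var) in enumerate(zip(weights, means, covs)):
        diff = X - mu
        ll[:, i] = (np.log(w)
                    - 0.5 * np.sum(np.log(2.0 * np.pi * var))
                    - 0.5 * np.sum(diff * diff / var, axis=1))
    m = ll.max(axis=1, keepdims=True)    # log-sum-exp over mixtures
    return float(np.mean(m[:, 0] + np.log(np.sum(np.exp(ll - m), axis=1))))

def gmm_ubm_score(X, spk, ubm):
    """Likelihood-ratio score: log p(X | speaker GMM) - log p(X | UBM);
    the decision compares this score against a threshold."""
    return gmm_loglik(X, *spk) - gmm_loglik(X, *ubm)
```

Frames close to the speaker model's means score positively; frames better explained by the background model score negatively.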
With the widespread use of audio technology, protecting audio productions has become a research hotspot in information security, and audio forensics is one of its important branches. Speaker identification after voice transformation has important applications in judicial, commercial, and other settings. Experimental results show that, after a large transformation of the voice, conventional speaker-recognition schemes suffer high miss and false-alarm rates and fail completely.
Summary of the invention
The primary purpose of the present invention is to propose a method for recognizing disguised voices; with this method the speaker of a transformed audio recording can be identified, which has great practical value for speaker identification after voice transformation.
Another object of the present invention is to propose a device for recognizing disguised voices.
To remedy the deficiencies of the prior art, the technical solution adopted by the present invention is as follows:
A method for recognizing disguised voices, the method comprising:

In the training stage, computing the universal background model (UBM) λ_bkg from a background speech corpus with the expectation-maximization (EM) algorithm;

In the training stage, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the enrollment speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j with the maximum a posteriori (MAP) algorithm, and computing the mean fundamental frequency f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in a model database;

In the training stage, obtaining a threshold θ. The threshold θ is obtained as follows: compute client scores and impostor scores, and use the distributions of these two classes of scores to choose a θ that achieves the miss and false-alarm rates required by the application, where a client score is the probability of a speaker's speech segment under that speaker's own model, and an impostor score is the probability of a speaker's speech segment under another speaker's model;

In the test stage, for a transformed voice Y, extracting the mean fundamental frequency f_y of the voice Y; computing the conversion coefficient as f_y/f_j; computing, with the modified MFCC extraction algorithm, the original MFCC coefficients X of Y before transformation; and obtaining, through the GMM-UBM-based probability-estimation algorithm, the probability Λ(X) that Y matches model V_j;

Comparing the probability Λ(X) with the threshold θ: if the probability is greater than θ, the voice Y is a segment spoken by j; otherwise the voice Y was not spoken by j.

The modified MFCC extraction algorithm is specifically: after the windowing and FFT steps of the MFCC extraction algorithm, the amplitudes |F(k)| of the FFT coefficients are stretched by linear interpolation to obtain |F(k′)|, as given by:

|F(k′)| = μ|F(k)| + (1 − μ)|F(k+1)|,  0 ≤ k < N/2, 0 ≤ k′ < N/2

μ = k′/(1/α′) − k

where 1/α′ is the inverse of the estimated conversion coefficient, α′ being the estimated conversion coefficient, α′ = f_y/f_j.
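The test-phase estimate α′ = f_y/f_j and the interpolation with scale 1/α′ inside MFCC extraction might look as follows. This is a sketch under assumptions: the helper names are ours, standard interpolation weights are used, and bins whose source position falls outside the spectrum are left at zero here (when α′ > 1 the patent fills them with a separate spectrum-compensation step).

```python
import numpy as np

def estimate_alpha(f_y, f_j):
    """Estimated conversion coefficient alpha' = f_y / f_j: mean F0 of the
    test utterance over the speaker's enrolled mean F0."""
    return f_y / f_j

def unscale_spectrum(mag, alpha_p):
    """Undo the disguise's spectral stretch inside MFCC extraction:
    interpolate |F(k)| with scale factor 1/alpha', i.e. the source
    position for target bin k' is k'/(1/alpha') = k' * alpha'."""
    half = len(mag)
    src = np.arange(half) * alpha_p      # fractional source positions
    k = np.floor(src).astype(int)
    mu = src - k                         # mu = k'/(1/alpha') - k
    out = np.zeros(half)
    ok = k + 1 < half                    # both neighbouring bins available
    out[ok] = (1 - mu[ok]) * mag[k[ok]] + mu[ok] * mag[k[ok] + 1]
    return out
```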
In a preferred scheme, the fundamental frequency is extracted as follows:
(1) window the signal and take, for any instant t_mid, the samples within a predetermined length before and after it;
(2) compute the autocorrelation function of this signal segment and the autocorrelation function of the window function;
(3) divide the two correlation functions; the position of the maximum gives the period T, from which the fundamental frequency F at instant t_mid is obtained.
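The three extraction steps can be sketched as follows; the window choice and the F0 search range are our assumptions, not values stated in the source.

```python
import numpy as np

def estimate_f0(x, fs, fmin=50.0, fmax=500.0):
    """F0 by the autocorrelation method: autocorrelation of the windowed
    signal divided by the window's own autocorrelation; the peak location
    gives the period T, and F0 = fs / T."""
    w = np.hamming(len(x))

    def acf(s):
        r = np.correlate(s, s, mode='full')
        return r[len(s) - 1:]            # non-negative lags only

    rx, rw = acf(x * w), acf(w)
    norm = rx / np.maximum(rw, 1e-12)    # divide the two correlations
    lo, hi = int(fs / fmax), int(fs / fmin)
    T = lo + int(np.argmax(norm[lo:hi])) # lag of the maximum = period
    return fs / T
```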
In a preferred scheme, the mean fundamental frequency is mean(F), where mean(·) denotes averaging.
In a preferred scheme, when α′ > 1, spectrum compensation is required. Let F_N denote the Nyquist frequency. The compensation symmetrically mirrors the spectrum in the band from 2F_N/α′ − F_N to F_N/α′ into the band from F_N/α′ to F_N. The effect of this compensation is to approximately restore the amplitudes in the band from F_N/α′ to F_N, so that the MFCCs computed after restoration approach the original MFCCs.
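A sketch of this compensation, assuming the mirror-copy reading of the scheme: bins above the fold point F_N/α′ are filled by reflecting the bins just below it. The fold-point bin arithmetic is our interpretation of the (garbled) source.

```python
import numpy as np

def compensate_spectrum(mag, alpha_p):
    """Spectrum compensation for alpha' > 1: fill the missing band above
    F_N/alpha' by mirroring the spectrum symmetrically about that point."""
    out = mag.copy()
    half = len(mag)
    b = int(half / alpha_p)          # bin corresponding to F_N / alpha'
    for i in range(b, half):
        src = max(2 * b - i, 0)      # mirror image about the fold point
        out[i] = mag[src]
    return out
```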
A device for recognizing disguised voices, comprising:

a training module, for computing the universal background model (UBM) λ_bkg from a background speech corpus with the expectation-maximization (EM) algorithm; extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the enrollment speech S_j of speaker j; computing the Gaussian mixture model (GMM) λ_j of speaker j with the MAP algorithm; computing the mean fundamental frequency f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in a model database; and obtaining the threshold θ in the training stage;

wherein the threshold θ is obtained by computing client scores and impostor scores and using the distributions of these two classes of scores to choose a θ that achieves the miss and false-alarm rates required by the application, a client score being the probability of a speaker's speech segment under that speaker's own model, and an impostor score being the probability of a speaker's speech segment under another speaker's model;

a test module which, for a transformed voice Y, extracts its mean fundamental frequency f_y; computes the conversion coefficient as f_y/f_j; computes, with the modified MFCC extraction algorithm, the original MFCC coefficients X of Y before transformation; and obtains, through the GMM-UBM-based probability-estimation algorithm, the probability Λ(X) that Y matches model V_j;

an identification module, which compares the probability Λ(X) with the threshold θ: if the probability is greater than θ, the voice Y is a segment spoken by j; otherwise the voice Y was not spoken by j.

The modified MFCC extraction algorithm used in the test module is specifically: after the windowing and FFT steps of the MFCC extraction algorithm, the amplitudes |F(k)| of the FFT coefficients are stretched by linear interpolation to obtain |F(k′)|, as given by:

|F(k′)| = μ|F(k)| + (1 − μ)|F(k+1)|,  0 ≤ k < N/2, 0 ≤ k′ < N/2

μ = k′/(1/α′) − k

where 1/α′ is the inverse of the estimated conversion coefficient, α′ being the estimated conversion coefficient, α′ = f_y/f_j.
Compared with the prior art, the beneficial effects of the present invention are: the recognition performance is greatly improved over conventional forensic recognition methods. The mean fundamental frequency is used to estimate the conversion coefficient, the MFCC extraction algorithm is modified so that the MFCC features of the speech before transformation can be computed directly, and the GMM-UBM-based probability calculation then decides whether the test speech was spoken by a given target speaker; both the miss rate and the false-alarm rate are lower than those of conventional schemes.
Brief description of the drawings
Fig. 1 is a schematic diagram of the Mel-frequency cepstral coefficient extraction process.
Fig. 2 compares the estimated conversion coefficients with the true ones (true coefficient α(k) = 2^{k/12}, estimated coefficient α′(y) = 2^{y/12}).
Fig. 3 shows the EER curves for the four frequency-domain disguise methods.
Fig. 4 shows the DET curves for the four frequency-domain methods.
Fig. 5 shows the EER curve for TD-PSOLA.
Fig. 6 shows the DET curve for TD-PSOLA.
Embodiment
As shown in Figs. 3-6, in the training stage the invention computes the UBM (universal background model) λ_bkg from a background speech corpus with the EM (expectation-maximization) algorithm. Also in the training stage, the MFCC coefficients and fundamental frequency of the enrollment speech S_j of speaker j are extracted, the GMM (Gaussian mixture model) λ_j of speaker j is computed with the MAP (maximum a posteriori) algorithm, and the mean fundamental frequency f_j is computed. The model V_j = (λ_j, f_j) of speaker j is built and stored in a model database, and the threshold θ is obtained. In the test stage, for a transformed voice Y, its mean fundamental frequency f_y is extracted; the conversion coefficient is computed as f_y/f_j; the modified MFCC extraction algorithm computes the original MFCC coefficients X of Y before transformation; and the GMM-UBM-based probability-estimation algorithm yields the probability Λ(X) that Y matches model V_j. If this probability is greater than the threshold θ, the voice Y is identified as a segment spoken by j; otherwise the voice Y is identified as not spoken by j.

The conversion coefficient is estimated as α′ = f_y/f_j, where α′ is the estimated conversion coefficient and the mean fundamental frequency is the average of the fundamental-frequency values.
The fundamental frequency is extracted as follows:
(1) window the signal and take, for any instant t_mid, the samples within a predetermined length before and after it;
(2) compute the autocorrelation function of this signal segment and the autocorrelation function of the window function;
(3) divide the two correlation functions; the position of the maximum gives the period T, from which the fundamental frequency F at instant t_mid is obtained.
The modified extraction algorithm stretches, after the windowing and FFT steps of the Mel-frequency cepstral coefficient extraction algorithm, the amplitudes |F(k)| of the FFT coefficients by linear interpolation to obtain |F(k′)|:

|F(k′)| = μ|F(k)| + (1 − μ)|F(k+1)|,  0 ≤ k < N/2, 0 ≤ k′ < N/2

μ = k′/(1/α′) − k

where the interpolation scale 1/α′ is the inverse of the estimated conversion coefficient.
The matching computation is the GMM-UBM-based probability calculation: it computes the probability of a speech segment under a given model, and this probability reflects the probability that the segment was spoken by the speaker the model represents.
Some experimental results obtained with the method of the invention are given below, together with the speech corpus used.

The corpus is TIMIT, the most widely used corpus in speech/speaker recognition. It contains 630 speakers (192 female and 438 male) from 8 different dialect regions. Each speaker reads 10 different utterances, for a total of 6300 utterances. All speech is in WAV format with 8 kHz sampling rate and 16-bit quantization. The experiment divides TIMIT into three sub-corpora:

1) universal background corpus: all speech segments of 60 men and 60 women, concatenated to train the UBM;
2) score-normalization corpus: the speech segments of 40 women and 90 men, used for score normalization (TNorm);
3) development-evaluation corpus: 92 women and 288 men. For each speaker j, 5 of the segments are concatenated into one segment to train the 2048-mixture GMM model λ_j and the mean fundamental frequency f_j. The remaining 5 segments are concatenated into one segment, which is disguised with different conversion coefficients. The sub-corpus used to train the speaker models is called the development corpus; the disguised one is called the evaluation corpus.
Five transformation tools (methods) are considered: the frequency-domain tools Adobe Audition, Audacity, GoldWave, and RSITI, and the time-domain tool TD-PSOLA. The transformation strength is expressed as a shift of k semitones, related to the conversion coefficient by

α(k) = 2^{k/12}

The experiments only consider transformations with −11 ≤ k ≤ 11, because practical audio (voice) tools generally provide the range −11 ≤ k ≤ 11.
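The semitone-to-coefficient relation α(k) = 2^{k/12} can be tabulated directly over the experimental range:

```python
# Conversion coefficient for a disguise of k semitones: alpha(k) = 2**(k/12)
alphas = {k: 2.0 ** (k / 12.0) for k in range(-11, 12)}
```

For example, a shift of +7 semitones corresponds to a coefficient of about 1.498, and shifts of +k and −k semitones multiply to 1 (they are mutually inverse transformations).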
The speech signal is pre-emphasized with the transfer function H(z) = 1 − 0.97z^{−1}.

The frame length is 1024 samples; the 24-dimensional MFCC matrix consists of 12 MFCC coefficients and 12 ΔMFCC coefficients.
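Pre-emphasis and framing might be implemented as follows; the hop size is our assumption, since the source specifies only the frame length of 1024 samples.

```python
import numpy as np

def preemphasize(x, coeff=0.97):
    """Pre-emphasis H(z) = 1 - 0.97 z^{-1}: y[n] = x[n] - 0.97 * x[n-1]."""
    return np.append(x[0], x[1:] - coeff * x[:-1])

def frame_signal(x, frame_len=1024, hop=512):
    """Split the signal into frames of 1024 samples (hop is an assumption)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])
```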
An experimental example of estimating the voice-conversion coefficient is given next. The estimates of each speaker's conversion coefficient are averaged and compared with the true conversion coefficients in Fig. 2.

The recognition performance is given next. Fig. 3 shows the EER (equal error rate, the operating point at which the false-reject rate, FRR, equals the false-alarm rate, FAR). The overall EERs are listed in Tables 1 and 2.

Table 1: overall EER, |k| ≤ 11 (%)
Table 2: overall EER, |k| ≤ 8 (%)
Fig. 4 shows the DET (detection error tradeoff) curves. It can be seen that the performance of the conventional scheme (baseline) is completely destroyed by the various disguise methods; that is, a conventional speaker-recognition system cannot correctly identify the speaker of disguised speech. The method of the invention (proposed, with estimated scaling factor) greatly reduces the error probability and can identify the speaker to a large extent, reaching an acceptable level for many applications. Also shown is the performance of the method when the conversion coefficient is exactly known (the best performance attainable with the invention); the charts show that the performance achieved by the invention comes very close to this optimum.

The method of the invention also covers recognition of TD-PSOLA disguise; the results are shown in Figs. 5 and 6. Here the conventional scheme performs slightly better than the proposed method. However, because TD-PSOLA cannot preserve the auditory naturalness of speech even at small transformation strengths, its range of application is small, and current practical application software in fact no longer uses this method.
Claims (5)

1. A method for recognizing disguised voices, characterized in that the method comprises:
in the training stage, computing the universal background model (UBM) λ_bkg from a background speech corpus with the expectation-maximization (EM) algorithm;
in the training stage, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the enrollment speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j with the maximum a posteriori (MAP) algorithm, and computing the mean fundamental frequency f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in a model database;
in the training stage, obtaining a threshold θ, the threshold θ being obtained by computing client scores and impostor scores and using the distributions of these two classes of scores to choose a θ that achieves the miss and false-alarm rates required by the application, where a client score is the probability of a speaker's speech segment under that speaker's own model, and an impostor score is the probability of a speaker's speech segment under another speaker's model;
in the test stage, for a transformed voice Y, extracting the mean fundamental frequency f_y of the voice Y; computing the conversion coefficient as f_y/f_j; computing, with the modified MFCC extraction algorithm, the original MFCC coefficients X of Y before transformation; and obtaining, through the GMM-UBM-based probability-estimation algorithm, the probability Λ(X) that Y matches model V_j;
comparing the probability Λ(X) with the threshold θ: if the probability is greater than θ, the voice Y is a segment spoken by j; otherwise the voice Y was not spoken by j;
wherein the modified MFCC extraction algorithm is specifically: after the windowing and FFT steps of the MFCC extraction algorithm, the amplitudes |F(k)| of the FFT coefficients are stretched by linear interpolation to obtain |F(k′)|, as given by:

|F(k′)| = μ|F(k)| + (1 − μ)|F(k+1)|,  0 ≤ k < N/2, 0 ≤ k′ < N/2

μ = k′/(1/α′) − k

where 1/α′ is the inverse of the estimated conversion coefficient, α′ being the estimated conversion coefficient, α′ = f_y/f_j.
2. The method for recognizing disguised voices according to claim 1, characterized in that the fundamental frequency is extracted as follows:
(1) window the signal and take, for any instant t_mid, the samples within a predetermined length before and after it;
(2) compute the autocorrelation function of this signal segment and the autocorrelation function of the window function;
(3) divide the two correlation functions; the position of the maximum gives the period T, from which the fundamental frequency F at instant t_mid is obtained.

3. The method for recognizing disguised voices according to claim 2, characterized in that the mean fundamental frequency is mean(F), where mean(·) denotes averaging.
4. The method for recognizing disguised voices according to claim 1, characterized in that, when α′ > 1, spectrum compensation is performed; letting F_N be the Nyquist frequency, the compensation symmetrically mirrors the spectrum in the band from 2F_N/α′ − F_N to F_N/α′ into the band from F_N/α′ to F_N.
5. A device for recognizing disguised voices, characterized by comprising:
a training module, for computing the universal background model (UBM) λ_bkg from a background speech corpus with the expectation-maximization (EM) algorithm; extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the enrollment speech S_j of speaker j; computing the Gaussian mixture model (GMM) λ_j of speaker j with the MAP algorithm; computing the mean fundamental frequency f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in a model database; and obtaining the threshold θ in the training stage;
wherein the threshold θ is obtained by computing client scores and impostor scores and using the distributions of these two classes of scores to choose a θ that achieves the miss and false-alarm rates required by the application, a client score being the probability of a speaker's speech segment under that speaker's own model, and an impostor score being the probability of a speaker's speech segment under another speaker's model;
a test module which, for a transformed voice Y, extracts its mean fundamental frequency f_y; computes the conversion coefficient as f_y/f_j; computes, with the modified MFCC extraction algorithm, the original MFCC coefficients X of Y before transformation; and obtains, through the GMM-UBM-based probability-estimation algorithm, the probability Λ(X) that Y matches model V_j;
an identification module, which compares the probability Λ(X) with the threshold θ: if the probability is greater than θ, the voice Y is a segment spoken by j; otherwise the voice Y was not spoken by j;
wherein the modified MFCC extraction algorithm used in the test module is specifically: after the windowing and FFT steps of the MFCC extraction algorithm, the amplitudes |F(k)| of the FFT coefficients are stretched by linear interpolation to obtain |F(k′)|, as given by:

|F(k′)| = μ|F(k)| + (1 − μ)|F(k+1)|,  0 ≤ k < N/2, 0 ≤ k′ < N/2

μ = k′/(1/α′) − k

where 1/α′ is the inverse of the estimated conversion coefficient, α′ being the estimated conversion coefficient, α′ = f_y/f_j.
Priority Application (1)
- CN201310728591.XA, priority and filing date 2013-12-24: CN103730121B, "A kind of recognition methods pretending sound and device"

Publications (2)
- CN103730121A (application), published 2014-04-16
- CN103730121B (grant), published 2016-08-24
Cited By (11)
- CN104183245A (2014-12-03): Method and device for recommending music stars with tones similar to those of singers
- CN105976819A (2016-09-28): Rnorm score normalization based speaker verification method
- CN109215680A (2019-01-15): A voice restoration method based on convolutional neural networks
- CN109741761A (2019-05-10): Sound processing method and device
- CN109920435A (2019-06-21): A voiceprint recognition method and voiceprint recognition device
- CN110363406A (2019-10-22): Client intermediary risk assessment method and device, and electronic device
- CN111462763A (2020-07-28): Computer-implemented voice command verification method and electronic device
- CN111739547A (2020-10-02): Voice matching method and device, computer equipment, and storage medium
- CN112967712A (2021-06-15): Synthetic speech detection method based on autoregressive model coefficients
- CN113270112A (2021-08-17): Method and system for automatic discrimination and restoration of electronically disguised speech
- CN116013323A (2023-04-25): Active forensics method oriented to voice conversion
Patent Citations (5)
- JP2005345599A (2005-12-15): Speaker-recognizing device, program, and speaker-recognizing method
- CN1914667A (2007-02-14): Speaker recognizing device, program, and speaker recognizing method
- CN1967657A (2007-05-23): Automatic speaker tracking and tonal-modification system for program production, and method thereof
- CN101399044A (2009-04-01): Voice conversion method and system
- CN102354496A (2012-02-15): PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof
Legal Events
- C06 / PB01: Publication
- C10 / SE01: Entry into substantive examination (entry into force of request for substantive examination)
- C14 / GR01: Grant of patent or utility model
- CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 2016-08-24; termination date: 2021-12-24)