CN103280224B - Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm - Google Patents

Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm

Info

Publication number
CN103280224B
CN103280224B CN201310146293.XA CN201310146293A CN103280224B
Authority
CN
China
Prior art keywords
speaker
formula
average
model
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310146293.XA
Other languages
Chinese (zh)
Other versions
CN103280224A (en)
Inventor
宋鹏
包永强
赵力
刘健刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Taiyu Information Technology Co., Ltd.
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201310146293.XA priority Critical patent/CN103280224B/en
Publication of CN103280224A publication Critical patent/CN103280224A/en
Application granted granted Critical
Publication of CN103280224B publication Critical patent/CN103280224B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a voice conversion method under asymmetric corpus conditions based on an adaptive algorithm. First, the MAP algorithm is used to adapt a reference (background) speaker model with a small number of training sentences, yielding models for the source speaker and the target speaker respectively. Then, using the parameters of the adapted speaker models, two conversion methods are proposed: Gaussian normalization and mean transformation. To further improve the conversion effect, a method fusing Gaussian normalization with mean transformation is also proposed. Meanwhile, because the training sentences are limited, the accuracy of the adapted models is inevitably affected; the invention therefore proposes a KL-divergence-based method to optimize the speaker models during conversion. Subjective and objective experimental results show that, whether measured by spectral distortion or by the quality of the converted speech and its similarity to the target speech, the proposed methods achieve results comparable to the classical GMM method under symmetric corpus conditions.

Description

Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm
Technical field
The present invention relates to a voice conversion method under asymmetric corpus conditions based on an adaptive algorithm, and belongs to the field of speech processing technology.
Background technology
Voice conversion is a technique that converts the speaker characteristics of one person into those of another while keeping the semantic content unchanged. It has a wide range of applications, such as personalized speech synthesis, low-bit-rate voice communication, and the medical restoration of impaired speech. Over the past few decades, voice conversion technology has made significant progress, and a series of methods represented by codebook mapping, Gaussian mixture models, and neural networks have appeared. These methods achieve, to a large extent, the conversion of a speaker's personal voice characteristics. However, they focus mainly on voice conversion under symmetric corpus conditions, where source and target utter the same sentences, and ignore the asymmetric case, where the sentences differ. In other words, although voice conversion under symmetric corpus conditions has achieved fairly satisfactory converted-speech quality and is widely used, it cannot be applied directly to the more common asymmetric corpora found in real environments. Therefore, voice conversion methods under asymmetric corpus conditions require further research.
In the foreign literature, some voice conversion methods have been proposed for asymmetric corpora, such as the method based on maximum-likelihood bilinear regression, the training method based on separating text content from the bilinear conversion, and the transfer function based on nearest-neighbor iterative alignment. However, these methods have many defects: the maximum-likelihood bilinear regression method depends on a pre-prepared transfer function obtained by symmetric training; the bilinear conversion technique needs a large number of training sentences from the source and target speakers to guarantee conversion accuracy; and the nearest-neighbor iterative method assumes that the nearest spectral features correspond to the same phoneme and likewise needs many training sentences. These methods are therefore difficult to realize and inconvenient to operate in practical applications.
Summary of the invention
Objective of the invention: to overcome the defects of existing voice conversion methods under asymmetric corpora, the invention provides a voice conversion method under asymmetric corpus conditions based on an adaptive algorithm.
Technical scheme: in the proposed voice conversion method under asymmetric corpus conditions based on an adaptive algorithm, a background speaker model is first trained from pre-prepared reference-speaker sentences. Then, by MAP (maximum a posteriori) adaptation, the sentences of the source and target speakers are used to obtain the source and target speaker models respectively. Next, the voice conversion function is trained from the means and variances of the adapted source and target speaker models; the Gaussian normalization method and the mean transformation method are proposed, and, to further improve the conversion effect, a method fusing Gaussian normalization with mean transformation is proposed. In addition, because the training sentences of the source and target speakers are limited, it is difficult to train accurate speaker models; the invention solves this problem with the Kullback-Leibler (KL) divergence.
1) Adaptation of the speaker models
In the described adaptive voice conversion method, the background speaker model is described by a GMM (Gaussian mixture model), as follows:

p(z) = Σ_{i=1}^{M} ω_i N(z; μ_i^B, Σ_i^B)    Formula (1)

where N(·) denotes the Gaussian distribution, z is the speech spectral feature vector, M is the number of Gaussian components, ω_i is the weight of the i-th Gaussian component, satisfying Σ_{i=1}^{M} ω_i = 1, and μ_i^B and Σ_i^B denote the mean vector and covariance matrix of the i-th Gaussian component respectively. Given a sequence of observed spectral feature vectors O = [o_1, o_2, ..., o_T], the MAP (maximum a posteriori) adaptive algorithm updates the means and variances as follows:

μ̂_i^B = γ_i E_i(o) + (1 − γ_i) μ_i^B    Formula (2)

Σ̂_i^B = γ_i E_i(o²) + (1 − γ_i)[(μ_i^B)² + Σ_i^B] − (μ̂_i^B)²    Formula (3)

where μ̂_i^B and Σ̂_i^B denote the updated mean and variance of the i-th Gaussian component, E_i(o) and E_i(o²) denote the mean and variance statistics of the i-th component, and γ_i is the adaptation factor balancing the old and new statistics, satisfying

γ_i = n_i / (n_i + ρ)    Formula (4)

where ρ is the relevance factor between the adapted speaker model and the reference model, and n_i denotes the weight statistic. Finally the weights, means and variances of the source speaker x model and the target speaker y model are obtained: ω_i^x, μ_i^x, Σ_i^x and ω_i^y, μ_i^y, Σ_i^y.
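The MAP update of formulas (2)-(4) can be sketched in NumPy as follows. This is an illustrative implementation, not the patented code: the diagonal-covariance assumption, the default relevance factor, and the numerical floors are assumptions of this sketch.

```python
import numpy as np

def map_adapt(weights, means, variances, obs, rho=16.0):
    """MAP adaptation of GMM means/variances (formulas (2)-(4)).
    weights: (M,), means/variances: (M, D) diagonal model, obs: (T, D)."""
    M, D = means.shape
    # Gaussian log-densities of each frame under each component
    log_det = np.sum(np.log(variances), axis=1)              # (M,)
    diff2 = (obs[:, None, :] - means[None, :, :]) ** 2       # (T, M, D)
    log_p = -0.5 * (np.sum(diff2 / variances, axis=2)
                    + log_det + D * np.log(2 * np.pi))       # (T, M)
    log_p += np.log(weights)
    # posteriors p(i | o_t)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)                  # (T, M)
    # sufficient statistics n_i, E_i(o), E_i(o^2)
    n = post.sum(axis=0)                                     # (M,)
    E_o = post.T @ obs / np.maximum(n[:, None], 1e-10)       # (M, D)
    E_o2 = post.T @ (obs ** 2) / np.maximum(n[:, None], 1e-10)
    gamma = (n / (n + rho))[:, None]                         # formula (4)
    new_means = gamma * E_o + (1 - gamma) * means            # formula (2)
    new_vars = (gamma * E_o2
                + (1 - gamma) * (means ** 2 + variances)
                - new_means ** 2)                            # formula (3)
    return new_means, np.maximum(new_vars, 1e-6), n
```

Note how the adaptation factor works: components with many assigned frames (large n_i) move toward the new statistics, while unobserved components keep their background parameters, which is exactly the behavior the model-optimization step 5) later has to compensate for.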
2) Voice conversion method based on Gaussian normalization
The invention first proposes a voice conversion method based on Gaussian normalization. At the conversion stage, for each frame of source-speaker spectral feature x_t, the Gaussian component with the largest posterior probability on the source speaker model is selected:

m = argmax_i p(i | x_t),  i = 1, 2, ..., M    Formula (5)

where p(i | x_t) denotes the posterior probability that x_t belongs to the i-th Gaussian component, satisfying p(i | x_t) = ω_i N(x_t; μ_i^x, Σ_i^xx) / Σ_{j=1}^{M} ω_j N(x_t; μ_j^x, Σ_j^xx). According to the clustering property of the GMM, the same Gaussian component of the source and target speakers can be considered to belong to the same phoneme, satisfying:

(x − μ_m^x) / σ_m^x = (ŷ − μ_m^y) / σ_m^y    Formula (6)

where μ_m^x, σ_m^x and μ_m^y, σ_m^y denote the mean and standard deviation of the m-th Gaussian component of the source and target speakers respectively. The transfer function is then obtained as:

F(x) = ŷ = (σ_m^y / σ_m^x) x + μ_m^y − (σ_m^y / σ_m^x) μ_m^x    Formula (7)
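Formulas (5)-(7) amount to selecting the best source component and matching its first two moments to the corresponding target component. A one-dimensional sketch (the multi-dimensional case applies the same per-coefficient normalization) could look like:

```python
import numpy as np

def gaussian_normalize_convert(x, weights, mu_x, var_x, mu_y, var_y):
    """Per-frame conversion by Gaussian normalization (formulas (5)-(7));
    1-D illustrative sketch. mu_x/var_x: source model, mu_y/var_y: target."""
    # posterior-proportional densities for frame x (formula (5), 1-D case)
    dens = (weights * np.exp(-0.5 * (x - mu_x) ** 2 / var_x)
            / np.sqrt(2 * np.pi * var_x))
    m = int(np.argmax(dens))                 # component with max posterior
    sx, sy = np.sqrt(var_x[m]), np.sqrt(var_y[m])
    # formula (7): align mean and variance of the selected component
    return (sy / sx) * x + mu_y[m] - (sy / sx) * mu_x[m]
```

Since the denominator of the posterior is the same for all components, taking the argmax of the weighted densities is equivalent to taking the argmax of the posteriors themselves.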
3) Voice conversion method based on mean transformation
The invention also proposes another voice conversion method based on mean transformation. Given the model mean vector sequences of the source and target speakers, μ^x = [μ_1^x, ..., μ_M^x] and μ^y = [μ_1^y, ..., μ_M^y], the mapping function between μ^x and μ^y is:

μ^y = F(μ^x) = A μ^x + b    Formula (8)

Setting μ̂^x = μ^x − μ̄^x and μ̂^y = μ^y − μ̄^y, where μ̄^x and μ̄^y are the overall means, the unknown parameters A and b are obtained by least squares:

A = μ̂^y (μ̂^x)^T (μ̂^x (μ̂^x)^T)^{−1},  b = μ̄^y − A μ̄^x    Formula (9)

The transfer function of formula (8) can be applied directly to the conversion of the spectral features:

F(x) = A x + b    Formula (10)
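The least-squares fit of formula (9) can be sketched as follows; the reading of the hatted means as mean-removed sequences is an assumption consistent with the closed form for b, and the function names are illustrative.

```python
import numpy as np

def fit_mean_transform(mu_x, mu_y):
    """Least-squares fit of F(mu) = A mu + b over paired component means
    (formulas (8)-(9)). mu_x, mu_y: (D, M) matrices of the M component
    mean vectors of the source and target models."""
    mx = mu_x.mean(axis=1, keepdims=True)        # overall source mean
    my = mu_y.mean(axis=1, keepdims=True)        # overall target mean
    hx, hy = mu_x - mx, mu_y - my                # mean-removed sequences
    A = hy @ hx.T @ np.linalg.inv(hx @ hx.T)     # formula (9)
    b = my - A @ mx
    return A, b
```

Because A and b are estimated from the M adapted component means rather than from frame-aligned data, this global mapping needs no parallel sentences, which is the point of the method.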
4) Voice conversion method fusing Gaussian normalization and mean transformation
Parts 2) and 3) set forth the voice conversion methods based on Gaussian normalization and mean transformation respectively. The Gaussian normalization method can be regarded as a local linear transformation, while the mean transformation method can be regarded as a global mapping. To further improve the conversion effect, the invention proposes a conversion method that fuses the two. The transfer function is:

F(x) = θ F_g(x) + (1 − θ) F_m(x)    Formula (11)

where F_g(x) and F_m(x) denote the transfer functions obtained by the Gaussian normalization and mean transformation methods respectively, and θ is a weighting coefficient satisfying 0 ≤ θ ≤ 1.
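Formula (11) is a plain convex combination of the two converters. Assuming F_g and F_m are available as callables, the fusion can be sketched as:

```python
def fuse(theta, F_g, F_m):
    """Fused transfer function of formula (11): a convex combination of the
    local (Gaussian normalization) converter F_g and the global (mean
    transformation) converter F_m, weighted by theta in [0, 1]."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must satisfy 0 <= theta <= 1")
    return lambda x: theta * F_g(x) + (1.0 - theta) * F_m(x)
```

At theta = 1 the fusion reduces to pure Gaussian normalization and at theta = 0 to pure mean transformation, so the weighting coefficient directly trades local detail against global consistency.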
5) Model optimization
The invention uses the MAP adaptive algorithm to model the speakers, but because the adaptation sentences are limited, not every Gaussian component of a speaker model gets updated, which inevitably affects the conversion effect. The invention introduces the KL divergence to reduce the impact of this problem. The KL divergence describes the distance between distributions; supposing f_i(x) and f_j(x) denote the distributions of two Gaussian components, the KL divergence between them is:

D(f_i(x) || f_j(x)) = Σ_x f_i(x) log( f_i(x) / f_j(x) )    Formula (12)

Since formula (12) is asymmetric, the KL divergence is redefined symmetrically as:

D_ij(x) = (1/2)[ D(f_i(x) || f_j(x)) + D(f_j(x) || f_i(x)) ]    Formula (13)

During conversion, if the mean or variance of the current component has not been updated, the mean or variance of the nearest Gaussian component under this distance is used instead.
Beneficial effects: compared with the prior art, the voice conversion method under asymmetric corpus conditions based on an adaptive algorithm provided by the invention has the following advantages:
1) It realizes voice conversion based on an asymmetric corpus, effectively avoiding the requirement of corpus symmetry.
2) It uses the MAP adaptive algorithm to model the speakers, so speaker models can be obtained from a very small number of training sentences, reducing the quantity of training sentences required per speaker.
3) It proposes voice conversion methods based on Gaussian normalization and mean transformation respectively, and further proposes a method fusing the two, which on the one hand avoids the need for a symmetric corpus and on the other hand greatly reduces the computation required to train the transfer function.
4) It optimizes the adapted speaker models by the KL-divergence method: by substituting the parameters of Gaussian components that were not updated, the conversion effect can be improved to a certain extent.
Brief description of the drawings
Fig. 1 is the flowchart of obtaining the transfer function by the Gaussian normalization method in an embodiment of the invention;
Fig. 2 is the flowchart of obtaining the transfer function by the mean mapping method in an embodiment of the invention;
Fig. 3 is the flowchart of obtaining the fused transfer function in an embodiment of the invention;
Fig. 4 compares the embodiment with the prior art for male-to-female conversion;
Fig. 5 compares the embodiment with the prior art for female-to-male conversion;
Fig. 6 compares the mean opinion score results of the embodiment with those of the classical GMM method under symmetric corpus conditions;
Fig. 7 compares the similarity test results of the embodiment with those of the classical GMM method under symmetric corpus conditions.
Embodiment
The invention is further illustrated below with specific embodiments. It should be understood that these embodiments are only for illustration and do not limit the scope of the invention; after reading the invention, modifications by those skilled in the art of its various equivalent forms all fall within the scope defined by the appended claims.
The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm comprises the following steps:
1) Use the STRAIGHT model to extract features from the sentences of all speakers: Mel cepstral coefficients (MCC) and the fundamental frequency (F0).
2) Train a background model obeying a GMM distribution from the spectral features (MCC) extracted from the pre-prepared third-party reference speakers' training sentences. The background model is described as follows:

p(z) = Σ_{i=1}^{M} ω_i N(z; μ_i^B, Σ_i^B)    Formula (1)

where N(·) denotes the Gaussian distribution, z is the speech spectral feature vector, M is the number of Gaussian components, ω_i is the weight of the i-th Gaussian component, satisfying Σ_{i=1}^{M} ω_i = 1, and μ_i^B and Σ_i^B denote the mean vector and covariance matrix of the i-th Gaussian component respectively.
3) Similar to speaker adaptation in speaker recognition, the MAP algorithm is used to adaptively train the models of the source and target speakers respectively.
Given a sequence of observed spectral feature vectors O = [o_1, o_2, ..., o_T], the MAP adaptive algorithm updates the means and variances as follows:

μ̂_i^B = γ_i E_i(o) + (1 − γ_i) μ_i^B    Formula (2)

Σ̂_i^B = γ_i E_i(o²) + (1 − γ_i)[(μ_i^B)² + Σ_i^B] − (μ̂_i^B)²    Formula (3)

where μ̂_i^B and Σ̂_i^B denote the updated mean and variance of the i-th Gaussian component, E_i(o) and E_i(o²) denote the mean and variance statistics of the i-th component, and γ_i is the adaptation factor balancing the old and new statistics, satisfying

γ_i = n_i / (n_i + ρ)    Formula (4)

where ρ is the relevance factor between the adapted speaker model and the reference model, and n_i denotes the weight statistic. Finally the weights, means and variances of the source speaker x and target speaker y models are obtained: ω_i^x, μ_i^x, Σ_i^x and ω_i^y, μ_i^y, Σ_i^y.
4) Use the KL divergence to compute the distance between different components in each speaker model.
Supposing f_i(x) and f_j(x) denote the distributions of two Gaussian components, the KL divergence between them is:

D(f_i(x) || f_j(x)) = Σ_x f_i(x) log( f_i(x) / f_j(x) )    Formula (12)

Since formula (12) is asymmetric, the KL divergence is redefined symmetrically as:

D_ij(x) = (1/2)[ D(f_i(x) || f_j(x)) + D(f_j(x) || f_i(x)) ]    Formula (13)
5) For each frame of test-speech spectral feature vector, compute its posterior probability over the Gaussian components of the source speaker model and select the component with the largest posterior:

m = argmax_i p(i | x_t),  i = 1, 2, ..., M    Formula (5)

where p(i | x_t) denotes the posterior probability, satisfying p(i | x_t) = ω_i N(x_t; μ_i^x, Σ_i^xx) / Σ_{j=1}^{M} ω_j N(x_t; μ_j^x, Σ_j^xx).
According to the clustering property of the GMM, the same Gaussian component of the source and target speakers can be considered to belong to the same phoneme, satisfying:

(x − μ_m^x) / σ_m^x = (ŷ − μ_m^y) / σ_m^y    Formula (6)

where μ_m^x, σ_m^x and μ_m^y, σ_m^y denote the mean and standard deviation of the m-th Gaussian component of the source and target speakers respectively. Gaussian normalization within the current component then yields the transfer function F_g(x). Meanwhile, during transfer-function training, if the mean or variance of the current component has not been updated, the mean or variance of the component nearest in KL divergence is used instead. Fig. 1 gives the flow of obtaining the transfer function by the Gaussian normalization method.
6) Using the mean vectors of the adapted speaker models, obtain the spectral-feature transfer function F_m(x) by the least-squares method. Likewise, during transfer-function training, if the mean or variance of the current component has not been updated, the mean or variance of the component nearest in KL divergence is used instead. Fig. 2 gives the flow of obtaining the transfer function by the mean mapping method.
7) The Gaussian normalization method can be regarded as a local linear transformation, and the mean transformation method as a global mapping. To further improve the conversion effect, the invention fuses the two methods; the transfer function is then F(x) = θ F_g(x) + (1 − θ) F_m(x). Fig. 3 gives the process of obtaining the fused transfer function.
8) F0 conversion: the classical method based on Gaussian normalization is used to convert F0.
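The "classical method based on Gaussian normalization" for F0 is commonly read as matching the mean and standard deviation of log-F0; the log-domain choice and the pass-through of unvoiced frames are assumptions of this sketch, as the patent does not spell them out.

```python
import numpy as np

def convert_f0(f0_src, mean_src, std_src, mean_tgt, std_tgt):
    """Gaussian normalization of F0 in the log domain: shift and scale the
    source speaker's log-F0 statistics to the target speaker's.
    Unvoiced frames (f0 == 0) pass through unchanged."""
    f0_conv = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    # match the target speaker's log-F0 mean and standard deviation
    f0_conv[voiced] = np.exp((log_f0 - mean_src) / std_src * std_tgt + mean_tgt)
    return f0_conv
```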
9) The converted spectral features obtained from the transfer function and the converted F0 are fed into the STRAIGHT model to synthesize the final converted speech.
Performance evaluation:
The embodiment uses the CMU ARCTIC English speech database to evaluate the conversion effect. 500 sentences each from the two speakers BDL (male) and CLB (female) are used to train the background model. The two speakers RMS (male) and SLT (female) each provide 120 sentences: 50 symmetric (parallel) sentences are used for the baseline GMM method, 50 asymmetric sentences are used for the method of the invention, and the remaining 20 sentences are used for evaluation. The number of mixture components M of the background model is optimized and set to 256, the number of Gaussian components of the baseline GMM method is optimized and set to 16, and the MCC order is set to 24.
First, the Mel cepstral distance (MCD) is used for objective evaluation of the converted spectral features:

MCD = (10 / ln 10) √( 2 Σ_{j=1}^{D} (mc_j^c − mc_j^t)² )    Formula (14)

where mc_j^c and mc_j^t are the MCC of the converted speech and the target speech respectively, D is the MCC order, and a smaller MCD value indicates a better conversion effect.
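Formula (14) is a per-frame distance; in practice it is averaged over all evaluation frames. A minimal sketch, assuming (T, D) arrays of MCC with the energy coefficient already excluded (a common convention not stated in the text):

```python
import numpy as np

def mcd(mc_converted, mc_target):
    """Mel cepstral distance of formula (14), averaged over frames.
    Inputs: (T, D) arrays of Mel cepstral coefficients."""
    diff2 = np.sum((mc_converted - mc_target) ** 2, axis=1)          # (T,)
    return float(np.mean(10.0 / np.log(10.0) * np.sqrt(2.0 * diff2)))
```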
Figs. 4 and 5 give the MCD results of the several methods proposed by the invention compared with the classical GMM method under symmetric corpus conditions; Fig. 4 shows male-to-female conversion and Fig. 5 female-to-male conversion. GN denotes the Gaussian normalization method, MT the mean transformation method, and GNMT the fusion method. As the number of training sentences increases, the MCD curves of the proposed methods show the same trend, gradually approaching the result of the baseline GMM method, and the GNMT method consistently obtains better results than GN or MT alone. This shows that the fusion method can effectively improve on the Gaussian normalization and mean transformation methods.
Then the mean opinion score (MOS) and a similarity test are used for subjective evaluation of the converted-speech quality and of the similarity between converted and target speech respectively. Fig. 6 gives the MOS results of the proposed methods and of the classical GMM method under symmetric corpus conditions, scored on a 5-point scale for speech quality (1 = "bad", 5 = "excellent"). Fig. 7 gives the similarity test results of the inventive methods and the classical GMM method, likewise judged on a 5-point scale (1 = "completely different", 5 = "identical"). Both tests use 5 asymmetric sentences for speaker adaptation, with 6 professional researchers taking part in the scoring; the I-shaped bars in the figures denote variance. The results of Figs. 6 and 7 show that the proposed methods achieve an effect comparable to the GMM method, corroborating to a certain extent the objective MCD evaluation.

Claims (5)

1. A voice conversion method under asymmetric corpus conditions based on an adaptive algorithm, characterized in that: first a background speaker model is obtained by training from pre-prepared reference-speaker sentences; then, by MAP adaptation, the sentences of the source and target speakers are trained respectively to obtain the source and target speaker models; then the voice conversion function is obtained by training from the means and variances of the adapted source and target speaker models, using in the conversion process the Gaussian normalization method, the mean transformation method, and the method fusing Gaussian normalization with mean transformation; in addition, accurate speaker models are obtained from the limited source- and target-speaker training sentences by means of the KL divergence.
2. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm as claimed in claim 1, characterized in that:
Adaptation of the speaker models:
In the described adaptive voice conversion method, the background speaker model is described by a GMM, as follows:

p(z) = Σ_{i=1}^{M} ω_i N(z; μ_i^B, Σ_i^B)    Formula (1)

where N(·) denotes the Gaussian distribution, z is the speech spectral feature vector, M is the number of Gaussian components, ω_i is the weight of the i-th Gaussian component, satisfying Σ_{i=1}^{M} ω_i = 1, and μ_i^B and Σ_i^B denote the mean vector and covariance matrix of the i-th Gaussian component respectively; given a sequence of observed spectral feature vectors O = [o_1, o_2, ..., o_T], the MAP adaptive algorithm updates the means and variances as follows:

μ̂_i^B = γ_i E_i(o) + (1 − γ_i) μ_i^B    Formula (2)

Σ̂_i^B = γ_i E_i(o²) + (1 − γ_i)[(μ_i^B)² + Σ_i^B] − (μ̂_i^B)²    Formula (3)

where μ̂_i^B and Σ̂_i^B denote the updated mean and variance of the i-th Gaussian component; E_i(o) and E_i(o²) denote the mean and variance statistics of the i-th component, and γ_i is the adaptation factor balancing the old and new statistics, satisfying

γ_i = n_i / (n_i + ρ)    Formula (4)

where ρ is the relevance factor between the adapted speaker model and the reference model, and n_i denotes the weight statistic; finally the weights, means and variances of the source speaker x and target speaker y models are obtained: ω_i^x, μ_i^x, Σ_i^x and ω_i^y, μ_i^y, Σ_i^y.
3. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm as claimed in claim 2, characterized in that:
Voice conversion method based on Gaussian normalization:
At the conversion stage, for each frame of source-speaker spectral feature x_t, the Gaussian component with the largest posterior probability on the source speaker model is selected:

m = argmax_i p(i | x_t),  i = 1, 2, ..., M    Formula (5)

where p(i | x_t) denotes the posterior probability that x_t belongs to the i-th Gaussian component, satisfying p(i | x_t) = ω_i N(x_t; μ_i^x, Σ_i^xx) / Σ_{j=1}^{M} ω_j N(x_t; μ_j^x, Σ_j^xx); according to the clustering property of the GMM, the same Gaussian component of the source and target speakers can be considered to belong to the same phoneme, satisfying:

(x − μ_m^x) / σ_m^x = (ŷ − μ_m^y) / σ_m^y    Formula (6)

where μ_m^x, σ_m^x and μ_m^y, σ_m^y denote the mean and standard deviation of the m-th Gaussian component of the source and target speakers respectively; the transfer function is then obtained as:

F(x) = ŷ = (σ_m^y / σ_m^x) x + μ_m^y − (σ_m^y / σ_m^x) μ_m^x    Formula (7)

Voice conversion method based on mean transformation:
Given the model mean vector sequences of the source and target speakers, μ^x = [μ_1^x, ..., μ_M^x] and μ^y = [μ_1^y, ..., μ_M^y], the mapping function between μ^x and μ^y is:

μ^y = F(μ^x) = A μ^x + b    Formula (8)

Setting μ̂^x = μ^x − μ̄^x and μ̂^y = μ^y − μ̄^y, where μ̄^x and μ̄^y are the overall means, the unknown parameters A and b are obtained by least squares:

A = μ̂^y (μ̂^x)^T (μ̂^x (μ̂^x)^T)^{−1},  b = μ̄^y − A μ̄^x    Formula (9)

The transfer function of formula (8) can be applied directly to the conversion of the spectral features:

F(x) = A x + b    Formula (10)
4. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm as claimed in claim 3, characterized in that:
Voice conversion method fusing Gaussian normalization and mean transformation:
The transfer function is:

F(x) = θ F_g(x) + (1 − θ) F_m(x)    Formula (11)

where F_g(x) and F_m(x) denote the transfer functions obtained by the Gaussian normalization and mean transformation methods respectively, and θ is a weighting coefficient satisfying 0 ≤ θ ≤ 1;
Model optimization:
The KL divergence describes the distance between distributions; supposing f_i(x) and f_j(x) denote the distributions of two Gaussian components, the KL divergence between them is:

D(f_i(x) || f_j(x)) = Σ_x f_i(x) log( f_i(x) / f_j(x) )    Formula (12)

Since formula (12) is asymmetric, the KL divergence is redefined symmetrically as:

D_ij(x) = (1/2)[ D(f_i(x) || f_j(x)) + D(f_j(x) || f_i(x)) ]    Formula (13)

During conversion, if the mean or variance of the current component has not been updated, the mean or variance of the nearest Gaussian component is used instead.
5. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm as claimed in claim 1, characterized in that: the number M of GMM components of the background speaker model is selected according to the scale of the training corpus and is chosen as a power of 2, i.e. 2^N, where N is a positive integer.
CN201310146293.XA 2013-04-24 2013-04-24 Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm Active CN103280224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310146293.XA CN103280224B (en) 2013-04-24 2013-04-24 Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310146293.XA CN103280224B (en) 2013-04-24 2013-04-24 Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm

Publications (2)

Publication Number Publication Date
CN103280224A CN103280224A (en) 2013-09-04
CN103280224B true CN103280224B (en) 2015-09-16

Family

ID=49062718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310146293.XA Active CN103280224B (en) 2013-04-24 2013-04-24 Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm

Country Status (1)

Country Link
CN (1) CN103280224B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103531205B (en) * 2013-10-09 2016-08-31 常州工学院 The asymmetrical voice conversion method mapped based on deep neural network feature
CN104123933A (en) * 2014-08-01 2014-10-29 中国科学院自动化研究所 Self-adaptive non-parallel training based voice conversion method
CN104217721B (en) * 2014-08-14 2017-03-08 东南大学 Based on the phonetics transfer method under the conditions of the asymmetric sound bank that speaker model aligns
CN106205623B (en) * 2016-06-17 2019-05-21 福建星网视易信息***有限公司 A kind of sound converting method and device
US10176819B2 (en) * 2016-07-11 2019-01-08 The Chinese University Of Hong Kong Phonetic posteriorgrams for many-to-one voice conversion
CN106504741B (en) * 2016-09-18 2019-10-25 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of phonetics transfer method based on deep neural network phoneme information
CN107301859B (en) * 2017-06-21 2020-02-21 南京邮电大学 Voice conversion method under non-parallel text condition based on self-adaptive Gaussian clustering
CN107945795B (en) * 2017-11-13 2021-06-25 河海大学 Rapid model self-adaption method based on Gaussian classification
WO2019116889A1 (en) * 2017-12-12 2019-06-20 ソニー株式会社 Signal processing device and method, learning device and method, and program
CN112331181B (en) * 2019-07-30 2024-07-05 中国科学院声学研究所 Target speaker voice extraction method based on multi-speaker condition
CN110544466A (en) * 2019-08-19 2019-12-06 广州九四智能科技有限公司 Speech synthesis method under condition of small amount of recording samples
CN112767942B (en) * 2020-12-31 2023-04-07 北京云迹科技股份有限公司 Speech recognition engine adaptation method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007103520A2 (en) * 2006-03-08 2007-09-13 Voxonic, Inc. Codebook-less speech conversion method and system
CN102063899A (en) * 2010-10-27 2011-05-18 南京邮电大学 Method for voice conversion under unparallel text condition

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007103520A2 (en) * 2006-03-08 2007-09-13 Voxonic, Inc. Codebook-less speech conversion method and system
CN102063899A (en) * 2010-10-27 2011-05-18 南京邮电大学 Method for voice conversion under unparallel text condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Efficient fundamental frequency transformation for voice conversion;Song Peng ET AL;《Journal of southeast university》;20120630;第28卷(第2期);140-144 *

Also Published As

Publication number Publication date
CN103280224A (en) 2013-09-04

Similar Documents

Publication Publication Date Title
CN103280224B (en) Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm
CN101833951B (en) Multi-background modeling method for speaker recognition
CN101178896B (en) Unit selection voice synthetic method based on acoustics statistical model
CN102800316B (en) Optimal codebook design method for voiceprint recognition system based on nerve network
CN108831445A (en) Sichuan dialect recognition methods, acoustic training model method, device and equipment
CN105261367B (en) A kind of method for distinguishing speek person
CN104217721A (en) Speech conversion method based on asymmetric speech database conditions of speaker model alignment
CN103198827A (en) Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter
CN105469784A (en) Generation method for probabilistic linear discriminant analysis (PLDA) model and speaker clustering method and system
CN105261358A (en) N-gram grammar model constructing method for voice identification and voice identification system
CN105404621A (en) Method and system for blind people to read Chinese character
CN103456302B (en) A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight
CN104240706A (en) Speaker recognition method based on GMM Token matching similarity correction scores
CN104361894A (en) Output-based objective voice quality evaluation method
CN102982799A (en) Speech recognition optimization decoding method integrating guide probability
CN104462409A (en) Cross-language emotional resource data identification method based on AdaBoost
CN106782609A (en) A kind of spoken comparison method
CN107093422A (en) A kind of audio recognition method and speech recognition system
CN105280181A (en) Training method for language recognition model and language recognition method
CN102930863B (en) Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model
CN104464738B (en) A kind of method for recognizing sound-groove towards Intelligent mobile equipment
CN105488098A (en) Field difference based new word extraction method
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
Garg et al. Survey on acoustic modeling and feature extraction for speech recognition
CN107785030A (en) A kind of phonetics transfer method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180614

Address after: No. 408 Heyan Road, Qixia District, Nanjing, Jiangsu 210037

Patentee after: Nanjing Boke Electronic Technology Co., Ltd.

Address before: No. 2 Sipailou, Xuanwu District, Nanjing, Jiangsu

Patentee before: Southeast University

TR01 Transfer of patent right

Effective date of registration: 20180709

Address after: 211103 No. 1009 Tianyuan East Road, Jiangning District, Nanjing, Jiangsu.

Patentee after: LIXIN Wireless Electronic Technology Co., Ltd.

Address before: No. 408 Heyan Road, Qixia District, Nanjing, Jiangsu 210037

Patentee before: Nanjing Boke Electronic Technology Co., Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190109

Address after: Room 125, Building 1, Building 6, 4299 Jindu Road, Minhang District, Shanghai 201100

Patentee after: Shanghai Taiyu Information Technology Co., Ltd.

Address before: 211103 No. 1009 Tianyuan East Road, Jiangning District, Nanjing, Jiangsu.

Patentee before: LIXIN Wireless Electronic Technology Co., Ltd.

TR01 Transfer of patent right