CN103280224A - Voice conversion method under asymmetric corpus condition on basis of adaptive algorithm - Google Patents

Voice conversion method under asymmetric corpus condition on basis of adaptive algorithm

Info

Publication number
CN103280224A
Authority
CN
China
Prior art keywords: speaker, formula, average, gaussian, model
Prior art date
Legal status: Granted
Application number
CN201310146293XA
Other languages
Chinese (zh)
Other versions
CN103280224B (en)
Inventor
Song Peng (宋鹏)
Bao Yongqiang (包永强)
Zhao Li (赵力)
Liu Jiangang (刘健刚)
Current Assignee
Shanghai Taiyu Information Technology Co., Ltd.
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201310146293.XA priority Critical patent/CN103280224B/en
Publication of CN103280224A publication Critical patent/CN103280224A/en
Application granted granted Critical
Publication of CN103280224B publication Critical patent/CN103280224B/en
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a voice conversion method for the asymmetric-corpus condition based on an adaptive algorithm. First, source-speaker and target-speaker models are trained from a reference speaker model with only a few training sentences, using the MAP (maximum a posteriori) algorithm. Gaussian-normalization and mean-transformation conversion methods are then derived from the parameters of the adapted speaker models and, to further improve conversion quality, a method fusing the two is also proposed. Because the limited number of training sentences inevitably affects the accuracy of the adapted models, a KL (Kullback-Leibler) divergence method is introduced to optimize the speaker models during conversion. Subjective and objective experiments show improvements in spectral distortion, converted-speech quality, and similarity to the target voice; all of the proposed methods achieve performance comparable to the classical GMM (Gaussian mixture model) method trained on a symmetric corpus.

Description

Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm
Technical field
The present invention relates to a voice conversion method under asymmetric corpus conditions based on an adaptive algorithm, and belongs to the field of speech processing technology.
Background technology
Voice conversion is a technique that converts one person's speaker characteristics into another's while keeping the semantic content unchanged. It has a wide range of applications, such as personalized speech synthesis, low-bit-rate voice communication, and the medical restoration of impaired speech. Over the past few decades, voice conversion technology has made significant progress, producing a series of methods represented by codebook mapping, Gaussian mixture models, and neural networks. These methods largely realize the conversion of a speaker's individual voice characteristics. However, they focus mainly on voice conversion under the symmetric-corpus (parallel, same-sentence) condition and ignore the asymmetric-corpus (non-parallel, different-sentence) case. In other words, although voice conversion under the symmetric-corpus condition has achieved fairly satisfactory converted-speech quality and wide use, it cannot be applied directly to the more common asymmetric-corpus situations found in real environments. Further research on voice conversion under asymmetric corpus conditions is therefore needed.
In the literature, several voice conversion methods for asymmetric corpora have been proposed, such as maximum-likelihood bilinear regression, bilinear-transform-based separation of text and content, and transfer-function training based on nearest-neighbor iterative alignment. These methods have notable drawbacks: the maximum-likelihood bilinear regression method depends on a pre-prepared transfer function trained on a symmetric corpus; the bilinear transform technique needs large numbers of source- and target-speaker training sentences to guarantee conversion accuracy; and the nearest-neighbor iterative method is built on nearest-neighbor correspondence between spectral features of identical phonemes and likewise requires many training sentences. These methods are therefore difficult to apply and operate in practice.
Summary of the invention
Object of the invention: to overcome the defects of existing voice conversion methods for asymmetric corpora, the invention provides a voice conversion method under asymmetric corpus conditions based on an adaptive algorithm.
Technical scheme: in the proposed method, a background speaker model is first trained from pre-prepared reference-speaker sentences. Source- and target-speaker models are then obtained by adapting this background model to the source and target speakers' sentences with the MAP (maximum a posteriori) technique. A speech conversion function is then trained from the means and variances of the adapted source- and target-speaker models: Gaussian-normalization and mean-transformation conversion methods are proposed and, to further improve conversion quality, a method fusing the two is proposed as well. In addition, because the source and target speakers' training sentences are limited, it is difficult to train accurate speaker models; the invention addresses this problem with the KL divergence (Kullback-Leibler divergence).
1) Adaptation of the speaker models
In the described adaptive voice conversion method, the background speaker model is described by a GMM (Gaussian mixture model), as follows:
p(z) = \sum_{i=1}^{M} \omega_i\, N(z;\, \mu_i^B, \Sigma_i^B)    Formula (1)
where N(·) denotes a Gaussian distribution, z is the speech spectral feature vector, M is the number of Gaussian components, ω_i is the weight of the i-th Gaussian component, satisfying \sum_{i=1}^{M} \omega_i = 1, and μ_i^B and Σ_i^B are the mean vector and covariance matrix of the i-th component. Given the sequence of observed spectral feature vectors O = [o_1, o_2, \ldots, o_T], the MAP (maximum a posteriori) adaptation algorithm updates the means and variances as follows:
\hat{\mu}_i^B = \gamma_i E_i(o) + (1-\gamma_i)\,\mu_i^B    Formula (2)
\hat{\Sigma}_i^B = \gamma_i E_i(o^2) + (1-\gamma_i)\left[(\mu_i^B)^2 + \Sigma_i^B\right] - (\hat{\mu}_i^B)^2    Formula (3)
where \hat{\mu}_i^B and \hat{\Sigma}_i^B are the updated mean and variance of the i-th Gaussian component, E_i(o) and E_i(o^2) are its mean and variance statistics, and γ_i is the adaptation factor balancing the adaptation between the old and new statistics, satisfying
\gamma_i = \frac{n_i}{n_i + \rho}    Formula (4)
where ρ is the relevance factor coupling the adapted speaker model to the reference model and n_i is the weight (occupancy) statistic. The weights, means, and variances of the source-speaker model x and the target-speaker model y are finally obtained: \{\omega_i^x, \mu_i^x, \Sigma_i^x\} and \{\omega_i^y, \mu_i^y, \Sigma_i^y\}.
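As an illustration, the MAP mean/variance update of Formulas (2)-(4) for a single diagonal-covariance Gaussian component can be sketched in NumPy as follows; the function and variable names are illustrative, not from the patent, and the sufficient statistics are assumed to be accumulated beforehand:

```python
import numpy as np

def map_adapt(mu_b, var_b, E_o, E_o2, n_i, rho=16.0):
    """MAP update of one Gaussian component (Formulas 2-4), diagonal covariance.

    mu_b, var_b : background mean / variance vectors
    E_o, E_o2   : first- and second-order statistics E_i(o), E_i(o^2)
    n_i         : occupancy (weight) statistic
    rho         : relevance factor coupling the adapted model to the background
    """
    gamma = n_i / (n_i + rho)                        # adaptation factor, Formula (4)
    mu_hat = gamma * E_o + (1.0 - gamma) * mu_b      # Formula (2)
    var_hat = (gamma * E_o2
               + (1.0 - gamma) * (mu_b ** 2 + var_b)
               - mu_hat ** 2)                        # Formula (3)
    return mu_hat, var_hat

mu_b, var_b = np.zeros(3), np.ones(3)
# with no adaptation data (n_i = 0) the background parameters are kept unchanged
mu0, var0 = map_adapt(mu_b, var_b, E_o=np.zeros(3), E_o2=np.zeros(3), n_i=0.0)
# with abundant data (n_i >> rho) the update moves fully to the new statistics
mu1, var1 = map_adapt(mu_b, var_b, E_o=np.full(3, 2.0), E_o2=np.full(3, 5.0), n_i=1e9)
```

This shows the balancing role of γ_i: it interpolates between the background parameters and the new statistics according to how much adaptation data each component received.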
2) Voice conversion based on Gaussian normalization
A voice conversion method based on Gaussian normalization is proposed first. At the conversion stage, for each source-speaker spectral feature frame x_t, the Gaussian component with the maximum posterior probability under the source-speaker model is selected:
m = \arg\max_i\, p(i \mid x_t), \quad i = 1, 2, \ldots, M    Formula (5)
where p(i|x_t), the posterior probability that x_t belongs to the i-th Gaussian component, satisfies
p(i \mid x_t) = \frac{\omega_i\, N(x_t;\, \mu_i^x, \Sigma_i^{xx})}{\sum_{j=1}^{M} \omega_j\, N(x_t;\, \mu_j^x, \Sigma_j^{xx})}
By the clustering property of the GMM, the same Gaussian component of the source and target speakers can be regarded as belonging to the same phoneme, so that
\frac{x - \mu_m^x}{\sigma_m^x} = \frac{\hat{y} - \mu_m^y}{\sigma_m^y}    Formula (6)
where \mu_m^x, \sigma_m^x, \mu_m^y, and \sigma_m^y are the means and standard deviations of the m-th Gaussian component of the source and target speakers, respectively. The transfer function then follows:
F(x) = \hat{y} = \frac{\sigma_m^y}{\sigma_m^x}\, x + \mu_m^y - \frac{\sigma_m^y}{\sigma_m^x}\,\mu_m^x    Formula (7)
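The per-frame conversion of Formulas (5)-(7) can be sketched for a scalar feature as follows; this is a minimal illustration, with illustrative names, and the constant denominator of Formula (5) is dropped since it does not affect the argmax:

```python
import numpy as np

def gn_convert(x, w, mu_x, var_x, mu_y, var_y):
    """Convert one scalar source feature x via Gaussian normalization.

    The component m with maximum posterior p(i|x) is picked (Formula 5),
    then x is mapped by Formula (7). Each model array has one entry per
    Gaussian component.
    """
    # unnormalized posteriors: w_i * N(x; mu_i, var_i)
    lik = w * np.exp(-(x - mu_x) ** 2 / (2.0 * var_x)) / np.sqrt(2.0 * np.pi * var_x)
    m = int(np.argmax(lik))
    ratio = np.sqrt(var_y[m]) / np.sqrt(var_x[m])    # sigma_m^y / sigma_m^x
    return ratio * x + mu_y[m] - ratio * mu_x[m]     # Formula (7)

# two well-separated source components mapped to shifted and scaled targets
w = np.array([0.5, 0.5])
mu_x, var_x = np.array([0.0, 10.0]), np.array([1.0, 1.0])
mu_y, var_y = np.array([2.0, 20.0]), np.array([4.0, 4.0])
y0 = gn_convert(0.0, w, mu_x, var_x, mu_y, var_y)    # component 0 selected -> 2.0
y1 = gn_convert(10.0, w, mu_x, var_x, mu_y, var_y)   # component 1 selected -> 20.0
```

A frame near a source component's mean is mapped onto the corresponding target component's mean, which is exactly the local, per-component behavior that makes this method a local linear regression.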
3) Voice conversion based on mean transformation
Another voice conversion method, based on mean transformation, is also proposed. Given the mean-vector sequences of the source- and target-speaker models,
\mu^x = [\mu_1^x, \mu_2^x, \ldots, \mu_M^x] and \mu^y = [\mu_1^y, \mu_2^y, \ldots, \mu_M^y],
the mapping function between μ^x and μ^y is
\mu^y = F(\mu^x) = A\mu^x + b    Formula (8)
Setting \hat{\mu}^x = \mu^x - \bar{\mu}^x and \hat{\mu}^y = \mu^y - \bar{\mu}^y, the unknown parameters A and b are obtained by the least-squares method:
A = \hat{\mu}^y (\hat{\mu}^x)^T \left(\hat{\mu}^x (\hat{\mu}^x)^T\right)^{-1}, \quad b = \bar{\mu}^y - A\bar{\mu}^x    Formula (9)
where \bar{\mu}^x and \bar{\mu}^y are the means of the two sequences. The transfer function of Formula (8) can be applied directly to the spectral features, giving
F(x) = Ax + b    Formula (10)
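The least-squares fit of Formula (9) can be sketched in NumPy as follows; the names are illustrative, and a small exact affine example stands in for real adapted model means:

```python
import numpy as np

def fit_mean_transform(mu_x, mu_y):
    """Least-squares fit of mu_y ≈ A mu_x + b (Formulas 8-9).

    mu_x, mu_y: (D, M) matrices whose columns are the component mean
    vectors of the source and target models.
    """
    mx = mu_x.mean(axis=1, keepdims=True)            # \bar{mu}^x
    my = mu_y.mean(axis=1, keepdims=True)            # \bar{mu}^y
    cx, cy = mu_x - mx, mu_y - my                    # mean-removed sequences
    A = cy @ cx.T @ np.linalg.inv(cx @ cx.T)         # Formula (9)
    b = my - A @ mx
    return A, b

# sanity check: an exact affine relation between the means is recovered exactly
A_true = np.array([[2.0, 0.0], [0.0, 3.0]])
b_true = np.array([[1.0], [-1.0]])
mu_x = np.array([[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 2.0, 1.0]])
mu_y = A_true @ mu_x + b_true
A, b = fit_mean_transform(mu_x, mu_y)
# the fitted F(x) = A x + b (Formula 10) then applies to any spectral frame
```

Because A and b are estimated from the component means alone, no frame alignment between source and target sentences is needed, which is what makes this a global mapping usable on an asymmetric corpus.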
4) Voice conversion based on the fusion of Gaussian normalization and mean transformation
Parts 2) and 3) presented voice conversion methods based on Gaussian normalization and on mean transformation, respectively. The Gaussian-normalization method can be regarded as a local linear regression, and the mean-transformation method as a global mapping. To further improve conversion quality, the invention proposes a method that fuses the two. The transfer function is
F(x) = \theta F_g(x) + (1-\theta) F_m(x)    Formula (11)
where F_g(x) and F_m(x) are the transfer functions trained by the Gaussian-normalization and mean-transformation methods, respectively, and θ is a weighting coefficient satisfying 0 ≤ θ ≤ 1.
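The fusion of Formula (11) is a convex combination of the two trained transfer functions; a minimal sketch, with toy affine functions standing in for the trained F_g and F_m:

```python
def fuse(F_g, F_m, theta):
    """Fused transfer function of Formula (11): theta*F_g + (1-theta)*F_m."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return lambda x: theta * F_g(x) + (1.0 - theta) * F_m(x)

# two toy affine transfer functions in place of the trained ones
F = fuse(lambda x: 2.0 * x, lambda x: x + 4.0, theta=0.25)
y = F(8.0)   # 0.25 * 16 + 0.75 * 12 = 13.0
```

At θ = 1 the fused function reduces to the local Gaussian-normalization mapping and at θ = 0 to the global mean transformation, so θ trades local accuracy against global smoothness.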
5) Model optimization
The MAP adaptation algorithm is used to model the speakers, but because the adaptation sentences are limited, not every Gaussian component of a speaker model gets its parameters updated, which inevitably degrades conversion quality. The invention introduces the KL divergence to reduce this effect. The KL divergence describes the distance between different distributions: if f_i(x) and f_j(x) are the distributions of two Gaussian components, the KL divergence between them is
D(f_i(x) \,\|\, f_j(x)) = \sum_x f_i(x) \log \frac{f_i(x)}{f_j(x)}    Formula (12)
Since Formula (12) is asymmetric, the KL divergence is redefined here as
D_{ij}(x) = \frac{1}{2}\left[D(f_i(x) \,\|\, f_j(x)) + D(f_j(x) \,\|\, f_i(x))\right]    Formula (13)
During conversion, if the mean or variance of the current component has not been updated, the mean or variance of the nearest Gaussian component under this distance is used instead.
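Formulas (12)-(13) over discrete distributions can be sketched directly; a minimal illustration with illustrative names:

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D(p || q) of Formula (12)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def sym_kl(p, q):
    """Symmetrised KL divergence of Formula (13)."""
    return 0.5 * (kl(p, q) + kl(q, p))

p, q = [0.5, 0.5], [0.9, 0.1]
d_pq = sym_kl(p, q)   # equals sym_kl(q, p); zero only when p == q
```

The symmetrisation matters here because the divergence is used as a distance for nearest-component lookup, and a distance should not depend on the order of its arguments.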
Beneficial effects: compared with the prior art, the proposed voice conversion method under asymmetric corpus conditions based on an adaptive algorithm has the following advantages:
1) It realizes voice conversion on an asymmetric corpus, effectively avoiding the requirement of a symmetric (parallel) corpus.
2) It models the speakers with the MAP adaptation algorithm, so speaker models can be obtained from a very small number of training sentences, reducing the amount of training data required per speaker.
3) It proposes voice conversion methods based on Gaussian normalization and mean transformation, and further a fusion of the two, which both avoids the need for a symmetric corpus and greatly reduces the computation required to train the transfer function.
4) It optimizes the adapted speaker models with the KL-divergence method: optimizing the parameters of the Gaussian components that were not updated improves conversion quality to a certain extent.
Description of drawings
Fig. 1 is a flowchart of obtaining the transfer function by the Gaussian-normalization method in an embodiment of the invention;
Fig. 2 is a flowchart of obtaining the transfer function by the mean-mapping method in an embodiment of the invention;
Fig. 3 is a flowchart of obtaining the fused transfer function in an embodiment of the invention;
Fig. 4 compares the embodiment of the invention with the prior art for male-to-female voice conversion;
Fig. 5 compares the embodiment of the invention with the prior art for female-to-male voice conversion;
Fig. 6 compares the mean-opinion-score results of the embodiment with those of the classical GMM method under the symmetric-corpus condition;
Fig. 7 compares the similarity-test results of the embodiment with those of the classical GMM method under the symmetric-corpus condition.
Embodiment
The invention is further illustrated below with reference to a specific embodiment. It should be understood that this embodiment is only intended to illustrate the invention, not to limit its scope; after reading the invention, modifications of its various equivalent forms by those skilled in the art all fall within the scope defined by the appended claims.
The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm comprises the following steps:
1) Perform feature extraction on all speakers' sentences with the STRAIGHT model, extracting the Mel-cepstral coefficients (MCC) and the fundamental frequency (F0).
2) Train a GMM background model from the spectral features (MCC) extracted from the pre-prepared third-party reference speakers' training sentences. The background model is described as follows:
p(z) = \sum_{i=1}^{M} \omega_i\, N(z;\, \mu_i^B, \Sigma_i^B)    Formula (1)
where N(·) denotes a Gaussian distribution, z is the speech spectral feature vector, M is the number of Gaussian components, ω_i is the weight of the i-th Gaussian component, satisfying \sum_{i=1}^{M} \omega_i = 1, and μ_i^B and Σ_i^B are the mean vector and covariance matrix of the i-th component.
3) Analogously to speaker adaptation in speaker recognition, use the MAP algorithm to adaptively train the source- and target-speaker models.
Given the sequence of observed spectral feature vectors O = [o_1, o_2, \ldots, o_T], the MAP adaptation algorithm updates the means and variances as follows:
\hat{\mu}_i^B = \gamma_i E_i(o) + (1-\gamma_i)\,\mu_i^B    Formula (2)
\hat{\Sigma}_i^B = \gamma_i E_i(o^2) + (1-\gamma_i)\left[(\mu_i^B)^2 + \Sigma_i^B\right] - (\hat{\mu}_i^B)^2    Formula (3)
where \hat{\mu}_i^B and \hat{\Sigma}_i^B are the updated mean and variance of the i-th Gaussian component, E_i(o) and E_i(o^2) are its mean and variance statistics, and γ_i is the adaptation factor balancing the adaptation between the old and new statistics, satisfying
\gamma_i = \frac{n_i}{n_i + \rho}    Formula (4)
where ρ is the relevance factor coupling the adapted speaker model to the reference model and n_i is the weight statistic. The weights, means, and variances of the source-speaker model x and the target-speaker model y are finally obtained: \{\omega_i^x, \mu_i^x, \Sigma_i^x\} and \{\omega_i^y, \mu_i^y, \Sigma_i^y\}.
4) Use the KL divergence to compute the distance between the different components within each speaker model.
If f_i(x) and f_j(x) are the distributions of two Gaussian components, the KL divergence between them is
D(f_i(x) \,\|\, f_j(x)) = \sum_x f_i(x) \log \frac{f_i(x)}{f_j(x)}    Formula (12)
Since Formula (12) is asymmetric, the KL divergence is redefined here as
D_{ij}(x) = \frac{1}{2}\left[D(f_i(x) \,\|\, f_j(x)) + D(f_j(x) \,\|\, f_i(x))\right]    Formula (13)
5) For each frame of the test speech's spectral feature vector, compute its posterior probability on each Gaussian component of the source-speaker model and select the component with the maximum posterior probability:
m = \arg\max_i\, p(i \mid x_t), \quad i = 1, 2, \ldots, M    Formula (5)
where the posterior probability p(i|x_t) satisfies p(i \mid x_t) = \frac{\omega_i\, N(x_t;\, \mu_i^x, \Sigma_i^{xx})}{\sum_{j=1}^{M} \omega_j\, N(x_t;\, \mu_j^x, \Sigma_j^{xx})}.
By the clustering property of the GMM, the same Gaussian component of the source and target speakers can be regarded as belonging to the same phoneme, satisfying
\frac{x - \mu_m^x}{\sigma_m^x} = \frac{\hat{y} - \mu_m^y}{\sigma_m^y}    Formula (6)
where \mu_m^x, \sigma_m^x, \mu_m^y, and \sigma_m^y are the means and standard deviations of the source and target speakers' m-th Gaussian component, from which the Gaussian-normalization transfer function F_g(x) is obtained. During training of the transfer function, if the mean or variance of the current component was not updated, the mean or variance of the KL-nearest Gaussian component is used instead. Fig. 1 shows the flow of obtaining the transfer function by the Gaussian-normalization method.
6) Using the mean vectors of the adapted speaker models, obtain the spectral-feature transfer function F_m(x) by the least-squares method. As above, during training of the transfer function, if the mean or variance of the current component was not updated, the mean or variance of the KL-nearest Gaussian component is used instead. Fig. 2 shows the flow of obtaining the transfer function by the mean-mapping method.
7) The Gaussian-normalization method can be regarded as a local linear regression, and the mean-transformation method as a global mapping. To further improve conversion quality, the invention fuses the two: the transfer function is F(x) = θF_g(x) + (1−θ)F_m(x). Fig. 3 shows the process of obtaining the fused transfer function.
8) F0 conversion: convert F0 with the classical Gaussian-normalization method.
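The patent only names the classical Gaussian-normalization method for F0; the widely used single-Gaussian log-F0 variant of that method can be sketched as follows (that this exact variant is used here is an assumption, and all names are illustrative):

```python
import numpy as np

def convert_f0(f0, mu_x, sigma_x, mu_y, sigma_y):
    """Single-Gaussian normalization of log F0.

    Voiced frames (f0 > 0) are mapped by
        log f0' = mu_y + (sigma_y / sigma_x) * (log f0 - mu_x),
    where mu/sigma are the log-F0 mean and standard deviation of the
    source (x) and target (y) speakers; unvoiced frames (f0 == 0) stay 0.
    """
    f0 = np.asarray(f0, float)
    out = np.zeros_like(f0)
    v = f0 > 0
    out[v] = np.exp(mu_y + (sigma_y / sigma_x) * (np.log(f0[v]) - mu_x))
    return out

# identical source and target statistics leave voiced frames unchanged
f0 = np.array([0.0, 100.0, 200.0])
same = convert_f0(f0, np.log(150.0), 0.2, np.log(150.0), 0.2)
```

The statistics are computed per speaker from the (non-parallel) training sentences, so this step also needs no frame alignment.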
9) Synthesize speech from the converted spectral features obtained by the transfer function and the converted F0 with the STRAIGHT model, finally obtaining the converted speech.
Performance evaluation:
This embodiment evaluates the conversion quality on the CMU ARCTIC English speech database. 500 sentences each from the speakers BDL (male) and CLB (female) are used to train the background model. The speakers RMS (male) and SLT (female) each contribute 120 sentences: 50 parallel (symmetric) sentences are used for the GMM baseline, 50 non-parallel (asymmetric) sentences for the proposed method, and another 20 sentences for the evaluation tests. The number of mixture components M of the background model is optimized and set to 256, the number of Gaussian components of the GMM baseline is optimized and set to 16, and the MCC order is set to 24.
The Mel-cepstral distance (MCD) is first used to evaluate the converted spectral features objectively:
MCD = \frac{10}{\ln 10}\sqrt{2\sum_{j=1}^{D}\left(mc_j^c - mc_j^t\right)^2}    Formula (14)
where mc_j^c and mc_j^t are the MCCs of the converted and target speech, respectively, and D is the MCC order; a smaller MCD value indicates a better conversion.
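Formula (14) for a single frame can be sketched as follows; the names are illustrative:

```python
import numpy as np

def mcd(mc_c, mc_t):
    """Per-frame Mel-cepstral distortion of Formula (14), in dB.

    mc_c, mc_t: MCC vectors of the converted and target speech
    (the 0th, energy, coefficient is conventionally excluded).
    """
    diff = np.asarray(mc_c, float) - np.asarray(mc_t, float)
    return (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2))

d0 = mcd([1.0, 2.0], [1.0, 2.0])   # identical frames -> 0.0
d1 = mcd([1.0, 0.0], [0.0, 0.0])   # (10 / ln 10) * sqrt(2) ≈ 6.14 dB
```

In practice the per-frame values are averaged over the aligned frames of the evaluation sentences to give the curves reported in Figs. 4 and 5.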
Figs. 4 and 5 give the MCD results comparing the proposed methods with the classical GMM method under the symmetric-corpus condition: Fig. 4 shows male-to-female conversion and Fig. 5 female-to-male conversion. GN denotes the Gaussian-normalization method, MT the mean-transformation method, and GNMT the fusion method. As the number of training sentences increases, the MCD curves of the proposed methods show the same trend, all gradually approaching the result of the GMM baseline, and GNMT consistently outperforms GN and MT alone. This shows that the fusion method effectively improves on the Gaussian-normalization and mean-transformation methods.
Mean opinion score (MOS) and similarity tests are then used to subjectively evaluate the quality of the converted speech and its similarity to the target speech. Fig. 6 gives the MOS results of the proposed methods and of the classical GMM method under the symmetric-corpus condition, using a 5-point scale (1 = "poor", 5 = "very good") to rate the quality of the converted speech. Fig. 7 gives the similarity-test results of the proposed methods and the symmetric-corpus GMM baseline, likewise on a 5-point scale (1 = "completely different", 5 = "identical"), judging the similarity of the converted and target speech. Both tests use 5 asymmetric sentences for speaker adaptation and 6 professional researchers as raters; the I-shaped marks in the figures denote the variance. The results in Figs. 6 and 7 show that the proposed methods achieve performance comparable to the GMM method, corroborating the objective MCD evaluation to a certain extent.

Claims (5)

1. A voice conversion method under asymmetric corpus conditions based on an adaptive algorithm, characterized in that: a background speaker model is first obtained by training on pre-prepared reference-speaker sentences; the source- and target-speaker models are then obtained by adapting to the source and target speakers' sentences with the MAP adaptation technique; the speech conversion function is then obtained by training on the means and variances of the adapted source- and target-speaker models, using in the conversion process the Gaussian-normalization and mean-transformation methods as well as the method fusing the two; and, in addition, accurate speaker models are obtained from the limited source- and target-speaker training sentences by means of the KL divergence.
2. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm according to claim 1, characterized in that:
Adaptation of the speaker models
In the described adaptive voice conversion method, the background speaker model is described by a GMM, as follows:
p(z) = \sum_{i=1}^{M} \omega_i\, N(z;\, \mu_i^B, \Sigma_i^B)    Formula (1)
where N(·) denotes a Gaussian distribution, z is the speech spectral feature vector, M is the number of Gaussian components, ω_i is the weight of the i-th Gaussian component, satisfying \sum_{i=1}^{M} \omega_i = 1, and μ_i^B and Σ_i^B are the mean vector and covariance matrix of the i-th component; given the sequence of observed spectral feature vectors O = [o_1, o_2, \ldots, o_T], the MAP adaptation algorithm updates the means and variances as follows:
\hat{\mu}_i^B = \gamma_i E_i(o) + (1-\gamma_i)\,\mu_i^B    Formula (2)
\hat{\Sigma}_i^B = \gamma_i E_i(o^2) + (1-\gamma_i)\left[(\mu_i^B)^2 + \Sigma_i^B\right] - (\hat{\mu}_i^B)^2    Formula (3)
where \hat{\mu}_i^B and \hat{\Sigma}_i^B are the updated mean and variance of the i-th Gaussian component; E_i(o) and E_i(o^2) are its mean and variance statistics, and γ_i is the adaptation factor balancing the adaptation between the old and new statistics, satisfying
\gamma_i = \frac{n_i}{n_i + \rho}    Formula (4)
where ρ is the relevance factor coupling the adapted speaker model to the reference model and n_i is the weight statistic; the weights, means, and variances of the source-speaker model x and the target-speaker model y are finally obtained: \{\omega_i^x, \mu_i^x, \Sigma_i^x\} and \{\omega_i^y, \mu_i^y, \Sigma_i^y\}.
3. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm according to claim 2, characterized in that:
Voice conversion based on Gaussian normalization
A voice conversion method based on Gaussian normalization is proposed first: at the conversion stage, for each source-speaker spectral feature frame x_t, the Gaussian component with the maximum posterior probability under the source-speaker model is selected:
m = \arg\max_i\, p(i \mid x_t), \quad i = 1, 2, \ldots, M    Formula (5)
where p(i|x_t), the posterior probability that x_t belongs to the i-th Gaussian component, satisfies
p(i \mid x_t) = \frac{\omega_i\, N(x_t;\, \mu_i^x, \Sigma_i^{xx})}{\sum_{j=1}^{M} \omega_j\, N(x_t;\, \mu_j^x, \Sigma_j^{xx})};
by the clustering property of the GMM, the same Gaussian component of the source and target speakers can be regarded as belonging to the same phoneme, satisfying
\frac{x - \mu_m^x}{\sigma_m^x} = \frac{\hat{y} - \mu_m^y}{\sigma_m^y}    Formula (6)
where \mu_m^x, \sigma_m^x, \mu_m^y, and \sigma_m^y are the means and standard deviations of the m-th Gaussian component of the source and target speakers, respectively; the transfer function then follows:
F(x) = \hat{y} = \frac{\sigma_m^y}{\sigma_m^x}\, x + \mu_m^y - \frac{\sigma_m^y}{\sigma_m^x}\,\mu_m^x    Formula (7)
Voice conversion based on mean transformation
Given the mean-vector sequences of the source- and target-speaker models, \mu^x = [\mu_1^x, \ldots, \mu_M^x] and \mu^y = [\mu_1^y, \ldots, \mu_M^y], the mapping function between μ^x and μ^y is
\mu^y = F(\mu^x) = A\mu^x + b    Formula (8)
setting \hat{\mu}^x = \mu^x - \bar{\mu}^x and \hat{\mu}^y = \mu^y - \bar{\mu}^y, the unknown parameters A and b are obtained by the least-squares method:
A = \hat{\mu}^y (\hat{\mu}^x)^T \left(\hat{\mu}^x (\hat{\mu}^x)^T\right)^{-1}, \quad b = \bar{\mu}^y - A\bar{\mu}^x    Formula (9)
where \bar{\mu}^x and \bar{\mu}^y are the means of the two sequences; the transfer function of Formula (8) can be applied directly to the spectral features, giving
F(x) = Ax + b    Formula (10).
4. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm according to claim 3, characterized in that:
Voice conversion based on the fusion of Gaussian normalization and mean transformation
The transfer function is
F(x) = \theta F_g(x) + (1-\theta) F_m(x)    Formula (11)
where F_g(x) and F_m(x) are the transfer functions trained by the Gaussian-normalization and mean-transformation methods, respectively, and θ is a weighting coefficient satisfying 0 ≤ θ ≤ 1;
Model optimization
The KL divergence describes the distance between different distributions: if f_i(x) and f_j(x) are the distributions of two Gaussian components, the KL divergence between them is
D(f_i(x) \,\|\, f_j(x)) = \sum_x f_i(x) \log \frac{f_i(x)}{f_j(x)}    Formula (12)
since Formula (12) is asymmetric, the KL divergence is redefined as
D_{ij}(x) = \frac{1}{2}\left[D(f_i(x) \,\|\, f_j(x)) + D(f_j(x) \,\|\, f_i(x))\right]    Formula (13)
during conversion, if the mean or variance of the current component has not been updated, the mean or variance of the nearest Gaussian component is used instead.
5. The voice conversion method under asymmetric corpus conditions based on an adaptive algorithm according to claim 1, characterized in that: the number M of Gaussian components of the background speaker model GMM is selected according to the size of the training corpus and is chosen as a power of 2, i.e. 2^N with N a positive integer.
CN201310146293.XA 2013-04-24 2013-04-24 Voice conversion method under asymmetric corpus conditions based on an adaptive algorithm Active CN103280224B (en)


Publications (2)

Publication Number | Publication Date
CN103280224A | 2013-09-04
CN103280224B | 2015-09-16

Family

ID=49062718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310146293.XA Active CN103280224B (en) 2013-04-24 2013-04-24 Based on the phonetics transfer method under the asymmetric corpus condition of adaptive algorithm

Country Status (1)

Country Link
CN (1) CN103280224B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103531205A (en) * 2013-10-09 2014-01-22 Changzhou Institute of Technology Asymmetric voice conversion method based on deep neural network feature mapping
CN104123933A (en) * 2014-08-01 2014-10-29 Institute of Automation, Chinese Academy of Sciences Voice conversion method based on adaptive non-parallel training
CN104217721A (en) * 2014-08-14 2014-12-17 Southeast University Voice conversion method under asymmetric corpus conditions based on speaker model alignment
CN106205623A (en) * 2016-06-17 2016-12-07 Fujian Star-net eVideo Information *** Co., Ltd. Sound conversion method and device
CN106504741A (en) * 2016-09-18 2017-03-15 Guangdong Shunde SYSU-CMU International Joint Research Institute Voice conversion method based on deep neural network phoneme information
CN107301859A (en) * 2017-06-21 2017-10-27 Nanjing University of Posts and Telecommunications Voice conversion method under non-parallel text condition based on adaptive Gaussian clustering
CN107610717A (en) * 2016-07-11 2018-01-19 The Chinese University of Hong Kong Many-to-one voice conversion method based on phonetic posterior probabilities
CN107945795A (en) * 2017-11-13 2018-04-20 Hohai University Rapid model adaptation method based on Gaussian classification
CN110544466A (en) * 2019-08-19 2019-12-06 Guangzhou Jiusi Intelligent Technology Co., Ltd. Speech synthesis method under the condition of a small number of recording samples
CN111465982A (en) * 2017-12-12 2020-07-28 Sony Corporation Signal processing device and method, training device and method, and program
CN112331181A (en) * 2019-07-30 2021-02-05 Institute of Acoustics, Chinese Academy of Sciences Target speaker voice extraction method based on multi-speaker condition
CN112767942A (en) * 2020-12-31 2021-05-07 Beijing Yunji Technology Co., Ltd. Speech recognition engine adaptation method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007103520A2 (en) * 2006-03-08 2007-09-13 Voxonic, Inc. Codebook-less speech conversion method and system
CN102063899A (en) * 2010-10-27 2011-05-18 南京邮电大学 Method for voice conversion under unparallel text condition

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007103520A2 (en) * 2006-03-08 2007-09-13 Voxonic, Inc. Codebook-less speech conversion method and system
CN102063899A (en) * 2010-10-27 2011-05-18 南京邮电大学 Method for voice conversion under unparallel text condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SONG PENG ET AL: "Efficient fundamental frequency transformation for voice conversion", JOURNAL OF SOUTHEAST UNIVERSITY *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103531205A (en) * 2013-10-09 2014-01-22 Changzhou Institute of Technology Asymmetric voice conversion method based on deep neural network feature mapping
CN103531205B (en) * 2013-10-09 2016-08-31 Changzhou Institute of Technology Asymmetric voice conversion method based on deep neural network feature mapping
CN104123933A (en) * 2014-08-01 2014-10-29 Institute of Automation, Chinese Academy of Sciences Voice conversion method based on adaptive non-parallel training
CN104217721A (en) * 2014-08-14 2014-12-17 Southeast University Voice conversion method under asymmetric corpus conditions based on speaker model alignment
CN104217721B (en) * 2014-08-14 2017-03-08 Southeast University Voice conversion method under asymmetric corpus conditions based on speaker model alignment
CN106205623A (en) * 2016-06-17 2016-12-07 Fujian Star-net eVideo Information *** Co., Ltd. Sound conversion method and device
CN107610717A (en) * 2016-07-11 2018-01-19 The Chinese University of Hong Kong Many-to-one voice conversion method based on phonetic posterior probabilities
CN107610717B (en) * 2016-07-11 2021-07-06 The Chinese University of Hong Kong Many-to-one voice conversion method based on phonetic posterior probabilities
CN106504741A (en) * 2016-09-18 2017-03-15 Guangdong Shunde SYSU-CMU International Joint Research Institute Voice conversion method based on deep neural network phoneme information
CN107301859A (en) * 2017-06-21 2017-10-27 Nanjing University of Posts and Telecommunications Voice conversion method under non-parallel text condition based on adaptive Gaussian clustering
CN107301859B (en) * 2017-06-21 2020-02-21 Nanjing University of Posts and Telecommunications Voice conversion method under non-parallel text condition based on adaptive Gaussian clustering
CN107945795A (en) * 2017-11-13 2018-04-20 Hohai University Rapid model adaptation method based on Gaussian classification
CN107945795B (en) * 2017-11-13 2021-06-25 Hohai University Rapid model adaptation method based on Gaussian classification
CN111465982A (en) * 2017-12-12 2020-07-28 Sony Corporation Signal processing device and method, training device and method, and program
US11894008B2 (en) 2017-12-12 2024-02-06 Sony Corporation Signal processing apparatus, training apparatus, and method
CN112331181A (en) * 2019-07-30 2021-02-05 Institute of Acoustics, Chinese Academy of Sciences Target speaker voice extraction method based on multi-speaker condition
CN110544466A (en) * 2019-08-19 2019-12-06 Guangzhou Jiusi Intelligent Technology Co., Ltd. Speech synthesis method under the condition of a small number of recording samples
CN112767942A (en) * 2020-12-31 2021-05-07 Beijing Yunji Technology Co., Ltd. Speech recognition engine adaptation method and device, electronic equipment and storage medium
CN112767942B (en) * 2020-12-31 2023-04-07 Beijing Yunji Technology Co., Ltd. Speech recognition engine adaptation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103280224B (en) 2015-09-16

Similar Documents

Publication Publication Date Title
CN103280224A (en) Voice conversion method under asymmetric corpus condition on basis of adaptive algorithm
CN101833951B (en) Multi-background modeling method for speaker recognition
CN101000765B (en) Speech synthesis method based on prosodic features
CN101178896B (en) Unit selection speech synthesis method based on acoustic statistical model
CN103065620B (en) Method for receiving text input by a user on a mobile phone or web page and synthesizing it into personalized voice in real time
CN105469784B (en) Speaker clustering method and system based on probabilistic linear discriminant analysis model
CN110060701B (en) Many-to-many voice conversion method based on VAWGAN-AC
Xie et al. Sequence error (SE) minimization training of neural network for voice conversion.
CN109599091B (en) Many-to-many voice conversion method based on STARWGAN-GP and x-vector
CN105139864A (en) Voice recognition method and voice recognition device
CN104123933A (en) Voice conversion method based on adaptive non-parallel training
CN104217721A (en) Voice conversion method under asymmetric corpus conditions based on speaker model alignment
CN103531196A (en) Sound selection method for waveform concatenation speech synthesis
CN102332263B (en) Speaker recognition method based on emotion model synthesis using the nearest-neighbor principle
CN107195299B (en) Method and apparatus for training neural network acoustic model, and speech recognition method and device
CN104240706A (en) Speaker recognition method based on GMM Token matching similarity correction scores
CN102779510B (en) Speech emotion recognition method based on adaptive feature-space projection
CN103456302B (en) Emotional speaker recognition method based on emotion GMM model weight synthesis
CN105261367B (en) Speaker identification method
CN110992988B (en) Speech emotion recognition method and device based on domain adversarial training
CN104462409A (en) Cross-language emotional resource data identification method based on AdaBoost
CN109584893B (en) Many-to-many voice conversion system based on VAE and i-vector under non-parallel text condition
CN102930863B (en) Voice conversion and reconstruction method based on simplified adaptive interpolation weighted spectrum model
CN105280181A (en) Training method for language recognition model and language recognition method
CN105488098B (en) New word extraction method based on domain difference

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180614

Address after: No. 408 Heyan Road, Qixia District, Nanjing, Jiangsu 210037

Patentee after: Nanjing Boke Electronic Technology Co., Ltd.

Address before: No. 2 Sipailou, Xuanwu District, Nanjing, Jiangsu

Patentee before: Southeast University

TR01 Transfer of patent right

Effective date of registration: 20180709

Address after: 211103 No. 1009 Tianyuan East Road, Jiangning District, Nanjing, Jiangsu.

Patentee after: LIXIN Wireless Electronic Technology Co., Ltd.

Address before: No. 408 Heyan Road, Qixia District, Nanjing, Jiangsu 210037

Patentee before: Nanjing Boke Electronic Technology Co., Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190109

Address after: Room 125, Building 1, Building 6, 4299 Jindu Road, Minhang District, Shanghai 201100

Patentee after: Shanghai Taiyu Information Technology Co., Ltd.

Address before: 211103 No. 1009 Tianyuan East Road, Jiangning District, Nanjing, Jiangsu.

Patentee before: LIXIN Wireless Electronic Technology Co., Ltd.