CN109285536A - Voice special effect synthesis method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN109285536A
- Application number: CN201811413566.1A
- Authority
- CN
- China
- Prior art keywords
- target
- basic
- prosodic features
- voice
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
Abstract
Embodiments of the invention disclose a speech special-effect synthesis method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining text data corresponding to original speech data, and obtaining basic prosodic features and basic acoustic features matching the text data; obtaining, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect, the feature adjustment parameters comprising target prosodic feature adjustment parameters and/or target acoustic feature adjustment parameters; adjusting the basic prosodic features and/or basic acoustic features with the feature adjustment parameters to obtain target prosodic features and/or target acoustic features; and generating, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original speech data. The technical solution of the embodiments of the invention can meet the diversity requirements of special-effect speech.
Description
Technical field
Embodiments of the present invention relate to the field of speech processing, and in particular to a speech special-effect synthesis method and apparatus, an electronic device, and a storage medium.
Background technique
Speech synthesis, also known as text-to-speech (TTS) technology, can convert arbitrary text into fluent speech in real time.
Existing speech synthesis techniques generally process the text data of original speech data with a pre-trained prosody model and acoustic model to obtain synthesized speech corresponding to the original speech data. In the course of implementation, the inventors found the following defect in the prior art: processing the text data of original speech data with pre-trained prosody and acoustic models yields only one fixed type of synthesized speech, which cannot meet the diversity requirements of special-effect speech.
Summary of the invention
In view of this, embodiments of the present invention provide a speech special-effect synthesis method and apparatus, an electronic device, and a storage medium, with the main purpose of solving the problem that the synthesized speech produced by existing speech synthesis systems is monotonous and cannot meet the diversity requirements of special-effect speech.
To solve the above problems, embodiments of the present invention mainly provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides a speech special-effect synthesis method, comprising:

obtaining text data corresponding to original speech data, and obtaining basic prosodic features and basic acoustic features matching the text data;

obtaining, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect, the feature adjustment parameters comprising target prosodic feature adjustment parameters and/or target acoustic feature adjustment parameters;

adjusting the basic prosodic features and/or basic acoustic features with the feature adjustment parameters to obtain target prosodic features and/or target acoustic features; and

generating, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original speech data.
In a second aspect, an embodiment of the present invention further provides a speech special-effect synthesis apparatus, comprising:

a data acquisition module, configured to obtain text data corresponding to original speech data, and to obtain basic prosodic features and basic acoustic features matching the text data;

a parameter acquisition module, configured to obtain, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect, the feature adjustment parameters comprising target prosodic feature adjustment parameters and/or target acoustic feature adjustment parameters;

a parameter adjustment module, configured to adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameters to obtain target prosodic features and/or target acoustic features; and

a speech synthesis module, configured to generate, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original speech data.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising:

at least one processor; and

at least one memory and a bus connected to the processor; wherein

the processor and the memory communicate with each other via the bus; and

the processor is configured to invoke program instructions in the memory to perform the speech special-effect synthesis method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the speech special-effect synthesis method provided by any embodiment of the present invention.
Through the above technical solutions, the technical solution provided by the embodiments of the present invention has at least the following advantages:

The embodiments of the present invention obtain text data corresponding to original speech data together with the basic prosodic features and basic acoustic features matching the text data; obtain, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect; adjust the basic prosodic features and/or basic acoustic features with those parameters to obtain target prosodic features and/or target acoustic features; and finally generate, from the obtained target features, special-effect synthesized speech corresponding to the original speech data. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that the synthesized speech of existing speech synthesis systems is monotonous and meeting the diversity requirements of special-effect speech.
The above description is merely an overview of the technical solutions of the embodiments of the present invention. To make the technical means of the embodiments clearer so that they may be implemented according to the contents of the specification, and to make the above and other objects, features, and advantages of the embodiments more readily understandable, specific embodiments of the present invention are set forth below.
Detailed description of the invention
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the embodiments of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of a speech special-effect synthesis method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a speech special-effect synthesis method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of a speech special-effect synthesis apparatus provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present invention.
Specific embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be realized in various forms and should not be limited by the embodiments set forth here; rather, these embodiments are provided so that the present invention will be more thoroughly understood and the scope of the disclosure will be fully conveyed to those skilled in the art.
Embodiment one
Fig. 1 is a flowchart of a speech special-effect synthesis method provided by Embodiment 1 of the present invention. This embodiment is applicable to the case of synthesizing special-effect synthesized speech according to different special-effect requirements. The method may be performed by a speech special-effect synthesis apparatus, which may be implemented in software and/or hardware. Accordingly, as shown in Fig. 1, the method comprises the following operations:
S110: obtain text data corresponding to original speech data, and obtain basic prosodic features and basic acoustic features matching the text data.
Here, the original speech data may be manually entered speech data that needs to be converted into special-effect synthesized speech. The basic prosodic features may be the prosodic features of the original speech data, for example the tones of the pinyin syllables, stress, word segmentation, and pause features; the basic acoustic features may be the acoustic features of the original speech data, for example its fundamental-frequency (F0) parameters and spectral parameters.

In embodiments of the present invention, the original speech data may be any manually entered speech data. Optionally, it may be entered in Mandarin, and it may also be entered with emotion; the embodiments of the present invention do not restrict the entry mode or content of the original speech data. Obtaining the text data corresponding to the original speech data may be achieved with any speech recognition technique; likewise, the embodiments do not restrict how this text data is obtained. Accordingly, after the text data corresponding to the original speech data is obtained, the basic prosodic features and basic acoustic features matching the text data can be further obtained.
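For concreteness, the basic feature sets described above might be held in structures like the following. This is a minimal sketch; the class names, field names, and sample values are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BasicProsody:
    # Per-character pinyin syllables, e.g. "ni3" = syllable "ni" with tone 3.
    pinyin: List[str]
    # Pause-level markers after each character (e.g. 2 = short, 3 = long);
    # 0 means no pause.
    pauses: List[int]


@dataclass
class BasicAcoustics:
    # Frame-level fundamental-frequency (F0) contour, in Hz.
    f0: List[float]
    # Frame-level spectral parameters (one vector per frame).
    spectrum: List[List[float]]


# Sample values matching the "ni3 hao3 ..." illustration given later in
# Embodiment 2 of this patent.
prosody = BasicProsody(
    pinyin=["ni3", "hao3", "hen3", "gao1", "xing4", "ren4", "shi1", "ni3"],
    pauses=[0, 2, 0, 0, 0, 0, 0, 3],
)
acoustics = BasicAcoustics(
    f0=[258.0, 263.0, 275.0],
    spectrum=[[1.2, 0.8], [1.1, 0.9], [1.0, 1.0]],
)
```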
S120: obtain, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect, the feature adjustment parameters comprising target prosodic feature adjustment parameters and/or target acoustic feature adjustment parameters.
Here, the feature adjustment parameters may be parameters for adjusting the basic prosodic features and/or basic acoustic features of the original speech data, such as tone, stress, word segmentation, pause features, fundamental-frequency parameters, and spectral parameters. The target prosodic feature adjustment parameters may be the parameters available for adjusting the basic prosodic features of the original speech data, and the target acoustic feature adjustment parameters may be the parameters available for adjusting its basic acoustic features.
In embodiments of the present invention, before special-effect speech is synthesized from the original speech data, mapping relations between at least two special effects and their corresponding feature adjustment parameters may first be established, so that speech synthesis for different special-effect requirements can be realized. For example, mapping relations to feature adjustment parameters are established separately for multiple special effects such as a machine-sound effect, an elderly-male-voice effect, and a child-voice effect. For different special effects, the target prosodic feature adjustment parameters and/or target acoustic feature adjustment parameters included in the corresponding feature adjustment parameters also differ from one another. The feature adjustment parameters of some special effects may involve both target prosodic feature adjustment parameters and target acoustic feature adjustment parameters, while other special effects may involve only target prosodic feature adjustment parameters or only target acoustic feature adjustment parameters.
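Such a mapping relation can be sketched as a simple lookup table from effect name to parameter set. The parameter names and values below are illustrative assumptions (only the machine-sound values echo the example given later in the patent); some effects carry prosodic parameters, acoustic parameters, or both:

```python
# Hypothetical effect -> feature-adjustment-parameter mapping.
EFFECT_PARAMS = {
    "machine": {            # robot voice: flatten tones, unify pauses, fix F0
        "target_tone": 1,   # force every syllable to tone 1
        "target_pause": 2,  # unify all pauses at level 2
        "target_f0": 260.0, # constant fundamental frequency, in Hz
    },
    "elderly_male": {       # acoustic-only adjustment: lower the pitch
        "target_f0": 110.0,
    },
    "child": {              # acoustic-only adjustment: raise the pitch
        "target_f0": 320.0,
    },
}


def lookup_params(effect: str) -> dict:
    """Return the feature adjustment parameters mapped to a desired effect."""
    if effect not in EFFECT_PARAMS:
        raise KeyError(f"no mapping established for effect {effect!r}")
    return EFFECT_PARAMS[effect]
```

Note that only the "machine" entry carries prosodic parameters, reflecting the patent's point that some effects involve only one of the two parameter kinds.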
S130: adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameters to obtain target prosodic features and/or target acoustic features.
Here, the target prosodic features may be the prosodic features of the desired special-effect synthesized speech, and the target acoustic features may be the acoustic features of the desired special-effect synthesized speech.

Accordingly, when special-effect synthesis is performed on the original speech data, the basic prosodic features and/or basic acoustic features of the original speech data may be adjusted according to the feature adjustment parameters corresponding to the special effect of the synthesized speech, thereby obtaining the adjusted target prosodic features and/or target acoustic features. That is, embodiments of the present invention may adjust both the basic prosodic features and the basic acoustic features to obtain target prosodic features and target acoustic features, or may adjust only the basic prosodic features or only the basic acoustic features to obtain target prosodic features or target acoustic features.
S140: generate, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original speech data.
In embodiments of the present invention, after the target prosodic features and/or target acoustic features are obtained, special-effect synthesized speech corresponding to the original speech data can be generated from them.
The embodiments of the present invention obtain text data corresponding to original speech data together with the basic prosodic features and basic acoustic features matching the text data; obtain, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect; adjust the basic prosodic features and/or basic acoustic features with those parameters to obtain target prosodic features and/or target acoustic features; and finally generate, from the obtained target features, special-effect synthesized speech corresponding to the original speech data. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that the synthesized speech of existing speech synthesis systems is monotonous and meeting the diversity requirements of special-effect speech.
Embodiment two
Fig. 2 is a flowchart of a speech special-effect synthesis method provided by Embodiment 2 of the present invention. This embodiment is elaborated on the basis of the above embodiment, and gives a specific implementation of adjusting the basic prosodic features and/or basic acoustic features with the feature adjustment parameters to obtain target prosodic features and/or target acoustic features. Accordingly, as shown in Fig. 2, the method comprises the following operations:
S210: obtain text data corresponding to original speech data, and obtain basic prosodic features and basic acoustic features matching the text data.

S220: obtain, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameters corresponding to a desired special effect, the feature adjustment parameters comprising target prosodic feature adjustment parameters and/or target acoustic feature adjustment parameters.
S230: adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameters to obtain target prosodic features and/or target acoustic features.
Here, the basic prosodic features comprise basic pronunciation features corresponding to the text characters in the text data and basic pause values between pairs of adjacent text characters in the text data, the basic pronunciation features comprising the pinyin elements corresponding to the text characters and the tone values corresponding to the pinyin elements; the target prosodic feature adjustment parameters comprise a target tone adjustment value and a target pause value. The basic acoustic features comprise basic fundamental-frequency parameters and basic spectral parameters corresponding to the original speech data; the target acoustic feature adjustment parameters comprise target fundamental-frequency parameters and target spectral parameters.
In embodiments of the present invention, the basic pronunciation features may be the pronunciation features of each character of the text data corresponding to the original speech data, specifically the pinyin elements of the text characters and the tone values of those pinyin elements. The basic pause values may be the pause values between adjacent characters of that text data. The basic fundamental-frequency parameters and basic spectral parameters are the fundamental-frequency and spectral parameters of the original speech data. The target tone adjustment value and target pause value specify, for the text data corresponding to the desired special-effect synthesized speech, the tone values of the pinyin elements and the pause values between adjacent characters; the target fundamental-frequency parameters and target spectral parameters are the fundamental-frequency and spectral parameters corresponding to the desired special-effect synthesized speech.
Accordingly, when the target prosodic feature adjustment parameters are used to adjust the basic prosodic features to obtain the target prosodic features, S230 may specifically comprise the following operations:

S231a: use the target tone adjustment value to update the tone value corresponding to each pinyin element in the basic prosodic features.

Specifically, the target tone adjustment value may be used to update the tone value of each pinyin element in the basic prosodic features of the text data corresponding to the original speech data.

S232a: use the target pause value to update the basic pause values between pairs of adjacent text characters in the basic prosodic features of the text data.

Similarly, the target pause value may be used to update the basic pause values between adjacent text characters in the basic prosodic features of the text data corresponding to the original speech data.

S233a: take the update result as the target prosodic features.

Accordingly, the update result of the tone values and pause values can finally be taken as the target prosodic features.
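Steps S231a-S233a can be sketched as a single function over a pinyin sequence and pause markers. The representation (tone digit appended to each syllable, integer pause levels) is an illustrative assumption; the patent does not prescribe a data layout:

```python
import re
from typing import List, Tuple


def adjust_prosody(pinyin: List[str], pauses: List[int],
                   target_tone: int, target_pause: int) -> Tuple[List[str], List[int]]:
    """S231a-S233a sketch: overwrite each syllable's trailing tone digit with
    the target tone adjustment value, and each nonzero pause marker with the
    target pause value (0, i.e. no pause, is left untouched)."""
    new_pinyin = [re.sub(r"\d+$", str(target_tone), syl) for syl in pinyin]
    new_pauses = [target_pause if p > 0 else 0 for p in pauses]
    return new_pinyin, new_pauses
```

Applied to the machine-sound example given later (all tones to 1, all pauses to level 2), "ni3 hao3 ..." becomes "ni1 hao1 ...".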
Accordingly, when the target acoustic feature adjustment parameters are used to adjust the basic acoustic features to obtain the target acoustic features, S230 may specifically comprise the following operations:

S231b: use the target fundamental-frequency parameters to update the basic fundamental-frequency parameters in the basic acoustic features.

Specifically, the target fundamental-frequency parameters may be used to update the basic fundamental-frequency parameters in the basic acoustic features corresponding to the original speech data.

S232b: use the target spectral parameters to update the basic spectral parameters in the basic acoustic features.

Similarly, the target spectral parameters may be used to update the basic spectral parameters in the basic acoustic features corresponding to the original speech data.

S233b: take the update result as the target acoustic features.

Accordingly, the update result of the fundamental-frequency and spectral parameters can finally be taken as the target acoustic features.
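Steps S231b-S233b admit an analogous sketch. Here the target F0 is modeled as a single constant (as in the machine-sound example below), and a `None` parameter means the corresponding basic feature passes through unchanged; both choices are illustrative assumptions:

```python
from typing import List, Optional, Tuple


def adjust_acoustics(f0: List[float], spectrum: List[List[float]],
                     target_f0: Optional[float] = None,
                     target_spectrum: Optional[List[List[float]]] = None
                     ) -> Tuple[List[float], List[List[float]]]:
    """S231b-S233b sketch: replace the basic F0 contour with a constant
    target F0 and the basic spectral parameters with target ones; None
    keeps the corresponding basic feature unchanged."""
    new_f0 = [target_f0] * len(f0) if target_f0 is not None else list(f0)
    new_spectrum = target_spectrum if target_spectrum is not None else spectrum
    return new_f0, new_spectrum
```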
S240: use a vocoder to generate, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original speech data.

In embodiments of the present invention, after the target prosodic features and/or target acoustic features are obtained, a vocoder may be used to synthesize them so as to generate the special-effect synthesized speech corresponding to the original speech data.
In an optional embodiment of the present invention, generating special-effect synthesized speech corresponding to the original speech data according to the target prosodic features and/or target acoustic features may comprise one of the following:

generating the special-effect synthesized speech corresponding to the original speech data according to the target prosodic features and the basic acoustic features;

generating the special-effect synthesized speech corresponding to the original speech data according to the basic prosodic features and the target acoustic features;

generating the special-effect synthesized speech corresponding to the original speech data according to the target prosodic features and the target acoustic features.
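The three generation modes above reduce to one selection step before the vocoder is invoked: wherever the effect produced a target feature it is used, otherwise the basic feature passes through. A minimal sketch under assumed types (nothing here beyond the pass-through logic comes from the patent):

```python
from typing import Optional, Tuple, TypeVar

P = TypeVar("P")  # prosodic feature type
A = TypeVar("A")  # acoustic feature type


def select_features(basic_pros: P, basic_ac: A,
                    target_pros: Optional[P] = None,
                    target_ac: Optional[A] = None) -> Tuple[P, A]:
    """Pick the (prosody, acoustics) pair handed to the vocoder, covering
    the three cases listed above."""
    return (target_pros if target_pros is not None else basic_pros,
            target_ac if target_ac is not None else basic_ac)
```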
It should be noted that, since for different special effects the corresponding feature adjustment parameters may involve both target prosodic feature adjustment parameters and target acoustic feature adjustment parameters, or only one of the two, the generation of the special-effect synthesized speech corresponding to the original speech data may likewise involve both target prosodic features and target acoustic features, or only one of them. Specifically, when target prosodic features and target acoustic features are both obtained, they may be combined to obtain the corresponding special-effect synthesized speech; when only target prosodic features are obtained, they may be combined with the basic acoustic features to obtain the corresponding special-effect synthesized speech; and when only target acoustic features are obtained, they may be combined with the basic prosodic features to obtain the corresponding special-effect synthesized speech.

It should further be noted that different prosodic feature models and acoustic feature models corresponding to the special effects may also be used to adjust the basic prosodic features and/or basic acoustic features to obtain the target prosodic features and/or target acoustic features.
In a specific example, synthesizing speech with a machine-sound special effect is taken as an illustration. Assume the text data corresponding to the original speech data is "你好，很高兴认识你" ("Hello, nice to meet you"). The basic prosodic features matching the text data are: 1) ni3 hao3 hen3 gao1 xing4 ren4 shi1 ni3; 2) 你好 #2 很高兴认识你 #3, where the first feature is the basic pronunciation feature corresponding to the text characters and the second feature is the basic pause values between adjacent text characters (#2 and #3 denoting pause levels). The basic fundamental-frequency parameters in the basic acoustic features matching the text data are ... 258 263 275 .... Accordingly, if the mapping for the target prosodic feature adjustment parameters corresponding to the machine-sound special effect unifies the target tone adjustment value to the first tone and the target pause value to level two, and the mapping for the corresponding target acoustic feature adjustment parameters fixes the target fundamental frequency at the value 260 while leaving the spectral parameters unchanged, then the target prosodic features of the final special-effect synthesized speech are: 1) ni1 hao1 hen1 gao1 xing1 ren1 shi1 ni1; 2) 你好 #2 很高兴认识你 #2. The target fundamental-frequency parameters included in the target acoustic features are ... 260 260 260 ..., and the target spectral parameters coincide with the basic spectral parameters, requiring no adjustment. Finally, the vocoder can combine and synthesize the target prosodic features and target acoustic features corresponding to the machine sound, thereby generating the final special-effect synthesized speech for the machine sound.
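The machine-sound walkthrough above can be reproduced as a short script. The tone, pause, and F0 values are the ones given in the example; the code itself (syllable/tone encoding, list layout) is only an illustrative sketch:

```python
import re

# Basic features for "你好，很高兴认识你" as given in the example.
pinyin = ["ni3", "hao3", "hen3", "gao1", "xing4", "ren4", "shi1", "ni3"]
pauses = [0, 2, 0, 0, 0, 0, 0, 3]   # "你好 #2 很高兴认识你 #3"
f0 = [258.0, 263.0, 275.0]

# Machine-sound adjustment: tones -> 1, pauses -> level 2,
# F0 -> constant 260, spectrum left unchanged.
target_pinyin = [re.sub(r"\d+$", "1", s) for s in pinyin]
target_pauses = [2 if p else 0 for p in pauses]
target_f0 = [260.0] * len(f0)

print(" ".join(target_pinyin))  # ni1 hao1 hen1 gao1 xing1 ren1 shi1 ni1
print(target_pauses)            # [0, 2, 0, 0, 0, 0, 0, 2]
print(target_f0)                # [260.0, 260.0, 260.0]
```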
It should be noted that Fig. 2 is only a schematic diagram of one implementation, and that there is no required ordering between S231a-S233a and S231b-S233b: S231a-S233a may be performed first and then S231b-S233b, or vice versa; the two may also be performed in parallel, or only one of them may be performed.
With the above technical solution, the feature adjustment parameters corresponding to the desired special effect are determined according to the pre-established mapping relations between special effects and corresponding feature adjustment parameters; the basic prosodic features and/or basic acoustic features are adjusted according to those parameters to obtain the target prosodic features and/or target acoustic features; and finally special-effect synthesized speech corresponding to the original speech data is generated from the obtained target prosodic features and/or target acoustic features. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that the synthesized speech of existing speech synthesis systems is monotonous and meeting the diversity requirements of special-effect speech.
It should be noted that any combination of the technical features in the above embodiments also falls within the protection scope of the present invention.
Embodiment three
Fig. 3 is a schematic diagram of a speech special-effect synthesis apparatus provided by Embodiment 3 of the present invention. As shown in Fig. 3, the apparatus comprises a data acquisition module 310, a parameter acquisition module 320, a parameter adjustment module 330, and a speech synthesis module 340, wherein:
Data acquisition module 310 for obtaining the corresponding text data of primary voice data, and obtains and the textual data
According to matched basic prosodic features and basic acoustic feature;
Parameter acquisition module 320, for being joined according at least two special effects pre-established with corresponding Character adjustment
Mapping relations between number, obtain Character adjustment parameter corresponding with the special effect of demand, and the Character adjustment parameter includes:
Target prosodic features adjusting parameter and/or target acoustical Character adjustment parameter;
Parameter adjustment module 330, for using the Character adjustment parameter to the basic prosodic features and/or basic sound
It learns feature to be adjusted, obtains target prosodic features and/or target acoustical feature;
Voice synthetic module 340, for according to the target prosodic features and/or target acoustical feature, generate with it is described
The corresponding special efficacy of primary voice data synthesizes voice.
In this embodiment of the present invention, the text data corresponding to the original voice data, together with the basic prosodic features and basic acoustic features matching the text data, is obtained; a feature adjustment parameter corresponding to the required special effect is obtained according to the pre-established mapping relationship between at least two special effects and corresponding feature adjustment parameters; the basic prosodic features and/or basic acoustic features are adjusted using the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; finally, special-effect synthesized speech corresponding to the original voice data is generated from the obtained target prosodic features and/or target acoustic features. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that existing speech synthesis systems produce only a single kind of synthesized speech and meeting the demand for diverse special-effect speech.
Optionally, the basic prosodic features include basic pronunciation features corresponding to the text characters in the text data, and basic pause values between every two adjacent text characters in the text data; the basic pronunciation features include pinyin elements corresponding to the text characters and tone values corresponding to the pinyin elements. The target prosodic feature adjustment parameter includes a target tone adjustment value and a target pause value.
Optionally, the parameter adjustment module 330 is specifically configured to update the tone value corresponding to each pinyin element in the basic prosodic features using the target tone adjustment value; to update the basic pause values between every two adjacent text characters of the text data in the basic prosodic features using the target pause value; and to take the updated result as the target prosodic features.
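This tone and pause update can be sketched as below. The dictionary representation and the clamping of Mandarin tone values to the range 1-4 are assumptions made for illustration, not details taken from the patent.

```python
def adjust_prosody(basic, tone_shift, target_pause):
    """Return target prosodic features: shifted tone per pinyin element,
    and a uniform target pause between every two adjacent characters."""
    return {
        "pronunciation": [
            {"pinyin": p["pinyin"],
             # clamp shifted tones to the four Mandarin full tones
             "tone": min(4, max(1, p["tone"] + tone_shift))}
            for p in basic["pronunciation"]
        ],
        # one pause value per gap between adjacent text characters
        "pauses": [target_pause] * len(basic["pauses"]),
    }

basic = {
    "pronunciation": [{"pinyin": "ni", "tone": 3}, {"pinyin": "hao", "tone": 3}],
    "pauses": [0.05],  # one gap between the two characters
}
print(adjust_prosody(basic, tone_shift=1, target_pause=0.5))
```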
Optionally, the basic acoustic features include basic fundamental-frequency parameters and basic spectrum parameters corresponding to the original voice data; the target acoustic feature adjustment parameter includes target fundamental-frequency parameters and target spectrum parameters.
Optionally, the parameter adjustment module 330 is specifically configured to update the basic fundamental-frequency parameters in the basic acoustic features using the target fundamental-frequency parameters; to update the basic spectrum parameters in the basic acoustic features using the target spectrum parameters; and to take the updated result as the target acoustic features.
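The acoustic update is a straight replacement, as in this sketch. The dictionary layout with per-frame lists is an illustrative representation assumed here, not one specified by the patent.

```python
def adjust_acoustics(basic, target_f0=None, target_spectrum=None):
    """Return target acoustic features: target F0 and/or spectrum
    parameters replace the basic ones; anything not supplied is kept."""
    return {
        "f0": target_f0 if target_f0 is not None else basic["f0"],
        "spectrum": (target_spectrum if target_spectrum is not None
                     else basic["spectrum"]),
    }
```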
Optionally, the speech synthesis module 340 is specifically configured to generate special-effect synthesized speech corresponding to the original voice data according to the target prosodic features and the basic acoustic features; or according to the basic prosodic features and the target acoustic features; or according to the target prosodic features and the target acoustic features.
Optionally, the speech synthesis module 340 is specifically configured to use a vocoder to generate, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original voice data.
The above voice special effect synthesis apparatus can perform the voice special effect synthesis method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, reference may be made to the voice special effect synthesis method provided by any embodiment of the present invention.
Since the apparatus introduced in this embodiment is one capable of performing the voice special effect synthesis method of the embodiments of the present invention, those skilled in the art can, on the basis of the method described in those embodiments, understand the specific implementations and variations of the apparatus of this embodiment; how the apparatus implements the method is therefore not described in detail here. Any apparatus used by those skilled in the art to implement the voice special effect synthesis method in the embodiments of the present invention falls within the scope to be protected by this application.
Embodiment Four
Fig. 4 is a structural schematic diagram of an electronic device provided by Embodiment Four of the present invention. As shown in Fig. 4, the device comprises: at least one processor 41; at least one memory 42 connected to the processor 41; and a bus 43; wherein the processor 41 and the memory 42 communicate with each other through the bus 43.
The processor 41 is configured to call program instructions in the memory 42 to perform the steps in the above voice special effect synthesis method embodiments. For example, the processor 41 performs: obtaining text data corresponding to original voice data, and obtaining basic prosodic features and basic acoustic features matching the text data; obtaining, according to a pre-established mapping relationship between at least two special effects and corresponding feature adjustment parameters, a feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter including a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter; adjusting the basic prosodic features and/or basic acoustic features using the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and generating, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original voice data.
Embodiment Five
This embodiment provides a non-transient computer-readable storage medium storing computer instructions. The computer instructions cause a computer to perform the voice special effect synthesis method provided by any of the above method embodiments: obtaining text data corresponding to original voice data, and obtaining basic prosodic features and basic acoustic features matching the text data; obtaining, according to a pre-established mapping relationship between at least two special effects and corresponding feature adjustment parameters, a feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter including a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter; adjusting the basic prosodic features and/or basic acoustic features using the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and generating, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original voice data.
It should be understood by those skilled in the art that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, compact disc read-only memory (CD-ROM), optical memory, and the like) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include forms such as non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, in which information storage can be realized by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change random access memory (PRAM), static random-access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
It will be understood by those skilled in the art that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, and the like) containing computer-usable program code.
The above are only embodiments of this application and are not intended to limit this application. Various changes and variations of this application will occur to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included within the scope of the claims of this application.
Claims (10)
1. A voice special effect synthesis method, characterized by comprising:
obtaining text data corresponding to original voice data, and obtaining basic prosodic features and basic acoustic features matching the text data;
obtaining, according to a pre-established mapping relationship between at least two special effects and corresponding feature adjustment parameters, a feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;
adjusting the basic prosodic features and/or basic acoustic features using the feature adjustment parameter to obtain target prosodic features and/or target acoustic features;
generating, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original voice data.
2. The method according to claim 1, characterized in that:
the basic prosodic features comprise: basic pronunciation features corresponding to the text characters in the text data, and basic pause values between every two adjacent text characters in the text data, the basic pronunciation features comprising pinyin elements corresponding to the text characters and tone values corresponding to the pinyin elements;
the target prosodic feature adjustment parameter comprises: a target tone adjustment value and a target pause value.
3. The method according to claim 2, characterized in that adjusting the basic prosodic features using the target prosodic feature adjustment parameter to obtain the target prosodic features comprises:
updating the tone value corresponding to each pinyin element in the basic prosodic features using the target tone adjustment value;
updating the basic pause values between every two adjacent text characters of the text data in the basic prosodic features using the target pause value;
taking the updated result as the target prosodic features.
4. The method according to claim 1, characterized in that:
the basic acoustic features comprise: basic fundamental-frequency parameters and basic spectrum parameters corresponding to the original voice data;
the target acoustic feature adjustment parameter comprises: target fundamental-frequency parameters and target spectrum parameters.
5. The method according to claim 4, characterized in that adjusting the basic acoustic features using the target acoustic feature adjustment parameter to obtain the target acoustic features comprises:
updating the basic fundamental-frequency parameters in the basic acoustic features using the target fundamental-frequency parameters;
updating the basic spectrum parameters in the basic acoustic features using the target spectrum parameters;
taking the updated result as the target acoustic features.
6. The method according to any one of claims 1-5, characterized in that generating special-effect synthesized speech corresponding to the original voice data according to the target prosodic features and/or target acoustic features includes one of the following:
generating special-effect synthesized speech corresponding to the original voice data according to the target prosodic features and the basic acoustic features;
generating special-effect synthesized speech corresponding to the original voice data according to the basic prosodic features and the target acoustic features;
generating special-effect synthesized speech corresponding to the original voice data according to the target prosodic features and the target acoustic features.
7. The method according to any one of claims 1-5, characterized in that generating special-effect synthesized speech corresponding to the original voice data according to the target prosodic features and/or target acoustic features comprises:
using a vocoder to generate, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original voice data.
8. A voice special effect synthesis apparatus, characterized by comprising:
a data acquisition module, configured to obtain text data corresponding to original voice data, and to obtain basic prosodic features and basic acoustic features matching the text data;
a parameter acquisition module, configured to obtain, according to a pre-established mapping relationship between at least two special effects and corresponding feature adjustment parameters, a feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;
a parameter adjustment module, configured to adjust the basic prosodic features and/or basic acoustic features using the feature adjustment parameter to obtain target prosodic features and/or target acoustic features;
a speech synthesis module, configured to generate, according to the target prosodic features and/or target acoustic features, special-effect synthesized speech corresponding to the original voice data.
9. An electronic device, characterized by comprising:
at least one processor;
and at least one memory and a bus connected to the processor; wherein
the processor and the memory communicate with each other through the bus;
the processor is configured to call program instructions in the memory to perform the voice special effect synthesis method according to any one of claims 1 to 7.
10. A non-transient computer-readable storage medium, characterized in that the non-transient computer-readable storage medium stores computer instructions which cause a computer to perform the voice special effect synthesis method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811413566.1A CN109285536B (en) | 2018-11-23 | 2018-11-23 | Voice special effect synthesis method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109285536A true CN109285536A (en) | 2019-01-29 |
CN109285536B CN109285536B (en) | 2022-05-13 |
Family
ID=65172650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811413566.1A Active CN109285536B (en) | 2018-11-23 | 2018-11-23 | Voice special effect synthesis method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109285536B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050267758A1 (en) * | 2004-05-31 | 2005-12-01 | International Business Machines Corporation | Converting text-to-speech and adjusting corpus |
CN103035251A (en) * | 2011-09-30 | 2013-04-10 | 西门子公司 | Method for building voice transformation model and method and system for voice transformation |
CN104916284A (en) * | 2015-06-10 | 2015-09-16 | 百度在线网络技术(北京)有限公司 | Prosody and acoustics joint modeling method and device for voice synthesis system |
CN105185372A (en) * | 2015-10-20 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Training method for multiple personalized acoustic models, and voice synthesis method and voice synthesis device |
CN108364631A (en) * | 2017-01-26 | 2018-08-03 | 北京搜狗科技发展有限公司 | A kind of phoneme synthesizing method and device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108140393A (en) * | 2016-09-28 | 2018-06-08 | 华为技术有限公司 | A kind of methods, devices and systems for handling multi-channel audio signal |
CN112672259A (en) * | 2019-10-16 | 2021-04-16 | 北京地平线机器人技术研发有限公司 | Loudspeaker control method and device |
CN112672259B (en) * | 2019-10-16 | 2023-03-10 | 北京地平线机器人技术研发有限公司 | Loudspeaker control method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109285536B (en) | 2022-05-13 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
20220414 | TA01 | Transfer of patent application right | Address after: 210034 floor 8, building D11, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province; Applicant after: New Technology Co.,Ltd. Address before: 100080 Room 501, 5th floor, NO.67, North Fourth Ring Road West, Haidian District, Beijing; Applicant before: Beijing Yufanzhi Information Technology Co.,Ltd.
| GR01 | Patent grant |