CN109285536A - Voice special effect synthesis method and device, electronic equipment and storage medium - Google Patents

Voice special effect synthesis method and device, electronic equipment and storage medium

Info

Publication number
CN109285536A
CN109285536A (application CN201811413566.1A)
Authority
CN
China
Prior art keywords
target
basic
prosodic features
voice
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811413566.1A
Other languages
Chinese (zh)
Other versions
CN109285536B (en)
Inventor
张冉
张征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mobvoi Innovation Technology Co Ltd
Original Assignee
Beijing Yufanzhi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yufanzhi Information Technology Co Ltd
Priority to CN201811413566.1A
Publication of CN109285536A
Application granted
Publication of CN109285536B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/04 - Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 - Prosody rules derived from text; Stress or intonation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the invention disclose a voice special effect synthesis method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring text data corresponding to original voice data, and acquiring basic prosodic features and basic acoustic features matching the text data; acquiring, according to a pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter; adjusting the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and generating special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features. The technical solution of the embodiments of the invention can satisfy the diversity requirements of special-effect voice.

Description

Voice special effect synthesis method and apparatus, electronic device, and storage medium
Technical field
Embodiments of the present invention relate to the field of voice processing technology, and in particular to a voice special effect synthesis method and apparatus, an electronic device, and a storage medium.
Background art
Speech synthesis, also known as text-to-speech (TTS) technology, can convert arbitrary text information into smooth speech in real time.
Existing speech synthesis techniques generally process the text data of original voice data using a pre-trained prosody model and acoustic model to obtain the synthesized voice corresponding to the original voice data. In the course of implementation, the inventors found the following defect in the prior art: processing the text data of original voice data with a pre-trained prosody model and acoustic model yields only a single fixed type of synthesized voice, which cannot satisfy the diversity requirements of special-effect voice.
Summary of the invention
In view of this, embodiments of the present invention provide a voice special effect synthesis method and apparatus, an electronic device, and a storage medium, with the main purpose of solving the problem that the synthesized voice produced by existing speech synthesis systems is relatively monotonous and cannot satisfy the diversity requirements of special-effect voice.
To solve the above problem, embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the invention provides a voice special effect synthesis method, comprising:
acquiring text data corresponding to original voice data, and acquiring basic prosodic features and basic acoustic features matching the text data;
acquiring, according to a pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;
adjusting the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and
generating special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features.
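The four steps of the first aspect can be sketched end to end as follows. This is an illustrative Python sketch, not part of the patent disclosure; all function names, field names, and parameter values are assumptions.

```python
# Illustrative sketch of the four claimed steps; all names and values are
# hypothetical, and a real system would end by rendering audio with a vocoder.
from dataclasses import dataclass

@dataclass
class Features:
    prosodic: dict   # e.g. per-syllable tone values and inter-character pauses
    acoustic: dict   # e.g. an F0 contour and spectral parameters

# Hypothetical pre-established mapping from effect name to adjustment parameters.
EFFECT_PARAMS = {
    "robot": {"tone": 1, "pause_level": 2, "f0": 260.0},
}

def synthesize_with_effect(base: Features, effect: str) -> Features:
    """Adjust base prosodic/acoustic features per the requested effect."""
    params = EFFECT_PARAMS[effect]
    target = Features(dict(base.prosodic), dict(base.acoustic))
    if "tone" in params:        # target prosodic feature adjustment
        target.prosodic["tones"] = [params["tone"]] * len(base.prosodic["tones"])
    if "pause_level" in params:
        target.prosodic["pauses"] = [params["pause_level"]] * len(base.prosodic["pauses"])
    if "f0" in params:          # target acoustic feature adjustment
        target.acoustic["f0"] = [params["f0"]] * len(base.acoustic["f0"])
    return target  # a vocoder would then render these features into a waveform

base = Features({"tones": [3, 3, 3], "pauses": [2, 3]},
                {"f0": [258.0, 263.0, 275.0]})
out = synthesize_with_effect(base, "robot")
print(out.prosodic["tones"])   # [1, 1, 1]
print(out.acoustic["f0"])      # [260.0, 260.0, 260.0]
```

The sketch keeps the basic features untouched and writes the adjusted values into a separate target structure, matching the method's separation of basic and target features.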
In a second aspect, an embodiment of the invention further provides a voice special effect synthesis apparatus, comprising:
a data acquisition module, configured to acquire the text data corresponding to original voice data and to acquire basic prosodic features and basic acoustic features matching the text data;
a parameter acquisition module, configured to acquire, according to a pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;
a parameter adjustment module, configured to adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and
a voice synthesis module, configured to generate special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features.
In a third aspect, an embodiment of the invention further provides an electronic device, comprising:
at least one processor; and
at least one memory and a bus connected to the processor; wherein
the processor and the memory communicate with each other through the bus; and
the processor is configured to call program instructions in the memory to execute the voice special effect synthesis method provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the invention further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions causing a computer to execute the voice special effect synthesis method provided by any embodiment of the invention.
Through the above technical solutions, the technical solution provided by embodiments of the invention has at least the following advantage:
The embodiment of the invention obtains the text data corresponding to original voice data together with the basic prosodic features and basic acoustic features matching the text data; obtains, according to the pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to the required special effect; adjusts the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and finally generates special-effect synthesized voice corresponding to the original voice data from the obtained target prosodic features and/or target acoustic features. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that the synthesized voice of existing speech synthesis systems is monotonous and thereby satisfying the diversity requirements of special-effect voice.
The above description is merely an overview of the technical solutions of the embodiments of the invention. In order that the technical means of the embodiments may be understood more clearly and implemented in accordance with the contents of the specification, and in order that the above and other objects, features, and advantages of the embodiments may be more readily apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the embodiments of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of a voice special effect synthesis method provided by Embodiment 1 of the invention;
Fig. 2 is a flowchart of a voice special effect synthesis method provided by Embodiment 2 of the invention;
Fig. 3 is a schematic diagram of a voice special effect synthesis apparatus provided by Embodiment 3 of the invention;
Fig. 4 is a structural schematic diagram of an electronic device provided by Embodiment 4 of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be realized in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Embodiment 1
Fig. 1 is a flowchart of a voice special effect synthesis method provided by Embodiment 1 of the invention. This embodiment is applicable to synthesizing special-effect synthesized voice according to different special effect requirements. The method may be executed by a voice special effect synthesis apparatus, which may be realized in software and/or hardware. Correspondingly, as shown in Fig. 1, the method includes the following operations:
S110: Acquire the text data corresponding to original voice data, and acquire the basic prosodic features and basic acoustic features matching the text data.
Here, the original voice data may be manually input voice data that needs to be converted into special-effect synthesized voice. The basic prosodic features may be the prosodic features of the original voice data, for example the tone of each syllable, stress, word segmentation, and pause features. The basic acoustic features may be the acoustic features of the original voice data, for example the fundamental frequency (F0) parameters and spectral parameters of the voice data.
In embodiments of the invention, the original voice data may be any manually input voice data. Optionally, it may be input in Mandarin, or input with emotion; embodiments of the invention do not limit the input manner or content of the original voice data. The text data corresponding to the original voice data may be obtained using any speech recognition technique; likewise, embodiments of the invention do not limit the manner of obtaining the text data. Correspondingly, after the text data corresponding to the original voice data is obtained, the basic prosodic features and basic acoustic features matching the text data can be further acquired.
S120: Acquire, according to the pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to the required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter.
Here, a feature adjustment parameter may be a parameter for adjusting the basic prosodic features and/or basic acoustic features of the original voice data, such as tone, stress, word segmentation, pause features, fundamental frequency parameters, and spectral parameters. The target prosodic feature adjustment parameter may be the parameter available for adjusting the basic prosodic features of the original voice data, and the target acoustic feature adjustment parameter may be the parameter available for adjusting the basic acoustic features of the original voice data.
In embodiments of the invention, before synthesizing special-effect voice from the original voice data, mapping relations between at least two special effects and their corresponding feature adjustment parameters may first be established, so as to realize voice synthesis for different special effect requirements. For example, mapping relations with feature adjustment parameters are established separately for multiple special effects such as a machine-voice effect, an old-man-voice effect, and a child-voice effect. For different special effects, the target prosodic feature adjustment parameter and/or target acoustic feature adjustment parameter included in the corresponding feature adjustment parameter also differ. The feature adjustment parameter corresponding to some special effects may involve both a target prosodic feature adjustment parameter and a target acoustic feature adjustment parameter, while other special effects may involve only a target prosodic feature adjustment parameter or only a target acoustic feature adjustment parameter.
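Such a pre-established mapping table can be sketched as a plain dictionary. The effect names and parameter layouts below are illustrative assumptions (the patent discloses concrete values only for the machine-voice example later); note that an entry may carry prosodic parameters, acoustic parameters, or both.

```python
# Hypothetical mapping between special effects and feature adjustment
# parameters; effect names and values are assumptions for illustration.
EFFECT_MAP = {
    # both target prosodic and target acoustic adjustment parameters
    "robot":   {"prosodic": {"tone": 1, "pause_level": 2},
                "acoustic": {"f0": 260.0}},
    # only target acoustic adjustment parameters
    "old_man": {"acoustic": {"f0_scale": 0.7}},
    # only target prosodic adjustment parameters
    "child":   {"prosodic": {"pause_level": 1}},
}

def get_adjustment_params(effect: str) -> dict:
    """Look up the feature adjustment parameters for the required effect."""
    try:
        return EFFECT_MAP[effect]
    except KeyError:
        raise ValueError(f"no mapping established for effect {effect!r}")

params = get_adjustment_params("old_man")
print("prosodic" in params, "acoustic" in params)  # False True
```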
S130: Adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features.
Here, the target prosodic features may be the prosodic features of the required special-effect synthesized voice, and the target acoustic features may be the acoustic features of the required special-effect synthesized voice.
Correspondingly, when performing special effect synthesis on the original voice data, the basic prosodic features and/or basic acoustic features of the original voice data may be adjusted according to the feature adjustment parameter corresponding to the special effect of the synthesized voice, thereby obtaining the adjusted target prosodic features and/or target acoustic features. That is, embodiments of the invention may adjust both the basic prosodic features and the basic acoustic features to obtain target prosodic features and target acoustic features, or may adjust only the basic prosodic features or only the basic acoustic features to obtain target prosodic features or target acoustic features.
S140: Generate special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features.
In embodiments of the invention, after the target prosodic features and/or target acoustic features are obtained, special-effect synthesized voice corresponding to the original voice data may be generated according to them.
The embodiment of the invention obtains the text data corresponding to original voice data together with the basic prosodic features and basic acoustic features matching the text data; obtains, according to the pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to the required special effect; adjusts the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and finally generates special-effect synthesized voice corresponding to the original voice data from the obtained target prosodic features and/or target acoustic features. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that the synthesized voice of existing speech synthesis systems is monotonous and thereby satisfying the diversity requirements of special-effect voice.
Embodiment 2
Fig. 2 is a flowchart of a voice special effect synthesis method provided by Embodiment 2 of the invention. This embodiment is elaborated on the basis of the above embodiment, and gives a specific implementation of adjusting the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features. Correspondingly, as shown in Fig. 2, the method includes the following operations:
S210: Acquire the text data corresponding to original voice data, and acquire the basic prosodic features and basic acoustic features matching the text data.
S220: Acquire, according to the pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to the required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter.
S230: Adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features.
Here, the basic prosodic features include: basic pronunciation features corresponding to the text characters in the text data, and the basic pause values between every two text characters in the text data; a basic pronunciation feature includes the pinyin element corresponding to a text character and the tone value corresponding to the pinyin element. The target prosodic feature adjustment parameter comprises: a target tone adjustment value and a target pause value. The basic acoustic features include: basic fundamental frequency parameters and basic spectral parameters corresponding to the original voice data. The target acoustic feature adjustment parameter comprises: target fundamental frequency parameters and target spectral parameters.
In embodiments of the invention, a basic pronunciation feature may be the pronunciation feature of each text character of the text data corresponding to the original voice data, specifically the pinyin element of the text character and the tone value corresponding to that pinyin element. A basic pause value may be the pause value between two adjacent text characters of the text data. The basic fundamental frequency parameters and basic spectral parameters are the fundamental frequency parameters and spectral parameters of the original voice data. The target tone adjustment value gives the tone values, corresponding to the pinyin elements, of the text data of the required special-effect synthesized voice, and the target pause value gives the pause value between two adjacent text characters of that text data. The target fundamental frequency parameters and target spectral parameters are the fundamental frequency parameters and spectral parameters of the required special-effect synthesized voice.
Correspondingly, when the target prosodic feature adjustment parameter is used to adjust the basic prosodic features to obtain the target prosodic features, S230 may specifically include the following operations:
S231a: Update, with the target tone adjustment value, the tone value corresponding to each pinyin element in the basic prosodic features.
Specifically, the target tone adjustment value may be used to update the tone value corresponding to each pinyin element in the basic prosodic features of the text data corresponding to the original voice data.
S232a: Update, with the target pause value, the basic pause values between every two text characters of the text data in the basic prosodic features.
Similarly, the target pause value may be used to update the basic pause values between every two text characters of the text data in the basic prosodic features of the text data corresponding to the original voice data.
S233a: Take the update result as the target prosodic features.
Correspondingly, the update result of the tone values and pause values may finally be taken as the target prosodic features.
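Steps S231a to S233a can be sketched as follows, under assumed data shapes: pronunciation features as (pinyin, tone) pairs and pauses as a list of inter-character pause levels. The function name and shapes are hypothetical.

```python
# Sketch of S231a-S233a: overwrite every tone value with the target tone
# adjustment value and every pause value with the target pause value.
def adjust_prosody(pronunciations, pauses, target_tone, target_pause):
    """Return the updated (pronunciation features, pause values) pair."""
    new_pron = [(pinyin, target_tone) for pinyin, _tone in pronunciations]
    new_pauses = [target_pause for _ in pauses]
    # S233a: the update result is the target prosodic feature set
    return new_pron, new_pauses

pron = [("ni", 3), ("hao", 3)]
pauses = [2]
print(adjust_prosody(pron, pauses, 1, 2))
# ([('ni', 1), ('hao', 1)], [2])
```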
Correspondingly, when the target acoustic feature adjustment parameter is used to adjust the basic acoustic features to obtain the target acoustic features, S230 may specifically include the following operations:
S231b: Update, with the target fundamental frequency parameters, the basic fundamental frequency parameters in the basic acoustic features.
Specifically, the target fundamental frequency parameters may be used to update the basic fundamental frequency parameters in the basic acoustic features corresponding to the original voice data.
S232b: Update, with the target spectral parameters, the basic spectral parameters in the basic acoustic features.
Similarly, the target spectral parameters may be used to update the basic spectral parameters in the basic acoustic features corresponding to the original voice data.
S233b: Take the update result as the target acoustic features.
Correspondingly, the update result of the fundamental frequency parameters and spectral parameters may finally be taken as the target acoustic features.
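Steps S231b to S233b can be sketched in the same style, assuming an F0 contour as a list of per-frame values and the spectrum as an opaque parameter list; when no target is supplied for one of the two, the basic value passes through unchanged. Names and shapes are hypothetical.

```python
# Sketch of S231b-S233b: replace the base F0 contour and/or base spectrum
# with the target parameters; untargeted features pass through unchanged.
def adjust_acoustics(base_f0, base_spectrum, target_f0=None, target_spectrum=None):
    """Return the updated (F0 parameters, spectral parameters) pair."""
    f0 = [target_f0] * len(base_f0) if target_f0 is not None else list(base_f0)
    spectrum = (list(target_spectrum) if target_spectrum is not None
                else list(base_spectrum))
    # S233b: the update result is the target acoustic feature set
    return f0, spectrum

f0, spec = adjust_acoustics([258.0, 263.0, 275.0], [0.1, 0.2], target_f0=260.0)
print(f0)    # [260.0, 260.0, 260.0]
print(spec)  # [0.1, 0.2]
```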
S240: Generate, with a vocoder, special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features.
In embodiments of the invention, after the target prosodic features and/or target acoustic features are obtained, a vocoder may be used to synthesize the target prosodic features and/or target acoustic features, thereby generating the special-effect synthesized voice corresponding to the original voice data.
In an optional embodiment of the invention, generating special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features may include one of the following:
generating special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and the basic acoustic features;
generating special-effect synthesized voice corresponding to the original voice data according to the basic prosodic features and the target acoustic features; or
generating special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and the target acoustic features.
It should be noted that, for different special effects, the feature adjustment parameter corresponding to some special effects may involve both a target prosodic feature adjustment parameter and a target acoustic feature adjustment parameter, while other special effects may involve only a target prosodic feature adjustment parameter or only a target acoustic feature adjustment parameter. Therefore, when generating the special-effect synthesized voice corresponding to the original voice data, both target prosodic features and target acoustic features may be involved, or only target prosodic features or target acoustic features. Specifically, when target prosodic features and target acoustic features are both obtained, they may be combined to obtain the corresponding special-effect synthesized voice; when only target prosodic features are obtained, they may be combined with the basic acoustic features; and when only target acoustic features are obtained, they may be combined with the basic prosodic features.
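The three combination cases above reduce to a simple selection rule: each target feature set replaces its basic counterpart when present. A minimal sketch, with hypothetical names:

```python
# Sketch of the three combination cases: each target feature set, when
# present, replaces its basic counterpart before vocoder synthesis.
def pick_features(base_prosody, base_acoustics,
                  target_prosody=None, target_acoustics=None):
    """Return the (prosody, acoustics) pair to hand to the vocoder."""
    prosody = target_prosody if target_prosody is not None else base_prosody
    acoustics = target_acoustics if target_acoustics is not None else base_acoustics
    return prosody, acoustics

print(pick_features("P_base", "A_base", target_acoustics="A_target"))
# ('P_base', 'A_target')
```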
In addition, it should be noted that different prosodic feature models and acoustic feature models corresponding to the special effects may also be used to adjust the basic prosodic features and/or basic acoustic features so as to obtain the target prosodic features and/or target acoustic features.
In a specific example, take machine voice as the special effect of the synthesized voice. Suppose the text data corresponding to the original voice data is: "Hello, nice to meet you" (你好，很高兴认识你). The basic prosodic features matching the text data are: 1) ni3 hao3 hen3 gao1 xing4 ren4 shi1 ni3; 2) 你好#2很高兴认识你#3. The first feature is the basic pronunciation feature corresponding to the text characters, and the second feature is the basic pause values between every two text characters in the text data. The basic fundamental frequency parameters in the basic acoustic features matching the text data are ... 258 263 275 .... Correspondingly, suppose the mapping to the target prosodic feature adjustment parameter corresponding to the machine-voice effect is that the target tone adjustment value is uniformly the first tone and the target pause value is uniformly level 2, and the mapping to the target acoustic feature adjustment parameter is that the target fundamental frequency parameter is a fixed value of 260 while the spectral parameters remain unchanged. Then the target prosodic features of the final special-effect synthesized voice are: 1) ni1 hao1 hen1 gao1 xing1 ren1 shi1 ni1; 2) 你好#2很高兴认识你#2. The target fundamental frequency parameters included in the target acoustic features are ... 260 260 260 ..., and the target spectral parameters included in the target acoustic features are consistent with the basic spectral parameters and need no adjustment. Finally, the vocoder may combine and synthesize the target prosodic features and target acoustic features corresponding to the machine voice, thereby generating the final machine-voice special-effect synthesized voice.
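The machine-voice example can be coded up as a self-contained sketch. The data layout follows the example text (pinyin syllables ending in a tone digit, #-prefixed pause levels, an F0 sequence); everything else is an assumption.

```python
# The machine-voice example as code: tones -> first tone, pauses -> level 2,
# F0 fixed at 260. Data layout follows the example; names are hypothetical.
syllables = ["ni3", "hao3", "hen3", "gao1", "xing4", "ren4", "shi1", "ni3"]
pause_levels = {"#2": 2, "#3": 3}   # pause markers from 你好#2很高兴认识你#3
base_f0 = [258, 263, 275]           # excerpt of the base F0 parameters

# Target prosodic features: unify every tone digit to 1, every pause to level 2.
target_syllables = [s[:-1] + "1" for s in syllables]
target_pauses = {marker: 2 for marker in pause_levels}

# Target acoustic features: fix F0 at 260; spectrum is left unchanged.
target_f0 = [260 for _ in base_f0]

print(target_syllables[:3])  # ['ni1', 'hao1', 'hen1']
print(target_f0)             # [260, 260, 260]
```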
It should be noted that Fig. 2 is only a schematic diagram of one implementation. There is no ordering relation between S231a-S233a and S231b-S233b: S231a-S233a may be performed first and then S231b-S233b, or the reverse; the two may also be performed in parallel, or only one of them may be performed.
With the above technical solution, the feature adjustment parameter corresponding to the required special effect is determined according to the pre-established mapping relation between special effects and their corresponding feature adjustment parameters, and the basic prosodic features and/or basic acoustic features are adjusted according to the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; finally, special-effect synthesized voice corresponding to the original voice data is generated from the obtained target prosodic features and/or target acoustic features. This enables multiple special effects to be adjusted and synthesized on demand and solves the problem that the synthesized voice of existing speech synthesis systems is monotonous, thereby satisfying the diversity requirements of special-effect voice.
It should be noted that any combination of the technical features in the above embodiments also falls within the protection scope of the present invention.
Embodiment 3
Fig. 3 is a schematic diagram of a voice special effect synthesis apparatus provided by Embodiment 3 of the invention. As shown in Fig. 3, the apparatus includes: a data acquisition module 310, a parameter acquisition module 320, a parameter adjustment module 330, and a voice synthesis module 340, in which:
the data acquisition module 310 is configured to acquire the text data corresponding to original voice data and to acquire basic prosodic features and basic acoustic features matching the text data;
the parameter acquisition module 320 is configured to acquire, according to a pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to the required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;
the parameter adjustment module 330 is configured to adjust the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and
the voice synthesis module 340 is configured to generate special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and/or target acoustic features.
The embodiment of the invention obtains the text data corresponding to original voice data together with the basic prosodic features and basic acoustic features matching the text data; obtains, according to the pre-established mapping relation between at least two special effects and their corresponding feature adjustment parameters, the feature adjustment parameter corresponding to the required special effect; adjusts the basic prosodic features and/or basic acoustic features with the feature adjustment parameter to obtain target prosodic features and/or target acoustic features; and finally generates special-effect synthesized voice corresponding to the original voice data from the obtained target prosodic features and/or target acoustic features. Multiple special effects can thus be adjusted and synthesized on demand, solving the problem that the synthesized voice of existing speech synthesis systems is monotonous and thereby satisfying the diversity requirements of special-effect voice.
Optionally, the basic prosodic features include: basic pronunciation features corresponding to the text characters in the text data, and basic pause values between every two adjacent text characters in the text data; the basic pronunciation features include pinyin elements corresponding to the text characters and tone values corresponding to the pinyin elements. The target prosodic feature adjustment parameter includes: a target tone adjustment value and a target pause value.
Optionally, the parameter adjustment module 330 is specifically configured to: update, using the target tone adjustment value, the tone value corresponding to each pinyin element in the basic prosodic features; update, using the target pause value, the basic pause values between every two adjacent text characters in the text data of the basic prosodic features; and take the updated result as the target prosodic features.
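A minimal sketch of this prosody-adjustment step, assuming each syllable is a (pinyin, tone) pair with Mandarin tone values 0–4 (0 for the neutral tone) and each inter-character pause is a duration in seconds; the field layout and the clamping of tones are assumptions, not details from the patent:

```python
def adjust_prosody(syllables, pauses, tone_shift, target_pause):
    """Update tone values with the target tone adjustment value, and replace
    the basic inter-character pause values with the target pause value."""
    adjusted = [
        (pinyin, max(0, min(4, tone + tone_shift)))  # clamp to tone range 0-4
        for pinyin, tone in syllables
    ]
    # Every inter-character pause is replaced by the target pause value.
    return adjusted, [target_pause] * len(pauses)
```

For example, `adjust_prosody([("ni", 3), ("hao", 3)], [0.05], tone_shift=1, target_pause=0.2)` raises both tones to 4 and sets the single pause to 0.2 s.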
Optionally, the basic acoustic features include: basic fundamental frequency parameters and basic spectral parameters corresponding to the original voice data; the target acoustic feature adjustment parameter includes: target fundamental frequency parameters and target spectral parameters.

Optionally, the parameter adjustment module 330 is specifically configured to: update the basic fundamental frequency parameters in the basic acoustic features using the target fundamental frequency parameters; update the basic spectral parameters in the basic acoustic features using the target spectral parameters; and take the updated result as the target acoustic features.
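A hedged sketch of this acoustic-adjustment step, assuming the fundamental frequency (F0) is a per-frame track in Hz with 0 marking unvoiced frames, and the spectral parameters are per-frame coefficient lists; the multiplicative update is an illustrative choice, not dictated by the patent:

```python
def adjust_acoustics(base_f0, base_spectrum, f0_scale=1.0, spec_gain=1.0):
    """Derive target F0 and spectral parameters from the basic acoustic
    features by applying the target adjustment factors."""
    # Scale voiced frames only; unvoiced frames stay at 0 Hz.
    target_f0 = [f * f0_scale if f > 0 else 0.0 for f in base_f0]
    target_spectrum = [[c * spec_gain for c in frame] for frame in base_spectrum]
    return target_f0, target_spectrum
```

Halving `f0_scale`, for instance, lowers the perceived pitch by roughly an octave while leaving the unvoiced frames untouched.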
Optionally, the voice synthesis module 340 is specifically configured to generate the special-effect synthesized voice corresponding to the original voice data according to one of: the target prosodic features and the basic acoustic features; the basic prosodic features and the target acoustic features; or the target prosodic features and the target acoustic features.
Optionally, the voice synthesis module 340 is specifically configured to use a vocoder to generate, according to the target prosodic features and/or the target acoustic features, the special-effect synthesized voice corresponding to the original voice data.
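The three synthesis variants above amount to choosing which feature pair is handed to the vocoder; a minimal dispatch sketch (the mode names are hypothetical labels, not terms from the patent):

```python
def select_vocoder_inputs(mode, basic_prosody, basic_acoustics,
                          target_prosody, target_acoustics):
    """Pick the prosodic/acoustic feature pair fed to the vocoder for one of
    the three synthesis variants described above."""
    if mode == "prosody_only":    # target prosody + basic acoustics
        return target_prosody, basic_acoustics
    if mode == "acoustics_only":  # basic prosody + target acoustics
        return basic_prosody, target_acoustics
    if mode == "both":            # target prosody + target acoustics
        return target_prosody, target_acoustics
    raise ValueError(f"unknown synthesis mode {mode!r}")
```

Keeping the selection separate from the vocoder call lets the same vocoder back end serve all three variants.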
The voice special effect synthesis apparatus described above can perform the voice special effect synthesis method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method. For technical details not described in this embodiment, reference may be made to the voice special effect synthesis method provided by any embodiment of the present invention.

Since the voice special effect synthesis apparatus introduced in this embodiment is an apparatus capable of executing the voice special effect synthesis method in the embodiments of the present invention, those skilled in the art can, based on the method described in the embodiments of the present invention, understand the specific implementations and variations of the apparatus of this embodiment; therefore, how the apparatus implements the method is not described in detail here. Any apparatus used by those skilled in the art to implement the voice special effect synthesis method in the embodiments of the present invention falls within the scope of protection of the present application.
Embodiment Four
Fig. 4 is a structural schematic diagram of an electronic device provided by Embodiment Four of the present invention. As shown in Fig. 4, the electronic device includes: at least one processor 41; at least one memory 42 connected to the processor 41; and a bus 43; wherein,

the processor 41 and the memory 42 complete mutual communication through the bus 43;

the processor 41 is configured to call program instructions in the memory 42 to execute the steps in the above voice special effect synthesis method embodiments. For example, the processor 41 executes: obtaining text data corresponding to original voice data, and obtaining basic prosodic features and basic acoustic features matching the text data; obtaining, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter including: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter; adjusting the basic prosodic features and/or the basic acoustic features using the feature adjustment parameter, to obtain target prosodic features and/or target acoustic features; and generating, according to the target prosodic features and/or the target acoustic features, special-effect synthesized voice corresponding to the original voice data.
Embodiment Five

This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the voice special effect synthesis method provided by any of the above method embodiments: obtaining text data corresponding to original voice data, and obtaining basic prosodic features and basic acoustic features matching the text data; obtaining, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter including: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter; adjusting the basic prosodic features and/or the basic acoustic features using the feature adjustment parameter, to obtain target prosodic features and/or target acoustic features; and generating, according to the target prosodic features and/or the target acoustic features, special-effect synthesized voice corresponding to the original voice data.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM (Compact Disc Read-Only Memory), optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may include non-volatile memory in a computer-readable medium, random access memory (RAM), and/or other forms such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media, and can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The above are only embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (10)

1. A voice special effect synthesis method, characterized by comprising:

obtaining text data corresponding to original voice data, and obtaining basic prosodic features and basic acoustic features matching the text data;

obtaining, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;

adjusting the basic prosodic features and/or the basic acoustic features using the feature adjustment parameter, to obtain target prosodic features and/or target acoustic features;

generating, according to the target prosodic features and/or the target acoustic features, special-effect synthesized voice corresponding to the original voice data.
2. The method according to claim 1, characterized in that:

the basic prosodic features comprise: basic pronunciation features corresponding to the text characters in the text data, and basic pause values between every two adjacent text characters in the text data, the basic pronunciation features comprising pinyin elements corresponding to the text characters and tone values corresponding to the pinyin elements;

the target prosodic feature adjustment parameter comprises: a target tone adjustment value and a target pause value.
3. The method according to claim 2, characterized in that adjusting the basic prosodic features using the target prosodic feature adjustment parameter to obtain the target prosodic features comprises:

updating, using the target tone adjustment value, the tone value corresponding to each pinyin element in the basic prosodic features;

updating, using the target pause value, the basic pause values between every two adjacent text characters in the text data of the basic prosodic features;

taking the updated result as the target prosodic features.
4. The method according to claim 1, characterized in that:

the basic acoustic features comprise: basic fundamental frequency parameters and basic spectral parameters corresponding to the original voice data;

the target acoustic feature adjustment parameter comprises: target fundamental frequency parameters and target spectral parameters.
5. The method according to claim 4, characterized in that adjusting the basic acoustic features using the target acoustic feature adjustment parameter to obtain the target acoustic features comprises:

updating the basic fundamental frequency parameters in the basic acoustic features using the target fundamental frequency parameters;

updating the basic spectral parameters in the basic acoustic features using the target spectral parameters;

taking the updated result as the target acoustic features.
6. The method according to any one of claims 1 to 5, characterized in that generating, according to the target prosodic features and/or the target acoustic features, the special-effect synthesized voice corresponding to the original voice data comprises one of the following:

generating the special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and the basic acoustic features;

generating the special-effect synthesized voice corresponding to the original voice data according to the basic prosodic features and the target acoustic features;

generating the special-effect synthesized voice corresponding to the original voice data according to the target prosodic features and the target acoustic features.
7. The method according to any one of claims 1 to 5, characterized in that generating, according to the target prosodic features and/or the target acoustic features, the special-effect synthesized voice corresponding to the original voice data comprises:

using a vocoder to generate, according to the target prosodic features and/or the target acoustic features, the special-effect synthesized voice corresponding to the original voice data.
8. A voice special effect synthesis apparatus, characterized by comprising:

a data acquisition module, configured to obtain text data corresponding to original voice data, and to obtain basic prosodic features and basic acoustic features matching the text data;

a parameter acquisition module, configured to obtain, according to pre-established mapping relations between at least two special effects and corresponding feature adjustment parameters, the feature adjustment parameter corresponding to a required special effect, the feature adjustment parameter comprising: a target prosodic feature adjustment parameter and/or a target acoustic feature adjustment parameter;

a parameter adjustment module, configured to adjust the basic prosodic features and/or the basic acoustic features using the feature adjustment parameter, to obtain target prosodic features and/or target acoustic features;

a voice synthesis module, configured to generate, according to the target prosodic features and/or the target acoustic features, special-effect synthesized voice corresponding to the original voice data.
9. An electronic device, characterized by comprising:

at least one processor;

at least one memory connected to the processor; and a bus; wherein,

the processor and the memory complete mutual communication through the bus;

the processor is configured to call program instructions in the memory to execute the voice special effect synthesis method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the voice special effect synthesis method according to any one of claims 1 to 7.
CN201811413566.1A 2018-11-23 2018-11-23 Voice special effect synthesis method and device, electronic equipment and storage medium Active CN109285536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811413566.1A CN109285536B (en) 2018-11-23 2018-11-23 Voice special effect synthesis method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811413566.1A CN109285536B (en) 2018-11-23 2018-11-23 Voice special effect synthesis method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109285536A true CN109285536A (en) 2019-01-29
CN109285536B CN109285536B (en) 2022-05-13

Family

ID=65172650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811413566.1A Active CN109285536B (en) 2018-11-23 2018-11-23 Voice special effect synthesis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109285536B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050267758A1 (en) * 2004-05-31 2005-12-01 International Business Machines Corporation Converting text-to-speech and adjusting corpus
CN103035251A (en) * 2011-09-30 2013-04-10 西门子公司 Method for building voice transformation model and method and system for voice transformation
CN104916284A (en) * 2015-06-10 2015-09-16 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN105185372A (en) * 2015-10-20 2015-12-23 百度在线网络技术(北京)有限公司 Training method for multiple personalized acoustic models, and voice synthesis method and voice synthesis device
CN108364631A (en) * 2017-01-26 2018-08-03 北京搜狗科技发展有限公司 A kind of phoneme synthesizing method and device


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108140393A (en) * 2016-09-28 2018-06-08 华为技术有限公司 A kind of methods, devices and systems for handling multi-channel audio signal
CN112672259A (en) * 2019-10-16 2021-04-16 北京地平线机器人技术研发有限公司 Loudspeaker control method and device
CN112672259B (en) * 2019-10-16 2023-03-10 北京地平线机器人技术研发有限公司 Loudspeaker control method and device

Also Published As

Publication number Publication date
CN109285536B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
US10410621B2 (en) Training method for multiple personalized acoustic models, and voice synthesis method and device
CN105845125B (en) Phoneme synthesizing method and speech synthetic device
CN105244020B (en) Prosodic hierarchy model training method, text-to-speech method and text-to-speech device
CN106688034B (en) Text-to-speech conversion with emotional content
JP5293460B2 (en) Database generating apparatus for singing synthesis and pitch curve generating apparatus
JP5471858B2 (en) Database generating apparatus for singing synthesis and pitch curve generating apparatus
CN109285537A (en) Acoustic model foundation, phoneme synthesizing method, device, equipment and storage medium
CN104916284A (en) Prosody and acoustics joint modeling method and device for voice synthesis system
WO2022121176A1 (en) Speech synthesis method and apparatus, electronic device, and readable storage medium
US8380508B2 (en) Local and remote feedback loop for speech synthesis
CN108492818B (en) Text-to-speech conversion method and device and computer equipment
CN110176237A (en) A kind of audio recognition method and device
US20180232363A1 (en) System and method for audio dubbing and translation of a video
CN108766413A (en) Phoneme synthesizing method and system
JP6680933B2 (en) Acoustic model learning device, speech synthesis device, acoustic model learning method, speech synthesis method, program
CN110599998A (en) Voice data generation method and device
US20210366461A1 (en) Generating speech signals using both neural network-based vocoding and generative adversarial training
CN113724683B (en) Audio generation method, computer device and computer readable storage medium
CN109285536A (en) Voice special effect synthesis method and device, electronic equipment and storage medium
CN111785248A (en) Text information processing method and device
US9159329B1 (en) Statistical post-filtering for hidden Markov modeling (HMM)-based speech synthesis
US8781835B2 (en) Methods and apparatuses for facilitating speech synthesis
JP6082657B2 (en) Pose assignment model selection device, pose assignment device, method and program thereof
CN111048065B (en) Text error correction data generation method and related device
JP2013164609A (en) Singing synthesizing database generation device, and pitch curve generation device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220414

Address after: 210034 floor 8, building D11, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province

Applicant after: New Technology Co.,Ltd.

Address before: 100080 Room 501, 5th floor, NO.67, North Fourth Ring Road West, Haidian District, Beijing

Applicant before: Beijing Yufanzhi Information Technology Co.,Ltd.

GR01 Patent grant