CN102881283A

CN102881283A - Method and system for processing voice

Info

Publication number: CN102881283A
Application number: CN2011102043953A
Authority: CN
Inventors: 陈晓晓; 李远友; 向春
Original assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Current assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Priority date: 2011-07-13
Filing date: 2011-07-13
Publication date: 2013-01-16
Anticipated expiration: 2031-07-13
Also published as: CN102881283B

Abstract

The invention provides a method and a system for processing a voice. The system comprises a voice characteristic parameter acquisition module for acquiring voice characteristic parameters for representing voice characteristics of a first voice and a second voice, a voice template generation module for generating the voice characteristic parameter of the first voice into a voice template, and a voice processing module for adjusting the voice characteristic parameter of the second voice according to the voice template and applying the adjusted voice characteristic parameter to the second voice.

Description

Method and system for voice processing

Technical field

The present invention relates to a method and system for voice processing and, more particularly, to a method and system that process voice by using a voice template.

Background art

In recent years, with the rapid development of voice processing technology, the understanding of voice has deepened and a variety of voice applications have appeared, for example speech recognition, recording, and parroting (repeating back what a speaker says). Because these applications start from different goals, they differ from one another and can meet the different needs of different groups of users.

Although many voice-changing applications and methods have appeared in the prior art, most of them can only process voice according to predetermined patterns and have difficulty processing and changing diverse, variable voices effectively, so the user cannot process voice flexibly according to actual needs. As digital devices become more widespread and user requirements keep changing, existing voice-changing applications can no longer satisfy current and future needs. A method and system are therefore needed that can process voice according to the user's needs so as to change voice flexibly.

Summary of the invention

An object of the present invention is to provide a method and system that generate a voice template according to the user's needs and process voice by using the voice template, so that the user can process voice more flexibly and effectively, wherein the voice template can be generated by extracting voice characteristic parameters of a voice signal.

According to one aspect of the present invention, a voice processing system is provided, the system comprising: a voice characteristic parameter acquisition module for acquiring voice characteristic parameters that represent voice characteristics of a first voice and a second voice; a voice template generation module for generating a voice template from the voice characteristic parameters of the first voice; and a voice processing module for adjusting the voice characteristic parameters of the second voice according to the voice template and applying the adjusted voice characteristic parameters to the second voice.

The system may further comprise a voice acquisition module for acquiring the first voice and/or the second voice.

The system may further comprise a storage module for storing the voice template.

The voice characteristics may include at least one of the volume, pitch, and timbre characteristics of a voice.

The voice acquisition module may select the first voice and/or the second voice from pre-stored voices.

The voice acquisition module may record the first voice and/or the second voice by using a recording device.

The voice characteristic parameters may include at least one of the following: a volume parameter representing the volume characteristic; frequency and amplitude parameters of the fundamental tone, representing the pitch characteristic; and frequency and amplitude parameters of a predetermined number of overtones, representing the timbre characteristic.

The voice characteristic parameter acquisition module may directly set each voice characteristic parameter of the first voice needed to form a voice template, so that the voice template generation module generates a voice template from the parameters thus set.

The voice processing module may adjust the voice characteristic parameters contained in a voice template selected from the stored voice templates, and the voice template generation module may generate, from the adjusted parameters, another voice template different from the selected one.

According to another aspect of the present invention, a voice processing method is provided, the method comprising: acquiring voice characteristic parameters that represent voice characteristics of a first voice and a second voice; generating a voice template from the voice characteristic parameters of the first voice; and adjusting the voice characteristic parameters of the second voice according to the voice template and applying the adjusted voice characteristic parameters to the second voice.

The method may further comprise acquiring the first voice and/or the second voice.

The method may further comprise storing the voice template.

The first voice and/or the second voice may be selected from pre-stored voices.

The first voice and/or the second voice may be recorded by using a recording device.

Each voice characteristic parameter of the first voice needed to form a voice template may be set directly, so that a voice template is generated from the parameters thus set.

The voice characteristic parameters contained in a voice template selected from the stored voice templates may be adjusted, and another voice template different from the selected one may be generated from the adjusted parameters.

By using the voice processing method and system of the present invention, voice can be processed more flexibly according to the user's needs, so that the results of voice processing are more realistic and varied, which enriches the user's entertainment.

Additional aspects and/or advantages of the present invention will be set forth in part in the description that follows and, in part, will be apparent from the description or may be learned by practice of the invention.

Description of drawings

The above and/or other objects, features, and advantages of the present invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a block diagram illustrating a voice processing system according to an exemplary embodiment of the present invention;

Fig. 2 is a flowchart illustrating a voice processing method according to an exemplary embodiment of the present invention;

Fig. 3 is a flowchart illustrating a voice template generation method according to another exemplary embodiment of the present invention;

Fig. 4 is a flowchart illustrating a voice template generation method according to yet another exemplary embodiment of the present invention.

Detailed description of embodiments

Exemplary embodiments of the present invention, which are illustrated in the accompanying drawings, are described more fully below with reference to the drawings. The exemplary embodiments may, however, be implemented in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the exemplary embodiments to those skilled in the art. In the drawings, like reference numerals denote like parts.

Fig. 1 is a block diagram illustrating a voice processing system 100 according to an exemplary embodiment of the present invention. Referring to Fig. 1, the voice processing system 100 includes a voice characteristic parameter acquisition module 120, a voice template generation module 130, and a voice processing module 140.

Referring to Fig. 1, the voice characteristic parameter acquisition module 120 acquires voice characteristic parameters that represent at least one voice characteristic (for example, the volume, pitch, and timbre of a voice). By way of example only, the following describes how to acquire, from a voice audio signal in PCM stream format (hereinafter, "PCM audio signal"), voice characteristic parameters representing at least one of the volume, pitch, and timbre characteristics.

Volume is the listener's subjective perception of how loud or soft a sound is; its objective measure is the amplitude of the sound. The volume of a voice can therefore be represented by the amplitude of the PCM audio signal.
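As a minimal sketch of the amplitude-based volume measure described above, assuming 16-bit signed PCM samples that have already been decoded into a list of integers. The function names and the choice of RMS and peak measures are illustrative assumptions and are not part of the original disclosure.

```python
import math
from typing import Sequence


def rms_volume(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Root-mean-square amplitude of PCM samples, normalised to [0, 1]."""
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / full_scale


def peak_volume(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Largest absolute sample value, normalised to [0, 1]."""
    return max((abs(s) for s in samples), default=0) / full_scale
```

RMS tracks perceived loudness more closely than the peak value, while the peak value is useful for detecting clipping; either could serve as the volume parameter V.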

A tone generally refers to a signal with a definite, stable pitch. Pitch is how high or low a sound is perceived to be, and it depends mainly on frequency: the ear perceives a high-frequency sound as high-pitched and a low-frequency sound as low-pitched. Pitch is determined mainly by the fundamental frequency of the sound, so the pitch parameters can be obtained by extracting the frequency (that is, the fundamental frequency) and the amplitude of the fundamental tone of the PCM audio signal.
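The original text does not say how the fundamental frequency is extracted. One common approach, shown here purely as an illustrative sketch, is to look for the strongest autocorrelation peak within a plausible range of voice pitches. The function name, the default search range of 50 Hz to 500 Hz, and the use of autocorrelation are all assumptions.

```python
import math
from typing import Sequence


def estimate_fundamental(samples: Sequence[float], sample_rate: int,
                         f_min: float = 50.0, f_max: float = 500.0) -> float:
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    n = len(samples)
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(n - 1, int(sample_rate / f_min))
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag == 0:
        return 0.0  # no periodicity found in the search range
    return sample_rate / best_lag
```

The lag with the highest score corresponds to one period of the signal, so the sample rate divided by that lag gives the fundamental frequency. The amplitude of the fundamental has to be measured separately, for example from the spectrum at that frequency.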

Timbre is the characteristic quality of a sound. Every person's voice has a different timbre, which is why different people can be told apart by their voices. Differences in timbre come from differences in overtones: the sound produced by any musical instrument, person, or other sounding body contains, besides a fundamental tone, many accompanying overtones of different frequencies, and it is the particular combination of the frequencies and amplitudes of these overtones that determines the timbre. The timbre parameters can therefore be obtained by extracting the frequencies and amplitudes of a predetermined number of overtones of the audio signal.
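The following sketch shows one way the overtone frequencies and amplitudes could be measured once the fundamental frequency is known. It assumes the overtones sit at integer multiples of the fundamental and evaluates a single-bin discrete Fourier transform at each of them; the function names and this particular method are assumptions made for illustration.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def amplitude_at(samples: Sequence[float], sample_rate: int, freq: float) -> float:
    """Amplitude of the sinusoidal component at `freq`, via a single-bin DFT."""
    n = len(samples)
    if n == 0:
        return 0.0
    step = -2j * math.pi * freq / sample_rate
    total = sum(x * cmath.exp(step * i) for i, x in enumerate(samples))
    return 2.0 * abs(total) / n


def extract_overtones(samples: Sequence[float], sample_rate: int, f0: float,
                      count: int = 16) -> List[Tuple[float, float]]:
    """Return (frequency, amplitude) for up to `count` overtones above f0."""
    overtones = []
    for k in range(2, count + 2):
        freq = k * f0
        if freq >= sample_rate / 2:  # stop at the Nyquist limit
            break
        overtones.append((freq, amplitude_at(samples, sample_rate, freq)))
    return overtones
```

The same `amplitude_at` helper can be called with the fundamental frequency itself to obtain the amplitude of the fundamental tone. The measurement is most accurate when the analysed frame contains a whole number of periods.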

Commonly used voice characteristics can also be summarized as numerical parameters such as pitch, formants, linear prediction cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients (MFCC). Using existing mainstream techniques, such as LPCC-based feature extraction, MFCC-based feature extraction, and the short-time Fourier transform (the classical method for processing stationary signals), at least one of the following parameters representing voice characteristics can be obtained: a volume parameter representing the volume characteristic; frequency and amplitude parameters of the fundamental tone, representing the pitch characteristic; and frequency and amplitude parameters of a predetermined number of overtones, representing the timbre characteristic.

After extraction is complete, the voice characteristic parameter acquisition module 120 may send the acquired voice characteristic parameters to the voice template generation module 130 to generate a voice template, or may retain the parameters so that they can be used later to process the voice.

The voice template generation module 130 generates a voice template from the voice characteristic parameters obtained from the voice characteristic parameter acquisition module 120 and stores the generated voice template in the storage module 150. Here, a voice template is the set of voice characteristic parameters that represent the various voice characteristics of a particular voice; the voice characteristics may include, but are not limited to, at least one of volume, timbre, and pitch. Alternatively, according to another embodiment, the voice template generation module 130 may input the generated voice template directly to the voice processing module 140 to process voice.
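To make the notion of a voice template concrete, the sketch below models it as an immutable record holding the volume, the fundamental tone, and the overtones, together with a simple in-memory stand-in for the storage module 150. The class names, field names, and the use of a template name as the lookup key are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Partial = Tuple[float, float]  # (frequency in Hz, amplitude)


@dataclass(frozen=True)
class VoiceTemplate:
    """Set of voice characteristic parameters describing one voice."""
    name: str
    volume: float
    fundamental: Partial
    overtones: Tuple[Partial, ...] = ()


class TemplateStore:
    """In-memory stand-in for the storage module."""

    def __init__(self) -> None:
        self._templates: Dict[str, VoiceTemplate] = {}

    def save(self, template: VoiceTemplate) -> None:
        self._templates[template.name] = template

    def load(self, name: str) -> VoiceTemplate:
        return self._templates[name]
```

Keeping the template immutable means a stored template cannot be changed by accident while it is being applied to a voice; a modified template is produced as a new object.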

In addition, the voice characteristic parameter acquisition module 120 may directly set each voice characteristic parameter needed to form a voice template and pass the parameters thus set to the voice template generation module 130 to generate a user-defined voice template. Specifically, in an embodiment of the present invention, the user may directly set the volume, the frequency and amplitude of the fundamental tone, and the frequencies and amplitudes of a predetermined number of overtones, and the parameters thus set are sent to the voice template generation module 130 to generate the voice template the user wants.

A new voice template may also be generated by modifying the parameters of an existing voice template. In this way, the variety of voice templates that can be generated and used is easily increased, which enables richer voice processing effects.

The voice processing module 140 processes the voice to be processed. A voice template desired by the user is selected according to the user's needs and output to the voice processing module 140, so that the voice is processed according to the selected template. In detail, the voice processing module 140 adjusts the voice characteristic parameters that the voice characteristic parameter acquisition module 120 extracted from the voice to be processed, according to the voice characteristic parameters recorded in the selected voice template, and applies the adjusted parameters to the voice to be processed, so that the voice acquires the voice characteristics the user wants.

For example, the fundamental frequency and amplitude of the voice to be processed may be adjusted to match the frequency and amplitude of the fundamental tone recorded in the selected voice template; the frequency and amplitude of each overtone of the voice to be processed may be adjusted to match those of the corresponding overtone recorded in the template; and the volume of the voice to be processed may be adjusted to match the volume recorded in the template. The volume, pitch, and timbre of the resulting voice then match those represented by the voice template, which achieves the effect of imitating the voice that the template represents.

By way of example only, suppose the parameters of the voice to be processed, as obtained by the voice characteristic parameter acquisition module 120, are as follows: the frequency and amplitude parameters of the fundamental tone are $(f_0, C_0)$; 16 sets of overtone parameters have been extracted, the frequency and amplitude parameters of the overtones being $(f_1, C_1), (f_2, C_2), \ldots, (f_{16}, C_{16})$; and the volume is $V$, where $f_0, f_1, \ldots, f_{16}$ are frequency parameters and $C_0, C_1, \ldots, C_{16}$ are amplitude parameters. The user processes the voice with voice template 1, whose voice characteristic parameters are: frequency and amplitude of the fundamental tone $(f_{R0}, C_{R0})$; frequency and amplitude parameters of the overtones $(f_{R1}, C_{R1}), (f_{R2}, C_{R2}), \ldots, (f_{R16}, C_{R16})$; and volume $V_R$, where $f_{R0}, f_{R1}, \ldots, f_{R16}$ are frequency parameters and $C_{R0}, C_{R1}, \ldots, C_{R16}$ are amplitude parameters. To make the processed voice the same as or similar to the voice characterized by the parameters in the template, the fundamental-tone, overtone, and volume parameters of the voice to be processed are adjusted to those recorded in the template. Specifically, the parameter values are set to $f_0 = f_{R0}, f_1 = f_{R1}, f_2 = f_{R2}, \ldots, f_{16} = f_{R16}$ and $C_0 = C_{R0}, C_1 = C_{R1}, C_2 = C_{R2}, \ldots, C_{16} = C_{R16}$, and the volume is adjusted so that $V = V_R$.
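The parameter adjustment in this example can be sketched as follows. With `strength=1.0` the function reproduces the full replacement described above, in which every frequency, amplitude, and the volume take the template's values. The `strength` argument is a hypothetical extension that gives a partial change; it and the class and function names are assumptions, not part of the original disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Partial = Tuple[float, float]  # (frequency in Hz, amplitude)


@dataclass
class VoiceParams:
    volume: float
    partials: List[Partial]  # index 0 is the fundamental, 1..N the overtones


def adjust_to_template(voice: VoiceParams, template: VoiceParams,
                       strength: float = 1.0) -> VoiceParams:
    """Move each parameter of `voice` toward the template value.

    strength=1.0 sets every parameter to the template value;
    smaller values (a hypothetical extension) give a partial change.
    """
    if len(voice.partials) != len(template.partials):
        raise ValueError("voice and template must have the same number of partials")

    def mix(a: float, b: float) -> float:
        return a + strength * (b - a)

    partials = [(mix(f, fr), mix(c, cr))
                for (f, c), (fr, cr) in zip(voice.partials, template.partials)]
    return VoiceParams(mix(voice.volume, template.volume), partials)
```

The function returns a new parameter set and leaves both inputs untouched, so the same extracted parameters can be tried against several templates.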

After the voice characteristic parameters of the voice to be processed have been adjusted, the adjusted parameters are applied to the voice, which completes the voice change. Specifically, in this embodiment, the adjusted parameters are applied to the voice to be processed by performing the inverse of the operation that was used to extract the voice characteristic parameters.
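The original text does not specify the inverse operation. As one possible illustration, if the parameters are treated as a list of sinusoidal components, a waveform can be rebuilt by additive synthesis, that is, by summing one sinusoid per (frequency, amplitude) pair. This sketch ignores phase and time variation, both of which a practical voice changer would need to handle; the function name and parameters are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def synthesize(partials: Sequence[Tuple[float, float]], sample_rate: int,
               num_samples: int, gain: float = 1.0) -> List[float]:
    """Additive synthesis: sum one sinusoid per (frequency, amplitude) pair."""
    out = []
    for i in range(num_samples):
        t = i / sample_rate
        value = sum(amp * math.sin(2.0 * math.pi * freq * t)
                    for freq, amp in partials)
        out.append(gain * value)
    return out
```

Here `gain` plays the role of the adjusted volume, and `partials` holds the adjusted fundamental tone followed by the adjusted overtones.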

It should be understood that the above method is only exemplary. The way a voice template is used to process the voice is not limited to it; the template may be used according to the user's needs or predetermined settings. Alternatively, the user may change the voice without a voice template, by directly adjusting, in the voice processing module 140, the individual voice characteristic parameters extracted from the voice to be processed.

In addition, the voice processing module 140 may adjust and beautify the processed voice so that it sounds more realistic. As an example, only a method of adjusting the timbre of the voice by adjusting the overtone parameters is described here.

Whether it is speech, singing, or the sound of a musical instrument, a sound is not a pure tone but a complex tone, made up of a fundamental tone and a series of overtones. The overtone frequencies are all multiples of the fundamental frequency, and they strongly affect the timbre. Overtones can be divided into low-frequency, mid-frequency, and high-frequency overtones. If the low-frequency overtones are relatively strong, the timbre sounds full and thick; if the mid-frequency overtones are relatively strong, it sounds round, natural, and harmonious; if the high-frequency overtones are relatively strong, it sounds bright, clear, and well resolved.

Differences in the number of overtones and in their amplitudes produce different timbre frequency characteristics, and the resulting curve reflects the expressiveness of the timbre. If the intensities from the fundamental tone to the 16th overtone are plotted as a straight line on a coordinate plane, that line is called the optimal bel canto line. The closer the frequency characteristic of a timbre is to this line, the better balanced its low-, mid-, and high-frequency overtones are, and the more expressive the timbre is.

Frequency processing can be applied to the timbre with a four-band equalizer to improve its artistic expressiveness. The audio range can be divided into four major bands:

HF: 6 kHz to 16 kHz, affects the expressiveness and resolution of the timbre;

MID HF: 600 Hz to 6 kHz, affects the brightness and clarity of the timbre;

MID LF: 200 Hz to 600 Hz, affects the strength and solidity of the timbre;

LF: 20 Hz to 200 Hz, affects the fullness and richness of the timbre.
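The four bands listed above can be expressed as a small lookup that assigns each overtone to a band and totals the energy per band, which is one way to judge whether a band is too weak or too strong. The band edges come from the text; assigning a shared edge to the upper band, measuring energy as squared amplitude, and the function names are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

# Band edges in Hz, taken from the four bands listed above.
BANDS = (
    ("LF", 20.0, 200.0),
    ("MID LF", 200.0, 600.0),
    ("MID HF", 600.0, 6000.0),
    ("HF", 6000.0, 16000.0),
)


def band_of(freq: float) -> Optional[str]:
    """Return the band name for `freq`, or None outside 20 Hz to 16 kHz."""
    for name, low, high in BANDS:
        if low <= freq < high:
            return name
    return "HF" if freq == 16000.0 else None


def band_energy(partials: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Sum squared amplitudes of (frequency, amplitude) partials per band."""
    energy = {name: 0.0 for name, _, _ in BANDS}
    for freq, amp in partials:
        name = band_of(freq)
        if name is not None:
            energy[name] += amp * amp
    return energy
```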

If the high band is too weak, the timbre loses color, charm, and character; if it is too strong, the timbre becomes sharp, hoarse, and piercing. If the mid-high band is too weak, the timbre becomes dull and dim; if it is too strong, the timbre becomes stiff. If the mid-low band is too weak, the timbre becomes hollow, weak, and soft; if it is too strong, the timbre becomes stiff and lifeless. If the low band is too weak, the timbre becomes thin and pale; if it is too strong, the timbre becomes muddy.

For a timbre to be pleasing, its overtones must be rich and well layered. After one band is boosted, the effect on the other bands must also be considered, generally with regard to the clarity and fullness of the voice. For example, a female voice easily produces sibilance (hissing "s" sounds) in the high-frequency range, which can be removed by attenuating 7 kHz to 10 kHz by 3 dB; a male voice lies about an octave below a female voice, that is, at half the frequency, and attenuating by 3 dB around 100 Hz can improve clarity. In this way, the timbre information can be adjusted.
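A 3 dB attenuation corresponds to multiplying the amplitude by $10^{-3/20} \approx 0.708$. The sketch below applies such a gain to every overtone whose frequency falls inside a chosen band, for example 7 kHz to 10 kHz for sibilance. Operating directly on the (frequency, amplitude) pairs, and the function names, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def attenuate_band(partials: Sequence[Tuple[float, float]], low_hz: float,
                   high_hz: float, db: float) -> List[Tuple[float, float]]:
    """Scale the amplitude of every partial whose frequency lies in the band."""
    gain = db_to_gain(db)
    return [(f, a * gain) if low_hz <= f <= high_hz else (f, a)
            for f, a in partials]
```

For instance, `attenuate_band(partials, 7000.0, 10000.0, -3.0)` leaves components outside 7 kHz to 10 kHz unchanged and reduces those inside the band to about 71% of their amplitude.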

Although a method of further adjusting and beautifying the voice by adjusting overtones has been described above, the present invention is not limited to it, and other methods may be used to adjust and beautify the voice.

In addition, according to an exemplary embodiment of the present invention, the voice processing system 100 may further include a voice acquisition module 110 and a storage module 150. As shown in Fig. 1, the voice acquisition module 110 acquires the voice to be processed. In an exemplary embodiment of the present invention, the voice acquisition module 110 can acquire the voice in at least two ways: by recording voice from the outside through a voice capture device (for example, a microphone), or by selecting it directly from pre-stored voices. After acquisition, the voice acquisition module 110 outputs the voice to be processed to the voice characteristic parameter acquisition module 120. The storage module 150 stores the generated voice templates and provides the voice template selected by the user to the voice processing module 140, to support the voice-changing processing of the voice to be processed.

In addition, according to an exemplary embodiment of the present invention, the voice processing system 100 may further include a playback module (not shown) for playing voice.

Fig. 2 is a flowchart illustrating voice processing according to an exemplary embodiment of the present invention. The process of processing voice with the voice processing method of the present invention is described below with reference to Fig. 2.

In step 201, the voice acquisition module 110 records the voice to be processed from the outside by using a recording device, or selects it from pre-stored voices, and then outputs the acquired voice to the voice characteristic parameter acquisition module 120.

In step 203, the voice characteristic parameter acquisition module 120 decodes the voice to be processed into a format suitable for parameter extraction (for example, PCM stream format) and then analyzes the decoded voice to extract the voice characteristic parameters (for example, parameters of at least one of the volume, pitch, and timbre characteristics).

In step 205, it is determined whether the voice characteristic parameters extracted in step 203 are to be generated as a voice template. If so, the process proceeds to step 207; if not, it proceeds to step 209.

In step 207, the voice template generation module 130 generates a corresponding voice template from the received voice characteristic parameters, and the template is saved in the storage module 150. Alternatively, according to another embodiment, the voice template generated by the voice template generation module 130 may be input directly to the voice processing module 140 to process voice.

In step 209, it is determined whether the acquired voice is to be processed. If it is, the process proceeds to step 211.

In step 211, the voice template desired by the user is selected from the storage module 150, and the selected template is input to the voice processing module 140 together with the voice characteristic parameters extracted in step 203.

In step 213, the voice characteristic parameters of the voice to be processed are adjusted according to the voice template selected in step 211. The parameter adjustment process has been described in detail with reference to Fig. 1 and is not repeated here. The adjusted parameters are applied to the voice to be processed to obtain a new voice, which achieves the effect of imitating the voice of the template.

In addition, beautification and adjustment of the voice may also be performed in step 213; for example, the timbre of the voice (that is, the frequency and amplitude parameters of the overtones) may be adjusted so that the changed voice sounds more realistic.
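The decisions in steps 205 to 213 can be summarized in a short control-flow sketch. It assumes the parameters have already been extracted (step 203) into a flat dictionary, uses a plain dictionary as a stand-in for the storage module 150, and treats "adjusting to the template" as overwriting the matching keys. All of these representations and names are hypothetical, and the text leaves open whether processing continues after step 207; the sketch allows both.

```python
from typing import Dict, Optional

Params = Dict[str, float]  # hypothetical flat parameter set, e.g. {"volume": 0.5}


def process_voice(extracted: Params, store: Dict[str, Params],
                  save_as: Optional[str] = None,
                  apply_template: Optional[str] = None) -> Params:
    """Follow steps 205 to 213 for parameters already extracted in step 203."""
    # Steps 205/207: optionally turn the extracted parameters into a template.
    if save_as is not None:
        store[save_as] = dict(extracted)
    # Step 209: if no processing is requested, return the parameters unchanged.
    if apply_template is None:
        return dict(extracted)
    # Step 211: select the template; step 213: adjust parameters to match it.
    template = store[apply_template]
    adjusted = dict(extracted)
    adjusted.update(template)
    return adjusted
```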

Fig. 3 is a flowchart illustrating generation of a voice template according to another exemplary embodiment of the present invention.

As shown in Fig. 3, in step 301, the voice characteristic parameters needed to generate a voice template are set directly in the voice characteristic parameter acquisition module 120. Specifically, and by way of example only, in an exemplary embodiment of the present invention the volume, the amplitude and frequency of the fundamental tone, and the amplitudes and frequencies of the overtones may be set directly.

In step 303, it is determined whether setting of the voice characteristic parameters is complete.

If it is determined in step 303 that setting is complete, then in step 305 the voice template generation module 130 generates a corresponding voice template from the parameters that were set, and in step 307 the generated voice template is saved in the storage module 150. If setting is not yet complete in step 303, parameter setting may continue or, according to another embodiment, the processing shown in Fig. 3 may be ended directly.

Fig. 4 is a flowchart illustrating generation of a voice template according to yet another exemplary embodiment of the present invention.

As shown in Fig. 4, in step 401, a voice template is selected from the storage module 150.

In step 403, the voice processing module 140 modifies the voice characteristic parameters in the selected template.

In step 405, it is determined whether modification of the voice characteristic parameters is complete.

If it is determined in step 405 that modification of the voice characteristic parameters of the selected voice template is complete, then in step 407 the voice template generation module 130 generates a new voice template from the modified parameters, and the new voice template is saved in the storage module 150. If modification is not yet complete in step 405, modification may continue or, according to another embodiment, the processing shown in Fig. 4 may be ended directly.

It should be understood that, after modification of the voice characteristic parameters of the selected voice template is complete, it is also possible not to generate a new voice template but to save the modified parameters directly in the selected voice template, thereby adjusting that template.

Although the present invention has been particularly shown and described with reference to exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the claims.

Claims

1. A voice processing system, comprising:

a voice characteristic parameter acquisition module for acquiring voice characteristic parameters that represent voice characteristics of a first voice and a second voice;

a voice template generation module for generating a voice template from the voice characteristic parameters of the first voice; and

a voice processing module for adjusting the voice characteristic parameters of the second voice according to the voice template and applying the adjusted voice characteristic parameters to the second voice.

2. The system of claim 1, further comprising a voice acquisition module for acquiring the first voice and/or the second voice.

3. The system of claim 1, further comprising a storage module for storing the voice template.

4. The system of claim 1, wherein the voice characteristics include at least one of the volume, pitch, and timbre characteristics of a voice.

5. The system of claim 2, wherein the voice acquisition module selects the first voice and/or the second voice from pre-stored voices.

6. The system of claim 2, wherein the voice acquisition module records the first voice and/or the second voice by using a recording device.

7. The system of claim 4, wherein the voice characteristic parameters include at least one of the following: a volume parameter representing the volume characteristic; frequency and amplitude parameters of the fundamental tone, representing the pitch characteristic; and frequency and amplitude parameters of a predetermined number of overtones, representing the timbre characteristic.

8. The system of claim 1, wherein the voice characteristic parameter acquisition module directly sets each voice characteristic parameter of the first voice needed to form a voice template, so that the voice template generation module generates a voice template from the parameters thus set.

9. The system of claim 3, wherein the voice processing module adjusts the voice characteristic parameters contained in a voice template selected from the stored voice templates, and the voice template generation module generates, from the adjusted parameters, another voice template different from the selected one.

10. A voice processing method, comprising:

acquiring voice characteristic parameters that represent voice characteristics of a first voice and a second voice;

generating a voice template from the voice characteristic parameters of the first voice; and

adjusting the voice characteristic parameters of the second voice according to the voice template and applying the adjusted voice characteristic parameters to the second voice.

11. The method of claim 10, further comprising acquiring the first voice and/or the second voice.

12. The method of claim 10, further comprising storing the voice template.

13. The method of claim 10, wherein the voice characteristics include at least one of the volume, pitch, and timbre characteristics of a voice.

14. The method of claim 11, wherein the first voice and/or the second voice is selected from pre-stored voices.

15. The method of claim 11, wherein the first voice and/or the second voice is recorded by using a recording device.

16. The method of claim 13, wherein the voice characteristic parameters include at least one of the following: a volume parameter representing the volume characteristic; frequency and amplitude parameters of the fundamental tone, representing the pitch characteristic; and frequency and amplitude parameters of a predetermined number of overtones, representing the timbre characteristic.

17. The method of claim 10, wherein each voice characteristic parameter of the first voice needed to form a voice template is set directly, so that a voice template is generated from the parameters thus set.

18. The method of claim 12, wherein the voice characteristic parameters contained in a voice template selected from the stored voice templates are adjusted, and another voice template different from the selected one is generated from the adjusted parameters.