JPS62134694A

JPS62134694A - Rule synthesization system

Info

Publication number: JPS62134694A
Application number: JP60275381A
Authority: JP
Inventors: 敏郎柴沼
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-12-06
Filing date: 1985-12-06
Publication date: 1987-06-17

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] In a pitch pattern generation method for rule-based speech synthesis, the invention exploits the fact that, in general, approximating the tone component of the pitch pattern of speech by a straight line has almost no effect on the quality of the synthesized speech, except in the rising part of that component. The tone component is therefore approximated by a straight line, and when the accent components are superimposed on the linearly approximated tone component, the error in the tone component caused by the linear approximation is used, either as is or in approximated form, to correct only the first accent component before superimposition, thereby obtaining the pitch pattern.

[Field of Industrial Application]

The present invention relates to a method of generating the pitch pattern used when synthesizing speech by rule-based synthesis.

With the recent remarkable progress in computer technology, it has become common to input and output necessary information in the form of speech under the control of a computer system.

In particular, when information generated within the computer system (for example, the contents of a database) is output as speech, synthesized speech as close as possible to natural speech is required while real-time operation is maintained.

There are several speech synthesis methods. Among them, rule-based synthesis is regarded as promising in view of the storage capacity and processing load of the computer system and of requirements for large capacity.

Rule-based speech synthesis synthesizes arbitrary speech using so-called "speech synthesis rules," and in general requires rules that can generate synthesized speech with both good intelligibility and good naturalness.

One factor that determines the quality of the synthesized speech is the set of rules governing accent and intonation, and such prosodic rules are known to be important in giving the synthesized speech naturalness.

Normally, when a person speaks, several words are grouped together, and the sentence is uttered divided into a number of utterance units, namely accent phrases.

In this case, there is a basic intonation pattern that does not depend on the kinds of accent phrases making up the sentence, and the accent component of each accent phrase can be regarded as superimposed on it to determine the tone of the sentence as a whole.

Furthermore, each accent phrase is uttered with an appropriate stress (also called prominence) value.

From these considerations, it is generally said that the tone of speech, that is, the pitch pattern, can be expressed by a model that superimposes the tone component, which is the basic tone of the sentence, and the accent component specific to each accent phrase.

Accordingly, because the way the tone component and the accent components are supplied determines how close to natural the synthesized speech can be, an effective method of extracting the tone component and the accent components is awaited.

[Prior Art and Problems to Be Solved by the Invention]

FIG. 3 is a diagram illustrating conventional pitch pattern generation methods.

In general, the tone component of the pitch pattern of speech is held to be the response of a critically damped second-order linear system to an impulse-like tone command.
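As an illustration only: the impulse response of a critically damped second-order linear system with transfer function α²/(s+α)² is α²·t·e^(−αt) for t ≥ 0. The sketch below evaluates that standard response. The function name `tone_response` and the value of `alpha` are assumptions chosen for illustration; they are not taken from this publication or from the cited equation (1).

```python
import math

def tone_response(t, alpha=3.0):
    """Impulse response of a critically damped second-order linear system.

    alpha (natural angular frequency, in 1/s) is an assumed value.
    """
    if t < 0:
        return 0.0  # causal system: no output before the impulse
    return alpha * alpha * t * math.exp(-alpha * t)

# Sample the response every 0.1 s over the first second.
samples = [tone_response(0.1 * i) for i in range(11)]
```

The response rises quickly to its peak at t = 1/alpha and then decays slowly, which is why only the initial rising part departs strongly from a straight line.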

The response equation of this critically damped second-order linear system is given, for example, as equation (1) in Section 3.1, "Model of the F0 pattern of sentence speech," under Section 3, "Analysis of the F0 pattern of sentence speech," of "Acoustic features of the fundamental frequency pattern of Japanese declarative sentences," Acoustical Society of Japan, Speech Research Group material S79-03 (1979).

FIG. 3(a) schematically shows a method of extracting the parameters of the tone component of speech using the response equation of the critically damped second-order linear system described above.

In this method, various parameters corresponding to the desired synthesized speech (for example, voicing component parameters and accent component parameters) are input to the response simulator 1 for the critically damped second-order linear system, and the corresponding pitch pattern is extracted.

However, although this conventional method has the advantage of yielding an accurate pitch pattern, computing the response of the critically damped second-order linear system requires various parameters to be input, and the computation is time-consuming.

FIG. 3(b) schematically shows another conventional pitch pattern generation method. Noting that, in general, approximating the tone component of the pitch pattern of speech by a straight line has almost no effect on the quality of the synthesized speech, except in the nonlinear transient at the rise of that component, this method approximates the tone component by a straight line and superimposes the accent components on that straight-line tone component.

That is, the tone component setting section 2 sets a straight-line tone component from the parameters obtained for the desired synthesized speech (namely, the start frequency and end frequency of the linearly approximated tone component) and inputs it to the accent component superimposing section 4.

Meanwhile, the accent component setting section 3 first generates, from the parameters obtained for the desired synthesized speech (for example, center frequency and stress value), the accent component of the first of the plural accent phrases and sends it to the accent component superimposing section 4.

The accent component superimposing section 4 superimposes this first accent component on the tone component approximated by the straight line.

Thereafter, the remaining accent components are superimposed in sequence to obtain the desired pitch pattern.
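The conventional straight-line method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 3(b): the function names `linear_tone` and `superimpose`, the sample values, and the choice to add contours sample by sample in Hz are all hypothetical, and the publication does not specify the shape of an accent contour.

```python
def linear_tone(n, f_start, f_end):
    """Straight-line tone component sampled at n points (n >= 2)."""
    step = (f_end - f_start) / (n - 1)
    return [f_start + step * i for i in range(n)]

def superimpose(base, accent, onset):
    """Return a copy of base with the accent contour added from index onset."""
    out = list(base)
    for i, a in enumerate(accent):
        if 0 <= onset + i < len(out):
            out[onset + i] += a
    return out

# Start and end frequencies play the role of the tone component parameters.
tone = linear_tone(5, 140.0, 100.0)

# Each accent phrase is (onset index, contour); values are made up.
pitch = tone
for onset, accent in [(0, [10.0, 20.0]), (3, [5.0, 5.0])]:
    pitch = superimpose(pitch, accent, onset)
```

Because the tone component here is only two parameters and a linear interpolation, no system response has to be simulated, which is the source of the reduced processing noted in the text.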

Compared with generating the pitch pattern using the response simulator 1 for the critically damped second-order linear system described with FIG. 3(a), this method has the advantage of simpler processing, but the quality of the synthesized speech deteriorates and natural intonation cannot be obtained.

In view of these drawbacks of the prior art, an object of the present invention is to provide a method that obtains, with a small amount of processing, a pitch pattern suitable for producing synthesized speech whose intonation sounds close to natural speech.

[Means for Solving the Problems]

FIG. 1 is a diagram showing the concept of the pitch pattern generation method of the present invention.

In this figure, (a) shows the pitch pattern generation method using the response simulator for the critically damped second-order linear system described above: the pitch pattern is obtained by superimposing the accent components on the tone component obtained as the response of the critically damped second-order linear system to an impulse. (b) shows the pitch pattern generation method of the present invention.

In the present invention, the tone component is approximated by a straight line ■, and the error information ■ (shown hatched) that this approximation produces in the rising part of the tone component is used, either as is or after suitable approximation of the error component, to correct the accent component in the rising part of the tone component (specifically, the accent component of the first of the plural accent phrases). The corrected accent component is then superimposed on the linearly approximated tone component ■.

[Operation]

That is, the present invention, in a pitch pattern generation method for rule-based synthesis, exploits the fact that, in general, approximating the tone component of the pitch pattern of speech by a straight line has almost no effect on the quality of the synthesized speech, except in the rising part of that component. The tone component is approximated by a straight line, and when the accent components are superimposed on the linearly approximated tone component, the error in the tone component caused by the linear approximation is used, either as is or in approximated form, to correct only the first accent component before superimposition, thereby obtaining the pitch pattern. Synthesized speech with natural intonation is thus obtained with simple processing.

[Embodiment]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 2 schematically shows an embodiment of the present invention; the accent component correction section 5 is the means needed to implement the invention. The same reference numerals denote the same items throughout the drawings. The pitch pattern generation method of the present invention is described below with reference to FIG. 1(b).

First, as in the conventional method, the tone component setting section 2 generates a tone component approximated by a straight line from the parameters corresponding to the desired synthesized speech and sends it to the accent component superimposing section 4.

At this time, the error information arising from the linear approximation of the tone component is stored in the nonlinear information storage section 21.
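The error information held in the nonlinear information storage section 21 can be pictured as the difference between the curved tone component and its straight-line approximation over the rising part. The sketch below is one hedged reading of that step: the names `rising_error` and `ramp_approximation`, the sample values, and the ramp used as an example of an "approximated" form are assumptions for illustration, since the publication does not state how the approximation is performed.

```python
def rising_error(curve, line, n_rise):
    """Return curve - line over the first n_rise samples (the rising part)."""
    return [c - l for c, l in zip(curve[:n_rise], line[:n_rise])]

def ramp_approximation(error):
    """One possible coarse form: a straight ramp from error[0] towards 0."""
    if not error:
        return []
    n = len(error)
    return [error[0] * (1 - i / n) for i in range(n)]

curve = [100.0, 125.0, 120.0, 110.0]  # tone component with a nonlinear rise (made-up values)
line = [140.0, 130.0, 120.0, 110.0]   # its straight-line approximation

stored = rising_error(curve, line, n_rise=2)  # kept as is
coarse = ramp_approximation(stored)           # or kept in approximated form
```

With this sign convention the error is negative at the start of the utterance, because the straight line begins above the rising curve.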

Similarly, the accent component setting section 3 sets the accent components of the plural accent phrases in sequence from the parameters corresponding to the desired synthesized speech.

In the present invention, the accent component correction section 5 then corrects the accent component of the first accent phrase on the basis of the error information held in the nonlinear information storage section 21, using that error information either as is or after suitable approximation (see FIG. 1(b)).

The corrected first accent component is superimposed in the subsequent accent component superimposing section 4.

For the second and subsequent accent phrases, no such correction is made. As in the conventional method, each accent component generated by the accent component setting section is superimposed in turn on the linearly approximated tone component, and this is repeated to obtain the pitch pattern corresponding to the desired synthesized speech.
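The flow described in this embodiment (straight-line tone component, correction of the first accent component only, then sequential superimposition) can be sketched as follows. This is an illustrative reading under assumptions, not the circuit of FIG. 2: `build_pitch`, the sample values, and the assumption that the first accent phrase begins at the start of the utterance and is at least as long as the stored error are all hypothetical.

```python
def build_pitch(line, accents, onsets, error):
    """Superimpose accent contours on the straight-line tone component.

    Only the first accent contour is corrected with the stored error;
    later contours are added unchanged.
    """
    pitch = list(line)
    for k, (accent, onset) in enumerate(zip(accents, onsets)):
        contour = list(accent)
        if k == 0:
            # Assumes the first accent phrase starts where the error starts.
            for i, e in enumerate(error[:len(contour)]):
                contour[i] += e
        for i, a in enumerate(contour):
            if onset + i < len(pitch):
                pitch[onset + i] += a
    return pitch

line = [140.0, 130.0, 120.0, 110.0, 100.0, 90.0]  # linear tone component
error = [-40.0, -5.0]                             # stored rising-part error
accents = [[10.0, 20.0, 10.0], [5.0, 5.0]]        # made-up accent contours
onsets = [0, 4]

pitch = build_pitch(line, accents, onsets, error)
```

In this toy case the result equals what would be obtained by superimposing the uncorrected accents on the curved tone component [100, 125, 120, 110, 100, 90], while the per-sample work remains a few additions.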

Thus, the present invention is characterized in that the tone component corresponding to the desired synthesized speech is approximated by a straight line, and the error that arises when the nonlinear transient at the rise of the tone component is linearly approximated is used, either as is or after suitable approximation, to correct the corresponding accent component.

[Effects of the Invention]

As described in detail above, the rule-based synthesis system of the present invention, in generating pitch patterns for rule-based synthesis, exploits the fact that, in general, approximating the tone component of the pitch pattern of speech by a straight line has almost no effect on the quality of the synthesized speech, except in the rising part of that component. The tone component is approximated by a straight line, and when the accent components are superimposed on the linearly approximated tone component, the error in the tone component caused by the linear approximation is used, either as is or in approximated form, to correct only the first accent component before superimposition, thereby obtaining the pitch pattern. Synthesized speech with natural intonation is thus obtained with simple processing.

[Brief Description of the Drawings]

FIG. 1 is a diagram explaining the present invention. FIG. 2 is a diagram schematically showing an embodiment of the present invention. FIG. 3 is a diagram explaining conventional pitch pattern generation methods.

In the drawings, 1 is the response simulator for a critically damped second-order linear system; 2 is the tone component setting section; 21 is the nonlinear information storage section; 3 is the accent component setting section; 4 is the accent component superimposing section; 5 is the accent component correction section; ■ is the tone component approximated by a straight line; and ■ is the correction applied to the accent component.

Claims

[Claims] A rule-based synthesis system in which, when generating the pitch pattern used in synthesizing speech by rule-based synthesis, an accent component and a tone component are obtained separately and superimposed, characterized in that: a first means (21) is provided for holding the nonlinear portion of the rising part of the tone component, either as is or as approximated information; a second means (2) is provided for approximating, by a straight line, the tone component including said nonlinear portion; and the pitch pattern is generated by superimposing an accent component, obtained by superimposing the information held in the first means (21) on the accent component of the rising part of the tone component, and the linearly approximated tone component generated by the second means (2).