JPS61122700A

JPS61122700A - Synthetic voice pronunciation speed control system

Info

Publication number: JPS61122700A
Application number: JP24398284A
Authority: JP
Inventors: 敏郎柴沼; 大井　泰; 金盛　亨
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-11-19
Filing date: 1984-11-19
Publication date: 1986-06-10

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a synthesized speech speaking rate control system, and in particular to one that controls the speaking rate during synthesis by thinning out parameters as they are read from a speech synthesis parameter time series.

[Conventional Technology and Problems]

FIG. 3 is a diagram for explaining how the speaking rate is adjusted in the conventional method.

For example, in synthesis by rule, syllables formed from combinations of C (consonant) and V (vowel), such as cv, cvv, vcv, and cvc, are used as synthesis units, and their acoustic parameters are determined by the so-called PARCOR method, the LSP method, or the like. These are stored as a speech synthesis parameter time series for each synthesis unit. The series of speech synthesis parameter time series corresponding to the reading of the input character string is then supplied to the synthesis section at a predetermined timing period, and speech is synthesized.

In general, the speaking rate in speech synthesis is determined by the number of frames in the speech synthesis parameter time series for each synthesis unit and by the timing period at which the parameters are processed. Conventionally, when the speaking rate is changed in, for example, synthesis by rule, a method that varies the interpolation duration has been used, as shown in FIG. 3. That is, in the speech synthesis parameter time series of one synthesis unit, an interpolation section M is provided between the consonant part C plus vowel part V1 and the second-half vowel part V2, and speech synthesis parameter time series related to that vowel are inserted into this interpolation section M. Increasing the number of interpolated parameter time series slows the speaking rate, and decreasing it speeds the speaking rate up.
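The relationship between frame count, timing period, and utterance duration in the conventional method can be sketched as follows. This is only an illustration: the function name, the 20-frame unit, and the 10 ms timing period are assumptions chosen for the example and do not come from the patent.

```python
def utterance_duration_ms(stored_frames: int,
                          interpolated_frames: int,
                          frame_period_ms: float) -> float:
    """Duration of one synthesis unit in the conventional method.

    stored_frames:       frames registered for C + V1 and V2 (hypothetical count)
    interpolated_frames: frames inserted into the interpolation section M
    frame_period_ms:     fixed timing period at which frames are processed
    """
    if stored_frames < 0 or interpolated_frames < 0:
        raise ValueError("frame counts must be non-negative")
    return (stored_frames + interpolated_frames) * frame_period_ms


# Hypothetical numbers: a 20-frame unit processed every 10 ms.
slow = utterance_duration_ms(20, 10, 10.0)     # more interpolation, slower speech
fastest = utterance_duration_ms(20, 0, 10.0)   # no interpolation at all
```

With these assumed numbers the duration cannot fall below the 200 ms given by the stored frames alone, however little interpolation is used.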

However, with the conventional method described above, as shown in FIG. 3(c), the utterance time cannot be made shorter than the combined duration of the consonant-plus-vowel parameter time series and the second-half vowel parameter time series. The maximum speaking rate is therefore fixed by the reference parameter time series (consonant plus vowel, and vowel second half) that is used when no interpolation is performed at all. Moreover, if a short reference parameter time series is registered from the outset, the quality of the synthesized speech deteriorates when interpolation is performed to lengthen the utterance time.

[Means for Solving the Problems]

The present invention addresses the above problems. The speech synthesis parameter time series is read out with frames thinned out according to the instructed speaking rate, and the thinned parameter time series is processed by the digital signal processing section that synthesizes the speech. This makes it possible to increase the speaking rate without changing the readout timing period. Specifically, the synthesized speech speaking rate control system of the present invention applies to a speech synthesis device comprising a speech synthesis parameter storage section in which a speech synthesis parameter time series is stored for each synthesis unit, a synthesis control section that retrieves from that storage section the speech synthesis parameter time series corresponding to the reading of an input character string, and a digital signal processing section that synthesizes speech according to the speech synthesis parameter time series. It is characterized by comprising: means for instructing the speaking rate of the synthesized speech; means for calculating a thinning rate for the speech synthesis parameter time series retrieved by the synthesis control section when the instructed speaking rate is greater than the standard speaking rate; and means for thinning out the speech synthesis parameter time series based on the calculated thinning rate and supplying it to the digital signal processing section. The invention is described below by way of an embodiment, with reference to the drawings.

[Embodiment]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, and FIG. 2 is a diagram for explaining speaking rate control according to an embodiment of the present invention.

In the figures, 1 is the speech synthesis parameter storage section, 2 is a buffer, 3 is the synthesis control section, 4 is the speaking rate instruction section, 5 is the speaking rate control section, 6 is the interpolation/thinning rate calculation section, 7 is the parameter adjustment section, 9 is a bandpass filter, 10 is a speaker, and 11 is the main control section, which controls the overall sequence of speech synthesis.

The speech synthesis parameter storage section 1 stores, for each synthesis unit such as cv, cvv, vcv, or cvc, the speech synthesis parameter time series corresponding to that syllable or other unit. The buffer 2 is a memory that temporarily stores the input character string data to be synthesized or the speech synthesis parameter time series retrieved from the speech synthesis parameter storage section 1. The synthesis control section 3 performs the control that retrieves, from the speech synthesis parameter storage section 1, the speech synthesis parameter time series corresponding to the reading of the input character string. The speaking rate instruction section 4 notifies the speaking rate control section 5 of the instructed speaking rate, which is given by, for example, instruction data placed at the head of the input character string or a signal from an external switch.

The speaking rate control section 5 controls the speech parameter time series supplied to the digital signal processing section. When a speaking rate slower than the standard rate is instructed, it interpolates between parameter time series; when a rate faster than the standard rate is instructed, it thins out parameters within the parameter time series. The interpolation/thinning rate calculation section 6 decides, from the instruction notified by the speaking rate instruction section, whether to interpolate or to thin out. When interpolating, it determines the number of parameter time series to interpolate. When thinning out parameters, it calculates from the instructed speaking rate how many frames out of how many must be removed to reach that rate, and thereby obtains the thinning rate.
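The decision made by the interpolation/thinning rate calculation section 6 can be sketched roughly as below. This is a minimal sketch, not the patent's circuit: the function name and the percentage encoding of the instruction are assumptions. It also assumes that an instructed speed-up of p percent means removing p percent of the frames, which is how the description's own example (a 30% speed-up removes 3 frames out of every 10) treats it.

```python
from fractions import Fraction
from typing import Optional, Tuple


def plan_adjustment(speed_change_percent: int) -> Tuple[str, Optional[Fraction]]:
    """Choose between interpolation and thinning (hypothetical helper).

    speed_change_percent > 0: faster than standard, so thin out frames.
    speed_change_percent < 0: slower than standard, so interpolate.
    The returned Fraction reads as "remove numerator frames out of every
    denominator frames".
    """
    if speed_change_percent >= 100:
        raise ValueError("cannot remove all frames")
    if speed_change_percent > 0:
        return ("thin", Fraction(speed_change_percent, 100))
    if speed_change_percent < 0:
        # The number of interpolated frames is determined separately,
        # as in the conventional method; it is not modelled here.
        return ("interpolate", None)
    return ("none", None)


mode, rate = plan_adjustment(30)
# Fraction reduces 30/100, giving "3 frames out of every 10".
```

Using `Fraction` keeps the result in the "k frames per n frames" form the description asks for, rather than as a floating-point ratio.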

The parameter adjustment section 7 increases or decreases the parameter time series actually sent to the digital signal processing section, based on the result calculated by the interpolation/thinning rate calculation section 6.

The digital signal processing section 8 synthesizes speech from the supplied speech synthesis parameter time series according to a predetermined synthesis method. The digital signal that defines the speech waveform is converted to an analog signal by a D/A converter and output to the bandpass filter 9. The bandpass filter 9 removes unwanted high-frequency components and the like, and the speaker 10 outputs synthesized speech at the instructed speaking rate.

In the parameter adjustment performed by the parameter adjustment section 7, interpolating parameters may be regarded as the same as in the conventional method, so the case of thinning out parameters is explained in more detail.

For example, suppose that, for a syllable determined by the input character string, the standard speech synthesis parameter time series stored in the speech synthesis parameter storage section 1 is a parameter time series of frames F1, F2, F3, ..., Fn, as shown in FIG. 2(a). If a speed-up of, say, 30% over the standard rate is instructed, three frames out of every ten are thinned out before the series is output to the digital signal processing section 8. That is, as shown in FIG. 2(b), the output is a parameter time series from which roughly one frame in three, such as frames F3, F6, and so on, has been removed.
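The thinning in this example can be sketched as follows. The function name and the string frame labels are illustrative assumptions. The description names only F3 and F6 explicitly, so treating positions 3, 6, and 9 of each 10-frame cycle as the removed frames is an assumption consistent with "three frames out of every ten".

```python
from typing import List, Sequence, Set


def thin_frames(frames: Sequence[str],
                drop_positions: Set[int],
                cycle_length: int) -> List[str]:
    """Remove frames whose 1-based position within each cycle is listed.

    drop_positions: positions inside one cycle to remove (assumed {3, 6, 9})
    cycle_length:   number of frames per cycle (10 in the patent's example)
    """
    kept = []
    for index, frame in enumerate(frames):
        position_in_cycle = index % cycle_length + 1
        if position_in_cycle not in drop_positions:
            kept.append(frame)
    return kept


# Stand-in for a stored parameter time series F1 ... F20.
frames = [f"F{i}" for i in range(1, 21)]
thinned = thin_frames(frames, drop_positions={3, 6, 9}, cycle_length=10)
# The thinned series starts F1, F2, F4, F5, F7, F8, F10, ...
```

Because the frames that remain are still fed to the synthesizer at the unchanged timing period, 14 frames instead of 20 take 70% of the original time.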

For the various thinning rates, which frame numbers to thin out may be determined each time by a counter and arithmetic, or the frame numbers to be thinned out for each thinning rate may be tabulated and stored in advance.
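Both options can be sketched as below. The patent does not say how the counter is used, so the accumulator scheme here is an assumed technique that simply spreads the removed frames evenly; the function name, the table name, and the 10-frame cycle are likewise illustrative.

```python
from typing import Dict, List


def drop_positions_by_counter(drop_count: int, cycle_length: int) -> List[int]:
    """Pick 1-based positions to remove within one cycle using a counter.

    The accumulator grows by drop_count for every frame; each time it
    reaches cycle_length, that frame is removed and the accumulator wraps.
    This yields exactly drop_count removals per cycle, evenly spread.
    """
    if not 0 <= drop_count < cycle_length:
        raise ValueError("drop_count must be in [0, cycle_length)")
    positions = []
    accumulator = 0
    for position in range(1, cycle_length + 1):
        accumulator += drop_count
        if accumulator >= cycle_length:
            accumulator -= cycle_length
            positions.append(position)
    return positions


# Table alternative: compute the positions once for each supported rate
# (0 to 9 frames removed per 10) and look them up at synthesis time.
CYCLE_LENGTH = 10
DROP_TABLE: Dict[int, List[int]] = {
    k: drop_positions_by_counter(k, CYCLE_LENGTH) for k in range(CYCLE_LENGTH)
}
```

For three removals per ten frames this particular accumulator selects positions 4, 7, and 10, which differs from the F3, F6 pattern in FIG. 2(b) but removes the same proportion of frames. A stored table makes the lookup trivial at the cost of memory, while the counter needs no storage beyond one integer.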

In the example shown in FIG. 2, the synthesis unit has the CVV form, but the same applies to syllables of other forms.

According to the present invention, the speech synthesis parameter time series in the consonant part is also thinned out. This increases the freedom with which the speaking rate can be adjusted. In addition, when the speech synthesis parameter time series is determined by, for example, the PARCOR method, the degradation in synthesized speech quality caused by thinning the parameter time series is relatively small.

[Effect of the Invention]

As explained above, according to the present invention, speech is synthesized from a thinned-out speech synthesis parameter time series, so the speaking rate can be increased without changing the readout timing period.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, FIG. 2 is a diagram for explaining speaking rate control according to an embodiment of the present invention, and FIG. 3 is a diagram for explaining how the speaking rate is adjusted in the conventional method.

In the figures, 1 is the speech synthesis parameter storage section, 2 is a buffer, 3 is the synthesis control section, 4 is the speaking rate instruction section, 5 is the speaking rate control section, 6 is the interpolation/thinning rate calculation section, 7 is the parameter adjustment section, 9 is a bandpass filter, 10 is a speaker, and 11 is the main control section.

Claims

[Scope of Claims]

A synthesized speech speaking rate control system for a speech synthesis device comprising a speech synthesis parameter storage section in which a speech synthesis parameter time series is stored for each synthesis unit, a synthesis control section that retrieves from the speech synthesis parameter storage section the speech synthesis parameter time series corresponding to the reading of an input character string, and a digital signal processing section that synthesizes speech according to the speech synthesis parameter time series, the system being characterized by comprising: means for instructing the speaking rate of the synthesized speech; means for calculating a thinning rate for the speech synthesis parameter time series retrieved by the synthesis control section when the instructed speaking rate is greater than the standard speaking rate; and means for thinning out the speech synthesis parameter time series based on the thinning rate calculated by that means and supplying it to the digital signal processing section.