JPS5895800A

JPS5895800A - Coding of voice synthesizer

Info

Publication number: JPS5895800A
Application number: JP56194773A
Authority: JP
Inventors: 茂原　宏; 田中　教成; 浩関口
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1981-12-03
Filing date: 1981-12-03
Publication date: 1983-06-07
Also published as: JPS6343023B2

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Technical Field of the Invention

The present invention relates to an encoding method for a speech synthesizer, and in particular to improved encoding of parameters such as amplitude values in a speech analysis/synthesis system.

Technical Background of the Invention and Its Problems

In general, a speech synthesizer based on a speech analysis/synthesis method such as LPC (Linear Predictive Coding) synthesizes speech according to speech analysis data stored in memory in advance. This speech analysis data consists of parameters such as the amplitude value of the speech, the distinction between voiced and unvoiced sounds, the pitch period for voiced sounds, and the resonance characteristics of the vocal tract. Among these parameters, the amplitude value is a physical quantity corresponding to the loudness of the utterance, and is one of the important factors that strongly affect the impression the speech gives and the intelligibility of its meaning.

In such a speech synthesizer, parameters such as the amplitude value are normally encoded into digital values of a predetermined number of bits. As shown in Fig. 1, the parameter A, which is an amplitude value, is encoded by the encoding circuit 11 into a digital value AN1 of a predetermined number of bits, and this encoded parameter AN1 is input to a speech synthesis circuit (hereinafter simply called the synthesis circuit). In the conversion table (normally a read-only memory) 12 of the synthesis circuit, the encoded parameter AN1 is re-encoded into a digital value AN2 of a fixed number of bits. The parameter AN2 is then input to the sound source generation circuit and other parts of the synthesis circuit, and speech with an amplitude corresponding to AN2 is synthesized and output.

In this case, the encoding in the encoding circuit 11 is normally performed, as shown in Fig. 2, either by quantizing the interval [0, Amax] or [1, Amax] of the amplitude value, or by quantizing the interval [f(0), f(Amax)] or [f(1), f(Amax)] obtained after transforming the amplitude with a suitable function f(A) (normally a monotonic function in the broad sense, such as a logarithm).

With such an encoding method, however, the amplitude values of speech (curve a in Fig. 2) do not occur uniformly over these intervals, so encoding the amplitude over the whole interval produces some code values that are used often and others that are hardly used at all. Furthermore, because the amplitude value has a very wide dynamic range, the interval becomes large when its lower limit is 0 or 1, or f(0) or f(1). Quantizing such an interval makes the quantization step large, so that not only are fine changes in amplitude lost, but the resulting abrupt changes in amplitude also degrade the sound quality considerably.
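The conventional scheme described above can be sketched as a plain uniform quantizer. This is a minimal illustration only: the function names, the 4-bit width, and the value Amax = 1000 are assumptions made for the example and do not appear in the publication.

```python
def quantize_uniform(a, lo, hi, bits):
    """Map an amplitude a in [lo, hi] to an integer code of the given bit width."""
    levels = (1 << bits) - 1            # largest code value
    a = min(max(a, lo), hi)             # clamp into the quantized interval
    return round((a - lo) / (hi - lo) * levels)


def dequantize_uniform(code, lo, hi, bits):
    """Return the representative amplitude for a code."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)


def step_size(lo, hi, bits):
    """Width of one quantization step over [lo, hi]."""
    return (hi - lo) / ((1 << bits) - 1)


# Hypothetical figures: a 4-bit code spread over the full range [0, Amax].
A_MAX = 1000.0
BITS = 4
print(step_size(0.0, A_MAX, BITS))      # about 66.7 amplitude units per code
```

With the interval pinned to a lower limit of 0, every code step spans a wide band of amplitudes, which is the coarse-step problem the text describes.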

In addition, the conversion table 12 shown in Fig. 1 must re-encode the digital value AN1 into a digital value AN2 whose bit count suits devices such as the sound source generation circuit. Suppose, for example, that a digital value AN1 encoded over the interval [f(1), f(Amax1)] is decoded by the inverse function f^-1 into some value A. This value A is then quantized over the interval [0, Amax2], as shown in Fig. 3(B), and encoded into a digital value AN2 of a predetermined number of bits.

That is, in the conventional method, as shown in Figs. 3(A) and 3(B), the upper limit Amax1 of the interval used when encoding the amplitude value A (Fig. 3(A)) and the upper limit Amax2 of the interval used when decoding and re-encoding it (Fig. 3(B)) are the same value. With this re-encoding method, the gain during speech synthesis, namely Amax1/Amax2, is 1, so when the amplitude value A is large the calculations inside the speech synthesizer are prone to overflow. Consequently, when for example the output amplitudes of several synthesized utterances are to be balanced, the amplitude values A themselves must be raised or lowered to balance the whole, which is extremely difficult and cumbersome.

Object of the Invention

The present invention has been made in view of the above circumstances. Its object is to provide an encoding method for a speech synthesizer that, when encoding parameters such as the amplitude value of speech, achieves high-quality utterances that accurately reproduce changes in amplitude, and that makes it very easy to adjust the overall amplitude balance across several synthesized utterances.

Summary of the Invention

To achieve the above object, the present invention encodes the amplitude value of speech by quantizing an interval chosen on the basis of the amplitude distribution of speech. In addition, when the amplitude value is decoded and re-encoded, the upper limit of the interval used for re-encoding is made larger than the upper limit of the interval used to quantize the encoded amplitude value, and re-quantization and re-encoding are performed over this larger interval.

Embodiment of the Invention

An embodiment of the invention is described below with reference to the drawings. Fig. 4 illustrates the encoding method according to the invention. The amplitude value A of speech, which is a parameter of the speech analysis data, is quantized over an interval [Amin, Amax] of the amplitude value A that is chosen on the basis of the amplitude distribution of speech (curve a in Fig. 4).

Alternatively, quantization is performed over the interval [f(Amin), f(Amax)] obtained after transformation by a suitable function f(A) (normally a monotonic function in the broad sense, such as a logarithm). Here Amin is the lower limit, normally a value greater than 1, and Amax is the upper limit.
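A rough sketch of the transformed-domain variant follows, assuming f is the natural logarithm (the text names a logarithm only as a typical choice). The function names and the example limits are hypothetical.

```python
import math


def encode_log(a, a_min, a_max, bits):
    """Quantize log(a) uniformly over [log(a_min), log(a_max)]."""
    levels = (1 << bits) - 1
    a = min(max(a, a_min), a_max)       # clamp into [Amin, Amax]
    lo, hi = math.log(a_min), math.log(a_max)
    return round((math.log(a) - lo) / (hi - lo) * levels)


def decode_log(code, a_min, a_max, bits):
    """Invert encode_log: apply f^-1 (here exp) to the representative value."""
    levels = (1 << bits) - 1
    lo, hi = math.log(a_min), math.log(a_max)
    return math.exp(lo + code / levels * (hi - lo))


# Hypothetical limits taken from an assumed amplitude distribution.
A_MIN, A_MAX, BITS = 20.0, 800.0, 4
code = encode_log(100.0, A_MIN, A_MAX, BITS)
print(code, decode_log(code, A_MIN, A_MAX, BITS))
```

Because Amin is greater than 1, log(Amin) is positive and the quantized interval in the log domain is shorter than [f(1), f(Amax)], so each code covers a narrower amplitude ratio.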

If the amplitude value A is quantized in this way over the interval [Amin, Amax], determined from the distribution of amplitudes that actually occur in speech, and encoded into AN1 (shown in Fig. 1) with a predetermined number of bits, the quantization step becomes smaller than in the conventional method. Therefore, with the same number of encoding bits as before, fine changes in the amplitude of speech are adequately captured, and degradation of sound quality during utterance is largely prevented.
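The claim about a smaller quantization step can be checked with simple arithmetic. The sketch below compares step sizes for the same bit count; the interval limits are invented for illustration and are not taken from the publication.

```python
def step_size(lo, hi, bits):
    """Width of one uniform quantization step over [lo, hi]."""
    return (hi - lo) / ((1 << bits) - 1)


BITS = 4
# Conventional: the whole range from 0 up to a large Amax (assumed 1000).
conventional = step_size(0.0, 1000.0, BITS)
# Distribution-based: only the range where amplitudes actually occur
# (assumed Amin = 50, Amax = 500).
proposed = step_size(50.0, 500.0, BITS)

print(conventional, proposed)
```

Under these assumed numbers the distribution-based interval yields a step of 30 amplitude units against roughly 66.7 for the conventional interval, using the same 4 bits.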

Next, consider an amplitude value AN1 encoded as described above, for example one quantized and encoded over the interval [0, Amax1] as shown in Fig. 3(A). Suppose that this amplitude value AN1 is decoded by the inverse function f^-1 into some value A. As shown in Fig. 3(C), this value A is quantized over the interval [0, Amax3], whose upper limit Amax3 is larger than Amax1, and re-encoded into a digital value AN2 (shown in Fig. 1) of a predetermined number of bits.

The number of bits of AN2 is normally larger than that of AN1, and is the number of bits suited to the synthesis circuit of the speech synthesizer, which comprises the sound source generation circuit and related parts.
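One way to picture the re-encoding step is as a precomputed lookup table, similar in spirit to the read-only memory conversion table 12. The sketch below assumes a linear code over [0, Amax1] (identity f), a 4-bit AN1, a 10-bit AN2, and hypothetical limits; none of these figures come from the publication.

```python
def build_table(a_max1, a_max3, bits1, bits2):
    """Map every bits1-wide code to a bits2-wide code over [0, a_max3]."""
    levels1 = (1 << bits1) - 1
    levels2 = (1 << bits2) - 1
    table = []
    for code1 in range(levels1 + 1):
        a = code1 / levels1 * a_max1                # decode over [0, Amax1]
        table.append(round(a / a_max3 * levels2))   # requantize over [0, Amax3]
    return table


# Assumed values: Amax3 is twice Amax1, and AN2 has more bits than AN1.
TABLE = build_table(a_max1=500.0, a_max3=1000.0, bits1=4, bits2=10)
print(TABLE)
```

Because the output code is wider than the input code, all 16 input codes still land on distinct output codes even though the upper limit has been raised.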

When the value is re-encoded in this way into a digital value AN2 with the number of bits suited to the synthesis circuit, and the speech analysis data containing AN2 is input to the synthesis circuit comprising the sound source generation circuit, the digital filter and so on, speech corresponding to the given amplitude value A is synthesized and uttered. In this case the gain during speech synthesis, namely Amax1/Amax3, is less than 1, so overflow in the calculations inside the synthesis circuit can be prevented even when the amplitude value A becomes large.

Furthermore, the balance of output sound pressure among several synthesized utterances, that is, the balance of their output amplitudes, can be adjusted easily by adjusting the gain Amax1/Amax3. Therefore, when words or sentences are edited together and uttered, the overall balance of loudness can be adjusted very easily.

Making Amax3 larger than Amax1 might appear to sacrifice the fineness of amplitude changes. In practice, however, the re-encoded AN2 has more bits than the initially encoded digital value AN1 shown in Fig. 1, so no problem arises and changes in the amplitude of speech are reproduced accurately.
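The effect of the gain can be illustrated with a short sketch. It assumes a linear code, a 4-bit input, a 10-bit output, and two invented re-encoding limits; the function names are hypothetical and the numbers are chosen only to make the arithmetic visible.

```python
def gain(a_max1, a_max3):
    """Synthesis gain: ratio of the encoding limit to the re-encoding limit."""
    return a_max1 / a_max3


def reencode(code1, a_max1, a_max3, bits1, bits2):
    """Decode a bits1-wide code over [0, a_max1], requantize over [0, a_max3]."""
    levels1 = (1 << bits1) - 1
    levels2 = (1 << bits2) - 1
    a = code1 / levels1 * a_max1
    return round(a / a_max3 * levels2)


FULL_SCALE = 15                       # largest 4-bit input code
# The same stored code, re-encoded with two different upper limits.
loud = reencode(FULL_SCALE, 500.0, 625.0, 4, 10)     # gain 0.8
soft = reencode(FULL_SCALE, 500.0, 1250.0, 4, 10)    # gain 0.4
print(gain(500.0, 625.0), loud, gain(500.0, 1250.0), soft)
```

Even the largest stored code stays below the 10-bit maximum of 1023, which leaves headroom against overflow, and changing only the upper limit rescales an utterance without touching the stored amplitude codes.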

Effects of the Invention

As described in detail above, according to the present invention, in a speech synthesizer that encodes parameters such as the amplitude value of speech analysis data and synthesizes speech according to those parameters, quantizing over an interval based on the amplitude distribution of speech achieves high-quality utterances that accurately reproduce changes in amplitude. In addition, by quantizing and re-encoding the parameter over an interval whose upper limit is larger than that of the interval in which the parameter was encoded, the amplitude of the speech can be adjusted easily. The overall amplitude balance across several synthesized utterances can therefore be adjusted easily.

Brief Description of the Drawings

Fig. 1 is a schematic block diagram of a conventional circuit that encodes parameters of speech analysis data; Fig. 2 illustrates its encoding method; Figs. 3(A) and 3(B) illustrate its re-encoding method; and Fig. 3(C) and Fig. 4 illustrate the parameter re-encoding and encoding methods according to an embodiment of the present invention.

Claims

[Claims]

(1) An encoding method for a speech synthesizer that synthesizes predetermined speech according to parameters of speech analysis data, characterized in that the amplitude value among said parameters is quantized over a range based on the amplitude distribution of speech, and the amplitude value within this quantized range is encoded.

(2) An encoding method for a speech synthesizer that re-encodes an encoded parameter of speech analysis data into the number of bits required for operation of a speech synthesis circuit and synthesizes predetermined speech according to the content of that parameter, characterized in that said encoded parameter is quantized over a range whose upper limit is larger than the upper limit of the fixed range used when the parameter was originally quantized, and said encoded parameter is thereby re-encoded into said required number of bits.