JPS599920B2

JPS599920B2 - Audio parameter modification method

Info

Publication number: JPS599920B2
Application number: JP56214565A
Authority: JP
Inventors: 亨金盛; 忠靖杉田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-12-25
Filing date: 1981-12-25
Publication date: 1984-03-06
Also published as: JPS58111994A

Description

DETAILED DESCRIPTION OF THE INVENTION

(1) Technical Field of the Invention

The present invention relates to a speech parameter modification method for suppressing harsh sounds, such as abnormal noises, in synthesized speech produced by a speech analysis-synthesis system. In particular, it relates to a method that converts the speech parameters into functions of pole frequency and bandwidth and performs the modification on that representation.

(2) Technical Background

In general, linear prediction methods such as PARCOR and LSP approximate the spectral envelope of speech with an all-pole model whose transfer function is

$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} a_i z^{-i}}$$

where $P$ is the prediction order and $a_i$ are the linear prediction coefficients. As a result, the bandwidth of the first formant is sometimes analyzed as abnormally narrow. This happens, for example, with nasal consonants such as N and M and with the moraic nasal, whose transfer functions often contain zeros that the all-pole model cannot represent, and with sounds whose first formant frequency is low, such as the vowel I, when the fundamental frequency nearly coincides with the first formant frequency.
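The all-pole model above can be evaluated numerically to see the spectral envelope it describes. The following is a minimal sketch, not code from the patent: the function name `allpole_gain`, the 8 kHz sampling rate, and the second-order example coefficients (one resonance near 300 Hz with a 50 Hz bandwidth) are all assumptions chosen for illustration.

```python
import cmath
import math

def allpole_gain(a, freq_hz, fs_hz):
    """Return |H(e^{jw})| for H(z) = 1 / (1 + sum_{i=1..P} a[i-1] * z**-i)."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    z_inv = cmath.exp(-1j * w)
    denom = 1.0 + sum(a_i * z_inv ** (i + 1) for i, a_i in enumerate(a))
    return 1.0 / abs(denom)

# Hypothetical second-order example (P = 2): one resonance near 300 Hz
# with a 50 Hz bandwidth, at an assumed 8 kHz sampling rate.
FS = 8000.0
r = math.exp(-math.pi * 50.0 / FS)       # pole radius
theta = 2.0 * math.pi * 300.0 / FS       # pole angle
a = [-2.0 * r * math.cos(theta), r * r]  # a_1, a_2

for f in (100.0, 300.0, 1000.0, 3000.0):
    print(f"{f:6.0f} Hz  gain = {allpole_gain(a, f, FS):8.2f}")
```

The printed gain is largest at the resonance frequency; making the assumed bandwidth smaller makes that peak higher and sharper.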

In such cases, the synthesized speech often has a very large amplitude or contains harsh abnormal sounds.

(3) Prior Art and Problems

Conventionally, improving the quality of a particular synthesized utterance in speech analysis-synthesis required repeated cycles of synthesis and listening: the operator had to locate by hand the point in time at which a parameter anomaly occurred, try a suitable modification to the parameters, and then repeat the whole process.

In particular, modification aimed at suppressing abnormal sounds that are short but intense was difficult, because their relationship to the speech parameters was unclear.

(4) Object of the Invention

The object of the present invention is to improve the quality of synthesized speech by automatically correcting parameters such as those described above, which are particularly likely to produce abnormal sounds.

(5) Structure of the Invention

As shown in FIG. 1, the present invention focuses on the fact that a pole frequency $F_i$ with a narrow bandwidth $B_i$ in the spectral envelope $F$ is closely related to the appearance of abnormal sounds with a strong power level in the synthesized speech. The parameters obtained by linear prediction analysis are converted into functions of pole frequency and bandwidth, poles with narrow bandwidths are detected, and those bandwidths are modified. When such a pole is detected, the value of the bandwidth $B_i$ is increased, and $Q$ is thereby reduced, so that the gain near that pole in the digital filter used for speech synthesis is lowered. This suppresses the power level of the abnormal sound.
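The link between bandwidth, $Q$, and gain can be made concrete for a single conjugate pole pair. The relations below are the standard ones for a digital resonator and are not quoted from the patent; the pole radius $r_i$, pole angle $\theta_i$, and sampling period $T$ are symbols introduced here for illustration.

```latex
z_i = r_i e^{j\theta_i}, \qquad r_i = e^{-\pi B_i T}, \qquad \theta_i = 2\pi F_i T, \qquad Q_i = \frac{F_i}{B_i}

\left| H\!\left(e^{j\theta_i}\right) \right|
  = \frac{1}{\left| 1 - r_i \right| \, \left| 1 - r_i e^{-2j\theta_i} \right|}
  \approx \frac{1}{\pi B_i T \, \left| 1 - r_i e^{-2j\theta_i} \right|}
  \quad \text{for } \pi B_i T \ll 1
```

Under this approximation the gain at the pole frequency is roughly inversely proportional to $B_i$. Doubling the bandwidth, which halves $Q_i$, therefore lowers the resonance peak by roughly 6 dB.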

To that end, the present invention applies to a speech analysis-synthesis system that parameterizes and uses the spectral information of a speech wave, and comprises: means for performing linear prediction analysis on the input speech; means for approximately converting the result of the linear prediction analysis into functions of pole frequencies and their bandwidths; means for detecting, in the converted approximate function, a pole frequency having a narrow bandwidth; means for modifying the approximate function so as to widen the bandwidth of that pole frequency when the detecting means detects one; and means for converting the modified approximate function back into parameters. The invention is characterized in that, at an intermediate stage of converting the spectral information of the speech wave into linear prediction analysis parameters, the information is first converted into a function of pole frequency and bandwidth, and the parameters are modified via that function.

(6) Embodiment of the Invention

The present invention is described below with reference to an embodiment.

FIG. 2 is a block diagram of an embodiment of a speech analyzer to which the present invention is applied. In the figure, 1 is the analysis processing unit of a conventional speech analysis-synthesis system, and 2 is the speech parameter modification processing unit according to the present invention. 3 is a linear prediction analysis unit, which computes the linear prediction coefficients $a_i$ from the input speech. 4 is a PARCOR coefficient conversion unit, which converts the $a_i$ into PARCOR coefficients $K_i$ and outputs them.

In the modification processing unit 2, 5 is a pole frequency/bandwidth conversion unit. Let $A(z)$ denote the denominator of the transfer function $H(z)$ of the all-pole approximation model of the speech spectral envelope described above. Unit 5 first performs a root-finding operation that obtains the roots of $A(z) = 0$ in the $z$-plane, for example by the Newton-Raphson method. Then, writing each root as $z_i = r_i e^{j\theta_i}$ and the sampling period as $T$, it performs a conversion that derives the pole frequency and bandwidth from these quantities.
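The root-finding and conversion steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation. The relations $F_i = \theta_i / (2\pi T)$ and $B_i = -\ln r_i / (\pi T)$ are the conventional ones and are assumed here, since the text does not spell them out. For brevity the sketch finds roots with the Durand-Kerner iteration rather than the Newton-Raphson method named in the text. The function names and the 8 kHz example are hypothetical.

```python
import cmath
import math

def poly_roots(a, iters=200):
    """Roots of z^P + a[0] z^(P-1) + ... + a[P-1] (Durand-Kerner iteration).

    These are the roots of A(z) = 1 + sum a_i z^-i after multiplying by z^P.
    """
    n = len(a)
    roots = [(0.4 + 0.9j) ** k for k in range(n)]  # distinct complex starting points
    for _ in range(iters):
        updated = []
        for i, zi in enumerate(roots):
            p = 1.0 + 0.0j
            for c in a:                  # Horner evaluation of the polynomial at zi
                p = p * zi + c
            q = 1.0 + 0.0j
            for j, zj in enumerate(roots):
                if j != i:
                    q *= zi - zj
            updated.append(zi - p / q)
        roots = updated
    return roots

def poles_to_formants(roots, T):
    """Convert poles z_i = r_i e^{j theta_i} to (F_i, B_i) pairs in Hz.

    Only the upper-half-plane member of each conjugate pair is kept;
    real poles are ignored in this sketch.
    """
    result = []
    for z in roots:
        if z.imag > 0.0:
            f = cmath.phase(z) / (2.0 * math.pi * T)
            b = -math.log(abs(z)) / (math.pi * T)
            result.append((f, b))
    return sorted(result)

# Hypothetical check: build A(z) from a known resonance and recover it.
T = 1.0 / 8000.0
r = math.exp(-math.pi * 50.0 * T)
theta = 2.0 * math.pi * 300.0 * T
a = [-2.0 * r * math.cos(theta), r * r]
print(poles_to_formants(poly_roots(a), T))
```

The round trip should return a single pair close to the 300 Hz frequency and 50 Hz bandwidth used to build the coefficients.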

6 is an extraction unit for pole frequencies $F_i$ that have a particularly narrow bandwidth $B_i$.

A bandwidth is judged to be narrow, and the pole is extracted, when the value of $B_i$ is at or below a certain threshold, or when $Q = F_i / B_i$ is at or above a certain threshold. 7 is a narrow-bandwidth modification unit, which changes the bandwidth of each extracted pole to a value that no longer satisfies the narrow-bandwidth condition.
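The detection and modification steps can be sketched as below. The text gives no numeric thresholds, so `b_min_hz`, `q_max`, and `margin` are hypothetical values chosen only for illustration. Applying both tests together is also an assumption, since the text presents them as alternative criteria.

```python
def widen_narrow_poles(formants, b_min_hz=60.0, q_max=20.0, margin=1.05):
    """Widen the bandwidth of narrow poles.

    formants: list of (F_i, B_i) pairs in Hz. Returns a corrected copy.
    A pole counts as narrow if B_i <= b_min_hz or Q = F_i / B_i >= q_max.
    """
    corrected = []
    for f, b in formants:
        # The bandwidth test comes first so that b == 0 never reaches the division.
        is_narrow = b <= b_min_hz or f / b >= q_max
        if is_narrow:
            # Smallest bandwidth that fails both narrowness tests, plus a margin.
            b = max(b_min_hz, f / q_max) * margin
        corrected.append((f, b))
    return corrected

# Hypothetical formant list: the first and third poles are narrow.
poles = [(300.0, 10.0), (1500.0, 120.0), (2500.0, 100.0)]
print(widen_narrow_poles(poles))
```

Pole frequencies are left untouched; only the bandwidths of the poles flagged as narrow change.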

8 is a linear prediction coefficient conversion unit, which converts the values of all the bandwidths, including the modified ones, together with the pole frequencies back into modified linear prediction coefficients $a_i'$.
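Converting pole frequencies and bandwidths back to linear prediction coefficients amounts to multiplying out one second-order factor per pole. The sketch below assumes that every pole belongs to a complex-conjugate pair and uses the conventional relations $r_i = e^{-\pi B_i T}$ and $\theta_i = 2\pi F_i T$. The function name and example values are hypothetical.

```python
import math

def formants_to_lpc(formants, T):
    """Rebuild a_1..a_P of A(z) = 1 + sum a_i z^-i from (F_i, B_i) pairs in Hz.

    Assumes each pair describes one complex-conjugate pole pair, so P = 2 * len(formants).
    """
    poly = [1.0]
    for f, b in formants:
        r = math.exp(-math.pi * b * T)
        theta = 2.0 * math.pi * f * T
        # (1 - z_i z^-1)(1 - conj(z_i) z^-1) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        product = [0.0] * (len(poly) + 2)
        for i, p_i in enumerate(poly):
            for j, s_j in enumerate(section):
                product[i + j] += p_i * s_j
        poly = product
    return poly[1:]  # drop the leading 1

# Hypothetical example: two resonances at an assumed 8 kHz sampling rate.
T = 1.0 / 8000.0
print(formants_to_lpc([(300.0, 63.0), (1500.0, 120.0)], T))
```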

The modified linear prediction coefficients $a_i'$ are input to the PARCOR coefficient conversion unit 4, which outputs the modified PARCOR coefficients $K_i'$. When these PARCOR coefficients $K_i'$ are used in a PARCOR speech synthesizer, synthesized speech with the abnormal sounds suppressed is generated.

(7) Effects of the Invention

Because the present invention can automatically detect and correct, in advance, parameters that are likely to produce abnormal sounds, it helps improve the quality of synthesized speech and significantly reduces the burden of the correction work needed for quality improvement.
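The conversion from linear prediction coefficients to PARCOR (reflection) coefficients performed by unit 4 is commonly done with the step-down recursion. The sketch below is an illustration under assumptions: the text does not describe the algorithm used, and the sign convention for PARCOR coefficients varies between references. Here $k_m$ is taken to be the last coefficient of the order-$m$ polynomial $A_m(z)$; the function name is hypothetical.

```python
def lpc_to_reflection(a):
    """Step-down recursion for A(z) = 1 + sum_{i=1..P} a[i-1] z^-i.

    Returns [k_1, ..., k_P]. Sign convention (an assumption): k_m = a_m^(m).
    Raises ValueError if some |k_m| >= 1, i.e. the filter is not stable.
    """
    cur = list(a)
    k = [0.0] * len(cur)
    for m in range(len(cur), 0, -1):
        km = cur[m - 1]
        if abs(km) >= 1.0:
            raise ValueError("unstable filter: |k| >= 1")
        k[m - 1] = km
        # cur[i] holds a_{i+1}^(m); reduce to the order m-1 coefficients.
        cur = [(cur[i] - km * cur[m - 2 - i]) / (1.0 - km * km)
               for i in range(m - 1)]
    return k

# Hypothetical second-order example: pole radius 0.9, pole angle 60 degrees.
print(lpc_to_reflection([-0.9, 0.81]))
```

All reflection coefficients have magnitude below 1 exactly when every pole lies inside the unit circle. Widening a bandwidth moves its pole further inside the circle, so the modification cannot make a stable filter unstable.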

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an explanatory diagram of a pole frequency having a narrow bandwidth, and FIG. 2 is a block diagram of the embodiment. In the figures, 1 is the speech analysis processing unit, 2 is the speech parameter modification processing unit, 3 is the linear prediction analysis unit, 4 is the PARCOR coefficient conversion unit, 5 is the pole frequency/bandwidth conversion unit, 6 is the narrow-bandwidth pole extraction unit, 7 is the narrow-bandwidth modification unit, and 8 is the linear prediction coefficient conversion unit.

Claims

[Claims]

1. A speech parameter modification method for a speech analysis-synthesis system that parameterizes and uses the spectral information of a speech wave, comprising: means for performing linear prediction analysis on input speech; means for approximately converting the result of the linear prediction analysis into functions of pole frequencies and their bandwidths; means for detecting, in the converted approximate function, a pole frequency having a narrow bandwidth; means for modifying the approximate function so as to widen the bandwidth of that pole frequency when the detecting means detects a pole frequency having a narrow bandwidth; and means for converting the modified approximate function into parameters; wherein, at an intermediate stage of converting the spectral information of the speech wave into linear prediction analysis parameters, the information is first converted into a function of pole frequency and bandwidth, and the parameters are modified via that function.