JPS61168000A

JPS61168000A - Voiceless sound waveform compression

Info

Publication number: JPS61168000A
Application number: JP60007410A
Authority: JP
Inventors: 三木　敬; 隆矢頭; 森戸　誠
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1985-01-21
Filing date: 1985-01-21
Publication date: 1986-07-29

Abstract

(57) [Abstract] This publication consists of application data filed before electronic filing was introduced, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a method of compressing information in the waveform domain by applying symmetrization processing to a speech waveform, and in particular to waveform compression by symmetrization of unvoiced sound waveforms.

(Prior Art) Known speech compression methods for reducing the amount of information in speech transmission and voice response include DPCM, ADPCM, and ADM, which exploit the correlation within the speech waveform. A further technique, used together with these methods to achieve additional compression, symmetrizes the speech waveform and uses the symmetry to reduce the required storage area. One example exploits the fact that human hearing is insensitive to the phase of the frequency components of speech: a symmetric waveform is computed by aligning the phase of every frequency component (JP-A-57-163297). This technique is briefly described below. FIG. 2 is a block diagram of a conventional symmetrization circuit.

The input speech is sampled every T0 seconds and cut out in a fixed analysis interval T (here the number of sample points in the analysis interval T is, for example, 256). The analysis interval length T is chosen to be at most a length over which the frequency components within the interval can be regarded as stationary (for example, 32 msec).
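As a worked example, and assuming the two example values in the text (256 samples and a 32 msec interval) are meant to be used together, the sampling period and sampling rate follow directly:

```latex
T_0 = \frac{T}{256} = \frac{32\ \text{msec}}{256} = 0.125\ \text{msec},
\qquad
f_s = \frac{1}{T_0} = 8\ \text{kHz}
```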

First, the speech waveform f(nT0) cut out in the analysis interval T is multiplied by a weighting function w(nT0). The weighting function is continuous and takes a value of zero, or close to zero, at the points corresponding to both ends of the analysis interval T. An example is the raised-cosine (squared-cosine) function

$$w(nT_0) = \frac{1}{2}\left(1 + \cos\frac{2\pi n}{256}\right), \qquad n = -128, \dots, 127.$$

This multiplication reduces the effect of waveform discontinuity on the FFT processing described below. Denoting by g(nT0) the speech waveform f(nT0) multiplied by the weighting function w(nT0),

$$g(nT_0) = w(nT_0)\, f(nT_0) \qquad (1)$$
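The weighting step of equation (1) can be sketched as below. This is only an illustration: it assumes the raised-cosine form w = (1 + cos(2πn/256))/2 over n = -128 to 127, and the function names and the constant test frame are hypothetical, not taken from the source.

```python
import math

N = 256  # samples per analysis interval (the example value in the text)


def raised_cosine_window(num_points=N):
    """Return w(n) for n = -N/2 .. N/2 - 1 (list index = n + N/2).

    Assumed form: w(n) = 0.5 * (1 + cos(2*pi*n / N)), which peaks at n = 0
    and falls to zero at the left end of the interval.
    """
    half = num_points // 2
    return [0.5 * (1.0 + math.cos(2.0 * math.pi * n / num_points))
            for n in range(-half, half)]


def apply_window(frame, window):
    """Equation (1): g(n) = w(n) * f(n), sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [w * f for w, f in zip(window, frame)]


window = raised_cosine_window()
frame = [1.0] * N                 # hypothetical constant test frame
weighted = apply_window(frame, window)
```

With a constant input the weighted frame simply reproduces the window, which makes it easy to confirm that both ends of the interval are pulled toward zero.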

A 256-point FFT (fast Fourier transform) is applied to the function g(nT0) to obtain its spectrum function G(kW0).

The function G(kW0) is in general complex. Writing its real part as R(kW0) and its imaginary part as I(kW0),

$$G(kW_0) = R(kW_0) + j\,I(kW_0) \qquad (2)$$

Alternatively, G(kW0) can be decomposed using the amplitude characteristic A(kW0) and the phase characteristic θ(kW0),

$$A(kW_0) = \sqrt{R^2(kW_0) + I^2(kW_0)} \qquad (3)$$

$$\theta(kW_0) = -\arctan\frac{I(kW_0)}{R(kW_0)} \qquad (4)$$

which gives

$$G(kW_0) = A(kW_0)\,\exp[\,j\,\theta(kW_0)\,] \qquad (5)$$

Using the amplitude characteristic A(kW0) and the phase characteristic θ(kW0), the waveform g(nT0) is expressed as

$$g(nT_0) = \frac{1}{256}\sum_{k} A(kW_0)\,\exp\{\,j\,[\,nkT_0W_0 + \theta(kW_0)\,]\} \qquad (6)$$

where the sum runs over the 256 frequency points. In other words, g(nT0) can be regarded as a superposition of 256 sinusoids. Because human hearing is insensitive to the phase of the frequency components of speech, setting the phase characteristic θ(kW0) to zero causes little degradation in sound quality. Consider therefore the waveform g'(nT0) obtained with the phase characteristic set to zero. It is given by

$$g'(nT_0) = \frac{1}{256}\sum_{k} A(kW_0)\,\exp(\,j\,nkT_0W_0\,) \qquad (7)$$
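The zero-phase symmetrization described above can be sketched in pure Python. A naive DFT stands in for the FFT circuit, the frame is shortened to 64 random samples to keep the example fast, and every name here is hypothetical. For a real input the amplitude satisfies A(k) = A(N - k), so the imaginary parts of the inverse transform cancel and only the cosine terms need to be summed.

```python
import cmath
import math
import random


def dft(x):
    """Naive forward DFT: X[k] = sum_i x[i] * exp(-2j*pi*k*i/N)."""
    n_pts = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n_pts)
                for i in range(n_pts))
            for k in range(n_pts)]


def zero_phase_symmetrize(g):
    """Keep A(k) = |G(k)|, set every phase to zero, and invert.

    Returns g'(n) for n = -N/2 .. N/2 - 1 (list index = n + N/2).
    """
    n_pts = len(g)
    half = n_pts // 2
    amplitude = [abs(c) for c in dft(g)]          # equation (3)
    out = []
    for n in range(-half, half):
        # With zero phase every component is a cosine centred on n = 0.
        acc = sum(amplitude[k] * math.cos(2.0 * math.pi * k * n / n_pts)
                  for k in range(n_pts))
        out.append(acc / n_pts)
    return out


random.seed(0)                    # hypothetical test data, not from the source
N = 64
g = [random.uniform(-1.0, 1.0) for _ in range(N)]
g_sym = zero_phase_symmetrize(g)
```

The result is symmetric about n = 0 and has the same amplitude spectrum as the input, but its largest sample always sits at the origin, which is the peaking behaviour the text goes on to discuss.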

As equation (7) shows, the function g'(nT0) is symmetric about n = 0. FIG. 3(a) shows the input speech waveform f(nT0), FIG. 3(b) the weighted waveform g(nT0), and FIG. 3(c) the symmetrized waveform g'(nT0).

FIG. 3(d) shows the waveform obtained by applying a nonlinear transformation, described below, to the waveform of FIG. 3(c).

Looking at the waveform g'(nT0) in FIG. 3(c), because all of the phase terms in equation (6) have been set to zero, the power of the waveform is concentrated at the point n = 0 in the center of the analysis interval T (hereinafter simply called the origin), and the waveform becomes extremely sharp there. When speech is synthesized from a waveform whose power is concentrated at the origin in this way, its strongly impulsive character produces an audible crackling sound. The waveform g'(nT0) given by equation (7) is therefore subjected to a nonlinear transformation that corrects the sharply peaked portion. One example of such a nonlinear transformation is the function c(x) of equation (8).

FIG. 4 shows the characteristic of the transformation c(x). Denoting by g''(nT0) the waveform obtained by transforming g'(nT0) according to c(x),

$$g''(nT_0) = c\bigl(g'(nT_0)\bigr)$$

FIG. 3(d) shows the nonlinearly transformed waveform g''(nT0). The waveform of FIG. 3(d) is clearly symmetric about the origin. When speech is synthesized from this symmetric waveform, the symmetry means that only about half as much waveform needs to be stored as when the original speech waveform f(nT0) is used.

The waveform f(nT0) of FIG. 3(a) repeats a very similar shape at a nearly constant period, and this period is called the pitch. Sounds in which the pitch appears clearly in this way are called voiced sounds; they are produced when the vocal cords vibrate regularly. Besides voiced sounds, speech also contains sounds with no clear pitch, called unvoiced sounds. FIG. 5(a) shows the waveform p(nT0) of the unvoiced sound "shi" (シ). FIG. 5(b) shows the waveform p'(nT0) obtained by applying the above symmetrization processing to p(nT0).

When the above symmetrization processing is applied to a highly random unvoiced sound waveform in this way, the amplitude at the waveform origin becomes abnormally peaked. Speech synthesized from such a waveform contains an audible crackling sound that was not present in the original. This abnormal-amplitude phenomenon, which degrades sound quality, arises because the phase was forced to zero, ignoring the strongly random nature of the phase characteristic θ(kW0) that results from the randomness of unvoiced sounds.
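The peaking at the origin can be seen from a one-line bound. Assuming the usual inverse-DFT normalization of 1/256 and an amplitude characteristic that is even in k (both are assumptions here, not statements from the source), the zero-phase waveform satisfies:

```latex
\left| g'(nT_0) \right|
  = \left| \frac{1}{256} \sum_{k} A(kW_0)\,\cos(nkT_0W_0) \right|
  \le \frac{1}{256} \sum_{k} A(kW_0)
  = g'(0)
```

Every component reaches its maximum simultaneously at n = 0, so no other sample can exceed the value at the origin. For a noise-like unvoiced sound, whose original phases are scattered, this in-phase alignment is what produces the abnormal peak.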

It is therefore not reasonable to symmetrize unvoiced sounds with the above technique.

Another method of symmetrizing unvoiced sound waveforms relies on the observation that, because unvoiced sounds are random, the even-symmetric component of the waveform preserves the phonetic properties of the original unvoiced waveform to some extent (JP-A-57-163299). In this method the original unvoiced waveform is replaced by its even-symmetric component. It is briefly described below.

For an unvoiced sound waveform p(nT0) (n = -128 to 127) cut out in the analysis interval T (for example, 32 msec), the unvoiced symmetric waveform q(nT0) for that interval is obtained by the following operation:

$$q(nT_0) = \frac{1}{2}\bigl\{\,p(nT_0) + p(-(n+1)T_0)\,\bigr\}$$

Here p(-(n+1)T0) is the unvoiced waveform p(nT0) reversed in time. As the equation shows, the unvoiced symmetric waveform q(nT0) is symmetric about the point on the time axis midway between the origin and n = -1.
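The even-symmetric component can be sketched as follows, assuming q(n) is the average of p(n) and p(-(n+1)) as the surrounding text implies. The function name and the 8-sample frame are hypothetical. With the list index i = n + N/2, the mapping n to -(n+1) becomes i to N - 1 - i, so the partner of each sample is simply the corresponding sample of the reversed list.

```python
def even_symmetric_component(p):
    """q(n) = (p(n) + p(-(n+1))) / 2 for n = -N/2 .. N/2 - 1.

    List index i = n + N/2, so the partner sample p(-(n+1)) sits at
    index N - 1 - i, i.e. in the reversed list.
    """
    return [(a + b) / 2.0 for a, b in zip(p, reversed(p))]


p = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]   # hypothetical 8-sample frame
q = even_symmetric_component(p)
first_half = q[:len(q) // 2]                       # the part that would be stored
```

Because q reads the same forward and backward, only `first_half` needs to be kept. Nothing in this averaging, however, ties the amplitude spectrum of q to that of p.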

When speech is synthesized from this symmetric waveform, the symmetry reduces the length of waveform that must be stored to about half of that needed for the original speech waveform p(nT0). With this technique, however, there is no guarantee that the original unvoiced waveform p(nT0) and the symmetrized unvoiced waveform q(nT0) have the same spectral amplitude components, which are an extremely important factor for hearing, and this has been a cause of sound quality degradation.

(Problems to Be Solved by the Invention) As described above, with either of the conventional techniques, compressing the information of unvoiced sounds by symmetrization processing has the drawback of degrading sound quality.

The object of the present invention is to remedy this drawback and to compress the information of unvoiced sounds by symmetrization processing without degrading sound quality.

(Means for Solving the Problems) To achieve the above object, the present invention is characterized by an unvoiced sound waveform compression method in which an unvoiced sound signal on the time axis is cut out over a predetermined time length, the cut-out signal is transformed into a waveform on the frequency axis, the phase term of that waveform is set at random to 0 or π, the resulting waveform is inversely transformed into a signal on the time axis to obtain a symmetric signal, and the first half or the second half of the symmetric signal is taken as the compressed signal.

(Operation) Instead of forcing the phase term of the frequency-axis waveform to 0, the present invention sets it at random to 0 or π.

This preserves the randomness of the phase characteristic of unvoiced sounds and improves sound quality without increasing the amount of information compared with the conventional methods. The waveform is symmetrized while the spectral amplitude information contained in the original unvoiced waveform is retained and the phonetic properties of the unvoiced waveform are left unchanged, so the storage area can be reduced.

(Detailed Description of the Invention) To preserve the strong randomness of the phase characteristic that results from the randomness of unvoiced sounds, the present invention binarizes the phase characteristic using a random pulse generation circuit that outputs 1 and -1 irregularly, and thereby obtains an even-symmetric waveform.

The principle of implementing the phase binarization with a random pulse generation circuit is as follows. From equation (6), the condition required to obtain an even-symmetric waveform is that the phase characteristic θ(kW0) be set to 0 or π. The binarized phase characteristic θ'(kW0) is given by

$$\theta'(kW_0) = \begin{cases} 0, & n(kW_0) = 1 \\ \pi, & n(kW_0) = -1 \end{cases}$$

where n(kW0) is the output of the random pulse generation circuit.

Let q'(nT0) be the waveform obtained by replacing θ(kW0) in equation (6) with the binarized phase characteristic θ'(kW0) and applying the inverse FFT. Because θ'(kW0) is either 0 or π, sin θ'(kW0) = 0 and cos θ'(kW0) = n(kW0) = ±1, so each frequency component keeps its amplitude A(kW0) and acquires only the sign n(kW0):

$$q'(nT_0) = \frac{1}{256}\sum_{k} n(kW_0)\,A(kW_0)\,\exp(\,j\,nkT_0W_0\,)$$

It follows that q'(nT0) can be computed by attaching the sign of the random pulse generation circuit output to the amplitude component A(kW0) of the spectrum function G(kW0), given by equation (3), and applying the inverse Fourier transform to the result.
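The random-sign symmetrization can be sketched in pure Python as below. This is an illustration under stated assumptions, not the circuit of the embodiment: a naive DFT replaces the FFT, the frame is 64 random samples, all names are hypothetical, and the signs are mirrored so that n(k) = n(N - k). The source does not say how signs are paired across positive and negative frequencies; mirroring is assumed here because it keeps the inverse transform real.

```python
import cmath
import math
import random


def dft(x):
    """Naive forward DFT: X[k] = sum_i x[i] * exp(-2j*pi*k*i/N)."""
    n_pts = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n_pts)
                for i in range(n_pts))
            for k in range(n_pts)]


def random_sign_sequence(n_pts, rng):
    """Return n(k) in {+1, -1} with n(k) == n(N - k) (assumed pairing)."""
    signs = [1] * n_pts
    for k in range(n_pts // 2 + 1):
        signs[k] = rng.choice((1, -1))
        signs[(n_pts - k) % n_pts] = signs[k]
    return signs


def random_phase_symmetrize(p, rng):
    """Attach a random sign to A(k) = |P(k)| and invert.

    Returns q'(n) for n = -N/2 .. N/2 - 1 (list index = n + N/2).
    """
    n_pts = len(p)
    half = n_pts // 2
    amplitude = [abs(c) for c in dft(p)]
    signs = random_sign_sequence(n_pts, rng)
    signed = [s * a for s, a in zip(signs, amplitude)]   # n(k) * A(k)
    out = []
    for n in range(-half, half):
        # Phase 0 or pi leaves only cosine terms, so the result is even in n.
        acc = sum(signed[k] * math.cos(2.0 * math.pi * k * n / n_pts)
                  for k in range(n_pts))
        out.append(acc / n_pts)
    return out


rng = random.Random(1)            # fixed seed so the sketch is repeatable
N = 64
p = [rng.uniform(-1.0, 1.0) for _ in range(N)]   # hypothetical noise-like frame
q_sym = random_phase_symmetrize(p, rng)
stored_half = q_sym[N // 2:]      # keep only n = 0 .. N/2 - 1
```

The output is symmetric about n = 0 and its amplitude spectrum matches that of the input frame, while the components no longer all add in phase at the origin.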

FIG. 5(e) shows the amplitude characteristic A(kW0) of the unvoiced sound waveform p(nT0), and FIG. 5(f) shows its phase characteristic θ(kW0). FIG. 5(g) shows the binarized phase characteristic θ'(kW0), and FIG. 5(h) shows the waveform q'(nT0) obtained by symmetrizing the original unvoiced waveform p(nT0) according to the equation for q'(nT0). The present technique thus symmetrizes an unvoiced waveform by a simple calculation while preserving the spectral amplitude characteristic of the original waveform, and has the effect of reducing the storage area needed for unvoiced waveforms.

(Embodiment) FIG. 1 is a block diagram of an embodiment in which this method is applied to the waveform compression section of a voice response apparatus.

In FIG. 1, 11 is a voiced/unvoiced discriminator, 13 a weighting function multiplier, 15 an FFT circuit, 18 an amplitude characteristic calculator, 30 and 40 selectors, 32 an IFFT (inverse FFT) circuit, 35 a nonlinear converter, and 43 a waveform storage section; the figure also contains a random pulse generation circuit and a sign converter. A speech waveform sampled at a fixed sampling period T0 is cut out with the analysis interval length T and input from input terminal 10 to the voiced/unvoiced discriminator 11. The discriminator 11 decides whether the input speech waveform is voiced or unvoiced and outputs a decision signal V/UV on lines 29 and 41.

The speech waveform is input via line 12 to the weighting function multiplier 13, where it is multiplied by the weighting function w(nT0) according to equation (1). The weighted speech waveform is input to the FFT circuit 15, which performs the FFT. The imaginary component I(kW0) and the real component R(kW0) of the resulting spectrum function G(kW0) are input to the amplitude characteristic calculator 18 via lines 16 and 17, respectively. The amplitude characteristic calculator 18 computes the amplitude characteristic A(kW0) according to equation (3). The amplitude characteristic A(kW0) is input via line 21 to one terminal of selector 30 and also, via line 19, to the sign converter. The random signal n(kW0), which takes the values -1 and 1 and is generated by the random pulse generation circuit, is likewise input to the sign converter.

The sign converter attaches the sign of the random signal to the amplitude characteristic A(kW0) to form the signed amplitude characteristic A'(kW0) (= n(kW0)·A(kW0)) and sends it to the other terminal of selector 30. Following the decision signal V/UV input on line 29, selector 30 selects A(kW0) for a voiced sound and A'(kW0) for an unvoiced sound. The selected amplitude characteristic is input via line 31 to the IFFT circuit 32. The IFFT circuit 32 performs the inverse FFT and would normally receive both a real part and an imaginary part. In this embodiment, however, it computes equation (7) for a voiced sound and the equation for q'(nT0) for an unvoiced sound, so the imaginary component is always zero; it is therefore assumed here to be set to zero internally. The IFFT circuit 32 thus computes the symmetric waveform of equation (7) or of the equation for q'(nT0), according to the decision signal V/UV. The symmetric waveform is input via line 33 to one terminal of selector 40 and is also input to the nonlinear converter 35. The nonlinear converter 35 applies the nonlinear transformation according to the transformation c(x) of equation (8). The nonlinearly transformed symmetric waveform is input to the other terminal of selector 40.

The unvoiced symmetrization based on the present technique needs no nonlinear transformation at all. The symmetrization used in this embodiment for voiced sounds, however, aligns the phases of all frequency components to zero and therefore, as described above, produces a strongly impulsive waveform. Selector 40 accordingly uses the decision signal on line 41 to select the nonlinearly transformed symmetric waveform when the decision is voiced and, when the decision is unvoiced, the symmetric waveform taken directly from the IFFT circuit 32, to which no nonlinear transformation has been applied. The output of selector 40 is thus an even-symmetric waveform produced by whichever symmetrization technique causes less sound quality degradation for the class indicated by the decision signal V/UV. A suitable length is cut out from the first half or the second half of the waveform symmetrized in this way and stored in the waveform storage section 43.

A symmetric waveform can be reproduced from the data stored in the waveform storage section 43 as follows. If the first half of the waveform is stored, the waveform data is first read out in the forward time direction, and the second half is then reproduced by reading the same data in the reverse time direction. If the second half of the symmetric waveform is stored, the order is reversed.
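The readout procedure just described amounts to a forward pass followed by a backward pass over the stored samples. The sketch below uses hypothetical function names and sample values. Whether the centre sample ends up duplicated depends on how the half was cut, which the source leaves open ("a suitable length"), so this version simply mirrors whatever was stored.

```python
def reproduce_from_first_half(stored):
    """Read the stored samples forward, then backward."""
    return list(stored) + list(reversed(stored))


def reproduce_from_second_half(stored):
    """Read the stored samples backward, then forward."""
    return list(reversed(stored)) + list(stored)


first_half = [0.1, -0.4, 0.9, 2.0]      # hypothetical stored data
full = reproduce_from_first_half(first_half)
```

Either function returns a waveform twice the stored length that reads the same in both time directions.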

(Effects of the Invention) As described above, according to the present invention, symmetrizing the speech-segment waveform of an unvoiced sound by the above method preserves the randomness of the phase characteristic of the unvoiced sound, so the quality of the synthesized speech does not deteriorate. In addition, because the waveform is symmetric, the storage area needed to store it can be halved.

The present invention reduces the storage area for speech segments without degrading the quality of synthesized speech, and is applicable to all speech synthesizers of the speech-segment type.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a waveform compression section according to the present invention, FIG. 2 is a block diagram of a conventional waveform compression section, FIG. 3 illustrates the operation of conventional speech waveform compression, FIG. 4 illustrates the nonlinear transformation, and FIG. 5 shows the waveforms involved in compressing an unvoiced sound waveform according to the present invention in comparison with the conventional waveforms. 11: voiced/unvoiced discriminator; 13: weighting function multiplier; 15: FFT circuit; 18: amplitude characteristic calculator; 30, 40: selectors; 32: IFFT (inverse FFT) circuit; 35: nonlinear converter; 43: waveform storage section. The random pulse generation circuit and the sign converter are also shown.

Claims

[Claims]

A method for compressing an unvoiced sound waveform, characterized in that an unvoiced sound signal on the time axis is cut out over a predetermined time length, the cut-out signal is transformed into a waveform on the frequency axis, the phase term of that waveform is set at random to 0 or π, the resulting waveform is inversely transformed into a signal on the time axis to obtain a symmetric signal, and the first half or the second half of the symmetric signal is taken as the compressed signal.