JPS5855992A

JPS5855992A - Voice analysis/synthesization system

Info

Publication number: JPS5855992A
Application number: JP56153578A
Authority: JP
Inventors: 谷戸　文広; 来山　征士; 「くれ」松　明
Original assignee: Kokusai Denshin Denwa KK
Current assignee: KDDI Corp
Priority date: 1981-09-30
Filing date: 1981-09-30
Publication date: 1983-04-02

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates generally to narrowband speech transmission systems, and more particularly to a linear predictive analysis-synthesis system.

The linear predictive speech analysis-synthesis system was developed for highly efficient digital transmission of speech signals. Expressed in general terms, its basic idea is to decompose speech information into sound source information and spectral information (vocal tract information) for transmission, and on the receiving side to synthesize and reproduce the speech by adding the spectral information to the sound source signal. The characteristic points of the linear predictive system are that the spectral information (vocal tract information) is represented by an all-pole function of the form $H(z) = 1 / (1 - \sum_{i=1}^{p} \alpha_i z^{-i})$, and that the sound source information uses either periodic pulses or white noise alone, or a mixture of both in a suitable ratio. The information to be transmitted is therefore: the coefficients α_i of the all-pole function or equivalent coefficients; the average amplitude or average energy of the source; the character of the source, that is, whether it is pulse-like or white-noise-like; and, for a pulse-like source, the period at which the pulses are generated.

Representing the spectrum by an all-pole function of the form $H(z) = 1 / (1 - \sum_{i=1}^{p} \alpha_i z^{-i})$ corresponds to the sample S_t of the speech time series at a given instant being optimally predicted, in the least-squares-error sense, from the p immediately preceding samples S_{t-i} (i = 1 to p) in the form $\sum_{i=1}^{p} \alpha_i S_{t-i}$. That such a prediction is possible means that strong correlation exists between neighboring samples of the time series S_t. The α_i are called linear prediction coefficients. As for the sound source information, the linear prediction coefficients are first obtained from the time series S_t; the difference between the signal Ŝ_t predicted with them and the original signal S_t is taken as the source signal ε_t; and the necessary information, namely the magnitude and character of the source, is determined from ε_t. As an equivalent procedure, the source signal ε_t can be obtained by successively removing the correlation between neighboring samples from the time series S_t.
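The prediction and residual steps described above can be sketched as follows. This is a minimal illustration, not the circuit of the patent: it assumes the autocorrelation method with the Levinson-Durbin recursion to obtain the α_i, which the text does not specify, and the function names `autocorr`, `levinson_durbin` and `residual` are hypothetical.

```python
def autocorr(s, lag):
    """Autocorrelation of the sequence s at the given lag."""
    return sum(s[t] * s[t - lag] for t in range(lag, len(s)))


def levinson_durbin(s, p):
    """Return [alpha_1, ..., alpha_p] so that s[t] ~ sum(alpha_i * s[t - i]).

    Assumption: autocorrelation method; the patent only states that the
    prediction is optimal in the least-squares sense.
    """
    r = [autocorr(s, k) for k in range(p + 1)]
    alpha = [0.0] * (p + 1)  # alpha[0] is unused
    err = r[0]               # prediction error energy of order 0
    for m in range(1, p + 1):
        acc = r[m] - sum(alpha[i] * r[m - i] for i in range(1, m))
        k = acc / err
        new = alpha[:]
        new[m] = k
        for i in range(1, m):
            new[i] = alpha[i] - k * alpha[m - i]
        alpha = new
        err *= 1.0 - k * k
    return alpha[1:]


def residual(s, alpha):
    """Source signal eps_t = S_t - sum(alpha_i * S_{t-i}) for t >= p."""
    p = len(alpha)
    return [
        s[t] - sum(alpha[i] * s[t - 1 - i] for i in range(p))
        for t in range(p, len(s))
    ]


# Example: a decaying exponential is almost perfectly predicted with p = 1.
signal = [0.9 ** t for t in range(50)]
coeffs = levinson_durbin(signal, 1)
eps = residual(signal, coeffs)
```

For a strongly correlated signal such as this one the residual is close to zero, which is the sense in which the correlation between neighboring samples has been removed.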

When speech is analyzed and synthesized with the conventional linear predictive analysis-synthesis system, synthesized speech of sufficient intelligibility is obtained, but once naturalness and speaker individuality are considered the sound quality cannot be called satisfactory. The main cause is that a pulse train or white noise is used as the source, whereas the source signal ε_t of an actual speech waveform, particularly in the voiced case, cannot reasonably be regarded as a pulse train, and moreover its spectral characteristics [...] depending on the phoneme. Directly encoding and transmitting the source signal ε_t would allow the original speech to be restored completely, but the amount of information to be transmitted is too large for this to be feasible as narrowband transmission.

The present invention therefore aims to eliminate the above drawback of the prior art. Its feature is that, for voiced speech, a portion of the waveform extracted from the source signal ε_t, or its impulse response, is encoded and transmitted, and at synthesis time the transmitted waveform is used in place of the pulses to synthesize the speech. A detailed description follows with reference to the drawings.

FIG. 1 shows one embodiment of the linear predictive speech analysis-synthesis system. In this system, the partial autocorrelation coefficients k_i, which are equivalent to the linear prediction coefficients α_i and have properties well suited to transmission, are computed and transmitted. The source signal ε_t is obtained from the time series S_t by successively removing the correlation between neighboring samples. The source used is either white noise or the source waveform extracted from the prediction error signal sequence ε_t in the analysis section.

In FIG. 1, in the analysis section, the partial autocorrelators 3 to 12 correspond to the part that obtains the spectral information and the source signal ε_t, while the register 13, autocorrelator 14, maximum value detection circuit 15, divider 16 and decision circuit 17 correspond to the part that obtains the source information. In the synthesis section, the source waveform register 23, multiplier 24, white noise generator 25, switch 26, square-root circuit 27 and multiplier 28 correspond to the part that generates the source signal, and the synthesis filters 29 to 38 correspond to the part that adds the spectral information to the source.

Note that the maximum value detection circuit 101, the normalization circuit 102 and the source waveform register 23 are the important parts of the present invention that are absent from conventional systems.

Next, the operation of the system shown in FIG. 1 is described.

This system is divided into an analysis section and a synthesis section.

In the analysis section, the input speech signal v(t) first passes through the low-pass filter 1 and is then converted by the AD converter 2 into the digital sampled data sequence S_t, which is the object of analysis.

The input data sequence S_t first passes through the ten stages of partial autocorrelators 3 to 12 and is thereby converted into the prediction error signal sequence ε_t, from which the low-order correlation components have been removed. FIG. 2 shows examples of the prediction error signal sequence ε_t for the input signal v(t): FIG. 2(a) is the case of a voiced sound and FIG. 2(b) the case of an unvoiced sound. In general, ε_t is periodic for voiced sounds such as vowels, and has a waveform close to white noise for unvoiced sounds.

The synthesis section, for its part, assumes that ε_t is either white noise or the source waveform obtained in the analysis section, so the analysis section must determine the character of ε_t.

To this end, the prediction error signal sequence ε_t is stored sequentially in the register 13, and once every 10 milliseconds the autocorrelator 14 computes its autocorrelation coefficients. The maximum value detection circuit 15 finds the largest autocorrelation coefficient V_MAX other than the zeroth-order coefficient V_0, together with the order L of the coefficient that gives this maximum. The divider 16 then divides V_MAX by V_0 to obtain the periodicity measure σ. In the decision circuit 17, if σ is greater than 0.5 the frame is judged voiced and 1 is output as the value of V/UV; if σ is 0.5 or less the frame is judged not voiced and 0 is output.
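The voiced/unvoiced decision just described can be sketched as below. This is an illustrative sketch only: the function name `voicing_decision` and the `max_lag` search limit are assumptions not found in the text, while the 0.5 threshold and the ratio σ = V_MAX / V_0 follow the description.

```python
def voicing_decision(eps, max_lag, threshold=0.5):
    """Return (v_uv, lag, sigma) for one frame of the prediction error eps.

    v_uv  : 1 if judged voiced, 0 otherwise
    lag   : order L of the largest non-zero-lag autocorrelation coefficient
    sigma : periodicity measure V_MAX / V_0
    max_lag is a hypothetical search limit; the text does not state one.
    """
    v = [
        sum(eps[t] * eps[t - k] for t in range(k, len(eps)))
        for k in range(max_lag + 1)
    ]
    v0 = v[0]
    if v0 <= 0.0:
        # Silent frame: no energy, treat as not voiced.
        return 0, 0, 0.0
    lag = max(range(1, max_lag + 1), key=lambda k: v[k])
    sigma = v[lag] / v0
    return (1 if sigma > threshold else 0), lag, sigma


# Example: a pulse train with period 8 is judged voiced with L = 8.
pulses = [1.0 if t % 8 == 0 else 0.0 for t in range(80)]
flag, period, sigma = voicing_decision(pulses, max_lag=20)
```

In practice the search would probably also need a lower lag bound so that short-lag correlation is not mistaken for the pitch period; the text only excludes the zeroth-order coefficient.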

In the above process, the partial autocorrelation coefficients k_i (i = 1 to 10) determined by the partial autocorrelators 3 to 12, the energy V_0 of the prediction error signal, the fundamental period L, the shape of the source waveform extracted from the prediction error signal sequence ε_t, and the voiced/unvoiced decision V/UV are sent to the encoder 18 every 10 milliseconds. After encoding, they are transferred through the modem (MODEM) 19 and over the communication line 20 to the synthesis section.

In the synthesis section, the signal from the communication line 20 is input through the modem (MODEM) 21 and converted by the decoder 22 into the partial autocorrelation coefficients k_i (i = 1 to 10), the energy V_0 of the prediction error signal, the fundamental period L, the voiced/unvoiced decision V/UV, and the source waveform information.

To provide the source signal e_t for synthesizing the speech, the synthesis section has a voiced-sound part, consisting of the source waveform register 23, which outputs the source waveform once every fundamental period L sent from the analysis section, and the amplifier 24, which amplifies the source waveform by a factor of L, the length of the fundamental period, and an unvoiced-sound part using the white noise generator 25. The switch 26 is switched by the V/UV signal to select which of the two is used. Next, [...] is converted into an amplitude value, and in the amplifier 28 the source signal e_t is multiplied by this value to generate the prediction error signal sequence ε̂_t on the synthesis side. The synthesis filters 29 to 38, whose coefficients are the partial autocorrelation coefficients k_i, then add the correlation between neighboring samples to ε̂_t by the reverse of the process in the analysis section, generating the digital data sequence Ŝ_t.
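The source-signal generation on the synthesis side can be sketched as follows. This is a hedged illustration: the function name `build_excitation` and the `seed` parameter are hypothetical, the white noise is assumed to be unit-variance Gaussian, and the final gain is assumed to be the square root of the transmitted energy V_0 (the square-root circuit 27 appears in the parts list, but the corresponding sentence is damaged in the source).

```python
import math
import random


def build_excitation(n_samples, voiced, period, waveform, v0, seed=0):
    """Return n_samples of synthesis-side excitation.

    voiced   : V/UV flag (switch 26)
    period   : fundamental period L in samples
    waveform : transmitted, energy-normalized source waveform (register 23)
    v0       : transmitted prediction error energy
    """
    if voiced:
        if period <= 0:
            raise ValueError("period must be positive for a voiced frame")
        e = [0.0] * n_samples
        # Repeat the stored waveform once per fundamental period,
        # scaled by L as the text describes for amplifier 24.
        for start in range(0, n_samples, period):
            for j, w in enumerate(waveform):
                if start + j < n_samples:
                    e[start + j] += w * period
    else:
        # Assumption: unit-variance Gaussian noise for generator 25.
        rng = random.Random(seed)
        e = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    gain = math.sqrt(v0)  # assumed role of square-root circuit 27
    return [gain * x for x in e]


# Example: an 8-sample voiced frame with period 4.
frame = build_excitation(8, True, 4, [1.0, 0.5], 4.0)
```

The only difference from a conventional pulse-excited synthesizer is that the stored waveform, rather than a single unit pulse, is placed at each pitch period.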

Ŝ_t then passes through the DA converter 39 and the low-pass filter 40 and is output as the synthesized speech v̂(t).

FIG. 3 shows the configuration of one stage of the partial autocorrelators 3 to 12. The partial autocorrelator comprises a unit-sample delay circuit 41 and a part for computing the partial autocorrelation coefficient, which consists of adders 42 and 43, squaring circuits 44 and 45, adders 46 and 47, averaging filters 48 and 49, and a divider 50. When the signal sequences x_t and y_t are input, the output of adder 46 is $4 x_t y_t$ and the output of adder 47 is $2 x_t^2 + 2 y_t^2$, which give the correlation component and the energy of the signals x_t and y_t, respectively.

The averaging filters 48 and 49 are, as shown in FIG. 4, digital low-pass filters composed of adders 51, 52 and 53, unit-sample delay circuits 54, 55 and 56, and multipliers 57, 58 and 59; they serve to time-average their input.

Accordingly, the output of averaging filter 48 is $E(4 x_t y_t)$ and the output of filter 49 is $E(2 x_t^2 + 2 y_t^2)$, so the result of the division in divider 50 is the correlation component of the signals x_t and y_t normalized by their energy; this process yields the partial autocorrelation coefficient k_i. Next, the lattice circuit uses this coefficient k_i to remove the correlation component from the signal sequences x_t and y_t. The lattice circuit consists of adders 60 and 61 and multipliers 62 and 63. A terminal 64 is provided to output the partial autocorrelation coefficient k_i.
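One stage of this partial autocorrelation analysis can be sketched as below. It is a sketch under assumptions: the averages E(.) are taken as plain sums over a whole frame rather than the recursive low-pass filter of FIG. 4, the lattice update uses the common two-multiplier form, and the names `parcor_stage` and `parcor_analyze` are hypothetical.

```python
def parcor_stage(f, b):
    """One analysis stage: returns (k, f_next, b_next).

    f : forward prediction error entering the stage (x_t in the text)
    b : backward prediction error entering the stage; it is delayed by
        one sample inside the stage to form y_t
    """
    y = [0.0] + list(b[:-1])  # unit-sample delay (circuit 41)
    num = sum(4.0 * xt * yt for xt, yt in zip(f, y))                 # E(4 x y)
    den = sum(2.0 * xt * xt + 2.0 * yt * yt for xt, yt in zip(f, y)) # E(2x^2 + 2y^2)
    k = num / den if den > 0.0 else 0.0
    # Lattice update: remove the correlated component from each signal.
    f_next = [xt - k * yt for xt, yt in zip(f, y)]
    b_next = [yt - k * xt for xt, yt in zip(f, y)]
    return k, f_next, b_next


def parcor_analyze(s, stages=10):
    """Run the cascade; returns the coefficients k_i and the residual eps_t."""
    f, b = list(s), list(s)
    ks = []
    for _ in range(stages):
        k, f, b = parcor_stage(f, b)
        ks.append(k)
    return ks, f


# Example: the first coefficient of a decaying exponential is close to 0.9.
ks, eps = parcor_analyze([0.9 ** t for t in range(50)], stages=3)
```

Because 2|x y| never exceeds x^2 + y^2, every coefficient computed this way satisfies |k_i| <= 1, which is one reason these coefficients are convenient to quantize and transmit.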

FIG. 5 shows the configuration of one stage of the synthesis filters 29, ..., 37, 38. The synthesis filter consists of a unit-sample delay line 71, adders 72, 73 and 74, and a multiplier 75 that multiplies by the partial autocorrelation coefficient k_i; it serves to add the correlation component to the signal.
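The cascade of synthesis stages can be sketched as an all-pole lattice filter. Note the assumption: FIG. 5 shows a stage with a single multiplier and three adders, whereas this sketch uses the mathematically equivalent two-multiplier form for readability; the function name `lattice_synthesize` is hypothetical.

```python
def lattice_synthesize(excitation, ks):
    """Add correlation back to the excitation using coefficients k_1..k_p.

    Implements, for each sample t and for m = p down to 1:
        f_{m-1}(t) = f_m(t) + k_m * b_{m-1}(t-1)
        b_m(t)     = b_{m-1}(t-1) - k_m * f_{m-1}(t)
    with f_p(t) the excitation and the output equal to f_0(t).
    """
    p = len(ks)
    b_prev = [0.0] * (p + 1)  # b_prev[m] holds b_m(t-1)
    out = []
    for e in excitation:
        f = e
        b_now = [0.0] * (p + 1)
        for m in range(p, 0, -1):
            f = f + ks[m - 1] * b_prev[m - 1]
            b_now[m] = b_prev[m - 1] - ks[m - 1] * f
        b_now[0] = f
        out.append(f)
        b_prev = b_now
    return out


# Example: one stage with k = 0.9 turns a unit pulse into a decaying exponential.
speech = lattice_synthesize([1.0, 0.0, 0.0, 0.0], [0.9])
```

This is the reverse of the analysis lattice: where the analysis stage subtracts the correlated component, the synthesis stage adds it back.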

As already stated, the important feature of the present invention is that, when synthesizing the voiced portions of the speech waveform, a source waveform obtained by encoding and transmitting a portion of the prediction error signal sequence ε_t, or its impulse response, is used in place of the conventional pulse train. This embodiment concerns the method that uses a portion of the prediction error signal sequence ε_t as is [...]. The above procedure is realized by the maximum value detection circuit 101 and the normalization circuit 102 shown in FIG. 1. That is, the maximum value detection circuit 101 finds, in the prediction error signal sequence ε_t stored in the register 13, the interval of N consecutive points whose energy is largest; the waveform of these N consecutive points is sent to the normalization circuit 102, normalized in energy, and then sent to the encoder 18. According to experimental results so far, with N of about 15 this interval contains about 70% of the energy of the prediction error signal sequence ε_t within one fundamental period, so N ≥ 15 is sufficient for practical purposes.

FIG. 6 shows a flowchart of the series of processing steps performed by the maximum value detection circuit 101 and the normalization circuit 102.

First, the maximum value detection circuit 101 successively computes the total energy of N consecutive points of the prediction error signal sequence ε_t stored in the register 13, and obtains the position at which the total energy is largest together with the corresponding signal sequence. Next, the N consecutive points of ε_t with the largest total energy are sent to the normalization circuit 102, normalized by energy, and then encoded and transmitted as the source information.

As a result of this configuration, the lack of naturalness and speaker individuality that was a problem with the conventional analysis-synthesis system is eliminated, and the quality of the synthesized speech is markedly better. In terms of equipment, it suffices to add a maximum value detection function, a normalization function and a small amount of storage to a conventional system, which is extremely simple. In addition, a general-purpose computer programmed with the main part of the operation of FIG. 1 described above [...]

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a linear predictive speech analysis-synthesis apparatus according to the present invention; FIG. 2 shows examples of the prediction error signal sequence; FIG. 3 is a block diagram of the partial autocorrelator; FIG. 4 is a block diagram of the averaging filter; FIG. 5 is a block diagram of the synthesis filter; and FIG. 6 is a flowchart of the processing of the maximum value detection circuit 101 and the normalization circuit 102.

1: low-pass filter; 2: analog-to-digital converter; 3 to 12: partial autocorrelators; 13: register; 14: autocorrelator; 15: maximum value detection circuit; 16: divider circuit; 17: decision circuit; 18: encoder; 19, 21: modems; 20: transmission line; 22: decoder; 23: source waveform register; 24: multiplier; 25: white noise generator; 26: switch; 27: square-root circuit; 28: multiplier; 29 to 38: synthesis filters; 39: digital-to-analog converter; 40: low-pass filter; 101: maximum value detection circuit; 102: normalization circuit.

Patent applicant: Kokusai Denshin Denwa Co., Ltd. Agent for the application: patent attorney 山本 恵一

Claims

[Claims]

A speech analysis-synthesis system in which speech is taken as the input signal, its time-series signal is decomposed into linear prediction coefficients, information representing the sound source information, and spectral information and is transmitted, and the receiving side synthesizes these pieces of information to reproduce the original speech, characterized in that the transmitting side transmits a portion of the waveform of the prediction error signal obtained by linear prediction of the input signal, or an impulse response corresponding thereto, and the receiving side uses either this information or white noise, according to whether or not the sound is voiced, as the driving source during waveform synthesis, and adds the spectral information to the source to reproduce the original speech.