JPH0497199A

JPH0497199A - Voice encoding system

Info

Publication number: JPH0497199A
Application number: JP2209337A
Authority: JP
Inventors: Kimio Miseki; 公生三関; Masami Akamine; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-08-09
Filing date: 1990-08-09
Publication date: 1992-03-30
Anticipated expiration: 2015-07-17
Also published as: JP3065638B2

Abstract

PURPOSE: To obtain high-quality synthesized speech by providing a means that stores coefficient information for a filter and generating the synthesized speech signal from that coefficient information. CONSTITUTION: The synthesis filter is a pole-zero filter, and the zero filter 115 holds its coefficient information in a codebook B176. The error signal between the input signal and the synthesized speech signal output by the synthesis filter 113, which consists of the zero filter 115 with the codebook B176 and a pole filter, is weighted by a weighting filter 120, and the power of this weighted error signal is evaluated while the entries selected from a codebook A175 and the codebook B176 are varied in a closed loop. When the weighted error reaches its minimum, a distortion comparator 210 outputs the index in the codebook A175 and the index in the codebook B176 at that minimum as the encoded signal for the input speech signal. Consequently, stable synthesized speech is obtained.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Application Field) This invention relates to a speech encoding system that compresses speech signals and the like with high efficiency, and in particular to a speech encoding system for low-bit-rate transmission.

(Prior Art) Multimode CELP (Code Excited Linear Prediction) coding is known as an effective method for encoding a speech signal for low-bit-rate transmission, for example with about 10 kb/s or less of transmitted information. The details are described in a paper presented at ICASSP 1989 in Glasgow (the first paper): Tomohiko Taniguchi, Shigeyuki Unagami and Robert M. Gray, "Multimode coding: Application to CELP". Its content is briefly explained here. Fig. 6 illustrates the principle of multimode coding described in that paper, and Fig. 7 is a block diagram showing the processing of the multimode CELP encoder.

In Fig. 6, the encoding side has m encoders 510, 520, 530 (encoder #1 to encoder #m), each preset to give a different bit allocation to the drive signal parameters and the spectral parameters.

Each encoder processes the input speech signal in parallel, frame by frame. The evaluation and optimum-encoder decision unit 550 uses the input speech signal to evaluate the quality of the synthesized (decoded) speech signal produced by each encoder. Using the index n of the optimum encoder (n is one of 1, 2, ..., m), the selector 540 selects and transmits the corresponding drive signal parameters and spectral parameters, and the index n is also transmitted to the decoding side. On the decoding side, the decoder 560 (decoder #n) that corresponds to encoder #n is chosen from the index n and outputs the synthesized speech signal.
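The selection step of Fig. 6 can be sketched as follows. This is a minimal illustration, not the procedure of the cited paper: the quality measure (plain SNR over the frame), the names `snr_db`, `select_encoder` and `make_toy_encoder`, and the toy quantizing encoders are all assumptions introduced here.

```python
import math


def snr_db(reference, synthesized):
    """SNR in dB between the input frame and a synthesized frame."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, synthesized))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def select_encoder(frame, encoders):
    """Run every candidate encoder on the frame; return (index n, parameters).

    Each encoder is a callable returning (parameters, synthesized_frame).
    Indices start at 1, matching encoder #1 ... encoder #m in the text.
    """
    best = None
    for n, encode in enumerate(encoders, start=1):
        params, synthesized = encode(frame)
        quality = snr_db(frame, synthesized)
        if best is None or quality > best[0]:
            best = (quality, n, params)
    return best[1], best[2]


def make_toy_encoder(step):
    """Hypothetical stand-in for a real encoder: a uniform scalar quantizer."""
    def encode(frame):
        codes = [round(x / step) for x in frame]
        return codes, [c * step for c in codes]
    return encode


frame = [0.12, -0.40, 0.33, 0.05]
encoders = [make_toy_encoder(0.5), make_toy_encoder(0.1)]
n, params = select_encoder(frame, encoders)
print(n, params)
```

Only the index n and the parameters of the winning encoder would be transmitted; the decoder uses n to pick the matching decoder.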

This is the outline of the multimode coding presented in that paper. The multimode CELP encoder shown in Fig. 7 applies this multimode coding idea to the CELP method.

The CELP method is a well-known speech coding technique that performs vector quantization of the drive signal at the level of the synthesized sound. Details of the CELP method are described in M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates," Proc. ICASSP 85, pp. 937-940.

The multimode coding method of Fig. 7 applies the multimode coding described above to CELP in its simplest form, with two modes. Mode A is the conventional, well-known CELP method: it transmits the drive signal parameters and the spectral parameters (LPC parameters), and additionally transmits one bit of mode information per frame.

Mode B, on the other hand, does not transmit spectral parameters. By reusing the spectral parameters of the previous frame, it increases the number of quantization bits allocated to the drive signal parameters. In each frame, the choice between modes A and B is based on a quality evaluation (using SNR or a similar measure) of the synthesized speech signal of each mode, so the allocation of transmitted information is controlled dynamically by switching between the two modes. In Fig. 7, in mode A, the LPC analysis unit 100 extracts spectral parameters (LPC parameters) from the input speech signal and outputs them to the switching terminal A and to the short-term synthesis filter 110. The parameters of the long-term synthesis filter 150, and the waveform (the index and sign attached to a vector in the codebook) and gain of the vector selected from the codebook (small) 170, are determined in a closed loop so as to minimize the power of the weighted error signal, which is the error between the input speech and the signal synthesized by the short-term synthesis filter 110 (the synthesis filter), weighted by the weighting filter 120.

In mode B, the spectral parameter memory 240 is connected to terminal A and updates its spectral parameters only when mode A has been selected; while the coder stays in mode B, the spectral parameters stored in the memory 240 are not updated and the same values continue to be used. The parameters of the long-term synthesis filter 160 and the waveform and gain from the codebook (large) 180 are determined in the same way as in mode A. The mode decision unit 230 receives the minimum error power computed in each of modes A and B and outputs the mode with the smaller error power as the selected mode.
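The mode decision and the behaviour of the spectral parameter memory can be sketched as below. This is an illustrative reading of the description of Fig. 7 only: the class and function names are hypothetical, and the tie-breaking rule (mode A wins on equal error power) is an assumption the text does not state.

```python
def decide_mode(error_power_a, error_power_b):
    """Return the mode whose minimum weighted error power is smaller.

    Assumption: on a tie, mode A is chosen.
    """
    return "A" if error_power_a <= error_power_b else "B"


class SpectralParameterMemory:
    """Hypothetical model of memory 240: holds the parameters last sent in mode A."""

    def __init__(self, initial):
        self.params = list(initial)

    def update(self, mode, current_params):
        # Only a frame coded in mode A refreshes the stored parameters;
        # in mode B the previously stored parameters are reused unchanged.
        if mode == "A":
            self.params = list(current_params)
        return self.params


memory = SpectralParameterMemory([1.0, -0.5])
first = memory.update(decide_mode(0.2, 0.5), [1.0, -0.9])   # mode A: refresh
second = memory.update(decide_mode(0.5, 0.2), [1.0, -0.7])  # mode B: reuse
print(first, second)
```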

This concludes the description of the multimode CELP method of Fig. 7 (the conventional method).

The first paper also shows that this method improves the segmental SNR by about 2 dB over the conventional CELP method at transmission rates of 4.8 kbit/s and 8 kbit/s.

In this speech coding method, the bit allocation between the drive signal and the spectral parameters varies from frame to frame as the coder switches between mode A and mode B according to the input signal.

When each frame is transmitted with a fixed number of bits, mode A assigns many bits to the spectral parameters and can spare only a few for the drive signal parameters. Mode A is therefore identical to the conventional CELP method, whereas in speech segments where mode B is used, reusing the spectral parameters of the previous frame frees more quantization bits for the drive signal parameters. Mode B thus improves on the speech quality of the CELP method.

It is clear, however, that mode B tends to be selected in speech segments where the spectral parameters of the previous frame can stand in for those of the current frame, that is, in vowel segments whose spectrum changes little over time.

Such segments generally also have high redundancy from the periodic repetition of the drive signal, so even the ordinary CELP method yields synthesized speech with a high SNR there.

Applying mode B coding to such segments can be expected to give an even higher SNR than the CELP method, but once the SNR is reasonably high the difference is hard to perceive.

Moreover, mode A (the ordinary CELP method) tends to be selected in segments other than vowels, where the spectrum changes considerably. Perceptually, therefore, the degradation in speech quality of the ordinary CELP method is not improved.

(Problem to be Solved by the Invention) As described above, the conventional speech coding method switches between two modes, one using the spectral parameters of the current frame and one using those of the previous frame, so that the bit allocation between drive signal parameters and spectral parameters varies from frame to frame. In speech segments such as consonants, where the spectrum changes considerably over time, the mode that reuses the previous frame's spectral parameters is rarely selected. At low rates, the degradation of speech quality in non-stationary segments that affects the conventional CELP coding method is therefore left unimproved.

The present invention has been made to solve this problem, and its object is to provide a speech encoding system that can obtain high-quality synthesized speech at a low transmission bit rate.

[Structure of the Invention] (Means for Solving the Problems) To achieve the above object, the speech encoding system of the present invention obtains a synthesized speech signal by driving a synthesis filter consisting of a pole filter and a zero filter with a drive signal, and is characterized by having means for storing coefficient information of the zero filter and by obtaining the synthesized speech signal using that coefficient information.

(Function) The speech encoding system of the present invention, configured as above, has means for storing the coefficient information of the zero filter in a synthesis filter consisting of a pole filter and a zero filter, and obtains the synthesized speech signal using this coefficient information. Even in speech segments such as consonants, where the spectrum changes considerably, a filter suited to the speech in that segment can therefore be selected, so stable, high-quality synthesized speech can be obtained.

(Example) The encoding system of the present invention is described in detail below with reference to the drawings.

Figs. 1 and 2 are block diagrams for carrying out the speech encoding system of the present invention. In Fig. 1, the LPC analysis unit 100 performs linear prediction and pitch detection on the input speech signal and outputs the results to the short-term synthesis filter 110 and the long-term synthesis filter 150. The waveform (the index and sign attached to a vector in codebook A) and gain of the vector selected from the codebook A175 are input to the long-term synthesis filter 150 via the multiplication circuit 190. The long-term synthesis filter 150 removes the pitch periodicity of the input speech signal. Its output is fed to the short-term synthesis filter 110 (hereinafter, the synthesis filter), which generates a synthesized speech signal from the prediction parameters obtained by the linear prediction of the LPC analysis unit 100 (the coefficient information of the synthesis filter (pole filter) 110).

According to the present invention, the synthesis filter is a pole-zero filter and therefore includes a zero filter 115, whose coefficient information is held in the codebook B176. The error signal between the input signal and the synthesized speech signal output from the synthesis filter 113, which consists of the zero filter 115 and the pole filter, is weighted by the weighting filter 120, and the power of this weighted error signal is evaluated while the entries selected from the codebook A175 and the codebook B176 are varied in a closed loop. When the weighted error reaches its minimum, the distortion comparator 210 outputs the index in the codebook A175 and the index in the codebook B176 at that minimum as the encoded signal corresponding to the input speech signal.

When B(z) in Fig. 2, which corresponds to the zero filter 115 of Fig. 1, is B(z) = 1, there is no zero filter coefficient information. When transmitting at a fixed rate, the total number of bits available for the drive signal parameters and the zero filter parameters is fixed, but as long as the total is constant the bits may be divided between them freely. Hence, when B(z) = 1 the zero filter parameters need not be sent and more bits can be allocated to the drive signal parameters. Conversely, when B(z) ≠ 1 the zero filter coefficients must also be transmitted, so fewer bits are allocated to the drive signal parameters.

Next, Fig. 3 is a block diagram of a system that uses several of the speech coding schemes shown in Fig. 1. In Fig. 3, when B(z) ≠ 1, the zero filter 115 has the codebook B176, so the drive signal codebook is smaller than the drive signal parameter codebook 180 used with the zero filter 116, for which B(z) = 1.

Fig. 4 is a block diagram in which the encoding system according to one embodiment of the present invention is applied to an encoding device.

In Fig. 4, a sequence of A/D-converted input speech samples is supplied from the input terminal 10. The frame buffer 11 is a circuit that stores one frame of the input speech signal. Each block in Fig. 4 performs the following processing frame by frame, or subframe by subframe where a frame is divided into several subframes.

The prediction parameter calculation circuit 12 calculates the prediction parameters by a known method. When the prediction filter is a cascade of a long-term prediction filter 41 and a short-term prediction filter 42, as shown in Fig. 5, the prediction parameter calculation circuit 12 calculates the pitch period, the pitch prediction coefficient and the linear prediction coefficients (α parameters or K parameters, collectively called LPC parameters) by a known method such as the autocorrelation method or the covariance method.

The calculation methods are described, for example, in Sadaoki Furui, "Digital Speech Processing" (Tokai University Press, 1985). The calculated prediction parameters are input to the prediction parameter encoding circuit 13, which encodes them with a predetermined number of quantization bits and outputs the resulting code to the multiplexer 25 as well as to the gain calculation circuit 15, the synthesis filter 18 and the weighting filter 20.

The gain calculation circuit 15 constructs a pole-zero synthesis filter H(z) from the zero filter coefficients supplied by the zero filter coefficient codebook 14 (described later), the coefficient update signal output by the coefficient search circuit 24, and the prediction parameters (the coefficient information of the pole filter) from the encoding circuit 13. Using the inverse filter 1/H(z) as a prediction filter, it predicts the input speech signal and produces a prediction residual signal. The gain calculation circuit 15 then computes the average power of the prediction residual signal and outputs it as the gain to the encoding circuit 16. The standard deviation, for example, can be used as the average power of the prediction residual signal.
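A minimal sketch of the gain calculation, assuming H(z) = B(z)/A(z) with both polynomials written as coefficient lists in powers of z^-1 (the patent does not fix a sign or ordering convention, so this is an assumption). The function names are hypothetical. The inverse filter 1/H(z) = A(z)/B(z) is obtained simply by swapping the two coefficient lists.

```python
import math


def pole_zero_filter(b, a, x):
    """Direct-form filter with transfer function B(z)/A(z).

    b, a: coefficient lists in powers of z^-1; a[0] must be nonzero.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def residual_gain(frame, b, a):
    """Apply 1/H(z) = A(z)/B(z) to the frame; return (residual, gain).

    The gain is the standard deviation of the residual, as the text suggests.
    """
    residual = pole_zero_filter(a, b, frame)
    mean = sum(residual) / len(residual)
    variance = sum((r - mean) ** 2 for r in residual) / len(residual)
    return residual, math.sqrt(variance)


# Round trip with illustrative coefficients: synthesize, then invert.
b = [1.0, 0.5]    # zero filter B(z); [1.0] alone would mean B(z) = 1
a = [1.0, -0.9]   # pole filter A(z)
drive = [1.0, 0.0, -1.0, 0.5]
speech = pole_zero_filter(b, a, drive)
residual, gain = residual_gain(speech, b, a)
print(residual, gain)
```

With zero initial filter state, the residual recovers the drive signal that produced the frame, so the gain reflects the level of that drive signal.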

The encoding circuit 16 encodes the gain with a predetermined number of quantization bits and outputs the code to the multiplexer 25 and the multiplication circuit 17. The zero filter coefficient codebook 14 stores filter coefficient information for the 2^M zero filters that correspond to a predetermined order and a number of quantization bits M. If one of the zero filters B(z) stored in the zero filter coefficient codebook 14 holds filter information for which B(z) = 1, an all-pole synthesis filter H(z) that uses no zero filter can be produced automatically with the same configuration.

In this embodiment, the zero filter coefficient codebook 14 stores a number of kinds of zero filter coefficient information that is illegible in the source text (it reads "２ト１"), and the codebook 14 is assumed to have been constructed in advance so that the zero filter B(z) built from its first code vector is B(z) = 1.

Based on the code update signal input from the coefficient search circuit 24, the zero filter coefficient codebook 14 outputs the zero filter coefficients (code vector) stored in it to the gain calculation circuit 15 and the synthesis filter 18, and outputs to the codebook 21 information PZ indicating whether the zero filter B(z) is B(z) = 1 or B(z) ≠ 1.

The codebook 21 outputs to the multiplication circuit 17 a restricted number of code vectors, preset according to the information PZ from the codebook 14. The output of code vectors is controlled by the code update signal input from the code search circuit 23. The restriction on the search range of code vectors in the codebook 21 can be determined, for example, as follows.

If the information PZ from the codebook indicates the zero filter B(z) = 1, there is no zero filter coefficient information, so correspondingly more bits can be allocated to the drive signal and the search range of code vectors in the codebook 21, which represent the shape of the drive signal, can be widened.

Conversely, if the information PZ indicates a zero filter B(z) ≠ 1, the zero filter coefficient information must be transmitted, so correspondingly fewer bits are allocated to the drive signal and the search range of code vectors in the codebook 21 is narrowed.

The multiplication circuit 17 multiplies the code vector output from the codebook 21 by the encoded gain to produce a candidate drive signal vector and inputs it to the synthesis filter 18.

The synthesis filter 18 receives the coefficient information of the zero filter from the zero filter coefficient codebook 14 and the coefficient information of the pole filter from the encoding circuit 13 (together these are called the spectral parameters), constructs the synthesis filter H(z), and outputs a synthesized speech signal with the candidate drive vector from the multiplication circuit 17 as its input.

The subtraction circuit 19 receives the input speech signal and the synthesized speech signal and outputs their error signal.

The weighting filter 20 weights the error signal with weights derived from the prediction parameters and outputs the result. Its transfer function is

$$W(z) = \frac{A(z)}{A(z/\gamma)}, \qquad 0 \le \gamma \le 1 \tag{1}$$

The filter is known to exploit auditory masking so that coding noise contained in the synthesized speech is harder to hear on decoding. In equation (1), A(z) denotes the prediction filter constructed from the prediction parameters.
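As an illustration of equation (1), the sketch below builds A(z/γ) by scaling the k-th coefficient of A(z) by γ^k and then filters an error signal with W(z) = A(z)/A(z/γ). The coefficient-list convention (powers of z^-1 with a leading coefficient of 1) and the function names are assumptions made for this example.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): coefficient k of A(z) is scaled by gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def weighting_filter(error, a, gamma):
    """Apply W(z) = A(z) / A(z/gamma) to an error signal.

    a: coefficients of A(z) in powers of z^-1, with a[0] assumed to be 1.
    """
    den = bandwidth_expand(a, gamma)
    out = []
    for n in range(len(error)):
        # Numerator A(z) acts on the input samples.
        acc = sum(a[k] * error[n - k] for k in range(len(a)) if n - k >= 0)
        # Denominator A(z/gamma) acts on past output samples.
        acc -= sum(den[k] * out[n - k] for k in range(1, len(den)) if n - k >= 0)
        out.append(acc)
    return out


a = [1.0, -0.9]                       # illustrative first-order A(z)
impulse = [1.0, 0.0, 0.0]
flat = weighting_filter(impulse, a, 1.0)      # gamma = 1: W(z) = 1
full = weighting_filter(impulse, a, 0.0)      # gamma = 0: W(z) = A(z)
between = weighting_filter(impulse, a, 0.8)
print(flat, full, between)
```

The two extremes show the design choice: γ = 1 leaves the error unweighted, γ = 0 weights it by the full prediction filter A(z), and intermediate values trade between the two.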

The squared error calculation circuit 22 calculates the sum of squares of the weighted error signal for each code vector output from the codebook 21 and outputs the result to the code search circuit 23. It also outputs the sum of squares of the error signal accumulated over one frame to the coefficient search circuit 24.

The code search circuit 23 receives from the coefficient search circuit 24 (described later) the code number of the zero filter currently being searched. For each zero filter code number it searches the codebook 21 for the code that minimizes the squared error of each subframe, and holds that code. When the coefficient search circuit 24 finally determines the zero filter code number, the code search circuit 23 receives this number and outputs to the multiplexer 25 the drive signal code it was holding for that zero filter code number.

The coefficient search circuit 24 compares the sums of squares of the error signal, computed per frame for each zero filter code number and input from the squared error calculation circuit 22, selects the zero filter code number that gives the minimum, and outputs this code number to the multiplexer 25 and the code search circuit 23. If the selected zero filter code number is 1, the zero filter is not used, as described above, so the drive signal code output from the code search circuit 23 is represented with more bits than when the zero filter is used. The coefficient search circuit 24 also outputs to the multiplexer 25 the information on whether the zero filter is used. Table 1 shows an example of the bit allocation between the drive signal and the spectral parameters in this embodiment.
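The two-level search performed by circuits 23 and 24 can be sketched as a nested loop. This is a simplified illustration under several assumptions: one subframe per frame, no perceptual weighting of the error, a toy synthesis function that applies only the zero filter (the pole filter is omitted for brevity), and 0-based indices, so index 0 here plays the role of code number 1 with B(z) = 1. All names are hypothetical.

```python
def search_codebooks(target, zero_codebook, drive_codebook, synthesize, search_size):
    """Return (zero_index, drive_index, error) with the smallest squared error.

    synthesize:  callable (zero_coeffs, drive_vector) -> synthesized frame
    search_size: callable (zero_index) -> number of drive vectors to search,
                 modelling the restricted search range of codebook 21
    """
    best_zero, best_drive, best_err = None, None, float("inf")
    for zi, zero_coeffs in enumerate(zero_codebook):
        # Inner search: best drive code for this zero filter.
        held_drive, held_err = None, float("inf")
        for di in range(search_size(zi)):
            synth = synthesize(zero_coeffs, drive_codebook[di])
            err = sum((t - s) ** 2 for t, s in zip(target, synth))
            if err < held_err:
                held_drive, held_err = di, err
        # Outer search: keep the zero filter with the smallest error.
        if held_err < best_err:
            best_zero, best_drive, best_err = zi, held_drive, held_err
    return best_zero, best_drive, best_err


def fir_synthesize(zero_coeffs, drive):
    """Toy synthesis: apply only the zero filter B(z) to the drive vector."""
    return [
        sum(zero_coeffs[k] * drive[n - k] for k in range(len(zero_coeffs)) if n - k >= 0)
        for n in range(len(drive))
    ]


zero_codebook = [[1.0], [1.0, 0.5]]            # entry 0 is B(z) = 1
drive_codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
target = [1.0, 0.5, 0.0, 0.0]
result = search_codebooks(
    target, zero_codebook, drive_codebook, fir_synthesize,
    search_size=lambda zi: 4 if zi == 0 else 2,  # wider search when B(z) = 1
)
print(result)
```

In this toy case the target contains a component that only the non-trivial zero filter can produce, so the search selects it even though its drive codebook search range is half as large.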

Table 1 (table body missing from the source text)

In Table 1, the synthesis filter used is either an all-pole filter or a pole-zero filter, depending on whether the zero filter is B(z) = 1 or B(z) ≠ 1.

Let the number of bits per frame be R. With the all-pole filter, the spectral parameters use only the K bits of the pole filter, so the drive signal naturally receives R − K bits, and the number of bits per frame is always the constant R. With the pole-zero filter, M further bits are assigned to the zero filter as spectral parameter bits, and the remainder goes to the drive signal.
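The bit budget just described amounts to simple arithmetic, sketched below. The function name and the numbers R = 96, K = 30, M = 6 are illustrative assumptions, not values from Table 1, and any bit spent on signalling whether the zero filter is used is ignored here.

```python
def drive_bits(frame_bits, pole_bits, zero_bits, use_zero_filter):
    """Bits left for the drive signal once the spectral parameters are paid for."""
    spectral_bits = pole_bits + (zero_bits if use_zero_filter else 0)
    remaining = frame_bits - spectral_bits
    if remaining < 0:
        raise ValueError("spectral parameters exceed the frame budget")
    return remaining


R, K, M = 96, 30, 6                      # hypothetical frame, pole and zero bits
all_pole = drive_bits(R, K, M, use_zero_filter=False)
pole_zero = drive_bits(R, K, M, use_zero_filter=True)
print(all_pole, pole_zero)
```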

The multiplexer 25 multiplexes the input code information and outputs it from the terminal 26 to the transmission path.

Thus, in the speech coding of the present invention, the bit allocation between the filter that represents the spectral envelope and the drive signal parameters adapts frame by frame to changes in the character of the input speech signal. In addition, the filter is represented in pole-zero form, and the quantization of the zero filter coefficients, that is, the selection from the codebook, is carried out so as to minimize the perceptually weighted error between the input speech signal and the synthesized speech signal. A filter suited to the segment can therefore be selected even for speech segments whose spectrum changes considerably over time, so the quality of the synthesized speech is improved consistently.

Note that the embodiment described here is one embodiment of the present invention.

[Brief Description of the Drawings]

Figs. 1 and 2 are block diagrams for carrying out the speech encoding system of the present invention; Fig. 3 is a block diagram in which the speech encoding system of the present invention is used with a plurality of speech coding schemes; Fig. 4 is a block diagram showing a configuration in which the speech encoding system according to one embodiment of the present invention is applied to an encoding device; Fig. 5 is a block diagram showing a configuration example of the prediction filter described in the embodiment of Fig. 4; and Figs. 6 and 7 are block diagrams showing the configuration of encoding devices according to the prior art.

110: short-term synthesis filter (pole filter); 113: synthesis filter; 115: zero filter; 175, 176: codebooks; 195: drive signal generator.

As detailed above, the speech encoding system of the present invention makes it possible to obtain stable, high-quality synthesized speech.

Claims

[Claims]

(1) A speech encoding system that obtains a synthesized speech signal by driving a synthesis filter consisting of a pole filter and a zero filter with a drive signal, characterized by having means for storing coefficient information of the zero filter and by obtaining the synthesized speech signal using the coefficient information.

(2) A speech encoding system that obtains a synthesized speech signal by driving a synthesis filter consisting of a pole filter and a zero filter with a drive signal, characterized by having means for storing coefficient information of said filter, generating a synthesized speech signal using the coefficient information, and selecting the coefficient information of the zero filter on the basis of the distortion between this synthesized speech signal and the input speech signal.

(3) A speech encoding system that selects one encoding scheme from a plurality of encoding schemes, which differ in the bit allocation between drive signal parameters and the parameters of a synthesis filter consisting of a pole filter and a zero filter, by calculating the distortion between the input speech signal and the synthesized speech signal of each encoding scheme, characterized in that at least one of the plurality of encoding schemes has means for storing coefficient information of the zero filter, generates a synthesized speech signal using the coefficient information, and selects the coefficients of the zero filter on the basis of the distortion between this synthesized speech signal and the input speech signal.

(4) The speech encoding system according to claim 2 or 3, characterized in that the bit allocation between the drive signal parameters and the spectral parameters is determined by whether or not the zero filter is used in the synthesis filter.

(5) The speech encoding system according to claim 2 or 3, characterized in that the pole filter in the synthesis filter is common to all of the encoding schemes.

(6) The speech encoding system according to claim 2 or 3, characterized in that the filter coefficients of the zero filter in the synthesis filter are selected on the basis of the perceptually weighted error between the input speech signal and the synthesized speech signal.