JP3954716B2 - Excitation signal encoding apparatus, excitation signal decoding apparatus and method thereof, and recording medium

Info

Publication number: JP3954716B2
Application number: JP05618098A
Authority: JP
Inventors: 宏幸江原; 利幸森井
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-02-19
Filing date: 1998-02-19
Publication date: 2007-08-08
Anticipated expiration: 2018-02-19
Also published as: JPH11237899A

Abstract

PROBLEM TO BE SOLVED: To improve speech quality at a low bit rate of about 4 kbit/s by specifying the excitation vector candidate of a target channel using the indices of other channels in addition to its own index. SOLUTION: The indices of pulse 1 and pulse 2 output from a pulse-1 index generator 101 and a pulse-2 index generator 102 are input to a first index/pulse position converter 104, which determines the position of pulse 1 from those two indices. That is, by representing the position of one pulse with the bit information assigned to another pulse in addition to the bits assigned to that pulse, the search range of the excitation pulses can be expanded to double the length without increasing the number of bits.

Description

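【０００１】
【Technical Field of the Invention】
The present invention relates to a CELP (Code Excited Linear Prediction) type excitation signal encoding apparatus, an excitation signal decoding apparatus, methods thereof, and a recording medium, for use in mobile communication systems and the like that encode and transmit speech signals.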
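【０００２】
【Prior Art】
In the fields of digital mobile communication and speech storage, speech coding apparatuses that compress speech information and encode it with high efficiency are used to make effective use of radio spectrum and storage media. Among them, schemes based on CELP (Code Excited Linear Prediction) are widely used in practice at medium and low bit rates. The CELP technique is described in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates", Proc. ICASSP-85, 25.1.1, pp. 937-940, 1985.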
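【０００３】
In CELP speech coding, speech is divided into frames of a fixed length (about 5 ms to 50 ms), linear prediction is performed on each frame, and the prediction residual (excitation signal) of each frame is encoded using an adaptive code vector and a noise code vector composed of known waveforms. The adaptive code vector is selected from an adaptive codebook that stores previously generated excitation vectors, and the noise code vector is selected from a noise codebook that stores a predetermined number of vectors with predetermined shapes prepared in advance.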
【０００４】
The noise code vectors stored in the noise codebook are, for example, vectors of random noise sequences or vectors generated by placing several pulses at different positions.
【０００５】
A representative example of the latter is CS-ACELP (Conjugate Structure and Algebraic CELP), recommended as an international standard by the ITU-T in 1996. CS-ACELP is described in "Recommendation G.729: Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)", March 1996.
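【０００６】
CS-ACELP uses an algebraic codebook as its noise codebook. A noise code vector generated from the CS-ACELP algebraic codebook is a vector in which four impulses of amplitude -1 or +1 are placed in a 40-sample (5 ms) subframe (basically all zero except at the four pulse positions). Because the absolute amplitude is fixed at 1, only the position and polarity (sign) of each pulse need to be represented to describe the excitation vector. The vectors therefore do not have to be stored in the codebook as 40-dimensional (subframe-length) vectors, and no memory for codebook storage is required. In addition, because each vector contains only four pulses of amplitude 1, the computation required for the codebook search is greatly reduced.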
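【０００７】
A conventional CS-ACELP coding apparatus is described in detail below with reference to FIG. 37.
【０００８】
FIG. 37 is a basic block diagram of a CS-ACELP speech coding apparatus. In the figure, 1 is an adaptive codebook that receives the previously generated excitation vector output from adder 6 and, under a control signal from distortion minimizer 12, outputs an adaptive code vector to multiplier 3; 2 is an algebraic codebook that, under a control signal from distortion minimizer 12, outputs a noise code vector to multiplier 4; 3 is a multiplier that multiplies the adaptive code gain output from gain codebook 8 by the adaptive code vector output from adaptive codebook 1 and outputs the product to adder 6; 4 is a multiplier that multiplies the predicted gain output from gain predictor 7 by the noise code vector output from algebraic codebook 2 and outputs the product to multiplier 5; 5 is a multiplier that multiplies the noise code vector output from multiplier 4, already scaled by the predicted gain, by the noise code gain output from gain codebook 8 and outputs the product to adder 6 and gain predictor 7; 6 is an adder that performs vector addition of the gain-scaled adaptive code vector from multiplier 3 and the gain-scaled noise code vector from multiplier 5 and outputs the result to synthesis filter 10 and adaptive codebook 1; 7 is a gain predictor that receives the gain-scaled noise code vector from multiplier 5 and outputs a predicted gain to multiplier 4; 8 is a gain codebook that, under a control signal from distortion minimizer 12, outputs the adaptive code gain to multiplier 3 and the noise code gain to multiplier 5; 9 is a linear prediction analyzer that performs linear prediction analysis on the input speech signal and outputs linear prediction coefficients to the synthesis filter; 10 is a synthesis filter that receives the excitation vector from adder 6 and the linear prediction coefficients from linear prediction analyzer 9 and outputs a synthesized speech signal to adder 11; 11 is an adder that computes the difference between the input speech signal and the synthesized speech signal from synthesis filter 10; and 12 is a coding distortion minimizer that computes the coding distortion from the error signal output from adder 11 and sends control signals to adaptive codebook 1, algebraic codebook 2, and gain codebook 8 so that the coding distortion is minimized. The index A of the adaptive code vector, the index S of the noise code vector, and the index G of the gain vector that minimize the distortion, as determined by distortion minimizer 12, are each transmitted to the decoding apparatus. The linear prediction coefficients obtained by the linear prediction analyzer are quantized and then transmitted to the decoding apparatus as quantized linear prediction coefficients L.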
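【０００９】
The operation of the CS-ACELP speech coding apparatus configured as above is described below with reference to FIGS. 37 to 44.
【００１０】
First, in FIG. 37, the speech signal is input to linear prediction analyzer 9. Linear prediction analyzer 9 performs linear prediction analysis on the input speech signal and calculates the linear prediction coefficients required by synthesis filter 10. The calculated coefficients are quantized and then output to synthesis filter 10. The code L representing the linear prediction coefficients output here is transmitted to the decoding apparatus.
【００１１】
Next, synthesis filter 10 is constructed using the quantized linear prediction coefficients output from linear prediction analyzer 9. Distortion minimizer 12 drives synthesis filter 10 using only adaptive codebook 1 and selects from adaptive codebook 1 the adaptive code vector that gives the smallest distortion. From then on, the adaptive code vector output from adaptive codebook 1 is fixed to the vector determined here.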
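【００１２】
Next, synthesis filter 10 is driven using both adaptive codebook 1 and algebraic codebook 2. Because the adaptive code vector output from adaptive codebook 1 has already been determined, the noise code vector output from algebraic codebook 2 is varied, and the noise code vector judged by distortion minimizer 12 to give the minimum distortion is selected.
【００１３】
The processing up to this point determines the adaptive code vector output from adaptive codebook 1 and the noise code vector output from algebraic codebook 2.
【００１４】
Finally, distortion minimizer 12 determines the adaptive code gain by which the adaptive code vector is to be multiplied and the noise code gain by which the noise code vector is to be multiplied. Gain codebook 8 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to multiplier 3 and its noise code gain element to multiplier 5. The index G representing the selected gain vector is transmitted to the decoding apparatus.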
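【００１５】
In the CS-ACELP speech coding apparatus, to narrow the dynamic range of the noise code gain, the power in the logarithmic domain is predicted by gain predictor 7, and the prediction residual is quantized using gain codebook 8. Gain predictor 7 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs the prediction result to multiplier 4 as the predicted gain.
【００１６】
The noise code vector output from algebraic codebook 2 is multiplied by the predicted gain in multiplier 4 and by the noise code gain in multiplier 5. Using the adaptive code vector, noise code vector, and predicted gain that have been determined before the optimal gain vector is selected from gain codebook 8, distortion minimizer 12 determines the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.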
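【００１７】
The structure of the algebraic codebook, one of the characteristic features of CS-ACELP, is explained below with reference to FIGS. 38 and 39.
The CS-ACELP algebraic codebook consists of four channels. Each channel outputs one pulse of amplitude +1 or -1. The position of the pulse output from each channel is restricted, and a pulse can be placed only at positions within a predetermined range. In CS-ACELP the excitation signal is encoded in units of 40-sample (5 ms) subframes. FIG. 38(a) shows the sample points within one such subframe. These 40 sample points are divided into the four groups shown in FIGS. 38(b) to (e). With the first sample point numbered 0 and the following ones numbered 1, 2, 3, ..., 39 in order, FIG. 38(b) shows the group of sample points whose number is divisible by 5, that is, 0, 5, 10, ..., 35. FIG. 38(c) likewise shows the group whose number leaves a remainder of 1 when divided by 5, that is, 1, 6, 11, ..., 36. FIG. 38(d) shows the group whose number leaves a remainder of 2 when divided by 5, that is, 2, 7, 12, ..., 37. FIG. 38(e) shows the group whose number leaves a remainder of 3 or 4 when divided by 5, that is, 3, 8, 13, ..., 38 and 4, 9, 14, ..., 39.
One position is chosen from the sample points in each of these groups, and a pulse of amplitude +1 or -1 is placed there as in FIGS. 38(g) to (j). The four pulses placed in this way together form the noise code vector output from the algebraic codebook (FIG. 38(f)).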
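The channel layout described above can be sketched in a few lines of Python. This is only an illustration of the grouping in FIG. 38 as the text describes it; the names `TRACKS` and `build_code_vector` are hypothetical and do not come from the patent or from G.729 reference code.

```python
SUBFRAME_LEN = 40  # samples per subframe (5 ms), as stated in the text

# Candidate pulse positions per channel, following the remainder-mod-5 grouping.
TRACKS = [
    list(range(0, SUBFRAME_LEN, 5)),   # remainder 0: 0, 5, ..., 35
    list(range(1, SUBFRAME_LEN, 5)),   # remainder 1: 1, 6, ..., 36
    list(range(2, SUBFRAME_LEN, 5)),   # remainder 2: 2, 7, ..., 37
    sorted(list(range(3, SUBFRAME_LEN, 5)) + list(range(4, SUBFRAME_LEN, 5))),  # remainder 3 or 4
]


def build_code_vector(positions, signs):
    """Place one +1/-1 pulse per channel and return the 40-sample vector."""
    vec = [0] * SUBFRAME_LEN
    for ch, (pos, sign) in enumerate(zip(positions, signs)):
        if pos not in TRACKS[ch]:
            raise ValueError(f"position {pos} is not allowed on channel {ch}")
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        vec[pos] += sign
    return vec


# Example: one pulse per channel.
vec = build_code_vector([5, 11, 27, 39], [1, -1, 1, -1])
```

Because the first three channels have 8 candidate positions each and the fourth has 16, only positions and signs need to be coded, which is why no stored table of 40-dimensional vectors is needed.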
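【００１８】
Thus, in algebraic codebook 2, the amplitude and position of the pulses are the only information needed to represent a noise vector, and the algebraic codebook is expressed by pulse amplitude information and position information as shown in FIG. 39.
Next, the method of determining the amplitude and position of the pulse within each group is described. To greatly reduce the computation of the search, the pulse amplitudes are determined before the position search is performed. They are chosen so that every term of the evaluation function used for distortion minimization is positive.
Next, the pulse position search method is described with reference to FIGS. 40 and 41.
FIG. 40 is a flowchart of the nested loops used for the pulse position search. FIG. 40 shows the case of three pulses. The pulses are determined in order from channel 1 to channel 3. When the position of the channel 1 pulse (pulse 1) is fixed, the error evaluation function used for distortion minimization is computed from pulse 1 alone; when the position of the channel 2 pulse (pulse 2) is then fixed, the error evaluation function is computed from pulse 1 and pulse 2. In this way the error evaluation function is obtained in a nested loop structure in which, each time a pulse of a new channel is added, the terms of the error evaluation function related to the newly added pulse are added. FIG. 41 expresses FIG. 40 in program form.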
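The nested-loop structure described above can be sketched as follows. This is a minimal illustration, not the program of FIG. 41: it assumes the usual CELP correlation quantities `d` (target correlated with the impulse response) and `phi` (impulse response autocorrelations), and it assumes the pulse signs are fixed in advance to the sign of `d` at each position. The function names and the 12-sample example are hypothetical.

```python
def correlations(h, x):
    """Return d = H^t x and phi = H^t H for the lower-triangular convolution matrix H."""
    n = len(x)
    H = [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)] for i in range(n)]
    d = [sum(H[i][j] * x[i] for i in range(n)) for j in range(n)]
    phi = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(n)] for a in range(n)]
    return d, phi


def nested_search(d, phi, tracks):
    """Three-pulse search; each loop level adds only the terms of its own pulse."""
    sign = [1.0 if v >= 0 else -1.0 for v in d]  # signs decided before the position search
    best_score, best_pos = -1.0, None
    for p1 in tracks[0]:
        c1 = abs(d[p1])                          # correlation term of pulse 1
        e1 = phi[p1][p1]                         # energy term of pulse 1
        for p2 in tracks[1]:
            c2 = c1 + abs(d[p2])
            e2 = e1 + phi[p2][p2] + 2.0 * sign[p1] * sign[p2] * phi[p1][p2]
            for p3 in tracks[2]:
                c3 = c2 + abs(d[p3])
                e3 = e2 + phi[p3][p3] + 2.0 * sign[p3] * (
                    sign[p1] * phi[p1][p3] + sign[p2] * phi[p2][p3])
                score = c3 * c3 / e3             # larger score means smaller distortion
                if score > best_score:
                    best_score, best_pos = score, (p1, p2, p3)
    return best_pos, [sign[p] for p in best_pos], best_score


h = [1.0, 0.5, 0.25]                             # hypothetical impulse response
x = [0.3, -1.2, 0.8, 0.1, -0.4, 0.9, -0.7, 0.2, 0.5, -0.1, 0.6, -0.3]  # hypothetical target
d, phi = correlations(h, x)
tracks = [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
pos, signs, score = nested_search(d, phi, tracks)
```

With the signs chosen this way, every correlation term `abs(d[p])` is non-negative, which matches the statement that the amplitudes are chosen so that the terms of the evaluation function are positive.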
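【００１９】
FIG. 42 shows an example of a pulse search apparatus that implements the pulse search method described above. Here the algebraic codebook has three channels and the number of pulses is three. In FIG. 42, 13 is a pulse-1 index generator that outputs an index representing the position of pulse 1; 14 is a pulse-2 index generator that outputs an index representing the position of pulse 2; 15 is a pulse-3 index generator that outputs an index representing the position of pulse 3; 16 is a first index/pulse position converter that receives the pulse 1 index output from pulse-1 index generator 13, converts it into the actual position of pulse 1, and outputs that position to distortion evaluation function calculator 19; 17 is a second index/pulse position converter that likewise receives the pulse 2 index output from pulse-2 index generator 14, converts it into the actual position of pulse 2, and outputs that position to distortion evaluation function calculator 19; 18 is a third index/pulse position converter that likewise receives the pulse 3 index output from pulse-3 index generator 15, converts it into the actual position of pulse 3, and outputs that position to distortion evaluation function calculator 19; 19 is a distortion evaluation function calculator that receives the pulse positions output from the first index/pulse position converter 16, the second index/pulse position converter 17, and the third index/pulse position converter 18, calculates the distortion evaluation function, and outputs its value together with the corresponding pulse positions to distortion minimizer 20; and 20 is a distortion minimizer that receives the distortion evaluation function value and the pulse positions output from distortion evaluation function calculator 19 and outputs the combination of pulses that minimizes the distortion.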
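【００２０】
An example of a conventional algebraic codebook structure for handling a longer frame (subframe) length is described next with reference to FIGS. 43 and 44.
【００２１】
Here too the algebraic codebook has three channels and three excitation pulses. In FIG. 43, the full search range, one subframe (a), is first divided into two groups: the even-numbered sample points (b) and the odd-numbered sample points (c). The sample points within each group are then further divided into three groups, each of which serves as a channel and gives the search positions for pulses 1 to 3. The method of dividing into three is the same as that shown for CS-ACELP in FIG. 38.
【００２２】
Thus, in addition to restricting the search range for each channel, the sample points are split into an even mode and an odd mode, so that all sample points in a long subframe can be searched with fewer bits.
【００２３】
In this case the algebraic codebook is as shown in FIG. 44, and one bit of mode information is added to the pulse amplitude and pulse position information. This technique is described in "Recommendation G.723.1: Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 & 6.3 kbit/s", March 1996.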
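【００２４】
【Problems to be Solved by the Invention】
However, when the frame length is increased to lower the bit rate of a speech coding apparatus using the conventional algebraic codebook further, the number of sample points over which the pulse search must be performed increases, and it becomes difficult to secure the number of bits needed to represent the position of each excitation pulse of the algebraic codebook.
【００２５】
The present invention has been made in view of these circumstances, and its object is to provide an excitation signal encoding apparatus, an excitation signal decoding apparatus, methods thereof, and a recording medium that use an algebraic codebook and can improve speech quality at a low bit rate of about 4 kbit/s.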
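【００２６】
【Means for Solving the Problems】
To achieve the above object, the present invention represents the position of one pulse of the algebraic codebook using, in addition to the bits assigned to that pulse, the bit information assigned to at least one other pulse, so that the search range of the excitation pulses can be expanded to at least twice the length without increasing the number of bits.
【００２７】
The present invention also provides a configuration with a plurality of types of algebraic codebooks, so that an algebraic codebook with many pulses and an insufficient number of bits is used effectively to improve the quality of speech with a short pitch period, while an algebraic codebook with few pulses and a sufficient number of bits is used to improve the quality of voiced onsets and similar segments.
【００２８】
The present invention also improves speech quality by using an algebraic codebook together with a random noise codebook.
【００２９】
The present invention also improves speech quality, in a configuration with a plurality of types of algebraic codebooks, by adaptively changing part of the algebraic codebook in the mode with few pulses and an insufficient number of bits, using the pitch peak position obtained from the adaptive code vector.
【００３０】
The present invention also provides, in a noise codebook consisting of two or more channels, switching of the codebook in use according to the combination of the channels, so that the mode can be switched without separate mode information; by making part of each channel an algebraic codebook, the amount of computation and memory is also reduced.
【００３１】
The present invention also provides, in a noise codebook consisting of two channels where an algebraic codebook is used for one channel and a random noise codebook for the other, alignment of the reference point of the noise code vectors stored in the random noise codebook with the position of the pulse output from the algebraic codebook, thereby improving both the utilization efficiency of the random noise codebook and the speech quality.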
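【００３２】
【Embodiments of the Invention】
Speech coding apparatuses and speech decoding apparatuses according to embodiments of the present invention are described below with reference to the drawings.
【００６３】
(Embodiment 1)
FIG. 1 shows the noise code vector generator of the excitation signal encoding apparatus according to Embodiment 1 of the present invention. The noise code vector generator shown in the figure has a random codebook with the function of determining the excitation vector output from a particular channel using the index output from that channel and the index output from at least one other channel.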
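【００６４】
In FIG. 1, 101 is a pulse-1 index generator that outputs an index representing the position of the excitation pulse (pulse 1) output from channel 1 of the random codebook to the first index/pulse position converter 104 and the third index/pulse position converter 106; 102 is a pulse-2 index generator that outputs an index representing the position of the excitation pulse (pulse 2) output from channel 2 of the random codebook to the second index/pulse position converter 105 and the first index/pulse position converter 104; 103 is a pulse-3 index generator that outputs an index representing the position of the excitation pulse (pulse 3) output from channel 3 to the third index/pulse position converter 106 and the second index/pulse position converter 105; 104 is a first index/pulse position converter that receives the indices output from pulse-1 index generator 101 and pulse-2 index generator 102 and outputs the position of pulse 1 to distortion evaluation function calculator 107; 105 is a second index/pulse position converter that receives the indices output from pulse-2 index generator 102 and pulse-3 index generator 103 and outputs the position of pulse 2 to distortion evaluation function calculator 107; 106 is a third index/pulse position converter that receives the indices output from pulse-3 index generator 103 and pulse-1 index generator 101 and outputs the position of pulse 3 to distortion evaluation function calculator 107; 107 is a distortion evaluation function calculator that receives the pulse positions output from the first, second, and third index/pulse position converters 104, 105, and 106 and outputs the distortion evaluation function value and the pulse positions to distortion minimizer 108; and 108 is a distortion minimizer that receives the distortion evaluation function value and the pulse positions output from distortion evaluation function calculator 107 and outputs the combination that minimizes the distortion.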
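【００６５】
The operation of the noise code vector generator configured as above is described with reference to FIGS. 1 to 5.
【００６６】
In FIG. 1, pulse-1 index generator 101 selects one of the predetermined pulse search positions and outputs the index representing that position to the first index/pulse position converter 104 and the third index/pulse position converter 106.
【００６７】
Similarly, pulse-2 index generator 102 outputs the index representing the position of its pulse to the second index/pulse position converter 105 and the first index/pulse position converter 104, and pulse-3 index generator 103 outputs the index representing the position of its pulse to the third index/pulse position converter 106 and the second index/pulse position converter 105.
【００６８】
The predetermined positions that each pulse can take are as shown in FIG. 2. FIG. 2(a) shows the sample points of the whole subframe, and FIGS. 2(b) to (d) show the pulse search positions of each channel (pulses 1 to 3). FIG. 2 shows an example with a subframe length of 24 samples.
【００６９】
For example, when the sample points in the subframe are numbered 0, 1, 2, ... in order from the beginning, the search positions of pulse 1 are defined as the sample points whose number leaves a remainder of 0 or 1 when divided by 6. Similarly, the search positions of pulse 2 are those with a remainder of 2 or 3, and the search positions of pulse 3 are those with a remainder of 4 or 5. The correspondence between pulse positions and indices is as shown in FIG. 2; for example, when the index of pulse 1 is 0, the position of pulse 1 is 0 or 1. Whether pulse 1 is placed at 0 or at 1 is determined by the index of pulse 2, as described below.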
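【００７０】
Next, in FIG. 1, the indices of pulse 1 and pulse 2 output from pulse-1 index generator 101 and pulse-2 index generator 102 are input to the first index/pulse position converter 104, which determines the position of pulse 1 from these two indices.
【００７１】
The pulse position is determined as follows. For example, when the index of pulse 1 output from pulse-1 index generator 101 is 2, the position of pulse 1 is 12 or 13, as shown in FIG. 2(b). Whether the position is 12 or 13 is determined by the index of pulse 2: if the index of pulse 2 is even, pulse 1 is at 12, and if it is odd, pulse 1 is at 13. In the same way, the position of pulse 2 is determined by the second index/pulse position converter 105 and the position of pulse 3 by the third index/pulse position converter 106. FIG. 3 summarizes the codebook described here as a table.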
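The index-to-position conversion described above can be sketched as follows for the 24-sample example. The pulse 1 rule (even partner index gives the lower position, odd gives the upper) is taken from the text; applying the same parity rule to pulses 2 and 3 is an assumption based on the text's "in the same way". The names `PARTNER` and `indices_to_positions` are hypothetical.

```python
SUBFRAME_LEN = 24        # subframe length of the example in FIG. 2
INDICES_PER_CHANNEL = 4  # 2 bits per pulse: 4 index values, each covering 2 positions

# Which other channel's index refines each channel's position:
# pulse 1 uses pulse 2, pulse 2 uses pulse 3, pulse 3 uses pulse 1 (converters 104-106).
PARTNER = {0: 1, 1: 2, 2: 0}


def indices_to_positions(indices):
    """Convert the three 2-bit indices into three pulse positions."""
    for idx in indices:
        if not 0 <= idx < INDICES_PER_CHANNEL:
            raise ValueError("index out of range")
    positions = []
    for ch, idx in enumerate(indices):
        base = 6 * idx + 2 * ch              # lower of the two candidate positions
        parity = indices[PARTNER[ch]] % 2    # partner index even -> lower, odd -> upper
        positions.append(base + parity)
    return positions


# Enumerate all 4 * 4 * 4 = 64 index combinations (6 position bits in total).
all_triples = {
    tuple(indices_to_positions((i, j, k)))
    for i in range(4) for j in range(4) for k in range(4)
}
reachable = {p for triple in all_triples for p in triple}
```

In this sketch the 6 position bits let every one of the 24 sample points be reached by some pulse, although only 64 of the 8 x 8 x 8 conceivable position triples can be selected, because the pulse positions are tied together through the shared index parities.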
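【００７２】
Distortion evaluation function calculator 107 quantifies, by calculating the distortion evaluation function, the distortion that arises when the noise code vector generated from the pulse positions output from the first, second, and third index/pulse position converters 104, 105, and 106 is used. The resulting distortion evaluation function value is output to distortion minimizer 108 together with the corresponding combination of pulses.
【００７３】
Distortion minimizer 108 outputs the combination of pulses that minimizes the distortion.
【００７４】
FIG. 4 shows the above codebook search procedure as a flowchart. It has the same nested loop structure as FIG. 40, the flowchart of the conventional method, but because the position of each pulse is not determined until the indices of two pulses are fixed, no error evaluation function is calculated in the first (pulse 1) loop. The first error evaluation function (the pulse 1 component) is calculated in the second (pulse 2) loop, and the pulse 2 component and the pulse 3 component of the error evaluation function are calculated together in the last, third (pulse 3) loop. The codebook search procedure expressed by the flowchart of FIG. 4 can also be expressed by the program shown in FIG. 5.
【００７５】
Although FIGS. 1 to 5 show the case where an algebraic codebook is used as the noise codebook, the combination information of the channels can also be used in other noise codebooks having a plurality of channels. Also, although the subframe length (pulse search range) in FIG. 2 is 24 samples, the technique is effective for longer subframe lengths.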
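【００７６】
Next, a speech coding apparatus having the adjacent-channel-dependent algebraic codebook as the random codebook described above is explained.
【００７７】
FIG. 6 is a functional block diagram of a speech coding apparatus having an adjacent-channel-dependent algebraic codebook. In the figure, 151 is an adaptive codebook that receives the previously generated excitation vector output from adder 156 and, under a control signal from distortion minimizer 162, outputs an adaptive code vector to multiplier 153; 152 is an adjacent-channel-dependent algebraic codebook that performs the operation of FIG. 1 described in this embodiment and, under a control signal from distortion minimizer 162, outputs a noise code vector to multiplier 154; 153 is a multiplier that multiplies the adaptive code gain output from gain codebook 158 by the adaptive code vector output from adaptive codebook 151 and outputs the product to adder 156; 154 is a multiplier that multiplies the predicted gain output from gain predictor 157 by the noise code vector output from adjacent-channel-dependent algebraic codebook 152 and outputs the product to multiplier 155; 155 is a multiplier that multiplies the noise code vector output from multiplier 154, already scaled by the predicted gain, by the noise code gain output from gain codebook 158 and outputs the product to adder 156 and gain predictor 157; 156 is an adder that performs vector addition of the gain-scaled adaptive code vector from multiplier 153 and the gain-scaled noise code vector from multiplier 155 and outputs the result to synthesis filter 160 and adaptive codebook 151; 157 is a gain predictor that receives the gain-scaled noise code vector from multiplier 155 and outputs a predicted gain to multiplier 154; 158 is a gain codebook that, under a control signal from distortion minimizer 162, outputs the adaptive code gain to multiplier 153 and the noise code gain to multiplier 155; 159 is a linear prediction analyzer that performs linear prediction analysis on the input speech signal and outputs linear prediction coefficients to the synthesis filter; 160 is a synthesis filter that receives the excitation vector from adder 156 and the linear prediction coefficients from linear prediction analyzer 159 and outputs a synthesized speech signal to adder 161; 161 is an adder that computes the difference between the input speech signal and the synthesized speech signal from synthesis filter 160; and 162 is a coding distortion minimizer that computes the coding distortion from the error signal output from adder 161 and sends control signals to adaptive codebook 151, algebraic codebook 152, and gain codebook 158 so that the coding distortion is minimized.
【００７８】
The index A of the adaptive code vector, the index S of the noise code vector, and the index G of the gain vector that minimize the distortion, as determined by distortion minimizer 162, are each transmitted to the decoding apparatus. The linear prediction coefficients obtained by the linear prediction analyzer are quantized and then transmitted to the decoding apparatus as quantized linear prediction coefficients L.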
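【００７９】
The configuration and operation of the speech decoding apparatus in this embodiment are described next with reference to FIG. 7.
【００８０】
FIG. 7 is a functional block diagram of the speech decoding apparatus in this embodiment. In the figure, 171 is an adaptive codebook that receives the previously generated excitation vector output from adder 176 and outputs to multiplier 173 the adaptive code vector designated by information A transmitted from the coding apparatus; 172 is an adjacent-channel-dependent algebraic codebook, identical to that of the coding apparatus and having the structure of FIGS. 2 and 3 described in this embodiment, that outputs to multiplier 174 the noise code vector designated by information S transmitted from the coding apparatus; 173 is a multiplier that multiplies the adaptive code gain output from gain codebook 178 by the adaptive code vector output from adaptive codebook 171 and outputs the product to adder 176; 174 is a multiplier that multiplies the predicted gain output from gain predictor 177 by the noise code vector output from adjacent-channel-dependent algebraic codebook 172 and outputs the product to multiplier 175; 175 is a multiplier that multiplies the noise code vector output from multiplier 174, already scaled by the predicted gain, by the noise code gain output from gain codebook 178 and outputs the product to adder 176 and gain predictor 177; 176 is an adder that performs vector addition of the gain-scaled adaptive code vector from multiplier 173 and the gain-scaled noise code vector from multiplier 175 and outputs the result to synthesis filter 180 and adaptive codebook 171; 177 is a gain predictor that receives the gain-scaled noise code vector from multiplier 175 and outputs a predicted gain to multiplier 174; 178 is a gain codebook that outputs the adaptive code gain designated by information G transmitted from the coding apparatus to multiplier 173 and the noise code gain to multiplier 175; 179 is a linear prediction coefficient decoder that decodes information L transmitted from the coding apparatus to obtain the quantized linear prediction coefficients and outputs them to synthesis filter 180; and 180 is a synthesis filter that receives the excitation vector from adder 176 and the quantized linear prediction coefficients from linear prediction coefficient decoder 179 and outputs a synthesized speech signal. The decoded speech signal output from the synthesis filter is generally subjected to further filtering or similar processing to improve its perceptual quality.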
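【００８１】
The operation of the speech decoding apparatus configured as above is described below with reference to FIG. 7. In FIG. 7, information L, A, S, and G is transmitted from the coding apparatus and input to linear prediction coefficient decoder 179, adaptive codebook 171, adjacent-channel-dependent algebraic codebook 172, and gain codebook 178, respectively.
【００８２】
Linear prediction coefficient decoder 179 decodes the quantized linear prediction coefficients received as information L and outputs them to synthesis filter 180. Synthesis filter 180 is constructed using the quantized linear prediction coefficients. Adaptive codebook 171, having received information A, extracts the adaptive code vector designated by information A from the adaptive codebook and outputs it to multiplier 173. Adjacent-channel-dependent algebraic codebook 172, having received information S, generates the noise code vector designated by information S and outputs it to multiplier 174. Multiplier 174 multiplies the noise code vector by the predicted gain output from gain predictor 177 and outputs the result to multiplier 175. The gain codebook, having received information G, selects the quantized gains designated by information G from the gain codebook and outputs the adaptive code gain to multiplier 173 and the noise code gain to multiplier 175. Multiplier 173 multiplies the adaptive code vector output from adaptive codebook 171 by the adaptive code gain output from gain codebook 178 and outputs the result to adder 176. Multiplier 175 multiplies the noise code vector, output from the adjacent-channel-dependent algebraic codebook and scaled by the predicted gain in multiplier 174, by the noise code gain output from gain codebook 178 and outputs the result to adder 176. The scaled noise code vector output from multiplier 175 is also output to gain predictor 177. Gain predictor 177 uses the noise code vectors previously output from multiplier 175 to predict the gain (log power) of the current noise code vector, for example by MA prediction, and outputs it to multiplier 174. Adder 176 adds the adaptive code vector component of the excitation signal output from multiplier 173 and the noise code vector component of the excitation signal output from multiplier 175 to generate the excitation signal and outputs it to synthesis filter 180. The excitation vector output from adder 176 is also output to adaptive codebook 171 and used to update adaptive codebook 171. Synthesis filter 180 synthesizes and outputs a synthesized signal from the excitation output from adder 176. The output speech signal may be output as the decoded speech signal as it is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.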
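The signal flow of the decoder described above (multipliers 173 to 175, adder 176, synthesis filter 180) can be sketched as below. This is an illustrative sketch only: the function name `decode_subframe`, the filter sign convention A(z) = 1 + sum of a_k z^-k, and the example values are assumptions, and the gain predictor and post-processing are left out.

```python
def decode_subframe(adaptive_vec, code_vec, adaptive_gain, code_gain,
                    predicted_gain, lpc, filter_state):
    """Rebuild the excitation and run it through an all-pole synthesis filter.

    lpc holds a_1..a_p of an assumed A(z) = 1 + sum(a_k z^-k);
    filter_state holds the p most recent outputs, newest first.
    """
    # Multipliers and adder: excitation = g_a * adaptive + g_c * g_pred * code.
    excitation = [
        adaptive_gain * a + code_gain * predicted_gain * c
        for a, c in zip(adaptive_vec, code_vec)
    ]
    # Synthesis filter 1/A(z): y[n] = e[n] - sum(a_k * y[n-k]).
    state = list(filter_state)
    speech = []
    for e in excitation:
        y = e - sum(a * s for a, s in zip(lpc, state))
        speech.append(y)
        state = ([y] + state)[:len(lpc)]
    return excitation, speech, state


# Hypothetical first-order example.
exc, out, new_state = decode_subframe(
    adaptive_vec=[1.0, 0.0, 0.0],
    code_vec=[0.0, 1.0, 0.0],
    adaptive_gain=0.5,
    code_gain=2.0,
    predicted_gain=1.5,
    lpc=[-0.5],
    filter_state=[0.0],
)
```

The returned excitation is also what would be written back into the adaptive codebook for the next subframe, and the returned filter state carries the synthesis filter memory across subframes.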
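【００８３】
Thus, according to the above embodiment, in a noise codebook having a plurality of channels, the noise code vector is generated using the combination information between the channels, so the bits assigned to the noise codebook as a whole can be used effectively.
【００８４】
(Embodiment 2)
FIG. 8 is a functional block diagram of a CELP speech signal coding apparatus according to Embodiment 2 of the present invention, which has two types of algebraic codebooks as its noise codebook. In FIG. 8, 201 is an adaptive codebook that receives and stores the previously generated excitation vector from adder 208 and, under a control signal from distortion minimizer 214, outputs an adaptive code vector to multiplier 205; 202 is a noise codebook that consists of two types of algebraic codebooks and, under a control signal from distortion minimizer 214, outputs a noise code vector from one of them; 203 is a first algebraic codebook that is part of the noise codebook; 204 is a second algebraic codebook that is part of the noise codebook and has a structure different from that of the first algebraic codebook 203; 205 is a multiplier that receives the adaptive code vector output from adaptive codebook 201 and the adaptive code gain output from the gain codebook and outputs the product to adder 208; 206 is a multiplier that receives the noise code vector output from noise codebook 202 and the predicted gain output from gain predictor 209 and outputs the product to multiplier 207; 207 is a multiplier that receives the noise code vector output from multiplier 206, already scaled by the predicted gain, and the noise code gain output from the gain codebook and outputs the product to adder 208 and gain predictor 209; 208 is an adder that receives the adaptive code vector scaled by the adaptive code gain from multiplier 205 and the noise code vector scaled by the predicted gain and the noise code gain from multiplier 207 and outputs the vector sum to synthesis filter 212 and adaptive codebook 201; 209 is a gain predictor that receives the noise code vector scaled by the predicted gain and the noise code gain from multiplier 207 and outputs a predicted gain to multiplier 206; 210 is a gain codebook that, under a control signal from distortion minimizer 214, outputs the adaptive code gain to multiplier 205 and the noise code gain to multiplier 207; 211 is a linear prediction analyzer that performs linear prediction analysis on the input speech signal, quantizes the linear prediction coefficients, and outputs the quantized linear prediction coefficients to synthesis filter 212; 212 is a synthesis filter that receives the excitation vector output from adder 208 and the quantized linear prediction coefficients output from linear prediction analyzer 211 and outputs a synthesized speech signal to adder 213; 213 is an adder that receives the input speech signal and the synthesized speech signal output from synthesis filter 212, performs vector subtraction, and outputs the result to distortion minimizer 214; and 214 is a distortion minimizer that calculates, from the difference vector output from adder 213, the distortion of the synthesized speech signal relative to the input speech signal, for example in the perceptually weighted domain, and controls the outputs of adaptive codebook 201, noise codebook 202, and gain codebook 210 so that this distortion is minimized.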
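【００８５】
The operation of the speech signal coding apparatus configured as above is described with reference to FIGS. 8 to 10.
【００８６】
First, the speech signal is input to linear prediction analyzer 211. Linear prediction analyzer 211 performs linear prediction analysis on the input speech signal and calculates the linear prediction coefficients required by synthesis filter 212. The calculated coefficients are quantized and then output to synthesis filter 212. The code L representing the linear prediction coefficients output here is transmitted to the decoding apparatus.
【００８７】
Next, synthesis filter 212 is constructed using the quantized linear prediction coefficients output from linear prediction analyzer 211. Distortion minimizer 214 drives this synthesis filter using only adaptive codebook 201 and selects from adaptive codebook 201 the adaptive code vector that gives the smallest distortion. The adaptive codebook represents the periodic component of the speech signal (its period is called the pitch period); normally a vector of one subframe length is cut out starting from the point one pitch period in the past (when one pitch period is shorter than one subframe, the cut-out vector is repeated at the pitch period to form a vector of one subframe length). From then on, the adaptive code vector output from adaptive codebook 201 is fixed to the vector determined here.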
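The cut-out-and-repeat rule for the adaptive code vector described above can be sketched as follows. The function name and the integer-lag simplification are assumptions for illustration; fractional pitch lags and interpolation are not covered.

```python
def adaptive_code_vector(past_excitation, lag, subframe_len):
    """Cut a subframe-length vector starting `lag` samples in the past.

    If the pitch lag is shorter than the subframe, the cut-out segment is
    repeated at the pitch period until the subframe is filled.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored excitation history")
    start = len(past_excitation) - lag
    # n % lag wraps around only when lag < subframe_len.
    return [past_excitation[start + (n % lag)] for n in range(subframe_len)]


history = [1, 2, 3, 4, 5]                           # hypothetical past excitation, oldest first
short_lag = adaptive_code_vector(history, 2, 5)     # lag shorter than the subframe
long_lag = adaptive_code_vector(history, 5, 3)      # lag at least as long as the subframe
```

With `lag=2` the last two samples are repeated to fill the subframe, and with `lag=5` the first three samples of the stored history are returned unchanged.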
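【００８８】
Next, synthesis filter 212 is driven using both adaptive codebook 201 and noise codebook 202. Because the adaptive code vector output from adaptive codebook 201 has already been determined, the noise code vector output from noise codebook 202 is varied, and the noise code vector judged by distortion minimizer 214 to give the minimum distortion is selected.
【００８９】
Because noise codebook 202 includes the first algebraic codebook 203 and the second algebraic codebook 204, the vector that minimizes the distortion is selected from both algebraic codebooks. At this point the adaptive code vector output from adaptive codebook 201 and the noise code vector output from noise codebook 202 have been determined.
【００９０】
Finally, distortion minimizer 214 determines the adaptive code gain by which the adaptive code vector is to be multiplied and the noise code gain by which the noise code vector is to be multiplied. Gain codebook 210 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to multiplier 205 and its noise code gain element to multiplier 207. The index G representing the selected gain vector is transmitted to the decoding apparatus.
【００９１】
In the speech coding apparatus of this embodiment, to narrow the dynamic range of the noise code gain, the power in the logarithmic domain is predicted by gain predictor 209, and the prediction residual is quantized using the gain codebook. Gain predictor 209 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs it to multiplier 206 as the predicted gain. The noise code vector output from noise codebook 202 is multiplied by the predicted gain in multiplier 206 and by the noise code gain in multiplier 207. Using the adaptive code vector, noise code vector, and predicted gain that have been determined before the optimal gain vector is selected from gain codebook 210, distortion minimizer 214 determines the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
【００９２】
This embodiment is characterized in that, by providing two types of algebraic codebooks, the mode can be switched according to the state of the input speech. FIG. 9 is a flowchart of the method of searching the algebraic codebooks in the noise codebook. First the 3-pulse algebraic codebook is searched, and then the 2-pulse algebraic codebook is searched. Conversely, the 2-pulse algebraic codebook may be searched first. The value of the error evaluation function used in an algebraic codebook search is set to its minimum (a value of 0 or less) in the initialization before the search, but in FIG. 9 there is no initialization for the 2-pulse algebraic codebook search; instead, the error evaluation function value obtained when the distortion was minimized in the preceding 3-pulse algebraic codebook search is used as the initial value.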
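The idea of carrying the best evaluation value from the first codebook search into the second, instead of re-initializing it, can be sketched as below. The function `search_codebooks`, the toy codebooks, and the toy scoring function are all hypothetical; a real encoder would use the error evaluation function of the coder.

```python
def search_codebooks(codebooks, score):
    """Search several codebooks in sequence with one shared running maximum.

    The running best score is initialised once, before the first codebook,
    so a later codebook replaces the current choice only when it yields a
    strictly larger evaluation value.
    """
    best_score = -1.0      # initial value at or below zero, set only once
    best_choice = None     # (mode, index) of the winning vector
    for mode, candidates in enumerate(codebooks):
        for index, vec in enumerate(candidates):
            s = score(vec)
            if s > best_score:
                best_score, best_choice = s, (mode, index)
    return best_choice, best_score


# Toy example: mode 0 stands in for the 3-pulse codebook, mode 1 for the 2-pulse one.
three_pulse = [(1, 0, 1), (0, 1, 1)]
two_pulse = [(1, 1, 0), (3, 0, 0)]
toy_score = lambda v: float(v[0] * 2 + v[1])      # placeholder for the evaluation function
choice, value = search_codebooks([three_pulse, two_pulse], toy_score)
```

The index of the codebook that produced the winning vector then serves as the mode information.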
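【００９３】
In FIG. 9, the pulses of each channel of the 3-channel algebraic codebook are first determined in order from channel 1 to channel 3. When the position of the channel 1 pulse (pulse 1) is fixed, the error evaluation function used for distortion minimization is computed from pulse 1 alone; when the position of the channel 2 pulse (pulse 2) is then fixed, the error evaluation function is computed from pulse 1 and pulse 2.
In this way the error evaluation function is obtained in a nested loop structure in which, each time a pulse of a new channel is added, the terms of the error evaluation function related to the newly added pulse are added, and the error evaluation function value that minimizes the error is found together with the corresponding combination of pulse positions.
In practice, the error is minimized by maximizing an error evaluation function such as that shown in equation (1).
【００９４】
【Equation 1】

In equation (1), x is the target vector, H is the impulse response of the synthesis filter, and c_i is the noise code vector output from the noise codebook. The target vector is obtained by subtracting from the input speech signal the signal synthesized when the synthesis filter is driven by the adaptive code vector alone, and then subtracting the zero-input response of the synthesis filter; it is the signal that should be generated by the noise code vector.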
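The image of equation (1) is not reproduced in this text. Under the symbol definitions given above, and assuming the patent uses the usual CELP search criterion with H taken as the impulse-response convolution matrix (an assumption that this text does not confirm), the evaluation function to be maximized over the index i would take a form such as:

```latex
\frac{\left( x^{t} H c_{i} \right)^{2}}{\left\| H c_{i} \right\|^{2}}
```

Here the numerator is the squared correlation between the target vector and the synthesized noise code vector, and the denominator is the energy of the synthesized noise code vector; the t superscript denotes transposition.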
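【００９５】
When the search also takes into account the combination with the already determined adaptive code vector, for example by an orthogonalized search of the adaptive code vector and the noise code vector, the same search can be performed by using equation (2), whose evaluation expression also contains the adaptive code vector component, instead of equation (1).
【００９６】
【Equation 2】

In equation (2), x is the target vector (the vector that should be synthesized by the synthesis filter from the excitation vector formed by combining the adaptive code vector and the noise code vector; it differs from the target vector in equation (1)), p is the adaptive code vector, c is the noise code vector (c_i is the noise code vector designated by index i), and H is the impulse-response convolution matrix of the synthesis filter.
【００９７】
Next, the pulse position of each channel is determined in the 2-channel algebraic codebook. In the error minimization process for the 2-channel algebraic codebook, the pulse positions are updated only when an error evaluation function value is obtained that exceeds the value maximized in the 3-channel algebraic codebook search. When such an update occurs, the noise code vector is output from the 2-channel algebraic codebook rather than from the 3-channel algebraic codebook.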
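【００９８】
An example of the structure of the two types of algebraic codebooks is described below with reference to FIGS. 10 and 11.
【００９９】
FIGS. 10 and 11 correspond to FIGS. 43 and 44 of the conventional example, respectively. FIG. 10 is a schematic diagram showing the pulse search positions of the pulses of each channel (pulses 1 to 3, or pulses 1 and 2), and FIG. 11 is a table showing the codebook. These figures show an example with a 3-pulse mode and a 2-pulse mode.
【０１００】
In FIGS. 43 and 44 the mode information indicates whether the pulse positions are even or odd sample points, whereas in FIGS. 10 and 11 the mode information indicates whether the 2-pulse or the 3-pulse algebraic codebook is used.
【０１０１】
Furthermore, in FIGS. 10 and 11, when the algebraic codebook has three pulses, the range of search positions is concentrated in the first half of the subframe. As a result, particularly when pitch periodization is performed (repeating the noise code vector at the pitch period, or applying a filter that emphasizes the pitch period to the noise code vector), the number of pulses in the noise code vector can be increased without lowering the temporal resolution of the pulse search within one pitch period.
【０１０２】
In addition, providing the 2-pulse algebraic codebook allows a pulse search with fine positional accuracy, which makes it possible to improve performance in voiced onsets and similar segments.
【０１０３】
In the example of this embodiment the subframe length is set to 24 to simplify the explanation, but in practice the above benefits are obtained when the subframe is long, about 80 samples or more.
【０１０４】
Although two types of algebraic codebooks are used in the example of this embodiment, it may in some cases be more effective to increase the number of types, for example by adding an algebraic codebook in which the pulse search range is concentrated further toward the beginning of the subframe and the number of pulses is increased.
【０１０５】
In the example of this embodiment, vector quantization and backward prediction of the noise code vector power are used for gain quantization, but the present method is also applicable when scalar quantization is used or when backward prediction of the noise code vector power is not used.
【０１０６】
Also, in the example of this embodiment, the vectors stored in the gain codebook are two-dimensional vectors consisting of an adaptive code gain component and a noise code gain component, but they may be two-dimensional vectors consisting of two other elements, such as the power of the excitation vector and the proportion of that power accounted for by the adaptive code vector (or the noise code vector).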
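【０１０７】
The speech decoding apparatus of this embodiment is described with reference to FIG. 12. FIG. 12 is a block diagram of an example of the speech decoding apparatus in Embodiment 2 of the present invention. In the figure, 251 is an adaptive codebook that receives the previously generated excitation vector output from adder 258 and outputs to multiplier 255 the adaptive code vector designated by information A transmitted from the coding apparatus; 252 is a noise codebook, identical to that of the coding apparatus and having the structure of FIGS. 10 and 11 described in this embodiment, that takes the noise code vector designated by information S transmitted from the coding apparatus from the first algebraic codebook 253 or the second algebraic codebook 254 and outputs it to multiplier 256; 255 is a multiplier that multiplies the adaptive code gain output from gain codebook 260 by the adaptive code vector output from adaptive codebook 251 and outputs the product to adder 258; 256 is a multiplier that multiplies the predicted gain output from gain predictor 259 by the noise code vector output from noise codebook 252 and outputs the product to multiplier 257; 257 is a multiplier that multiplies the noise code vector output from multiplier 256, already scaled by the predicted gain, by the noise code gain output from gain codebook 260 and outputs the product to adder 258 and gain predictor 259; 258 is an adder that performs vector addition of the gain-scaled adaptive code vector from multiplier 255 and the gain-scaled noise code vector from multiplier 257 and outputs the result to synthesis filter 262 and adaptive codebook 251; 259 is a gain predictor that receives the gain-scaled noise code vector from multiplier 257 and outputs a predicted gain to multiplier 256; 260 is a gain codebook that outputs the adaptive code gain designated by information G transmitted from the coding apparatus to multiplier 255 and the noise code gain to multiplier 257; 261 is a linear prediction coefficient decoder that decodes information L transmitted from the coding apparatus to obtain the quantized linear prediction coefficients and outputs them to synthesis filter 262; and 262 is a synthesis filter that receives the excitation vector from adder 258 and the quantized linear prediction coefficients from linear prediction coefficient decoder 261 and outputs a synthesized speech signal. The decoded speech signal output from the synthesis filter is generally subjected to further filtering or similar processing to improve its perceptual quality.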
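【０１０８】
The operation of the speech decoding apparatus configured as above is now described. Information L, A, S, and G is transmitted from the coding apparatus and input to linear prediction coefficient decoder 261, adaptive codebook 251, noise codebook 252, and gain codebook 260, respectively.
【０１０９】
Linear prediction coefficient decoder 261, having received information L, decodes the quantized linear prediction coefficients and outputs them to synthesis filter 262. Synthesis filter 262 is constructed using the quantized linear prediction coefficients. Adaptive codebook 251, having received information A, extracts the adaptive code vector designated by A from the adaptive codebook and outputs it to multiplier 255. Noise codebook 252, having received information S, generates the noise code vector designated by S from the first algebraic codebook 253 or the second algebraic codebook 254 and outputs it to multiplier 256. Multiplier 256 multiplies the noise code vector by the predicted gain output from gain predictor 259 and outputs the result to multiplier 257. The gain codebook, having received information G, selects the quantized gains designated by G from the gain codebook and outputs the adaptive code gain to multiplier 255 and the noise code gain to multiplier 257. Multiplier 255 multiplies the adaptive code vector output from adaptive codebook 251 by the adaptive code gain output from gain codebook 260 and outputs the result to adder 258. Multiplier 257 multiplies the noise code vector, output from noise codebook 252 and scaled by the predicted gain in multiplier 256, by the noise code gain output from gain codebook 260 and outputs the result to adder 258. The scaled noise code vector output from multiplier 257 is also output to gain predictor 259.
【０１１０】
Gain predictor 259 uses the noise code vectors previously output from multiplier 257 to predict the gain (log power) of the current noise code vector, for example by MA prediction, and outputs it to multiplier 256. Adder 258 adds the adaptive code vector component of the excitation signal output from multiplier 255 and the noise code vector component of the excitation signal output from multiplier 257 to generate the excitation signal and outputs it to synthesis filter 262. The excitation vector output from adder 258 is also output to adaptive codebook 251 and used to update adaptive codebook 251. Synthesis filter 262 synthesizes and outputs a synthesized signal from the excitation output from adder 258. The output speech signal may be output as the decoded speech signal as it is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.
【０１１１】
Thus, according to Embodiment 2, by providing a plurality of types of algebraic codebooks and allowing them to be switched according to the characteristics of the input speech signal, the coding performance for speech signals can be improved even with a small number of bits.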
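【０１１２】
(Embodiment 3)
FIG. 13 is a functional block diagram of a speech signal coding apparatus according to Embodiment 3 of the present invention, which uses an algebraic codebook together with a random noise codebook as its noise codebook.
【０１１３】
In FIG. 13, 301 is an adaptive codebook that receives and stores the previously generated excitation vector from adder 308 and, under a control signal from distortion minimizer 314, outputs an adaptive code vector to multiplier 305; 302 is a noise codebook that consists of two types of noise codebooks and, under a control signal from distortion minimizer 314, outputs a noise code vector from one of them; 303 is an algebraic codebook that is part of noise codebook 302; 304 is a noise codebook that is part of noise codebook 302 and consists of an algebraic codebook with a structure different from that of algebraic codebook 303 and a random codebook that is not an algebraic codebook; 305 is a multiplier that receives the adaptive code vector output from adaptive codebook 301 and the adaptive code gain output from the gain codebook and outputs the product to adder 308; 306 is a multiplier that receives the noise code vector output from noise codebook 302 and the predicted gain output from gain predictor 309 and outputs the product to multiplier 307; 307 is a multiplier that receives the noise code vector output from multiplier 306, already scaled by the predicted gain, and the noise code gain output from the gain codebook and outputs the product to adder 308 and gain predictor 309; 308 is an adder that receives the adaptive code vector scaled by the adaptive code gain from multiplier 305 and the noise code vector scaled by the predicted gain and the noise code gain from multiplier 307 and outputs the vector sum to synthesis filter 312 and adaptive codebook 301; 309 is a gain predictor that receives the noise code vector scaled by the predicted gain and the noise code gain from multiplier 307 and outputs a predicted gain to multiplier 306; 310 is a gain codebook that, under a control signal from distortion minimizer 314, outputs the adaptive code gain to multiplier 305 and the noise code gain to multiplier 307; 311 is a linear prediction analyzer that performs linear prediction analysis on the input speech signal, quantizes the linear prediction coefficients, and outputs the quantized linear prediction coefficients to synthesis filter 312; 312 is a synthesis filter that receives the excitation vector output from adder 308 and the quantized linear prediction coefficients output from linear prediction analyzer 311 and outputs a synthesized speech signal to adder 313; 313 is an adder that receives the input speech signal and the synthesized speech signal output from synthesis filter 312, performs vector subtraction, and outputs the result to distortion minimizer 314; and 314 is a distortion minimizer that calculates, from the difference vector output from adder 313, the distortion of the synthesized speech signal relative to the input speech signal, for example in the perceptually weighted domain, and controls the outputs of adaptive codebook 301, noise codebook 302, and gain codebook 310 so that this distortion is minimized.
【０１１４】
The operation of the speech signal coding apparatus configured as above is described with reference to FIGS. 13 to 19.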
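【０１１５】
First, the speech signal is input to linear prediction analyzer 311. Linear prediction analyzer 311 performs linear prediction analysis on the input speech signal and calculates the linear prediction coefficients required by synthesis filter 312. The calculated coefficients are quantized and then output to synthesis filter 312. The code L representing the linear prediction coefficients output here is transmitted to the decoding apparatus.
【０１１６】
Next, synthesis filter 312 is constructed using the quantized linear prediction coefficients output from linear prediction analyzer 311. Distortion minimizer 314 drives this synthesis filter using only adaptive codebook 301 and selects from adaptive codebook 301 the adaptive code vector that gives the smallest distortion. The adaptive codebook represents the periodic component of the speech signal (its period is called the pitch period); normally a vector of one subframe length is cut out starting from the point one pitch period in the past (when one pitch period is shorter than one subframe, the cut-out vector is repeated at the pitch period to form a vector of one subframe length). The index A representing the selected adaptive code vector is transmitted to the decoder. From then on, the adaptive code vector output from adaptive codebook 301 is fixed to the vector determined here.
【０１１７】
Next, synthesis filter 312 is driven using both adaptive codebook 301 and noise codebook 302. Because the adaptive code vector output from adaptive codebook 301 has already been determined, the noise code vector output from noise codebook 302 is varied, and the noise code vector judged by distortion minimizer 314 to give the minimum distortion is selected.
【０１１８】
Noise codebook 302 includes algebraic codebook 303 and noise codebook 304, which consists of an algebraic codebook and a random codebook that is not an algebraic codebook. The vector that minimizes the distortion is selected from both codebooks. The index S representing the selected noise code vector is transmitted to the decoder. At this point the adaptive code vector output from adaptive codebook 301 and the noise code vector output from noise codebook 302 have been determined. Finally, distortion minimizer 314 determines the adaptive code gain by which the adaptive code vector is to be multiplied and the noise code gain by which the noise code vector is to be multiplied. Gain codebook 310 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to multiplier 305 and its noise code gain element to multiplier 307. The index G representing the selected gain vector is transmitted to the decoding apparatus. In the speech coding apparatus of this embodiment, to narrow the dynamic range of the noise code gain, the power in the logarithmic domain is predicted by gain predictor 309, and the prediction residual is quantized using the gain codebook. Gain predictor 309 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs it to multiplier 306 as the predicted gain. The noise code vector output from noise codebook 302 is multiplied by the predicted gain in multiplier 306 and by the noise code gain in multiplier 307. Using the adaptive code vector, noise code vector, and predicted gain that have been determined before the optimal gain vector is selected from gain codebook 310, distortion minimizer 314 determines the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
【０１１９】
The above operation is the same as in Embodiment 2, except that one of the two types of noise codebooks consists of an algebraic codebook and a random codebook that is not an algebraic codebook. The feature of this configuration is that, by providing a non-algebraic random codebook as part of the algebraic codebook, modes suited to a wider variety of input speech states become available.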
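【０１２０】
FIG. 14 is a flowchart of the method of searching the above noise codebook. FIG. 14 is obtained by rewriting the 2-pulse algebraic codebook portion of FIG. 9, the flowchart of Embodiment 2. First the 3-pulse algebraic codebook is searched. Then four searches are performed: a search of the 2-pulse algebraic codebook, a search combining the first channel of the algebraic codebook with the second channel of the random codebook, a search combining the first channel of the random codebook with the second channel of the algebraic codebook, and a search combining the first channel of the random codebook with the second channel of the random codebook. These codebooks may be searched in any other conceivable order. Flowcharts of the 2-pulse algebraic codebook search 315, the (algebraic codebook first channel + random codebook second channel) search 316, the (random codebook first channel + algebraic codebook second channel) search 317, and the (random codebook first channel + random codebook second channel) search 318 of FIG. 14 are shown in FIGS. 15, 16, 17, and 18, respectively. The 2-pulse algebraic codebook search shown in FIG. 15 is the same as the 2-pulse algebraic codebook search described in Embodiment 2. That is, the pulses of each channel are determined in order from channel 1 to channel 2. When the position of the channel 1 pulse (pulse 1) is fixed, the error evaluation function used for distortion minimization is computed from pulse 1 alone; when the position of the channel 2 pulse (pulse 2) is then fixed, the error evaluation function is computed from pulse 1 and pulse 2. In this way the error evaluation function is obtained in a nested loop structure in which, each time a pulse of a new channel is added, the terms of the error evaluation function related to the newly added pulse are added, and the error evaluation function value that minimizes the error is found together with the corresponding combination of pulse positions. In practice, the error is minimized by maximizing an error evaluation function such as that shown in equation (1).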
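【０１２１】
The optimal output noise code vector of the 2-pulse algebraic codebook obtained here is used as the candidate output from the 2-pulse algebraic codebook in the optimal mode selection 319 in FIG. 14.
【０１２２】
FIG. 16 is a flowchart of the (algebraic codebook first channel + random codebook second channel) search. First, the top N candidates that give a large numerator term of the error evaluation function are selected from the first channel of the algebraic codebook (one channel of the 2-pulse algebraic codebook). N is determined by the allowable amount of computation.
【０１２３】
Next, the top M candidates that give a large numerator term of the error evaluation function are selected from the second channel of the random codebook. M is determined by the allowable amount of computation and may or may not be equal to N. Then, for each combination of the N candidate noise vectors from the 1-pulse algebraic codebook, selected by evaluating only the numerator term of the error evaluation function, with the M candidate noise vectors selected from the second channel of the random codebook, the numerator and denominator terms of the error evaluation function are calculated, and the error between the synthesized speech and the input speech is minimized by maximizing the error evaluation function. The noise code vector generated by the combination judged to minimize the error is used as the candidate output from the (algebraic codebook first channel + random codebook second channel) mode in the optimal mode selection 319 in FIG. 14.
FIG. 17 reverses the channels of FIG. 16 and is a flowchart of the codebook search in the mode that combines the first channel of the random codebook with the second channel of the algebraic codebook.
【０１２４】
First, the top M candidates that give a large numerator term of the error evaluation function are selected from the first channel of the random codebook as candidate set 1. Next, the top N candidates that give a large numerator term of the error evaluation function are selected from the second channel of the algebraic codebook as candidate set 2. Then, for the M x N combinations of candidate set 1 and candidate set 2, the numerator and denominator terms of the error evaluation function are calculated, and the combination that maximizes the error evaluation function value and thus minimizes the distortion is determined. The noise code vector obtained from the determined combination is used as the candidate output from the (random codebook first channel + algebraic codebook second channel) mode in the optimal mode selection 319 in FIG. 14.
【０１２５】
FIG. 18 replaces the algebraic codebook portion shown in FIG. 16 or FIG. 17 with a random codebook search loop. That is, it is a flowchart for selecting the optimal noise code vector when the noise code vectors output from the first and second channels of the random codebook are combined.
【０１２６】
First, the top M candidates that give a large numerator term of the error evaluation function are selected from the first channel of the random codebook as candidate set 1. Next, the top M candidate noise code vectors that likewise give a large numerator term of the error evaluation function are selected from the second channel of the random codebook as candidate set 2. Then, for the M x M combinations of candidate set 1 and candidate set 2, the numerator and denominator terms of the error evaluation function for the noise code vector obtained from each combination are computed to calculate the error evaluation function. From these M x M candidates, the combination that maximizes the error evaluation function, that is, minimizes the distortion, is selected and determined as the optimal noise code vector of the (random codebook first channel + random codebook second channel) mode. The noise code vector thus determined is used as the candidate output from the (random codebook first channel + random codebook second channel) mode in the optimal mode selection 319 in FIG. 14. In 319 of FIG. 14, the noise code vector with the largest error evaluation function, that is, the smallest distortion, among the modes (2 pulses / 1 pulse + random / random + 1 pulse / random + random) is selected as the optimal noise code vector of the 2-channel mode. This is then compared with the previously determined optimal noise code vector of the 3-channel mode, the mode that gives the larger error evaluation function, that is, the smaller distortion, is selected, and the output noise code vector from noise codebook 302 in FIG. 13 is determined.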
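The two-stage search described above (preselect candidates per channel using only the numerator term, then evaluate numerator and denominator for every combination of the survivors) can be sketched as follows. All names, the toy codebooks, and the use of a plain correlation-squared-over-energy score are assumptions for illustration; the patent's own evaluation function and candidate counts are not reproduced here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered(h, v):
    """Zero-state convolution of v with impulse response h, truncated to len(v)."""
    return [sum(h[k] * v[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(v))]


def two_stage_search(ch1, ch2, target, h, n_keep, m_keep):
    """Preselect by numerator only, then score every surviving pair fully."""
    def numerator(v):
        return dot(target, filtered(h, v)) ** 2

    cand1 = sorted(range(len(ch1)), key=lambda i: numerator(ch1[i]), reverse=True)[:n_keep]
    cand2 = sorted(range(len(ch2)), key=lambda j: numerator(ch2[j]), reverse=True)[:m_keep]

    best_score, best_pair = -1.0, None
    for i in cand1:
        for j in cand2:
            y = filtered(h, [a + b for a, b in zip(ch1[i], ch2[j])])
            energy = dot(y, y)                    # denominator term
            if energy <= 0.0:
                continue
            score = dot(target, y) ** 2 / energy  # full evaluation value
            if score > best_score:
                best_score, best_pair = score, (i, j)
    return best_pair, best_score


h = [1.0, 0.5]                                     # hypothetical impulse response
algebraic_ch1 = [[1, 0, 0, 0], [0, 0, 1, 0]]       # single-pulse vectors (toy channel 1)
random_ch2 = [[0, 1, 0, -1], [1, 1, 0, 0]]         # toy stand-in for a random codebook channel
target = filtered(h, [0, 1, 1, -1])                # target built from pair (1, 0)
pair, value = two_stage_search(algebraic_ch1, random_ch2, target, h, n_keep=2, m_keep=2)
```

Keeping fewer candidates (smaller `n_keep` and `m_keep`) reduces the number of full evaluations from the product of the two codebook sizes to `n_keep * m_keep`, at the risk of discarding the pair that would have been best overall.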
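【０１２７】
FIG. 19 is a table showing a concrete example of such a noise codebook 302. In this example the subframe length (the number of dimensions of the noise code vector) is 80. One bit indicates whether the mode is the 3-channel mode or the 2-channel mode, one bit per channel represents the polarity of each channel, and the remaining bits are used as index information representing the pulse position or the noise codebook entry, for a total of 15 bits. The values in the pulse search position column of the table are sample point positions with the beginning of the subframe as 0, and RA01 to RA24 and RB01 to RB24 in the pulse search position column of the 2-channel mode are indices into the noise codebook of each channel.
【０１２８】
FIG. 20 illustrates the structure of the 2-channel noise codebook section of this codebook (304 in FIG. 13). The structure and operation of this 2-channel noise codebook section are described below with reference to FIG. 20. In FIG. 20, 320 is the first channel of the noise codebook and consists of the first channel 322 of the algebraic codebook and the first channel 323 of the random codebook. The first channel 322 of the algebraic codebook outputs noise code vectors to adder 326 and adder 327, and the first channel 323 of the random codebook outputs noise code vectors to adder 328 and adder 329. Likewise, 321 is the second channel of the noise codebook and consists of the second channel 324 of the algebraic codebook and the second channel 325 of the random codebook. The second channel 324 of the algebraic codebook outputs noise code vectors to adder 326 and adder 328, and the second channel 325 of the random codebook outputs noise code vectors to adder 327 and adder 329. Adders 326 to 329 each add the two input noise code vectors and output the sum to switch 330. Switch 330 selects and outputs the optimal noise code vector from those output from adders 326 to 329. Here, the noise code vector from the first channel 322 of the algebraic codebook and the noise code vector from the second channel 324 of the algebraic codebook that are input to adder 326 are the vectors that are optimal for that combination; the noise code vector from the first channel 322 of the algebraic codebook and the noise code vector from the second channel 325 of the random codebook that are input to adder 327 are the vectors that are optimal for that combination; the noise code vector from the first channel 323 of the random codebook and the noise code vector from the second channel 324 of the algebraic codebook that are input to adder 328 are the vectors that are optimal for that combination; and the noise code vector from the first channel 323 of the random codebook and the noise code vector from the second channel 325 of the random codebook that are input to adder 329 are the vectors that are optimal for that combination. Switch 330 is not switched by dedicated switching information but by the combination of the channel 1 noise codebook 320 and the channel 2 noise codebook 321.
【０１２９】
As the non-algebraic random codebook mentioned above, a white-noise-like noise codebook, a codebook optimized by training, or a sparse version of such codebooks can be used; a sparse structure is particularly effective in terms of computation. A trained codebook provides the performance improvement gained from training. Furthermore, if part of the noise codebook is a trained codebook and the rest is a random noise codebook, both improved noise robustness and the performance improvement from training can be obtained.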
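【０１３０】
The speech decoding apparatus of this embodiment is described with reference to FIG. 21. FIG. 21 is a block diagram of an example of the speech decoding apparatus in Embodiment 3. In the figure, 351 is an adaptive codebook that receives the previously generated excitation vector output from adder 358 and outputs to multiplier 355 the adaptive code vector designated by information A transmitted from the coding apparatus; 352 is a noise codebook, identical to that of the coding apparatus and having the structure of FIGS. 19 and 20 described in this embodiment, that takes the noise code vector designated by information S transmitted from the coding apparatus from algebraic codebook 353 or from noise codebook 354, which consists of an algebraic codebook and a random codebook that is not an algebraic codebook, and outputs it to multiplier 356; 355 is a multiplier that multiplies the adaptive code gain output from gain codebook 360 by the adaptive code vector output from adaptive codebook 351 and outputs the product to adder 358; 356 is a multiplier that multiplies the predicted gain output from gain predictor 359 by the noise code vector output from noise codebook 352 and outputs the product to multiplier 357; 357 is a multiplier that multiplies the noise code vector output from multiplier 356, already scaled by the predicted gain, by the noise code gain output from gain codebook 360 and outputs the product to adder 358 and gain predictor 359; 358 is an adder that performs vector addition of the gain-scaled adaptive code vector from multiplier 355 and the gain-scaled noise code vector from multiplier 357 and outputs the result to synthesis filter 362 and adaptive codebook 351; 359 is a gain predictor that receives the gain-scaled noise code vector from multiplier 357 and outputs a predicted gain to multiplier 356; 360 is a gain codebook that outputs the adaptive code gain designated by information G transmitted from the coding apparatus to multiplier 355 and the noise code gain to multiplier 357; 361 is a linear prediction coefficient decoder that decodes information L transmitted from the coding apparatus to obtain the quantized linear prediction coefficients and outputs them to synthesis filter 362; and 362 is a synthesis filter that receives the excitation vector from adder 358 and the quantized linear prediction coefficients from linear prediction coefficient decoder 361 and outputs a synthesized speech signal. The decoded speech signal output from the synthesis filter is generally subjected to further filtering or similar processing to improve its perceptual quality.
【０１３１】
The operation of the speech decoding apparatus configured as above is described below with reference to FIG. 21. In FIG. 21, information L, A, S, and G is transmitted from the coding apparatus and input to linear prediction coefficient decoder 361, adaptive codebook 351, noise codebook 352, and gain codebook 360, respectively. Linear prediction coefficient decoder 361, having received information L, decodes the quantized linear prediction coefficients and outputs them to synthesis filter 362. Synthesis filter 362 is constructed using the quantized linear prediction coefficients. Adaptive codebook 351, having received information A, extracts the adaptive code vector designated by A from the adaptive codebook and outputs it to multiplier 355. Noise codebook 352, having received information S, generates the noise code vector designated by S from algebraic codebook 353 or from noise codebook 354, which consists of an algebraic codebook and a random codebook, and outputs it to multiplier 356. Multiplier 356 multiplies the noise code vector by the predicted gain output from gain predictor 359 and outputs the result to multiplier 357. The gain codebook, having received information G, selects the quantized gains designated by G from the gain codebook and outputs the adaptive code gain to multiplier 355 and the noise code gain to multiplier 357. Multiplier 355 multiplies the adaptive code vector output from adaptive codebook 351 by the adaptive code gain output from gain codebook 360 and outputs the result to adder 358. Multiplier 357 multiplies the noise code vector, output from noise codebook 352 and scaled by the predicted gain in multiplier 356, by the noise code gain output from gain codebook 360 and outputs the result to adder 358. The scaled noise code vector output from multiplier 357 is also output to gain predictor 359. Gain predictor 359 uses the noise code vectors previously output from multiplier 357 to predict the gain (log power) of the current noise code vector, for example by MA prediction, and outputs it to multiplier 356. Adder 358 adds the adaptive code vector component of the excitation signal output from multiplier 355 and the noise code vector component of the excitation signal output from multiplier 357 to generate the excitation signal and outputs it to synthesis filter 362. The excitation vector output from adder 358 is also output to the adaptive codebook and used to update it. Synthesis filter 362 synthesizes and outputs a synthesized signal from the excitation output from adder 358. The output speech signal may be output as the decoded speech signal as it is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.
【０１３２】
Thus, according to Embodiment 3, by providing a non-algebraic noise codebook as part of the algebraic codebook, speech coding performance can be improved beyond what is possible with an algebraic codebook alone.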
【０１３３】
(Embodiment 4)
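FIG. 22 shows the functional blocks of a speech encoding apparatus according to Embodiment 4 of the present invention. In a configuration having multiple types of algebraic codebooks, the speech encoding apparatus according to Embodiment 4 improves speech quality by adaptively changing part of the algebraic codebook, using the pitch peak position obtained from the adaptive code vector, in the mode with few pulses, for which the number of bits is insufficient. In FIG. 22, 401 denotes an adaptive codebook that receives and stores the previously generated driving excitation vector from adder 408 and, under a control signal from distortion minimizer 414, outputs an adaptive code vector to multiplier 405 and to the algebraic codebook plus phase-adaptive algebraic codebook 403; 402 denotes a noise codebook that consists of two types of noise codebook and, under a control signal from distortion minimizer 414, outputs a noise code vector from one of them; 403 denotes an algebraic codebook having a phase-adaptive portion, which is part of noise codebook 402; 404 denotes a noise codebook that is part of noise codebook 402 and consists of an algebraic codebook with a structure different from that of algebraic codebook 403 and a non-algebraic random codebook; 405 denotes a multiplier that receives the adaptive code vector output from adaptive codebook 401 and the adaptive code gain output from the gain codebook and outputs the product to adder 408; 406 denotes a multiplier that receives the noise code vector output from noise codebook 402 and the prediction gain output from gain predictor 409 and outputs the product to multiplier 407; 407 denotes a multiplier that receives the noise code vector scaled by the prediction gain, output from multiplier 406, and the noise code gain output from the gain codebook and outputs the product to adder 408 and gain predictor 409; 408 denotes an adder that receives the adaptive code vector scaled by the adaptive code gain, output from multiplier 405, and the noise code vector scaled by the prediction gain and the noise code gain, output from multiplier 407, and outputs the vector sum to synthesis filter 412 and adaptive codebook 401; 409 denotes a gain predictor that receives the noise code vector scaled by the prediction gain and the noise code gain, output from multiplier 407, and outputs a prediction gain to multiplier 406; 410 denotes a gain codebook that, under a control signal from distortion minimizer 414, outputs the adaptive code gain to multiplier 405 and the noise code gain to multiplier 407; 411 denotes a linear prediction analyzer that receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to synthesis filter 412; 412 denotes a synthesis filter that receives the excitation vector output from adder 408 and the quantized linear prediction coefficients output from linear prediction analyzer 411 and outputs a synthesized speech signal to adder 413; 413 denotes an adder that receives the input speech signal and the synthesized speech signal output from synthesis filter 412, performs vector subtraction, and outputs the result to distortion minimizer 414; and 414 denotes a distortion minimizer that calculates, from the difference vector output from adder 413, the distortion of the synthesized speech signal relative to the input speech signal in a perceptually weighted domain or the like, and controls the outputs of adaptive codebook 401, noise codebook 402, and gain codebook 410 so that this distortion is minimized.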
【０１３４】
The operation of the speech signal encoding apparatus configured as above will be described with reference to FIGS. 22 to 25.
【０１３５】
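First, in FIG. 22, the speech signal is input to linear prediction analyzer 411. Linear prediction analyzer 411 performs linear prediction analysis of the input speech signal and calculates the linear prediction coefficients required by synthesis filter 412. The calculated linear prediction coefficients are quantized and then output to synthesis filter 412. In addition, code L representing the linear prediction coefficients output here is transmitted to the decoding apparatus.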
【０１３６】
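Next, synthesis filter 412 is constructed using the quantized linear prediction coefficients output from linear prediction analyzer 411. Distortion minimizer 414 drives this synthesis filter using only adaptive codebook 401 and selects from adaptive codebook 401 the adaptive code vector that gives the smallest distortion. The adaptive codebook represents the periodic component of the speech signal (its period is called the pitch period); normally, a vector one subframe long is extracted starting from the point one pitch period in the past (when one pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector one subframe long). Index A representing the selected adaptive code vector is transmitted to the decoder side. From then on, the adaptive code vector output from adaptive codebook 401 is fixed to the one determined here. Subsequently, synthesis filter 412 is driven using both adaptive codebook 401 and noise codebook 402. Since the adaptive code vector output from adaptive codebook 401 has already been determined, the noise code vector output from noise codebook 402 is varied, and the noise code vector judged by distortion minimizer 414 to give the minimum distortion is selected.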
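The extraction of an adaptive code vector described above, including the repetition used when the pitch period is shorter than the subframe, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `adaptive_code_vector`, the integer pitch lag, and the list-based excitation buffer are hypothetical and do not come from the original text.

```python
def adaptive_code_vector(past_excitation, pitch_period, subframe_len):
    """Cut one subframe out of the past excitation, one pitch period back.

    If the pitch period is shorter than the subframe, the extracted segment
    is repeated at the pitch period until the subframe is filled.
    """
    if not 1 <= pitch_period <= len(past_excitation):
        raise ValueError("pitch period must lie within the stored excitation")
    start = len(past_excitation) - pitch_period
    # At most pitch_period samples are available from this starting point.
    segment = past_excitation[start:start + subframe_len]
    return [segment[n % len(segment)] for n in range(subframe_len)]
```

For example, with a stored excitation of ten samples, a pitch period of 3, and a subframe of 8 samples, the last three samples are repeated cyclically to fill the subframe.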
【０１３７】
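Because noise codebook 402 comprises algebraic codebook 403, which has a phase-adaptive portion, and noise codebook 404, which consists of an algebraic codebook and a non-algebraic random codebook, the vector that minimizes distortion is selected from both codebooks.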
【０１３８】
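The algebraic codebook of the phase-adaptive portion is generated using the adaptive code vector and pitch period output from the adaptive codebook as inputs. Index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from adaptive codebook 401 and the noise code vector output from noise codebook 402 have been determined. Finally, distortion minimizer 414 determines the adaptive code gain by which the adaptive code vector is to be multiplied and the noise code gain by which the noise code vector is to be multiplied. Gain codebook 410 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain; the gain vector having the combination of adaptive code gain and noise code gain that minimizes distortion is selected, its adaptive code gain element is output to multiplier 405, and its noise code gain element is output to multiplier 407. In addition, index G representing the selected gain vector is transmitted to the decoding apparatus.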
【０１３９】
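In the speech encoding apparatus of this embodiment, to narrow the dynamic range of the noise code gain, the power in the logarithmic domain is predicted by gain predictor 409, and the prediction residual is quantized using the gain codebook.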
【０１４０】
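Gain predictor 409 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs it to multiplier 406 as the prediction gain. The noise code vector output from noise codebook 402 is multiplied by the prediction gain in multiplier 406 and by the noise code gain in multiplier 407. Distortion minimizer 414 uses the adaptive code vector, noise code vector, and prediction gain that have been determined prior to selecting the optimal gain vector from gain codebook 410 to determine the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.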
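The text does not give the predictor equations, so the following is only a rough sketch of how a log-domain gain prediction of this kind might look. It uses a fixed-coefficient linear predictor on past log-power values as a simplification; an MA predictor as used in practical codecs typically operates on past quantized prediction errors instead. The names `log_power_db` and `predict_gain`, the dB scale, and the coefficient values used in the usage note are all assumptions.

```python
import math


def log_power_db(vector):
    """Mean power of a vector in dB (floored to avoid log of zero)."""
    energy = sum(x * x for x in vector) / len(vector)
    return 10.0 * math.log10(max(energy, 1e-10))


def predict_gain(past_log_powers_db, coeffs, unscaled_vector):
    """Predict the scale factor for the current noise code vector.

    past_log_powers_db -- log powers (dB) of previously scaled noise vectors,
                          most recent first
    coeffs             -- hypothetical fixed predictor coefficients
    Returns the linear gain that brings the unscaled vector to the
    predicted power level.
    """
    predicted_db = sum(c * p for c, p in zip(coeffs, past_log_powers_db))
    current_db = log_power_db(unscaled_vector)
    return 10.0 ** ((predicted_db - current_db) / 20.0)
```

With two past values of 20 dB, coefficients of 0.5 each, and a unit-power code vector, the predicted level is 20 dB and the returned linear gain is 10. The gain codebook then only has to quantize the remaining correction factor, which is the dynamic-range reduction the text refers to.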
【０１４１】
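FIG. 23 shows a configuration example of algebraic codebook 403 of FIG. 22, which has a phase-adaptive portion. Its configuration and operation are described below with reference to FIG. 23. In the figure, 415 to 418 denote codebooks A to D for pulse 1, 419 to 422 denote codebooks A to D for pulse 2, and 423 to 426 denote codebooks A to D for pulse 3; the output of each pulse codebook is input to adder 430 via switch 429. Switch 429 consists of three ganged switches and, in accordance with the signal output from phase-adaptive position determiner 428, selects one of codebooks A to D for each pulse and connects it to adder 430. That is, when codebook A is selected for pulse 1, codebook A is also selected for pulses 2 and 3. Adder 430 performs vector addition of the three codebook outputs from switch 429 and outputs the sum as the noise code vector. Meanwhile, pitch peak position detector 427 receives the adaptive code vector and the pitch period, finds the first pitch peak position in the subframe, and outputs it to phase-adaptive position determiner 428. The pitch peak position can be found, for example, by finding the position that maximizes the cross-correlation function between an impulse train spaced at the pitch period and the adaptive code vector. Another method, described in Mano and Moriya, “位相適応型ＰＳＩ−ＣＥＬＰ音声符号化の検討” (a study of phase-adaptive PSI-CELP speech coding), IEICE Technical Report SP94-96 (February 1995), maximizes the cross-correlation between the synthesis filter output driven by an impulse train spaced at the pitch period and the synthesis filter output driven by the adaptive code vector. Once the pitch peak position has been found, phase-adaptive position determiner 428 determines which codebook to switch to according to that position and sends a switching signal to switch 429.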
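The first of the two pitch peak detection methods mentioned above, cross-correlation of the adaptive code vector with an impulse train spaced at the pitch period, can be sketched as follows. The function name `first_pitch_peak`, the use of unit-amplitude impulses, and the assumption that pitch peaks are positive-going are choices made for this illustration, not details given in the text.

```python
def first_pitch_peak(adaptive_vector, pitch_period):
    """Return the first pitch peak position in the subframe.

    For each candidate phase, the correlation with a unit impulse train
    starting at that phase and spaced at the pitch period is simply the sum
    of the vector samples at the impulse positions.
    """
    n = len(adaptive_vector)
    best_pos, best_corr = 0, float("-inf")
    for phase in range(min(pitch_period, n)):
        corr = sum(adaptive_vector[k] for k in range(phase, n, pitch_period))
        if corr > best_corr:
            best_pos, best_corr = phase, corr
    return best_pos
```

Because only phases smaller than the pitch period are tested, the returned position is the first peak in the subframe, which is what the pitch peak position detector passes on to the position determiner.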
【０１４２】
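Although FIG. 23 shows codebooks A to D for pulses 1, 2, and 3 as entirely separate blocks, part of codebooks A to D for each pulse is a shared codebook. That is, as indicated by 403 in FIG. 22, the codebook consists of a fixed algebraic codebook and an algebraic codebook that switches adaptively according to the phase position; they are drawn as separate blocks in FIG. 23 for convenience. This algebraic codebook is described with reference to FIG. 24.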
【０１４３】
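FIG. 24 shows an example of the algebraic codebook with a phase-adaptive portion used in this embodiment. FIG. 24(a) shows the adaptive code vector (one subframe) output from the adaptive codebook, (b) shows one subframe divided into four regions A to D, and (c) shows the fixed portion and the phase-adaptive portion of the algebraic codebook. In FIG. 24(c), portion ⑥ is the fixed part of the algebraic codebook and is always searched regardless of the pitch peak position. In this example, the fixed-portion algebraic codebook is used to search even-numbered sample points. In contrast, portion ⑤ is a part that was fixed in the conventional method but has been removed from the fixed portion. The information that had been allocated to this removed part is allocated to the phase-adaptive portion. That is, whereas the search position was conventionally fixed to portion ⑤, the search is now switched to one of portions ① to ④ according to the pitch peak position. This part is called the phase-adaptive portion, in the sense that it switches adaptively according to the phase. In this example, except for portion ④, the phase-adaptive algebraic codebook is used to search odd-numbered sample points. Because portion ④ is searched only by the phase-adaptive portion, it may use either even- or odd-numbered sample points; FIG. 24 uses even-numbered sample points. For both the fixed algebraic codebook and the phase-adaptive algebraic codebook, it is also possible to use an algebraic codebook in which pulse positions are determined by the indices of multiple channels, as shown in Embodiment 1. P in FIG. 24(a) is the pitch peak position found by pitch peak position detector 427. Phase-adaptive position determiner 428 first determines in which of regions A to D the pitch peak position P lies when one subframe is divided into four as shown in FIG. 24(b). It then determines which of positions ① to ④ in FIG. 24(c) the phase-adaptive algebraic codebook is to occupy. In the example of FIG. 24, P lies in region A of (b), so portion ① is selected for the phase-adaptive algebraic codebook.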
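The region decision made by the phase-adaptive position determiner can be illustrated with a very small sketch. It assumes that the four regions A to D have equal length, which the text does not state, and the function name `phase_adaptive_region` is hypothetical.

```python
def phase_adaptive_region(peak_pos, subframe_len, num_regions=4):
    """Map a pitch peak position to a region index (0 = A, 1 = B, ...).

    The subframe is assumed to be divided into equal-length regions; the
    returned index also selects the phase-adaptive codebook portion.
    """
    if not 0 <= peak_pos < subframe_len:
        raise ValueError("peak position must lie inside the subframe")
    return min(peak_pos * num_regions // subframe_len, num_regions - 1)
```

For an 80-sample subframe, a peak at sample 5 falls in region A (index 0) and a peak at sample 79 falls in region D (index 3). Because the decision is quantized to four regions, a small error in the peak position usually leaves the selected portion unchanged.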
【０１４４】
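Because the phase-adaptive portion is switched in steps in this way, the effect can be contained to some extent even when the calculated pitch peak position differs between the encoder and the decoder (for example, because of a transmission error). Although portions ① to ④ do not overlap in FIG. 24(c), providing overlapping phase-adaptive codebooks makes it possible to handle cases where the pitch peak position lies near a boundary between phase-adaptive codebooks. It is also effective to use the pitch period information to switch the ratio of the lengths of ⑤ and ⑥ in FIG. 24(c): when the pitch period is long, ⑥ is lengthened and ⑤ is shortened; conversely, when the pitch period is short, ⑥ is shortened and ⑤ is lengthened. In this case, the length of ⑥ is made at least as long as the pitch period.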
【０１４５】
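FIG. 25 is a flowchart of the noise codebook search method in this embodiment. Compared with FIG. 14 of Embodiment 3, the first three blocks (calculation of the pitch peak position, determination of the phase-adaptive position, and selection (switching) of the phase-adaptive codebook) are newly added as preprocessing for the three-channel algebraic codebook search. First, as indicated by 427 in FIG. 23, the pitch peak position is calculated using the adaptive code vector and the pitch period. Next, the search range of the phase-adaptive algebraic codebook is determined according to the pitch peak position. The algebraic codebook covering that search range is then selected, and the search positions of each pulse are set. This completes the preprocessing for the three-pulse search. After that, the positions of pulses 1 to 3 are initialized, and the optimal combination of three pulse positions is searched for among the combinations of search positions set for pulses 1 to 3 in the preprocessing. The pulse of each channel is determined in order from channel 1 to channel 3. When the position of the channel 1 pulse (pulse 1) is determined, the error evaluation function used for distortion minimization is computed from pulse 1 alone; when the position of the channel 2 pulse (pulse 2) is then determined, the error evaluation function is computed from pulses 1 and 2. In this way, the error evaluation function is computed in a nested-loop structure in which, each time the pulse of a new channel is added, the terms of the error evaluation function relating to the newly added pulse are added, and the error evaluation function value that minimizes the error and the corresponding combination of pulse positions are obtained. This search method is the same as in the conventional example, except that part of the search positions is switched according to the pitch peak position in the preprocessing before the search loop. When the search of the three-channel algebraic codebook is finished, the two-channel noise codebook is searched. If an excitation vector that gives smaller distortion than the excitation determined by the three-channel algebraic codebook is found, the noise code vector that is the final output is output from this two-channel noise codebook.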
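The nested-loop structure described above, in which each added pulse contributes only its own terms to the evaluation function, can be sketched as follows. The sketch assumes the criterion commonly used in algebraic codebook searches, maximizing C²/E with C the correlation with the target and E the energy of the filtered code vector, and it assumes all pulses have positive polarity. The names `search_three_pulses`, `d`, `phi`, and `tracks` are hypothetical; `tracks` stands for the per-pulse search positions set in the preprocessing.

```python
def search_three_pulses(d, phi, tracks):
    """Exhaustive nested-loop search for three unit pulses.

    d      -- correlation of the target with the impulse response per position
    phi    -- matrix of impulse response correlations, phi[i][j]
    tracks -- three lists of candidate positions, one per pulse
    Returns the best (p1, p2, p3) and its score C*C/E.
    """
    best_score, best_positions = -1.0, None
    for p1 in tracks[0]:
        c1 = d[p1]
        e1 = phi[p1][p1]
        for p2 in tracks[1]:
            # Only the terms involving the newly added pulse 2 are computed.
            c2 = c1 + d[p2]
            e2 = e1 + phi[p2][p2] + 2.0 * phi[p1][p2]
            for p3 in tracks[2]:
                c3 = c2 + d[p3]
                e3 = e2 + phi[p3][p3] + 2.0 * (phi[p1][p3] + phi[p2][p3])
                if e3 <= 0.0:
                    continue
                score = c3 * c3 / e3
                if score > best_score:
                    best_score, best_positions = score, (p1, p2, p3)
    return best_positions, best_score
```

In the phase-adaptive case, only the contents of `tracks` change from subframe to subframe; the loop itself is unchanged, which matches the statement that the search method is the same as the conventional one apart from the preprocessing.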
【０１４６】
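The speech decoding apparatus of this embodiment will now be described with reference to FIG. 26, which is a block diagram of an example of the speech decoding apparatus according to Embodiment 4 of the present invention. Reference numeral 451 denotes an adaptive codebook that receives the previously generated driving excitation vector output from adder 458 and outputs the adaptive code vector specified by information A, transmitted from the encoding apparatus, to multiplier 455 and to algebraic codebook 453, part of which is a phase-adaptive algebraic codebook; 452 denotes a noise codebook, identical to that of the encoding apparatus and having the configuration shown in FIGS. 23 and 24 of this embodiment, that extracts the noise code vector specified by information S, transmitted from the encoding apparatus, from either algebraic codebook 453, part of which is a phase-adaptive algebraic codebook, or noise codebook 454, which consists of an algebraic codebook and a non-algebraic random codebook, and outputs it to multiplier 456; 455 denotes a multiplier that multiplies the adaptive code gain output from gain codebook 460 by the adaptive code vector output from adaptive codebook 451 and outputs the product to adder 458; 456 denotes a multiplier that receives the prediction gain output from gain predictor 459 and the noise code vector output from noise codebook 452 and outputs the product to multiplier 457; 457 denotes a multiplier that multiplies the noise code vector scaled by the prediction gain, output from multiplier 456, by the noise code gain output from gain codebook 460 and outputs the product to adder 458 and gain predictor 459; 458 denotes an adder that performs vector addition of the gain-scaled adaptive code vector output from multiplier 455 and the gain-scaled noise code vector output from multiplier 457 and outputs the result to synthesis filter 462 and adaptive codebook 451; 459 denotes a gain predictor that receives the gain-scaled noise code vector output from multiplier 457 and outputs a prediction gain to multiplier 456; 460 denotes a gain codebook that outputs the adaptive code gain specified by information G, transmitted from the encoding apparatus, to multiplier 455 and the noise code gain to multiplier 457; 461 denotes a linear prediction coefficient decoder that decodes information L transmitted from the encoding apparatus to obtain the quantized linear prediction coefficients and outputs them to synthesis filter 462; and 462 denotes a synthesis filter that receives the driving excitation vector output from adder 458 and the quantized linear prediction coefficients output from linear prediction coefficient decoder 461 and outputs a synthesized speech signal. The decoded speech signal output from the synthesis filter is generally subjected to further filtering or similar processing to improve its perceptual quality.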
【０１４７】
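The operation of the speech decoding apparatus configured as above is described below with reference to FIG. 26. In FIG. 26, information L, A, S, and G is transmitted from the encoding apparatus and input to linear prediction coefficient decoder 461, adaptive codebook 451, noise codebook 452, and gain codebook 460, respectively. On receiving information L, linear prediction coefficient decoder 461 decodes the quantized linear prediction coefficients and outputs them to synthesis filter 462, which is constructed from these coefficients. On receiving information A, adaptive codebook 451 extracts the adaptive code vector specified by A from the adaptive codebook and outputs it to multiplier 455. At this time, the adaptive code vector is also output to algebraic codebook 453, part of which is phase-adaptive. On receiving information S, noise codebook 452 generates the noise code vector specified by S from either algebraic codebook 453, part of which is phase-adaptive, or noise codebook 454, which consists of an algebraic codebook and a random codebook, and outputs it to multiplier 456. In algebraic codebook 453, the phase-adaptive portion is generated adaptively based on the pitch peak position obtained from the adaptive code vector input from adaptive codebook 451. Multiplier 456 multiplies the noise code vector by the prediction gain output from gain predictor 459 and outputs the result to multiplier 457. On receiving information G, gain codebook 460 selects the quantized gains specified by G from the gain codebook and outputs the adaptive code gain to multiplier 455 and the noise code gain to multiplier 457. Multiplier 455 multiplies the adaptive code vector output from adaptive codebook 451 by the adaptive code gain output from gain codebook 460 and outputs the product to adder 458. Multiplier 457 multiplies the noise code vector, which was output from noise codebook 452 and scaled by the prediction gain in multiplier 456, by the noise code gain output from gain codebook 460 and outputs the product to adder 458. The scaled noise code vector output from multiplier 457 is also output to gain predictor 459. Gain predictor 459 uses the noise code vectors previously output from multiplier 457 to predict the gain (logarithmic power) of the current noise code vector, for example by MA prediction, and outputs it to multiplier 456. Adder 458 adds the adaptive code vector component of the driving excitation signal, output from multiplier 455, to the noise code vector component of the driving excitation signal, output from multiplier 457, to generate the driving excitation signal, and outputs it to synthesis filter 462. The driving excitation vector output from adder 458 is also output to the adaptive codebook and used to update it. Synthesis filter 462 synthesizes and outputs a synthesized signal from the driving excitation output from adder 458. The output speech signal may be output as the decoded speech signal as is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.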
【０１４８】
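Thus, according to Embodiment 4, in a configuration having multiple types of algebraic codebooks, part of the algebraic codebook is adaptively changed using the pitch peak position obtained from the adaptive code vector in the mode with few pulses, for which the number of bits is insufficient, so speech quality can be improved.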
【０１４９】
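(Embodiment 5)
FIG. 27 is a block diagram of a speech encoding apparatus according to Embodiment 5 of the present invention. In a noise codebook consisting of two or more channels, the speech encoding apparatus according to Embodiment 5 switches the codebook used according to the combination of channels, so that the mode can be switched without separate mode information.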
【０１５０】
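In FIG. 27, 501 denotes an adaptive codebook that receives and stores the previously generated driving excitation vector from adder 508 and, under a control signal from distortion minimizer 514, outputs an adaptive code vector to multiplier 505; 502 denotes a noise codebook that consists of two types of noise codebook and, under a control signal from distortion minimizer 514, outputs a noise code vector from one of them; 503 denotes an algebraic codebook that is part of noise codebook 502; 504 denotes a noise codebook that is part of noise codebook 502 and consists of an algebraic codebook with a structure different from that of algebraic codebook 503 (a two-channel algebraic codebook) and two types of non-algebraic random codebook (two-channel configuration); 505 denotes a multiplier that receives the adaptive code vector output from adaptive codebook 501 and the adaptive code gain output from the gain codebook and outputs the product to adder 508; 506 denotes a multiplier that receives the noise code vector output from noise codebook 502 and the prediction gain output from gain predictor 509 and outputs the product to multiplier 507; 507 denotes a multiplier that receives the noise code vector scaled by the prediction gain, output from multiplier 506, and the noise code gain output from the gain codebook and outputs the product to adder 508 and gain predictor 509; 508 denotes an adder that receives the adaptive code vector scaled by the adaptive code gain, output from multiplier 505, and the noise code vector scaled by the prediction gain and the noise code gain, output from multiplier 507, and outputs the vector sum to synthesis filter 512 and adaptive codebook 501; 509 denotes a gain predictor that receives the noise code vector scaled by the prediction gain and the noise code gain, output from multiplier 507, and outputs a prediction gain to multiplier 506; 510 denotes a gain codebook that, under a control signal from distortion minimizer 514, outputs the adaptive code gain to multiplier 505 and the noise code gain to multiplier 507; 511 denotes a linear prediction analyzer that receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to synthesis filter 512; 512 denotes a synthesis filter that receives the excitation vector output from adder 508 and the quantized linear prediction coefficients output from linear prediction analyzer 511 and outputs a synthesized speech signal to adder 513; 513 denotes an adder that receives the input speech signal and the synthesized speech signal output from synthesis filter 512, performs vector subtraction, and outputs the result to distortion minimizer 514; and 514 denotes a distortion minimizer that calculates, from the difference vector output from adder 513, the distortion of the synthesized speech signal relative to the input speech signal in a perceptually weighted domain or the like, and controls the outputs of adaptive codebook 501, noise codebook 502, and gain codebook 510 so that this distortion is minimized.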
【０１５１】
The operation of the speech signal encoding apparatus configured as above will be described with reference to FIGS. 27 to 29.
【０１５２】
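First, in FIG. 27, the speech signal is input to linear prediction analyzer 511. Linear prediction analyzer 511 performs linear prediction analysis of the input speech signal and calculates the linear prediction coefficients required by synthesis filter 512. The calculated linear prediction coefficients are quantized and then output to synthesis filter 512. In addition, code L representing the linear prediction coefficients output here is transmitted to the decoding apparatus. Next, synthesis filter 512 is constructed using the quantized linear prediction coefficients output from linear prediction analyzer 511. Distortion minimizer 514 drives this synthesis filter using only adaptive codebook 501 and selects from adaptive codebook 501 the adaptive code vector that gives the smallest distortion. The adaptive codebook represents the periodic component of the speech signal (its period is called the pitch period); normally, a vector one subframe long is extracted starting from the point one pitch period in the past (when one pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector one subframe long). Index A representing the selected adaptive code vector is transmitted to the decoder side. From then on, the adaptive code vector output from adaptive codebook 501 is fixed to the one determined here. Subsequently, synthesis filter 512 is driven using both adaptive codebook 501 and noise codebook 502. Since the adaptive code vector output from adaptive codebook 501 has already been determined, the noise code vector output from noise codebook 502 is varied, and the noise code vector judged by distortion minimizer 514 to give the minimum distortion is selected. Because noise codebook 502 comprises algebraic codebook 503 and noise codebook 504, which consists of an algebraic codebook and two types of non-algebraic random codebook, the vector that minimizes distortion is selected from both codebooks. Index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from adaptive codebook 501 and the noise code vector output from noise codebook 502 have been determined. Finally, distortion minimizer 514 determines the adaptive code gain by which the adaptive code vector is to be multiplied and the noise code gain by which the noise code vector is to be multiplied. Gain codebook 510 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain; the gain vector having the combination of adaptive code gain and noise code gain that minimizes distortion is selected, its adaptive code gain element is output to multiplier 505, and its noise code gain element is output to multiplier 507. In addition, index G representing the selected gain vector is transmitted to the decoding apparatus. In the speech encoding apparatus of this embodiment, to narrow the dynamic range of the noise code gain, the power in the logarithmic domain is predicted by gain predictor 509, and the prediction residual is quantized using the gain codebook. Gain predictor 509 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs it to multiplier 506 as the prediction gain. The noise code vector output from noise codebook 502 is multiplied by the prediction gain in multiplier 506 and by the noise code gain in multiplier 507. Distortion minimizer 514 uses the adaptive code vector, noise code vector, and prediction gain that have been determined prior to selecting the optimal gain vector from gain codebook 510 to determine the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.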
【０１５３】
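FIG. 28 shows the configuration of the two-channel noise codebook 504, which is the characteristic feature of this embodiment. In FIG. 28, 515 denotes the first channel of the algebraic codebook; 516, the second channel of the first random codebook; 517, the first channel of the first random codebook; 518, the second channel of the algebraic codebook; 519, the first channel of the second random codebook; 520, the second channel of the second random codebook; 521, an adder that performs vector addition of the output vector from the first channel of the algebraic codebook and the output vector from the second channel of the algebraic codebook and outputs the result to switch 525; 522, an adder that performs vector addition of the vector output from the first channel of the algebraic codebook and the vector output from the second channel of the first random codebook and outputs the result to switch 525; 523, an adder that performs vector addition of the vector output from the first channel of the first random codebook and the vector output from the second channel of the algebraic codebook and outputs the result to switch 525; 524, an adder that performs vector addition of the vector output from the first channel of the second random codebook and the vector output from the second channel of the second random codebook and outputs the result to switch 525; and 525, a switch that selects and outputs one of the vectors output from adders 521 to 524. The configuration of the noise codebook shown in FIG. 28 is basically the same as that shown in FIG. 20 of Embodiment 3, but differs in that a different random codebook is used when non-algebraic random codebooks are used for both channels. That is, the characteristic feature of this embodiment is that the random codebook used differs between the case where one of the two channels is an algebraic codebook and the other is a random codebook, and the case where both channels are random codebooks.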
【０１５４】
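FIG. 29 is a flowchart of the search of noise codebook 502 in this embodiment. First, the positions of pulses 1 to 3 are initialized, and then the three-pulse (three-channel) algebraic codebook is searched; if a combination of pulse positions gives a smaller error than the initialized combination of three pulse positions, the position of each pulse is updated to it. Next, a codebook search is performed for each of four combinations: the two-pulse (two-channel) algebraic codebook (the first and second channels of the algebraic codebook); the first channel of the algebraic codebook with the second channel of the first random codebook; the first channel of the first random codebook with the second channel of the algebraic codebook; and the first channel of the second random codebook with the second channel of the second random codebook. The candidate that minimizes the error is determined for each of these four combinations. Finally, from the five noise code vectors selected or generated from the four combinations and the three-pulse algebraic codebook, the one that minimizes coding distortion is selected and output. The search methods for the three-pulse algebraic codebook and the two-channel noise codebook are the same as those described in Embodiment 3. Although FIG. 29 searches the three-pulse algebraic codebook first and then the two-channel noise codebook, the search need not be performed in this order.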
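The final selection among the five candidates, and one way the chosen combination could be conveyed without separate mode bits, can be sketched as follows. The text does not specify how index S is laid out, so the disjoint index ranges used by `split_index` are purely an assumption for illustration; the function names and the sub-codebook sizes in the usage note are likewise hypothetical.

```python
def select_best(candidates):
    """Pick the candidate with the smallest coding distortion.

    candidates -- list of (label, noise_code_vector, distortion) tuples,
                  one per searched codebook combination
    """
    return min(candidates, key=lambda c: c[2])


def split_index(index, sizes):
    """Map a single codebook index to (combination number, local index).

    Each combination is assumed to occupy its own contiguous index range,
    so the combination in use is implied by the index value itself.
    """
    for mode, size in enumerate(sizes):
        if index < size:
            return mode, index
        index -= size
    raise ValueError("index outside the codebook")
```

With hypothetical sub-codebook sizes of 8, 4, 4, and 2 entries, index 9 decodes to combination 1 with local index 1. The decoder can therefore tell from the index alone which pair of channel codebooks to use.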
【０１５５】
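The speech decoding apparatus of this embodiment will now be described with reference to FIG. 30, which is a block diagram of an example of the speech decoding apparatus according to Embodiment 5 of the present invention. Reference numeral 551 denotes an adaptive codebook that receives the previously generated driving excitation vector output from adder 558 and outputs the adaptive code vector specified by information A, transmitted from the encoding apparatus, to multiplier 555; 552 denotes a noise codebook, identical to that of the encoding apparatus and having the configuration shown in FIG. 28 of this embodiment, that extracts the noise code vector specified by information S, transmitted from the encoding apparatus, from either algebraic codebook 553 or noise codebook 554, which consists of an algebraic codebook and non-algebraic random codebooks, and outputs it to multiplier 556; 555 denotes a multiplier that multiplies the adaptive code gain output from gain codebook 560 by the adaptive code vector output from adaptive codebook 551 and outputs the product to adder 558; 556 denotes a multiplier that receives the prediction gain output from gain predictor 559 and the noise code vector output from noise codebook 552 and outputs the product to multiplier 557; 557 denotes a multiplier that multiplies the noise code vector scaled by the prediction gain, output from multiplier 556, by the noise code gain output from gain codebook 560 and outputs the product to adder 558 and gain predictor 559; 558 denotes an adder that performs vector addition of the gain-scaled adaptive code vector output from multiplier 555 and the gain-scaled noise code vector output from multiplier 557 and outputs the result to synthesis filter 562 and adaptive codebook 551; 559 denotes a gain predictor that receives the gain-scaled noise code vector output from multiplier 557 and outputs a prediction gain to multiplier 556; 560 denotes a gain codebook that outputs the adaptive code gain specified by information G, transmitted from the encoding apparatus, to multiplier 555 and the noise code gain to multiplier 557; 561 denotes a linear prediction coefficient decoder that decodes information L transmitted from the encoding apparatus to obtain the quantized linear prediction coefficients and outputs them to synthesis filter 562; and 562 denotes a synthesis filter that receives the driving excitation vector output from adder 558 and the quantized linear prediction coefficients output from linear prediction coefficient decoder 561 and outputs a synthesized speech signal. The decoded speech signal output from the synthesis filter is generally subjected to further filtering or similar processing to improve its perceptual quality.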
【０１５６】
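The operation of the speech decoding apparatus configured as above is described below with reference to FIG. 30. In FIG. 30, information L, A, S, and G is transmitted from the encoding apparatus and input to linear prediction coefficient decoder 561, adaptive codebook 551, noise codebook 552, and gain codebook 560, respectively. On receiving information L, linear prediction coefficient decoder 561 decodes the quantized linear prediction coefficients and outputs them to synthesis filter 562, which is constructed from these coefficients. On receiving information A, adaptive codebook 551 extracts the adaptive code vector specified by A from the adaptive codebook and outputs it to multiplier 555. On receiving information S, noise codebook 552 generates the noise code vector specified by S from either algebraic codebook 553 or noise codebook 554, which consists of an algebraic codebook and two types of random codebook, and outputs it to multiplier 556. Multiplier 556 multiplies the noise code vector by the prediction gain output from gain predictor 559 and outputs the result to multiplier 557. On receiving information G, gain codebook 560 selects the quantized gains specified by G from the gain codebook and outputs the adaptive code gain to multiplier 555 and the noise code gain to multiplier 557. Multiplier 555 multiplies the adaptive code vector output from adaptive codebook 551 by the adaptive code gain output from gain codebook 560 and outputs the product to adder 558. Multiplier 557 multiplies the noise code vector, which was output from noise codebook 552 and scaled by the prediction gain in multiplier 556, by the noise code gain output from gain codebook 560 and outputs the product to adder 558. The scaled noise code vector output from multiplier 557 is also output to gain predictor 559. Gain predictor 559 uses the noise code vectors previously output from multiplier 557 to predict the gain (logarithmic power) of the current noise code vector, for example by MA prediction, and outputs it to multiplier 556. Adder 558 adds the adaptive code vector component of the driving excitation signal, output from multiplier 555, to the noise code vector component of the driving excitation signal, output from multiplier 557, to generate the driving excitation signal, and outputs it to synthesis filter 562. The driving excitation vector output from adder 558 is also output to the adaptive codebook and used to update it. Synthesis filter 562 synthesizes and outputs a synthesized signal from the driving excitation output from adder 558. The output speech signal may be output as the decoded speech signal as is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.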
【０１５７】
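Thus, according to Embodiment 5, using random codebooks that store few vectors, switched according to the combination of channels, improves the performance of the noise codebook as a whole.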
【０１５８】
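(Embodiment 6)
FIG. 31 is a block diagram of a speech encoding apparatus according to Embodiment 6 of the present invention. In a noise codebook consisting of two channels, when an algebraic codebook is used for one channel and a random noise codebook for the other, the speech encoding apparatus according to Embodiment 6 aligns the reference point of the noise code vectors stored in the random noise codebook with the position of the pulse output from the algebraic codebook, thereby improving both the utilization efficiency of the random noise codebook and speech quality. In FIG. 31, 601 denotes an adaptive codebook that receives and stores the previously generated driving excitation vector from adder 608 and, under a control signal from distortion minimizer 614, outputs an adaptive code vector to multiplier 605; 602 denotes a noise codebook that consists of two types of noise codebook and, under a control signal from distortion minimizer 614, outputs a noise code vector from one of them; 603 denotes an algebraic codebook that is part of noise codebook 602; 604 denotes a noise codebook that is part of noise codebook 602 and consists of an algebraic codebook with a structure different from that of algebraic codebook 603, a non-algebraic random codebook, and an adaptive random codebook that changes adaptively according to the pulse position of the other channel; 605 denotes a multiplier that receives the adaptive code vector output from adaptive codebook 601 and the adaptive code gain output from the gain codebook and outputs the product to adder 608; 606 denotes a multiplier that receives the noise code vector output from noise codebook 602 and the prediction gain output from gain predictor 609 and outputs the product to multiplier 607; 607 denotes a multiplier that receives the noise code vector scaled by the prediction gain, output from multiplier 606, and the noise code gain output from the gain codebook and outputs the product to adder 608 and gain predictor 609; 608 denotes an adder that receives the adaptive code vector scaled by the adaptive code gain, output from multiplier 605, and the noise code vector scaled by the prediction gain and the noise code gain, output from multiplier 607, and outputs the vector sum to synthesis filter 612 and adaptive codebook 601; 609 denotes a gain predictor that receives the noise code vector scaled by the prediction gain and the noise code gain, output from multiplier 607, and outputs a prediction gain to multiplier 606; 610 denotes a gain codebook that, under a control signal from distortion minimizer 614, outputs the adaptive code gain to multiplier 605 and the noise code gain to multiplier 607; 611 denotes a linear prediction analyzer that receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to synthesis filter 612; 612 denotes a synthesis filter that receives the excitation vector output from adder 608 and the quantized linear prediction coefficients output from linear prediction analyzer 611 and outputs a synthesized speech signal to adder 613; 613 denotes an adder that receives the input speech signal and the synthesized speech signal output from synthesis filter 612, performs vector subtraction, and outputs the result to distortion minimizer 614; and 614 denotes a distortion minimizer that calculates, from the difference vector output from adder 613, the distortion of the synthesized speech signal relative to the input speech signal in a perceptually weighted domain or the like, and controls the outputs of adaptive codebook 601, noise codebook 602, and gain codebook 610 so that this distortion is minimized.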
【０１５９】
The operation of the speech signal encoding apparatus configured as above will be described with reference to FIGS. 31 to 34.
【０１６０】
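First, in FIG. 31, the speech signal is input to linear prediction analyzer 611. Linear prediction analyzer 611 performs linear prediction analysis of the input speech signal and calculates the linear prediction coefficients required by synthesis filter 612. The calculated linear prediction coefficients are quantized and then output to synthesis filter 612. In addition, code L representing the linear prediction coefficients output here is transmitted to the decoding apparatus. Next, synthesis filter 612 is constructed using the quantized linear prediction coefficients output from linear prediction analyzer 611. Distortion minimizer 614 drives this synthesis filter using only adaptive codebook 601 and selects from adaptive codebook 601 the adaptive code vector that gives the smallest distortion. The adaptive codebook represents the periodic component of the speech signal (its period is called the pitch period); normally, a vector one subframe long is extracted starting from the point one pitch period in the past (when one pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector one subframe long). Index A representing the selected adaptive code vector is transmitted to the decoder side. From then on, the adaptive code vector output from adaptive codebook 601 is fixed to the one determined here. Subsequently, synthesis filter 612 is driven using both adaptive codebook 601 and noise codebook 602. Since the adaptive code vector output from adaptive codebook 601 has already been determined, the noise code vector output from noise codebook 602 is varied, and the noise code vector judged by distortion minimizer 614 to give the minimum distortion is selected. Because noise codebook 602 comprises algebraic codebook 603 and noise codebook 604, which consists of an algebraic codebook and two types of non-algebraic random codebook, the vector that minimizes distortion is selected from both codebooks. Index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from adaptive codebook 601 and the noise code vector output from noise codebook 602 have been determined. Finally, distortion minimizer 614 determines the adaptive code gain by which the adaptive code vector is to be multiplied and the noise code gain by which the noise code vector is to be multiplied. Gain codebook 610 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain; the gain vector having the combination of adaptive code gain and noise code gain that minimizes distortion is selected, its adaptive code gain element is output to multiplier 605, and its noise code gain element is output to multiplier 607. In addition, index G representing the selected gain vector is transmitted to the decoding apparatus.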
【０１６１】
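In the speech encoding apparatus of this embodiment, to narrow the dynamic range of the noise code gain, the power in the logarithmic domain is predicted by gain predictor 609, and the prediction residual is quantized using the gain codebook. Gain predictor 609 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs it to multiplier 606 as the prediction gain. The noise code vector output from noise codebook 602 is multiplied by the prediction gain in multiplier 606 and by the noise code gain in multiplier 607. Distortion minimizer 614 uses the adaptive code vector, noise code vector, and prediction gain that have been determined prior to selecting the optimal gain vector from gain codebook 610 to determine the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.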
【０１６２】
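FIG. 32 is a block diagram showing the configuration of noise codebook 604, which is the characteristic feature of this embodiment. FIG. 32 closely resembles the configuration shown in FIG. 28 of Embodiment 5, but differs in that the first random codebook is an adaptive random codebook and adaptation processing by adapters is applied to the output vectors from these random codebooks. In FIG. 32, 615 denotes the first channel of the algebraic codebook; 616, the second channel of the adaptive random codebook; 617, the first channel of the adaptive random codebook; 618, the second channel of the algebraic codebook; 619, the first channel of the random codebook; 620, the second channel of the random codebook; 621, an adapter that receives the vector output from first channel 615 of the algebraic codebook and the vector output from second channel 616 of the adaptive random codebook, aligns the reference point of the vector stored in the adaptive random codebook with the position of the excitation pulse in the vector output from the first channel of the algebraic codebook, and outputs the vector stored in the second channel of the adaptive random codebook to adder 624; 622, an adapter that receives the vector output from second channel 618 of the algebraic codebook and the vector output from first channel 617 of the adaptive random codebook, aligns the reference point of the vector stored in the first channel of the adaptive random codebook with the position of the excitation pulse in the vector output from the second channel of the algebraic codebook, and outputs the vector stored in the first channel of the adaptive random codebook to adder 625; 623, an adder that receives the vectors output from first channel 615 and second channel 618 of the algebraic codebook, performs vector addition, and outputs the result to switch 627; 624, an adder that performs vector addition of the vector output from first channel 615 of the algebraic codebook and the vector output from adapter 621 and outputs the result to switch 627; 625, an adder that performs vector addition of the vector output from adapter 622 and the vector output from second channel 618 of the algebraic codebook and outputs the result to switch 627; 626, an adder that performs vector addition of the vectors output from the first and second channels of the random codebook and outputs the result to switch 627; and 627, a switch that selects and outputs one of the vectors output from adders 623 to 626. In this embodiment, the first and second channels of the algebraic codebook each output a vector containing a single excitation pulse. When pitch periodization or equivalent processing (such as pitch emphasis filtering) is applied to the algebraic codebook, the vectors output from the first and second channels of the algebraic codebook contain multiple excitation pulses; in that case, adapters 621 and 622 align the reference point of the random codebook vector with the position of the earliest pulse in the vector. FIG. 33 schematically shows the processing performed by adapters 621 and 622. In FIG. 33, 628 denotes the first noise code vector, output from channel 1 of the algebraic codebook; 629, the second noise code vector, output from channel 2 of the adaptive random codebook and then subjected to adaptation processing by adapter 621; 630, the noise code vector stored in channel 2 of the adaptive random codebook; and 631, an adder that performs vector addition of the first and second noise code vectors and outputs the result as the final noise code vector. As shown in FIG. 33, vector 630 stored in the adaptive random codebook is shifted by the adapter so that its reference point coincides with the position of the excitation pulse of the first noise code vector, and becomes the second noise code vector. The same applies when the second channel of the algebraic codebook is combined with the first channel of the adaptive random codebook.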
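The shift performed by the adapters can be sketched as follows. The representation of the reference point as a sample index within the stored vector, the zero filling of vacated samples, and the truncation at the subframe boundaries are assumptions made for this example; the text only states that the reference point is aligned with the pulse position. The function name `align_to_pulse` is hypothetical.

```python
def align_to_pulse(stored, reference_point, pulse_pos, subframe_len):
    """Shift a stored random code vector so that its reference point
    coincides with the pulse position of the algebraic codebook channel.

    Samples shifted outside the subframe are dropped and vacated samples
    are filled with zeros (assumed behaviour).
    """
    shift = pulse_pos - reference_point
    out = [0.0] * subframe_len
    for i, value in enumerate(stored):
        j = i + shift
        if 0 <= j < subframe_len:
            out[j] = value
    return out
```

The final noise code vector is then the sample-by-sample sum of the single-pulse vector and the shifted vector. When the algebraic channel carries several pulses after pitch periodization, `pulse_pos` would be the position of the earliest pulse, as described above.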
【０１６３】
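FIGS. 29 and 34 are flowcharts of the noise codebook search method in this embodiment. FIG. 29 is the figure presented in Embodiment 5. Embodiment 6 differs in that an adaptive random codebook is used as the first random codebook and a random codebook as the second random codebook; the parts of the processing that differ are the search of (first channel of the algebraic codebook + second channel of the first random codebook (adaptive random codebook)) and the search of (first channel of the first random codebook (adaptive random codebook) + second channel of the algebraic codebook). These searches differ in that the first random codebook is an adaptive random codebook and adaptation processing is applied. FIG. 34 shows this as a flowchart.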
【０１６４】
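FIG. 34 shows the search of (first channel of the algebraic codebook + second channel of the adaptive random codebook). The search of (second channel of the algebraic codebook + first channel of the adaptive random codebook) follows the same procedure. In FIG. 34, the first channel of the algebraic codebook is searched first, by maximizing expression (1). Multiple candidates that give large values of expression (1) could be retained here, but in this example only the single candidate that maximizes expression (1) is selected, to reduce the amount of computation. This becomes the first noise code vector. Next, all vectors stored in the second channel of the adaptive random codebook are shifted so that their reference points align with the position of the excitation pulse output from the selected first channel of the algebraic codebook. The second channel of the adaptive random codebook is then searched using the shifted vectors. Since the output of the first channel of the algebraic codebook has already been determined, the vector that minimizes coding distortion in combination with it is selected from the second channel of the adaptive random codebook. The selected (shifted) vector from the second channel of the adaptive random codebook is determined as the second noise code vector. The vector sum of the first and second noise code vectors is then output as the final noise code vector.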
【０１６５】
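FIG. 35 shows an example of a decoding apparatus that receives information A (adaptive codebook index), S (noise codebook index), G (gain codebook index), and L (linear prediction coefficient coding information) sent from the speech encoding apparatus of this embodiment of the present invention and performs decoding.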
【０１６６】
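FIG. 35 is a block diagram of an example of the speech decoding apparatus according to Embodiment 6 of the present invention. Reference numeral 651 denotes an adaptive codebook that receives the previously generated driving excitation vector output from adder 658 and outputs the adaptive code vector specified by information A, transmitted from the encoding apparatus, to multiplier 655; 652 denotes a noise codebook, identical to that of the encoding apparatus and having the configuration shown in FIGS. 32 and 33 of this embodiment, that extracts the noise code vector specified by information S, transmitted from the encoding apparatus, from either algebraic codebook 653 or noise codebook 654, which consists of an algebraic codebook and two types of non-algebraic random codebook, and outputs it to multiplier 656; 655 denotes a multiplier that multiplies the adaptive code gain output from gain codebook 660 by the adaptive code vector output from adaptive codebook 651 and outputs the product to adder 658; 656 denotes a multiplier that receives the prediction gain output from gain predictor 659 and the noise code vector output from noise codebook 652 and outputs the product to multiplier 657; 657 denotes a multiplier that multiplies the noise code vector scaled by the prediction gain, output from multiplier 656, by the noise code gain output from gain codebook 660 and outputs the product to adder 658 and gain predictor 659; 658 denotes an adder that performs vector addition of the gain-scaled adaptive code vector output from multiplier 655 and the gain-scaled noise code vector output from multiplier 657 and outputs the result to synthesis filter 662 and adaptive codebook 651; 659 denotes a gain predictor that receives the gain-scaled noise code vector output from multiplier 657 and outputs a prediction gain to multiplier 656; 660 denotes a gain codebook that outputs the adaptive code gain specified by information G, transmitted from the encoding apparatus, to multiplier 655 and the noise code gain to multiplier 657; 661 denotes a linear prediction coefficient decoder that decodes information L transmitted from the encoding apparatus to obtain the quantized linear prediction coefficients and outputs them to synthesis filter 662; and 662 denotes a synthesis filter that receives the driving excitation vector output from adder 658 and the quantized linear prediction coefficients output from linear prediction coefficient decoder 661 and outputs a synthesized speech signal. The decoded speech signal output from the synthesis filter is generally subjected to further filtering or similar processing to improve its perceptual quality.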
【０１６７】
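The operation of the speech decoding apparatus configured as above is described below with reference to FIG. 35. In FIG. 35, information L, A, S, and G is transmitted from the encoding apparatus and input to linear prediction coefficient decoder 661, adaptive codebook 651, noise codebook 652, and gain codebook 660, respectively. On receiving information L, linear prediction coefficient decoder 661 decodes the quantized linear prediction coefficients and outputs them to synthesis filter 662, which is constructed from these coefficients. On receiving information A, adaptive codebook 651 extracts the adaptive code vector specified by A from the adaptive codebook and outputs it to multiplier 655. On receiving information S, noise codebook 652 generates the noise code vector specified by S from either algebraic codebook 653 or noise codebook 654, which consists of an algebraic codebook, a random codebook, and an adaptive random codebook, and outputs it to multiplier 656. Multiplier 656 multiplies the noise code vector by the prediction gain output from gain predictor 659 and outputs the result to multiplier 657. On receiving information G, gain codebook 660 selects the quantized gains specified by G from the gain codebook and outputs the adaptive code gain to multiplier 655 and the noise code gain to multiplier 657. Multiplier 655 multiplies the adaptive code vector output from adaptive codebook 651 by the adaptive code gain output from gain codebook 660 and outputs the product to adder 658. Multiplier 657 multiplies the noise code vector, which was output from noise codebook 652 and scaled by the prediction gain in multiplier 656, by the noise code gain output from gain codebook 660 and outputs the product to adder 658. The scaled noise code vector output from multiplier 657 is also output to gain predictor 659. Gain predictor 659 uses the noise code vectors previously output from multiplier 657 to predict the gain (logarithmic power) of the current noise code vector, for example by MA prediction, and outputs it to multiplier 656. Adder 658 adds the adaptive code vector component of the driving excitation signal, output from multiplier 655, to the noise code vector component of the driving excitation signal, output from multiplier 657, to generate the driving excitation signal, and outputs it to synthesis filter 662. The driving excitation vector output from adder 658 is also output to the adaptive codebook and used to update it. Synthesis filter 662 synthesizes and outputs a synthesized signal from the driving excitation output from adder 658. The output speech signal may be output as the decoded speech signal as is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.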
【０１６８】
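Thus, according to Embodiment 6, in a noise codebook consisting of two channels, when an algebraic codebook is used for one channel and a random noise codebook for the other, aligning the reference point of the noise code vectors stored in the random codebook with the position of the pulse output from the algebraic codebook improves both the utilization efficiency of the random codebook and speech quality.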
【０１６９】
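(Embodiment 7)
FIG. 36 is a block diagram showing a speech signal transmitter and receiver equipped with the speech encoding or decoding apparatus of any of Embodiments 1 to 6 of the present invention. In FIG. 36, 701 denotes a speech signal input device, such as a microphone, that converts a speech signal into an electrical signal and outputs it to A/D converter 702; 702 denotes an A/D converter that converts the analog speech signal output from the speech signal input device into a digital signal and outputs it to speech encoder 703; 703 denotes a speech encoder that performs speech encoding with the speech encoding apparatus of any of Embodiments 1 to 6 of the present invention and outputs the result to RF modulator 704; 704 denotes an RF modulator that converts the speech information encoded by speech encoder 703 into a signal for transmission over a propagation medium such as radio waves and outputs it to transmitting antenna 705; 705 denotes a transmitting antenna that transmits the signal output from RF modulator 704 as a radio wave; and 706 denotes the radio wave transmitted from transmitting antenna 705. Reference numeral 707 denotes a transmitting apparatus comprising A/D converter 702, speech encoder 703, and RF modulator 704. Further, 708 denotes a receiving antenna that receives radio wave 706 and outputs it to RF demodulator 711; 711 denotes an RF demodulator that converts the received signal input from receiving antenna 708 into an encoded speech signal and outputs it to speech decoder 712; 712 denotes a speech decoder that receives the encoded speech signal output from the RF demodulator, decodes it with the speech decoding apparatus described in Embodiment 6 of the present invention, and outputs the decoded speech signal to D/A converter 713; 713 denotes a D/A converter that receives the decoded speech signal from speech decoder 712, converts it into an analog speech signal, and outputs it to speech output device 710; and 710 denotes a speech output device, such as a loudspeaker, that receives the analog speech signal from the D/A converter and outputs speech. Reference numeral 709 denotes a receiving apparatus comprising RF demodulator 711, speech decoder 712, and D/A converter 713.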
【０１７０】
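The speech signal transmitter and receiver configured as above are described with reference to FIG. 36. First, speech is converted into an electrical analog signal by speech input device 701 and output to A/D converter 702. The analog speech signal is then converted into a digital speech signal by A/D converter 702 and output to speech encoder 703. Speech encoder 703 then performs speech encoding and outputs the encoded information to RF modulator 704. The RF modulator then performs the operations needed to transmit the encoded speech information as a radio wave, such as modulation, amplification, and code spreading, and outputs the result to transmitting antenna 705. Finally, radio wave 706 is transmitted from transmitting antenna 705. In the receiver, meanwhile, radio wave 706 is received by receiving antenna 708, and the received signal is sent to RF demodulator 711. RF demodulator 711 performs the processing needed to convert the radio signal into encoded information, such as code despreading and demodulation, and outputs the encoded information to speech decoder 712. Speech decoder 712 decodes the encoded information and outputs a digital decoded speech signal to D/A converter 713. The D/A converter converts the digital decoded speech signal output from speech decoder 712 into an analog decoded speech signal. Finally, the speech output device converts the electrical analog decoded speech signal into decoded speech and outputs it.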
【０１７１】
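The transmitting and receiving apparatuses described above can be used as a mobile station or base station apparatus of mobile communication equipment such as mobile telephones. The medium for transmitting information is not limited to radio waves as described in this embodiment; optical signals and the like can also be used, as can wired transmission lines.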
【０１７２】
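The speech encoding or decoding apparatus described in Embodiments 1 to 6 and the transmitting and receiving apparatuses described in Embodiment 7 can also be implemented as software recorded on a recording medium such as a magnetic disk, magneto-optical disk, or ROM cartridge; by using such a recording medium, a personal computer or the like can implement the speech encoding/decoding apparatus and the transmitting/receiving apparatus.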
【０１７３】
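[Effects of the Invention]
As described in detail above, according to the present invention, by using, in addition to the bits allocated to a pulse, the bit information allocated to at least one other pulse to represent the position of one pulse of the algebraic codebook, the search range of excitation pulses can be extended to at least twice its length without increasing the number of bits.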
【０１７４】
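The present invention also, by having multiple types of algebraic codebooks, improves the quality of speech with a short pitch period by making effective use of an algebraic codebook with many pulses, for which the number of bits is insufficient, and improves the quality of voiced onset portions and the like by using an algebraic codebook with few pulses, for which the number of bits is sufficient.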
【０１７５】
The present invention also improves speech quality by using an algebraic codebook together with a random noise codebook.
【０１７６】
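The present invention also improves speech quality, in a configuration having multiple types of algebraic codebooks, by adaptively changing part of the algebraic codebook, using the pitch peak position obtained from the adaptive code vector, in the mode with few pulses, for which the number of bits is insufficient.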
【０１７７】
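The present invention also makes it possible, in a noise codebook consisting of two or more channels, to switch modes without separate mode information by switching the codebook used according to the combination of channels; in addition, making part of each channel an algebraic codebook reduces the amount of computation and memory.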
【０１７８】
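The present invention also, in a noise codebook consisting of two channels, when an algebraic codebook is used for one channel and a random noise codebook for the other, improves both the utilization efficiency of the random noise codebook and speech quality by aligning the reference point of the noise code vectors stored in the random noise codebook with the position of the pulse output from the algebraic codebook.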
【０１７９】
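The present invention also, by providing the above speech encoding or decoding apparatus as a speech encoder or decoder, makes it possible to realize a transmitting or receiving apparatus that delivers higher speech quality.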
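[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the configuration of the algebraic codebook searcher in Embodiment 1 of the present invention
[FIG. 2] Schematic diagram showing the pulse search positions of the algebraic codebook in Embodiment 1 of the present invention
[FIG. 3] Table showing the contents of the algebraic codebook in Embodiment 1 of the present invention
[FIG. 4] Flowchart showing the algebraic codebook search method in Embodiment 1 of the present invention
[FIG. 5] Example program showing the algebraic codebook search method in Embodiment 1 of the present invention
[FIG. 6] Block diagram showing the configuration of the speech encoding apparatus in Embodiment 1 of the present invention
[FIG. 7] Block diagram showing the configuration of the speech decoding apparatus in Embodiment 1 of the present invention
[FIG. 8] Block diagram showing the configuration of the speech encoding apparatus in Embodiment 2 of the present invention
[FIG. 9] Flowchart showing the noise codebook search method in Embodiment 2 of the present invention
[FIG. 10] Schematic diagram showing the pulse search positions of the algebraic codebook in Embodiment 2 of the present invention
[FIG. 11] Table showing the contents of the algebraic codebook in Embodiment 2 of the present invention
[FIG. 12] Block diagram showing the configuration of the speech decoding apparatus in Embodiment 2 of the present invention
[FIG. 13] Block diagram showing the configuration of the speech encoding apparatus in Embodiment 3 of the present invention
[FIG. 14] Flowchart showing the noise codebook search method in Embodiment 3 of the present invention
[FIG. 15] Flowchart showing the two-pulse algebraic codebook search method in Embodiment 3 of the present invention
[FIG. 16] Flowchart showing the combined search of the first channel of the algebraic codebook and the second channel of the random codebook in Embodiment 3 of the present invention
[FIG. 17] Flowchart showing the combined search of the first channel of the random codebook and the second channel of the algebraic codebook in Embodiment 3 of the present invention
[FIG. 18] Flowchart showing the combined search of the first channel of the random codebook and the second channel of the random codebook in Embodiment 3 of the present invention
[FIG. 19] Table showing the contents of the algebraic codebook in Embodiment 3 of the present invention
[FIG. 20] Block diagram showing the configuration of the noise codebook in Embodiment 3 of the present invention
[FIG. 21] Block diagram showing the configuration of the speech decoding apparatus in Embodiment 3 of the present invention
[FIG. 22] Block diagram showing the configuration of the speech encoding apparatus in Embodiment 4 of the present invention
[FIG. 23] Block diagram showing the configuration of the phase-adaptive algebraic codebook in Embodiment 4 of the present invention
[FIG. 24] Schematic diagram showing the pulse search positions of the phase-adaptive algebraic codebook in Embodiment 4 of the present invention
[FIG. 25] Flowchart showing the noise codebook search method in Embodiment 4 of the present invention
[FIG. 26] Block diagram showing the configuration of the speech decoding apparatus in Embodiment 4 of the present invention
[FIG. 27] Block diagram showing the configuration of the speech encoding apparatus in Embodiment 5 of the present invention
[FIG. 28] Block diagram showing the configuration of the two-channel noise codebook in Embodiment 5 of the present invention
[FIG. 29] Flowchart showing the noise codebook search method in Embodiment 5 of the present invention
[FIG. 30] Block diagram showing the configuration of the speech decoding apparatus in Embodiment 5 of the present invention
[FIG. 31] Block diagram showing the configuration of the speech encoding apparatus in Embodiment 6 of the present invention
[FIG. 32] Block diagram showing the configuration of the two-channel noise codebook in Embodiment 6 of the present invention
[FIG. 33] Schematic diagram showing the principle of the adaptive random codebook in Embodiment 6 of the present invention
[FIG. 34] Flowchart showing the combined search of the first channel of the algebraic codebook and the second channel of the adaptive random codebook in Embodiment 6 of the present invention
[FIG. 35] Block diagram showing the configuration of the speech decoding apparatus in Embodiment 6 of the present invention
[FIG. 36] Block diagram showing the configuration of the transmitting apparatus and receiving apparatus in Embodiment 7 of the present invention
[FIG. 37] Block diagram showing the configuration of a conventional CELP speech encoding apparatus using an algebraic codebook
[FIG. 38] Schematic diagram showing the pulse search positions of a conventional algebraic codebook
[FIG. 39] Table showing the contents of a conventional algebraic codebook
[FIG. 40] Flowchart showing a conventional algebraic codebook search method
[FIG. 41] Program showing a conventional algebraic codebook search method
[FIG. 42] Block diagram showing the configuration of a conventional algebraic codebook searcher
[FIG. 43] Schematic diagram showing the pulse search positions of a conventional algebraic codebook
[FIG. 44] Table showing the contents of a conventional algebraic codebook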
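[Explanation of Reference Numerals]
101 Pulse 1 index generator
102 Pulse 2 index generator
103 Pulse 3 index generator
104 First index/pulse position converter
105 Second index/pulse position converter
106 Third index/pulse position converter
107 Distortion evaluation function calculator
108 Distortion minimizer
151 Adaptive codebook
152, 172 Adjacent-channel-dependent algebraic codebook
153 to 155 Multipliers
156 Adder
157 Gain prediction analyzer
158 Gain codebook
159 Linear predictor
160 Synthesis filter
161 Adder
162 Distortion minimizer
203 First algebraic codebook
204 Second algebraic codebook
253 First algebraic codebook
254 Second algebraic codebook
304 Noise codebook consisting of an algebraic codebook and a non-algebraic random codebook
354 Noise codebook consisting of an algebraic codebook and a random codebook
403 Algebraic codebook, part of which is phase-adaptive
404 Noise codebook consisting of an algebraic codebook and a non-algebraic random codebook
453 Algebraic codebook, part of which is a phase-adaptive algebraic codebook
454 Noise codebook consisting of an algebraic codebook and a random codebook
554 Noise codebook consisting of an algebraic codebook and two types of random codebook
604 Noise codebook consisting of an algebraic codebook, a non-algebraic random codebook, and an adaptive random codebook
654 Noise codebook consisting of an algebraic codebook, an adaptive random codebook, and a random codebook
[0001]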
[Technical Field of the Invention]
The present invention relates to a CELP (Code Excited Linear Prediction) type excitation signal encoding apparatus, an excitation signal decoding apparatus, methods therefor, and a recording medium, for use in mobile communication systems and the like that encode and transmit speech signals.
[0002]
[Prior art]
In the fields of digital mobile communication and speech storage, speech encoding apparatuses that compress speech information and encode it with high efficiency are used to make effective use of radio waves and storage media. Among these, methods based on CELP (Code Excited Linear Prediction) have been widely put into practical use at medium and low bit rates. CELP is described in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-quality Speech at Very Low Bit Rates," Proc. ICASSP-85, 25.1.1, pp. 937-940, 1985.
[0003]
The CELP speech coding method divides speech into frames of a fixed length (about 5 ms to 50 ms), performs linear prediction of the speech for each frame, and encodes the prediction residual (excitation signal) of the per-frame linear prediction using an adaptive code vector and a noise code vector composed of known waveforms. The adaptive code vector is selected from an adaptive codebook that stores previously generated driving excitation vectors, and the noise code vector is selected from a noise codebook that stores a predetermined number of vectors with predetermined shapes prepared in advance.
[0004]
As the noise code vector stored in the noise code book, a random noise sequence vector, a vector generated by arranging several pulses at different positions, or the like is used.
[0005]
A typical example of the latter is CS-ACELP (Conjugate Structure and Algebraic CELP), which was recommended as an international standard by ITU-T in 1996. CS-ACELP is described in "Recommendation G.729: Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)", March 1996.
[0006]
CS-ACELP uses an algebraic codebook as its noise codebook. The noise code vector generated from the CS-ACELP algebraic codebook is a vector in which four impulses with an amplitude of −1 or +1 are placed in a subframe of 40 samples (5 ms); all samples other than the four positions where pulses are placed are basically zero. Since the absolute value of the amplitude is fixed at 1, only the position and polarity (positive or negative) of each pulse need to be represented in order to represent the excitation vector. For this reason, there is no need to store 40-dimensional (subframe-length) vectors in the codebook, and no memory is required for codebook storage. Furthermore, since the vector contains only four pulses of amplitude 1, the amount of computation required for the codebook search can be greatly reduced.
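The point that an algebraic code vector is fully described by pulse positions and polarities, with no stored vectors, can be illustrated with a short sketch. The function name `algebraic_code_vector` and the example positions are hypothetical; the sketch does not reproduce the position tables of any particular standard.

```python
def algebraic_code_vector(positions, signs, subframe_len=40):
    """Build a sparse code vector from pulse positions and polarities.

    Each pulse has amplitude +1 or -1; every other sample is zero, so the
    vector can be generated on the fly instead of being stored.
    """
    vector = [0] * subframe_len
    for pos, sign in zip(positions, signs):
        if sign not in (-1, 1):
            raise ValueError("pulse polarity must be +1 or -1")
        vector[pos] += sign
    return vector
```

For example, four pulses at positions 0, 11, 22, and 38 with polarities +1, −1, +1, −1 give a 40-sample vector with exactly four non-zero entries.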
[0007]
A conventional example of the CS-ACELP type coding apparatus will be specifically described below with reference to FIG.
[0008]
FIG. 37 is a basic block diagram of a CS-ACELP speech encoding apparatus. In the figure, 1 denotes an adaptive codebook that receives the previously generated driving excitation vector output from adder 6 and, on receiving a control signal from distortion minimizer 12, outputs an adaptive code vector to multiplier 3; 2 denotes an algebraic codebook that, on receiving a control signal from distortion minimizer 12, outputs a noise code vector to multiplier 4; 3 denotes a multiplier that multiplies the adaptive code gain output from gain codebook 8 by the adaptive code vector output from adaptive codebook 1 and outputs the product to adder 6; 4 denotes a multiplier that receives the prediction gain output from gain predictor 7 and the noise code vector output from algebraic codebook 2 and outputs the product to multiplier 5; 5 denotes a multiplier that multiplies the noise code vector scaled by the prediction gain, output from multiplier 4, by the noise code gain output from gain codebook 8 and outputs the product to adder 6 and gain predictor 7; 6 denotes an adder that performs vector addition of the gain-scaled adaptive code vector output from multiplier 3 and the gain-scaled noise code vector output from multiplier 5 and outputs the result to synthesis filter 10 and adaptive codebook 1; 7 denotes a gain predictor that receives the gain-scaled noise code vector output from multiplier 5 and outputs a prediction gain to multiplier 4; 8 denotes a gain codebook that, under a control signal from distortion minimizer 12, outputs the adaptive code gain to multiplier 3 and the noise code gain to multiplier 5; 9 denotes a linear prediction analyzer that performs linear prediction analysis on the input speech signal and outputs linear prediction coefficients to synthesis filter 10; 10 denotes a synthesis filter that receives the driving excitation vector output from adder 6 and the linear prediction coefficients output from linear prediction analyzer 9 and outputs a synthesized speech signal to adder 11; 11 denotes an adder that calculates the difference between the input speech signal and the synthesized speech signal output from synthesis filter 10; and 12 denotes a coding distortion minimizer that calculates the coding distortion from the error signal output from adder 11 and outputs control signals to adaptive codebook 1, algebraic codebook 2, and gain codebook 8 so as to minimize the coding distortion. Index A of the adaptive code vector, index S of the noise code vector, and index G of the gain vector that minimize distortion, as determined by distortion minimizer 12, are each transmitted to the decoding apparatus. The linear prediction coefficients obtained by the linear prediction analyzer are quantized and then transmitted to the decoding apparatus as quantized linear prediction coefficients L.
[0009]
The operation of the CS-ACELP speech coding apparatus having the above configuration will be described below with reference to FIGS.
[0010]
First, in FIG. 37, the speech signal is input to the linear prediction analyzer 9. The linear prediction analyzer 9 performs a linear prediction analysis of the input speech signal and calculates a linear prediction coefficient necessary for the synthesis filter 10. The calculated linear prediction coefficient is quantized and then output to the synthesis filter 10. Also, a code L representing the linear prediction coefficient output here is transmitted to the decoding device.
[0011]
Next, the synthesis filter 10 is configured using the quantized linear prediction coefficient output from the linear prediction analyzer 9. The distortion minimizer 12 drives the synthesis filter 10 using only the adaptive codebook 1 and selects the adaptive code vector having the smallest distortion from the adaptive codebook 1. Thereafter, the adaptive code vector output from the adaptive codebook 1 is fixed to the adaptive code vector determined here.
[0012]
Subsequently, the synthesis filter 10 is driven using both the adaptive codebook 1 and the algebraic codebook 2. At this time, since the adaptive code vector output from the adaptive codebook 1 has already been determined, the noise code vector output from the algebraic codebook 2 is varied, and the distortion minimizer 12 selects the noise code vector that minimizes the distortion.
[0013]
The adaptive code vector output from the adaptive codebook 1 and the noise code vector output from the algebraic codebook 2 are determined by the processing so far.
[0014]
Finally, the adaptive code gain to be multiplied by the adaptive code vector and the noise code gain to be multiplied by the noise code vector are determined by the distortion minimizer 12. The gain codebook 8 stores two-dimensional vectors (gain vectors), each having an adaptive code gain and a noise code gain as elements. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to the multiplier 3, and its noise code gain element is output to the multiplier 5. Also, an index G representing the selected gain vector is transmitted to the decoding device.
[0015]
In the CS-ACELP type speech encoding apparatus, in order to narrow the dynamic range of the noise code gain, the power is predicted in the logarithmic domain by the gain predictor 7, and the prediction residual is quantized using the gain codebook 8. The gain predictor 7 predicts the power of the current noise code vector using the power of the noise code vectors generated in the past, and outputs the prediction result to the multiplier 4 as a prediction gain.
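As a rough illustration of log-domain gain prediction, the following sketch predicts a linear gain from past log energies with a moving-average predictor. It is not taken from the patent or from the CS-ACELP specification: the function names, the dB convention, the coefficient values, and the "most recent first" ordering are all assumptions made for this example.

```python
import math


def log_energy_db(vector):
    """Mean energy of a vector in dB, floored to avoid log(0)."""
    mean_sq = sum(x * x for x in vector) / len(vector)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def predicted_gain(past_energies_db, ma_coeffs, mean_energy_db=0.0):
    """Predict the current gain from past log-domain energies.

    past_energies_db: past log energies in dB, most recent first (assumed layout).
    ma_coeffs: moving-average prediction coefficients (hypothetical values).
    """
    predicted_db = mean_energy_db + sum(
        b * e for b, e in zip(ma_coeffs, past_energies_db)
    )
    # Convert the predicted log-domain power back to a linear amplitude gain.
    return 10.0 ** (predicted_db / 20.0)


# Hypothetical history of two past subframes and two coefficients.
gain = predicted_gain([20.0, 0.0], [0.5, 0.5])
```

Predicting in the log domain means the gain codebook only has to cover the ratio between the true gain and the prediction, which is what narrows the dynamic range.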
[0016]
The noise code vector output from the algebraic codebook 2 is multiplied by the prediction gain in the multiplier 4 and by the noise code gain in the multiplier 5. Using the adaptive code vector, the noise code vector, and the prediction gain that have already been determined, the distortion minimizer 12 selects the optimum gain vector from the gain codebook 8, that is, the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
[0017]
The configuration of the algebraic codebook, which is one of the features of CS-ACELP, will be described below with reference to FIGS. 38 and 39.
The CS-ACELP algebraic codebook consists of four channels. Each channel outputs one pulse with an amplitude of +1 or -1. The position of the pulse output from each channel is restricted, and the pulse can be placed only at a position within a predetermined range. In CS-ACELP, excitation signals are encoded in units of subframes of 40 samples (5 ms). FIG. 38(a) shows each sample point in one subframe. The 40 sample points are divided into the four groups shown in FIGS. 38(b) to (e). That is, when the sample points are numbered 0, 1, 2, 3, ..., 39 in order from the first sample point, FIG. 38(b) shows the group of sample points whose numbers are divisible by 5, that is, 0, 5, 10, ..., 35. FIG. 38(c) similarly shows the group of sample points whose numbers leave a remainder of 1 when divided by 5, that is, 1, 6, 11, ..., 36. FIG. 38(d) shows the group of sample points whose numbers leave a remainder of 2 when divided by 5, that is, 2, 7, 12, ..., 37. FIG. 38(e) shows the group of sample points whose numbers leave a remainder of 3 or 4 when divided by 5, that is, 3, 8, 13, ..., 38 and 4, 9, 14, ..., 39.
One of the sample points in each group is selected, and a pulse having an amplitude of +1 or -1 is placed there, as shown in FIGS. 38(g) to (j). The combination of the four pulses placed in this way is the noise code vector output from the algebraic codebook (FIG. 38(f)).
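The grouping described above can be reproduced with a short sketch. The function name `cs_acelp_position_groups` is hypothetical; the grouping rule itself (remainders 0, 1, 2, and 3-or-4 modulo 5 over 40 samples) follows the text. The bit counts are simply the number of bits needed to index each group.

```python
SUBFRAME_LEN = 40
STEP = 5


def cs_acelp_position_groups(subframe_len=SUBFRAME_LEN, step=STEP):
    """Split sample points into four groups by remainder modulo `step`."""
    by_remainder = {
        r: [n for n in range(subframe_len) if n % step == r] for r in range(step)
    }
    return [
        by_remainder[0],                    # channel 1: 0, 5, ..., 35
        by_remainder[1],                    # channel 2: 1, 6, ..., 36
        by_remainder[2],                    # channel 3: 2, 7, ..., 37
        by_remainder[3] + by_remainder[4],  # channel 4: 3, 8, ..., 38 and 4, 9, ..., 39
    ]


groups = cs_acelp_position_groups()
# Bits needed to index one position inside each group.
bits_per_channel = [(len(g) - 1).bit_length() for g in groups]
```

Three groups hold 8 positions and the fourth holds 16, so the position information costs 3 + 3 + 3 + 4 bits, with one further bit per pulse for polarity.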
[0018]
As described above, in the algebraic codebook 2, the only information necessary for expressing the noise code vector is the amplitude and position of each pulse, and the algebraic codebook is expressed by pulse amplitude information and pulse position information as shown in FIG. 39.
Next, a method for determining the amplitude and position of the pulse from each group will be described. The amplitude of each pulse is determined before the position search is performed, in order to greatly reduce the amount of calculation in the search. It is determined so that every term of the evaluation function used for distortion minimization is positive.
Next, a pulse position search method will be described with reference to FIGS. 40 and 41.
FIG. 40 is a flowchart showing the nested loop structure used for the pulse position search. FIG. 40 shows a case where the number of pulses is three. The pulses are determined sequentially from channel 1 to channel 3. When the position of the pulse of channel 1 (pulse 1) is determined, the error evaluation function used for distortion minimization is calculated from pulse 1 alone. When the position of the pulse of channel 2 (pulse 2) is determined, the error evaluation function is calculated from pulse 1 and pulse 2. In this manner, the error evaluation function is obtained in a nested loop structure in which the terms related to a newly added pulse are added each time a pulse of a new channel is added. FIG. 41 is a programmatic representation of FIG. 40.
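The nested accumulation can be sketched as follows. This is not the program of FIG. 41. It assumes the criterion commonly used in ACELP-type searches, maximizing the squared correlation divided by the energy, with `d` (correlation between target and filtered unit pulses) and `phi` (correlation matrix of the filtered unit pulses) supplied by the caller and with pulse signs already folded into them. All names are hypothetical.

```python
def nested_pulse_search(d, phi, tracks):
    """Three-level nested search over one position per channel.

    Each loop level adds only the terms contributed by the newly fixed pulse.
    Returns (best_positions, best_score).
    """
    best_score, best_positions = -1.0, None
    for m1 in tracks[0]:
        corr1 = d[m1]                  # terms from pulse 1 only
        energy1 = phi[m1][m1]
        for m2 in tracks[1]:
            corr2 = corr1 + d[m2]      # add pulse 2 terms
            energy2 = energy1 + phi[m2][m2] + 2.0 * phi[m1][m2]
            for m3 in tracks[2]:
                corr3 = corr2 + d[m3]  # add pulse 3 terms
                energy3 = energy2 + phi[m3][m3] + 2.0 * (phi[m1][m3] + phi[m2][m3])
                score = corr3 * corr3 / energy3
                if score > best_score:
                    best_score, best_positions = score, (m1, m2, m3)
    return best_positions, best_score


# Toy example with six sample points and an identity correlation matrix.
toy_d = [1.0, 2.0, 3.0, 4.0, 0.5, 6.0]
toy_phi = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
toy_tracks = [[0, 3], [1, 4], [2, 5]]
best_positions, best_score = nested_pulse_search(toy_d, toy_phi, toy_tracks)
```

Carrying partial sums from the outer loops into the inner loops avoids recomputing the terms of pulses that have already been fixed.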
[0019]
An example of a pulse search apparatus that implements the pulse search method described above is shown in FIG. 42. Here, the algebraic codebook has three channels and the number of pulses is three. In FIG. 42, reference numeral 13 denotes a pulse 1 index generator that outputs an index representing the position of pulse 1, 14 denotes a pulse 2 index generator that outputs an index representing the position of pulse 2, and 15 denotes a pulse 3 index generator that outputs an index representing the position of pulse 3. Reference numeral 16 denotes a first index/pulse position converter that receives the pulse 1 index output from the pulse 1 index generator 13, converts it into the actual position of pulse 1, and outputs the position to the distortion evaluation function calculator 19. Similarly, 17 denotes a second index/pulse position converter that receives the pulse 2 index output from the pulse 2 index generator 14, converts it into the actual position of pulse 2, and outputs the position to the distortion evaluation function calculator 19, and 18 denotes a third index/pulse position converter that receives the pulse 3 index output from the pulse 3 index generator 15, converts it into the actual position of pulse 3, and outputs the position to the distortion evaluation function calculator 19. Reference numeral 19 denotes a distortion evaluation function calculator that receives the pulse positions output from the first index/pulse position converter 16, the second index/pulse position converter 17, and the third index/pulse position converter 18, calculates the distortion evaluation function, and outputs the distortion evaluation function value and the corresponding pulse positions to the distortion minimizer 20. Reference numeral 20 denotes a distortion minimizer that receives the distortion evaluation function value and the position of each pulse output from the distortion evaluation function calculator 19 and outputs the combination of pulses that minimizes the distortion.
[0020]
Further, a configuration example of a conventional algebraic codebook for dealing with a longer frame (subframe) length will be described with reference to FIGS. 43 and 44.
[0021]
Again, the algebraic codebook has 3 channels and 3 excitation pulses. In FIG. 43, first, one subframe (a) which is the entire search range is classified into two groups, an even sample point group (b) and an odd sample point group (c). Then, the sample points in each group are further divided into three groups, and each is used as a search position for pulses 1 to 3 for each channel. The method of dividing into three is the same as that shown in FIG. 38 of the CS-ACELP described above.
[0022]
In this way, in addition to limiting the search range for each channel, dividing the search into an even-sample-point mode and an odd-sample-point mode allows all sample points in a long subframe to be searched with a smaller number of bits.
[0023]
In this case, the algebraic codebook is as shown in FIG. 44, and one bit of mode information is added to the pulse amplitude and pulse position information. Such a technique is described in "Recommendation G.723.1: Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 & 6.3 kbit/s", March 1996.
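The bit saving from the even/odd mode can be checked with a small sketch. The subframe length of 48 samples, the interleaved split of each grid among three channels, and the function names are assumptions made for this example; they are not values given in the text or in G.723.1.

```python
def mode_pulse_position(mode, channel, index, num_channels=3):
    """Map (mode, channel, index) to a sample position.

    mode 0 selects the even grid and mode 1 the odd grid; within a grid the
    points are shared among the channels in an interleaved manner.
    """
    grid_index = num_channels * index + channel
    return 2 * grid_index + mode


def position_bits(subframe_len, num_channels=3, use_mode=True):
    """Total bits needed for the pulse positions, with or without a mode bit."""
    points = subframe_len // 2 if use_mode else subframe_len
    per_channel = points // num_channels
    bits = num_channels * (per_channel - 1).bit_length()
    return bits + (1 if use_mode else 0)


# Assumed 48-sample subframe: every sample point remains reachable.
reachable = sorted(
    mode_pulse_position(mode, ch, idx)
    for mode in (0, 1)
    for ch in range(3)
    for idx in range(8)
)
bits_with_mode = position_bits(48)
bits_without_mode = position_bits(48, use_mode=False)
```

Under these assumptions the mode bit replaces one position bit per channel, at the cost of forcing all pulses of a subframe onto the same grid.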
[0024]
[Problems to be solved by the invention]
However, when the frame length is increased in order to further reduce the bit rate of a speech coding apparatus that uses the conventional algebraic codebook, the number of sample points subject to pulse search increases, and it becomes difficult to secure the number of bits needed to represent the position of each excitation pulse in the algebraic codebook.
[0025]
The present invention has been made in view of the above circumstances, and an object thereof is to provide an excitation signal encoding apparatus, an excitation signal decoding apparatus, methods thereof, and a recording medium that use an algebraic codebook capable of improving speech quality at a low bit rate of about 4 kbps.
[0026]
[Means for Solving the Problems]
In order to achieve the above object, the present invention represents the position of one pulse of the algebraic codebook by using, in addition to the bits assigned to that pulse, bit information assigned to at least one other pulse. As a result, the search range of the excitation pulse can be expanded to twice its length or more without increasing the number of bits.
[0027]
In addition, the present invention has a configuration with a plurality of types of algebraic codebooks. An algebraic codebook with a large number of pulses but insufficient bits is used effectively to improve the quality of speech with a short pitch period, while an algebraic codebook with a small number of pulses but a sufficient number of bits improves the quality of voiced onset portions and the like.
[0028]
In addition, the present invention is intended to improve speech quality by using an algebraic codebook and a random noise codebook in combination.
[0029]
Further, in a configuration having a plurality of types of algebraic codebooks, in a mode with a small number of pulses and an insufficient number of bits, the present invention adaptively changes a part of the algebraic codebook using the pitch peak position obtained from the adaptive code vector, thereby improving speech quality.
[0030]
Further, in the present invention, in a noise codebook composed of two or more channels, the codebook to be used is switched according to the combination of the channels, so that the mode can be switched without independent mode information. In addition, a part of each channel is made an algebraic codebook so that the amount of calculation and the amount of memory can be reduced.
[0031]
Further, in a noise codebook composed of two channels in which an algebraic codebook is used for one channel and a random noise codebook is used for the other, the present invention uses the reference point of the noise code vector stored in the random noise codebook in accordance with the position of the pulse output from the algebraic codebook, thereby improving the use efficiency of the random noise codebook and the speech quality.
[0032]
[Detailed Description of the Invention]
Hereinafter, a speech encoding apparatus and speech decoding apparatus according to embodiments of the present invention will be described with reference to the drawings.
[0063]
(Embodiment 1)
FIG. 1 shows a random code vector generator of an excitation signal encoding apparatus according to Embodiment 1 of the present invention. The random code vector generator shown in the figure is a random codebook having the function of determining the driving excitation vector output from a specific channel by using the index output from that channel and the index output from at least one other channel.
[0064]
In FIG. 1, reference numeral 101 denotes a pulse 1 index generator that outputs an index representing the position of the excitation pulse (pulse 1) output from channel 1 of the random codebook to the first index/pulse position converter 104 and the third index/pulse position converter 106. Reference numeral 102 denotes a pulse 2 index generator that outputs an index representing the position of the excitation pulse (pulse 2) output from channel 2 of the random codebook to the second index/pulse position converter 105 and the first index/pulse position converter 104. Reference numeral 103 denotes a pulse 3 index generator that outputs an index representing the position of the excitation pulse (pulse 3) output from channel 3 to the third index/pulse position converter 106 and the second index/pulse position converter 105. Reference numeral 104 denotes a first index/pulse position converter that receives the indexes output from the pulse 1 index generator 101 and the pulse 2 index generator 102 and outputs the pulse position of pulse 1 to the distortion evaluation function calculator 107. Reference numeral 105 denotes a second index/pulse position converter that receives the indexes output from the pulse 2 index generator 102 and the pulse 3 index generator 103 and outputs the pulse position of pulse 2 to the distortion evaluation function calculator 107. Reference numeral 106 denotes a third index/pulse position converter that receives the indexes output from the pulse 3 index generator 103 and the pulse 1 index generator 101 and outputs the pulse position of pulse 3 to the distortion evaluation function calculator 107. Reference numeral 107 denotes a distortion evaluation function calculator that receives the pulse positions output from the first index/pulse position converter 104, the second index/pulse position converter 105, and the third index/pulse position converter 106 and outputs the distortion evaluation function value and the position of each pulse to the distortion minimizer 108. Reference numeral 108 denotes a distortion minimizer that receives the distortion evaluation function value and the position of each pulse output from the distortion evaluation function calculator 107 and outputs the combination that minimizes the distortion.
[0065]
The operation of the random code vector generator configured as described above will be described with reference to FIGS.
[0066]
In FIG. 1, the pulse 1 index generator 101 selects one of the predetermined pulse search positions and outputs an index representing that position to the first index/pulse position converter 104 and the third index/pulse position converter 106.
[0067]
Similarly, the pulse 2 index generator 102 outputs an index representing the position of pulse 2 to the second index/pulse position converter 105 and the first index/pulse position converter 104, and the pulse 3 index generator 103 outputs an index representing the position of pulse 3 to the third index/pulse position converter 106 and the second index/pulse position converter 105.
[0068]
Here, the predetermined positions that each pulse can take are as shown in FIG. 2. FIG. 2(a) shows the sample points of the entire subframe, and FIGS. 2(b) to (d) show the pulse search positions of the respective channels (pulses 1 to 3). FIG. 2 shows an example in which the subframe length is 24 samples.
[0069]
For example, when the sample points in the subframe are numbered 0, 1, 2, ... sequentially from the beginning, the search positions of pulse 1 are defined as the sample points whose numbers leave a remainder of 0 or 1 when divided by 6. Similarly, the search positions of pulse 2 are the sample points with a remainder of 2 or 3 when divided by 6, and the search positions of pulse 3 are the sample points with a remainder of 4 or 5 when divided by 6. The correspondence between the position of each pulse and its index is as shown in FIG. 2. For example, when the index of pulse 1 is 0, the position of pulse 1 is 0 or 1. Whether the position of pulse 1 is 0 or 1 is determined by the index of pulse 2, as described later.
[0070]
Subsequently, in FIG. 1, the indexes of pulse 1 and pulse 2 output from the pulse 1 index generator 101 and the pulse 2 index generator 102 are input to the first index/pulse position converter 104. Using these two indexes, the first index/pulse position converter 104 determines the pulse position of pulse 1.
[0071]
The method for determining the pulse position is as follows. For example, when the index of pulse 1 output from the pulse 1 index generator 101 is 2, the position of pulse 1 is 12 or 13, as shown in FIG. 2. Whether the position of pulse 1 is 12 or 13 is determined by the index of pulse 2: if the index of pulse 2 is even, the position of pulse 1 is 12, and if the index of pulse 2 is odd, the position of pulse 1 is 13. Similarly, the position of pulse 2 is determined by the second index/pulse position converter 105, and the position of pulse 3 is determined by the third index/pulse position converter 106. FIG. 3 summarizes this codebook as a table.
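The index-to-position conversion described above can be sketched as follows for the 24-sample example. The function names are hypothetical. The rule for pulse 1 (even index of pulse 2 selects the lower position, odd selects the upper) follows the text; applying the same even/odd rule to pulses 2 and 3, using the indexes of pulses 3 and 1 respectively, is an assumption based on the word "similarly" and on the connections of FIG. 1.

```python
NUM_CHANNELS = 3
PAIR_STRIDE = 6  # each index selects a pair of adjacent samples, 6 samples apart


def pulse_position(channel, own_index, neighbor_index):
    """Convert an index to a sample position.

    channel: 0, 1, 2 for pulses 1, 2, 3.
    own_index: index of this pulse (selects a pair of candidate positions).
    neighbor_index: index of the next channel, whose parity picks one of the pair.
    """
    pair_start = PAIR_STRIDE * own_index + 2 * channel
    return pair_start + (neighbor_index % 2)


def decode_positions(indices):
    """Positions of pulses 1 to 3; each pulse borrows the next channel's index."""
    return tuple(
        pulse_position(ch, indices[ch], indices[(ch + 1) % NUM_CHANNELS])
        for ch in range(NUM_CHANNELS)
    )


positions = decode_positions((2, 1, 0))
```

With a 2-bit index per pulse, each pulse can still land on eight different sample points, because the parity of a neighboring index supplies the missing bit.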
[0072]
The distortion evaluation function calculator 107 calculates a distortion evaluation function to quantify the distortion that arises when the random code vector generated from the pulse positions output from the first index/pulse position converter 104, the second index/pulse position converter 105, and the third index/pulse position converter 106 is used. The obtained distortion evaluation function value is output to the distortion minimizer 108 together with the corresponding combination of pulses.
[0073]
The distortion minimizer 108 outputs a combination of pulses that minimizes distortion.
[0074]
FIG. 4 is a flowchart showing the above codebook search procedure. It has a nested loop structure similar to that of FIG. 40, the flowchart of the conventional method. However, since the position of a pulse is not determined until the indexes of two pulses are determined, no error evaluation function is calculated in the first (pulse 1) loop; the first error evaluation function (the pulse 1 component) is calculated in the second (pulse 2) loop, and the error evaluation functions of the pulse 2 component and the pulse 3 component are calculated together in the last, third (pulse 3) loop. The codebook search procedure expressed in the flowchart of FIG. 4 can also be expressed by the program shown in FIG. 5.
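The deferred evaluation can be sketched as follows. This is not the program of FIG. 5. It assumes a squared-correlation-over-energy criterion with caller-supplied `d` and `phi` (signs already folded in), a 24-sample subframe with four indexes per channel, and the same even/odd rule for all three pulses. All names are hypothetical.

```python
def search_adjacent_channel_codebook(d, phi, num_indices=4):
    """Nested index search in which each position depends on two indexes.

    Returns (best_indices, best_positions, best_score).
    """
    def position(channel, own, neighbor):
        return 6 * own + 2 * channel + (neighbor % 2)

    best = (None, None, -1.0)
    for i1 in range(num_indices):
        # Nothing is evaluated here: pulse 1 is not fixed until i2 is known.
        for i2 in range(num_indices):
            p1 = position(0, i1, i2)
            corr1 = d[p1]          # pulse 1 component
            energy1 = phi[p1][p1]
            for i3 in range(num_indices):
                # Pulses 2 and 3 both become known once i3 is chosen.
                p2 = position(1, i2, i3)
                p3 = position(2, i3, i1)
                corr = corr1 + d[p2] + d[p3]
                energy = (energy1 + phi[p2][p2] + phi[p3][p3]
                          + 2.0 * (phi[p1][p2] + phi[p1][p3] + phi[p2][p3]))
                score = corr * corr / energy
                if score > best[2]:
                    best = ((i1, i2, i3), (p1, p2, p3), score)
    return best


# Toy input: 24 samples, identity correlation matrix, peaks at 13, 8 and 4.
toy_d = [1.0] * 24
for peak in (13, 8, 4):
    toy_d[peak] = 5.0
toy_phi = [[1.0 if i == j else 0.0 for j in range(24)] for i in range(24)]
best_indices, best_positions, best_score = search_adjacent_channel_codebook(toy_d, toy_phi)
```

The loop count is the same as in a conventional search with four indexes per channel; only the point at which each term becomes computable moves one level inward.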
[0075]
FIGS. 1 to 5 show the case where an algebraic codebook is used as the noise codebook, but the combination information of the channels can also be used in other noise codebooks having a plurality of channels. In FIG. 2, the subframe length (pulse search range) is 24 samples, but the technique is also effective for longer subframe lengths.
[0076]
Next, a speech coding apparatus provided with an adjacent channel dependent algebraic codebook as the random codebook as described above will be described.
[0077]
FIG. 6 is a functional block diagram of a speech coding apparatus having an adjacent channel dependent algebraic codebook. In the figure, reference numeral 151 denotes an adaptive codebook that receives the driving excitation vector generated in the past and output from the adder 156, receives a control signal from the distortion minimizer 162, and outputs an adaptive code vector to the multiplier 153. Reference numeral 152 denotes the adjacent channel dependent algebraic codebook described above, which receives a control signal from the distortion minimizer 162 and outputs a noise code vector to the multiplier 154. Reference numeral 153 denotes a multiplier that multiplies the adaptive code gain output from the gain codebook 158 by the adaptive code vector output from the adaptive codebook 151 and outputs the result to the adder 156. Reference numeral 154 denotes a multiplier that multiplies the prediction gain output from the gain predictor 157 by the noise code vector output from the adjacent channel dependent algebraic codebook 152 and outputs the result to the multiplier 155. Reference numeral 155 denotes a multiplier that multiplies the noise code vector after prediction gain multiplication, output from the multiplier 154, by the noise code gain output from the gain codebook 158 and outputs the result to the adder 156 and the gain predictor 157. Reference numeral 156 denotes an adder that performs vector addition of the adaptive code vector after gain multiplication, output from the multiplier 153, and the noise code vector after gain multiplication, output from the multiplier 155, and outputs the result to the synthesis filter 160 and the adaptive codebook 151. Reference numeral 157 denotes a gain predictor that receives the noise code vector after gain multiplication output from the multiplier 155 and outputs a prediction gain to the multiplier 154. Reference numeral 158 denotes a gain codebook that outputs the adaptive code gain to the multiplier 153 and the noise code gain to the multiplier 155 according to the control signal from the distortion minimizer 162. Reference numeral 159 denotes a linear prediction analyzer that performs linear prediction analysis on the input speech signal and outputs linear prediction coefficients to the synthesis filter 160. Reference numeral 160 denotes a synthesis filter that receives the driving excitation vector output from the adder 156 and the linear prediction coefficients output from the linear prediction analyzer 159 and outputs a synthesized speech signal to the adder 161. Reference numeral 161 denotes an adder that receives the input speech signal and the synthesized speech signal output from the synthesis filter 160 and calculates the difference between them. Reference numeral 162 denotes a coding distortion minimizer that calculates the coding distortion from the error signal output from the adder 161 and sends control signals to the adaptive codebook 151, the algebraic codebook 152, and the gain codebook 158 so that the coding distortion is minimized.
[0078]
The index A of the adaptive code vector, the index S of the noise code vector, and the index G of the gain vector that minimize the distortion determined by the distortion minimizer 162 are transmitted to the decoding device. Further, the linear prediction coefficient obtained by the linear prediction analyzer is quantized and then transmitted to the decoding apparatus as a quantized linear prediction coefficient L.
[0079]
Further, the configuration and operation of the speech decoding apparatus according to the embodiment of the present invention will be described with reference to FIG.
[0080]
FIG. 7 shows a functional block diagram of the speech decoding apparatus according to the embodiment of the present invention. In the figure, reference numeral 171 denotes an adaptive codebook that receives the driving excitation vector generated in the past and output from the adder 176, and outputs the adaptive code vector specified by the information A transmitted from the encoding device to the multiplier 173. Reference numeral 172 denotes an adjacent channel dependent algebraic codebook, identical to that of the encoding apparatus and having the configuration of FIGS. 2 and 3 shown in the present embodiment, which outputs the noise code vector specified by the information S transmitted from the encoding apparatus to the multiplier 174. Reference numeral 173 denotes a multiplier that multiplies the adaptive code gain output from the gain codebook 178 by the adaptive code vector output from the adaptive codebook 171 and outputs the result to the adder 176. Reference numeral 174 denotes a multiplier that multiplies the prediction gain output from the gain predictor 177 by the noise code vector output from the adjacent channel dependent algebraic codebook 172 and outputs the result to the multiplier 175. Reference numeral 175 denotes a multiplier that multiplies the noise code vector after prediction gain multiplication, output from the multiplier 174, by the noise code gain output from the gain codebook 178 and outputs the result to the adder 176 and the gain predictor 177. Reference numeral 176 denotes an adder that adds the adaptive code vector after gain multiplication, output from the multiplier 173, and the noise code vector after gain multiplication, output from the multiplier 175, and outputs the result to the synthesis filter 180 and the adaptive codebook 171. Reference numeral 177 denotes a gain predictor that receives the noise code vector after gain multiplication output from the multiplier 175 and outputs a prediction gain to the multiplier 174. Reference numeral 178 denotes a gain codebook that outputs the adaptive code gain specified by the information G transmitted from the encoding device to the multiplier 173 and the noise code gain to the multiplier 175. Reference numeral 179 denotes a linear prediction coefficient decoder that decodes the information L transmitted from the encoding device to obtain quantized linear prediction coefficients and outputs them to the synthesis filter 180. Reference numeral 180 denotes a synthesis filter that receives the driving excitation vector output from the adder 176 and the quantized linear prediction coefficients output from the linear prediction coefficient decoder 179 and outputs a synthesized speech signal. In general, the decoded speech signal output from the synthesis filter is further subjected to filter processing or the like for enhancing auditory quality.
[0081]
The operation of the speech decoding apparatus configured as described above will be described below with reference to FIG. 7. In FIG. 7, information L, A, S, and G is transmitted from the encoding device and input to the linear prediction coefficient decoder 179, the adaptive codebook 171, the adjacent channel dependent algebraic codebook 172, and the gain codebook 178, respectively.
[0082]
The linear prediction coefficient decoder 179 decodes the quantized linear prediction coefficients received as the information L and outputs them to the synthesis filter 180. The synthesis filter 180 is constructed using the quantized linear prediction coefficients. The adaptive codebook 171 that has received the information A cuts out the adaptive code vector specified by the information A from the adaptive codebook and outputs it to the multiplier 173. The adjacent channel dependent algebraic codebook 172 that has received the information S generates the noise code vector specified by the information S and outputs it to the multiplier 174. The multiplier 174 multiplies the noise code vector by the prediction gain output from the gain predictor 177 and outputs the result to the multiplier 175. The gain codebook 178 that has received the information G selects and outputs the quantized gains specified by the information G. At this time, the adaptive code gain is output to the multiplier 173, and the noise code gain is output to the multiplier 175. The multiplier 173 multiplies the adaptive code vector output from the adaptive codebook 171 by the adaptive code gain output from the gain codebook 178 and outputs the result to the adder 176. The multiplier 175 multiplies the noise code gain output from the gain codebook 178 by the noise code vector that was output from the adjacent channel dependent algebraic codebook 172 and multiplied by the prediction gain in the multiplier 174, and outputs the result to the adder 176. The noise code vector after multiplication output from the multiplier 175 is also output to the gain predictor 177. The gain predictor 177 predicts the gain (logarithmic power) of the current noise code vector by MA prediction or the like, using the noise code vectors output from the multiplier 175 in the past, and outputs it to the multiplier 174.
The adder 176 adds the adaptive code vector component of the driving excitation signal output from the multiplier 173 and the noise code vector component of the driving excitation signal output from the multiplier 175 to generate the driving excitation signal, and outputs it to the synthesis filter 180. The driving excitation vector output from the adder 176 is also output to the adaptive codebook 171 and used for updating the adaptive codebook 171. The synthesis filter 180 synthesizes a speech signal from the driving excitation signal output from the adder 176 and outputs it. Although this output may be used as the decoded speech signal as it is, its quality is generally insufficient, so post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is performed to improve the auditory quality before the signal is output as the decoded speech signal.
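The signal flow through the multipliers 173 to 175, the adder 176, and the synthesis filter 180 can be sketched as follows. This is an illustrative sketch only: the function name, the argument layout, and the sign convention A(z) = 1 + Σ a_k z^(−k) for the all-pole synthesis filter are assumptions, and post-processing and the adaptive codebook update are omitted.

```python
def decode_subframe(adaptive_vec, noise_vec, adaptive_gain, noise_gain,
                    prediction_gain, lpc, filter_state):
    """Rebuild the driving excitation and run it through an all-pole filter.

    lpc: coefficients a_1..a_p of A(z) = 1 + sum(a_k z^-k) (assumed convention).
    filter_state: past filter outputs, most recent first, same length as lpc.
    Returns (excitation, speech, new_filter_state).
    """
    # Multipliers 173-175 and adder 176: gain-scaled sum of the two vectors.
    excitation = [
        adaptive_gain * a + noise_gain * prediction_gain * c
        for a, c in zip(adaptive_vec, noise_vec)
    ]
    # Synthesis filter 180: y[n] = e[n] - sum(a_k * y[n-k]).
    state = list(filter_state)
    speech = []
    for e in excitation:
        y = e - sum(a_k * y_k for a_k, y_k in zip(lpc, state))
        speech.append(y)
        state = ([y] + state)[:len(lpc)]
    return excitation, speech, state


# Hypothetical 4-sample subframe with a first-order filter.
excitation, speech, state = decode_subframe(
    adaptive_vec=[1.0, 0.0, 0.0, 0.0],
    noise_vec=[0.0, 1.0, 0.0, -1.0],
    adaptive_gain=0.5,
    noise_gain=2.0,
    prediction_gain=1.5,
    lpc=[-0.5],
    filter_state=[0.0],
)
```

In a full decoder the returned excitation would also be appended to the adaptive codebook history, and the returned filter state would be carried into the next subframe.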
[0083]
As described above, according to this embodiment, in a noise codebook having a plurality of channels, the noise code vector is generated using the combination information between the channels, so that the bits allocated to the entire noise codebook can be used effectively.
[0084]
(Embodiment 2)
FIG. 8 shows the functional blocks of a CELP-type speech signal encoding apparatus according to Embodiment 2 of the present invention, which has two types of algebraic codebooks as its noise codebook. In FIG. 8, 201 is an adaptive codebook that receives and stores the driving excitation vectors generated in the past from the adder 208 and outputs an adaptive code vector to the multiplier 205 in accordance with a control signal from the distortion minimizer 214. 202 is a noise codebook that is composed of two types of algebraic codebooks and outputs a noise code vector from one of them in accordance with a control signal from the distortion minimizer 214. 203 is a first algebraic codebook forming part of the noise codebook 202, and 204 is a second algebraic codebook that forms part of the noise codebook 202 and has a structure different from that of the first algebraic codebook 203. 205 is a multiplier that receives the adaptive code vector output from the adaptive codebook 201 and the adaptive code gain output from the gain codebook 210 and outputs the product to the adder 208. 206 is a multiplier that receives the noise code vector output from the noise codebook 202 and the prediction gain output from the gain predictor 209 and outputs the product to the multiplier 207. 207 is a multiplier that receives the noise code vector after prediction-gain multiplication output from the multiplier 206 and the noise code gain output from the gain codebook 210 and outputs the product to the adder 208 and the gain predictor 209. 208 is an adder that receives the adaptive code vector after adaptive-code-gain multiplication output from the multiplier 205 and the noise code vector after prediction-gain and noise-code-gain multiplication output from the multiplier 207 and outputs the vector sum to the synthesis filter 212 and the adaptive codebook 201. 209 is a gain predictor that receives the noise code vector after prediction-gain and noise-code-gain multiplication output from the multiplier 207 and outputs a prediction gain to the multiplier 206. 210 is a gain codebook that, in accordance with a control signal from the distortion minimizer 214, outputs the adaptive code gain to the multiplier 205 and the noise code gain to the multiplier 207. 211 is a linear prediction analyzer that receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to the synthesis filter 212. 212 is a synthesis filter that receives the excitation vector output from the adder 208 and the quantized linear prediction coefficients output from the linear prediction analyzer 211 and outputs a synthesized speech signal to the adder 213. 213 is an adder that receives the input speech signal and the synthesized speech signal output from the synthesis filter 212, performs vector subtraction, and outputs the result to the distortion minimizer 214. 214 is a distortion minimizer that calculates, from the difference vector output from the adder 213, the distortion of the synthesized speech signal with respect to the input speech signal in a perceptually weighted domain or the like, and controls the outputs of the adaptive codebook 201, the noise codebook 202, and the gain codebook 210 so that this distortion is minimized.
[0085]
The operation of the speech signal encoding apparatus configured as described above will be described with reference to FIGS.
[0086]
First, the speech signal is input to the linear prediction analyzer 211. The linear prediction analyzer 211 performs linear prediction analysis of the input speech signal and calculates a linear prediction coefficient required by the synthesis filter 212. The calculated linear prediction coefficient is quantized and then output to the synthesis filter 212. Also, a code L representing the linear prediction coefficient output here is transmitted to the decoding device.
[0087]
Next, the synthesis filter 212 is configured using the quantized linear prediction coefficients output from the linear prediction analyzer 211. The distortion minimizer 214 drives this synthesis filter using only the adaptive codebook 201 and selects from the adaptive codebook 201 the adaptive code vector that gives the smallest distortion. The adaptive codebook expresses the periodic component of the speech signal (this period is called the pitch period), and normally a vector of one subframe length is cut out starting from the point one pitch period in the past (if one pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector of one subframe length). Thereafter, the adaptive code vector output from the adaptive codebook 201 is fixed to the one determined here.
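The extraction rule in the parenthetical above can be sketched as follows. This is a minimal illustration only: the function name, the use of a plain Python list for the stored past excitation, and the integer lag are assumptions made for the example, not details given in the text.

```python
def adaptive_code_vector(past_excitation, lag, subframe_len):
    """Cut one subframe out of the stored excitation, starting `lag` samples back.

    When the lag is shorter than the subframe, the extracted segment is
    repeated at the pitch period until the subframe is filled.
    (Hypothetical helper: name and argument layout are assumptions.)
    """
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must be between 1 and len(past_excitation)")
    segment = past_excitation[-lag:]  # the most recent `lag` samples
    return [segment[n % lag] for n in range(subframe_len)]
```

With a lag of 2 and a subframe of 5 samples, the last two stored samples are repeated to fill the subframe; with a lag at least as long as the subframe, the vector is a plain cut-out.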
[0088]
Subsequently, the synthesis filter 212 is driven using both the adaptive codebook 201 and the noise codebook 202. Because the adaptive code vector output from the adaptive codebook 201 has already been determined at this point, the noise code vector output from the noise codebook 202 is varied, and the distortion minimizer 214 selects the noise code vector that gives the smallest distortion.
[0089]
Since the noise codebook 202 includes the first algebraic codebook 203 and the second algebraic codebook 204, the noise code vector with the least distortion is selected from between the two algebraic codebooks. At this point, the adaptive code vector output from the adaptive codebook 201 and the noise code vector output from the noise codebook 202 have both been determined.
[0090]
Finally, the distortion minimizer 214 determines the adaptive code gain by which the adaptive code vector is multiplied and the noise code gain by which the noise code vector is multiplied. The gain codebook 210 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to the multiplier 205 and its noise code gain element is output to the multiplier 207. An index G representing the selected gain vector is also transmitted to the decoding apparatus.
[0091]
In the speech encoding apparatus of this embodiment, in order to narrow the dynamic range of the noise code gain, the gain predictor 209 predicts the power in the logarithmic domain, and the prediction residual is quantized using the gain codebook. The gain predictor 209 predicts the power of the current noise code vector from the power of noise code vectors generated in the past and outputs it to the multiplier 206 as the prediction gain. The noise code vector output from the noise codebook 202 is multiplied by the prediction gain in the multiplier 206 and by the noise code gain in the multiplier 207. Using the adaptive code vector, the noise code vector, and the prediction gain, all of which are determined before the optimum gain vector is selected from the gain codebook 210, the distortion minimizer 214 determines the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
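As a rough illustration of log-domain power prediction, the sketch below predicts the log power of the current scaled noise code vector as a weighted sum of the log powers of past scaled vectors and converts the result to a linear gain. The predictor weights, the function names, and the dB scaling are all assumptions chosen for the example; the text does not specify the predictor coefficients or its exact form.

```python
import math

def log_power_db(vec):
    """Mean power of a vector in dB (floored to avoid log of zero)."""
    energy = sum(v * v for v in vec) / len(vec)
    return 10.0 * math.log10(max(energy, 1e-10))

def prediction_gain(past_scaled_vectors, weights, current_vector):
    """Linear gain that brings `current_vector` to the predicted log power.

    `weights` are hypothetical predictor coefficients, one per past vector.
    """
    predicted_db = sum(w * log_power_db(v)
                       for w, v in zip(weights, past_scaled_vectors))
    return 10.0 ** ((predicted_db - log_power_db(current_vector)) / 20.0)
```

Because the prediction already accounts for most of the level, the gain codebook only has to cover the residual factor, which is the dynamic-range reduction the paragraph describes.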
[0092]
This embodiment is characterized in that, by providing two types of algebraic codebooks, the mode can be switched according to the state of the input speech. FIG. 9 is a flowchart showing the algebraic codebook search method in the noise codebook. First the 3-pulse algebraic codebook is searched, and then the 2-pulse algebraic codebook is searched. Conversely, the 2-pulse algebraic codebook may be searched first. The value of the error evaluation function used in the algebraic codebook search is set to its minimum value (a value of 0 or less) in the initialization step before the search. In FIG. 9, the 2-pulse algebraic codebook search has no initialization step; instead, the error evaluation function value at minimum distortion already obtained in the 3-pulse algebraic codebook search is used as its initial value.
[0093]
In FIG. 9, the pulses of the 3-channel algebraic codebook are first determined channel by channel, in order from channel 1 to channel 3. Once the position of the channel 1 pulse (pulse 1) is set, the error evaluation function used for distortion minimization is calculated from pulse 1 alone; then the position of the channel 2 pulse (pulse 2) is set, and the error evaluation function is calculated from pulse 1 and pulse 2.
In this way, the error evaluation function is computed in a nested loop structure in which, each time a pulse of a new channel is added, the terms of the error evaluation function related to the newly added pulse are added, and the combination of the error evaluation function value that minimizes the error and the pulse positions at that time is obtained.
In practice, the error is minimized by maximizing the error evaluation function shown in equation (1).
[0094]
[Expression 1]

In equation (1), x represents the target vector, H represents the impulse response of the synthesis filter, and ci represents the noise code vector output from the noise codebook. The target vector is the signal obtained by subtracting from the input speech signal the signal generated by driving the synthesis filter with the adaptive code vector alone, and then further subtracting the zero-input response of the synthesis filter.
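The expression for equation (1) is not reproduced in this text. The sketch below therefore assumes the ratio commonly used in CELP codebook searches, (x^t H ci)^2 / (ci^t H^t H ci), and runs an exhaustive search over pulse positions and polarities for a small example. The function names, the track layout, and the brute-force evaluation (recomputing the whole criterion for every combination rather than accumulating terms loop by loop) are assumptions made for illustration.

```python
from itertools import product

def synthesize(h, c):
    """Return H c, where H is the lower-triangular convolution matrix of h."""
    n = len(c)
    y = [0.0] * n
    for i, ci in enumerate(c):
        if ci == 0.0:
            continue
        for j in range(i, n):
            y[j] += ci * h[j - i]
    return y

def criterion(x, h, c):
    """Assumed evaluation function: (x . Hc)^2 / |Hc|^2 (larger is better)."""
    y = synthesize(h, c)
    den = sum(v * v for v in y)
    if den <= 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) ** 2 / den

def search_pulses(x, h, tracks):
    """Try every position/polarity combination, one pulse per channel."""
    best_value, best_choice = -1.0, None
    for positions in product(*tracks):
        for signs in product((1, -1), repeat=len(tracks)):
            c = [0.0] * len(x)
            for pos, sign in zip(positions, signs):
                c[pos] += sign
            value = criterion(x, h, c)
            if value > best_value:
                best_value, best_choice = value, (positions, signs)
    return best_value, best_choice
```

When the target is itself the synthesis of a two-pulse vector, the search recovers those pulse positions, and the maximum of the criterion equals the energy of the target.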
[0095]
When the search also takes into account the combination with the already determined adaptive code vector, for example by performing an orthogonalized search of the adaptive code vector and the noise code vector, the same search can be performed by using, in place of equation (1), equation (2), whose evaluation expression includes the adaptive code vector component.
[0096]
[Expression 2]

In equation (2), x is the target vector (the vector to be synthesized by the synthesis filter from the driving excitation vector formed by combining the adaptive code vector and the noise code vector; this differs from the target vector in equation (1)), p is the adaptive code vector, c is the noise code vector (ci is the noise code vector indicated by index i), and H is the impulse response convolution matrix of the synthesis filter.
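The expression for equation (2) is likewise not reproduced in this text. As a point of reference only, orthogonalizing the synthesized noise code vector H ci against the synthesized adaptive code vector H p (a Gram-Schmidt step) and applying the usual ratio criterion gives the following form, up to a factor that does not depend on i. This is a derivation under that assumption, with E_i as a hypothetical symbol, and is not claimed to be the patent's printed equation.

```latex
\[
E_i \;=\;
\frac{\Bigl( (x^{t} H c_i)\,\lVert H p \rVert^{2} \;-\; (x^{t} H p)\,(p^{t} H^{t} H c_i) \Bigr)^{2}}
     {\lVert H c_i \rVert^{2}\,\lVert H p \rVert^{2} \;-\; \bigl(p^{t} H^{t} H c_i\bigr)^{2}}
\]
```

If H p and H ci happen to be orthogonal, the cross term vanishes and the ratio reduces to (x^t H ci)^2 / ||H ci||^2 scaled by the constant ||H p||^2.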
[0097]
Next, the pulse position of each channel is determined in the 2-channel algebraic codebook. In the error minimization for the 2-channel algebraic codebook, the pulse positions are updated only when an error evaluation function value is obtained that exceeds the maximum reached in the 3-channel algebraic codebook search. When such an update occurs, the noise code vector is output from the 2-channel algebraic codebook instead of the 3-channel algebraic codebook.
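The carry-over of the best evaluation value from the 3-channel search into the 2-channel search can be sketched as below. The function name, the mode labels, and the `evaluate` callback are hypothetical; the starting value of -1.0 stands in for the "value of 0 or less" mentioned for the initialization step.

```python
def select_mode(candidates_3ch, candidates_2ch, evaluate):
    """Search the 3-channel codebook, then the 2-channel one without
    resetting the running maximum, so the mode switches to 2 channels
    only if a 2-channel candidate beats the best 3-channel one."""
    best_value = -1.0          # initialization before the first search only
    best_mode, best_candidate = None, None

    for cand in candidates_3ch:
        value = evaluate(cand)
        if value > best_value:
            best_value, best_mode, best_candidate = value, "3ch", cand

    # No re-initialization here: the 3-channel maximum is the starting point.
    for cand in candidates_2ch:
        value = evaluate(cand)
        if value > best_value:
            best_value, best_mode, best_candidate = value, "2ch", cand

    return best_mode, best_candidate, best_value
```

The returned mode label corresponds to the mode information that accompanies the codebook index.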
[0098]
Further, an example of the configuration of two types of algebraic codebooks will be described below with reference to FIGS.
[0099]
FIGS. 10 and 11 correspond to FIGS. 43 and 44 of the conventional example, respectively. FIG. 10 is a schematic diagram showing the pulse search positions of the pulse of each channel (pulses 1 to 3, or pulses 1 and 2), and FIG. 11 is a table showing the codebook. These drawings show an example with two modes, one with three pulses and one with two.
[0100]
In FIGS. 43 and 44, the mode information indicates whether the pulse positions are even or odd sample points. In FIGS. 10 and 11, the mode information indicates whether the 2-pulse or the 3-pulse algebraic codebook is used.
[0101]
Further, in FIGS. 10 and 11, when the algebraic codebook has three pulses, the search positions are concentrated in the first half of the subframe. The reason is that, when pitch periodization (processing that repeats the noise code vector at the pitch period, or filtering that emphasizes the pitch periodicity of the noise code vector) is applied, the number of pulses in the noise code vector can be increased without reducing the temporal resolution of the pulse search.
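The repetition variant of pitch periodization mentioned above can be sketched as follows; the function name and the integer pitch lag are assumptions for the example, and the filtering variant is not shown.

```python
def pitch_periodize(code_vector, pitch_lag):
    """Repeat the first `pitch_lag` samples of the noise code vector
    across the subframe. A lag of at least one subframe leaves the
    vector unchanged."""
    if pitch_lag <= 0:
        raise ValueError("pitch_lag must be positive")
    n = len(code_vector)
    if pitch_lag >= n:
        return list(code_vector)
    return [code_vector[i % pitch_lag] for i in range(n)]
```

In this variant only the pulses inside the first pitch period survive and are copied into the rest of the subframe, which illustrates why search positions concentrated near the head of the subframe lose little under periodization.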
[0102]
Also, providing the 2-pulse algebraic codebook allows the pulse search to be performed with fine positional accuracy, which improves performance in portions such as voiced onsets.
[0103]
In the example of this embodiment the subframe length is set to 24 for simplicity, but the effects described above are obtained when the actual subframe length is relatively long, about 80 samples or more.
[0104]
In the example of this embodiment there are two types of algebraic codebooks, but increasing the number of types can be even more effective, for example by adding an algebraic codebook in which the pulse search range is concentrated further toward the head of the subframe so that the number of pulses can be increased.
[0105]
In the example of this embodiment, gain quantization uses vector quantization and backward prediction of the noise code vector power, but the method is also applicable when scalar quantization is used or when backward prediction of the noise code vector power is not used.
[0106]
In the example of this embodiment, the vectors stored in the gain codebook are two-dimensional vectors of an adaptive code gain component and a noise code gain component, but they may instead be two-dimensional vectors composed of two other elements, such as the power of the excitation vector and the ratio of that power accounted for by the adaptive code vector (or the noise code vector).
[0107]
The speech decoding apparatus of this embodiment will now be described with reference to FIG. 12. FIG. 12 is a block diagram showing an example of the speech decoding apparatus according to Embodiment 2 of the present invention. 251 is an adaptive codebook that receives the driving excitation vectors generated in the past from the adder 258 and outputs to the multiplier 255 the adaptive code vector specified by the information A transmitted from the encoding apparatus. 252 is a noise codebook, identical to that of the encoding apparatus and having the configuration of FIGS. 10 and 11 shown in this embodiment, that extracts the noise code vector specified by the information S transmitted from the encoding apparatus from the first algebraic codebook 253 or the second algebraic codebook 254 and outputs it to the multiplier 256. 255 is a multiplier that multiplies the adaptive code vector output from the adaptive codebook 251 by the adaptive code gain output from the gain codebook 260 and outputs the product to the adder 258. 256 is a multiplier that receives the prediction gain output from the gain predictor 259 and the noise code vector output from the noise codebook 252 and outputs the product to the multiplier 257. 257 is a multiplier that multiplies the noise code vector after prediction-gain multiplication output from the multiplier 256 by the noise code gain output from the gain codebook 260 and outputs the product to the adder 258 and the gain predictor 259. 258 is an adder that receives the adaptive code vector after gain multiplication output from the multiplier 255 and the noise code vector after gain multiplication output from the multiplier 257, performs vector addition, and outputs the result to the synthesis filter 262 and the adaptive codebook 251. 259 is a gain predictor that receives the noise code vector after gain multiplication output from the multiplier 257 and outputs a prediction gain to the multiplier 256. 260 is a gain codebook that outputs the adaptive code gain specified by the information G transmitted from the encoding apparatus to the multiplier 255 and the noise code gain to the multiplier 257. 261 is a linear prediction coefficient decoder that decodes the information L transmitted from the encoding apparatus to obtain the quantized linear prediction coefficients and outputs them to the synthesis filter 262. 262 is a synthesis filter that receives the driving excitation vector output from the adder 258 and the quantized linear prediction coefficients output from the linear prediction coefficient decoder 261 and outputs a synthesized speech signal. In general, the decoded speech signal output from the synthesis filter is further subjected to filtering or similar processing to enhance its perceptual quality.
[0108]
The operation of the speech decoding apparatus configured as described above will be described. Information L, A, S, and G is transmitted from the encoding device, and each information is input to the linear prediction coefficient decoder 261, the adaptive codebook 251, the noise codebook 252, and the gain codebook 260.
[0109]
The linear prediction coefficient decoder 261, on receiving the information L, decodes the quantized linear prediction coefficients and outputs them to the synthesis filter 262. The synthesis filter 262 is constructed using the quantized linear prediction coefficients. The adaptive codebook 251, on receiving the information A, cuts out the adaptive code vector designated by A and outputs it to the multiplier 255. The noise codebook 252, on receiving the information S, generates the noise code vector designated by S from the first algebraic codebook 253 or the second algebraic codebook 254 and outputs it to the multiplier 256. The multiplier 256 multiplies the noise code vector by the prediction gain output from the gain predictor 259 and outputs the result to the multiplier 257. The gain codebook 260, on receiving the information G, selects and outputs the quantized gains designated by G: the adaptive code gain is output to the multiplier 255 and the noise code gain is output to the multiplier 257. The multiplier 255 multiplies the adaptive code vector output from the adaptive codebook 251 by the adaptive code gain output from the gain codebook 260 and outputs the result to the adder 258. The multiplier 257 multiplies the noise code vector that was output from the noise codebook 252 and multiplied by the prediction gain in the multiplier 256 by the noise code gain output from the gain codebook 260, and outputs the result to the adder 258. The multiplied noise code vector output from the multiplier 257 is also output to the gain predictor 259.
[0110]
The gain predictor 259 predicts the gain (logarithmic power) of the current noise code vector, by MA prediction or the like, from the noise code vectors output from the multiplier 257 in the past, and outputs it to the multiplier 256. The adder 258 adds the adaptive code vector component of the driving excitation signal output from the multiplier 255 and the noise code vector component of the driving excitation signal output from the multiplier 257 to generate the driving excitation signal, and outputs it to the synthesis filter 262. The driving excitation vector output from the adder 258 is also output to the adaptive codebook 251 and used to update the adaptive codebook 251. The synthesis filter 262 synthesizes a speech signal from the driving excitation output from the adder 258 and outputs it. This synthesized speech signal may be output as the decoded speech signal as it is, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.
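The signal flow through the multipliers 255 to 257, the adder 258, and the adaptive codebook update can be sketched as below. The function names, the list-based vectors, and the fixed-length history buffer are assumptions for illustration; the comments map each step to the reference numerals in the description.

```python
def decode_excitation(adaptive_vec, noise_vec,
                      adaptive_gain, noise_gain, pred_gain):
    """Build one subframe of driving excitation from decoded parameters."""
    # Multiplier 256 (prediction gain) followed by multiplier 257 (noise code gain).
    scaled_noise = [noise_gain * pred_gain * c for c in noise_vec]
    # Multiplier 255 (adaptive code gain).
    scaled_adaptive = [adaptive_gain * p for p in adaptive_vec]
    # Adder 258.
    excitation = [a + s for a, s in zip(scaled_adaptive, scaled_noise)]
    # scaled_noise is what would be fed back to the gain predictor 259.
    return excitation, scaled_noise

def update_adaptive_codebook(history, excitation):
    """Append the new excitation and keep the buffer length unchanged."""
    return (history + excitation)[-len(history):]
```

Feeding each subframe's excitation back through `update_adaptive_codebook` gives the adaptive codebook the history it needs for the next subframe.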
[0111]
As described above, according to the second embodiment, by providing a plurality of types of algebraic codebooks and switching among them according to the characteristics of the input speech signal, the encoding performance for the speech signal can be improved even when the number of bits is small.
[0112]
(Embodiment 3)
FIG. 13 shows functional blocks of a speech signal coding apparatus according to a third embodiment of the present invention that uses both an algebraic codebook and a random noise codebook as a noise codebook.
[0113]
In FIG. 13, 301 is an adaptive codebook that receives and stores the driving excitation vectors generated in the past from the adder 308 and outputs an adaptive code vector to the multiplier 305 in accordance with a control signal from the distortion minimizer 314. 302 is a noise codebook that consists of two types of noise codebooks and outputs a noise code vector from one of them in accordance with a control signal from the distortion minimizer 314. 303 is an algebraic codebook forming part of the noise codebook 302. 304 is a codebook that forms part of the noise codebook 302 and is composed of an algebraic codebook having a structure different from that of the algebraic codebook 303 and a random codebook that is not an algebraic codebook. 305 is a multiplier that receives the adaptive code vector output from the adaptive codebook 301 and the adaptive code gain output from the gain codebook 310 and outputs the product to the adder 308. 306 is a multiplier that receives the noise code vector output from the noise codebook 302 and the prediction gain output from the gain predictor 309 and outputs the product to the multiplier 307. 307 is a multiplier that receives the noise code vector after prediction-gain multiplication output from the multiplier 306 and the noise code gain output from the gain codebook 310 and outputs the product to the adder 308 and the gain predictor 309. 308 is an adder that receives the adaptive code vector after adaptive-code-gain multiplication output from the multiplier 305 and the noise code vector after prediction-gain and noise-code-gain multiplication output from the multiplier 307 and outputs the vector sum to the synthesis filter 312 and the adaptive codebook 301. 309 is a gain predictor that receives the noise code vector after prediction-gain and noise-code-gain multiplication output from the multiplier 307 and outputs a prediction gain to the multiplier 306. 310 is a gain codebook that, in accordance with a control signal from the distortion minimizer 314, outputs the adaptive code gain to the multiplier 305 and the noise code gain to the multiplier 307. 311 is a linear prediction analyzer that receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to the synthesis filter 312. 312 is a synthesis filter that receives the excitation vector output from the adder 308 and the quantized linear prediction coefficients output from the linear prediction analyzer 311 and outputs a synthesized speech signal to the adder 313. 313 is an adder that receives the input speech signal and the synthesized speech signal output from the synthesis filter 312, performs vector subtraction, and outputs the result to the distortion minimizer 314. 314 is a distortion minimizer that calculates, from the difference vector output from the adder 313, the distortion of the synthesized speech signal with respect to the input speech signal in a perceptually weighted domain or the like, and controls the outputs of the adaptive codebook 301, the noise codebook 302, and the gain codebook 310 so that this distortion is minimized.
[0114]
The operation of the speech signal encoding apparatus configured as described above will be described with reference to FIGS.
[0115]
First, the speech signal is input to the linear prediction analyzer 311. The linear prediction analyzer 311 performs linear prediction analysis of the input speech signal and calculates a linear prediction coefficient required by the synthesis filter 312. The calculated linear prediction coefficient is quantized and then output to the synthesis filter 312. Also, a code L representing the linear prediction coefficient output here is transmitted to the decoding device.
[0116]
Next, the synthesis filter 312 is configured using the quantized linear prediction coefficients output from the linear prediction analyzer 311. The distortion minimizer 314 drives this synthesis filter using only the adaptive codebook 301 and selects from the adaptive codebook 301 the adaptive code vector that gives the smallest distortion. The adaptive codebook expresses the periodic component of the speech signal (this period is called the pitch period), and normally a vector of one subframe length is cut out starting from the point one pitch period in the past (if one pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector of one subframe length). An index A representing the selected adaptive code vector is transmitted to the decoder side. Thereafter, the adaptive code vector output from the adaptive codebook 301 is fixed to the one determined here.
[0117]
Subsequently, the synthesis filter 312 is driven using both the adaptive codebook 301 and the noise codebook 302. Because the adaptive code vector output from the adaptive codebook 301 has already been determined at this point, the noise code vector output from the noise codebook 302 is varied, and the distortion minimizer 314 selects the noise code vector that gives the smallest distortion.
[0118]
The noise codebook 302 includes the algebraic codebook 303 and the codebook 304, which is composed of an algebraic codebook and a random codebook that is not an algebraic codebook. The noise code vector that minimizes the distortion is selected from both codebooks. An index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from the adaptive codebook 301 and the noise code vector output from the noise codebook 302 have both been determined. Finally, the distortion minimizer 314 determines the adaptive code gain by which the adaptive code vector is multiplied and the noise code gain by which the noise code vector is multiplied. The gain codebook 310 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to the multiplier 305 and its noise code gain element is output to the multiplier 307. An index G representing the selected gain vector is also transmitted to the decoding apparatus. In the speech encoding apparatus of this embodiment, in order to narrow the dynamic range of the noise code gain, the gain predictor 309 predicts the power in the logarithmic domain, and the prediction residual is quantized using the gain codebook. The gain predictor 309 predicts the power of the current noise code vector from the power of noise code vectors generated in the past and outputs it to the multiplier 306 as the prediction gain. The noise code vector output from the noise codebook 302 is multiplied by the prediction gain in the multiplier 306 and by the noise code gain in the multiplier 307. Using the adaptive code vector, the noise code vector, and the prediction gain, all of which are determined before the optimum gain vector is selected from the gain codebook 310, the distortion minimizer 314 determines the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
[0119]
The above operation is the same as in Embodiment 2, except that one of the two types of noise codebooks is composed of an algebraic codebook and a random codebook that is not an algebraic codebook. Including a random codebook that is not algebraic as part of that codebook gives a configuration that can provide modes corresponding to a variety of input speech states.
[0120]
FIG. 14 is a flowchart showing the noise codebook search method. FIG. 14 is obtained by rewriting the 2-pulse algebraic codebook portion of the flowchart of FIG. 9 for the third embodiment. First the 3-pulse algebraic codebook is searched, followed by the 2-pulse algebraic codebook search, a search combining the first channel of the algebraic codebook with the second channel of the random codebook, a search combining the first channel of the random codebook with the second channel of the algebraic codebook, and a search combining the first channel of the random codebook with the second channel of the random codebook. These codebook searches may be performed in any other possible order. FIGS. 15, 16, 17, and 18 are flowcharts showing, respectively, the contents of the 2-pulse algebraic codebook search 315, the (first channel of algebraic codebook + second channel of random codebook) search 316, the (first channel of random codebook + second channel of algebraic codebook) search 317, and the (first channel of random codebook + second channel of random codebook) search 318 of FIG. 14. The 2-pulse algebraic codebook search shown in FIG. 15 is the same as the 2-pulse algebraic codebook search portion described in the second embodiment. That is, the pulse of each channel is determined in order from channel 1 to channel 2. Once the position of the channel 1 pulse (pulse 1) is set, the error evaluation function used for distortion minimization is calculated from pulse 1 alone; then the position of the channel 2 pulse (pulse 2) is set, and the error evaluation function is calculated from pulse 1 and pulse 2. In this way, the error evaluation function is computed in a nested loop structure in which, each time a pulse of a new channel is added, the terms of the error evaluation function related to the newly added pulse are added, and the combination of the error evaluation function value that minimizes the error and the pulse positions at that time is obtained. In practice, the error is minimized by maximizing the error evaluation function shown in equation (1).
[0121]
The optimum noise code vector of the 2-pulse algebraic codebook obtained here is used in the optimum mode selection 319 as the candidate output from the 2-pulse algebraic codebook.
[0122]
FIG. 16 shows a flowchart for searching for (first channel of algebraic codebook + second channel of random codebook). First, the top N candidates that increase the numerator of the error evaluation function are selected from the first channel of the algebraic codebook (one channel of the two-pulse algebraic codebook). N is determined by the allowable calculation amount.
[0123]
Next, the top M candidates that increase the numerator term of the error evaluation function are selected from the second channel of the random codebook. M is determined by the allowable amount of computation and may or may not equal N. Then, for each of the N × M combinations of the N candidate noise code vectors selected from the one-pulse algebraic codebook and the M candidate noise code vectors selected from the second channel of the random codebook, both chosen by evaluating only the numerator term, the numerator and the denominator of the error evaluation function are calculated, and the combination that maximizes the error evaluation function, and thus minimizes the error between the synthesized speech and the input speech, is determined. The noise code vector generated by that combination is used in the optimum mode selection 319 of FIG. 14 as the candidate output from the (first channel of algebraic codebook + second channel of random codebook) mode.
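The two-stage search described above (preselection by the numerator term, then full evaluation of the surviving combinations) can be sketched as follows. The sketch assumes the candidates are supplied already passed through the synthesis filter, takes the evaluation function to be the usual ratio (x . y)^2 / |y|^2, and ignores polarity; the function names are hypothetical.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def top_by_numerator(target, synthesized, count):
    """Indices of the `count` candidates with the largest (x . Hc)^2."""
    order = sorted(range(len(synthesized)),
                   key=lambda i: dot(target, synthesized[i]) ** 2,
                   reverse=True)
    return order[:count]

def combined_search(target, synth_ch1, synth_ch2, n, m):
    """Preselect n and m candidates per channel, then evaluate the full
    ratio only for the n x m surviving combinations."""
    best = (-1.0, None, None)
    for i in top_by_numerator(target, synth_ch1, n):
        for j in top_by_numerator(target, synth_ch2, m):
            y = [u + v for u, v in zip(synth_ch1[i], synth_ch2[j])]
            den = dot(y, y)
            if den <= 0.0:
                continue
            value = dot(target, y) ** 2 / den
            if value > best[0]:
                best = (value, i, j)
    return best
```

The denominator is computed only n × m times instead of once for every pairing of the two full channels, which is where the saving in computation comes from.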
FIG. 17 shows the codebook search flowchart for the mode in which the channels of FIG. 16 are swapped, that is, the first channel of the random codebook is combined with the second channel of the algebraic codebook.
[0124]
First, the top M candidates that increase the numerator term of the error evaluation function are selected as candidate 1 from the first channel of the random codebook. Next, the top N candidates that increase the numerator term of the error evaluation function are selected as candidate 2 from the second channel of the algebraic codebook. Then, for each of the M × N combinations of candidate 1 and candidate 2, the numerator and denominator terms of the error evaluation function are calculated, and the combination that minimizes the distortion is determined by maximizing the value of the error evaluation function. The noise code vector obtained from the determined combination is used in the optimum mode selection 319 as the candidate output from the (first channel of random codebook + second channel of algebraic codebook) mode.
[0125]
FIG. 18 is obtained by replacing the algebraic codebook portion shown in FIG. 16 or 17 with a random codebook search loop. That is, it shows a flowchart for selecting the optimum noise code vector when the noise code vectors output from the first channel and the second channel of the random codebook are combined.
[0126]
First, the top M candidates that increase the numerator term of the error evaluation function are selected as candidate 1 from the first channel of the random codebook. Next, the top M candidates that increase the numerator term of the error evaluation function are selected as candidate 2 from the second channel of the random codebook. Then, for each of the M × M combinations of candidate 1 and candidate 2, the numerator and denominator terms of the error evaluation function are calculated for the noise code vector obtained from the combination. The combination that maximizes the error evaluation function, that is, minimizes the distortion, is selected from the M × M candidates and determined as the optimal noise code vector of the mode (first channel of random codebook + second channel of random codebook). The determined noise code vector is used, in the optimum mode selection 319 of FIG. 14, as the candidate output from that mode. In 319 of FIG. 14, the noise code vector with the largest error evaluation function value, that is, the smallest distortion, is selected from among the 1 pulse + random, random + 1 pulse, and random + random modes as the optimal noise code vector of the 2-channel mode. This is then compared with the optimal noise code vector of the 3-channel mode determined beforehand; the mode with the larger error evaluation function value, that is, the smaller distortion, is selected, and the noise code vector to be output from the noise codebook 302 is determined.
[0127]
A table showing a specific example of such a noise codebook 302 is given in FIG. 19. In this example, the subframe length (the dimension of the noise code vector) is 80. A total of 15 bits is used: 1 bit to indicate the 3-channel mode or the 2-channel mode, 1 bit per channel to indicate the polarity of each channel, and the remaining bits for index information indicating the pulse position or the noise codebook entry. The numerical values in the pulse search position column of the table are sample positions counted with the head of the subframe as 0, and RA01 to RA24 and RB01 to RB24 in the pulse search position column of the 2-channel mode denote the noise codebook indexes of the respective channels.
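The bit budget stated above can be checked with a few lines of arithmetic. The sketch below only uses the figures given in the text (15 bits in total, 1 mode bit, 1 polarity bit per channel); how the remaining index bits are divided among the channels is defined by the table in the figure and is not reproduced here. The names are illustrative.

```python
TOTAL_BITS = 15  # noise codebook bits per subframe, from the text
MODE_BITS = 1    # selects the 3-channel or the 2-channel mode


def index_bits(channels):
    """Bits left for position/codebook indexes after mode and polarity bits."""
    polarity_bits = channels  # 1 polarity bit per channel
    return TOTAL_BITS - MODE_BITS - polarity_bits


print(index_bits(3))  # 3-channel mode -> 11
print(index_bits(2))  # 2-channel mode -> 12
```

The 2-channel mode spends one polarity bit fewer, so it has one more bit available for index information.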
[0128]
FIG. 20 shows the configuration of the 2-channel noise codebook section (304 in FIG. 13) of this codebook. The configuration and operation of this 2-channel noise codebook section are described below with reference to FIG. 20. In FIG. 20, reference numeral 320 denotes the first channel of the noise codebook, which consists of the first channel 322 of an algebraic codebook and the first channel 323 of a random codebook. The first channel 322 of the algebraic codebook outputs a noise code vector to the adders 326 and 327, and the first channel 323 of the random codebook outputs a noise code vector to the adders 328 and 329. Reference numeral 321 denotes the second channel of the noise codebook, which consists of the second channel 324 of an algebraic codebook and the second channel 325 of a random codebook. The second channel 324 of the algebraic codebook outputs a noise code vector to the adders 326 and 328, and the second channel 325 of the random codebook outputs a noise code vector to the adders 327 and 329. Each of the adders 326 to 329 adds its two input noise code vectors and outputs the result to the switch 330. The switch 330 selects and outputs the optimum noise code vector from among those output from the adders 326 to 329. Here, the noise code vectors from the first channel 322 and the second channel 324 of the algebraic codebook that are input to the adder 326 are the vectors that are optimal for that combination. Likewise, the vectors from the first channel 322 of the algebraic codebook and the second channel 325 of the random codebook input to the adder 327, the vectors from the first channel 323 of the random codebook and the second channel 324 of the algebraic codebook input to the adder 328, and the vectors from the first channel 323 and the second channel 325 of the random codebook input to the adder 329 are each the vectors that are optimal for their respective combinations. The switch 330 is not switched by dedicated switching information; its setting follows from the combination of the channel 1 noise codebook 320 and the channel 2 noise codebook 321.
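Because the switch is determined by which sub-codebook each channel's index addresses, no separate switching bit is needed. The sketch below illustrates that idea under an assumed index layout in which the lower indexes of a channel address pulse positions and the remaining indexes address random codebook entries. That layout, and the names `classify` and `select_adder`, are assumptions for illustration; only the adder numbering follows FIG. 20 as described in the text.

```python
def classify(index, n_pulse_positions):
    """Which sub-codebook a channel index addresses (index layout is assumed)."""
    return "algebraic" if index < n_pulse_positions else "random"


# Adder that carries each channel-1 / channel-2 combination in FIG. 20.
ADDER_FOR = {
    ("algebraic", "algebraic"): 326,
    ("algebraic", "random"): 327,
    ("random", "algebraic"): 328,
    ("random", "random"): 329,
}


def select_adder(idx1, idx2, n_pos1, n_pos2):
    """Return the adder whose output the switch would pass through."""
    return ADDER_FOR[(classify(idx1, n_pos1), classify(idx2, n_pos2))]
```

For example, with 40 hypothetical pulse positions per channel, `select_adder(3, 50, 40, 40)` returns 327, the algebraic + random combination.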
[0129]
As the random codebook that is not an algebraic codebook, a white-noise-like noise codebook, a codebook optimized by learning, or a codebook having a sparse structure can be used. The sparse structure is particularly effective in reducing the amount of computation, and a learning-type codebook improves performance through learning. Further, if part of the noise codebook is a learning-type codebook and the rest is a random noise codebook, both effects, improved noise resistance and improved performance through learning, can be obtained.
[0130]
The speech decoding apparatus of the present embodiment is described with reference to FIG. 21. FIG. 21 is a block diagram showing an example of a speech decoding apparatus according to Embodiment 3. Reference numeral 351 denotes an adaptive codebook that receives the previously generated drive excitation vector output from the adder 358 and outputs the adaptive code vector specified by the information A transmitted from the encoding apparatus to the multiplier 355. 352 denotes a noise codebook, identical to that of the encoding apparatus and having the configuration of FIGS. 19 and 20 shown in the present embodiment, which extracts the noise code vector specified by the information S transmitted from the encoding apparatus from either the algebraic codebook 353 or the noise codebook 354, which consists of an algebraic codebook and a random codebook that is not an algebraic codebook, and outputs it to the multiplier 356. 355 denotes a multiplier that multiplies the adaptive code gain output from the gain codebook 360 by the adaptive code vector output from the adaptive codebook 351 and outputs the result to the adder 358. 356 denotes a multiplier that receives the prediction gain output from the gain predictor 359 and the noise code vector output from the noise codebook 352 and outputs their product to the multiplier 357. 357 denotes a multiplier that multiplies the noise code vector already multiplied by the prediction gain, output from the multiplier 356, by the noise code gain output from the gain codebook 360 and outputs the result to the adder 358 and to the gain predictor 359. 358 denotes an adder that performs vector addition of the gain-multiplied adaptive code vector output from the multiplier 355 and the gain-multiplied noise code vector output from the multiplier 357 and outputs the result to the synthesis filter 362 and the adaptive codebook 351. 359 denotes a gain predictor that receives the gain-multiplied noise code vector output from the multiplier 357 and outputs a prediction gain to the multiplier 356. 360 denotes a gain codebook that outputs the adaptive code gain specified by the information G transmitted from the encoding apparatus to the multiplier 355 and the noise code gain to the multiplier 357. 361 denotes a linear prediction coefficient decoder that decodes the information L transmitted from the encoding apparatus to obtain the quantized linear prediction coefficients and outputs them to the synthesis filter 362. 362 denotes a synthesis filter that receives the drive excitation vector output from the adder 358 and the quantized linear prediction coefficients output from the linear prediction coefficient decoder 361 and outputs a synthesized speech signal. In general, the decoded speech signal output from the synthesis filter is further subjected to filter processing or the like to enhance its perceptual quality.
[0131]
The operation of the speech decoding apparatus configured as described above is described below with reference to FIG. 21. In FIG. 21, information L, A, S, and G is transmitted from the encoding device and input to the linear prediction coefficient decoder 361, the adaptive codebook 351, the noise codebook 352, and the gain codebook 360, respectively. The linear prediction coefficient decoder 361, on receiving the information L, decodes the quantized linear prediction coefficients and outputs them to the synthesis filter 362. The synthesis filter 362 is constructed using the quantized linear prediction coefficients. The adaptive codebook 351, on receiving the information A, cuts out the adaptive code vector designated by A and outputs it to the multiplier 355. The noise codebook 352, on receiving the information S, generates the noise code vector designated by S from the algebraic codebook 353 or from the noise codebook 354 composed of an algebraic codebook and a random codebook, and outputs it to the multiplier 356. The multiplier 356 multiplies the noise code vector by the prediction gain output from the gain predictor 359 and outputs the result to the multiplier 357. The gain codebook 360, on receiving the information G, selects and outputs the quantized gains designated by G: the adaptive code gain is output to the multiplier 355 and the noise code gain to the multiplier 357. The multiplier 355 multiplies the adaptive code vector output from the adaptive codebook 351 by the adaptive code gain output from the gain codebook 360 and outputs the result to the adder 358. The multiplier 357 multiplies the noise code gain output from the gain codebook 360 by the noise code vector that was output from the noise codebook 352 and multiplied by the prediction gain in the multiplier 356, and outputs the result to the adder 358. The gain-multiplied noise code vector output from the multiplier 357 is also output to the gain predictor 359. The gain predictor 359 predicts the gain (logarithmic power) of the current noise code vector, by MA prediction or the like, from the noise code vectors previously output from the multiplier 357, and outputs it to the multiplier 356. The adder 358 adds the adaptive code vector component of the drive excitation signal output from the multiplier 355 and the noise code vector component output from the multiplier 357 to generate the drive excitation signal and outputs it to the synthesis filter 362. The drive excitation vector output from the adder 358 is also output to the adaptive codebook 351 and used to update it. The synthesis filter 362 synthesizes a speech signal from the drive excitation output from the adder 358 and outputs it. This output may be used as the decoded speech signal as it is, but its quality is generally insufficient, so post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is applied to improve the perceptual quality before it is output as the decoded speech signal.
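The decoder signal flow described above (multipliers 355 to 357, adder 358, synthesis filter 362) can be sketched for one subframe as follows. This is a simplified illustration under stated assumptions: the synthesis filter is taken to be the all-pole filter 1/A(z) with A(z) = 1 + sum of lpc[k]·z^-(k+1), post-processing is omitted, and every name is hypothetical.

```python
def synthesize(excitation, lpc, memory):
    """All-pole synthesis filter 1/A(z); `memory` holds past outputs, newest first."""
    out = []
    mem = list(memory)
    for e in excitation:
        s = e - sum(a * m for a, m in zip(lpc, mem))
        out.append(s)
        mem = [s] + mem[:-1]  # shift the newest output into the filter memory
    return out, mem


def decode_subframe(adaptive_vec, noise_vec, adaptive_gain, noise_gain,
                    predicted_gain, lpc, memory):
    """One subframe of the decoder: scale, add, then run the synthesis filter."""
    # Multipliers 356 and 357: prediction gain, then decoded noise code gain.
    scaled_noise = [noise_gain * predicted_gain * c for c in noise_vec]
    # Multiplier 355 and adder 358: drive excitation vector.
    excitation = [adaptive_gain * a + n
                  for a, n in zip(adaptive_vec, scaled_noise)]
    # Synthesis filter 362.
    speech, memory = synthesize(excitation, lpc, memory)
    return excitation, scaled_noise, speech, memory
```

In a full decoder, `excitation` would be appended to the adaptive codebook buffer and `scaled_noise` would be fed to the gain predictor for the next subframe.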
[0132]
As described above, according to Embodiment 3, by providing, alongside the algebraic codebook, a noise codebook that is not an algebraic codebook, speech coding performance can be improved beyond what the algebraic codebook alone can achieve.
[0133]
(Embodiment 4)
FIG. 22 shows the functional blocks of a speech coding apparatus according to Embodiment 4 of the present invention. In the speech coding apparatus of Embodiment 4, in a configuration having a plurality of types of algebraic codebooks, voice quality is improved by adaptively changing part of the algebraic codebook, in a mode with few pulses and insufficient bits, according to the pitch peak position obtained from the adaptive code vector. In FIG. 22, reference numeral 401 denotes an adaptive codebook that receives and stores the previously generated drive excitation vector from the adder 408 and, in accordance with a control signal from the distortion minimizer 414, outputs an adaptive code vector to the multiplier 405 and to the algebraic codebook 403 having a phase-adaptive part. 402 denotes a noise codebook that is composed of two types of noise codebooks and outputs a noise code vector from one of them in accordance with a control signal from the distortion minimizer 414. 403 denotes an algebraic codebook having a phase-adaptive part, which forms part of the noise codebook 402. 404 denotes a noise codebook that forms part of the noise codebook 402 and consists of an algebraic codebook whose structure differs from that of the algebraic codebook 403 and a random codebook that is not an algebraic codebook. 405 denotes a multiplier that receives the adaptive code vector output from the adaptive codebook 401 and the adaptive code gain output from the gain codebook 410 and outputs their product to the adder 408. 406 denotes a multiplier that receives the noise code vector output from the noise codebook 402 and the prediction gain output from the gain predictor 409 and outputs their product to the multiplier 407. 407 denotes a multiplier that receives the noise code vector multiplied by the prediction gain, output from the multiplier 406, and the noise code gain output from the gain codebook 410, and outputs their product to the adder 408 and the gain predictor 409. 408 denotes an adder that receives the adaptive code vector multiplied by the adaptive code gain, output from the multiplier 405, and the noise code vector multiplied by the prediction gain and the noise code gain, output from the multiplier 407, and outputs the result of their vector addition to the synthesis filter 412 and the adaptive codebook 401. 409 denotes a gain predictor that receives the noise code vector multiplied by the prediction gain and the noise code gain, output from the multiplier 407, and outputs a prediction gain to the multiplier 406. 410 denotes a gain codebook that, in accordance with a control signal from the distortion minimizer 414, outputs the adaptive code gain to the multiplier 405 and the noise code gain to the multiplier 407. 411 denotes a linear prediction analyzer that performs linear prediction analysis and linear prediction coefficient quantization on the input speech signal and outputs the quantized linear prediction coefficients to the synthesis filter 412. 412 denotes a synthesis filter that receives the drive excitation vector output from the adder 408 and the quantized linear prediction coefficients output from the linear prediction analyzer 411 and outputs a synthesized speech signal to the adder 413. 413 denotes an adder that performs vector subtraction between the input speech signal and the synthesized speech signal output from the synthesis filter 412 and outputs the result to the distortion minimizer 414. 414 denotes a distortion minimizer that calculates, from the difference vector output from the adder 413, the distortion of the synthesized speech signal relative to the input speech signal in the perceptually weighted domain or the like, and controls the outputs of the adaptive codebook 401, the noise codebook 402, and the gain codebook 410 so that this distortion is minimized.
[0134]
The operation of the speech coding apparatus configured as described above is described below with reference to the drawings.
[0135]
First, in FIG. 22, the speech signal is input to the linear prediction analyzer 411. The linear prediction analyzer 411 performs linear prediction analysis of the input speech signal and calculates a linear prediction coefficient required by the synthesis filter 412. The calculated linear prediction coefficient is quantized and then output to the synthesis filter 412. Also, a code L representing the linear prediction coefficient output here is transmitted to the decoding device.
[0136]
Next, the synthesis filter 412 is configured using the quantized linear prediction coefficients output from the linear prediction analyzer 411. The distortion minimizer 414 drives this synthesis filter using only the adaptive codebook 401 and selects from the adaptive codebook 401 the adaptive code vector with the smallest distortion. The adaptive codebook represents the periodic component of the speech signal (this period is called the pitch period); normally, a vector one subframe long is cut out starting from the point one pitch period in the past (if one pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector one subframe long). An index A representing the selected adaptive code vector is transmitted to the decoder side. Thereafter, the adaptive code vector output from the adaptive codebook 401 is fixed to the one determined here. Subsequently, the synthesis filter 412 is driven using both the adaptive codebook 401 and the noise codebook 402. Since the adaptive code vector has already been determined, the noise code vector output from the noise codebook 402 is varied, and the noise code vector that the distortion minimizer 414 judges to give the minimum distortion is selected.
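The cut-out of the adaptive code vector, including the repetition used when the pitch period is shorter than the subframe, can be sketched as below. This is a minimal illustration with integer pitch lags only; the buffer layout (most recent sample last) and the function name are assumptions.

```python
def adaptive_code_vector(past_excitation, pitch_lag, subframe_len):
    """Cut one subframe out of the past excitation, one pitch period back.

    If the pitch lag is shorter than the subframe, the extracted segment
    is repeated at the pitch period until the subframe is filled.
    """
    if not 0 < pitch_lag <= len(past_excitation):
        raise ValueError("pitch lag must lie within the stored excitation")
    segment = past_excitation[-pitch_lag:]
    # n % pitch_lag equals n while n < pitch_lag, so one expression
    # covers both the plain cut-out and the periodic repetition.
    return [segment[n % pitch_lag] for n in range(subframe_len)]
```

For example, with a stored excitation `[1, 2, 3, 4, 5, 6]`, a lag of 2, and a subframe length of 5, the result is `[5, 6, 5, 6, 5]`.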
[0137]
Since the noise codebook 402 comprises the algebraic codebook 403 having a phase-adaptive part and the noise codebook 404 consisting of an algebraic codebook and a random codebook that is not an algebraic codebook, the noise code vector that minimizes the distortion is selected from among both codebooks.
[0138]
Note that the phase-adaptive part of the algebraic codebook is generated using the adaptive code vector output from the adaptive codebook and the pitch period as inputs. The index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from the adaptive codebook 401 and the noise code vector output from the noise codebook 402 have been determined. Finally, the distortion minimizer 414 determines the adaptive code gain by which the adaptive code vector is multiplied and the noise code gain by which the noise code vector is multiplied. The gain codebook 410 stores two-dimensional vectors (gain vectors), each having an adaptive code gain and a noise code gain as its elements. The gain vector whose combination of adaptive code gain and noise code gain minimizes the distortion is selected; its adaptive code gain element is output to the multiplier 405 and its noise code gain element to the multiplier 407. An index G representing the selected gain vector is transmitted to the decoding device.
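The selection of a gain vector from the two-dimensional gain codebook can be sketched as an exhaustive search. This is an illustrative sketch only: it assumes the adaptive and noise contributions have already been passed through the synthesis filter, uses a plain squared error without perceptual weighting, and leaves out the prediction gain. All names are hypothetical.

```python
def search_gain_codebook(target, adaptive_syn, noise_syn, gain_codebook):
    """Return (index, error) of the gain vector with the smallest squared error.

    `gain_codebook` is a list of (adaptive_gain, noise_gain) pairs.
    """
    best_index, best_err = None, float("inf")
    for index, (ga, gc) in enumerate(gain_codebook):
        err = sum((t - ga * a - gc * c) ** 2
                  for t, a, c in zip(target, adaptive_syn, noise_syn))
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err
```

The returned index plays the role of the information G sent to the decoder.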
[0139]
In the speech coding apparatus of the present embodiment, in order to narrow the dynamic range of the noise code gain, the gain predictor 409 predicts the power in the logarithmic domain, and the prediction residual is quantized using the gain codebook.
[0140]
The gain predictor 409 predicts the power of the current noise code vector from the power of previously generated noise code vectors and outputs it to the multiplier 406 as the prediction gain. The noise code vector output from the noise codebook 402 is multiplied by the prediction gain in the multiplier 406 and by the noise code gain in the multiplier 407. Using the adaptive code vector, the noise code vector, and the prediction gain determined up to that point, the distortion minimizer 414 selects the optimal gain vector from the gain codebook 410, that is, the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
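A log-domain gain predictor of the kind described here can be sketched as follows. The text only states that the power is predicted in the logarithmic domain from past noise code vectors; the specific form below (a weighted sum of past log powers in dB, converted to a linear gain relative to the power of the unscaled current vector) and the coefficient values are assumptions made for illustration.

```python
import math


def log_power(vec):
    """Mean power of a vector in dB, floored to avoid log of zero."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    return 10.0 * math.log10(max(mean_sq, 1e-10))


def predicted_gain(past_log_powers, ma_coeffs, noise_vec):
    """Linear gain that scales `noise_vec` to the predicted log power.

    `past_log_powers` holds the dB powers of previously scaled noise
    code vectors (newest first); `ma_coeffs` are hypothetical weights.
    """
    predicted_db = sum(b * p for b, p in zip(ma_coeffs, past_log_powers))
    return 10.0 ** ((predicted_db - log_power(noise_vec)) / 20.0)
```

Because the prediction carries most of the level information, the gain codebook only needs to quantize the remaining correction factor, which is what narrows its dynamic range.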
[0141]
FIG. 23 shows an example of the configuration of the algebraic codebook 403 having the phase-adaptive part in FIG. 22. Its configuration and operation are described below with reference to FIG. 23. In the figure, 415 to 418 are codebooks A to D of pulse 1, 419 to 422 are codebooks A to D of pulse 2, and 423 to 426 are codebooks A to D of pulse 3; the output of each pulse codebook is input to the adder 430 through the switch 429. The switch 429 consists of three interlocked switches; according to the signal output from the phase adaptive position determination unit 428, it selects one of the pulse codebooks A to D and connects it to the adder 430. That is, when codebook A is selected for pulse 1, codebook A is also selected for pulses 2 and 3. The adder 430 performs vector addition of the three codebook outputs from the switch 429 and outputs the result as a noise code vector. Meanwhile, the pitch peak position detector 427 receives the adaptive code vector and the pitch period, obtains the first pitch peak position in the subframe, and outputs it to the phase adaptive position determination unit 428. One way to obtain the pitch peak position is to find the position that maximizes the cross-correlation function between an impulse train arranged at the pitch period and the adaptive code vector. Another, as described in Mano, Moriya: “Examination of phase adaptive PSI-CELP speech coding”, IEICE Technical Report SP94-96 (February 1995), is to maximize the cross-correlation between the output of the synthesis filter driven by an impulse train arranged at the pitch period and the output of the synthesis filter driven by the adaptive code vector. Once the pitch peak position is obtained, the phase adaptive position determination unit 428 determines, according to that position, which codebook to switch to, and sends the switching signal to the switch 429.
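The first method of finding the pitch peak, maximizing the cross-correlation between a pitch-spaced impulse train and the adaptive code vector, can be sketched as follows. This is an illustration under assumptions: unit impulses, an integer pitch period, and a signed (not absolute) correlation. The function name is hypothetical.

```python
def pitch_peak_position(adaptive_vec, pitch_period):
    """First pitch peak in the subframe, found by impulse-train correlation.

    For each candidate offset within the first pitch period, a unit
    impulse train spaced at the pitch period is correlated with the
    adaptive code vector; the correlation is simply the sum of the
    samples under the impulses.
    """
    best_pos, best_corr = 0, float("-inf")
    for start in range(min(pitch_period, len(adaptive_vec))):
        corr = sum(adaptive_vec[n]
                   for n in range(start, len(adaptive_vec), pitch_period))
        if corr > best_corr:
            best_pos, best_corr = start, corr
    return best_pos
```

Because the offset is restricted to the first pitch period, the returned value is the position of the first peak in the subframe.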
[0142]
In FIG. 23, codebooks A to D of pulses 1, 2, and 3 are drawn as completely separate blocks, but codebooks A to D of each pulse share part of their contents. That is, as shown by 403 in FIG. 22, the codebook consists of a fixed algebraic codebook and an algebraic codebook that is adaptively switched according to the phase position. This algebraic codebook is described with reference to FIG. 24.
[0143]
FIG. 24 shows an example of an algebraic codebook having a phase-adaptive part used in the present embodiment. FIG. 24(a) shows an adaptive code vector (one subframe) output from the adaptive codebook, FIG. 24(b) shows one subframe divided into four regions A to D, and FIG. 24(c) shows the fixed part and the phase-adaptive part of the algebraic codebook. In FIG. 24(c), portion (6) is the fixed algebraic codebook part, which is always searched regardless of the pitch peak position. In this example, the fixed algebraic codebook searches even-numbered sample points. Portion (5), in contrast, is a portion that belongs to the fixed part in the conventional method and is removed from the fixed part here. The information that had been assigned to the removed portion is reassigned to the phase-adaptive part. That is, where the conventional method fixes the search position at portion (5), here one of portions (1) to (4) is searched depending on the pitch peak position. This part is called the phase-adaptive part because it is switched adaptively according to the phase. In this example, the phase-adaptive algebraic codebook searches odd-numbered sample points, except in portion (4). Because portion (4) is searched only by the phase-adaptive part, it may use either even or odd sample points; in FIG. 24 it uses even sample points. For both the fixed and the phase-adaptive algebraic codebook, an algebraic codebook in which the position of a pulse is determined by the indexes of a plurality of channels, as shown in Embodiment 1, can also be used. P in FIG. 24(a) is the pitch peak position obtained by the pitch peak position detector 427. The phase adaptive position determination unit 428 first determines in which of the regions A to D the pitch peak position P lies when one subframe is divided into four as shown in FIG. 24(b). Next, it determines which of the portions (1) to (4) in FIG. 24(c) to use. In the example of FIG. 24, P lies in region A of (b), so portion (1) is selected as the phase-adaptive algebraic codebook.
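The stepwise decision made by the phase adaptive position determination unit can be sketched as a simple region lookup. The text states only that a peak in region A selects portion (1); the equal-length regions, the mapping of regions B to D onto portions (2) to (4), and the default subframe length of 80 (taken from the Embodiment 3 example) are assumptions for illustration.

```python
def phase_adaptive_portion(peak_pos, subframe_len=80, regions=4):
    """Map a pitch peak position to the phase-adaptive portion number (1..4).

    The subframe is split into `regions` equal parts (A, B, C, D) and the
    part containing the peak selects the portion with the same rank.
    """
    if not 0 <= peak_pos < subframe_len:
        raise ValueError("peak position must lie inside the subframe")
    region_len = subframe_len // regions
    region = min(peak_pos // region_len, regions - 1)
    return region + 1
```

A small error in the peak position usually leaves the selected portion unchanged, which is the robustness property of stepwise switching.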
[0144]
Because the phase-adaptive part is switched in steps in this way, even if the calculated pitch peak position differs between the encoder and the decoder (for example, when a transmission path error occurs), the influence can be confined to a certain range. In FIG. 24(c), portions (1) to (4) do not overlap, but providing overlapping phase-adaptive codebooks also makes it possible to handle the case where the pitch peak position is near a boundary between phase-adaptive codebooks. It is also effective to switch the ratio of the lengths of (5) and (6) in FIG. 24(c) using the pitch period information: when the pitch period is long, (6) is lengthened and (5) is shortened, and conversely, when the pitch period is short, (6) is shortened and (5) is lengthened. In this case, the length of (6) is set to at least the pitch period.
[0145]
FIG. 25 shows a flowchart of the noise codebook search method in the present embodiment. Compared with FIG. 14 of Embodiment 3, the first three blocks (pitch peak position calculation, phase adaptive position determination, and phase adaptive codebook selection (switching)) are newly added as preprocessing for the 3-channel algebraic codebook search. First, as shown at 427 in FIG. 23, the pitch peak position is calculated using the adaptive code vector and the pitch period. Next, the search range of the phase-adaptive algebraic codebook is determined according to the pitch peak position. Then the search positions of each pulse are set by selecting the algebraic codebook that covers that search range. This completes the preprocessing for the three-pulse search. After that, the positions of pulses 1 to 3 are initialized, and the optimum combination of three pulse positions is found from among the combinations of the search positions set for pulses 1 to 3 in the preprocessing. The pulses are determined channel by channel, in order from channel 1 to channel 3. When the position of the channel 1 pulse (pulse 1) is set, the error evaluation function used for distortion minimization is computed from pulse 1 alone; when the position of the channel 2 pulse (pulse 2) is then set, the error evaluation function is computed from pulses 1 and 2. In this way, the search uses a nested loop structure in which, each time the pulse of a new channel is added, the error evaluation function terms related to that pulse are added. The combination of pulse positions that gives the minimum distortion, together with the corresponding error evaluation function value, is obtained. This search method is the same as the conventional one, except that part of the search positions is switched in the preprocessing of the search loop according to the pitch peak position.
When the search of the 3-channel algebraic codebook is complete, the 2-channel noise codebook is searched. If an excitation vector with smaller distortion than the one determined from the 3-channel algebraic codebook is found, the noise code vector output from the 2-channel noise codebook becomes the final output.
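The nested-loop structure of the three-pulse search, in which each loop level adds only the terms related to the newly placed pulse, can be sketched as follows. This is a simplified illustration that assumes the common CELP formulation: `d` is the correlation between the target and the impulse response at each position, `phi` is the matrix of impulse-response correlations, and the criterion to maximize is the squared correlation divided by the energy. Pulse polarity is ignored, and all names are hypothetical.

```python
def search_three_pulses(d, phi, positions1, positions2, positions3):
    """Exhaustive nested search over one candidate position list per pulse."""
    best_score, best_pos = -1.0, None
    for p1 in positions1:
        c1 = d[p1]                      # correlation term with pulse 1 only
        e1 = phi[p1][p1]                # energy term with pulse 1 only
        for p2 in positions2:
            c2 = c1 + d[p2]             # add the terms that involve pulse 2
            e2 = e1 + phi[p2][p2] + 2.0 * phi[p1][p2]
            for p3 in positions3:
                c3 = c2 + d[p3]         # add the terms that involve pulse 3
                e3 = e2 + phi[p3][p3] + 2.0 * (phi[p1][p3] + phi[p2][p3])
                if e3 > 0.0 and c3 * c3 / e3 > best_score:
                    best_score, best_pos = c3 * c3 / e3, (p1, p2, p3)
    return best_pos, best_score
```

In the scheme described above, the three position lists would be the fixed positions plus those of the phase-adaptive portion chosen in the preprocessing, so the loop itself stays the same as in a conventional search.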
[0146]
The speech decoding apparatus of the present embodiment is described with reference to FIG. 26. FIG. 26 is a block diagram showing an example of a speech decoding apparatus according to Embodiment 4 of the present invention. Reference numeral 451 denotes an adaptive codebook that receives the previously generated drive excitation vector output from the adder 458 and outputs the adaptive code vector specified by the information A transmitted from the encoding device to the multiplier 455 and to the algebraic codebook 453, part of which is a phase-adaptive algebraic codebook. 452 denotes a noise codebook, identical to that of the encoding apparatus and having the configuration of FIG. 23 and related figures shown in the present embodiment, which extracts the noise code vector specified by the information S transmitted from the encoding device from either the algebraic codebook 453, part of which is a phase-adaptive algebraic codebook, or the noise codebook 454, which consists of an algebraic codebook and a random codebook that is not an algebraic codebook, and outputs it to the multiplier 456. 455 denotes a multiplier that multiplies the adaptive code gain output from the gain codebook 460 by the adaptive code vector output from the adaptive codebook 451 and outputs the result to the adder 458. 456 denotes a multiplier that receives the prediction gain output from the gain predictor 459 and the noise code vector output from the noise codebook 452 and outputs their product to the multiplier 457. 457 denotes a multiplier that multiplies the noise code vector already multiplied by the prediction gain, output from the multiplier 456, by the noise code gain output from the gain codebook 460 and outputs the result to the adder 458 and to the gain predictor 459. 458 denotes an adder that performs vector addition of the gain-multiplied adaptive code vector output from the multiplier 455 and the gain-multiplied noise code vector output from the multiplier 457 and outputs the result to the synthesis filter 462 and the adaptive codebook 451. 459 denotes a gain predictor that receives the gain-multiplied noise code vector output from the multiplier 457 and outputs a prediction gain to the multiplier 456. 460 denotes a gain codebook that outputs the adaptive code gain specified by the information G transmitted from the encoding device to the multiplier 455 and the noise code gain to the multiplier 457. 461 denotes a linear prediction coefficient decoder that decodes the information L transmitted from the encoding device to obtain the quantized linear prediction coefficients and outputs them to the synthesis filter 462. 462 denotes a synthesis filter that receives the drive excitation vector output from the adder 458 and the quantized linear prediction coefficients output from the linear prediction coefficient decoder 461 and outputs a synthesized speech signal. In general, the decoded speech signal output from the synthesis filter is further subjected to filter processing or the like to enhance its perceptual quality.
[0147]
The operation of the speech decoding apparatus configured as described above will be described below with reference to FIG. In FIG. 26, information L, A, S, and G is transmitted from the encoding device, and each information is input to the linear prediction coefficient decoder 461, the adaptive codebook 451, the noise codebook 452, and the gain codebook 460. The linear prediction coefficient decoder 461 that has received the information L decodes the quantized linear prediction coefficient and outputs it to the synthesis filter 462. The synthesis filter 462 is constructed using quantized linear prediction coefficients. The adaptive codebook 451 that has received the information A cuts out the adaptive code vector designated by A from the adaptive codebook and outputs it to the multiplier 455. At this time, the adaptive code vector is also output to the algebraic codebook 453, a part of which is phase adaptive. The noise codebook 452 that has received the information S generates a noise code vector designated by S from the algebraic codebook 453 that is partially phase-adaptive or the noise codebook 454 that consists of an algebraic codebook and a random codebook. , Output to the multiplier 456. Note that in the algebraic codebook 453 that is partly phase adaptive, the phase adaptive part is adaptively generated based on the pitch peak position obtained from the adaptive code vector input from the adaptive codebook 451. Multiplier 456 multiplies the noise gain vector by the prediction gain output from gain predictor 459 and outputs the result to multiplier 457. The gain codebook that has received the information G selects and outputs the quantization gain designated by G from the gain codebook. At this time, the adaptive code gain is output to the multiplier 455 and the noise code gain is output to the multiplier 457, respectively. 
The multiplier 455 multiplies the adaptive code vector output from the adaptive codebook 451 by the adaptive code gain output from the gain codebook 460 and outputs the result to the adder 458. The multiplier 457 multiplies the noise code gain output from the gain codebook 460 by the noise code vector that was output from the noise codebook 452 and multiplied by the prediction gain in the multiplier 456, and outputs the result to the adder 458. The noise code vector after this multiplication is also output from the multiplier 457 to the gain predictor 459. The gain predictor 459 predicts the gain (logarithmic power) of the current noise code vector, by MA prediction or the like, from the noise code vectors that the multiplier 457 output in the past, and outputs the prediction to the multiplier 456. The adder 458 adds the adaptive code vector component output from the multiplier 455 and the noise code vector component output from the multiplier 457 to generate the driving excitation signal and outputs it to the synthesis filter 462. The driving excitation vector output from the adder 458 is also fed back to the adaptive codebook 451 and used to update it. The synthesis filter 462 synthesizes a speech signal from the driving excitation output from the adder 458 and outputs it. This output may be used directly as the decoded speech signal, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is usually applied to improve perceptual quality before the signal is output as the decoded speech signal.
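The decoder signal flow described above (multipliers 455 to 457, adder 458, synthesis filter 462) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the plain-list representation of vectors, and the sign convention A(z) = 1 + sum of lpc[k] z^-(k+1) for the synthesis filter are all assumptions made for the example.

```python
def build_excitation(adaptive_vec, noise_vec, adaptive_gain, noise_gain, predicted_gain):
    """Combine the two codebook contributions into one driving excitation vector."""
    # Multiplier 455: adaptive code vector times decoded adaptive code gain.
    adaptive_part = [adaptive_gain * a for a in adaptive_vec]
    # Multiplier 456: noise code vector times the predicted gain.
    predicted = [predicted_gain * c for c in noise_vec]
    # Multiplier 457: result times the decoded noise code gain.
    noise_part = [noise_gain * c for c in predicted]
    # Adder 458: sum of both components.
    return [a + c for a, c in zip(adaptive_part, noise_part)]


def synthesize(excitation, lpc, memory=None):
    """All-pole synthesis filter 1/A(z) (sign convention assumed, see lead-in)."""
    order = len(lpc)
    memory = list(memory) if memory is not None else [0.0] * order
    output = []
    for sample in excitation:
        y = sample - sum(a * m for a, m in zip(lpc, memory))
        output.append(y)
        # Keep only the most recent `order` output samples as filter state.
        memory = ([y] + memory)[:order]
    return output, memory
```

The excitation returned by `build_excitation` would also be appended to the adaptive codebook's history, mirroring the feedback path from the adder 458 to the adaptive codebook 451.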
[0148]
As described above, according to Embodiment 4, in a configuration having a plurality of types of algebraic codebooks, part of the algebraic codebook used in the mode with few pulses, where the number of bits is insufficient, is changed adaptively according to the pitch peak position obtained from the adaptive code vector, so that speech quality can be improved.
[0149]
(Embodiment 5)
FIG. 27 shows a block diagram of a speech coding apparatus according to Embodiment 5 of the present invention. In a noise codebook composed of two or more channels, the speech coding apparatus according to Embodiment 5 switches the codebook used in each channel according to the combination of channels, which makes it possible to switch modes without independent mode information.
[0150]
In FIG. 27, reference numeral 501 denotes an adaptive codebook that stores the driving excitation vectors generated in the past, which are input from the adder 508, and outputs an adaptive code vector to the multiplier 505 in response to a control signal from the distortion minimizer 514. Reference numeral 502 denotes a noise codebook that consists of two types of noise codebooks and outputs a noise code vector from one of them in response to a control signal from the distortion minimizer 514. Reference numeral 503 denotes an algebraic codebook that forms part of the noise codebook 502. Reference numeral 504 denotes a noise codebook (two-channel configuration) that forms part of the noise codebook 502 and consists of an algebraic codebook whose structure differs from that of the algebraic codebook 503 (a two-channel algebraic codebook) and two kinds of random codebooks that are not algebraic codebooks. The multiplier 505 receives the adaptive code vector output from the adaptive codebook 501 and the adaptive code gain output from the gain codebook 510 and outputs their product to the adder 508. The multiplier 506 receives the noise code vector output from the noise codebook 502 and the prediction gain output from the gain predictor 509 and outputs their product to the multiplier 507. The multiplier 507 receives the noise code vector after prediction gain multiplication output from the multiplier 506 and the noise code gain output from the gain codebook 510 and outputs their product to the adder 508 and the gain predictor 509. The adder 508 receives the adaptive code vector after adaptive code gain multiplication output from the multiplier 505 and the noise code vector after multiplication by the prediction gain and the noise code gain output from the multiplier 507, performs vector addition, and outputs the result to the synthesis filter 512 and the adaptive codebook 501. The gain predictor 509 receives the noise code vector multiplied by the prediction gain and the noise code gain, output from the multiplier 507, and outputs a prediction gain to the multiplier 506. The gain codebook 510 outputs an adaptive code gain to the multiplier 505 and a noise code gain to the multiplier 507 in response to a control signal from the distortion minimizer 514. The linear prediction analyzer 511 receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to the synthesis filter 512. The synthesis filter 512 receives the driving excitation vector output from the adder 508 and the quantized linear prediction coefficients output from the linear prediction analyzer 511 and outputs a synthesized speech signal to the adder 513. The adder 513 receives the input speech signal and the synthesized speech signal output from the synthesis filter 512, performs vector subtraction, and outputs the result to the distortion minimizer 514. The distortion minimizer 514 calculates, from the difference vector output from the adder 513, the distortion of the synthesized speech signal relative to the input speech signal in the perceptually weighted domain or the like, and controls the outputs of the adaptive codebook 501, the noise codebook 502, and the gain codebook 510 so that this distortion is minimized.
[0151]
The operation of the speech coding apparatus configured as described above will be described with reference to the drawings.
[0152]
First, in FIG. 27, the speech signal is input to the linear prediction analyzer 511. The linear prediction analyzer 511 performs linear prediction analysis of the input speech signal and calculates the linear prediction coefficients required by the synthesis filter 512. The calculated coefficients are quantized and then output to the synthesis filter 512, and a code L representing them is transmitted to the decoding device. Next, the synthesis filter 512 is configured using the quantized linear prediction coefficients output from the linear prediction analyzer 511. The distortion minimizer 514 drives this synthesis filter using only the adaptive codebook 501 and selects from the adaptive codebook 501 the adaptive code vector that gives the smallest distortion. The adaptive codebook expresses the periodic component of the speech signal (this period is called the pitch period); normally a vector of one subframe length is cut out starting from the point one pitch period in the past (if the pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector of one subframe length). An index A representing the selected adaptive code vector is transmitted to the decoder side. From this point on, the output of the adaptive codebook 501 is fixed to the adaptive code vector determined here. Subsequently, the synthesis filter 512 is driven using both the adaptive codebook 501 and the noise codebook 502. Since the adaptive code vector has already been determined, only the noise code vector output from the noise codebook 502 is varied, and the distortion minimizer 514 selects the noise code vector that gives the minimum distortion.
Because the noise codebook 502 consists of the algebraic codebook 503 and the noise codebook 504, which in turn consists of an algebraic codebook and two kinds of random codebooks that are not algebraic codebooks, the noise code vector that minimizes distortion is selected from across both codebooks. An index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from the adaptive codebook 501 and the noise code vector output from the noise codebook 502 have been determined. Finally, the distortion minimizer 514 determines the adaptive code gain by which the adaptive code vector is multiplied and the noise code gain by which the noise code vector is multiplied. The gain codebook 510 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of gains minimizes distortion is selected, its adaptive code gain element is output to the multiplier 505, and its noise code gain element is output to the multiplier 507. An index G representing the selected gain vector is transmitted to the decoding device. In the speech coding apparatus according to the present embodiment, to narrow the dynamic range of the noise code gain, the gain predictor 509 predicts the power in the logarithmic domain, and the prediction residual is quantized using the gain codebook. The gain predictor 509 predicts the power of the current noise code vector from the power of noise code vectors generated in the past and outputs it to the multiplier 506 as a prediction gain. The noise code vector output from the noise codebook 502 is multiplied by the prediction gain in the multiplier 506 and by the noise code gain in the multiplier 507.
When selecting the optimum gain vector from the gain codebook 510, the distortion minimizer 514 uses the adaptive code vector, the noise code vector, and the prediction gain that have already been determined, and chooses the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
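The extraction of the adaptive code vector described above, including the repetition used when the pitch period is shorter than the subframe, can be illustrated with a short sketch. It assumes an integer pitch lag and a plain list holding the past driving excitation; the function name and argument names are hypothetical.

```python
def adaptive_code_vector(past_excitation, pitch_lag, subframe_len):
    """Cut one subframe out of the past excitation, starting one pitch period back."""
    if not 1 <= pitch_lag <= len(past_excitation):
        raise ValueError("pitch_lag must lie within the stored excitation history")
    start = len(past_excitation) - pitch_lag
    # When pitch_lag < subframe_len this slice holds only pitch_lag samples.
    segment = past_excitation[start:start + subframe_len]
    # Repeat the segment at the pitch period until one subframe is filled.
    return [segment[i % len(segment)] for i in range(subframe_len)]
```

For example, with a history of `[1, 2, 3, 4, 5, 6]`, a lag of 2, and a subframe length of 5, the last two samples are repeated to give `[5, 6, 5, 6, 5]`.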
[0153]
FIG. 28 shows the configuration of the two-channel noise codebook 504, which is characteristic of the present embodiment. In FIG. 28, 515 is the first channel of the algebraic codebook, 516 is the second channel of the first random codebook, 517 is the first channel of the first random codebook, 518 is the second channel of the algebraic codebook, 519 is the first channel of the second random codebook, and 520 is the second channel of the second random codebook. The adder 521 adds the vector output from the first channel of the algebraic codebook to the vector output from the second channel of the algebraic codebook and outputs the result to the switch 525. The adder 522 adds the vector output from the first channel of the algebraic codebook to the vector output from the second channel of the first random codebook and outputs the result to the switch 525. The adder 523 adds the vector output from the first channel of the first random codebook to the vector output from the second channel of the algebraic codebook and outputs the result to the switch 525. The adder 524 adds the vector output from the first channel of the second random codebook to the vector output from the second channel of the second random codebook and outputs the result to the switch 525. The switch 525 selects and outputs one of the vectors output from the adders 521 to 524. The configuration of the noise codebook shown in FIG. 28 is basically the same as that shown in FIG. 20 of Embodiment 3, except that a separate random codebook is used when random codebooks that are not algebraic codebooks are used for both channels. That is, the random codebook used when one of the two channels is an algebraic codebook and the other is a random codebook differs from the one used when both channels are random codebooks. This is the feature of the present embodiment.
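The passage does not spell out how the transmitted index identifies which of the four channel combinations is in use. One possible reading, shown here purely as an assumption, is that each channel's index space is split between an algebraic part and a random part, so that the pair of channel indices implies the combination without a separate mode flag. The function name, the sizes `n_alg1` and `n_alg2`, and the labels `"random1"` and `"random2"` are hypothetical.

```python
def select_subcodebooks(idx1, idx2, n_alg1, n_alg2):
    """Map two channel indices to (source, entry) pairs for channels 1 and 2.

    Assumption: indices below n_alg1 / n_alg2 address the algebraic part of
    the channel, and the remaining indices address a random codebook.
    """
    ch1_is_algebraic = idx1 < n_alg1
    ch2_is_algebraic = idx2 < n_alg2
    if ch1_is_algebraic and ch2_is_algebraic:
        # Corresponds to adder 521: algebraic + algebraic.
        return ("algebraic", idx1), ("algebraic", idx2)
    if ch1_is_algebraic:
        # Corresponds to adder 522: algebraic + first random codebook.
        return ("algebraic", idx1), ("random1", idx2 - n_alg2)
    if ch2_is_algebraic:
        # Corresponds to adder 523: first random codebook + algebraic.
        return ("random1", idx1 - n_alg1), ("algebraic", idx2)
    # Corresponds to adder 524: both channels random, so the second
    # random codebook is used instead of the first.
    return ("random2", idx1 - n_alg1), ("random2", idx2 - n_alg2)
```

Under this reading, the same index value in a channel's random range selects a different stored vector depending on whether the other channel is algebraic or random, which matches the stated feature of the embodiment.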
[0154]
FIG. 29 shows a flowchart of the search of the noise codebook 502 in the present embodiment. First, the positions of pulses 1 to 3 are initialized. A three-pulse (three-channel) algebraic codebook search is then performed, and if a combination of pulse positions gives a smaller error than the initialized combination, the position of each pulse is updated to that combination. Next, a codebook search is performed for each of four combinations: the two-pulse (two-channel) algebraic codebook (the first and second channels of the algebraic codebook); the first channel of the algebraic codebook with the second channel of the first random codebook; the first channel of the first random codebook with the second channel of the algebraic codebook; and the first channel of the second random codebook with the second channel of the second random codebook. The combination that minimizes the error among these four is determined. Finally, one noise code vector is selected and output from the five kinds of candidates, namely those generated from the four combinations and from the three-pulse algebraic codebook. The three-pulse algebraic codebook search and the two-channel noise codebook search use the same methods as described in Embodiment 3. In FIG. 29 the three-pulse algebraic codebook is searched first and the two-channel noise codebook second, but the search need not be performed in this order.
[0155]
The speech decoding apparatus in the present embodiment will now be described with reference to FIG. 30. FIG. 30 is a block diagram showing an example of a speech decoding apparatus according to Embodiment 5 of the present invention. Reference numeral 551 denotes an adaptive codebook that receives as input the previously generated driving excitation vectors output from the adder 558 and outputs the adaptive code vector specified by the information A transmitted from the encoding device to the multiplier 555. Reference numeral 552 denotes a noise codebook, the same as that of the encoding device with the configuration of FIG. 28 shown in the present embodiment, that extracts the noise code vector specified by the information S transmitted from the encoding device from either the algebraic codebook 553 or the noise codebook 554, which consists of an algebraic codebook and random codebooks that are not algebraic codebooks, and outputs it to the multiplier 556. The multiplier 555 receives the adaptive code gain output from the gain codebook 560 and the adaptive code vector output from the adaptive codebook 551 and outputs their product to the adder 558. The multiplier 556 receives the prediction gain output from the gain predictor 559 and the noise code vector output from the noise codebook 552 and outputs their product to the multiplier 557. The multiplier 557 receives the noise code vector multiplied by the prediction gain, output from the multiplier 556, and the noise code gain output from the gain codebook 560 and outputs their product to the adder 558 and the gain predictor 559. The adder 558 receives the gain-multiplied adaptive code vector output from the multiplier 555 and the gain-multiplied noise code vector output from the multiplier 557, performs vector addition, and outputs the result to the synthesis filter 562 and the adaptive codebook 551. The gain predictor 559 receives the gain-multiplied noise code vector output from the multiplier 557 and outputs a prediction gain to the multiplier 556. The gain codebook 560 outputs the adaptive code gain specified by the information G transmitted from the encoding device to the multiplier 555 and the noise code gain to the multiplier 557. The linear prediction coefficient decoder 561 decodes the information L transmitted from the encoding device, obtains quantized linear prediction coefficients, and outputs them to the synthesis filter 562. The synthesis filter 562 receives the driving excitation vector output from the adder 558 and the quantized linear prediction coefficients output from the linear prediction coefficient decoder 561 and outputs a synthesized speech signal. In general, the decoded speech signal output from the synthesis filter is further subjected to filtering or similar processing to enhance its perceptual quality.
[0156]
The operation of the speech decoding apparatus configured as described above will be described below with reference to FIG. 30. In FIG. 30, information L, A, S, and G is transmitted from the encoding device and input to the linear prediction coefficient decoder 561, the adaptive codebook 551, the noise codebook 552, and the gain codebook 560, respectively. On receiving the information L, the linear prediction coefficient decoder 561 decodes the quantized linear prediction coefficients and outputs them to the synthesis filter 562, which is constructed from these coefficients. On receiving the information A, the adaptive codebook 551 cuts out the adaptive code vector designated by A and outputs it to the multiplier 555. On receiving the information S, the noise codebook 552 generates the noise code vector designated by S from either the algebraic codebook 553 or the noise codebook 554, which consists of an algebraic codebook and two types of random codebooks, and outputs it to the multiplier 556. The multiplier 556 multiplies the noise code vector by the prediction gain output from the gain predictor 559 and outputs the result to the multiplier 557. On receiving the information G, the gain codebook 560 selects and outputs the quantized gains designated by G: the adaptive code gain is output to the multiplier 555 and the noise code gain to the multiplier 557. The multiplier 555 multiplies the adaptive code vector output from the adaptive codebook 551 by the adaptive code gain output from the gain codebook 560 and outputs the result to the adder 558. The multiplier 557 multiplies the noise code gain output from the gain codebook 560 by the noise code vector that was output from the noise codebook 552 and multiplied by the prediction gain in the multiplier 556, and outputs the result to the adder 558.
The noise code vector after this multiplication is also output from the multiplier 557 to the gain predictor 559. The gain predictor 559 predicts the gain (logarithmic power) of the current noise code vector, by MA prediction or the like, from the noise code vectors that the multiplier 557 output in the past, and outputs the prediction to the multiplier 556. The adder 558 adds the adaptive code vector component output from the multiplier 555 and the noise code vector component output from the multiplier 557 to generate the driving excitation signal and outputs it to the synthesis filter 562. The driving excitation vector output from the adder 558 is also fed back to the adaptive codebook 551 and used to update it. The synthesis filter 562 synthesizes a speech signal from the driving excitation output from the adder 558 and outputs it. This output may be used directly as the decoded speech signal, but because its quality is generally insufficient, post-processing such as high-frequency emphasis, pitch emphasis, and formant emphasis is usually applied to improve perceptual quality before the signal is output as the decoded speech signal.
[0157]
As described above, according to Embodiment 5, random codebooks that each store a small number of vectors are switched according to the combination of channels, which makes it possible to improve the performance of the noise codebook as a whole.
[0158]
(Embodiment 6)
FIG. 31 shows a block diagram of a speech coding apparatus according to Embodiment 6 of the present invention. In a two-channel noise codebook that uses an algebraic codebook for one channel and a random codebook for the other, the speech coding apparatus according to Embodiment 6 aligns the reference point of the noise code vectors stored in the random codebook with the position of the pulse output from the algebraic codebook, thereby improving the utilization efficiency of the random codebook and the speech quality. In FIG. 31, reference numeral 601 denotes an adaptive codebook that stores the driving excitation vectors generated in the past, which are input from the adder 608, and outputs an adaptive code vector to the multiplier 605 in response to a control signal from the distortion minimizer 614. Reference numeral 602 denotes a noise codebook that consists of two types of noise codebooks and outputs a noise code vector from one of them in response to a control signal from the distortion minimizer 614. Reference numeral 603 denotes an algebraic codebook that forms part of the noise codebook 602. Reference numeral 604 denotes a noise codebook that forms part of the noise codebook 602, differs in structure from the algebraic codebook 603, and consists of an algebraic codebook, a random codebook that is not an algebraic codebook, and an adaptive random codebook that changes adaptively according to the pulse position of the other channel. The multiplier 605 receives the adaptive code vector output from the adaptive codebook 601 and the adaptive code gain output from the gain codebook 610 and outputs their product to the adder 608. The multiplier 606 receives the noise code vector output from the noise codebook 602 and the prediction gain output from the gain predictor 609 and outputs their product to the multiplier 607. The multiplier 607 receives the noise code vector after prediction gain multiplication output from the multiplier 606 and the noise code gain output from the gain codebook 610 and outputs their product to the adder 608 and the gain predictor 609. The adder 608 receives the adaptive code vector after adaptive code gain multiplication output from the multiplier 605 and the noise code vector after multiplication by the prediction gain and the noise code gain output from the multiplier 607, performs vector addition, and outputs the result to the synthesis filter 612 and the adaptive codebook 601. The gain predictor 609 receives the noise code vector multiplied by the prediction gain and the noise code gain, output from the multiplier 607, and outputs a prediction gain to the multiplier 606. The gain codebook 610 outputs an adaptive code gain to the multiplier 605 and a noise code gain to the multiplier 607 in response to a control signal from the distortion minimizer 614. The linear prediction analyzer 611 receives the input speech signal, performs linear prediction analysis and quantization of the linear prediction coefficients, and outputs the quantized linear prediction coefficients to the synthesis filter 612. The synthesis filter 612 receives the driving excitation vector output from the adder 608 and the quantized linear prediction coefficients output from the linear prediction analyzer 611 and outputs a synthesized speech signal to the adder 613. The adder 613 receives the input speech signal and the synthesized speech signal output from the synthesis filter 612, performs vector subtraction, and outputs the result to the distortion minimizer 614. The distortion minimizer 614 calculates, from the difference vector output from the adder 613, the distortion of the synthesized speech signal relative to the input speech signal in the perceptually weighted domain or the like, and controls the outputs of the adaptive codebook 601, the noise codebook 602, and the gain codebook 610 so that this distortion is minimized.
[0159]
The operation of the speech coding apparatus configured as described above will be described with reference to the drawings.
[0160]
First, in FIG. 31, the speech signal is input to the linear prediction analyzer 611. The linear prediction analyzer 611 performs linear prediction analysis of the input speech signal and calculates the linear prediction coefficients required by the synthesis filter 612. The calculated coefficients are quantized and then output to the synthesis filter 612, and a code L representing them is transmitted to the decoding device. Next, the synthesis filter 612 is configured using the quantized linear prediction coefficients output from the linear prediction analyzer 611. The distortion minimizer 614 drives this synthesis filter using only the adaptive codebook 601 and selects from the adaptive codebook 601 the adaptive code vector that gives the smallest distortion. The adaptive codebook expresses the periodic component of the speech signal (this period is called the pitch period); normally a vector of one subframe length is cut out starting from the point one pitch period in the past (if the pitch period is shorter than one subframe, the extracted vector is repeated at the pitch period to form a vector of one subframe length). An index A representing the selected adaptive code vector is transmitted to the decoder side. From this point on, the output of the adaptive codebook 601 is fixed to the adaptive code vector determined here. Subsequently, the synthesis filter 612 is driven using both the adaptive codebook 601 and the noise codebook 602. Since the adaptive code vector has already been determined, only the noise code vector output from the noise codebook 602 is varied, and the distortion minimizer 614 selects the noise code vector that gives the minimum distortion.
Because the noise codebook 602 consists of the algebraic codebook 603 and the noise codebook 604, which in turn consists of an algebraic codebook and two kinds of random codebooks that are not algebraic codebooks, the noise code vector that minimizes distortion is selected from across both codebooks. An index S representing the selected noise code vector is transmitted to the decoder side. At this point, the adaptive code vector output from the adaptive codebook 601 and the noise code vector output from the noise codebook 602 have been determined. Finally, the distortion minimizer 614 determines the adaptive code gain by which the adaptive code vector is multiplied and the noise code gain by which the noise code vector is multiplied. The gain codebook 610 stores two-dimensional vectors (gain vectors) whose elements are an adaptive code gain and a noise code gain. The gain vector whose combination of gains minimizes distortion is selected, its adaptive code gain element is output to the multiplier 605, and its noise code gain element is output to the multiplier 607. An index G representing the selected gain vector is transmitted to the decoding device.
[0161]
In the speech coding apparatus according to the present embodiment, to narrow the dynamic range of the noise code gain, the gain predictor 609 predicts the power in the logarithmic domain, and the prediction residual is quantized using the gain codebook. The gain predictor 609 predicts the power of the current noise code vector from the power of noise code vectors generated in the past and outputs it to the multiplier 606 as a prediction gain. The noise code vector output from the noise codebook 602 is multiplied by the prediction gain in the multiplier 606 and by the noise code gain in the multiplier 607. When selecting the optimum gain vector from the gain codebook 610, the distortion minimizer 614 uses the adaptive code vector, the noise code vector, and the prediction gain that have already been determined, and chooses the combination of adaptive code gain and noise code gain that minimizes the coding distortion of the synthesized speech signal.
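The gain vector selection described above can be sketched as an exhaustive search over the gain codebook. The sketch assumes that the adaptive and noise code vectors have already been passed through the synthesis filter and that plain squared error stands in for the perceptually weighted distortion; the function name and all argument names are hypothetical.

```python
def search_gain_codebook(target, adaptive_syn, noise_syn, predicted_gain, gain_codebook):
    """Return (index G, squared error) of the gain vector with the least distortion."""
    best_index, best_error = -1, float("inf")
    for index, (adaptive_gain, noise_gain) in enumerate(gain_codebook):
        error = 0.0
        for t, a, n in zip(target, adaptive_syn, noise_syn):
            # The noise contribution is scaled first by the predicted gain
            # and then by the codebook's noise code gain.
            diff = t - adaptive_gain * a - noise_gain * predicted_gain * n
            error += diff * diff
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

Because the predicted gain absorbs most of the level variation, the noise code gains stored in the codebook only need to cover the prediction residual, which is the dynamic-range reduction the paragraph refers to.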
[0162]
FIG. 32 is a block diagram showing the configuration of the noise codebook 604, which is characteristic of the present embodiment. FIG. 32 closely resembles the configuration shown in FIG. 28 for Embodiment 5, but differs in that the first random codebook is an adaptive random codebook and that adaptive processing by adaptors is applied to the vectors output from it. In FIG. 32, 615 is the first channel of the algebraic codebook, 616 is the second channel of the adaptive random codebook, 617 is the first channel of the adaptive random codebook, 618 is the second channel of the algebraic codebook, 619 is the first channel of the random codebook, and 620 is the second channel of the random codebook. The adaptor 621 receives the vector output from the first channel 615 of the algebraic codebook and the vector output from the second channel 616 of the adaptive random codebook, aligns the reference point of the vector stored in the adaptive random codebook with the position of the excitation pulse in the vector output from the first channel of the algebraic codebook, and outputs the resulting vector of the second channel of the adaptive random codebook to the adder 624. The adaptor 622 receives the vector output from the second channel 618 of the algebraic codebook and the vector output from the first channel 617 of the adaptive random codebook, aligns the reference point of the vector stored in the first channel of the adaptive random codebook with the position of the excitation pulse in the vector output from the second channel of the algebraic codebook, and outputs the resulting vector of the first channel of the adaptive random codebook to the adder 625. The adder 623 adds the vectors output from the first channel 615 and the second channel 618 of the algebraic codebook and outputs the result to the switch 627. The adder 624 adds the vector output from the first channel 615 of the algebraic codebook and the vector output from the adaptor 621 and outputs the result to the switch 627. The adder 625 adds the vector output from the adaptor 622 and the vector output from the second channel 618 of the algebraic codebook and outputs the result to the switch 627. The adder 626 adds the vectors output from the first channel and the second channel of the random codebook and outputs the result to the switch 627. The switch 627 selects and outputs one of the vectors output from the adders 623 to 626. Note that the first channel and the second channel of the algebraic codebook in the present embodiment each output a vector containing one excitation pulse. When pitch periodization or processing according to pitch periodicity (pitch emphasis filtering, etc.) is applied to the algebraic codebook, the vector output from the first or second channel of the algebraic codebook may contain a plurality of excitation pulses; in that case, the adaptors 621 and 622 align the reference point of the random codebook vector with the position of the leading pulse among the excitation pulses in the vector. FIG. 33 schematically shows the processing performed by the adaptors 621 and 622. In FIG. 33, 628 is the first noise code vector, output from channel 1 of the algebraic codebook; 629 is the second noise code vector, output from channel 2 of the adaptive random codebook after adaptive processing by the adaptor 621; 630 is a noise code vector stored in channel 2 of the adaptive random codebook; and 631 is the final noise code vector, obtained by vector addition of the first and second noise code vectors. As shown in FIG. 33, the second noise code vector is the vector 630 stored in the adaptive random codebook, shifted by the adaptor so that its reference point coincides with the position of the excitation pulse of the first noise code vector. The same applies when the second channel of the algebraic codebook and the first channel of the adaptive random codebook are used in combination.
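The shift performed by the adaptors 621 and 622 can be illustrated with a minimal sketch. It assumes that the reference point is given as a sample index within the stored vector and that samples shifted outside the subframe are discarded while vacated positions are filled with zeros; the text does not state how the edges are handled, so this is an assumption, as are the function and parameter names.

```python
def align_to_pulse(stored_vec, reference_point, pulse_pos, subframe_len):
    """Shift a stored random code vector so its reference point lands on pulse_pos."""
    shift = pulse_pos - reference_point
    shifted = [0.0] * subframe_len
    for i, value in enumerate(stored_vec):
        j = i + shift
        # Samples pushed outside the subframe are dropped (assumed behaviour).
        if 0 <= j < subframe_len:
            shifted[j] = value
    return shifted
```

With a stored vector `[1.0, 2.0, 3.0]`, reference point 0, and a pulse at position 2 in a five-sample subframe, the shifted vector is `[0.0, 0.0, 1.0, 2.0, 3.0]`; adding it to the single-pulse vector from the algebraic codebook would give the final noise code vector 631.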
[0163]
FIG. 29 and FIG. 34 show flowcharts of the noise codebook search method in the present embodiment. FIG. 29 is the flowchart shown in the fifth embodiment. The sixth embodiment differs in that an adaptive random codebook is used as the first random codebook and a random codebook is used as the second random codebook. The parts of the processing that differ are the search of (first channel of the algebraic codebook + second channel of the first random codebook (adaptive random codebook)) and the search of (first channel of the first random codebook (adaptive random codebook) + second channel of the algebraic codebook). In these searches, the first random codebook is an adaptive random codebook, so adaptive processing is applied. This is shown in the flowchart of FIG. 34.
[0164]
FIG. 34 is a diagram showing the search portion for (first channel of the algebraic codebook + second channel of the adaptive random codebook). The same procedure is used for the search of (second channel of the algebraic codebook + first channel of the adaptive random codebook). In FIG. 34, the first channel of the algebraic codebook is searched first. This search is performed by maximizing equation (1). Several candidates that give large values of equation (1) may be retained at this point, but in this example only the single candidate that maximizes equation (1) is selected in order to reduce the amount of calculation. This candidate is the first noise code vector. Next, shift processing is performed to match the reference points of all the vectors stored in the second channel of the adaptive random codebook to the position of the excitation pulse output from the first channel of the selected algebraic codebook. Next, the search for the second noise code vector is performed using the shifted vectors. Since the output from the first channel of the algebraic codebook has already been determined, the vector that minimizes the coding distortion in combination with the first channel of the algebraic codebook is selected from the second channel of the adaptive random codebook. The selected (shifted) vector of the second channel of the adaptive random codebook is determined as the second noise code vector. Then, the vector obtained by adding the first noise code vector and the second noise code vector is output as the final noise code vector.
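The three steps of FIG. 34 can be sketched as below. Equation (1) does not appear in this part of the text, so the sketch assumes it has the usual CELP form, the squared correlation between the target and the filtered code vector divided by the energy of the filtered code vector. The impulse response `h`, the unit-amplitude positive pulses, the zero-filled shift, and all function names are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def synthesize(h, code):
    """Zero-state filtering of code by impulse response h, truncated to the subframe."""
    return [sum(h[k] * code[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(code))]


def criterion(target, h, code):
    """Assumed form of equation (1): (x . Hc)^2 / |Hc|^2."""
    y = synthesize(h, code)
    energy = dot(y, y)
    return dot(target, y) ** 2 / energy if energy > 0.0 else 0.0


def shift(vector, offset):
    """Move the reference point (index 0) to offset, zero-filling vacated samples."""
    out = [0.0] * len(vector)
    for n, sample in enumerate(vector):
        if 0 <= n + offset < len(vector):
            out[n + offset] = sample
    return out


def search(target, h, pulse_positions, stored_vectors):
    length = len(target)

    def pulse(position):
        vec = [0.0] * length
        vec[position] = 1.0
        return vec

    # Step 1: keep only the single pulse position that maximizes the criterion.
    best_position = max(pulse_positions,
                        key=lambda p: criterion(target, h, pulse(p)))
    first = pulse(best_position)

    # Step 2: shift every stored vector to the selected pulse position.
    shifted = [shift(v, best_position) for v in stored_vectors]

    # Step 3: with the first vector fixed, pick the shifted vector whose sum
    # with it scores best (least distortion under one shared gain).
    def combined(i):
        return [a + b for a, b in zip(first, shifted[i])]

    best_index = max(range(len(shifted)),
                     key=lambda i: criterion(target, h, combined(i)))
    return best_position, best_index, combined(best_index)


# Toy example with an identity synthesis filter and two stored vectors.
h = [1.0]
target = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -0.5, 0.0]
stored = [[0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]]
position, index, final = search(target, h, [0, 2, 4, 6], stored)
```

Because the pulse position is fixed before the second channel is examined, the cost grows with the sum of the two codebook sizes rather than their product. The search is then no longer exhaustive, which is the trade-off the text accepts when it keeps a single candidate.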
[0165]
FIG. 35 shows an example of a decoding apparatus that receives the information A (adaptive codebook index), S (noise codebook index), G (gain codebook index), and L (linear prediction coefficient encoding information) transmitted by the speech coding apparatus according to the embodiment of the present invention, and performs decoding processing.
[0166]
FIG. 35 is a block diagram showing an example of a speech decoding apparatus according to Embodiment 6 of the present invention. Reference numeral 651 denotes an adaptive codebook that receives as input the previously generated driving excitation vector output from the adder 658, and outputs the adaptive code vector specified by the information A transmitted from the encoding apparatus to the multiplier 655. 652 is a noise codebook, identical to that of the encoding apparatus and having the configuration of FIGS. 32 and 33 shown in the present embodiment, that extracts the noise code vector specified by the information S transmitted from the encoding device from the algebraic codebook 653 or from the noise codebook 654, which is composed of an algebraic codebook and two types of random codebooks that are not algebraic codebooks, and outputs it to the multiplier 656. 655 is a multiplier that receives the adaptive code gain output from the gain codebook 660 and the adaptive code vector output from the adaptive codebook 651 as inputs, multiplies them, and outputs the result to the adder 658. 656 is a multiplier that receives the prediction gain output from the gain predictor 659 and the noise code vector output from the noise codebook 652 as inputs, multiplies them, and outputs the result to the multiplier 657. 657 is a multiplier that receives as inputs the noise code vector multiplied by the prediction gain, output from the multiplier 656, and the noise code gain output from the gain codebook 660, multiplies them, and outputs the result to the adder 658 and the gain predictor 659. 658 is an adder that receives the gain-multiplied adaptive code vector output from the multiplier 655 and the gain-multiplied noise code vector output from the multiplier 657 as inputs, adds them, and outputs the result to the synthesis filter 662 and the adaptive codebook 651. 659 is a gain predictor that takes the noise code vector output from the multiplier 657 as input and outputs a prediction gain to the multiplier 656. 660 is a gain codebook that outputs the adaptive code gain specified by the information G transmitted from the encoding device to the multiplier 655 and the noise code gain to the multiplier 657. 661 is a linear prediction coefficient decoder that performs decoding processing on the information L transmitted from the encoding device, obtains the quantized linear prediction coefficients, and outputs them to the synthesis filter 662. 662 is a synthesis filter that receives the driving excitation vector output from the adder 658 and the quantized linear prediction coefficients output from the linear prediction coefficient decoder 661, and outputs a synthesized speech signal. In general, the decoded speech signal output from the synthesis filter is further subjected to filter processing or the like for enhancing auditory quality.
[0167]
The operation of the speech decoding apparatus configured as described above will be described below with reference to FIG. 35. In FIG. 35, the information L, A, S, and G is transmitted from the encoding device and input to the linear prediction coefficient decoder 661, the adaptive codebook 651, the noise codebook 652, and the gain codebook 660, respectively. The linear prediction coefficient decoder 661, having received the information L, decodes the quantized linear prediction coefficients and outputs them to the synthesis filter 662. The synthesis filter 662 is constructed using the quantized linear prediction coefficients. The adaptive codebook 651, having received the information A, cuts out the adaptive code vector designated by A from the adaptive codebook and outputs it to the multiplier 655. The noise codebook 652, having received the information S, generates the noise code vector designated by S from the algebraic codebook 653 or from the noise codebook 654, which includes the algebraic codebook, the random codebook, and the adaptive random codebook, and outputs it to the multiplier 656. The multiplier 656 multiplies the noise code vector by the prediction gain output from the gain predictor 659 and outputs the result to the multiplier 657. The gain codebook 660, having received the information G, selects the quantized gains designated by G and outputs the adaptive code gain to the multiplier 655 and the noise code gain to the multiplier 657. The multiplier 655 multiplies the adaptive code vector output from the adaptive codebook 651 by the adaptive code gain output from the gain codebook 660 and outputs the result to the adder 658. The multiplier 657 multiplies the noise code gain output from the gain codebook 660 by the noise code vector that was output from the noise codebook 652 and multiplied by the prediction gain in the multiplier 656, and outputs the result to the adder 658. The gain-multiplied noise code vector output from the multiplier 657 is also output to the gain predictor 659. The gain predictor 659 predicts the gain (logarithmic power) of the current noise code vector, by MA (moving average) prediction or the like, from the noise code vectors output from the multiplier 657 in the past, and outputs the predicted gain to the multiplier 656. The adder 658 adds the adaptive code vector component of the driving excitation signal output from the multiplier 655 and the noise code vector component of the driving excitation signal output from the multiplier 657 to generate the driving excitation signal, and outputs it to the synthesis filter 662. The driving excitation vector output from the adder 658 is also output to the adaptive codebook 651 and used for updating the adaptive codebook. The synthesis filter 662 synthesizes a speech signal from the driving excitation output from the adder 658 and outputs it. The synthesized signal may be output as the decoded speech signal as it is, but since its quality is generally insufficient, post-processing such as high frequency emphasis, pitch emphasis, and formant emphasis is performed to improve the auditory quality before the signal is output as the decoded speech signal.
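The signal flow through the multipliers 655 to 657, the adder 658, and the synthesis filter 662 can be sketched for one subframe as follows. This is an illustration under assumptions, not the decoder of the embodiment: codebook lookup, gain prediction, and post-processing are left out and their results are passed in as arguments, and the synthesis filter is assumed to be an all-pole filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function and parameter names are hypothetical.

```python
def decode_subframe(adaptive_vec, noise_vec, adaptive_gain, noise_gain,
                    predicted_gain, lpc, filter_state):
    """Return (excitation, speech, new_filter_state) for one subframe.

    lpc          -- [a_1, ..., a_p], assumed convention A(z) = 1 + sum a_k z^-k
    filter_state -- previous p output samples, most recent first
    """
    # Multiplier 655: adaptive code vector times adaptive code gain.
    adaptive_part = [adaptive_gain * v for v in adaptive_vec]
    # Multipliers 656 and 657: noise code vector times prediction gain,
    # then times the noise code gain from the gain codebook.
    noise_part = [noise_gain * predicted_gain * v for v in noise_vec]
    # Adder 658: driving excitation vector.
    excitation = [a + b for a, b in zip(adaptive_part, noise_part)]

    # Synthesis filter 662: s[n] = e[n] - sum_k a_k * s[n - k].
    history = list(filter_state)
    speech = []
    for e in excitation:
        s = e - sum(a * past for a, past in zip(lpc, history))
        speech.append(s)
        history = ([s] + history)[:len(lpc)]
    return excitation, speech, history


# Toy values chosen so that every result is exactly representable.
excitation, speech, state = decode_subframe(
    adaptive_vec=[1.0, 0.0, 0.0, 0.0],
    noise_vec=[0.0, 1.0, 0.0, 0.0],
    adaptive_gain=0.5,
    noise_gain=2.0,
    predicted_gain=0.25,
    lpc=[-0.5],
    filter_state=[0.0],
)

# The excitation is also fed back to update the adaptive codebook,
# modeled here as a bounded buffer of past excitation samples.
past_excitation = [0.0] * 8
past_excitation = (past_excitation + excitation)[-8:]
```

Returning the filter state lets the caller carry the synthesis filter memory from one subframe to the next.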
[0168]
As described above, according to the sixth embodiment, in a noise codebook composed of two channels in which an algebraic codebook is used for one channel and a random codebook for the other, the reference point of the noise code vector stored in the random codebook is aligned with the position of the pulse output from the algebraic codebook. This improves the use efficiency of the random codebook and improves speech quality.
[0169]
(Embodiment 7)
FIG. 36 is a block diagram showing an audio signal transmitter and receiver including the speech encoding or decoding device according to any of Embodiments 1 to 6 of the present invention. In FIG. 36, reference numeral 701 denotes an audio signal input device, such as a microphone, that converts an audio signal into an electrical signal and outputs it to the A/D converter 702. 702 is an A/D converter that converts the analog audio signal output from the audio signal input device into a digital signal and outputs it to the speech encoder 703. 703 is a speech encoder that performs speech encoding by the speech encoding apparatus according to any one of Embodiments 1 to 6 of the present invention and outputs the result to the RF modulator 704. 704 is an RF modulator that converts the speech information encoded by the speech encoder 703 into a signal to be transmitted on a propagation medium such as a radio wave, and outputs the signal to the transmission antenna 705. 705 is a transmission antenna that transmits the transmission signal output from the RF modulator 704 as a radio wave, and 706 is the radio wave transmitted from the transmission antenna 705. Reference numeral 707 denotes a transmission apparatus including the A/D converter 702, the speech encoder 703, and the RF modulator 704 as constituent elements. Further, reference numeral 708 denotes a receiving antenna that receives the radio wave 706 and outputs it to the RF demodulator 711. 711 is an RF demodulator that converts the received signal input from the receiving antenna 708 into an encoded speech signal and outputs it to the speech decoder 712. 712 is a speech decoder that receives the encoded speech signal output from the RF demodulator, performs decoding processing by the speech decoding apparatus shown in Embodiment 6 of the present invention, and outputs the decoded speech signal to the D/A converter 713. 713 is a D/A converter that receives the decoded speech signal from the speech decoder 712, converts it into an analog speech signal, and outputs it to the speech output device 710. 710 is a speech output device, such as a speaker, that receives the analog speech signal from the D/A converter and outputs speech. Reference numeral 709 denotes a receiving apparatus including the RF demodulator 711, the speech decoder 712, and the D/A converter 713 as constituent elements.
[0170]
The operation of the audio signal transmitter and receiver configured as described above will be described with reference to FIG. 36. First, speech is converted into an electrical analog signal by the audio signal input device 701 and output to the A/D converter 702. The analog speech signal is then converted into a digital speech signal by the A/D converter 702 and output to the speech encoder 703. The speech encoder 703 performs speech encoding processing and outputs the encoded information to the RF modulator 704. The RF modulator 704 performs operations such as modulation, amplification, and code spreading for transmitting the encoded speech information as a radio wave, and outputs the result to the transmission antenna 705. Finally, the radio wave 706 is transmitted from the transmission antenna 705. In the receiver, the radio wave 706 is received by the receiving antenna 708, and the received signal is sent to the RF demodulator 711. The RF demodulator 711 performs processing such as code despreading and demodulation to convert the radio signal into encoded information, and outputs the encoded information to the speech decoder 712. The speech decoder 712 performs decoding processing on the encoded information and outputs a digital decoded speech signal to the D/A converter 713. The D/A converter 713 converts the digital decoded speech signal output from the speech decoder 712 into an analog decoded speech signal. Finally, the speech output device 710 converts the electrical analog decoded speech signal into decoded speech and outputs it.
[0171]
The transmission device and the reception device can be used as a mobile device or a base station device of a mobile communication system such as a mobile phone system. Note that the medium for transmitting information is not limited to the radio wave shown in this embodiment; an optical signal or the like can be used, and a wired transmission path can also be used.
[0172]
Note that the speech encoding apparatus or decoding apparatus shown in the first to sixth embodiments and the transmitting apparatus and the receiving apparatus shown in the seventh embodiment can also be realized as software recorded on a recording medium such as a magnetic disk, a magneto-optical disk, or a ROM cartridge. By using such a recording medium, the speech encoding device and decoding device and the transmitting device and receiving device can be realized on a personal computer or the like.
[0173]
[Effects of the invention]
As described above in detail, according to the present invention, the position of one pulse of the algebraic codebook is represented by using, in addition to the bits assigned to that pulse, bit information assigned to at least one other pulse. As a result, the search range of the excitation pulse can be expanded to twice its length or more without increasing the number of bits.
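A small sketch can make the bit accounting concrete. It assumes 3 bits per pulse and a purely illustrative selection rule in which the least significant bit of the other pulse's index chooses between two candidate positions; the actual candidate tables of the embodiments are not reproduced here. The names `conventional_position` and `extended_position` are hypothetical.

```python
BITS_PER_PULSE = 3
NUM_INDEXES = 1 << BITS_PER_PULSE  # 8 index values per pulse


def conventional_position(index):
    """One candidate position per index value: 8 reachable positions."""
    return index


def extended_position(own_index, other_index):
    """Two candidate positions per own_index; the other pulse's index picks one.

    Hypothetical rule: the least significant bit of the other channel's
    index selects the lower (0) or upper (1) half of the search range.
    """
    return own_index + NUM_INDEXES * (other_index & 1)


# Positions reachable by one pulse with the same 3 bits in both schemes.
conventional = {conventional_position(i) for i in range(NUM_INDEXES)}
extended = {extended_position(i, j)
            for i in range(NUM_INDEXES)
            for j in range(NUM_INDEXES)}
```

In this sketch the pulse still consumes only 3 bits, yet it can reach 16 positions instead of 8. The cost is that the position of one pulse depends on the index chosen for the other, so the two pulses have to be searched jointly.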
[0174]
The present invention is also configured to have a plurality of types of algebraic codebooks. An algebraic codebook having a large number of pulses but an insufficient number of bits can thereby be used effectively to improve the quality of speech with a short pitch period, while an algebraic codebook having a small number of pulses and a sufficient number of bits improves the quality of voiced onset portions and the like.
[0175]
The present invention can also improve speech quality by using both an algebraic codebook and a random noise codebook.
[0176]
In a configuration having a plurality of types of algebraic codebooks, the present invention can also improve speech quality in a mode with a small number of pulses and an insufficient number of bits, by adaptively changing a part of the algebraic codebook using the pitch peak position obtained from the adaptive code vector.
[0177]
The present invention also enables mode switching without independent mode information, by switching the codebook to be used according to the combination of the channels in a noise codebook composed of two or more channels. In addition, making a part of each channel an algebraic codebook reduces the amount of calculation and the amount of memory.
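One way to picture mode switching without mode bits is sketched below. It assumes, purely for illustration, that each channel has a 5-bit index space whose lower half addresses algebraic entries and whose upper half addresses random entries, so that the pair of indexes already identifies the combination in use. The sizes and the names `codebook_of` and `mode_of` are hypothetical and are not taken from the embodiments.

```python
CHANNEL_SIZE = 32    # hypothetical: 5-bit index per channel
ALGEBRAIC_SIZE = 16  # hypothetical: entries 0..15 of each channel are algebraic


def codebook_of(index):
    """Return which kind of codebook a channel index addresses."""
    if not 0 <= index < CHANNEL_SIZE:
        raise ValueError("index out of range")
    return "algebraic" if index < ALGEBRAIC_SIZE else "random"


def mode_of(index1, index2):
    """The pair of channel indexes implies the mode; no mode bit is sent."""
    return codebook_of(index1), codebook_of(index2)


# Every combination of the two channels is reachable from the indexes alone.
modes = {mode_of(i, j)
         for i in range(CHANNEL_SIZE)
         for j in range(CHANNEL_SIZE)}
```

The decoder applies the same rule to the received indexes, so both sides agree on the mode without extra side information.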
[0178]
In addition, when a noise codebook composed of two channels uses an algebraic codebook for one channel and a random noise codebook for the other channel, the present invention aligns the reference point of the noise code vector stored in the random noise codebook with the position of the pulse output from the algebraic codebook. This improves the use efficiency of the random noise codebook and improves speech quality.
[0179]
The present invention can also realize a transmitting apparatus or receiving apparatus that provides higher speech quality by including the above-described speech encoding device or decoding device as its speech encoder or decoder.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an algebraic codebook searcher according to Embodiment 1 of the present invention.
FIG. 2 is a schematic diagram showing a pulse search position of an algebraic codebook according to Embodiment 1 of the present invention.
FIG. 3 is a table showing the contents of an algebraic codebook in Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing an algebraic codebook search method according to Embodiment 1 of the present invention.
FIG. 5 is a program example showing an algebraic codebook search method according to Embodiment 1 of the present invention.
FIG. 6 is a block diagram showing the configuration of a speech encoding apparatus according to Embodiment 1 of the present invention.
FIG. 7 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
FIG. 8 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 2 of the present invention.
FIG. 9 is a flowchart showing a noise codebook search method according to Embodiment 2 of the present invention.
FIG. 10 is a schematic diagram showing a pulse search position of an algebraic codebook according to Embodiment 2 of the present invention.
FIG. 11 is a table showing the contents of an algebraic codebook in Embodiment 2 of the present invention.
FIG. 12 is a block diagram showing the configuration of a speech decoding apparatus according to Embodiment 2 of the present invention.
FIG. 13 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 3 of the present invention.
FIG. 14 is a flowchart showing a noise codebook search method according to Embodiment 3 of the present invention.
FIG. 15 is a flowchart showing a 2-pulse algebraic codebook search method according to Embodiment 3 of the present invention.
FIG. 16 is a flowchart showing a combination search method for the first channel of the algebraic codebook and the second channel of the random codebook in Embodiment 3 of the present invention.
FIG. 17 is a flowchart showing a combination search method for the first channel of the random codebook and the second channel of the algebraic codebook in Embodiment 3 of the present invention.
FIG. 18 is a flowchart showing a combination search method for the first channel of the random codebook and the second channel of the random codebook in Embodiment 3 of the present invention.
FIG. 19 is a table showing the contents of an algebraic codebook in Embodiment 3 of the present invention.
FIG. 20 is a block diagram showing a configuration of a noise codebook according to Embodiment 3 of the present invention.
FIG. 21 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 3 of the present invention.
FIG. 22 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 4 of the present invention.
FIG. 23 is a block diagram showing a configuration of a phase adaptive algebraic codebook in Embodiment 4 of the present invention.
FIG. 24 is a schematic diagram showing a pulse search position of a phase adaptive algebraic codebook according to Embodiment 4 of the present invention.
FIG. 25 is a flowchart showing a noise codebook search method in Embodiment 4 of the present invention.
FIG. 26 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 4 of the present invention.
FIG. 27 is a block diagram showing a configuration of a speech coding apparatus according to Embodiment 5 of the present invention.
FIG. 28 is a block diagram showing a configuration of a two-channel noise codebook according to Embodiment 5 of the present invention.
FIG. 29 is a flowchart showing a noise codebook search method according to Embodiment 5 of the present invention.
FIG. 30 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 5 of the present invention.
FIG. 31 is a block diagram showing a configuration of a speech coding apparatus according to Embodiment 6 of the present invention.
FIG. 32 is a block diagram showing a configuration of a two-channel noise codebook according to Embodiment 6 of the present invention.
FIG. 33 is a schematic diagram illustrating the principle of an adaptive random codebook according to Embodiment 6 of the present invention.
FIG. 34 is a flowchart showing a method for searching for a combination of the first channel of the algebraic codebook and the second channel of the adaptive random codebook in Embodiment 6 of the present invention.
FIG. 35 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 6 of the present invention.
FIG. 36 is a block diagram showing a configuration of a transmission apparatus and a reception apparatus in Embodiment 7 of the present invention.
FIG. 37 is a block diagram showing a configuration of a CELP speech encoding apparatus using a conventional algebraic codebook.
FIG. 38 is a schematic diagram showing a pulse search position of a conventional algebraic codebook.
FIG. 39 is a table showing the contents of a conventional algebraic codebook.
FIG. 40 is a flowchart showing a conventional algebraic codebook search method.
FIG. 41 is a program showing a conventional algebraic codebook search method.
FIG. 42 is a block diagram showing a configuration of a conventional algebraic codebook searcher.
FIG. 43 is a schematic diagram showing a pulse search position of a conventional algebraic codebook.
FIG. 44 is a table showing the contents of a conventional algebraic codebook.
[Explanation of symbols]
101 Pulse 1 index generator
102 Pulse 2 index generator
103 Pulse 3 index generator
104 First index / pulse position converter
105 Second index / pulse position converter
106 Third index / pulse position converter
107 Distortion evaluation function calculator
108 Distortion minimizer
151 Adaptive codebook
152, 172 Adjacent channel dependent algebraic codebook
153-155 Multiplier
156 Adder
157 Gain prediction analyzer
158 Gain codebook
159 Linear predictor
160 Synthesis filter
161 Adder
162 Distortion minimizer
203 First algebraic codebook
204 Second algebraic codebook
253 First algebraic codebook
254 Second algebraic codebook
304 Noise codebook composed of an algebraic codebook and a random codebook that is not an algebraic codebook
354 Noise codebook composed of an algebraic codebook and a random codebook
403 Algebraic codebook, part of which is phase adaptive
404 Noise codebook composed of an algebraic codebook and a random codebook that is not an algebraic codebook
453 Algebraic codebook, part of which is a phase adaptive algebraic codebook
454 Noise codebook composed of an algebraic codebook and a random codebook
554 Noise codebook composed of an algebraic codebook and two kinds of random codebook
604 Noise codebook composed of an algebraic codebook, a random codebook that is not an algebraic codebook, and an adaptive random codebook
654 Noise codebook composed of an algebraic codebook, an adaptive random codebook, and a random codebook

Claims

An excitation signal encoding apparatus that specifies one pulse for each channel from an algebraic codebook composed of a plurality of channels and generates a noise code vector by combining the specified pulses, the apparatus comprising:
index generation means for selecting one of the pulse search positions predetermined for each channel and generating an index indicating the selected pulse search position; and
specifying means for specifying, for each channel, one pulse from among a plurality of pulse candidates corresponding to the pulse search position indicated by the index for the target channel, based on the index of a channel different from the target channel.

An excitation signal encoding method for specifying one pulse for each channel from an algebraic codebook composed of a plurality of channels and generating a noise code vector by combining the specified pulses, the method comprising:
a step of selecting one of the pulse search positions predetermined for each channel and generating an index indicating the selected pulse search position; and
a step of specifying, for each channel, one pulse from among a plurality of pulse candidates corresponding to the pulse search position indicated by the index for the target channel, based on the index of a channel different from the target channel.

An excitation signal decoding apparatus that comprises the same algebraic codebook as the excitation signal encoding apparatus according to claim 1 and generates a noise code vector using the indexes output from the excitation signal encoding apparatus, the decoding apparatus comprising:
specifying means for specifying, for each channel, one pulse from among a plurality of pulse candidates corresponding to the pulse search position indicated by the index for the target channel, based on the index of a channel different from the target channel.

An excitation signal decoding method for generating a noise code vector from the same algebraic codebook as that used in the excitation signal encoding method according to claim 2, using the indexes generated by the excitation signal encoding method, the decoding method comprising:
a step of specifying, for each channel, one pulse from among a plurality of pulse candidates corresponding to the pulse search position indicated by the index for the target channel, based on the index of a channel different from the target channel.

A computer-readable recording medium on which is recorded a program for causing a computer to execute an excitation signal encoding method that specifies one pulse for each channel from an algebraic codebook composed of a plurality of channels and generates a noise code vector by combining the specified pulses, the program comprising:
a procedure for selecting one of the pulse search positions predetermined for each channel and generating an index indicating the selected pulse search position; and
a procedure for specifying, for each channel, one pulse from among a plurality of pulse candidates corresponding to the pulse search position indicated by the index for the target channel, based on the index of a channel different from the target channel.