JPH04502675A

JPH04502675A - Digital speech coder with improved long-term predictor

Info

Publication number: JPH04502675A
Application number: JP2509641A
Authority: JP
Inventors: Ira Alan Gerson; Mark A. Jasiuk
Original assignee: Motorola, Inc.
Priority date: 1989-09-01
Filing date: 1990-06-25
Publication date: 1992-05-14
Anticipated expiration: 2017-03-25
Also published as: JP3268360B2; ES2145737T5; DE69033510D1; DK0450064T3; CN1050633A; AU634795B2; CN1026274C; EP0450064B1; DK0450064T4; CA2037899C; ATE191987T1; AU5952590A; CA2037899A1; DE69033510T3; EP0450064A4; EP0450064B2; MX167644B; DE69033510T2; SG47028A1; EP0450064A1

Abstract

A digital speech coder includes a long-term filter (124) having an improved long-term predictor that allows subsample resolution for the lag parameter L. A frame of N samples of input speech vector s(n) is applied to an adder (510). The output of the adder (510) produces the output vector b(n) for the long-term filter (124). The output vector b(n) is fed back to a delayed vector generator block (530) of the long-term predictor. The nominal long-term predictor lag parameter L is also input to the delayed vector generator block (530). The long-term predictor lag parameter L can take on non-integer values, which may be multiples of one half, one third, one fourth, or any other rational fraction. The delayed vector generator (530) includes a memory which holds past samples of b(n). In addition, interpolated samples of b(n) are calculated by the delayed vector generator (530) and stored in its memory, at least one interpolated sample being calculated and stored between each pair of adjacent past samples of b(n). The delayed vector generator (530) provides output vector q(n) to the long-term multiplier block (520), which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to the adder (510) to complete the feedback loop of the recursive filter (124).

Description

[Detailed Description of the Invention] Digital Speech Coder with Improved Long-Term Predictor

Background of the Invention

This application is a continuation of U.S. Application Serial No. 07/402,206, filed September 1, 1989 and now abandoned, which is a continuation-in-part of U.S. Application Serial No. 07/212,455, filed June 28, 1988 and now abandoned.

Code-excited linear prediction (CELP) is a speech coding technique with the potential to produce high-quality synthesized speech at low bit rates, i.e., 4.8 to 9.6 kilobits per second (kbps). This class of speech coding, also known as vector-excited linear prediction or stochastic coding, is used in numerous speech communication and speech synthesis applications.

CELP has proven particularly applicable to digital voice encryption and digital radiotelephone communication systems, in which speech quality, data rate, size, and cost are significant factors.

The term "code-excited" or "vector-excited" derives from the fact that the excitation sequence for the speech coder is vector quantized, i.e., a single codeword is used to represent a sequence, or vector, of excitation samples. In this way, data rates of less than one bit per sample become possible for coding the excitation sequence. The stored excitation code vectors generally consist of independent random white Gaussian sequences. One code vector from the codebook is used to represent each block of N excitation samples.

Each stored code vector is represented by a codeword, i.e., the address of the code vector's memory location. It is this codeword that is subsequently sent over a communications channel to the speech synthesizer to reconstruct the speech frame at the receiver. For a more detailed explanation of CELP, see M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, pp. 937-40, March 1985.

In a CELP speech coder, the excitation code vector from the codebook is applied to two time-varying linear filters that model the characteristics of the input speech signal. The first filter includes a long-term predictor in its feedback loop, which has a long delay, i.e., 2 to 15 milliseconds, and is used to introduce the pitch periodicity of voiced speech. The second filter includes a short-term predictor in its feedback loop, which has a short delay, i.e., less than 2 milliseconds, and is used to introduce a spectral envelope or formant structure. For each frame of speech, the speech coder applies each individual code vector to the filters to generate a reconstructed speech signal, and compares the original input speech signal with the reconstructed signal to create an error signal. The error signal is then weighted by passing it through a weighting filter whose response is based on human auditory perception. The optimum excitation signal is determined by selecting the code vector that produces the weighted error signal with the minimum energy for the current frame. The codeword for the optimum code vector is then transmitted over the communications channel.

In a CELP speech synthesizer, the codeword received from the channel is used to address the codebook of excitation vectors. The single code vector is then multiplied by a gain factor and filtered by the long-term and short-term filters to obtain a reconstructed speech vector. The gain factor and the predictor parameters are also obtained from the channel. It has been found that a better-quality synthesized signal can be generated if the actual parameters used by the synthesizer are also used in the analysis stage, thereby minimizing quantization errors. Accordingly, the use of these synthesis parameters in the CELP speech analysis stage to produce higher-quality speech is referred to as analysis-by-synthesis speech coding.

The short-term predictor attempts to predict the current output sample s(n) by a linear combination of the immediately preceding output samples s(n−i), according to the equation:

$$s(n) = \alpha_1 s(n-1) + \alpha_2 s(n-2) + \cdots + \alpha_p s(n-p) + e(n)$$

where p is the order of the short-term predictor and e(n) is the prediction residual, i.e., the portion of s(n) that cannot be represented by the weighted sum of the p previous samples. The predictor order p typically ranges from 8 to 12, assuming a sampling rate of 8 kilohertz (kHz). The weights α_1, α_2, ..., α_p in this equation are called the predictor coefficients. The short-term predictor coefficients are determined from the speech signal using conventional linear predictive coding (LPC) techniques. The output response of the short-term filter may be expressed in z-transform notation as:

$$A(z) = \frac{1}{1 - \sum_{i=1}^{p} \alpha_i z^{-i}}$$

For further description of the short-term filter parameters, see B. S. Atal, "Predictive Coding of Speech at Low Bit Rates," IEEE Transactions on Communications, Vol. COM-30, pp. 600-14, April 1982.
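The short-term prediction recursion can be illustrated with a minimal Python sketch. The function name `short_term_synthesis`, the zero initial state, and the example coefficient are assumptions made for illustration; they are not taken from the patent.

```python
def short_term_synthesis(e, alpha):
    """All-pole recursion s(n) = sum_i alpha_i * s(n - i) + e(n).

    e     -- residual (excitation) samples e(0..len-1)
    alpha -- predictor coefficients [alpha_1, ..., alpha_p]
    The filter state is assumed to be zero before n = 0.
    """
    p = len(alpha)
    s = []
    for n, e_n in enumerate(e):
        # Weighted sum of the p previous output samples that exist so far.
        pred = sum(alpha[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
        s.append(pred + e_n)
    return s


# Impulse response of a first-order predictor with alpha_1 = 0.5.
print(short_term_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

With p between 8 and 12, as stated in the text, the same loop applies unchanged; only the length of `alpha` grows.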

The long-term filter, on the other hand, must predict the next output sample from preceding samples that extend over a much longer time period. If only a single past sample is used in the predictor, the predictor is a single-tap predictor.

Typically, one to three taps are used. The output response of a long-term filter incorporating a single-tap long-term predictor is given in z-transform notation as:

$$B(z) = \frac{1}{1 - \beta z^{-L}}$$

Note that this output response is a function only of the delay, or lag, L of the filter and the filter coefficient β. For voiced speech, the lag L is typically the pitch period of the speech or a multiple of it. At a sampling rate of 8 kHz, a suitable range for the lag L is between 16 and 143, which corresponds to a pitch range between 500 Hz and 56 Hz.
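The correspondence between the lag range and the pitch range follows from dividing the sampling rate by the lag. A small Python check, with the helper name `lag_to_pitch_hz` chosen here for illustration:

```python
FS_HZ = 8000  # sampling rate stated in the text


def lag_to_pitch_hz(lag_samples, fs_hz=FS_HZ):
    """Pitch frequency whose period equals the given lag in samples."""
    return fs_hz / lag_samples


print(lag_to_pitch_hz(16))             # 500.0
print(round(lag_to_pitch_hz(143), 1))  # 55.9, i.e. about 56 Hz
```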

The long-term predictor lag L and the long-term predictor coefficient β can be determined from either an open-loop or a closed-loop configuration. In the open-loop configuration, the lag L and coefficient β are computed directly from the input signal (or its residual). In the closed-loop configuration, the lag L and coefficient β are computed at the frame rate from coded data representing the past output of the long-term filter and from the input speech signal. Because it uses the coded data, the long-term predictor lag determination is based on the actual long-term filter state that will exist at the synthesizer. The closed-loop configuration therefore gives better performance than the open-loop method, since the pitch filter itself contributes to the optimization of the error signal. Moreover, a single-tap predictor works very well in the closed-loop configuration.

In the closed-loop configuration, the long-term filter output response b(n) is determined solely from past output samples of the long-term filter and from the current input speech samples s(n), according to the equation:

$$b(n) = s(n) + \beta\, b(n-L)$$

This technique is straightforward for pitch lags L greater than the frame length N, i.e., when L ≥ N, because the term b(n−L) always represents a past sample for every sample number n, 0 ≤ n ≤ N−1. Furthermore, when L ≥ N, the excitation gain factor γ and the long-term predictor coefficient β can be optimized simultaneously for given values of the lag L and the codeword i. This joint optimization technique has been found to yield a noticeable improvement in speech quality.
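The case L ≥ N can be sketched in a few lines of Python. The function name `long_term_filter_frame` and the convention that `past_b[-1]` holds b(−1) are assumptions for this illustration only; the sketch deliberately rejects L < N, which is the case the remainder of the document addresses.

```python
def long_term_filter_frame(s, past_b, beta, lag):
    """Compute b(n) = s(n) + beta * b(n - lag) for one frame, 0 <= n < N.

    s      -- input samples of the current frame (length N)
    past_b -- previous filter outputs, oldest first; past_b[-1] is b(-1)
    Only valid when lag >= N, so that b(n - lag) is always a past sample.
    """
    n_frame = len(s)
    if lag < n_frame:
        raise ValueError("lag < N: b(n - lag) is not a past sample for all n")
    if lag > len(past_b):
        raise ValueError("not enough history for this lag")
    b = []
    for n in range(n_frame):
        # Index of b(n - lag) inside the history buffer.
        b.append(s[n] + beta * past_b[len(past_b) + n - lag])
    return b


# History b(-4)..b(-1) = 1, 2, 3, 4; frame of N = 2 zero inputs; lag L = 3.
print(long_term_filter_frame([0.0, 0.0], [1.0, 2.0, 3.0, 4.0], 0.5, 3))  # [1.0, 1.5]
```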

However, the closed-loop approach becomes problematic if the long-term predictor must accommodate a lag L smaller than the frame length N. This situation can easily arise with high-pitched female voices. For example, a female voice with a pitch frequency of 250 Hz requires a long-term predictor lag L equal to 4 milliseconds (msec).

A pitch of 250 Hz at a sampling rate of 8 kHz corresponds to a long-term predictor lag L of 32 samples. It is not desirable, however, to use a frame length N shorter than 4 msec, because CELP excitation vectors can be coded more efficiently when longer frame lengths are used. With a frame duration of 7.5 msec at a sampling rate of 8 kHz, the frame length N equals 60 samples. This means that only 32 past samples are available to predict the next 60 samples of the frame. Hence, if the long-term predictor lag L is less than the frame length N, only L past samples of the required N samples are defined.
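The numbers in this example can be reproduced with a short Python calculation. The helper names are hypothetical and serve only to make the arithmetic explicit.

```python
FS_HZ = 8000      # sampling rate from the text
FRAME_MS = 7.5    # frame duration from the text


def frame_length(fs_hz=FS_HZ, frame_ms=FRAME_MS):
    """Frame length N in samples."""
    return round(fs_hz * frame_ms / 1000)


def pitch_lag(pitch_hz, fs_hz=FS_HZ):
    """Integer lag L in samples for a given pitch frequency."""
    return round(fs_hz / pitch_hz)


def samples_without_past(lag, n_frame):
    """Number of samples in the frame for which b(n - L) is not a past sample."""
    return max(0, n_frame - lag)


N = frame_length()   # 60 samples
L = pitch_lag(250)   # 32 samples
print(N, L, samples_without_past(L, N))  # 60 32 28
```

In this case 28 of the 60 samples in the frame would need filter state that does not yet exist when the frame is analyzed.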

Several alternative approaches have been taken in the prior art to address the problem of pitch lags L smaller than the frame length N.

In attempting to jointly optimize the long-term predictor lag L and coefficient β, the first approach is to try to solve the equations directly, assuming that no excitation signal is present. This approach is described in Kroon et al., "Regular-Pulse Excitation: A Novel Approach to Effective and Efficient Multipulse Coding of Speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1054-1063. Following this approach, however, a nonlinear equation in the single parameter β must be solved. This equation is a quadratic or cubic in β, and solving it is computationally impractical. Moreover, joint optimization of the gain factor γ and the coefficient β is still not possible with this approach.

A second approach, which restricts the long-term predictor delay L to be greater than the frame length N, was proposed by Singhal and Atal in "Improving Performance of Multi-Pulse LPC Coders at Low Bit Rates," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, March 19-21, 1984, pp. 1.3.1-1.3.4. This artificial constraint on the pitch lag L often fails to represent the pitch information accurately. Consequently, speech quality degrades for high-pitched speech when this approach is used.

A third approach is to reduce the frame length N. With a shorter frame length, the long-term predictor lag L can always be determined from past samples. This approach, however, incurs a severe bit-rate penalty. With a shorter frame length, a greater number of long-term predictor parameters and excitation vectors must be coded, and the bit rate of the channel must therefore be higher to accommodate the extra coding.

A second problem exists for high-pitched speakers. The sampling rate used in the coder places an upper limit on the performance of a single-tap pitch predictor. For example, if the pitch frequency is actually 485 Hz, the closest lag value is 16, which corresponds to 500 Hz. This results in an error of 15 Hz in the fundamental pitch frequency, which degrades speech quality. The error is multiplied for the harmonics of the pitch frequency, causing further degradation.
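The 15 Hz figure, and the way it scales with harmonic number, can be checked with a brief Python sketch. "Closest" is interpreted here as rounding the pitch period (8000 / 485 ≈ 16.49 samples) to the nearest integer lag; that interpretation and the function names are assumptions for this illustration.

```python
FS_HZ = 8000  # sampling rate from the text


def nearest_integer_lag(pitch_hz, fs_hz=FS_HZ):
    """Round the pitch period, expressed in samples, to the nearest integer lag."""
    return round(fs_hz / pitch_hz)


def harmonic_errors_hz(pitch_hz, n_harmonics, fs_hz=FS_HZ):
    """Frequency error of harmonics 1..n_harmonics caused by lag rounding."""
    represented_hz = fs_hz / nearest_integer_lag(pitch_hz, fs_hz)
    return [k * (represented_hz - pitch_hz) for k in range(1, n_harmonics + 1)]


print(nearest_integer_lag(485.0))    # 16
print(harmonic_errors_hz(485.0, 3))  # [15.0, 30.0, 45.0]
```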

A need therefore exists for an improved method of determining the long-term predictor lag L. The optimum solution must address both computational complexity and speech quality in the coding of high-pitched speech.

Summary of the Invention

Accordingly, a general object of the present invention is to provide an improved digital speech coding technique that produces high-quality speech at low bit rates.

A more specific object of the present invention is to provide a method for determining long-term predictor parameters using the closed-loop approach.

Another object of the present invention is to provide an improved method for determining the output response of a long-term predictor when the long-term predictor lag parameter L is a non-integer number.

A further object of the present invention is to provide an improved CELP speech coder that permits joint optimization of the gain factor γ and the long-term predictor coefficient β during the codebook search for the optimum excitation code vector.

In accordance with a novel aspect of the present invention, the resolution of the parameter L is increased by allowing L to take on values that are not integers. This is achieved by using an interpolating filter to provide interpolated samples of the long-term predictor state. In a closed-loop configuration, future samples of the long-term predictor state are not available to the interpolating filter. This problem is circumvented by pitch-synchronously extending the long-term predictor state into the future for use by the interpolating filter.

When the actual excitation samples for the next frame become available, the long-term predictor state is updated to reflect the actual excitation samples, replacing those based on the pitch-synchronously extended samples. For example, interpolation can be used to insert one sample between each pair of existing samples, thus doubling the resolution of L to half a sample. A higher interpolation factor, such as three or four, may also be chosen, which would increase the resolution of L to one third or one fourth of a sample.
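The effect of the interpolation factor on lag resolution can be illustrated numerically in Python. This sketch only enumerates the lag grid; it does not implement the interpolating filter itself. The function name `nearest_lag` and the 485 Hz test pitch (borrowed from the background discussion) are illustrative assumptions.

```python
from fractions import Fraction

FS_HZ = 8000  # sampling rate from the text


def nearest_lag(pitch_hz, interp_factor, fs_hz=FS_HZ):
    """Nearest lag on a grid with a step of 1/interp_factor sample."""
    steps = round(fs_hz / pitch_hz * interp_factor)
    return Fraction(steps, interp_factor)


for factor in (1, 2, 3, 4):
    lag = nearest_lag(485.0, factor)
    # Print the grid step, the chosen lag, and the pitch it represents.
    print(factor, lag, round(FS_HZ / float(lag), 2))
```

With a factor of 2, the nearest lag becomes 33/2 = 16.5 samples, which represents about 484.85 Hz instead of the 500 Hz obtained with integer lags.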

Brief Description of the Drawings

The features of the present invention that are believed to be novel are set forth with particularity in the appended claims. The invention, together with further objects and advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in the several figures of which like reference numerals identify like elements, and in which:

FIG. 1 is a general block diagram of a code-excited linear predictive speech coder, showing the location of a long-term filter for use with the present invention;

FIG. 2A is a detailed block diagram of one embodiment of the long-term filter of FIG. 1, illustrating the long-term predictor response where the filter lag L is an integer;

FIG. 2B is a simplified diagram of a shift register that can be used to explain the operation of the long-term predictor of FIG. 2A;

FIG. 2C is a detailed block diagram of another embodiment of the long-term filter of FIG. 1, illustrating the long-term predictor response where the filter lag L is an integer;

FIG. 3 is a detailed flowchart describing the operations performed by the long-term filter of FIG. 2A;

FIG. 4 is a general block diagram of a speech synthesizer for use in accordance with the present invention;

FIG. 5 is a detailed block diagram of the long-term filter of FIG. 1, illustrating the subsample-resolution long-term predictor response in accordance with the present invention;

FIGS. 6A and 6B are detailed flowcharts describing the operations performed by the long-term filter of FIG. 5; and

FIG. 7 is a detailed block diagram of a pitch postfilter for coupling between the short-term filter and the D/A converter of the speech synthesizer of FIG. 4.

Detailed Description of the Preferred Embodiment

Referring now to FIG. 1, there is shown a general block diagram of a code-excited linear predictive speech coder 100 that utilizes the long-term filter in accordance with the present invention. An acoustic input signal to be analyzed is applied to speech coder 100 at microphone 102. The input signal, typically a speech signal, is then applied to filter 104. Filter 104 generally exhibits bandpass filter characteristics. If the speech bandwidth is already adequate, however, filter 104 may be a direct wire connection.

The analog speech signal from filter 104 is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is represented by a digital code in analog-to-digital (A/D) converter 108, as known in the art. The sampling rate is determined by sample clock SC, which runs at 8.0 kHz in the preferred embodiment. Sample clock SC is generated, along with frame clock FC, by clock 112.

The digital output of A/D 108, represented as the input speech vector s(n), is then applied to coefficient analyzer 110. This input speech vector s(n) is obtained repetitively in separate frames, i.e., blocks of time, the length of which is determined by frame clock FC.

In the preferred embodiment, the input speech vector s(n), 0 ≤ n ≤ N−1, represents a 7.5 msec frame containing N = 60 samples, where each sample is represented by 12 to 16 bits of digital code. In this embodiment, a set of linear predictive coding (LPC) parameters is produced for each block of speech by coefficient analyzer 110 in an open-loop configuration.

The short-term predictor parameters α_i, the long-term predictor coefficient β, the nominal long-term predictor lag parameter L, the weighting filter parameters WFP, and the excitation gain factor γ (along with the best excitation codeword I, described later) are applied to multiplexer 150 and sent over the channel for use by the speech synthesizer. For representative methods of generating these parameters for this embodiment, see B. S. Atal, "Predictive Coding of Speech at Low Bit Rates," IEEE Transactions on Communications, Vol. COM-30, pp. 600-14, April 1982. The input speech vector s(n) is also applied to subtractor 130, the function of which is described later.

Codebook ROM 120 contains a set of M excitation vectors u_i(n), where 1 ≤ i ≤ M, each comprising N samples, where 0 ≤ n ≤ N−1. Codebook ROM 120 is preferably implemented as described in U.S. Pat. No. 4,817,157, which is incorporated herein by reference. Codebook ROM 120 generates these pseudorandom excitation vectors in response to a particular one of a set of excitation codewords i. Each of the M excitation vectors comprises a series of random white Gaussian samples, although other types of excitation vectors may be used with the present invention. If the excitation signal were coded at a rate of 0.2 bits per sample for each of the 60 samples, there would be 4096 codewords i corresponding to the possible excitation vectors.
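The codeword count follows from the bit budget per frame: 0.2 bits per sample over 60 samples gives 12 bits, and 2^12 = 4096. A short Python check, with the function name `codebook_size` chosen for illustration:

```python
def codebook_size(bits_per_sample, samples_per_frame):
    """Number of distinct codewords addressable with the frame's bit budget."""
    bits_per_frame = round(bits_per_sample * samples_per_frame)
    return 2 ** bits_per_frame


print(codebook_size(0.2, 60))  # 4096
```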

For each individual excitation vector u_i(n), a reconstructed speech vector s'_i(n) is generated for comparison with the input speech vector s(n). Gain block 122 scales the excitation vector u_i(n) by the excitation gain factor γ, which is constant for the frame. The excitation gain factor γ may be precomputed by coefficient analyzer 110 and used to analyze all excitation vectors, as shown in FIG. 1, or may be optimized jointly with the search for the best excitation codeword I and generated by codebook search controller 140.

The scaled excitation signal γu_i(n) is then filtered by long-term filter 124 and short-term filter 126 to generate the reconstructed speech vector s'_i(n). Filter 124 utilizes the long-term predictor parameters β and L to introduce voice periodicity, and filter 126 utilizes the short-term predictor parameters α_i to introduce the spectral envelope, as described above. Long-term filter 124 is described in detail in the following figures. Blocks 124 and 126 are actually recursive filters that contain the long-term predictor and the short-term predictor in their respective feedback paths.

The reconstructed speech vector s'_i(n) for the i-th excitation code vector is compared with the same block of the input speech vector s(n) by subtracting these two signals in subtractor 130. The difference vector e_i(n) represents the difference between the original and the reconstructed blocks of speech. The difference vector is perceptually weighted by weighting filter 132, utilizing the weighting filter parameters WFP generated by coefficient analyzer 110. Refer to the preceding reference for a representative weighting filter transfer function. Perceptual weighting accentuates those frequencies at which the error is perceptually more important to the human ear, and attenuates other frequencies.

Energy calculator 134 computes the energy of the weighted difference vector e'_i(n) and applies this error signal E_i to codebook search controller 140. The search controller compares the i-th error signal for the present excitation vector u_i(n) against previous error signals to determine the excitation vector producing the minimum error. The code of the i-th excitation vector having the minimum error is then output over the channel as the best excitation code I. Alternatively, search controller 140 may determine a particular codeword that provides an error signal meeting some predetermined criterion, such as a predefined error threshold.
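The exhaustive search described above can be outlined in a minimal Python sketch. The function name `search_codebook` and the `synthesize` and `weight` callables are placeholders assumed for illustration; they stand in for filters 124/126 and weighting filter 132 and default to pass-through, so this is a structural outline and not the coder's actual filtering.

```python
def search_codebook(target, codebook, gain,
                    synthesize=lambda v: v, weight=lambda v: v):
    """Return (I, E_min): the 1-based index of the excitation vector whose
    weighted reconstruction error has the least energy, and that energy."""
    best_index, best_energy = None, float("inf")
    for i, u in enumerate(codebook, start=1):   # codewords numbered 1..M
        scaled = [gain * x for x in u]          # gain block
        recon = synthesize(scaled)              # long-term + short-term filters
        diff = [t - r for t, r in zip(target, recon)]   # subtractor
        energy = sum(x * x for x in weight(diff))       # energy calculator
        if energy < best_energy:                # search controller
            best_index, best_energy = i, energy
    return best_index, best_energy


codebook = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
print(search_codebook([1.0, -1.0], codebook, gain=1.0))  # (2, 0.0)
```

The threshold-based alternative mentioned in the text would correspond to returning as soon as `energy` falls below a chosen limit instead of scanning all M vectors.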

FIG. 1 shows one embodiment of the present invention for a code-excited linear prediction (CELP) speech coder.

In this embodiment, the long-term filter parameters L and β are determined by coefficient analyzer 110 in an open-loop configuration. Alternatively, the long-term filter parameters can be determined in a closed-loop configuration, as described in the Singhal and Atal reference cited above. In general, the performance of the speech coder improves when the long-term filter parameters are determined in a closed-loop configuration. The novel structure of the long-term predictor according to the present invention greatly facilitates closed-loop determination of these parameters for lags L smaller than the frame length N.

FIG. 2A shows one embodiment of long-term filter 124 of FIG. 1, in which L is restricted to integer values. Although FIG. 1 shows the scaled excitation vector γu_i(n) being input from gain block 122 to long-term filter 124, FIG. 2A uses a representative input speech vector s(n) for purposes of explanation.

Accordingly, a frame of N samples of the input speech vector s(n) is applied to adder 210. The output of adder 210 produces the output vector b(n) of long-term filter 124. The output vector b(n) is fed back to delay block 230 of the long-term predictor. The nominal long-term predictor lag parameter L is also input to delay block 230. The long-term predictor delay block provides the output vector q(n) to multiplier block 220 of the long-term predictor, which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to adder 210 to complete the feedback loop of the recursive filter.

The output response H(z) of long-term filter 124 is defined in Z-transform notation as follows.

$$H(z) = \frac{1}{1 - \beta z^{-\lfloor (n+L)/L \rfloor L}}$$

Here n is the sample number within a frame of N samples, with 0 ≤ n ≤ N-1, β is the filter coefficient, L is the nominal lag or delay of the long-term predictor, and ⌊(n+L)/L⌋ is the nearest integer less than or equal to (n+L)/L. The long-term predictor delay ⌊(n+L)/L⌋L varies as a function of the sample number n. Thus, according to the present invention, the actual long-term predictor delay becomes kL, where L is the basic or nominal long-term predictor lag and k is an integer selected from the set {1, 2, 3, 4, ...} as a function of the sample number n. The output response b(n) of the long-term filter is therefore a function of the nominal long-term predictor lag parameter L and of the filter state FS present at the beginning of the frame. This statement holds for all values of L, even for the problematic case in which the pitch lag L is smaller than the frame length N.

The function of long-term predictor delay block 230 is to store current input samples in order to predict future samples. FIG. 2B is a simplified diagram of a shift register that may help in understanding the operation of long-term predictor delay block 230 of FIG. 2A. For a sample number l (lowercase el), so that n = l, the current output sample b(n) is applied to the input of the shift register, shown at the right of FIG. 2B. For the next sample, n = l+1, the previous sample b(n) is shifted left into the shift register. This sample has now become the first past sample b(n-1). For the next sample, n = l+2, another sample of b(n) is shifted into the register, and the original sample is again shifted left to become the second past sample b(n-2). After L samples have been shifted in, the original sample has been shifted left L times and can therefore be written as b(n-L).

As noted above, the lag L will typically be the pitch period of voiced speech or a multiple of it. If the lag L is at least as long as the frame length N, a sufficient number of past samples have been shifted in and stored to predict the next frame of speech. Even in the extreme case of L = N and n = N-1, b(n-L) is b(-1), which is truly a past sample. The sample b(n-L) is therefore output from the shift register as the output sample q(n).

If, however, the long-term predictor lag parameter L is shorter than the frame length N, an insufficient number of samples will have been shifted into the shift register by the start of the next frame. Using the earlier example of a 250 Hz pitch, the pitch lag L equals 32. Thus, for L = 32 and N = 60, and for n = N-1 = 59, b(n-L) would normally be b(27), which is a future sample relative to the beginning of the 60-sample frame. In other words, too few past samples have been stored to provide a complete long-term predictor response. A complete long-term predictor response is needed at the beginning of the frame so that closed-loop analysis of the predictor parameters can be performed.

In that case, according to the present invention, the same stored samples b(n-L), 0 ≤ n < L, are repeated, so that the output response of the long-term predictor is always a function of samples that were input to the long-term predictor delay block before the start of the current frame. In terms of FIG. 2B, the shift register is extended to store a further kL samples, which represents a change to the structure of long-term predictor delay block 230. Because the shift register fills with new samples b(n), k must be chosen so that b(n-kL) denotes a sample that was present in the shift register before the start of the frame. Using the earlier example of L = 32 and N = 60, the output sample q(32) is a repetition of sample q(0), which is b(0-L) = b(32-2L), that is, b(-32).

The output response q(n) of long-term predictor delay block 230 therefore corresponds to q(n) = b(n-kL), where 0 ≤ n ≤ N-1 and k is the smallest integer chosen so that (n-kL) is negative. More specifically, if a frame of N samples of s(n) is input to long-term filter 124, each sample number n satisfies j ≤ n ≤ N+j-1, where j is the index of the first sample of the N-sample frame. The variable k therefore varies so that (n-kL) is always less than j. This guarantees that the long-term predictor uses only samples available before the start of the frame to predict the output response.
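
As a small worked check of this rule, the sketch below computes the multiplier k and the resulting index n-kL for the example values L = 32 and N = 60 used in the text. The function names are hypothetical and chosen only for illustration.

```python
import math


def delay_multiplier(n, L):
    """k = floor((n + L) / L): the smallest integer k for which n - k*L < 0."""
    return math.floor((n + L) / L)


def delayed_index(n, L):
    """Index n - k*L of the stored sample used as q(n); always negative."""
    return n - delay_multiplier(n, L) * L


# Example from the text: lag L = 32, frame length N = 60.
L, N = 32, 60
indices = [delayed_index(n, L) for n in range(N)]
# q(0) and q(32) both read b(-32); every index points before the frame start.
```

For n from 0 to 31 the multiplier is 1, and for n from 32 to 59 it is 2, so the 32 samples stored before the frame are reused rather than reading samples from inside the current frame.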

The operation of long-term filter 124 of FIG. 2A is described with reference to the flowchart of FIG. 3. Starting at step 350, the sample number n is initialized to 0 in step 351. The nominal long-term predictor lag parameter L and the long-term predictor coefficient β are input from coefficient analyzer 110 in step 352. In step 353, the sample number n is tested to see whether the entire frame has been output. If n = N, operation ends at step 361. If not all samples have been computed yet, a signal sample s(n) is input in step 354. In step 355, the output response of long-term predictor delay block 230 is computed according to the following equation.

$$q(n) = b\left(n - \lfloor (n+L)/L \rfloor L\right)$$

Here ⌊(n+L)/L⌋ denotes the nearest integer less than or equal to (n+L)/L. For example, if n = 56 and L = 32, then ⌊(n+L)/L⌋L becomes ⌊(56+32)/32⌋L, which is ⌊2.75⌋L, or 2L. In step 356, the output response b(n) of the long-term filter is computed according to the following equation.

$$b(n) = \beta q(n) + s(n)$$

This represents the function of multiplier 220 and adder 210.

In step 357, the samples in the shift register are shifted one position to the left for all register locations between b(n-2) and b(n-L_MAX), where L_MAX is the maximum long-term predictor lag that can be assigned. In the preferred embodiment, L_MAX equals 143. In step 358, the output sample b(n) is input into the first location b(n-1) of the shift register. Step 359 outputs the filtered sample b(n). The sample number n is then incremented in step 360 and tested again in step 353. When all N samples have been computed, processing ends at step 361.
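
The flowchart of FIG. 3 can be approximated by the following sketch for an integer lag. It is an illustration under assumptions, not the patented implementation: the function name and the list-based history (oldest sample first, so that `history[-1]` is b(-1)) are hypothetical, and the per-sample shift register is replaced by a single history update at the end of the frame, which is equivalent here because only samples from before the frame are ever read.

```python
def long_term_filter(s, history, L, beta):
    """Recursive long-term filter for one frame with an integer lag L.

    s        input frame s(0) .. s(N-1)
    history  past outputs, oldest first; history[-1] is b(-1)
    L        integer lag, 1 <= L <= len(history)
    beta     long-term predictor coefficient
    Returns the output frame b and the updated history.
    """
    b = []
    for n in range(len(s)):
        k = (n + L) // L               # floor((n + L) / L)
        q = history[n - k * L]         # n - k*L is negative: a pre-frame sample
        b.append(beta * q + s[n])      # b(n) = beta * q(n) + s(n)
    # Keep the most recent samples as the state for the next frame.
    new_history = (list(history) + b)[-len(history):]
    return b, new_history


# Zero input with a non-zero state: the stored samples repeat with period L = 2.
frame, state = long_term_filter([0.0] * 5, [1.0, 2.0, 3.0], L=2, beta=0.5)
```

Because q(n) depends only on the state at the start of the frame, the output is linear in β for every lag, including lags shorter than the frame.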

FIG. 2C shows another embodiment of a long-term filter incorporating the present invention. Filter 124' is the feedforward inverse version of the recursive filter configuration of FIG. 2A. The input vector s(n) is applied to both subtractor 240 and long-term predictor delay block 260. The delayed vector q(n) is output to multiplier 250, which scales the vector by the long-term predictor coefficient β. The output response H(z) of digital filter 124' is given in Z-transform notation as follows.

$$H(z) = 1 - \beta z^{-\lfloor (n+L)/L \rfloor L}$$

In this equation, n is the sample number within a frame of N samples, with 0 ≤ n ≤ N-1, β is the long-term filter coefficient, L is the nominal lag of the long-term predictor, and ⌊(n+L)/L⌋ is the nearest integer less than or equal to (n+L)/L. The output signal b(n) of filter 124' can also be expressed in terms of the input signal s(n) as follows.

$$b(n) = s(n) - \beta\, s\left(n - \lfloor (n+L)/L \rfloor L\right)$$

Here 0 ≤ n ≤ N-1. As those skilled in the art will appreciate, the structure of the long-term predictor has again been modified so that the same stored samples of the long-term predictor are output repeatedly when the long-term predictor lag L is smaller than the frame length N.
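
A minimal sketch of the feedforward form, assuming an integer lag and a list of past input samples, is given below. The function name and the layout of `s_history` (oldest first, `s_history[-1]` is s(-1)) are assumptions made for illustration.

```python
def inverse_long_term_filter(s, s_history, L, beta):
    """Feedforward (inverse) long-term filter for one frame, integer lag L.

    s          input frame s(0) .. s(N-1)
    s_history  past input samples, oldest first; s_history[-1] is s(-1)
    """
    b = []
    for n in range(len(s)):
        k = (n + L) // L                    # floor((n + L) / L)
        delayed = s_history[n - k * L]      # always a sample from before the frame
        b.append(s[n] - beta * delayed)     # b(n) = s(n) - beta * s(n - kL)
    return b


out = inverse_long_term_filter([4.0, 4.0, 4.0], [1.0, 2.0, 3.0], L=2, beta=0.5)
```

Fed with the output frame and the starting state of the recursive form, this structure returns the original input frame, since both read the same pre-frame samples.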

Referring now to FIG. 5, a preferred embodiment of long-term filter 124 of FIG. 1 is shown, which allows subsample resolution for the lag parameter L. A frame of N samples of the input speech vector s(n) is applied to adder 510. The output of adder 510 produces the output vector b(n) of long-term filter 124.

The output vector b(n) is fed back to delayed vector generator block 530 of the long-term predictor.

The nominal long-term predictor lag parameter L is also input to delayed vector generator block 530. The long-term predictor lag parameter L can take non-integer rational values. The preferred embodiment allows L to take values that are multiples of one half. Other configurations of the subsample-resolution long-term predictor of the present invention can allow values that are multiples of one third, one fourth, or any other rational fraction.

In the preferred embodiment, delayed vector generator 530 includes a memory that holds past samples of b(n). In addition, interpolated samples of b(n) are calculated by delayed vector generator 530 and stored in its memory.

In the preferred embodiment, the state of the long-term predictor contained in delayed vector generator 530 has two samples for each stored sample of b(n). One sample is the sample of b(n) itself, and the other is an interpolated sample between two consecutive samples of b(n). In this way, samples of b(n) can be obtained from delayed vector generator 530 for delays that are integers or multiples of a half-sample delay. The interpolation is performed with an interpolating finite impulse response (FIR) filter as described in R. Crochiere and L. Rabiner, "Multirate Digital Signal Processing", published by Prentice Hall in 1983. The operation of delayed vector generator 530 is described in more detail with reference to the flowcharts of FIG. 6A and FIG. 6B.

Delayed vector generator 530 provides the output vector q(n) to long-term multiplier block 520, which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to adder 510 to complete the feedback loop of recursive filter 124 in FIG. 5.

FIG. 6A and FIG. 6B show detailed flowcharts of the operations performed by the long-term filter of FIG. 5. According to the preferred embodiment of the present invention, the resolution of the long-term predictor memory is extended by mapping the N-point sequence b(n) onto a 2N-point vector ex(i). The negatively indexed samples of ex(i) contain the extended-resolution past values of the long-term filter output b(n), that is, the extended-resolution long-term history. Each time it is applied, the mapping process doubles the temporal resolution of the long-term predictor memory. Only a single stage of mapping is described here for simplicity, but additional stages can be implemented in other embodiments of the invention.

Entering at the start, step 602 of FIG. 6A, the flowchart proceeds to step 604, where L, β, and s(n) are input. In step 608, the vector q(n) is constructed according to the following equation.

$$q(n) = ex\left(2n - 2\lfloor (n+L)/L \rfloor L\right), \quad 0 \le n \le N-1$$

In this equation, ⌊(n+L)/L⌋ denotes the nearest integer less than or equal to (n+L)/L, and L is the long-term predictor lag. For voiced speech, the long-term predictor lag L is the pitch period or a multiple of the pitch period.

L is either an integer or, in the preferred embodiment, a real number whose fractional part is 0.5. When the fractional part of L is 0.5, L has an effective resolution of half a sample.
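
The half-sample lookup can be sketched as below, assuming the lag is passed in units of half samples as an integer `two_L` = 2L, which avoids floating-point lags. The function name and the list holding the negatively indexed part of ex(i) (with `ex_history[-1]` standing for ex(-1)) are hypothetical.

```python
def delayed_vector_half_sample(ex_history, two_L, N):
    """Build q(n) = ex(2n - 2kL) from the extended-resolution history.

    ex_history  samples ex(-len) .. ex(-1), oldest first
    two_L       lag in half-sample units (2L), an integer <= len(ex_history)
    N           frame length
    """
    q = []
    for n in range(N):
        k = (2 * n + two_L) // two_L       # equals floor((n + L) / L)
        idx = 2 * n - k * two_L            # negative index into the history
        q.append(ex_history[idx])
    return q


# History whose values equal their own indices, so q(n) shows the index read.
hist = [float(i) for i in range(-10, 0)]
q_half = delayed_vector_half_sample(hist, two_L=5, N=4)   # L = 2.5 samples
q_int = delayed_vector_half_sample(hist, two_L=4, N=3)    # L = 2 samples
```

With an integer lag only even indices are read, which are the stored samples of b(n); with a lag ending in 0.5 the odd indices, holding interpolated samples, are read as well.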

In step 610, the long-term filter vector b(n) is computed according to the following equation.

$$b(n) = \beta q(n) + s(n), \quad 0 \le n \le N-1$$

In step 612, the long-term filter vector b(n) is output. In step 614, the extended-resolution state ex(n) is updated, so that interpolated values of b(n) are generated and stored in the memory of delayed vector generator 530. Step 614 is shown in more detail in FIG. 6B. Processing then completes and stops at step 616.

Entering at the start, step 622 of FIG. 6B, the flowchart proceeds to step 624, where the samples of ex(i) that are to be computed in this subframe are set to zero, that is, ex(i) = 0 for i = -M, -M+2, ..., 2N-1, where M is chosen to be odd for a filter of order 2M+1. For example, if the filter order is 39, M is 19. M is chosen to be odd for simplicity, but M may also be even. In step 626, every other sample of ex(i), for i = 0, 2, ..., 2(N-1), is initialized with the samples of b(n) according to the following equation.

$$ex(2i) = b(i), \quad i = 0, 1, \ldots, N-1$$

Thus ex(i) for i = 0, 2, ..., 2(N-1) holds the output vector b(n) of the current subframe, mapped onto its even indices, while the odd indices of ex(i), for i = 1, 3, ..., 2(N-1)+1, are initialized to zero.

In step 628, the interpolated samples of ex(i) that were initialized to zero are reconstructed by FIR interpolation, using a symmetric, zero-phase-shift filter and assuming that the order of the FIR filter is 2M+1, as stated above.

The coefficients of the FIR filter are a(j), where j = -M, -M+2, ..., M-2, M and a(j) = a(-j). Only the even samples pointed to by the taps of the FIR filter are used in reconstructing a sample, because the odd samples have been set to zero. As a result, M+1 samples rather than 2M+1 samples are actually weighted and summed for each reconstructed sample. The FIR interpolation is performed according to the following equation.

$$ex(i) = \sum_{j=1}^{(M+1)/2} a(2j-1)\,\left[\,ex(i-2j+1) + ex(i+2j-1)\,\right]$$

for i = -M, -M+2, ..., 2(N-1)-M-2, 2(N-1)-M. Note that the first sample to be reconstructed is ex(-M), not ex(1) as might be expected.
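
The mapping and interpolation steps can be sketched as follows for the samples inside the current subframe. This is a simplified illustration: the function names are hypothetical, the history at negative indices is ignored, and the coefficient values used in the example (0.5 for M = 1, and 9/16 and -1/16 for M = 3) are assumed interpolation coefficients chosen for the demonstration, not values taken from the patent.

```python
def map_to_extended(b):
    """Place b(n) on the even indices of a 2N-point vector; odd indices are zero."""
    ex = [0.0] * (2 * len(b))
    ex[0::2] = b
    return ex


def interpolate_odd(ex, a_odd):
    """Reconstruct the odd samples whose filter taps all lie inside ex.

    a_odd[j - 1] holds a(2j - 1) for j = 1 .. (M + 1) / 2, with M odd.
    Even samples are left untouched; odd samples near the end stay zero
    because their upper taps would need future samples.
    """
    M = 2 * len(a_odd) - 1
    out = list(ex)
    for i in range(M, len(ex) - M, 2):      # odd i with i - M >= 0 and i + M <= 2(N-1)
        out[i] = sum(
            a * (ex[i - (2 * j - 1)] + ex[i + (2 * j - 1)])
            for j, a in enumerate(a_odd, start=1)
        )
    return out


ex0 = map_to_extended([0.0, 1.0, 2.0, 3.0])
linear = interpolate_odd(ex0, [0.5])                    # M = 1
cubic = interpolate_odd(ex0, [9.0 / 16.0, -1.0 / 16.0]) # M = 3
```

In both cases the last odd samples remain zero, which is the situation the text resolves by extending the state with estimated future samples.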

This is because the interpolated samples at indices -M, -M+2, ..., -1 were reconstructed in the previous frame using an estimate of the excitation in the current frame, since the actual excitation samples were not yet defined at that time. In the current frame these samples are known (b(n) is available), and so the samples of ex(i) for i = -M, -M+2, ..., -1 are now reconstructed again, with the filter taps pointing to actual rather than estimated values of b(n).

The largest value of i in the above equation is 2(N-1)-M. This means that (M+1)/2 odd samples of ex(i), for i = 2N-M, 2N-M+2, ..., 2(N-1)+1, still remain to be reconstructed.

For these values of the index i, however, the upper taps of the interpolation filter point to future samples of the excitation, which are not yet defined. To compute the values of ex(i) at these indices, the future state of ex(i) for i = 2N, 2N+2, ..., 2N+M-1 is extended by evaluating the following in step 630.

$$ex(i) = \lambda\, ex(i - 2L), \quad i = 2N, 2N+2, \ldots, 2N+M-1$$

The minimum value of 2L that can be used in this scheme is 2M+1. This constraint can be lifted by defining the extension as follows.
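
The basic extension rule, before the constraint on 2L is lifted, might be sketched as follows. The dictionary keyed by sample index and the function name are assumptions made for illustration; the lag is again passed in half-sample units as the integer `two_L` = 2L.

```python
def extend_future_state(ex, N, M, two_L, lam):
    """Estimate future even samples: ex(i) = lam * ex(i - 2L).

    ex     dict mapping index i to ex(i); must already hold every ex(i - 2L) read
    N      subframe length, so the current subframe covers indices 0 .. 2N-1
    M      odd half-length of the interpolation filter
    two_L  lag in half-sample units (2L), assumed to be at least 2M + 1
    lam    history extension scale factor
    """
    for i in range(2 * N, 2 * N + M, 2):    # i = 2N, 2N+2, ..., 2N+M-1
        ex[i] = lam * ex[i - two_L]
    return ex


# State whose values equal their indices, for indices -8 .. 3 (N = 2).
state = {i: float(i) for i in range(-8, 4)}
state = extend_future_state(state, N=2, M=3, two_L=7, lam=0.5)
```

Here ex(4) is estimated from ex(-3) and ex(6) from ex(-1), both of which were already present in the state.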

$$ex(i) = \lambda\, ex\left(F(i - 2L)\right), \quad i = 2N, 2N+2, \ldots, 2N+M-1$$

In this case, for i-2L equal to an odd number, F(i-2L) is given by the following equation.

For i-2L equal to an even number, F(i-2L) is given by the following equation.

The parameter λ, the history extension scale factor, can be set equal to β, the pitch predictor coefficient, or can be set to 1.

Once the excitation history has been extended in this way, the last (M+1)/2 zeroed samples of the current extended-resolution subframe are calculated in step 632 using the following equation.

$$ex(i) = \sum_{j=1}^{(M+1)/2} a(2j-1)\,\left[\,ex(i-2j+1) + ex(i+2j-1)\,\right]$$

for i = 2N-M, 2N-M+2, ..., 2(N-1)+1. These samples will be recalculated in the next subframe, once the actual excitation samples of ex(i) for i = 2N, 2N+2, ..., 2N+M-1 become available.

Thus, for n = 0, ..., N-1, b(n) has been mapped onto the vector ex(i), where i = 0, 2, ..., 2(N-1). The missing zeroed samples have been reconstructed using the FIR interpolation filter. Note that the FIR interpolation is applied only to the missing samples. This guarantees that no distortion is needlessly introduced into the known samples, which are stored at the even indices of ex(i). An additional advantage of processing only the missing samples is that the computation associated with the interpolation is halved.

Finally, in step 634, the long-term predictor history is updated by shifting the contents of the extended-resolution excitation vector ex(i) down by 2N points.

$$ex(i) = ex(i + 2N), \quad i = -2\,Max\_L, \ldots, -1$$

Here Max_L is the maximum long-term predictor delay used. Processing then completes and stops at step 636.
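
With the state kept in an ordinary list, the shift by 2N points reduces to keeping the most recent 2·Max_L samples, as in the following sketch. The function name and the list layout (history first, then the current extended-resolution subframe) are assumptions for illustration.

```python
def update_history(history, ex_frame, max_l):
    """Shift the extended-resolution state down by len(ex_frame) = 2N points.

    history   samples ex(-2*max_l) .. ex(-1), oldest first
    ex_frame  samples ex(0) .. ex(2N-1) of the subframe just processed
    Returns the new history, i.e. ex(i) = ex(i + 2N) for i = -2*max_l .. -1.
    """
    combined = list(history) + list(ex_frame)
    return combined[-2 * max_l:]


new_hist = update_history([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], max_l=2)
```

In this example N = 1, so the two oldest samples are discarded and the two samples of the current subframe become ex(-2) and ex(-1).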

Referring now to FIG. 4, a block diagram of a speech synthesizer using the long-term filter of the present invention is shown. Synthesizer 400 obtains, via demultiplexer 450, the short-term predictor parameters α_i, the long-term predictor parameters β and L, the excitation gain factor γ, and the codeword I received from the channel. Codeword I is applied to codebook ROM 420 to address the codebook of excitation vectors.

Codebook ROM 420 is preferably implemented as described in U.S. Pat. No. 4,817,157, which is incorporated herein by reference. The single excitation vector u_I(n) is then multiplied by the gain factor γ in block 422 and filtered by long-term predictor filter 424 and short-term predictor filter 426 to obtain the reconstructed speech vector s'_I(n). This vector, which represents a frame of reconstructed speech, is then applied to digital-to-analog (D/A) converter 408 to produce a reconstructed analog signal, which is low-pass filtered by filter 404 to reduce aliasing and applied to an output transducer such as speaker 402. This CELP synthesizer therefore uses the same codebook, gain block, long-term filter, and short-term filter as the CELP analyzer of FIG. 1.

FIG. 7 is a detailed block diagram of a pitch postfilter for coupling short-term filter 426 to D/A converter 408 of the speech synthesizer in FIG. 4. The pitch postfilter improves speech quality by removing noise introduced by filters 424 and 426. A frame of N samples of the reconstructed speech vector s'_I(n) is applied to adder 710. The output of adder 710 produces the output vector s''(n) of the pitch postfilter. The output vector s''(n) is fed back to delayed sample generator block 730 of the pitch postfilter. The nominal long-term predictor lag parameter L is also input to delayed sample generator block 730. In the present invention, L can take non-integer values. If L is a non-integer, an interpolating FIR filter is used to generate the required fractional sample delay. Delayed sample generator 730 provides the output vector q(n) to multiplier block 720, which scales the pitch postfilter response by a coefficient R that is a function of the long-term predictor coefficient β.

The scaled output Rq(n) is then applied to adder 710 to complete the feedback loop of the pitch postfilter of FIG. 7.

When the long-term predictor response according to the present invention is used, the excitation gain factor γ and the long-term predictor coefficient β can be optimized simultaneously for all values of L in a closed-loop configuration. Until now this joint optimization technique was impractical for values of L < N, because the joint optimization equations become nonlinear in the single parameter β. The present invention modifies the structure of the long-term predictor so that the joint optimization equations are linear. In addition, the present invention allows the long-term predictor lag to have a resolution finer than one sample, which improves its performance.

The codebook search procedure is also further simplified, because the zero-state response of the long-term filter becomes zero for lags smaller than the frame length. This additional feature allows those skilled in the art to remove the effect of the long-term filter from the codebook search procedure. A CELP speech coder has thus been shown that can provide higher-quality speech for all pitch rates while retaining the advantages of practical implementation and a low bit rate.

While specific embodiments of the present invention have been shown and described, further modifications and improvements may be made without departing from the invention in its broader aspects. For example, any type of speech coding (for example RELP, multipulse, RPE, LPC, and others) may be used with the subsample-resolution long-term predictor filtering technique described herein. Furthermore, additional equivalent configurations of the subsample-resolution long-term predictor structure may be constructed to perform the same computations as those described above.

FIG. 6B

Claims

[Claims]

1. A method of encoding speech for communication to a speech synthesizer for reproduction, the speech comprising frames of speech vectors each having N samples, N being an integer greater than 1, the method comprising the steps of:
storing in memory means a plurality of excitation vectors each having a plurality of samples, wherein a first portion of the excitation vectors each have fewer than N samples and a second portion of the excitation vectors each have N samples, and each excitation vector is associated with a different digital codeword and a different delay parameter that is an integer greater than 1 and less than a predetermined maximum number;
searching the excitation vectors with a current speech vector to determine the codeword and delay parameter of the excitation vector that best matches the current speech vector, the determination being made by:
  reading samples of the excitation vectors from the memory means;
  repeating the samples of the excitation vectors in the first portion so that each excitation vector of the first portion has N samples;
  generating at least one interpolated excitation vector corresponding to each excitation vector, the samples of the interpolated excitation vector being interpolated from the samples of the corresponding excitation vector, and the interpolated excitation vector having the same codeword as the corresponding excitation vector and a delay parameter that is a non-integer rational number related to the delay parameter of the corresponding excitation vector;
  comparing the samples of the excitation vectors and the interpolated excitation vectors with the current speech vector to determine the differences between them; and
  selecting the codeword and delay parameter of the excitation vector or interpolated excitation vector that differs least from the current speech vector; and
communicating the determined codeword and delay parameter to identify the location of the determined excitation vector in the memory means for reproduction of the current speech vector by the speech synthesizer.

2. The method of claim 8, wherein the step of generating at least one interpolated excitation vector includes the step of averaging two consecutive samples of the corresponding vector to generate the corresponding sample of the interpolated excitation vector.

3. An apparatus for encoding speech for communication to a speech synthesizer for reproduction, the speech comprising frames of speech vectors each having N samples, N being an integer greater than 1, the apparatus comprising:
means for storing a plurality of excitation vectors each having a plurality of samples, wherein a first portion of the excitation vectors each have fewer than N samples and a second portion of the excitation vectors each have N samples, and each excitation vector is associated with a different digital codeword and a different delay parameter that is an integer greater than 1 and less than a predetermined maximum number;
means for searching the excitation vectors with a current speech vector to determine the codeword and delay parameter of the excitation vector that best matches the current speech vector, the determination being made by:
  reading samples of the excitation vectors from the storing means;
  repeating the samples of the excitation vectors in the first portion so that each excitation vector of the first portion has N samples;
  generating at least one interpolated excitation vector corresponding to each excitation vector, the samples of the interpolated excitation vector being interpolated from the samples of the corresponding excitation vector, and the interpolated excitation vector having the same codeword as the corresponding excitation vector and a delay parameter that is a non-integer rational number related to the delay parameter of the corresponding excitation vector;
  comparing the samples of the excitation vectors and the interpolated excitation vectors with the samples of the current speech vector to determine the differences between them; and
  selecting the codeword and delay parameter of the excitation vector or interpolated excitation vector that differs least from the current speech vector; and
means for communicating the determined codeword and delay parameter to identify the location of the determined excitation vector in the storing means for reproduction of the current speech vector by the speech synthesizer.

4. The apparatus of claim 10, wherein the searching means generates each sample of the interpolated excitation vector by averaging two consecutive corresponding samples of the corresponding vector.
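The search recited in the claims above (repeat short vectors to N samples, derive an interpolated vector by averaging consecutive samples, compare against the current speech vector, select the closest) can be sketched as follows. This is an illustrative simplification, not the claimed implementation. The plain squared-error measure, the absence of gain terms, the dictionary layout, the function names, and the labelling of the interpolated vector with a delay of `delay - 0.5` are all assumptions made for this example.

```python
def extend(stored, n):
    """Repeat the stored samples until the vector has n samples."""
    return [stored[i % len(stored)] for i in range(n)]


def half_sample(stored, n):
    """Interpolate by averaging each pair of consecutive samples
    of the (repeated) vector, giving n interpolated samples."""
    ext = extend(stored, n + 1)
    return [0.5 * (ext[i] + ext[i + 1]) for i in range(n)]


def search(codebook, target):
    """codebook maps (codeword, integer_delay) -> stored samples.

    Returns (squared_error, codeword, delay) for the candidate with
    the smallest squared error against `target`. The interpolated
    candidate shares the codeword and is labelled delay - 0.5
    (a labelling convention assumed for this sketch).
    """
    n = len(target)
    best = None
    for (codeword, delay), stored in codebook.items():
        candidates = [
            (delay, extend(stored, n)),
            (delay - 0.5, half_sample(stored, n)),
        ]
        for d, vec in candidates:
            err = sum((t - v) ** 2 for t, v in zip(target, vec))
            if best is None or err < best[0]:
                best = (err, codeword, d)
    return best
```

The returned codeword and delay are what would be communicated to the synthesizer, which can regenerate the same vector from its own copy of the stored samples. Finer resolutions (one third, one fourth of a sample) would need a different interpolation rule than simple averaging.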