JP2002530706A - Closed-loop variable-rate multimode predictive speech coder

Info

Publication number: JP2002530706A
Application number: JP2000583004A
Authority: JP
Inventors: ダス、アミタバ; マンジュナス、シャラス; デジャコ、アンドリュー・ピー
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 1998-11-13
Filing date: 1999-11-12
Publication date: 2002-09-17
Also published as: AU1524300A; WO2000030075A1; KR20010087393A; EP1129451A1

Abstract

(57) Abstract: A closed-loop, multimode predictive speech coder includes a codec (100, 200) configured to operate in one of a plurality of coding modes, and a closed-loop mode decision module configured to apply the lowest-bit-rate coding mode to an input speech frame. A measure of the codec's performance is obtained and compared with a threshold. If the performance measure does not exceed the threshold, the lowest-bit-rate coding mode is rejected in favor of a coding mode with a higher bit rate. The process can continue until the coding performance is satisfactory. A high-bit-rate direct coding mode may be applied after the lower-bit-rate, prediction-based coding modes have failed to perform adequately.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to the field of speech processing, and more particularly to closed-loop, variable-rate, multimode predictive coding of speech.

[0002]

[Prior art]

Transmission of voice by digital techniques has become widespread, particularly in long-distance and digital radio telephone applications. This has created interest in determining the least amount of information that can be sent over a channel while maintaining the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing it, a data rate on the order of 64 kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, through the use of speech analysis, followed by appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.

[0003] Devices that compress speech by extracting parameters related to a model of human speech generation are called speech coders. A speech coder divides the incoming speech signal into blocks of time, or analysis frames. A speech coder typically comprises an encoder and a decoder, or a codec. The encoder analyzes the incoming speech frame to extract certain relevant parameters and then quantizes the parameters into a binary representation, i.e., a set of bits or a binary data packet. The data packets are transmitted over the communication channel to a receiver and a decoder. The decoder processes the data packets, dequantizes them to recover the parameters, and then resynthesizes the speech frames using the dequantized parameters.

[0004] The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing the natural redundancies inherent in speech. Digital compression is achieved by representing the input speech frame with a set of parameters and using quantization to represent those parameters with a set of bits. If the input speech frame has N_i bits and the data packet produced by the speech coder has N_o bits, the compression factor achieved by the speech coder is C_r = N_i / N_o. The challenge is to retain high voice quality in the decoded speech while achieving the target compression factor. The performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis processes described above, performs, and (2) how well the parameter quantization process performs at the target bit rate of N_o bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
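As a worked illustration of the compression factor C_r = N_i / N_o, the short sketch below plugs in figures that are assumptions chosen for the example: a 20 ms frame of 64 kbps speech (160 samples at 8 bits each) and a 160-bit coded packet. The function name is hypothetical.

```python
def compression_factor(n_i: int, n_o: int) -> float:
    """Return C_r = N_i / N_o (input bits per frame over coded bits per frame)."""
    if n_o <= 0:
        raise ValueError("the coded packet must contain at least one bit")
    return n_i / n_o

# Assumed example figures, not taken from this paragraph:
# 160 samples x 8 bits per sample = 1280 input bits per 20 ms frame.
n_i = 160 * 8
n_o = 160  # assumed coded packet size in bits
print(compression_factor(n_i, n_o))
```

Under these assumed numbers the coder would have to represent each frame with one eighth of the original bits.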

[0005] One effective technique for encoding speech efficiently at low bit rates is multimode coding. A multimode coder applies different modes, or encoding-decoding algorithms, to different types of input speech frames. Each mode, or encoding-decoding process, is customized to represent one type of speech segment (i.e., voiced, unvoiced, or background noise) in the most efficient manner. An external mode decision mechanism examines the input speech frame and decides which mode to apply to the frame. Typically, the mode decision is made in an open-loop fashion by extracting a number of parameters from the input frame and evaluating them to decide which mode to apply. The mode decision is therefore made without knowing in advance the exact condition of the output speech, i.e., how similar the output speech will be to the input speech in terms of voice quality or any other performance measure. An exemplary open-loop mode decision for a speech codec is described in U.S. Pat. No. 5,414,796, which is incorporated herein by reference.

[0006] Multimode coding can be fixed-rate, using the same number of bits N_o for each frame, or variable-rate, in which different bit rates are used for different modes. The goal of variable-rate coding is to use only the number of bits needed to encode the codec parameters to a level adequate to obtain the target quality. As a result, the same target voice quality as that of a fixed-rate, higher-rate coder can be obtained at a significantly lower average rate using variable-bit-rate (VBR) techniques. A conventional VBR speech coder is designed with modes having different bit rates. An exemplary variable-rate speech coder is described in U.S. Pat. No. 5,414,796. The codec described in that patent has four rates: (1) full rate (FR), (2) half rate (HR), (3) quarter rate (QR), and (4) eighth rate (ER). At these rates, each frame of speech is encoded with 160, 80, 40, and 20 bits per frame, respectively. An external open-loop mode decision determines which mode (FR, HR, QR, or ER) to apply to the input speech frame.
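The benefit of variable-rate coding can be made concrete with a small calculation. The sketch below uses the bits-per-frame figures quoted above and assumes a 20 ms frame (the frame length of the exemplary embodiment described elsewhere in this document). The mode mix in the usage lines is made up purely for illustration.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds

# Bits per frame for each rate, as quoted in the paragraph above.
BITS_PER_FRAME = {"FR": 160, "HR": 80, "QR": 40, "ER": 20}

def kbps(bits_per_frame: float, frame_ms: int = FRAME_MS) -> float:
    """Bits per millisecond is numerically equal to kilobits per second."""
    return bits_per_frame / frame_ms

def average_kbps(mode_counts: dict) -> float:
    """Average bit rate over a run of frames, given how often each mode was used."""
    total_frames = sum(mode_counts.values())
    total_bits = sum(BITS_PER_FRAME[mode] * count for mode, count in mode_counts.items())
    return kbps(total_bits / total_frames)

# Hypothetical mode mix over ten frames.
mix = {"FR": 2, "HR": 3, "QR": 1, "ER": 4}
print(kbps(BITS_PER_FRAME["FR"]), average_kbps(mix))
```

With this invented mix the average rate comes out well below the full rate, which is the effect the paragraph describes.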

[0007] There is presently a surge of research interest and a strong commercial need to develop high-quality speech coders operating at medium to low bit rates (i.e., in the range of 2.4 to 4 kbps and below). Application areas include wireless telephony, satellite communications, Internet telephony, various multimedia and voice-streaming applications, voice mail, and other voice storage systems. The driving forces are the need for high capacity and the demand for robust performance under packet-loss conditions. Various recent speech coding standardization efforts are another direct driving force behind research and development of low-rate speech coding algorithms. A low-rate speech coder creates more channels, or users, per allowable application bandwidth, and a low-rate speech coder coupled with an additional layer of suitable channel coding can fit the overall bit budget of the coder specifications and deliver robust performance under channel-error conditions.

[0008]

[Problems to be solved by the invention]

Conventional speech coders typically use some form of prediction scheme to encode the current frame. That is, to encode the current frame, the speech coder exploits the information contained in the last decoded and reproduced frame. This works well because there is typically a strong correlation, or similarity, between consecutive frames. A frame, or short segment, of speech S_cur(n) having N samples can therefore be encoded by a prediction method that forms the encoded frame S_{cur quantized}(n) according to the following equation, where n = 1, 2, ..., N:

S_{cur quantized}(n) = S_{cur predicted}(n) + E_{cur quantized}(n) = S_{prev quantized}(n) * P(n) + E_{cur quantized}(n)

Here "*" denotes the convolution operation; P(n) is a general prediction filter that generates an approximation of the current frame from the previously quantized frame S_{prev quantized}(n) (e.g., S_{cur predicted}(n) = S_{prev quantized}(n) * P(n)); and E_{cur quantized}(n) is the quantized version of the prediction error E_cur(n) of the current frame. The prediction error is defined as E_cur(n) = S_cur(n) - S_{cur predicted}(n).
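The prediction relations above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the coder's actual implementation: the prediction filter P is a short list of taps, the convolution is truncated to the frame length, and the prediction error is quantized with a plain uniform quantizer of step `step`. All function names are hypothetical.

```python
def convolve_same_length(x, p):
    """First len(x) samples of the convolution x * p."""
    return [sum(p[k] * x[n - k] for k in range(len(p)) if n - k >= 0)
            for n in range(len(x))]

def quantize(values, step):
    """Uniform quantizer (an assumption made for this illustration)."""
    return [step * round(v / step) for v in values]

def encode_frame(s_cur, s_prev_q, p, step):
    """Return (S_cur_predicted, E_cur, E_cur_quantized, S_cur_quantized)."""
    s_pred = convolve_same_length(s_prev_q, p)          # S_prev_quantized * P
    e_cur = [s - sp for s, sp in zip(s_cur, s_pred)]    # prediction error
    e_q = quantize(e_cur, step)                         # quantized prediction error
    s_q = [sp + eq for sp, eq in zip(s_pred, e_q)]      # reconstructed frame
    return s_pred, e_cur, e_q, s_q

# Toy data: the previous frame is a good predictor of the current one.
s_prev_q = [1.0, 2.0, 3.0, 4.0]
s_cur = [1.2, 2.1, 2.6, 4.4]
s_pred, e_cur, e_q, s_q = encode_frame(s_cur, s_prev_q, p=[1.0], step=0.5)
print(s_q)
```

Because the decoder also holds S_{prev quantized}, it can form the same prediction and needs only the filter description and the quantized error to rebuild the frame.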

[0009] The performance of a prediction scheme is often measured by the signal-to-noise ratio (SNR) or the perceptual SNR (PSNR), which is typically defined as follows:

(Equation 1)

where W(n) is the perceptual weighting factor for n = 1, 2, ..., N, and N_cur(n) is the error of the overall coding process. The overall coding error is defined as N_cur(n) = S_cur(n) - S_{cur quantized}(n). For the ordinary SNR, W(n) is set equal to 1 for all n = 1, 2, ..., N.
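Equation 1 itself is not reproduced in this text. The sketch below therefore uses one common weighted-SNR form, 10 log10 of the weighted signal energy over the weighted error energy, which is an assumption and may differ in detail from the patent's equation. It is consistent with the surrounding definitions in that W(n) = 1 yields the ordinary SNR and the noise term is N_cur(n) = S_cur(n) - S_{cur quantized}(n).

```python
import math

def weighted_snr_db(s_cur, s_cur_quantized, w=None):
    """Weighted SNR in dB (assumed form); w=None gives the ordinary SNR."""
    if w is None:
        w = [1.0] * len(s_cur)
    signal = sum(wn * s * s for wn, s in zip(w, s_cur))
    noise = sum(wn * (s - q) ** 2 for wn, s, q in zip(w, s_cur, s_cur_quantized))
    if noise == 0.0:
        return math.inf  # perfect reconstruction
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

print(weighted_snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```

A perceptual weighting would be passed through `w`; how W(n) is derived is outside the scope of this sketch.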

[0010] If the error N_cur decreases, the performance, or SNR, of the prediction-based speech coding scheme increases. It is therefore advantageous to minimize the error N_cur. Substituting the definitions above gives

N_cur(n) = S_cur(n) - S_{cur quantized}(n) = [S_cur(n) - S_{cur predicted}(n)] - E_{cur quantized}(n) = E_cur(n) - E_{cur quantized}(n)

so the overall error equals the error made in quantizing the prediction error signal. Because a poor prediction produces a large prediction error E_cur that is harder to quantize with the available bits, the overall error N_cur depends on both how well the prediction performs and how well the prediction error is quantized.

[0011] The prediction filter information must be sent to the decoder using a certain number of bits N_p. The remaining available bits, N_o - N_p, can be used to encode the prediction error signal E_cur. If the prediction from the quantized past frame S_{prev quantized} yields an excellent predicted representation S_{cur predicted} of the current frame S_cur, the prediction error E_cur is small and has a low dynamic range. It is then relatively easy to encode the prediction error E_cur with a small number of bits.

[0012] In a high-bit-rate predictive speech coder, such as the QCELP(TM) 13k vocoder manufactured by Qualcomm Incorporated, the total number of bits per frame N_o is high. For example, QCELP(TM) supports 260 bits per 20 ms frame. Therefore, even after a number of bits N_p are allocated to quantize the prediction filter parameters, enough bits N_o - N_p remain to encode the prediction error accurately. At low bit rates (e.g., 4 kbps and below), however, the total number of available bits (i.e., 80 or fewer per frame) is not large enough to encode both the prediction filter parameters and the prediction error signal accurately. As a result, the overall coding error N_cur becomes large, yielding inadequate performance and a quantized version of the current frame, S_{cur quantized}, that is quite different from the original frame S_cur. Because the encoding of the next frame depends on how well the current frame is encoded, the inadequate performance may also degrade the prediction performance for future frames. There is therefore a need for a variable-rate, multimode predictive coder that can produce high voice quality at low bit rates.
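The bit-budget argument can be checked with simple arithmetic. In the sketch below the frame length of 20 ms and the 13 kbps and 4 kbps rates come from the paragraph above; the value N_p = 30 bits for the prediction filter parameters is a made-up figure used only to show how the remaining budget N_o - N_p shrinks.

```python
def bits_per_frame(rate_bps: int, frame_ms: int = 20) -> int:
    """Total bits available per frame, N_o, at a given bit rate."""
    return rate_bps * frame_ms // 1000

def residual_budget(n_o: int, n_p: int) -> int:
    """Bits left for the prediction error after the filter parameters: N_o - N_p."""
    if n_p > n_o:
        raise ValueError("filter parameters alone exceed the frame budget")
    return n_o - n_p

N_P = 30  # hypothetical filter-parameter allocation, for illustration only
for rate in (13000, 4000):
    n_o = bits_per_frame(rate)
    print(rate, n_o, residual_budget(n_o, N_P))
```

Under this assumed allocation the high-rate coder keeps 230 bits for the prediction error while the 4 kbps coder keeps only 50.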

[0013]

[Means for Solving the Problems]

The present invention is directed to a variable-rate, multimode predictive coder that can produce high voice quality at low bit rates. Accordingly, in one aspect of the invention, a speech coder includes a codec configured to operate in at least one of a plurality of coding modes, and a closed-loop mode decision module coupled to the codec and configured to apply a first coding mode from the plurality of coding modes to an input speech frame, the first coding mode having a first bit rate that is lower than the bit rate of any other coding mode of the plurality of coding modes. The closed-loop mode decision module is further configured to obtain a measure of the codec's performance, compare the performance measure with a predetermined threshold, and, if the performance measure does not exceed the threshold, reject the first coding mode in favor of a second coding mode having a second bit rate greater than the first bit rate.

[0014] In another aspect of the invention, a method of coding a speech frame includes the steps of selecting a first coding mode having a first bit rate to apply to the speech frame; obtaining a measure of coding performance; comparing the measure of coding performance with a threshold; and, if the measure of coding performance does not exceed the threshold, rejecting the first coding mode in favor of a second coding mode having a second bit rate that exceeds the first bit rate.
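The method steps above amount to a try-measure-escalate loop. The sketch below is a minimal illustration under stated assumptions: the "codecs" are toy uniform quantizers with different step sizes standing in for real coding modes, the performance measure is a plain SNR, and the threshold of 15 dB is arbitrary. All names are hypothetical.

```python
import math

def snr_db(original, decoded):
    """Plain SNR in dB, used here as a stand-in performance measure."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def make_toy_codec(step):
    """Toy encode-then-decode: a uniform quantizer (coarser step = fewer bits)."""
    return lambda frame: [step * round(x / step) for x in frame]

def select_mode(frame, modes, measure, threshold):
    """Try modes from lowest to highest bit rate; keep the first whose
    performance measure exceeds the threshold, else fall back to the highest."""
    ordered = sorted(modes, key=lambda m: m[1])
    for name, _rate, codec in ordered[:-1]:
        decoded = codec(frame)
        if measure(frame, decoded) > threshold:
            return name, decoded
    name, _rate, codec = ordered[-1]
    return name, codec(frame)

MODES = [("half", 4000, make_toy_codec(0.5)), ("full", 8000, make_toy_codec(0.05))]
print(select_mode([0.3, -0.7, 0.55, 0.1], MODES, snr_db, threshold=15.0)[0])
```

The decision is closed-loop because the frame is actually coded and decoded before the mode is accepted, in contrast with an open-loop decision based only on features of the input.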

[0015] In another aspect of the invention, a speech coder includes means for selecting a first coding mode having a first bit rate to apply to a speech frame; means for obtaining a measure of coding performance; means for comparing the measure of coding performance with a threshold; and means for rejecting the first coding mode in favor of a second coding mode having a second bit rate that exceeds the first bit rate if the measure of coding performance does not exceed the threshold.

[0016]

BEST MODE FOR CARRYING OUT THE INVENTION

In FIG. 1, a first encoder 10 receives digitized speech samples S(n) and encodes the samples S(n) for transmission on a transmission medium 12, or communication channel 12, to a first decoder 14. The decoder 14 decodes the encoded speech samples and synthesizes an output speech signal S_SYNTH(n). For transmission in the opposite direction, a second encoder 16 encodes digitized speech samples S(n), which are transmitted on a communication channel 18. A second decoder 20 receives and decodes the encoded speech samples and generates a synthesized output speech signal S_SYNTH(n).

[0017] The speech samples S(n) represent speech signals that have been digitized and quantized in accordance with any of various methods known in the art, including, for example, pulse code modulation (PCM), companded μ-law, or A-law. As known in the art, the speech samples S(n) are organized into frames of input data, where each frame comprises a predetermined number of digitized speech samples S(n). In an exemplary embodiment, a sampling rate of 8 kHz is used, with each 20 ms frame comprising 160 samples. In the embodiments described below, the data transmission rate may advantageously be varied on a frame-to-frame basis from 8 kbps (full rate) to 4 kbps (half rate), 2 kbps (quarter rate), and 1 kbps (eighth rate). Varying the data transmission rate is advantageous because lower bit rates can be used selectively for frames containing relatively little speech information. As understood by those skilled in the art, other sampling rates, frame sizes, and data transmission rates may be used.
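The framing described above (8 kHz sampling, 20 ms frames, 160 samples per frame) can be sketched as follows. Zero-padding the final partial frame is an assumption made for the example; the text does not say how a trailing partial frame is handled. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per frame

def split_into_frames(samples, frame_len=FRAME_LEN):
    """Split a sample sequence into fixed-length frames.

    The last frame is zero-padded if the input does not fill it
    (an assumption for this sketch).
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))
        frames.append(frame)
    return frames

frames = split_into_frames(list(range(400)))
print(FRAME_LEN, len(frames))
```

Each resulting frame is the unit on which a rate (full, half, quarter, or eighth) is chosen.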

[0018] The first encoder 10 and the second decoder 20 together constitute a first speech coder, or speech codec. Similarly, the second encoder 16 and the first decoder 14 together constitute a second speech coder. Those skilled in the art will understand that speech coders may be implemented with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate logic, firmware, or any conventional programmable software module and a microprocessor. The software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art. Alternatively, any conventional processor, controller, or state machine could be substituted for the microprocessor. Exemplary ASICs designed specifically for speech coding are described in U.S. Pat. No. 5,727,123 and in U.S. patent application Ser. No. 08/197,417, entitled "VOCODER ASIC," filed February 16, 1994.

[0019] In FIG. 2, an encoder 100 that may be used in a speech coder includes a mode decision module 102, a pitch estimation module 104, an LP analysis module 106, an LP analysis filter 108, an LP quantization module 110, and a residue quantization module 112. Input speech frames S(n) are provided to the mode decision module 102, the pitch estimation module 104, the LP analysis module 106, and the LP analysis filter 108. The mode decision module 102 produces a mode index I_M and a mode M based on the periodicity of each input speech frame S(n). Various methods of classifying speech frames according to periodicity are described in U.S. patent application Ser. No. 08/815,354, entitled "METHOD AND APPARATUS FOR PERFORMING REDUCED RATE VARIABLE RATE VOCODING," filed March 11, 1997. Such methods are also incorporated into the Telecommunications Industry Association interim standards TIA/EIA IS-127 and TIA/EIA IS-733.

[0020] The pitch estimation module 104 produces a pitch index I_P and a lag value P_O based on each input speech frame S(n). The LP analysis module 106 performs linear predictive analysis on each input speech frame S(n) to generate an LP parameter a. The LP parameter a is provided to the LP quantization module 110. The LP quantization module 110 also receives the mode M. The LP quantization module 110 produces an LP index I_LP and a quantized LP parameter [a]. The LP analysis filter 108 receives the quantized LP parameter [a] in addition to the input speech frame S(n). The LP analysis filter 108 generates an LP residue signal R(n), which represents the error between the input speech frame S(n) and the quantized linear predicted parameters. The LP residue R(n), the mode M, and the quantized LP parameter [a] are provided to the residue quantization module 112. Based on these values, the residue quantization module 112 produces a residue index I_R and a quantized residue signal.
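The text does not say how the LP analysis module derives the parameter a. A textbook approach is the autocorrelation method solved with the Levinson-Durbin recursion, and the sketch below uses it purely as an assumption. It also shows an analysis filter producing a residue R(n) = S(n) - sum_k a_k S(n-k). For simplicity the unquantized coefficients are used, whereas the encoder described above filters with the quantized parameter [a]. All function names are hypothetical.

```python
def autocorrelation(s, order):
    """r[k] = sum_n s[n] * s[n-k] for k = 0..order."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns a[0..order-1] such that
    s[n] is approximated by sum_k a[k] * s[n-1-k]."""
    a = [0.0] * (order + 1)   # a[0] is unused; a[j] multiplies s[n-j]
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lp_residual(s, a):
    """Analysis filter: residue = signal minus its linear prediction."""
    return [s[n] - sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(s))]

# A decaying exponential is perfectly predictable with one tap of about 0.5.
signal = [0.5 ** n for n in range(50)]
coeffs = levinson_durbin(autocorrelation(signal, 1), 1)
print(coeffs, lp_residual(signal, coeffs)[:3])
```

For a well-predicted frame the residue is small, which is what makes it cheap to quantize.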

[0021] In FIG. 3, a decoder 200 that may be used in a speech coder includes an LP parameter decoding module 202, a residue decoding module 204, a mode decoding module 206, and an LP synthesis filter 208. The mode decoding module 206 receives and decodes a mode index I_M, generating a mode M from it. The LP parameter decoding module 202 receives the mode M and an LP index I_LP. The LP parameter decoding module 202 decodes the received values to produce a quantized LP parameter [a]. The residue decoding module 204 receives a residue index I_R, a pitch index I_P, and the mode index I_M. The residue decoding module 204 decodes the received values to generate a quantized residue signal [R(n)]. The quantized residue signal [R(n)] and the quantized LP parameter [a] are provided to the LP synthesis filter 208, which synthesizes a decoded output speech signal [S(n)] from them.
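The LP synthesis filter is the inverse of an LP analysis filter: it adds the linear prediction from already-synthesized samples back onto the residue. The sketch below is a minimal all-pole filter under the assumption that the predictor has the form S(n) = R(n) + sum_k a_k S(n-k); the function name and the single-tap example are hypothetical.

```python
def lp_synthesize(residue, a):
    """All-pole synthesis: out[n] = residue[n] + sum_k a[k] * out[n-1-k]."""
    out = []
    for n, r in enumerate(residue):
        prediction = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out.append(r + prediction)
    return out

# A unit impulse through a one-tap filter with a = 0.5 decays geometrically.
print(lp_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
```

In the decoder described above, the inputs would be the quantized residue [R(n)] and the quantized LP parameter [a].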

[0022] The operation and implementation of the various modules of the encoder 100 of FIG. 2 and the decoder 200 of FIG. 3 are known in the art and are described in L. R. Rabiner & R. W. Schafer, Digital Processing of Speech Signals, pp. 396-453 (1978). An exemplary encoder and an exemplary decoder are described in U.S. Pat. No. 5,414,796.

[0023] In one embodiment, a multimode coder first uses an open-loop mode decision, based on parameters extracted from the current frame, to classify the current frame as background noise/silence (N), unvoiced speech (UV), or voiced speech (V). Various speech classification methods used for rate determination are known in the art, including the method described in the aforementioned U.S. Pat. No. 5,414,796. N-type frames are coded in eighth-rate mode, and UV-type frames are coded in quarter-rate mode.

[0024] For V-type frames (i.e., voiced speech frames), either a high rate such as full rate (N_o = N1 bits per frame) or a low rate such as half rate (N_o = N2 bits per frame, where N2 < N1) is used. The full-rate mode is advantageously a prediction-based coding scheme with adequate bits to encode various types of voiced speech accurately, yielding a perceptual signal-to-noise ratio (PSNR) well above the target PSNR (a predefined or variable threshold). The half-rate mode is an efficient prediction-based coding scheme designed to encode frames that are highly correlated with the previous frame (i.e., frames that are very similar to the previous frame). The number of bits used in the half-rate mode, N2 bits per frame, is therefore adequate to encode the prediction parameters of such highly correlated frames as well as the prediction error, which is relatively small because of the high correlation between consecutive frames. Such frames are typically encountered in steady voiced speech segments and are therefore amenable to half-rate coding. In addition, the performance of a prediction-based coding scheme also depends on how accurately the previous frame was quantized. A closed-loop mode selection process is used after the open-loop mode decision, thereby ensuring that the coding performance exceeds the predefined (or variable) target PSNR value. Those skilled in the art will understand that the open-loop mode decision need not necessarily be applied.

[0025] The flowchart of FIG. 4 illustrates a closed-loop, multimode predictive coding technique for speech frames at low bit rates in accordance with one embodiment. In step 300 a frame-number counter is set to one. The algorithm then proceeds to step 302 and begins the coding process. The algorithm proceeds to step 304. In step 304 the algorithm examines the current frame and the previously quantized frame. The algorithm then proceeds to step 306. In step 306 the algorithm determines whether the current frame is to be classified as silence or background noise. This determination is made in accordance with any of various conventional techniques for measuring frame energy, such as computing the sum of squares. If the frame is classified as silence or background noise, the algorithm proceeds to step 308. In step 308 the algorithm applies the eighth-rate coding mode to the frame. The algorithm then proceeds to step 310. If, on the other hand, the frame is not classified as background noise or silence in step 306, the algorithm proceeds to step 312.

【００２６】 In step 312, the algorithm determines whether the current frame should be classified as unvoiced speech. This determination is made according to any of various known methods of periodicity determination, such as the use of zero crossings and normalized autocorrelation functions (NACFs). These techniques are described in U.S. patent application Ser. No. 08/815,354. If the frame is classified as unvoiced speech, the algorithm proceeds to step 314, where the 1/4-rate coding mode is applied to the frame, and then proceeds to step 310. If, on the other hand, the frame is not classified as unvoiced speech in step 312, the algorithm proceeds to step 316 and treats the frame as containing voiced speech. In step 316, the algorithm enters the half-rate, prediction-based coding mode and then proceeds to step 318. In step 318, the PSNR is computed. The algorithm then proceeds to step 320.
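A periodicity test of the kind named in step 312 might look like the sketch below. It is an illustration only: the lag range, both thresholds, and the rule that combines the zero-crossing rate with the NACF peak are assumptions, and the referenced application may define the test differently.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(frame) - 1, 1)


def max_nacf(frame, min_lag=20, max_lag=120):
    """Peak normalized autocorrelation over a range of candidate lags.

    min_lag and max_lag are hypothetical pitch-lag bounds in samples.
    """
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        x, y = frame[lag:], frame[:-lag]
        numerator = sum(a * b for a, b in zip(x, y))
        denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        if denominator > 0:
            best = max(best, numerator / denominator)
    return best


def is_unvoiced(frame, zcr_threshold=0.3, nacf_threshold=0.5):
    """Assumed rule: many zero crossings or weak periodicity means unvoiced."""
    return (zero_crossing_rate(frame) > zcr_threshold
            or max_nacf(frame) < nacf_threshold)
```

A strongly periodic frame yields an NACF peak near 1 and few zero crossings, so it falls through to the voiced branch at step 316.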

【００２７】 In step 320, the algorithm determines whether the computed PSNR is greater than a predetermined threshold, or target, PSNR value. Alternatively, the threshold or target PSNR value may be a function of the average bit rate: for example, the average bit rate is computed periodically and fed back to the algorithm, which adjusts the target threshold accordingly. It will also be appreciated that any conventional measure of performance may be substituted for PSNR. If the computed PSNR exceeds the target PSNR, the algorithm proceeds to step 322, where the half-rate coding mode is applied to the frame, and then proceeds to step 310. If, on the other hand, the computed PSNR does not exceed the target PSNR in step 320, the algorithm proceeds to step 324, where it applies the full-rate coding mode to the frame, and then proceeds to step 310.
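The closed-loop test of steps 318 to 324 can be sketched as follows. This is a hedged illustration: a plain signal-to-noise ratio stands in for the PSNR measure, `half_rate_codec` is a hypothetical callable that encodes and then decodes a frame, and the linear rule in `target_from_average_rate` (with its default numbers) is an invented example of a threshold that depends on the average bit rate.

```python
import math


def snr_db(original, synthesized):
    """Plain SNR in dB, used here as a stand-in for the PSNR measure."""
    signal = sum(s * s for s in original)
    noise = sum((s - r) ** 2 for s, r in zip(original, synthesized))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


def target_from_average_rate(average_bps, nominal_db=20.0,
                             budget_bps=4000.0, slope_db_per_kbps=2.0):
    """Hypothetical rule: relax the target when the average rate is over budget."""
    return nominal_db - slope_db_per_kbps * (average_bps - budget_bps) / 1000.0


def choose_half_or_full(frame, half_rate_codec, target_db):
    """Keep half rate only if its synthesized frame beats the target."""
    synthesized = half_rate_codec(frame)
    return "half" if snr_db(frame, synthesized) > target_db else "full"
```

Feeding the periodically measured average bit rate into `target_from_average_rate` before each call gives the variable-threshold behavior described for step 320.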

【００２８】 In step 310, the frame counter is incremented by one. The algorithm then proceeds to step 326. In step 326, the algorithm determines whether the frame counter value is greater than or equal to the total number of frames to be processed (i.e., whether any frames remain to be processed). If the frame counter value is less than the total number of frames to be processed, the algorithm returns to step 302 and begins the coding process for the next frame. If, on the other hand, the frame counter value is greater than or equal to the total number of frames to be processed, the algorithm proceeds to step 328 and ends the coding process.
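Taken together, the per-frame decisions of FIG. 4 can be sketched as a single loop. The three predicates are hypothetical callables supplied by the caller, and the counter bookkeeping of steps 300, 310, and 326 is replaced here by an ordinary Python iteration over the frames, which is a simplification rather than a literal transcription of the flowchart.

```python
def choose_mode(frame, is_silence, is_unvoiced, half_rate_passes):
    """Return the coding mode for one frame, following the order of FIG. 4."""
    if is_silence(frame):          # step 306
        return "eighth"            # step 308
    if is_unvoiced(frame):         # step 312
        return "quarter"           # step 314
    if half_rate_passes(frame):    # steps 316 to 320
        return "half"              # step 322
    return "full"                  # step 324


def choose_modes(frames, is_silence, is_unvoiced, half_rate_passes):
    """Apply the per-frame decision to every frame in order."""
    return [
        choose_mode(frame, is_silence, is_unvoiced, half_rate_passes)
        for frame in frames
    ]
```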

【００２９】 In another embodiment, the full-rate coding mode described above with reference to FIG. 4 is a high-bit-rate predictive scheme (i.e., one using any bit rate greater than half rate). In one embodiment, a high-bit-rate direct coding scheme is substituted for the full-rate predictive coding mode. A direct coding mode encodes the current speech frame, or its residue, without using any information from previous frames.

【００３０】 A direct coding method is suited to speech segments in which the current frame bears no similarity to the previous frame. One example is the onset of a voiced segment. Another example is a transition between unvoiced and voiced segments. A direct coding method is also useful in the middle of a voiced segment when the cumulative effect of prediction-based coding has degraded the past quantized frames so far that they have drifted out of synchronization with the corresponding original speech frames. In such cases predictive coding fails, even at much higher bit rates, because the previous quantized frame lacks similarity to the past original frame. Capturing the current frame afresh with a direct coding method then not only improves the preservation of the current frame but also facilitates prediction-based coding of the next and subsequent frames, because the prediction scheme is aided by a more accurate memory.

【００３１】 Although the embodiments described above consider four bit rates, those skilled in the art will understand that any reasonable number of bit rates may be used instead. Those skilled in the art will further recognize that the embodiments described herein can be extended to analyze more than one frame at a time, at the expense of additional processing time or capacity.

【００３２】 In one embodiment, two modes with bit rates R1 and R2 are used. The R1 coding method is a high-rate direct coding method. The R2 coding method is a low-rate predictive coding method. A closed-loop decision is made such that the R2 coding method is tried first and its performance is checked against a performance measure; if the performance of the R2 coding mode is inadequate, the algorithm switches to the R1 coding method. In an alternative embodiment, the high-rate R1 coding mode is tried first and its performance is checked against a performance measure; if the performance is satisfactory, the low-rate R2 coding mode is tried. A performance check is then carried out for the R2 coding mode, and if the R2 coding mode performance is inadequate, the R1 coding mode is applied to the frame.

【００３３】 In another embodiment, multiple coding modes with bit rates R1, R2, ..., RN-1, RN (where R1 > R2 > ... > RN-1 > RN) are used. A closed-loop decision is made such that the lowest rate, RN, is tried first. If the RN coding mode performs adequately, it is retained for the frame. Otherwise, the next higher-rate coding mode, RN-1, is applied. The process is repeated until a coding mode performs adequately or the highest-rate mode, R1, is retained. In an alternative embodiment, the highest rate, R1, is tried first. If the R1 mode performs adequately, the next lower-rate coding mode, R2, is tried. The process continues until a given coding mode fails to perform adequately (at which point the last coding mode to perform adequately is applied), or until the lowest-rate coding mode, RN, performs adequately and is applied.
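The two search orders described for N modes can be sketched as follows. The list `modes` is assumed to be ordered from the highest rate (R1) to the lowest (RN), and `performs_adequately` is a hypothetical callable that codes the frame in the given mode and compares the result with the performance threshold. Falling back to R1 when even R1 fails in the top-down search is an assumption; the text does not cover that case.

```python
def select_lowest_first(modes, performs_adequately):
    """Try RN, RN-1, ... and keep the first mode that performs adequately."""
    for mode in reversed(modes[1:]):
        if performs_adequately(mode):
            return mode
    return modes[0]  # the highest-rate mode R1 is retained


def select_highest_first(modes, performs_adequately):
    """Try R1, R2, ... and keep the last mode that performed adequately."""
    chosen = modes[0]  # assumption: R1 is used even if it fails the test
    for mode in modes:
        if not performs_adequately(mode):
            break
        chosen = mode
    return chosen
```

With two modes, `select_lowest_first([R1, R2], ...)` reduces to the R2-first decision of the two-mode embodiment. The bottom-up order costs fewer trial encodings when low-rate modes usually suffice, whereas the top-down order stops at the first failure.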

【００３４】 In another embodiment, multiple coding modes with bit rates R1, R2, ..., Rm-1, Rm, Rm+1, ..., RN are used. The bit rates have the following relative magnitudes: R1 > R2 > ... > Rm-1 > Rm > Rm+1 > ... > RN. The closed-loop mode decision works in conjunction with an open-loop mode decision. Based on parameters such as frame energy or frame periodicity, the open-loop mode decision directs the coder to apply the mode with bit rate Rm, at which point the closed-loop mode decision takes over. The closed-loop mode decision applies the Rm coding mode, tests the performance, and retains the Rm coding mode if the performance is satisfactory. Otherwise, the closed-loop mode decision tries the next higher-rate coding mode, Rm-1. The process is repeated until a coding mode performs adequately or the highest-rate mode, R1, is retained. Alternatively, the closed-loop mode decision applies the Rm coding mode, tests the performance, and retains the Rm coding mode if the performance is adequate. Otherwise, the closed-loop mode decision tries the next lower-rate coding mode, Rm+1. The process is repeated until a coding mode performs inadequately (at which point the last coding mode to perform adequately is applied) or the lowest-rate mode, RN, is retained.

【００３５】 In another embodiment, multiple coding modes with bit rates R1, R2, ..., RN (where R1 > R2 > ... > RN) are used. All of the coding modes are applied to the input speech frame in parallel, and the performance of each coding mode is compared with a set of N threshold performance measures. The coding mode that appears to produce the most accurate result is selected.

【００３６】 In another embodiment, multiple coding modes with bit rates R1, R2, ..., RN (where R1 > R2 > ... > RN) are used. All of the coding modes are applied to the input speech frame in parallel, and the performance of each coding mode is compared with a set of N performance threshold measures. If several coding modes exceed their performance threshold targets, the coding mode that has the lowest bit rate (and still performs above its performance threshold) is selected.
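The two parallel-evaluation embodiments differ only in the selection rule, which the sketch below makes explicit. The `Trial` record and its field names are hypothetical; each record is assumed to hold the result of coding the same frame in one mode, together with that mode's own threshold. Returning `None` when no mode passes is also an assumption, since the text does not say what happens in that case.

```python
from typing import NamedTuple, Optional, Sequence


class Trial(NamedTuple):
    name: str           # mode label, e.g. "R2"
    bit_rate: float     # bits per second
    performance: float  # measured performance, e.g. in dB
    threshold: float    # threshold assigned to this mode


def pick_most_accurate(trials: Sequence[Trial]) -> Trial:
    """Select the mode with the best measured performance."""
    return max(trials, key=lambda t: t.performance)


def pick_lowest_rate_passing(trials: Sequence[Trial]) -> Optional[Trial]:
    """Select the cheapest mode among those that exceed their threshold."""
    passing = [t for t in trials if t.performance > t.threshold]
    if not passing:
        return None
    return min(passing, key=lambda t: t.bit_rate)
```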

【００３７】 In another embodiment, multiple coding modes with bit rates R1, R2, ..., 1/4 rate, ..., half rate, ..., RN (where R1 is full rate and RN is 1/8 rate) are used. The closed-loop mode decision is made in conjunction with an open-loop mode decision. Based on parameters such as frame energy or frame periodicity, the open-loop mode decision directs the coder to apply the full-rate coding mode to unvoiced-to-voiced transition frames, voiced-to-voiced transition frames, nonstationary voiced segments, and nonstationary unvoiced segments. Also based on frame parameters, the open-loop mode decision directs the coder to apply the half-rate coding mode to steady voiced segments that exhibit a high degree of similarity between frames. Also based on frame parameters, the open-loop mode decision directs the coder to apply the 1/4-rate coding mode to steady unvoiced segments. Also based on frame parameters, the open-loop mode decision directs the coder to apply the 1/8-rate coding mode to background noise and other nonspeech signals such as silence. Once the open-loop mode decision has selected a coding mode to apply to the frame, the closed-loop mode decision takes over. The closed-loop mode decision applies the coding mode selected by the open-loop mode decision, tests the performance, and retains the selected coding mode if the performance is adequate. Otherwise, the closed-loop mode decision tries the next higher-rate coding mode. The process is repeated until a coding mode performs adequately or the full-rate mode is retained. Alternatively, the closed-loop mode decision applies the coding mode selected by the open-loop mode decision, tests the performance, and retains the selected coding mode if the performance is adequate. Otherwise, the closed-loop mode decision tries the next lower-rate coding mode. The process is repeated until a coding mode performs inadequately (at which point the last coding mode to perform adequately is applied) or the lowest-rate mode is retained.
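The combination of an open-loop starting point with upward closed-loop refinement can be sketched as follows. The frame-class labels in `OPEN_LOOP_CHOICE` are hypothetical names for the segment types listed above, and `performs_adequately` is a hypothetical callable that codes the frame at the given rate and checks the result against the threshold. Only the first (upward) variant is shown.

```python
RATES_HIGH_TO_LOW = ["full", "half", "quarter", "eighth"]

# Hypothetical labels for the frame classes named in the text.
OPEN_LOOP_CHOICE = {
    "transition": "full",
    "nonstationary": "full",
    "steady_voiced": "half",
    "steady_unvoiced": "quarter",
    "background": "eighth",
}


def closed_loop_refine(frame_class, performs_adequately):
    """Start at the open-loop rate and step up until a rate is adequate."""
    start = RATES_HIGH_TO_LOW.index(OPEN_LOOP_CHOICE[frame_class])
    for index in range(start, 0, -1):
        rate = RATES_HIGH_TO_LOW[index]
        if performs_adequately(rate):
            return rate
    return "full"  # the full-rate mode is retained without a further test
```

Starting from the open-loop choice avoids trial encodings at rates the frame class has already ruled out.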

【００３８】 In another embodiment, a multimode coder includes a first set of N modes Mi with respective bit rates Ri, where i = 1, 2, ..., N. The coder also has a second set of N modes MCCi with respective bit rates RCCi, where i = 1, 2, ..., N. The MCCi and Mi coding modes each use the same source coding mode (i.e., the same encoder and decoder). The MCCi coding mode, however, includes an additional layer of channel protection in which (RCCi - Ri) bits are used for robust protection of the parameters of the Mi coding mode under the worst possible channel conditions of the communication system. The performance, or voice quality, delivered by the Mi coding mode under error-free channel conditions is similar to that obtained by the MCCi coding mode under the worst possible channel-error conditions. The (RCCi - Ri) channel coding bits serve to provide adequate protection under the assumed, or target, worst-case channel conditions. The assumed worst-case channel condition may conveniently be, for example, a predefined percentage frame error rate (FER).

【００３９】 In this particular embodiment, the closed-loop mode decision effectively takes both channel variations and source variations into account in order to provide a guaranteed quality of service. For example, the source-controlled closed-loop mode decision described above is applied first. The closed-loop mode decision directs the coder to use the Mi coding mode. An external network control indicator SW, a signal supplied to the speech encoder by the communication network, indicates whether the communication channel is in a good state (e.g., SW = 1, meaning no channel errors) or a bad state (e.g., SW = 0, meaning the channel has errors). If the channel is in a good state, coding mode Mi with bit rate Ri is used. If, on the other hand, the channel is in a bad state, coding mode MCCi with bit rate RCCi is used.

【００４０】 Those skilled in the art will recognize that the number of network states need not be limited to two. Accordingly, in one embodiment, the multimode coder is designed to account for M different possible network states by providing, for each original source-controlled coding mode Mi, M different modes MCCi,j with rates RCCi,j, where j = 1, 2, ..., M. Because (RCCi,j - RCCi) represents the minimum number of bits needed to add channel error protection in the channel coding layer, such a scheme allows a variable amount of channel coding, so that the channel error protection is suited to the worst-case scenario of the j-th channel error condition. The source-controlled closed-loop mode decision then first determines the coding mode Mi to apply, and selects coding mode MCCi,j based on the value of SW = j (where j = 1, 2, ..., M). Such a closed-loop, combined network- and source-controlled codec provides a guaranteed quality of service across various channel conditions while delivering a low average bit rate.
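The network-controlled selection can be sketched as a table lookup performed after the source-controlled decision has produced the mode index i. The function names, the dictionary layout, and the returned label format are all hypothetical; the rate tables would be design parameters of the coder.

```python
def select_two_state(i, sw, source_rates, protected_rates):
    """Two channel states: SW = 1 (good) uses Mi, SW = 0 (bad) uses MCCi.

    source_rates maps i to Ri; protected_rates maps i to RCCi.
    """
    if sw == 1:
        return (f"M{i}", source_rates[i])
    return (f"MCC{i}", protected_rates[i])


def select_multi_state(i, j, protected_rates_by_state):
    """M channel states: SW = j selects MCCi,j at rate RCCi,j.

    protected_rates_by_state maps the pair (i, j) to RCCi,j.
    """
    return (f"MCC{i},{j}", protected_rates_by_state[(i, j)])
```

For example, with illustrative rates `{2: 4000}` for Ri and `{2: 6000}` for RCCi, `select_two_state(2, 0, ...)` returns the protected mode at the higher rate, and the 2000 extra bits per second correspond to the channel coding overhead (RCCi - Ri).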

【００４１】 Preferred embodiments of the present invention have thus been shown and described. It will be apparent to those skilled in the art, however, that numerous changes may be made to the embodiments described herein without departing from the scope of the invention. The invention is therefore limited only by the claims.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a communication channel terminated at each end by a speech coder.

FIG. 2 is a block diagram of an encoder.

FIG. 3 is a block diagram of a decoder.

FIG. 4 is a flowchart illustrating the steps of a closed-loop, multimode, predictive coding technique for speech frames at low bit rates.

Continuation of front page

(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW

(72) Inventor: Sharath Manjunath, 7104 Ceiling Avenue No. 5, San Diego, California 92126, United States

(72) Inventor: Andrew P. DeJaco, 10424 Flanders Cove, San Diego, California 92126, United States

F-terms (reference): 5D045 CC02 DA20; 5J064 AA01 BB03 BC02 BC11 BC16 BC22 BC25 BD02

Claims

[Claims]

1. A speech coder comprising: a codec configured to operate in at least one of a plurality of coding modes; and a closed-loop mode decision module coupled to the codec and configured to apply a first coding mode from the plurality of coding modes to an input speech frame, wherein the first coding mode has a first bit rate that is lower than the bit rate of any other coding mode of the plurality of coding modes, and the closed-loop mode decision module is further configured to obtain a measure of performance of the codec, compare the measure of performance with a predetermined threshold, and, if the measure of performance does not exceed the threshold, reject the first coding mode in favor of a second coding mode having a second bit rate greater than the first bit rate.

2. The speech coder of claim 1, wherein the closed-loop mode decision module is configured to continue the selection process, rejecting successively selected coding modes in order of increasing bit rate based on performance.

3. The speech coder of claim 1, wherein the measure of performance is obtained by comparing a resultant synthesized speech frame with the input speech frame.

4. The speech coder of claim 1, wherein the first coding mode is a prediction-based coding mode and the second coding mode is a direct coding mode.

5. The speech coder of claim 1, further comprising an open-loop mode decision module coupled to the codec and configured to select one of the plurality of coding modes to apply to the input speech frame before the closed-loop mode decision module applies a coding mode, wherein the closed-loop mode decision module is configured to apply first the coding mode selected by the open-loop mode decision module.

6. The speech coder of claim 2, further comprising an open-loop mode decision module coupled to the codec and configured to select one of the plurality of coding modes to apply to the input speech frame before the closed-loop mode decision module applies a coding mode, wherein the closed-loop mode decision module is configured to apply first the coding mode selected by the open-loop mode decision module.

7. The speech coder of claim 1, wherein the threshold is a predetermined amount.

8. The speech coder of claim 1, wherein the threshold is a function of the average bit rate.

9. A method of coding a speech frame, comprising the steps of: selecting a first coding mode having a first bit rate to apply to the speech frame; obtaining a measure of coding performance; comparing the measure of coding performance with a threshold; and, if the measure of coding performance does not exceed the threshold, rejecting the first coding mode in favor of a second coding mode having a second bit rate greater than the first bit rate.

10. The method of claim 9, further comprising the step of repeating the obtaining, comparing, and rejecting steps in successive order until the measure of coding performance exceeds the threshold.

11. The method of claim 9, wherein the obtaining step comprises the step of comparing a resultant synthesized speech frame with the speech frame.

12. The method of claim 9, wherein the first coding mode is a prediction-based coding mode and the second coding mode is a direct coding mode.

13. The method of claim 9, wherein the selecting step comprises the step of selecting the first coding mode based on parameters of the speech frame.

14. The method of claim 10, wherein the selecting step comprises the step of selecting the first coding mode based on parameters of the speech frame.

15. The method of claim 9, wherein the comparing step comprises the step of comparing the measure of coding performance with a predetermined threshold.

16. The method of claim 9, wherein the comparing step comprises the step of comparing the measure of coding performance with a threshold that is a function of the average bit rate.

17. A speech coder comprising: means for selecting a first coding mode having a first bit rate to apply to a speech frame; means for obtaining a measure of coding performance; means for comparing the measure of coding performance with a threshold; and means for rejecting the first coding mode in favor of a second coding mode having a second bit rate greater than the first bit rate if the measure of coding performance does not exceed the threshold.

18. The speech coder of claim 17, further comprising means for continuing to obtain the measure of performance, compare the measure of performance with the threshold, and reject coding modes in favor of other coding modes having greater bit rates until the measure of coding performance exceeds the threshold.

19. The speech coder of claim 17, wherein the means for obtaining comprises means for comparing a resultant synthesized speech frame with the speech frame.

20. The speech coder of claim 17, wherein the first coding mode is a prediction-based coding mode and the second coding mode is a direct coding mode.

21. The speech coder of claim 17, wherein the means for selecting comprises means for selecting the first coding mode based on parameters of the speech frame.

22. The speech coder of claim 18, wherein the means for selecting comprises means for selecting the first coding mode based on parameters of the speech frame.

23. The speech coder of claim 17, wherein the threshold is a predetermined amount.

24. The speech coder of claim 17, wherein the threshold is a function of the average bit rate.