JPH05276049A - Voice coding method and its device

Info

Publication number: JPH05276049A
Application number: JP33252891A
Authority: JP
Inventors: Seiji Sasaki; 誠司佐々木
Original assignee: Kokusai Electric Corp
Current assignee: Kokusai Electric Corp
Priority date: 1991-11-21
Filing date: 1991-11-21
Publication date: 1993-10-22

Abstract

PURPOSE: To improve reproduced speech quality by applying to unvoiced periods a coding method that takes into account that the spectrum envelope changes rapidly and in a complex manner and that the short-term prediction residual signal can be approximated by pseudo-noise. CONSTITUTION: A discriminator 31 determines, frame by frame, whether an input speech signal (k) is in a voiced or an unvoiced period, and outputs a voiced/unvoiced discrimination flag Pg to a voiced/unvoiced changeover device 32. When the unvoiced coder is selected, the input signal (k) is fed to a short-term prediction analyzer 39, which separates it into spectrum envelope information Pk and a short-term prediction residual signal (p), i.e. the input signal (k) with the envelope information Pk removed. A power calculator 40 then calculates the power Ps of the signal (p). The power information Ps, the envelope information Pk and the discrimination flag Pg are supplied to an unvoiced-period signal encoder 41, which encodes them into a digital signal string (q) that is multiplexed and sent to a transmission line. Rapid and complex changes in the spectrum envelope during unvoiced periods are thus handled easily, and speech in both voiced and unvoiced periods is reproduced faithfully.

Description

【発明の詳細な説明】Detailed Description of the Invention

【０００１】[0001]

【産業上の利用分野】[Industrial Field of Application] Automobile telephones and mobile telephones are spreading remarkably, and current analog communication systems are expected to become unable to accommodate the growing number of subscribers. To use the radio spectrum more effectively, a transition to digital communication systems is being planned, and the standard specification for the first generation (full rate) has been published by the RCR (Research and Development Center for Radio Systems). The coding rate of the speech coding scheme in that specification is 11.2 kbps (kilobits per second), counting speech data and redundant data for error correction. Half-rate speech coding is further planned with the aim of doubling spectrum efficiency. The coding rate of this half-rate speech coding is 5.6 kbps, again counting speech data and redundant data for error correction. The present invention relates to a speech coding method and apparatus for compressing speech data, used in speech-coded digital communication intended for application to half-rate systems.

【０００２】[0002]

【従来の技術】[Prior Art] FIG. 4 is a block diagram showing a conventional adaptive transform codec using pitch prediction, in which (A) is the speech coding apparatus and (B) is the speech decoding apparatus. This scheme compresses speech data from, for example, 64 kbps (6.4 kHz sampling, 10-bit quantization) to 4.5 kbps. When the scheme is applied to half rate, the speech data occupies 4.5 kbps, so 1.1 kbps is allocated to error-correction redundant bits.

【０００３】[0003] A method of compressing speech data from 64 kbps to 4.5 kbps is described below as an example. On the transmitting side in FIG. 4(A), the input speech signal (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is processed frame by frame (30 msec: 192 samples) by the long-term prediction analyzer 1, which extracts and outputs pitch information Pa and also outputs a long-term prediction residual signal, i.e. the signal with the pitch component removed. The long-term prediction residual signal is divided into subframes (15 msec: 96 samples) and then transformed into the frequency domain by the discrete cosine transform (DCT) unit 2, which outputs DCT coefficients (96 samples per subframe). The DCT equations are given later. The DCT coefficients are decimated by the adaptive decimator 3 in each subframe to compress the information. Because the amplitude of each DCT coefficient varies from subframe to subframe, the decimation adapts to it: a limited number of large-amplitude DCT coefficients are selected, and their amplitude and position information are sent to the receiving side as DCT information Pb. The remaining small-amplitude DCT coefficients are set to 0. The pitch information Pa and the DCT information Pb are converted into a digital signal sequence by the encoder 4, multiplexed, and transmitted.
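The adaptive decimation described above (keep a limited number of large-amplitude DCT coefficients, send their amplitudes and positions, zero the rest) can be sketched as follows. This is a minimal illustration only: the function names `decimate` and `reconstruct`, the toy coefficient values and the number of kept coefficients are assumptions, since the source does not state how many coefficients are retained or how the amplitudes are quantized.

```python
def decimate(coeffs, keep):
    """Select the `keep` largest-magnitude coefficients.

    Returns (position, amplitude) pairs in ascending position order.
    `keep` is a hypothetical parameter; the source only says a
    "limited number" of coefficients is selected.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    positions = sorted(order[:keep])
    return [(i, coeffs[i]) for i in positions]


def reconstruct(pairs, length):
    """Rebuild a coefficient block, inserting 0 where nothing was sent."""
    out = [0.0] * length
    for position, amplitude in pairs:
        out[position] = amplitude
    return out


# Toy 8-coefficient block (a real subframe has 96 coefficients).
coeffs = [0.1, -3.0, 0.4, 2.5, -0.2, 0.05, 1.2, -0.3]
pairs = decimate(coeffs, keep=3)            # [(1, -3.0), (3, 2.5), (6, 1.2)]
restored = reconstruct(pairs, len(coeffs))  # small coefficients become 0.0
```

The sketch makes the weakness of the scheme visible: every kept coefficient costs both a position and an amplitude, so lowering the bit rate directly reduces the number of coefficients that survive.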

【０００４】[0004] On the receiving side, the digital signal sequence is received and separated by the separation circuit 5 into pitch information Pc and DCT information Pd. The adaptive decimation decoder 6 reproduces the transmitted DCT coefficients from the DCT coefficient amplitude information and position information in the pitch information Pc. The coefficients that were not sent are interpolated by inserting 0 at their positions. The reproduced DCT coefficients are transformed into the time domain by the inverse discrete cosine transform (IDCT) unit 7 to reproduce the long-term prediction residual signal. The long-term prediction synthesizer 8 decodes and reproduces the speech signal by adding the DCT information Pd to the long-term prediction residual signal. Table 1 below shows an example of the bit allocation per frame (30 msec) of the conventional codec. Five synchronization bits are inserted at the head of each frame for frame synchronization. Converted to a per-second rate, the total in Table 1 gives 135 bits / 30 msec = 4.5 kbps.

【表１】 [Table 1]

【０００５】[0005] In mobile communication systems such as mobile and automobile telephones, transmission channel conditions are harsh compared with wired or fixed communication systems: the bit error rate is constantly 0.1% to 1%, and it is not rare for it to reach about 10%. A half-rate speech coding scheme therefore needs a powerful error-correction capability, and about 35% (roughly 2 kbps) or more of the total coding rate (5.6 kbps) has to be allocated to redundant bits for error correction. Consequently, a half-rate speech coding scheme is required to deliver high-quality reproduced speech (equivalent to 6-bit log-PCM) at a speech data coding rate of about 3.6 kbps or less.
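The rate figures quoted so far follow from simple arithmetic, which the short sketch below reproduces. All numbers are taken from the text; the helper names `rate_kbps` and `samples_per_frame` are hypothetical and introduced only for illustration.

```python
def rate_kbps(bits_per_frame, frame_ms):
    """Bits per millisecond is numerically equal to kilobits per second."""
    return bits_per_frame / frame_ms


def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples in one frame at the given sampling rate."""
    return sample_rate_hz * frame_ms // 1000


input_rate = 6400 * 10 / 1000        # 6.4 kHz x 10 bits = 64 kbps
frame_len = samples_per_frame(6400, 30)   # 192 samples per 30 msec frame
conventional = rate_kbps(135, 30)    # 135 bits per 30 msec frame = 4.5 kbps

total_rate = 5.6                     # half-rate channel, kbps
fec_rate = total_rate * 0.35         # about 2 kbps for error correction
speech_budget = total_rate - fec_rate  # about 3.6 kbps left for speech data
```

With roughly 3.6 kbps available for speech, the 4.5 kbps conventional codec is already over budget before any quality requirement is considered.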

【０００６】[0006] The reproduced speech quality of the conventional scheme described above is only equivalent to 4-bit log-PCM at a speech coding rate of 4.5 kbps. If the speech coding rate is lowered further to 3.6 kbps or less, fewer DCT coefficients can be transmitted and distortion in the frequency domain grows, so the reproduced speech quality degrades even more. In other words, the conventional scheme cannot lower the coding rate to 3.6 kbps or less while keeping the reproduced speech quality equivalent to 6-bit log-PCM, and so cannot meet the performance (quality and error-correction capability) required of a half-rate speech coder. To solve this problem of the prior art, the present inventor previously proposed a speech coding method and apparatus that use a vector quantizer in place of the adaptive decimator 3 of FIG. 4. FIG. 3 is a block diagram of the speech coding apparatus according to that proposal, in which (A) is the speech coding apparatus and (B) is the speech decoding apparatus. In FIG. 3(A), the input speech signal a (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is processed by the long-term prediction analyzer 11, which extracts and outputs pitch information Pa for each frame (10 msec: 64 samples) and also generates and outputs a long-term prediction residual signal b, i.e. the input signal a with the pitch component removed. The discrete cosine transform (DCT) unit 12 transforms this signal into the frequency domain for each subframe (10 msec: 64 samples) and outputs its frequency components, the DCT coefficients c (64 coefficients). The discrete cosine transform is described later. The normalizer 13 normalizes the DCT coefficients c by their maximum value, yielding the DCT coefficient maximum value Pb and the normalized DCT coefficients d. The normalized DCT coefficients d are then vector-quantized by the vector quantizer 14 and the codebook 15. The codebook 15 stores, for example, 512 vector patterns. The vector quantizer 14 compares the DCT coefficients d, an unknown input vector, with the vectors in the codebook 15, selects the vector with the smallest inter-vector distance, and outputs its number Pc. The vector number Pc is quantized to 9 bits. The effect of vector quantization is explained later. The encoder 16 encodes the vector number Pc, together with the DCT coefficient maximum value Pb and the pitch information Pa, into a digital sequence signal e, which is multiplexed and sent to the transmission line.
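The normalize-then-vector-quantize step can be illustrated with a small sketch. Several details are assumptions, because the source does not specify them: normalization here divides by the largest absolute value, the inter-vector distance is taken to be squared Euclidean distance, and the four-entry, four-dimensional codebook is a toy stand-in for the 512-entry, 64-dimensional codebook of the text. The function names are hypothetical.

```python
def normalize(coeffs):
    """Return (peak, normalized coefficients); peak is the largest magnitude."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0.0:
        return 0.0, list(coeffs)
    return peak, [c / peak for c in coeffs]


def nearest(vector, codebook):
    """Index of the codebook vector with the smallest squared distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


def decode(index, peak, codebook):
    """Inverse quantization followed by inverse normalization."""
    return [peak * c for c in codebook[index]]


# Toy codebook shared by encoder and decoder (assumed values).
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -0.5, 0.0, 0.0],
    [-1.0, 0.5, 0.25, 0.0],
]

coeffs = [4.0, -1.8, 0.2, 0.0]
peak, normalized = normalize(coeffs)     # peak = 4.0
index = nearest(normalized, codebook)    # 2
decoded = decode(index, peak, codebook)  # [4.0, -2.0, 0.0, 0.0]

# Bits needed to address a 512-entry codebook, as in the text.
index_bits = (512 - 1).bit_length()      # 9
```

Only `index` and `peak` need to be transmitted for the coefficient block, which is why the bit cost no longer grows with the number of coefficients.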

【０００７】[0007] On the receiving side in (B), the separation circuit 21 separates the digital sequence signal f received over the transmission line into the DCT vector number Pd, the DCT coefficient maximum value Pe and the pitch information Pf. Using the DCT vector number Pd, the vector inverse quantizer 22 and the codebook 23 perform inverse vector quantization to reproduce the normalized DCT coefficients g. The codebook 23 has the same contents as the codebook 15 of the transmitting-side coding apparatus, so specifying the vector number Pd yields the same vector that was selected on the transmitting side. The inverse normalizer 24 reproduces the DCT coefficients h by multiplying the normalized DCT coefficients g by the DCT coefficient maximum value Pe. The inverse discrete cosine transform (inverse DCT) unit 25 transforms the DCT coefficients h into the time domain to reproduce the long-term prediction residual signal i. The long-term prediction synthesizer 26 adds the pitch information Pf to the long-term prediction residual signal i to decode and reproduce the speech signal j.

【０００８】[0008] In the method of FIG. 3, vector quantization is applied to the DCT coefficients of the long-term prediction residual signal, from which the pitch information has been removed. Vector quantization treats a group of N consecutive sample values as one point in an N-dimensional space, or equivalently as one N-dimensional vector from the origin to that point, and quantizes vector by vector. The effect of using vector quantization is as follows. A characteristic of speech signals is that, owing to the shape of the spectrum envelope, their frequency components are large in the low band (about 1 kHz and below) and small in the high band. Because the long-term prediction residual signal handled in the present invention still contains the spectrum envelope information, the distribution of its frequency components, the DCT coefficients, is skewed. By representing the vectors in each region where the distribution is concentrated with a single representative vector, efficient quantization that takes the statistical properties of speech into account can be achieved. The representative vector patterns are stored in a codebook in advance, and identical codebooks are provided on the encoder side and the decoder side. The encoder searches for the representative vector closest to the unknown input vector and sends its number, which allows the decoder to reproduce the DCT coefficients. For example, if the DCT coefficients of each frame (96 samples) are treated as one 96-dimensional vector and the codebook stores 512 representative vectors, the vector number can be expressed in log₂ 512 = 9 bits. That is, the 96 DCT coefficients can be quantized with 9 bits, so a speech coder with a lower coding rate than conventional scalar quantization can be realized.

【０００９】[0009] Table 2 below shows the bit allocation (per 10 msec frame) in the previously proposed scheme. With this bit allocation, the coding rate is about 2.4 kbps (24 bits / 10 msec).

【表２】 [Table 2]

【００１０】[0010]

【発明が解決しようとする課題】[Problems to Be Solved by the Invention] The problem with this scheme, however, is that when it is implemented in hardware, memory capacity is limited, which restricts the size of the codebooks 15 and 23, so higher reproduced speech quality cannot be expected. In practice, about 512 vectors is the limit for a codebook. Moreover, because a single codebook represents the spectrum envelope and fine structure of both voiced and unvoiced speech, the scheme cannot cope with unvoiced speech, whose spectrum envelope changes rapidly and in a complex manner over time, and the clarity of the reproduced speech deteriorates. The quality required of a half-rate speech codec (equivalent to 6-bit log-PCM) therefore cannot be fully met. An object of the present invention is to provide a speech coding method and apparatus that are applicable to half-rate systems with a speech coding rate of 3.6 kbps or less and that achieve reproduced speech quality equivalent to 6-bit log-PCM or better.

【００１１】[0011]

【課題を解決するための手段】[Means for Solving the Problems] To solve the problems of the conventional schemes, the present invention uses, for voiced periods, the same coding and decoding method as the previously proposed method, while for unvoiced periods it applies a linear predictive coding (LPC) type coding method that takes into account the properties of unvoiced speech, namely that (1) the spectrum envelope changes rapidly and in a complex manner and (2) the short-term prediction residual signal can be approximated by pseudo-noise. The configuration is as follows. When the input speech signal is in a voiced period, the input speech signal is subjected to long-term prediction analysis to extract pitch information and a long-term prediction residual signal from which the pitch information has been removed; the long-term prediction residual signal is transformed into the frequency domain by a discrete cosine transform; the resulting DCT coefficients are normalized by their maximum value and vector-quantized; the number of the vector that best approximates them is read from a codebook; and the vector number, the maximum value of the DCT coefficients, the pitch information and a flag indicating a voiced period are encoded into a digital signal sequence for the voiced period. When the input speech signal is in an unvoiced period, the input speech signal is subjected to short-term prediction analysis to extract spectrum envelope information and a short-term prediction residual signal from which the spectrum envelope information has been removed; the power of the short-term prediction residual signal is calculated; and the power information, the spectrum envelope information and a flag indicating an unvoiced period are encoded into a digital signal sequence for the unvoiced period. The digital signal sequences for the voiced and unvoiced periods are multiplexed and sent to a transmission line. On the receiving side, the flag is used to determine whether the received multiplexed digital signal sequence belongs to a voiced or an unvoiced period, and the sequence is separated accordingly. For a voiced period, the vector number, the maximum value of the DCT coefficients and the pitch information are extracted; the quantization vector corresponding to the vector number is read from a codebook to obtain the normalized DCT coefficients by inverse vector quantization; the DCT coefficients are reproduced from the normalized DCT coefficients and the maximum value of the DCT coefficients; the long-term prediction residual signal is reproduced by an inverse discrete cosine transform; and a voiced speech signal is output by long-term prediction synthesis, in which the pitch information is added to the long-term prediction residual signal. For an unvoiced period, the power information and the spectrum envelope information are extracted; the short-term prediction residual signal is approximately reproduced by applying the power information to pseudo-noise; and an unvoiced speech signal is output by short-term prediction synthesis of this residual signal with the spectrum envelope information.

【００１２】[0012]

【実施例】[Embodiment] FIG. 1 is a block diagram showing an embodiment of the speech coding apparatus of the present invention, and FIG. 2 is a block diagram showing an embodiment of the speech decoding apparatus of the present invention. In FIG. 1, the input speech signal k (64 kbps), sampled at 6.4 kHz and quantized to 10 bits, is examined frame by frame (20 msec: 128 samples) by the voiced/unvoiced discriminator 31 to determine whether it is in a voiced or an unvoiced period, and a voiced/unvoiced discrimination flag Pg indicating the result is output. The flag Pg is input to the two voiced/unvoiced switches 32, which switch the input and output of the speech coder to either the voiced coder or the unvoiced coder. When the voiced coder is selected, the long-term prediction analyzer 33 extracts and outputs pitch information Ph from the input speech k for each subframe (10 msec: 64 samples), and also generates and outputs a long-term prediction residual signal, i.e. the input signal k with the pitch component removed. The discrete cosine transform (DCT) unit 34 transforms this signal into the frequency domain for each subframe (10 msec: 64 samples) and outputs its frequency components, the DCT coefficients m (64 coefficients). The discrete cosine transform is described later. The normalizer 35 normalizes the DCT coefficients m by their maximum value, yielding the DCT coefficient maximum value Pi and the normalized DCT coefficients n.

【００１３】[0013] The normalized DCT coefficients n are then vector-quantized by the vector quantizer 36 and the codebook 37. The codebook 37 stores, for example, 512 vector patterns. The vector quantizer 36 compares the DCT coefficients n, an unknown input vector, with the vectors in the codebook 37, selects the vector with the smallest inter-vector distance, and outputs its number Pj. The vector number Pj is quantized to 9 bits. The voiced-sound encoder 38 encodes the vector number Pj, together with the DCT coefficient maximum value Pi, the pitch information Ph and the voiced/unvoiced discrimination flag Pg, into a digital sequence signal o, which is multiplexed and sent to the transmission line. When the unvoiced coder is selected, the input speech signal k is input to the short-term prediction analyzer 39, which separates it into spectrum envelope information Pk and a short-term prediction residual signal p, i.e. the input speech signal k with the spectrum envelope information Pk removed. The power calculator 40 then calculates the power Ps of the short-term prediction residual signal p. The unvoiced-sound encoder 41 encodes the power information Ps, the spectrum envelope information Pk and the voiced/unvoiced discrimination flag Pg into a digital signal sequence q, which is multiplexed and sent to the transmission line.
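The unvoiced encoding path (short-term prediction analysis, residual, residual power) might look like the sketch below. The source does not say how the short-term predictor is computed, so the autocorrelation method with the Levinson-Durbin recursion, the predictor order of 4 and every function name here are assumptions chosen for illustration; the synthetic test frame is also invented and stands in for one 128-sample frame.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin: coefficients a[i] with x[n] ~ sum(a[i] * x[n-1-i])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def residual(x, a):
    """Short-term prediction residual: the frame minus its linear prediction."""
    out = []
    for n in range(len(x)):
        pred = sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        out.append(x[n] - pred)
    return out


def power(x):
    """Mean-square power of a frame."""
    return sum(v * v for v in x) / len(x)


# Synthetic 128-sample frame with a strong short-term correlation.
rng = random.Random(1)
frame = [0.0]
for _ in range(127):
    frame.append(0.9 * frame[-1] + rng.gauss(0.0, 1.0))

envelope = levinson(autocorr(frame, 4), 4)  # plays the role of Pk
p = residual(frame, envelope)               # plays the role of signal p
ps = power(p)                               # plays the role of Ps
```

Only `envelope` and `ps` would be transmitted; the residual waveform itself is discarded, which is what keeps the unvoiced frames cheap.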

【００１４】[0014] On the receiving side in FIG. 2, the digital sequence signal s received over the transmission line is separated by the separation circuit 42. The voiced/unvoiced discrimination flag Pm is examined first: if it indicates voiced, the DCT vector number Pn, the DCT coefficient maximum value Po and the pitch information Pp are extracted and the voiced decoder is operated; if it indicates unvoiced, the power information Pq and the spectrum envelope information Pr are extracted and the unvoiced decoder is operated. In the voiced decoder, the vector inverse quantizer 43 and the codebook 44 perform inverse vector quantization using the DCT vector number Pn to reproduce the normalized DCT coefficients t. The codebook 44 has the same contents as the transmitting-side codebook (37 in FIG. 1), so specifying the vector number Pn yields the same vector that was selected on the transmitting side. The inverse normalizer 45 reproduces the DCT coefficients u by multiplying the normalized DCT coefficients t by the DCT coefficient maximum value Po. The inverse discrete cosine transform (inverse DCT) unit 46 transforms the DCT coefficients u into the time domain to reproduce the long-term prediction residual signal v. The long-term prediction synthesizer 47 adds the pitch information Pp to the long-term prediction residual signal v to decode and reproduce the speech signal w. In the unvoiced decoder, the pseudo-noise generator 48 generates pseudo-noise x, and the multiplier 49 applies the power information Pq to the pseudo-noise x to approximately reproduce the short-term prediction residual signal y. The short-term prediction synthesizer 50 adds the spectrum envelope information Pr to the reproduced short-term prediction residual signal y to obtain the reproduced speech signal z. Table 3 below shows the bit allocation (per 20 msec frame) in the method of the present invention. With this bit allocation, the coding rate is 2.4 kbps (48 bits / 20 msec).
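The unvoiced decoding path described above (pseudo-noise, scaling by the power information, short-term prediction synthesis) can be sketched as follows. The choice of Gaussian noise, the seed, the way the gain is derived from the power, the all-pole synthesis filter form and the example predictor coefficients are all assumptions; the source only states that a multiplier applies the power information to pseudo-noise and that a synthesizer adds the spectrum envelope.

```python
import math
import random


def pseudo_noise(n, seed=0):
    """Generate n samples of pseudo-noise (Gaussian, an assumed choice)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def scale_to_power(x, target_power):
    """Scale x so that its mean-square power equals target_power."""
    current = sum(v * v for v in x) / len(x)
    if current == 0.0:
        return list(x)
    gain = math.sqrt(target_power / current)
    return [gain * v for v in x]


def synthesize(excitation, a):
    """All-pole synthesis: y[n] = e[n] + sum(a[i] * y[n-1-i])."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a[i] * out[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        out.append(e + pred)
    return out


# One 128-sample unvoiced frame rebuilt from assumed received parameters.
received_power = 0.25         # plays the role of Pq
received_envelope = [0.5, -0.2]  # plays the role of Pr (stable toy predictor)

x = pseudo_noise(128)
y = scale_to_power(x, received_power)   # approximate residual signal y
z = synthesize(y, received_envelope)    # reproduced speech signal z
```

The decoder never sees the original residual; it only matches its power, which is acceptable for unvoiced sounds because their residual is noise-like to begin with.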

【００１５】[0015]

【表３】 [Table 3]

【００１６】[0016] With the input signal denoted X(n), the DCT and IDCT are given by the following equations. (1) For the DCT, the DCT coefficient Xc(k) to be obtained is (equation not reproduced), where N is the number of samples per block and g(k) = 1 (k = 0), g(k) = √2 (k = 1, 2, …, N−1). (2) For the IDCT, the restored signal X(n) is (equation not reproduced).
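Since the equations themselves do not survive in the text, the sketch below uses one common DCT pair that is consistent with the stated weights g(0) = 1 and g(k) = √2 for k ≥ 1: the orthonormal DCT-II, with a 1/√N factor on both the forward and inverse transforms. The placement of that scale factor is an assumption and may differ from the patent's own equations; the function names are hypothetical.

```python
import math


def g(k):
    """Weight from the text: 1 for k = 0, sqrt(2) for k = 1 .. N-1."""
    return 1.0 if k == 0 else math.sqrt(2.0)


def dct(x):
    """Assumed forward form:
    Xc(k) = g(k)/sqrt(N) * sum_n x(n) * cos((2n+1) * k * pi / (2N))
    """
    size = len(x)
    return [
        g(k) / math.sqrt(size)
        * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * size))
              for n in range(size))
        for k in range(size)
    ]


def idct(xc):
    """Assumed inverse form:
    x(n) = 1/sqrt(N) * sum_k g(k) * Xc(k) * cos((2n+1) * k * pi / (2N))
    """
    size = len(xc)
    return [
        sum(g(k) * xc[k] * math.cos((2 * n + 1) * k * math.pi / (2 * size))
            for k in range(size)) / math.sqrt(size)
        for n in range(size)
    ]


block = [1.0, 2.0, 3.0, 4.0]
coeffs = dct(block)
restored = idct(coeffs)   # equals block up to rounding error
```

With this scaling the transform preserves energy, so normalizing the coefficients by their maximum value, as the coder does, is independent of the block length.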

【００１７】[0017]

【発明の効果】[Effects of the Invention] As described in detail above, implementing the present invention provides the following effects. (1) Because spectrum envelope information is transmitted in unvoiced periods, the rapid and complex changes in the spectrum envelope that characterize unvoiced speech can be followed, and unvoiced speech can be reproduced faithfully. (2) Because the codebook can be dedicated to voiced speech, the reproduced speech quality in voiced periods improves over the conventional scheme. Thanks to these effects, the codebook needs only a small storage capacity, and a speech codec can be realized that is applicable to half-rate systems, that is, one with a speech coding rate of 3.6 kbps or less and excellent reproduced speech quality equivalent to 6-bit log-PCM or better.

【図面の簡単な説明】[Brief description of drawings]

【図１】[FIG. 1] A block diagram of the speech coding apparatus of the present invention.

【図２】[FIG. 2] A block diagram of the speech decoding apparatus of the present invention.

【図３】[FIG. 3] A block diagram showing a conventional example.

【図４】[FIG. 4] A block diagram showing a conventional example.

【符号の説明】[Explanation of symbols]

1 Long-term prediction analyzer
2 DCT unit
3 Adaptive decimator
4 Encoder
5 Separation circuit
6 Adaptive decimation decoder
7 IDCT unit
8 Long-term prediction synthesizer
11 Long-term prediction analyzer
12 DCT unit
13 Normalizer
14 Vector quantizer
15, 23 Codebook
16 Encoder
21 Separation circuit
22 Vector inverse quantizer
24 Inverse normalizer
25 Inverse DCT unit
26 Long-term prediction synthesizer
31 Voiced/unvoiced discriminator
32 Switch
33 Long-term prediction analyzer
34 DCT unit
35 Normalizer
36 Vector quantizer
37 Codebook
38 Voiced-sound encoder
39 Short-term prediction analyzer
40 Power calculator
41 Unvoiced-sound encoder
42 Separator
43 Vector inverse quantizer
44 Codebook
45 Inverse normalizer
46 Inverse DCT unit
47 Long-term prediction synthesizer
48 Pseudo-noise generator
49 Multiplier
50 Short-term prediction synthesizer
51 Switch

─────────────────────────────────────────────────────

【手続補正書】[Procedure amendment]

【提出日】平成５年２月１６日[Submission date] February 16, 1993

【手続補正１】[Procedure Amendment 1]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】０００４[Correction target item name] 0004

【補正方法】変更[Correction method] Change

【補正内容】[Correction content]

【０００４】受信側では、ディジタル信号列を受信し、
分離回路５によりピッチ情報ＰｄとＤＣＴ情報Ｐｃとに
分離する。適応間引き復号器６では、ＤＣＴ情報Ｐｃ中
のＤＣＴ係数振幅情報と位置情報により、送られてきた
ＤＣＴ係数を再生する。送られてこなかったＤＣＴ係数
の位置に０を挿入することにより補間する。再生され
たＤＣＴ係数を逆離散コサイン変換（ＩＤＣＴ）器７に
より時間領域に変換し、長期予測残差信号を再生する。
長期予測合成器８では長期予測残差信号にピッチ情報Ｐ
ｄを付加することにより音声信号を復号再生する。従来
の符復号器のフレーム（３０ｍｓｅｃ）毎のビット配分
の例を下の表１に示す。各フレームの先頭には、フレー
ム同期をとるため５ビットの同期ビットを挿入してい
る。表１での合計を１秒当たりに換算すると、１３５ビ
ット／３０ｍｓｅｃ＝４．５ｋｂｐｓとなる。On the receiving side, the digital signal sequence is received, and the separation circuit 5 separates it into the pitch information Pd and the DCT information Pc. The adaptive thinning-out decoder 6 reproduces the transmitted DCT coefficients from the DCT coefficient amplitude information and position information contained in the DCT information Pc. DCT coefficients that were not sent are interpolated by inserting 0 at their positions. The reproduced DCT coefficients are converted into the time domain by the inverse discrete cosine transform (IDCT) unit 7 to reproduce the long-term prediction residual signal. The long-term prediction synthesizer 8 adds the pitch information Pd to the long-term prediction residual signal to decode and reproduce the speech signal. Table 1 below shows an example of the bit allocation per frame (30 msec) of the conventional codec. Five synchronization bits are inserted at the head of each frame for frame synchronization. Converting the total in Table 1 to a per-second rate gives 135 bits / 30 msec = 4.5 kbps.
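The receiving-side steps of the conventional codec can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the orthonormal DCT scaling, and the 8-coefficient block in the usage note are assumptions made only for illustration. The sketch inserts 0 at the positions of untransmitted DCT coefficients, applies an inverse DCT, and repeats the bit-rate arithmetic for a 135-bit, 30 msec frame.

```python
import math


def thinned_decode(positions, amplitudes, n):
    """Rebuild an n-point DCT coefficient vector: transmitted amplitudes go
    to their positions, every untransmitted position is filled with 0."""
    coeffs = [0.0] * n
    for pos, amp in zip(positions, amplitudes):
        coeffs[pos] = amp
    return coeffs


def dct(x):
    """Orthonormal DCT-II (assumed scaling; the patent does not state one)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
            for t in range(n)))
    return out


def idct(coeffs):
    """Inverse of dct() above: back to the time domain."""
    n = len(coeffs)
    out = []
    for t in range(n):
        acc = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            acc += (math.sqrt(2.0 / n) * coeffs[k]
                    * math.cos(math.pi * (2 * t + 1) * k / (2 * n)))
        out.append(acc)
    return out


def bit_rate_bps(bits_per_frame, frame_msec):
    """Bits per frame converted to bits per second."""
    return bits_per_frame * 1000.0 / frame_msec
```

For example, `idct(thinned_decode([1, 3], [0.5, -0.25], 8))` yields a hypothetical 8-sample residual block, and `bit_rate_bps(135, 30)` reproduces the 4.5 kbps figure quoted above.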

【表１】 [Table 1]

【手続補正２】[Procedure Amendment 2]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】０００６[Correction target item name] 0006

【補正方法】変更[Correction method] Change

【補正内容】[Correction content]

【０００６】 [0006]

【発明が解決しようとする課題】上記従来の方式の再生
音声品質は、音声符号化速度４．５ｋｂｐｓで、ｌｏｇ
−ＰＣＭ４ビット相当しか得られておらず、音声符号化
速度をさらに３．６ｋｂｐｓ以下に下げた場合、伝送で
きるＤＣＴ係数の個数は減少し、周波数領域での歪みが
大きくなるため、更に再生音声品質は劣下する。つま
り、従来の方式では再生音声品質をｌｏｇ−ＰＣＭ６ビ
ット相当で符号化速度を３．６ｋｂｐｓ以下に下げるこ
とはできず、ハーフレート音声符号器に要求される性能
（品質，誤り訂正能力）を満たすことはできない。上述
の従来技術の問題点を解決するため、本発明者は図４の
適応間引器３の代りにベクトル量子化器を用いた音声符
号化方法及びその装置を先に提案した。（特願平３−３
２０９８２号参照）図３は上記の提案による音声符号化
装置のブロック図であり、（Ａ）は音声符号化装置、
（Ｂ）は音声復号装置である。図３（Ａ）において、
６．４ｋＨｚサンプリング、１０ビット量子化された入
力音声信号（６４ｋｂｐｓ）ａは、長期予測分析器１１
によりフレーム（１０ｍｓｅｃ：６４サンプル）毎にピ
ッチ情報Ｐａを抽出して出力するとともに、更に入力信
号ａからピッチ成分を取り除いた信号である長期予測残
差信号ｂを生成して出力する。それを離散コサイン変換
（ＤＣＴ）器１２によりサブフレーム（１０ｍｓｅｃ：
６４サンプル）毎に周波数領域に変換して周波数成分で
あるＤＣＴ係数ｃ（６４係数）を出力する。離散コサイ
ン変換については後で述べる。ＤＣＴ係数ｃを正規化器
１３によりＤＣＴ係数の最大値により正規化し、ＤＣＴ
係数最大値Ｐｂと正規化されたＤＣＴ係数ｄを得る。次
に正規化されたＤＣＴ係数ｄをベクトル量子化器１４と
符号帳１５によりベクトル量子化する。符号帳１５には
例えば５１２種類のベクトルパターンが記憶されてい
る。ベクトル量子化器１４は、未知入力ベクトルである
ＤＣＴ係数ｄと符号帳１５の中のベクトルを比較し、ベ
クトル間距離が最小となるベクトルを選択し、その番号
Ｐｃを出力する。ベクトル番号Ｐｃは９ビット量子化さ
れる。ベクトル量子化の効果については後で説明する。
ベクトル番号Ｐｃは符号化器１６によりＤＣＴ係数の最
大値Ｐｂ及びピッチ情報Ｐａとともにディジタル列信号
ｅの形態に符号化した後多重化して伝送路に送出され
る。[Problems to be Solved by the Invention] The reproduced speech quality of the above conventional method is only equivalent to log-PCM 4 bits at a speech coding rate of 4.5 kbps. If the speech coding rate is further reduced to 3.6 kbps or less, the number of DCT coefficients that can be transmitted decreases and distortion in the frequency domain increases, so the reproduced speech quality degrades further. That is, the conventional method cannot lower the coding rate to 3.6 kbps or less while keeping the reproduced speech quality equivalent to log-PCM 6 bits, and cannot satisfy the performance (quality, error correction capability) required of a half-rate speech coder. To solve these problems of the prior art, the present inventor previously proposed a speech coding method and apparatus that use a vector quantizer in place of the adaptive thinning-out device 3 of FIG. 4 (see Japanese Patent Application No. 3-320982). FIG. 3 is a block diagram of the speech coding apparatus according to that proposal, where (A) is the speech encoding apparatus and (B) is the speech decoding apparatus. In FIG. 3(A), the input speech signal a, sampled at 6.4 kHz and quantized to 10 bits (64 kbps), is fed to the long-term prediction analyzer 11, which extracts and outputs pitch information Pa for each frame (10 msec: 64 samples) and also generates and outputs the long-term prediction residual signal b, which is the input signal a with the pitch component removed. The discrete cosine transform (DCT) unit 12 converts this residual into the frequency domain for each subframe (10 msec: 64 samples) and outputs the DCT coefficients c (64 coefficients), which are its frequency components. The discrete cosine transform is described later. The normalizer 13 normalizes the DCT coefficients c by their maximum value, yielding the DCT coefficient maximum value Pb and the normalized DCT coefficients d. The normalized DCT coefficients d are then vector-quantized by the vector quantizer 14 and the codebook 15. The codebook 15 stores, for example, 512 vector patterns. The vector quantizer 14 compares the DCT coefficients d, an unknown input vector, with the vectors in the codebook 15, selects the vector with the smallest inter-vector distance, and outputs its number Pc. The vector number Pc is quantized with 9 bits. The effect of vector quantization is described later. The encoder 16 encodes the vector number Pc, together with the DCT coefficient maximum value Pb and the pitch information Pa, into the form of the digital sequence signal e, which is then multiplexed and sent to the transmission line.
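The normalization and vector quantization steps of the previously proposed encoder can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the text does not say whether the "maximum value" is a signed or an absolute maximum, nor which distance measure is used, so the sketch assumes the maximum absolute value and a squared Euclidean distance. All function names and the toy 3-dimensional codebook in the usage note are hypothetical.

```python
def normalize_by_max(dct_coeffs):
    """Return (peak, normalized coefficients).
    Assumption: 'maximum value' is taken as the maximum absolute value."""
    peak = max(abs(c) for c in dct_coeffs)
    if peak == 0.0:
        return 0.0, list(dct_coeffs)
    return peak, [c / peak for c in dct_coeffs]


def nearest_vector_index(vector, codebook):
    """Return the number of the codebook vector closest to `vector`.
    Assumption: squared Euclidean distance as the inter-vector distance."""
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def index_bits(codebook_size):
    """Bits needed to address every vector in the codebook."""
    return (codebook_size - 1).bit_length()
```

With a toy codebook `[[1.0, 1.0, 1.0], [0.5, -1.0, 0.0], [0.0, 0.0, 0.0]]`, the coefficients `[2.0, -4.0, 1.0]` normalize to `[0.5, -1.0, 0.25]` with peak `4.0` and map to vector number 1. `index_bits(512)` gives the 9 bits stated for the vector number Pc.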

【手続補正３】[Procedure Amendment 3]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】００１０[Correction target item name] 0010

【補正方法】変更[Correction method] Change

【補正内容】[Correction content]

【００１０】しかしながら、この方式で問題となるの
は、ハードウェアを実現する際、メモリ容量に限界があ
るため符号帳１５，２３の大きさが制限され、再生音声
の高品質化が望めないことである。実際には符号帳内ベ
クトル数は５１２種類程度が限界である。また、１個の
符号帳によって有声音声，無声音声両方のスペクトル包
絡および微細構造を表現しているため、スペクトル包絡
が時間的に急激かつ複雑に変化する無声音声に対応でき
ず、再生音声の明瞭性が劣化するため、ハーフレート音
声符復号化器に要求される品質（ｌｏｇ−ＰＣＭ６ビ
ット相当）を完全に満たすことはできないといえる。本
発明の目的は音声符号化速度３．６ｋｂｐｓ以下のハー
フレートシステムに適用可能で、しかも再生音声品質が
ｌｏｇ−ＰＣＭ６ビット相当以上の音声符号化方法お
よびその装置を提供することである。However, the problem with this method is that, when it is implemented in hardware, the limited memory capacity restricts the size of the codebooks 15 and 23, so that higher reproduced speech quality cannot be expected. In practice, the number of vectors in a codebook is limited to about 512. In addition, because a single codebook represents the spectrum envelope and fine structure of both voiced and unvoiced speech, the method cannot cope with unvoiced speech, whose spectrum envelope changes rapidly and in a complicated manner over time, and the clarity of the reproduced speech deteriorates. The quality required of a half-rate speech codec (equivalent to log-PCM 6 bits) therefore cannot be fully satisfied. The object of the present invention is to provide a speech coding method and apparatus that are applicable to a half-rate system with a speech coding rate of 3.6 kbps or less and that achieve a reproduced speech quality equivalent to log-PCM 6 bits or more.

【手続補正４】[Procedure Amendment 4]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】図面の簡単な説明[Name of item to be corrected] Brief description of the drawing

【補正方法】変更[Correction method] Change

【補正内容】[Correction content]

【図３】先に提案した発明の実施例を示すブロック図で
ある。FIG. 3 is a block diagram showing an embodiment of the previously proposed invention.

【手続補正書】[Procedure amendment]

【提出日】平成５年６月４日[Submission date] June 4, 1993

【手続補正４】[Procedure Amendment 4]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】図３[Name of item to be corrected] Figure 3

【補正方法】変更[Correction method] Change

【補正内容】[Correction content]

Claims

【特許請求の範囲】[Claims]

【請求項１】入力音声信号が有声区間のとき該入力音
声信号を長期予測分析してピッチ情報と該ピッチ情報を
取り除いた長期予測残差信号とを抽出し該長期予測残差
信号を離散コサイン変換により周波数領域に変換したＤ
ＣＴ係数をその最大値によって正規化してベクトル量子
化しそのベクトルに最も近似するベクトルの番号を符号
帳から読み出して該ベクトル番号と前記ＤＣＴ係数の最
大値と前記ピッチ情報と有声区間を示すフラグとを符号
化して有声区間のディジタル信号列とし、前記入力音声
信号が無声区間のとき該入力音声信号を短期予測分析し
てスペクトル包絡情報と該スペクトル包絡情報を取り除
いた短期予測残差信号とを抽出し該短期予測残差信号の
電力を計算した電力情報と前記スペクトル包絡情報と無
声区間を示すフラグとを符号化して無声区間のディジタ
ル信号列とし、前記有声区間のディジタル信号列と該無
声区間のディジタル信号列とを多重化して伝送路に送出
し、受信側では、受信した多重化ディジタル信号列を前記フ
ラグにより有声区間か無声区間かを判別して分離し、有
声区間であれば前記ベクトル番号と前記ＤＣＴ係数の最
大値と前記ピッチ情報とを抽出し該ベクトル番号に対応
する量子化ベクトルを符号帳から読み出してベクトル逆
量子化により正規化されたＤＣＴ係数を求め該正規化さ
れたＤＣＴ係数と前記ＤＣＴ係数の最大値とによりＤＣ
Ｔ係数を再生したのち逆離散コサイン変換により長期予
測残差信号を再生し該長期予測残差信号に前記ピッチ情
報を付加して長期予測合成した有声音声信号を出力し、
無声区間であれば前記電力情報と前記スペクトル包絡情
報とを抽出し擬似雑音に該電力情報を付加して近似的に
再生した短期予測残差信号と前記スペクトル包絡情報と
を短期予測合成して無声音声信号を出力するようにした
音声符号化方法。1. A speech coding method, wherein: when an input speech signal is in a voiced section, the input speech signal is subjected to long-term prediction analysis to extract pitch information and a long-term prediction residual signal from which the pitch information has been removed, the long-term prediction residual signal is converted into the frequency domain by a discrete cosine transform to obtain DCT coefficients, the DCT coefficients are normalized by their maximum value and vector-quantized, the number of the vector that most closely approximates the resulting vector is read from a codebook, and the vector number, the maximum value of the DCT coefficients, the pitch information, and a flag indicating a voiced section are encoded into a digital signal sequence for the voiced section; when the input speech signal is in an unvoiced section, the input speech signal is subjected to short-term prediction analysis to extract spectrum envelope information and a short-term prediction residual signal from which the spectrum envelope information has been removed, and power information obtained by calculating the power of the short-term prediction residual signal, the spectrum envelope information, and a flag indicating an unvoiced section are encoded into a digital signal sequence for the unvoiced section; the digital signal sequence for the voiced section and the digital signal sequence for the unvoiced section are multiplexed and sent to a transmission line; and, on the receiving side, the received multiplexed digital signal sequence is separated by using the flag to determine whether it belongs to a voiced section or an unvoiced section; for a voiced section, the vector number, the maximum value of the DCT coefficients, and the pitch information are extracted, the quantized vector corresponding to the vector number is read from a codebook, normalized DCT coefficients are obtained by vector dequantization, the DCT coefficients are reproduced from the normalized DCT coefficients and the maximum value of the DCT coefficients, the long-term prediction residual signal is then reproduced by an inverse discrete cosine transform, and the pitch information is added to the long-term prediction residual signal to output a voiced speech signal obtained by long-term prediction synthesis; and, for an unvoiced section, the power information and the spectrum envelope information are extracted, and a short-term prediction residual signal approximately reproduced by applying the power information to pseudo noise is subjected to short-term prediction synthesis with the spectrum envelope information to output an unvoiced speech signal.
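The unvoiced path of claim 1 can be sketched in Python as follows. This is a hedged illustration only. The claim does not specify how the spectrum envelope information is represented, how the pseudo noise is generated, or how power is defined, so the sketch assumes linear-prediction coefficients with the sign convention y[t] = e[t] + sum of a[i]·y[t-1-i], uniformly distributed noise from Python's `random` module, and mean-square power. All names are hypothetical.

```python
import math
import random


def residual_power(residual):
    """Mean-square power of the short-term prediction residual (assumed definition)."""
    return sum(s * s for s in residual) / len(residual)


def scaled_pseudo_noise(power, n, seed=0):
    """Generate n pseudo-noise samples and scale them by a gain so that
    their mean-square power equals the transmitted power information."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    noise_power = sum(s * s for s in noise) / n
    gain = math.sqrt(power / noise_power) if noise_power > 0.0 else 0.0
    return [gain * s for s in noise]


def short_term_synthesis(excitation, lpc):
    """All-pole synthesis filter: y[t] = e[t] + sum_i lpc[i] * y[t-1-i].
    Assumption: the spectrum envelope is carried as predictor coefficients."""
    out = []
    for t, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(lpc):
            if t - 1 - i >= 0:
                acc += a * out[t - 1 - i]
        out.append(acc)
    return out
```

On the encoder side only `residual_power` would run. On the decoder side, `short_term_synthesis(scaled_pseudo_noise(power, 64), lpc)` would stand in for the residual that was never transmitted, which is why only the power and the envelope need to be sent for unvoiced frames.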

【請求項２】入力音声信号が有声区間か無声区間のい
ずれであるかを判定してそのいずれかを示すフラグを出
力する有声／無声判定器と、前記入力音声信号が有声区
間のとき該入力音声信号を符号化する有声用符号化器
と、前記入力音声信号が無声区間のとき該入力音声信号
を符号化する無声用符号化器と、前記フラグが有声区間
を示すとき前記入力音声信号を前記有声用符号化器によ
り符号化して伝送路に送出し前記フラグが無声区間を示
すとき前記入力音声信号を前記無声用符号化器により符
号化して伝送路に送出するように切替える有声／無声切
替え器とを備え、前記有声用符号化器は、前記入力音声信号からピッチ情
報と該ピッチ情報を取り除いた長期予測残差信号とを出
力する長期予測分析器と、該長期予測残差信号を周波数
領域に変換してＤＣＴ係数を出力する離散コサイン変換
器と、該ＤＣＴ係数をその最大値によって正規化し、Ｄ
ＣＴ係数最大値と正規化されたＤＣＴ係数とを出力する
正規化器と、該正規化されたＤＣＴ係数をベクトル量子
化し、その量子化ベクトルに近似するベクトルの番号を
符号帳から読み出してベクトル番号として出力するベク
トル量子化器と、前記ピッチ情報と前記ＤＣＴ係数最大
値と前記ベクトル番号と前記有声区間を示すフラグとを
ディジタル信号列の形態に符号化したのち多重化して伝
送路に送出する符号化器とを備え、前記無声用符号化器は、前記入力音声信号からスペクト
ル包絡情報と該スペクトル包絡情報を取り除いた短期予
測残差信号とを出力する短期予測分析器と、該短期予測
残差信号の電力を計算して電力情報を出力する電力算出
器と、該電力情報と前記スペクトル包絡情報と前記無声
区間を示すフラグとをディジタル信号列の形態に符号化
したのち多重化して伝送路に送出する符号化器とを備え
た音声符号化装置。2. A speech encoding apparatus comprising: a voiced/unvoiced decision unit that determines whether an input speech signal is in a voiced section or an unvoiced section and outputs a flag indicating which; a voiced encoder that encodes the input speech signal when the input speech signal is in a voiced section; an unvoiced encoder that encodes the input speech signal when the input speech signal is in an unvoiced section; and a voiced/unvoiced switcher that switches so that, when the flag indicates a voiced section, the input speech signal is encoded by the voiced encoder and sent to a transmission line, and, when the flag indicates an unvoiced section, the input speech signal is encoded by the unvoiced encoder and sent to the transmission line; wherein the voiced encoder comprises: a long-term prediction analyzer that outputs, from the input speech signal, pitch information and a long-term prediction residual signal from which the pitch information has been removed; a discrete cosine transformer that converts the long-term prediction residual signal into the frequency domain and outputs DCT coefficients; a normalizer that normalizes the DCT coefficients by their maximum value and outputs the DCT coefficient maximum value and normalized DCT coefficients; a vector quantizer that vector-quantizes the normalized DCT coefficients, reads from a codebook the number of the vector that approximates the quantized vector, and outputs it as a vector number; and an encoder that encodes the pitch information, the DCT coefficient maximum value, the vector number, and the flag indicating the voiced section into the form of a digital signal sequence, then multiplexes them and sends them to the transmission line; and wherein the unvoiced encoder comprises: a short-term prediction analyzer that outputs, from the input speech signal, spectrum envelope information and a short-term prediction residual signal from which the spectrum envelope information has been removed; a power calculator that calculates the power of the short-term prediction residual signal and outputs power information; and an encoder that encodes the power information, the spectrum envelope information, and the flag indicating the unvoiced section into the form of a digital signal sequence, then multiplexes them and sends them to the transmission line.

【請求項３】有声区間を示すフラグ，ピッチ情報，Ｄ
ＣＴ係数最大値，ベクトル番号が含まれてディジタル信
号列の形態に符号化された有声音情報または無声区間を
示すフラグ，電力情報およびスペクトル包絡情報が含ま
れてディジタル信号列の形態に符号化された無声音情報
を受信して有声音情報と無声音情報とに分離出力する分
離器と、該分離器からの有声音情報を復号して再生音声
を出力する有声用復号器と、該分離器からの無声音情報
を復号して再生音声を出力する無声用復号器と、前記有
声用復号器と該無声用復号器からの出力を前記フラグに
従って切替え出力する有声／無声切替え器とを備え、前記有声用復号器は、前記ベクトル番号に対応する量子
化ベクトルを符号帳から読み出してベクトル逆量子化に
より正規化されたＤＣＴ係数を再生出力するベクトル逆
量子化器と、該正規化されたＤＣＴ係数と前記ＤＣＴ係
数最大値とからＤＣＴ係数を再生出力する逆正規化器
と、該ＤＣＴ係数を時間領域に変換して長期予測残差信
号を出力する逆離散コサイン変換器と、該長期予測残差
信号と前記ピッチ情報とを長期予測合成して前記有声／
無声切替え器に与える長期予測合成器とを備え、前記無声用復号器は、擬似雑音を発生する擬似雑音発生
器と、該擬似雑音に前記電力情報を付加することにより
短期予測残差信号を近似的に再生する乗算器と、該短期
予測残差信号と前記スペクトル包絡情報とを短期予測合
成して前記有声／無声切替え器に与える短期予測合成器
とを備えた音声復号装置。3. A speech decoding apparatus comprising: a separator that receives either voiced sound information, which contains a flag indicating a voiced section, pitch information, a DCT coefficient maximum value, and a vector number and is encoded in the form of a digital signal sequence, or unvoiced sound information, which contains a flag indicating an unvoiced section, power information, and spectrum envelope information and is encoded in the form of a digital signal sequence, and that separates and outputs the voiced sound information and the unvoiced sound information; a voiced decoder that decodes the voiced sound information from the separator and outputs reproduced speech; an unvoiced decoder that decodes the unvoiced sound information from the separator and outputs reproduced speech; and a voiced/unvoiced switcher that switches between the outputs of the voiced decoder and the unvoiced decoder in accordance with the flag; wherein the voiced decoder comprises: a vector dequantizer that reads from a codebook the quantized vector corresponding to the vector number and reproduces and outputs normalized DCT coefficients by vector dequantization; an inverse normalizer that reproduces and outputs DCT coefficients from the normalized DCT coefficients and the DCT coefficient maximum value; an inverse discrete cosine transformer that converts the DCT coefficients into the time domain and outputs a long-term prediction residual signal; and a long-term prediction synthesizer that performs long-term prediction synthesis on the long-term prediction residual signal and the pitch information and supplies the result to the voiced/unvoiced switcher; and wherein the unvoiced decoder comprises: a pseudo-noise generator that generates pseudo noise; a multiplier that approximately reproduces a short-term prediction residual signal by applying the power information to the pseudo noise; and a short-term prediction synthesizer that performs short-term prediction synthesis on the short-term prediction residual signal and the spectrum envelope information and supplies the result to the voiced/unvoiced switcher.
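Two stages of the voiced decoder in claim 3, inverse normalization and long-term prediction synthesis, can be sketched in Python as follows. This is a rough illustration, not the claimed apparatus. The claim does not define the format of the pitch information, so the sketch assumes a single-tap pitch predictor described by a lag and a gain, and it omits the inverse DCT that sits between the two stages. All names are hypothetical.

```python
def denormalize(normalized_coeffs, peak):
    """Inverse normalizer: scale the normalized DCT coefficients back
    by the transmitted DCT coefficient maximum value."""
    return [peak * c for c in normalized_coeffs]


def long_term_synthesis(residual, lag, gain, history):
    """Single-tap pitch synthesis: s[t] = r[t] + gain * s[t - lag].
    Assumptions: pitch information = (lag, gain); `history` holds at
    least `lag` previously decoded samples."""
    if len(history) < lag:
        raise ValueError("history must contain at least `lag` samples")
    buf = list(history)
    out = []
    for r in residual:
        s = r + gain * buf[-lag]
        buf.append(s)
        out.append(s)
    return out
```

For instance, a unit pulse in the residual with a lag of 2 and a gain of 0.5 comes back as `[1.0, 0.0, 0.5, 0.0]`: the pulse is repeated one pitch period later at half amplitude, which is the periodic structure that long-term prediction analysis removed on the encoding side.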