JPH01205199A

JPH01205199A - Sound encoding system

Info

Publication number: JPH01205199A
Application number: JP63028729A
Authority: JP
Inventors: Shigeru Ono; 茂小野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-02-12
Filing date: 1988-02-12
Publication date: 1989-08-17

Abstract

PURPOSE: To encode pitch information with high quality at a low bit rate by having the encoder follow changes in the pitch structure within a frame while forming a linear combination of reference speech signals, and by quantizing efficiently. CONSTITUTION: The encoder consists of a linear prediction analyzer 110, a pitch prediction analyzer 120 that performs pitch extraction, a reference speech component calculator 130, a speech variation component calculator 140, a quantizer 145, and a multiplexer 150. Pitch variation within a frame is absorbed as follows: a reference reproduced speech signal representing the average characteristic over the fundamental period length of the speech signal in the frame is synthesized, the input speech signal is represented as a linear sum of this reference reproduced speech signal, and the coefficients and phase components that form the linear combination are determined so that they faithfully follow the pitch variation of the input speech. In addition, when the linear combination coefficients are quantized, the correlation between those coefficients and the fundamental period information is exploited. Pitch information can therefore be encoded with high quality at a low bit rate.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a speech coding system used for band compression of speech, and in particular to a speech coding system that represents a discrete speech signal as a linear combination of reproduced speech signals of a specific interval.

[Prior Art]

Among speech coding systems, one that represents a short-interval speech signal $s(n)$ as a sum of impulse responses of a linear filter at different phases is known as multipulse-excited speech coding. It was first proposed by B. S. Atal et al. in Reference 1: "A new model of LPC excitation for producing natural sounding speech at low bit rates," ICASSP 82, pp. 614-617. This coding method has been confirmed to provide highly natural reproduced speech at bit rates of about 16 kb/s. When the bit rate is lowered further, however, the number of pulses, that is, the number of impulse responses, decreases and the quality of the reproduced speech degrades.

As a countermeasure, there is a system that incorporates pitch prediction (Reference 2: Ozawa, Ono, and Araseki, "Quality Improvement of Multipulse-Excited Speech Coding," Acoustical Society of Japan, Speech Research Group Materials, S83-78 (1984)). A block diagram of this system is shown in FIG. 2.

Input terminal 10 receives a discrete speech signal divided into frames of fixed length. The frame length is usually 20 ms to 30 ms.

The input speech signal is supplied to a linear prediction analyzer 20, a pitch prediction analyzer 30, and a buffer 40. The linear prediction analyzer 20 obtains linear prediction coefficients or PARCOR coefficients, supplies them to a local multipulse encoder 50 and a local multipulse decoder 60, and outputs them to a multiplexer 90. Methods for obtaining linear prediction coefficients or PARCOR coefficients are well known. The pitch prediction analyzer 30 computes the autocorrelation of the input speech signal and, from the delay that gives its maximum and the value at that delay, calculates the fundamental period (pitch period) of the input speech signal and the pitch prediction coefficient. This is also a known method. The calculated pitch period and pitch prediction coefficient are output to a pitch predictor 70 and to the multiplexer 90. The buffer 40 divides the input speech signal into segments of the pitch period extracted by the pitch prediction analyzer 30 and outputs the input speech signal for each pitch period to a subtractor 80, which computes the difference from the output of the pitch predictor 70, that is, the pitch prediction residual. The pitch prediction residual is supplied to the local multipulse encoder 50.

The local multipulse encoder 50 applies the coding method of Reference 1 to the pitch prediction residual from the subtractor 80. Based on the impulse response of the filter determined by the linear prediction coefficients from the linear prediction analyzer 20 and on the pitch prediction residual, it calculates and outputs the linear combination parameters of the impulse responses that best represent the pitch prediction residual, that is, the amplitudes and positions of the excitation pulses. The calculated pulse amplitudes and positions are output to the multiplexer 90 and to the local multipulse decoder 60. The local multipulse decoder 60 receives the pulse amplitudes and positions output from the local multipulse encoder 50 and, together with the linear prediction coefficients output from the linear prediction analyzer 20, synthesizes the reproduced signal for the interval. The synthesized signal is sent to an adder 85, where it is added to the output of the pitch predictor 70 to give the reproduced speech signal for the interval. The reproduced speech signal is supplied to the pitch predictor 70. The multiplexer 90 multiplexes the codes representing the linear prediction coefficients, pitch information, pulse amplitudes, and pulse positions and outputs them to an output terminal 99.
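The autocorrelation-based pitch extraction attributed to the pitch prediction analyzer 30 can be illustrated with a minimal sketch. The function name `estimate_pitch`, the energy normalization, and the lag search range are assumptions made for illustration; the source only states that the lag of the autocorrelation maximum and the value at that lag are used.

```python
import math

def estimate_pitch(frame, min_lag, max_lag):
    """Return (lag, gain): the lag in [min_lag, max_lag] that maximizes the
    energy-normalized autocorrelation of `frame`, and the value at that lag.
    The normalization and the lag limits are illustrative assumptions."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return min_lag, 0.0
    best_lag, best_corr = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[n] * frame[n - lag]
                   for n in range(lag, len(frame))) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Toy input: a 100 Hz sinusoid sampled at 8 kHz has a period of 80 samples.
frame = [math.sin(2 * math.pi * n / 80) for n in range(240)]
lag, gain = estimate_pitch(frame, 20, 147)
```

Here `lag` plays the role of the pitch period and `gain` the role of the pitch prediction coefficient; a practical analyzer would typically add voicing decisions and safeguards against pitch doubling.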

[Problem to Be Solved by the Invention]

However, the conventional system of Reference 2 holds the pitch period constant within a frame, so the variation of the pitch period within the frame ends up in the pitch prediction residual. Because speech signals generally vary considerably over time, a configuration that ignores pitch variation within a frame produces a pitch prediction residual with a large variation component, and representing it with high quality requires a large amount of excitation pulse information. Pitch information is considered an important feature in speech perception, so encoding time-varying pitch information with high quality is very important for high-quality speech coding.

An object of the present invention is to provide a speech coding system that can encode pitch information with high quality at a low bit rate.

[Means for Solving the Problem] The speech coding system of the present invention is characterized by: receiving a discrete speech signal sequence; extracting the fundamental period component of the input speech signal sequence; synthesizing, from the input speech signal sequence, a reference reproduced speech signal sequence that represents the average characteristic over an interval length corresponding to the fundamental period component; determining, when forming a reproduced speech signal sequence as a linear combination of the reference reproduced speech signal sequence, the linear combination coefficients so that the reproduced speech signal is close to the input speech signal sequence; and quantizing the determined combination coefficients on the basis of the fundamental period component.

[Operation]

According to the present invention, a reference reproduced speech signal representing the average characteristic over the fundamental period length of the speech signal in a frame is synthesized, and the input speech signal is represented as a linear sum of this reference reproduced speech signal. Determining the coefficients and phase components of the linear combination so that they faithfully follow the pitch variation of the input speech yields a high-quality speech coding system that can absorb pitch variation within a frame. Furthermore, exploiting the correlation between the linear combination coefficients and the fundamental period information when quantizing those coefficients yields a speech coding system that can encode at a low bit rate.

[Embodiment]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram of a speech encoding and decoding apparatus according to one embodiment of the speech coding system of the present invention.

As shown in FIG. 1, the encoding-side apparatus consists of an input terminal 100, a linear prediction analyzer 110, a pitch prediction analyzer 120 that performs pitch extraction, a reference speech component calculator 130, a speech variation component calculator 140, a quantizer 145, a multiplexer 150, and an output terminal 160. The decoding-side apparatus consists of a code input terminal 200, a demultiplexer 210, an inverse quantizer 215, an excitation regenerator 220, a speech regenerator 230, and a reproduced speech output terminal 240.

The encoding-side apparatus according to the present invention has two distinguishing features. First, in order to encode the temporal change of the pitch structure (mainly the pitch period) within a frame with high quality, it starts from a reference speech signal of fixed interval length (for example, the same length as the pitch period) that represents the average characteristic of the speech within the frame, and follows the change of the pitch structure within the frame while forming a linear combination of that reference speech signal. Second, it has an efficient quantizer that, when quantizing the linear combination parameters, exploits the relationship between the parameter variation and the pitch period.

The principle of the present invention is explained first.

Let $s(n)$ be the speech signal within one frame, and let $h(n)$ be the impulse response of the all-pole filter whose coefficients are the linear prediction coefficients determined by linear prediction analysis of $s(n)$. The reference reproduced speech signal $b(n)$, which represents the average characteristic of one pitch interval of the frame, can then be written as

$b(n) = \sum_{i=1}^{M} g_i\, h(n - m_i)$  (1)

There are several conceivable ways to determine the coefficients $\{g_i\}$ and positions $\{m_i\}$, $i = 1, \ldots, M$ ($M$: the number of impulse responses $h(n)$ that form $b(n)$). The most desirable is to minimize the weighted mean square error $E$ of Equation (2).
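Equation (1) can be illustrated with a small sketch that builds $h(n)$ from a set of linear prediction coefficients and then superimposes scaled, shifted copies of it. The helper names, the one-tap example filter, and the sign convention $1 / (1 + \sum_i a_i z^{-i})$ (chosen to agree with the synthesis recursion of Equation (12)) are assumptions made for illustration.

```python
def impulse_response(a, length):
    """h(n) of the all-pole filter 1 / (1 + sum_i a[i-1] * z^-i).
    The sign convention is an assumption matching Equation (12)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * h[n - i]
        h.append(acc)
    return h

def reference_signal(h, gains, positions, length):
    """b(n) = sum_i g_i * h(n - m_i), as in Equation (1)."""
    b = [0.0] * length
    for g, m in zip(gains, positions):
        for n in range(m, length):
            if n - m < len(h):
                b[n] += g * h[n - m]
    return b

a = [-0.5]                      # hypothetical first-order predictor
h = impulse_response(a, 8)      # 1, 0.5, 0.25, ...
b = reference_signal(h, gains=[1.0, -2.0], positions=[0, 2], length=8)
```

With these toy values, `b` is `h(n) - 2*h(n - 2)`, so each pulse contributes one decaying copy of the filter response.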

$E = \cdots\, [\, \cdots * w(n)\,]^{2}$  (2)

Here, $T$ is the average pitch period within the frame and $w(n)$ is a weighting function. Any function suited to the purpose can be chosen for $w(n)$: for example, one determined by the spectral envelope of the input speech signal, as in Reference 1, or one with a comb-filter characteristic that emphasizes the periodicity of the speech signal. In Equation (2), $\{a_i\}$ and $\{g_i\}$ appear as a product, so $\{g_i\}$ is difficult to obtain directly. Therefore, as an example, all $a_i$ are set to 1 and $\tau$ to 0. Equation (2) then becomes

$E = \cdots\, [\, \cdots * w(n)\,]^{2}$  (3)

Since $T$ is known, the problem of finding the $\{g_i\}$ and $\{m_i\}$ that make Equation (3) as small as possible is identical to the well-known problem of finding pulse amplitudes and phases in the multipulse coding of Reference 1. Suitable algorithms are described, for example, in Reference 1 and in Reference 3 (K. Ozawa, S. Ono, and T. Araseki, "A Study on Pulse Search Algorithms for Multi-Pulse Excited Speech Coder Realization," IEEE SAC, vol. 4, pp. 133-141, January 1986). Basically, Equation (3) is partially differentiated with respect to $\{g_i\}$ and set to zero to obtain the normal equations, and the desired parameters are computed by solving the normal equations with a numerical method such as Gaussian elimination.
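The pulse search referred to above can be sketched in its simplest sequential form. This is an illustrative greedy variant, not necessarily the algorithm of Reference 1 or Reference 3: it assumes an unweighted error ($w(n) = \delta(n)$), picks one pulse at a time, and does not re-optimize earlier gains. The function name `pulse_search` and the toy data are hypothetical.

```python
def pulse_search(target, h, num_pulses):
    """Greedy search for (gain, position) pairs so that
    sum_i g_i * h(n - m_i) approximates `target` in the least-squares sense.
    Assumes an unweighted error; earlier gains are not re-optimized."""
    residual = list(target)
    n_total = len(target)
    pulses = []
    for _ in range(num_pulses):
        best = None  # (error_reduction, gain, position)
        for m in range(n_total):
            span = min(len(h), n_total - m)
            cross = sum(residual[m + j] * h[j] for j in range(span))
            energy = sum(h[j] * h[j] for j in range(span))
            if energy == 0.0:
                continue
            # For a fixed position the optimal gain is cross / energy,
            # and it lowers the squared error by cross^2 / energy.
            reduction = cross * cross / energy
            if best is None or reduction > best[0]:
                best = (reduction, cross / energy, m)
        if best is None:
            break
        _, gain, pos = best
        pulses.append((gain, pos))
        for j in range(min(len(h), n_total - pos)):
            residual[pos + j] -= gain * h[j]
    return pulses, residual

h = [1.0, 0.5, 0.25, 0.125]
target = [0.0] * 12
for gain, pos in [(2.0, 1), (-1.0, 6)]:
    for j, hj in enumerate(h):
        target[pos + j] += gain * hj
pulses, residual = pulse_search(target, h, num_pulses=2)
```

Because the two toy pulses do not overlap, the greedy search recovers them exactly; with overlapping responses, solving the full normal equations for the gains, as the text describes, generally gives a lower error.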

Next, consider representing the input speech signal $s(n)$ as a linear combination of the $b(n)$ thus determined. This is equivalent to solving the approximation problem

$s(n) \approx \sum_{k=1}^{K} c_k\, b(n - d_k)$

that is, to finding the $\{c_k\}$ and $\{d_k\}$, $k = 1, \ldots, K$ ($K$: the number of $b(n)$ forming the linear combination), that make the weighted mean square error

$E = \sum_{n} \Bigl[ \Bigl( s(n) - \sum_{k=1}^{K} c_k\, b(n - d_k) \Bigr) * w(n) \Bigr]^{2}$  (4)

as small as possible. As with Equation (3), the problem of finding $\{c_k\}$ and $\{d_k\}$ is again a pulse search problem of multipulse coding, and it can be solved with the algorithms described in detail in the aforementioned References 1 and 3.

To reproduce the speech, on the other hand, the signals

$x(n) = \sum_{k=1}^{K} c_k\, \delta(n - d_k), \qquad y(n) = \sum_{i=1}^{M} g_i\, \delta(n - m_i)$

are formed, and the linear prediction synthesis filter is driven by

$v(n) = x(n) * y(n)$  (5)
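Equation (5) can be illustrated by building the two pulse trains and convolving them. The helper names and the toy amplitudes and positions are assumptions made for illustration; only the structure $v(n) = x(n) * y(n)$ comes from the text.

```python
def pulse_train(amplitudes, positions, length):
    """sum_j amplitude_j * delta(n - position_j) as a list of samples."""
    signal = [0.0] * length
    for amp, pos in zip(amplitudes, positions):
        if 0 <= pos < length:
            signal[pos] += amp
    return signal

def convolve(x, y, length):
    """Linear convolution of x and y, truncated to `length` samples."""
    out = [0.0] * length
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, yj in enumerate(y):
            if i + j < length:
                out[i + j] += xi * yj
    return out

x = pulse_train([1.0, 0.5], [0, 5], 12)    # c_k placed at d_k
y = pulse_train([2.0, -1.0], [0, 2], 12)   # g_i placed at m_i
v = convolve(x, y, 12)
nonzero = {n: val for n, val in enumerate(v) if val != 0.0}
```

Each pulse of `x` stamps a scaled copy of the pattern `y`, so `v` holds the reference excitation repeated at the positions `d_k` with gains `c_k`.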
... (5) The linear prediction coefficient synthesis filter may be driven by v(n).

As the above explanation shows, the autocorrelation function of $\{d_k\}$ can be expected to exhibit a periodicity close to the pitch period $T$.

Accordingly, the pitch period $T$ can be used when quantizing $\{d_k\}$: quantizing the difference between $d_k$ and $k \cdot T$, rather than $\{d_k\}$ directly, is more efficient.

Moreover, $\{c_k\}$ has the nature of a correlation coefficient between the input speech signal sequence and the reference reproduced speech signal sequence for each pitch period. Unless the input speech signal changes abruptly, adjacent $c_k$ can therefore be expected to be highly correlated, and a high compression ratio can be expected by quantizing the difference between $c_{k-1}$ and $c_k$ instead of quantizing $\{c_k\}$ directly.
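The two transformations described above, positions taken relative to $k \cdot T$ and gains taken as differences between neighbors, can be sketched as follows. The function names, the sign of each difference, the starting value $c_0 = 0$, and the toy numbers are assumptions made for illustration; the source states only that these differences are quantized.

```python
def encode_positions(d, T):
    """d'_k = d_k - k*T for k = 1..K (sign convention assumed)."""
    return [dk - k * T for k, dk in enumerate(d, start=1)]

def decode_positions(d_prime, T):
    """Inverse of encode_positions."""
    return [dp + k * T for k, dp in enumerate(d_prime, start=1)]

def encode_gains(c):
    """c'_k = c_k - c_(k-1), with c_0 taken as 0 (an assumption)."""
    prev, out = 0.0, []
    for ck in c:
        out.append(ck - prev)
        prev = ck
    return out

def decode_gains(c_prime):
    """Inverse of encode_gains: running sum of the differences."""
    total, out = 0.0, []
    for cp in c_prime:
        total += cp
        out.append(total)
    return out

T = 57                          # hypothetical average pitch period in samples
d = [58, 113, 172, 227]         # positions drifting around multiples of T
c = [1.0, 0.75, 0.5, 0.5]       # slowly varying gains
d_prime = encode_positions(d, T)
c_prime = encode_gains(c)
```

The transformed values cluster near zero, which is what allows fewer quantization levels. In a practical codec the gain difference would usually be taken against the previously quantized value so that quantization errors do not accumulate at the decoder.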

Returning to FIG. 1, the specific processing on the encoding side is described next.

On the encoding side, a discrete speech signal of $N$ samples, divided into frames of fixed length (for example, $N = 160$ (20 ms) at 8 kHz sampling), is input at input terminal 100 and supplied to the linear prediction analyzer 110 and the pitch prediction analyzer 120. The linear prediction analyzer 110 obtains PARCOR coefficients from the speech signal input at terminal 100, quantizes them, and outputs them to the reference speech component calculator 130 and the multiplexer 150.
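The frame size quoted above follows directly from the sampling rate and frame duration, as this short sketch shows. The helper `split_frames` and the choice to drop a trailing partial frame are assumptions made for illustration.

```python
def split_frames(samples, frame_len):
    """Split `samples` into consecutive non-overlapping frames of
    `frame_len` samples; a trailing partial frame is dropped (assumption)."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len]
            for i in range(0, last_start + 1, frame_len)]

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
N = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples per frame

frames = split_frames(list(range(500)), N)
```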

The pitch prediction analyzer 120 obtains the fundamental period component and the average pitch prediction coefficient of the speech signal in the frame; these are determined from the maximum of the autocorrelation function of the input speech signal. Other means of obtaining the average pitch period component are described, for example, in Chapter 4 of Reference 4: Furui, Digital Speech Processing, Tokai University Press (1985). The calculated pitch is quantized and then output to the reference speech component calculator 130, the speech variation component calculator 140, and the multiplexer 150. The reference speech component calculator 130 obtains $b(n)$ of Equation (1). It dequantizes the PARCOR coefficients supplied from the linear prediction analyzer 110, converts them to linear prediction coefficients, and computes the impulse response $h(n)$ of the corresponding all-pole filter. From the computed $h(n)$, the discrete speech signal input at terminal 100, and the pitch period from the pitch prediction analyzer 120, it solves the minimization problem of Equation (3) to obtain the desired $\{g_i\}$ and $\{m_i\}$. The weighting function $w(n)$ in Equation (3) can be computed, for example as in Reference 1, as

$w(n) = \delta(n) + \sum_{i} a_i\, \delta(n - i) - \sum_{i} \gamma^{i} a_i\, w(n - i)$  (6)

where $\delta(n)$ is the unit impulse, $\{a_i\}$ are the linear prediction coefficients, and $\gamma$ is a coefficient ($0 < \gamma < 1$). The weighting function can also be chosen according to the temporal change of the linear prediction coefficients, that is, according to the linear prediction coefficients of past frames.
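The recursion of Equation (6) can be evaluated sample by sample, as in the sketch below. It assumes the reading $w(n) = \delta(n) + \sum_i a_i \delta(n-i) - \sum_i \gamma^i a_i w(n-i)$, which is the impulse response of $A(z)/A(z/\gamma)$ with $A(z) = 1 + \sum_i a_i z^{-i}$; the function name and the one-tap example are hypothetical.

```python
def weighting_response(a, gamma, length):
    """Impulse response w(n) defined by
    w(n) = delta(n) + sum_i a_i*delta(n-i) - sum_i gamma^i * a_i * w(n-i).
    `a` holds a_1..a_p; this reading of Equation (6) is an assumption."""
    w = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0          # delta(n)
        if 1 <= n <= len(a):
            acc += a[n - 1]                   # a_n * delta(0)
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= (gamma ** i) * ai * w[n - i]
        w.append(acc)
    return w

w = weighting_response([-0.5], gamma=0.5, length=4)
```

As `gamma` approaches 1 the numerator and denominator cancel and `w(n)` approaches a unit impulse, that is, no weighting.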

Methods for computing $\{g_i\}$ and $\{m_i\}$ are explained in detail in Reference 4 and, for example, in Reference 5: specification of Japanese Patent Application No. Sho 58-150783, "Speech Coding Method," so only a brief explanation is given here. First, Equation (3) is partially differentiated with respect to $\{g_i\}$ and set to zero. Rearranged, this gives normal equations of the following form:

$\cdots, \quad k = 1, \ldots, K$  (7)

(the weighting function $w(n)$ is omitted to simplify the notation). This equation is solved, for example, for each subproblem with $k$ running from 1 up to the desired value. For $\{m_i\}$, Equation (3) is evaluated with the $\{g_i\}$ obtained from Equation (7) for each $k$ and every possible $m_i$, and the $m_i$ that minimizes the value is selected. The computed $\{g_i\}$ and $\{m_i\}$ are quantized and supplied to the multiplexer 150; they are also converted into a signal corresponding to $b(n)$ of Equation (1) and supplied to the speech variation component calculator 140. The speech variation component calculator 140 solves the minimization problem of Equation (4), using the speech signal input at terminal 100, the reference speech component $b(n)$ from the reference speech component calculator 130, the pitch period from the pitch prediction analyzer 120, and the linear prediction coefficients from the linear prediction analyzer 110, to obtain the desired $\{c_k\}$ and $\{d_k\}$.

Here the same weighting function as in Equation (6) is used. In the speech variation component calculator 140, $\{c_k\}$ and $\{d_k\}$ are obtained from $b(n)$ and the input speech signal by the methods described in References 4 and 5, as already mentioned. That is, Equation (4) is partially differentiated with respect to $\{c_k\}$ and set to zero, giving normal equations of the following form:

$\sum_{n} s(n)\, b(n - d_k) = \cdots, \quad k = 1, \ldots, K$  (8)

(the weighting function $w(n)$ is omitted to simplify the notation). $\{c_k\}$ and $\{d_k\}$ are obtained from this equation in the same way as for Equation (7). The quantizer 145 quantizes $\{d_k\}$ and $\{c_k\}$ efficiently by using the pitch period information, as described in the principle above.
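The step from the error of Equation (4) to normal equations of the kind labeled (8) can be written out as follows. This is a sketch under the assumption that the error has the unweighted least-squares form shown in the first line (with $w(n)$ dropped, as the text does for Equation (8)); the summation index $j$ is introduced here for illustration, and the right-hand side is derived rather than taken from the source.

```latex
\begin{align*}
E &= \sum_{n} \Bigl[\, s(n) - \sum_{j=1}^{K} c_j\, b(n - d_j) \Bigr]^{2} \\
\frac{\partial E}{\partial c_k}
  &= -2 \sum_{n} \Bigl[\, s(n) - \sum_{j=1}^{K} c_j\, b(n - d_j) \Bigr]\, b(n - d_k) = 0 \\
\Longrightarrow \quad
\sum_{n} s(n)\, b(n - d_k)
  &= \sum_{j=1}^{K} c_j \sum_{n} b(n - d_j)\, b(n - d_k),
  \qquad k = 1, \ldots, K
\end{align*}
```

For fixed positions $\{d_k\}$ this is a $K \times K$ linear system in the gains $\{c_k\}$, which is why a direct method such as Gaussian elimination applies; the positions themselves are found by search.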

Here, following the principle described above, $\{d_k\}$ is converted by

$d'_k = d_k - k \cdot T$  (9)

and $\{d'_k\}$ is quantized. Likewise, $\{c_k\}$ is converted by

$c'_k = c_k - c_{k-1}$  (10)

and $\{c'_k\}$ is quantized. Furthermore, when $\{d_k\}$ is determined, the variation of $\{d'_k\}$ can be made even smaller by using the pitch period $T$ to control the spacing between the elements of $\{d_k\}$ so that the pulse positions $\{d_k\}$ do not cluster within a pitch period. The resulting $\{c_k\}$ and $\{d_k\}$ are quantized and then output to the multiplexer 150. The multiplexer 150 receives the code representing the PARCOR coefficients from the linear prediction analyzer 110, the code representing the average pitch period from the pitch prediction analyzer 120, the codes representing $\{g_i\}$ and $\{m_i\}$ of the reference speech component from the reference speech component calculator 130, and the codes representing $\{c'_k\}$ and $\{d'_k\}$ of the speech variation component from the quantizer 145, multiplexes them, and outputs the result from output terminal 160.

As described above, it is efficient to use the pitch period $T$ when quantizing $\{d_k\}$ and to quantize the difference between $d_k$ and $k \cdot T$ rather than $\{d_k\}$ directly. Likewise, $\{c_k\}$ has the nature of a correlation coefficient between the input speech signal sequence and the reference reproduced speech signal sequence for each pitch period, and adjacent $c_k$ are highly correlated unless the input speech signal changes abruptly. A high compression ratio can therefore be expected by quantizing the difference between $c_{k-1}$ and $c_k$ instead of quantizing $\{c_k\}$ directly.

In this way, the encoding side determines the coefficients and phase components that form the linear combination so that they faithfully follow the pitch variation of the input speech, which makes it possible to absorb pitch variation within a frame and to perform high-quality speech coding. In addition, by exploiting the correlation between the linear combination coefficients and the fundamental period information when quantizing those coefficients, speech can be coded at a low bit rate.

Reproduction on the decoding side is performed as follows.

On the decoding side, the multiplexed code sequence is input at code input terminal 200. The demultiplexer 210 outputs the codes representing the speech variation components $\{c'_k\}$ and $\{d'_k\}$ to the inverse quantizer 215, the codes representing the reference speech components $\{g_i\}$ and $\{m_i\}$ and the code representing the average pitch period $T$ to the excitation regenerator 220, and the code representing the PARCOR coefficients to the speech regenerator 230. The inverse quantizer 215 obtains $\{c_k\}$ and $\{d_k\}$ from $\{c'_k\}$ and $\{d'_k\}$ and outputs them to the excitation regenerator 220. From the decoded and dequantized parameters, the excitation regenerator 220 reproduces the excitation signal $e(n)$ by computing, in correspondence with Equation (5),

$e(n) = \Bigl[ \sum_{k} c_k\, \delta(n - d_k) \Bigr] * \Bigl[ \sum_{i} g_i\, \delta(n - m_i) \Bigr]$  (11)

where $*$ denotes convolution. The reproduced $e(n)$ is output to the speech regenerator 230. The speech regenerator 230 obtains the linear prediction coefficients $\{a_i\}$ from the PARCOR coefficients supplied by the demultiplexer 210 and computes the reproduced speech $s(n)$ from the excitation $e(n)$ supplied by the excitation regenerator 220 according to

$s(n) = -\sum_{i} a_i\, s(n - i) + e(n)$  (12)

The reproduced speech $s(n)$ is output from output terminal 240.
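The synthesis recursion of Equation (12), $s(n) = -\sum_i a_i s(n-i) + e(n)$, can be sketched directly. The function name, the zero initial state, and the one-tap example coefficients are assumptions made for illustration; a real decoder would carry the filter state across frames.

```python
def synthesize(a, excitation):
    """All-pole synthesis: s(n) = -sum_i a_i * s(n - i) + e(n).
    `a` holds a_1..a_p; the filter starts from a zero state (assumption)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * out[n - i]
        out.append(acc)
    return out

# Hypothetical first-order predictor and a two-pulse excitation.
speech = synthesize([-0.5], [1.0, 0.0, 0.0, 2.0])
```

Each excitation pulse launches a decaying response, and responses from successive pulses add, which is the time-domain counterpart of driving the filter with $e(n)$.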

In the above description, various methods are conceivable for quantizing the pulse amplitudes $\{g_i\}$ and $\{c_k\}$. Scalar quantization, for example, is described in detail in Chapter 4 of Reference 6: N. S. Jayant and Peter Noll, Digital Coding of Waveforms, Prentice-Hall, 1984. Methods for quantizing PARCOR coefficients are also already well known and are described in detail in, for example, Reference 7: Kitawaki, Itakura, and Saito, "Optimal Code Configuration in a PARCOR-Type Speech Analysis-Synthesis System," Transactions of the Institute of Electronics and Communication Engineers of Japan, J61-A, 2, pp. 119-126 (February 1978).
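As a concrete instance of scalar quantization, the sketch below implements a uniform quantizer with clamping. The step size, bit width, and function names are hypothetical; the source does not specify the quantizer design, and the cited reference covers non-uniform and adaptive designs as well.

```python
def quantize(value, step, bits):
    """Uniform scalar quantizer: map `value` to a signed integer index,
    clamped to the range representable with `bits` bits."""
    levels = 1 << bits
    lowest, highest = -(levels // 2), levels // 2 - 1
    index = int(round(value / step))
    return max(lowest, min(highest, index))

def dequantize(index, step):
    """Reconstruct the quantized value from its index."""
    return index * step

idx = quantize(0.30, step=0.125, bits=4)
restored = dequantize(idx, 0.125)
```

Applied to the differences $d'_k$ and $c'_k$, whose values cluster near zero, such a quantizer can use fewer bits than one applied to $d_k$ and $c_k$ directly.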

[Effect of the Invention]

As explained above, according to the present invention, in order to encode the temporal change of the pitch structure within a frame with high quality, the system starts from a reference speech signal of fixed interval length that represents the average characteristic of the speech within the frame and follows the change of the pitch structure within the frame while forming a linear combination of that reference speech signal. In addition, the linear combination parameters are quantized efficiently by exploiting the relationship between the parameter variation and the pitch period. Compared with the conventional system, in which the parameters of the pitch structure are fixed within a frame, the invention therefore has the effect that pitch information can be encoded with high quality at a low bit rate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a block diagram showing a conventional example.

100: input terminal
110: linear prediction analyzer
120: pitch prediction analyzer
130: reference speech component calculator
140: speech variation component calculator
145: quantizer
150: multiplexer
160: output terminal
200: code input terminal
210: demultiplexer
215: inverse quantizer
220: excitation regenerator
230: speech regenerator
240: reproduced speech output terminal

Agent: Patent Attorney Yoshiyuki Iwasa

Claims

[Claims]

(1) A speech coding system characterized by: receiving a discrete speech signal sequence; extracting the fundamental period component of the input speech signal sequence; synthesizing, from the input speech signal sequence, a reference reproduced speech signal sequence that represents the average characteristic over an interval length corresponding to the fundamental period component; determining, when forming a reproduced speech signal sequence as a linear combination of the reference reproduced speech signal sequence, the linear combination coefficients so that the reproduced speech signal is close to the input speech signal sequence; and quantizing the determined combination coefficients on the basis of the fundamental period component.