JPH01257999A

JPH01257999A - Voice signal encoding and decoding method, voice signal encoder and voice signal decoder

Info

Publication number: JPH01257999A
Application number: JP63085191A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-04-08
Filing date: 1988-04-08
Publication date: 1989-10-16
Anticipated expiration: 2015-06-26
Also published as: JP3055901B2

Abstract

PURPOSE: To reduce the amount of information required to transmit the sound source signal by segmenting a voice signal nonuniformly using feature parameters and finding a sound source pulse train over the whole section except in a stationary vowel part. CONSTITUTION: The voice signal is divided nonuniformly using the feature parameters. When a divided section is a stationary vowel part, in which the features of the voice vary little and which is long in time, a sound source pulse train is found for only one pitch section within that section; when the section is not a stationary vowel part, the sound source pulse train is found over the whole section. Namely, the discrete voice signal is input on the transmission side, a segmentation part 410 divides the voice signal into nonuniform sections, and the sound source signals of all or some of the divided sections are represented as a combination of pulse trains calculated by a sound source calculation part 420 and transmitted. On the reception side, those pulse trains are used to restore the sound source signals of the sections, and a synthesized voice signal that represents the voice signal well is output. Consequently, the amount of information required to transmit the sound source signal is reduced.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a speech signal encoding/decoding method, a speech signal encoding device, and a speech signal decoding device for efficiently encoding and decoding speech signals at a low bit rate. In particular, it relates to a speech signal encoding/decoding method, and to devices used for it, that divide speech non-uniformly based on auditory characteristics and determine, for each divided section, parameters representing the features of the speech signal for encoding and decoding.

[Prior Art]

Known methods for transmitting speech signals at low bit rates of, for example, 8 kb/s or less include pitch-predictive multi-pulse coding at around 8 kb/s and pitch-interpolation multi-pulse coding at around 4.8 kb/s. Both represent the sound source signal as a combination of multiple pulses (multi-pulse), represent the characteristics of the vocal cords with a digital filter, and determine and transmit the sound source pulse information and the filter coefficients for each fixed time interval (frame). The former is described in detail in, for example, Ozawa and Araseki, "High Quality Multi-pulse Speech Coder with Pitch Prediction" (Proc. I.C.A.S.S.P., paper No. 33.3, 1986) (Reference 1), and the latter in, for example, Ozawa and Araseki, "Low Bit Rate Multi-pulse Speech Coder with Natural Speech Quality" (Proc. I.C.A.S.S.P., paper No. 9.7, 1986) (Reference 2). To reduce the amount of transmitted information, these methods cut the sound source pulse information to be transmitted either by applying pitch prediction to the sound source pulse signal or by determining a pulse train for only one pitch section within a frame.

[Problems to Be Solved by the Invention]

However, in these conventional methods the length of the section over which the sound source pulses and filter coefficients are determined is fixed (20 ms in References 1 and 2). In vowel sections, where a nearly periodic waveform continues and the features of the speech change little, transmitting information every 20 ms is therefore very inefficient. In consonant sections, on the other hand, the analysis cannot follow the rapid changes in the speech features, and the sound quality deteriorates. This problem is especially pronounced when the bit rate is considerably lower than 8 kb/s.

To explain the problem more concretely: as is well known, a vowel section is generally long, 100 to 300 ms, although this depends on the speaking rate, and more than half of it can be regarded as a stationary section in which the features of the speech signal hardly change. It is also known that, in the stationary part of a vowel, syllable intelligibility hardly deteriorates even if the signal is suppressed to zero and no information is transmitted at all.

Naturalness, however, does deteriorate. Analyzing such a part and transmitting information for every short frame of about 20 ms, as the conventional methods do, is therefore very inefficient. In consonant sections, by contrast, the speech features change so quickly that a 20 ms frame is too long for an analysis that tracks the changes accurately, and the quality of the reproduced speech deteriorates.

To address these problems, a method has been proposed, for example in Markel and Gray, "Linear Prediction of Speech", Chapter 10 (Springer-Verlag, 1976) (Reference 3), that makes the frame length variable in integer multiples of a fixed interval, based on the frame-to-frame change in the difference of spectra obtained with fixed-length frames of about 10 ms. This remedy also has problems. The frame length is varied using feature parameters that do not correspond well to auditory perception, and the variation is tied to a fixed interval length and so has little freedom. As a result, when the bit rate is reduced by increasing the number of sections with a longer frame length, the sound quality deteriorates greatly.

An object of the present invention is to provide a speech signal encoding/decoding method, a speech signal encoding device, and a speech signal decoding device that can greatly reduce the amount of information required to transmit the sound source signal and that keep the perceptual degradation of the synthesized speech very small even when the bit rate is greatly reduced.

[Means for Solving the Problems]

The speech signal encoding/decoding method of the present invention is characterized by inputting a discrete speech signal; dividing the speech signal into non-uniform sections by a method that corresponds well to auditory characteristics; representing the sound source signal in all or some of the divided sections as a combination of a plurality of pulse trains and transmitting it; and restoring the sound source signal of the sections from the transmitted pulse trains to output a synthesized speech signal representing the speech signal.

The speech signal encoding device of the present invention is characterized by comprising: a segmentation circuit that extracts, from an input discrete speech signal sequence, feature parameters that correspond well to auditory characteristics and uses them to segment the speech signal into non-uniform time sections; a spectral parameter calculation circuit that calculates, from all or part of each divided section of the speech signal, spectral parameters representing the short-time spectral characteristics and a pitch parameter; a sound source pulse calculation circuit that calculates a combination of a plurality of pulse trains representing the sound source signal in all or part of each divided section; and a multiplexer circuit that combines and outputs the spectral parameters, the pitch parameter, and the sound source pulse train.

Further, the speech signal decoding device of the present invention is characterized by comprising: a demultiplexer circuit that receives spectral parameters representing the short-time spectral characteristics of a speech signal, a pitch parameter, and a sound source pulse train representing the sound source signal, and separates them; a sound source restoration circuit that restores the sound source signal of the whole of a non-uniformly divided section using the pitch parameter and the sound source pulse train; and a synthesis filter that synthesizes the speech signal of the section using the restored sound source signal.

[Operation]

Because the speech signal is encoded and decoded as described above, the amount of information required to transmit the sound source signal can be greatly reduced, and encoding and decoding with very little perceptual degradation of the synthesized speech is possible even when the bit rate is greatly reduced.

By having the segmentation circuit, spectral parameter calculation circuit, sound source pulse calculation circuit, and multiplexer circuit configured as above, the speech signal encoding device can perform the encoding described above.

By having the demultiplexer circuit, sound source restoration circuit, and synthesis filter configured as above, the speech signal decoding device can perform the corresponding decoding.

[Embodiment]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech signal encoding/decoding method, speech signal encoding device, and speech signal decoding device according to the present invention, and FIG. 2 is a block diagram used to explain its principle.

In the speech signal encoding/decoding method according to the present invention, the transmitting side inputs a discrete speech signal, divides it into non-uniform sections by a method that corresponds well to auditory characteristics, and represents the sound source signal in all or some of the divided sections as a combination of a plurality of pulse trains for transmission. The receiving side uses the pulse trains to restore the sound source signal of the sections and outputs a synthesized speech signal that represents the speech signal well.

First, the principle of the encoding process according to the present invention is explained with reference to FIG. 2(a). In the figure, a segmentation measure calculation unit 400 receives the speech signal and calculates a segmentation measure that corresponds well to auditory characteristics. It does so for every short interval (for example, 5 ms), chosen so that even consonant parts whose speech features change quickly can be analyzed accurately. Here the dynamic measure D(t) is used as this measure. Using the LPC cepstrum coefficients C_i (1 ≤ i ≤ p) obtained every 5 ms, it can be written as follows.

[Equation (1): not reproduced in this text]

Here, a_i is

[Equation (2): not reproduced in this text]
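Equations (1) and (2) are not legible in this text. As a reading aid only, the spectral transition measure of Reference 4 is commonly written as the sum of squared regression coefficients of the cepstral trajectories. The form below is an assumption based on that common formulation, not a transcription of the patent's own equations; the window half-width n_0 is a hypothetical symbol that does not appear in this text.

```latex
D(t) = \sum_{i=1}^{p} a_i(t)^2 ,
\qquad
a_i(t) = \frac{\sum_{n=-n_0}^{n_0} n \, C_i(t+n)}{\sum_{n=-n_0}^{n_0} n^2}
```

Under this reading, a_i(t) is the local slope of the i-th cepstral coefficient over time, so D(t) is large where the spectrum is changing quickly and small in stationary parts.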

This calculation is described in detail in Furui, "On the Role of Spectral Transition for Speech Perception" (J. Acoustical Society of America, vol. 80, pp. 1016-1025, 1986) (Reference 4), so the details are omitted here. Instead of equation (1), equation (3), which includes the power term a_0,

[Equation (3): not reproduced in this text]

or another suitable method can also be used.
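A minimal sketch of how such a dynamic measure could be computed from a sequence of cepstral frames, assuming the regression-slope formulation commonly associated with Reference 4. The function name, the `half_width` parameter, and the edge clamping are illustrative assumptions and are not taken from the patent.

```python
def dynamic_measure(cepstra, half_width=2):
    """Return an assumed dynamic measure D(t) for each frame t.

    cepstra: list of frames; each frame is a list of cepstral coefficients
        (include the power term as an extra coefficient to mimic eq. (3)).
    half_width: regression window half-width in frames (assumed value).
    """
    num_frames = len(cepstra)
    order = len(cepstra[0])
    norm = sum(n * n for n in range(-half_width, half_width + 1))
    measure = []
    for t in range(num_frames):
        total = 0.0
        for i in range(order):
            slope = 0.0
            for n in range(-half_width, half_width + 1):
                # Clamp at the edges so that every frame gets a value.
                idx = min(max(t + n, 0), num_frames - 1)
                slope += n * cepstra[idx][i]
            slope /= norm
            total += slope * slope
        measure.append(total)
    return measure
```

With 5 ms frames, `half_width=2` corresponds to a 25 ms regression window; the window length actually intended by the patent is not stated in this text.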

The segmentation unit 410 receives the segmentation measure and divides (segments) the speech signal non-uniformly, using the measure of equation (1) or (3). First, the speech signal is provisionally divided around each local maximum of the measure. As described in Reference 4, the portion extending a few tens of milliseconds before and after a local maximum of the measure corresponds approximately to the coarticulation between a consonant and a vowel (consonant to vowel, or vowel to consonant), and is reported to be perceptually very important for phoneme perception. The speech signal is therefore segmented at points, outside these perceptually important portions, where the measure stays small more or less continuously. FIG. 2(b) shows the result: the upper part shows the speech waveform, and the lower part shows the dynamic measure and an example of the segmentation.
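One possible reading of this segmentation rule can be sketched as follows: find the local maxima of the measure, protect a guard zone around each one, and place a boundary at the lowest point of the measure between neighbouring guard zones. The function name, the `guard` and `peak_floor` parameters, and the midpoint fallback are assumptions made for illustration; the patent does not give the exact rule.

```python
def segment_boundaries(measure, guard=4, peak_floor=0.0):
    """Return boundary frame indices placed in the valleys of the measure.

    measure: dynamic measure per frame (e.g. one value every 5 ms).
    guard: frames protected on each side of a peak (4 frames = 20 ms here).
    peak_floor: peaks at or below this value are ignored.
    """
    peaks = [
        t for t in range(1, len(measure) - 1)
        if measure[t] > peak_floor
        and measure[t] >= measure[t - 1]
        and measure[t] > measure[t + 1]
    ]
    boundaries = []
    for left, right in zip(peaks, peaks[1:]):
        lo, hi = left + guard + 1, right - guard
        if lo >= hi:
            # The protected zones overlap: fall back to the midpoint.
            boundaries.append((left + right) // 2)
            continue
        # Lowest value of the measure outside both protected zones.
        boundaries.append(min(range(lo, hi), key=lambda t: measure[t]))
    return boundaries
```

The boundaries then define sections of unequal length, each containing one perceptually important transition or one stationary stretch.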

Next, the LPC/pitch analysis unit 430 analyzes the speech signal over the whole segmented section, or over part of it, to obtain the LPC coefficients. When they are to be obtained from part of the speech signal, the cepstrum obtained by the segmentation unit 410 can instead be converted into LPC coefficients by a well-known method.

The pitch period is then calculated by a well-known method, and it is determined whether the segmented section is a stationary vowel part. This decision can be made, for example, according to whether the power within the segmented section and the value of the autocorrelation function at a lag of one pitch period (the pitch gain) exceed predetermined thresholds.
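The threshold test described above can be sketched as below. This is a minimal illustration that assumes the pitch gain is the normalized autocorrelation at the pitch lag; the function name and the threshold parameters are hypothetical, and suitable threshold values are not given in the patent.

```python
def is_stationary_vowel(samples, pitch_lag, power_threshold, gain_threshold):
    """Decide whether a section looks like a stationary vowel part.

    samples: speech samples of the segmented section.
    pitch_lag: pitch period in samples.
    """
    n = len(samples)
    power = sum(x * x for x in samples) / n
    # Normalized autocorrelation at a lag of one pitch period (pitch gain).
    num = sum(samples[k] * samples[k - pitch_lag] for k in range(pitch_lag, n))
    den = sum(samples[k - pitch_lag] ** 2 for k in range(pitch_lag, n))
    pitch_gain = num / den if den > 0 else 0.0
    return power > power_threshold and pitch_gain > gain_threshold
```

A strongly periodic, high-energy section passes both tests, while silence or a noisy consonant fails at least one of them.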

When the segmented section is a stationary vowel part, the sound source calculation unit 420 divides the section into subframes of one pitch period each and calculates a sound source pulse train for one of these pitch sections.

The sound source pulse train can be calculated as described in the specification of Japanese Patent Application No. 59-272435 (Reference 5).

For each of the other pitch sections, an amplitude correction coefficient is determined so that the waveform of that pitch section is represented well.
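The patent does not state the criterion used to choose the amplitude correction coefficient. One simple possibility, shown here purely as an assumption, is a least-squares gain that scales the waveform reproduced from the transmitted pitch section to match the waveform of another pitch section. The function and argument names are hypothetical.

```python
def amplitude_correction(target, reference):
    """Least-squares gain g minimizing sum((target - g * reference) ** 2).

    target: waveform of the pitch section to be approximated.
    reference: waveform reproduced from the transmitted pitch section.
    """
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return 0.0
    return sum(t * r for t, r in zip(target, reference)) / energy
```

One such coefficient per non-transmitted pitch section is far cheaper to send than a full pulse train, which is where the saving in a stationary vowel comes from.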

Therefore, according to the present invention, the number of sound source pulses in the one pitch section can be greatly increased even when the bit rate is greatly reduced compared with conventional methods. The sound source signal of the whole section can thus be represented well even though the other pitch sections are restored by amplitude correction or interpolation, as described later.

On the other hand, when the segmented section is not a stationary vowel part, the sound source pulse train is determined over the whole section.

The information transmitted from the transmitting side consists of the amplitudes and positions of the sound source pulse train, segmentation information indicating the length of the segmented section, the pitch period, the discrimination information, and the amplitude correction coefficients.
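As an illustration of what one transmitted section might carry, the record below groups the items listed above. The class name, field names, and types are hypothetical. The LPC coefficients are included because the multiplexer 600 described later also outputs them; quantization and bit allocation are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentPacket:
    """Hypothetical container for the data sent for one segmented section."""
    length: int                     # segmentation information (section length)
    is_stationary_vowel: bool       # discrimination information
    pitch_period: int               # pitch period in samples
    lpc: List[float]                # spectral (LPC) coefficients
    pulse_positions: List[int]      # positions of the sound source pulses
    pulse_amplitudes: List[float]   # amplitudes of the sound source pulses
    # One coefficient per pitch section; left empty for other sections.
    amplitude_corrections: List[float] = field(default_factory=list)
```

For a section that is not a stationary vowel part, the pulse lists would cover the whole section and `amplitude_corrections` would simply stay empty.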

On the receiving side, for a stationary vowel part, the pulse trains of the pitch sections that were not transmitted are restored, for example by changing the amplitudes and positions of the transmitted sound source pulse train smoothly from one pitch period to the next, or by interpolating the sound source signal between segmented sections. The sound source signal of the segmented section is thereby restored.

Next, the embodiment is described with reference to FIG. 1.

In FIG. 1, the transmitting side includes the speech signal encoding device and the receiving side includes the speech signal decoding device, with an appropriate transmission path between them.

The speech signal encoding device comprises: a segmentation circuit that extracts, from an input discrete speech signal sequence, feature parameters that correspond well to auditory characteristics and uses them to segment the speech signal into non-uniform time sections; a spectral parameter calculation circuit that calculates, from the divided speech signal, spectral parameters representing the short-time spectral characteristics and a pitch parameter; a sound source pulse calculation circuit that calculates a combination of a plurality of pulse trains representing the sound source signal in all or part of each divided section; and a multiplexer circuit that combines and outputs the spectral parameters, the pitch parameter, and the sound source pulse train.

The speech signal decoding device comprises: a demultiplexer circuit that receives spectral parameters representing the short-time spectral characteristics of a speech signal, a pitch parameter, and a sound source pulse train representing the sound source signal, and separates them; a sound source restoration circuit that restores the sound source signal of the whole of a non-uniformly divided section using the pitch parameter and the sound source pulse train; and a synthesis filter that synthesizes the speech signal of the section using the restored sound source signal.

The speech signal encoding and decoding are performed as follows.

In FIG. 1, which shows an embodiment of the present invention, a discrete speech signal is input from an input terminal 500. A segmentation measure calculation circuit 505 performs the same calculation as the segmentation measure calculation unit 400 of FIG. 2(a) and outputs the segmentation measure.

A segmentation circuit 510 performs the same processing as the segmentation unit 410 of FIG. 2(a): it segments the speech signal into non-uniform sections and outputs segmentation information indicating the length of each section together with the segmented speech signal. An LPC/pitch calculation circuit 520 performs the same processing as the LPC/pitch analysis unit 430 of FIG. 2(a): for the segmented speech signal it performs LPC analysis, calculates the pitch period, and determines whether the segmented section is a stationary vowel part. It outputs the LPC coefficients, the pitch period, and the discrimination information to a quantizer 530. The quantizer 530 quantizes this information with predetermined numbers of bits, outputs it to a multiplexer 600, and also dequantizes it.

A weighting circuit 540 weights the segmented speech signal using the dequantized LPC coefficients. For the weighting method, see the weighting circuit (200) of Reference 5. An impulse response calculation circuit 560 calculates an impulse response from the dequantized LPC coefficients; for the method, see the impulse response calculation circuit (170) of Reference 5. An autocorrelation function calculation circuit 570 calculates the autocorrelation function of the impulse response and outputs it to a sound source pulse calculation circuit 580; for the method, see the autocorrelation function calculation circuit (180) of Reference 5. A cross-correlation function calculation circuit 550 calculates the cross-correlation function between the weighted signal and the impulse response and outputs it to the sound source pulse calculation circuit 580; for the method, see the cross-correlation function calculation circuit (210) of Reference 5.
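Reference 5 is not reproduced here, so the following is only a sketch of a widely used correlation-based multi-pulse search that consumes exactly the two quantities named above: the cross-correlation between the weighted signal and the impulse response, and the autocorrelation of the impulse response. Whether the sound source pulse calculation circuit 580 works this way is an assumption; the function and parameter names are hypothetical.

```python
def multipulse_search(cross_corr, auto_corr, num_pulses):
    """Greedy pulse search; returns a list of (position, amplitude) pairs.

    cross_corr[m]: correlation of the weighted speech with the impulse
        response placed at position m.
    auto_corr[l]: autocorrelation of the impulse response at lag l.
    """
    residual = list(cross_corr)
    pulses = []
    for _ in range(num_pulses):
        # Pick the position where a single pulse reduces the error most.
        pos = max(range(len(residual)), key=lambda m: abs(residual[m]))
        amp = residual[pos] / auto_corr[0]
        pulses.append((pos, amp))
        # Remove this pulse's contribution from the cross-correlation.
        for m in range(len(residual)):
            lag = abs(m - pos)
            if lag < len(auto_corr):
                residual[m] -= amp * auto_corr[lag]
    return pulses
```

Each iteration needs only table lookups and one subtraction per position, which is why the correlations are computed once up front by separate circuits.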

When the segmented section is a stationary vowel part, the sound source pulse calculation circuit 580 divides the section into subframes of one pitch period each, as described in the explanation of FIG. 2(a), and calculates a sound source pulse train for the subframe near the center.

For the other subframes, one amplitude correction coefficient for the pulse train is determined per subframe, as described in the explanation of FIG. 2(a). When the section is not a stationary vowel part, the sound source pulse train is calculated for the whole section. For the calculation of the sound source pulse train, see the drive signal calculation circuit (220) of Reference 5. A quantizer 590 quantizes the amplitudes and positions of the sound source pulse train with predetermined numbers of bits and outputs them to the multiplexer 600; for its operation, see the encoding circuit (230) of Reference 5. The multiplexer 600 combines and outputs the quantized sound source pulse train, the LPC coefficients, the pitch period, the segmentation information, the discrimination information, and the amplitude correction coefficients.

On the receiving side, a demultiplexer 610 separates and outputs the sound source pulse information, the LPC coefficients, the pitch period, the segmentation information, the discrimination information, and the amplitude correction coefficients.

A sound source pulse decoder 620 decodes the amplitudes and positions of the sound source pulse train. An LPC/pitch decoder 640 decodes the LPC coefficients and the pitch period. A sound source restorer 630 receives the discrimination information and the segmentation information. When the section is a stationary vowel part, it restores the sound source signal of the whole segmented section from the decoded sound source pulse train of the one pitch section and outputs it. The pulse train of a pitch section that was not transmitted is restored by shifting all the pulses of the transmitted pitch section by the pitch period to recover the positions, and by multiplying by the amplitude correction coefficient to recover the amplitudes. Other methods are also known, such as restoration by interpolation using the sound source pulse trains of adjacent segmented sections; see Reference 5 for details. Other well-known methods can also be used. When the section is not a stationary vowel part, the received sound source pulse train is used to generate and output the sound source signal of the whole section. An interpolator 650 uses the decoded LPC coefficients, the discrimination information, and the pitch period. When the segmented section is a stationary vowel part, it interpolates the LPC coefficients in the PARCOR coefficient domain at every pitch period in order to smooth the spectral change.
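The restoration by pitch-period shifting and amplitude correction described above can be sketched as follows. This is a simplified illustration with hypothetical names: it assumes pulse positions are counted from the start of the transmitted pitch section and that one gain is available for every pitch section in time order, with 1.0 for the transmitted section itself.

```python
def restore_excitation(length, pitch_period, positions, amplitudes, gains):
    """Rebuild the sound source signal of a stationary vowel section.

    length: section length in samples (from the segmentation information).
    positions, amplitudes: pulses of the transmitted pitch section.
    gains: amplitude correction coefficient for each pitch section.
    """
    excitation = [0.0] * length
    for k, gain in enumerate(gains):
        offset = k * pitch_period  # shift the pulses by whole pitch periods
        for pos, amp in zip(positions, amplitudes):
            n = offset + pos
            if n < length:
                excitation[n] += gain * amp
    return excitation
```

Pulses that would fall beyond the end of the section are simply dropped in this sketch; how the patent handles a final partial pitch period is not stated.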

When the section is not a stationary vowel part, the coefficients are output to a synthesis filter 660 without interpolation. This is because the spectral features of the speech signal change quickly outside stationary vowel parts, so interpolation would instead introduce large distortion. The synthesis filter 660 synthesizes the speech signal over the whole segmented section using the LPC coefficients, the restored sound source signal, and the segmentation information, and outputs it through a terminal 670.
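The synthesis step can be illustrated with a plain all-pole filter driven by the restored sound source signal. The sketch assumes the sign convention s[n] = e[n] + sum of a_i * s[n - i], which the patent does not specify, and the `history` argument (for carrying filter state across sections) is a hypothetical addition.

```python
def synthesize(excitation, lpc, history=None):
    """Run an all-pole synthesis filter over one section.

    excitation: restored sound source signal for the section.
    lpc: predictor coefficients a_1..a_p (assumed sign convention).
    history: previous outputs, most recent first, to continue smoothly
        from the preceding section; zeros are used if omitted.
    """
    order = len(lpc)
    state = list(history) if history is not None else [0.0] * order
    output = []
    for e in excitation:
        s = e + sum(a * y for a, y in zip(lpc, state))
        output.append(s)
        if order:
            # Shift the new sample into the filter memory.
            state = [s] + state[:-1]
    return output
```

Because sections have unequal lengths, the segmentation information tells the filter how many samples to produce before switching to the next set of coefficients.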

As described above, with this configuration the speech signal is segmented non-uniformly using feature parameters that correspond well to auditory characteristics, and the spectral parameters are quantized by switching among several types of vector quantizers according to the spectral features of each segmented section. Furthermore, when a section is a stationary vowel part, in which the speech features hardly change and which is long in time, the sound source pulse train is determined for only one pitch section within it, whereas for other sections the sound source pulse train is determined over the whole section. The amount of information required to transmit the sound source signal can therefore be greatly reduced. Consequently, even when the bit rate is greatly reduced, the perceptual degradation of the synthesized speech is very small and high naturalness is obtained.

The embodiment described above is only one embodiment of the present invention, and various modifications are possible.

For example, when the segmented section is a stationary vowel part, the cross-correlation function calculation circuit 550 may calculate the cross-correlation function only for the one pitch section near the center of the section rather than for the whole section, because the sound source pulse train is actually determined only for that one pitch section. This degrades performance slightly, but reduces the amount of computation to approximately P/N of the original, where P is the pitch period and N is the length of the segmented stationary vowel section.
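As a rough worked example of the P/N estimate, assume an 8 kHz sampling rate (not stated in this text), a pitch period of 10 ms (P = 80 samples), and a stationary vowel section of 150 ms (N = 1200 samples). All three figures are illustrative assumptions.

```latex
\frac{P}{N} = \frac{80}{1200} \approx 0.067
```

Under these assumptions the cross-correlation would cost roughly one fifteenth of the full-section calculation.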

また、音源パルスの計算法としては上述の実施例の他に
周知の良好な方法を用いることもできる。In addition to the method of the embodiment described above, other well-known effective methods can also be used to calculate the sound source pulses.

これについては、K. Ozawa, "A Study of Pulse Search Algorithms for Multi-pulse Speech Codec Realization" (J. Selected Area of Communications, pp., 1987)（文献６）を参照することができる。For details, see K. Ozawa, "A Study of Pulse Search Algorithms for Multi-pulse Speech Codec Realization" (J. Selected Area of Communications, pp., 1987) (Reference 6).

また、セグメンテーションされた区間が母音定常部のと
きは、音源パルス列を求める１ピッチ区間の位置として
は、固定ではなく、最も良好な合成音声が得られるよう
なピッチ区間を探索して求めるようにすることもできる
。この処理によって音質はさらに良好になるが演算量は
若干増加する。Also, when the segmented section is a vowel stationary part, the position of the one pitch section for which the sound source pulse train is found need not be fixed; the pitch section that yields the best synthesized speech can instead be found by search. This processing further improves the sound quality, although the amount of calculation increases slightly.

具体的な方法については前記文献５を参照することがで
きる。For the specific method, see Reference 5 cited above.
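The idea of searching for the pitch section position can be illustrated with a deliberately simplified proxy. The sketch below is not the method of Reference 5: instead of computing sound source pulses and running the synthesis filter, it works directly on the waveform `x` of the section, repeats each candidate one-pitch interval periodically over the whole section, and keeps the candidate with the smallest squared error. The function names and the `step` parameter are hypothetical.

```python
def periodic_extension(segment, start, length):
    """Repeat a one-pitch segment over `length` samples, aligned so it sits at `start`."""
    p = len(segment)
    return [segment[(n - start) % p] for n in range(length)]


def best_pitch_interval(x, pitch, step=1):
    """Return (start, error) of the one-pitch interval whose repetition best matches x."""
    best_start, best_err = 0, float("inf")
    for start in range(0, len(x) - pitch + 1, step):
        y = periodic_extension(x[start:start + pitch], start, len(x))
        err = sum((a - b) ** 2 for a, b in zip(x, y))
        if err < best_err:  # strict comparison keeps the earliest best candidate
            best_start, best_err = start, err
    return best_start, best_err
```

Every candidate position costs one full pass over the section, which is consistent with the remark above that the search increases the amount of calculation; a larger `step` trades search resolution for fewer passes.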

また、合成フィルタ６６０の係数の補間法としては、対
数断面積比上や他のパラメータ上で補間することもでき
る。さらに補間法としては線形補間以外に対数補間等を
用いることもできる。これらの方法の詳細についてはB. S. Atal氏らによる "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" (J. Acoust. Soc. America, pp. 637-655, 1971)（文献７）を参照することができる。Further, the coefficients of the synthesis filter 660 can also be interpolated on log area ratios or on other parameters. As the interpolation method, logarithmic interpolation or the like can be used in addition to linear interpolation. For details of these methods, see B. S. Atal et al., "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" (J. Acoust. Soc. America, pp. 637-655, 1971) (Reference 7).
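Interpolation on log area ratios can be sketched as below. The sketch assumes that the synthesis filter is described by reflection (PARCOR) coefficients k with |k| < 1, which the text here does not state, and it uses the convention g = log((1 + k) / (1 - k)); the opposite sign convention is also in common use. The function names are hypothetical.

```python
import math


def reflection_to_lar(k):
    """Convert reflection coefficients (|k| < 1) to log area ratios."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]


def lar_to_reflection(g):
    """Inverse mapping: k = tanh(g / 2), which always satisfies |k| < 1."""
    return [math.tanh(gi / 2.0) for gi in g]


def interpolate_lar(k_a, k_b, t):
    """Linearly interpolate between two coefficient sets in the log-area-ratio domain.

    t = 0 returns k_a and t = 1 returns k_b (up to rounding).
    """
    g_a, g_b = reflection_to_lar(k_a), reflection_to_lar(k_b)
    g = [(1.0 - t) * a + t * b for a, b in zip(g_a, g_b)]
    return lar_to_reflection(g)
```

Because the inverse mapping is a hyperbolic tangent, every interpolated coefficient stays inside (-1, 1), so the interpolated all-pole filter remains stable at every intermediate point.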

また、受信側でピッチ周期を補間によって滑らかに変化
させることによって合成音質はさらに改善される。Furthermore, the synthesized sound quality is further improved by smoothly varying the pitch period by interpolation on the receiving side.
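A minimal sketch of such pitch period smoothing follows. The text only states that the pitch period is interpolated; the linear per-cycle rule, the function names, and the rounding of pulse positions to sample indices are assumptions made for illustration.

```python
def interpolate_pitch_periods(p_prev, p_next, n_cycles):
    """Pitch periods for n_cycles successive cycles, moving linearly from p_prev to p_next."""
    return [p_prev + (p_next - p_prev) * (i + 1) / n_cycles for i in range(n_cycles)]


def pulse_positions(periods, start=0):
    """Sample index at which each pitch cycle begins when the given periods are chained."""
    pos, out = start, []
    for p in periods:
        out.append(round(pos))
        pos += p
    return out
```

For example, moving from a period of 40 samples to 50 samples over five cycles gives periods of 42, 44, 46, 48 and 50 samples instead of an abrupt jump at the section boundary.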

〔発明の効果〕〔Effect of the invention〕

以上説明したように、本発明の音声信号符号化復号化方
法によれば、音声信号を符号化し伝送して復号化したと
き、音声信号を良好に表す合成音声信号を得ることがで
きる。従来の固定長フレームによるものや、あるいは固
定長フレームで求めたスペクトルのフレーム間での差分
の変化を基にフレーム長を可変にするものに比し音質の
劣化を少なくすることができる。聴覚の特性と対応づけ
のよい特徴パラメータを用いて音声信号を非一様に分割
することができると共に、分割された区間が音声の特徴
の変化が殆どなく時間的にも長い母音定常部のときはそ
の区間のうち１つのピッチ区間について音源パルス列を
求め、母音定常部以外のときは区間全体で音源パルスを
求めることが可能であり、音源信号伝送に必要な情報量
を大幅に低減することができると同時にビットレートを
大幅に下げても合成音声の聴覚的な劣化の非常に少なく
高い自然性が得られる符号化復号化処理を行うことがで
きるので、低いビットレートで効率的に符号化、復号化
する場合に適している。As described above, according to the audio signal encoding/decoding method of the present invention, when an audio signal is encoded, transmitted, and decoded, a synthesized audio signal that represents the audio signal well can be obtained. Degradation in sound quality is smaller than with conventional methods that use fixed-length frames, or with methods that vary the frame length based on changes in the inter-frame difference of spectra obtained with fixed-length frames. The audio signal can be divided non-uniformly using feature parameters that correspond well to auditory characteristics. In addition, when a divided section is a vowel stationary part, in which the speech features hardly change and which is long in time, the sound source pulse train can be found for one pitch section of that section, and otherwise the sound source pulses can be found for the entire section. This greatly reduces the amount of information required for sound source signal transmission and, at the same time, allows encoding and decoding that yields high naturalness with very little auditory degradation of the synthesized speech even when the bit rate is lowered substantially. The method is therefore suited to efficient encoding and decoding at low bit rates.

さらに本発明によれば、音声信号符号化復号化方法を実
施するのに好適な音声信号符号化装置及び復号化装置が
得られる。Further, according to the present invention, an audio signal encoding device and a decoding device suitable for implementing the audio signal encoding/decoding method can be obtained.

【図面の簡単な説明】[Brief explanation of the drawing]

第１図は本発明による音声信号符号化復号化方法並びに
音声信号符号化装置及び音声信号復号化装置の一実施例
の構成を示すブロック図、第２図は本発明の説明に供す
る原理ブロック図及び波形図である。FIG. 1 is a block diagram showing the configuration of an embodiment of an audio signal encoding/decoding method, an audio signal encoding device, and an audio signal decoding device according to the present invention, and FIG. 2 shows a principle block diagram and waveform diagrams used to explain the present invention.
４００　・・・セグメンテーション尺度計算部 400 ... Segmentation measure calculation unit
４１０　・・・セグメンテーション部 410 ... Segmentation unit
４２０　・・・音源計算部 420 ... Sound source calculation unit
４３０　・・・ＬＰＣ，ピッチ分析部 430 ... LPC and pitch analysis unit
５０５　・・・セグメンテーション尺度計算回路 505 ... Segmentation measure calculation circuit
５１０　・・・セグメンテーション回路 510 ... Segmentation circuit
５２０　・・・ＬＰＣ，ピッチ計算回路 520 ... LPC and pitch calculation circuit
５３０，５９０　・・・量子化器 530, 590 ... Quantizer
５４０　・・・重みづけ回路 540 ... Weighting circuit
５５０　・・・相互相関関数計算回路 550 ... Cross-correlation function calculation circuit
５６０　・・・インパルス応答計算回路 560 ... Impulse response calculation circuit
５７０　・・・自己相関関数計算回路 570 ... Autocorrelation function calculation circuit
６００　・・・マルチプレクサ 600 ... Multiplexer
６１０　・・・デマルチプレクサ 610 ... Demultiplexer
６２０，６４０　・・・復号器 620, 640 ... Decoder
６３０　・・・音源復元器 630 ... Sound source restorer
６５０　・・・補間器 650 ... Interpolator
６６０　・・・合成フィルタ 660 ... Synthesis filter
代理人　弁理士　岩佐　義幸 Agent: Patent Attorney Yoshiyuki Iwasa

Claims

【特許請求の範囲】[Claims]

（１）離散的な音声信号を入力し聴覚の特性と対応の良
い方法により前記音声信号を非一様な区間に分割し、そ
の分割された区間の全部または一部の区間における音源
信号を複数個のパルス列の組合せで表して伝送し、伝送されたパルス列を用いて前記区間の音源信号を復元
して前記音声信号を表す合成音声信号を出力する音声信
号符号化復号化方法。(1) An audio signal encoding/decoding method comprising: inputting a discrete audio signal; dividing the audio signal into non-uniform sections by a method that corresponds well to auditory characteristics; representing the sound source signal in all or some of the divided sections as a combination of a plurality of pulse trains and transmitting it; and restoring the sound source signal of the sections using the transmitted pulse trains and outputting a synthesized audio signal representing the audio signal.

（２）入力した離散的な音声信号系列から聴覚の特性と
対応の良い特徴パラメータを抽出しそのパラメータを用
いて音声信号を非一様な時間区間にセグメンテーション
するセグメンテーション回路と、分割された音声信号の全部または一部の区間から短時間
スペクトル特性を表すスペクトルパラメータとピッチパ
ラメータとを計算するスペクトルパラメータ計算回路と
、分割された区間の全部または一部の区間における音源信
号を表す複数個パルス列の組合せを計算する音源パルス
計算回路と、スペクトルパラメータとピッチパラメータと音源パルス
列を組み合わせて出力するマルチプレクサ回路とを有す
る音声信号符号化装置。(2) An audio signal encoding device comprising: a segmentation circuit that extracts, from an input discrete audio signal sequence, feature parameters that correspond well to auditory characteristics and uses those parameters to segment the audio signal into non-uniform time sections; a spectral parameter calculation circuit that calculates, from all or some of the sections of the divided audio signal, spectral parameters representing short-time spectral characteristics and a pitch parameter; a sound source pulse calculation circuit that calculates a combination of a plurality of pulse trains representing the sound source signal in all or some of the divided sections; and a multiplexer circuit that combines and outputs the spectral parameters, the pitch parameter, and the sound source pulse train.

（３）音声信号の短時間スペクトル特性を表すスペクト
ルパラメータとピッチパラメータと音源信号を表す音源
パルス列を入力して前記スペクトルパラメータとピッチ
パラメータと音源パルス列とを分離するデマルチプレク
サ回路と、ピッチパラメータと音源パルス列を用いて非一様に分割
された区間全体の音源信号を復元する音源復元回路と、復元された音源信号を用いて前記区間の音声信号を合成
する合成フィルタとを有する音声信号復号化装置。(3) An audio signal decoding device comprising: a demultiplexer circuit that receives spectral parameters representing short-time spectral characteristics of an audio signal, a pitch parameter, and a sound source pulse train representing a sound source signal, and separates the spectral parameters, the pitch parameter, and the sound source pulse train; a sound source restoration circuit that uses the pitch parameter and the sound source pulse train to restore the sound source signal of an entire non-uniformly divided section; and a synthesis filter that uses the restored sound source signal to synthesize the audio signal of the section.