JPH0279099A

JPH0279099A - Multi-pulse voice processor

Info

Publication number: JPH0279099A
Application number: JP63231250A
Authority: JP
Inventors: Yasuhiro Wake; 和気　靖浩
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-09-14
Filing date: 1988-09-14
Publication date: 1990-03-19
Anticipated expiration: 2011-01-29
Also published as: JPH087598B2

Abstract

PURPOSE: To improve synthesized sound quality at high pitch frequencies by comparing the largest absolute value of a cross-correlation function, which is updated successively while driving sound source pulses are searched for, with a predetermined threshold value, and continuing the search until that largest value becomes smaller than the threshold value. CONSTITUTION: A voice signal inputted from an input terminal 1 is supplied to a short-time spectrum information extraction part 2 and a cross-correlation function extraction part 3. The output of the short-time spectrum information extraction part 2 is supplied to an autocorrelation function extraction part 4 and the cross-correlation function extraction part 3, and the outputs of the cross-correlation function extraction part 3 and the autocorrelation function extraction part 4 are supplied to a driving sound source pulse search part 5. A cross-correlation function maximum absolute value calculation part 6 finds the maximum absolute value of the cross-correlation function, which is updated successively while the driving sound source pulses are searched for, and this maximum is compared with the predetermined threshold value. The driving sound source pulse search part 5 searches for driving sound source pulses until the maximum value becomes smaller than the threshold value. Consequently, the synthesized sound quality at high pitch frequencies is improved.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a voice processing device, and more particularly to a multi-pulse voice processing device that extracts and transmits the driving sound source pulses of speech.

[Conventional technology]

Conventionally, in this type of multi-pulse voice processing device, the number of driving sound source pulses to be obtained within one frame is determined in advance, and the device is configured to transmit that predetermined number of pulses. In other words, in the conventional multi-pulse voice processing device, the number of driving sound source pulses within one frame is always constant, regardless of the pitch of the input speech.

[Problem to be solved by the invention]

In the conventional multi-pulse voice processing device described above, the sound source pulse search section always produces a constant number of driving sound source pulses per frame, regardless of the magnitude of the difference signal between the input speech and the synthesized speech, or of the cross-correlation function corresponding to that difference signal. For speech with a low pitch frequency, the waveform can be reproduced well with the predetermined number of pulses. At high pitch frequencies, however, the number of driving sound source pulses is insufficient, the waveform cannot be reproduced faithfully, and the synthesized sound quality deteriorates.

[Means to solve the problem]

The multi-pulse voice encoding device of the present invention comprises, in addition to the elements of the conventional multi-pulse voice processing device, means for obtaining the maximum absolute value of a cross-correlation function that is updated successively while driving sound source pulses are searched for, and means for comparing that maximum absolute value with a predetermined threshold. The device is characterized in that it searches for driving sound source pulses until the maximum absolute value of the cross-correlation function becomes equal to or less than the threshold. Furthermore, the driving sound source pulses are quantized and encoded according to their number: where the number of driving sound source pulses is large, each pulse is quantized with fewer bits. As a result, the overall transmission rate is always kept constant, regardless of the number of driving sound source pulses to be transmitted.

[Embodiment]

Next, an embodiment of the present invention will be described with reference to the drawings.

Referring to FIG. 1, a voice signal input from an input terminal 1 is supplied to a short-time spectrum information extraction section 2 and a cross-correlation function extraction section 3. The output of the short-time spectrum information extraction section 2 is supplied to an autocorrelation function extraction section 4 and to the cross-correlation function extraction section 3. The output of the cross-correlation function extraction section 3 and the output of the autocorrelation function extraction section 4 are each supplied to a driving sound source pulse search section 5. A cross-correlation function absolute maximum value calculation section 6 obtains the maximum absolute value of the cross-correlation function, which is updated successively while driving sound source pulses are searched for. A threshold comparison section 7 compares this maximum absolute value of the cross-correlation function with a predetermined threshold.
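The two correlation functions that feed the pulse search can be sketched as follows. This is a minimal illustration only: it assumes the short-time spectrum information is a set of linear prediction coefficients defining an all-pole synthesis filter, and it omits any perceptual weighting. The function names, the sign convention of the coefficients, and the truncation length `h_len` are hypothetical and do not come from the specification.

```python
def impulse_response(lpc, length):
    """Impulse response of the all-pole synthesis filter 1/A(z), assuming
    A(z) = 1 - sum_i lpc[i-1] * z^-i (sign convention is an assumption)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc += a * h[n - i]
        h.append(acc)
    return h


def autocorrelation(h, max_lag):
    """phi_hh[k] = sum_n h[n] * h[n + k] for k = 0..max_lag."""
    return [sum(h[n] * h[n + k] for n in range(len(h) - k))
            for k in range(max_lag + 1)]


def cross_correlation(s, h):
    """phi_sh[m] = sum_n s[n] * h[n - m]: correlation of the frame s with a
    copy of h that starts at sample m (truncated at the frame end)."""
    frame_len = len(s)
    return [sum(s[n] * h[n - m] for n in range(m, min(frame_len, m + len(h))))
            for m in range(frame_len)]


def normalized_correlations(s, lpc, h_len):
    """Return (phi_sh, phi_hh), both normalized by phi_hh(0)."""
    h = impulse_response(lpc, h_len)
    phi_hh = autocorrelation(h, len(s) - 1)
    phi_sh = cross_correlation(s, h)
    energy = phi_hh[0]
    return [x / energy for x in phi_sh], [x / energy for x in phi_hh]
```

With this normalization, a frame consisting of a single scaled copy of the impulse response gives a cross-correlation peak whose position and height equal the position and amplitude of that copy, which is the property the pulse search relies on.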

The driving sound source pulse search section 5 obtains the driving sound source pulses one after another, based on the following equation (1) and the flowchart shown in FIG. 2.

$$ g_K = \phi_{sh}(m_K) \qquad (1) $$

where $g_K$ is the driving sound source pulse amplitude, $m_K$ is the driving sound source pulse position, $\phi_{sh}$ is the cross-correlation function normalized by $\phi_{hh}(0)$, and $\phi_{hh}$ is the autocorrelation function normalized by $\phi_{hh}(0)$.

The number of sound source pulses and the sound source pulses found by the driving sound source pulse search section 5 are input to a quantizer 8.

The quantizer 8 determines the number of sound source pulses from the number of bits allocated to pulses over the entire frame and the number of pulses to be transmitted, quantizes and encodes the pulses, and outputs them to an output terminal 9 together with the quantization information.

In FIG. 2, $g_0 = 0$, $\mathrm{MAX}\{\mathrm{ABS}[\phi_{sh}]\}$ is the maximum absolute value of the cross-correlation function, and $Th$ is the threshold.

The flowchart shown in FIG. 2 also includes a guard on the pulse search time, which is limited in a voice encoding device operating in real time.

This limit on the pulse search time also fixes the maximum number of pulses that can be transmitted, so the bit allocation of the quantizer 8 can be tabulated in advance.
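The search procedure described above can be sketched as a loop that stops when the largest absolute value of the working cross-correlation falls to the threshold or below. This is a sketch under stated assumptions, not the flowchart of FIG. 2 itself: the update rule (subtracting the new pulse's contribution, scaled by the autocorrelation, from the cross-correlation) is the usual correlation-domain multi-pulse update and is assumed here, and `max_pulses` is a hypothetical stand-in for the search-time guard.

```python
def search_pulses(phi_sh, phi_hh, threshold, max_pulses):
    """Greedy multi-pulse search that stops once max |phi_sh| <= threshold.

    phi_sh, phi_hh: correlations normalized by phi_hh(0).
    max_pulses: assumed cap standing in for the real-time search guard.
    Returns a list of (position, amplitude) pairs in the order found.
    """
    phi = list(phi_sh)  # working copy, updated after every pulse
    pulses = []
    while len(pulses) < max_pulses:
        # Position of the largest absolute cross-correlation value.
        m_k = max(range(len(phi)), key=lambda m: abs(phi[m]))
        if abs(phi[m_k]) <= threshold:
            break  # residual correlation is small enough
        g_k = phi[m_k]  # pulse amplitude, as in equation (1)
        pulses.append((m_k, g_k))
        # Assumed update: remove this pulse's contribution everywhere.
        for m in range(len(phi)):
            lag = abs(m - m_k)
            if lag < len(phi_hh):
                phi[m] -= g_k * phi_hh[lag]
    return pulses
```

A frame whose cross-correlation stays large after many pulses, as is typical for high-pitched speech, therefore receives more pulses than a frame whose cross-correlation drops below the threshold quickly.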

For example, with the bit allocation shown in Table 1, the number of driving sound source pulses increases by up to 48%. This increase is sufficient to compensate for the loss in synthesized sound quality caused by the reduced number of coding bits per sound source pulse. Table 1 applies to the case of 16 kbps and 20 msec/frame.
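The constant-rate trade-off between pulse count and bits per pulse can be illustrated with a little arithmetic. At 16 kbps and 20 msec/frame, each frame carries 320 bits in total. The sketch below assumes, purely for illustration, that a fixed share of those bits (240 here) is reserved for the pulses and divided evenly among them; that share and the pulse counts are hypothetical and are not the values of Table 1.

```python
def frame_bits(bit_rate_bps, frame_ms):
    """Total bits available per frame at a constant bit rate."""
    return bit_rate_bps * frame_ms // 1000


def allocation_table(pulse_budget_bits, pulse_counts):
    """Bits per pulse for each candidate pulse count, under a fixed budget.

    pulse_budget_bits is an assumed share of the frame reserved for pulses.
    """
    return {n: pulse_budget_bits // n for n in pulse_counts}


total = frame_bits(16000, 20)                    # 320 bits per frame
table = allocation_table(240, [16, 20, 24])      # hypothetical budget
print(total, table)
```

Because the pulse count is bounded by the search-time guard, such a table has a fixed, small number of rows and can be prepared in advance, as the text notes for the quantizer 8.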

Table 1 (not reproduced)

[Effects of the Invention]

As explained above, the present invention examines the maximum absolute value of the cross-correlation function, which is updated successively during the sound source pulse search, and thereby makes both the number of driving sound source pulses and the number of coding bits per driving sound source pulse variable. This improves the synthesized sound quality in cases where a predetermined number of sound source pulses would be insufficient, in particular for speakers with a high pitch frequency, such as female speakers.

[Brief explanation of the drawings]

FIG. 1 is a block diagram showing the configuration of a multi-pulse voice processing device according to an embodiment of the present invention, and FIG. 2 is a flowchart of the sound source pulse search in this embodiment.

1: input terminal; 2: short-time spectrum information extraction section; 3: cross-correlation function extraction section; 4: autocorrelation function extraction section; 5: driving sound source pulse search section; 6: cross-correlation function absolute maximum value calculation section; 7: threshold comparison section; 8: sound source pulse quantizer; 9: output terminal.

Claims

[Scope of Claims]

1. A multi-pulse voice processing device that divides input speech into frames of fixed duration and, for each frame, extracts and transmits driving sound source pulses of the input speech, the device comprising: short-time spectral information extraction means for extracting short-time spectral information from the input speech of each frame; autocorrelation function extraction means for obtaining the autocorrelation function of the impulse response of a synthesis filter formed from the short-time spectral information; cross-correlation function extraction means for obtaining a cross-correlation function from the input speech, the short-time spectral information, and the autocorrelation function; and driving sound source pulse search means for obtaining the driving sound source pulses from the cross-correlation function and the autocorrelation function, characterized in that the device further comprises: cross-correlation function absolute maximum value calculation means for obtaining the maximum absolute value of the cross-correlation function, which is updated successively while the driving sound source pulse search means obtains the driving sound source pulses; and threshold comparison means for comparing the maximum absolute value of the cross-correlation function with a predetermined threshold, wherein the driving sound source pulse search means searches for driving sound source pulses until the maximum absolute value of the cross-correlation function becomes equal to or less than the threshold.