JPH043560B2


Info

Publication number: JPH043560B2
Application number: JP58205242A
Authority: JP
Priority date: 1983-11-01
Filing date: 1983-11-01
Publication date: 1992-01-23
Also published as: JPS6097399A

Description

[Detailed Description of the Invention]

The present invention relates to a multi-pulse vocoder. Vocoders of the following kind are well known. The input speech signal is analyzed, and the analysis side extracts the spectral envelope information and excitation information that make up its speech information. This information is sent over a transmission path to the synthesis side, where the input speech signal is reproduced.

The spectral envelope information represents the spectral distribution of the vocal tract system that produces the input speech signal. It is usually expressed by a set of LPC coefficients obtained by LPC analysis, such as α parameters or κ parameters, whose number equals the analysis order.

The excitation information represents the fine structure of the spectral envelope. It is the so-called residual signal, that is, the input speech signal with its spectral distribution information removed. It carries the excitation strength, the pitch period, and the voiced/unvoiced decision of the input speech signal. It is also well known that this information is usually extracted from the autocorrelation coefficients computed for each analysis frame of the input speech signal.
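A common way to obtain LPC coefficients from per-frame autocorrelation coefficients is the Levinson-Durbin recursion. It yields both predictor (α) coefficients and reflection (κ, PARCOR) coefficients.

The sketch below is a textbook version, not a procedure taken from this patent. It assumes the predictor convention x(n) ≈ Σ a_k x(n−k); the sign convention for κ differs between texts. The function names are hypothetical.

```python
def autocorr(x, p):
    """Short-time autocorrelation r[0..p] of one analysis frame."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the LPC normal equations from autocorrelation r[0..p].

    Returns (alpha, kappa, err):
      alpha[k-1] is a_k in x(n) ~ sum_k a_k x(n-k) (assumed convention),
      kappa are the reflection (PARCOR) coefficients, one per stage,
      err is the final prediction-error power.
    """
    a = [0.0] * (p + 1)  # a[0] unused; a[k] holds a_k
    err = r[0]
    kappa = []
    for i in range(1, p + 1):
        # Correlation not yet explained by the order-(i-1) predictor.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        kappa.append(k)
    return a[1:], kappa, err
```

For a first-order autoregressive source with r = [1, 0.5, 0.25], the recursion returns a_1 = 0.5, a_2 = 0 and a residual power of 0.75.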

On the synthesis side of the vocoder, the spectral envelope information is used as the coefficients of an LPC synthesizer. This synthesizer usually forms an approximate vocal tract model with an all-pole digital filter. The excitation information is used as the driving excitation of that filter, which then synthesizes the input speech signal.
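The all-pole synthesis step can be sketched as a direct recursion. This is a minimal illustration with two assumptions. First, it uses the predictor convention A(z) = 1 − Σ a_k z^-k, so the filter computes y(n) = d(n) + Σ a_k y(n−k). Second, the filter starts from a zero initial state. The function name is hypothetical.

```python
def lpc_synthesize(alpha, excitation):
    """All-pole LPC synthesis 1 / (1 - sum_k a_k z^-k), zero initial state.

    alpha[k-1] is the predictor coefficient a_k (assumed convention).
    """
    p = len(alpha)
    y = []
    for n, d in enumerate(excitation):
        acc = d
        # Feedback from the p previous output samples (the "vocal tract").
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += alpha[k - 1] * y[n - k]
        y.append(acc)
    return y
```

For example, `lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0])` returns the decaying response `[1.0, 0.5, 0.25, 0.125]`.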

The conventional LPC vocoder built in this way can synthesize speech at low bit rates of about 4 kbit/s or less and is widely used. However, high-quality synthesis is difficult with it even at high bit rates.

The reason is that the excitation is modeled very simply. Voiced speech is represented approximately by a single impulse train at the extracted pitch period. Unvoiced speech, which has no regular period, is represented approximately by white noise. The excitation information of the input speech signal is therefore not extracted faithfully, and the waveform information it contains is neither analyzed nor synthesized.
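The two-state excitation model described above can be sketched as follows. The pitch period, the gain, and the use of Gaussian noise for "white noise" are illustrative assumptions. The point is that only these few parameters reach the synthesizer, not the residual waveform itself.

```python
import random


def conventional_excitation(n_samples, voiced, pitch_period=None, gain=1.0, seed=0):
    """Classic two-state LPC-vocoder excitation (illustrative sketch).

    voiced:   a single impulse train, one impulse every pitch_period samples.
    unvoiced: white noise (Gaussian here, an assumption of this sketch).
    """
    if voiced:
        if not pitch_period or pitch_period <= 0:
            raise ValueError("voiced excitation needs a positive pitch_period")
        return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)  # seeded so the noise is reproducible
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
```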

The multi-pulse vocoder has recently become well known as a vocoder that addresses this problem. It transmits waveform information and synthesizes the input speech signal from it.

Fig. 1 is a block diagram showing the basic configuration of a conventional multi-pulse vocoder.

The LPC synthesizer 1 contains an all-pole digital filter that simulates the vocal tract. The input speech signal x(n) (n = 1, 2, 3, ...) is applied to the input terminal 2001 and analyzed frame by frame by the LPC analyzer 2. The resulting LPC coefficients serve as the filter's coefficients.

The excitation pulse generator 3 derives a driving excitation sequence V(n) from the excitation information of the input speech signal. This sequence consists of several impulses, that is, a multipulse, and is supplied to the LPC synthesizer 1 as its driving excitation.

The LPC synthesizer 1 uses these LPC coefficients as the coefficients of its synthesis filter, normally an all-pole digital filter. Driven by the multipulse excitation, it outputs the synthesized signal x̂(n). Because the multipulse contains waveform information of the input speech signal, the LPC synthesizer 1 synthesizes the input speech signal including that waveform information.

The subtracter 4 takes the difference between the synthesized signal x̂(n) from the LPC synthesizer 1 and the input speech signal x(n). It sends the resulting error e(n) to the perceptual weighting unit 5.

The perceptual weighting unit 5 applies perceptual weighting to the error e(n) using a weighting filter whose characteristic W(z) is given by equation (1). It then sends the result to the squared-error minimizer 6.

$$W(z) = \frac{1 - \sum_{k=1}^{p} a_k z^{-k}}{1 - \sum_{k=1}^{p} a_k \gamma^{k} z^{-k}} \qquad (1)$$

The symbols in equation (1) are as follows:
- a_k are the LPC coefficients used as the coefficients of the all-pole digital filter of the LPC synthesizer 1.
- p is the filter order, and hence the LPC analysis order.
- γ is the weighting coefficient.
- z = exp(jλ) is the variable of the z-transform representation of the all-pole filter's transfer function H(z^-1), where λ = 2πΔTf, ΔT is the sampling period of the analysis frame, and f is frequency.

The weighting coefficient γ in equation (1) is set in the range 0 < γ < 1.

W(z) in equation (1) ranges from W(z) = 1 for γ = 1 to W(z) = 1 − P(z) for γ = 0, where P(z) = Σ_{k=1}^{p} a_k z^-k.

Within this range, γ is chosen according to how strongly to suppress the excessive levels that appear in the formant regions of the error spectrum of e(n). It thus provides the perceptual weighting of the signal to be synthesized. Its optimum value is usually selected beforehand by listening tests.
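The weighting filter of equation (1) can be realized as an FIR numerator followed by an all-pole denominator whose taps are a_k γ^k. The sketch below assumes a zero initial state and the same sign convention as equation (1); the function name is hypothetical. It shows the two limits stated above: γ = 1 leaves the error unchanged, and γ = 0 reduces W(z) to 1 − P(z).

```python
def perceptual_weight(alpha, gamma, e):
    """Apply W(z) = (1 - sum a_k z^-k) / (1 - sum a_k gamma^k z^-k) to e(n).

    Difference equation (zero initial state):
      y(n) = e(n) - sum_k a_k e(n-k) + sum_k a_k gamma^k y(n-k)
    """
    p = len(alpha)
    out = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += -alpha[k - 1] * e[n - k]                   # numerator (FIR)
                acc += alpha[k - 1] * gamma ** k * out[n - k]     # denominator (feedback)
        out.append(acc)
    return out
```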

The weighted error is sent to the squared-error minimizer 6. The minimizer determines the driving excitation sequence V(n) output by the excitation pulse generator 3, that is, the optimum time positions and amplitudes of the multipulse. It computes the squared error ε of equation (2) and selects V(n) so as to minimize ε.

$$\varepsilon = \sum_{n=1}^{N} \left[ e(n) * w(n) \right]^2 \qquad (2)$$

In equation (2), the symbol * denotes convolution with the weighting filter of the perceptual weighting unit 5, whose impulse response is w(n). N is the length of the interval over which the multipulse is computed.
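Equation (2) can be evaluated directly once w(n) is available as a finite impulse response. This sketch makes three assumptions: w is a truncated impulse response, the convolution is causal with zero initial state, and n runs from 0 instead of 1. The function name is hypothetical.

```python
def weighted_squared_error(e, w, N):
    """epsilon = sum_{n=0}^{N-1} [(e * w)(n)]^2, equation (2) with 0-based n."""
    total = 0.0
    for n in range(N):
        # Causal convolution of the error with the weighting impulse response.
        conv = sum(e[n - j] * w[j] for j in range(len(w)) if 0 <= n - j < len(e))
        total += conv * conv
    return total
```

For instance, `weighted_squared_error([1.0, 0.0, 0.0], [1.0, 0.5], 3)` gives 1.0² + 0.5² = 1.25.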

This processing is repeated pulse by pulse, so that analysis-by-synthesis is performed for each pulse of the multipulse. This is the so-called Analysis-by-Synthesis method (hereinafter the A-b-S method).

As the description above makes clear, the A-b-S method runs a loop of pulse generation, squared-error calculation, and pulse position/amplitude adjustment for every single pulse. It is effective in the low-bit-rate region, but it requires an extremely large amount of computation.
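The per-pulse A-b-S loop can be illustrated with a brute-force search for a single pulse. For each candidate position, the pulse response is synthesized, its amplitude is fitted, and the error is computed.

Two simplifications are ours, not the source's. First, the weighted input x_w and weighted impulse response h_w are taken as precomputed. Second, the amplitude is fitted in closed form by least squares instead of by iterative adjustment. Even so, every candidate costs a full synthesis and error pass, so one pulse already needs on the order of M² operations for a frame of M samples.

```python
def abs_single_pulse(x_w, h_w):
    """Brute-force analysis-by-synthesis search for one pulse (sketch).

    Returns (position, amplitude, squared_error) of the best single pulse.
    """
    M = len(x_w)
    best = None
    for m in range(M):
        # Synthesize the weighted response of a unit pulse at position m.
        s = [h_w[n - m] if 0 <= n - m < len(h_w) else 0.0 for n in range(M)]
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue
        g = sum(a * b for a, b in zip(x_w, s)) / energy  # least-squares amplitude
        err = sum((a - g * b) ** 2 for a, b in zip(x_w, s))
        if best is None or err < best[2]:
            best = (m, g, err)
    return best
```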

The A-b-S method is described in detail in, for example, B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proc. ICASSP 82, pp. 614-617 (1982).

To overcome this drawback of the conventional A-b-S method, an algorithm has recently been introduced that efficiently computes the optimum multipulse from correlation calculations. It works as follows.

The input speech signal x(n) is divided into processing frames of N samples each, and the multipulse is computed jointly for each frame.

Suppose there are k excitation pulses in one analysis frame. Let the i-th pulse lie at time position m_i, measured from the frame boundary, and have amplitude g_i. The driving excitation d(n) of the LPC synthesis filter is then given by equation (3).

$$d(n) = \sum_{i=1}^{k} g_i \, \delta_{n, m_i} \qquad (3)$$

In equation (3), δ_{n,m_i} is the Kronecker delta: δ_{n,m_i} = 1 for n = m_i and δ_{n,m_i} = 0 for n ≠ m_i.
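Equation (3) places each amplitude g_i at its position m_i. Convolving the result with a truncated impulse response h(n), as in equation (4), gives the synthesized signal.

The sketch uses 0-based sample indices and truncates the output to the frame length; both are assumptions. The helper names are hypothetical.

```python
def pulse_excitation(positions, amplitudes, n_samples):
    """d(n) = sum_i g_i * delta(n, m_i), equation (3) with 0-based n."""
    d = [0.0] * n_samples
    for m, g in zip(positions, amplitudes):
        d[m] += g
    return d


def synthesize_fir(d, h):
    """x_hat(n) = sum_l d(l) * h(n - l), equation (4), truncated to len(d)."""
    out = []
    for n in range(len(d)):
        out.append(sum(d[l] * h[n - l] for l in range(n + 1) if n - l < len(h)))
    return out
```

For example, a single pulse of amplitude 2 at position 1 with h = [1.0, 0.5] synthesizes `[0.0, 2.0, 1.0, 0.0]`.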

The LPC synthesis filter is driven by this excitation d(n) and outputs the synthesized signal x̂(n).

Take, for example, an all-pole digital filter as the LPC synthesis filter, and represent its transfer function by the impulse response h(n) (0 ≤ n ≤ M − 1). The synthesized signal x̂(n) is then given by equation (4).

$$\hat{x}(n) = \sum_{l=0}^{M-1} d(l)\, h(n-l) \qquad (4)$$

In equation (4), d(l) is the driving excitation.

Next, apply perceptual correction to the error between the input speech signal x(n) and the synthesized signal x̂(n), and call the resulting weighted error e_w(n). It is given by equation (5).

$$e_w(n) = \{x(n) - \hat{x}(n)\} * w(n) \qquad (5)$$

From equation (5), the squared error is:

$$\sum_{n=1}^{M} e_w^2(n) = \sum_{n=1}^{M} \left[ \{x(n) - \hat{x}(n)\} * w(n) \right]^2 \qquad (6)$$

In equation (6), M is the number of samples in the interval over which the error is minimized, chosen, for example, as one analysis frame. The optimum excitation pulse train (the multipulse) is obtained by finding the g_i that minimize equation (6). From equations (3), (4), and (6), these g_i are derived as equation (7).

$$g_i(m_i) = \frac{\sum_{n=1}^{M} x_w(n)\, h_w(n-m_i) - \sum_{l=1}^{i-1} g_l \sum_{n=1}^{M} h_w(n-m_l)\, h_w(n-m_i)}{\sum_{n=1}^{M} h_w(n-m_i)\, h_w(n-m_i)} \qquad (7)$$

In equation (7), x_w(n) = x(n) * w(n) and h_w(n) = h(n) * w(n).

- The first term of the numerator is the cross-correlation function φ_hx(m_i) between x_w(n) and h_w(n) at time lag m_i.
- The inner sum Σ_{n=1}^{M} h_w(n − m_l) h_w(n − m_i) in the second term is the covariance function φ_hh(m_l, m_i) of h_w(n), with 1 ≤ m_l, m_i ≤ M.

The covariance function φ_hh(m_l, m_i) is equal to the autocorrelation function R_hh(|m_l − m_i|). Equation (7) can therefore be written as equation (8):

$$g_i(m_i) = \frac{\phi_{hx}(m_i) - \sum_{l=1}^{i-1} g_l \, R_{hh}(|m_l - m_i|)}{R_{hh}(0)} \qquad (8)$$

According to equation (8), once a pulse is placed at time position m_i, its optimum amplitude g_i(m_i) can be determined. In equation (8), 1 ≤ m_i ≤ M.

In other words, for a given excitation pulse, its amplitude is computed with equation (8) at each candidate time position. The position where the absolute value of that amplitude is largest gives the pulse that minimizes the squared error of equation (6). Repeating this procedure yields the multiple excitation pulses.
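The correlation-based search of equations (7) and (8) can be sketched as below. The sketch makes these assumptions:
- Positions are 0-based.
- x_w and h_w are already perceptually weighted.
- The cross-correlation runs over one frame of M = len(x_w) samples.
- As in equation (8), the covariance term is replaced by the autocorrelation R_hh, so every candidate is normalized by R_hh(0).

The function name is hypothetical.

```python
def multipulse_search(x_w, h_w, num_pulses):
    """Correlation-based multipulse search following equation (8).

    x_w: weighted input x(n)*w(n) for one frame (length M).
    h_w: weighted impulse response h(n)*w(n).
    Returns (positions, amplitudes) with 0-based positions.
    """
    M = len(x_w)
    L = len(h_w)

    def h(n):
        return h_w[n] if 0 <= n < L else 0.0

    # phi_hx(m): cross-correlation between x_w and h_w at lag m.
    phi_hx = [sum(x_w[n] * h(n - m) for n in range(M)) for m in range(M)]
    # R_hh(k): autocorrelation of h_w, used in place of the covariance phi_hh.
    R_hh = [sum(h(n) * h(n + k) for n in range(L)) for k in range(M)]

    positions, amplitudes = [], []
    for _ in range(num_pulses):
        best_m, best_g = None, 0.0
        for m in range(M):
            # Numerator of (8): remove the contribution of pulses already found.
            num = phi_hx[m] - sum(
                g_l * R_hh[abs(m_l - m)] for m_l, g_l in zip(positions, amplitudes)
            )
            g = num / R_hh[0]
            if best_m is None or abs(g) > abs(best_g):
                best_m, best_g = m, g
        positions.append(best_m)
        amplitudes.append(best_g)
    return positions, amplitudes
```

Because R_hh(0) is the same for every candidate, picking the largest |g_i(m_i)| is the same as picking the largest |numerator|. Each pulse therefore costs one pass over the M candidates using precomputed correlations, instead of one full synthesis per candidate.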

This algorithm is described in detail in Ozawa, Araseki, and Ono, "A Study of a Multipulse-Driven Speech Coding Method" (マルチパルス駆動形音声符号化法の検討), IECE Technical Group on Communication Systems, March 1983.

With this algorithm, the optimum multipulse can be computed from a cross-correlation function, an autocorrelation function, and a maximum search alone. The result is a multi-pulse vocoder with a much simpler configuration and a greatly reduced amount of computation.

Even this improved multi-pulse vocoder, however, still has the following drawback.

The power may change abruptly within an analysis frame, for example at the beginning of a word or at a plosive. In that case the pulses concentrate in the higher-power part of the frame, and none are placed in the lower-power part of the same frame. As a result, the speech in the lower-power part is not reproduced on the synthesis side, and its intelligibility suffers.

Fig. 2 shows an example of a pulse train extracted with the algorithm of Ozawa et al. It was obtained from part of the sentence "He took a walk every morning," analyzed with a frame period of 20 ms.

In Fig. 2, 51, 52, ..., 55 denote analysis frame intervals. 56 is the analysis result, with pulse time position on the horizontal axis and pulse amplitude on the vertical axis.

In analysis frames 51, 53, and 55, the pulses are distributed evenly in time. Frames 52 and 54, however, contain an abrupt power change. In these frames no pulses are generated in the first half, and the pulses are concentrated in the second half. The speech in the intervals corresponding to the first halves of frames 52 and 54 is therefore not reproduced on the synthesis side, and the intelligibility of those intervals is impaired.

An object of the present invention is to eliminate this drawback and to provide a multi-pulse vocoder of simple configuration that greatly reduces the resulting degradation of synthesized speech quality. The degradation in question arises when the power changes abruptly within an analysis frame, so that pulses concentrate in the higher-power part of the frame and are absent from the lower-power part.

The multi-pulse vocoder of the present invention operates as follows:
- It performs LPC analysis on the input speech signal for each analysis frame and uses the extracted LPC coefficients as spectral envelope information.
- For each analysis frame, it represents the excitation information by multiple impulses (a multipulse) whose time positions and amplitudes correspond to the features of that excitation information. The excitation information, together with the spectral envelope information, makes up the speech information of the input speech signal.
- It analyzes and synthesizes the input speech signal in this form.

On the analysis side, the vocoder comprises means for determining the fluctuation rate of the short-time power within each analysis frame of the input speech signal. When this fluctuation rate exceeds a preset value, the means either corrects the short-time power fluctuation within the frame or divides the frame into several analysis intervals, and then determines the impulses.

The present invention will now be described in detail with reference to the drawings. Fig. 3 is a block diagram of an embodiment of the analysis side of a multi-pulse vocoder according to the present invention. Fig. 4 is a block diagram of an embodiment of its synthesis side.

The analysis side of the multi-pulse vocoder according to the present invention, shown in Fig. 3, comprises:
- an LPC analyzer 7,
- a cross-correlation function calculator 8,
- a power calculator 9,
- an encoder (1) 10,
- an autocorrelation function calculator 11,
- an excitation pulse generator 12,
- an encoder (2) 13,
- a power fluctuation corrector 14, and
- a multiplexer 15.

The input speech signal applied to the input terminal 7001 is supplied to the LPC analyzer 7, the power calculator 9, and the power fluctuation corrector 14.

For each analysis frame, the LPC analyzer 7 quantizes the input speech signal into digital values with a preset number of bits and performs LPC analysis on the quantized signal. It extracts p-th order κ parameters (partial autocorrelation, or PARCOR, coefficients) as the LPC coefficients and supplies them to the encoder (1) 10 via the output line 701. In this embodiment the analysis frame is 20 ms.

The encoder (1) 10 quantizes and encodes the incoming LPC coefficients and sends them to the multiplexer 15 via the output line 1001.

The LPC analyzer 7 also computes the impulse response h(n) (0 ≤ n ≤ M − 1) from the LPC coefficients. This response is supplied to the cross-correlation function calculator 8 and the autocorrelation function calculator 11 by way of the output line 702, the encoder (1) 10, and the output line 1002.
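The impulse response h(n), 0 ≤ n ≤ M − 1, is simply the all-pole filter driven by a unit impulse and truncated to M samples. This sketch assumes predictor (α) coefficients in the convention 1 / (1 − Σ a_k z^-k). If the analyzer outputs κ parameters, as in this embodiment, they would first have to be converted to α form. The function name is hypothetical.

```python
def impulse_response(alpha, M):
    """First M samples h(0..M-1) of 1 / (1 - sum_k a_k z^-k)."""
    h = []
    for n in range(M):
        acc = 1.0 if n == 0 else 0.0  # unit impulse input
        for k in range(1, len(alpha) + 1):
            if n - k >= 0:
                acc += alpha[k - 1] * h[n - k]
        h.append(acc)
    return h
```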

The cross-correlation function calculator 8 computes the cross-correlation function φ_hx from two inputs: the impulse response h(n), and the input speech signal after the power fluctuation corrector 14 (described later) has corrected its short-time power fluctuation within the frame. It sends φ_hx to the excitation pulse generator 12 via the output line 801.

The autocorrelation function calculator 11 computes the autocorrelation function R_hh of the incoming impulse response h(n). It sends R_hh to the excitation pulse generator 12 via the output line 1101.

The excitation pulse generator 12 evaluates equation (8) with the cross-correlation function φ_hx and the autocorrelation function R_hh of each analysis frame, obtaining a predetermined number of excitation pulses. It sends their amplitudes and positions to the encoder (2) 13 via the output line 1201.

These pulse amplitudes still carry the weighting applied to compensate the short-time power fluctuation within the frame. The encoder (2) 13 re-corrects them, quantizes and encodes them, and sends them to the multiplexer 15 via the output line 1301.

The LPC coefficients and multipulse data, quantized, encoded, and sent to the multiplexer 15 in this way, represent the spectral envelope and the excitation information of the input speech signal. The multiplexer 15 time-division multiplexes them in a predetermined format. They are then transmitted over the transmission path 1501 from the analysis side of Fig. 3 to the synthesis side of Fig. 4.

As noted above, however, the analysis-side processing has a weakness when the short-time power fluctuates strongly within an analysis frame. The pulses concentrate in the higher-power part of the frame and none are left in the lower-power part, so the speech in that part is not reproduced on the synthesis side and its intelligibility is impaired.

This embodiment therefore includes the power calculator 9, the power fluctuation corrector 14, and related elements shown in Fig. 3, and removes the drawback as follows.

When it receives the input speech signal, the power calculator 9 computes a power ratio R_p for each analysis frame using equation (9). R_p is, for example, the ratio of the power in the first half of the frame to the power in the second half.

Here x(i) are the speech samples of the input speech signal, and N is the total number of speech samples in the analysis frame. The short-time power fluctuation rate v_p is then obtained from equation (10).

$$v_p = \max\left(R_p, \frac{1}{R_p}\right) \qquad (10)$$

The power calculator 9 then compares v_p with a preset value r_v and outputs a correction coefficient c_R to the power fluctuation corrector 14 and the encoder (2) 13 via the lines 901 and 902:
- If v_p exceeds r_v, c_R = √R_p.
- If v_p does not exceed r_v, c_R = 1.
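The decision logic of equations (9) and (10) can be sketched as follows.

Equation (9) itself is not reproduced in this text. The first-half over second-half power ratio used here follows the prose description and is an assumption of the sketch. The guard for a silent half-frame is also a hypothetical addition. The function name is illustrative.

```python
import math


def power_correction_factor(frame, r_v):
    """Return (R_p, v_p, c_R) for one analysis frame.

    R_p: first-half / second-half power ratio (assumed form of equation (9)).
    v_p: max(R_p, 1/R_p), equation (10).
    c_R: sqrt(R_p) if v_p exceeds r_v, otherwise 1.
    """
    half = len(frame) // 2
    p_first = sum(v * v for v in frame[:half])
    p_second = sum(v * v for v in frame[half:])
    if p_first == 0.0 or p_second == 0.0:
        # Hypothetical guard: the ratio is undefined, so leave the frame alone.
        return float("inf"), float("inf"), 1.0
    R_p = p_first / p_second
    v_p = max(R_p, 1.0 / R_p)
    c_R = math.sqrt(R_p) if v_p > r_v else 1.0
    return R_p, v_p, c_R
```

A frame whose first half has twice the amplitude of its second half gives R_p = v_p = 4. With r_v = 2 this yields c_R = 2; with r_v = 5 no correction is applied.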

The power fluctuation corrector 14 uses c_R to correct the amplitudes of the input speech samples x(0), x(1), ..., x(N − 1), for example by equation (11). It outputs the result to the cross-correlation function calculator 8.

$$x'(i) = \left[ \frac{2}{N}\left(c_R - \frac{1}{c_R}\right) i + \left\{ \frac{1}{c_R} - \frac{1}{2}\left(c_R - \frac{1}{c_R}\right) \right\} \right] x(i) \qquad (11)$$

Here x'(i) is the corrected input speech sample. The bracketed weight in equation (11) is a straight line. It equals 1/c_R at the center of the first half of the frame (i = N/4) and c_R at the center of the second half (i = 3N/4).
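The linear weight of equation (11) can be sketched as below. It uses 0-based sample indices, matching x(0), ..., x(N − 1). The function names are hypothetical.

```python
def correction_weight(i, N, c_R):
    """Bracketed weight of equation (11) for sample index i (0-based)."""
    slope = 2.0 / N * (c_R - 1.0 / c_R)
    return slope * i + (1.0 / c_R - 0.5 * (c_R - 1.0 / c_R))


def correct_frame(frame, c_R):
    """x'(i) = weight(i) * x(i) over one analysis frame of N samples."""
    N = len(frame)
    return [correction_weight(i, N, c_R) * x for i, x in enumerate(frame)]
```

With c_R = 1 the weight is 1 everywhere, so frames whose fluctuation rate stays at or below r_v pass through unchanged. The line also extends beyond the two quarter points. At i = 0 the weight equals 3/(2c_R) − c_R/2, which becomes negative when c_R > √3.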

The encoder (2) 13 uses c_R to re-correct the amplitude g_i(m_i) of each pulse generated by the excitation pulse generator 12, by equation (12).

$$g_i'(m_i) = \frac{g_i(m_i)}{\frac{2}{N}\left(c_R - \frac{1}{c_R}\right) m_i + \left\{ \frac{1}{c_R} - \frac{1}{2}\left(c_R - \frac{1}{c_R}\right) \right\}} \qquad (12)$$

Here g_i'(m_i) is the re-corrected multipulse amplitude. The encoder (2) 13 then quantizes and encodes the pulse amplitudes as described above and outputs them to the multiplexer 15.
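Equation (12) divides each amplitude by the equation (11) weight evaluated at the pulse position m_i. This undoes, at least approximately, the scaling that the power correction applied to the signal. A minimal sketch with 0-based positions; the function name is hypothetical.

```python
def recorrect_amplitudes(positions, amplitudes, N, c_R):
    """g_i'(m_i) = g_i(m_i) / weight(m_i), with weight(.) from equation (11)."""

    def weight(i):
        return 2.0 / N * (c_R - 1.0 / c_R) * i + (1.0 / c_R - 0.5 * (c_R - 1.0 / c_R))

    return [g / weight(m) for m, g in zip(positions, amplitudes)]
```

For N = 160 and c_R = 2, a pulse of amplitude 1 at position 40 (weight 0.5) and a pulse of amplitude 4 at position 120 (weight 2) are both restored to amplitude 2.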

The multiplexer 15 receives the LPC coefficient data via the output line 1001 and the multipulse data via the output line 1301. It transmits both together over the transmission path 1501 using a predetermined time-division format.

The synthesis side shown in Fig. 4 synthesizes the input speech signal from the data transmitted from the analysis side over the transmission path 1501. It comprises a demultiplexer 16, a decoder (1) 17, a decoder (2) 18, an LPC synthesizer 21, an LPF (low-pass filter) 22, and other elements.

The demultiplexer 16 restores the data received over the transmission path 1501 to their form before the time-division multiplexing of the multiplexer 15. It supplies the LPC coefficient data via the output line 161 to the decoder (1) 17, and the multipulse data via the output line 162 to the decoder (2) 18. These decoders decode the data and send them out on the output lines 171 and 181, respectively.

The LPC synthesizer 21 uses the decoded multipulse as excitation information to drive a p-th order all-pole digital filter. The p-th order LPC coefficient data received via the output line 171 serve as the coefficients of that filter.

Controlled in this way, the LPC synthesis filter synthesizes the input speech signal and sends it via the output line 211 to the LPF 22. The LPF 22 applies the prescribed low-pass filtering and outputs analog synthesized speech on the output line 221.

In the embodiment of Figs. 3 and 4, κ parameters are used as the LPC coefficients, but other LPC coefficients, such as α parameters, may be used instead. The encoder and multiplexer, and likewise the decoder and demultiplexer, can also be combined into single units. The LPC synthesis filter can be replaced by a digital filter of a type other than all-pole with essentially the same result.

In this embodiment, the short-time power fluctuation rate within each analysis frame of the input speech signal is determined. When it exceeds a preset value, the short-time power fluctuation within the frame is corrected before the impulses are determined.

The gist of the present invention, however, is to observe the short-time power fluctuation rate within the analysis frame and to use it to prevent pulses from concentrating excessively in a particular part of the frame. Correcting the short-time power fluctuation itself is not strictly necessary.

For example, the invention can equally be realized as follows:
1. Determine the short-time power fluctuation rate within the analysis frame.
2. When it exceeds a preset value, divide the frame into several analysis intervals.
3. Assign a predetermined number of pulses to each interval.
4. Determine the impulses independently in each interval.
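The interval-splitting alternative can be sketched as a planning step that decides which sections of the frame are searched and how many pulses each receives. The allocation rule below is a hypothetical choice, since the source says only "predetermined": equal-length sections, with the pulses shared out as evenly as possible. Each section would then be searched independently on its own samples, for example with the correlation-based procedure of equation (8).

```python
def split_allocation(frame_len, total_pulses, v_p, r_v, num_sections=2):
    """Return a list of (start, end, pulses) analysis sections for one frame.

    Hypothetical rule: if v_p <= r_v the frame is searched as a whole;
    otherwise it is split into equal sections and the pulses are shared
    out as evenly as possible (earlier sections get any remainder).
    """
    if v_p <= r_v:
        return [(0, frame_len, total_pulses)]
    bounds = [frame_len * s // num_sections for s in range(num_sections + 1)]
    base, extra = divmod(total_pulses, num_sections)
    return [
        (bounds[s], bounds[s + 1], base + (1 if s < extra else 0))
        for s in range(num_sections)
    ]
```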

As described above, the present invention greatly reduces a source of quality degradation in multi-pulse vocoders. When the power changes abruptly within an analysis frame, pulses tend to concentrate in the higher-power part of the frame and to be absent from the lower-power part, and this degrades the synthesized speech.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the basic configuration of a conventional multi-pulse vocoder.
Fig. 2 is a waveform diagram illustrating the drawback of the conventional multi-pulse vocoder.
Fig. 3 is a block diagram of an embodiment of the analysis side of a multi-pulse vocoder according to the present invention.
Fig. 4 is a block diagram of an embodiment of its synthesis side.

Reference numerals:
- 1, 21: LPC synthesizer
- 2, 7: LPC analyzer
- 3, 12: excitation pulse generator
- 4: subtracter
- 5: perceptual weighting unit
- 6: squared-error minimizer
- 8: cross-correlation function calculator
- 9: power calculator
- 10, 13: encoder
- 11: autocorrelation function calculator
- 14: power fluctuation corrector
- 15: multiplexer
- 16: demultiplexer
- 17, 18: decoder
- 22: LPF

Claims

[Claims]

1. A multi-pulse vocoder in which:
- LPC (Linear Prediction Coefficient) analysis is performed on an input speech signal for each analysis frame, and the extracted LPC coefficients are used as spectral envelope information;
- for each analysis frame, the excitation information, which together with the spectral envelope information constitutes the speech information of the input speech signal, is represented by a predetermined plural number of impulses (a multipulse) having time positions and amplitudes corresponding to the features of that excitation information; and
- the input speech signal is thereby analyzed and synthesized;

characterized in that the analysis side comprises means for determining the fluctuation rate of the short-time power within an analysis frame of the input speech signal and, when the fluctuation rate exceeds a preset value, preventing impulses from concentrating in part of the analysis frame.

2. A multi-pulse vocoder in which:
- LPC (Linear Prediction Coefficient) analysis is performed on an input speech signal for each analysis frame, and the extracted LPC coefficients are used as spectral envelope information;
- for each analysis frame, the excitation information, which together with the spectral envelope information constitutes the speech information of the input speech signal, is represented by a predetermined plural number of impulses (a multipulse) having time positions and amplitudes corresponding to the features of that excitation information; and
- the input speech signal is thereby analyzed and synthesized;

characterized in that the analysis side comprises means for determining the fluctuation rate of the short-time power within an analysis frame of the input speech signal and, when the fluctuation rate exceeds a preset value, dividing the analysis frame into a plurality of analysis intervals and determining a plurality of impulses.