JP5323144B2

JP5323144B2 - Decoding device and spectrum shaping method

Info

Publication number: JP5323144B2
Application number: JP2011172221A
Authority: JP
Inventors: 公生三関
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2011-08-05
Filing date: 2011-08-05
Publication date: 2013-10-23
Anticipated expiration: 2026-07-07
Also published as: JP2012003277A

Description

The present invention relates to a spectrum shaping method and apparatus for shaping the spectrum of a speech signal.

In a conventional CELP-based speech coding scheme, the decoder generates a speech signal by feeding the decoded sound source (excitation) signal to a synthesis filter, and outputs the result of passing that speech signal through an adaptive post filter as the output speech signal. The adaptive post filter includes a spectrum enhancement unit, which performs spectrum shaping by emphasizing the spectral envelope (the overall shape of the spectrum) of the speech signal with a spectrum enhancement filter built from the synthesis filter coefficients (see, for example, Non-Patent Document 1). This spectrum shaping reduces the perceived noisiness of the coded speech and thereby improves subjective quality.

In general, the ideal model assumes that the spectrum of the sound source signal is flat. In practice, however, a sound source signal obtained by predictive analysis of real speech, or obtained by encoding, is not ideally flat, and its spectral envelope usually has considerable unevenness. In the case of predictive analysis, this is caused by limited analysis accuracy and an insufficient prediction order.

Encoding contributes as well. In CELP-type coding, the sound source signal is encoded so that the distortion of the synthesized speech signal becomes small. Many of the bits for the sound source signal are therefore spent on representing the frequency bands in which the speech signal has high power, and as a result the energy of the sound source signal tends to concentrate in those bands.

This concentration of energy tends to narrow the frequency band of the encoded sound source signal. A narrower sound source signal in turn tends to narrow the frequency band of the speech signal obtained by passing it through the synthesis filter. The result is a muffled or narrowband sound, which reduces the naturalness of the sound and lowers subjective quality.

The conventional method performs spectrum enhancement with a spectrum enhancement filter A(z/α)/A(z/β), which uses the shape of the frequency response of the synthesis filter, and takes no account of the narrowing of the frequency band of the encoded sound source signal. Consequently, in a coding interval where encoding has narrowed the band of the sound source signal, applying spectrum enhancement to the speech signal generated from that sound source signal increases the muffled, narrowband impression more than in other coding intervals.

The degree to which the band of the sound source signal narrows is irregular, because it depends on the characteristics of the speech signal in each coding interval and on the outcome of encoding. The muffled and narrowband impression imparted to the spectrum-enhanced speech signal therefore also tends to appear irregularly, which further degrades subjective quality.

Thus, in spectrum shaping with a conventional post filter, the spectral envelope of the speech signal is emphasized using the coefficients of the synthesis filter that synthesizes the speech signal. Although this spectrum enhancement reduces the noisiness of coded speech to some extent, as a side effect it tends to make the enhanced speech signal sound muffled or narrowband, which makes higher-quality spectrum shaping difficult.

Non-Patent Document 1: P. Kroon and E. F. Deprettere, “A Class of Analysis-by-Synthesis Predictive Coders for High Quality Speech Coding at Rates Between 4.8 and 16 kbits/s”, IEEE SAC-6, February 1988, pp. 353-363.

In spectrum shaping with a conventional post filter, emphasizing the spectral envelope of the speech signal with a spectrum enhancement filter based on the synthesis filter coefficients reduces the noisiness of coded speech to some extent, but as a side effect it tends to impart a muffled or narrowband impression, which makes higher-quality spectrum shaping difficult.

The present invention has been made to solve this problem. Its object is to provide a decoding device and a spectrum shaping method that readily achieve high-quality spectrum shaping, reducing the muffled and narrowband impression of the sound and improving subjective quality more stably than before.

To achieve this object, a decoding device according to the present invention comprises: sound source generating means for decoding encoded data to generate a first sound source signal; weighted prediction means for applying weighted prediction to the first sound source signal to generate a second sound source signal in which the spectrum of the first sound source signal is flattened; and synthesis means for synthesizing a speech signal using the second sound source signal.

FIG. 1 is a circuit block diagram showing the configuration of an embodiment of a decoding device according to the present invention.
FIG. 2 is a circuit block diagram showing the configuration of the sound source analysis unit and the sound source spectrum flattening unit of the decoding device shown in FIG. 1.
FIG. 3 is a circuit block diagram showing another configuration example of the decoding device according to the present invention.
FIG. 4 is a circuit block diagram showing another configuration example of the decoding device according to the present invention.
FIG. 5 is a circuit block diagram showing the configuration of the sound source generation unit of the decoding device shown in FIG. 1.
FIG. 6 is a circuit block diagram showing a configuration in which the sound source generation unit shown in FIG. 5 is applied to the decoding device shown in FIG. 1.
FIG. 7 is a circuit block diagram showing a configuration in which the sound source generation unit shown in FIG. 5 is applied to the decoding device shown in FIG. 1.

An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 shows the configuration of a decoding device according to an embodiment of the present invention. The decoding device includes a synthesis filter coefficient generation unit 10, a sound source generation unit 20, a sound source analysis unit 30, a sound source spectrum flattening unit 40, a synthesis filter unit 50, and a post filter unit 60.

The synthesis filter coefficient generation unit 10 generates, from the encoded data for the synthesis filter, the synthesis filter coefficients used by the synthesis filter unit 50 and the post filter unit 60 described later, and outputs them to both units.
The sound source generation unit 20 decodes the encoded data for the sound source signal to generate a sound source signal, and outputs it to the synthesis filter unit 50 and the sound source analysis unit 30.

The sound source analysis unit 30 analyzes the sound source signal generated by the sound source generation unit 20, obtains a flattening parameter for flattening the spectral envelope of that signal, and outputs the parameter to the sound source spectrum flattening unit 40. For example, it performs LPC analysis on the sound source signal to obtain short-term prediction coefficients for it, and outputs these as the flattening parameter.

The sound source spectrum flattening unit 40 uses the flattening parameter obtained by the sound source analysis unit 30 to shape the spectrum of the sound source signal supplied from the sound source generation unit 20 so that excessive tilt and unevenness in its spectral envelope are flattened.

The specific configurations of the sound source analysis unit 30 and the sound source spectrum flattening unit 40 are now described with reference to FIG. 2.
As shown in FIG. 2, the sound source analysis unit 30 includes an autocorrelation calculation unit 301 and a prediction coefficient calculation unit 302. The autocorrelation calculation unit 301 obtains the autocorrelation Ree(k) of the sound source signal e(n) according to equation (1).
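
Equation (1) is not reproduced in this text. As a minimal sketch, assuming the usual short-time autocorrelation over a single frame with no windowing, the calculation performed by the autocorrelation calculation unit 301 might look as follows. The summation limits, the absence of a window, and the function name `autocorrelation` are assumptions and are not taken from the specification.

```python
import numpy as np

def autocorrelation(e, q):
    """Return R_ee(k) for k = 0..q, with R_ee(k) = sum_n e(n) * e(n - k).

    Assumption: the sum runs over one frame and samples outside the frame
    are treated as zero; the patent's equation (1) may use different
    limits or a window. Requires q < len(e).
    """
    e = np.asarray(e, dtype=float)
    n = len(e)
    # Lag-k product of the frame with a shifted copy of itself.
    return np.array([np.dot(e[k:], e[:n - k]) for k in range(q + 1)])
```

The lag-0 value is the frame energy, and the remaining q values are what the prediction coefficient calculation unit 302 would consume.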

Using the autocorrelation Ree(k) obtained by the autocorrelation calculation unit 301, the prediction coefficient calculation unit 302 obtains the flattening parameters (short-term prediction coefficients) di (i = 1, ..., q), for example by solving the normal equations of equation (2). The Levinson-Durbin method is a well-known algorithm for solving these normal equations.
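
Equation (2) is not reproduced in this text. The sketch below assumes the standard normal equations sum_j d_j R(|i - j|) = R(i) for i = 1, ..., q, with e(n) predicted as sum_i d_i e(n - i), and solves them with the Levinson-Durbin recursion. The sign convention and the function name `levinson_durbin` are assumptions.

```python
import numpy as np

def levinson_durbin(r, q):
    """Solve sum_j d_j * r[|i - j|] = r[i] (i = 1..q) for d_1..d_q.

    Assumed convention: e(n) is predicted as sum_i d_i * e(n - i).
    r must hold autocorrelation values r[0..q] with r[0] > 0.
    Returns (d, err) where err is the final prediction error energy.
    """
    r = np.asarray(r, dtype=float)
    d = np.zeros(q + 1)          # d[0] is unused; d[i] holds d_i
    err = r[0]                   # error energy of the order-0 predictor
    for m in range(1, q + 1):
        # Reflection coefficient for order m.
        acc = r[m] - np.dot(d[1:m], r[m - 1:0:-1])
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        prev = d[1:m].copy()
        d[1:m] = prev - k * prev[::-1]
        d[m] = k
        err *= (1.0 - k * k)
    return d[1:], err
```

For an autocorrelation sequence 1, 0.5, 0.25 (a first-order process), the recursion returns d1 = 0.5 and d2 = 0, as expected.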

The sound source spectrum flattening unit 40, on the other hand, includes a weighted prediction filter 401 and a gain adjustment unit 402.
Note that sound quality improves when the flattening is weak, rather than when the unevenness of the spectral envelope of the sound source signal is flattened completely. As a way of realizing such weak flattening, this embodiment describes a method that uses a weighted prediction filter, as explained below. Setting the weighting coefficients of the weighted prediction filter to appropriate values makes it possible to control the flattening so that it is weak.

The weighted prediction filter 401 is configured with the flattening parameter obtained by the sound source analysis unit 30, and thereby functions as a weighted prediction filter that flattens the spectral envelope of the sound source signal supplied from the sound source generation unit 20. Various configurations are possible for the weighted prediction filter. Here, as one example, Dw(z) given by equations (3) and (4) is used, where λ1 and λ2 are weighting coefficients.

In this example, for Dw(z) to function as a weighted prediction filter, the weighting coefficients should be set so that λ2 < λ1 and 0 ≤ λ2 ≤ 1. When λ1 is set to 1 or less, they should be set so that 0 ≤ λ2 < λ1 ≤ 1. With this setting, the characteristic of the prediction filter in the numerator of Dw(z) is weakened by that of the prediction filter in the denominator, so Dw(z) functions as a "weak" prediction filter. Filtering the sound source signal with this "weak" prediction filter yields a sound source signal whose spectrum is "weakly" flattened.
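
Equations (3) and (4) are not reproduced in this text. One form consistent with the surrounding description (a numerator prediction filter weighted by λ1, a denominator prediction filter weighted by λ2, and reduction to an all-zero filter when λ2 = 0) is given below. The sign convention of the prediction coefficients is an assumption.

```latex
D_w(z) = \frac{D(z/\lambda_1)}{D(z/\lambda_2)},
\qquad
D(z/\lambda) = 1 - \sum_{i=1}^{q} d_i \, \lambda^{i} z^{-i}
```

Under this form, λ1 = 1 and λ2 = 0 give the full prediction error filter D(z), while λ1 = λ2 gives Dw(z) = 1, that is, no flattening at all. Values between these extremes give the intermediate, "weak" flattening described above.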

A specific example of the processing in this case is given by equation (5), where e(n) denotes the sound source signal before flattening and e'(n) denotes the flattened sound source signal.
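
Equation (5) is not reproduced in this text. As a sketch, assuming Dw(z) = D(z/λ1)/D(z/λ2) with D(z) = 1 - sum_i d_i z^-i (the sign convention is an assumption), the time-domain processing reduces to a short recursion. The function name `weighted_prediction_filter` and the zero initial filter state are also assumptions.

```python
import numpy as np

def weighted_prediction_filter(e, d, lam1, lam2):
    """Filter e(n) through Dw(z) = D(z/lam1) / D(z/lam2).

    Assumed difference equation (a reconstruction of equation (5)):
      e'(n) = e(n) - sum_i d_i lam1^i e(n-i) + sum_i d_i lam2^i e'(n-i)
    with zero initial filter state.
    """
    e = np.asarray(e, dtype=float)
    d = np.asarray(d, dtype=float)
    q = len(d)
    powers = np.arange(1, q + 1)
    num = d * lam1 ** powers     # weighted zero (numerator) taps
    den = d * lam2 ** powers     # weighted pole (denominator) taps
    out = np.zeros_like(e)
    for n in range(len(e)):
        acc = e[n]
        for i in range(1, min(q, n) + 1):
            acc -= num[i - 1] * e[n - i]      # numerator: prediction from the input
            acc += den[i - 1] * out[n - i]    # denominator: feedback from the output
        out[n] = acc
    return out
```

Choosing lam1 equal to lam2 leaves the signal unchanged, which is a convenient sanity check, and lam2 = 0 removes the feedback path entirely.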

The weighted prediction filter above is of the pole-zero type, but the configuration is not limited to this. For example, an all-zero configuration such as that of equation (6) is also effective as a weighted prediction filter. A specific example of the processing in this case is given by equation (7).

In this case, using a decaying exponential window for β(i), that is, β(i) = λ1^i, is equivalent to setting λ2 = 0 in equation (3). Using a decaying window other than the exponential window for β(i) allows flattening with a higher degree of freedom.
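
Equations (6) and (7) are not reproduced in this text. The sketch below assumes the all-zero form e'(n) = e(n) - sum_i β(i) d_i e(n - i) and shows two choices of β(i): the exponential window λ1^i and, as one hypothetical alternative, a linearly decaying window. The function name `all_zero_flatten`, the coefficient values, and the linear window are illustrative assumptions.

```python
import numpy as np

def all_zero_flatten(e, d, beta):
    """Apply e'(n) = e(n) - sum_i beta(i) d_i e(n-i) (assumed form of equation (7))."""
    e = np.asarray(e, dtype=float)
    weighted = np.asarray(beta, dtype=float) * np.asarray(d, dtype=float)
    taps = np.concatenate(([1.0], -weighted))
    # FIR filtering with zero initial state; keep the first len(e) samples.
    return np.convolve(e, taps)[:len(e)]

d = np.array([0.9, -0.3])                  # illustrative coefficients, not from the patent
i = np.arange(1, len(d) + 1)
exp_window = 0.7 ** i                      # beta(i) = lambda1^i with lambda1 = 0.7
other_window = 1.0 - i / (len(d) + 1.0)    # a linearly decaying window, one alternative
```

With the exponential window, the filter taps become d_i λ1^i, which are the numerator taps of the pole-zero filter with the denominator removed (λ2 = 0).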

Various configurations are thus possible for a weighted prediction filter that functions as a "weak" prediction filter, and it goes without saying that whichever configuration is used falls within the present invention.

Moreover, the characteristic that flattens the spectral shape of the sound source signal need not be applied to the sound source signal itself. It can also be applied to a signal at the speech signal level. That is, the signal to which the flattening characteristic is applied may be at either the sound source signal level or the speech signal level. It may likewise be a time-domain signal or a frequency-domain signal, with the same effect.

The essential point is that the characteristic that flattens the spectral shape of the sound source signal ends up reflected in the speech signal that is finally output. Any configuration that achieves this falls within the present invention, regardless of differences in implementation or structure.

Based on the sound source signal supplied from the sound source generation unit 20, the gain adjustment unit 402 adjusts the gain of the spectrally flattened sound source signal output from the weighted prediction filter 401, and outputs the result.
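
The text does not state how the gain is determined. One plausible reading, assumed here by analogy with the energy matching described for the gain adjustment unit 603, is that the flattened signal is scaled so that its frame energy equals that of the signal before flattening. The function name `match_energy` and the per-frame energy rule are assumptions.

```python
import numpy as np

def match_energy(reference, flattened, eps=1e-12):
    """Scale `flattened` so that its energy equals that of `reference`.

    Assumption: gain = sqrt(E_reference / E_flattened), computed per frame.
    `eps` guards against division by zero for an all-zero frame.
    """
    reference = np.asarray(reference, dtype=float)
    flattened = np.asarray(flattened, dtype=float)
    e_ref = np.dot(reference, reference)
    e_flat = np.dot(flattened, flattened)
    gain = np.sqrt(e_ref / max(e_flat, eps))
    return gain * flattened
```

Matching energy keeps the loudness of the synthesized speech unchanged, so that the flattening alters only the spectral shape.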

Returning to FIG. 1, the synthesis filter unit 50 uses the synthesis filter coefficients generated by the synthesis filter coefficient generation unit 10 to synthesize a speech signal from the sound source signal output by the sound source spectrum flattening unit 40 (gain adjustment unit 402), and outputs it to the post filter unit 60.

The post filter unit 60 uses the synthesis filter coefficients generated by the synthesis filter coefficient generation unit 10 to emphasize the spectral envelope of the speech signal supplied from the synthesis filter unit 50, thereby shaping its spectrum. The output speech signal is obtained in this way.

As described above, in the decoding device configured in this way, the sound source analysis unit 30 obtains, from the sound source signal generated by the sound source generation unit 20, a flattening parameter for flattening the spectrum of that signal, and the sound source spectrum flattening unit 40 uses this parameter to shape the spectrum so that excessive tilt and unevenness in the spectral envelope of the sound source signal supplied from the sound source generation unit 20 are flattened.

With this decoding device, the unevenness of the sound source signal spectrum has been flattened. Even when the post filter unit 60 emphasizes the spectral envelope shape of the synthesis filter unit 50, excessive spectrum enhancement, which would otherwise arise when the unevenness of the emphasized synthesis filter envelope coincides with the unevenness of the sound source signal spectrum, is prevented. This reduces the muffled and unstable sound caused by excessive spectrum enhancement, so subjective quality improves.

FIG. 5 shows an example of the internal configuration of the sound source generation unit 20 in more detail.
The sound source generation unit 20 decodes the encoded data for the sound source signal to generate a sound source signal, and outputs it to the synthesis filter unit 50 and the sound source analysis unit 30. In this example, the sound source generation unit 20 has the same configuration as in G.729 or the AMR scheme, and consists of an adaptive codebook (adaptive CB) 22, a fixed codebook (fixed CB) 24, a gain codebook (gain CB) 26, and a combining unit 28.

The adaptive CB 22 generates an adaptive codebook vector e1(n) from the adaptive codebook, based on the adaptive CB code contained in the encoded data for the sound source signal. The fixed CB 24 generates a fixed codebook vector e2(n) from the fixed codebook, based on the fixed CB code contained in the encoded data for the sound source signal.

The gain CB 26 generates a gain g1 for the adaptive codebook vector and a gain g2 for the fixed codebook vector from the gain codebook, based on the gain CB code contained in the encoded data for the sound source signal.

The combining unit 28 generates the sound source signal by combining the adaptive codebook vector multiplied by its gain with the fixed codebook vector multiplied by its gain, and outputs it to the synthesis filter unit 50 and the sound source analysis unit 30. One example of the combination is e(n) = g1 × e1(n) + g2 × e2(n). The sound source signal is also stored in the adaptive codebook in preparation for the next encoding.
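
The combination e(n) = g1 × e1(n) + g2 × e2(n) can be sketched in a few lines. The vector and gain values below are illustrative only and are not taken from any codec; the function name `combine_excitation` is hypothetical.

```python
import numpy as np

def combine_excitation(e1, e2, g1, g2):
    """Return e(n) = g1 * e1(n) + g2 * e2(n) for one subframe."""
    return g1 * np.asarray(e1, dtype=float) + g2 * np.asarray(e2, dtype=float)

# Illustrative values only.
adaptive_vec = [0.5, -0.25, 0.0, 0.25]   # e1(n), from the adaptive codebook
fixed_vec = [1.0, 0.0, -1.0, 0.0]        # e2(n), from the fixed codebook
e = combine_excitation(adaptive_vec, fixed_vec, g1=0.8, g2=0.5)
```

The resulting e(n) is the signal passed to the sound source analysis unit 30 and stored in the adaptive codebook.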

FIG. 6 shows the sound source generation unit 20 of FIG. 5 applied to the decoding device of FIG. 1. Description of the parts that are the same as in FIG. 1 is omitted. As shown in FIG. 6, the sound source signal before synthesis filtering is generated from the contributions of both the adaptive codebook and the fixed codebook. Obtaining the flattening parameter from this pre-synthesis sound source signal makes it possible to flatten the spectral envelope of the sound source signal stably before synthesis.

Here too, sound quality improves when the unevenness of the spectral envelope of the sound source signal is flattened weakly rather than completely. The sound source signal must be stored in the adaptive codebook in preparation for the next encoding. To keep the contents of the adaptive codebook identical on the encoding side and the decoding side, the decoding side must store the same sound source signal as the encoding side.

FIG. 6 corresponds to the case where the encoding side stores the sound source signal before flattening in the adaptive codebook, so the decoding side likewise stores the sound source signal before flattening. If the encoding side instead stores the flattened sound source signal in the adaptive codebook, the decoding side must also store the flattened sound source signal, as indicated by the dotted line in FIG. 7.
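
The choice between the two storage paths can be sketched as a single flag on the adaptive codebook update. The buffer layout (a fixed-length history with the oldest sample first) and the function name `update_adaptive_codebook` are assumptions; the specification only states which signal must be stored.

```python
import numpy as np

def update_adaptive_codebook(history, e, e_flat, store_flattened):
    """Shift the past-signal buffer and append the newest subframe.

    store_flattened=False corresponds to FIG. 6 (store e before flattening);
    store_flattened=True corresponds to the dotted path of FIG. 7.
    Assumption: `history` is a non-empty, fixed-length buffer, oldest first.
    """
    history = np.asarray(history, dtype=float)
    newest = np.asarray(e_flat if store_flattened else e, dtype=float)
    # Keep only the most recent len(history) samples.
    return np.concatenate((history, newest))[-len(history):]
```

Whichever setting is used, it has to match the encoder, otherwise the two adaptive codebooks drift apart.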

In the embodiment above, the sound source spectrum flattening unit 40 is placed before the synthesis filter unit 50 to flatten the sound source signal. Alternatively, as shown in FIG. 3, the same effect is obtained by placing the sound source spectrum flattening unit 40 between the synthesis filter unit 50 and the post filter unit 60.

Furthermore, as shown in FIG. 4, the sound source spectrum flattening unit 40 may be provided inside the post filter unit 60. In the example of FIG. 4, it is placed after the spectrum enhancement unit 601 in the post filter unit 60.

In this case, within the post filter unit 60, the spectrum enhancement unit 601 uses the synthesis filter coefficients generated by the synthesis filter coefficient generation unit 10 to emphasize the spectral envelope of the speech signal supplied from the synthesis filter unit 50, thereby shaping its spectrum.

The sound source spectrum flattening unit 40 then applies the weighted prediction filter described above, configured with the flattening parameter obtained by the sound source analysis unit 30, to the speech signal supplied from the spectrum enhancement unit 601. Because the characteristic of this weighted prediction filter is derived from the flattening parameter obtained by the sound source analysis unit 30, passing the speech signal through it yields a speech signal in which the excessive tilt and unevenness of the spectral envelope of the underlying sound source signal have been flattened.

The tilt compensation filter 602 compensates for the spectral tilt introduced by the processing of the spectrum enhancement unit 601. In this embodiment, the tilt compensation filter 602 derives a tilt compensation characteristic from the coefficients of the spectrum enhancement filter used in the spectrum enhancement unit 601, and applies it to the speech signal supplied from the sound source spectrum flattening unit 40 to compensate for the tilt.

The gain adjustment unit 603 adjusts the gain of the speech signal from the tilt compensation filter 602 so that its energy is approximately equal to that of the speech signal supplied from the synthesis filter unit 50, and outputs the result as the output speech signal.

This configuration also prevents the excessive spectrum enhancement that arises when the unevenness of the emphasized synthesis filter envelope coincides with the unevenness of the sound source signal spectrum. The muffled and unstable sound caused by excessive spectrum enhancement is thereby reduced, so subjective quality improves.

In the example of FIG. 4, the sound source spectrum flattening unit 40 is placed after the spectrum enhancement unit 601 in the post filter unit 60, but its position is not limited to this. The post filter itself can take various configurations. Even so, processing equivalent to the sound source spectrum flattening unit provides the same function whether it is placed before or after the other processing in the post filter unit. Wherever the sound source spectrum flattening unit is placed within the post filter unit, the arrangement is clearly an example of the present invention.

The present invention is not limited to the embodiment exactly as described. At the implementation stage, the constituent elements can be modified without departing from the gist of the invention. Various inventions can be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be deleted from the full set shown in the embodiment, and constituent elements described in different embodiments may be combined as appropriate.

Description of symbols: 10: synthesis filter coefficient generation unit, 20: sound source generation unit, 22: adaptive codebook (adaptive CB), 24: fixed codebook (fixed CB), 26: gain codebook (gain CB), 28: combining unit, 30: sound source analysis unit, 40: sound source spectrum flattening unit, 50: synthesis filter unit, 60: post filter unit, 301: autocorrelation calculation unit, 302: prediction coefficient calculation unit, 401: prediction filter, 402: gain adjustment unit, 601: spectrum enhancement unit, 602: compensation filter, 603: gain adjustment unit.

Claims

A decoding device comprising:
sound source generating means for decoding encoded data to generate a first sound source signal;
weighted prediction means for applying weighted prediction to the first sound source signal to generate a second sound source signal in which the spectrum of the first sound source signal is flattened; and
synthesis means for synthesizing a speech signal using the second sound source signal.

A decoding device comprising:
sound source generating means for decoding encoded data to generate a first sound source signal;
weighted prediction means for applying weighted prediction to the first sound source signal to generate a second sound source signal in which the spectrum of the first sound source signal is flattened;
synthesis means for synthesizing a speech signal using the second sound source signal; and
post-processing means for performing spectrum shaping on the speech signal synthesized by the synthesis means.

The decoding device according to claim 1 or 2, wherein the sound source generating means multiplies an adaptive codebook vector, generated based on an adaptive codebook code included in the encoded data, by a first gain, and multiplies a fixed codebook vector, generated based on a fixed codebook code included in the encoded data, by a second gain, the first and second gains being generated based on a gain codebook code included in the encoded data, and generates the first sound source signal by combining the adaptive codebook vector multiplied by the first gain with the fixed codebook vector multiplied by the second gain.

A spectrum shaping method comprising:
a sound source generating step of decoding encoded data to generate a first sound source signal;
a weighted prediction step of applying weighted prediction to the first sound source signal to generate a second sound source signal in which the spectrum of the first sound source signal is flattened; and
a synthesis step of synthesizing a speech signal using the second sound source signal.

A spectrum shaping method comprising:
a sound source generating step of decoding encoded data to generate a first sound source signal;
a weighted prediction step of applying weighted prediction to the first sound source signal to generate a second sound source signal in which the spectrum of the first sound source signal is flattened;
a synthesis step of synthesizing a speech signal using the second sound source signal; and
a post-processing step of performing spectrum shaping on the speech signal synthesized in the synthesis step.

The spectrum shaping method according to claim 4 or 5, wherein the sound source generating step multiplies an adaptive codebook vector, generated based on an adaptive codebook code included in the encoded data, by a first gain, and multiplies a fixed codebook vector, generated based on a fixed codebook code included in the encoded data, by a second gain, the first and second gains being generated based on a gain codebook code included in the encoded data, and generates the first sound source signal by combining the adaptive codebook vector multiplied by the first gain with the fixed codebook vector multiplied by the second gain.