JP4764956B1 - Speech coding apparatus and speech coding method

Info

Publication number: JP4764956B1
Application number: JP2011025047A
Authority: JP
Inventors: 秀太鈴木; 大祐奥野; 明孝苫米地
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2011-02-08
Filing date: 2011-02-08
Publication date: 2011-09-07
Anticipated expiration: 2031-02-08
Also published as: JP2012163825A

Abstract

[Problem] To provide a speech encoding apparatus and a speech encoding method that accurately transmit background noise to the decoding side while suppressing an increase in processing load.
[Solution] A linear prediction codebook 102 stores linear prediction parameters obtained by analyzing speech and linear prediction parameters obtained by analyzing background noise, and an index indicating the linear prediction parameter obtained from the input speech signal is selected. A driving excitation selection unit 109 selects fixed codebook 107 or fixed codebook 108 according to whether the selected index of linear prediction codebook 102 is speech-derived or background-noise-derived.
[Selected drawing] FIG. 1

Description

The present invention relates to a speech encoding apparatus and a speech encoding method that encode speech and background noise.

0. Conventional background and problems
0.1. Outline of CELP (Code Excited Linear Prediction)
In mobile communication, compression coding of digital speech and image information is essential for making effective use of the limited transmission bandwidth. Expectations are particularly high for speech codec (encoding/decoding) technology, which is widely used in mobile phones, and demand is growing for better sound quality from high-efficiency coding with a high compression ratio.

CELP, which models the speech production mechanism and applies vector quantization, is widely known as a high-efficiency speech coding technique. The core techniques of CELP are linear prediction (LPC: Linear Predictive Coding) analysis, which can encode the coarse shape of the speech spectrum at a low bit rate, and quantization of the parameters obtained by that analysis.

Speech coding has been applied and extended on the basis of CELP, and many techniques have been developed. However, because these CELP-based coding schemes are specialized for speech, artifacts arise when background noise is mixed with the speech. This is because the combination of parameters that minimizes the squared error against the input speech is selected without taking perceptual factors into account.

To improve perceived sound quality by reducing these artifacts and making the output less harsh, conceivable approaches include smoothing the gain to reduce large noise-induced amplitude fluctuations, switching the perceptual correction filter according to a classification into speech, silence, and noise, and suppressing the amplitude level of background noise sections before encoding.

0.2. Conventional apparatus configuration
Techniques of this kind are disclosed in, for example, Patent Document 1 and Patent Document 2. Patent Document 1 discloses smoothing the excitation gain in a noise section using past excitation gains, while limiting the smoothed gain so that its variation from the past excitation gain stays within a fixed value. This prevents the gain in the noise section from becoming unnaturally large and improves speech quality.

Patent Document 2 discloses the following. The encoding side analyzes whether the frame to be encoded is a speech section or a background noise section and notifies the decoding side of the result. The decoding side obtains the average power of the output of a first synthesis filter in the background noise section and the average spectrum of the decoded linear prediction parameters. It then drives a second synthesis filter, whose filter coefficients are given by this average spectrum, with white noise, adjusts the amplitude of that filter output according to the average power to obtain a stationary component signal of the background noise, and adds this signal to the output of the first synthesis filter. In this way, a CELP-based speech coding scheme can convey the character of the background noise to the decoding side and achieve more natural reproduced sound.

Patent Document 1: JP 2001-134296 A
Patent Document 2: JP 2000-235400 A

However, the technique disclosed in Patent Document 1 aims only to reduce the sound quality degradation caused by large fluctuations in short-time average power when decoding background noise, which CELP handles poorly. It therefore cannot accurately transmit the background noise to the decoding side.

In the technique disclosed in Patent Document 2, the processing required to classify speech and background noise sections on the encoding side, and the processing required to obtain the stationary component signal of the background noise on the decoding side, are complex, so the processing load increases. Furthermore, although perceptual naturalness is obtained, the background noise still cannot be accurately transmitted to the decoding side.

An object of the present invention is to provide a speech encoding apparatus and a speech encoding method that accurately transmit background noise to the decoding side while suppressing an increase in processing load.

The speech encoding apparatus of the present invention comprises: a linear prediction codebook that stores first linear prediction parameters obtained by analyzing speech and second linear prediction parameters obtained by analyzing background noise, selects a linear prediction parameter from the stored first and second linear prediction parameters based on an input speech signal, and outputs an index associated in advance with the selected linear prediction parameter; a plurality of fixed codebooks that store excitation vectors of predetermined shapes; and driving excitation selection means that selects one of the plurality of fixed codebooks according to whether the output index indicates the first linear prediction parameter or the second linear prediction parameter.

The speech encoding method of the present invention comprises: a step of outputting, from a linear prediction codebook that stores first linear prediction parameters obtained by analyzing speech and second linear prediction parameters obtained by analyzing background noise, an index indicating the linear prediction parameter selected based on an input speech signal; and a step of selecting one of a plurality of fixed codebooks according to whether the output index indicates the first linear prediction parameter or the second linear prediction parameter.

According to the present invention, background noise can be accurately transmitted to the decoding side while an increase in processing load is suppressed.

FIG. 1 is a block diagram showing the configuration of a CELP encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a linear prediction codebook according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the driving excitation selection processing in the driving excitation selection unit shown in FIG. 1.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

(Embodiment)
1. Embodiment
1.1. Configuration of the CELP encoding apparatus according to the embodiment
FIG. 1 is a block diagram showing the configuration of CELP encoding apparatus 100 according to an embodiment of the present invention. The configuration of CELP encoding apparatus 100 is described below with reference to FIG. 1.

In FIG. 1, linear prediction analysis unit 101 performs linear prediction analysis on the input speech signal to obtain linear prediction parameters (also referred to as LPC (Linear Prediction Coding)), which are spectral envelope information, and outputs the obtained linear prediction parameters to linear prediction codebook 102.
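The patent does not say which analysis method unit 101 uses. As a minimal sketch, assuming the common autocorrelation method with the Levinson-Durbin recursion, linear prediction parameters could be derived from one frame as follows. The function names, the predictor sign convention, and the sample autocorrelation values are assumptions made for this illustration.

```python
def autocorrelation(frame, order):
    """Return autocorrelation lags r[0..order] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, err): predictor coefficients and residual energy.

    Assumed convention: x[n] is predicted as
    sum(a[k-1] * x[n-k] for k in 1..order).
    """
    a = [0.0] * (order + 1)          # a[0] is padding for 1-based indexing
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or perfectly predictable frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient of stage i
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


# Autocorrelation of an ideal first-order process with coefficient 0.5
# (hypothetical input chosen so the result is easy to check by hand).
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the recursion recovers a first coefficient of 0.5 and a second coefficient of 0, which is what a first-order process should give.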

Linear prediction codebook 102 stores representative vectors of linear prediction parameters obtained by analyzing speech (speech-derived) and representative vectors of linear prediction parameters obtained by analyzing background noise (background-noise-derived). It selects the representative vector whose squared error against the linear prediction parameters output from linear prediction analysis unit 101 is smallest, and outputs the selected representative vector to perceptual weighting filter 103 and synthesis filter 104. Linear prediction codebook 102 also outputs the index associated in advance with the selected representative vector to driving excitation selection unit 109 as the linear prediction codebook index, and transmits it to a decoding apparatus (not shown). For example, as shown in FIG. 2, linear prediction codebook 102 divides the storage area for representative vectors into two regions, storing speech-derived representative vectors in one region and background-noise-derived representative vectors in the other.
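The search and the two-region layout can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the class name `LpcCodebook`, the choice to place the speech region first, the boundary field `num_speech`, and the vector values are all hypothetical.

```python
from dataclasses import dataclass

SPEECH = "speech"
NOISE = "background_noise"


def squared_error(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


@dataclass(frozen=True)
class LpcCodebook:
    vectors: tuple    # representative vectors; speech region first (assumed)
    num_speech: int   # indices below this value are speech-derived (assumed)

    def search(self, lpc):
        """Return (index, vector) with the smallest squared error to lpc."""
        best = min(range(len(self.vectors)),
                   key=lambda i: squared_error(self.vectors[i], lpc))
        return best, self.vectors[best]

    def origin(self, index):
        """Classify an index by the storage region it falls in."""
        return SPEECH if index < self.num_speech else NOISE


# Hypothetical 4-entry codebook: indices 0-1 speech, indices 2-3 noise.
codebook = LpcCodebook(
    vectors=((0.9, -0.2), (0.5, 0.1), (0.1, 0.0), (-0.3, 0.2)),
    num_speech=2,
)
index, vector = codebook.search((0.12, 0.03))
label = codebook.origin(index)
```

Because the origin follows from the index alone, no separate speech/noise classifier has to run, which is consistent with the patent's stated aim of limiting the processing load.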

Perceptual weighting filter 103 weights the input speech signal using coefficients corresponding to the representative vector output from linear prediction codebook 102, and outputs the perceptually weighted speech signal to adder 105.

Synthesis filter 104 generates a synthesized signal using a filter function, namely an LPC synthesis filter, whose filter coefficients are the representative vector output from linear prediction codebook 102 and whose driving excitation is the excitation vector generated by adaptive codebook 106 and by fixed codebook 107 or fixed codebook 108, described later. This synthesized signal is output to adder 105.
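An LPC synthesis filter is an all-pole recursive filter. The sketch below assumes the sign convention in which each output sample is the excitation sample plus a weighted sum of past outputs; the function name, the optional `history` argument, and the sample values are hypothetical.

```python
def lpc_synthesis(excitation, lpc, history=None):
    """All-pole filter: y[n] = e[n] + sum(lpc[k-1] * y[n-k], k = 1..order).

    history holds past outputs, most recent first; zeros are assumed
    when it is omitted.
    """
    order = len(lpc)
    past = list(history) if history is not None else [0.0] * order
    out = []
    for e in excitation:
        y = e + sum(a * p for a, p in zip(lpc, past))
        out.append(y)
        past = [y] + past[:-1]       # shift the newest output into the state
    return out


# A unit impulse through a first-order filter decays geometrically.
impulse_response = lpc_synthesis([1.0, 0.0, 0.0, 0.0], [0.5])
```

In a frame-based encoder the final filter state would be carried over to the next frame through `history`; that bookkeeping is omitted here.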

Adder 105 calculates an error signal by subtracting the synthesized signal output from synthesis filter 104 from the speech signal output from perceptual weighting filter 103, and outputs this error signal to adaptive codebook 106, fixed codebook 107, fixed codebook 108, and driving excitation selection unit 109.

Adaptive codebook 106 stores the past driving excitations used by synthesis filter 104 and, based on the error signal output from adder 105, generates an excitation vector from the stored driving excitations. This excitation vector is output to adder 110 as the adaptive codebook vector. Adaptive codebook 106 also transmits the index associated in advance with the adaptive codebook vector to the decoding apparatus (not shown) as the adaptive codebook index.

Fixed codebook 107 stores in advance a plurality of excitation vectors of predetermined shapes for voiced components, and outputs an excitation vector based on the error signal output from adder 105 to adder 110 as the fixed codebook vector. Fixed codebook 107 also transmits the index associated in advance with the fixed codebook vector to the decoding apparatus (not shown) as the fixed codebook index.

Fixed codebook 108 stores in advance a plurality of excitation vectors of predetermined shapes for unvoiced components, and outputs an excitation vector based on the error signal output from adder 105 to adder 110 as the fixed codebook vector. Fixed codebook 108 also transmits the index associated in advance with the fixed codebook vector to the decoding apparatus (not shown) as the fixed codebook index.

Driving excitation selection unit 109 selects fixed codebook 107 or fixed codebook 108 according to the origin (speech-derived or background-noise-derived) of the index output from linear prediction codebook 102. When the linear prediction codebook index is speech-derived, however, it compares squared errors and selects whichever of fixed codebook 107 and fixed codebook 108 gives the smaller squared error. Specifically, it calculates squared error 1 from the error signal output from adder 105 when fixed codebook 107 is selected, calculates squared error 2 from the error signal output from adder 105 when fixed codebook 108 is selected, and compares squared error 1 with squared error 2.

Adder 110 adds the adaptive codebook vector output from adaptive codebook 106 and the fixed codebook vector output from fixed codebook 107 or fixed codebook 108, and outputs the resulting excitation vector to synthesis filter 104 as the driving excitation.

1.2. Driving excitation selection processing in the driving excitation selection unit
Next, the driving excitation selection processing in driving excitation selection unit 109 shown in FIG. 1 is described with reference to FIG. 3. In FIG. 3, step (hereinafter abbreviated as "ST") 201 determines whether the linear prediction codebook index is speech-derived. If it is speech-derived (YES), the processing proceeds to ST202. If it is not speech-derived, that is, if it is background-noise-derived (NO), the processing proceeds to ST206.

ST202 calculates squared error 1 for the case where fixed codebook 107 is selected, and ST203 calculates squared error 2 for the case where fixed codebook 108 is selected.

ST204 determines whether squared error 1 calculated in ST202 is smaller than squared error 2 calculated in ST203. If squared error 1 is smaller than squared error 2 (YES), the processing proceeds to ST205. If squared error 1 is not smaller than squared error 2 (NO), the processing proceeds to ST206.

ST205 selects fixed codebook 107 and ends the driving excitation selection processing. ST206 selects fixed codebook 108 and ends the driving excitation selection processing.
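The flow from ST201 to ST206 can be written as a short function. This is a sketch, not the patent's implementation: the function name and the use of callables for the two trial encodings are assumptions, and the error values in the usage lines are made up.

```python
def select_fixed_codebook(index_is_speech, squared_error_107, squared_error_108):
    """Return 107 or 108, following ST201 to ST206.

    The error arguments are zero-argument callables, so the trial
    encodings of ST202 and ST203 run only for a speech-derived index.
    """
    if not index_is_speech:            # ST201: NO (background-noise-derived)
        return 108                     # ST206
    error_1 = squared_error_107()      # ST202
    error_2 = squared_error_108()      # ST203
    if error_1 < error_2:              # ST204
        return 107                     # ST205
    return 108                         # ST206 (ties also land here)


# Hypothetical trial encodings that record whether they were invoked.
calls = []


def trial_107():
    calls.append(107)
    return 2.0


def trial_108():
    calls.append(108)
    return 3.5


noise_choice = select_fixed_codebook(False, trial_107, trial_108)
calls_after_noise = list(calls)
speech_choice = select_fixed_codebook(True, trial_107, trial_108)
```

For a background-noise-derived index neither trial encoding runs, so the comparison cost is paid only for speech-derived frames.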

1.3. Effects of the present embodiment
As described above, according to the present embodiment, linear prediction parameters obtained by analyzing speech and linear prediction parameters obtained by analyzing background noise are stored in the linear prediction codebook, an index indicating the linear prediction parameter obtained from the input speech signal is selected, and the first fixed codebook or the second fixed codebook is selected according to whether the selected index is speech-derived or background-noise-derived. As a result, background noise can be accurately transmitted to the decoding side while an increase in processing load is suppressed.

Although the present embodiment has been described using the example of two kinds of linear prediction parameters, speech-derived and background-noise-derived, the present invention is not limited to this. For example, linear prediction parameters obtained by analyzing musical instrument sounds (instrument-derived) may be added to these two. In this case, the linear prediction codebook divides the storage area for representative vectors into three regions.
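With more than two regions, the origin of an index can still be read off from region boundaries. The sketch below assumes contiguous regions stored in a fixed order; the helper name, the labels, and the region sizes are hypothetical, and the patent does not specify which fixed codebook an instrument-derived index would select.

```python
from bisect import bisect_right


def make_origin_lookup(region_sizes):
    """Build a function mapping a codebook index to its region label.

    region_sizes is an ordered list of (label, count) pairs describing
    contiguous storage regions.
    """
    labels, bounds, total = [], [], 0
    for label, count in region_sizes:
        total += count
        labels.append(label)
        bounds.append(total)           # exclusive upper bound of each region

    def origin(index):
        if not 0 <= index < total:
            raise IndexError(f"index {index} outside codebook of size {total}")
        return labels[bisect_right(bounds, index)]

    return origin


# Hypothetical layout: 4 speech, 2 background-noise, 2 instrument entries.
origin = make_origin_lookup([("speech", 4), ("background_noise", 2), ("instrument", 2)])
origins = [origin(i) for i in range(8)]
```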

Although the present embodiment has been described for the case of providing a fixed codebook for voiced sound and a fixed codebook for unvoiced sound, the present invention is not limited to this. For example, a fixed codebook for pulses and a fixed codebook for noise may be provided. The type of fixed codebook best suited to background noise may also be made variable depending on the system and the usage environment.

The speech encoding apparatus and speech encoding method according to the present invention are applicable to, for example, wireless communication terminal apparatuses in mobile communication systems.

DESCRIPTION OF SYMBOLS
101 Linear prediction analysis unit
102 Linear prediction codebook
103 Perceptual weighting filter
104 Synthesis filter
105, 110 Adder
106 Adaptive codebook
107, 108 Fixed codebook
109 Driving excitation selection unit

Claims

A speech encoding apparatus comprising:
a linear prediction codebook that stores first linear prediction parameters obtained by analyzing speech and second linear prediction parameters obtained by analyzing background noise, selects a linear prediction parameter from the stored first and second linear prediction parameters based on an input speech signal, and outputs an index associated in advance with the selected linear prediction parameter;
a plurality of fixed codebooks that store excitation vectors of predetermined shapes; and
driving excitation selection means for selecting one of the plurality of fixed codebooks according to whether the output index indicates the first linear prediction parameter or the second linear prediction parameter.

The speech encoding apparatus according to claim 1, wherein the linear prediction codebook has two storage regions, stores the first linear prediction parameters in one storage region, and stores the second linear prediction parameters in the other storage region.

The speech encoding apparatus according to claim 1, wherein, when the output index indicates the first linear prediction parameter, the driving excitation selection means selects the fixed codebook that gives the smallest of the squared errors obtained when each of the plurality of fixed codebooks is selected.

The speech encoding apparatus according to claim 1, wherein, when the output index indicates the second linear prediction parameter, the driving excitation selection means selects one predetermined fixed codebook from among the plurality of fixed codebooks.

A speech encoding method comprising:
a step of outputting, from a linear prediction codebook that stores first linear prediction parameters obtained by analyzing speech and second linear prediction parameters obtained by analyzing background noise, an index indicating the linear prediction parameter selected based on an input speech signal; and
a step of selecting one of a plurality of fixed codebooks according to whether the output index indicates the first linear prediction parameter or the second linear prediction parameter.