JP2006325162A

JP2006325162A - Device for performing multi-channel spatial audio coding using binaural cues

Info

Publication number: JP2006325162A
Application number: JP2005148763A
Authority: JP
Inventors: Rin Ryuu Wei; リン・リュウウェイ; Sen Chon Kok; セン・チョンコク; Naoya Tanaka; 直也田中; Hon Neo Sua; ホン・ネオスア
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-05-20
Filing date: 2005-05-20
Publication date: 2006-11-30

Abstract

PROBLEM TO BE SOLVED: To provide a device that takes a downmix audio signal as input and generates mutually orthogonal, uncorrelated audio signals for channel separation.

SOLUTION: The invention proposes a device that generates a plurality of uncorrelated audio signals with the required characteristics from a single downmix audio signal and uses them for channel separation in order to improve the sound image. In addition, the invention uses ICCH in place of ICC where appropriate to improve the sound image.

COPYRIGHT: (C)2007, JPO&INPIT

Description

This specification describes in detail an invention relating to an apparatus that compresses a multi-channel audio signal by extracting binaural cues and generating a downmix signal in the encoding process, and by applying the binaural cues to the downmix signal in the decoding process. The present invention is applicable to training simulators, car audio systems, home or business audio/video systems, and the like.

The present invention improves on conventional multi-channel audio coding techniques. Its object is to encode multi-channel audio signals while preserving perceptual quality (spatial image, sound fidelity, and so on) even when the bit rate is constrained. A lower bit rate reduces the bandwidth and storage capacity needed to transmit and store multi-channel audio signals. To remain compatible with the prior art, the present invention is based on conventional coding schemes and standards.

Joint stereo coding is an example of a prior-art technique for representing a stereo audio signal with few bits. Middle/Side (MS) stereo coding and intensity stereo coding are the commonly used joint stereo methods. MS stereo coding is very efficient when the correlation between the channel signals is high, because it uses a sum (M, or middle) channel and a difference (S, or side) channel in place of the left and right channels, and in that case the difference signal is very small. Intensity stereo coding reduces the bit rate by replacing the L and R signals at high frequencies with a single representative signal plus directional information. This exploits the fact that the human auditory system is relatively insensitive to signal phase at high frequencies.
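As an illustration of the MS idea described above, the following is a minimal sketch, not taken from the patent. The function names and the 1/2 scaling of the sum and difference channels are assumptions made for the example.

```python
def ms_encode(left, right):
    """Convert left/right sample lists into mid (sum) and side (difference) lists."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: recover left = mid + side and right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two highly correlated channels (illustrative values).
left = [0.50, 0.25, -0.75, 1.00]
right = [0.50, 0.125, -0.625, 0.875]

mid, side = ms_encode(left, right)
# For correlated input the side channel is much smaller than the mid channel,
# which is why it can be coded with fewer bits.
peak_mid = max(abs(v) for v in mid)
peak_side = max(abs(v) for v in side)
```

With these values the side channel peaks at a small fraction of the mid channel, and `ms_decode(mid, side)` returns the original channels.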

Normally, an audio signal arrives at a listener's left and right ears at different times and at different levels, because it travels along a different path to each ear. The listener's brain analyzes these time and level differences and lets the listener perceive where the source of the received signal is located relative to the listener. The listener can identify both the direction and the distance of the source. A listener receiving audio signals from one or more sources at one or more different locations can thus form an auditory scene.

According to Patent Documents 1, 2, and 3, binaural cues such as the inter-channel level/intensity difference (ILD), the inter-channel phase/delay difference (IPD), and the inter-channel coherence/correlation (ICC) have in recent years been widely used in audio coding to reduce the bit rate. The binaural cues are first derived from the original audio signals in the encoding process and then transmitted together with the downmix signal; in the decoding process they are used to transform the downmix signal so as to restore the audio signals. The ILD cue measures the relative signal power of two signals, the IPD cue measures the difference in the time at which sound reaches the two ears, and the ICC cue measures the coherence or similarity between two channels. These cues are spatial parameters that characterize the diffuseness, position, and direction of the sound sources in a multi-channel audio signal and help the listener build the auditory scene mentally.

FIG. 1 shows a typical audio encoder/decoder (codec) that uses binaural cues for audio coding. In the encoding process the audio signal is processed frame by frame. The downmix module (100) downmixes the left (L) and right (R) channels to generate the downmix signal M, where M = (L + R)/2. Taking the three signals L, R, and M as input, the binaural cue extraction module (102) generates the binaural cues. These cues are usually generated in the frequency domain or in a hybrid time-frequency domain. This is usually done by implementing in module (102) a time-frequency transform such as the fast Fourier transform (FFT) or the modified discrete cosine transform (MDCT), or a hybrid time-frequency transform such as a QMF bank. In codecs of this kind, audio signal processing is generally performed frame by frame.

Next, the audio encoder (104) generates a compressed bitstream from M. Module (106) multiplexes the quantized binaural cues with this bitstream to form the complete bitstream. The audio encoder generally uses an algorithm based on a standard such as MP3 or AAC.

In the decoding process, the demultiplexer (108) separates the bitstream of M, received via a transmission or storage medium, from the binaural cue side information. The audio decoder (110) reconstructs the downmix signal M, which is sent to the multi-channel separation module (112). Taking the downmix signal and the dequantized binaural cues as input, the multi-channel separation module (112) restores the multi-channel signal.

Besides reducing the bit rate, there is the further problem of preserving the perceived quality of the audio signal. Many techniques exist for improving the quality of decoded audio signals; among them, Non-Patent Document 1 proposes using echo and reverberation to improve perceived sound quality. Artificial reverberation based on discrete-time signal processing has been widely used in acoustic engineering since it was introduced in the early 1960s (Non-Patent Document 2). Applications of reverberation in acoustic engineering include room-acoustics simulation, improving the perceived quality of music, and generating uncorrelated outputs.

Non-Patent Document 1 describes an implementation that uses a feedback delay network (FDN) to generate uncorrelated outputs. FIG. 2 shows an example of an FDN. The FDN consists of an all-pass filter (202), a plurality of delay lines (204 to 210), and a feedback matrix (212). The feedback matrix allows the output of each delay line to be fed back to each delay-line input. In particular, the FDN proposed by Stautner and Puckette (Non-Patent Document 3) has the desirable property that it can generate outputs that are mutually incoherent and uncorrelated, so it can be used for channel separation. Two mutually incoherent, uncorrelated outputs, such as M and M_{0,rev} in FIG. 3, are orthogonal as vectors.

An object of the present invention is to improve on conventional techniques based on binaural cue coding. In particular, the present invention additionally uses decorrelated signals and reverberation signals in the channel separation process, so that each channel separation stage can use a reverberation signal different from those used in the other stages.

Patent Document 1: International Patent Publication WO03/090208A1, "Parametric Representation of Spatial Audio"
Patent Document 2: US2003/0035553A1, "Backwards-Compatible Perceptual Coding of Spatial Cues"
Patent Document 3: US2003/0236583A1, "Hybrid Multi-channel/Cue Coding/Decoding of Audio Signals"
Patent Document 4: JP2004/248989, "Encoding and Decoding Devices for Audio Signals"
Non-Patent Document 1: Kahrs, M., Brandenburg, K., et al., "Applications of Digital Signal Processing to Audio and Acoustics", Kluwer Academic Publishers.
Non-Patent Document 2: Schroeder, M. R. (1962), "Natural Sounding Artificial Reverberation", J. Audio Eng. Soc., 10(3).
Non-Patent Document 3: Stautner, J. and Puckette, M. (1982), "Designing multi-channel reverberators", Computer Music Journal, 6(1): 52-65.

The present invention relates to a binaural cue coding method in which a QMF filter bank is used in the encoding process to convert the audio channels into a time-frequency (T/F) representation. In this specification, when X is processed in the time-frequency domain, the quantity or function X is written X(t, f).

If the same reverberation or decorrelated signal (M_rev(t, f)) is used for every channel during channel separation, the sound image of the restored signal leaves room for improvement: on listening, for example, it sounds narrow.

To separate the multi-channel signal in spatial audio coding, parametric stereo (PS) modules (404) to (412) are connected in cascade as shown in FIG. 4. With this configuration, the level of reverberation received in the encoding and decoding process can differ from channel to channel. In the example of FIG. 4, for instance, channels C and LFE receive a lower level of reverberation than the other channels.

Embodiment 1 of the present invention proposes an apparatus that takes the downmix audio signal as input and generates mutually orthogonal, uncorrelated audio signals for channel separation.

Embodiment 2 proposes a new mixing method that first determines the vector relationship between the downmix channel and the original channels from the binaural cues, and then simulates the exact vector relationship between the downmix signal and a signal orthogonal to it.

Embodiment 3 proposes a method that applies channel separation to multiple channels by combining the plurality of decorrelated signals with the new mixing method.

The present invention aims to improve the sound image by using a different reverberation signal (M_{i,rev}(t, f)) in each channel separation stage. Each restored channel is then sufficiently "different" from the other channels at the time of restoration, which improves the width of the sound and the sound image. Furthermore, in the present invention all the reverberation signals are generated at the same time, and every channel receives the same level of reverberation during channel separation.

The present invention produces a better sound image than that of restored audio signals generated by prior-art methods. This is achieved by implementing a decorrelator that generates a plurality of reverberation signals, so that a different reverberation signal can be used in each channel separation stage, and by using ICCH in place of ICC where appropriate.

The embodiments described below merely illustrate the various inventive principles of the present invention, and those skilled in the art will readily understand that the details described below can be modified in various ways. The present invention is therefore limited only by the scope of the claims and not by the specific examples given below.

Furthermore, although only two cases are shown here, stereo-mono-stereo (hereinafter the "2-1-2 case") and 5-channel-mono-5-channel (hereinafter the "5-1-5 case"), the present invention is not limited to them. It can be generalized to M original channels and N downmix channels.

In Embodiment 1 of the present invention, the decorrelator (200) shown in FIG. 2 generates mutually incoherent, uncorrelated outputs from a single downmix signal. The decorrelator outputs M_rev are mutually uncorrelated and orthogonal. The delay lengths of the delay lines (204 to 210), shown as m_0, m_1, m_2, and m_3 in FIG. 2, must be relatively prime.
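The delay-line and feedback-matrix structure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the decorrelator of FIG. 2: the all-pass filter (202) is omitted, and the delay lengths, the loop gain, the Householder feedback matrix, and the function names are all hypothetical choices made for the example.

```python
from itertools import combinations
from math import gcd


def pairwise_coprime(lengths):
    """Return True if every pair of delay lengths is relatively prime."""
    return all(gcd(a, b) == 1 for a, b in combinations(lengths, 2))


def fdn(x, delays=(7, 11, 13, 17), gain=0.7):
    """Run input samples x through a small feedback delay network.

    Returns one output list per delay line. The feedback matrix is a
    Householder matrix (I - (2/n) * ones), which is orthogonal, scaled by
    a loop gain below 1 so that the recursion decays.
    """
    n = len(delays)
    a = [[gain * ((1.0 if i == j else 0.0) - 2.0 / n) for j in range(n)]
         for i in range(n)]
    lines = [[0.0] * d for d in delays]  # one circular buffer per delay line
    pos = [0] * n
    outs = [[] for _ in range(n)]
    for sample in x:
        # The slot about to be overwritten holds the sample written d steps ago.
        taps = [lines[i][pos[i]] for i in range(n)]
        for i in range(n):
            feedback = sum(a[i][j] * taps[j] for j in range(n))
            lines[i][pos[i]] = sample + feedback
            pos[i] = (pos[i] + 1) % delays[i]
            outs[i].append(taps[i])
    return outs


impulse_response = fdn([1.0] + [0.0] * 40)
```

Feeding an impulse shows the first echo of each line at its own delay length; because the lengths share no common factor, the echo patterns of the lines do not line up periodically.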

As pointed out in Non-Patent Document 3, the feedback matrix A (212) mixes the delay-line outputs so that the outputs M_{i,rev} are mutually incoherent. Signals that are mutually incoherent are orthogonal to one another. This relationship is expressed mathematically as follows.
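The equation itself is not reproduced in this text. Assuming it simply states the pairwise orthogonality described in the preceding sentence, it would take a form such as:

```latex
M_{i,\mathrm{rev}} \bullet M_{j,\mathrm{rev}} = 0 \qquad (i \neq j)
```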

In the above expression and in the following description, ● denotes the inner product.

Moreover, the model in which the reverberation signal (M_rev) of an original signal (M) is orthogonal to that original signal is widely accepted in acoustic engineering. The all-pass filter (202) ensures that every signal generated by the FDN is orthogonal to the original downmix signal M. Expressed mathematically:
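The equation is not reproduced in this text. Assuming it states only that each FDN output is orthogonal to the downmix signal, as the preceding sentences say, it would read along these lines for every output index i:

```latex
M \bullet M_{i,\mathrm{rev}} = 0
```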

FIG. 5 shows, as matrix (500), an example of the feedback matrix A that Non-Patent Document 3 uses for the 5-1-5 case. Matrix (500) is a unitary matrix, and its elements are set so that the following relationship is satisfied.
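Neither matrix (500) nor the relationship its elements satisfy is reproduced in this text. As a hedged illustration of what "unitary" requires, the sketch below checks that the rows of a matrix are orthonormal, that is, that A times its conjugate transpose is the identity. The 4x4 matrix used here is a hypothetical stand-in, not the matrix of FIG. 5.

```python
from math import sqrt


def is_unitary(a, tol=1e-12):
    """Check that the rows of square matrix a are orthonormal (A * A^H = I)."""
    n = len(a)
    for i in range(n):
        for k in range(n):
            dot = sum(a[i][j] * complex(a[k][j]).conjugate() for j in range(n))
            target = 1.0 if i == k else 0.0
            if abs(dot - target) > tol:
                return False
    return True


# Hypothetical orthogonal feedback matrix (entries 0 or +/- 1/sqrt(2)).
s = 1 / sqrt(2)
A = [
    [0, s, s, 0],
    [-s, 0, 0, -s],
    [s, 0, 0, -s],
    [0, s, -s, 0],
]
```

A unitary feedback matrix preserves signal energy around the loop, so in practice a loop gain below 1 is typically applied on top of it to make the reverberation decay.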

Embodiment 2 of the present invention describes a new mixing method in which the vector relationship between the downmix channel and the original channels is derived from the binaural cues for channel separation.

FIG. 6 shows the encoding process for the 2-1-2 case. The transform module (600), for example a complex QMF filter bank, processes the original channels L(t) and R(t) and generates their time-frequency representations L(t, f) and R(t, f). In the time-frequency domain a signal consists of a plurality of consecutive subbands, each representing a narrow frequency band of the original signal. The QMF filter bank can be built from multiple stages so that the widths of the frequency bands vary: subbands that need finer resolution are given narrow frequency bands, and subbands for which coarser resolution suffices are given wide ones.

The downmix module (602) processes L(t, f) and R(t, f) to generate the downmix signal M(t, f). FIG. 6 shows a method that uses "weighting".

In the present invention the ILD cue is used for level adjustment. Module (604) processes L(t, f) and R(t, f) and generates ILD(l, b) and Border. As shown in FIG. 7, L(t, f) is first divided in the frequency direction into a plurality of bands (700) in the time-frequency domain. Each band contains a plurality of subbands. To exploit the psychoacoustic properties of the ear, low-frequency bands contain fewer subbands than high-frequency bands and are therefore divided more finely. For optimal processing it is desirable to fine-tune the division positions precisely, but the division may instead follow the "Bark scale" or the "critical bands" well known in psychoacoustics.

L(t, f) and R(t, f) are further divided in the time direction at Border (702) into frequency bands (l, b), for which E_L(l, b) and E_R(l, b) are calculated. In this specification, l is the index of the time segment and b is the index of the (frequency) band. The best position for a Border is the point in time at which a transient occurs, typified by an abrupt change in the ratio of E_L(l, b) to E_R(l, b). Module (604) then calculates ILD(l, b) from the following expression.

In the encoding process, module (606) processes L(t, f) and R(t, f) to obtain the ICC cue. ICC(l, b) is obtained from the following expression.

Also in the encoding process, module (608) processes L(t, f) and R(t, f) to obtain the high-frequency cue ICCH for the high-frequency subbands (above 1.5 kHz only). ICCH(l, b) is obtained from the following expression.

ICC(l, b) and ILD(l, b) are used in channel separation to calculate the gain factors that give the actual signal strengths of the two channels relative to M. ICC(l, b) also measures the phase relationship between L and R at low frequencies, and so serves as a measure of the degree of separation between L and R. At high frequencies (typically above 1.5 kHz), however, the perceived separation of the sound depends not on the phase difference but on the similarity, or correlation, of the L and R waveforms. ICCH(l, b) is therefore better suited to measuring this waveform correlation.
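The patent's own expressions for ILD, ICC, and ICCH are not reproduced in this text. The sketch below therefore uses commonly used definitions as assumptions: band energy as the sum of squared magnitudes, ILD as the energy ratio in decibels, and ICC as the real part of the normalized cross-correlation. The function name and the sample values are illustrative only.

```python
import math


def band_cues(l_band, r_band):
    """Compute energies, ILD (dB), and ICC for one region (l, b).

    l_band and r_band are equal-length lists of complex subband samples
    that fall inside the region.
    """
    e_l = sum(abs(v) ** 2 for v in l_band)
    e_r = sum(abs(v) ** 2 for v in r_band)
    cross = sum(a * b.conjugate() for a, b in zip(l_band, r_band))
    ild_db = 10 * math.log10(e_l / e_r)       # assumed dB convention
    icc = cross.real / math.sqrt(e_l * e_r)   # assumed normalized correlation
    return e_l, e_r, ild_db, icc


# R is L scaled by 2: fully coherent, with L about 6 dB below R.
e_l, e_r, ild_db, icc = band_cues([1 + 0j, 1j], [2 + 0j, 2j])
```

Under these assumed definitions, a coherent pair gives an ICC of 1 regardless of level, while the level difference appears only in the ILD, which is why the two cues can be transmitted and applied separately.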

As indicated by the output of module (102) in FIG. 1, these binaural cues become part of the side information in the encoding process. As shown in FIG. 8, the whole of the binaural cue generation is carried out in module (800), which takes L(t, f) and R(t, f) as input and outputs ICC(l, b), ICCH(l, b), Border, and ILD(l, b).

FIG. 9 shows the decoding process, which performs channel separation using the binaural cues described above. The transform module (900) processes the downmix signal M(t) and converts it into its time-frequency representation M(t, f). Here the transform module is a complex QMF filter bank.

The decorrelator (902) processes M(t, f) and generates two orthogonal signals for use in channel separation. Module (200) shown in FIG. 2 is one example of such a decorrelator. In this embodiment, module (200) is assumed as the decorrelator, and M(t, f) and M_{0,rev}(t, f) are assumed as the signals used for channel separation. In effect, S1(t, f) and S2(t, f) in FIG. 9 are M(t, f) and M_{0,rev}(t, f) in FIG. 2, respectively.

Next in the decoding process, module (906) performs channel separation based on the output of module (904). For each band denoted (l, b), module (904) derives the mixing coefficients g_L(l, b), g_R(l, b), θ_L(l, b), and θ_R(l, b) from the binaural cues Border, ILD(l, b), ICC(l, b), and ICCH(l, b). These mixing coefficients are sent to module (906), which calculates the mixing factors g_L1(l, b), g_L2(l, b), g_R1(l, b), and g_R2(l, b) from them and performs channel separation.

The mathematical basis for calculating the mixing coefficients and for the channel separation is given below. To simplify the notation, (l, b) is omitted from here on.

Referring to the downmix process shown in FIG. 6, the relationship between the energies of L, R, and M is derived as follows.

Conventionally, ILD and ICC are defined as follows.

Accordingly, the gain coefficients g_L and g_R needed to amplify M to the appropriate levels for separating the L' and R' channels can be obtained by substituting the above definitions of ILD and ICC into the expression for E_M.
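The expressions for E_M, ILD, ICC, g_L, and g_R are not reproduced in this text. The following worked sketch shows how such a substitution can go under stated assumptions: the unweighted downmix M = (L + R)/2 given for FIG. 1 (the weighted downmix of FIG. 6 may differ), energies E_X = X ● X, ICC taken as the normalized inner product of L and R, and r denoting the amplitude ratio of L to R (how the patent's ILD maps onto r is also an assumption).

```latex
\begin{aligned}
\mathrm{ICC} &= \frac{L \bullet R}{\sqrt{E_L E_R}}, \qquad r = \sqrt{\frac{E_L}{E_R}},\\
E_M &= M \bullet M = \tfrac{1}{4}\left(E_L + E_R + 2\,\mathrm{ICC}\,\sqrt{E_L E_R}\right),\\
g_R &= \sqrt{\frac{E_R}{E_M}} = \frac{2}{\sqrt{r^{2} + 1 + 2\,r\,\mathrm{ICC}}},\\
g_L &= \sqrt{\frac{E_L}{E_M}} = r\,g_R .
\end{aligned}
```

As a check, for identical channels (r = 1, ICC = 1) both gains equal 1, so M is passed through unchanged.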

To complete the channel separation, the degrees of separation of the two channels, denoted θ_L and θ_R, must be determined. FIG. 10 shows the vector relationship among M, L, and R geometrically (Patent Document 4). All angles here are measured with reference to FIG. 10. For high frequencies (typically above 1.5 kHz), (θ_L + θ_R) is set to θ = cos⁻¹(ICCH); for low frequencies, (θ_L + θ_R) is set to θ = cos⁻¹(ICC).

Applying the trigonometric definition of the tangent, θ_R is derived as follows.

Similarly, θ_L is derived as follows.
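The expressions for θ_R and θ_L are not reproduced in this text. The sketch below shows one way the tangent construction can work, assuming that M is parallel to the vector sum L + R, that each vector length is the square root of the corresponding energy, and that θ_L + θ_R = θ = cos⁻¹ of the coherence cue. The function name and arguments are illustrative, not from the patent.

```python
import math


def separation_angles(e_l, e_r, coherence):
    """Return (theta_l, theta_r): the angles of L and R from M, in radians.

    Place R along the x-axis and L at angle theta from it. The sum L + R
    (parallel to M) then has components
        x = |R| + |L| cos(theta),  y = |L| sin(theta),
    so tan(theta_r) = y / x, and theta_l is the remainder of theta.
    """
    theta = math.acos(coherence)
    a_l, a_r = math.sqrt(e_l), math.sqrt(e_r)
    theta_r = math.atan2(a_l * math.sin(theta), a_r + a_l * math.cos(theta))
    theta_l = theta - theta_r
    return theta_l, theta_r


# Equal energies, zero coherence: M bisects the right angle between L and R.
theta_l_eq, theta_r_eq = separation_angles(1.0, 1.0, 0.0)
# L three times as energetic as R: M leans towards L, so theta_l < theta_r.
theta_l_loud, theta_r_loud = separation_angles(3.0, 1.0, 0.0)
```

In this construction the louder channel always lies closer to M, which matches the intuition that the downmix is dominated by the stronger signal.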

Module (906) performs channel separation by mixing the two decorrelated signals S1(t, f) and S2(t, f) to restore L and R, yielding L' and R'. L' and R' are not copies of the original L and R but simulations of them. As described in Embodiment 1, the decorrelator (200) is designed so that |M| = |M_{0,rev}| and so that the signals M and M_{0,rev} are orthogonal as vectors. Referring to FIG. 3, where X stands for L or R, the mixing process consists of scaling M and M_{0,rev} by the mixing factors g_L1, g_L2, g_R1, and g_R2, followed by vector addition. The factors g_L1, g_L2, g_R1, and g_R2 are derived from g_L, g_R, θ_L, and θ_R, as described below.

To derive L', the following two requirements must be satisfied.

and

Solving these two simultaneous equations for g_L1 and g_L2 gives the mixing factors for deriving the left channel L'.

Similarly, the mixing factors for deriving the right channel R' are obtained as follows.

Using the mixing factors derived above, L' and R' can be expressed as follows.
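The two requirements and the resulting expressions are not reproduced in this text. A hedged reading is that L' must have length g_L·|M| and lie at angle θ_L from M. With |M| = |M_{0,rev}| and the two signals orthogonal, that gives g_L1 = g_L·cos θ_L and g_L2 = g_L·sin θ_L. The sketch below checks this numerically; placing R' on the opposite side of M (a negative second factor) is an assumption of this example, as are all the names and values.

```python
import math


def mix_factors(gain, theta, side=1.0):
    """Return (g1, g2) that scale M and M_rev; side=-1 mirrors across M."""
    return gain * math.cos(theta), side * gain * math.sin(theta)


def mix(m, m_rev, g1, g2):
    """Scale the two orthogonal signals and add them sample by sample."""
    return [g1 * a + g2 * b for a, b in zip(m, m_rev)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def angle(x, y):
    """Angle between two signals treated as vectors, in radians."""
    return math.acos(dot(x, y) / math.sqrt(dot(x, x) * dot(y, y)))


# Toy orthogonal pair with equal norm, standing in for M and M_{0,rev}.
m = [1.0, 0.0, 1.0, 0.0]
m_rev = [0.0, 1.0, 0.0, -1.0]

g_l, theta_l = 1.2, 0.4   # illustrative mixing coefficients
g_r, theta_r = 0.8, 0.3
l_out = mix(m, m_rev, *mix_factors(g_l, theta_l))
r_out = mix(m, m_rev, *mix_factors(g_r, theta_r, side=-1.0))
```

With these factors the restored pair is separated by θ_L + θ_R, so the coherence between L' and R' equals the cosine of the transmitted angle.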

Module (908) inverse-transforms the separated channels L' and R' to form the time-domain signals L'(t) and R'(t).

Embodiment 3 of the present invention shows how the decorrelator (200) of Embodiment 1 and the new mixing method of Embodiment 2 are applied to channel separation for multiple channels.

As described in Embodiment 2 of the present invention, a channel is restored by applying appropriate mixing factors to two orthogonal signals. In general, the restored signal X is as follows.

In this expression, g_x denotes the gain coefficient and θ_x denotes the degree of separation.

This embodiment is described using the 5-1-5 case. The following expression is assumed as the downmix equation.

In the above expression, L and R denote the two front channels, L_s and R_s denote the two rear channels, and C denotes the center channel.

FIG. 11 shows the encoding process for the 5-1-5 case. In this process, BCC encoding modules (1100) to (1106) process four different combinations of channels to generate four binaural cue sets. The first binaural cue set is generated in module (1100) from the C channel and the intermediate downmix channel (L + 0.707L_s + R + 0.707R_s). Modules (1102) to (1106) work in the same way but take different inputs, each generating a different binaural cue set. The four binaural cue sets are used in the multi-stage decoding process to separate the downmix channel M successively into L, R, L_s, R_s, and C.

FIG. 12 shows the decoding process performed before channel separation. In this preprocessing, as in Embodiment 1, the downmix channel M undergoes a QMF transform (1200) and decorrelation (1202), which generates a plurality of orthogonal reverberation signals M_{i,rev}(t, f) (i = 0, 1, 2, 3).
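The structure of such a decorrelator (an all-pass filter followed by a feedback delay network with coprime delay lengths and a unitary feedback matrix, as recited in the claims) can be sketched as below. This is a minimal time-domain illustration on real-valued samples. The delay lengths, gains, and the scaled Hadamard matrix are assumptions chosen for the example, not values from the patent, and the sketch does not by itself guarantee the equal-energy and orthogonality properties the patent requires of the matrix elements.

```python
# Hypothetical sketch of an all-pass + feedback delay network (FDN) decorrelator.
# Delay lengths, gains and the 4x4 matrix are illustrative assumptions.

def allpass(x, delay=7, g=0.5):
    """Schroeder all-pass filter: y[n] = -g*x[n] + x[n-D] + g*y[n-D]."""
    y = []
    for n, xn in enumerate(x):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + xd + g * yd)
    return y

# Scaled 4x4 Hadamard matrix: orthogonal (real unitary), so A * A^T = I.
A = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1))]

def fdn(x, delays=(11, 13, 17, 19), feedback_gain=0.7):
    """Run a 4-line FDN and return one output signal per delay line.

    The delay lengths are pairwise coprime. len(delays) must match the size of A.
    """
    k = len(delays)
    lines = [[0.0] * d for d in delays]   # one circular buffer per delay line
    pos = [0] * k
    outs = [[] for _ in range(k)]
    for xn in x:
        # Oldest sample of each line, i.e. the line input delayed by delays[i].
        taps = [lines[i][pos[i]] for i in range(k)]
        for i in range(k):
            fb = sum(A[i][j] * taps[j] for j in range(k))
            lines[i][pos[i]] = xn + feedback_gain * fb  # overwrite the oldest slot
            pos[i] = (pos[i] + 1) % delays[i]
            outs[i].append(taps[i])
    return outs

def decorrelate(x):
    """All-pass stage followed by the FDN; returns four reverberation signals."""
    return fdn(allpass(x))
```

With a feedback gain below 1 and an orthogonal matrix, the feedback loop is stable, so the outputs decay rather than grow.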

Binaural cue set 1 is processed in the MCC module (1204) to generate two mixing factor sets, (g_c, θ_c) and (g_M1, θ_M1). This is done in order to separate M(t, f) into C(t, f) and M₁(t, f), where M₁(t, f) = (L(t, f) + 0.707L_s(t, f) + R(t, f) + 0.707R_s(t, f)) / 3.414. From [Equation 19] it follows readily that M(t, f) = 0.293C(t, f) + 0.707M₁(t, f), so 0.293 and 0.707 are used as the weights for C(t, f) and M₁(t, f), respectively.

Binaural cue set 2 is processed in the MCC module (1206) to generate two mixing factor sets, (g_M2, θ_M2) and (g_M3, θ_M3). This is done in order to separate M₁(t, f) into M₂(t, f) = (L(t, f) + R(t, f)) / 2 and M₃(t, f) = (L_s(t, f) + R_s(t, f)) / 2. From [Equation 19] it follows readily that M₁(t, f) = 0.586M₂(t, f) + 0.414M₃(t, f), so 0.586 and 0.414 are used as the weights for M₂(t, f) and M₃(t, f), respectively.

Binaural cue set 3 is processed in the MCC module (1208) to generate two mixing factor sets, (g_L, θ_L) and (g_R, θ_R). This is done in order to separate M₂(t, f) into L(t, f) and R(t, f). Because M₂(t, f) = 0.5L(t, f) + 0.5R(t, f), 0.5 is used as the weight for both.

Binaural cue set 4 is processed in the MCC module (1210) to generate two mixing factor sets, (g_Ls, θ_Ls) and (g_Rs, θ_Rs). This is done in order to separate M₃(t, f) into L_s(t, f) and R_s(t, f). Because M₃(t, f) = 0.5L_s(t, f) + 0.5R_s(t, f), 0.5 is used as the weight for both.
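As a consistency check on the four stages, the weights quoted above can be multiplied down the separation tree to find the gain each individual channel carries in M. The sketch below uses only the weights stated in the text; the dictionary layout and function name are hypothetical.

```python
# Weights quoted in the text for the four separation stages (5-1-5 case).
# Each entry maps a composite signal to its two components and their weights.
STAGES = {
    "M":  (("C", 0.293), ("M1", 0.707)),
    "M1": (("M2", 0.586), ("M3", 0.414)),
    "M2": (("L", 0.5), ("R", 0.5)),
    "M3": (("Ls", 0.5), ("Rs", 0.5)),
}

def effective_gains(node="M", gain=1.0, out=None):
    """Walk the tree and accumulate the gain each leaf channel has in M."""
    out = {} if out is None else out
    if node not in STAGES:               # leaf: an individual channel
        out[node] = out.get(node, 0.0) + gain
        return out
    for child, weight in STAGES[node]:
        effective_gains(child, gain * weight, out)
    return out

gains = effective_gains()
# gains["Ls"] / gains["L"] is close to 0.707 and gains["C"] / gains["L"] is
# close to 1.414; the small differences come from the 3-digit rounding of
# the stage weights.
```

The rear-to-front ratio of about 0.707 matches the intermediate downmix L + 0.707L_s + R + 0.707R_s given for module (1100), and the five gains sum to 1.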

FIG. 13 shows the multi-stage channel separation and decoding process described above. The channel separation modules (1302) to (1308) combine the mixing coefficients in a series of iterative operations involving the downmix signal M(t, f), the intermediate signals M_i(t, f) (i = 0, 1, 2, 3), and the reverberation signals M_{i,rev}(t, f) (i = 0, 1, 2, 3) generated by the decorrelator (1202). The output of one channel separation module may become the input of the next. This happens when the output is a composite signal whose further separation yields either individual audio signals or other composite signals that can be separated again. Specifically, module (1302) takes M(t, f), M_{0,rev}(t, f), (g_c, θ_c), and (g_M1, θ_M1) and separates M(t, f) into M₁(t, f) and C'(t, f). Because M₁(t, f) contains several signals, it is passed to module (1304) for further channel separation. C'(t, f), on the other hand, is the restored center channel, so it is passed to module (1310) and transformed back to a time-domain representation. Modules (1304) to (1308) operate in the same way. The equations the channel separation modules use to obtain the restored channels are as follows:

The intermediate signals are as follows.

For the channel separation to be valid, the correlation between the two channels separated in a given stage must match the correlation inferred from the BCC. That this condition is satisfied can be shown as follows.

The above channel separation is valid, because

Using the inverse QMF modules (1310) to (1318) shown in FIG. 13, all of the synthesized channels can be converted into time-domain signals.
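One separation stage, and the correlation condition it has to meet, can be sketched as below. The mixing form X = g (cos θ · M + sin θ · M_rev) is an assumption made for this illustration, since the patent's own equations are not reproduced in this text. The sketch uses real-valued toy signals in place of complex QMF-domain samples, and all names are hypothetical. Under that assumed form, when M and M_rev are orthogonal and have equal energy, the normalized correlation of the two outputs depends only on the difference of the two angles, which is what lets a mixing coefficient calculation choose angles that reproduce a target inter-channel correlation.

```python
import math

def separate(m, m_rev, mix_a, mix_b):
    """Split one signal into two outputs from a dry signal and a reverberation signal.

    ASSUMED mixing form (not taken from the text):
        X = g * (cos(theta) * M + sin(theta) * M_rev)
    mix_a and mix_b are (g, theta) pairs for the two outputs.
    """
    def apply(g, theta):
        c, s = math.cos(theta), math.sin(theta)
        return [g * (c * a + s * b) for a, b in zip(m, m_rev)]
    return apply(*mix_a), apply(*mix_b)

def correlation(x, y):
    """Normalized cross-correlation at zero lag."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Toy orthogonal, equal-energy pair standing in for M(t, f) and M_0,rev(t, f).
m = [1.0, 0.0, -1.0, 0.0]
m_rev = [0.0, 1.0, 0.0, -1.0]

theta_a, theta_b = 0.3, -0.5
c_out, m1_out = separate(m, m_rev, (0.8, theta_a), (1.1, theta_b))

# For this orthogonal, equal-energy pair the result equals cos(theta_a - theta_b),
# independent of the two gains.
rho = correlation(c_out, m1_out)
```

In a full decoder the second output (here `m1_out`) would be fed to the next stage together with a reverberation signal that has not been used yet, as described for modules (1304) to (1308).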

(Other variations)
Although the present invention has been described on the basis of the above embodiments, it is of course not limited to them. The following cases are also included in the present invention.

(1) Each of the above devices is, specifically, a computer system comprising a microprocessor, ROM, RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like. A computer program is stored in the RAM or the hard disk unit. Each device achieves its functions through the microprocessor operating in accordance with the computer program. Here, the computer program is a combination of a plurality of instruction codes, each indicating an instruction to the computer, arranged to achieve a predetermined function.

(2) Some or all of the components of each of the above devices may be implemented as a single system LSI (Large Scale Integration). A system LSI is a super-multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a computer system comprising a microprocessor, ROM, RAM, and the like. A computer program is stored in the RAM. The system LSI achieves its functions through the microprocessor operating in accordance with the computer program.

(3) Some or all of the components of each of the above devices may be implemented as an IC card or a stand-alone module that can be attached to and detached from the device. The IC card or module is a computer system comprising a microprocessor, ROM, RAM, and the like. The IC card or module may include the super-multifunctional LSI described above. The IC card or module achieves its functions through the microprocessor operating in accordance with a computer program. The IC card or module may be tamper-resistant.

(4) The present invention may be the methods described above. It may also be a computer program that implements these methods on a computer, or a digital signal consisting of the computer program.

The present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium, for example a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, BD (Blu-ray Disc), or semiconductor memory. It may also be the digital signal recorded on such a recording medium.

The present invention may also transmit the computer program or the digital signal via a telecommunication line, a wireless or wired communication line, a network such as the Internet, data broadcasting, or the like.

The present invention may also be a computer system comprising a microprocessor and a memory, in which the memory stores the computer program and the microprocessor operates in accordance with the computer program.

The invention may also be carried out by another, independent computer system, either by recording the program or the digital signal on the recording medium and transferring it, or by transferring the program or the digital signal via the network or the like.

(5) The above embodiments and the above variations may be combined with one another.

The present invention is applicable to training simulators, car audio systems, home or business audio-video systems, and the like.

FIG. 1: A typical binaural cue codec.
FIG. 2: A decorrelator using a feedback delay network.
FIG. 3: Signal synthesis as the sum of two orthogonal signal vectors.
FIG. 4: Implementation of a cascaded PS module (prior art).
FIG. 5: Example of a feedback matrix.
FIG. 6: Spatial audio decoding process in the 2-1-2 case.
FIG. 7: Band division in the time-frequency representation.
FIG. 8: Binaural cue extraction module.
FIG. 9: Spatial audio decoding process in the 2-1-2 case.
FIG. 10: Geometric representation of a stereo sound pair and its downmix.
FIG. 11: Part of the spatial audio encoding process in the 5-1-5 case.
FIG. 12: Decoding process performed before channel separation.
FIG. 13: Channel separation and decoding process used by the present invention (5-1-5 case).

Explanation of symbols

600 Transform module
602 Downmix module
604 ILD module
606 ICC module
608 ICCH module
800 2-1 BCC encoding module
900 QMF filter bank
902 Decorrelator
904 Mixing coefficient calculation module
906 Channel separation module
908 QMF⁻¹ (inverse QMF) filter bank

Claims

1. An apparatus for processing a single audio signal to generate a plurality of mutually incoherent reverberation signals, wherein the apparatus:
(a) processes the audio signal using an all-pass filter to generate an intermediate reverberation signal; and
(b) processes the intermediate reverberation signal using a feedback delay network (FDN) to generate a plurality of reverberation signals.

2. The apparatus according to claim 1, wherein the FDN includes feedforward delay lines whose delay lengths are mutually coprime, and a feedback path having a feedback matrix.

3. The apparatus according to claim 1 and claim 2, wherein the feedback matrix (1) is a unitary matrix and (2) has matrix elements such that the reverberation signals all have the same energy and are mutually incoherent and orthogonal.

4. An apparatus for encoding a plurality of signals into a bitstream consisting of a composite downmix signal and binaural cue (BC) information, wherein the apparatus:
(a) generates a downmix signal from the plurality of signals;
(b) converts the plurality of signals and the downmix signal into a hybrid time-frequency representation and divides them into a plurality of bands along the frequency axis;
(c) derives the channel separation stages needed to separate the downmix signal into individual signals in an iteratively performed multi-stage decoding process;
(d) at each channel separation stage, determines boundaries (Border) that further divide the plurality of bands into frequency regions along the time direction; and
(e) at each channel separation stage, calculates BC information for each frequency band using the plurality of signals and the downmix signal.

5. The apparatus according to claim 4, wherein each channel separation stage separates an input composite downmix signal consisting of a plurality of signals into two signals, each of which is either (1) another composite signal consisting of a plurality of signals or (2) a single signal.

6. The apparatus according to claim 4, wherein the boundaries are placed at temporal positions where a transient event occurs, as typified by a large change in ILD.

7. The apparatus according to claim 4, wherein the binaural cue information includes, as required, an inter-channel level difference (ILD) cue, an inter-channel coherence (ICC) cue, and a high-frequency inter-channel coherence (ICCH) cue between the two signals to be separated.

8. The apparatus according to claim 4 and claim 7, wherein the ILD cue is the energy ratio between the two signals to be separated in one frequency band.

9. The apparatus according to claim 4 and claim 7, wherein the ICC cue is used to measure the phase correlation between the two signals to be separated in one frequency band.

10. The apparatus according to claim 4 and claim 7, wherein the ICCH cue is used to measure the waveform correlation, rather than the phase correlation, between the two signals to be separated in one frequency band.

11. An apparatus for decoding a bitstream consisting of a composite downmix signal and BC information into a plurality of individual signals using weighting coefficients, wherein the apparatus:
(a) converts the composite downmix signal into a hybrid time-frequency representation and divides it into a plurality of bands along the frequency axis;
(b) applies an implementation of the apparatus according to claim 1 to the downmix signal to generate a plurality of decorrelated reverberation signals for use in channel separation;
(c) at each channel separation stage, uses a mixing coefficient calculation (MCC) module to process every set consisting of weighting coefficients and binaural cues, including Border, ILD, ICC, and ICCH, to derive mixing coefficients;
(d) at each channel separation stage, uses a channel separation (CS) module to modulate the composite downmix signal and one of the decorrelated reverberation signals with the mixing coefficients and to separate them into two output signals, each of which is either a single signal or a composite signal;
(e) when an output signal is a composite signal, processes that output signal repeatedly in further CS modules, each time using a reverberation signal that has not been used before, until all composite signals have been separated into individual signals; and
(f) inversely transforms all of the individual signals from the time-frequency representation to the time domain to restore the multi-channel audio signal.

12. The apparatus according to claims 1, 3, and 11, wherein the plurality of decorrelated reverberation signals are orthogonal to one another and to the input downmix signal.

13. The apparatus according to claim 11, wherein the MCC module generates two mixing coefficient sets based on the BC information and the weighting coefficients applied to each of the two output signals output at the corresponding channel separation stage.