JP2008538239A

JP2008538239A - Apparatus and method for generating data streams and multi-channel representations

Info

Publication number: JP2008538239A
Application number: JP2008503398A
Authority: JP
Inventors: Wolfgang Fiesel; Matthias Neusinger; Harald Popp; Stephan Geyersberger
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2005-03-30
Filing date: 2006-03-15
Publication date: 2008-10-16
Anticipated expiration: 2026-03-15
Also published as: CN101189661A; JP5273858B2; ATE434253T1; US7903751B2; TWI318845B; WO2006102991A1; CA2603027C; EP1864279A1; US20080013614A1; DE102005014477A1; AU2006228821A1; TW200644704A; AU2006228821B2; CA2603027A1; HK1111259A1; EP1864279B1; CN101189661B; DE502006003997D1; MY139836A

Abstract

To time-synchronize a data stream containing multi-channel auxiliary data with a data stream containing data on one or more basic channels (3), an encoder calculates (2) fingerprint information for the one or more basic channels and inserts (4) this fingerprint information into a data stream in temporal association with the multi-channel auxiliary data. At the decoder, fingerprint information is calculated from the one or more basic channels and used together with the fingerprint information extracted from the data stream to calculate, for example by correlation, the time offset between the data stream containing the multi-channel auxiliary information and the data stream containing the one or more basic channels. The offset is then compensated to generate a synchronized multi-channel representation.

Description

The present invention relates to audio signal processing and, in particular, to multi-channel processing techniques that reproduce an original multi-channel signal from one or more basic channels and/or downmix channels together with multi-channel auxiliary information.

In recent years, techniques have been developed that transmit audio signals more efficiently than ever by reducing the amount of data, and that enhance the listening experience through extensions such as multi-channel technology. Such extensions of known transmission techniques have recently become known as binaural cue coding (BCC) and “spatial audio coding”, and are described in J. Herre, C. Faller, S. Disch, C. Ertel, J. Hilbert, A. Hoelzer, K. Linzmeier, C. Sprenger, P. Kroon: “Spatial Audio Coding: Next Generation Efficient and Compatible Coding of Multi-Channel Audio”, 117th AES Convention, San Francisco 2004, Preprint 6186.

Various techniques for reducing the amount of data required to transmit a multi-channel audio signal are described in detail below.

These techniques are called joint stereo techniques. To illustrate them, reference is made to FIG. 3, which shows a joint stereo device 60. This device may implement, for example, the intensity stereo (IS) technique or binaural cue coding (BCC). It generally receives two or more channels CH1, CH2, ..., CHn as input signals and outputs a single carrier channel together with parametric multi-channel information. The parametric data are defined such that a decoder can calculate an approximation of the original channels (CH1, CH2, ..., CHn).

Typically, the carrier channel contains subband samples, spectral coefficients, time-domain samples and the like, which provide a comparatively fine representation of the underlying signal. The parametric data, by contrast, contain no such samples or spectral coefficients. They contain control parameters that steer a given reconstruction algorithm, such as weighting by multiplication, time shifting or frequency shifting. Parametric multi-channel information therefore gives only a comparatively coarse representation of the signal or of the associated channel. In numbers, a carrier channel requires about 60 to 70 kbit/s, whereas the parametric auxiliary information requires about 1.5 to 2.5 kbit/s per channel. These figures apply to compressed data; an uncompressed CD channel requires roughly ten times as much. Examples of parametric data are the well-known scale factors, intensity stereo information and BCC parameters, as described below.
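The "roughly ten times" figure can be checked with a short calculation. The sketch below assumes standard CD parameters (44.1 kHz sampling rate, 16 bits per sample), which the text does not state; the constant and function names are illustrative only.

```python
# Rough data-rate comparison for a single audio channel.
# Assumption (not stated in the text): CD audio uses 44.1 kHz, 16-bit samples.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16


def cd_channel_kbits() -> float:
    """Data rate of one uncompressed CD channel in kbit/s."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE / 1000.0


# Ranges quoted in the text for compressed data, in kbit/s.
carrier_kbits = (60.0, 70.0)
side_info_kbits = (1.5, 2.5)

uncompressed = cd_channel_kbits()
ratio_low = uncompressed / carrier_kbits[1]   # against the 70 kbit/s carrier
ratio_high = uncompressed / carrier_kbits[0]  # against the 60 kbit/s carrier

print(f"uncompressed CD channel: {uncompressed:.1f} kbit/s")
print(f"ratio to carrier channel: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

Under these assumptions the uncompressed channel is about 10 to 12 times larger than the compressed carrier channel, which is consistent with the order of magnitude given above.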

The intensity stereo coding technique is described in AES Preprint 3799, "Intensity Stereo Coding", J. Herre, K.H. Brandenburg, D. Lederer, February 1994, Amsterdam. In general, the concept of intensity stereo is based on a principal axis transformation applied to the data of both stereophonic audio channels. If most of the data points are concentrated around the first principal axis, a coding gain can be achieved by rotating both signals by a certain angle before coding. This does not always hold for real stereophonic reproduction techniques, however. The method is therefore modified so that the second orthogonal component is excluded from transmission in the bitstream. As a result, the signals reconstructed for the left and right channels are differently weighted, or scaled, versions of the same transmitted signal. The reconstructed signals differ in amplitude but are identical in their phase information. The energy-time envelopes of both original audio channels are nevertheless preserved by the selective scaling operation, which typically operates in a frequency-selective manner. This corresponds to human sound perception at high frequencies, where the dominant spatial cues are determined by the energy envelopes.

In addition, in practical implementations the transmitted signal, that is, the carrier channel, is generated from the sum signal of the left and right channels rather than by rotating both components. Furthermore, this processing, that is, the generation of the intensity stereo parameters used for scaling, is performed frequency-selectively: independently for each scale factor band, that is, for each frequency partition of the encoder. Preferably, both channels are combined into one combined or “carrier” channel, and intensity stereo information is formed in addition to the combined channel. The intensity stereo information depends on the energy of the first channel, the energy of the second channel or the energy of the combined channel.
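The sum-signal variant of intensity stereo can be sketched for a single scale factor band as follows. This is only an illustration under stated assumptions: the band is given as a plain list of samples or coefficients, the scale factors are derived from the ratio of each channel's energy to the carrier energy, and the function names `is_encode_band` and `is_decode_band` are hypothetical. Real codecs quantize the parameters and handle many bands per frame.

```python
import math


def energy(x):
    """Sum of squared values of one band."""
    return sum(v * v for v in x)


def is_encode_band(left, right):
    """Return (carrier, g_left, g_right) for one scale factor band.

    The carrier is the sum of both channels. The gains are chosen so that
    scaling the carrier restores each channel's energy (an assumption made
    for this sketch).
    """
    carrier = [l + r for l, r in zip(left, right)]
    e_carrier = energy(carrier)
    if e_carrier == 0.0:
        return carrier, 0.0, 0.0
    g_left = math.sqrt(energy(left) / e_carrier)
    g_right = math.sqrt(energy(right) / e_carrier)
    return carrier, g_left, g_right


def is_decode_band(carrier, g_left, g_right):
    """Rebuild both channels as scaled copies of the same carrier."""
    left = [g_left * c for c in carrier]
    right = [g_right * c for c in carrier]
    return left, right
```

Both decoded channels share the waveform, and hence the phase, of the carrier and differ only in amplitude, while the energy of each original channel in the band is preserved.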

The BCC technique is described in AES Convention Paper 5574, “Binaural Cue Coding applied to stereo and multi-channel audio compression”, C. Faller, F. Baumgarte, May 2002, Munich. In BCC coding, a number of audio input channels are converted to a spectral representation using a DFT-based transform with overlapping windows. The resulting spectrum is divided into non-overlapping partitions, each of which has an index. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB). For each partition and each frame k, the inter-channel level differences (ICLD) and the inter-channel time differences (ICTD) are determined. The ICLD and ICTD are quantized and coded, and finally enter a BCC bitstream as auxiliary information. The inter-channel level differences and inter-channel time differences are given for each channel relative to a reference channel. The parameters are then calculated according to predetermined formulas that depend on the particular partition of the signal to be processed.
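To make the two cues concrete, here is a simplified Python sketch. It is not the per-partition, DFT-based analysis described above: it works on broadband time-domain samples, takes the ICLD as an energy ratio in dB, and takes the ICTD as the lag that maximizes the cross-correlation with the reference channel. The function names and the sign convention (a positive lag means the channel is delayed relative to the reference) are assumptions of this sketch.

```python
import math


def icld_db(ref, ch):
    """Level difference of `ch` relative to `ref` in dB (energy ratio)."""
    e_ref = sum(v * v for v in ref)
    e_ch = sum(v * v for v in ch)
    if e_ref == 0.0 or e_ch == 0.0:
        raise ValueError("ICLD is undefined for a silent channel")
    return 10.0 * math.log10(e_ch / e_ref)


def ictd_samples(ref, ch, max_lag):
    """Lag in samples at which `ch` correlates best with `ref`.

    Positive result: `ch` is delayed relative to `ref`.
    """
    best_lag = 0
    best_value = -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(ch):
                acc += r * ch[m]
        if acc > best_value:
            best_value = acc
            best_lag = lag
    return best_lag
```

For a channel that is a half-amplitude copy of the reference delayed by two samples, `icld_db` gives about -6 dB and `ictd_samples` gives 2.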

On the decoder side, the decoder typically receives a mono signal and the BCC bitstream. The mono signal is transformed into the frequency domain and fed into a spatial synthesis block, which also receives the decoded ICLD and ICTD values. In the spatial synthesis block, the BCC parameters (ICLD and ICTD) are used to weight the mono signal in order to synthesize the multi-channel signals, which, after a frequency/time conversion, represent a reconstruction of the original multi-channel audio signal.

In the case of BCC, the joint stereo module 60 outputs the channel auxiliary information such that the parametric channel data are quantized and coded ICLD or ICTD parameters, with one of the original channels serving as the reference channel for coding the channel auxiliary information.

Typically, the carrier signal is formed as the sum of the participating original channels.

Naturally, the above techniques provide only a mono representation for a decoder that can process only the carrier channel and cannot process the parametric data to generate one or more approximations of the multiple input channels.

The BCC technique is also described in US patent publications US 2003/0219130 A1, US 2003/0026441 A1 and US 2003/0035553 A1. Reference is also made to the technical publication "Binaural Cue Coding. Part II: Schemes and Applications", C. Faller and F. Baumgarte, IEEE Trans. On Audio and Speech Proc., Vol. 11, No. 6, November 2003.

A typical BCC scheme for multi-channel audio coding is described in detail below with reference to FIGS. 4 to 6.

FIG. 5 shows such a BCC scheme for coding and transmitting multi-channel audio signals. The multi-channel audio input signal at the input 110 of a BCC encoder 112 is mixed down in a so-called downmix block 114. In this example, the original multi-channel signal at the input 110 is a five-channel surround signal with a front left channel, a front right channel, a left surround channel, a right surround channel and a center channel. In the preferred embodiment of the present invention, the downmix block 114 generates a sum signal by simply adding these five channels into one mono signal.
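The simple additive downmix can be sketched in a few lines of Python. The function name `downmix_sum` and the channel order are illustrative, and no gain normalization or clipping protection is applied, since the text describes a plain sum.

```python
def downmix_sum(channels):
    """Add equally long channels sample by sample into one mono signal."""
    lengths = {len(ch) for ch in channels}
    if len(lengths) > 1:
        raise ValueError("all channels must have the same number of samples")
    return [sum(samples) for samples in zip(*channels)]


# Hypothetical two-sample excerpt of a five-channel surround signal.
five_channels = [
    [1.0, 2.0],   # front left
    [0.5, 0.5],   # front right
    [0.0, -1.0],  # left surround
    [1.0, 1.0],   # right surround
    [2.0, 0.0],   # center
]
mono = downmix_sum(five_channels)
```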

Other downmix schemes are known in the prior art, by which a downmix signal with a single channel is obtained from a multi-channel input signal.

This single channel is output on a sum signal line 115. The auxiliary information obtained from the BCC analysis block 116 is output on an auxiliary information line 117.

As described above, the inter-channel level differences (ICLD) and inter-channel time differences (ICTD) are calculated in the BCC analysis block. The BCC analysis block 116 can also calculate inter-channel correlation values (ICC values). The sum signal and the auxiliary information are transmitted to a BCC decoder 120 in quantized and coded form. The BCC decoder splits the transmitted sum signal into a number of subbands and performs scaling, delaying and further processing steps to provide the subbands of the multi-channel audio channels to be output. This processing is carried out such that the ICLD, ICTD and ICC parameters (cues) of the reconstructed multi-channel signal at the output 121 match the corresponding cues of the original multi-channel signal at the input 110 of the BCC encoder 112. For this purpose, the BCC decoder 120 comprises a BCC synthesis block 122 and an auxiliary information processing block 123.

The internal structure of the BCC synthesis block 122 is described below with reference to FIG. 6. The sum signal on the line 115 is fed into a time/frequency conversion unit or filter bank FB 125. At the output of block 125 there are N subband signals or, in the extreme case, a block of spectral coefficients, namely when the audio filter bank 125 performs a 1:1 transform, that is, a transform that generates N spectral coefficients from N time-domain samples.

The BCC synthesis block 122 further comprises a delay stage 126, a level modification stage 127, a correlation processing stage 128 and an inverse filter bank stage IFB 129. At the output of stage 129, the reconstructed multi-channel audio signal, which has five channels in the case of a five-channel surround system, may be output to a set of loudspeakers 124 as shown in FIG. 5 or FIG. 4.

The input signal s(n) is converted into the frequency domain or the filter bank domain by the element 125. The signal output by the element 125 is copied so that several versions of the same signal are obtained, as indicated by the copy node 130. The number of versions of the original signal equals the number of output channels in the output signal. Each version of the original signal at the node 130 is then subjected to a certain delay d_1, d_2, ..., d_i, ..., d_N. The delay parameters are calculated by the auxiliary information processing block 123 in FIG. 5 and are derived from the inter-channel time differences calculated by the BCC analysis block 116 in FIG. 5.

The same applies to the multiplication parameters a_1, a_2, ..., a_i, ..., a_N, which are likewise calculated by the auxiliary information processing block 123 on the basis of the inter-channel level differences calculated by the BCC analysis block 116.
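The copy, delay and scale structure can be illustrated with a small Python sketch. It is deliberately simplified: it operates on one broadband time-domain signal with integer sample delays, whereas the synthesis described here works per subband in the filter bank domain and also includes the correlation stage 128. The function name `synthesize` is hypothetical.

```python
def synthesize(sum_signal, delays, gains):
    """Create one output channel per (delay, gain) pair.

    Each output is a copy of `sum_signal`, delayed by d_i samples
    (zero-padded at the start, same length as the input) and then
    multiplied by a_i.
    """
    if len(delays) != len(gains):
        raise ValueError("need exactly one delay and one gain per output channel")
    n = len(sum_signal)
    outputs = []
    for d, a in zip(delays, gains):
        if d < 0:
            raise ValueError("delays must be non-negative in this sketch")
        d = min(d, n)
        delayed = [0.0] * d + list(sum_signal[: n - d])
        outputs.append([a * v for v in delayed])
    return outputs
```

For example, `synthesize([1.0, 2.0, 3.0], [0, 1], [1.0, 0.5])` yields an unchanged first channel and a second channel that is delayed by one sample and halved in amplitude.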

The ICC parameters calculated by the BCC analysis block 116 are used to control the functionality of block 128 such that a given correlation is obtained between the delayed and level-manipulated signals at the output of block 128. Note that the order of the stages 126, 127 and 128 may differ from the order shown in FIG. 6.

Note also that, when the audio signal is processed frame by frame, the BCC analysis is likewise performed frame by frame, that is, variably over time, and that a frequency-wise BCC analysis is additionally obtained, as is apparent from the filter bank division in FIG. 6. This means that the BCC parameters are obtained for each spectral band. It also means that, if the audio filter bank 125 splits the input signal into, for example, 32 bandpass signals, the BCC analysis block obtains a set of BCC parameters for each of the 32 bands. The BCC synthesis block 122 of FIG. 5, shown in more detail in FIG. 6, likewise performs a reconstruction based on the 32 bands mentioned by way of example.

A scenario for determining the individual BCC parameters is described below with reference to FIG. 4. Normally, the ICLD, ICTD and ICC parameters can be defined between channel pairs. It is preferred, however, to determine the ICLD and ICTD parameters between a reference channel and each other channel. This is shown in FIG. 4A.

The ICC parameters can be defined in different ways. In general, as shown in FIG. 4B, ICC parameters can be determined in the encoder between all possible channel pairs. It has been proposed, however, to calculate ICC parameters only between the two strongest channels at any given time, as shown in FIG. 4C. In the example of FIG. 4C, an ICC parameter is calculated between channels 1 and 2 at one time and between channels 1 and 5 at another time. The decoder then synthesizes the inter-channel correlation between the strongest channels and uses certain heuristic rules to calculate and synthesize the inter-channel coherence for the remaining channel pairs.
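The "two strongest channels" variant can be sketched as follows. The sketch makes two assumptions that the text does not specify: "strongest" is interpreted as highest energy within the current frame, and the ICC value is taken as the normalized cross-correlation at zero lag. The function names are hypothetical.

```python
import math


def energy(x):
    """Sum of squared samples of one channel in the current frame."""
    return sum(v * v for v in x)


def two_strongest(channels):
    """Indices of the two channels with the highest energy, in ascending order."""
    ranked = sorted(range(len(channels)),
                    key=lambda i: energy(channels[i]),
                    reverse=True)
    return tuple(sorted(ranked[:2]))


def icc(x, y):
    """Normalized zero-lag correlation between two channels (range -1..1)."""
    denom = math.sqrt(energy(x) * energy(y))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / denom
```

Per frame, an encoder following this idea would call `two_strongest` first and then transmit `icc` only for that pair, together with the pair's indices.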

With respect to calculating, for example, the multiplication parameters a_1, ..., a_N from the transmitted ICLD parameters, reference is made to AES Convention Paper 5574. The ICLD parameters represent the energy distribution of the original multi-channel signal. Without loss of generality, it is preferred to take four ICLD parameters representing the energy difference between each of the other channels and the front left channel, as shown in FIG. 4A. In the auxiliary information processing block 123, the multiplication parameters a_1, ..., a_N are derived from the ICLD parameters such that the total energy of all reconstructed output channels is equal to (or proportional to) the energy of the transmitted sum signal.
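One way to satisfy this energy constraint is sketched below in Python. It assumes that each ICLD value is the level of a channel relative to the reference channel in dB, that the reference channel comes first, and that the output channels are scaled copies of the sum signal whose energies simply add. The exact formula of the cited paper is not reproduced here, and the function name is hypothetical.

```python
import math


def gains_from_icld(icld_db):
    """Gains a_1..a_N for a reference channel plus len(icld_db) other channels.

    The gains are normalized so that the squares add up to 1, which keeps
    the total output energy equal to the energy of the sum signal.
    """
    # Relative power of each channel; the reference channel has power 1.
    rel_power = [1.0] + [10.0 ** (d / 10.0) for d in icld_db]
    total = sum(rel_power)
    return [math.sqrt(p / total) for p in rel_power]


# Four ICLD values relative to the front left channel (illustrative numbers).
gains = gains_from_icld([-6.0, 0.0, 3.0, -3.0])
```

The ratio between any gain and the reference gain reproduces the corresponding level difference, and scaling all gains by a common factor would give the "proportional" case mentioned in the text.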

As is apparent from FIG. 5, such parametric multi-channel coding schemes generally produce one or more basic channels and auxiliary information. In block-based schemes, as is likewise apparent from FIG. 5, the original multi-channel signal at the input 110 is typically processed block by block by a blocking stage 111, so that the downmix signal, the sum signal or the one or more basic channels for each block are formed from one block of, for example, 1152 samples. At the same time, the corresponding multi-channel parameters are generated for each block by the BCC analysis. Typically, the sum signal or downmix channel is then coded again by a block-based encoder, such as an MP3 encoder or an AAC encoder, to reduce the data rate further. Likewise, the parameter data are coded, for example by differential coding, scaling/quantization and entropy coding.
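The blocking step can be pictured with a short Python sketch that cuts one channel into consecutive blocks of 1152 samples. Zero-padding the last, incomplete block is an assumption of this sketch, and the function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(samples, block_size=1152):
    """Cut a sample sequence into consecutive blocks of `block_size` samples.

    The final block is padded with zeros if the input does not fill it.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for start in range(0, len(samples), block_size):
        block = list(samples[start:start + block_size])
        block += [0.0] * (block_size - len(block))
        blocks.append(block)
    return blocks
```

Each block produced this way would be paired with exactly one set of multi-channel parameters from the analysis stage.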

A common data stream is then produced at the output of the overall encoder, which comprises the BCC encoder 112 and a downstream basic channel encoder. In this data stream, one block of the one or more basic channels follows the preceding block of the one or more basic channels, and the coded multi-channel auxiliary information is inserted, for example by a bitstream multiplexer.

The multi-channel auxiliary information is inserted such that the data stream of basic channel data and multi-channel auxiliary information always contains a block of basic channel data together with the block of multi-channel auxiliary data belonging to it. These two blocks then form, for example, a transmission frame. The transmission frame is subsequently sent to a decoder via a transmission path.

On the input side, the decoder comprises a data stream demultiplexer, which splits a frame of the data stream back into a block of basic channel data and the associated block of multi-channel auxiliary information. The block of basic channel data is then decoded, for example by an MP3 decoder or an AAC decoder. The decoded block of basic channel data is passed to the BCC decoder 120 together with the block of multi-channel auxiliary information, which may also have been decoded.
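The pairing of basic channel data and auxiliary information in one transmission frame, and its separation at the decoder, can be sketched as follows. The frame layout used here (two big-endian 32-bit length fields followed by the two payloads) is purely hypothetical and is not the format of any real codec; the function names `mux` and `demux` are likewise illustrative.

```python
import struct

HEADER = ">II"  # hypothetical header: length of base block, length of side info
HEADER_SIZE = struct.calcsize(HEADER)


def mux(pairs):
    """Serialize (base_block, side_info) byte pairs into one data stream."""
    out = bytearray()
    for base, side in pairs:
        out += struct.pack(HEADER, len(base), len(side))
        out += base
        out += side
    return bytes(out)


def demux(stream):
    """Split a data stream back into its (base_block, side_info) pairs."""
    pairs = []
    pos = 0
    while pos < len(stream):
        len_base, len_side = struct.unpack_from(HEADER, stream, pos)
        pos += HEADER_SIZE
        base = stream[pos:pos + len_base]
        pos += len_base
        side = stream[pos:pos + len_side]
        pos += len_side
        pairs.append((base, side))
    return pairs
```

Because each frame carries both blocks, the demultiplexer returns them already paired, and no separate synchronization step is required.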

In this way, the temporal association between auxiliary information and basic channel data is determined automatically by transmitting the two together, and a frame-based decoder can easily restore it. Because the two kinds of data, a block of basic channel data and its associated auxiliary information, are transmitted together in a single data stream, the decoder finds the auxiliary information belonging to each block automatically, which allows high-quality multi-channel reconstruction. The problem of the multi-channel auxiliary information having a time offset relative to the basic channel data therefore does not arise. If such an offset did occur, however, it would cause a severe loss of quality in the multi-channel reconstruction, since a block of basic channel data would be processed not with its own multi-channel auxiliary information but, for example, with that of the preceding or following block.

The association between multi-channel auxiliary information and basic channel data is lost, however, when the two are not carried in one common data stream but each forms a data stream of its own. This situation can arise, for example, in sequentially operating transmission systems such as radio or the Internet. There, the audio program to be transmitted is split into audio basic data (a mono or stereo downmix audio signal) and extension data (the multi-channel auxiliary information), which are sent separately or combined. Even if the transmitter sends both data streams at the same time, many “surprises” can occur on the transmission path to the receiver, with the result that, for example, the data stream of multi-channel auxiliary information, which is much smaller in terms of bits, reaches the receiver earlier than the data stream of basic channel data.

Furthermore, to achieve a particularly efficient bit rate, it is preferred to use encoders and decoders with a variable output data rate. It is then not possible to predict how long decoding a block of basic channel data will take. This processing time also depends on the decoding hardware actually used, for example in a personal computer or a digital receiver. In addition, there are irregularities inherent in the system and/or the algorithm. With the bit reservoir technique in particular, a constant output data rate is produced on average, but locally the bits that are not needed for a block that is easy to code are saved in the bit reservoir and then spent on another block that is difficult to code, for example because the audio signal it contains is particularly demanding.
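The bit reservoir behaviour described here can be illustrated with a toy allocation loop in Python. The numbers, the cap on the reservoir size and the function name `allocate_bits` are assumptions of this sketch, and real encoders such as MP3 impose further constraints that are not modelled.

```python
def allocate_bits(demands, mean_bits, reservoir_max):
    """Grant bits to successive blocks using a simple bit reservoir.

    Every block adds `mean_bits` to the budget. Bits that an easy block
    does not need are saved (up to `reservoir_max`) and can be spent by a
    later block that needs more than the mean.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = mean_bits + reservoir
        used = min(demand, available)
        granted.append(used)
        reservoir = min(available - used, reservoir_max)
    return granted


# Two easy blocks followed by a hard one (illustrative numbers).
with_reservoir = allocate_bits([500, 500, 2000, 1000], 1000, 1500)
without_reservoir = allocate_bits([500, 500, 2000, 1000], 1000, 0)
```

With the reservoir, the third block can use 2000 bits even though the mean is 1000; without it, the block is limited to 1000 bits. The number of bits per block therefore varies locally, which is one reason why the timing of the decoded blocks is hard to predict.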

On the other hand, splitting the combined data stream described above into two separate data streams has significant advantages. An older receiver, for example a pure mono or stereo receiver, can receive and play back the audio basic data at any time, regardless of the content and version of the multi-channel auxiliary information. Splitting into separate data streams thus ensures backward compatibility of the whole concept.

A new-generation receiver, by contrast, can evaluate the multi-channel auxiliary information and combine it with the audio basic data, so that the complete extension, that is, multi-channel sound, can be provided to the user.

Digital radio is a particularly interesting application for the separate transmission of audio basic data and extension data. Here, multi-channel auxiliary information can be used to extend the stereo audio signal broadcast so far to a multi-channel format such as 5.1 with little additional transmission effort. The program provider generates the multi-channel auxiliary information on the transmitter side from a multi-channel sound source, as found, for example, on audio/video DVDs. This multi-channel auxiliary information is then transmitted in parallel with the stereo audio signal broadcast as before, which is now no longer simply a stereo signal but comprises two basic channels generated from the multi-channel signal by a downmix. To the listener, however, the stereo signal of the two basic channels sounds like a conventional stereo signal, since the multi-channel analysis ultimately performs steps similar to the conventional production process in which several tracks are mixed into one stereo signal.

分離処理のすばらしい利点は、既存のデジタル無線送信システムと互換性があるということである。補助情報を分析することのできない従来の受信装置でも、品質の制限を受けることなく、従来どおり２チャネルの音声信号を受信し再生することができる。一方、新しいタイプの受信装置では、マルチチャネル情報を既に受信したステレオ音声信号と併せて分析し、復号化し、それに基づいて元の５．１マルチチャネル信号を再生する。 The great advantage of the separation process is that it is compatible with existing digital radio transmission systems. Even a conventional receiving apparatus that cannot analyze auxiliary information can receive and reproduce a two-channel audio signal as usual without being limited in quality. On the other hand, a new type of receiving apparatus analyzes and decodes the multi-channel information together with the already received stereo audio signal, and reproduces the original 5.1 multi-channel signal based thereon.

デジタル無線システムにおいて、従来使われていたステレオ信号に変わるものとしてマルチチャネル補助情報を同時に送信するためには、上記で述べたようにマルチチャネル補助情報を符号化したダウンミックス音声信号と結合する方法が考えられる。つまり、必要があればスケーリングでき、かつ従来の受信装置でも読み出すことが可能な１つのデータストリームが考えられる。しかしながら、この時、従来の受信装置はマルチチャネル補助情報に関する補助データを検知しない。 In a digital radio system, one conceivable way of transmitting multi-channel auxiliary information alongside the stereo signal used so far is to combine the multi-channel auxiliary information with the encoded downmix audio signal, as described above. The result is a single data stream that can be scaled if necessary and can still be read by a conventional receiver. In that case, however, the conventional receiver does not detect the auxiliary data carrying the multi-channel auxiliary information.

また、受信装置は（有効な）音声データストリームのみ検知し、新しいタイプの受信装置の場合はさらにマルチチャネル音声補助情報を、対応するアップストリームデータ配信装置を介してデータストリームから抽出し、復号化し、５．１マルチチャネル音声として出力する。この時、マルチチャネル補助情報の抽出は、関連する音声データブロックに同期して行われる。 In addition, the receiving device detects only (valid) audio data streams, and in the case of a new type of receiving device, multi-channel audio auxiliary information is further extracted from the data stream via the corresponding upstream data distribution device and decoded. 5.1 Output as multi-channel audio. At this time, the extraction of the multi-channel auxiliary information is performed in synchronization with the related audio data block.

しかしながら、このアプローチの欠点は、従来のようにステレオ音声信号のみ送信するのではなく、ダウンミックス信号および拡張を結合したデータ信号を送信できるように、従来の構造および／または従来のデータ経路を改良する必要がある点である。そうすれば、標準の送信形式をステレオデータに適応した場合、無線送信においても、同期性は結合データストリームにより保障される。 However, the disadvantage of this approach is that it improves the conventional structure and / or the conventional data path so that it can transmit a data signal combined with a downmix signal and an extension rather than transmitting only a stereo audio signal as in the prior art. It is a point that needs to be done. Then, when the standard transmission format is applied to stereo data, the synchronization is ensured by the combined data stream even in wireless transmission.

しかしながら、従来の無線システムを変更しなければならない、つまりデコーダのみならず無線送信装置および標準化された送信プロトコルも改良しなければならないとすれば、市場の発展の面からかなり大きな問題である。従って、この方法は、一旦標準として実施されているシステムを変更しなければならないという点でかなりの不利益がある。 However, if the conventional wireless system has to be changed, that is, if not only the decoder but also the wireless transmission device and the standardized transmission protocol have to be improved, it is a considerable problem in terms of market development. Therefore, this method has a considerable disadvantage in that the system once implemented as a standard must be changed.

別の選択肢としては、マルチチャネル補助情報を従来の音声符号化システムに適用せず、実際の音声データストリームにも挿入しない方法がある。この場合、送信は異なるデジタル補助チャネルを介して行われるが、必ずしも同期する必要がない。そのような例としては、スタジオ内の従来の音声配信システムによって、例えばＡＥＳ／ＥＢＵデータ形式によるＰＣＭデータのような、非圧縮形式でダウンミックスデータを送信する場合が考えられる。そのようなシステムは音声信号を様々な発信元間でデジタル配信することを目的としており、通常、「クロスレール」として知られている機能ユニットが使われている。この方法に変えて、もしくは追加して、音声調節と動的圧縮を目的としてＰＣＭ形式で音声信号を処理する方法もある。いずれの方法においても、送信装置と受信装置の間の通信経路において、予測不能な遅延が発生する。 Another option is not to apply the multi-channel auxiliary information to the conventional speech coding system and not to insert it into the actual speech data stream. In this case, transmission takes place via a different digital auxiliary channel, but it does not necessarily have to be synchronized. As such an example, a case where downmix data is transmitted in an uncompressed format such as PCM data in an AES / EBU data format by a conventional audio distribution system in a studio can be considered. Such systems are intended to digitally distribute audio signals between various sources, and typically use functional units known as “cross rails”. There is a method of processing an audio signal in the PCM format for the purpose of audio adjustment and dynamic compression instead of or in addition to this method. In any method, an unpredictable delay occurs in the communication path between the transmission device and the reception device.

一方で、基本チャネルデータとマルチチャネル補助情報を分離して送信する方法は、既存のステレオシステムを変更する必要がないという点から特に興味深い。つまり、最初の対応策で述べた、標準に適合しないという不利益は発生しないということである。無線システムは補助チャネルのみを送信すればよく、既存のステレオチャネルシステムを変更する必要がない。受信装置のみ下位互換性を持つよう改良する努力をすればよく、ユーザは新しいタイプの受信装置で古いタイプの受信装置より高品質の音声を得られる。 On the other hand, the method of transmitting basic channel data and multi-channel auxiliary information separately is particularly interesting because it does not require modification of an existing stereo system. In other words, the disadvantage of not conforming to the standard described in the first countermeasure does not occur. The wireless system only needs to transmit the auxiliary channel and there is no need to change the existing stereo channel system. Efforts should be made to improve only the receiving device to be backward compatible, and the user can obtain higher quality speech with a new type of receiving device than with an old type of receiving device.

既に述べたとおり、時間シフトの幅は受信した音声信号および補助情報では決定できない。したがって、受信装置において、正しく同期するマルチチャネル信号を再生および対応付けできるかどうか保証がない。このような遅延の更なる例として、例えばデジタル無線の受信装置のような既存の２チャネル送信システムをマルチチャネル送信に改良する場合が考えられる。この場合、ダウンミックス信号を従来の受信装置内の２チャネル音声デコーダで復号化する際、遅延時間が予測できず補正できないということが良く起こる。極端なケースでは、ダウンミックス音声信号は、アナログ部分を持つ送信システムを介してマルチチャネル再生音声デコーダへ送信されることすらある。つまり、ある時点でデジタル／アナログ変換が行われ、その後記憶処理／送信処理を経て、再度アナログ／デジタル変換が行われる。無線通信では、このようなことが常に発生する。しかも、マルチチャネル補助情報に対するダウンミックス信号の遅延をいかに適切に補正するかを前もって予測することができない。また、Ａ／Ｄ変換のサンプル周波数とＤ／Ａ変換のサンプル周波数が互いに少しでも違えば、２つのサンプルレート同士の比率に応じて、必然的に遅延による時間のずれが発生する。 As already mentioned, the width of the time shift cannot be determined by the received audio signal and auxiliary information. Therefore, there is no guarantee that the receiving apparatus can reproduce and associate a correctly synchronized multi-channel signal. As a further example of such a delay, the case where an existing two-channel transmission system such as a digital radio receiver is improved to multi-channel transmission can be considered. In this case, when the downmix signal is decoded by the two-channel audio decoder in the conventional receiving apparatus, it often happens that the delay time cannot be predicted and cannot be corrected. In extreme cases, the downmix audio signal may even be transmitted to the multi-channel playback audio decoder via a transmission system having an analog portion. That is, digital / analog conversion is performed at a certain point, and then analog / digital conversion is performed again through storage processing / transmission processing. In wireless communication, this always occurs. Moreover, it is impossible to predict in advance how to properly correct the delay of the downmix signal with respect to the multi-channel auxiliary information. Further, if the sample frequency for A / D conversion and the sample frequency for D / A conversion are slightly different from each other, a time lag due to delay inevitably occurs according to the ratio between the two sample rates.
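
The sample-rate effect described above can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `drift_in_samples` and the example clock frequencies are assumptions, not values from the patent. It shows how a small mismatch between the D/A and A/D clocks accumulates into a growing offset between the re-digitized downmix and the auxiliary information.

```python
def drift_in_samples(n_samples, fs_da, fs_ad):
    """Extra (or missing) samples after re-digitizing n_samples of audio.

    The D/A converter plays n_samples at fs_da, which lasts n_samples / fs_da
    seconds. The A/D converter samples that analog signal at fs_ad, yielding
    n_samples * fs_ad / fs_da samples. The difference to n_samples is the
    accumulated drift relative to the block-aligned auxiliary information.
    """
    return n_samples * (fs_ad / fs_da - 1.0)


# Hypothetical example: one hour of audio at a nominal 48 kHz,
# with an A/D clock that runs 50 ppm fast (48002.4 Hz).
one_hour = 48000 * 3600
drift = drift_in_samples(one_hour, fs_da=48000.0, fs_ad=48002.4)
print(f"drift after one hour: {drift:.1f} samples ({drift / 48000.0:.3f} s)")
```

Even a mismatch this small yields a drift that cannot be corrected once at start-up, which is why the offset may need to be tracked continuously.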

補助データを基本データに同期させるために使われる技術として、「時刻同期方法」として知られる様々な技術がある。これらの技術は、時間スタンプを両方のデータストリームに挿入し、その時間スタンプに基づいて、受信装置において正しくデータを対応させることを基本とする。しかしながら、時間スタンプを挿入するということは、従来のステレオシステムを変更することを意味する。 There are various techniques known as “time synchronization methods” as techniques used to synchronize auxiliary data with basic data. These techniques are based on inserting time stamps into both data streams and correctly matching data at the receiving device based on the time stamps. However, inserting a time stamp means changing a conventional stereo system.

本発明の目的は、基本チャネルデータおよびマルチチャネル補助情報の同期を可能にする、データストリームおよび／またはマルチチャネル表現の生成概念を提供することである。 It is an object of the present invention to provide a data stream and / or multi-channel representation generation concept that allows synchronization of basic channel data and multi-channel auxiliary information.

この目的は、請求項１に記載のデータストリーム生成装置、請求項１７に記載のマルチチャネル表現生成装置、請求項２６に記載のデータストリーム生成方法、請求項２７に記載のマルチチャネル表現生成方法、請求項２８に記載のコンピュータプログラム、または請求項２９に記載のデータストリーム表現により達成される。 This object is achieved by a data stream generating device according to claim 1, a multi-channel representation generating device according to claim 17, a data stream generating method according to claim 26, a multi-channel representation generating method according to claim 27, a computer program according to claim 28, or a data stream representation according to claim 29.

本発明は、マルチチャネルデータストリームを「送信側」で修正することにより、基本チャネルデータストリームおよびマルチチャネル補助情報データストリームを別々に送信し、時刻同期して結合することができるとする知見に基づく。この時、１以上の基本チャネルに時間経過を付与するフィンガープリント情報を、マルチチャネル補助情報を含むデータストリームに挿入する。それにより、マルチチャネル補助情報とフィンガープリント情報の対応関係をデータストリームから生成できる。したがって、導出されたマルチチャネル補助情報は導出された基本チャネルデータに対応する。データストリームを別々に送信する際にも保障しなければならないのは、まさにこの対応関係である。 The present invention is based on the finding that, by modifying the multi-channel data stream on the "transmitting side", the basic channel data stream and the multi-channel auxiliary information data stream can be transmitted separately and then combined in a time-synchronized manner. To this end, fingerprint information that characterizes the time course of the one or more basic channels is inserted into the data stream containing the multi-channel auxiliary information, so that the correspondence between the multi-channel auxiliary information and the fingerprint information can be derived from the data stream. Specific multi-channel auxiliary information thus belongs to specific basic channel data. It is precisely this correspondence that must be preserved when the data streams are transmitted separately.

本発明によれば、マルチチャネル補助情報と基本チャネルデータの対応関係は、フィンガープリント情報を基本チャネルデータから決定することによって送信装置側で信号化される。この時、それぞれの基本チャネルデータに対応するマルチチャネル補助情報はマークされる。このマルチチャネル補助情報およびフィンガープリント情報の対応関係のマーキングおよび／または信号化は、ブロックに基づくデータ処理、つまり、それぞれの基本チャネルデータブロックに対応するマルチチャネル補助情報ブロック、そのマルチチャネル補助情報に対応する基本チャネルデータブロックのフィンガープリントを関連付けることで達成される。 According to the present invention, the correspondence between the multi-channel auxiliary information and the basic channel data is signaled on the transmission device side by determining the fingerprint information from the basic channel data. At this time, multi-channel auxiliary information corresponding to each basic channel data is marked. The marking and / or signaling of the correspondence between the multi-channel auxiliary information and the fingerprint information is performed on the basis of block-based data processing, that is, the multi-channel auxiliary information block corresponding to each basic channel data block and the multi-channel auxiliary information. This is accomplished by associating the corresponding basic channel data block fingerprints.

つまり、再生の際に、マルチチャネル補助情報と一緒に処理されるべき基本チャネルデータブロックのフィンガープリントがマルチチャネル補助情報と関連付けられる。ブロックに基づく送信処理では、各マルチチャネル補助情報ブロックが、対応する基本データのブロックフィンガープリントを含むように、マルチチャネル補助情報データストリームのブロック構造の中に基本チャネルデータブロックのブロックフィンガープリントを挿入してもよい。マルチチャネル再生の際に、ブロックフィンガープリントを同期化の目的で読み出すことができるように、ブロックフィンガープリントを先行のマルチチャネル補助情報の後に直接書き込んでも良いし、既に存在するブロックの前に書き込んでも良いし、ブロック内であればいつの時点で書き込んでも良い。データストリームには、適宜挿入されるブロックフィンガープリントと併せて、通常のマルチチャネル補助データも存在する。 That is, at the time of reproduction, the fingerprint of the basic channel data block to be processed together with the multi-channel auxiliary information is associated with the multi-channel auxiliary information. In the block-based transmission process, the block fingerprint of the basic channel data block is inserted into the block structure of the multi-channel auxiliary information data stream so that each multi-channel auxiliary information block includes a corresponding basic data block fingerprint. May be. In order to be able to read out the block fingerprint for synchronization purposes during multi-channel playback, the block fingerprint may be written directly after the preceding multi-channel auxiliary information, or may be written before an already existing block. You can write at any point in the block. In the data stream, there is also normal multi-channel auxiliary data together with a block fingerprint inserted as appropriate.
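
One way to picture the block-wise layout described above is as a list of records, each holding the fingerprint of a basic channel block next to the auxiliary information for that same block. The following is a minimal sketch under stated assumptions: the names `SideInfoBlock` and `build_side_info_stream` and the `block_index` field are hypothetical, and the patent does not prescribe any particular field order or encoding.

```python
from dataclasses import dataclass


@dataclass
class SideInfoBlock:
    """One block of the auxiliary data stream (hypothetical layout)."""
    block_index: int      # illustrative only; not required by the layout above
    fingerprint: float    # block fingerprint of the matching basic channel block
    side_info: bytes      # opaque multi-channel auxiliary information payload


def build_side_info_stream(fingerprints, side_info_payloads):
    """Pair each block fingerprint with the auxiliary payload of the same block."""
    if len(fingerprints) != len(side_info_payloads):
        raise ValueError("need exactly one fingerprint per auxiliary block")
    return [
        SideInfoBlock(i, fp, payload)
        for i, (fp, payload) in enumerate(zip(fingerprints, side_info_payloads))
    ]


stream = build_side_info_stream([2.0, 10.0], [b"params-0", b"params-1"])
print(stream[1])
```

Because every record carries its own fingerprint, a receiver can read the fingerprints for synchronization without parsing the auxiliary payload itself.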

別な選択肢として、データストリームを、例えばブロックカウンターのような補助情報を与えられたブロックフィンガープリントの全てが、本発明によって生成されたデータストリームの最初に位置するような形式で生成してもよい。それにより、データストリームの第一の部分はブロックフィンガープリントのみを含み、第二の部分は、ブロック処理で書き込まれた、ブロックフィンガープリント情報に対応するマルチチャネル補助情報を含む。この方法には、参照情報が必要であるという欠点があるが、しかしながら、ブロック処理によって書き込まれたブロックフィンガープリントとマルチチャネル補助情報の対応関係はその順番から暗黙的であり、更なる情報は必要ない。 As another option, the data stream may be generated such that all block fingerprints, provided with auxiliary information such as a block counter, are located at the beginning of the data stream generated according to the invention. The first part of the data stream then contains only the block fingerprints, and the second part contains the multi-channel auxiliary information, written block by block, that corresponds to the block fingerprint information. This approach has the disadvantage that reference information is required. However, the correspondence between the block fingerprints and the block-wise multi-channel auxiliary information is implicit in their order, so no further information is needed.

この場合、マルチチャネル再生において、同期化の目的で多数のブロックフィンガープリントを予め読み込み、参照フィンガープリント情報を生成してもよい。そして、相関処理に必要な最低限の数のテストフィンガープリントが得られるまで、テストフィンガープリントを段階的に生成する。その間に、マルチチャネル再生における相関処理が差分を用いて行われる場合は、参照フィンガープリントを例えば差分符号化により処理してもよい。この時、データストリームには差分ブロックフィンガープリントではなく、絶対ブロックフィンガープリントが含まれる。 In this case, in multi-channel playback, a number of block fingerprints may be read in advance for the purpose of synchronization to generate reference fingerprint information. Then, test fingerprints are generated step by step until the minimum number of test fingerprints necessary for the correlation processing is obtained. In the meantime, when the correlation processing in multi-channel reproduction is performed using a difference, the reference fingerprint may be processed by differential encoding, for example. At this time, the data stream includes an absolute block fingerprint, not a differential block fingerprint.

一般的に、基本チャネルデータを含むデータストリームは受信装置側で処理される。すなわち、基本チャネルデータを含むデータストリームはまず復号化され、それから例えばマルチチャネル再生装置へ送信される。好ましくは、このマルチチャネル再生装置は、補助情報を受信しなかった場合には、単にスルースイッチだけを行い、好ましくは２つの基本チャネルをステレオ信号として出力するように構成される。同様に、マルチチャネル補助データに対する基本チャネルデータのオフセットを計算するための相関処理を行うために、復号化した基本チャネルデータから参照フィンガープリント情報を抽出し、テストフィンガープリント情報を計算する。実施例によっては、さらに相関計測して、そのオフセットが本当に正しいかどうか検証してもよい。この場合、２回目の相関処理により得られたオフセットと、１回目の相関処理により得られたオフセットとの差は、所定の閾値以下である。 In general, a data stream including basic channel data is processed on the receiving device side. That is, a data stream including basic channel data is first decoded and then transmitted to, for example, a multi-channel playback device. Preferably, the multi-channel playback device is configured to simply perform a through switch and preferably output two basic channels as stereo signals when no auxiliary information is received. Similarly, in order to perform correlation processing for calculating an offset of basic channel data with respect to multi-channel auxiliary data, reference fingerprint information is extracted from the decoded basic channel data, and test fingerprint information is calculated. In some embodiments, further correlation measurements may be performed to verify that the offset is really correct. In this case, the difference between the offset obtained by the second correlation process and the offset obtained by the first correlation process is equal to or less than a predetermined threshold.

この場合、得られたオフセットは正しいと考えられる。したがって、同期されたマルチチャネル補助情報を受信した後、ステレオ出力からマルチチャネル出力へ変換される。 In this case, the obtained offset is considered correct. Therefore, after receiving the synchronized multi-channel auxiliary information, the stereo output is converted to the multi-channel output.
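
The verification step and the switch from stereo to multi-channel output can be summarized as a small decision rule. This is a minimal sketch, assuming offsets measured in blocks; the function name `choose_output_mode` and its parameters are hypothetical and only illustrate the logic described in the text.

```python
def choose_output_mode(first_offset, second_offset, threshold, side_info_available):
    """Decide whether to output plain stereo or synchronized multi-channel audio.

    first_offset / second_offset: offsets from two successive correlation runs,
    or None if a run has not finished yet.
    threshold: largest difference between the two offsets that is still accepted.
    """
    if not side_info_available:
        return "stereo"          # pass the basic channels through unchanged
    if first_offset is None or second_offset is None:
        return "stereo"          # synchronization still in progress
    if abs(second_offset - first_offset) <= threshold:
        return "multichannel"    # offset confirmed by the second correlation
    return "stereo"              # offsets disagree, keep the safe fallback


print(choose_output_mode(12, 12, threshold=1, side_info_available=True))
print(choose_output_mode(12, 40, threshold=1, side_info_available=True))
```

Falling back to stereo in every uncertain case reflects the preference stated in this description for stereo playback over multi-channel playback with unsynchronized auxiliary information.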

この処理は、ユーザに同期に要する時間に気づいて欲しくない場合に望ましい。この場合、基本チャネルデータは受信された瞬間に処理され、同期化が行われる際、つまりオフセットが計算される際に、当然ステレオデータのみが出力される。これは、その時点ではまだ同期されたマルチチャネル補助情報が検知されていないためである。 This process is desirable when the user does not want to be aware of the time required for synchronization. In this case, the basic channel data is processed at the moment it is received, and naturally only stereo data is output when synchronization is performed, that is, when the offset is calculated. This is because the synchronized multi-channel auxiliary information has not been detected yet.

オフセットの計算に必要な「最初の遅延」が問題とならない他の実施例では、基本チャネルデータの第１のブロックから順番に同期マルチチャネル補助情報を生成するのと平行して、ステレオデータを予め出力することなく同期処理全体を行い、再生処理してもよい。これにより、ユーザはブロックの最初から同期した５．１を体感できる。 In another embodiment where the “first delay” required for the offset calculation is not a problem, the stereo data is pre-processed in parallel with the generation of the synchronized multi-channel auxiliary information in order from the first block of the basic channel data. The entire synchronization processing may be performed without outputting and reproduction processing may be performed. Thereby, the user can experience 5.1 synchronized from the beginning of the block.

本発明の好ましい実施例では、理想的にオフセットを計算するために参照フィンガープリント情報としての参照フィンガープリントが約２００必要なため、同期に要する時間は通常５秒である。例えば一方向の送信信号の場合のように、この約５秒の遅延が問題にならない場合、オフセット計算に要した時間が経過してからではあるが、５．１再生は最初から行われる。例えば会話等の対話型アプリケーションでは、この遅延は望ましくなく、その場合は同期処理が終了した後、随時ステレオ再生をマルチチャネル再生へと切り替える。同期されないマルチチャネル補助情報に基づいてマルチチャネル再生を行うより、ステレオ再生のみを行うほうが良いことが分かっている。 In the preferred embodiment of the present invention, approximately 200 reference fingerprints are required as reference fingerprint information in order to ideally calculate the offset, so the time required for synchronization is typically 5 seconds. For example, when the delay of about 5 seconds is not a problem as in the case of a unidirectional transmission signal, 5.1 reproduction is performed from the beginning although the time required for the offset calculation has elapsed. For example, in an interactive application such as conversation, this delay is not desirable. In this case, after the synchronization processing is completed, the stereo reproduction is switched to multi-channel reproduction at any time. It has been found that it is better to perform only stereo playback than to perform multi-channel playback based on unsynchronized multi-channel auxiliary information.

本発明によれば、基本チャネルデータとマルチチャネル補助情報を時間的に関連付ける際に発生する問題は、送信装置、受信装置双方を改良することによって解決できる。 According to the present invention, the problem that occurs when temporally associating basic channel data and multi-channel auxiliary information can be solved by improving both the transmitting device and the receiving device.

送信装置においては、時間可変的で適切なフィンガープリント情報を、対応するモノあるいはステレオのダウンミックス音声信号から計算する。好ましくは、このフィンガープリント情報は、送信されたマルチチャネル補助情報データストリームにおいて同期化補助として定期的に挿入される。この処理は、好ましくは、例えばブロック処理された空間音声符号化補助情報の中間におけるデータフィールドとして行われる。もしくは、フィンガープリント信号は、容易に追加したり削除できるようにデータブロックにおける最初あるいは最後の情報として送信される。 In the transmission device, time-varying and appropriate fingerprint information is calculated from the corresponding mono or stereo downmix audio signal. Preferably, this fingerprint information is periodically inserted as a synchronization aid in the transmitted multi-channel auxiliary information data stream. This process is preferably performed, for example, as a data field in the middle of the block-processed spatial audio coding auxiliary information. Alternatively, the fingerprint signal is transmitted as the first or last information in the data block so that it can be easily added or deleted.

受信装置側においては、時間可変的で適切なフィンガープリント情報を、対応するステレオ音声信号、すなわち基本チャネルデータから計算する。この基本チャネルデータは、本発明によれば、好ましくは２つの基本チャネルの複数対からなる。さらに、フィンガープリントをマルチチャネル補助情報から抽出する。その後、マルチチャネル補助情報および受信した音声信号との間のタイムオフセットを、例えばテストフィンガープリント情報および参照フィンガープリント情報の相互相関を計算するような相関処理方法により計算する。また、試行錯誤法により、様々なブロックラスタに基づいて基本チャネルデータから計算した様々な種類のフィンガープリント情報を参照フィンガープリント情報と比較し、対応するテストフィンガープリント情報が参照フィンガープリント情報と最も良く適合するブロックラスタに基づいてタイムオフセットを決定してもよい。 On the receiving device side, time-variable and appropriate fingerprint information is calculated from the corresponding stereo audio signal, that is, basic channel data. According to the invention, this basic channel data preferably consists of a plurality of pairs of two basic channels. Further, the fingerprint is extracted from the multi-channel auxiliary information. Thereafter, the time offset between the multi-channel auxiliary information and the received speech signal is calculated by a correlation processing method such as calculating the cross-correlation of the test fingerprint information and the reference fingerprint information. Also, by trial and error method, various types of fingerprint information calculated from basic channel data based on various block rasters are compared with reference fingerprint information, and the corresponding test fingerprint information is best compared with the reference fingerprint information. A time offset may be determined based on a matching block raster.
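
The correlation step can be sketched as a search for the lag at which the test fingerprints best match the reference fingerprints. The code below is an illustration only: it uses a normalized cross-correlation over the overlapping part of the two sequences, the function names and the `min_overlap` parameter are assumptions, and the result is a lag in whole blocks rather than a sample-accurate offset.

```python
import math


def _normalized_correlation(a, b):
    """Pearson-style correlation of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0 else 0.0


def estimate_block_offset(reference, test, max_lag, min_overlap=4):
    """Return (lag, score) such that test[i] best matches reference[i + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair test[i] with reference[i + lag] wherever both indices are valid.
        start = max(0, -lag)
        stop = min(len(test), len(reference) - lag)
        if stop - start < min_overlap:
            continue
        score = _normalized_correlation(
            reference[start + lag:stop + lag], test[start:stop]
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


reference = [1, 5, 2, 8, 3, 9, 4, 7, 6, 0, 2, 5]   # extracted from the data stream
test = reference[3:11]                              # computed from delayed audio
print(estimate_block_offset(reference, test, max_lag=4))
```

A lag in blocks is only a coarse result; the trial-and-error comparison over different block rasters mentioned above would be one way to refine it below block resolution.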

最後に、マルチチャネル補助情報を伴う基本チャネルからなる音声信号は、ダウンストリーム遅延補正ステージにより、後のマルチチャネル再生のために同期される。実施例によっては、最初の遅延のみ補正してもよい。しかしながら、好ましくは、オフセット計算は、オフセットを必要に応じて再調整できるように、再生と平行して行われる。また、最初の遅延を補正したにも関わらず、送信した基本チャネルデータおよびマルチチャネル補助情報の間に時間的ずれがある場合には、オフセットの計算は相関処理の結果に基づいて行われる。この遅延補正ステージは、能動的に制御してもよい。 Finally, the audio signal consisting of the basic channel with multi-channel auxiliary information is synchronized for later multi-channel playback by the downstream delay correction stage. In some embodiments, only the initial delay may be corrected. Preferably, however, the offset calculation is performed in parallel with playback so that the offset can be readjusted as needed. If there is a time lag between the transmitted basic channel data and the multi-channel auxiliary information even though the initial delay is corrected, the offset is calculated based on the correlation processing result. This delay correction stage may be actively controlled.
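
Once an offset is known, the delay correction amounts to pairing each basic channel block with the auxiliary block shifted by that offset. The following is a minimal sketch assuming a block-granular offset; the name `align_side_info` is hypothetical, and a real delay stage would typically also buffer audio and handle sub-block delays.

```python
def align_side_info(base_blocks, side_info_blocks, block_offset):
    """Pair base block i with auxiliary block i + block_offset.

    Base blocks without a matching auxiliary block are left out; a player
    could output those as plain stereo instead.
    """
    pairs = []
    for i, base in enumerate(base_blocks):
        j = i + block_offset
        if 0 <= j < len(side_info_blocks):
            pairs.append((base, side_info_blocks[j]))
    return pairs


print(align_side_info(["b0", "b1", "b2"], ["s0", "s1", "s2", "s3"], block_offset=1))
```

If the offset is re-estimated during playback, the same pairing can simply be recomputed with the updated value.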

本発明は、基本チャネルデータおよび／または基本チャネルデータの処理経路において、一切変更を必要としない点で効果的である。受信装置に送信された基本チャネルデータストリームは、従来の基本チャネルデータストリームと一切変わらない。変更されるのはマルチチャネルデータストリームのみである。フィンガープリント情報が挿入されるという点が改良点であるが、現時点では、マルチチャネルデータストリームに関しては標準化された方式がないため、マルチチャネル補助データストリームに変更を加えても、基本チャネルデータストリームを改良した場合には発生すると思われる、既に標準として実施され確立された方式に反するという不利益は発生しない。 The present invention is effective in that no change is required in the basic channel data and / or the processing path of the basic channel data. The basic channel data stream transmitted to the receiving apparatus is not different from the conventional basic channel data stream. Only the multi-channel data stream is changed. The improvement is that fingerprint information is inserted, but at present there is no standardized method for multi-channel data streams, so even if changes are made to the multi-channel auxiliary data stream, the basic channel data stream is not changed. There will be no penalty for violating the already established and established standard that would occur if improved.

本発明の概念によれば、マルチチャネル補助情報をかなり柔軟に配信することができる。特に、マルチチャネル補助情報がかなり少ないデータ量または／および記憶容量しか必要としない軽量なパラメータ情報である場合、デジタル受信装置は、そのデータをステレオ信号と完全に分離して受信してもよい。例えば、ユーザはステレオ録音のためのマルチチャネル補助情報を、既に手持ちのソリッドステートプレーヤあるいは別の供給者のＣＤから獲得し、それらをユーザの再生装置に記録することもできる。このような記録処理においては、特にパラメトリックマルチチャネル補助情報の記録に必要な記録条件はそんなに大きくないため、問題は一切発生しない。ユーザがＣＤを挿入もしくはステレオ機器を選択すると、対応するマルチチャネル補助データストリームをマルチチャネル補助データメモリからフェッチし、マルチチャネル補助データストリームのフィンガープリント情報に基づいてステレオ信号と同期して、マルチチャネル再生を実施する。本発明による解決法によれば、全く異なる送信元から送信される場合もあり得るマルチチャネル補助データを、ステレオ信号の種類に関わらずステレオ信号と同期できる。すなわち、ステレオ信号はデジタル無線受信装置から受信しようとも、ＣＤから受信しようとも、ＤＶＤから受信しようとも構わない。また、例えばインターネットを介して受信しようとも構わず、この場合、ステレオ信号は基本チャネルデータとなり、マルチチャネル再生はそれに基づいて行われる。 According to the inventive concept, multi-channel auxiliary information can be distributed with considerable flexibility. In particular, if the multi-channel auxiliary information is light-weight parameter information that requires a relatively small amount of data or / and storage capacity, the digital receiver may receive the data completely separated from the stereo signal. For example, a user can obtain multi-channel auxiliary information for stereo recording from an existing solid state player or another supplier's CD and record it on the user's playback device. In such a recording process, the recording conditions necessary for recording the parametric multi-channel auxiliary information are not so large, and no problem occurs. When the user inserts a CD or selects a stereo device, the corresponding multi-channel auxiliary data stream is fetched from the multi-channel auxiliary data memory and synchronized with the stereo signal based on the fingerprint information of the multi-channel auxiliary data stream. Perform playback. According to the solution according to the invention, multi-channel auxiliary data, which may be transmitted from completely different sources, can be synchronized with the stereo signal regardless of the type of stereo signal. 
That is, the stereo signal may be received from the digital wireless receiver, received from the CD, or received from the DVD. Also, for example, it may be received via the Internet. In this case, the stereo signal becomes basic channel data, and multi-channel reproduction is performed based on the data.

発明の好ましい実施例について、添付の図面を参照しながら詳細に説明する。 Preferred embodiments of the invention will be described in detail with reference to the accompanying drawings.

図１は、元のマルチチャネル信号をマルチチャネル再生するためのデータストリームを生成する装置を示す。この場合、本発明の好ましい実施例によれば、マルチチャネル信号は少なくとも２つのチャネルからなる。データストリーム生成装置はフィンガープリント生成装置２を含み、元のマルチチャネル信号から生成された１以上の基本チャネルを入力ライン３を通じて、フィンガープリント生成装置２に送信してもよい。基本チャネルの数は、１以上且つ元のマルチチャネル信号のチャネル数より少ない。元のマルチチャネル信号が、２つのチャネルからなる１つのステレオ信号であれば、２つのステレオチャネルからなる１つの基本チャネルのみ生成されることになる。しかしながら、元のマルチチャネル信号が３つ以上のチャネルからなる信号であれば、基本チャネルの数は２となる。従来のステレオ再生と同様に、マルチチャネル補助データなしで音声を再生できるため、このような実施形態が好ましい。本発明の好ましい実施例では、元のマルチチャネル信号は５つのチャネルと、１つのＬＦＥ（ＬｏｗＦｒｅｑｕｅｎｃｙＥｎｈａｎｃｅｍｅｎｔ＝低音増強）チャネルからなるサラウンド信号である。ＬＦＥチャネルはサブウーファともよばれる。５つのチャネルは、左サラウンドチャネルＬｓ、左チャネルＬ、中央チャネルＣ、右チャネルＲ、後方右および／または右サラウンドチャネルＲｓからなる。２つの基本チャネルは左基本チャネルおよび右基本チャネルからなる。当業者は、１つおよび／または複数の基本チャネルをダウンミックスチャネルと呼ぶこともある。 FIG. 1 shows an apparatus for generating a data stream for multi-channel reproduction of an original multi-channel signal. In this case, according to a preferred embodiment of the invention, the multi-channel signal consists of at least two channels. The data stream generation device may include a fingerprint generation device 2, and may transmit one or more basic channels generated from the original multi-channel signal to the fingerprint generation device 2 through the input line 3. The number of basic channels is one or more and less than the number of channels of the original multi-channel signal. If the original multi-channel signal is one stereo signal composed of two channels, only one basic channel composed of two stereo channels is generated. However, if the original multi-channel signal is a signal composed of three or more channels, the number of basic channels is two. Such an embodiment is preferred because audio can be played back without multi-channel auxiliary data as in conventional stereo playback. In the preferred embodiment of the present invention, the original multi-channel signal is a surround signal consisting of five channels and one LFE (Low Frequency Enhancement) channel. The LFE channel is also called a subwoofer. 
The five channels are a left surround channel Ls, a left channel L, a center channel C, a right channel R, and a rear right (right surround) channel Rs. The two basic channels are a left basic channel and a right basic channel. Those skilled in the art also refer to the one or more basic channels as downmix channels.

フィンガープリント生成装置２は１以上の基本チャネルからフィンガープリント情報を生成するための装置である。フィンガープリント情報は１以上の基本チャネルに時間経過を付与する。実施例によって、フィンガープリント情報の計算に要する作業量は変わる。例えば、「音声ＩＤ」で知られる統計的方法に基づいてフィンガープリントを計算する際には、大きな作業量を要する。しかしながら、これ以外のどんな数値で１以上の基本チャネルに時間経過を与えても構わない。 The fingerprint generation device 2 generates fingerprint information from the one or more basic channels. The fingerprint information characterizes the time course of the one or more basic channels. The effort required to calculate the fingerprint information varies with the embodiment. For example, calculating fingerprints with statistical methods known under the name "audio ID" requires considerable effort. However, any other quantity that reflects the time course of the one or more basic channels may be used instead.

本発明によれば、ブロックに基づく処理が望ましい。この場合、フィンガープリント情報は一連のブロックフィンガープリントからなり、各ブロックフィンガープリントは各ブロック内の１つおよび／または複数のチャネルのエネルギを示す値となる。別の方法としては、例えば所定のサンプルブロック１個もしくは複数のサンプルブロックの組合せをブロックフィンガープリントとして利用することもできる。この場合、フィンガープリント情報であるフィンガープリントブロックの数が十分に多ければ、粗いものであったとしても、１以上の基本チャネルの時間特性を再生できるからである。一般的に、フィンガープリント情報は１以上の基本チャネルのサンプルデータから生成され、多少のエラーを伴って１以上の基本チャネルに時間経過を付与する。これにより、後で述べるように、マルチチャネル補助情報のデータストリームおよび基本チャネルの間のオフセットを最終決定するために、デコーダ／受信装置側で基本チャネルからテストフィンガープリント情報との相関を計算できる。 According to the present invention, block-based processing is desirable. In this case, the fingerprint information consists of a series of block fingerprints, each block fingerprint being a value indicating the energy of one and / or multiple channels within each block. As another method, for example, a predetermined sample block or a combination of a plurality of sample blocks can be used as a block fingerprint. In this case, if the number of fingerprint blocks, which are fingerprint information, is sufficiently large, the time characteristics of one or more basic channels can be reproduced even if they are coarse. In general, fingerprint information is generated from sample data of one or more basic channels, and gives a time lapse to one or more basic channels with some errors. This allows the decoder / receiver side to calculate the correlation with the test fingerprint information from the basic channel in order to finally determine the offset between the multi-channel auxiliary information data stream and the basic channel, as will be described later.
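
The energy-based block fingerprint described above can be sketched in a few lines. This is a minimal illustration, assuming the basic channels are given as equally sampled lists of numbers and that the energies of all channels are summed per block; the function name `block_fingerprints` and the choice of summing channels are assumptions, since the text only requires a value that reflects the energy of the block.

```python
def block_fingerprints(channels, block_size):
    """Return one energy value per block, summed over all basic channels."""
    n_samples = min(len(ch) for ch in channels)
    n_blocks = n_samples // block_size       # a trailing partial block is ignored
    fingerprints = []
    for b in range(n_blocks):
        start = b * block_size
        energy = 0.0
        for ch in channels:
            energy += sum(x * x for x in ch[start:start + block_size])
        fingerprints.append(energy)
    return fingerprints


left = [1, 1, 2, 2]
right = [0, 0, 1, 1]
print(block_fingerprints([left, right], block_size=2))
```

Running the same function in the encoder and in the decoder keeps the reference and test fingerprints directly comparable.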

出力側では、フィンガープリント生成装置２はデータストリーム生成装置４に送信するフィンガープリント情報を生成する。データストリーム生成装置４はフィンガープリント情報からデータストリームと、通常、時間可変的なマルチチャネル補助情報を生成する。マルチチャネル補助情報と１以上の基本チャネルを組合わせることにより元のマルチチャネル信号をマルチチャネル再生できる。データストリーム生成装置は出力５においてデータストリームを生成し、マルチチャネル補助情報とフィンガープリント情報の対応関係をデータストリームから生成する。本発明によれば、１以上の基本チャネルから生成されたフィンガープリント情報によりマルチチャネル補助情報のデータストリームをマーキングし、マルチチャネル補助情報と基本チャネルデータの対応関係をフィンガープリント情報により決定する。この時、フィンガープリント情報とマルチチャネル補助情報とはデータストリーム生成装置４において対応付けられる。 On the output side, the fingerprint generation device 2 generates fingerprint information to be transmitted to the data stream generation device 4. The data stream generator 4 generates a data stream from the fingerprint information and usually multi-channel auxiliary information that is variable in time. By combining the multichannel auxiliary information and one or more basic channels, the original multichannel signal can be reproduced in multichannel. The data stream generator generates a data stream at output 5 and generates a correspondence between the multi-channel auxiliary information and the fingerprint information from the data stream. According to the present invention, the data stream of the multi-channel auxiliary information is marked by the fingerprint information generated from one or more basic channels, and the correspondence between the multi-channel auxiliary information and the basic channel data is determined by the fingerprint information. At this time, the fingerprint information and the multi-channel auxiliary information are associated in the data stream generation device 4.

図２は、本発明による、１以上の基本チャネルとデータストリームから元のマルチチャネル信号のマルチチャネル表現を生成する装置を示す。この時、データストリームは１以上の基本チャネルに時間経過を付与するフィンガープリント情報およびマルチチャネル補助情報を含み、１以上の基本チャネルと組合わせることにより、元のマルチチャネル信号をマルチチャネル再生できる。マルチチャネル補助情報およびフィンガープリント情報の対応関係はデータストリームから生成してもよい。受信装置および／またはデコーダにおいて、１以上の基本チャネルは入力１０を介してフィンガープリント生成装置１１に送信される。出力側では、フィンガープリント生成装置１１は出力１２を介してテストフィンガープリント情報を同期装置１３に送信する。好ましくは、図１に示すブロック２で実行されるものと全く同じアルゴリズムにより、１以上の基本チャネルからテストフィンガープリント情報を生成する。しかしながら、実施例によっては、このアルゴリズムは全く同一でなくてもよい。 FIG. 2 shows an apparatus for generating a multi-channel representation of an original multi-channel signal from one or more basic channels and a data stream according to the present invention. At this time, the data stream includes fingerprint information and multi-channel auxiliary information that gives time lapse to one or more basic channels, and by combining with one or more basic channels, the original multi-channel signal can be reproduced in multi-channel. The correspondence between the multi-channel auxiliary information and the fingerprint information may be generated from the data stream. In the receiving device and / or the decoder, one or more basic channels are transmitted to the fingerprint generating device 11 via the input 10. On the output side, the fingerprint generation device 11 transmits test fingerprint information to the synchronization device 13 via the output 12. Preferably, test fingerprint information is generated from one or more basic channels by exactly the same algorithm as executed in block 2 shown in FIG. However, in some embodiments, this algorithm may not be exactly the same.

例えば、フィンガープリント生成装置２は絶対符号化によりブロックフィンガープリントを生成し、デコーダのフィンガープリント生成装置１１は差分に基づきフィンガープリントを決定してもよい。この時、ブロックに対応するテストブロックフィンガープリントは２つの絶対フィンガープリントの差分となる。この場合、すなわち、絶対ブロックフィンガープリントがフィンガープリント情報を含むデータストリームによって送信される場合、フィンガープリント抽出装置１４はデータストリームからフィンガープリント情報を抽出し、同時に差分を形成し、そのデータを参照フィンガープリント情報として出力１５を介して同期装置１３に送信する。このデータはテストフィンガープリント情報に相当する。 For example, the fingerprint generation device 2 may generate block fingerprints by absolute encoding, while the fingerprint generation device 11 of the decoder determines fingerprints on a differential basis. The test block fingerprint associated with a block is then the difference of two absolute fingerprints. In this case, i.e. when absolute block fingerprints are transmitted in the data stream containing the fingerprint information, the fingerprint extraction device 14 extracts the fingerprint information from the data stream, forms the differences at the same time, and transmits the result as reference fingerprint information via the output 15 to the synchronization device 13. This data thus corresponds in form to the test fingerprint information.
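
The conversion from absolute to differential block fingerprints described above can be sketched as follows. This is only an illustrative sketch: the function name `to_differential` and the sample values are assumptions made for this example and do not appear in the specification.

```python
def to_differential(absolute_fps):
    """Return the block-to-block differences of a sequence of absolute fingerprints.

    Element k of the result is absolute_fps[k + 1] - absolute_fps[k], so the
    result is one element shorter than the input.
    """
    return [cur - prev for prev, cur in zip(absolute_fps, absolute_fps[1:])]


# Hypothetical absolute block fingerprints (e.g. energy values), one per block.
reference_abs = [60, 62, 58, 58, 65]            # as extracted from the data stream
reference_diff = to_differential(reference_abs)  # differences formed by the extractor
```

If both the extraction side and the decoder-side generator apply the same differencing, the two sequences handed to the synchronization device are directly comparable.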

一般的に、デコーダにおけるテストフィンガープリント情報の計算アルゴリズムと、エンコーダにおけるフィンガープリント情報の計算アルゴリズムは少なくとも、同期装置１３において、これら２種類のフィンガープリント情報を使って、入力１６を介して受信するデータストリームに含まれるマルチチャネル補助データと１以上の基本チャネルに含まれるデータを同期できる程度には、類似していることが望ましい。この時、エンコーダにおけるフィンガープリント情報は、図２に示すように、参照フィンガープリント情報とも呼ばれる。同期装置の出力におけるマルチチャネル表現として、基本チャネルデータおよび同期するマルチチャネル補助データを含む、同期マルチチャネル表現が生成される。 In general, the algorithm for calculating the test fingerprint information in the decoder and the algorithm for calculating the fingerprint information in the encoder should be at least similar enough that the synchronization device 13 can use the two kinds of fingerprint information to synchronize the multi-channel auxiliary data contained in the data stream received via the input 16 with the data of the one or more basic channels. The fingerprint information from the encoder is also referred to as reference fingerprint information, as shown in FIG. 2. At the output of the synchronization device, a synchronized multi-channel representation is obtained that comprises the basic channel data and the multi-channel auxiliary data synchronized with it.

この観点から、好ましくは、同期装置１３は基本チャネルデータおよびマルチチャネル補助データ間のタイムオフセットを決定し、決定したタイムオフセットに基づいてマルチチャネル補助データを遅延させる。通常、マルチチャネル補助データの方が到達するのが早い、すなわち早すぎることが明らかになっている。これは、通常、基本チャネルデータのデータ量に比べてマルチチャネル補助情報に対応するデータ量がかなり少ないことに起因するであろう。したがって、マルチチャネル補助データが遅延すれば、１以上の基本チャネルに含まれるデータは基本チャネルデータライン１７を介して入力１０から同期装置１３へ送信され、文字通り同期装置１３をただ「通過」し、出力１８から再び出力される。入力１６から受信したマルチチャネル補助データはマルチチャネル補助データライン１９を介して同期装置へ送信され、そこで、決定されたタイムオフセットに基づき遅延され、基本チャネルデータと共に同期装置の出力２０からマルチチャネル再生装置２１に送信される。再生装置は、例えば５つの音声チャネルおよび１つのウーファチャネル（図２には示さない）を生成するために、出力側で音声再生処理を行う。 From this point of view, the synchronizer 13 preferably determines a time offset between the basic channel data and the multi-channel auxiliary data, and delays the multi-channel auxiliary data based on the determined time offset. It has been found that multi-channel auxiliary data usually arrives faster, ie too early. This may be due to the fact that the amount of data corresponding to multi-channel auxiliary information is usually much smaller than the amount of basic channel data. Thus, if the multi-channel auxiliary data is delayed, the data contained in one or more basic channels is transmitted from the input 10 to the synchronizer 13 via the basic channel data line 17 and literally just "passes" through the synchronizer 13; The output 18 is output again. Multi-channel auxiliary data received from input 16 is transmitted to the synchronizer via multi-channel auxiliary data line 19 where it is delayed based on the determined time offset and multi-channel playback from the synchronizer output 20 along with the basic channel data. It is transmitted to the device 21. The playback device performs a sound playback process on the output side in order to generate, for example, five sound channels and one woofer channel (not shown in FIG. 2).

ライン１８および２０におけるデータは同期したマルチチャネル表現を構成し、ライン２０上のデータストリームは、フィンガープリント情報がデータストリームから分離されている点を除き、恐らく行われるマルチチャネル補助データの符号化から離れて、入力１６におけるデータストリームに対応している。実施例によっては、フィンガープリント情報をデータストリームから分離する処理は、同期装置１３、もしくはそれ以前の段階で行われる。もしくは、フィンガープリントを分離する処理は、予めフィンガープリント抽出装置１４で行ってもよい。この場合、ライン１９は存在せず、ライン１９’が直接フィンガープリント抽出装置９から同期装置１３に接続される。この場合、マルチチャネル補助データおよび参照フィンガープリント情報の両方が、フィンガープリント抽出装置により同期装置１３へ並列に送信される。 The data on lines 18 and 20 constitutes the synchronized multi-channel representation. Apart from any encoding that may have been applied to the multi-channel auxiliary data, the data stream on line 20 corresponds to the data stream at the input 16, except that the fingerprint information has been separated from it. Depending on the embodiment, the fingerprint information is separated from the data stream in the synchronization device 13 or at an earlier stage. Alternatively, the separation may already be performed in the fingerprint extraction device 14. In this case, the line 19 does not exist, and a line 19' runs directly from the fingerprint extraction device 9 to the synchronization device 13, so that the fingerprint extraction device supplies both the multi-channel auxiliary data and the reference fingerprint information to the synchronization device 13 in parallel.

同期装置は、テストフィンガープリント情報および参照フィンガープリント情報に基づいて、また、マルチチャネル情報およびデータストリームから生成され、かつ、データストリームに含まれるフィンガープリント情報との相関に基づいて、マルチチャネル補助情報および１以上の基本チャネルを同期する。後で述べるように、好ましくは、マルチチャネル補助情報とフィンガープリント情報の時間的対応関係は、単純にフィンガープリント情報が、マルチチャネル補助情報の前に位置するか、後ろに位置するか、もしくは、中に位置するかによって決定される。フィンガープリントがマルチチャネル補助情報の前に位置するか、後ろに位置するか、中に位置するかによって、そのマルチチャネル補助情報が間違いなくそのフィンガープリント情報に対応するものかどうか、エンコーダで決定される。 The synchronization device synchronizes the multi-channel auxiliary information and the one or more basic channels using the test fingerprint information and the reference fingerprint information, together with the association, derivable from the data stream, between the multi-channel information and the fingerprint information contained in the data stream. As described later, the temporal correspondence between the multi-channel auxiliary information and the fingerprint information is preferably determined simply by whether the fingerprint information is located before, after, or within the multi-channel auxiliary information. Which of these positions is used, and thus which multi-channel auxiliary information belongs to which fingerprint information, is fixed at the encoder.

好ましくは、ブロックに基づく処理が行われる。好ましくは、フィンガープリントを挿入する際に、マルチチャネル補助データのブロックは必ずブロックフィンガープリントの後に続く。すなわち、マルチチャネル補助情報は、ブロックフィンガープリントと交互になっている。しかし、またこれとは別に、全てのフィンガープリント情報がデータストリームの最初の分離した部分に書かれ、その後にデータストリーム全体が続くようなデータストリームの形式が使われてもよい。この場合は、ブロックフィンガープリントと、マルチチャネル補助情報のブロックは交互にならない。フィンガープリントとマルチチャネル補助情報を関連付ける他の方法は、当業者には公知である。本発明によれば、マルチチャネル補助情報とフィンガープリント情報の関連付けは、フィンガープリント情報を使ってマルチチャネル補助情報および基本チャネルデータを同期できるように、デコーダにおいてデータストリームに基づいて行われればよい。 Preferably, block-based processing is performed. Preferably, when inserting a fingerprint, the block of multi-channel auxiliary data always follows the block fingerprint. That is, the multi-channel auxiliary information is alternated with the block fingerprint. However, alternatively, a data stream format may be used in which all fingerprint information is written in the first separate part of the data stream, followed by the entire data stream. In this case, the block fingerprint and the block of multi-channel auxiliary information are not alternated. Other methods for associating fingerprints with multi-channel auxiliary information are known to those skilled in the art. According to the present invention, the association between the multi-channel auxiliary information and the fingerprint information may be performed based on the data stream in the decoder so that the multi-channel auxiliary information and the basic channel data can be synchronized using the fingerprint information.

次に、図７ａ〜図７ｄを参照して、ブロック処理の好ましい実施例について述べる。図７ａは、一連のブロックＢ１〜Ｂ８からなる、例えば５．１信号等の元のマルチチャネル信号を示し、図７ａの例によれば各ブロックはマルチチャネル情報ＭＫｉを含む。５チャネル信号の場合を考えると、ブロックＢ１など各ブロックはそれぞれのチャネルに対応する、例えば１１５２個の第一の音声サンプルを含む。このブロックサイズは、例えば、図５に示すＢＣＣエンコーダ１１２において好ましい。この場合、連続する信号から一連のブロックを生成するためのブロック生成処理、すなわち切出し処理は、図５において「ブロック」として示す構成要素１１１によって実行される。 A preferred embodiment of the block processing will now be described with reference to FIGS. 7a-7d. FIG. 7a shows an original multi-channel signal, such as a 5.1 signal, consisting of a series of blocks B1-B8, and according to the example of FIG. 7a, each block contains multi-channel information MKi. Considering the case of a 5-channel signal, each block, such as block B1, includes, for example, 1152 first audio samples corresponding to the respective channel. This block size is preferable, for example, in the BCC encoder 112 shown in FIG. In this case, block generation processing for generating a series of blocks from continuous signals, that is, extraction processing, is executed by the component 111 shown as “block” in FIG. 5.

１以上の基本チャネルを、図５において参照符号１１５で示す「和信号」としてダウンミックスブロック１１４で出力する。基本チャネルデータは、再び、一連のブロックＢ１〜Ｂ８として示す。ここで、図７ｂに示すブロックＢ１〜Ｂ８は図７ａに示すブロックＢ１〜Ｂ８に対応する。しかし、時間領域表現に基づけば、この時点ではブロックは元の５．１信号は含まず、モノ信号もしくは２つのステレオ基本チャネルからなるステレオ信号のみ含む。従って、ブロックＢ１は、第１のステレオ基本チャネルおよび第２のステレオ基本チャネルの両方の、１１５２個の時間サンプルを含む。この左右両方のステレオ基本チャネルの１１５２個のサンプルは、サンプル加減および重み付けにより計算され、該当する場合には、例えば図５に示すダウンミックスブロック１１４における実施例により計算される。同様に、マルチチャネル情報を含むデータストリームはブロックＢ１〜Ｂ８を含む。図７ｃに示す各ブロックは図７ａに示す元のマルチチャネル信号のブロックおよび／または図７ｂに示す１つまたは複数の基本チャネルのブロックに対応する。例えば、元のマルチチャネル信号ＭＫ１のブロックＢ１を再生するためには、基本チャネルデータストリームのブロックＢ１に含まれる基本チャネルデータＢＫ１を、図７ｃに示すブロックＢ１に含まれるマルチチャネル情報Ｐ１と結合させなければならない。図６に示す実施例においては、この結合処理はＢＣＣ合成ブロックにおいて行われる。この場合、基本チャネルデータをブロック処理するために、入力においてブロック生成ステージを含む。 One or more basic channels are output by the downmix block 114 as the “sum signal” denoted by reference numeral 115 in FIG. 5. The basic channel data is again shown as a series of blocks B1 to B8, where the blocks B1 to B8 in FIG. 7b correspond to the blocks B1 to B8 in FIG. 7a. In the time-domain representation, however, a block now no longer contains the original 5.1 signal but only a mono signal or a stereo signal consisting of two stereo basic channels. Block B1 thus contains 1152 time samples each of the first and the second stereo basic channel. These 1152 samples of the left and right stereo basic channels are calculated by sample-wise addition or subtraction and, where applicable, weighting, for example as implemented in the downmix block 114 shown in FIG. 5. Similarly, the data stream with the multi-channel information contains blocks B1 to B8, and each block in FIG. 7c corresponds to the block of the original multi-channel signal in FIG. 7a and/or to the block of the one or more basic channels in FIG. 7b. For example, to reproduce the block B1 of the original multi-channel signal, denoted MK1, the basic channel data BK1 in block B1 of the basic channel data stream must be combined with the multi-channel information P1 in block B1 of FIG. 7c. In the embodiment shown in FIG. 6, this combination is performed in the BCC synthesis block, which for this purpose includes a block-forming stage at its input for block-wise processing of the basic channel data.

したがって、図７ｃに示すようにＰ３はマルチチャネル情報を表し、マルチチャネル情報と、基本チャネルに含まれるＢＫ３のブロックとを組み合わせることにより、元のマルチチャネル信号に含まれるブロック値ＭＫ３を再生することができる。 Accordingly, P3 in FIG. 7c denotes the multi-channel information that, combined with the block BK3 of the basic channels, allows the block values MK3 of the original multi-channel signal to be reproduced.

本発明によれば、図７ｃに示すデータストリームの各ブロックＢｉはブロックフィンガープリントを含む。すなわち、好ましくは、ブロックＢ３においてブロックフィンガープリントＦ３はマルチチャネル情報のブロックＰ３の後ろに書かれている。このブロックフィンガープリントは、この時点で、ブロック値ＢＫ３を含むブロックＢ３から生成される。もしくは、ブロックフィンガープリントＦ３は差分符号化により処理してもよい。この時、フィンガープリントＦ３は、基本チャネルにおけるブロックＢＫ３のブロックフィンガープリントと、基本チャネルにおけるブロック値ＢＫ２を含むブロックのブロックフィンガープリントとの差分である。本発明の好ましい実施例においては、エネルギ値および／または差分エネルギ値をブロックフィンガープリントとして利用する。 According to the present invention, each block Bi of the data stream shown in FIG. 7c includes a block fingerprint. That is, preferably, in block B3, the block fingerprint F3 is written after the block P3 of multi-channel information. This block fingerprint is now generated from block B3 containing block value BK3. Alternatively, the block fingerprint F3 may be processed by differential encoding. At this time, the fingerprint F3 is a difference between the block fingerprint of the block BK3 in the basic channel and the block fingerprint of the block including the block value BK2 in the basic channel. In the preferred embodiment of the present invention, energy values and / or differential energy values are utilized as block fingerprints.

初めに述べた方式では、図７ｂに示す１以上の基本チャネルを含むデータストリームを、図７ｃに示すマルチチャネル情報およびフィンガープリント情報を含むデータストリームから分離してマルチチャネル再生装置へ送信する。他の処理を何も行わなければ、例えば図５に示すＢＣＣ合成ブロック１２２のようなマルチチャネル再生装置において、次に処理されるべきブロックがＢＫ５という場合が考えられる。しかしながら、マルチチャネル情報における時間のずれから、ブロックＢ５の代わりにブロックＢ７が次に処理される、ということが起こり得る。そのままいくと、基本チャネルデータのブロックＢＫ５はマルチチャネル情報Ｐ７と共に再生され、アーチファクトとなる。本発明によれば、後で詳細に述べるように、２つのブロック間のオフセットを計算して図７ｃに示すデータストリームを２ブロック分遅延し、互いに同期した図７ｂに示すデータストリームと図７ｃに示すデータストリームからマルチチャネル表現を再生する。 In the scenario described at the outset, the data stream with the one or more basic channels shown in FIG. 7b is transmitted to the multi-channel playback device separately from the data stream of FIG. 7c, which contains the multi-channel information and the fingerprint information. Without further measures, it may happen that in a multi-channel playback device such as the BCC synthesis block 122 shown in FIG. 5 the basic channel block to be processed next is BK5 while, owing to a time shift of the multi-channel information, block B7 rather than block B5 is next in line on the side-information path. Left uncorrected, the basic channel block BK5 would be reproduced together with the multi-channel information P7, which leads to artifacts. According to the present invention, as described in detail later, an offset of two blocks is calculated and the data stream of FIG. 7c is delayed by two blocks, so that a multi-channel representation is reproduced from the mutually synchronized data streams of FIG. 7b and FIG. 7c.
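
The two-block example above can be illustrated with a small FIFO delay. This is a sketch under assumptions: the helper `delay_side_info`, the use of `None` for "nothing available yet", and the tick-by-tick list model of the two streams are all hypothetical and only serve to show the effect of delaying the side-information stream by the calculated offset.

```python
from collections import deque


def delay_side_info(side_blocks, offset_blocks):
    """Yield the side-information blocks delayed by offset_blocks block periods.

    During the first offset_blocks periods nothing is available yet, which is
    represented here by None.
    """
    fifo = deque([None] * offset_blocks)
    for block in side_blocks:
        fifo.append(block)
        yield fifo.popleft()


# Toy model: one list entry per block period. The side information P1..P8
# arrives two block periods earlier than the basic channel blocks BK1..BK6.
base = [None, None] + [f"BK{i}" for i in range(1, 7)]
side = [f"P{i}" for i in range(1, 9)]

unsynchronized = list(zip(base, side))                      # BK5 meets P7
synchronized = list(zip(base, delay_side_info(side, 2)))    # BK5 meets P5
```

In this toy model the uncorrected pairing reproduces the mismatch named in the text (BK5 with P7), while a delay of two blocks restores matching indices.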

実施例により、またフィンガープリント情報の構成／正確性により、本発明におけるオフセットの決定は、ブロックの倍数（整数）として計算することに限らず、ブロックの分数として正確なオフセットを決定してもよい。もしくは、計算された相関が十分に正確で、十分な数のブロックフィンガープリントがあれば、あるサンプルを導出してもよい。（当然、相関を計算するための時間を要する。）しかしながら、そんなに高い正確性は必ずしも必要としないことが明らかになっており、プラスマイナスブロック半分の誤差の同期精度（１１５２個のサンプルからなるブロック長）があれば、ユーザが欠陥データを感じないと思われる程度のマルチチャネル再生が達成される。 Depending on the embodiment and on the structure and accuracy of the fingerprint information, the offset determination according to the invention is not limited to integer multiples of a block. The offset may also be determined to a fraction of a block, or even down to a single sample, provided the correlation is computed accurately enough and a sufficient number of block fingerprints is available (which naturally increases the time needed for the correlation). It has been found, however, that such high accuracy is not necessarily required: a synchronization accuracy of plus or minus half a block (for a block length of 1152 samples) already yields a multi-channel reproduction in which the listener is unlikely to perceive artifacts.

図７ｄはブロックＢｉ、例えば図７ｃに示すデータストリームに含まれるブロックＢ３の好ましい実施例を示す。このブロックは、例えば１バイトの長さをもつ同期語で始まり、次には長さ情報が来る。なぜなら、当業者には明らかなように、このブロックは好ましくは計算処理の後、マルチチャネル情報Ｐ３をスケーリングし、量子化し、エントロピ符号化するためである。例えばパラメータ情報やサイドチャネルの波形信号などのマルチチャネル情報の長さを最初から知ることはできず、そのため、データストリームの中で信号化しなければならない。 FIG. 7d shows a preferred embodiment of a block Bi, for example the block B3 of the data stream shown in FIG. 7c. The block begins with a sync word, which may be one byte long, followed by length information. The length information is needed because, as will be apparent to those skilled in the art, the multi-channel information P3 is preferably scaled, quantized, and entropy coded after it has been calculated. The length of the multi-channel information, which may be parameter information or, for example, a waveform signal of a side channel, is therefore not known in advance and must be signaled in the data stream.

そこで、本発明においては、ブロックフィンガープリントをマルチチャネル情報Ｐ３の最後部に挿入する。図７ｄに示す実施例において、１バイト、つまり８ビットがブロックフィンガープリントに使われる。１ブロックあたり１つのエネルギ尺度のみ使われるため、量子化のみ行われて、エントロピ符号化は行われない実施例においては、８ビットの量子化出力長による量子化では、量子化装置が使われる。したがって、量子化エネルギ値が図７ｄに示す８ビットのフィールド、「ブロックＦＡ」に、更なる処理を経ずに入力される。図７ｄには示さないが同様に、次のデータストリームブロックのための同期化バイト、長さのバイト、そしてさらにＢＫ４に対応するマルチチャネル情報Ｐ４と続く。この場合、基本チャネルデータブロックＢＫ４に対応するマルチチャネル情報Ｐ４のブロックの後には、同様に、基本チャネルデータＢＫ４に基づくブロックフィンガープリントが続く。 According to the invention, the block fingerprint is inserted at the end of the multi-channel information P3. In the embodiment shown in FIG. 7d, one byte, i.e. 8 bits, is used for the block fingerprint. Because only a single energy measure is used per block, an embodiment that applies quantization but no entropy coding uses a quantizer with an 8-bit output word length. The quantized energy value is then entered without further processing into the 8-bit field labeled “Block FA” in FIG. 7d. Although not shown in FIG. 7d, the block is followed by the sync byte and the length byte of the next data stream block, and then by the multi-channel information P4 corresponding to BK4. The block of multi-channel information P4 corresponding to the basic channel data block BK4 is in turn followed by a block fingerprint based on the basic channel data BK4.
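
The block layout of FIG. 7d (sync byte, length byte, multi-channel information, one-byte fingerprint) can be sketched as a simple packer and parser. Several details below are assumptions made only for this illustration and are not taken from the specification: the sync value `0xA5`, the interpretation of the length byte as the number of payload bytes (excluding sync, length, and fingerprint), and the function names.

```python
SYNC = 0xA5  # hypothetical sync word value; the text only says it may be 1 byte long


def pack_block(side_info: bytes, fingerprint: int) -> bytes:
    """Build one data stream block: sync | length | side info | fingerprint."""
    if len(side_info) > 255:
        raise ValueError("side info does not fit a one-byte length field")
    if not 0 <= fingerprint <= 255:
        raise ValueError("fingerprint must fit into 8 bits")
    return bytes([SYNC, len(side_info)]) + side_info + bytes([fingerprint])


def unpack_blocks(stream: bytes):
    """Split a stream of consecutive blocks into (side_info, fingerprint) pairs."""
    blocks = []
    pos = 0
    while pos < len(stream):
        if stream[pos] != SYNC:
            raise ValueError(f"sync word expected at byte {pos}")
        length = stream[pos + 1]
        side_info = stream[pos + 2:pos + 2 + length]
        fingerprint = stream[pos + 2 + length]  # last byte of the block
        blocks.append((side_info, fingerprint))
        pos += 3 + length  # sync + length + payload + fingerprint
    return blocks


# Two consecutive blocks, e.g. P3 with fingerprint F3 and P4 with fingerprint F4.
stream = pack_block(b"\x01\x02\x03", 57) + pack_block(b"\x09", 60)
parsed = unpack_blocks(stream)
```

Because the fingerprint always occupies the last byte of a block, a decoder can strip it before handing the multi-channel information to the reproduction stage.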

図７ｄに示すように、絶対エネルギ尺度もしくは差分エネルギ尺度をエネルギ尺度として採用してもよい。この場合、基本チャネルデータＢＫ３のエネルギ尺度と基本チャネルデータＢＫ２の差分がデータストリームのブロックＢ３にブロックフィンガープリントエネルギ値として追加される。 As shown in FIG. 7d, an absolute energy scale or a differential energy scale may be employed as the energy scale. In this case, the difference between the energy measure of the basic channel data BK3 and the basic channel data BK2 is added to the block B3 of the data stream as a block fingerprint energy value.

図８は、図２に示す同期装置、フィンガープリント生成装置１１、およびフィンガープリント抽出装置９をマルチチャネル再生装置２１と共に更に詳細に示す。基本チャネルデータを基本チャネルデータバッファ２５へ送信し、中間でバッファリングする。同様に、補助情報および／または、補助情報およびフィンガープリント情報を含むデータストリームを補助情報バッファ２６に送信する。通常、両方のバッファはＦＩＦＯバッファの構造になっているが、バッファ２６は更にフィンガープリント情報を参照フィンガープリント抽出装置９により抽出し、さらにデータストリームから分離できる容量を備える。これにより、挿入されたフィンガープリントを含まずに、マルチチャネル補助情報のみをバッファ出力ライン２７を介して出力する。フィンガープリントをデータストリームから分離する処理は、タイムシフタ２８やその他の構成要素により行われてもよく、その場合、マルチチャネル再生時に、マルチチャネル再生装置２１はフィンガープリントバイトの影響を受けない。絶対フィンガープリントが参照用およびテスト用両方に使われた場合、フィンガープリント生成装置１１により計算されたフィンガープリント情報は、フィンガープリント抽出装置９により決定されたフィンガープリント情報同様に、図２に示す同期装置１３内の相関器２９に直接送信されてもよい。そして、相関器はオフセット値を計算し、その計算したオフセット値をオフセットライン３０を介してタイムシフタ２８へ送信する。有効なオフセット値が生成され、タイムシフタ２８へ送信されると、同期装置１３は更に、実行装置３１を制御する。これにより、実行装置３１はスイッチ３２を閉鎖し、バッファ２６からのマルチチャネル補助データのストリームは、タイムシフタ２８およびスイッチ３２を介してマルチチャネル再生装置２１へ送信される。 FIG. 8 shows the synchronization device of FIG. 2 in more detail, together with the fingerprint generation device 11, the fingerprint extraction device 9, and the multi-channel playback device 21. The basic channel data is fed to a basic channel data buffer 25 and buffered there. Likewise, the auxiliary information, i.e. the data stream containing the auxiliary information and the fingerprint information, is fed to an auxiliary information buffer 26. Both buffers are typically FIFO buffers, but the buffer 26 is additionally designed so that the reference fingerprint extraction device 9 can extract the fingerprint information and also remove it from the data stream. As a result, only the multi-channel auxiliary information, without the inserted fingerprints, is output on the buffer output line 27. The fingerprints may instead be removed from the data stream by the time shifter 28 or by another component, so that the multi-channel playback device 21 is not disturbed by fingerprint bytes during multi-channel reproduction. When absolute fingerprints are used for both the reference and the test side, the fingerprint information calculated by the fingerprint generation device 11 and the fingerprint information determined by the fingerprint extraction device 9 may be fed directly to a correlator 29 within the synchronization device 13 of FIG. 2. The correlator calculates an offset value and transmits it via an offset line 30 to the time shifter 28. Once a valid offset value has been generated and transmitted to the time shifter 28, the synchronization device 13 additionally drives an execution device 31. The execution device 31 then closes a switch 32, so that the stream of multi-channel auxiliary data from the buffer 26 passes through the time shifter 28 and the switch 32 to the multi-channel playback device 21.

本発明の好ましい実施例では、マルチチャネル補助情報のタイムシフト（遅延）のみ行われる。同時に、正確なオフセット値の計算と平行してマルチチャネル再生も行われるため、ユーザはマルチチャネル再生装置２１の出力において、オフセット値を正確に計算するために発生する時間の遅延に気づかない。しかしながら、このようなマルチチャネル再生は、「簡単な」マルチチャネル再生に過ぎない。なぜなら、好ましくは、単に２つのステレオ基本チャネルがマルチチャネル再生装置２１から出力されるだけだからである。したがって、スイッチ３２が開放されている場合、ステレオ出力のみ行われる。しかし、スイッチ３２が閉鎖されている場合、マルチチャネル再生装置２１は、ステレオ基本チャネルと併せてマルチチャネル補助情報も受信し、マルチチャネル出力を行う。しかしながら、この時、このマルチチャネル出力は既に同期されている。ユーザは、ステレオ品質がマルチチャネル品質に変換されていることにしか気づかない。 In the preferred embodiment of the present invention, only time shifting (delaying) of multi-channel auxiliary information is performed. At the same time, multi-channel playback is also performed in parallel with accurate offset value calculation, so that the user is unaware of the time delay that occurs at the output of the multi-channel playback device 21 to accurately calculate the offset value. However, such multi-channel playback is only “simple” multi-channel playback. This is because preferably only two stereo basic channels are output from the multi-channel playback device 21. Therefore, when the switch 32 is opened, only stereo output is performed. However, when the switch 32 is closed, the multi-channel playback device 21 also receives multi-channel auxiliary information together with the stereo basic channel and performs multi-channel output. However, at this time, the multi-channel output is already synchronized. The user only notices that the stereo quality has been converted to multi-channel quality.

しかしながら、時間の最初の遅延が主な問題ではないケースの場合、マルチチャネル再生装置２１における出力は、有効なオフセットが得られるまで保留してもよい。一番最初のブロック（図７ｂに示すＢＫ１）を、正確に遅延させたマルチチャネル補助データＰ１（図７ｃ）と共にマルチチャネル再生装置２１へ送信してもよい。この場合、マルチチャネルデータが得られた時のみ、出力が開始されることになる。この実施例では、スイッチが開放している時にはマルチチャネル再生装置２１での出力は行われない。 However, in cases where the initial delay in time is not the main problem, the output at the multi-channel playback device 21 may be suspended until a valid offset is obtained. The very first block (BK1 shown in FIG. 7b) may be transmitted to the multi-channel playback device 21 together with the accurately delayed multi-channel auxiliary data P1 (FIG. 7c). In this case, output is started only when multi-channel data is obtained. In this embodiment, when the switch is open, no output from the multi-channel playback device 21 is performed.

次に、図９を参照して、図８に示す相関器２９の機能について説明する。図９の最上部の図に示すように、テストフィンガープリント計算装置１１の出力において、一連のテストフィンガープリント情報が送信される。従って、１、２、３、４、ｉの参照符号で示す基本チャネルの各ブロックに対して、ブロックフィンガープリントが得られる。相関アルゴリズムによっては、一連の離散値のみ相関に必要とする場合もある。しかしながら、図９に示すように、他の相関アルゴリズムでは離散値の間を補間する曲線を入力値として求めてもよい。同様に、参照フィンガープリント決定装置９は、一連の離散参照フィンガープリントをデータストリームから抽出し生成する。例えば、データストリームが差分符号化処理されたフィンガープリント情報を含み、相関器が絶対フィンガープリントに基づいて動作する場合、図８に示す差分デコーダ３５が作動する。しかしながら、好ましくは、データストリームはエネルギ尺度としての絶対フィンガープリントを含む。なぜなら、このようなブロックごとの総エネルギに関する情報は、マルチチャネル再生装置２１におけるレベル補正にも有効活用できるからである。更に、好ましくは、相関処理は、差分フィンガープリントに基づいて行われる。この場合、既に述べたように、ブロック９は相関器より前の段階で差分処理を行い、ブロック１１も相関器より前の段階で差分処理を行う。 Next, the function of the correlator 29 shown in FIG. 8 will be described with reference to FIG. As shown in the top diagram of FIG. 9, a series of test fingerprint information is transmitted at the output of the test fingerprint calculation device 11. Accordingly, a block fingerprint is obtained for each block of the basic channel indicated by reference numerals 1, 2, 3, 4, i. Depending on the correlation algorithm, only a series of discrete values may be required for correlation. However, as shown in FIG. 9, in another correlation algorithm, a curve that interpolates between discrete values may be obtained as an input value. Similarly, the reference fingerprint determination device 9 extracts and generates a series of discrete reference fingerprints from the data stream. For example, if the data stream includes fingerprint information that has been differentially encoded and the correlator operates based on absolute fingerprints, the differential decoder 35 shown in FIG. 8 operates. However, preferably the data stream includes an absolute fingerprint as an energy measure. This is because such information on the total energy for each block can be effectively used for level correction in the multi-channel playback device 21. Further preferably, the correlation process is performed based on a differential fingerprint. 
In this case, as already described, the block 9 performs difference processing at a stage before the correlator, and the block 11 also performs difference processing at a stage before the correlator.

図９の上部２つの図に示すように、相関器２９は曲線および／または一連の離散値を示し、また、図９の最下部の図に示すような相関結果を得る。この相関結果では、オフセット成分は２つのフィンガープリント情報曲線の間のオフセットを示す。更に、オフセットは正であるため、マルチチャネル補助情報を正の時間方向へシフト、つまり遅延しなければならない。なお、マルチチャネル再生装置における２つの情報の入力時に同期マルチチャネル表現を含んでさえいれば、当然、基本チャネルデータを負の時間方向へシフトしてもよい。あるいは、マルチチャネル補助情報を正の方向へいくらかシフトし、且つ基本チャネルデータをオフセットのうちいくらか分、負の方向へシフトしてもよい。 The correlator 29 is supplied with the curves and/or sequences of discrete values shown in the two upper diagrams of FIG. 9 and produces a correlation result as shown in the bottom diagram of FIG. 9. In this correlation result, the offset component indicates the offset between the two fingerprint information curves. Because the offset is positive, the multi-channel auxiliary information must be shifted in the positive time direction, i.e. delayed. Naturally, the basic channel data could instead be shifted in the negative time direction, as long as a synchronized multi-channel representation is present at the two inputs of the multi-channel playback device. Alternatively, the multi-channel auxiliary information may be shifted in the positive direction by part of the offset and the basic channel data in the negative direction by the remaining part.
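
A correlation over block fingerprints of the kind described above might look like the following sketch. It is not the correlator of the specification: the function name `estimate_offset`, the mean removal, the normalization by overlap length, and the sign convention (a positive result means the reference fingerprints from the data stream arrive early, so the side information has to be delayed) are assumptions chosen for this example.

```python
def estimate_offset(test_fp, ref_fp, max_lag):
    """Return the lag (in blocks) at which ref_fp[k] best matches test_fp[k + lag].

    test_fp: fingerprints computed from the received basic channels.
    ref_fp:  fingerprints extracted from the side-information data stream.
    """
    n = min(len(test_fp), len(ref_fp))
    mean_t = sum(test_fp[:n]) / n
    mean_r = sum(ref_fp[:n]) / n
    t = [x - mean_t for x in test_fp[:n]]
    r = [x - mean_r for x in ref_fp[:n]]

    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Indices k for which both r[k] and t[k + lag] exist.
        ks = range(max(0, -lag), min(n, n - lag))
        if len(ks) == 0:
            continue
        score = sum(r[k] * t[k + lag] for k in ks) / len(ks)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy fingerprints: the same energy peak shows up two blocks earlier in the
# reference sequence than in the test sequence.
test_fp = [0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0]
ref_fp = [0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0]
offset = estimate_offset(test_fp, ref_fp, max_lag=4)
```

With differential fingerprints the same function can be applied unchanged, since only the shapes of the two sequences are compared.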

次に、図１０を参照して、音声出力と並行してオフセットを計算する際の好ましい実施例について説明する。基本チャネルデータが常に１つのフィンガープリントを計算するようにバッファリングし、マルチチャネル再生のために、既にテストブロックフィンガープリントを計算したブロックをマルチチャネル再生装置へ送信する。次に、同様に基本チャネルデータの次のブロックをバッファ２５へ送信し、このブロックからテストブロックフィンガープリントを計算する。例えば、２００個のブロックについて、この処理を実行する。しかしながら、この２００個のブロックは、「簡単な」マルチチャネル再生として、単にステレオ出力データとしてマルチチャネル再生装置からステレオ出力される。この場合、ユーザは遅延には気づかない。 Next, a preferred embodiment for calculating the offset in parallel with the audio output is described with reference to FIG. 10. The basic channel data is buffered so that one fingerprint can be calculated at a time, and the block whose test block fingerprint has just been calculated is then passed on to the multi-channel playback device for reproduction. The next block of basic channel data is then fed to the buffer 25 in the same way, and a test block fingerprint is calculated from it. This is done for, e.g., 200 blocks. These 200 blocks, however, are output by the multi-channel playback device merely as stereo output data, as a “simple” form of multi-channel reproduction, so that the user does not notice any delay.

実施例によっては、２００個より少ない、もしくは２００個より多い数のブロックを使用してもよい。本発明によれば、１００個から３００個の間の数のブロック、好ましくは２００個のブロックから、計算時間・相関計算作業量・オフセットの正確性の間に妥当な妥協点を得られることが分かっている。 Depending on the embodiment, fewer or more than 200 blocks may be used. It has been found that a number of blocks between 100 and 300, preferably 200, offers a reasonable compromise between calculation time, correlation workload, and offset accuracy.

ブロック３６の処理が完了すると、ブロック３７の処理を実行する。ここでは、計算した２００個のテストブロックフィンガープリントと計算した２００個の参照ブロックフィンガープリントを相関器２９により相関処理し、得られるオフセット結果を記憶する。そして、次の例えば２００個の基本チャネルデータブロックを、ブロック３６の処理に相当するブロック３８の処理に基づき計算する。同様に、２００個のブロックをマルチチャネル補助情報を含むデータストリームから抽出する。続いて、ブロック３９で同様に相関処理を行い、得られるオフセット結果を記憶する。そして、ブロック４０の処理で、第１の２００個のブロック群に基づくオフセット結果と、第２の２００個のブロック群に基づくオフセット結果の偏差値を決定する。ブロック４１の処理では、この偏差値が所定の閾値より小さい場合、オフセットを図８に示すタイムシフタ２８へオフセットライン３０を介して送信し、スイッチ３２を閉鎖する。それにより、この時点でマルチチャネル出力のスイッチを構成する。偏差値に対する所定の閾値は、例えば、１つもしくは２つブロック分である。これは、オフセットが最初の計算と次の計算の間で、１つもしくは２つブロック分以上違わなければ、相関計算処理において誤りは発生しないからである。 When the processing of block 36 is complete, block 37 is executed: the 200 calculated test block fingerprints and the 200 reference block fingerprints are correlated by the correlator 29, and the resulting offset is stored. The next, for example, 200 blocks of basic channel data are then processed in block 38, which corresponds to block 36, and 200 blocks are likewise taken from the data stream containing the multi-channel auxiliary information. A correlation is again performed in block 39 and the resulting offset is stored. In block 40, the deviation between the offset obtained from the first 200 blocks and the offset obtained from the second 200 blocks is determined. In block 41, if this deviation is below a predetermined threshold, the offset is transmitted via the offset line 30 to the time shifter 28 shown in FIG. 8 and the switch 32 is closed, so that the output switches to multi-channel reproduction at this point. The predetermined threshold for the deviation is, for example, one or two blocks. The reasoning is that if the offset does not change by more than one or two blocks between the first and the second calculation, it can be assumed that no error occurred in the correlation calculation.
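
The plausibility check of blocks 40 and 41 can be sketched as a comparison of two consecutive offset estimates. The function name `confirm_offset`, the inclusive comparison against the threshold, and the choice to forward the more recent estimate are assumptions of this sketch; the text above only states that the offset is passed on when the deviation stays within a threshold of, for example, one or two blocks.

```python
def confirm_offset(first_offset, second_offset, threshold_blocks=1):
    """Return the offset to hand to the time shifter, or None if the two
    estimates disagree by more than threshold_blocks (switch stays open)."""
    if abs(second_offset - first_offset) <= threshold_blocks:
        return second_offset  # assumption: use the more recent estimate
    return None


# Two estimates from consecutive groups of, e.g., 200 blocks each.
accepted = confirm_offset(2, 2)   # estimates agree: offset can be applied
rejected = confirm_offset(2, 9)   # estimates disagree: keep stereo-only output
```

A result of `None` corresponds to leaving the switch 32 open and continuing with further measurements.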

上記の実施例とは違い、例えば２００個のブロック分の窓の長さに基づくスライドウインドウを利用してもよい。例えば、２００個のブロックの計算を行い、結果を得る。そして、１個先のブロックを処理し、相関計算処理に使ったブロックからブロックを１個削除し、代わりに新しいブロックを使う。先に得られた結果同様に、計算した結果をヒストグラムに記録する。この処理を、相関計算処理の回数分だけ、つまり、例えば１００個か２００個行い、段階的にヒストグラムを埋める。ヒストグラムの頂点をオフセットとして計算し、最初のオフセットを算出し、もしくは動的再調整を行う。 As an alternative to the embodiment above, a sliding window with a window length of, for example, 200 blocks may be used. A correlation is first calculated over 200 blocks and a result is obtained. The window is then advanced by one block: one block is dropped from the set used for the correlation, and the new block is used instead. The result is entered into a histogram together with the results obtained before. This is repeated for a number of correlation calculations, for example 100 or 200, so that the histogram fills up gradually. The peak of the histogram is then taken as the offset, either to determine the initial offset or to perform dynamic readjustment.
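
The sliding-window variant with a histogram can be sketched as follows. The sketch makes several assumptions: the names `best_lag` and `histogram_offset` are hypothetical, the window length is shortened to 12 blocks for readability, and the per-window estimator uses a mean squared difference as a simple stand-in for the correlation mentioned in the text. The toy fingerprints (squares of the block index) are chosen only so that the result is easy to verify.

```python
from collections import Counter


def best_lag(test_win, ref_win, max_lag=3):
    """Per-window estimate: lag minimizing the mean squared difference
    between ref_win[k] and test_win[k + lag] (stand-in for a correlation)."""
    n = len(test_win)

    def mse(lag):
        ks = range(max(0, -lag), min(n, n - lag))
        return sum((ref_win[k] - test_win[k + lag]) ** 2 for k in ks) / len(ks)

    return min(range(-max_lag, max_lag + 1), key=mse)


def histogram_offset(test_fp, ref_fp, window, steps, estimate=best_lag):
    """Advance a window one block at a time, collect the per-window offset
    estimates in a histogram, and return the most frequent value."""
    histogram = Counter()
    for start in range(steps):
        t = test_fp[start:start + window]
        r = ref_fp[start:start + window]
        if len(t) < window or len(r) < window:
            break  # not enough blocks left for a full window
        histogram[estimate(t, r)] += 1
    if not histogram:
        return None
    return histogram.most_common(1)[0][0]


# Toy data: the reference fingerprints run two blocks ahead of the test ones.
envelope = [i * i for i in range(30)]
test_fp = envelope[:-2]
ref_fp = envelope[2:]
offset = histogram_offset(test_fp, ref_fp, window=12, steps=10)
```

Taking the histogram peak rather than a single estimate makes the result less sensitive to an occasional window that yields a wrong offset.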

オフセット計算は出力と同時に行われ、ブロック４２の処理と平行して行われる。必要であれば、マルチチャネル情報を含むデータストリームおよび基本チャネルデータを含むデータストリームが正しく対応付けされていないのが発見された場合、更新したオフセット値を図８に示すタイムシフタ２８にライン３０を介して送信し、適応および／または動的オフセットトラッキングを実行する。なお、適応トラッキングを行う際は、実施例に応じてオフセット変化を平滑化し、例えば２つのブロックの偏差値を求めた時に、必要に応じてオフセットを１つずつ増加し続け、曲線が急激に変化しないようにしてもよい。 The offset calculation is performed simultaneously with the output, and is performed in parallel with the processing of the block 42. If necessary, if it is found that the data stream containing multi-channel information and the data stream containing basic channel data are not correctly associated, the updated offset value is sent to the time shifter 28 shown in FIG. Transmit and perform adaptive and / or dynamic offset tracking. When adaptive tracking is performed, the offset change is smoothed according to the embodiment. For example, when the deviation value of two blocks is obtained, the offset is continuously increased one by one as necessary, and the curve changes rapidly. You may make it not.
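
The smoothed offset tracking mentioned above, in which a detected deviation of several blocks is corrected one block at a time, can be sketched like this. The function name `step_toward` and the `max_step` parameter are assumptions for illustration.

```python
def step_toward(current_offset, target_offset, max_step=1):
    """Move the applied offset toward the newly measured one by at most
    max_step blocks, so that the applied delay never changes abruptly."""
    delta = target_offset - current_offset
    delta = max(-max_step, min(max_step, delta))
    return current_offset + delta


# The tracker measures a new offset of 4 blocks while 2 blocks are applied.
applied = 2
history = []
for _ in range(3):
    applied = step_toward(applied, 4)
    history.append(applied)
```

Each call would correspond to one update sent to the time shifter, so a deviation of two blocks is corrected over two updates instead of in a single jump.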

次に、図１１を参照して、図１に示すエンコーダ側のフィンガープリント生成装置２および図２に示すデコーダ側のフィンガープリント生成装置１１の好ましい実施例について述べる。 Next, a preferred embodiment of the encoder-side fingerprint generator 2 shown in FIG. 1 and the decoder-side fingerprint generator 11 shown in FIG. 2 will be described with reference to FIG.

通常、マルチチャネル音声信号は、マルチチャネル補助データを取得するために、所定のサイズのブロックに分割される。この時、マルチチャネル補助データの取得と同時に、ブロックごとのフィンガープリントを計算する。この方法は、信号の時間構造を出来るだけ一意的に特徴付けるのに有効である。この考えに基づく実施例では、音声ブロックの現在のダウンミックス音声信号におけるエネルギ容量を例えばデシベル表現のような対数形式で利用する。この場合、フィンガープリントは音声信号の時間エンベロープを表す。送信する情報量を減少し、測定値の正確性を向上させるために、このような同期情報を先行するブロックのエネルギ値との差分として表現してもよく、その後に適宜、例えばハフマン符号化などのエントロピ符号化、適応スケーリング、および量子化を実行してもよい。時間エンベロープのフィンガープリントは以下ように求める。 Usually, a multi-channel audio signal is divided into blocks of a predetermined size in order to obtain multi-channel auxiliary data. At this time, the fingerprint for each block is calculated simultaneously with the acquisition of the multi-channel auxiliary data. This method is effective for characterizing the temporal structure of the signal as uniquely as possible. In an embodiment based on this idea, the energy capacity in the current downmix audio signal of the audio block is utilized in a logarithmic form, for example in decibel representation. In this case, the fingerprint represents the time envelope of the audio signal. In order to reduce the amount of information to be transmitted and improve the accuracy of the measurement value, such synchronization information may be expressed as a difference from the energy value of the preceding block, after which, for example, Huffman coding etc. Entropy coding, adaptive scaling, and quantization may be performed. The fingerprint of the time envelope is obtained as follows.

まず、図１１の１に示すように、現在のブロックにおけるダウンミックス音声信号のエネルギを、通常、ステレオ信号について計算する。例えば、左右両方のダウンミックスチャネルの１１５２個の音声サンプルをそれぞれ二乗し、合計する。Ｓ_ｌｅｆｔ（ｉ）は左基本チャネルの時間ｉにおける時間サンプルを表し、Ｓ_{ｒｉｇｈｔ}（ｉ）は右基本チャネルの時間ｉにおける時間サンプルを表す。モノラルのダウンミックス信号では、合計処理は行われない。更に、好ましくは、ダウンミックス音声信号において、本発明に重要ではない直接の構成要素を計算処理の前の段階で削除する。 First, as shown at 1 in FIG. 11, the energy of the downmix audio signal in the current block is calculated, typically for a stereo signal. For example, the 1152 audio samples of each of the left and right downmix channels are squared and summed. Here S_left(i) denotes the time sample of the left basic channel at time i, and S_right(i) denotes the time sample of the right basic channel at time i. For a monaural downmix signal, the summation over the two channels is omitted. Preferably, the direct (DC) components of the downmix audio signal, which are not significant for the invention, are removed before the calculation.
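The block energy described above, the sum over the block of S_left(i)² + S_right(i)², can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `block_energy` and `remove_dc` are hypothetical, and treating the "direct components" as the per-block mean (DC offset) is an assumption.

```python
def remove_dc(samples):
    """Subtract the block mean (assumed meaning of 'direct component')."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def block_energy(left, right=None):
    """Energy of one audio block: sum of squared samples.

    For a stereo downmix, the squared samples of both channels are summed.
    For a mono downmix (right is None), only one channel is summed.
    """
    if right is None:
        return float(sum(s * s for s in left))
    if len(left) != len(right):
        raise ValueError("left and right blocks must have the same length")
    return float(sum(l * l + r * r for l, r in zip(left, right)))
```

In the example from the text, `left` and `right` would each hold the 1152 samples of one block.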

次に行われる対数表現のために、ステップ２においてエネルギの最小化を行う。エネルギをデシベル分析するために、好ましくは最小エネルギオフセットを使い、ゼロエネルギの場合には、妥当な対数計算が行われるようにする。このエネルギ尺度をｄＢで表すと、１６ビットの音声信号解像度では、０〜９０（ｄＢ）の範囲になる。 For the subsequent logarithmic representation, the energy is limited to a minimum value in step 2. So that the energy can be evaluated in decibels, a minimum energy offset is preferably used, which ensures that a meaningful logarithm results even when the energy is zero. Expressed in dB, this energy measure covers a range of 0 to 90 dB at an audio signal resolution of 16 bits.
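A sketch of the decibel conversion with a minimum energy offset is shown below. Several details are assumptions, since the text does not give them: the offset is applied additively, its value is 1.0 (so that zero energy maps to 0 dB), and the input is the mean-square energy per sample. The name `energy_db` is hypothetical.

```python
import math

E_OFFSET = 1.0  # assumed minimum energy offset; the text gives no value


def energy_db(mean_square_energy: float, e_offset: float = E_OFFSET) -> float:
    """Convert a non-negative energy to dB; the offset keeps log10 defined at zero."""
    if mean_square_energy < 0:
        raise ValueError("energy must be non-negative")
    return 10.0 * math.log10(mean_square_energy + e_offset)
```

Under these assumptions, silence yields 0 dB and a full-scale 16-bit sample value (32767) yields about 90.3 dB, which is consistent with the 0 to 90 dB range stated in the text.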

図１１の３に示すように、マルチチャネル補助情報および受信した信号の間のタイムオフセットを正確に決定する際には、絶対エネルギエンベロープではなく、信号エンベロープの傾き（傾斜度）を使用するのが好ましい。したがって、エネルギエンベロープの傾きのみを相関計算処理に使用する。技術的な面から言うと、この信号導出は、先行するブロックのエネルギ値との間の差分処理により計算する。この処理は、例えばエンコーダなどで実行され、フィンガープリントは差分符号化された値からなる。また、この処理は、デコーダのみで実行してもよい。この場合、送信されたフィンガープリントは非差分符号化の値からなる。この時、差分の計算はデコーダのみで行われる。後者の解決法においては、フィンガープリントがダウンミックス信号の絶対エネルギに関する情報を含むという利点がある。しかしながら、典型的には、フィンガープリントにおいて、いくらか長いワード長を必要とする。 As shown at 3 in FIG. 11, the slope (steepness) of the signal envelope, rather than the absolute energy envelope, is preferably used to determine the time offset between the multi-channel auxiliary information and the received signal accurately. Accordingly, only the slope of the energy envelope enters the correlation calculation. Technically, this signal derivative is computed as the difference from the energy value of the preceding block. This step may be performed in the encoder, in which case the fingerprint consists of differentially encoded values. Alternatively, it may be performed only in the decoder, in which case the transmitted fingerprint consists of non-differentially encoded values and the difference is formed in the decoder. The latter solution has the advantage that the fingerprint carries information about the absolute energy of the downmix signal; however, it typically requires a somewhat longer word length for the fingerprint.
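The difference step can be illustrated with a short sketch. The function names `envelope_slope` and `rebuild_envelope` are hypothetical, and the input is assumed to be a list of per-block energy values in dB.

```python
def envelope_slope(envelope_db):
    """Difference of each block's energy from the preceding block's energy."""
    return [cur - prev for prev, cur in zip(envelope_db, envelope_db[1:])]


def rebuild_envelope(first_value, slopes):
    """Inverse operation: accumulate the differences from a known first value."""
    envelope = [first_value]
    for d in slopes:
        envelope.append(envelope[-1] + d)
    return envelope
```

One property visible in this sketch is that a constant level shift of the whole envelope (for example a fixed gain change in dB) leaves the slopes unchanged, so slope-based matching does not depend on the absolute level.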

さらに、最適制御のために、エネルギ（信号のエンベロープ）をスケーリングするのが好ましい。次に行うこのフィンガープリントの量子化において、数値的な幅を最大まで活用し、さらに低いエネルギ値に対する解像度を向上するために、さらにスケーリング（利得）するのが有効である。スケーリングは所定の統計的重み付けにより実行してもよいし、あるいはエンベロープ信号に適応された動的利得制御により実行してもよい。 Furthermore, it is preferable to scale the energy (signal envelope) for optimal control. So that the subsequent quantization of the fingerprint uses the full numerical range and resolves low energy values more finely, it is useful to apply additional scaling (gain). The scaling may be performed with a predetermined statistical weighting or with dynamic gain control adapted to the envelope signal.

さらに、図１１の５に示すように、フィンガープリントを量子化する。このフィンガープリントをマルチチャネル補助情報に挿入するために８ビットに量子化する。実際、この減少したフィンガープリント解像度は、必要となるビット数や遅延の検出における信頼度の面から有効な妥協点であることが分かっている。２５５を超える数のオーバーフローについては、特性飽和曲線により２５５が最大値となるよう制限されている。 Further, as shown at 5 in FIG. 11, the fingerprint is quantized. It is quantized to 8 bits for insertion into the multi-channel auxiliary information. In practice, this reduced fingerprint resolution has proven to be a good compromise between the number of bits required and the reliability of delay detection. Values that overflow beyond 255 are limited to the maximum value of 255 by a saturation characteristic.
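Scaling followed by 8-bit quantization with saturation might look like the sketch below. The fixed gain that maps the 0 to 90 dB range onto 0 to 255 is an assumption chosen for illustration, as is the name `quantize_fingerprint`; the text also allows dynamic gain control, which is not shown. The sketch covers non-differential values only.

```python
DEFAULT_GAIN = 255.0 / 90.0  # assumed static gain: maps 0..90 dB onto 0..255


def quantize_fingerprint(value_db: float, gain: float = DEFAULT_GAIN) -> int:
    """Scale a dB value and quantize it to 8 bits, saturating at 0 and 255."""
    q = int(round(value_db * gain))
    # Saturation characteristic: overflow is limited to 255, underflow to 0.
    return max(0, min(255, q))
```

Values above the assumed 90 dB range, such as `quantize_fingerprint(120.0)`, are limited to 255 instead of wrapping around.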

図１１の６に示すように、この時点でフィンガープリントを最適エントロピ符号化してもよい。フィンガープリントの統計的特性を求めることにより、量子化フィンガープリントが必要とするビット数を更に減少できる。有効なエントロピ方法は、例えばハフマン符号化や算術符号化である。フィンガープリントごとの統計的に異なる周波数は、異なる符号長により表し、フィンガープリント表現において平均的に必要なビット数を減少してもよい。 As shown at 6 in FIG. 11, the fingerprint may optimally be entropy coded at this point. By exploiting the statistical properties of the fingerprint, the number of bits required for the quantized fingerprint can be reduced further. Suitable entropy coding methods are, for example, Huffman coding and arithmetic coding. Fingerprint values that occur with statistically different frequencies may be represented by different code lengths, which reduces the average number of bits needed to represent the fingerprint.

マルチチャネル補助データの計算は、マルチチャネル音声信号を利用して、音声ブロックごとに行われる。計算されたマルチチャネル補助情報は続いて同期情報により拡張され、適当な埋込み処理によりビットストリームに追加される。 The multi-channel auxiliary data is calculated for each audio block from the multi-channel audio signal. The calculated multi-channel auxiliary information is then extended with the synchronization information and added to the bitstream by a suitable embedding process.
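To illustrate how the entropy coding step mentioned above assigns shorter codes to more frequent fingerprint values, the following sketch computes Huffman code lengths for a sequence of quantized values. It shows the general Huffman construction only; the name `huffman_code_lengths` is hypothetical, and the text does not specify which code tables an actual encoder would use.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the input."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(n, i, [sym]) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths
```

For 16 values distributed as 8, 4, 2 and 2 occurrences of four symbols, the code lengths come out as 1, 2, 3 and 3 bits, for a total of 28 bits instead of the 32 bits a fixed 2-bit code would need.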

本発明の解決策によれば、受信装置はダウンミックス信号と補助情報のタイムオフセットを検出し、時間にずれのない適応化、つまり、ステレオ音声信号とマルチチャネル補助情報の間の遅延を、プラスマイナス音声ブロック半分分の範囲で補間する。したがって、受信装置において、マルチチャネル構造はほぼ完全に、つまりプラスマイナス音声フレーム半分分のほとんど知覚されない時間のずれを除いて、再生される。この場合、再生されたマルチチャネル音声信号の品質に、特筆するほどの影響は与えない。 With the solution of the present invention, the receiving device detects the time offset between the downmix signal and the auxiliary information and performs a time-correct adaptation, that is, it compensates the delay between the stereo audio signal and the multi-channel auxiliary information to within plus or minus half an audio block. At the receiving device, the multi-channel structure is therefore reproduced almost completely, apart from a barely perceptible time difference of at most plus or minus half an audio frame, which has no notable effect on the quality of the reproduced multi-channel audio signal.
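One way the offset detection could work is a cross-correlation between the test fingerprint sequence computed in the receiver and the reference sequence extracted from the data stream. The sketch below is an illustration under assumptions: the function name `estimate_offset`, the unnormalized correlation score, and the sign convention (a lag of k means `test[i]` matches `reference[i + k]`) are all hypothetical choices. Because the lag is counted in whole blocks, the remaining misalignment is at most half a block.

```python
def estimate_offset(test, reference, max_lag):
    """Return the block lag in [-max_lag, max_lag] that maximizes the correlation.

    A result of k means test[i] lines up best with reference[i + k].
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, t in enumerate(test):
            j = i + lag
            if 0 <= j < len(reference):
                score += t * reference[j]
        # Keep the first lag that reaches the highest score.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

In practice the inputs would be envelope slopes rather than absolute energies, in line with the preference stated earlier in the description.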

環境に応じて、本発明による生成方法および／または復号化方法はハードウエアまたはソフトウエアのいずれで実現してもよい。これは、デジタル記憶媒体、特に電子的に読出し可能な制御信号を備えるフロッピーディスクやＣＤ上で実現され、フロッピーディスクやＣＤは、本発明による方法が実行されるように、プログラム可能なコンピュータシステムと連動できる。一般に、本発明は、コンピュータで実行し、本発明の方法を実現するための機械で読取り可能な担体上に記憶されたプログラム符号を有するコンピュータプログラム製品においても実現される。すなわち、本発明は、コンピュータ上で実行すれば、本発明による方法を実現するためのプログラム符号を有するコンピュータプログラムとして、実現することもできる。 Depending on the circumstances, the generation method and/or the decoding method according to the present invention may be implemented in either hardware or software. The implementation may be on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system so that the method according to the invention is executed. In general, the invention thus also consists in a computer program product having program code, stored on a machine-readable carrier, for performing the method of the invention when the program runs on a computer. In other words, the invention can also be realized as a computer program having program code for performing the method according to the invention when the program is executed on a computer.

本発明のデータストリーム生成装置の回路ブロック図である。 A circuit block diagram of the data stream generation device of the present invention.
本発明のマルチチャネル表現生成装置の回路ブロック図である。 A circuit block diagram of the multi-channel representation generation device of the present invention.
チャネルデータおよびパラメトリックマルチチャネル情報を生成するための公知のジョイントステレオエンコーダの図である。 A diagram of a known joint stereo encoder for generating channel data and parametric multi-channel information.
ＢＣＣ符号化／復号化のためのＩＣＬＤ、ＩＣＴＤおよびＩＣＣパラメータを決定するための図式である。 A diagram for determining the ICLD, ICTD and ICC parameters for BCC encoding/decoding.
BCCエンコーダ／デコーダ列のブロック図である。 A block diagram of a BCC encoder/decoder chain.
図５に示すＢＣＣ合成ブロックの実現例である。 An implementation example of the BCC synthesis block shown in FIG. 5.
元のマルチチャネル信号を一連のブロックとして表した概略図である。 A schematic diagram representing the original multi-channel signal as a series of blocks.
１つまたは複数の基本チャネルを一連のブロックとして表した概略図である。 A schematic diagram representing one or more basic channels as a series of blocks.
本発明による、マルチチャネル情報および関連するブロックフィンガープリントを含むデータストリームの概略図である。 A schematic diagram of a data stream according to the present invention that includes multi-channel information and associated block fingerprints.
図７ｃに示すデータストリームのブロックの典型例を示す図である。 A diagram showing a typical example of a block of the data stream shown in FIG. 7c.
本発明の好ましい実施例による、マルチチャネル表現生成装置の詳細図である。 A detailed view of a multi-channel representation generation device according to a preferred embodiment of the present invention.
テストフィンガープリント情報および参照フィンガープリント情報間の相関に基づくオフセット決定処理の概略図である。 A schematic diagram of offset determination based on the correlation between test fingerprint information and reference fingerprint information.
データ出力と平行して行うオフセット決定処理の好ましい実施例を示すフローチャートである。 A flowchart showing a preferred embodiment of offset determination performed in parallel with data output.
エンコーダおよびデコーダにおける、フィンガープリント情報および／または符号化フィンガープリント情報の計算処理の概略図である。 A schematic diagram of the calculation of fingerprint information and/or encoded fingerprint information in the encoder and decoder.

Claims

２以上のチャネルを有する元のマルチチャネル信号をマルチチャネル再生するためのデータストリームを生成する装置であって、
前記元のマルチチャネル信号から生成した１以上かつ前記元のマルチチャネル信号のチャネル数より少ない数の１以上の基本チャネルから、前記１以上の基本チャネルに時間経過を付与するフィンガープリント情報を生成するフィンガープリント生成装置（２）および、
前記１以上の基本チャネルと組合わせることにより前記元のマルチチャネル信号のマルチチャネル再生を可能にする時間可変的マルチチャネル補助情報のデータストリームを前記フィンガープリント情報から生成するデータストリーム生成装置（４）を備え、
前記データストリーム生成装置（４）は前記マルチチャネル補助情報および前記フィンガープリント情報の間の時間的対応関係を前記データストリームから生成するための前記データストリームを生成する、装置。 An apparatus for generating a data stream for multi-channel reproduction of an original multi-channel signal having two or more channels, comprising:
a fingerprint generator (2) for generating fingerprint information that indicates the time course of one or more basic channels, from the one or more basic channels, which are derived from the original multi-channel signal and whose number is at least one and smaller than the number of channels of the original multi-channel signal; and
a data stream generator (4) for generating, from the fingerprint information, a data stream of time-variable multi-channel auxiliary information that enables multi-channel reproduction of the original multi-channel signal when combined with the one or more basic channels,
wherein the data stream generator (4) generates the data stream such that a temporal correspondence between the multi-channel auxiliary information and the fingerprint information can be derived from the data stream.

前記フィンガープリント生成装置（２）は前記１以上の基本チャネルをブロック処理して前記フィンガープリント情報を生成し、
マルチチャネル再生のために前記１以上の基本チャネルのブロックと組合わせるために前記マルチチャネル補助情報をブロック処理により計算し、
前記データストリーム生成装置（４）は前記マルチチャネル補助情報および前記フィンガープリント情報をブロック処理により前記データストリームに書き込む、請求項１に記載の装置。 The apparatus according to claim 1, wherein the fingerprint generator (2) processes the one or more basic channels block-wise to generate the fingerprint information,
wherein the multi-channel auxiliary information is calculated block-wise so as to be combined with the blocks of the one or more basic channels for multi-channel reproduction, and
wherein the data stream generator (4) writes the multi-channel auxiliary information and the fingerprint information to the data stream block-wise.

前記フィンガープリント生成装置（２）は、前記１以上の基本チャネルのブロックに関するフィンガープリント情報として、前記ブロック内の前記基本チャネルに時間経過を付与するブロックフィンガープリントを生成し、
前記マルチチャネル補助情報のブロックはマルチチャネル再生のために前記基本チャネルのブロックと組合わされ、
前記マルチチャネル補助情報のブロックおよび前記フィンガープリントのブロックが互いに所定の対応関係を形成するように、前記データストリーム生成装置（４）はブロック処理にて前記データストリームを生成する、請求項２に記載の装置。 The apparatus according to claim 2, wherein the fingerprint generator (2) generates, as fingerprint information for a block of the one or more basic channels, a block fingerprint that indicates the time course of the basic channel within the block,
wherein the block of multi-channel auxiliary information is combined with the block of the basic channel for multi-channel reproduction, and
wherein the data stream generator (4) generates the data stream block-wise such that the block of multi-channel auxiliary information and the block of the fingerprint are in a predetermined relationship to each other.

前記フィンガープリント生成装置（２）は、時間的に連続する前記１以上の基本チャネルのブロックに対し一連のブロックフィンガープリントをフィンガープリント情報として計算し、
前記マルチチャネル補助情報を、時間的に連続する前記１以上の基本チャネルのブロックに対しブロック処理にて生成し、
前記データストリーム生成装置は、前記一連のマルチチャネル補助情報のブロックに対し所定の関係で前記一連のブロックフィンガープリントを書き込む、請求項２に記載の装置。 The fingerprint generation device (2) calculates a series of block fingerprints as fingerprint information for the one or more basic channel blocks that are temporally continuous,
The multi-channel auxiliary information is generated by block processing for the one or more basic channel blocks that are temporally continuous,
The apparatus according to claim 2, wherein the data stream generator writes the series of block fingerprints in a predetermined relationship to the series of blocks of multi-channel auxiliary information.

前記フィンガープリント生成装置（２）は、前記１以上の基本チャネルのブロック２個における２種類のフィンガープリント値間の差分をブロックフィンガープリントとして計算する、請求項４に記載の装置。 The apparatus according to claim 4, wherein the fingerprint generator (2) calculates, as the block fingerprint, the difference between the two fingerprint values of two blocks of the one or more basic channels.

前記フィンガープリント生成装置（２）は、フィンガープリント値の量子化およびエントロピ符号化を行い、前記フィンガープリント情報を生成する、請求項１から請求項５のいずれかに記載の装置。 The apparatus according to any one of claims 1 to 5, wherein the fingerprint generator (2) quantizes and entropy-encodes fingerprint values to generate the fingerprint information.

前記フィンガープリント生成装置（２）はフィンガープリント値をスケーリング情報によりスケーリングし、更に前記フィンガープリント情報に基づき前記スケーリング情報を前記データストリームに書き込む、請求項６に記載の装置。 The apparatus according to claim 6, wherein the fingerprint generator (2) scales a fingerprint value with scaling information and further writes the scaling information to the data stream based on the fingerprint information.

前記フィンガープリント生成装置（２）はブロック処理にて前記フィンガープリント情報を計算し、
前記データストリーム生成装置（４）は、データストリームのブロックが、マルチチャネル補助情報のブロックおよびそれに対応するフィンガープリント情報のブロックおよび前記１以上の基本チャネルのブロックからなるようにブロック処理にて前記データストリームを生成する、請求項１から請求項７のいずれかに記載の装置。 The fingerprint generation device (2) calculates the fingerprint information by block processing,
The data stream generation device (4) performs block processing so that the data stream block includes a multi-channel auxiliary information block, a corresponding fingerprint information block, and the one or more basic channel blocks. The apparatus according to claim 1, which generates a stream.

２以上の基本チャネルがあり、
前記フィンガープリント生成装置（２）は前記２以上の基本チャネルをサンプル処理もしくはスペクトル処理により加算、もしくは二乗し加算する、請求項１から請求項８のいずれかに記載の装置。 There are two or more basic channels,
The apparatus according to any one of claims 1 to 8, wherein the fingerprint generator (2) adds the two or more basic channels, or squares and adds them, sample by sample or spectral value by spectral value.

前記フィンガープリント生成装置（２）は、前記１以上の基本チャネルのエネルギエンベロープに関するデータをフィンガープリント情報として利用する、請求項１から請求項９のいずれかに記載の装置。 The device according to any one of claims 1 to 9, wherein the fingerprint generation device (2) uses data relating to an energy envelope of the one or more basic channels as fingerprint information.

前記フィンガープリント生成装置（２）は、前記１以上の基本チャネルのエネルギエンベロープに関するデータをフィンガープリント情報として利用し、
前記フィンガープリント生成装置（２）は更に、前記エネルギの最小化を利用し、最小エネルギを対数表現する、請求項１０に記載の装置。 The fingerprint generation device (2) uses data relating to an energy envelope of the one or more basic channels as fingerprint information,
The apparatus according to claim 10, wherein the fingerprint generator (2) further applies a lower limit to the energy and represents the resulting minimum energy logarithmically.

前記１以上の基本チャネルが符号化形式でマルチチャネル再生装置に送信され、
前記符号化形式は非可逆エンコーダにより生成され、
更に、前記１以上の基本チャネルを前記フィンガープリント生成装置（２）に対する入力信号として復号化するための基本チャネルデコーダを備える、請求項１１に記載の装置。 The one or more basic channels are transmitted in encoded form to a multi-channel playback device;
The encoding format is generated by a lossy encoder,
The apparatus according to claim 11, further comprising a basic channel decoder for decoding the one or more basic channels as an input signal to the fingerprint generator (2).

前記マルチチャネル補助データが、それぞれ対応する前記１以上の基本チャネルのブロックとブロック的に対応するマルチチャネルパラメータデータである、請求項１から請求項１２のいずれかに記載の装置。 The apparatus according to any one of claims 1 to 12, wherein the multi-channel auxiliary data is multi-channel parameter data that corresponds block by block to the respective blocks of the one or more basic channels.

前記１以上の基本チャネルの一連のブロックおよび前記マルチチャネル補助情報の一連のブロックをブロック処理にて生成するマルチチャネル分析装置（１１２）を更に備え、
前記フィンガープリント生成装置（２）はブロックフィンガープリント値を前記１以上の基本チャネルの各ブロック値から計算する、請求項１３に記載の装置。 A multi-channel analyzer (112) for generating a block of the one or more basic channels and a block of the multi-channel auxiliary information by block processing;
The apparatus according to claim 13, wherein the fingerprint generator (2) calculates a block fingerprint value from the values of each block of the one or more basic channels.

前記データストリーム生成装置（４）は、前記１以上の基本チャネルをマルチチャネル再生手段に送信するための標準データチャネルとは別のデータチャネルに前記データストリームを生成する、請求項１４に記載の装置。 The apparatus according to claim 14, wherein the data stream generating device (4) generates the data stream in a data channel different from a standard data channel for transmitting the one or more basic channels to a multi-channel playback means. .

前記標準データチャネルは、デジタルステレオ無線信号のための標準チャネルまたはインターネットを介する送信のための標準チャネルである、請求項１５に記載の装置。 The apparatus according to claim 15, wherein the standard data channel is a standard channel for digital stereo radio signals or a standard channel for transmission over the Internet.

１以上の基本チャネルおよび、前記１以上の基本チャネルに時間経過を付与するフィンガープリント情報および前記１以上の基本チャネルと組合わせることにより前記元のマルチチャネル信号の前記マルチチャネル再生を可能にするマルチチャネル補助情報を含むデータストリームから元のマルチチャネル信号のマルチチャネル表現（１８、２０）を生成する装置であって、前記マルチチャネル補助情報よび前記フィンガープリント情報の対応関係は前記データストリームから生成され、
テストフィンガープリント情報を前記１以上の基本チャネルから生成するためのフィンガープリント生成装置（１１）、
前記データストリームからフィンガープリント情報を抽出し、参照フィンガープリント情報を生成するためのフィンガープリント抽出装置（９）および、
前記テストフィンガープリント情報、前記参照フィンガープリント情報および、前記データストリームに含まれ且つ前記データストリームから生成される前記マルチチャネル情報および前記フィンガープリント情報の対応関係を利用して、前記マルチチャネル補助情報および前記１以上の基本チャネルを時間的に同期し、同期マルチチャネル表現を生成する同期装置（１３）を備える、装置。 An apparatus for generating a multi-channel representation (18, 20) of an original multi-channel signal from one or more basic channels and from a data stream that includes fingerprint information indicating the time course of the one or more basic channels and multi-channel auxiliary information which, in combination with the one or more basic channels, enables multi-channel reproduction of the original multi-channel signal, wherein the correspondence between the multi-channel auxiliary information and the fingerprint information can be derived from the data stream, the apparatus comprising:
a fingerprint generator (11) for generating test fingerprint information from the one or more basic channels;
a fingerprint extractor (9) for extracting the fingerprint information from the data stream to obtain reference fingerprint information; and
a synchronizer (13) for synchronizing the multi-channel auxiliary information and the one or more basic channels in time, using the test fingerprint information, the reference fingerprint information, and the correspondence between the multi-channel information and the fingerprint information that is contained in and derivable from the data stream, to obtain a synchronized multi-channel representation.

前記同期マルチチャネル表現を利用して前記マルチチャネル表現を再生し、前記元のマルチチャネル信号を再生するためのマルチチャネル再生装置（２１）を更に備える、請求項１７に記載の装置。 The apparatus according to claim 17, further comprising a multi-channel playback device (21) for playing back the multi-channel representation using the synchronized multi-channel representation and reproducing the original multi-channel signal.

前記データストリームは、参照フィンガープリント情報としての一連の参照フィンガープリント値に時間的に対応する、一連のマルチチャネル補助データのブロックからなり、
前記抽出装置（９）は、マルチチャネル補助データのブロックに対し、時間的対応関係に基づき対応するフィンガープリント値を決定し、
前記フィンガープリント生成装置（１１）は、一連の前記１以上の基本チャネルのブロックに対し、一連のテストフィンガープリント値をテストフィンガープリント情報として決定し、
前記同期装置（１３）は前記マルチチャネル補助データのブロックおよび前記１以上の基本チャネルのブロックとの間のオフセットを、前記一連のテストフィンガープリント値および前記一連の参照フィンガープリント値の間のオフセット（３０）に基づき計算し、前記一連のマルチチャネル補助情報のブロックを計算したオフセットに基づき遅延（２８）することにより前記オフセットを補間する、請求項１７または請求項１８に記載の装置。 The apparatus according to claim 17 or claim 18, wherein the data stream comprises a series of blocks of multi-channel auxiliary data that correspond in time to a series of reference fingerprint values serving as the reference fingerprint information,
wherein the extractor (9) determines, for a block of multi-channel auxiliary data, the corresponding fingerprint value on the basis of the temporal correspondence,
wherein the fingerprint generator (11) determines a series of test fingerprint values as the test fingerprint information for a series of blocks of the one or more basic channels, and
wherein the synchronizer (13) calculates the offset between the blocks of multi-channel auxiliary data and the blocks of the one or more basic channels from the offset (30) between the series of test fingerprint values and the series of reference fingerprint values, and compensates the offset by delaying (28) the series of blocks of multi-channel auxiliary information by the calculated offset.

前記フィンガープリント生成装置（１１）はフィンガープリント値を量子化し、前記テストフィンガープリント情報を生成する、請求項１７から請求項１９のいずれかに記載の装置。 The apparatus according to any one of claims 17 to 19, wherein the fingerprint generator (11) quantizes fingerprint values to generate the test fingerprint information.

前記フィンガープリント生成装置（１１）は、前記データストリームに含まれるスケーリング情報に基づき、フィンガープリント値をスケーリングする、請求項１７から請求項２０のいずれかに記載の装置。 The apparatus according to any one of claims 17 to 20, wherein the fingerprint generator (11) scales fingerprint values on the basis of scaling information contained in the data stream.

２以上の基本チャネルがあり、
前記フィンガープリント生成装置（１１）は前記２以上の基本チャネルをサンプル処理またはスペクトラム処理により加算、または二乗し加算する、請求項１７から請求項２１のいずれかに記載の装置。 There are two or more basic channels,
The apparatus according to any one of claims 17 to 21, wherein the fingerprint generator (11) adds the two or more basic channels, or squares and adds them, sample by sample or spectral value by spectral value.

前記フィンガープリント生成装置（１１）は、前記１以上の基本チャネルのエネルギエンベロープに関するデータをフィンガープリント情報として利用する、請求項１７から請求項２２のいずれかに記載の装置。 The apparatus according to any one of claims 17 to 22, wherein the fingerprint generator (11) uses data relating to the energy envelope of the one or more basic channels as fingerprint information.

前記フィンガープリント生成装置（１１）は、前記１以上の基本チャネルのエネルギエンベロープに関するデータをフィンガープリント情報として利用し、
前記フィンガープリント生成装置（１１）は更に、前記エネルギの最小化を利用し、最小エネルギを対数表現する、請求項１７から請求項２３のいずれかに記載の装置。 The fingerprint generator (11) uses data relating to the energy envelope of the one or more basic channels as fingerprint information,
The apparatus according to any one of claims 17 to 23, wherein the fingerprint generator (11) further applies a lower limit to the energy and represents the resulting minimum energy logarithmically.

マルチチャネル補助情報のブロックおよびブロックフィンガープリントは、ブロック構成された前記データストリームのブロックに含まれ、
前記フィンガープリント生成装置（１１）は、前記１以上の基本チャネルの２つのブロックフィンガープリント間の差分をテストフィンガープリント情報として計算し、
前記フィンガープリント抽出装置（９）は更に前記データストリームに含まれる２つのブロックフィンガープリント間の差分を計算し、参照フィンガープリントとして前記同期装置（１３）に送信する、請求項１７から請求項２４のいずれかに記載の装置。 The apparatus according to any one of claims 17 to 24, wherein blocks of multi-channel auxiliary information and block fingerprints are contained in blocks of the block-structured data stream,
wherein the fingerprint generator (11) calculates the difference between two block fingerprints of the one or more basic channels as the test fingerprint information, and
wherein the fingerprint extractor (9) further calculates the difference between two block fingerprints contained in the data stream and sends it to the synchronizer (13) as the reference fingerprint.

前記同期装置（１３）は、オーディオ出力と平行して前記マルチチャネル補助データおよび前記１以上の基本チャネルとの間のオフセットを計算し、前記オフセットを適応的に補間する、請求項１７から請求項２５のいずれかに記載の装置。 The apparatus according to any one of claims 17 to 25, wherein the synchronizer (13) calculates the offset between the multi-channel auxiliary data and the one or more basic channels in parallel with the audio output and compensates the offset adaptively.

更に、同期マルチチャネル補助データが得られない時は前記１以上の基本チャネルを再生し、同期マルチチャネル補助データが得られた時は前記１以上の基本チャネルのモノラルまたはステレオ再生からマルチチャネル再生に変換（３２）する、請求項１８に記載の装置。 The apparatus according to claim 18, which further reproduces the one or more basic channels while no synchronized multi-channel auxiliary data is available, and switches (32) from mono or stereo reproduction of the one or more basic channels to multi-channel reproduction once synchronized multi-channel auxiliary data is available.

互いに異なる２つの論理チャネルまたは物理チャネルを介して、もしくは異なるタイミングで動作する同一の送信チャネルを介して受信されるビットストリームから、前記データストリームおよび前記１以上の基本チャネルを別々に生成する、請求項１７から請求項２７のいずれかに記載の装置。 The apparatus according to any one of claims 17 to 27, wherein the data stream and the one or more basic channels are generated separately from bit streams received via two different logical or physical channels, or via the same transmission channel at different times.

２以上のチャネルを有する元のマルチチャネル信号をマルチチャネル再生するためのデータストリーム生成方法であって、
前記元のマルチチャネル信号から生成した、１以上且つ前記元のマルチチャネル信号のチャネル数よりも少ない数の１以上の基本チャネルから、前記１以上の基本チャネルに時間経過を付与するフィンガープリント情報を生成（２）し、
フィンガープリント情報から時間可変的なマルチチャネル補助情報のデータストリームを生成（４）し、前記１以上の基本チャネルと組み合わさって前記元のマルチチャネル信号のマルチチャネル再生を可能にし、前記マルチチャネル補助情報および前記フィンガープリント情報の時間的対応関係を前記データストリームから生成できるように前記データストリームを生成する、方法。 A data stream generation method for multi-channel reproduction of an original multi-channel signal having two or more channels, comprising:
generating (2) fingerprint information that indicates the time course of one or more basic channels, from the one or more basic channels, which are derived from the original multi-channel signal and whose number is at least one and smaller than the number of channels of the original multi-channel signal; and
generating (4), from the fingerprint information, a data stream of time-variable multi-channel auxiliary information that enables multi-channel reproduction of the original multi-channel signal when combined with the one or more basic channels, the data stream being generated such that a temporal correspondence between the multi-channel auxiliary information and the fingerprint information can be derived from the data stream.

元のマルチチャネル信号のマルチチャネル表現（１８、２０）を１以上の基本チャネルおよび、前記１以上の基本チャネルに時間経過を付与するフィンガープリント情報および前記１以上の基本チャネルと組合わさって前記元のマルチチャネル信号のマルチチャネル再生を可能にするマルチチャネル補助情報を含むデータストリームから生成する方法であって、前記マルチチャネル補助情報および前記フィンガープリント情報の対応関係は前記データストリームから生成され、
テストフィンガープリント情報を前記１以上の基本チャネルから生成（１１）し、
前記フィンガープリント情報を前記データストリームから抽出（９）し、参照フィンガープリント情報を生成し、および
前記テストフィンガープリント情報、前記参照フィンガープリント情報、および前記マルチチャネル補助情報および前記データストリームに含まれ且つ前記データストリームから生成される前記フィンガープリント情報の対応関係に基づき、前記マルチチャネル補助情報および前記１以上の基本チャネルを同期（１３）し、同期マルチチャネル表現を生成する、方法。 A method for generating a multi-channel representation (18, 20) of an original multi-channel signal from one or more basic channels and from a data stream that includes fingerprint information indicating the time course of the one or more basic channels and multi-channel auxiliary information which, in combination with the one or more basic channels, enables multi-channel reproduction of the original multi-channel signal, wherein the correspondence between the multi-channel auxiliary information and the fingerprint information can be derived from the data stream, the method comprising:
generating (11) test fingerprint information from the one or more basic channels;
extracting (9) the fingerprint information from the data stream to obtain reference fingerprint information; and
synchronizing (13) the multi-channel auxiliary information and the one or more basic channels, on the basis of the test fingerprint information, the reference fingerprint information, and the correspondence between the multi-channel auxiliary information and the fingerprint information that is contained in and derivable from the data stream, to obtain a synchronized multi-channel representation.

請求項２９または請求項３０に記載の方法をコンピュータ上で実行するためのプログラム符合を含むコンピュータプログラム。 A computer program comprising a program code for executing the method according to claim 29 or 30 on a computer.

元のマルチチャネル信号から生成された、１以上且つ前記元のマルチチャネル信号のチャネル数より少ない数の１以上の基本チャネルに時間経過を付与するフィンガープリント情報および前記１以上の基本チャネルと組合わさって前記元のマルチチャネル信号のマルチチャネル再生を可能にするマルチチャネル補助情報含むデータストリームであって、前記マルチチャネル補助情報および前記フィンガープリント情報の対応関係は前記データストリームから生成される、データストリーム。 A data stream comprising fingerprint information that indicates the time course of one or more basic channels, which are derived from an original multi-channel signal and whose number is at least one and smaller than the number of channels of the original multi-channel signal, and multi-channel auxiliary information that enables multi-channel reproduction of the original multi-channel signal when combined with the one or more basic channels, wherein the correspondence between the multi-channel auxiliary information and the fingerprint information can be derived from the data stream.

前記データストリームが請求項１７に記載の装置に送信される時に前記元のマルチチャネル信号の同期マルチチャネル表現を生成するための制御信号を含む、請求項３２に記載のデータストリーム。 The data stream according to claim 32, comprising control signals for generating a synchronized multi-channel representation of the original multi-channel signal when the data stream is supplied to the apparatus according to claim 17.