JP4345217B2 - Data processing method and apparatus

Info

Publication number: JP4345217B2
Application number: JP2000301393A
Authority: JP
Inventors: 愼治根岸; 秀樹小柳; 陽一矢ケ崎
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-09-29
Filing date: 2000-09-29
Publication date: 2009-10-14
Anticipated expiration: 2020-09-29
Also published as: JP2002112149A

Description

【０００１】
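[Technical Field of the Invention]
The present invention relates to a data processing method and apparatus well suited to special playback (trick play) in a data distribution system that distributes, over a network, multimedia data consisting of video data such as still and moving images, audio data, text data, graphic data and the like, together with scene description data for composing a scene from that multimedia data, and in which a decoding terminal receives, decodes, and displays the distributed multimedia data and scene description data.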
【０００２】
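[Prior Art]
FIG. 39 shows a configuration example of a conventional data distribution system in which video data or the like, obtained by compressing and storing image signals of still or moving images, is distributed via a transmission medium and then received, decoded, and displayed at a decoding terminal. To simplify the explanation, FIG. 39 covers only the video data path. The following description takes as an example the case where the video data is packetized into a transport stream (hereinafter simply TS) defined in ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 13818-1 (so-called MPEG2 Systems) and distributed in that form.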
【０００３】
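In FIG. 39, a server 200 includes a storage unit 209 that stores video data. Video data read from the storage unit 209 is packetized into a TS by a multiplexing unit 204, formed into distribution data 211 by a transmission unit 205, output to a transmission medium 210, and distributed to, for example, a decoding terminal 212. The TS distribution data 211 is transmitted using the protocol used on the transmission medium 210. For example, a TS that satisfies ISO/IEC 13818-1 can be transmitted by the method defined in IEC 61883, "Digital Interface for consumer audio/video equipment", over a transmission medium conforming to, for example, the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard. The multiplexing unit 204 and the transmission unit 205 may be integrated into one unit.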
【０００４】
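In the decoding terminal 212, a receiving unit 213 receives the distribution data 211 and passes it to a separation unit 214. The separation unit 214 separates the video data from the TS packets and passes it to a decoding unit 215, which decodes the encoded video data. The decoded video data is sent to, for example, a display device (not shown) and displayed as a video image.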
【０００５】
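When special playback display such as fast-forward playback, frame-advance playback, or pause is performed in such a data distribution system, a special playback designation signal 206 (an instruction signal for fast-forward playback, frame-advance playback, and so on), generated in response to the user operating the terminal front panel, a remote controller or the like, is input to a special playback control unit 216 of the decoding terminal 212. The special playback control unit 216 then generates a special playback request signal 220, which requests from the server 200 the video data for the kind of special playback designated by the special playback designation signal 206, and sends it via the transmission medium 210 to a special playback control unit 201 of the server 200.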
【０００６】
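On receiving the special playback request signal 220, the special playback control unit 201 of the server 200 generates control signals 202a and 202b according to the request and sends them to the corresponding multiplexing unit 204 and transmission unit 205. Under the control of the special playback control unit 201 through control signal 202b, the multiplexing unit 204 reads from the storage unit 209 the special playback video data that enables the decoding terminal 212 to perform the kind of special playback designated by the user. The multiplexing unit 204 also packetizes that special playback video data into a TS and sends it to the transmission unit 205. Under the control of the special playback control unit 201 through control signal 202a, the transmission unit 205 distributes the packets of special playback video data to the decoding terminal 212 as distribution data 211.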
【０００７】
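When distribution data 211 consisting of this special playback video data is supplied, the special playback control unit 216 of the decoding terminal 212 outputs control signals 217a and 217b for performing special playback control according to the special playback designation signal 206, and sends them to the corresponding receiving unit 213 and decoding unit 215. Under the control of the special playback control unit 216 through control signal 217b, the receiving unit 213 receives the distribution data 211 consisting of the special playback video data and passes it to the separation unit 214. The separation unit 214 separates the special playback video data from the TS packets and passes it to the decoding unit 215. Under the control of the special playback control unit 216 through control signal 217a, the decoding unit 215 decodes the special playback video data. As a result, special playback such as fast-forward or frame-advance playback is shown on the display device (not shown).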
【０００８】
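The video frame coding methods defined in ISO/IEC 13818-2 comprise the I picture (Intra-coded picture), which is coded from data within the frame only, and the B picture (Bidirectionally predictive-coded picture) and P picture (Predictive-coded picture), which are coded using prediction between frames. In the data distribution system of FIG. 39, I pictures, which do not use prediction between video frames, are used as the special playback video data read from the storage unit 209. That is, normal playback video data periodically contains I pictures to permit random access, and the special playback video data is built by extracting those I pictures. Thus, in the conventional data distribution system of FIG. 39, when the decoding terminal 212 performs special playback such as fast-forward, special playback video data, such as video data consisting only of ISO/IEC 13818-2 I pictures, is distributed from the server 200 to the decoding terminal 212.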
【０００９】
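On the other hand, when compressed video data conforming to, for example, ISO/IEC 13818-2 (so-called MPEG2 video) is distributed as in the system described above, the compressed video data defined in ISO/IEC 13818-2 must be encoded so that the decoder buffer neither overflows nor underflows. The decoder buffer corresponds to an input buffer (not shown) of the decoding unit 215. If data is input beyond the buffer size specified in ISO/IEC 13818-2, the decoder buffer overflows; if the data needed for decoding has not arrived by the time it is due to be decoded, the decoder buffer underflows.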
【００１０】
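However, video data consisting only of I pictures, like the special playback video data described above, has a large data volume and can cause the decoder buffer to overflow or underflow. A conventional data distribution system therefore has to prepare in advance special data for special playback, distinct from the normal playback data, that permits special playback without overflowing or underflowing the decoder buffer, and has to distribute that special data when the decoding terminal performs special playback. The decoding terminal, too, has to be a special terminal capable of special processing, different from ordinary special playback processing, that handles this special data.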
【００１１】
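In other words, to achieve special playback without overflowing or underflowing the decoder buffer, a conventional data distribution system must prepare in advance special data for special playback that differs from the I-picture-only special playback video data described above, and must distribute that special data during special playback. Likewise, the decoding terminal must have a special decoding unit 215 that can handle that special data, and the special playback control unit 216 must control the receiving unit 213, the separation unit 214, and the decoding unit 215 for special playback data processing.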
【００１２】
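For these reasons, the present applicant has proposed, in Japanese Patent Application Nos. 2000-178999 and 2000-179000, a technique in which the server takes the normal playback video data read from the storage unit, converts the data resulting from the kind of special playback designated by the user into video data that satisfies ISO/IEC 13818-2, and distributes the converted video data to the decoding terminal. This removes the need to use, or prepare in advance, special distribution data for special playback as described above, and gives a simple configuration that needs no special decoding terminal capable of handling such special distribution data.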
【００１３】
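FIG. 40 shows the schematic configuration of a data distribution system that converts the data resulting from special playback of normal playback video data into video data satisfying, for example, ISO/IEC 13818-2, and outputs it. The example of FIG. 40 assumes that video data and the like are packetized into a transport stream (TS) defined in ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 13818-1 (so-called MPEG2 Systems) and distributed in that form.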
【００１４】
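In FIG. 40, a server 220 includes a storage unit 229 that stores multimedia data such as video data (still and moving images), audio data, text data, and graphic data. Video data, for example, is read from the storage unit 229 and sent to a multiplexing unit 224 via a data conversion unit 223 described later. The multiplexing unit 224 packetizes the data output from the data conversion unit 223 into a TS. The TS packets are formed into distribution data 231 by a transmission unit 225, output to a transmission medium 230, and distributed to, for example, a decoding terminal 232. The TS distribution data 231 is transmitted using the protocol used on the transmission medium 230. For example, a TS that satisfies ISO/IEC 13818-1 can be transmitted by the method defined in IEC 61883, "Digital Interface for consumer audio/video equipment", over a transmission medium conforming to, for example, the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard.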
【００１５】
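In the decoding terminal 232, a receiving unit 233 receives the distribution data 231 and passes it to a separation unit 234. The separation unit 234 separates the video data from the TS packets and passes it to a decoding unit 235, which decodes the supplied data, that is, the encoded video data. The decoded video data is sent to, for example, a display device (not shown) and displayed as a video image.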
【００１６】
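When special playback display is performed at the decoding terminal 232 of this data distribution system, a special playback designation signal 226 generated in response to an operation by the user of the decoding terminal 232 is sent to the server 220 via the transmission medium 230 from, for example, a transmission medium interface unit (not shown) in the decoding terminal 232. The special playback designation signal 226 contains the kind of special playback, such as fast-forward, rewind, or frame-advance playback, and a designation of the video data or other data stored in the storage unit 229. Where the server 220 and the decoding terminal 232 are connected over a short distance, as in a home network, and the user can operate the front panel or remote controller of the server 220, the user can also input the special playback designation signal 226 directly to the server 220 by operating that front panel or remote controller.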
【００１７】
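The special playback designation signal 226 input to the server 220 is passed to a special playback control unit 221 provided in the server 220. In response to the signal, the special playback control unit 221 generates a control signal 222 for special playback control, containing the kind of special playback and the designation of the video data, and sends it to the data conversion unit 223.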
【００１８】
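Under the control of the special playback control unit 221 through control signal 222, the data conversion unit 223 reads video data from the storage unit 229. Using that video data, it then converts the data resulting from the kind of special playback designated by control signal 222 into video data that satisfies, for example, ISO/IEC 13818-2, and outputs it. In other words, the data conversion unit 223 converts the video data read from the storage unit 229 into video data that yields the special playback designated by the user (fast-forward, rewind, frame-advance playback, and so on) when the decoding unit 235 of the decoding terminal 232 simply decodes it as it would during normal playback.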
【００１９】
The data conversion processing in the data conversion unit 223 is now briefly described with reference to FIGS. 41 and 42.
【００２０】
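FIG. 41 outlines the data conversion processing by which the data conversion unit 223 converts normal playback video data encoded in MPEG2 video (the video data read from the storage unit 229) into video data that realizes fast-forward playback, as one example of special playback, while satisfying ISO/IEC 13818-2. In the figure, I denotes an I picture, P a P picture, and B a B picture. Because MPEG2 video codes pictures using prediction between pictures, the coding order (the order in which data is coded in the bitstream) can differ from the actual display order, so FIG. 41 shows both. Part (a) of FIG. 41 shows the coding order of the normal playback video data, and part (b) shows the display order when that data is decoded and displayed. Part (c) shows the coding order when conversion for special playback is performed such that a normal playback section US is followed by a fast-forward playback section FS and then by a return to a normal playback section US, and part (d) shows the display order for the conversion of part (c).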
【００２１】
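For the fast-forward playback section FS in which special playback takes place, the data conversion unit 223 extracts and uses the I pictures (I_k, I_m, I_n) from the normal playback video data of part (a) of FIG. 41, as indicated by E_k, E_m, and E_n in the figure, and, so that the decoder buffer does not fail, inserts repeat pictures B_R between those I pictures. A repeat picture B_R is a picture that repeats its prediction source picture and is handled as a B picture during decoding. Inserting repeat pictures B_R also serves to adjust the fast-forward playback speed.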
【００２２】
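FIG. 42, like FIG. 41, outlines the data conversion processing by which the data conversion unit 223 converts normal playback video data encoded in MPEG2 video (the video data read from the storage unit 229) into video data that realizes rewind playback, as one example of special playback, while satisfying ISO/IEC 13818-2. Part (a) of FIG. 42 shows the coding order of the normal playback video data, and part (b) shows the display order when that data is decoded and displayed. Part (c) shows the coding order when conversion for special playback is performed such that a normal playback section US is followed by a rewind playback section BS and then by a return to a normal playback section US, and part (d) shows the display order for the conversion of part (c).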
【００２３】
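For the rewind playback section BS in which special playback takes place, the data conversion unit 223 extracts the I pictures (I_k, I_m, I_n) from the normal playback video data of part (a) of FIG. 42, as indicated by E_k, E_m, and E_n in the figure, reverses their order, and, so that the decoder buffer does not fail, inserts repeat pictures B_R between those I pictures.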
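The two conversions described above (I-picture extraction with repeat pictures for fast-forward, and the same with reversed order for rewind) can be illustrated with a small Python sketch. It is only a schematic model: pictures are plain string labels, `build_trick_stream` and the `repeats` parameter are hypothetical names, and no real MPEG2 bitstream handling or buffer verification is attempted.

```python
from typing import List


def build_trick_stream(pictures: List[str], repeats: int,
                       reverse: bool = False) -> List[str]:
    """Keep only I pictures and insert `repeats` repeat pictures between them.

    `pictures` holds labels such as "I_k", "P", "B" in coding order.
    `reverse=True` models rewind playback (I pictures in reversed order).
    """
    intra = [p for p in pictures if p.startswith("I")]
    if reverse:
        intra.reverse()
    out: List[str] = []
    for index, picture in enumerate(intra):
        if index > 0:
            # Repeat pictures pad the stream between consecutive I pictures.
            out.extend(["B_R"] * repeats)
        out.append(picture)
    return out


normal = ["I_k", "B", "B", "P", "I_m", "B", "P", "I_n"]
fast_forward = build_trick_stream(normal, repeats=1)
rewind = build_trick_stream(normal, repeats=1, reverse=True)
```

Raising `repeats` lowers both the effective playback speed and the average bit rate, which matches the two roles the text assigns to the repeat picture.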
【００２４】
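The special playback video data converted by the data conversion unit 223 in this way is distributed to the decoding terminal 232 through the multiplexing unit 224 and the subsequent units, as described earlier.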
【００２５】
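[Problems to Be Solved by the Invention]
In conventional television broadcasting, a single image signal is displayed on the screen of an image display device and a single audio signal is output from a speaker. In recent years, however, composing one scene from multimedia data consisting of video data such as still and moving images, audio data, text data, graphic data and the like has also been considered. Methods of describing the composition of a scene from such multimedia data include HTML (HyperText Markup Language), used for web pages on the Internet; MPEG4 BIFS (Binary Format for the Scene), the scene description scheme defined in ISO/IEC 14496-1; VRML (Virtual Reality Modeling Language), defined in ISO/IEC 14772; and Java (trademark). Data that describes the composition of a scene is hereinafter called a scene description.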
【００２６】
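An example of a scene description using VRML and MPEG4 BIFS is explained with reference to FIG. 43, which shows the content of a scene description. In VRML, the scene is described by text data like that of FIG. 43; in MPEG4 BIFS, it is described by a binary encoding of this text data.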
【００２７】
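Scene descriptions in VRML and MPEG4 BIFS are expressed in basic description units called nodes, shown in bold italics in the example of FIG. 43. A node is a unit that describes a displayed object, the connection between objects, and the like, and contains data called fields that indicate the node's characteristics and attributes. For example, the Transform node in FIG. 43 is a node that can specify a three-dimensional coordinate transformation, and the translation field in that node specifies the translation amount of the coordinate origin. Some fields can specify other nodes. For example, the Transform node in FIG. 43 has a Children field indicating the group of child nodes that are coordinate-transformed by the Transform node, and this Children field groups, for example, a Shape node. To place an object to be displayed in a scene, the node representing the object is grouped with nodes representing its attributes, and those nodes are in turn grouped by a node indicating the placement position. For example, the object represented by the Shape node in FIG. 43 is placed in the scene with the translation specified by its parent Transform node applied to it.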
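The node and field structure described above can be pictured with a minimal Python model. This is an illustrative sketch only, not VRML or BIFS parsing: the `Node` class and `place_shapes` function are hypothetical, rotation and scaling are ignored, and a Transform is assumed to contribute nothing but its translation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    kind: str                                  # e.g. "Transform" or "Shape"
    translation: Vec3 = (0.0, 0.0, 0.0)        # only meaningful for Transform
    children: List["Node"] = field(default_factory=list)


def place_shapes(node: Node, origin: Vec3 = (0.0, 0.0, 0.0)) -> List[Vec3]:
    """Return the scene position of every Shape below `node`.

    Each Transform shifts the coordinate origin seen by its children.
    """
    if node.kind == "Transform":
        origin = tuple(o + t for o, t in zip(origin, node.translation))
    if node.kind == "Shape":
        return [origin]
    positions: List[Vec3] = []
    for child in node.children:
        positions.extend(place_shapes(child, origin))
    return positions


scene = Node("Transform", (1.0, 2.0, 3.0), [
    Node("Shape"),
    Node("Transform", (1.0, 0.0, 0.0), [Node("Shape")]),
])
positions = place_shapes(scene)
```

The nested Transform shows why grouping matters: the inner Shape inherits the translations of both ancestors.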
【００２８】
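The video data, audio data, and so on are arranged spatially and temporally by the scene description and displayed accordingly. For example, the MovieTexture node in FIG. 43 specifies that the moving image designated by ID 3 is to be pasted onto the surface of a cube and displayed.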
【００２９】
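[Problems to Be Solved by the Invention]
As described above, composing one scene from multimedia data consisting of video data, audio data, text data, graphic data and the like has been considered in recent years, but in a conventional data distribution system only video data is decoded and displayed during special playback.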
【００３０】
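Consequently, even if multimedia data consisting of video data, audio data, text data, graphic data and the like is distributed, only the video data is decoded and displayed during special playback. Even if data other than video, such as audio data or subtitle text, is included in what is distributed, a conventional data distribution system does not decode or display that non-video data during special playback.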
【００３１】
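It is therefore desirable to be able to decode and display data other than video data, such as audio data and subtitle text data, during special playback such as fast-forward and rewind playback as well.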
【００３２】
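Moreover, no method or means has so far been realized for distributing and decoding, during special playback, the scene description data used to compose such a scene. In a conventional data distribution system, therefore, even if one scene is composed from the multimedia data described above and that multimedia data is distributed, the scene cannot be composed during special playback. As a result, problems arise such as the displayed scene becoming discontinuous at the start and end of special playback.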
【００３３】
A method and means for distributing and decoding the scene description data during special playback as well are therefore desired.
【００３４】
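Furthermore, to distribute, decode, and display the multimedia data and scene description data during special playback, the display must preserve the synchronization between those data, and the data must be distributed in a form that satisfies evaluation criteria such as the transmission bit rate (for example, the criterion that the decoder buffer must not fail).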
【００３５】
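The present invention has been made in view of these circumstances. Its object is to provide a data processing method and apparatus that, during special playback, enable data other than video to be decoded and displayed, realize a method and means for distributing and decoding scene description data, preserve the synchronization between data, and allow the data to be distributed in a form that satisfies evaluation criteria such as the transmission bit rate.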
【００３６】
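[Means for Solving the Problems]
The data processing method of the present invention is a method for transmitting data encoded in predetermined coding units from a transmitting side to a receiving side, and solves the problems described above by comprising: a step of receiving a special playback designation signal supplied from the receiving side; a step of selecting, based on the received special playback designation signal and in accordance with bit rate adjustment of the data to be output, the coding units to be output for the data used for special playback at the receiving side; a step of converting time information related to playback of the selected coding units in accordance with the special playback; a step of changing, in accordance with the bit rate adjustment of the data to be output, scene description data that describes the display area used when the data for the special playback is output; and a step of outputting the converted time information, the changed scene description data, and the data used for the special playback to the receiving side.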
【００３７】
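The data processing apparatus of the present invention is an apparatus that transmits data encoded in predetermined coding units to a receiving side, and solves the problems described above by comprising: data conversion means that, based on a special playback designation signal supplied from the receiving side and in accordance with bit rate adjustment of the data to be output, selects the coding units to be output for the data used for special playback at the receiving side, and converts time information related to playback of the selected coding units in accordance with the special playback; filter means that, based on the special playback designation signal and in accordance with the bit rate adjustment of the data to be output, changes scene description data that describes the display area used when the data for the special playback is output; and transmission means that outputs to the receiving side the time information converted by the data conversion means, the scene description data changed by the filter means, and the data used for the special playback.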
【００４０】
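That is, according to the present invention, normal playback data is converted into special playback data by, for example, calculating and rewriting the display time and the display duration or display end time of each display unit in accordance with the special playback, so that the decoding terminal can display the data with the synchronization between data preserved even during special playback. Also according to the present invention, display units in the normal playback data are selected for distribution so as to satisfy an evaluation criterion such as the bit rate, so that data satisfying the criterion can be distributed even during special playback. Further, display units in the normal playback data are converted before output so as to satisfy an evaluation criterion such as the bit rate, which likewise allows data satisfying the criterion to be distributed even during special playback.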
【００４１】
[Embodiments of the Invention]
Preferred embodiments of the present invention are described below with reference to the drawings.
【００４２】
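FIG. 1 shows a configuration example of a data distribution system according to an embodiment of the present invention, in which multimedia data such as video data (still and moving images), audio data, text data, and graphic data, together with scene description data and the like, is distributed via a transmission medium and then received, decoded, and displayed at a decoding terminal. The following description takes as an example the case where video data and the like are packetized into a transport stream (TS) defined in ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 13818-1 (so-called MPEG2 Systems) and distributed in that form.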
【００４３】
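In FIG. 1, a server 10 includes a storage unit 9 that stores multimedia data such as video data (still and moving images), audio data, text data, and graphic data, as well as scene description data and the like. Data read from the storage unit 9 is sent to a multiplexing unit 4 via, for example, a data conversion unit 7 described later. The multiplexing unit 4 packetizes the data output from the data conversion unit 7 into a TS. The TS packets are formed into distribution data 22 by a transmission unit 5, output to a transmission medium 21, and distributed to, for example, a decoding terminal 12. The TS distribution data 22 is transmitted using the protocol used on the transmission medium 21. For example, a TS that satisfies ISO/IEC 13818-1 can be transmitted by the method defined in IEC 61883, "Digital Interface for consumer audio/video equipment", over a transmission medium conforming to, for example, the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard. The multiplexing unit 4 and the transmission unit 5 may be integrated into one unit.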
【００４４】
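In the decoding terminal 12, a receiving unit 13 receives the distribution data 22 and passes it to a separation unit 14. The separation unit 14 separates the data from the TS packets and passes each item to the corresponding one of a plurality of decoding units 15_1 to 15_n. Each of the decoding units 15_1 to 15_n decodes the encoded data supplied to it.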
【００４５】
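When scene description data describing the composition of a scene is distributed, a scene composition unit 16 composes the data decoded by the decoding units 15_1 to 15_n in accordance with that scene description data. The composed data is sent to, for example, a display device and speakers (not shown), and is displayed and sounded as the scene image and audio. A plurality of decoding terminals 12 may be connected.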
【００４６】
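When special playback display is performed at the decoding terminal 12 of this data distribution system, a special playback designation signal 6 generated in response to an operation by the user of the decoding terminal 12 is sent to the server 10 via the transmission medium 21 from, for example, a transmission medium interface unit (not shown) in the decoding terminal 12. The special playback designation signal 6 contains the kind of special playback, such as fast-forward, rewind, frame-advance, or slow playback, and a designation of the data stored in the storage unit 9. Where the server 10 and the decoding terminal 12 are connected over a short distance, as in a home network, and the user can operate the front panel or remote controller of the server 10, the user can also input the special playback designation signal 6 directly to the server 10 by operating that front panel or remote controller.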
【００４７】
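The special playback designation signal 6 input to the server 10 is passed to a special playback control unit 1 provided in the server 10. In response to the signal, the special playback control unit 1 generates a control signal 2 for special playback control, containing the kind of special playback and the designation of the data, and sends it to the data conversion unit 7. Any number of data conversion units 7 may be provided, according to the number of data items to be distributed.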
【００４８】
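Under the control of the special playback control unit 1 through control signal 2, the data conversion unit 7 reads data from the storage unit 9 and converts it into special playback data that realizes the kind of special playback designated by control signal 2.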
【００４９】
The detailed configuration and operation of the data conversion unit 7 in the data distribution system of this embodiment are described below.
【００５０】
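FIG. 2 shows the detailed configuration of the server 10 of a data distribution system that includes the data conversion unit 7 according to a first embodiment of the present invention. The components other than the data conversion unit 7 operate as described above, so their detailed description is omitted.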
【００５１】
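In FIG. 2, the data conversion unit 7 of the first embodiment includes a reading unit 17, which reads data from the storage unit 9 under the control of control signal 2 from the special playback control unit 1, and a time information rewriting unit 19, which rewrites the time information encoded in the output data in accordance with the special playback. When there are several data conversion units 7, the reading unit 17 may be shared by all of them.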
【００５２】
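The reading unit 17 reads from the storage unit 9 the normal playback data designated by control signal 2 from the special playback control unit 1 and sends it to the time information rewriting unit 19.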
【００５３】
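The time information rewriting unit 19 converts the time information of the normal playback data read from the storage unit 9 by the reading unit 17 into the time information that the data has after conversion for the special playback, and encodes it in the output data. The time information of the data here means the data arrival time, display start time, display end time, display duration, decoding time, and so on. For audio data these are really times related to sound output, but since image display and sound output are related, expressions such as display start time, display end time, and display duration are used for audio as well, both here and in the rest of the description. In the first embodiment, the data whose time information has been rewritten by the time information rewriting unit 19 is sent to the multiplexing unit 4.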
【００５４】
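The time information conversion performed by the time information rewriting unit 19 of the data conversion unit 7 is described with reference to FIG. 3, which shows an example of the conversion for fast-forward playback.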
【００５５】
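Part (a) of FIG. 3 shows the display timing of the normal playback data read from the storage unit 9 when the time information rewriting unit 19 does not perform time information conversion for special playback (that is, when the decoding terminal 12 performs normal playback). In some coding methods, such as MPEG2 video, the actual display order can differ from the coding order (the order in which data is coded in the bitstream), but for clarity the example of FIG. 3 follows display order. AU30, AU31, AU32 and so on in FIG. 3 each represent one display unit of data; for video data, a display unit corresponds to a picture. Data is normally encoded per display unit. This display unit, that is, coding unit, is hereinafter called an AU (access unit). One AU starts being displayed at its display start time Ts and stops being displayed at its display end time Te, a display duration ΔT later. The display duration ΔT of one AU generally depends on the coding method.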
【００５６】
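Part (b) of FIG. 3 shows the display timing of the converted data when the time information rewriting unit 19 performs time information conversion for special playback (fast-forward playback in this case) on the normal playback data read from the storage unit 9, that is, when the decoding terminal 12 performs special playback. Specifically, part (b) shows the display timing for the case where a fast-forward playback section (special playback section) begins partway through AU30' in a normal playback section, AU31' falls in the fast-forward playback section, and AU32', which follows AU31', falls in a normal playback section.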
【００５７】
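When fast-forward playback is performed as special playback as in FIG. 3, the relationship between a time T on the timeline t without conversion for special playback (hereinafter, the pre-conversion timeline t) and the corresponding time T' on the timeline t' with that conversion applied (hereinafter, the post-conversion output timeline t') changes every time special playback is performed.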
【００５８】
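For this reason, the data conversion unit 7 (time information rewriting unit 19) of this embodiment calculates the time T' on the post-conversion output timeline t' from the special playback start time To' on the output timeline t' and the special playback start time To on the pre-conversion timeline t (the start time on t that corresponds to To'), as in equation (1).
T' = To' + (T - To) / n   (1)
Here n in equation (1) is the playback speed during special playback: n is 2 for double-speed playback and takes a negative value for rewind playback.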
【００５９】
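During normal playback, on the other hand, the time T' on the output timeline t' is calculated from the special playback end time Ti' on the output timeline t' and the special playback end time Ti on the pre-conversion timeline t (the end time on t that corresponds to Ti'), as in equation (2).
T' = Ti' + (T - Ti)   (2)
Because the immediately preceding special playback end time does not change during normal playback, the special playback start time at the start of the next special playback is obtained from equation (2) as in equation (3).
To' = Ti' + (To - Ti)   (3)
Based on equations (1) to (3), the data conversion unit 7 can calculate, during both normal and special playback, the display start time Ts' and display end time Te' of an AU on the output timeline t' from its display start time Ts and display end time Te on the pre-conversion timeline t. The display duration ΔT' is calculated either by multiplying the pre-conversion display duration ΔT by 1/n (n being the playback speed) or by subtracting the display start time Ts' from the display end time Te'.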
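Equations (1) to (3) can be tried out with a short Python sketch. It is an illustration under stated assumptions rather than the patented implementation: `TrickSegment` and `convert_time` are hypothetical names, the two timelines are assumed to coincide at time 0 before the first special playback, segments are sorted and non-overlapping, and only forward speeds (n > 0) are handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrickSegment:
    t_start: float  # To: special playback start on the pre-conversion timeline t
    t_end: float    # Ti: special playback end on the pre-conversion timeline t
    speed: float    # n: 2.0 for double speed, 0.5 for half-speed slow playback


def convert_time(t: float, segments: List[TrickSegment]) -> float:
    """Map a time T on timeline t to the time T' on the output timeline t'."""
    ref_in, ref_out = 0.0, 0.0  # (Ti, Ti') of the last finished segment
    for seg in segments:
        if t < seg.t_start:
            break
        start_out = ref_out + (seg.t_start - ref_in)           # equation (3)
        if t < seg.t_end:
            return start_out + (t - seg.t_start) / seg.speed   # equation (1)
        ref_in = seg.t_end
        ref_out = start_out + (seg.t_end - seg.t_start) / seg.speed
    return ref_out + (t - ref_in)                              # equation (2)


fast = [TrickSegment(t_start=10.0, t_end=20.0, speed=2.0)]
ts_out = convert_time(14.0, fast)       # display start Ts' of an AU
te_out = convert_time(16.0, fast)       # display end Te' of the same AU
duration_out = te_out - ts_out          # display duration as Te' - Ts'
```

With these numbers an AU shown from 14 to 16 on the original timeline is shown from 12 to 13 on the output timeline, so its duration is halved, in agreement with the 1/n rule in the text.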
【００６０】
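In this embodiment, the special playback start time, special playback end time, and special playback speed n are specified to the data conversion unit 7 by the special playback control unit 1 together with control signal 2. They may instead be specified by another data conversion unit (not shown). For example, if the data distribution system of this embodiment includes the data conversion unit 223 that converts video data for special playback, as shown in FIG. 40 in connection with the aforementioned Japanese Patent Application Nos. 2000-178999 and 2000-179000, and that data conversion unit 223 determines the special playback end time, start time, and speed to match the display timing of the video data, then those values may be specified directly from the data conversion unit 223 to the data conversion unit 7 of this embodiment.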
【００６１】
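In the data distribution system of this embodiment, by calculating the display start time Ts' and display end time Te' of each AU on the output timeline t', as well as the display duration ΔT', during both normal and special playback as described above, the time information rewriting unit 19 can rewrite the display time, display end time, and display duration encoded in the output data in accordance with the special playback. When time information such as the decoding time or data arrival time is also encoded in the data, the time information rewriting unit 19 can likewise convert it to time information on the post-conversion timeline t' using equations (1) and (2) and output it.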
【００６２】
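As described above, in this embodiment, when special playback is performed at the decoding terminal 12, the time information of the normal playback data is converted into the time information of the data as converted for the special playback, encoded in the data, and distributed from the server 10. Because the time information in the distribution data received by the decoding terminal 12 has already been converted for special playback at the server 10, the decoding terminal 12 needs no special processing for special playback: decoding and displaying at the timing given by the time information, such as the display time, exactly as during normal playback automatically produces the result of special playback. The decoding terminal 12 of this embodiment therefore performs no special processing for special playback and need not be a special terminal that handles special distribution data. Furthermore, since all the distributed data items are converted for the same playback speed, no synchronization offset arises between them and no drift accumulates.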
【００６３】
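Next, the time information conversion in the time information rewriting unit 19 for slow playback as the special playback is described with reference to FIG. 4, which is drawn in the same manner as FIG. 3.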
【００６４】
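Part (a) of FIG. 4, like part (a) of FIG. 3, shows the display timing of the normal playback data on the pre-conversion timeline t. AU40, AU41, AU42 and so on in FIG. 4 each represent one display unit of data. Part (b) of FIG. 4, like part (b) of FIG. 3, shows the display timing of the converted data when the time information rewriting unit 19 performs time information conversion for special playback (slow playback in this case). Specifically, it shows the case where a slow playback section begins partway through AU40' in a normal playback section, AU41' falls in the slow playback section, and AU42', which follows AU41', falls in a normal playback section.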
【００６５】
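When, for example, 0.5x playback is performed as the special playback, the data conversion unit 7 (time information rewriting unit 19) of this embodiment evaluates equation (1) with the playback speed n set to 0.5.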
【００６６】
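As the example of FIG. 4 shows, the time information conversion in the data conversion unit 7 of this embodiment is equally effective for special playback at a speed lower than normal speed. The decoding terminal 12 can therefore obtain the result of slow playback simply by decoding and displaying as in normal playback, with no special processing for slow playback.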
【００６７】
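Next, the time information conversion in the time information rewriting unit 19 for special playback such as a jump, which moves the playback position to a temporally non-contiguous display unit, is described with reference to FIG. 5, which is drawn in the same manner as FIG. 3.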
【００６８】
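Part (a) of FIG. 5, like part (a) of FIG. 3, shows the display timing of the normal playback data on the pre-conversion timeline t. AU50, AU51, AU52 and so on in FIG. 5 each represent one display unit of data. Part (b) of FIG. 5, like part (b) of FIG. 3, shows the display timing of the converted data when the time information rewriting unit 19 performs time information conversion for special playback (a jump in this case). Specifically, it shows the case where a jump is made partway through AU50' in a normal playback section: AU51, which lies between the special playback start time To' (the start of the jump) and the special playback end time Ti' (the end of the jump), is not output, and AU52', which comes after the special playback end time Ti', is output immediately after the special playback start time To' on AU50'.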
【００６９】
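In the case of a jump there is no playback speed during special playback, so the special playback control unit 1 specifies the special playback start time and the special playback end time to the data conversion unit 7. Since the special playback start time To on the pre-conversion timeline t and the special playback start time To' on the post-conversion timeline t' can be converted into each other by equation (3), the start time may be specified on either timeline. For the special playback end time, both Ti and Ti', on the pre-conversion and post-conversion timelines respectively, are specified. However, when the special playback end time Ti' on the post-conversion timeline t' is equal to the special playback start time To', Ti' need not be specified.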
【００７０】
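In the example of FIG. 5, the data conversion unit 7 does not output AU51, which lies between the jump start time To' and end time Ti'. AU50, whose display straddles the jump start time To', is either output with its time information changed so that its display end time becomes To', or not output at all. Likewise, AU52, whose display straddles the jump end time Ti', is either output with its time information changed so that its display time becomes Ti', or not output at all.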
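The jump handling described above can be sketched in Python as follows. The sketch implements only the "clip and output" alternative (not the "do not output" one), and it rests on assumptions that are not in the text: `apply_jump` is a hypothetical name, AUs are `(name, Ts, Te)` tuples, no special playback precedes the jump (so To' = To), and Ti' = To'.

```python
from typing import List, Tuple

AU = Tuple[str, float, float]  # (name, display start Ts, display end Te)


def apply_jump(aus: List[AU], t_o: float, t_i: float) -> List[AU]:
    """Rewrite display times for a jump from time t_o to time t_i (t_o < t_i)."""
    shift = t_i - t_o  # amount removed from the timeline; T' = T - shift after Ti
    out: List[AU] = []
    for name, ts, te in aus:
        if te <= t_o:                      # entirely before the jump: unchanged
            out.append((name, ts, te))
        elif ts < t_o:                     # straddles the jump start: end at To'
            out.append((name, ts, t_o))
        elif te <= t_i:                    # entirely inside the jump: dropped
            continue
        elif ts < t_i:                     # straddles the jump end: start at Ti'
            out.append((name, t_o, te - shift))
        else:                              # after the jump: equation (2)
            out.append((name, ts - shift, te - shift))
    return out


source = [("AU50", 0.0, 10.0), ("AU51", 10.0, 20.0), ("AU52", 20.0, 30.0)]
jumped = apply_jump(source, t_o=5.0, t_i=25.0)
```

Here AU50 is cut short at 5.0, AU51 disappears, and the remainder of AU52 follows immediately, so the terminal sees one continuous timeline.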
【００７１】
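As the example of FIG. 5 shows, the time information conversion in the data conversion unit 7 of this embodiment is equally effective for special playback such as a jump that moves the playback position to a temporally non-contiguous display unit. The decoding terminal 12 can therefore obtain the result of the jump simply by decoding and displaying as in normal playback, with no special processing for the jump.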
【００７２】
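Also, according to the present invention, converting the scene description data that describes the composition of a scene in accordance with the special playback makes it possible to distribute and decode the scene description data during special playback as well, which avoids problems such as the displayed scene becoming discontinuous at the start and end of special playback.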
【００７３】
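The examples above assume that time information such as the display time and decoding time is encoded in and attached to the data itself, and that the time information rewriting unit 19 of the data conversion unit 7 rewrites that time information before output. Alternatively, when the time information is attached to the data by the multiplexing unit 4, the data conversion unit 7 notifies the multiplexing unit 4 of the change in time information and the multiplexing unit 4 attaches the changed time information to the data. Similarly, when the time information is attached by the transmission unit 5, the data conversion unit 7 notifies the transmission unit 5 of the change and the transmission unit 5 attaches the changed time information. The same applies to each of the other embodiments described below.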
【００７４】
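In a data distribution system that distributes, decodes, and displays multimedia data such as video, audio, text, and graphic data together with scene description data, there is also a demand to distribute data that satisfies evaluation criteria such as the bit rate even during special playback.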
【００７５】
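Distribution data during fast-forward playback, as in the example of FIG. 3, is compressed along the time axis compared with distribution data during normal playback, so its average bit rate is higher. In a system that distributes data via a transmission medium, as in this embodiment, the upper limit of the bit rate permitted for distribution is determined by the transmission capacity of the medium and the capability of the decoding terminal; if the bit rate of the distribution data exceeds that limit, data is delayed or lost. Limiting the bit rate of the distribution data in such a case is considered to prevent it from exceeding the upper limit permitted for distribution.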
【００７６】
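Similarly, if the amount of data contained in the distribution data within a given time increases, decoding, scene composition, and display become more difficult, and there is a risk that the decoding terminal will no longer display the scene correctly. Limiting the difficulty of decoding, scene composition, and display of the distribution data in such a case is considered to reduce that risk.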
【００７７】
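The second embodiment of the present invention therefore makes it possible to distribute data that satisfies evaluation criteria such as the bit rate even during special playback, thereby preventing data delay and loss and allowing the decoding terminal to display the scene correctly.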
【００７８】
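FIG. 6 shows the detailed configuration of the server 10 of a data distribution system that includes the data conversion unit 7 according to the second embodiment of the present invention.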
【００７９】
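In FIG. 6, the data conversion unit 7 includes, in addition to the reading unit 17, which reads data from the storage unit 9 under the control of control signal 2 from the special playback control unit 1, and the time information rewriting unit 19, which rewrites the time information encoded in the output data in accordance with the special playback, a scheduler 18 that selects the AUs to be output on the basis of an evaluation criterion such as the bit rate. The processing in which the data conversion unit 7 converts time information from the timeline of the pre-conversion normal playback data to the post-conversion timeline and encodes it in the output data is the same as in the first embodiment.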
【００８０】
The conversion processing in the scheduler 18 of the data conversion unit 7 of the second embodiment is described with reference to FIGS. 7 and 8.
【００８１】
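FIG. 7 is drawn in the same manner as FIG. 3. Part (a) of FIG. 7, like part (a) of FIG. 3, shows the display timing of the normal playback data on the pre-conversion timeline t. AU70, AU71, AU72, AU73 and so on in FIG. 7 each represent one display unit of data. Parts (b), (c), and (d) of FIG. 7, like part (b) of FIG. 3, show the display timing of the converted data when the time information rewriting unit 19 performs time information conversion for special playback (fast-forward playback in this case) and the scheduler 18 of this embodiment selects AUs according to the bit rate permitted for distribution. Part (b) shows the case where, in the fast-forward playback section (special playback section), the scheduler 18 selects both AU71 and AU72, the time information rewriting unit 19 converts their time information to give AU71' and AU72', and the following AU73' falls in a normal playback section. Part (c) shows the case where, in the fast-forward playback section, the scheduler 18 selects only AU71, which the time information rewriting unit 19 converts to AU71', while AU72 is not output, and the following AU73' falls in a normal playback section. Part (d) shows the case where, in the fast-forward playback section, the scheduler 18 selects neither AU71 nor AU72, and the following AU73' falls in a normal playback section.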
【００８２】
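In the special playback section (fast-forward playback section) on the pre-conversion timeline t shown in part (a) of FIG. 7 there are two AUs, AU71 and AU72. In the first embodiment, their time information is converted according to the special playback speed and they are output as AU71' and AU72'. However, as FIG. 8 shows, performing special playback (fast-forward playback in the example of FIGS. 7 and 8) changes the bit rate of the distribution data according to the playback speed. If the changed bit rate exceeds the bit rate allowed by the transmission medium or the decoding terminal, data delay, loss, and the like occur.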
【００８３】
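The scheduler 18 of the data conversion unit 7 of this embodiment therefore selects which AUs to output and which not to output so that the bit rate permitted for the distribution data is satisfied. For example, when the permitted bit rate is at least the bit rate BR81 obtained by outputting only AU71 and not AU72, but less than the bit rate BR80 obtained by outputting both AU71 and AU72, the scheduler 18 decides not to output AU72. The converted output is then as shown in part (c) of FIG. 7. When the permitted bit rate is less than BR81, the scheduler 18 decides to output neither AU71 nor AU72, and the converted output is as shown in part (d) of FIG. 7. When the permitted bit rate is at least BR80, the scheduler 18 decides to output both AU71 and AU72, and the converted output is as shown in part (b) of FIG. 7. The time information of the AUs selected and output by the scheduler 18 in this way is then converted by the time information rewriting unit 19 on the basis of the special playback speed, as described above.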
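The three cases above (both AUs, AU71 only, neither) fall out of a simple budget check, sketched below in Python. The function name `select_aus`, the AU sizes, and the one-second output duration are illustrative assumptions; the text does not say how the scheduler 18 is implemented, and this sketch simply stops at the first AU that no longer fits.

```python
from typing import List, Tuple


def select_aus(aus: List[Tuple[str, int]], out_duration_s: float,
               max_bitrate_bps: float) -> List[str]:
    """Return the names of the AUs to output in a special playback section.

    `aus` lists (name, size in bits) in output order. AUs are kept, in order,
    while the section's bit rate (bits kept / output duration) stays within
    `max_bitrate_bps`; selection stops at the first AU that does not fit.
    """
    budget_bits = max_bitrate_bps * out_duration_s
    kept: List[str] = []
    used_bits = 0
    for name, size_bits in aus:
        if used_bits + size_bits > budget_bits:
            break
        kept.append(name)
        used_bits += size_bits
    return kept


section = [("AU71", 4000), ("AU72", 4000)]   # hypothetical sizes
# With a 1 s output section: BR80 = 8000 bit/s (both AUs), BR81 = 4000 bit/s.
both = select_aus(section, 1.0, 8000)        # case of FIG. 7 (b)
only_first = select_aus(section, 1.0, 5000)  # case of FIG. 7 (c)
neither = select_aus(section, 1.0, 3000)     # case of FIG. 7 (d)
```

Other criteria mentioned in the text, such as a node count or polygon count per unit time, could be handled the same way by replacing the size in bits with the relevant cost.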
【００８４】
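As described above, the second embodiment selects the display units (AUs) of the normal playback data to be output so as to satisfy an evaluation criterion such as the bit rate, which makes it possible to distribute data satisfying the criterion even during special playback. The evaluation criterion is not limited to the bit rate. It may be a criterion expressing the difficulty of decoding, scene composition, or display, such as the number of polygons or the number of nodes in the scene description data permitted in a given time. It may also be any criterion that can limit the data output in a given time, such as the number of characters in text data.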
【００８５】
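In addition, when selecting which display units (AUs) to output, the data conversion unit 7 of the second embodiment can give priority to display units encoded without prediction between display units and choose not to output display units encoded with prediction. The decoding terminal can then perform predictive decoding using the display units encoded without prediction as prediction sources.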
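One way to picture this priority rule is the following Python sketch. It is not taken from the source: `select_intra_first` is a hypothetical name, and the rule used for predicted AUs (keep one only if the AU immediately before it in coding order was kept and the budget still allows it) is a conservative assumption made here so that no kept AU loses its prediction source.

```python
from typing import List, Tuple

CodedAU = Tuple[str, int, bool]  # (name, size in bits, coded without prediction?)


def select_intra_first(aus: List[CodedAU], budget_bits: int) -> List[str]:
    """Prefer AUs coded without prediction; add predicted AUs only if room remains."""
    kept = [False] * len(aus)
    used = 0
    # Pass 1: intra-coded AUs get first call on the budget.
    for i, (_, size, intra) in enumerate(aus):
        if intra and used + size <= budget_bits:
            kept[i] = True
            used += size
    # Pass 2 (assumed rule): a predicted AU is kept only when its predecessor
    # in coding order was kept, so its prediction chain is unbroken.
    for i, (_, size, intra) in enumerate(aus):
        if not intra and i > 0 and kept[i - 1] and used + size <= budget_bits:
            kept[i] = True
            used += size
    return [aus[i][0] for i in range(len(aus)) if kept[i]]


stream = [("I0", 100, True), ("P1", 40, False),
          ("P2", 40, False), ("I3", 100, True)]
tight = select_intra_first(stream, 200)
roomy = select_intra_first(stream, 250)
```

With a budget of 200 bits only the two intra-coded AUs survive; with 250 bits one predicted AU is added after its intra-coded source.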
【００８６】
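The second embodiment produces distribution data that satisfies an evaluation criterion such as the bit rate by choosing whether or not to output each AU. As in the third embodiment described below, it is also possible to output distribution data satisfying such a criterion by converting the content of the AUs themselves.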
【００８７】
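FIG. 9 shows the detailed configuration of the server 10 of a data distribution system according to the third embodiment of the present invention.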
【００８８】
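In FIG. 9, the server 10 is the same as in the first and second embodiments except that a filter 23 is provided at the output stage of the data conversion unit 7, which corresponds to either of those embodiments.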
【００８９】
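The filter 23 converts the data already converted for special playback by the data conversion unit 7 of the first or second embodiment, that is, the AUs themselves, so that an evaluation criterion such as the bit rate is satisfied. There may be a plurality of data conversion units 7 and filters 23. In other words, whereas the data conversion unit 7 of the second embodiment only selects which AUs to output, the filter 23 of the third embodiment outputs data satisfying an evaluation criterion such as the bit rate by converting the AUs themselves. For text data, for example, it reduces the number of characters contained in one AU, thereby reducing the amount of data distributed, and outputs data that satisfies the desired bit rate.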
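The text example above (reducing the number of characters in one AU) might look like the following Python sketch. The function name `fit_text_au`, the choice of UTF-8, and the policy of simply truncating the end of the text are assumptions for illustration; the source does not say which characters the filter 23 removes.

```python
def fit_text_au(text: str, duration_s: float, max_bitrate_bps: float,
                encoding: str = "utf-8") -> str:
    """Shorten a text AU so that it fits the bit budget of its display duration."""
    budget_bytes = int(max_bitrate_bps * duration_s // 8)
    data = text.encode(encoding)
    if len(data) <= budget_bytes:
        return text
    # Cut at the byte budget; errors="ignore" drops a trailing partial character.
    return data[:budget_bytes].decode(encoding, errors="ignore")


short_ascii = fit_text_au("hello world", duration_s=1.0, max_bitrate_bps=40)
short_kanji = fit_text_au("日本語", duration_s=1.0, max_bitrate_bps=32)
unchanged = fit_text_au("ok", duration_s=1.0, max_bitrate_bps=800)
```

Because the AU's time information has already been rewritten upstream, `duration_s` here would be the converted display duration ΔT', which is shorter during fast-forward playback and therefore gives a smaller budget.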
【００９０】
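According to this embodiment, converting the AUs themselves makes it possible to distribute data that satisfies an evaluation criterion such as the bit rate even during special playback. Moreover, the time information of the AUs input to the filter 23 has already been converted for the special playback by the data conversion unit 7 of the first or second embodiment, so the decoding terminal 12 needs no special processing for special playback: as long as it decodes and displays as in normal playback, the special playback display is obtained automatically.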
【００９１】
Specific examples of the filter 23 are described below.
【００９２】
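A first specific example of the filter 23 handles the data in a scene description in division units and converts and outputs the scene description per division unit so that an evaluation criterion such as the transmission capacity is satisfied. Using the filter 23 of this first example in combination with the data conversion unit 7 of the first or second embodiment makes it possible to distribute data that satisfies an evaluation criterion such as the bit rate even during special playback.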
【００９３】
The operation of the filter 23 of the first specific example, as applied to the third embodiment of the present invention, is described below.
【００９４】
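The filter 23 of the first specific example converts the input scene description on the basis of layering information. When outputting a scene description, the filter 23 obtains decoding terminal information indicating the decoding and display capability of the decoding terminal 12. Decoding terminal information is information indicating that capability, such as the picture frame used when the decoding terminal 12 displays a scene description, the upper limit on the number of nodes, the upper limit on the number of polygons, and the upper limit on the multimedia data, such as audio and video, that may be included. The layering information input to the filter 23 consists of this decoding terminal information together with information indicating the transmission capacity of the transmission medium 21 used to distribute the scene description. On the basis of the layering information, the filter 23 converts the input scene description into scene description data that has a layered structure.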
【００９５】
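In the data distribution system of the third embodiment with the filter 23 of the first specific example, converting the scene description on the basis of the layering information in this way makes it possible to distribute scene description data suited to the transmission medium 21 used for distribution and matched to the performance of the decoding terminal 12.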
【００９６】
FIG. 10 shows the procedure of the scene description conversion processing in the filter 23.
【００９７】
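In FIG. 10, the filter 23 first divides the scene description into division candidate units, described later, in step S200. In FIG. 10 the number of a division candidate is denoted by n. Because the input scene description is converted into scene description data consisting of several layers, the layer of the output scene description data is denoted by m. The layer number m starts at 0, and a smaller number denotes a more basic layer.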
【００９８】
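Next, in step S201, the filter 23 judges on the basis of the layering information whether division candidate n can be output to the current layer. For example, if the layering information limits the number of bytes of data permitted in the current layer, the filter checks whether the output scene description of the current layer stays within that limit even with division candidate n added. If it is judged in step S201 that division candidate n cannot be output to the current layer, processing proceeds to step S202; if it can, processing proceeds to step S203.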
【００９９】
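In step S202, the filter 23 increments the layer number m by 1. That is, it ends output to the current layer m and from then on outputs to the scene description data of a new layer. Processing then proceeds to step S203.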
【０１００】
In step S203, the filter 23 outputs division candidate n to the current layer m. Processing then proceeds to step S204.
【０１０１】
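In step S204, the filter 23 judges whether all division candidates have been processed, and if so ends the conversion processing. If division candidates remain, processing proceeds to step S205.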
【０１０２】
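In step S205, the filter 23 increments the division candidate number n by 1, making the next division candidate the processing target, and repeats the processing from step S201.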
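Steps S200 to S205 above can be traced with a compact Python sketch. It assumes, for illustration only, that step S200 has already produced the division candidates as `(name, size in bytes)` pairs and that the layering information is a list of cumulative byte limits, `cumulative_limits[m]` being the total permitted for layers 0 to m; `layer_scene` is a hypothetical name. Candidate 0 always goes to layer 0.

```python
from typing import List, Tuple


def layer_scene(candidates: List[Tuple[str, int]],
                cumulative_limits: List[int]) -> List[List[str]]:
    """Assign division candidates D0, D1, ... to layers m = 0, 1, ..."""
    layers: List[List[str]] = [[]]
    m = 0          # current layer number
    total = 0      # bytes output so far over all layers
    for n, (name, size) in enumerate(candidates):      # S204/S205: next n
        limit = (cumulative_limits[m] if m < len(cumulative_limits)
                 else float("inf"))
        if n > 0 and total + size > limit:             # S201: does not fit
            m += 1                                     # S202: open a new layer
            layers.append([])
        layers[m].append(name)                         # S203: output to layer m
        total += size
    return layers


candidates = [("D0", 300), ("D1", 200), ("D2", 150)]   # hypothetical sizes
two_layers = layer_scene(candidates, [400, 700])
three_layers = layer_scene(candidates, [400, 550, 800])
```

As in the flowchart, a candidate that does not fit is moved to the next layer without a second check, so a single candidate larger than a layer's allowance would still be output there.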
【０１０３】
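Taking MPEG4 BIFS as an example, the division performed in the scene description conversion processing of the filter 23 shown in FIG. 10 is now explained with reference to FIG. 11.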
【０１０４】
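The content of the scene description data in FIG. 11 is explained first, followed by the division performed in the scene description processing of the filter 23.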
【０１０５】
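In FIG. 11, the Transform node 302 is a node that can specify a three-dimensional coordinate transformation; the translation amount of the coordinate origin can be specified in its translation field 303. Some fields can specify other nodes, so the scene description forms a tree structure as in FIG. 12. In FIG. 12, ellipses represent nodes, broken lines between nodes represent event propagation paths, and solid lines between nodes represent parent-child relationships. A node that represents a field of a parent node is called a child node of that parent. For example, the Transform node 302 in FIG. 11 has a Children field 304 indicating the group of child nodes that are coordinate-transformed by the Transform node, and the TouchSensor node 305 and Shape node 306 are grouped in it as child nodes. A node that groups child nodes in a Children field in this way is called a grouping node. Grouping nodes are the nodes defined in section 4.6.5 of ISO/IEC 14772-1, namely nodes that have a field consisting of a list of nodes. As defined in that section, there are special exceptions whose field name is not Children, but in the following the term Children field is taken to include such exceptions.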
【０１０６】
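To place an object to be displayed in a scene, the node representing the object is grouped with nodes representing its attributes, and these are grouped again by a node indicating the placement position. The object represented by the Shape node 306 in FIG. 11 is placed in the scene with the translation specified by its parent, the Transform node 302, applied to it. The scene description of FIG. 11 contains a Sphere node 307 representing a sphere, a Box node 312 representing a cube, a Cone node 317 representing a cone, and a Cylinder node 322 representing a cylinder; decoding and displaying this scene description gives the result shown in FIG. 13.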
【０１０７】
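A scene description can also include user interaction. ROUTE in FIG. 11 represents event propagation. ROUTE 323 indicates that when the touchTime field of the TouchSensor node 305, which is assigned the identifier 2, changes, its value is propagated as an event to the startTime field of the TimeSensor node 318, which is assigned the identifier 5. In VRML an identifier is an arbitrary character string following the keyword DEF, whereas in MPEG4 BIFS a numeric value called a node ID (nodeID) is used as the identifier. The TouchSensor node 305 outputs the time of selection as a touchTime event when the user selects the Shape node 306 grouped in the Children field 304 of its parent, the Transform node 302. A sensor that works by being grouped, through a grouping node, together with an associated Shape node in this way is hereinafter called a Sensor node. Sensor nodes in VRML are the pointing-device sensors defined in section 4.6.7.3 of ISO/IEC 14772-1, and the associated Shape node is a Shape node grouped under the Sensor node's parent node. The TimeSensor node 318, meanwhile, outputs the elapsed time as a fraction_changed event for one second from startTime.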
【０１０８】
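Through ROUTE 324, the fraction_changed event representing the elapsed time output by the TimeSensor node 318 is propagated to the set_fraction field of the ColorInterpolator node 319, which is assigned the identifier 6. The ColorInterpolator node 319 has the function of linearly interpolating values in RGB color space. Its key and keyValue fields specify that when the input set_fraction value is 0, the RGB value [0 0 0] is output as the value_changed event, and when the input set_fraction value is 1, the RGB value [1 1 1] is output as value_changed. When the input set_fraction value lies between 0 and 1, a value linearly interpolated between the RGB values [0 0 0] and [1 1 1] is output as value_changed. For example, when the input set_fraction value is 0.2, the RGB value [0.2 0.2 0.2] is output as value_changed.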
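The interpolation described above can be reproduced with a few lines of Python. The sketch follows the text's description (component-wise linear interpolation of RGB values between key points) and is not a conforming implementation of the VRML node; `color_interpolate` is a hypothetical function name.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def color_interpolate(fraction: float, key: Sequence[float],
                      key_value: List[RGB]) -> RGB:
    """Return the colour for `fraction`, interpolating linearly between keys.

    `key` is an ascending list of fractions and `key_value` the colour at
    each key; values outside the key range are clamped to the end colours.
    """
    if fraction <= key[0]:
        return key_value[0]
    if fraction >= key[-1]:
        return key_value[-1]
    for i in range(len(key) - 1):
        k0, k1 = key[i], key[i + 1]
        if k0 <= fraction <= k1:
            w = (fraction - k0) / (k1 - k0)   # position between the two keys
            c0, c1 = key_value[i], key_value[i + 1]
            return tuple(a + w * (b - a) for a, b in zip(c0, c1))
    return key_value[-1]


key = [0.0, 1.0]
key_value = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
at_20_percent = color_interpolate(0.2, key, key_value)
```

Feeding this function the elapsed-time fraction from a one-second timer reproduces the black-to-white fade that the ROUTEs in FIG. 11 set up.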
【０１０９】
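Through ROUTE 325, the interpolated value value_changed is propagated to the diffuseColor field of the Material node 314, which is assigned the identifier 4. diffuseColor represents the diffuse color of the surface of the object represented by the Shape node 311, to which the Material node 314 belongs. The event propagation through ROUTE 323, ROUTE 324, and ROUTE 325 realizes a user interaction in which, for one second immediately after the user selects the displayed sphere, the RGB value of the displayed cube changes from [0 0 0] to [1 1 1]. This user interaction is expressed by ROUTE 323, ROUTE 324, and ROUTE 325 together with the nodes involved in event propagation, shown with bold frames in FIG. 12. Data in the scene description that is needed for user interaction in this way is hereinafter called data required for event propagation. Nodes other than those in bold frames are not related to events.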
【０１１０】
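For the scene description data of FIG. 11 given as an example above, the filter 23 of the first specific example of this embodiment divides the scene description into division candidate units in step S200 of FIG. 10.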
【０１１１】
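Here the Children field of a grouping node is used as the division unit so that the so-called Node Insertion command can be used. If the data required for event propagation for user interaction is not to be divided, this gives the three division candidates D0, D1, and D2 shown in FIG. 11.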
【０１１２】
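The division unit containing the Group node 300, the top-level node of the input scene description, is division candidate D0 with n = 0. The nodes from the Transform node 315 down form division candidate D1 with n = 1. The Shape node 316 in division candidate D1 belongs to the Children field of the Transform node 315, which is a grouping node, so it could also be made a separate division candidate.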
【０１１３】
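In this example, however, the Transform node 315 has nothing in its Children field other than the Shape node 316, so the Shape node 316 is not made a separate division candidate. The nodes from the Transform node 320 down form division candidate D2 with n = 2. Similarly, the nodes from the Shape node 321 down could be made a separate division candidate.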
【０１１４】
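Division candidate D0 (n = 0) is always output to layer m = 0. For division candidate D1 (n = 1), step S201 of FIG. 10 judges on the basis of the layering information whether it can be output to layer m = 0.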
【０１１５】
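FIG. 14 shows an example of this judgment when the layering information specifies the amount of data permitted in each layer of the output scene description data. In the example of part A of FIG. 14, outputting division candidate D1 (n = 1) to layer m = 0 as well would exceed the amount of data permitted for layer m = 0, so it is judged that D1 cannot be output to layer m = 0.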
【０１１６】
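Accordingly, by step S202 of FIG. 10, the output of layer m = 0 shown in part B of FIG. 14 is determined to contain only division candidate D0 (n = 0), and subsequent output goes to layer m = 1. By step S203, division candidate D1 (n = 1) is output to layer m = 1.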
【０１１７】
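Applying the same procedure to the next division candidate, D2 (n = 2), shows that, as in part A of FIG. 14, outputting D2 to layer m = 1 does not exceed the amount of data permitted for layers m = 0 and m = 1 combined. It is therefore determined that D2 is output to the same layer, m = 1, as D1, as shown in part C of FIG. 14.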
【０１１８】
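By this procedure, the filter 23 converts the input scene description into scene description data output consisting of two layers: the converted scene description data output of layer m = 0 shown in part B of FIG. 14 and that of layer m = 1 shown in part C of FIG. 14.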
【０１１９】
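The conversion example shown in part A of FIG. 15 takes the same scene description input as part A of FIG. 14 but converts it on the basis of different layering information, resulting in scene description data output consisting of three layers.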
【０１２０】
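That is, in the same way as in FIG. 14, the scene description shown in part A of FIG. 15 is converted into the converted scene description data output of layer m = 0 shown in part B, that of layer m = 1 shown in part C, and that of layer m = 2 shown in part D of FIG. 15.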
【０１２１】
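In this conversion result, for a transmission medium whose transmission capacity is so low that it can carry only the amount of data permitted for layer m = 0, only the scene description data of layer m = 0 shown in part B of FIG. 15 is distributed.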
【０１２２】
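Even with only the scene description of layer m = 0, the data required for event propagation for user interaction has not been divided, so the decoding terminal 12 can provide the same user interaction as before conversion.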
【０１２３】
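For a transmission medium whose transmission capacity is sufficient for the combined amount of data of layers m = 0 and m = 1, the scene description data of both layers, m = 0 shown in part B of FIG. 15 and m = 1 shown in part C of FIG. 15, is distributed.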
【０１２４】
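The scene description data of layer m = 1 is inserted into the scene description of layer m = 0 by the Node Insertion command, so the decoding terminal 12 can decode and display the same scene description as before conversion.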
【０１２５】
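By converting the scene description on the basis of layering information that changes over time, the filter 23 of the first specific example can also adapt to changes in the transmission capacity of the transmission medium 21. The same effect is obtained when the converted scene description data is recorded on the transmission medium 21.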
【０１２６】
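Likewise, in the conversion result of FIG. 15, for a decoding terminal 12 that receives, decodes, and displays the scene description but whose decoding and display capability is so low that it can handle only the amount of data permitted for layer m = 0, only the scene description data of layer m = 0 shown in part B of FIG. 15 can be distributed.
Even with only the scene description of layer m = 0, the data required for event propagation for user interaction has not been divided, so the decoding terminal 12 can provide the same user interaction as before conversion.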
【０１２７】
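For a decoding terminal 12 whose decoding and display capability is sufficient for the combined amount of data of layers m = 0 and m = 1, the scene description data of both layers, m = 0 shown in part B of FIG. 15 and m = 1 shown in part C of FIG. 15, is distributed.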
【０１２８】
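The scene description data of layer m = 1 is inserted into the scene description of layer m = 0 by the Node Insertion command, so the decoding terminal 12 can decode and display the same scene description as before conversion.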
【０１２９】
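As described above, by converting the scene description on the basis of decoding terminal information that changes over time, the filter 23 of the first specific example can also adapt when the decoding and display capability of the decoding terminal 12 changes dynamically or when a decoding terminal 12 with different performance is added as a distribution target.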
【０１３０】
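In MPEG4 BIFS, the scene description may be layered using the command that inserts a node or using the Inline node. The EXTERNPROTO described in section 4.9 of ISO/IEC 14772-1 may also be used. EXTERNPROTO is a way of referring to a node defined in external scene description data by the node definition method called PROTO, and it can be used in MPEG4 BIFS in the same way as in VRML.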
【０１３１】
The DEF/USE mechanism described in section 4.6.2 of ISO/IEC 14772-1 names a node with DEF and allows the node so named to be referenced with USE from elsewhere in the scene description.
【０１３２】
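In MPEG4 BIFS too, a numeric identifier called a node ID is given to a node in the same way as DEF, and specifying that node ID from elsewhere in the scene description works in the same way as USE, so the same kind of reference as in VRML is possible.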
【０１３３】
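Therefore, as long as the parts that use the DEF/USE mechanism of section 4.6.2 of ISO/IEC 14772-1 are not split into different division candidates when the scene description is layered, the scene description can be converted without breaking the references from USE to the nodes named with DEF.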
【０１３４】
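FIGS. 14 and 15 use the amount of data permitted for each layer as the layering information, but the layering information may be any information from which it can be judged whether a division candidate in the scene description may be included in the scene description data of a given layer. It may be, for example, the upper limit on the number of nodes in a layer, the number of polygon data items for computer graphics in a layer, or a limit on the media data, such as audio and video, in a layer, and several kinds of layering information may be combined.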
【０１３５】
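As described above, because the filter 23 of the first specific example converts the input scene description into scene description data with a multi-layer structure, the layered structure of the scene description can be used to save transmission capacity when the scene description is transmitted.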
【０１３６】
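The filter 23 of the first specific example also makes it possible to keep part of the content described by a scene description when data must be deleted: the scene description is converted in advance into scene description data consisting of several layers, and only as many layers of scene description data are deleted as are needed to reach the amount of data to be deleted.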
【０１３７】
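What has been described above does not depend on the kind of scene description method and is effective for any scene description method that can be divided.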
【０１３８】
Next, the operation of the filter 23 of a second specific example, as applied to the third embodiment of the present invention, is described.
【０１３９】
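As shown in FIG. 16, the filter 23 of the second specific example includes a scene description processing unit 24, an ES (Elementary Stream) processing unit 25, and a control unit 26 that controls their operation. The scene description processing unit 24 changes the scene description data, and the ES processing unit 25 can change the multimedia data other than the scene description data. The ES processing unit 25 performs conversion such as re-encoding data into data of a different bit rate to match the transmission capacity and the capability of the decoding terminal. The scene description processing unit 24 adjusts the amount of data by converting the content of the scene description to match, for example, the transmission capacity of the transmission medium 21 and the processing capability of the decoding terminal 12. Using a filter 23 with these scene description and ES processing units in combination with the data conversion unit 7 of the first or second embodiment makes it possible to distribute data that satisfies an evaluation criterion such as the bit rate even during special playback. In this example, although not illustrated, the decoding unit 15 of the decoding terminal 12 includes an ES decoding unit, which decodes the ES to restore video data, audio data, and the like, and an ES scene description decoding unit, which decodes the scene description and composes a scene from the video, audio, and other data on the basis of the decoded scene description.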
【０１４０】
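To deal with the problem that transmitted data is delayed or lost when the available bandwidth of the transmission medium 21 or its traffic congestion changes, the data distribution system of the third embodiment with the filter 23 of the second specific example operates as follows.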
【０１４１】
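The transmission unit 5 of the server 10 has a function of attaching an (encoded) serial number to each packet of data sent out on the transmission path (transmission medium 21), and the receiving unit 13 of the decoding terminal 12 has a function of detecting data loss (the data loss ratio) by monitoring for gaps in the (encoded) serial numbers attached to the received packets. Alternatively, the transmission unit 5 of the server 10 has a function of attaching (encoded) time information to the data sent out on the transmission path, and the receiving unit 13 of the decoding terminal 12 has a function of monitoring the (encoded) time information attached to the data received from the transmission path and detecting the transmission delay from it. Having detected the data loss ratio or transmission delay of the transmission path in this way, the receiving unit 13 of the decoding terminal 12 sends (reports) the detected information to the transmission unit 5 of the server 10.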
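The two detection mechanisms described above can be sketched in Python as follows. The function names, the assumption that serial numbers do not wrap around, the assumption that sender and receiver clocks are comparable, and the congestion thresholds are all illustrative choices that do not come from the source.

```python
from typing import Iterable, List, Tuple


def loss_ratio(received_seq: Iterable[int]) -> float:
    """Estimate the loss ratio from gaps in the received serial numbers."""
    seen = set(received_seq)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1   # packets that should have arrived
    return (expected - len(seen)) / expected


def transit_delays(packets: Iterable[Tuple[float, float]]) -> List[float]:
    """Delay of each packet, given (send time, arrival time) pairs."""
    return [arrival - sent for sent, arrival in packets]


def is_congested(received_seq: Iterable[int],
                 packets: Iterable[Tuple[float, float]],
                 max_loss: float = 0.05, max_delay_s: float = 0.2) -> bool:
    """Judge congestion from loss ratio or delay (thresholds are hypothetical)."""
    worst_delay = max(transit_delays(packets), default=0.0)
    return loss_ratio(received_seq) > max_loss or worst_delay > max_delay_s


ratio = loss_ratio([1, 2, 4, 5])           # packet 3 never arrived
```

A receiver would report `ratio` or the measured delays back to the server, which then applies a congestion judgment of the kind shown in `is_congested`.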
【０１４２】
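The transmission unit 5 of the server 10 also has a transmission state detection function, which detects the available transmission bandwidth of the transmission path and its traffic congestion from the information, such as the data loss ratio or the transmission delay, sent from the receiving unit 13 of the decoding terminal 12. That is, the transmission state detection function judges that the transmission path is congested if the data loss is high, or if the transmission delay has increased. When a bandwidth-reservation type transmission path is used, the transmission state detection function can directly know the free bandwidth (available transmission bandwidth) usable by the server 10. When a transmission medium that is affected by weather conditions, such as radio waves, is used, the transmission bandwidth may be set in advance by the user according to the weather conditions and the like. The transmission state information detected by the transmission state detection function is sent to the control unit 26 of the filter 23.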
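The loss detection and congestion judgment described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the way the window of serial numbers is delimited, and the 5% loss threshold are all assumptions introduced here.

```python
def loss_ratio(received_seq, first, last):
    """Fraction of serial numbers in [first, last] that never arrived."""
    expected = last - first + 1
    if expected <= 0:
        raise ValueError("last must be >= first")
    # Count each serial number once, ignoring duplicates and out-of-window values.
    got = len({s for s in received_seq if first <= s <= last})
    return (expected - got) / expected


def is_congested(loss, delay, previous_delay, loss_threshold=0.05):
    """Judge congestion when loss is high or the delay has increased.

    loss_threshold is a hypothetical tuning value, not taken from the source.
    """
    return loss > loss_threshold or delay > previous_delay
```

For example, if packets 1, 2, 4, 5 and 7 arrive out of the serial numbers 1 to 8, three of eight are missing, so `loss_ratio([1, 2, 4, 5, 7], 1, 8)` gives 0.375 and the path would be judged congested under the assumed threshold.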
【０１４３】
制御部２６は、伝送路の伝送可能帯域やトラフィックの混雑状態の検出情報を元に、ＥＳ処理部２５において例えばビットレートの異なるＥＳが選択的に切り替えられるような制御を行ったり、或いは、ＥＳ処理部２５にてＩＳＯ／ＩＥＣ１３８１８（いわゆるＭＰＥＧ２）等の符号化が行われる場合にはその符号化ビットレートを調整するなどの制御を行う。すなわち、例えば伝送路が混雑していると検出された場合に、ＥＳ処理部２５からビットレートの低いＥＳを出力するようなことを行えば、データの遅延を回避することが可能となる。
【０１４４】
また例えば、サーバ１０に不特定多数の復号端末１２が接続されていて、それら復号端末１２の仕様が予め統一されておらず、様々な処理能力を持つ復号端末１２に向けて当該サーバ１０からＥＳを送信するようなシステム構成の場合、これら復号端末１２の受信部１３は伝送要求処理機能を備え、当該伝送要求処理機能は、自己の復号端末１２の処理能力に応じたＥＳを要求するための伝送要求信号をサーバ１０へ送信する。この伝送要求信号には、自己の復号端末１２の能力を表す信号も含まれる。当該伝送要求処理機能からサーバ１０へ渡される、自己の復号端末１２の能力を表す信号としては、例えばメモリサイズ、表示部の解像度、演算能力、バッファサイズ、復号可能なＥＳの符号化フォーマット、復号可能なＥＳの数、復号可能なＥＳのビットレートなどを挙げることができる。上記の伝送要求信号を受け取った送信部５は、その伝送要求信号をフィルタ２３の制御部２６へ送り、当該制御部２６は、復号端末１２の性能に適合するようなＥＳが送信されるように、ＥＳ処理部２５を制御する。なお、ＥＳ処理部２５が復号端末１２の性能に適合するようにＥＳを変換する際の画像信号変換処理については、例えば、本件出願人により既に提案がなされている画像信号変換処理方法がある。
【０１４５】
さらに、上記制御部２６は、送信部５の伝送状態検出機能により検出された伝送路の状態に応じて、ＥＳ処理部２５だけでなくシーン記述処理部２４もコントロールする。また、制御部２６は、復号端末１２が自己の復号，表示性能に応じたシーン記述を要求する復号端末である場合には、その復号端末１２の受信部１３の伝送要求処理機能から送られてきた当該復号端末自身の能力を表す信号に応じて、ＥＳ処理部２５およびシーン記述処理部２４をコントロールする。なお、制御部２６とシーン記述処理部２４、ＥＳ処理部２５は、一体の構成であっても良い。
【０１４６】
以下、制御部２６の制御の元で、ＥＳ処理部２５が、複数のＥＳの内から送信する特定のＥＳを選択する際の選択方法について説明する。
【０１４７】
上記制御部２６は、上記複数のＥＳの各ＥＳ毎に、伝送時の優先度を表す伝送優先度情報を保持しており、ＥＳを送信する際の伝送路の状態若しくは復号端末１２からの要求に応じて、上記伝送優先度の高い順に送信可能なＥＳを決定する。すなわち、制御部２６は、ＥＳを送信する際の伝送路の状態若しくは復号端末１２からの要求に応じて、上記伝送優先度の高い順に送信可能なＥＳが送信されるように、ＥＳ処理部２５をコントロールする。なおここでは、例えば制御部２６が伝送優先度情報を保持しているとして説明するが、記憶部９に記憶させておいても良い。
【０１４８】
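FIG. 17 shows an example of the transmission priority of each ES when there are, for example, three ESs: ESa, ESb and ESc. In the example of FIG. 17, the transmission priority of ESa is "30", that of ESb is "20", and that of ESc is "10". The smaller the value, the higher the priority at the time of transmission. Ra in FIG. 17 is the transmission bit rate used when transmitting ESa, Rb is that for ESb, and Rc is that for ESc.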
【０１４９】
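Here, when the transmittable bit rate R is determined by the state of the transmission path or by a request from the decoding terminal 12, the control unit 26 controls the ES processing unit 25 so that ESs are selected and transmitted in descending order of transmission priority within a range that does not exceed the transmittable bit rate R.
【０１５０】
That is, for example, when the relationship between the transmittable bit rate R and the transmission bit rates of the ESs is given by expression (4), the control unit 26 controls the ES processing unit 25 so that only ESc, which has the highest transmission priority, is selected and transmitted.
【０１５１】
Rc ≤ R < (Rc + Rb)   (4)
Likewise, when the relationship between the transmittable bit rate R and the transmission bit rates of the ESs is given by expression (5), the control unit 26 controls the ES processing unit 25 so that ESc, which has the highest transmission priority, and ESb, which has the second highest, are selected and transmitted.
【０１５２】
(Rc + Rb) ≤ R < (Rc + Rb + Ra)   (5)
Likewise, when the relationship between the transmittable bit rate R and the transmission bit rates of the ESs is given by expression (6), the control unit 26 controls the ES processing unit 25 so that all the ESs are selected and transmitted.
【０１５３】
(Rc + Rb + Ra) ≤ R   (6)
Thus, according to the data distribution system of the third embodiment provided with the filter 23 of the second specific example, the control unit 26 holds transmission priority information for each ES and determines the transmittable ESs in descending order of transmission priority according to the state of the transmission path at the time of transmission or a request from the decoding terminal 12, so that the important ESs among a plurality of ESs can be transmitted preferentially.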
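The selection rule expressed by expressions (4) to (6) can be sketched as a short routine. This is an illustrative sketch only: the function name, the tuple layout and the concrete bit rates in the usage note are assumptions, while the convention that a smaller priority value means a higher priority follows FIG. 17.

```python
def select_streams(streams, available_rate):
    """Pick ESs in priority order without exceeding available_rate.

    streams: iterable of (name, priority_value, bitrate); a smaller
    priority_value means a higher transmission priority.
    """
    chosen, used = [], 0
    for name, _priority, rate in sorted(streams, key=lambda s: s[1]):
        if used + rate > available_rate:
            # Stop at the first ES that does not fit, so that the result is
            # always a prefix of the priority order, as in expressions (4)-(6).
            break
        chosen.append(name)
        used += rate
    return chosen
```

With the hypothetical rates Ra = 4,000,000, Rb = 2,000,000 and Rc = 1,000,000 bit/s and the priorities of FIG. 17, an available rate of 1,500,000 selects only ESc, 3,000,000 selects ESc and ESb, and 7,000,000 selects all three. Stopping at the first ES that does not fit, rather than skipping it and trying lower-priority ESs, is a design choice made here to match the three cases above.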
【０１５４】
上述の説明では、予め設定された優先度に基づいて、ＥＳの選択やシーン記述の変換を行う例を挙げているが、当該ＥＳの変換に伴って優先度を変更することも可能である。なお、ＥＳの変換に伴って優先度を変更する場合、当該優先度の変更は、例えばＥＳ処理部２５にて行う。
【０１５５】
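FIG. 18 shows an example of the transmission priority as changed by the ES processing unit 25 when the bit rate of ESa is converted to Ra'. FIG. 18 takes as an example the case where the bit rate of ESa is set to a bit rate Ra' lower than the bit rate Ra in the example of FIG. 17; in accordance with the lowered bit rate, the transmission priority is, for example, raised (changed from "30" in FIG. 17 to "15" in FIG. 18).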
【０１５６】
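Furthermore, besides the case where the control unit 26 holds preset values, the transmission priority can be set according to encoding parameters such as the bit rate or the picture frame of the ES. For example, as shown in FIG. 19, by holding the relationship Ps(R) between the ES bit rate R and the transmission priority, the transmission priority can be set according to the ES bit rate. That is, since a higher bit rate is considered to mean a higher transmission cost, assigning a lower transmission priority to an ES with a higher bit rate, as in the example of FIG. 19, makes it possible to preferentially transmit ESs with a low transmission cost (a low bit rate).
【０１５７】
When an ES itself has an explicit picture frame, as image data does, the transmission priority can also be set according to that picture frame. For example, FIG. 20 shows an example of the relationship Ps(S) between the picture frame area S of the ES and the transmission priority; by holding this relationship Ps(S), the transmission priority can be set according to the picture frame of the ES. That is, since a larger picture frame is generally considered to mean a higher transmission cost, assigning a lower transmission priority to a larger picture frame, as in the example of FIG. 20, makes it possible to preferentially transmit ESs whose transmission cost is considered to be low.
【０１５８】
As described above, the method of setting the transmission priority according to encoding parameters such as the bit rate or the picture frame of the ES can also be used when the ES processing unit 25 changes the transmission priority as a result of ES conversion. For example, if the ES processing unit 25 converts an ES of bit rate Ra to bit rate Ra', the transmission priority can be changed to Ps(Ra') as shown in FIG. 19.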
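A relationship such as Ps(R) can be sketched as a monotone mapping from bit rate to priority value. The source does not give the shape of the curve in FIG. 19, so the clamped linear form, the function name and every default parameter below are assumptions chosen purely for illustration; only the direction (a higher bit rate yields a lower priority, that is, a larger value) comes from the text.

```python
def priority_from_bitrate(rate_bps, min_rate=0, max_rate=10_000_000,
                          best=1, worst=100):
    """Map an ES bit rate to a transmission priority value.

    Smaller return values mean higher priority. The linear shape and the
    default bounds are hypothetical; FIG. 19 may use a different curve.
    """
    clamped = min(max(rate_bps, min_rate), max_rate)
    fraction = (clamped - min_rate) / (max_rate - min_rate)
    return round(best + fraction * (worst - best))
```

When an ES is converted from Ra to a lower rate Ra', calling the same function with Ra' yields a smaller value, which corresponds to the raised priority Ps(Ra') described above.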
【０１５９】
また、伝送優先度は、動画像や静止画像、テキスト等のＥＳの種類や、ＥＳの符号化フォーマット毎に割り当てても良い。例えばテキストには常に最高の伝送優先度を割り当てるとすれば、伝送路の状態や復号端末からの要求によって伝送可能なビットレートが制限される場合でも、テキストデータは常に優先して送信することが可能となる。
【０１６０】
また、伝送優先度は、ユーザの嗜好に基づいて決定することもできる。すなわち、サーバ１０が、ユーザが好む動画像や静止画像、テキスト等のＥＳの種類や、ＥＳの符号化フォーマット、ＥＳの符号化パラメータ等の嗜好情報を保持しておくことにより、ユーザが好むＥＳの種類、符号化フォーマット、符号化パラメータを持つＥＳに高い伝送優先度を割り当てることができる。これにより、伝送路の状態や復号端末からの要求に応じて伝送可能なビットレートが制限される場合でも、ユーザの嗜好に合ったＥＳを優先的に送信し、高品質で表示させることが可能となる。
【０１６１】
上述したように、制御部２６がＥＳ毎に伝送優先度情報を保持し、送信する際の伝送路の状態若しくは復号端末１２からの要求に応じて、伝送優先度の高い順に送信可能なＥＳを決定することにより、重要なＥＳを優先して送信することが可能となっている。
【０１６２】
また、本発明の第３の実施の形態に適用される前記第３の具体例のフィルタ２３では、以下のようにして、特殊再生中であってもビットレートなどの評価基準を満たすデータの配信を可能とする。すなわち、この第３の具体例のフィルタ２３に設けられるシーン記述処理部２４は、制御部２６の制御の元で、以下に述べる第１〜第５のシーン記述処理を行うことができる。
【０１６３】
第１のシーン記述処理として、第３の具体例のフィルタ２３は、例えばＥＳ処理部２５より出力されるＥＳに適したシーン記述を出力可能となっている。すなわち、シーン記述処理部２４は、制御部２６の制御の元で、ＥＳ処理部２５より出力されるＥＳに適したシーン記述を出力可能となされている。以下、図２１〜図２５を用いて第１のシーン記述処理を具体的に説明する。
【０１６４】
図２１には、動画像ＥＳと静止画像ＥＳによって構成されたシーンの一表示例を示す。図２１中のＥｓｉはシーン表示領域を示し、図中のＥｍｖはシーン表示領域Ｅｓｉ内の動画像ＥＳ表示領域を、図中のＥｓｖはシーン表示領域Ｅｓｉ内の静止画像ＥＳ表示領域を示している。
【０１６５】
また、図２２には、図２１のシーン表示領域Ｅｓｉに対応したシーン記述を、ＭＰＥＧ４ＢＩＦＳにて記述した場合の内容、テキストにて表す。
【０１６６】
この図２２に示したシーン記述は、２つの立方体を含み、それぞれの表面には、動画像と静止画像をテクスチャとして貼り付けることが指定されている。それぞれの物体は、Transformノードによって座標変換指定されており、図中の＃５００と＃５０２で示されたtranslationフィールドの値（ローカル座標の原点位置）により、その物体が平行移動してシーン中に配置される。また、図中の＃５０１と＃５０３で示された値（ローカル座標のスケーリング）により、Transformノードに含まれる物体の拡大，縮小が指定されている。
【０１６７】
ここで例えば、伝送路（伝送媒体２２）の状態若しくは復号端末１２からの要求によって配信データのビットレートを下げる必要が生じた場合において、例えば伝送時に多くのデータ量が必要となる動画像ＥＳのビットレートを下げるようなＥＳの変換処理を行ったとする。なお、この時点で静止画像については、例えば高解像度の静止画像ＥＳが既に伝送されており、復号端末側に蓄積されているとする。
【０１６８】
この場合、従来のデータ配信システムでは、ＥＳのビットレート調整の有無に関わらず同一のシーン構成で復号及び表示がなされるため、ビットレートが下げられた動画像は画質等の劣化が目立つようになる。すなわち、図２１の例を挙げて具体的に説明すると、従来のデータ配信システムでは、図２１中の動画像ＥＳ表示領域Ｅｍｖに表示されることになる動画像ＥＳのビットレートを下げるような調整が行われた場合であっても、その調整以前のものと同じシーン構成のままでＥＳの復号及び表示（実際のビットレートに見合わない広い動画像ＥＳ表示領域Ｅｍｖへの表示）がなされるため、動画像が粗く（例えば空間解像度が粗く）なり、画質の劣化が目立つようになる。
【０１６９】
これに対し、動画像ＥＳのビットレートを下げた場合に、例えば図２３に示すように、動画像ＥＳ表示領域Ｅｍｖを狭くするようなことを行えば、当該動画像ＥＳ表示領域Ｅｍｖに表示される動画像の画質劣化（この例の場合、空間解像度の劣化）を目立たなくすることが可能になると考えられる。また、本実施の形態の場合、静止画像については、既に静止画像ＥＳが伝送されて復号端末に蓄積されているが、当該静止画像が例えば高解像度画像であり、図２１中の静止画像ＥＳ表示領域Ｅｓｖが当該解像度には見合わない狭い領域であったような場合には、例えば図２３に示すように静止画像ＥＳ表示領域Ｅｓｖを広くすれば、その解像度を十分に活かすことができると考えられる。このように、動画像ＥＳ表示領域Ｅｍｖを狭くし、また、静止画像ＥＳ表示領域Ｅｓｖを広くするような対処は、シーン記述をそのような内容を表すシーン記述に変更しなければ実現できない。
【０１７０】
そこで、第３の具体例のフィルタ２３に設けられているシーン記述処理部２４は、ＥＳ処理部２５におけるＥＳのビットレート調整に応じて、シーン記述を動的に変更して出力するようなことを行う。言い換えると、この第３の具体例における制御部２６では、ＥＳ処理部２５を制御してＥＳのビットレート調整を行わせた場合、そのＥＳ処理部２５から出力されるＥＳに適したシーン記述が出力されるようにシーン記述処理部２４を制御することをも行う。これにより、上述の例のように動画像のビットレートを下げたときの画質の劣化を目立たなくしている。なお、この例では、既に伝送済みの静止画像の解像度を活かすために、図２３に示すように動画像ＥＳ表示領域Ｅｍｖを狭くし、一方、静止画像ＥＳ領域Ｅｓｖを広くする、というような対応を実現している。
【０１７１】
図２４を用いて、上述したことを実現する制御部２６の具体的な動作を説明する。
【０１７２】
図２４において、伝送路の状態若しくは復号端末１２からの要求によって配信データのビットレートを下げる必要が生じた場合、制御部２６は、時刻Ｔにおいて、動画像ＥＳ２９２よりもビットレートを下げた動画像ＥＳ２９３が出力されるようにＥＳ処理部２５を制御する。
【０１７３】
また、制御部２６は、時刻Ｔにおいて、図２１のシーン表示領域Ｅｓｉに対応したシーン記述２９０を、図２３のシーン表示領域Ｅｓｉに対応したシーン記述２９１へ変更するように、シーン記述処理部２４を制御する。すなわちこのときのシーン記述処理部２４は、制御部２６の制御の元で、図２１のシーン表示領域Ｅｓｉを表す前述の図２２に示したシーン記述を、図２３のシーン表示領域Ｅｓｉを表す図２５に示すようなシーン記述へ変換する。なお、この図２５のシーン記述も図２２の場合と同様に、ＭＰＥＧ４ＢＩＦＳにて記述されるシーン記述の内容テキストで示している。
【０１７４】
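Compared with the scene description of FIG. 22 described above, in the scene description shown in FIG. 25 the values of the translation fields indicated by #600 and #602 (the origin positions of the local coordinates) are changed so as to move the two cubes, and the values of the scale fields indicated by #601 and #603 (the scaling of the local coordinates) make the cube with the moving image (Emv in FIG. 23) pasted on its surface smaller and, in exchange, make the cube with the still image (Esv in FIG. 23) pasted on its surface larger.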
【０１７５】
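A conversion such as that from the scene description of FIG. 22 to the scene description of FIG. 25 in this first scene description processing is realized by the scene description processing unit 24 performing one of the following: selectively reading out and sending, from among a plurality of scene descriptions stored in advance in the storage unit 9, the scene description corresponding to the ES output from the ES processing unit 25 (the scene description of FIG. 25); converting the scene description read from the storage unit 9 (the scene description of FIG. 22) into the scene description corresponding to the ES output from the ES processing unit 25 (the scene description of FIG. 25) and sending it; or generating or encoding the scene description data corresponding to the ES output by the ES processing unit 25 (the scene description of FIG. 25) and sending it. When a scene description method capable of describing only the changes in a scene description is used, only those changes may be transmitted. The above example described narrowing the moving image ES display area Emv when the bit rate of the moving image ES is lowered; conversely, the scene description conversion according to the present invention can of course also be applied to widening the moving image ES display area Emv when the bit rate is raised. Furthermore, the above example assumed that a high-resolution still image ES has been transmitted and stored in advance; if, for example, the still image transmitted and stored in advance is of low resolution, it goes without saying that a high-resolution still image ES may be newly transmitted together with the corresponding scene description. In addition, although moving images and still images are taken as examples in this embodiment, the present invention also covers changing the scene description in response to bit rate adjustment of other multimedia data.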
【０１７６】
以上、図２１〜図２５を用いて説明した第１のシーン記述処理によれば、シーンの構成情報を表すシーン記述を変換処理することにより、伝送路の状態や復号端末１２からの要求に合わせたシーン記述を送信可能となる上、例えばＥＳ処理部２５にてＥＳの変換が行われた場合には、その変換後のＥＳに最適なシーン記述を送信することが可能となる。
【０１７７】
次に、第２のシーン記述処理について説明する。
【０１７８】
例えば、伝送路や復号端末１２の状態に応じてＥＳ処理部２５よりＥＳのビットレートなどを変換してＥＳの復号に必要な情報が変化した場合、フィルタ２３は、第２のシーン記述処理として、そのＥＳの復号に必要な情報を含むシーン記述自体も変換して送信することにより、復号端末側でＥＳのデータ自体から復号に必要な情報を抽出する必要性を無くしている。すなわち、シーン記述処理部２４は、制御部２６の制御の元で、ＥＳ処理部２５でＥＳ変換処理が行われて当該ＥＳの復号に必要な情報が変化した場合、そのＥＳの復号に必要な情報を含むシーン記述を出力可能となされている。なお、ＥＳの復号に必要な情報とは、例えばＥＳの符号化フォーマット、復号に必要なバッファサイズ、ビットレートなどである。以下、前述した各図と図２６及び図２７を用いて、第２のシーン記述処理を具体的に説明する。
【０１７９】
図２６は、前述の図２１及び図２２にて説明したようなシーンで使用されるＥＳの復号に必要な情報の例を、ＭＰＥＧ４で定められている記述子ObjectDescriptorで記述したものである。図２２のシーン記述中で、物体表面にテクスチャとしてマッピングする動画像は３（=url3)という数値で指定されているが、これは図２６のObjectDescriptorの識別子であるODid=3に対応付けられる。識別子ODid=3のObjectDescriptor内に含まれるES_Descriptorは、ＥＳに関する情報を記述している。また、図中のES_IDは、ＥＳを一意に特定する識別子である。この識別子ES_IDはさらに、例えばＥＳを伝送するために使用している伝送プロトコル中のヘッダの識別子やポート番号等と関連付けることで、実際のＥＳに対応付けされる。
【０１８０】
また、ES_Descriptorの記述中には、DecoderConfigDescriptorというＥＳの復号に必要な情報の記述子が含まれる。当該記述子DecoderConfigDescriptorの情報は、例えばＥＳの復号に必要なバッファサイズや最大ビットレート、平均ビットレートなどである。
【０１８１】
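FIG. 27, on the other hand, shows an example of the information necessary for decoding the ESs that accompanies the scene description after conversion by the scene description processing unit 24, corresponding to the scene of FIG. 23 described above, written with the ObjectDescriptor descriptor defined in MPEG4. The decoding buffer size (bufferSizeDB), the maximum bit rate (maxBitRate) and the average bit rate (avgBitRate) of the moving image changed by the ES conversion (referred to from the scene description with ODid 3) are changed from the description in the ObjectDescriptor of FIG. 26 before conversion to that of FIG. 27. That is, bufferSizeDB=4000, maxBitRate=1000000 and avgBitRate=1000000 in the example of FIG. 26 are changed to bufferSizeDB=2000, maxBitRate=5000000 and avgBitRate=5000000 in FIG. 27.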
【０１８２】
この第２のシーン記述処理のように、シーン記述に付随するＥＳの復号に必要な情報の変換処理は、シーン記述処理部２４において、予め記憶部９に記憶されている複数のＥＳの復号に必要な情報のなかから、ＥＳ処理部２５より出力されるＥＳに対応した情報（図２７の情報）を選択的に読み出して送出する処理、若しくは、記憶部９から読み出されているＥＳの復号に必要な情報（図２６の情報）を、ＥＳ処理部２５より出力されるＥＳの復号に必要な情報（図２７の情報）に変換して送出する処理、或いは、ＥＳ処理部２５が出力するＥＳの復号に必要な情報（図２７の情報）を符号化して送出する処理などを行うことにより実現される。
【０１８３】
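According to the second scene description processing described above, when the information necessary for decoding an ES changes because the bit rate or the like of the ES is converted according to the state of the transmission path or of the decoding terminal 12, the information necessary for ES decoding contained in the scene description is changed as shown in FIG. 27 and sent to the decoding terminal 12, which removes the need for the decoding terminal 12 to extract that information from the ES data itself.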
【０１８４】
次に、第３のシーン記述処理について説明する。
【０１８５】
第３のシーン記述処理として、フィルタ２３は、シーンを構成するＥＳの数を増減するように明示的にシーン記述を変換して出力することにより、伝送帯域に見合うＥＳのみを送信可能にし、一方、復号端末１２においては、表示等に必要なＥＳをＥＳデータの到着遅れやデータの損失に依存せずに判断することを可能としている。すなわち、この例のシーン記述処理部２４は、制御部２６の制御の元で、ＥＳの数を増減するように明示的にシーン記述を変換して出力し、復号端末１２の復号部１５に設けられるシーン記述復号機能は、表示等に必要なＥＳをＥＳデータの到着遅れやデータの損失に依存せずに判断する。以下、前述した各図と図２８及び図２９を用いて第３のシーン記述処理を具体的に説明する。
【０１８６】
図２８は、前述の図２１及び図２２で説明したようなシーンから、例えば、動画像のＥＳを削除した場合のシーン記述を、ＭＰＥＧ４ＢＩＦＳで記述（分かり易くテキストとして記述）したものである。また、図２９は、図２８のシーン記述に基づいて表示されるシーンの一例を表し、シーン表示領域ＥｓｉにはイメージＥＳ表示領域（例えば静止画像ＥＳ表示領域）Ｅｉｍのみが配されている。図２８のシーン記述中で使用されるＥＳはODidが４のＥＳのみであることがシーン記述から判断可能であるため、復号端末１２においては、ODidが３の動画像ＥＳデータが到着しなくとも、それがＥＳデータの到着遅れやデータの損失に依るものではないと判断することが出来る。さらに、図２６や図２７の例のようなODidが３のObjectDescriptorの記述を削除することにより、ODidが３の動画像ＥＳは不要となったと判定することが出来る。
【０１８７】
また、この第３のシーン記述処理の例において、シーンを復号して構成するための処理負荷を一時的に減じたいとの伝送要求が復号端末１２から伝送された場合、フィルタ２３では、例えば図２２に示したシーン記述を図２８に示したシーン記述に変更することにより、動画像をシーン中にテクスチャとしてマッピングする処理を明示的に不要とすることを復号端末１２に知らせることが出来る。これにより、復号端末１２では、シーンを復号する処理負荷を減らすことが可能となる。
【０１８８】
この第３のシーン記述処理のように、前述の図２２に示したシーン記述から図２８に示したシーン記述への変換処理は、シーン記述処理部２４において、予め記憶部９に用意されている複数のシーン記述のなかから、ＥＳ処理部２５より出力されるＥＳ数に対応付けられているシーン記述（図２８のシーン記述）を選択的に読み出して送出する処理、若しくは、記憶部９から読み出されたシーン記述を入力とし、出力しないＥＳに対応する部分データ（シーン記述中のデータ）を削除したシーン記述（図２８のシーン記述）へ変換して出力する処理、或いは、シーン記述を符号化出力する場合には、出力しないＥＳに対応する部分を符号化しない処理を行うことにより実現できる。
【０１８９】
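As described above, according to the third scene description processing, converting the scene description in this way allows the scene intended by the server 10 to be restored at the decoding terminal 12 at the intended timing. The scene description processing unit 24 can also delete partial data from the scene description in ascending order of importance until the scene description fits the transmission bandwidth or the processing performance of the decoding terminal 12. Further, when the decoding terminal 12 has processing capacity to spare, a more detailed scene description can be transmitted, so that the scene best suited to the processing performance of the decoding terminal 12 is decoded and displayed.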
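Deleting partial data in ascending order of importance until the description fits can be sketched as below. This is a sketch under stated assumptions: the source does not say how importance or size is represented, so the `(name, importance, size)` tuples, the function name and the single numeric budget are hypothetical.

```python
def prune_scene(parts, budget):
    """Drop the least important parts until the total size fits the budget.

    parts: list of (name, importance, size); a larger importance value
    means the part is more important. Returns the names that are kept,
    most important first.
    """
    kept = sorted(parts, key=lambda p: p[1], reverse=True)
    total = sum(size for _name, _importance, size in kept)
    while kept and total > budget:
        # The least important remaining part sits at the end of the list.
        total -= kept.pop()[2]
    return [name for name, _importance, _size in kept]
```

For instance, with a text part of size 100, an image part of size 400 and a video part of size 900, ranked in that order of importance, a budget of 600 removes only the video part.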
【０１９０】
次に、第４のシーン記述処理について説明する。
【０１９１】
第４のシーン記述処理として、本実施の形態のサーバ１０側では、伝送路の状態や復号端末１２からの要求に応じて、シーン記述の複雑さを変換することにより、シーン記述のデータ量を調整し、かつ復号端末１２における処理負荷を調整可能としている。すなわちこの例のシーン記述処理部２４は、制御部２６の制御の元、伝送路の状態や復号端末１２からの要求に応じて、シーン記述のデータ量を調整して出力する。以下、図３０〜図３３を用いて第４のシーン記述処理を具体的に説明する。
【０１９２】
図３０は、ポリゴンで記述した物体を表示するためのシーン記述を、ＭＰＥＧ４ＢＩＦＳで記述（分かり易くテキストとして記述）したものである。なお、図３０の例では、簡略化のために、ポリゴンの座標は省略している。なお、図３０のシーン記述において、IndexedFaceSetとは、Coordinate中のpointで指定した頂点座標を、CoordIndexで指定した順番に接続してできる幾何物体を表している。また、図３１は、図３０のシーン記述を復号することにより表示されるシーンの表示例（ポリゴンの物体の表示例）を示す。
【０１９３】
この第４のシーン記述処理の例において、伝送路の状態により、例えばサーバ１０が送信するデータ量を減じたい場合、或いは、処理負荷を下げたいとの伝送要求が復号端末１２から伝送された場合、フィルタ２３のシーン記述処理部２４では、シーン記述を、より簡易なシーン記述へと変換する。例えば、図３２に示すシーン記述の例では、図３１のようなポリゴンを表すIndexedFaceSetを、図３３に示すような球体を表すSphereで置き換えることにより、シーン記述のデータ量自体を減じ、且つ復号端末１２における復号処理とシーンの構成を行うための処理の負荷を軽減可能となっている。すなわち、図３１のようなポリゴンの場合は、多面体を表す各値が必要になるのに対し、図３３に示すような球体の場合には、それらが不要となるため、シーン記述のデータ量を減らすことができる。また、復号端末１２側では、多面体を表示するための複雑な処理が、球体を表示するための簡単な処理になり、処理負担が軽減されている。
【０１９４】
この第４のシーン記述処理のように、上記図３０に示したシーン記述から図３２に示したシーン記述への変換処理は、シーン記述処理部２４において、例えば予め記憶部９に用意されている複数のシーン記述のなかから、伝送路の状態や復号端末１２からの要求に適した評価基準を満たすシーン記述を選択して出力すること、或いは、記憶部９から読み出されたシーン記述を入力とし、上記評価基準を満たすシーン記述へ変換したり、或いは、上記評価基準を満たすシーン記述を符号化出力することにより実現できる。なお、上記評価基準とは、シーン記述のデータ量や、ノードやポリゴンの数などのシーン記述の複雑さを表す基準であれば良い。
【０１９５】
また、シーン記述処理部２４におけるシーン記述の複雑さを変換する他の処理手法としては、図３２のように複雑な部分データを簡易な部分データで置き換える処理若しくはその逆の処理、或いは、部分データを取り除く処理若しくはその逆の処理、或いはシーン記述を符号化する場合には量子化ステップを変更することによってシーン記述データのデータ量を調整するような処理などであっても良い。なお、符号化時の量子化ステップ調整によるシーン記述のデータ量制御は、例えば次のようにして実現できる。例えばＭＰＥＧ４ＢＩＦＳでは、座標や回転軸と角度、サイズ等の量子化カテゴリ毎に、量子化の使用／不使用や使用ビット数を表す量子化パラメータを設定することが可能であり、且つ１つのシーン記述中でも量子化パラメータを変更することができるとされているので、例えば量子化に使用するビット数を小さくすれば、シーン記述のデータ量を減じることが可能となる。
【０１９６】
以上説明したように、第４のシーン記述処理によれば、シーン記述を変換することにより、サーバ１０側で意図した通りに簡易化したシーンを、復号端末１２側で復元することが可能となる。また、第４のシーン記述処理によれば、シーン記述処理部２４において、伝送帯域若しくは復号端末１２の処理性能に適合するまで、シーン記述中の重要度の低い部分データから順に削除することが可能となる。
【０１９７】
次に、第５のシーン記述処理について説明する。
【０１９８】
第５のシーン記述処理として、サーバ１０側では、伝送路の状態や復号端末１２からの要求に応じて、シーン記述を複数の復号単位に分割することにより、シーン記述データのビットレートを調整し、且つ復号端末１２における局所的な処理負荷の集中を回避可能としている。すなわち、この例のシーン記述処理部２４は、制御部２６の制御の元、伝送路の状態や復号端末１２からの要求に応じて、シーン記述を複数の復号単位に分割し、それら分割した復号単位のシーン記述の送出タイミングを調整して出力する。なお、ある時刻に復号すべきシーン記述の復号単位は前記符号化単位のＡＵと同じである。以下、図３４〜図３８を用いて第５のシーン記述処理を具体的に説明する。
【０１９９】
図３４には、例えば球体、立方体、円錐、円柱の４つの物体を表すシーン記述を、ＭＰＥＧ４ＢＩＦＳの１つのＡＵで記述したものである。また、図３５は、図３４のシーン記述を復号して表示されるシーンの表示例を示し、球体４１、立方体４２、円錐４４、円柱４３の４つの物体が表示されている。この図３４に示した１つのＡＵに記述されたシーンは、指定された復号時刻において全て復号し、指定された表示時刻において表示に反映しなければならない。なお、この復号時刻（ＡＵをデコードして有効にすべき時刻）は、ＭＰＥＧ４においてはＤＴＳ（Decoding Time Stamp）と呼ばれている。
【０２００】
この第５のシーン記述処理の例において、伝送路の状態若しくは復号端末１２からの要求により、例えば送信するデータのビットレートを減じたい場合、或いは復号端末１２における局所的な処理負荷を下げたい場合、フィルタ２３のシーン記述処理部２４では、シーン記述を複数のＡＵへ分割し、ＡＵ毎のＤＴＳをずらすことにより、シーン記述の局所的なビットレートを伝送路の状態若しくは復号端末１２からの要求に見合うビットレートへ調整し、ＤＴＳ毎の復号処理に必要な処理量を復号端末１２からの要求に見合う処理量へ調整する。
【０２０１】
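That is, the scene description processing unit 24 first divides, for example, the scene description shown in FIG. 34 into four AUs, AU1 to AU4, as shown in FIG. 36. The first AU1 describes assigning the ID 1 to the Group node that performs the grouping, so that the node can be referred to from subsequent AUs. In MPEG4 BIFS, partial scenes can be added later to a grouping node that can be referenced. The second AU2 through the fourth AU4 each describe a command that adds a partial scene to the children field of the Group node with ID 1 defined in the first AU1.
【０２０２】
Next, the scene description processing unit 24 specifies staggered DTSs for the first AU1 through the fourth AU4, as shown in FIG. 37. That is, it specifies a first DTS1 for the first AU1, a second DTS2 for the second AU2, a third DTS3 for the third AU3, and a fourth DTS4 for the fourth AU4. This reduces the local bit rate of the scene description data sent from the server 10 to the decoding terminal 12, and also reduces the local decoding load that arises at each DTS in the decoding terminal 12.
【０２０３】
When the scene description divided into four as in FIG. 36 is decoded at DTS1 to DTS4 respectively, the displayed scene gains an object at each DTS, as shown in FIG. 38, and at the last DTS4 the same scene as in FIG. 35 is obtained. That is, the sphere 41 is displayed at DTS1, the cube 42 is added at DTS2, the cone 44 is added at DTS3, and the cylinder 43 is added at DTS4, so that four objects are finally displayed.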
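The division into AUs and the incremental build-up of the scene can be sketched with plain data structures. The dictionaries below are an illustrative stand-in and not MPEG4 BIFS syntax; the command names `replace_scene` and `append_child` and both function names are assumptions introduced for this sketch.

```python
def split_into_aus(objects, group_id=1):
    """Split a flat list of objects into one AU per object.

    The first AU defines a Group node with an ID and the first object;
    each later AU appends one object to that Group's children.
    """
    if not objects:
        return []
    aus = [{"cmd": "replace_scene", "group_id": group_id,
            "children": [objects[0]]}]
    for obj in objects[1:]:
        aus.append({"cmd": "append_child", "group_id": group_id, "child": obj})
    return aus


def apply_aus(aus):
    """Replay AUs in DTS order and return the Group's children after each AU."""
    groups, snapshots = {}, []
    for au in aus:
        if au["cmd"] == "replace_scene":
            groups = {au["group_id"]: list(au["children"])}
        else:
            groups[au["group_id"]].append(au["child"])
        snapshots.append(list(groups[au["group_id"]]))
    return snapshots
```

Replaying the four AUs produced from a sphere, a cube, a cone and a cylinder gives one more object per step, and the final snapshot contains all four, mirroring the progression from DTS1 to DTS4.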
【０２０４】
この第５のシーン記述処理のように、上記図３４に示したシーン記述から図３６に示したシーン記述への変換処理は、シーン記述処理部２４において、例えば予め記憶部９に用意されている複数のシーン記述のなかから、伝送路の状態や復号端末１２からの要求に適した評価基準を満たすシーン記述を選択して出力すること、或いは、記憶部９から読み出されたシーン記述を入力とし、上記評価基準を満たすまで分割したシーン記述（ＡＵ１〜ＡＵ４）へ変換したり、或いは、上記評価基準を満たすまで分割したシーン記述（ＡＵ１〜ＡＵ４）をそのＡＵ毎に符号化出力することにより実現できる。なお、この第５のシーン記述処理における上記評価基準とは、１つのＡＵのデータ量や、１つのＡＵに含まれるノードの数、物体の数、ポリゴン数等、１つのＡＵに含めるシーンの限界を表す基準であれば良い。
【０２０５】
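As described above, according to the fifth scene description processing, dividing the scene description into a plurality of AUs and adjusting the interval between the DTSs of the AUs makes it possible to control the average bit rate of the scene description and to lighten the local decoding load on the decoding terminal 12. Since the average bit rate can be calculated by dividing the total data amount of the AUs whose DTSs fall within a given time interval by that time interval, the scene description processing unit 24 can adjust the DTS intervals so as to achieve an average bit rate suited to the state of the transmission path or to the request from the decoding terminal 12. Although the above example divides an AU, it is conversely also possible to combine a plurality of AUs.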
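The average bit rate calculation stated above, and one possible way of spacing DTSs to meet a target rate, can be sketched as follows. `average_bitrate` follows the definition in the text; `schedule_dts` is an assumption added for illustration (it spaces each AU by the time needed to send it at the target rate) and is not a method given in the source. Times are in seconds and sizes in bits.

```python
def average_bitrate(aus, start, end):
    """Total size of AUs with start <= DTS < end, divided by the interval."""
    if end <= start:
        raise ValueError("end must be greater than start")
    total = sum(size for dts, size in aus if start <= dts < end)
    return total / (end - start)


def schedule_dts(sizes_bits, first_dts, target_bps):
    """Assign a DTS to each AU so that the stream averages target_bps.

    Hypothetical policy: the gap after each AU equals its size divided
    by the target bit rate.
    """
    dts_list, t = [], first_dts
    for size in sizes_bits:
        dts_list.append(t)
        t += size / target_bps
    return dts_list
```

For example, four AUs of 8000, 4000, 4000 and 8000 bits with DTSs of 0.0, 0.5, 1.0 and 1.5 seconds average 12000 bit/s over the interval from 0 to 2 seconds; widening the DTS spacing lowers that figure.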
【０２０６】
上述の説明では、第１〜第５のシーン記述処理を個々に行う例を挙げているが、それら各シーン記述処理を任意に組み合わせて、複数個のシーン記述処理を同時に行うことも可能である。この場合は、それら組み合わせたシーン記述処理それぞれの前述した作用効果を同時に実現することが可能となる。
【０２０７】
また、本実施の形態では、シーン記述の例としてＭＰＥＧ４ＢＩＦＳを挙げているが、本発明はこれに限定されるものではなく、あらゆるシーン記述方法に対しても適用可能である。また例えば、シーン記述の変化分のみを記述可能なシーン記述方法を用いている場合には、その変化分のみを送信する場合も本発明は適用可能である。
【０２０８】
さらに、上述した本発明実施の形態は、ハードウェア構成によっても、また、ソフトウェアによっても実現可能である。
【０２０９】
また、上述の説明では、シーン記述の例としてＨＴＭＬやＭＰＥＧ４ＢＩＦＳを挙げているが、その他にＶＲＭＬ、Ｊａｖａ（商標）など、あらゆるシーン記述方法に対しても適用可能である。
【０２１０】
また、本発明は、ビデオデータ、オーディオデータ、静止画像データ、テキストデータ、グラフィックデータ、シーン記述データなどのデータのタイプに依らず、かつあらゆるデータの符号化方法に対して有効である。さらに、本発明は、ハードウェアによってもソフトウェアによっても実現可能である。
【０２１１】
【発明の効果】
本発明においては、受信側にて通常再生を行うときには当該通常再生に使用するデータを出力し、受信側にて特殊再生を行うときには通常再生に使用するデータの符号化単位の再生に関連する時間情報を特殊再生に応じて変換して出力することにより、受信側において特殊再生を行う場合に、例えばビデオ以外のデータの復号及び表示等が可能となり、また、シーン記述データを配信、復号等することができ、さらに、データ間の同期関係を保持し、伝送ビットレートなどの評価基準を満たすデータとして配信することが可能となっている。
【図面の簡単な説明】
【図１】本発明実施の形態のデータ配信システムの構成例を示すブロック図である。
【図２】第１の実施の形態のデータ配信システムのサーバの詳細な構成を示すブロック図である。
【図３】第１の実施の形態において早送り再生を行う場合の時間情報の変換処理の説明に用いる図である。
【図４】第１の実施の形態においてスロー再生を行う場合の時間情報の変換処理の説明に用いる図である。
【図５】第１の実施の形態においてジャンプを行う場合の時間情報の変換処理の説明に用いる図である。
【図６】第２の実施の形態のデータ配信システムのサーバの詳細な構成を示すブロック図である。
【図７】第２の実施の形態において早送り再生を行う場合の時間情報の変換処理の説明に用いる図である。
【図８】第２の実施の形態において早送り再生を行う場合のビットレートの変化の説明に用いる図である。
【図９】第３の実施の形態のデータ配信システムのサーバの詳細な構成を示すブロック図である。
【図１０】第３の実施の形態の第１の具体例のフィルタにおける分割処理の流れを示すフローチャートである。
【図１１】第１の具体例のフィルタにおいてＭＰＥＧ４ＢＩＦＳによるシーン記述の分割候補の説明に用いる図である。
【図１２】図１１のシーン記述の構造説明に用いる図である。
【図１３】図１１のシーン記述の復号及び表示結果を表す図である。
【図１４】図１１のシーン記述の変換結果を表す図である。
【図１５】図１１のシーン記述の異なる変換候補を表す図である。
【図１６】第３の実施の形態の第２の具体例のフィルタの詳細な構成を示すブロック図である。
【図１７】第２の具体例のフィルタにおける伝送優先度とビットレートと３つのＥＳとの関係説明に用いる図である。
【図１８】ビットレートの変更と伝送優先度の変更の説明に用いる図である。
【図１９】ＥＳのビットレートＲと伝送優先度の関係Ｐｓ（Ｒ）を示す図である。
【図２０】ＥＳの画枠領域Ｓと伝送優先度の関係Ｐｓ（Ｓ）を示す図である。
【図２１】第１のシーン記述処理における変換前のシーン記述によるシーン表示結果を示す図である。
【図２２】図２１のシーンに対応したシーン記述（MPEG4 BIFS）の例を表す図である。
【図２３】第１のシーン記述処理における変換後のシーン記述によるシーン表示結果を示す図である。
【図２４】第１のシーン記述処理におけるＥＳ変換とシーン記述変換のタイミングの説明に用いる図である。
【図２５】図２３のシーンに対応したシーン記述（MPEG4 BIFS）の例を表す図である。
【図２６】図２１のシーンに対応するＥＳの復号に必要な、図２２のシーン記述に付随する情報（MPEG4 ObjectDescriptor）の例を表す図である。
【図２７】図２３のシーンに対応するＥＳの復号に必要な、図２５のシーン記述に付随する情報（MPEG4 ObjectDescriptor）の例を表す図である。
【図２８】図２１及び図２２で説明したシーンから動画像のＥＳを削除した場合のシーン記述（MPEG4 BIFS）の例を表す図である。
【図２９】図２８のシーン記述による表示結果を示す図である。
【図３０】ポリゴンで記述した物体を表示するためのシーン記述（MPEG4 BIFS）の例を表す図である。
【図３１】図３０に示すシーン記述による表示結果を示す図である。
【図３２】ポリゴンで記述した物体を球体で置換したシーン記述（MPEG4 BIFS）の例を表す図である。
【図３３】図３２に示すシーン記述による表示結果を示す図である。
【図３４】４つの物体からなるシーン記述（MPEG4 BIFS）の例を表す図である。
【図３５】図３４に示すシーン記述による表示結果を示す図である。
【図３６】図３４に示すシーン記述を４つのＡＵに分割した各シーン記述（MPEG4 BIFS）の例を表す図である。
【図３７】図３６に示す各ＡＵの復号タイミングの説明に用いる図である。
【図３８】図３６に示す各ＡＵのシーン記述による表示結果を示す図である。
【図３９】従来のデータ配信システムの概略構成を示すブロック図である。
【図４０】図３９に示したデータ配信システムの欠点を解消するデータ配信システムの概略構成を示すブロック図である。
【図４１】図４０のデータ配信システムにおけるビデオデータ用のデータ変換部の動作の一例（早送り再生）の簡単な説明に用いる図である。
【図４２】図４０のデータ配信システムにおけるビデオデータ用のデータ変換部の動作の一例（巻き戻し再生）の簡単な説明に用いる図である。
【図４３】ＶＲＭＬおよびＭＰＥＧ４ＢＩＦＳを用いたシーン記述の説明に用いる図である。
【符号の説明】
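1 special reproduction control unit, 7 data conversion unit, 4 multiplexing unit, 5 transmission unit, 9 storage unit, 10 server, 12 decoding terminal, 13 receiving unit, 14 separation unit, 15 decoding unit, 16 scene synthesis unit, 17 reading unit, 18 scheduler, 19 time information rewriting unit, 23 filter, 24 scene description processing unit, 25 ES processing unit, 26 control unit
[0001]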
BACKGROUND OF THE INVENTION
The present invention relates to a data processing method and apparatus suitable for special reproduction in a data distribution system in which multimedia data composed of video data such as still images and moving images, audio data, text data, graphic data and the like is distributed over a network together with scene description data for composing a scene from that multimedia data, and in which the distributed multimedia data and scene description data are received by a decoding terminal and decoded and displayed there.
[0002]
[Prior art]
FIG. 39 shows a configuration example of a conventional data distribution system in which video data or the like, stored after compressing the image signals of still images or moving images, is distributed via a transmission medium, received at a decoding terminal, and decoded and displayed. For simplicity, FIG. 39 shows only the video data path. In the following description, the case where video data is packetized into a transport stream (hereinafter simply referred to as TS) defined by, for example, ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission) 13818-1 (so-called MPEG2 Systems) and then distributed is taken as an example.
[0003]
In FIG. 39, the server 200 includes a storage unit 209 that stores video data. The video data read from the storage unit 209 is packetized into a TS by the multiplexing unit 204, converted into distribution data 211 by the transmission unit 205, output to the transmission medium 210, and distributed to, for example, the decoding terminal 212. At this time, the TS distribution data 211 is transmitted using the protocol used on the transmission medium 210. For example, a TS that meets the requirements of ISO/IEC 13818-1 can be transmitted over a transmission medium of the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard by the method defined in IEC 61883, "Digital Interface for consumer audio/video equipment". Note that the multiplexing unit 204 and the transmission unit 205 may be integrated.
[0004]
In the decoding terminal 212, the distribution data 211 is received by the receiving unit 213 and sent to the separation unit 214. The separation unit 214 separates the video data from the TS packets and sends it to the decoding unit 215. The decoding unit 215 decodes the encoded video data. The decoded video data is sent to, for example, a display device (not shown) and displayed as a video image.
[0005]
In such a data distribution system, when performing special playback display such as fast-forward playback, frame-by-frame playback, and pause, for example, a special playback designation signal (fast-forward) according to the operation of the terminal front panel or remote controller by the user, for example. (Instruction signal such as playback or frame advance playback) 206 is input to the special playback control unit 216 of the decoding terminal 212. At this time, the special reproduction control unit 216 of the decoding terminal 212 generates a special reproduction request signal 220 for requesting the server 200 for video data for special reproduction of the type designated by the special reproduction designation signal 206. Then, the special reproduction request signal 220 is transmitted to the special reproduction control unit 201 of the server 200 via the transmission medium 210.
[0006]
Upon receiving the special reproduction request signal 220, the special reproduction control unit 201 of the server 200 generates control signals 202a and 202b corresponding to the request, and sends them to the corresponding multiplexing unit 204 and transmission unit 205, respectively. The multiplexing unit 204 performs special reproduction for enabling the decoding terminal 212 to perform special reproduction of the type specified by the user from the storage unit 209 under the control of the special reproduction control unit 201 by the control signal 202b. Read video data. Further, the multiplexing unit 204 packetizes the special reproduction video data into a TS and sends it to the transmission unit 205. The transmission unit 205 distributes the special reproduction video data packet to the decoding terminal 212 as distribution data 211 under the control of the special reproduction control unit 201 by the control signal 202a.
[0007]
In the decoding terminal 212 supplied with the distribution data 211 including the special reproduction video data, the special reproduction control unit 216 generates control signals 217a and 217b for performing special reproduction control in accordance with the special reproduction designation signal 206, and sends each to its corresponding unit, the receiving unit 213 or the decoding unit 215. The receiving unit 213 receives the distribution data 211 made up of the special reproduction video data under the control of the special reproduction control unit 216 via the control signal 217b and sends it to the separation unit 214. The separation unit 214 separates the special reproduction video data from the TS packets and sends it to the decoding unit 215. The decoding unit 215 decodes the special reproduction video data under the control of the special reproduction control unit 216 via the control signal 217a. As a result, special display such as fast-forward playback or frame-by-frame playback is performed on a display device (not shown).
[0008]
Note that the video frame encoding method defined in ISO/IEC 13818-2 uses I pictures (intra-coded pictures), which are encoded only from data within the frame, and B pictures (bidirectionally predictive-coded pictures) and P pictures (predictive-coded pictures, that is, forward predictive coded pictures), which are encoded using prediction between frames. In the data distribution system shown in FIG. 39, I pictures, which do not use prediction between video frames, are used as the special reproduction video data read from the storage unit 209. That is, normal reproduction video data includes I pictures periodically to enable random access, and these I pictures are extracted to constitute the special reproduction video data. In this manner, in the conventional data distribution system shown in FIG. 39, when special reproduction such as fast-forwarding is performed at the decoding terminal 212, special reproduction video data, such as video data consisting only of ISO/IEC 13818-2 I pictures, is distributed from the server 200 to the decoding terminal 212.
[0009]
On the other hand, when compressed video data compliant with, for example, ISO/IEC 13818-2 (so-called MPEG2 video) is distributed as in the data distribution system described above, the compressed video data must be encoded so that the decoder buffer defined in ISO/IEC 13818-2 neither overflows nor underflows. The decoder buffer corresponds to an input buffer (not shown) provided in the decoding unit 215. If data exceeding the buffer size specified in ISO/IEC 13818-2 is input, the decoder buffer overflows; conversely, if the data necessary for decoding has not arrived by the time it is to be decoded, the decoder buffer underflows.
[0010]
However, video data consisting of only I pictures, such as the above-mentioned special playback video data, increases the amount of data and may cause the decoder buffer to overflow or underflow. For this reason, in the conventional data distribution system, special data for special reproduction different from normal reproduction is prepared in advance so that special reproduction can be performed without overflowing or underflowing the decoder buffer. In addition, when special playback is performed at the decoding terminal, it is necessary to distribute special data for the special playback. On the decoding terminal side, a special terminal is required which can perform special special reproduction processing corresponding to the special data for special reproduction, which is different from normal special reproduction processing.
[0011]
That is, according to the conventional data distribution system, in order to realize special reproduction without overflowing or underflowing the decoder buffer, special data for special reproduction, different from the special reproduction video data consisting only of I pictures described above, must be prepared in advance and distributed during special reproduction. Likewise, the decoding terminal must be a terminal provided with a special decoding unit 215 capable of handling the special data for special reproduction, and the special reproduction control unit 216 must control the receiving unit 213, the separation unit 214 and the decoding unit 215 for the processing of that special reproduction data.
[0012]
For this reason, the applicant of the present application has proposed, in Japanese Patent Application Nos. 2000-178999 and 2000-179000, a technique in which the data obtained as a result of special reproduction using the normal reproduction video data read from the storage unit in the server is converted into video data satisfying the ISO/IEC 13818-2 standard, and the converted video data is distributed to the decoding terminal. This gives a simple configuration that neither uses nor prepares in advance the special distribution data for special reproduction described above, and that requires no special decoding terminal capable of handling such data.
[0013]
FIG. 40 shows a schematic configuration of a data distribution system that converts the data obtained as a result of special reproduction using normal reproduction video data into video data satisfying, for example, ISO/IEC 13818-2, and outputs it. The example of FIG. 40 takes the case where video data or the like is packetized into a transport stream (TS) defined by ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission) 13818-1 (so-called MPEG2 Systems) and distributed.
[0014]
In FIG. 40, the server 220 includes a storage unit 229 that stores multimedia data such as video data (still images and moving images), audio data, text data and graphic data. For example, video data is read from the storage unit 229 and sent to the multiplexing unit 224 via a data conversion unit 223 described later. The multiplexing unit 224 packetizes the data output from the data conversion unit 223 into a TS. The TS packets are converted into distribution data 231 by the transmission unit 225, output to the transmission medium 230, and distributed to, for example, the decoding terminal 232. At this time, the TS distribution data 231 is transmitted using the protocol used on the transmission medium 230. For example, a TS that meets the requirements of ISO/IEC 13818-1 can be transmitted over a transmission medium of the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard by the method defined in IEC 61883, "Digital Interface for consumer audio/video equipment".
[0015]
In the decoding terminal 232, the distribution data 231 is received by the receiving unit 233 and sent to the separation unit 234. The separation unit 234 separates the video data from the TS packets and sends it to the decoding unit 235. The decoding unit 235 decodes the supplied data, that is, the encoded video data. The decoded video data is sent to, for example, a display device (not shown) and displayed as a video image.
[0016]
When special reproduction display is performed at the decoding terminal 232 of this data distribution system, a special reproduction designation signal 226 corresponding to, for example, an operation by the user of the decoding terminal 232 is transmitted to the server 220 via the transmission medium 230 by a transmission medium interface unit (not shown) or the like in the decoding terminal 232. The special reproduction designation signal 226 is a signal including, for example, the type of special reproduction, such as fast-forward, rewind or frame-advance reproduction, and a designation of video data stored in the storage unit 229. When the server 220 and the decoding terminal 232 are connected over a short distance, as in a home network, and the user can operate the front panel, remote controller or the like of the server 220, the user can also input the special reproduction designation signal 226 directly to the server 220 by operating them.
[0017]
The special reproduction designation signal 226 input to the server 220 is passed to the special reproduction control unit 221 provided in the server 220. In response to the special reproduction designation signal 226, the special reproduction control unit 221 generates a control signal 222 for special reproduction control, which includes the type of special reproduction and the designation of video data, and sends it to the data conversion unit 223.
[0018]
The data conversion unit 223 reads the video data from the storage unit 229 under the control of the special reproduction control unit 221 via the control signal 222. Using the video data read from the storage unit 229, the data conversion unit 223 converts the data that results from special reproduction of the type specified by the control signal 222 into video data satisfying, for example, the ISO/IEC 13818-2 standard, and outputs it. That is, the data conversion unit 223 converts the video data read from the storage unit 229 into video data with which the special reproduction specified by the user (fast-forward, rewind, frame-by-frame playback and so on) is realized when the decoding unit 235 of the decoding terminal 232 simply decodes it in the same manner as during normal playback.
[0019]
Here, the data conversion processing in the data conversion unit 223 will be briefly described with reference to FIGS. 41 and 42.
[0020]
FIG. 41 outlines, as an example of the special playback processing in the data conversion unit 223, the data conversion process for converting normal playback video data encoded in MPEG2 video (video data read from the storage unit 229) into video data that realizes fast-forward playback and satisfies the ISO/IEC 13818-2 specification. In the figure, I represents an I picture, P represents a P picture, and B represents a B picture. In MPEG2 video, the encoding order (the order in which data is encoded in the bitstream) and the actual display order may differ from each other because encoding uses prediction between pictures. FIG. 41 therefore shows both the encoding order and the display order. FIG. 41(a) shows the encoding order of normal playback video data, and FIG. 41(b) shows the display order when normal playback video data is decoded and displayed. FIG. 41(c) shows the encoding order when conversion processing for special playback is performed such that the normal playback section US is followed by the fast-forward playback section FS and then by a return to the normal playback section US. FIG. 41(d) shows the display order when the conversion processing for special playback shown in FIG. 41(c) is performed.
[0021]
In the data conversion unit 223, for the fast-forward playback section FS in which special playback is performed, data conversion processing is performed such that, as indicated by E_k, E_m, E_n in the figure, the I pictures (I_k, I_m, I_n) in the normal playback video data shown in FIG. 41(a) are extracted, and repeat pictures B_R are inserted between these I pictures so that the decoder buffer does not fail. The repeat picture B_R is a picture that repeats the prediction source image and is treated as a B picture at the time of decoding. Inserting repeat pictures B_R also has the effect of adjusting the speed of fast-forward playback.
[0022]
FIG. 42, like FIG. 41, outlines, as an example of the special playback processing in the data conversion unit 223, the data conversion process for converting normal playback video data encoded in MPEG2 video (video data read from the storage unit 229) into video data that realizes rewind playback and satisfies the ISO/IEC 13818-2 specification. FIG. 42(a) shows the encoding order of the normal playback video data, and FIG. 42(b) shows the display order when the normal playback video data is decoded and displayed. FIG. 42(c) shows the encoding order when conversion processing for special playback is performed such that the normal playback section US is followed by the rewind playback section BS and then by a return to the normal playback section US. FIG. 42(d) shows the display order when the conversion processing for special playback shown in FIG. 42(c) is performed.
[0023]
In the data conversion unit 223, for the rewind playback section BS in which special playback is performed, data conversion processing is performed such that, as indicated by E_k, E_m, E_n in FIG. 42, the I pictures (I_k, I_m, I_n) in the normal playback video data shown in FIG. 42(a) are extracted and their order is changed, and repeat pictures B_R are inserted between these I pictures so that the decoder buffer does not fail.
[0024]
The special reproduction video data converted by the data conversion unit 223 as described above is distributed to the decoding terminal 232 through the multiplexing unit 224 and the subsequent components.
[0025]
[Problems to be solved by the invention]
By the way, in conventional television broadcasting, one image signal is displayed on the screen of an image display device, and only one audio signal is output from a speaker. In recent years, however, forming one scene using multimedia data including video data, audio data, text data, graphic data, and the like has also been considered. Methods of describing the structure of a scene using such multimedia data include HTML (HyperText Markup Language), which is used on so-called Internet home pages and the like; MPEG4 BIFS (Binary Format for the Scene), which is the scene description system defined in ISO/IEC 14496-1; VRML (Virtual Reality Modeling Language), defined in ISO/IEC 14772; Java (trademark); and the like. Hereinafter, data describing the configuration of a scene is referred to as a scene description.
[0026]
An example of scene description using VRML and MPEG4 BIFS will be described with reference to FIG. 43. FIG. 43 shows the contents of the scene description. In VRML, the scene description is written as text data as shown in FIG. 43, and in MPEG4 BIFS, the scene description is a binary encoding of this text data.
[0027]
VRML and MPEG4 BIFS scene descriptions are expressed in basic description units called nodes, and in the example of FIG. 43, nodes are indicated by bold italic characters. A node is a unit that describes an object to be displayed, a connection relationship between objects, and the like, and includes data called a field to indicate the characteristics and attributes of the node. For example, the Transform node in FIG. 43 is a node capable of designating three-dimensional coordinate transformation, and the translation amount of the coordinate origin is designated in the translation field in the node. In addition, there is a field in which other nodes can be specified. For example, the Transform node in FIG. 43 has a Children field indicating a child node group whose coordinates are transformed by the Transform node, and for example, a Shape node is grouped by the Children field. In order to arrange the objects to be displayed in the scene, the nodes representing the objects are grouped together with the nodes representing the attributes, and further, the nodes are grouped by the nodes representing the arrangement positions. For example, the object represented by the Shape node in FIG. 43 is arranged in the scene by applying the parallel movement specified by the Transform node that is the parent node.
[0028]
The video data, audio data, and the like are displayed spatially and temporally arranged according to the scene description. For example, the MovieTexture node in FIG. 43 specifies that a moving image specified by an ID of 3 is to be pasted and displayed on the surface of a cube.
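As a minimal sketch only, the node and field structure described above can be modelled as a small tree. The class name `Node`, the helper `placed_shapes`, the field layout, and the coordinate values below are assumptions made for illustration; they are not taken from FIG. 43 and do not reproduce the VRML or MPEG4 BIFS syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One scene description node: a type name, its fields, and child nodes."""
    kind: str
    fields: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def placed_shapes(node, origin=(0.0, 0.0, 0.0)):
    """Yield (Shape node, position), applying each parent Transform's translation."""
    if node.kind == "Transform":
        shift = node.fields.get("translation", (0.0, 0.0, 0.0))
        origin = tuple(o + s for o, s in zip(origin, shift))
    if node.kind == "Shape":
        yield node, origin
    for child in node.children:
        yield from placed_shapes(child, origin)


# Hypothetical scene: a cube textured with the moving image whose ID is 3,
# translated by a parent Transform node. The coordinate values are made up.
texture = Node("MovieTexture", {"url": 3})
cube = Node("Shape", {"geometry": "Box", "texture": texture})
scene = Node("Transform", {"translation": (2.0, 1.0, 0.0)}, [cube])

for shape, position in placed_shapes(scene):
    print(shape.fields["geometry"], position, shape.fields["texture"].fields["url"])
```

The traversal mirrors the grouping described above: the Shape node is placed in the scene only after the translation of its parent Transform node has been applied.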
[0029]
As mentioned above, in recent years it has been considered to form one scene using multimedia data composed of video data, audio data, text data, graphic data, and the like. In a conventional data distribution system, however, only video data is decoded and displayed during special playback.
[0030]
For this reason, even if multimedia data including, for example, video data, audio data, text data, graphic data, and the like is distributed, only video data is decoded and displayed during special playback. Even if data including data other than video, such as subtitle text, is distributed, the conventional data distribution system does not decode or display data other than video during special playback.
[0031]
For this reason, it is desired to be able to decode and display data other than video data such as audio data and subtitle text data during special playback such as fast-forward playback and rewind playback.
[0032]
At present, methods and means for distributing and decoding scene description data for configuring a scene as described above even during special playback have not been realized. For this reason, in a conventional data distribution system, for example, even if a single scene is configured using the above-described multimedia data and the multimedia data is distributed, the scene cannot be configured during special playback. As a result, there arises a problem that, for example, the scene displayed at the start and end of special playback becomes discontinuous.
[0033]
For this reason, it is desired to realize a technique and means for distributing and decoding the scene description data even during special reproduction.
[0034]
Furthermore, in order to deliver, decode, and display the above-mentioned multimedia data and scene description data even during special playback, it is necessary to be able to display the data while maintaining the synchronization relationship between the data. It is also necessary to distribute the data as data that satisfies an evaluation criterion such as a transmission bit rate (a criterion under which the decoder buffer does not fail).
[0035]
Therefore, the present invention has been made in view of such circumstances, and its purpose is to provide a data processing method and apparatus that enable decoding and display of data other than video when special playback is performed, that realize a method and means for distributing and decoding scene description data, and that can maintain the synchronization relationship between the data and distribute the data as data satisfying evaluation criteria such as the transmission bit rate.
[0036]
[Means for Solving the Problems]
The data processing method of the present invention is a data processing method for transmitting data encoded for each predetermined encoding unit from a transmission side to a reception side, and comprises: a step of receiving a special reproduction designation signal supplied from the reception side; a step of, based on the received special reproduction designation signal, selecting, in accordance with bit rate adjustment of the output data, encoding units at the time of outputting data used for special reproduction on the receiving side, and converting time information related to reproduction of the selected encoding units according to the special reproduction; a step of changing, in accordance with the bit rate adjustment of the output data, scene description data describing a display area at the time of outputting the data used for the special reproduction; and a step of outputting the converted time information, the changed scene description data, and the data used for the special reproduction to the receiving side.
[0037]
Further, the data processing apparatus of the present invention is a data processing apparatus for transmitting data encoded for each predetermined encoding unit to a receiving side, and solves the above-mentioned problem by comprising: data conversion means for selecting, based on a special reproduction designation signal supplied from the receiving side and in accordance with bit rate adjustment of the output data, encoding units at the time of outputting data used for special reproduction on the receiving side, and for converting time information related to reproduction of the selected encoding units according to the special reproduction; filter means for changing, based on the special reproduction designation signal and in accordance with the bit rate adjustment of the output data, scene description data describing a display area at the time of outputting the data used for special reproduction; and transmission means for outputting, to the receiving side, the time information converted by the data conversion means, the scene description data changed by the filter means, and the data used for special reproduction.
[0040]
That is, according to the present invention, for example, the display start time and the display time or display end time of each display unit of normal reproduction data are calculated according to the special reproduction and rewritten, thereby converting the data into special reproduction data, so that the synchronization relationship between data can be preserved and displayed even during special playback on the terminal. In addition, according to the present invention, for example, by selecting display units in the normal reproduction data and delivering them so as to satisfy an evaluation criterion such as a bit rate, data that satisfies the evaluation criterion can be delivered even during special reproduction. Furthermore, according to the present invention, by converting the display units in the normal reproduction data so as to satisfy an evaluation criterion such as a bit rate, data that satisfies the evaluation criterion can be distributed even during special reproduction.
[0041]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.
[0042]
FIG. 1 shows a configuration example of a data distribution system according to the embodiment of the present invention, in which video data such as still images and moving images, audio data, multimedia data such as text data and graphic data, and scene description data are distributed via a transmission medium, received at a decoding terminal, decoded, and displayed. The following description takes as an example the case where video data or the like is packetized into a transport stream (TS) defined by ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission) 13818-1 (so-called MPEG2 Systems) and distributed.
[0043]
In FIG. 1, the server 10 includes a storage unit 9 that stores video data such as still images and moving images, audio data, multimedia data such as text data and graphic data, scene description data, and the like. The data read from the storage unit 9 is sent to the multiplexing unit 4 via, for example, a data conversion unit 7 described later. The multiplexing unit 4 packetizes the data output from the data conversion unit 7 into a TS. The TS packets are further converted into distribution data 22 by the transmission unit 5, output to the transmission medium 21, and distributed to, for example, the decoding terminal 12. At this time, the TS distribution data 22 is transmitted using the protocol used on the transmission medium 21. For example, a TS that meets the requirements of ISO/IEC 13818-1 can be transmitted over a transmission medium conforming to the IEEE (Institute of Electrical and Electronics Engineers) 1394 standard, according to the method defined in "Digital Interface for consumer audio/video equipment" of IEC 61883. Note that the multiplexing unit 4 and the transmission unit 5 may be integrated.
[0044]
In the decoding terminal 12, the distribution data 22 is received by the reception unit 13 and sent to the separation unit 14. The separation unit 14 separates the data from the TS packets and sends each piece of data to the corresponding one of a plurality of decoding units 15_1 to 15_n. The decoding units 15_1 to 15_n decode the supplied encoded data.
[0045]
When scene description data describing the scene configuration is distributed, the scene composition unit 16 synthesizes the data decoded by the decoding units 15_1 to 15_n according to the scene description data. The data synthesized by the scene composition unit 16 is sent to, for example, a display device or a speaker (not shown) and is displayed and emitted as the image and sound of the scene. A plurality of decoding terminals 12 may be connected.
[0046]
Further, when special reproduction display is performed in the decoding terminal 12 of this data distribution system, for example, a special reproduction designation signal 6 corresponding to an operation by the user of the decoding terminal 12 is transmitted from a transmission medium interface unit (not shown) or the like in the decoding terminal 12 to the server 10 via the transmission medium 21. The special reproduction designation signal 6 is a signal that includes the type of special reproduction, such as fast-forward reproduction, rewind reproduction, frame advance reproduction, or slow reproduction, and a designation of the data stored in the storage unit 9. If the server 10 and the decoding terminal 12 are connected over a short distance, as in a home network, and the user can operate the front panel or remote controller of the server 10, the user can also input the special reproduction designation signal 6 directly to the server 10 by operating that front panel, remote controller, or the like.
[0047]
The special reproduction designation signal 6 input to the server 10 is input to the special reproduction control unit 1 provided in the server 10. In response to the special reproduction designation signal 6, the special reproduction control unit 1 generates a control signal 2 for special reproduction control, which includes the type of special reproduction and the designation of data, and sends it to the data conversion unit 7. Note that any number of data conversion units 7 may be provided, depending on the number of data to be distributed.
[0048]
The data conversion unit 7 reads the data from the storage unit 9 under the control of the special reproduction control unit 1 by the control signal 2, and converts it into special reproduction data that realizes special reproduction of the type specified by the control signal 2.
[0049]
The detailed configuration and operation of the data conversion unit 7 in the data distribution system according to the embodiment of the present invention will be described below.
[0050]
FIG. 2 shows a detailed configuration of the server 10 of the data distribution system including the data conversion unit 7 according to the first embodiment of the present invention. Since the operations of the constituent elements other than the data conversion unit 7 are the same as described above, their detailed description is omitted.
[0051]
In FIG. 2, the data conversion unit 7 of the first embodiment includes a reading unit 17 that reads data from the storage unit 9 under the control of the control signal 2 from the special reproduction control unit 1, and a time information rewriting unit 19 that rewrites the time information encoded in the data in accordance with special reproduction. When there are a plurality of data conversion units 7, the reading unit 17 may be shared by all the data conversion units 7.
[0052]
The reading unit 17 reads data for normal reproduction designated by the control signal 2 from the special reproduction control unit 1 from the storage unit 9 and sends it to the time information rewriting unit 19.
[0053]
The time information rewriting unit 19 converts the time information of the normal reproduction data read from the storage unit 9 by the reading unit 17 into the time information of the data after conversion according to the special reproduction, and encodes it in the data to be output. The time information of the data includes the data arrival time, display start time, display end time, display time, and decoding time. In the case of audio data, these pieces of time information are actually times related to sound emission, but since image display and sound emission are related, expressions such as display start time, display end time, and display time are used as above. The same applies to the following description. In the first embodiment, the data whose time information has been rewritten by the time information rewriting unit 19 is sent to the multiplexing unit 4.
[0054]
The time information conversion process in the time information rewriting unit 19 of the data conversion unit 7 will be described with reference to FIG. 3. Note that FIG. 3 shows an example of the time information conversion processing when fast-forward playback is realized.
[0055]
FIG. 3(a) shows the display timing of the data when the time information conversion processing for special reproduction by the time information rewriting unit 19 is not performed on the normal reproduction data read from the storage unit 9 (that is, when normal playback is performed at the decoding terminal 12). Note that in some encoding methods, such as MPEG2 video, the actual display order may differ from the encoding order (the order in which data is encoded in the bitstream); to facilitate understanding, FIG. 3 shows the display order. Each of AU30, AU31, AU32, and so on in FIG. 3 represents one display unit of data and corresponds to a picture in the case of video data. Data is normally encoded for each display unit. This display unit, that is, the encoding unit, is hereinafter referred to as an AU (access unit). One AU starts display at the display start time Ts and ends display at the display end time Te, after the display time ΔT has elapsed. Note that the display time ΔT of one AU generally differs depending on the encoding method.
[0056]
On the other hand, FIG. 3(b) shows the display timing of the converted data when the time information conversion processing for special reproduction (in this case, fast-forward reproduction) by the time information rewriting unit 19 is performed on the normal reproduction data read from the storage unit 9, that is, when special reproduction is performed at the decoding terminal 12. Specifically, FIG. 3(b) shows the display timing in the case where a fast-forward playback section (special playback section) starts partway through AU 30′ in the normal playback section, AU 31′ lies in the fast-forward playback section, and AU 32′ following AU 31′ lies in the normal playback section.
[0057]
Here, when fast-forward playback is performed as special playback, as in the example of FIG. 3, the relationship between a time T on the time t when the conversion processing for special playback is not performed (hereinafter referred to as the time t before conversion) and the corresponding time T′ on the time t′ when the conversion processing for special playback is performed (hereinafter referred to as the output time t′ after conversion) changes as follows.
[0058]
For this reason, in the data conversion unit 7 (time information rewriting unit 19) of the embodiment of the present invention, the time T′ on the output time t′ after conversion is calculated as shown in equation (1), using the special reproduction start time To′ on the output time t′ after conversion and the special reproduction start time To on the time t before conversion (the start time on the time t before conversion corresponding to the special reproduction start time To′).
T′ = To′ + (T − To) / n   (1)
Here, n in equation (1) represents the playback speed during special playback. The value of n is 2 for double-speed playback and is negative for rewind playback.
[0059]
On the other hand, during normal playback, the time T′ on the output time t′ after conversion is calculated as shown in equation (2), using the special reproduction end time Ti′ on the output time t′ after conversion and the special reproduction end time Ti on the time t before conversion (the end time on the time t before conversion corresponding to the special reproduction end time Ti′).
T′ = Ti′ + (T − Ti)   (2)
Also, during normal playback, the immediately preceding special playback end time does not change, so the special playback start time at the start of the next special playback is obtained from equation (2) as shown in equation (3).
To′ = Ti′ + (To − Ti)   (3)
Based on the above equations (1) to (3), the data conversion unit 7 can calculate, during both normal playback and special playback, the display start time Ts′ and the display end time Te′ of an AU on the output time t′ after conversion from the display start time Ts and the display end time Te of the AU on the time t before conversion. Further, the display time ΔT′ is calculated either by multiplying the display time ΔT on the time t before conversion by 1/n (n is the reproduction speed) or by subtracting the display start time Ts′ from the display end time Te′.
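The calculation of equations (1) to (3) can be sketched as follows. This is a minimal illustration, not the implementation of the time information rewriting unit 19: the function name `convert_time` and its parameters are hypothetical, only a single special reproduction section with a positive speed n (fast-forward or slow reproduction) is handled, and the previous special reproduction end time is assumed to default to the start of the data (Ti = Ti′ = 0).

```python
def convert_time(T, To, Ti, n, prev_Ti=0.0, prev_Ti_out=0.0):
    """Map a time T on the time t before conversion to T' on the output time t'.

    To, Ti : special reproduction start/end times on the time t before conversion
    n      : reproduction speed (assumed positive in this sketch)
    prev_Ti, prev_Ti_out : preceding special reproduction end time before and
                           after conversion (hypothetical defaults: start of data)
    """
    # Equation (3): special reproduction start time on the output time t'.
    To_out = prev_Ti_out + (To - prev_Ti)
    if T < To:
        # Equation (2): normal reproduction before the special section.
        return prev_Ti_out + (T - prev_Ti)
    if T <= Ti:
        # Equation (1): inside the special reproduction section.
        return To_out + (T - To) / n
    # Equation (1) evaluated at T = Ti gives the end time Ti' on t'.
    Ti_out = To_out + (Ti - To) / n
    # Equation (2): normal reproduction after the special section.
    return Ti_out + (T - Ti)


# Example with made-up times: double-speed playback between 10 s and 20 s.
Ts, Te = 12.0, 14.0                      # one AU inside the special section
Ts_out = convert_time(Ts, To=10.0, Ti=20.0, n=2)
Te_out = convert_time(Te, To=10.0, Ti=20.0, n=2)
delta_out = Te_out - Ts_out              # equals (Te - Ts) / n
```

Setting n to 0.5 in the same function stretches the section instead of compressing it, which corresponds to slow reproduction.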
[0060]
In this embodiment, the special reproduction start time, the special reproduction end time, and the special reproduction speed n are specified from the special reproduction control unit 1 to the data conversion unit 7 together with the control signal 2. Note that these values may instead be specified by another data conversion unit (not shown). For example, when the data distribution system according to the present embodiment includes a data conversion unit 223 that converts video data for special reproduction, as shown in the aforementioned FIG. 40 and in patent application 2000-178999 and patent application 2000-179000, and the special reproduction end time, special reproduction start time, and special reproduction speed are determined by the data conversion unit 223 in accordance with the video data display timing, the data conversion unit 223 may specify the special reproduction end time, special reproduction start time, and special reproduction speed directly to the data conversion unit 7 of the present embodiment.
[0061]
According to the data distribution system of the present embodiment, as described above, by calculating the display start time Ts′, the display end time Te′, and the display time ΔT′ of each AU on the output time t′ after conversion during normal playback and special playback, the time information rewriting unit 19 can rewrite the display start time, the display end time, and the display time encoded in the output data according to the special reproduction. When time information such as the decoding time and the data arrival time is also encoded in the data, the time information rewriting unit 19 can likewise convert it into time information on the output time t′ after conversion, based on equations (1) and (2), and output it.
[0062]
As described above, according to the present embodiment, when special playback is executed at the decoding terminal 12, the time information of the normal playback data is converted into the time information of the data after conversion according to the special playback, and that time information is encoded into the data and distributed from the server 10. That is, according to the data distribution system of the present embodiment, the time information of the distribution data received by the decoding terminal 12 has already been converted for special reproduction in the server 10, so the decoding terminal 12 needs no special processing for special reproduction: if it decodes and displays at the timing given by time information such as the display time, as in normal playback, the display result of special playback is obtained automatically. In other words, the decoding terminal 12 of the present embodiment need not be a special terminal capable of handling distribution data dedicated to special reproduction, and need not perform special processing for special reproduction. Furthermore, according to the present embodiment, since a plurality of distributed data are converted at the same playback speed, no synchronization deviation arises between the plurality of data, and deviations do not accumulate.
[0063]
Next, the time information conversion processing in the time information rewriting unit 19 when slow reproduction is performed as special reproduction will be described with reference to FIG. 4, which is drawn in the same manner as FIG. 3.
[0064]
FIG. 4(a) shows the display timing of normal reproduction data on the time t before conversion, as in FIG. 3(a). Each of AU40, AU41, AU42, and so on in FIG. 4 represents one display unit of data. FIG. 4(b), like FIG. 3(b), shows the display timing of the converted data when the time information conversion processing for special reproduction (in this case, slow reproduction) by the time information rewriting unit 19 is performed. Specifically, FIG. 4(b) shows the display timing in the case where a slow playback section starts partway through AU 40′ in the normal playback section, AU 41′ lies in the slow playback section, and AU 42′ following AU 41′ lies in the normal playback section.
[0065]
Here, for example, in the case of performing 0.5 × speed playback as special playback, the data conversion unit 7 (time information rewriting unit 19) according to the embodiment of the present invention sets the value of the playback speed n to 0.5 and the above formula (1) Is calculated.
[0066]
As in the example of FIG. 4, the time information conversion process in the data conversion unit 7 of the present embodiment is also effective as described above even in the case of performing special reproduction whose reproduction speed is lower than the normal speed. If the decoding terminal 12 performs the same decoding and display as in normal playback without performing special processing for slow playback, the display result of the slow playback can be obtained.
[0067]
Next, the time information conversion processing in the time information rewriting unit 19 when special reproduction such as a jump, which moves the reproduction position to a temporally discontinuous display unit, is performed will be described with reference to FIG. 5, which is drawn in the same manner as FIG. 3.
[0068]
FIG. 5(a) shows the display timing of normal reproduction data on the time t before conversion, as in FIG. 3(a). Each of AU50, AU51, AU52, and so on in FIG. 5 represents one display unit of data. FIG. 5(b), like FIG. 3(b), shows the display timing of the converted data when the time information conversion processing for special reproduction (in this case, a jump) by the time information rewriting unit 19 is performed. Specifically, FIG. 5(b) shows the display timing in the case where a jump is performed partway through AU 50′ in the normal playback section, AU 51, which lies between the special playback start time To′ (the jump start time) and the special playback end time Ti′ (the jump end time), is not output, and AU 52′, which follows the special playback end time Ti′, is output after the special playback start time To′ that falls within AU 50′.
[0069]
Here, in the case of a jump, there is no playback speed during special playback, so the special playback control unit 1 designates the special playback start time and the special playback end time to the data conversion unit 7. Because the special reproduction start time To on the time t before conversion and the special reproduction start time To′ on the time t′ after conversion can be converted into each other by the above equation (3), either one may be specified. For the special reproduction end time, the end times Ti and Ti′ on both the time before conversion and the time after conversion are specified. However, if the special reproduction end time Ti′ on the time t′ after conversion is equal to the special reproduction start time To′, Ti′ need not be specified.
[0070]
In the example of FIG. 5, the data conversion unit 7 does not output AU 51, which lies between the jump start time To′ and the jump end time Ti′. AU 50, which is displayed across the jump start time To′, is either output with its time information changed so that its display end time becomes To′, or is not output. Similarly, AU 52, which is displayed across the jump end time Ti′, is either output with its time information changed so that its display start time becomes Ti′, or is not output.
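The handling of AUs around a jump can be sketched as below. This is an illustration under stated assumptions rather than the method of the data conversion unit 7: the function name `apply_jump` and the tuple representation of an AU are hypothetical, only a forward jump (To earlier than Ti) is considered, and of the two options described above, AUs that straddle a jump boundary are trimmed rather than dropped.

```python
def apply_jump(aus, To, Ti, To_out, Ti_out=None):
    """Return the (Ts', Te') pairs that are output when jumping from To to Ti.

    aus    : list of (Ts, Te) display start/end times on the time t before conversion
    To, Ti : jump start/end times on the time t before conversion (To < Ti assumed)
    To_out : jump start time To' on the output time t' after conversion
    Ti_out : jump end time Ti' on t'; when omitted, Ti' is taken equal to To'
    """
    if Ti_out is None:
        Ti_out = To_out
    before = To_out - To   # offset of the normal section before the jump
    after = Ti_out - Ti    # offset of the normal section after the jump, eq. (2)
    out = []
    for ts, te in aus:
        if te <= To:                 # entirely before the jump
            out.append((ts + before, te + before))
        elif ts < To:                # straddles the jump start: end becomes To'
            out.append((ts + before, To_out))
        elif ts >= Ti:               # entirely after the jump
            out.append((ts + after, te + after))
        elif te > Ti:                # straddles the jump end: start becomes Ti'
            out.append((Ti_out, te + after))
        # otherwise the AU lies inside the skipped span and is not output
    return out


# Made-up timings loosely following FIG. 5: three AUs of 4 s each,
# with a jump from 2 s to 9 s on the time t before conversion.
converted = apply_jump([(0.0, 4.0), (4.0, 8.0), (8.0, 12.0)],
                       To=2.0, Ti=9.0, To_out=2.0)
```

In this example the first AU is shortened to end at To′, the second is omitted, and the third starts at Ti′, so the output timeline has no gap when Ti′ equals To′.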
[0071]
Even in the case of performing special reproduction such as a jump, which moves the reproduction position to a temporally discontinuous display unit as in the example of FIG. 5, the time information conversion processing in the data conversion unit 7 of the present embodiment is effective as described above. Therefore, if the decoding terminal 12 performs the same decoding and display as in normal playback, without special processing for the jump, the display result of the jump can be obtained.
[0072]
Further, according to the present invention, by converting the scene description data describing the structure of the scene according to the special reproduction, the scene description data can be distributed and decoded even during the special reproduction. Thus, it is possible to avoid the inconvenience that the scene displayed at the end of the special reproduction is discontinuous, for example.
[0073]
In the above-described example, when time information such as display time and decoding time is encoded and added to the data itself, the time information rewriting unit 19 of the data conversion unit 7 rewrites and outputs the time information. When the time information is instead added to the data by the multiplexing unit 4, the data conversion unit 7 notifies the multiplexing unit 4 of the change of the time information, and the multiplexing unit 4 adds the changed time information to the data. Similarly, when the time information is added to the data by the transmission unit 5, the data conversion unit 7 notifies the transmission unit 5 of the change of the time information, and the transmission unit 5 adds the changed time information. The same applies to the other embodiments described later.
[0074]
Incidentally, in a data distribution system that distributes multimedia data such as video data, audio data, text data, and graphic data together with scene description data, and decodes and displays the data, there is a demand to distribute data that satisfies evaluation criteria such as the bit rate even during special playback.
[0075]
That is, the delivery data during fast-forward playback as in the example of FIG. 3 is compressed on the time axis compared to the delivery data during normal playback, and the average bit rate is higher than that during normal playback. On the other hand, in the case of a system that distributes data via a transmission medium as in this embodiment, the upper limit of the bit rate allowed at the time of distribution is determined according to the transmission capacity of the transmission medium and the capability of the decoding terminal. For example, if the bit rate of the distribution data exceeds the upper limit of the bit rate allowed for the distribution, data delay or loss occurs. In such a case, for example, if the bit rate of the distribution data is limited, it is considered that the bit rate of the distribution data can be prevented from exceeding the upper limit bit rate allowed at the time of distribution.
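As a rough illustration of why fast-forward playback raises the average bit rate, the following Python sketch assumes that the same AUs are delivered within a display interval shortened by the playback speed. The function names and the idea of a single scalar speed factor are assumptions made for illustration and are not taken from the specification.

```python
def average_bit_rate(au_sizes_bits, duration_s, speed=1.0):
    """Average bit rate when AUs that span duration_s at normal speed
    are delivered within duration_s / speed (assumed time compression)."""
    if duration_s <= 0 or speed <= 0:
        raise ValueError("duration_s and speed must be positive")
    return sum(au_sizes_bits) / (duration_s / speed)


def exceeds_limit(au_sizes_bits, duration_s, speed, max_bit_rate):
    """True if delivery at the given speed would exceed the allowed bit rate."""
    return average_bit_rate(au_sizes_bits, duration_s, speed) > max_bit_rate
```

Under these assumptions, doubling the playback speed doubles the average bit rate of an otherwise unchanged set of AUs, which is the situation in which the upper limit may be exceeded.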
[0076]
Further, for example, if the data included in the distribution data within a certain period of time relatively increases, the difficulty of decoding, scene synthesis, and display increases, and there is a risk that the data is not correctly displayed on the decoding terminal. In such a case, for example, if the difficulty of decoding of the distribution data, scene synthesis, and display is limited, it is considered that the risk of not being correctly displayed on the decoding terminal can be reduced.
[0077]
Therefore, the second embodiment of the present invention makes it possible to distribute data that satisfies evaluation criteria such as the bit rate even during special playback, thereby preventing the occurrence of data delay and loss and allowing the scene to be displayed correctly on the decoding terminal.
[0078]
FIG. 6 shows a detailed configuration of the server 10 of the data distribution system including the data conversion unit 7 according to the second embodiment of the present invention.
[0079]
In FIG. 6, the data conversion unit 7 includes a reading unit 17 that reads data from the storage unit 9 under the control of the control signal 2 from the special reproduction control unit 1, and a time information rewriting unit 19 that rewrites the time information encoded in the output data in accordance with special reproduction; in addition, it includes a scheduler 18 that selects the AUs to be output based on an evaluation criterion such as the bit rate. As in the first embodiment, the data conversion unit 7 converts the time information from the time of the normal reproduction data before conversion to the time after conversion, encodes the converted time information into the data, and outputs the data.
[0080]
The conversion process in the scheduler 18 of the data conversion unit 7 in the case of the second embodiment will be described with reference to FIGS. 7 and 8.
[0081]
FIG. 7 is drawn in the same manner as FIG. 3. FIG. 7A shows the display timing of normal reproduction data on the time t before conversion, as in FIG. 3A. AU70, AU71, AU72, AU73, and so on in FIG. 7 each represent one display unit of data. FIGS. 7B, 7C, and 7D show, in the same manner as FIG. 3B, the display timing of the converted data when the time information rewriting unit 19 performs time information conversion processing for special reproduction (in this case, fast-forward playback) and the scheduler 18 of the present embodiment selects AUs according to the bit rate allowed at the time of distribution. That is, in FIG. 7B, the scheduler 18 selects both AU71 and AU72 in the fast-forward playback section (special playback section), the time information rewriting unit 19 converts them into AU71′ and AU72′, and the subsequent AU73′ represents the display timing in the normal playback section. In FIG. 7C, the scheduler 18 selects only AU71 in the fast-forward playback section, the time information rewriting unit 19 converts it into AU71′, and AU72 is not output; the subsequent AU73′ represents the display timing in the normal playback section. FIG. 7D shows the display timing when the scheduler 18 selects neither AU71 nor AU72 in the fast-forward playback section and the subsequent AU73′ falls in the normal playback section.
[0082]
Here, there are two AUs, AU71 and AU72, in the special playback section (fast-forward playback section) on the time t before conversion shown in FIG. 7A. In the first embodiment described above, the time information of AU71 and AU72 is converted according to the special reproduction speed, and they are output as AU71′ and AU72′. However, as shown in FIG. 8, when special playback (fast-forward playback in the examples of FIGS. 7 and 8) is performed, the bit rate of the distribution data changes according to the playback speed. If the changed bit rate exceeds the bit rate allowed by the transmission medium or the decoding terminal, data delay or loss occurs.
[0083]
Therefore, the scheduler 18 included in the data conversion unit 7 of the present embodiment selects the AUs to be output and the AUs not to be output so as to satisfy the bit rate allowed for the distribution data. For example, when the bit rate allowed for the distribution data is equal to or higher than the bit rate BR81 obtained when only AU71 is output and AU72 is not, and less than the bit rate BR80 obtained when both AU71 and AU72 are output, the scheduler 18 determines that AU71 is output and AU72 is not output. The conversion output in this case is as shown in FIG. 7C. When the allowed bit rate is less than BR81, the scheduler 18 determines that neither AU71 nor AU72 is output. The conversion output in this case is as shown in FIG. 7D. On the other hand, when the allowed bit rate is equal to or higher than BR80, the scheduler 18 determines that both AU71 and AU72 are output. The conversion output in this case is as shown in FIG. 7B. The time information of the AUs selected and output by the scheduler 18 in this manner is then converted by the time information rewriting unit 19 based on the playback speed of the special playback, as described above.
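The three cases above can be sketched in Python as follows. This is a minimal illustration, not the scheduler 18 itself: the function name, the AU sizes, and the policy of stopping at the first AU that does not fit are assumptions chosen so that the result matches the BR80/BR81 cases described in the text.

```python
def select_aus(au_sizes_bits, section_duration_s, allowed_bit_rate):
    """Return the indices of the AUs to output in a special playback section.

    section_duration_s is the section length on the converted time axis.
    AUs are considered in display order, and selection stops at the first
    AU that would push the bit rate over the limit (an assumed policy).
    """
    selected, total_bits = [], 0
    for index, size in enumerate(au_sizes_bits):
        if (total_bits + size) / section_duration_s > allowed_bit_rate:
            break
        selected.append(index)
        total_bits += size
    return selected


# Hypothetical sizes for AU71 and AU72 in a 1-second converted section:
# BR81 (AU71 only) = 4000 bit/s, BR80 (AU71 and AU72) = 8000 bit/s.
sizes = [4000, 4000]
both = select_aus(sizes, 1.0, 8000)     # allowed >= BR80
only_71 = select_aus(sizes, 1.0, 5000)  # BR81 <= allowed < BR80
neither = select_aus(sizes, 1.0, 3000)  # allowed < BR81
```

With these hypothetical numbers the three calls correspond to the outputs of FIG. 7B, FIG. 7C, and FIG. 7D respectively.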
[0084]
As described above, according to the second embodiment, display units (AUs) in the normal reproduction data are selected and output during special reproduction so as to satisfy evaluation criteria such as the bit rate, so that data satisfying those criteria can be distributed even during special reproduction. Note that the evaluation criterion is not limited to the bit rate. For example, it may be an evaluation criterion representing the difficulty of data decoding, scene composition, display, and so on, such as the number of polygons allowed in a certain time or the number of nodes in the scene description data. It may also be an evaluation criterion that limits the data that can be output in a certain time, such as the number of characters in text data.
[0085]
Furthermore, when the data conversion unit 7 according to the second embodiment of the present invention selects the display units (AUs) to be output and those not to be output as described above, it may preferentially output display units encoded without using prediction between display units, and choose not to output display units encoded using prediction. As a result, the decoding terminal can perform predictive decoding using the display units encoded without prediction as prediction sources.
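One way to picture this priority rule is the Python sketch below. The tuple layout, the bit budget, and the function name are hypothetical; the sketch also ignores which predicted AU depends on which reference, which a real selector would have to track.

```python
def select_with_intra_priority(aus, bit_budget):
    """Choose AUs within bit_budget, preferring AUs coded without prediction.

    aus: list of (name, size_bits, uses_prediction) in display order.
    Returns the names of the chosen AUs, in display order.
    """
    # Stable sort: non-predicted AUs (False) come before predicted AUs (True).
    order = sorted(range(len(aus)), key=lambda i: aus[i][2])
    chosen, used = set(), 0
    for i in order:
        size = aus[i][1]
        if used + size <= bit_budget:
            chosen.add(i)
            used += size
    return [aus[i][0] for i in range(len(aus)) if i in chosen]
```

For example, with one non-predicted AU of 5000 bits between two predicted AUs of 3000 bits each, a budget of 5000 bits keeps only the non-predicted AU.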
[0086]
In the second embodiment, an example has been given in which distribution data satisfying an evaluation criterion such as the bit rate is output by selecting whether or not each AU is output. However, it is also possible to output distribution data that satisfies such an evaluation criterion by converting the content of the AU itself, as in the third embodiment described below.
[0087]
FIG. 9 shows a detailed configuration of the server 10 of the data distribution system according to the third exemplary embodiment of the present invention.
[0088]
In FIG. 9, the server 10 is the same as in the first and second embodiments, except that a filter 23 is provided at the output stage of the data conversion unit 7 of any of the above-described embodiments.
[0089]
The filter 23 converts the data that has been converted for special reproduction by the data conversion unit 7 of the first or second embodiment, that is, the AU itself, so as to satisfy an evaluation criterion such as the bit rate. A plurality of data conversion units 7 and filters 23 may exist. That is, rather than only selecting which AUs are output and which are not, as the data conversion unit 7 of the second embodiment does, the filter 23 of the third embodiment converts the AU itself and thereby outputs data that satisfies evaluation criteria such as the bit rate. For example, in the case of text data, reducing the number of characters included in one AU reduces the amount of data to be distributed, so that the data is converted into data satisfying a desired bit rate and output.
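The text-data case can be sketched as follows. This is only an illustrative Python fragment: the function name, the UTF-8 encoding, and the choice to drop characters from the end of the AU are assumptions, since the specification does not say which characters are removed.

```python
def shrink_text_au(text, max_bytes, encoding="utf-8"):
    """Drop trailing characters until the encoded AU fits in max_bytes."""
    while text and len(text.encode(encoding)) > max_bytes:
        text = text[:-1]
    return text
```

Removing whole characters rather than bytes keeps the shortened AU decodable as text, which matters for multi-byte encodings.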
[0090]
According to the present embodiment, by converting the AU itself, it is possible to distribute data that satisfies evaluation criteria such as the bit rate even during special reproduction. Further, the time information of the AU input to the filter 23 has already been converted according to the special reproduction by the data conversion unit 7 of the first or second embodiment. Therefore, no special processing for special reproduction is required, and if the decoding terminal 12 performs decoding, display, and other processing in the same way as in normal playback, the display for special playback is realized automatically.
[0091]
A specific example of the filter 23 will be described below.
[0092]
As a first specific example, the filter 23 can handle the data in a scene description in division units, and convert and output the scene description division unit by division unit so as to satisfy evaluation criteria such as the transmission capacity. By using the filter 23 of the first specific example in combination with the data conversion unit 7 of the first or second embodiment of the present invention, it is possible to distribute data that satisfies evaluation criteria such as the bit rate even during special reproduction.
[0093]
The operation of the filter 23 of the first specific example applied to the third embodiment of the present invention will be described below.
[0094]
The filter 23 of the first specific example converts the input scene description based on hierarchization information. When outputting the scene description, the filter 23 obtains decoding terminal information indicating the decoding and display capability of the decoding terminal 12. The decoding terminal information is information indicating decoding and display capability, such as the image frame in which the decoding terminal 12 displays the scene description, the upper limit of the number of nodes, the upper limit of the number of polygons, and the upper limit of the included multimedia data such as audio and video. The hierarchization information input to the filter 23 consists of this decoding terminal information with information indicating the transmission capacity of the transmission medium 22 used for distribution of the scene description added to it. Based on the hierarchization information, the filter 23 converts the input scene description into scene description data having a hierarchical structure.
[0095]
According to the data distribution system of the third embodiment including the filter 23 of the first specific example, converting the scene description based on the hierarchization information as described above makes it possible to distribute scene description data suited to the transmission medium 22 used for distribution, and to distribute a scene description that matches the performance of the decoding terminal 12.
[0096]
The procedure of the scene description conversion process in the filter 23 is shown in FIG. 10.
[0097]
In FIG. 10, the filter 23 first divides the scene description into division candidate units in step S200, as described later. In FIG. 10, the division candidate number is denoted by n. Further, since the input scene description is converted into scene description data having a plurality of hierarchies, the hierarchy of the scene description data to be output is denoted by m. The hierarchy number m starts from 0, and a smaller number denotes a more fundamental hierarchy.
[0098]
Next, in step S201, the filter 23 determines whether the division candidate n can be output to the current hierarchy based on the hierarchization information. For example, when the hierarchization information limits the number of bytes of data allowed in the current hierarchy, the filter 23 checks whether the output scene description of the current hierarchy remains at or below that limit even when the division candidate n is added. If it is determined in step S201 that the division candidate n cannot be output to the current hierarchy, the process proceeds to step S202. If it can be output, the process proceeds to step S203.
[0099]
In step S202, the filter 23 advances the hierarchy number m by 1. That is, the output to the current hierarchy m is terminated, and thereafter, the scene description data is output to a new hierarchy. Then, the process proceeds to step S203.
[0100]
In step S203, the filter 23 outputs the division candidate n to the current hierarchy m. Then, the process proceeds to step S204.
[0101]
In step S204, the filter 23 determines whether or not all the division candidates have been processed. If all have been processed, the conversion process ends. On the other hand, if division candidates still remain, the process proceeds to step S205.
[0102]
In step S205, the filter 23 advances the division candidate number n by one. That is, the next division candidate is set as a processing target. Then, the processing is repeated from step S201.
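The loop of steps S201 to S205 can be sketched in Python as below, taking the byte-count limit mentioned in step S201 as the hierarchization information. The function name, the candidate sizes, and the treatment of layers beyond the end of the limit list as unlimited are assumptions for illustration; the division into candidates (step S200) is taken as already done.

```python
def assign_layers(candidate_sizes, layer_limits):
    """Assign division candidates to hierarchies following steps S201 to S205.

    candidate_sizes: byte size of each division candidate n = 0, 1, ...
    layer_limits: allowed bytes for each hierarchy m = 0, 1, ...
    Returns a list of hierarchies, each a list of candidate numbers.
    """
    layers = [[]]
    m, used = 0, 0
    for n, size in enumerate(candidate_sizes):       # S204/S205: next candidate
        limit = layer_limits[m] if m < len(layer_limits) else float("inf")
        # S201: can candidate n be output to the current hierarchy?
        # The first candidate of a hierarchy is always accepted.
        if layers[m] and used + size > limit:
            m += 1                                    # S202: advance hierarchy
            layers.append([])
            used = 0
        layers[m].append(n)                           # S203: output to hierarchy m
        used += size
    return layers


# Hypothetical sizes for three candidates and two different sets of limits.
two_layers = assign_layers([600, 500, 300], [1000, 1000])
three_layers = assign_layers([600, 500, 300], [1000, 600])
```

As in the flowchart, a candidate that does not fit is written to the next hierarchy without a second check, so the same input can yield two or three hierarchies depending only on the limits supplied.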
[0103]
Here, taking MPEG4 BIFS as an example, division in the scene description conversion processing by the filter 23 shown in FIG. 10 will be described with reference to FIG. 11.
[0104]
First, the contents of the scene description data in FIG. 11 will be described, and then the division in the scene description processing in the filter 23 will be described.
[0105]
In FIG. 11, a Transform node 302 is a node that can designate a three-dimensional coordinate transformation, and the translation amount of the coordinate origin can be designated in its translation field 303. Some fields can designate other nodes, so the scene description has a tree structure as shown in FIG. 12. An ellipse in FIG. 12 represents a node, a broken line between nodes represents an event propagation path, and a solid line between nodes represents a parent-child relationship. A node designated in a field of a parent node is called a child node of that parent. For example, the Transform node 302 in FIG. 11 has a Children field 304 indicating the group of child nodes whose coordinates are transformed by the Transform node, and the TouchSensor node 305 and the Shape node 306 are grouped as its child nodes. A node that groups child nodes in a Children field in this way is called a grouping node. A grouping node is a node defined in Chapter 4.6.5 of ISO/IEC 14772-1 and refers to a node having a field composed of a list of nodes. As defined in Chapter 4.6.5 of ISO/IEC 14772-1, there are special exceptions in which the field name is not “Children”. Hereinafter, the “Children” field is taken to include such exceptions.
[0106]
In order to arrange the objects to be displayed in the scene, the nodes representing the objects are grouped together with the nodes representing their attributes, and further grouped by the nodes representing their arrangement positions. The object represented by the Shape node 306 in FIG. 11 is placed in the scene with the translation specified by the Transform node 302, its parent node, applied. The scene description of FIG. 11 includes a Sphere node 307 representing a sphere, a Box node 312 representing a cube, a Cone node 317 representing a cone, and a Cylinder node 322 representing a cylinder. The result of decoding and displaying the scene description of this example is as shown in FIG.
[0107]
The scene description can also include user interaction. ROUTE in FIG. 11 represents event propagation. ROUTE 323 indicates that when the touchTime field of the TouchSensor node 305, to which the identifier 2 is assigned, changes, its value is propagated as an event to the startTime field of the TimeSensor node 318, to which the identifier 5 is assigned. In VRML, an identifier is represented by an arbitrary character string following the keyword DEF. In MPEG4 BIFS, a numerical value called a node ID (nodeID) is used as the identifier. When the user selects the Shape node 306 grouped in the Children field 304 of the Transform node 302 that is its parent node, the TouchSensor node 305 outputs the selected time as a touchTime event. A sensor that operates in this manner on the attached Shape node grouped with it by a grouping node is hereinafter referred to as a Sensor node. The Sensor nodes in VRML are the pointing-device sensors defined in Chapter 4.6.7.3 of ISO/IEC 14772-1, and the attached Shape node refers to a Shape node grouped under the parent node of the Sensor node. The TimeSensor node 318, on the other hand, outputs the elapsed time as a fraction_changed event for 1 second from startTime.
[0108]
The fraction_changed event representing the elapsed time output from the TimeSensor node 318 is propagated by ROUTE 324 to the set_fraction field of the ColorInterpolator node 319, to which the identifier 6 is assigned. The ColorInterpolator node 319 has a function of linearly interpolating values in the RGB color space. The key and keyValue fields of the ColorInterpolator node 319 specify that the RGB value [0 0 0] is output as a value_changed event when the value of the input set_fraction field is 0, and that the RGB value [1 1 1] is output as a value_changed event when the value is 1. When the input value is between 0 and 1, a value obtained by linearly interpolating between the RGB values [0 0 0] and [1 1 1] is output as value_changed. That is, when the input set_fraction is 0.2, the RGB value [0.2 0.2 0.2] is output as a value_changed event.
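The interpolation described above can be checked with a short Python sketch. It follows the description in this paragraph (linear interpolation of RGB components between key values); the function name and argument layout are hypothetical, and keys are assumed to be strictly increasing.

```python
def color_interpolate(fraction, keys, key_values):
    """Piecewise-linear interpolation of RGB triples.

    keys: strictly increasing fractions, e.g. [0, 1].
    key_values: one RGB triple per key, e.g. [[0, 0, 0], [1, 1, 1]].
    """
    if fraction <= keys[0]:
        return list(key_values[0])
    if fraction >= keys[-1]:
        return list(key_values[-1])
    for k0, k1, v0, v1 in zip(keys, keys[1:], key_values, key_values[1:]):
        if k0 <= fraction <= k1:
            w = (fraction - k0) / (k1 - k0)  # position within this segment
            return [a + (b - a) * w for a, b in zip(v0, v1)]


rgb = color_interpolate(0.2, [0, 1], [[0, 0, 0], [1, 1, 1]])
```

With key values [0 0 0] and [1 1 1], an input of 0.2 yields an RGB value of 0.2 in each component, as stated in the text.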
[0109]
By ROUTE 325, the value_changed value resulting from the linear interpolation is propagated to the diffuseColor field of the Material node 314, to which the identifier 4 is assigned. diffuseColor represents the diffuse color of the surface of the object represented by the Shape node 311 to which the Material node 314 belongs. Through the event propagation by ROUTE 323, ROUTE 324, and ROUTE 325, a user interaction is realized in which the RGB value of the displayed cube changes from [0 0 0] to [1 1 1] over 1 second immediately after the user selects the displayed sphere. This user interaction is expressed by ROUTE 323, ROUTE 324, ROUTE 325, and the nodes related to event propagation, which are indicated by thick-line frames in FIG. 12. The data in the scene description necessary for this user interaction is hereinafter referred to as data necessary for event propagation. Nodes other than those shown with thick lines are not related to events.
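As a small illustration, the three ROUTEs described above can be written as data and the nodes involved in event propagation collected from them. The tuple representation and function name in this Python sketch are assumptions; the node identifiers and field names are those given in the text.

```python
# (source node ID, source field, destination node ID, destination field)
ROUTES = [
    (2, "touchTime", 5, "startTime"),            # ROUTE 323
    (5, "fraction_changed", 6, "set_fraction"),  # ROUTE 324
    (6, "value_changed", 4, "diffuseColor"),     # ROUTE 325
]


def nodes_needed_for_events(routes):
    """Return the IDs of all nodes that send or receive a routed event."""
    ids = set()
    for source_id, _, destination_id, _ in routes:
        ids.update((source_id, destination_id))
    return ids


event_nodes = nodes_needed_for_events(ROUTES)
```

The resulting set contains the identifiers 2, 4, 5, and 6, that is, the TouchSensor, Material, TimeSensor, and ColorInterpolator nodes, which together with the ROUTEs make up the data necessary for event propagation.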
[0110]
Taking the scene description data of FIG. 11 described above as an example, the filter 23 of the first specific example of the present embodiment divides the scene description into division candidate units in step S200 of FIG. 10.
[0111]
Here, in order to use the so-called Node Insertion command, the Children field of a grouping node is used as the division unit. However, the data necessary for event propagation for user interaction is not divided. As a result, the three division candidates D0, D1, and D2 shown in FIG. 11 are obtained.
[0112]
The division unit including the Group node 300, which is the highest node in the input scene description, is the division candidate D0 with n = 0. The nodes at and below the Transform node 315 form the division candidate D1 with n = 1. The Shape node 316 in the division candidate D1 is in the Children field of the Transform node 315, which is a grouping node, and thus could be a separate division candidate.
[0113]
However, in this example, since the Transform node 315 has no child in its Children field other than the Shape node 316, the Shape node 316 is not made a separate division candidate. The nodes at and below the Transform node 320 form the division candidate D2 with n = 2. Similarly, the nodes at and below the Shape node 321 could be a separate division candidate.
[0114]
The division candidate D0 with n = 0 is always output to the hierarchy m = 0. In step S201 in FIG. 10, it is determined whether the division candidate D1 with n = 1 can be output to the hierarchy with m = 0 based on the hierarchization information.
[0115]
Next, FIG. 14 shows an example of the determination when the data amount allowed for each hierarchy of the scene description data to be output is specified by the hierarchization information. In the example of A in FIG. 14, if the division candidate D1 with n = 1 were also output to the hierarchy m = 0, the amount of data allowed for the hierarchy m = 0 would be exceeded, so it is determined that the division candidate D1 with n = 1 cannot be output to the hierarchy m = 0.
[0116]
Therefore, according to the procedure of step S202 of FIG. 10, it is determined that the output of the hierarchy m = 0, shown in B in FIG. 14, includes only the division candidate D0 with n = 0, and subsequent output goes to the hierarchy m = 1. According to the procedure of step S203, the division candidate D1 with n = 1 is output to the hierarchy m = 1.
[0117]
When the same procedure is applied to the next division candidate D2 with n = 2, as shown by A in FIG. 14, the allowed data amount is not exceeded even if the division candidate D2 with n = 2 is output to the hierarchy m = 1. Therefore, as shown by C in FIG. 14, it is determined that the division candidate D2 with n = 2 is output to the same hierarchy m = 1 as the division candidate D1 with n = 1.
[0118]
By the above procedure, the filter 23 converts the input scene description into scene description data output consisting of two layers: the converted scene description data output of the hierarchy m = 0 shown in B in FIG. 14 and the converted scene description data output of the hierarchy m = 1 shown in C in FIG. 14.
[0119]
The scene description conversion example shown in FIG. 15 is one in which the same scene description data as in A in FIG. 14, shown in A in FIG. 15, is converted based on different hierarchization information, resulting in scene description data output consisting of three layers.
[0120]
That is, the scene description shown at A in FIG. 15, which is the same as in the case of FIG. 14, is converted into the converted scene description data output of the hierarchy m = 0 shown at B in FIG. 15, that of the hierarchy m = 1 shown at C in FIG. 15, and that of the hierarchy m = 2 shown at D in FIG. 15.
[0121]
In this conversion result example, for a transmission medium with a low transmission capacity that can transmit only the data amount allowed for the hierarchy m = 0, only the scene description data of the hierarchy m = 0 shown in B in FIG. 15 is distributed.
[0122]
Even with only the scene description of the hierarchy m = 0, the data necessary for event propagation for user interaction is not divided, so that the user interaction similar to that before conversion can be realized in the decoding terminal 12.
[0123]
Further, for a transmission medium whose transmission capacity is sufficient for the total data amount of the hierarchies m = 0 and m = 1, the scene description data of both hierarchies, m = 0 shown in B in FIG. 15 and m = 1 shown in C in FIG. 15, is distributed.
[0124]
Since the scene description data of the layer m = 1 is inserted into the scene description of the layer m = 0 by the Node Insertion command, the decoding terminal 12 can decode and display the same scene description as before conversion.
[0125]
By converting the scene description based on hierarchization information that changes with time, the filter 23 of the first specific example can also adapt to changes in the transmission capacity of the transmission medium 22. The same effect is obtained when the converted scene description data is recorded on the transmission medium 22.
[0126]
In the example of the conversion result in FIG. 15, to a decoding terminal 12 with low decoding and display capability, which can decode and display only up to the data amount allowed for the hierarchy m = 0, only the scene description data of the hierarchy m = 0 shown in B in FIG. 15 is distributed.
Even with only the scene description of the hierarchy m = 0, the data necessary for event propagation for user interaction is not divided, so that the user interaction similar to that before conversion can be realized in the decoding terminal 12.
[0127]
To a decoding terminal 12 whose decoding and display capability is sufficient for the total data amount of the hierarchies m = 0 and m = 1, the scene description data of both hierarchies, m = 0 shown in B in FIG. 15 and m = 1 shown in C in FIG. 15, is distributed.
[0128]
Since the scene description data of the hierarchy m = 1 is inserted into the scene description of the hierarchy m = 0 by the Node Insertion command, the decoding terminal 12 can decode and display the same scene description as before conversion.
[0129]
As described above, by converting the scene description based on decoding terminal information that changes with time, the filter 23 of the first specific example can also cope with the case where the decoding and display capability of the decoding terminal 12 changes dynamically, or where a decoding terminal 12 having new capabilities is added to the distribution targets.
[0130]
In MPEG4 BIFS, a command for inserting a node or an Inline node may be used for hierarchizing scene descriptions. Further, EXTERNPROTO described in Chapter 4.9 of ISO / IEC 14772-1 may be used. EXTERNPROTO is a method of referring to a node defined by a node definition method called PROTO in external scene description data. In MPEG4 BIFS, EXTERNPROTO can be used in the same way as VRML.
[0131]
Also, DEF/USE described in Chapter 4.6.2 of ISO/IEC 14772-1 makes it possible to name a node with DEF and to refer to the node named with DEF by USE from other places in the scene description.
[0132]
In MPEG4 BIFS, a numerical identifier called a node ID is assigned to a node in the same way as with DEF, and by specifying the node ID from another place in the scene description, the node can be referenced in the same way as with USE in VRML.
[0133]
Therefore, when the scene description is hierarchized, if a part using DEF/USE described in Chapter 4.6.2 of ISO/IEC 14772-1 is not split across different division candidates, the scene description can be converted without destroying the reference relationship from USE to the node named with DEF.
[0134]
FIGS. 14 and 15 show examples in which the amount of data allowed for each hierarchy is used as the hierarchization information. However, the hierarchization information may be any information that makes it possible to determine whether a division candidate in the scene description can be included in the scene description data of a given hierarchy. For example, the upper limit of the number of nodes included in a hierarchy, the number of polygons of computer graphics data included in a hierarchy, or a limit on media data such as audio and video included in a hierarchy may be used, and a plurality of kinds of hierarchization information may be combined.
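Combining several kinds of hierarchization information amounts to checking every limit at once when deciding whether a division candidate fits in a hierarchy. The Python sketch below illustrates this; the dictionary keys ("bytes", "nodes", "polygons") and the function name are hypothetical.

```python
def fits_layer(current, candidate, limits):
    """Check whether adding a candidate keeps a hierarchy within all limits.

    current, candidate: measured quantities, e.g. {"bytes": 600, "nodes": 10}.
    limits: upper bounds per quantity; quantities without a limit are not checked.
    """
    return all(
        current.get(key, 0) + candidate.get(key, 0) <= bound
        for key, bound in limits.items()
    )
```

A candidate is accepted only if none of the combined criteria is exceeded, so a hierarchy with spare bytes can still reject a candidate that carries too many nodes or polygons.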
[0135]
As described above, according to the filter 23 of the first specific example, converting the input scene description into scene description data having a hierarchical structure of a plurality of layers makes it possible to use the hierarchical structure of the scene description to save transmission capacity when transmitting the scene description.
[0136]
Further, according to the filter 23 of the first specific example, the scene description is converted into scene description data composed of a plurality of hierarchies, so that when data must be deleted, deleting the scene description data hierarchy by hierarchy, only up to the amount of data to be deleted, makes it possible to preserve part of the content information described by the scene description.
[0137]
In addition, what has been described above is effective in any scene description method that can be divided without depending on the type of the scene description method.
[0138]
Next, the operation of the filter 23 of the second specific example applied to the third embodiment of the present invention will be described.
[0139]
As shown in FIG. 16, the filter 23 of the second specific example includes a scene description processing unit 24, an ES (Elementary Stream) processing unit 25, and a control unit 26 that controls their operations. The scene description data can be changed by the scene description processing unit 24, and multimedia data other than the scene description data can be changed by the ES processing unit 25. The ES processing unit 25 performs conversion by re-encoding data into data of a different bit rate in accordance with the transmission capacity and the capability of the decoding terminal. The scene description processing unit 24 adjusts the data amount by converting the contents of the scene description according to, for example, the transmission capacity of the transmission medium 22 and the processing capability of the decoding terminal 12. By using the filter 23 including the scene description processing unit 24 and the ES processing unit 25 in combination with the data conversion unit 7 of the first or second embodiment of the present invention, it is possible to distribute data that satisfies evaluation criteria such as the bit rate even during special reproduction. In this example, although not shown, the decoding unit 15 of the decoding terminal 12 includes an ES decoding unit that decodes the ES to restore video data, audio data, and the like, and a scene description decoding unit that decodes the scene description and composes a scene from the video data, audio data, and the like based on the decoded scene description.
[0140]
Here, the data distribution system of the third embodiment provided with the filter 23 of the second specific example operates as follows in order to deal with the problem that the transmitted data is delayed or lost when the transmittable bandwidth of the transmission medium 22 or the traffic congestion state changes.
[0141]
The transmission unit 5 of the server 10 has a function of adding a serial number (encoded serial number) to each packet of data sent to the transmission path (transmission medium 22), while the receiving unit 13 of the decoding terminal 12 has a function of detecting data loss (the data loss ratio) by monitoring for missing serial numbers among those added to the packets. Alternatively, the transmission unit 5 of the server 10 has a function of adding time information (encoded time information) to the data sent to the transmission path, while the receiving unit 13 of the decoding terminal 12 monitors the time information added to the data received from the transmission path and detects the transmission delay based on that time information. When the receiving unit 13 of the decoding terminal 12 detects the data loss ratio of the transmission path or the transmission delay in this way, it transmits (reports) the detected information to the transmission unit 5 of the server 10.
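The loss and delay detection described in this paragraph can be sketched as follows. This is only a minimal illustration in Python under stated assumptions, not the implementation of the receiving unit 13: the function names are hypothetical, serial numbers are assumed to be consecutive integers, and the clocks of the server and the decoding terminal are assumed to be synchronized.

```python
def loss_ratio(received_serials, first, last):
    """Fraction of packets in [first, last] whose serial number never arrived.

    Assumption: the sender numbers packets with consecutive integers.
    """
    expected = last - first + 1
    if expected <= 0:
        raise ValueError("empty serial-number range")
    # Duplicates and out-of-range numbers are ignored.
    arrived = {s for s in received_serials if first <= s <= last}
    return (expected - len(arrived)) / expected


def transmission_delay(sent_time, received_time):
    """Delay derived from the time information carried with the data.

    Assumption: both times are seconds on a shared (synchronized) clock.
    """
    return received_time - sent_time
```

For example, if serial numbers 1, 2, 4 and 5 arrive out of an expected range of 1 to 5, one packet in five is missing, and that ratio is what would be reported back to the transmission unit 5.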
[0142]
Further, the transmission unit 5 of the server 10 has a transmission state detection function, which detects the transmittable bandwidth of the transmission path and its traffic congestion from information such as the data loss ratio or the transmission delay reported by the receiving unit 13 of the decoding terminal 12. That is, the transmission state detection function determines that the transmission path is congested if the data loss ratio is high or if the transmission delay increases. When a bandwidth reservation type transmission path is used, the transmission state detection function can directly know the bandwidth (transmittable bandwidth) available to the server 10. Note that when a transmission medium that depends on weather conditions or the like, such as radio waves, is used, the transmission band may be set in advance by the user according to the weather conditions or the like. The transmission state detection information obtained by the transmission state detection function is sent to the control unit 26 of the filter 23.
[0143]
Based on the detection information on the transmittable bandwidth of the transmission path and the traffic congestion state, the control unit 26 controls the ES processing unit 25 so that ESs having different bit rates are selectively switched, or, when the ES processing unit 25 performs encoding such as ISO/IEC 13818 (so-called MPEG2), so that the encoding bit rate is adjusted. That is, for example, if it is detected that the transmission path is congested, an ES with a low bit rate is output from the ES processing unit 25, so that data delay can be avoided.
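The congestion judgment and the switching between ESs of different bit rates can be illustrated with the following sketch. It is written in Python under assumptions that are not in the original text: the threshold values, the function names, and the simple rule of choosing the lowest prepared bit rate when congested are all hypothetical.

```python
def is_congested(loss_ratio, delay_s, max_loss=0.05, max_delay_s=0.5):
    """Judge congestion from the reported loss ratio or transmission delay.

    The thresholds are hypothetical; the text only says that high loss
    or increased delay indicates congestion.
    """
    return loss_ratio > max_loss or delay_s > max_delay_s


def pick_bit_rate(prepared_rates, congested):
    """Switch to the lowest prepared bit rate when the path is congested."""
    if not prepared_rates:
        raise ValueError("no ES bit rates prepared")
    return min(prepared_rates) if congested else max(prepared_rates)
```

A real control unit could equally adjust the encoder's target bit rate continuously instead of switching between prepared ESs; the two-level switch is used here only to keep the sketch short.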
[0144]
Further, consider a system configuration in which, for example, an unspecified number of decoding terminals 12 are connected to the server 10, the specifications of the decoding terminals 12 are not unified in advance, and the server 10 sends ESs to decoding terminals 12 having various processing capabilities. In this case, the receiving unit 13 of the decoding terminal 12 has a transmission request processing function, which transmits to the server 10 a transmission request signal requesting an ES that matches the processing capability of its own decoding terminal 12. This transmission request signal also includes a signal representing the capability of its own decoding terminal 12. Examples of the signal representing the capability of the decoding terminal 12 passed from the transmission request processing function to the server 10 include the memory size, the resolution of the display unit, the calculation capability, the buffer size, the decodable ES encoding formats, the number of ESs that can be decoded, and the bit rate of ESs that can be decoded. The transmission unit 5 that has received the transmission request signal sends it to the control unit 26 of the filter 23, and the control unit 26 controls the ES processing unit 25 so that an ES matching the performance of the decoding terminal 12 is transmitted. For the image signal conversion processing performed when the ES processing unit 25 converts the ES so as to match the performance of the decoding terminal 12, there is, for example, an image signal conversion processing method already proposed by the applicant of the present application.
[0145]
Further, the control unit 26 controls not only the ES processing unit 25 but also the scene description processing unit 24 in accordance with the state of the transmission path detected by the transmission state detection function of the transmission unit 5. In addition, when the decoding terminal 12 is a decoding terminal that requests a scene description corresponding to its own decoding and display performance, the control unit 26 controls the ES processing unit 25 and the scene description processing unit 24 according to the signal representing the capability of the decoding terminal itself, which is sent from the transmission request processing function of the receiving unit 13 of the decoding terminal 12. The control unit 26, the scene description processing unit 24, and the ES processing unit 25 may be integrated.
[0146]
Hereinafter, a selection method when the ES processing unit 25 selects a specific ES to be transmitted from a plurality of ESs under the control of the control unit 26 will be described.
[0147]
The control unit 26 holds, for each of the plurality of ESs, transmission priority information indicating its priority at the time of transmission, and determines the ESs that can be transmitted in descending order of transmission priority according to the state of the transmission path at the time of ES transmission or a request from the decoding terminal 12. That is, the control unit 26 controls the ES processing unit 25 so that the ESs that can be transmitted are transmitted in descending order of transmission priority, in accordance with the state of the transmission path at the time of ES transmission or a request from the decoding terminal 12. Note that although the control unit 26 is described here as holding the transmission priority information, the information may instead be stored in the storage unit 9.
[0148]
FIG. 17 shows an example of the transmission priority of each ES when there are three ESs, for example, ESa, ESb, and ESc. That is, in the example of FIG. 17, the transmission priority of ESa is “30”, the transmission priority of ESb is “20”, and the transmission priority of ESc is “10”. Here, a lower value indicates a higher transmission priority. In FIG. 17, Ra is the transmission bit rate when transmitting ESa, Rb is the transmission bit rate when transmitting ESb, and Rc is the transmission bit rate when transmitting ESc.
[0149]
Here, when the transmittable bit rate R is determined according to the state of the transmission path or the request from the decoding terminal 12, the control unit 26 controls the ES processing unit 25 so that ESs are selected and transmitted in descending order of transmission priority within a range that does not exceed the transmittable bit rate R.
[0150]
That is, for example, when the relationship between the transmittable bit rate R and the transmission bit rate of each ES is expressed by Expression (4), the control unit 26 controls the ES processing unit 25 so as to select and transmit only ESc, which has the highest transmission priority.
[0151]
Rc ≦ R < (Rc + Rb)   (4)
Further, for example, when the relationship between the transmittable bit rate R and the transmission bit rate of each ES is expressed by Expression (5), the control unit 26 controls the ES processing unit 25 so as to select and transmit ESc, which has the highest transmission priority, and ESb, which has the next (second) highest transmission priority.
[0152]
(Rc + Rb) ≦ R < (Rc + Rb + Ra)   (5)
Further, for example, when the relationship between the transmittable bit rate R and the transmission bit rate of each ES is expressed by Expression (6), the control unit 26 controls the ES processing unit 25 so as to select and transmit all of the ESs.
[0153]
(Rc + Rb + Ra) ≦ R (6)
As described above, according to the data distribution system of the third embodiment including the filter 23 of the third specific example, the control unit 26 holds the transmission priority information for each ES and determines the ESs that can be transmitted in descending order of transmission priority in accordance with the state of the transmission path at the time of ES transmission and the request from the decoding terminal 12, so that important ESs among the plurality of ESs can be transmitted preferentially.
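The selection rule expressed by Expressions (4) to (6) can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the function name and the concrete bit rates used in the usage note are hypothetical, while the priority values 30, 20 and 10 and the convention that a lower value means a higher priority are taken from the example of FIG. 17.

```python
def select_streams(streams, transmittable_rate):
    """Select ESs in descending order of transmission priority.

    streams: list of (name, priority_value, bit_rate) tuples, where a
    lower priority_value means a higher transmission priority.
    Selection stops at the first ES that would exceed the transmittable
    bit rate R, which reproduces the ranges of Expressions (4) to (6).
    """
    selected = []
    used = 0
    for name, _priority, rate in sorted(streams, key=lambda s: s[1]):
        if used + rate > transmittable_rate:
            break
        selected.append(name)
        used += rate
    return selected
```

With hypothetical rates Ra = 600, Rb = 300 and Rc = 100, a transmittable rate of 350 falls in the range of Expression (4) and only ESc is selected, a rate of 400 falls in the range of Expression (5) and ESc and ESb are selected, and a rate of 1000 satisfies Expression (6) so that all three ESs are selected.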
[0154]
In the above description, an example was given of selecting an ES or converting a scene description based on a preset priority. However, the priority can also be changed along with the conversion of the ES. When the priority is changed along with the conversion of the ES, the change is performed, for example, in the ES processing unit 25.
[0155]
FIG. 18 shows an example of the transmission priority converted by the ES processing unit 25 in accordance with the conversion of the bit rate of ESa to Ra′. In the example of FIG. 18, the bit rate of ESa is set to a bit rate Ra′ lower than the bit rate Ra in the example of FIG. 17, and the transmission priority is raised as the bit rate is lowered (for example, from “30” in FIG. 17 to “15” in FIG. 18).
[0156]
Furthermore, in addition to the case where the control unit 26 holds a preset value, the transmission priority can be set according to encoding parameters such as the ES bit rate and image frame. For example, as shown in FIG. 19, by holding the relationship Ps(R) between the ES bit rate R and the transmission priority, the transmission priority can be set according to the ES bit rate. That is, since the transmission cost is generally considered to be higher as the bit rate is higher, assigning a lower transmission priority to an ES with a higher bit rate, as shown in the example of FIG. 19, makes it possible to preferentially transmit ESs with lower transmission cost (lower bit rate).
[0157]
Further, when the ES itself has an explicit image frame, as image data does, the transmission priority can be set according to the image frame. For example, FIG. 20 shows an example of the relationship Ps(S) between the area S of the ES image frame and the transmission priority, and by holding this relationship Ps(S), the transmission priority can be set according to the ES image frame. That is, since the transmission cost is generally considered to be higher as the image frame is larger, assigning a lower transmission priority to an ES with a larger image frame, as shown in the example of FIG. 20, makes it possible to preferentially transmit ESs with lower transmission cost (smaller image frame).
[0158]
As described above, the method of setting the transmission priority according to encoding parameters such as the ES bit rate and image frame can also be used when the ES processing unit 25 changes the transmission priority in accordance with the ES conversion. For example, if the ES processing unit 25 converts the ES of bit rate Ra to bit rate Ra′, the transmission priority can be changed to Ps(Ra′) as shown in FIG.
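Deriving the transmission priority from the bit rate can be illustrated as follows. The actual shape of Ps(R) is given only by the drawings, so this Python sketch assumes a simple linear relationship; the function name, the constant of 30 per megabit per second, and the bit rates in the usage note are hypothetical values chosen so that the numbers agree with the “30” and “15” of FIGS. 17 and 18.

```python
def priority_from_bit_rate(bit_rate, value_per_mbps=30):
    """Hypothetical Ps(R): a larger bit rate gives a larger priority value.

    A lower value means a higher transmission priority, so an ES with a
    lower bit rate (lower transmission cost) is transmitted first.
    Integer arithmetic is used to avoid rounding surprises.
    """
    if bit_rate < 0:
        raise ValueError("bit rate must be non-negative")
    return bit_rate * value_per_mbps // 1_000_000
```

Under these assumptions, an ES at a hypothetical Ra of 1,000,000 bit/s receives the value 30, and after conversion to Ra′ of 500,000 bit/s the value becomes 15, that is, the transmission priority is raised when the bit rate is lowered.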
[0159]
The transmission priority may be assigned for each ES type, such as moving image, still image, and text, or for each ES encoding format. For example, if the highest transmission priority is always assigned to text, text data can always be transmitted preferentially even if the transmittable bit rate is limited by the state of the transmission path or the request from the decoding terminal.
[0160]
Also, the transmission priority can be determined based on the user's preference. That is, the server 10 holds preference information such as the ES types (moving images, still images, text, and the like), ES encoding formats, and ES encoding parameters preferred by the user, so that a high transmission priority can be assigned to ESs whose type, encoding format, and encoding parameters match the user's preference. As a result, even when the transmittable bit rate is limited according to the state of the transmission path or the request from the decoding terminal, it is possible to preferentially transmit ESs that match the user's preference and display them with high quality.
[0161]
As described above, the control unit 26 holds transmission priority information for each ES and determines the ESs that can be transmitted in descending order of transmission priority according to the state of the transmission path at the time of transmission or a request from the decoding terminal 12, so that important ESs can be transmitted preferentially.
[0162]
Further, in the filter 23 of the third specific example applied to the third embodiment of the present invention, data distribution that satisfies evaluation criteria such as bit rate is possible even during special reproduction, as follows. That is, the scene description processing unit 24 provided in the filter 23 of the third specific example can perform the following first to fifth scene description processes under the control of the control unit 26.
[0163]
As the first scene description process, the filter 23 of the third specific example can output a scene description suitable for the ES output from the ES processing unit 25, for example. That is, the scene description processing unit 24 can output a scene description suitable for the ES output from the ES processing unit 25 under the control of the control unit 26. Hereinafter, the first scene description process will be specifically described with reference to FIGS.
[0164]
FIG. 21 shows a display example of a scene composed of a moving image ES and a still image ES. In FIG. 21, Esi indicates the scene display area, Emv indicates the moving image ES display area within the scene display area Esi, and Esv indicates the still image ES display area within the scene display area Esi.
[0165]
Further, FIG. 22 shows, as text, the contents of the scene description corresponding to the scene display area Esi of FIG. 21 when described in MPEG4 BIFS.
[0166]
The scene description shown in FIG. 22 includes two cubes, and specifies that a moving image and a still image are pasted as textures onto their respective surfaces. Each object is designated for coordinate transformation by a Transform node, and is translated and placed in the scene according to the values of the translation field (origin position of local coordinates) indicated by #500 and #502 in the figure. Further, enlargement or reduction of the object contained in the Transform node is designated by the values (scaling of local coordinates) indicated by #501 and #503 in the figure.
[0167]
Here, when it is necessary to reduce the bit rate of the distribution data because of the state of the transmission path (transmission medium 22) or a request from the decoding terminal 12, assume that ES conversion processing is performed that lowers the bit rate of, for example, the moving image ES, which requires a large amount of data for transmission. Note that for the still image, it is assumed that, for example, a high-resolution still image ES has already been transmitted and stored on the decoding terminal side.
[0168]
In this case, in the conventional data distribution system, decoding and display are performed with the same scene configuration regardless of whether the ES bit rate has been adjusted, so that the deterioration in image quality of the moving image whose bit rate was lowered becomes conspicuous. To explain specifically with reference to the example of FIG. 21, in the conventional data distribution system, even when the bit rate of the moving image ES to be displayed in the moving image ES display area Emv in the figure is lowered, the ES is decoded and displayed with the same scene configuration as before the adjustment (that is, displayed in a wide moving image ES display area Emv that does not match the actual bit rate). For this reason, the moving image becomes coarse (for example, the spatial resolution becomes coarse), and the deterioration of the image quality becomes conspicuous.
[0169]
On the other hand, when the bit rate of the moving image ES is lowered, if the moving image ES display area Emv is reduced as shown in FIG. 23, for example, the image quality degradation (in this case, the spatial resolution degradation) of the moving image displayed in the moving image ES display area Emv can be made inconspicuous. In the case of the present embodiment, the still image ES has already been transmitted and stored in the decoding terminal. If that still image is, for example, a high-resolution image, and the still image ES display area Esv in FIG. is a narrow area that does not match that resolution, then widening the still image ES display area Esv as shown in FIG. 23, for example, allows the resolution to be fully utilized. Such a measure of narrowing the moving image ES display area Emv and widening the still image ES display area Esv cannot be realized unless the scene description is changed to a scene description representing such contents.
[0170]
Therefore, the scene description processing unit 24 provided in the filter 23 of the third specific example dynamically changes and outputs the scene description according to the ES bit rate adjustment in the ES processing unit 25. In other words, when the control unit 26 in the third specific example controls the ES processing unit 25 to adjust the ES bit rate, it also controls the scene description processing unit 24 so that a scene description suitable for the ES output from the ES processing unit 25 is output. As a result, the deterioration of the image quality when the bit rate of the moving image is lowered, as in the above example, is made inconspicuous. In this example, in order to make use of the resolution of the already transmitted still image, the moving image ES display area Emv is narrowed as shown in FIG. 23, while the still image ES display area Esv is widened.
[0171]
A specific operation of the control unit 26 realizing the above will be described with reference to FIG.
[0172]
In FIG. 24, when it is necessary to lower the bit rate of the distribution data because of the state of the transmission path or the request from the decoding terminal 12, the control unit 26 controls the ES processing unit 25 so that, at time T, a moving image ES 293 with a lower bit rate than the moving image ES 292 is output.
[0173]
Further, at time T, the control unit 26 controls the scene description processing unit 24 so as to change the scene description 290, which corresponds to the scene display area Esi in FIG. 21, into the scene description 291, which corresponds to the scene display area Esi in FIG. 23. That is, under the control of the control unit 26, the scene description processing unit 24 at this time converts the scene description shown in FIG. 22, which represents the scene display area Esi in FIG. 21, into a scene description representing the scene display area Esi in FIG. 23, as shown in FIG. Note that the scene description of FIG. 25 is also shown as the text content of a scene description described in MPEG4 BIFS, as in the case of FIG.
[0174]
Compared with the scene description of FIG. 22 described above, in the scene description shown in FIG. 25, the values of the translation field (origin position of local coordinates) indicated by #600 and #602 in the figure are changed so that the two cubes are moved. In addition, by the values (scaling of local coordinates) indicated by #601 and #603 in the figure, the cube with the moving image pasted on its surface (Emv in FIG. 23) is converted to a small size, and the cube with the still image pasted on its surface (Esv in FIG. 23) is instead converted to a large size.
[0175]
The conversion from the scene description shown in FIG. 22 into the scene description shown in FIG. 25, as in the first scene description process, is realized by having the scene description processing unit 24 perform one of the following: processing that selectively reads out and transmits the scene description corresponding to the ES output from the ES processing unit 25 (the scene description in FIG. 25); processing that converts the scene description read from the storage unit 9 (the scene description in FIG. 22) into a scene description corresponding to the ES output from the ES processing unit 25 (the scene description in FIG. 25) and transmits it; or processing that generates or encodes, and then transmits, scene description data corresponding to the ES output from the ES processing unit 25 (the scene description in FIG. 25). If a scene description method capable of describing only the changes in a scene description is used, only the changes may be transmitted. In the above example, the case where the moving image ES display area Emv is narrowed when the bit rate of the moving image ES is reduced has been described. Conversely, the scene description conversion according to the present invention can naturally also be applied to the case where the moving image ES display area Emv is expanded when the bit rate is increased. Furthermore, the above example was described on the assumption that a high-resolution still image ES has been transmitted and stored in advance. Needless to say, when the still image transmitted and stored in advance has a low resolution, for example, a high-resolution still image ES may be newly transmitted together with a corresponding scene description. In addition, although a moving image and a still image are given as examples in the present embodiment, the present invention also covers the case where the scene description is changed according to the bit rate adjustment of other multimedia data.
[0176]
As described above, according to the first scene description process described with reference to FIGS. 21 to 25, the scene description representing the scene configuration information is converted, so that a scene description matching the state of the transmission path and the request from the decoding terminal 12 can be transmitted. In addition, when ES conversion is performed by the ES processing unit 25, for example, a scene description optimal for the converted ES can be transmitted.
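Of the realization methods mentioned for the first scene description process, the one that selectively reads out a prepared scene description matching the output ES can be sketched as follows. This Python illustration rests on assumptions not found in the original text: the function name, the idea of keying each prepared scene description by a minimum bit rate, and the labels used in the usage note are all hypothetical.

```python
def choose_scene_description(prepared, es_bit_rate):
    """Pick the prepared scene description that suits the output ES.

    prepared: list of (min_bit_rate, description) pairs (hypothetical
    layout). The description with the largest min_bit_rate that does not
    exceed the bit rate of the ES actually output is returned.
    """
    candidates = [(floor, desc) for floor, desc in prepared
                  if floor <= es_bit_rate]
    if not candidates:
        raise ValueError("no scene description prepared for this bit rate")
    return max(candidates, key=lambda c: c[0])[1]
```

For instance, with the pairs (0, "small moving image area") and (1000000, "large moving image area") prepared, lowering the moving image ES below the hypothetical 1,000,000 bit/s boundary would cause the layout with the narrowed moving image ES display area to be selected.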
[0177]
Next, the second scene description process will be described.
[0178]
For example, when the information necessary for ES decoding changes because the ES processing unit 25 converts the ES bit rate or the like according to the state of the transmission path or a request from the decoding terminal 12, the filter 23, as the second scene description process, also converts and transmits the scene description itself, which includes the information necessary for decoding the ES, thereby eliminating the need for the decoding terminal to extract the information necessary for decoding from the ES data itself. In other words, under the control of the control unit 26, when the ES conversion processing in the ES processing unit 25 changes the information necessary for decoding the ES, the scene description processing unit 24 can output a scene description that includes the information necessary for decoding that ES. The information necessary for ES decoding includes, for example, the ES encoding format, the buffer size necessary for decoding, and the bit rate. Hereinafter, the second scene description process will be specifically described with reference to the above-described drawings and FIGS. 26 and 27.
[0179]
FIG. 26 shows an example in which the information necessary for decoding the ESs used in the scene described with reference to FIGS. 21 and 22 is described using the ObjectDescriptor descriptor defined in MPEG4. In the scene description of FIG. 22, the moving image to be mapped as a texture onto the object surface is designated by the numerical value 3 (= url3), which is associated with ODid = 3, the identifier of the ObjectDescriptor in FIG. The ES_Descriptor included in the ObjectDescriptor with the identifier ODid = 3 describes information about the ES. Also, ES_ID in the figure is an identifier that uniquely identifies the ES. This identifier ES_ID is in turn associated with the actual ES by, for example, associating it with a header identifier or a port number in the transmission protocol used for transmitting the ES.
[0180]
In addition, the description of ES_Descriptor includes a descriptor of information necessary for decoding of ES called DecoderConfigDescriptor. The information of the descriptor DecoderConfigDescriptor includes, for example, a buffer size, a maximum bit rate, an average bit rate, and the like necessary for ES decoding.
[0181]
On the other hand, FIG. 27 shows an example in which the information necessary for decoding the ES, accompanying the scene description after the conversion processing in the scene description processing unit 24 and corresponding to the scene shown in FIG., is described using the ObjectDescriptor descriptor. The decoding buffer size (bufferSiseDB), the maximum bit rate (maxBitRate), and the average bit rate (avgBitRate) of the moving image (whose ODid is 3 and which is referred to from the scene description), which were changed by the ES conversion, are converted as shown in FIG. 27 from the description in the ObjectDescriptor shown in FIG. That is, bufferSiseDB = 4000, maxBitRate = 1000000, and avgBitRate = 1000000 in the example of FIG. 26 are converted into bufferSiseDB = 2000, maxBitRate = 5000000, and avgBitRate = 5000000 in FIG.
[0182]
The conversion of the information necessary for decoding the ES that accompanies the scene description, as in the second scene description process, is realized by having the scene description processing unit 24 perform one of the following: processing that selectively reads out and transmits, from among the information necessary for decoding a plurality of ESs stored in advance in the storage unit 9, the information corresponding to the ES output from the ES processing unit 25 (the information in FIG. 27); processing that converts the information necessary for decoding the ES read from the storage unit 9 (the information in FIG. 26) into the information necessary for decoding the ES output from the ES processing unit 25 (the information in FIG. 27) and transmits it; or processing that encodes and transmits the information necessary for decoding the ES output from the ES processing unit 25 (the information in FIG. 27).
[0183]
As described above, according to the second scene description process, when the information necessary for ES decoding changes because the ES bit rate or the like is converted according to the state of the transmission path or the decoding terminal 12, the information necessary for decoding the ES included in the scene description is changed as shown in the figure and transmitted to the decoding terminal 12, so that the need to extract the information necessary for ES decoding from the ES data itself on the decoding terminal 12 side can be eliminated.
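The rewriting of the decoding information that accompanies the scene description can be sketched as follows. This Python illustration is not the MPEG4 descriptor syntax: the class and function names and the dictionary keyed by ODid are hypothetical, and the three fields merely mirror the buffer size, maximum bit rate, and average bit rate mentioned for the DecoderConfigDescriptor.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical stand-in for the decoding information of one ES."""
    buffer_size_db: int
    max_bit_rate: int
    avg_bit_rate: int


def update_decoder_config(descriptors, od_id, **changes):
    """Return a new {ODid: DecoderConfig} mapping with one entry rewritten.

    The input mapping is left untouched, so the description before the
    ES conversion remains available.
    """
    if od_id not in descriptors:
        raise KeyError(od_id)
    updated = dict(descriptors)
    updated[od_id] = replace(descriptors[od_id], **changes)
    return updated
```

When the ES processing unit converts the moving image ES referred to by ODid = 3, a call such as update_decoder_config(descriptors, 3, buffer_size_db=2000) would produce the descriptor set to be sent with the converted scene description, so that the decoding terminal does not have to inspect the ES data itself.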
[0184]
Next, the third scene description process will be described.
[0185]
As the third scene description process, the filter 23 explicitly converts the scene description so as to increase or decrease the number of ESs constituting the scene and outputs the result. This makes it possible to transmit only the ESs that fit the transmission band, and allows the decoding terminal 12 to determine the ESs necessary for display or the like without depending on the arrival delay of ES data or the loss of data. That is, under the control of the control unit 26, the scene description processing unit 24 in this example explicitly converts the scene description so as to increase or decrease the number of ESs and outputs it, and the scene description decoding function provided in the decoding unit 15 of the decoding terminal 12 determines the ESs necessary for display or the like without depending on the arrival delay of ES data or the loss of data. Hereinafter, the third scene description process will be specifically described with reference to the above-described drawings and FIGS. 28 and 29.
[0186]
FIG. 28 shows, as an example, a scene description (as text) described in MPEG4 BIFS for the case where the moving image ES is deleted from the scene described with reference to FIGS. 21 and 22. FIG. 29 shows an example of a scene displayed based on the scene description of FIG. 28, in which only the image ES display area (for example, a still image ES display area) Eim is arranged in the scene display area Esi. Since it can be determined from the scene description of FIG. 28 that the only ES used is the ES whose ODid is 4, the decoding terminal 12 can determine that it does not need to receive the data of the moving image ES whose ODid is 3, and that the absence of that data is not due to an arrival delay of ES data or a loss of data. Furthermore, by deleting the description of the ObjectDescriptor whose ODid is 3 in the examples of FIGS. 26 and 27, it can be determined that the moving image ES whose ODid is 3 is no longer necessary.
[0187]
Also, in this third scene description processing example, when a transmission request for temporarily reducing the processing load for decoding and configuring a scene is transmitted from the decoding terminal 12, the filter 23 can, for example, change the scene description shown in FIG. 22 to the scene description shown in FIG. 28 and thereby notify the decoding terminal 12 that the process of mapping a moving image as a texture in the scene is not required. This makes it possible to reduce the processing load for decoding the scene in the decoding terminal 12.
[0188]
The conversion from the scene description shown in FIG. 22 into the scene description shown in FIG. 28, as in the third scene description process, can be realized by having the scene description processing unit 24 perform one of the following: processing that selectively reads out and transmits, from among a plurality of scene descriptions prepared in advance in the storage unit 9, the scene description associated with the number of ESs output from the ES processing unit 25 (the scene description in FIG. 28); processing that takes the scene description read from the storage unit 9 as input, converts it into a scene description from which the partial data (data in the scene description) corresponding to the ES that is not output has been deleted (the scene description in FIG. 28), and outputs it; or, when the scene description is encoded and output, processing that does not encode the portion corresponding to the ES that is not output.
[0189]
As described above, according to the third scene description process, converting the scene description as described above makes it possible to restore, on the decoding terminal 12 side and at the intended timing, the scene as intended on the server 10 side. Further, according to the third scene description process, the scene description processing unit 24 can delete the partial data in the scene description in order from the least important until the transmission band or the processing performance of the decoding terminal 12 is satisfied. Further, according to the third scene description process, when there is a margin in the processing performance of the decoding terminal 12, a more detailed scene description can be transmitted, so that a scene optimal for the processing performance of the decoding terminal 12 can be decoded and displayed.
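The deletion of partial data in order from the least important can be sketched as follows. This is a minimal Python illustration under assumptions not stated in the original text: the function name, the numeric importance scale (here a higher value means more important), and the notion of a single size budget standing in for the transmission band or terminal performance are all hypothetical.

```python
def prune_scene(parts, budget):
    """Drop the least important scene parts until the total size fits.

    parts: list of (name, importance, size) tuples (hypothetical layout),
    where a higher importance value means the part is kept longer.
    budget: maximum total size allowed by the transmission band or by
    the processing performance of the decoding terminal.
    Returns the names of the parts that remain, most important first.
    """
    kept = sorted(parts, key=lambda p: p[1], reverse=True)
    total = sum(size for _name, _importance, size in kept)
    while kept and total > budget:
        _name, _importance, size = kept.pop()  # least important part
        total -= size
    return [name for name, _importance, _size in kept]
```

With parts such as a text overlay, a still image and a moving image, a tight budget would remove the moving image part first if it has been given the lowest importance, which corresponds to the conversion from the scene of FIG. 22 to that of FIG. 28.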
[0190]
Next, the fourth scene description process will be described.
[0191]
As the fourth scene description process, the server 10 side of the present embodiment converts the complexity of the scene description according to the state of the transmission path and the request from the decoding terminal 12, which makes it possible to adjust the data amount of the scene description and the processing load at the decoding terminal 12. That is, the scene description processing unit 24 in this example adjusts the data amount of the scene description and outputs it according to the control of the control unit 26, the state of the transmission path, and the request from the decoding terminal 12. Hereinafter, the fourth scene description process will be specifically described with reference to FIGS. 30 to 33.
[0192]
FIG. 30 shows a scene description, written in MPEG4 BIFS, for displaying an object described by polygons (shown as text for ease of understanding). In the example of FIG. 30, the polygon coordinates are omitted for simplicity. In the scene description of FIG. 30, IndexedFaceSet represents a geometric object formed by connecting the vertex coordinates specified by point in Coordinate in the order specified by coordIndex. FIG. 31 shows a display example of the scene (a polygon object) obtained by decoding the scene description of FIG. 30.
[0193]
In this fourth scene description processing example, when, for example, the amount of data transmitted by the server 10 is to be reduced because of the state of the transmission path, or when a request to reduce the processing load is transmitted from the decoding terminal 12, the scene description processing unit 24 of the filter 23 converts the scene description into a simpler scene description. For example, in the scene description shown in FIG. 32, the IndexedFaceSet representing the polygon object shown in FIG. 31 is replaced with a Sphere representing the sphere shown in FIG. 33, which reduces the data amount of the scene description itself and the processing load of decoding and scene composition at the decoding terminal 12. That is, a polygon object as shown in FIG. 31 requires every value that defines the polyhedron, whereas a sphere as shown in FIG. 33 does not, so the data amount of the scene description can be reduced. Further, on the decoding terminal 12 side, the complicated processing for displaying a polyhedron becomes the simple processing for displaying a sphere, so the processing load is reduced.
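The replacement of a polygon object with a sphere can be sketched on a simplified scene tree. This is a minimal sketch under stated assumptions: the nested-dictionary representation, the tetrahedron coordinates, the placeholder `radius`, and the helper names `simplify` and `count_values` are all hypothetical and are not BIFS syntax.

```python
def simplify(node):
    """Return a copy of the scene tree with IndexedFaceSet geometry replaced by a Sphere."""
    if isinstance(node, dict):
        if node.get("type") == "IndexedFaceSet":
            # The radius is an assumed placeholder value.
            return {"type": "Sphere", "radius": 1.0}
        return {key: simplify(value) for key, value in node.items()}
    if isinstance(node, list):
        return [simplify(value) for value in node]
    return node


def count_values(node):
    """Count numeric leaves as a rough proxy for the data amount of the description."""
    if isinstance(node, dict):
        return sum(count_values(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_values(value) for value in node)
    return 1 if isinstance(node, (int, float)) else 0


# A tetrahedron stands in for the polygon object of FIG. 30 (coordinates are made up).
polygon_scene = {
    "type": "Shape",
    "geometry": {
        "type": "IndexedFaceSet",
        "coord": {
            "type": "Coordinate",
            "point": [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]],
        },
        "coordIndex": [0, 1, 2, -1, 0, 1, 3, -1, 0, 2, 3, -1, 1, 2, 3, -1],
    },
}
simple_scene = simplify(polygon_scene)
```

Even for this small object, the polygon form carries 28 numeric values while the sphere form carries one, which illustrates why the data amount and the decoding load both drop.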
[0194]
The conversion from the scene description shown in FIG. 30 to the scene description shown in FIG. 32, as in the fourth scene description process, can be realized in the scene description processing unit 24 by, for example: selecting and outputting, from among a plurality of scene descriptions prepared in advance in the storage unit 9, a scene description that satisfies evaluation criteria suited to the state of the transmission path and the request from the decoding terminal 12; taking the scene description read from the storage unit 9 as input and converting it into a scene description that satisfies the evaluation criteria; or encoding and outputting a scene description that satisfies the evaluation criteria. The evaluation criteria may be any criteria that express the data amount of the scene description or its complexity, such as the number of nodes or the number of polygons.
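Selecting one of several prepared scene descriptions against evaluation criteria might look like the following sketch. The candidate records, their `bits` and `nodes` figures, and the policy of preferring the most detailed candidate that still fits are assumptions for illustration only.

```python
def select_description(candidates, max_bits, max_nodes):
    """Pick the most detailed candidate whose data amount and node count fit the limits."""
    fitting = [
        c for c in candidates
        if c["bits"] <= max_bits and c["nodes"] <= max_nodes
    ]
    if not fitting:
        return None  # no prepared description satisfies the criteria
    return max(fitting, key=lambda c: c["bits"])


# Hypothetical candidates prepared in advance (figures are made up).
candidates = [
    {"name": "polygon (FIG. 30)", "bits": 9000, "nodes": 40},
    {"name": "sphere (FIG. 32)", "bits": 300, "nodes": 3},
]
narrow = select_description(candidates, max_bits=1000, max_nodes=100)
wide = select_description(candidates, max_bits=10000, max_nodes=100)
```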
[0195]
Further, as other processing methods for converting the complexity of the scene description in the scene description processing unit 24, it is possible to use a process of replacing complex partial data with simple partial data as shown in FIG. 32 or the reverse process, a process of removing partial data or the reverse process, or a process of adjusting the data amount of the scene description data by changing the quantization step used when the scene description is encoded. Control of the data amount of the scene description by adjusting the quantization step at the time of encoding can be realized, for example, as follows. In MPEG4 BIFS, quantization parameters indicating whether quantization is used and the number of bits used can be set for each quantization category, such as coordinates, rotation axes and angles, and sizes, and the quantization parameters can be changed even in the middle of one scene description. Therefore, if the number of bits used for quantization is reduced, for example, the data amount of the scene description can be reduced.
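The relation between the number of quantization bits and the data amount can be illustrated with a plain uniform quantizer. This sketch is not the BIFS quantizer itself; the value range, the sample coordinates, and the function names are assumptions, and `bits` is assumed to be at least 1.

```python
def quantize(values, lo, hi, bits):
    """Map each value in [lo, hi] to an integer code of the given bit width."""
    levels = (1 << bits) - 1
    return [round((v - lo) / (hi - lo) * levels) for v in values]


def dequantize(codes, lo, hi, bits):
    """Reconstruct approximate values from integer codes."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


def data_amount_bits(n_values, bits):
    """Data amount of the quantized values, ignoring any header overhead."""
    return n_values * bits


coords = [0.0, 0.25, 0.5, 1.0]  # made-up coordinate values in the range [0, 1]
fine = dequantize(quantize(coords, 0.0, 1.0, 8), 0.0, 1.0, 8)
coarse = dequantize(quantize(coords, 0.0, 1.0, 4), 0.0, 1.0, 4)
```

Halving the bit width halves the data amount of the quantized fields, at the cost of a reconstruction error bounded by half a quantization step.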
[0196]
As described above, according to the fourth scene description process, converting the scene description makes it possible to restore, on the decoding terminal 12 side, a scene simplified as intended on the server 10 side. Further, according to the fourth scene description process, the scene description processing unit 24 can delete the partial data in the scene description in ascending order of importance until the scene description matches the transmission band or the processing performance of the decoding terminal 12.
[0197]
Next, the fifth scene description process will be described.
[0198]
As the fifth scene description process, the server 10 side divides the scene description into a plurality of decoding units according to the state of the transmission path and the request from the decoding terminal 12, which makes it possible to adjust the bit rate of the scene description data and to avoid local concentration of the processing load at the decoding terminal 12. That is, the scene description processing unit 24 in this example divides the scene description into a plurality of decoding units according to the control of the control unit 26, the state of the transmission path, and the request from the decoding terminal 12, adjusts the output timing of the divided decoding units of the scene description, and outputs them. As with the coding unit, the decoding unit of a scene description that is to be decoded at a given time is referred to as an AU. Hereinafter, the fifth scene description process will be specifically described with reference to FIGS. 34 to 38.
[0199]
In FIG. 34, for example, a scene description representing four objects of a sphere, a cube, a cone, and a cylinder is described in one AU of MPEG4 BIFS. FIG. 35 shows a display example of a scene displayed by decoding the scene description of FIG. 34, in which four objects of a sphere 41, a cube 42, a cone 44, and a cylinder 43 are displayed. All the scenes described in one AU shown in FIG. 34 must be decoded at the designated decoding time and reflected on the display at the designated display time. Note that this decoding time (time when AU should be decoded and validated) is called DTS (Decoding Time Stamp) in MPEG4.
[0200]
In the fifth scene description processing example, when, for example, the bit rate of the data to be transmitted or the local processing load at the decoding terminal 12 is to be reduced because of the state of the transmission path or the request from the decoding terminal 12, the scene description processing unit 24 of the filter 23 divides the scene description into a plurality of AUs and shifts the DTS of each AU. The local bit rate of the scene description is thereby adjusted to a bit rate suited to the state of the transmission path or the request from the decoding terminal 12, and the processing amount required for decoding at each DTS is adjusted to a processing amount that meets the request from the decoding terminal 12.
[0201]
That is, the scene description processing unit 24 first divides the scene description shown in FIG. 34 into four AUs, AU1 to AU4, as shown in FIG. 36. Here, the first AU1 assigns an ID of 1 to the Group node that performs the grouping, so that the node can be referred to from the subsequent AUs. In MPEG4 BIFS, partial scenes can be added later to a grouping node that can be referred to. The second AU2 to the fourth AU4 each describe a command for adding a partial scene to the children field of the Group node whose ID is defined in the first AU1.
[0202]
Next, the scene description processing unit 24 assigns shifted DTSs to the first AU1 to the fourth AU4 as shown in FIG. 37. That is, the first DTS1 is designated for the first AU1, the second DTS2 for the second AU2, the third DTS3 for the third AU3, and the fourth DTS4 for the fourth AU4. As a result, the local bit rate of the scene description data sent from the server 10 to the decoding terminal 12 is reduced, and the local decoding load that arises at each DTS in the decoding terminal 12 is reduced.
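The division into AU1 to AU4 and the shifting of the DTS can be sketched as follows. This is an illustrative sketch only: the `AccessUnit` record, the textual `command` strings (which are stand-ins and not actual BIFS command syntax), the object sizes, and the DTS values in seconds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    dts: float      # decoding time stamp, in seconds (assumed unit)
    command: str    # textual stand-in for the command carried by the AU
    size_bits: int  # data amount of the AU


def split_into_aus(objects, first_dts, dts_interval, group_id=1):
    """Put each object in its own AU and give every AU a later DTS than the previous one.

    objects is a list of (name, size_bits) pairs. The first AU defines the Group
    node with the given ID; each later AU appends one object to its children field.
    """
    aus = []
    for i, (name, size_bits) in enumerate(objects):
        if i == 0:
            command = f"define Group ID={group_id} children=[{name}]"
        else:
            command = f"append {name} to children of node ID={group_id}"
        aus.append(AccessUnit(first_dts + i * dts_interval, command, size_bits))
    return aus


objects = [("sphere", 400), ("cube", 400), ("cone", 400), ("cylinder", 400)]
aus = split_into_aus(objects, first_dts=0.0, dts_interval=0.5)
```

Each AU now carries a quarter of the original data and is decoded at its own time, so neither the transmission nor the decoding of the whole scene is concentrated at a single DTS.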
[0203]
As shown in FIG. 38, in the scene displayed by decoding the scene description divided into four as shown in FIG. 36 at DTS1 to DTS4, an object is added at each DTS, and at the last DTS4 the same scene as in FIG. 35 is obtained. That is, the sphere 41 is displayed at DTS1, the cube 42 is added at DTS2, the cone 44 is added at DTS3, and the cylinder 43 is added at DTS4, so that the four objects are finally displayed.
[0204]
The conversion from the scene description shown in FIG. 34 to the scene description shown in FIG. 36, as in the fifth scene description process, can be realized in the scene description processing unit 24 by, for example: selecting and outputting, from among a plurality of scene descriptions prepared in advance in the storage unit 9, a scene description that satisfies evaluation criteria suited to the state of the transmission path and the request from the decoding terminal 12; taking the scene description read from the storage unit 9 as input and converting it into scene descriptions (AU1 to AU4) divided until the evaluation criteria are satisfied; or encoding and outputting scene descriptions (AU1 to AU4) divided until the evaluation criteria are satisfied. The evaluation criteria in the fifth scene description process may be any criteria that express a limit on the scene contained in one AU, such as the data amount of one AU or the number of nodes, objects, or polygons contained in one AU.
[0205]
As described above, according to the fifth scene description process, dividing the scene description into a plurality of AUs and adjusting the DTS interval between the AUs makes it possible to control the average bit rate of the scene description and to reduce the local decoding load at the decoding terminal 12. The average bit rate is obtained by dividing the total data amount of the AUs whose DTS falls within a given time interval by the length of that interval, so the scene description processing unit 24 can adjust the DTS interval to realize an average bit rate suited to the state of the transmission path or the request from the decoding terminal 12. The above example divides an AU, but conversely a plurality of AUs may be combined.
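The average bit rate calculation described above can be written out directly. In this sketch, AUs are represented as hypothetical `(dts_seconds, size_bits)` pairs, the interval is taken as half-open, and `dts_interval_for_rate` assumes equally sized AUs; none of these choices comes from the embodiment.

```python
def average_bit_rate(aus, start, end):
    """Total data amount of the AUs with start <= DTS < end, divided by the interval length."""
    total_bits = sum(size for dts, size in aus if start <= dts < end)
    return total_bits / (end - start)


def dts_interval_for_rate(au_size_bits, target_bps):
    """DTS spacing at which equally sized AUs average out to the target bit rate."""
    return au_size_bits / target_bps


# One undivided AU versus the same data divided into four AUs (sizes are made up).
single = [(0.0, 16000)]
divided = [(0.0, 4000), (0.5, 4000), (1.0, 4000), (1.5, 4000)]

rate_single = average_bit_rate(single, 0.0, 1.0)
rate_divided = average_bit_rate(divided, 0.0, 1.0)
interval = dts_interval_for_rate(4000, 8000)
```

Over the first second the undivided AU needs 16000 bit/s, while the divided AUs need 8000 bit/s, and a spacing of 0.5 s is what keeps four 4000-bit AUs at that rate.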
[0206]
In the above description, the first to fifth scene description processes are performed individually. However, these scene description processes may be combined arbitrarily so that a plurality of them are performed together. In that case, the functions and effects of the combined scene description processes described above are obtained simultaneously.
[0207]
In this embodiment, MPEG4 BIFS is given as an example of the scene description. However, the present invention is not limited to this, and can be applied to any scene description method. In addition, for example, when a scene description method capable of describing only a change amount of a scene description is used, the present invention can be applied to a case where only the change amount is transmitted.
[0208]
Furthermore, the above-described embodiment of the present invention can be realized by a hardware configuration or by software.
[0209]
In the above description, HTML and MPEG4 BIFS are given as examples of the scene description. However, the present invention can be applied to all scene description methods such as VRML and Java (trademark).
[0210]
The present invention is effective for any data encoding method regardless of the data type such as video data, audio data, still image data, text data, graphic data, and scene description data. Furthermore, the present invention can be realized by hardware or software.
[0211]
[Effects of the invention]
In the present invention, when normal playback is performed on the receiving side, the data used for normal playback is output, and when special playback is performed on the receiving side, the time information related to playback of the coding units of the data used for normal playback is converted according to the special playback and output. As a result, when special playback is performed on the receiving side, data other than video can also be decoded and displayed, scene description data can be distributed and decoded, the synchronization relationship between the data can be maintained, and the data can be distributed so as to satisfy evaluation criteria such as the transmission bit rate.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration example of a data distribution system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a detailed configuration of a server in the data distribution system according to the first embodiment.
FIG. 3 is a diagram used for explaining time information conversion processing when fast-forward playback is performed in the first embodiment.
FIG. 4 is a diagram used for explaining time information conversion processing when slow playback is performed in the first embodiment.
FIG. 5 is a diagram used for explaining time information conversion processing when a jump is performed in the first embodiment.
FIG. 6 is a block diagram illustrating a detailed configuration of a server of the data distribution system according to the second embodiment.
FIG. 7 is a diagram used for explaining time information conversion processing when fast-forward playback is performed in the second embodiment.
FIG. 8 is a diagram used for explaining a change in bit rate when fast-forward playback is performed in the second embodiment.
FIG. 9 is a block diagram illustrating a detailed configuration of a server of the data distribution system according to the third embodiment.
FIG. 10 is a flowchart showing a flow of division processing in the filter of the first specific example of the third embodiment.
FIG. 11 is a diagram used for explaining division candidates of a scene description in MPEG4 BIFS in the filter of the first specific example.
FIG. 12 is a diagram used for explaining the structure of the scene description of FIG. 11.
FIG. 13 is a diagram illustrating a result of decoding and displaying the scene description of FIG. 11.
FIG. 14 is a diagram illustrating a conversion result of the scene description of FIG. 11.
FIG. 15 is a diagram illustrating a different conversion candidate for the scene description of FIG. 11.
FIG. 16 is a block diagram illustrating a detailed configuration of a filter according to a second specific example of the third embodiment.
FIG. 17 is a diagram used for explaining the relationship between transmission priority, bit rate, and three ESs in the filter of the second specific example.
FIG. 18 is a diagram used for explaining a change in bit rate and a change in transmission priority.
FIG. 19 is a diagram illustrating a relationship Ps(R) between an ES bit rate R and a transmission priority.
FIG. 20 is a diagram illustrating a relationship Ps(S) between an image frame area S of an ES and a transmission priority.
FIG. 21 is a diagram illustrating a scene display result based on a scene description before conversion in the first scene description process.
FIG. 22 is a diagram illustrating an example of a scene description (MPEG4 BIFS) corresponding to the scene of FIG. 21.
FIG. 23 is a diagram illustrating a scene display result based on a scene description after conversion in the first scene description process.
FIG. 24 is a diagram used for explaining the timing of ES conversion and scene description conversion in the first scene description processing.
FIG. 25 is a diagram illustrating an example of a scene description (MPEG4 BIFS) corresponding to the scene of FIG. 23.
FIG. 26 is a diagram illustrating an example of information (MPEG4 ObjectDescriptor), attached to the scene description of FIG. 22, that is necessary for decoding the ES corresponding to the scene of FIG. 21.
FIG. 27 is a diagram illustrating an example of information (MPEG4 ObjectDescriptor), attached to the scene description of FIG. 25, that is necessary for decoding the ES corresponding to the scene of FIG. 23.
FIG. 28 is a diagram illustrating an example of a scene description (MPEG4 BIFS) in a case where an ES of a moving image is deleted from the scene described with reference to FIGS. 21 and 22.
FIG. 29 is a diagram showing a display result based on the scene description of FIG. 28.
FIG. 30 is a diagram illustrating an example of a scene description (MPEG4 BIFS) for displaying an object described by polygons.
FIG. 31 is a diagram showing a display result based on the scene description shown in FIG. 30.
FIG. 32 is a diagram illustrating an example of a scene description (MPEG4 BIFS) in which an object described with polygons is replaced with a sphere.
FIG. 33 is a diagram showing a display result based on the scene description shown in FIG. 32.
FIG. 34 is a diagram illustrating an example of a scene description (MPEG4 BIFS) including four objects.
FIG. 35 is a diagram showing a display result based on the scene description shown in FIG. 34.
FIG. 36 is a diagram illustrating an example of each scene description (MPEG4 BIFS) obtained by dividing the scene description illustrated in FIG. 34 into four AUs.
FIG. 37 is a diagram used for explaining the decoding timing of each AU shown in FIG. 36.
FIG. 38 is a diagram showing a display result of each AU shown in FIG. 36 according to the scene description.
FIG. 39 is a block diagram showing a schematic configuration of a conventional data distribution system.
FIG. 40 is a block diagram showing a schematic configuration of a data distribution system that eliminates the drawbacks of the data distribution system shown in FIG. 39.
FIG. 41 is a diagram used for a brief description of an example (fast-forward playback) of an operation of a data conversion unit for video data in the data distribution system of FIG. 40.
FIG. 42 is a diagram used for a brief description of an example (rewind playback) of an operation of the data conversion unit for video data in the data distribution system of FIG. 40.
FIG. 43 is a diagram used for describing scene description using VRML and MPEG4 BIFS.
[Explanation of symbols]
1 special reproduction control unit, 7 data conversion unit, 4 multiplexing unit, 5 transmission unit, 9 storage unit, 10 server, 12 decoding terminal, 13 receiving unit, 14 separation unit, 15 decoding unit, 16 scene composition unit, 17 reading unit, 18 scheduler, 19 time information rewriting unit, 23 filter, 24 scene description processing unit, 25 ES processing unit, 26 control unit

Claims

A data processing method for transmitting data encoded in predetermined coding units from a transmitting side to a receiving side, the method comprising:
a step of receiving a special reproduction designation signal supplied from the receiving side;
a step of selecting, based on the received special reproduction designation signal and in accordance with bit rate adjustment of the data to be output, the coding units to be output as the data used for special reproduction on the receiving side;
a step of converting time information related to reproduction of the selected coding units in accordance with the special reproduction;
a step of changing, in accordance with the bit rate adjustment of the data to be output, scene description data describing a display area used when the data used for the special reproduction is output; and
a step of outputting the converted time information, the changed scene description data, and the data used for the special reproduction to the receiving side.

The data processing method according to claim 1, wherein, in the step of changing the scene description data, the scene description data is changed based on decoding terminal information indicating the decoding capability and display capability of the receiving side and on layered information indicating the transmission capacity of the transmission medium used for transmission of the scene description data.

The data processing method according to claim 1, wherein the scene description data is data describing a display area of a scene composed of a plurality of elementary streams.

The data processing method according to claim 1, wherein the time information related to reproduction of the coding unit includes any one or a combination of a reproduction start time, a reproduction time, a reproduction end time, a decoding time, and a data arrival time used when the coding unit is reproduced on the receiving side.

The data processing method according to claim 1, wherein the time information related to reproduction of the coding units of the data used for the normal reproduction is converted according to the reproduction speed of the special reproduction.

The data processing method according to claim 1, wherein, when jump reproduction in which the reproduction position is moved to a temporally non-continuous coding unit is performed as the special reproduction, output of the coding units falling between the start time and the end time of the jump reproduction is stopped.

The data processing method according to claim 1, wherein the time information is changed so that the reproduction end time of a coding unit that is reproduced across the start time of the special reproduction becomes the start time of the special reproduction.

The data processing method according to claim 1, wherein the time information is changed so that the reproduction start time of a coding unit that is reproduced across the end time of the special reproduction becomes the end time of the special reproduction.

The data processing method according to claim 6, wherein, when jump reproduction in which the reproduction position is moved to a temporally non-continuous coding unit is performed as the special reproduction, output of a coding unit that is reproduced across the start time or the end time of the jump reproduction is stopped.

The data processing method according to claim 2, wherein, when the coding units are selected, coding units in which the data is encoded without using prediction between coding units are preferentially selected.

A data processing apparatus for transmitting data encoded in predetermined coding units to a receiving side, the apparatus comprising:
data conversion means for selecting, based on a special reproduction designation signal supplied from the receiving side and in accordance with bit rate adjustment of the data to be output, the coding units to be output as the data used for special reproduction on the receiving side, and for converting time information related to reproduction of the selected coding units in accordance with the special reproduction;
filter means for changing, based on the special reproduction designation signal and in accordance with the bit rate adjustment of the data to be output, scene description data describing a display area used when the data used for the special reproduction is output; and
transmission means for outputting the time information converted by the data conversion means, the scene description data changed by the filter means, and the data used for the special reproduction to the receiving side.