JP3660514B2 - Variable rate video encoding method and video editing system

Info

Publication number: JP3660514B2
Application number: JP02847499A
Authority: JP
Inventors: 晋一郎古藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-02-05
Filing date: 1999-02-05
Publication date: 2005-06-15
Anticipated expiration: 2019-02-05
Also published as: JP2000228770A

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像シーケンスに対して複数回の符号化を行うことにより、可変レート符号化のための最適なビット配分を行う可変レート符号化方法および同方法を用いた動画像編集システムに関する。
【０００２】
【従来の技術】
蓄積媒体に対する動画像圧縮符号化方法としては、動画像符号化の国際標準であるＭＰＥＧ２ビデオ符号化方式が採用されたＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）に代表されるように、画像の性質に応じて圧縮データのビットレート（すなわち圧縮率）を時間的に変動させて、総符号量一定の制約中で高画質化を実現するという可変レート符号化技術が用いられている。これは、時間的或いは空間的相関が高く、比較的少ない符号量で必要な画質を満足できる部分には、不必要な符号量割り当ては行わず、また解像度が高くあるいは動きが激しいような、画質を維持するためには多くの符号量を必要とする部分には、多くの符号量を割り当て、全体としてディスク容量に収まるように符号化することで、一定の符号化レートで符号化するよりも高画質化を実現するという技術である。
【０００３】
通常、動画像シーケンス全体に渡っての最適な符号量割り当てを行うためには、まずはじめに動画像シーケンス全体を符号化してその符号化特性の解析を行い、それに基づいて符号量割り当てを行うことが必要になる。例えば、量子化幅を一定にして第一回目の符号化を行うことにより、そのときの発生符号量をフレームあるいはＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅｓ）等の単位で計測し、計測された発生符号量に応じてフレーム或いはＧＯＰ単位で第二回目の符号化のためのビット割り当てを行う方法等が一般的に取られている。
【０００４】
また、符号化すべき入力動画像シーケンスを一旦ハードディスク等の蓄積媒体に取り込んで計算機上で編集作業を行うという、いわゆるノンリニア編集を行う場合には、
１）動画像シーケンスの取り込み、
２）編集、
３）符号量配分のための第一回目の符号化、
４）最終的な符号化、
という手順を踏む必要がある。
【０００５】
さらに、ノンリニア編集作業の効率化のため、符号化データを用いた動画像のシーンチェンジの検出やシーンの統合といった動画像解析技術を併用してユーザによる編集作業を支援する機能を持つノンリニア編集システムの場合は、上記動画像シーケンスの取り込みに際して符号化を行う場合もある。この場合、取り込みに際して行う符号化時の動画像シーケンスと、編集操作後の動画像シーケンスとは一致しなくなるので、編集後に２回の符号化が必要となり、合計で少なくとも３回の符号化が必要となることになる。
【０００６】
【発明が解決しようとする課題】
上述したように、最適な符号量割り当てに基づく可変レート符号化を行う場合、通常、同一の素材に対して少なくとも２回の符号化が必要である。しかし、１回目の符号化の後に素材に対して編集操作が加えられ、特に時間方向の編集操作により、１回目の符号化時の素材との間でフレーム単位の対応関係が崩れた場合は、再度編集後の素材に対して、符号量配分のための１回目の符号化からやり直す必要があった。
【０００７】
本発明はこのような事情に鑑みてなされたものであり、１回目の符号化後に時間方向の操作を含む編集操作が素材に対して加えられても、再度２回の符号化を行うことなく、最適な符号量割り当てに基づく可変レート符号化を実現することが可能な可変レート動画像符号化方法および同方法を用いた動画像編集システムを提供することを目的とする。
【０００８】
【課題を解決するための手段】
上述の課題を解決するため、本発明は、ユーザからの編集情報に基づいて動画像を編集してその編集後の動画像を圧縮符号化する動画像編集システムに適用される可変レート動画像符号化方法であって、編集対象の第１の動画像シーケンスを符号化する第一の符号化ステップと、前記第一の符号化ステップによる符号化結果から、前記第１の動画像シーケンスの所定単位毎の発生符号量および平均量子化ステップサイズを含む統計データを前記第１の動画像シーケンス全体にわたって計測するステップと、前記計測された統計データに基づいて、前記第１の動画像シーケンスにおける連続する複数フレームから構成される第１の所定期間毎に当該第１の所定期間に属する画像の複雑さを示す第１の符号化困難度を算出するステップと、前記第１の動画像シーケンスとその編集後の第２の動画像シーケンスとの間の時間軸上の対応関係に基づいて、前記第１の符号化困難度を、前記第２の動画像シーケンスにおける連続する複数フレームから構成される第２の所定期間毎に当該第２の所定期間に属する画像の複雑さを示す第２の符号化困難度に変換する変換ステップであって、前記第２の所定期間が複数の前記第１の所定期間の間の境界にまたがる場合、前記第２の所定期間が属する前記複数の第１の所定期間それぞれに対する比率に応じて前記複数の第１の所定期間それぞれの第１の符号化困難度を加重平均することによって、前記第２の所定期間の第２の符号化困難度を求める変換ステップと、前記第２の符号化困難度に基づいて、前記第２の動画像シーケンスの前記第２の所定期間毎に割り当て符号量を決定するステップと、この割り当て符号量に基づいて、前記第２の動画像シーケンスを可変レート符号化する第二の符号化ステップとを具備することを特徴とする。
【０００９】
この可変レート動画像符号化方法においては、第１の動画像シーケンスの符号化結果から所定期間毎の符号化困難度を算出し、得られた符号化困難度を、第１の動画像シーケンスとその編集後の第２の動画像シーケンスとの間の時間軸上の対応関係に基づいて、第２の動画像シーケンス上の所定期間毎の符号化困難度に変換する。この場合、第２の所定期間が複数の前記第１の所定期間の間の境界にまたがる場合には、第２の所定期間が属する複数の第１の所定期間それぞれに対する比率に応じて、複数の第１の所定期間それぞれの第１の符号化困難度を加重平均することによって、第２の所定期間の第２の符号化困難度が求められる。そして、その第２の動画像シーケンス上の所定期間毎の符号化困難度を用いて、第２の動画像シーケンスの所定期間毎のビット割り当てが行われる。このように符号化困難度を編集後の第２の動画像シーケンスに再マッピングする操作を行うことにより、１回目と２回目の符号化とで、映像素材がフレーム単位あるいはＧＯＰ単位で一致してない場合でも、第２の動画像シーケンスが第１の動画像シーケンスの少なくとも一部から構成されていれさえすれば、最適なビット割り当てが可能になり、総符号化回数を削減することができる。
【００１０】
この場合、編集後の動画像を構成する第２の動画像シーケンスは、時間方向には、編集前の第１の動画像シーケンスと同一、あるいは第１の動画像シーケンスに含まれる一部分、あるいは第１の動画像シーケンスの複数部分を接続したもの、のいずれの場合でもよく、また第１の動画像シーケンスの各フレームと第２の動画像シーケンスの各フレームの解像度は、それぞれどちらか一方から解像度変換された縮小画像でもよい。また、第１の動画像シーケンスの各フレームと第２の動画像シーケンスの各フレームの映像は、それぞれどちらか一方の映像の一部を切り出したものや、フィルタ処理等を加えられたものでもよい。つまり、第１の動画像シーケンスと第２の動画像シーケンスとの間に、時間方向および空間方向の何らかの編集或いは変換が加えられたものであればよい。
【００１１】
また、前記第１の符号化ステップは、所定のフレーム周期でフレーム内符号化フレームが挿入されるように、前記第１の動画像シーケンスを予め決められたフレーム間予測構造を持つ符号化フレーム群単位に区切りながら符号化し、前記第１の動画像シーケンスに対応する前記第１の符号化困難度は、前記符号化フレーム群のＮ倍（Ｎは自然数）に相当する期間毎に算出することが好ましい。
【００１２】
このように第１の動画像シーケンスにおける所定期間を、第一の符号化におけるＧＯＰの倍数とすることで、ピクチャタイプに依存した符号量の変動の影響を取り除くことが可能となる。
【００１３】
また、第２の動画像シーケンスにおける所定期間も、第二の符号化におけるＧＯＰの倍数とすれば、ピクチャタイプ毎の符号量変動を特に意識せずにビット割り当てを行うことが可能となる。
【００１４】
また、第１の動画像シーケンスにおける所定期間と第２の動画像シーケンスにおける所定期間とが時間軸上で一致している場合には、第一の符号化で得られた第１の符号化困難度はそのまま第２の符号化困難度として利用可能であり、またそれぞれの所定期間が一致しない場合は、それらの位置関係に基づいて、第１の動画像シーケンスにおける所定期間毎の符号化困難度を加重平均することで、第２の符号化困難度を求めることができる。よって、第２の動画像シーケンスにおけるＧＯＰ構成は自由に設定することが可能となり、第一の符号化結果からシーンチェンジ点を検出し、シーンチェンジとＧＯＰの先頭を一致させたＧＯＰ構成として、第２の動画像シーケンスを符号化することも可能となる。
【００１５】
また、本発明では、第一の符号化における所定単位毎の発生符号量および平均量子化ステップサイズに基づいて符号化困難度を検出しているため、第一の符号化では、従来一般的に行われている固定の量子化ステップサイズの符号化のみならず、量子化ステップサイズを変動させるレート制御を加えた符号化を行ってもよい。したがって、従来は発生符号量を制御できなかった第一の符号化を、所望のレートでエンコードした有効な符号化データとして用いることが可能となる。例えば、第一の符号化は１．５Ｍｂｐｓの固定レート符号化とし、第二の符号化は平均４Ｍｂｐｓの可変レート符号化とすることで、４Ｍｂｐｓの可変レート符号化データと、副次的な１．５Ｍｂｐｓの固定レート符号化データを得ることが可能となる。
【００１６】
また、入力動画像シーケンスをフレーム単位で水平垂直それぞれサブサンプリングした画像を用いて第一の符号化を行い、第二の符号化は通常の画像サイズでの符号化を行うことで、第一の符号化で固定レートのＭＰＥＧ１の符号化データ、第二の符号化で可変レートのＭＰＥＧ２の符号化データを得ることや、或いは、第一の符号化で固定レートのＳＤＴＶ（標準ＴＶ画像）の符号化データ、第二の符号化で可変レートのＨＤＴＶ（高精細ＴＶ画像）の符号化データを得るような構成も可能である。
【００１７】
【発明の実施の形態】
以下、図面を参照して本発明の実施形態を説明する。
【００１８】
図１には、本発明の一実施形態に係る動画像編集システムで用いられる可変レート動画像符号化処理の流れが示されている。本可変レート動画像符号化処理は、ハードディスク装置等に記録されたＶＴＲ等の映像素材に対する２回の符号化により、高画質な可変レート符号化を行うものである。以下、図１を参照して、本実施形態の処理全体の流れについて説明する。
【００１９】
（１）符号化タイムコード指定
まず、ユーザ操作に基づいて、原画像であるＶＴＲ素材に対して、符号化開始タイムコード（ＩＮ点）と符号化終了タイムコード（ＯＵＴ点）の指定が行われる（ステップ１０）。
【００２０】
（２）第一の符号化（１パス目符号化）
第一の符号化では、ＩＮ点およびＯＵＴ点を示す符号化タイムコード指定情報２０に基づき、ＩＮ点からＯＵＴ点までの連続した動画像シーケンスの符号化が実行される（ステップ１１）。この第一の符号化では、例えば水平７２０画素×垂直４８０ラインのフレームサイズの入力画像を、水平３５２画素×垂直２４０ラインのフレームサイズにダウンサンプリングし、例えば１．５Ｍｂｐｓの固定ビットレートのＭＰＥＧ１方式により符号化が行われる。この場合、所定のフレーム周期でフレーム内符号化フレーム（Ｉピクチャ）が挿入されるように、ＩＮ点からＯＵＴ点までの連続した動画像シーケンスは予め決められたフレーム間予測構造を持つＧＯＰ単位に区切りながら符号化する。
【００２１】
この第一の符号化に関しては、特別な制約条件はなく、任意のビットレート、任意のパラメータ、任意の符号化方式、任意の解像度での符号化も可能である。つまり、ダウンサンプリングの有無や、符号化方式がＭＰＥＧ１かＭＰＥＧ２かの選択、また、レート制御についても、固定ビットレート、リアルタイムの可変ビットレート、あるいはレート制御をかけずに量子化ステップを固定にするなど、自由に設定できる。本実施形態では、標準で上記のサブサンプル画像のＭＰＥＧ１符号化を第一の符号化に用いている。その理由は、ビットレートの低いＭＰＥＧ１符号化を第一の符号化とすることで、後述するビットストリーム解析を高速に行うことが可能となり、また第二の符号化をＭＰＥＧ２符号化とすることで、ＭＰＥＧ１とＭＰＥＧ２の２種類の有効なストリームを得るためである。
【００２２】
この第一の符号化では、ＩＮ点からＯＵＴ点までの連続した動画像シーケンスの符号化データから構成される符号化ビットストリームファイル２２が生成されると共に、符号化ログデータおよび符号化パラメータそれぞれについてのファイル２１が生成される。ここで、符号化ログデータとは、符号化により得られるフレーム単位の発生符号量、各フレームの平均量子化ステップサイズ、動き補償予測の誤差量、等の統計データを含むものである。また、符号化パラメータは、第一の符号化におけるビットレート等の符号化パラメータを記録したものである。符号化ログデータは、符号化ログデータを出力する機能を有する符号化器を使用することによって符号化処理と並行してリアルタイムに生成することもできるが、符号化ログデータを出力する機能がない符号化器であっても、符号化器から出力されるビットストリーム２２を解析することで、符号化後にオフラインで符号化ログデータとほぼ同等の情報を生成することも可能である。
【００２３】
（３）ビットストリーム解析
ビットストリーム解析は、ユーザによる編集作業を効率化するために必要なシーンチェンジポイント等の情報を検出するためのものであり、このステージでは、第一の符号化により得られたビットストリーム２２の解析処理が行われる（ステップ１２）。このビットストリーム解析処理では、まず、シーンチェンジ点の検出が行われる。そして、検出されたシーンチェンジ点を用いることにより、シーン構造の解析が行われる、シーン構造解析では、検出されたシーンチェンジ点間に挟まれる部分をショットとして位置づけ、関連するショット同士を同一シーンとして統合する処理などが行われる。これにより、異なるショット間でも意味的に同一種に属するショット同士は１シーンとして扱われる。
【００２４】
また、シーン構造を示す幾つかの代表フレームを、縮小フレームサイズで簡易デコードした画像データの生成も併せて行われる。
【００２５】
なお、ビットストリーム解析処理としては、第一の符号化が終了した後に、生成された符号化ビットストリームファイル２２の解析を開始する構成でもよいし、あるいは第一の符号化が開始された一定遅延後に、第一の符号化と並行して、生成中のビットストリーム２２を順次解析する構成でもよい。
【００２６】
（４）符号化タイムコードエディット
ステップ１３の符号化タイムコードエディット処理は、第一の符号化（１パス目符号化）と同一の現画像に対する編集処理をユーザ操作に基づいて行うためのものであり、その処理の実行のために、ステップ１２のビットストリーム解析により得られたシーンチェンジ点の情報、シーン構造の解析結果、及びシーン構造を示す代表フレームの画像データ等を含む情報２３が使用される。この符号化タイムコードエディット処理では、シーンチェンジ点、シーン構造、および代表フレームをグラフィカルにユーザに呈示するとともに、ユーザによる第二の符号化領域指定機能を提供する。
【００２７】
ここで、第二の符号化領域指定とは、第一の符号化により符号化されたＩＮ点からＯＵＴ点までに属する時間領域の中で、最終的に符号化データとしたい部分を第二の符号化領域としてユーザに指定させるものである。また、ユーザにより指定された第二の符号化領域に属する時間領域全体をユーザ操作に応じて明示的に分割してチャプターを構成し、最終的な符号化データでチャプター単位のランダムアクセスを可能とするランダムアクセスポイントの設定処理も併せて行う。
【００２８】
（５）可変レートビットアロケーション
可変レートビットアロケーションでは、第２の符号化における符号量割り当てを決定する処理が行われる（ステップ１４）。この最適符号量割り当て処理では、符号化タイムコードエディット処理で設定された符号化対象範囲を示すタイムコード及びチャプター境界を示すタイムコードの情報２４と、第一の符号化で得られた符号化ログファイル及び符号化パラメータファイル２１が用いられる。ここでの符号量割り当てでは、すべての符号化領域での総符号量が所定値以下となり、且つ最大瞬間ビットレート、最小瞬間ビットレート、ＭＰＥＧ２の規格で規定されるＶＢＶ（ＶｉｄｅｏＢｕｆｆｅｒｉｎｇＶｅｒｉｆｉｅｒ）の制約条件等をすべて満たした上で、均一で安定した画質を得るための可変レートビット配分を行う。この可変レートビットアロケーションの具体的に処理手順については、図６で詳述するが、基本的には、第一の符号化（ステップ１１）で得られた発生符号量および平均量子化ステップサイズを用いて画像の複雑さ（符号化困難度）を示すパラメータをＧＯＰ単位で算出し、そのパラメータを可変レート符号化の対象となる編集後の動画像シーケンスに再マッピングすることによって行われる。
【００２９】
（６）第二の符号化（２パス目符号化）
第二の符号化処理では、可変レートビットアロケーション処理によるビット配分結果２６と、符号化対象及びチャプター境界を示すタイムコード情報２５とに従って、再度同一のＶＴＲ素材から構成される第二の動画像シーケンスに対して第二の符号化が実行される（ステップ１５）。この第二の符号化処理では、ビット配分結果２６に応じた可変レート制御によって、第二の符号化領域が属する符号化対象タイムコード期間の符号化が行われる。また、チャプタ境界をランダムアクセス可能とするためのフレーム間予測構造の制御も行われる。これにより、最適化された符号化ビットストリーム２７が得られる。
【００３０】
第一の符号化と同様に、第二の符号化においても各符号化パラメータは自由に設定することが可能である。本実施形態では、標準の第二の符号化として、水平７２０画素×垂直４８０ラインのフルサイズのＭＰＥＧ２符号化を例えば平均４Ｍｂｐｓの可変レートで行う。この場合、所定のフレーム周期でフレーム内符号化フレーム（Ｉピクチャ）が挿入されるように、符号化対象の動画像シーケンスは予め決められたフレーム間予測構造を持つＧＯＰ単位に区切りながら符号化される。
【００３１】
このように、図１では、第一の符号化（ステップ１１）の符号化結果を、画像解析および編集処理（ステップ１２，１３）と、符号量割り当て決定処理および第二の符号化処理（ステップＳ１４，Ｓ１５）とで共用する構成となっている。
【００３２】
次に、本実施形態の動画像編集システムで用いられる各モジュールの構成例を説明する。図２は、本実施形態に係る動画像編集システム全体の構成を示したものである。本システムは、符号化されたデジタルコンテンツを作成するための一種のオーサリングシステムとして利用されるものである。図２においては、汎用の計算機（例えば、パーソナルコンピュータ（以下ＰＣ））をベースにして、ソフトウェアモジュールとハードウェアモジュールから構成される場合が例示されている。
【００３３】
動画像符号化部であるエンコーダ３３はフレーム内符号化および動き補償予測フレーム間符号化をフレーム単位またはマクロブロック単位に切り替えながら符号化を行うものであり、これは専用ハードウェアで構成されている。エンコーダ３３を制御するためのエンコードコントローラ３２はＰＣ上のソフトウェアで構成される。ただし、エンコーダ３３は必ずしも専用ハードウェアでなくともよく、エンコーダ３３がソフトウェアで構成されていてもよい。また、図２において、システム全体を制御するシステムコントローラ３１、ユーザーがエンコーダ３３の制御を行うためのメインＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）モジュール３０、符号化データからのシーン構造の解析と縮小フレームサイズの簡易デコードを行うデータアナライザ３４、データアナライザ３４の出力をグラフィカルに表示するとともにタイムコードの編集操作を行う編集操作ＧＵＩモジュール３５、第二の符号化のための符号量割り当てを行うビットアロケータ３６の各モジュールは、すべてＰＣ上のソフトウェアで構成される。
【００３４】
また、第一の符号化により得られるビットストリームと、符号化ログデータ及び符号化パラメータは、それぞれハードディスク装置３８と３７にファイルとして記録される。また、第二の符号化により得られるビットストリームは、ハードディスク装置３９にファイルとして記録される。これらのハードディスク装置３７，３８，３９は同一のものであってもよい。
【００３５】
図３は、図２におけるエンコーダ３３の内部構成を示したものである。エンコーダ３３は、ハードウェアによる符号化装置であっても、また符号化ソフトウェアであっても、その基本的な構成は同じである。
【００３６】
すなわち、入力動画像信号５４は、動き検出部（ＭＥ）４１にてマクロブロック単位で動き検出が行われる。また、動き補償部（ＭＣ）４９では、動き検出結果を基に、参照画像メモリ４８に記録されたローカルデコード画像（参照画像）からマクロブロック単位で予測画像が作成される。入力画像と予測画像との差分は予測誤差信号として離散コサイン変換部（ＤＣＴ）４２に入力され、離散コサイン変換部（ＤＣＴ）４２による離散コサイン変換処理によって直交変換される。離散コサイン変換処理によって得られたＤＣＴ係数は、量子化部（Ｑ）４３、可変長符号化部（ＶＬＣ）４４によりそれぞれ、量子化、可変長符号化の処理が行われることにより、符号化ビットストリームが生成される。この符号化ビットストリームはハードディスク装置５２に記録される。また、Ｉピクチャ、つまりフレーム内符号化画像として符号化する場合には、動き検出および動き補償は行われず、入力画像信号に対して離散コサイン変換４２以降の処理のみが行われる。また、ＩまたはＰピクチャについてはそれを参照画像として使用するために、量子化部（Ｑ）４３の出力は、さらに逆量子化部（ＩＱ）４６、逆離散コサイン変換部（ＩＤＣＴ）４７により逆量子化、逆離散コサイン変換の処理が加えられ、そして予測画像との加算により、ローカルデコード画像として参照画像メモリ（ＦＭ）４８に書き込まれる。
【００３７】
レート制御部（ＲＣ）４５では、符号化により発生した符号量の計数結果に応じて、フィードバック制御により量子化部（Ｑ）４３でのマクロブロック単位の量子化ステップサイズの決定を順次行う。エンコーダコントローラ（ＣＯＮＴ）５０は、レート制御部（ＲＣ）４５、動き検出部（ＭＥ）４１及び動き補償部（ＭＣ）４９との間の双方向通信をデータバス５３を介して行い、動き補償誤差量、各フレーム毎の発生符号量、各フレーム単位の平均量子化ステップサイズ等の統計データを収集してハードディスク装置５１へ符号化ログデータファイルとして書き込むと共に、符号化ビットレート等の符号化パラメタを符号化パラメタファイルとしてハードディスク装置５１へ書き込む。また、エンコーダコントローラ（ＣＯＮＴ）５０は、チャプター境界にランダムアクセスポイントを挿入するためのフレーム間の予測構造の制御、レート制御部（ＲＣ）４５への指示等も行う。レート制御部（ＲＣ）４５への指示は、可変レート符号化に対応したレート変動の制御や、ランダムアクセスポイント設定のためのフレーム予測構造変化に対応したレート制御等のために行われる。
【００３８】
図４は、図２におけるデータアナライザ３４に含まれる動画像シーケンス分析モジュールの構造を示したものである。動画像シーケンス分析モジュールは前述のビットストリーム解析処理（ステップ１２）を行うものである。ここで示した動画像シーケンス分析モジュールは、公開特許公報（特開平９−９３５８８号）で開示された手法を用いて、シーン構造の解析を行う。すなわち、ビットストリーム解析部６１に対して第一の符号化データ６０（図１のビットストリーム２２に相当）を入力し、符号化データ中の動きベクトルの抽出を行い、抽出された動きベクトル情報６２をシーンチェンジ検出モジュール６３に入力し、動きベクトルの変化に基づいてシーンチェンジ点の検出を行う。さらに、シーン構成検出モジュール６５において、検出されたシーンチェンジ点に挟まれたそれぞれのショット間の類似度計算により類似するショット同士の統合が行われる。
【００３９】
図５は、図２における編集操作ＧＵＩモジュール３５によって提供される編集操作画面の一例を示したものである。図中、７１は図２のデータアナライザ３４で簡易デコードされた縮小画像であり、横軸を時間軸として表示される。縮小画像は、フレーム内符号化画像（Ｉピクチャ）を構成する各ブロックのＤＣ成分のデコードに基づくものであり、シーン構造解析の結果、特徴的な画像（Ｉピクチャ）が選択されて表示される。図中の７２は、図４のシーンチェンジ検出モジュール６３で検出されたシーンチェンジ点である。縮小画像７１とシーンチェンジ点７２は図示のように対応して表示され、どの縮小画像７１がシーンチェンジ点であるかがユーザにグラフィカルに呈示される。７３〜７５は、図４のシーン構成検出モジュール６５で検出されたシーン構造を表示したものである。関連するショットは一つのシーンとして扱われる。図５の例では、７３〜７５の３シーンがシーン構造として表示されている。
【００４０】
図中７６〜７９は、ユーザにより設定されたチャプタ構造を示している。ユーザは、表示された縮小画像７１、シーンチェンジポイント７２、シーン構造７３〜７５等を参照して、第一の符号化が行われた範囲内で、第二の符号化を行う範囲及びチャプタの設定をマウス操作により行う。図５の例では、第二の符号化領域は、７６，７７，７９で示される３つのチャプタで構成され、７８で示される区間は第二の符号化では削除される。また、７６，７７，７９は、途切れのない連続再生が可能なように符号化され、且つそれぞれのチャプタの先頭からのランダムアクセス再生が可能なように、チャプタの先頭がＧＯＰの先頭となるような予測構造の制御も行われる。
【００４１】
プレビュー再生用コマンド８２を操作すると、ユーザにより設定されたチャプタ構造に沿って、第一の符号化データを再生することにより、表示部８３に編集後の動画像シーンチェンジの簡易表示をすることも可能である。ユーザによりチャプタ構成が決定されると、次にビット配分ボタン８６を操作することにより、符号化期間全体（７６，７７，７９）で所定の符号量以下となり、ビットレートの上限、下限、及びＶＢＶの条件を満たす最適符号量配分の計算が開始される。符号量配分の計算には、第一の符号化における符号化ログファイルと符号化パラメータファイル、そしてユーザが設定したチャプタ構成情報が用いられる。
【００４２】
ビット配分処理が終了すると、図中の８１に示すように、ビット配分結果に基づく可変レートのレート変動がビットアロケーション情報としてグラフィカルに表示される。ユーザは、さらにマウス操作により、レート変動に補正を加えることが可能である。その場合、補正した部分の情報を含めて、符号量配分処理の再計算が行われる。チャプタ構造、符号量配分のいずれもが確定した時点で、エンコードボタン８７をユーザが操作すると、第二の符号化が開始される。符号化中は、表示部８３に原画像が、表示部８４には第二の符号化で得られる第二の符号化データをリアルタイムで復号した画像がそれぞれ表示される。
【００４３】
次に、本実施形態で用いられる符号量配分アルゴリズムについて説明する。
【００４４】
図６は、本実施形態に係るビットアロケータモジュールの符号量配分処理のフローチャートを示した図である。ビットアロケータモジュールへの入力は、編集操作ＧＵＩモジュール３５で設定された符号化領域及びチャプタ境界を示すタイムコード情報９０、第一の符号化による符号化ログデータ９１、設定ビットレート情報９２である。タイムコード情報９０は、各チャプタ毎に開始タイムコード（ＩＮ点）と終了タイムコード（ＯＵＴ点）の組を示すテーブルである。また、符号化ログデータ９１は、第一の符号化におけるフレーム単位の発生符号量と、マクロブロック毎の量子化ステップサイズを各フレーム毎に平均したフレーム平均量子化ステップサイズとを少なくとも含んだ符号化情報である。また、設定ビットレート９２は、予め設定された平均ビットレートＲａｖｅ、最大瞬間ビットレートＲｍａｘ、最小瞬間ビットレートＲｍｉｎを含むビットレートパラメータである。
【００４５】
（１）短時間コンプレキシティ計算
短時間コンプレキシティ計算では、第一の符号化ログデータ９１を用いることにより短時間タイムスロット毎の符号化困難度（以下、短時間コンプレキシティと呼ぶ）が計算される（ステップ９３）。短時間コンプレキシティとは、その単位時間に属する画像の時間的・空間的な複雑さの度合い（符号化困難度）を示すパラメータであり、時間的に動きの激しい画像や高解像度の画像程、その値が大きくなる。図７の１００は、量子化ステップサイズを固定にした場合のフレーム単位の発生符号量の変化を示したものである。ＭＰＥＧの符号化では、一般にフレーム内符号化を行うＩピクチャ、前方予測符号化を行うＰピクチャ、両方向予測符号化を行うＢピクチャの順で予測による符号化効率が上がり、符号量が少なくなる。つまり、同一画像であっても、フレーム単位の発生符号量はＩ，Ｐ，Ｂのピクチャタイプに依存して大きく変化する。そこで、上記短時間タイムスロットをＧＯＰの構成にアラインドさせて、ＮＧＯＰ（Ｎは自然数）毎に、そこに含まれるフレームそれぞれの画像の複雑さの平均値を短時間コンプレキシティとして求めることで、ピクチャタイプに依存した変動をなくすことができる。図７の１０１は、１ＧＯＰ毎に求めた短時間コンプレキシティの変化を示したものである。短時間コンプレキシティは、例えば以下の式（１）で求めることができる。
【００４６】
【数１】

【００４７】
式（１）では、各ＧＯＰを構成するタイムスロット内のフレームそれぞれについて、発生符号量Ｂと平均量子化ステップサイズＱとピクチャタイプに依存した重み係数Ｗとの積を累積加算し、ＧＯＰを構成するタイムスロットの時間幅で除算することで、短時間コンプレキシティを求めている。これは、同一の映像については量子化ステップサイズが増加すると発生符号量はそれに対して単調に減少するということを考慮し、同一画像であれば発生符号量と量子化ステップ幅の積は一定であるというモデル化を行い、さらに、Ｉピクチャ及びＰピクチャは参照画像として用いられるが、Ｂピクチャは参照画像としては用いられないので、ピクチャタイプに応じた重み付けを行うようにしたものである。ピクチャタイプに応じた重み付けは、次のように行われる。
【００４８】
すなわち、参照画像として用いられないＢピクチャは、通常、参照画像として用いられるＩピクチャ及びＰピクチャよりも、割り当て符号量が小さく設定される。具体的には、Ｉピクチャ及びＰピクチャの量子化ステップサイズを１０と仮定すると、Ｂピクチャの量子化ステップサイズはその１．４倍の１４が用いられる。したがって、このような量子化ステップサイズの違いを吸収するために、Ｂピクチャの重み付けＷの値は、Ｉピクチャ及びＰピクチャのそれよりも小さく設定される。これにより、短時間コンプレキシティの値はＩピクチャ及びＰピクチャが支配的となり、ピクチャタイプに依存した量子化ステップサイズの違いを平均化した状態で、短時間コンプレキシティを求めることができる。
【００４９】
（２）短時間コンプレキシティの再マッピング
次に、タイムコード情報９０に従って、上記の短時間コンプレキシティを第二の符号化領域に再マッピングする（ステップ９４）。これは、ユーザの指定したチャプタ構造によってＧＯＰの構成が第一の符号化と第二の符号化とで変化することを考慮して、第二の符号化におけるＧＯＰ構造に対応する第二のタイムスロットを設定し、上記の短時間コンプレキシティを第二のタイムスロットにマッピングする操作を行うものである。第二のタイムスロットは、第二の符号化で用いられる各ＧＯＰの始点から終点までの時間領域である。
【００５０】
短時間コンプレキシティの再マッピングは、第一の符号化におけるＧＯＰと第二の符号化におけるＧＯＰとの間の時間軸上の対応関係に基づいて行われる。式２に、式１で求めた第一の符号化結果による短時間コンプレキシティＣ１（ｉ）の、第二の短時間コンプレキシティＣ２（ｉ）への加重平均によるマッピング方法の例を示す。
【００５１】
【数２】

【００５２】
図８〜１０は、短時間コンプレキシティの再マッピングの例を示したものである。図８〜１０の上段は、第一の符号化におけるＧＯＰ毎の短時間コンプレキシティを、横軸を時間軸として示したものであり、また下段は第二のタイムスロットに再マッピングした短時間コンプレキシティを示したものである。
【００５３】
図８では、ユーザの指定により、１１２及び１１３の期間の２つのチャプタが構成され、第二の符号化においては各チャプタの先頭と終端に端数のフレーム数から構成されるＧＯＰを設定して、残りのＧＯＰについては、第一の符号化と同一の構成とした例である。この場合、第二のタイムスロットＴＳ間の境界（ＧＯＰ境界）は第一のタイムスロットｔｓ間の境界と一致しており、短時間コンプレキシティの再マッピングは、第一のタイムスロットｔｓからそれに対応する第二のタイムスロットＴＳに対してダイレクトに行うことができる。
【００５４】
図９は、ユーザの指定により、１２２及び１２３の期間の２つのチャプタが構成され、第二の符号化においては各チャプタの終端にのみ端数のフレーム数から構成されるＧＯＰを設定して、残りのＧＯＰについては、第一の符号化とは異なる固定のフレーム数で構成した例である。この場合、第二のタイムスロットＴＳ間のＧＯＰ境界と、第一の符号化におけるタイムスロットｔｓ間のＧＯＰ境界は、通常一致しないものとなる。そこで、第一の符号化で各ＧＯＰ毎に求めた第一のタイムスロットｔｓそれぞれの短時間コンプレキシティを、第二のタイムスロットＴＳとの時間軸上の位置関係に応じて加重平均することにより、第二のタイムスロットＴＳにおける短時間コンプレキシティを求める。例えば、ある第２のタイムスロットＴＳが、時間的に、ある２つの第１のタイムスロットｔｓの境界に跨る場合には、その第２のタイムスロットＴＳのＧＯＰが属する比率に応じた重み付けを行って、それら２つの第１のタイムスロットｔｓそれぞれの短時間コンプレキシティの加重平均を求めることになる。
【００５５】
図８の構成では、通常のＧＯＰよりも少ないフレーム数で構成されるＧＯＰが、各チャプタの始点と終点のそれぞれに設けられるため、符号化効率が若干低下することになる。これは、各ＧＯＰには少なくとも１つのＩピクチャが含まれ、Ｉピクチャはフレーム間予測を行わないため、Ｉピクチャの周期が短くなるほど、符号化効率は一般に低下するためである。一方、図９の構成では、第二の符号化対象となる動画像シーケンスの始点から通常のＧＯＰ構成で符号化されるため、符号化効率の低下を図８の構成よりも抑えることが可能となる。
【００５６】
図１０では、ユーザの指定により、１３２及び１３３の期間の２つの連続する領域と、さらに１３４，１３５，１３６のチャプタ境界が構成された例である。この場合、第二の符号化においては、各チャプタの先頭をＧＯＰの先頭と一致させることにより、ランダムアクセス可能とすることが必要となる。従って、チャプタ境界直前のＧＯＰ長が短くなるので、図示のように、各チャプタ境界直前の第二のタイムスロットＴＳは他のＴＳに比べ短くなる。この場合も図９と同様に、加重平均による短時間コンプレキシティの再マッピングを行う。
【００５７】
（３）ビットアロケーション
以上の処理によって第二のタイムスロットに対する短時間コンプレキシティの再マッピングを行った後に、再マッピングした短時間コンプレキシティに基づき、設定ビットレート（平均ビットレートＲａｖｅ、最大瞬間ビットレートＲｍａｘ、最小瞬間ビットレートＲｍｉｎ）の条件に基づき、符号量配分を行う（図６のステップ９５）。
【００５８】
符号量配分は、第二のタイムスロット単位、つまり第二の符号化におけるＧＯＰ単位に行い、それぞれのタイムスロットにおけるビットレートを決定する。各第二のタイムスロット毎のビットレートは、それぞれのタイムスロットの短時間コンプレキシティＣに応じて単調増加する変換式ｆ（Ｃ）を用いて計算される。図１１は、変換関数ｆ（Ｃ）の例を示したものである。ビットレートを決定する条件は、各タイムスロットにおけるビットレートＲ（ｉ）は、
Ｒｍｉｎ ≦ Ｒ（ｉ） ≦ Ｒｍａｘ
を満たし、且つ総符号量が、Ｒａｖｅに基づいて算出される総符号量以下となることである。
【００５９】
式（３）に、第二のタイムスロット単位のビットアロケーション算出式の例を示す。
【００６０】
【数３】

【００６１】
符号量配分処理により決定された第二の各タイムスロットにおけるビットレート情報と、チャプタ構成を規定するタイムコード情報とにより、第二の符号化を行い、指定されたレート変動に基づく可変レートの符号化が、符号化対象領域に対して、それぞれ行われる。第二の符号化のレート制御では、ＧＯＰ単位の符号量割り当て及びフィードバック制御、ピクチャ単位の割り当て及びフィードバック制御、マクロブロック単位の量子化ステップサイズの設定及びフィードバック制御の階層的な制御が行われる。式４は、ＧＯＰ単位のレート制御の例を示したものである。上記のビットアロケーションにより決定されたＧＯＰ単位のビットレートとそれまでに発生した符号量との累積誤差を考慮して、次に符号化するＧＯＰの割り当て符号量が決定される。式４により決定されたＧＯＰの割り当て符号量は、さらにＧＯＰ内の各ピクチャに配分されて、ピクチャ単位のフィードバック制御が加えられて符号化が行われる。最終的なレート制御は、各ピクチャ毎に設定したピクチャの符号量に近づくように、マクロブロック単位の量子化ステップサイズの動的な制御を行うことにより実現される。
【００６２】
【数４】

【００６３】
なお、本実施形態では、第一の符号化における各ＧＯＰの短時間コンプレキシティを、編集後の動画像シーケンスに対する第二の符号化で用いられる各ＧＯＰの短時間コンプレキシティに変換するようにしたが、第一の符号化における短時間コンプレキシティを複数ＧＯＰ単位で算出し、その短時間コンプレキシティを第二の符号化で用いられる複数ＧＯＰ単位の短時間コンプレキシティに変換するようにしてもよい。
【００６４】
また、本実施形態による符号化制御の手順はすべてソフトウェアによって実現することができ、この場合には、その手順を実行するコンピュータプログラムを記録媒体を介して通常の計算機に導入するだけで、本実施形態と同様の効果を得ることが可能となる。
【００６５】
また、編集後の動画像を構成する第２の動画像シーケンスは、時間方向には、編集前の第１の動画像シーケンスと同一、あるいは第１の動画像シーケンスに含まれる一部分、あるいは第１の動画像シーケンスの複数部分を接続したもの、のいずれの場合でもよく、また第１の動画像シーケンスの各フレームと第２の動画像シーケンスの各フレームの解像度は、それぞれどちらか一方から解像度変換された縮小画像でもよい。また、第１の動画像シーケンスの各フレームと第２の動画像シーケンスの各フレームの映像は、それぞれどちらか一方の映像の一部を切り出したものや、フィルタ処理等を加えられたものでもよい。つまり、第１の動画像シーケンスと第２の動画像シーケンスとの間に、時間方向および空間方向の何らかの編集或いは変換が加えられたものであればよい。
【００６６】
【発明の効果】
以上説明したように、本発明によれば、少なくとも２回の符号化による動画像シーケンスの高能率符号化において、第一の符号化後に動画像シーケンスに対する時間方向の編集操作が加えられた場合においても、第一の符号化を再度行うことなく、高能率な第二の符号化を行うことが可能となる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る動画像編集システムに適用される可変レート符号化処理の手順を示すフローチャート。
【図２】同実施形態の動画像編集システムのシステム構成の一例を示すブロック図。
【図３】同実施形態のシステムで用いられる符号化部の構成を示すブロック図。
【図４】同実施形態のシステムで用いられる動画像シーケンス分析モジュールの機能構成を示すブロック図。
【図５】同実施形態のシステムで用いられる動画像シーケンス分析結果の表示および編集操作ＧＵＩの例を示す図。
【図６】同実施形態のシステムにおける符号量配分アルゴリズムを示すフローチャート。
【図７】同実施形態のシステムで用いられるＭＰＥＧ２符号化における符号量変動の例を示す図。
【図８】同実施形態のシステムで用いられる短時間コンプレキシティの再マッピングの例を示す図。
【図９】同実施形態のシステムで用いられる短時間コンプレキシティの再マッピングの例を示す図。
【図１０】同実施形態のシステムで用いられる短時間コンプレキシティの再マッピングの例を示す図。
【図１１】同実施形態のシステムで用いられる符号量配分のための変換関数の例を示す図。
【符号の説明】
１０…第一の符号化タイムコードを設定するステップ
１１…第一の符号化を行うステップ
１２…ビットストリームを解析するステップ
１３…符号化タイムコードを編集するステップ
１４…符号量配分を行うステップ
１５…第二の符号化を行うステップ
２０…符号化タイムコード指定情報
２１…符号化パラメータデータ及び符号化ログデータ、
２２…第一のビットストリーム
２３…動画像シーケンス構造化情報（シーンチェンジ点及びシーン構造）
２４，２５…符号化タイムコードテーブル
２６…ビットアロケーションデータ
２７…第二のビットストリーム
３０…エンコーダ制御メインＧＵＩ
３１…システムコントローラ
３２…エンコーダコントローラ
３３…エンコーダ
３４…データアナライザ
３５…編集操作ＧＵＩ
３６…ビットアロケータ
３７，３８，３９…ハードディスク装置

[0001]
[Technical field of the invention]
The present invention relates to a variable-rate encoding method that determines the optimal bit allocation for variable-rate encoding by encoding a moving image sequence multiple times, and to a moving image editing system using that method.
[0002]
[Prior art]
As a moving image compression encoding method for storage media, variable-rate encoding is used, as typified by the DVD (Digital Versatile Disc), which adopts the MPEG2 video encoding scheme, an international standard for moving image encoding. In variable-rate encoding, the bit rate of the compressed data (that is, the compression ratio) is varied over time according to the properties of the image, so that higher image quality is achieved under the constraint of a fixed total code amount. Portions with high temporal or spatial correlation, which can meet the required image quality with a relatively small code amount, are not given more code than necessary, while portions that need a large code amount to maintain image quality, such as those with high resolution or rapid motion, are allocated a large code amount. By encoding so that the whole fits within the disc capacity, this technique achieves higher image quality than encoding at a constant rate.
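The fixed-total-code-amount constraint described above can be illustrated with a small worked example. This is only a sketch: the slot durations and per-slot rates are invented for illustration, and `total_bits` is a hypothetical helper, not something defined in this publication.

```python
def total_bits(rates_bps, slot_seconds):
    """Total code amount (bits) of a schedule given per-slot bit rates and durations."""
    return sum(rate * seconds for rate, seconds in zip(rates_bps, slot_seconds))

# Four time slots of 0.5 s each (illustrative values only).
slot_seconds = [0.5, 0.5, 0.5, 0.5]

# Constant rate: every slot gets 4 Mbps regardless of content.
cbr = [4_000_000] * 4

# Variable rate: easy slots get less, hard slots get more,
# but the total code amount is the same as in the constant-rate case.
vbr = [2_000_000, 7_000_000, 5_000_000, 2_000_000]

print(total_bits(cbr, slot_seconds), total_bits(vbr, slot_seconds))
```

Both schedules consume the same number of bits, but the variable-rate schedule spends them where the picture is harder to encode.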
[0003]
Usually, to allocate code amounts optimally over an entire moving image sequence, the entire sequence must first be encoded and its encoding characteristics analyzed, and the code amounts must then be allocated on the basis of that analysis. A common method, for example, is to perform a first encoding with a constant quantization width, measure the resulting generated code amount in units such as frames or GOPs (Groups Of Pictures), and allocate bits for the second encoding per frame or per GOP according to the measured generated code amount.
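A minimal sketch of this conventional two-pass allocation follows. The text only says that bits are allocated "according to" the measured code amount; the strictly proportional rule used here, the function name `allocate_bits_two_pass`, and the sample numbers are assumptions made for illustration.

```python
def allocate_bits_two_pass(first_pass_bits, total_budget):
    """Split total_budget across GOPs in proportion to their first-pass code amounts.

    first_pass_bits: bits generated per GOP in a constant-quantizer first pass.
    total_budget:    total bits available for the second pass.
    """
    total_first = sum(first_pass_bits)
    if total_first <= 0:
        raise ValueError("first-pass code amounts must sum to a positive value")
    return [total_budget * bits / total_first for bits in first_pass_bits]

# Three GOPs measured in the first pass (illustrative values).
gop_bits = [300_000, 900_000, 600_000]
budget = 6_000_000
allocation = allocate_bits_two_pass(gop_bits, budget)
print(allocation)
```

A practical allocator would also have to respect limits such as maximum and minimum bit rates, which this sketch ignores.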
[0004]
In addition, so-called nonlinear editing, in which the input moving image sequence to be encoded is first captured to a storage medium such as a hard disk and then edited on a computer, requires the following procedure:
1) capture of the moving image sequence,
2) editing,
3) first encoding for code amount allocation,
4) final encoding.
[0005]
Furthermore, to make nonlinear editing more efficient, some nonlinear editing systems assist the user's editing work with moving image analysis techniques that use encoded data, such as scene change detection and scene integration. Such a system may also perform encoding when the moving image sequence is captured. In that case, the sequence encoded at capture time no longer matches the sequence after the editing operation, so two encodings are needed after editing, for a total of at least three encodings.
[0006]
[Problems to be solved by the invention]
As described above, variable-rate encoding based on optimal code amount allocation usually requires at least two encodings of the same material. However, if an editing operation is applied to the material after the first encoding, and in particular if a time-direction editing operation breaks the frame-by-frame correspondence with the material as it was at the first encoding, the process has had to be restarted on the edited material from the first encoding for code amount allocation.
[0007]
The present invention has been made in view of these circumstances. Its object is to provide a variable-rate moving image encoding method, and a moving image editing system using that method, that can achieve variable-rate encoding based on optimal code amount allocation without performing the two encodings over again, even when an editing operation including a time-direction operation is applied to the material after the first encoding.
[0008]
[Means for Solving the Problems]
To solve the above problem, the present invention is a variable-rate moving image encoding method applied to a moving image editing system that edits a moving image based on editing information from a user and compression-encodes the edited moving image, the method comprising: a first encoding step of encoding a first moving image sequence to be edited; a step of measuring, from the encoding result of the first encoding step and over the entire first moving image sequence, statistical data including the generated code amount and the average quantization step size for each predetermined unit of the first moving image sequence; a step of calculating, based on the measured statistical data, for each first predetermined period consisting of a plurality of consecutive frames in the first moving image sequence, a first encoding difficulty indicating the complexity of the images belonging to that first predetermined period; a conversion step of converting, based on the correspondence on the time axis between the first moving image sequence and a second moving image sequence obtained by editing it, the first encoding difficulty into a second encoding difficulty that indicates, for each second predetermined period consisting of a plurality of consecutive frames in the second moving image sequence, the complexity of the images belonging to that second predetermined period, wherein, when a second predetermined period spans a boundary between a plurality of first predetermined periods, the second encoding difficulty of that second predetermined period is obtained by taking a weighted average of the first encoding difficulties of those first predetermined periods according to the proportion of the second predetermined period that belongs to each of them; a step of determining, based on the second encoding difficulty, an allocated code amount for each second predetermined period of the second moving image sequence; and a second encoding step of variable-rate encoding the second moving image sequence based on the allocated code amounts.
[0009]
In this variable-rate moving image encoding method, the encoding difficulty of each predetermined period is calculated from the encoding result of the first moving image sequence, and the resulting encoding difficulty is converted into an encoding difficulty for each predetermined period of the second moving image sequence, based on the correspondence on the time axis between the first moving image sequence and the edited second moving image sequence. When a second predetermined period spans a boundary between a plurality of first predetermined periods, its second encoding difficulty is obtained as a weighted average of the first encoding difficulties of those first predetermined periods, weighted by the proportion of the second predetermined period that belongs to each. Bits are then allocated to each predetermined period of the second moving image sequence using the encoding difficulty of that period. By remapping the encoding difficulty onto the edited second moving image sequence in this way, optimal bit allocation becomes possible even when the video material does not match frame by frame or GOP by GOP between the first and second encodings, provided only that the second moving image sequence is composed of at least part of the first moving image sequence, and the total number of encodings can be reduced.
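The weighted-average remapping can be sketched as follows. This is an illustrative reading of the paragraph above, not the publication's own formula: it assumes that every period is a half-open frame range `[start, end)`, that the second periods are expressed on the first sequence's time axis (so material deleted by editing simply never appears in a second period), and that the weight is the number of overlapping frames. The name `remap_difficulty` is hypothetical.

```python
def remap_difficulty(first_slots, second_slots):
    """Remap per-period encoding difficulty from the first to the second sequence.

    first_slots:  list of (start, end, difficulty) for the first encoding.
    second_slots: list of (start, end) for the second encoding, given on the
                  same time axis as first_slots (assumption of this sketch).
    Returns one difficulty value per second slot.
    """
    remapped = []
    for s_start, s_end in second_slots:
        weighted_sum = 0.0
        covered = 0.0
        for f_start, f_end, difficulty in first_slots:
            # Number of frames that this second slot shares with the first slot.
            overlap = min(s_end, f_end) - max(s_start, f_start)
            if overlap > 0:
                weighted_sum += overlap * difficulty
                covered += overlap
        if covered == 0:
            raise ValueError("second slot lies outside the first encoding range")
        remapped.append(weighted_sum / covered)
    return remapped

# First encoding: three 15-frame periods with measured difficulties.
first = [(0, 15, 10.0), (15, 30, 20.0), (30, 45, 40.0)]
# Second encoding: one period aligned with the first, two that straddle boundaries.
second = [(0, 15), (10, 20), (20, 40)]
print(remap_difficulty(first, second))
```

When a second period coincides exactly with a first period, the loop reduces to copying the first difficulty unchanged, which matches the aligned case described in the text.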
[0010]
In this case, in the time direction, the second moving image sequence constituting the edited moving image may be identical to the first moving image sequence before editing, a portion of the first moving image sequence, or a concatenation of several portions of the first moving image sequence. In resolution, the frames of either sequence may be reduced images obtained by resolution conversion from the frames of the other. The pictures in the frames of either sequence may also be cropped from part of the other's pictures or be the result of filtering or similar processing. In short, it suffices that some editing or conversion in the time direction and the spatial direction has been applied between the first and second moving image sequences.
[0011]
Preferably, the first encoding step encodes the first moving image sequence while dividing it into encoded frame groups having a predetermined inter-frame prediction structure, so that intra-frame encoded frames are inserted at a predetermined frame period, and the first encoding difficulty corresponding to the first moving image sequence is calculated for each period corresponding to N times the encoded frame group (N being a natural number).
[0012]
Thus, by setting the predetermined period in the first moving image sequence to a multiple of the GOP in the first encoding, it is possible to remove the influence of the variation in the code amount depending on the picture type.
[0013]
If the predetermined period in the second moving image sequence is also a multiple of the GOP in the second encoding, it is possible to perform bit allocation without being particularly aware of the code amount variation for each picture type.
[0014]
Further, when the predetermined periods in the first moving image sequence coincide on the time axis with those in the second moving image sequence, the first encoding difficulty obtained by the first encoding can be used directly as the second encoding difficulty. When the respective periods do not coincide, the second encoding difficulty can be obtained by taking a weighted average of the encoding difficulties of the predetermined periods in the first moving image sequence according to their positional relationship. The GOP structure of the second moving image sequence can therefore be set freely. For example, scene change points can be detected from the first encoding result, and the second moving image sequence can be encoded with a GOP structure in which each scene change coincides with the start of a GOP.
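One way to build a GOP structure whose GOP starts coincide with scene changes is sketched below. The policy shown (restart the regular GOP cadence at every scene change, accepting a shorter GOP just before each cut) is an assumption chosen for illustration, as are the function name `gop_starts` and the sample values.

```python
def gop_starts(total_frames, gop_length, scene_changes):
    """Return the frame indices at which GOPs start.

    A new GOP starts every gop_length frames, and the cadence is restarted
    at each scene change so that the first frame of every scene opens a GOP.
    """
    # Keep only scene changes strictly inside the sequence, without duplicates.
    cuts = sorted({c for c in scene_changes if 0 < c < total_frames})
    segment_starts = [0] + cuts
    segment_ends = cuts + [total_frames]
    starts = []
    for seg_start, seg_end in zip(segment_starts, segment_ends):
        starts.extend(range(seg_start, seg_end, gop_length))
    return starts

# 50 frames, nominal GOP length 15, one scene change at frame 20.
print(gop_starts(50, 15, [20]))
```

In this example the GOP starting at frame 15 is only five frames long, because the scene change at frame 20 forces a new GOP to begin there.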
[0015]
In the present invention, the encoding difficulty is derived from the generated code amount and the average quantization step size of each predetermined unit in the first encoding. The first encoding therefore need not be the conventional encoding with a fixed quantization step size; it may instead use rate control that varies the quantization step size. As a result, the first encoding, whose generated code amount could not conventionally be controlled, can be used as valid encoded data encoded at a desired rate. For example, if the first encoding is a 1.5 Mbps fixed-rate encoding and the second encoding is a variable-rate encoding averaging 4 Mbps, both 4 Mbps variable-rate encoded data and, as a by-product, 1.5 Mbps fixed-rate encoded data are obtained.
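The reason rate control is allowed in the first encoding can be made concrete with a short sketch. The embodiment describes its equation (1) as accumulating, over the frames of a time slot, the product of generated code amount, average quantization step size, and a picture-type weight, then dividing by the slot duration; it also states that the weight for B pictures is set smaller than for I and P pictures. The specific weight values, the `FrameStat` record, and the sample numbers below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class FrameStat:
    picture_type: str   # "I", "P" or "B"
    bits: int           # generated code amount of the frame
    avg_qstep: float    # average quantization step size of the frame

# Hypothetical weights: only "B smaller than I and P" comes from the description.
PICTURE_WEIGHTS = {"I": 1.0, "P": 1.0, "B": 0.7}

def gop_difficulty(frames, duration_seconds):
    """Encoding difficulty of one time slot: sum of W * B * Q divided by its duration."""
    total = sum(PICTURE_WEIGHTS[f.picture_type] * f.bits * f.avg_qstep for f in frames)
    return total / duration_seconds

# The same pictures encoded with a fine quantizer (more bits) ...
fine = [FrameStat("I", 100_000, 10.0), FrameStat("P", 40_000, 10.0)]
# ... and with a coarse quantizer (fewer bits), under the model that
# bits * quantization step is constant for a given picture.
coarse = [FrameStat("I", 50_000, 20.0), FrameStat("P", 20_000, 20.0)]

print(gop_difficulty(fine, 0.5), gop_difficulty(coarse, 0.5))
```

Under the stated model both encodings yield the same difficulty value, which is why the quantizer used in the first pass does not need to be held fixed.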
[0016]
Also, by performing the first encoding on images obtained by subsampling each frame of the input moving image sequence horizontally and vertically, and performing the second encoding at the normal image size, it is possible to obtain fixed-rate MPEG1 encoded data from the first encoding and variable-rate MPEG2 encoded data from the second encoding, or alternatively fixed-rate SDTV (standard-definition TV) encoded data from the first encoding and variable-rate HDTV (high-definition TV) encoded data from the second encoding.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0018]
FIG. 1 shows a flow of variable rate moving image encoding processing used in a moving image editing system according to an embodiment of the present invention. This variable-rate moving image encoding process performs high-quality variable-rate encoding by performing encoding twice on video material such as a VTR recorded on a hard disk device or the like. Hereinafter, with reference to FIG. 1, a flow of the entire processing of the present embodiment will be described.
[0019]
(1) Specify encoding time code
First, based on a user operation, an encoding start time code (IN point) and an encoding end time code (OUT point) are specified for the VTR material that is the original image (step 10).
[0020]
(2) First encoding (first pass encoding)
In the first encoding, the continuous moving image sequence from the IN point to the OUT point is encoded based on the encoding time code designation information 20 indicating the IN point and the OUT point (step 11). In this first encoding, an input image with a frame size of, for example, 720 horizontal pixels × 480 vertical lines is downsampled to a frame size of 352 horizontal pixels × 240 vertical lines and encoded by the MPEG1 scheme at a fixed bit rate of, for example, 1.5 Mbps. The continuous moving image sequence from the IN point to the OUT point is encoded while being divided into GOP units having a predetermined inter-frame prediction structure, so that intra-frame encoded frames (I pictures) are inserted at a predetermined frame period.
[0021]
The first encoding is subject to no special restrictions, and any bit rate, parameters, encoding method, and resolution may be used. That is, the presence or absence of downsampling, the choice of MPEG1 or MPEG2 as the encoding method, and the rate control mode (fixed bit rate, real-time variable bit rate, or a fixed quantization step with no rate control) can all be set freely. In the present embodiment, MPEG1 encoding of the sub-sampled image is used as the standard first encoding. The reasons are that using low-bit-rate MPEG1 for the first encoding allows the bitstream analysis described later to be performed at high speed, and that, with MPEG2 used for the second encoding, two useful streams, MPEG1 and MPEG2, are obtained.
[0022]
In the first encoding, an encoded bit stream file 22 composed of the encoded data of the continuous moving image sequence from the IN point to the OUT point is generated, and files 21 for the encoding log data and the encoding parameters are generated. Here, the encoding log data consists of statistical data obtained by the encoding, such as the generated code amount of each frame, the average quantization step size of each frame, and the error amount of motion compensated prediction. The encoding parameters are a record of the parameters used in the first encoding, such as the bit rate. With an encoder that has a function of outputting encoding log data, the log data can be generated in real time in parallel with the encoding process. Even with an encoder that lacks such a function, substantially equivalent information can be generated offline after encoding by analyzing the bit stream 22 output from the encoder.
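As an illustration of what one record of the encoding log data might hold, the following is a minimal sketch. The class name `FrameLog`, its field names, and the 30 frames-per-second default are assumptions made for this example; the text only lists which statistics the log contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameLog:
    """One per-frame record of the first-pass log (field names are hypothetical)."""
    frame_index: int      # position in the first moving image sequence
    picture_type: str     # "I", "P" or "B"
    generated_bits: int   # generated code amount of the frame
    avg_qstep: float      # average quantization step size of the frame
    mc_error: float       # error amount of motion compensated prediction


def average_bit_rate(log, frame_rate=30.0):
    """Average bit rate in bits per second implied by a list of FrameLog records."""
    if not log:
        raise ValueError("log must contain at least one frame")
    total_bits = sum(frame.generated_bits for frame in log)
    return total_bits * frame_rate / len(log)


log = [
    FrameLog(0, "I", 90_000, 10.0, 0.0),
    FrameLog(1, "B", 30_000, 14.0, 2.5),
    FrameLog(2, "P", 30_000, 10.0, 1.8),
]
print(average_bit_rate(log))
```

Keeping the picture type alongside the code amount and quantization step is what later allows a per-GOP complexity to be computed from the log alone.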
[0023]
(3) Bitstream analysis
The bitstream analysis detects information, such as scene change points, that is needed to make the user's editing work more efficient. In this stage, the bitstream 22 obtained by the first encoding is analyzed (step 12). In this bit stream analysis process, scene change points are detected first. The scene structure is then analyzed using the detected scene change points. In the scene structure analysis, each portion sandwiched between detected scene change points is regarded as a shot, and related shots are integrated into the same scene. As a result, different shots that belong together semantically are treated as one scene.
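The relation between scene change points and shots can be sketched as follows. This is only an illustration: the function name `shots_from_scene_changes` is hypothetical, frames are identified by integer indices rather than time codes, and a scene change point is assumed to be the index of the first frame of a new shot. The integration of related shots into scenes is not covered here.

```python
def shots_from_scene_changes(num_frames, scene_changes):
    """Return shots as (start, end) frame ranges, with end exclusive.

    Each detected scene change point is assumed to be the index of the
    first frame of a new shot. Points outside the sequence are ignored.
    """
    if num_frames <= 0:
        return []
    cuts = sorted({c for c in scene_changes if 0 < c < num_frames})
    bounds = [0] + cuts + [num_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


print(shots_from_scene_changes(100, [70, 30]))
```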
[0024]
In addition, image data is generated by simply decoding, at a reduced frame size, several representative frames that indicate the scene structure.
[0025]
Note that the bitstream analysis processing may be configured to start analyzing the generated encoded bitstream file 22 after the first encoding is complete, or to analyze the bit stream 22 sequentially as it is being generated, in parallel with the first encoding, after a fixed delay from the start of the first encoding.
[0026]
(4) Encoding time code edit
The encoding time code editing process in step 13 applies editing, based on user operations, to the same original image as was used in the first encoding (first pass encoding). To carry out this process, information 23 is used, which includes the scene change points obtained by the bitstream analysis in step 12, the analysis result of the scene structure, and the image data of the representative frames indicating the scene structure. In this encoding time code editing process, the scene change points, scene structure, and representative frames are presented to the user graphically, and a function for the user to designate the second encoding area is provided.
[0027]
Here, designation of the second encoding area means that the user designates, as the second encoding area, the part of the time range from the IN point to the OUT point encoded in the first encoding that is to become the final encoded data. In addition, the entire time range belonging to the second encoding area designated by the user is explicitly divided according to user operations to form chapters, and random access points are set so that the final encoded data can be randomly accessed in units of chapters.
[0028]
(5) Variable rate bit allocation
In variable-rate bit allocation, the code amount allocation for the second encoding is determined (step 14). This optimal code amount allocation process uses the time code information 24, which indicates the encoding target range set in the encoding time code editing process and the chapter boundaries, together with the encoding log file and encoding parameter file 21 obtained by the first encoding. Variable rate bit allocation is performed so as to obtain uniform and stable image quality while satisfying all of the following conditions: the total code amount over all encoding areas is at or below a predetermined value, and the maximum instantaneous bit rate, the minimum instantaneous bit rate, and the VBV (Video Buffering Verifier) restrictions defined by the MPEG2 standard are respected. The specific procedure of this variable rate bit allocation will be described in detail later with reference to FIG. 6. Basically, a parameter indicating the complexity (encoding difficulty) of the image is calculated in GOP units from the generated code amount and the average quantization step size obtained in the first encoding (step 11), and the parameter is remapped onto the edited moving image sequence that is to be subjected to variable rate encoding.
[0029]
(6) Second encoding (second pass encoding)
In the second encoding process, the second encoding is performed on the second moving image sequence, taken again from the same VTR material, according to the bit allocation result 26 of the variable rate bit allocation process and the time code information 25 indicating the encoding target range and the chapter boundaries (step 15). In the second encoding process, the encoding target time code period belonging to the second encoding area is encoded under variable rate control according to the bit allocation result 26. In addition, the inter-frame prediction structure is controlled so that the chapter boundaries can be randomly accessed. As a result, an optimized encoded bit stream 27 is obtained.
[0030]
As with the first encoding, each encoding parameter of the second encoding can be set freely. In this embodiment, the standard second encoding is full-size MPEG2 encoding of horizontal 720 pixels × vertical 480 lines at a variable rate with an average of, for example, 4 Mbps. In this case, the video sequence to be encoded is encoded while being divided into GOP units having a predetermined inter-frame prediction structure, so that intra-frame encoded frames (I pictures) are inserted at a predetermined frame period.
[0031]
As described above, in FIG. 1 the result of the first encoding (step 11) is used both for the image analysis and editing process (steps 12 and 13) and for the code amount allocation determination and second encoding process (steps 14 and 15).
[0032]
Next, a configuration example of each module used in the moving image editing system of the present embodiment will be described. FIG. 2 shows the configuration of the entire moving image editing system according to the present embodiment. This system is used as a kind of authoring system for creating encoded digital content. FIG. 2 illustrates a case where a software module and a hardware module are configured based on a general-purpose computer (for example, a personal computer (hereinafter referred to as PC)).
[0033]
The encoder 33, which is the moving image encoding unit, performs encoding while switching between intra-frame encoding and motion compensated predictive inter-frame encoding in units of frames or macroblocks, and is configured by dedicated hardware. An encoding controller 32 for controlling the encoder 33 is configured by software on the PC. However, the encoder 33 need not be dedicated hardware and may itself be configured by software. In FIG. 2, the following modules are each configured by software on the PC: a system controller 31 for controlling the entire system; a main GUI (Graphical User Interface) module 30 with which the user controls the encoder 33; a data analyzer 34 for analyzing the scene structure from the encoded data and for simplified decoding at a reduced frame size; an editing GUI module 35 for graphically displaying the output of the data analyzer 34 and for time code editing operations; and a bit allocator 36 for allocating the code amount for the second encoding.
[0034]
Further, the bit stream obtained by the first encoding, the encoded log data, and the encoding parameters are recorded as files on the hard disk devices 38 and 37, respectively. The bit stream obtained by the second encoding is recorded as a file on the hard disk device 39. These hard disk devices 37, 38, 39 may be the same.
[0035]
FIG. 3 shows the internal configuration of the encoder 33 in FIG. 2. The encoder 33 has the same basic configuration regardless of whether it is a hardware encoding device or encoding software.
[0036]
That is, motion detection is performed on the input moving image signal 54 in units of macroblocks by the motion detection unit (ME) 41. Based on the motion detection result, the motion compensation unit (MC) 49 creates a predicted image in units of macroblocks from the locally decoded image (reference image) recorded in the reference image memory (FM) 48. The difference between the input image and the predicted image is input as a prediction error signal to the discrete cosine transform unit (DCT) 42 and orthogonally transformed by discrete cosine transform. The resulting DCT coefficients are quantized by the quantization unit (Q) 43 and variable-length coded by the variable length coding unit (VLC) 44 to generate an encoded bit stream. This encoded bit stream is recorded in the hard disk device 52. When encoding an I picture, that is, an intra-frame encoded image, motion detection and motion compensation are not performed, and only the processing from the discrete cosine transform unit 42 onward is applied to the input image signal. Further, so that an I or P picture can be used as a reference image, the output of the quantization unit (Q) 43 is subjected to inverse quantization by the inverse quantization unit (IQ) 46 and inverse discrete cosine transform by the inverse discrete cosine transform unit (IDCT) 47, added to the predicted image, and written to the reference image memory (FM) 48 as a locally decoded image.
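The role of the quantization step size in this loop can be illustrated with a deliberately simplified sketch. It uses plain uniform scalar quantization on a short list of coefficients, which is an assumption made for illustration only: actual MPEG quantization also involves quantization matrices and other details, and the function names here are hypothetical.

```python
def quantize(coefficients, qstep):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [round(c / qstep) for c in coefficients]


def dequantize(levels, qstep):
    """Inverse quantization, as used to build the locally decoded image."""
    return [level * qstep for level in levels]


def count_nonzero(levels):
    """Number of levels the variable length coder would actually have to code."""
    return sum(1 for level in levels if level != 0)


coefficients = [100, -37, 12, 4]
fine = quantize(coefficients, 10)     # small step: more non-zero levels
coarse = quantize(coefficients, 30)   # large step: fewer non-zero levels
print(fine, dequantize(fine, 10))
print(coarse, dequantize(coarse, 30))
```

A larger step leaves fewer non-zero levels to code, and the reference image is built from the dequantized values rather than the original coefficients, so encoder and decoder predict from the same data.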
[0037]
In the rate control unit (RC) 45, the quantization step size used by the quantization unit (Q) 43 for each macroblock is determined successively by feedback control according to the count of the code amount generated by the encoding. The encoder controller (CONT) 50 communicates bidirectionally with the rate control unit (RC) 45, the motion detection unit (ME) 41, and the motion compensation unit (MC) 49 via the data bus 53. It collects statistical data such as the motion compensated prediction error amount, the generated code amount of each frame, and the average quantization step size of each frame and writes them to the hard disk device 51 as an encoding log data file, and it writes encoding parameters such as the encoding bit rate to the hard disk device 51 as an encoding parameter file. The encoder controller (CONT) 50 also controls the inter-frame prediction structure in order to insert random access points at chapter boundaries, and issues instructions to the rate control unit (RC) 45. These instructions concern, for example, rate control for variable rate encoding and rate control corresponding to the changes in frame prediction structure made to set random access points.
[0038]
FIG. 4 shows the structure of the moving image sequence analysis module included in the data analyzer 34 in FIG. 2. The moving image sequence analysis module performs the above-described bit stream analysis processing (step 12). The module shown here analyzes the scene structure using the method disclosed in Japanese Patent Laid-Open No. 9-93588. That is, the first encoded data 60 (corresponding to the bit stream 22 in FIG. 1) is input to the bit stream analysis unit 61, which extracts the motion vectors in the encoded data; the extracted motion vector information 62 is input to the scene change detection module 63, which detects scene change points based on changes in the motion vectors. Further, the scene configuration detection module 65 calculates the similarity between the shots sandwiched between the detected scene change points and integrates similar shots.
[0039]
FIG. 5 shows an example of the editing operation screen provided by the editing operation GUI module 35 in FIG. 2. In the figure, reference numeral 71 denotes reduced images simply decoded by the data analyzer 34 of FIG. 2, displayed with the horizontal axis as the time axis. Each reduced image is obtained by decoding the DC component of each block constituting an intra-frame encoded image (I picture); characteristic images (I pictures) are selected according to the result of the scene structure analysis and displayed. Reference numeral 72 denotes the scene change points detected by the scene change detection module 63 of FIG. 4. The reduced images 71 and the scene change points 72 are displayed in correspondence with each other as shown in the figure, so that the user can see graphically which reduced image 71 is at a scene change point. Reference numerals 73 to 75 display the scene structure detected by the scene configuration detection module 65 of FIG. 4. Related shots are treated as one scene. In the example of FIG. 5, three scenes 73 to 75 are displayed as the scene structure.
[0040]
In the figure, reference numerals 76 to 79 denote the chapter structure set by the user. Referring to the displayed reduced images 71, scene change points 72, scene structure 73 to 75, and so on, the user sets by mouse operation, within the range where the first encoding has been performed, the range to be subjected to the second encoding and the chapters. In the example of FIG. 5, the second encoding area is composed of the three chapters indicated by 76, 77, and 79, and the section indicated by 78 is deleted in the second encoding. Chapters 76, 77, and 79 are encoded so that they can be played back continuously without interruption, and the prediction structure is controlled so that the beginning of each chapter is the beginning of a GOP, allowing random access playback from the beginning of each chapter.
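The correspondence between the edited sequence and the original sequence that results from such a chapter structure can be sketched as follows. This is an illustration under assumptions: the function names `build_edit_map` and `to_first_frame` are hypothetical, chapters are given as integer frame ranges instead of time codes, and the frame numbers in the example are invented to mimic the layout of FIG. 5 (three kept chapters and one deleted section).

```python
def build_edit_map(chapters):
    """Map kept chapters onto the time axis of the second (edited) sequence.

    chapters: list of (in_frame, out_frame) in first-sequence frame numbers,
    out_frame exclusive, in playback order.
    Returns a list of (second_start, second_end, first_start) segments.
    """
    segments = []
    position = 0
    for in_frame, out_frame in chapters:
        if out_frame <= in_frame:
            raise ValueError("chapter must contain at least one frame")
        length = out_frame - in_frame
        segments.append((position, position + length, in_frame))
        position += length
    return segments


def to_first_frame(second_frame, segments):
    """Return the first-sequence frame shown at a given second-sequence frame."""
    for second_start, second_end, first_start in segments:
        if second_start <= second_frame < second_end:
            return first_start + (second_frame - second_start)
    raise IndexError("frame lies outside the second encoding area")


# Three kept chapters; frames 250 to 399 of the original are deleted.
segments = build_edit_map([(0, 100), (100, 250), (400, 500)])
print(segments)
print(to_first_frame(260, segments))
```

Because every chapter starts a new segment, a table like this is enough to look up which first-pass statistics apply to any frame of the edited sequence.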
[0041]
When the preview playback command 82 is operated, the first encoded data is played back along the chapter structure set by the user, so that the scene transitions of the edited moving image can be simply displayed on the display unit 83. Once the chapter configuration has been determined by the user, operating the bit allocation button 86 starts the calculation of the optimal code amount distribution, such that the code amount over the entire encoding period (76, 77, 79) is at or below a predetermined code amount and the upper limit, lower limit, and VBV conditions are satisfied. The calculation of the code amount distribution uses the encoding log file and the encoding parameter file of the first encoding and the chapter configuration information set by the user.
[0042]
When the bit allocation process is completed, the rate fluctuation of the variable rate based on the bit allocation result is displayed graphically as bit allocation information, as shown at 81 in the figure. The user can further correct the rate fluctuation by operating the mouse. In this case, the code amount distribution is recalculated taking the corrected part into account. Once both the chapter structure and the code amount distribution are confirmed and the user operates the encode button 87, the second encoding is started. During encoding, the original image is displayed on the display unit 83, and an image obtained by decoding in real time the second encoded data produced by the second encoding is displayed on the display unit 84.
[0043]
Next, the code amount distribution algorithm used in this embodiment will be described.
[0044]
FIG. 6 is a flowchart of the code amount distribution processing of the bit allocator module according to this embodiment. The inputs to the bit allocator module are the time code information 90 indicating the encoding area and chapter boundaries set by the editing operation GUI module 35, the encoding log data 91 from the first encoding, and the set bit rate information 92. The time code information 90 is a table giving the pair of start time code (IN point) and end time code (OUT point) of each chapter. The encoding log data 91 is information including at least the generated code amount of each frame in the first encoding and the frame average quantization step size, obtained by averaging over each frame the quantization step size of each macroblock. The set bit rate 92 is a set of bit rate parameters consisting of a preset average bit rate Rave, maximum instantaneous bit rate Rmax, and minimum instantaneous bit rate Rmin.
[0045]
(1) Short-time complexity calculation
In the short-time complexity calculation, the first encoding log data 91 is used to calculate the encoding difficulty of each short time slot (hereinafter referred to as short-time complexity) (step 93). The short-time complexity is a parameter indicating the degree of temporal and spatial complexity (encoding difficulty) of the images belonging to the unit time; the more difficult the images are to encode, the larger its value. Reference numeral 100 in FIG. 7 indicates the change in the generated code amount of each frame when the quantization step size is fixed. In MPEG coding, the coding efficiency obtained by prediction generally increases, and the code amount decreases, in the order of I pictures, which use intra-frame coding, P pictures, which use forward predictive coding, and B pictures, which use bidirectional predictive coding. That is, even for the same image, the generated code amount per frame varies greatly depending on whether the picture type is I, P, or B. Therefore, by aligning the short time slots with the GOP structure and taking, for every N GOPs (N is a natural number), the average of the image complexity of the frames they contain as the short-time complexity, the variation due to picture type can be eliminated. Reference numeral 101 in FIG. 7 shows the change in the short-time complexity obtained for each GOP. The short-time complexity can be obtained, for example, by the following equation (1).
[0046]
[Expression 1]

[0047]
In equation (1), for each frame in the time slot constituting each GOP, the product of the generated code amount B, the average quantization step size Q, and the weighting factor W that depends on the picture type is accumulated, and the sum is divided by the time width of the time slot constituting the GOP to obtain the short-time complexity. This is based on the fact that, for the same video, the generated code amount decreases monotonically as the quantization step size increases, so that for the same image the product of the generated code amount and the quantization step size is treated as constant. In addition, because I pictures and P pictures are used as reference images whereas B pictures are not, weighting according to the picture type is applied. This weighting is performed as follows.
[0048]
That is, a B picture, which is not used as a reference image, is normally given a smaller allocated code amount than I and P pictures, which are used as reference images. Specifically, if the quantization step size of I and P pictures is 10, a B picture uses a quantization step size of 14, which is 1.4 times as large. Therefore, to absorb this difference in quantization step size, the weight W of the B picture is set smaller than that of the I and P pictures. As a result, the short-time complexity value is dominated by the I and P pictures, and the short-time complexity can be obtained with the difference in quantization step size between picture types evened out.
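The calculation described for equation (1) can be sketched as follows. This is a sketch under stated assumptions, not the patent's exact formula: the function name, the frame rate of 30 frames per second, and the concrete weights (1.0 for I and P pictures, 1/1.4 for B pictures, chosen to cancel the 1.4-times quantization step mentioned above) are all illustrative choices. The text only states that the weight of B pictures is smaller.

```python
# Assumed picture-type weights W; only "B is smaller" comes from the text.
PICTURE_WEIGHTS = {"I": 1.0, "P": 1.0, "B": 1.0 / 1.4}


def short_time_complexity(frames, frame_rate=30.0, weights=PICTURE_WEIGHTS):
    """Short-time complexity of one time slot (for example one GOP).

    frames: list of (picture_type, generated_bits, avg_qstep) tuples.
    Returns sum(W * B * Q) over the frames divided by the slot duration
    in seconds.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("time slot must contain at least one frame")
    weighted_sum = sum(weights[ptype] * bits * qstep
                       for ptype, bits, qstep in frames)
    duration = len(frames) / frame_rate
    return weighted_sum / duration


gop = [("I", 300_000, 10.0), ("B", 50_000, 14.0), ("P", 100_000, 10.0)]
print(short_time_complexity(gop))
```

With these weights, a B picture quantized at step 14 contributes as if it had been quantized at step 10, which is one way to even out the picture-type difference.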
[0049]
(2) Short-time complexity remapping
Next, according to the time code information 90, the short-time complexity is remapped onto the second encoding area (step 94). Because the GOP configuration changes between the first encoding and the second encoding depending on the chapter structure specified by the user, second time slots corresponding to the GOP structure of the second encoding are set, and the short-time complexity is mapped onto these second time slots. A second time slot is the time range from the start point to the end point of a GOP used in the second encoding.
[0050]
The short-time complexity is remapped based on the correspondence on the time axis between the GOPs of the first encoding and the GOPs of the second encoding. Expression 2 shows an example of a mapping method in which the short-time complexity C1(i) based on the first encoding result, obtained by Expression 1, is mapped to the second short-time complexity C2(i) by weighted averaging.
[0051]
[Expression 2]

[0052]
FIGS. 8 to 10 show examples of short-time complexity remapping. The upper part of each of FIGS. 8 to 10 shows the short-time complexity of each GOP in the first encoding, with the horizontal axis as the time axis, and the lower part shows the short-time complexity remapped onto the second time slots.
[0053]
FIG. 8 shows an example in which two chapters, periods 112 and 113, are configured by the user; in the second encoding, a GOP composed of a fractional number of frames is set at the beginning and end of each chapter, and the remaining GOPs have the same configuration as in the first encoding. In this case, the boundaries between the second time slots TS (GOP boundaries) coincide with the boundaries between the first time slots ts, and the short-time complexity can be remapped directly from each first time slot ts to the corresponding second time slot TS.
[0054]
FIG. 9 shows an example in which two chapters 122 and 123 are configured by the user's specification; in the second encoding, a GOP composed of a fractional number of frames is set only at the end of each chapter, and the remaining GOPs are configured with a fixed number of frames different from that of the first encoding. In this case, the GOP boundaries between the second time slots TS usually do not coincide with the GOP boundaries between the time slots ts of the first encoding. Therefore, the short-time complexities of the first time slots ts, obtained for each GOP in the first encoding, are weighted and averaged according to their positional relationship on the time axis with the second time slot TS, to obtain the short-time complexity of the second time slot TS. For example, when a second time slot TS straddles the boundary between two first time slots ts, the weighted average of the short-time complexities of those two first time slots ts is calculated, with weights according to the proportion of the GOP of the second time slot TS that belongs to each.
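The weighted averaging described above can be sketched as follows. The function name `remap_complexity` and the data layout are assumptions for illustration: slots are given as integer frame ranges on the time axis of the first sequence, and each second time slot is assumed to lie within a single chapter so that it covers a contiguous stretch of the first sequence.

```python
def remap_complexity(first_slots, second_slots):
    """Remap per-slot complexity from first-pass slots onto second-pass slots.

    first_slots: list of (start, end, complexity), end exclusive.
    second_slots: list of (start, end) on the same time axis.
    Each second slot receives the average of the overlapping first-slot
    complexities, weighted by the number of overlapping frames.
    """
    remapped = []
    for s_start, s_end in second_slots:
        weighted_sum = 0.0
        covered = 0
        for f_start, f_end, complexity in first_slots:
            overlap = min(s_end, f_end) - max(s_start, f_start)
            if overlap > 0:
                weighted_sum += overlap * complexity
                covered += overlap
        if covered == 0:
            raise ValueError("second slot does not overlap any first slot")
        remapped.append(weighted_sum / covered)
    return remapped


first = [(0, 15, 100.0), (15, 30, 200.0)]   # two 15-frame GOPs in pass one
second = [(0, 12), (12, 24), (24, 30)]      # 12-frame GOPs in pass two
print(remap_complexity(first, second))
```

When the slot boundaries coincide, as in FIG. 8, each second slot overlaps exactly one first slot and the function reduces to a direct copy.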
[0055]
In the configuration shown in FIG. 8, a GOP composed of fewer frames than a normal GOP is provided at both the start point and the end point of each chapter, so the encoding efficiency decreases slightly. This is because each GOP includes at least one I picture, and since an I picture does not use inter-frame prediction, the encoding efficiency generally decreases as the period of the I pictures becomes shorter. In the configuration of FIG. 9, on the other hand, encoding is performed with the normal GOP configuration from the start point of the moving image sequence to be subjected to the second encoding, so the decrease in encoding efficiency can be suppressed compared with the configuration of FIG. 8.
[0056]
FIG. 10 shows an example in which two continuous areas 132 and 133 and chapter boundaries 134, 135, and 136 are configured by the user's designation. In this case, the second encoding must make the head of each chapter coincide with the head of a GOP to enable random access. As a result, the GOP immediately before each chapter boundary becomes shorter, and the second time slot TS immediately before each chapter boundary is shorter than the other time slots TS, as shown in the figure. In this case as well, the short-time complexity is remapped by weighted averaging as in FIG. 9.
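The GOP layout of the second encoding, with every chapter starting on a new GOP and only the last GOP of a chapter shortened, can be sketched as follows. The function name `layout_second_gops` and the GOP length of 15 frames are assumptions for this example, and frames are again identified by integer indices rather than time codes.

```python
def layout_second_gops(chapters, gop_length=15):
    """Lay out the GOPs (second time slots) of the second encoding.

    chapters: list of (in_frame, out_frame), out_frame exclusive.
    Every chapter starts a new GOP so that its head is a random access
    point; only the last GOP before a chapter boundary may be shorter.
    Returns a list of (start, end) frame ranges, end exclusive.
    """
    if gop_length <= 0:
        raise ValueError("gop_length must be positive")
    gops = []
    for in_frame, out_frame in chapters:
        start = in_frame
        while start < out_frame:
            end = min(start + gop_length, out_frame)
            gops.append((start, end))
            start = end
    return gops


print(layout_second_gops([(0, 40), (40, 70)]))
```

The ranges returned here are exactly the second time slots onto which the short-time complexity is remapped.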
[0057]
(3) Bit allocation
After the short-time complexity has been remapped onto the second time slots by the above processing, the code amount is distributed based on the remapped short-time complexity and under the conditions of the set bit rates (average bit rate Rave, maximum instantaneous bit rate Rmax, and minimum instantaneous bit rate Rmin) (step 95 in FIG. 6).
[0058]
The code amount is distributed in units of second time slots, that is, in units of GOPs of the second encoding, and the bit rate of each time slot is determined. The bit rate of each second time slot is calculated using a conversion function f(C) that increases monotonically with the short-time complexity C of the time slot. FIG. 11 shows an example of the conversion function f(C). The conditions for determining the bit rate are that the bit rate R(i) of each time slot satisfies
Rmin ≦ R(i) ≦ Rmax
and that the total code amount is equal to or less than the total code amount calculated from Rave.
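One way to satisfy these conditions can be sketched as follows. It is only an illustration: it assumes a linear conversion function f(C) = k·C clipped to [Rmin, Rmax], which is not necessarily the shape shown in FIG. 11, and it finds the scale factor k by bisection so that the total stays within the budget given by Rave. The function name and parameter names are hypothetical, and VBV constraints are not modelled.

```python
def allocate_bit_rates(complexities, durations, r_ave, r_max, r_min):
    """Assign a bit rate to each second time slot.

    complexities: remapped short-time complexity of each slot.
    durations: duration of each slot in seconds.
    Returns rates R(i) with r_min <= R(i) <= r_max whose total code amount
    does not exceed r_ave * sum(durations).
    """
    if not r_min <= r_ave <= r_max:
        raise ValueError("expected r_min <= r_ave <= r_max")
    budget = r_ave * sum(durations)

    def rates_for(k):
        # Assumed conversion function: f(C) = k * C, clipped to the limits.
        return [min(r_max, max(r_min, k * c)) for c in complexities]

    def total_bits(k):
        return sum(r * t for r, t in zip(rates_for(k), durations))

    # Grow k until the budget is exceeded, to bracket the solution.
    hi = 1.0
    for _ in range(200):
        if total_bits(hi) > budget:
            break
        hi *= 2.0
    else:
        return rates_for(hi)  # the budget can never be exceeded

    lo = 0.0  # k = 0 gives r_min everywhere, which always fits the budget
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if total_bits(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return rates_for(lo)


rates = allocate_bit_rates([1.0, 2.0, 4.0], [0.5, 0.5, 0.5],
                           r_ave=4e6, r_max=8e6, r_min=1e6)
print(rates)
```

Because the clipped function is monotonic in k, the total code amount is also monotonic in k, which is what makes bisection applicable here.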
[0059]
Formula (3) shows an example of a bit allocation calculation formula for the second time slot unit.
[0060]
[Equation 3]

[0061]
The second encoding is performed based on the bit rate information of each second time slot determined by the code amount allocation process and the time code information defining the chapter structure, whereby variable rate encoding based on the specified rate fluctuation is applied to each encoding target area. Rate control in the second encoding is hierarchical: code amount allocation and feedback control in GOP units, code amount allocation and feedback control in picture units, and quantization step size setting and feedback control in macroblock units. Equation 4 shows an example of rate control in GOP units. The allocated code amount of the GOP to be encoded next is determined taking into account the accumulated error between the GOP-unit bit rate determined by the above bit allocation and the code amount generated so far. The GOP allocated code amount determined by Equation 4 is further distributed among the pictures in the GOP, and encoding proceeds under feedback control in units of pictures. The final rate control is realized by dynamically controlling the quantization step size in units of macroblocks so that the generated code amount approaches the code amount set for each picture.
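The GOP-level feedback described here can be sketched as follows. This is one plausible form under assumptions, not a reproduction of Equation 4: the target of the next GOP is taken to be its planned code amount plus the accumulated difference between planned and generated code amounts so far. The function name and the callback `encode_gop`, which stands in for the actual encoder, are hypothetical.

```python
def simulate_gop_rate_control(planned_bits, encode_gop):
    """Simulate GOP-unit rate control with accumulated-error feedback.

    planned_bits: code amount planned for each GOP by the bit allocation.
    encode_gop: callable taking a target code amount and returning the
    code amount actually generated (a stand-in for the real encoder).
    Returns (targets, final_error), where error is planned minus generated.
    """
    error = 0.0
    targets = []
    for planned in planned_bits:
        # Correct the plan by the surplus (+) or deficit (-) so far.
        target = max(0.0, planned + error)
        generated = encode_gop(target)
        error += planned - generated
        targets.append(target)
    return targets, error


# Toy encoder that always overshoots its target by 100 bits.
targets, final_error = simulate_gop_rate_control(
    [1000, 1000, 1000], lambda target: target + 100)
print(targets, final_error)
```

In this toy run the overshoot of the first GOP is paid back by lowering the targets of the following GOPs, so the accumulated error stays bounded instead of growing with every GOP.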
[0062]
[Expression 4]

[0063]
In the present embodiment, the short-time complexity of each GOP in the first encoding is converted into the short-time complexity of each GOP used in the second encoding of the edited moving image sequence. However, the short-time complexity in the first encoding may instead be calculated in units of a plurality of GOPs and converted into a short-time complexity in units of a plurality of GOPs used in the second encoding.
[0064]
In addition, the encoding control procedure of the present embodiment can be realized by software. In that case, the same effect as the present embodiment can be obtained simply by introducing a computer program that executes the procedure into an ordinary computer via a recording medium.
[0065]
Further, the second moving image sequence constituting the moving image after editing is the same as the first moving image sequence before editing in the time direction, or a part included in the first moving image sequence, or the first moving image sequence. In this case, each frame of the first moving image sequence and each frame of the second moving image sequence may be an image, such as a reduced image, obtained by converting the resolution of the other. In addition, each frame of the first moving image sequence and each frame of the second moving image sequence may be obtained by cutting out a part of the other image or by applying filter processing or the like to it. That is, any editing or conversion in the time direction and the spatial direction may be applied between the first moving image sequence and the second moving image sequence.
[0066]
[Effect of the invention]
As described above, according to the present invention, in high-efficiency encoding of a video sequence by at least two encodings, even when an editing operation in the time direction is applied to the video sequence after the first encoding, the highly efficient second encoding can be performed without performing the first encoding again.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a procedure of variable rate encoding processing applied to a moving image editing system according to an embodiment of the present invention.
FIG. 2 is an exemplary block diagram showing an example of the system configuration of the moving image editing system according to the embodiment.
FIG. 3 is an exemplary block diagram illustrating a configuration of an encoding unit used in the system according to the embodiment.
FIG. 4 is an exemplary block diagram showing a functional configuration of a moving image sequence analysis module used in the system according to the embodiment.
FIG. 5 is a view showing an example of a moving image sequence analysis result display and editing operation GUI used in the system according to the embodiment.
FIG. 6 is a flowchart showing a code amount distribution algorithm in the system according to the embodiment.
FIG. 7 is a view showing an example of code amount fluctuation in MPEG2 encoding used in the system of the embodiment.
FIG. 8 is a diagram showing an example of short-time complexity remapping used in the system according to the embodiment.
FIG. 9 is a diagram showing an example of short-time complexity remapping used in the system according to the embodiment.
FIG. 10 is a diagram showing an example of short-time complexity remapping used in the system according to the embodiment.
FIG. 11 is a diagram showing an example of a conversion function for code amount distribution used in the system according to the embodiment.
[Explanation of symbols]
10 ... Step of setting the first encoding time code
11 ... Step of performing first encoding
12 ... Step of analyzing bitstream
13 ... Step of editing the encoding time code
14 ... Step of code amount distribution
15 ... Step of performing second encoding
20 ... Encoding time code designation information
21 ... Encoding parameter data and encoding log data
22 ... First bit stream
23 ... Moving picture sequence structuring information (scene change points and scene structure)
24, 25 ... Encoding time code table
26 ... Bit allocation data
27 ... Second bit stream
30 ... Encoder control main GUI
31 ... System controller
32 ... Encoder controller
33 ... Encoder
34 ... Data analyzer
35 ... Editing operation GUI
36 ... bit allocator
37, 38, 39 ... Hard disk device

Claims

A variable-rate moving image encoding method applied to a moving image editing system that edits a moving image based on editing information from a user and compression-encodes the edited moving image, the method comprising:
a first encoding step of encoding a first moving image sequence to be edited;
a step of measuring, over the entire first moving image sequence, statistical data including a generated code amount and an average quantization step size for each predetermined unit of the first moving image sequence, from the encoding result of the first encoding step;
a step of calculating, based on the measured statistical data, for each first predetermined period composed of a plurality of consecutive frames in the first moving image sequence, a first encoding difficulty indicating the complexity of the images belonging to that first predetermined period;
a conversion step of converting, based on the correspondence on the time axis between the first moving image sequence and a second moving image sequence obtained by editing it, the first encoding difficulty into a second encoding difficulty that indicates, for each second predetermined period composed of a plurality of consecutive frames in the second moving image sequence, the complexity of the images belonging to that second predetermined period, wherein, when a second predetermined period spans a boundary between a plurality of the first predetermined periods, the second encoding difficulty of that second predetermined period is obtained by taking a weighted average of the first encoding difficulties of the plurality of first predetermined periods according to the proportion of the second predetermined period that belongs to each of them;
a step of determining an allocated code amount for each second predetermined period of the second moving image sequence based on the second encoding difficulty; and
a second encoding step of variable-rate encoding the second moving image sequence based on the allocated code amount.
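
The conversion step and the allocation step of the method claimed above can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the representation of the edit as a list of retained source frame ranges, and the purely proportional bit allocation are all assumptions made for illustration; the embodiment refers to a conversion function for code amount distribution (FIG. 11) that is not reproduced here.

```python
from bisect import bisect_right


def remap_difficulty(first_starts, first_difficulty, edit_list, period_len):
    """Convert per-period difficulty of the first sequence to the edited one.

    first_starts     -- sorted start frame of each first period (hypothetical layout)
    first_difficulty -- one difficulty value per first period
    edit_list        -- (src_start, src_end) half-open source frame ranges that
                        survive editing, in output order (assumed edit model)
    period_len       -- frames per second period in the edited sequence
    """
    # Source frame number of every frame of the edited (second) sequence.
    src_frames = [f for start, end in edit_list for f in range(start, end)]
    second = []
    for pos in range(0, len(src_frames), period_len):
        chunk = src_frames[pos:pos + period_len]
        # Count how many frames of this second period fall in each first period.
        weights = {}
        for frame in chunk:
            idx = bisect_right(first_starts, frame) - 1
            weights[idx] = weights.get(idx, 0) + 1
        # Weighted average by the share of the second period in each first period.
        total = sum(first_difficulty[i] * n for i, n in weights.items())
        second.append(total / len(chunk))
    return second


def allocate_bits(second_difficulty, total_bits):
    """Assumption: split the bit budget in proportion to difficulty."""
    total = sum(second_difficulty)
    return [total_bits * d / total for d in second_difficulty]


if __name__ == "__main__":
    difficulty = remap_difficulty(
        first_starts=[0, 15, 30, 45],
        first_difficulty=[10, 20, 40, 80],
        edit_list=[(0, 10), (20, 45)],  # frames 10..19 and 45.. are cut
        period_len=15,
    )
    print(difficulty)
    print(allocate_bits(difficulty, total_bits=1_000_000))
```

In this example the first period of the edited sequence takes 10 of its 15 frames from the first source period and 5 from the second, so its difficulty is the 10:5 weighted average of the two source values.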

The variable-rate moving image encoding method according to claim 1, further comprising a step of generating, from the encoding result of the first encoding step, moving image sequence analysis information including at least scene change positions in the first moving image sequence, and presenting the moving image sequence analysis information to the user as editing support information for supporting editing operations by the user.

The variable-rate moving image encoding method according to claim 1, wherein the first encoding step encodes the first moving image sequence while dividing it into units of encoded frame groups each having a predetermined inter-frame prediction structure, such that an intra-frame encoded frame is inserted at a predetermined frame interval, and
the first encoding difficulty corresponding to the first moving image sequence is calculated for each period corresponding to N times (N is a natural number) the encoded frame group.
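
As a rough illustration of computing one difficulty value per period of N encoded frame groups, the following Python sketch aggregates per-frame statistics. The claims state only that the statistics include the generated code amount and the average quantization step size. Taking their product as the complexity measure is an assumption borrowed from common rate-control practice, and the function and parameter names are hypothetical.

```python
def period_difficulty(frame_bits, frame_qstep, gop_size, n=1):
    """Return one difficulty value per period of n encoded frame groups.

    frame_bits  -- generated code amount of each frame in the first encoding
    frame_qstep -- average quantization step size of each frame
    gop_size    -- frames per encoded frame group (assumed constant)
    n           -- number of frame groups per period (natural number)
    """
    period = gop_size * n
    result = []
    for start in range(0, len(frame_bits), period):
        bits = frame_bits[start:start + period]
        qstep = frame_qstep[start:start + period]
        # Assumed measure: mean of (bits x quantization step) over the period.
        result.append(sum(b * q for b, q in zip(bits, qstep)) / len(bits))
    return result


if __name__ == "__main__":
    bits = [100, 50, 50, 200, 100, 100]
    qstep = [2, 2, 2, 4, 4, 4]
    print(period_difficulty(bits, qstep, gop_size=3))
    print(period_difficulty(bits, qstep, gop_size=3, n=2))
```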

The variable-rate moving image encoding method according to claim 3, wherein the second encoding step encodes the second moving image sequence while dividing it into units of encoded frame groups each having a predetermined inter-frame prediction structure, such that an intra-frame encoded frame is inserted at a predetermined frame interval, and
the second encoding difficulty corresponding to the second moving image sequence is calculated, based on the first encoding difficulty corresponding to the first moving image sequence, for each period corresponding to N times (N is a natural number) the encoded frame group constituting the second moving image sequence.

The variable-rate moving image encoding method according to claim 4, wherein the conversion step of converting into the second encoding difficulty calculates the second encoding difficulty for each period corresponding to N times (N is a natural number) the encoded frame group constituting the second moving image sequence, based on the correspondence on the time axis of the encoded frame groups between the first moving image sequence and the second moving image sequence.

The variable-rate moving image encoding method according to claim 1, further comprising a step of detecting scene change positions in the first moving image sequence from the encoding result of the first encoding step,
wherein, when a detected scene change position falls within the images of the second moving image sequence, the second encoding step determines the prediction structure for encoding the second moving image sequence according to the detected scene change position, such that the detected scene change position becomes a random access point.
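
One way to picture making scene changes random access points is to carry the scene change positions detected in the first sequence over to the edited sequence and start a new encoded frame group at each one that survives the edit. The Python sketch below does this under assumed conventions: the edit is modelled as a list of retained source frame ranges, and a new group is also started whenever the nominal group length is reached. All names are hypothetical, and starting a new group at a splice point is deliberately not handled.

```python
def gop_starts(edit_list, scene_changes, gop_size):
    """Return output frame numbers at which a new frame group should begin.

    edit_list     -- (src_start, src_end) half-open source ranges kept by the edit
    scene_changes -- source frame numbers detected as scene changes
    gop_size      -- nominal (maximum) frame group length in frames
    """
    scene = set(scene_changes)
    starts = []
    out_frame = 0    # frame index in the edited (second) sequence
    since_last = 0   # frames emitted since the last group start
    for src_start, src_end in edit_list:
        for src in range(src_start, src_end):
            # Start a group at the first frame, at a surviving scene change,
            # or when the nominal group length has been reached.
            if out_frame == 0 or src in scene or since_last >= gop_size:
                starts.append(out_frame)
                since_last = 0
            since_last += 1
            out_frame += 1
    return starts


if __name__ == "__main__":
    # Source frames 10..19 are cut, so the scene change at frame 15 disappears.
    print(gop_starts([(0, 10), (20, 30)], scene_changes=[4, 15, 25], gop_size=6))
```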

The variable-rate moving image encoding method according to claim 1, wherein the first encoding step includes a step of generating reduced images by converting the resolution of each frame of the first moving image sequence, and a step of encoding the resolution-converted first moving image sequence.

A moving image editing system that edits a moving image based on editing information from a user and compression-encodes the edited moving image, the system comprising:
first encoding means for encoding a first moving image sequence to be edited;
means for generating, from the encoding result of the first encoding means, moving image sequence analysis information including at least scene change positions in the first moving image sequence, and for presenting the moving image sequence analysis information to the user as editing support information for supporting editing operations by the user;
means for measuring, over the entire first moving image sequence, statistical data including a generated code amount and an average quantization step size for each predetermined unit of the first moving image sequence, from the encoding result of the first encoding means;
means for calculating, based on the measured statistical data, for each first predetermined period composed of a plurality of consecutive frames in the first moving image sequence, a first encoding difficulty indicating the complexity of the images belonging to that first predetermined period;
conversion means for converting, based on the correspondence on the time axis between the first moving image sequence and a second moving image sequence obtained by editing it, the first encoding difficulty into a second encoding difficulty that indicates, for each second predetermined period composed of a plurality of consecutive frames in the second moving image sequence, the complexity of the images belonging to that second predetermined period, wherein, when a second predetermined period spans a boundary between a plurality of the first predetermined periods, the second encoding difficulty of that second predetermined period is obtained by taking a weighted average of the first encoding difficulties of the plurality of first predetermined periods according to the proportion of the second predetermined period that belongs to each of them;
means for determining an allocated code amount for each second predetermined period of the second moving image sequence based on the second encoding difficulty; and
second encoding means for variable-rate encoding the second moving image sequence based on the allocated code amount.