JP3948596B2

JP3948596B2 - Moving object detection and tracking device in moving images

Info

Publication number: JP3948596B2
Application number: JP2000061200A
Authority: JP
Inventors: 晴久加藤; 康之中島; 暁夫米山
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2000-03-06
Filing date: 2000-03-06
Publication date: 2007-07-25
Anticipated expiration: 2020-03-06
Also published as: JP2001250118A

Description

【０００１】
【発明の属する技術分野】
この発明は動画像内の移動物体検出追跡装置に関し、特に、圧縮符号化データそのもの、またはその一部だけを復号した情報から、高速かつ高精度に移動領域を抽出できる動画像内の移動物体検出追跡装置に関する。
【０００２】
【従来の技術】
従来の移動領域を検出する方式（以下、第１の検出方式）、特に移動する物体の検出方式として、あらかじめ保持されている背景画像に対して、対象フレームにおけるある走査線の微分や差分の絶対値和を移動物体の検出判定に使用する方式がある。また、オプティカルフローから移動物体を求める方式、小領域の輝度分布の相関や距離情報、速度情報を考慮して移動領域を検出する方法などが提案されている。これらの方式は、動画像の各画素の輝度値一つ一つに様々なアルゴリズムを適用して移動領域を抽出する方式である。
【０００３】
この第１の検出方式を圧縮符号化された動画像データに適用するには、符号化データを一旦復号する必要があり、例えば、図９のブロック図で示される構成により実現される。図９の可変長復号部５１は圧縮された画像符号化データを入力とし、可変長復号、逆量子化や係数範囲制限等の復号処理を行う。該可変長復号部５１からは、復号されたフレームやブロックの動き予測誤差情報ａ、動き予測情報ｂ、符号化情報等が出力される。該動き予測誤差情報ａは画像生成部５２に入力され、該画像生成部５２で逆変換されて１フレームの画像を生成する。また、動き補償部５３は動き予測の参照フレームを保持している画像メモリ５４から動き予測情報ｂを使って参照画像ｃを抽出する。加算器５５は、画像生成部５２からの出力画像と、動き補償部５３からの動き補償画像とを加算し、完全な１フレームの復号画像ｄを生成する。該復号画像ｄは移動領域検出部５６へ送られて前記した第１の検出方式の移動領域検出処理がなされる。また、該復号画像ｄが次フレーム以降の参照画像となるフレームであれば、画像メモリ５４に蓄積される。移動領域検出部５６で移動領域検出処理がされた画像は画像表示部５７で表示される。
【０００４】
また、従来の他の移動領域検出方式（以下、第２の検出方式）として、画素そのものを利用するのではなく、圧縮符号化された動画像の符号化データそのものを利用する方式が提案されている。この方式は圧縮の際に求められる各種のパラメータや符号化データを直接操作することで移動領域の検出処理を達成する。検出処理は画像内の符号単位ブロックが持つ動き予測情報を中心に、必要な情報のみを取捨選択して移動領域の抽出が行われる。このため、抽出される解像度は動き予測情報を持つ各圧縮方式の符号化単位ブロックの大きさに設定される。
【０００５】
さらに、従来の移動領域追跡方式（以下、第３の検出方式）として、検出処理と並んで過去の検出の結果を未来のフレームの検出処理に活かす技術がある。この技術は、前記第１、第２の検出方式のいずれにも有効である。前記第１の検出方式では、前フレームの目標物体の領域とオプティカルフローなどの速度から移動領域を推定する。前記第２の検出方式では動き予測情報をもとに移動領域を推定する。これらの方式は過去数フレームの移動領域の軌跡から未来の予測位置を領域単位で推定し、推定結果を次フレームの検出処理判定に利用する。
【０００６】
【発明が解決しようとする課題】
しかしながら、前記した第１〜３の検出方式には次のような問題がある。動画像は非常に広い信号帯域を持ち膨大なデータ量を必要とするので、一般的に動画像は圧縮された形で記録や伝送に広く利用されている。したがって、１フレームの復号処理の後、初めて画素領域での移動領域検出処理が施される第１の検出方式では、本来の移動領域検出処理のコストに加えて、動画像の復号処理という大きなコストを必要とする問題がある。
【０００７】
一方、前記の第２の検出方式は、移動領域の判定に、動き補償をなす動き予測情報の有無を利用している。したがって、復号処理過程が省略でき検出処理も高速に実行できる。しかし、実際の動画像には環境光の煌き、散乱、大気の揺らぎなどの要因によって、圧縮過程においては本来静止した背景領域であってもその変化に動きベクトルが割り当てられることがある。特に画素が均一な領域においてはこの影響が大きく、静止領域を移動領域と誤認識するなど検出精度が劣るという問題がある。
【０００８】
また、前記の第３の検出方式には、符号化データ領域での追跡処理において、移動物体の軌跡は重心の軌跡や領域内動きベクトルの平均から求める方法がある。しかし、重心の軌跡では、移動物体が小さな場合、ブロック数の僅かな増減で重心が大きく変動する可能性がある。また、単純な動きベクトル平均の場合、平坦部における動き予測情報のように信頼性の低い情報の影響も受けるという問題がある。
【０００９】
本発明の目的は、前述した従来技術の問題点を解消し、圧縮された符号化データそのもの、またはその一部だけを復号した情報から、移動物体を高速かつ高精度に検出および追跡できる移動物体検出追跡装置を提供することにある。
【００１０】
【課題を解決するための手段】
前記した目的を達成するために、本発明は、圧縮された動画像のデータを入力とし、該動画像のデータに移動領域情報を付加して出力する動画像内の移動物体検出追跡装置において、前記圧縮された動画像データを部分復号して、符号化モード情報、予測誤差情報および動き予測情報を出力する可変長復号部と、前記符号化モード情報と動き予測された領域の位置情報とを基に、移動物体の検出処理対象ブロックを設定する検出対象設定部と、該検出対象設定部によって設定された検出処理対象ブロックが移動領域に属するか否かを検出する移動領域検出部と、該移動領域検出部で検出された移動領域から、該移動領域全体の動きを予測する領域動き予測部とを具備し、前記移動領域検出部は、前記予測誤差情報と動き予測情報の双方を包含するブロック、もしくはフレーム内情報だけで符号化されたブロックを移動領域として検出し、前記領域動き予測部は、前記移動領域内の動き補償情報に対し前記予測誤差情報のパワーに応じた重み付け平均をとり、該重み付け平均を該移動領域全体の動き予測とし、該動き予測を前記検出対象設定部に提供するようにした点に特徴がある。
【００１１】
この発明によれば、圧縮符号化された動画像データを部分的に復号することに加え、移動物体の検出処理の対象を動き予測された領域の位置情報を基に設定し、該設定されたブロックにつき移動領域に属するか否かを検出するようにしたので、従来の画素領域の検出方式（前記第１の検出方式）に対しては無論のこと、符号データ領域での検出方式（前記第２の検出方式）と比較しても、はるかに処理コストを抑えることができるようになる。
【００１２】
【発明の実施の形態】
以下に、図面を参照して、本発明を詳細に説明する。図１は、本発明の一実施形態の構成を示すブロック図である。
【００１３】
可変長復号部１は、入力してくる動画像の圧縮符号化データを部分的に復号し、現フレームの符号化モード情報ｐと、予測誤差情報ａと、動き予測情報ｂを出力する。符号化モード情報ｐは検出対象設定部２と移動領域検出処理部３に送られ、予測誤差情報ａと動き予測情報ｂは移動領域検出処理部３に送られる。
【００１４】
前記符号化モード情報ｐには、フレーム符号化モードとブロック符号化モードがあり、フレーム符号化モードには、フレーム内符号化フレーム（Ｉピクチャ）、順方向予測フレーム（Ｐピクチャ）、双方向予測フレーム（Ｂピクチャ）の３種類が存在する。一方、ブロック符号化モードには、動き補償と符号化の有無から分類される、フレーム内符号化ブロック（Intra ）、動き補償符号化ブロック（MC coded）、フレーム差分符号化ブロック（no MC coded ）、動き補償ブロック（MC no coded ）の４種類が存在する。また、動き予測には、順方向、逆方向、両方向の３種類が存在する。
【００１５】
次に、検出対象設定部２は、可変長復号部１からの符号化モード情報ｐと、領域動き予測部４からの領域の動き予測位置情報ｑを入力とし、検出対象ブロック位置情報ｒと検出結果を出力する。該検出対象設定部２の機能を、図２のフローチャート、および図３を参照して説明する。なお、該検出対象設定部２は、符号化モード情報ｐ中のブロック符号化モードを用いて処理を行う。
【００１６】
領域動き予測部４の詳細な機能は後述するが、いま検出結果メモリ５から提供された移動領域Ｃが、図３(a) のブロックＢ１〜Ｂ５からなる領域であるとすると、領域動き予測部４は該領域Ｃと現在のブロックデータとを比較して、領域Ｃの動き予測ＭＶを用い、同図(b) に示されているような、領域の動き予測位置情報ｑを出力する。
【００１７】
図２のステップＳ２では、例えば同図(b) の各ブロックが１つずつ、領域の動き予測位置ｑの内縁部、すなわち該予測位置ｑに完全に覆われているか否かの判断がなされる。この判断が肯定の時には、移動領域の時間的連続性から見て当該ブロックは移動物体と判定し（ステップＳ１０）、ステップＳ８に進む。ステップＳ８では、検出対象フラグを立てない処理を行う。例えば、同図(b) のブロックＢm は移動物体と判定される。一方、この判断が否定の時には、ステップＳ３に進み、当該ブロックは領域の動き予測位置ｑの外周部、すなわち該予測位置ｑに一部だけ覆われているか否かの判断がなされる。例えば同図(b) のブロックＢn 、Ｂn+1 等は外周部であると判断され、ブロックＢs などは、外周部にないと判断される。
【００１８】
この判断が肯定の場合にはステップＳ４に進む。ステップＳ４は符号化モード情報を用いて移動領域の可能性を判断し、検出判定の処理対象として妥当か否かを決定する。Ｐピクチャ、Ｂピクチャ内に存在するIntra ブロックは、移動領域の後ろに隠れていた背景が現れる場面など動き補償では十分な画質が維持できない場合に現れることが多いため、検出対象とする。
【００１９】
一方、no MC coded は動き予測情報を持たないので移動領域である可能性は低く、検出判定の処理対象から外すとともに静止領域と決定する。MC no coded は動きを表す動き予測情報を持つが予測誤差情報を持たない。予測誤差情報が無いことは該当ブロックがベクトルの信頼性が低い平坦領域であるか、若しくは完全に静止した領域で予測誤差情報を必要としないことを示している。よって、予測誤差情報を持たず動き予測情報の信頼性に乏しいMC no coded は検出判定の処理対象から外すとともに静止領域と決定する。MC codedは動き予測情報と予測誤差情報をともに持つので、これを検出対象とする。
【００２０】
ステップＳ４の判断が肯定の時には、ステップＳ５に進んで当該ブロックに検出対象フラグが立てられる。一方、前記ステップＳ３の判断が否定の時、または前記ステップＳ４の判断が否定の時、すなわち当該ブロックのブロック符号化モードがＭＣ no coded またはno ＭＣ codedの時には、ステップＳ６に進んで静止領域と判定される。ステップＳ７では、当該ブロックに検出対象フラグを立てない処理をする。
【００２１】
ステップＳ９では、１フレームの全部のブロックに対して該検出対象処理が終了したか否かの判断がなされ、この判断が否定の時にはステップＳ１に戻って、次のブロックに対する処理が行われる。
【００２２】
したがって、同図(b) においては、ブロックＢn 、Ｂn+1 等が領域の動き予測位置ｑの外周部にあると判定され、これらのブロックのブロック符号化モードがＭＣ codedまたはIntra であれば、これらのブロックに検出対象フラグが立てられる。そして、これらのブロックは、現フレームにおける移動領域検出対象として、前記移動領域検出処理部３に送られる。具体的には、前記ステップＳ５で検出対象フラグが立てられたブロックの位置情報ｒが該移動領域検出処理部３に送られる。また、検出対象設定部において移動領域若しくは静止領域と決定されたブロックについては検出結果表示部および検出結果メモリへ送る。
【００２３】
次に、前記移動領域検出処理部３の機能を、図４を参照して説明する。該移動領域検出処理部３には、可変長復号部１からの符号化モード情報ｐ、予測誤差情報ａ、および動き予測情報ｂと、検出対象設定部２からの検出対象ブロック位置情報ｒが入力される。
【００２４】
ステップＳ１１では、検出対象ブロック位置情報ｒが入力したか否かの判断がなされ、この判断が肯定になると、ステップＳ１２に進んで、ＤＣＴ係数判定、すなわち、ＤＣＴ係数を用いて、当該ブロックが移動領域候補に該当するか否かの判定が行われる。この判定で、当該ブロックが移動領域候補であると判定されるとステップＳ１３に進み、否定と判定されるとステップＳ１７に進む。ステップＳ１３では、当該ブロックのマクロブロックのタイプがＭＣ codedであるか否かの判断がなされ、この判断が肯定の時にはステップＳ１４に進み、否定の時にはステップＳ１５に進む。
【００２５】
ステップＳ１４では、動きベクトルにより、当該ブロックが移動領域候補に該当するか否かの判定が行われる。この判定で、当該ブロックが移動領域候補であると判定されると、ステップＳ１５に進み、逆に静止領域と判定されると、ステップＳ１７に進む。次に、ステップＳ１５では、空間的相関により、当該ブロックが移動領域候補に該当するか否かの判定が行われる。そして、この判定で、当該ブロックが移動領域候補であると判定されると、ステップＳ１６に進み、当該ブロックは最終的に移動領域候補であると判定される。逆に該判断が否定になると、ステップＳ１７に進み静止領域であると判定される。
【００２６】
ステップＳ１８では、全ての移動領域検出対象ブロックに対して前記した処理が終わったか否かの判断が行われ、この判断が否定の時には、ステップＳ１１に戻って前記と同じ処理が繰り返される。
【００２７】
以上の処理によって得られた処理結果は、必要に応じて、領域の形状を整形され、検出結果表示部６に出力されると同時に、検出結果メモリ５に出力され保存される。
【００２８】
次に、動き予測誤差情報をもとにしたステップＳ１２の詳細を説明する。ここでは、動き予測誤差情報としてＤＣＴ係数を用いる。移動物体の形状や模様を表す情報として、ＤＣＴ係数を用いることができる。しかし、ブロックのＤＣＴ係数そのものを用いた場合、複雑なテクスチャを持つ領域ならば移動の有無に関わらず全て抽出される。動き予測誤差のＤＣＴ係数については、動き予測が情報の大半を補償するため複雑な模様を持った背景が静止している場合、僅かなＤＣＴ係数しか持たない。従って、動き予測誤差量が大きい場合、移動領域候補とすることができる。
【００２９】
本判定を満足する場合、動きベクトル相関判定部へ進む．そうでなければ静止領域とする。具体的なDCT 係数による判定方法については閾値以上のＤＣＴ係数の個数やＤＣＴ係数の絶対値和、２乗和などの閾値処理を用いることができる。また、以下のような方法も用いることができる。図５(a) に示されるようにＤＣＴ係数部分和判定はＤＣ成分を除く低周波寄りのＡＣ成分の部分和を判定基準に用いて、有意な誤差情報を持つ領域を候補領域として絞り込む。ここでいう低周波寄りのＡＣ成分とは予測誤差情報においてジグザグスキャンなど符号化スキャンオーダーで1 番目からthr １番目までのＤＣＴ係数情報を指す。このＡＣ成分はフレームｉのブロックｊにおける動き補償による差分値Ｃi,j,z (0 ≦z ≦63) である。このとき、図１１の式(1) を満たすことを判定基準とする。また、判定に用いる閾値thr ２はフレーム符号化モード別に設定する。
【００３０】
この判断が肯定の時には、当該ブロックは有意な誤差情報をもつ領域であるので、ステップＳ３２に進んで移動領域候補とする。一方、該判定が否定の時には、ステップＳ３３に進んで静止領域のブロックであると判定する。なお、ＡＣ成分は、動き補償による差分値であり、前記閾値はフレーム符号化モード別に設定する。
【００３１】
次に、前記ステップＳ１４の動きベクトル判定の詳細を、図６(a) 〜(d) を参照して説明する。ステップＳ４１では、当該ブロックの動きベクトル（ＭＶ）の長さが予め定められた閾値以上であるか否かの判断を行う（同図(b) 参照）。長さの短い動きベクトルは、様々な外乱によるノイズベクトルと考えられるので、該長さの短い動きベクトルをもつブロックは静止領域とする。
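[0032]
When the motion vector length is at or above the threshold, processing proceeds to the spatial determination unit. Here, the motion vector of a moving region is roughly proportional to the distance between the reference frame and the input frame. Moving region candidates can therefore be determined by thresholding that depends on the inter-frame distance. For example, the threshold on the vector length is varied according to the temporal distance to the reference frame and is reset to its initial value whenever an I picture or P picture is output. Let $fmv_{i,j}$ and $bmv_{i,j}$ be the forward and backward motion vectors of block j in frame i, $u_i$ the distance from the input frame to the forward reference frame, m the P-picture interval, and $thr_3$ the initial threshold. Discriminants such as expressions (2) and (3) below can then be used to determine moving region candidates, where $|\cdot|$ denotes the Euclidean norm.
$$|fmv_{i,j}| \ge u_i \cdot thr_3 \tag{2}$$
$$|bmv_{i,j}| \ge (m - u_i) \cdot thr_3 \tag{3}$$
[0033]
When the input block has bidirectional prediction vectors, the lengths of both the forward and the backward prediction vector are checked. The block is a moving region candidate only if both expressions hold; otherwise it is determined to be a background region. For unidirectional prediction, only the forward or the backward prediction vector, whichever is present, is checked. When the condition is satisfied, the motion vector is judged to be sufficiently long and processing moves on to the next determination; otherwise the block is determined to be a still region.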
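The length test of expressions (2) and (3), including the both-must-hold rule for bidirectional blocks, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `mv_length_ok`, the tuple representation of vectors, and the use of `None` for an absent prediction direction are assumptions made here.

```python
import math

def mv_length_ok(fmv, bmv, u_i, m, thr3):
    """Frame-distance-scaled length test, after expressions (2) and (3).

    fmv, bmv: (x, y) forward/backward motion vectors, or None if that
              direction is absent (hypothetical representation).
    u_i:      distance from the input frame to the forward reference frame.
    m:        P-picture interval.
    thr3:     initial threshold.
    """
    checks = []
    if fmv is not None:
        # Expression (2): |fmv| >= u_i * thr3
        checks.append(math.hypot(*fmv) >= u_i * thr3)
    if bmv is not None:
        # Expression (3): |bmv| >= (m - u_i) * thr3
        checks.append(math.hypot(*bmv) >= (m - u_i) * thr3)
    # Bidirectional blocks must pass both tests; a block with no
    # motion vector at all cannot be a candidate.
    return bool(checks) and all(checks)
```

Scaling the threshold by the distance to the reference frame means that a B picture far from its forward reference needs a proportionally longer vector before it counts as moving.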
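[0034]
Next, in step S42, it is determined whether the block is a bidirectionally predicted block. If so, processing proceeds to step S43; if not, to step S44. A bidirectionally predicted block contains two motion prediction vectors. If the motion of a moving region over a short time is assumed to be linear, the two motion prediction vectors point in opposite directions, so determination S43 examines the angle between the forward-pointing and backward-pointing vectors. The directions of the two vectors are compared using the inner product, as in expression (4) below, where $thr_4$ is the threshold on the angle formed by the two vectors. If expression (4) is satisfied, the correlation between the motion vectors is judged to be high and processing proceeds to the temporal correlation determination. Otherwise the block is determined to be a still region.
$$\frac{fmv_{i,j} \cdot bmv_{i,j}}{|fmv_{i,j}| \cdot |bmv_{i,j}|} \le \cos(thr_4) \tag{4}$$
[0035]
In step S43, for bidirectional prediction, it is determined whether the angle θ between the forward-pointing and backward-pointing vectors is at or above a predetermined threshold (see FIG. 6(c)). Blocks at or above the threshold are treated as moving region candidates, and blocks below it are judged to be still regions.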
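The opposite-direction test of expression (4) can be sketched as below. The function name, the choice of degrees for `thr4_deg`, and the handling of zero-length vectors are assumptions for illustration; the text only gives the inequality itself.

```python
import math

def opposite_direction_ok(fmv, bmv, thr4_deg):
    """Expression (4): cos(angle between fmv and bmv) <= cos(thr4).

    Because cosine decreases on [0, 180] degrees, this is equivalent to
    requiring the angle between the two vectors to be at least thr4.
    """
    norm = math.hypot(*fmv) * math.hypot(*bmv)
    if norm == 0:
        # Assumption: a zero-length vector has no direction, so the
        # block is not accepted as a candidate.
        return False
    cos_theta = (fmv[0] * bmv[0] + fmv[1] * bmv[1]) / norm
    return cos_theta <= math.cos(math.radians(thr4_deg))
```

With a threshold of 150 degrees, for example, exactly opposed vectors pass, while parallel or perpendicular pairs are rejected as still.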
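[0036]
Next, in step S44, it is determined whether the temporal rate of change of the motion vector MV is proportional to the inter-frame distance (see FIG. 6(d)). Object motion is assumed to be locally linear, and blocks whose motion prediction information is temporally correlated with that of the region the current block refers to are selected as moving region candidates. For bidirectionally predicted blocks, the temporal correlation between the forward and backward prediction vectors and the motion vectors between the reference frames is examined, and temporally correlated blocks are selected as moving region candidates.
[0037]
If the motion vector shows no temporal correlation, the block is determined to be a still region. Otherwise it is judged to be a moving region as far as the motion vector information is concerned. FIG. 6(d) shows one example of the temporal correlation determination. It exploits the fact that motion vectors pointing to the same moving region in the reference frame have lengths proportional to the inter-frame distance. For block j of the current frame i, let l be the comparison block in the comparison frame k. Expressions (5) and (6), bounded by the threshold $thr_5$, are then used as the temporal correlation discriminants. How block l is chosen is described below; blocks j and l correspond to the same moving object in their respective frames. When the input block has bidirectional prediction vectors, the temporal correlation is checked for both the forward and the backward prediction vector, and the block is a moving region candidate only if both expressions hold; otherwise it is a still region. For unidirectional prediction, the decision uses only the forward or the backward prediction vector.
$$|u_k \cdot fmv_{i,j} - u_i \cdot fmv_{k,l}| \le thr_5 \tag{5}$$
$$|(m - u_k) \cdot bmv_{i,j} - (m - u_i) \cdot bmv_{k,l}| \le thr_5 \tag{6}$$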
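A small sketch of the temporal correlation discriminants (5) and (6) follows. It assumes that the comparison block l has already been located and that vectors are plain `(x, y)` tuples; both function names are hypothetical.

```python
import math

def forward_temporal_ok(fmv_ij, fmv_kl, u_i, u_k, thr5):
    """Expression (5): |u_k * fmv_ij - u_i * fmv_kl| <= thr5."""
    dx = u_k * fmv_ij[0] - u_i * fmv_kl[0]
    dy = u_k * fmv_ij[1] - u_i * fmv_kl[1]
    return math.hypot(dx, dy) <= thr5

def backward_temporal_ok(bmv_ij, bmv_kl, u_i, u_k, m, thr5):
    """Expression (6): |(m - u_k) * bmv_ij - (m - u_i) * bmv_kl| <= thr5."""
    dx = (m - u_k) * bmv_ij[0] - (m - u_i) * bmv_kl[0]
    dy = (m - u_k) * bmv_ij[1] - (m - u_i) * bmv_kl[1]
    return math.hypot(dx, dy) <= thr5
```

Cross-multiplying by the frame distances avoids a division: if the two vectors describe the same linear motion, `fmv_ij / u_i` equals `fmv_kl / u_k`, so the weighted difference is close to zero.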
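[0038]
There is some freedom in choosing the comparison frame k. For example, frame k may be the frame (i−1) immediately preceding frame i. However, if frame k is an I picture or a P picture, the reference frame of frame i differs from that of frame k, so the future reference frame (i−1+m) is used as the comparison frame k instead. The comparison block l in frame k is the block occupying the position obtained by internally dividing the current motion vector in the ratio of the distances to the reference frame, as shown in FIG. 6(e). When the discriminant is satisfied, the ratio of the motion vector lengths of blocks j and l is close to the ratio of their distances to the reference frame, and the motion prediction information is judged to be reliable. Otherwise the motion prediction information is judged to be unreliable and the block is treated as a still region.
[0039]
In coding schemes that have no B pictures, each motion vector in successive P pictures resembles the motion prediction information of the immediately preceding reference frame. In that case the correlation with the motion vector of the immediately preceding P picture can be examined. For example, the correlation is examined using the absolute difference between the motion vector of the immediately preceding P picture and that of the current block, as in expression (7) below.
$$|fmv_{i,j} - fmv_{i-1,l}| \le thr_6 \tag{7}$$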
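[0040]
Next, the spatial correlation determination of step S15 will be described in detail with reference to FIG. 7. FIG. 7(a) is a flowchart of the same-region generation unit, which examines spatial correlation. The input information consists of coding mode information, motion prediction information, prediction error information, and moving region candidate information. The input information first passes to the same-region generation unit, which forms same regions.
[0041]
The coding mode determination in FIG. 7(a) (step S51) distinguishes MC coded from Intra. For MC coded, processing proceeds to the vector length determination (step S52); for Intra, it proceeds to the DC determination (step S54). The motion vector length determination uses expressions (8) and (9) to judge the similarity of motion vector lengths with respect to neighboring moving region candidate blocks. When the input block has bidirectional prediction vectors, the lengths of both the forward and the backward prediction vector are checked. If both expressions hold, processing proceeds to the motion vector angle determination (step S53); otherwise the blocks are determined not to belong to the same region (step S56). For unidirectional prediction, only the forward or the backward prediction vector is checked. If the difference between the vector lengths of block j and block n is at or below the threshold, processing proceeds to the motion vector angle determination (step S53); otherwise the blocks are determined not to belong to the same region (step S56). The neighboring blocks n are, for example, the eight blocks surrounding the input block j.
$$|fmv_{i,j}| - |fmv_{i,n}| \le thr_7 \tag{8}$$
$$|bmv_{i,j}| - |bmv_{i,n}| \le thr_7 \tag{9}$$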
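[0042]
The motion vector angle determination (step S53) uses expressions (10) and (11) to judge the similarity of angles with respect to neighboring moving region candidate blocks. When the input block has bidirectional prediction vectors, the angles of both the forward and the backward prediction vector are checked. For unidirectional prediction, only the forward or the backward prediction vector is checked. If the angle formed by the vectors of block j and block n is at or below the threshold, processing moves to the DC component determination (step S54). Otherwise the blocks are determined not to belong to the same region.
$$\frac{fmv_{i,j} \cdot fmv_{i,n}}{|fmv_{i,j}| \cdot |fmv_{i,n}|} \le \cos(thr_8) \tag{10}$$
$$\frac{bmv_{i,j} \cdot bmv_{i,n}}{|bmv_{i,j}| \cdot |bmv_{i,n}|} \le \cos(thr_8) \tag{11}$$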
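[0043]
The DC component determination (step S54) judges the similarity of the DC components of the DCT coefficients. This similarity is judged on the original DCT coefficients, not on the motion-compensated difference values. The DC component is obtained by adding the DC component $C_{k,l,0}$ of the reference block l in the reference frame k to the prediction error DC component $C_{i,j,0}$. If the difference in DC component from the neighboring block n, given by expression (12), is at or below the threshold $thr_9$, the blocks are determined to belong to the same region (step S55); otherwise they are determined not to. Here, block s is the neighboring block of block n.
$$|(C_{i,j,0} + C_{k,l,0}) - (C_{i,n,0} + C_{k,s,0})| \le thr_9 \tag{12}$$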
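[0044]
Across all moving region candidate blocks, blocks that are sufficiently similar to one another are judged to belong to the same moving region and are merged (step S57). At the same time, the merge information is output as same-moving-region information.
[0045]
The same-region information obtained from FIG. 7(a) is input to the spatial correlation determination of FIG. 10 (step S61). This determination verifies the spatial correlation of the moving region. A region formed by fewer blocks than the threshold is an isolated block with no similar blocks nearby. Therefore, if the number of blocks in the region is at or below the threshold, so that spatial correlation is poor (the determination in step S62 is negative), the region is treated as a still region (step S64). Otherwise (the determination in step S62 is affirmative), it is treated as a moving region candidate (step S63).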
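The same-region generation and the isolated-block filter can be sketched together as a connected-components pass over the candidate blocks. This is a simplified illustration under several assumptions that are not in the text: only forward vectors are compared, the DC component check of step S54 is omitted, the length test uses the absolute difference of lengths, the angle test follows the prose ("angle at or below the threshold"), and regions are kept when they contain at least `min_blocks` blocks. All names are hypothetical.

```python
import math
from collections import defaultdict

def similar(v1, v2, thr7, thr8_deg):
    """Length and angle similarity between two motion vectors."""
    l1, l2 = math.hypot(*v1), math.hypot(*v2)
    if l1 == 0 or l2 == 0:
        return False
    if abs(l1 - l2) > thr7:          # length check (cf. expression (8))
        return False
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (l1 * l2)
    # Angle at or below thr8, i.e. cosine at or above cos(thr8).
    return cos_a >= math.cos(math.radians(thr8_deg))

def group_regions(candidates, thr7, thr8_deg, min_blocks):
    """candidates: dict mapping (row, col) -> forward motion vector (x, y).

    Returns a list of sets of block positions, one set per region that
    survives the size filter.
    """
    parent = {p: p for p in candidates}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (r, c), v in candidates.items():
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q != (r, c) and q in candidates and similar(
                        v, candidates[q], thr7, thr8_deg):
                    parent[find(q)] = find((r, c))   # merge the two blocks

    regions = defaultdict(set)
    for p in candidates:
        regions[find(p)].add(p)
    # Small regions are treated as isolated noise (still region).
    return [blocks for blocks in regions.values() if len(blocks) >= min_blocks]
```

A union-find structure is one convenient way to merge pairwise-similar neighbors into regions; the text does not say which data structure the same-region generation unit uses.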
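[0046]
When processing has finished for all moving region detection target blocks (step S65), the detection result is output to the detection result display unit 6 and, so that it can be used for detection in future frames, is also output to and recorded in the detection result memory 5.
[0047]
For example, given per-block motion vectors such as those indicated by arrows in FIG. 7(b), the correlation of motion vector length and angle and of the DC component of the DCT coefficients with adjacent blocks is examined for the moving region candidate blocks B1 to B5, and the degree of connection with neighboring blocks is calculated. On this basis, blocks whose motion vectors have similar angles and lengths are merged into a same region, which is determined to be the moving object region.
[0048]
Next, the function of the region motion prediction unit 4 (see FIG. 1) will be described. The region motion prediction unit 4 predicts the motion of each region and predicts where the region will appear in future frames. The shape of a moving object is assumed to be nearly constant over a short time, and the moving object determination result of the immediately preceding frame is used to track the moving object.
[0049]
FIGS. 8(a), (b), and (c) show the method of predicting the motion of the entire region obtained by the detection process. The input to the region motion prediction unit 4 (see FIG. 1) is the composition information of the moving region output by the moving region detection processing unit 3, as shown in FIG. 8(b).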
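[0050]
The motion prediction for each region is estimated as follows. The motion prediction of the entire region reflects the power of the DCT coefficients, which represents texture. In flat regions without texture it is difficult for a motion vector to capture the object's motion accurately, so the power of the DCT coefficients can serve as a measure of motion vector reliability. The motion of the entire region is therefore obtained as an average of the motion vectors weighted according to the power of the DCT coefficients, based on the prediction error information in the input (step S71). The weight calculation unit can express the power used for the weight $w_{i,j}$ with information such as the sum of absolute values of the DCT coefficients, the sum of their squares, or the number of coefficients at or above a threshold. For example, expression (13) in FIG. 11 gives a concrete formula that uses the sum of absolute DCT coefficient values as the weighting factor. Here $thr_1$ is the threshold used in expression (1) to separate the low and high frequencies of the DCT coefficients.
[0051]
The weight information obtained by the weight calculation unit is passed to the motion vector calculation unit, which uses the weighting factors $w_{i,j}$ to compute the weighted average of the motion vectors (step S72). Letting $fmv_R$ and $bmv_R$ be the motion prediction vectors of an entire moving region R, they are obtained from expressions (14) and (15) in FIG. 11, respectively. The result is passed to the scaling unit.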
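Expressions (13) to (15) appear only in FIG. 11, which is not reproduced in this text, so the following is a sketch under stated assumptions rather than the exact formulas: the weight of a block is taken to be the sum of the absolute values of its low-frequency AC prediction error coefficients (scan positions 1 to `thr1`), and the region vector is the normalized weighted mean of the block vectors. The function name and data layout are hypothetical.

```python
def region_motion(blocks, thr1):
    """Weighted average of block motion vectors over one region.

    blocks: list of (mv, coeffs) pairs, where mv is an (x, y) motion
            vector and coeffs holds the block's prediction error DCT
            coefficients in scan order (index 0 is the DC component).
    thr1:   last scan position counted as low frequency.
    Returns the region vector (x, y), or None if every weight is zero.
    """
    w_sum = 0.0
    sx = sy = 0.0
    for (mx, my), coeffs in blocks:
        # Assumed weight: absolute sum of AC coefficients 1..thr1.
        w = sum(abs(c) for c in coeffs[1:thr1 + 1])
        w_sum += w
        sx += w * mx
        sy += w * my
    if w_sum == 0:
        return None
    return (sx / w_sum, sy / w_sum)
```

Blocks from flat areas carry little prediction error energy, so their possibly unreliable vectors contribute little to the region vector, which is the motivation given in the text for weighting by DCT power.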
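[0052]
The scaling unit corrects the motion prediction vector of the entire region by scaling it with the inter-frame distance information, yielding motion-predicted position information that indicates where the region will be in the next frame (step S73). The prediction vector $mv_R$ for the next frame is obtained from expression (16) when a weighted average $bmv_R$ of backward motion prediction information exists in moving region R. When region R has no backward motion prediction information, or for a P picture, it is obtained from expression (17) by inverting the weighted average $fmv_R$ of forward motion prediction information. For an I picture, the motion from the immediately preceding frame is retained. This per-region motion prediction information is output to the detection target setting unit described above.
$$mv_R = \frac{bmv_R}{m - u_i} \tag{16}$$
$$mv_R = -\frac{fmv_R}{u_i} \tag{17}$$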
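The scaling rule of expressions (16) and (17) can be sketched as a small selection function. The string picture types, the `previous` argument carrying the vector retained from the preceding frame, and the fallback when no forward vector is available are assumptions made for this illustration.

```python
def next_frame_vector(picture_type, fmv_r, bmv_r, u_i, m, previous):
    """Per-frame region vector for predicting the next position.

    picture_type: "I", "P" or "B".
    fmv_r, bmv_r: weighted-average forward/backward region vectors,
                  or None when absent.
    u_i:          distance to the forward reference frame.
    m:            P-picture interval.
    previous:     vector kept from the immediately preceding frame.
    """
    if picture_type == "I":
        return previous                      # I picture: keep previous motion
    if picture_type == "B" and bmv_r is not None:
        d = m - u_i                          # distance to backward reference
        return (bmv_r[0] / d, bmv_r[1] / d)  # expression (16)
    if fmv_r is None:
        return previous                      # assumption: nothing to scale
    return (-fmv_r[0] / u_i, -fmv_r[1] / u_i)  # expression (17)
```

The sign flip in expression (17) reflects that a forward vector points back toward the past reference frame, whereas the predicted displacement is toward the future.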
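[0053]
[Effects of the invention]
As is clear from the above description, according to the present invention the compression-encoded moving image data is only partially decoded, and the blocks subjected to moving region detection are limited to those having both motion prediction information and prediction error information. As a result, the processing cost can be kept far lower not only than that of the conventional pixel-domain detection method (the first detection method) but also than that of the detection method in the coded-data domain (the second detection method).
[0054]
In addition, because the blocks subjected to detection are limited on the basis of past detection results, the processing cost is reduced effectively. The results carried over from the past also maintain temporal continuity, which ensures natural shapes when the detection result is checked by superimposing it on the reproduced image.
[0055]
Furthermore, according to the present invention, the motion of a past moving region is predicted as a weighted motion prediction vector that takes the reliability of the motion prediction information into account, so the moving region can be tracked reliably. As a result, moving objects can be detected without degrading detection accuracy even when the number of blocks and frames examined is greatly reduced. In other words, because the method keeps the scope of the detection decision to the necessary minimum, it can exploit the speed of moving region extraction on compression-encoded data even further.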
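[Brief description of the drawings]
[FIG. 1] A block diagram showing the schematic configuration of an embodiment of the present invention.
[FIG. 2] A flowchart showing the operation of the detection target setting unit of FIG. 1.
[FIG. 3] An explanatory diagram of the operation of the detection target setting unit of FIG. 1.
[FIG. 4] A flowchart showing the operation of the moving region detection processing unit of FIG. 1.
[FIG. 5] A detailed explanatory diagram of step S12 of FIG. 4.
[FIG. 6] A detailed explanatory diagram of step S14 of FIG. 4.
[FIG. 7] A detailed explanatory diagram of step S15 of FIG. 4.
[FIG. 8] A detailed explanatory diagram of the region motion prediction unit of FIG. 1.
[FIG. 9] A schematic block diagram showing one configuration of a conventional apparatus.
[FIG. 10] Another detailed explanatory diagram of step S15 of FIG. 4.
[FIG. 11] A diagram showing the mathematical expressions.
[Explanation of symbols]
1: variable length decoding unit; 2: detection target setting unit; 3: moving region detection processing unit; 4: region motion prediction unit; 5: detection result memory; 6: detection result display unit.
[0001]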
BACKGROUND OF THE INVENTION
The present invention relates to an apparatus for detecting and tracking moving objects in a moving image, and in particular to such an apparatus that can extract moving regions at high speed and with high accuracy from the compression-encoded data itself, or from information obtained by decoding only a part of it.
[0002]
[Prior art]
One conventional method for detecting a moving region (hereinafter, the first detection method), and in particular a moving object, compares the target frame with a background image held in advance and uses the derivative along a given scanning line, or the sum of absolute differences, to decide whether a moving object is present. Other proposals obtain moving objects from optical flow, or detect moving regions by considering the correlation of luminance distributions in small regions together with distance and velocity information. All of these methods extract moving regions by applying various algorithms to the luminance value of each individual pixel of the moving image.
[0003]
To apply this first detection method to compression-encoded moving image data, the encoded data must first be decoded. This is realized, for example, by the configuration shown in the block diagram of FIG. 9. The variable length decoding unit 51 in FIG. 9 receives the compressed image data and performs decoding processing such as variable length decoding, inverse quantization, and coefficient range limiting. It outputs the motion prediction error information a, the motion prediction information b, coding information, and so on for each decoded frame or block. The motion prediction error information a is input to the image generation unit 52, which inverse-transforms it to generate one frame of image. The motion compensation unit 53 uses the motion prediction information b to extract the reference image c from the image memory 54, which holds the reference frames for motion prediction. The adder 55 adds the output image of the image generation unit 52 to the motion-compensated image from the motion compensation unit 53 to produce a complete decoded image d of one frame. The decoded image d is sent to the moving region detection unit 56, where the moving region detection of the first detection method is performed. If the decoded image d is a frame that will serve as a reference image for subsequent frames, it is also stored in the image memory 54. The image processed by the moving region detection unit 56 is displayed on the image display unit 57.
[0004]
Another conventional moving region detection method (hereinafter, the second detection method) has been proposed that uses the encoded data of the compressed moving image itself rather than the pixels. This method performs moving region detection by directly manipulating the various parameters and encoded data obtained during compression. Detection extracts the moving region by selecting only the necessary information, centered on the motion prediction information held by the coding unit blocks in the image. The resolution of the extracted region is therefore the size of the coding unit block that carries motion prediction information in the compression scheme concerned.
[0005]
Further, as a conventional moving area tracking method (hereinafter, referred to as a third detection method), there is a technique of using past detection results in detection processing of future frames along with detection processing. This technique is effective for both the first and second detection methods. In the first detection method, the moving area is estimated from the area of the target object in the previous frame and the velocity of the optical flow. In the second detection method, the moving area is estimated based on the motion prediction information. In these methods, a predicted future position is estimated for each region from the trajectory of the moving region of the past several frames, and the estimation result is used for the detection processing determination of the next frame.
[0006]
[Problems to be solved by the invention]
However, the first to third detection methods have the following problems. A moving image has a very wide signal bandwidth and requires an enormous amount of data, so moving images are generally recorded and transmitted in compressed form. In the first detection method, moving region detection in the pixel domain can begin only after a frame has been decoded, so the large cost of decoding the moving image is incurred on top of the cost of the detection itself.
[0007]
The second detection method, by contrast, decides whether a region is moving from the presence or absence of the motion prediction information used for motion compensation. The decoding process can therefore be omitted and detection runs at high speed. In real moving images, however, factors such as glints and scattering of ambient light or atmospheric fluctuation can cause the encoder to assign motion vectors to changes in what is actually a stationary background. The effect is especially pronounced in regions of uniform pixel values, and detection accuracy suffers, for example when a still region is mistaken for a moving region.
[0008]
Further, in the third detection method, there is a method of obtaining the trajectory of the moving object from the trajectory of the center of gravity and the average of the intra-region motion vectors in the tracking process in the encoded data region. However, in the locus of the center of gravity, if the moving object is small, the center of gravity may fluctuate greatly with a slight increase or decrease in the number of blocks. In addition, in the case of a simple motion vector average, there is a problem that it is also affected by information with low reliability such as motion prediction information in a flat part.
[0009]
An object of the present invention is to solve the above problems of the prior art and to provide a moving object detection and tracking apparatus that can detect and track moving objects at high speed and with high accuracy from the compressed encoded data itself, or from information obtained by decoding only a part of it.
[0010]
[Means for Solving the Problems]
To achieve the above object, the present invention provides an apparatus for detecting and tracking moving objects in a moving image, which receives compressed moving image data as input and outputs that data with moving region information added, the apparatus comprising: a variable length decoding unit that partially decodes the compressed moving image data and outputs coding mode information, prediction error information, and motion prediction information; a detection target setting unit that sets the blocks to be subjected to moving object detection on the basis of the coding mode information and the position information of the motion-predicted region; a moving region detection unit that detects whether each block set by the detection target setting unit belongs to a moving region; and a region motion prediction unit that predicts the motion of the entire moving region from the moving region detected by the moving region detection unit. The moving region detection unit detects, as a moving region, blocks that contain both prediction error information and motion prediction information, or blocks encoded using intra-frame information only. The region motion prediction unit takes a weighted average of the motion compensation information within the moving region, weighted according to the power of the prediction error information, uses this weighted average as the motion prediction for the entire moving region, and provides it to the detection target setting unit.
[0011]
According to the present invention, the compression-encoded moving image data is only partially decoded, the blocks subjected to moving object detection are set on the basis of the position information of the motion-predicted region, and only those blocks are tested for membership in a moving region. The processing cost is therefore far lower not only than that of the conventional pixel-domain detection method (the first detection method) but also than that of the detection method in the coded-data domain (the second detection method).
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a configuration of an embodiment of the present invention.
[0013]
The variable length decoding unit 1 partially decodes the compressed encoded data of the input moving image, and outputs encoding mode information p, prediction error information a, and motion prediction information b of the current frame. The encoding mode information p is sent to the detection target setting unit 2 and the moving region detection processing unit 3, and the prediction error information a and the motion prediction information b are sent to the moving region detection processing unit 3.
[0014]
The coding mode information p comprises a frame coding mode and a block coding mode. There are three frame coding modes: intra-coded frame (I picture), forward-predicted frame (P picture), and bidirectionally predicted frame (B picture). There are four block coding modes, classified by the presence or absence of motion compensation and of coded data: intra-coded block (Intra), motion-compensated coded block (MC coded), frame-difference coded block (no MC coded), and motion-compensated block (MC no coded). There are three types of motion prediction: forward, backward, and bidirectional.
[0015]
Next, the detection target setting unit 2 receives the coding mode information p from the variable length decoding unit 1 and the motion-predicted position information q of the region from the region motion prediction unit 4, and outputs the detection target block position information r and detection results. The function of the detection target setting unit 2 will be described with reference to the flowchart of FIG. 2 and to FIG. 3. The detection target setting unit 2 performs its processing using the block coding mode contained in the coding mode information p.
[0016]
The detailed function of the region motion prediction unit 4 will be described later. Suppose that the moving region C provided from the detection result memory 5 consists of blocks B1 to B5 in FIG. 3(a). The region motion prediction unit 4 compares region C with the current block data and, using the motion prediction MV of region C, outputs the motion-predicted position information q of the region as shown in FIG. 3(b).
[0017]
In step S2 of FIG. 2, each block in FIG. 3(b), for example, is examined in turn to determine whether it lies in the inner part of the motion-predicted position q of the region, that is, whether it is completely covered by the predicted position q. If so, the block is judged to be a moving object on the grounds of the temporal continuity of the moving region (step S10), and processing proceeds to step S8, where the detection target flag is left unset. For example, block Bm in FIG. 3(b) is judged to be a moving object. If not, processing proceeds to step S3, where it is determined whether the block lies on the periphery of the predicted position q, that is, whether it is only partly covered by q. For example, blocks Bn, Bn+1, and so on in FIG. 3(b) are judged to be on the periphery, whereas block Bs and the like are judged not to be.
[0018]
If this determination is affirmative, the process proceeds to step S4. In step S4, the possibility of the moving area is determined using the encoding mode information, and it is determined whether or not the detection target is appropriate. Intra blocks existing in P and B pictures often appear when motion compensation cannot maintain sufficient image quality, such as when a background hidden behind a moving area appears, and are therefore subject to detection.
[0019]
On the other hand, a no MC coded block has no motion prediction information, so it is unlikely to be a moving region; it is excluded from detection and determined to be a still region. An MC no coded block has motion prediction information representing motion but no prediction error information. The absence of prediction error information indicates either that the block is a flat region where the vector is unreliable, or that it is a completely stationary region that needs no prediction error information. MC no coded blocks, lacking prediction error information and having unreliable motion prediction information, are therefore also excluded from detection and determined to be still regions. An MC coded block has both motion prediction information and prediction error information, so it is a detection target.
[0020]
When the determination in step S4 is affirmative, processing proceeds to step S5 and the detection target flag is set for the block. When the determination in step S3 or step S4 is negative, the latter meaning that the block coding mode is MC no coded or no MC coded, processing proceeds to step S6 and the block is determined to be a still region. In step S7, the detection target flag is left unset for the block.
[0021]
In step S9, it is determined whether or not the detection target process has been completed for all the blocks of one frame. If this determination is negative, the process returns to step S1 to perform the process for the next block.
[0022]
Accordingly, in FIG. 3(b), blocks Bn, Bn+1, and so on are judged to lie on the periphery of the motion-predicted position q of the region, and if their block coding mode is MC coded or Intra, the detection target flag is set for them. These blocks are sent to the moving region detection processing unit 3 as moving region detection targets in the current frame. Specifically, the position information r of the blocks flagged in step S5 is sent to the moving region detection processing unit 3. Blocks that the detection target setting unit has already determined to be moving or still regions are sent to the detection result display unit and the detection result memory.
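The decision flow of the detection target setting unit (steps S2 to S8) can be summarized in a short sketch. The `Mode` enum, the coverage labels `"full"`, `"partial"`, and `"none"`, and the returned `(label, needs_detection)` pair are hypothetical names introduced only for this illustration.

```python
from enum import Enum

class Mode(Enum):
    INTRA = "Intra"
    MC_CODED = "MC coded"
    NO_MC_CODED = "no MC coded"
    MC_NO_CODED = "MC no coded"

def classify_block(coverage, mode):
    """Return (label, needs_detection) for one block.

    coverage: how the predicted region position q covers the block:
              "full" (inner part), "partial" (periphery) or "none".
    mode:     the block coding mode.
    """
    if coverage == "full":
        # Step S2 affirmative: moving by temporal continuity (S10, S8).
        return ("moving", False)
    if coverage == "partial" and mode in (Mode.INTRA, Mode.MC_CODED):
        # Steps S3 and S4 affirmative: flag for detailed detection (S5).
        return ("undecided", True)
    # Step S3 or S4 negative: still region, no flag (S6, S7).
    return ("still", False)
```

Only peripheral Intra and MC coded blocks are handed to the moving region detection processing unit, which is what keeps the per-frame workload small.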
[0023]
Next, the function of the moving region detection processing unit 3 will be described with reference to FIG. 4. The moving region detection processing unit 3 receives the coding mode information p, the prediction error information a, and the motion prediction information b from the variable length decoding unit 1, and the detection target block position information r from the detection target setting unit 2.
[0024]
In step S11, it is determined whether detection target block position information r has been input. If so, processing proceeds to step S12, the DCT coefficient determination, in which the DCT coefficients are used to determine whether the block qualifies as a moving region candidate. If the block is judged to be a moving region candidate, processing proceeds to step S13; otherwise it proceeds to step S17. In step S13, it is determined whether the macroblock type of the block is MC coded. If so, processing proceeds to step S14; if not, to step S15.
[0025]
In step S14, it is determined from the motion vector whether the block qualifies as a moving region candidate. If so, processing proceeds to step S15; if the block is instead judged to be a still region, processing proceeds to step S17. In step S15, it is determined from spatial correlation whether the block qualifies as a moving region candidate. If so, processing proceeds to step S16 and the block is finally determined to be a moving region candidate. If not, processing proceeds to step S17 and the block is determined to be a still region.
[0026]
In step S18, it is determined whether or not the above-described processing has been completed for all the moving area detection target blocks. If this determination is negative, the processing returns to step S11 and the same processing is repeated.
[0027]
The results obtained by the above processing are, after the region shape is refined as necessary, output to the detection result display unit 6 and at the same time output to and stored in the detection result memory 5.
[0028]
Next, details of step S12 based on the motion prediction error information will be described. Here, DCT coefficients are used as motion prediction error information. A DCT coefficient can be used as information representing the shape or pattern of a moving object. However, when the DCT coefficient of the block itself is used, all regions having a complex texture are extracted regardless of whether or not there is movement. The DCT coefficients of motion prediction errors have only a few DCT coefficients when the background with a complex pattern is stationary because motion prediction compensates for most of the information. Therefore, when the motion prediction error amount is large, it can be set as a moving region candidate.
[0029]
If this determination is satisfied, the process proceeds to the motion vector correlation determination unit. Otherwise, the block is a still area. As a specific determination method using DCT coefficients, threshold processing on the number of DCT coefficients equal to or greater than a threshold, on the sum of absolute values of the DCT coefficients, or on their sum of squares can be used. The following method can also be used. As shown in FIG. 5A, the DCT coefficient partial sum determination uses, as its criterion, a partial sum of the low-frequency AC components excluding the DC component, and narrows the candidate areas down to those having significant error information. The low-frequency AC components here are the DCT coefficients from the 1st to the thr_1-th in the coding scan order, such as zigzag scan, of the prediction error information. Each AC component is a motion-compensated difference value C_{i,j,z} (0 ≤ z ≤ 63) in block j of frame i. It is then determined whether or not expression (1) in FIG. 11 is satisfied. The threshold thr_2 used for the determination is set for each frame coding mode.
[0030]
When this determination is affirmative, the block is an area having significant error information, so the process proceeds to step S32 and the block is set as a moving area candidate. When the determination is negative, the process proceeds to step S33 and the block is determined to be a still area. As noted above, the AC components are motion-compensated difference values, and the threshold is set for each frame coding mode.
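The partial sum determination can be illustrated with a short sketch. Expression (1) appears only in FIG. 11 and is not reproduced in the text, so the form used here (sum of absolute values of the 1st to thr_1-th coefficients compared against thr_2 with "greater than or equal") is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def low_freq_ac_partial_sum(coeffs: Sequence[int], thr1: int) -> int:
    """Sum |C_z| for z = 1..thr1, with coeffs given in zigzag scan order.

    coeffs[0] is the DC component and is excluded from the sum.
    """
    return sum(abs(c) for c in coeffs[1:thr1 + 1])


def is_candidate_by_dct(coeffs: Sequence[int], thr1: int, thr2: int) -> bool:
    """Assumed form of expression (1): the partial sum reaches thr2."""
    return low_freq_ac_partial_sum(coeffs, thr1) >= thr2
```

For instance, with prediction-error coefficients `[100, 5, -3, 2, 0, ...]` and `thr1 = 3`, the partial sum is 10 and the large DC value does not contribute.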
[0031]
Next, details of the motion vector determination in step S14 will be described with reference to FIGS. 6(a) to 6(d). In step S41, it is determined whether or not the length of the motion vector (MV) of the block is equal to or greater than a predetermined threshold (see FIG. 6(b)). A short motion vector is regarded as a noise vector caused by various disturbances, so a block with such a short motion vector is treated as a still region.
[0032]
If the motion vector length is equal to or greater than the threshold, the process proceeds to the spatial determination unit. Here, the motion vector of a moving region is roughly proportional to the distance between the reference frame and the input frame. Therefore, moving region candidates can be determined by threshold processing that depends on the inter-frame distance. For example, the threshold for the vector length is changed according to the temporal distance to the reference frame, and is reset to the initial threshold when an I picture or P picture is output. Let the forward and backward motion vectors of block j in frame i be fmv_{i,j} and bmv_{i,j}, respectively, the distance from the input frame to the forward reference frame be u_i, the P picture interval be m, and the initial threshold be thr_3. Then discriminants such as expressions (2) and (3) below can be used to determine the moving region candidates, where |·| denotes the Euclidean norm.
|fmv_{i,j}| ≥ u_i · thr_3   (2)
|bmv_{i,j}| ≥ (m − u_i) · thr_3   (3)
[0033]
When the input block is bidirectionally predicted, the lengths of both the forward and backward prediction vectors are verified. If both expressions are satisfied, the block is set as a moving area candidate; otherwise it is determined to be a background area. In the case of unidirectional prediction, only the forward or the backward prediction vector is verified. When the condition is satisfied, the motion vector is judged to be sufficiently long and the process proceeds to the next determination; otherwise the block is determined to be a still area.
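The length test of expressions (2) and (3) can be sketched as below. The function name and the use of `None` for a missing prediction direction are illustrative assumptions; vectors are taken as (x, y) pairs.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]


def passes_length_test(fmv: Optional[Vec], bmv: Optional[Vec],
                       u_i: int, m: int, thr3: float) -> bool:
    """Check expressions (2) and (3) for whichever vectors the block carries.

    fmv / bmv are None when the block has no forward / backward vector.
    A bidirectional block must satisfy both expressions.
    """
    checks = []
    if fmv is not None:
        checks.append(math.hypot(fmv[0], fmv[1]) >= u_i * thr3)        # expression (2)
    if bmv is not None:
        checks.append(math.hypot(bmv[0], bmv[1]) >= (m - u_i) * thr3)  # expression (3)
    # A block without any motion vector cannot pass the test.
    return bool(checks) and all(checks)
```

Scaling the threshold by u_i and (m − u_i) reflects the observation above that a genuine motion vector grows roughly in proportion to the distance to its reference frame.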
[0034]
Next, in step S42, it is determined whether or not the block is a bidirectional prediction block. If this determination is affirmative, the process proceeds to step S43, and if negative, to step S44. A bidirectional prediction block contains two motion prediction vectors. Assuming that the motion of a moving region over a short time is linear, the two motion prediction vectors point in opposite directions. Therefore, in step S43, the angle between the forward and backward vectors is examined. The directions of the two vectors are compared using the normalized inner product, as in expression (4) below, where thr_4 is the threshold on the angle formed by the two vectors. If expression (4) is satisfied, the correlation between the motion vectors is judged to be high and the process proceeds to the temporal correlation determination. Otherwise, the block is determined to be a still area.
(fmv_{i,j} · bmv_{i,j}) / (|fmv_{i,j}| · |bmv_{i,j}|) ≤ cos(thr_4)   (4)
[0035]
In step S43, for bidirectional prediction, it is determined whether or not the angle θ between the forward and backward vectors is equal to or greater than a predetermined threshold (see FIG. 6(c)). Blocks at or above the threshold are determined to be moving region candidates, and blocks below it are determined to be still regions.
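The direction test of expression (4) can be sketched as follows. The function name is hypothetical, the threshold is assumed to be given in radians, and treating a zero-length vector as failing the test is an assumption made here because such a vector has no direction.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def vectors_oppose(fmv: Vec, bmv: Vec, thr4_rad: float) -> bool:
    """Expression (4): cos(theta) <= cos(thr4), i.e. theta >= thr4 on [0, pi]."""
    nf = math.hypot(fmv[0], fmv[1])
    nb = math.hypot(bmv[0], bmv[1])
    if nf == 0.0 or nb == 0.0:
        return False  # assumption: no direction available, treat as still
    cos_theta = (fmv[0] * bmv[0] + fmv[1] * bmv[1]) / (nf * nb)
    return cos_theta <= math.cos(thr4_rad)
```

Because the cosine decreases monotonically between 0 and π, comparing cosines avoids computing the angle itself while still testing that the angle is at least thr_4.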
[0036]
Next, in step S44, it is determined whether or not the temporal change of the motion vector MV is proportional to the inter-frame distance (see FIG. 6(d)). Assuming that the movement of the object is locally linear, a block is selected as a moving region candidate when there is temporal correlation between its own motion prediction information and the motion prediction information of the region it refers to. For a bidirectional prediction block, the temporal correlation between the forward and backward prediction vectors and the motion vectors between the reference frames is examined, and a block with temporal correlation is selected as a moving region candidate.
[0037]
If the motion vector shows no temporal correlation, the block is determined to be a still area. Otherwise, it is determined to be a moving area as far as the motion vector information is concerned. An example of the temporal correlation determination is shown in FIG. 6(d), which uses the fact that motion vectors pointing to the same moving area in the reference frame are proportional to the distance between frames. For block j of the current frame i, let l be the comparison target block in the comparison frame k. Expressions (5) and (6), bounded by the threshold thr_5, are used as discriminants for temporal correlation. Block l is described below; blocks j and l correspond to the same moving object in their respective frames. When the input block is bidirectionally predicted, the temporal correlation is verified for both the forward and backward prediction vectors. If both expressions are satisfied, the block is set as a moving area candidate; otherwise it is a still area. In the case of unidirectional prediction, the determination uses only the forward or the backward prediction vector.
|u_k · fmv_{i,j} − u_i · fmv_{k,l}| ≤ thr_5   (5)
|(m − u_k) · bmv_{i,j} − (m − u_i) · bmv_{k,l}| ≤ thr_5   (6)
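A minimal sketch of the temporal correlation discriminant follows. One hypothetical helper covers both expressions: for the forward case pass d_i = u_i and d_k = u_k, and for the backward case pass d_i = m − u_i and d_k = m − u_k. The second vector is assumed to belong to the comparison block l in frame k.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def temporally_correlated(mv_ij: Vec, mv_kl: Vec,
                          d_i: float, d_k: float, thr5: float) -> bool:
    """Test |d_k * mv_ij - d_i * mv_kl| <= thr5 using the Euclidean norm.

    If both vectors follow the same linear motion, each is proportional to
    its own reference distance, so the cross-scaled difference is near zero.
    """
    dx = d_k * mv_ij[0] - d_i * mv_kl[0]
    dy = d_k * mv_ij[1] - d_i * mv_kl[1]
    return math.hypot(dx, dy) <= thr5
```

As an example, a vector (4, 2) at reference distance 2 and a vector (2, 1) at reference distance 1 describe the same velocity, so the scaled difference vanishes and the test passes for any non-negative threshold.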
[0038]
There is some freedom in selecting the comparison frame k. For example, frame k is the frame (i−1) immediately before frame i. However, if frame k is an I picture or a P picture, the reference frame of frame i differs from that of frame k, so the future reference frame (i−1+m) is used as the comparison frame k. The comparison target block l in frame k is the block located at the point that internally divides the current motion vector by the ratio of the distances to the reference frame, as shown in the figure. If the discriminant is satisfied, the ratio of the motion vector lengths of block j and block l is close to the ratio of their distances to the reference frame, and the motion prediction information is judged to be highly reliable. Otherwise, the motion prediction information is judged to have low reliability, and the block is treated as a still area.
[0039]
In a coding scheme without B pictures, each motion vector in consecutive P pictures is similar in magnitude to the motion prediction information of the immediately preceding reference frame. In such a case, the correlation with the motion vector of the immediately preceding P picture can be examined. For example, the correlation is examined using the absolute difference between the motion vector of the previous P picture and that of the current block, as in expression (7) below.
|fmv_{i,j} − fmv_{i−1,l}| ≤ thr_6   (7)
[0040]
Next, details of the spatial correlation determination in step S15 will be described with reference to the drawings. FIG. 7A is a flowchart of the same-region generation unit, in which the spatial correlation is examined. The input information consists of coding mode information, motion prediction information, prediction error information, and moving area candidate information. The input information is first passed to the same-region generation unit, which forms the same regions.
[0041]
In the coding mode determination (step S51) in FIG. 7(a), it is examined whether the block is MC coded or Intra. For MC coded, the process proceeds to the vector length determination (step S52); for Intra, it proceeds to the DC determination (step S54). In the motion vector length determination, the similarity of motion vector lengths with neighboring moving region candidate blocks is determined by expressions (8) and (9). When the input block is bidirectionally predicted, the lengths of both the forward and backward prediction vectors are verified. If both expressions are satisfied, the process proceeds to the motion vector angle determination (step S53); otherwise the blocks are determined not to be in the same area (step S56). In the case of unidirectional prediction, only the forward or the backward prediction vector is verified: if the difference between the vector lengths of block j and block n is equal to or smaller than the threshold, the process proceeds to the motion vector angle determination (step S53); otherwise the blocks are determined not to be in the same area (step S56). The neighboring blocks n are, for example, the eight blocks surrounding the input block j.
|fmv_{i,j}| − |fmv_{i,n}| ≤ thr_7   (8)
|bmv_{i,j}| − |bmv_{i,n}| ≤ thr_7   (9)
[0042]
In the motion vector angle determination (step S53), the similarity of vector angles with neighboring moving region candidate blocks is determined by expressions (10) and (11). When the input block is bidirectionally predicted, the angles are verified for both the forward and backward prediction vectors. In the case of unidirectional prediction, only the forward or the backward prediction vector is verified. If the angle formed by the vectors of block j and block n is equal to or smaller than the threshold, the process proceeds to the DC component determination (step S54). Otherwise, the blocks are determined not to be in the same area.
(fmv_{i,j} · fmv_{i,n}) / (|fmv_{i,j}| · |fmv_{i,n}|) ≤ cos(thr_8)   (10)
(bmv_{i,j} · bmv_{i,n}) / (|bmv_{i,j}| · |bmv_{i,n}|) ≤ cos(thr_8)   (11)
[0043]
In the DC component determination (step S54), the similarity of the DC components of the DCT coefficients is determined. This similarity determination uses the original DCT coefficients, not the difference values after motion compensation. The DC component is obtained by adding the DC component C_{k,l,0} of the reference block l in reference frame k and the prediction error DC component C_{i,j,0}. If the difference from the DC component of the neighboring block n, given by expression (12), is equal to or smaller than the threshold thr_9, the blocks are determined to be in the same region (step S55); otherwise they are determined not to be in the same region. Here, the block s is a neighboring block of the block n.
|(C_{i,j,0} + C_{k,l,0}) − (C_{i,n,0} + C_{k,s,0})| ≤ thr_9   (12)
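The pairwise similarity test of steps S52 to S54 for one prediction direction can be sketched as follows. Several points are assumptions of this sketch rather than statements of the embodiment: the length test uses the absolute difference of the two lengths, the angle test follows the prose criterion (angle at most thr_8) and is evaluated on the angle itself, zero-length vectors skip the angle test, and the DC arguments are the already reconstructed values (prediction error DC plus reference block DC). The function name is hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def same_region(mv_j: Vec, mv_n: Vec, dc_j: float, dc_n: float,
                thr7: float, thr8_rad: float, thr9: float) -> bool:
    """Return True when blocks j and n are similar in length, angle, and DC."""
    len_j = math.hypot(mv_j[0], mv_j[1])
    len_n = math.hypot(mv_n[0], mv_n[1])
    if abs(len_j - len_n) > thr7:              # S52: vector length similarity
        return False
    if len_j > 0.0 and len_n > 0.0:            # S53: vector angle similarity
        cos_t = (mv_j[0] * mv_n[0] + mv_j[1] * mv_n[1]) / (len_j * len_n)
        cos_t = max(-1.0, min(1.0, cos_t))     # guard against rounding drift
        if math.acos(cos_t) > thr8_rad:
            return False
    return abs(dc_j - dc_n) <= thr9            # S54: reconstructed DC similarity
```

In cosine form, the prose criterion "angle at most thr_8" corresponds to cos θ ≥ cos(thr_8), since the cosine decreases between 0 and π.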
[0044]
Among all the moving area candidate blocks, blocks with sufficient similarity are determined to belong to the same moving area and are combined with each other (step S57). The combined result is output as same-moving-area information.
[0045]
The same-area information obtained from FIG. 7A is input to the spatial correlation determination of FIG. 10 (step S61). This determination verifies the spatial correlation of the moving region. A region formed by fewer blocks than a threshold consists of isolated blocks with no similar blocks nearby. Therefore, if the number of blocks in the region is less than the threshold, the spatial correlation is judged to be poor (No in step S62) and the region is set as a still area (step S64). Otherwise (Yes in step S62), it is determined to be a moving area candidate (step S63).
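Combining similar neighboring blocks (step S57) and discarding small isolated regions (steps S62 to S64) can be sketched as a connected-component search over the eight surrounding blocks. This is an illustrative sketch only: the grid coordinates, the `similar` callback, and the function names are hypothetical, and the text does not prescribe a particular grouping algorithm.

```python
from collections import deque
from typing import Callable, List, Set, Tuple

Pos = Tuple[int, int]  # (row, column) of a block in the frame


def group_regions(candidates: Set[Pos],
                  similar: Callable[[Pos, Pos], bool]) -> List[Set[Pos]]:
    """Merge candidate blocks into regions through similar 8-neighbors."""
    unvisited = set(candidates)
    regions: List[Set[Pos]] = []
    while unvisited:
        seed = unvisited.pop()
        region = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    # The block itself is never in `unvisited` at this point.
                    if nb in unvisited and similar((r, c), nb):
                        unvisited.remove(nb)
                        region.add(nb)
                        queue.append(nb)
        regions.append(region)
    return regions


def drop_isolated(regions: List[Set[Pos]], min_blocks: int) -> List[Set[Pos]]:
    """Keep regions with at least min_blocks blocks; the rest become still areas."""
    return [region for region in regions if len(region) >= min_blocks]
```

With candidates at (0,0), (0,1), (1,1), and (5,5) and a minimum size of 2, the three touching blocks survive as one region and the lone block at (5,5) is treated as a still area.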
[0046]
When the processing for all the moving area detection target blocks is completed (step S65), the detection result is output to the detection result display unit 6 and, at the same time, is output and recorded so that it can be used for detection in future frames.
[0047]
For example, when each block has a motion vector as indicated by the arrows in FIG. 7B, the correlation of the motion vector length and angle and of the DCT coefficient DC component with adjacent blocks is examined for the moving region candidate blocks B1 to B5, and the degree of connection with neighboring blocks is calculated. Based on this, blocks whose motion vectors have similar angles and lengths are integrated into the same region, and the moving object region is determined from the result.
[0048]
Next, the function of the region motion prediction unit 4 (see FIG. 1) will be described. The region motion prediction unit 4 performs motion prediction for each region and predicts where the region will appear in a future frame. Assuming that the shape of a moving object changes little over a short time, the moving object determination result of the immediately preceding frame is used to track the moving object.
[0049]
FIGS. 8A, 8B, and 8C show a method for predicting the motion of an entire region obtained by the detection processing. The input information to the region motion prediction unit 4 (see FIG. 1) is the configuration information of the moving region output by the moving region detection processing unit 3, as shown in the figure.
[0050]
The motion of each region is estimated as follows. The motion prediction for the entire region reflects the power of the DCT coefficients, which represents the texture. In a flat region without texture it is difficult for a motion vector to capture the motion of the object accurately, so the power of the DCT coefficients can serve as a measure of the reliability of the motion vector. Therefore, based on the prediction error information in the input, the motion vectors are averaged with weights according to the power of the DCT coefficients to obtain the motion of the entire region (step S71). The weight calculation unit can express the power used for the weights w_{i,j} by, for example, the sum of absolute values of the DCT coefficients, their sum of squares, or the number of coefficients equal to or greater than a threshold. For example, expression (13) in FIG. 11 is a specific formula that uses the sum of absolute values of DCT coefficients as the weighting coefficient. Here, thr_1 is the threshold used in expression (1) to separate the low and high frequencies of the DCT coefficients.
[0051]
The weight information obtained by the weight calculation unit is passed to the motion vector calculation unit, which computes the weighted average of the motion vectors using the weighting coefficients w_{i,j} (step S72). When the motion prediction vectors for an entire moving region R are denoted fmv_R and bmv_R, they are obtained by expressions (14) and (15) in FIG. 11, respectively. The result is passed to the scaling unit.
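The weighting and averaging of steps S71 and S72 can be sketched as below. Expressions (13) to (15) appear only in FIG. 11, so this sketch assumes a plain sum of absolute values over all coefficients as the weight and an ordinary weighted mean; how thr_1 enters expression (13) is not assumed here. The function names and the zero-weight fallback are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def dct_weight(coeffs: Sequence[float]) -> float:
    """Assumed weight w_{i,j}: sum of absolute prediction-error DCT coefficients."""
    return float(sum(abs(c) for c in coeffs))


def region_motion(vectors: Sequence[Vec], weights: Sequence[float]) -> Vec:
    """Weighted average of the block motion vectors of one moving region R."""
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # assumption: no reliable vector, report no motion
    x = sum(w * v[0] for v, w in zip(vectors, weights)) / total
    y = sum(w * v[1] for v, w in zip(vectors, weights)) / total
    return (x, y)
```

The same function would be applied once to the forward vectors to obtain fmv_R and once to the backward vectors to obtain bmv_R, so blocks with little texture contribute little to the region's motion.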
[0052]
In the scaling unit, the motion prediction vector of the entire region is scaled by the inter-frame distance and used as motion prediction position information indicating the location of the region in the next frame (step S73). When the weighted average bmv_R of the backward motion prediction information exists in the moving region R, the prediction vector mv_R for the next frame is obtained from expression (16). When the moving region R has no backward motion prediction information, or in the case of a P picture, the weighted average fmv_R of the forward motion prediction information is inverted and mv_R is obtained from expression (17). For an I picture, the motion of the immediately preceding frame is retained. The motion prediction information for each region is output to the aforementioned detection target setting unit.
mv_R = bmv_R / (m − u_i)   (16)
mv_R = −fmv_R / u_i   (17)
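The scaling of step S73 can be sketched as follows. The function name, the use of `None` for missing region vectors, and the `previous` argument that carries the retained motion for an I picture are illustrative assumptions.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float]


def next_frame_vector(fmv_r: Optional[Vec], bmv_r: Optional[Vec],
                      u_i: int, m: int,
                      previous: Vec = (0.0, 0.0)) -> Vec:
    """Per-frame motion of region R, following expressions (16) and (17)."""
    if bmv_r is not None:
        d = m - u_i                          # distance to the backward reference
        return (bmv_r[0] / d, bmv_r[1] / d)  # expression (16)
    if fmv_r is not None:
        return (-fmv_r[0] / u_i, -fmv_r[1] / u_i)  # expression (17)
    return previous                          # I picture: keep the previous motion
```

Dividing by the reference distance converts a vector spanning several frames into a one-frame displacement, and negating the forward vector turns "where the region came from" into "where it is heading".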
[0053]
[Effects of the invention]
As is apparent from the above description, according to the present invention, the compression-coded moving image data is only partially decoded and, in addition, the moving region detection processing is limited to blocks having both motion prediction information and prediction error information. The processing cost can therefore be greatly reduced not only in comparison with the conventional pixel-domain detection method (the first detection method) but even in comparison with the detection method in the coded data domain (the second detection method).
[0054]
Further, since the detection processing targets are limited based on past detection results, the processing cost can be reduced effectively. In addition, because the past processing results maintain temporal continuity, the detected region keeps a natural shape when the detection result is checked by overlaying it on the reproduced image.
[0055]
Further, according to the present invention, the motion of a previously detected moving region is predicted from a weighted motion prediction vector that takes the reliability of the motion prediction information into account, so the moving region can be tracked reliably. As a result, a moving object can be detected without loss of detection accuracy even when the numbers of blocks and frames examined are greatly reduced. In other words, because the method of the present invention minimizes the range over which the detection determination is applied, it further exploits the speed of moving region extraction from compression-coded data.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing an operation of a detection target setting unit in FIG. 1.
FIG. 3 is an explanatory diagram of an operation of a detection target setting unit in FIG. 1.
FIG. 4 is a flowchart showing an operation of a moving area detection processing unit in FIG. 1.
FIG. 5 is a detailed explanatory diagram of step S12 of FIG.
FIG. 6 is a detailed explanatory diagram of step S14 in FIG.
FIG. 7 is a detailed explanatory diagram of step S15 in FIG.
FIG. 8 is a detailed explanatory diagram of a region motion prediction unit in FIG. 1.
FIG. 9 is a schematic block diagram showing a configuration of a conventional apparatus.
FIG. 10 is another detailed explanatory diagram of step S15 of FIG.
FIG. 11 is a diagram illustrating a mathematical expression.
[Explanation of symbols]
1 ... variable length decoding unit, 2 ... detection target setting unit, 3 ... moving region detection processing unit, 4 ... region motion prediction unit, 5 ... detection result memory, 6 ... detection result display unit

Claims

A moving object detection and tracking device in a moving image, which receives compressed moving image data as input, adds moving region information to the moving image data, and outputs the result, the device comprising:
a variable length decoding unit that partially decodes the compressed moving image data and outputs coding mode information, prediction error information, and motion prediction information;
a detection target setting unit that sets detection processing target blocks for a moving object based on the coding mode information and position information of a motion-predicted region;
a moving region detection unit that detects whether or not a detection processing target block set by the detection target setting unit belongs to a moving region; and
a region motion prediction unit that predicts the motion of the entire moving region from the moving region detected by the moving region detection unit,
wherein the moving region detection unit detects, as a moving region, a block containing both the prediction error information and the motion prediction information, or a block coded using only intra-frame information, and
wherein the region motion prediction unit takes a weighted average of the motion compensation information in the moving region according to the power of the prediction error information, uses the weighted average as the motion prediction of the entire moving region, and provides the motion prediction to the detection target setting unit.

The moving object detection and tracking device in a moving image according to claim 1, wherein any one of the number of DCT coefficients having a value equal to or greater than a threshold in the moving region, the sum of absolute values of the DCT coefficients, their sum of squares, and their partial sum is selectively used as the power expression of the prediction error information.

The moving region detection and tracking device in a moving image according to claim 1, wherein the position information of the motion-predicted region is motion prediction position information of the region predicted by the region motion prediction unit, and the detection target setting unit determines the inner edge portion of the motion prediction position of the region to be a moving object.

The moving object detection and tracking device in a moving image according to claim 1, wherein the position information of the motion-predicted region is motion prediction position information of the region predicted by the region motion prediction unit, and the detection target setting unit sets the outer peripheral portion of the motion prediction position of the region as a detection target for the moving region.

The moving region detection and tracking device in a moving image according to claim 1, wherein the position information of the motion-predicted region is motion prediction position information of the region predicted by the region motion prediction unit, and the detection target setting unit sets, as a detection target for the moving region, a predicted position that lies in the outer peripheral portion of the motion prediction position of the region predicted by the region motion prediction unit and that corresponds to predetermined block coding information in the coding mode information.

The moving region detection and tracking device in a moving image according to any one of claims 1 to 5, wherein the region motion prediction unit performs scaling according to the temporal distance between frames on the basis of the motion information of the entire motion-predicted region, and then tracks the entire region.