JP3784086B2 - Video signal encoding / decoding device and encoding / decoding method

Info

Publication number: JP3784086B2
Application number: JP29129194A
Authority: JP
Inventors: 正加瀬沢; 喜子幡野; 隆篠原; 幸治岡崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1994-11-25
Filing date: 1994-11-25
Publication date: 2006-06-07
Anticipated expiration: 2021-06-07
Also published as: JPH08149481A

Description

【０００１】
【産業上の利用分野】
この発明は、動き補償予測を用いて映像信号の符号化・復号化を行う映像信号符号化・復号化装置及び符号化・復号化方法に関するものである。
【０００２】
【従来の技術】
映像信号符号化・復号化装置における符号化手段として、動き補償予測とＤＣＴ（離散コサイン変換）を併用したものがよく使われている。以下に説明する従来例もこれを用いたものである。
【０００３】
図１１乃至図１５は、例えば、ISO/IEC 13818-2 Draft International Standardに示されたような、従来の映像信号符号化装置及び該符号化装置により符号化された映像信号を再生する復号化装置について説明するための図である。
ここに、図１１は映像信号符号化装置の概略構成を示すブロック図、図１２は映像信号復号化装置の概略構成を示すブロック図であり、また、図１３，図１４は映像信号の符号化に際して行われる動き補償予測の概念を示すための概念図、図１５は符号化のためのベクトルコードを示す図（なお、同図ではいくつかの探索範囲におけるベクトルコードのみを示している。）である。
また、図１６は後述する動きベクトルの差分値とその出現確率との関係を示した図である。
【０００４】
一般に、動き補償予測とＤＣＴを用いた符号化では、１枚の画像情報を格子状の複数の小領域（以下、符号化対象領域と呼ぶ。）に分割し、かかる小領域毎に符号化を行う。
この動き補償予測と呼ばれるものは、現在符号化しようとしている画像（以下、符号化対象画像と呼ぶ。）の符号化対象領域に対して、過去に符号化したいくつかの画像（以下、参照画像と呼ぶ。）における最も類似し、かつ符号化対象領域と同じ大きさの領域（以下、予測領域と呼ぶ。）を検出し、当該予測領域と符号化対象領域との差信号のみを符号化して伝送するものである。
【０００５】
この際、どの領域が当該予測領域であるかという情報も同時に復号化手段に伝送することが必要であるが、この情報を動きベクトルと呼び、インタレース画像とノンインタレース画像とでは異なるが、本従来例では、説明上、水平動きベクトルと垂直動きベクトルの２つのベクトルによって構成されるものとする。
この動き補償予測を概念的に示したものが図１４である。
【０００６】
一方、復号化手段では、伝送されてきた上述の動きベクトルと再生された参照画像とから予測領域を検出し、該予測領域の映像信号に伝送されてきた差信号を加えるようにされており、ここに元の符号化対象領域の信号を再生できる。
【０００７】
また、予測領域は、図１３に示すように、参照画像において符号化対象画像の符号化対象領域と同じ水平／垂直位置を中心として水平方向に±ｈ画素、垂直方向に±ｖライン分拡張した領域（以下、探索領域と呼ぶ。）内から選択される。
一般に、動きの速い映像に対して符号化効率を上げるためにはその探索領域を広げることが必要である。そこで、従来の装置ではこの探索領域の大きさを適宜選択可能にし得るように構成されている。
【０００８】
では、従来の映像信号符号化装置の具体的な構成について、図１１に基づき説明する。
図において、１ａは映像信号の入力端子、２ａは符号化された映像信号の出力端子、３ａは減算手段、４ａは情報圧縮のため映像信号を水平／垂直の空間周波数に変換するＤＣＴ手段、５ａは量子化手段、６ａは逆量子化手段、７ａは周波数変換された映像信号を元の映像信号に再変換するＩＤＣＴ（逆離散コサイン変換）手段、８ａは加算手段、９ａはメモリ手段、１２ａはスイッチ（切替）手段、１３ａは可変長符号化手段、１４ａは送信バッファ手段、１５ａは符号量制御手段、１８ａは動き検出手段である。
【０００９】
入力端子１ａから入力された映像信号１０１は、動きベクトル生成のため、その一部が動き検出手段１８ａに入力されるとともに、減算手段３ａにおいて予測領域の信号との差信号１０２とされる。
この差信号１０２は、ＤＣＴ手段４ａにおいて周波数変換され、さらに量子化手段５ａによって量子化される。
【００１０】
そして、量子化された差信号１０４の一部は逆量子化手段６ａ及び逆ＤＣＴ手段７ａを介して再変換されて元の差信号とされ、加算手段８ａで予測領域の信号が加算されて元の映像信号となり、メモリ手段９ａに参照画像として蓄えられる。一方、残りの差信号１０４は可変長符号化手段１３ａにおいて、動き検出手段１８ａで生成された動きベクトル１１２とともに符号化され、多重化される。
ここで、可変長符号化とは、出現確率の高いシンボルには短い符号語を、出現確率の低いシンボルには長い符号語を割り当てる符号化手法の一つである。
【００１１】
そして、多重化信号１１４は送信バッファ手段１４ａを経て、出力端子２ａより伝送、あるいは、図示せぬ記録媒体に記録されることになる。
なお、符号量制御手段１５ａは、送信バッファ手段１４ａにおけるメモリ残量等の信号を受けて、オーバーフローが発生しないよう、量子化手段５ａの量子化ステップを適応的に変化させている。
【００１２】
一方、メモリ手段９ａに蓄えられた参照画像はスイッチ手段１２ａの第１の端子に入力されるとともに、動き検出手段１８ａにも入力される。（スイッチ手段１２ａの第２の端子には零信号が入力されている。）
【００１３】
動き検出手段１８ａでは、入力された参照画像１０８、及び、映像信号（符号化対象画像）１０１から符号化対象画像の符号化対象領域毎に、上述したような動きベクトル１１２を検出する（図１３，図１４参照）。
検出された動きベクトル１１２は可変長符号化手段１３ａに送出され、ここで隣接する符号化対象領域の動きベクトルとの差分値が算出され、これを可変長符号化したベクトルコードは、可変長符号化された差信号１０４に多重化される。
【００１４】
また、動き検出手段１８ａの出力１１０は、スイッチ手段１２ａの切替信号としても用いられ、かかる信号に基づき参照画像の映像信号は予測領域の信号１０９に変換されて減算手段３ａ及び加算手段８ａに入力される。さらに、かかる信号１１０は可変長符号化手段１３ａにおいて、差信号１０４と動きベクトルの符号化切替信号としても用いられる。
【００１５】
次に、上述のごとく符号化された映像信号を復号化する映像信号復号化装置の具体的構成について、図１２に基づき説明する。
図において、１ｂは符号化された映像信号の入力端子、２ｂは復号化された映像信号の出力端子、１４ｂは受信バッファ手段、１３ｂは可変長復号化手段、５ｂは逆量子化手段、４ｂはＩＤＣＴ手段、８ｂは加算手段、９ｂはメモリ手段、１２ｂはスイッチ手段である。
【００１６】
入力端子１ｂから入力された符号化映像信号２０１は、受信バッファ手段１４ｂを介して、可変長復号化手段１３ｂに入力される。可変長復号化手段１３ｂではこの符号化映像信号２０２を復号化し、動きベクトル２１３と差信号２０３とに分離する。
分離された差信号２０３は逆量子化手段５ｂで逆量子化され、ＩＤＣＴ手段４ｂで元の差信号２０５に変換される。さらに、この差信号２０５は、加算手段８ｂにおいて予測領域の信号２０８と加算されて元の符号化対象領域の信号２０６に戻され、その一部がメモリ手段９ｂに蓄えられるとともに、元の符号化対象画像として出力端子２ｂから出力される。
【００１７】
一方、メモリ手段９ｂは加算手段８ｂからの符号化対象領域信号２０６と可変長復号化手段で復号化された動きベクトル２１３とから予測領域の信号２０７を生成し、該予測領域信号２０７を可変長復号化手段で生成した動き補償予測のための切替信号２０９に基づいて切り替えられるスイッチ手段１２ｂを介して加算手段８ｂに入力させるように構成されている。なお、スイッチ手段１２ｂの一端には零信号が入力されており、この端子が選択されている場合には動き補償予測のなされていない信号が８ｂより出力される。
【００１８】
【発明が解決しようとする課題】
従来の映像信号符号化・復号化装置は以上のように構成されており、動きの速い映像に対する符号化効率を上げるため、探索領域の大きさを適宜設定することができるように構成されていた。
【００１９】
しかし、図１６に示すように、探索領域の大きさが異なると、それに伴って動きベクトルの差分値の出現確率が異なってくる。このため、可変長符号化手段１３ａにおいて動きベクトルのベクトルコードを作成する際にはその探索領域の大きさに応じてベクトルコードを異ならせることが必要となる。
【００２０】
そこで、従来の装置では、図１５に示すように、motion code（ある定められた可変長コード）と、motion residual（探索範囲に応じて符号長の定められたコード）の２つのコードを組み合わせることによってベクトルコードを作成するように構成されており、これらを組み合わせることで探索領域の大きさに応じた異なるベクトルコードを作成していた。
【００２１】
ここで、動きベクトルの差分値は、図１６に示されるように、その探索領域の大きさにかかわらず、ベクトルの差分値の小さいものの出現確率が高いという特徴を有している。しかし、従来のベクトルコードではこの点を全く考慮していなかったため、図１５に示すように、探索領域が大きくなるにしたがって、ベクトル差分値の小さい値を示すベクトルコードの符号長が長くなるという特徴を持っている。
【００２２】
このことは、動きの速い映像に対して符号化効率を上げるために探索領域を広げているのにもかかわらず、広げたことによりベクトルコードの平均語長が長くなってしまい、その結果、符号化効率が悪化し、画質を劣化させてしまうという問題を生じさせていた。
【００２３】
また、従来の装置では、motion codeと、motion residualの２つのコードを組み合わせることによって探索領域の大きさに応じて異なるベクトルコードを作成するようにしているため、あらゆる探索領域に対して最適なベクトルコードを得るために探索領域の大きさに対応した複数種類のベクトルコードを並列的に持つことが必要となる。従って、ハードウェア／ソフトウェア規模が必然的に大きくならざるを得ず、実用的ではないという問題点もあった。
【００２４】
本発明は、以上述べたような従来装置の問題点を解消するためになされたものであり、探索領域を広げても、符号化効率が劣化しない、また、ハードウェア／ソフトウェア規模を小規模のものとすることができる、映像信号符号化・復号化装置及び符号化・復号化方法を得ることを目的としている。
【００２５】
【課題を解決するための手段】
本発明に係る映像信号符号化装置は、１枚の画像情報に対応する領域を複数の領域に分割した上位符号化対象領域の時間的な動きを示す第１の動きベクトルを検出する第１の動き検出手段と、前記上位符号化領域を複数の領域に分割した符号化対象領域の時間的な動きを示す第２の動きベクトルを検出する第２の動き検出手段と、前記第１および第２の動き検出手段から出力される前記第１および第２の動きベクトルを符号化する可変長符号化手段とを備える。
また、第２の動きベクトルは、第１の動きベクトルで指定される領域を中心とした所定の探索領域内において検出されることを特徴とする。
また、第１および第２の動きベクトルがそれぞれに対応するベクトルコードによってコード化されることを特徴とする。
また、隣接する上位符号化対象領域及び符号化対象領域における第１および第２の動きベクトルの差分値をベクトルコードによりコード化することを特徴とする。
また、第１の動き検出手段は、符号化対象画像の低域成分を出力する低域通過フィルタ手段と、該低域通過フィルタ手段からの出力をサブサンプリングするサブサンプリング手段と、該サブサンプリング手段からのサブサンプリングされた映像信号を参照画像として蓄積するメモリ手段と、該メモリ手段から出力される前記参照画像出力と前記サブサンプリング手段から出力される前記符号化対象画像のサブサンプリング出力とに基づいて第１の動きベクトルを出力する代表ベクトル検出手段とを備える。
【００２６】
また、第ｎ画像及び第ｎ＋ｍ画像の第１の動きベクトルをもとに、当該第ｎ画像及び第ｎ＋ｍ画像の間にある第ｎ＋ｙ画像における過去の画像に対する第１の動きベクトルｖｆと未来の画像に対する第１の動きベクトルｖｂとを下記の式により作成することを特徴とする。
ｖｆ＝ { ｙ／ｍ } ×ｖ
ｖｂ＝ { （ｍ−ｙ）／ｍ } ×（−ｖ）
また、符号化対象画像のうちの一の上位符号化対象領域に対する第１の動きベクトルを他の上位符号化対象領域に対する第１の動きベクトルとすることを特徴とする。
本発明に係る映像信号符号化方法は、１枚の画像情報に対応する領域を複数の領域に分割した上位符号化対象領域の時間的な動きを示す第１の動きベクトルを検出する第１の動き検出工程と、前記上位符号化領域を複数の領域に分割した符号化対象領域の時間的な動きを示す第２の動きベクトルを検出する第２の動き検出工程と、前記第１および第２の動き検出工程によって得られる前記第１および第２の動きベクトルを符号化する可変長符号化工程とを含む。
また、第２の動きベクトルは、第１の動きベクトルで指定される領域を中心とした所定の探索領域内において検出されることを特徴とする。
また、第１および第２の動きベクトルがそれぞれに対応するベクトルコードによってコード化されることを特徴とする。
また、隣接する上位符号化対象領域及び符号化対象領域における第１および第２の動きベクトルの差分値をベクトルコードによりコード化することを特徴とする。
【００２８】
本発明に係る映像信号復号化装置は、１枚の画像情報に対応する領域を複数の領域に分割した上位符号化対象領域の時間的な動きを示す第１の動きベクトルおよび前記上位符号化対象領域を複数の領域に分割した符号化対象領域の時間的な動きを示す第２の動きベクトルを含む符号化映像信号から前記第１および第２の動きベクトルの各々に対応する各ベクトルコードを分離し復号する可変長復号化手段と、該可変長復号化手段から出力される前記第１および第２の動きベクトルコードに対応する前記第１および第２の動きベクトルに基づいて動きベクトルを出力する動きベクトル再生手段と、該動きベクトル再生手段から出力される前記動きベクトルに基づいて前記１枚の画像情報を再生する画像情報再生手段とを備える。
また、第１および第２の動きベクトルをベクトル合成することにより動きベクトルを再生することを特徴とする。
【００３０】
【作用】
本発明によれば、画像情報の大域的な動きを示す第１の動きベクトルと局所的な動きを示す第２の動きベクトルとの組み合わせによって、画像情報の符号化対象領域の動きベクトルを表すようにしているので、第１の動きベクトルの符号量を小さくできるとともに、第２の動きベクトルの検出のために複数の探索領域を設けておく必要がなくなる。
【００３１】
また、本発明によれば、第１の動きベクトルが検出されない場合にも、パンニング等、通常の画像情報の特徴及び人間の視覚特性に基づいて、他の第１の動きベクトルを用いて当該第１の動きベクトルの作成あるいは検出された第１の動きベクトルによる代用をさせることができる。
【００３２】
【実施例】
実施例１．
以下、本発明の実施例について図に基づき説明する。
図１乃至図８は、本発明の第１の実施例にかかる映像信号符号化装置及び該符号化装置により符号化された映像信号を再生する復号化装置について説明するための図である。
【００３３】
ここに、図１は映像信号符号化装置の概略構成を示すブロック図、図２は映像信号復号化装置の概略構成を示すブロック図、図３は第１の動き検出手段の構成の一例を示すブロック図である。また、図４は本実施例における符号化対象領域と上位符号化対象領域の概念を示す概念図、図５，図６は本実施例の映像信号の符号化に際して行われる動き補償予測の概念を示すための概念図、図７は符号化のためのベクトルコードを示す図、図８は本実施例の第１の動きベクトルと画像との関係を示す概念図である。
【００３４】
上述の図１６で説明したように、通常の画像における動きベクトルの差分値は、その探索領域の大きさにかかわらず、ベクトル差分値の小さいものの出現確率が非常に高い。これは、通常の画像の時間的な動きは、カメラのパンニング等に代表されるように、あるまとまった領域においては同じような動きをすることが多いことを意味している。よって、動きの速い画像の場合、その動きベクトル自体は大きな値をとるとしても、動きベクトルの差分値については大部分は小さな値をとることになる。
また、人間の視覚特性を考慮すると、このパンニングのような、あるまとまった領域毎の速い動きに対しては、人間の視覚特性は比較的良好であるのに対し、逆に画面の局所的な速い動きに対しては、人間の視覚特性は極度に劣化するという特徴がある。
【００３５】
本発明はかかる画像及び人間の視覚特性の特徴を利用したものであり、本実施例においては、まず、図４に示したような、上位符号化対象領域というものを定義する。これは、従来の符号化対象領域を複数集めることで構成されるものである。
そして、図５に示すように、この符号化対象画像を構成する全ての上位符号化対象領域に対して、大域的な動きを示す第１の動きベクトルをそれぞれ検出する。その後、図６に示すように、第１の動きベクトルの検出された上位符号化対象領域に含まれた符号化対象領域の各々に対し、第１の動きベクトルにて指定された位置を中心とする所定の探索領域内における第２の動きベクトルを検出し、これらの動きベクトルを各々符号化するようにしている。
【００３６】
このような本実施例によれば、第１の動きベクトルは広い領域に対して検出されるため、全体の符号量に対する第１の動きベクトルの符号量は非常に少ないものとなる。また、第２の動きベクトルは既に第１の動きベクトルにより広い領域の動きを検出しているため、動きの速い画像であっても、限られた範囲の中での局所的な動き検出を行えば十分であり、予め適当な１つの探索領域を設定しておき、この探索領域内において第２の動きベクトルを検出すればよい。従って、従来のように複数のベクトルコードを並列的に設ける必要はなくなる。
【００３７】
では、本実施例の具体的な装置構成について以下、説明する。図１は、このような本発明の第１の実施例にかかる映像信号符号化装置の構成を示すブロック図である。
図において、１ｃは映像信号の入力端子、２ｃは符号化された映像信号の出力端子、３ｃは減算手段、４ｃは情報圧縮のため映像信号を水平／垂直の空間周波数に変換するＤＣＴ手段、５ｃは量子化手段、６ｃは逆量子化手段、７ｃは周波数変換された映像信号を元の映像信号に再変換するＩＤＣＴ（逆離散コサイン変換）手段、８ｃは加算手段、９ｃはメモリ手段、１０ｃは第１の動き検出手段、１２ｃはスイッチ（切替）手段、１３ｃは可変長符号化手段、１４ｃは送信バッファ手段、１５ｃは符号量制御手段、１８ｃは第２の動き検出手段である。
【００３８】
入力端子１ｃから入力された映像信号３０１は、その一部が第１の動き検出手段１０ｃ及び第２の動き検出手段１８ｃに入力されるとともに、減算手段３ｃ入力されて予測領域の信号３０９との差信号３０２とされる。
この差信号３０２は、ＤＣＴ手段４ｃにおいて周波数変換され、さらに量子化手段５ｃによって量子化される。
【００３９】
そして、量子化された差信号３０４の一部は逆量子化手段６ｃ及び逆ＤＣＴ手段７ｃを介して再変換されて元の差信号とされ、加算手段８ｃで予測領域の信号３０９が加算されて元の映像信号となり、メモリ手段９ｃに参照画像として蓄えられる。
一方、残りの差信号３０４は可変長符号化手段１３ｃにおいて、第１の動き検出手段１０ｃ及び第２の動き検出手段１８ｃで生成された第１，第２の動きベクトル３１２，３１３とともに符号化され、多重化される。
【００４０】
そして、多重化信号３１４は送信バッファ手段１４ｃを経て、出力端子２ｃより伝送、あるいは、図示せぬ記録媒体に記録されることになる。
なお、符号量制御手段１５ｃは、送信バッファ手段１４ｃにおけるメモリ残量等の信号を受けて、オーバーフローが発生しないよう、量子化手段５ｃの量子化ステップを適応的に変化させている。
【００４１】
一方、メモリ手段９ｃに蓄えられた参照画像はスイッチ手段１２ｃの第１の端子に入力されるとともに、第２の動き検出手段１８ｃにも入力される。（スイッチ手段１２ｃの第２の端子には零信号が入力されている。）
【００４２】
第２の動き検出手段１８ｃでは、入力された参照画像３０８、映像信号（符号化対象画像）３０１及び第１の動き検出手段１０ｃで生成された第１の動きベクトル３１３から符号化対象画像の符号化対象領域毎に、上位符号化対象領域の第１の動きベクトルで指定される領域を中心とした所定の探索領域内において動き検出されて第２の動きベクトルを検出する（図６参照）。
検出された第１の動きベクトル３１３及び第２の動きベクトル３１２は可変長符号化手段１３ｃに送出され、ここでそれぞれ隣接する上位符号化対象領域及び符号化対象領域における動きベクトルとの差分値が算出され、これを図７に示すようなベクトルコードによりベクトルコード化し、可変長符号化された差信号３０４に多重化される。
【００４３】
また、第２の動き検出手段１８ｃの出力３１０は、スイッチ手段１２ｃの切替信号としても用いられ、かかる信号に基づき参照画像の映像信号３０８は予測領域の信号３０９に変換されて減算手段３ｃ及び加算手段８ｃに入力され、また、かかる信号３１０は可変長符号化手段１３ｃにおいて、差信号３０４と第１，第２の動きベクトルとの符号化切替信号としても用いられる。
【００４４】
なお、図７に示すように、本実施例では第１の動きベクトルのベクトルコードと、第２の動きベクトルのベクトルコードとにより動きベクトルがコード化される。
また、本実施例では第１の動きベクトルのベクトルコードとして８ビット固定長のコードを示したが、これに限られるものではなく、他のビット長でも、可変長コードでもよい。
さらに、本実施例では第２の動きベクトルのベクトルコードとして従来例に示した基準探索範囲におけるベクトルコードを示したが、これに限られるものではなく、他の探索範囲におけるベクトルコードとしてもよい。
【００４５】
次に、上述のごとく符号化された映像信号を復号化する映像信号復号化装置について、図２に基づき説明する。
図において、１ｄは符号化された映像信号の入力端子、２ｄは復号化された映像信号の出力端子、１４ｄは受信バッファ手段、１３ｄは可変長復号化手段、５ｄは逆量子化手段、４ｄはＩＤＣＴ手段、８ｄは加算手段、９ｄはメモリ手段、１２ｄはスイッチ手段、１７ｄは動きベクトル再生手段である。
【００４６】
入力端子１ｄから入力された符号化映像信号４０１は、受信バッファ手段１４ｄを介して、可変長復号化手段１３ｄに入力される。可変長復号化手段１３ｄではこの符号化映像信号４０２を復号化し、第１の動きベクトル４１０と第２の動きベクトル４１１と差信号４０３とに分離する。
分離された差信号４０３は逆量子化手段５ｄで逆量子化され、ＩＤＣＴ手段４ｄで元の差信号４０５に変換される。さらに、この差信号４０５は、加算手段８ｄにおいて予測領域の信号４０８と加算されて元の符号化対象領域の信号４０６に戻され、その一部がメモリ手段９ｄに蓄えられるとともに、元の符号化対象画像として出力端子２ｄから出力される。
【００４７】
一方、メモリ手段９ｄは加算手段８ｄからの符号化対象領域信号４０６と可変長復号化手段で復号化され、動きベクトル再生手段１７ｄでベクトル合成された動きベクトル４１２とから予測領域の信号４０７を生成し、該予測領域信号４０７を可変長復号化手段で生成した動き補償予測のための切替信号４０９に基づいて切り替えられるスイッチ手段１２ｄを介して加算手段８ｄに入力させるように構成されている。なお、スイッチ手段１２ｄの一端には零信号が入力されており、この信号端子が選択されている場合には動き補償予測のなされていない再生信号が加算手段８ｄより出力されることになる。
【００４８】
次に、本実施例における第１の動きベクトルの検出方法について説明する。図３は、図１に示した第１の動き検出手段１０ｃの具体的構成の一例を示す図である。
図において、１９ｃは低域通過フィルタ（ＬＰＦ）手段、２０ｃはサブサンプリング手段、２１ｃはメモリ手段、２２ｃは代表ベクトル検出手段である。
【００４９】
第１の動き検出手段１０ｃに入力された映像信号３０１は、ＬＰＦ手段１９ｃを通過することにより高周波成分が除去されるとともに、サブサンプリング手段２０ｃによりハードウェア規模を縮小するためにサブサンプリングされる。この際、サブサンプリングの前処理としてＬＰＦ手段１９ｃを施しているので、動き検出に与える折り返し歪の影響を除去することができる。
サブサンプリングされた映像信号は、メモリ手段２１ｃにおいて参照画像として蓄えられるとともに、代表ベクトル検出手段２２ｃに直接与えられる。代表ベクトル検出手段２２ｃでは、入力された映像信号から構成される符号化対象画像の上位符号化対象領域とメモリ手段２１ｃからの参照画像を基に、図５で説明したように第１の動きベクトルを検出する。
【００５０】
また、図８は本実施例における第１の動きベクトルと画像との関係を示す図である。
図において、縦線は各画像、横の短線は上位符号化対象領域の境界、矢印は第１の動きベクトルを示している。
同図からわかるように、第１の動きベクトルはすべての上位符号化対象領域に対して検出される。
【００５１】
実施例２．
次に、本発明の第２の実施例を説明する。
図９は第１の動きベクトルと画像との間の第１の関係を示す図である。
【００５２】
上述の実施例１では、符号化対象画像のすべての上位符号化対象領域に対して第１の動きベクトルが検出される場合について説明したが、第１の動きベクトルは、その画像の動きの早さや上位符号化対象領域の大きさの取り方によっては検出されない場合がある。
【００５３】
本実施例は、このような第１の動きベクトルが検出されない上位符号化対象領域における第１の動きベクトルの作成方法に関するものであり、符号化装置と復号化装置との間に定められた一定の規則に基づき、他の上位符号化対象領域の第１の動きベクトルから当該上位符号化対象領域の第１の動きベクトルを作成する。
【００５４】
図９は、ある画像間隔をおいて、第１の動きベクトルが検出された場合である。同図では、ｍ枚の画像毎に第１の動きベクトルが検出される。この際、第１の動きベクトルが検出されなかった画像では、第１の動きベクトルの検出された最も近接する未来の画像における当該第１の動きベクトルから、以下のような方法で第１の動きベクトルを作成する。
【００５５】
すなわち、第ｎ画像及び第ｎ＋ｍ画像の第１の動きベクトルが検出されている時、第ｎ＋ｍ画像における第１の動きベクトルをｖとすると、第ｎ＋ｙ画像では過去の画像に対する第１の動きベクトルｖｆと未来の画像に対する第１の動きベクトルｖｂを以下のように作成する。
ｖｆ＝｛ｙ／ｍ｝×ｖ
ｖｂ＝｛(ｍ−ｙ)／ｍ｝×（−ｖ）
【００５６】
なお、第１のベクトル作成方法としては、基本的には符号化装置と復号化装置との間で共通の規則に従って定められた方法であれば良く、上記式以外の方法であっても良い。
【００５７】
実施例３．
図１０は本発明の第３の実施例を示す図であり、第１の動きベクトルと画像との間の第２の関係を示すものである。
【００５８】
本実施例は、上記実施例２とは異なり、符号化対象画像のうち１つの上位符号化対象画像に対してのみ第１の動きベクトルが検出された場合である。このような場合、本実施例では、他の上位符号化対象画像の第１の動きベクトルとして、この検出された第１の動きベクトルを代用するようにしている。
このようにしたとしても、上述したように、通常の画像は、パンニング等、画面全体を一つの塊として移動することが多いため、大きな問題とはならない。
【００５９】
なお、上述の実施例２及び実施例３において、第１の動きベクトルが検出されない上位符号化対象領域ではかかる第１の動きベクトルのベクトルコードを第２の動きベクトルや差信号のベクトルコードに多重化して伝送する必要のないことはいうまでもない。
また、上記各実施例においては、第１，第２の動きベクトルの検出に際して１枚の画像すなわち、ＴＶにおけるフレーム画像を単位としていたが、フィールド画像を単位に第１，第２の動きベクトルを検出するようにしてもよい。
【００６０】
【発明の効果】
以上のように、本発明によれば、画像情報の動きをその大域的な動きを示す第１の動きベクトルと、局所的な動きを示す第２の動きベクトルとにより２段階の動き補償予測を行うようにしているため、ハードウェア／ソフトウェア規模を小規模なものとしながら、動きの早い画像にあっても符号化効率の高い映像信号符号化・復号化装置及び符号化・復号化方法が得られるという効果がある。
【００６１】
また、本発明によれば、第１の動きベクトルが検出できない上位符号化対象領域があったとしても、容易にこれに代わる第１の動きベクトルを得ることができ、また、本発明により得た第１の動きベクトルは、パンニング等の画像情報の特徴及び人間の視覚特性に基づいて得たものであるため、再生画質の劣化も僅かなものに押さえた映像信号符号化・復号化装置及び符号化・復号化方法が得られるという効果がある。
【図面の簡単な説明】
【図１】本発明の映像信号符号化・復号化装置における符号化装置の概略構成を示すブロック図である。
【図２】本発明の映像信号符号化・復号化装置における復号化装置の概略構成を示すブロック図である。
【図３】本発明の映像信号符号化・復号化装置における第１の動き検出手段の構成を示すブロック図である。
【図４】本発明の映像信号符号化・復号化装置における符号化対象領域及び上位符号化対象領域の概念を示す概念図である。
【図５】本発明の映像信号符号化・復号化装置における第１の動きベクトル検出の概念を示す概念図である。
【図６】本発明の映像信号符号化・復号化装置における第２の動きベクトル検出の概念及び第１，第２の動きベクトルの関係を示す概念図である。
【図７】本発明の映像信号符号化・復号化装置におけるベクトルコードを示す図である。
【図８】本発明の映像信号符号化・復号化装置における第１の動きベクトルと各画像間の関係を示す図である。
【図９】本発明の映像信号符号化・復号化装置において検出されなかった第１の動きベクトルを作成する方法を示す図である。
【図１０】本発明の映像信号符号化・復号化装置において検出されなかった第１の動きベクトルを検出された他の第１の動きベクトルで代用する方法を示す図である。
【図１１】従来の映像信号符号化・復号化装置における符号化装置の概略構成を示すブロック図である。
【図１２】従来の映像信号符号化・復号化装置における復号化装置の概略構成を示すブロック図である。
【図１３】従来の映像信号符号化・復号化装置において動きベクトルを検出するための探索領域の概念を示す概念図である。
【図１４】従来の映像信号符号化・復号化装置における動きベクトル検出の概念を示す概念図である。
【図１５】従来の映像信号符号化・復号化装置におけるベクトルコードを示す図である。
【図１６】従来の映像信号符号化・復号化装置における動きベクトルの差分値とその出現確率との関係を示す図である。
【符号の説明】
１ａ，１ｂ，１ｃ，１ｄ：入力端子、２ａ，２ｂ，２ｃ，２ｄ：出力端子、３ａ，３ｃ：減算手段、４ａ，４ｃ：ＤＣＴ（離散コサイン変換）手段、４ｂ：ＩＤＣＴ（逆離散コサイン変換）手段、５ａ，５ｃ：量子化手段、５ｂ，５ｄ：逆量子化手段、６ａ，６ｃ：逆量子化手段、７ａ，７ｃ：ＩＤＣＴ手段、８ａ，８ｂ，８ｃ，８ｄ：加算手段、９ａ，９ｂ，９ｃ，９ｄ：メモリ手段、１０ｃ：第１の動き検出手段、１２ａ，１２ｂ，１２ｃ，１２ｄ：スイッチ（切替）手段、１３ａ，１３ｃ：可変長符号化手段、１３ｂ，１３ｄ：可変長復号化手段、１４ａ，１４ｃ：送信バッファ手段、１４ｂ，１４ｄ：受信バッファ手段、１５ａ，１５ｃ：符号量制御手段、１７ｄ：動きベクトル再生手段、１８ａ，１８ｃ：第２の動き検出手段、１９ｃ：低域通過フィルタ（ＬＰＦ）手段、２０ｃ：サブサンプリング手段、２１ｃ：メモリ手段、２２ｃ：代表ベクトル検出手段
[0001]
[Industrial application fields]
The present invention relates to a video signal encoding / decoding device and encoding / decoding method for encoding / decoding a video signal using motion compensation prediction.
[0002]
[Prior art]
As an encoding means in a video signal encoding / decoding device, a combination of motion compensated prediction and DCT (discrete cosine transform) is often used. The conventional example described below also uses this.
[0003]
FIGS. 11 to 15 are diagrams for explaining a conventional video signal encoding apparatus, as shown for example in the ISO/IEC 13818-2 Draft International Standard, and a decoding apparatus for reproducing a video signal encoded by that encoding apparatus.
FIG. 11 is a block diagram showing a schematic configuration of the video signal encoding apparatus, FIG. 12 is a block diagram showing a schematic configuration of the video signal decoding apparatus, FIGS. 13 and 14 are conceptual diagrams showing the concept of the motion compensated prediction performed when encoding the video signal, and FIG. 15 is a diagram showing the vector codes used for encoding (only the vector codes for some search ranges are shown in the figure).
FIG. 16 is a diagram showing a relationship between a difference value of a motion vector, which will be described later, and its appearance probability.
[0004]
In general, in coding using motion compensated prediction and DCT, one piece of image information is divided into a plurality of small grid-like areas (hereinafter referred to as encoding target areas), and encoding is performed for each small area.
In motion compensated prediction, for an encoding target area of the image currently being encoded (hereinafter referred to as the encoding target image), the area that is most similar to it and has the same size (hereinafter referred to as the prediction area) is detected in one of several previously encoded images (hereinafter referred to as reference images), and only the difference signal between the prediction area and the encoding target area is encoded and transmitted.
[0005]
At this time, information indicating which area is the prediction area must also be transmitted to the decoding means. This information is called a motion vector. Although its form differs between interlaced and non-interlaced images, in this conventional example it is assumed, for the sake of explanation, to consist of two vectors: a horizontal motion vector and a vertical motion vector.
FIG. 14 conceptually shows this motion compensation prediction.
[0006]
On the other hand, the decoding means detects the prediction area from the transmitted motion vector and the reproduced reference image, and adds the transmitted difference signal to the video signal of that prediction area, thereby reproducing the original signal of the encoding target area.
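The reconstruction step described above can be sketched as follows. This is a minimal illustration only: the function name, the list-of-rows image layout, and the `use_prediction` flag (standing in for the switch that selects the zero signal) are assumptions, not part of the source.

```python
def reconstruct_block(ref, residual, bx, by, mv, use_prediction=True):
    """Rebuild one encoding target area from a reference image.

    ref       : reference image as a list of rows (assumed layout)
    residual  : decoded difference signal for the block (square list of rows)
    (bx, by)  : top-left position of the block in the image
    mv        : motion vector (dx, dy) pointing at the prediction area
    """
    dx, dy = mv
    size = len(residual)
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            # With prediction disabled, the "zero signal" is added instead.
            pred = ref[by + dy + j][bx + dx + i] if use_prediction else 0
            row.append(pred + residual[j][i])
        block.append(row)
    return block
```

With `use_prediction=False` the block is simply the decoded difference signal, which corresponds to the case where no motion compensated prediction is applied.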
[0007]
Further, as shown in FIG. 13, the prediction area is selected from within an area of the reference image (hereinafter referred to as the search area) that is centered on the same horizontal/vertical position as the encoding target area of the encoding target image and extended by ±h pixels in the horizontal direction and ±v lines in the vertical direction.
In general, it is necessary to widen the search area in order to increase the coding efficiency for fast moving images. Therefore, the conventional apparatus is configured so that the size of the search area can be appropriately selected.
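As an illustration of how a prediction area might be chosen inside a ±h, ±v search area, the following sketch performs an exhaustive search. The matching criterion (sum of absolute differences) and all names are assumptions; the source does not state which similarity measure is used.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, bx, by, size, h, v):
    """Return (dx, dy, cost) of the best match within +/-h pixels, +/-v lines."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-v, v + 1):
        for dx in range(-h, h + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate lies partly outside the reference image
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The number of candidates grows as (2h + 1) × (2v + 1), which is one reason a wider search area is costly.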
[0008]
Now, a specific configuration of the conventional video signal encoding apparatus will be described with reference to FIG.
In the figure, 1a is an input terminal for a video signal, 2a is an output terminal for the encoded video signal, 3a is subtraction means, 4a is DCT means for converting the video signal into horizontal/vertical spatial frequencies for information compression, 5a is quantization means, 6a is inverse quantization means, 7a is IDCT (inverse discrete cosine transform) means for converting the frequency-converted video signal back into the original video signal, 8a is addition means, 9a is memory means, 12a is switch (switching) means, 13a is variable length coding means, 14a is transmission buffer means, 15a is code amount control means, and 18a is motion detection means.
[0009]
The video signal 101 input from the input terminal 1a is partly supplied to the motion detection means 18a for motion vector generation, and is also converted by the subtraction means 3a into a difference signal 102 with respect to the prediction area signal.
The difference signal 102 is frequency-converted by the DCT means 4a and further quantized by the quantization means 5a.
[0010]
Then, part of the quantized difference signal 104 is passed through the inverse quantization means 6a and the inverse DCT means 7a and restored to the original difference signal; the addition means 8a adds the prediction area signal to it to recover the original video signal, which is stored in the memory means 9a as a reference image. Meanwhile, the remaining difference signal 104 is encoded and multiplexed in the variable length coding means 13a together with the motion vector 112 generated by the motion detection means 18a.
Here, variable-length coding is one of coding methods in which a short code word is assigned to a symbol with a high appearance probability and a long code word is assigned to a symbol with a low appearance probability.
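To make the idea of variable length coding concrete, the sketch below builds Huffman code lengths from symbol probabilities. Huffman coding is used here only as a well-known example of such a code; the source does not say that the vector codes are Huffman codes, and the function name is an assumption.

```python
import heapq


def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} for a Huffman code.

    probabilities: {symbol: probability}; frequent symbols get short codes.
    """
    # Each heap entry: (probability, tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: n + 1 for s, n in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]
```

For example, with probabilities 0.5, 0.25, 0.125 and 0.125 the resulting code lengths are 1, 2, 3 and 3 bits.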
[0011]
The multiplexed signal 114 is transmitted from the output terminal 2a via the transmission buffer means 14a or recorded on a recording medium (not shown).
The code amount control means 15a adaptively changes the quantization step of the quantization means 5a in response to a signal such as the remaining memory capacity in the transmission buffer means 14a so that overflow does not occur.
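The feedback from buffer occupancy to quantization step can be pictured with the following sketch. The linear mapping and the step range 1 to 31 are purely illustrative assumptions; the source only states that the step is changed adaptively so that the buffer does not overflow.

```python
def next_quantizer_step(buffer_fill, buffer_size, q_min=1, q_max=31):
    """Map transmission buffer occupancy onto a quantization step.

    A fuller buffer yields a coarser step, which lowers the generated
    code amount and so helps avoid overflow. (Hypothetical rule.)
    """
    fullness = min(max(buffer_fill / buffer_size, 0.0), 1.0)
    return q_min + round(fullness * (q_max - q_min))
```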
[0012]
On the other hand, the reference image stored in the memory means 9a is input to the first terminal of the switch means 12a and also input to the motion detection means 18a. (A zero signal is input to the second terminal of the switch means 12a.)
[0013]
The motion detection means 18a detects the motion vector 112 described above for each encoding target area of the encoding target image from the input reference image 108 and the video signal (encoding target image) 101 (see FIGS. 13 and 14).
The detected motion vector 112 is sent to the variable length coding means 13a, where its difference from the motion vector of the adjacent encoding target area is calculated; the vector code obtained by variable length coding this difference is multiplexed with the variable-length-coded difference signal 104.
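The differencing of neighboring motion vectors can be sketched as below. Starting the prediction from (0, 0) and scanning the areas in list order are assumptions made for the example; the function names are hypothetical.

```python
def encode_differences(vectors, start=(0, 0)):
    """Replace each motion vector by its difference from the previous one."""
    prev = start
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs


def decode_differences(diffs, start=(0, 0)):
    """Inverse of encode_differences: accumulate the differences."""
    prev = start
    vectors = []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        vectors.append(prev)
    return vectors
```

When neighboring areas move alike, most differences are close to zero even if the vectors themselves are large, which is what makes short codes for small values worthwhile.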
[0014]
The output 110 of the motion detection means 18a is also used as a switching signal for the switch means 12a; based on this signal, the video signal of the reference image is converted into the prediction area signal 109 and input to the subtraction means 3a and the addition means 8a. The signal 110 is further used in the variable length coding means 13a as a switching signal for the encoding of the difference signal 104 and the motion vector.
[0015]
Next, a specific configuration of the video signal decoding apparatus that decodes the video signal encoded as described above will be described with reference to FIG.
In the figure, 1b is an input terminal for the encoded video signal, 2b is an output terminal for the decoded video signal, 14b is reception buffer means, 13b is variable length decoding means, 5b is inverse quantization means, 4b is IDCT means, 8b is addition means, 9b is memory means, and 12b is switch means.
[0016]
The encoded video signal 201 input from the input terminal 1b is input to the variable length decoding means 13b via the reception buffer means 14b. The variable length decoding means 13b decodes this encoded video signal 202 and separates it into a motion vector 213 and a difference signal 203.
The separated difference signal 203 is inversely quantized by the inverse quantization means 5b and converted into the original difference signal 205 by the IDCT means 4b. The difference signal 205 is then added to the prediction area signal 208 in the addition means 8b and restored to the signal 206 of the original encoding target area, part of which is stored in the memory means 9b while it is also output from the output terminal 2b as the original encoding target image.
[0017]
On the other hand, the memory means 9b generates a prediction area signal 207 from the encoding target area signal 206 supplied by the addition means 8b and the motion vector 213 decoded by the variable length decoding means, and this prediction area signal 207 is input to the addition means 8b via the switch means 12b, which is switched based on a switching signal 209 for motion compensated prediction generated by the variable length decoding means. A zero signal is input to one terminal of the switch means 12b; when this terminal is selected, a signal to which motion compensated prediction has not been applied is output from the addition means 8b.
[0018]
[Problems to be solved by the invention]
The conventional video signal encoding/decoding device is configured as described above, so that the size of the search area can be set as appropriate in order to increase the coding efficiency for fast-moving video.
[0019]
However, as shown in FIG. 16, when the size of the search area is different, the appearance probability of the difference value of the motion vector is different accordingly. For this reason, when the vector code of the motion vector is created in the variable length coding means 13a, it is necessary to make the vector code different according to the size of the search area.
[0020]
Therefore, in the conventional apparatus, as shown in FIG. 15, the vector code is created by combining two codes: a motion code (a predetermined variable length code) and a motion residual (a code whose length is determined according to the search range). Combining them yields different vector codes according to the size of the search area.
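The two-part vector code can be illustrated with the sketch below, which is loosely modeled on the motion code / motion residual split of the ISO/IEC 13818-2 scheme cited earlier. The parameter `r_size` (number of residual bits, growing with the search range) and the function names are assumptions, and the actual code tables of FIG. 15 are not reproduced.

```python
def split_delta(delta, r_size):
    """Split a motion vector difference into (motion_code, motion_residual).

    r_size is the assumed number of residual bits for the search range.
    """
    if delta == 0:
        return 0, 0  # no residual is needed for a zero difference
    magnitude = abs(delta) - 1
    code = (magnitude >> r_size) + 1          # coarse part, sent as a VLC
    residual = magnitude & ((1 << r_size) - 1)  # fine part, r_size fixed bits
    return (code if delta > 0 else -code), residual


def join_delta(code, residual, r_size):
    """Inverse of split_delta."""
    if code == 0:
        return 0
    magnitude = ((abs(code) - 1) << r_size) + residual + 1
    return magnitude if code > 0 else -magnitude
```

Under these assumptions every nonzero difference carries `r_size` residual bits in addition to its motion code, so a difference of 1 costs more bits when the search range, and hence `r_size`, is larger.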
[0021]
Here, as shown in FIG. 16, motion vector difference values have the characteristic that small difference values occur with high probability regardless of the size of the search area. However, the conventional vector codes do not take this into account at all, so that, as shown in FIG. 15, the code length of the vector codes representing small difference values becomes longer as the search area becomes larger.
[0022]
This means that, although the search area is widened in order to increase the coding efficiency for fast-moving video, widening it lengthens the average word length of the vector codes; as a result, the coding efficiency deteriorates and the image quality is degraded.
[0023]
Further, in the conventional apparatus, different vector codes are generated according to the size of the search area by combining the two codes of motion code and motion residual, so that the optimal vector for any search area In order to obtain a code, it is necessary to have a plurality of types of vector codes corresponding to the size of the search area in parallel. Therefore, there is a problem that the hardware / software scale is inevitably large and is not practical.
[0024]
The present invention has been made to solve the problems of the conventional apparatus described above, and its object is to obtain a video signal encoding/decoding device and encoding/decoding method in which the coding efficiency does not deteriorate even when the search area is widened and the hardware/software scale can be kept small.
[0025]
[Means for Solving the Problems]
The video signal encoding apparatus according to the present invention comprises: first motion detection means for detecting a first motion vector indicating the temporal motion of an upper encoding target area, the upper encoding target areas being obtained by dividing the area corresponding to one piece of image information into a plurality of areas; second motion detection means for detecting a second motion vector indicating the temporal motion of an encoding target area, the encoding target areas being obtained by dividing the upper encoding target area into a plurality of areas; and variable length coding means for encoding the first and second motion vectors output from the first and second motion detection means.
The second motion vector is detected in a predetermined search area centered on the area specified by the first motion vector.
Further, the first and second motion vectors are encoded by vector codes corresponding to the first and second motion vectors, respectively.
In addition, the difference values of the first and second motion vectors between adjacent upper encoding target areas and between adjacent encoding target areas, respectively, are encoded by vector codes.
The first motion detection means comprises: low-pass filter means for outputting the low-frequency component of the encoding target image; sub-sampling means for sub-sampling the output of the low-pass filter means; memory means for storing the sub-sampled video signal from the sub-sampling means as a reference image; and representative vector detection means for outputting the first motion vector based on the reference image output from the memory means and the sub-sampled encoding target image output from the sub-sampling means.
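The low-pass filtering and sub-sampling stage of the first motion detection means might look like the following sketch. A simple tile average is used as the low-pass filter and a factor of 2 as the sub-sampling ratio; both are assumptions, since the source specifies neither.

```python
def lowpass_and_subsample(image, factor=2):
    """Average each factor x factor tile and keep one sample per tile.

    The averaging acts as a crude low-pass filter that limits aliasing
    before the sample count is reduced. (Illustrative choice of filter.)
    """
    height = len(image) - len(image) % factor
    width = len(image[0]) - len(image[0]) % factor
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            tile = [image[y + j][x + i] for j in range(factor) for i in range(factor)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out
```

A vector found on the reduced image corresponds to a vector `factor` times larger at full resolution, so a wide range can be covered with few candidates.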
[0026]
Further, based on the first motion vectors of the n-th image and the (n+m)-th image, a first motion vector vf with respect to the past image and a first motion vector vb with respect to the future image are created for the (n+y)-th image lying between the n-th image and the (n+m)-th image by the following equations, where v is the first motion vector of the (n+m)-th image.
vf = { y / m } × v
vb = { (m − y) / m } × (−v)
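The two equations for vf and vb can be checked with a small sketch. The function name, the tuple representation of a vector, and the use of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction


def interpolate_first_vectors(v, m, y):
    """Create vf and vb for image n+y from the vector v of image n+m.

    vf = (y / m) * v        points to the past image n
    vb = ((m - y) / m) * -v points to the future image n+m
    """
    if not 0 < y < m:
        raise ValueError("image n+y must lie strictly between n and n+m")
    vf = tuple(Fraction(y, m) * c for c in v)
    vb = tuple(Fraction(m - y, m) * -c for c in v)
    return vf, vb
```

For v = (12, −6), m = 3 and y = 1 this gives vf = (4, −2) and vb = (−8, 4); note that vf − vb equals v, so the two created vectors together span the detected motion.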
In addition, the first motion vector for one upper encoding target region of the encoding target image is set as a first motion vector for another upper encoding target region.
The video signal encoding method according to the present invention includes: a first motion detection step of detecting a first motion vector indicating the temporal motion of an upper encoding target area, the upper encoding target areas being obtained by dividing the area corresponding to one piece of image information into a plurality of areas; a second motion detection step of detecting a second motion vector indicating the temporal motion of an encoding target area, the encoding target areas being obtained by dividing the upper encoding target area into a plurality of areas; and a variable length coding step of encoding the first and second motion vectors obtained by the first and second motion detection steps.
The second motion vector is detected in a predetermined search area centered on the area specified by the first motion vector.
Further, the first and second motion vectors are encoded by vector codes corresponding to the first and second motion vectors, respectively.
In addition, the difference values of the first and second motion vectors between adjacent upper encoding target areas and between adjacent encoding target areas, respectively, are encoded by vector codes.
[0028]
The video signal decoding apparatus according to the present invention comprises: variable length decoding means for separating and decoding, from an encoded video signal that includes a first motion vector indicating the temporal motion of an upper encoding target area obtained by dividing the area corresponding to one piece of image information into a plurality of areas and a second motion vector indicating the temporal motion of an encoding target area obtained by dividing the upper encoding target area into a plurality of areas, the vector codes corresponding to the first and second motion vectors; motion vector reproducing means for outputting a motion vector based on the first and second motion vectors corresponding to the first and second motion vector codes output from the variable length decoding means; and image information reproducing means for reproducing the one piece of image information based on the motion vector output from the motion vector reproducing means.
Also, the present invention is characterized in that a motion vector is reproduced by vector synthesis of the first and second motion vectors.
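Vector synthesis in the motion vector reproducing means can be pictured as a component-wise sum. In this sketch one first motion vector is shared by all encoding target areas of an upper encoding target area; the function name and data layout are assumptions.

```python
def synthesize(first_mv, second_mvs):
    """Add the upper-area vector to each local vector inside that area.

    first_mv   : (dx, dy) of the upper encoding target area
    second_mvs : list of (dx, dy), one per encoding target area
    """
    return [(first_mv[0] + dx, first_mv[1] + dy) for dx, dy in second_mvs]
```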
[0030]
[Action]
According to the present invention, the motion vector of the encoding target area of the image information is represented by the combination of the first motion vector indicating the global motion of the image information and the second motion vector indicating the local motion. Therefore, the code amount of the first motion vector can be reduced, and it is not necessary to provide a plurality of search areas for detecting the second motion vector.
[0031]
Further, according to the present invention, even when a first motion vector is not detected, that first motion vector can be created from other first motion vectors, or a detected first motion vector can be substituted for it, based on the characteristics of ordinary image information such as panning and on human visual characteristics.
[0032]
[Embodiments]
Embodiment 1.
Embodiments of the present invention will be described below with reference to the drawings.
FIGS. 1 to 8 are diagrams for explaining a video signal encoding apparatus according to the first embodiment of the present invention and a decoding apparatus for reproducing a video signal encoded by that encoding apparatus.
[0033]
FIG. 1 is a block diagram showing a schematic configuration of the video signal encoding apparatus, FIG. 2 is a block diagram showing a schematic configuration of the video signal decoding apparatus, and FIG. 3 is a block diagram showing an example of the configuration of the first motion detection means. FIG. 4 is a conceptual diagram showing the concept of the encoding target area and the upper encoding target area in this embodiment, FIGS. 5 and 6 are conceptual diagrams showing the concept of the motion compensated prediction performed when encoding the video signal in this embodiment, FIG. 7 is a diagram showing the vector codes used for encoding, and FIG. 8 is a conceptual diagram showing the relationship between the first motion vectors and the images in this embodiment.
[0034]
As described above with reference to FIG. 16, in an ordinary image small motion vector difference values occur with very high probability, regardless of the size of the search region. This reflects the fact that, in the usual temporal motion of an image, a given area often moves uniformly, as typified by camera panning. Therefore, even in a fast-moving image where the motion vector itself takes a large value, the difference value of the motion vector is small in most cases.
In addition, human vision follows fast motion of a whole area, such as panning, relatively well, but its performance degrades sharply for fast local motion on the screen.
[0035]
The present invention utilizes the characteristics of such an image and human visual characteristics. In the present embodiment, first, an upper encoding target region as shown in FIG. 4 is defined. This is configured by collecting a plurality of conventional encoding target areas.
Then, as shown in FIG. 5, the first motion vector indicating the global motion is detected for each of the upper encoding target areas constituting the encoding target image. Thereafter, as shown in FIG. 6, for each of the encoding target areas included in an upper encoding target area in which the first motion vector has been detected, the second motion vector is detected within a predetermined search area centered on the position specified by the first motion vector, and each of these motion vectors is encoded.
[0036]
According to this embodiment, since the first motion vector is detected for a wide area, its code amount is very small relative to the entire code amount. In addition, because the wide-area motion has already been detected by the first motion vector, the second motion vector requires only local motion detection within a limited range, even for a fast-moving image. It is therefore sufficient to set an appropriate search area in advance and detect the second motion vector within it. Consequently, there is no need to provide a plurality of vector codes in parallel as in the prior art.
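The two-stage detection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the full-search strategy, the sum-of-absolute-differences cost, the function names, and the sign convention (a vector is the displacement from the block position in the encoding target image to the matching position in the reference image) are all assumptions made for the example. The image size is assumed to be a multiple of the upper area size.

```python
def sad(cur, ref, cx, cy, rx, ry, w, h):
    """Sum of absolute differences between a w-by-h block of cur at (cx, cy)
    and a block of ref at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))


def search(cur, ref, x, y, w, h, center, radius):
    """Full search over displacements within +/-radius of `center`.
    Returns the best (dx, dy), or None if every candidate is out of bounds."""
    height, width = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + w > width or ry + h > height:
                continue  # candidate block falls outside the reference image
            cost = sad(cur, ref, x, y, rx, ry, w, h)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def two_stage_vectors(cur, ref, upper, block, global_radius, local_radius):
    """For each upper area, detect a first (global) vector, then detect a
    second (local) vector per block in a small window centered on it.
    Returns {upper_origin: (v1, {block_origin: v2})}."""
    height, width = len(cur), len(cur[0])
    result = {}
    for uy in range(0, height, upper):
        for ux in range(0, width, upper):
            v1 = search(cur, ref, ux, uy, upper, upper, (0, 0), global_radius)
            seconds = {}
            for by in range(uy, uy + upper, block):
                for bx in range(ux, ux + upper, block):
                    total = search(cur, ref, bx, by, block, block,
                                   v1, local_radius)
                    # The second vector is the offset relative to v1.
                    seconds[(bx, by)] = (total[0] - v1[0], total[1] - v1[1])
            result[(ux, uy)] = (v1, seconds)
    return result


# Example data (hypothetical): a 16x16 ramp image displaced by (3, 1),
# with one 4x4 block displaced by (4, 1) instead.
W = H = 16
ref = [[100 * y + x for x in range(W)] for y in range(H)]


def shifted(x, y, dx, dy):
    sx, sy = x + dx, y + dy
    return ref[sy][sx] if 0 <= sx < W and 0 <= sy < H else 0


cur = [[shifted(x, y, 3, 1) for x in range(W)] for y in range(H)]
for y in range(4, 8):
    for x in range(4, 8):
        cur[y][x] = shifted(x, y, 4, 1)

vectors = two_stage_vectors(cur, ref, upper=8, block=4,
                            global_radius=4, local_radius=1)
```

In this example the local search window is only ±1 pixel, yet the block that moves differently from its surroundings is still matched, because the window is centered on the first motion vector rather than on the block's own position.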
[0037]
The specific apparatus configuration of this embodiment will be described below. FIG. 1 is a block diagram showing the configuration of such a video signal encoding apparatus according to the first embodiment of the present invention.
In the figure, 1c is an input terminal for a video signal, 2c is an output terminal for an encoded video signal, 3c is subtracting means, 4c is DCT means for converting the video signal into horizontal / vertical spatial frequencies for information compression, 5c is quantization means, 6c is inverse quantization means, 7c is IDCT (Inverse Discrete Cosine Transform) means for reconverting the frequency-converted video signal to the original video signal, 8c is addition means, 9c is memory means, 10c is first motion detection means, 12c is switch (switching) means, 13c is variable length coding means, 14c is transmission buffer means, 15c is code amount control means, and 18c is second motion detection means.
[0038]
The video signal 301 input from the input terminal 1c is supplied to the first motion detecting means 10c and the second motion detecting means 18c, and is also input to the subtracting means 3c, which takes its difference from the signal 309 of the prediction region to produce the difference signal 302.
The difference signal 302 is frequency-converted by the DCT means 4c and further quantized by the quantization means 5c.
[0039]
Then, a part of the quantized difference signal 304 is restored to the original difference signal through the inverse quantization means 6c and the IDCT means 7c, and the signal 309 of the prediction region is added to it by the addition means 8c to restore the original video signal, which is stored as a reference image in the memory means 9c.
On the other hand, the remaining difference signal 304 is encoded in the variable length encoding means 13c and multiplexed together with the first and second motion vectors 313 and 312 generated by the first motion detecting means 10c and the second motion detecting means 18c.
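The local decoding path described above (quantize the difference, then dequantize it and add the prediction back to build the reference image) can be sketched as follows. This is a simplified illustration that omits the DCT / IDCT stages entirely and uses a uniform quantizer; the function name and the sample values are hypothetical.

```python
def encode_block(cur, pred, step):
    """Quantize the difference between the current block and its prediction,
    then rebuild the block the way a decoder would.

    Returns (levels, recon): `levels` is what would be sent on to entropy
    coding, `recon` is what would be stored as reference data.
    The DCT / IDCT stages are deliberately left out of this sketch.
    """
    levels = [[round((c - p) / step) for c, p in zip(crow, prow)]
              for crow, prow in zip(cur, pred)]
    recon = [[p + level * step for p, level in zip(prow, lrow)]
             for prow, lrow in zip(pred, levels)]
    return levels, recon


cur_block = [[100, 104], [99, 97]]
pred_block = [[96, 100], [100, 100]]
levels, recon = encode_block(cur_block, pred_block, step=4)
```

Because quantization discards information, the stored block is the reconstruction rather than the exact input, so the encoder's reference data matches what a decoder can rebuild from the same levels.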
[0040]
The multiplexed signal 314 is transmitted from the output terminal 2c via the transmission buffer means 14c or recorded on a recording medium (not shown).
Note that the code amount control means 15c adaptively changes the quantization step of the quantization means 5c in response to a signal such as the remaining memory capacity in the transmission buffer means 14c so that overflow does not occur.
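As a rough sketch of this kind of control, the quantization step can be made coarser as the transmission buffer fills. The linear mapping, the function name, and the step range of 1 to 31 are illustrative assumptions; the source does not specify the control law.

```python
def quantization_step(buffer_fill, buffer_size, q_min=1, q_max=31):
    """Map buffer occupancy to a quantization step (hypothetical linear law).

    An emptier buffer gives a finer step (more bits per block); a fuller
    buffer gives a coarser step, which lowers the bit rate before overflow.
    """
    fullness = min(max(buffer_fill / buffer_size, 0.0), 1.0)
    return q_min + round(fullness * (q_max - q_min))
```

For example, with these assumed defaults an empty buffer yields step 1 and a full buffer yields step 31.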
[0041]
On the other hand, the reference image stored in the memory means 9c is input to the first terminal of the switch means 12c and also input to the second motion detection means 18c. (A zero signal is input to the second terminal of the switch means 12c.)
[0042]
The second motion detection means 18c receives the reference image 308, the video signal (encoding target image) 301, and the first motion vector 313 generated by the first motion detection means 10c. For each encoding target area of the encoding target image, it performs motion detection within a predetermined search area centered on the area specified by the first motion vector of the corresponding upper encoding target area, and thereby detects the second motion vector (see FIG. 6).
The detected first motion vector 313 and second motion vector 312 are sent to the variable length coding means 13c, where the difference from the motion vector of the adjacent upper encoding target area and of the adjacent encoding target area, respectively, is calculated. Each difference is converted into a vector code as shown in FIG. 7 and multiplexed onto the variable-length-encoded difference signal 304.
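The difference calculation against the adjacent area can be sketched as follows. The choice of the previous area in scan order as the "adjacent" one, the initial predictor of (0, 0), and the function names are assumptions for illustration; the mapping of differences to the actual vector codes of FIG. 7 is not reproduced here.

```python
def to_differences(vectors):
    """Replace each vector by its difference from the previous one in scan
    order (the first vector is differenced against an assumed (0, 0))."""
    prev = (0, 0)
    diffs = []
    for vx, vy in vectors:
        diffs.append((vx - prev[0], vy - prev[1]))
        prev = (vx, vy)
    return diffs


def from_differences(diffs):
    """Inverse of to_differences: accumulate differences back into vectors."""
    prev = (0, 0)
    vectors = []
    for dx, dy in diffs:
        prev = (prev[0] + dx, prev[1] + dy)
        vectors.append(prev)
    return vectors


row = [(12, -3), (13, -3), (12, -2), (12, -3)]
diffs = to_differences(row)
```

Even though every vector in `row` is fairly large, all differences after the first are small, which is the property that makes short vector codes effective.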
[0043]
The output 310 of the second motion detection means 18c is also used as a switching signal for the switch means 12c. Based on this signal, the video signal 308 of the reference image is output as the signal 309 of the prediction region and input to the subtraction means 3c and the addition means 8c. The signal 310 is also used in the variable length coding means 13c as a switching signal for encoding the difference signal 304 and the first and second motion vectors.
[0044]
As shown in FIG. 7, in this embodiment, the motion vector is coded by the vector code of the first motion vector and the vector code of the second motion vector.
In the present embodiment, an 8-bit fixed length code is shown as the vector code of the first motion vector. However, the present invention is not limited to this, and other bit lengths or variable length codes may be used.
Further, in the present embodiment, the vector code in the reference search range shown in the conventional example is shown as the vector code of the second motion vector, but the present invention is not limited to this, and may be a vector code in another search range.
[0045]
Next, a video signal decoding apparatus for decoding the video signal encoded as described above will be described with reference to FIG.
In the figure, 1d is an input terminal for an encoded video signal, 2d is an output terminal for a decoded video signal, 14d is reception buffer means, 13d is variable length decoding means, 5d is inverse quantization means, 4d is IDCT means, 8d is addition means, 9d is memory means, 12d is switch means, and 17d is motion vector reproduction means.
[0046]
The encoded video signal 401 input from the input terminal 1d is input to the variable length decoding unit 13d via the reception buffer unit 14d. The variable length decoding means 13d decodes the encoded video signal 402 and separates it into a first motion vector 410, a second motion vector 411, and a difference signal 403.
The separated difference signal 403 is inversely quantized by the inverse quantization means 5d and converted to the original difference signal 405 by the IDCT means 4d. This difference signal 405 is then added to the prediction region signal 408 in the adding means 8d and restored to the original encoding target region signal 406. A part of this signal is stored in the memory means 9d, and the original encoding target image is output from the output terminal 2d.
[0047]
On the other hand, the memory means 9d generates a prediction region signal 407 from the encoding target region signal 406 supplied by the addition means 8d and from the motion vector 412, which the motion vector reproduction means 17d obtains by vector synthesis of the first motion vector 410 and the second motion vector 411 decoded by the variable length decoding means 13d. The prediction region signal 407 is input to the adding means 8d via the switch means 12d, which is switched based on the motion compensation prediction switching signal 409 generated by the variable length decoding means 13d. Note that a zero signal is input to one terminal of the switch means 12d. When this terminal is selected, a reproduced signal for which motion compensation prediction is not performed is output from the adding means 8d.
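The decoder-side reconstruction can be sketched as follows: the two decoded vectors are summed, the prediction block is fetched from the stored reference image at the displaced position, and the decoded difference is added. The function names, the `use_prediction` flag (standing in for the switch means selecting the zero signal), and the sample data are hypothetical.

```python
def synthesize(v1, v2):
    """Vector synthesis: the full motion vector is the sum of the first
    (global) and second (local) motion vectors."""
    return (v1[0] + v2[0], v1[1] + v2[1])


def reconstruct_block(ref, x, y, w, h, v1, v2, residual, use_prediction=True):
    """Rebuild a w-by-h block at (x, y) from the reference image and the
    decoded difference. With use_prediction=False the zero signal is
    selected and the difference is output as-is."""
    dx, dy = synthesize(v1, v2)
    block = []
    for j in range(h):
        row = []
        for i in range(w):
            pred = ref[y + dy + j][x + dx + i] if use_prediction else 0
            row.append(pred + residual[j][i])
        block.append(row)
    return block


ref_image = [[10 * yy + xx for xx in range(8)] for yy in range(8)]
residual = [[1, 0], [0, -1]]
block = reconstruct_block(ref_image, 2, 2, 2, 2,
                          v1=(2, 1), v2=(-1, 0), residual=residual)
```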
[0048]
Next, the first motion vector detection method in the present embodiment will be described. FIG. 3 is a diagram showing an example of a specific configuration of the first motion detecting unit 10c shown in FIG.
In the figure, 19c is a low-pass filter (LPF) means, 20c is sub-sampling means, 21c is memory means, and 22c is representative vector detection means.
[0049]
The video signal 301 input to the first motion detection unit 10c passes through the LPF unit 19c to remove high frequency components, and is subsampled by the subsampling unit 20c to reduce the hardware scale. At this time, since the LPF means 19c is applied as the pre-processing of the sub-sampling, the influence of aliasing distortion on the motion detection can be removed.
The subsampled video signal is stored as a reference image in the memory means 21c and is also given directly to the representative vector detecting means 22c. The representative vector detection means 22c detects the first motion vector described with reference to FIG. 5 for each upper encoding target area of the encoding target image formed by the input video signal, using the reference image from the memory means 21c.
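The filtering and subsampling stage can be sketched as follows. A box (mean) filter over factor-by-factor neighborhoods followed by decimation is used here as one simple low-pass choice; the actual filter characteristics are not given in the source, and the function name is hypothetical. The image dimensions are assumed to be multiples of the factor.

```python
def lowpass_subsample(img, factor=2):
    """Average each factor-by-factor neighborhood (a simple low-pass filter)
    and keep one sample per neighborhood (subsampling)."""
    out_h, out_w = len(img) // factor, len(img[0]) // factor
    n = factor * factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) / n
             for x in range(out_w)]
            for y in range(out_h)]


image = [[0, 2, 4, 6],
         [2, 4, 6, 8],
         [10, 10, 20, 20],
         [10, 10, 20, 20]]
small = lowpass_subsample(image)
```

Motion detection on the reduced image touches a quarter of the samples when the factor is 2, which is where the reduction in hardware scale comes from. Whether and how the detected vector is scaled back to full resolution is not stated in the source.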
[0050]
FIG. 8 is a diagram showing the relationship between the first motion vector and the image in this embodiment.
In the figure, vertical lines indicate the respective images, horizontal short lines indicate the boundaries of the upper encoding target area, and arrows indicate the first motion vector.
As can be seen from the figure, the first motion vector is detected for all the upper coding target regions.
[0051]
Example 2.
Next, a second embodiment of the present invention will be described.
FIG. 9 is a diagram illustrating a first relationship between the first motion vector and the image.
[0052]
In the above-described first embodiment, the case where the first motion vector is detected for all the upper encoding target areas of the encoding target image has been described. However, depending on how fast the image moves and on the size of the upper encoding target area, the first motion vector may not be detected in some cases.
[0053]
The present embodiment relates to a method for creating the first motion vector of an upper encoding target area in which the first motion vector is not detected. Based on a fixed rule agreed between the encoding device and the decoding device, the first motion vector of that upper encoding target area is created from the first motion vector of another upper encoding target area.
[0054]
FIG. 9 shows a case where the first motion vector is detected at a fixed image interval. In the figure, the first motion vector is detected once every m images. For an image in which the first motion vector is not detected, the first motion vector is created by the following method from the first motion vector of the closest future image in which it has been detected.
[0055]
That is, when the first motion vector is detected for the n-th image and the (n + m)-th image, and the first motion vector in the (n + m)-th image is v, the first motion vector vf with respect to the past image and the first motion vector vb with respect to the future image are created for the (n + y)-th image as follows.
vf = (y / m) × v
vb = ((m − y) / m) × (−v)
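As a worked example of these two formulas, the following sketch computes vf and vb for an intermediate image. The function name is hypothetical, exact fractions are used to avoid rounding, and any rounding to the vector precision actually used by the codec is not specified in the source and is left out.

```python
from fractions import Fraction


def interpolate_first_vector(v, y, m):
    """Create the first motion vectors for image n + y (0 < y < m) from the
    vector v detected for image n + m.

    vf = (y / m) * v          (relative to the past image n)
    vb = ((m - y) / m) * (-v) (relative to the future image n + m)
    """
    if not 0 < y < m:
        raise ValueError("y must satisfy 0 < y < m")
    vf = tuple(Fraction(y, m) * c for c in v)
    vb = tuple(Fraction(m - y, m) * (-c) for c in v)
    return vf, vb


vf, vb = interpolate_first_vector((12, -6), y=1, m=3)
```

With v = (12, −6), y = 1 and m = 3, this gives vf = (4, −2) and vb = (−8, 4). Note that vf − vb equals v, so the two created vectors together span the detected motion.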
[0056]
The method of creating the first motion vector need only follow a rule shared by the encoding device and the decoding device, and may be a method other than the above formulas.
[0057]
Example 3.
FIG. 10 is a diagram showing a third embodiment of the present invention, and shows a second relationship between the first motion vector and the image.
[0058]
Unlike the second embodiment, the present embodiment deals with the case where the first motion vector is detected for only one upper encoding target area of the encoding target image. In such a case, in the present embodiment, the detected first motion vector is substituted as the first motion vector of the other upper encoding target areas.
Even so, as noted above, an ordinary image often moves as a whole, as in panning, so this does not pose a significant problem.
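The substitution rule can be sketched in a few lines. The dictionary layout keyed by area origin and the function name are assumptions made for the example.

```python
def substitute_first_vectors(detected, areas):
    """Give every upper encoding target area the single detected first
    motion vector. `detected` maps one area origin to its vector."""
    if len(detected) != 1:
        raise ValueError("exactly one detected first motion vector expected")
    (vector,) = detected.values()
    return {area: vector for area in areas}


areas = [(0, 0), (8, 0), (0, 8), (8, 8)]
filled = substitute_first_vectors({(8, 0): (5, -1)}, areas)
```

A decoder applying the same rule arrives at the same vectors without any extra vector codes being transmitted for the other areas.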
[0059]
In the second and third embodiments, needless to say, for an upper encoding target area in which the first motion vector is not detected, the vector code of the first motion vector need not be multiplexed with the vector code of the second motion vector or with the difference signal and transmitted.
In each of the above-described embodiments, one image, that is, one frame image in television, is used as the unit when detecting the first and second motion vectors. However, the first and second motion vectors may instead be detected in units of field images.
[0060]
[Effect of the Invention]
As described above, according to the present invention, two-stage motion compensation prediction is performed on the motion of image information by using the first motion vector indicating the global motion and the second motion vector indicating the local motion. This makes it possible to obtain a video signal encoding / decoding device and encoding / decoding method that achieve high encoding efficiency even for fast-moving images while reducing the hardware / software scale.
[0061]
Further, according to the present invention, even if there is an upper encoding target area for which the first motion vector cannot be detected, a substitute first motion vector can easily be obtained. Since this first motion vector is obtained on the basis of the characteristics of image information, such as panning, and human visual characteristics, a video signal encoding / decoding device and encoding / decoding method that keep the deterioration of reproduced image quality slight can be obtained.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an encoding device in a video signal encoding / decoding device of the present invention.
FIG. 2 is a block diagram showing a schematic configuration of a decoding device in the video signal encoding / decoding device of the present invention.
FIG. 3 is a block diagram showing a configuration of first motion detecting means in the video signal encoding / decoding device of the present invention.
FIG. 4 is a conceptual diagram showing the concept of an encoding target region and a higher encoding target region in the video signal encoding / decoding device of the present invention.
FIG. 5 is a conceptual diagram showing a concept of first motion vector detection in the video signal encoding / decoding device of the present invention.
FIG. 6 is a conceptual diagram showing the concept of second motion vector detection and the relationship between the first and second motion vectors in the video signal encoding / decoding device of the present invention.
FIG. 7 is a diagram showing a vector code in the video signal encoding / decoding device of the present invention.
FIG. 8 is a diagram illustrating a relationship between a first motion vector and each image in the video signal encoding / decoding device of the present invention.
FIG. 9 is a diagram illustrating a method of creating a first motion vector that has not been detected in the video signal encoding / decoding device of the present invention.
FIG. 10 is a diagram showing a method of substituting another detected first motion vector for a first motion vector that has not been detected in the video signal encoding / decoding device of the present invention.
FIG. 11 is a block diagram showing a schematic configuration of an encoding device in a conventional video signal encoding / decoding device.
FIG. 12 is a block diagram showing a schematic configuration of a decoding device in a conventional video signal encoding / decoding device.
FIG. 13 is a conceptual diagram showing the concept of a search area for detecting a motion vector in a conventional video signal encoding / decoding device.
FIG. 14 is a conceptual diagram showing a concept of motion vector detection in a conventional video signal encoding / decoding device.
FIG. 15 is a diagram illustrating a vector code in a conventional video signal encoding / decoding device.
FIG. 16 is a diagram illustrating a relationship between a motion vector difference value and its appearance probability in a conventional video signal encoding / decoding device.
[Explanation of symbols]
1a, 1b, 1c, 1d: input terminal, 2a, 2b, 2c, 2d: output terminal, 3a, 3c: subtraction means, 4a, 4c: DCT (discrete cosine transform) means, 4b: IDCT (inverse discrete cosine transform) means, 5a, 5c: quantization means, 5b, 5d: inverse quantization means, 6a, 6c: inverse quantization means, 7a, 7c: IDCT means, 8a, 8b, 8c, 8d: addition means, 9a, 9b, 9c, 9d: memory means, 10c: first motion detection means, 12a, 12b, 12c, 12d: switch (switching) means, 13a, 13c: variable length encoding means, 13b, 13d: variable length decoding means, 14a, 14c: transmission buffer means, 14b, 14d: reception buffer means, 15a, 15c: code amount control means, 17d: motion vector reproduction means, 18a, 18c: second motion detection means, 19c: low-pass filter (LPF) means, 20c: sub-sampling means, 21c: memory means, 22c: representative vector detection means

Claims

A video signal encoding device comprising:
<A1> where an upper encoding target area is defined as an area obtained by dividing, into a plurality of areas, the area corresponding to one piece of image information currently to be encoded, and
<A2> a reference area is defined as an arbitrary area, within a reference image that is a previously encoded image, having the same size as the upper encoding target area,
<A3> first motion detection means for detecting a first motion vector indicating which of the plurality of reference areas in the reference image the upper encoding target area is a shifted version of;
<B1> where an encoding target area is defined as an area obtained by dividing the upper encoding target area into a plurality of areas, and
<B2> a search area is defined as an area of a predetermined size in the reference image centered on the position specified by the first motion vector,
<B3> second motion detection means for detecting a second motion vector indicating which of the arbitrary areas within the search area having the same size as the encoding target area the encoding target area is a shifted version of; and
<C> variable length encoding means for encoding the first and second motion vectors output from the first and second motion detection means,
wherein the first motion detection means comprises:
low-pass filter means for outputting a low-frequency component of the encoding target image;
sub-sampling means for sub-sampling the output from the low-pass filter means;
memory means for storing the subsampled video signal from the sub-sampling means as a reference image; and
representative vector detection means for outputting the first motion vector based on the reference image output from the memory means and the sub-sampling output of the encoding target image output from the sub-sampling means.

A video signal encoding method comprising:
<A1> where an upper encoding target area is defined as an area obtained by dividing, into a plurality of areas, the area corresponding to one piece of image information currently to be encoded, and
<A2> a reference area is defined as an arbitrary area, within a reference image that is a previously encoded image, having the same size as the upper encoding target area,
<A3> a first motion detection step of detecting a first motion vector indicating which of the plurality of reference areas in the reference image the upper encoding target area is a shifted version of;
<B1> where an encoding target area is defined as an area obtained by dividing the upper encoding target area into a plurality of areas, and
<B2> a search area is defined as an area of a predetermined size in the reference image centered on the position specified by the first motion vector,
<B3> a second motion detection step of detecting a second motion vector indicating which of the arbitrary areas within the search area having the same size as the encoding target area the encoding target area is a shifted version of; and
<C> a variable length encoding step of encoding the first and second motion vectors output from the first and second motion detection means,
wherein the first motion detection step includes:
a low-pass filter step of outputting a low-frequency component of the encoding target image;
a sub-sampling step of sub-sampling the output obtained by the low-pass filter step;
a memory step of storing the subsampled video signal from the sub-sampling step as a reference image; and
a representative vector detection step of outputting the first motion vector based on the reference image output by the memory step and the sub-sampling output of the encoding target image obtained by the sub-sampling step.