JP2007060704A - Moving picture coding method, apparatus, and program

Info

Publication number: JP2007060704A
Application number: JP2006282043A
Authority: JP
Inventors: Shinichiro Koto; 晋一郎古藤; Takeshi Nakajo; 健中條; Takeshi Nagai; 剛永井; Yoshihiro Kikuchi; 義浩菊池; Wataru Asano; 渉浅野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-01-18
Filing date: 2006-10-16
Publication date: 2007-03-08
Also published as: JP2007060697A; JP2007259484A; JP2007060700A; JP2007060705A; JP2007060701A; JP2007060699A; JP2007060695A; JP2007060696A; JP2007060702A; JP2007060698A; JP2007060703A

Abstract

PROBLEM TO BE SOLVED: To provide a moving picture coding method that considerably improves prediction efficiency for fade images, which conventional moving picture coding methods such as MPEG handle poorly, with little increase in computational complexity or in the overhead of the coded data.

SOLUTION: The method codes a moving picture using a plurality of decoded moving picture signals as reference frames and, for each macroblock, adaptively switches among three ways of generating the predictive macroblock image: (1) from a single one of the reference frames; (2) by cutting out reference macroblocks from the reference frames and averaging them; or (3) by cutting out reference macroblocks from the reference frames and applying linear extrapolation or linear interpolation according to the inter-frame distance between each reference frame and the frame being coded.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a motion compensated predictive interframe coding method, apparatus, and program that use a plurality of reference frames.

MPEG1 (ISO/IEC 11172-2), MPEG2 (ISO/IEC 13818-2), MPEG4 (ISO/IEC 14496-2), and similar standards are in wide practical use as motion compensated predictive interframe coding methods for moving pictures. These schemes code video as a combination of intra-frame coded pictures (I pictures), forward predictive interframe coded pictures (P pictures), and bidirectionally predictive interframe coded pictures (B pictures).

A P picture is coded using the immediately preceding P or I picture as its reference image, and a B picture is coded using the immediately preceding and immediately following P or I pictures as reference images. In MPEG, a predicted image can be generated selectively for each macroblock from one video frame or from several. For a P picture, the predicted image is normally generated macroblock by macroblock from a single reference frame. For a B picture, the predicted image is generated either from one of the forward and backward reference images, or by cutting out a reference macroblock from each of the forward and backward reference images and averaging the two. The prediction mode information is embedded in the coded data for each macroblock.

However, each of these predictive coding schemes predicts well only when the same picture content translates over time between frames within a region the size of a macroblock or larger. Prediction efficiency is not necessarily good for temporal enlargement, reduction, or rotation of the picture, or for temporal variation of the signal amplitude such as fade-in and fade-out. In particular, in constant bit rate coding, input video with such poor prediction efficiency can cause severe degradation of picture quality. In variable bit rate coding, a large number of bits is spent on such video to suppress quality degradation, which increases the total amount of code.

On the other hand, temporal enlargement, reduction, and rotation of the picture, as well as fade-in and fade-out, can be approximated by an affine transformation of the moving picture signal, so prediction based on affine transformation greatly improves prediction efficiency for such video. However, estimating the affine transformation parameters requires an enormous amount of parameter estimation computation at coding time.

Specifically, the reference image must be transformed with a plurality of candidate transformation parameters to find the parameter that minimizes the prediction residual, so the amount of transformation computation becomes enormous. As a result, the cost of coding, in computation or hardware scale, becomes very large. Moreover, the transformation parameters themselves must be coded in addition to the residual signal, so the overhead of the coded data becomes very large. Furthermore, decoding requires an inverse affine transformation, so the cost of decoding, in computation or hardware scale, also becomes very large.

As described above, conventional moving picture coding methods such as MPEG cannot achieve sufficient prediction efficiency for temporal changes of a moving picture other than translation. Moving picture coding and decoding methods based on affine transformation improve prediction efficiency itself, but they increase the overhead of the coded data and greatly increase the coding and decoding costs.

An object of the present invention is to provide a moving picture coding method and apparatus, and a decoding method and apparatus, that greatly improve prediction efficiency, particularly for fade images, which conventional moving picture coding methods such as MPEG handle poorly, with little increase in computation or in the overhead of the coded data.

To solve the above problems, the first feature of the present invention is that, in motion compensated predictive interframe coding that refers to a plurality of moving picture frames for each macroblock, a plurality of reference macroblocks are generated from the plurality of reference frames; one of the reference macroblocks, the average of the reference macroblocks, or a linear extrapolation or linear interpolation prediction from the reference macroblocks is selected as the predictive macroblock; and the prediction error signal between the selected predictive macroblock and the macroblock being coded, the prediction mode information, and the motion vector are coded.
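
The selection described in the first feature can be sketched as follows. This is a minimal illustration, not the encoder of the invention: it assumes two past reference frames, motion-compensated reference blocks given as flat lists of sample values, and the sum of absolute differences (SAD) as the selection criterion, none of which is fixed by the text. All function and mode names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def candidate_predictions(ref1, ref2):
    """Build the candidate predictive macroblocks.

    ref1: reference block from the immediately preceding frame (assumed).
    ref2: reference block from the frame before that (assumed).
    """
    return {
        "single_ref1": list(ref1),
        "single_ref2": list(ref2),
        "average": [(p + q) / 2 for p, q in zip(ref1, ref2)],
        "extrapolation": [2 * p - q for p, q in zip(ref1, ref2)],
    }


def select_prediction(current, ref1, ref2):
    """Pick the candidate with the smallest SAD and return (mode, residual)."""
    candidates = candidate_predictions(ref1, ref2)
    mode = min(candidates, key=lambda m: sad(current, candidates[m]))
    residual = [c - p for c, p in zip(current, candidates[mode])]
    return mode, residual
```

For a block whose amplitude rises linearly across three frames, for example `select_prediction([30, 60], [20, 40], [10, 20])`, the extrapolation candidate matches the current block exactly and the residual is all zeros.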

The second feature is that, in decoding motion compensated predictive interframe coded data that refers to a plurality of moving picture frames for each macroblock, coded motion vector data, prediction mode information, and a prediction error signal are received; according to the motion vector and the prediction mode, the decoder selects whether to generate the predictive macroblock from one specific frame among the plurality of reference frames, to generate a plurality of reference macroblocks from the plurality of reference frames and use their average as the predictive macroblock, or to generate the predictive macroblock by linear extrapolation or linear interpolation prediction from the plurality of reference macroblocks; and the generated predictive macroblock is added to the prediction error signal.
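
The decoder side of the second feature can be sketched in the same spirit. The sketch assumes the reference blocks have already been fetched with the decoded motion vectors, and that the mode arrives as one of four hypothetical string labels; the actual bitstream syntax is not given in this passage.

```python
def build_prediction(mode, ref1, ref2):
    """Rebuild the predictive macroblock for a signalled mode (labels are hypothetical)."""
    if mode == "single_ref1":
        return list(ref1)
    if mode == "single_ref2":
        return list(ref2)
    if mode == "average":
        return [(p + q) / 2 for p, q in zip(ref1, ref2)]
    if mode == "extrapolation":
        return [2 * p - q for p, q in zip(ref1, ref2)]
    raise ValueError(f"unknown prediction mode: {mode}")


def reconstruct(mode, ref1, ref2, residual):
    """Decoded block = predictive macroblock + received prediction error."""
    prediction = build_prediction(mode, ref1, ref2)
    return [p + r for p, r in zip(prediction, residual)]
```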

In conventional moving picture coding schemes such as MPEG, when a predictive macroblock image is generated from a plurality of reference frames, a reference macroblock is cut out from each reference frame and the average of those signals is used. Prediction efficiency can therefore drop when the amplitude of the video signal varies over time, as in a fade. According to the first or second feature of the present invention, however, the predicted image is generated by extrapolation or interpolation using linear prediction from a plurality of frames. When the amplitude of the video signal varies monotonically over time, this greatly improves prediction efficiency and enables high-quality, high-efficiency coding.

In interframe predictive coding, the coding side generally uses as a reference frame an image obtained by decoding an already coded image, and the decoding side uses an already decoded image. Coding noise in the reference frame is therefore one cause of reduced prediction efficiency. Averaging reference macroblocks cut out from a plurality of reference frames is known to remove noise and so to improve coding efficiency. This effect is equivalent to the technique known as a loop filter in predictive coding.

According to the first or second feature of the present invention, the optimal prediction mode can be selected according to the input image: either averaging over a plurality of reference frames, which has a strong loop filter effect, or linear extrapolation or linear interpolation, which is effective for fade images and the like. Coding efficiency can thus be improved for arbitrary input images.

The third feature of the present invention is that, in motion compensated predictive interframe coding that refers to a plurality of moving picture frames for each macroblock, the plurality of reference frames are the two frames coded immediately before the frame being coded, and, in the linear extrapolation prediction from the plurality of reference macroblocks, the predictive macroblock is generated by doubling the amplitude of the reference macroblock signal generated from the immediately preceding reference frame and subtracting from it the reference macroblock signal generated from the reference frame one frame earlier.

The fourth feature is that, in decoding motion compensated predictive interframe coded data that refers to a plurality of moving picture frames for each macroblock, the plurality of reference frames are the two frames decoded immediately before the target frame, and, in the linear extrapolation prediction from the plurality of reference macroblocks, the predictive macroblock is generated by doubling the amplitude of the reference macroblock signal generated from the immediately preceding reference frame and subtracting from it the reference macroblock signal generated from the reference frame one frame earlier.

As noted above, conventional moving picture coding schemes such as MPEG predict poorly when the amplitude of the video signal varies over time, as in a fade. For example, let Y(t) be the video frame at time t and Y'(t) the faded video frame at time t. Fade-in and fade-out can then be realized by Equation (1) and Equation (2), respectively. In Equation (1), (a) denotes the fade period: the fade-in starts at time t = 0 and ends at time T. In Equation (2), (b) denotes the fade period: the fade-out starts at time T₀ and ends at time T₀ + T.
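
Equations (1) and (2) are not reproduced in this text. The following is a reconstruction inferred from Equations (4), (5), (8), and (9), writing Y(t) for the unfaded frame as those equations do. It is offered as a plausible reading, not as the published form; in particular, the boundary conventions and the labels other than (a) in Equation (1) and (b) in Equation (2) are assumptions.

```latex
% Reconstruction inferred from Equations (4), (5), (8) and (9).
% Interval boundaries and the labels of the non-fade cases are assumed.
\begin{equation}
Y'(t) =
\begin{cases}
  Y(t)\,\dfrac{t}{T} & (0 \le t < T) \quad \text{(a)} \\[1ex]
  Y(t)               & (t \ge T)     \quad \text{(b)}
\end{cases}
\tag{1}
\end{equation}

\begin{equation}
Y'(t) =
\begin{cases}
  Y(t)                     & (t \le T_0)         \quad \text{(a)} \\[1ex]
  Y(t)\,\dfrac{T-t+T_0}{T} & (T_0 < t < T_0 + T) \quad \text{(b)} \\[1ex]
  0                        & (t \ge T_0 + T)     \quad \text{(c)}
\end{cases}
\tag{2}
\end{equation}
```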

Here, assume that the faded frame Y'(t) at time t is the frame to be coded, and that the two frames Y'(t-1) and Y'(t-2), faded in the same way at times t-1 and t-2, are the reference frames.

First, consider generating the predicted image P(t) from the average of these two frames, as in Equation (3).

$$P(t) = \frac{Y'(t-1) + Y'(t-2)}{2} \tag{3}$$

Within the fade periods (a) of Equation (1) and (b) of Equation (2), the predicted image of Equation (3) becomes Equation (4) and Equation (5), respectively.

$$P(t) = \frac{1}{2}\left\{ Y(t-1)\,\frac{t-1}{T} + Y(t-2)\,\frac{t-2}{T} \right\} \tag{4}$$

$$P(t) = \frac{1}{2}\left\{ Y(t-1)\,\frac{T-t+1+T_0}{T} + Y(t-2)\,\frac{T-t+2+T_0}{T} \right\} \tag{5}$$

Now assume that the original signal Y(t) before fading does not vary over time, that is, Y(t) = C (constant) regardless of t. Equations (4) and (5) then become Equations (6) and (7), respectively.

$$P(t) = C\,\frac{2t-3}{2T} \tag{6}$$

$$P(t) = C\,\frac{2T-2t+3+2T_0}{2T} \tag{7}$$

The signal Y'(t) to be coded, on the other hand, is given by Equations (8) and (9).

$$Y'(t) = C\,\frac{t}{T} \tag{8}$$

$$Y'(t) = C\,\frac{T-t+T_0}{T} \tag{9}$$

The prediction error signal D(t), obtained by subtracting the predicted image P(t) of Equations (6) and (7) from Y'(t) of Equations (8) and (9), is given by Equations (10) and (11), respectively.

$$D(t) = C\,\frac{3}{2T} \tag{10}$$

$$D(t) = -C\,\frac{3}{2T} \tag{11}$$

In contrast, the third and fourth features of the present invention generate the predicted image P(t) of Equation (12).

$$P(t) = 2\,Y'(t-1) - Y'(t-2) \tag{12}$$

Assuming Y(t) = C (constant) as above, the predicted images during the fade-in of Equation (1) and the fade-out of Equation (2) are given by Equations (13) and (14), respectively.

$$P(t) = C\,\frac{t}{T} \tag{13}$$

$$P(t) = C\,\frac{T-t+T_0}{T} \tag{14}$$

Equations (13) and (14) coincide with the images to be coded given by Equations (8) and (9), so the prediction error signal D(t), obtained by subtracting the predicted image from the image to be coded, is 0 in both cases. Thus, for a fade image, conventional motion compensation such as MPEG leaves a residual signal, whereas the third and fourth features of the present invention eliminate it, greatly improving prediction efficiency.
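
The derivation in Equations (3) to (14) can be checked numerically. The sketch below assumes a constant unfaded signal Y(t) = C and evaluates only times inside the fade periods, using exact rational arithmetic; the function names and the sample values of C, T, and T₀ are illustrative choices, not taken from the patent.

```python
from fractions import Fraction


def fade_in(c, t, T):
    """Faded amplitude C*t/T inside the fade-in period (Equation (8))."""
    return Fraction(c * t, T)


def fade_out(c, t, T, T0):
    """Faded amplitude C*(T - t + T0)/T inside the fade-out period (Equation (9))."""
    return Fraction(c * (T - t + T0), T)


def average_prediction(y1, y2):
    """Equation (3): mean of the two preceding frames."""
    return (y1 + y2) / 2


def extrapolation_prediction(y1, y2):
    """Equation (12): twice the previous frame minus the one before it."""
    return 2 * y1 - y2


def residuals(frame_at, t):
    """Return (D_average, D_extrapolation) for the frame at time t."""
    cur, y1, y2 = frame_at(t), frame_at(t - 1), frame_at(t - 2)
    return cur - average_prediction(y1, y2), cur - extrapolation_prediction(y1, y2)


C, T, T0 = 100, 20, 50  # illustrative values
d_avg_in, d_ext_in = residuals(lambda t: fade_in(C, t, T), 10)
d_avg_out, d_ext_out = residuals(lambda t: fade_out(C, t, T, T0), 60)
```

With these values the averaging residuals come out as +3C/(2T) and -3C/(2T), in agreement with Equations (10) and (11), and both extrapolation residuals are zero.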

The factor 1/T in Equations (1) and (2) represents the rate at which the fade-in or fade-out changes over time. Equations (10) and (11) show that with conventional motion compensation, the faster the fade, the larger the prediction residual and the lower the coding efficiency. According to the third and fourth features of the present invention, by contrast, high prediction efficiency is obtained regardless of the fade rate.

In addition to the first and third features, the fifth feature of the present invention is that the motion vector that is coded is a motion vector for one specific reference frame among the plurality of reference frames.

In addition to the second and fourth features, the sixth feature is that the received motion vector data is a motion vector for one specific reference frame among the plurality of reference frames, and that the motion vectors for the other reference frames are generated by scaling this motion vector data according to the inter-frame distance between the frame being decoded and each reference frame.

The first to fourth features of the present invention make it possible to obtain higher prediction efficiency than before for fade images and the like by using a plurality of reference images. However, multiplexing a separate motion vector for each reference image into the coded data for every coded macroblock increases the coding overhead. Coding schemes such as ITU-T H.263 provide a coding method called direct mode, in which no motion vector is sent for a B picture; instead, the motion vector of the P picture that spans the B picture is scaled according to the inter-frame distance between the reference image and the image being coded and used as the motion vector of the B picture. This models the moving picture being coded as having approximately constant motion velocity, or being still, over a short span of a few frames, and in many cases it is effective in reducing the amount of motion vector code.

According to the fifth and sixth features of the present invention, as in the direct mode for B pictures, only one of the motion vectors for the plurality of reference frames is coded for a P picture as well, and the decoding side scales the received motion vector according to the inter-frame distance to each reference image. The coding efficiency gains of the first to fourth features can thus be obtained without increasing the coding overhead.
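
The scaling in the fifth and sixth features can be sketched as below. The sketch assumes the coded vector belongs to a reference `coded_distance` frames away and is rescaled for a reference `target_distance` frames away under the constant-velocity model; rounding to the codec's motion vector precision is left out because the text does not specify it. The function name and parameters are hypothetical.

```python
def scale_motion_vector(mv, coded_distance, target_distance):
    """Rescale a motion vector (dx, dy) from one inter-frame distance to another.

    Assumes constant motion velocity over the frames involved.
    """
    dx, dy = mv
    factor = target_distance / coded_distance
    return (dx * factor, dy * factor)
```

For example, a vector of (4, -2) coded against the immediately preceding frame becomes (8, -4) for the frame two frames back.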

In addition to the fifth feature, the seventh feature of the present invention is that the motion vector for the one specific reference frame is a motion vector normalized according to the inter-frame distance between that reference frame and the frame being coded.

In addition to the sixth feature, the eighth feature is that the received motion vector for the one specific reference frame is a motion vector normalized according to the inter-frame distance between that reference frame and the frame being coded.

According to the seventh and eighth features of the present invention, the reference scale of the coded motion vector stays constant even when the inter-frame distance changes, so the motion vector for each of the plurality of reference frames can be scaled using only the inter-frame distance between that reference frame and the frame being coded. Arbitrary scaling requires division, but because the coded motion vector is normalized by the inter-frame distance, scaling can be done by multiplication alone, reducing the cost of coding and decoding.
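
A minimal sketch of this multiplication-only scaling follows. It assumes the normalized vector expresses displacement per one-frame interval in integer units; the unit, the function name, and the list-of-distances interface are assumptions made for illustration.

```python
def vectors_from_normalized(normalized_mv, distances):
    """Derive one motion vector per reference frame from a normalized vector.

    normalized_mv: (dx, dy) displacement per one-frame interval (assumed unit).
    distances: inter-frame distance from the current frame to each reference.
    Only multiplications are needed; no division by a coded distance.
    """
    dx, dy = normalized_mv
    return [(dx * d, dy * d) for d in distances]
```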

In addition to the first and third features, the ninth feature of the present invention is that the motion vectors that are coded are a first motion vector for one specific reference frame among the plurality of reference frames and a plurality of motion vectors for the other reference frames, and that each of the latter is coded as a difference vector between that motion vector and the first motion vector scaled according to the inter-frame distance between the frame being coded and the corresponding reference frame.

In addition to the second and fourth features, the tenth feature is that the received motion vector data consists of a motion vector for one specific reference frame among the plurality of reference frames and difference vectors for the other reference frames, and that the motion vector for each reference frame other than the specific one is generated by scaling the motion vector data according to the inter-frame distance between the frame being decoded and that reference frame and adding the corresponding difference vector.
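
The differential coding of the ninth and tenth features can be sketched as an encode/decode pair. The sketch assumes exact rational scaling with no rounding and a single additional reference frame; the function names are hypothetical.

```python
from fractions import Fraction


def scaled(mv, d_first, d_other):
    """Scale the first motion vector to the distance of another reference frame."""
    factor = Fraction(d_other, d_first)
    return (mv[0] * factor, mv[1] * factor)


def encode_difference(mv_first, mv_other, d_first, d_other):
    """Encoder side: difference between the true vector and the scaled first vector."""
    px, py = scaled(mv_first, d_first, d_other)
    return (mv_other[0] - px, mv_other[1] - py)


def decode_other_vector(mv_first, diff, d_first, d_other):
    """Decoder side: scaled first vector plus the received difference vector."""
    px, py = scaled(mv_first, d_first, d_other)
    return (px + diff[0], py + diff[1])
```

When the motion is exactly uniform the difference vector is (0, 0); when it is not, the difference is typically small, so it costs less to code than a full second vector.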

According to the fifth and sixth features of the present invention, for still images or video with constant motion velocity, prediction efficiency can be improved using a plurality of reference frames without increasing the coding overhead of motion vector information. When the motion velocity is not constant, however, simple scaling of the motion vector may not give sufficient prediction efficiency.

Meanwhile, dual-prime prediction, one of the prediction modes of MPEG2 video coding, performs motion prediction using two consecutive fields and codes a motion vector for one field together with a difference vector between the motion vector for the other field and the first motion vector scaled according to the inter-field distance. Motion vectors are expressed with 1/2-pixel precision, so averaging the reference macroblocks of the two fields yields a loop filter effect from an adaptive spatio-temporal filter while limiting the increase in coding overhead, which contributes greatly to coding efficiency.

According to the ninth and tenth features of the present invention, in addition to the loop filter effect of an adaptive spatio-temporal filter with limited coding overhead, as in dual-prime prediction, prediction efficiency for fade images and the like is also improved, giving higher coding efficiency than before.

In addition to the first, third, fifth, seventh, and ninth features, the eleventh feature of the present invention is that the prediction mode consists of a first flag, indicating whether prediction uses one specific reference frame or a plurality of reference frames, and a second flag, indicating whether the prediction using a plurality of reference frames is prediction by the average of a plurality of reference macroblocks or prediction by linear extrapolation or linear interpolation of a plurality of reference macroblocks, and that the second flag is included in the header data of a coded frame or in the header data for a group of coded frames.

In addition to the second, fourth, sixth, eighth, and tenth features, the twelfth feature is that the received prediction mode consists of a first flag, indicating whether prediction uses one specific reference frame or a plurality of reference frames, and a second flag, indicating whether the prediction using a plurality of reference frames is prediction by the average of a plurality of reference macroblocks or prediction by linear extrapolation or linear interpolation of a plurality of reference macroblocks, and that the second flag is received as part of the header data of a coded frame or of the header data for a group of coded frames.

As described above, according to the present invention, prediction efficiency is improved and high-efficiency, high-quality coding becomes possible by adaptively switching, for each macroblock of the coded frame, among generating the predictive macroblock signal from only a specific one of the plurality of reference frames, from the average of a plurality of reference images, or by linear extrapolation or linear interpolation of a plurality of reference images.

For example, in parts of the picture where the background is covered and uncovered over time within the same frame, prediction from only a specific one of the reference frames (prediction mode 1 here) is effective. In parts with little temporal variation, prediction from the average of a plurality of reference images (prediction mode 2 here) provides a loop filter effect that removes coding distortion in the reference images. When the amplitude of the video signal varies over time, as in a fade image, linear extrapolation or linear interpolation of a plurality of reference images (prediction mode 3 here) improves prediction efficiency.

In conventional coding schemes, when the optimal prediction mode is switched selectively for each macroblock in this way, a flag indicating the prediction mode is normally coded in the header data of each macroblock. Switching among many prediction modes, however, increases the coding overhead of the flag that indicates the prediction mode.

According to the eleventh and twelfth features of the present invention, each coded frame is restricted to either the combination of prediction modes 1 and 2 or the combination of prediction modes 1 and 3. A second flag indicates which combination is in use, and a first flag indicates whether the mode is prediction mode 1 or the other mode of the combination (prediction mode 2 or 3). The second flag is included in the header data of the coded frame. The first flag can change for each macroblock and is included in the macroblock header data. This reduces the overhead related to the prediction mode in the coded data.
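
The two-level signalling can be sketched as a small lookup. The bit assignments below (0 or 1 for each flag) are assumptions for illustration; the text only states what each flag distinguishes, not how it is coded.

```python
MODE_SINGLE, MODE_AVERAGE, MODE_LINEAR = 1, 2, 3


def resolve_mode(second_flag, first_flag):
    """Combine the frame-level and macroblock-level flags into a prediction mode.

    second_flag (frame header, assumed coding): 0 -> modes {1, 2}, 1 -> modes {1, 3}.
    first_flag (macroblock header, assumed coding): 0 -> mode 1, 1 -> the other mode.
    """
    if first_flag == 0:
        return MODE_SINGLE
    return MODE_LINEAR if second_flag == 1 else MODE_AVERAGE


def decode_frame_modes(second_flag, first_flags):
    """Resolve the mode of every macroblock in one frame."""
    return [resolve_mode(second_flag, f) for f in first_flags]
```

In a frame marked as a fade (second flag 1 under the assumed coding), `decode_frame_modes(1, [0, 1, 1, 0])` yields modes 1, 3, 3, 1, so each macroblock still chooses between single-frame prediction and linear prediction with one flag.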

When the amplitude of the video signal changes with time, as in a fade image, the change is uniform within the frame. There is therefore no need to switch between prediction mode 2 and prediction mode 3 for each macroblock, and fixing the choice for each frame causes no loss of prediction efficiency.

On the other hand, the background may be hidden or revealed over time within a frame regardless of any temporal change in the amplitude of the video signal, so fixing the prediction mode for each frame would lower prediction efficiency. The first flag is therefore needed to switch the optimal prediction mode for each macroblock. By dividing the flags indicating the prediction mode between the frame header and the macroblock header as described above, encoding overhead can be reduced without lowering prediction efficiency.
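
The division of the flags between the two header levels can be sketched as follows. This is only an illustration under stated assumptions: the bit values assigned to each flag, the function names, and the use of one fixed-length bit per flag are hypothetical and are not specified in the description.

```python
# Mode numbering follows the text: 1 = single reference frame,
# 2 = average of references, 3 = linear extrapolation/interpolation.
SINGLE_REF, AVERAGE, LINEAR = 1, 2, 3


def resolve_mode(second_flag: int, first_flag: int) -> int:
    """Combine the frame-level and macroblock-level flags into a prediction mode.

    second_flag (frame header): 0 -> frame uses modes {1, 2}, 1 -> modes {1, 3}.
    first_flag (macroblock header): 0 -> mode 1, 1 -> the frame's other mode.
    These bit assignments are assumptions made for this sketch.
    """
    if first_flag == 0:
        return SINGLE_REF
    return LINEAR if second_flag == 1 else AVERAGE


def flag_bits_per_frame(num_macroblocks: int) -> int:
    """Signalling cost if each flag took one bit: one frame bit plus one bit per macroblock."""
    return 1 + num_macroblocks
```

With three modes selectable per macroblock, each macroblock would need more than one bit on average for the mode flag; here a single frame-level flag lets every macroblock flag remain a two-way choice.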

A thirteenth feature of the present invention is that, in motion compensated predictive interframe encoding that refers to a plurality of video frames for each macroblock, a prediction macroblock is generated by linear prediction from the plurality of reference frames, a prediction error signal between the prediction macroblock and the macroblock to be encoded and a motion vector are encoded for each macroblock, and a set of prediction coefficients for the linear prediction is encoded for each frame.

A fourteenth feature of the present invention is that, in addition to the thirteenth feature, the plurality of reference frames are frames temporally earlier than the encoding target frame.

A fifteenth feature of the present invention is that, in decoding motion compensated predictive interframe encoded data that refers to a plurality of video frames for each macroblock, the motion vector data and prediction error signal encoded for each macroblock and the set of prediction coefficients encoded for each frame are received, a prediction macroblock is generated from the plurality of reference frames according to the motion vector and the prediction coefficients, and the generated prediction macroblock and the prediction error signal are added.

A sixteenth feature of the present invention is that, in addition to the fifteenth feature, the plurality of reference frames are frames temporally earlier than the encoding target frame.

According to the thirteenth to sixteenth features of the present invention, prediction coefficients can be set arbitrarily in the time direction. Prediction efficiency can therefore be improved not only when the video signal amplitude changes at a constant rate over time, as in a fade image, but also for arbitrary temporal variation of the amplitude, because the encoding side can use an optimal set of prediction coefficients. In addition, because the prediction coefficients are multiplexed into the encoded data and transmitted, the same linear prediction as at the time of encoding can be performed at the time of decoding, which enables highly efficient predictive encoding.
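
A minimal sketch of linear prediction with one coefficient set per frame is given below. The function name, the flat-list block representation, the rounding, and the clipping to an 8-bit range are assumptions made for illustration; the description does not specify them.

```python
import math


def linear_predict(ref_blocks, coeffs, max_value=255):
    """Predict a block as a weighted sum of equally sized reference blocks.

    ref_blocks: list of blocks, each a flat list of pixel values.
    coeffs: one weight per reference frame; the same set is shared by every
            macroblock of the frame and would be sent once in the frame header.
    Rounding (half up) and clipping to [0, max_value] are assumptions.
    """
    if len(ref_blocks) != len(coeffs):
        raise ValueError("one coefficient is needed per reference block")
    predicted = []
    for pixels in zip(*ref_blocks):
        value = sum(w * p for w, p in zip(coeffs, pixels))
        predicted.append(max(0, min(max_value, math.floor(value + 0.5))))
    return predicted
```

For instance, coefficients `[2, -1]` extrapolate a linear fade from the two most recent frames, while `[0.5, 0.5]` give the plain average; other coefficient sets can track a non-linear change in amplitude.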

In the present invention, encoding efficiency can be improved by prediction from a plurality of reference frames. As with B pictures in MPEG, which are predicted from temporally preceding and following frames, the reference frames may be a plurality of temporally past and future frames. Alternatively, as in MPEG encoding with only I pictures and P pictures, only temporally past frames may be referenced, with a plurality of past P pictures and I pictures used as reference images.

Such a configuration enables encoding with higher image quality than conventional MPEG encoding. In particular, even in encoding P pictures, which use only past images, using a plurality of past reference frames, unlike the conventional approach, can improve encoding efficiency significantly. Encoding without B pictures requires no delay for reordering encoded frames and therefore permits low-delay encoding. According to the present invention, a larger improvement in encoding efficiency than before is thus obtained even in low-delay encoding.

Furthermore, the present invention provides a moving picture encoding method that performs motion compensated predictive interframe encoding using at least one reference frame for each encoding target block included in an encoding target frame of an input moving picture, the method comprising: a step of selecting, for each encoding target block, either a first prediction block generation mode that generates a prediction block signal from a single reference frame or a second prediction block generation mode that generates a prediction block signal by linear sum prediction of a plurality of reference blocks cut out from a plurality of reference frames; a step of encoding a difference signal between the selected prediction block signal and the signal of the encoding target block; a step of selecting, for each set of a plurality of pixel blocks in the encoding target frame or for each encoding target frame, whether the linear sum prediction is an average value prediction of the plurality of reference blocks or a linear interpolation prediction based on the display times of the plurality of reference frames and the encoding target frame; a step of encoding, for each encoding target block or for each set of a plurality of encoding target blocks, first encoding mode information indicating which of the first and second prediction block generation modes was selected when the prediction block signal was generated; and a step of encoding, for each set of a plurality of pixel blocks of the encoding target frame or for each encoding target frame, second encoding mode information indicating which of the average value prediction and the linear interpolation prediction was selected for the linear sum prediction.

On the other hand, the present invention provides a moving picture decoding method that performs motion compensated predictive interframe decoding using at least one reference frame for each decoding target block included in a decoding target frame of a moving picture, the method comprising: a step of decoding a prediction error signal of the signal of the decoding target block with respect to a prediction block signal; a step of receiving and decoding, for each decoding target block or for each set of a plurality of decoding target blocks, first encoding mode information indicating which of a first prediction block generation mode, which generates the prediction block signal from a single reference frame, and a second prediction block generation mode, which generates the prediction block signal by linear sum prediction of a plurality of reference blocks cut out from a plurality of reference frames, was selected when the prediction block signal was generated on the encoding side; a step of receiving and decoding, for each set of a plurality of pixel blocks of the decoding target frame or for each decoding target frame, second encoding mode information indicating which of an average value prediction of the plurality of reference blocks and a linear interpolation prediction based on the display times of the plurality of reference frames and the encoding target frame was selected for the linear sum prediction; a step of generating the prediction block signal according to the decoded first encoding mode information and second encoding mode information; and a step of generating a reproduced moving picture signal using the generated prediction block signal and the decoded prediction error signal.

According to the present invention, prediction efficiency can be greatly improved for video such as fade-ins and fade-outs, which conventional moving picture encoding schemes such as MPEG handle poorly, without requiring a large increase in the computation or cost of encoding and decoding. The overhead of the encoded data is also small, so a high-quality, highly efficient moving picture encoding and decoding scheme can be provided.

FIG. 1 is a block diagram of a moving picture encoding method according to an embodiment of the present invention. For an input moving picture signal 100, prediction macroblock generation means 119 generates prediction images from the frame stored in a first reference frame memory 117 and the frame stored in a second reference frame memory 118, and prediction macroblock selection means 120 selects the optimal prediction macroblock. A prediction error signal 101 between the input signal and the prediction signal is subjected to DCT (discrete cosine transform) 112, quantization 113, and variable length encoding 114, and encoded data 102 is output. The motion vector information and prediction mode information described later are also encoded into the output encoded data 102. The signal quantized by the quantization means 113 is dequantized by dequantization means 115 and added to a prediction signal 106 to generate a local decoded image 103, which is written to the reference frame memory 117.

In this embodiment, the prediction error signal 101 is encoded by DCT, quantization, and variable length encoding. However, the DCT may be replaced with, for example, a wavelet transform, or the variable length encoding may be replaced with arithmetic encoding.

In this embodiment, the first reference frame memory 117 stores the local decoded image of the frame encoded immediately before, and the second reference frame memory 118 stores the local decoded image of the frame encoded before that. The prediction macroblock generation means 119 generates a prediction macroblock signal 130 from only the image in the first reference frame memory 117, a prediction macroblock signal 131 from only the image in the second reference frame memory 118, a prediction macroblock signal 132 obtained by averaging the reference macroblock signals cut out from the first and second reference frame memories, and a prediction macroblock signal 133 obtained by doubling the amplitude of the reference macroblock signal cut out from the first reference frame memory 117 and subtracting the reference macroblock signal cut out from the second reference frame memory 118. Each of these prediction macroblock signals is cut out at a plurality of positions in the frame, so that a plurality of prediction macroblock signals are generated.
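
The four candidate signals can be sketched as below for a pair of reference macroblocks that have already been cut out at some position. The function names, the rounding of the average, and the clipping to an 8-bit range are assumptions for illustration; only the four formulas themselves come from the description.

```python
def clip(value, max_value=255):
    """Clamp a pixel value to [0, max_value] (an assumption; the text does not mention clipping)."""
    return max(0, min(max_value, value))


def candidate_predictions(ref1_block, ref2_block):
    """Return the four candidate signals keyed by the reference numerals 130 to 133.

    ref1_block: block from the first reference frame memory (most recent frame).
    ref2_block: block from the second reference frame memory (the frame before it).
    """
    return {
        130: list(ref1_block),                                            # first reference only
        131: list(ref2_block),                                            # second reference only
        132: [(a + b + 1) // 2 for a, b in zip(ref1_block, ref2_block)],  # average, rounded
        133: [clip(2 * a - b) for a, b in zip(ref1_block, ref2_block)],   # 2 * ref1 - ref2
    }
```

Signal 133 is the linear extrapolation for equally spaced frames: if a pixel was 80 two frames ago and 100 one frame ago, it predicts 120 for the current frame.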

The prediction macroblock selection means 120 calculates, for each of the plurality of prediction macroblock signals generated by the prediction macroblock generation means 119, the difference from the encoding target macroblock signal cut out from the input moving picture signal 100, and selects, for each macroblock to be encoded, the prediction macroblock with the smallest error. The position of the selected prediction macroblock relative to the encoding target macroblock is encoded as a motion vector, and the method by which the selected prediction macroblock was generated (one of 130 to 133 in FIG. 1) is encoded as a prediction mode, each for every macroblock.
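
A minimal sketch of this selection step follows. The sum of absolute differences is used as the error measure purely as an assumption, since the description only requires that the error be minimal; the function names and the dictionary layout are likewise hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def select_prediction(target_block, candidates):
    """Pick the candidate closest to the target block.

    candidates: dict mapping (prediction_mode, motion_vector) to a predicted block.
    Returns the winning (prediction_mode, motion_vector) key and its error.
    """
    best_key = min(candidates, key=lambda key: sad(target_block, candidates[key]))
    return best_key, sad(target_block, candidates[best_key])
```

The winning key corresponds to the prediction mode and motion vector that would be written into the macroblock header, and the residual against the winning block is what goes on to the transform stage.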

When the moving picture signal consists of luminance and color difference signals, the prediction error signal is generated by applying the same motion vector and prediction mode to each signal component of each macroblock.

FIG. 2 is a block diagram of a moving picture decoding method according to the embodiment of the present invention. The moving picture decoding method of FIG. 2 receives and decodes encoded data produced by the moving picture encoding method according to the embodiment shown in FIG. 1.

The variable length codes of the input encoded data 200 are decoded by variable length code decoding means 214, and a prediction error signal 201 and motion vector information and prediction mode information 202 are extracted. The prediction error signal 201 is subjected to inverse quantization 215 and inverse DCT 216 and is then added to a prediction signal 206 to generate a decoded image 203.

The decoded image 203 is written to a first reference frame memory 217. The prediction signal 206 is generated from the most recently decoded image signal recorded in the first reference frame memory 217 and the image signal in a second reference frame memory 218, which holds the moving picture signal decoded before that. Prediction macroblock generation means 219 and prediction macroblock selection means 220 generate it according to the motion vector and prediction mode extracted from the encoded data 200, so that it is identical to the prediction macroblock signal used at the time of encoding.

FIG. 3 schematically shows an example of the inter-frame prediction relationship according to the embodiment of the present invention. In the figure, 302 is the encoding target frame, 301 is the frame immediately before it, and 300 is the frame before that. While frame 302 is being encoded or decoded, the first reference frame memory (117 in FIG. 1 or 217 in FIG. 2) holds the decoded image of frame 301, and the second reference frame memory (118 in FIG. 1 or 218 in FIG. 2) holds frame 300.

Reference numeral 305 in FIG. 3 denotes the encoding target macroblock. A prediction macroblock is generated using either or both of the reference macroblock 303 of reference frame 300 and the reference macroblock 304 of reference frame 301. In the figure, 306 and 307 are motion vectors indicating the positions of the reference macroblocks 303 and 304, respectively. At the time of encoding, a search is made for the motion vector and prediction mode best suited to the macroblock 305. At the time of decoding, the prediction macroblock signal is generated using the motion vector and prediction mode contained in the encoded data.

FIGS. 4 and 5 show examples of inter-frame prediction relationships that use three or more reference frames according to the embodiment of the present invention. FIG. 4 is an example that uses a plurality of past reference frames, and FIG. 5 is an example that uses a plurality of past and future reference frames.

In FIG. 4, 404 denotes the encoding target frame, and 400 to 403 are reference frames for frame 404. Reference numeral 413 denotes the macroblock to be encoded. In encoding, for each macroblock to be encoded, reference macroblocks (409 to 412 in the figure) are cut out from the respective reference frames according to the motion vector for each reference frame (405 to 408 in the figure), and a prediction macroblock is generated by linear prediction from the plurality of reference macroblocks. Next, among the prediction modes that use either one of the plurality of reference macroblocks or the prediction macroblock obtained by linear prediction, the combination of motion vector and prediction mode that minimizes the prediction error is selected. One set of linear prediction coefficients is determined for each encoded frame, for example from the temporal change in average luminance between frames. The determined set of prediction coefficients is encoded as header data of the encoded frame, and the motion vector, prediction mode, and prediction error signal of each macroblock are encoded for each macroblock.

At the time of decoding, the set of linear prediction coefficients received for each frame is used to generate, for each macroblock, a prediction macroblock from the plurality of reference frames according to the motion vector and prediction mode information, and decoding is performed by adding the prediction macroblock to the prediction error signal.

In FIG. 5, 502 denotes the encoding target frame, and 500, 501, 503, and 504 denote reference frames. In the case of FIG. 5, the frames are reordered at the time of encoding and decoding so that they are processed in the order 500, 501, 503, 504, 502. In encoding, a plurality of local decoded frames are used as reference frames, and in decoding, a plurality of already decoded frames are used as reference frames. For the encoding target macroblock 511, as in the example of FIG. 4, either one of the reference macroblocks 509, 510, 512, and 513 or a prediction signal obtained from them by linear prediction is selected for each macroblock and used for encoding.

FIG. 6 shows a method of encoding and decoding motion vector information according to the embodiment of the present invention. When, as in the example of FIG. 3, interframe encoding with a plurality of reference frames generates a prediction macroblock signal from a plurality of reference macroblock signals for each macroblock to be encoded, a plurality of pieces of motion vector information must be encoded for each macroblock. Consequently, as the number of referenced macroblocks grows, the overhead of the motion vector information to be encoded increases and encoding efficiency falls. The example of FIG. 6 shows a case in which, when reference macroblock signals are cut out from two reference frames to generate a prediction macroblock signal, one motion vector is used together with a vector obtained by scaling that motion vector according to the inter-frame distance.

In the figure, 602 is the encoding target frame, and 601 and 600 are reference frames. Reference numerals 611 and 610 denote motion vectors. The black dots indicate pixel positions in the vertical direction, and the white dots indicate interpolation points at 1/4 pixel precision. FIG. 6 thus shows an example in which motion compensated prediction is performed at 1/4 pixel precision. The pixel precision of motion compensation, such as 1 pixel, 1/2 pixel, or 1/8 pixel, is defined by each encoding scheme. Usually the motion vector is expressed at the motion compensation precision, and the reference image is generated by interpolation from the image data of the reference frame.

In FIG. 6, for the pixel 605 to be encoded, the point 603 in reference frame 600, which is 2.5 pixels away in the vertical direction, is referenced, and the motion vector 610 representing this 2.5 pixel displacement is encoded. The prediction for the same pixel 605 from reference frame 601, on the other hand, is generated by scaling the encoded motion vector 610 according to the inter-frame distance. Here, taking the inter-frame distance into account, the motion vector for frame 601 is 2.5/2 = 1.25 pixels, and the pixel 604 in reference frame 601 is used as the reference pixel for the pixel 605 of the encoding target frame 602.

By scaling the motion vector with the same precision at the time of encoding and decoding, only one motion vector needs to be encoded for each macroblock even when the encoding target macroblock refers to a plurality of frames, so an increase in encoding overhead can be prevented. If the scaled motion vector does not fall on a sample point at the motion compensation precision, it is rounded to the nearest sample point.
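
The scaling and rounding can be sketched in integer arithmetic as follows, with vector components held in units of the motion compensation precision (quarter pixels in FIG. 6). The function name and the choice to round halves away from zero for negative components are assumptions; the description only states that the result is rounded.

```python
def scale_motion_vector(mv_units, dist_target, dist_coded):
    """Scale a motion vector by dist_target / dist_coded.

    mv_units: (x, y) in integer units of the motion compensation precision,
              e.g. quarter pixels, so 2.5 pixels is 10 units.
    dist_coded: positive frame distance of the reference the vector was encoded for.
    dist_target: frame distance of the reference the scaled vector is needed for.
    Results are rounded to the nearest unit, halves away from zero (an assumption).
    """
    def scale(component):
        value = component * dist_target
        sign = -1 if value < 0 else 1
        return sign * ((2 * abs(value) + dist_coded) // (2 * dist_coded))

    return tuple(scale(c) for c in mv_units)
```

For the vector of FIG. 6, `scale_motion_vector((0, 10), 1, 2)` gives `(0, 5)`, that is, 1.25 pixels. Encoder and decoder must use the identical rounding rule, otherwise their predictions drift apart.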

FIG. 7 shows a method of encoding and decoding motion vector information, different from that of FIG. 6, according to the embodiment of the present invention. In the example of FIG. 6, the overhead of motion vectors in the encoded data can be reduced efficiently when the temporal speed of motion in the moving picture is constant. However, when the temporal motion is monotonic but its speed is not constant, using a simply scaled motion vector may lower prediction efficiency and thereby lower encoding efficiency. In FIG. 7, as in FIG. 6, a prediction pixel for the pixel 706 is generated from reference pixels in the two reference frames 700 and 701. Here, the pixel 703 of frame 700 and the pixel 705 of frame 701 are assumed to be referenced.

As in the example of FIG. 6, the motion vector 710 for frame 700 is encoded. In addition, the motion vector 711 for frame 701 is encoded as a difference vector 720 relative to the vector obtained by scaling the motion vector 710. Scaling the motion vector 710 by 1/2 gives the position of the pixel 704 in frame 701, and the difference vector 720, which represents the displacement between the actual prediction pixel 705 and the pixel 704, is encoded. Because this difference vector is normally small for temporally monotonic motion, efficient encoding can be performed even when the speed of motion is not constant, without lowering prediction efficiency and with little increase in motion vector overhead.
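
A small sketch of this difference-vector scheme is shown below, again with components in units of the motion compensation precision. The function names and the rounding rule are assumptions; the default distances of 2 and 1 correspond to the 1/2 scaling described for FIG. 7.

```python
def _scale(component, num, den):
    """Scale by num/den, rounding to the nearest unit, halves away from zero (assumed rule)."""
    value = component * num
    sign = -1 if value < 0 else 1
    return sign * ((2 * abs(value) + den) // (2 * den))


def encode_vectors(mv_far, mv_near, dist_far=2, dist_near=1):
    """Return (mv_far, diff): the far vector as is, the near one as a difference
    from the scaled far vector."""
    predicted = tuple(_scale(c, dist_near, dist_far) for c in mv_far)
    diff = tuple(a - p for a, p in zip(mv_near, predicted))
    return mv_far, diff


def decode_vectors(mv_far, diff, dist_far=2, dist_near=1):
    """Rebuild both motion vectors from the encoded vector and the difference vector."""
    predicted = tuple(_scale(c, dist_near, dist_far) for c in mv_far)
    mv_near = tuple(p + d for p, d in zip(predicted, diff))
    return mv_far, mv_near
```

When the motion is nearly uniform the difference vector is close to zero, so under a typical variable length code it would cost far fewer bits than a second full vector.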

FIG. 8 shows yet another method of encoding and decoding motion vector information according to the embodiment of the present invention. In the example of FIG. 8, frame 803 is the encoding target frame, and frames 801 and 800 are the reference frames, with frame 802 skipped. For the pixel 806, the pixel 804 of reference frame 800 and the pixel 805 of reference frame 801 are the reference pixels used to generate the prediction pixel.

As in the example of FIG. 6 or FIG. 7, it would be possible to encode the motion vector 811 for reference frame 800 and to generate the motion vector for reference frame 801 from a scaled version of the motion vector 811. In the case of FIG. 8, however, the inter-frame distances between the reference frames and the encoding target frame require the motion vector 811 to be scaled by 2/3. Not only in the example of FIG. 8 but whenever arbitrary scaling is performed, the denominator may be any integer that is not a power of two, so a division is required. Motion vector scaling is needed both in encoding and in decoding, and division in particular is costly in circuit cost or computation time in both hardware and software, so it increases the cost of encoding and decoding.

In FIG. 8, by contrast, a motion vector 810 obtained by normalizing the motion vector 811 by the inter-frame distance is encoded, together with difference vectors between the actual motion vectors and the vectors obtained by scaling the normalized motion vector 810 according to the inter-frame distance between the encoding target frame and each reference frame. That is, the reference pixel 804 is obtained from the normalized motion vector 810 multiplied by three plus the difference vector 820, and the reference pixel 805 is obtained from the normalized motion vector 810 multiplied by two plus the difference vector 821. The configuration of FIG. 8 prevents an increase in motion vector encoding overhead without lowering prediction efficiency, and because motion vector scaling is achieved by multiplication alone, it also keeps down the computational cost of encoding and decoding.
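
The normalized-vector scheme can be sketched as follows. Reconstruction uses only multiplication and addition, as the description states. How the encoder arrives at the normalized vector is not specified; the rounded division used here, the choice of the first reference as the basis, and all function names are assumptions for illustration.

```python
def _round_div(value, den):
    """Integer division rounded to nearest, halves away from zero (assumed rule)."""
    sign = -1 if value < 0 else 1
    return sign * ((2 * abs(value) + den) // (2 * den))


def encode_normalized(actual_mvs, distances, base_index=0):
    """Return (normalized_mv, diffs) for motion vectors toward several reference frames.

    actual_mvs: list of (x, y) vectors, one per reference frame.
    distances: frame distance from the target frame to each reference frame.
    The normalized vector is derived from actual_mvs[base_index] (an assumption).
    """
    base = distances[base_index]
    normalized = tuple(_round_div(c, base) for c in actual_mvs[base_index])
    diffs = [
        tuple(a - n * d for a, n in zip(mv, normalized))
        for mv, d in zip(actual_mvs, distances)
    ]
    return normalized, diffs


def reconstruct(normalized, diffs, distances):
    """Rebuild each vector as normalized * distance + difference (no division)."""
    return [
        tuple(n * d + e for n, e in zip(normalized, diff))
        for diff, d in zip(diffs, distances)
    ]
```

With the distances of FIG. 8 (3 to frame 800 and 2 to frame 801), `reconstruct` multiplies the normalized vector by three and by two and adds the corresponding difference vectors.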

FIG. 9 is a block diagram of a moving picture encoding method according to a second embodiment of the present invention. The second embodiment uses the eleventh and twelfth features of the present invention described above, and adds fade detection means 900 for the input image 100 to the moving picture encoding embodiment shown in FIG. 1. The fade detection means 900 calculates the average luminance value of each frame of the input moving picture signal, determines that the image is a fade image when the temporal change in luminance has a constant slope, and notifies the prediction mode selection means 120 of the result 901.
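
A minimal sketch of such a fade detector is given below. The description only says that a constant slope in the frame-average luminance indicates a fade; the window of recent frames, the two thresholds, and the function names are hypothetical choices made for illustration.

```python
def mean_luma(frame):
    """Average luminance of a frame given as a flat list of luma samples."""
    return sum(frame) / len(frame)


def is_fade(frames, min_slope=1.0, tolerance=0.5):
    """Judge whether recent frames (oldest first) form a fade.

    A fade is assumed when the frame-to-frame change in mean luminance is
    non-negligible (at least min_slope) and nearly constant (each step within
    tolerance of the first). Both thresholds are assumptions of this sketch.
    """
    means = [mean_luma(f) for f in frames]
    steps = [b - a for a, b in zip(means, means[1:])]
    if len(steps) < 2:
        return False  # need at least three frames to compare slopes
    first = steps[0]
    if abs(first) < min_slope:
        return False
    return all(abs(step - first) <= tolerance for step in steps)
```

The boolean result plays the role of the detection result passed to the mode selection stage, which would then restrict the frame to one of the two mode combinations.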

When the fade detection means 900 judges the image to be a fade image, the prediction mode is restricted to either prediction from one reference frame or prediction by linear extrapolation or linear interpolation of a plurality of reference frames, and the optimal motion vector and prediction mode are determined for each macroblock. The determined motion vector and a first flag indicating the prediction mode are written in the macroblock header, and the prediction error signal is encoded. A second flag indicating the set of available prediction modes is written in the frame header data and output.

When the fade detection means 900 judges that the image is not a fade image, the prediction mode is restricted to either prediction from one reference frame or prediction by the average value of a plurality of reference frames. The optimal motion vector and prediction mode are determined in the same way, and the motion vector, the prediction mode, and the prediction signal are encoded in the same way.

When encoded data produced by the configuration of FIG. 9 is received and decoded, the prediction mode of each macroblock is determined from the first and second flags indicating the prediction mode, a prediction macroblock signal is generated from the motion vector sent for each macroblock and the determined prediction mode, and the encoded prediction error signal is decoded and added to the prediction signal to complete decoding. With this configuration, the coding overhead of the prediction mode information can be reduced, as described for the eleventh and twelfth features of the present invention.

Next, the video encoding procedure in the third embodiment of the present invention is described with reference to FIG. 10. The video frames to be encoded are input one frame at a time, and a fade image is detected, for the entire frame or for each slice composed of a plurality of pixel blocks in the frame, based on, for example, the temporal change in the intra-frame average of the luminance values (step S1). For each pixel block in the frame, a selection is made between generating a prediction pixel block signal by selecting one optimal reference frame from a plurality of reference frames (single-frame prediction) and generating a prediction pixel block by prediction based on a linear sum of two reference pixel block signals (linear sum prediction).

In linear sum prediction, when the input video is detected to be a fade image, temporal linear interpolation prediction (interpolation or extrapolation based on the temporal distance between frames) is used; when it is not a fade image, the prediction pixel block is generated from the average value of the two reference pixel block signals. Second encoding mode information, indicating whether the linear sum prediction using a plurality of frames is average value prediction or temporal linear interpolation prediction, is encoded as header data of the frame (picture) or slice (step S2).

Next, the detection result of step S1 is checked in step S3 to determine whether the input video is a fade image. When the input video is determined to be a fade image, the encoding mode with the higher coding efficiency, that is, the one that generates the smaller amount of code, is chosen (step S8) for each pixel block from the encoding mode that selects a single prediction block from a plurality of reference frames (step S5) and the encoding mode based on temporal linear interpolation prediction (step S4).

After this, first encoding mode information indicating whether single-frame prediction or linear sum prediction is used, together with other information on the selected encoding mode (identification information of the reference frames used for prediction, motion vector information, and so on), is encoded in the macroblock header area (step S10). Finally, the difference signal (prediction error signal) between the selected prediction block signal and the signal of the block to be encoded is encoded (step S11), and the respective encoded data are output.

If, on the other hand, the determination in step S3 shows that the input video is not a fade image, the better encoding mode is selected (step S9) from the single-frame prediction mode (step S6) and the average value prediction mode (step S7). The information on the encoding mode is then encoded (step S10) and the difference signal is encoded (step S11) in the same way.

Each block in the frame or slice is encoded as described above according to the fade detection result of step S1. When all the pixel blocks in one frame (picture) or one slice have been encoded (step S12), fade detection is performed on the next frame or slice to be encoded (step S1), and encoding proceeds in the same way. The above description shows an example in which one frame is encoded as one picture, but encoding may also be performed field by field, with one field treated as one picture.
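The flow of steps S2 to S12 can be sketched roughly as below. This is a simplified Python illustration, not the disclosed encoder: the candidate predictions are assumed to be computed beforehand, the sum of absolute differences stands in for the generated code amount that the description uses as the selection criterion, and the dictionary keys other than `linear_weighted_prediction_flag` and `macroblock_type` are hypothetical.

```python
def sad(block, prediction):
    """Sum of absolute differences, used here as a stand-in for the code amount."""
    return sum(abs(x - p) for x, p in zip(block, prediction))


def encode_picture(blocks, candidates_per_block, fade_detected):
    """Choose a mode per block and return (header, macroblocks).

    candidates_per_block[i] holds precomputed predictions for block i under the
    hypothetical keys "single", "average", and "temporal".
    """
    # S2: the second encoding mode information goes into the picture/slice header.
    header = {"linear_weighted_prediction_flag": 1 if fade_detected else 0}
    # S3: the fade decision fixes which linear sum prediction is allowed.
    linear_key = "temporal" if fade_detected else "average"
    macroblocks = []
    for block, candidates in zip(blocks, candidates_per_block):
        # S4-S7: evaluate single-frame prediction and the allowed linear sum prediction.
        costs = {
            "single": sad(block, candidates["single"]),
            "linear_sum": sad(block, candidates[linear_key]),
        }
        # S8/S9: keep the cheaper mode (ties go to single-frame prediction).
        mode = min(costs, key=costs.get)
        prediction = candidates["single"] if mode == "single" else candidates[linear_key]
        # S10/S11: record the first encoding mode information and the prediction error.
        residual = [x - p for x, p in zip(block, prediction)]
        macroblocks.append({"macroblock_type": mode, "residual": residual})
    return header, macroblocks  # S12: all blocks of the picture or slice are done
```

A real encoder would also search motion vectors and entropy-code the result; those stages are left out here to keep the mode decision visible.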

FIGS. 11 and 12 show the data structure of the encoded video data according to this embodiment. FIG. 11 shows part of a data structure including picture or slice header data, and FIG. 12 shows part of the macroblock data. In the header area of a picture or slice, information on the display time of the frame to be encoded, “time_info_to_be_displayed”, and a flag “linear_weighted_prediction_flag”, which is the second encoding mode information indicating whether the linear sum prediction described above is temporal linear interpolation prediction or average value prediction, are encoded. A “linear_weighted_prediction_flag” of 0 indicates average value prediction, and 1 indicates temporal linear interpolation prediction.

The encoded data of a picture or slice contains a plurality of encoded macroblock data, and each macroblock datum has the structure shown in FIG. 12. In the header area of the macroblock data, information indicating whether prediction is single-frame prediction from a selected single frame or prediction by a linear sum from a plurality of frames (the first encoding mode information) is encoded as “macroblock_type”, together with reference frame selection information, motion vector information, and so on.

FIG. 13 schematically shows the overall time-series structure of encoded video data containing the data structures shown in FIGS. 11 and 12. At the head of the encoded data, information on a plurality of encoding parameters that do not change within one encoding sequence, such as the image size, is encoded as a sequence header (SH).

Next, each image frame or field is encoded as a picture. Each picture is encoded in sequence as a pair consisting of a picture header (PH) and picture data (Picture data). In the picture header (PH), the information on the display time of the frame to be encoded, “time_info_to_be_displayed”, and the second encoding mode information, “linear_weighted_prediction_flag”, both shown in FIG. 11, are encoded as DTI and LWP, respectively. The picture data is divided into one or more slices (SLC) and encoded in sequence slice by slice.

In each slice SLC, the encoding parameters for the pixel blocks in the slice are first encoded as a slice header (SH), and one or more macroblock data (MB) are encoded in sequence following the slice header SH. In the macroblock data MB, “macroblock_type”, the first encoding mode information shown in FIG. 12, is encoded as MBT, and information on the encoding of the pixels in the macroblock, such as motion vector information (MV), is encoded. Finally, the macroblock data contains orthogonal transform coefficients (DCT) obtained by applying an orthogonal transform (for example, a discrete cosine transform) to the pixel signal or prediction error signal to be encoded and then encoding the result.

Here, the second encoding mode information “linear_weighted_prediction_flag” contained in the picture header PH may instead be encoded for each slice in the slice header SH.
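The layering of FIGS. 11 to 13 can be pictured with the following Python dataclasses. This is only a schematic model: the field types, the dictionary used for header parameters, and the helper `effective_lwp`, which lets a slice-level flag take precedence over the picture-level one, are assumptions and do not describe the actual bitstream syntax.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Macroblock:
    macroblock_type: int                   # MBT: carries the first encoding mode information
    motion_vectors: List[Tuple[int, int]]  # MV
    dct_coefficients: List[int]            # DCT: coded orthogonal transform coefficients


@dataclass
class Slice:
    header: Dict[str, int]                 # SH: per-slice encoding parameters
    macroblocks: List[Macroblock]
    # Optional slice-level copy of the flag (the variant mentioned for the slice header).
    linear_weighted_prediction_flag: Optional[int] = None


@dataclass
class Picture:
    time_info_to_be_displayed: int         # DTI
    linear_weighted_prediction_flag: int   # LWP: 0 = average, 1 = temporal linear interpolation
    slices: List[Slice]


@dataclass
class Sequence:
    header: Dict[str, int]                 # SH: sequence-wide parameters such as image size
    pictures: List[Picture]


def effective_lwp(picture: Picture, slice_: Slice) -> int:
    """Assumed rule: a flag present in the slice header overrides the picture header."""
    if slice_.linear_weighted_prediction_flag is not None:
        return slice_.linear_weighted_prediction_flag
    return picture.linear_weighted_prediction_flag
```

The model makes the granularity explicit: one LWP value per picture or slice, one MBT value per macroblock.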

Next, the video decoding procedure in this embodiment is described with reference to FIG. 14. In this embodiment, encoded data that has been encoded by the video encoding method shown in FIG. 10 and has the data structure shown in FIGS. 11 and 12 is input and decoded. The header information of a picture or slice is decoded from the input encoded data, yielding the information on the display time of the frame to be encoded, “time_info_to_be_displayed”, and the second encoding mode information “linear_weighted_prediction_flag” (step S30).

Further, for each macroblock in the picture or slice, the macroblock header information is decoded, yielding the reference frame identification information, the motion vector information, “macroblock_type” containing the first encoding mode information, and so on (step S31).

When the decoded first encoding mode information indicates single-frame prediction, a prediction block signal is generated according to prediction mode information such as the reference frame identification information and the motion vector information (step S34). When the first encoding mode information indicates prediction by a linear sum from a plurality of frames, the prediction signal is generated by either average prediction (step S35) or temporal linear interpolation prediction (step S36), according to the decoded second encoding mode information (step S33).

Next, the encoded prediction error signal is decoded and added to the generated prediction signal to produce a decoded image (step S37). The macroblocks in the picture or slice are decoded in sequence, each starting from its macroblock header, and when all the macroblocks in the picture or slice have been decoded (step S38), decoding continues from the next picture or slice header.
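As a rough illustration of steps S33 to S37, the per-block reconstruction might look like the Python sketch below. The string values for `macroblock_type`, the parameter names, and the flat pixel lists are assumptions; `rr` and `rc` denote the time distances that the description derives from the display times for temporal linear interpolation, and motion compensation is assumed to have already produced the reference blocks.

```python
def predict_block(macroblock_type, lwp_flag, refs, ref_index=0, rr=1, rc=1):
    """Generate the prediction for one block from motion-compensated reference blocks."""
    ref0, ref1 = refs
    if macroblock_type == "single":
        # S34: single-frame prediction from the identified reference frame.
        return [float(p) for p in refs[ref_index]]
    if lwp_flag == 0:
        # S35: average prediction.
        return [(a + b) / 2 for a, b in zip(ref0, ref1)]
    # S36: temporal linear interpolation or extrapolation.
    return [a + (b - a) * rc / rr for a, b in zip(ref0, ref1)]


def decode_block(macroblock_type, lwp_flag, refs, residual, ref_index=0, rr=1, rc=1):
    """S37: add the decoded prediction error to the prediction."""
    prediction = predict_block(macroblock_type, lwp_flag, refs, ref_index, rr, rc)
    return [p + r for p, r in zip(prediction, residual)]
```

The picture-level flag is read once and then reused for every macroblock whose `macroblock_type` selects linear sum prediction.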

As described above, this embodiment separates the information on the encoding mode into first encoding mode information, indicating whether prediction is single-frame prediction or prediction by a linear sum from a plurality of frames, and second encoding mode information, indicating whether the prediction by linear sum is temporal linear interpolation prediction or average prediction. Encoding the first encoding mode information for each macroblock and the second encoding mode information for each picture or slice reduces the overhead of encoding the encoding mode information while maintaining coding efficiency.

That is, the second encoding mode information indicates a global characteristic of the frame, such as a fade. Encoding it for each slice or frame therefore requires less code for the encoding mode information itself than encoding it for each macroblock, without a significant loss of coding efficiency.
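The size of this saving can be estimated with a back-of-the-envelope calculation such as the Python sketch below. The figures are illustrative assumptions only: a 720x480 picture, 16x16 macroblocks, and uncompressed one-bit flags, whereas a real bitstream would entropy-code this information.

```python
def macroblock_count(width, height, mb_size=16):
    """Number of macroblocks needed to cover a picture (rounding up at the edges)."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols * rows


def mode_flag_bits(width, height, second_flag_per_macroblock):
    """Raw flag bits per picture, assuming one bit per flag and no entropy coding."""
    macroblocks = macroblock_count(width, height)
    first_bits = macroblocks  # first encoding mode information: one per macroblock
    second_bits = macroblocks if second_flag_per_macroblock else 1
    return first_bits + second_bits


per_macroblock = mode_flag_bits(720, 480, second_flag_per_macroblock=True)
per_picture = mode_flag_bits(720, 480, second_flag_per_macroblock=False)
```

Under these assumptions, moving the second flag from the macroblock level to the picture level roughly halves the raw mode-signalling bits for the picture.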

The first encoding mode information, on the other hand, is encoded for each macroblock, so that an appropriate mode can be determined according to the individual characteristics of each pixel block (for example, video in which parts of the picture appear and disappear over time), which further improves coding efficiency.

In this way, this embodiment determines how frequently the first and second encoding modes are encoded in light of the characteristics of video, which enables highly efficient, high-quality encoding.

Next, the temporal linear interpolation prediction in this embodiment is described in detail with reference to FIGS. 15 and 16. F0, F1, F2 in FIG. 15 and F0, F2, F1 in FIG. 16 each denote temporally consecutive frames. In FIGS. 15 and 16, F2 is the frame to be encoded or decoded, and F0 and F1 are reference frames. For each of the examples in FIGS. 15 and 16, consider the case in which a pixel block in the frame to be encoded or decoded is predicted from a linear sum of two reference frames.

When the linear sum prediction is average value prediction, the prediction pixel block is generated by a simple average of the reference blocks cut out from the reference frames. Letting ref0 and ref1 be the reference pixel block signals cut out from frames F0 and F1, respectively, the prediction pixel block signal pred2 in both FIG. 15 and FIG. 16 is calculated according to equation (15).
pred2 = (ref0 + ref1) / 2 (15)
In temporal linear interpolation prediction, on the other hand, the linear sum is calculated according to the time distance between the frame to be encoded or decoded and each reference frame. As shown in FIG. 11, the information on the display time, “time_info_to_be_displayed”, is encoded in the picture or slice header area for each frame to be encoded, and at decoding time the display time of each frame is calculated from this information. Here, let the display times of frames F0, F1, and F2 be Dt0, Dt1, and Dt2, respectively.

In the example of FIG. 15, the current frame is predicted from two past frames, so the prediction is a linear extrapolation; in the example of FIG. 16, it is a linear interpolation from a future frame and a past frame. For FIGS. 15 and 16, let Rr be the time distance between the two reference frames, and let Rc be the time distance from the temporally earliest reference frame to the frame to be encoded. These are given by equation (16).
Rr = Dt1 - Dt0, Rc = Dt2 - Dt0 (16)
The linear extrapolation prediction and linear interpolation prediction based on these time distances are calculated by equation (17) in both FIG. 15 and FIG. 16.
pred2 = {(Rr - Rc) * ref0 + Rc * ref1} / Rr (17)
Equation (17) can also be rewritten as equation (18).
pred2 = ref0 + (ref1 - ref0) * Rc / Rr (18)
In images whose signal amplitude varies monotonically over time between frames, such as fade images and cross-fade images, the temporal variation of the signal amplitude can be approximated as linear over a very short time (for example, three consecutive frames). Therefore, performing temporal linear interpolation (linear extrapolation or linear interpolation) according to the time distances between the frame to be encoded and the two reference frames, as in this embodiment, produces a more accurate prediction image. As a result, the efficiency of inter-frame prediction improves, making it possible to reduce the amount of generated code without degrading image quality, or to encode with higher image quality at the same bit rate.
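Equations (16) to (18) can be checked numerically with a short Python sketch. The function names and the sample values are chosen for illustration only; the display times 0, 1, 2 are hypothetical and simply place F2 after both references (the FIG. 15 arrangement) or between them (the FIG. 16 arrangement).

```python
def time_distances(dt0, dt1, dt2):
    """Equation (16): Rr between the two references, Rc from F0 to the target frame."""
    return dt1 - dt0, dt2 - dt0


def predict_eq17(ref0, ref1, rr, rc):
    """Equation (17): weighted sum of the two reference blocks."""
    return [((rr - rc) * a + rc * b) / rr for a, b in zip(ref0, ref1)]


def predict_eq18(ref0, ref1, rr, rc):
    """Equation (18): the same prediction written as ref0 plus a scaled difference."""
    return [a + (b - a) * rc / rr for a, b in zip(ref0, ref1)]


ref0, ref1 = [100, 60], [90, 50]  # a block that darkens by 10 per frame interval

# FIG. 15 arrangement (F0, F1, F2): extrapolation beyond the later reference.
rr, rc = time_distances(dt0=0, dt1=1, dt2=2)
extrapolated = predict_eq17(ref0, ref1, rr, rc)

# FIG. 16 arrangement (F0, F2, F1): interpolation between the references.
rr, rc = time_distances(dt0=0, dt1=2, dt2=1)
interpolated = predict_eq18(ref0, ref1, rr, rc)
```

With equally spaced frames, the interpolation case reduces to the plain average of equation (15), while the extrapolation case continues the fade one step further, which average value prediction cannot do.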

The encoding and decoding processes of the present invention described above may be implemented in hardware, or part or all of the processing may be executed in software on a computer. The present invention can therefore provide, for example, the following programs.

(1) A program for causing a computer to perform motion compensated prediction interframe encoding that refers to a plurality of video frames for each macroblock, the program causing the computer to perform: a process of generating a plurality of reference macroblocks from the plurality of reference frames; a process of selecting, as a prediction macroblock, one of the plurality of reference macroblocks, the average value of the plurality of reference macroblocks, or a linear extrapolation prediction or linear interpolation prediction from the plurality of reference macroblocks; and a process of encoding the prediction error signal between the selected prediction macroblock and the macroblock to be encoded, prediction mode information, and a motion vector.

(2) A program for causing a computer to decode motion compensated prediction interframe encoded data that refers to a plurality of video frames for each macroblock, the program causing the computer to perform: a process of receiving encoded motion vector data, prediction mode information, and a prediction error signal; a process of selecting, according to the received motion vector and prediction mode, whether (a) to generate a prediction macroblock from a specific one of the plurality of reference frames, (b) to generate a plurality of reference macroblocks from the plurality of reference frames and use the average value of the plurality of reference macroblocks as the prediction macroblock, or (c) to generate the prediction macroblock by linear extrapolation prediction or linear interpolation prediction from the plurality of reference macroblocks; and a process of adding the generated prediction macroblock and the prediction error signal.

(3) A program for causing a computer to perform motion compensated prediction interframe encoding that refers to a plurality of video frames for each macroblock, the program causing the computer to perform: a process of generating a prediction macroblock by linear prediction from the plurality of reference frames; a process of encoding, for each macroblock, the prediction error signal between the prediction macroblock and the macroblock to be encoded and a motion vector; and a process of encoding, for each frame, a set of prediction coefficients of the linear prediction.

(4) A program for causing a computer to decode motion compensated prediction interframe encoded data that refers to a plurality of video frames for each macroblock, the program causing the computer to perform: a process of receiving motion vector data and a prediction error signal encoded for each macroblock and a set of prediction coefficients encoded for each frame; a process of generating a prediction macroblock from the plurality of reference frames according to the received motion vector and prediction coefficients; and a process of adding the generated prediction macroblock and the prediction error signal.

FIG. 1: Block diagram of a video encoding method according to an embodiment of the present invention
FIG. 2: Block diagram of a video decoding method according to an embodiment of the present invention
FIG. 3: Diagram showing the relationship of inter-frame prediction according to an embodiment of the present invention
FIG. 4: Diagram showing the relationship of inter-frame prediction according to an embodiment of the present invention
FIG. 5: Diagram showing the relationship of inter-frame prediction according to an embodiment of the present invention
FIG. 6: Diagram showing an example of a method of encoding and decoding motion vector information according to an embodiment of the present invention
FIG. 7: Diagram showing an example of a method of encoding and decoding motion vector information according to an embodiment of the present invention
FIG. 8: Diagram showing an example of a method of encoding and decoding motion vector information according to an embodiment of the present invention
FIG. 9: Block diagram of a video encoding method according to an embodiment of the present invention
FIG. 10: Flowchart showing the procedure of a video encoding method according to an embodiment of the present invention
FIG. 11: Diagram showing an example of the data structure of a picture header or slice header of encoded video data in the same embodiment
FIG. 12: Diagram showing an example of the data structure of a macroblock of encoded video data in the same embodiment
FIG. 13: Diagram showing an example of the data structure of the whole of the encoded video data according to an embodiment of the present invention
FIG. 14: Flowchart showing the procedure of a video decoding method according to the same embodiment
FIG. 15: Diagram explaining temporal linear interpolation in the same embodiment
FIG. 16: Diagram explaining temporal linear interpolation in the same embodiment

Explanation of symbols

112: DCT means; 113: quantization means; 115, 215: inverse quantization means; 116, 216: inverse DCT means; 117, 217: first reference frame memory; 118, 218: second reference frame memory; 119, 219: prediction macroblock generation means; 120, 220: prediction macroblock selection means; 900: fade detection means

Claims

A video encoding method that performs motion compensated prediction interframe encoding using at least one reference frame for each block to be encoded contained in a frame to be encoded of an input video, the method comprising:
a step of selecting, for each coding block, either a first prediction block generation mode that generates a prediction block signal from a single reference frame or a second prediction block generation mode that generates the prediction block signal by linear sum prediction from a plurality of reference blocks cut out from a plurality of reference frames;
a step of encoding a difference signal between the selected prediction block signal and the signal of the block to be encoded;
a step of selecting, for each plurality of pixel blocks in the frame to be encoded or for each frame to be encoded, whether the linear sum prediction is average value prediction of the plurality of reference blocks or linear interpolation prediction based on the display times of the plurality of reference frames and the frame to be encoded;
a step of encoding, for each block to be encoded or for each plurality of blocks to be encoded, first encoding mode information indicating which of the first and second prediction block generation modes was selected when the prediction block signal was generated; and
a step of encoding, for each plurality of pixel blocks of the frame to be encoded or for each frame to be encoded, second encoding mode information indicating which of the average value prediction and the linear interpolation prediction was selected for the linear sum prediction.

A moving image encoding apparatus that performs motion compensated prediction interframe encoding, using at least one reference frame, for each encoding target block included in an encoding target frame of an input moving image, the apparatus comprising:
a selection unit that selects, for each encoding block, either a first prediction block generation mode in which a prediction block signal is generated from a single reference frame, or a second prediction block generation mode in which the prediction block signal is generated by linear sum prediction of a plurality of reference blocks cut out from a plurality of reference frames;
an encoding unit that encodes a difference signal between the selected prediction block signal and the signal of the encoding target block;
a selection unit that selects, for each group of a plurality of pixel blocks in the encoding target frame or for each encoding target frame, whether the linear sum prediction is an average value prediction of the plurality of reference blocks or a linear interpolation prediction based on display times of the plurality of reference frames and the encoding target frame;
an encoding unit that encodes, for each encoding target block or for each group of a plurality of encoding target blocks, first encoding mode information indicating which of the first and second prediction block generation modes was selected when the prediction block signal was generated; and
an encoding unit that encodes, for each group of a plurality of pixel blocks of the encoding target frame or for each encoding target frame, second encoding mode information indicating which of the average value prediction and the linear interpolation prediction was selected for the linear sum prediction.
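The per-block choice between the first and second prediction block generation modes, together with the difference signal that is then encoded, can be sketched as follows. The claim does not state the selection criterion; the sum of absolute differences (SAD) used here, the 0/1 flag values, and all function names are assumptions made for illustration, and entropy coding of the flag and residual is not shown.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(x - p) for x, p in zip(block, prediction))


def select_block_mode(target, single_ref_pred, linear_sum_pred):
    """Choose a prediction block generation mode for one target block.

    Returns (mode_flag, residual), where mode_flag stands in for the
    first encoding mode information:
      0 = first mode  (prediction from a single reference frame)
      1 = second mode (linear sum prediction of several reference blocks)
    On a tie the first mode is kept.
    """
    candidates = [(0, single_ref_pred), (1, linear_sum_pred)]
    mode_flag, prediction = min(candidates, key=lambda c: sad(target, c[1]))
    residual = [x - p for x, p in zip(target, prediction)]
    return mode_flag, residual


# Hypothetical block from a fading scene.
target = [60, 61, 59, 60]
single_ref_pred = [80, 80, 80, 80]   # taken from one reference frame
linear_sum_pred = [60, 60, 60, 60]   # linear sum of two reference blocks

mode_flag, residual = select_block_mode(target, single_ref_pred, linear_sum_pred)
```

The second encoding mode information (average value prediction versus linear interpolation prediction) would be chosen and signalled at a coarser granularity, per group of pixel blocks or per frame, and is therefore kept outside this per-block function.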

A program that causes a computer to perform motion compensated prediction interframe encoding, using at least one reference frame, for each encoding target block included in an encoding target frame of an input moving image, the program causing the computer to operate as:
means for selecting, for each encoding block, either a first prediction block generation mode in which a prediction block signal is generated from a single reference frame, or a second prediction block generation mode in which the prediction block signal is generated by linear sum prediction of a plurality of reference blocks cut out from a plurality of reference frames;
means for encoding a difference signal between the selected prediction block signal and the signal of the encoding target block;
a step of selecting, for each group of a plurality of pixel blocks in the encoding target frame or for each encoding target frame, whether the linear sum prediction is an average value prediction of the plurality of reference blocks or a linear interpolation prediction based on display times of the plurality of reference frames and the encoding target frame;
means for encoding, for each encoding target block or for each group of a plurality of encoding target blocks, first encoding mode information indicating which of the first and second prediction block generation modes was selected when the prediction block signal was generated; and
means for encoding, for each group of a plurality of pixel blocks of the encoding target frame or for each encoding target frame, second encoding mode information indicating which of the average value prediction and the linear interpolation prediction was selected for the linear sum prediction.