JP2006311265A

JP2006311265A - Video encoding device, method and program, and video decoding device, method and program

Info

Publication number: JP2006311265A
Application number: JP2005132012A
Authority: JP
Inventors: Daijiro Ichimura; 大治郎市村; Yoshimasa Honda; 義雅本田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-04-28
Filing date: 2005-04-28
Publication date: 2006-11-09

Abstract

PROBLEM TO BE SOLVED: To solve the problem that a quantization parameter has to be encoded and has to be stored in a video stream in order to improve the image quality of an area of high importance. SOLUTION: A motion vector generated by motion predictive compensation of an MCTF (motion compensated temporal filtering) is used to generate a reference map representing a reference relation between areas, and an important area is made to have higher image quality without encoding the quantization parameter. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a video encoding apparatus and method that encode video to generate a video stream, and to a video decoding apparatus and method that decode the video stream to generate decoded video.

Video has become inseparable from our daily lives. Through transmission channels such as the Internet, mobile phone networks, broadcast waves, and storage media, it delivers visual information on a wide range of display terminals, including personal computers, mobile terminals, televisions, and high-definition televisions.

To convey information efficiently, video sent over a transmission channel is compressed with a video encoding technique into a video stream with a smaller amount of data. However, as cameras and displays have moved to higher resolutions in recent years, the amount of information in video has become enormous, and more efficient video encoding techniques are required. Here, video means a sequence of consecutive images, that is, a moving image.

Video encoding techniques exploit correlations within and between the consecutive input images to transform the original image data into information that contains more zeros. They then compress that information with Huffman coding or arithmetic coding, which encode zeros efficiently, thereby improving coding efficiency.

Here, when two video streams yield exactly the same decoded image, the stream with the smaller code amount is said to have the better coding efficiency. Likewise, when two video streams have exactly the same code amount, the stream whose decoded image has the smaller error relative to the video being encoded is said to have the better coding efficiency. A better-looking image can be defined in various ways; here it means an image that a human viewer perceives as better looking, or an image with a smaller error relative to the original image.

In Non-Patent Document 1, for each region to be encoded, a similar region is searched for in temporally preceding or following images, and the difference between the two regions is taken. This converts the data into information containing more zeros and improves coding efficiency.

FIG. 15 is a block diagram showing the configuration of the video encoding apparatus disclosed in Non-Patent Document 1.

First, the video signal input unit 1501 receives video and outputs it to the motion prediction compensation encoding unit 1502 as the original video. Next, the motion prediction compensation encoding unit 1502 performs motion prediction compensation encoding using the original video and the decoded video supplied from the deblocking unit 1506, and generates a difference image.

Next, the DCT unit 1503 applies a DCT to the difference image, and the quantization unit 1504 quantizes the DCT-transformed difference image. Quantization is a process that divides a value by a suitable number to convert it into a smaller value carrying less information. The variable length encoding unit 1505 then performs variable-length encoding using Huffman coding or arithmetic coding to generate a video stream, and the stream output unit 1506 outputs the video stream. Next, the inverse quantization unit 1508 inversely quantizes the quantized difference image, the inverse DCT unit 1507 applies an inverse DCT to the inversely quantized difference image, and the deblocking unit 1509 applies deblocking to the inverse-DCT-transformed difference image to generate a decoded image.

Through the processing of the inverse quantization unit 1508, the inverse DCT unit 1507, and the deblocking unit 1509, the encoding process can generate an image identical to the decoded image that the decoding process will obtain. The motion prediction compensation encoding unit 1502 can therefore search temporally preceding or following images for similar regions using the same image that the decoding process uses, which eliminates mismatch between the encoding and decoding processes.

The operation of the motion prediction compensation encoding unit 1502, described in Section 8.4 of Non-Patent Document 2, is called motion prediction compensation. The direction and distance from the region being encoded to the similar region used for the difference is called a motion vector, and the relationship between the two regions is called a reference relationship. Encoding that uses motion prediction compensation generates a video stream by encoding the motion vector and the residual, which is the information remaining after the difference is taken.

Non-Patent Document 2 improves coding efficiency by using a form of motion prediction compensation called MCTF (Motion Compensated Temporal Filtering).

MCTF is briefly described here. MCTF generates a single temporal average image, averaged in the time direction, from 2^N consecutive images. First, 2^(N-1) temporal average images, half the original number, are generated from the 2^N consecutive images. Half that number of temporal average images is then generated from those, and the process is repeated until a single image remains. An image that, when combined with these temporal average images, allows twice as many images to be decoded is called a temporal difference image. MCTF encodes the final temporal average image and all of the temporal difference images. The more zeros the temporal difference images contain, the higher the coding efficiency, so MCTF searches temporally preceding or following images for similar regions and takes the difference between those regions.

MCTF is described in more detail below. FIG. 16 is a conceptual diagram of MCTF applied to two consecutive images. Reference numeral 1601 denotes the first image and 1602 the second image. A temporal average image 1603 and a temporal difference image 1604 are generated from image 1601 and image 1602. The temporal average image 1603 is the average of images 1601 and 1602, and images 1601 and 1602 can be decoded from the temporal difference image 1604 and the temporal average image 1603.

(Equation 1) below shows how one pixel of the temporal average image 1603 is calculated. In (Equation 1), A is the value of a pixel in image 1601, B is the value of the corresponding pixel in image 1602, and L is the value of the corresponding pixel in image 1603.

(Equation 2) below shows how one pixel of the temporal difference image 1604 is calculated. A and B in (Equation 2) are the same as in (Equation 1), and H is the value of the corresponding pixel in image 1604.

(Equation 3) and (Equation 4) below show how the corresponding pixels of image 1601 and image 1602 are calculated from the temporal average image 1603 and the temporal difference image 1604.

Note that (Equation 1) to (Equation 4) are only examples chosen to simplify the explanation; other equations may also be used.
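The equation images for (Equation 1) to (Equation 4) are not reproduced in this text. One set that is consistent with the surrounding definitions (L is the average of A and B, H is their difference, and A and B are recoverable from L and H) is sketched below. The sign convention of H and the absence of any rounding are assumptions, not the patent's own formulas.

```latex
\begin{align}
L &= \frac{A + B}{2} \tag{1}\\
H &= B - A \tag{2}\\
A &= L - \frac{H}{2} \tag{3}\\
B &= L + \frac{H}{2} \tag{4}
\end{align}
```

Substituting (1) and (2) into (3) gives (A + B)/2 - (B - A)/2 = A, and into (4) gives (A + B)/2 + (B - A)/2 = B, so this set is self-consistent.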

MCTF improves coding efficiency by bringing the value of H in (Equation 2) close to 0. For this reason, the pixels corresponding to A and B in the equations need not be at the same position. For example, suppose an object moves between the two images, and region 1605, the pixels that depict the object in image 1601, resembles region 1606 of image 1602. Taking each pixel of region 1605 as A and each pixel of region 1606 as B in (Equation 1) and (Equation 2) then makes H small and improves coding efficiency. In that case, the values of L in (Equation 1) and (Equation 2) correspond to region 1607 of the temporal average image 1603, and the values of H correspond to region 1608 of the temporal difference image 1604. The relationship between region 1607 and region 1608 is called a reference, and the relative position from region 1608 to region 1607 is represented by motion vector 1609. MCTF motion prediction compensation generates a video stream by encoding the temporal average image (the L values), the temporal difference image (the residual H values), and the motion vectors.

Here, two regions being similar means, for example, that the sum of squared errors over corresponding pixels is small, or that the code amount after actual conversion into a video stream is small.
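The first similarity criterion above, the sum of squared errors, can be illustrated with a small exhaustive block search. This is a minimal sketch under stated assumptions: the function names, the full-search strategy, and the representation of images as lists of rows are all hypothetical and are not taken from the patent or from either non-patent document.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(image, top, left, size):
    """Cut a size x size block whose top-left pixel is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def find_motion_vector(current, reference, top, left, size, search_range):
    """Search `reference` for the block most similar to the block of
    `current` at (top, left). Returns (dy, dx, cost), or None if no
    candidate position lies inside the reference image."""
    height, width = len(reference), len(reference[0])
    target = get_block(current, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the image
            cost = ssd(target, get_block(reference, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

The returned offset plays the role of the motion vector, and a cost of 0 means the residual for that block would be all zeros.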

As the number of motion vectors grows, the code amount of the motion vectors themselves increases and coding efficiency falls. Motion prediction compensation is therefore performed per encoding block, a region of multiple pixels such as 4 × 4 or 8 × 8 pixels.

After motion prediction compensation, the L and H values are divided by a suitable number to convert them into values of smaller absolute magnitude, or into 0, which improves coding efficiency. This is called quantization. During decoding, inverse quantization multiplies each pixel by the same number to restore values close to the original L and H. The parameter that determines the divisor used in quantization is called the quantization parameter. In general, the larger the quantization parameter, the larger the divisor and the lower the image quality.
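As a minimal sketch of the quantization and inverse quantization just described, the following functions divide a value by a step and multiply it back. The function names and the choice of rounding toward zero are assumptions for illustration; the text does not specify a rounding rule or how the quantization parameter maps to the divisor.

```python
def quantize(value, step):
    """Divide by `step`, rounding toward zero (assumed rounding rule)."""
    sign = -1 if value < 0 else 1
    return sign * (abs(value) // step)


def dequantize(level, step):
    """Inverse quantization: multiply by the same step."""
    return level * step
```

With a step of 8, the value 37 is restored as 32, while a step of 2 restores it as 36, which illustrates why a larger divisor lowers image quality.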

Next, the method of decoding the temporal average image 1603, the temporal difference image 1604, and the motion vectors obtained by MCTF is described. For example, motion vector 1609 shows that region 1607 and region 1608 are in a reference relationship and that decoding them yields region 1605 of image 1601 and region 1606 of image 1602. Taking each pixel of region 1607 as L and each pixel of region 1608 as H in (Equation 3) and (Equation 4) gives the value A of each pixel in region 1605 and the value B of each pixel in region 1606.

However, if processing such as quantization is performed during encoding, the original image cannot always be decoded exactly.

FIG. 17 is a conceptual diagram of MCTF applied hierarchically to eight images of a video input at 30 images per second. MCTF is applied to the eight input original images two at a time, then applied again two at a time to the four resulting temporal average images, and this is repeated until a single temporal average image 1701 remains. The temporal difference images 1702 to 1708 are encoded efficiently by quantization after motion prediction compensation.

Here, a temporal average image or temporal difference image is said to be at a higher level the more times MCTF has been applied in stages to produce it. In FIG. 17, the original images are level 0. The temporal average images 1707, 1709, 1711, and 1713 and the temporal difference images 1708, 1710, 1712, and 1714, obtained by applying MCTF to the original images, are level 1. The temporal average images 1703 and 1705 and the temporal difference images 1704 and 1706, obtained by applying MCTF to the level 1 temporal average images 1707, 1709, 1711, and 1713, are level 2. The temporal average image 1701 and the temporal difference image 1702, obtained by applying MCTF to the level 2 temporal average images 1703 and 1705, are level 3. Because a lower-level temporal average image can be decoded from the temporal average image and temporal difference image of the level above, only the temporal average image 1701 and the temporal difference images 1702, 1704, 1706, 1708, 1710, 1712, and 1714 need to be stored in the video stream in FIG. 17.
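The level structure described above can be sketched as a repeated pairwise average/difference decomposition. This is a simplified illustration, not the patent's implementation: it assumes the formulas L = (A + B) / 2 and H = B - A, omits motion compensation and quantization, and represents each image as a flat list of pixel values. All function names are hypothetical.

```python
def mctf_pair(frame_a, frame_b):
    """One MCTF step on two frames (assumed formulas, no motion search)."""
    low = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]
    high = [b - a for a, b in zip(frame_a, frame_b)]
    return low, high


def inverse_mctf_pair(low, high):
    """Recover the two frames from their average and difference."""
    frame_a = [l - h / 2 for l, h in zip(low, high)]
    frame_b = [l + h / 2 for l, h in zip(low, high)]
    return frame_a, frame_b


def mctf_decompose(frames):
    """Return (final_average, differences_by_level).
    differences_by_level[0] holds the level 1 difference images."""
    count = len(frames)
    if count == 0 or (count & (count - 1)) != 0:
        raise ValueError("number of frames must be a power of two")
    differences_by_level = []
    current = frames
    while len(current) > 1:
        lows, highs = [], []
        for i in range(0, len(current), 2):
            low, high = mctf_pair(current[i], current[i + 1])
            lows.append(low)
            highs.append(high)
        differences_by_level.append(highs)
        current = lows
    return current[0], differences_by_level


def mctf_reconstruct(final_average, differences_by_level):
    """Undo mctf_decompose, starting from the highest level."""
    current = [final_average]
    for highs in reversed(differences_by_level):
        restored = []
        for low, high in zip(current, highs):
            restored.extend(inverse_mctf_pair(low, high))
        current = restored
    return current
```

For eight input frames this yields four, two, and one difference images at levels 1, 2, and 3 plus one final average, matching the seven difference images and one average image stored in the stream in FIG. 17.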

Processing such as quantization can be performed after MCTF has been applied in stages.

This staged MCTF structure can also be used to select the frame rate (the number of images per second). For example, when the original images are input at 30 images per second and three stages of MCTF are applied to each group of eight images, decoding only the level 3 temporal average image 1701 yields 3.75 decoded images per second. Applying MCTF decoding to the temporal average image 1701 with the level 3 temporal difference image 1702 yields the temporal average images 1703 and 1705, for 7.5 images per second. Adding the level 2 temporal difference images 1704 and 1706 gives 15 images per second, and adding the level 1 temporal difference images 1708, 1710, 1712, and 1714 gives 30 decoded images per second. By selecting the levels of temporal difference images used for decoding in this way, the frame rate can be adjusted to suit, for example, the communication speed.

Non-Patent Document 1: "Advanced video coding", ISO/IEC 14496-10, 2003-12-01
Non-Patent Document 2: Julian Reichel et al., "Scalable Video Coding - Working Draft 1", ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, 14th Meeting: Hong Kong, CN, 17-21 January, 2005

However, in the method of Non-Patent Document 1, if a referenced region has poor image quality and contains a lot of noise, the image quality of the region that references it also deteriorates. This is called drift noise, and it can be resolved by improving the image quality of the referenced region. The method of Non-Patent Document 1 uses, as the image to be referenced, a decoded image obtained by encoding in the DCT unit 1503 and the quantization unit 1504 followed by decoding in the inverse quantization unit 1508, the inverse DCT unit 1507, and the deblocking unit 1506. Improving the image quality of the referenced region therefore requires the encoding and decoding processes to be performed again, which is inefficient in terms of processing load.

In the method of Non-Patent Document 2, for example in FIG. 17, quantization can be performed after the three stages of MCTF, so a heavily referenced region can be given a smaller quantization parameter and thus higher image quality.

However, in Non-Patent Document 2, raising the image quality of a heavily referenced region requires a quantization parameter indicating how that region is quantized, and the more of this code there is, the lower the coding efficiency. This corresponds to mb_qp_delta in Sections S.7.3.8, S.7.3.9.1, and S.7.3.9.2 of Non-Patent Document 2.

Non-Patent Document 2 also allows video to be encoded hierarchically as a base layer, which can be decoded on its own, and an enhancement layer, which raises the image quality of the base layer. Consider the case where a region heavily referenced in the base layer is given higher image quality by the enhancement layer.

For example, suppose the base layer stream of the temporal average image 1701 is given higher image quality by an enhancement layer stream. In this case, a code is needed to indicate which regions of the base layer the enhancement layer stream improves, and coding efficiency falls by the amount of that code. This corresponds to coded_block_pattern and coded_block_pattern_bit in Sections S.7.3.8, S.7.3.9.1, and S.7.3.9.2 of Non-Patent Document 2, which indicate whether the code for a region is included.

The video encoding apparatus of the present invention comprises: motion prediction compensation encoding means for generating motion vectors by MCTF (Motion Compensated Temporal Filtering); reference map generation means for using the motion vectors to generate a reference map representing the reference relationships between regions of a plurality of images; bit designation parameter generation means for using the reference map to generate a bit designation parameter that designates which of the bits representing the pixels of a region are encoded and stored in the video stream; bit extraction means for extracting the bits designated by the bit designation parameter from the bits representing the pixels of the region; and variable length encoding means for variable-length encoding the bits extracted by the bit extraction means.

With this configuration, after motion prediction compensation encoding, regions with a large influence on image quality can be identified from the reference map and allocated more code using the bit designation parameter. In addition, because the encoding process and the decoding process can generate the same reference map, the video stream can be generated without encoding the bit designation parameter, which improves coding efficiency.

Further, the video encoding apparatus of the present invention is, in the above configuration, characterized in that the reference map generation means calculates, for each region of an image, a reference degree whose value increases with the number of times motion vectors reference that region, and the bit designation parameter generation means generates the bit designation parameter according to the reference degrees in the reference map.

With this configuration, regions with a large influence can be identified from the per-region reference degree and given higher image quality, which improves coding efficiency.
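A reference map of this kind can be sketched as a per-block counter. The sketch below assumes, for simplicity, that motion vectors are expressed in whole blocks and that the reference degree is simply the number of references; the function name and data layout are hypothetical. Because the only input is the set of motion vectors, which the decoder also receives, the same function could be run on both sides to obtain identical maps.

```python
def build_reference_map(motion_vectors, rows, cols):
    """Count how often each block of the referenced image is referenced.

    motion_vectors: {(row, col): (d_row, d_col)}, one entry per block of
    the referencing image, with offsets in block units (an assumption).
    Returns a rows x cols grid of reference degrees.
    """
    ref_map = [[0] * cols for _ in range(rows)]
    for (row, col), (d_row, d_col) in motion_vectors.items():
        target_row, target_col = row + d_row, col + d_col
        if 0 <= target_row < rows and 0 <= target_col < cols:
            ref_map[target_row][target_col] += 1
    return ref_map
```

In a 2 x 2 grid where two blocks both point at block (0, 1), that block receives a reference degree of 2 and would be a candidate for higher image quality.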

Further, the video encoding apparatus of the present invention is, in the above configuration, characterized in that the reference map generation means increases the reference degree of the region that a given region references as the reference degree of the given region increases.

With this configuration, the reference map represents not only the reference relationship between the two images linked by a motion vector but also the reference relationships across more than two images. Regions with a large influence can thus be identified and given higher image quality, which improves coding efficiency.
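One way to picture this propagation across several images is to pass each region's reference degree on to the region it references. The rule used below (the target gains 1 plus the source's degree) and the numbering constraint are assumptions made for the sketch; the text only requires that a larger degree at the source leads to a larger degree at the target.

```python
def propagate_reference_degrees(links, num_regions):
    """Accumulate reference degrees along chains of references.

    links: list of (source, target) pairs meaning that region `source`
    references region `target`. Regions are assumed to be numbered so
    that every source index is larger than its target index.
    """
    for source, target in links:
        if source <= target:
            raise ValueError("source index must be larger than target index")
    degree = [0] * num_regions
    # Visit sources from highest to lowest so that a region's own degree
    # is final before it is passed on to the region it references.
    for source, target in sorted(links, reverse=True):
        degree[target] += 1 + degree[source]
    return degree
```

For the chain 3 -> 2 -> 1 -> 0 with an extra link 3 -> 1, region 0 ends up with the largest degree because every other region depends on it directly or indirectly.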

Further, the video encoding apparatus of the present invention is, in the above configuration, characterized in that the reference map generation means increases the reference degree of any region that contains at least a predetermined number of pixels of the region referenced by a motion vector.

With this configuration, even when the region referenced by a motion vector straddles the boundaries of the regions for which reference degrees are defined, the reference degree of the regions with a large influence on image quality can be increased, which improves coding efficiency.
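The straddling case can be sketched by computing how many pixels of the referenced region fall inside each grid block and keeping only the blocks that meet a threshold. The function name, the square region shape, and the parameter names are assumptions for illustration; coordinates are assumed to be non-negative.

```python
def overlapped_blocks(top, left, size, block_size, min_pixels):
    """Return grid coordinates (row, col) of the blocks that contain at
    least `min_pixels` pixels of the size x size referenced region whose
    top-left pixel is (top, left)."""
    result = []
    first_row, last_row = top // block_size, (top + size - 1) // block_size
    first_col, last_col = left // block_size, (left + size - 1) // block_size
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            overlap_h = (min(top + size, (row + 1) * block_size)
                         - max(top, row * block_size))
            overlap_w = (min(left + size, (col + 1) * block_size)
                         - max(left, col * block_size))
            if overlap_h * overlap_w >= min_pixels:
                result.append((row, col))
    return result
```

An 8 x 8 region at pixel (2, 6) on an 8-pixel grid overlaps four blocks by 12, 36, 4, and 12 pixels, so a threshold of 12 selects three of them and a threshold of 16 selects only the block with 36 pixels.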

Further, the video encoding apparatus of the present invention is, in the above configuration, characterized in that the reference map generation means gives a larger reference degree to a region referenced by a motion vector generated at a higher-level stage of the staged MCTF motion prediction compensation.

With this configuration, the reference degree can be increased for the temporal average images and temporal difference images generated at later stages of the staged MCTF motion prediction compensation, which have a larger influence on image quality, and coding efficiency can be improved.

Further, the video encoding apparatus of the present invention is, in the above configuration, characterized in that the bit extraction means performs quantization based on the bit designation parameter.

With this configuration, the bit extraction means divides by a smaller value for regions that the bit designation parameter designates as having a larger influence on image quality, so those regions are given higher image quality and coding efficiency is improved.
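A possible mapping from reference degree to divisor is sketched below. The halving rule, the default base step of 16, and the function names are purely illustrative assumptions; the text only states that a region with a larger influence is divided by a smaller value. Since the step is derived from the reference degree alone, a decoder holding the same reference map could derive the same step without it being transmitted.

```python
def quantization_step(reference_degree, base_step=16, min_step=1):
    """Hypothetical rule: halve the step for every unit of reference
    degree (assumed non-negative), never going below `min_step`."""
    return max(min_step, base_step >> reference_degree)


def quantize_block(values, reference_degree):
    """Quantize a block with the step implied by its reference degree,
    rounding toward zero."""
    step = quantization_step(reference_degree)
    return [(-1 if v < 0 else 1) * (abs(v) // step) for v in values]


def dequantize_block(levels, reference_degree):
    """Decoder side: recompute the same step and multiply back."""
    step = quantization_step(reference_degree)
    return [level * step for level in levels]
```

With these assumed defaults, the values 37, -20, and 5 are restored as 32, -16, and 0 at reference degree 0, and as 36, -20, and 4 at reference degree 2.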

Further, the video encoding apparatus of the present invention is, in the above configuration, characterized in that the bit extraction means performs encoding, based on the bit designation parameter, on each set of bits at the same position across a plurality of bit strings.

With this configuration, the bit extraction means allocates more bit planes to regions that the bit designation parameter designates as having a larger influence on image quality, so those regions are given higher image quality and coding efficiency is improved.

Here, a bit plane is the set of bits at the same position across a plurality of bit strings.
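The bit plane definition above can be made concrete with a short sketch that splits non-negative values into planes and rebuilds them from only the most significant planes. The function names, the 8-bit default width, and the convention of sending the most significant plane first are assumptions; in the scheme described here, the number of planes kept for a region would come from the bit designation parameter.

```python
def bit_plane(values, plane):
    """Bits at position `plane` (0 = least significant) of each value."""
    return [(v >> plane) & 1 for v in values]


def extract_planes(values, num_planes, total_bits=8):
    """Encoder side: keep the `num_planes` most significant bit planes,
    most significant first."""
    top = total_bits - 1
    return [bit_plane(values, p) for p in range(top, top - num_planes, -1)]


def place_planes(planes, total_bits=8):
    """Decoder side: put received planes back at their bit positions.
    Planes that were not sent are left as zero."""
    if not planes:
        return []
    values = [0] * len(planes[0])
    for index, plane in enumerate(planes):
        shift = total_bits - 1 - index
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values
```

For the values 200, 13, and 255 with three planes kept, the decoder rebuilds 192, 0, and 224; with all eight planes kept it rebuilds the original values exactly.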

Further, the video encoding apparatus of the present invention, in the above configuration, encodes video hierarchically as a base layer that can be decoded on its own and an enhancement layer that improves the image quality of the base layer, and is characterized in that the bit designation parameter generation means generates the bit designation parameter for the enhancement layer.

With this configuration, because the encoding process and the decoding process can generate the same reference map, an enhancement layer stream, which is the video stream of the enhancement layer, can be generated without encoding the bit designation parameter, which improves coding efficiency.

The video decoding apparatus of the present invention comprises: variable length decoding means for variable-length decoding a video stream; motion prediction compensation decoding means for performing MCTF (Motion Compensated Temporal Filtering) using motion vectors; reference map generation means for using the motion vectors to generate a reference map representing the reference relationships between regions of a plurality of images; bit designation parameter generation means for using the reference map to generate a bit designation parameter that designates which of the bits representing the pixels of a region the bits obtained by decoding the video stream correspond to; and bit arrangement means for placing the bits designated by the bit designation parameter among the bits representing the pixels of the region.

With this configuration, regions with a large influence on image quality can be identified from the reference map and allocated more code using the bit designation parameter. In addition, because the encoding process and the decoding process can generate the same reference map, the video stream can be generated without encoding the bit designation parameter, which improves coding efficiency.

Further, the video decoding apparatus of the present invention is, in the above configuration, characterized in that the reference map generation means calculates, for each region of an image, a reference degree whose value increases with the number of times motion vectors reference that region, and the bit designation parameter generation means generates the bit designation parameter according to the reference degrees in the reference map.

また、本発明の映像復号化装置は上記の構成において、前記参照マップ生成手段が、ある領域がもつ前記参照度合が大きいほど、その領域が参照する領域の前記参照度合を大きくする、ことを特徴とする。 The video decoding apparatus according to the present invention is characterized in that, in the above configuration, the reference map generation means increases the reference degree of the area referred to by the area as the reference degree of the area increases. And

また、本発明の映像復号化装置は上記の構成において、前記参照マップ生成手段が、前記動きベクトルが参照する領域の画素をあらかじめ定めた数以上含む領域に対して前記参照度合を大きくする、ことを特徴とする。 In the video decoding device according to the present invention, in the above configuration, the reference map generation unit increases the reference degree with respect to a region including a predetermined number of pixels in a region referred to by the motion vector. It is characterized by.

また、本発明の映像復号化装置は上記の構成において、前記参照マップ生成手段が、段階的に行うＭＣＴＦによる動き予測補償のうち遅い段階で生成した前記動きベクトルが参照する領域ほど参照度合を大きくする、ことを特徴とする。 In the video decoding device according to the present invention, in the above configuration, the reference map generation unit increases the reference degree in a region referred to by the motion vector generated at a later stage of the motion prediction compensation by MCTF performed in stages. It is characterized by.

また、本発明の映像復号化装置は上記の構成において、前記ビット配置手段が、前記ビット指定パラメータに基づき逆量子化を行う、ことを特徴とする。 The video decoding apparatus according to the present invention is characterized in that, in the above configuration, the bit arrangement means performs inverse quantization based on the bit designation parameter.

また、本発明の映像復号化装置は上記の構成において、前記ビット配置手段が、前記ビット指定パラメータに基づき複数のビット列の同じ位のビット毎に復号化を行う、ことを特徴とする。 The video decoding apparatus according to the present invention is characterized in that, in the above configuration, the bit arrangement means performs decoding for each bit in the same order of a plurality of bit strings based on the bit designation parameter.

また、本発明の映像復号化装置は上記の構成において、映像を単独で復号化可能な基本レイヤと基本レイヤの画質を向上する拡張レイヤに階層化した映像ストリームを復号化するものであり、前記ビット指定パラメータ生成手段が、拡張レイヤに対する前記ビット指定パラメータを生成する、特徴を有する。 The video decoding device of the present invention, in the above configuration, decodes a video stream hierarchized into a base layer capable of decoding video independently and an enhancement layer that improves the image quality of the base layer, The bit designation parameter generation means generates the bit designation parameter for the enhancement layer.

この構成により、符号化処理と復号化処理で同一の前記参照マップを生成することが可能であるので前記ビット指定パラメータを格納していない拡張レイヤの映像ストリームである拡張レイヤストリームを復号化でき、符号化効率を向上することが可能である。 With this configuration, since the same reference map can be generated in the encoding process and the decoding process, an enhancement layer stream that is an enhancement layer video stream that does not store the bit designation parameter can be decoded. It is possible to improve the encoding efficiency.

本発明の映像符号化方法は、ＭＣＴＦにより動きベクトルを生成する動き予測補償符号化処理ステップと、前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、前記参照マップを用いて領域の画素を表すビットの集まりのうちどのビットを符号化して映像ストリームに格納するのかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、領域の画素を表すビットから前記ビット指定パラメータが指定するビットを抽出するビット抽出処理ステップと、前記ビット抽出処理ステップで抽出したビットを可変長符号化する可変長符号化処理ステップと、を有する構成をとる。 The video encoding method of the present invention comprises: a motion prediction compensation encoding step of generating motion vectors by MCTF; a reference map generation step of generating, from the motion vectors, a reference map representing the reference relationships between regions of a plurality of images; a bit designation parameter generation step of generating, from the reference map, a bit designation parameter that specifies which bits, among the set of bits representing the pixels of a region, are to be encoded and stored in the video stream; a bit extraction step of extracting the bits designated by the bit designation parameter from the bits representing the pixels of the region; and a variable length encoding step of variable-length encoding the bits extracted in the bit extraction step.

本発明の映像符号化プログラムは、ＭＣＴＦにより動きベクトルを生成する動き予測補償符号化処理ステップと、前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、前記参照マップを用いて領域の画素を表すビットの集まりのうちどのビットを符号化して映像ストリームに格納するのかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、領域の画素を表すビットから前記ビット指定パラメータが指定するビットを抽出するビット抽出処理ステップと、前記ビット抽出処理ステップで抽出したビットを可変長符号化する可変長符号化処理ステップと、を有する構成をとる。 The video encoding program of the present invention comprises: a motion prediction compensation encoding step of generating motion vectors by MCTF; a reference map generation step of generating, from the motion vectors, a reference map representing the reference relationships between regions of a plurality of images; a bit designation parameter generation step of generating, from the reference map, a bit designation parameter that specifies which bits, among the set of bits representing the pixels of a region, are to be encoded and stored in the video stream; a bit extraction step of extracting the bits designated by the bit designation parameter from the bits representing the pixels of the region; and a variable length encoding step of variable-length encoding the bits extracted in the bit extraction step.

本発明の映像復号化方法は、映像ストリームを可変長復号化する可変長復号化処理ステップと、動きベクトルを用いてＭＣＴＦを行う動き予測補償復号化処理ステップと、前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、前記参照マップを用いて映像ストリームを復号化して得たビットが領域の画素を表すビットのどれに当たるかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、領域の画素を表すビットに前記ビット指定パラメータが指定するビットを配置するビット配置処理ステップと、を有する構成をとる。 The video decoding method of the present invention comprises: a variable length decoding step of variable-length decoding a video stream; a motion prediction compensation decoding step of performing MCTF using motion vectors; a reference map generation step of generating, from the motion vectors, a reference map representing the reference relationships between regions of a plurality of images; a bit designation parameter generation step of generating, from the reference map, a bit designation parameter that specifies which of the bits representing the pixels of a region the bits obtained by decoding the video stream correspond to; and a bit arrangement step of placing the bits designated by the bit designation parameter into the bits representing the pixels of the region.

本発明の映像復号化プログラムは、映像ストリームを可変長復号化する可変長復号化処理ステップと、動きベクトルを用いてＭＣＴＦを行う動き予測補償復号化処理ステップと、前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、前記参照マップを用いて映像ストリームを復号化して得たビットが領域の画素を表すビットのどれに当たるかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、領域の画素を表すビットに前記ビット指定パラメータが指定するビットを配置するビット配置処理ステップと、を有する構成をとる。 The video decoding program of the present invention comprises: a variable length decoding step of variable-length decoding a video stream; a motion prediction compensation decoding step of performing MCTF using motion vectors; a reference map generation step of generating, from the motion vectors, a reference map representing the reference relationships between regions of a plurality of images; a bit designation parameter generation step of generating, from the reference map, a bit designation parameter that specifies which of the bits representing the pixels of a region the bits obtained by decoding the video stream correspond to; and a bit arrangement step of placing the bits designated by the bit designation parameter into the bits representing the pixels of the region.

本発明では、映像符号化において符号化効率を向上し、映像復号化における復号化画像の画質を向上することが可能である。 In the present invention, it is possible to improve the encoding efficiency in video encoding and improve the image quality of a decoded image in video decoding.

以下、本発明の実施の形態について、図面を参照して詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

（実施の形態１） (Embodiment 1)
第１の実施の形態では、映像符号化において、領域ごとの画質を決定する量子化パラメータを省略して符号化する。 In the first embodiment, in video encoding, encoding is performed by omitting a quantization parameter that determines image quality for each region.

図１は、本実施の形態１に関わる映像符号化方法を適用した映像符号化装置１００の構成を示す。 FIG. 1 shows a configuration of a video encoding apparatus 100 to which the video encoding method according to the first embodiment is applied.

図１において、映像符号化装置１００は、映像信号入力部１０１、ＭＣＴＦ符号化部１０２、参照マップ生成部１０３、量子化パラメータ生成部１０４、量子化部１０５、可変長符号化部１０６、ストリーム出力部１０７を有する。 In FIG. 1, the video encoding apparatus 100 includes a video signal input unit 101, an MCTF encoding unit 102, a reference map generation unit 103, a quantization parameter generation unit 104, a quantization unit 105, a variable length encoding unit 106, and a stream output unit 107.

映像信号入力部１０１は、映像符号化装置１００の外部から映像を原画像として入力し、ＭＣＴＦ符号化部１０２に出力する。また、映像符号化装置１００の外部から入力する映像の有無を判定し、映像の入力がなければ処理を終了する。 The video signal input unit 101 inputs a video as an original image from the outside of the video encoding device 100 and outputs it to the MCTF encoding unit 102. Also, the presence / absence of a video input from the outside of the video encoding device 100 is determined. If there is no video input, the process is terminated.

ここで、映像信号入力部１０１は、図１７のように、８枚の原画像に対して３回のＭＣＴＦを行う場合は、画像を８枚ずつ入力する。１６枚に対して４回のＭＣＴＦを行う場合は、１６枚ずつ入力する。 Here, when three rounds of MCTF are applied to eight original images as in FIG. 17, the video signal input unit 101 takes in images eight at a time. When four rounds of MCTF are applied to 16 images, it takes in images 16 at a time.
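The two cases above (3 rounds for 8 images, 4 rounds for 16 images) are consistent with a dyadic decomposition in which each round halves the number of temporal average images. The following minimal sketch assumes that relationship holds in general; the function name `frames_per_group` is hypothetical and does not appear in the source.

```python
def frames_per_group(mctf_levels: int) -> int:
    """Number of original images taken in together for a given number of MCTF rounds.

    Assumption: each round pairs up images, so n rounds consume 2**n images.
    """
    if mctf_levels < 1:
        raise ValueError("at least one MCTF round is required")
    return 2 ** mctf_levels


if __name__ == "__main__":
    for levels in (3, 4):
        print(levels, "rounds ->", frames_per_group(levels), "images per group")
```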

ＭＣＴＦ符号化部１０２は、映像信号入力部１０１から入力した複数の原画像をＭＣＴＦにより動き予測補償符号化し、動きベクトルを可変長符号化部１０６と参照マップ生成部１０３に出力する。また、時間的平均画像と時間的差分画像を生成し、量子化部１０５に出力する。 The MCTF encoding unit 102 performs motion prediction / compensation encoding on the plurality of original images input from the video signal input unit 101 using MCTF, and outputs the motion vectors to the variable length encoding unit 106 and the reference map generation unit 103. In addition, a temporal average image and a temporal difference image are generated and output to the quantization unit 105.
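As a rough illustration of how temporal average images and temporal difference images arise level by level, the sketch below applies a simple pairwise average/difference filter to lists of pixel values. It is only a simplified stand-in: motion compensation is omitted entirely, the specific filter (plain average and plain difference) is an assumption, and the names `temporal_decompose` and `mctf_like_encode` are hypothetical.

```python
def temporal_decompose(frames):
    """One level: pair consecutive frames and return (averages, differences).

    Each frame is a flat list of pixel values. No motion compensation is done here.
    """
    averages, differences = [], []
    for first, second in zip(frames[0::2], frames[1::2]):
        averages.append([(a + b) / 2 for a, b in zip(first, second)])
        differences.append([b - a for a, b in zip(first, second)])
    return averages, differences


def mctf_like_encode(frames, levels):
    """Repeat the decomposition on the averages; keep the differences of every level."""
    differences_by_level = []
    current = frames
    for _ in range(levels):
        current, diffs = temporal_decompose(current)
        differences_by_level.append(diffs)
    return current, differences_by_level


if __name__ == "__main__":
    four_frames = [[0, 0], [2, 2], [4, 4], [6, 6]]
    top_average, diffs = mctf_like_encode(four_frames, levels=2)
    print("top-level average:", top_average)
    print("differences per level:", diffs)
```

With four input frames and two levels, this yields two level 1 difference images, one level 2 difference image, and one level 2 average image, which mirrors the set of images that remain to be coded in the two-stage example (202, 204, 206, 205).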

参照マップ生成部１０３は、ＭＣＴＦ符号化部１０２から入力した動きベクトルを用いて参照マップを生成し、量子化パラメータ生成部１０４に出力する。参照マップの生成方法については後述する。 The reference map generation unit 103 generates a reference map using the motion vector input from the MCTF encoding unit 102 and outputs the reference map to the quantization parameter generation unit 104. A reference map generation method will be described later.

量子化パラメータ生成部１０４は、参照マップ生成部１０３から入力した参照マップを用いて画像内の領域ごとに量子化パラメータを生成し、量子化部１０５と可変長符号化部１０６に出力する。ここで、量子化パラメータはスカラー値で、小さい値を持つ方が量子化で除算する数値が小さく、すなわち、その領域を高画質化するものとする。量子化パラメータの生成については後述する。 The quantization parameter generation unit 104 generates a quantization parameter for each region in the image using the reference map input from the reference map generation unit 103, and outputs it to the quantization unit 105 and the variable length encoding unit 106. Here, the quantization parameter is a scalar value; a smaller value means a smaller divisor in quantization and therefore higher image quality for that region. The generation of the quantization parameter will be described later.

量子化部１０５は、ＭＣＴＦ符号化部１０２から入力した時間的平均画像と時間的差分画像を、量子化パラメータ生成部１０４から入力した量子化パラメータを用いて量子化し、可変長符号化部１０６に出力する。 The quantization unit 105 quantizes the temporal average image and the temporal difference image input from the MCTF encoding unit 102 using the quantization parameter input from the quantization parameter generation unit 104, and sends the quantized result to the variable length encoding unit 106. Output.
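The effect of the quantization parameter as a divisor can be sketched as follows. This is an illustrative uniform quantizer, not the quantizer of the apparatus; the rounding rule and the names `quantize`, `dequantize` and `total_error` are assumptions.

```python
import math


def quantize(values, qp):
    """Divide each value by the quantization parameter and round half up."""
    return [math.floor(v / qp + 0.5) for v in values]


def dequantize(levels, qp):
    """Inverse operation used on the decoding side: multiply back by the same parameter."""
    return [level * qp for level in levels]


def total_error(values, qp):
    """Sum of absolute reconstruction errors after a quantize/dequantize round trip."""
    restored = dequantize(quantize(values, qp), qp)
    return sum(abs(v - r) for v, r in zip(values, restored))


if __name__ == "__main__":
    samples = [13, 21, -6]
    for qp in (2, 8):
        print("qp =", qp, "error =", total_error(samples, qp))
```

A smaller parameter leaves a smaller reconstruction error at the cost of larger quantized values to encode, which is why regions with a high reference degree are given a small parameter.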

なお、量子化の前に時間的平均画像と時間的差分画像にＤＣＴ変換（ＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍａｔｉｏｎ）を行うなどしても良い。その場合は、復号化の処理の際に逆ＤＣＴ変換を必要とする。本発明の主旨ではないので説明は割愛する。 Note that DCT transform (Discrete Cosine Transformation) may be performed on the temporal average image and the temporal difference image before quantization. In that case, an inverse DCT transform is required in the decoding process. Since it is not the gist of the present invention, the description is omitted.

可変長符号化部１０６は、ＭＣＴＦ符号化部１０２から入力した動きベクトルと、量子化部１０５から入力した量子化後の時間的平均画像と時間的差分画像と、を可変長符号化し映像ストリームを生成して、ストリーム出力部１０７に出力する。 The variable length coding unit 106 performs variable length coding on the motion vector input from the MCTF encoding unit 102 and the quantized temporal average image and the temporal difference image input from the quantization unit 105 to generate a video stream. It is generated and output to the stream output unit 107.

ここで、可変長符号化は、量子化後の時間的平均画像および時間的差分画像ごとにレベルの高いものから行う。なぜならば、映像ストリームはレベルの高いものから復号化を行い順次レベルの低い時間的平均画像を生成していくので、レベルの高いものほど映像ストリームの先頭になければならないからである。動きベクトルは、参照される側の量子化後の時間的平均画像および時間的差分画像ではなく、参照する側の量子化後の時間的差分画像とともに符号化する。なぜならば、時間的平均画像と動きベクトルだけではより低いレベルの時間的平均画像を復号化できず、動きベクトルの情報は時間的差分画像の情報があって初めて意味をなすからである。 Here, variable length coding is applied to the quantized temporal average image and temporal difference images in order from the highest level. This is because decoding starts from the highest level and successively generates temporal average images of lower levels, so higher-level data must come earlier in the video stream. Each motion vector is encoded together with the quantized temporal difference image on the referencing side, not with the quantized temporal average image or temporal difference image on the referenced side. This is because a lower-level temporal average image cannot be decoded from a temporal average image and motion vectors alone; the motion vector information becomes meaningful only together with the temporal difference image information.
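The ordering rule above can be sketched as a sort over the images to be coded, with each motion vector list attached to the difference image that does the referencing. The `CodedImage` type, its fields, and the tie-breaking choice of placing the average image before the difference image within a level are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CodedImage:
    level: int                      # MCTF level of the image
    kind: str                       # "average" or "difference"
    index: int                      # position within its level
    motion_vectors: list = field(default_factory=list)  # carried by difference images only


def stream_order(images):
    """Highest level first; within a level, the average image, then differences by index."""
    return sorted(images, key=lambda im: (-im.level, im.kind != "average", im.index))


if __name__ == "__main__":
    images = [
        CodedImage(1, "difference", 0, [(1, 0)]),
        CodedImage(1, "difference", 1, [(0, 2)]),
        CodedImage(2, "average", 0),
        CodedImage(2, "difference", 0, [(3, 1)]),
    ]
    for im in stream_order(images):
        print(im.level, im.kind, im.index, im.motion_vectors)
```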

ここで、可変長符号化には算術符号化やハフマン符号化などを用いる。本発明の主旨ではないので説明は割愛する。 Here, arithmetic coding, Huffman coding, or the like is used for variable length coding. Since it is not the gist of the present invention, the description is omitted.

ストリーム出力部１０７は、可変長符号化部１０６から入力した映像ストリームを、映像符号化装置１００の外部に出力する。 The stream output unit 107 outputs the video stream input from the variable length encoding unit 106 to the outside of the video encoding device 100.

なお、ＭＣＴＦ符号化部１０２が本発明の動き予測補償符号化手段に相当し、参照マップ生成部１０３が参照マップ生成手段に相当し、量子化パラメータ生成部１０４がビット指定パラメータ生成手段に相当し、量子化部１０５がビット抽出手段に相当し、可変長符号化部１０６が可変長符号化手段に相当する。また、量子化パラメータがビット指定パラメータに相当する。 The MCTF encoding unit 102 corresponds to the motion prediction / compensation encoding unit of the present invention, the reference map generation unit 103 corresponds to the reference map generation unit, and the quantization parameter generation unit 104 corresponds to the bit designation parameter generation unit. The quantization unit 105 corresponds to a bit extraction unit, and the variable length encoding unit 106 corresponds to a variable length encoding unit. Further, the quantization parameter corresponds to a bit designation parameter.

以下に、本発明の骨子の一つである、参照マップの生成と量子化パラメータの生成について述べる。 The following describes the generation of the reference map and of the quantization parameters, which is one of the key points of the present invention.

ここでは、説明の簡単のため、２段階のＭＣＴＦについて説明する。 Here, for simplicity of explanation, a two-stage MCTF will be described.

図２は、ＭＣＴＦにおける画像間の参照の関係を示す図である。２０１〜２０４は時間的に連続する４枚の画像に対してＭＣＴＦを施したもので、２０１はレベル１の時間的平均画像を表し、２０２は２０１に対するレベル１の時間的差分画像を表す。同様に２０３と２０４はレベル１の時間的平均画像と時間的差分画像を表す。２０５と２０６はレベル１の時間的平均画像２０１と２０３に更にＭＣＴＦを施したもので、２０５はレベル２の時間的平均画像を表し、２０６はレベル２の時間的差分画像を表す。動きベクトル２２２は、ＭＣＴＦにおいて時間的差分画像２０２の領域２２１が時間的平均画像２０１の領域２１１を参照することを表す。動きベクトル２４２は領域２４１が領域２３１を、動きベクトル２６２は領域２６１が領域２５１を、動きベクトル２６４は領域２６２が領域２５２を参照することを表す。 FIG. 2 is a diagram illustrating a reference relationship between images in MCTF. 201 to 204 are obtained by performing MCTF on four temporally continuous images, 201 represents a level 1 temporal average image, and 202 represents a level 1 temporal difference image with respect to 201. Similarly, 203 and 204 represent level 1 temporal average images and temporal difference images. 205 and 206 are obtained by further applying MCTF to the level 1 temporal average images 201 and 203, 205 represents a level 2 temporal average image, and 206 represents a level 2 temporal difference image. The motion vector 222 represents that the region 221 of the temporal difference image 202 refers to the region 211 of the temporal average image 201 in MCTF. The motion vector 242 represents that the region 241 refers to the region 231, the motion vector 262 represents that the region 261 refers to the region 251, and the motion vector 264 represents that the region 262 refers to the region 252.

図３は、図２のうち符号化する必要のないレベル１の時間的平均画像２０１と２０３を省略して参照の関係を示したものである。レベル１の時間的平均画像２０１と２０３はレベル２の時間的平均画像２０５と時間的差分画像２０６から生成することができる。 FIG. 3 shows a reference relationship by omitting the level 1 temporal average images 201 and 203 that do not need to be encoded in FIG. Level 1 temporal average images 201 and 203 can be generated from level 2 temporal average image 205 and temporal difference image 206.

よって、時間的平均画像２０１の領域２１１と時間的差分画像２０２の領域２２１に対する動きベクトル２２２は、レベル１の領域２２１とレベル２の時間的差分画像２０６の領域２６２に対する動きベクトル３２２に置き換えることができる。同様に、時間的平均画像２０３の領域２３１と時間的差分画像２０４の領域２４１に対する動きベクトル２４２は、時間的差分画像２０６の領域Ｃ６３と領域２４１に対する動きベクトル３４２に置き換えることができる。 Therefore, the motion vector 222 between the region 211 of the temporal average image 201 and the region 221 of the temporal difference image 202 can be replaced with the motion vector 322 between the level 1 region 221 and the region 262 of the level 2 temporal difference image 206. Similarly, the motion vector 242 between the region 231 of the temporal average image 203 and the region 241 of the temporal difference image 204 can be replaced with the motion vector 342 between the region C63 of the temporal difference image 206 and the region 241.

この動きベクトルの参照関係から、ある領域の画質を向上するためには、他のどの領域を高画質化すればよいかの情報が得られる。たとえば、動きベクトル３２２の参照関係から、レベル１の時間的差分画像２０２の領域２２１を復号化した画像の画質を向上するためには、レベル２の時間的差分画像２０６の領域２６２の画質を向上すればよいことがわかる。また、動きベクトル２６４の参照関係から、領域２６２の画質を向上するためには、時間的平均画像２０５の領域２５２の画質を向上すればよいことがわかる。 These motion vector reference relationships show which other regions must be given higher image quality in order to improve the image quality of a given region. For example, the reference relationship of the motion vector 322 shows that, to improve the quality of the image decoded from the region 221 of the level 1 temporal difference image 202, the image quality of the region 262 of the level 2 temporal difference image 206 should be improved. Likewise, the reference relationship of the motion vector 264 shows that, to improve the image quality of the region 262, the image quality of the region 252 of the temporal average image 205 should be improved.

本発明では、ある領域が他の画像の領域からどの程度参照されているかを示す参照度合を定義し、ＭＣＴＦによる段階的な参照関係を順に追って、画像ごとに、その画像の領域ごとの参照度合を示す参照マップを生成し、参照度合が大きい領域を高画質化する。以下に、詳細を示す。ここで、全ての参照度合の初期値は０であるとする。 In the present invention, a reference degree indicating how much a certain area is referred to from another image area is defined, and the stepwise reference relation by MCTF is followed in order, and the reference degree for each area of the image is determined for each image. A reference map is generated to improve the image quality of an area with a high reference degree. Details are shown below. Here, it is assumed that initial values of all reference degrees are zero.

また、量子化パラメータは符号化ブロックごとに生成するので、参照度合も符号化ブロックごとに用意する。しかし、参照される側の領域、例えば領域Ｃ６３などは符号化ブロックの領域と必ずしも一致するとは限らない。この場合、領域Ｃ６３と一部でも画素を共有する符号化ブロック全ての参照度合を増やす。また、領域Ｃ６３の４分の１以上の画素を共有する符号化ブロックのみ参照度合を増やすなどしても良い。 Further, since a quantization parameter is generated for each coding block, a reference degree is also kept for each coding block. However, a referenced region, such as the region C63, does not necessarily coincide with a coding block. In that case, the reference degree is increased for every coding block that shares even some of its pixels with the region C63. Alternatively, the reference degree may be increased only for coding blocks that share at least one quarter of the pixels of the region C63.
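The mapping from a referenced region to the coding blocks it overlaps can be sketched as below. The rectangle representation `(x, y, width, height)`, the square block grid, and the name `overlapping_blocks` are assumptions for illustration; `min_fraction=0.25` corresponds to the "one quarter or more" variant.

```python
def overlapping_blocks(region, block_size, min_fraction=0.0):
    """Return (block_x, block_y) indices of coding blocks that share pixels with the region.

    region is (x, y, width, height) in pixels with non-negative coordinates.
    A block qualifies if it shares at least one pixel with the region and the shared
    pixels amount to at least min_fraction of the region's pixels.
    """
    x, y, w, h = region
    area = w * h
    selected = []
    for by in range(y // block_size, (y + h - 1) // block_size + 1):
        for bx in range(x // block_size, (x + w - 1) // block_size + 1):
            overlap_w = min(x + w, (bx + 1) * block_size) - max(x, bx * block_size)
            overlap_h = min(y + h, (by + 1) * block_size) - max(y, by * block_size)
            shared = overlap_w * overlap_h
            if shared > 0 and shared >= min_fraction * area:
                selected.append((bx, by))
    return selected


if __name__ == "__main__":
    referenced = (6, 4, 8, 8)   # an 8x8 region that straddles four 8x8 coding blocks
    print(overlapping_blocks(referenced, block_size=8))
    print(overlapping_blocks(referenced, block_size=8, min_fraction=0.25))
```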

時間的差分画像２０２の領域２２１は、１つレベルの高い時間的差分画像２０６の領域２６２を参照しているので、領域２６２の参照度合を１増やす。同様に、領域２４１との関係から領域Ｃ６３の参照度合を１増やす。 Since the region 221 of the temporal difference image 202 refers to the region 262 of the one-level high temporal difference image 206, the reference degree of the region 262 is increased by one. Similarly, the reference degree of the region C63 is increased by 1 from the relationship with the region 241.

時間的差分画像２０６の領域２６１は、時間的平均画像２０５の領域２５１を参照しているので、領域２５１の参照度合を１増やす。同様に領域２６２との関係から領域２５２の参照度合を１増やすが、領域２６２は既に参照度合が１であるので、さらに領域２５２の参照度合を１増やす。 Since the region 261 of the temporal difference image 206 refers to the region 251 of the temporal average image 205, the reference degree of the region 251 is increased by one. Similarly, the reference degree of the area 252 is increased by 1 from the relationship with the area 262, but since the reference degree of the area 262 is already 1, the reference degree of the area 252 is further increased by 1.

よって、領域２５２の参照度合が２、領域２５１と領域２６２と領域Ｃ６３が１となる。参照度合が大きい領域ほど、小さい量子化パラメータを生成し高画質化する。 Therefore, the reference degree of the area 252 is 2, and the area 251, the area 262, and the area C63 are 1. A region with a higher reference degree generates a smaller quantization parameter to improve image quality.
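The counting in the two-stage example can be reproduced with a short sketch. The rule used here, adding 1 plus the current reference degree of the referencing region, is a generalization inferred from this one example and should be read as an assumption; the function name and the use of string labels for regions are likewise hypothetical.

```python
def build_reference_degrees(references_by_level):
    """Accumulate reference degrees, processing MCTF levels from lowest to highest.

    references_by_level: list of levels, each a list of
    (referencing_region, referenced_region) pairs taken from the motion vectors.
    All reference degrees start at 0.
    """
    degree = {}
    for references in references_by_level:
        for source, target in references:
            # One count for the reference itself, plus whatever the referencing
            # region has already accumulated from lower levels.
            degree[target] = degree.get(target, 0) + 1 + degree.get(source, 0)
    return degree


if __name__ == "__main__":
    two_stage_example = [
        [("221", "262"), ("241", "C63")],   # level 1 difference images -> image 206
        [("261", "251"), ("262", "252")],   # level 2 difference image 206 -> image 205
    ]
    print(build_reference_degrees(two_stage_example))
```

The result gives region 252 a reference degree of 2 and regions 251, 262 and C63 a reference degree of 1, matching the values stated above.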

図４は、１６枚の画像に対してＭＣＴＦを４段階行った場合の参照マップの例である。領域ごとに参照度合を記載している。空白部分は参照度合０である。４０１、４０２、４０３、４０４はレベル２の時間的差分画像に対する参照マップである。４０５、４０６はレベル３の時間的差分画像に対する参照マップである。４０８はレベル４の時間的差分画像に対する参照マップである。４０７はレベル４の時間的平均画像に対する参照マップである。レベル１に関しては参照されないので参照マップは生成しない。 FIG. 4 is an example of a reference map when MCTF is performed in four stages on 16 images. The reference degree is described for each area. The blank part has a reference degree of 0. Reference numerals 401, 402, 403, and 404 are reference maps for level 2 temporal difference images. Reference numerals 405 and 406 denote reference maps for the level 3 temporal difference image. Reference numeral 408 denotes a reference map for the level 4 temporal difference image. Reference numeral 407 denotes a reference map for the level 4 temporal average image. Since reference is not made for level 1, no reference map is generated.

ここで、ＭＣＴＦの高レベルほど参照度合の増やし方を多くしても良い。これによって、より他の復号化画像への影響の大きい高レベルの画像を優先的に高画質化することが可能である。 Here, the amount by which the reference degree is increased may be made larger for higher MCTF levels. This makes it possible to give priority in image quality to high-level images, which have a greater influence on other decoded images.
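One way to realize a level-dependent increment is to weight each reference by the level at which its motion vector was generated. The sketch below is an assumption about how such weighting might look; the weights themselves, the propagation rule, and the name `build_weighted_degrees` are not specified in the source.

```python
def build_weighted_degrees(references_by_level, level_weights):
    """Like plain reference counting, but each level contributes its own increment.

    references_by_level[i] holds (referencing_region, referenced_region) pairs for
    the i-th MCTF level (lowest first); level_weights[i] is the increment for that level.
    """
    degree = {}
    for references, weight in zip(references_by_level, level_weights):
        for source, target in references:
            degree[target] = degree.get(target, 0) + weight + degree.get(source, 0)
    return degree


if __name__ == "__main__":
    references = [
        [("221", "262"), ("241", "C63")],
        [("261", "251"), ("262", "252")],
    ]
    print(build_weighted_degrees(references, level_weights=[1, 2]))
```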

１６枚の画像に対してＭＣＴＦを４段階行い、それら全ての動きベクトルを用いて参照マップを生成し量子化パラメータを生成する場合、復号化の処理においても１６枚の画像に対する４段階のＭＣＴＦの全ての動きベクトルが必要である。本実施の形態１は、符号化と復号化で同一の動きベクトルを用いて同一の参照マップを生成し、同一の量子化パラメータを生成することにより、量子化パラメータの符号化を省略することが可能である。 When four stages of MCTF are performed on 16 images and the reference map and quantization parameters are generated from all of the resulting motion vectors, the decoding process likewise needs all of the motion vectors of the four-stage MCTF over the 16 images. In the first embodiment, encoding and decoding use the same motion vectors to generate the same reference map and hence the same quantization parameters, which makes it possible to omit the encoding of the quantization parameters.
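Because both sides derive the same reference degrees, any deterministic mapping from reference degree to quantization parameter gives the same parameters on both sides without transmitting them. The linear mapping below, including the constants `base_qp`, `step` and `min_qp`, is purely an illustrative assumption; the source only states that a larger reference degree yields a smaller quantization parameter.

```python
def qp_from_degree(degree, base_qp=32, step=4, min_qp=1):
    """Map a reference degree to a quantization parameter: larger degree, smaller parameter."""
    return max(min_qp, base_qp - step * degree)


def qp_map(reference_degrees):
    """Apply the mapping to every region of a reference map."""
    return {region: qp_from_degree(d) for region, d in reference_degrees.items()}


if __name__ == "__main__":
    # The encoder and the decoder each hold the same reference degrees,
    # rebuilt independently from the same motion vectors.
    encoder_side = qp_map({"252": 2, "251": 1, "262": 1, "C63": 1})
    decoder_side = qp_map({"252": 2, "251": 1, "262": 1, "C63": 1})
    print(encoder_side == decoder_side, encoder_side)
```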

なお、例えば、監視カメラを固定し同じ位置を撮影し続ける監視映像においては、参照される回数が多くとも動きベクトルの方向と大きさが０の領域であれば、監視上重要でない背景などである。また、監視カメラが上下左右に動く場合、動きベクトルの方向と大きさがカメラの動きと完全に対応している領域であれば、監視上重要でない背景である。これらのケースは監視映像以外でも考えられ、その場合は参照度合を増やさず高画質化しなくとも良い。 Note that, for example, in surveillance video from a fixed camera that keeps shooting the same position, a region whose motion vector has zero direction and magnitude is background or the like that is unimportant for surveillance, even if it is referenced many times. Likewise, when the surveillance camera pans or tilts, a region whose motion vector direction and magnitude correspond exactly to the camera movement is unimportant background. Such cases can also arise in video other than surveillance video; in those cases the reference degree need not be increased and the region need not be given higher image quality.
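The exclusion described above can be sketched as a filter applied before a reference is counted. Representing motion vectors and the camera movement as `(dx, dy)` tuples, and the name `counts_toward_reference`, are assumptions for illustration; how the camera movement is obtained is outside the scope of this sketch.

```python
def counts_toward_reference(motion_vector, camera_motion=(0, 0)):
    """Return False for vectors that merely follow the camera (likely background).

    With a fixed camera, camera_motion is (0, 0), so zero vectors are excluded.
    With a moving camera, vectors equal to the camera movement are excluded.
    """
    return tuple(motion_vector) != tuple(camera_motion)


if __name__ == "__main__":
    print(counts_toward_reference((0, 0)))                        # fixed camera, static region
    print(counts_toward_reference((3, -1)))                       # fixed camera, moving object
    print(counts_toward_reference((5, 0), camera_motion=(5, 0)))  # panning camera, background
```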

次に、以上のように構成された映像符号化装置１００の動作を説明する。図５は、図１に示す第１の実施の形態の映像符号化装置１００の動作の一例を示すフローチャートである。なお、図５に示すフローチャートは、図示しない記憶装置（例えばＲＯＭやフラッシュメモリなど）に格納されたプログラムを、同じく図示しないＣＰＵが実行し、プログラムによりソフトウエア的に実行することも可能である。 Next, the operation of the video encoding apparatus 100 configured as described above will be described. FIG. 5 is a flowchart showing an example of the operation of the video encoding apparatus 100 of the first embodiment shown in FIG. 1. Note that the flowchart of FIG. 5 can also be carried out in software, by having a CPU (not shown) execute a program stored in a storage device (not shown) such as a ROM or flash memory.

最初に、ステップＳ５０１において、映像信号入力部１０１が、映像符号化装置１００の外部から映像を原画像として入力し、ＭＣＴＦ符号化部１０２に出力する。 First, in step S 501, the video signal input unit 101 inputs a video as an original image from the outside of the video encoding device 100 and outputs it to the MCTF encoding unit 102.

次に、ステップＳ５０２において、ＭＣＴＦ符号化部１０２が、映像信号入力部１０１から入力した複数の原画像をＭＣＴＦにより動き予測補償符号化し、動きベクトルを可変長符号化部１０６と参照マップ生成部１０３に出力する。また、時間的平均画像と時間的差分画像を生成し、量子化部１０５に出力する。 Next, in step S502, the MCTF encoding unit 102 performs motion prediction compensation encoding on the plurality of original images input from the video signal input unit 101 using MCTF, and converts the motion vector into the variable length encoding unit 106 and the reference map generation unit 103. Output to. In addition, a temporal average image and a temporal difference image are generated and output to the quantization unit 105.

次に、ステップＳ５０３において、参照マップ生成部１０３が、ＭＣＴＦ符号化部１０２から入力した動きベクトルを用いて参照マップを生成し、量子化パラメータ生成部１０４に出力する。参照マップの生成方法については後述する。 In step S 503, the reference map generation unit 103 generates a reference map using the motion vector input from the MCTF encoding unit 102 and outputs the reference map to the quantization parameter generation unit 104. A reference map generation method will be described later.

次に、ステップＳ５０４において、量子化パラメータ生成部１０４が、参照マップ生成部１０３から入力した参照マップを用いて画像内の領域ごとに量子化パラメータを生成し、量子化部１０５と可変長符号化部１０６に出力する。 Next, in step S504, the quantization parameter generation unit 104 generates a quantization parameter for each region in the image using the reference map input from the reference map generation unit 103, and the quantization unit 105 and the variable length coding. To the unit 106.

次に、ステップＳ５０５において、量子化部１０５が、ＭＣＴＦ符号化部１０２から入力した時間的平均画像と時間的差分画像を、量子化パラメータ生成部１０４から入力した量子化パラメータを用いて量子化し、可変長符号化部１０６に出力する。 Next, in step S505, the quantization unit 105 quantizes the temporal average image and temporal difference image input from the MCTF encoding unit 102 using the quantization parameter input from the quantization parameter generation unit 104, The data is output to the variable length coding unit 106.

次に、ステップＳ５０６において、可変長符号化部１０６が、ＭＣＴＦ符号化部１０２から入力した動きベクトルと、量子化部１０５から入力した量子化後の時間的平均画像と時間的差分画像と、を可変長符号化し映像ストリームを生成して、ストリーム出力部１０７に出力する。 Next, in step S506, the variable length encoding unit 106 calculates the motion vector input from the MCTF encoding unit 102, the quantized temporal average image and the temporal difference image input from the quantizing unit 105. A variable-length encoded video stream is generated and output to the stream output unit 107.

次に、ステップＳ５０７において、ストリーム出力部１０７が、可変長符号化部１０６から入力した映像ストリームを、映像符号化装置１００の外部に出力する。 Next, in step S507, the stream output unit 107 outputs the video stream input from the variable length encoding unit 106 to the outside of the video encoding device 100.

最後に、ステップＳ５０８において、映像信号入力部１０１が、映像符号化装置１００の外部から入力する映像の有無を判定し、映像の入力がなければ処理を終了する。そうでなければ、ステップＳ５０１に戻る。 Finally, in step S508, the video signal input unit 101 determines whether there is a video input from the outside of the video encoding apparatus 100. If there is no video input, the process ends. Otherwise, the process returns to step S501.

なお、ステップＳ５０２が本発明の動き予測補償符号化処理ステップに相当し、ステップＳ５０３が参照マップ生成処理ステップに相当し、ステップＳ５０４がビット指定パラメータ生成処理ステップに相当し、ステップＳ５０５がビット抽出処理ステップに相当し、ステップＳ５０６が可変長符号化処理ステップに相当する。 Step S502 corresponds to the motion prediction / compensation encoding processing step of the present invention, step S503 corresponds to the reference map generation processing step, step S504 corresponds to the bit designation parameter generation processing step, and step S505 corresponds to the bit extraction processing. Step S506 corresponds to a variable-length encoding processing step.

以上のように、本実施の形態１によれば、映像符号化装置１００は、ＭＣＴＦによる動き予測補償符号化の動きベクトルに基づき参照マップを生成し、参照マップに基づき量子化パラメータを生成することにより、量子化パラメータを符号化することなく、多く参照される重要な領域を高画質化することが可能で、符号化効率を向上することができる。 As described above, according to the first embodiment, the video encoding device 100 generates the reference map based on the motion vector of the motion prediction compensation encoding by MCTF, and generates the quantization parameter based on the reference map. Thus, it is possible to improve the image quality of important regions that are often referred to without encoding quantization parameters, and to improve encoding efficiency.

（実施の形態２） (Embodiment 2)
第２の実施の形態では、第１の実施の形態を適用した映像符号化装置１００が出力した映像ストリームを復号化する。領域ごとの画質を決定する量子化パラメータがなくとも復号化する。 In the second embodiment, the video stream output by the video encoding apparatus 100 of the first embodiment is decoded. Decoding is performed even though the stream carries no quantization parameters determining the image quality of each region.

図６は、本実施の形態２に関わる映像復号化方法を適用した映像復号化装置６００の構成を表す。 FIG. 6 shows a configuration of a video decoding apparatus 600 to which the video decoding method according to the second embodiment is applied.

図６において、映像復号化装置６００は、ストリーム入力部６０１、可変長復号化部６０２、参照マップ生成部６０３、量子化パラメータ生成部６０４、逆量子化部６０５、ＭＣＴＦ復号化部６０６、映像信号出力部６０７を有する。 In FIG. 6, a video decoding apparatus 600 includes a stream input unit 601, a variable length decoding unit 602, a reference map generation unit 603, a quantization parameter generation unit 604, an inverse quantization unit 605, an MCTF decoding unit 606, a video signal. An output unit 607 is included.

ストリーム入力部６０１は、映像復号化装置６００の外部から映像ストリームを入力し、可変長復号化部６０２に出力する。また、映像復号化装置６００の外部から入力する映像ストリームの有無を判定し、映像ストリームの入力がなければ処理を終了する。 The stream input unit 601 receives a video stream from the outside of the video decoding apparatus 600 and outputs the video stream to the variable length decoding unit 602. Also, the presence / absence of a video stream input from the outside of the video decoding apparatus 600 is determined. If there is no video stream input, the process ends.

可変長復号化部６０２は、ストリーム入力部６０１から入力した映像ストリームを可変長復号化して、量子化後の時間的平均画像と時間的差分画像を生成し、逆量子化部６０５に出力する。また、動きベクトルを生成し、ＭＣＴＦ復号化部６０６と参照マップ生成部６０３に出力する。 The variable length decoding unit 602 performs variable length decoding on the video stream input from the stream input unit 601, generates the quantized temporal average image and temporal difference images, and outputs them to the inverse quantization unit 605. It also generates the motion vectors and outputs them to the MCTF decoding unit 606 and the reference map generation unit 603.

参照マップ生成部６０３は、可変長復号化部６０２から入力した動きベクトルを用いて、参照マップを生成する。参照マップの生成は実施の形態１と同様であるので詳細は割愛する。 The reference map generation unit 603 generates a reference map using the motion vectors input from the variable length decoding unit 602. Since the generation of the reference map is the same as in the first embodiment, the details are omitted.

量子化パラメータ生成部６０４は、参照マップ生成部６０３から入力した参照マップを用いて画像内の領域ごとに量子化パラメータを生成し、逆量子化部６０５に出力する。量子化パラメータの生成は実施の形態１と同様であるので詳細は割愛する。 The quantization parameter generation unit 604 generates a quantization parameter for each region in the image using the reference map input from the reference map generation unit 603 and outputs the quantization parameter to the inverse quantization unit 605. Since the generation of the quantization parameter is the same as in the first embodiment, the details are omitted.

逆量子化部６０５は、可変長復号化部６０２から入力した量子化後の時間的平均画像と時間的差分画像を、量子化パラメータ生成部６０４から入力した量子化パラメータを用いて逆量子化して、ＭＣＴＦ復号化部６０６に出力する。 The inverse quantization unit 605 performs inverse quantization on the quantized temporal average image and temporal difference image input from the variable length decoding unit 602 using the quantization parameter input from the quantization parameter generation unit 604. , Output to the MCTF decoding unit 606.

ＭＣＴＦ復号化部６０６は、逆量子化部６０５から入力した逆量子化後の時間的平均画像と時間的差分画像を、可変長復号化部６０２から入力した動きベクトルを用いてＭＣＴＦによる動き予測補償復号化して復号化画像を生成し、映像信号出力部６０７に出力する。 The MCTF decoding unit 606 performs MCTF motion prediction compensation decoding on the dequantized temporal average image and temporal difference images input from the inverse quantization unit 605, using the motion vectors input from the variable length decoding unit 602, to generate decoded images, and outputs them to the video signal output unit 607.

映像信号出力部６０７は、ＭＣＴＦ復号化部６０６から入力した復号化画像を、映像復号化装置６００の外部に出力する。 The video signal output unit 607 outputs the decoded image input from the MCTF decoding unit 606 to the outside of the video decoding device 600.

なお、ＭＣＴＦ復号化部６０６が本発明の動き予測補償復号化手段に相当し、参照マップ生成部６０３が参照マップ生成手段に相当し、量子化パラメータ生成部６０４がビット指定パラメータ生成手段に相当し、逆量子化部６０５がビット配置手段に相当し、可変長復号化部６０２が可変長復号化手段に相当する。また、量子化パラメータがビット指定パラメータに相当する。 The MCTF decoding unit 606 corresponds to the motion prediction compensation decoding means of the present invention, the reference map generation unit 603 to the reference map generation means, the quantization parameter generation unit 604 to the bit designation parameter generation means, the inverse quantization unit 605 to the bit arrangement means, and the variable length decoding unit 602 to the variable length decoding means. The quantization parameter corresponds to the bit designation parameter.

次に、以上のように構成された映像復号化装置６００の動作を説明する。図７は、図６に示す第２の実施の形態の映像復号化装置６００の動作の一例を示すフローチャートである。なお、図７に示すフローチャートは、図示しない記憶装置（例えばＲＯＭやフラッシュメモリなど）に格納されたプログラムを、同じく図示しないＣＰＵが実行し、プログラムによりソフトウエア的に実行することも可能である。 Next, the operation of the video decoding apparatus 600 configured as described above will be described. FIG. 7 is a flowchart showing an example of the operation of the video decoding apparatus 600 of the second embodiment shown in FIG. 6. The procedure in FIG. 7 can also be implemented in software by having a CPU (not shown) execute a program stored in a storage device (not shown) such as a ROM or flash memory.

最初に、ステップＳ７０１において、ストリーム入力部６０１が、映像復号化装置６００の外部から映像ストリームを入力し、可変長復号化部６０２に出力する。 First, in step S 701, the stream input unit 601 inputs a video stream from the outside of the video decoding apparatus 600 and outputs it to the variable length decoding unit 602.

次に、ステップＳ７０２において、可変長復号化部６０２が、ストリーム入力部６０１から入力した映像ストリームを可変長復号化して、量子化後の時間的平均画像と時間的差分画像を生成し、逆量子化部６０５に出力する。また、動きベクトルを生成し、ＭＣＴＦ復号化部６０６と参照マップ生成部６０３に出力する。 Next, in step S702, the variable length decoding unit 602 performs variable length decoding on the video stream input from the stream input unit 601, generates the quantized temporal average image and temporal difference images, and outputs them to the inverse quantization unit 605. It also generates motion vectors and outputs them to the MCTF decoding unit 606 and the reference map generation unit 603.

次に、ステップＳ７０３において、参照マップ生成部６０３が、可変長復号化部６０２から入力した動きベクトルを用いて、参照マップを生成する。 Next, in step S703, the reference map generation unit 603 generates a reference map using the motion vectors input from the variable length decoding unit 602.

次に、ステップＳ７０４において、量子化パラメータ生成部６０４が、参照マップ生成部６０３から入力した参照マップを用いて画像内の領域ごとに量子化パラメータを生成し、逆量子化部６０５に出力する。 In step S 704, the quantization parameter generation unit 604 generates a quantization parameter for each region in the image using the reference map input from the reference map generation unit 603, and outputs the quantization parameter to the inverse quantization unit 605.

次に、ステップＳ７０５において、逆量子化部６０５が、可変長復号化部６０２から入力した量子化後の時間的平均画像と時間的差分画像を、量子化パラメータ生成部６０４から入力した量子化パラメータを用いて逆量子化して、ＭＣＴＦ復号化部６０６に出力する。 Next, in step S705, the inverse quantization unit 605 inversely quantizes the quantized temporal average image and temporal difference images input from the variable length decoding unit 602, using the quantization parameter input from the quantization parameter generation unit 604, and outputs the result to the MCTF decoding unit 606.

次に、ステップＳ７０６において、ＭＣＴＦ復号化部６０６が、逆量子化部６０５から入力した逆量子化後の時間的平均画像と時間的差分画像を、可変長復号化部６０２から入力した動きベクトルを用いてＭＣＴＦによる動き予測補償復号化して復号化画像を生成し、映像信号出力部６０７に出力する。 Next, in step S706, the MCTF decoding unit 606 performs MCTF-based motion prediction compensation decoding on the inversely quantized temporal average image and temporal difference images input from the inverse quantization unit 605, using the motion vectors input from the variable length decoding unit 602, to generate decoded images, and outputs them to the video signal output unit 607.

次に、ステップＳ７０７において、映像信号出力部６０７が、ＭＣＴＦ復号化部６０６から入力した復号化画像を、映像復号化装置６００の外部に出力する。 Next, in step S 707, the video signal output unit 607 outputs the decoded image input from the MCTF decoding unit 606 to the outside of the video decoding device 600.

最後に、ステップＳ７０８において、映像復号化装置６００の外部から入力する映像ストリームの有無を判定し、映像ストリームの入力がなければ処理を終了する。そうでなければ、ステップＳ７０１に戻る。 Finally, in step S708, the presence / absence of a video stream input from the outside of the video decoding apparatus 600 is determined. If there is no video stream input, the process ends. Otherwise, the process returns to step S701.

なお、ステップＳ７０６が本発明の動き予測補償復号化処理ステップに相当し、ステップＳ７０３が参照マップ生成処理ステップに相当し、ステップＳ７０４がビット指定パラメータ生成処理ステップに相当し、ステップＳ７０５がビット配置処理ステップに相当し、ステップＳ７０２が可変長復号化処理ステップに相当する。 Step S706 corresponds to the motion prediction compensation decoding step of the present invention, step S703 to the reference map generation step, step S704 to the bit designation parameter generation step, step S705 to the bit arrangement step, and step S702 to the variable length decoding step.

以上のように、本実施の形態２によれば、映像復号化装置６００は、実施の形態１の映像符号化装置１００の出力する映像ストリームを復号化できる。ＭＣＴＦによる動き予測補償符号化の動きベクトルに基づき参照マップを生成し、参照マップに基づき量子化パラメータを生成することにより、量子化パラメータの符号を省略することが可能で、符号化効率を向上することができる。 As described above, according to the second embodiment, the video decoding apparatus 600 can decode the video stream output by the video encoding apparatus 100 of the first embodiment. Because the reference map is generated from the motion vectors of the MCTF-based motion prediction compensation coding and the quantization parameter is generated from the reference map, the code for the quantization parameter can be omitted from the stream, which improves coding efficiency.
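The decoder-side derivation summarized above can be sketched as follows. This is only an illustrative sketch: the rule that maps reference counts to a quantization step (`base_step`, `delta`, `min_step`) is an assumption, since the actual rule is defined in the first embodiment and is not reproduced in this section, and the `(image, region)` tuples and the third motion vector are hypothetical. What the sketch does show is that the decoder needs only the motion vectors to arrive at the same per-region parameter as the encoder.

```python
from collections import Counter


def build_reference_map(motion_vectors):
    """Count how many motion vectors point at each (image, region) pair."""
    return Counter(target for _source, target in motion_vectors)


def derive_quantization_steps(reference_map, regions, base_step=32, delta=4, min_step=8):
    """Hypothetical rule: each incoming reference reduces the step by `delta`."""
    return {
        region: max(min_step, base_step - delta * reference_map[region])
        for region in regions
    }


def inverse_quantize(levels, steps):
    """Multiply each quantized value by the step derived for its region."""
    return {region: [v * steps[region] for v in values]
            for region, values in levels.items()}


# (source region, referenced region); identifiers are illustrative only.
motion_vectors = [
    ((206, 262), (205, 252)),
    ((202, 221), (206, 262)),
    ((204, 0), (205, 252)),
]
regions = [(205, 252), (206, 262), (202, 221), (204, 0)]

reference_map = build_reference_map(motion_vectors)
steps = derive_quantization_steps(reference_map, regions)
restored = inverse_quantize({(205, 252): [3, -1], (202, 221): [2]}, steps)
print(steps)
print(restored)
```

Because both sides run the same deterministic functions on the same motion vectors, no quantization parameter has to be transmitted.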

（実施の形態３） (Embodiment 3)
第３の実施の形態では、映像を基本レイヤと拡張レイヤに階層化して符号化する階層符号化において、拡張レイヤの符号が基本レイヤのどの領域を高画質化するかを特定するパラメータを省略して符号化する。 In the third embodiment, in hierarchical coding in which video is layered into a base layer and an enhancement layer, encoding is performed while omitting the parameter that specifies which region of the base layer the enhancement-layer code improves in image quality.

図８は、本実施の形態３に関わる映像符号化方法を適用した映像符号化装置８００の構成を示す。 FIG. 8 shows a configuration of a video encoding apparatus 800 to which the video encoding method according to the third embodiment is applied.

図８において、映像符号化装置８００は、映像信号入力部８０１、基本レイヤ符号化部８０２、拡張レイヤ符号化部８０３、ストリーム出力部８０４を有する。 In FIG. 8, a video encoding apparatus 800 includes a video signal input unit 801, a base layer encoding unit 802, an enhancement layer encoding unit 803, and a stream output unit 804.

映像信号入力部８０１は、映像符号化装置８００の外部から映像を原画像として入力し、基本レイヤ符号化部８０２に出力する。また、映像符号化装置８００の外部から入力する映像の有無を判定し、映像の入力がなければ処理を終了する。 The video signal input unit 801 inputs a video as an original image from the outside of the video encoding device 800 and outputs it to the base layer encoding unit 802. Also, the presence / absence of a video input from the outside of the video encoding device 800 is determined. If there is no video input, the process ends.

ここで、映像信号入力部８０１は、図１７のように、８枚の原画像に対して３回のＭＣＴＦを行う場合は、画像を８枚ずつ入力する。１６枚に対して４回のＭＣＴＦを行う場合は、１６枚ずつ入力する。 Here, when three rounds of MCTF are applied to eight original images as in FIG. 17, the video signal input unit 801 inputs images eight at a time. When four rounds of MCTF are applied to 16 images, it inputs them 16 at a time.
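The relationship between the number of MCTF rounds and the number of images input at a time can be illustrated with a short sketch. It assumes the usual dyadic decomposition, in which each round halves the number of temporal average images; the function names are hypothetical.

```python
def images_per_group(mctf_rounds: int) -> int:
    """Number of original images consumed together for the given number of rounds."""
    return 2 ** mctf_rounds


def decomposition(mctf_rounds: int):
    """Return (temporal average images left, temporal difference images per round)."""
    remaining = images_per_group(mctf_rounds)
    differences = []
    for _ in range(mctf_rounds):
        remaining //= 2          # each pair yields one average and one difference
        differences.append(remaining)
    return remaining, differences


print(images_per_group(3), decomposition(3))
print(images_per_group(4), decomposition(4))
```

Under this assumption, three rounds on eight images leave one temporal average image and 4 + 2 + 1 temporal difference images.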

基本レイヤ符号化部８０２は、映像信号入力部８０１から入力する原画像を符号化し基本レイヤストリームを生成して、ストリーム出力部８０４に出力する。また、基本レイヤを符号化する際に得られる中間情報である、動きベクトル、量子化パラメータ、時間的平均画像と時間的差分画像、量子化後の時間的平均画像と時間的差分画像、を拡張レイヤ符号化部８０３に出力する。 The base layer encoding unit 802 encodes the original image input from the video signal input unit 801, generates a base layer stream, and outputs the base layer stream to the stream output unit 804. In addition, motion vectors, quantization parameters, temporal average images and temporal difference images, and quantized temporal average images and temporal difference images, which are intermediate information obtained when encoding the base layer, are expanded. The data is output to the layer encoding unit 803.

基本レイヤ符号化部８０２の符号化は、ＭＣＴＦを用いた動き予測補償符号化を行うものとするが、フレームレートの選択ができないので実施の形態１を適用するものではないものとする。詳しくは後述する。 The base layer encoding unit 802 performs motion prediction compensation encoding using MCTF; however, Embodiment 1 is not applied to it, because Embodiment 1 does not allow the frame rate to be selected. Details are given later.

拡張レイヤ符号化部８０３は、基本レイヤ符号化部８０２から入力した、動きベクトル、量子化パラメータ、時間的平均画像と時間的差分画像、量子化した時間的平均画像と時間的差分画像、を用いて拡張レイヤストリームを生成し、ストリーム出力部８０４に出力する。 The enhancement layer encoding unit 803 uses the motion vector, the quantization parameter, the temporal average image and the temporal difference image, and the quantized temporal average image and the temporal difference image input from the base layer encoding unit 802. The enhancement layer stream is generated and output to the stream output unit 804.

ストリーム出力部８０４は、基本レイヤ符号化部８０２から入力した基本レイヤストリームと拡張レイヤ符号化部８０３から入力した拡張レイヤストリームを映像ストリームとして、映像符号化装置８００の外部に出力する。 The stream output unit 804 outputs the base layer stream input from the base layer encoding unit 802 and the enhancement layer stream input from the enhancement layer encoding unit 803 to the outside of the video encoding apparatus 800 as a video stream.

拡張レイヤ符号化部８０３は、差分部８３２、参照マップ生成部８３３、組替パラメータ生成部８３６、組替部８３４、可変長符号化部８３５を有する。 The enhancement layer encoding unit 803 includes a difference unit 832, a reference map generation unit 833, a rearrangement parameter generation unit 836, a rearrangement unit 834, and a variable length encoding unit 835.

差分部８３２は、基本レイヤ符号化部８０２から拡張レイヤ符号化部８０３に入力した時間的平均画像と時間的差分画像と、量子化した時間的平均画像と時間的差分画像を入力する。そして、時間的平均画像と量子化した時間的平均画像との差分をとり、また、時間的差分画像と量子化した時間的差分画像との差分をとり、空間的差分画像を生成して、組替部８３４に出力する。 The difference unit 832 receives the temporal average image and temporal difference images, together with their quantized counterparts, that the base layer encoding unit 802 passes to the enhancement layer encoding unit 803. It takes the difference between the temporal average image and the quantized temporal average image, and the difference between each temporal difference image and its quantized counterpart, to generate spatial difference images, and outputs them to the rearrangement unit 834.

ここで、時間的差分画像とはＭＣＴＦにおける時間的に前後する画像同士の差分を表したが、空間的差分画像は、ある画像とそれと同一の時間を表す他の画像との差分を表す。基本レイヤが生成した量子化した時間的平均画像および時間的差分画像は、量子化の除算の処理により、精細な画素の情報、すなわち、幾つかの空間的情報を失う。空間的差分情報は、量子化によって失われたそれら空間的な情報を有する。 Here, whereas a temporal difference image represents the difference between temporally adjacent images in MCTF, a spatial difference image represents the difference between an image and another image representing the same time instant. The quantized temporal average image and temporal difference images generated by the base layer lose fine pixel information, that is, some spatial information, through the division performed in quantization. The spatial difference image carries the spatial information lost through quantization.
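A minimal sketch of how such a spatial difference could be computed is shown below. It assumes that quantization is an integer division by a step size and that the difference is taken against the reconstructed (re-multiplied) values; both the step value and the function names are assumptions made for illustration.

```python
def quantize(pixels, step):
    """Quantize by integer (floor) division; the remainder is discarded."""
    return [p // step for p in pixels]


def dequantize(levels, step):
    """Reconstruct the coarse values that the base layer can represent."""
    return [q * step for q in levels]


def spatial_difference(pixels, step):
    """Information lost by quantization: original minus base-layer reconstruction."""
    reconstructed = dequantize(quantize(pixels, step), step)
    return [p - r for p, r in zip(pixels, reconstructed)]


pixels = [200, 13, 7]
step = 8
residual = spatial_difference(pixels, step)
restored = [r + d for r, d in zip(dequantize(quantize(pixels, step), step), residual)]
print(residual, restored)
```

Adding the residual back to the base-layer reconstruction recovers the original values exactly, which is what the enhancement layer is meant to make possible.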

参照マップ生成部８３３は、基本レイヤ符号化部８０２から拡張レイヤ符号化部８０３に入力した動きベクトルを用いて参照マップを生成し、組替パラメータ生成部８３６に出力する。 The reference map generation unit 833 generates a reference map using the motion vectors input from the base layer encoding unit 802 to the enhancement layer encoding unit 803, and outputs the reference map to the rearrangement parameter generation unit 836.

組替パラメータ生成部８３６は、基本レイヤ符号化部８０２から拡張レイヤ符号化部８０３に入力した量子化パラメータと、参照マップ生成部８３３から入力した参照マップを用いて組替パラメータを生成し、組替部８３４に出力する。 The rearrangement parameter generation unit 836 generates a rearrangement parameter using the quantization parameter input from the base layer encoding unit 802 to the enhancement layer encoding unit 803 and the reference map input from the reference map generation unit 833, and outputs the rearrangement parameter to the rearrangement unit 834.

組替パラメータとは、２進化した空間的差分画像のビットの映像ストリームに格納する順序を、符号化効率が向上するように組み替える方法を指定するものである。組替パラメータの生成については後述する。 The rearrangement parameter specifies a method of rearranging the order of storing the binarized spatial difference image bits in the video stream so that the coding efficiency is improved. The generation of the rearrangement parameter will be described later.

組替部８３４は、差分部８３２から入力した空間的差分画像を、組替パラメータ生成部８３６から入力した組替パラメータを用いてデータ構造を組み換え、可変長符号化部８３５に出力する。空間的差分画像のデータ構造の組み替えについては後述する。 The rearrangement unit 834 recombines the data structure of the spatial difference image input from the difference unit 832 using the rearrangement parameter input from the rearrangement parameter generation unit 836, and outputs the rearranged data structure to the variable length encoding unit 835. The rearrangement of the data structure of the spatial difference image will be described later.

なお、組み替えの前に空間的差分画像にＤＣＴ変換を行うなどしても良い。その場合は、復号化の処理の際に逆ＤＣＴ変換を必要とする。本発明の主旨ではないので説明は割愛する。 Note that DCT conversion may be performed on the spatial difference image before recombination. In that case, an inverse DCT transform is required in the decoding process. Since it is not the gist of the present invention, the description is omitted.

可変長符号化部８３５は、組替部８３４から入力した組替後の空間的差分画像を、可変長符号化し拡張レイヤストリームを生成し、ストリーム出力部８０４に出力する。 The variable length coding unit 835 performs variable length coding on the spatial difference image after the rearrangement input from the rearrangement unit 834, generates an enhancement layer stream, and outputs the enhancement layer stream to the stream output unit 804.

なお、基本レイヤ符号化部８０２が本発明の動き予測補償符号化手段に相当し、参照マップ生成部８３３が参照マップ生成手段に相当し、組替パラメータ生成部８３６がビット指定パラメータ生成手段に相当し、組替部８３４がビット抽出手段に相当し、可変長符号化部８３５が可変長符号化手段に相当する。また、組替パラメータがビット指定パラメータに相当する。 The base layer encoding unit 802 corresponds to the motion prediction compensation encoding means of the present invention, the reference map generation unit 833 to the reference map generation means, the rearrangement parameter generation unit 836 to the bit designation parameter generation means, the rearrangement unit 834 to the bit extraction means, and the variable length encoding unit 835 to the variable length encoding means. The rearrangement parameter corresponds to the bit designation parameter.

以下に、本発明の骨子の一つである、組替パラメータの生成と、空間的差分画像のデータ構造の組み替えについて述べる。 The following describes the generation of the rearrangement parameter and the rearrangement of the data structure of the spatial difference image, which are among the main points of the present invention.

ここで、基本レイヤに関する実施の形態１と本実施の形態の違いについて説明する。 First, the difference between Embodiment 1 and the present embodiment with respect to the base layer is explained.

実施の形態１では、８枚の画像に対して３段階のＭＣＴＦを行う場合、それら全ての動きベクトルを用いて参照マップを生成し量子化パラメータを生成するので、復号化の処理においても８枚の画像に対する３段階のＭＣＴＦの全ての動きベクトルが必要であった。すなわち、８枚の画像で３段階のＭＣＴＦを行う場合、８枚のうち４枚だけ復号化するというようにフレームレートを選択することができない。 In Embodiment 1, when three-stage MCTF is applied to eight images, the reference map and the quantization parameters are generated using all of the motion vectors, so decoding also requires all motion vectors of the three-stage MCTF over the eight images. In other words, when three-stage MCTF is applied to eight images, the frame rate cannot be selected, for example by decoding only four of the eight images.

しかし、実施の形態３では、基本レイヤ符号化部８０２に非特許文献２のような通常のＭＣＴＦを用いており、フレームレートを選択することが可能である。 However, in Embodiment 3, a normal MCTF as in Non-Patent Document 2 is used for base layer encoding section 802, and the frame rate can be selected.
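Frame rate selection with ordinary MCTF can be illustrated with a deliberately simplified sketch. It assumes an integer Haar lifting filter and omits motion compensation entirely, and each "frame" is reduced to a single pixel value; the filter used in Non-Patent Document 2 is not described in this section, so none of this should be read as that document's method.

```python
def haar_forward(a, b):
    """One temporal filtering step: returns (temporal average, temporal difference)."""
    h = b - a
    l = a + h // 2
    return l, h


def haar_inverse(l, h):
    """Exact inverse of haar_forward."""
    a = l - h // 2
    return a, a + h


def encode_two_stage(frames):
    """Two-stage decomposition of four frames into one average and three differences."""
    f0, f1, f2, f3 = frames
    l0, h0 = haar_forward(f0, f1)
    l1, h1 = haar_forward(f2, f3)
    ll, lh = haar_forward(l0, l1)
    return ll, lh, h0, h1


def decode(ll, lh, h0=None, h1=None):
    """Decode at half frame rate if the first-stage differences are not supplied."""
    l0, l1 = haar_inverse(ll, lh)
    if h0 is None or h1 is None:
        return [l0, l1]
    return [*haar_inverse(l0, h0), *haar_inverse(l1, h1)]


frames = [10, 14, 20, 23]
ll, lh, h0, h1 = encode_two_stage(frames)
print(decode(ll, lh))          # half frame rate
print(decode(ll, lh, h0, h1))  # full frame rate
```

The half-rate output needs only the second-stage data, which is why a decoder that depends on every motion vector of every stage loses this flexibility.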

以下、説明の簡単のため、２段階のＭＣＴＦについて説明する。 Hereinafter, for simplicity of explanation, a two-stage MCTF will be described.

図９は、本実施の形態３の映像符号化装置８００が出力する映像ストリームの構造を示す。２０２、２０４、２０５、２０６は図３に示した時間的平均画像および時間的差分画像である。図を見やすくするため、そして、説明の簡単のため、図３中の幾つかの部品は省略した。 FIG. 9 shows the structure of a video stream output from the video encoding apparatus 800 according to the third embodiment. Reference numerals 202, 204, 205, and 206 denote the temporal average image and temporal difference image shown in FIG. In order to make the drawing easier to see and to simplify the description, some parts in FIG. 3 are omitted.

時間的平均画像２０５を量子化し可変長符号化したものが基本レイヤストリーム９０１で、対応する拡張レイヤストリームが９１１である。そして、時間的差分画像２０２と２０６と２０４の基本レイヤストリームが９０２と９０３と９０４、それらに対応する拡張レイヤストリームが９１２と９１３と９１４である。ＭＣＴＦの動きベクトルの符号、例えば、動きベクトル２６４、３２２の符号は、基本レイヤストリームが含む。 A base layer stream 901 is obtained by quantizing the temporal average image 205 and variable-length coding, and a corresponding enhancement layer stream is 911. The base layer streams of the temporal difference images 202, 206, and 204 are 902, 903, and 904, and the corresponding enhancement layer streams are 912, 913, and 914, respectively. The code of the motion vector of MCTF, for example, the code of the motion vectors 264 and 322 is included in the base layer stream.

基本レイヤと拡張レイヤの画素のビット構造に関して説明する。 The bit structure of the base layer and enhancement layer pixels will be described.

図１０は、ビット構造を示した図である。ある領域の画素の値を２進数で表して縦に並べたものであり、左のビットほど上位のビットを示す。画素の値は上位のビットほど重要であり、基本レイヤでは量子化を行って下位のビットを省略し上位のビットのみ、例えばビット群１００１のみ、を符号化する。拡張レイヤでは、基本レイヤが量子化して省略した下位のビット群１００２〜１００４を符号化する。 FIG. 10 shows a bit structure. The values of pixels in a certain region are expressed in binary numbers and arranged vertically, and the left bit indicates higher bits. The value of the pixel is more important for the upper bits. In the base layer, quantization is performed and the lower bits are omitted, and only the upper bits, for example, only the bit group 1001 are encoded. In the enhancement layer, lower bit groups 1002 to 1004 which are quantized and omitted by the base layer are encoded.

なお、ＤＣＴ変換などの処理を行う場合は画素ではなく係数となるが、ビット構造の考え方は同様である。 Note that when processing such as DCT conversion is performed, coefficients are used instead of pixels, but the concept of the bit structure is the same.
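The split between base-layer bits and enhancement-layer bit groups can be sketched as below. The 8-bit sample width and the choice of 5 base-layer bits are assumptions for illustration; the sketch treats each lower bit as its own group, in line with the three lower bit groups 1002 to 1004.

```python
def split_bit_groups(value, total_bits=8, base_bits=5):
    """Split a sample into its upper (base-layer) bits and a list of lower bits.

    The lower bits are returned most significant first, one bit per group.
    """
    lower_count = total_bits - base_bits
    base = value >> lower_count
    lower = [(value >> i) & 1 for i in range(lower_count - 1, -1, -1)]
    return base, lower


base, lower = split_bit_groups(0b10110101)
print(bin(base), lower)
```

Here `base` plays the role of bit group 1001 and the three entries of `lower` play the roles of bit groups 1002, 1003, and 1004, in that order.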

図１１は、本発明の実施の形態３による映像ストリームの構造とビット構造の関係を示す図である。図１１は、図１０のビット構造が、図３における時間的平均画像２０５の領域２５２のビット構造であることを示す。 FIG. 11 is a diagram showing the relationship between the structure of the video stream and the bit structure according to Embodiment 3 of the present invention. FIG. 11 shows that the bit structure of FIG. 10 is the bit structure of region 252 of the temporal average image 205 in FIG. 3.

上位のビット群１００１は、時間的平均画像２０５の基本レイヤストリーム９０１が格納する。１ビット下位のビット群１００２は、時間的平均画像２０５の拡張レイヤストリーム９１１が格納する。さらに１ビット下位のビット群１００３は、領域２５２を参照する時間的差分画像２０６の領域２６２の拡張レイヤストリーム９１３が格納する。さらに１ビット下位のビット群１００４は、領域２５２を参照する２６２をさらに参照する時間的差分画像２０２の領域２２１の拡張レイヤストリーム９１２が格納する。 The upper bit group 1001 is stored in the base layer stream 901 of the temporal average image 205. The bit group 1002, one bit lower, is stored in the enhancement layer stream 911 of the temporal average image 205. The bit group 1003, one bit lower still, is stored in the enhancement layer stream 913 for region 262 of the temporal difference image 206, which references region 252. The bit group 1004, one further bit lower, is stored in the enhancement layer stream 912 for region 221 of the temporal difference image 202, which references region 262, which in turn references region 252.

どのビット群をどの時間的平均画像または時間的差分画像の拡張レイヤストリームに格納するかは、基本レイヤストリームが含む動きベクトル２６４、３２２を用いて生成する参照マップにより特定することが可能であり、この情報を組替パラメータとして生成する。例えば、図１１において、ビット群１００２を拡張レイヤストリーム９１１に、ビット群１００３を拡張レイヤストリーム９１３に、ビット群１００４を拡張レイヤストリーム９１２に格納する、という情報である。そして、組替パラメータが指定する、どのビット群をどの拡張レイヤストリームに格納するか、の情報に基づきデータ構造の組替を行う。例えば、図１１において、ビット群１００２を拡張レイヤストリーム９１１に、ビット群１００３を拡張レイヤストリーム９１３に、ビット群１００４を拡張レイヤストリーム９１２に格納する。 Which bit group is stored in the enhancement layer stream of which temporal average image or temporal difference image can be determined from the reference map generated using the motion vectors 264 and 322 contained in the base layer stream, and this information is generated as the rearrangement parameter. In FIG. 11, for example, the rearrangement parameter states that bit group 1002 goes to enhancement layer stream 911, bit group 1003 to enhancement layer stream 913, and bit group 1004 to enhancement layer stream 912. The data structure is then rearranged according to this assignment of bit groups to enhancement layer streams: in FIG. 11, bit group 1002 is stored in enhancement layer stream 911, bit group 1003 in enhancement layer stream 913, and bit group 1004 in enhancement layer stream 912.
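One way to picture the rearrangement parameter is as a mapping from bit groups to enhancement layer streams, obtained by following the chain of regions that reference one another. The sketch below is an assumption-laden illustration: the dictionary layout, the function names, and the simplification that each region is referenced by at most one other region are all hypothetical, while the reference numerals are taken from the example in the text.

```python
def referencing_chain(region, referenced_by):
    """Follow 'is referenced by' links starting from the given region."""
    chain = [region]
    while chain[-1] in referenced_by:
        chain.append(referenced_by[chain[-1]])
    return chain


def rearrangement_parameter(region, bit_groups, referenced_by, stream_of_image):
    """Assign successive lower bit groups to the streams of successive referrers."""
    chain = referencing_chain(region, referenced_by)
    return {
        group: stream_of_image[image]
        for group, (image, _region) in zip(bit_groups, chain)
    }


# Region 252 of image 205 is referenced by region 262 of image 206,
# which is referenced by region 221 of image 202.
referenced_by = {(205, 252): (206, 262), (206, 262): (202, 221)}
stream_of_image = {205: 911, 206: 913, 202: 912, 204: 914}

parameter = rearrangement_parameter(
    (205, 252), [1002, 1003, 1004], referenced_by, stream_of_image
)
print(parameter)
```

Since `referenced_by` is derived purely from motion vectors carried in the base layer stream, a decoder can rebuild the same mapping without receiving it.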

本発明の方法を用いて、このような映像ストリームの構造をとると、フレームレートを増やし時間的差分画像の映像ストリームを多く復号化する場合に、より多く参照される領域ほど多くの情報を映像ストリーム中に符号化するので、参照される領域の画質が向上し、ドリフトノイズの発生を防ぐことが可能である。そして、基本レイヤストリームが持つＭＣＴＦによる時間的差分画像の動きベクトルの情報を参照マップに追加するごとに、拡張レイヤが高画質化する領域のビット群を順次特定していくことが可能である。よって、参照度合の多い重要な領域の画質を向上することが可能で、かつ、復号化の処理でフレームレートを選択して復号化することが可能で、かつ、ある符号がどのビット群を表すかを示す符号が必要ないので高い符号化効率が得られる。 With this video stream structure, when the frame rate is raised and more of the temporal difference image streams are decoded, more information is coded into the video stream for regions that are referenced more often, so the image quality of the referenced regions improves and drift noise can be prevented. In addition, each time the motion vector information of an MCTF temporal difference image held in the base layer stream is added to the reference map, the bit groups of the regions whose quality the enhancement layer improves can be identified one after another. It is therefore possible to improve the image quality of important, frequently referenced regions, to select the frame rate at decoding time, and to obtain high coding efficiency because no code indicating which bit group a given code represents is needed.

次に、以上のように構成された映像符号化装置８００の動作を説明する。図１２は、図８に示す第３の実施の形態の映像符号化装置８００の動作の一例を示すフローチャートである。なお、図１２に示すフローチャートは、図示しない記憶装置（例えばＲＯＭやフラッシュメモリなど）に格納されたプログラムを、同じく図示しないＣＰＵが実行し、プログラムによりソフトウエア的に実行することも可能である。 Next, the operation of the video encoding apparatus 800 configured as described above will be described. FIG. 12 is a flowchart showing an example of the operation of the video encoding apparatus 800 of the third embodiment shown in FIG. 8. The procedure in FIG. 12 can also be implemented in software by having a CPU (not shown) execute a program stored in a storage device (not shown) such as a ROM or flash memory.

最初に、ステップＳ１２０１において、映像信号入力部８０１が、映像符号化装置８００の外部から映像を原画像として入力し、基本レイヤ符号化部８０２に出力する。 First, in step S 1201, the video signal input unit 801 inputs a video as an original image from the outside of the video encoding device 800 and outputs it to the base layer encoding unit 802.

次に、ステップＳ１２０２において、基本レイヤ符号化部８０２が、映像信号入力部８０１から入力する原画像を符号化し基本レイヤストリームを生成して、ストリーム出力部８０４に出力する。また、基本レイヤを符号化する際に得られる中間情報である、動きベクトル、量子化パラメータ、時間的平均画像と時間的差分画像、量子化後の時間的平均画像と時間的差分画像、を拡張レイヤ符号化部８０３に出力する。 Next, in step S1202, the base layer encoding unit 802 encodes the original image input from the video signal input unit 801, generates a base layer stream, and outputs the base layer stream to the stream output unit 804. In addition, motion vectors, quantization parameters, temporal average images and temporal difference images, and quantized temporal average images and temporal difference images, which are intermediate information obtained when encoding the base layer, are expanded. The data is output to the layer encoding unit 803.

次に、ステップＳ１２０４において、差分部８３２が、基本レイヤ符号化部８０２から拡張レイヤ符号化部８０３に入力した時間的平均画像と時間的差分画像と、量子化した時間的平均画像と時間的差分画像を入力する。そして、時間的平均画像と量子化した時間的平均画像の差分をとり、また、時間的差分画像と量子化した時間的差分画像の差分をとり、空間的差分画像を生成して、組替部８３４に出力する。 Next, in step S1204, the difference unit 832 receives the temporal average image and temporal difference images, together with their quantized counterparts, that the base layer encoding unit 802 passes to the enhancement layer encoding unit 803. It takes the difference between the temporal average image and the quantized temporal average image, and the difference between each temporal difference image and its quantized counterpart, to generate spatial difference images, and outputs them to the rearrangement unit 834.

次に、ステップＳ１２０５において、参照マップ生成部８３３が、基本レイヤ符号化部８０２から拡張レイヤ符号化部８０３に入力した動きベクトルを用いて参照マップを生成し、組替パラメータ生成部８３６に出力する。 Next, in step S1205, the reference map generation unit 833 generates a reference map using the motion vectors input from the base layer encoding unit 802 to the enhancement layer encoding unit 803, and outputs the reference map to the rearrangement parameter generation unit 836.

次に、ステップＳ１２１０において、組替パラメータ生成部８３６が、基本レイヤ符号化部８０２から拡張レイヤ符号化部８０３に入力した量子化パラメータと、参照マップ生成部８３３から入力した参照マップを用いて組替パラメータを生成し、組替部８３４に出力する。 Next, in step S1210, the rearrangement parameter generation unit 836 generates a rearrangement parameter using the quantization parameter input from the base layer encoding unit 802 to the enhancement layer encoding unit 803 and the reference map input from the reference map generation unit 833, and outputs the rearrangement parameter to the rearrangement unit 834.

次に、ステップＳ１２０６において、組替部８３４が、差分部８３２から入力した空間的差分画像を、組替パラメータ生成部８３６から入力した組替パラメータを用いてデータ構造を組み換え、可変長符号化部８３５に出力する。 Next, in step S1206, the rearrangement unit 834 recombines the data structure of the spatial difference image input from the difference unit 832 using the rearrangement parameter input from the rearrangement parameter generation unit 836, and the variable length encoding unit. Output to 835.

次に、ステップＳ１２０７において、可変長符号化部８３５が、組替部８３４から入力した組替後の空間的差分画像を、可変長符号化し拡張レイヤストリームを生成し、ストリーム出力部８０４に出力する。 Next, in step S1207, the variable length coding unit 835 performs variable length coding on the spatial difference image after the rearrangement input from the rearrangement unit 834, generates an enhancement layer stream, and outputs the enhancement layer stream to the stream output unit 804. .

次に、ステップＳ１２０８において、ストリーム出力部８０４が、基本レイヤ符号化部８０２から入力した基本レイヤストリームと拡張レイヤ符号化部から入力した拡張レイヤストリームを映像ストリームとして、映像符号化装置８００の外部に出力する。 Next, in step S1208, the stream output unit 804 sets the base layer stream input from the base layer encoding unit 802 and the enhancement layer stream input from the enhancement layer encoding unit as a video stream to the outside of the video encoding device 800. Output.

最後に、ステップＳ１２０９において、映像信号入力部８０１が、映像符号化装置８００の外部から入力する映像の有無を判定し、映像の入力がなければ処理を終了する。 Finally, in step S1209, the video signal input unit 801 determines the presence / absence of a video input from the outside of the video encoding device 800. If there is no video input, the processing ends.

なお、ステップＳ１２０２が本発明の動き予測補償符号化処理ステップに相当し、ステップＳ１２０５が参照マップ生成処理ステップに相当し、ステップＳ１２１０がビット指定パラメータ生成処理ステップに相当し、ステップＳ１２０６がビット抽出処理ステップに相当し、ステップＳ１２０７が可変長符号化処理ステップに相当する。 Step S1202 corresponds to the motion prediction compensation encoding step of the present invention, step S1205 to the reference map generation step, step S1210 to the bit designation parameter generation step, step S1206 to the bit extraction step, and step S1207 to the variable length encoding step.

以上のように、本実施の形態３によれば、映像符号化装置８００は、基本レイヤにおけるＭＣＴＦによる動き予測補償符号化の動きベクトルに基づき参照マップを生成し、参照マップに基づき多く参照される重要な領域を特定することにより、拡張レイヤの符号が基本レイヤのどの領域を高画質化するかを特定するパラメータを省略して符号化することが可能で、符号化効率を向上することができる。 As described above, according to the third embodiment, the video encoding apparatus 800 generates a reference map from the motion vectors of the MCTF-based motion prediction compensation coding in the base layer and uses the reference map to identify important regions that are referenced often. This makes it possible to encode without the parameter that specifies which region of the base layer the enhancement-layer code improves in image quality, which improves coding efficiency.

（実施の形態４） (Embodiment 4)
第４の実施の形態では、第３の実施の形態を適用した映像符号化装置８００が出力した基本レイヤと拡張レイヤの映像ストリームを復号化する。拡張レイヤの符号が基本レイヤのどの領域を高画質化するかを特定するパラメータがなくとも復号化する。 In the fourth embodiment, the base layer and enhancement layer video streams output by the video encoding apparatus 800 of the third embodiment are decoded. Decoding is performed even though the stream carries no parameter specifying which region of the base layer the enhancement-layer code improves in image quality.

図１３は、本実施の形態４に関わる映像復号化方法を適用した映像復号化装置１３００の構成を表す。 FIG. 13 illustrates a configuration of a video decoding apparatus 1300 to which the video decoding method according to the fourth embodiment is applied.

図１３において、映像復号化装置１３００は、ストリーム入力部１３０１、基本レイヤ復号化部１３０２、拡張レイヤ復号化部１３０３、映像信号出力部１３０４を有する。 In FIG. 13, the video decoding apparatus 1300 includes a stream input unit 1301, a base layer decoding unit 1302, an enhancement layer decoding unit 1303, and a video signal output unit 1304.

ストリーム入力部１３０１は、映像復号化装置１３００の外部から映像ストリームを入力し、その中で基本レイヤストリームを基本レイヤ復号化部１３０２に、拡張レイヤストリームを拡張レイヤ復号化部１３０３に出力する。また、映像復号化装置１３００の外部から入力する映像ストリームの有無を判定し、映像ストリームの入力がなければ処理を終了する。 The stream input unit 1301 inputs a video stream from the outside of the video decoding device 1300, and outputs the base layer stream to the base layer decoding unit 1302 and the enhancement layer stream to the enhancement layer decoding unit 1303. Also, the presence / absence of a video stream input from the outside of the video decoding apparatus 1300 is determined. If there is no video stream input, the process ends.

基本レイヤ復号化部１３０２は、ストリーム入力部１３０１から入力した基本レイヤストリームを復号化し、動きベクトル、量子化パラメータ、量子化後の時間的平均画像と時間的差分画像、を生成して拡張レイヤ復号化部１３０３に出力する。 The base layer decoding unit 1302 decodes the base layer stream input from the stream input unit 1301, generates a motion vector, a quantization parameter, a temporal average image and a temporal difference image after quantization, and performs enhancement layer decoding To the conversion unit 1303.

基本レイヤ復号化部１３０２の復号化は、ＭＣＴＦを用いた動き予測補償復号化を行うものとするが、実施の形態２を適用するものではないものとする。 The decoding by the base layer decoding unit 1302 performs motion prediction compensation decoding using MCTF, but does not apply the second embodiment.

拡張レイヤ復号化部１３０３は、ストリーム入力部１３０１から入力した拡張レイヤストリームと、基本レイヤ復号化部１３０２から入力した動きベクトル、量子化パラメータ、量子化後の時間的平均画像と時間的差分画像、を用いて復号化画像を生成し、映像信号出力部１３０４に出力する。 The enhancement layer decoding unit 1303 includes an enhancement layer stream input from the stream input unit 1301, a motion vector input from the base layer decoding unit 1302, a quantization parameter, a temporal average image after quantization, and a temporal difference image. Is used to generate a decoded image and output it to the video signal output unit 1304.

映像信号出力部１３０４は、拡張レイヤ復号化部１３０３から入力した復号化画像を、映像復号化装置１３００の外部に出力する。 The video signal output unit 1304 outputs the decoded image input from the enhancement layer decoding unit 1303 to the outside of the video decoding device 1300.

拡張レイヤ復号化部１３０３は、可変長復号化部１３３１、参照マップ生成部１３３２、組直部１３３３、加算部１３３４、組替パラメータ生成部１３３５を有する。 The enhancement layer decoding unit 1303 includes a variable length decoding unit 1331, a reference map generation unit 1332, a reassembly unit 1333, an addition unit 1334, and a rearrangement parameter generation unit 1335.

可変長復号化部１３３１は、ストリーム入力部１３０１から入力した拡張レイヤストリームを可変長復号化して組替後の空間的差分画像を生成し、組直部１３３３に出力する。 The variable length decoding unit 1331 performs variable length decoding on the enhancement layer stream input from the stream input unit 1301 to generate the rearranged spatial difference image, and outputs it to the reassembly unit 1333.

参照マップ生成部１３３２は、基本レイヤ復号化部１３０２から拡張レイヤ復号化部１３０３に入力した動きベクトルを用いて、参照マップを生成し、組替パラメータ生成部１３３５に出力する。 The reference map generation unit 1332 generates a reference map using the motion vectors input from the base layer decoding unit 1302 to the enhancement layer decoding unit 1303, and outputs the reference map to the rearrangement parameter generation unit 1335.

組替パラメータ生成部１３３５は、基本レイヤ復号化部１３０２から拡張レイヤ復号化部１３０３に入力した量子化パラメータと、参照マップ生成部１３３２から入力した参照マップを用いて、組替パラメータを生成し、組直部１３３３に出力する。 The rearrangement parameter generation unit 1335 generates a rearrangement parameter using the quantization parameter input from the base layer decoding unit 1302 to the enhancement layer decoding unit 1303 and the reference map input from the reference map generation unit 1332, and outputs the rearrangement parameter to the reassembly unit 1333.

組替パラメータの生成は実施の形態３と同様である。 The generation of the rearrangement parameter is the same as that in the third embodiment.

組直部１３３３は、可変長復号化部１３３１から入力した組替後の空間的差分画像と、組替パラメータ生成部１３３５から入力した組替パラメータと、組替後の時間的平均画像と時間的差分画像、を用いてデータ構造を組み直し復号化後の空間的差分画像を生成し、加算部１３３４に出力する。 The reassembly unit 1333 reassembles the data structure using the rearranged spatial difference image input from the variable length decoding unit 1331, the rearrangement parameter input from the rearrangement parameter generation unit 1335, and the rearranged temporal average image and temporal difference images, generates the decoded spatial difference image, and outputs it to the addition unit 1334.

組替パラメータを用いたデータ構造を組み直しは実施の形態３と逆手順である。例えば、図６において、拡張レイヤストリーム９１１からビット群１００２にあたる情報を復号化し、拡張レイヤストリーム９１３からビット群１００３にあたる情報を復号化し、拡張レイヤストリーム９１２からビット群１００４にあたる情報を復号化する。 Reassembling the data structure using the rearrangement parameter is the reverse of the procedure in the third embodiment. For example, in FIG. 6, the information corresponding to bit group 1002 is decoded from enhancement layer stream 911, the information corresponding to bit group 1003 is decoded from enhancement layer stream 913, and the information corresponding to bit group 1004 is decoded from enhancement layer stream 912.
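
The inverse relationship between rearrangement on the encoder side and reassembly on the decoder side can be sketched as follows. This is a minimal illustration only: the rule in `planes_for` (more frequently referenced regions keep more bit planes), the 8-bit sample width, and all function and region names are assumptions for this sketch and are not taken from the embodiments.

```python
def planes_for(reference_degree, base_planes=2, max_planes=8):
    """Hypothetical rule: more-referenced regions keep more bit planes."""
    return min(base_planes + reference_degree, max_planes)

def extract_bits(value, planes, total_planes=8):
    """Encoder side: keep only the `planes` most significant bits."""
    return value >> (total_planes - planes)

def place_bits(bits, planes, total_planes=8):
    """Decoder side: put the received bits back at their original positions."""
    return bits << (total_planes - planes)

# Reference degrees per region, as both sides would derive them from the
# motion vectors (values invented for the example).
ref_map = {"region_a": 3, "region_b": 0}
residual = {"region_a": 0b10110110, "region_b": 0b10110110}

# Encoder: extract the designated bits. Decoder: place them back.
sent = {r: extract_bits(v, planes_for(ref_map[r])) for r, v in residual.items()}
restored = {r: place_bits(b, planes_for(ref_map[r])) for r, b in sent.items()}
```

In this sketch the frequently referenced `region_a` is restored with five bit planes and `region_b` with two, yet no per-region parameter is transmitted, since both sides compute `planes_for` from the same reference map.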

加算部１３３４は、基本レイヤ復号化部１３０２から拡張レイヤ復号化部１３０３に入力した量子化後の時間的平均画像と時間的差分画像と、組直部１３３３から入力した復号化後の空間的差分画像を用いて、ＭＣＴＦによる動き予測補償復号化して復号化画像を生成し、映像信号出力部１３０４に出力する。 The addition unit 1334 performs motion prediction compensation decoding by MCTF using the quantized temporal average image and temporal difference image input from the base layer decoding unit 1302 to the enhancement layer decoding unit 1303 and the decoded spatial difference image input from the reassembly unit 1333, generates a decoded image, and outputs it to the video signal output unit 1304.

なお、基本レイヤ復号化部１３０２が本発明の動き予測補償復号化手段に相当し、参照マップ生成部１３３２が参照マップ生成手段に相当し、組替パラメータ生成部１３３５がビット指定パラメータ生成手段に相当し、組直部１３３３がビット配置手段に相当し、可変長復号化部１３３１が可変長復号化手段に相当する。また、組替パラメータがビット指定パラメータに相当する。 Note that the base layer decoding unit 1302 corresponds to the motion prediction compensation decoding means of the present invention, the reference map generation unit 1332 corresponds to the reference map generation means, the rearrangement parameter generation unit 1335 corresponds to the bit designation parameter generation means, the reassembly unit 1333 corresponds to the bit arrangement means, and the variable length decoding unit 1331 corresponds to the variable length decoding means. The rearrangement parameter corresponds to the bit designation parameter.

次に、以上のように構成された映像復号化装置１３００の動作を説明する。図１４は、図１３に示す第４の実施の形態の映像復号化装置１３００の動作の一例を示すフローチャートである。なお、図１４に示すフローチャートは、図示しない記憶装置（例えばＲＯＭやフラッシュメモリなど）に格納されたプログラムを、同じく図示しないＣＰＵが実行し、プログラムによりソフトウエア的に実行することも可能である。 Next, the operation of the video decoding apparatus 1300 configured as described above will be described. FIG. 14 is a flowchart showing an example of the operation of the video decoding apparatus 1300 according to the fourth embodiment shown in FIG. 13. The flowchart shown in FIG. 14 can also be implemented in software, by having a CPU (not shown) execute a program stored in a storage device (not shown) such as a ROM or a flash memory.

最初に、ステップＳ１４０１において、ストリーム入力部１３０１が、映像復号化装置１３００の外部から映像ストリームを入力し、その中で基本レイヤストリームを基本レイヤ復号化部１３０２に、拡張レイヤストリームを拡張レイヤ復号化部１３０３に出力する。 First, in step S1401, the stream input unit 1301 receives a video stream from outside the video decoding apparatus 1300, and outputs the base layer stream contained in it to the base layer decoding unit 1302 and the enhancement layer stream to the enhancement layer decoding unit 1303.

次に、ステップＳ１４０２において、基本レイヤ復号化部１３０２は、ストリーム入力部１３０１から入力した基本レイヤストリームを復号化し、動きベクトル、量子化パラメータ、量子化後の時間的平均画像と時間的差分画像、を生成して拡張レイヤ復号化部１３０３に出力する。 Next, in step S1402, the base layer decoding unit 1302 decodes the base layer stream input from the stream input unit 1301, generates a motion vector, a quantization parameter, and a quantized temporal average image and temporal difference image, and outputs them to the enhancement layer decoding unit 1303.

次に、ステップＳ１４０３において、可変長復号化部１３３１が、ストリーム入力部１３０１から入力した拡張レイヤストリームを可変長復号化して組替後の空間的差分画像を生成し、組直部１３３３に出力する。 Next, in step S1403, the variable length decoding unit 1331 performs variable length decoding on the enhancement layer stream input from the stream input unit 1301 to generate a rearranged spatial difference image, and outputs it to the reassembly unit 1333.

次に、ステップＳ１４０４において、参照マップ生成部１３３２が、基本レイヤ復号化部１３０２から拡張レイヤ復号化部１３０３に入力した動きベクトルを用いて、参照マップを生成し、組替パラメータ生成部１３３５に出力する。 Next, in step S1404, the reference map generation unit 1332 generates a reference map using the motion vector input from the base layer decoding unit 1302 to the enhancement layer decoding unit 1303, and outputs the reference map to the rearrangement parameter generation unit 1335.

次に、ステップＳ１４０９において、組替パラメータ生成部１３３５が、基本レイヤ復号化部１３０２から拡張レイヤ復号化部１３０３に入力した量子化パラメータと、参照マップ生成部１３３２から入力した参照マップを用いて、組替パラメータを生成し、組直部１３３３に出力する。 Next, in step S1409, the rearrangement parameter generation unit 1335 generates a rearrangement parameter using the quantization parameter input from the base layer decoding unit 1302 to the enhancement layer decoding unit 1303 and the reference map input from the reference map generation unit 1332, and outputs it to the reassembly unit 1333.

次に、ステップＳ１４０５において、組直部１３３３が、可変長復号化部１３３１から入力した組替後の空間的差分画像と、組替パラメータ生成部１３３５から入力した組替パラメータと、基本レイヤ復号化部１３０２から入力した量子化パラメータ、組替後の時間的平均画像と時間的差分画像、を用いてデータ構造を組み直し復号化後の空間的差分画像を生成し、加算部１３３４に出力する。 Next, in step S1405, the reassembly unit 1333 reassembles the data structure using the rearranged spatial difference image input from the variable length decoding unit 1331, the rearrangement parameter input from the rearrangement parameter generation unit 1335, and the quantization parameter, the rearranged temporal average image, and the temporal difference image input from the base layer decoding unit 1302, generates a decoded spatial difference image, and outputs it to the addition unit 1334.

次に、ステップＳ１４０６において、加算部１３３４が、基本レイヤ復号化部１３０２から拡張レイヤ復号化部１３０３に入力した量子化後の時間的平均画像と時間的差分画像と、組直部１３３３から入力した復号化後の空間的差分画像を用いて、ＭＣＴＦによる動き予測補償復号化して復号化画像を生成し、映像信号出力部１３０４に出力する。 Next, in step S1406, the addition unit 1334 performs motion prediction compensation decoding by MCTF using the quantized temporal average image and temporal difference image input from the base layer decoding unit 1302 to the enhancement layer decoding unit 1303 and the decoded spatial difference image input from the reassembly unit 1333, generates a decoded image, and outputs it to the video signal output unit 1304.

次に、ステップＳ１４０７において、映像信号出力部１３０４が、拡張レイヤ復号化部１３０３から入力した復号化画像を、映像復号化装置１３００の外部に出力する。 Next, in step S1407, the video signal output unit 1304 outputs the decoded image input from the enhancement layer decoding unit 1303 to the outside of the video decoding device 1300.

最後に、ステップＳ１４０８において、ストリーム入力部１３０１が、映像復号化装置１３００の外部から入力する映像ストリームの有無を判定し、映像ストリームの入力がなければ処理を終了する。そうでなければ、ステップＳ１４０１に戻る。 Finally, in step S1408, the stream input unit 1301 determines whether there is a video stream input from the outside of the video decoding apparatus 1300. If there is no video stream input, the process ends. Otherwise, the process returns to step S1401.
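
The order of steps S1401 to S1408 can be summarized in a short control-flow sketch. Every stage callable below is a hypothetical placeholder standing in for a unit of FIG. 13; only the ordering and the data passed between stages follow the description, and the stub return values are invented for the example.

```python
def decode_video(stream_units, stages):
    """Run the per-unit decoding order S1401 to S1408.

    stream_units: iterable of (base_layer_stream, enhancement_layer_stream).
    stages: dict of callables standing in for the units of FIG. 13
            (all hypothetical placeholders).
    """
    outputs = []
    # S1401 splits the input; the loop ends when no stream remains (S1408).
    for base_stream, enh_stream in stream_units:
        mv, qp, temporal = stages["base_layer_decode"](base_stream)        # S1402
        rearranged = stages["variable_length_decode"](enh_stream)          # S1403
        ref_map = stages["reference_map"](mv)                              # S1404
        params = stages["rearrangement_params"](qp, ref_map)               # S1409
        spatial_diff = stages["reassemble"](rearranged, params)            # S1405
        picture = stages["add_and_inverse_mctf"](temporal, spatial_diff)   # S1406
        outputs.append(picture)                                            # S1407
    return outputs

calls = []

def stub(name, result):
    """Build a stage that records its step name and returns a fixed result."""
    def stage(*args):
        calls.append(name)
        return result
    return stage

stages = {
    "base_layer_decode": stub("S1402", ("mv", "qp", "temporal")),
    "variable_length_decode": stub("S1403", "rearranged"),
    "reference_map": stub("S1404", "ref_map"),
    "rearrangement_params": stub("S1409", "params"),
    "reassemble": stub("S1405", "spatial_diff"),
    "add_and_inverse_mctf": stub("S1406", "picture"),
}
decoded = decode_video([("base0", "enh0")], stages)
```

Note that step S1409 runs between S1404 and S1405, since the reassembly step needs the rearrangement parameter that is derived from the reference map.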

なお、ステップＳ１４０２が本発明の動き予測補償復号化処理ステップに相当し、ステップＳ１４０４が参照マップ生成処理ステップに相当し、ステップＳ１４０９がビット指定パラメータ生成処理ステップに相当し、ステップＳ１４０５がビット配置処理ステップに相当し、ステップＳ１４０３が可変長復号化処理ステップに相当する。 Note that step S1402 corresponds to the motion prediction / compensation decoding processing step of the present invention, step S1404 corresponds to the reference map generation processing step, step S1409 corresponds to the bit designation parameter generation processing step, and step S1405 corresponds to the bit arrangement processing. Step S1403 corresponds to a variable length decoding process step.

以上のように、本実施の形態４によれば、映像復号化装置１３００は、実施の形態３の映像符号化装置８００の出力する映像ストリームを復号化できる。基本レイヤにおけるＭＣＴＦによる動き予測補償符号化の動きベクトルに基づき参照マップを生成し、参照マップに基づき多く参照される重要な領域を特定することにより、拡張レイヤの符号が基本レイヤのどの領域を高画質化するかを特定するパラメータの符号を省略することが可能で、符号化効率が向上する。 As described above, according to the fourth embodiment, the video decoding apparatus 1300 can decode the video stream output by the video encoding apparatus 800 of the third embodiment. A reference map is generated from the motion vectors of the MCTF motion prediction compensation coding in the base layer, and important regions that are referred to frequently are identified from the reference map. This makes it possible to omit the code for the parameter that specifies which regions of the base layer the enhancement layer code improves in image quality, so coding efficiency improves.

本発明は、ＭＣＴＦによる動き予測補償を用いた映像符号化装置および映像復号化装置の符号化効率を向上することができ、有用である。また、本発明は、特に階層符号化方式に適用することが可能で、有用である。 INDUSTRIAL APPLICABILITY The present invention is useful because it can improve the encoding efficiency of a video encoding device and a video decoding device using motion prediction compensation by MCTF. In addition, the present invention can be applied to a hierarchical encoding method and is useful.

よって、映像ストリームを符号化した後、符号量やフレームレートを調整して復号化する用途に適している。すなわち、通信速度の変動するネットワークを介して映像ストリームの量を動的に変化させながら映像を送受信するシステムの映像符号化方式に適している。また、１つの映像コンテンツを通信速度や端末の処理能力の異なる複数のユーザに、それぞれ異なるフレームレートや通信速度で配信するシステムの映像符号化方式に適している。また、映像ストリームを蓄積した後に、復号化して再度符号化することなく、蓄積容量を変更する映像蓄積装置の映像符号化方式に適している。 The present invention is therefore suited to applications in which the code amount or the frame rate is adjusted at decoding time after the video stream has been encoded. That is, it is suitable as the video encoding scheme of a system that transmits and receives video over a network whose communication speed varies, while dynamically changing the amount of the video stream. It is also suitable as the video encoding scheme of a system that distributes a single piece of video content to a plurality of users with different communication speeds and terminal processing capabilities, at a different frame rate and communication speed for each user. It is further suitable as the video encoding scheme of a video storage apparatus that changes the storage size of a stored video stream without decoding and re-encoding it.

本発明の実施の形態１による映像符号化装置の構成を示す図 Diagram showing the configuration of the video encoding apparatus according to Embodiment 1 of the present invention.
ＭＣＴＦにおける画像間の参照の関係を示す図 Diagram showing the reference relationships between images in MCTF.
ＭＣＴＦにおける符号化すべき画像間の参照の関係を示す図 Diagram showing the reference relationships between the images to be encoded in MCTF.
参照マップを示す図 Diagram showing a reference map.
本発明の実施の形態１による映像符号化装置の動作を示すフローチャート Flowchart showing the operation of the video encoding apparatus according to Embodiment 1 of the present invention.
本発明の実施の形態２による映像復号化装置の構成を示す図 Diagram showing the configuration of the video decoding apparatus according to Embodiment 2 of the present invention.
本発明の実施の形態２による映像復号化装置の動作を示すフローチャート Flowchart showing the operation of the video decoding apparatus according to Embodiment 2 of the present invention.
本発明の実施の形態３による映像符号化装置の構成を示す図 Diagram showing the configuration of the video encoding apparatus according to Embodiment 3 of the present invention.
本発明の実施の形態３による映像ストリームの構造を示す図 Diagram showing the structure of the video stream according to Embodiment 3 of the present invention.
ビット構造を示す図 Diagram showing the bit structure.
本発明の実施の形態３による映像ストリームの構造とビット構造の関係を示す図 Diagram showing the relationship between the structure of the video stream and the bit structure according to Embodiment 3 of the present invention.
本発明の実施の形態３による映像符号化装置の動作を示すフローチャート Flowchart showing the operation of the video encoding apparatus according to Embodiment 3 of the present invention.
本発明の実施の形態４による映像復号化装置の構成を示す図 Diagram showing the configuration of the video decoding apparatus according to Embodiment 4 of the present invention.
本発明の実施の形態４による映像復号化装置の動作を示すフローチャート Flowchart showing the operation of the video decoding apparatus according to Embodiment 4 of the present invention.
非特許文献１による映像符号化装置の構成を示す図 Diagram showing the configuration of the video encoding apparatus according to Non-Patent Document 1.
非特許文献２によるＭＣＴＦの概念図 Conceptual diagram of MCTF according to Non-Patent Document 2.
非特許文献２による段階的なＭＣＴＦの概念図 Conceptual diagram of staged MCTF according to Non-Patent Document 2.

符号の説明 Explanation of symbols

１０１映像信号入力部 101 Video signal input unit
１０２ＭＣＴＦ符号化部 102 MCTF encoding unit
１０３参照マップ生成部 103 Reference map generation unit
１０４量子化パラメータ生成部 104 Quantization parameter generation unit
１０５量子化部 105 Quantization unit
１０６可変長符号化部 106 Variable length encoding unit
１０７ストリーム出力部 107 Stream output unit
６０１ストリーム入力部 601 Stream input unit
６０２可変長復号化部 602 Variable length decoding unit
６０３参照マップ生成部 603 Reference map generation unit
６０４量子化パラメータ生成部 604 Quantization parameter generation unit
６０５量子化部 605 Quantization unit
６０６ＭＣＴＦ復号化部 606 MCTF decoding unit
６０７映像信号出力部 607 Video signal output unit
８０１映像信号入力部 801 Video signal input unit
８０２基本レイヤ符号化部 802 Base layer encoding unit
８０３拡張レイヤ符号化部 803 Enhancement layer encoding unit
８０４ストリーム出力部 804 Stream output unit
８３１中間情報取得部 831 Intermediate information acquisition unit
８３２差分部 832 Difference unit
８３３参照マップ生成部 833 Reference map generation unit
８３６組替パラメータ生成部 836 Rearrangement parameter generation unit
８３４組替部 834 Rearrangement unit
８３５可変長符号化部 835 Variable length encoding unit
１３０１ストリーム入力部 1301 Stream input unit
１３０２基本レイヤ復号化部 1302 Base layer decoding unit
１３０３拡張レイヤ復号化部 1303 Enhancement layer decoding unit
１３０４映像信号出力部 1304 Video signal output unit
１３３１可変長復号化部 1331 Variable length decoding unit
１３３２参照マップ生成部 1332 Reference map generation unit
１３３５組替パラメータ生成部 1335 Rearrangement parameter generation unit
１３３３組直部 1333 Reassembly unit
１３３４加算部 1334 Addition unit
１５０１映像信号入力部 1501 Video signal input unit
１５０２動き予測補償符号化部 1502 Motion prediction compensation encoding unit
１５０３ＤＣＴ部 1503 DCT unit
１５０４量子化部 1504 Quantization unit
１５０５可変長符号化部 1505 Variable length encoding unit
１５０６ストリーム出力部 1506 Stream output unit
１５０７逆ＤＣＴ部 1507 Inverse DCT unit
１５０８逆量子化部 1508 Inverse quantization unit
１５０９デブロック部 1509 Deblocking unit

Claims

ＭＣＴＦ（Motion Compensated Temporal Filtering）により動きベクトルを生成する動き予測補償符号化手段と、
前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成手段と、
前記参照マップを用いて領域の画素を表すビットの集まりのうちどのビットを符号化して映像ストリームに格納するのかを指定するビット指定パラメータを生成するビット指定パラメータ生成手段と、
領域の画素を表すビットから前記ビット指定パラメータが指定するビットを抽出するビット抽出手段と、
前記ビット抽出手段が抽出したビットを可変長符号化する可変長符号化手段と、
を有する映像符号化装置。
A video encoding apparatus comprising:
motion prediction compensation encoding means for generating a motion vector by MCTF (Motion Compensated Temporal Filtering);
reference map generation means for generating, using the motion vector, a reference map representing reference relationships between regions of a plurality of images;
bit designation parameter generation means for generating, using the reference map, a bit designation parameter that designates which bits, among the set of bits representing the pixels of a region, are to be encoded and stored in a video stream;
bit extraction means for extracting the bits designated by the bit designation parameter from the bits representing the pixels of the region; and
variable length encoding means for variable-length encoding the bits extracted by the bit extraction means.

前記参照マップ生成手段が、画像の領域ごとに前記動きベクトルが参照する回数が多いほど値が大きくなる参照度合を算出し、
前記ビット指定パラメータ生成手段が、前記参照マップの参照度合に応じて前記ビット指定パラメータを生成する、請求項１記載の映像符号化装置。
The video encoding apparatus according to claim 1, wherein the reference map generation means calculates, for each region of an image, a reference degree whose value increases as the number of times the region is referred to by the motion vectors increases, and
the bit designation parameter generation means generates the bit designation parameter in accordance with the reference degree in the reference map.

前記参照マップ生成手段が、ある領域がもつ前記参照度合が大きいほど、その領域が参照する領域の前記参照度合を大きくする、請求項２記載の映像符号化装置。 The video encoding apparatus according to claim 2, wherein the larger the reference degree of a given region, the larger the reference map generation means makes the reference degree of the region that the given region refers to.

前記参照マップ生成手段が、前記動きベクトルが参照する領域の画素をあらかじめ定めた数以上含む領域に対して前記参照度合を大きくする、請求項２記載の映像符号化装置。 The video encoding apparatus according to claim 2, wherein the reference map generation means increases the reference degree of a region that contains at least a predetermined number of pixels of the area referred to by the motion vector.

前記参照マップ生成手段が、段階的に行うＭＣＴＦによる動き予測補償のうち遅い段階で生成した前記動きベクトルが参照する領域ほど参照度合を大きくする、請求項２記載の映像符号化装置。 The video encoding apparatus according to claim 2, wherein the reference map generation means makes the reference degree larger for regions referred to by motion vectors generated at later stages of the MCTF motion prediction compensation, which is performed in stages.

前記ビット抽出手段が、前記ビット指定パラメータに基づき量子化を行う、請求項１記載の映像符号化装置。 The video encoding apparatus according to claim 1, wherein the bit extraction unit performs quantization based on the bit designation parameter.

前記ビット抽出手段が、前記ビット指定パラメータに基づき複数のビット列の同じ位のビット毎に符号化を行う、請求項１記載の映像符号化装置。 The video encoding apparatus according to claim 1, wherein the bit extraction means performs encoding, based on the bit designation parameter, for each set of bits of the same significance across a plurality of bit strings.

映像を単独で復号化可能な基本レイヤと基本レイヤの画質を向上する拡張レイヤに階層化して符号化するものであり、前記ビット指定パラメータ生成手段が、拡張レイヤに対する前記ビット指定パラメータを生成する、請求項１記載の映像符号化装置。 The video encoding apparatus according to claim 1, which encodes video hierarchically into a base layer that can be decoded on its own and an enhancement layer that improves the image quality of the base layer, wherein the bit designation parameter generation means generates the bit designation parameter for the enhancement layer.

映像ストリームを可変長復号化する可変長復号化手段と、
動きベクトルを用いてＭＣＴＦ（Motion Compensated Temporal Filtering）を行う動き予測補償復号化手段と、
前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成手段と、
前記参照マップを用いて映像ストリームを復号化して得たビットが領域の画素を表すビットのどれに当たるかを指定するビット指定パラメータを生成するビット指定パラメータ生成手段と、
領域の画素を表すビットに前記ビット指定パラメータが指定するビットを配置するビット配置手段と、
を有する映像復号化装置。
A video decoding apparatus comprising:
variable length decoding means for variable-length decoding a video stream;
motion prediction compensation decoding means for performing MCTF (Motion Compensated Temporal Filtering) using a motion vector;
reference map generation means for generating, using the motion vector, a reference map representing reference relationships between regions of a plurality of images;
bit designation parameter generation means for generating, using the reference map, a bit designation parameter that designates which of the bits representing the pixels of a region the bits obtained by decoding the video stream correspond to; and
bit arrangement means for arranging the bits designated by the bit designation parameter among the bits representing the pixels of the region.

前記参照マップ生成手段が、画像の領域ごとに前記動きベクトルが参照する回数が多いほど値が大きくなる参照度合を算出し、前記ビット指定パラメータ生成手段が、前記参照マップの参照度合に応じて前記ビット指定パラメータを生成する、請求項９記載の映像復号化装置。 The video decoding apparatus according to claim 9, wherein the reference map generation means calculates, for each region of an image, a reference degree whose value increases as the number of times the region is referred to by the motion vectors increases, and the bit designation parameter generation means generates the bit designation parameter in accordance with the reference degree in the reference map.

前記参照マップ生成手段が、ある領域がもつ前記参照度合が大きいほど、その領域が参照する領域の前記参照度合を大きくする、請求項１０記載の映像復号化装置。 The video decoding apparatus according to claim 10, wherein the larger the reference degree of a given region, the larger the reference map generation means makes the reference degree of the region that the given region refers to.

前記参照マップ生成手段が、前記動きベクトルが参照する領域の画素をあらかじめ定めた数以上含む領域に対して前記参照度合を大きくする、請求項１０記載の映像復号化装置。 The video decoding device according to claim 10, wherein the reference map generation unit increases the reference degree for a region including a predetermined number or more of pixels in a region referred to by the motion vector.

前記参照マップ生成手段が、段階的に行うＭＣＴＦによる動き予測補償のうち遅い段階で生成した前記動きベクトルが参照する領域ほど参照度合を大きくする、請求項１０記載の映像復号化装置。 The video decoding apparatus according to claim 10, wherein the reference map generation means makes the reference degree larger for regions referred to by motion vectors generated at later stages of the MCTF motion prediction compensation, which is performed in stages.

前記ビット配置手段が、前記ビット指定パラメータに基づき逆量子化を行う、請求項９記載の映像復号化装置。 The video decoding device according to claim 9, wherein the bit arrangement unit performs inverse quantization based on the bit designation parameter.

前記ビット配置手段が、前記ビット指定パラメータに基づき複数のビット列の同じ位のビット毎に復号化を行う、請求項９記載の映像復号化装置。 The video decoding apparatus according to claim 9, wherein the bit arrangement means performs decoding, based on the bit designation parameter, for each set of bits of the same significance across a plurality of bit strings.

映像を単独で復号化可能な基本レイヤと基本レイヤの画質を向上する拡張レイヤに階層化した映像ストリームを復号化するものであり、前記ビット指定パラメータ生成手段が、拡張レイヤに対する前記ビット指定パラメータを生成する、請求項９記載の映像復号化装置。 The video decoding apparatus according to claim 9, which decodes a video stream layered into a base layer that can be decoded on its own and an enhancement layer that improves the image quality of the base layer, wherein the bit designation parameter generation means generates the bit designation parameter for the enhancement layer.

ＭＣＴＦにより動きベクトルを生成する動き予測補償符号化処理ステップと、前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、前記参照マップを用いて領域の画素を表すビットの集まりのうちどのビットを符号化して映像ストリームに格納するのかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、領域の画素を表すビットから前記ビット指定パラメータが指定するビットを抽出するビット抽出処理ステップと、前記ビット抽出処理ステップで抽出したビットを可変長符号化する可変長符号化処理ステップと、を有する映像符号化方法。 A video encoding method comprising: a motion prediction compensation encoding step of generating a motion vector by MCTF; a reference map generation step of generating, using the motion vector, a reference map representing reference relationships between regions of a plurality of images; a bit designation parameter generation step of generating, using the reference map, a bit designation parameter that designates which bits, among the set of bits representing the pixels of a region, are to be encoded and stored in a video stream; a bit extraction step of extracting the bits designated by the bit designation parameter from the bits representing the pixels of the region; and a variable length encoding step of variable-length encoding the bits extracted in the bit extraction step.

ＭＣＴＦにより動きベクトルを生成する動き予測補償符号化処理ステップと、
前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、
前記参照マップを用いて領域の画素を表すビットの集まりのうちどのビットを符号化して映像ストリームに格納するのかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、
領域の画素を表すビットから前記ビット指定パラメータが指定するビットを抽出するビット抽出処理ステップと、
前記ビット抽出処理ステップで抽出したビットを可変長符号化する可変長符号化処理ステップと、
を有する映像符号化プログラム。
A video encoding program comprising:
a motion prediction compensation encoding step of generating a motion vector by MCTF;
a reference map generation step of generating, using the motion vector, a reference map representing reference relationships between regions of a plurality of images;
a bit designation parameter generation step of generating, using the reference map, a bit designation parameter that designates which bits, among the set of bits representing the pixels of a region, are to be encoded and stored in a video stream;
a bit extraction step of extracting the bits designated by the bit designation parameter from the bits representing the pixels of the region; and
a variable length encoding step of variable-length encoding the bits extracted in the bit extraction step.

映像ストリームを可変長復号化する可変長復号化処理ステップと、
動きベクトルを用いてＭＣＴＦを行う動き予測補償復号化処理ステップと、
前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、
前記参照マップを用いて映像ストリームを復号化して得たビットが領域の画素を表すビットのどれに当たるかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、
領域の画素を表すビットに前記ビット指定パラメータが指定するビットを配置するビット配置処理ステップと、
を有する映像復号化方法。
A video decoding method comprising:
a variable length decoding step of variable-length decoding a video stream;
a motion prediction compensation decoding step of performing MCTF using a motion vector;
a reference map generation step of generating, using the motion vector, a reference map representing reference relationships between regions of a plurality of images;
a bit designation parameter generation step of generating, using the reference map, a bit designation parameter that designates which of the bits representing the pixels of a region the bits obtained by decoding the video stream correspond to; and
a bit arrangement step of arranging the bits designated by the bit designation parameter among the bits representing the pixels of the region.

映像ストリームを可変長復号化する可変長復号化処理ステップと、
動きベクトルを用いてＭＣＴＦを行う動き予測補償復号化処理ステップと、
前記動きベクトルを用いて複数画像の領域間の参照関係を表す参照マップを生成する参照マップ生成処理ステップと、
前記参照マップを用いて映像ストリームを復号化して得たビットが領域の画素を表すビットのどれに当たるかを指定するビット指定パラメータを生成するビット指定パラメータ生成処理ステップと、
領域の画素を表すビットに前記ビット指定パラメータが指定するビットを配置するビット配置処理ステップと、
を有する映像復号化プログラム。
A video decoding program comprising:
a variable length decoding step of variable-length decoding a video stream;
a motion prediction compensation decoding step of performing MCTF using a motion vector;
a reference map generation step of generating, using the motion vector, a reference map representing reference relationships between regions of a plurality of images;
a bit designation parameter generation step of generating, using the reference map, a bit designation parameter that designates which of the bits representing the pixels of a region the bits obtained by decoding the video stream correspond to; and
a bit arrangement step of arranging the bits designated by the bit designation parameter among the bits representing the pixels of the region.