JP2010081240A - Encoding device and encoding method

Info

Publication number: JP2010081240A
Application number: JP2008246592A
Authority: JP
Inventors: Daisuke Sakamoto; 大輔坂本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-09-25
Filing date: 2008-09-25
Publication date: 2010-04-08
Anticipated expiration: 2028-09-25
Also published as: JP5063548B2

Abstract

PROBLEM TO BE SOLVED: To provide an encoding device and an encoding method that set slice division positions appropriately.

SOLUTION: When moving image data is encoded, face detection is performed on the image frame to be encoded, a face image included in the image frame is detected, and a face region containing the face image is obtained. Face parts such as the eyes, nose, and mouth are then detected from the detected face image. When a single detected face region occupies a certain proportion or more of the image frame, so that no meaningful slice division is possible without straddling the face region, the slice division position is determined so as not to straddle the face parts detected within the face region.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an encoding device and an encoding method for encoding moving images, and more particularly to an encoding device and an encoding method that divide a picture into a plurality of regions for encoding.

In recent years, the resolution of moving image data has increased. In contrast to the conventionally used 720 × 480 pixel video, video of 1920 × 1080 pixels, called full high-definition video, is increasingly used, for example in digital terrestrial broadcasting. Because such high-resolution moving image data requires an enormous amount of data to be transmitted per unit time, compression coding techniques more efficient than the conventional ones are required.

In response to these demands, ITU-T SG16 and ISO/IEC JTC1/SC29/WG11 have been standardizing compression coding schemes that use inter-frame prediction, which exploits the correlation between images. Among these, H.264/MPEG-4 Part 10 (AVC) (hereinafter referred to as H.264) is said to achieve the most efficient coding at present. The H.264 encoding and decoding specifications are described in, for example, Patent Document 1.

In the MPEG-2 scheme conventionally used for compression coding of moving image data, one frame or one field is divided into regions of a predetermined number of pixels called macroblocks, and predictive coding using motion compensation, orthogonal transform processing, and quantization processing are applied in units of macroblocks. The difference between the quantization parameters used in the quantization processing is taken sequentially from one macroblock to the next, and this difference is encoded. Compared with this conventional scheme, the H.264 | AVC scheme suppresses the accumulation of errors by performing the orthogonal transform with a Hadamard transform and an integer-precision DCT. It also performs intra-frame predictive coding and inter-frame predictive coding using motion compensation, which achieves more accurate predictive coding.

In H.264, when a macroblock is encoded, the processing results of the four macroblocks located to the left, upper left, directly above, and upper right of the target macroblock can be referenced. This enables more appropriate prediction.
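The neighbor relationship described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function name `reference_neighbors` and the parameter `mbs_wide` (picture width in macroblocks) are assumptions, and the sketch only checks picture bounds, not slice membership.

```python
def reference_neighbors(mb_x, mb_y, mbs_wide):
    """Return the left, upper-left, upper, and upper-right neighbors
    of macroblock (mb_x, mb_y) that lie inside the picture."""
    candidates = {
        "left": (mb_x - 1, mb_y),
        "upper_left": (mb_x - 1, mb_y - 1),
        "upper": (mb_x, mb_y - 1),
        "upper_right": (mb_x + 1, mb_y - 1),
    }
    # Keep only candidates whose coordinates fall inside the picture.
    return {
        name: (x, y)
        for name, (x, y) in candidates.items()
        if 0 <= x < mbs_wide and y >= 0
    }
```

In a real encoder a neighbor would additionally be unavailable when it belongs to a different slice, which is the constraint the later paragraphs on slice boundaries rely on.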

H.264 defines the slice as a unit that can be decoded independently. A slice consists of one or more macroblocks in a picture. The slice header includes information such as the spatial address of the first macroblock in the slice and the initial quantization parameter. Initializing the quantization parameter for each slice makes it possible to decode a slice on its own and prevents an error that occurs in one slice from propagating to other slices, which improves error resilience.

JP 2005-167720 A

In H.264, the slice division method can be decided flexibly by means of techniques called slice groups and arbitrary slice order. However, unless slice division is performed appropriately for the image in the picture, the benefits of slice division cannot be fully exploited.

For example, consider the case illustrated in FIG. 8, in which slice division straddles the main subject (in this example, a person's face). As described above, a slice must be independently decodable, so intra prediction and inter prediction using macroblocks outside the slice cannot be performed. As a result, the image quality of the decoded image becomes discontinuous at the slice boundary, and the image quality of the main subject may be impaired.

Furthermore, in FIG. 8, if an error occurs in macroblock 301, which lies in slice #0 at a position unrelated to the main subject, the entire region 302 from macroblock 301 to the end of slice #0 becomes erroneous. In the next slice #1, the quantization parameter and other state are initialized at the head of the slice, so normal decoding can resume. However, because recovery is not possible partway through a slice, the image of the main subject contained in slice #0 is damaged.
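The error behavior described above can be illustrated with a short sketch. It assumes macroblocks are numbered in raster-scan order and that each slice is a contiguous run of macroblocks; the function name and parameters are hypothetical and do not appear in the patent.

```python
def lost_macroblocks(slice_starts, total_mbs, error_mb):
    """Return the macroblock indices lost when decoding fails at error_mb.

    slice_starts: sorted list of the raster-scan index of the first
    macroblock of each slice (the first entry is expected to be 0).
    Decoding resumes only at the head of the next slice.
    """
    slice_ends = slice_starts[1:] + [total_mbs]
    for start, end in zip(slice_starts, slice_ends):
        if start <= error_mb < end:
            # Everything from the error to the end of this slice is lost.
            return list(range(error_mb, end))
    raise ValueError("error_mb is outside the picture")
```

For a picture of 35 macroblocks split into slices starting at indices 0, 14, and 28, an error at index 5 loses macroblocks 5 through 13, while slices starting at 14 and 28 decode normally.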

Accordingly, an object of the present invention is to provide an encoding device and an encoding method that can set slice division positions appropriately.

To solve the above problem, the present invention is an encoding device that encodes image data in units of slices, each of which can be decoded independently, the device comprising: encoding means for encoding the image data and outputting it as an encoded stream; face detection means for performing face detection on the image data to detect a face region and for further detecting face parts included in the detected face region; slice division position determining means for determining the slice division position based on the boundary between an area that includes the face region and an area that does not when the proportion of the picture occupied by the face region detected by the face detection means is determined to be smaller than a threshold, and for determining the slice division position based on the boundary between an area that includes the face parts detected in the face region by the face detection means and an area that does not when the proportion is determined to be equal to or greater than the threshold; and encoding control means for controlling the encoding by the encoding means so that the image data is divided into slices at the slice division position determined by the slice division position determining means.

The present invention is also an encoding method for encoding image data in units of slices, each of which can be decoded independently, the method comprising: an encoding step of encoding the image data and outputting it as an encoded stream; a face detection step of performing face detection on the image data to detect a face region and of further detecting face parts included in the detected face region; a slice division position determining step of determining the slice division position based on the boundary between an area that includes the face region and an area that does not when the proportion of the picture occupied by the face region detected in the face detection step is determined to be smaller than a threshold, and of determining the slice division position based on the boundary between an area that includes the face parts detected in the face region in the face detection step and an area that does not when the proportion is determined to be equal to or greater than the threshold; and an encoding control step of controlling the encoding in the encoding step so that the image data is divided into slices at the slice division position determined in the slice division position determining step.

Because the present invention has the configuration described above, the slice division position can be set appropriately.

Embodiments of the present invention are described below with reference to the drawings. In the present invention, when moving image data is encoded, face detection is performed on the image frame to be encoded, a face image included in the frame is detected, and a face region containing the face image is obtained. Face parts such as the eyes, nose, and mouth are then detected from the detected face image. When a single detected face region occupies a certain proportion or more of the image frame, so that no meaningful slice division is possible without straddling the face region, the slice division position is determined so as not to straddle the face parts detected in that face region.

Here, a slice is a unit of an image that can be decoded independently, and consists of one or more macroblocks in a picture.

In general, when shooting with a digital video camera or the like, a person, and in particular the face, is often the main subject. Within the face, the face parts that form the person's expression, such as the eyes, nose, and mouth, are considered especially important. When the present invention is applied to a digital video camera or the like, the slice division position can be determined appropriately so as not to straddle the portions considered important in such shooting.

<Embodiment>
FIG. 1 shows an example configuration of an encoding device 100 applicable to the embodiment of the present invention. The encoding device 100 performs motion detection on the supplied baseband moving image data in units of blocks obtained by dividing one picture into a predetermined size, and performs inter-frame predictive coding using motion compensation. Encoding combines an orthogonal transform using a Hadamard transform and an integer-precision DCT, quantization of the transform coefficients, intra-frame predictive coding, and inter-frame predictive coding using motion compensation, and then applies entropy coding.

Hereinafter, the orthogonal transform using the Hadamard transform and the integer-precision DCT is referred to as the integer transform, and intra-frame predictive coding and inter-frame predictive coding are referred to as intra coding and inter coding, respectively.

The encoding control unit 15 includes, for example, a CPU, a ROM, and a RAM. The CPU controls the entire encoding device 100 according to a program stored in advance in the ROM, using the RAM as work memory.

Baseband moving image data is input to the encoding device 100 frame by frame in display order and is temporarily stored in the frame memory 10. The image frames stored in the frame memory 10 are rearranged into encoding order and, for encoding, are divided into macroblocks of a predetermined size (for example, 16 × 16 pixels) and read out. The macroblocks are read out by scanning horizontally, for example from the left edge to the right edge of the picture, and repeating this scan in the vertical direction. Coordinate information within the image frame is defined for each macroblock, for example according to the scan order.
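The macroblock division and scan order described above can be sketched as follows. This is an illustrative sketch only; the function names are assumptions, and the rounding up of partial macroblocks at the picture edge is a common convention rather than something stated in this document.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows) of macroblocks covering a picture.

    Partial macroblocks at the right or bottom edge are counted as
    whole macroblocks (ceiling division).
    """
    mbs_wide = -(-width // mb_size)
    mbs_high = -(-height // mb_size)
    return mbs_wide, mbs_high


def raster_scan(mbs_wide, mbs_high):
    """Yield block coordinates (x, y) left to right, then top to bottom."""
    for y in range(mbs_high):
        for x in range(mbs_wide):
            yield (x, y)
```

With 16 × 16 macroblocks, a 720 × 480 picture gives a 45 × 30 grid, and a 1920 × 1080 picture gives a 120 × 68 grid because 1080 is not a multiple of 16.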

In addition, the image frame of the input moving image data that corresponds to the image data read out in units of macroblocks is read from the frame memory 10 and supplied to the face detection unit 30. Hereinafter, the image frame corresponding to the image data read from the frame memory 10 in units of macroblocks for encoding is referred to as the encoding target frame.

The face detection unit 30 detects a face region containing a human face in the encoding target frame supplied from the frame memory 10. Face region information indicating the face region detected by the face detection unit 30 is supplied to the face part detection unit 31.

The face part detection unit 31 holds the face region information supplied from the face detection unit 30 and, based on that information, detects the parts of the face (hereinafter referred to as face parts). Here, a face part is a portion considered to represent the features of the face prominently. Examples of such portions are the left eye, right eye, nose, and mouth. The face part detection unit 31 detects, for example, each of the left eye, right eye, nose, and mouth.

Face part information indicating each face part detected by the face part detection unit 31 is supplied to the slice division unit 32. The slice division unit 32 determines the slice division position for the encoding target frame based on the face region information supplied from the face detection unit 30 and the face part information detected by the face part detection unit 31. The slice division position is expressed in macroblock coordinates and is supplied to both the quantization control unit 14 and the encoding control unit 15. The slice division position determination processing performed by the slice division unit 32 is described in detail later.

Various methods are conceivable for detecting the face region in the face detection unit 30; for example, the method described in Japanese Patent Laid-Open No. 2001-309225 can be used. In this method, the image data is first searched for a central portion that is likely to contain skin, judged by color and shape, and for a peripheral region that is likely to contain hair, also judged by color and shape. Based on the result, a first face candidate detection algorithm uses a pattern recognition operator to search for regions that are likely to contain a face. A second algorithm then confirms by pattern matching that a face is present in the face candidate regions found by the first algorithm. The face is detected by using the two algorithms together.

The following method is conceivable for detecting each face part in the face region in the face part detection unit 31. First, the image is binarized, with the skin-color area of the face set to "0" and everything else set to "1". The center of gravity of the face is then detected from the skin-color area, and the positions of the holes located diagonally above the center of gravity are determined to be the eye regions. If a hole cannot be detected, that eye is judged to be closed. Based on the general structure of the human body, a predetermined position below the center of gravity of the face region, on the perpendicular bisector of the line between the right eye and the left eye, is taken as the mouth region. The nose position is then obtained from the positional relationship among the right eye, the left eye, and the mouth.

Meanwhile, the image data read from the frame memory 10 in units of macroblocks is input to the minuend input of the subtractor 11 and is also supplied to the motion detection unit 23. The motion detection unit 23 detects a motion vector in the image data supplied from the frame memory 10, using as a reference frame a restored image frame read from the frame memory 21, which is described later. The detected motion vector information is output to the inter prediction unit 22 and the entropy encoding unit 16.

The subtractor 11 subtracts the predicted image data output from the switch 26, described later, from the image data input to its minuend input, and generates image residual data. The orthogonal transform unit 12 converts the image residual data into DCT coefficients by orthogonal transform processing such as the Hadamard transform and the integer-precision DCT.

The quantization unit 13 quantizes these DCT coefficients using a predetermined quantization parameter. The quantization parameter has a predetermined relationship with the quantization step used to quantize the DCT coefficients; for example, it is defined so that the quantization parameter is proportional to the logarithm of the quantization step. The quantization step and the quantization parameter can be changed for each macroblock. For example, the quantization parameter is controlled based on the amount of code generated by the entropy encoding unit 16 so that the amount of code for each macroblock stays within a certain range. The quantized values output from the quantization unit 13 are supplied to the entropy encoding unit 16.
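As an illustration of the logarithmic relationship mentioned above, the following sketch uses the quantization step values commonly quoted for H.264, in which the step doubles each time the quantization parameter increases by 6. The table values and the function name `qstep` are not taken from this document and should be read as an assumption for illustration.

```python
# Commonly quoted H.264 quantization step sizes for QP 0..5 (assumption,
# not stated in this document); the pattern repeats, doubling every 6 QP.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)
```

Under this assumption `qstep(4)` is 1.0 and `qstep(10)` is 2.0, so an increase of 6 in the quantization parameter corresponds to one doubling of the step size.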

The quantization unit 13 also calculates the difference between the quantization parameter used to quantize a given macroblock and the quantization parameter used to quantize the macroblock quantized immediately before it. The calculated difference is attached to the quantized values and output from the quantization unit 13. The quantization parameter is initialized at the first macroblock of each slice under the control of the quantization control unit 14, based on the information indicating the slice division position output from the slice division unit 32.
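The differential coding with per-slice initialization described above might look like the following sketch. The function name, the use of raster-scan indices for slice starts, and the single `slice_init_qp` shared by all slices are simplifying assumptions for illustration.

```python
def qp_deltas(qps, slice_starts, slice_init_qp):
    """Return the per-macroblock quantization parameter differences.

    qps: quantization parameter of each macroblock in coding order.
    slice_starts: indices of the first macroblock of each slice
    (must include 0).
    slice_init_qp: value the predictor is reset to at each slice head
    (hypothetical: one shared value for every slice).
    """
    starts = set(slice_starts)
    if qps and 0 not in starts:
        raise ValueError("the first macroblock must start a slice")
    deltas = []
    previous = slice_init_qp
    for index, qp in enumerate(qps):
        if index in starts:
            # Reset the predictor so the slice decodes on its own.
            previous = slice_init_qp
        deltas.append(qp - previous)
        previous = qp
    return deltas
```

Because the predictor is reset at each slice head, a corrupted difference value in one slice cannot shift the reconstructed quantization parameters of the next slice.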

The quantized values output from the quantization unit 13 are also supplied to the inverse quantization unit 17. They are inversely quantized by the inverse quantization unit 17 and inversely transformed by the inverse orthogonal transform unit 18 to produce locally decoded image data. The adder 19 adds the predicted image data output from the switch 26 to the locally decoded image data to form restored image data. The restored image data is stored in the frame memory 24 and is also stored in the frame memory 21 after the deblocking filter 20 reduces its coding distortion.

The intra prediction unit 25 performs intra-frame prediction processing using the restored image data stored in the frame memory 24 and generates intra-predicted image data. The intra-predicted image data output from the intra prediction unit 25 is supplied to the input terminal 26A of the switch 26.

The motion detection unit 23 performs motion detection on the image data supplied from the frame memory 10 in units of macroblocks, using the restored image frame stored in the frame memory 21 as a reference frame. The inter prediction unit 22 performs inter-frame prediction processing based on the restored image data stored in the frame memory 21 and the motion vector detected by the motion detection unit 23, and generates inter-predicted image data. The inter-predicted image data is supplied to the input terminal 26B of the switch 26.

The switch 26 selects whether intra prediction or inter prediction is used. It selects either the intra-predicted image data output from the intra prediction unit 25 or the inter-predicted image data output from the inter prediction unit 22, and supplies the selected predicted image data to the subtrahend input of the subtractor 11 and to the adder 19.

The entropy encoding unit 16 entropy-encodes the quantization parameter supplied from the quantization unit 13 and the motion vector information output from the motion detection unit 23. The entropy encoding unit 16 also entropy-encodes, for each macroblock, information indicating whether intra coding or inter coding was performed (the macroblock type) and information indicating the reference frame used for inter prediction.

The encoding control unit 15 adds the header information for each layer of the stream hierarchy, such as the macroblock header, slice header, and picture header, to the output of the entropy encoding unit 16 as required, and the result is output from the encoding device 100 as an encoded stream.

<Slice Division Method According to the Embodiment>
Next, the slice division method used by the slice division unit 32 according to the embodiment of the present invention is described in detail with reference to FIG. 2. Consider the encoding target frame 200 illustrated in FIG. 2(a), which contains a face 201 in its central portion. In FIG. 2(a), for convenience, each block of the grid is a block of the coding unit (a macroblock), the block in the upper left corner has block coordinates (0, 0), and the block in the lower right corner has block coordinates (6, 4).

When this encoding target frame 200 is read from the frame memory 10 and the face detection unit 30 performs face detection on it, a face region is detected as illustrated in FIG. 2(b). That is, the rectangular area bounded by block coordinates (1, 0), (5, 0), (1, 4), and (5, 4) is detected as the face region.

The face detection unit 30 supplies the block coordinates indicating the detected face region to the face part detection unit 31. The face part detection unit 31 analyzes the information on each face part included in the face region indicated by the supplied block coordinates, and holds it together with the block coordinates indicating the face region. As described above, the face parts are the portions that represent the features of the face most prominently; in this example they are the left eye, right eye, nose, and mouth. The terms left eye and right eye here refer to left and right on the screen.

FIG. 2(c) shows an example of the result of the face part analysis by the face part detection unit 31. In this example, the left eye 210 is detected in the area indicated by block coordinates (2, 2) and (3, 2), and the right eye 211 is detected in the area indicated by block coordinates (3, 2) and (3, 5). The mouth 213 is detected in the area indicated by block coordinates (2, 3) and (4, 3), and the nose 212 is detected in the area indicated by block coordinates (3, 2).

The face part detection unit 31 outputs to the slice division unit 32 the coordinate information of each face part obtained by analyzing the face region, together with the coordinate information indicating the face region supplied from the face detection unit 30.

The slice division unit 32 determines the slice division method for the encoding target frame 200 based on the coordinate information indicating the face region and the coordinate information of each face part, both supplied from the face part detection unit 31. In the present embodiment, it first determines, according to the proportion of the encoding target frame 200 occupied by the detected face region, whether slice division is to be based on the face region or on the face parts.

This determination is made using, for example, expression (1) below. In expression (1), $V_{max}$ is the maximum vertical coordinate of the face region, $V_{min}$ is the minimum vertical coordinate of the face region, and $V_{total}$ is the vertical size of the encoding target frame 200. The threshold $th$ can be determined, for example, experimentally.

$th > (V_{max} - V_{min}) / V_{total}$ ... (1)

The value of the threshold th depends on the number of slices. For example, when the encoding target frame 200 is divided into three slices, the threshold th may be set to 0.8. Here, "0.8" means that the face region occupies 80% of the picture in the vertical direction. In this case, when the value of (V_max − V_min) / V_total is less than 0.8, slice division is performed based on the face region. When the value of (V_max − V_min) / V_total is 0.8 or more (equal to or greater than the threshold), slice division is performed based on the face parts.
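The threshold test of expression (1) can be sketched as below. The function name and the returned labels are hypothetical; the default threshold of 0.8 is the three-slice example value given in the text, and the units of the coordinates (pixels or blocks) only need to be consistent between the face region and the frame height.

```python
def division_basis(v_min, v_max, v_total, th=0.8):
    """Decide whether slices follow the face region or the face parts.

    v_min, v_max: minimum and maximum vertical coordinates of the
    face region; v_total: vertical size of the encoding target frame.
    """
    ratio = (v_max - v_min) / v_total
    # Expression (1): th > ratio means the face region is small enough
    # for slice boundaries to be placed around it.
    return "face_region" if th > ratio else "face_parts"
```

A face region spanning one third of the frame height returns `"face_region"`, while one spanning the full height returns `"face_parts"`. A ratio exactly equal to the threshold falls on the face-parts side, in line with the claim wording "equal to or greater than the threshold".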

When expression (1) is satisfied, the face region can be judged to occupy a small proportion of the encoding target frame 200. In this case, the slice division unit 32 performs slice division so as not to straddle the face region. More specifically, the slice division unit 32 determines the slice division positions based on the boundary between the area that includes the face region and the area that does not.

An example of this case is described more specifically with reference to FIG. 3. As illustrated in FIG. 3(a), assume that a face area 202 is detected in the encoding target frame 200 as a rectangular area whose diagonal corners are at block coordinates (2, 3) and (4, 6), and that this face area 202 satisfies expression (1) above.

Assuming that slice division is performed in the horizontal direction of the screen, slice division in this case is performed at the upper edge of block coordinates (x, 3), which contain the upper edge of the face area 202, and at the lower edge of block coordinates (x, 6), which contain the lower edge of the face area 202. That is, the area from block coordinates (x, 2) upward and the area from block coordinates (x, 7) downward do not contain the face area 202. On the other hand, the range from the upper edge of block coordinates (x, 4) to the lower edge of block coordinates (x, 5) contains the face area 202, so no slice division is performed there. As a result, the encoding target frame 200 is divided into three slices, slice #0 to slice #2, as illustrated for example in FIG. 3(c).
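The face-area case above can be sketched as a small function that turns the top and bottom block rows of a face area into horizontal slices. The function name `slices_around_region` and the frame height of 9 block rows are assumptions made for illustration; the description does not state the frame height in blocks.

```python
from typing import List, Tuple


def slices_around_region(total_rows: int, top_row: int,
                         bottom_row: int) -> List[Tuple[int, int]]:
    """Split block rows 0..total_rows-1 into slices without cutting the region.

    top_row and bottom_row are the first and last block rows (inclusive)
    that contain the face area. Each slice is (first_row, last_row).
    """
    if not 0 <= top_row <= bottom_row < total_rows:
        raise ValueError("region must lie inside the frame")
    # Cut at the upper edge of top_row and at the lower edge of bottom_row.
    # The set removes cuts that coincide with the frame edges.
    cuts = sorted({0, top_row, bottom_row + 1, total_rows})
    return [(first, nxt - 1) for first, nxt in zip(cuts, cuts[1:])]


# Face area 202 occupies block rows 3 to 6; a 9-row frame is assumed.
print(slices_around_region(9, 3, 6))  # [(0, 2), (3, 6), (7, 8)]
```

When the face area spans every block row, the function returns a single slice, which corresponds to the situation where no useful division around the face area exists.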

On the other hand, when expression (1) above is not satisfied, the face area 202 occupies too large a proportion of the encoding target frame 200 for the frame to be divided into slices appropriately. For example, in FIG. 2(a) described above, the face area 202 extends from the top block row to the bottom block row of the encoding target frame 200, so slice division cannot be performed without straddling the face area 202. In this embodiment, slice division in such a case is performed based on the coordinate information of each face part.

As an example, in FIG. 2(a) and FIG. 2(c) described above, the left eye 210, the right eye 211, and the nose 212 are contained in block coordinates (2, 2), (3, 2), (4, 2), and (5, 2). The mouth 213 is contained in block coordinates (2, 3), (3, 3), and (4, 3). In this case, based on the boundaries between the areas that include a face part and the areas that do not, slice division is performed between block coordinates (x, 1) and (x, 2), between block coordinates (x, 2) and (x, 3), and between block coordinates (x, 3) and (x, 4).

As a result, the encoding target frame 200 is divided into four slices, slice #0 to slice #3, as illustrated in FIG. 2(d).
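The face-part case can be sketched in the same style. Each group of face parts is given as the block rows it occupies, and cuts are placed at the upper and lower edges of each group. The function name `slices_around_parts`, the frame height of 6 block rows, and the rule that drops any cut falling inside another part are assumptions of this sketch, not details taken from the description.

```python
from typing import List, Sequence, Tuple


def slices_around_parts(total_rows: int,
                        part_rows: Sequence[Tuple[int, int]]
                        ) -> List[Tuple[int, int]]:
    """Split block rows into slices whose boundaries avoid every face part.

    part_rows holds (top_row, bottom_row), inclusive, for each part group.
    """
    cuts = {0, total_rows}
    for top, bottom in part_rows:
        if not 0 <= top <= bottom < total_rows:
            raise ValueError("part must lie inside the frame")
        cuts.update((top, bottom + 1))
    # Assumed safeguard: discard a cut that would pass through another part.
    cuts = {c for c in cuts
            if not any(top < c <= bottom for top, bottom in part_rows)}
    ordered = sorted(cuts)
    return [(first, nxt - 1) for first, nxt in zip(ordered, ordered[1:])]


# Eyes and nose in block row 2, mouth in block row 3; 6 rows assumed.
print(slices_around_parts(6, [(2, 2), (3, 3)]))
# [(0, 1), (2, 2), (3, 3), (4, 5)]
```

With the block rows from the FIG. 2 example this yields four slices, with cuts between rows 1 and 2, rows 2 and 3, and rows 3 and 4.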

In this way, by performing slice division so as not to straddle the parts that characterize the face within the face area, the image quality degradation caused by the inability to perform predictive coding across slices can be kept out of these parts. In addition, the error resilience of the face parts can be improved.

<First Modification of Embodiment>
Next, a first modification of the present embodiment will be described. The first modification is an example in which a plurality of face areas are detected in the encoding target frame 200. FIG. 4 shows an example configuration of an encoding device 101 applicable to the first modification. In FIG. 4, parts common to FIG. 1 described above are given the same reference numerals, and their detailed description is omitted.

The encoding device 101 shown in FIG. 4 has a configuration in which a near-center face determination unit 33 is added to the encoding device 100 shown in FIG. 1 described above. That is, in the first modification, when the face detection unit 30 detects a plurality of face areas in the encoding target frame 200, the face area closest to the center of the encoding target frame 200 (hereinafter, the near-center face area) is selected from among them.

Then the determination by expression (1) described above is performed. When the proportion of the entire encoding target frame 200 occupied by the near-center face area is determined to be larger than the threshold th, slice division is performed so as not to straddle the face parts contained in that near-center face area. On the other hand, when the proportion is determined to be smaller than the threshold th, slice division is performed so as not to straddle the near-center face area.

This is described more specifically with reference to FIG. 5. As illustrated in FIG. 5(a), assume that the face detection unit 30 detects face areas 220, 221, and 222 in the encoding target frame 200. The face area 220 is detected as a rectangular area whose diagonal corners are at block coordinates (0, 0) and (2, 1). The face area 221 is detected as a rectangular area whose diagonal corners are at block coordinates (2, 2) and (4, 3). The face area 222 is detected as a rectangular area whose diagonal corners are at block coordinates (4, 0) and (6, 2). The detection results for these face areas 220 to 222 are supplied to the near-center face determination unit 33.

Based on the face area detection results supplied from the face detection unit 30, the near-center face determination unit 33 determines which of the face areas 220 to 222 detected by the face detection unit 30 is closest to the center of the screen formed by the encoding target frame 200.

This determination is made using, for example, expression (2) below. In expression (2), $x_{center}$ and $y_{center}$ are the horizontal and vertical coordinates of the center of the screen formed by the encoding target frame 200. $x_n$ and $y_n$ (where $1 \le n \le 4$) are the x and y coordinates of each vertex (upper left, upper right, lower left, and lower right) of the face area.
$$cent\_dist = (x_n - x_{center})^2 + (y_n - y_{center})^2 \qquad (2)$$

Using expression (2), the value cent_dist is obtained for each of the face areas 220 to 222 detected in the encoding target frame 200, and the face area with the smallest cent_dist is determined to be the near-center face area. In the example of FIG. 5(a), the value cent_dist obtained for the face area 221 is the smallest among the face areas 220 to 222, so the face area 221 is determined to be the near-center face area.
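The selection by expression (2) can be sketched as follows. The description defines the squared distance per vertex but does not say how the four vertex values are combined into one cent_dist per face area; this sketch assumes they are summed. The names `Rect`, `cent_dist`, and `nearest_face`, and the screen center (3.0, 2.5) in block coordinates (a frame assumed to be 7 blocks wide and 6 blocks tall), are likewise hypothetical.

```python
from typing import Dict, Tuple

# (left, top, right, bottom) corner block coordinates of a face area.
Rect = Tuple[int, int, int, int]


def cent_dist(rect: Rect, cx: float, cy: float) -> float:
    """Sum over the four vertices of the squared distance in expression (2).

    Summing the per-vertex values is an assumption of this sketch.
    """
    left, top, right, bottom = rect
    vertices = [(left, top), (right, top), (left, bottom), (right, bottom)]
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in vertices)


def nearest_face(faces: Dict[int, Rect], cx: float, cy: float) -> int:
    """Return the key of the face area with the smallest cent_dist."""
    return min(faces, key=lambda key: cent_dist(faces[key], cx, cy))


faces = {220: (0, 0, 2, 1), 221: (2, 2, 4, 3), 222: (4, 0, 6, 2)}
print(nearest_face(faces, 3.0, 2.5))  # 221
```

Under these assumptions the summed values are 37 for face area 220, 5 for face area 221, and 33 for face area 222, so face area 221 is selected, in line with the FIG. 5(a) example.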

Then, for this near-center face area, the size of the face area is evaluated by expression (1). When the proportion of the entire encoding target frame 200 occupied by the face area 221, which was determined to be the near-center face area, is determined to be smaller than the threshold th, slice division is performed so as not to straddle the face area 221. That is, the area above block coordinates (x, 2), which contain the upper edge of the face area 221, and the area below block coordinates (x, 4), which contain the lower edge of the face area 221, do not contain the face area 221. Therefore, based on the boundaries between the area that includes the face area 221 and the areas that do not, slice division is performed at the upper edge of block coordinates (x, 2) and at the lower edge of block coordinates (x, 4). On the other hand, the area of block coordinates (x, 3) contains the face area 221, so no slice division is performed there.

As a result, the encoding target frame 200 is divided into three slices, slice #0 to slice #2, as illustrated for example in FIG. 5(b). Note that in this example a slice boundary straddles the face area 222, which is not the near-center face area.

In the first modification of the present embodiment, when a plurality of face areas are detected in the encoding target frame 200, slice division is performed so as not to straddle the face area closest to the center of the screen, which is considered to attract the most attention. This suppresses image quality degradation due to slice division in the area of the video considered most important. It also improves the error resilience of the detected important area.

In the above description, the slice division position is determined based on the face area closest to the center of the screen among the plurality of face areas detected in the encoding target frame 200, but the invention is not limited to this example. For example, a position in the encoding target frame 200 may be made selectable by a user operation or the like, and the slice division position may be determined based on the face area closest to the selected position.

<Second Modification of the Present Embodiment>
Next, a second modification of the present embodiment will be described. Like the first modification described above, the second modification is an example in which a plurality of face areas are detected in the encoding target frame 200. FIG. 6 shows an example configuration of an encoding device 102 applicable to the second modification. In FIG. 6, parts common to FIG. 1 described above are given the same reference numerals, and their detailed description is omitted.

The encoding device 102 shown in FIG. 6 has a configuration in which a near-focus face determination unit 34 is added to the encoding device 100 shown in FIG. 1 described above. The near-focus face determination unit 34, which serves as in-focus position acquisition means, receives information indicating the in-focus position within the encoding target frame 200 from, for example, the imaging optical system or the imaging signal processing unit of the imaging apparatus to which the encoding device 102 is applied. For example, when focus control is performed by a pupil-division phase-difference method or an image sharpness method, focus can be evaluated at each position in the screen, and the in-focus position within the screen can be obtained. The invention is not limited to this; for example, the in-focus determination can also be made by extracting edge information from the image data.
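The description does not specify how edge information would be turned into an in-focus position. One simple possibility, shown here purely as an assumption, is to score each block by the sum of absolute differences between neighboring luma samples and to treat the highest-scoring block as the in-focus block. The function name `sharpest_block` and the tiny test image are hypothetical.

```python
from typing import List, Tuple


def sharpest_block(image: List[List[int]], block: int) -> Tuple[int, int]:
    """Return (bx, by) of the block with the strongest edge content.

    image is a 2D list of luma samples; block is the block size in pixels.
    The score is the sum of absolute horizontal and vertical differences
    between neighboring samples inside the block (an assumed measure).
    """
    rows, cols = len(image), len(image[0])
    best, best_score = (0, 0), -1
    for by in range(rows // block):
        for bx in range(cols // block):
            x_end, y_end = (bx + 1) * block, (by + 1) * block
            score = 0
            for y in range(by * block, y_end):
                for x in range(bx * block, x_end):
                    if x + 1 < x_end:
                        score += abs(image[y][x + 1] - image[y][x])
                    if y + 1 < y_end:
                        score += abs(image[y + 1][x] - image[y][x])
            if score > best_score:
                best, best_score = (bx, by), score
    return best


# 4x4 image with 2x2 blocks: only the top-right block contains edges.
image = [
    [0, 0, 0, 9],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(sharpest_block(image, 2))  # (1, 0)
```

A real imaging pipeline would more likely take the in-focus position directly from the autofocus system, as the description notes; this sketch only illustrates the image-based alternative.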

As an example, assume that the face detection unit 30 detects the face areas 220 to 222 in the encoding target frame 200, as illustrated in FIG. 5(a) described above. The face area 220 is detected as a rectangular area whose diagonal corners are at block coordinates (0, 0) and (2, 1), the face area 221 as a rectangular area whose diagonal corners are at block coordinates (2, 2) and (4, 3), and the face area 222 as a rectangular area whose diagonal corners are at block coordinates (4, 0) and (6, 2). The detection results for these face areas 220 to 222 are supplied to the near-focus face determination unit 34.

Based on the face area detection results supplied from the face detection unit 30, the near-focus face determination unit 34 determines which of the face areas 220 to 222 detected by the face detection unit 30 is closest to the in-focus position.

This determination is made using, for example, expression (3) below. In expression (3), $x_{focus}$ and $y_{focus}$ are the horizontal and vertical coordinates of the block that contains the in-focus position. $x_n$ and $y_n$ (where $1 \le n \le 4$) are the x and y coordinates of each vertex (upper left, upper right, lower left, and lower right) of the face area.
$$focus\_dist = (x_n - x_{focus})^2 + (y_n - y_{focus})^2 \qquad (3)$$

Using expression (3), the value focus_dist is obtained for each of the face areas 220 to 222 detected in the encoding target frame 200, and the face area with the smallest focus_dist is determined to be the near-focus face area. In FIG. 5(a), assume for example that the value focus_dist obtained for the face area 222 is the smallest among the face areas 220 to 222, so that the face area 222 is selected as the near-focus face area.

Then, for this near-focus face area, the size of the face area is evaluated by expression (1). When the proportion of the entire encoding target frame 200 occupied by the face area 222, which was determined to be the near-focus face area, is determined to be smaller than the threshold th, slice division is performed so as not to straddle the face area 222. In the example of FIG. 5(a), the area below the lower edge of block coordinates (x, 1) does not contain the face area 222, so slice division is performed at the lower edge of block coordinates (x, 1). On the other hand, the range of block coordinates (x, 0) and (x, 1) contains the face area 222, so no slice division is performed there.

As a result, the encoding target frame 200 is divided into two slices, slice #0 and slice #1, as illustrated for example in FIG. 7. Note that in this example a slice boundary straddles the face area 221, which is not the near-focus face area. As indicated by the dotted line in FIG. 7, slice #1 may be divided further so that the encoding target frame 200 is divided into three slices, #0 to #2.

As described above, in the second modification of the present embodiment, when a plurality of face areas are detected in the encoding target frame 200, slice division is performed so as not to straddle the face area closest to the in-focus position in the screen, which is considered to attract the most attention. This suppresses image quality degradation due to slice division in the area of the video considered most important. It also improves the error resilience of the detected important area.

In the above description, slice division in the embodiment of the present invention and in each of its modifications is performed only in the horizontal direction, but the invention is not limited to this example. In H.264, a technique called slice groups makes it possible, for example, to divide the frame into rectangular slices (foreground / leftover). This rectangular slice division can also be applied to the present invention. In this case, a rectangular area that includes the detected face area or face part area may be used as a slice.
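The rectangular division mentioned above can be pictured as a per-block map in which the blocks inside the face rectangle belong to a foreground slice group and all remaining blocks belong to a leftover group. The sketch below only builds such a map; the function name `foreground_leftover_map`, the group numbering (0 for foreground, 1 for leftover), and the frame size are assumptions, and the code does not produce any actual H.264 syntax.

```python
from typing import List, Tuple


def foreground_leftover_map(cols: int, rows: int,
                            rect: Tuple[int, int, int, int]
                            ) -> List[List[int]]:
    """Assign each block to group 0 (inside rect) or group 1 (leftover).

    rect is (left, top, right, bottom) in block coordinates, inclusive.
    """
    left, top, right, bottom = rect
    if not (0 <= left <= right < cols and 0 <= top <= bottom < rows):
        raise ValueError("rect must lie inside the frame")
    return [[0 if left <= x <= right and top <= y <= bottom else 1
             for x in range(cols)]
            for y in range(rows)]


# A 4x3 block frame with a face rectangle covering blocks (1,1) to (2,2).
for row in foreground_leftover_map(4, 3, (1, 1, 2, 2)):
    print(row)
# [1, 1, 1, 1]
# [1, 0, 0, 1]
# [1, 0, 0, 1]
```

Compared with horizontal division, a map like this keeps the blocks to the left and right of the face out of the face's slice, at the cost of requiring slice group support in the encoder and decoder.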

<Other Embodiments>
The above-described embodiment and its modifications can also be implemented in software by a computer (or a CPU, MPU, or the like) of a system or apparatus.

Accordingly, a computer program supplied to a computer in order to implement the above-described embodiment on that computer also realizes the present invention. In other words, the computer program for implementing the functions of the above-described embodiment is itself one aspect of the present invention.

The computer program for implementing the above-described embodiment may take any form as long as it is computer-readable. For example, it may consist of object code, a program executed by an interpreter, or script data supplied to an OS, but it is not limited to these.

The computer program for implementing the above-described embodiment is supplied to the computer via a storage medium or via wired or wireless communication. Storage media for supplying the program include, for example, magnetic storage media such as flexible disks, hard disks, and magnetic tape; optical and magneto-optical storage media such as MO, CD, and DVD; and nonvolatile semiconductor memory.

One method of supplying the computer program via wired or wireless communication is to use a server on a computer network. In this case, a data file (program file) that can become the computer program constituting the present invention is stored on the server. The program file may be in executable form or may be source code.

The program file is then supplied by downloading it to a client computer that has accessed the server. In this case, the program file may be divided into a plurality of segment files, and the segment files may be distributed across different servers.

In other words, a server apparatus that provides a client computer with the program file for implementing the above-described embodiment is also one aspect of the present invention.

Alternatively, a storage medium storing the computer program for implementing the above-described embodiment in encrypted form may be distributed, key information for decryption may be supplied to users who satisfy a predetermined condition, and installation on the user's computer may then be permitted. The key information can be supplied, for example, by having the user download it from a web page via the Internet.

The computer program for implementing the above-described embodiment may also make use of functions of an OS already running on the computer.

Furthermore, part of the computer program for implementing the above-described embodiment may be configured as firmware on an expansion board or the like installed in the computer, or may be executed by a CPU provided on the expansion board or the like.

A block diagram showing an example configuration of an encoding device applicable to the embodiment of the present invention.
A diagram for explaining the slice division method according to the embodiment of the present invention.
A diagram for explaining slice division performed so as not to straddle a face area.
A block diagram showing an example configuration of an encoding device applicable to the first modification of the present embodiment.
A diagram for explaining the slice division method according to the first modification of the embodiment of the present invention.
A block diagram showing an example configuration of an encoding device applicable to the second modification of the present embodiment.
A diagram for explaining the slice division method according to the second modification of the embodiment of the present invention.
A diagram for explaining a slice division method according to the prior art.

Explanation of symbols

10 Frame memory
13 Quantization unit
14 Quantization control unit
15 Encoding control unit
30 Face detection unit
31 Face part detection unit
32 Slice dividing unit
33 Near-center face determination unit
34 Near-focus face determination unit
100, 101, 102 Encoding device

Claims

An encoding device that encodes image data in units of slices, each of which can be decoded independently, the encoding device comprising:
encoding means for encoding the image data and outputting it as an encoded stream;
face detection means for performing face detection on the image data to detect a face area, and for further detecting face parts included in the detected face area;
slice division position determining means for determining, when the proportion of the screen formed by the image data that is occupied by the face area detected by the face detection means is determined to be smaller than a threshold, a slice division position at which the slice division is performed based on a boundary between an area that includes the face area and an area that does not, and for determining, when the proportion is determined to be equal to or greater than the threshold, the slice division position based on a boundary between an area that includes the face parts detected from the face area by the face detection means and an area that does not; and
encoding control means for controlling the encoding by the encoding means so that the slice division of the image data is performed at the slice division position determined by the slice division position determining means.

The encoding device according to claim 1, wherein, when a plurality of the face areas are detected by the face detection means and the proportion of the screen occupied by each of the detected face areas is determined to be smaller than the threshold, the slice division position determining means selects, from among the plurality of face areas, the face area used to determine the slice division position based on its position in the screen.

The encoding device according to claim 2, wherein the slice division position determining means determines the slice division position based on the face area that is closest to the center of the screen among the plurality of face areas.

The encoding device according to claim 1, further comprising in-focus position acquisition means for acquiring an in-focus position in the image data, wherein, when a plurality of the face areas are detected by the face detection means and the proportion of the screen occupied by each of the detected face areas is determined to be smaller than the threshold, the slice division position determining means selects, from among the plurality of face areas, the face area used to determine the slice division position based on the in-focus position acquired by the in-focus position acquisition means.

An encoding method for encoding image data in units of slices, each of which can be decoded independently, the encoding method comprising:
an encoding step of encoding the image data and outputting it as an encoded stream;
a face detection step of performing face detection on the image data to detect a face area, and of further detecting face parts included in the detected face area;
a slice division position determining step of determining, when the proportion of the screen formed by the image data that is occupied by the face area detected in the face detection step is determined to be smaller than a threshold, a slice division position at which the slice division is performed based on a boundary between an area that includes the face area and an area that does not, and of determining, when the proportion is determined to be equal to or greater than the threshold, the slice division position based on a boundary between an area that includes the face parts detected from the face area in the face detection step and an area that does not; and
an encoding control step of controlling the encoding in the encoding step so that the slice division of the image data is performed at the slice division position determined in the slice division position determining step.

A program that causes a computer to function as the encoding device according to any one of claims 1 to 4.