JP2017167727A

JP2017167727A - Processing device of moving image, processing method, and program

Info

Publication number: JP2017167727A
Application number: JP2016051049A
Authority: JP
Inventors: 強要 (Tsutomu Kaname); 内藤 整 (Hitoshi Naito)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-03-15
Filing date: 2016-03-15
Publication date: 2017-09-21
Anticipated expiration: 2036-03-15
Also published as: JP6606447B2

Abstract

PROBLEM TO BE SOLVED: To provide a moving image processing device capable of accurately extracting a person region including a face.

SOLUTION: A moving image processing device includes: determination means for determining, in a first frame, a first region of a predetermined shape including a person's face; and extraction means for extracting, from the first frame, a predetermined person region including the face of the person based on the first region of the first frame. The determination means determines, for each of a plurality of first pixels of the first frame, a corresponding second pixel of a second frame temporally preceding the first frame, and determines, based on the directions of a plurality of vectors each directed from one of the second pixels toward the corresponding first pixel, whether the first region of the first frame is to be determined by pattern matching or based on the first region of the second frame.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for processing moving images in video communication.

Video communication is used in, for example, video conference systems. With the spread of the Internet, it has also become easy to join a video conference from home. In video communication, however, not only the person communicating but also the background is conveyed to the other party; when a participant joins a video conference from home, for example, the appearance of the room is conveyed as well.

For this reason, it is desirable to extract only a part of the human body including the face, such as the upper body, from the captured moving image, and to transmit the remaining region after replacing it with another image or blurring it. When the background image does not change, the person region can be extracted based on a background image prepared in advance. However, the background image usually changes with lighting conditions and the like, in which case the background image itself must be updated dynamically. If the person to be detected moves frequently, the background image can be updated dynamically by detecting the moving person. If the person moves only slightly, however, the person is detected as part of the background, and the detection accuracy of the person region deteriorates.

Meanwhile, Non-Patent Document 1 discloses detecting a region including a person's face by pattern matching. For example, it is conceivable to extract a person region by the graph cut method from a region including a face detected with the configuration described in Non-Patent Document 1.

Non-Patent Document 1: Viola, Paul, et al., "Rapid object detection using a boosted cascade of simple features," Computer Vision and Pattern Recognition, CVPR 2001, Proceedings of the 2001 IEEE Computer Society Conference on, Vol. 1, 2001.

However, detection by pattern matching is accurate when the person faces the camera, but not when the person faces sideways, downward, or otherwise away from the camera.

The present invention provides a moving image processing device, processing method, and program capable of accurately extracting a person region including a face.

According to one aspect of the present invention, a moving image processing device comprises: determination means for determining, in a first frame, a first region of a predetermined shape including a person's face; and extraction means for extracting, from the first frame, a predetermined person region including the face of the person based on the first region of the first frame. The determination means determines, for each of a plurality of first pixels of the first frame, a corresponding second pixel of a second frame temporally preceding the first frame, and determines, based on the directions of a plurality of vectors each directed from one of the second pixels toward the corresponding first pixel, whether the first region of the first frame is to be determined by pattern matching or based on the first region of the second frame.

According to the present invention, a person region including a face can be extracted accurately.

FIG. 1 is a configuration diagram of a processing device according to an embodiment.
FIG. 2 is a flowchart of processing in the processing device according to an embodiment.
FIG. 3 is a flowchart of rectangular region determination processing according to an embodiment.
FIG. 4 is an explanatory diagram of core region determination processing according to an embodiment.
FIG. 5 is a diagram showing a template according to an embodiment.
FIG. 6 is an explanatory diagram of evaluation value calculation processing according to an embodiment.
FIG. 7 is an explanatory diagram of rectangular region tracking processing according to an embodiment.

Exemplary embodiments of the present invention are described below with reference to the drawings. The following embodiments are examples and do not limit the present invention to their content. In each of the following drawings, components not needed for describing the embodiments are omitted.

FIG. 1 is a configuration diagram of the processing device according to the present embodiment. The imaging unit 10 captures a moving image including a person and outputs the moving image data to the rectangular region determination unit 20. The rectangular region determination unit 20 determines a rectangular region including the person's face for each frame of the moving image represented by the moving image data. The processing unit 30 determines the person region of a frame based on the rectangular region determined in that frame by the rectangular region determination unit 20. In the following description, the region of the person's upper body is determined. The synthesizing unit 40 extracts the person region determined by the processing unit 30 from the frame and combines it with another image prepared in advance. Instead of combining the person region with another image, the portion other than the person region determined by the processing unit 30 may be deleted or blurred.

FIG. 2 is a flowchart of processing in the processing device according to the present embodiment. In S10, the rectangular region determination unit 20 determines a rectangular region including the face in the frame to be processed. Details of S10 are described later. In S11, the processing unit 30 shrinks the rectangular region determined by the rectangular region determination unit 20 to determine a core region. FIG. 4 is an explanatory diagram of the core region determination processing in S11. In FIG. 4, the rectangular region 2 is the rectangular region, determined by the rectangular region determination unit 20, that includes the face region 4 of the person 1. As shown in FIG. 4, the rectangular region 2 determined by the rectangular region determination unit 20 usually includes background other than the face. If the background portion included in the rectangular region is large, the accuracy of extracting the face region 4 by the graph cut method deteriorates. In the present embodiment, therefore, the core region 3 is determined so that it includes less background.

As shown in FIG. 4, in the present embodiment the center of the core region 3 coincides with the center of the rectangular region 2 determined by the rectangular region determination unit 20, and the height and width of the core region 3 are the height and width of the rectangular region 2, respectively, multiplied by a predetermined coefficient δ (0 < δ ≤ 1). Here, height is the length in the vertical (Y-axis) direction and width is the length in the horizontal (X-axis) direction. In the present embodiment, the value of δ is determined experimentally in advance. However, the extraction accuracy of the face region 4 by the graph cut method also deteriorates if the core region 3 becomes too small. Minimum values for the height and width of the core region 3 may therefore be set, and when the height or width obtained with δ falls below its minimum, the predetermined minimum may be used instead. Alternatively, the face region 4 may be extracted by the graph cut method based on the rectangular region 2 without determining the core region 3, in which case S11 of FIG. 2 is omitted.
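The core region calculation can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Rect` type, the function name `core_region`, and the parameter names are assumptions introduced here, and the rectangle is represented by its center and size.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its center and size, in pixels."""
    cx: float
    cy: float
    width: float
    height: float


def core_region(rect: Rect, delta: float,
                min_width: float = 0.0, min_height: float = 0.0) -> Rect:
    """Shrink `rect` about its center by the coefficient delta (0 < delta <= 1).

    The scaled width and height are clamped from below to the optional
    minimums, mirroring the variant that fixes a minimum core size.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must satisfy 0 < delta <= 1")
    width = max(rect.width * delta, min_width)
    height = max(rect.height * delta, min_height)
    # The center is unchanged; only the size shrinks.
    return Rect(rect.cx, rect.cy, width, height)
```

With the default minimums of zero the function reduces to plain scaling by δ; passing non-zero minimums gives the clamped variant.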

Returning to FIG. 2, in S12 the processing unit 30 extracts the face region 4 by the graph cut method based on the core region 3. The processing unit 30 holds template information representing a template 6 of a person's upper body, as shown in FIG. 5. The template 6 is a model outline of the upper-body region of a person. The processing unit 30 adjusts the size of the template 6 so that the size of its face portion approaches the size of the extracted face region 4 and, based on the adjusted template 6, identifies the upper-body portion of the person in the frame excluding the face region 4. The processing unit 30 then identifies, as the person region corresponding to the upper body of the person 1, the portion consisting of the upper-body portion excluding the face region 4, determined from the adjusted template 6, together with the extracted face region 4. As already described, the synthesizing unit 40 extracts the person region thus identified in the frame and combines it with another image.

Next, the rectangular region determination processing in S10 of FIG. 2 is described. FIG. 3 is a flowchart showing its details. For the first frame, the rectangular region 2 is detected by pattern matching; the processing of FIG. 3 therefore applies to the second and subsequent frames. First, in S20, the rectangular region determination unit 20 extracts a plurality of feature points from each of the frame to be processed and the frame one before it (hereinafter, the immediately preceding frame), and determines the correspondence between the feature points of the two frames. The correspondence can be determined based on, for example, the distance between feature points and pixel values. Then, for each feature point, the vector from the feature point in the immediately preceding frame to the corresponding feature point in the frame to be processed is determined. The arrows in FIG. 6(A) indicate these vectors. Because changes in regions other than the person (the background) are usually small, most of the non-zero vectors correspond to the person region. The vectors therefore mainly represent the movement of the person region from the immediately preceding frame to the frame to be processed.
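As a rough sketch of S20, the following code pairs feature points of the two frames and returns one motion vector per pair. The greedy nearest-neighbor matching, the function name `match_and_vectors`, and the `max_dist` parameter are assumptions made for illustration; the description states only that the correspondence can be determined from distance, pixel values, and the like, and this sketch uses distance alone.

```python
import math

Point = tuple[float, float]


def match_and_vectors(prev_pts: list[Point], cur_pts: list[Point],
                      max_dist: float) -> list[tuple[Point, Point]]:
    """Pair each previous-frame point with its nearest unused current-frame
    point within max_dist, and return (start_point, (dx, dy)) per pair."""
    used: set[int] = set()
    result: list[tuple[Point, Point]] = []
    for p in prev_pts:
        best_index, best_dist = None, max_dist
        for j, c in enumerate(cur_pts):
            if j in used:
                continue
            d = math.dist(p, c)
            if d <= best_dist:
                best_index, best_dist = j, d
        if best_index is None:
            continue  # no counterpart close enough; the point is dropped
        used.add(best_index)
        c = cur_pts[best_index]
        # Vector from the previous-frame point toward the current-frame point.
        result.append((p, (c[0] - p[0], c[1] - p[1])))
    return result
```

Keeping the start point alongside each vector makes it possible to select later only those vectors that start inside the rectangular region of the preceding frame.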

In S21, as shown in FIG. 6(B), the rectangular region determination unit 20 determines the angle θ of each vector with respect to the X-axis (horizontal) direction, computes the variance T of the angles θ of all the vectors, and uses T as the evaluation value of the motion. The reference direction for the angle is not limited to the X-axis direction and may be any predetermined direction. Vectors whose magnitude is zero may be excluded from the calculation of the evaluation value. Vectors whose magnitude is non-zero but very small, for example about the distance between adjacent pixels, may likewise be excluded. In S22, the rectangular region determination unit 20 compares the evaluation value T with a threshold. If T is equal to or greater than the threshold, the rectangular region determination unit 20 detects the rectangular region 2 by pattern matching in S23. If T is less than the threshold, the rectangular region determination unit 20 determines, in S24, the rectangular region 2 of the frame to be processed based on the rectangular region 2 of the immediately preceding frame.
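The evaluation in S21 and the branch in S22 can be illustrated as below. This is a sketch under stated assumptions: the function names, the `min_length` cutoff of one pixel, and the returned labels are hypothetical, and the population variance is used because the description does not say which variance estimator is meant.

```python
import math
from statistics import pvariance


def angle_variance(vectors, min_length=1.0):
    """Variance T of the angles (radians, relative to the X axis) of the
    vectors, ignoring vectors whose length does not exceed min_length."""
    angles = [math.atan2(dy, dx) for dx, dy in vectors
              if math.hypot(dx, dy) > min_length]
    if not angles:
        return 0.0  # no usable motion vectors: treat the motion as simple
    return pvariance(angles)


def choose_method(vectors, threshold, min_length=1.0):
    """Return which branch of S22 applies for the given vectors."""
    t = angle_variance(vectors, min_length)
    return "pattern_matching" if t >= threshold else "track_previous"
```

Note that a plain variance of angles treats directions just above and just below the ±180 degree boundary as far apart. The description does not address this, so a practical implementation might need a circular statistic or a different reference direction.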

The following describes, with reference to FIG. 7, how the rectangular region 2 of the frame to be processed is determined in S24 based on the rectangular region 2 of the immediately preceding frame. The basic idea is to adjust the position and size of the rectangular region 2 based on those vectors obtained in S21 that start within the rectangular region of the immediately preceding frame (hereinafter, determination vectors), and to use the result as the rectangular region 2 of the frame to be processed. The amounts of adjustment of position and size are determined based on the directions and magnitudes of the determination vectors. In the present embodiment, the movement state of the face region 4 of the person 1 is classified into the five states shown in FIG. 7. The rectangular region determination unit 20 determines, based on the determination vectors, which of the five states shown in FIG. 7 describes the movement of the face region 4 from the immediately preceding frame to the frame to be processed.

First, the stationary state is a state in which the face region 4 of the person 1 has hardly moved. The state is therefore determined to be stationary when the magnitudes of all the determination vectors are small; specifically, when the magnitudes of all the determination vectors are equal to or less than a certain small value. In the stationary state, the rectangular region determination unit 20 uses the rectangular region 2 of the immediately preceding frame, with its position, height, and width unchanged, as the rectangular region 2 of the frame to be processed.
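The stationary test amounts to a bound on every vector length, as in the short sketch below. The function name and the default bound of one pixel are assumptions; the description says only that the bound is a certain small value.

```python
import math


def is_stationary(vectors, max_length=1.0):
    """True when every determination vector is no longer than max_length.

    An empty list also yields True, i.e. no observed motion is treated
    as stationary in this sketch.
    """
    return all(math.hypot(dx, dy) <= max_length for dx, dy in vectors)
```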

The slide state is a state in which the face region 4 of the person 1 slides in some direction while the face keeps the same orientation. The state is therefore determined to be the slide state when the determination vectors point in substantially the same direction and the differences among their magnitudes are small. Specifically, for example, the direction of each determination vector is approximated by one of eight directions spaced at 45-degree intervals, including the horizontal and vertical directions. If at least a predetermined proportion of the determination vectors have the same direction, the vectors can be judged to point in substantially the same direction. If the variance of the magnitudes of the determination vectors is equal to or less than a predetermined value, the differences among the magnitudes can be judged to be small. In the slide state, the rectangular region determination unit 20 slides the rectangular region 2 of the immediately preceding frame without changing its height and width. The slide direction can be, for example, the average of the directions of the determination vectors, and the slide amount can be the average of their magnitudes.
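One way to realize the slide test and the slide offset is sketched below. The function names and the default values of `min_ratio` and `max_length_var` are hypothetical. Averaging the directions is implemented here as the direction of the summed unit vectors, which is one possible reading of the description rather than a method it specifies.

```python
import math
from collections import Counter
from statistics import pvariance


def _nonzero(vectors):
    """Drop zero vectors, which have no meaningful direction."""
    return [(dx, dy) for dx, dy in vectors if dx or dy]


def direction_bin(dx, dy):
    """Index 0..7 of the nearest of eight directions at 45-degree steps."""
    return round(math.atan2(dy, dx) / (math.pi / 4)) % 8


def is_slide(vectors, min_ratio=0.8, max_length_var=1.0):
    """True when most vectors share one quantized direction and the
    variance of their lengths is small."""
    vs = _nonzero(vectors)
    if not vs:
        return False
    bins = Counter(direction_bin(dx, dy) for dx, dy in vs)
    _, count = bins.most_common(1)[0]
    lengths = [math.hypot(dx, dy) for dx, dy in vs]
    return count / len(vs) >= min_ratio and pvariance(lengths) <= max_length_var


def slide_shift(vectors):
    """Offset (dx, dy): mean direction combined with mean length."""
    vs = _nonzero(vectors)
    if not vs:
        return 0.0, 0.0
    lengths = [math.hypot(dx, dy) for dx, dy in vs]
    ux = sum(dx / n for (dx, _), n in zip(vs, lengths))
    uy = sum(dy / n for (_, dy), n in zip(vs, lengths))
    angle = math.atan2(uy, ux)
    mean_len = sum(lengths) / len(lengths)
    return mean_len * math.cos(angle), mean_len * math.sin(angle)
```

The offset returned by `slide_shift` would be added to the position of the rectangular region of the preceding frame, leaving its height and width unchanged.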

The head pitching state is a state in which the person 1 has turned the face downward or upward, that is, has nodded the head in the vertical direction. The state is therefore determined to be the head pitching state when the determination vectors point substantially in the vertical direction but their magnitudes differ depending on vertical position. Alternatively, the state is determined to be the head pitching state when the determination vectors point substantially in the vertical direction but point in opposite directions in the upper and lower parts. Specifically, as in the determination of the slide state, the direction of each determination vector is approximated, and if at least a predetermined proportion of the vectors are vertical, the determination vectors can be judged to point substantially in the vertical direction. In addition, the determination vectors are grouped into bands of a predetermined width in the vertical direction, and the average magnitude of the determination vectors in each group is computed. If the variance of the magnitudes within each group is equal to or less than a predetermined value and the difference between the group averages is greater than a predetermined value, the magnitudes can be judged to differ depending on vertical position. In the head pitching state, the rectangular region determination unit 20 changes only the height of the rectangular region 2 of the immediately preceding frame, leaving its width and position unchanged, because the height of the face region 4 in the frame changes when the person looks up or down. The amount of change in height is obtained from the determination vectors; the calculation method is determined experimentally in advance. The position may also be moved vertically depending on whether the person has turned downward or upward, in which case the amount of movement is obtained from the magnitudes of the determination vectors.
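The band-wise check on vector magnitudes might look like the sketch below. All names and thresholds are assumptions, each sample is taken to be a pair of a start point and a vector, and the difference between group averages is interpreted here as the gap between the largest and smallest band average, which the description does not spell out.

```python
import math
from statistics import mean, pvariance


def band_lengths(samples, band_height):
    """Group vector lengths by the horizontal band containing the start
    point. Each sample is ((x, y), (dx, dy))."""
    bands = {}
    for (_, y), (dx, dy) in samples:
        bands.setdefault(int(y // band_height), []).append(math.hypot(dx, dy))
    return bands


def magnitude_varies_vertically(samples, band_height,
                                max_in_band_var, min_between_diff):
    """True when lengths are consistent inside each band but the band
    averages differ by more than min_between_diff."""
    bands = band_lengths(samples, band_height)
    if len(bands) < 2:
        return False  # a single band cannot show a vertical difference
    if any(pvariance(lengths) > max_in_band_var for lengths in bands.values()):
        return False
    averages = [mean(lengths) for lengths in bands.values()]
    return max(averages) - min(averages) > min_between_diff
```

This check covers only the magnitude condition; the test that the vectors are substantially vertical would be applied separately, in the same way as the direction test for the slide state.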

The head rotation state is a state in which the person 1 has turned the face to the left or right, that is, has shaken the head from side to side. The state is therefore determined to be the head rotation state when the determination vectors point substantially in the horizontal direction but their magnitudes differ depending on horizontal position. The state can also be determined to be the head rotation state when the directions of the vectors differ between the left and right sides of the rectangular region. The determination method is the same as for the head pitching state. In the head rotation state, the rectangular region determination unit 20 changes only the width of the rectangular region 2 of the immediately preceding frame, leaving its height and position unchanged, because the width of the face region 4 in the frame changes when the person turns to the right or left. The position may also be moved horizontally depending on whether the person has turned right or left. The amount of change in width and the amount of horizontal movement are obtained from the determination vectors; the calculation method is determined experimentally in advance.

The other state applies when none of the above states can be determined. In this case, the rectangular region determination unit 20 changes the position, height, and width of the rectangular region 2 of the immediately preceding frame. The position is slid by the average of the determination vectors. The amounts of change in height and width are determined from the determination vectors; the method of calculating them from the determination vectors is determined experimentally in advance.
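The per-state updates of the rectangular region can be summarized in one function, sketched below. The enum, the function name, and the `width_scale` and `height_scale` parameters are hypothetical; the description leaves the calculation of the size changes to experiment, so the scale factors are simply supplied by the caller here. The optional vertical or horizontal repositioning for head pitching and head rotation is omitted.

```python
from enum import Enum, auto


class MoveState(Enum):
    STATIONARY = auto()
    SLIDE = auto()
    HEAD_PITCHING = auto()
    HEAD_ROTATION = auto()
    OTHER = auto()


def update_rect(rect, state, shift=(0.0, 0.0),
                width_scale=1.0, height_scale=1.0):
    """Derive the current rectangle (cx, cy, w, h) from the previous one."""
    cx, cy, w, h = rect
    if state is MoveState.STATIONARY:
        return rect                                # reuse unchanged
    if state is MoveState.SLIDE:
        return (cx + shift[0], cy + shift[1], w, h)  # move only
    if state is MoveState.HEAD_PITCHING:
        return (cx, cy, w, h * height_scale)       # height only
    if state is MoveState.HEAD_ROTATION:
        return (cx, cy, w * width_scale, h)        # width only
    # OTHER: move by the average vector and change both dimensions.
    return (cx + shift[0], cy + shift[1], w * width_scale, h * height_scale)
```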

As described above, in the present embodiment the processing device evaluates the motion of the frame to be processed relative to the immediately preceding frame. For example, the motion is evaluated by determining whether the movement directions of corresponding pixels in the frame to be processed and the immediately preceding frame vary widely. In the present embodiment, corresponding pixels are determined by means of feature points, and whether the movement directions vary widely is determined by expressing each movement direction as an angle and checking whether the variance of the angles is greater than a threshold. When the variance of the angles is less than the threshold, the motion is simple, so the rectangular region in the frame to be processed can be determined by moving, or changing the size of, the rectangular region including the face region determined in the immediately preceding frame, based on the evaluated motion. For example, head pitching and head rotation are simple motions, so the rectangular region including the face can be determined accurately even when a person who was facing the camera turns the face sideways or downward. When the variance of the angles is equal to or greater than the threshold, on the other hand, the rectangular region including the face is detected again by pattern matching. With this configuration, the rectangular region including a person's face can be detected accurately, and the person region can therefore be extracted accurately.

In the above embodiment, a rectangular region is used as the region including the face, but a region of a predetermined shape other than a rectangle may be detected instead. Although the person region extracted in the present embodiment is the upper body, any region including the face can be extracted by changing the template. In the present embodiment, the rectangular region determination unit 20 performs the processing of FIG. 3 by comparing the immediately preceding frame with the frame to be processed. However, the processing of FIG. 3 can also be performed based on any frame temporally preceding the frame to be processed, such as the frame two or three frames earlier. Furthermore, although the movement state of the face region 4 is classified into the five states shown in FIG. 7 in the present embodiment, the classification is not limited to that shown in FIG. 7. The criteria for determining the movement state of the face region 4 are likewise not limited to those described above, and other criteria can be used.

The processing device according to the present invention can be realized by a program that causes a computer to operate as the processing device. Such a computer program can be stored in a computer-readable storage medium or distributed via a network.

20: rectangular region determination unit, 30: processing unit

Claims

A moving image processing device comprising:
determination means for determining, in a first frame, a first region of a predetermined shape including a person's face; and
extraction means for extracting, from the first frame, a predetermined person region including the face of the person based on the first region of the first frame,
wherein the determination means determines, for each of a plurality of first pixels of the first frame, a corresponding second pixel of a second frame temporally preceding the first frame, and determines, based on the directions of a plurality of vectors each directed from one of the second pixels toward the corresponding first pixel, whether the first region of the first frame is to be determined by pattern matching or based on the first region of the second frame.

The processing device according to claim 1, wherein the determination means determines the first region of the first frame based on the first region of the second frame when the variance of the angles of the plurality of vectors with respect to a predetermined direction is smaller than a threshold.

The processing device according to claim 1 or 2, wherein, when determining the first region of the first frame based on the first region of the second frame, the determination means determines the movement state of the face based on those of the plurality of vectors that start at pixels within the first region of the second frame, and determines the first region of the first frame based on the determined movement state and the first region of the second frame.

The processing device according to claim 3, wherein, upon determining that the movement state of the face is stationary, the determination means determines the same region as the first region of the second frame to be the first region of the first frame.

The processing device according to claim 3 or 4, wherein, upon determining that the movement state of the face is a slide, the determination means determines the first region of the first frame by sliding the first region of the second frame.

The processing device according to any one of claims 3 to 5, wherein, upon determining that the movement state of the face is a swing in the horizontal direction, the determination means determines the first region of the first frame by shortening the horizontal length of the first region of the second frame.

The processing device according to any one of claims 3 to 6, wherein, upon determining that the movement state of the face is a swing in the vertical direction, the determination means determines the first region of the first frame by shortening the vertical length of the first region of the second frame.

The processing device according to any one of claims 1 to 7, wherein the extraction means extracts the region of the face by a graph cut method based on the first region of the first frame.

The processing device according to any one of claims 1 to 7, wherein the extraction means determines a second region obtained by shrinking the first region of the first frame, and extracts the region of the face by a graph cut method based on the second region.

The processing device according to claim 8 or 9, wherein the extraction means holds information representing a template of the person region, and extracts the person region of the first frame based on the template and the extracted region of the face.

A method of processing a moving image in a processing device, the method comprising:
a determination step of determining, in a first frame, a first region of a predetermined shape including a person's face; and
an extraction step of extracting, from the first frame, a predetermined person region including the face of the person based on the first region of the first frame,
wherein the determination step includes: a step of determining, for each of a plurality of first pixels of the first frame, a corresponding second pixel of a second frame temporally preceding the first frame; and a step of determining, based on the directions of a plurality of vectors each directed from one of the second pixels toward the corresponding first pixel, whether the first region of the first frame is to be determined by pattern matching or based on the first region of the second frame.

A program that causes a computer to function as the processing device according to any one of claims 1 to 10.