JP2008287518A

JP2008287518A - Image processor, image processing program, recording medium and image processing method

Info

Publication number: JP2008287518A
Application number: JP2007131965A
Authority: JP
Inventors: Koichi Kurimoto; 孝一栗本
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2007-05-17
Filing date: 2007-05-17
Publication date: 2008-11-27

Abstract

PROBLEM TO BE SOLVED: To improve recognition performance of a face area, in an image processor, an image processing program, a recording medium and an image processing method.

SOLUTION: This image processor has: a first extraction means 12-2 extracting a flesh color area from each of a plurality of pieces of frame data based on hue; a second extraction means 12-3 extracting a first overlap portion of a plurality of the flesh color areas; a first width calculation means 12-4 calculating a width of the first overlap portion; a third extraction means 12-5 extracting a body candidate area inside an area below the first overlap portion from each of the plurality of pieces of the frame data based on a prescribed color parameter; a fourth extraction means 12-6 extracting a second overlap portion of a plurality of the body candidate areas; a second width calculation means 12-7 calculating a width of the second overlap portion; a comparison means 12-8 comparing the widths of the first overlap portion and the second overlap portion; and a recognition means 12-9 recognizing the flesh color area as the face area when a comparison result satisfies a prescribed condition.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an image processing apparatus, an image processing program, a recording medium, and an image processing method for recognizing a person's face region in a moving image composed of a plurality of frame data.

Techniques for extracting a person's face region with an image processing apparatus have been proposed. For example, Japanese Patent Application Laid-Open No. 2000-354247 (Patent Document 1) describes a technique that recognizes a moving object by detecting a motion region and then extracts a face region by extracting, from within the detected motion region, a skin color region, which is characteristic of a person's face.

There is also a method that recognizes a person's face region by using two-dimensional pattern matching to detect regions such as the eyes and nose, thereby precisely extracting the person's features.
Patent Document 1: Japanese Patent Application Laid-Open No. 2000-354247 (JP 2000-354247 A)

However, the method described in Japanese Patent Application Laid-Open No. 2000-354247 (Patent Document 1), which combines motion region detection with skin color region detection, has a problem that stems from the algorithm used to extract motion regions by motion element determination and skin color regions by skin color element determination. Specifically, an object other than a person's face that has a motion component or a skin color component may be erroneously extracted as the person's face region.

In the method that uses pattern matching to extract features, the pattern matching itself requires a certain degree of accuracy. When the pattern matching is implemented in software, the performance of the entire apparatus is therefore degraded.

The present invention has been made to solve the above problems, and its main object is to improve the recognition performance for face regions.

According to one aspect of the present invention, there is provided an image processing apparatus that recognizes a person's face region in a moving image composed of a plurality of frame data. The apparatus includes: first extraction means for extracting a skin color region from each of the plurality of frame data based on hue; second extraction means for extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data; first width calculation means for calculating the width of the first overlapping portion; third extraction means for extracting, from each of the plurality of frame data, a body candidate region within the region below the first overlapping portion based on a predetermined color parameter; fourth extraction means for extracting a second overlapping portion of the plurality of body candidate regions extracted from the respective frame data; second width calculation means for calculating the width of the second overlapping portion; comparison means for comparing the width of the first overlapping portion with the width of the second overlapping portion; and recognition means for recognizing the skin color region as a face region when the comparison result obtained by the comparison means satisfies a predetermined condition.

According to this aspect, whether to recognize the skin color region as a person's face region is determined based on the widths of the first overlapping region (for example, the motion region of a face candidate) and the second overlapping region (for example, the motion region of a shoulder candidate). This allows a precise determination that also takes the shape of the person into account, and as a result the recognition performance for face regions can be improved.

Preferably, the comparison means calculates the ratio between the width of the first overlapping portion and the width of the second overlapping portion and determines whether the ratio is within a predetermined range, and the recognition means recognizes the skin color region as a face region when the ratio is determined to be within the predetermined range.

Preferably, the comparison means determines whether the ratio is within the range of 1.4 to 1.6, and the recognition means recognizes the skin color region as a face region when the ratio is determined to be within the range of 1.4 to 1.6.
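The ratio test described above can be sketched as follows. This is only an illustrative sketch: the function name and parameters are hypothetical, and the direction of the ratio (shoulder width divided by face width) is an assumption, since the text here states only that a ratio of the two widths is compared against the range 1.4 to 1.6.

```python
def is_face_by_width_ratio(face_width, shoulder_width, low=1.4, high=1.6):
    """Return True when the width ratio falls inside [low, high].

    Assumption: the ratio is shoulder_width / face_width. The source text
    only says that a ratio of the two widths is checked against a range.
    """
    if face_width <= 0:
        # No usable face candidate width, so nothing can be recognized.
        return False
    ratio = shoulder_width / face_width
    return low <= ratio <= high


# Hypothetical widths in pixels.
print(is_face_by_width_ratio(20, 30))  # ratio 1.5, inside the range
print(is_face_by_width_ratio(20, 20))  # ratio 1.0, outside the range
```

Keeping the bounds as parameters makes it easy to try other ranges, since the 1.4 to 1.6 range is presented only as a preferred choice.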

Preferably, the predetermined color parameter is the luminance value of each pixel in the frame data.
Preferably, the image processing apparatus further includes image reduction means for reducing each of the plurality of frame data.

Preferably, the image reduction means reduces each of the plurality of frame data to 1/2ⁿ (n: integer) of its size.

Preferably, the first extraction means generates, from each of the plurality of frame data, binary frame data indicating either a skin color region or a region other than a skin color region, and the second extraction means extracts the first overlapping portion of the plurality of skin color regions extracted from the respective binary frame data.
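As a rough illustration of working with binary frame data, the sketch below takes two binary masks (1 for skin color, 0 otherwise), computes their pixel-wise intersection, and measures the horizontal extent of the result. Treating the "overlapping portion" as a plain intersection is an assumption made for illustration; the embodiment describes the motion regions in terms of differences between frames, so the actual processing may differ. All names are hypothetical.

```python
def overlap_mask(mask_a, mask_b):
    """Pixel-wise AND of two equally sized binary masks (lists of rows of 0/1)."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def region_width(mask):
    """Horizontal extent, in pixels, of all set pixels in a binary mask."""
    columns = [x for row in mask for x, value in enumerate(row) if value]
    if not columns:
        return 0
    return max(columns) - min(columns) + 1


frame1 = [[0, 1, 1, 0],
          [0, 1, 1, 1]]
frame2 = [[0, 0, 1, 1],
          [0, 1, 1, 0]]
common = overlap_mask(frame1, frame2)
print(common)
print(region_width(common))
```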

According to another aspect of the present invention, there is provided an image processing program for causing a computer to recognize a person's face region in a moving image composed of a plurality of frame data. The program causes the computer to perform the steps of: extracting a skin color region from each of the plurality of frame data based on hue; extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data; calculating the width of the first overlapping portion; extracting, from each of the plurality of frame data, a body candidate region within the region below the first overlapping portion based on a predetermined color parameter; extracting a second overlapping portion of the plurality of body candidate regions extracted from the respective frame data; comparing the width of the first overlapping portion with the width of the second overlapping portion; and recognizing the skin color region as a face region when the obtained comparison result satisfies a predetermined condition.

According to another aspect of the present invention, there is provided a computer-readable recording medium storing an image processing program for causing a computer to recognize a person's face region in a moving image composed of a plurality of frame data. The recorded program causes the computer to execute the steps of: extracting a skin color region from each of the plurality of frame data based on hue; extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data; calculating the width of the first overlapping portion; extracting, from each of the plurality of frame data, a body candidate region within the region below the first overlapping portion based on a predetermined color parameter; extracting a second overlapping portion of the body candidate regions extracted from the respective frame data; calculating the width of the second overlapping portion; comparing the width of the first overlapping portion with the width of the second overlapping portion; and recognizing the skin color region as a face region when the obtained comparison result satisfies a predetermined condition.

According to another aspect of the present invention, there is provided an image processing method using an image processing apparatus that recognizes a person's face region in a moving image composed of a plurality of frame data. The image processing apparatus includes a storage unit that stores a plurality of frame data extracted from the moving image, and a control unit that recognizes a person's face region in the frame data based on the plurality of frame data stored in the storage unit. The image processing method includes the steps of: the control unit extracting a skin color region from each of the plurality of frame data stored in the storage unit based on hue; the control unit extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data; the control unit calculating the width of the first overlapping portion; the control unit extracting, from each of the plurality of frame data stored in the storage unit, a body candidate region within the region below the first overlapping portion based on a predetermined color parameter; the control unit extracting a second overlapping portion of the body candidate regions extracted from the respective frame data; the control unit calculating the width of the second overlapping portion; the control unit comparing the width of the first overlapping portion with the width of the second overlapping portion; and the control unit recognizing the skin color region as a face region when the obtained comparison result satisfies a predetermined condition.

As described above, the present invention improves the recognition performance for face regions.

Embodiments of the present invention are described below. The present invention is not, however, limited to these embodiments. In the following description, identical parts are given identical reference numerals, and when the names and functions of the parts are the same, their detailed description is not repeated.

<Overall configuration of image processing apparatus>
First, the overall configuration of the image processing apparatus 10 according to the present embodiment will be described. The image processing apparatus 10 is an apparatus for performing various types of image processing on moving images composed of a plurality of frame data. It is intended in particular for equipment that uses moving image encoding technology and processes images expected to contain a person, such as an intercom with a TV monitor, a surveillance camera, or a camera-equipped mobile phone. The image processing functions of the image processing apparatus 10 can be realized, for example, by software executed on a computer such as a personal computer or a workstation.

In the present embodiment, as described later, the various image processing functions are realized by software executed on a computer such as a personal computer or a workstation. Instead of being realized in software, however, the function of each block and the processing of each step may each be realized by a dedicated hardware circuit or the like.

FIG. 1 is a diagram showing the hardware configuration of the image processing apparatus 10 according to the present embodiment. As shown in FIG. 1, the image processing apparatus 10 includes an internal bus 11, a CPU (image processing processor) 12, a memory (main storage device) 13, a fixed disk (external storage device) 14, a communication interface 15, an input device 16, an output device 17, an FD (Flexible Disk) drive device 18, and a CD-ROM (Compact Disk-Read Only Memory) drive device 19. Connected to the image processing apparatus 10 are an imaging device 21 for inputting a moving image composed of a plurality of frame data and a display 20 for displaying the moving image to a user or the like.

In general, the software is distributed either stored on a recording medium such as the FD 28 or the CD-ROM 29, or via a network or the like. The software is read from the recording medium by the FD drive device 18 or the CD-ROM drive device 19, or received through the communication interface 15, and stored on the fixed disk 14. It is then read from the fixed disk 14 into the memory 13 and executed by the CPU 12. In other words, the hardware of the image processing apparatus 10 shown in FIG. 1 can itself be realized by a general-purpose computer.

The imaging device 21 is an imaging means such as a small camera, an image sensor, or a CCD, and sequentially inputs the frame data forming a moving image to the image processing apparatus 10. The imaging device 21 is placed, for example, at the entrance of a house and images visitors to the house.

The display 20 is composed of a liquid crystal panel or a CRT and displays information such as images output by the CPU 12.

The CPU 12 controls each element of the image processing apparatus 10 and performs various computations. As described later, the CPU 12 performs image reduction processing, skin color region extraction processing, skin color motion region extraction processing (first overlapping portion extraction processing), face candidate position detection processing (first width calculation processing), shoulder candidate region extraction processing, shoulder candidate motion region extraction processing (second overlapping portion extraction processing), shoulder candidate position detection processing (second width calculation processing), width comparison processing, face region determination processing, and the like, and outputs the determination result to the output device 17 and the display 20 via the internal bus 11. The CPU 12 may be any device capable of executing programs that realize the various image processing functions; for example, it may be a dedicated image engine.

In the image processing apparatus 10 according to the present embodiment, the CPU 12 can be configured to cause the output device 17 to indicate, when a face is recognized, that a person is being imaged by the imaging device 21. The CPU 12 can also be configured to increase the resolution and the number of display colors of the pixels determined to belong to the face region.

The various types of image processing described later are realized, for example, by reading programs for the various processing procedures (the image processing algorithms) stored on the fixed disk 14 into the memory 13 and executing the loaded programs on the CPU 12. The CPU 12 executes the various types of image processing while exchanging data with the imaging device 21, the memory 13, the fixed disk 14, and other components connected to it via the internal bus 11.

The memory 13 is a main storage device composed of a volatile semiconductor memory device such as DRAM (Dynamic Random Access Memory), SRAM (Static Random Access Memory), or SDRAM (Synchronous DRAM).

The fixed disk (external storage device) 14 is composed of, for example, a nonvolatile magnetic recording medium such as a hard disk drive or a flexible disk, or a nonvolatile semiconductor memory device such as flash memory, and stores the computer programs that cause the CPU 12 to execute the various types of image processing.

A storage device of another network device 23 can also be used as the external storage device of the image processing apparatus 10 through the network 22, which is connected by wire or wirelessly via the internal bus 11 and the communication interface 15. The computer program may also be stored on a portable recording medium such as the FD 28, the CD-ROM 29, a DVD, a hard disk, an optical disk, a magneto-optical disk, a magnetic tape, a nonvolatile memory card, or another nonvolatile memory. In that case, when the recording medium is mounted in the image processing apparatus 10, the apparatus may read the program code stored on the medium and load the program into the memory 13 for the CPU 12 to execute. Alternatively, when a semiconductor memory or the like that allows high-speed access is used as the recording medium, the CPU 12 may execute the program while reading the program code directly from the medium.

The input device 16 includes, for example, a mouse that accepts information from the user through clicking or sliding, and a keyboard that accepts information from the user through key input.

The output device 17 may be a display or printer provided directly in the image processing apparatus 10 for displaying the input moving image from the imaging device 21, or it may simply output the position information calculated by the CPU 12 as an electrical signal to an external device (the external display 20). The output device 17 may also display the face region determined by the CPU 12, together with information indicating the recognized face region, superimposed on the input moving image, or may turn on a light when it is determined that a face region has been recognized. As described above, the determination result of the CPU 12 and the video and audio data of the input moving image captured by the imaging device 21 may instead be output to a device external to the image processing apparatus 10 (for example, the external display 20).

The communication interface 15 converts information output by the CPU 12 into electrical signals, that is, into signals that other devices can use. It also receives information input from outside the computer according to the present embodiment and converts it into information that the CPU 12 can use.

In other words, the determination result obtained by the CPU 12 can be output to the network 22, which is connected by wire or wirelessly via the internal bus 11 and the communication interface 15, and from there to an external device (display device) connected to another network device 23. Conversely, a video signal for a moving image captured by an imaging device that is not directly connected to the internal bus 11 may be received from another network device 23 over the network 22 and passed to the CPU 12 through the communication interface 15 and the internal bus 11, so that the video signal can be used as input moving image data.

The image processing apparatus 10 according to the present embodiment is configured to access the external display 20 and the imaging device 21, but it is not limited to this form. As described above, the image processing apparatus 10 may itself be equipped with a moving image display device such as the display 20, with the CPU 12 and the display 20 connected by the internal bus 11. Likewise, the image processing apparatus 10 may be equipped with a moving image input device such as the imaging device 21, with the CPU 12 and the imaging device 21 connected by the internal bus 11.

<Functional configuration of image processing apparatus>
FIG. 2 is a functional block diagram showing the functional configuration of the image processing apparatus 10. As shown in FIG. 2, the image processing apparatus 10 according to the present embodiment includes a storage unit 13-1, an image reduction unit 12-1, a skin color region extraction unit (first extraction means) 12-2, a skin color motion region extraction unit (second extraction means) 12-3, a face candidate position detection unit (first width calculation means) 12-4, a shoulder candidate region extraction unit (third extraction means) 12-5, a shoulder candidate motion region extraction unit (fourth extraction means) 12-6, a shoulder candidate position detection unit (second width calculation means) 12-7, a width comparison unit (comparison means) 12-8, a face region determination unit (recognition means) 12-9, and an output unit 17-1.

The image processing apparatus 10 according to the present embodiment is connected to, for example, the imaging device 21, and is configured so that the frame data constituting the moving image captured by the imaging device 21 are stored sequentially in the memory 13 and then input sequentially to the image reduction unit 12-1. The image processing apparatus 10 is also connected to, for example, the display 20, and is configured so that the result determined by the CPU 12 is output from the output unit 17-1 to the display 20.

<Description of each function>
Each function is described below. First, the image reduction unit 12-1 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. That is, the program stored on the fixed disk 14 is first read into the memory 13, and the CPU 12 executes it sequentially while reading it from the memory 13, thereby realizing the image reduction function described below. In this way, the image reduction unit 12-1 reduces each of the plurality of frame data constituting the moving image input from the imaging device 21.

When extracting the skin color motion region and the shoulder candidate motion region, the image processing apparatus 10 according to the present embodiment uses the difference between two frame data captured at different times. If noise occurs at that point, the noise may be mistaken for a motion region (component). Having the CPU 12 first reduce the image of each frame data and then extract the motion region reduces the inter-frame differences caused by noise and thus reduces such misidentification. In other words, even when a difference due to noise does occur, it is less likely to be mistaken for a motion region. Image reduction also decreases the amount of data in each frame data, which lowers the computational load on the CPU 12.

There are various methods of image reduction. In the present embodiment, in order to reduce only the noise components with a minimum amount of computation while retaining the necessary data, the image reduction ratio is set to 2ⁿ (n: an integer of 2 or more). That is, the image reduction unit 12-1 has a function of reducing each of the plurality of frame data to 1/2ⁿ (n: integer) of its size. In the present embodiment, each reduced frame data is computed by averaging the data values of the frame data before reduction.

For example, when the reduction ratio is 2 (n = 1), let g(x, y) be the pixel data at coordinates (x, y) before reduction and f(x, y) be the pixel data after reduction. The reduced pixel data f(x, y) can then be obtained by the following equation. More specifically, the CPU 12 averages four pixel data and outputs the result as one pixel data.
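The equation itself is not reproduced in this text. One plausible reconstruction, assuming that each output pixel is the mean of a non-overlapping 2×2 block of input pixels and that coordinates are indexed from the block's top-left corner (both assumptions, since only "averaging four pixel data" is stated), is:

```latex
f(x, y) = \frac{g(2x,\,2y) + g(2x+1,\,2y) + g(2x,\,2y+1) + g(2x+1,\,2y+1)}{4}
```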

In the image processing apparatus 10, which includes a processor (CPU 12) with a shift operation function, division (or multiplication) by an integer expressible as 2ⁿ (n: integer) can be performed with a single shift operation, so the arithmetic processing by the processor can be kept to a minimum.
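The reduction equation itself is not reproduced in this text. As a minimal sketch of the 2x2 averaging and single-shift division described above, assuming integer pixel values stored as nested lists (the function name `reduce_by_half` and the sample frame are illustrative, not from the source):

```python
def reduce_by_half(frame):
    """Halve width and height by averaging each 2x2 block of integer pixels."""
    height, width = len(frame), len(frame[0])
    reduced = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            total = (frame[y][x] + frame[y][x + 1]
                     + frame[y + 1][x] + frame[y + 1][x + 1])
            row.append(total >> 2)  # divide by 4 = 2**2 with one right shift
        reduced.append(row)
    return reduced


frame = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [0, 4, 8, 12],
    [4, 8, 12, 16],
]
small = reduce_by_half(frame)
print(small)  # [[25, 45], [4, 12]]
```

A larger reduction ratio could be obtained by applying the same step repeatedly; whether the apparatus does so is not stated in the source.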

The skin-color region extraction unit (first extraction means) 12-2 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. The skin-color region extraction unit 12-2 extracts a skin-color region (skin-color component), based on hue, from each of the plurality of frame data reduced by the image reduction unit 12-1. The configuration is not limited to extracting the skin-color region from hue alone; the image processing apparatus 10 may combine hue with other methods.

In the present embodiment, skin-color region extraction is based on hue calculation. Hue is one parameter of the HSV (Hue, Saturation, Value) color space model. It is expressed as 0 to 360 degrees and indicates the kind of color, distinguished by the wavelength of light (for example, the distinction between red and yellow). Image data handled by a computer is generally in R, G, B format. When the input image consists of the three components R, G, and B, the hue Hu can be obtained from the following equation, where R, G, and B are values in the range 0 to 1, MAX is the maximum of the R, G, and B values, and MIN is their minimum.
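The hue equation is not reproduced in this text. The sketch below uses the standard RGB-to-HSV hue conversion, which is built from the same quantities named above (R, G, B in 0 to 1, MAX, MIN); that it matches the patent's equation term for term is an assumption, and the function name `hue_degrees` is illustrative.

```python
def hue_degrees(r, g, b):
    """Return the HSV hue in degrees (0 to 360) for r, g, b in the range 0..1."""
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        return 0.0  # achromatic pixel: hue is undefined, 0 is used by convention
    delta = mx - mn
    if mx == r:
        h = 60.0 * ((g - b) / delta)
    elif mx == g:
        h = 60.0 * (2.0 + (b - r) / delta)
    else:
        h = 60.0 * (4.0 + (r - g) / delta)
    return h % 360.0  # wrap negative angles into the 0..360 range


print(hue_degrees(1.0, 0.5, 0.0))  # 30.0, an orange tone
```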

Experiments to date have shown that a pixel can be regarded as skin color if its Hu value is in the range of 6 to 38. However, because the image processing apparatus 10 according to the present embodiment also has other means of determining whether an image is a human face, the Hu range may be set wider. For example, a pixel may be regarded as skin color if its Hu value is within the range of 0 to 45.

FIG. 3 is a conceptual diagram showing one frame extracted from a moving image. FIG. 4 is a conceptual diagram showing two frames extracted from a moving image, superimposed. FIG. 5 is a conceptual diagram showing the skin-color regions 301a of two frames extracted from a moving image, superimposed. As shown in FIGS. 3 to 5, in the present embodiment the skin-color region extraction unit 12-2 generates binary frame data by setting, for each pixel of each frame, "1" when the hue (Hu) value is in the range of 6 to 38 (region 301a in FIGS. 4 and 5) and "0" when it is not (the area other than region 301a in FIGS. 4 and 5). In other words, the skin-color region extraction unit 12-2 extracts the skin-color region by generating binary frame data in which the skin-color region is "1".
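The binarization described above can be sketched as follows. This is an illustration under stated assumptions: pixels are (R, G, B) tuples with components in 0 to 1, the hue comes from the Python standard library's `colorsys.rgb_to_hsv` rather than from the patent's own equation, and the function name `skin_mask` is hypothetical.

```python
import colorsys


def skin_mask(frame_rgb, hue_min=6.0, hue_max=38.0):
    """Return a binary frame: 1 where the hue lies in [hue_min, hue_max], else 0."""
    mask = []
    for row in frame_rgb:
        out = []
        for r, g, b in row:
            hue = colorsys.rgb_to_hsv(r, g, b)[0] * 360.0  # fraction of a turn -> degrees
            out.append(1 if hue_min <= hue <= hue_max else 0)
        mask.append(out)
    return mask


# One row: a skin-like tone (hue about 15), a blue pixel, and a gray pixel.
frame = [[(0.9, 0.6, 0.5), (0.2, 0.4, 0.9), (0.5, 0.5, 0.5)]]
print(skin_mask(frame))  # [[1, 0, 0]]
```

Passing `hue_min=0.0, hue_max=45.0` gives the wider range mentioned in the text.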

In the present embodiment, as described above, a motion region is extracted by computing the difference between the current frame and a frame one or more frames earlier. Before this computation, a binary image (binary frame data) indicating the region that has the required color component is generated from each of the two frames, and the difference is computed only in the required area.

In this case, because each frame of the moving image is first converted into binary frame data, the difference between two frames can be obtained with only an AND operation or an OR operation, which reduces the amount of computation by the CPU 12.

The skin-color motion region extraction unit 12-3 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. The skin-color motion region extraction unit 12-3 extracts the first overlapping portion (region 301b in FIG. 5) of the skin-color regions (region 301a in FIG. 5) extracted from each of the plurality of frame data.

As shown in FIGS. 4 and 5, the skin-color motion region extraction unit 12-3 takes two of the binary frames produced by the skin-color region extraction unit 12-2 (in which the skin-color region is "1"), namely the latest frame and an earlier frame, and applies an AND operation to them to extract the skin-color motion region (region 301b in FIG. 5). As described above, in the frame data generated by the skin-color region extraction unit 12-2, pixels in the skin-color portion are set to "1" and pixels outside it are set to "0" based on the hue calculation. Computing the region where the "1" values of the two frames overlap therefore yields the skin-color motion region 301b.
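The AND step above can be illustrated with a short sketch. The two small masks are made-up sample data and the function name `overlap` is illustrative; binary frames are assumed to be nested lists of 0 and 1.

```python
def overlap(mask_a, mask_b):
    """Pixel-wise AND of two binary frames of equal size."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


previous = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
current = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
moving = overlap(previous, current)
print(moving)  # [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
```

Only the column where the skin-color region of both frames is "1" survives, which corresponds to region 301b in the description.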

That is, in the image processing apparatus 10 according to the present embodiment, the skin-color region extraction unit 12-2 extracts the skin-color region 301a by generating, from each of the plurality of frame data, binary frame data consisting of the skin-color region 301a and the area other than the skin-color region 301a, and the skin-color motion region extraction unit 12-3 extracts the first overlapping portion 301b of the skin-color regions 301a extracted from the respective binary frame data.

The face candidate position detection unit (first width calculation unit) 12-4 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. As shown in FIGS. 2 to 5, the face candidate position detection unit 12-4 derives an approximating rectangular region (region 302 in FIGS. 3 and 5) from the skin-color motion region (region 301b in FIGS. 3 and 5) extracted by the skin-color motion region extraction unit 12-3, and obtains the position information and the width W_f of that rectangular region. That is, the face candidate position detection unit 12-4 calculates the width W_f of the skin-color motion region 301b.

Here, the width W_f refers to the actual horizontal length of the skin-color motion region 301b at the time it was captured by the imaging device 21. It does not specify a direction at the time of display or image processing. In other words, "width" means the horizontal length of the face or torso of the person being imaged, and is unrelated to the horizontal or vertical direction of the frame data output from the imaging device 21.

The approximating rectangular region that contains the skin-color motion region 301b (region 302 in FIGS. 3 and 5) is obtained as follows. First, for the skin-color motion region 301b obtained from two binary frames, the coordinates S(X_fmax, Y_fmax) are defined from the maximum horizontal position X_fmax and the maximum vertical position Y_fmax of the region, and the coordinates T(X_fmin, Y_fmin) are defined from the minimum horizontal position X_fmin and the minimum vertical position Y_fmin. The width W_f of the rectangular region 302 is then approximated by the following equation.

Likewise, the height H_f of the rectangular region 302 is approximated by the following equation.
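The equations for W_f and H_f are not reproduced in this text. The sketch below assumes they are the simple differences W_f = X_fmax - X_fmin and H_f = Y_fmax - Y_fmin, which is the most direct reading of the definitions of S and T above but remains an assumption. The function name `bounding_box` and the sample mask are illustrative.

```python
def bounding_box(mask):
    """Return corners S, T and the width/height of the '1' pixels, or None if empty."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    return {
        "S": (x_max, y_max),   # (X_fmax, Y_fmax)
        "T": (x_min, y_min),   # (X_fmin, Y_fmin)
        "W_f": x_max - x_min,  # assumed form of the width equation
        "H_f": y_max - y_min,  # assumed form of the height equation
    }


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]
box = bounding_box(mask)
print(box)  # {'S': (4, 3), 'T': (1, 1), 'W_f': 3, 'H_f': 2}
```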

To make the result less susceptible to noise and similar effects, the centroid (X_fg, Y_fg) of the skin-color motion region 301b may be obtained together with its maximum horizontal and vertical positions (X_fmax, Y_fmax) and its minimum horizontal and vertical positions (X_fmin, Y_fmin), and the center of the rectangular region 302 may be placed at that centroid (X_fg, Y_fg). In this case, the width W_f of the rectangular region 302 is calculated by the following equation.

Here, MIN is a function that returns the smaller of its two arguments. In this case, the two diagonally opposite corners of the rectangle, which serve as the position information, are defined as U(X_fg + W_f/2, Y_fg + H_f/2) and V(X_fg - W_f/2, Y_fg - H_f/2).

The height H_f is also preferably calculated by way of the centroid, in the same manner as in (Equation 5).

By obtaining the width W_f and the height H_f with reference to the centroid in this way, the influence of outlying components can be reduced even when noise or a similar effect causes part of the extracted skin-color motion region 301b to be detected far from the true skin-color motion region 301b. As a result, the processing becomes less susceptible to noise in the video.

More specifically, suppose noise that can be regarded as skin color occurs at one edge of the frame data (moving image). With the configuration described earlier, which simply takes the center of the rectangular region 302, the center of the skin-color motion region 301b would shift toward the side where the noise occurred. In the image processing apparatus 10 according to the present embodiment, however, the width W_f is calculated based on (Equation 6), so the change in W_f is reduced even if noise affects either X_fmax or X_fmin. In general, when determining a position, using the centroid is less affected by noise than using the average of the maximum and minimum values.
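The centroid-based equation is not reproduced in this text. Given that it uses a two-argument MIN and that the rectangle is centered on the centroid with half-width W_f/2, one plausible form is W_f = 2 x MIN(X_fmax - X_fg, X_fg - X_fmin). The sketch below compares that assumed form with the plain extent on a row containing one stray noise pixel; the formula, the function name `widths`, and the sample data are all assumptions for illustration.

```python
def widths(mask):
    """Return (x_centroid, plain_width, centroid_based_width) for the '1' pixels."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not xs:
        return None
    x_g = sum(xs) / len(xs)                     # centroid X_fg
    x_min, x_max = min(xs), max(xs)
    plain = x_max - x_min                       # simple extent
    robust = 2 * min(x_max - x_g, x_g - x_min)  # assumed centroid-based form
    return x_g, plain, robust


clean = [[0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]]
noisy = [[0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]]  # stray pixel at x = 11

_, plain_clean, robust_clean = widths(clean)
_, plain_noisy, robust_noisy = widths(noisy)
print(plain_clean, robust_clean)  # 4 4.0
print(plain_noisy, robust_noisy)  # plain jumps to 10, robust grows far less
```

In this example the single noise pixel more than doubles the plain extent, while the centroid-based width grows only to about 6.7, which illustrates the robustness argument made above.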

The shoulder candidate region extraction unit 12-5 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. For each of the plurality of frame data, the shoulder candidate region extraction unit 12-5 sets a shoulder candidate extraction target area (region 303 in FIG. 4) within the area below the skin-color motion region (first overlapping portion) 301b.

As a result, the shoulder candidate motion region extraction unit 12-6, described later, can extract the shoulder candidate region 306a and the shoulder candidate motion region 306b only within a rectangular area of a specific size (the shoulder candidate extraction target area) located below the skin-color motion region 301b. This area is defined, within the frame data reduced by the image reduction unit 12-1, from the position information obtained by the face candidate position detection unit (face candidate width calculation unit) 12-4.

This is because, for a person standing upright and facing forward, the shoulders are expected to lie within a predetermined area below the face (region 303 in FIGS. 3 and 4 in the present embodiment). That is, when a person's face is present in the skin-color motion region obtained by the skin-color motion region extraction unit 12-3 (region 301b in FIGS. 3 and 5), that person's shoulders are normally located below the face. In addition, the size of the shoulder region is bounded to some extent by the size of the face region.

FIG. 6 is a conceptual diagram showing the extraction target area 303 used to extract the shoulder candidate region 306a. FIG. 7 is a conceptual diagram showing the extracted shoulder candidate region 306a. As shown in FIG. 6, in the present embodiment, when the shoulder candidate region 306a is extracted in order to obtain the shoulder candidate motion region 306b, the extraction target area 303 can be restricted to a certain extent, and this restriction reduces the amount of computation by the CPU 12.

More specifically, the shoulder candidate region extraction unit 12-5 sets the extraction target area 303 below the skin-color motion region 301b based on the coordinate values of S(X_fmax, Y_fmax) and T(X_fmin, Y_fmin) calculated by the face candidate position detection unit 12-4, or based on the coordinate values of T(X_fmin, Y_fmin) and the value of the width W_f.

As shown in FIG. 6, in the present embodiment the extraction target area 303 is located below and adjacent to the skin-color motion region 301b and has the same horizontal center as the skin-color motion region 301b. The height of the extraction target area 303 is the height H_f of the skin-color motion region 301b multiplied by a predetermined coefficient B (for example, B = 1), and its width is the width W_f of the skin-color motion region 301b multiplied by a predetermined coefficient C (for example, C = 2.2).

More specifically, the shoulder candidate region extraction unit 12-5 calculates the maximum value X_smax of the extraction target area 303 in the X direction based on the following equation.

The shoulder candidate region extraction unit 12-5 then calculates the minimum value X_smin of the extraction target area in the X direction based on the following equation.

The shoulder candidate region extraction unit 12-5 then calculates the maximum value Y_smax of the extraction target area in the Y direction based on the following equation.

The shoulder candidate region extraction unit 12-5 then calculates the minimum value Y_smin of the extraction target area in the Y direction based on the following equation.
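The four equations are not reproduced in this text. The sketch below derives the bounds from the geometric description alone: same horizontal center as the face rectangle, width C x W_f, height B x H_f, placed directly below the face. It assumes image coordinates in which Y increases downward, so "below" means larger Y; the source does not state its axis convention. The function name `shoulder_search_area` is illustrative.

```python
def shoulder_search_area(x_fmin, x_fmax, y_fmin, y_fmax, b=1.0, c=2.2):
    """Bounds of the extraction target area, assuming Y grows downward."""
    w_f = x_fmax - x_fmin
    h_f = y_fmax - y_fmin
    x_center = (x_fmin + x_fmax) / 2  # shared horizontal center
    return {
        "X_smin": x_center - c * w_f / 2,
        "X_smax": x_center + c * w_f / 2,
        "Y_smin": y_fmax,             # top edge touches the bottom of the face box
        "Y_smax": y_fmax + b * h_f,
    }


area = shoulder_search_area(x_fmin=40, x_fmax=60, y_fmin=20, y_fmax=50)
print(area)  # X from about 28 to 72, Y from 50 to 80
```

With an upward-pointing Y axis the roles of Y_smin and Y_smax would be mirrored around the bottom of the face rectangle.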

The shoulder candidate region extraction unit 12-5 also extracts, from each of the plurality of frame data and based on a predetermined color parameter, the shoulder candidate region 306a within the area (extraction target area 303) below the skin-color motion region (first overlapping portion) 301b. To do so, the CPU 12 first converts each frame into binary frame data, in which pixels whose luminance is above a certain level (threshold) are "1" and pixels below it are "0". That is, in the present embodiment, the predetermined color parameter is the luminance value of each pixel in each frame.

FIG. 8 is a conceptual diagram showing, for each frame, the distribution of luminance values against the number of pixels having each luminance value. The threshold is set, for example, as follows. When a captured image such as the one in FIG. 3 is input, the extraction target area 303 contains both the person's shoulders and the background (everything other than the shoulders), so the distribution of luminance values against pixel counts is expected to have the shape shown in FIG. 8. In the present embodiment, the luminance value corresponding to the valley of this distribution is therefore set as the threshold.
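The source does not say how the valley is located. One simple way, sketched below purely as an assumption, is to take the tallest histogram bin, then the tallest bin at least `min_separation` levels away from it, and return the emptiest bin between the two (ties broken toward the midpoint). The function name `valley_threshold`, the parameter `min_separation`, and the tie-breaking rule are hypothetical and not from the source.

```python
def valley_threshold(luminances, levels=256, min_separation=32):
    """Pick a threshold at the valley between the two main histogram peaks."""
    hist = [0] * levels
    for v in luminances:
        hist[v] += 1
    first = max(range(levels), key=hist.__getitem__)   # tallest peak
    far = [i for i in range(levels) if abs(i - first) >= min_separation]
    second = max(far, key=hist.__getitem__)            # tallest peak far from it
    lo, hi = sorted((first, second))
    mid = (lo + hi) / 2
    # Emptiest bin between the peaks; ties go to the bin nearest the midpoint.
    return min(range(lo, hi + 1), key=lambda i: (hist[i], abs(i - mid)))


# Dark clothing around levels 40-45, bright background around 200-210.
pixels = [40] * 50 + [45] * 30 + [200] * 40 + [210] * 20
threshold = valley_threshold(pixels)
print(threshold)  # 120
```

On real images the histogram would normally be smoothed first; that step is omitted here to keep the sketch short.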

As described above, the CPU 12 converts each frame into binary frame data based on the luminance value of each pixel and the threshold. In each frame, "1" is output for pixels in the shoulder candidate region 306a and "0" for pixels elsewhere. In the present embodiment, as noted earlier, the computation for obtaining the shoulder candidate region 306a is performed only within the extraction target area 303.

The shoulder candidate motion region extraction unit 12-6 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. The shoulder candidate motion region extraction unit 12-6 obtains the shoulder candidate motion region 306b within the extraction target area 303 set by the shoulder candidate region extraction unit 12-5. It does so by extracting the shoulder candidate motion region 306b (second overlapping portion) from the shoulder candidate regions 306a extracted from each of the plurality of frame data.

The method of calculating the shoulder candidate motion region 306b is as follows. The region is extracted using the difference between two frames captured at different times. Motion in the imaged subject produces a difference between the two frames. In the present embodiment, as explained for the skin-color motion region extraction, the CPU 12 first converts each frame into binary frame data and then computes the difference between the two frames, in order to reduce the amount of computation.

FIG. 9 is a conceptual diagram in which two binary frames are superimposed for a case where the imaged subject is moving. That is, FIG. 9 shows two frames captured at different times laid over each other. As also shown in FIGS. 4 and 5, the subject appears displaced between two frames captured at different times. As described above, in the present embodiment the computation of the difference between frames for obtaining the shoulder candidate motion region 306b is performed only within the extraction target area 303.

More specifically, the CPU 12 computes the logical product (AND) of the two frames. As a result, only the overlapping portion 306b of the shoulder candidate regions 306a is output as "1" (the dotted region in FIG. 9). An exclusive OR (XOR) may be used instead of the logical product (AND).
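Restricting the AND to the extraction target area can be sketched as follows. The rectangle is given by inclusive pixel bounds, binary frames are nested lists of 0 and 1, and the function name `overlap_in_area` and the sample masks are illustrative assumptions.

```python
def overlap_in_area(mask_a, mask_b, x_min, x_max, y_min, y_max):
    """AND two binary frames inside the inclusive rectangle only; 0 elsewhere."""
    out = [[0] * len(row) for row in mask_a]
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            out[y][x] = mask_a[y][x] & mask_b[y][x]
    return out


mask_a = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
mask_b = [
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
result = overlap_in_area(mask_a, mask_b, x_min=1, x_max=3, y_min=1, y_max=2)
print(result)  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 1]]
```

Pixels outside the rectangle are never visited, which is where the saving in computation comes from.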

The shoulder candidate motion region 306b is obtained in the same way as the skin-color motion region 301b, except that the conversion to binary frame data uses the luminance value rather than the hue Hue of (Equation 2). Because the predetermined color parameter in the present embodiment is the luminance value of each pixel in each frame, the threshold is preferably varied according to the surroundings in which the image is captured.

The shoulder candidate position detection unit 12-7 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. The shoulder candidate position detection unit 12-7 derives an approximating rectangular region that encloses the shoulder candidate motion region 306b (region 305 in FIGS. 3 and 9) and obtains the position and width W_s of that rectangular region. That is, the shoulder candidate position detection unit 12-7 calculates the width W_s of the shoulder candidate motion region 306b (second overlapping portion). In the present embodiment, the distance between the leftmost part (line D in FIG. 4) and the rightmost part (line E in FIG. 4) of the shoulder candidate motion region 306b obtained by the extraction process is the width W_s. The method of calculating W_s is the same as that for the width W_f of the skin-color motion region 301b (see (Equation 3)), so the description is not repeated here.

More specifically, the horizontal motion amount Δd shown in FIG. 4 should in principle be taken into account when calculating the width W_s. Because Δd is small compared with W_s, however, Δd = 0 is assumed in the present embodiment. If the time interval between the two frames is set large when the overlapping portion is obtained, Δd can no longer be ignored, and the calculated value of W_s is preferably corrected.

The width comparison unit (comparison means) 12-8 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. The width comparison unit 12-8 compares the width W_f of the skin-color motion region (first overlapping portion) 301b with the width W_s of the shoulder candidate motion region (second overlapping portion) 306b. In other words, the width comparison unit 12-8 obtains a comparison value from the width of the skin-color motion region 301b calculated by the face candidate position detection unit 12-4 (the face candidate width W_f in FIGS. 3 and 5) and the width of the shoulder candidate motion region 306b calculated by the shoulder candidate position detection unit 12-7 (the width W_s in FIGS. 3 and 9).

In the present embodiment, the comparison value is W_s / W_f, obtained by dividing the width W_s of the shoulder candidate motion region 306b found by the shoulder candidate position detection unit 12-7 by the width W_f of the skin-color motion region 301b found by the face candidate position detection unit 12-4.

The face region determination unit (recognition means) 12-9 is realized, for example, by the CPU 12 executing a program read from the fixed disk 14 into the memory 13. The face region determination unit 12-9 recognizes the skin-color region 301a as a face region when the comparison value (comparison result) W_s / W_f obtained by the width comparison unit 12-8 satisfies a predetermined condition. That is, if the comparison value W_s / W_f is within a preset range, the face region determination unit 12-9 determines that the position obtained by the face candidate position detection unit (first width calculation unit) 12-4 (region 302 containing region 301b in FIG. 3) is where a person's face is located. At the same time, it determines that the position obtained by the shoulder candidate position detection unit 12-7 (region 305 enclosing region 306b in FIG. 3) is where the shoulders are located.

In the present embodiment, when the comparison value W_s / W_f is in the range of 1.4 to 1.6, the CPU 12 determines that the comparison value (comparison result) satisfies the predetermined condition and recognizes the skin-color region 301a as a person's face region. Note that this range of 1.4 to 1.6 was obtained in one particular experimental environment and is not necessarily suitable for every situation.

The face region determination unit 12-9 may also be configured to decide whether to recognize the skin-color region 301a as a face region based on the width W_f of the skin-color motion region (first overlapping portion) 301b and the width W_s of the shoulder candidate motion region (second overlapping portion) 306b. More specifically, the face region determination unit 12-9 may calculate the ratio W_s / W_f of the two widths, determine whether the ratio is within a predetermined range, and recognize the skin-color region 301a as a face region when it is.

In the present embodiment, the face area determination unit 12-9 calculates the ratio W_s / W_f by dividing the width W_s of the shoulder candidate motion region (second overlapping portion) 306b by the width W_f of the skin color motion region (first overlapping portion) 301b, and recognizes the skin color region 301a as a face area when the ratio W_s / W_f is determined to be within the range of 1.4 to 1.6.
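The ratio test described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_face_region` and its parameters are hypothetical, and treating the bounds as exclusive is an assumption taken from the strict inequality used in the flowchart description.

```python
def is_face_region(w_f: float, w_s: float,
                   low: float = 1.4, high: float = 1.6) -> bool:
    """Return True when the ratio W_s / W_f lies strictly between low and high.

    w_f: width of the skin color motion region (first overlapping portion)
    w_s: width of the shoulder candidate motion region (second overlapping portion)
    The default bounds are the experimentally obtained 1.4 and 1.6.
    """
    if w_f <= 0:
        # No skin color motion region was found, so there is nothing to classify
        # (this guard is an assumption, not part of the described processing).
        return False
    ratio = w_s / w_f
    return low < ratio < high
```

Because the range was obtained in one experimental environment, `low` and `high` are left as parameters rather than hard-coded.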

In the width comparison processing and the face area determination processing, which compare the widths W_f and W_s of the skin color motion region 301b and the shoulder candidate motion region 306b obtained by the face candidate position detection unit 12-4 and the shoulder candidate position detection unit 12-7, the comparison value W_s / W_f may instead be obtained after calculating the average values of the widths W_f and W_s of the skin color motion region 301b and the shoulder candidate motion region 306b over one or more past evaluations. In this case as well, when the comparison value W_s / W_f calculated from the average values is within the range of 1.4 to 1.6, the skin color region 301a can be determined to be a face area, or it can be determined that a person is being imaged by the imaging device 21.

For example, consider obtaining the average A₂ over the past two evaluations. Let W_f(n) and W_s(n) denote the width of the skin color motion region 301b and the width of the shoulder candidate motion region 306b, respectively, obtained when the n-th frame data and the (n+1)-th frame data of the moving image are evaluated. The comparison value based on the average values is then calculated by the CPU 12 according to (Equation 10).

Similarly, when the average over the past m evaluations is obtained (where m is an integer greater than 2), the comparison value based on the average values is calculated by the CPU 12 according to (Equation 11).
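The equations for the average-based comparison value are not reproduced in this text. A form consistent with the description, offered as an assumption rather than as the original notation, is to average each width over the past m evaluations and then take the ratio. The symbols \bar{W}_s, \bar{W}_f, and R_m are introduced here for illustration only, and the indexing over past evaluations n, n-1, ... is likewise assumed.

```latex
\bar{W}_s = \frac{1}{m}\sum_{k=0}^{m-1} W_s(n-k), \qquad
\bar{W}_f = \frac{1}{m}\sum_{k=0}^{m-1} W_f(n-k), \qquad
R_m = \frac{\bar{W}_s}{\bar{W}_f}
    = \frac{\sum_{k=0}^{m-1} W_s(n-k)}{\sum_{k=0}^{m-1} W_f(n-k)}
```

Under this assumption the case m = 2 reduces to (W_s(n) + W_s(n-1)) / (W_f(n) + W_f(n-1)), because the factor 1/m cancels between numerator and denominator.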

This keeps detection errors caused by noise and the like to a minimum.

The equations (Equation 10) and (Equation 11) for calculating the comparison value calculate average values from consecutive frame data in the moving image first and then calculate the comparison value. Alternatively, the comparison values may be calculated first and their average taken afterwards.
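The two orders of operation generally give slightly different values. The following Python sketch contrasts them; the function names and the sample widths are hypothetical and chosen only for illustration.

```python
def ratio_of_means(w_f, w_s):
    """Average each width over the evaluations first, then take W_s / W_f."""
    mean_f = sum(w_f) / len(w_f)
    mean_s = sum(w_s) / len(w_s)
    return mean_s / mean_f


def mean_of_ratios(w_f, w_s):
    """Take W_s / W_f for each evaluation first, then average the ratios."""
    ratios = [s / f for s, f in zip(w_s, w_f)]
    return sum(ratios) / len(ratios)


# Hypothetical widths (in pixels) from two successive evaluations.
face_widths = [40, 50]
shoulder_widths = [60, 70]
print(ratio_of_means(face_widths, shoulder_widths))   # 65 / 45
print(mean_of_ratios(face_widths, shoulder_widths))   # (1.5 + 1.4) / 2
```

Averaging the widths first weights each evaluation by its face width, whereas averaging the ratios weights every evaluation equally.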

Alternatively, the comparison value may be calculated after averaging over non-consecutive frame data, for example by skipping one frame or two frames at a time. The choice is preferably made according to the frame interval (skipping one frame, skipping two frames, and so on) best suited to the processing described above, which varies with the hardware environment, such as the frame rate of the imaging device 21 and the performance of the CPU 12.
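Selecting frames at such an interval amounts to taking every (skip + 1)-th frame. A minimal Python sketch, in which the function name `select_frames` and the `skip` parameter are hypothetical:

```python
def select_frames(frames, skip=0):
    """Return every (skip + 1)-th frame, starting from the first.

    skip=0 keeps all frames, skip=1 skips one frame between picks,
    skip=2 skips two frames between picks.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    return frames[::skip + 1]
```

For example, with frames numbered 0 to 6, `select_frames(frames, 1)` keeps frames 0, 2, 4, and 6.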

In the image processing apparatus 10 according to the present embodiment, as described above, the extraction target area 303 is defined as a rectangular region whose width is 2.2 times and whose height is 1.0 times that of the rectangular region 302 containing the skin color motion region 301b. However, the extraction target area 303 need only be a region below the skin color motion region 301b. The extraction target area 303 is preferably set to a region wider than a normal shoulder width.
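As a sketch of how such an area might be derived from the rectangular region 302, the Python below scales the face rectangle by 2.2 horizontally and 1.0 vertically. Placing the area immediately beneath the face rectangle and centering it horizontally are assumptions, as are the `Rect` type and the function name. Image coordinates with Y increasing downward are also assumed, and clipping to the frame boundary is omitted.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def extraction_target_area(face: Rect,
                           width_scale: float = 2.2,
                           height_scale: float = 1.0) -> Rect:
    """Rectangle directly below `face`, scaled relative to it and centered on it."""
    face_w = face.x_max - face.x_min
    face_h = face.y_max - face.y_min
    center_x = (face.x_min + face.x_max) / 2
    half_w = width_scale * face_w / 2
    top = face.y_max  # Y grows downward, so "below" starts at the face's bottom edge
    return Rect(center_x - half_w, top, center_x + half_w, top + height_scale * face_h)
```

A face rectangle 40 pixels wide and 50 pixels tall would give an area about 88 pixels wide and 50 pixels tall under these assumptions.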

The image processing apparatus 10 according to the present embodiment determines whether the skin color region 301a is a face area by using the shoulder portion that should exist below the face. However, the determination is not limited to the shoulder portion; it may instead use any part of the body that should exist below the face.

Here, if the person or other subject being imaged has an exposed neck, the neck is included in the skin color region 301a and the skin color motion region 301b, so the positions of the extraction target area 303 and the shoulder candidate motion region 306b also shift downward. However, even if the extraction target area 303 shifts downward, the width W_s of the shoulder candidate motion region 306b changes little, so the effect on the face area determination is small. Furthermore, as described later, when the method of extracting the skin color motion region 301b by way of the center of gravity is adopted, the area of the neck is small compared with that of the face. Even if the neck is skin-colored, the extent to which the extraction target area 303 and the shoulder candidate motion region 306b shift downward can therefore be reduced, which further reduces the influence of the neck.

<Face area recognition processing>
The face area recognition processing according to the present embodiment is described below. FIG. 10 is a flowchart showing the face area recognition processing. FIG. 11 is an image diagram conceptually showing the flow of the face area recognition processing.

As shown in FIG. 10, frame data F(n) constituting a moving image is first input from the imaging device 21 to the image processing apparatus 10, and the CPU 12 sequentially stores the frame data F(n) in the memory 13 (step 100; hereinafter, "step" is abbreviated as S). Next, the image reduction unit 12-1, realized by the CPU 12 and the like, reads the frame data F(n) stored in the memory 13, applies reduction processing, and stores the reduced frame data F(n) in the memory 13 again (S102).
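The reduction method used in S102 is not detailed in this passage. As one possible illustration, the Python sketch below shrinks a frame by a factor of 1/2ⁿ along each axis by simple decimation. Decimation, the representation of a frame as a list of pixel rows, and the function name `reduce_frame` are all assumptions.

```python
def reduce_frame(frame, n=1):
    """Shrink a 2-D pixel grid by 1/2**n per axis by keeping every 2**n-th pixel.

    frame: list of rows, each row a list of pixel values.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    step = 2 ** n
    return [row[::step] for row in frame[::step]]
```

Decimation is the cheapest option; averaging each 2ⁿ × 2ⁿ block instead would suppress more noise at a higher computational cost.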

Next, as shown in FIG. 11(a), the skin color region extraction unit 12-2 converts the reduced frame data F(n) into binary frame data and extracts the skin color region 301a (S104). The skin color motion region extraction unit 12-3 extracts the overlapping portion of the skin color regions 301a (the skin color motion region 301b) (S106). As shown in FIG. 11(c), the face candidate position detection unit (first width calculation unit) 12-4 extracts the rectangular region 302 containing the skin color motion region 301b and calculates the maximum and minimum values of the rectangular region 302 in the X and Y directions (S108). Based on the calculated maximum and minimum values in the X and Y directions, the face candidate position detection unit 12-4 calculates the width W_f and the height H_f of the skin color motion region 301b (S110).
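Steps S106 to S110 can be sketched in plain Python as follows, assuming each binary frame is a list of rows of 0/1 values in which 1 marks a skin color pixel. The function names are hypothetical, and counting both end pixels in the width (maximum minus minimum plus one) is an assumption.

```python
def overlap(mask_a, mask_b):
    """Pixel-wise AND of two equally sized binary masks (S106)."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the set pixels, or None if empty (S108)."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def width_and_height(box):
    """Width and height in pixels, counting both end pixels (S110)."""
    x_min, y_min, x_max, y_max = box
    return x_max - x_min + 1, y_max - y_min + 1
```

The same three helpers would apply unchanged to the binary frames produced for the shoulder candidate region, where only the width is needed.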

Next, as shown in FIG. 11(b), the shoulder candidate region extraction unit 12-5 sets the extraction target area 303 (F_x(n)) based on the maximum and minimum values of the skin color motion region 301b in the X and Y directions (S112). The shoulder candidate region extraction unit 12-5 calculates a threshold from the luminance values and pixel counts, or reads a threshold stored in advance on the fixed disk 14 or the like, and generates binary frame data in order to extract the shoulder candidate region 306a (S114). The shoulder candidate motion region extraction unit 12-6 extracts the overlapping portion of the shoulder candidate regions 306a (the shoulder candidate motion region 306b) (S116). As shown in FIG. 11(c), the shoulder candidate position detection unit 12-7 extracts the rectangular region 305 containing the shoulder candidate motion region 306b, calculates the maximum and minimum values of the rectangular region 305 in the X direction (S118), and calculates the width W_s of the shoulder candidate motion region 306b (S120).

Next, as shown in FIG. 11(d), the face area determination unit 12-9 determines whether the ratio (comparison value) W_s / W_f between the width W_f of the skin color motion region 301b and the width W_s of the shoulder candidate motion region 306b falls within a preset range (for example, between P₁ and P₂; between 1.4 and 1.6 in the present embodiment) (S122). If the ratio W_s / W_f falls within the range, that is, if P₁ < (W_s / W_f) < P₂ (YES in S122), the skin color region 301a is determined to be a person's face area (S124). Conversely, if the ratio W_s / W_f does not fall within the range (NO in S122), the skin color region 301a is determined not to be a person's face area (in other words, it is not determined to be a face area) (S126).

Finally, the CPU 12 outputs the determination result to the output unit 17-1. The output unit 17-1 displays the determination result to an external user, or converts it into a signal and outputs it to the external display 20, a printer, or the like (S128).

FIG. 12 is an image diagram showing frame data when a person's face is captured by the imaging device 21 and when a person's palm is captured. As shown in FIG. 12(a), the image processing apparatus 10 according to the present embodiment determines that the skin color motion region is a person's face area when the ratio W_s / W_f between the width W_f of the face candidate region and the width W_s of the shoulder candidate region satisfies 1.4 < W_s / W_f < 1.6. For example, when the imaging device 21 captures a person's face as shown in FIG. 12(b), W_s / W_f = 1.5, so the skin color motion region in FIG. 12(b) is determined to be a face area (S124). On the other hand, when the imaging device 21 captures a person's palm as shown in FIG. 12(c), W_s / W_f = 0.95, so the skin color motion region in FIG. 12(c) is determined not to be a face area (S126).

In other words, the image processing apparatus 10 according to the present embodiment can reduce the possibility of erroneously extracting, as a person's face area, an object other than a person's face that has motion and skin color components, such as a palm. As a result, the recognition performance for the face area improves.

A program having the various image processing functions described above can be provided or distributed by recording it on a computer-readable recording medium such as a flexible disk, a memory card, a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), an MO disk (Magneto Optical Disk), or a removable disk. By installing the computer program according to the present invention, via such a recording medium or via a network, on a computer compatible with the present system or on a system having a processor or image engine with equivalent functions, the person detection apparatus of the device control method according to the present invention having the above features can be run on that computer or system.

The embodiments disclosed above should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a diagram showing the hardware configuration of the image processing apparatus according to the present embodiment.
FIG. 2 is a functional block diagram showing the functional configuration of the image processing apparatus.
FIG. 3 is a conceptual diagram showing one frame data extracted from a moving image.
FIG. 4 is a conceptual diagram showing two frame data extracted from a moving image, overlaid on each other.
FIG. 5 is a conceptual diagram showing the skin color regions of two frame data extracted from a moving image, overlaid on each other.
FIG. 6 is a conceptual diagram showing the extraction target area for extracting the shoulder candidate region.
FIG. 7 is a conceptual diagram showing the extracted shoulder candidate region.
FIG. 8 is a conceptual diagram showing the distribution of luminance values in frame data and the number of pixels having each luminance value.
FIG. 9 is a conceptual diagram in which two binary frame data are overlaid when the imaging target is moving.
FIG. 10 is a flowchart showing the face area recognition processing.
FIG. 11 is an image diagram conceptually showing the flow of the face area recognition processing.
FIG. 12 is an image diagram showing frame data when a person's face is captured by the imaging device and when a person's palm is captured.

Explanation of symbols

10 image processing apparatus, 12 CPU, 12-1 image reduction unit, 12-2 first extraction means (skin color region extraction unit), 12-3 second extraction means (skin color motion region extraction unit), 12-4 first width calculation unit (face candidate position detection unit), 12-5 third extraction means (shoulder candidate region extraction unit), 12-6 fourth extraction means (shoulder candidate motion region extraction unit), 12-7 second width calculation unit (shoulder candidate position detection unit), 12-8 width comparison unit, 12-9 determination means (face area determination unit), 13 memory, 14 fixed disk, 15 communication interface, 16 input device, 17 output device, 21 imaging device, 301a skin color region, 301b first overlapping portion (skin color motion region), 303 extraction target area, 306a body candidate region (shoulder candidate region), 306b second overlapping portion (shoulder candidate motion region).

Claims

An image processing apparatus for recognizing a person's face area from a moving image composed of a plurality of frame data, comprising:
first extraction means for extracting a skin color region from each of the plurality of frame data based on hue;
second extraction means for extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data;
first width calculation means for calculating a width of the first overlapping portion;
third extraction means for extracting, from each of the plurality of frame data, a body candidate region within a region below the first overlapping portion based on a predetermined color parameter;
fourth extraction means for extracting a second overlapping portion of the plurality of body candidate regions extracted from the respective frame data;
second width calculation means for calculating a width of the second overlapping portion;
comparison means for comparing the width of the first overlapping portion with the width of the second overlapping portion; and
recognition means for recognizing the skin color region as a face area when a comparison result obtained by the comparison means satisfies a predetermined condition.

The image processing apparatus according to claim 1, wherein the comparison means calculates a ratio between the width of the first overlapping portion and the width of the second overlapping portion and determines whether the ratio is within a predetermined range, and
the recognition means recognizes the skin color region as a face area when the ratio is determined to be within the predetermined range.

The image processing apparatus according to claim 2, wherein the comparison means determines whether the ratio is within a range of 1.4 to 1.6, and
the recognition means recognizes the skin color region as a face area when the ratio is determined to be within the range of 1.4 to 1.6.

The image processing apparatus according to any one of claims 1 to 3, wherein the predetermined color parameter is a luminance value of each pixel in the frame data.

The image processing apparatus according to any one of claims 1 to 4, further comprising image reduction means for reducing each of the plurality of frame data.

The image processing apparatus according to claim 5, wherein the image reduction means reduces each of the plurality of frame data by a factor of 1/2ⁿ (n: integer).

The image processing apparatus according to any one of claims 1 to 6, wherein the first extraction means generates, from each of the plurality of frame data, binary frame data indicating the skin color region or a region other than the skin color region, and
the second extraction means extracts the first overlapping portion of the plurality of skin color regions extracted from the respective binary frame data.

An image processing program for causing a computer to recognize a person's face area from a moving image composed of a plurality of frame data, the program causing the computer to perform the steps of:
extracting a skin color region from each of the plurality of frame data based on hue;
extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data;
calculating a width of the first overlapping portion;
extracting, from each of the plurality of frame data, a body candidate region within a region below the first overlapping portion based on a predetermined color parameter;
extracting a second overlapping portion of the plurality of body candidate regions extracted from the respective frame data;
comparing the width of the first overlapping portion with the width of the second overlapping portion; and
recognizing the skin color region as a face area when the obtained comparison result satisfies a predetermined condition.

A computer-readable recording medium on which is recorded an image processing program for causing a computer to recognize a person's face area from a moving image composed of a plurality of frame data, the program causing the computer to execute the steps of:
extracting a skin color region from each of the plurality of frame data based on hue;
extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data;
calculating a width of the first overlapping portion;
extracting, from each of the plurality of frame data, a body candidate region within a region below the first overlapping portion based on a predetermined color parameter;
extracting a second overlapping portion of the body candidate regions extracted from the respective frame data;
calculating a width of the second overlapping portion;
comparing the width of the first overlapping portion with the width of the second overlapping portion; and
recognizing the skin color region as a face area when the obtained comparison result satisfies a predetermined condition.

An image processing method using an image processing apparatus for recognizing a person's face area from a moving image composed of a plurality of frame data, wherein
the image processing apparatus comprises:
a storage unit that stores the plurality of frame data extracted from the moving image; and
a control unit that recognizes a person's face area in the frame data based on the plurality of frame data stored in the storage unit, and
the image processing method comprises the steps of:
the control unit extracting a skin color region, based on hue, from each of the plurality of frame data stored in the storage unit;
the control unit extracting a first overlapping portion of the plurality of skin color regions extracted from the respective frame data;
the control unit calculating a width of the first overlapping portion;
the control unit extracting, from each of the plurality of frame data stored in the storage unit, a body candidate region within a region below the first overlapping portion based on a predetermined color parameter;
the control unit extracting a second overlapping portion of the body candidate regions extracted from the respective frame data;
the control unit calculating a width of the second overlapping portion;
the control unit comparing the width of the first overlapping portion with the width of the second overlapping portion; and
the control unit recognizing the skin color region as a face area when the obtained comparison result satisfies a predetermined condition.