JP2004246424A

JP2004246424A - Method for extracting skin color area

Info

Publication number: JP2004246424A
Application number: JP2003032864A
Authority: JP
Inventors: Masahide Kaneko; 正秀金子; Tai Kuwan Fuuintsuu; タイクワンフーインツー; Mitsuhiko Meguro; 光彦目黒
Original assignee: Campus Create Co Ltd
Current assignee: Campus Create Co Ltd
Priority date: 2003-02-10
Filing date: 2003-02-10
Publication date: 2004-09-02

Abstract

PROBLEM TO BE SOLVED: To provide a method that can accurately extract a skin color area even when the target image includes various skin colors.
SOLUTION: For the pixels in the skin color area of a sample image, a histogram is generated over hue H and saturation S, and the histogram is approximated with a Gaussian mixture model having a plurality of Gaussian models. The color difference signals (H, S) of each pixel in the target image are converted into one-dimensional information using the corresponding value in the Gaussian model. The value of the one-dimensional information is then varied, and the resulting increase in the number of pixels belonging to the corresponding Gaussian model is calculated. The value at which this increase is nearly minimal is set as the threshold. Pixels whose one-dimensional information lies within the range defined by the threshold are selected from the target image, which extracts the skin color area. Furthermore, a face area can be selected from the extracted skin color area by using its structural information.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a method for extracting a skin color region from an image.
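[0002]
[Background Art]
Techniques for detecting face regions in images such as photographs are already known. Such detection is expected to enable personal authentication and the retrieval of images that contain faces.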
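[0003]
Previously proposed detection techniques use structural information about the face or skin color information. Techniques based on skin color information are well suited to selecting face region candidates. In the prior art, however, face detection accuracy drops considerably when skin color varies with race or shading.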
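[0004]
[Problem to Be Solved by the Invention]
The present invention was made in view of the above circumstances. Its object is to provide a method that can extract skin color regions accurately even when the conditions of the target image vary widely.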
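[0005]
[Means for Solving the Problem]
The skin color region extraction method according to the present invention comprises the following steps:
(1) generating a histogram over two color difference signals from the pixels in the skin color regions of sample images;
(2) approximating the histogram with a Gaussian mixture model comprising a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in a target image into one-dimensional information using the corresponding values of the Gaussian models, and then taking as a threshold the value of the one-dimensional information at which the increase in the number of pixels, as that value is varied, is approximately minimal; and
(4) extracting a skin color region from the target image by selecting from it the pixels whose one-dimensional information lies within the range determined by the threshold.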
[0006]
The one-dimensional information may be the largest of the values that the Gaussian models give for the two color difference signals of a pixel in the target image.
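[0007]
The method may also be configured as follows. In step (3), the increase in the number of pixels is counted only over pixels that were converted into one-dimensional information by the same Gaussian model. The resulting threshold is associated with information identifying that Gaussian model. In step (4), pixels are selected from among those converted by the same Gaussian model, using the threshold associated with that model.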
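[0008]
Alternatively, the skin color region extraction method according to the present invention may comprise the following steps:
(1) decomposing the pixels in all or part of a target image into a plurality of clusters using their Cb and Cr values; and
(2) taking as the skin color region the area occupied by pixels belonging to the cluster whose Cr is larger and whose Cb is smaller than those of the other clusters.
The part of the image referred to in step (1) may be a skin color region extracted by the method of the invention described above.
The target image may be obtained by converting an image from RGB space to YCbCr space.
The number of clusters is, for example, two.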
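[0009]
A skin color region that is a face region can also be extracted by using structural information about the skin color region extracted by the above method of the present invention.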
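[0010]
The skin color region extraction method according to the present invention may also comprise the following steps:
(1) generating a histogram over two color difference signals from the pixels in the skin color regions of sample images;
(2) approximating the histogram with a Gaussian mixture model comprising a plurality of Gaussian models;
(3) generating a threshold based on the values of the Gaussian models that correspond to the color difference signals of the pixels in a target image; and
(4) extracting a skin color region from the target image by selecting from it the pixels that fall within the range determined by the threshold.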
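[0011]
A computer program according to the present invention causes a computer to execute the steps of any of the above extraction methods.
A recording medium according to the present invention is computer-readable and stores this computer program.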
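[0012]
A skin color region extraction device according to the present invention comprises a threshold generation unit and a skin region candidate extraction unit. The threshold generation unit executes steps (1) to (3) below, and the skin region candidate extraction unit executes step (4).
(1) generating a histogram over two color difference signals from the pixels in the skin color regions of sample images;
(2) approximating the histogram with a Gaussian mixture model comprising a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in a target image into one-dimensional information using the corresponding values of the Gaussian models, and then taking as the threshold the threshold candidate at which the increase in the number of pixels, as the value of the one-dimensional information is varied, is approximately minimal;
(4) extracting a skin color region from the target image by selecting from it the pixels whose one-dimensional information lies within the range determined by the threshold.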
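[0013]
Alternatively, the skin color region extraction device according to the present invention may comprise a threshold generation unit that generates a threshold for classifying the pixels of a target image. The threshold generation unit executes the following steps (1) to (3):
(1) generating a histogram over two color difference signals from the pixels in the skin color regions of sample images;
(2) approximating the histogram with a Gaussian mixture model comprising a plurality of Gaussian models; and
(3) converting the two color difference signals of each pixel in a target image into one-dimensional information using the corresponding values of the Gaussian models, and then taking as the threshold the threshold candidate at which the increase in the number of pixels, as the value of the one-dimensional information is varied, is approximately minimal.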
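[0014]
The skin color region extraction device according to the present invention may also comprise a face region candidate extraction unit that executes the following steps:
(1) decomposing the pixels in all or part of a target image into a plurality of clusters using their Cb and Cr values; and
(2) taking as the skin color region the area occupied by pixels belonging to the cluster whose Cr is larger and whose Cb is smaller than those of the other clusters.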
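[0015]
The extraction device may further comprise a face region extraction unit. This unit uses structural information about the skin color region to extract a skin color region that is a face region.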
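[0016]
[Embodiments of the Invention]
A skin color region extraction method according to one embodiment of the present invention is described below with reference to the accompanying drawings. First, the configuration of the device used by the method of this embodiment is described with reference to FIG. 1. In this specification the terms are nested: "skin color region" encompasses "skin region candidate", "skin region candidate" encompasses "face region candidate" (i.e., skin region), and "face region candidate" encompasses "face region".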
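[0017]
The device comprises an image information acquisition unit 1, a color space conversion unit 2, a threshold generation unit 3, a skin region candidate extraction unit 4, a face region candidate extraction unit 5, and a face region extraction unit 6. The image information acquisition unit 1 receives information on the target image as input, normally as RGB signals in RGB space. The color space conversion unit 2 converts the target image into a color space that has one luminance signal and two color difference signals. Any color space with two color difference signals may be used. This embodiment uses HSV space, with H (hue) and S (saturation) as the two color difference signals. Other usable color spaces include the uniform color spaces CIELAB (L*a*b*) and CIELUV (L*u*v*), as well as [Y, R−Y, B−Y] and [Y, I, Q].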
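[0018]
The threshold generation unit 3 generates a threshold that classifies the pixels of the image as belonging or not belonging to a skin color region. The method of generating this threshold is described later.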
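[0019]
The skin region candidate extraction unit 4 extracts a skin color region from the target image by selecting the pixels that fall within the range defined by the threshold. This operation is also detailed later.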
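[0020]
The face region candidate extraction unit 5 narrows down the skin region candidates obtained by the skin region candidate extraction unit 4 and separates face regions from hair regions. The method is described later.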
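[0021]
The face region extraction unit 6 uses structural features to extract (select) face regions from the face region candidates (i.e., skin regions) obtained by the face region candidate extraction unit 5. The specific method is described later.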
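[0022]
Next, the skin color region extraction method of this embodiment is described. First, the image information acquisition unit 1 acquires the target image as input. In this embodiment, the target image is assumed to be represented by RGB signals in RGB space. The color space conversion unit 2 then converts the RGB signals into HSV signals, which yields the two color difference signals (H, S) of the target image. The conversion formulas from RGB space to HSV space are well known but are given below for completeness.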
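The formula itself is not reproduced in this text. As an assumption, the common hexcone form of the conversion is shown below. R, G, and B are scaled to [0, 1], with M = max(R, G, B) and m = min(R, G, B). The scaling of H and S used in the original may differ.

$$V = M,\qquad S = \begin{cases}(M-m)/M & M > 0\\ 0 & M = 0\end{cases}$$

$$H = \begin{cases}
60^\circ\times\left(\dfrac{G-B}{M-m}\bmod 6\right) & M = R\\[4pt]
60^\circ\times\left(\dfrac{B-R}{M-m}+2\right) & M = G\\[4pt]
60^\circ\times\left(\dfrac{R-G}{M-m}+4\right) & M = B
\end{cases}\qquad (H \text{ undefined, conventionally } 0,\ \text{when } M = m)$$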

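[0023]
The skin region candidate extraction unit 4 then extracts skin region candidates from the target image expressed as HSV signals. In this embodiment, the threshold generation unit 3 generates the threshold before this extraction, so the threshold generation method is described first, with reference to FIG. 2.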
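[0024]
(Threshold generation)
First, a plurality of sample images in RGB space are acquired (step 2-1). The sample images contain human faces, and these faces should cover as wide a range of conditions as possible, such as race, age, sex, degree of tanning, and shading. The RGB signals are then converted into HSV signals (step 2-2) using the same formulas as above.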
[0025]
Next, the pixels in the face regions (i.e., skin color regions) of the sample images are selected (step 2-3). The face regions of the sample images are known in advance.
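[0026]
Next, a histogram over the two color difference signals, hue (H) and saturation (S), is generated from the pixels in the face regions of the sample images (step 2-4). FIG. 3 shows an example. FIGS. 3(b) and 3(c) show the histogram of FIG. 3(a) projected onto hue alone and saturation alone, respectively.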
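Steps 2-2 to 2-4 can be sketched in Python with NumPy and the standard-library `colorsys` module. This is an illustration under stated assumptions, not the patent's implementation. The function name `skin_hs_histogram`, the 32×32 bin count, and the scaling of H and S to [0, 1] are choices made here; `colorsys` reports hue as a fraction of 360°.

```python
import colorsys

import numpy as np


def skin_hs_histogram(rgb, skin_mask, bins=32):
    """Sketch of steps 2-2 to 2-4: H/S histogram of known skin pixels.

    rgb:       float array of shape (N, 3) with R, G, B in [0, 1]
    skin_mask: boolean array of shape (N,), True for pixels inside the
               known face (skin) regions of the sample images
    Returns the 2D count array, both bin-edge arrays, and the (H, S) samples.
    """
    skin = np.asarray(rgb, dtype=float)[np.asarray(skin_mask, dtype=bool)]
    # colorsys.rgb_to_hsv returns (h, s, v), each in [0, 1]; h = hue / 360 deg
    hs = np.array([colorsys.rgb_to_hsv(*px)[:2] for px in skin],
                  dtype=float).reshape(-1, 2)
    counts, h_edges, s_edges = np.histogram2d(
        hs[:, 0], hs[:, 1], bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    return counts, h_edges, s_edges, hs
```

The raw `(H, S)` samples are returned alongside the counts because fitting a mixture to the samples is equivalent to fitting it to the count-weighted histogram.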
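[0027]
The histogram is then approximated by a Gaussian mixture model (step 2-5). In its standard form, consistent with the parameters defined below, the model is:

$$p(\mathbf{x}\mid\theta)=\sum_{i=1}^{k}\alpha_i\,\mathcal{N}(\mathbf{x}\mid\mu_i,\Sigma_i),\qquad \sum_{i=1}^{k}\alpha_i=1,$$

$$\mathcal{N}(\mathbf{x}\mid\mu_i,\Sigma_i)=\frac{1}{2\pi\,\lvert\Sigma_i\rvert^{1/2}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\mu_i)^{\mathsf T}\Sigma_i^{-1}(\mathbf{x}-\mu_i)\right),$$

where
x = (H, S): the pair of color difference signals;
k: the number of Gaussian models in the mixture (4 in this embodiment);
θ: the parameters α_i (mixing weight), μ_i (mean), Σ_i (covariance), and so on, one set per Gaussian model (4 sets in this embodiment).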
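The text leaves the fitting procedure to the cited literature. One common choice is expectation-maximization (EM). The sketch below fits a k-component mixture to the (H, S) samples of the skin pixels. It is not the patent's stated method. The diagonal covariances, the farthest-point initialization, and the name `fit_gmm_diag` are assumptions made to keep the example short.

```python
import numpy as np


def fit_gmm_diag(x, k=4, n_iter=200, seed=0, eps=1e-6):
    """Sketch of step 2-5: EM fit of a k-component Gaussian mixture.

    x: (N, 2) array of (H, S) values of the skin sample pixels.
    Diagonal covariances are an assumption made to keep the sketch short.
    Returns (alpha, mu, var) with shapes (k,), (k, 2), (k, 2).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    # farthest-point initialization spreads the k starting means apart
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - x[idx]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    mu = x[idx].copy()
    var = np.tile(x.var(axis=0) + eps, (k, 1))
    alpha = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log(alpha_i * N(x | mu_i, diag(var_i))) for every sample
        log_p = (-0.5 * ((x[:, None, :] - mu) ** 2 / var).sum(axis=2)
                 - 0.5 * np.log(2.0 * np.pi * var).sum(axis=1)
                 + np.log(alpha))
        log_p -= np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p)                  # responsibilities; rows sum to 1
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0) + eps
        alpha = nk / nk.sum()
        mu = resp.T @ x / nk[:, None]
        var = np.maximum(resp.T @ x ** 2 / nk[:, None] - mu ** 2, eps)
    return alpha, mu, var
```

With `k=4` this matches the number of components used in the embodiment. A full-covariance variant would replace `var` with 2×2 matrices.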
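[0028]
FIG. 4 shows the result of approximating the histogram with a Gaussian mixture model. In this example, the number of mixture components is four. Gaussian mixture models are generated for H and for S (see FIGS. 4(b) and 4(c)) and then combined to obtain the three-dimensional Gaussian mixture model shown in FIG. 4(a).
Gaussian mixture models are well known, as described for example in
(1) S. McKenna, S. Gong, and Y. Raja. Modelling Facial Colour and Identity with Gaussian Mixtures. Pattern Recognition, 31(12):1883-1892, 1998.
(2) H. P. Graf, E. Cosatto, D. Gibbon, M. Kocheisen, and E. Petajan. Multi-Modal System for Locating Heads and Faces. In Proceedings of the Second IEEE International Conference on Automatic Face and Gesture Recognition, pages 88-93, 1996.
so they are not described further here.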
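[0029]
Next, the two color difference signals (H, S) of each pixel x_i in the target image are converted into one-dimensional information using the corresponding values of the Gaussian models (step 2-6). Specifically, for the pair (H, S) at pixel x_i, the largest value φ_i in the Gaussian mixture model is found. This φ_i is the one-dimensional information in this embodiment.

If the histogram values are used directly, φ_i is a frequency value. However, φ_i may also be another quantity derived from the frequency value. For example, the frequency value normalized by the maximum of the histogram may be used, in which case φ_i represents a probability. Whether it is a raw frequency or a normalized probability, φ_i is in effect a likelihood value: it indicates how probable it is that the pixel is skin colored. This specification therefore sometimes calls φ_i the likelihood value.

In addition, the information identifying the Gaussian model that gives φ_i is determined. This is the model's label l, an integer from 1 to k (as noted above, k = 4 in this embodiment). Each pixel x_i is thus assigned the pair of likelihood value φ_i and label l (step 2-7), for example by storing φ_i and l in a table indexed by x_i.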
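One possible reading of steps 2-6 and 2-7 is sketched below. It assumes that φ_i is the largest weighted component density α_i N(x | μ_i, Σ_i) at the pixel's (H, S). It also assumes that the optional normalization divides by the model's peak value. Because φ is a maximum over components, that peak is exactly max_i α_i / √((2π)² |Σ_i|). The diagonal-covariance parameters (`alpha` of shape (k,), `mu` and `var` of shape (k, 2)) and the name `likelihood_and_label` are hypothetical.

```python
import numpy as np


def likelihood_and_label(hs, alpha, mu, var, normalize=True):
    """Sketch of steps 2-6 and 2-7: reduce (H, S) to 1-D information.

    For each pixel, the weighted density alpha_i * N(hs | mu_i, diag(var_i))
    of every component is evaluated; phi is the largest of these values and
    label is the 1-based index (1..k) of the component that produced it.
    With normalize=True, phi is divided by the largest value phi can take,
    so it lies in (0, 1].
    """
    hs = np.asarray(hs, dtype=float)
    norm = np.sqrt((2.0 * np.pi * var).prod(axis=1))          # shape (k,)
    dens = alpha * np.exp(
        -0.5 * ((hs[:, None, :] - mu) ** 2 / var).sum(axis=2)) / norm
    label = dens.argmax(axis=1) + 1          # labels 1..k as in the text
    phi = dens.max(axis=1)
    if normalize:
        phi = phi / (alpha / norm).max()     # peak of the max-of-components
    return phi, label
```

Storing `phi` and `label` side by side per pixel corresponds to the table described in step 2-7.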
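[0030]
Next, a threshold candidate for the one-dimensional information is varied, and the candidate at which the resulting increase in the number of pixels is approximately minimal is taken as the threshold (step 2-8). This step is detailed below with reference to FIGS. 5 and 6.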
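[0031]
First, consider the pixels x_i associated with label l. The threshold candidate T_l is set ΔT_l below φ_i,max, the maximum (or a value close to the maximum) of the likelihood values φ_i of these pixels (see FIG. 5). The pixels x_i with label l whose likelihood φ_i is at least T_l are counted; FIG. 6(a) shows an example of such pixels. T_l is then lowered by ΔT_l, and the pixels x_i with label l whose likelihood φ_i is at least the new T_l are counted again; FIG. 6(b) shows an example. In this embodiment, ΔT_l is a fixed amount.

Repeating this calculation while shifting T_l by ΔT_l gives, from the one-dimensional information φ_i, the increase in the number of pixels for each value of T_l. For example, if the pixel count in FIG. 6(a) is A_1 and that in FIG. 6(b) is A_2, the increase is A_2 − A_1. FIG. 5 shows the relationship between the threshold candidate and the increase in the number of pixels obtained in this way.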
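[0032]
The likelihood value (i.e., the value of the one-dimensional information) at which the increase in the number of pixels is approximately minimal is then taken as the threshold T_l0. This embodiment computes the relationship between the threshold candidate T_l and the pixel increase. An equivalent approach is to count the pixels that fall within each step ΔT_l of the candidate T_l and derive T_l0 from that relationship. In that case, the number of pixels within a band of width ΔT_l is exactly the increase in the number of pixels described above.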
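The per-label sweep can be sketched as follows. The step `delta` plays the role of ΔT_l. The sweep stops once the candidate would become negative, and ties are resolved toward the higher candidate. The function name and these two rules are assumptions, since the text says only that the increase should be "approximately minimal".

```python
import numpy as np


def adaptive_thresholds(phi, label, delta=0.02):
    """Sketch of step 2-8: one threshold T_l0 per Gaussian-model label.

    For the pixels of label l, the candidate T_l starts one step below the
    largest likelihood and is lowered by a fixed delta (Delta T_l in the
    text) while it stays non-negative. At each candidate the pixels with
    phi >= T_l are counted; the candidate whose count grew least relative
    to the previous one is returned (ties go to the higher candidate).
    """
    phi = np.asarray(phi, dtype=float)
    label = np.asarray(label)
    thresholds = {}
    for l in np.unique(label):
        p = phi[label == l]
        p_max = p.max()
        n_steps = int(np.floor(p_max / delta))
        if n_steps < 1:
            continue                      # no candidate to evaluate
        cands = p_max - delta * np.arange(1, n_steps + 1)
        counts = np.array([np.count_nonzero(p >= t) for t in cands])
        prev = np.concatenate(([np.count_nonzero(p >= p_max)], counts[:-1]))
        thresholds[int(l)] = float(cands[np.argmin(counts - prev)])
    return thresholds
```

The increments `counts - prev` are simply the numbers of pixels falling in each band of width ΔT_l, which is the equivalent formulation given in [0032]. On real data, a bare `argmin` can settle on a sparse stretch just below the peak. A practical version may need smoothing or a restricted search window.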
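[0033]
In short, in this embodiment the threshold is generated from the values φ_i that the Gaussian models give for the color difference signals (H, S) of the pixels in the target image.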
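[0034]
(Extraction of skin region candidates)
Using this threshold, the skin region candidate extraction unit 4 (see FIG. 1) extracts skin region candidates from the target image. It does so by selecting the pixels x_i with label l whose one-dimensional information (likelihood value) lies within the range determined by the threshold T_l0.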
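[0035]
FIG. 7 shows this extraction procedure. First, the threshold T_l corresponding to label l (i.e., the value T_l0 determined for that label) is applied to the pixels x_i associated with label l (step 7-1). Here the pixels are binarized: the output is 1 if the likelihood value of x_i is greater than T_l and 0 if it is smaller. Alternatively, l can be output instead of 1 when the likelihood of x_i exceeds T_l. This has the advantage of outputting the label information together with the binarization.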
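[0036]
Next, it is determined whether the binarization result is 1 (step 7-2); equivalently, one may test whether it is 0. If it is 1, the pixel is taken as a skin region candidate pixel (step 7-3); otherwise it is not (step 7-4). This determination is made for every pixel.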
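Steps 7-1 to 7-4 reduce to a per-pixel comparison against the threshold of the pixel's own label. The sketch below assumes thresholds are held in a dict keyed by label. Pixels whose label has no threshold are rejected, which is a choice made here. The name `skin_candidates` is hypothetical.

```python
import numpy as np


def skin_candidates(phi, label, thresholds, output_label=False):
    """Sketch of steps 7-1 to 7-4: per-label binarization.

    A pixel is kept when its likelihood phi exceeds the threshold of its own
    label. Labels with no threshold are rejected (an assumption). With
    output_label=True the kept pixels carry their label l instead of 1.
    Works on arrays of any shape (e.g. a 2D image).
    """
    phi = np.asarray(phi, dtype=float)
    label = np.asarray(label)
    t = np.array([thresholds.get(int(l), np.inf) for l in label.ravel()],
                 dtype=float).reshape(label.shape)
    keep = phi > t
    if output_label:
        return np.where(keep, label, 0)   # binarization plus label in one map
    return keep.astype(np.uint8)
```

The `output_label=True` branch corresponds to the variant in [0035] that outputs l instead of 1.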
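[0037]
According to this embodiment, selecting pixels from the target image on the basis of the threshold allows a skin color region (skin region candidate) to be extracted from the target image. Previously, the threshold for deciding whether a pixel belongs to a skin region was fixed, so detection accuracy deteriorated when the conditions of the target image changed. In contrast, the method of this embodiment generates the threshold adaptively from the target image.

For example, the number of pixels corresponding to each step ΔT_l of the threshold candidate T_l differs from one target image to another. A separate threshold can therefore be determined for each Gaussian model l in each target image. As a result, suitable thresholds can be generated even when the brightness of the target image varies or people with various skin colors are present, and skin color regions can be extracted appropriately.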
[0038]
FIG. 8(b) shows the result of selecting pixels in this way, and FIG. 8(a) shows the target image. In FIG. 8(b), selected pixels are shown in white and unselected pixels in black.
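[0039]
(Extraction of face region candidates: narrowing down)
Next, the face region candidate extraction unit 5 narrows down the skin region candidates, following the procedure shown in FIG. 9.

First, isolated pixels are removed with a median filter (not shown) (step 9-1). A 3×3-pixel median filter may be used, for example. This removes noise from the skin region candidate image; FIG. 8(c) shows an example result.

Next, each region is labeled (step 9-2). Then the number of pixels in each region is computed, and small regions are removed (step 9-3). Specifically, the area (e.g., pixel count) of each region is obtained and the mean area is calculated. Regions whose area is smaller than a% of the mean (a is, e.g., 3) are then removed. FIG. 8(d) shows an example result.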
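Steps 9-1 to 9-3 can be sketched with NumPy alone. The 3×3 median filter follows the example in the text. The 4-connectivity of the labeling and the function names are assumptions, since the text does not specify them. A library routine such as a connected-component labeler would normally replace the breadth-first search.

```python
from collections import deque

import numpy as np


def median3x3(mask):
    """3x3 median filter on a binary image; removes isolated pixels."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    m = np.pad(mask.astype(np.uint8), 1)          # zero border
    stack = np.stack([m[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return np.median(stack, axis=0) > 0.5         # majority of the 9 values


def label_regions(mask):
    """Label 4-connected regions by breadth-first search; returns (labels, n)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    n = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        n += 1
        labels[y, x] = n
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = n
                    queue.append((ny, nx))
    return labels, n


def remove_small_regions(mask, a_percent=3.0):
    """Drop regions whose pixel count is below a% of the mean region area."""
    labels, n = label_regions(mask)
    if n == 0:
        return np.zeros(labels.shape, dtype=bool)
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    keep = np.concatenate(([False], areas >= areas.mean() * a_percent / 100.0))
    return keep[labels]
```

The default `a_percent=3.0` follows the example value a = 3 given in the text.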
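[0040]
Next, morphological processing is applied to the regions (step 9-4). This removes narrow recesses and smooths the contours; FIG. 8(e) shows an example result. Hole filling is then performed (step 9-5). This fills holes inside a region, corresponding for example to the eyes or nostrils; FIG. 8(f) shows an example result. FIG. 8(g) shows the image of the regions extracted in this way.
Each of the processes shown in FIG. 9 is well known in itself, so no further explanation is given.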
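Steps 9-4 and 9-5 can be sketched as follows. The text does not name the morphological operation. Closing (dilation followed by erosion) with a 3×3 square is assumed here because it fills narrow indentations and smooths outlines as described. Holes are filled by flood-filling the background from the image border. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def _windows(mask, fill):
    """Stack the 9 shifted copies of mask covered by a 3x3 window."""
    h, w = mask.shape
    m = np.pad(mask, 1, constant_values=fill)
    return np.stack([m[i:i + h, j:j + w] for i in range(3) for j in range(3)])


def close_3x3(mask):
    """Morphological closing with a 3x3 square: dilation then erosion."""
    mask = np.asarray(mask, dtype=bool)
    dilated = _windows(mask, False).any(axis=0)
    return _windows(dilated, True).all(axis=0)   # image border treated as inside


def fill_holes(mask):
    """Fill enclosed background (e.g. eyes, nostrils): flood-fill the
    background from the image border; unreached background is a hole."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x]:
                outside[y, x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not (mask[ny, nx] or outside[ny, nx]):
                outside[ny, nx] = True
                queue.append((ny, nx))
    return ~outside
```

Padding the erosion with `True` keeps regions that touch the image border from being shaved off by the closing.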
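[0041]
These processes narrow the skin region candidates down to regions of at least a certain size. In this embodiment, the image of the narrowed-down skin region candidates is stored as RGB signals.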
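[0042]
(Extraction of face region candidates: separating hair regions from skin regions)
The face region candidate extraction unit 5 then separates hair regions from skin regions within the narrowed-down skin region candidate image, following the procedure in FIG. 10. This separation is needed because hair and skin are difficult to separate using hue and saturation.

For example, applying the procedure above with hue and saturation to the images in FIGS. 11(a), 12(a), and 13(a) extracts the regions shown in FIGS. 11(b), 12(b), and 13(b). In these figures, the white lines around the faces outline the regions. FIGS. 11(c), 12(c), and 13(c) plot hue against saturation for these images, with hair pixels shown as crosses and skin pixels as dots (which appear merged in the figures). These plots also show that hair and skin are hard to separate using hue and saturation.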
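[0043]
In this embodiment, therefore, the skin region candidate image in RGB space is first converted into an image in YCbCr space (step 10-1); that is, the RGB signals are converted into YCbCr signals. The conversion formulas are well known and are omitted.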
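[0044]
Next, a histogram over Cb and Cr is generated for the pixels (step 10-2). FIGS. 14 to 16 show examples for the images of FIGS. 11, 12, and 13; again, hair pixels are shown as crosses and skin pixels as dots. As these show, the Cb/Cr histogram allows the pixels to be decomposed into a plurality of clusters (two in this case).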
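[0045]
One of the clusters is then selected (step 10-3). As the examples of FIGS. 14 to 16 make clear, skin color regions generally have large Cr and small Cb, so the cluster with this property is selected.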
[0046]
The pixels belonging to the selected cluster are then extracted (step 10-4), and noise is removed by eliminating isolated pixels from them (step 10-5). This yields the face region candidates (step 10-6).
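Steps 10-1 to 10-4 can be sketched as two-cluster k-means in the (Cb, Cr) plane. Three choices here are assumptions, because the text names neither a clustering algorithm nor a conversion matrix: k-means itself, the full-range BT.601 (JPEG/JFIF) conversion coefficients, and the use of Cr − Cb to rank "large Cr, small Cb". Step 10-5 (isolated-pixel removal) is omitted. The function names are hypothetical.

```python
import numpy as np


def rgb_to_cbcr(rgb):
    """Cb and Cr of 8-bit RGB using the full-range BT.601 (JPEG/JFIF) matrix."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def split_skin_from_hair(rgb, mask, n_iter=50):
    """Sketch of steps 10-1 to 10-4: two-cluster k-means on (Cb, Cr) of the
    candidate pixels, keeping the cluster whose centre has the larger Cr - Cb."""
    mask = np.asarray(mask, dtype=bool)
    cb, cr = rgb_to_cbcr(rgb)
    pts = np.stack([cb[mask], cr[mask]], axis=1)
    if len(pts) < 2:
        return mask.copy()
    score = pts[:, 1] - pts[:, 0]                # large Cr, small Cb -> high
    centres = pts[[score.argmin(), score.argmax()]].copy()
    for _ in range(n_iter):
        dist = ((pts[:, None, :] - centres) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for c in range(2):
            if np.any(assign == c):
                centres[c] = pts[assign == c].mean(axis=0)
    skin = int(np.argmax(centres[:, 1] - centres[:, 0]))
    out = np.zeros(mask.shape, dtype=bool)
    out[mask] = assign == skin
    return out
```

Seeding the two centres at the extremes of Cr − Cb makes the initialization deterministic and already separates hair-like and skin-like pixels.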
[0047]
Because this embodiment separates face regions from hair regions using clusters formed from the Cb and Cr values, face region candidate images can be extracted accurately.
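[0048]
(Extraction of face regions)
The steps so far extract skin color regions from color information alone, which makes it difficult to exclude skin regions other than faces (e.g., arms). The face region extraction unit 6 therefore extracts (selects) face regions, following the procedure in FIG. 17.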
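[0049]
First, it is determined whether the region's shape is an ellipse within a predetermined range (step 17-1), that is, whether the ratio (minor axis length / major axis length) lies within a predetermined range (e.g., greater than 0.4). If it does not, the region is not treated as a face region (step 17-2).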
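The text does not say how the axis lengths are measured. One common estimate is shown below. For a filled ellipse, the axis lengths are proportional to the square roots of the eigenvalues of the pixel-coordinate covariance matrix, so their ratio follows from those eigenvalues. This second-moment method and the function names are assumptions; only the 0.4 threshold comes from the text.

```python
import numpy as np


def axis_ratio(region):
    """Minor/major axis ratio of a binary region from its second moments."""
    ys, xs = np.nonzero(region)
    if ys.size < 3:
        return 0.0
    cov = np.cov(np.stack([xs, ys]).astype(float))
    lo, hi = np.linalg.eigvalsh(cov)       # eigenvalues in ascending order
    return float(np.sqrt(max(lo, 0.0) / hi)) if hi > 0 else 0.0


def is_face_shaped(region, min_ratio=0.4):
    """Sketch of step 17-1: accept regions whose minor/major ratio exceeds 0.4."""
    return axis_ratio(region) > min_ratio
```

Regions that fail this test would go to step 17-2; the eye checks of step 17-3 are not sketched here.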
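[0050]
If the region is an ellipse within the predetermined range, it is determined whether both eyes are present (step 17-3). First, candidate eye regions are obtained from color information. With the YCbCr clustering described above, eye regions can normally be distinguished from skin color regions, just as hair regions can. They can then be separated from hair regions using structural information such as size.

The eye regions obtained in this way are checked against predetermined structural rules. Examples are whether they lie above the minor axis, whether they lie on a line roughly parallel to the minor axis, and whether the two eyes avoid being biased to one side of the major axis. In this way, the regions extracted from color information serve as candidates that are checked for the presence of eye regions. If eye regions are present, the face region candidate is taken as a face region (step 17-4).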
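[0051]
If eye regions cannot be detected from color information, step 17-3 further attempts to detect them from luminance information. Regions of low luminance are extracted as candidates and judged on structural information such as their size and arrangement. If eye regions are detected in this way, the face region candidate is taken as a face region (step 17-4).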
[0052]
If no eye regions are detected by the above procedure, the face region candidate is not taken as a face region (step 17-2).
[0053]
In this way, this embodiment can extract face regions from the target image. Once a face region is determined, the face image can be obtained from the pixels it contains.
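[0054]
Each of the embodiments described above can readily be implemented on a computer by a person skilled in the art. The program for this purpose can be stored on any computer-readable recording medium, such as an HD, FD, CD, or MO.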
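[0055]
The description of the embodiments above is merely an example and does not indicate configurations essential to the present invention. The configuration of each unit is not limited to the above, as long as the purpose of the present invention can be achieved.
Each unit (including functional blocks) may be implemented by hardware, software, a network, a combination of these, or any other means, as is self-evident to those skilled in the art. Several functional blocks may also be combined into one, and the function of one functional block may be realized by several functional blocks working together.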
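[0056]
[Effects of the Invention]
The present invention provides a method that can extract skin color regions accurately even when the conditions of the target image vary widely.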
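[Brief Description of the Drawings]
[FIG. 1] A block diagram outlining a device for carrying out the skin color region extraction method according to an embodiment of the present invention.
[FIG. 2] A flowchart of the threshold generation procedure in the skin color region extraction method according to an embodiment of the present invention.
[FIG. 3] Graphs of the per-pixel histogram over hue (H) and saturation (S): (a) in two dimensions, (b) in one dimension for hue only, and (c) in one dimension for saturation only.
[FIG. 4] Graphs of the per-pixel histogram over hue (H) and saturation (S) approximated by a Gaussian mixture model: (a) in two dimensions, (b) in one dimension for hue only, and (c) in one dimension for saturation only.
[FIG. 5] A graph explaining the threshold generation method. The vertical axis is the increase in the number of pixels and the horizontal axis is the threshold (likelihood value).
[FIG. 6] Diagrams explaining the number of pixels corresponding to threshold candidates. (a) and (b) show examples of pixels whose likelihood values are at or above two different threshold candidates.
[FIG. 7] A flowchart of the threshold-based pixel extraction method.
[FIG. 8] Example images illustrating the extraction of skin color regions from a target image: (a) the target image; (b) the regions extracted with the threshold, shown in white; (c) the regions after median filtering; (d) after removal of small regions; (e) after morphological processing; (f) after hole filling; (g) the image within the narrowed-down regions.
[FIG. 9] A flowchart of the procedure for narrowing down skin region candidates.
[FIG. 10] A flowchart of the procedure for separating hair regions from face regions.
[FIG. 11] Illustrations of the procedure for separating hair regions from face regions: (a) an example target image; (b) the extracted skin region candidate; (c) a histogram of the pixels in the skin region candidate, with saturation on the vertical axis and hue on the horizontal axis.
[FIG. 12] Illustrations of the procedure for separating hair regions from face regions: (a) another example target image; (b) the extracted skin region candidate; (c) a histogram of the pixels in the skin region candidate, with saturation on the vertical axis and hue on the horizontal axis.
[FIG. 13] Illustrations of the procedure for separating hair regions from face regions: (a) yet another example target image; (b) the extracted skin region candidate; (c) a histogram of the pixels in the skin region candidate, with saturation on the vertical axis and hue on the horizontal axis.
[FIG. 14] A histogram of the pixels in the skin region candidate of FIG. 11, with Cr on the vertical axis and Cb on the horizontal axis.
[FIG. 15] A histogram of the pixels in the skin region candidate of FIG. 12, with Cr on the vertical axis and Cb on the horizontal axis.
[FIG. 16] A histogram of the pixels in the skin region candidate of FIG. 13, with Cr on the vertical axis and Cb on the horizontal axis.
[FIG. 17] A flowchart of the procedure for extracting face regions from face region candidates.
[0001]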
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a method for extracting a skin color region from an image.
[0002]
[Background Art]
Techniques for detecting a face area in an image such as a photograph are already known. Such detection is expected to enable personal authentication and the retrieval of images that contain faces.
[0003]
Conventionally proposed detection techniques use face structure information or skin color information. Detection techniques using skin color information are well suited to selecting face area candidates. In the related art, however, the detection accuracy of face regions drops considerably when skin color varies with race or shading.
[0004]
[Problems to be solved by the invention]
The present invention has been made in view of the above circumstances. An object of the present invention is to provide a method capable of accurately extracting a skin color area even when the conditions of the target image vary widely.
[0005]
[Means for Solving the Problems]
The method for extracting a skin color region according to the present invention includes the following steps.
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in the target image into one-dimensional information using the corresponding values in the Gaussian models, and then setting as the threshold the value of the one-dimensional information at which the increase in the number of pixels, as that value is changed, is substantially minimal; and
(4) extracting a skin color region from the target image by selecting, from the target image, pixels having the one-dimensional information included in the range determined by the threshold value.
[0006]
The one-dimensional information may be the largest of the values that the Gaussian models give for the two color difference signals of a pixel in the target image.
[0007]
The method may also be configured so that:
the increase in the number of pixels in step (3) is counted over pixels converted into one-dimensional information by the same Gaussian model;
the threshold is associated with information indicating that Gaussian model; and
in step (4), the pixels selected from the target image are chosen from among the pixels converted into one-dimensional information by the same Gaussian model, using the threshold associated with that Gaussian model.
[0008]
The method for extracting a skin color region according to the present invention may include the following steps.
(1) decomposing each pixel in all or a part of the target image into a plurality of clusters using the values of Cb and Cr in the pixel; and
(2) taking as the skin color region the region occupied by pixels belonging to the cluster whose Cr is larger and whose Cb is smaller than those of the other clusters.
The part of the image referred to in step (1) may be a skin color area extracted by the method of the present invention described above.
The target image may be obtained by converting from the RGB space to the YCbCr space.
The number of the plurality of clusters is, for example, two.
[0009]
Using the structure information of the skin color area extracted by the above-described extraction method of the present invention, a skin color area that is a face area can also be extracted.
[0010]
The method for extracting a skin color region according to the present invention may include the following steps.
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) generating a threshold based on a value in the Gaussian model corresponding to the color difference signal at a pixel in a target image; and
(4) extracting a skin color region from the target image by selecting the pixels included in the range determined by the threshold value from the target image.
[0011]
A computer program according to the present invention is for causing a computer to execute the steps in any of the above-described extraction methods.
The recording medium according to the present invention records the computer program and is readable by a computer.
[0012]
The skin color region extraction device according to the present invention includes a threshold value generation unit and a skin region candidate extraction unit, and the threshold value generation unit executes the following steps (1) to (3). The skin region candidate extraction unit performs the following step (4).
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in the target image into one-dimensional information using the corresponding values in the Gaussian models, and then setting as the threshold the threshold candidate at which the increase in the number of pixels, as the value of the one-dimensional information is changed, is substantially minimal;
(4) extracting a skin color region from the target image by selecting, from the target image, pixels having the one-dimensional information included in the range determined by the threshold value.
[0013]
An apparatus for extracting a skin color region according to the present invention may include a threshold value generation unit that generates a threshold value for classifying the pixels of a target image, the threshold value generation unit executing the following steps (1) to (3).
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram by a Gaussian mixture model having a plurality of Gaussian models; and
(3) converting the two color difference signals of each pixel in the target image into one-dimensional information using the corresponding values in the Gaussian models, and then setting as the threshold the threshold candidate at which the increase in the number of pixels, as the value of the one-dimensional information is changed, is substantially minimal.
[0014]
The skin color region extraction device according to the present invention may be configured to include a face region candidate extraction unit that executes the following steps.
(1) decomposing each pixel in the entire or partial region of the target image into a plurality of clusters using the values of Cb and Cr in each pixel; and
(2) taking as the skin color region the region occupied by pixels belonging to the cluster whose Cr is larger and whose Cb is smaller than those of the other clusters.
[0015]
The extraction device may further include a face area extraction unit. The face area extraction unit extracts a skin color area, which is a face area, using the structure information of the skin color area.
[0016]
BEST MODE FOR CARRYING OUT THE INVENTION
A method for extracting a skin color region according to an embodiment of the present invention will be described with reference to the accompanying drawings. First, the configuration of an apparatus used in the extraction method of the present embodiment will be described with reference to FIG. 1. In this specification the terms are nested: a skin color region includes skin region candidates, a skin region candidate includes face region candidates (that is, skin regions), and a face region candidate includes face regions.
[0017]
This device includes an image information acquisition unit 1, a color space conversion unit 2, a threshold generation unit 3, a skin region candidate extraction unit 4, a face region candidate extraction unit 5, and a face region extraction unit 6. The image information acquisition unit 1 acquires information on the target image as input; this is usually an RGB signal in RGB space. The color space conversion unit 2 converts the target image into a color space that has a luminance signal and two color difference signals. Any color space may be used as long as it has two color difference signals. In the present embodiment, HSV space is used, with H (hue) and S (saturation) as the two color difference signals. Besides HSV, usable color spaces with two color difference signals include the uniform color spaces CIELAB (L*a*b*) and CIELUV (L*u*v*), as well as [Y, R−Y, B−Y] and [Y, I, Q].
[0018]
The threshold value generation unit 3 generates a threshold value for classifying the pixels constituting the image information as either in a skin color region or not. The method of generating this threshold will be described later.
[0019]
The skin region candidate extraction unit 4 executes a function of extracting a skin color region from the target image by selecting a pixel included in the range defined by the threshold from the target image. Details of this operation will also be described later.
[0020]
The face area candidate extracting section 5 narrows down the skin area candidates obtained by the skin area candidate extracting section 4 and distinguishes between the face area and the hair area. The method will be described later.
[0021]
The face area extraction unit 6 extracts (selects) a face area from the face area candidates (that is, skin areas) obtained by the face area candidate extraction unit 5 based on structural features. A specific extraction method will be described later.
[0022]
Next, the method for extracting a skin color region in the present embodiment will be described. First, the image information acquisition unit 1 acquires a target image as input. In this embodiment, the target image is assumed to be represented by RGB signals in RGB space. Next, the color space conversion unit 2 converts the RGB signals into HSV signals, which yields the two color difference signals (H, S) of the target image. The conversion formula from RGB space to HSV space is well known but is shown below for completeness.

[0023]
Next, the skin region candidate extraction unit 4 extracts skin region candidates from the target image expressed as HSV signals. In the present embodiment, the threshold generation unit 3 generates a threshold before this extraction, so the threshold generation method is described first, with reference to FIG. 2.
[0024]
(Threshold generation)
First, a plurality of sample images in the RGB space are obtained (step 2-1). An image including a human face is used as the sample image. In addition, it is preferable that conditions such as race, age, gender, degree of sunburn, and shading are as diverse as possible in the face included in the sample image. Next, the RGB signals are converted into HSV signals (step 2-2). This conversion formula is the same as that described above.
[0025]
Next, pixels in the face area (that is, the skin color area) are selected from the sample images (step 2-3). In the sample image, the face area is known.
[0026]
Next, for the pixels in the face areas of the sample images, a histogram is generated over the two color difference signals, hue (H) and saturation (S) (step 2-4). FIG. 3 shows an example of the histogram. FIGS. 3B and 3C show the histogram of FIG. 3A in one dimension, for hue only and saturation only, respectively.
[0027]
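Next, the histogram is approximated by a Gaussian mixture model (step 2-5). In its standard form, consistent with the parameters defined below, the Gaussian mixture model is expressed as follows:

$$p(\mathbf{x}\mid\theta)=\sum_{i=1}^{k}\alpha_i\,\mathcal{N}(\mathbf{x}\mid\mu_i,\Sigma_i),\qquad \sum_{i=1}^{k}\alpha_i=1,$$

$$\mathcal{N}(\mathbf{x}\mid\mu_i,\Sigma_i)=\frac{1}{2\pi\,\lvert\Sigma_i\rvert^{1/2}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\mu_i)^{\mathsf T}\Sigma_i^{-1}(\mathbf{x}-\mu_i)\right),$$

where
x = (H, S): the pair of color difference signals,
k: the number of Gaussian models in the mixture (4 in this embodiment),
θ: the set of parameters α_i (mixing weight), μ_i (mean), Σ_i (covariance), and so on, corresponding to each Gaussian model (4 sets in this embodiment).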
[0028]
FIG. 4 shows the result of approximating the histogram with a Gaussian mixture model. In this example, the number of mixture components is four. Gaussian mixture models generated for H and for S (see FIGS. 4B and 4C) are combined to obtain the three-dimensional Gaussian mixture model shown in FIG. 4A.
The Gaussian mixture model is well known, as described, for example, in
(1) S. McKenna, S. Gong, and Y. Raja. Modelling Facial Colour and Identity with Gaussian Mixtures. Pattern Recognition, 31(12):1883-1892, 1998.
(2) H. P. Graf, E. Cosatto, D. Gibbon, M. Kocheisen, and E. Petajan. Multi-Modal System for Locating Heads and Faces. In Proceedings of the Second IEEE International Conference on Automatic Face and Gesture Recognition, pages 88-93, 1996.
and will not be described in further detail.
[0029]
Then, the two color difference signals (H, S) of each pixel x_i in the target image are converted into one-dimensional information using the corresponding values in the Gaussian models (step 2-6). Specifically, for the pair of color difference signals (H, S) at pixel x_i, the largest value φ_i in the Gaussian mixture model is obtained. This φ_i is the one-dimensional information in this embodiment. If the histogram values are used as they are, φ_i is a frequency value. However, φ_i may also be another value derived from the frequency value. For example, the frequency value normalized by the maximum of the histogram may be used as φ_i, in which case φ_i represents a probability. Whether it is a frequency value or a normalized probability, φ_i in effect is a likelihood value indicating the probability of being skin colored, so this specification may refer to φ_i as a likelihood value. Furthermore, the information indicating the Gaussian model that gives this φ_i (that is, the label of the Gaussian model) l, an integer from 1 to k, is obtained. In this embodiment, as described above, k = 4. Thus, the pair of the likelihood value φ_i and the label l is attached to each pixel x_i as information (step 2-7), for example by storing φ_i and l in a table in association with x_i.
[0030]
Next, the value of a threshold candidate for the one-dimensional information is changed, and the candidate at which the resulting increase in the number of pixels is substantially minimal is set as the threshold (step 2-8). This step is described in detail below with reference to FIGS. 5 and 6.
[0031]
First, consider the pixels (group) x_i associated with label l of the Gaussian model. A value smaller by ΔT_l than φ_i,max, the maximum (or a value close to it) of the likelihood values φ_i of these pixels x_i, is set as the threshold candidate T_l (see FIG. 5). Next, the pixels x_i with label l having a likelihood value φ_i equal to or greater than T_l are counted; an example of these pixels is shown in FIG. 6A. Next, the threshold candidate T_l is shifted by ΔT_l, and the pixels x_i with label l having a likelihood value φ_i equal to or greater than T_l are counted again; an example is shown in FIG. 6B. In this embodiment, ΔT_l is a constant amount. The above calculation is repeated while shifting the threshold candidate T_l by ΔT_l. In this way, the increase in the number of pixels for each value of the threshold candidate T_l is obtained from the one-dimensional information φ_i. For example, if the number of pixels in FIG. 6A is A_1 and that in FIG. 6B is A_2, the increase is A_2 − A_1. FIG. 5 shows the relationship between the threshold candidate and the increase in the number of pixels obtained in this way.
[0032]
Next, the likelihood value (that is, the value of the one-dimensional information) at which the increase in the number of pixels is substantially minimal is set as the threshold T_l0. In this embodiment, the relationship between the threshold candidate T_l and the pixel increase is obtained. As an equivalent method, the threshold T_l0 can also be obtained from the relationship between each step ΔT_l of the threshold candidate T_l and the number of pixels within that step. In this case, the number of pixels within the width of the step ΔT_l is the increase in the number of pixels described above.
[0033]
In the present embodiment, then, the threshold is generated from the values φ_i in the Gaussian models that correspond to the color difference signals (H, S) of the pixels in the target image.
[0034]
(Extraction of skin candidate area)
Next, using this threshold value, the skin region candidate extraction unit 4 (see FIG. 1) extracts skin region candidates from the target image. It does this by selecting from the target image the pixels x_i that have label l and one-dimensional information (likelihood value) within the range determined by the threshold T_l0.
[0035]
This extraction procedure is shown in FIG. 7. First, the threshold T_l corresponding to label l is applied to the pixels x_i associated with label l (step 7-1). Here, binarization is performed by outputting 1 if the likelihood value of pixel x_i is greater than T_l and 0 if it is smaller. Alternatively, if the likelihood value of pixel x_i is greater than T_l, l can be output instead of 1. This has the advantage that label information is output together with the binarization.
[0036]
Next, it is determined whether the binarization result is 1 (step 7-2); equivalently, one may test whether it is 0. If it is 1, the pixel is set as a skin region candidate pixel (step 7-3). If it is not 1, the pixel is not regarded as a skin region candidate (step 7-4). This determination is made for each pixel.
[0037]
According to the present embodiment, selecting pixels from the target image based on the threshold allows a skin color region (skin region candidate) to be extracted from the target image. Previously, the threshold for deciding whether a pixel belongs to a skin region was fixed, so detection accuracy deteriorated when the conditions of the target image changed. In contrast, the method of this embodiment generates the threshold adaptively from the target image, as described above. For example, in this embodiment, the number of pixels corresponding to each step ΔT_l of the threshold candidate T_l differs between target images. A separate threshold can therefore be determined for each Gaussian model l in each target image. As a result, even if the brightness of the target image fluctuates or people with various skin colors are present, thresholds suited to those conditions can be generated, and the skin color region can be extracted appropriately.
[0038]
FIG. 8B shows the result of selecting pixels in this manner from the target image shown in FIG. 8A. In FIG. 8B, selected pixels are shown in white and unselected pixels in black.
[0039]
(Face area candidate extraction: narrowing down)
Next, the face area candidate extraction unit 5 narrows down the skin region candidates. The narrowing-down procedure will be described with reference to FIG. 9.

First, isolated pixels are removed by a median filter (not shown) (step 9-1). For example, a filter with a 3 × 3 pixel window can be used. This removes noise from the skin region candidate image. FIG. 8C shows an example of the result.

Next, labeling is performed to assign a label to each connected region (step 9-2).

Next, regions with a small area are removed (step 9-3). Specifically, the area of each region (for example, its number of pixels) is obtained, and the average area is calculated. Regions whose area is smaller than a% of this average (a is, for example, 3) are then removed. FIG. 8D shows an example of the result of removing the small regions.
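Steps 9-1 to 9-3 can be sketched on a binary mask (nested lists of 0/1) as follows. The text does not specify these details, so they are assumptions:

- Out-of-image neighbours count as 0 in the 3 × 3 median filter.
- Regions are 4-connected.
- a = 3 by default.

The function names are hypothetical.

```python
from collections import deque

def median3x3(mask):
    """3x3 median filter on a 0/1 mask; out-of-image neighbours count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ones = sum(mask[yy][xx]
                       for yy in range(y - 1, y + 2)
                       for xx in range(x - 1, x + 2)
                       if 0 <= yy < h and 0 <= xx < w)
            out[y][x] = 1 if ones >= 5 else 0  # median of nine binary values
    return out


def label_regions(mask):
    """4-connected labeling; returns (label map, {label: area in pixels})."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    areas = {}
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            labels[y][x] = next_label
            queue, area = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = area
            next_label += 1
    return labels, areas


def remove_small_regions(mask, a_percent=3.0):
    """Drop regions whose area is below a% of the mean region area."""
    labels, areas = label_regions(mask)
    if not areas:
        return [row[:] for row in mask]
    mean_area = sum(areas.values()) / len(areas)
    keep = {l for l, area in areas.items() if area >= mean_area * a_percent / 100.0}
    return [[1 if l in keep else 0 for l in row] for row in labels]
```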
[0040]
Next, morphological processing is performed on the regions (step 9-4). This processing removes narrow recesses and smooths the contours. FIG. 8E shows an example of the result.

Next, hole filling is performed (step 9-5). Hole filling fills holes inside a region, for example those corresponding to the eyes or nostrils. FIG. 8F shows an example of the result, and FIG. 8G shows the image of the region extracted in this manner.
Since each of the processes shown in FIG. 9 is well known in itself, further description is omitted.
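For concreteness, steps 9-4 and 9-5 could look like the sketch below. The text does not specify the structuring element, so this sketch makes the following assumptions:

- The morphological processing is a closing (dilation followed by erosion) with a 3 × 3 square.
- Erosion considers only in-image neighbours.
- Hole filling is done by flood-filling the background from the image border.

The function names are hypothetical.

```python
from collections import deque

def _neighbourhood(mask, y, x):
    """Values of the in-image 3x3 neighbourhood around (y, x)."""
    h, w = len(mask), len(mask[0])
    return [mask[yy][xx]
            for yy in range(max(0, y - 1), min(h, y + 2))
            for xx in range(max(0, x - 1), min(w, x + 2))]


def closing3x3(mask):
    """Morphological closing (dilate, then erode) with a 3x3 square."""
    h, w = len(mask), len(mask[0])
    dilated = [[1 if any(_neighbourhood(mask, y, x)) else 0 for x in range(w)] for y in range(h)]
    return [[1 if all(_neighbourhood(dilated, y, x)) else 0 for x in range(w)] for y in range(h)]


def fill_holes(mask):
    """Fill background pixels that cannot be reached from the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not mask[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]
```

A one-pixel notch in a contour is closed by `closing3x3`, and a hole at an eye position is closed by `fill_holes`.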
[0041]
Through these processes, the skin region candidates can be made into regions of a certain size or more (that is, narrowed down). In this embodiment, the image of the narrowed-down skin region candidate is stored as an RGB signal.
[0042]
(Extraction of face area candidate: separation of hair area and skin area)
Next, the face region candidate extraction unit 5 further separates the hair region from the skin region in the narrowed-down skin region candidate image. The separation procedure will be described with reference to FIG. 10.

This separation is needed because a method based on hue and saturation has difficulty separating hair regions from skin regions. For example, when the above procedure, which uses hue and saturation, is applied to the images shown in FIGS. 11(a), 12(a) and 13(a), the regions shown in FIGS. 11(b), 12(b) and 13(b) are extracted. In these figures, the white lines around the faces outline the extracted regions. The distribution of hue and saturation in these images is shown in FIGS. 11(c), 12(c) and 13(c). In those figures, hair pixels are marked with crosses and skin pixels with dots, which merge into solid areas. These results also show that hair and skin are difficult to separate using hue and saturation.
[0043]
Therefore, in this embodiment, first, the image of the skin region candidate in the RGB space is converted into an image in the YCbCr space (step 10-1). That is, the RGB signal is converted into the YCbCr signal. Since the conversion formula is well known, the description is omitted.
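For reference, one common form of this conversion is the full-range ITU-R BT.601 transform used by JPEG/JFIF. The text does not say which variant the embodiment uses, so the coefficients below are an assumption, and the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG/JFIF) RGB -> YCbCr for 8-bit values (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

A typical light skin tone such as (224, 172, 150) maps to Cr above 128 and Cb below 128. This is consistent with the "large Cr, small Cb" property used in the cluster selection step.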
[0044]
Next, a histogram of the pixels over Cb and Cr is generated (step 10-2). For example, FIGS. 14 to 16 show the histograms for the images shown in FIGS. 11, 12 and 13, respectively. Here, hair pixels are marked with crosses and skin pixels with dots. As these examples show, the Cb-Cr histogram allows the pixels to be decomposed into a plurality of clusters (two in this example).
[0045]
Next, one cluster is selected (step 10-3). As the examples of FIGS. 14 to 16 show, skin color regions generally have large Cr and small Cb, so the cluster with this property is selected.
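Steps 10-2 and 10-3 might be sketched with a two-cluster k-means in the (Cb, Cr) plane. The text does not name a clustering algorithm, so several choices here are assumptions of this sketch:

- the use of k-means itself;
- its deterministic initialisation;
- the rule that the cluster whose centre has the largest Cr - Cb is the skin cluster.

The function names are hypothetical.

```python
def _dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_two_clusters(points, iterations=20):
    """Two-cluster k-means on (cb, cr) points; returns (assignments, centres)."""
    # Deterministic start: the points with the smallest and largest Cr - Cb
    centres = [min(points, key=lambda p: p[1] - p[0]),
               max(points, key=lambda p: p[1] - p[0])]
    for _ in range(iterations):
        assign = [0 if _dist2(p, centres[0]) <= _dist2(p, centres[1]) else 1 for p in points]
        new_centres = []
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centres.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centres.append(centres[k])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    assign = [0 if _dist2(p, centres[0]) <= _dist2(p, centres[1]) else 1 for p in points]
    return assign, centres


def select_skin_cluster(points):
    """Flag points in the cluster with larger Cr and smaller Cb (max Cr - Cb centre)."""
    assign, centres = kmeans_two_clusters(points)
    skin = max((0, 1), key=lambda k: centres[k][1] - centres[k][0])
    return [a == skin for a in assign]
```

The flags returned by `select_skin_cluster` correspond to step 10-4. Pixels flagged `True` are the ones kept before isolated-pixel removal.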
[0046]
Next, pixels belonging to the selected cluster are extracted (step 10-4). Next, noise is removed by removing isolated pixels from the extracted pixels (step 10-5). Thus, face area candidates can be extracted (step 10-6).
[0047]
As described above, according to the present embodiment, the face region and the hair region are classified using clusters based on the Cb and Cr values, so candidate images of the face region can be extracted accurately.
[0048]
(Extraction of face area)
The steps above extract skin color regions based on color information. However, skin regions other than the face (for example, arms) are difficult to exclude with this processing alone. The face area detection unit 6 therefore extracts (selects) the face area. The procedure will be described with reference to FIG. 17.
[0049]
First, it is determined whether the shape of the region is an ellipse within a predetermined range (step 17-1). That is, it is determined whether the ratio (minor-axis length / major-axis length) lies within a predetermined range (for example, greater than 0.4). If it is outside the range, the region is not set as a face area (step 17-2).
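The ellipse test of step 17-1 can be approximated with second-order moments. For the ellipse with the same moments as the region, the minor-to-major axis ratio equals the square root of the ratio of the covariance eigenvalues. Fitting the ellipse by moments is an assumption of this sketch, since the text gives only the ratio criterion and the example bound 0.4. The function names are hypothetical.

```python
import math

def axis_ratio(pixels):
    """Minor/major axis ratio of the ellipse having the region's second moments."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix
    half_trace = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max, lam_min = half_trace + spread, half_trace - spread
    if lam_max <= 0.0:
        return 0.0  # degenerate region (a single pixel)
    # Axis lengths scale with the square roots of the eigenvalues
    return math.sqrt(max(lam_min, 0.0) / lam_max)


def is_face_shaped(pixels, min_ratio=0.4):
    """Step 17-1: accept the region if minor/major exceeds min_ratio."""
    return axis_ratio(pixels) > min_ratio
```

`pixels` is the list of (x, y) coordinates of one region. An elongated region such as an arm yields a small ratio and is rejected at step 17-2.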
[0050]
If the region is an ellipse within the predetermined range, it is determined whether both eyes are present (step 17-3).

First, eye candidate regions are obtained from the color information. Normally, the above clustering of the YCbCr signal distinguishes eye regions from the skin color region in the same way as it does hair regions. Eye regions can then be separated from hair regions using structural information such as size.

It is then determined whether the obtained eye regions satisfy predetermined rules. These are structural rules such as:

- whether the eye regions lie above the minor axis;
- whether they lie on a line almost parallel to the minor axis;
- whether the two eyes are on opposite sides of the major axis rather than on the same side.

In this manner, the presence or absence of eye regions is checked, using the regions extracted from the color information as candidates. If eye regions are present, the face area candidate is set as a face area (step 17-4).
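The structural eye rules can be expressed in the face ellipse's own coordinate frame. In this hypothetical sketch, the caller supplies three inputs:

- the region centre;
- a unit vector along the major axis pointing toward the top of the head;
- the centroids of two eye candidates.

The 15-degree tolerance for "almost parallel" is an illustrative value, not one taken from the text.

```python
import math

def eyes_satisfy_rules(center, up, eye1, eye2, max_tilt_deg=15.0):
    """Check the structural eye rules in the face ellipse's coordinate frame.

    center: region centroid (x, y); up: unit vector along the major axis
    pointing toward the top of the head; eye1/eye2: eye-candidate centroids.
    """
    ux, uy = up
    vx, vy = -uy, ux  # unit vector along the minor axis

    def to_face_frame(p):
        dx, dy = p[0] - center[0], p[1] - center[1]
        return dx * ux + dy * uy, dx * vx + dy * vy  # (along major, along minor)

    a1, b1 = to_face_frame(eye1)
    a2, b2 = to_face_frame(eye2)
    above_minor_axis = a1 > 0 and a2 > 0
    # Eye-to-eye line nearly parallel to the minor axis
    parallel = abs(a1 - a2) <= math.tan(math.radians(max_tilt_deg)) * abs(b1 - b2)
    # Eyes on opposite sides of the major axis
    opposite_sides = b1 * b2 < 0
    return above_minor_axis and parallel and opposite_sides
```

For an upright face in image coordinates (y increasing downward), `up` would be (0, -1).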
[0051]
If the eye regions cannot be detected from the color information in step 17-3, detection based on luminance information is attempted next. Here, low-luminance regions are extracted as candidates and judged by structural information such as their size and arrangement. If the eye regions are detected in this way, the face area candidate is set as a face area (step 17-4).
[0052]
If the eye area cannot be detected by the above procedure, the face area candidate is not set as a face area (step 17-2).
[0053]
Thus, according to the present embodiment, a face region can be extracted from a target image. Once the face region is determined, a face image can be obtained from the pixels included in the region.
[0054]
Those skilled in the art can easily implement each of the embodiments described above on a computer. The program for doing so can be stored on any computer-readable recording medium, such as a hard disk (HD), flexible disk (FD), CD, or MO.
[0055]
The description of each of the above embodiments is merely an example, and does not show a configuration essential to the present invention. The configuration of each part is not limited to the above as long as the purpose of the present invention can be achieved.
Further, as will be obvious to those skilled in the art, the specific means for realizing each unit (including functional blocks) of each embodiment may be hardware, software, a network, a combination of these, or any other means. Several functional blocks may be combined into one, and the function of one functional block may be realized by several functional blocks working together.
[0056]
【Effect of the Invention】
According to the present invention, it is possible to provide a method that can accurately extract a skin color region even when the conditions of the target image vary.
[Brief description of the drawings]
FIG. 1 is a block diagram schematically showing an apparatus for implementing a skin color region extraction method according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a procedure for generating a threshold value in the skin color region extraction method according to one embodiment of the present invention.
FIG. 3 shows graphs of the per-pixel histogram with respect to hue (H) and saturation (S): (a) two-dimensional, (b) one-dimensional for hue only, and (c) one-dimensional for saturation only.
FIG. 4 shows graphs of the per-pixel histogram with respect to hue (H) and saturation (S) approximated by a Gaussian mixture model: (a) two-dimensional, (b) one-dimensional for hue only, and (c) one-dimensional for saturation only.
FIG. 5 is a graph for explaining a threshold value generation method, in which the vertical axis represents the amount of pixel increase and the horizontal axis represents the threshold value (likelihood value).
FIG. 6 illustrates the number of pixels corresponding to threshold candidates. Panels (a) and (b) show examples of the pixels whose likelihood values are equal to or greater than two different threshold candidates.
FIG. 7 is a flowchart illustrating a pixel extraction method based on a threshold.
FIG. 8 shows example images illustrating the process of extracting a skin color region from a target image: (a) the target image, (b) the regions extracted using the threshold (shown in white), (c) the regions after median filtering, (d) after removal of small regions, (e) after morphological processing, (f) after hole filling, and (g) the image within the narrowed-down regions.
FIG. 9 is a flowchart illustrating a procedure for narrowing down skin area candidates.
FIG. 10 is a flowchart illustrating a procedure for separating a hair part and a face part from each other.
FIG. 11 illustrates the procedure for separating the hair part from the face part: (a) an example of a target image, (b) the extracted skin region candidate, and (c) a histogram of the pixels in the skin region candidate, with saturation on the vertical axis and hue on the horizontal axis.
FIG. 12 illustrates the procedure for separating the hair part from the face part: (a) another example of a target image, (b) the extracted skin region candidate, and (c) a histogram of the pixels in the skin region candidate, with saturation on the vertical axis and hue on the horizontal axis.
FIG. 13 illustrates the procedure for separating the hair part from the face part: (a) still another example of a target image, (b) the extracted skin region candidate, and (c) a histogram of the pixels in the skin region candidate, with saturation on the vertical axis and hue on the horizontal axis.
FIG. 14 is a histogram of the pixels in the skin region candidate shown in FIG. 11, with Cr on the vertical axis and Cb on the horizontal axis.
FIG. 15 is a histogram of the pixels in the skin region candidate shown in FIG. 12, with Cr on the vertical axis and Cb on the horizontal axis.
FIG. 16 is a histogram of pixels in the skin region candidate shown in FIG. 13, where the vertical axis is Cr and the horizontal axis is Cb.
FIG. 17 is a flowchart illustrating a procedure for extracting a face area from face area candidates.

Claims

A method for extracting a skin color region, comprising the following steps:
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in the target image into one-dimensional information using the corresponding values in the Gaussian models, and then setting as a threshold the value of the one-dimensional information at which the increase in the number of pixels, as that value is changed, is approximately minimal;
(4) extracting a skin color region from the target image by selecting, from the target image, pixels having the one-dimensional information included in the range determined by the threshold value.

The extraction method according to claim 1, wherein the one-dimensional information is a value in the Gaussian models, namely the maximum of the values corresponding to the two color difference signals of a pixel in the target image.

The extraction method according to claim 1 or 2, wherein:
the increase in the number of pixels in step (3) is counted over pixels converted into one-dimensional information by the same Gaussian model;
the threshold is associated with information identifying that Gaussian model; and
the pixels selected from the target image in step (4) are selected, from among the pixels converted into one-dimensional information by the same Gaussian model, using the threshold associated with that Gaussian model.

A method for extracting a skin color region, comprising the following steps:
(1) decomposing each pixel in all or a part of the target image into a plurality of clusters using the values of Cb and Cr in the pixel;
(2) setting as the skin color region the region in which the pixels belonging to one particular cluster are located, namely the cluster that has larger Cr and smaller Cb than the other clusters.

The extraction method according to claim 4, wherein the partial area is a skin color area extracted by the extraction method according to any one of claims 1 to 3.

The extraction method according to claim 4 or 5, wherein the target image is obtained by conversion from RGB space to YCbCr space.

The extraction method according to any one of claims 4 to 6, wherein the number of the plurality of clusters is two.

An extraction method comprising a step of extracting a skin color region that is a face region, using structural information of the skin color region extracted by the extraction method according to any one of claims 1 to 7.

A method for extracting a skin color region, comprising the following steps:
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) generating a threshold based on a value in the Gaussian model corresponding to the color difference signal in a pixel in the target image;
(4) extracting a skin color region from the target image by selecting the pixels included in the range determined by the threshold value from the target image.

A computer program for causing a computer to execute the steps according to any one of claims 1 to 9.

A computer-readable recording medium on which the computer program according to claim 10 is recorded.

An apparatus for extracting a skin color region, comprising a threshold generation unit and a skin region candidate extraction unit, wherein the threshold generation unit executes the following steps (1) to (3) and the skin region candidate extraction unit executes the following step (4):
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in the target image into one-dimensional information using the corresponding values in the Gaussian models, and then setting as the threshold the threshold candidate at which the increase in the number of pixels, as the value of the one-dimensional information is changed, is approximately minimal;
(4) extracting a skin color region from the target image by selecting, from the target image, pixels having the one-dimensional information included in the range determined by the threshold value.

An apparatus for extracting a skin color region, comprising a threshold generation unit that generates a threshold for judging pixels in a target image, wherein the threshold generation unit executes the following steps (1) to (3):
(1) For each pixel in the skin color region in the sample image, generating a histogram for two color difference signals;
(2) approximating the histogram with a Gaussian mixture model having a plurality of Gaussian models;
(3) converting the two color difference signals of each pixel in the target image into one-dimensional information using the corresponding values in the Gaussian models, and then setting as the threshold the threshold candidate at which the increase in the number of pixels, as the value of the one-dimensional information is changed, is approximately minimal.

An apparatus for extracting a skin color region, comprising a face region candidate extraction unit that executes the following steps:
(1) decomposing each pixel in all or a part of the target image into a plurality of clusters using the values of Cb and Cr in each pixel;
(2) setting as the skin color region the region in which the pixels belonging to one particular cluster are located, namely the cluster that has larger Cr and smaller Cb than the other clusters.

The extraction apparatus according to any one of claims 12 to 14, further comprising a face region extraction unit, wherein the face region extraction unit executes a step of extracting a skin color region that is a face region, using structural information of the skin color region.