JP3902887B2

JP3902887B2 - Lip extraction method

Info

Publication number: JP3902887B2
Application number: JP15859799A
Authority: JP
Inventors: 浩志古山
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1999-06-04
Filing date: 1999-06-04
Publication date: 2007-04-11
Anticipated expiration: 2019-06-04
Also published as: JP2000348173A

Description

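[0001]
[Technical Field of the Invention]
The present invention relates to a method for extracting the lip portion from an image containing a person's face.
[0002]
[Prior Art]
Attempts have long been made to detect the movement of a speaker's lips and to use the result to improve the accuracy of speech recognition. Detecting lip movement requires accurate extraction of the lip portion from the speaker's face image.
[0003]
One known method for extracting the lip portion from a face image is described in "A method for extracting lip feature points from face images", Proceedings of the 1990 IEICE Spring National Convention, D-329, p. 7-81.
[0004]
FIG. 10 is a block diagram outlining a conventional lip image extraction apparatus.
[0005]
First, the speaker's face image is input to a background separation unit 1001. The background separation unit 1001 applies a Sobel (edge extraction) operator to the luminance information of the input face image to extract edges. It then separates everything outside the outermost edge of the face image as background, which yields a facial image. The background-separated facial image is input to a YIQ color system conversion unit 1002.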
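The background separation step of the conventional apparatus can be sketched in pure Python as below. This is a minimal illustration built on assumptions the text does not fix. The image is a list of rows of (R, G, B) tuples. Luminance uses the ITU-R BT.601 weights, and the gradient magnitude is approximated as |Gx| + |Gy|. `edge_threshold` is a hypothetical tuning value. "Outside the outermost edge" is read row by row: pixels left of the left-most edge and right of the right-most edge become background.

```python
def luminance(rgb):
    """BT.601 luma (assumed; the text only says 'luminance information')."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def sobel_magnitude(gray):
    """|Gx| + |Gy| of the 3x3 Sobel operator; border pixels get magnitude 0."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            mag[y][x] = abs(gx) + abs(gy)
    return mag


def separate_background(image, edge_threshold=100.0):
    """Return a mask (True = kept face area, False = background).

    In each row, pixels between the left-most and right-most strong edge
    (inclusive) are kept. Rows without any edge are treated as background.
    """
    gray = [[luminance(p) for p in row] for row in image]
    mag = sobel_magnitude(gray)
    mask = []
    for row in mag:
        edges = [x for x, m in enumerate(row) if m > edge_threshold]
        if edges:
            left, right = edges[0], edges[-1]
            mask.append([left <= x <= right for x in range(len(row))])
        else:
            mask.append([False] * len(row))
    return mask
```

Because the Sobel response straddles a boundary, the kept band extends about one pixel beyond the bright object on each side.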
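[0006]
To determine the lip candidate region, the YIQ color system conversion unit 1002 converts the background-separated face image into the YIQ color system. The converted image is input to a lip candidate region determination unit 1003.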
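The YIQ conversion that produces the Q axis can be sketched as follows. The coefficients are the commonly quoted NTSC values. The conventional apparatus may have used slightly different ones, so treat them as an assumption. The sample pixel values in the check are hypothetical. They illustrate that lip-like reds tend to have a higher Q value than skin tones, and the lip candidate determination relies on exactly that property.

```python
def rgb_to_yiq(r, g, b):
    """NTSC RGB -> YIQ using the commonly quoted coefficients (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q


def q_image(image):
    """Map an image (rows of (R, G, B) tuples) to its Q-axis values."""
    return [[rgb_to_yiq(*p)[2] for p in row] for row in image]
```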
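[0007]
Among the colors in a face image, the lip color normally has the largest Q-axis value. Using this property, the lip candidate region determination unit 1003 computes a cumulative histogram of density values along the Q axis. It then determines the lip candidate region by thresholding this cumulative histogram at the value x% down from the highest density value. The image within the lip candidate region is input to a lip extraction unit 1004.
[0008]
Here, x is set automatically by (Equation 1):
x = (s / (m + n)) × ratio × 100 (%) ... (Equation 1)
where m × n is the number of pixels in the original image, s is the number of pixels in the background-separated image, and ratio is the ratio of the lip area to the face area, determined empirically.
[0009]
To extract the lip portion from the image in the lip candidate region, the lip extraction unit 1004 again computes a cumulative histogram of density values along the Q axis and applies a second threshold on the Q-axis value. This yields the lip image.
[0010]
This threshold is likewise taken from the cumulative histogram as the value y% given by (Equation 2):
y = (s' / (m + n)) × level × 100 (%) ... (Equation 2)
where s' is the number of pixels in the lip candidate region and level is the ratio of the lip area to the area of the lip candidate region, which is also determined empirically.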
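Equations 1 and 2 and the top-x% cut of the cumulative Q histogram can be sketched as follows. One assumption matters here. The printed formulas divide by (m + n), but m × n is defined as the pixel count of the original image. This sketch therefore divides by m * n, so that s / (m * n) is the face's share of the image. The function names are hypothetical.

```python
def percent_x(s, m, n, ratio):
    """Equation 1, reading the denominator as the pixel count m * n (assumption)."""
    return s / (m * n) * ratio * 100.0


def percent_y(s_prime, m, n, level):
    """Equation 2, with the same reading of the denominator."""
    return s_prime / (m * n) * level * 100.0


def top_percent_threshold(values, percent):
    """Value at or above which roughly `percent` % of `values` lie.

    This matches cutting a cumulative histogram accumulated from the
    highest value downward.
    """
    if not values:
        raise ValueError("empty input")
    ordered = sorted(values, reverse=True)
    k = min(len(ordered), max(1, round(len(ordered) * percent / 100.0)))
    return ordered[k - 1]
```

For example, a 100 × 100 image with a 5000-pixel face and ratio = 0.1 gives x = 5%. The Q threshold is then the value that keeps the top 5% of pixels.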
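[0011]
In this way, the lip portion can be extracted from the input face image of the speaker using a color component histogram and thresholding.
[0012]
[Problems to Be Solved by the Invention]
However, the conventional method extracts the lips from color components, and color components are easily affected by lighting conditions.
[0013]
In addition, the thresholds are set with preset coefficients. As a result, lip extraction accuracy is easily affected by individual differences between speakers, such as sex and whether makeup is worn.
[0014]
The present invention was made in view of these points. Its object is to extract the lip portion from a face image accurately, independently of ambient conditions such as lighting and of individual differences between speakers.
[0015]
[Means for Solving the Problems]
To this end, the lip extraction method of the present invention comprises: a step of determining, from a face image, a first lip candidate region containing the lips; a step of determining, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a step of creating a color component histogram of the first lip candidate region; a step of creating a color component histogram of the second lip candidate region; a step of creating a difference histogram between the color component histogram of the first lip candidate region and that of the second lip candidate region; a step of attributing the peak of the difference histogram to the color component of the skin around the lips and setting, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a step of attributing the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracting, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
[0016]
With this configuration, both lip candidate regions contain the lip portion, and the second region covers more of the skin around the lips than the first. The difference histogram is therefore the color component histogram of the skin around the lips. Setting the threshold from the distribution of the difference histogram makes it possible to locate accurately the color component boundary between the skin and the lips of the particular speaker.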
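The core of the method can be sketched as below. This is a minimal, non-authoritative illustration built on three assumptions. First, the color component values of each region (for example R − G − B) have already been computed as integers. Second, the difference histogram is the second region's histogram minus the first's; the second region contains the first, so what remains is the surrounding skin. Third, `coeff` is a hypothetical preset coefficient.

```python
from collections import Counter


def lip_threshold(inner_values, outer_values, coeff):
    """Return (threshold, lip_peak) from the two candidate-region histograms.

    inner_values: color-component values of the first (inner) lip candidate region.
    outer_values: values of the second region, which contains the first one.
    coeff: preset coefficient applied to the color value at the skin peak.
    """
    h1, h2 = Counter(inner_values), Counter(outer_values)
    # Counter subtraction keeps only positive counts, so this is the
    # histogram of pixels present in the outer region but not the inner one.
    diff = h2 - h1
    if not diff:
        raise ValueError("second region adds no pixels beyond the first")
    skin_value = max(diff, key=diff.get)   # peak of the difference histogram = skin
    lip_peak = h1.most_common(1)[0][0]     # peak of the inner histogram = lips
    return skin_value * coeff, lip_peak


def extract_lip(inner_values, threshold, lip_peak):
    """Mark values lying on the lip-peak side of the threshold as lip pixels."""
    if lip_peak >= threshold:
        return [v >= threshold for v in inner_values]
    return [v <= threshold for v in inner_values]
```

On the R − G − B axis, skin often falls at negative values. In that case a coefficient below 1 moves the threshold toward zero, that is, toward the lip peak. The sign convention of the chosen axis decides which coefficient values make sense.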
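[0017]
Moreover, the first and second lip candidate regions used to set the threshold come from the same image. This also prevents the loss of extraction accuracy that ambient conditions such as lighting would otherwise cause.
The lip extraction apparatus of the present invention comprises: a first lip candidate region determination unit that determines, from a face image, a first lip candidate region containing the lips; a second lip candidate region determination unit that determines, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a color component histogram creation unit that creates a color component histogram of the first lip candidate region and a color component histogram of the second lip candidate region; a threshold setting unit that creates a difference histogram between the two color component histograms, attributes the peak of the difference histogram to the color component of the skin around the lips, and sets, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a lip extraction unit that attributes the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracts, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
[0018]
[Embodiments of the Invention]
A lip extraction method according to a first aspect of the present invention comprises: a step of determining, from a face image, a first lip candidate region containing the lips; a step of determining, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a step of creating a color component histogram of the first lip candidate region; a step of creating a color component histogram of the second lip candidate region; a step of creating a difference histogram between the color component histogram of the first lip candidate region and that of the second lip candidate region; a step of attributing the peak of the difference histogram to the color component of the skin around the lips and setting, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a step of attributing the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracting, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
[0019]
Setting the threshold from the distribution of the difference histogram makes it possible to locate accurately the color component boundary between the skin and the lips of the particular speaker. Moreover, both lip candidate regions come from the same image, which also prevents the loss of extraction accuracy that ambient conditions such as lighting would otherwise cause.
[0020]
In a second aspect of the present invention, in the lip extraction method of the first aspect, the color component used for the color component histograms is the RGB quantity
(R − a·G − b·B)
where a and b are preset coefficients.
[0021]
With this configuration, the color component histograms can be created easily from the RGB components of the input face image.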
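Computing the (R − a·G − b·B) component from RGB pixels takes one line per pixel, as in this sketch. The default a = b = 1 reproduces the (R − G − B) axis used for FIG. 4. The pixel tuples and function names are assumptions for illustration.

```python
def rgb_component(pixel, a=1.0, b=1.0):
    """Color component R - a*G - b*B of one (R, G, B) pixel."""
    r, g, bl = pixel
    return r - a * g - b * bl


def component_values(pixels, a=1.0, b=1.0):
    """Apply the component to a flat list of pixels, ready for histogramming."""
    return [rgb_component(p, a, b) for p in pixels]
```

With integer RGB values and a = b = 1, the results are whole numbers and can be histogrammed directly. Non-integer a and b would require quantizing the values into bins first.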
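[0022]
In a third aspect of the present invention, in the lip extraction method of the first or second aspect, the area ratio of the first lip candidate region to the second lip candidate region is greater than 1:1 and less than 1:3. In other words, the second region is more than one and less than three times the area of the first.
[0023]
This area ratio allows an accurate threshold to be set, so the lips can also be extracted accurately.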
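One way to build a second lip candidate region that meets the 1:1 to 1:3 area constraint is to grow the first rectangle about its center. Each side is scaled by the square root of the desired ratio. The text does not say how the second region is shaped, so this rectangle-scaling approach, the half-open (x0, y0, x1, y1) convention, and the function name are all assumptions.

```python
import math


def second_region(rect, area_ratio, width, height):
    """Grow the first lip candidate rectangle about its center.

    rect = (x0, y0, x1, y1) with half-open pixel bounds; area_ratio must lie in (1, 3).
    Each side is scaled by sqrt(area_ratio), the bounds are rounded outward,
    and the result is clipped to the image.
    """
    if not 1.0 < area_ratio < 3.0:
        raise ValueError("area ratio must be between 1 and 3 (exclusive)")
    x0, y0, x1, y1 = rect
    s = math.sqrt(area_ratio)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    hw, hh = (x1 - x0) * s / 2, (y1 - y0) * s / 2
    return (max(0, math.floor(cx - hw)), max(0, math.floor(cy - hh)),
            min(width, math.ceil(cx + hw)), min(height, math.ceil(cy + hh)))
```

Rounding outward and clipping to the image mean the realized ratio can differ slightly from the requested one. For a 20 × 10 rectangle and a requested ratio of 2, the sketch returns a 30 × 16 region, a ratio of 2.4.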
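[0024]
In a fourth aspect of the present invention, in the lip extraction method of any of the first to third aspects, the step of determining the first lip candidate region comprises: a step of determining a provisional lip candidate region from the face image; a step of creating a color component histogram of the provisional lip candidate region; a step of extracting the lip portion from the provisional lip candidate region and binarizing it by thresholding its color component histogram, where the threshold is the average of the thresholds used when lip portions were previously cut out of the face images of multiple speakers; and a step of taking, as the first lip candidate region, a new lip candidate region determined by cutting out a rectangle containing the extracted lip portion.
[0025]
Determining the first lip candidate region from the provisional lip candidate region in this way allows the first lip candidate region to be extracted accurately. Because the lip image is then extracted using this accurately determined first lip candidate region, the lips themselves are also extracted accurately.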
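The fourth aspect refines the first lip candidate region by applying an averaged threshold, binarizing, and taking the enclosing rectangle. It might look like the sketch below. The sketch assumes that lip pixels lie above the threshold on the chosen color axis, as with R − G − B. The stored per-speaker thresholds and all names are hypothetical.

```python
def average_threshold(thresholds):
    """Average of thresholds collected earlier from several speakers."""
    return sum(thresholds) / len(thresholds)


def binarize(values, threshold):
    """values: 2-D list of color-component values; lips assumed at or above threshold."""
    return [[v >= threshold for v in row] for row in values]


def bounding_rectangle(mask):
    """Smallest half-open (x0, y0, x1, y1) rectangle containing all True pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, on in enumerate(row) if on]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```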
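[0026]
In a fifth aspect of the present invention, in the lip extraction method of any of the first to third aspects, the step of determining the first lip candidate region comprises: a step of determining a provisional lip candidate region from the face image; a step of creating a color component histogram of the provisional lip candidate region; a step of comparing the color component histogram of the provisional lip candidate region with the color component histograms of lip extraction templates, created in advance by cutting lip portions out of the face images of multiple speakers, and then extracting and binarizing the lip portion of the provisional region by thresholding its color component histogram with the threshold of the most similar template; and a step of taking, as the first lip candidate region, a new lip candidate region determined by cutting out a rectangle containing the extracted lip portion.
[0027]
Determining the first lip candidate region from the provisional lip candidate region in this way allows the first lip candidate region to be extracted accurately. Because the lip image is then extracted using this accurately determined first lip candidate region, the lips themselves are also extracted accurately.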
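For the fifth aspect, the text does not name the similarity measure used to compare histograms. The sketch below assumes normalized histogram intersection, which scores 1.0 for identical distributions. Each template is represented as a hypothetical (histogram, threshold) pair.

```python
def normalized(hist):
    """Scale a {value: count} histogram so that its counts sum to 1."""
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()}


def intersection(h1, h2):
    """Histogram intersection of two normalized histograms (1.0 = identical)."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))


def pick_template_threshold(candidate_hist, templates):
    """Return the threshold of the template most similar to the candidate region."""
    cand = normalized(candidate_hist)
    _, best_threshold = max(templates, key=lambda t: intersection(cand, normalized(t[0])))
    return best_threshold
```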
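[0028]
In a sixth aspect of the present invention, in the lip extraction method of any of the first to fifth aspects, the color component used for the color component histograms is obtained as follows. From the color distribution of the extracted lip portion and that of the extracted area around the lips, a function that divides the RGB color space into a lip side and a surrounding side,
a·R + b·G + c·B + d = 0
is found, and the color component of the histograms created for lip extraction is
a·R + b·G + c·B.
[0029]
A function that divides the space in two in this way makes it easy to separate the color distribution of the extracted lip portion from that of its surroundings. In addition, the coefficients (a, b, c) of the color component (a·R + b·G + c·B) used for threshold setting are determined from this dividing function, and the color component histograms are built with these coefficients. This allows the lip portion to be extracted more accurately.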
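The sixth aspect, and Embodiment 4, obtain the plane a·R + b·G + c·B + d = 0 by discriminant analysis. A common choice for this is Fisher's linear discriminant, sketched below in pure Python on that assumption. The normal (a, b, c) is the inverse within-class scatter matrix applied to the difference of the class means. The offset d places the plane at the midpoint of the means. The small `ridge` term is a hypothetical safeguard that keeps the scatter matrix invertible.

```python
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def scatter(points, mu):
    """Sum of outer products (p - mu)(p - mu)^T over the points."""
    s = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mu[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                s[i][j] += d[i] * d[j]
    return s


def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve3(m, v):
    """Solve m x = v for a 3x3 system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("singular scatter matrix")
    out = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]
        out.append(det3(mk) / d)
    return out


def fisher_plane(lip_pixels, skin_pixels, ridge=1e-6):
    """Return (a, b, c, d); a*R + b*G + c*B + d is positive on the lip side."""
    mu_l, mu_s = mean(lip_pixels), mean(skin_pixels)
    sl, ss = scatter(lip_pixels, mu_l), scatter(skin_pixels, mu_s)
    sw = [[sl[i][j] + ss[i][j] + (ridge if i == j else 0.0) for j in range(3)]
          for i in range(3)]
    a, b, c = solve3(sw, [mu_l[i] - mu_s[i] for i in range(3)])
    d = -sum(w * (mu_l[i] + mu_s[i]) / 2 for i, w in enumerate((a, b, c)))
    return a, b, c, d
```

Histograms are then built on a·R + b·G + c·B. As the text notes for Embodiment 4, the R − G − B axis corresponds to a = 1, b = −1, c = −1.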
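[0030]
In a seventh aspect of the present invention, the lip extraction method of any of the first to sixth aspects further comprises a step of converting the image of the first lip candidate region into a luminance component, extracting edges and binarizing the result, and outputting, as the lip extraction image, the logical OR of this binarized image and the image in which the lip portion was extracted by color component.
[0031]
Performing lip extraction with both the color component and the luminance component in this way allows the lip portion to be extracted more accurately.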
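The seventh aspect combines a luminance edge image with the color-based lip mask. That combination can be sketched as below. Several details are assumptions: the luminance weights (BT.601), the Sobel operator with |Gx| + |Gy| magnitude, and the binarization `threshold`. The text itself only calls for an edge operator followed by binarization.

```python
def edge_mask(image, threshold):
    """Binarized Sobel edges of the luminance of an image of (R, G, B) tuples."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]
    h, w = len(gray), len(gray[0])
    out = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy) > threshold
    return out


def combine(color_mask, edges):
    """Pixel-wise logical OR of the color-based lip mask and the edge mask."""
    return [[c or e for c, e in zip(cr, er)] for cr, er in zip(color_mask, edges)]
```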
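A lip extraction apparatus according to an eighth aspect of the present invention comprises: a first lip candidate region determination unit that determines, from a face image, a first lip candidate region containing the lips; a second lip candidate region determination unit that determines, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a color component histogram creation unit that creates a color component histogram of the first lip candidate region and a color component histogram of the second lip candidate region; a threshold setting unit that creates a difference histogram between the two color component histograms, attributes the peak of the difference histogram to the color component of the skin around the lips, and sets, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a lip extraction unit that attributes the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracts, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
In a ninth aspect of the present invention, in the lip extraction apparatus of the eighth aspect, the coefficients a and b are set in advance, and the color component of the color component histograms is the RGB quantity (R − a·G − b·B).
In a tenth aspect of the present invention, in the lip extraction apparatus of the eighth or ninth aspect, the area ratio of the first lip candidate region to the second lip candidate region is greater than 1:1 and less than 1:3.
[0032]
Embodiments of the present invention are described in detail below with reference to the drawings.
[0033]
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus according to the present invention. The speech recognition apparatus of Embodiment 1 is described with reference to this figure.
[0034]
The speech recognition apparatus 101 comprises an image input unit 102, an image processing unit 103, a speech input unit 104, and a speech recognition unit 105.
[0035]
The image input unit 102 outputs a face image obtained by photographing the speaker's face. A CCD camera, for example, can serve as the image input unit 102.
[0036]
The image processing unit 103 extracts a lip image from the face image input from the image input unit 102 and outputs it. The image processing unit 103 comprises a facial image extraction unit 106 and a lip image extraction unit 107.
[0037]
The facial image extraction unit 106 applies a Sobel (edge extraction) operator to the luminance information of the face image input from the image input unit 102 to extract edges. It then separates the image information outside the outermost edge from the face image as background, producing a facial image. The facial image extraction unit 106 converts the facial image into the YIQ color system and inputs it to the lip image extraction unit 107.
[0038]
In this embodiment, the facial image extraction unit 106 extracts the facial image from the face image by applying a Sobel (edge extraction) operator to the face image's luminance information. Other methods may also be used, and embodiments using other methods are included in the present invention.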
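[0039]
The lip image extraction unit 107 extracts a lip image from the input facial image and outputs it to the speech recognition unit 105.
[0040]
The speech input unit 104 outputs speech picked up by a sound collection device, such as a microphone, to the speech recognition unit 105.
[0041]
The speech recognition unit 105 recognizes the speech input from the speech input unit 104 with the help of the lip image input from the image processing unit 103, and outputs the recognition result.
[0042]
The lip image extraction unit 107 is the feature of the present invention, because it improves lip extraction accuracy. It is described in detail below.
[0043]
FIG. 2 is a block diagram showing the configuration of the lip image extraction unit according to Embodiment 1.
[0044]
A first lip candidate region determination unit 201 determines the first lip candidate region from the input facial image of the speaker and from information for extracting the lips. Possible determination methods include the conventional method described above. Another is region extraction by pattern matching with a genetic algorithm and a set of face (or lip) extraction template vectors prepared in advance. That method is described in "Face image extraction using color information and GA, and its application to personal identification", Proceedings of the 2nd Image Sensing Symposium, A-1, pp. 1-6. Other techniques may also be used to determine the first lip candidate region.
[0045]
A second lip candidate region determination unit 202 determines, from the input facial image, a second lip candidate region that contains the whole of the first lip candidate region and the area around it. The area ratio of the first to the second lip candidate region is greater than 1:1 and less than 1:3.
[0046]
A color component histogram creation unit 203 creates the color component histogram of the first lip candidate region and that of the second lip candidate region. In FIG. 4, the horizontal axis is the (R − G − B) component of the RGB color system. Other color components may also be used. One option is the (R − G) component. Others are the R/G component, the R/(G·B) component, and the (R − a·G − b·B) component with preset coefficients a and b; the last makes the histograms easy to create from the RGB components of the input face image. The Q component of the YIQ color system used in the conventional example, or the H or S component of the HSV color system, may also be used to improve the results.
[0047]
A threshold setting unit 204 creates a difference histogram by subtracting one of the two candidate-region histograms from the other. It then sets, as the threshold, the color component value obtained by multiplying the color component value at the peak of this difference histogram by a preset coefficient.
[0048]
A lip extraction unit 205 extracts the lip portion using the histogram of the first lip candidate region and the threshold determined by the threshold setting unit 204, and outputs it as a lip image.
[0049]
FIG. 3 is a schematic diagram illustrating the first and second lip candidate regions according to Embodiment 1.
[0050]
Reference numeral 301 denotes the lip portion. 302 is the first lip candidate region determined by the first lip candidate region determination unit 201. 303 is the second lip candidate region, determined so as to contain the whole of the first lip candidate region and the area around it.
[0051]
FIG. 4 shows three histograms for Embodiment 1: the color component histogram (frequency distribution) of the first lip candidate region, that of the second lip candidate region, and the difference histogram between the two.
[0052]
The horizontal axis is the color component (the (R − G − B) component of the RGB color system) and the vertical axis is its frequency. 401 is the color component histogram of the first lip candidate region 302, 402 is that of the second lip candidate region 303, and 403 is the difference histogram between them. 404 is the first peak and 405 the second peak of the color component histogram of the second lip candidate region.
[0053]
The operation of the lip image extraction unit according to Embodiment 1 is described below with reference to FIGS. 2, 3 and 4.
[0054]
First, the facial image of the speaker and information for extracting the lips are input to the first lip candidate region determination unit 201. This unit determines the first lip candidate region 302 from the facial image and outputs it to the color component histogram creation unit 203.
[0055]
Next, the second lip candidate region determination unit 202 determines, from the input facial image, a new region containing the whole of the first lip candidate region 302 and the area around it. It outputs this region, the second lip candidate region 303, to the color component histogram creation unit 203.
[0056]
The color component histogram creation unit 203 creates the color component histogram 401 of the extracted first lip candidate region 302 and the color component histogram 402 of the second lip candidate region 303.
[0057]
As FIG. 4 shows, the color component histogram 402 of the second lip candidate region 303 has two peaks, a first peak 404 and a second peak 405. The color component histogram 401 of the first lip candidate region 302 has a peak at the same position as the second peak 405.
[0058]
The second peak 405 appears in both histogram 401 and histogram 402 and is due to the color component of the lip portion 301. The first peak 404 appears clearly only in histogram 402 of the second lip candidate region 303 and is due to the color component of the skin around the lips.
[0059]
Therefore, the difference histogram 403 between the color component histogram 401 of the first lip candidate region 302 and the color component histogram 402 of the second lip candidate region 303 isolates the color component histogram of the skin around the lips.
[0060]
The threshold setting unit 204 creates this difference histogram 403, the skin color component histogram, finds its peak, and finds the color component value corresponding to that peak. It then multiplies this color component value by a preset coefficient, sets the result as the threshold, and outputs it to the lip extraction unit 205.
[0061]
The lip extraction unit 205 considers the color components in histogram 401 of the first lip candidate region. Those lying on the second-peak-405 side of the threshold from the threshold setting unit 204 form the lip portion. The unit extracts that region and outputs the lip image.
[0062]
The area ratio of the first lip candidate region 302 to the second lip candidate region 303 is greater than 1:1 and less than 1:3. With this ratio the threshold can be found accurately, so the lips can be extracted accurately. The area ratio may, however, be changed to other values according to the surrounding environment and individual differences.
[0063]
Thus, in the lip image extraction unit of Embodiment 1, both lip candidate regions contain the lip portion, and the second covers more of the skin around the lips than the first. The difference histogram is therefore the color component histogram of the skin around the lips. Setting the threshold from its distribution makes it possible to locate accurately the color component boundary between the skin and the lips of the particular speaker.
[0064]
Moreover, the first and second lip candidate regions used to set the threshold come from the same image. This also prevents the loss of extraction accuracy that ambient conditions such as lighting would otherwise cause.
[0065]
Consequently, the lip image extraction unit of Embodiment 1 can extract the lip portion unaffected by lighting conditions or by individual differences between speakers, such as sex and whether makeup (lipstick) is worn. This holds even when these factors change the distribution of the color component histograms.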
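[0066]
(Embodiment 2)
FIG. 5 is a block diagram showing the configuration of the lip image extraction unit according to Embodiment 2 of the present invention. Its configuration is described in detail with reference to this figure.
[0067]
The lip image extraction unit of Embodiment 2 differs from that of Embodiment 1 only in that it uses a provisional lip candidate region to determine the first lip candidate region. Parts identical to those already described are given the same reference numerals.
[0068]
A provisional lip candidate region determination unit 501 operates in the same way as the first lip candidate region determination unit 201 of Embodiment 1. It extracts a lip candidate region and treats this region as the provisional lip candidate region.
[0069]
A color component histogram creation unit 502 creates a histogram of the color component (R − G − B) of the provisional lip candidate region. As with the color component histogram creation unit 203, other color components may be used.
[0070]
A threshold storage unit 503 stores an averaged threshold. Lip image portions were first extracted from the face images of multiple speakers by the lip image extraction unit of Embodiment 1. The stored value is the average of the thresholds (color component values) used in those extractions.
[0071]
A lip extraction unit 504 extracts the lip portion using the color component histogram of the provisional lip candidate region and the threshold stored in the threshold storage unit 503. It binarizes the result and outputs it as a lip image.
[0072]
A first lip candidate region determination unit 505 cuts out a rectangle containing the lip portion input from the lip extraction unit 504 and takes this rectangle as the new first lip candidate region.
[0073]
The operation of the lip image extraction unit according to Embodiment 2 is described below.
[0074]
First, the facial image of the speaker whose lips are to be extracted is input, together with lip information, to the provisional lip candidate region determination unit 501. This unit determines a provisional lip candidate region in the facial image and outputs it to the color component histogram creation unit 502.
[0075]
Next, the color component histogram creation unit 502 creates a histogram of the color component (R − G − B) of the input provisional lip candidate region and outputs it to the lip extraction unit 504.
[0076]
The lip extraction unit 504 extracts the lip portion using two inputs: the color component histogram of the provisional lip candidate region from the color component histogram creation unit 502, and the threshold stored in the threshold storage unit 503. It binarizes the result and outputs it to the first lip candidate region determination unit 505.
[0077]
The first lip candidate region determination unit 505 cuts out a rectangle containing the extracted lip portion and takes this region as the first lip candidate region.
[0078]
The subsequent operation is the same as in Embodiment 1 and is not described again.
[0079]
As described above, the lip image extraction unit of Embodiment 2 determines the first lip candidate region from the provisional lip candidate region. This allows the first lip candidate region to be extracted accurately. Because the lip image is then extracted using this accurately determined region, the lip image itself is also extracted accurately.
[0080]
(Embodiment 3)
FIG. 6 is a block diagram showing the configuration of the lip image extraction unit according to Embodiment 3 of the present invention. Its configuration is described with reference to this figure. Parts identical to those already described are given the same reference numerals.
[0081]
The lip image extraction unit of Embodiment 3 differs from that of Embodiment 1 only in that it uses a provisional lip candidate region to determine the first lip candidate region.
[0082]
A lip extraction template 601 stores color component histograms and their thresholds. These were created in advance by cutting lip portions out of the face images of multiple speakers.
[0083]
A lip extraction unit 602 compares the color component histograms of the lip extraction templates 601 with the color component histogram of the provisional lip candidate region. It then extracts the lip portion from the image of the provisional lip candidate region using the threshold of the most similar lip extraction template.
[0084]
The operation of the lip extraction unit according to Embodiment 3 is described below.
[0085]
First, the color component histogram of the provisional lip candidate region is created by the same method as in Embodiment 2 and input to the lip extraction unit 602.
[0086]
Next, the lip extraction unit 602 compares the color component histograms of the lip extraction templates 601 with the color component histogram of the provisional lip candidate region. It then extracts the lip portion from the image of the provisional lip candidate region using the threshold of the most similar lip extraction template 601, and outputs it.
[0087]
The subsequent operation is the same as in Embodiment 1 or Embodiment 2 and is not described again.
[0088]
As described above, the lip image extraction unit of Embodiment 3 determines the first lip candidate region using the provisional lip candidate region and the lip extraction templates. This allows the first lip candidate region to be extracted accurately. Because the lip image is then extracted using this accurately determined region, the lip image itself is also extracted accurately.
[0089]
(Embodiment 4)
FIG. 7 is a block diagram showing the configuration of the lip image extraction unit according to Embodiment 4 of the present invention. Its configuration is described with reference to this figure. Parts identical to those already described are given the same reference numerals.
[0090]
A provisional lip portion color distribution creation unit 701 creates, in RGB space, the distribution of the color components of the portion extracted as lips.
[0091]
A provisional lip surroundings color distribution creation unit 702 creates, in RGB space, the distribution of the color components of the area around the lips.
[0092]
A color component parameter determination unit 703 finds the discriminant function
a·R + b·G + c·B + d = 0
used to decide, in RGB color space, whether a given color (R, G, B) belongs to the color distribution of the portion extracted as lips or to that of the area around the lips. The discriminant function can be obtained by discriminant analysis. Its coefficients (a, b, c) determine the color component parameter (a·R + b·G + c·B) for the first lip candidate region 302 and the second lip candidate region 303. The color component (R − G − B) of Embodiment 1 corresponds to a = 1, b = −1, c = −1.
[0093]
A color component histogram creation unit 704 uses the coefficients (a, b, c) to create the color component (a·R + b·G + c·B) histograms of the first lip candidate region 302 and the second lip candidate region 303.
[0094]
FIG. 8 shows, for Embodiment 4, the color distribution of the portion extracted as lips and that of the area around the lips, plotted in RGB color space.
[0095]
801 is the color distribution of the extracted provisional lip portion in RGB color space, and 802 is that of the area around the provisional lips. 803 is the plane of the discriminant function (a·R + b·G + c·B + d = 0). This plane decides whether a given color (R, G, B) belongs to the provisional-lip set or to the provisional-surroundings set.
[0096]
The operation of the lip image extraction unit according to Embodiment 4 is described below.
[0097]
First, the lip portion is extracted by the same procedure as in Embodiment 1.
[0098]
Next, the provisional lip portion color distribution creation unit 701 treats this lip portion as the provisional lip portion, creates its color distribution, and outputs it. Likewise, the provisional lip surroundings color distribution creation unit 702 creates and outputs the color distribution of the area around the provisional lips.
[0099]
The color component parameter determination unit 703 then performs discriminant analysis in (RGB) color space on the input color distributions 801 (provisional lip portion) and 802 (area around the provisional lips), finding the discriminant function
a·R + b·G + c·B + d = 0
It then uses the coefficients (a, b, c) to determine and output the color component parameter (a·R + b·G + c·B) for the first lip candidate region 302 and the second lip candidate region 303.
[0100]
Using the coefficients (a, b, c) obtained in this way, the color component histogram creation unit 704 creates and outputs the color component histogram of the first lip candidate region 302. It also creates and outputs the color component (a·R + b·G + c·B) histogram of the second lip candidate region 303.
[0101]
The subsequent operation is the same as already described and is not repeated.
[0102]
As described above, the lip image extraction unit of Embodiment 4 uses the discriminant function of discriminant analysis to determine the coefficients (a, b, c) of the color component (a·R + b·G + c·B) used to set the threshold. Building the color component histograms with this component allows the lip portion to be extracted more accurately.
[0103]
In Embodiment 4, the function dividing the lip portion from its surroundings is the discriminant function obtained by discriminant analysis, but other functions may be used. One example is the function (a·R + b·G + c·B + d = 0) of the plane that passes through the midpoint between the centroids of the provisional lip color distribution 801 and the provisional-surroundings color distribution 802 and is orthogonal to the line joining the two centroids. The coefficients (a, b, c) of such non-discriminant functions may naturally also be used.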
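The alternative dividing plane described above passes through the midpoint of the two centroids and is orthogonal to the line joining them. It can be computed directly. In this sketch (names hypothetical) the normal is the lip centroid minus the skin centroid, so a·R + b·G + c·B + d is positive on the lip side.

```python
def centroid_plane(lip_pixels, skin_pixels):
    """Plane through the midpoint of the two centroids, normal to the line joining them.

    Returns (a, b, c, d) for a*R + b*G + c*B + d = 0.
    """
    n_l, n_s = len(lip_pixels), len(skin_pixels)
    cl = [sum(p[i] for p in lip_pixels) / n_l for i in range(3)]
    cs = [sum(p[i] for p in skin_pixels) / n_s for i in range(3)]
    a, b, c = (cl[i] - cs[i] for i in range(3))
    mid = [(cl[i] + cs[i]) / 2 for i in range(3)]
    d = -(a * mid[0] + b * mid[1] + c * mid[2])
    return a, b, c, d
```

Compared with a discriminant analysis, this plane ignores the spread of each distribution. It is cheaper to compute but can place the boundary less well when one class is much more scattered than the other.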
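[0104]
(Embodiment 5)
FIG. 9 is a block diagram showing the configuration of the lip image extraction unit according to Embodiment 5 of the present invention. It is described in detail with reference to this figure. Parts identical to those already described are given the same reference numerals.
[0105]
A luminance conversion unit 901 converts the image data of the first lip candidate region into image data expressed as luminance values.
[0106]
An edge extraction unit 902 applies an edge operator to the luminance image data to extract edges, then binarizes the result.
[0107]
An image composition unit 903 takes the logical OR of the lip image extracted with the color component histograms and the edge-extracted image, and outputs the result as the lip image.
[0108]
The operation of the lip image extraction unit according to Embodiment 5 is described below.
[0109]
The lip extraction using the color component histograms is the same as in Embodiment 1 and is not described again.
[0110]
After the first lip candidate region determination unit 201 has extracted the first lip candidate region, the luminance conversion unit 901 converts the image data of that region into image data expressed as luminance values and outputs it.
[0111]
Next, the edge extraction unit 902 applies an edge operator to the luminance image data to extract edges, binarizes the result, and outputs it.
[0112]
The image composition unit 903 then takes the logical OR of two images: the lip image extracted with the color component histograms as in Embodiment 1, and the edge-extracted image from the edge extraction unit 902. It outputs the result as the lip image.
[0113]
As described above, the lip image extraction unit of Embodiment 5 extracts the lips from the luminance component as well as from the color component. This allows the lip portion to be extracted more accurately.
[0114]
As explained above, the lip extraction method of the present invention extracts the lip portion accurately, unaffected by conditions such as lighting or by individual differences between speakers.
[0115]
Accuracy can be raised further by combining Embodiments 1 to 5 as appropriate.
[0116]
Furthermore, the configuration of the lip image extraction unit according to the present invention is not limited to FIGS. 2, 5 to 7, and 9. Processing can be sped up through parallelization, for example by providing more instances of a processing unit such as the color component histogram creation unit. Conversely, the unit can be made smaller by having a single processing unit, such as the threshold setting unit or the color component histogram creation unit, perform its processing several times.
[0117]
The lip image extraction unit has been described as applied to a speech recognition apparatus. It can also be applied to other apparatuses, and such applications are included in the present invention.
[0118]
[Effects of the Invention]
As explained above, the present invention makes it possible to extract the lip portion accurately, unaffected by conditions such as lighting or by individual differences between speakers.
[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the configuration of the speech recognition apparatus according to the present invention
[FIG. 2] Block diagram showing the configuration of the lip image extraction unit according to Embodiment 1 of the present invention
[FIG. 3] Schematic diagram illustrating the first and second lip candidate regions according to Embodiment 1
[FIG. 4] Diagram showing, for Embodiment 1, the color component histogram of the first lip candidate region, that of the second lip candidate region, and the difference histogram between them
[FIG. 5] Block diagram showing the configuration of the lip image extraction unit according to Embodiment 2 of the present invention
[FIG. 6] Block diagram showing the configuration of the lip image extraction unit according to Embodiment 3 of the present invention
[FIG. 7] Block diagram showing the configuration of the lip image extraction unit according to Embodiment 4 of the present invention
[FIG. 8] Diagram showing, in RGB color space and for Embodiment 4, the color distribution of the portion extracted as lips and that of the area around the lips
[FIG. 9] Block diagram showing the configuration of the lip image extraction unit according to Embodiment 5 of the present invention
[FIG. 10] Block diagram outlining a conventional lip image extraction apparatus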
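[Description of Reference Numerals]
101 Speech recognition apparatus
102 Image input unit
103 Image processing unit
104 Speech input unit
105 Speech recognition unit
106 Facial image extraction unit
107 Lip image extraction unit
201, 505 First lip candidate region determination unit
202 Second lip candidate region determination unit
203, 502 Color component histogram creation unit
204 Threshold setting unit
205, 504, 602 Lip extraction unit
501 Provisional lip candidate region determination unit
503 Threshold storage unit
601 Lip extraction template
701 Provisional lip portion color distribution creation unit
702 Provisional lip surroundings color distribution creation unit
703 Color component parameter determination unit
704 Color component histogram creation unit
901 Luminance conversion unit
902 Edge extraction unit
903 Image composition unit
[0001]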
[Technical field of the invention]
  The present invention relates to a method for extracting a lip portion from an image including a human face.
[0002]
[Prior art]
  Conventionally, attempts have been made to detect the movement information of the speaker's lips and improve the recognition accuracy of voice recognition using the detection result. In order to detect lip movement information, it is necessary to accurately extract the lip portion from the face image of the speaker.
[0003]
  One known method for extracting the lip portion from a face image is described in "A method for extracting lip feature points from a face image", Proceedings of the 1990 IEICE Spring National Convention, D-329, p. 7-81.
[0004]
  FIG. 10 is a block diagram showing an outline of a conventional lip image extraction apparatus.
[0005]
  First, a speaker's face image is input to the background separation unit 1001. The background separation unit 1001 applies a Sobel (edge extraction) operator to the input face image, using its luminance information, to extract edges. Next, the background separation unit 1001 separates the part of the face image outside its outermost edge as background, which yields a facial image. The background-separated facial image is input to the YIQ color system conversion unit 1002.
[0006]
  The YIQ color system conversion unit 1002 performs color conversion of the face image from which the background is separated into the YIQ color system in order to determine the lip candidate region. The color-converted image is input to the lip candidate area determination unit 1003.
[0007]
  Among the colors in a face image, the lip color normally shows the largest Q-axis value. Using this property, the lip candidate region determination unit 1003 takes a cumulative histogram of density values along the Q axis. Next, it determines the lip candidate region by thresholding this cumulative histogram at the value x% down from the highest density value. The image in the lip candidate region is input to the lip extraction unit 1004.
[0008]
  At this time, x is automatically set from (Equation 1).
x = (s / (m + n)) × ratio × 100 (%) ... (Equation 1)
Here, m × n is the number of pixels of the original image, s is the number of pixels of the background separation image, and ratio represents the area ratio of the lip portion to the area of the face and is determined empirically.
[0009]
  In order to extract the lip portion from the image in the lip candidate region, the lip extraction unit 1004 takes a cumulative histogram of density values with respect to the Q axis and performs threshold processing again with the Q axis value. In this way, the lip image is extracted.
[0010]
  Similarly, the threshold value at this time is a value of y% given by (Equation 2) from the cumulative histogram.
y = (s' / (m + n)) × level × 100 (%) ... (Equation 2)
Here, s' represents the number of pixels in the lip candidate region, and level represents the ratio of the area of the lip portion to the area of the lip candidate region, which is also determined empirically.
[0011]
  As described above, the lip portion can be extracted from the input face image of the speaker by the color component histogram and the threshold processing.
[0012]
[Problems to be solved by the invention]
  However, in the conventional lip extraction method, lips are extracted using color components, but there is a problem that the color components are easily affected by illumination conditions.
[0013]
  In addition, since a preset coefficient is used for setting the threshold value, there is a problem that the accuracy of lip extraction is easily affected by individual differences among speakers such as gender differences and the presence or absence of makeup.
[0014]
  The present invention has been made in view of such a point, and an object thereof is to accurately extract a lip portion from a face image without depending on ambient conditions such as illumination or individual differences among speakers.
[0015]
[Means for Solving the Problems]
  To this end, the lip extraction method of the present invention comprises: a step of determining, from a face image, a first lip candidate region containing the lips; a step of determining, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a step of creating a color component histogram of the first lip candidate region; a step of creating a color component histogram of the second lip candidate region; a step of creating a difference histogram between the color component histogram of the first lip candidate region and that of the second lip candidate region; a step of attributing the peak of the difference histogram to the color component of the skin around the lips and setting, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a step of attributing the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracting, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
[0016]
  With this configuration, both the first and the second lip candidate region contain the lip portion, and the second region covers more of the skin around the lips than the first. The difference histogram is therefore the color component histogram of the skin around the lips. Setting the threshold from the distribution of the difference histogram makes it possible to locate accurately the color component boundary between the skin and the lips of the target speaker.
[0017]
  In addition, the first and second lip candidate regions extracted for threshold setting come from the same image. This also prevents the loss of lip extraction accuracy that ambient conditions such as illumination would otherwise cause.
  The lip extraction apparatus of the present invention comprises: a first lip candidate region determination unit that determines, from a face image, a first lip candidate region containing the lips; a second lip candidate region determination unit that determines, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a color component histogram creation unit that creates a color component histogram of the first lip candidate region and a color component histogram of the second lip candidate region; a threshold setting unit that creates a difference histogram between the two color component histograms, attributes the peak of the difference histogram to the color component of the skin around the lips, and sets, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a lip extraction unit that attributes the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracts, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
[0018]
DETAILED DESCRIPTION OF THE INVENTION
  The lip extraction method according to the first aspect of the present invention comprises: a step of determining, from a face image, a first lip candidate region containing the lips; a step of determining, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a step of creating a color component histogram of the first lip candidate region; a step of creating a color component histogram of the second lip candidate region; a step of creating a difference histogram between the color component histogram of the first lip candidate region and that of the second lip candidate region; a step of attributing the peak of the difference histogram to the color component of the skin around the lips and setting, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a step of attributing the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracting, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
[0019]
  Setting the threshold from the distribution of the difference histogram in this way makes it possible to locate accurately the color component boundary between the skin and the lips of the target speaker. In addition, the first and second lip candidate regions extracted for threshold setting come from the same image. This also prevents the loss of lip extraction accuracy that ambient conditions such as illumination would otherwise cause.
[0020]
  According to a second aspect of the present invention, in the lip extraction method according to the first aspect, the color component used for the color component histograms is the RGB quantity
(R − a·G − b·B)
where a and b are preset coefficients.
[0021]
  With this configuration, it is possible to easily create a color component histogram from the RGB components of the input face image.
[0022]
  According to a third aspect of the present invention, in the lip extraction method according to the first or second aspect, the area ratio of the first lip candidate region to the second lip candidate region is greater than 1:1 and less than 1:3.
[0023]
  By setting such an area ratio, a highly accurate threshold can be set, so that the extraction of lips can be performed with high accuracy.
[0024]
  According to a fourth aspect of the present invention, in the lip extraction method according to any one of the first to third aspects, the step of determining the first lip candidate region comprises: a step of determining a provisional lip candidate region from the face image; a step of creating a color component histogram of the provisional lip candidate region; a step of extracting the lip portion from the provisional lip candidate region and binarizing it by thresholding its color component histogram, where the threshold is the average of the thresholds used when lip portions were previously cut out of the face images of multiple speakers; and a step of taking, as the first lip candidate region, a new lip candidate region determined by cutting out a rectangle containing the extracted lip portion.
[0025]
  As described above, the first lip candidate region can be accurately extracted by determining the first lip candidate region using the temporary lip candidate region. Furthermore, since the lip image is extracted using the first lip candidate region extracted with high accuracy in this way, the lips can be extracted with high accuracy.
[0026]
  According to a fifth aspect of the present invention, in the lip extraction method according to any one of the first to third aspects, the step of determining the first lip candidate region comprises: a step of determining a provisional lip candidate region from the face image; a step of creating a color component histogram of the provisional lip candidate region; a step of comparing the color component histogram of the provisional lip candidate region with the color component histograms of lip extraction templates, created in advance by cutting lip portions out of the face images of multiple speakers, and then extracting and binarizing the lip portion of the provisional region by thresholding its color component histogram with the threshold of the most similar template; and a step of taking, as the first lip candidate region, a new lip candidate region determined by cutting out a rectangle containing the extracted lip portion.
[0027]
  As described above, the first lip candidate region can be accurately extracted by determining the first lip candidate region using the temporary lip candidate region. Furthermore, since the lip image is extracted using the first lip candidate region extracted with high accuracy in this way, the lips can be extracted with high accuracy.
[0028]
  According to a sixth aspect of the present invention, in the lip extraction method according to any one of the first to fifth aspects, the color component used for the color component histograms is obtained as follows. From the color distribution of the extracted lip portion and that of the extracted area around the lips, a function that divides the RGB color space into a lip side and a surrounding side,
  a·R + b·G + c·B + d = 0
is found, and the color component of the histograms created for lip extraction is
  a·R + b·G + c·B.
[0029]
  A function that divides the space in two in this way makes it easy to separate the color distribution of the extracted lip portion from that of the extracted area around the lips. In addition, the coefficients (a, b, c) of the color component (a·R + b·G + c·B) used for threshold setting are determined from this dividing function, and the color component histograms are built with these coefficients. This allows the lip portion to be extracted with higher accuracy.
[0030]
  According to a seventh aspect of the present invention, the lip extraction method according to any one of the first to sixth aspects further comprises a step of converting the image of the first lip candidate region into a luminance component, extracting edges and binarizing the result, and outputting, as the lip extraction image, the logical OR of this binarized image and the image in which the lip portion was extracted by color component.
[0031]
  By performing lip extraction using the color component and the luminance component in this manner, it is possible to extract the lip portion with higher accuracy.
The lip extraction apparatus according to an eighth aspect of the present invention comprises: a first lip candidate region determination unit that determines, from a face image, a first lip candidate region containing the lips; a second lip candidate region determination unit that determines, from the face image, a second lip candidate region containing the whole of the first lip candidate region and the area around it; a color component histogram creation unit that creates a color component histogram of the first lip candidate region and a color component histogram of the second lip candidate region; a threshold setting unit that creates a difference histogram between the two color component histograms, attributes the peak of the difference histogram to the color component of the skin around the lips, and sets, as a threshold, the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and a lip extraction unit that attributes the peak of the color component histogram of the first lip candidate region to the color component of the lip portion and extracts, as the lip portion, the region of color components lying on the lip-peak side of the threshold.
According to a ninth aspect of the present invention, in the lip extraction apparatus according to the eighth aspect, the coefficients a and b are set in advance, and the color component of the color component histograms is the RGB quantity (R − a·G − b·B).
According to a tenth aspect of the present invention, in the lip extraction apparatus according to the eighth aspect or the ninth aspect, an area ratio of the first lip candidate region to the second lip candidate region is larger than 1: 1. It is smaller than 1: 3.
[0032]
  Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0033]
  (Embodiment 1)
  FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus according to the present invention. The speech recognition apparatus according to the first embodiment will be described with reference to FIG. 1.
[0034]
  The voice recognition device 101 includes an image input unit 102, an image processing unit 103, a voice input unit 104, and a voice recognition unit 105.
[0035]
  The image input unit 102 outputs a face image obtained by photographing the speaker's face. A CCD camera, for example, can serve as the image input unit 102.
[0036]
  The image processing unit 103 extracts a lip image from the face image input from the image input unit 102 and outputs the lip image. The image processing unit 103 includes a face image extracting unit 106 and a lip image extracting unit 107.
[0037]
  The face image extraction unit 106 applies a Sobel (edge extraction) operator to the face image input from the image input unit 102, using its luminance information, to extract edges. Next, the face image extraction unit 106 treats the part of the image outside the outermost edge as background, separates it, and produces a face-region image. Finally, the face image extraction unit 106 converts the face-region image to the YIQ color system and inputs it to the lip image extraction unit 107.
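The YIQ conversion step can be illustrated with a minimal per-pixel sketch. The coefficients below are the commonly used NTSC approximations and are an assumption, since the text does not state them; `rgb_to_yiq` is a hypothetical helper name.

```python
def rgb_to_yiq(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel to YIQ (assumed NTSC approximation coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    i = 0.596 * r - 0.274 * g - 0.322 * b  # in-phase chrominance
    q = 0.211 * r - 0.523 * g + 0.312 * b  # quadrature chrominance (the Q axis)
    return y, i, q


if __name__ == "__main__":
    # Illustrative pixel values only: a reddish, lip-like tone and a paler skin-like tone.
    for name, rgb in [("lip-like", (180, 80, 90)), ("skin-like", (220, 170, 150))]:
        y, i, q = rgb_to_yiq(*rgb)
        print(f"{name}: Y={y:.1f} I={i:.1f} Q={q:.1f}")
```

With these sample values the lip-like pixel has the larger Q value (about 24 versus about 4), which is the property the conventional Q-axis thresholding relies on.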
[0038]
  In the present embodiment, the face image extraction unit 106 extracts the face-region image by applying a Sobel (edge extraction) operator using the luminance information of the face image. Other methods may be used, however, and embodiments using other methods are also included in the present invention.
[0039]
  The lip image extraction unit 107 extracts a lip image from the input face image and outputs it to the voice recognition unit 105.
[0040]
  The voice input unit 104 outputs the voice collected by a sound collection device such as a microphone to the voice recognition unit 105.
[0041]
  The voice recognition unit 105 recognizes the speech input from the voice input unit 104, using the lip image input from the image processing unit 103, and outputs a recognition result.
[0042]
  Hereinafter, the lip image extraction unit 107 with enhanced lip extraction accuracy, which is a feature of the present invention, will be described in detail.
[0043]
  FIG. 2 is a block diagram of a configuration of the lip image extraction unit according to the first embodiment.
[0044]
  The first lip candidate region determining unit 201 determines the first lip candidate region from the input face image of the speaker and information for extracting the lips. Possible methods for determining the first lip candidate region include the method of the conventional example described above, and the region extraction method described in Proceedings of the 2nd Image Sensing Symposium, A-1, pp. 1-6, "Face image extraction using color information and GA, and application to personal verification", which performs pattern matching using a genetic algorithm and a set of face (or lip) extraction template vectors prepared in advance. Techniques other than these may also be used to determine the first lip candidate region.
[0045]
  The second lip candidate region determination unit 202 determines, from the input face image, a second lip candidate region that includes all of the first lip candidate region and its periphery. The area ratio between the first lip candidate region and the second lip candidate region is larger than 1:1 and smaller than 1:3; that is, the second region has more than one and less than three times the area of the first.
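One way to construct such a second region is to enlarge the first region's bounding rectangle about its center. The sketch below is an assumption, not the patent's procedure: `Box` and `second_candidate` are hypothetical names, regions are taken to be axis-aligned rectangles, and each side is scaled by the square root of the target area ratio.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


def second_candidate(first: Box, ratio: float, width: int, height: int) -> Box:
    """Grow `first` about its center so its area is roughly `ratio` times larger.

    `ratio` is the second:first area ratio, which the text bounds by 1 < ratio < 3.
    """
    if not 1.0 < ratio < 3.0:
        raise ValueError("area ratio must lie strictly between 1 and 3")
    scale = math.sqrt(ratio)  # same scale on both axes gives the area ratio
    cx = (first.x0 + first.x1) / 2
    cy = (first.y0 + first.y1) / 2
    half_w = (first.x1 - first.x0) * scale / 2
    half_h = (first.y1 - first.y0) * scale / 2
    # Round outward so the first region stays fully inside, then clip to the image.
    return Box(
        max(0, math.floor(cx - half_w)),
        max(0, math.floor(cy - half_h)),
        min(width, math.ceil(cx + half_w)),
        min(height, math.ceil(cy + half_h)),
    )
```

Because the enlarged rectangle is clipped to the image, a first region lying near the image border can end up with a smaller ratio than requested.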
[0046]
  The color component histogram creation unit 203 creates a color component histogram for the first lip candidate region and a color component histogram for the second lip candidate region. In FIG. 4, the horizontal axis is the (R−G−B) component of the RGB color system. Other color components can also be used. The (R−G) component, the R/G component, the R/(G·B) component, and the (R−a·G−b·B) component with preset coefficients a and b can all be computed directly from the RGB components of the input face image, so a color component histogram is easy to create with any of them. The effect can also be increased by using the Q component of the YIQ color system shown in the conventional example, or the H or S component of the HSV color system.
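The RGB-derived components listed above can be computed per pixel as in the following sketch. `make_components` is a hypothetical helper, and the small epsilon that guards the ratio components against division by zero is an added assumption.

```python
from typing import Callable

Pixel = tuple[int, int, int]  # (R, G, B)


def make_components(a: float = 1.0, b: float = 1.0) -> dict[str, Callable[[Pixel], float]]:
    """Return per-pixel color component functions; a and b are the preset coefficients."""
    eps = 1e-6  # assumed guard against division by zero in the ratio components
    return {
        "R-G": lambda p: float(p[0] - p[1]),
        "R/G": lambda p: p[0] / (p[1] + eps),
        "R/(G*B)": lambda p: p[0] / (p[1] * p[2] + eps),
        "R-aG-bB": lambda p: p[0] - a * p[1] - b * p[2],
    }


if __name__ == "__main__":
    comps = make_components()  # a = b = 1 gives the (R - G - B) component
    for pixel in [(180, 80, 90), (220, 170, 150)]:
        print(pixel, {name: round(f(pixel), 3) for name, f in comps.items()})
```

Returning closures keeps the coefficients a and b fixed once, so the same component function can be applied to every pixel of both candidate regions.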
[0047]
  The threshold value setting unit 204 creates a difference histogram by taking the difference between the histogram of the second lip candidate region and the histogram of the first lip candidate region, finds the color component value corresponding to the peak of the difference histogram, and sets as the threshold the value obtained by multiplying that color component value by a preset coefficient.
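The threshold-setting step can be sketched as follows, assuming the color component values of each region have already been computed (for example as R − G − B). The bin count, value range, use of bin centers, and the names `histogram` and `set_threshold` are illustrative assumptions. Because the second region contains the whole first region, subtracting the first histogram from the second leaves the histogram of the surrounding skin band.

```python
def histogram(values: list[float], bins: int, lo: float, hi: float) -> list[int]:
    """Count values into `bins` equal-width bins over [lo, hi), clamping outliers."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) // width)
        counts[min(max(k, 0), bins - 1)] += 1
    return counts


def set_threshold(first_vals: list[float], second_vals: list[float], coef: float,
                  bins: int = 64, lo: float = -255.0, hi: float = 255.0) -> float:
    """Threshold = (component value at the difference-histogram peak) * coef."""
    h1 = histogram(first_vals, bins, lo, hi)
    h2 = histogram(second_vals, bins, lo, hi)
    diff = [c2 - c1 for c1, c2 in zip(h1, h2)]  # skin color around the lips
    peak_bin = max(range(bins), key=lambda k: diff[k])
    peak_value = lo + (peak_bin + 0.5) * (hi - lo) / bins  # bin center
    return peak_value * coef


if __name__ == "__main__":
    first = [10.0] * 50 + [-100.0] * 10   # mostly lip values
    second = first + [-100.0] * 100       # plus the surrounding skin band
    print(set_threshold(first, second, coef=0.5))
```

The coefficient 0.5 in the usage example is an arbitrary value chosen so that the threshold falls between the skin and lip values of this toy data; the patent leaves the coefficient as a preset parameter.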
[0048]
  The lip extraction unit 205 extracts a lip part from the histogram of the first lip candidate region and the threshold value determined by the threshold value setting unit 204, and outputs it as a lip image.
[0049]
  FIG. 3 is a schematic diagram for explaining a first lip candidate region and a second lip candidate region according to the first embodiment.
[0050]
  Reference numeral 301 denotes the lip portion, 302 the first lip candidate region determined by the first lip candidate region determining unit 201, and 303 the second lip candidate region, determined so as to include all of the first lip candidate region and its periphery.
[0051]
  FIG. 4 shows, for the first embodiment, the color component histogram (frequency distribution) of the first lip candidate region, the color component histogram of the second lip candidate region, and the difference histogram between the two.
[0052]
  The horizontal axis is the color component (the (R−G−B) component in the RGB color system), and the vertical axis is the frequency of the color component. Reference numeral 401 denotes the color component histogram of the first lip candidate region 302, 402 the color component histogram of the second lip candidate region 303, and 403 the difference histogram between the color component histograms of the first lip candidate region 302 and the second lip candidate region 303. Reference numeral 404 denotes the first peak of the color component histogram of the second lip candidate region, and 405 denotes its second peak.
[0053]
  Hereinafter, the operation of the lip image extracting unit according to the first embodiment will be described with reference to FIGS. 2, 3, and 4.
[0054]
  First, information for extracting a speaker's face image and lips is input to the first lip candidate region determination unit 201. The first lip candidate area determination unit 201 determines the first lip candidate area 302 from the face image and outputs the first lip candidate area 302 to the color component histogram creation unit 203.
[0055]
  Subsequently, the second lip candidate region determination unit 202 sets, in the input face image of the speaker, a new region including all of the first lip candidate region 302 and its periphery as the second lip candidate region 303, and outputs it to the color component histogram creation unit 203.
[0056]
  The color component histogram creation unit 203 creates the color component histogram 401 of the extracted first lip candidate region 302 and the color component histogram 402 of the second lip candidate region 303.
[0057]
  As can be seen from FIG. 4, the color component histogram 402 of the second lip candidate region 303 has two peaks: a first peak 404 and a second peak 405. The color component histogram 401 of the first lip candidate region 302 has a peak at the same position as the second peak 405.
[0058]
  The second peak 405, which appears in both the color component histogram 401 of the first lip candidate region 302 and the color component histogram 402 of the second lip candidate region 303, is due to the color component of the lip portion 301. The first peak 404, which appears clearly only in the color component histogram 402 of the second lip candidate region 303, is due to the color component of the skin-colored portion around the lips.
[0059]
  Therefore, by creating a difference histogram 403 between the color component histogram 401 of the first lip candidate region 302 and the color component histogram 402 of the second lip candidate region 303, a skin color component histogram around the lips can be extracted. It becomes.
[0060]
  The threshold value setting unit 204 creates a difference histogram 403 that is the skin color component histogram, obtains a peak value thereof, and obtains a color component value corresponding to the peak value. Further, the threshold setting unit 204 sets a color component value obtained by multiplying a color component value corresponding to the peak value by a preset coefficient as a threshold, and outputs the threshold to the lip extraction unit 205.
[0061]
  From the color components of the color component histogram 401 of the first lip candidate region, the lip extraction unit 205 extracts, as the lip portion, the region of color components lying on the second peak 405 side of the threshold determined by the threshold setting unit 204, and outputs a lip image.
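A minimal sketch of this extraction step follows, assuming per-pixel component values for the first region and a threshold from the threshold-setting step. It locates the peak of the first region's histogram, treats it as the lip peak, and keeps the pixels on that side of the threshold; the function name and binning parameters are hypothetical.

```python
def lip_mask(first_vals: list[float], threshold: float,
             bins: int = 64, lo: float = -255.0, hi: float = 255.0) -> list[bool]:
    """Mark pixels of the first region whose component lies on the lip-peak side."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in first_vals:
        counts[min(max(int((v - lo) // width), 0), bins - 1)] += 1
    peak_bin = max(range(bins), key=lambda k: counts[k])
    lip_peak = lo + (peak_bin + 0.5) * width  # assumed to be the lip peak
    # Keep whichever side of the threshold the lip peak falls on.
    if lip_peak > threshold:
        return [v > threshold for v in first_vals]
    return [v < threshold for v in first_vals]
```

Deciding the side from the peak position, rather than always keeping values above the threshold, lets the same routine work for components in which lips score lower than skin.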
[0062]
  Further, the area ratio between the first lip candidate region 302 and the second lip candidate region 303 is larger than 1:1 and smaller than 1:3. Using such an area ratio allows the threshold to be obtained with high accuracy, and the lips can therefore be extracted with high accuracy. Depending on the surrounding environment and individual differences, an area ratio outside this range may also be used.
[0063]
  As described above, in the lip image extraction unit according to the first embodiment, the first lip candidate region and the second lip candidate region both include the lip portion, and the second lip candidate region includes the skin-colored part around the lips over a wider range than the first lip candidate region. The difference histogram is therefore a color component histogram of the skin-colored part around the lips, and by setting the threshold from the distribution of the difference histogram, the boundary between the color components of the target speaker's skin and lips can be found accurately.
[0064]
  In addition, since the first and second lip candidate regions used for setting the threshold are both taken from the same image, degradation of lip extraction accuracy caused by ambient conditions such as illumination can be prevented.
[0065]
  Therefore, with the lip image extraction unit according to the first embodiment, the lip portion can be extracted without being affected by lighting conditions or by speaker-specific factors such as gender or makeup (lipstick), even when these change the distribution of the color component histogram.
[0066]
  (Embodiment 2)
  FIG. 5 is a block diagram showing the configuration of the lip image extraction unit according to the second embodiment of the present invention. The configuration of the lip image extraction unit will be described in detail with reference to FIG. 5.
[0067]
  The lip image extraction unit according to the second embodiment differs from that of the first embodiment only in that the first lip candidate region is determined using a temporary lip candidate region. Parts identical to those already described carry the same reference numerals.
[0068]
  The temporary lip candidate region determining unit 501 performs the same operation as the first lip candidate region determining unit 201 described in the first embodiment, extracts a lip candidate region, and sets this region as a temporary lip candidate region.
[0069]
  The color component histogram creation unit 502 creates a histogram of the (R−G−B) color component of the temporary lip candidate region. As with the color component histogram creation unit 203, other color components may be used.
[0070]
  The threshold storage unit 503 stores a threshold obtained in advance as follows: the lip image extraction unit according to the first embodiment is used to extract the lip portions from the face images of a plurality of speakers, and the thresholds (color component values) used for those extractions are averaged.
[0071]
  The lip extraction unit 504 extracts a lip portion using the color component histogram of the temporary lip candidate region and the threshold value stored in the threshold value storage unit 503, binarizes it, and outputs it as a lip image.
[0072]
  The first lip candidate region determination unit 505 cuts out a rectangular region including the lip portion input from the lip extraction unit 504, and newly determines this region as the first lip candidate region.
[0073]
  The operation of the lip image extraction unit according to the second embodiment will be described below.
[0074]
  First, the face image of the target speaker and information for extracting the lips are input to the temporary lip candidate region determination unit 501. The temporary lip candidate region determination unit 501 determines a temporary lip candidate region in the face image and outputs it to the color component histogram creation unit 502.
[0075]
  Subsequently, the color component histogram creation unit 502 creates a histogram of the (R−G−B) color component of the input temporary lip candidate region and outputs it to the lip extraction unit 504.
[0076]
  The lip extraction unit 504 extracts the lip portion using the color component histogram of the temporary lip candidate region input from the color component histogram creation unit 502 and the threshold stored in the threshold storage unit 503, binarizes it, and outputs the binarized lip portion to the first lip candidate region determination unit 505.
[0077]
  The first lip candidate region determining unit 505 cuts out a rectangle including the extracted lip portion and determines this region as the first lip candidate region.
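Cutting out the rectangle that contains the binarized lip portion amounts to taking the bounding box of the set pixels. The sketch below assumes the mask is a list of rows of booleans and returns exclusive right and bottom edges; `bounding_rectangle` is a hypothetical name, and no margin around the box is modeled.

```python
from typing import Optional


def bounding_rectangle(mask: list[list[bool]]) -> Optional[tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) enclosing all True pixels, or None if there are none."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, on in enumerate(row) if on]
    return min(cols), rows[0], max(cols) + 1, rows[-1] + 1
```

The resulting rectangle would then serve as the new first lip candidate region, from which the first embodiment's procedure continues unchanged.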
[0078]
  Since the following operations are the same as those in the first embodiment, description thereof is omitted.
[0079]
  As described above, according to the lip image extraction unit according to the second embodiment, the first lip candidate region is accurately extracted by determining the first lip candidate region using the temporary lip candidate region. It becomes possible. Furthermore, since the lip image is extracted using the first lip candidate region extracted with high accuracy in this way, the lip image can be extracted with high accuracy.
[0080]
  (Embodiment 3)
  FIG. 6 is a block diagram illustrating the configuration of the lip image extraction unit according to the third embodiment of the present invention. The configuration of the lip image extraction unit according to the third embodiment will be described with reference to FIG. 6. Parts identical to those already described carry the same reference numerals.
[0081]
  Like the second embodiment, the lip image extraction unit according to the third embodiment differs from that of the first embodiment only in that the first lip candidate region is determined using a temporary lip candidate region.
[0082]
  The lip extraction template 601 stores a color component histogram and a threshold value created by cutting out lip portions from a plurality of speaker face images in advance.
[0083]
  The lip extraction unit 602 compares the color component histogram of the lip extraction template 601 with the color component histogram of the temporary lip candidate region. Then, as a result of the comparison, the lip extraction unit 602 extracts a lip image from the image of the temporary lip candidate region using the threshold value of the lip extraction template having the highest similarity.
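The text does not specify how histogram similarity is measured. The sketch below assumes normalized histogram intersection, one common choice; `choose_threshold` and the (histogram, threshold) template layout are hypothetical.

```python
def normalize(h: list[float]) -> list[float]:
    """Scale a histogram so its bins sum to 1 (all zeros stay zeros)."""
    total = sum(h)
    return [c / total for c in h] if total else [0.0] * len(h)


def intersection(h1: list[float], h2: list[float]) -> float:
    """Histogram intersection of normalized histograms: 1.0 means identical shape."""
    return sum(min(p, q) for p, q in zip(normalize(h1), normalize(h2)))


def choose_threshold(candidate: list[float],
                     templates: list[tuple[list[float], float]]) -> float:
    """Return the threshold of the template whose histogram is most similar."""
    _, best_threshold = max(templates, key=lambda t: intersection(candidate, t[0]))
    return best_threshold
```

Normalizing first makes the comparison independent of region size, so a small temporary lip candidate region can still be matched against templates cut from larger images.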
[0084]
  The operation of the lip extraction unit according to the third embodiment will be described below.
[0085]
  First, a color component histogram of the temporary lip candidate region is created and input to the lip extraction unit 602 by the same method as in the second embodiment.
[0086]
  Subsequently, the lip extraction unit 602 compares the color component histogram of the lip extraction template 601 with the color component histogram of the temporary lip candidate region. Then, as a result of the comparison, the lip extraction unit 602 extracts a lip portion from the image of the temporary lip candidate region using the threshold value of the lip extraction template 601 having the highest similarity, and outputs the lip portion.
[0087]
  Since the following operations are the same as those in the first or second embodiment, the description thereof is omitted.
[0088]
  As described above, with the lip image extraction unit according to the third embodiment, the first lip candidate region can be determined accurately by using the temporary lip candidate region and the lip extraction template. Furthermore, since the lip image is extracted using this accurately determined first lip candidate region, the lip image can be extracted with high accuracy.
[0089]
  (Embodiment 4)
  FIG. 7 is a block diagram showing the configuration of the lip image extraction unit according to the fourth embodiment of the present invention. The configuration of the lip image extraction unit according to the fourth embodiment will be described with reference to FIG. 7. Parts identical to those already described carry the same reference numerals.
[0090]
  The temporary lip portion color component distribution creation unit 701 creates the color component distribution of the lip extracted portion in the RGB space.
[0091]
  The color component distribution creation unit 702 around the temporary lip creates a color component distribution around the lip on the RGB space.
[0092]
  The color component parameter determination unit 703 obtains the discriminant function
  a·R + b·G + c·B + d = 0
used to discriminate, in the RGB color space, whether a given color component (R, G, B) belongs to the color component distribution of the extracted lip portion or to that of the lip peripheral portion. The discriminant function can be obtained by discriminant analysis. Using its coefficients (a, b, c), the unit determines the color component parameter (a·R + b·G + c·B) for the first lip candidate region 302 and the second lip candidate region 303. The (R−G−B) color component shown in the first embodiment corresponds to the case a = 1, b = −1, c = −1.
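The patent names discriminant analysis without giving details. As an assumption, the sketch below uses Fisher's linear discriminant with equal priors: the normal (a, b, c) is the inverse of the pooled within-class scatter matrix applied to the difference of class means, and d places the plane at the midpoint of the means. A small ridge term keeps the 3×3 system solvable; all function names are hypothetical.

```python
Point = tuple[float, float, float]  # (R, G, B)


def mean(points: list[Point]) -> list[float]:
    return [sum(p[k] for p in points) / len(points) for k in range(3)]


def scatter(points: list[Point], m: list[float]) -> list[list[float]]:
    """Sum of outer products of deviations from the mean m."""
    s = [[0.0] * 3 for _ in range(3)]
    for p in points:
        dev = [p[k] - m[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                s[i][j] += dev[i] * dev[j]
    return s


def det3(m: list[list[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: list[list[float]], v: list[float]) -> list[float]:
    """Solve m x = v for a 3x3 system by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("singular scatter matrix")
    x = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        x.append(det3(mc) / d)
    return x


def fisher_plane(lip: list[Point], skin: list[Point],
                 ridge: float = 1e-3) -> tuple[float, float, float, float]:
    """Return (a, b, c, d); a*R + b*G + c*B + d is positive at the lip mean."""
    m1, m2 = mean(lip), mean(skin)
    s1, s2 = scatter(lip, m1), scatter(skin, m2)
    sw = [[s1[i][j] + s2[i][j] + (ridge if i == j else 0.0) for j in range(3)]
          for i in range(3)]
    a, b, c = solve3(sw, [m1[k] - m2[k] for k in range(3)])
    mid = [(m1[k] + m2[k]) / 2 for k in range(3)]
    d = -(a * mid[0] + b * mid[1] + c * mid[2])
    return a, b, c, d
```

Only (a, b, c) are needed to form the histogram component a·R + b·G + c·B; d is returned so the plane itself can be checked against the two distributions.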
[0093]
  The color component histogram creation unit 704 uses the coefficients (a, b, c) to calculate the color component (a · R + b · G + c · B) histogram of the first lip candidate region 302 and the color of the second lip candidate region 303. Create a component histogram.
[0094]
  FIG. 8 is a diagram showing the color component distribution of the lip-extracted portion and the color component distribution of the lip peripheral portion according to the fourth embodiment on the RGB color space.
[0095]
  Reference numeral 801 shows the distribution of the color components of the extracted lip portion in the RGB color space, and 802 shows the distribution of the color components of the lip peripheral portion in the RGB color space. Reference numeral 803 denotes the discriminant function (a·R + b·G + c·B + d = 0) for discriminating whether a given color component (R, G, B) belongs to the set of the temporary lip portion or to the set of the lip peripheral portion.
[0096]
  The operation of the lip image extraction unit according to the fourth embodiment will be described below.
[0097]
  First, the lip portion is extracted in the same procedure as in the first embodiment.
[0098]
  Next, the lip portion color component distribution creation unit 701 treats the extracted portion as the lip portion, and creates and outputs its color component distribution. Similarly, the color component distribution creation unit 702 around the lip creates and outputs the color component distribution of the lip periphery.
[0099]
  Next, the color component parameter determination unit 703 performs discriminant analysis in the RGB color space on the input color component distribution 801 of the lip portion and color component distribution 802 of the lip peripheral portion to obtain the discriminant function
  a·R + b·G + c·B + d = 0
and, using its coefficients (a, b, c), determines and outputs the color component parameter (a·R + b·G + c·B) for the first lip candidate region 302 and the second lip candidate region 303.
[0100]
  The color component histogram creation unit 704 creates and outputs a color component histogram of the first lip candidate region 302 using the coefficients (a, b, c) thus obtained. Further, the color component histogram creation unit 704 creates and outputs a color component (a · R + b · G + c · B) histogram of the second lip candidate region 303.
[0101]
  Since the following operations are the same as those already described, the description thereof is omitted.
[0102]
  As described above, with the lip image extraction unit according to the fourth embodiment, the coefficients (a, b, c) of the color component (a·R + b·G + c·B) used for setting the threshold are obtained from the discriminant function of discriminant analysis, and the color component histogram is created using this color component. This makes it possible to extract the lip portion with higher accuracy.
[0103]
  In the fourth embodiment, the discriminant function obtained by discriminant analysis is used as the function that divides the lip portion and the lip peripheral portion into two. Functions other than the discriminant function can also be used: for example, the function a·R + b·G + c·B + d = 0 representing the plane that passes through the midpoint between the centroids of the color component distribution 801 of the extracted temporary lip portion and the color component distribution 802 around the lips, and that is orthogonal to the straight line connecting those centroids. The coefficients (a, b, c) of such functions other than the discriminant function may likewise be used.
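The alternative plane described above, perpendicular to the line joining the two centroids and passing through their midpoint, needs no matrix inversion. A minimal sketch, with hypothetical function names and toy single-pixel distributions in the usage line:

```python
Point = tuple[float, float, float]  # (R, G, B)


def centroid(points: list[Point]) -> list[float]:
    return [sum(p[k] for p in points) / len(points) for k in range(3)]


def bisecting_plane(lip: list[Point], skin: list[Point]) -> tuple[float, float, float, float]:
    """Plane a*R + b*G + c*B + d = 0 bisecting the segment between the centroids."""
    m1, m2 = centroid(lip), centroid(skin)
    a, b, c = (m1[k] - m2[k] for k in range(3))  # normal points toward the lip centroid
    mid = [(m1[k] + m2[k]) / 2 for k in range(3)]
    d = -(a * mid[0] + b * mid[1] + c * mid[2])  # plane passes through the midpoint
    return a, b, c, d


if __name__ == "__main__":
    print(bisecting_plane([(180, 80, 90)], [(220, 170, 150)]))
```

Unlike the discriminant-analysis plane, this one ignores the spread of each distribution, which is simpler but can place the boundary less favorably when one distribution is much wider than the other.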
[0104]
  (Embodiment 5)
  FIG. 9 is a block diagram showing the configuration of the lip image extraction unit according to the fifth embodiment of the present invention. The lip image extraction unit according to the fifth embodiment will be described in detail with reference to FIG. 9. Parts identical to those already described carry the same reference numerals.
[0105]
  The luminance value conversion unit 901 converts the image data of the first lip candidate region into image data represented by luminance values.
[0106]
  The edge extraction unit 902 extracts an edge by applying an edge operator to the image data represented by the luminance value, and further binarizes it.
[0107]
  The image composition unit 903 calculates the logical sum of the lip image extracted using the color component histogram and the edge extraction image, and outputs the result as a lip image.
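The luminance, edge, and logical-sum stages can be sketched on small nested lists as below. The Sobel kernels are standard; the |Gx| + |Gy| magnitude approximation, the fixed binarization threshold, leaving border pixels unset, and all function names are assumptions for illustration.

```python
RGB = tuple[int, int, int]


def luminance(image: list[list[RGB]]) -> list[list[float]]:
    """Luminance Y per pixel (assumed ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def sobel_edges(gray: list[list[float]], threshold: float) -> list[list[bool]]:
    """Binarized Sobel edge map; border pixels are left False."""
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy) > threshold
    return edges


def combine(color_mask: list[list[bool]], edge_mask: list[list[bool]]) -> list[list[bool]]:
    """Pixel-wise logical OR of the color-based lip mask and the edge mask."""
    return [[c or e for c, e in zip(crow, erow)] for crow, erow in zip(color_mask, edge_mask)]
```

The OR keeps every pixel found by either cue, so lip contours that the color threshold misses (for example under uneven lighting) can still be recovered from the luminance edges.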
[0108]
  The operation of the lip image extraction unit according to the fifth embodiment will be described below.
[0109]
  The processing for extracting the lips using the color component histogram is the same as that in the first embodiment, and a description thereof will be omitted.
[0110]
  After the first lip candidate region determination unit 201 extracts the first lip candidate region, the luminance value conversion unit 901 converts the image data of the first lip candidate region into image data represented by luminance values and outputs it.
[0111]
  Next, the edge extraction unit 902 extracts an edge by applying an edge operator to the image data converted into the luminance value, further binarizes it, and outputs it.
[0112]
  Then, the image composition unit 903 calculates the logical sum of the lip image extracted using the color component histogram, as in the first embodiment, and the edge extraction image input from the edge extraction unit 902, and outputs the result as the lip image.
[0113]
  As described above, with the lip image extraction unit according to the fifth embodiment, the lip portion can be extracted with higher accuracy by performing lip extraction using not only the color component but also the luminance component.
[0114]
  As described above, with the lip extraction method of the present invention, it is possible to accurately extract the lip portion without being affected by conditions such as lighting and individual differences among speakers.
[0115]
  Further, the accuracy can be further increased by appropriately combining the first to fifth embodiments.
[0116]
  Furthermore, the configuration of the lip image extraction unit according to the present invention is not limited to FIGS. 2, 5 to 7, and 9. For example, processing can be sped up by increasing the number of processing units, such as color component histogram creation units, and processing in parallel. Conversely, the lip image extraction unit can be made smaller by having one processing unit, for example the threshold setting unit or the color component histogram creation unit, perform several processes.
[0117]
  In the description of the present invention, the lip image extraction unit is applied to a voice recognition device. However, it can also be applied to other devices, and such applications are included in the present invention.
[0118]
【The invention's effect】
  As described above, according to the present invention, it is possible to accurately extract the lip portion without being affected by conditions such as lighting or individual differences among speakers.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus according to the present invention.
FIG. 2 is a block diagram showing a configuration of a lip image extraction unit according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram for explaining a first lip candidate region and a second lip candidate region according to the first embodiment;
FIG. 4 is a diagram showing the color component histogram of the first lip candidate region, the color component histogram of the second lip candidate region, and the difference histogram between them, according to the first embodiment;
FIG. 5 is a block diagram showing a configuration of a lip image extraction unit according to the second embodiment of the present invention;
FIG. 6 is a block diagram showing a configuration of a lip image extraction unit according to the third embodiment of the present invention;
FIG. 7 is a block diagram showing a configuration of a lip image extraction unit according to a fourth embodiment of the present invention.
FIG. 8 is a diagram showing the color component distribution of the lip-extracted portion and the color component distribution of the lip peripheral portion in the RGB color space according to the fourth embodiment;
FIG. 9 is a block diagram showing a configuration of a lip image extraction unit according to the fifth embodiment of the present invention;
FIG. 10 is a block diagram showing an outline of a conventional lip image extracting apparatus.
[Explanation of symbols]
  101 Voice recognition device
  102 Image input unit
  103 Image processing unit
  104 Voice input part
  105 Voice recognition unit
  106 Face image extraction unit
  107 Lip image extraction unit
  201, 505 First lip candidate area determination unit
  202 Second lip candidate area determination unit
  203, 502 Color component histogram creation unit
  204 Threshold setting unit
  205, 504, 602 Lip extractor
  501 Temporary lip candidate area determination unit
  503 Threshold storage unit
  601 Lip Extraction Template
  701 Color component distribution creation part of lip portion
  702 Color component distribution creation part around the lip
  703 Color component parameter determination unit
  704 Color component histogram generator
  901 Brightness value converter
  902 Edge extraction unit
  903 Image composition unit

Claims

A lip extraction method comprising: determining, from a face image, a first lip candidate region including the lips; determining, from the face image, a second lip candidate region including all of the first lip candidate region and its surrounding region; creating a color component histogram of the first lip candidate region; creating a color component histogram of the second lip candidate region; creating a difference histogram between the color component histogram of the first lip candidate region and the color component histogram of the second lip candidate region; attributing the peak in the difference histogram to the color component of the skin-colored portion around the lips, and setting as a threshold the color component value obtained by multiplying the color component value corresponding to that peak by a coefficient; and attributing the peak of the color component histogram of the first lip candidate region to the color component of the lip portion, and extracting, as the lip portion, the region of color components lying on the lip-peak side of the threshold.

The lip extraction method according to claim 1, wherein coefficients a and b are set in advance, and the color component of the color component histogram to be created is (R − a·G − b·B) in the RGB color system.

The lip extraction method according to claim 1 or claim 2, wherein the area ratio between the first lip candidate region and the second lip candidate region is larger than 1:1 and smaller than 1:3.

The lip extraction method according to any one of claims 1 to 3, wherein the step of determining the first lip candidate region comprises: determining a temporary lip candidate region from the face image; creating a color component histogram of the temporary lip candidate region; extracting a lip portion from the temporary lip candidate region and binarizing it, by applying color-component threshold processing to the color component histogram of the temporary lip candidate region using a threshold that is the average of a plurality of thresholds used in advance when cutting out lip portions from the face images of a plurality of speakers; and setting, as the first lip candidate region, a new lip candidate region determined by cutting out a rectangular region including the extracted lip portion.

The lip extraction method according to any one of claims 1 to 3, wherein the step of determining the first lip candidate region comprises: a step of determining a provisional lip candidate region from the face image; a step of creating a color component histogram of the provisional lip candidate region; a step of comparing the color component histograms of lip extraction templates, created in advance by cutting out the lip portion from face images of a plurality of speakers, with the color component histogram of the provisional lip candidate region, and extracting the lip portion from the provisional lip candidate region and binarizing it by applying color-component threshold processing to the color component histogram of the provisional lip candidate region using the threshold of the lip extraction template with the highest similarity; and a step of taking, as the first lip candidate region, a new lip candidate region determined by cutting out a rectangular region that contains the extracted lip portion.
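The claim does not state how histogram similarity is measured. The sketch below swaps in histogram intersection on normalized histograms, which is one common choice and not necessarily the patentee's. It assumes each template is stored as a hypothetical (histogram, threshold) pair; `select_threshold` returns the threshold of the most similar template, which would then be used for the binarization described above.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (all zeros if it is empty)."""
    total = sum(hist)
    return [c / total for c in hist] if total else [0.0] * len(hist)


def intersection(h1, h2):
    """Histogram intersection of two normalized histograms: 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


def select_threshold(candidate_hist, templates):
    """Return the threshold of the template whose histogram is most similar.

    templates: list of (histogram, threshold) pairs, an assumed storage format.
    """
    best_hist, best_threshold = max(
        templates, key=lambda t: intersection(candidate_hist, t[0]))
    return best_threshold


templates = [([5, 5, 0, 0], 100), ([0, 4, 6, 0], 120), ([0, 0, 0, 10], 140)]
chosen = select_threshold([0, 5, 5, 0], templates)
```

Normalizing first lets templates cut from regions of different sizes be compared with the provisional region on an equal footing.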

The lip extraction method according to any one of claims 1 to 5, wherein, to obtain the color component used for the color component histograms, a function

a·R + b·G + c·B + d = 0

that divides the lip portion from the lip surrounding portion in the RGB color space is determined from the color component distribution of the extracted lip portion and the color distribution of the extracted lip surrounding portion, and the color component of the histograms created for lip extraction is

a·R + b·G + c·B.
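The claim leaves open how the dividing plane is computed. The sketch below assumes the simplest option, a nearest-mean classifier: the plane is the perpendicular bisector of the segment joining the mean lip color and the mean surrounding color, so (a, b, c) is the difference of the means and lip pixels fall on the positive side. A discriminant method such as Fisher's would be another plausible choice. The function names are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of (R, G, B) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def separating_plane(lip_pixels, skin_pixels):
    """Return (a, b, c, d) for a*R + b*G + c*B + d = 0 halfway between class means.

    Assumed nearest-mean rule: the normal points from the skin mean to the lip
    mean, so lip-like colors give a positive value of a*R + b*G + c*B + d.
    """
    m_lip, m_skin = mean(lip_pixels), mean(skin_pixels)
    a, b, c = (l - s for l, s in zip(m_lip, m_skin))
    mid = [(l + s) / 2 for l, s in zip(m_lip, m_skin)]
    d = -(a * mid[0] + b * mid[1] + c * mid[2])
    return a, b, c, d


def plane_component(rgb, plane):
    """Histogram component a*R + b*G + c*B; d only shifts it and is ignored."""
    a, b, c, _ = plane
    r, g, blue = rgb
    return a * r + b * g + c * blue


lips = [(200, 80, 90), (190, 70, 80)]
skin = [(220, 170, 140), (210, 160, 130)]
plane = separating_plane(lips, skin)
```

Since d is a constant offset, dropping it (as the claim does for the histogram component) shifts every bin equally and does not change which side of the threshold a pixel falls on, provided the threshold is shifted with it.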

The lip extraction method according to any one of claims 1 to 6, further comprising a step of converting the image of the first lip candidate region into a luminance component, extracting edges from it and binarizing the result, and outputting, as the lip extraction image, the logical OR of this binarized edge image and the image in which the lip portion was extracted using the color component.
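A sketch of this luminance-edge step follows. Two choices are assumptions not stated in the claim: luminance uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), and edges come from the Sobel operator (the operator cited for the prior art) with |gx| + |gy| as the magnitude and a hypothetical binarization level of 100. The edge mask is then ORed pixel by pixel with the color-based lip mask.

```python
def luminance(img):
    """ITU-R BT.601 luma for a 2-D list of (R, G, B) tuples."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def sobel_edges(lum, thresh=100.0):
    """Binary edge map from the Sobel |gx| + |gy| magnitude; border pixels stay False.

    thresh is a hypothetical binarization level, not a value from the patent.
    """
    h, w = len(lum), len(lum[0])
    out = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (lum[y - 1][x + 1] + 2 * lum[y][x + 1] + lum[y + 1][x + 1]) - (
                lum[y - 1][x - 1] + 2 * lum[y][x - 1] + lum[y + 1][x - 1])
            gy = (lum[y + 1][x - 1] + 2 * lum[y + 1][x] + lum[y + 1][x + 1]) - (
                lum[y - 1][x - 1] + 2 * lum[y - 1][x] + lum[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy) > thresh
    return out


def combine(color_mask, edge_mask):
    """Pixel-wise logical OR of the color-based and edge-based lip masks."""
    return [[c or e for c, e in zip(cr, er)] for cr, er in zip(color_mask, edge_mask)]


# A vertical dark-to-bright boundary produces edges in the interior columns.
img = [[(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)] for _ in range(3)]
edges = sobel_edges(luminance(img))
```

The OR lets the edge map recover lip contour pixels whose color is too close to skin for the color threshold alone to catch.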

A lip extraction apparatus comprising: a first lip candidate region determination unit that determines, from a face image, a first lip candidate region containing the lips; a second lip candidate region determination unit that determines, from the face image, a second lip candidate region containing the whole of the first lip candidate region together with its surrounding area; a color component histogram creation unit that creates a color component histogram of the first lip candidate region and a color component histogram of the second lip candidate region; a threshold setting unit that creates a difference histogram between the color component histogram of the first lip candidate region and the color component histogram of the second lip candidate region, treats the peak of the difference histogram as arising from the color component of the skin-colored area around the lips, and sets as a threshold the color component value obtained by multiplying the color component value at that peak by a coefficient; and a lip extraction unit that treats the peak of the color component histogram of the first lip candidate region as arising from the color component of the lip portion, and extracts as the lip portion the region whose color components lie on the lip-peak side of the threshold.

The lip extraction apparatus according to claim 8, wherein coefficients a and b are set in advance, and the color component used for the color component histograms is (R − a·G − b·B) in the RGB color system.

The lip extraction apparatus according to claim 8 or claim 9, wherein the area ratio of the first lip candidate region to the second lip candidate region is greater than 1:1 and less than 1:3, that is, the second lip candidate region is more than one and less than three times the area of the first.