JP3977291B2 - Image reproduction method and image processing apparatus

Info

Publication number: JP3977291B2
Application number: JP2003183219A
Authority: JP
Inventors: 茂溝口
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-06-26
Filing date: 2003-06-26
Publication date: 2007-09-19
Anticipated expiration: 2023-06-26
Also published as: JP2005018466A

Description

【０００１】
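【Technical Field of the Invention】
The present invention relates to extracting image regions and reproducing images, and in particular to a method and apparatus for extracting a region of interest from a compressed image and reproducing that image, together with a related computer program and computer-readable recording medium. It is especially suited to extracting a region of interest from an image in a compressed data format, for example a JPEG file, and reproducing it as desired.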
【０００２】
【従来の技術】
デジタルカメラなどで撮影して圧縮された例えばJPEGファイル画像をパーソナルコンピュータ（ＰＣ）などで再生して、表示を行う場合又はＰＣプリンタもしくはダイレクトプリンタなどから印刷を行う場合や、ＤＰＥでプリントを行う場合がある。この時に、撮影画像データが良質な場合は忠実に再生して表示又はプリントすればよいので問題は生じない。
【０００３】
しかしながら、撮影画像データによっては色被り、コントラスト不足、露出の不適切などがあり、良質な印刷結果を得るためには画像補正を施す必要がある。特に、人物を撮影した画像の場合には、一般に、人の顔の色が適正になるように再生してプリントすると写真を見た人に与える感じが良くなり、写真の質を高めることになる。風景や物を撮影した場合でも、目標とする撮影対象物の色が適正になるように再生してプリントすることが望まれる。
【０００４】
例えば、銀塩写真の場合、質の良い写真を得るためには原画像ごとに焼き付け時の露光量を変更することが好ましく、この焼付け時の露光量を決めるのに、人物が入った写真の場合には、人の顔の色に着目するのが便利である。何故ならば、人の顔は肌色であることが分かっているために、焼き付けられた写真における人の顔の色が肌色になるように露光量を決めることが可能であるからである。
【０００５】
Known methods for recognizing images in digital image files include, for example, those of Patent Document 1, Patent Document 2, and Patent Document 3.
【０００６】
これらの方法は、指定画像との類似度や一致度を検出するもので、特許文献１の場合は、直流成分によるブロック単位での粗一致を求め、その後、候補画像領域に対して復元処理を行い、非圧縮データとして微一致を求める方式である。
【０００７】
また、特許文献２の場合は、検索データを入力作成し、このデータと複数の画像データの類似度を判定する画像処理装置である。さらに、特許文献３の場合は、検索対象画像をウェーブレット変換して圧縮画像を作成する。また、指定された画像にもウェーブレット変換を施し、各々の特徴データを比較することで、類似度を判定するようにしている。
【０００８】
一方、画像を補正する方法としては、デジタルカメラで撮影した画像をプリントする際に、アプリケーションやプリンタドライバのアプリケーションにより、撮影データをヒストグラムなどで解析し、コントラスト、ホワイトバランス、露出補正、シャープネスなど画像補正を一様に施すものが知られている。
【０００９】
【特許文献１】
特開平8-161497号公報
【特許文献２】
特開2000-48036号公報
【特許文献３】
特開平11-238067号公報
【００１０】
【発明が解決しようとする課題】
しかしながら、上記従来の方法では補正対象の注目画像領域を正確に見付け出して、その注目画像領域を所望の色に補正することが出来ない。
【００１１】
すなわち、デジタルカメラなどで撮影した例えばJPEGファイル画像を再生して表示又はプリントする場合に、銀塩写真のプリント処理のように、人物など注目画像が、より良く表示又はプリント出来るように必要に応じて補正を行えるように、上記JPEGファイル画像の中に注目画像領域を見つけ出す方法を決める必要がある。
一方、デジタルカメラからプリンタへ直接プリントを行うダイレクトプリントなどデータ処理能力の低い機器でも使用出来るように、検出処理は出来うるだけ軽く済む方法が求められている。
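In view of the above problems, the present applicant has proposed a method that extracts the region of interest from an image file with a low processing load, and does so regardless of the input image size (Japanese Patent Application No. 2002-193620).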
【００１２】
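However, the decision methods used for that extraction, namely the chromaticity-ratio test and the feature test on the AC components of the DCT, are not necessarily optimized, so for some images the region of interest cannot be extracted completely and parts of it are lost.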
【００１３】
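Moreover, the AC-component feature test used to detect the image of interest needed a separate decision table for each image size and detection size class, which made the decision tables complicated.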
【００１４】
本出願人は、また、特徴量判定の判定方法を最適化し、注目画像領域のより欠損の少ない完全な抽出する方法を提案した(特願2002-193621)。
【００１５】
また、JPEGによる圧縮画像の圧縮比率に関わる量子化テーブルの値が、撮影時やアプリケーションによる編集後の再保存により一様ではなく、高圧縮の量子化テーブルを使用すると、画像中の空間周波数が極端に変化してしまい、注目画像領域における周波数特徴量も影響を受け、上記検出精度が落ちてしまう可能性もある。
本出願人は、更に、画像ファイルの中の注目画像領域を抽出する際に量子化テーブルの特性を利用した判定を行うようにして、処理負荷の少ない方法で注目画像領域を抽出できる技術を提案した(特願2002-193622)。
【００１６】
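However, the chromaticity-ratio test used for extraction can still become inaccurate depending on the luminance of the image. In addition, although the data obtained by the extraction could be applied to exposure correction and the like, for images in which a person's face is blurred the extraction does not necessarily yield the information needed for a proper correction.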
【００１７】
本発明は、上記問題点に鑑み、画像の輝度によって注目画像領域抽出に用いられる色度比率判定を適正化して、安定した注目画像領域の抽出及び画像の再生を行うことを目的とする。
【００１８】
又、人物顔画像などがボケたようなものにおいては、適正な補正への情報を取得できるようにすることを目的とする。
【００２１】
【課題を解決するための手段】
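To solve this problem, the image reproduction method of the present invention decodes compression-encoded image data and reproduces an image, and comprises: a block extraction step of extracting contiguous blocks within a predetermined chromaticity range from the decoded image data; a determination step of determining, based on the average spatial frequency of the contiguous blocks, whether to treat them as a region of interest; a feature part extraction step of extracting a feature part from the region of interest; a decision step of deciding a blur correction strength for the feature part based on the number of pixels of the extracted feature part and the quantization filter values used in the compression encoding; a blur correction step of correcting the blur of the feature part according to the correction strength decided in the decision step; and a reproduction step of reproducing the image whose blur has been corrected in the blur correction step.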
【００２２】
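In the decision step, the fewer the pixels in the feature part and the larger the quantization filter values, the stronger the blur correction strength that is chosen. The predetermined chromaticity range is set based on the luminance values of the decoded image data. The method further comprises a decoding step of decoding the compression-encoded image data to generate the decoded image data, and a step of obtaining chromaticity, spatial frequency, and luminance from the decoded image data. The compression-encoded image data is JPEG image data, and the decoded image data includes DCT coefficients and bitmap data obtained by the inverse DCT. The method further comprises a selection step of selecting candidates for the region of interest based on the number of contiguous blocks extracted in the block extraction step. The blur is corrected by unsharp masking.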
【００２３】
又、本発明は、上記方法のステップをコンピュータに実行させるためのコンピュータプログラム及び該コンピュータプログラムを記憶したコンピュータ読み取り可能な記憶媒体をも提供する。
【００２５】
又、本発明の画像処理装置は、圧縮符号化された画像データを復号して画像を再生する画像処理装置であって、復号した前記画像データから所定の色度範囲の連続するブロックを抽出するブロック抽出手段と、前記連続するブロックの空間周波数の平均値に基づいて、前記連続するブロックを注目画像領域とするか否かを判定する判定手段と、前記注目画像領域から特徴部位を抽出する特徴部位抽出手段と、前記抽出された特徴部位の画素数と前記圧縮符号化に利用した量子化フィルタ値とに基づいて、前記特徴部位のボケの補正強度を決定する決定手段と、前記決定手段により決定された補正強度に応じて前記特徴部位のボケを補正するボケ補正手段と、前記補正手段によりボケが補正された画像を再生する再生手段とを有することを特徴とする。
【００２６】
ここで、前記決定手段は、前記特徴部位の画素数が少なく前記量子化フィルタ値の値が大きいほど、ボケをより強く補正する補正強度に決定することを特徴とする。また、前記圧縮符号化された画像データを復号して前記復号した画像データを生成する復号手段と、前記復号した画像データから色度、空間周波数、輝度を求める手段とを更に有する。また、前記圧縮符号化された画像データはJPEG画像データであり、前記復号した画像データはＤＣＴ係数と逆ＤＣＴ変換されたビットマップデータとを含む。また、前記ブロック抽出手段により抽出された前記連続するブロックの数に基づいて、注目画像領域となる候補を選別する選別手段を更に有する。また、前記ボケの補正は、アンシャープマスク処理により行われる。
【００２７】
【発明の実施の形態】
次に、添付図面を参照しながら本発明の注目画像領域の抽出方法、圧縮画像の再生方法及びその装置、又その処理に関連するコンピュータプログラム及びコンピュータ読み取り可能な記録媒体の実施形態を説明する。
【００２８】
以下、本実施形態では、例えば、圧縮された画像データ形式であるJPEGファイル画像から注目画像領域を抽出して所望の再生を行う例を説明するが、JPEGによる圧縮に限定されることはなく、本発明のように圧縮過程でシンボルデータ（本例ではＤＣＴ係数）から画像の空間周波数を抽出可能な圧縮技術において広く適用が可能であり、本発明はこれらも含むものである。又、本例では、特にJPEGファイル画像を再生してプリントする例を中心に説明するが、本発明は圧縮画像の再生出力（表示、プリントを含む）の技術であり、これらも含まれる。
【００２９】
更に、本発明の注目画像領域の抽出方法は、圧縮画像からの注目画像領域の抽出に限定されることなく、圧縮されていない画像からの所望の注目画像領域の抽出にも適用される技術であってこれらも含み、同様の効果を奏するものである。
【００３０】
＜本実施形態の復号再生の対象である圧縮符号化データの例＞
最初に、現在、最も一般的な画像圧縮ファイルの"JPEGファイル"の情報省略と符号化・復号化について、図２及び図３を参照して説明する。
【００３１】
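First, encoding. Digital still cameras and digital video cameras usually store still images as JPEG files. The signal from the light-receiving element of the input device, such as a CCD, is A/D converted and stored in frame memory, and the information from the RGB or CMY filters is converted into luminance and chromaticity information. The image is then divided into square blocks of 8*8 (64) pixels.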
【００３２】
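① in Fig. 3 shows example data for one block obtained by dividing the luminance bitmap into 8*8 blocks. ② in Fig. 3 shows the pixel values, which range from 0 to 255, level-shifted into signals ranging from -128 to 127. ③ in Fig. 3 shows the DCT coefficients obtained by the DCT (discrete cosine transform).
【００３３】
④ in Fig. 3 is a quantization table that discards more of the high-frequency components in consideration of visual characteristics; the DCT coefficients from ③ are quantized with this table.
【００３４】
⑤ in Fig. 3 is the result of the quantization. Entropy-coding these values with Huffman codes produces the compressed data, which is the encoded signal.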
【００３５】
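Decoding reverses the encoding steps described above. The encoded signal is entropy-decoded to recover the quantized DCT coefficients. These are then dequantized by multiplying them by the quantization table, which yields the DCT coefficients. An inverse DCT restores the level-shifted image, and adding 128 to undo the level shift gives the decoded image for one block.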
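The decoding steps for a single block can be sketched as follows. This is a minimal illustration, not the decoder of the embodiment: it assumes entropy decoding has already produced the quantized coefficients, uses the direct (unoptimized) form of the 8*8 inverse DCT, and the function names `idct_8x8` and `decode_block` are hypothetical. The dequantized DCT coefficients are returned alongside the pixels because the later detection stages use them as spatial-frequency data.

```python
import math

N = 8  # JPEG block size


def idct_8x8(coeffs):
    """Inverse 2-D DCT of an 8x8 block, coeffs[v][u] -> pixels[y][x] (direct form)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for v in range(N):
                for u in range(N):
                    s += (c(u) * c(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = s / 4
    return out


def decode_block(quantized, qtable):
    """Dequantize, inverse-transform, and undo the level shift for one block.

    Returns (pixels, dct): 8-bit pixel values and the dequantized DCT
    coefficients, which are kept as frequency information for detection.
    """
    dct = [[quantized[v][u] * qtable[v][u] for u in range(N)] for v in range(N)]
    shifted = idct_8x8(dct)
    pixels = [[min(255, max(0, round(p + 128))) for p in row] for row in shifted]
    return pixels, dct
```

For a block whose only nonzero quantized coefficient is a DC value of 7 with a DC table entry of 32, the dequantized DC coefficient is 224 and every pixel decodes to 156.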
【００３６】
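The explanation above omitted the step of combining the separated luminance and chromaticity data and converting them to an RGB image. In encoding, as shown in Fig. 2, the color image is converted into a luminance component (Y) and two chromaticity components (Cb, Cr), each of which is encoded, and the results are combined to produce the compressed image data.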
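The text does not state which conversion matrix is used for the luminance and chromaticity components. As an assumption, the sketch below uses the full-range conversion commonly used in JFIF-style JPEG files; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (JFIF-style coefficients, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in the 8-bit range.
        return min(255, max(0, round(v)))

    return clamp(y), clamp(cb), clamp(cr)
```

Neutral colors map to Cb = Cr = 128, so only colored pixels carry chromaticity information away from the center value.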
【００３７】
以上のような、圧縮された画像データファイルであるJPEG画像をプリントする方法としては、入力機器からの圧縮画像データをＵＳＢや記憶メディアによって、パーソナルコンピュータ（以下、ＰＣとする）に取り込んで画像を展開し、必要に応じて画像補正を加えた後プリンタへデータを送る場合や、入力機器からの画像データを直接プリンタへ入力し、プリンタの中で、画像を解凍し、必要に応じて画像補正を加えた後で印刷を行うなど、幾種類かの選択肢がある。
【００３８】
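In any case, to print a good image it is necessary to judge whether the captured image data is already of good quality or needs correction, and to separate images that should be printed faithfully from images that should be printed only after correction has brought them closer to good quality.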
【００３９】
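A good image can be characterized as follows.
1) The white balance is good.
2) The contrast is appropriate.
3) Gradation is allocated to the parts that need it; that is, the exposure setting is good.
4) The saturation is appropriate.
5) The finish resembles a silver-halide photograph.
6) The correction is centered on the subject of interest, such as a person.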
【００４０】
現在市販のＰＣプリンタやＰＣを経由しないダイレクトプリンタなどにおいても上記１）〜５）の項目においては、程度の差も有るが行われている。また、上記６）の注目画像に対する補正が行われていないのは、その検出に多大な処理が必要であることと、その方法が確立されていないことによる。
【００４１】
特に、処理能力のひ弱なダイレクトプリンタなどにおいては実施が難しいとされているが、本発明はこれを解決するものである。その手段としては、JPEG画像ファイルに注目画像の存在の検出と、その検出した画像に対する補正の必要等の確認を経て、全体画像補正へ受け渡す方法となる。
【００４２】
＜第１の実施形態の画像処理装置の構成例＞
以下に、第１の実施形態の画像処理装置の構成例をブロック図で示す。
【００４３】
図１Ａは、JPEG ファイルを解凍する過程とその際に取得する情報について表した復号部１０のブロック図である。
【００４４】
JPEG ファイルをRGBのビットマップデータへ変換する過程においては、まず、符号テーブル２を用いてエントロピー復号化部１にてエントロピー復号を行う。次に、逆量子化部３において、逆量子化に使用する量子化テーブル４を、逆量子化を行う他にデータとして記憶する。
【００４５】
この逆量子化されたデータは、ブロック単位のデータとして周波数変換されたものであり、このデータを、画像周波数特性を得るためのデータとして取得する。その後、逆ＤＣＴ部５において、逆ＤＣＴ処理と逆レベルシフトとを行いＹｃｃ−ＲＧＢ変換することで、通常のＲＧＢビットマップデータに展開する。
【００４６】
図１Ｂは、上記復号部１０を含む本実施形態の画像処理装置の構成例を示すブロック図である。
【００４７】
本実施形態の画像処理装置は、復号部１０と復号部１０から取得したデータに基づいて補正対象の画像領域を認識する画像認識部（第１画像抽出を実行）１００と、画像認識部１００からの認識領域を所望の色に補正する色調補正部２０とから構成される。色調補正部２０から出力される再生され補正された画像（ＢＭＰ）は、プリンタに送られてプリントされる。
【００４８】
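The image recognition unit 100 has a target color detection unit 101, which receives the decoded image (BMP) from the decoding unit 10 and detects regions of the specified target color (skin color in this example); a spatial frequency generation unit 102, which receives the decoded DCT data from the decoding unit 10 and generates the spatial frequencies of the target-color candidate regions detected by the target color detection unit 101; and a target color region selection unit 103, which selects, based on spatial frequency, the regions to be color-corrected from among those candidate regions. The target color detection unit 101 has a decoded image storage unit 101a for storing the decoded image, but this storage unit need not reside in the target color detection unit 101 and may be shared with other processing units. The target color region selection unit 103 has a discrimination table 103a used for the selection. The discrimination table 103a may comprise multiple tables corresponding to image sizes.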
【００４９】
画像認識部１００は、更に、本実施形態の処理を改善するために、復号部１０から量子化テーブル値を受信して、禁止するためのしきい値１０４ａに基づく判定から色調補正処理を禁止する色調補正禁止部１０４を有する。
【００５０】
色調補正部２０は、画像認識部１００で選別された選別領域の色を補正対象色（本例では肌色）に、例えば色補正テーブル２０ａなどを使用して既知の色補正処理を行う。この色調補正の処理は、所定の条件で対象色領域選別部１０３あるいは色調補正禁止部１０４からの色調補正禁止信号により禁止される。この補正処理は簡素化のためには画像全体に施しても良いが、画質を高めることが目的であれば領域によって異なる補正、あるいは部分的な補正であってもよい。本発明の特徴は、かかる色調補正の方法にはないので、本実施形態の説明では簡単に済ますことにする。
【００５１】
図１Ｃは、本実施形態の画像処理を実現するハードウエア及びソフトウエアの構成例を示す図である。尚、図１Ｃは、本実施形態の特徴部分である画像認識部１００を中心に説明している。この装置は汎用のコンピュータでも実現できるし、専用のコンピュータにより実現してもよい。
【００５２】
１１０は演算処理のＣＰＵ、１２０はＣＰＵ１１０が使用する固定のデータ及びプログラム（ＯＳやＢＩＯＳなどはここに有るとする）を格納するＲＯＭ、１３０は本実施形態でＣＰＵ１１０が使用するデータやプログラムを一時格納するＲＡＭである。ここで、本例では、アプリケーションプログラムは、後述の外部記憶部１４０からＲＡＭ１３０のプログラムロード領域１３２にロードされて、ＣＰＵ１１０により実行されるとしている。
【００５３】
ＲＡＭ１３０がデータ記憶領域１３１に記憶するデータには、復号部１０が復号した復号画像あるいは色調補正された再生画像を記憶する復号画像データ領域１３ａ、補正対象色（本例では肌色）データを記憶する補正対象色領域１３ｂ、検出された対象色領域を記憶する候補領域の記憶領域１３ｃ、候補領域から形成される候補グループを記憶する候補グループ領域１３ｄ、最終的に選別された領域を記憶する選別領域の記憶領域１３ｅ、復号部１０からの復号ＤＣＴデータを記憶する復号ＤＣＴデータ記憶領域１３ｆ、生成された空間周波数を記憶する空間周波数領域１３ｇ、対象色領域を選別するために使用する判別テーブルを記憶する判別テーブル領域１３ｈ、復号部１０からの量子化テーブルを記憶する量子化テーブル領域１３ｉ、量子化テーブルの係数を加算した値を記憶する量子化係数加算値の記憶領域１３ｊ、色調補正の禁止などに使用するしきい値群を記憶する領域１３ｋを含んでいる。
【００５４】
１４０は、ディスクやメモリカードなどの大容量あるいは抜き差し可能な媒体からなる外部記憶部であり、フロッピー(登録商標）ディスクやＣＤなどを含む。
【００５５】
外部記憶部１４０のデータ記憶領域１４１には、判別テーブル１〜ｎ１４ａやしきい値群１４ｂが格納されている。又、他のパラメータや画像データなどを蓄積するデータベースが記憶されてもよい。プログラム格納領域１４２には、大まかに分類すると、対象色領域検出モジュール１４ｃ、空間周波数生成モジュール１４ｄ、対象色領域選別モジュール１４ｅ、色調補正禁止モジュール１４ｆ、そして後述の第２の実施形態で実行される特徴部位抽出モジュール１４ｇが格納されている。
【００５６】
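Furthermore, the apparatus of Fig. 1C may also serve as the decoding unit 10 and/or the color tone correction unit 20, in which case it may additionally store a color tone correction table 14f as data, and a color tone correction module 14i and a blur correction module 14j (used in the second embodiment described later) as programs.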
【００５７】
１５０は入力インターフェースであって、本例では復号部１０からの復号データ（ＢＭＰ）、復号ＤＣＴデータ、量子化テーブル値、また装置固有のあるいは外部から指定可能な対象色データを入力する。１６０は出力インターフェースであって、選別領域や色調補正禁止信号を出力する。尚、本装置が色調補正部も兼用するのであれば、出力は色調補正画像データ（ＢＭＰ）となる。更に、本装置が復号部１０も兼ねてもよく、その場合はＪＰＥＧデータが入力され色調補正画像データ（ＢＭＰ）が出力される。その場合には、さらにデータ及びプログラムが準備されることになる。
【００５８】
＜第１の実施形態の画像処理装置の動作手順例＞
次に、この画像処理において、最も重要と思われる注目画像検出である人物検出のフローチャートを図６に示す。
【００５９】
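The detection process shown in Fig. 6 has two main stages. The first stage, steps S601 to S608, divides the whole image into regions whose chromaticity ratio, evaluated per 8*8-pixel block (the compression unit), matches the chromaticity defined for the detection target and regions where it does not, and builds candidates by joining matching blocks that are adjacent in the longitudinal direction (the horizontal direction for landscape images such as Figs. 10 and 14). The second stage, steps S609 to S613, checks whether the average AC components of the DCT for each candidate that matched the defined chromaticity ratio fall within the feature range defined for the detection target, and identifies the image of interest from the candidates that pass.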
【００６０】
＜１段階目の処理例＞
最初のステップＳ６０１において、８＊８画素のブロック単位のＤＣＴデータと量子化テーブルとを取得すると同時に、画像ファイルはＲＧＢビットマップデータに展開する。
【００６１】
（肌色色度のブロック検出例）
次に、ステップＳ６０２に進んで、ＲＧＢビットマップデータにおいて、８＊８画素のブロック単位に本実施形態における注目画像である人の肌色の色度に対応するか検索を行う。
【００６２】
この場合、入力画像サイズにより８＊８画素ブロックの画像が全画像に対して占める割合が違うので、入力画像サイズに比例した端部の設定を行う。例えばＶＧＡ(640*480)では8ブロック分（長手方向４×短い方向２）で、UXGA(1600*1200)画像においては２０ブロック分（長手方向５×短い方向４）とする。
色度の検索方法としては、複数の方法がある。知られているものとしては、
１）Ｂ（青）／Ｇ（緑）の比率が０．７〜０．８の範囲に収まり、Ｒ（赤）／Ｇ（緑）の比率が１．４〜１．８の範囲に収まる色度を持つもの。
２）図５の概念図に示すように、肌色を確率楕円にて表すことができる。求める式としては下記の式（１）〜式（３）になる。
【００６３】
【数１】

【００６４】
本実施形態においては、処理の簡便さを考慮に入れた下記式（４）である色度分布範囲を肌色の色度範囲とした。この範囲を表したのが図２０である。
【００６５】
【数２】

【００６６】
本実施形態においては、画像における周波数成分の特徴を検出する単位として８＊８画素単位のブロックで行っている関係で、構造的論理的な簡単さより色度判定においても８＊８画素単位にて実行する。
【００６７】
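Fig. 7 illustrates the chromaticity detection points used in this embodiment. The chromaticity at each of the four corners of an 8*8-pixel block is checked against the chromaticity range, and the block is judged to have a matching chromaticity only when all four corners are within the range.
【００６８】
In Fig. 7, the second block from the left in the upper row and the first, second, and third blocks from the left in the lower row match. In the leftmost block of the upper row, the upper-left of the four points has a chromaticity that is judged to be a non-skin pixel, so the block containing it is judged to be outside the skin-color range. Likewise, the first and second blocks from the right in the upper row and the rightmost block in the lower row are outside the range.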
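The four-corner test can be sketched as below. The chromaticity ratios R/(R+G+B) and G/(R+G+B) and the numeric ranges are the ones quoted later in paragraph 0076 for the fixed definition; treating them as the range intended here, and treating the bounds as inclusive, are assumptions. The names `is_skin_pixel` and `block_matches` are hypothetical.

```python
R_RANGE = (0.35, 0.44)  # R / (R + G + B), range quoted in paragraph 0076
G_RANGE = (0.29, 0.33)  # G / (R + G + B), range quoted in paragraph 0076


def is_skin_pixel(rgb):
    """True when the pixel's red and green chromaticity ratios are both in range."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:  # pure black has no defined chromaticity ratio
        return False
    return (R_RANGE[0] <= r / total <= R_RANGE[1]
            and G_RANGE[0] <= g / total <= G_RANGE[1])


def block_matches(image, bx, by, size=8):
    """Check the four corner pixels of block (bx, by); image[y][x] is an (R, G, B) tuple."""
    x0, y0 = bx * size, by * size
    x1, y1 = x0 + size - 1, y0 + size - 1
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    return all(is_skin_pixel(image[y][x]) for x, y in corners)
```

Sampling four points per block costs far less than averaging all 64 pixels, which matches the stated goal of keeping the detection light enough for low-powered devices.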
【００６９】
図８は、「８＊８画素」単位のブロック全体の平均色度による判定である。このブロック内の平均色度の求め方としては、８＊８ブロック全ての画素値の平均値を取る方法の他に、解凍中の逆ＤＣＴを行う前の色度データ（Ｃb，Ｃr）の中のＤＣ成分から求めることも可能である。この方式の利点としては、ブロック全体の色調にて判定できるので、検出点の少ないものに比べて精度が高い期待ができる。ここで、自然画における色度のみの検出についての内容を見ることにする。
【００７０】
図９は、図７と同じ考えの中ではあるが、全体画像における検出間隔を等分化するためのものである。
【００７１】
図１０は、一般的なポートレート写真であり、図１４は人物の肌色色度と同様な色度範囲を有する枯木の林の写真である。図１０と図１４に対して、それぞれの画素に色度の適合だけで検出を行った結果を図１１と図１５に示す。
【００７２】
図１１のポートレートでの検出結果としては人物の肌色部分をよく検出しているが、その他に柵や背景の中で、ごみのような細かい部分においても適合色度を満たすものが検出されていることがわかる。このため、色度のみでは注目画像を特定できないことがわかる。
【００７３】
図１４においては、人物の肌色を検出する目的にもかかわらず同じ色度を持つ枯れ木の林が全面検出されている。このように、画素レベルでの色度判定を行った場合、注目画像を特定することは不可能である。
【００７４】
Detecting at the block level targets areas that form a certain cluster, so the detection becomes less susceptible to extraneous noise.
【００７５】
（肌色色度のブロック検出の改善例）
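Fig. 35 is a graph plotting the average chromaticity ratios of multiple human skin regions captured with digital cameras. The horizontal axis is the chromaticity ratio of the red component, computed as R/(R+G+B) for each 8*8 block and averaged over the whole detected region. The vertical axis is the chromaticity ratio of the green component, computed as G/(R+G+B) for each 8*8 block and averaged over the whole detected region. The graph also classifies each region by its average luminance, divided into eight equal classes, in association with its chromaticity ratio.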
【００７６】
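In the embodiment described above, the matching chromaticity ratio range is set as follows.
Chromaticity ratio of the red component: 0.35 to 0.44
Chromaticity ratio of the green component: 0.29 to 0.33
The graph shows that most regions fit this definition, but because the color of human skin is reflected light, some regions fall outside it depending on the light source. Most notable is the distribution at luminance 160 and above. In particular, in the regions classified into the highest luminance class, 223 to 255, the distribution is visibly shifted from the definition toward the upper left, that is, toward white.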
【００７７】
図３８は、高輝度領域を人物肌に持った物の画像サンプルである。この画像の輝度分布を示した物が図４２である。
【００７８】
図４２においては、横軸は０〜２５５階調に表された輝度範囲で有り。左端が０、右端が２５５である。縦軸は、画像内の輝度成分を持った画素の分布である。左側の小さい山は輝度の低いコート部分。真中やや右目の大きい山は舗装道路であり面積占有率が一番大きい。最右部に人物の顔の輝度情報が分布している物である。
【００７９】
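Applying the chromaticity-ratio group detection of the primary extraction, with the definition of the earlier embodiment, to the image of Fig. 38 gives the result shown in Fig. 39. In the skin region of Fig. 38, the rise in luminance has saturated the red component, so that the blown-out highlights fall outside the matching range of the chromaticity ratio. The detected region alone can still be used as information for exposure correction and the like, but it covers too little of the face to be used for the blur correction described above.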
【００８０】
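Fig. 40 shows the result when the matching chromaticity ratio range is simply widened as follows.
Chromaticity ratio of the red component: 0.33 to 0.46
Chromaticity ratio of the green component: 0.27 to 0.35
Merely widening the range does detect the person's skin region, but the chromaticity ratio of the paved road then also matches, so regions other than the image of interest are detected as well and no net improvement results.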
【００８１】
In view of this, Fig. 48 defines the matching chromaticity ratio range for human skin per luminance class, taking the range of the input image into account.
【００８２】
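Up to luminance 160 this definition uses the same chromaticity ratio range as the definition above, but it follows the shift of the detected skin chromaticity as luminance rises. For luminance 220 and above, the range is as follows.
Chromaticity ratio of the red component: 0.33 to 0.42
Chromaticity ratio of the green component: 0.30 to 0.34
For luminance from 161 to 219, the range is given by a linear expression.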
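A luminance-dependent range of this kind might look like the sketch below. The two endpoint ranges come from the text; the exact linear expression for luminance 161 to 219 is not given, so interpolating each bound linearly between the values at 160 and 220 is an assumption, as are the names `skin_ranges` and `is_skin`.

```python
LOW = {"r": (0.35, 0.44), "g": (0.29, 0.33)}   # luminance <= 160
HIGH = {"r": (0.33, 0.42), "g": (0.30, 0.34)}  # luminance >= 220


def skin_ranges(luma):
    """Return the (min, max) chromaticity-ratio ranges for a given luminance (0-255)."""
    if luma <= 160:
        return LOW
    if luma >= 220:
        return HIGH
    # Assumed: each bound moves linearly between its value at 160 and at 220.
    t = (luma - 160) / (220 - 160)
    return {
        key: tuple(lo + t * (hi - lo) for lo, hi in zip(LOW[key], HIGH[key]))
        for key in LOW
    }


def is_skin(r_ratio, g_ratio, luma):
    """Test red and green chromaticity ratios against the luminance-dependent range."""
    rng = skin_ranges(luma)
    return (rng["r"][0] <= r_ratio <= rng["r"][1]
            and rng["g"][0] <= g_ratio <= rng["g"][1])
```

With this definition a bright, slightly washed-out skin tone (red ratio 0.34) is accepted at luminance 240 but still rejected at luminance 100, so the range shifts with luminance instead of simply widening everywhere.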
【００８３】
この方式を用いて、図３８に対して検出した結果が図４１である。
【００８４】
本実施例においては、色度比率範囲を高輝度においても適応領域の範囲に変化は無いようにしているが、色度比率が白色に近くなるにつれ人物肌以外においても自然界に存在する物が多くなるので、誤検出を防ぐ為に高輝度域の適応領域の範囲は狭めても良い。
【００８５】
（肌色ブロックの連続範囲による候補検出例）
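The 8*8-pixel blocks extracted as skin in step S602 do not by themselves form clusters of an appropriate size, so the block detection by chromaticity is also constrained to detect runs of blocks that are adjacent vertically and horizontally, which further improves accuracy.
【００８６】
The run length regarded as noise is set on the principle that even genuine skin color may be rejected if it does not provide enough data for a face to be recognizable in the print.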
【００８７】
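This corresponds to the processing from step S603 onward in Fig. 6. In step S603, chromaticity detection is performed block by block along the longitudinal direction of the image (the horizontal direction for landscape images such as Figs. 10 and 14), and candidates are ranked in descending order of the number of consecutive detected blocks.
【００８８】
Next, in step S604, the run length is compared with the run length that qualifies as an image of interest. In this example the minimum is 2 blocks for VGA and 4 blocks for UXGA. If a qualifying run exists, the process proceeds to step S605 and searches for data in the image that also satisfies the run-length setting for the short direction. In this example that setting is likewise 2 blocks for VGA and 4 blocks for UXGA.
【００８９】
Next, step S606 determines whether any data was detected. If so, the process proceeds to step S608 and assigns candidate numbers to the remaining data in descending order of longitudinal run length.
【００９０】
If step S606 finds no detected data, the process proceeds to step S607, sets "no target region", and ends.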
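The longitudinal run detection and candidate numbering can be sketched as follows. This is a simplified illustration under stated assumptions: the input is a 2-D map of booleans (one flag per 8*8 block, True where the chromaticity matched), only the longitudinal run length is checked, and the short-direction check of step S605 is omitted. The names `find_runs` and `candidates` are hypothetical; the default of 8 candidates follows the n = 8 mentioned for this embodiment.

```python
def find_runs(row_flags, min_len):
    """Return (start, length) for each run of True flags at least min_len long."""
    runs, start = [], None
    # A trailing False sentinel closes a run that reaches the end of the row.
    for i, flag in enumerate(list(row_flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs


def candidates(block_map, min_len, max_candidates=8):
    """List (length, row, start) tuples, longest run first, limited to max_candidates."""
    found = [
        (length, y, x)
        for y, row in enumerate(block_map)
        for x, length in find_runs(row, min_len)
    ]
    found.sort(key=lambda c: -c[0])  # stable sort keeps scan order among equal lengths
    return found[:max_candidates]
```

Calling `candidates(block_map, min_len=2)` would correspond to the VGA setting and `min_len=4` to the UXGA setting; candidate number 1 is the first element of the returned list.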
【００９１】
＜２段階目の処理例＞
まず、連続ブロックにて色度判定を施した場合の効果について、図１２と図１６で示す。
【００９２】
図１２においては、図１０のポートレート画像に対して検出を行った結果である。図１２において、検出候補の優先順位が高い方から（検出ブロック長が長い方から）カラーコード（１＝茶、２＝赤、３＝橙、４＝黄、５＝緑、６＝青、７＝紫、８＝灰）順に配置され、それ以外で検出されているのは色度のみ適性範囲に入っているものである。連続ブロック検出により画素レベルの色度検出と比べるとかなりの背景などの非該当候補を削除できていることが判る。
【００９３】
図１６においては、図１４の枯木の林に対して検出を行った結果で、連続ブロック検出においても注目画像以外を検出してしまうことがわかる。
【００９４】
（候補領域からの目標領域の選別例）
（ＶＧＡサイズの判別テーブル例）
次に、ＶＧＡ（video graphics array）サイズ（６４０＊４８０画素）の複数の画像サンプルを用いて人物肌と枯れ木の林の部分において、検出された適合色度連続ブロックにおける周波数特性を算出した。
【００９５】
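Fig. 18 summarizes the average frequency components per block for runs of detected skin blocks in the photographed images. The DCT data of each detected block is arranged in order of increasing frequency, summed in groups of 10 starting from the lowest frequency, and divided by the number of blocks in the run.
【００９６】
The horizontal axis of the figure therefore groups the 63 AC frequency components into six groups of 10 and a final group containing the 3 highest-frequency components. The vertical axis is the sum of the elements in each frequency group.
【００９７】
The larger the value, the stronger that frequency component is in the block. The data lines are color-coded by the number of consecutive blocks detected. For example, "B2" is the average of the data for runs of 2 blocks and "B15" the average for runs of 15 blocks. Likewise, "B2" to "B15" show the average spatial frequency characteristics of skin-colored regions across multiple images for each run length.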
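The seven-group feature can be sketched as below. Two points are assumptions rather than statements from the text: "order of increasing frequency" is taken to mean the standard JPEG zigzag scan, and coefficient magnitudes (absolute values) are summed. The names `zigzag_order` and `ac_group_features` are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag scan order."""
    return sorted(
        ((y, x) for y in range(n) for x in range(n)),
        # Walk anti-diagonals; odd diagonals run top-to-bottom, even ones bottom-to-top.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def ac_group_features(blocks):
    """Average per-block sums of |AC| coefficients in 7 groups (6 of 10, then 3).

    blocks: list of 8x8 DCT coefficient blocks (dct[row][col]) forming one run.
    """
    order = zigzag_order()[1:]  # drop the DC position, leaving 63 AC positions
    sums = [0.0] * 7
    for block in blocks:
        for i, (y, x) in enumerate(order):
            sums[i // 10] += abs(block[y][x])
    return [s / len(blocks) for s in sums]
```

The resulting seven numbers for a run would then be compared against the per-group ranges of the discrimination table.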
【００９８】
検出結果を見ると、
１）低い周波数成分の値が大きく低い周波数成分の下から３グループ以降は、連続ブロック数に係わり無く５０以下となっている。
２）連続ブロックの連続値が大きいほど周波数特性が低くなっている。
【００９９】
これらの結果から言えることは、人物の肌色部分の周波数特性は比較的低い周波数で構成されていることと、検出された連続ブロックの値が大きいことは、被写体の撮影された大きさが大きいことを示していて、この連続ブロックとしての平均値を出すことによって周波数成分が下がっていることがわかる。
【０１００】
連続ブロックの連続値により、同じ注目画像の色度を持っているものでも、その連続ブロックを１つの代表値にすること（例えば、Ｂ６のブロックの時は検出した６個のブロックの値を、各々周波数の低い順に１０個単位のグループとして加算したものをグループごとに加算した後、その連続値である６で除して平均を出している。）により、空間周波数特性の値が変わるので、連続検出値により適当な該当周波数特性が違うことが判る。
【０１０１】
図１９は、人物の肌色色度と同様な色度範囲を有する枯木の林の写真を複数用意して、検出を行った結果を図１８と同じように表したものである。
【０１０２】
検出結果を見ると、
１）人物の肌の空間周波数特性と比べると高い周波数成分にデータ多くあることが確認できる。
２）一番低い周波数成分のグループは人物の肌の結果と大きくは違わない。
【０１０３】
これらのことから、連続ブロックにおける周波数成分を検出することで、同じ色度を持った検出物体を周波数特性により区別することが可能であることがわかる。
【０１０４】
図４は、本実施形態において使用したもので、注目画像である人物肌の空間周波数特性を表したものである。上の段がＶＧＡ（６４０＊４８０）画像における周波数特性の適正範囲である。
【０１０５】
連続ブロック値を２〜８個のグループ（〜L8）と９〜２０個のグループ（L9〜２0）と２１個以上のグループ（L２1〜）の３グループにまとめて、グループごとに周波数の適正範囲を設定したものである。周波数の適正範囲も先に示した１０個単位の７グループによる周波数特性を用いた。これは、処理の簡略化と検出精度のバランスで行ったもので、これに縛られる必要は無い。
【０１０６】
（ＶＧＡサイズ／ＵＸＧＡサイズの判別テーブルの選択例）
次に、画像サイズがデジタルカメラで普及している２００万画素相当のUXGA(1600＊1200)画像について同じ撮影条件でＶＧＡ画像と比較してみる。
図２５は、図１８で使用したデータと同じシーンを対象にUXGAサイズにて撮影したものを検出した結果を、図１８と同じように周波数特性量と各レンジにおけるデータ量の平均を用いて表したものである。
VGA画像との検出特性の差を見ると、
１）連続検出ブロックの検出範囲が大きくなっている。具体的にはVGA画像検出での連続値は２から１５ブロックの連続検出である。それに対して、UXGA画像検出では検出値が４から４０の連続ブロックを検出している。
２）UXGAの方がブロック内の周波数特性が低い。例えば、１〜１０のブロック平均を見ると、VGA画像では３００〜１００のデータ量に分布しているのに対し、UXGA画像では２００〜３０の範囲にデータが分布している。一枚の画像に収まる中で、注目画像になりうるものは、全画像に対する大きさとしては、特定の比率の範囲に入っているのが一般的な考え方である。
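That is, the image to be detected is one that can serve as the target of image correction for the picture as a whole. Even if a person's face can be detected, if it is only the size of a bean, centering the correction on it is not desirable when the balance with other regions is considered. On this basis, detection has little value when the subject occupies about one hundredth of the image or less.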
【０１０７】
例えば、画像全体の中で注目画像が長手方向で１００分の１しか占めない場合はどうであろうか、一般のプリントを考えた場合、その注目画像に対して最適な補正を掛けても、出力後の補正を行われた注目画像は、ほとんど紙面を占めておらず、特定の注目画像を補正するよりは画像全体を補正する方がその画像においては効果的と考えられ、注目の定義から外れると考えられる。
本実施形態においても、各画像サイズに適したそれぞれの注目画像の適正範囲を持っている、この範囲以下でも以上でも補正対象とする注目画像の検出候補から外れる。
したがって、この例においては、UXGA画像における長手方向の１００分の１は１６００割る１００なので、１６pixelで２ブロック(8*8)分になり、色度と周波数成分が合致してもレングスの意味合いから候補の対象から外している。ちなみに、UXGA画像においては、検出連続範囲としては４〜６２ブロックと設定している。
VGA画像においては、同じ考えで１００分の１は６．４pixel となり、１ブロック分にも満たない。VGA画像においては、検出連続範囲としては２〜２５ブロックと設定している。この違いは、画像サイズによる１ブロック（8＊8）分の全画像に対する占有比率の差によるものが発生している。
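If the image of interest is assumed to occupy a certain range of proportions of the whole image, the meaning of an 8*8-pixel block in terms of spatial frequency changes with the image size. Consequently, even for the same photographed scene, both the number of detected blocks and the frequency characteristics differ with image size.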
本実施形態では画像ごとに検出連続範囲を上記のように設定しているが、数式に置き換えることも可能である。例えば、下記式（５）の様に最低連続数を設定することができる。
【０１０８】
【数３】

次に、図２６を示す。図２６は、人物の肌色色度と同様な色度範囲を有する枯木の林の写真であり、図１９においてはVGA画像としてのデータを処理したが、UXGAの画像としてデータをまとめたものである。
図１９との比較においては、先述の図１８と図２５の比較と同じ傾向がある。ＡＣ成分の２０以上のグループではかなり高周波成分が低減していることがわかる。しかしながら、人物肌とのデータとは分布が極端に違うので、周波数帯域ごとに適応範囲を設定することで、分離することが可能である。
このために設定したものが図２７のUXGA画像用判定テーブルである。構成は図４のVGA画像用判定テーブルと同じであり、画像サイズの違いによる平均ブロックの空間周波数特性の違いのみである。
【０１０９】
（ＶＧＡ／ＵＸＧＡでの判別テーブルの共有例）
図４３はUXGA（１６００＊１２００）サイズで人物を撮影した画像サンプルである。また、同じ人物の顔をＶＧＡ（６４０＊４８０）サイズで撮影した画像サンプルが図４５である。
【０１１０】
この２サンプル画像に対して先の実施例の定義で一次抽出における人物肌領域検出を行うと、検出領域の結果は、図４４、図４６のようになる。
【０１１１】
人物の顔の部分に注目すると、検出領域における検出ブロック数はUXGA画像（図４４）が７１９ブロックであり、ＶＧＡ画像（図４６）では６３９とほぼ同じであり、この時のＤＣＴのＡＣ成分平均値による特徴量も下記表のようにほぼ同じになる。
【０１１２】
【表１】

【０１１３】
つまり、人物肌の検出領域におけるＤＣＴのＡＣ成分平均値による特徴量は、入力画像サイズに依存するよりも、検出した領域を構成する画素数（８＊８のブロック数）に依存することがわかる。
【０１１４】
この考えを元に、複数の画像についてＵＸＧＡ画像とＶＧＡ画像に対して検出した８＊８ブロック数とＤＣＴのＡＣ成分平均値との関係をまとめた物が、図３６（ＵＸＧＡ）、図３７（ＶＧＡ）である。
【０１１５】
図３６及び図３７の横軸は、ＤＣＴ値のＡＣ成分の平均値を空間周波数成分の低い領域から１０個単位でまとめたものである。縦軸は、ＤＣＴのコード量（１０個単位の和である。但し７グループ目は３個分の和になる。）
同じ画像でも、画素数が違う為全画像に占める人物肌領域の占有率は同じでも、検出した８＊８ブロック数が違うので、図３６と図３７では、検出ブロック数の値が違う部分があるが、その中で共通な１００〜１９９を見るとほぼ一致している特性を示していることが確認できる。
【０１１６】
上記結果より、検出画像サイズである８＊８ブロック数と検出領域のＤＣＴ値ＡＣ成分の平均値特徴量を規定したものが図４７である。
【０１１７】
先の実施例では、画像サイズにより特徴量判定テーブルを持つ必要があったが、本方式を用いることで、判定テーブルの簡素化を図ることが出来る。
【０１１８】
本実施例を利用した一次抽出のフローチャートを図４９に示す。
【０１１９】
ステップS5901においては、１段階目の図４８で規定した適合色度比率のブロック判定を行う。
【０１２０】
ステップS5902においては、上記ステップにおいて適合したブロックの隣接状況を検出し、グループ化する。
【０１２１】
ステップS5903においては、グループ化した候補において、グループの構成ブロック数の多い順に候補番号を発行する。
【０１２２】
ステップS5904においては、候補番号順にＤＣＴのＡＣ成分特徴量による判定を実行する。
【０１２３】
ステップS5905においては、判定に適合した最終検出した結果を画像補正へ渡せるように必要事項をセットする。
【０１２４】
（目標領域の選別手順例）
図６の説明に戻る。上述したように、色度により検出された長手方向の連続量が大きいデータから順に注目画像の候補番号１〜ｎ（本実施形態においてはｎ＝８）を付ける（ステップＳ６０８）。ｎ以降の検出したものについては候補番号は付けられない。
【０１２５】
次に、ステップＳ６０９に進み、上記候補１〜ｎに対して、ステップＳ６０９からＳ６１２を実行して、図４で示した連続ブロック数に対する空間周波数特性適正範囲判定表の範囲に適合するか逐次比較する。この結果、適合する候補が存在しない場合は注目画像が存在しないと判断する。
【０１２６】
この候補１〜ｎに対して、画像サイズがVGA(640*480)である時は、図４で示した連続ブロック数に対する空間周波数特性適正範囲判定表の範囲に適合するか逐次比較する。最初の連続検出ブロックから周波数特性の特徴量について、適合範囲内であるか比較を行う。
【０１２７】
この時、上述したように入力画像サイズが違う画像、例えばUXGA（1600＊1200）画像においては適合判定に図２７のUXGAテーブルを使用して比較判定を行うのが好ましい。
この場合、本実施形態においては、画像サイズごと、もしくは、画像サイズ範囲（例えばVGA〜XGAとSXGA〜QXGAまでなど特定の画像範囲において共通のテーブル）ごとに設定した適応周波数特性判別テーブルにて周波数特性の比較判定を行ったが、数式を用いた判定基準を代わりに用意してもよい。
例えば、数式の作成方法としては、既に適正化テーブルがあるVGAとUXGAのテーブルを元にこの２点間の画像のサイズと周波数成分の値による変化量を対応付け、１次式にて近似して使用することができる。
（注目画像と補正値強度の決定例）
空間周波数による判定の結果、適合する候補が存在しない場合は注目画像が存在しないと判断する（図６には図示せず）。また、適合する候補が存在する場合には、ステップＳ６１３で候補グループを形成し、その１つを注目画像として補正量強度を決定する。
図２２に、そのフローチャートを示す。
【０１２８】
最初のステップＳ２２０１において、候補の数を確認する（１〜ｍ）。
【０１２９】
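Next, in step S2202, candidate groups are formed. Chromaticity-matching blocks adjacent to a candidate are merged into a candidate group. If a candidate group contains more than one candidate, the smallest candidate number becomes the candidate group number.
【０１３０】
Next, step S2203 determines whether there is more than one candidate group. If there is only one, step S2205 takes that candidate group as the image of interest and calculates its points as described below.
【０１３１】
If there are several candidate groups, the process proceeds to step S2204. To judge which group carries more weight as the image of interest to be corrected, the likelihood of each group is converted into points and compared, and the candidate group with the most points is set as the final image of interest. If the points are equal, the group with the smaller candidate group number is chosen.
【０１３２】
Points are assigned as follows: when there are m candidates, candidate 1 receives m points, candidate 2 receives m-1 points, and so on down to candidate m, which receives 1 point.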
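The point comparison can be sketched as follows. The text gives the points per candidate but does not spell out how a group's score is formed; summing the points of the candidates contained in the group is an assumption made here for illustration, and the function name `pick_group` is hypothetical.

```python
def pick_group(groups, m):
    """Choose the final candidate group.

    groups: list of lists of candidate numbers (1 = longest run) in each group.
    m: total number of candidates.
    Returns (group_number, points), where the group number is the smallest
    candidate number in the winning group.
    """
    def points(candidate):
        return m - candidate + 1  # candidate 1 -> m points, candidate m -> 1 point

    # Assumed scoring: a group's points are the sum over its member candidates.
    scored = [(sum(points(c) for c in group), min(group)) for group in groups]
    # Highest points wins; on a tie the smaller group number wins.
    best_points, best_number = min(scored, key=lambda s: (-s[0], s[1]))
    return best_number, best_points
```

For example, with five candidates split into groups {1, 4} and {2, 3, 5}, the groups score 7 and 8 points, so the second group (group number 2) is selected even though it does not contain the longest run.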
【０１３３】
このようにして、候補グループ間の優位性を判断した結果の実例を図２３に示す。検出した候補グループは２グループあり、そのうち右のグループのポイントが左の候補グループのポイントを上回ったので、最終候補となっている。
【０１３４】
また、ポイント数の絶対値は、対象となる候補グループの注目画像としての信頼度を表しているので、このポイントにより注目画像に対する補正強度を決定する。補正強度決定方法としては、ポイントによる閾値を設け、閾値の上下関係で強度の指定を行う。
【０１３５】
但し、このようなポイントによる注目画像の検出ではなく、より軽い処理として、一番長い検出値の候補が入るグループもしくは、検出値そのものを注目画像としてもよい。この場合、本実施形態より検出確率に多少の差は発生するが、処理能力の低い機器ではこの方式のほうが適合する場合もある。
＜本実施形態の処理結果例＞
先の、図１０と図１４に対する結果を図１３と図１７に示す。
【０１３６】
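In Fig. 13, the skin of the person's face, the image of interest, has been detected. In Fig. 17, none of the candidates matched the frequency characteristics, and the candidate areas are shown blacked out. This indicates that no image of interest was detected, which means the image is not subject to image correction weighted toward an image of interest.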
【０１３７】
こうして注目画像を検出することができる。通常の画像補正は、画像全体のバランスに亘って補正が行われるので、逆光などで本来注目したい画像の画質を落としてしまう場合が存在しているが、本実施形態による注目画像検出により、補正項目として輝度の最適化のための露出、及び好ましい肌色のための色バランスや彩度補正を注目画像のデータを基に補正を行うことで、より高品質な画像を得ることができる。
【０１３８】
図２４に、一般画像補正を行った結果と、本実施形態の注目画像検出を利用して画像補正を行った結果の一例を示す。図２４に示したように、本実施形態の注目画像検出を利用して画像補正を行った場合は、人物などの注目画像をより良くプリントすることができる。
【０１３９】
＜第１画像抽出の処理手順の改善例１＞
次に、復号部の量子化テーブルによる画像への特性を説明する。
【０１４０】
図２８〜図３０は、代表的画像アプリケーションがJPEGファイルを作成する時の画像圧縮比率を決定する為の１３種類の量子化テーブルである。図２８〜図３０において、テーブル"００"は最も画像圧縮率を高めたものであり、テーブル"１２"は保存画質を高め、画像圧縮率を低めたものである。
【０１４１】
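The tables are used to compress further the DCT data of the 8*8 block described for ③ and ④ of Fig. 3: each of the 64 values corresponding to the spatial frequencies of the block is quantized by the table value at the same position.
【０１４２】
For example, when ③ of Fig. 3 is quantized with table "00", the value "224" at the upper left of the 8*8 block is quantized by the value "32" at the same upper-left position of table "00" and becomes "7". At the lower right of the 8*8 block, which holds the highest frequency component, "-1" is quantized by "12" and becomes "0".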
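The two values in this example can be reproduced with a few lines of code. The rounding rule (round to nearest, halves away from zero) is an assumption; the text only gives the inputs and results. The function names are hypothetical.

```python
def quantize(coeff, q):
    """Divide a DCT coefficient by its table entry, rounding to the nearest integer."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + q // 2) // q)


def dequantize(level, q):
    """Recover an approximate DCT coefficient from a quantized level."""
    return level * q
```

Here `quantize(224, 32)` gives 7 and `quantize(-1, 12)` gives 0. Dequantizing returns 224 and 0 respectively, so the small high-frequency coefficient is lost, which is why coarse tables remove the AC detail that the detection relies on.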
【０１４３】
図３１に、図２８，２９，３０のテーブル"００"〜"１２"の特性及び市販のデジタルスチルカメラの記憶部で使用している量子化テーブルを示す。
【０１４４】
横軸は、量子化テーブルＡＣ分６４個を１０個単位でまとめたものであり、縦軸は、その１０個単位の値の平均値である。したがって、どの空間周波数成分を多く量子化しているかを確認することができる。
【０１４５】
テーブル"００"〜"０４"においては、低周波成分の量子化比率が大きくなっている。市販のデジタルスチルカメラでは、低周波成分での量子化量は少なく、高周波成分域においても"１５"未満である。これに対応する量子化比率は、アプリケーションにおけるテーブル"１０"以上であり、画像の量子化としては低圧縮率の分類になる。
【０１４６】
ポートレートである図１０と、人物肌色度に一致する枯れ林である図１４の画像に対して、上記テーブルを１個飛びに用いて量子化を行った後の画像に対して、それぞれ注目画像検出を行った結果を、図３２及び図３３に示す。
【０１４７】
図３２の場合、テーブル"００"を使用した時は低周波成分の量子化の大きさにより判定テーブル（図４）による人物特性から外れてしまっている。テーブル"０２"では、人物を検出したが、検出ポイントは低い。テーブル"０６"以上で安定した検出ができている。
【０１４８】
図３３の場合、テーブル"００"を使用した時は、本来人物肌判定テーブル（図４）より高周波域で、外れてしまう検出値が量子化による誤差で、"検出判定"となり、誤判定になってしまっている。こちらの場合でも、テーブル"０８"以上で安定した検出ができている。
【０１４９】
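Because the accuracy of the determination changes with the values of the quantization table, the quantization table is checked first. The features are extracted from the AC components of the DCT of the compressed image, so if the quantization table values are too large, the AC components of the image are lost and the region may fail the determination at detection time. To ensure that images regarded as high quality, such as those from typical digital cameras or JPEG files saved by Adobe Photoshop, can be handled without problems, a limit inferred from actual data was adopted: the sum of the quantization table values should not exceed about 630.
【０１５０】
To simplify the check, this embodiment sums all entries of the quantization table and treats the image as usable for the determination only when the total is 630 or less.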
【０１５１】
量子化テーブルの判定方法は、この他に、低周波成分での値などに注目する方法、低域３０までの総和を"１５０"とする方法など、検出する注目画像の空間周波数特性により幾つも考えることができるが、量子化テーブルの特性を使用するようにしてもよい。
【０１５２】
本例の処理を２段階目の前に実行する場合は、取得した量子化テーブルから図３４のフローチャートに示した処理を行い、ＡＣ成分特性判定テーブルを設定する。
【０１５３】
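The processing is as follows. First, step S3401 adds up all values in the quantization table; this sum indicates the degree of quantization. Step S3402 determines whether the sum is at or above a predetermined value. For example, if the sum is 630 or more, the spatial frequency characteristics of the image of interest are considered to have been altered, so the detection is aborted. If the sum is less than 630, the spatial frequency characteristics of the image of interest are judged to be unaffected, and step S3403 selects the AC-component characteristic determination table (according to the input image size) before the process moves on to the selection by spatial frequency.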
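The gate on the quantization table can be sketched as below. Paragraph 0150 accepts a sum of "630 or less" while this paragraph aborts at "630 or more", so the handling of a sum of exactly 630 is ambiguous; the sketch follows paragraph 0150 and accepts it, which is an assumption. The names `QSUM_LIMIT` and `quant_table_usable` are hypothetical.

```python
QSUM_LIMIT = 630  # threshold stated in the text


def quant_table_usable(qtable):
    """True when the 8x8 quantization table is mild enough for AC-feature detection."""
    total = sum(sum(row) for row in qtable)
    # Assumed boundary handling: a sum of exactly 630 is still accepted.
    return total <= QSUM_LIMIT
```

A caller would skip the detection of the image of interest and fall back to ordinary whole-image correction when this returns False.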
【０１５４】
又、本例の処理を１段階目の前に実行してもよく、その場合はステップＳ３４０３では、図６の処理を実行する。
【０１５５】
＜第１画像抽出の処理手順の改善例２＞
上記第１画像抽出の処理手順では、１段階目では、短い方向には予め決められたブロック数（ＶＧＡでは２ブロック、ＵＸＧＡでは４ブロック）で、長手方向に肌色ブロックが長く連続する場合を候補として抽出し、２段階目で空間周波数による選別と短い方向に隣り合う候補のグループ化を行った。
【０１５６】
しかし、１段階目で、抽出した候補から短い方向に隣り合う候補グループを作成して、候補グループに例えば上述のポイント数に基づいて番号を振りなおし、２段階目では、各候補グループの空間周波数による選別を行って、注目画像を抽出してもよい。
【０１５７】
この処理により、２段階目の処理が簡素化されると共に、空間周波数による選別をより安定的に実行することが可能になる。
【０１５８】
本実施形態においては、プリントのための最適画像処理用に注目画像を検出する方法を示しているが、表示用などにも使用できることは言うまでも無い。
【０１５９】
また、本実施形態においては、検出画像の周波数成分特性を見るために周波数情報を１０個単位で加算して周波数成分の６３個を７グループとして、画像の特性を判断したが、グループ化という発想をなくし、６３個全ての周波数をそのまま利用してもよいことは言うまでも無い。
【０１６０】
更に、画像の長手方向からの連続量の検出後短い方向の検出を行ったが、この順序も逆になっても可能であり、この他、検出ブロックを一列のグループとして検出する方法以外にも色度で検出したグループにおける全ての方向に隣接したブロックグループという、とらえ方で空間周波数特性を確認する方法など、色度と周波数特性を組み合わせた検出方法はいくらでもあり、これら一連の検出方法は本発明に含まれることは言うまでも無い。
【０１６１】
本実施形態においては、図４や図２７のように、連続検出値を３グループに分け周波数特性の適正範囲との比較を行い周波数特性の合否を判定したが、連続検出を３グループ化したのは、実施形態を簡単化するためで、連続値ごとに適正範囲を設定してもよいし、連続値には相関関係が有るので、テーブル方式ではなく理論式による方法を用いてもよい。また、周波数特性も７グループ値を使用したが、６３個の周波数のすべてにて行ってもよいし、更には特定の周波数に注目して判定してもよい。
【０１６２】
本実施形態においては、検出の目的になっている注目画像は人物の肌の領域に設定して説明しているが、周波数成分、もしくは周波数成分と色度により検出可能なものは、人物の肌色に限らず、空、海、木々の緑なども存在する。
【０１６３】
本実施形態においては、８＊８ブロック単位データの周波数成分を周波数の低い順から１０個単位でまとめた値を用いて、その１０個の和によるグループ（最も高い周波数グループは３個の和）の特性から周波数特性を代表させているが、JPEG ファイルの場合、ＤＣ成分１個に対し、ＡＣ成分６３個の構成で、周波数特性を表しているので、１０個の集合体として特性を見なくてもよい。
【０１６４】
また、６３個の個々の特性より判断してもよいし、もっとグループ化してもよい。また、特定の周波数成分のみの利用により特性を導き出してもよい。このように、周波数特性を利用した特性を導くのにＡＣ成分の利用方法はいくらでもある。
【０１６５】
更に、本実施形態では８＊８ブロックの連結と言う概念で縦方向と横方向について注目画像を検出するために色度該当ブロックの連続性において、候補を抽出しているが、この時のブロック集合体の判定方法も、この方法に限られてものではないことは言うまでも無い。
【０１６６】
本実施形態では連続検出した色度ブロックに対して検出した連続値により、端のブロックを削除した値を特性利用しているが、周波数成分による適合から色度ブロックの境界を設定したり（図２１）、予め特定以上の周波数特性のあるブロックを、色度検索を行う前に除外してから行うようにしたりなど、ブロックの集合体を決定するための色度と周波数成分による分離の仕方は、複数の方法と組み合わせがあるが、本願特許の範囲に包含される。
【０１６７】
上記図２１について説明する。図２１の左側は元画像であり、このJPEG ファイル画像の圧縮単位である８＊８画素ブロックの周波成分における高周波成分の総データ値が閾値を超えるか超えないかで判定したのが右側の画像になる。明るい部分が高周波成分を持つ領域で、暗い部分が高周波成分の少ない領域である。この領域を境に設けた色度判定による注目画像検出も可能である。
【０１６８】
また、本実施形態は、画像圧縮ファイルとして、"JPEGファイル"を利用した方法を開示したが、"JPEG2000 ファイル"など、周波数成分への変換を利用した他のファイルに対しても同様な考え方で、注目画像の検出を簡単な処理で実現できることは、言うまでもない。
【０１６９】
上記実施形態においては、周波数成分と色度を中心に配置情報などを入れて注目画像検出を行ったが、このねらいは、注目画像を中心とした画像補正を行う為である。したがって、検出された注目画像領域の輝度を含むデータが補正を行うことが有効でない状態として検出された時、例えば暗過ぎる値でつぶれている時などは、補正として無理に諧調性を持たそうとすると、ノイズだらけになってしまう場合がある。
【０１７０】
この不都合を回避するために、図６の検出結果に対して、検出された部分領域の各ブロック直流成分データを利用して輝度平均を出し、補正に適した輝度範囲に入っているかを比較することで、更に精度の良い注目画像における画像補正を行うことができる。
＜第２の実施形態の画像処理装置の構成例＞
注目画像として人物の顔が撮影された場合が、図５１の画像サンプルである。この画像サンプルは、画素数が３０万画素と、近年の入力機器の能力としては低いクラスのジャンルに入る物で、画像ファイルサイズも６０Kバイトと圧縮率の高い物である。このような画像に対しては、上記構成の注目画像検出を行って露出補正を行っても画質の向上は大きくは期待できない。このような画像に対しては、有効な補正としては、通常アンシャープマスクによる補正を行うことで、ボケを排除し、メリハリの利いた補正を行う事が知られているが、この欠点として、画像全体にアンシャープマスク補正を利かせた場合は、肌領域においては肌がざらついた状態になりやすいので、画像全体に行う場合は補正強度を低くするしかなく、また、効果的である目や口の領域のみに利かせるには、領域指定を自動化することが難しい状態であった。
【０１７１】
図５０は、第２の実施形態の画像処理装置の構成例を示すブロックである。図５０では、第１の実施形態の構成要素はブラックボックスで図示されている。この構成要素は基本的に第１の実施形態と同様である。第２の実施形態の特徴ある構成は第２画像抽出部１００ｂとボケ補正処理部３０の追加である。
【０１７２】
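The second image extraction unit 100b, together with the first image extraction unit 100a of the first embodiment, forms the image recognition unit 100. The second image extraction unit 100b has a candidate region selection unit 301, which selects candidate regions based on an image aspect ratio threshold 301a, and a feature part extraction unit 302, which extracts the feature parts within the selected candidate region (in this example the eyes, nose, mouth, eyebrows, and so on within a face region) based on a feature part threshold 302a. In this example, face regions are selected by the aspect ratio of the facial contour.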
【０１７３】
特徴部位抽出部３０２から出力された特徴部位の情報に基づき、復号部１０から出力される復号画像は、第１の実施形態での色調補正に先立って、ボケ補正値算出部３０ａが算出した値に従ってボケ補正処理部３０でボケ補正処理が行われる。
【０１７４】
尚、ハードウエア及びソフトウエア構成は、図１Ｃと類似であるので図示及び説明を省く。
【０１７５】
＜第２の実施形態の画像処理装置の動作手順例＞
上記構成の人物肌領域の検出機能を利用した本発明の概略のフローチャートを図６０に示す。
【０１７６】
フローチャートの説明を行う。
このフローチャートは、本発明における入力画像の中に人物の顔領域の検出と、その構成画素数や量子化フィルターの値により、その人物顔の肌領域内の目や口等の検出と補正処理の適正強度設定及び実行の流れを表している。
ステップS5601においては、対象画像の画素数及び量子化テーブルを基本に、印刷時の拡大倍率と解像度情報を元に、二次画像抽出が必要な物かの判定用情報取得を行う。前記図５１のような画素数の少ない画像は二次画像抽出の対象となりうる。
【０１７７】
ステップS5602においては、前記図６で開示したフローによる注目画像抽出処理を行う。この実施例においては、人物顔の肌領域の特徴量を持つ領域の検出を行う。図５１の画像に対しては、図５２で表すような領域を抽出することになる。図５２における白い部分の領域が人物顔の肌の特徴量を持つ領域と判定され、黒い部分はそれ以外の部分である。この検出において、肌領域部分の特徴量の他に、平均輝度などの算出も行う。
【０１７８】
ステップS5603においては、ステップS5601とS5602の検出結果の論理和により、判定を行う。二次画像抽出が必要ない場合はステップS5607の従来処理へ進み。二次画像抽出が必要な場合はステップS5604へ進む。
【０１７９】
ステップS5604においては、二次画像抽出処理を行う。具体的には図５２の検出した人物肌色領域内にある、一次抽出の色度比率範囲外にある目とか口の候補となる領域を検出し判定を行う。詳細は後述する。
【０１８０】
ステップS5605においては、二次画像抽出が成功したか判定を行う。失敗した場合は、ステップS5607の従来処理へ進み。成功した場合は、ステップS5606に進む。ステップS5606においては、ボケ補正処理を実行する。その後、ステップS5607の注目画像検出の抽出結果を画像補正へ渡す為にセットする。ステップS5608において、抽出結果を反映した画像補正を行う。
【０１８１】
Next, the flowchart in Fig. 61 is used to describe the secondary extraction processing in more detail.
【０１８２】
ステップS5701においては、ステップS5602からの情報から一次画像抽出の候補画像領域の縦横比を算出する。
【０１８３】
ステップS5702においては、候補画像が人物の顔の縦横比定義に適応する物か判定を行う。一次抽出による候補画像が抽出画像定義に適応しない場合はステップS5709に進む。適応する場合は、ステップS5703に進む。
【０１８４】
ステップS5703においては、候補領域内にある、一次抽出の色度比率範囲外の領域を検出する。図５２においては、人物肌領域である白色領域内に独立して残された黒色の領域になる。この領域の各々の構成画素数（ブロック数）と平均色度及びDCTのAC成分の平均値などを算出する。
【０１８５】
本実施例においては、人物顔の肌色以外の構成要素として、目、口、眉毛、眼鏡など考えられるが、そのなかで、目に付いて説明を行う。
【０１８６】
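Figs. 54, 56, and 58 show sample images of an eye.
【０１８７】
In Fig. 54 the eye is 12 pixels high and 22 pixels wide, and the image has been compressed with the quantization tables for image compression shown in Figs. 28 and 29. F12 uses table 11, F9 uses table 08, F7 uses table 06, and F4 uses table 03.
【０１８８】
In Fig. 56 the eye is 24 pixels high and 44 pixels wide, and the image has been compressed with the quantization tables for image compression shown in Figs. 28 and 29. F12 uses table 11, F9 uses table 08, F7 uses table 06, and F4 uses table 03.
【０１８９】
In Fig. 58 the eye is 48 pixels high and 88 pixels wide, and the image has been compressed with the quantization tables for image compression shown in Figs. 28 and 29. F12 uses table 11, F9 uses table 08, F7 uses table 06, and F4 uses table 03.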
【０１９０】
ステップS5704においては、目となりえる設定した色度比率に収まっているかを判定する。口などの色度比率設定も可能である。候補として不適当と判定した場合はステップS5709に進む。適応する場合は、ステップS5705に進む。
【０１９１】
ステップS5705においては、ステップS5703で検出した領域が目の候補として、ステップS5701で検出した人物顔の肌領域との面積比率として適正なサイズかを算出する。
【０１９２】
ステップS5706においては、ステップS5703で検出した領域が目の候補として、適正な外形比率に収まっているか確認する為に縦横比率を算出する。
【０１９３】
ステップS5707においては、ステップS5705及びS5706での算出した結果が目としての候補領域になるか判定を行う。候補として不適当と判定した場合はステップS5709に進む。適応する場合は、ステップS5708に進む。
【０１９４】
ステップS5708においては、画像のボケ量の判定と判定結果から補正の強度の確定を行い、補正を実行する。
【０１９５】
まず、ボケ量の判定であるが、まず、先述の目領域の画像サンプルである図５４、図５６、図５８に対して一定のアンシャープマスク処理を加えた画像を図５５、図５７、図５９に示す。
【０１９６】
そして、この画像に関するDCTのAC成分平均値の特徴量を図６２，図６３，図６４に示す。
【０１９７】
図６２は、目を表す構成の画素数を縦１２画素、横２２画素とした物で、横軸は対象画像全領域におけるＤＣＴ値のＡＣ成分の平均値を前述の要領で空間周波数成分の低い領域から１０個単位でまとめたものである。縦軸は、ＤＣＴのコード量（１０個単位の和である。但し７グループ目は３個分の和になる。）上記内容から、量子化フィルターによるデータ量の差は空間周波数の高域成分に現れるが、目である対象領域においての差はあまり大きくない。アンシャープマスク処理で低域分の空間周波数特性が上がっているので、メリハリがついていることがわかる。
【０１９８】
図６３は、目を表す構成の画素数を縦２４画素、横４４画素とした物で、グラフの構成は図６２と同じである。上記内容から、量子化フィルターによるデータ量の差は空間周波数の高域成分に現れるが、目である対象領域においての差はあまり大きくない。アンシャープマスク処理で低域分の空間周波数特性が上がっているので、メリハリがついていることがわかる。
【０１９９】
図６４は、目を表す構成の画素数を縦４８画素、横８８画素とした物で、グラフの構成は図６２と同じである。上記内容から、量子化フィルターによるデータ量の差は空間周波数の高域成分に現れるが、目である対象領域においての差はあまり大きくない。アンシャープマスク処理で低域分の空間周波数特性が上がっているので、メリハリがついていることがわかる。
【０２００】
画像サイズによる違いとしては、構成画素が多いほどＤＣＴ値のＡＣ成分の平均値である特徴量は小さくなる。また、ＡＣ成分の分布状況は同じである。
【０２０１】
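To reflect in the correction the effect that the pixel count of the eye image and the quantization filter values have on unsharp masking, the unsharp mask correction strength is specified as shown in Fig. 65 according to the size of the detected secondary extraction region and the quantization filter values.
【０２０２】
In addition, when the luminance distribution of the skin region detected by the primary extraction spans a wide range, for example outdoors where direct sunlight produces a large spread between light and dark across the facial skin, sharpening has less effect. Therefore, as shown in Fig. 66, when luminance is expressed as 0 to 255 and the skin region detected by the primary extraction has a luminance range of 150 or more, the unsharp masking applied to the secondary extraction region is set somewhat stronger.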
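The idea of choosing a strength and then applying unsharp masking to the extracted region can be sketched as follows. The tables of Figs. 65 and 66 are not reproduced in the text, so every numeric threshold and increment in `correction_strength` is a hypothetical placeholder; only the direction of the rule (fewer pixels, coarser quantization, or a wide skin luminance range give a stronger correction) comes from the text. The 3x3 box blur is likewise an assumed choice of blur kernel.

```python
def correction_strength(region_pixels, quant_sum, skin_luma_range=0):
    """Pick an unsharp-mask amount; all numeric values here are hypothetical."""
    amount = 0.5
    if region_pixels <= 12 * 22:      # very small feature region
        amount += 1.0
    elif region_pixels <= 24 * 44:    # small feature region
        amount += 0.5
    if quant_sum > 300:               # coarse quantization table
        amount += 0.5
    if skin_luma_range >= 150:        # wide luminance spread on the skin region
        amount += 0.25
    return amount


def box_blur(img):
    """3x3 box blur of a grayscale image (list of rows), clamping at the edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def unsharp_mask(img, amount):
    """Sharpen by adding amount * (original - blurred) to each pixel."""
    blurred = box_blur(img)
    return [[min(255, max(0, round(p + amount * (p - b))))
             for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]
```

Applying `unsharp_mask` only to the pixels of the secondary extraction region, rather than to the whole image, avoids the roughened skin texture that the text attributes to whole-image sharpening.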
【０２０３】
ステップS5709においては、抽出結果の値を反映した画像補正を実行する。
【０２０４】
図５１に対して上記処理を行った結果が図５３である。ボケ画像に対して、領域を特定して補正が適度に加えられたことが確認できる。
【０２０５】
上述したように、本発明によれば、画像圧縮ファイルを解凍する過程で空間周波数データと量子化テーブルとを取得し、上記空間周波数データ及び量子化データ特性を組み合わせて画像ファイル中の注目画像を検索するために利用するようにしたので、高度な計算をすることなく画像データブロックごとの交流成分情報を含む情報を取得して、画像ファイルの中の注目画像を検索することができる。
【０２０６】
また、本発明の他の特徴によれば、デジタルカメラから直接プリントする場合などのように、パーソナルコンピュータと比べて処理能力が低い組み込み式の機器においても、製品として使用可能な範囲の処理で、印刷する圧縮画像ファイルに補正の対象となる注目画像の有無、及びその値の適正度を検出することができ、必要に応じて注目画像を重視した画像補正を施すようにすることができる。
なお、以上に説明した本実施形態の画像認識装置は、コンピュータのＣＰＵあるいはＭＰＵ、ＲＡＭ、ＲＯＭなどで構成されるものであり、ＲＡＭやＲＯＭに記憶されたプログラムが動作することによって実現できる。
【０２０７】
したがって、コンピュータが上記機能を果たすように動作させるプログラムを、例えばＣＤ−ＲＯＭのような記録媒体に記録し、コンピュータに読み込ませることによって実現できるものである。上記プログラムを記録する記録媒体としては、ＣＤ−ＲＯＭ以外に、フレキシブルディスク、ハードディスク、磁気テープ、光磁気ディスク、不揮発性メモリカード等を用いることができる。
【０２０８】
また、コンピュータが供給されたプログラムを実行することにより上述の実施形態の機能が実現されるだけでなく、そのプログラムがコンピュータにおいて稼働しているＯＳ（オペレーティングシステム）あるいは他のアプリケーションソフト等と共同して上述の実施形態の機能が実現される場合や、供給されたプログラムの処理の全てあるいは一部がコンピュータの機能拡張ボードや機能拡張ユニットにより行われて上述の実施形態の機能が実現される場合も、かかるプログラムは本発明の実施形態に含まれる。
【０２０９】
また、本発明をネットワーク環境で利用するべく、全部あるいは一部のプログラムが他のコンピュータで実行されるようになっていてもよい。例えば、画面入力処理は、遠隔端末コンピュータで行われ、各種判断、ログ記録等は他のセンターコンピュータ等で行われるようにしてもよい。
【０２１０】
【発明の効果】
本発明により、画像の輝度によって注目画像領域抽出に用いられる色度比率判定を適正化して、安定した注目画像領域の抽出及び画像の再生を行うことができる。
【０２１１】
又、人物顔画像などがボケたようなものにおいては、適正な補正への情報を取得できる。
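[Brief description of the drawings]
[FIG. 1A] A conceptual diagram showing the flow of acquiring the data needed when decompressing a JPEG image according to an embodiment of the present invention.
[FIG. 1B] A block diagram showing a configuration example of the image processing apparatus according to the first embodiment.
[FIG. 1C] A block diagram showing a configuration example of the hardware and software of the image processing apparatus according to the embodiment.
[FIG. 2] A conceptual diagram showing the flow of converting image data into JPEG format in the embodiment.
[FIG. 3] A diagram showing the process of conversion into JPEG format, using an 8*8 block, the JPEG image compression unit, as an example.
[FIG. 4] A diagram showing a determination table that uses the AC component characteristics of the 8*8 block, the compression unit of a JPEG file image.
[FIG. 5] A diagram showing an example of a skin color RG chromaticity distribution other than that of the embodiment.
[FIG. 6] A flowchart of image-of-interest detection starting from JPEG image decompression in the embodiment.
[FIG. 7] A diagram showing a chromaticity detection method in the 8*8 block, the compression unit of a JPEG file image.
[FIG. 8] A diagram showing a chromaticity detection method that uses the DC component of the 8*8 block, the compression unit of a JPEG file image.
[FIG. 9] A diagram showing the detection state in 8*8 blocks when chromaticity detection uses 3-bit decimation.
[FIG. 10] A diagram showing a first example of a JPEG image sample for detection.
[FIG. 11] A diagram showing an example of the BMP file resulting from detection on the first image sample by chromaticity alone.
[FIG. 12] A diagram showing an example of the BMP file resulting from arrangement and continuous-block detection on the first image sample, based on chromaticity detection in 8*8 block units.
[FIG. 13] A diagram showing an example of the BMP file resulting from the image-of-interest detection of the embodiment on the first image sample, using arrangement, continuous blocks, and AC components, based on chromaticity detection in 8*8 block units.
[FIG. 14] A diagram showing a second example of a JPEG image sample for detection.
[FIG. 15] A diagram showing an example of the BMP file resulting from detection on the second image sample by chromaticity alone.
[FIG. 16] A diagram showing an example of the BMP file resulting from arrangement and continuous-block detection on the second image sample, based on chromaticity detection in 8*8 block units.
[FIG. 17] A diagram showing an example of the BMP file resulting from the image-of-interest detection of the embodiment on the second image sample, using arrangement, continuous blocks, and AC components, based on chromaticity detection in 8*8 block units.
[FIG. 18] A diagram showing, for the human skin detection of the embodiment, the frequency characteristics of the AC components for consecutive chromaticity detection values in human skin detection data.
[FIG. 19] A diagram showing, for the human skin detection of the embodiment, a table of the frequency characteristics of the AC components for consecutive chromaticity detection values in detection data of a forest of dead trees.
[FIG. 20] A diagram showing the skin color RG chromaticity distribution of the embodiment.
[FIG. 21] A diagram showing an example of a detection method for creating boundaries from frequency characteristics.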
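[FIG. 22] A flowchart showing the candidate group determination procedure of the embodiment.
[FIG. 23] A diagram showing an example of a detection result image of the candidate group determination of the embodiment.
[FIG. 24] A diagram showing an example of comparison results of image correction using the image-of-interest detection of the embodiment.
[FIG. 25] A characteristic diagram showing, for the human skin detection of this embodiment, the frequency characteristics of the AC components for consecutive chromaticity detection values in human skin detection data in a UXGA (1600*1200) image.
[FIG. 26] A diagram showing, for the human skin detection of this embodiment, a table of the frequency characteristics of the AC components for consecutive chromaticity detection values in detection data of a forest of dead trees in a UXGA (1600*1200) image.
[FIG. 27] A diagram showing an example of a determination table for UXGA (1600*1200) images that uses the AC component characteristics of the 8*8 block, the compression unit of a JPEG file image.
[FIG. 28] A diagram showing an example of a quantization table used by an existing application.
[FIG. 29] A diagram showing an example of a quantization table used by an existing application.
[FIG. 30] A diagram showing an example of a quantization table used by an existing application.
[FIG. 31] A diagram showing the relationship between compression ratio and frequency characteristics in quantization tables.
[FIG. 32] A diagram showing an example of the result of image-of-interest detection.
[FIG. 33] A diagram showing an example of the result of image-of-interest detection.
[FIG. 34] A flowchart showing an example of the procedure for setting the AC component characteristic determination table from the acquired quantization table.
[FIG. 35] A diagram showing, for this embodiment, the distribution of human skin region chromaticity ratios in a plurality of images, classified by the average luminance of the detected region.
[FIG. 36] A diagram showing, for this embodiment, a table in which human skin regions in UXGA (1600*1200 pixel) image files are detected and the average values of the DCT AC components of the JPEG 8*8 blocks within those regions are classified by the number of detected pixels (number of JPEG 8*8 blocks).
[FIG. 37] A diagram showing, for this embodiment, a table in which human skin regions in VGA (640*480 pixel) image files are detected and the average values of the DCT AC components of the JPEG 8*8 blocks within those regions are classified by the number of detected pixels (number of JPEG 8*8 blocks).
[FIG. 38] A diagram showing, for this embodiment, an image sample in which blown-out highlights occur within a person's face region.
[FIG. 39] A diagram showing, for this embodiment, the region detected when human skin region detection is performed on the image sample of FIG. 38 with a fixed chromaticity ratio range.
[FIG. 40] A diagram showing, for this embodiment, the result of human skin region detection on the image sample of FIG. 38 using a definition with an expanded matching chromaticity ratio range.
[FIG. 41] A diagram showing, for this embodiment, the result of human skin region detection on the image sample of FIG. 38 using a definition of the matching chromaticity ratio range that depends on the luminance value.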
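[FIG. 42] A diagram showing, for this embodiment, the luminance histogram of the entire image sample of FIG. 38.
[FIG. 43] A diagram showing, for this embodiment, an image sample of a person photographed at UXGA (1600*1200) size.
[FIG. 44] A diagram showing, for this embodiment, the region detected by human skin region detection on the image sample of FIG. 43.
[FIG. 45] A diagram showing, for this embodiment, an image sample of a person photographed at VGA (640*480) size.
[FIG. 46] A diagram showing, for this embodiment, the region detected by human skin region detection on the image sample of FIG. 45.
[FIG. 47] A diagram showing, for this embodiment, a determination table for the spatial frequency feature (the DCT AC components) according to candidate region size.
[FIG. 48] A diagram showing, for this embodiment, chromaticity ratio range table 2 for determining extraction candidate regions.
[FIG. 49] A flowchart showing, for this embodiment, the processing procedure of the DCT feature determination scheme based on the number of pixels (number of blocks) of the region extracted by chromaticity ratio.
[FIG. 50] A block diagram showing a configuration example of the image processing apparatus according to the second embodiment.
[FIG. 51] A diagram showing, for this embodiment, an image sample of a person's face taken with a mobile phone having a 300,000-pixel CCD.
[FIG. 52] A diagram showing, for this embodiment, the region (white portion) detected by human skin region detection on the image sample of FIG. 51.
[FIG. 53] A diagram showing, for this embodiment, the result of selecting eye and nose candidates within the human skin (face) region of the image sample of FIG. 51 and applying unsharp mask processing only to that region.
[FIG. 54] A diagram showing, for this embodiment, an image of an "eye" captured at 22*12 pixels and saved in four variants with different JPEG quantization table values, from high-compression "F4" to low-compression "F12".
[FIG. 55] A diagram showing, for this embodiment, the result of applying unsharp mask processing to each image of FIG. 54.
[FIG. 56] A diagram showing, for this embodiment, an image of an "eye" captured at 44*24 pixels and saved in four variants with different JPEG quantization table values, from high-compression "F4" to low-compression "F12".
[FIG. 57] A diagram showing, for this embodiment, the result of applying unsharp mask processing to each image of FIG. 56.
[FIG. 58] A diagram showing, for this embodiment, an image of an "eye" captured at 88*48 pixels and saved in four variants with different JPEG quantization table values, from high-compression "F4" to low-compression "F12".
[FIG. 59] A diagram showing, for this embodiment, the result of applying unsharp mask processing to each image of FIG. 58.
[FIG. 60] Flowchart 1 of the extended image extraction processing, including correction processing, in this embodiment.
[FIG. 61] Flowchart 2 of the extended image extraction processing, including correction processing, in this embodiment.
[FIG. 62] A diagram showing, for this embodiment, a graph comparing DCT characteristics for quantization filter values and unsharp mask processing (22*12 size).
[FIG. 63] A diagram showing, for this embodiment, a graph comparing DCT characteristics for quantization filter values and unsharp mask processing (44*24 size).
[FIG. 64] A diagram showing, for this embodiment, a graph comparing DCT characteristics for quantization filter values and unsharp mask processing (88*48 size).
[FIG. 65] A diagram showing, for this embodiment, a table of unsharp mask strength according to the image quantization filter value and the detected region size.
[FIG. 66] A diagram showing, for this embodiment, the relationship between the luminance distribution of the human skin color region and the unsharp mask strength setting for the eye region inside it.
[0001]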
BACKGROUND OF THE INVENTION
The present invention relates to the extraction of image regions and the reproduction of images, and in particular to a method and apparatus for extracting a target image region from a compressed image and reproducing the compressed image, to a computer program related to that processing, and to a computer-readable recording medium. It is particularly suitable for extracting a target image region from an image in a compressed data format, for example a JPEG file image, and reproducing it as desired.
[0002]
[Prior art]
For example, a JPEG file image shot and compressed with a digital camera or the like may be reproduced on a personal computer (PC) and displayed, printed from a PC printer or a direct printer, or printed at a DPE shop. If the photographed image data is of good quality, there is no problem, because it can simply be reproduced faithfully and displayed or printed.
[0003]
However, some photographed image data suffers from color cast, insufficient contrast, improper exposure, and the like, and image correction is needed to obtain a good print. In particular, for an image of a person, reproducing and printing the image so that the color of the person's face is appropriate generally gives the viewer a better impression and raises the quality of the photograph. Even when a landscape or an object is photographed, it is desirable to reproduce and print the image so that the color of the intended subject is appropriate.
[0004]
For example, with a silver halide photograph, it is preferable to change the exposure at printing time for each original image in order to obtain a good photograph, and when the photograph includes a person, it is convenient to focus on the color of the person's face in deciding that exposure. Because a human face is known to be skin-colored, the exposure can be chosen so that the face in the printed photograph comes out skin-colored.
[0005]
Conventionally, as a method for recognizing an image from an image file of digital data, for example, Patent Document 1, Patent Document 2, and Patent Document 3 are known.
[0006]
These methods detect the degree of similarity and the degree of coincidence with the designated image. In the case of Patent Document 1, a rough coincidence is obtained in units of blocks using a DC component, and then a restoration process is performed on the candidate image region. This is a method for obtaining a fine match as uncompressed data.
[0007]
In the case of Patent Literature 2, search data is input and created, and the image processing apparatus determines the similarity between this data and a plurality of image data. Furthermore, in the case of Patent Document 3, a compressed image is created by wavelet transforming a search target image. Also, the degree of similarity is determined by subjecting the designated image to wavelet transform and comparing each feature data.
[0008]
On the other hand, as an image correction method, it is known that when an image shot with a digital camera is printed, an application or a printer driver analyzes the shot data using a histogram or the like and uniformly applies image corrections such as contrast, white balance, exposure correction, and sharpness.
[0009]
[Patent Document 1]
Japanese Laid-Open Patent Publication No. 8-161497
[Patent Document 2]
JP 2000-48036 A
[Patent Document 3]
Japanese Patent Laid-Open No. 11-238067
[0010]
[Problems to be solved by the invention]
However, the above-described conventional method cannot accurately find a target image area to be corrected and correct the target image area to a desired color.
[0011]
That is, when a JPEG file image taken with a digital camera or the like is reproduced and displayed or printed, a method must be determined for finding the target image region in the JPEG file image, so that, as in the print processing of silver halide photographs, correction can be applied as necessary to make an image of interest such as a person display or print better.
On the other hand, there is a need for a method in which the detection process can be as light as possible so that it can be used even in devices with low data processing capabilities such as direct printing from a digital camera to a printer.
In view of the above problems, the applicant has proposed a method that can extract a target image region in an image file with a low processing load, regardless of the input image size (Japanese Patent Application 2002-193620).
[0012]
However, the chromaticity ratio determination used for target image region extraction and the feature determination based on the DC component of the DCT are not necessarily optimized, and problems of this kind remain.
[0013]
Further, in determining the feature amount of the AC component of the DCT used for determination of target image detection, it is necessary to have a determination table for each image size depending on the detection size class. However, the determination table is complicated.
[0014]
The present applicant has also proposed a method of optimizing the determination method of feature amount determination and completely extracting a target image region with fewer defects (Japanese Patent Application 2002-193621).
[0015]
In addition, the quantization table values, which relate to the compression ratio of a JPEG-compressed image, are not uniform, because the image may be re-saved after shooting or edited in an application. If a high-compression quantization table is used, the spatial frequency in the image changes drastically, the frequency features in the target image region are also affected, and detection accuracy may fall.
The present applicant further proposes a technique that can extract a target image area by a method that requires less processing load by performing determination using characteristics of a quantization table when extracting a target image area in an image file. (Japanese Patent Application 2002-193622).
[0016]
However, the chromaticity ratio determination used for extracting the target image region may be inaccurate depending on the luminance of the image. In addition, although applying the data acquired by target image region extraction to exposure correction and the like has been considered, when the image is out of focus the information needed for proper correction cannot be said to have been acquired.
[0017]
In view of the above-described problems, an object of the present invention is to optimize the chromaticity ratio determination used for extracting an attention image area based on the luminance of the image, and to perform stable extraction of the attention image area and reproduction of the image.
[0018]
It is another object of the present invention to obtain information for proper correction when a person face image is blurred.
[0021]
[Means for Solving the Problems]
In order to solve this problem, an image reproduction method of the present invention is an image reproduction method for decoding compression-encoded image data and reproducing an image, comprising: a block extraction step of extracting continuous blocks having a predetermined chromaticity range from the decoded image data; a determination step of determining, based on the average value of the spatial frequencies of the continuous blocks, whether or not the continuous blocks are to be set as a target image region; a feature part extraction step of extracting a feature part from the target image region; a decision step of deciding the blur correction strength for the feature part based on the number of pixels of the extracted feature part and the quantization filter value used for the compression encoding; a blur correction step of correcting the blur of the feature part according to the correction strength thus decided; and a reproduction step of reproducing the image whose blur was corrected in the blur correction step.
[0022]
Here, in the step of deciding the correction strength, the correction strength is decided so that the blur is corrected more strongly the smaller the number of pixels of the feature part and the larger the quantization filter value. The predetermined chromaticity range is set based on the luminance values of the decoded image data. The method further comprises a decoding step of decoding the compression-encoded image data to generate the decoded image data, and a step of acquiring chromaticity, spatial frequency, and luminance from the decoded image data. The compression-encoded image data is JPEG image data, and the decoded image data includes DCT coefficients and bitmap data obtained by inverse DCT. The method further includes a selection step of selecting candidates for the target image region based on the number of continuous blocks extracted in the block extraction step. The blur correction is performed by unsharp mask processing.
[0023]
The present invention also provides a computer program for causing a computer to execute the steps of the above method, and a computer-readable storage medium storing the computer program.
[0025]
Further, the image processing apparatus of the present invention is an image processing apparatus that decodes compression-encoded image data and reproduces an image, comprising: block extraction means for extracting continuous blocks having a predetermined chromaticity range from the decoded image data; determination means for determining, based on the average value of the spatial frequencies of the continuous blocks, whether or not the continuous blocks are to be set as a target image region; feature part extraction means for extracting a feature part from the target image region; decision means for deciding the blur correction strength for the feature part based on the number of pixels of the extracted feature part and the quantization filter value used for the compression encoding; blur correction means for correcting the blur of the feature part according to the correction strength thus decided; and reproduction means for reproducing the image whose blur was corrected by the blur correction means.
[0026]
Here, the means for deciding the correction strength decides it so that the blur is corrected more strongly the smaller the number of pixels of the feature part and the larger the quantization filter value. The apparatus further comprises decoding means for decoding the compression-encoded image data to generate the decoded image data, and means for acquiring chromaticity, spatial frequency, and luminance from the decoded image data. The compression-encoded image data is JPEG image data, and the decoded image data includes DCT coefficients and bitmap data obtained by inverse DCT. The image processing apparatus further includes selection means for selecting candidates for the target image region based on the number of continuous blocks extracted by the block extraction means. The blur correction is performed by unsharp mask processing.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
Next, an embodiment of an attention image region extraction method, compressed image reproduction method and apparatus, computer program related to the processing, and computer-readable recording medium of the present invention will be described with reference to the accompanying drawings.
[0028]
Hereinafter, in this embodiment, for example, an example in which a target image area is extracted from a JPEG file image that is a compressed image data format and desired reproduction will be described, but the present invention is not limited to compression by JPEG. The present invention can be widely applied to compression techniques that can extract the spatial frequency of an image from symbol data (DCT coefficients in this example) during the compression process, and the present invention includes these. In this example, an example in which a JPEG file image is reproduced and printed will be mainly described. However, the present invention is a technique for reproducing and outputting a compressed image (including display and printing), and includes these.
[0029]
Furthermore, the attention image region extraction method of the present invention is not limited to extraction of a target image region from a compressed image, but is a technique applied to extraction of a desired target image region from an uncompressed image. Including these, the same effects can be obtained.
[0030]
<Example of compression-encoded data to be decoded and reproduced according to the present embodiment>
First, information omission and encoding / decoding of the most common image compression file “JPEG file” will be described with reference to FIGS.
[0031]
First, with regard to encoding, it is common for digital cameras and digital videos to store still images as JPEG files. In this case, a signal entering a CCD or the like that is a light receiving element of the input device is A / D converted and then taken into a frame memory, and RGB or CMY filter information is converted into luminance and chromaticity information. Then, it is divided into 8 * 8 (64) square pixel blocks.
[0032]
(1) in FIG. 3 shows an example of data of one block among the luminance data bitmap divided into 8 * 8 blocks. In addition, (2) in FIG. 3 shows an example in which the pixel value of 0 to 255 is level-shifted and converted to a signal of −128 to 127. Further, (3) in FIG. 3 shows an example in which the DCT coefficient is obtained by DCT (Discrete Cosine Transform).
[0033]
Further, (4) in FIG. 3 is a quantization table that, in consideration of visual characteristics, discards more of the high-frequency components. The DCT coefficients resulting from (3) in FIG. 3 are quantized using this table.
[0034]
(5) in FIG. 3 is the result of quantization. This value is entropy encoded and expressed by a Huffman code to generate compressed data that is an encoded signal.
[0035]
Next, in decoding, the reverse process of the above encoding is performed. That is, the encoded signal is decoded and the value of the quantized DCT coefficient is decoded. Next, a DCT coefficient is obtained by multiplying the quantization table to perform inverse quantization. Thereafter, the image subjected to the level shift is restored by performing inverse DCT, and the image of one block is decoded by adding the value 128 of the inverse level shift.
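The per-block encoding and decoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the embodiment: entropy (Huffman) coding is omitted, the function names are hypothetical, and the quantization table is passed in by the caller rather than taken from FIG. 3.

```python
import math

N = 8  # JPEG block size


def _scale(k):
    # Normalisation of the orthonormal DCT-II (same scaling as the JPEG DCT)
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis value for sample index i and frequency index k
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an 8*8 block given as a list of 8 rows."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse 2-D DCT, returning the level-shifted spatial samples."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def encode_block(pixels, qtable):
    """Level shift (0..255 -> -128..127), DCT, then quantize."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coef = dct2(shifted)
    return [[round(coef[u][v] / qtable[u][v]) for v in range(N)] for u in range(N)]


def decode_block(quantized, qtable):
    """Dequantize, inverse DCT, undo the level shift, clamp to 0..255."""
    coef = [[quantized[u][v] * qtable[u][v] for v in range(N)] for u in range(N)]
    shifted = idct2(coef)
    return [[min(255, max(0, round(s + 128))) for s in row] for row in shifted]
```

The dequantized coefficients produced inside `decode_block` are the block-level frequency data that the detection described later relies on, so they are available as a by-product of ordinary decoding.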
[0036]
The above description omits the step of combining the data divided into luminance information and chromaticity information and converting it into an RGB image. As shown in the corresponding figure, a color image is converted into a luminance component (Y) and two chromaticity components (Cb, Cr), each component is encoded, and the results are combined to generate the compressed image data.
[0037]
There are several ways to print a JPEG image, which is a compressed image data file as described above. The compressed image data from the input device can be transferred to a personal computer (hereinafter referred to as a PC) via USB or a storage medium, decompressed, corrected as necessary, and then sent to the printer. Alternatively, the image data from the input device can be input directly to the printer, decompressed in the printer, corrected as necessary, and then printed.
[0038]
In either case, to print a good image, it is necessary to determine whether the photographed image data is of good quality or needs correction, and to separate images that should be printed faithfully as they are from images that should be printed after being corrected to bring them closer to good quality.
[0039]
The following can be considered as a good image.
1) White balance is good.
2) The contrast is appropriate.
3) Necessary gradations are assigned. That is, the exposure setting is good.
4) The saturation is appropriate.
5) The finish looks like a silver halide photograph.
6) The image of interest such as a person is corrected mainly.
[0040]
Commercially available PC printers, and direct printers that do not go through a PC, already perform items 1) to 5) to some extent. Item 6), correction centered on the image of interest, is not performed because its detection requires a large amount of processing and because no method for it has been established.
[0041]
This is particularly difficult to implement in a direct printer with limited processing capability, but the present invention solves this problem. One means of doing so is to detect whether an image of interest exists in the JPEG image file, determine whether the detected image needs correction, and then reflect the result in the correction of the whole image.
[0042]
<Configuration Example of Image Processing Device of First Embodiment>
An example of the configuration of the image processing apparatus according to the first embodiment is shown in a block diagram below.
[0043]
FIG. 1A is a block diagram of the decoding unit 10 showing a process of decompressing a JPEG file and information acquired at that time.
[0044]
In the process of converting a JPEG file into RGB bitmap data, first, the entropy decoding unit 1 performs entropy decoding using the code table 2. Next, the inverse quantization unit 3 stores the quantization table 4 used for inverse quantization as data in addition to performing inverse quantization.
[0045]
The inverse-quantized data is frequency-domain data in block units, and it is acquired as the data for obtaining the frequency characteristics of the image. Thereafter, the inverse DCT unit 5 performs the inverse DCT and the inverse level shift, and Ycc-to-RGB conversion expands the result into ordinary RGB bitmap data.
[0046]
FIG. 1B is a block diagram illustrating a configuration example of the image processing apparatus according to the present embodiment including the decoding unit 10.
[0047]
The image processing apparatus according to the present embodiment includes the decoding unit 10, an image recognition unit 100 (which executes the first image extraction) that recognizes the image region to be corrected based on data acquired from the decoding unit 10, and a color tone correction unit 20 that corrects the region recognized by the image recognition unit 100 to a desired color. The reproduced and corrected image (BMP) output from the color tone correction unit 20 is sent to the printer and printed.
[0048]
The image recognition unit 100 includes a target color detection unit 101 that receives the decoded image (BMP) from the decoding unit 10 and detects regions of a specified target color (skin color in this example); a spatial frequency generation unit 102 that receives the decoded DCT data from the decoding unit 10 and generates the spatial frequency in the target color candidate regions detected by the target color detection unit 101; and a target color region selection unit 103 that selects, from the candidate regions detected by the target color detection unit 101, the region to be subjected to color tone correction on the basis of the spatial frequency. The target color detection unit 101 includes a decoded image storage unit 101a that stores the decoded image. However, the decoded image storage unit 101a need not be inside the target color detection unit 101 and may be shared with other processing units. The target color region selection unit 103 has a determination table 103a for selection. The determination table 103a may comprise a plurality of determination tables corresponding to the image size.
[0049]
To further improve the processing of the present embodiment, the image recognition unit 100 also includes a color tone correction prohibition unit 104, which receives the quantization table values from the decoding unit 10 and prohibits the color tone correction processing according to a determination based on a prohibition threshold 104a.
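A prohibition check of this kind might look like the sketch below. It assumes that the determination compares the sum of the quantization table coefficients with the threshold and that a larger sum (coarser quantization) triggers the prohibition. The text here states only that a threshold-based determination is made, so the comparison direction, the function names, and any threshold value passed in are assumptions for illustration.

```python
def quantization_sum(qtable):
    """Add up all coefficients of a quantization table (list of rows)."""
    return sum(sum(row) for row in qtable)


def correction_prohibited(qtable, threshold):
    """Return True when the table is coarser than the (hypothetical) threshold.

    Assumption: a larger coefficient sum means stronger compression, which is
    treated here as the condition for prohibiting color tone correction.
    """
    return quantization_sum(qtable) > threshold
```

For example, an 8*8 table filled with 2 sums to 128 and one filled with 50 sums to 3200, so a caller-chosen threshold between those values would prohibit correction only for the second table.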
[0050]
The color tone correction unit 20 performs known color correction processing, for example using the color correction table 20a, so that the color of the region selected by the image recognition unit 100 becomes the correction target color (skin color in this example). This color tone correction processing is prohibited, under predetermined conditions, by a color tone correction prohibition signal from the target color region selection unit 103 or the color tone correction prohibition unit 104. For simplicity the correction may be applied to the entire image, but if the aim is to improve image quality, it may differ for each region or be applied only partially. Since the features of the present invention do not lie in the color tone correction method itself, it is described only briefly in this embodiment.
[0051]
FIG. 1C is a diagram illustrating a configuration example of hardware and software for realizing image processing according to the present embodiment. Note that FIG. 1C mainly illustrates the image recognition unit 100 that is a characteristic part of the present embodiment. This apparatus can be realized by a general-purpose computer or a dedicated computer.
[0052]
Reference numeral 110 denotes a CPU for arithmetic processing; 120, a ROM that stores fixed data and programs used by the CPU 110 (an OS and BIOS are assumed here); and 130, a RAM that temporarily stores the data and programs used by the CPU 110 in this embodiment. In this example, the application program is loaded from the external storage unit 140 (described later) into the program load area 132 of the RAM 130 and executed by the CPU 110.
[0053]
The data storage area 131 of the RAM 130 includes: a decoded image data area 13a for storing the decoded image from the decoding unit 10 or the reproduced image after color tone correction; a correction target color area 13b for storing the correction target color (skin color in this example) data; a candidate area storage area 13c for storing the detected target color regions; a candidate group area 13d for storing candidate groups formed from the candidate regions; a selection area storage area 13e for storing the finally selected region; a decoded DCT data storage area 13f for storing the decoded DCT data from the decoding unit 10; a spatial frequency area 13g for storing the generated spatial frequency; a determination table area 13h for storing the determination table used to select the target color region; a quantization table area 13i for storing the quantization table from the decoding unit 10; a storage area 13j for the quantization coefficient sum obtained by adding the coefficients of the quantization table; and an area 13k for storing the threshold group used for prohibiting color tone correction and the like.
[0054]
Reference numeral 140 denotes an external storage unit made of a large capacity or removable medium such as a disk or a memory card, and includes a floppy (registered trademark) disk, a CD, and the like.
[0055]
The data storage area 141 of the external storage unit 140 stores determination tables 1 to n (14a) and a threshold group 14b. A database holding other parameters, image data, and the like may also be stored there. The program storage area 142 stores, broadly, a target color region detection module 14c, a spatial frequency generation module 14d, a target color region selection module 14e, a color tone correction prohibition module 14f, and a feature part extraction module 14g that is executed in the second embodiment described later.
[0056]
Further, the apparatus of FIG. 1C may also serve as the decoding unit 10 and/or the color tone correction unit 20. In this case, a color tone correction table 14f as data, a color tone correction module 14i as a program, and the blur correction module 14j used in the embodiment may also be stored.
[0057]
Reference numeral 150 denotes an input interface, which inputs decoded data (BMP) from the decoding unit 10, decoded DCT data, quantization table values, and target color data that is unique to the apparatus or can be designated from the outside. Reference numeral 160 denotes an output interface that outputs a selection area and a tone correction prohibition signal. If the apparatus also serves as a color tone correction unit, the output is color tone corrected image data (BMP). Furthermore, the present apparatus may also serve as the decoding unit 10, in which case JPEG data is input and color tone corrected image data (BMP) is output. In that case, further data and programs are prepared.
[0058]
<Example of Operation Procedure of Image Processing Apparatus of First Embodiment>
Next, FIG. 6 shows a flowchart of person detection, which is attention image detection considered to be the most important in this image processing.
[0059]
The detection process shown in FIG. 6 is roughly divided into two stages. The first stage, steps S601 to S608, examines the chromaticity ratio over the whole image in units of 8*8 pixel blocks, the compression unit, separates blocks that match the defined chromaticity of the detection target from those that do not, and groups matching blocks that are adjacent in the longitudinal direction of the image (the horizontal direction for landscape-oriented images such as FIGS. 10 and 14) into candidates. The second stage, steps S609 to S613, determines whether the average value of the DCT AC components of each candidate that matched the defined chromaticity ratio falls within the feature value range defined for the detection target, and decides the target image region from the candidates that qualify.
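The two stages can be sketched roughly as below. This is a simplified illustration, not the flowchart of FIG. 6 itself: it assumes a per-block boolean grid marking blocks whose chromaticity matched, a per-block grid of AC component averages, and a single accepted range `lo`..`hi`, whereas the embodiment looks the range up in a determination table. All names are hypothetical.

```python
def horizontal_runs(mask):
    """First stage: group horizontally adjacent matching blocks.

    mask is a list of rows of booleans (True = block matched the chromaticity).
    Returns a list of (row, start_column, length) tuples.
    """
    runs = []
    for r, row in enumerate(mask):
        start = None
        for c, hit in enumerate(row):
            if hit and start is None:
                start = c                      # a run begins
            elif not hit and start is not None:
                runs.append((r, start, c - start))
                start = None                   # the run ended before column c
        if start is not None:                  # run reaches the end of the row
            runs.append((r, start, len(row) - start))
    return runs


def select_by_ac_average(runs, ac_mean, lo, hi):
    """Second stage: keep runs whose mean AC value lies in [lo, hi].

    ac_mean holds one AC-component average per block, same layout as mask.
    lo and hi stand in for the determination table and are assumptions.
    """
    kept = []
    for r, start, length in runs:
        values = ac_mean[r][start:start + length]
        if lo <= sum(values) / length <= hi:
            kept.append((r, start, length))
    return kept
```

Running the chromaticity stage first means the frequency test is evaluated only on a small number of candidate runs, which fits the stated aim of keeping the detection light.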
[0060]
<Example of processing at the first stage>
In the first step S601, DCT data and a quantization table of 8 * 8 pixel block units are acquired, and at the same time, the image file is developed into RGB bitmap data.
[0061]
(Skin color chromaticity block detection example)
In step S602, the RGB bitmap data is searched for whether it corresponds to the chromaticity of the human skin color, which is the target image in the present embodiment, in units of 8 * 8 pixel blocks.
[0062]
In this case, since the proportion of the entire image occupied by one 8*8 pixel block differs depending on the input image size, the unit is set in proportion to the input image size. For example, the unit is 8 blocks (4 in the longitudinal direction × 2 in the short direction) for a VGA (640*480) image and 20 blocks (5 in the longitudinal direction × 4 in the short direction) for a UXGA (1600*1200) image.
There are several chromaticity search methods. Known ones include:
1) Treating as skin color any chromaticity for which the ratio B (blue) / G (green) falls within the range of 0.7 to 0.8 and the ratio R (red) / G (green) falls within the range of 1.4 to 1.8.
2) Representing the skin color by a probability ellipse, as shown in the conceptual diagram of FIG. 5. The ellipse is given by the following equations (1) to (3).
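The first of these known methods, the ratio test, can be written as a short predicate. This is a sketch only: the function name is hypothetical, treating G = 0 as a non-match is an assumption made to avoid division by zero, and the ranges are those of the known method 1), not the range adopted in the embodiment.

```python
def is_skin_chromaticity(r, g, b):
    """Ratio test of known method 1): B/G in 0.7..0.8 and R/G in 1.4..1.8."""
    if g == 0:
        return False  # assumption: no ratio can be formed, so no match
    return 0.7 <= b / g <= 0.8 and 1.4 <= r / g <= 1.8
```

For instance, the pixel (R, G, B) = (160, 100, 75) gives R/G = 1.6 and B/G = 0.75 and therefore matches, while a neutral gray such as (100, 100, 100) does not.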
[0063]
[Expression 1]

[0064]
In the present embodiment, the chromaticity distribution range represented by the following formula (4) taking into account the simplicity of processing is set as the skin color chromaticity range. This range is shown in FIG.
[0065]
[Expression 2]

[0066]
In the present embodiment, the 8*8 pixel block is the unit for detecting the frequency component features in the image, so for structural and logical simplicity the chromaticity determination is also performed in units of 8*8 pixels.
[0067]
FIG. 7 illustrates chromaticity detection points used in the present embodiment. According to this, it is confirmed whether or not all the chromaticities at the four corners of the block of “8 * 8 pixel” unit are within the chromaticity range, and when all are within the range, the block is regarded as the appropriate chromaticity. Judgment.
[0068]
In FIG. 7, the second block from the left in the upper row and the first, second, and third blocks from the left in the lower row match. In the leftmost block of the upper row, the upper-left one of the four points is judged to be a non-skin-color pixel, so the block containing it is judged to be outside the skin color range. Similarly, the upper right 1 block and the lower right 2 block are out of range.
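The four-corner test of FIG. 7 can be sketched as follows. The bitmap layout (rows of (R, G, B) tuples), the function names, and the stand-in chromaticity predicate are assumptions for illustration; the predicate uses the known B/G and R/G ratio ranges rather than the range actually defined for the embodiment.

```python
def ratio_match(r, g, b):
    # Stand-in predicate (assumption): known B/G and R/G ratio ranges
    return g != 0 and 0.7 <= b / g <= 0.8 and 1.4 <= r / g <= 1.8


def block_matches(bitmap, bx, by, is_target=ratio_match):
    """True only if all four corner pixels of 8*8 block (bx, by) match.

    bitmap[y][x] is an (R, G, B) tuple; bx and by are block indices.
    """
    x0, y0 = bx * 8, by * 8
    corners = [(x0, y0), (x0 + 7, y0), (x0, y0 + 7), (x0 + 7, y0 + 7)]
    return all(is_target(*bitmap[y][x]) for x, y in corners)
```

Only four pixels per block are read, so the cost per block is fixed and small, at the price of missing non-matching pixels in the block interior.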
[0069]
FIG. 8 shows determination based on the average chromaticity of the entire 8 * 8 pixel block. The average chromaticity of a block can be obtained by averaging all pixel values in the 8 * 8 block, or alternatively from the DC component of the chromaticity data (Cb, Cr) before the inverse DCT is performed during decompression. An advantage of this method is that the determination reflects the color tone of the entire block, so higher accuracy can be expected than with a small number of detection points. The following looks at what detection by chromaticity alone yields on natural images.
[0070]
FIG. 9 is for the purpose of equalizing the detection intervals in the entire image, although in the same idea as FIG.
[0071]
FIG. 10 is a general portrait photograph, and FIG. 14 is a photograph of a forest of dead trees having a chromaticity range similar to the skin color chromaticity of a person. FIG. 11 and FIG. 15 show the results obtained by performing detection only by matching chromaticity to each pixel with respect to FIG. 10 and FIG.
[0072]
In the detection result for the portrait in FIG. 11, the skin color portion of the person is detected well. However, it can be seen that anything satisfying the matching chromaticity is also detected in the fence and the background, down to fine, dust-like portions. This shows that the image of interest cannot be identified by chromaticity alone.
[0073]
In FIG. 14, the entire forest of dead trees, which has the same chromaticity, is detected even though the purpose is to detect human skin color. As described above, when the chromaticity determination is performed at the pixel level, the target image cannot be identified.
[0074]
Performing detection at the block level targets only regions that have a certain minimum extent, which makes the detection less susceptible to extraneous noise.
[0075]
(Improvement of skin color chromaticity block detection)
FIG. 35 is a graph plotting the average chromaticity ratios of a plurality of human skin regions photographed with a digital camera. The horizontal axis represents the chromaticity ratio of the red component: “R / (R + G + B)” is calculated for each 8 * 8 block and averaged over the entire detection area. The vertical axis represents the chromaticity ratio of the green component, obtained in the same way from “G / (R + G + B)”. In this graph, the regions are divided into eight equal classes by average luminance, and each region's chromaticity ratio is plotted by class.
[0076]
In the embodiment, the suitable chromaticity ratio range is set as follows.
Chromaticity ratio of red component “0.35 to 0.44”
Chromaticity ratio of green component “0.29 to 0.33”
The graph shows that most of the samples fit this definition, but some human skin falls outside it because of reflected light. Attention should be paid to the distribution at luminance 160 or more. In particular, in the class with luminance 223 to 255, the distribution is shifted from the above definition toward the upper left, that is, toward white.
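The chromaticity ratios and the range test above amount to the following arithmetic. This is a small sketch: the function names are illustrative, and returning (0, 0) for a black pixel is an assumption made to avoid division by zero.

```python
from typing import Iterable, Tuple


def chromaticity_ratio(r: float, g: float, b: float) -> Tuple[float, float]:
    """Return (R / (R + G + B), G / (R + G + B))."""
    total = r + g + b
    if total == 0:  # assumption: black has no defined chromaticity
        return (0.0, 0.0)
    return (r / total, g / total)


def mean_ratio(ratios: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Average a non-empty sequence of (red, green) ratios, e.g. over a region."""
    items = list(ratios)
    n = len(items)
    return (sum(x for x, _ in items) / n, sum(y for _, y in items) / n)


def in_base_range(r_ratio: float, g_ratio: float) -> bool:
    """Range from the text: red 0.35 to 0.44, green 0.29 to 0.33."""
    return 0.35 <= r_ratio <= 0.44 and 0.29 <= g_ratio <= 0.33


if __name__ == "__main__":
    print(in_base_range(*chromaticity_ratio(200, 150, 130)))  # typical skin tone
    print(in_base_range(*chromaticity_ratio(255, 255, 255)))  # white is rejected
```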
[0077]
FIG. 38 is an image sample of an object having a high luminance area on human skin. The thing which showed the luminance distribution of this image is FIG.
[0078]
In FIG. 42, the horizontal axis represents luminance in 0 to 255 gradations, with 0 at the left end and 255 at the right end. The vertical axis represents the number of pixels in the image having each luminance. The small peak on the left is the low-luminance coat. The slightly larger peak to the right of center is the paved road, which has the largest area share. The luminance information of the person's face is distributed in the rightmost part.
[0079]
If group detection based on the chromaticity ratio in the primary extraction is performed on the image of FIG. 38 according to the definition of the previous embodiment, the result shown in FIG. 39 is obtained. In the human skin area of FIG. 38, it can be confirmed that the red component saturates as the luminance increases, so the skin whites out and leaves the applicable chromaticity ratio range. The detected area alone can serve as information for exposure correction and the like, but the detection of the person's facial skin area is insufficient for uses such as blur correction.
[0080]
FIG. 40 shows the result when the range of the adaptive chromaticity ratio is simply expanded as follows.
Chromaticity ratio of red component “0.33 to 0.46”
Chromaticity ratio of green component “0.27 to 0.35”
Simply widening the suitable chromaticity ratio range does allow the person's skin area to be detected, but the chromaticity ratio of the paved road then also matches, so areas other than the target image are detected as well and the intended effect is not obtained.
[0081]
In view of this, FIG. 48 defines the suitable chromaticity ratio range for human skin according to the luminance class, taking the state of the input image into account.
[0082]
This definition uses the same chromaticity ratio range as the definition above up to luminance 160. To follow the shift in the detected chromaticity of human skin as the luminance increases, the range at higher luminance is set as follows.
Chromaticity ratio of red component “0.33 to 0.42”
Chromaticity ratio of green component “0.30 to 0.34”
Further, when the luminance range is from 161 to 219, the range is defined by calculation using a linear expression.
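A luminance-dependent range of this kind can be sketched as below. The two sets of bounds come from the text; the rest is assumed for illustration: the shifted range is taken to apply from luminance 220, and each bound is interpolated linearly between luminance 160 and 220. The text states only that a linear expression is used for 161 to 219, so the exact expression may differ.

```python
from typing import Dict, Tuple

Range = Dict[str, Tuple[float, float]]

BASE: Range = {"r": (0.35, 0.44), "g": (0.29, 0.33)}  # luminance up to 160
HIGH: Range = {"r": (0.33, 0.42), "g": (0.30, 0.34)}  # high luminance
LOW_Y, HIGH_Y = 160, 220  # HIGH_Y = 220 is an assumption


def skin_range(luminance: float) -> Range:
    """Return the red and green ratio bounds for the given luminance."""
    if luminance <= LOW_Y:
        return dict(BASE)
    if luminance >= HIGH_Y:
        return dict(HIGH)
    t = (luminance - LOW_Y) / (HIGH_Y - LOW_Y)  # 0..1 across the transition
    return {
        key: (BASE[key][0] + (HIGH[key][0] - BASE[key][0]) * t,
              BASE[key][1] + (HIGH[key][1] - BASE[key][1]) * t)
        for key in BASE
    }


def is_skin(r_ratio: float, g_ratio: float, luminance: float) -> bool:
    """Test a (red, green) chromaticity ratio against the range for its luminance."""
    rng = skin_range(luminance)
    return (rng["r"][0] <= r_ratio <= rng["r"][1]
            and rng["g"][0] <= g_ratio <= rng["g"][1])


if __name__ == "__main__":
    # The same whitish chromaticity passes at high luminance only.
    print(is_skin(0.34, 0.335, 240), is_skin(0.34, 0.335, 100))
```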
[0083]
FIG. 41 shows the detection result for FIG. 38 using this method.
[0084]
In this embodiment, the width of the suitable range is kept the same at high luminance. However, as the chromaticity ratio approaches white, many things in nature other than human skin fall within it. Therefore, to prevent erroneous detection, the suitable range may be narrowed in the high luminance region.
[0085]
(Example of candidate detection by continuous range of skin color blocks)
An 8 * 8 pixel block of human skin extracted in step S602 is not by itself a group of appropriate size. Accuracy is therefore increased further by adding, to the block detection based on chromaticity, the constraint that matching blocks be detected consecutively, adjacent to one another in the vertical and horizontal directions.
[0086]
The continuous range below which a detection is treated as noise is set on the principle that even human skin color may be rejected if it does not provide enough data for a face to be recognizable in a print.
[0087]
This portion is represented by the processing after step S603 in FIG. That is, in step S603, chromaticity detection is performed block by block along the longitudinal direction of the image (the horizontal direction for landscape images such as FIGS. 10 and 14), and candidates are formed in descending order of the number of consecutively detected blocks.
[0088]
Next, in step S604, the continuous amount is compared against the continuous amount suitable for a target image. In this example, this is 2 blocks for VGA and 4 blocks for UXGA. If the comparison finds a qualifying run of blocks, the process proceeds to step S605, which searches for data in the image that satisfies the block continuity setting in the short direction. In this example, this is also 2 blocks for VGA and 4 blocks for UXGA.
[0089]
Next, in step S606, it is determined whether there is detection data. If there is, the process proceeds to step S608, where the data remaining at this point are given candidate numbers in descending order of the continuous block amount in the longitudinal direction.
[0090]
If the result of determination in step S606 is that there is no detected data, processing proceeds to step S607, where “no target area” is set, and the processing is terminated.
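The longitudinal run detection and candidate numbering of steps S603, S604 and S608 can be sketched as follows. This is a simplified illustration under stated assumptions: the input is a hypothetical grid of per-block chromaticity results with rows along the longitudinal direction, the short-direction check of step S605 is omitted, and the limit of 8 candidates follows the value of n used in the present embodiment.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int]  # (row, start column, length in blocks)

MIN_RUN = {"VGA": 2, "UXGA": 4}  # minimum continuous blocks from the text


def find_runs(grid: Sequence[Sequence[bool]], min_len: int) -> List[Run]:
    """Collect runs of consecutive matching blocks along each row."""
    runs: List[Run] = []
    for y, row in enumerate(grid):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_len:  # shorter runs are treated as noise
                    runs.append((y, start, x - start))
            else:
                x += 1
    return runs


def number_candidates(runs: Sequence[Run], max_candidates: int = 8) -> List[Run]:
    """Order runs longest first; list position + 1 is the candidate number."""
    return sorted(runs, key=lambda run: -run[2])[:max_candidates]


if __name__ == "__main__":
    grid = [[1, 1, 0, 1, 1, 1],
            [0, 1, 0, 0, 0, 0]]
    candidates = number_candidates(find_runs(grid, MIN_RUN["VGA"]))
    print(candidates if candidates else "no target area")
```

An empty result corresponds to the "no target area" outcome of step S607.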
[0091]
<Example of processing at the second stage>
First, the effect when the chromaticity determination is performed in the continuous block is shown in FIGS.
[0092]
FIG. 12 shows the result of detecting the portrait image of FIG. In FIG. 12, the candidates are color-coded in descending order of detection priority, that is, from the longest run of detected blocks (1 = brown, 2 = red, 3 = orange, 4 = yellow, 5 = green, 6 = blue, 7 = purple, 8 = gray); the other detected blocks are those for which only the chromaticity is within the suitable range. It can be seen that, compared with chromaticity detection at the pixel level, continuous block detection eliminates a considerable number of irrelevant candidates such as the background.
[0093]
FIG. 16 shows the result of detecting the forest of dead trees in FIG. 14; it can be seen that objects other than the target image are still detected even with continuous block detection.
[0094]
(Example of target area selection from candidate areas)
(Example of VGA size discrimination table)
Next, using a plurality of VGA (video graphics array) size (640 * 480 pixels) image samples, the frequency characteristics within the detected runs of chromaticity-matching blocks were calculated for human skin and for forests of dead trees.
[0095]
FIG. 18 is obtained as follows: the DCT data of the blocks detected as continuous blocks of human skin in an image are arranged in ascending order of frequency, summed in units of 10 from the lowest frequency, and divided by the number of continuous blocks. It thus summarizes the average frequency components per block of the continuously detected blocks.
[0096]
Accordingly, in the drawing, the horizontal axis groups the 63 AC frequency components into six groups of 10 and a final, highest-frequency group of the remaining 3. The vertical axis represents the sum of the elements of the frequency components in each group.
[0097]
Thus, the larger the value, the stronger the corresponding frequency component in the block. The data lines are color-coded by the number of continuous blocks detected. For example, “B2” represents the average of the data for which two continuous blocks were detected, and “B15” the average of the data for which 15 continuous blocks were detected. “B2” to “B15” therefore represent the average spatial frequency characteristic of human skin color portions from a plurality of images, for each continuous detection value.
[0098]
Looking at the detection results,
1) The low frequency components have large values, and from the third group from the low-frequency end onward the value is 50 or less regardless of the number of continuous blocks.
2) The larger the number of continuous blocks, the lower the frequency characteristics.
[0099]
These results show that the human skin color portion consists of relatively low frequency components, and that a large number of detected continuous blocks means the subject was photographed at a large size, so that averaging over the continuous blocks lowers the frequency components.
[0100]
Even for continuous blocks with the same chromaticity as the image of interest, reducing each run to one representative value (for example, for B6, the values of the six detected blocks are summed in groups of 10 in ascending order of frequency and then divided by the continuous value, 6, to give the average) shows that the suitable frequency characteristics differ depending on the continuous detection value.
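The representative value for a run of blocks can be computed along the following lines. This is a sketch under assumptions: each block is given as its 63 AC coefficients already arranged in ascending order of frequency (zigzag order), and the coefficients are summed as absolute values, which the text does not state explicitly.

```python
from typing import List, Sequence


def group_ac(ac63: Sequence[float]) -> List[float]:
    """Sum 63 AC coefficients in groups of 10: six groups of 10, one of 3."""
    if len(ac63) != 63:
        raise ValueError("expected 63 AC coefficients")
    # Assumption: magnitudes are summed so that signs do not cancel.
    return [sum(abs(v) for v in ac63[i:i + 10]) for i in range(0, 63, 10)]


def run_feature(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Average the 7 group sums over all blocks of one non-empty continuous run."""
    totals = [0.0] * 7
    for ac63 in blocks:
        for i, value in enumerate(group_ac(ac63)):
            totals[i] += value
    return [t / len(blocks) for t in totals]


if __name__ == "__main__":
    flat = [1] * 63
    print(group_ac(flat))                    # six groups of 10 and one of 3
    print(run_feature([flat, [-3] * 63]))    # per-block average for a run of 2
```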
[0101]
FIG. 19 shows the result of preparing a plurality of photographs of dead trees having a chromaticity range similar to the skin color chromaticity of a person and performing the detection in the same manner as FIG.
[0102]
Looking at the detection results,
1) There is considerably more data in the high frequency components than in the spatial frequency characteristics of human skin.
2) The lowest frequency group does not differ significantly from the result for human skin.
[0103]
These results show that detected objects having the same chromaticity can be distinguished by their frequency characteristics, by detecting the frequency components within the continuous blocks.
[0104]
FIG. 4, which is used in the present embodiment, represents the spatial frequency characteristics of human skin, the image of interest. Its upper part gives the suitable range of frequency characteristics for a VGA (640 * 480) image.
[0105]
The continuous block values are divided into three classes, 2 to 8 (up to L8), 9 to 20 (L9 to L20), and 21 or more (L21 and above), and a suitable frequency range is set for each class. The suitable frequency range uses the frequency characteristics of the seven groups of 10 units described above. This choice balances simplicity of processing against detection accuracy, and the embodiment is not bound to it.
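A lookup against such a table might be organized as in the sketch below. The three classes follow the text, but the table values of FIG. 4 are not reproduced here, so the table is passed in as a parameter; the class keys and function names are hypothetical, and the values in the usage example are placeholders, not the patent's data.

```python
from typing import Dict, Optional, Sequence, Tuple

Table = Dict[str, Sequence[Tuple[float, float]]]  # class -> 7 (low, high) ranges


def run_class(run_length: int) -> Optional[str]:
    """Map a continuous block count to one of the three classes in the text."""
    if run_length < 2:
        return None
    if run_length <= 8:
        return "2-8"
    if run_length <= 20:
        return "9-20"
    return "21+"


def matches_table(feature: Sequence[float], run_length: int, table: Table) -> bool:
    """True if every one of the 7 group values lies inside the class's range."""
    key = run_class(run_length)
    if key is None or key not in table:
        return False
    ranges = table[key]
    if len(feature) != len(ranges):
        return False
    return all(low <= v <= high for v, (low, high) in zip(feature, ranges))


if __name__ == "__main__":
    placeholder = {"2-8": [(0, 300)] * 7}  # placeholder ranges for illustration only
    print(matches_table([100] * 7, 5, placeholder))
```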
[0106]
(Selection example of discrimination table of VGA size / UXGA size)
Next, a UXGA (1600 * 1200) image, equivalent to 2 million pixels and widely used in digital cameras, is compared with a VGA image taken under the same shooting conditions.
FIG. 25 shows the result of detecting UXGA-size images of the same scenes as the data used in FIG. 18, summarized as in FIG. 18 by the frequency characteristic amount and the average data amount in each range.
Looking at the differences in detection characteristics from the VGA image,
1) The range of continuous block counts detected is wider. Specifically, VGA image detection yields runs of 2 to 15 continuous blocks, whereas UXGA image detection yields runs of 4 to 40 continuous blocks.
2) UXGA has lower frequency characteristics within a block. For example, for the block average of AC components 1 to 10, the data amount is distributed over the range of 300 to 100 in the VGA image but over the range of 200 to 30 in the UXGA image.
In general, what can be an image of interest within a single image is considered to occupy a share of the whole image that falls within a specific range. That is, when the target image to be detected occupies only a very small part of the entire image, applying image correction for its sake cannot be called preferable in view of the balance with the other areas. As an example based on this idea, detection can be said to have little meaning when the target occupies about 1/100 or less of the image.
[0107]
For example, consider an image of interest that occupies only 1/100 of the entire image in the longitudinal direction. In a typical print, even if the optimum correction is applied to that image of interest, the corrected image of interest occupies almost none of the output; correcting the entire image is then considered more effective than correcting a specific image of interest, and such an image is considered to fall outside the definition of an image of interest.
In the present embodiment as well, there is a suitable range for the image of interest for each image size, and anything below or above this range is excluded from the candidates for the target image to be corrected.
Therefore, in this example, 1/100 of the longitudinal direction of a UXGA image is 1600 divided by 100, that is, 16 pixels or 2 blocks (8 * 8); a run of that length is considered to have little significance and is removed from the candidates even if its chromaticity and frequency components match. In the UXGA image, the detection continuous range is set to 4 to 62 blocks.
By the same reasoning, 1/100 of a VGA image is 6.4 pixels, which is less than 1 block. In the VGA image, the detection continuous range is set to 2 to 25 blocks. The difference arises because the share of the whole image occupied by one block (8 * 8) depends on the image size.
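The arithmetic behind these figures, and a range check using the limits stated above, can be written out as follows. The function and constant names are illustrative, and the sketch does not attempt to reproduce formula (5).

```python
BLOCK = 8  # block edge length in pixels

# Detection continuous ranges stated in the text, in blocks (min, max).
DETECTION_RANGE = {"VGA": (2, 25), "UXGA": (4, 62)}


def one_percent_in_blocks(long_side_px: int) -> float:
    """Length of 1/100 of the longitudinal side, expressed in 8-pixel blocks."""
    return long_side_px / 100 / BLOCK


def in_detection_range(run_length: int, size: str) -> bool:
    """True if a run of blocks is within the range set for the image size."""
    low, high = DETECTION_RANGE[size]
    return low <= run_length <= high


if __name__ == "__main__":
    print(one_percent_in_blocks(1600))  # UXGA: 16 pixels, i.e. 2 blocks
    print(one_percent_in_blocks(640))   # VGA: 6.4 pixels, under 1 block
    print(in_detection_range(2, "UXGA"), in_detection_range(2, "VGA"))
```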
If the image of interest is taken to occupy a certain range of ratios of the whole image, the significance of an 8 * 8 pixel block in terms of spatial frequency varies with the image size. For this reason, even for the same photographed scene, the number of detected blocks, and with it the frequency characteristics, differ depending on the image size.
In the present embodiment, the detection continuous range is set for each image size as described above, but it may instead be given by a mathematical expression. For example, the minimum continuous number can be set as in the following formula (5).
[0108]
[Equation 3]

Next, FIG. 26 is described. FIG. 26 is for photographs of a forest of dead trees having a chromaticity range similar to human skin color chromaticity. FIG. 19 was obtained by processing the data as VGA images, whereas here the data are taken from UXGA images.
The comparison with FIG. 19 shows the same tendency as the comparison between FIG. 18 and FIG. 25 described above: the high frequency components are considerably reduced in the groups of AC components 20 and above. However, since the distribution is still very different from the data for human skin, the two can be separated by setting a suitable range for each frequency band.
What is set for this purpose is the UXGA image determination table of FIG. The configuration is the same as that of the VGA image determination table of FIG. 4; the only difference is in the spatial frequency characteristics of the average block, which results from the difference in image size.
[0109]
(Example of sharing a discrimination table in VGA / UXGA)
FIG. 43 is an image sample of a person photographed in the UXGA (1600 * 1200) size. Further, FIG. 45 shows an image sample obtained by photographing the face of the same person in the VGA (640 * 480) size.
[0110]
When human skin area detection in the primary extraction is performed on these two sample images according to the definition of the previous embodiment, the detection area results are as shown in FIGS.
[0111]
Focusing on the person's face, the number of detected blocks in the detection area is 719 in the UXGA image (FIG. 44) and 639 in the VGA image (FIG. 46), which is almost the same. The feature values based on the average AC component values of the DCT are also almost the same, as shown in the table below.
[0112]
[Table 1]

[0113]
That is, the feature quantity based on the average AC component value of the DCT in the human skin detection area depends on the number of pixels (8 * 8 blocks) constituting the detected area rather than on the input image size.
[0114]
Based on this idea, the relationship between the number of 8 * 8 blocks detected and the average AC component value of the DCT was compiled over a plurality of UXGA and VGA images, as shown in FIG. 36 (UXGA) and FIG. 37 (VGA).
[0115]
The horizontal axes in FIGS. 36 and 37 group the average values of the DCT AC components in units of 10, starting from the low spatial frequency region. The vertical axis represents the DCT code amount (the sum over each group of 10; the 7th group is the sum of 3 values).
Even for the same scene, the number of pixels differs while the share of the whole image occupied by the human skin region is the same, so the number of 8 * 8 blocks detected differs. The ranges of detected block counts therefore differ between FIG. 36 and FIG. 37, but for the range they have in common, 100 to 199, it can be confirmed that the characteristics almost coincide.
[0116]
Based on the above result, FIG. 47 defines the suitable average feature amount of the DCT AC components in the detection area as a function of the number of 8 * 8 blocks, which serves as the detected image size.
[0117]
In the previous embodiment, it was necessary to have a feature amount determination table depending on the image size, but by using this method, the determination table can be simplified.
[0118]
A flowchart of primary extraction using this embodiment is shown in FIG.
[0119]
In step S5901, the block determination of the suitable chromaticity ratio defined in FIG. 48 at the first stage is performed.
[0120]
In step S5902, the adjacent state of the blocks matched in the above step is detected and grouped.
[0121]
In step S5903, candidate numbers are issued in descending order of the number of constituent blocks of the group among the grouped candidates.
[0122]
In step S5904, determination based on the DCT AC component feature amount is performed in the order of candidate numbers.
[0123]
In step S5905, the necessary items are set so that the final detection result that passed the determination can be handed to image correction.
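The grouping and numbering of steps S5902 and S5903 can be sketched as a connected-component search over the grid of chromaticity-matched blocks. Two points are assumptions: adjacency is taken to mean the four vertical and horizontal neighbors, and the breadth-first search is simply one convenient way to collect a group, not necessarily the method used in the embodiment.

```python
from collections import deque
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) of a block


def group_blocks(grid: Sequence[Sequence[bool]]) -> List[List[Cell]]:
    """Group adjacent matched blocks; return groups largest first."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    groups: List[List[Cell]] = []
    for y in range(height):
        for x in range(width):
            if not grid[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            members: List[Cell] = []
            while queue:
                cy, cx = queue.popleft()
                members.append((cy, cx))
                # Assumption: only vertical and horizontal neighbors are adjacent.
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width \
                            and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            groups.append(members)
    # Candidate number = position in this list + 1 (largest group first).
    groups.sort(key=len, reverse=True)
    return groups


if __name__ == "__main__":
    grid = [[1, 1, 0],
            [0, 1, 0],
            [0, 0, 1]]
    for number, group in enumerate(group_blocks(grid), start=1):
        print(number, len(group))
```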
[0124]
(Example of target area selection procedure)
Returning to the description of FIG. As described above, candidate numbers 1 to n (n = 8 in the present embodiment) are assigned to target image candidates in descending order of the continuous amount detected in the longitudinal direction (step S608). No candidate numbers are assigned to detections beyond the n-th.
[0125]
Next, the process proceeds to step S609, and steps S609 to S612 are executed for candidates 1 to n to check in turn whether each one fits the spatial frequency characteristic suitable-range determination table for the number of continuous blocks shown in FIG. If no candidate fits, it is determined that there is no image of interest.
[0126]
For these candidates 1 to n, when the image size is VGA (640 * 480), each is compared in turn against the ranges of the spatial frequency characteristic suitable-range determination table for the number of continuous blocks shown in FIG. Starting from the first continuous detection block, the feature value of the frequency characteristic is compared to determine whether it is within the suitable range.
[0127]
At this time, as described above, in the case of an image having a different input image size, for example, a UXGA (1600 * 1200) image, it is preferable to perform the comparison determination using the UXGA table of FIG.
In this embodiment, the frequency characteristics are compared against a suitable frequency characteristic determination table set for each image size or for each range of image sizes (for example, a common table for a specific range such as VGA to XGA or SXGA to QXGA), but a determination criterion expressed as a mathematical formula may be prepared instead.
For example, such a formula can be created from the VGA and UXGA tables, which have already been optimized, by relating the image size between these two points to the change in the frequency component values and approximating the relation with a linear expression.
(Example of determination of attention image and correction value intensity)
As a result of the determination based on the spatial frequency, if there is no suitable candidate, it is determined that there is no target image (not shown in FIG. 6). If there are suitable candidates, candidate groups are formed in step S613, and one of them is taken as the target image to determine the correction strength.
FIG. 22 shows the flowchart.
[0128]
In the first step S2201, the number of candidates is confirmed (1 to m).
[0129]
Next, proceeding to step S2202, candidate groups are formed. Chromaticity-matching blocks adjacent to a candidate are gathered with it into a candidate group. If a candidate group contains a plurality of candidates, the group takes the smallest of their candidate numbers.
[0130]
Next, proceeding to step S2203, it is determined whether there are a plurality of candidate groups. If there is only one candidate group, that group is taken as the target image in step S2205 and its points are calculated as described below.
[0131]
On the other hand, if there are a plurality of candidate groups, the process proceeds to step S2204. To decide which group should carry more weight as the target image to be corrected, the likelihood of each candidate group is expressed in points and compared, and the candidate group with the higher points is taken as the final target image. If the points are equal, the group with the smaller candidate group number is taken as the final target image.
[0132]
The points are assigned as follows: when there are “m” candidates, candidate 1 receives “m” points, candidate 2 receives “m−1” points, and so on, down to candidate m, which receives “1” point.
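The point scheme can be illustrated with a short sketch. The per-candidate points follow the text; taking a group's points as the sum of the points of the candidates it contains is an assumption, since the text does not spell out how a group's total is formed. Groups are assumed to be listed in group-number order, so the first maximum wins a tie.

```python
from typing import Dict, List, Sequence, Tuple


def candidate_points(m: int) -> Dict[int, int]:
    """Candidate 1 gets m points, candidate 2 gets m - 1, ..., candidate m gets 1."""
    return {number: m - number + 1 for number in range(1, m + 1)}


def pick_group(groups: Sequence[Sequence[int]], m: int) -> Tuple[int, int]:
    """Return (index of the winning group, its points).

    `groups` lists the candidate numbers in each group, in group-number order.
    """
    points = candidate_points(m)
    # Assumption: a group's points are the sum over its candidates.
    totals: List[int] = [sum(points[c] for c in group) for group in groups]
    # max() returns the first maximum, so ties go to the smaller group number.
    best = max(range(len(groups)), key=lambda i: totals[i])
    return best, totals[best]


if __name__ == "__main__":
    # Four candidates in two groups: {1} versus {2, 3}.
    print(pick_group([[1], [2, 3]], m=4))
```

The returned point total could then be compared against thresholds to choose a correction strength.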
[0133]
An example of determining the superiority between candidate groups in this way is shown in FIG. Two candidate groups are detected; since the points of the right group exceed those of the left group, the right group becomes the final candidate.
[0134]
Further, since the absolute number of points represents the reliability of the candidate group as the target image, the correction strength for the target image is determined from these points. To determine the correction strength, thresholds on the points are provided, and the strength is specified according to whether the points are above or below each threshold.
[0135]
However, instead of detecting the target image using such points, a lighter process may simply take as the target image the group containing the candidate with the longest detection value, or that candidate itself. Detection reliability then differs slightly from this embodiment, but this method may be better suited to devices with low processing capability.
<Example of processing results of this embodiment>
The results for FIG. 10 and FIG. 14 are shown in FIG. 13 and FIG.
[0136]
In FIG. 13, the facial skin of the person, which is the image of interest, is detected. In FIG. 17, none of the candidates matches the frequency characteristics, and the candidate portions are shown in black. This represents a state in which no target image has been detected, and indicates that image correction weighted toward a target image is not performed.
[0137]
In this way, the target image can be detected. Because ordinary image correction is based on the balance of the entire image, the image quality of the part one actually cares about may be degraded, for example by backlight. By detecting the image of interest according to this embodiment and performing, on the basis of its data, exposure correction to optimize brightness and color balance and saturation correction to obtain a preferable skin color, a higher quality image can be obtained.
[0138]
FIG. 24 shows an example of the result of performing the general image correction and the result of performing the image correction using the attention image detection of the present embodiment. As shown in FIG. 24, when the image correction is performed using the attention image detection of the present embodiment, the attention image such as a person can be printed better.
[0139]
<First Improvement Example of First Image Extraction Processing Procedure>
Next, the effect of the quantization table in the decoding unit on image characteristics will be described.
[0140]
FIG. 28 to FIG. 30 show 13 quantization tables that a typical image application uses to determine the image compression ratio when creating a JPEG file. In FIGS. 28 to 30, table “00” gives the highest compression ratio, and table “12” gives the highest stored image quality and the lowest compression ratio.
[0141]
The table is used to further compress the data after DCT of the 8 * 8 image, as described in (3) to (4) in FIG. 3: each of the values corresponding to the 64 spatial frequencies in the image is quantized by the table value at the same position.
[0142]
In the case of table “00”, when (3) in FIG. 3 is quantized, for example, the value “224” at the upper left of the 8 * 8 block is quantized by the value “32” at the same upper-left position of table “00” and becomes “7”. At the lower right of the 8 * 8 block, which is the highest frequency component, “−1” is quantized by “12” and becomes “0”.
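The two worked values can be checked with a small sketch of per-coefficient quantization: divide the DCT coefficient by the table entry at the same position and round to an integer. The rounding mode used here (half away from zero) is an assumption; it does not affect these two examples.

```python
def quantize(coefficient: int, step: int) -> int:
    """Divide a DCT coefficient by its table entry and round to an integer."""
    # Assumption: round half away from zero.
    magnitude = int(abs(coefficient) / step + 0.5)
    return magnitude if coefficient >= 0 else -magnitude


if __name__ == "__main__":
    print(quantize(224, 32))  # 224 / 32 = 7
    print(quantize(-1, 12))   # -1 / 12 rounds to 0, so the component is lost
```

The second case shows how a large table entry removes a small high-frequency component entirely.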
[0143]
FIG. 31 shows the characteristics of the tables “00” to “12” in FIGS. 28, 29, and 30 and the quantization table used in the storage unit of a commercially available digital still camera.
[0144]
The horizontal axis groups the AC entries of the 64-entry quantization table in units of 10, and the vertical axis represents the average value of each group of 10. This makes it possible to confirm which spatial frequency components are quantized more heavily.
[0145]
In tables “00” to “04”, the quantization of the low frequency components is heavy. In the commercially available digital still camera, the quantization of the low frequency components is light, and the values are below “15” even in the high frequency region. The comparable quantization in the application is that of table “10” or above, which is classed as low-compression quantization.
[0146]
The portrait image of FIG. 10 and the image of FIG. 14, a dead forest matching human skin chromaticity, were each quantized using the above tables in turn, and the image of interest was detected in the resulting images. The results are shown in FIGS. 32 and 33.
[0147]
In the case of FIG. 32, with table “00” the heavy quantization of the low frequency components causes the result to fall outside the human characteristics in the determination table (FIG. 4). With table “02” the person is detected, but with a low detection score. Stable detection is possible with table “06” or higher.
[0148]
In the case of FIG. 33, with table “00” a detected value that would normally fall outside the human skin determination table (FIG. 4) in the high frequency range is, because of quantization error, wrongly judged as a detection. In this case, stable detection is possible with table “08” or higher.
[0149]
Since the accuracy of the determination thus varies with the values of the quantization table, the quantization table is itself checked. That is, because feature extraction uses the DCT AC components of the compressed image as the feature amount, an excessively large quantization table value destroys the AC components in the image, and the image may be misjudged at detection time. From actual data, the sum of the quantization table values that still allows detection without problems, for images from ordinary digital cameras and images saved as high-quality JPEG in Adobe PhotoShop, is estimated to be one that does not exceed “630”.
[0150]
In the present embodiment, in order to simplify the determination, each item of the quantization table is added, and it is determined that the corresponding image can be used for the determination only when the total is “630” or less.
[0151]
Besides this, various other quantization table determination methods are conceivable depending on the spatial frequency characteristics of the target image to be detected, such as a method that looks at the values for the low frequency components, or one that limits the sum over the 30 lowest-frequency entries to “150”; any method that uses the characteristics of the quantization table may be adopted.
[0152]
When the processing of this example is executed before the second stage, the processing shown in the flowchart of FIG. 34 is applied to the acquired quantization table, and the AC component characteristic determination table is set.
[0153]
The processing is as follows. First, all values in the quantization table are added in step S3401. This sum represents the degree of quantization. In step S3402, it is determined whether the sum is equal to or greater than a predetermined value. For example, if the sum is equal to or greater than 630, the spatial frequency characteristic of the target image is considered to have changed, so the processing is stopped and exits. If it is less than 630, it is determined that there is no effect on the spatial frequency characteristics of the image of interest, the AC component characteristic determination table is selected (depending on the input image size) in step S3403, and the process proceeds to the selection process based on the spatial frequency.
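The check in steps S3401 and S3402 can be sketched as below. This is an illustrative sketch, not the patented implementation: the function name is hypothetical, and the table is assumed to be a flat sequence of 64 integers. The description states the boundary both as "630 or less" and as "less than 630"; the sketch follows the flowchart wording (stop when the sum is 630 or more).

```python
def quantization_table_usable(qtable, limit=630):
    """Return True when AC-component-based detection may proceed.

    qtable: flat sequence of the 64 quantization values (assumed layout).
    limit:  the value "630" quoted in the description.
    """
    if len(qtable) != 64:
        raise ValueError("expected 64 quantization values")
    total = sum(qtable)   # step S3401: add all table values
    return total < limit  # step S3402: stop when the sum is 630 or more
```

For example, a table of all ones (sum 64) passes, whereas a table of all tens (sum 640) causes the detection to be skipped.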
[0154]
Further, the processing of this example may be executed before the first stage. In this case, the processing of FIG. 6 is executed in step S3403.
[0155]
<Improvement Example 2 of First Image Extraction Processing Procedure>
In the first image extraction processing procedure described above, in the first stage, a run of flesh-color blocks that continues in the longitudinal direction and spans a predetermined number of blocks in the short direction (2 blocks for VGA, 4 blocks for UXGA) was taken as a candidate. In the second stage, selection by spatial frequency and grouping of candidates adjacent in the short direction were performed.
[0156]
However, in the first stage, candidate groups adjacent in the short direction may instead be created from the extracted candidates and renumbered based on, for example, the above-mentioned score, and in the second stage the image of interest may be extracted by sorting each candidate group by spatial frequency.
[0157]
This process simplifies the second stage process and makes it possible to more stably execute the selection based on the spatial frequency.
[0158]
In the present embodiment, a method of detecting a target image for optimal image processing for printing is shown, but it goes without saying that it can also be used for display.
[0159]
In this embodiment, to examine the frequency component characteristics of the detected image, the 63 frequency components are summed in units of 10 and grouped into 7 groups to determine the image characteristics. Needless to say, all 63 frequencies may be used as they are.
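The grouping described here can be sketched as follows. The function name is hypothetical, the 63 AC components are assumed to be supplied in order from the lowest frequency, and summing absolute values is an assumption about how the components are accumulated.

```python
def group_ac_components(ac, group_size=10):
    """Collapse 63 AC components into 7 sums (six groups of 10, one of 3)."""
    if len(ac) != 63:
        raise ValueError("expected 63 AC components")
    # Assumption: magnitudes are summed so that signs do not cancel out.
    return [sum(abs(v) for v in ac[i:i + group_size])
            for i in range(0, len(ac), group_size)]
```

With every component equal to 1, the result is six groups of 10 followed by a final group of 3, which matches the note that the highest-frequency group is the sum of three components.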
[0160]
Furthermore, although the short direction was examined after detecting the run length in the longitudinal direction of the image, this order may be reversed. Besides the method of detecting blocks as a group in a single row, there are numerous detection methods that combine chromaticity and frequency characteristics, such as checking the spatial frequency characteristics of block groups that are adjacent in all directions within the group detected by chromaticity. It goes without saying that these are included in the invention.
[0161]
In this embodiment, as shown in FIG. 4 and FIG. 27, the continuous detection values are divided into three groups and compared with the appropriate range of frequency characteristics to determine whether the frequency characteristics pass or fail. This division serves to simplify the embodiment; an appropriate range may instead be set for each continuous value. Since continuous values are correlated, a method based on a theoretical formula may be used instead of a table. Moreover, although the values of the 7 groups were used for the frequency characteristics, the determination may be performed on all 63 frequencies, or by focusing on a specific frequency.
[0162]
In the present embodiment, the target image to be detected is described as a human skin region. However, what can be detected using the frequency components, or the frequency components and chromaticity, is not limited to human skin color; the sky, the sea, the green of trees, and the like can also be detected.
[0163]
In the present embodiment, the frequency components of each 8 * 8 block are collected in units of 10 from the lowest frequency, and the characteristic is represented by the sum of each group of 10 (the highest-frequency group is the sum of 3). In a JPEG file, however, the frequency characteristic is represented by one DC component and 63 AC components, so the characteristic need not be viewed as aggregates of ten.
[0164]
Further, it may be judged from 63 individual characteristics or may be further grouped. Further, the characteristics may be derived by using only a specific frequency component. As described above, there are any number of methods for using the AC component to derive the characteristics using the frequency characteristics.
[0165]
Furthermore, in this embodiment, candidates are extracted from the continuity of chromaticity-matching blocks, in order to detect the target image in the vertical and horizontal directions based on the concept of runs of 8 * 8 blocks. Needless to say, the method of determining the block aggregate is not limited to this method.
[0166]
In this embodiment, for continuously detected chromaticity blocks, the characteristic is taken from the detected run with its end blocks removed. However, there are other methods and combinations that separate block aggregates by chromaticity and frequency components, such as setting the boundary of the chromaticity blocks according to the frequency components (see FIG. 21), or excluding in advance, before the chromaticity search, any block whose frequency characteristic exceeds a specific value. These are all included in the scope of the present patent.
[0167]
FIG. 21 will now be described. The left side of FIG. 21 is the original image. The right side is the image obtained by judging, for each 8 * 8 pixel block, which is the compression unit of this JPEG file image, whether the total value of the high-frequency components among its frequency components exceeds a threshold value. The bright portions are regions with many high-frequency components, and the dark portions are regions with few high-frequency components. It is also possible to detect an image of interest by chromaticity determination that uses these regions as boundaries.
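The per-block threshold test that produces the right-hand image of FIG. 21 can be sketched as below. The function name, the index at which the "high-frequency" range starts, and the threshold value are all illustrative assumptions; the description does not give these numbers.

```python
def high_frequency_map(blocks, start=30, threshold=50):
    """Mark each 8*8 block as bright (True) or dark (False).

    blocks:    2-D list; each element is the list of 63 AC components of one
               block, ordered from the lowest frequency (assumed layout).
    start:     first AC index treated as high frequency (assumption).
    threshold: total above which a block counts as high frequency (assumption).
    """
    return [[sum(abs(v) for v in ac[start:]) > threshold for ac in row]
            for row in blocks]
```

A block with no energy above index 30 maps to False (dark), and the resulting boolean map could then serve as the boundary for a subsequent chromaticity search.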
[0168]
Although this embodiment discloses a method using a "JPEG file" as the compressed image file, the same concept applies to other files that use conversion to frequency components, such as a "JPEG2000 file". Needless to say, the target image can be detected in such files by similarly simple processing.
[0169]
In the above-described embodiment, the image of interest is detected using arrangement information and the like, centered on frequency components and chromaticity. The aim is to perform image correction centered on the image of interest. Consequently, if the data of the detected region of interest, including its brightness, is in a state where correction is not effective, for example when the region is crushed to values that are too dark, forcing gradation onto it may fill it with noise.
[0170]
To avoid this inconvenience, a luminance average is obtained from the detection result of FIG. 6 using the DC component data of each block in the detected partial area, and this average is checked against a luminance range suitable for correction. This makes it possible to perform image correction on the target image with higher accuracy.
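A minimal sketch of this luminance check is given below. It assumes the DC values are dequantized luminance DC coefficients of a baseline JPEG, for which the block mean equals DC / 8 + 128 because of the level shift and DCT scaling. The function names and the bounds of the "suitable" range are hypothetical; the description does not state them.

```python
def mean_luminance_from_dc(dc_values):
    """Average luminance of a region from its dequantized luminance DC terms."""
    if not dc_values:
        raise ValueError("region contains no blocks")
    # Baseline JPEG: DC = 8 * (block mean - 128), so mean = DC / 8 + 128.
    return sum(dc / 8.0 + 128.0 for dc in dc_values) / len(dc_values)


def suitable_for_correction(dc_values, low=40.0, high=220.0):
    """True when the region's mean luminance lies in the assumed usable range."""
    return low <= mean_luminance_from_dc(dc_values) <= high
```

A region whose blocks all have DC = 0 has a mean luminance of 128 and passes, while a region crushed to black (DC = -1024) has a mean of 0 and is rejected.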
<Configuration Example of Image Processing Device of Second Embodiment>
The image sample in FIG. 51 is a case in which a person's face is photographed as the image of interest. This image sample has 300,000 pixels, which is low-end by the standards of recent input devices, and its file size is 60 Kbytes, which means a high compression rate. For such an image, no significant improvement in image quality can be expected even if exposure correction is performed after detecting the image of interest with the configuration described above. A known effective correction for such an image is to apply an unsharp mask to remove blur and sharpen the image. As a drawback, however, applying unsharp mask correction to the entire image tends to roughen the skin area, so the correction strength must be reduced when it is applied to the whole image. Applying it only to the eye and mouth areas, where it is effective, has been difficult to automate because the areas must be designated.
[0171]
FIG. 50 is a block diagram illustrating a configuration example of the image processing apparatus according to the second embodiment. In FIG. 50, the components of the first embodiment are illustrated as black boxes. These components are basically the same as in the first embodiment. The characteristic configuration of the second embodiment is the addition of a second image extraction unit 100b and a blur correction processing unit 30.
[0172]
The second image extraction unit 100b forms the image recognition unit 100 together with the first image extraction unit 100a of the first embodiment. The second image extraction unit 100b includes a candidate area selection unit 310 that selects a candidate area based on an aspect ratio threshold value 301a of the image, and a feature part extraction unit 302 that extracts feature parts (in this example, the eyes, nose, mouth, eyebrows, and the like in the face area) from the selected candidate area based on a feature part threshold value 302a. In this example, the face area is selected according to the aspect ratio of a face.
[0173]
Based on the feature part information output from the feature part extraction unit 302, the blur correction processing unit 30 performs blur correction processing on the decoded image output from the decoding unit 10, in accordance with the value calculated by the blur correction value calculation unit 30a, prior to the color tone correction of the first embodiment.
[0174]
The hardware and software configuration is similar to that shown in FIG.
[0175]
<Example of Operation Procedure of Image Processing Apparatus of Second Embodiment>
FIG. 60 shows a schematic flowchart of the present invention using the human skin region detection function configured as described above.
[0176]
The flowchart will be described.
This flowchart shows the flow of detecting the face area of a person in the input image, detecting the eyes and mouth within the skin area of the face, and setting and executing the correction at an appropriate strength according to the number of constituent pixels and the values of the quantization filter.
In step S5601, information for determining whether secondary image extraction is needed is acquired from the number of pixels of the target image, the quantization table, and the magnification and resolution information at the time of printing. An image with a small number of pixels, as shown in FIG. 51, can be a target for secondary image extraction.
[0177]
In step S5602, attention image extraction processing according to the flow disclosed in FIG. 6 is performed. In this embodiment, a region having a feature amount of the skin region of a human face is detected. For the image in FIG. 51, an area as shown in FIG. 52 is extracted. The white portion in FIG. 52 is determined as a region having a human face skin feature, and the black portion is the other portion. In this detection, the average luminance and the like are calculated in addition to the feature amount of the skin region portion.
[0178]
In step S5603, a determination is made based on the logical sum of the detection results in steps S5601 and S5602. If secondary image extraction is not necessary, the process proceeds to the conventional processing in step S5607. If secondary image extraction is necessary, the process advances to step S5604.
[0179]
In step S5604, secondary image extraction is performed. Specifically, areas that lie inside the detected human skin color area of FIG. 52 but outside the chromaticity ratio range of the primary extraction are detected and evaluated as eye or mouth candidates. Details will be described later.
[0180]
In step S5605, it is determined whether the secondary image extraction was successful. If unsuccessful, the process proceeds to the conventional processing in step S5607. If successful, the process proceeds to step S5606. In step S5606, blur correction processing is executed. Thereafter, in step S5607, the extraction result of the image-of-interest detection is set so that it is passed to the image correction. In step S5608, image correction reflecting the extraction result is performed.
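The branching of steps S5601 to S5608 can be summarized in the following control-flow sketch. It is illustrative only: the function and parameter names are hypothetical, the two extraction steps are passed in as callables, and the two flags are combined with a logical OR because the description speaks of a "logical sum" of the results of steps S5601 and S5602.

```python
def run_flow(info_flag, skin_flag, secondary_extract, blur_correct):
    """Return the list of step labels visited for one image.

    info_flag:         result of step S5601 (secondary extraction indicated).
    skin_flag:         result of step S5602 (skin region detected).
    secondary_extract: callable returning a region, or None on failure.
    blur_correct:      callable applied to the extracted region.
    """
    steps = ["S5601", "S5602", "S5603"]
    if info_flag or skin_flag:          # "logical sum" as worded in the text
        steps.append("S5604")
        region = secondary_extract()
        steps.append("S5605")
        if region is not None:          # extraction succeeded
            steps.append("S5606")
            blur_correct(region)
    steps += ["S5607", "S5608"]         # conventional correction path
    return steps
```

When neither flag is set, the flow goes straight from S5603 to S5607, which corresponds to the conventional processing path.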
[0181]
Next, FIG. 61 shows a flowchart for explaining the portion in the secondary extraction process in more detail.
[0182]
In step S5701, the aspect ratio of the candidate image region for primary image extraction is calculated from the information from step S5602.
[0183]
In step S5702, it is determined whether the candidate image conforms to the aspect ratio definition of a person's face. If the candidate image from the primary extraction does not conform to the definition, the process proceeds to step S5709. If it conforms, the process proceeds to step S5703.
[0184]
In step S5703, areas in the candidate area that lie outside the chromaticity ratio range of the primary extraction are detected. In FIG. 52, these are the isolated black regions left inside the white region, which is the human skin region. For each such region, the number of constituent pixels (number of blocks), the average chromaticity, the average value of the AC components of the DCT, and the like are calculated.
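One way to find such isolated regions is a flood fill over the block map, sketched below. This is an assumption-laden illustration: the function name is hypothetical, the map is a 2-D list of booleans (True for skin blocks), 4-connectivity is assumed, and "isolated" is interpreted as not touching the edge of the map.

```python
from collections import deque


def enclosed_regions(mask):
    """Return non-skin regions fully surrounded by skin blocks.

    mask: 2-D list of booleans, True where a block matched the skin chromaticity.
    Each returned region is a list of (row, col) block positions.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            cells, touches_border = [], False
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:      # keep only regions enclosed by skin
                regions.append(cells)
    return regions
```

The length of each returned region gives its block count, from which the area ratio and bounding-box aspect ratio used in the later steps could be derived.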
[0185]
In the present embodiment, the eye, mouth, eyebrows, glasses, and the like can be considered as components other than the skin color of the human face.
[0186]
FIGS. 54, 56, and 58 show eye image samples.
[0187]
FIG. 54 shows an eye represented by 12 pixels vertically and 22 pixels horizontally. The image was compressed using the quantization tables for image compression shown in FIGS. 28 and 29. The image compressed with table 11 is F12, with table 08 is F9, with table 06 is F7, and with table 03 is F4.
[0188]
FIG. 56 shows an eye represented by 24 pixels vertically and 44 pixels horizontally. The image was compressed using the quantization tables for image compression shown in FIGS. 28 and 29. The image compressed with table 11 is F12, with table 08 is F9, with table 06 is F7, and with table 03 is F4.
[0189]
FIG. 58 shows an eye represented by 48 pixels vertically and 88 pixels horizontally. The image was compressed using the quantization tables for image compression shown in FIGS. 28 and 29. The image compressed with table 11 is F12, with table 08 is F9, with table 06 is F7, and with table 03 is F4.
[0190]
In step S5704, it is determined whether the chromaticity ratio of the detected area is within the preset chromaticity ratio range. A chromaticity ratio range for the mouth can be set as well. If the candidate is judged inappropriate, the process proceeds to step S5709. If it conforms, the process proceeds to step S5705.
[0191]
In step S5705, the area ratio between the area detected in step S5703 and the skin area of the human face detected in step S5701 is calculated, in order to check whether the area has an appropriate size as an eye candidate.
[0192]
In step S5706, the aspect ratio of the area detected in step S5703 is calculated, in order to check whether it has proportions appropriate for an eye candidate.
[0193]
In step S5707, it is determined from the results calculated in steps S5705 and S5706 whether the area is an eye candidate region. If the candidate is judged inappropriate, the process proceeds to step S5709. If it conforms, the process proceeds to step S5708.
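The decision of steps S5705 to S5707 can be sketched as a pair of range checks. The function name and all numeric ranges below are hypothetical placeholders; the description does not disclose the actual thresholds.

```python
def is_eye_candidate(region_w, region_h, region_blocks, face_blocks,
                     area_ratio_range=(0.01, 0.15), aspect_range=(1.2, 4.0)):
    """Judge an enclosed region by area ratio (S5705) and aspect ratio (S5706).

    region_w, region_h: bounding-box size of the region, in blocks.
    region_blocks:      number of blocks in the region.
    face_blocks:        number of blocks in the facial skin area.
    The two ranges are illustrative assumptions, not values from the text.
    """
    if region_h <= 0 or face_blocks <= 0:
        return False
    area_ratio = region_blocks / face_blocks      # step S5705
    aspect = region_w / region_h                  # step S5706
    return (area_ratio_range[0] <= area_ratio <= area_ratio_range[1]
            and aspect_range[0] <= aspect <= aspect_range[1])  # step S5707
```

Under these placeholder ranges, a region 4 blocks wide and 2 blocks high covering 8 of 200 face blocks would be accepted, while a tall, narrow region would be rejected.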
[0194]
In step S5708, the blur amount of the image is determined, the correction strength is set from the result of that determination, and the correction is executed.
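Applying an unsharp mask only inside a detected region can be sketched as follows. This is a generic unsharp mask (original plus a scaled difference from a blurred copy) using a 3 * 3 box blur, chosen for brevity; the description does not specify the blur kernel or the strength values, and the function name and parameters are hypothetical.

```python
def unsharp_region(img, box, amount=1.0):
    """Sharpen a grayscale image inside `box` only.

    img:    2-D list of luminance values in 0..255.
    box:    (top, left, bottom, right), bottom/right exclusive.
    amount: correction strength (illustrative parameter).
    """
    top, left, bottom, right = box
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]        # pixels outside the box stay untouched
    for y in range(top, bottom):
        for x in range(left, right):
            ys = range(max(0, y - 1), min(h, y + 2))
            xs = range(max(0, x - 1), min(w, x + 2))
            blur = sum(img[j][i] for j in ys for i in xs) / (len(ys) * len(xs))
            value = img[y][x] + amount * (img[y][x] - blur)
            out[y][x] = max(0, min(255, round(value)))
    return out
```

Restricting the loop to the box leaves the surrounding skin area unmodified, which is the motivation given earlier for not sharpening the whole image.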
[0195]
First, the blur amount is determined. FIGS. 55, 57, and 59 show the images obtained by applying a fixed unsharp mask process to FIGS. 54, 56, and 58, which are the image samples of the eye region described above.
[0196]
FIGS. 62, 63, and 64 show the feature amounts, namely the average values of the AC components of the DCT, for these images.
[0197]
FIG. 62 is for the eye represented by 12 pixels vertically and 22 pixels horizontally. The horizontal axis shows the AC components of the DCT, averaged over the entire target image area and collected in units of 10 from the low spatial frequency side as described above. The vertical axis represents the DCT code amount (the sum of each unit of 10; the seventh group is the sum of three). The graph shows that the difference in data amount due to the quantization filter appears in the high-frequency components of the spatial frequency, but the difference within the target area, the eye, is not very large. Because the unsharp mask processing raises the spatial frequency characteristics in the low-frequency range, it can be seen that the image has been sharpened.
[0198]
FIG. 63 is for the eye represented by 24 pixels vertically and 44 pixels horizontally, and the configuration of the graph is the same as FIG. The graph shows that the difference in data amount due to the quantization filter appears in the high-frequency components of the spatial frequency, but the difference within the target area, the eye, is not very large. Because the unsharp mask processing raises the spatial frequency characteristics in the low-frequency range, it can be seen that the image has been sharpened.
[0199]
FIG. 64 is for the eye represented by 48 pixels vertically and 88 pixels horizontally, and the configuration of the graph is the same as FIG. The graph shows that the difference in data amount due to the quantization filter appears in the high-frequency components of the spatial frequency, but the difference within the target area, the eye, is not very large. Because the unsharp mask processing raises the spatial frequency characteristics in the low-frequency range, it can be seen that the image has been sharpened.
[0200]
As for differences due to image size, the feature amount, that is, the average value of the AC components of the DCT, decreases as the number of constituent pixels increases. The distribution of the AC components, however, remains the same.
[0201]
Because the effect of the unsharp mask processing depends on the number of pixels of the eye image and on the quantization filter values, the unsharp mask correction strength is specified, as shown in FIG. 65, according to the size of the detected secondary extraction region and the quantization filter values.
[0202]
In addition, the luminance distribution of the skin color area detected by the primary extraction may be wide, for example outdoors, when direct sunlight gives the skin area of the face a large brightness range. In such a case, as shown in FIG. 66, when luminance is expressed as 0 to 255 and the skin color area detected by the primary extraction has luminance range data of 150 or more, the strength of the unsharp mask processing applied to the secondary extraction area is set high.
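The luminance-range rule can be sketched as a simple adjustment of a base strength. The value 150 comes from the description; the function name, the base strength, and the boost factor are hypothetical, since the contents of FIGS. 65 and 66 are not reproduced in the text.

```python
def unsharp_strength(base_strength, skin_luma_min, skin_luma_max,
                     wide_range=150, boost=1.5):
    """Raise the unsharp mask strength when the skin luminance range is wide.

    base_strength: strength chosen from region size and quantization values
                   (stands in for the FIG. 65 lookup, which is not given here).
    wide_range:    luminance range (on a 0..255 scale) treated as wide.
    boost:         illustrative multiplier, not a value from the description.
    """
    if skin_luma_max - skin_luma_min >= wide_range:
        return base_strength * boost
    return base_strength
```

For a skin area spanning luminance 20 to 200 (range 180), a base strength of 1.0 becomes 1.5 under these placeholder values; for a span of 100 to 180 it is left unchanged.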
[0203]
In step S5709, image correction reflecting the value of the extraction result is executed.
[0204]
FIG. 53 shows the result of performing the above processing on FIG. 51. It can be confirmed that the area is identified and the correction is appropriately applied to the blurred image.
[0205]
As described above, according to the present invention, the spatial frequency data and the quantization table are acquired in the process of decompressing the compressed image file, and the combination of the spatial frequency data and the quantized data characteristics is used to search for the target image in the image file. Information including the AC component information of each image data block can therefore be acquired without advanced calculations, and the target image in the image file can be searched for.
[0206]
Further, according to another feature of the present invention, even in an embedded device having a processing capacity lower than that of a personal computer, such as when printing directly from a digital camera, the presence or absence of the target image to be corrected and the appropriateness of its values can be detected in the compressed image file to be printed, with a processing load that is practical for a product, and image correction can be performed with emphasis on the target image as necessary.
The image recognition apparatus according to the present embodiment described above is configured by a computer CPU or MPU, RAM, ROM, and the like, and can be realized by operating a program stored in the RAM or ROM.
[0207]
Therefore, the above functions can be realized by recording the program that causes the computer to perform them on a recording medium such as a CD-ROM and having the computer read the program. As a recording medium for recording the program, a flexible disk, a hard disk, a magnetic tape, a magneto-optical disk, a nonvolatile memory card, and the like can be used in addition to the CD-ROM.
[0208]
In addition, the embodiments of the present invention include such a program not only when the functions of the above-described embodiments are realized by the computer executing the supplied program, but also when those functions are realized by the program cooperating with an OS (operating system) or other application software running on the computer, and when all or part of the processing of the supplied program is performed by a function expansion board or a function expansion unit of the computer.
[0209]
In order to use the present invention in a network environment, all or a part of the program may be executed by another computer. For example, the screen input process may be performed by a remote terminal computer, and various determinations, log recording, and the like may be performed by another center computer or the like.
[0210]
[Effects of the Invention]
According to the present invention, it is possible to optimize the chromaticity ratio determination used for extracting the target image area according to the luminance of the image, and to perform stable extraction of the target image area and reproduction of the image.
[0211]
In addition, when the human face image is blurred, information for proper correction can be acquired.
[Brief description of the drawings]
FIG. 1A is a conceptual diagram showing a flow of acquiring data necessary for decompressing a JPEG image according to an embodiment of the present invention.
FIG. 1B is a block diagram illustrating a configuration example of an image processing apparatus according to the first embodiment.
FIG. 1C is a block diagram illustrating a configuration example of hardware and software of an image processing apparatus according to an embodiment.
FIG. 2 is a conceptual diagram showing a flow of a process for converting image data of the embodiment into a JPEG format.
FIG. 3 is a diagram illustrating a process of converting an 8 * 8 block, which is a JPEG image compression unit according to the embodiment, into a JPEG format.
FIG. 4 is a diagram illustrating a discrimination table using AC component characteristics of 8 * 8 blocks that are JPEG file image compression units according to the embodiment.
FIG. 5 is a diagram illustrating a skin color RG chromaticity distribution example according to another embodiment;
FIG. 6 is a flowchart of detecting an image of interest from JPEG image decompression according to the embodiment.
FIG. 7 is a diagram illustrating a chromaticity detection method in an 8 * 8 block that is a JPEG file image compression unit according to the embodiment.
FIG. 8 is a diagram illustrating a chromaticity detection method using a DC component in an 8 * 8 block that is a JPEG file image compression unit according to the embodiment.
FIG. 9 is a diagram illustrating a detection state in an 8 * 8 block when detection is performed using 3-bit thinning in chromaticity detection according to the embodiment.
FIG. 10 is a diagram illustrating a first example of a detection JPEG image sample according to the embodiment.
FIG. 11 is a diagram illustrating an example of a BMP file obtained as a result of detecting a first image sample based on only chromaticity.
FIG. 12 is a diagram illustrating an example of a BMP file obtained as a result of arranging the first image sample on the basis of chromaticity detection in units of 8 * 8 blocks and performing continuous block detection.
FIG. 13 is a diagram illustrating an example of a BMP file obtained as a result of detecting the first image sample by the image-of-interest detection according to the embodiment, based on the arrangement from chromaticity detection in units of 8 * 8 blocks, continuous blocks, and AC components.
FIG. 14 is a diagram illustrating a second example of the detection JPEG image sample according to the embodiment.
FIG. 15 is a diagram illustrating an example of a BMP file obtained as a result of detecting a second image sample based on only chromaticity.
FIG. 16 is a diagram illustrating an example of a BMP file obtained as a result of arranging the second image sample based on chromaticity detection in units of 8 * 8 blocks and performing continuous block detection.
FIG. 17 is a diagram illustrating an example of a BMP file obtained as a result of detecting the second image sample by the image-of-interest detection according to the embodiment, based on the arrangement from chromaticity detection in units of 8 * 8 blocks, continuous blocks, and AC components.
FIG. 18 is a diagram illustrating frequency characteristics of an AC component in a continuous chromaticity detection value of human skin detection data in human skin detection according to the embodiment.
FIG. 19 is a diagram illustrating a table of frequency characteristics of AC components in continuous chromaticity detection values of dead forest detection data in human skin detection according to the embodiment.
FIG. 20 is a diagram illustrating a skin color RG chromaticity distribution according to the embodiment;
FIG. 21 is a diagram illustrating an example of a detection method for creating a boundary based on frequency characteristics.
FIG. 22 is a flowchart illustrating a candidate group determination procedure according to the embodiment;
FIG. 23 is a diagram illustrating an example of a detection result image of candidate group determination according to the embodiment.
FIG. 24 is a diagram illustrating an example of a comparison result of image correction using attention image detection according to the embodiment.
FIG. 25 is a characteristic diagram illustrating frequency characteristics of AC components in continuous chromaticity detection values of human skin detection data in a UXGA (1600 * 1200) image in human skin detection according to the present embodiment.
FIG. 26 is a diagram showing a table of frequency characteristics of AC components in continuous chromaticity detection values of dead forest detection data in a UXGA (1600 * 1200) image in human skin detection according to the present embodiment.
FIG. 27 is a diagram illustrating an example of a discrimination table for a UXGA (1600 * 1200) image using AC component characteristics of 8 * 8 blocks that are JPEG file image compression units according to the present embodiment.
FIG. 28 is a diagram illustrating an example of a quantization table used in an existing application.
FIG. 29 is a diagram illustrating an example of a quantization table used in an existing application.
FIG. 30 is a diagram illustrating an example of a quantization table used in an existing application.
FIG. 31 is a diagram illustrating a relationship between a compression ratio and a frequency characteristic in a quantization table.
FIG. 32 is a diagram illustrating an example of a result of attention image detection.
FIG. 33 is a diagram illustrating an example of a result of attention image detection.
FIG. 34 is a flowchart illustrating an example of a procedure for setting an AC component characteristic determination table from an acquired quantization table.
FIG. 35 is a diagram illustrating a distribution state by classifying human skin region chromaticity ratios in a plurality of images according to average luminance of a detection region in the present embodiment.
FIG. 36 is a diagram showing a table in which, for human skin areas existing in UXGA (1600 * 1200 pixels) size image files in this embodiment, the average values of the AC components of the DCT in the 8 * 8 blocks of JPEG compression within the human skin area are classified by the number of detected pixels (number of 8 * 8 blocks in JPEG compression).
FIG. 37 is a diagram showing a table in which, for human skin areas existing in VGA (640 * 480 pixels) size image files in this embodiment, the average values of the AC components of the DCT in the 8 * 8 blocks of JPEG compression within the human skin area are classified by the number of detected pixels (number of 8 * 8 blocks in JPEG compression).
FIG. 38 is a diagram illustrating an image sample in which whiteout occurs in the face area of a person in the present embodiment.
FIG. 39 is a diagram showing areas detected by performing human skin area detection in a fixed chromaticity ratio range on the image sample of FIG. 38 in the present embodiment.
FIG. 40 is a diagram illustrating a result of human skin area detection performed on the image sample of FIG. 38 using a definition in which the compatible chromaticity ratio range is expanded in the present embodiment.
FIG. 41 is a diagram illustrating a result of human skin area detection performed using the definition of a suitable chromaticity ratio range depending on a luminance value for the image sample in FIG. 38 in the present embodiment.
FIG. 42 is a diagram showing a luminance histogram of the entire image sample of FIG. 38 in the present embodiment.
FIG. 43 is a diagram showing an image sample of a person photographed in UXGA (1600 * 1200) size in the present embodiment.
FIG. 44 is a diagram showing areas detected by performing human skin area detection on the image sample of FIG. 43 in the present embodiment.
FIG. 45 is a diagram showing an image sample of a person photographed in VGA (640 * 480) size in the present embodiment.
FIG. 46 is a diagram showing areas detected by performing human skin area detection on the image sample in FIG. 45 in the present embodiment.
FIG. 47 is a diagram showing a determination table of spatial frequency feature amounts that are AC components of DCT according to the size of candidate regions in the present embodiment.
FIG. 48 is a diagram showing a determination chromaticity ratio range table 2 for extraction candidate regions in the present embodiment.
FIG. 49 is a flowchart illustrating a processing procedure of a DCT feature amount determination method based on the number of pixels (number of blocks) in an extraction region based on a chromaticity ratio in the present embodiment.
FIG. 50 is a block diagram illustrating a configuration example of an image processing apparatus according to a second embodiment.
FIG. 51 is a diagram showing an image sample obtained by photographing a human face with a cell phone having a CCD of 300,000 pixels in the present embodiment.
FIG. 52 is a diagram showing the area (white portion) detected by performing human skin area detection on the image sample of FIG. 51 in the present embodiment.
FIG. 53 is a diagram showing the result of selecting an eye or nose candidate in the human skin (face) region of the image sample of FIG. 51 and performing unsharp mask processing only on that region in the present embodiment.
FIG. 54 is a diagram showing an “eye” image captured at 22 * 12 pixels in the present embodiment and saved in four ways, from "F4" (high compression) to "F12" (low compression), by changing the values of the JPEG quantization table.
FIG. 55 is a diagram showing the result of performing unsharp mask image processing on each image of FIG. 54 in the present embodiment.
FIG. 56 is a diagram showing an “eye” image captured at 44 * 24 pixels in the present embodiment and saved in four ways, from "F4" (high compression) to "F12" (low compression), by changing the values of the JPEG quantization table.
FIG. 57 is a diagram showing the result of performing unsharp mask image processing on each image of FIG. 56 in the present embodiment.
FIG. 58 is a diagram showing an “eye” image captured at 88 * 48 pixels in the present embodiment and saved in four ways, from "F4" (high compression) to "F12" (low compression), by changing the values of the JPEG quantization table.
FIG. 59 is a diagram showing the result of performing unsharp mask image processing on each image of FIG. 58 in the present embodiment.
FIG. 60 is a flowchart 1 of an extended image extraction process including a correction process in the present embodiment.
FIG. 61 is a flowchart 2 of an extended image extraction process including a correction process in the present embodiment.
FIG. 62 is a graph comparing DCT characteristics (22 * 12 size) for different quantization filter values and unsharp mask processing in the present embodiment.
FIG. 63 is a graph comparing DCT characteristics (44 * 24 size) for different quantization filter values and unsharp mask processing in the present embodiment.
FIG. 64 is a graph comparing DCT characteristics (88 * 48 size) for different quantization filter values and unsharp mask processing in the present embodiment.
FIG. 65 is a diagram showing an intensity correspondence table of an unsharp mask according to an image quantization filter value and a detection area size in the present embodiment.
FIG. 66 is a diagram illustrating a relationship between a luminance distribution of a human skin color area and an unsharp mask intensity setting for an eye area that is an internal area in the present embodiment.

Claims

1. An image reproducing method for decoding compression-encoded image data and reproducing an image, the method comprising:
a block extraction step of extracting contiguous blocks within a predetermined chromaticity range from the decoded image data;
a determination step of determining, based on an average value of spatial frequencies of the contiguous blocks, whether or not the contiguous blocks are to be treated as an image area of interest;
a feature portion extraction step of extracting a feature portion from the image area of interest;
a decision step of deciding a blur correction strength for the feature portion based on the number of pixels of the extracted feature portion and a quantization filter value used for the compression encoding;
a blur correction step of correcting blur of the feature portion according to the correction strength decided in the decision step; and
a reproduction step of reproducing the image whose blur has been corrected in the correction step.

2. The image reproducing method according to claim 1, wherein in the decision step, the correction strength is decided such that blur is corrected more strongly as the number of pixels of the feature portion is smaller and the quantization filter value is larger.
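
The monotonic relationship stated in this claim can be sketched as follows. The embodiment reads the strength from a correspondence table (FIG. 65); the closed-form rule below is only a hypothetical stand-in with the same direction of dependence, and `ref_pixels`, `ref_quant`, and `max_strength` are assumed tuning parameters, not values from the patent.

```python
def blur_correction_strength(pixel_count, quant_value,
                             ref_pixels=88 * 48, ref_quant=16.0,
                             max_strength=2.0):
    """Return a strength that grows as pixel_count shrinks and quant_value grows.

    Hypothetical formula for illustration only; all defaults are assumptions.
    """
    if pixel_count <= 0 or quant_value <= 0:
        raise ValueError("pixel_count and quant_value must be positive")
    size_term = ref_pixels / (ref_pixels + pixel_count)   # decreases with size
    quant_term = quant_value / (quant_value + ref_quant)  # increases with quantization
    return max_strength * size_term * quant_term
```

With this rule, a 22 * 12 pixel feature portion receives a stronger correction than an 88 * 48 pixel one at the same quantization value, and coarser quantization increases the strength for a fixed size.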

3. The image reproducing method according to claim 1, wherein the predetermined chromaticity range is set based on a luminance value of the decoded image data.
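
One way to picture a luminance-dependent chromaticity range is a lookup keyed by luminance band. The sketch below assumes 8-bit luminance and two chromaticity ratios per block; the band boundaries, the numeric ranges, and the names `LUMA_UPPER_BOUNDS`, `CHROMA_RANGES`, `ratio_a`, and `ratio_b` are all placeholders invented for illustration and do not come from the patent.

```python
from bisect import bisect_right

# Placeholder band boundaries for 8-bit luminance (assumed, not from the patent).
LUMA_UPPER_BOUNDS = [64, 128, 192]

# One ((a_lo, a_hi), (b_lo, b_hi)) pair per band; values are placeholders.
CHROMA_RANGES = [
    ((0.45, 0.55), (0.55, 0.65)),  # luminance 0..63
    ((0.44, 0.56), (0.54, 0.66)),  # luminance 64..127
    ((0.42, 0.58), (0.52, 0.68)),  # luminance 128..191
    ((0.40, 0.60), (0.50, 0.70)),  # luminance 192..255
]


def chroma_range_for_luminance(luma):
    """Pick the chromaticity-ratio range for the band containing `luma`."""
    if not 0 <= luma <= 255:
        raise ValueError("luma must be in 0..255")
    return CHROMA_RANGES[bisect_right(LUMA_UPPER_BOUNDS, luma)]


def in_chroma_range(luma, ratio_a, ratio_b):
    """True when both chromaticity ratios fall inside the range for `luma`."""
    (a_lo, a_hi), (b_lo, b_hi) = chroma_range_for_luminance(luma)
    return a_lo <= ratio_a <= a_hi and b_lo <= ratio_b <= b_hi
```

The same pair of ratios can therefore be accepted at one luminance and rejected at another, which is the point of making the range depend on luminance.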

4. The image reproducing method according to claim 1, further comprising:
a decoding step of decoding the compression-encoded image data to generate the decoded image data; and
a step of obtaining chromaticity, spatial frequency, and luminance from the decoded image data.

5. The image reproducing method according to claim 4, wherein the compression-encoded image data is JPEG image data, and the decoded image data includes DCT coefficients and bitmap data obtained by inverse DCT.

6. The image reproducing method according to claim 1, further comprising a selection step of selecting candidates for the image area of interest based on the number of the contiguous blocks extracted in the block extraction step.
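
Selecting candidates by the number of contiguous blocks can be sketched as a connected-component pass over a grid of per-block flags. The sketch assumes `mask[r][c]` is truthy when block (r, c) lies within the chromaticity range, and it uses 4-connectivity and a single `min_blocks` threshold; both are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import deque


def contiguous_block_groups(mask):
    """Group flagged blocks into 4-connected components (lists of (row, col))."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    groups = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            group = []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            groups.append(group)
    return groups


def select_candidates(mask, min_blocks):
    """Keep only groups with at least `min_blocks` contiguous blocks."""
    return [g for g in contiguous_block_groups(mask) if len(g) >= min_blocks]
```

Isolated blocks that happen to fall in the chromaticity range are discarded this way before any further analysis is spent on them.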

7. The image reproducing method according to claim 1, wherein the blur correction is performed by unsharp mask processing.
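
Unsharp masking in its basic form adds back a scaled difference between the image and a blurred copy of it. The sketch below is a generic version on a grayscale image stored as a list of rows, using a 3 * 3 box blur with edge clamping; the blur kernel, the clipping to 0..255, and the `amount` parameter are assumptions of this sketch, not the parameters used in the embodiment.

```python
def box_blur3(img):
    """3x3 box blur; coordinates outside the image are clamped to the edge."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount):
    """Return img + amount * (img - blurred), clipped to the 0..255 range."""
    blurred = box_blur3(img)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]
```

Here `amount` plays the role of the correction strength: a flat area is left unchanged, while the two sides of an edge are pushed further apart as `amount` grows.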

8. A computer program for causing a computer to execute the steps of the image reproducing method according to any one of claims 1 to 7.

9. A computer-readable storage medium storing the computer program according to claim 8.

10. An image processing apparatus for decoding compression-encoded image data and reproducing an image, the apparatus comprising:
block extraction means for extracting contiguous blocks within a predetermined chromaticity range from the decoded image data;
determination means for determining, based on an average value of spatial frequencies of the contiguous blocks, whether or not the contiguous blocks are to be treated as an image area of interest;
feature portion extraction means for extracting a feature portion from the image area of interest;
decision means for deciding a blur correction strength for the feature portion based on the number of pixels of the extracted feature portion and a quantization filter value used for the compression encoding;
blur correction means for correcting blur of the feature portion according to the correction strength decided by the decision means; and
reproduction means for reproducing the image whose blur has been corrected by the correction means.

11. The image processing apparatus according to claim 10, wherein the decision means decides the correction strength such that blur is corrected more strongly as the number of pixels of the feature portion is smaller and the quantization filter value is larger.

12. The image processing apparatus according to claim 10, further comprising:
decoding means for decoding the compression-encoded image data to generate the decoded image data; and
means for obtaining chromaticity, spatial frequency, and luminance from the decoded image data.

13. The image processing apparatus according to claim 12, wherein the compression-encoded image data is JPEG image data, and the decoded image data includes DCT coefficients and bitmap data obtained by inverse DCT.

14. The image processing apparatus according to claim 10, further comprising selection means for selecting candidates for the image area of interest based on the number of the contiguous blocks extracted by the block extraction means.

15. The image processing apparatus according to claim 10, wherein the blur correction is performed by unsharp mask processing.