JP2004023656A

JP2004023656A - Image processing device, image processing method, and program

Info

Publication number: JP2004023656A
Application number: JP2002178829A
Authority: JP
Inventors: Yasuo Fukuda; 福田　康男
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-06-19
Filing date: 2002-06-19
Publication date: 2004-01-22

Abstract

PROBLEM TO BE SOLVED: To provide an extraction technique that takes into account whether an image feature amount suits the characteristics of an image, and in particular whether it is effective for image retrieval.

SOLUTION: The device selects, from a plurality of predetermined types of image feature amounts, the type to be extracted from an image (S202), and extracts the image feature amount of the selected type (S203). The selection is based on the shooting settings in effect when the image was captured.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、画像特徴量の抽出技術に関する。
【０００２】
【従来の技術】
一般に、画像を検索する方式を大別すると、画像に対応付けた検索情報（キーワードや日付）などのデータを用いる方法と、画像から抽出される画像特徴量等を用いる方式に分類される。
【０００３】
前者の方式は、画像の検索というよりはむしろ検索情報をキーとして用いてテキストによる検索を行うものであり、その検索結果に対応する画像を得るものである。そのため、検索情報を何らかの方法で画像と別途用意する必要がある。多くの場合、この作業は人手によって行われ、検索情報を付与する画像数が多くなるほど作業量は膨大となる。
【０００４】
後者の方式は、多くの場合、コンピュータにより自動化を図ることが可能であるので、前者の方式に比べて人手による作業を削減或いは完全になくすことができるというメリットがある。この例としては、原画像の縮小画像を用いるサムネイル方式や、あるいは、ＭＰＥＧ−７（ＩＳＯ／ＩＥＣ　ＪＴＣ１／ＳＣ２９／ＷＧ１１　Ｐａｒｔ　７）といったような方式が知られている。
【０００５】
ここで、「ＣＤ　１５９３８−３　ＭＰＥＧ−７　Ｍｕｌｔｉｍｅｄｉａ　Ｃｏｎｔｅｎｔ　Ｄｅｓｃｒｉｐｔｉｏｎ　Ｉｎｔｅｒｆａｃｅ　−Ｐａｒｔ　３　Ｖｉｓｕａｌ」　（ＩＳＯ／ＩＥＣ　ＣＤ　１５９３８−３　ＩＳＯ／ＩＥＣ　ＪＴＣ１／ＳＣ２９／ＷＧ１１／Ｎ３７０３）　、あるいは、「　ＭＰＥＧ−７　Ｖｉｓｕａｌ　ｐａｒｔ　ｏｆ　ｅＸｐｅｒｉｍｅｎｔａｔｉｏｎ　Ｍｏｄｅｌ　Ｖｅｒｓｉｏｎ　８．０」　（ＩＳＯ／ＩＥＣ　ＪＴＣ１／ＳＣ２９／ＷＧ１１／Ｎ３６７３　）によれば、このＭＰＥＧ−７方式においては、画像の特徴量の種類を示す記述子（Ｄｅｓｃｒｉｐｔｏｒ）には、次の表に示すようなカテゴリと、そのカテゴリに属する記述子が規定されている。
【０００６】
【表１】

【０００７】
【発明が解決しようとする課題】
一方、個々の画像特徴量は画像のある特徴（色、規則性、対象物の形状等）に注目したものであるので、ある画像特徴量が、その画像の特徴を示すものとして適切であるか否か、とりわけ、画像検索に有効であるかどうかということは、一般に被写体に依存する。
【０００８】
従って、画像の検索のために、画像特徴量を用いる場合、どのような画像特徴量を選択するか否かが問題となる。この場合、画像の内容を逐一人間が目視によって確認し、その画像をどのような特徴量で表現するのが適切かという判断を行う方法も考えられるが、この作業は煩雑であり、特に予め数十枚〜数百枚の画像を撮影して、後でこの選択処理を行うということは非常に面倒である。
【０００９】
これに対して、例えば、上記表１に示した全ての記述子による画像特徴量を付加することも考えられるが、画像の内容によってはそのうち幾つかの記述子による画像特徴量があまり検索にとって有効とはならず、結果として冗長なデータとなってしまう、あるいは無駄な特徴量生成処理時間が増加してしまう、といったことが生じる。例えば、人物を写していない画像に対して、公知の方式などによって顔検出および顔特徴量抽出処理を行った場合には、顔ではない領域を顔として誤検出・誤認識してしまう、という状況がこれに相当する。
【００１０】
更に、冗長なデータは記憶装置にとって負荷となるばかりでなく、その特徴量を用いた検索のパフォーマンスを低下させる、という問題も生じ得る。
【００１１】
従って、本発明の目的は、画像に適した画像特徴量を自動的に選択し、抽出する技術を提供することにある。
【００１２】
【課題を解決するための手段】
本発明によれば、
予め定めた複数種類の画像特徴量の中から、画像から抽出する画像特徴量を選択する選択手段と、
前記選択手段により選択された種類の画像特徴量を抽出する抽出手段と、
を備え、
前記選択手段は、
前記画像特徴量を抽出する画像の撮影時の撮影設定に基づいて、前記画像特徴量の種類を選択することを特徴とする画像処理装置が提供される。
【００１３】
また、本発明によれば、
予め定めた複数種類の画像特徴量の中から、画像から抽出する画像特徴量を選択する選択工程と、
前記選択工程において選択された種類の画像特徴量を抽出する抽出工程と、
を備え、
前記選択工程では、
前記画像特徴量を抽出する画像の撮影時の撮影設定に基づいて、前記画像特徴量の種類を選択することを特徴とする画像処理方法が提供される。
【００１４】
また、本発明によれば、
コンピュータに、
予め定めた複数種類の画像特徴量の中から、画像から抽出する画像特徴量を選択する選択工程と、
前記選択工程において選択された種類の画像特徴量を抽出する抽出工程と、
を実行させるプログラムであって、
前記選択工程では、
前記画像特徴量を抽出する画像の撮影時の撮影設定に基づいて、前記画像特徴量の種類を選択することを特徴とするプログラムが提供される。
【００１５】
【発明の実施の形態】
以下、本発明の好適な実施形態について図面を参照して説明する。
【００１６】
＜第１実施形態＞
図１は、本発明の第１実施形態に係る画像処理装置の一構成例を示すブロック図である。なお、本実施形態の画像処理装置は、例えば、デジタルカメラ単体、画像撮影機能を備えたコンピュータ端末、或いは、デジタルカメラとパソコン等のコンピュータ端末とを無線又は有線で接続したシステム、等の態様で実現される。以下、順にハードウエア構成１、２、３と称す。
【００１７】
画像入力部１００は、静止画像データと動画像データの一方、もしくは両方を入力可能な画像入力デバイスから構成され、例えば、上記ハードウエア構成１及び２の場合、ＣＣＤセンサを含む撮影回路等、およそ画像を電子化できる回路であり、上記ハードウエア構成３の場合は、公知のデジタルカメラ装置、デジタルビデオ装置、スチル撮影可能なデジタルビデオ装置等が挙げられる。また、図示していないが、画像データの圧縮等、各種画像処理機能を持たせてもよい。
【００１８】
入力部１０１は、ユーザからの指示やデータを入力し、各種設定、例えば、撮影モード等の撮影設定や後述するテーブルの設定を行うためのデバイスから構成され、例えば、上記ハードウエア構成１の場合、ボタンやモードダイヤル等で構成することができ、上記ハードウエア構成２及び３の場合、キーボードやポインティング装置等が含まれる。なお、ポインティング装置としては、マウス、トラックボール、トラックパッド、タブレット等が挙げられる。
【００１９】
データ記憶部１０２は、画像データや特徴量データ等の記録若しくは読み出しがされるデバイスであり、例えば、ハードディスク、ＣＤ−ＲＯＭやＣＤ−Ｒ、メモリーカード、ＣＦカード、スマートメディア、ＳＤカード、メモリスティック等で構成される。尤も、これに代えて、イーサネット（登録商標）カードやモデム、赤外線、ＩＥＥＥ８０２．１１ｂやＢｌｕｅｔｏｏｔｈ等の無線通信モジュールといった通信路制御装置を設け、外部の記憶装置にアクセス可能とすることで、画像データや特徴量データ等をその外部の記憶装置に格納するようにしてもよい。
【００２０】
表示部１０３は、ＧＵＩ等の画像を表示するデバイスで、例えば、上記ハードウエア構成１の場合、ファインダー、上記ハードウエア構成２の場合、ＣＲＴや液晶ディスプレイ等から構成される。ＣＰＵ１０４は、画像処理装置全体の制御を司り、また、後述する処理を実行する。ＲＯＭ１０５とＲＡＭ１０６は、その処理に必要なプログラム、データ、作業領域などをＣＰＵ１０４に提供する記憶手段である。
【００２１】
以上が本実施形態の画像処理装置における主要なハードウエア構成であるが、これらは以下の説明上、主に必要とされる構成のみを簡単に例示しており、その他の構成を含むことができることはいうまでもない。
【００２２】
図２は、上記画像処理装置による処理の流れを示すフローチャートである。この処理は、例えば、画像入力部２０１により画像が入力されたことをトリガとして実行される。或いは、上記画像処理装置として上記ハードウエア構成１を採用した場合、例えばシャッターによる撮像処理をトリガとして実行するようにしてもよい。
【００２３】
Ｓ２０１の設定取得ステップでは、画像の撮影に際して、ユーザが設定した撮影設定を取得する。例えば、公知のデジタルカメラでは、撮影設定として、Ａｕｔｏ、ポートレート、夜景、パンフォーカス、風景、白黒、スティッチアシストといった撮影モードを選択して撮影することができる。
【００２４】
ポートレートモードは、主に人物、特に人物の顔や、あるいは何らかの対象物（花、虫）を撮影するのに適したモードで、画像の背景をぼかして前景の対象物を浮き立たせるような撮影を行うモードである。夜景モードは主に夜間撮影に適したモードであって、ストロボを発光するとともに、シャッター速度を遅くするモードである。パンフォーカスモードは、フォーカスを固定し背景も鮮明に撮影するモードである。風景モードは背景にフォーカスをあわせ、風景を撮影するのに適したモードである。Ａｕｔｏモードはデジタルカメラ装置側で設定を決定する汎用的なモードである。
【００２５】
白黒モードは撮影結果を白黒画像（グレースケール）にするモードである。また、スティッチアシストモードは、パノラマ画像を作成するために連続的に撮像するモードである。
【００２６】
次に、Ｓ２０２の特徴量選択ステップでは、Ｓ２０１で取得した撮影設定に応じて、撮影された画像から抽出する画像特徴量の種類を選択する。本実施形態の場合、例えば、図３に示すテーブルを参照して選択される。このテーブルは、各撮影設定と、各撮影設定に対応して選択される画像特徴量の種類と、の関係が記録されたテーブルであり、例えば、ＲＯＭ１０５、ＲＡＭ１０６、データ記憶部１０２にデータの形式で格納しておいてもよいし、あるいは、本処理を実現するプログラム内部にテーブルの形式で格納されていてもよい。若しくは、プログラムや装置内部の制御アルゴリズムとして格納されているのであってもよい。
【００２７】
ここで、表１に示したＭＰＥＧ−７のＶｉｓｕａｌ　ＤｅｓｃｒｉｐｔｏｒのＣｏｌｏｒ，　Ｓｈａｐｅ，　Ｔｅｘｔｕｒｅ，　Ｆａｃｅのカテゴリに属するＤｅｓｃｒｉｐｔｏｒには、概説すると以下のような特徴がある。
【００２８】
Ｃｏｌｏｒに属するＤｅｓｃｒｉｐｔｏｒは画像の色合いや色の配置を表現することができる。Ｓｈａｐｅに属するＤｅｓｃｒｉｐｔｏｒは画像中の物の形状を表現することができる。Ｔｅｘｔｕｒｅに属するＤｅｓｃｒｉｐｔｏｒは画像のテクスチャパターンやその規則性などを表現することができる。Ｆａｃｅに属するＤｅｓｃｒｉｐｔｏｒは、人物の顔を表現することができる。
【００２９】
例えば、風景を撮影する場合、撮影者の意図は主にその背景（山、空、海など）が主眼であると考えられる。また、撮影者はその意図に応じて、例えば風景モードなどを選択して撮影を行う。したがって、風景モードで撮影した画像を表現するためには、背景の色合いや規則性を表すような画像特徴量が好ましい。逆に人に特化したような特徴量は撮影者の意図としては適切でないし、そもそも画像中に人が写っていない可能性が高い。逆に人物撮影を意図して撮影する場合には、撮影者は例えばポートレートモードを選択して撮影する。したがって、このモードは人物やもしくは花などの対象物を撮影する可能性が高く、そのような場合には、形状を表すものや、あるいは人物の顔を表現するような特徴量生成が好ましい。
【００３０】
図３のテーブルは、このような事情を考慮して作成されている。図３のテーブルは、表１にあげられている画像特徴量の種類（Ｄｅｓｃｒｉｐｔｏｒ）のうち、Ｃｏｌｏｒ　Ｌａｙｏｕｔ　Ｄｅｓｃｒｉｐｔｏｒ（Ｃｏｌｏｒカテゴリ）、Ｃｏｎｔｏｕｒ　Ｓｈａｐｅ　Ｄｅｓｃｒｉｐｔｏｒ（Ｓｈａｐｅカテゴリ）、Ｅｄｇｅ　Ｈｉｓｔｏｇｒａｍ　Ｄｅｓｃｒｉｐｔｏｒ（Ｔｅｘｔｕｒｅカテゴリ）、Ｆａｃｅ　Ｄｅｓｃｒｉｐｔｏｒ（Ｆａｃｅカテゴリ）を用いた例であり、図中、「ＯＮ」はその画像特徴量が選択されることを示し、「ＯＦＦ」はその画像特徴量が選択されないことを意味している。
【００３１】
例えばカラー撮影を行うモードではＣｏｌｏｒ　Ｌａｙｏｕｔ　ＤｅｓｃｒｉｐｔｏｒをＯＮ、前景の対象物を主眼として撮影を行うモードではＣｏｎｔｏｕｒ　Ｓｈａｐｅ　ＤｅｓｃｒｉｐｔｏｒをＯＮ、主に背景などを主眼として撮影するようなモードではＥｄｇｅ　Ｈｉｓｔｏｇｒａｍ　ＤｅｓｃｒｉｐｔｏｒをＯＮ、また人物が写っている可能性が高いモードではＦａｃｅ　ＤｅｓｃｒｉｐｔｏｒをＯＮとしている。
【００３２】
なお、これらのプログラム上の処理としては、例えば、図３の各Ｄｅｓｃｒｉｐｔｏｒ（Ｃｏｌｏｒ　Ｌａｙｏｕｔ，　Ｃｏｎｔｏｕｒ　Ｓｈａｐｅ，　Ｅｄｇｅ　Ｈｉｓｔｏｇｒａｍ，　Ｆａｃｅ）にそれぞれ対応する変数ｖ１，　ｖ２，　ｖ３，　ｖ４を用意しておき、これらにＯＮ／ＯＦＦの値を設定する。例えば、Ｓ２０１で取得した撮影設定が、風景モードであったとすると、変数ｖ１，　ｖ２，　ｖ３，　ｖ４の値はそれぞれＯＮ，　ＯＦＦ，　ＯＮ，　ＯＦＦとなる。
【００３３】
なお、言うまでもないが、ＭＰＥＧ−７に採用されているＶｉｓｕａｌ　Ｄｅｓｃｒｉｐｔｏｒ以外にも色や形状を表す画像特徴量の種類は存在しており、さらにはＭＰＥＧ−７のカテゴリ以外のカテゴリに属すべきような特徴量も存在している。ここでＭＰＥＧ−７を挙げたのはあくまでも公知の画像特徴量の一例としてであって、本発明はＭＰＥＧ−７以外の特徴量も採用可能である。
【００３４】
最後に、Ｓ２０３の特徴量抽出ステップで、撮影した画像から画像特徴量を抽出する。抽出する画像特徴量の種類は、Ｓ２０２で選択されたものとなり、上述した変数ｖ１，　ｖ２，　ｖ３，　ｖ４を参照して、変数がＯＮになっている種類の画像特徴量についてその抽出を行う。風景モードが設定されていた場合、変数ｖ１，　ｖ３の値がＯＮであるので、それに対応するＣｏｌｏｒ　Ｌａｙｏｕｔ　Ｄｅｓｃｒｉｐｔｏｒ、Ｅｄｇｅ　Ｈｉｓｔｏｇｒａｍによる画像特徴量抽出を行うこととなる。抽出した画像特徴量は特徴量データとして、対応する画像と何らかの形で関連付けてデータ記憶部１０２に記録され、処理が終了する。
【００３５】
このように、本実施形態では、ユーザが設定した撮影設定に応じて画像特徴量を選択、抽出することにより、画像特徴量の種類の選択が自動化されると共に、撮影された画像の特性に合った特徴量抽出が行われる一方で、特性に合わない特徴量抽出が抑制される。従って、撮影した画像に適した画像特徴量が抽出されると共に、無条件で全ての種類の特徴量を抽出する場合に比べ、例えば特徴量データの処理時間の短縮や、あるいは特徴量データのデータ量を削減することができる。
【００３６】
更に、本実施形態により生成した特徴量データをデータベースに登録した場合、各画像の特徴をより効果的に表現している特徴量のみを登録することになる。したがって、検索にあまり有効でない特徴量データの増加を抑制することができ、検索精度や速度の向上も期待できるという利点もある。
【００３７】
なお、図３のテーブルは、あくまでも一例であって、ＯＮ、ＯＦＦと撮影モードとの対応関係は、別の対応関係であってもよい。また、テーブル中にある４つのＤｅｓｃｒｉｐｔｏｒ以外の特徴量を用いてもよい。
【００３８】
また、図３のテーブルの内容が、予め定められている固定の場合の例を示したが、例えばユーザの嗜好に応じてテーブルを修正する手段を別途設けて、ユーザの意図にあわせられるようにしてもよい。
【００３９】
また、例えば、ユーザが撮影設定を選択した時点で、表示部１０３等に、その撮影設定に対応して選択される各画像特徴量の種類を提示するとともに、ユーザの入力によって、その対応関係を適宜修正するようにしてもよい。この場合、図２のＳ２０１、Ｓ２０２の処理を、ユーザの撮影設定の選択時をトリガとして実行し、さらにユーザの指定により変数ｖ１，　ｖ２，　ｖ３，　ｖ４の値を更新し、ユーザがシャッターを切るなどの撮影動作をトリガとしてＳ３０３の処理を行うように変形すればよい。
【００４０】
＜第２実施形態＞
本実施形態では、第１実施形態と異なる部分のみ説明を行う。第１実施形態ではユーザが選択する撮影設定として撮影モードを挙げたが、例えば公知のデジタルカメラにはこの他にもユーザが撮影状況に応じて選択する撮影設定が存在する。
【００４１】
例えば、赤目緩和処理機能が存在する。赤目緩和処理機能は、フラッシュ発光時の発光が人間などの網膜に反射し、結果として目が赤く写る現象を緩和するものである。赤目緩和処理機能はこれを緩和するために、撮影時のフラッシュ発光に先んじて予備発光を行い、被写体の人間の瞳孔を絞らせて網膜による反射光を削減し、目が赤く写ることを抑制するものである。
【００４２】
赤目緩和処理は主に人間の特に顔を含む画像を撮影する場合に用いられる技術であって、逆に目、特に人間の目を含む顔が写らない画像において、赤目緩和処理は必要とされない。従って、赤目緩和処理機能が選択された場合には、人の顔が撮影される可能性が極めて高い。
【００４３】
図５は、この赤目緩和処理に関する条件を反映するように構成した処理の流れを示すフローチャートである。図中Ｓ２０１、Ｓ２０２及びＳ２０３の処理は、上記第１実施形態で説明した処理と同様の処理である。ただし、本実施形態ではＳ２０１において、撮影モードの他に赤目緩和処理機能の選択（ＯＮ／ＯＦＦ）を示す値等もあわせて取得するものとする。また、Ｓ２０２では、上述した撮影モードに基づく画像特徴量の選択を行い、Ｓ４０１では、赤目緩和処理機能の選択に基づく画像特徴量の選択を行うため、前者を第１特徴量選択ステップと称し、後者を第２特徴量選択ステップと称している。
【００４４】
Ｓ５０１の第２特徴量選択ステップでは、赤目緩和処理機能の選択に応じ、赤目緩和処理機能がＯＮであれば顔の特徴量抽出に関する変数ｖ４の値をＯＮに、逆に赤目緩和処理機能がＯＦＦであれば変数ｖ４の値をＯＦＦに更新する。これにより、撮影モードとは別に、赤目緩和処理機能の選択に応じて、画像特徴量の選択が可能となる。
【００４５】
ここで、公知のデジタルカメラの中には、この他にも、Ａｕｔｏ、ポートレート、白黒、スティッチアシストの各撮影モードにおいては、適宜マクロ撮影モードを選択することが可能である。マクロ撮影モードは、花や虫などの対象物をアップで撮影するためのモードである。したがって、ユーザがこのモードを選択して撮影した画像には何らかの対象物がアップで写されているので、例えばＳｈａｐｅのような物の形状を表す特徴量を抽出することが好ましい。
【００４６】
したがって、例えば、図４のＳ２０１においてマクロモードの選択（ＯＮ／ＯＦＦ）を表す値を取得し、Ｓ４０１において、マクロモードがＯＮであれば、例えば、Ｃｏｎｔｏｕｒ　Ｓｈａｐｅの特徴量抽出に関する変数ｖ２の値をＯＮにすればよい。このようにすることで、
例えば、ユーザが選択した撮影モードがスティッチアシストモードの場合、図３のテーブルによれば変数ｖ２の値はＯＦＦとなるので、本来は、Ｃｏｎｔｏｕｒ　Ｓｈａｐｅの特徴量抽出は行われないが、本実施形態ではユーザがマクロモードを選択していた場合にはＣｏｎｔｏｕｒ　Ｓｈａｐｅの特徴量抽出が行われることになる。本実施形態の場合、第１実施形態の効果に加え、さらに撮影状況に応じて最適な特徴量抽出を行うことが可能となる。
【００４７】
＜第３実施形態＞
本実施形態では、第１及び第２実施形態と異なる部分のみ説明を行う。第２実施形態においては、赤目緩和処理機能のＯＮ／ＯＦＦに注目して、顔の特徴量抽出を選択した。
【００４８】
しかし、例えば公知のデジタルカメラにおいては、Ａｕｔｏ、ポートレート、夜景の撮影モードを選択すると、赤目緩和処理機能のデフォルトをＯＮとし、逆にパンフォーカス、風景の撮影モードではデフォルトをＯＦＦとするものも存在する。このように、撮影モードの選択に応じて、装置側で自動的に赤目緩和処理機能のＯＮ、ＯＦＦを更新する場合があり、この場合ユーザの意図が適切に反映されない可能性がある。
【００４９】
一方、このようなデジタルカメラにおいては、ユーザの操作により、この赤目緩和処理機能のＯＮ／ＯＦＦを強制的に変更する機能も備えられている場合がある。ユーザが赤目緩和処理機能を強制的にＯＮにする場合、人間の特に目を含む顔を撮影することを意図したものであり、逆に強制的にＯＦＦにする場合は、被写体は人間の顔ではないものを撮影することを意図したものである。
【００５０】
図５は、このような場合を想定した処理の流れを示すフローチャートである。図中Ｓ２０１〜Ｓ２０３は図２に示し、第１実施形態で説明した処理と同じ処理を示している。ただし、本実施形態ではＳ２０１において、撮影モードの他に現在の赤目緩和処理機能のＯＮ／ＯＦＦを示す値もあわせて取得する。本実施形態では、この設定を保存する変数をｖ５とする。また、Ｓ２０２では、上述した撮影モードに基づく画像特徴量の選択を行い、Ｓ５０２では、赤目緩和処理機能の選択に基づく画像特徴量の選択を行うため、前者を第１特徴量選択ステップと称し、後者を第２特徴量選択ステップと称している。
【００５１】
Ｓ５０１では、現在の赤目緩和処理機能の変数ｖ５が、現在の撮影モードの赤目緩和処理機能のデフォルトと一致するかどうか判定する。例えば、上記デジタルカメラの例で言えば、撮影モードがＡｕｔｏ、ポートレート、夜景のいずれかの場合に、ｖ５の値がＯＮであればデフォルトと一致、逆にＯＦＦであればデフォルトと不一致であり、パンフォーカス、風景モードのいずれかの場合に、ｖ５の値がＯＦＦであればデフォルトと一致、逆にＯＮであればデフォルトと不一致である。ｖ５の値がデフォルトと不一致であれば、処理はＳ５０２に進み、一致しなければ処理はＳ２０３へ進む。
【００５２】
Ｓ５０２の第２特徴量選択ステップでは、顔の特徴量抽出に関する変数ｖ４の値をＯＮ／ＯＦＦ反転する。これは、ユーザが強制的に赤目緩和処理機能をＯＮに変更した場合にｖ４の値をＯＮに、逆に強制的に赤目緩和処理機能をＯＦＦに変更した場合にｖ４の値をＯＦＦにすることと等価である。
【００５３】
本実施形態の場合、第２実施形態の効果に比べ、さらにユーザの撮影状況の判断や意図に応じて顔特徴量抽出に関して最適な特徴量抽出を行うことが可能となる。
【００５４】
＜第４実施形態＞
本実施形態では、第１乃至第３実施形態と異なる部分についてのみ説明を行う。第１乃至第４実施形態においては、例えば図３のテーブルはＯＮ／ＯＦＦの２値で処理していたが、これに限られるものではなく、多値で処理することも可能である。
【００５５】
図６は、多値テーブルの例である。テーブル中の各数値１乃至５は、その特徴量の選択優先度値であって、数字が小さいほど選択される優先度が高いとした例である。数値の範囲はあくまでも一例であって、その他の範囲であっても良い。また数値の大小と優先度の対応関係は逆であっても良い。このテーブルに対して、例えば閾値処理を施してＯＮ／ＯＦＦの二値化を行えば、上記実施形態と同様に画像特徴量の選択を行える。
【００５６】
この閾値処理のための閾値は、例えば、ユーザが予め画像入力前に設定することができる。数値の入力は、例えば入力部１０１で入力することも可能である。あるいはさらに表示部１０３を用いて対話的に入力を行うのであってもよい。
【００５７】
本実施形態の場合、図６の多値テーブルで用いている値は１乃至５なので、ユーザは閾値として例えば１乃至５のいずれかの値を入力する。本実施形態の場合、ユーザが入力する閾値は数字が大きければ大きいほど選択される特徴量の種類が多く、逆に数値が小さいほど選択される特徴量が少ないことになる。このユーザが入力した閾値は例えば図２、４、５のＳ２０１で撮影設定とともに取得され、例えば変数Ｔｈに格納される。
【００５８】
第１乃至第３実施形態では、ステップＳ２０２において、テーブルを参照し、直接変数ｖ１，　ｖ２，　ｖ３，　ｖ４に読み込んでいたが、本実施形態ではテーブルの値を参照し、テーブルの値と変数Ｔｈを比較し、もしテーブルの数値がＴｈ以下であればＯＮとし、そうでない場合にはＯＦＦとする。
【００５９】
例えば、ユーザがスティッチアシストの撮影モードを選択して撮影した場合、図６の多値テーブルによればＣｏｌｏｒ　Ｌａｙｏｕｔ，　Ｃｏｎｔｏｕｒ　Ｓｈａｐｅ，　Ｅｄｇｅ　Ｈｉｓｔｏｇｒａｍ，　Ｆａｃｅの優先度値はそれぞれ１，　４，　２，　５である。ここでユーザが閾値として３を予め入力していた場合、制御変数ｖ１，　ｖ２，　ｖ３，　ｖ４の値はそれぞれ、
ｖ１：　←　ＯＮ　（１≦３）
ｖ２：　←　ＯＦＦ　（４＞３）
ｖ３：　←　ＯＮ　（２≦３）
ｖ４：　←　ＯＦＦ　（５＞３）
となりＣｏｌｏｒ　Ｌａｙｏｕｔ，　ＴｅｘｔｕｒｅのＤｅｓｃｒｉｐｔｏｒが選択される。
【００６０】
また、ユーザが入力した閾値が４であった場合、
ｖ１：　←　ＯＮ　（１≦４）
ｖ２：　←　ＯＮ　（４≦４）
ｖ３：　←　ＯＮ　（２≦４）
ｖ４：　←　ＯＦＦ　（５＞４）
となり、変数ｖ２が制御するＳｈａｐｅのＤｅｓｃｒｉｐｔｏｒも選択されるようになる。
【００６１】
逆にユーザが指定した閾値が１であった場合には、
ｖ１：　←　ＯＮ　（１≦１）
ｖ２：　←　ＯＦＦ　（４＞１）
ｖ３：　←　ＯＦＦ　（２＞１）
ｖ４：　←　ＯＦＦ　（５＞１）
となり、Ｃｏｌｏｒ　ＬａｙｏｕｔのＤｅｓｃｒｉｐｔｏｒのみが選択されるようになる。
【００６２】
また、第２及び第３実施形態では撮影モードに基づく特徴量選択を、他のモードの設定によって変更するような例を示したが、こちらも多値化することが可能である。
【００６３】
例えば、第３実施形態において、各撮影モードに基づく赤目緩和モードのデフォルトと、ユーザ指定の赤目緩和処理機能のＯＮ／ＯＦＦが異なればｖ４をＯＮとし、そうでなければＯＦＦとしたが、これは例えば、赤目緩和処理機能のデフォルトがＯＦＦでユーザ指定の赤目緩和処理機能がＯＮであった場合、多値テーブルのＦａｃｅ　Ｄｅｓｃｒｉｐｔｏｒの優先度を２とし、また、赤目緩和処理機能のデフォルトがＯＮでユーザ指定の赤目緩和処理機能がＯＦＦであった場合は優先度を５とする、というように処理してもよい。
【００６４】
この場合、
・赤目緩和処理機能のデフォルトがＯＦＦでユーザ指定の赤目緩和処理機能がＯＮであっても、ユーザ指定の閾値が１であればＦａｃｅ　Ｄｅｓｃｒｉｐｔｏｒによる特徴量抽出を行わない。
・赤目緩和処理機能のデフォルトがＯＮでユーザ指定の赤目緩和処理機能がＯＦＦであっても、ユーザ指定の閾値が５であればＦａｃｅ　Ｄｅｓｃｒｉｐｔｏｒによる特徴量抽出を行う。
というようにより細かな制御を行うことが可能である。
【００６５】
したがって、ユーザはこれから撮影する画像について、より詳細な特徴量抽出を行いたいと思う場合には、予め閾値を大きくして撮影し、逆にあまり詳細な特徴量抽出は必要でないと思う場合には、予め閾値を小さくしておいて撮影すればよい。こうすることにより撮影時の撮影者の意図を反映することができる。
【００６６】
またこの他にも、図６の多値テーブルにある優先度値と、その他のモード（例えば赤目緩和処理機能のＯＮ／ＯＦＦなど）による優先度値と、ユーザ指定の閾値とで何らかの演算を行い、特徴量を選択してもよい。
【００６７】
以上説明した通り、本実施形態では、各特徴量の選択のテーブルを多値化し、閾値によりＯＮ／ＯＦＦを定めることで、より細かな選択を行うことが可能となっている。
【００６８】
＜第５実施形態＞
本実施形態では、第１乃至第４実施形態と異なる部分にのみ説明を行う。第１乃至第４実施形態では、撮影設定として、撮影モードのような選択的なモードや、赤目緩和処理機能のモード、マクロ撮影モードのようなＯＮ／ＯＦＦとなるモードを用いた例を示したが、この他に、連続的、あるいは多段階の撮影設定についても本発明は、適用可能である。
【００６９】
例えば、公知のデジタルカメラの中には、先述した撮影モードの他に、シャッター優先モード（Ｔｖ）、絞り優先モード（Ａｖ）、マニュアルモードといったモードが設けられているものも存在する。
【００７０】
シャッター優先モードでは、予めシャッター速度をユーザが選択して固定し、適正露出になるように絞り値を自動設定するモードである。逆に絞り優先モードでは、ユーザが絞り値を選択して固定し、適正露出となるようにシャッター速度を自動設定するモードである。マニュアルモードは、シャッター速度および露出をユーザが選択して設定するモードである。
【００７１】
シャッター優先モードは、例えば高速に移動する物体を撮影する場合に選択され、さらにシャッター速度を高速に設定して撮影を行う、という場合に用いられる。絞り優先モードは、例えば、背景をぼかして前景にある対象物のみを浮き立たせて撮影を行うような場合に選択され、絞り値を小さく（絞りを開く）して被写界深度を浅くして撮影したり、あるいは背景も鮮明に撮影したい場合には、絞り値を大きく（絞りを閉じる）して被写界深度を深くして撮影を行う、という場合に用いられる。
【００７２】
本実施形態では、この絞り優先モードと絞り値設定を用いた例を説明する。例えば、絞り数値として、Ｆ２．０，　Ｆ２．２，　Ｆ２．５，　Ｆ２．８，　Ｆ３．２，　Ｆ４．０，　Ｆ４．５，　Ｆ５．０，　Ｆ５．６，　Ｆ６．３，　Ｆ７．１，　Ｆ８．０といった多段階の設定が可能なデジタルカメラの場合、例えば図７のテーブルを設け、ユーザが絞り優先モード、もしくはマニュアルモードで意図的に選択した絞り値設定を用いて特徴量抽出を選択することが可能である。なお、図７に示した優先度値はあくまでも一例であって、各優先度値は他の値であっても良い。
【００７３】
本実施形態では、絞り優先モードにおけるユーザが選択した絞り値を用いた例を説明したが、この他にもシャッター優先モードのシャッター速度を用い、シャッター速度が速い場合には高速移動している物体があるとして、Ｓｈａｐｅの特徴量を選択する、もしくはその優先度を上げるようにすることも可能である。
【００７４】
また、ズーム機能がある場合にはそのズーム倍率の情報を撮影設定として用いて、ワイド端ではユーザが注目している物体を表すのに適した特徴量、例えばＳｈａｐｅなどの特徴量を選択し、もしくは優先度を上げ、逆にテレ側では風景のような画像を表すのに適した特徴量、例えばＴｅｘｔｕｒｅなどの特徴量を選択する、もしくは優先度を上げる、といった処理も可能である。
【００７５】
このように、本実施形態の場合、より細かな特徴量抽出を行うことが可能である。
【００７６】
＜第６実施形態＞
本実施形態では、第１乃至第５実施形態と異なる部分のみ説明を行う。第１乃至第５実施形態では、図３や図６のテーブルのように、テーブル内の設定値を静的なものとしたが、これはもちろん可変としてもよい。例えば、ユーザが撮影前に入力部１０１などを操作して自分の好みになるように図３や図６のテーブルの設定値を変更できるようにしてもよい。
【００７７】
また、データ記憶部１０２から、この特徴量選択のためのテーブルの設定値を読み込みんで制御を行うことも可能である。特にデータ記憶部１０２を、メモリーカード、ＣＦカード、スマートメディア、ＳＤカード或いはメモリスティックのような着脱可能な記憶媒体で構成した場合には、予め特徴量選択のためのテーブルに関する設定を記述したファイルを作成し、これに格納しておくことでより容易に実現可能である。この場合、本実施形態の画像処理装置以外のコンピュータ端末等でファイルを作成することが可能となる。
【００７８】
設定を記述したファイルのデータ形式は、例えば図３や図６のようなテーブルを構成可能な情報を含むものや、あるいは第２、第３実施形態で示したような条件を記述可能なものであれば任意の形式であってよい。
【００７９】
このように、特徴量選択のテーブルをユーザが自由に設定し易くすることにより、ユーザ個人個人の嗜好などに合わせた特徴量抽出を行うことが可能となる。
【００８０】
＜第７実施形態＞
本実施形態では、第１乃至第６実施形態と異なる部分のみ説明を行う。第４、第５実施形態では、多値の優先度値を単に特徴量の選択に用いたが、この値を特徴量データとともに出力してもよい。図８は、この優先度値をファイルに出力した例である。
【００８１】
この例は、ユーザが風景モードを選択し、図６のテーブルに従って特徴量抽出を行った場合の優先度を出力した例である。本例の場合、ユーザ指定の閾値が５である、あるいは特に閾値処理は行わず全てのＤｅｓｃｒｉｐｔｏｒを選択するような場合の例である。
【００８２】
図８のファイルにおいて、一行目は撮影した画像を格納する画像ファイル名である。二行目はＣｏｌｏｒ　Ｌａｙｏｕｔ　Ｄｅｓｃｒｉｐｔｏｒの優先度値、三行目はＣｏｌｏｒ　Ｌａｙｏｕｔ　Ｄｅｓｃｒｉｐｔｏｒデータを格納するファイル名である。四、五行目は同様にＣｏｎｔｏｕｒ　Ｓｈａｐｅ　Ｄｅｓｃｒｉｐｔｏｒの優先度値とファイル名、六、七行目はＥｄｇｅ　Ｈｉｓｔｏｇｒａｍの優先度値とファイル名、八、九行目はＦａｃｅ　Ｄｅｓｃｒｉｐｔｏｒの優先度値とファイル名である。
【００８３】
なお、図８のファイルのデータ形式は、あくまでも例であって、元の画像ファイルを参照するのに十分な情報、各特徴量のデータを参照するのに十分な情報、及び、各特徴量の優先度値、を対応付ける情報を含む任意の形式であってよい。本例では画像ファイルや特徴量データと別ファイルの形式で格納する例を示したが、画像ファイルや特徴量データファイルのどちらか一方と接合して格納するようにしてもよい。
【００８４】
また、ファイル名などの識別子で、画像データ、特徴量データ、特徴量優先度データの対応付けを行うことが容易に可能であるならば、図８のようなファイルは必ずしも必要ではない。これは例えばファイル名の基底名を同一とし、拡張子でそれが何のデータであるか表すというような規則を設けて対応づけるのであってもよい。
【００８５】
この場合、出力された特徴量優先度データを例えば後で行う検索の場合に、複数種の特徴量による類似度判定結果を統合する場合の重みづけデータなどに用いることが可能で、より画像の内容や撮影者の意図に沿った検索を行うことができるという利点がある。
【００８６】
＜第８実施形態＞
本実施形態では、第１乃至第７実施形態と異なる部分のみ説明を行う。第１乃至第７実施形態では、主に静止画像に適用する例を示したが、本発明は、動画像に対して適用することも可能である。最も単純には、第１乃至第７実施形態で示した処理を、動画像の各フレーム画像単位で適用する。
【００８７】
この場合、例えば表１で示したＭＰＥＧ−７のＤｅｓｃｒｉｐｔｏｒの中には、ＭｏｔｉｏｎカテゴリのＤｅｓｃｒｉｐｔｏｒや、ＴｉｍｅＳｅｒｉｅｓ　Ｄｅｓｃｒｉｐｔｏｒと他のＤｅｓｃｒｉｐｔｏｒの組み合わせ、などの動画像向けのＤｅｓｃｒｉｐｔｏｒがあるので、それを適用してもよい。
【００８８】
特に、第６実施形態で述べたようにシャッター速度優先モードなどで、ユーザが意図的に速いシャッター速度で撮影を行う場合は、おそらくは高速に移動する物体を撮影していると推測できるので、この場合、例えば、画像中の領域とその移動を表現することが可能なＭＰＥＧ−７のＭｏｔｉｏｎ　ＴｒａｊｅｃｔｏｒｙやＰａｒａｍｅｔｒｉｃ　Ｍｏｔｉｏｎといった特徴量の選択を行う、もしくは優先度を上げるといった処理を行うこともできる。
【００８９】
動画像の特徴量データの格納形式は時系列で各フレームの特徴量データを並べればよい。これは例えば各フレームの特徴量データを個々のファイルに格納し、例えばそのファイルの識別子に時系列の順序に沿った番号を付与したりするのでもよいし、時系列に沿って特徴量データを接合して格納してもよい。図９は、時系列に沿って特徴量データを接合したデータを表す概念図である。
【００９０】
一方、動画像の撮影を行う場合、撮影中に各撮影設定を変更することも考えられる。その結果、例えば、図１０に示すように、ある特徴量データが生成される期間と生成されない期間が存在する可能性がある。図１０で、期間１２０１と１２０３は画像特徴量が選択、抽出される期間、期間１２０２は画像特徴量が選択、抽出されない期間を表している。
【００９１】
このような場合、例えば予め時系列に沿ったインデックス情報をフレーム画像の数だけ用意して、存在する特徴量データに対してのみリンクを設定する、といったような方法で解決できる。図１１は、各インデックス情報１３０１と特徴量データ１３０２を示している。図１１の例では、フレーム０からフレームＩ−１まで特徴量を抽出し、フレームＩからフレームＮ−２まで特徴量を抽出せず、さらにフレームＮ−１で特徴量抽出を行っている状態を表す例である。図１１の例で矢印はリンクを表しているが、リンクは公知のＵＲＬやＵＲＩ、あるいはファイル中でのオフセットなどで実現可能である。
【００９２】
なお、図１１ではインデックス情報１３０１と特徴量データ１３０２を分けて書いてあるが、これらは別々のデータであってもよいし、接合して１つのデータで格納されるのであってもよい。また、図１１で示した図はあくまでも特徴量抽出する期間としない期間がある場合のデータ格納形式の一例であって、この他の形式であってもよい。
【００９３】
＜第９実施形態＞
本実施形態では、第１乃至第８実施形態と異なる部分のみ説明を行う。第１乃至第８実施形態では、静止画像や動画像の撮影時に特徴量抽出を行う例を示したが、ユーザが選択したモードなどの撮影設定と特徴量抽出の対象となる静止画像や動画像があればよく、必ずしも撮影時に特徴量抽出を行う必要はない。
【００９４】
図１２は、撮影時の撮影設定を記述したファイルの例を示す図である。このファイルには、画像特徴量を選択するのに必要な撮影設定が記述されている。このようなファイルを撮影時に作成して、撮影画像と何らかの形で関連付けて記録しておくことで、撮影後に画像特徴量の抽出を行うことができる。
【００９５】
例えばデータ記憶部１０２に画像とともに図１２のような情報を出力することにより、撮影後に特徴量の選択及び抽出を行うことが可能である。なお、図１０に示したファイルは、単なる一例であって、第１乃至第８実施形態で説明した撮影設定を記述可能なものであれば任意の形式であってよい。また記述する項目は特徴量選択に用いることのできる情報であればこの他の設定値であってもよい。
【００９６】
また、図１２のファイルは単独で保存してもよいしあるいは画像ファイルに接合するなどして画像ファイル内部に格納してもよい。例えば、静止画像の場合、公知のＥｘｉｆ画像フォーマットには露出プログラム、絞り値、シャッタースピードといった撮影設定を格納するデータタグが定義されている。露出プログラムは、次の表２のような値と意味を持つ。
【００９７】
【表２】

【００９８】
したがって、公知のＥｘｉｆフォーマットのタグを用いて記述するのでもよい。さらにＥｘｉｆに記述したいユーザ設定に相応しいタグが存在しない場合であっても、ＭａｋｅｒＮｏｔｅやＵｓｅｒＣｏｍｍｅｎｔといったメーカが個別の情報を記入するためのタグやコメント用のタグを用いることで実現できる。
【００９９】
更に、本実施形態の場合、画像データと撮影設定をパーソナルコンピュータに移し、パーソナルコンピュータ上で第１乃至第８の実施形態で示したような特徴量選択及び抽出処理を行うことも可能である。
【０１００】
この場合、図３や図６で示したようなテーブルは、予めパーソナルコンピュータ側に存在すれば十分であるが、このテーブルを記述したファイルを作成し、に画像や撮影設定とともに出力し、パーソナルコンピュータ側で特徴量選択及び抽出処理を行う場合に参照するようにしてもよい。
【０１０１】
またこの場合、第４実施形態で言及したユーザ指定の閾値も、パーソナルコンピュータ側でユーザが入力すれば十分であるが、撮影時のユーザの閾値をファイル化してパーソナルコンピュータに出力しても良い。この閾値を格納するファイルの書式は閾値データを含む任意の形式であってよいし、あるいはＥｘｉｆタグのＭａｋｅｒＮｏｔｅやＵｓｅｒＣｏｍｍｅｎｔなどのタグに記述するのであってもよい。
【０１０２】
なお、第８実施形態で言及したように、動画像撮影の場合には撮影中に撮影設定を変更する場合があり得る。各フレームに対して図１２で例示したようなファイルを作成することも可能であるが、より効率的には例えば図１３に示すように時系列にユーザが行った設定変更を記録したデータがあればよい。
【０１０３】
図１３のファイルの場合、フレーム０からフレームＩ−１では絞り優先モード、絞り値Ｆ２．８で撮影し、フレームＩの時に絞り値をＦ４．０に変更し、さらにフレームＮでモードをシャッター優先モードに変更するとともにシャッター速度を１／４００秒に変更して撮影を行ったことを示している。図１３の例はあくまでも一例であってこの他の書式であっても良い。また、単位をフレーム番号としたが、時刻や撮影開始からの相対時間であっても構わない。
【０１０４】
本実施形態のように、撮影時ではなく後で特徴量抽出を行う方式は、例えば特徴量抽出対象となる画像が動画像であり、データ入出力部がランダムアクセス可能なメディアであって、撮影条件設定などが動画像データと別に格納されているような場合に特に有効である。
【０１０５】
この場合、動画像の先頭フレームから各フレームを順次参照して特徴量生成処理を行うのでなく、予め例えば図１４の１４０１のような撮影条件設定を参照し、本発明による方式によって実際に特徴量生成を行うフレーム画像を予め決定することが可能である。これは毎フレームに対して逐一特徴量生成判定を行う場合に比べ処理時間の短縮が期待できる。またあるいは実際に特徴量を生成することとなったフレーム画像を参照して特徴量生成を行うことにより処理の高速化も期待できる。
【０１０６】
＜他の実施形態＞
以上、本発明の好適な実施の形態について説明したが、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムを、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）がプログラムを読み出し実行することによっても、達成されることは言うまでもない。
【０１０７】
この場合、そのプログラム自体が前述した実施形態の機能を実現することになり、そのプログラムや、そのプログラムを記憶した記憶媒体或いはプログラム製品は、本発明を構成することになる。また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム（ＯＳ）などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【０１０８】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【０１０９】
【発明の効果】
以上述べた通り、本発明によれば、画像に適した画像特徴量を自動的に選択し、抽出することができる。
【図面の簡単な説明】
【図１】本発明の第１実施形態に係る画像処理装置の一構成例を示すブロック図である。
【図２】上記画像処理装置による処理の流れを示すフローチャートである。
【図３】画像特徴量を選択するためのテーブルの例を示す図である。
【図４】本発明の第２実施形態における処理の流れを示すフローチャートである。
【図５】本発明の第３実施形態における処理の流れを示すフローチャートである。
【図６】画像特徴量を選択するためのテーブルの例を示す図である。
【図７】絞り優先モード及び絞り値設定によるテーブルの例を示す図である。
【図８】優先度情報ファイルの例を示す図である。
【図９】時系列に沿って特徴量データを接合したデータを表す概念図である。
【図１０】特徴量を生成する期間と特徴量を生成しない期間を表す概念図である。
【図１１】各インデックス情報１３０１と特徴量データ１３０２を示す図である。
【図１２】撮影時の撮影設定を記述したファイルの例を示す図である。
【図１３】動画像撮影時の撮影設定の変更履歴を表すファイルの例を示す図である。
[0001]
[Technical field of the invention]
The present invention relates to a technique for extracting an image feature amount.
[0002]
[Prior art]
Generally, methods for searching for images are roughly classified into a method using data such as search information (keyword and date) associated with an image, and a method using image feature amounts extracted from an image.
[0003]
The former method performs a text search using search information as a key rather than an image search, and obtains an image corresponding to the search result. Therefore, it is necessary to prepare the search information separately from the image by some method. In many cases, this work is performed manually, and the amount of work increases as the number of images to which search information is added increases.
[0004]
In many cases, the latter method can be automated by a computer, and thus has the advantage that manual work can be reduced or eliminated entirely compared with the former method. Known examples include a thumbnail method that uses a reduced version of the original image, and methods such as MPEG-7 (ISO/IEC JTC1/SC29/WG11 Part 7).
[0005]
Here, according to “CD 15938-3 MPEG-7 Multimedia Content Description Interface - Part 3 Visual” (ISO/IEC CD 15938-3, ISO/IEC JTC1/SC29/WG11/N3703) or “MPEG-7 Visual part of eXperimentation Model Version 8.0” (ISO/IEC JTC1/SC29/WG11/N3673), the MPEG-7 scheme defines, for the descriptors (Descriptor) that indicate the type of an image feature amount, the categories shown in the following table and the descriptors belonging to each category.
[0006]
[Table 1]

[0007]
[Problems to be solved by the invention]
On the other hand, each image feature amount focuses on one particular characteristic of an image (color, regularity, shape of an object, etc.). Whether a given image feature amount appropriately represents the characteristics of an image, and in particular whether it is effective for image retrieval, therefore generally depends on the subject.
[0008]
Therefore, when image feature amounts are used for image retrieval, the question is which image feature amounts to select. One conceivable approach is for a person to visually check the content of each image one by one and decide which feature amount is appropriate for representing it. This work is cumbersome, however, and it is especially troublesome to shoot several tens to several hundred images in advance and carry out the selection afterwards.
[0009]
Alternatively, it is conceivable to attach image feature amounts for all the descriptors shown in Table 1 above. Depending on the content of the image, however, the feature amounts from some of those descriptors are not very effective for retrieval, which results in redundant data or in additional processing time wasted on generating feature amounts. One example is performing face detection and face feature extraction by a known method on an image that contains no person, where a region that is not a face ends up being erroneously detected and recognized as a face.
[0010]
Further, redundant data may not only cause a load on the storage device, but also cause a problem that the performance of a search using the feature amount is reduced.
[0011]
Therefore, an object of the present invention is to provide a technique for automatically selecting and extracting an image feature amount suitable for an image.
[0012]
[Means for Solving the Problems]
According to the present invention, there is provided an image processing apparatus comprising:
selecting means for selecting, from a plurality of predetermined types of image feature amounts, an image feature amount to be extracted from an image; and
extracting means for extracting an image feature amount of the type selected by the selecting means,
wherein the selecting means selects the type of the image feature amount based on a shooting setting in effect at the time of shooting the image from which the image feature amount is extracted.
[0013]
According to the present invention, there is also provided an image processing method comprising:
a selecting step of selecting, from a plurality of predetermined types of image feature amounts, an image feature amount to be extracted from an image; and
an extracting step of extracting an image feature amount of the type selected in the selecting step,
wherein, in the selecting step, the type of the image feature amount is selected based on a shooting setting in effect at the time of shooting the image from which the image feature amount is extracted.
[0014]
According to the present invention, there is also provided a program that causes a computer to execute:
a selecting step of selecting, from a plurality of predetermined types of image feature amounts, an image feature amount to be extracted from an image; and
an extracting step of extracting an image feature amount of the type selected in the selecting step,
wherein, in the selecting step, the type of the image feature amount is selected based on a shooting setting in effect at the time of shooting the image from which the image feature amount is extracted.
[0015]
[Embodiments of the invention]
Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.
[0016]
<First embodiment>
FIG. 1 is a block diagram illustrating a configuration example of the image processing apparatus according to the first embodiment of the present invention. The image processing apparatus according to the present embodiment is implemented, for example, as a standalone digital camera, as a computer terminal having an image capturing function, or as a system in which a digital camera and a computer terminal such as a personal computer are connected wirelessly or by wire. Hereinafter, these are referred to as hardware configurations 1, 2, and 3, respectively.
[0017]
The image input unit 100 is composed of an image input device capable of inputting still image data, moving image data, or both. In hardware configurations 1 and 2, for example, it is a circuit capable of digitizing an image, such as an imaging circuit including a CCD sensor. In hardware configuration 3, examples include a known digital camera, a digital video device, and a digital video device capable of still photography. Although not shown, various image processing functions such as compression of image data may also be provided.
[0018]
The input unit 101 is composed of a device for inputting instructions and data from the user and for making various settings, for example shooting settings such as the shooting mode and the settings of a table described later. In hardware configuration 1, for example, it can be composed of buttons, a mode dial, and the like; in hardware configurations 2 and 3, it includes a keyboard, a pointing device, and the like. Examples of the pointing device include a mouse, a trackball, a trackpad, and a tablet.
[0019]
The data storage unit 102 is a device to which image data, feature amount data, and the like are recorded and from which they are read. It is composed of, for example, a hard disk, a CD-ROM or CD-R, a memory card, a CF card, SmartMedia, an SD card, or a Memory Stick. Alternatively, a communication path control device such as an Ethernet (registered trademark) card, a modem, an infrared link, or a wireless communication module such as IEEE802.11b or Bluetooth may be provided to give access to an external storage device, so that image data, feature amount data, and the like are stored in that external storage device.
[0020]
The display unit 103 is a device that displays images such as a GUI. It is composed of, for example, a viewfinder in hardware configuration 1 and a CRT or liquid crystal display in hardware configuration 2. The CPU 104 controls the entire image processing apparatus and executes the processing described later. The ROM 105 and the RAM 106 are storage units that provide the CPU 104 with the programs, data, work areas, and the like necessary for that processing.
[0021]
The above is the main hardware configuration of the image processing apparatus according to the present embodiment. It shows, in simplified form, only the components mainly needed for the following description; needless to say, other components may be included.
[0022]
FIG. 2 is a flowchart showing the flow of processing by the image processing apparatus. This processing is triggered, for example, by the input of an image through the image input unit 100. Alternatively, when hardware configuration 1 is adopted for the image processing apparatus, the processing may be triggered by, for example, image capture on shutter release.
[0023]
In the setting acquisition step S201, the shooting setting that the user set when shooting the image is acquired. For example, a known digital camera allows shooting with a shooting mode selected from Auto, portrait, night view, pan focus, landscape, black and white, and stitch assist.
[0024]
Portrait mode is suited mainly to shooting a person, especially a person's face, or some other object (a flower, an insect); it blurs the background of the image so that the foreground object stands out. Night view mode is suited mainly to night-time shooting; it fires the flash and slows the shutter speed. Pan focus mode fixes the focus and captures the background sharply as well. Landscape mode focuses on the background and is suited to shooting landscapes. Auto mode is a general-purpose mode in which the digital camera determines the settings.
[0025]
The black-and-white mode is a mode for converting a photographed result into a black-and-white image (gray scale). The stitch assist mode is a mode in which images are continuously captured to create a panoramic image.
[0026]
Next, in the feature amount selection step S202, the type of image feature amount to be extracted from the captured image is selected according to the shooting setting acquired in S201. In the present embodiment, the selection is made, for example, by referring to the table shown in FIG. 3. This table records the relationship between each shooting setting and the types of image feature amounts selected for it. The table may be stored as data in the ROM 105, the RAM 106, or the data storage unit 102, or it may be stored in table form inside the program that implements this processing. Alternatively, it may be embodied as a control algorithm in the program or in the apparatus.
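As an illustration of the option of keeping the table as data, the following is a minimal sketch that assumes a JSON encoding. The description does not specify any file format, so the JSON layout, the descriptor key names, and the function name `load_selection_table` are all hypothetical. Only the landscape row is filled in, using the ON/OFF values the description gives for landscape mode.

```python
import json

# Hypothetical encoding of a FIG. 3 style table (format assumed, not from the patent).
# The landscape row uses the ON/OFF values stated in the description.
TABLE_JSON = """
{
  "descriptors": ["color_layout", "contour_shape", "edge_histogram", "face"],
  "modes": {
    "landscape": ["ON", "OFF", "ON", "OFF"]
  }
}
"""


def load_selection_table(text):
    """Parse the JSON text into {mode: {descriptor: bool}}."""
    raw = json.loads(text)
    names = raw["descriptors"]
    table = {}
    for mode, flags in raw["modes"].items():
        # Each row must carry exactly one flag per descriptor.
        if len(flags) != len(names):
            raise ValueError(
                f"mode {mode!r} has {len(flags)} flags, expected {len(names)}"
            )
        table[mode] = {name: flag == "ON" for name, flag in zip(names, flags)}
    return table


table = load_selection_table(TABLE_JSON)
print(table["landscape"])
```

Keeping the table outside the program in this way would let it be replaced without changing the selection logic, which is one reason the data-format option might be chosen over a hard-coded table.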
[0027]
Here, the Descriptors belonging to the Color, Shape, Texture, and Face categories of the Visual Descriptor of MPEG-7 shown in Table 1 have the following characteristics in general.
[0028]
A Descriptor belonging to Color can express the color tone and color arrangement of an image. A Descriptor belonging to Shape can express the shape of an object in an image. A Descriptor belonging to Texture can express the texture pattern of an image and its regularity. A Descriptor belonging to Face can express the face of a person.
[0029]
For example, when shooting a landscape, the photographer's interest is considered to lie mainly in the background (mountains, sky, sea, etc.), and the photographer shoots by selecting, for example, the landscape mode in accordance with that intention. To represent an image shot in landscape mode, image feature amounts that express the color tone and regularity of the background are therefore preferable. Feature amounts specialized for people, by contrast, do not match the photographer's intention, and it is quite likely that no person appears in the image in the first place. Conversely, a photographer who intends to shoot a person selects, for example, the portrait mode. Images shot in this mode are therefore likely to contain a person or an object such as a flower, and in such cases it is preferable to generate feature amounts that express shape or that express a person's face.
[0030]
The table of FIG. 3 is created in consideration of such circumstances. Among the types (Descriptors) of image feature amounts listed in Table 1, the table in FIG. 3 is an example using the Color Layout Descriptor (Color category), the Contour Shape Descriptor (Shape category), the Edge Histogram Descriptor (Texture category), and the Face Descriptor (Face category). In the figure, "ON" indicates that the image feature amount is selected, and "OFF" indicates that it is not selected.
[0031]
For example, the Color Layout Descriptor is ON in a mode for color photographing, the Contour Shape Descriptor is ON in a mode in which a foreground object is the main subject, and the Edge Histogram Descriptor is ON in a mode in which the background is the main subject. In a mode in which a person is likely to be photographed, the Face Descriptor is ON.
[0032]
As processing in the program, for example, variables v1, v2, v3, and v4 corresponding respectively to the Descriptors of FIG. 3 (Color Layout, Contour Shape, Edge Histogram, Face) are prepared, and an ON/OFF value is set in each. For example, if the shooting setting acquired in S201 is the landscape mode, the values of the variables v1, v2, v3, and v4 are ON, OFF, ON, and OFF, respectively.
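The table lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the apparatus's actual implementation: the names `SELECTION_TABLE` and `select_features` are hypothetical, and only the landscape row is taken from the description. The portrait row is an assumed placeholder, since the table itself is not reproduced in this text.

```python
# Sketch of the S202 table lookup. Names and the "portrait" row are
# illustrative assumptions; only the "landscape" row comes from the text.
ON, OFF = True, False

DESCRIPTORS = ("Color Layout", "Contour Shape", "Edge Histogram", "Face")

SELECTION_TABLE = {
    # mode:       (v1, v2,  v3, v4)
    "landscape": (ON, OFF, ON, OFF),   # stated in the description
    "portrait":  (ON, ON, OFF, ON),    # assumed values, for illustration only
}

def select_features(shooting_mode):
    """Return the ON/OFF flags (v1, v2, v3, v4) for a shooting mode."""
    return SELECTION_TABLE[shooting_mode]

v1, v2, v3, v4 = select_features("landscape")
selected = [d for d, v in zip(DESCRIPTORS, (v1, v2, v3, v4)) if v]
print(selected)
```

Holding the table as plain data keeps the selection logic independent of where the table is stored (ROM, RAM, or a file).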
[0033]
Needless to say, besides the Visual Descriptors adopted in MPEG-7, there are other types of image feature amounts representing color and shape, and there are also feature amounts that belong to categories other than those of MPEG-7. MPEG-7 is cited above merely as an example of known image feature amounts, and the present invention can employ feature amounts other than those of MPEG-7.
[0034]
Finally, in the feature amount extraction step of S203, image feature amounts are extracted from the captured image. The types to be extracted are those selected in S202: the variables v1, v2, v3, and v4 are referred to, and each type of image feature amount whose variable is ON is extracted. When the landscape mode is set, the values of v1 and v3 are ON, so image feature amounts are extracted using the corresponding Color Layout Descriptor and Edge Histogram Descriptor. The extracted image feature amounts are recorded as feature amount data in the data storage unit 102, in some form that associates them with the corresponding image, and the process ends.
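The extraction step can be sketched as a dispatch over the four flags. The extractor functions below are hypothetical stand-ins that return dummy records; real descriptor extraction is outside the scope of this sketch.

```python
# Sketch of S203: run only the extractors whose flag is ON.
# The extractor functions are hypothetical stand-ins that return dummy data.
def extract_color_layout(image):
    return {"type": "Color Layout"}

def extract_contour_shape(image):
    return {"type": "Contour Shape"}

def extract_edge_histogram(image):
    return {"type": "Edge Histogram"}

def extract_face(image):
    return {"type": "Face"}

EXTRACTORS = (extract_color_layout, extract_contour_shape,
              extract_edge_histogram, extract_face)

def extract_selected(image, flags):
    """Apply each extractor whose flag (v1..v4) is ON, in table order."""
    return [fn(image) for fn, on in zip(EXTRACTORS, flags) if on]

# Landscape mode: v1 and v3 are ON, so two feature amounts are produced.
features = extract_selected(image=None, flags=(True, False, True, False))
```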
[0035]
As described above, in the present embodiment, image feature amounts are selected and extracted according to the shooting setting set by the user. The selection of the type of image feature amount is thereby automated: feature amounts that match the characteristics of the captured image are extracted, while extraction of feature amounts that do not match those characteristics is suppressed. Therefore, image feature amounts suitable for the captured image are extracted, and the processing time for generating the feature amount data and the volume of the feature amount data can both be reduced compared with the case where all types of feature amounts are extracted unconditionally.
[0036]
Furthermore, when the feature amount data generated according to the present embodiment is registered in the database, only the feature amount that more effectively represents the feature of each image is registered. Therefore, there is an advantage that an increase in feature amount data that is not very effective for search can be suppressed, and that search accuracy and speed can be improved.
[0037]
Note that the table in FIG. 3 is merely an example, and the correspondence between ON and OFF and the shooting mode may be another correspondence. Further, feature amounts other than the four Descriptors in the table may be used.
[0038]
In addition, although an example in which the contents of the table in FIG. 3 are predetermined and fixed has been shown, a means for modifying the table according to the user's preference may be provided separately so that the table can be adapted to the user's intention.
[0039]
Further, for example, when the user selects a shooting setting, the types of image feature amounts selected in accordance with that setting may be presented on the display unit 103 or the like, and the user may be allowed to correct the correspondence as appropriate by input. In this case, the processing of S201 and S202 in FIG. 2 is executed with the user's selection of the shooting setting as a trigger, the values of the variables v1, v2, v3, and v4 are then updated as specified by the user, and the processing of S203 may be modified so as to be triggered by a shooting operation such as the user releasing the shutter.
[0040]
<Second embodiment>
In the present embodiment, only parts different from the first embodiment will be described. In the first embodiment, the photographing mode is described as the photographing setting selected by the user. However, for example, in a known digital camera, there is another photographing setting that the user selects according to the photographing situation.
[0041]
For example, there is a red-eye reduction processing function. This function alleviates the phenomenon in which light emitted by the flash is reflected by the retina of a person or the like, so that the eyes appear red. To alleviate this, the function performs a preliminary light emission prior to the flash emission at the time of shooting, which narrows the pupils of the subject, reduces the light reflected by the retina, and suppresses the red appearance of the eyes.
[0042]
The red-eye reduction processing is mainly used when capturing an image that includes a person, and in particular a face. Conversely, it is not required for an image in which no face including eyes, and in particular no human eye, is captured. Therefore, when the red-eye reduction processing function is selected, the possibility that a human face is photographed is extremely high.
[0043]
FIG. 4 is a flowchart showing the flow of processing configured to reflect the condition regarding the red-eye reduction processing. In the figure, the processing of S201, S202, and S203 is the same as that described in the first embodiment. However, in the present embodiment, a value indicating the selection (ON/OFF) of the red-eye reduction processing function is acquired in S201 in addition to the shooting mode. In S202, image feature amounts are selected based on the shooting mode as described above. In S401, image feature amounts are selected based on the selection of the red-eye reduction processing function. The latter is called the second feature amount selection step.
[0044]
In the second feature amount selection step of S401, the value of the variable v4 relating to extraction of the face feature amount is updated according to the selection of the red-eye reduction processing function: v4 is set to ON if the function is ON, and to OFF if the function is OFF. The image feature amount can thereby be selected in accordance with the selection of the red-eye reduction processing function, independently of the shooting mode.
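A minimal sketch of this second feature amount selection step is shown below. The function name `second_selection_red_eye` is hypothetical; the flags follow the v1 to v4 order used above.

```python
def second_selection_red_eye(flags, red_eye_on):
    """Overwrite v4 (Face) with the red-eye reduction setting."""
    v1, v2, v3, _ = flags
    return (v1, v2, v3, bool(red_eye_on))

# Landscape mode gives (ON, OFF, ON, OFF); red-eye reduction ON forces v4 ON.
landscape = (True, False, True, False)
with_red_eye = second_selection_red_eye(landscape, red_eye_on=True)
```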
[0045]
In addition, some known digital cameras allow a macro shooting mode to be selected as appropriate within each of the Auto, portrait, monochrome, and stitch assist shooting modes. The macro shooting mode is a mode for taking close-up shots of an object such as a flower or an insect. Since an image captured with this mode selected shows some specific object, it is preferable to extract a feature amount representing the shape of the object, such as a Shape feature amount.
[0046]
Therefore, for example, a value representing the selection (ON/OFF) of the macro mode may be acquired in S201 of FIG. 4, and if the macro mode is ON, the value of the variable v2 relating to extraction of the contour shape feature amount may be set to ON in S401. By doing this, for example, when the shooting mode selected by the user is the stitch assist mode, the value of the variable v2 is OFF according to the table of FIG. 3, so the contour shape feature amount would not normally be extracted; if the user has selected the macro mode, however, the contour shape feature amount is extracted. In the present embodiment, in addition to the effects of the first embodiment, feature extraction better suited to the shooting situation can be performed.
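The macro-mode override can be sketched in the same way. Only the OFF value of v2 for the stitch assist mode is taken from the text; the other three flag values in the example are assumptions, and `apply_macro_mode` is a hypothetical name.

```python
def apply_macro_mode(flags, macro_on):
    """Turn v2 (Contour Shape) ON when macro mode is ON; otherwise keep flags."""
    v1, v2, v3, v4 = flags
    return (v1, True if macro_on else v2, v3, v4)

# v2 is OFF for stitch assist per the text; the other three flags are assumed.
stitch_assist = (True, False, True, False)
with_macro = apply_macro_mode(stitch_assist, macro_on=True)
```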
[0047]
<Third embodiment>
In the present embodiment, only portions different from the first and second embodiments will be described. In the second embodiment, the feature amount extraction of the face is selected by focusing on ON / OFF of the red-eye reduction processing function.
[0048]
However, there are known digital cameras in which, for example, the red-eye reduction processing function defaults to ON when the Auto, portrait, or night view shooting mode is selected, and defaults to OFF in the pan-focus and landscape shooting modes. In this way, the apparatus may automatically update the ON/OFF state of the red-eye reduction processing function according to the selected shooting mode, and in that case the user's intention may not be appropriately reflected.
[0049]
On the other hand, such a digital camera may be provided with a function that lets the user forcibly change the ON/OFF state of the red-eye reduction processing function. When the user forcibly turns the function ON, the user can be considered to intend to photograph a human face, in particular one including the eyes. Conversely, when the user forcibly turns the function OFF, the user can be considered to intend to photograph something in which no such face is present.
[0050]
FIG. 5 is a flowchart showing the flow of processing assuming such a case. S201 to S203 in the figure are the same as those shown in FIG. 2 and described in the first embodiment. However, in this embodiment, in S201, a value indicating ON / OFF of the current red-eye reduction processing function is also acquired in addition to the shooting mode. In the present embodiment, the variable for storing this setting is v5. Further, in S202, the image feature amount is selected based on the above-described shooting mode. In S502, the image feature amount is selected based on the selection of the red-eye reduction processing function. The latter is called a second feature value selection step.
[0051]
In step S501, it is determined whether the variable v5, which holds the current state of the red-eye reduction processing function, matches the default of that function for the current shooting mode. For example, in the digital camera described above, when the shooting mode is Auto, portrait, or night view, v5 matches the default if its value is ON and does not match if its value is OFF. When the shooting mode is pan focus or landscape, v5 matches the default if its value is OFF and does not match if its value is ON. If the value of v5 does not match the default, the process proceeds to S502; otherwise, the process proceeds to S203.
[0052]
In the second feature amount selection step of S502, the value of the variable v4 relating to extraction of the face feature amount is switched ON or OFF. This is equivalent to setting v4 to ON when the user has forcibly changed the red-eye reduction processing function to ON, and setting v4 to OFF when the user has forcibly changed it to OFF.
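The determination of S501 and the update of S502 can be sketched as follows. The default table reflects the modes named in the text; the function name `update_face_flag` and the lower-case mode strings are assumptions made for this sketch.

```python
# Default red-eye reduction setting per shooting mode, as described in the text.
RED_EYE_DEFAULT = {
    "auto": True, "portrait": True, "night view": True,
    "pan focus": False, "landscape": False,
}

def update_face_flag(shooting_mode, v5, v4):
    """Override v4 only if v5 differs from the default of the shooting mode."""
    if v5 == RED_EYE_DEFAULT[shooting_mode]:   # S501: matches the default
        return v4                              # proceed to S203 unchanged
    return v5                                  # S502: follow the forced setting

# The user forces red-eye reduction ON in landscape mode, so v4 becomes ON.
v4 = update_face_flag("landscape", v5=True, v4=False)
```

When v5 matches the default, v4 keeps the value chosen from the shooting-mode table, so the user's explicit override is the only thing that changes the face selection.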
[0053]
In the present embodiment, compared with the second embodiment, the extraction of the face feature amount can be matched more closely to the user's own judgment of the shooting situation and to the user's intention.
[0054]
<Fourth embodiment>
In the present embodiment, only portions different from the first to third embodiments will be described. In the first to third embodiments, for example, the table shown in FIG. 3 is processed with binary ON/OFF values. However, the present invention is not limited to this, and multi-valued processing is also possible.
[0055]
FIG. 6 is an example of a multi-value table. Numerical values 1 to 5 in the table are selection priority values of the feature amount, and are examples in which the smaller the number, the higher the priority of selection. The range of numerical values is merely an example, and other ranges may be used. The correspondence between the magnitude of the numerical value and the priority may be reversed. If the table is subjected to, for example, threshold processing to perform ON / OFF binarization, the image feature amount can be selected in the same manner as in the above embodiment.
[0056]
The threshold value for this threshold processing can be set in advance by the user, for example, before image input. Numerical values can be input by the input unit 101, for example. Alternatively, input may be performed interactively using the display unit 103.
[0057]
In the present embodiment, since the values used in the multi-value table of FIG. 6 are 1 to 5, the user inputs, for example, one of the values 1 to 5 as the threshold. The larger the threshold value input by the user, the more types of feature amounts are selected; conversely, the smaller the value, the fewer types are selected. The threshold value input by the user is acquired together with the shooting setting in, for example, S201 of FIGS. 2, 4, and 5, and is stored in, for example, a variable Th.
[0058]
In the first to third embodiments, the table is referred to in step S202 and its values are read directly into the variables v1, v2, v3, and v4. In the present embodiment, by contrast, each table value is compared with the variable Th, and the corresponding variable is set to ON if the table value is equal to or less than Th and to OFF otherwise.
[0059]
For example, when the user shoots with the stitch assist shooting mode selected, the priority values of Color Layout, Contour Shape, Edge Histogram, and Face are 1, 4, 2, and 5, respectively, according to the multi-value table of FIG. 6. If the user has input 3 as the threshold in advance, the values of the control variables v1, v2, v3, and v4 are as follows:
v1: ← ON (1 ≦ 3)
v2: ← OFF (4> 3)
v3: ← ON (2 ≦ 3)
v4: ← OFF (5> 3)
Thus, the Color Layout descriptor and the Texture (Edge Histogram) descriptor are selected.
[0060]
When the threshold value input by the user is 4,
v1: ← ON (1 ≦ 4)
v2: ← ON (4 ≦ 4)
v3: ← ON (2 ≦ 4)
v4: ← OFF (5> 4)
Thus, the descriptor of Shape controlled by the variable v2 is also selected.
[0061]
Conversely, if the threshold specified by the user is 1,
v1: ← ON (1 ≦ 1)
v2: ← OFF (4> 1)
v3: ← OFF (2> 1)
v4: ← OFF (5> 1)
Thus, only the Color Layout descriptor is selected.
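The threshold processing in the three examples above can be reproduced with a short sketch. The function name `binarize` is hypothetical; the priority values are those given in the text for the stitch assist mode.

```python
def binarize(priorities, th):
    """Turn each flag ON when its priority value is less than or equal to Th."""
    return tuple(p <= th for p in priorities)

# Stitch assist priorities for Color Layout, Contour Shape, Edge Histogram, Face.
stitch_assist = (1, 4, 2, 5)

for th in (3, 4, 1):
    print(th, binarize(stitch_assist, th))
```

Each printed tuple gives (v1, v2, v3, v4) for the thresholds 3, 4, and 1 in turn.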
[0062]
Further, in the second and third embodiments, examples were described in which the feature amount selection based on the shooting mode is changed by other settings; this can also be made multi-valued.
[0063]
For example, in the third embodiment, v4 was switched ON or OFF when the default of the red-eye reduction processing function for the shooting mode differed from the ON/OFF state specified by the user. Instead, for example, when the default of the red-eye reduction processing function is OFF and the user-specified state is ON, the priority of the Face Descriptor in the multi-value table may be set to 2, and when the default is ON and the user-specified state is OFF, the priority may be set to 5.
[0064]
In this case:
- Even if the default of the red-eye reduction processing function is OFF and the user-specified state is ON, feature amount extraction by the Face Descriptor is not performed if the threshold value specified by the user is 1.
- Even if the default of the red-eye reduction processing function is ON and the user-specified state is OFF, feature amount extraction by the Face Descriptor is performed if the threshold value specified by the user is 5.
Thus, finer control becomes possible.
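The two cases above can be checked with a small sketch that combines the priority override with the threshold comparison. The function names and the `base_priority` argument (the Face priority read from the multi-value table when no override applies) are assumptions made for illustration.

```python
def face_priority(base_priority, default_on, user_on):
    """Replace the Face priority when the user overrides the red-eye default."""
    if user_on and not default_on:
        return 2          # user forced red-eye reduction ON
    if default_on and not user_on:
        return 5          # user forced red-eye reduction OFF
    return base_priority  # no override: keep the table value

def face_selected(base_priority, default_on, user_on, th):
    """Face is extracted when its effective priority is at most Th."""
    return face_priority(base_priority, default_on, user_on) <= th
```

With a threshold of 1 the forced-ON case (priority 2) is still rejected, and with a threshold of 5 the forced-OFF case (priority 5) is still accepted, matching the two bullet points.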
[0065]
Therefore, a user who wants more detailed feature extraction for the image to be shot may set a larger threshold value before shooting, and a user who does not need such detailed feature extraction may set a smaller threshold value before shooting. By doing so, the photographer's intention at the time of shooting can be reflected.
[0066]
In addition, a feature amount may be selected by performing some calculation using the priority values in the multi-value table of FIG. 6, the priority values associated with other settings (for example, ON/OFF of the red-eye reduction processing function), and the threshold value specified by the user.
[0067]
As described above, in the present embodiment, finer selection becomes possible by making the selection table of each feature amount multi-valued and determining ON/OFF by means of a threshold value.
[0068]
<Fifth embodiment>
In the present embodiment, only the portions different from the first to fourth embodiments will be described. In the first to fourth embodiments, examples were described in which the shooting settings are either a choice among alternatives, such as the shooting mode, or ON/OFF settings, such as the red-eye reduction processing function and the macro shooting mode. However, the present invention is also applicable to continuous or multi-stage shooting settings.
[0069]
For example, some known digital cameras are provided with modes such as a shutter priority mode (Tv), an aperture priority mode (Av), and a manual mode in addition to the above-described shooting mode.
[0070]
The shutter priority mode is a mode in which the user selects and fixes a shutter speed in advance and the aperture value is set automatically so as to obtain an appropriate exposure. Conversely, the aperture priority mode is a mode in which the user selects and fixes an aperture value and the shutter speed is set automatically so as to obtain an appropriate exposure. The manual mode is a mode in which the user selects and sets both the shutter speed and the aperture value.
[0071]
The shutter priority mode is selected, for example, when shooting an object moving at high speed, and is used with the shutter speed set to a high speed. The aperture priority mode is selected, for example, when the user wants only a foreground object to stand out by blurring the background, in which case the aperture value is reduced (the aperture is opened) to make the depth of field shallow, or when the user wants the background to be captured sharply as well, in which case the aperture value is increased (the aperture is closed) to make the depth of field deep.
[0072]
In the present embodiment, an example using the aperture priority mode and the aperture value setting will be described. For example, in a digital camera whose aperture value can be set in multiple steps such as F2.0, F2.2, F2.5, F2.8, F3.2, F4.0, F4.5, F5.0, F5.6, F6.3, F7.1, and F8.0, a table such as that shown in FIG. 7 may be provided, so that feature amounts to be extracted are selected using the aperture value that the user intentionally chose in the aperture priority mode or the manual mode. Note that the priority values shown in FIG. 7 are merely examples, and each priority value may be another value.
[0073]
In the present embodiment, an example has been described in which the aperture value selected by the user in the aperture priority mode is used. Besides this, it is also possible to use the shutter speed in the shutter priority mode, for example to select the Shape feature amount or to raise its priority.
[0074]
If a zoom function is provided, the zoom magnification may be used as a shooting setting. For example, at the wide end, a feature amount suitable for representing the object the user is focusing on, such as Shape, may be selected or given a higher priority; conversely, on the tele side, a feature amount suitable for representing an image such as a landscape, such as Texture, may be selected or given a higher priority.
[0075]
As described above, in the case of the present embodiment, it is possible to perform more detailed feature amount extraction.
[0076]
<Sixth embodiment>
In the present embodiment, only portions different from the first to fifth embodiments will be described. In the first to fifth embodiments, the setting values in the tables are static, as in the tables of FIG. 3 and FIG. 6, but they may of course be variable. For example, the user may be allowed to change the setting values in the tables of FIGS. 3 and 6 to suit his or her preference by operating the input unit 101 or the like before shooting.
[0077]
Further, it is also possible to read the setting values of the feature amount selection table from the data storage unit 102 and perform control accordingly. In particular, when the data storage unit 102 is a removable storage medium such as a memory card, a CF card, a SmartMedia card, an SD card, or a Memory Stick, this can be realized more easily by creating, in advance, a file describing the settings of the feature amount selection table and storing it on the medium. In this case, the file can be created on a computer terminal or the like other than the image processing apparatus of the present embodiment.
[0078]
The file in which the settings are described may have any data format, for example one that contains the information needed to form a table as shown in FIGS. 3 and 6, or one that can describe conditions such as those described in the second and third embodiments.
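Since the format is left open, the following sketch assumes a JSON layout purely for illustration. The key names `descriptors` and `table` and the function `load_table` are hypothetical; the stitch assist row uses the priority values quoted in the fourth embodiment.

```python
import json

# Hypothetical file content: one row of priority values per shooting mode.
SETTINGS_TEXT = """
{
  "descriptors": ["Color Layout", "Contour Shape", "Edge Histogram", "Face"],
  "table": {"stitch assist": [1, 4, 2, 5]}
}
"""

def load_table(text):
    """Parse the settings and return {mode: {descriptor: priority}}."""
    data = json.loads(text)
    names = data["descriptors"]
    return {mode: dict(zip(names, row)) for mode, row in data["table"].items()}

table = load_table(SETTINGS_TEXT)
```

A text format of this kind can be written on a separate computer terminal and copied to the removable medium without any tool specific to the camera.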
[0079]
In this way, by making it easy for the user to freely set the feature amount selection table, it is possible to perform feature amount extraction in accordance with personal preferences of the user.
[0080]
<Seventh embodiment>
In the present embodiment, only parts different from the first to sixth embodiments will be described. In the fourth and fifth embodiments, the multi-valued priority value is simply used for selecting the feature value, but this value may be output together with the feature value data. FIG. 8 shows an example in which this priority value is output to a file.
[0081]
In this example, the user selects the landscape mode, and the priorities used when feature amounts are extracted according to the table in FIG. 6 are output. The example corresponds to the case where the threshold value specified by the user is 5, or where all the Descriptors are selected without any threshold processing.
[0082]
In the file shown in FIG. 8, the first line is the name of the image file in which the captured image is stored. The second line is the priority value of the Color Layout Descriptor, and the third line is the name of the file in which the Color Layout Descriptor data is stored. Similarly, the fourth and fifth lines are the priority value and file name of the Contour Shape Descriptor, the sixth and seventh lines are those of the Edge Histogram Descriptor, and the eighth and ninth lines are those of the Face Descriptor.
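A reader for this nine-line layout might look like the sketch below. The line order follows the description; the file names, extensions, and priority values in `SAMPLE` are made up for illustration, and `parse_priority_file` is a hypothetical name.

```python
def parse_priority_file(text):
    """Parse the nine-line layout: image name, then (priority, file) pairs."""
    lines = text.strip().splitlines()
    image_name, rest = lines[0], lines[1:]
    names = ("Color Layout", "Contour Shape", "Edge Histogram", "Face")
    entries = {}
    for i, name in enumerate(names):
        # Each descriptor occupies two lines: priority value, then file name.
        entries[name] = (int(rest[2 * i]), rest[2 * i + 1])
    return image_name, entries

# File names and priority values below are made up for illustration.
SAMPLE = """IMG_0001.jpg
1
IMG_0001.cld
4
IMG_0001.csd
2
IMG_0001.ehd
5
IMG_0001.fd
"""

image_name, entries = parse_priority_file(SAMPLE)
```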
[0083]
The data format of the file in FIG. 8 is merely an example. Any format may be used as long as it includes information sufficient to refer to the original image file, information sufficient to refer to the data of each feature amount, and information associating a priority value with each feature amount. In this example the priority values are stored in a file separate from the image file and the feature amount data files, but they may instead be stored within the image file or a feature amount data file.
[0084]
Further, if image data, feature amount data, and feature amount priority data can easily be associated with one another through an identifier such as a file name, the file shown in FIG. 8 is not necessarily required. For example, a rule may be adopted under which the files share the same base name and the extension indicates what kind of data each file contains, so that the files are associated through their names.
[0085]
In this case, for example, in a search performed later, the output feature amount priority data can be used as weighting data when integrating similarity determination results based on a plurality of types of feature amounts, which has the advantage that a search can be performed in accordance with the image content and the photographer's intention.
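One possible way to use the priority values as weights is sketched below. The weighting rule (weight = max_priority + 1 - priority) and the function name `integrate_similarity` are assumptions chosen for this illustration; the description does not specify how the weights are derived.

```python
def integrate_similarity(similarities, priorities, max_priority=5):
    """Weighted mean of per-descriptor similarities.

    Assumed weighting: priority 1 gets weight max_priority, and priority
    max_priority gets weight 1 (a smaller number means a higher priority).
    """
    total = weight_sum = 0.0
    for name, sim in similarities.items():
        w = max_priority + 1 - priorities[name]
        total += w * sim
        weight_sum += w
    return total / weight_sum

# Hypothetical similarity scores for two descriptors of one candidate image.
score = integrate_similarity(
    {"Color Layout": 0.8, "Edge Histogram": 0.4},
    {"Color Layout": 1, "Edge Histogram": 2},
)
```

Here the weights are 5 and 4, so the integrated score is (5 x 0.8 + 4 x 0.4) / 9.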
[0086]
<Eighth embodiment>
In the present embodiment, only portions different from the first to seventh embodiments will be described. In the first to seventh embodiments, examples in which the present invention is mainly applied to still images have been described, but the present invention can also be applied to moving images. Most simply, the processing described in the first to seventh embodiments is applied to each frame image of a moving image.
[0087]
In this case, among the MPEG-7 Descriptors shown in Table 1, there are Descriptors intended for moving images, such as the Descriptors of the Motion category and combinations of the TimeSeries Descriptor with another Descriptor, and these may also be used.
[0088]
In particular, as described in the fifth embodiment, when the user intentionally shoots at a high shutter speed in the shutter priority mode or the like, it can be presumed that the user is shooting a fast-moving object. In this case, for example, a feature amount that can express a region in the image and its movement, such as the MPEG-7 Motion Trajectory or Parametric Motion, may be selected, or its priority may be raised.
[0089]
As the storage format for the feature amount data of a moving image, the feature amount data of the individual frames may simply be arranged in time series. For example, the feature amount data of each frame may be stored in an individual file whose identifier carries a number in chronological order, or the feature amount data may be joined in chronological order and stored together. FIG. 9 is a conceptual diagram illustrating data obtained by joining feature amount data along a time series.
[0090]
On the other hand, when shooting a moving image, the shooting settings may be changed during shooting. As a result, for example, as shown in FIG. 10, there may be periods in which certain feature amount data is generated and periods in which it is not. In FIG. 10, periods 1201 and 1203 indicate periods during which the image feature amount is selected and extracted, and period 1202 indicates a period during which it is not selected and not extracted.
[0091]
In such a case, the problem can be solved, for example, by preparing in advance time-series index information with one entry per frame image and setting a link only for entries whose feature amount data exists. FIG. 11 shows such index information 1301 and feature amount data 1302. The example of FIG. 11 represents a state in which the feature amount is extracted from frame 0 to frame I-1, is not extracted from frame I to frame N-2, and is extracted again in frame N-1. In FIG. 11 the arrows represent links; a link can be realized by a publicly known URL or URI, or by an offset in a file.
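The index structure can be sketched as follows, using list offsets as the links. The function name `build_index` and the six-frame example are assumptions; `None` marks a frame for which no feature amount data exists.

```python
def build_index(num_frames, features_by_frame):
    """Return (index, data): index[i] is an offset into data, or None."""
    index, data = [], []
    for frame in range(num_frames):
        if frame in features_by_frame:
            index.append(len(data))          # link to the stored feature
            data.append(features_by_frame[frame])
        else:
            index.append(None)               # no feature for this frame
    return index, data

# Hypothetical 6-frame clip: features exist for frames 0-2 and frame 5 only.
index, data = build_index(6, {0: "f0", 1: "f1", 2: "f2", 5: "f5"})
```

Because the index has one entry per frame, a frame number can be resolved to its feature amount data, or to the absence of data, in constant time.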
[0092]
In FIG. 11, the index information 1301 and the feature amount data 1302 are drawn separately; they may be stored as separate data, or joined and stored as a single piece of data. Further, FIG. 11 shows only one example of a data storage format for the case where periods with and without feature amount extraction both exist, and other formats may be used.
[0093]
<Ninth embodiment>
In the present embodiment, only parts different from the first to eighth embodiments will be described. In the first to eighth embodiments, examples were described in which the feature amount is extracted at the time of shooting a still image or a moving image. However, as long as the shooting settings, such as the mode selected by the user, are associated with the still image or moving image from which the feature amount is to be extracted, the feature extraction does not necessarily have to be performed at the time of shooting.
[0094]
FIG. 12 is a diagram illustrating an example of a file describing shooting settings at the time of shooting. This file describes shooting settings necessary for selecting an image feature amount. By creating such a file at the time of shooting and storing it in some way in association with a shot image, it is possible to extract an image feature amount after shooting.
[0095]
For example, by outputting information as shown in FIG. 12 together with an image to the data storage unit 102, feature amounts can be selected and extracted after shooting. The file shown in FIG. 12 is merely an example, and any format may be used as long as it can describe the shooting settings described in the first to eighth embodiments. Further, other setting values may also be described, as long as the information can be used for selecting feature amounts.
[0096]
Further, the file of FIG. 12 may be stored alone, or may be stored in the image file by joining it to the image file. For example, in the case of a still image, a data tag for storing shooting settings such as an exposure program, an aperture value, and a shutter speed is defined in a known Exif image format. The exposure program has values and meanings as shown in Table 2 below.
[0097]
[Table 2]

[0098]
Therefore, the settings may be described using the tags of the known Exif format. Furthermore, even when Exif has no tag suited to a user setting that is to be described, this can be realized by using a tag intended for manufacturer-specific information or comments, such as MakerNote or UserComment.
[0099]
Further, in the case of the present embodiment, it is also possible to transfer image data and shooting settings to a personal computer, and perform the feature amount selection and extraction processing as described in the first to eighth embodiments on the personal computer.
[0100]
In this case, it is sufficient for tables such as those shown in FIGS. 3 and 6 to exist on the personal computer in advance. Alternatively, a file describing the table may be created and output to the personal computer together with the images and shooting settings, and referred to when the feature amount selection and extraction processing is performed on the personal computer side.
[0101]
Likewise, the user-specified threshold value of the fourth embodiment may simply be input by the user on the personal computer side, or the threshold value set by the user at the time of shooting may be written to a file and output to the personal computer. The file storing the threshold may have any format that includes the threshold data, or the threshold may be described in an Exif tag such as MakerNote or UserComment.
[0102]
As described in the eighth embodiment, in the case of moving image shooting, the shooting settings may be changed during shooting. Although a file as illustrated in FIG. 12 could be created for each frame, it is more efficient to use, for example, a file as shown in FIG. 13.
[0103]
The file shown in FIG. 13 indicates that shooting was performed in the aperture priority mode with an aperture value of F2.8 from frame 0 to frame I-1, that the aperture value was changed to F4.0 at frame I, and that at frame N the mode was changed to the shutter priority mode and the shutter speed was set to 1/400 second. The example of FIG. 13 is merely an example, and other formats may be used. Although frame numbers are used as the unit here, a time or a relative time from the start of shooting may be used instead.
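A change list of this kind can be resolved to the settings in force at any frame as sketched below. The text uses symbolic frame numbers I and N; the concrete values 120 and 300, the dictionary keys, and the function name `settings_at` are assumptions made for this illustration.

```python
import bisect

# Change list in the spirit of FIG. 13. The text uses symbolic frame numbers
# I and N; the concrete values 120 and 300 are assumptions for this sketch.
CHANGES = [
    (0,   {"mode": "Av", "aperture": "F2.8"}),
    (120, {"aperture": "F4.0"}),
    (300, {"mode": "Tv", "shutter": "1/400"}),
]

def settings_at(frame, changes=CHANGES):
    """Merge every change recorded at or before `frame`, in order."""
    starts = [start for start, _ in changes]
    upto = bisect.bisect_right(starts, frame)
    merged = {}
    for _, delta in changes[:upto]:
        merged.update(delta)
    return merged
```

As a simplification, a value that is not mentioned in a later change (here the aperture after the switch to shutter priority) is simply carried over from the earlier entry.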
[0104]
Extracting the feature amount after shooting rather than at the time of shooting, as in the present embodiment, is particularly effective when the image from which the feature amount is extracted is a moving image, the data input/output unit is a randomly accessible medium, and the shooting condition settings and the like are stored separately from the moving image data.
[0105]
In this case, the feature amount generation processing need not refer to each frame sequentially from the first frame of the moving image; the frame images for which feature amounts are to be generated can be determined in advance. This can be expected to reduce the processing time compared with the case where the determination of whether to generate a feature amount is made for each frame. Alternatively, processing can be expected to speed up by referring only to the frame images for which feature amounts are actually generated.
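Determining the target frames in advance can be sketched as below. This is an illustrative Python example under stated assumptions: the segment list, the mode names, the total frame count, and the predicate deciding which settings call for feature generation are all hypothetical. The function only computes frame ranges from the separately stored settings; it never touches the frame data itself.

```python
def frame_ranges_to_process(segments, total_frames, wants_features):
    """Return half-open [start, end) frame ranges to extract features from.

    `segments` is a list of (start_frame, mode) pairs sorted by start frame.
    `wants_features` is a hypothetical predicate on the mode.
    """
    ranges = []
    for index, (start, mode) in enumerate(segments):
        # A segment ends where the next one starts, or at the end of the clip.
        end = segments[index + 1][0] if index + 1 < len(segments) else total_frames
        if wants_features(mode) and start < end:
            ranges.append((start, end))
    return ranges


# Hypothetical clip of 450 frames whose mode changes at frame 300.
SEGMENTS = [(0, "aperture_priority"), (300, "shutter_priority")]
ranges = frame_ranges_to_process(
    SEGMENTS, 450, lambda mode: mode == "shutter_priority"
)
```

With a randomly accessible medium, a reader could then seek directly to frame 300 instead of decoding the first 300 frames only to skip them.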
[0106]
<Other embodiments>
The preferred embodiments of the present invention have been described above. Needless to say, however, the object of the present invention can also be achieved by supplying a software program that realizes the functions of the above-described embodiments to a system or an apparatus, and having a computer (or a CPU or MPU) of the system or apparatus read and execute the program.
[0107]
In this case, the program itself implements the functions of the above-described embodiments, and the program, a storage medium storing the program, or a program product constitutes the present invention. It also goes without saying that the functions of the above-described embodiments are realized not only when the computer executes the read program code, but also when an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and that processing realizes the functions of the above-described embodiments.
[0108]
Furthermore, it goes without saying that the invention also covers the case where, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like included in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and that processing realizes the functions of the above-described embodiments.
[0109]
[Effects of the Invention]
As described above, according to the present invention, an image feature amount suitable for an image can be automatically selected and extracted.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of an image processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart illustrating a flow of a process performed by the image processing apparatus.
FIG. 3 is a diagram illustrating an example of a table for selecting an image feature amount.
FIG. 4 is a flowchart showing a flow of processing according to a second embodiment of the present invention.
FIG. 5 is a flowchart illustrating a flow of processing according to a third embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of a table for selecting an image feature amount.
FIG. 7 is a diagram illustrating an example of a table based on aperture priority mode and aperture value setting.
FIG. 8 is a diagram illustrating an example of a priority information file.
FIG. 9 is a conceptual diagram illustrating data obtained by joining feature amount data along a time series.
FIG. 10 is a conceptual diagram illustrating a period in which a feature amount is generated and a period in which a feature amount is not generated.
FIG. 11 is a diagram showing index information 1301 and feature data 1302.
FIG. 12 is a diagram showing an example of a file describing shooting settings at the time of shooting.
FIG. 13 is a diagram illustrating an example of a file representing a change history of shooting settings at the time of shooting a moving image.

Claims

An image processing apparatus comprising:
selecting means for selecting, from among a plurality of predetermined types of image feature amounts, an image feature amount to be extracted from an image; and
extracting means for extracting the image feature amount of the type selected by the selecting means,
wherein the selecting means selects the type of the image feature amount based on a shooting setting at the time of shooting the image from which the image feature amount is extracted.

The image processing apparatus according to claim 1, wherein the selecting means selects the type of the image feature amount based on a table in which a relationship between the shooting setting and the type of the image feature amount selected in correspondence with the shooting setting is recorded.

The image processing apparatus according to claim 2, wherein the table records a priority of each type of the image feature amount selected in correspondence with the shooting setting, and
the selecting means selects the type of the image feature amount based on the priority and a predetermined threshold.

The image processing apparatus according to claim 3, further comprising input means for allowing a user to set the threshold.

The image processing apparatus according to claim 2, further comprising input means for allowing a user to set the table.

The image processing apparatus according to claim 1, wherein the shooting setting includes a shooting mode.

The image processing apparatus according to claim 1, wherein the shooting setting includes any of a red-eye reduction processing function, a macro mode, a zoom magnification, an aperture priority mode, or a shutter priority mode.

An image processing method comprising:
a selecting step of selecting, from among a plurality of predetermined types of image feature amounts, an image feature amount to be extracted from an image; and
an extracting step of extracting the image feature amount of the type selected in the selecting step,
wherein, in the selecting step, the type of the image feature amount is selected based on a shooting setting at the time of shooting the image from which the image feature amount is extracted.

A program for causing a computer to execute:
a selecting step of selecting, from among a plurality of predetermined types of image feature amounts, an image feature amount to be extracted from an image; and
an extracting step of extracting the image feature amount of the type selected in the selecting step,
wherein, in the selecting step, the type of the image feature amount is selected based on a shooting setting at the time of shooting the image from which the image feature amount is extracted.