JP2012044267A - Imaging device, subject search method, and program

Info

Publication number: JP2012044267A
Application number: JP2010181102A
Authority: JP
Inventors: Takeshi Minami; 剛南
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2010-08-12
Filing date: 2010-08-12
Publication date: 2012-03-01

Abstract

PROBLEM TO BE SOLVED: To enable imaging to be performed without missing the necessary timing. SOLUTION: An imaging device comprises: imaging systems 11-13 that take an image; a program memory 21 that stores feature information indicating features in an image corresponding to a specific subject and trigger information for starting a search for the specific subject; microphones 16L and 16R and a sound processor 17 into which the trigger information is input; and a CPU 19, a main memory 20, and a sound recognition part 17a that determine whether the input trigger information agrees with the trigger information stored in the program memory 21 and, based on the result of the determination, search the image taken by the imaging systems 11-13 for the feature information.

Description

The present invention relates to an imaging device having a face recognition function, a subject search method, and a program.

Many recent digital cameras include a face recognition function as standard, and some commercially available models recognize not only human faces but also the faces of pets such as dogs and cats in order to control focus and exposure. This technique detects the frontal face of the pet that is the subject, focuses on it, and automatically releases the shutter, so that the right moment to photograph a moving subject is not missed.

As a technique similar to detecting the frontal face of a subject, a technique has been proposed in which the subject's face is detected in a pre-shutter mode and the shutter is released at the moment the subject's line of sight turns toward the camera (see, for example, Patent Document 1).

JP 2008-182485 A

In digital cameras that use the face recognition and gaze detection techniques described above, including the technique described in the above patent document, the AF (autofocus) and AE (automatic exposure) functions operate continuously from the moment a face is recognized in the monitor image, so that the camera is ready for an immediate shutter release while it waits for the shooting moment. As a result, the time spent waiting for a photo opportunity is longer than in a normal shooting mode that does not use the face recognition function.

The present invention has been made in view of the above circumstances, and its object is to provide an imaging device, a subject search method, and a program capable of performing imaging without missing the necessary timing.

The invention according to claim 1 comprises: imaging means for capturing an image; first storage means for storing feature information, on an image, corresponding to a specific subject; second storage means for storing trigger information for instructing the start of a search for the specific subject; trigger input means for inputting trigger information; determination means for determining whether the trigger information input by the trigger input means and the trigger information stored in the second storage means have a similarity equal to or greater than a predetermined value; and search means for searching, when the determination means determines that the similarity is equal to or greater than the predetermined value, the image captured by the imaging means for the specific subject on the basis of the feature information stored in the first storage means.

The invention according to claim 2 is the invention according to claim 1, further comprising: automatic focusing means for causing the imaging means to focus on a subject that is a candidate for the feature information; search control means for causing the search means to acquire the image focused by the automatic focusing means and to search it for the feature information; and recording means for recording an image of the specific subject including the feature information found under the control of the search control means.

The invention according to claim 3 is the invention according to claim 2, wherein the search means determines whether the feature information directly faces the imaging means, and the recording means records an image of the specific subject including the feature information that the search means has determined to directly face the imaging means.

The invention according to claim 4 is the invention according to claim 1, wherein the trigger information is voice information.

The invention according to claim 5 is the invention according to claim 4, wherein the determination means determines, by voice recognition processing and speaker recognition processing of the trigger information, whether the similarity is equal to or greater than the predetermined value.

The invention according to claim 6 is a subject search method for an imaging device that captures an image, the method comprising: a first storage step of storing feature information, on an image, corresponding to a specific subject; a second storage step of storing trigger information for instructing the start of a search for the specific subject; a trigger input step of inputting trigger information; a determination step of determining whether the trigger information input in the trigger input step and the trigger information stored in the second storage step have a similarity equal to or greater than a predetermined value; and a search step of searching, when the similarity is determined in the determination step to be equal to or greater than the predetermined value, the captured image for the specific subject on the basis of the feature information stored in the first storage step.

The invention according to claim 7 is a program that causes a computer of an imaging device that captures an image to function as: first storage means for storing feature information, on an image, corresponding to a specific subject; second storage means for storing trigger information for instructing the start of a search for the specific subject; trigger input means for inputting trigger information; determination means for determining whether the trigger information input by the trigger input means and the trigger information stored by the second storage means have a similarity equal to or greater than a predetermined value; and search means for searching, when the determination means determines that the similarity is equal to or greater than the predetermined value, the captured image for the specific subject on the basis of the feature information stored by the first storage means.

According to the present invention, imaging can be performed without missing the necessary timing.

FIG. 1 is a block diagram showing the configuration of the functional circuit of a digital camera according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing in the speaker registration mode according to the embodiment.
FIG. 3 is a flowchart showing the processing in the face detection mode according to the embodiment.
FIG. 4 is a diagram illustrating a monitor image in the face detection mode according to the embodiment.
FIG. 5 is a diagram illustrating a monitor image in the face detection mode according to the embodiment.
FIG. 6 is a diagram illustrating a monitor image in the face detection mode according to the embodiment.
FIG. 7 is a diagram illustrating a monitor image in the face detection mode according to the embodiment.

An embodiment in which the present invention is applied to a digital camera having a pet recognition function is described below with reference to the drawings.

FIG. 1 shows the circuit configuration of a digital camera 10 according to the present embodiment. In the figure, an optical image of the subject passes through an optical lens unit 11 disposed on the front of the camera housing and is focused on the imaging surface of a solid-state image sensor (IS) 12, which is formed of, for example, a CCD (Charge Coupled Device) or a CMOS image sensor.

In the monitor state, also called through-image display or live view display, the image signal obtained by the solid-state image sensor 12 is sent to an AGC and A/D conversion unit 13, which digitizes it by performing correlated double sampling, automatic gain adjustment, and A/D conversion. The resulting digital image data is sent to an image processing unit 14 via a system bus SB.

The image processing unit 14 applies color process processing, including pixel interpolation and gamma correction, to the image data and sends the result to a display unit 15, where it is displayed as a through image.

The image processing unit 14 also contains a face recognition unit 14a. The face recognition unit 14a applies a face recognition algorithm, such as eigenfaces or a hidden Markov model, to the image data, performs face recognition processing based on face data registered in advance, and sends the recognition result to a CPU 19, described later.

A pair of microphones 16L and 16R is disposed on the front of the camera housing, like the optical lens unit 11, so that sound from the direction of the subject is input in stereo. Each of the microphones 16L and 16R converts the input sound into an electrical signal and outputs it to an audio processing unit 17.

The audio processing unit 17 digitizes the audio signals input from the microphones 16L and 16R when recording audio alone, when shooting a still image with audio, and when shooting a moving image. The audio processing unit 17 also detects the sound pressure level of the digitized audio data, compresses the audio data in a predetermined data file format, for example AAC (Moving Picture Experts Group-4 Advanced Audio Coding), to create an audio data file, and sends the file to a recording medium described later.
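The level detection mentioned above can be illustrated with a minimal sketch. The source does not say how the audio processing unit 17 measures sound pressure level, so the RMS-in-decibels measure, the 16-bit full-scale value, the function names, and the threshold below are all assumptions chosen for illustration.

```python
import math


def sound_level_db(samples, full_scale=32768.0):
    """Return the RMS level of PCM samples in dB relative to full scale.

    Assumption: 16-bit signed PCM, so full scale is 32768.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


def is_voice_input(samples, threshold_db=-30.0):
    """True when the level is at or above a (hypothetical) threshold."""
    return sound_level_db(samples) >= threshold_db
```

A check of this kind would gate the later recognition steps, so that silence or faint background noise never reaches the recognizer.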

In addition, the audio processing unit 17 includes a sound source circuit such as a PCM sound source. During audio playback it decompresses the audio data file sent to it, converts the data to an analog signal, and drives a speaker 18 provided on the rear of the housing of the digital camera 10 to output the sound.

The audio processing unit 17 further contains a voice recognition unit 17a. The voice recognition unit 17a uses an acoustic model to perform voice recognition processing on the sound input from the microphones 16L and 16R.

In addition, the voice recognition unit 17a performs speaker recognition (speaker verification) processing based on the user's voice data registered in advance and sends the verification result to the CPU 19.

The CPU 19 performs overall control of the circuits described above. The CPU 19 is directly connected to a main memory 20 and a program memory 21. The main memory 20 is formed of, for example, SRAM and functions as a work memory. The program memory 21 is formed of an electrically rewritable nonvolatile memory such as a flash memory and persistently stores operation programs, including the control for the shooting mode described later, and data.

The CPU 19 reads the necessary programs and data from the program memory 21 and controls the digital camera 10 as a whole while temporarily expanding and storing them in the main memory 20 as needed.

The CPU 19 also performs control operations in response to the various key operation signals input directly from an operation unit 22. The operation unit 22 includes, for example, a power key, a shutter release key, zoom up/down keys, a shooting mode key, a playback mode key, a menu key, cursor ("↑", "→", "↓", "←") keys, a set key, a playback key, and a display key.

Via the system bus SB, the CPU 19 is connected to the AGC and A/D conversion unit 13, the image processing unit 14, the display unit 15, and the audio processing unit 17, and also to a lens drive unit 23, a flash drive unit 24, an image sensor (IS) drive unit 25, a memory card controller 26, and a USB (Universal Serial Bus) interface (I/F) 27.

The lens drive unit 23 receives a control signal from the CPU 19 and controls the rotation of a lens DC motor (M) 28, thereby moving some of the lens groups that make up the optical lens unit 11, specifically the zoom lens and the focus lens, individually along the optical axis.

During still image shooting, the flash drive unit 24 receives a control signal from the CPU 19 and lights a flash unit 29, formed of a plurality of white high-intensity LEDs, in synchronization with the shooting timing.

The image sensor drive unit 25 scans and drives the solid-state image sensor 12 according to the shooting conditions set at that time.

When an image is shot in response to operation of the shutter key of the operation unit 22, the image processing unit 14 compresses the image data sent from the AGC and A/D conversion unit 13 in a predetermined data file format. In the case of JPEG (Joint Photographic Experts Group), for example, it applies data compression such as DCT (discrete cosine transform) and Huffman coding to create an image data file with a greatly reduced data volume. The created image data file is recorded on a memory card 30 via the system bus SB and the memory card controller 26.

In the playback mode, the image processing unit 14 receives, via the system bus SB, image data read from the memory card 30 through the memory card controller 26, decompresses it by reversing the procedure used for recording to obtain image data of the original size, and outputs this data to the display unit 15 via the system bus SB for display.

The memory card controller 26 is connected to the memory card 30 via a card connector 31. The memory card 30 is detachably attached to the digital camera 10 and serves as its recording medium, a memory for recording image data and the like. It contains a flash memory, which is a nonvolatile memory that is electrically rewritable in units of blocks, and a drive circuit for the flash memory.

The USB interface 27 handles the transmission and reception of data when the digital camera 10 is connected to an external device, such as a personal computer, via a USB connector 32.

Next, the operation of the above embodiment is described.

The operations described below are performed when the moving image start/stop key is operated in the shooting mode to start shooting a moving image; the CPU 19 reads the operation programs and data stored in the program memory 21, expands and stores them in the main memory 20, and then executes them.

The operation programs and data stored in the program memory 21 include not only those stored there when the digital camera 10 was shipped from the factory, but also new operation programs and data that are downloaded from outside and stored, for example when the digital camera 10 is upgraded, by connecting the digital camera 10 to a personal computer via the USB connector 32.

FIG. 2 shows the processing in the speaker registration mode, in which the user of the digital camera 10 registers in advance, as the speaker, the calling voice that serves as the trigger information for shooting. In this processing, the speaker's voice and the utterance content (word), specifically the pet's name, are registered.

At the outset, the speaker name is registered (step P101). The name can be entered, for example, by displaying katakana characters on the display unit 15 and operating the cursor keys and the set key of the operation unit 22. The CPU 19 accepts the text data of the entered speaker name and stores it in the program memory 21.

The CPU 19 then causes the display unit 15 to display a guide message that lists several predetermined words and prompts the user to say one of them aloud (step P102).

The CPU 19 determines whether there has been, in response to this guide message, a voice input at or above a certain sound pressure level (step P103). If not, it returns to step P102 and waits for a voice input while continuing to display the guide message.

When there is a voice input, this is detected in step P103, and the voice data input from the microphones 16L and 16R is analyzed by the voice recognition unit 17a in the audio processing unit 17 (step P104). This analysis extracts the acoustic features (acoustic pattern) of the voice, which differ from person to person and are also called a "voiceprint". The CPU 19 stores the acquired acoustic pattern data in the program memory 21 as first voice trigger data (called "trigger 1" in the figure) in association with the speaker name data (step P105).

Next, a guide message prompting the user to say the pet's name aloud is displayed on the display unit 15 (step P106).

The CPU 19 determines whether there has been, in response to this guide message, a voice input at or above a certain sound pressure level (step P107). If not, it returns to step P106 and waits for a voice input while continuing to display the guide message.

When there is a voice input, the CPU 19 detects this in step P107 and causes the voice recognition unit 17a in the audio processing unit 17 to analyze the voice data input from the microphones 16L and 16R (step P108).

Specifically, this analysis obtains an acoustic model that represents the frequency characteristics of each phoneme to be recognized. The acoustic model is represented, for example, by a hidden Markov model whose output probabilities are Gaussian mixture distributions.

The CPU 19 stores the acquired acoustic model data in the program memory 21 as second voice trigger data (called "trigger 2" in the figure) in association with the speaker name data and the first voice trigger data (step P109).

This completes the series of processes in the speaker registration mode of FIG. 2.
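The registration flow of FIG. 2 (steps P101 to P109) can be sketched as follows. This is only an illustration under stated assumptions: the `SpeakerRegistration` record, the `first_loud_frame` helper, and the two extractor callables are hypothetical names, and the real voiceprint and acoustic model extraction performed by the voice recognition unit 17a is replaced by caller-supplied functions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence

Frame = Sequence[float]


@dataclass
class SpeakerRegistration:
    """Hypothetical record kept in program memory for one speaker."""
    speaker_name: str
    trigger1: Optional[tuple] = None  # voiceprint of the speaker
    trigger2: Optional[tuple] = None  # acoustic model of the pet name


def first_loud_frame(frames: Iterable[Frame],
                     level_of: Callable[[Frame], float],
                     min_level: float) -> Frame:
    """Skip quiet input until a frame reaches the level (P103 / P107)."""
    for frame in frames:
        if level_of(frame) >= min_level:
            return frame
    raise ValueError("no voice input reached the required level")


def register_speaker(name, word_frames, pet_name_frames, level_of, min_level,
                     extract_voiceprint, extract_word_model):
    reg = SpeakerRegistration(speaker_name=name)                      # P101
    spoken = first_loud_frame(word_frames, level_of, min_level)       # P102-P103
    reg.trigger1 = tuple(extract_voiceprint(spoken))                  # P104-P105
    called = first_loud_frame(pet_name_frames, level_of, min_level)   # P106-P107
    reg.trigger2 = tuple(extract_word_model(called))                  # P108-P109
    return reg
```

Keeping both triggers in one record mirrors the description, where trigger 2 is stored in association with the speaker name and trigger 1.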

Next, with reference to FIG. 3, the processing during a shooting operation in the face recognition mode is described. In this mode, with the speaker registered as described above, shooting is performed on the basis of recognition of the pet's face.

For operation in the face recognition mode, it is assumed that feature data for the face of the user's pet, for example a cat, has been stored in the program memory 21 in advance.

When this mode is selected, the camera first waits for a voice input at or above a certain sound pressure level (step S101).

FIGS. 4 and 5 illustrate images of the subject that are captured by the solid-state image sensor 12 through the optical lens unit 11 and displayed on the display unit 15 as monitor images before the voice input is made. As these figures show, even when the face of the pet, which is the specific subject to be photographed, is within the imaging range, the face recognition function has not been activated and no face recognition processing is performed.

Suppose that, in this waiting state, the user calls out to the pet. When the CPU 19 determines in step S101 that this call is a voice input at or above the certain sound pressure level, the voice recognition unit 17a of the audio processing unit 17 analyzes the input voice and obtains its acoustic features (acoustic pattern) and acoustic model.

The CPU 19 then compares the acquired acoustic model data with the registered second voice trigger data and determines, according to whether the similarity is equal to or greater than a predetermined value, whether the input is the second voice trigger representing the pet's name (step S103).

If the CPU 19 determines that the acquired acoustic model data does not match the second voice trigger data, it treats the voice acquired in the preceding step S101 as not being a call to the pet being photographed and returns to step S101 to wait for a new voice input.

If the CPU 19 determines in step S103 that the acquired acoustic model data matches the second voice trigger data, it starts a timer inside the CPU 19 to measure a fixed period, for example 10 seconds, during which the specific subject in the monitor image is to be tracked (step S104).

The CPU 19 then compares the acquired acoustic features with the registered first voice trigger data and determines, according to whether the similarity is equal to or greater than a predetermined value, whether the voice is the user's own (step S105).

If the CPU 19 determines that the acquired acoustic features do not match the first voice trigger data, it treats the voice acquired in the preceding step S101 as not being a call by the user and returns to step S101 to wait for a new voice input.

If the CPU 19 determines in step S105 that the acquired acoustic features match the first voice trigger data, it concludes that the correct user has called the correct pet name.
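The two checks in steps S103 and S105 amount to a two-stage gate: the word must match trigger 2, and then the voice must match trigger 1. The sketch below illustrates only that control flow. Cosine similarity over plain feature vectors is an assumption standing in for the hidden Markov model and voiceprint comparisons of the embodiment, and the function names and the 0.9 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity measure (an assumption, not the patent's method)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def trigger_accepted(word_features, voice_features,
                     trigger2, trigger1, threshold=0.9):
    """Return True only when both the word and the speaker match."""
    # S103: is the spoken word the registered pet name (trigger 2)?
    if cosine_similarity(word_features, trigger2) < threshold:
        return False
    # (S104, starting the tracking timer, would happen here.)
    # S105: is the voice the registered user's own (trigger 1)?
    return cosine_similarity(voice_features, trigger1) >= threshold
```

Checking the word first means that unrelated speech is rejected before the speaker comparison is attempted, which matches the order of the steps in FIG. 3.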

At this point the CPU 19 performs AF (autofocus) processing and AE (automatic exposure) processing (step S106) and causes the face recognition unit 14a to perform face recognition processing on the resulting image data (step S107).

The CPU 19 then determines whether the pet's face has been recognized, according to whether the recognition result contains an image pattern whose similarity to the feature data of the pet cat's face stored in advance in the program memory 21 is at or above a certain level (step S108).

If it determines that the pet's face has not been recognized, the CPU 19 next determines whether the internal timer of the CPU 19, which started counting in step S104, has reached the fixed period (step S109).

After confirming that the timer has not yet reached the fixed period, the CPU 19 returns to the processing from step S106.

If the CPU 19 determines in step S109 that the internal timer has reached the fixed period, it returns to step S101 to wait for the next voice input and restart the processing.
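The timed search loop of steps S104 and S106 to S109 can be sketched as below. The callables `get_frame` and `recognize_face` are hypothetical placeholders for the AF/AE capture and for the face recognition unit 14a, and the injectable `clock` parameter is an assumption added so the loop can be exercised without waiting in real time.

```python
import time


def search_for_face(get_frame, recognize_face,
                    timeout_s=10.0, clock=time.monotonic):
    """Search frames for the registered face until the timer expires.

    Returns the recognized face, or None on timeout (back to S101).
    """
    deadline = clock() + timeout_s        # S104: start the timer
    while True:
        frame = get_frame()               # S106: AF/AE and capture
        face = recognize_face(frame)      # S107: face recognition
        if face is not None:              # S108: face recognized?
            return face
        if clock() >= deadline:           # S109: fixed period reached?
            return None
```

Because the timer is checked only after a failed attempt, at least one recognition pass always runs once the voice trigger has been accepted.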

If the CPU 19 determines in step S108 that the pet's face has been recognized, it thereafter locks onto and tracks that face pattern in the image and continues the AF processing. It also displays, at the position of the face pattern on the display unit 15, a frame indicating that the face has been recognized (step S110).

FIG. 6 illustrates a state in which the user of the digital camera 10 has called the pet's name, "Tama", and a focus frame FF1 indicating the face recognition result is superimposed on the face position of the pet image displayed on the display unit 15.

In the figure the focus frame FF1 is drawn with a broken line, but on the actual digital camera 10 the focus frame FF1 may be rendered as a rectangle with a cross line at its center, drawn with, for example, a solid green line.

In the display example of FIG. 6, a guide message GM1 with the character string "Detecting pet face!" at the lower left corner of the display unit 15 additionally indicates that the displayed focus frame FF1 is the result of recognizing the pet's face.

With the focus thus locked on the pet's face position and the focus frame displayed at that position on the display unit 15, it is determined from the face recognition result whether the shutter release timing has arrived, according to whether the face is in a predetermined orientation, for example facing front (step S111). This determination is made according to whether the similarity between the face recognition result and a preset arrangement pattern of the parts making up a pet's face exceeds a predetermined value.
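The similarity test of step S111 can be illustrated with a small sketch. The patent does not specify the similarity measure, the facial parts used, or the threshold, so everything here is an assumption: the template `FRONTAL_TEMPLATE`, the score `1 / (1 + mean distance)`, and the threshold `0.95` are hypothetical choices made only to show the shape of the comparison.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical frontal arrangement pattern: part positions normalized to the
# face bounding box (0..1 in both axes). Not taken from the patent.
FRONTAL_TEMPLATE: Dict[str, Point] = {
    "left_eye": (0.30, 0.40),
    "right_eye": (0.70, 0.40),
    "nose": (0.50, 0.65),
}


def arrangement_similarity(
    parts: Dict[str, Point], template: Dict[str, Point] = FRONTAL_TEMPLATE
) -> float:
    """Return a score in (0, 1]; 1.0 means every part sits on the template."""
    mean_dist = sum(
        math.dist(parts[name], pos) for name, pos in template.items()
    ) / len(template)
    return 1.0 / (1.0 + mean_dist)


def is_release_timing(parts: Dict[str, Point], threshold: float = 0.95) -> bool:
    """Release only when the similarity exceeds the predetermined value."""
    return arrangement_similarity(parts) > threshold
```

A face whose parts match the template scores 1.0 and passes, while a face turned to the side shifts the parts away from the template and lowers the score below the threshold.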

If it is determined here that the pet's face is not in the predetermined orientation and the shutter release timing has not arrived, the process proceeds to step S109 and, after confirming that the timer is still within the fixed time, returns to step S106.

If it is determined in step S111 that the pet's face is in the predetermined orientation, for example facing front, and the shutter release timing has arrived, shooting is executed based on the AF value and the AE value at that moment, capturing the pet's face as it faces front (step S112).

FIG. 7 illustrates a state in which the pet in the image shown on the display unit 15 is facing almost directly forward, the shutter release timing has arrived, and a focus frame FF2 different from the focus frame FF1 is displayed superimposed.

In the figure, the focus frame FF2 is drawn with a dash-dot line, but in the actual digital camera 10 the focus frame FF2 may be rendered as a rectangle with a cross line at its center, drawn in, for example, solid red lines.

In the display example of FIG. 7, a guide message GM1 with the character string "Pet face detection in progress!" at the lower left corner of the display unit 15 additionally indicates that the displayed focus frame FF1 is the recognition result for the pet's face. In addition, a shutter sound is generated in synchronization with the shooting timing from the speaker 18, which is embedded, for example, near the shutter release key, to notify the user that shooting has been executed.

The image data obtained by shooting is compressed by the image processing unit 14 in a predetermined format, for example, in the case of JPEG (Joint Photographic Experts Group), by applying DCT (Discrete Cosine Transform) and Huffman coding as the entropy coding, and is made into an image data file. The resulting image data file is recorded via the memory card controller 26 on the memory card 30, which is the recording medium of the digital camera 10 (step S113).
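As a rough illustration of the DCT stage mentioned above, the following sketch computes the two-dimensional DCT-II of one 8x8 block in plain Python. It is not the image processing unit 14's implementation; the function name `dct_2d` is hypothetical, and the level shift, quantization, and Huffman coding stages of JPEG are deliberately omitted.

```python
import math
from typing import List

N = 8  # JPEG operates on 8x8 blocks


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of an N x N block (same scaling as JPEG's FDCT)."""

    def alpha(k: int) -> float:
        # Normalization factor: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out
```

For a flat block, all energy lands in the single DC coefficient and every other coefficient is zero up to rounding, which is why smooth image regions compress well once the coefficients are quantized and entropy coded.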

This completes the series of processing for shooting and recording image data, and the process returns to step S101 to prepare for the next shot.

As described in detail above, according to this embodiment, the series of processes such as face recognition is not executed until the call to the pet, registered in advance by the user, is input as voice. Shooting can therefore be performed without missing the necessary timing while keeping power consumption as low as possible.

In the above embodiment, the device has an autofocus function, keeps the pet's face position in the image in focus at all times alongside the face recognition process, and files and records the image data obtained by shooting. As a result, the face recognition process can be performed accurately, and the time lag from detecting that the pet's face has turned to the predetermined direction until shooting and recording can be kept to a minimum.

Furthermore, in the above embodiment, the state in which the pet's face is turned toward the front of the camera is detected by face recognition and the shutter is then released. This makes effective use of face recognition technology to reliably capture the subject's expression large in the frame.

In the above embodiment, processing such as face recognition is started with the user's voice as a trigger. Shooting can therefore be performed casually through an intuitive, easy-to-understand user interface, particularly when shooting still images, for which no audio is recorded.

In particular, in the above embodiment the voice information is processed by both speech recognition and speaker recognition, so an input does not become trigger information unless both "who" spoke and "what" was said are recognized as correct. Wasteful power consumption is thus reliably avoided, while shooting can still be performed accurately without missing a photo opportunity, depending on the subject, such as a pet.
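The requirement that both "who" and "what" match can be sketched as a simple two-condition gate. The class `TriggerScores`, the function `is_trigger`, and the threshold values are hypothetical; the patent only states that the similarity must be equal to or higher than a predetermined value and does not say how the two scores are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerScores:
    phrase_similarity: float   # "what" was said, compared with the registered call
    speaker_similarity: float  # "who" said it, compared with the registered user


def is_trigger(
    scores: TriggerScores,
    phrase_threshold: float = 0.8,   # assumed value, not from the patent
    speaker_threshold: float = 0.8,  # assumed value, not from the patent
) -> bool:
    """Accept the input as a trigger only when both recognitions succeed."""
    return (
        scores.phrase_similarity >= phrase_threshold
        and scores.speaker_similarity >= speaker_threshold
    )
```

With this gate, the right phrase spoken by an unregistered person, or the registered user saying something else, does not start the face search, so the recognition pipeline stays idle.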

The above embodiment has been described for application to a digital camera that shoots still images, but the present invention is not limited to this and can equally be applied to any electronic device having a camera function, such as mobile phone terminals, PDAs (Personal Digital Assistants), electronic books, and mobile computers.

In addition, the present invention is not limited to the embodiment described above and can be modified in various ways at the implementation stage without departing from its gist. The functions executed in the embodiment may also be combined as appropriate wherever possible. The embodiment includes various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, even if some constituent elements are removed from all those shown in the embodiment, the resulting configuration can be extracted as an invention as long as the effect is obtained.

DESCRIPTION OF SYMBOLS 10 ... digital camera, 11 ... optical lens unit, 11A ... zoom lens, 11B ... lens barrel, 12 ... solid-state image sensor, 13 ... AGC and A/D conversion unit, 14 ... image processing unit, 14a ... face recognition unit, 15 ... display unit, 16L, 16R ... microphones, 17 ... audio processing unit, 18 ... speaker, 19 ... CPU, 20 ... main memory, 21 ... program memory, 22 ... operation unit, 22a ... zoom lever, 23 ... lens drive unit, 24 ... flash drive unit, 25 ... image sensor drive unit, 26 ... memory card controller, 27 ... USB interface, 28 ... DC motor (M) for lens, 29 ... flash unit, 30 ... memory card, 31 ... card connector, 32 ... USB connector, FF1, FF2 ... focus frames, GM1 ... guide message, SB ... system bus.

Claims

An imaging apparatus comprising:
imaging means for capturing an image;
first storage means for storing feature information in an image corresponding to a specific subject;
second storage means for storing trigger information for instructing the start of a search for the specific subject;
trigger input means for inputting trigger information;
determination means for determining whether the trigger information input by the trigger input means and the trigger information stored in the second storage means have a similarity equal to or higher than a predetermined value; and
search means for searching, when the determination means determines that the similarity is equal to or higher than the predetermined value, an image captured by the imaging means for the specific subject based on the feature information stored in the first storage means.

The imaging apparatus according to claim 1, further comprising:
automatic focusing means for causing the imaging means to focus on a subject that is a candidate for the feature information;
search control means for causing the search means to acquire an image focused by the automatic focusing means and to search it for the feature information; and
recording means for recording an image of the specific subject including the feature information found by the search control means.

The imaging apparatus according to claim 2, wherein
the search means determines whether the feature information directly faces the imaging means, and
the recording means records an image of the specific subject including the feature information determined by the search means to directly face the imaging means.

The imaging apparatus according to claim 1, wherein the trigger information is audio information.

The imaging apparatus according to claim 4, wherein the determination means determines whether the similarity is equal to or higher than the predetermined value by speech recognition processing and speaker recognition processing of the trigger information.

A subject search method for an imaging apparatus that captures an image, the method comprising:
a first storage step of storing feature information in an image corresponding to a specific subject;
a second storage step of storing trigger information for instructing the start of a search for the specific subject;
a trigger input step of inputting trigger information;
a determination step of determining whether the trigger information input in the trigger input step and the trigger information stored in the second storage step have a similarity equal to or higher than a predetermined value; and
a search step of searching, when the determination step determines that the similarity is equal to or higher than the predetermined value, a captured image for the specific subject based on the feature information stored in the first storage step.

A program causing a computer of an imaging apparatus that captures an image to function as:
first storage means for storing feature information in an image corresponding to the specific subject;
second storage means for storing trigger information for instructing the start of a search for the specific subject;
trigger input means for inputting trigger information;
determination means for determining whether the trigger information input by the trigger input means and the trigger information stored by the second storage means have a similarity equal to or higher than a predetermined value; and
search means for searching, when the determination means determines that the similarity is equal to or higher than the predetermined value, a captured image for the specific subject based on the feature information stored by the first storage means.