JP2007058630A - Image recognition device

Info

Publication number: JP2007058630A
Application number: JP2005243808A
Authority: JP
Inventors: Takashige Tanaka; 敬重田中; Kenji Fukazawa; 賢二深沢
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2005-08-25
Filing date: 2005-08-25
Publication date: 2007-03-08

Abstract

PROBLEM TO BE SOLVED: To provide an image recognition device capable of performing appropriate image recognition.

SOLUTION: The device comprises: a database that stores, for each of a plurality of target objects, information on the actual size of the target object together with its image feature amount; an object extraction means that receives a photographic image and extracts a predetermined area within the image as an object based on its features; a feature amount extraction means that extracts the image feature amount of the object; a photographic information acquisition means that acquires photographic information attached to the photographic image; a focus determination means that analyzes each object and determines whether its focus deviation is within a predetermined threshold; an actual length calculation means that, when the focus deviation of an object is within the predetermined threshold and the object is judged to be in focus, calculates the actual size of that object using the photographic information; and a candidate selection means that uses the actual size together with the image feature amount to select, from the database, a candidate target object representing the object as the content of the photographic image.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a technique for recognizing the contents of image data.

In recent years, as the recording capacity of imaging devices such as digital still cameras has increased, various techniques for retrieving a desired captured image from a large number of captured images have been studied. One such retrieval technique is to recognize "what appears as the subject of a captured image", that is, the image content. For example, Patent Document 1 below discloses a technique that extracts an object occupying a predetermined area from a captured image and recognizes the image content from the image feature amount of the extracted object. Patent Document 2 below discloses a technique that calculates the actual size of the subject of a captured image using the shooting information of that image and, using the calculated actual size as one criterion, extracts information on the subject from predetermined illustrated reference book data.

Patent Document 1: Japanese Patent No. 3008233
Patent Document 2: JP 2005-108027 A

However, although such image recognition techniques can recognize image content to some extent, recognition is sometimes difficult depending on the object concerned. For example, Patent Document 1 does not take the actual size of the object into account, so it is difficult to distinguish a "tiger" from a "cat", or a "real building" from a "model building". Patent Document 2 does calculate the actual size of the subject, but it does so regardless of the image state of the subject in the captured image. The calculated result is therefore of low reliability in terms of accuracy, and image recognition can still be difficult.

In view of the problem that appropriate image recognition is difficult, an object of the present invention is to provide an image recognition device that performs appropriate image recognition.

In view of the above problem, the image recognition device of the present invention adopts the following configuration. That is, it is an image recognition device for recognizing the content of a captured image, comprising: a database that stores, together with an image feature amount for each of a plurality of target objects, information on the actual size of each target object; object extraction means for receiving the captured image and extracting, based on features of the captured image, a predetermined area included in the captured image as an object; feature amount extraction means for extracting an image feature amount of the object; shooting information acquisition means for acquiring shooting information attached to the captured image; focus determination means for analyzing the extracted object and determining, for each object, whether the focus deviation of the object is within a predetermined threshold; actual length calculation means for calculating the actual size of each object using the shooting information when the focus deviation is within the predetermined threshold and the object is determined to be in focus; and candidate selection means for selecting from the database, using the calculated actual size together with the extracted image feature amount, a candidate target object that represents the object as the content of the captured image.

An image recognition method corresponding to the image recognition device of the present invention is a method for recognizing the content of a captured image, comprising: storing in a database, together with an image feature amount for each of a plurality of target objects, information on the actual size of each target object; receiving the captured image and extracting, based on features of the captured image, a predetermined area included in the captured image as an object; extracting an image feature amount of the object; acquiring shooting information attached to the captured image; analyzing the extracted object and determining, for each object, whether the focus deviation of the object is within a predetermined threshold; calculating the actual size of each object using the shooting information when the focus deviation is within the predetermined threshold and the object is determined to be in focus; and selecting from the database, using the calculated actual size together with the extracted image feature amount, a candidate target object that represents the object as the content of the captured image.

According to the image recognition device and the image recognition method of the present invention, the focus deviation of each object is evaluated, and the actual size is calculated for objects that are in focus. The calculated actual size is then used as one criterion in recognizing the object (that is, the captured image). Recognition accuracy can therefore be higher than with image recognition that relies only on the image feature amount of the object. In addition, because the actual size is calculated for objects that are in focus, the shooting information is put to effective use in image recognition. As a result, appropriate image recognition can be performed.

In the image recognition device having the above configuration, the shooting information acquisition means may acquire at least the subject distance and the focal length as the shooting information, and the actual length calculation means may obtain the image size, which is the size of the object on the image, and calculate the actual size of the object using the image size, the subject distance, and the focal length.

According to this image recognition device, the actual size of the object is obtained from its image size on the captured image, using the subject distance and focal length attached to the captured image as shooting information. Because this shooting information is already attached to the captured image, the actual size can be obtained relatively easily.

In the image recognition device having the above configuration, the shooting information acquisition means may further acquire, as the shooting information, information on the image sensor of the imaging device that captured the image, and the actual length calculation means may obtain the image size using the image sensor information.

According to this image recognition device, image sensor information is used to calculate the image size, and that information is acquired as part of the shooting information. Because the shooting information is already attached to the captured image, the image size can be obtained relatively easily.

In the image recognition device having the above configuration, the shooting information acquisition means may include a model database that stores the model name of each imaging device in association with its image sensor information, may acquire the model name of the imaging device in place of the image sensor information, and may use the acquired model name to obtain the image sensor information from the model database.

According to this image recognition device, the model name is acquired as shooting information, so that even when the image sensor information itself is not available in the shooting information, it can be obtained from the model name by referring to the model database. In other words, even if the image sensor information cannot be acquired directly, it can be acquired indirectly and used to obtain the image size.

In the image recognition device having the above configuration, the focus determination means may treat the predetermined threshold as a first threshold and hold a second threshold larger than the first threshold, and, when the focus deviation exceeds the first threshold but falls within the second threshold, may determine that the focus deviation is within an allowable range and that the object is not out of focus.

According to this image recognition device, even when the focus deviation does not fall within the first threshold, the object is regarded as in focus if the deviation falls within the second threshold, and its actual size is calculated. That is, the focus determination is given a certain tolerance, and the actual size is calculated even for objects that are slightly out of focus. This makes the actual size available as a criterion for image recognition in more cases and improves recognition accuracy.

In the image recognition device having the above configuration, when the actual size was calculated in a case where the focus deviation exceeds the first threshold and falls within the second threshold, the candidate selection means may select the candidate target object representing the object while allowing for a predetermined amount of error in the actual size information in the database.

According to this image recognition device, when the focus deviation exceeds the first threshold but falls within the second threshold, the actual size is calculated and an error margin is applied to the actual size information in the database. That is, candidates are selected with allowance for the error in the calculated actual size that results from the object being slightly out of focus. The actual size can therefore be used effectively as a criterion for image recognition.
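The two-threshold decision and the error margin described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the numeric scale of the focus deviation, and the 20 % margin are all assumptions chosen for the example.

```python
from enum import Enum


class Focus(Enum):
    IN_FOCUS = "in_focus"          # deviation within the first threshold
    TOLERABLE = "tolerable"        # beyond the first, within the second
    OUT_OF_FOCUS = "out_of_focus"  # beyond the second threshold


def classify_focus(focus_deviation, first_threshold, second_threshold):
    """Classify an object by its focus deviation (units are hypothetical)."""
    if first_threshold > second_threshold:
        raise ValueError("second threshold must not be smaller than the first")
    if focus_deviation <= first_threshold:
        return Focus.IN_FOCUS
    if focus_deviation <= second_threshold:
        return Focus.TOLERABLE
    return Focus.OUT_OF_FOCUS


def size_matches(actual_size_cm, size_range_cm, focus, margin=0.2):
    """Check a calculated size against a database range.

    For TOLERABLE objects the range is widened by `margin` (an assumed
    value) to allow for the calculation error caused by slight blur.
    For OUT_OF_FOCUS objects the size is not used as a criterion, so it
    never rules a candidate out.
    """
    if focus is Focus.OUT_OF_FOCUS:
        return True
    low, high = size_range_cm
    if focus is Focus.TOLERABLE:
        low, high = low * (1 - margin), high * (1 + margin)
    return low <= actual_size_cm <= high
```

With a database range of 20 cm to 50 cm, a calculated size of 55 cm is rejected for a sharply focused object but accepted for a slightly blurred one, because the widened range reaches 60 cm.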

In the image recognition device having the above configuration, the actual length calculation means may cancel the calculation of the actual size of the object when the object is determined to be out of focus, and the candidate selection means may then select the candidate target object representing the object from the database without using the actual size.

According to this image recognition device, when the object is out of focus, calculation of the actual size is cancelled and the actual size is not used for candidate selection. In other words, when the object is out of focus, the actual size is excluded from the criteria for image recognition. Because an actual size with a large calculation error is not used for candidate selection, appropriate image recognition can be performed.

In the image recognition device having the above configuration, when the object is determined to be out of focus, the actual length calculation means may perform processing that enhances the edges of the object before calculating the actual length.

According to this image recognition device, even for an object that is out of focus, edge enhancement sharpens the edges of the object and reduces the error in calculating the image size.

The present invention can also be implemented as a computer program and as a medium on which the computer program is recorded. Various computer-readable media can be used as the recording medium, such as a flexible disk, CD-ROM, DVD-ROM/RAM, magneto-optical disk, memory card, or hard disk.

Hereinafter, embodiments of the present invention will be described based on examples, in the following order.
A. Configuration of image recognition device:
B. Image recognition processing:
C. Actual length processing:
D. Modifications:

A. Configuration of image recognition device:
FIG. 1 is an explanatory diagram showing the schematic configuration of an image recognition device as one embodiment of the present invention. The image recognition device 10 mainly comprises an extraction unit 20 that extracts a predetermined area from captured image data as an object, an actual length processing unit 30 that obtains the actual size of the extracted object, a storage unit 40 that holds, among other things, information on the actual size of each of a plurality of target objects, and an object candidate selection unit 50 that selects candidate target objects representing the extracted object. By selecting a candidate target object that represents the object in a given piece of image data, the device recognizes the image content of that image data.

The extraction unit 20 consists of an object extraction unit 24 and a color recognition unit 26, and is connected to the actual length processing unit 30. The object extraction unit 24 receives the captured image data, extracts a line drawing (contour lines) from the image data, and treats it as an object. The object extraction unit 24 then determines the shape, position, orientation, and so on of the extracted object. The color recognition unit 26 detects the proportions of the colors used in the extracted object. In other words, the extraction unit 20 also extracts the image feature amounts of the objects extracted from the image data and outputs them to the actual length processing unit 30.

The actual length processing unit 30 comprises an Exif data acquisition unit 32, a focus checking unit 34, a sensor information acquisition unit 36, a model database 37, an actual length calculation unit 38, and so on, and is connected to the object candidate selection unit 50.

The Exif data acquisition unit 32 acquires the shooting information attached to the input image data. The image data handled in this embodiment is in the Exif format commonly used by imaging devices such as digital still cameras. Exif data is based on JPEG image data, with predetermined information such as the shooting information recorded at capture time and a thumbnail image embedded in a form that conforms to the Exif specification.

The shooting information includes various items such as the shooting date and time, exposure time, lens aperture value, shutter speed, ISO sensitivity, lens focal length, subject distance, manufacturer name and model name of the digital still camera, image width, image height, CCD pixel density, and shooting position obtained by GPS (Global Positioning System). Of these, the Exif data acquisition unit 32 acquires in particular the "lens focal length", "subject distance", "manufacturer name", "model name", "image width", "image height", and "CCD pixel density", and outputs them to the sensor information acquisition unit 36.

The focus checking unit 34 analyzes the extracted object and determines whether its focus deviation is within a predetermined range. In particular, when a plurality of objects have been extracted, it identifies the objects whose focus deviation is within the predetermined range as the objects to be processed. A photograph is normally taken with the main subject in focus, so selecting the objects whose focus deviation falls within the predetermined range allows the image content of the captured image to be recognized appropriately. The focus checking unit 34 outputs the objects so identified to the sensor information acquisition unit 36. The focus deviation determination is described later.

The sensor information acquisition unit 36 is connected to the model database 37 and the actual length calculation unit 38 as well as to the Exif data acquisition unit 32 and the focus checking unit 34 described above. It outputs to the actual length calculation unit 38 either the information received from the Exif data acquisition unit 32 or the sensor information of the imaging device that it obtains from the model database 37 on the basis of the received information.

The model database 37 stores, for a number of digital still camera models, the manufacturer name and model name in association with the sensor information for that model. The sensor information here means the "CCD size", that is, the width and height of the CCD serving as the image sensor of the model. For example, an entry such as "model name: CP-500, CCD size: 4.8 mm (W) × 3.6 mm (H)" is stored.

Even when the "CCD pixel density" cannot be acquired (the data is absent), the sensor information acquisition unit 36 receives the "manufacturer name" and "model name" from the Exif data acquisition unit 32 and looks up the corresponding "CCD size" in the model database 37. It then calculates the "CCD pixel density" from the acquired "CCD size" and the received "image width" and "image height".

Having obtained the "CCD pixel density" in this way, the sensor information acquisition unit 36 outputs the "lens focal length", "subject distance", and "CCD pixel density", together with the object received from the focus checking unit 34, to the actual length calculation unit 38. When the "CCD pixel density" is already included in the Exif data, the unit does not use the model database 37 and passes the "CCD pixel density" received from the Exif data acquisition unit 32 to the actual length calculation unit 38 as is.
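The fallback from Exif data to the model database can be sketched as follows. This is an illustration under stated assumptions: the dictionary keys, the manufacturer name "ExampleMaker", and the choice to express pixel density as pixels per millimetre along each axis are hypothetical; only the CP-500 sensor size comes from the text.

```python
# Model database: (manufacturer, model) -> CCD size in mm (width, height).
# The manufacturer name is a placeholder; the CP-500 size is from the text.
MODEL_DB = {("ExampleMaker", "CP-500"): (4.8, 3.6)}


def ccd_pixel_density(shooting_info, model_db):
    """Return (horizontal, vertical) pixel density in pixels per mm.

    Uses the density recorded in the shooting information when present;
    otherwise derives it from the CCD size looked up by model name and
    the recorded image width and height in pixels.
    """
    density = shooting_info.get("ccd_pixel_density")
    if density is not None:
        return density
    key = (shooting_info["maker"], shooting_info["model"])
    ccd_width_mm, ccd_height_mm = model_db[key]  # KeyError if model unknown
    return (shooting_info["image_width"] / ccd_width_mm,
            shooting_info["image_height"] / ccd_height_mm)
```

For a 2400 × 1800 pixel image from the CP-500 entry, this gives roughly 500 pixels per millimetre on both axes.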

The actual length calculation unit 38 calculates the actual size of the object using the object received from the sensor information acquisition unit 36 and the "lens focal length", "subject distance", and "CCD pixel density". Specifically, it analyzes the object and uses the "CCD pixel density" to obtain the size of the object on the image plane (the equivalent of the film), referred to as the image size X. From the image size X, the focal length f, and the subject distance W, it calculates the actual size Y of the object.

FIG. 2 is an explanatory diagram of the calculation used to obtain the actual size of an object. As illustrated, the image size X of object A on the image, the focal length f, the subject distance W, and the actual size Y of the object satisfy the relation X : f = Y : W. From this relation, the actual size Y of the object is calculated as Y = W · X / f.
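The calculation Y = W · X / f can be written out as a short sketch. The function names and the example values are assumptions for illustration; all lengths are taken in millimetres, and the pixel density is assumed to be in pixels per millimetre.

```python
def image_size_mm(object_extent_px, pixels_per_mm):
    """Image size X: extent of the object on the sensor, in mm."""
    return object_extent_px / pixels_per_mm


def actual_size_mm(image_size, subject_distance, focal_length):
    """Actual size Y from X : f = Y : W, i.e. Y = W * X / f (all in mm)."""
    if focal_length <= 0:
        raise ValueError("focal length must be positive")
    return subject_distance * image_size / focal_length


# Hypothetical example: an object spanning 500 px at 500 px/mm gives X = 1 mm.
# With f = 10 mm and W = 10 m (10000 mm), Y = 1000 mm, that is, 100 cm.
x = image_size_mm(500, 500.0)
y = actual_size_mm(x, 10000.0, 10.0)
```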

Note that the subject distance W stored in the Exif data is a value calculated from the imaging magnification M and the focal length f as W = (1 + 1/M) × f.
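For reference, the stored formula agrees with the thin-lens equation. The short derivation below is a sketch under assumptions not stated in the source: W is measured from the lens to the subject, b denotes the lens-to-image distance, and the magnification is M = b / W.

```latex
\begin{align}
\frac{1}{f} &= \frac{1}{W} + \frac{1}{b}, \qquad M = \frac{b}{W} \\
\frac{1}{f} &= \frac{1}{W}\left(1 + \frac{1}{M}\right) \\
W &= \left(1 + \frac{1}{M}\right) f
\end{align}
```

Under the same assumptions b = (1 + M) f, so for a distant subject (small M) the image distance b approaches f, which is why the proportion X : f = Y : W is a reasonable approximation.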

The actual length processing unit 30 outputs the actual size of the object calculated in this way to the object candidate selection unit 50, together with the image feature amounts of the object received from the extraction unit 20.

The object candidate selection unit 50 is connected to the storage unit 40, which holds a plurality of databases, and selects candidates that represent the object in the captured image from the received actual size and image feature amounts of the object. The databases in the storage unit 40 include an object database 44, which associates each target object with its shape, position, orientation, and so on, and a color recognition database 46, which associates each target object with the proportions of the colors that make it up. In this embodiment in particular, the storage unit also includes an object actual length database 42, which associates each target object with its size.

FIG. 3 is an explanatory diagram of the data stored in the object actual length database 42. As illustrated, the database stores physical objects (target objects) such as "tiger", "cat", "human", and "building (model)" together with the range of the actual length (size) of each. For example, the length of an adult "tiger" is set to about 80 cm to 200 cm, and that of an adult "cat" to about 20 cm to 50 cm.

The object candidate selection unit 50 refers to the storage unit 40 and its databases. Suppose, for example, that looking up the shape and color of the extracted object in the object database 44 and the color recognition database 46 yields "cat" or "tiger" as candidates for the object. If the actual size of the object is 100 cm, then reference to the object actual length database 42 indicates that the object is most likely a "tiger". The "tiger" selected in this way is output as the most likely candidate for the content of the object (the captured image).
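The narrowing step in this example can be sketched as a range lookup. The function name and data layout are assumptions; the two size ranges are the ones given for FIG. 3, and the rule that a candidate without a size entry is kept is a choice made for this sketch, not something the text specifies.

```python
# Size ranges in cm, taken from the description of FIG. 3.
SIZE_DB_CM = {"tiger": (80, 200), "cat": (20, 50)}


def select_by_size(candidates, actual_size_cm, size_db):
    """Narrow feature-based candidates using the calculated actual size.

    When no size is available (None), the candidates are returned
    unchanged. Candidates missing from size_db are kept, since size
    cannot rule them out (an assumption of this sketch).
    """
    if actual_size_cm is None:
        return list(candidates)
    selected = []
    for name in candidates:
        size_range = size_db.get(name)
        if size_range is None or size_range[0] <= actual_size_cm <= size_range[1]:
            selected.append(name)
    return selected
```

Calling `select_by_size(["cat", "tiger"], 100, SIZE_DB_CM)` leaves only "tiger", matching the example in the text.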

The image recognition device 10 configured as above is implemented as a computer on which a processing program realizing the processing of each unit in software has been installed. This computer (not shown) is a general-purpose machine that includes a CPU, ROM, RAM, hard disk, and so on, is connected to a keyboard, display, and the like, and is also connected to a digital still camera holding a plurality of captured images. The processing program is stored in the ROM of the computer, and the databases described above are stored in a storage area of the hard disk. The processing of each unit as executed by this computer (the image recognition processing) is described below.

The description here assumes that image data is input directly from a digital still camera. However, as long as the image data is in Exif format and carries the shooting information described above, it may instead be input from another imaging device, such as a digital video camera or a camera-equipped mobile phone, or from a recording medium such as a memory card or hard disk.

B. Image recognition processing:
FIG. 4 is a flowchart showing the flow of the image recognition processing. In response to a user instruction entered via the keyboard, the captured images stored in the digital still camera are shown on the display. The CPU executes this processing when the user selects one captured image on the display.

When the processing starts, the CPU receives the image data of the selected captured image and extracts contour lines from it (step S400). Specifically, it performs edge detection, which treats locations where the luminance value of the image data changes sharply as object boundaries and extracts the boundary lines, and then links the detected edges to extract contour lines. Extracting the contour lines in this way yields the objects in the image data together with image feature amounts such as their shape, position, and orientation. When more than one object is extracted, the image feature amounts are acquired for each object.
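The idea of marking locations where luminance changes sharply can be sketched with a simple forward-difference gradient. This is only an illustration of the principle: the text does not name a specific edge detector, the threshold value is an assumption, and linking the detected edges into contour lines is not covered here.

```python
def detect_edges(luma, threshold):
    """Mark pixels where luminance changes sharply.

    `luma` is a non-empty list of equal-length rows of luminance values.
    A pixel is an edge when the sum of the absolute differences to its
    right and lower neighbours reaches `threshold`. Border pixels reuse
    their own value for the missing neighbour.
    """
    height, width = len(luma), len(luma[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = luma[y][min(x + 1, width - 1)] - luma[y][x]
            dy = luma[min(y + 1, height - 1)][x] - luma[y][x]
            edges[y][x] = abs(dx) + abs(dy) >= threshold
    return edges
```

On an image whose left half is dark and right half is bright, only the column just left of the transition is marked.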

Next, the CPU executes color recognition processing on each object acquired from the contour lines (step S410). Specifically, it detects the proportion of each color making up the object as an image feature amount of the object.
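The color proportion described for step S410 can be illustrated with a minimal sketch. It assumes that each pixel of the object has already been mapped to a color label; how colors are quantized is not stated in the text, and the function name `color_ratios` is hypothetical.

```python
from collections import Counter


def color_ratios(pixel_labels):
    """Return the fraction of each color label among an object's pixels."""
    if not pixel_labels:
        return {}
    counts = Counter(pixel_labels)
    total = len(pixel_labels)
    return {color: n / total for color, n in counts.items()}


# An object made of three white pixels and one blue pixel.
ratios = color_ratios(["white", "white", "blue", "white"])
```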

Having acquired the objects and their image feature amounts, the CPU executes the actual length process, which obtains the actual size of an object (step S420). The actual length process calculates the actual size only for an object whose focus shift is within a predetermined range, that is, an object that is in focus. This process is described in detail later.

The CPU then determines what the object in the input image data is (that is, what the main subject is) from the calculated actual size of the object and its image feature amounts (step S430), and ends the image recognition processing. Specifically, referring to the databases on the hard disk, it extracts the target objects that appropriately represent the object and displays them on the display. When a plurality of target objects are judged appropriate, all of those candidates are displayed.

In the image recognition processing described above, the actual size of an in-focus object is used as one of the criteria for recognizing the object (that is, the captured image). Recognition accuracy can therefore be higher than with image recognition that relies only on the image feature amounts extracted from the object.

C. Actual length process:
FIG. 5 is a flowchart showing the flow of the actual length process, a subroutine of the image recognition processing. The CPU executes this process after the objects and image feature amounts have been acquired for the one input image data.

When the process starts, the CPU determines whether the one input image data contains Exif-format data (Exif data) (step S505). If it determines in step S505 that there is no Exif data (No), it exits to RETURN without calculating the actual length and goes back to step S430 in FIG. 4.

If, on the other hand, it determines in step S505 that Exif data is present (Yes), it extracts the Exif data (step S510). Specifically, it acquires the data recorded in the following Exif items: "lens focal length", "subject distance", "manufacturer name", "model name", "image width", "image height", and "CCD pixel density".

Next, the CPU executes the subject specifying process, which judges the focus shift of each acquired object and specifies the object to be processed (step S520). Specifically, it detects the width of the edges (contour lines) of the acquired object; if the edge width is within a preset threshold, the object is judged to be "in focus" (small focus shift), and if it exceeds the threshold, the object is judged to be "out of focus" (large focus shift). Through this focus judgment, an in-focus object is specified as the processing target. For example, when a plurality of objects have been acquired, the focus judgment is executed for each object, and among those that satisfy the focus threshold condition, the one most sharply in focus is specified as the processing target.
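The focus judgment of step S520 might look like the sketch below. The text does not say how the edge width is measured, so the measure used here is an assumption: for a monotonic luminance profile taken across a single edge, the width is the number of steps needed to pass from 10% to 90% of the luminance swing. The names `edge_width` and `pick_in_focus` are hypothetical.

```python
def edge_width(profile, low_frac=0.1, high_frac=0.9):
    """Estimate edge width as the number of steps a luminance profile needs to cross an edge."""
    lo, hi = min(profile), max(profile)
    span = hi - lo
    if span == 0:
        return 0                      # flat profile: no edge to measure
    low_cut = lo + low_frac * span
    high_cut = lo + high_frac * span
    # Samples strictly inside the transition band, plus the final step out of it.
    inside = sum(1 for v in profile if low_cut < v < high_cut)
    return inside + 1


def pick_in_focus(edge_widths, threshold):
    """Return the index of the sharpest object within the threshold, or None."""
    candidates = [(w, i) for i, w in enumerate(edge_widths) if w <= threshold]
    return min(candidates)[1] if candidates else None


sharp = edge_width([0, 0, 0, 100, 100, 100])         # abrupt transition
blurred = edge_width([0, 0, 20, 50, 80, 100, 100])   # gradual transition
target = pick_in_focus([blurred, sharp], threshold=3)
```

A narrow edge yields a small width and passes the threshold; a blurred edge spreads over more samples and is rejected.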

Having executed the subject specifying process, the CPU determines whether an in-focus object exists (step S535). That is, it determines whether one object that satisfies the focus threshold condition has been specified.

If it determines in step S535 that no in-focus object exists (No), that is, objects were acquired but none of them satisfies the focus threshold condition, it exits to RETURN without calculating the actual length and goes back to step S430 in FIG. 4.

If, on the other hand, it determines in step S535 that an in-focus object exists (Yes), it determines whether "CCD pixel density" information is available (step S545). Specifically, the information is judged to be absent when the "CCD pixel density" extracted from the Exif data is null (zero), and present otherwise.

If it determines in step S545 that CCD pixel density information is available (Yes), it proceeds to the actual length calculation (step S580).

If, on the other hand, it determines in step S545 that there is no CCD pixel density information (No), it determines whether "model name" information is available (step S555).

If it determines in step S555 that model name information is available (Yes), it executes the sensor information acquisition process (step S570). In this process, it refers to the model database 37 to obtain the "CCD size" from the "manufacturer name" and "model name", and calculates the "CCD pixel density" from the "CCD size", "image width", and "image height". After calculating the "CCD pixel density", the CPU proceeds to the actual length calculation (step S580).

If it determines in step S555 that there is no model name information (No), that is, no data is stored in the corresponding Exif field, it exits to RETURN without calculating the actual length and goes back to step S430 in FIG. 4. In this case the information specific to the digital still camera cannot be obtained, so the actual length is not calculated.
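The sensor information acquisition of step S570 could be sketched as below. Several points are assumptions made for illustration: the model database is a dictionary keyed by manufacturer and model name, it stores only the CCD width in millimetres, and the density is derived from the image width alone (the text also mentions the image height). The database entry is hypothetical, with a width chosen so that a 2048-pixel-wide image gives a density of 2048000/595 dpi.

```python
MM_PER_INCH = 25.4

# Hypothetical model database: (maker, model) -> CCD width in millimetres.
MODEL_DB = {("ExampleMaker", "EX-100"): 15.113}


def ccd_pixel_density(maker, model, image_width_px, model_db=MODEL_DB):
    """Return sensor pixels per inch, or None if the model is not in the database."""
    ccd_width_mm = model_db.get((maker, model))
    if ccd_width_mm is None:
        return None
    return image_width_px / (ccd_width_mm / MM_PER_INCH)


density = ccd_pixel_density("ExampleMaker", "EX-100", 2048)
```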

Having acquired the "lens focal length", "subject distance", and "CCD pixel density" and specified the in-focus object, the CPU executes the actual length calculation (step S580), exits to RETURN to end the process, and goes back to step S430 in FIG. 4. As shown in FIG. 2, the actual length calculation uses the CCD pixel density to calculate the image size X of the object, and then uses the focal length f and the subject distance W to calculate the actual size Y of the object.
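The branching of FIG. 5 (steps S505 to S580) can be summarized in one function with early returns. This is a sketch, not the embodiment's program: the dictionary keys, the `lookup_density` callback standing in for the model database 37, and the early return for a model missing from that database are all assumptions introduced for illustration.

```python
MM_PER_INCH = 25.4


def actual_length_process(exif, objects, focus_threshold, lookup_density):
    """Return (object_index, actual_size_mm), or None when no size is calculated."""
    if not exif:                                    # S505: no Exif data
        return None
    # S520 / S535: keep objects within the focus threshold, pick the sharpest.
    in_focus = [(obj["edge_width"], i) for i, obj in enumerate(objects)
                if obj["edge_width"] <= focus_threshold]
    if not in_focus:
        return None
    index = min(in_focus)[1]
    density = exif.get("ccd_pixel_density")         # S545: null or zero means absent
    if not density:
        if not exif.get("model"):                   # S555: no model name
            return None
        density = lookup_density(exif.get("maker"), exif["model"],
                                 exif["image_width"])   # S570
        if not density:                             # assumption: model not in database
            return None
    image_size = MM_PER_INCH * objects[index]["dots"] / density       # S580: X
    actual = exif["subject_distance"] * image_size / exif["focal_length"]  # S580: Y
    return index, actual
```

Every `None` corresponds to a path on which the flowchart exits to RETURN without an actual size, so that step S430 falls back on the image feature amounts alone.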

For example, as shown in FIG. 6, suppose that the Exif data of image data capturing "a milk carton on a desk" gives a subject distance of 2800 [mm], a lens focal length of 50 [mm], and a CCD pixel density of 2048000/595 [dpi], and that the "milk carton" object is detected to span 228 dots in the image. Taking 1 [in] as 25.4 [mm], the image size X is X = 25.4 × 595 / 2048000 × 228 = 1.68 [mm]. The actual size Y is then Y = 2800 × 1.68 / 50 ≈ 94.1 [mm].
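The same arithmetic can be checked with a few lines of code. The sketch below uses the relations given in the text, X = 25.4 × dots / density and Y = W × X / f; the function names are hypothetical. Carrying the unrounded X through gives Y of about 94.2 mm, while rounding X to 1.68 mm first gives 94.08 mm.

```python
MM_PER_INCH = 25.4


def image_size_mm(dots, pixel_density_dpi):
    """Size X of the object on the sensor, from its extent in dots."""
    return MM_PER_INCH * dots / pixel_density_dpi


def actual_size_mm(image_size, subject_distance_mm, focal_length_mm):
    """Actual size Y of the object, scaled by subject distance W over focal length f."""
    return subject_distance_mm * image_size / focal_length_mm


x = image_size_mm(228, 2048000 / 595)
y = actual_size_mm(x, 2800, 50)
print(round(x, 2), round(y, 1))
```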

The actual size of the object calculated in this way is used in the subject determination of step S430 in FIG. 4, so that appropriate image recognition is performed.

In the image recognition processing that includes the actual length process described above, image recognition takes the focus shift of each object into account. That is, the actual size is calculated and used for recognition only when the focus shift is within the predetermined threshold. This reduces the error involved in calculating the actual size of the object and, as a result, improves the accuracy of image recognition that uses the actual size.

When the focus shift is large, the actual length is not calculated, and the candidates for the object are determined from the image feature amounts of the object alone. For an object with a large focus shift, the subject distance and the image size are inaccurate, so the result of an actual length calculation would contain a large error, and using such a result could actually lower the recognition accuracy. Because this embodiment does not calculate the actual length in such cases, the actual length of an object is used for image recognition only where it is effective.

In this embodiment, step S520 uses a single threshold to judge whether the focus shift is within the predetermined range (in focus). However, the actual length calculation may also be executed for an object judged to be out of focus, as long as the focus shift is moderate. In this case, two thresholds, a first and a second (first threshold < second threshold), are used to judge the focus shift, and the actual length calculation is executed when the focus shift exceeds the first threshold but is within the second. A predetermined width δ (error margin) is then added to the range of actual lengths of each target object in the object actual length database 42 shown in FIG. 3, and in step S430 of FIG. 4 the subject is determined based on the range widened by δ.

Specifically, step S520 of the actual length process shown in FIG. 5 may be replaced with steps S520a to S520g shown in FIG. 7. That is, having acquired the Exif data, the CPU calculates the edge width α of a given object as its amount of focus shift (step S520a). The CPU then determines whether the edge width α is smaller than the first threshold Th1 (step S520b).

If it determines in step S520b that the edge width α is smaller than the first threshold Th1 (Yes), it temporarily stores the object as a candidate for the processing target (step S520e).

If, on the other hand, it determines in step S520b that the edge width α is equal to or greater than the first threshold Th1 (No), it determines whether the edge width α is smaller than the second threshold Th2 (step S520c). If it determines in step S520c that the edge width α is smaller than the second threshold Th2 (Yes), it attaches the condition that the object actual length database 42 be used with the error taken into account (that is, with the width δ added) (step S520d) and temporarily stores the object as a candidate for the processing target (step S520e). If it determines in step S520c that the edge width α is equal to or greater than the second threshold Th2 (No), the object is not treated as a candidate for the processing target.

Next, the CPU determines whether all of the objects have been processed (step S520f). When a plurality of objects are extracted in step S400 of FIG. 4, the CPU temporarily stores their number. The CPU also counts the number of times it has executed step S520a. When the count matches the stored number, the CPU judges that all objects have been processed.

If it determines in step S520f that not all objects have been processed (No), it returns to step S520a and judges the focus shift of the next object. If it determines in step S520f that all objects have been processed (Yes), it specifies, from among the stored candidates, the one with the smallest focus shift as the processing target (step S520g). The actual length calculation is then executed for the object specified in this way. Of course, there may be no object stored as a candidate in step S520e. In that case, the actual size is not used for image recognition.
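Steps S520a to S520g of FIG. 7 amount to a loop over the objects with two threshold tests. A minimal sketch follows, assuming the edge width α of each object has already been measured; the function name `select_target` and the boolean tolerance flag are hypothetical stand-ins for the condition attached in step S520d.

```python
def select_target(edge_widths, th1, th2):
    """Return (index, use_tolerance) for the sharpest usable object, or None."""
    candidates = []
    for index, alpha in enumerate(edge_widths):       # S520a, repeated until S520f is Yes
        if alpha < th1:                               # S520b: in focus
            candidates.append((alpha, index, False))  # S520e
        elif alpha < th2:                             # S520c: moderately out of focus
            candidates.append((alpha, index, True))   # S520d, then S520e
        # alpha >= th2: the object is not a candidate
    if not candidates:
        return None                                   # actual size will not be used
    alpha, index, use_tolerance = min(candidates)     # S520g: smallest focus shift
    return index, use_tolerance


# Th1 = 3, Th2 = 5: only the second object (alpha = 4) qualifies, with tolerance.
result = select_target([6, 4, 9], th1=3, th2=5)
```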

Executing this processing (the processing shown in FIG. 7) makes the actual size available as a criterion for image recognition even when the focus shift exceeds the first threshold, although with lower accuracy than when the focus shift is within the first threshold. The accuracy of image recognition can therefore be improved.
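The widened size range used in step S430 for such objects might be applied as in the sketch below. The database contents, the value of δ, and the function name `candidates_by_size` are hypothetical; the text only states that each target object has a range of actual lengths and that the range is widened by δ when the tolerance condition is attached.

```python
# Hypothetical size database: target object -> (min_mm, max_mm) of its actual length.
SIZE_DB = {
    "milk carton": (70.0, 200.0),
    "mug": (60.0, 120.0),
    "refrigerator": (500.0, 2000.0),
}


def candidates_by_size(actual_mm, use_tolerance, delta_mm, size_db=SIZE_DB):
    """List target objects whose size range (optionally widened by delta) contains actual_mm."""
    margin = delta_mm if use_tolerance else 0.0
    return sorted(name for name, (lo, hi) in size_db.items()
                  if lo - margin <= actual_mm <= hi + margin)


strict = candidates_by_size(125.0, use_tolerance=False, delta_mm=10.0)
relaxed = candidates_by_size(125.0, use_tolerance=True, delta_mm=10.0)
```

With the tolerance applied, a measured size slightly outside a stored range no longer excludes that target object, which reflects the lower confidence in a size measured from a moderately blurred edge.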

Furthermore, as shown in FIG. 8, edge enhancement may be executed when the focus shift exceeds the first threshold Th1 but is within a third threshold Th3. FIG. 8 is a flowchart in which steps S520c and S520d of FIG. 7 are replaced with steps S820c and S820d. As shown, if it is determined in step S520b that the edge width α is equal to or greater than the first threshold Th1 (No), the CPU determines whether the edge width α is smaller than the third threshold Th3 (step S820c). The third threshold Th3 is a predetermined threshold larger than the first threshold Th1 and may have the same value as the second threshold Th2.

If it is determined in step S820c that the edge width α is equal to or greater than the third threshold Th3 (No), the object is not treated as a candidate for the processing target. If, on the other hand, it is determined in step S820c that the edge width α is smaller than the third threshold Th3 (Yes), edge enhancement is executed (step S820d) and the object is treated as a candidate for the processing target.

Edge enhancement is processing that emphasizes edges (contour lines) using a predetermined filter. Sharpening the edges of the object by edge enhancement reduces the error in calculating the image size in step S580 of FIG. 5.
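The text does not name the filter. As one possible illustration, the sketch below applies a simple unsharp-mask style filter to a one-dimensional luminance profile: each interior sample is pushed away from the mean of its three-sample neighborhood. The filter choice, the `amount` parameter, and the clamping to 0 to 255 are assumptions.

```python
def sharpen(profile, amount=1.0):
    """Apply a 3-tap unsharp-mask style filter to a 1-D luminance profile."""
    out = list(profile)
    for i in range(1, len(profile) - 1):
        blurred = (profile[i - 1] + profile[i] + profile[i + 1]) / 3
        value = profile[i] + amount * (profile[i] - blurred)
        out[i] = min(255, max(0, value))   # keep luminance in the 8-bit range
    return out


# The bright side of the edge is boosted and the dark side stays at zero.
enhanced = sharpen([0, 0, 90, 90])
```

Flat regions are left unchanged, while samples next to a luminance step are pushed further apart, which increases the contrast at the edge.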

In this embodiment, the focus shift is judged by analyzing the object. However, when focus-related information is included in the Exif shooting information, the actual length process may use the focus information extracted from the shooting information instead. Specifically, the object located at the position that was focused on is specified as the processing target from the information on that position. Using the shooting information in this way speeds up the processing.

D. Modifications:
In this embodiment, the extraction of contour lines from the image data (that is, the acquisition of objects) (step S400) is executed first, and the subject specifying process based on the focus judgment (step S520) is executed afterward, even though both involve processing of object edges. Alternatively, the edge width may be detected together with the contour extraction in step S400, so that whether each object is in focus is judged at that point. This speeds up the image recognition processing.

Also, in this embodiment, when there are a plurality of objects, the one most sharply in focus is selected and the actual length is calculated for that object, but the number of objects to be processed is not limited to one. The actual length may be obtained for a plurality of objects, provided that each satisfies the focus condition.

In this embodiment, image recognition refers to the three databases provided in the storage unit 40, but further databases may be prepared and used for image recognition. For example, a habitat database may be prepared for target objects such as flowers and animals. Because the Exif data of a captured image may include GPS information on the shooting position, the habitat data can be compared with the GPS shooting position to determine what the object is. This further improves the accuracy of image recognition.
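A comparison of habitat data with the GPS shooting position could be as simple as the following sketch. The habitat database, its representation as latitude and longitude bounding boxes, the entries themselves, and the rule that candidates without habitat data are kept are all assumptions made for illustration.

```python
# Hypothetical habitat database: name -> (lat_min, lat_max, lon_min, lon_max) in degrees.
HABITAT_DB = {
    "alpine flower": (35.0, 37.0, 137.0, 139.0),
    "subtropical bird": (24.0, 28.0, 123.0, 129.0),
}


def filter_by_habitat(candidates, lat, lon, habitat_db=HABITAT_DB):
    """Drop candidates whose known habitat does not contain the shooting position."""
    kept = []
    for name in candidates:
        box = habitat_db.get(name)
        if box is None:
            kept.append(name)        # assumption: no habitat data, so do not exclude
            continue
        lat_min, lat_max, lon_min, lon_max = box
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            kept.append(name)
    return kept


remaining = filter_by_habitat(["alpine flower", "subtropical bird", "mug"], 36.0, 138.0)
```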

Furthermore, in this embodiment the databases are built on the hard disk of the computer, but they may instead be built on, for example, a server on a network. In this case, the network connection function of a general-purpose computer may be used. This allows large-capacity databases to be built.

Although an embodiment of the present invention has been described above, the present invention is in no way limited to this embodiment and can of course be implemented in various forms without departing from its gist. In this embodiment the image recognition processing is executed as a software program, but a hardware circuit including logic circuits that execute each of the above processes (steps) may be used instead. This reduces the load on the CPU and allows each process to be executed even faster.

FIG. 1 is an explanatory diagram showing an image recognition apparatus as one embodiment of the present invention.
FIG. 2 is an explanatory diagram of the calculation that obtains the actual size of an object.
FIG. 3 is an explanatory diagram of the contents of the data stored in the object actual length database.
FIG. 4 is a flowchart showing the flow of the image recognition processing.
FIG. 5 is a flowchart showing the flow of the actual length process.
FIG. 6 is a specific example of the processing that obtains the actual size.
FIG. 7 is a flowchart of the actual length process when a predetermined focus shift is tolerated.
FIG. 8 is a flowchart of the actual length process with edge enhancement added.

Explanation of symbols

10 ... Image recognition apparatus
20 ... Extraction unit
24 ... Object extraction unit
26 ... Color recognition unit
30 ... Actual length processing unit
32 ... Exif data acquisition unit
34 ... Focus collation unit
36 ... Sensor information acquisition unit
37 ... Model database
38 ... Actual length calculation unit
40 ... Storage unit
42 ... Object actual length database
44 ... Object database
46 ... Color recognition database
50 ... Object candidate selection unit
A ... Object
W ... Subject distance
X ... Image size
f ... Focal length

Claims

An image recognition apparatus for recognizing the content of a captured image, comprising:
a database that stores, together with an image feature amount for each of a plurality of target objects, information on the actual size of each target object;
object extraction means for inputting the captured image and extracting, as an object, a predetermined region contained in the captured image based on features of the captured image;
feature amount extraction means for extracting an image feature amount of the object;
shooting information acquisition means for acquiring shooting information attached to the captured image;
focus determination means for analyzing the extracted object and determining, for each object, whether the focus shift of the object is within a predetermined threshold;
actual length calculation means for calculating the actual size of each object using the shooting information when the focus shift is within the predetermined threshold and the object is determined not to be out of focus; and
candidate selection means for selecting, from the database, candidates for the target object that represents the object as the content of the captured image, using the calculated actual size together with the extracted image feature amount.

The image recognition apparatus according to claim 1, wherein
the shooting information acquisition means acquires at least information on a subject distance and a focal length as the shooting information, and
the actual length calculation means obtains an image size, which is the size of the object, and calculates the actual size of the object using the image size, the subject distance, and the focal length.

The image recognition apparatus according to claim 2, wherein
the shooting information acquisition means further acquires, as the shooting information, information on the imaging element of the imaging device that captured the image, and
the actual length calculation means obtains the image size using the information on the imaging element.

The image recognition apparatus according to claim 3, wherein
the shooting information acquisition means
includes a model database that stores model names of imaging devices in association with information on their imaging elements,
acquires the model name of the imaging device instead of the information on the imaging element, and
uses the acquired model name to obtain the information on the imaging element from the model database.

The image recognition apparatus according to any one of claims 1 to 4, wherein
the focus determination means
uses the predetermined threshold as a first threshold and has a second threshold larger than the first threshold, and
when the focus shift exceeds the first threshold but is within the second threshold, determines that the focus shift is within an allowable range and that the object is not out of focus.

The image recognition apparatus according to claim 5, wherein
when the actual size was calculated in a case where the focus shift exceeds the first threshold but is within the second threshold, the candidate selection means selects the candidates for the target object representing the object while allowing for a predetermined amount of error in the actual size information in the database.

The image recognition apparatus according to any one of claims 1 to 6, wherein
when the object is determined to be out of focus, the actual length calculation means cancels the calculation of the actual size of the object, and
the candidate selection means selects the candidates for the target object representing the object from the database without using the actual size.

The image recognition apparatus according to any one of claims 1 to 6, wherein
when the object is determined to be out of focus, the actual length calculation means executes processing that enhances the edges of the object before calculating the actual length.

An image recognition method for recognizing the content of a captured image, comprising:
storing in a database, together with an image feature amount for each of a plurality of target objects, information on the actual size of each target object;
inputting the captured image and extracting, as an object, a predetermined region contained in the captured image based on features of the captured image;
extracting an image feature amount of the object;
acquiring shooting information attached to the captured image;
analyzing the extracted object and determining, for each object, whether the focus shift of the object is within a predetermined threshold;
calculating the actual size of each object using the shooting information when the focus shift is within the predetermined threshold and the object is determined not to be out of focus; and
selecting, from the database, candidates for the target object that represents the object as the content of the captured image, using the calculated actual size together with the extracted image feature amount.

A computer program for controlling an image recognition apparatus that recognizes the content of a captured image, the program causing the image recognition apparatus to realize:
a function of storing in a database, together with an image feature amount for each of a plurality of target objects, information on the actual size of each target object;
a function of inputting the captured image and extracting, as an object, a predetermined region contained in the captured image based on features of the captured image;
a function of extracting an image feature amount of the object;
a function of acquiring shooting information attached to the captured image;
a function of analyzing the extracted object and determining, for each object, whether the focus shift of the object is within a predetermined threshold;
a function of calculating the actual size of each object using the shooting information when the focus shift is within the predetermined threshold and the object is determined not to be out of focus; and
a function of selecting, from the database, candidates for the target object that represents the object as the content of the captured image, using the calculated actual size together with the extracted image feature amount.

A recording medium on which the computer program according to claim 10 is recorded in a computer-readable manner.