JP2017199278A

JP2017199278A - Detection device, detection method, and program

Info

Publication number: JP2017199278A
Application number: JP2016091357A
Authority: JP
Inventors: Hiroaki Ono (小野 博明); Hidefumi Yamada (山田 英史); Tomoo Mitsunaga (光永 知生)
Original assignee: Sony Semiconductor Solutions Corp
Current assignee: Sony Semiconductor Solutions Corp
Priority date: 2016-04-28
Filing date: 2016-04-28
Publication date: 2017-11-02
Also published as: WO2017188017A1

Abstract

PROBLEM TO BE SOLVED: To detect a predetermined object. SOLUTION: A detection device comprises: an acquisition unit that acquires distance information on the distance to a subject; a setting unit that sets, from a feature quantity of an object to be detected, a region in which the object may be imaged; and a determination unit that determines whether an image in the region shows the object. Alternatively, the detection device comprises: the acquisition unit that acquires the distance information on the distance to the subject; a setting unit that uses the distance information to set a region in which a predetermined object may be imaged; and an estimation unit that estimates the category to which the object belongs on the basis of the size of the region and the distance information. This technique can be applied to a detection device that detects a predetermined object. SELECTED DRAWING: Figure 1

Description

The present technology relates to a detection device, a detection method, and a program, and more particularly to a detection device, a detection method, and a program that detect a predetermined subject from an image, for example.

It has been proposed to recognize a product by applying a pattern matching technique or an edge detection technique to a captured image to detect the boundary between the product and the background, and then cutting out the product region (see, for example, Patent Document 1).

It has also been proposed to recognize a search target by changing the resolution of the search image and executing a plurality of scans, so that subjects of all sizes can be recognized (see, for example, Patent Document 2).

Patent Document 1: JP 2016-31599 A
Patent Document 2: JP 2011-14148 A

It is desired to reduce the processing required to recognize (detect) a predetermined object.

The present technology has been made in view of such a situation, and makes it possible to reduce the processing required to recognize (detect) a predetermined object.

A first detection device according to one aspect of the present technology includes: an acquisition unit that acquires distance information on the distance to a subject; a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and a determination unit that determines whether an image in the region is the object.

A second detection device according to one aspect of the present technology includes: an acquisition unit that acquires distance information on the distance to a subject; a setting unit that uses the distance information to set a region in which a predetermined object may be imaged; and an estimation unit that estimates the category to which the object belongs from the size of the region and the distance information.

A first detection method according to one aspect of the present technology includes the steps of: acquiring distance information on the distance to a subject; setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and determining whether an image in the region is the object.

A second detection method according to one aspect of the present technology includes the steps of: acquiring distance information on the distance to a subject; using the distance information to set a region in which a predetermined object may be imaged; and estimating the category to which the object belongs from the size of the region and the distance information.

A first program according to one aspect of the present technology causes a computer to execute processing including the steps of: acquiring distance information on the distance to a subject; setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and determining whether an image in the region is the object.

A second program according to one aspect of the present technology causes a computer to execute processing including the steps of: acquiring distance information on the distance to a subject; using the distance information to set a region in which a predetermined object may be imaged; and estimating the category to which the object belongs from the size of the region and the distance information.

In the first detection device, detection method, and program according to one aspect of the present technology, distance information on the distance to a subject is acquired; a region in which an object to be detected may be imaged is set from the distance information and a feature amount of the object; and it is determined whether an image in the region is the object.

In the second detection device, detection method, and program according to one aspect of the present technology, distance information on the distance to a subject is acquired; the distance information is used to set a region in which a predetermined object may be imaged; and the category to which the object belongs is estimated from the size of the region and the distance information.

Note that the detection device may be an independent device, or may be an internal block constituting a single device.

The program can be provided by being transmitted via a transmission medium or by being recorded on a recording medium.

According to one aspect of the present technology, the processing required to recognize (detect) a predetermined object can be reduced.

Note that the effects described here are not necessarily limiting, and the effect may be any of the effects described in the present disclosure.

FIG. 1 is a diagram showing the configuration of one embodiment of a detection device to which the present technology is applied.
FIG. 2 is a flowchart for explaining the first recognition process.
FIG. 3 is a diagram for explaining processing related to the detection of a predetermined object.
FIG. 4 is a diagram showing an example of a table.
FIG. 5 is a diagram for explaining processing related to the detection of a predetermined object.
FIG. 6 is a diagram showing the configuration of the detection device in the second embodiment.
FIG. 7 is a flowchart for explaining the second recognition process.
FIG. 8 is a diagram for explaining processing related to the detection of a predetermined object.
FIG. 9 is a diagram showing the configuration of the detection device in the third embodiment.
FIG. 10 is a flowchart for explaining the third recognition process.
FIG. 11 is a diagram for explaining detection of the direction of a subject.
FIG. 12 is a diagram showing the configuration of the detection device in the fourth embodiment.
FIG. 13 is a flowchart for explaining the fourth recognition process.
FIG. 14 is a diagram for explaining detection of the direction of a subject.
FIG. 15 is a diagram showing the configuration of the detection device in the fifth embodiment.
FIG. 16 is a flowchart for explaining the fifth recognition process.
FIG. 17 is a diagram for explaining the detection of an object.
FIG. 18 is a diagram showing the configuration of the detection device in the sixth embodiment.
FIG. 19 is a flowchart for explaining the sixth recognition process.
FIG. 20 is a diagram showing the configuration of the detection device in the seventh embodiment.
FIG. 21 is a flowchart for explaining the seventh recognition process.
FIG. 22 is a diagram showing the configuration of the detection device in the eighth embodiment.
FIG. 23 is a flowchart for explaining the eighth recognition process.
FIG. 24 is a diagram for explaining a laminated structure.
FIG. 25 is a diagram for explaining a laminated structure.
FIG. 26 is a diagram for explaining a recording medium.

Hereinafter, modes for carrying out the present technology (hereinafter referred to as embodiments) will be described. The present technology can be applied to recognizing (detecting) a predetermined object, for example a person (face, upper body, or whole body), an automobile, a bicycle, or a foodstuff. The present technology detects such a predetermined object using distance information. The following description takes as an example the case where a human face is detected using distance information.

<First Embodiment>
FIG. 1 is a diagram illustrating the configuration of one embodiment of a detection device to which the present technology is applied. The detection device 100 illustrated in FIG. 1 includes a distance information acquisition unit 111, a subject feature extraction unit 112, a subject candidate area detection unit 113, and an actual size database 114.

The distance information acquisition unit 111 measures the distance to the subject, generates the measurement result (distance information), and outputs it to the subject feature extraction unit 112. The distance information acquisition unit 111 acquires the distance information with, for example, a distance measuring sensor that uses active light (infrared light or the like). As a distance measuring sensor using active light, a TOF (Time-of-Flight) method, a Structured Light method, or the like can be applied.

The distance information acquisition unit 111 may also be configured to acquire the distance information with a distance measuring sensor (ranging method) that uses reflected active light, for example light from a TOF light source or a camera flash. The distance information acquisition unit 111 may also be configured to acquire the distance information with a stereo camera, with an ultrasonic sensor, or by a method using a millimeter-wave radar.

From the distance information, the subject feature extraction unit 112 sets a frame in which the detection target, for example a human face, may be present. The frame is set by referring to a table stored in the actual size database 114. The actual size database 114 manages a table that associates a distance with the size on the image that follows from the actual size of the subject to be detected. For example, the table describes how large a human face appears on the image when the face is located a given distance away.

The subject candidate area detection unit 113 determines whether the detection target is present in the set frame. If it is, the subject candidate area detection unit 113 cuts out the image within the frame and outputs it to a subsequent processing unit (not shown).

The operation of the detection device 100 will be described with reference to the flowchart shown in FIG. 2.

In step S101, the subject feature extraction unit 112 sets the pixel to be processed (target pixel). For example, pixels are set as the target pixel one after another, starting from the upper left pixel of the image. As shown in the upper diagram of FIG. 3, when an image 131 (distance image 131) is acquired, the pixels from the upper left to the lower right of the distance image 131 are set as the target pixel in turn. In the upper diagram of FIG. 3, arrows indicate the order in which the target pixel is set, but the target pixel may be set in a different order.

The distance image 131 is an image generated from the distance information. For example, it is an image colored according to distance, in which equal distances are represented by the same color. Note that the distance image 131 in the present technology need not be colored according to distance; it is sufficient that the image indicates how far a given pixel (subject) in the image 131 is from the detection device 100.

Here, the description continues on the assumption that the distance image 131 is generated as shown in FIG. 3. As illustrated in the upper diagram of FIG. 3, the description takes as an example the case where a pixel at a given position in the distance image 131 is set as the target pixel 132.

In step S102, the distance information at the target pixel is acquired. In step S103, the size of the detection frame is determined from the distance and the actual size of the subject to be detected. As illustrated in the middle diagram of FIG. 3, the subject feature extraction unit 112 sets a detection frame 133 from the distance at the target pixel 132 and the actual size of the subject to be detected.

The subject feature extraction unit 112 sets the detection frame 133 by referring to the table managed in the actual size database 114. The actual size database 114 stores, for example, a table 151 such as that shown in FIG. 4.

The table 151 shown in FIG. 4 associates a distance with the size on the image that follows from the actual size of a face. For example, it records that when the distance is 0 (cm) the size of the face on the image is 30 pixels × 30 pixels, when the distance is 50 (cm) the size is 25 pixels × 25 pixels, and when the distance is 100 (cm) the size is 20 pixels × 20 pixels.

The distance is the distance between the detection device 100 and the subject (in this case, a human face). The actual size of the face refers to the size of an average human face located at a predetermined distance, for example 50 cm away. Human faces differ with gender and age and also vary between individuals, so the description here assumes that the actual size of the face is the size of an average human face.

Note that a table 151 in which a single distance is associated with several on-image sizes, each based on a different actual face size, may be created and used for the processing. For example, a single distance may be associated with the on-image size based on the actual size of a man's face, the on-image size based on the actual size of a woman's face, and the on-image size based on the actual size of a child's face. In this case, a detection frame 133 is set for each of these on-image sizes, and the process of step S104 described later is executed for each detection frame 133.

FIG. 4 shows an example in which the distances 0, 50, and 100 are associated with on-image sizes in steps of 50 cm. The step is not limited to 50 cm; it can be changed according to the accuracy of the distance information, the accuracy required of the detection, and so on.

In step S103, as shown in the middle diagram of FIG. 3, the subject feature extraction unit 112 sets the detection frame 133 from the distance at the target pixel 132 and the on-image size based on the actual size of the subject to be detected. For example, when the processing refers to the table 151 shown in FIG. 4 and the distance at the target pixel 132 is determined to be 50 cm, a detection frame 133 of 25 pixels × 25 pixels centered on the target pixel 132 is set.
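The lookup in step S103 can be sketched as follows. This is a minimal illustration, not the implementation of the detection device 100: the dictionary holds only the three rows quoted for the table 151, and the choice of the nearest tabulated distance, as well as the names `TABLE_151`, `frame_size_for`, and `detection_frame`, are assumptions made for the example.

```python
# Hypothetical contents of table 151: distance (cm) -> face size on the image (pixels).
# Only the three rows quoted in the text are included; a real table would hold more.
TABLE_151 = {0: (30, 30), 50: (25, 25), 100: (20, 20)}


def frame_size_for(distance_cm, table=TABLE_151):
    """Return the (width, height) registered for the nearest tabulated distance.

    Picking the nearest entry is an assumption; the text does not say how
    distances between table rows are handled.
    """
    nearest = min(table, key=lambda d: abs(d - distance_cm))
    return table[nearest]


def detection_frame(cx, cy, distance_cm, table=TABLE_151):
    """Return (left, top, width, height) of a frame centered on pixel (cx, cy)."""
    width, height = frame_size_for(distance_cm, table)
    return (cx - width // 2, cy - height // 2, width, height)


# Target pixel at (100, 80) whose measured distance is 50 cm.
print(detection_frame(100, 80, 50))  # (88, 68, 25, 25)
```

A table with several sizes per distance (man, woman, child) could map each distance to a list of sizes and produce one frame per size.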

Here, the case where the detection frame 133 is a quadrangle as illustrated in FIG. 3 is described as an example, but the detection frame 133 is not limited to a rectangle such as a quadrangle and may have another shape, such as a circle. Also, the description here takes the actual size of the subject as the feature (feature amount) of the subject and sets the detection frame 133 using the on-image size based on that actual size, but the detection frame 133 can also be set using another feature (feature amount) of the subject.

In this case, the subject feature extraction unit 112 thus functions as a setting unit that uses the size of the subject as the feature amount and sets the detection frame 133 according to that feature amount.

In this manner, a detection frame 133 corresponding to the size of a face at the given distance is set within the captured image 131. The detection frame 133 is a frame set by calculating how large the subject to be detected, for example a human face, would appear on the distance image 131 if it were located at the distance measured at the target pixel 132. Note that the calculation itself may be omitted, for example by recording the results in the table 151 in advance, and other forms can also be applied.
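One way to calculate the on-image size directly is a pinhole camera model, sketched below. The text does not state which model is used, so the formula, the function name `size_on_image_px`, and the focal length parameter are assumptions for illustration; the values quoted for the table 151 need not follow this model.

```python
def size_on_image_px(real_size_cm, distance_cm, focal_length_px):
    """Pinhole projection: size in pixels = focal length (px) * real size / distance.

    The pinhole model is an assumption of this sketch, not taken from the text.
    """
    if distance_cm <= 0:
        raise ValueError("distance must be positive")
    return focal_length_px * real_size_cm / distance_cm


# Hypothetical numbers: a 16 cm wide face, 100 cm away, focal length of 125 pixels.
print(size_on_image_px(16, 100, 125))  # 20.0
```

Precomputing this value for a set of distances gives a table of the same form as the table 151, which avoids repeating the calculation for every target pixel.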

In step S104, the subject candidate area detection unit 113 determines whether the image in the detection frame 133 is a candidate for the subject to be detected. For example, a filter of the same size as the detection frame 133 is applied to the distance image 131, and the response value is used as the probability value of a subject candidate. As the filter, a DOG (Difference-of-Gaussian) filter, a Laplacian filter, or the like can be applied.

Whether the image in the detection frame 133 is a candidate for the subject to be detected can be determined using the detection frame 133 and the distance information within the detection frame 133.

For example, when a human face is imaged in the detection frame 133, the distance information within the frame varies in depth, because a human face has unevenness. In contrast, when the face imaged in the detection frame 133 is a face in a photograph (poster), the distance information within the frame takes a constant value and does not vary in depth.

By applying a filter to detect this variation in depth, it is determined whether the subject to be detected is present in the detection frame 133. The determination result is output to a processing unit (not shown) downstream of the detection device 100. Note that the image within the detection frame 133 may be cut out from the image 131 and output only when it is determined that the subject to be detected is present in the detection frame 133.
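The filtering in step S104 can be sketched with a Difference-of-Gaussians kernel as follows. This is an illustrative sketch only: the kernel size, the two sigma values, and the helper names `dog_kernel` and `filter_response` are assumptions, and the kernel size is assumed to be odd. Because each Gaussian is normalized to sum to 1, the kernel sums to zero, so a flat depth region such as a poster gives a response near zero while a blob that stands out in depth does not.

```python
import math


def dog_kernel(size, sigma_small, sigma_large):
    """Build a size x size Difference-of-Gaussians kernel (size assumed odd)."""
    half = size // 2

    def gaussian(sigma):
        g = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
              for x in range(-half, half + 1)]
             for y in range(-half, half + 1)]
        total = sum(map(sum, g))
        return [[v / total for v in row] for row in g]  # normalize to sum 1

    narrow, wide = gaussian(sigma_small), gaussian(sigma_large)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(narrow, wide)]


def filter_response(depth, cx, cy, kernel):
    """Correlate the kernel with the depth image around pixel (cx, cy)."""
    half = len(kernel) // 2
    return sum(kernel[dy + half][dx + half] * depth[cy + dy][cx + dx]
               for dy in range(-half, half + 1)
               for dx in range(-half, half + 1))


# Hypothetical depth images in cm: a flat surface and a blob closer than its background.
flat = [[100.0] * 15 for _ in range(15)]
face = [row[:] for row in flat]
for y in range(5, 10):
    for x in range(5, 10):
        face[y][x] = 60.0

kernel = dog_kernel(9, 1.5, 3.0)
print(abs(filter_response(flat, 7, 7, kernel)))  # close to zero
print(abs(filter_response(face, 7, 7, kernel)))  # clearly non-zero
```

The absolute response could serve as the probability value of a subject candidate after suitable scaling; how the response is mapped to a probability is not specified in the text.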

For example, as shown in the lower diagram of FIG. 3, when it is determined that there is a face in the detection frame 133-1, that is, that the subject to be detected is present, the image in the detection frame 133-1 is cut out from the image 131 and output. When it is determined that there is no face in the detection frame 133-2, that is, that the subject to be detected is not present, the image in the detection frame 133-2 is not cut out.

The cutout can be performed after the processing in step S105 has finished. Alternatively, the determination result of step S104, in this case the value obtained by applying the filter, may be used as the probability value of a subject candidate, and the cutout may be performed on the basis of that probability value after the processing of step S105. A configuration in which only the probability value is output to the subsequent stage is also possible.

In this way, the subject candidate area detection unit 113 functions as a determination unit that determines whether the image in the detection frame 133 is the subject to be detected. An image determined by the subject candidate area detection unit 113 to be the subject to be detected can be cut out and output to a subsequent processing unit or the like.

In step S105, it is determined whether this processing has been completed for all pixels in the image 131. If it is determined that it has not been completed for all pixels, the processing returns to step S101, a new target pixel is set, and the processing from step S102 onward is performed on that target pixel.

On the other hand, if it is determined in step S105 that this processing has been completed for all pixels in the image 131, the recognition process ends.

By repeating the processing of steps S101 to S105, the probability value of a subject candidate is obtained for every pixel in the distance image 131. A local maximum of the probability values is then taken as the center position of a detected subject, the pixel at that center position is set as the target pixel 132, and the image in the corresponding detection frame 133 is cut out.
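Picking the center positions from the per-pixel probability values can be sketched as a local-maximum search. The 8-neighbor comparison and the threshold parameter are assumptions of this sketch; the text only states that local maxima of the probability values are used as center positions.

```python
def local_maxima(score, threshold):
    """Return (x, y) positions whose score exceeds the threshold and is
    strictly greater than all of its (up to 8) neighbors."""
    height, width = len(score), len(score[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            s = score[y][x]
            if s <= threshold:
                continue
            neighbours = [score[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))
                          if (nx, ny) != (x, y)]
            if all(s > n for n in neighbours):
                peaks.append((x, y))
    return peaks


# Hypothetical probability map with two isolated peaks.
score_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
print(local_maxima(score_map, 0.5))  # [(1, 1), (3, 3)]
```

Each returned position would then be used as the target pixel 132 around which the detection frame 133 is cut out.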

Because the detection frame 133 is set around the target pixel 132 in this way, the target pixel 132 need not be set at every pixel in the image 131.

For example, when the pixel at the upper left corner of the image 131 is taken as the target pixel 132, the detection frame 133 cannot be set properly: even if it were set, part of the frame (in this case three quarters) would be missing. Pixels in such a region may therefore be excluded from being set as the target pixel 132. Regions near the edges of the image 131 are likewise regions where the detection frame 133 cannot be set, so pixels in those regions may also be excluded from being set as the target pixel 132.

The target pixel 132 may be set one pixel at a time, or it may be set at a predetermined interval, for example every five pixels.

Pixels in a region determined to be far away, in other words a region that can be determined to be background, may also be excluded from being set as the target pixel 132. This reduces the number of target pixels 132 to be processed and therefore reduces the processing.
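The three ways of thinning out target pixels described above (skipping pixels whose frame would leave the image, stepping at an interval, and skipping background) can be sketched together. The background threshold, the default values, and the name `candidate_pixels` are assumptions for illustration; `frame_size_for` stands for any function that maps a distance to a frame size, such as a lookup in the table 151.

```python
def candidate_pixels(depth, frame_size_for, stride=5, background_cm=300):
    """Yield (x, y) positions worth processing as the target pixel.

    depth          -- 2D list of distances in cm (the distance image)
    frame_size_for -- callable: distance -> (width, height) of the detection frame
    stride         -- interval between candidate pixels (every `stride` pixels)
    background_cm  -- hypothetical threshold; farther pixels count as background
    """
    height, width = len(depth), len(depth[0])
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            distance = depth[y][x]
            if distance >= background_cm:
                continue  # treated as background
            fw, fh = frame_size_for(distance)
            if (x - fw // 2 < 0 or y - fh // 2 < 0
                    or x + fw // 2 >= width or y + fh // 2 >= height):
                continue  # the detection frame would be cut off by the image edge
            yield (x, y)


# Hypothetical 20 x 20 distance image at 100 cm with one far (background) pixel.
depth = [[100.0] * 20 for _ in range(20)]
depth[10][10] = 500.0
pixels = list(candidate_pixels(depth, lambda d: (5, 5)))
print(len(pixels))  # 8
```

Here only 8 of the 400 pixels are processed: the stride leaves 16 positions, 7 of them lie too close to the top or left edge for a 5 × 5 frame, and one is background.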

Such detection yields, for example, the detection result shown in FIG. 5. The upper diagram of FIG. 5 shows an example of the image 131 in which detection frames 133-1 to 133-4 have been set as regions where the detection target (a human face) may be present.

For the detection frame 133-1, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, the subject candidate area detection unit 113 determines that a face is present, so the image in the detection frame 133-1 is cut out and output.

For the detection frame 133-2, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, the subject candidate area detection unit 113 determines that no face is present, so the image in the detection frame 133-2 is not cut out.

For the detection frame 133-3, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, the subject candidate area detection unit 113 determines that a face is present, so the image in the detection frame 133-3 is cut out and output.

For the detection frame 133-4, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, even if a face appears in the frame, the subject candidate area detection unit 113 determines that no face is present when that face is in a photograph or the like, so the image in the detection frame 133-4 is not cut out.

As described above, the present technology detects an object using the distance and the size that the object to be detected has at that distance. With this detection, objects whose size at a given distance differs from the size expected of the detection target at that distance are excluded from detection, which lowers the possibility of erroneous detection.

In addition, when the detection target is, for example, a human face, an object with no variation in depth, such as a face in a photograph, is not erroneously detected, which also lowers the possibility of erroneous detection.

Furthermore, detection according to the present technology can reduce the processing compared with, for example, detection by pattern matching.

<Second Embodiment>
Next, a second embodiment will be described. FIG. 6 is a diagram illustrating a configuration example of a detection device 200 according to the second embodiment. Portions of the detection device 200 illustrated in FIG. 6 that are the same as those of the detection device 100 illustrated in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

The detection device 200 according to the second embodiment is configured by adding an imaging unit 211 and a subject detail recognition unit 212 to the detection device 100 according to the first embodiment.

The imaging unit 211 includes an imaging element such as a CCD or CMOS image sensor, captures an image formed by ambient light (referred to as a normal image), and supplies it to the subject detail recognition unit 212. The detection result from the subject candidate region detection unit 113 is also supplied to the subject detail recognition unit 212.

As described for the first embodiment, the subject candidate region detection unit 113 uses the distance image 131 to cut out and output a region determined to contain the detection target, for example a human face. The subject detail recognition unit 212 uses the normal image to perform more detailed recognition of the subject in the region supplied from the subject candidate region detection unit 113. For example, recognition processing that characterizes an individual, such as recognition of gender or age, is performed.

The processing of the detection device 200 illustrated in FIG. 6 will be described with reference to the flowchart shown in FIG. 7.

Steps S201 to S205 are performed by the distance information acquisition unit 111 through the subject candidate region detection unit 113 in the same manner as steps S101 to S105 of the flowchart shown in FIG. 2, and their description is therefore omitted.

In step S206, the subject detail recognition unit 212 performs detailed recognition using the detection frame of the subject candidate. For example, the subject detail recognition unit 212 sets the detection frame 133 supplied from the subject candidate region detection unit 113 on the corresponding region of the normal image from the imaging unit 211, and cuts out the image within the detection frame 133 thus set. Then, using the cut-out normal image, it executes preset recognition processing, such as recognition processing that characterizes an individual by the gender or age of the subject.
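
The cut-out in step S206 can be sketched as follows. This is a minimal illustration that assumes the distance image and the normal image are registered pixel for pixel, which the description does not state; `crop_frame`, `center`, and `half_size` are hypothetical names.

```python
def crop_frame(image, center, half_size):
    """Cut out the square frame around center=(row, col), clipped to the image bounds."""
    rows, cols = len(image), len(image[0])
    r, c = center
    top, bottom = max(0, r - half_size), min(rows, r + half_size + 1)
    left, right = max(0, c - half_size), min(cols, c + half_size + 1)
    # Slice the rows first, then the columns of each kept row.
    return [row[left:right] for row in image[top:bottom]]


# A 5x5 stand-in for the normal image; each value encodes its own position.
normal_image = [[r * 5 + c for c in range(5)] for r in range(5)]
patch = crop_frame(normal_image, (2, 2), 1)
```

The resulting `patch` would then be handed to whatever recognition processing has been preset.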

Processing in this way allows the subject to be detected in more detail.

The information supplied from the subject candidate region detection unit 113 to the subject detail recognition unit 212 can include the size on the image based on the actual size of the subject (the detection frame 133), a representative point (for example, the target pixel 132), and a distribution map (for example, a heat map or filter response values). The subject detail recognition unit 212 performs detailed recognition using the information supplied from the subject candidate region detection unit 113.

Such detection (recognition) processing yields, for example, the detection result shown in FIG. 8. The upper and middle parts of FIG. 8 are the same as FIG. 5. That is, as a result of the detection processing using the distance image 131, the detection frame 133-1 and the detection frame 133-3 are supplied to the subject detail recognition unit 212 as information on the regions in which the subject was detected.

The subject detail recognition unit 212 sets the detection frame 133-1 and the detection frame 133-3 on the normal image and executes recognition processing on the images cut out from the normal image, using a method such as a deep neural network (DNN; deep learning).

As described above, in the second embodiment as well, the detection target is detected using the distance and the size of the detection target at that distance, so that the detection accuracy can be improved and the processing load of detection can be reduced. Furthermore, since the second embodiment executes detailed recognition processing using the normal image (an image other than the distance image), the subject can be detected in more detail and recognized.

<Third Embodiment>
Next, a third embodiment will be described. FIG. 9 is a diagram illustrating a configuration example of a detection device 300 according to the third embodiment. Portions of the detection device 300 illustrated in FIG. 9 that are the same as those of the detection device 100 illustrated in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

The detection device 300 according to the third embodiment differs from the detection device 100 according to the first embodiment in that a subject direction detection unit 311 is added.

The subject direction detection unit 311 detects the direction in which the detected subject is facing. The detection device 300 according to the third embodiment thus detects the position, size, and direction of the subject.

The processing of the detection device 300 illustrated in FIG. 9 will be described with reference to the flowchart shown in FIG. 10.

Steps S301 to S306 (excluding step S305) are performed by the distance information acquisition unit 111 through the subject candidate region detection unit 113 in the same manner as steps S101 to S105 of the flowchart shown in FIG. 2, and their description is therefore omitted.

In step S305, the region that the subject candidate region detection unit 113 has determined to contain the subject to be detected (the region set by the detection frame 133) and the image cut out from that region are supplied to the subject direction detection unit 311. The subject direction detection unit 311 detects the direction of the detected subject.

Direction detection will be described taking as an example the case where an image such as that shown in FIG. 11 is acquired. In the example shown in FIG. 11, the detection target is a hand. When the subject feature extraction unit 112 and the subject candidate region detection unit 113 execute the processing of steps S302 to S304, the detection frame 133 is set in the distance image 131 and the hand, which is the detection target, is detected within the detection frame 133.

The subject direction detection unit 311 divides the inside of the detection frame 133 into regions of a predetermined size, treats each divided region as a surface of the subject, and obtains the normal direction of that surface. In the image shown in FIG. 11, the palm faces to the right in the figure. When the palm faces to the right, the distance information obtained for the palm becomes gradually farther from the near side toward the far side.

When a normal is set for the palm portion (surface) from which such distance information is obtained, a normal pointing to the right in the figure is set, as shown in FIG. 11. From this normal, it is determined that the palm faces to the right in the figure.
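
One way to obtain such a normal from a divided region of the distance image is sketched below. This is an illustration under assumptions, not the method of the embodiment: it averages neighbouring depth differences instead of fitting a plane, ignores the conversion from pixels to metric units, and uses the hypothetical name `patch_normal`.

```python
import math


def patch_normal(depth_patch):
    """Unit normal (nx, ny, nz) of a depth patch (at least 2x2), oriented toward the camera (nz < 0)."""
    rows, cols = len(depth_patch), len(depth_patch[0])
    # Mean depth change per pixel step to the right (x) and downward (y).
    dzdx = sum(row[c + 1] - row[c] for row in depth_patch
               for c in range(cols - 1)) / (rows * (cols - 1))
    dzdy = sum(depth_patch[r + 1][c] - depth_patch[r][c]
               for r in range(rows - 1) for c in range(cols)) / ((rows - 1) * cols)
    norm = math.sqrt(dzdx ** 2 + dzdy ** 2 + 1.0)
    # For a surface z = f(x, y), the camera-facing normal is (f_x, f_y, -1).
    return (dzdx / norm, dzdy / norm, -1.0 / norm)


# Depth grows toward the right, so the surface is turned toward the right of the image.
nx, ny, nz = patch_normal([[1.0, 1.1, 1.2], [1.0, 1.1, 1.2]])
```

Here `nx` is positive, which corresponds to a surface facing the right of the figure, while a patch of constant depth gives a normal pointing straight at the camera.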

In this way, using the distance information also makes it possible to determine the direction in which the subject is facing. According to the third embodiment, as in the first and second embodiments, the detection target is detected using the distance and the size of the detection target at that distance, so that the detection accuracy can be improved. In addition, the direction of the detected subject can be determined.

<Fourth Embodiment>
Next, a fourth embodiment will be described. FIG. 12 is a diagram illustrating a configuration example of a detection device 400 according to the fourth embodiment. Portions of the detection device 400 illustrated in FIG. 12 that are the same as those of the detection device 300 illustrated in FIG. 9 are denoted by the same reference numerals, and their description is omitted.

The detection device 400 according to the fourth embodiment is configured by adding an imaging unit 411 and a subject detail recognition unit 412 to the detection device 300 according to the third embodiment. The added imaging unit 411 and subject detail recognition unit 412 perform basically the same processing as the imaging unit 211 and the subject detail recognition unit 212 (both in FIG. 6) of the detection device 200 according to the second embodiment.

The imaging unit 411 captures a normal image and supplies it to the subject detail recognition unit 412. The detection result from the subject direction detection unit 311 is also supplied to the subject detail recognition unit 412. The subject direction detection unit 311 outputs the position of the detection target, for example a human face (the position at which the detection frame 133 is set), its size (the size of the detection frame 133), and its direction.

The subject detail recognition unit 412 uses the normal image to perform more detailed recognition of the subject in the region supplied from the subject direction detection unit 311. For example, recognition processing that characterizes an individual, such as recognition of gender or age, is performed.

The processing of the detection device 400 illustrated in FIG. 12 will be described with reference to the flowchart shown in FIG. 13.

Steps S401 to S406 are performed by the distance information acquisition unit 111, the subject feature extraction unit 112, the subject candidate region detection unit 113, and the subject direction detection unit 311 in the same manner as steps S301 to S306 of the flowchart shown in FIG. 10, and their description is therefore omitted.

In step S407, the subject detail recognition unit 412 performs detailed recognition using the detection frame of the subject candidate and the direction of the subject. For example, the subject detail recognition unit 412 sets the detection frame 133 supplied from the subject direction detection unit 311 on the corresponding region of the normal image from the imaging unit 411, and cuts out the image within the detection frame 133 thus set. Then, using the cut-out normal image, it executes preset recognition processing, such as recognition processing that characterizes an individual by the gender or age of the subject. This recognition processing also takes the direction of the subject into account.

FIG. 14 compares the recognition method performed by the detection device 400 according to the fourth embodiment with another recognition method. The left part of FIG. 14 shows an example of the other recognition method. For example, when a face is to be detected from a normal image, the detected object is first assumed to be a face, and a front-back/left-right determination dictionary 431 is referred to in order to determine whether the face is oriented along the front-back direction or along the left-right direction.

When the face is determined to be oriented along the front-back direction (not along the left-right direction), a front/back determination dictionary 432 is referred to, and it is determined whether the face is facing forward or backward. When the face is determined to be facing forward, a forward-facing dictionary 434 is referred to, and it is determined whether the object is a human face and, if so, whether it is a forward-facing face. If the forward-facing dictionary 434 contains data identifying individuals, the person is identified by matching against that data.

When the front/back determination dictionary 432 is referred to and the face is determined to be facing backward, a backward-facing dictionary 435 is referred to, and it is determined whether the object is a human face and, if so, whether it is a backward-facing face. If the backward-facing dictionary 435 contains data identifying individuals, the person is identified by matching against that data.

When the front-back/left-right determination dictionary 431 is referred to and the face is determined to be oriented along the left-right direction (not along the front-back direction), a left/right determination dictionary 433 is referred to, and it is determined whether the face is facing left or right. When the face is determined to be facing left, a left-facing dictionary 436 is referred to, and it is determined whether the object is a human face and, if so, whether it is a left-facing face. If the left-facing dictionary 436 contains data identifying individuals, the person is identified by matching against that data.

When the left/right determination dictionary 433 is referred to and the face is determined to be facing right, a right-facing dictionary 437 is referred to, and it is determined whether the object is a human face and, if so, whether it is a right-facing face. If the right-facing dictionary 437 contains data identifying individuals, the person is identified by matching against that data.

In this way, conventional recognition processing is performed by referring to a plurality of dictionaries and making a determination at each stage.

In the detection device 400 according to the fourth embodiment, the region in which the subject is present, its size, and its direction are detected from the distance image 131, and the subject detail recognition unit 412 (FIG. 12) performs recognition processing using that information. Accordingly, as shown in the right part of FIG. 14, recognition processing can be performed by preparing an X-direction dictionary 451 and referring to it.

The X-direction dictionary 451 is a dictionary that includes the forward-facing dictionary 434, the backward-facing dictionary 435, the left-facing dictionary 436, and the right-facing dictionary 437. Since the direction of the subject is also supplied to the subject detail recognition unit 412 (FIG. 12), the configuration can be such that only the dictionary for the supplied direction is referred to when recognition processing is performed.
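
The difference from the cascade on the left of FIG. 14 can be sketched as a single lookup keyed by the supplied direction. The direction labels and the string placeholders standing in for the dictionaries are hypothetical; the description does not specify how the X-direction dictionary 451 is stored.

```python
# Hypothetical stand-ins for the per-direction dictionaries grouped in the X-direction dictionary 451.
X_DIRECTION_DICTIONARY = {
    "front": "forward-facing dictionary 434",
    "back": "backward-facing dictionary 435",
    "left": "left-facing dictionary 436",
    "right": "right-facing dictionary 437",
}


def select_dictionary(direction):
    """Return the one dictionary matching the direction supplied with the detection result."""
    return X_DIRECTION_DICTIONARY[direction]


chosen = select_dictionary("left")
```

Because the direction arrives with the detection result, no determination dictionary has to be consulted before the matching step.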

According to the detection device 400 of the fourth embodiment, the number of dictionaries (the amount of data) can be reduced, and the determination processing that would otherwise be performed multiple times with reference to dictionaries can be omitted. The detection device 400 therefore reduces the processing required for recognition. In addition, only images that are likely to show the detection target, for example a face (that is, the cut-out images), are subjected to detailed recognition, so the region of the image to be processed is narrowed down. This also reduces the processing required for recognition.

As described above, in the fourth embodiment as well, the detection target is detected using the distance and the size of the detection target at that distance, so that the detection accuracy can be improved. Since the fourth embodiment executes detailed recognition processing using the normal image (an image other than the distance image), the subject can be detected in more detail and recognized. Moreover, because the direction of the subject is acquired before the recognition processing, the processing can be reduced.

<Fifth Embodiment>
Next, a fifth embodiment will be described. In the fifth embodiment, and in the sixth to eighth embodiments described below, the detection target is detected by estimating the size of the subject and then estimating the category to which the subject belongs.

FIG. 15 is a diagram illustrating a configuration example of a detection device 500 according to the fifth embodiment. The detection device 500 shown in FIG. 15 includes the distance information acquisition unit 111, a subject size estimation unit 511, and a subject category estimation unit 512.

The distance information acquisition unit 111 has the same configuration as, for example, the distance information acquisition unit 111 included in the detection device 100, and has the function of acquiring distance information for generating the distance image 131.

The subject size estimation unit 511 estimates the size of the subject and supplies information on the estimated size to the subject category estimation unit 512. The subject category estimation unit 512 estimates the category to which the subject belongs from the estimated size of the subject and the distance at which the subject is located.

For example, as described above, the size that a human face has when it is located a given distance away is known. In other words, when an object of a size that can be judged to be a human face is present at a position a given distance away, it can be estimated that a human face is there.

Using this, the detection device 500 estimates the size of the subject and, from that size and the distance, determines the category to which the subject belongs, for example a human face category or a car category.

The processing of the detection device 500 illustrated in FIG. 15 will be described with reference to the flowchart shown in FIG. 16.

In step S501, the subject size estimation unit 511 sets a target pixel. This processing can be performed, for example, in the same manner as step S101 of the flowchart shown in FIG. 2.

In step S502, the subject size estimation unit 511 acquires the distances around the position of the target pixel thus set. In step S503, the subject size estimation unit 511 estimates the subject size based on the surrounding distance distribution. For example, since the distance differs greatly between an object and the background, the region in which the object is present (its extent up to the edge) can be estimated by referring to the surrounding distance distribution and detecting the portions where the distance changes greatly (that is, the edges).

In step S504, the subject category estimation unit 512 estimates the subject category based on the distance and the subject size. As described above, the category of the object at a given position can be estimated from its distance and size, and that estimation is executed in step S504.

For example, suppose that a distance image 131 such as that shown in FIG. 17 is acquired. The distance image 131 shown in FIG. 17 is an image in which a hand is captured. When, for example, a given position on the hand is set as the target pixel 132, the distance distribution around the target pixel 132 is referred to.

The distance differs greatly between the hand and the background. In this case, the hand is at a short distance while the background is at a long distance. When the distance distribution is examined in directions moving gradually away from the target pixel 132, there are places where the distance changes greatly.

Since the target pixel 132 in this case is set at approximately the center of the palm, searching from the palm toward a fingertip shows the distance information changing abruptly where the tip of the finger gives way to the background. In FIG. 17, arrows indicate the extent from the target pixel 132 to each position where the distance information changes abruptly. A position where the distance information changes abruptly may be defined as the position of the pixel being searched at which the difference between the distance of the target pixel 132 and the distance of that pixel becomes equal to or greater than a predetermined threshold.

In this way, the range in which an object may be present is estimated from the target pixel 132. In the example shown in FIG. 17, for example, a circle or rectangle (not shown) whose radius extends from the target pixel 132 to the tip of the longest arrow is set, and the size of that circle or rectangle is taken as the subject size. This subject size corresponds to the detection frame 133 in the first to fourth embodiments. In other words, the detection frame 133 is set by this processing.
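
The search described above can be sketched as follows. This is a minimal illustration that searches only the four axis directions, which is an assumption (FIG. 17 may use more directions); `estimate_extent` and its parameters are hypothetical names.

```python
def estimate_extent(depth, target, threshold):
    """Longest run of pixels from target, over four directions, before the depth
    differs from the target pixel's depth by threshold or more (an edge)."""
    rows, cols = len(depth), len(depth[0])
    r0, c0 = target
    base = depth[r0][c0]
    longest = 0
    for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        r, c, steps = r0 + dr, c0 + dc, 0
        # Walk outward while still inside the image and on the same surface.
        while 0 <= r < rows and 0 <= c < cols and abs(depth[r][c] - base) < threshold:
            steps += 1
            r, c = r + dr, c + dc
        longest = max(longest, steps)
    return longest


# Background at 3.0 m with a 3x5 object at 0.5 m occupying rows 1-3, columns 1-5.
depth_image = [[3.0] * 7 for _ in range(5)]
for row in range(1, 4):
    for col in range(1, 6):
        depth_image[row][col] = 0.5

radius_px = estimate_extent(depth_image, (2, 3), threshold=1.0)
```

The returned radius, measured in pixels, plays the role of the longest arrow in FIG. 17 and would define the circle or rectangle used as the subject size.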

The category of the detected subject is then estimated from the distance of the target pixel 132 and the subject size (the detection frame 133). In the example shown in FIG. 17, a subject of the detected size at the distance of the target pixel 132 is estimated to belong to the category "hand".
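
Category estimation from distance and size can be sketched as follows. The pinhole conversion from pixels to metres, the focal length, and the size ranges in `CATEGORY_SIZE_RANGES_M` are all assumptions made for illustration; the description gives no numerical values.

```python
# Hypothetical real-world size ranges in metres for each category.
CATEGORY_SIZE_RANGES_M = (
    ("hand", 0.05, 0.30),
    ("person", 0.50, 2.20),
    ("car", 3.00, 6.00),
)


def estimate_category(size_px, distance_m, focal_px):
    """Convert the on-image size to a real size (pinhole model, assumed) and look up its category."""
    real_size_m = size_px * distance_m / focal_px
    for name, low, high in CATEGORY_SIZE_RANGES_M:
        if low <= real_size_m <= high:
            return name
    return None  # no known category has this size at this distance


category = estimate_category(size_px=50, distance_m=1.0, focal_px=500.0)
```

A single table of this kind covers several categories at once, which is how one pass over the distance image could report, for example, both people and cars.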

In step S505, it is determined whether such processing has been executed for all pixels in the distance image 131. As in step S105 of the flowchart shown in FIG. 2, the pixels set as the target pixel 132 need not be all the pixels in the distance image 131; some pixels may be excluded.

As described above, the detection device 500 according to the fifth embodiment can estimate the size of the subject from the distance image and then estimate its category. The detection device 500 can also estimate a plurality of subjects (categories) and can therefore detect different kinds of objects, for example a person and a car.

<Sixth Embodiment>
Next, a sixth embodiment will be described. FIG. 18 is a diagram illustrating a configuration example of a detection device 600 according to the sixth embodiment. Portions of the detection device 600 illustrated in FIG. 18 that are the same as those of the detection device 500 illustrated in FIG. 15 are denoted by the same reference numerals, and their description is omitted.

The detection device 600 according to the sixth embodiment is configured by adding an imaging unit 611 and a subject detail recognition unit 612 to the detection device 500 according to the fifth embodiment. The added imaging unit 611 performs basically the same processing as the imaging unit 211 (FIG. 6) of the detection device 200 according to the second embodiment.

The imaging unit 611 captures a normal image and supplies it to the subject detail recognition unit 612. The distance, the subject size, and the subject category are also supplied to the subject detail recognition unit 612 from the subject category estimation unit 512. Using the distance, subject size, and subject category supplied from the subject category estimation unit 512, the subject detail recognition unit 612 performs more detailed recognition on the normal image.

The processing of the detection device 600 illustrated in FIG. 18 will be described with reference to the flowchart shown in FIG. 19.

Steps S601 to S605 are performed by the distance information acquisition unit 111, the subject size estimation unit 511, and the subject category estimation unit 512 in the same manner as steps S501 to S505 of the flowchart shown in FIG. 16, and their description is therefore omitted.

In step S606, the subject detail recognition unit 612 performs detailed recognition using the subject candidate region and the subject category. For example, the subject detail recognition unit 612 sets a frame corresponding to the subject size supplied from the subject category estimation unit 512 on the corresponding region of the normal image from the imaging unit 611, and cuts out the image within the frame thus set.

Then, using the cut-out normal image and the subject category supplied from the subject category estimation unit 512, it narrows down the category and executes preset recognition processing, such as recognition processing that identifies an object belonging to that category. For example, when the category is determined to be a person, matching against images belonging to the person category is performed to identify the individual; when the category is determined to be a car, matching against images belonging to the car category is performed to identify the vehicle model.

このように、第６の実施の形態における検出装置６００によれば、距離画像から、被写体のサイズを推定し、被写体が属するカテゴリを推定することができる。また、検出装置６００によれば、複数の被写体（カテゴリ）を推定することができ、例えば、人と車といった異なる物体を検出することができる。さらに、検出した物体を、詳細に認識することができる。 Thus, according to the detection apparatus 600 in the sixth embodiment, the size of the subject can be estimated from the distance image, and the category to which the subject belongs can be estimated. Further, according to the detection apparatus 600, a plurality of subjects (categories) can be estimated, and for example, different objects such as a person and a car can be detected. Furthermore, the detected object can be recognized in detail.

＜第７の実施の形態＞ <Seventh embodiment>

次に、第７の実施の形態について説明を加える。図２０は、第７の実施の形態における検出装置７００の構成例を示す図である。図２０に示した検出装置７００と、図１５に示した検出装置５００において、同一の部分には、同一の符号を付し、その説明は省略する。 Next, the seventh embodiment will be described. FIG. 20 is a diagram illustrating a configuration example of the detection device 700 according to the seventh embodiment. In the detection device 700 illustrated in FIG. 20 and the detection device 500 illustrated in FIG. 15, the same portions are denoted by the same reference numerals, and their description is omitted.

第７の実施の形態における検出装置７００は、第５の実施の形態における検出装置５００に、被写体形状推定部７１１を追加し、被写体カテゴリ推定部７１２は、被写体形状推定部７１１からの出力を入力する構成とした点が、第５の実施の形態における検出装置５００と異なる。 The detection device 700 in the seventh embodiment differs from the detection device 500 in the fifth embodiment in that a subject shape estimation unit 711 is added and the subject category estimation unit 712 receives the output of the subject shape estimation unit 711 as an input.

被写体形状推定部７１１は、被写体の形状を推定する。図１７を再度参照する。図１７に示したように、手が撮像された距離画像１３１が取得されたとき、注目画素１３２から、距離情報が大きく変化する、すなわちエッジの部分までを探索することで、手の形状が得られる。 The subject shape estimation unit 711 estimates the shape of the subject. Refer again to FIG. 17. As shown in FIG. 17, when the distance image 131 in which a hand is captured is acquired, the shape of the hand is obtained by searching outward from the target pixel 132 up to the portion where the distance information changes greatly, that is, the edge.

被写体カテゴリ推定部７１２は、図１５に示した検出装置５００の被写体カテゴリ推定部５１２と基本的に同様の処理を行うが、図２０に示した被写体カテゴリ推定部７１２は、被写体形状推定部７１１からの推定された被写体の形状も用いてカテゴリの推定を行う。よって、より精度良くカテゴリの推定ができる。 The subject category estimation unit 712 performs basically the same processing as the subject category estimation unit 512 of the detection device 500 shown in FIG. 15, except that the subject category estimation unit 712 shown in FIG. 20 also uses the subject shape estimated by the subject shape estimation unit 711 when estimating the category. The category can therefore be estimated more accurately.

図２１に示したフローチャートを参照して、図２０に示した検出装置７００の処理について説明を加える。 With reference to the flowchart shown in FIG. 21, the processing of the detection apparatus 700 shown in FIG. 20 will be described.

ステップＳ７０１乃至Ｓ７０３は、距離情報取得部１１１、被写体サイズ推定部５１１により行われる処理であり、図１６に示したフローチャートのステップＳ５０１乃至Ｓ５０３と同様に行われるため、その説明は省略する。 Steps S701 to S703 are performed by the distance information acquisition unit 111 and the subject size estimation unit 511 in the same manner as steps S501 to S503 in the flowchart shown in FIG. 16, so their description is omitted.

ステップＳ７０４において、被写体形状推定部７１１は、注目画素１３２周辺の距離分布に基づいて被写体の形状を推定する。図１７を参照して説明したように、距離情報を用いて、距離が大きく変化する部分（エッジ）を探索することで、形状が推定される。換言すれば、距離がなだらかに変化している領域は、検出している物体の一部であるとし、そのような距離がなだらかに変化しているか否かを判定しながら、物体の形状が求められる。 In step S704, the subject shape estimation unit 711 estimates the shape of the subject based on the distance distribution around the target pixel 132. As described with reference to FIG. 17, the shape is estimated by using the distance information to search for the portion (edge) where the distance changes greatly. In other words, a region where the distance changes gently is regarded as part of the object being detected, and the shape of the object is obtained while determining whether the distance changes gently.
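
One way to read step S704 is as region growing on the distance image: starting from the target pixel, neighbouring pixels are added while the distance changes gently, and growth stops at edges where it changes greatly. The sketch below is an assumption-laden illustration of that reading, not the method of the specification; the name `estimate_shape`, the 4-neighbour connectivity, and the threshold `max_step` are all hypothetical choices.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def estimate_shape(depth: List[List[float]], seed: Pixel, max_step: float) -> Set[Pixel]:
    """Grow a region from the seed over pixels whose distance changes gently.

    A neighbour joins the region when its distance differs from the pixel it
    was reached from by at most max_step; larger jumps are treated as edges.
    """
    rows, cols = len(depth), len(depth[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-neighbourhood (assumed)
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region:
                continue
            if abs(depth[nr][nc] - depth[r][c]) <= max_step:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

The returned pixel set is the estimated silhouette; its bounding box could also serve as the subject candidate region.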

ステップＳ７０５において、被写体カテゴリ推定部７１２は、距離、被写体サイズ、形状に基づいて、被写体が属するカテゴリを推定する。この場合、距離と被写体サイズだけでなく、形状の情報も用いてカテゴリの推定が行われるため、より精度良くカテゴリの推定を行うことができる。 In step S705, the subject category estimation unit 712 estimates the category to which the subject belongs based on the distance, the subject size, and the shape. In this case, since the category is estimated using not only the distance and the subject size but also the shape information, the category can be estimated with higher accuracy.
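
The sketch below illustrates how distance, size, and shape could be combined in step S705. It assumes a pinhole camera model (real size = pixel extent × distance / focal length in pixels) and uses the width-to-height ratio as a crude stand-in for shape; neither assumption comes from the specification. The category table, its numeric ranges, and the names `real_size_m` and `estimate_category` are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, (min height m, max height m), (min aspect, max aspect)); values are illustrative only.
CATEGORY_TABLE: List[Tuple[str, Tuple[float, float], Tuple[float, float]]] = [
    ("hand", (0.10, 0.30), (0.3, 1.5)),
    ("person", (1.0, 2.2), (0.2, 0.8)),
    ("car", (1.2, 2.0), (1.5, 4.0)),
]


def real_size_m(extent_px: float, distance_m: float, focal_px: float) -> float:
    """Convert an extent in pixels to metres under the assumed pinhole model."""
    return extent_px * distance_m / focal_px


def estimate_category(
    width_px: float,
    height_px: float,
    distance_m: float,
    focal_px: float,
    table=CATEGORY_TABLE,
) -> Optional[str]:
    """Return the first category whose size and aspect ranges both match."""
    if min(width_px, height_px, distance_m, focal_px) <= 0:
        return None
    height_m = real_size_m(height_px, distance_m, focal_px)
    width_m = real_size_m(width_px, distance_m, focal_px)
    aspect = width_m / height_m  # crude shape descriptor (assumption)
    for name, (h_min, h_max), (a_min, a_max) in table:
        if h_min <= height_m <= h_max and a_min <= aspect <= a_max:
            return name
    return None
```

In this illustrative table a person and a car overlap in height, so size alone would be ambiguous; the shape term is what separates them, which mirrors the accuracy gain described above.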

第７の実施の形態における検出装置７００によれば、距離画像から、被写体のサイズを推定し、カテゴリを推定し、被写体の形状を推定することができる。また、検出装置７００によれば、複数の被写体（カテゴリ）を推定することができ、例えば、人と車といった異なる物体を検出することができる。 According to the detection apparatus 700 in the seventh embodiment, it is possible to estimate the size of the subject, estimate the category, and estimate the shape of the subject from the distance image. Further, according to the detection device 700, a plurality of subjects (categories) can be estimated, and for example, different objects such as a person and a car can be detected.

なお、被写体カテゴリ推定部７１２によるカテゴリの推定を省略し、被写体形状推定部７１１での被写体形状の推定結果が、後段の処理部（不図示）に出力されるようにしても良い。 Note that category estimation by the subject category estimation unit 712 may be omitted, and the subject shape estimation result by the subject shape estimation unit 711 may be output to a subsequent processing unit (not shown).

＜第８の実施の形態＞ <Eighth embodiment>

次に、第８の実施の形態について説明を加える。図２２は、第８の実施の形態における検出装置８００の構成例を示す図である。図２２に示した検出装置８００と、図２０に示した検出装置７００において、同一の部分には、同一の符号を付し、その説明は省略する。 Next, the eighth embodiment will be described. FIG. 22 is a diagram illustrating a configuration example of the detection device 800 according to the eighth embodiment. In the detection device 800 illustrated in FIG. 22 and the detection device 700 illustrated in FIG. 20, the same portions are denoted by the same reference numerals, and their description is omitted.

第８の実施の形態における検出装置８００は、第７の実施の形態における検出装置７００に、撮像部８１１と被写体詳細認識部８１２とを追加した構成とされている。追加された撮像部８１１は、第２の実施の形態における検出装置２００の撮像部２１１（図６）と基本的に同様の処理を行う。 The detection apparatus 800 in the eighth embodiment is configured by adding an imaging unit 811 and a subject detail recognition unit 812 to the detection apparatus 700 in the seventh embodiment. The added imaging unit 811 performs basically the same processing as the imaging unit 211 (FIG. 6) of the detection apparatus 200 in the second embodiment.

撮像部８１１は、通常画像を撮像し、被写体詳細認識部８１２に供給する。被写体詳細認識部８１２には、被写体カテゴリ推定部７１２から、距離、被写体サイズ、被写体カテゴリ、および被写体形状も供給される。被写体詳細認識部８１２は、被写体カテゴリ推定部７１２から供給された距離、被写体サイズ、被写体カテゴリ、および被写体形状を用いて、さらに詳細な認識を、通常画像を用いて行う。 The imaging unit 811 captures a normal image and supplies it to the subject detail recognition unit 812. The subject detail recognition unit 812 is also supplied with the distance, subject size, subject category, and subject shape from the subject category estimation unit 712. The subject detail recognition unit 812 performs more detailed recognition using the normal image using the distance, subject size, subject category, and subject shape supplied from the subject category estimation unit 712.

図２３に示したフローチャートを参照して、図２２に示した検出装置８００の処理について説明を加える。 With reference to the flowchart shown in FIG. 23, the processing of the detection apparatus 800 shown in FIG. 22 will be described.

ステップＳ８０１乃至Ｓ８０６は、距離情報取得部１１１、被写体サイズ推定部５１１、被写体形状推定部７１１、被写体カテゴリ推定部７１２により行われる処理であり、図２１に示したフローチャートのステップＳ７０１乃至Ｓ７０６と同様に行われるため、その説明は省略する。 Steps S801 to S806 are performed by the distance information acquisition unit 111, the subject size estimation unit 511, the subject shape estimation unit 711, and the subject category estimation unit 712 in the same manner as steps S701 to S706 in the flowchart shown in FIG. 21, so their description is omitted.

ステップＳ８０７において、被写体詳細認識部８１２は、被写体候補領域、被写体カテゴリ、および被写体形状を利用して詳細認識を行う。例えば、被写体詳細認識部８１２は、被写体カテゴリ推定部７１２から供給された被写体サイズに該当する枠を撮像部８１１からの通常画像の該当領域に設定し、その設定した枠内の画像を切り出す。 In step S807, the subject detail recognition unit 812 performs detailed recognition using the subject candidate region, the subject category, and the subject shape. For example, the subject detail recognition unit 812 sets a frame corresponding to the subject size supplied from the subject category estimation unit 712 in the corresponding region of the normal image from the imaging unit 811, and cuts out the image within that frame.

そして、切り出された通常画像と、被写体カテゴリ推定部７１２から供給された被写体カテゴリを用いて、カテゴリを絞り込んだ後、そのカテゴリに属する物体のうち、被写体形状に合う物体を特定するような認識処理など、予め設定されている認識処理を実行する。例えば、カテゴリが人と判定されている場合には、人に属する画像とのマッチングを行い、そのマッチングを行うとき、被写体形状を参照し、その形状に近い画像、例えば、形状が人の顔である場合、人の顔を対象とした認識に絞り込み、絞り込んだ後、個人を特定するといった処理が行われる。 Then, using the cut-out normal image and the subject category supplied from the subject category estimation unit 712, a preset recognition process is executed, such as narrowing down the category and then identifying, among the objects belonging to that category, an object that matches the subject shape. For example, when the category is determined to be a person, matching against images belonging to the person category is performed with reference to the subject shape; if the shape is that of a human face, recognition is narrowed down to human faces, and the individual is then identified.

このように、第８の実施の形態における検出装置８００によれば、距離画像から、被写体のサイズを推定し、カテゴリを推定し、被写体の形状を推定することができる。また、検出装置８００によれば、複数の被写体（カテゴリ）を推定することができ、例えば、人と車といった異なる物体を検出することができる。 Thus, according to the detection apparatus 800 in the eighth embodiment, the size of the subject, the category, and the shape of the subject can be estimated from the distance image. Further, the detection apparatus 800 can estimate a plurality of subjects (categories), and can detect different objects such as a person and a car, for example.

さらに、検出した物体を、詳細に認識することができる。その詳細な認識のとき、推定されている被写体のサイズ、カテゴリ、形状などの情報も用いて行うことができるため、詳細認識に係る処理を軽減することができる。 Furthermore, the detected object can be recognized in detail. Since the detailed recognition can be performed using information such as the estimated size, category, and shape of the subject, the processing related to the detailed recognition can be reduced.

第１乃至第８の実施の形態における検出装置によれば、距離画像から物体を検出することができる。例えば、物体として人を検出するようにした場合、本技術を監視カメラ等に適用できる。また、例えば、本技術をゲーム機に適用し、ゲームを行う人を検出し、その人のジェスチャーを検出する（手や、その手の向きなどを検出する）装置にも適用できる。 According to the detection apparatus in the first to eighth embodiments, an object can be detected from a distance image. For example, when a person is detected as an object, the present technology can be applied to a surveillance camera or the like. Further, for example, the present technology can be applied to a game machine, and can be applied to a device that detects a person who plays a game and detects a gesture of the person (detects a hand, a direction of the hand, and the like).

また、第１乃至第８の実施の形態における検出装置を、自動車に搭載し、人、自転車、自車以外の自動車などを検出し、検出した物体の情報を、ユーザに知らせたり、衝突しないように安全回避を行う制御を行ったりする装置の一部にも適用できる。 The detection devices in the first to eighth embodiments can also be applied as part of a device that is mounted on an automobile, detects people, bicycles, automobiles other than the host vehicle, and the like, and notifies the user of information on the detected objects or performs control to avoid a collision.

＜検出装置の適用例＞ <Application example of detection device>

第１乃至第８の実施の形態における検出装置を、複数の基板（ダイ）を積層した積層イメージセンサを採用することができる。ここでは、第２の実施の形態における検出装置２００（図６）を例に挙げて、検出装置２００を積層イメージセンサで構成した場合について説明する。 A stacked image sensor in which a plurality of substrates (dies) are stacked can be employed for the detection devices in the first to eighth embodiments. Here, taking the detection device 200 (FIG. 6) in the second embodiment as an example, the case where the detection device 200 is configured as a stacked image sensor will be described.

図２４は、図６の検出装置２００の全体を内蔵させた積層イメージセンサの第１の構成例を示す図である。図２４の積層イメージセンサは、画素基板９０１と信号処理基板９０２とが積層された２層構造になっている。 FIG. 24 is a diagram illustrating a first configuration example of a stacked image sensor in which the entire detection device 200 of FIG. 6 is incorporated. The stacked image sensor of FIG. 24 has a two-layer structure in which a pixel substrate 901 and a signal processing substrate 902 are stacked.

画素基板９０１には、距離情報取得部１１１（の一部）と撮像部２１１（の一部）が形成されている。距離情報取得部１１１を、ＴＯＦ方式で距離情報を得る場合、所定の光を被写体に照射する照射部と、その照射された光を受光する撮像素子とが含まれる構成とされる。この距離情報取得部１１１を構成する撮像素子の部分、または照射部などの部分も含めて、画素基板９０１に形成することができる。 The distance information acquisition unit 111 (or a part of it) and the imaging unit 211 (or a part of it) are formed on the pixel substrate 901. When the distance information acquisition unit 111 obtains distance information by the TOF method, it includes an irradiation unit that irradiates the subject with predetermined light and an imaging element that receives the irradiated light. The imaging element portion of the distance information acquisition unit 111, or that portion together with portions such as the irradiation unit, can be formed on the pixel substrate 901.

また撮像部２１１にも、通常画像を撮像するための撮像素子が含まれている。この撮像部２１１を構成する撮像素子の部分を、画素基板９０１に形成することができる。 The imaging unit 211 also includes an imaging element for capturing a normal image. The imaging element portion of the imaging unit 211 can be formed on the pixel substrate 901.

信号処理基板９０２には、被写体特徴抽出部１１２、被写体候補領域検出部１１３、実サイズデータベース１１４、および被写体詳細認識部２１２が形成されている。 A subject feature extraction unit 112, a subject candidate region detection unit 113, an actual size database 114, and a subject detail recognition unit 212 are formed on the signal processing substrate 902.

以上のように構成される図２４の積層イメージセンサでは、画素基板９０１の距離情報取得部１１１において、そこに入射する光を受光することにより撮像が行われ、その撮像により得られる画像（距離画像）から、検出対象とされた物体の検出が行われる。 In the stacked image sensor of FIG. 24 configured as described above, the distance information acquisition unit 111 on the pixel substrate 901 performs imaging by receiving light incident on it, and the object set as the detection target is detected from the image (distance image) obtained by that imaging.

また図２４の積層イメージセンサでは、画素基板９０１の撮像部２１１において、そこに入射する光を受光することにより撮像が行われ、その撮像により得られる画像（通常画像）から、検出対象とされた被写体の画像などが切り出され、出力される。 In the stacked image sensor of FIG. 24, the imaging unit 211 on the pixel substrate 901 also performs imaging by receiving light incident on it, and an image of the subject set as the detection target, for example, is cut out from the image (normal image) obtained by that imaging and output.

図２５は、図６の検出装置２００の全体を内蔵させた積層イメージセンサの第２の構成例を示す図である。 FIG. 25 is a diagram illustrating a second configuration example of the stacked image sensor in which the entire detection device 200 of FIG. 6 is incorporated.

なお、図中、図２４の場合と対応する部分については、同一の符号を付してあり、以下では、その説明は、適宜省略する。 In the figure, portions corresponding to those in FIG. 24 are denoted by the same reference numerals, and description thereof will be omitted below as appropriate.

図２５の積層イメージセンサは、画素基板９０１、信号処理基板９０２、および、メモリ基板９０３が積層された３層構造になっている。 The stacked image sensor of FIG. 25 has a three-layer structure in which a pixel substrate 901, a signal processing substrate 902, and a memory substrate 903 are stacked.

画素基板９０１には、距離情報取得部１１１と撮像部２１１が形成され、信号処理基板９０２には、被写体特徴抽出部１１２、被写体候補領域検出部１１３、および被写体詳細認識部２１２が形成されている。 A distance information acquisition unit 111 and an imaging unit 211 are formed on the pixel substrate 901, and a subject feature extraction unit 112, a subject candidate region detection unit 113, and a subject detail recognition unit 212 are formed on the signal processing substrate 902.

メモリ基板９０３には、実サイズデータベース１１４と画像記憶部９１１が形成されている。 The actual size database 114 and an image storage unit 911 are formed on the memory substrate 903.

図２５では、被写体候補領域検出部１１３による検出結果、例えば、検出対象の被写体が撮像されている距離画像から切り出された画像などを記憶する記憶領域として、画像記憶部９１１が、メモリ基板９０３に形成されている。また、テーブル１５１（図４）を記憶している実サイズデータベース１１４も、メモリ基板９０３に形成されている。 In FIG. 25, the image storage unit 911 is formed on the memory substrate 903 as a storage area for storing the detection result of the subject candidate region detection unit 113, for example, an image cut out from the distance image in which the subject to be detected is captured. The actual size database 114, which stores the table 151 (FIG. 4), is also formed on the memory substrate 903.

なお、図２５では、画素基板９０１、信号処理基板９０２、および、メモリ基板９０３は、上から、その順で積層されているが、その他、例えば、信号処理基板９０２とメモリ基板９０３との順番を入れ替えて、画素基板９０１、メモリ基板９０３、および、信号処理基板９０２の順で積層することができる。 In FIG. 25, the pixel substrate 901, the signal processing substrate 902, and the memory substrate 903 are stacked in that order from the top. However, for example, the order of the signal processing substrate 902 and the memory substrate 903 is changed. The pixel substrate 901, the memory substrate 903, and the signal processing substrate 902 can be stacked in this order.

また、積層イメージセンサは、２層や３層の基板の他、４層以上の基板を積層して構成することができる。 The stacked image sensor can also be configured by stacking four or more substrates, in addition to two or three substrates.
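
The two configuration examples can be summarised as a mapping from substrates to the units formed on them. The following sketch records the placements stated for FIG. 24 and FIG. 25 using the reference numerals of the description, and checks that every unit of the detection device 200 is placed exactly once. The dictionary layout and the helper names `substrate_of` and `covers_device` are assumptions made for this illustration.

```python
from typing import Dict, Optional, Set

# Units of the detection device 200 (reference numerals from the description).
DEVICE_200_UNITS: Set[int] = {111, 112, 113, 114, 211, 212}

# FIG. 24: two-layer structure.
FIG24_LAYOUT: Dict[int, Set[int]] = {
    901: {111, 211},            # pixel substrate
    902: {112, 113, 114, 212},  # signal processing substrate
}

# FIG. 25: three-layer structure; 911 is the image storage unit on the memory substrate.
FIG25_LAYOUT: Dict[int, Set[int]] = {
    901: {111, 211},       # pixel substrate
    902: {112, 113, 212},  # signal processing substrate
    903: {114, 911},       # memory substrate
}


def substrate_of(layout: Dict[int, Set[int]], unit: int) -> Optional[int]:
    """Return the substrate on which a unit is formed, or None if it is absent."""
    for substrate, units in layout.items():
        if unit in units:
            return substrate
    return None


def covers_device(layout: Dict[int, Set[int]], required: Set[int] = DEVICE_200_UNITS) -> bool:
    """Check that each required unit appears on exactly one substrate."""
    placed = [u for units in layout.values() for u in units]
    return all(placed.count(u) == 1 for u in required)
```

The only unit that moves between the two examples is the actual size database 114, which goes from the signal processing substrate 902 to the memory substrate 903.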

＜本技術を適用したコンピュータの説明＞ <Description of computer to which this technology is applied>

次に、検出装置１００乃至８００のそれぞれが行う一連の処理は、ハードウェアにより行うこともできるし、ソフトウェアにより行うこともできる。一連の処理をソフトウェアによって行う場合には、そのソフトウェアを構成するプログラムが、汎用のコンピュータ等にインストールされる。 Next, a series of processes performed by each of the detection apparatuses 100 to 800 can be performed by hardware or can be performed by software. When a series of processing is performed by software, a program constituting the software is installed in a general-purpose computer or the like.

図２６は、上述した一連の処理を実行するプログラムがインストールされるコンピュータの一実施の形態の構成例を示すブロック図である。 FIG. 26 is a block diagram illustrating a configuration example of an embodiment of a computer in which a program for executing the series of processes described above is installed.

プログラムは、コンピュータに内蔵されている記録媒体としてのハードディスク１００５やROM１００３に予め記録しておくことができる。 The program can be recorded in advance on a hard disk 1005 or a ROM 1003 as a recording medium built in the computer.

あるいはまた、プログラムは、リムーバブル記録媒体１０１１に格納（記録）しておくことができる。このようなリムーバブル記録媒体１０１１は、いわゆるパッケージソフトウエアとして提供することができる。ここで、リムーバブル記録媒体１０１１としては、例えば、フレキシブルディスク、CD-ROM(Compact Disc Read Only Memory)，MO(Magneto Optical)ディスク，DVD(Digital Versatile Disc)、磁気ディスク、半導体メモリ等がある。 Alternatively, the program can be stored (recorded) in the removable recording medium 1011. Such a removable recording medium 1011 can be provided as so-called package software. Here, examples of the removable recording medium 1011 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

なお、プログラムは、上述したようなリムーバブル記録媒体１０１１からコンピュータにインストールする他、通信網や放送網を介して、コンピュータにダウンロードし、内蔵するハードディスク１００５にインストールすることができる。すなわち、プログラムは、例えば、ダウンロードサイトから、ディジタル衛星放送用の人工衛星を介して、コンピュータに無線で転送したり、LAN(Local Area Network)、インターネットといったネットワークを介して、コンピュータに有線で転送することができる。 In addition to being installed on the computer from the removable recording medium 1011 as described above, the program can be downloaded to the computer via a communication network or a broadcast network and installed on the built-in hard disk 1005. That is, the program can be, for example, transferred wirelessly from a download site to the computer via an artificial satellite for digital satellite broadcasting, or transferred by wire to the computer via a network such as a LAN (Local Area Network) or the Internet.

コンピュータは、CPU(Central Processing Unit)１００２を内蔵しており、CPU１００２には、バス１００１を介して、入出力インタフェース１０１０が接続されている。 The computer includes a CPU (Central Processing Unit) 1002, and an input/output interface 1010 is connected to the CPU 1002 via a bus 1001.

CPU１００２は、入出力インタフェース１０１０を介して、ユーザによって、入力部１００７が操作等されることにより指令が入力されると、それに従って、ROM(Read Only Memory)１００３に格納されているプログラムを実行する。あるいは、CPU１００２は、ハードディスク１００５に格納されたプログラムを、RAM(Random Access Memory)１００４にロードして実行する。 When a command is input by the user operating the input unit 1007 or the like via the input/output interface 1010, the CPU 1002 executes the program stored in the ROM (Read Only Memory) 1003 accordingly. Alternatively, the CPU 1002 loads a program stored in the hard disk 1005 into a RAM (Random Access Memory) 1004 and executes it.

これにより、CPU１００２は、上述したフローチャートにしたがった処理、あるいは上述したブロック図の構成により行われる処理を行う。そして、CPU１００２は、その処理結果を、必要に応じて、例えば、入出力インタフェース１０１０を介して、出力部１００６から出力、あるいは、通信部１００８から送信、さらには、ハードディスク１００５に記録等させる。 The CPU 1002 thereby performs the processing according to the flowcharts described above or the processing performed by the configurations in the block diagrams described above. Then, as necessary, the CPU 1002, for example, outputs the processing result from the output unit 1006 via the input/output interface 1010, transmits it from the communication unit 1008, or records it on the hard disk 1005.

なお、入力部１００７は、キーボードや、マウス、マイク等で構成される。また、出力部１００６は、LCD(Liquid Crystal Display)やスピーカ等で構成される。 Note that the input unit 1007 includes a keyboard, a mouse, a microphone, and the like. The output unit 1006 includes an LCD (Liquid Crystal Display), a speaker, and the like.

ここで、本明細書において、コンピュータがプログラムに従って行う処理は、必ずしもフローチャートとして記載された順序に沿って時系列に行われる必要はない。すなわち、コンピュータがプログラムに従って行う処理は、並列的あるいは個別に実行される処理（例えば、並列処理あるいはオブジェクトによる処理）も含む。 Here, in the present specification, the processing performed by the computer according to the program does not necessarily have to be performed in time series in the order described as the flowchart. That is, the processing performed by the computer according to the program includes processing executed in parallel or individually (for example, parallel processing or object processing).

また、プログラムは、１のコンピュータ（プロセッサ）により処理されるものであっても良いし、複数のコンピュータによって分散処理されるものであっても良い。さらに、プログラムは、遠方のコンピュータに転送されて実行されるものであっても良い。 The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed there.

さらに、本明細書において、システムとは、複数の構成要素（装置、モジュール（部品）等）の集合を意味し、全ての構成要素が同一筐体中にあるか否かは問わない。したがって、別個の筐体に収納され、ネットワークを介して接続されている複数の装置、および、１つの筐体の中に複数のモジュールが収納されている１つの装置は、いずれも、システムである。 Furthermore, in this specification, a system means a set of a plurality of components (devices, modules (parts), etc.), regardless of whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.

なお、本技術の実施の形態は、上述した実施の形態に限定されるものではなく、本技術の要旨を逸脱しない範囲において種々の変更が可能である。 The embodiments of the present technology are not limited to the above-described embodiments, and various modifications can be made without departing from the gist of the present technology.

例えば、上述した検出装置１００乃至８００の各構成例は、可能な範囲で組み合わせることができる。 For example, the configuration examples of the detection devices 100 to 800 described above can be combined to the extent possible.

ここで、本技術は、１つの機能をネットワークを介して複数の装置で分担、共同して処理するクラウドコンピューティングの構成をとることができる。 Here, the present technology can take a configuration of cloud computing in which one function is shared by a plurality of devices via a network and is jointly processed.

また、上述のフローチャートで説明した各ステップは、１つの装置で実行する他、複数の装置で分担して実行することができる。 In addition, each step described in the above flowchart can be executed by being shared by a plurality of apparatuses in addition to being executed by one apparatus.

さらに、１つのステップに複数の処理が含まれる場合には、その１つのステップに含まれる複数の処理は、１つの装置で実行する他、複数の装置で分担して実行することができる。 Further, when a plurality of processes are included in one step, the plurality of processes included in the one step can be executed by being shared by a plurality of apparatuses in addition to being executed by one apparatus.

また、本明細書に記載された効果はあくまで例示であって限定されるものではなく、他の効果があってもよい。 The effects described in this specification are merely examples and are not limiting; other effects may also be obtained.

なお、本技術は、以下のような構成をとることができる。
（１）
被写体までの距離に関する距離情報を取得する取得部と、
前記距離情報と、検出対象とされる物体の特徴量から、前記物体が撮像されている可能性がある領域を設定する設定部と、
前記領域内の画像が、前記物体であるか否かを判定する判定部と
を備える検出装置。
（２）
前記設定部は、
前記物体の特徴量を、所定の距離における前記物体の大きさとし、
処理対象に設定した画素における距離に応じた前記物体の大きさに該当する枠を設定し、
前記判定部は、
前記枠内の画像が前記物体である否かを判定する
前記（１）に記載の検出装置。
（３）
前記物体が向いている方向を、前記距離情報から検出する方向検出部をさらに備える
前記（１）または（２）に記載の検出装置。
（４）
環境光による画像を撮像する撮像部と、
前記撮像部で撮像された前記画像と、前記設定部により設定された前記物体の大きさ、前記判定部により前記物体であると判定された領域内の画像、および前記方向検出部により検出された前記物体の方向の少なくとも１つを用いて、前記物体に対する詳細な認識を行う認識部と
をさらに備える前記（３）に記載の検出装置。
（５）
被写体までの距離に関する距離情報を取得する取得部と、
前記距離情報を用いて、所定の物体が撮像されている可能性がある領域を設定する設定部と、
前記領域の大きさと、前記距離情報とから、前記物体が属するカテゴリを推定する推定部と
を備える検出装置。
（６）
前記設定部は、
前記距離情報が変化する部分までを前記物体が撮像されている可能性がある領域として設定し、
前記推定部は、
前記領域内の前記距離情報が表す距離において前記領域の大きさとなる物体が属するカテゴリを、前記物体が属するカテゴリであると推定する
前記（５）に記載の検出装置。
（７）
前記設定部により設定された前記領域内の物体の形状を、前記距離情報を用いて推定する形状推定部をさらに備える
前記（５）または（６）に記載の検出装置。
（８）
前記推定部は、前記距離情報、前記領域の大きさ、前記形状の少なくとも１つを用いて前記カテゴリを推定する
前記（７）に記載の検出装置。
（９）
環境光による画像を撮像する撮像部と、
前記撮像部で撮像された前記画像と、前記設定部により設定された前記領域の大きさ、前記推定部により推定された前記カテゴリ、および前記形状推定部により推定された前記形状の少なくとも１つを用いて、前記物体に対する詳細な認識を行う認識部と
をさらに備える前記（７）に記載の検出装置。
（１０）
前記取得部は、ＴＯＦ方式のセンサ、ステレオカメラ、超音波センサ、またはミリ波レーダを用いて、前記距離情報を取得する
前記（１）乃至（９）のいずれかに記載の検出装置。
（１１）
被写体までの距離に関する距離情報を取得し、
前記距離情報と、検出対象とされる物体の特徴量から、前記物体が撮像されている可能性がある領域を設定し、
前記領域内の画像が、前記物体であるか否かを判定する
ステップを含む検出方法。
（１２）
被写体までの距離に関する距離情報を取得し、
前記距離情報を用いて、所定の物体が撮像されている可能性がある領域を設定し、
前記領域の大きさと、前記距離情報とから、前記物体が属するカテゴリを推定する
ステップを含む検出方法。
（１３）
コンピュータに、
被写体までの距離に関する距離情報を取得し、
前記距離情報と、検出対象とされる物体の特徴量から、前記物体が撮像されている可能性がある領域を設定し、
前記領域内の画像が、前記物体であるか否かを判定する
ステップを含む処理を実行させるためのプログラム。
（１４）
コンピュータに、
被写体までの距離に関する距離情報を取得し、
前記距離情報を用いて、所定の物体が撮像されている可能性がある領域を設定し、
前記領域の大きさと、前記距離情報とから、前記物体が属するカテゴリを推定する
ステップを含む処理を実行させるためのプログラム。 Note that the present technology can also be configured as follows.
(1)
A detection device including:
an acquisition unit that acquires distance information related to the distance to a subject;
a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and
a determination unit that determines whether an image in the region is the object.
(2)
The detection device according to (1), in which
the setting unit
uses the size of the object at a predetermined distance as the feature amount of the object, and
sets a frame corresponding to the size of the object according to the distance at a pixel set as a processing target, and
the determination unit
determines whether an image in the frame is the object.
(3)
The detection device according to (1) or (2), further including a direction detection unit that detects, from the distance information, the direction in which the object is facing.
(4)
The detection device according to (3), further including:
an imaging unit that captures an image using ambient light; and
a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the object set by the setting unit, the image in the region determined to be the object by the determination unit, and the direction of the object detected by the direction detection unit.
(5)
A detection device including:
an acquisition unit that acquires distance information related to the distance to a subject;
a setting unit that sets, using the distance information, a region in which a predetermined object may be imaged; and
an estimation unit that estimates the category to which the object belongs from the size of the region and the distance information.
(6)
The detection device according to (5), in which
the setting unit
sets, as the region in which the object may be imaged, the area extending to the portion where the distance information changes, and
the estimation unit
estimates, as the category to which the object belongs, the category of objects that have the size of the region at the distance represented by the distance information in the region.
(7)
The detection device according to (5) or (6), further including a shape estimation unit that estimates, using the distance information, the shape of the object in the region set by the setting unit.
(8)
The detection device according to (7), in which the estimation unit estimates the category using at least one of the distance information, the size of the region, and the shape.
(9)
The detection device according to (7), further including:
an imaging unit that captures an image using ambient light; and
a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the region set by the setting unit, the category estimated by the estimation unit, and the shape estimated by the shape estimation unit.
(10)
The detection device according to any one of (1) to (9), in which the acquisition unit acquires the distance information using a TOF sensor, a stereo camera, an ultrasonic sensor, or a millimeter-wave radar.
(11)
A detection method including the steps of:
acquiring distance information related to the distance to a subject;
setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and
determining whether an image in the region is the object.
(12)
A detection method including the steps of:
acquiring distance information related to the distance to a subject;
setting, using the distance information, a region in which a predetermined object may be imaged; and
estimating the category to which the object belongs from the size of the region and the distance information.
(13)
A program for causing a computer to execute processing including the steps of:
acquiring distance information related to the distance to a subject;
setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and
determining whether an image in the region is the object.
(14)
A program for causing a computer to execute processing including the steps of:
acquiring distance information related to the distance to a subject;
setting, using the distance information, a region in which a predetermined object may be imaged; and
estimating the category to which the object belongs from the size of the region and the distance information.
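
Configuration (2) sizes the frame from the object's size at a predetermined distance and the distance at the pixel being processed. A minimal sketch of one way to do this follows, assuming that apparent size scales inversely with distance (a pinhole-camera assumption not stated in the text). The names `frame_size_px` and `frame_at`, the `(height, width)` ordering, and the choice to centre the frame on the pixel are hypothetical.

```python
from typing import Tuple


def frame_size_px(
    ref_size_px: Tuple[int, int],
    ref_distance_m: float,
    distance_m: float,
) -> Tuple[int, int]:
    """Scale the reference (height, width) in pixels to the measured distance.

    ref_size_px is the object's size in the image at ref_distance_m; apparent
    size is assumed to be inversely proportional to distance.
    """
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    scale = ref_distance_m / distance_m
    height = max(1, round(ref_size_px[0] * scale))
    width = max(1, round(ref_size_px[1] * scale))
    return height, width


def frame_at(pixel: Tuple[int, int], size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Return (top, left, height, width) of a frame centred on the pixel (assumed placement)."""
    row, col = pixel
    height, width = size
    return row - height // 2, col - width // 2, height, width
```

For example, an object that spans 100 × 80 pixels at 1 m is given a 50 × 40 frame when the pixel's distance is 2 m, so the determination unit only ever examines frames of a plausible size.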

１００検出装置，１１１距離情報取得部，１１２被写体特徴抽出部，１１３被写体候補領域検出部，１１４実サイズデータベース，２００検出装置，２１１撮像部，２１２被写体詳細認識部，３００検出装置，３１１被写体方向検出部，４００検出装置，４１１撮像部，４１２被写体詳細認識部，５００検出装置，５１１被写体サイズ推定部，５１２被写体カテゴリ推定部，６００検出装置，６１１撮像部，６１２被写体詳細認識部，７００検出装置，７１１被写体形状推定部，７１２被写体カテゴリ推定部，８００検出装置，８１１撮像部，８１２被写体詳細認識部 DESCRIPTION OF SYMBOLS 100 Detection device, 111 Distance information acquisition unit, 112 Subject feature extraction unit, 113 Subject candidate region detection unit, 114 Actual size database, 200 Detection device, 211 Imaging unit, 212 Subject detail recognition unit, 300 Detection device, 311 Subject direction detection unit, 400 Detection device, 411 Imaging unit, 412 Subject detail recognition unit, 500 Detection device, 511 Subject size estimation unit, 512 Subject category estimation unit, 600 Detection device, 611 Imaging unit, 612 Subject detail recognition unit, 700 Detection device, 711 Subject shape estimation unit, 712 Subject category estimation unit, 800 Detection device, 811 Imaging unit, 812 Subject detail recognition unit

Claims

被写体までの距離に関する距離情報を取得する取得部と、
前記距離情報と、検出対象とされる物体の特徴量から、前記物体が撮像されている可能性がある領域を設定する設定部と、
前記領域内の画像が、前記物体であるか否かを判定する判定部と
を備える検出装置。 A detection device comprising:
an acquisition unit that acquires distance information related to the distance to a subject;
a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and
a determination unit that determines whether an image in the region is the object.

前記設定部は、
前記物体の特徴量を、所定の距離における前記物体の大きさとし、
処理対象に設定した画素における距離に応じた前記物体の大きさに該当する枠を設定し、
前記判定部は、
前記枠内の画像が前記物体である否かを判定する
請求項１に記載の検出装置。 The detection device according to claim 1, wherein
the setting unit uses the size of the object at a predetermined distance as the feature amount of the object, and sets a frame corresponding to the size of the object according to the distance at a pixel set as a processing target, and
the determination unit determines whether an image in the frame is the object.

前記物体が向いている方向を、前記距離情報から検出する方向検出部をさらに備える
請求項２に記載の検出装置。 The detection device according to claim 2, further comprising: a direction detection unit that detects a direction in which the object is facing from the distance information.

環境光による画像を撮像する撮像部と、
前記撮像部で撮像された前記画像と、前記設定部により設定された前記物体の大きさ、前記判定部により前記物体であると判定された領域内の画像、および前記方向検出部により検出された前記物体の方向の少なくとも１つを用いて、前記物体に対する詳細な認識を行う認識部と
をさらに備える請求項３に記載の検出装置。 The detection device according to claim 3, further comprising:
an imaging unit that captures an image using ambient light; and
a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the object set by the setting unit, the image in the region determined to be the object by the determination unit, and the direction of the object detected by the direction detection unit.

A detection device comprising:
an acquisition unit that acquires distance information on a distance to a subject;
a setting unit that sets, using the distance information, a region in which a predetermined object may be imaged; and
an estimation unit that estimates a category to which the object belongs from the size of the region and the distance information.
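One way to picture the estimation step is to convert the region's pixel size back to a real-world size and look that size up in a table of per-category ranges. The sketch below assumes a pinhole camera model; the table `CATEGORY_SIZE_RANGES_M`, its values, and the function names are hypothetical and are not taken from the claims.

```python
# Hypothetical per-category real-size ranges in metres (illustrative values only).
CATEGORY_SIZE_RANGES_M = {
    "face": (0.15, 0.30),
    "person": (1.0, 2.0),
    "car": (3.0, 5.5),
}


def real_size_m(region_px, distance_m, focal_length_px):
    """Real-world extent of a region spanning region_px pixels at distance_m.

    Assumes a pinhole camera: real_size = region_px * distance / focal_length_px.
    """
    return region_px * distance_m / focal_length_px


def estimate_category(region_px, distance_m, focal_length_px,
                      ranges=CATEGORY_SIZE_RANGES_M):
    """Return the first category whose size range contains the estimate, else None."""
    size = real_size_m(region_px, distance_m, focal_length_px)
    for name, (low, high) in ranges.items():
        if low <= size <= high:
            return name
    return None


# Hypothetical example: a 340 px tall region at 2 m with a 400 px focal length.
print(estimate_category(340, 2.0, 400.0))
```

Returning `None` when no range matches keeps an unknown size from being forced into a category.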

The detection device according to claim 5, wherein
the setting unit sets, as the region in which the object may be imaged, the area extending up to a portion where the distance information changes, and
the estimation unit estimates, as the category to which the object belongs, the category of objects that have the size of the region at the distance represented by the distance information in the region.
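Setting a region "up to a portion where the distance information changes" could, for example, be done by growing a region from a seed pixel until the depth jumps. The following is a minimal sketch under that assumption; the flood-fill approach, the threshold `max_step_m`, and the function names are illustrative choices, not the method stated in the claims.

```python
from collections import deque


def grow_region(depth_map, seed, max_step_m=0.1):
    """Collect pixels connected to seed whose depth changes by at most max_step_m
    between 4-connected neighbours. seed is (row, col); depths are in metres."""
    rows, cols = len(depth_map), len(depth_map[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in region:
                # Stop growing where the distance changes abruptly.
                if abs(depth_map[ny][nx] - depth_map[y][x]) <= max_step_m:
                    region.add((ny, nx))
                    queue.append((ny, nx))
    return region


def bounding_box(region):
    """Return (left, top, right, bottom) in pixel coordinates, inclusive."""
    ys = [p[0] for p in region]
    xs = [p[1] for p in region]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical depth map: a 2 m object in front of a 5 m background.
depth = [
    [5.0, 5.0, 5.0, 5.0],
    [5.0, 2.0, 2.0, 5.0],
    [5.0, 2.0, 2.0, 5.0],
    [5.0, 5.0, 5.0, 5.0],
]
print(bounding_box(grow_region(depth, (1, 1))))
```

The width and height of the resulting bounding box, together with the depth inside it, are the inputs a size-based category estimate would need.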

The detection device according to claim 5, further comprising a shape estimation unit that estimates, using the distance information, the shape of the object in the region set by the setting unit.

The detection device according to claim 7, wherein the estimation unit estimates the category using at least one of the distance information, the size of the region, and the shape.

The detection device according to claim 7, further comprising:
an imaging unit that captures an image using ambient light; and
a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the region set by the setting unit, the category estimated by the estimation unit, and the shape estimated by the shape estimation unit.

The detection device according to claim 1, wherein the acquisition unit acquires the distance information using a TOF sensor, a stereo camera, an ultrasonic sensor, or a millimeter-wave radar.

A detection method comprising the steps of:
acquiring distance information on a distance to a subject;
setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and
determining whether an image in the region is the object.

A detection method comprising the steps of:
acquiring distance information on a distance to a subject;
setting, using the distance information, a region in which a predetermined object may be imaged; and
estimating a category to which the object belongs from the size of the region and the distance information.

A program for causing a computer to execute processing comprising the steps of:
acquiring distance information on a distance to a subject;
setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and
determining whether an image in the region is the object.

A program for causing a computer to execute processing comprising the steps of:
acquiring distance information on a distance to a subject;
setting, using the distance information, a region in which a predetermined object may be imaged; and
estimating a category to which the object belongs from the size of the region and the distance information.