JP2014067269A

JP2014067269A - Detector

Info

Publication number: JP2014067269A
Application number: JP2012212836A
Authority: JP
Inventors: Yoshiji Bando; 誉司坂東
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2012-09-26
Filing date: 2012-09-26
Publication date: 2014-04-17
Anticipated expiration: 2032-09-26
Also published as: JP5983243B2

Abstract

PROBLEM TO BE SOLVED: To provide a detection device capable of quickly detecting a moving body that is a dangerous object for the host vehicle.
SOLUTION: In a moving body detection device 1, a part identification unit 43 identifies, in a captured image, the positions of a plurality of parts that each partially represent an identification target. A posture feature extraction unit 61 calculates a posture feature vector f indicating the features of the positional relationship among the parts. A posture estimation unit 62 estimates the posture of the identification target in the captured image from the posture feature vector f by using a probability model indicating the relationship between the posture feature vector f and the posture of the identification target. An information providing unit 30 provides information on the identification target based on the estimated posture.

Description

The present invention relates to a detection device that detects an identification target from a captured image.

Conventionally, various studies have been made on systems that perform driving support in accordance with the surrounding environment of a vehicle (the presence or absence of a moving body in the traveling direction of the vehicle, the position of the moving body, and the like).
As one example of such a system, a device is known that detects a pedestrian from an image captured by an infrared camera or the like mounted on the vehicle and determines whether the pedestrian is a dangerous object for the host vehicle, specifically, whether there is a risk of collision with the host vehicle (hereinafter referred to as the conventional device; see, for example, Patent Document 1).

The conventional device detects a pedestrian using two cameras mounted on the vehicle. It slices the silhouette of the pedestrian's whole body in the horizontal direction, takes the midpoint of each slice as a sample point, and fits a regression line to these sample points on the image. The conventional device treats the inclination of the estimated regression line from the vertical direction as the inclination of the pedestrian's trunk. If the trunk is inclined toward the center of the roadway by a predetermined value or more, the device determines that the person is highly likely to run out onto the roadway.
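The trunk-inclination test of the conventional device can be sketched as follows. This is a minimal illustration only, not the implementation of Patent Document 1: the function name `trunk_tilt_deg`, the slice format, and the sign convention are assumptions introduced here.

```python
import math

def trunk_tilt_deg(slices):
    """Estimate trunk tilt from horizontal silhouette slices.

    slices: list of (y, x_left, x_right), one entry per horizontal slice
    (hypothetical input format). Returns the signed angle, in degrees,
    between the fitted line and the vertical direction.
    """
    # Sample points: the midpoint of each horizontal slice.
    pts = [(y, (x_left + x_right) / 2.0) for y, x_left, x_right in slices]
    n = len(pts)
    mean_y = sum(y for y, _ in pts) / n
    mean_x = sum(x for _, x in pts) / n
    # Least-squares fit of x = a * y + b. Regressing x on y keeps a
    # perfectly vertical trunk well defined (slope a = 0).
    syy = sum((y - mean_y) ** 2 for y, _ in pts)
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in pts)
    a = sxy / syy
    return math.degrees(math.atan(a))

upright = [(0, 0, 2), (1, 0, 2), (2, 0, 2)]
leaning = [(0, 0, 2), (1, 1, 3), (2, 2, 4)]
```

A caller would compare the returned angle with a predetermined threshold and check that its sign points toward the roadway; both depend on the camera geometry and are left out of this sketch.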

Patent Document 1: Japanese Patent No. 4173896

Incidentally, the inclination of the trunk generally appears last in a pedestrian's series of movements. For example, when a pedestrian is about to run out onto the roadway, some movement first appears in the individual parts of the body, such as a hand or a foot moving, and the trunk inclines only after that.

The conventional device therefore cannot determine the posture (movement) of the pedestrian until the trunk has inclined, which takes time. A technique is thus desired for detecting more quickly a moving body that takes a posture (movement) dangerous to the host vehicle, such as a pedestrian about to run out onto the roadway.

The present invention has been made to solve the above problem, and its object is to provide a detection device capable of quickly detecting a moving body that is a dangerous object for the host vehicle.

The detection device of the present invention, made to achieve the above object, is mounted on a vehicle. Part position detection means detects the positions (part positions) of parts that each partially represent an identification target in a captured image. Information calculation means calculates part feature information indicating the features of the positional relationship among the detected parts. Posture estimation means estimates the posture of the identification target from the part feature information, using a probability model indicating the relationship between the part feature information and the posture taken by the identification target. Posture information providing means provides information on the identification target based on the estimation result of the posture estimation means, for example, that the target is taking a posture dangerous to the host vehicle.

A part is a portion that partially represents the identification target. For example, when the identification target is a pedestrian, the parts are portions of the body such as the head, right shoulder, left shoulder, waist, right foot, and left foot.

When the identification target makes some movement, the movement first appears in the parts that make up the identification target. That is, the positional relationship among the parts changes, and these changes are considered to occur in conjunction with one another.

The detection device of the present invention extracts the features of the positional relationship among such parts and estimates the posture (movement) of the identification target based on these features. As a result, it can detect an identification target that is dangerous for the host vehicle more quickly.

As described in claim 2, the information calculation means in the detection device of the present invention takes as an observation vector a vector whose components are a plurality of variables based on the part positions, and calculates the part feature information by coordinate-transforming the observation vector of the target using the transpose of a transformation matrix prepared in advance. The transformation matrix is a matrix whose columns are the principal components obtained by applying principal component analysis, a method of multivariate analysis, to the components of the observation vector.

More specifically, as described in claim 3 for example, the information calculation means uses principal component analysis to calculate the covariance matrix of the observation vectors for a plurality of identification targets. It then uses as the transformation matrix a matrix formed by arranging, in order, the eigenvectors corresponding to the eigenvalues of that covariance matrix, taken in descending order up to a predetermined number.

With this configuration, the observation vector is converted into lower-dimensional part feature information. The process of estimating the pedestrian's posture from this part feature information therefore requires less computation and yields an estimate that is robust against noise. As a result, the detection device of the present invention can quickly and accurately detect a moving body that is a dangerous object for the host vehicle.

FIG. 1 is a block diagram showing the configuration of the moving body detection device of the embodiment.
FIG. 2 is an explanatory diagram schematically showing the process in which a pedestrian is detected by the position detection unit 10.
FIG. 3(a) is an explanatory diagram showing the result of confirming the pedestrian's posture (ground-truth tag) in captured images prepared in advance, and FIG. 3(b) is an explanatory diagram showing the result of estimating the pedestrian's posture by the posture estimation unit in the same captured images as (a).
FIG. 4(a) is an explanatory diagram showing the result of confirming the pedestrian's motion (ground-truth tag) in captured images prepared in advance, and FIG. 4(b) is an explanatory diagram showing the result of estimating the pedestrian's motion by the motion estimation unit in the same captured images.
FIG. 5 is an explanatory diagram showing the regression coefficients and the bias.
FIG. 6 is an explanatory diagram showing the calculation error when the lower end position is calculated with the regression coefficients and bias shown in FIG. 5.
FIG. 7 is an explanatory diagram showing the processed image of the present embodiment.

Embodiments to which the present invention is applied will be described below with reference to the drawings.
[Embodiment]
<Overall configuration>
The moving body detection device 1 shown in FIG. 1 is mounted on an automobile and is communicably connected to an in-vehicle camera 2 and an in-vehicle display device 3. The in-vehicle camera 2 is a camera (a monocular camera in this embodiment) that captures images ahead of the automobile in its traveling direction, and outputs the captured images to the moving body detection device 1.

The moving body detection device 1 detects a moving body such as a pedestrian as an identification target in the captured image input from the in-vehicle camera 2, and extracts information such as the posture and motion of the moving body. When the moving body detection device 1 determines, based on the extracted information, that the moving body is a dangerous object for the host vehicle, it generates an image (processed image) in which a display highlighting that moving body is added to the captured image, and outputs the processed image to the in-vehicle display device 3.

The in-vehicle display device 3 includes at least a display for showing various images and presenting information visually, and displays the processed image input from the moving body detection device 1 so that the driver of the automobile can see it.

The moving body detection device 1 includes a position detection unit 10 that, for a moving body extracted from the captured image, outputs observation information based on the position of the whole moving body (whole body) and the positions of a plurality of types of parts that partially represent the moving body. The moving body detection device 1 also includes an information extraction unit 20 that extracts, based on the observation information, information on the moving body: the direction in which the moving body is poised to move (posture information), the motion it is about to make (motion information), how far it is from the host vehicle (distance information), and the like. The moving body detection device 1 further includes an information providing unit 30 that outputs the information extracted by the information extraction unit 20 as data (a processed image) in a form the driver can recognize.

Various moving bodies are conceivable, such as pedestrians and persons riding vehicles such as bicycles and motorcycles. In this embodiment, the detection target is a pedestrian, that is, a person who is not riding a vehicle such as a bicycle or motorcycle.

<Position detection unit>
The position detection unit 10 includes a feature amount calculation unit 41 that calculates feature amounts of the captured image, an overall identification unit 42 that uses the calculated feature amounts to identify the overall shape (silhouette) of a pedestrian in the captured image, a part identification unit 43 that uses the calculated feature amounts to identify a plurality of parts partially representing the pedestrian in the captured image, and an observation information generation unit 50 that outputs observation information based on the positions of the identified pedestrian silhouette and the pedestrian's parts.

The feature amount calculation unit 41 generates luminance image data from the image data of the captured image, calculates the gradient strength and gradient direction for each pixel of the luminance image, and calculates the HOG (Histograms of Oriented Gradients) feature amount of each region (block) consisting of a plurality of pixels. The method of calculating HOG feature amounts is well known, so its description is omitted here.
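For readers unfamiliar with HOG, the per-pixel gradient and per-block histogram steps can be sketched as below. This is a deliberately simplified illustration (central differences, hard binning, no cells or interpolation), not the feature calculation of unit 41; the function name `block_hog` and the choice of 9 unsigned orientation bins are assumptions.

```python
import math

def block_hog(gray, bins=9):
    """Orientation histogram for one block of luminance values.

    gray: 2D list of luminance values (rows of equal length).
    Returns an L2-normalised histogram of unsigned gradient directions
    (0 to 180 degrees), each pixel weighted by its gradient strength.
    """
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # horizontal gradient
            gy = gray[y + 1][x] - gray[y - 1][x]   # vertical gradient
            strength = math.hypot(gx, gy)
            direction = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(direction / (180.0 / bins)), bins - 1)
            hist[index] += strength
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

# A block containing a single vertical edge: all gradient energy is horizontal.
edge_block = [[0, 0, 10, 10] for _ in range(4)]
hog = block_hog(edge_block)
```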

The overall identification unit 42 stores, as an overall reference feature amount, the HOG feature amount calculated for a reference overall shape of a pedestrian determined in advance by learning or the like. This reference silhouette is, for example, a rectangular region corresponding to the whole body of a pedestrian (M0 blocks × N0 blocks, where M0 and N0 are natural numbers).

Using the overall reference feature amount as a template, the overall identification unit 42 performs well-known template matching: it shifts the template position step by step over the captured image and calculates the similarity (score) to the template at each position. The similarity calculated at each position indicates how likely it is that a pedestrian (the reference silhouette) is present at that position.

When the score exceeds a predetermined threshold, the overall identification unit 42 determines that a pedestrian is present at that position, calculates the coordinates of the center of the reference silhouette at that position (the center of the rectangular region of M0 × N0 blocks), and outputs them as overall coordinates (x_r, y_r) (see FIG. 2).
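The sliding-window matching and thresholding described above can be sketched as follows. The source does not specify the similarity measure, so cosine similarity is assumed here, and each block is reduced to a single scalar feature for brevity (a real HOG block is a vector). The names `cosine` and `match_template` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature lists (assumed score)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def match_template(feat, template, threshold):
    """Slide the template over the feature map and keep strong matches.

    feat, template: 2D lists of per-block features (rows x columns).
    Returns a list of (center_x, center_y, score), in block units, for
    every window whose score exceeds the threshold.
    """
    rows, cols = len(feat), len(feat[0])
    n0, m0 = len(template), len(template[0])   # template rows, columns
    flat_t = [v for row in template for v in row]
    hits = []
    for top in range(rows - n0 + 1):
        for left in range(cols - m0 + 1):
            window = [feat[top + r][left + c]
                      for r in range(n0) for c in range(m0)]
            score = cosine(window, flat_t)
            if score > threshold:
                # Center of the matched rectangle, as for (x_r, y_r).
                hits.append((left + (m0 - 1) / 2.0,
                             top + (n0 - 1) / 2.0, score))
    return hits

feature_map = [[0, 0, 0, 0],
               [0, 1, 2, 0],
               [0, 3, 4, 0],
               [0, 0, 0, 0]]
hits = match_template(feature_map, [[1, 2], [3, 4]], threshold=0.99)
```

The part identification units described later run the same loop with their own templates and thresholds.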

When the moving body to be detected is a pedestrian, the part identification unit 43 includes a head identification unit 44 that identifies the head in the captured image, a right shoulder identification unit 45 and a left shoulder identification unit 46 that identify the right and left shoulders, a waist identification unit 47 that identifies the waist, and a right foot identification unit 48 and a left foot identification unit 49 that identify the right and left feet.

Of these, the head identification unit 44, for example, stores as a head reference feature amount the HOG feature amount calculated for a reference head shape determined in advance by learning or the like (for example, a rectangular region corresponding to the head, of M1 blocks × N1 blocks, where M1 and N1 are natural numbers).

Using the head reference feature amount as a template, the head identification unit 44 shifts the template position step by step over the captured image, performs the same template matching as the overall identification unit 42, and calculates the similarity (score) at each position. When the score exceeds a predetermined threshold, the head identification unit 44 determines that a head is present at that position, calculates the coordinates of the center of the reference head region (the center of the rectangle of M1 × N1 blocks), and outputs them as head coordinates (x_p1, y_p1).

The right shoulder identification unit 45, left shoulder identification unit 46, waist identification unit 47, right foot identification unit 48, and left foot identification unit 49 perform the same processing as the head identification unit 44, except that the template used differs according to the part to be detected. Each identification unit outputs the coordinates on the captured image at which the right shoulder, left shoulder, torso, right foot, or left foot is determined to be present, as right shoulder coordinates (x_p2, y_p2), left shoulder coordinates (x_p3, y_p3), torso coordinates (x_p4, y_p4), right foot coordinates (x_p5, y_p5), and left foot coordinates (x_p6, y_p6), respectively.

From the relative positional relationship of the six parts (head, both shoulders, waist, and both feet), the observation information generation unit 50 calculates an overall similarity indicating how likely it is that a pedestrian is present in the rectangular region enclosing these six parts. When the calculated overall similarity exceeds a predetermined threshold, the observation information generation unit 50 determines that a pedestrian is present in that rectangular region.

The observation information generation unit 50 then generates, as shown in equation (1), an N-dimensional observation vector v (N is a natural number) in which the overall coordinates and the coordinates of each part corresponding to the rectangular region determined to contain a pedestrian are arranged in order, and outputs this observation vector v as observation information. The x and y values in equation (1) are normalized coordinate values. That is, the coordinate values output from the overall identification unit 42 and the part identification unit 43 are normalized so that pedestrian images extracted from the captured image can be compared at the same size. In this embodiment the observation vector v is 14-dimensional (1 overall identification unit × 2 + 6 part identification units × 2 = 14), so N = 14 in the following description.
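Equation (1) is not reproduced in this text. From the description above (the overall coordinates followed by the six part coordinates, 14 components in total), it plausibly takes the following form; the exact ordering of the components and the column-vector convention are assumptions.

```latex
v = \left( x_r,\; y_r,\; x_{p1},\; y_{p1},\; x_{p2},\; y_{p2},\; \dots,\; x_{p6},\; y_{p6} \right)^{T} \in \mathbb{R}^{14}
\tag{1}
```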

When a pedestrian is extracted from the captured image as described above, the coordinates of all parts are not always obtained. For example, when the pedestrian to be detected overlaps another pedestrian or some object (hereinafter, such a situation is referred to as occlusion), the coordinates of the overlapped parts may not be obtained. The coordinates of a part may also be unavailable because the part lies outside the frame of the captured image.

When at least one part position (coordinates) is not detected for the pedestrian in this way, the observation information generation unit 50 generates the observation vector v by filling in a predetermined substitute value for each undetected part position. For example, if the left foot is not extracted, the components of the observation vector v corresponding to the left foot coordinates are set to the substitute value "0".
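Assembling the 14-dimensional observation vector, including the substitute value for undetected parts, might look like the sketch below. The source does not state how coordinates are normalized; scaling by a pedestrian bounding box is an assumption, as are the names `PART_NAMES` and `build_observation_vector`.

```python
# Order follows the description: head, shoulders, waist (torso), feet.
PART_NAMES = ["head", "right_shoulder", "left_shoulder",
              "waist", "right_foot", "left_foot"]

def build_observation_vector(whole, parts, box, substitute=0.0):
    """Build v = (x_r, y_r, x_p1, y_p1, ..., x_p6, y_p6) as a flat list.

    whole: (x, y) overall coordinates.
    parts: dict mapping part name to (x, y), or to None if undetected.
    box:   (left, top, width, height) used for normalisation
           (assumed scheme; the source only says values are normalised).
    """
    left, top, width, height = box

    def normalise(point):
        return ((point[0] - left) / width, (point[1] - top) / height)

    v = list(normalise(whole))
    for name in PART_NAMES:
        point = parts.get(name)
        if point is None:
            # Undetected part (occluded or out of frame): substitute value.
            v.extend((substitute, substitute))
        else:
            v.extend(normalise(point))
    return v

v = build_observation_vector(
    whole=(50, 100),
    parts={"head": (50, 20), "left_foot": None},
    box=(0, 0, 100, 200),
)
```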

In this embodiment, the series of processes in the position detection unit 10 uses a method based on latent SVM and DPM (for details, see P. F. Felzenszwalb et al., "Object Detection with Discriminatively Trained Part Based Models," IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)). An object in which the six parts (head, right shoulder, left shoulder, torso, right foot, and left foot) maintain an appropriate relative positional relationship is detected as a person.

<Information extraction unit>
The information extraction unit 20 includes a posture information extraction unit 60 that extracts posture information of the detected pedestrian based on the observation vector v, a motion information extraction unit 70 that extracts motion information, and a lower end estimation unit 80 that outputs lower end position information.
<Posture information extraction unit>
The posture information extraction unit 60 includes a posture feature extraction unit 61 that calculates a posture feature vector f indicating the features of the positional relationship among the parts of the pedestrian, and a posture estimation unit 62 that estimates the posture of the moving body based on the posture feature vector f.

As shown in equation (2), the posture feature extraction unit 61 calculates a k-dimensional posture feature vector f (k is a natural number, k < N) from the N-dimensional observation vector v. The posture feature vector f, which has fewer dimensions than the observation vector v, is output as part feature information to the posture estimation unit 62 and the lower end estimation unit 80.
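Equation (2) is not reproduced in this text. Given the earlier statement that the observation vector is coordinate-transformed using the transpose of the transformation matrix, it plausibly reads as follows. Whether the mean vector is subtracted from v before the projection is not stated in the source, so the uncentered form shown here is an assumption.

```latex
f = A_k^{T}\, v, \qquad
A_k \in \mathbb{R}^{N \times k},\quad v \in \mathbb{R}^{N},\quad f \in \mathbb{R}^{k},\quad k < N
\tag{2}
```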

The transformation matrix A_k uses values calculated in advance by experiment. The method of calculating the transformation matrix A_k is described next.

First, from a large number of training images collected in advance, N-dimensional observation vectors v(i) (see equation (1); i = 1, 2, ..., M) are calculated for M pedestrian images (M is a natural number) in the same way as in the position detection unit 10. As shown in equation (3), the mean of these M observation vectors v(i) is calculated as the mean vector v_a. The training images are images in which pedestrians appear, and are preferably images captured by the in-vehicle camera 2.

Each component of the N-dimensional mean vector v_a represents the average position of one of the pedestrian's parts. That is, the mean vector v_a represents the average posture (shape) of the M pedestrians in the captured images.

Next, as shown in equation (4), a data matrix X is generated whose components are the differences (deviations) between the observation vector v(i) of each of the M pedestrians and the mean vector v_a. As shown in equation (5), the variance-covariance matrix S is then calculated from the data matrix X.

The variance-covariance matrix S is an N × N matrix, and equation (6) is obtained by its eigenvalue decomposition.

In equation (6), Λ is a diagonal matrix whose diagonal components are the eigenvalues λ_i (i = 1, ..., N) of the variance-covariance matrix S. The eigenvalues λ_i are arranged in descending order, and the N-dimensional eigenvector a(i) (i = 1, ..., N) corresponding to each eigenvalue λ_i is calculated.

The eigenvector a(i) corresponding to the i-th largest eigenvalue λ_i is taken as the i-th principal component, and an N × N matrix A composed of the first principal component a(1) through the N-th principal component a(N) is calculated. From this matrix A, an N × k partial basis matrix A_k composed of the first principal component a(1) through the k-th principal component a(k) is calculated as shown in equation (7) (here, since N = 14, k < 14).
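Equations (3) to (7) are not reproduced in this text. A reconstruction consistent with the description is given below as a sketch. The normalization factor in (5) (1/M rather than 1/(M-1)) and the orientation of the data matrix X (one pedestrian per column) are assumptions.

```latex
\begin{aligned}
v_a &= \frac{1}{M} \sum_{i=1}^{M} v(i) && (3)\\
X &= \bigl[\, v(1) - v_a,\;\; v(2) - v_a,\;\; \dots,\;\; v(M) - v_a \,\bigr] \in \mathbb{R}^{N \times M} && (4)\\
S &= \frac{1}{M}\, X X^{T} \in \mathbb{R}^{N \times N} && (5)\\
S &= A \Lambda A^{T}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_N),\quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N && (6)\\
A_k &= \bigl[\, a(1),\;\; a(2),\;\; \dots,\;\; a(k) \,\bigr] \in \mathbb{R}^{N \times k} && (7)
\end{aligned}
```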

The partial basis matrix A_k calculated in this way is the transformation matrix A_k. In other words, performing eigenvalue decomposition of the variance-covariance matrix S obtained from the data matrix X amounts to applying principal component analysis, a method of multivariate analysis, to the data matrix X.

The value of k is determined from the cumulative contribution ratio calculated from the eigenvalues. Here, the value of k that gives a cumulative contribution ratio of 80% or more is 6 (k = 6), and the 14 × 6 partial basis matrix A_k composed of the first principal component a(1) through the sixth principal component a(6) is used as the transformation matrix A_k.
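Selecting k from the cumulative contribution ratio can be sketched as follows. The function name `choose_k` and the example eigenvalues are illustrative only; the eigenvalues of the actual training data are not given in the source.

```python
def choose_k(eigenvalues, target=0.80):
    """Smallest k whose cumulative contribution ratio reaches the target.

    The contribution ratio of the i-th principal component is
    lambda_i / sum(lambda); the cumulative ratio sums the k largest.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    accumulated = 0.0
    for k, value in enumerate(ordered, start=1):
        accumulated += value
        if accumulated / total >= target:
            return k
    return len(ordered)

k_small = choose_k([6.0, 3.0, 1.0])        # ratios: 0.6, 0.9, 1.0
k_flat = choose_k([1.0, 1.0, 1.0, 1.0])    # ratios: 0.25, 0.5, 0.75, 1.0
```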

In other words, for a large number (M) of training pedestrian images, the posture (shape) of a pedestrian consisting of the whole body and the individual parts is regarded as an N-dimensional vector (the observation vector v) whose components are the part positions. Applying principal component analysis to these vectors yields a relatively low-dimensional (k-dimensional) subspace that captures the features of pedestrian posture (shape). The basis vectors spanning this subspace correspond to the principal components a(1) to a(k) of the transformation matrix A_k.

The posture feature vector f calculated by the posture feature extraction unit 61 is the N-dimensional observation vector v (generated from a pedestrian image captured by the in-vehicle camera 2 of the moving body detection device 1) mapped onto this reduced k-dimensional subspace. Performing posture estimation in the posture estimation unit 62 with this posture feature vector f therefore means that posture estimation is carried out within the reduced-dimension subspace.
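The mapping onto the k-dimensional subspace can be sketched in a few lines. This assumes the uncentered projection f = A_k^T v; the source does not say whether the mean vector is subtracted first. The function name `project` and the toy matrix are hypothetical.

```python
def project(v, a_k):
    """Return f = A_k^T v.

    v:   observation vector of length N.
    a_k: N x k transformation matrix as a list of N rows, whose columns
         are the principal components a(1) .. a(k).
    """
    n, k = len(a_k), len(a_k[0])
    if len(v) != n:
        raise ValueError("v must have as many components as A_k has rows")
    # Component j of f is the inner product of column j of A_k with v.
    return [sum(a_k[i][j] * v[i] for i in range(n)) for j in range(k)]

# Toy example with N = 3 and k = 2: keep the first two coordinates.
toy_a_k = [[1.0, 0.0],
           [0.0, 1.0],
           [0.0, 0.0]]
f = project([3.0, 4.0, 5.0], toy_a_k)
```

In the embodiment the same operation would take a 14-component v and a 14 × 6 matrix and return a 6-component f.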

For a pedestrian image with a k-dimensional posture feature vector f, the posture estimation unit 62 uses logistic regression to calculate, by equation (8), the posture probability p(y = r | f) that the pedestrian's posture y is the posture corresponding to posture label r (the estimated posture), where r = 1, 2, ..., R and R is a natural number.

Here, the posture label r has four classes (R = 4): posture label 1 (r = 1) represents facing forward, posture label 2 (r = 2) facing backward, posture label 3 (r = 3) facing left, and posture label 4 (r = 4) facing right. That is, the posture is the direction in which the pedestrian is about to move. The posture probabilities p(y = r | f) (r = 1 to 4) are calculated so that they sum to 1.
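Equation (8) is not reproduced in this text. Given the parameters named in the description (a scalar θ_r0 and a k-dimensional vector θ_r per label, with q running over all labels) and the requirement that the probabilities sum to 1, it is assumed here to be the usual multinomial logistic (softmax) form p(y = r | f) = exp(θ_r0 + θ_r^T f) / Σ_q exp(θ_q0 + θ_q^T f). The sketch below evaluates that assumed form; the function name and parameter values are hypothetical.

```python
import math

def posture_probabilities(f, theta0, theta):
    """Posture probabilities under the assumed softmax form of equation (8).

    f:      posture feature vector (length k).
    theta0: list of R scalar biases, one per posture label.
    theta:  list of R weight vectors, each of length k.
    Returns a list of R probabilities that sum to 1.
    """
    scores = [b + sum(w_d * f_d for w_d, f_d in zip(w, f))
              for b, w in zip(theta0, theta)]
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Four labels (forward, backward, left, right), k = 2, made-up parameters.
p = posture_probabilities(
    [1.0, 0.0],
    theta0=[0.0, 0.0, 0.0, 0.0],
    theta=[[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
)
uniform = posture_probabilities([1.0, 0.0], [0.0] * 4, [[0.0, 0.0]] * 4)
```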

The calculated posture probability p(y = r | f) is output to the information providing unit 30 as posture information.
The parameters in equation (8) (θ_r0, θ_r^T, θ_q0, θ_q^T) were calculated in advance by experiment, as follows.

First, from many captured images collected in advance for learning (images in which pedestrians appear, preferably captured by the in-vehicle camera 2), a k-dimensional posture feature vector f(j) (see equation (2); j = 1, 2, ..., G, where G is a natural number) is calculated for each of a large number (G) of pedestrian images, in the same way as in the posture feature extraction unit 61. In addition, the posture of the pedestrian in each pedestrian image is checked visually and one corresponding posture label r is assigned. A posture label r assigned in advance in this way is called a "correct tag".

Then, for each posture feature vector f(j) calculated from the learning images, the k components f1 to fk are taken as explanatory variables and the "correct tag" of the corresponding image as the objective variable, and the parameters (scalars θ_r0 and θ_q0, and k-dimensional vectors θ_r^T and θ_q^T) are set so that equation (8) holds.
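The text does not say which fitting procedure was used to set these parameters. One common choice, shown here purely as an assumption, is gradient descent on the cross-entropy loss of a softmax model. The function `fit_posture_model`, the learning rate, the step count, and the synthetic two-dimensional features are all hypothetical; labels run from 0 to R-1 in the code, whereas the text numbers them 1 to R.

```python
import numpy as np

def fit_posture_model(F, labels, R, lr=0.1, steps=300):
    """F: (G, k) posture feature vectors, labels: (G,) correct tags in 0..R-1.
    Returns (theta0, theta) fitted by plain gradient descent."""
    G, k = F.shape
    theta0 = np.zeros(R)
    theta = np.zeros((R, k))
    Y = np.eye(R)[labels]                        # one-hot correct tags
    for _ in range(steps):
        S = F @ theta.T + theta0
        S = S - S.max(axis=1, keepdims=True)     # numerical stability
        P = np.exp(S)
        P = P / P.sum(axis=1, keepdims=True)     # model probabilities
        grad = (P - Y) / G                       # cross-entropy gradient w.r.t. scores
        theta0 -= lr * grad.sum(axis=0)
        theta -= lr * (grad.T @ F)
    return theta0, theta

# Synthetic, well-separated features standing in for labelled training data.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 3.0], [-3.0, -3.0], [-3.0, 3.0], [3.0, -3.0]])
labels = np.repeat(np.arange(4), 50)
F = centers[labels] + 0.5 * rng.normal(size=(200, 2))
theta0, theta = fit_posture_model(F, labels, R=4)
predicted = np.argmax(F @ theta.T + theta0, axis=1)
accuracy = (predicted == labels).mean()
```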

FIG. 3 shows the result of pedestrian posture estimation by the posture estimation unit 62. FIG. 3(a) shows, in white, the pedestrian posture (correct tag) confirmed visually in each of a number of pedestrian images prepared in advance. FIG. 3(b) shows the result of estimating the pedestrian's posture y in the same images with the posture estimation unit 62. In FIG. 3(b), the value of the posture probability p(y = r | f) is shown in grayscale: the darker (blacker) the shade, the closer the value is to 0, and the lighter (whiter) the shade, the closer it is to 1.

For example, in the first few images (frames), posture label 1 is assigned as the correct tag (see FIG. 3(a)), confirming that the pedestrian was facing "forward". In the estimation result of the posture estimation unit 62 for the same frames (see FIG. 3(b)), the posture probability p(y = 1 | f) of the correct "forward" posture is larger than the probabilities of the other postures. That is, the estimation result of the posture estimation unit 62 has been confirmed to match the correct tag with high probability.

In this way, the posture information extraction unit 60 estimates posture using a lower-dimensional (k-dimensional) characteristic index (posture feature vector f) obtained from the multidimensional (N-dimensional) data (observation vector v). The posture estimation processing therefore requires less computation and yields results that are robust to noise.

<Motion information extraction unit>
The motion information extraction unit 70 includes a motion feature extraction unit 71 and a motion estimation unit 72. It performs the same processing as the posture information extraction unit 60, with two differences. First, in the prior experiment, principal component analysis is performed on an L-dimensional motion observation vector v_m (L is a natural number, L > N) instead of the observation vector v. Second, using the result of that analysis, the probability that the pedestrian is performing the motion corresponding to motion label r_m (r_m is a natural number) is calculated from the motion observation vector v_m of the input images.

That is, the motion feature extraction unit 71 corresponds to the posture feature extraction unit 61, and the motion estimation unit 72 corresponds to the posture estimation unit 62. The motion feature extraction unit 71 generates the motion observation vector v_m, calculates from it a motion feature vector corresponding to the posture feature vector f, and outputs the result to the motion estimation unit 72 as motion feature information. Based on the motion feature information (motion feature vector), the motion estimation unit 72 calculates the probability that the pedestrian in the captured images is performing the motion corresponding to each motion label r_m, and outputs the result to the information providing unit 30 as motion information.

The motion observation vector v_m is an L-dimensional vector (here, L = N × H = 14 × 10 = 140) formed, for a pedestrian to be detected, by simply concatenating in time order the observation vectors v (see equation (1)) from a predetermined number H of consecutive captured images (H is a natural number of 2 or more; here, H = 10).
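The construction of the motion observation vector can be illustrated with a short sketch. The function name and the placeholder frame contents are assumptions; only N = 14, H = 10, and L = 140 come from the text.

```python
import numpy as np

def motion_observation_vector(frames):
    """frames: sequence of H observation vectors v (each of length N),
    ordered oldest first. Returns their concatenation (length N * H)."""
    return np.concatenate(frames)

N, H = 14, 10
# Placeholder frames: frame t is filled with the value t so the order is visible.
frames = [np.full(N, float(t)) for t in range(H)]
v_m = motion_observation_vector(frames)
```

In a running system the H most recent observation vectors would presumably be kept in a sliding window, but the text does not describe the buffering.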

The motion label r_m has three classes: motion label 1 (r_m = 1) denotes standing still, motion label 2 (r_m = 2) denotes walking, and motion label 3 (r_m = 3) denotes running.

FIG. 4 shows the result of pedestrian motion estimation by the motion information extraction unit 70. FIG. 4(a) shows, in white, the motion (correct tag) of a pedestrian confirmed visually from a number of images of the same pedestrian captured in time series and prepared in advance. FIG. 4(b) shows the result of estimating the pedestrian's motion in the same images with the motion estimation unit 72. In FIG. 4(b), the estimated probability of the motion corresponding to each motion label r_m is shown so that the darker (blacker) the shade, the closer the value is to 0, and the lighter (whiter) the shade, the closer it is to 1.

For example, in the first few frames, motion label 2 is assigned as the correct tag (see FIG. 4(a)), so the correct answer is that the pedestrian was "walking". In the estimation result of the motion estimation unit 72 for the same frames (see FIG. 4(b)), the probability of the correct "walking" motion is larger than the probabilities of the other motions. That is, the estimation result of the motion estimation unit 72 has been confirmed to match the correct tag with high probability.

In this way, the motion information extraction unit 70 provides the same effects as the posture information extraction unit 60. Because motion is estimated using a lower-dimensional characteristic index (motion feature vector) obtained from the multidimensional (N × H-dimensional) data (motion observation vector v_m), the motion estimation processing requires less computation and yields results that are robust to noise.

<Lower end position estimation unit>
The lower end estimation unit 80 applies the above-described transformation matrix Ak, prepared in advance, to the k-dimensional posture feature vector f output from the posture feature extraction unit 61, and calculates the N-dimensional estimated vector v_d as shown in equation (9).

In other words, the estimated vector v_d is obtained by mapping the posture feature vector f, which was itself obtained by mapping the observation vector v from the N-dimensional space into the k-dimensional space, back into the N-dimensional space. The estimated vector v_d is therefore an approximation of the original observation vector v.

As described above, when a part goes undetected because of occlusion or the like, the observation vector v is generated with a predetermined substitute value (here, 0) in place of that part's information. The position of the undetected part can then be obtained as a component of the estimated vector v_d.
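A minimal sketch of this back-projection follows, assuming equation (9) has the form v_d = Ak f (the equation is not reproduced in this part of the text). The random orthonormal `A_k`, the test vector, and the choice of which two components represent the undetected part are hypothetical.

```python
import numpy as np

def reconstruct(f, A_k):
    """Map a k-dimensional posture feature vector back to N dimensions
    (assumed form of equation (9): v_d = A_k f)."""
    return A_k @ f

rng = np.random.default_rng(2)
N, k = 14, 4
A_k, _ = np.linalg.qr(rng.normal(size=(N, k)))   # stand-in for the learned matrix

v_true = A_k @ rng.normal(size=k)    # a vector lying exactly in the subspace
v_obs = v_true.copy()
v_obs[12:14] = 0.0                   # one part undetected: substitute value 0

v_d = reconstruct(A_k.T @ v_obs, A_k)     # components 12 and 13 hold estimates
v_full = reconstruct(A_k.T @ v_true, A_k) # a fully observed vector is recovered
```

How close the filled-in components come to the true positions depends on how well the learned subspace describes real postures, which this toy example does not attempt to show.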

Using this estimated vector v_d, the lower end estimation unit 80 estimates the lower end position coordinates of the pedestrian in the captured image. The lower end position coordinates lie below both the right foot coordinates and the left foot coordinates.

Specifically, the lower end estimation unit 80 takes the components of the estimated vector v_d as explanatory variables and estimates the resulting lower end position coordinates U = (x, y)^T by equation (10), based on a linear regression model (lower end estimation model).

The estimated lower end position coordinates U = (x, y)^T are output to the information providing unit 30 as lower end position information.

Here, the method of calculating the regression coefficient W and the bias vector b in equation (10) is described. These were determined in a prior experiment by the least squares method from many collected captured images. The regression coefficient W(i) is a 2 × N matrix of regression coefficients, and the bias vector b is a 2 × 1 matrix.
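The least squares fit can be sketched as below, assuming equation (10) has the linear form U = W v_d + b (the equation is not reproduced in this part of the text). The function names and the synthetic, noise-free training data are hypothetical.

```python
import numpy as np

def fit_lower_end_model(Vd, U):
    """Vd: (G, N) estimated vectors v_d, U: (G, 2) known lower end coordinates.
    Returns W (2, N) and b (2,) minimising the squared error of U = W v_d + b."""
    X = np.hstack([Vd, np.ones((len(Vd), 1))])       # extra column for the bias
    coef, *_ = np.linalg.lstsq(X, U, rcond=None)     # coef has shape (N + 1, 2)
    return coef[:-1].T, coef[-1]

def estimate_lower_end(v_d, W, b):
    """Apply the fitted model to one estimated vector."""
    return W @ v_d + b

# Synthetic check: data generated from a known W and b should be recovered.
rng = np.random.default_rng(3)
G, N = 100, 14
W_true = rng.normal(size=(2, N))
b_true = np.array([5.0, -2.0])
Vd = rng.normal(size=(G, N))
U = Vd @ W_true.T + b_true
W, b = fit_lower_end_model(Vd, U)
u0 = estimate_lower_end(Vd[0], W, b)
```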

When the regression coefficient W and the bias vector b were determined as shown in FIG. 5, the estimation error of the lower end position coordinates U was confirmed to fall within approximately 5 pixels, as shown in FIG. 6.
In this way, the lower end estimation unit 80 estimates the lower end position coordinates U from the positional relationship of the parts, so the lower end position coordinates U are calculated accurately.

<Information providing unit>
Based on the posture information, the motion information, and the lower end position information, the information providing unit 30 generates a processed image in which the captured image is overlaid with markings that highlight pedestrians judged to be dangerous to the host vehicle (hereinafter, danger targets).

In the present embodiment, as shown in FIG. 7, the information providing unit 30 generates a processed image in which lower end lines 101 to 104, indicating the lower end positions of the pedestrians 91 to 94, are composited onto the original captured image based on the lower end position information.

The information providing unit 30 also generates a processed image in which a mark (posture display mark), indicating whether the pedestrian's posture is directed toward moving in the front-rear direction or the left-right direction, is composited onto the original image.

Specifically, based on the posture information, the information providing unit 30 takes the sum of the posture probabilities p(y = r | f) for the "forward" and "backward" postures (r = 1 or 2) as the probability of a "vertical" posture, and the sum for the "left" and "right" postures (r = 3 or 4) as the probability of a "horizontal" posture. It then generates an elliptical posture display mark whose aspect ratio equals the ratio of the "vertical" and "horizontal" probabilities, and composites the mark onto the original image. As shown in FIG. 7, a vertically long ellipse 111 is displayed as the posture display mark for the "vertical" pedestrian 91, and horizontally long ellipses 112 to 114 are displayed for the "horizontal" pedestrians 92, 93, and 94.
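The sizing of the posture display mark can be sketched as follows. The function name and the `base` scale factor (in pixels) are assumptions; the text specifies only that the ellipse's aspect ratio follows the ratio of the two summed probabilities.

```python
def posture_mark_axes(p, base=20.0):
    """p: posture probabilities in the order [front, back, left, right].
    Returns (width, height) of the ellipse; 'base' is a hypothetical scale."""
    vertical = p[0] + p[1]       # front + back
    horizontal = p[2] + p[3]     # left + right
    return base * horizontal, base * vertical

# A pedestrian most likely facing front or back gets a vertically long ellipse.
width, height = posture_mark_axes([0.6, 0.2, 0.1, 0.1])
```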

Furthermore, the information providing unit 30 calculates the distance from the host vehicle to the pedestrian using the lower end position information and the point at infinity in the captured image, and adds the calculated distance to the original captured image. The method of calculating the distance to an object using the point at infinity in a captured image is well known (see, for example, Japanese Patent No. 4063170), so its description is omitted here.

The information providing unit 30 also treats as a danger target any pedestrian who is performing a motion other than "standing still", that is, "walking" or "running", and who is about to move in the "horizontal" direction, which is more likely to endanger the host vehicle than the "vertical" direction. It generates a processed image in which a frame surrounding each pedestrian judged to be a danger target based on the motion estimation information is composited onto the original captured image. As shown in FIG. 7, frames 122 to 124 are added to highlight the pedestrians 92 to 94, who are "walking" and facing in the "horizontal" direction.
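The danger-target rule can be sketched as below. The text does not say how the probabilities are turned into a yes/no decision, so comparing the summed posture probabilities and taking the most probable motion label are assumptions, as is the function name.

```python
def is_danger_target(posture_p, motion_p):
    """posture_p: [front, back, left, right] probabilities.
    motion_p: [stand still, walk, run] probabilities.
    Assumed rule: horizontal posture more likely than vertical, and the
    most probable motion is not 'stand still'."""
    horizontal = (posture_p[2] + posture_p[3]) > (posture_p[0] + posture_p[1])
    most_likely_motion = max(range(len(motion_p)), key=lambda i: motion_p[i])
    moving = most_likely_motion != 0
    return horizontal and moving

crossing = is_danger_target([0.1, 0.1, 0.6, 0.2], [0.1, 0.8, 0.1])   # walking sideways
standing = is_danger_target([0.1, 0.1, 0.6, 0.2], [0.9, 0.05, 0.05]) # sideways, still
oncoming = is_danger_target([0.7, 0.2, 0.05, 0.05], [0.1, 0.2, 0.7]) # running, front-facing
```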

In this way, the information providing unit 30 presents various information about pedestrians, based on the extracted posture information, motion information, and lower end position information, in the easily viewed form of a processed image, thereby drawing the driver's attention to danger targets.

[Effects]
As described above, in the moving body detection device 1 of the present embodiment, the posture information extraction unit 60 generates, for a pedestrian to be detected, an observation vector v whose components are the position information of the parts, converts this observation vector v into a posture feature vector f indicating the features of the positional relationship of the parts, and estimates the pedestrian's posture from this posture feature vector f. Compared with estimating posture from the inclination of the trunk, this makes it possible to quickly detect a pedestrian who takes a posture that is dangerous to the host vehicle.

In addition, because the posture feature vector f has fewer dimensions than the observation vector v of the captured image, posture estimation using the posture feature vector f requires less computation and gives estimation results that are robust to noise. As a result, pedestrians who are dangerous to the host vehicle can be detected quickly and accurately.

In estimating posture, the moving body detection device 1 adopts a model based on a logistic regression technique as the probability model. This allows the outcome of which posture the pedestrian is taking (a discrete variable) to be predicted from the posture feature vector f.

In addition, in the motion information extraction unit 70, the moving body detection device 1 extends the idea of estimating a pedestrian's posture from the observation vector v along the time axis, and estimates the pedestrian's motion from the motion observation vector v_m. Because information about motion is available in addition to posture, pedestrians who are dangerous to the host vehicle can be detected more accurately.

Furthermore, the moving body detection device 1 estimates the lower end position coordinates U based on the estimated vector v_d calculated by the lower end estimation unit 80. Compared with, for example, estimating the lower end position coordinates from only the right foot and left foot coordinates observed by the parts identifier 43, this not only gives a more reliable estimate but also allows the lower end position coordinates U to be estimated accurately even when the right foot or left foot coordinates are undetected. As a result, using the lower end position coordinates U, the distance to a pedestrian can be estimated accurately even in an image captured by a monocular camera.

[Correspondence with Claims]
The moving body detection device 1 of the present embodiment corresponds to the "detection device" in the claims, the in-vehicle camera 2 to the "imaging means", and the position detection unit 10 to the "part position detection means". The posture feature extraction unit 61 corresponds to the "information calculation means", and the posture estimation unit 62 to the "posture estimation means". The motion feature extraction unit 71 corresponds to the "motion information calculation means", the motion estimation unit 72 to the "motion estimation means", and the lower end estimation unit 80 to the "lower end estimation means". The information providing unit 30 corresponds to the "posture information providing means", the "motion information providing means", and the "distance information providing means".

[Other Embodiments]
An embodiment of the present invention has been described above, but the present invention is of course not limited to that embodiment and can take various forms.

(A) In the above embodiment, the position detection unit 10 obtains the coordinates of each part with a technique based on latent SVM and DPM, but the technique for calculating the part coordinates is not limited to this, and techniques based on other learning methods or models may be applied.

(B) In the above embodiment, the information extraction unit 20 calculates the posture feature vector f and the motion feature vector by principal component analysis using principal components obtained by diagonalizing the covariance matrix, but the principal components may instead be obtained using an autocorrelation matrix in place of the covariance matrix.

(C) In the above embodiment, the posture estimation unit 62 uses K = 4 classes for the posture label r (the number of classes written as R in the description above), but the number of classes K may be chosen freely. For example, the number of classes K may be increased by dividing the postures more finely, such as diagonally right and diagonally left. Alternatively, the number of classes may be reduced to K = 2 by combining forward and backward into a vertical direction and left and right into a horizontal direction. The same applies to the number of classes of the motion label r_m.

(D) In the above embodiment the identification target is a pedestrian, but the identification target may also be a person riding a vehicle such as a bicycle, motorcycle, tricycle, or wheelchair. Vehicles such as automobiles, in which the occupant rides inside the vehicle, are excluded.

(E) In the above embodiment, the head, both shoulders, the waist, and both feet are set as the parts representing a pedestrian, but the body regions selected as parts are not limited to these. The parts need only include regions whose relative positions tend to change when the moving body makes some movement.

(F) The components of the moving body detection device 1 described in the above embodiment may be implemented in hardware, in software, or in a combination of hardware and software. These components are functional concepts, and some or all of them may be functionally or physically distributed or integrated.

DESCRIPTION OF SYMBOLS
1 ... Moving body detection device; 2 ... In-vehicle camera; 3 ... In-vehicle display device; 10 ... Position detection unit; 20 ... Information extraction unit; 30 ... Information providing unit; 41 ... Feature amount calculation unit; 42 ... Whole-body identifier; 43 ... Parts identifier; 50 ... Observation information generation unit; 60 ... Posture information extraction unit; 70 ... Motion information extraction unit; 80 ... Lower end estimation unit

Claims

A detection device mounted on a vehicle, comprising:
imaging means (2) for capturing an image of an identification target;
part position detection means (10) for detecting, from the image captured by the imaging means, the part positions of a specific target, where one of the identification targets is taken as the specific target and the positions of a plurality of parts that partially represent the identification target are taken as part positions;
information calculation means (61) for calculating, for the specific target, part feature information, which is information indicating the features of the positional relationship of the parts;
posture estimation means (62) for estimating the posture of the specific target in the image acquired by the imaging means from the part feature information, using a probability model indicating the relationship between the part feature information of the identification target and the posture of the identification target; and
posture information providing means (30) for providing information based on the estimation result of the posture estimation means.

The detection device according to claim 1, wherein, with a vector whose components are a plurality of variables based on the part positions taken as an observation vector, and a matrix whose columns are the principal components obtained by performing principal component analysis, a form of multivariate analysis, on the components of the observation vector taken as a transformation matrix,
the information calculation means calculates the part feature information for the specific target by coordinate-transforming the observation vector using the transpose of the transformation matrix prepared in advance.

The detection device according to claim 2, wherein the information calculation means uses, as the transformation matrix, a matrix obtained by calculating the covariance matrix of the observation vectors for a plurality of the identification targets using principal component analysis in the multivariate analysis, and arranging in order the eigenvectors corresponding to a predetermined number of the largest eigenvalues of the calculated matrix.

The detection device according to any one of claims 1 to 3, wherein the probability model is a model using a logistic regression technique in which at least one posture predetermined as a posture that the identification target can take is an estimated posture, the components of the part feature information of the identification target are explanatory variables, and the probability that the identification target takes each estimated posture is the objective variable.

The detection device according to any one of claims 2 to 4, wherein, for a specific target for which the position of at least one part is undetected by the part position detection means,
the information calculation means generates the observation vector by applying a substitute value to the position of the undetected part.

The detection device according to any one of claims 1 to 5, further comprising:
motion information calculation means (71) for calculating, for the specific target, motion feature information indicating the features of the positional relationship of the parts over time, obtained by observing in time series the part positions detected by the part position detection means;
motion estimation means (72) for estimating the motion of the specific target based on the motion feature information; and
motion information providing means (30) for providing information based on the estimation result of the motion estimation means.

The detection device according to any one of claims 1 to 6, further comprising, with a vector obtained by applying the transformation matrix to the part feature information taken as an estimated vector:
lower end estimation means (80) for estimating the lower end position of the specific target based on the estimated vector of the specific target, using a lower end estimation model indicating the relationship between the part positions and the lower end position of the identification target; and
distance information providing means (30) for calculating the distance between the specific target and the host vehicle based on the lower end position estimated by the lower end estimation means, and providing information based on the calculated distance.

The detection device according to claim 7, wherein the lower end estimation model is a linear regression model that estimates the lower end position using the components of the estimated vector as explanatory variables.