JP2022125973A

JP2022125973A - Position estimating apparatus, position estimating program, and position estimating method

Info

Publication number: JP2022125973A
Application number: JP2022015287A
Authority: JP
Inventors: 宏朗西岡; Hiroo Nishioka; 賢治原; Kenji Hara; リーボムジュン; Bumjun Lee; 淳光安; Atsushi Mitsuyasu; 篤眞砂; Atsushi Masago; 修中窪; Osamu Nakakubo
Original assignee: Effect Co Ltd
Current assignee: Effect Co Ltd
Priority date: 2021-02-17
Filing date: 2022-02-03
Publication date: 2022-08-29

Abstract

To provide a position estimating apparatus, a position estimating program, and a position estimating method which can estimate the position of a detection target faster and with satisfactory accuracy.
SOLUTION: A position estimating apparatus 10 according to the present invention calculates a direction from an overhead monocular camera C to a detection target based on detection targets (W1 to W3, E) detected in each frame of an image by image recognition, detects, as an estimated point representing the position of the detection target, the coordinates of the intersection of the calculated direction and a shape surface of a three-dimensional map, classifies the estimated points into groups, one per detection target, and estimates the position of each detection target based on its group of estimated points.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a position estimation device, a position estimation program, and a position estimation method for estimating the position of an object based on an image captured by an overhead monocular camera.

A technique for estimating the position of a detection target based on an image captured by a camera is known (see, for example, Patent Document 1).
The method and system for estimating a person's position using asynchronous camera video described in Patent Document 1 capture the same person with two or more asynchronous cameras, select consecutive preceding and following frames captured by a reference camera, select a base camera and a base frame captured at a time between the preceding and following frames, estimate the person region in each frame image, estimate the top of the person's head from each person region, and estimate the three-dimensional position of the top of the head using the head-top position information from the preceding, following, and base frames.

In Patent Document 1, when two persons appear in two consecutive frame images, a three-dimensional head-top position is estimated for each combination of one person in the base frame with any person in the corresponding preceding frame and any person in the following frame. A height is then calculated from the three-dimensional head-top position for each combination, and the head-top position of the combination whose height is closest to the height range is estimated as the position of the person in the base frame.

Patent Document 1: JP-A-2007-233523

However, in Patent Document 1, when there are three or more detection targets rather than two, the number of head-top position combinations grows, and the amount of calculation grows with it. In addition, when several people are close together and one of them sits down from a standing position or stands up from a sitting position, the apparent height changes, so the detection target may be misidentified.

Accordingly, an object of the present invention is to provide a position estimation device, a position estimation program, and a position estimation method capable of estimating the position of a detection target quickly and accurately.

The position estimation device of the present invention estimates the positions of detection targets appearing in images captured by one or more overhead monocular cameras, based on a three-dimensional map of a place where a plurality of detection targets are located. The device comprises: target detection means for detecting, by image recognition, the detection targets in each frame of the images; coordinate detection means for calculating, based on a detection target detected by the target detection means, the direction from the overhead monocular camera toward the detection target, and detecting the coordinates of the intersection of this direction with a shape surface in the three-dimensional map as an estimated point, that is, the position of the detection target; clustering means for grouping, by detection target, the estimated points detected by the coordinate detection means in a plurality of frames of the images; and position estimation means for estimating the position of each detection target from the set of estimated points grouped by the clustering means.

The position estimation program of the present invention causes a computer to function as a position estimation device that estimates the positions of detection targets appearing in images captured by one or more overhead monocular cameras, based on a three-dimensional map of a place where a plurality of detection targets are located. Specifically, it causes the computer to function as: target detection means for detecting, by image recognition, the detection targets in each frame of the images; coordinate detection means for calculating, based on a detection target detected by the target detection means, the direction from the overhead monocular camera toward the detection target, and detecting the coordinates of the intersection of this direction with a shape surface in the three-dimensional map as an estimated point, that is, the position of the detection target; clustering means for grouping, by detection target, the estimated points detected by the coordinate detection means in a plurality of frames of the images; and position estimation means for estimating the position of each detection target from the set of estimated points grouped by the clustering means.

Furthermore, the position estimation method of the present invention is performed by a position estimation device that comprises target detection means, coordinate detection means, clustering means, and position estimation means, and that estimates the positions of detection targets appearing in images captured by one or more overhead monocular cameras, based on a three-dimensional map of a place where a plurality of detection targets are located. The method includes: a step in which the target detection means detects, by image recognition, the detection targets in each frame of the images; a step in which the coordinate detection means calculates, based on a detection target detected by the target detection means, the direction from the overhead monocular camera toward the detection target, and detects the coordinates of the intersection of this direction with a shape surface in the three-dimensional map as an estimated point, that is, the position of the detection target; a step in which the clustering means groups, by detection target, the estimated points detected by the coordinate detection means in a plurality of frames of the images; and a step in which the position estimation means estimates the position of each detection target from the set of estimated points grouped by the clustering means.

According to the present invention, detection targets are detected by image recognition in each frame of images captured by one or more overhead monocular cameras, and the direction from the overhead monocular camera toward each detection target is calculated. The coordinates of the estimated point, which is the intersection of that direction with a shape surface in the three-dimensional map, are detected as the position of the detection target; the estimated points are grouped by detection target; and the position of each detection target is estimated from its set of grouped estimated points. As a result, the estimated points indicating the positions of detection targets can be grouped by detection target even when the images come from a single overhead monocular camera, the estimated points vary from frame to frame, and the detection targets are close to each other.

The target detection means may detect a detection target and assign it a class according to its type. The clustering means may then group, by detection target, the set of estimated points obtained from each of a plurality of frames captured by the one or more overhead monocular cameras during an arbitrary period, merging groups of estimated points by agglomerative hierarchical clustering under the condition that a group contains only one estimated point of the same class from any one frame.
Doing so makes it possible to group the estimated points easily and quickly. In addition, grouping based on frames captured by a plurality of overhead monocular cameras during an arbitrary period suppresses blind spots: even if detection targets overlap or a detection target is hidden behind an obstacle, one of the cameras can still capture it.

When calculating the distance between estimated points from their coordinates, the clustering means may, based on the condition that there is only one estimated point of the same class from one frame, add a penalty value to the distance between two estimated points that come from the same frame.

When calculating the distance between estimated points for agglomerative hierarchical clustering, the clustering means can impose the condition that there is only one estimated point of the same class from one frame by calculating the distance using Equation (1).
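Assuming a Euclidean distance with an additive same-frame penalty, which is what the penalty described above and the symbol definitions that follow suggest, Equation (1) can be written in the following form. This is a reconstruction under that assumption, not a quotation of the published formula:

```latex
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} + A\,\delta_{F_1,F_2} \tag{1}
```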

where d is the distance between the estimated points, (x1, y1, z1) and (x2, y2, z2) are the position coordinates of the estimated points, A is a constant, F1 and F2 are frame IDs, and δ_{F1,F2} is 1 when F1 = F2 and 0 when F1 ≠ F2.
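The penalized distance can be sketched in Python as follows. The sketch assumes a Euclidean distance plus an additive penalty for points from the same frame, as the definitions above suggest; the function name `penalized_distance` and the penalty value used in the example are illustrative and do not come from the source.

```python
import math


def penalized_distance(p1, p2, frame1, frame2, penalty):
    """Distance between two estimated points with a same-frame penalty.

    p1, p2   : (x, y, z) coordinates of the estimated points
    frame1/2 : frame IDs (F1, F2) of the two points
    penalty  : the constant A (its value is an assumption left to the caller)
    """
    d = math.dist(p1, p2)      # Euclidean distance between the points
    if frame1 == frame2:       # delta_{F1,F2} = 1 only when the frames match
        d += penalty
    return d


# Two points 5 units apart: the penalty applies only when the frames match.
print(penalized_distance((0, 0, 0), (3, 4, 0), 1, 2, 100.0))
print(penalized_distance((0, 0, 0), (3, 4, 0), 1, 1, 100.0))
```

Choosing the penalty much larger than the size of the site pushes same-frame pairs to the end of the merge order, which is how a distance-only clustering routine can respect the one-point-per-frame condition.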

When grouping is complete, the clustering means may exclude any group in which the number of estimated points is equal to or less than a predetermined threshold. When estimated points of different detection targets come close together or overlap (which happens when the detection targets themselves are close), more groups than detection targets may be formed; excluding groups at or below the threshold allows the groups that correspond to detection targets to be estimated accurately.
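A minimal Python sketch of this exclusion step is shown below. The data layout (a dictionary from group ID to a list of points) and the name `drop_small_groups` are assumptions made for illustration.

```python
def drop_small_groups(groups, threshold):
    """Remove groups whose point count is equal to or less than the threshold.

    groups    : dict mapping a group ID to its list of estimated points
    threshold : predetermined threshold on the number of points
    """
    return {gid: pts for gid, pts in groups.items() if len(pts) > threshold}


groups = {"A": [1, 2, 3, 4], "B": [5], "C": [6, 7]}
print(drop_small_groups(groups, 2))  # only group "A" has more than 2 points
```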

When the coordinate detection means calculates the direction from the overhead monocular camera toward the detection target, it may use, as the position of the detection target, the lower end of the frame line that surrounds the detection target and indicates that the target detection means has recognized it in the frame.
Because the lower end of the frame line lies on the shape surface in the three-dimensional map, using this position as the estimated point gives the coordinates at which the detection target is located without any correction.

According to the present invention, the estimated points indicating the positions of detection targets can be grouped by detection target even when the estimated points vary from frame to frame and the detection targets are close to each other, so the position of each detection target can be estimated quickly and accurately.

FIG. 1 is a diagram showing a construction site where a position estimation device according to an embodiment of the present invention is installed.
FIG. 2 is a diagram for explaining the configuration of the position estimation device shown in FIG. 1.
FIG. 3 is a diagram for explaining images captured by the cameras shown in FIG. 1, where (A) shows an image captured by the first camera, (B) shows an image captured by the second camera, (C) shows an image captured by the third camera, and (D) illustrates detection targets detected in the image captured by the first camera.
FIG. 4 (A) is a diagram for explaining how the direction from the camera toward a detection target is calculated using a perspective projection model, (B) is a diagram for explaining that the coordinates of the point where a virtual line from the camera toward the detection target pierces the shape surface of the three-dimensional map are used as the estimated point, and (C) is a diagram for explaining that the lower end of the position of the detection target detected by image recognition is used as the estimated point.
FIG. 5 (A) is a diagram showing the directions from a plurality of cameras toward the respective detection targets, and (B) is a diagram showing a plurality of estimated points on the shape surface of the three-dimensional map.
FIG. 6 (A) to (D) are diagrams for explaining the grouping of estimated points.
FIG. 7 is a diagram showing an example in which model images representing workers and a model image representing heavy machinery are superimposed at the estimated positions on the three-dimensional map of the construction site and displayed on the display means.
FIG. 8 is a plot based on the coordinates indicating the positions of detection targets, where (A) shows the state before the estimated points are grouped by detection target using agglomerative hierarchical clustering, and (B) shows the state after grouping is complete and the estimated points have been merged into seven groups.
FIG. 9 is a bar graph of the number of points in each group produced by the grouping shown in FIG. 8(B).

A position estimation device according to an embodiment of the present invention will be described with reference to the drawings.
The position estimation device estimates the position of a detection target appearing in an image captured by an overhead monocular camera (hereinafter simply called a camera), based on a three-dimensional map of the place where the detection target is located. In this embodiment, as shown in FIG. 1, the place where the detection targets are located is a construction site S, and the detection targets are workers W1 to W3 working at the construction site S and work vehicles such as heavy machinery E.

At the construction site S shown in FIG. 1, three cameras C (first camera C1 to third camera C3) are installed at different positions and face different directions. In this embodiment, the cameras C are connected to the position estimation device 10 by cable, but they may instead be connected by wireless communication such as WiFi (registered trademark).

As shown in FIG. 2, the position estimation device 10 is a computer that runs the position estimation program and thereby functions as the following means.
The position estimation device 10 comprises image acquisition means 11, target detection means 12, coordinate detection means 13, clustering means 14, position estimation means 15, notification means 16, input means 17, display means 18, and storage means 19.

The image acquisition means 11 stores images from the cameras C in the storage means 19.
The target detection means 12 detects, by image recognition, the position and size of each detection target in each frame of the images. The image recognition can extract detection targets using machine learning, in particular deep learning. A machine learning model that extracts various features from an image and, from combinations of those features, performs object detection (estimating which region of the image is the object) and classification (determining what the object is) is stored in the storage means 19. Recognition is performed by determining, based on this model, whether each frame contains a detection target.
The target detection means 12 also assigns each detected detection target a class according to its type.

The coordinate detection means 13 calculates, based on a detection target detected by the target detection means 12, the direction from the camera C toward the detection target, and detects the coordinates of the intersection of this direction with a shape surface in the three-dimensional map as an estimated point, that is, the position of the detection target. Here, a shape surface means the surface of terrain or a building represented by the three-dimensional map.

The clustering means 14 groups, by detection target, the estimated points detected in a plurality of frames.
The position estimation means 15 estimates the position of each detection target from its set of grouped estimated points.
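The text does not state how a single position is derived from a group of estimated points. One simple possibility, assumed here purely for illustration, is the centroid (coordinate-wise mean) of the group; the function name `centroid` is hypothetical.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of (x, y, z) points."""
    if not points:
        raise ValueError("a group must contain at least one estimated point")
    n = len(points)
    # zip(*points) regroups the coordinates into all x's, all y's, all z's.
    return tuple(sum(axis) / n for axis in zip(*points))


print(centroid([(0, 0, 0), (2, 4, 6)]))
```

A median or another robust estimator would serve equally well if outlying estimated points are a concern.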

The notification means 16 issues an alarm according to how close the estimated position of one detection target is to the estimated position of another.
The input means 17 can be a character input device such as a keyboard, or a pointing device such as a mouse, trackball, or joystick.
The display means 18 can be a liquid crystal display, an organic EL display, or the like.
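As an illustration of the proximity alarm issued by the notification means 16, the following Python sketch reports every pair of estimated positions that lie within a warning distance of each other. The single fixed threshold, the dictionary layout, and the name `proximity_alerts` are assumptions; the text says only that the alarm depends on the degree of proximity.

```python
import math
from itertools import combinations


def proximity_alerts(positions, warn_distance):
    """Return (name_a, name_b, distance) for each pair closer than warn_distance.

    positions     : dict mapping a target name to its estimated (x, y, z)
    warn_distance : assumed alarm threshold, in the map's length unit
    """
    alerts = []
    # Sort by name so that the output order is deterministic.
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        d = math.dist(pa, pb)
        if d <= warn_distance:
            alerts.append((a, b, d))
    return alerts


positions = {"W1": (0, 0, 0), "E": (3, 4, 0), "W3": (50, 0, 0)}
print(proximity_alerts(positions, 6.0))  # only W1 and E are within 6 units
```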

The storage means 19 can be a large-capacity hard disk drive or a fast-access SSD (Solid State Drive). The storage means 19 stores the OS, various application software, and various settings, as well as the position estimation program.
The position estimation program consists of various executable programs and a machine learning model for image recognition.

The storage means 19 also stores the three-dimensional map of the construction site S shown in FIG. 1, which was generated by scanning the construction site S when the position estimation device 10 was installed. The three-dimensional map can be, for example, a map generated using SLAM (Simultaneous Localization and Mapping), such as LiDAR SLAM or Visual SLAM.

The operation and use of the position estimation device 10 configured as described above will now be described with reference to the drawings.
As shown in FIG. 1, workers W (W1 to W3) and heavy machinery E are working at the construction site S. The image acquisition means 11 shown in FIG. 2 takes in images from the first camera C1 to the third camera C3 and stores them in the storage means 19. For example, the frame shown in FIG. 3(A), captured by the first camera C1, shows two workers W1 and W2 on the left, one worker W3 on the right, and the heavy machinery E between them. Likewise, the frame shown in FIG. 3(B), captured by the second camera C2, and the frame shown in FIG. 3(C), captured by the third camera C3, show the workers W1 to W3 and the heavy machinery E.

Next, the target detection means 12 shown in FIG. 2 detects, from the image of each frame, the positions and sizes of the workers W1 to W3 and the heavy machinery E in that frame. In this embodiment, it refers to the machine learning model stored in the storage means 19 and trained by deep learning to detect the workers W1 to W3 and the heavy machinery E, as shown in FIG. 3(D), and sets a rectangular frame line F around each of them, sized to fit.

At this time, the target detection means 12 assigns the person class to the workers W1 to W3 and the work vehicle class to the heavy machinery E.

Next, the coordinate detection means 13 calculates, based on each detection target detected by the target detection means 12, the direction from the camera C toward the detection target using a perspective projection model.
In the perspective projection model, the captured image (frame) shown in FIG. 3(A) can be regarded as having been projected onto a virtual image plane P located a predetermined distance from the camera, as shown in FIG. 4(A). The direction from the camera toward a detection target is therefore the direction in which the virtual lines L1 to L4, which connect the camera lens (optical center O1) to the detection targets (workers W1 to W3 and heavy machinery E), extend.
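The perspective projection step can be sketched in Python as a pinhole camera model that turns a pixel into a viewing direction. The intrinsic parameters `fx`, `fy`, `cx`, `cy`, the absence of lens distortion, and the function name `pixel_to_ray` are all assumptions for illustration; the result is expressed in camera coordinates and would still need the camera's pose to be rotated into map coordinates.

```python
import math


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit direction, in camera coordinates, of the ray through pixel (u, v).

    fx, fy : focal lengths in pixels (assumed known from calibration)
    cx, cy : principal point in pixels (assumed known from calibration)
    """
    # Point on a virtual image plane placed at unit distance from the
    # optical center along the camera's viewing axis.
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


# The principal point maps to the optical axis itself.
print(pixel_to_ray(320, 240, 500.0, 500.0, 320, 240))
```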

The coordinate detection means 13 shown in FIG. 2 computes the equations of the straight lines representing the virtual lines L1 to L4 from the coordinates of the optical center O1 shown in FIG. 4(A) and the coordinates of the detection targets on the virtual image plane P. As shown in FIG. 4(B), it then calculates from these equations the coordinates of the points on the three-dimensional map M where the virtual lines L1 to L4 pierce the shape surface of the map. The coordinate detection means 13 stores the coordinates of each intersection in the storage means 19 as an estimated point, associated with identification information for the frame and identification information for the detection target. As a result, even when one frame contains estimated points for several detection targets, they can be distinguished by the identification information.
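The intersection calculation can be illustrated with a simplified Python sketch in which the shape surface is replaced by a flat horizontal ground plane. This is a stand-in assumption: a real three-dimensional map would require intersecting the ray with a mesh or point cloud. The name `intersect_ground` and the convention that the z axis points up are also assumptions.

```python
def intersect_ground(origin, direction, ground_z=0.0):
    """Intersect a ray with the horizontal plane z = ground_z.

    origin    : (x, y, z) of the optical center in map coordinates
    direction : (dx, dy, dz) of the virtual line in map coordinates
    Returns the intersection point, or None if the ray never hits the plane.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-12:
        return None                # ray runs parallel to the ground
    t = (ground_z - oz) / dz       # parameter along the line origin + t * direction
    if t <= 0:
        return None                # the plane lies behind the camera
    return (ox + t * dx, oy + t * dy, ground_z)


# A camera 10 units above the ground looking forward and down at 45 degrees.
print(intersect_ground((0, 0, 10), (1, 0, -1)))
```

Each returned point would then be stored together with its frame ID and detection ID, as the paragraph above describes.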

In this embodiment, as shown in FIG. 3(D) and FIG. 4(C), recognition of a detection target (for example, worker W1) is indicated by the rectangular frame line F surrounding it. The coordinates indicating the position of a detection target (workers W1 to W3 and heavy machinery E) are therefore taken to be the intersection X1 of a virtual straight line L_V extending downward from the center position O2 of the frame line F and the bottom side F_B, which is the lower end of the frame line F.

If, for example, the position of a detection target were located above and away from the shape surface of the three-dimensional map M, the position would have to be corrected.
However, because the lower end of the frame line F lies on the shape surface in the three-dimensional map M, taking the intersection X1 on the bottom side F_B of the frame line F as the position of the detection target allows that position to be used as coordinates on the shape surface of the three-dimensional map M without correction.
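The choice of the intersection X1 can be sketched in Python as taking the midpoint of the bottom edge of the bounding box. The box layout `(x_min, y_min, x_max, y_max)` with the image y axis pointing downward is an assumed convention, and the name `bbox_bottom_center` is illustrative.

```python
def bbox_bottom_center(x_min, y_min, x_max, y_max):
    """Pixel where a vertical line through the box center meets the bottom edge.

    Image coordinates are assumed to grow downward, so the bottom edge of the
    frame line is at y_max.
    """
    return ((x_min + x_max) / 2.0, y_max)


print(bbox_bottom_center(100, 50, 200, 300))
```

The returned pixel is the one whose viewing ray is intersected with the map surface, which places the estimated point at the target's feet or wheels rather than at mid-height.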

As shown in FIG. 5(A), the coordinate detection means 13 performs this processing on each frame of the images from the first camera C1 to the third camera C3 and, as shown in FIG. 5(B), calculates the estimated points indicating the intersections X1 at predetermined intervals and stores them in the storage means 19 (see FIG. 2). The predetermined interval can be, for example, 0.5 seconds or 1 second, and can be changed in the settings. Calculating the estimated points in this way allows them to be grouped.
In the example shown in FIG. 5(B), group G1 contains the estimated points of worker W3, and group G2 contains the estimated points of the heavy machinery E. Group G3, however, contains the estimated points of both workers W1 and W2, because the two are close together, and they cannot be told apart.

The clustering means 14 shown in FIG. 2 therefore groups, by detection target, the position coordinates indicating the positions of the detection targets detected in the plurality of frames within a predetermined time.
In this embodiment, the set of estimated points obtained from each of a plurality of frames captured by one or more cameras C during an arbitrary period is grouped by detection target, merging groups of estimated points by agglomerative hierarchical clustering under the condition that a group contains only one estimated point of the same class from any one frame.
In other words, groups are merged on the premise that a single frame from a single camera C never contains two estimated points of the same detection target at the same time.
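The merge condition can be expressed as a small Python predicate. The sketch assumes that each group holds estimated points of one class and that every point carries the ID of the frame it came from; the pair layout `(frame_id, point)` and the name `can_merge` are illustrative.

```python
def can_merge(group_a, group_b):
    """Return True if two same-class groups share no frame ID.

    Each group is a list of (frame_id, (x, y, z)) pairs. Merging is allowed
    only when the merged group would still hold at most one point per frame.
    """
    frames_a = {frame_id for frame_id, _ in group_a}
    frames_b = {frame_id for frame_id, _ in group_b}
    return frames_a.isdisjoint(frames_b)


g1 = [(1, (0.0, 0.0, 0.0)), (4, (0.1, 0.0, 0.0))]
g2 = [(3, (0.2, 0.1, 0.0))]
g3 = [(4, (1.0, 0.0, 0.0))]
print(can_merge(g1, g2))  # frames {1, 4} and {3} do not overlap
print(can_merge(g1, g3))  # both groups contain a point from frame 4
```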

For example, FIG. 6(A) plots the estimated points of the two workers contained in group G3 of FIG. 5(B), using the position coordinates detected by the coordinate detection means 13 (see FIG. 2) from six frames from the three cameras C, as estimated points W11 to W16 and estimated points W21 to W26.
In the notation Wnm used for estimated points W11 to W16 and W21 to W26, n denotes the class (worker) and m is the ID identifying the frame. At this point each estimated point can be distinguished by its frame ID, but which class (worker) it belongs to is not yet known; n is used here only for convenience.

図６（Ａ）からも判るように、同一クラスでも異なるフレームで推定点の位置が異なる。これは、１台のカメラＣでも、推定点のばらつきが、位置推定の誤差（例えば、全く同じ位置にいても、明るさが変わるだけでもずれる場合がある）により生じるためである。
また、カメラＣが３台ある場合には、第１カメラＣ１～第３カメラＣ３からの画像は同期が取れていないため、また、通信環境によっては遅延により、推定点がばらつき、複数の推定点が１つのグループに含まれることがある。 As can be seen from FIG. 6A, the positions of the estimation points are different in different frames even in the same class. This is because even with a single camera C, variations in estimated points occur due to errors in position estimation (for example, even if they are at exactly the same position, they may shift due to a change in brightness).
In addition, when there are three cameras C, the images from the first camera C1 to the third camera C3 are not synchronized, and delays may arise depending on the communication environment. As a result, the estimated points vary, and a plurality of estimated points may fall into one group.

まず、最初の段階で、図６（Ａ）に示す状態から所定距離内にある推定点を１つのグループに集約する。所定距離は、凝集型階層的クラスタリングの手法に従い、最小値（任意の設定値）から徐々に大きくしていく。例えば、図６（Ｂ）は、所定距離がある段階まで大きくなった状態を示しており、近接する、推定点Ｗ１１，Ｗ１４，Ｗ１６がグループＡに、推定点Ｗ１３，Ｗ１５がグループＢに、推定点Ｗ１２，Ｗ２６がグループＣに、推定点Ｗ２１，Ｗ２４，Ｗ２５がグループＤにグループ分けされている。推定点Ｗ２２は所定距離内に近接する推定点が無いため単独のグループＥとなる。また、推定点Ｗ２３も所定距離内に近接する推定点が無いため単独のグループＦとなる。これらのグループＡ～Ｆに含まれる推定点は、上記条件下でグループ分けされているため、同じクラス（種類が人である作業者）ではあるが、異なるフレームである。 First, in the first stage, estimated points that lie within a predetermined distance of one another in the state shown in FIG. 6A are aggregated into one group. The predetermined distance is gradually increased from a minimum value (an arbitrary set value) in accordance with the method of agglomerative hierarchical clustering. For example, FIG. 6B shows a state in which the predetermined distance has been increased to a certain stage: the neighboring estimated points W11, W14 and W16 are grouped into group A, W13 and W15 into group B, W12 and W26 into group C, and W21, W24 and W25 into group D. The estimated point W22 forms a group E by itself because no other estimated point lies within the predetermined distance, and the estimated point W23 likewise forms a group F by itself. Because groups A to F were formed under the above condition, the estimated points in each group belong to the same class (workers, whose type is human) but come from different frames.

次の段階で、図６（Ｂ）に示す状態から、更に所定距離を拡大していき、グループの集約を進める。図６（Ｃ）に示すように、グループＡは、グループＢおよびグループＥと一緒のグループとしても、同一フレームの同一クラスが存在しないため問題無い。しかし、グループＡとグループＣでは、フレーム６に同じクラスとなる作業者を示す推定点Ｗ１６，Ｗ２６が含まれているため、１フレームからの同一クラスの推定点は１点しか含まないという制約条件に反する。従って、グループＡとグループＣとは一緒のグループにはできない。 At the next stage, the predetermined distance is further increased from the state shown in FIG. 6B, and the aggregation of groups proceeds. As shown in FIG. 6C, group A can be merged with group B and group E without any problem, because the merged group contains no two estimated points of the same class from the same frame. However, groups A and C both contain an estimated point from frame 6 (W16 and W26) indicating a worker of the same class, so merging them would violate the constraint that a group contains only one estimated point of the same class from one frame. Therefore, group A and group C cannot be merged.

また、グループＤは、グループＣおよびグループＦと一緒のグループとしても、同一フレームの同一クラスが存在しないため問題無い。しかし、グループＣとグループＥでは、フレーム２に同じクラスとなる作業者を示す推定点Ｗ１２，Ｗ２２が含まれているため制約条件に反する。従って、グループＣとグループＥとは一緒のグループにはできない。 Also, even if group D is grouped together with group C and group F, there is no problem because the same class of the same frame does not exist. However, in Groups C and E, frame 2 includes estimated points W12 and W22 indicating workers of the same class, which violates the constraint. Therefore, Group C and Group E cannot be grouped together.

従って、図６（Ｄ）に示すようにグループＡ～Ｆを集約すると、グループＡ，Ｂ，ＥはグループＸに集約され、グループＣ，Ｄ，ＦはグループＹに集約することができる。
このように、作業者Ｗ１のグループとしたグループＸに作業者Ｗ２の推定点Ｗ２２が含まれ、作業者Ｗ２のグループとしたグループＹに作業者Ｗ１の推定点Ｗ１２が含まれてしまい、多少の精度が落ちるものの、画像認識した２人の作業者Ｗ１，Ｗ２の２つのグループＸ，Ｙに、容易に、かつ素早く、凝集することができる。 Therefore, when groups A to F are aggregated as shown in FIG. 6D, groups A, B, and E can be aggregated into group X, and groups C, D, and F can be aggregated into group Y.
As a result, the estimated point W22 of worker W2 ends up in group X, which represents worker W1, and the estimated point W12 of worker W1 ends up in group Y, which represents worker W2. Although accuracy is somewhat degraded, the estimated points can be aggregated easily and quickly into the two groups X and Y corresponding to the two workers W1 and W2 detected by image recognition.
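The merge decisions in FIGS. 6(B) to 6(D) can be checked with a minimal Python sketch. It assumes that the second digit of each label Wnm is the frame ID m and that all points belong to the same class (worker), as stated above. The helper name `can_merge` is hypothetical and not part of the disclosure.

```python
# Frame IDs of the estimated points in each group of FIG. 6(B).
# The second digit of each label Wnm is the frame ID m.
groups = {
    "A": {1, 4, 6},  # W11, W14, W16
    "B": {3, 5},     # W13, W15
    "C": {2, 6},     # W12, W26
    "D": {1, 4, 5},  # W21, W24, W25
    "E": {2},        # W22
    "F": {3},        # W23
}


def can_merge(*names):
    """Return True if the named groups contain no two points from the same frame."""
    seen = set()
    for name in names:
        if seen & groups[name]:
            return False  # a frame ID would appear twice in the merged group
        seen |= groups[name]
    return True


print(can_merge("A", "B", "E"))  # True:  becomes group X
print(can_merge("A", "C"))       # False: both contain a point from frame 6
print(can_merge("C", "D", "F"))  # True:  becomes group Y
print(can_merge("C", "E"))       # False: both contain a point from frame 2
```

The sketch checks only the frame constraint. Which admissible groups are actually merged also depends on the distances between them.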

この例では、図６（Ａ）に示す状態から図６（Ｄ）に示す状態まで、１フレームからの同一クラスの推定点は１つのみという条件下にて、推定点間の距離が所定距離内にある推定点を同一クラスの推定点として、グループを集約している。
従って、クラスタリング手段１４は、推定点間の距離を算出するときに、以下の式（２）により算出することで、上記条件を加味させることができる。
但し、（ｘ１，ｙ１，ｚ１）、（ｘ２，ｙ２，ｚ２）は位置座標、Ａは定数、Ｆ１，Ｆ２はフレーム識別するＩＤであり、δ_{Ｆ１，Ｆ２}は、Ｆ１＝Ｆ２のとき１、Ｆ１≠Ｆ２のとき０とする罰則項である。 In this example, from the state shown in FIG. 6A to the state shown in FIG. 6D, the groups are aggregated by treating estimated points whose mutual distance is within the predetermined distance as estimated points of the same class, under the condition that there is only one estimated point of the same class from one frame.
Therefore, when the clustering means 14 calculates the distance between estimated points, it can take the above condition into account by calculating the distance with the following formula (2).
Here, (x1, y1, z1) and (x2, y2, z2) are position coordinates, A is a constant, F1 and F2 are frame identification IDs, and δ_{F1,F2} is a penalty term that is 1 when F1 = F2 and 0 when F1 ≠ F2.
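Formula (2) itself is not reproduced in this text. From the symbol definitions given here and the later remark that the penalty term lies outside the square root, it can be reconstructed as follows. This is a reconstruction under those assumptions, not a copy of the published formula.

```latex
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} + A\,\delta_{F_1,F_2},
\qquad
\delta_{F_1,F_2} =
\begin{cases}
1 & (F_1 = F_2) \\
0 & (F_1 \neq F_2)
\end{cases}
```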

定数Ａは、比較的大きい数値である。例えば、２人の作業者が大きく離れて所在する程度の数値とすることができる。
また、定数Ａを検出対象が所在する３次元地図の範囲を超える数値とすると、推定点の位置座標から推定点同士の距離ｄを算出するときに、同じフレームであれば、３次元地図の範囲を超える過大な数値が算出される。 Constant A is a relatively large numerical value. For example, it can be a numerical value of the extent that two workers are located far apart.
Also, if the constant A is set to a value exceeding the range of the three-dimensional map in which the detection targets are located, then the distance d calculated from the position coordinates of two estimated points from the same frame is an excessively large value that exceeds the range of the three-dimensional map.

このように、推定点間の距離を算出するときに、推定点同士が同一フレーム（Ｆ１＝Ｆ２）であるときには、罰則項により罰則値が推定点間の距離に加算されるため、上記条件のときにグループ同士の集約から除外することができる。 Thus, when the distance between estimated points is calculated and the two points come from the same frame (F1 = F2), the penalty term adds the penalty value to the distance, so that such points can be excluded from the merging of groups under the above condition.
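The effect of the penalty term can be illustrated with a minimal Python sketch. It assumes that formula (2) is the Euclidean distance plus A·δ, as the surrounding description indicates. The function name `penalized_distance` and the sample value of A are hypothetical.

```python
import math


def penalized_distance(p, q, frame_p, frame_q, A):
    """Euclidean distance between p and q, plus A when both come from the same frame."""
    penalty = A if frame_p == frame_q else 0.0
    return math.dist(p, q) + penalty


A = 1000.0  # assumed to exceed the extent of the three-dimensional map

p = (0.0, 0.0, 0.0)
q = (3.0, 4.0, 0.0)

print(penalized_distance(p, q, 1, 2, A))  # different frames: 5.0
print(penalized_distance(p, q, 6, 6, A))  # same frame: 1005.0
```

Because the same-frame distance is always at least A, any distance threshold below A keeps two points from the same frame out of the same group.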

なお、式（２）では、罰則項であるＡδ_{Ｆ１，Ｆ２}は、推定点間の距離を算出する平方根から外れているが、罰則項を平方根に含めるようにしてもよい。その場合には、定数Ａは、３次元地図の範囲を示す長さの２乗とすることが望ましい。 In formula (2), the penalty term Aδ_{F1,F2} lies outside the square root used to calculate the distance between the estimated points, but the penalty term may instead be included inside the square root. In that case, the constant A is preferably the square of the length indicating the extent of the three-dimensional map.
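The variant with the penalty term inside the square root can be written as below. The symbol R for the length indicating the extent of the three-dimensional map is introduced here for illustration only; it does not appear in the original.

```latex
d' = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2 + A\,\delta_{F_1,F_2}},
\qquad A = R^2
\;\Rightarrow\;
d' \ge R \quad \text{whenever } F_1 = F_2
```

With A = R², two estimated points from the same frame are therefore always at least one map extent apart, which is the same effect the additive form achieves with A ≥ R.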

そして、所定距離の拡大は任意の設定値を超えた時点、または、１グループに属するメンバー数（推定点の数）が、フレーム数となった時点で止めることとする。
従って、図６（Ｄ）に示すグループＸ，Ｙでは、６つのフレームの推定点全部を含むため、所定距離の拡大が止まり、検出対象がグループＸ，Ｙの２グループに分けることができる。 The expansion of the predetermined distance is stopped when it exceeds an arbitrary set value, or when the number of members belonging to one group (the number of estimated points) reaches the number of frames.
Therefore, since the groups X and Y shown in FIG. 6D include all the estimated points of the six frames, the expansion of the predetermined distance stops and the detection target can be divided into two groups X and Y.
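The whole procedure (penalized distance, stepwise merging, and stopping) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the clustering means 14. It uses complete linkage between groups, so that one same-frame pair is enough to block a merge, and it stops when the closest pair of groups is farther apart than `max_distance`. The member-count rule is not coded separately: a group that already holds one point from every frame conflicts with every remaining point, so it cannot grow further. All names and sample coordinates are hypothetical.

```python
import math
from itertools import combinations


def penalized_distance(a, b, A):
    """Distance between two (frame_id, position) points, plus A for the same frame."""
    (frame_a, pos_a), (frame_b, pos_b) = a, b
    return math.dist(pos_a, pos_b) + (A if frame_a == frame_b else 0.0)


def group_distance(g1, g2, A):
    """Complete linkage: the largest pairwise distance between the two groups."""
    return max(penalized_distance(a, b, A) for a in g1 for b in g2)


def cluster(points, max_distance, A):
    """Merge the closest groups until no pair lies within max_distance."""
    groups = [[p] for p in points]
    while len(groups) > 1:
        d, i, j = min(
            (group_distance(groups[i], groups[j], A), i, j)
            for i, j in combinations(range(len(groups)), 2)
        )
        if d > max_distance:
            break  # includes every pair that shares a frame, since d >= A
        groups[i].extend(groups[j])
        del groups[j]
    return groups


# Two workers seen in three frames; positions are made-up sample values.
points = [
    (1, (0.00, 0.0, 0.0)), (1, (1.00, 0.0, 0.0)),
    (2, (0.10, 0.0, 0.0)), (2, (0.90, 0.0, 0.0)),
    (3, (0.45, 0.0, 0.0)), (3, (1.10, 0.0, 0.0)),
]
groups = cluster(points, max_distance=5.0, A=100.0)
for g in groups:
    print(sorted(frame for frame, _ in g), [pos[0] for _, pos in g])
```

Without the penalty, every point in this sample lies within `max_distance` of every other and all six would collapse into one group. With it, the result is two groups that each hold exactly one point per frame.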

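なお、本実施の形態では、凝集型階層的クラスタリングにより、推定点を検出対象ごとにグループ分けしているが、単連結法、完全連結法、群平均法、ウォード法、セントロイド法、メジアン法などが使用できる。 In the present embodiment, the estimated points are grouped by detection target using agglomerative hierarchical clustering, for which the single linkage method, complete linkage method, group average method, Ward's method, centroid method, median method, and the like can be used.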
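The linkage methods differ in how the distance between two groups is derived from the distances between their members. The sketch below shows three of them (single linkage, complete linkage, and group average) in plain Python. The function names and sample points are illustrative only, and the penalty term is omitted for brevity.

```python
import math


def pairwise(g1, g2):
    """All member-to-member distances between two groups."""
    return [math.dist(a, b) for a in g1 for b in g2]


def single_linkage(g1, g2):
    return min(pairwise(g1, g2))      # closest pair of members


def complete_linkage(g1, g2):
    return max(pairwise(g1, g2))      # farthest pair of members


def average_linkage(g1, g2):
    d = pairwise(g1, g2)
    return sum(d) / len(d)            # mean over all pairs


g1 = [(0.0, 0.0), (1.0, 0.0)]
g2 = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(g1, g2), complete_linkage(g1, g2), average_linkage(g1, g2))
# 2.0 5.0 3.5
```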

次に、図２に示す位置推定手段１５は、検出対象ごとにグループ分けされた推定点の集合から検出対象の位置を推定する。本実施の形態では、推定点の重心を算術平均により算出して、検出対象の推定位置としている。
そして、例えば、位置推定手段１５は、図７に示すように、工事現場Ｓを示す３次元地図Ｍの推定位置に、作業者を示す像（作業者Ｗ１～Ｗ３）や重機を模した像（重機Ｅ）を重畳させて、図２に示す表示手段１８に表示させることができる。
この表示により、工事現場Ｓでの人員の配置具合が把握できるため、作業者が不用意に他の場所に移動して人員が不足しているなどの状況を容易に確認できる。 Next, the position estimating means 15 shown in FIG. 2 estimates the position of the detection target from a set of estimation points grouped for each detection target. In the present embodiment, the center of gravity of the estimated points is calculated by arithmetic mean and used as the estimated position of the detection target.
Then, for example, as shown in FIG. 7, the position estimation means 15 can superimpose images representing workers (workers W1 to W3) and an image simulating heavy machinery (heavy machinery E) at the estimated positions on the three-dimensional map M representing the construction site S, and display them on the display means 18 shown in FIG. 2.
With this display, it is possible to grasp the arrangement of personnel at the construction site S, so that it is possible to easily confirm the situation such as a shortage of personnel due to workers carelessly moving to another place.
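The arithmetic-mean centroid used as the estimated position can be sketched as follows. The function name `centroid` and the sample coordinates are hypothetical.

```python
def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y, z) estimated points."""
    if not points:
        raise ValueError("a group must contain at least one estimated point")
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


group_x = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
print(centroid(group_x))  # (1.0, 2.0, 3.0)
```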

ここで、報知手段１６は、検出対象のうち、作業者Ｗ１～Ｗ３が重機Ｅに所定距離より接近していることを検出すると、管理者に警報を通知したり、工事現場Ｓに設置された回転灯や警報機により作業者Ｗ１～Ｗ３に警報を通知したり、重機Ｅの移動を停止させたりすることができる。 Here, when the notification means 16 detects that any of the workers W1 to W3 among the detection targets has come closer to the heavy machinery E than a predetermined distance, it can send an alarm to the administrator, warn the workers W1 to W3 by means of a revolving light or an alarm device installed at the construction site S, or stop the movement of the heavy machinery E.
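The proximity check behind such a warning can be sketched in Python. This is only an illustration of comparing estimated positions against a predetermined distance. The function name, the sample coordinates, and the 5.0 limit are assumptions, and how the alarm is delivered is outside the sketch.

```python
import math


def workers_too_close(workers, machine, min_distance):
    """Return the names of workers closer to the machine than min_distance."""
    return [
        name for name, position in workers.items()
        if math.dist(position, machine) < min_distance
    ]


workers = {
    "W1": (0.0, 0.0, 0.0),
    "W2": (10.0, 0.0, 0.0),
    "W3": (3.0, 4.0, 0.0),
}
heavy_machinery_e = (1.0, 0.0, 0.0)

print(workers_too_close(workers, heavy_machinery_e, 5.0))  # ['W1', 'W3']
```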

以上のように本実施の形態に係る位置推定装置１０によれば、図３（Ｄ）に示すように１台以上のカメラにより撮影した画像の各フレームから画像認識により検出対象を検出し、図５（Ａ）に示すように、カメラＣから検出対象へ向かう方向を算出することで、３次元地図Ｍにおける形状表面との交点Ｘ１（推定点）の座標を検出対象の位置として検出し、図６（Ａ）から同図（Ｄ）に示すように、それぞれの推定点（推定点Ｗ１１～Ｗ１６，推定点Ｗ２１～Ｗ２６）を、検出対象ごとにグループ分けして、グループ分けされた推定点の集合から検出対象の位置を推定する。 As described above, the position estimation device 10 according to the present embodiment detects detection targets by image recognition from each frame of images captured by one or more cameras, as shown in FIG. 3D; calculates the direction from the camera C toward each detection target and detects the coordinates of the intersection X1 (estimated point) with the shape surface of the three-dimensional map M as the position of the detection target, as shown in FIG. 5A; groups the estimated points (W11 to W16 and W21 to W26) by detection target, as shown in FIGS. 6A to 6D; and estimates the position of each detection target from the set of grouped estimated points.

そうすることで、１台のカメラＣからの画像であっても、また、各フレーム間で、推定点にばらつきがあり、検出対象が近接していても、検出対象の位置を示す推定点がグループ分けできるため、検出対象の位置を早く精度よく推定することが可能である。 As a result, even when the images come from a single camera C, and even when the estimated points vary between frames and the detection targets are close to one another, the estimated points indicating the positions of the detection targets can be grouped, so the positions of the detection targets can be estimated quickly and accurately.

クラスタリング手段１４が、推定点を検出対象ごとにグループ分けするときに、検出対象の種類に応じて割り当てられたクラスであり、１台のカメラＣの１フレームからの同一のクラスの推定点は１点のみを含める条件下にて、凝集型階層的クラスタリングを行うことによりグループ分けしている。そうすることで、推定点のグループ分けを容易に、かつ素早くに行うことができる。 When grouping the estimated points by detection target, the clustering means 14 performs agglomerative hierarchical clustering under the condition that a group includes only one estimated point of the same class from one frame of one camera C, a class being assigned according to the type of detection target. This allows the estimated points to be grouped easily and quickly.

更に、対象検出手段１２が、第１カメラＣ１から第３カメラＣ３による複数台のカメラＣからの画像に基づいて検出対象を検出し、座標検出手段１３が、複数台の俯瞰用単眼カメラからの画像から推定点を検出することにより、検出対象同士が重なったり、検出対象が障害物に隠れたりしても、いずれかのカメラＣにより撮影することができるので、死角の発生を抑えることができる。 Furthermore, the object detection means 12 detects the detection object based on the images from the plurality of cameras C from the first camera C1 to the third camera C3, and the coordinate detection means 13 detects the detection object from the plurality of overhead monocular cameras. By detecting the estimated point from the image, even if the detection targets overlap each other or the detection target is hidden by an obstacle, the image can be captured by any one of the cameras C, so that the occurrence of blind spots can be suppressed. .

［実施例］
次に、クラスタリング手段によるシミュレーションを行った例を説明する。
図８（Ａ）に示す例では、５つの検出対象に対して検出対象の所在位置から乱数により２０ずつの推定点を発生させている。
そして、凝集型階層的クラスタリングにより、推定点を検出対象ごとにグループ分けする際に、推定点間の距離を算出するときに式（２）により算出して、図６（Ａ）から図６（Ｄ）にて説明したときと同様に、グループを集約していく。
そうすることで、３つの検出対象Ｔ１～Ｔ３は、互いが離間した場所に位置しているため、推定点が正確に検出対象（クラス）ごとにグループ分けされる（図８（Ｂ）では、推定点が◎，×，＋にて示されているグループＧ１～Ｇ３。）
しかし、図８（Ａ）に示す検出対象Ｔ４，Ｔ５は互いが接近しており、検出対象の位置を示すそれぞれの推定点が接近したり重なったりして混在した状態となった領域があるので、この推定点のグループ分けが完了すると、図８（Ｂ）に示すように、４つのグループに分かれてしまっている（図８（Ｂ）では、推定点が□，◇，△，○にて示されているグループＧ４～Ｇ７。） [Example]
Next, an example of performing a simulation using the clustering means will be described.
In the example shown in FIG. 8(A), 20 estimated points are generated with random numbers around the location of each of five detection targets.
Then, when the estimated points are grouped by detection target using agglomerative hierarchical clustering, the distance between estimated points is calculated by formula (2), and the groups are aggregated in the same manner as described with reference to FIGS. 6(A) to 6(D).
As a result, because the three detection targets T1 to T3 are located apart from one another, their estimated points are correctly grouped by detection target (class) (groups G1 to G3, whose estimated points are indicated by ◎, ×, and + in FIG. 8(B)).
However, the detection targets T4 and T5 shown in FIG. 8(A) are close to each other, and there is a region in which their estimated points lie close together or overlap and are intermixed. When the grouping of these estimated points is completed, they are therefore split into four groups, as shown in FIG. 8(B) (groups G4 to G7, whose estimated points are indicated by □, ◇, △, and ○ in FIG. 8(B)).
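Input data of the kind used in this simulation can be generated with a short Python sketch. The text says only that the points are produced with random numbers, so the Gaussian scatter, the two-dimensional coordinates, the target locations, and the one-point-per-target-per-frame layout below are all assumptions made for illustration.

```python
import random


def simulate_points(targets, n_frames=20, sigma=0.5, seed=0):
    """Return (frame_id, target_index, (x, y)) tuples scattered around each target."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    points = []
    for frame in range(1, n_frames + 1):
        for index, (tx, ty) in enumerate(targets):
            xy = (rng.gauss(tx, sigma), rng.gauss(ty, sigma))
            points.append((frame, index, xy))
    return points


# T1 to T3 are well separated; T4 and T5 are placed close together.
targets = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.8, 10.0)]
points = simulate_points(targets)
print(len(points))  # 100 estimated points: 5 targets x 20 frames
```

The target index is known only because the data is simulated. A clustering step would see just the frame ID and the coordinates.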

ここで、検出対象数がわかっていれば、グループＧ１～Ｇ７に含まれる推定点数の上位から検出対象数分のグループを特定することで、検出対象に対応するグループＧ１～Ｇ５に属する推定点を特定することができる。
そして、検出対象（グループＧ１～Ｇ５）に属する推定点が特定できるので、これらの推定点が属するグループＧ１～Ｇ５から、検出対象の位置を推定することができる。 Here, if the number of detection targets is known, the estimated points belonging to the groups G1 to G5 that correspond to the detection targets can be identified by selecting, from groups G1 to G7, as many groups as there are detection targets in descending order of the number of estimated points they contain.
Since the estimated points belonging to the detection targets (groups G1 to G5) can be identified, the positions of the detection targets can be estimated from the groups G1 to G5 to which these estimated points belong.
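Selecting the largest groups when the number of detection targets is known can be sketched as below. The group sizes are hypothetical placeholders, since the text does not state how many estimated points each of G1 to G7 contains, and the function name is illustrative.

```python
def largest_groups(groups, n_targets):
    """Keep the n_targets groups that contain the most estimated points."""
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:n_targets])


# Hypothetical sizes; each list stands in for a group's estimated points.
sizes = {"G1": 20, "G2": 20, "G3": 20, "G4": 17, "G5": 15, "G6": 5, "G7": 3}
groups = {name: list(range(count)) for name, count in sizes.items()}

selected = largest_groups(groups, n_targets=5)
print(list(selected))  # ['G1', 'G2', 'G3', 'G4', 'G5']
```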

また、検出対象数が不明であれば、所定値を閾値として、閾値以下の推定点数が含まれるグループを除外する。例えば、図９に示すように閾値Ｌを「８」とすればグループＧ６，Ｇ７が除外される。そうすることで、検出対象数が不明であっても、検出対象に対応するグループがグループＧ１～Ｇ５であると推定できる。
なお、この閾値は、検出対象の種類、俯瞰用単眼カメラの性能や検出対象が所在する場所の形状などにより変化する可能性があるため、試験的な運用の中で決定することが望ましい。 Also, if the number of detection targets is unknown, a predetermined value is used as a threshold, and groups whose number of estimated points is equal to or less than the threshold are excluded. For example, if the threshold L is set to 8 as shown in FIG. 9, groups G6 and G7 are excluded. In this way, even when the number of detection targets is unknown, the groups corresponding to the detection targets can be estimated to be groups G1 to G5.
Note that this threshold may change depending on the type of detection target, the performance of the bird's-eye monocular camera, the shape of the location where the detection target is located, and so on, so it is desirable to determine it in a trial operation.
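The threshold-based exclusion can be sketched in the same way. Only the threshold value 8 comes from the text; the group sizes are hypothetical placeholders and the function name is illustrative.

```python
def drop_small_groups(groups, threshold):
    """Exclude groups whose number of estimated points is at or below the threshold."""
    return {name: pts for name, pts in groups.items() if len(pts) > threshold}


# Hypothetical sizes; each list stands in for a group's estimated points.
sizes = {"G1": 20, "G2": 20, "G3": 20, "G4": 17, "G5": 15, "G6": 5, "G7": 3}
groups = {name: list(range(count)) for name, count in sizes.items()}

kept = drop_small_groups(groups, threshold=8)
print(list(kept))  # ['G1', 'G2', 'G3', 'G4', 'G5']
```

Unlike the top-N selection, this filter needs no prior knowledge of how many detection targets are present, at the cost of a threshold that has to be tuned for the site.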

本発明は、人や車両などの移動物を検出対象として、その所在を確認したい場合に好適であり、特に、工事現場の作業者や作業車の位置を把握する際に最適である。 INDUSTRIAL APPLICABILITY The present invention is suitable for detecting the location of moving objects such as people and vehicles, and is particularly suitable for locating workers and work vehicles at construction sites.

１０位置推定装置
１１画像取得手段
１２対象検出手段
１３座標検出手段
１４クラスタリング手段
１５位置推定手段
１６報知手段
１７入力手段
１８表示手段
１９記憶手段
Ｃ，Ｃ１，Ｃ２，Ｃ３俯瞰用単眼カメラ（カメラ）
Ｗ，Ｗ１，Ｗ２，Ｗ３作業者
Ｅ重機
Ｆ枠線
Ｆ_Ｂ底辺
Ｌ_Ｖ仮想直線
Ｌ１～Ｌ４仮想線
Ｏ１光学中心
Ｏ２中心位置
Ｐ仮想画像平面
Ｍ１１～Ｍ１６，Ｍ２１～Ｍ２６推定点
Ｘ１交点
Ｓ工事現場
Ｍ３次元地図
Ａ～Ｆ，Ｘ，Ｙ，Ｇ１～Ｇ５グループ
Ｔ１～Ｔ５検出対象 REFERENCE SIGNS LIST
10 Position estimation device
11 Image acquisition means
12 Object detection means
13 Coordinate detection means
14 Clustering means
15 Position estimation means
16 Notification means
17 Input means
18 Display means
19 Storage means
C, C1, C2, C3 Overhead monocular camera (camera)
W, W1, W2, W3 Worker
E Heavy machinery
F Frame line
F_B Base
L_V Virtual straight line
L1 to L4 Virtual line
O1 Optical center
O2 Center position
P Virtual image plane
M11 to M16, M21 to M26 Estimated point
X1 Intersection
S Construction site
M Three-dimensional map
A to F, X, Y, G1 to G5 Group
T1 to T5 Detection target

Claims

複数の検出対象が所在する場所の３次元地図に基づいて、１台以上の俯瞰用単眼カメラにより撮影した画像に写り込んだ検出対象の位置を推定する位置推定装置であり、
前記画像の各フレームから画像認識により、フレーム中の検出対象を検出する対象検出手段と、
前記対象検出手段が検出した検出対象に基づいて、前記俯瞰用単眼カメラから検出対象へ向かう方向を算出し、この方向と前記３次元地図における形状表面との交点座標を検出対象の位置である推定点として検出する座標検出手段と、
前記座標検出手段により画像の複数のフレームにて検出されたそれぞれの推定点を、検出対象ごとにグループ分けするクラスタリング手段と、
前記クラスタリング手段によりグループ分けされた推定点の集合から検出対象の位置を推定する位置推定手段とを備えた位置推定装置。 A position estimating device for estimating the positions of detection targets reflected in an image captured by one or more bird's-eye view monocular cameras based on a three-dimensional map of locations where a plurality of detection targets are located,
an object detecting means for detecting an object to be detected in each frame of the image by image recognition;
Based on the detection target detected by the target detection means, a direction toward the detection target from the bird's-eye monocular camera is calculated, and the coordinates of the intersection of this direction and the shape surface in the three-dimensional map are estimated as the position of the detection target. a coordinate detection means for detecting as a point;
clustering means for grouping the estimated points detected in a plurality of frames of the image by the coordinate detection means for each detection target;
A position estimation device comprising position estimation means for estimating a position of a detection target from a set of estimation points grouped by the clustering means.

前記対象検出手段は、検出対象を検出して検出対象の種類に応じたクラスを割り当て、
前記クラスタリング手段は、前記１台以上の俯瞰用単眼カメラにより任意の時間の間に撮影された複数フレームの画像それぞれからの推定点の集合を検出対象ごとにグループ分けすることを、凝集型階層的クラスタリングにより推定点が含まれるグループを集約しながら行うときに、１フレームからの同一クラスの推定点は１つのみという条件下にて、同一クラスの推定点が含まれるグループとして集約する請求項１記載の位置推定装置。 The target detection means detects a detection target and assigns a class according to the type of the detection target,
and the clustering means, when grouping by detection target the sets of estimated points obtained from each of a plurality of frames of images captured by the one or more bird's-eye view monocular cameras during an arbitrary time period, while merging groups that contain estimated points by agglomerative hierarchical clustering, merges groups as groups containing estimated points of the same class under the condition that there is only one estimated point of the same class from one frame, in the position estimation device according to claim 1.

前記クラスタリング手段は、１フレームからの同一クラスの推定点は１つのみという条件に基づいて、推定点のそれぞれの座標位置から推定点間の距離を算出するときに、推定点同士が同じフレームであるときには、推定点間の距離に罰則値を加算する請求項２記載の位置推定装置。 3. The position estimation device according to claim 2, wherein, when calculating the distance between estimated points from their respective coordinate positions, the clustering means adds a penalty value to the distance between estimated points that come from the same frame, based on the condition that there is only one estimated point of the same class from one frame.

前記クラスタリング手段は、凝集型階層的クラスタリングを行う際の推定点間の距離を算出するときに、式（１)により算出する請求項３記載の位置推定装置。

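但し、ｄは推定点間の距離、（ｘ１，ｙ１，ｚ１），（ｘ２，ｙ２，ｚ２）は推定点の位置座標、Ａは定数、Ｆ１，Ｆ２はフレーム識別するＩＤであり、δ_{Ｆ１，Ｆ２}は、Ｆ１＝Ｆ２のとき１、Ｆ１≠Ｆ２のとき０である。 4. The position estimation device according to claim 3, wherein the clustering means calculates the distance between estimated points using formula (1) when performing agglomerative hierarchical clustering, where d is the distance between estimated points, (x1, y1, z1) and (x2, y2, z2) are the position coordinates of the estimated points, A is a constant, F1 and F2 are frame identification IDs, and δ_{F1,F2} is 1 when F1 = F2 and 0 when F1 ≠ F2.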

前記クラスタリング手段は、グループ分けが完了したときに、それぞれのグループに含まれる推定点数が所定の閾値以下のグループを除外する請求項１から４のいずれかの項に記載の位置推定装置。 5. The position estimation device according to any one of claims 1 to 4, wherein the clustering means excludes groups in which the number of estimated points included in each group is equal to or less than a predetermined threshold when the grouping is completed.

前記座標検出手段が前記俯瞰用単眼カメラから検出対象へ向かう方向を算出するときに、前記対象検出手段がフレーム中の検出対象を認識したことを示す検出対象を囲う枠線の下端の位置を検出対象の位置として方向を算出する請求項１から５のいずれかの項に記載の位置推定装置。 6. The position estimation device according to any one of claims 1 to 5, wherein, when calculating the direction from the bird's-eye view monocular camera toward the detection target, the coordinate detection means calculates the direction by taking, as the position of the detection target, the position of the lower end of the frame line that surrounds the detection target and indicates that the target detection means has recognized the detection target in the frame.

コンピュータを、
複数の検出対象が所在する場所の３次元地図に基づいて、１台以上の俯瞰用単眼カメラにより撮影した画像に写り込んだ検出対象の位置を推定する位置推定装置として機能させる位置推定プログラムであり、
前記画像の各フレームから画像認識により、フレーム中の検出対象を検出する対象検出手段、
前記対象検出手段が検出した検出対象に基づいて、前記俯瞰用単眼カメラから検出対象へ向かう方向を算出し、この方向と前記３次元地図における形状表面との交点座標を検出対象の位置である推定点として検出する座標検出手段、
前記座標検出手段により画像の複数のフレームにて検出されたそれぞれの推定点を、検出対象ごとにグループ分けするクラスタリング手段、
前記クラスタリング手段によりグループ分けされた推定点の集合から検出対象の位置を推定する位置推定手段として機能させる位置推定プログラム。 A position estimation program that causes a computer to function as a position estimation device for estimating the position of a detection target captured in an image taken by one or more bird's-eye view monocular cameras, based on a three-dimensional map of a location where a plurality of detection targets are located, the program causing the computer to function as:
target detection means for detecting a detection target in a frame by image recognition from each frame of the image;
coordinate detection means for calculating, based on the detection target detected by the target detection means, a direction from the bird's-eye view monocular camera toward the detection target, and detecting the coordinates of the intersection of this direction and the shape surface of the three-dimensional map as an estimated point that is the position of the detection target;
clustering means for grouping, by detection target, the estimated points detected in a plurality of frames of the image by the coordinate detection means; and
position estimation means for estimating the position of the detection target from the set of estimated points grouped by the clustering means.

複数の検出対象が所在する場所の３次元地図に基づいて、１台以上の俯瞰用単眼カメラにより撮影した画像に写り込んだ検出対象の位置を推定する、対象検出手段と、座標検出手段と、クラスタリング手段と、位置推定手段とを備えた位置推定装置による位置推定方法であり、
前記対象検出手段が、前記画像の各フレームから画像認識により、フレーム中の検出対象を検出するステップと、
前記座標検出手段が、前記対象検出手段が検出した検出対象に基づいて、前記俯瞰用単眼カメラから検出対象へ向かう方向を算出し、この方向と前記３次元地図における形状表面との交点座標を検出対象の位置である推定点として検出するステップと、
前記クラスタリング手段が、前記座標検出手段により画像の複数のフレームにて検出されたそれぞれの推定点を、検出対象ごとにグループ分けするステップと、
前記位置推定手段が、前記クラスタリング手段によりグループ分けされた推定点の集合から検出対象の位置を推定するステップとを含む位置推定方法。 A position estimation method performed by a position estimation device that comprises target detection means, coordinate detection means, clustering means, and position estimation means, and that estimates the position of a detection target captured in an image taken by one or more bird's-eye view monocular cameras, based on a three-dimensional map of a location where a plurality of detection targets are located, the method comprising:
a step in which the target detection means detects a detection target in a frame by image recognition from each frame of the image;
a step in which the coordinate detection means calculates, based on the detection target detected by the target detection means, a direction from the bird's-eye view monocular camera toward the detection target, and detects the coordinates of the intersection of this direction and the shape surface of the three-dimensional map as an estimated point that is the position of the detection target;
a step in which the clustering means groups, by detection target, the estimated points detected in a plurality of frames of the image by the coordinate detection means; and
a step in which the position estimation means estimates the position of the detection target from the set of estimated points grouped by the clustering means.